Good Morning David:

Pardon me, our internal script hides the details of running lastz.  You are
correct, lastz wants only one sequence in the target.  Your targets should be
large chunks from the completely assembled genome.  The query sequences can be
small sequences.  We use a script, blastz-normalizeLav, to lift the lav results
of the split targets back to their chrom coordinates and generate psl files for
each target chunk.  Those psl files are given to axtChain to construct the
chain files.  Our typical chunk sizes for target sequence are 10,000,000 bases
in length with 10,000 bases of overlap to the next chunk.
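
As a rough illustration of the chunking and lifting described above, here is a
minimal Python sketch.  It is not the code in blastz-run-ucsc or
blastz-normalizeLav; the function names are hypothetical, and it assumes each
window is at most 10,000,000 bases long and starts 10,000 bases before the
previous window ends.  The real scripts may lay out the windows differently.

```python
def chunk_windows(chrom_size, chunk=10_000_000, overlap=10_000):
    """Yield 0-based, half-open (start, end) windows covering a sequence.

    Assumption: consecutive windows share `overlap` bases, so each new
    window starts `chunk - overlap` bases after the previous one.
    """
    if not 0 <= overlap < chunk:
        raise ValueError("overlap must be non-negative and smaller than chunk")
    if chrom_size <= 0:
        return
    step = chunk - overlap
    start = 0
    while True:
        end = min(start + chunk, chrom_size)
        yield start, end
        if end >= chrom_size:
            break
        start += step


def lift_to_chrom(chunk_start, local_start, local_end):
    """Translate chunk-local coordinates back to chromosome coordinates."""
    return chunk_start + local_start, chunk_start + local_end


if __name__ == "__main__":
    # Hypothetical 25 Mb chromosome, split with the default sizes.
    for start, end in chunk_windows(25_000_000):
        print(start, end)
```

An alignment block found at positions 100 to 250 within the window that starts
at 9,990,000 would lift to 9,990,100 to 9,990,250 on the chromosome.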

We have no tools to convert GTF CIGAR format files to chain files.
You can certainly convert your GTF files to bed files to find where in the
genome the alignments are located, but there is no alignment information
there.
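
For the GTF to bed conversion, a minimal Python sketch might look like the
following.  The function name is hypothetical and this is not one of our
tools.  It relies on GTF start coordinates being 1-based and inclusive while
bed starts are 0-based, and it writes a placeholder score of 0, which is an
assumption made only to fill the six-column bed layout.

```python
def gtf_line_to_bed(line):
    """Convert one GTF line to a 6-column bed line, or return None.

    Only the location survives the conversion: chrom, start, end, the GTF
    feature type as the bed name, a placeholder score, and the strand.
    """
    if line.startswith("#"):
        return None  # comment or header line
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 9:
        return None  # not a complete GTF record
    chrom, _source, feature, start, end, _score, strand = fields[:7]
    bed_start = int(start) - 1  # 1-based inclusive -> 0-based
    bed_end = int(end)          # inclusive end equals half-open end
    return "\t".join([chrom, str(bed_start), str(bed_end), feature, "0", strand])


if __name__ == "__main__":
    example = 'chr1\tsrc\texon\t101\t200\t.\t+\t.\tgene_id "g1";'
    print(gtf_line_to_bed(example))
```

The output records where each feature sits, but nothing about gaps or
mismatches within it, which is why it cannot be turned into a chain file.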

You can see our scripts in the source tree: blastz-run-ucsc
and others in this directory:
http://genome-source.cse.ucsc.edu/gitweb/?p=kent.git;a=tree;f=src/hg/utils/automation
http://genome-source.cse.ucsc.edu/gitweb/?p=kent.git;a=blob_plain;f=src/hg/utils/automation/blastz-run-ucsc

--Hiram

David Garfield wrote:
> Hi Katrina (et al.),
> 
> Thanks for the information. Let me ask a small mess of follow-up questions:
> 
> 1) As far as I can tell, lastz will only allow multiple target sequences if 
> the output is something other than a .lav formatted file (one can reconfigure 
> the program blastZWrapper to help with that, but lastz spits out an error 
> message if you try to generate .lav files with multiple target sequences). 
> This is fine, of course, but I'm unsure what to do with anything other 
> than a .lav file as an output if I am following the directions here: 
> (http://genomewiki.ucsc.edu/index.php/Whole_genome_alignment_howto). So.....
> 
>       a) When you run lastz in-house, are you generating .lav files or 
> skipping directly to a MAF file (skipping the previous lav2maf script) which 
> would allow for multiple sequences in the target file?
> 
>       b) Running a whole genome alignment is, obviously, a fairly slow task. 
> To address this, the wiki suggests breaking the two genomes apart into 
> smaller, overlapping chunks of sequence. The lav (or psl) files resulting 
> from aligning these sections are then chained together. My questions here:
>               i) Does anything special need to be done to the headers of the 
> sequence files so that they can be chained together after the alignment?
>               ii) If you aren't generating lav (or psl) files with lastz, 
> what formats are you generating and how do you move on from there?
> 
> 2) It sounds like my GTF files likely won't work as they don't contain CIGAR 
> formatted information -- just start and end coordinates. If I could generate 
> a SAM formatted file (with CIGAR information), is it possible to generate a 
> chain file from there?
> 
> Many, many thanks,
> 
> DG
_______________________________________________
Genome maillist  -  [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
