Hi Katrina (et al.),

Thanks for the information. Let me ask a small mess of follow-up questions:

1) As far as I can tell, lastz will only allow multiple target sequences if
the output is something other than a .lav formatted file (one can reconfigure
the program blastZWrapper to help with that, but lastz spits out an error
message if you try to generate .lav files with multiple target sequences).
This is fine, of course, but I'm unsure what to do with anything other than a
.lav file as output if I am following the directions here:
(http://genomewiki.ucsc.edu/index.php/Whole_genome_alignment_howto). So...

   a) When you run lastz in-house, are you generating .lav files, or skipping
      directly to a MAF file (skipping the previous lav2maf script), which
      would allow for multiple sequences in the target file?

   b) Running a whole genome alignment is, obviously, a fairly slow task. To
      address this, the wiki suggests breaking the two genomes apart into
      smaller, overlapping chunks of sequence. The lav (or psl) files
      resulting from aligning these sections are then chained together. My
      questions here:

      i) Does anything special need to be done to the headers of the sequence
         files so that they can be chained together after the alignment?

      ii) If you aren't generating lav (or psl) files with lastz, what formats
          are you generating and how do you move on from there?

2) It sounds like my GTF files likely won't work, as they don't contain CIGAR
formatted information -- just start and end coordinates. If I could generate a
SAM formatted file (with CIGAR information), is it possible to generate a
chain file from there?

Many, many thanks,

DG

On Dec 14, 2010, at 4:43 PM, Katrina Learned wrote:

> Hi David,
>
> One of our engineers has provided this information in response to your
> questions:
>
> "It is OK to give a target sequence file to lastz that contains multiple
> sequences. Run lastz with the argument --help=files to see more information
> of what the target/query files can be to lastz.
>
> The GTF file 454 read location file is not enough information to determine
> what the alignment was unless it is a CIGAR format."
>
> Please don't hesitate to contact the mail list again if you have any further
> questions.
>
> Katrina Learned
> UCSC Genome Bioinformatics Group
>
> David Garfield wrote, On 12/10/2010 10:00 AM:
>> Hi folks,
>>
>> First off, many thanks for your earlier help in figuring out soft masking
>> (https://lists.soe.ucsc.edu/pipermail/genome/2010-November/024129.html). It
>> all worked out just fine. Now I have two (related) follow-up questions.
>>
>> PROJECT: I'm conducting some scans for selection on a mess of sea urchins.
>> One sea urchin (the reference) has an assembled genome. The others are 454
>> sequences. I'd like to generate .chain files so that I can use liftOver to
>> collect specified orthologous regions from the whole set of species.
>>
>> QUESTION 1: The first question is purely technical. The how-to page on
>> whole-genome alignments
>> (http://genomewiki.ucsc.edu/index.php/Whole_genome_alignment_howto) tells me
>> that lastz has replaced blastz. This is fine, however, it seems that there
>> is one significant difference between the two programs. While blastz (or the
>> wrapper Blastz) can produce .lav files when there are multiple sequences in
>> the target file, lastz cannot (or does not seem to be able to).
>>
>> How do y'all get around this limitation? Should I simply break apart the
>> genome such that each (reasonably sized) scaffold from the target genome has
>> its own file? Or do you use a different output format that can handle
>> multiple sequences in the target?
>>
>> QUESTION 2: For two of my sea urchin species, I've been given .gtf formatted
>> files that match each 454 read to a location within the target genome. Is
>> there a way to use this existing map to generate chain files?
>> I haven't found anything nearly so convenient for pulling out orthologous
>> sequences for multiple species as having chain files.
>>
>> Many thanks,
>>
>> David
>> _______________________________________________
>> Genome maillist - [email protected]
>> https://lists.soe.ucsc.edu/mailman/listinfo/genome
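
On question 2 (SAM with CIGAR to chain): a minimal Python sketch of why CIGAR information is the missing piece. A chain file describes an alignment as ungapped blocks separated by gaps in the target (dt) and in the query (dq), and a CIGAR string carries exactly that, whereas start/end coordinates alone do not. The function name cigar_to_chain_blocks is hypothetical, not part of any UCSC tool, and the sketch assumes the SAM reference plays the role of the chain target. It covers only the block lines; the chain header (names, sizes, strands, start/end) would still have to be built from the other SAM fields, and leading or trailing gaps are simply dropped here.

```python
import re

# One CIGAR operation: a length followed by a SAM operation code.
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_to_chain_blocks(cigar):
    """Turn a SAM CIGAR string into chain-style block tuples.

    Returns a list of (size, dt, dq) tuples followed by a final (size,)
    tuple, mirroring the "size dt dq" lines of a chain file.
    Assumption: the SAM reference is treated as the chain target.
    """
    blocks = []
    size = dt = dq = 0
    for length, op in _CIGAR_OP.findall(cigar):
        n = int(length)
        if op in "M=X":            # aligned bases: consume target and query
            if dt or dq:
                if size:           # a gap closes the previous block
                    blocks.append((size, dt, dq))
                # a gap before any block is dropped (simplification)
                size = dt = dq = 0
            size += n
        elif op in "DN":           # present in target only: gap in query row
            dt += n
        elif op == "I":            # present in query only: gap in target row
            dq += n
        # S, H and P contribute no aligned block and are skipped here.
    if size:
        blocks.append((size,))     # last block has no trailing gap fields
    return blocks


if __name__ == "__main__":
    for block in cigar_to_chain_blocks("10M2D5M3I8M"):
        print(" ".join(str(x) for x in block))
```

For the CIGAR string 10M2D5M3I8M this gives the blocks (10, 2, 0), (5, 0, 3) and a final (8,). Soft-clipped bases (S) would need to be reflected in the query start of the chain header rather than in the blocks.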
