Hi David,

One of our engineers has provided this information in response to your questions:

"It is OK to give lastz a target sequence file that contains multiple sequences. Run lastz with the argument --help=files to see more information on what the target and query files can be. The GTF file of 454 read locations is not enough information to determine what the alignment was, unless it is in CIGAR format."

Please don't hesitate to contact the mail list again if you have any further questions.

Katrina Learned
UCSC Genome Bioinformatics Group

David Garfield wrote, On 12/10/2010 10:00 AM:
> Hi folks,
>
> First off, many thanks for your earlier help in figuring out soft masking
> (https://lists.soe.ucsc.edu/pipermail/genome/2010-November/024129.html). It
> all worked out just fine. Now I have two (related) follow-up questions.
>
> PROJECT: I'm conducting some scans for selection on a mess of sea urchins.
> One sea urchin (the reference) has an assembled genome. The others are 454
> sequences. I'd like to generate .chain files so that I can use liftOver to
> collect specified orthologous regions from the whole set of species.
>
> QUESTION 1: The first question is purely technical. The how-to page on
> whole-genome alignments
> (http://genomewiki.ucsc.edu/index.php/Whole_genome_alignment_howto) tells me
> that lastz has replaced blastz. This is fine; however, it seems that there is
> one significant difference between the two programs. While blastz (or the
> wrapper Blastz) can produce .lav files when there are multiple sequences in
> the target file, lastz cannot (or does not seem to be able to).
>
> How do y'all get around this limitation? Should I simply break apart the
> genome such that each (reasonably sized) scaffold from the target genome has
> its own file? Or do you use a different output format that can handle
> multiple sequences in the target?
>
> QUESTION 2: For two of my sea urchin species, I've been given .gtf formatted
> files that match each 454 read to a location within the target genome.
> Is there a way to use this existing map to generate chain files? I haven't
> found anything nearly as convenient for pulling out orthologous sequences
> for multiple species as having chain files.
>
> Many thanks,
>
> David
> _______________________________________________
> Genome maillist - [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
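
To illustrate the point in the reply about CIGAR: a start/end interval for a read says where it landed but not where the gaps are, whereas a CIGAR string records the runs of matches and gaps, which is the information the block lines of a chain file need (ungapped size, gap in target, gap in query). The Python sketch below is only an illustration under stated assumptions. It assumes a compact SAM-style CIGAR string such as 50M2I30M (CIGAR output from lastz or other tools may be laid out differently), and the function name cigar_to_chain_blocks is hypothetical, not part of lastz or the UCSC tools.

```python
import re

# Assumption: compact SAM-style CIGAR (e.g. "50M2I30M"), target = reference.
CIGAR_OK = re.compile(r"(\d+[MIDNSHP=X])+")
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_to_chain_blocks(cigar):
    """Return chain-style (size, dt, dq) tuples for one alignment.

    size: length of an ungapped aligned block
    dt:   gap in the target (reference) before the next block
    dq:   gap in the query (read) before the next block
    Leading and trailing gaps and clipping (S, H, P) are ignored here.
    """
    if not CIGAR_OK.fullmatch(cigar):
        raise ValueError("not a SAM-style CIGAR string: %r" % cigar)

    blocks = []
    size = dt = dq = 0
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op in "M=X":            # consumes both target and query
            if dt or dq:           # a gap just ended
                if size:           # close the block before the gap
                    blocks.append((size, dt, dq))
                    size = 0
                dt = dq = 0
            size += n
        elif op in "DN":           # target bases absent from the query
            dt += n
        elif op == "I":            # query bases absent from the target
            dq += n
    blocks.append((size, 0, 0))    # final block carries no following gap
    return blocks


if __name__ == "__main__":
    for block in cigar_to_chain_blocks("50M2I30M3D20M"):
        print(*block)
```

A complete chain record would also need a header line (score, sequence names, sizes, strands, and start/end coordinates), which this sketch does not build; it only shows that the block structure is recoverable from a CIGAR string and not from coordinates alone.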
