Angie, Thanks for the reply. I now have some questions regarding tool behavior.
1. Following Max's guide, I started by masking both fasta files to be compared with trfBig using Tandem Repeats Finder v4.04. However, upon running I get messages from stdout such as the following: Freeing Memory... Resolving output... Done.maybeSystem: system(cd .; trf ./chr.tf 2 7 7 80 10 50 2000 -m ) failed (exit status 1): Unknown error: 0 and Freeing Memory... Resolving output... Done.maybeSystem: system(cd .; trf ./chr2.tf 2 7 7 80 10 50 2000 -m ) failed (exit status 1): Cannot allocate memory I am running OS X v10.6.2 on a quad core with 6 GB of RAM and using the newest trfBig as far as I know. 2. chainPreNet generates the following error: Got 1 chroms in directory/chr1.sizes, 1 in directory/chr2.sizes hashMustFindVal: 'chr2' not found Finishing nets writing stdout writing /dev/null Couldn't open /proc/self/stat , No such file or directory hashMustFindVal: 'chr2' not found Got 1 chroms in directory/chr1.sizes, 1 in directory/chr2.sizes Finishing nets writing stdout writing /dev/null Couldn't open /proc/self/stat , No such file or directory when running line: chainMergeSort *.chain > all.chain chainPreNet all.chain chr1.sizes chr2.sizes stdout \ | chainNet stdin -minSpace=1 chr1.sizes chr2.sizes stdout /dev/null \ | netSyntenic stdin out.net All the tools I am using were compiled from the newest version of jksrc. Any ideas what the problems are? Bremen On Mon, Dec 7, 2009 at 11:57 PM, Angie Hinrichs <[email protected]> wrote: > Hi Bremen, > > You have the basic flow correct. Some details added below: > > > 2. Align both chromosomes with blastz > > We now use lastz, an improved version of blastz > (http://www.bx.psu.edu/miller_lab/, search for lastz) but blastz is > sufficient. The output format of blastz, LAV, must be converted to > either PSL or AXT so that axtChain can read the alignments (we have > lavToAxt and lavToPsl programs; PSL is more compact). lastz can > produce axt directly with the --output=axt option. > > > > 3. Chain using axtChain (QUESTION: I see this program takes two > directories > > as arguments. If I wish only to compare two chromosomes, would these > > directories have only one file each?) > > Yes, that would work. However, you can also use the -faQ and -faT > options, and give fasta files instead of directories. To see a > description of all axtChain options, run "axtChain" with no arguments. > Here is an example usage of -faQ and -faT: > > axtChain -faQ -faT in.axt oneChrom.fa otherChrom.fa chroms.chain > > Another alternative is to convert your fasta files into the compact > format 2bit like this: > > faToTwoBit oneChrom.fa oneChrom.2bit > > -- then you can give axtChain [and lastz but not blastz] the .2bit file > instead of the directory, and don't have to pass -faQ / -faT. > > > > 4. Get size of each chromosome for chain filtering using faSize > > Yes. Note that the sequence name and size must appear in a > tab-separated file, so use the -detailed flag like this: > > faSize -detailed oneChrom.fa > oneChrom.sizes > > > > 5. Sort and filter chains > > a. use chainMergeSort using chain from step 3 > > b. prenet with chainPreNet using chain from 5.a., size of target > > chromosome gotten from step 4, and size of query chromosome from > > step 4 as arguments > > 6. Netting? > > Yes. We pipe the output of chainPreNet to chainNet (and pipe that to > netSyntenic) like this: > > chainPreNet chroms.chain oneChrom.sizes otherChrom.sizes stdout \ > | chainNet stdin -minSpace=1 oneChrom.sizes otherChrom.sizes stdout > /dev/null \ > | netSyntenic stdin chroms.net > > > > 7. Convert to .maf and use phyloFit > > Yes. For historical reasons we use netToAxt | axtToMaf, we don't have > a netToMaf. You can see an example of how phyloFit was run for the > D. melanogaster Conservation track in our source tree > (kent/src/hg/makeDb/doc/dm2.txt, search for "PHASTCONS 15WAY" and then > search for phyloFit). > > If you have only two species, there is not much phylogenetic > information for phyloFit, but I suppose it could still make a > substitution rate model. If you have more than two species, you will > also need multiz from http://www.bx.psu.edu/miller_lab/ . > > > > 8. Run phastCons to get .wig output which can be uploaded as a > conservation > > track > > Yes. Adam Siepel also has a new method of scoring conservation, > phyloP, which we offer in addition to phastCons scores on newer > Conservation tracks. I have no idea whether one would be more > appropriate than the other when working with only two species. > > > > Since I am only comparing two chromosomes at a time, I think only a > single > > chain is generated which doesn't have to be netted. Am I missing > anything? > > Netting is still necessary, in order to get single-coverage alignments > on which conservation scores are computed. axtChain usually produces > many chains that cover the same position on the reference > genome/chromosome. The netting process selects the highest-scoring > chain as the top level, and then fills in gaps (unaligned areas) using > the next highest-scoring chain, and then fills in that chain's gaps > using the next highest-scoring chain and so on. It is kind of like > extracting a global alignment from a sea of chained local alignments. > > > Max Haeussler contributed a very useful genomewiki page about > reconstructing our alignment & conservation pipeline (perhaps you have > seen it already?): > > http://genomewiki.ucsc.edu/index.php/Whole_genome_alignment_howto > > Hope that helps, and please send more questions to [email protected] > as you have them, > > Angie > > ----- "Bremen Braun" <[email protected]> wrote: > > > From: "Bremen Braun" <[email protected]> > > To: [email protected] > > Sent: Thursday, December 3, 2009 9:31:48 AM GMT -08:00 US/Canada Pacific > > Subject: [Genome] Generate conservation info for interspecies genome > comparisons > > > > Hello, > > > > I have sequences of similar chromosomes for different species that I > > want to > > compare. I would like to be able to generate a conservation track such > > as > > the one seen here: > > http://tinyurl.com/yz6o6fu > > > > I looked at an example of steps to be taken. Could you please > > verify/clarify > > for me? Let's assume I want to compare 2 chromosomes. > > 1. Mask both chromosomes > > 2. Align both chromosomes with blastz > > 3. Chain using axtChain (QUESTION: I see this program takes two > > directories > > as arguments. If I wish only to compare two chromosomes, would these > > directories have only one file each?) > > 4. Get size of each chromosome for chain filtering using faSize > > 5. Sort and filter chains > > a. use chainMergeSort using chain from step 3 > > b. prenet with chainPreNet using chain from 5.a., size of target > > chromosome gotten from step 4, and size of query chromosome from step > > 4 as > > arguments > > 6. Netting? > > 7. Convert to .maf and use phyloFit > > 8. Run phastCons to get .wig output which can be uploaded as a > > conservation > > track > > > > Since I am only comparing two chromosomes at a time, I think only a > > single > > chain is generated which doesn't have to be netted. Am I missing > > anything? > > > > Thanks, > > Bremen > > _______________________________________________ > > Genome maillist - [email protected] > > https://lists.soe.ucsc.edu/mailman/listinfo/genome > _______________________________________________ Genome maillist - [email protected] https://lists.soe.ucsc.edu/mailman/listinfo/genome
