Hi Bremen,

You have the basic flow correct.  Some details added below:

> 2. Align both chromosomes with blastz

We now use lastz, an improved version of blastz
(http://www.bx.psu.edu/miller_lab/, search for lastz) but blastz is
sufficient.  The output format of blastz, LAV, must be converted to
either PSL or AXT so that axtChain can read the alignments (we have
lavToAxt and lavToPsl programs; PSL is more compact).  lastz can
produce axt directly with the --output=axt option.


> 3. Chain using axtChain (QUESTION: I see this program takes two directories
> as arguments. If I wish only to compare two chromosomes, would these
> directories have only one file each?)

Yes, that would work.  However, you can also use the -faQ and -faT
options, and give fasta files instead of directories.  To see a
description of all axtChain options, run "axtChain" with no arguments.
Here is an example usage of -faQ and -faT:

axtChain -faQ -faT in.axt oneChrom.fa otherChrom.fa chroms.chain

Another alternative is to convert your fasta files into the compact
format 2bit like this:

faToTwoBit oneChrom.fa oneChrom.2bit

-- then you can give axtChain [and lastz but not blastz] the .2bit file 
instead of the directory, and don't have to pass -faQ / -faT.


> 4. Get size of each chromosome for chain filtering using faSize

Yes.  Note that the sequence name and size must appear in a
tab-separated file, so use the -detailed flag like this:

faSize -detailed oneChrom.fa > oneChrom.sizes


> 5. Sort and filter chains
>    a. use chainMergeSort using chain from step 3
>    b. prenet with chainPreNet using chain from 5.a., size of target
>       chromosome gotten from step 4, and size of query chromosome from
>       step 4 as arguments
> 6. Netting?

Yes.  We pipe the output of chainPreNet to chainNet (and pipe that to
netSyntenic) like this:

chainPreNet chroms.chain oneChrom.sizes otherChrom.sizes stdout \
| chainNet stdin -minSpace=1 oneChrom.sizes otherChrom.sizes stdout /dev/null \
| netSyntenic stdin chroms.net


> 7. Convert to .maf and use phyloFit

Yes.  For historical reasons we use netToAxt | axtToMaf, we don't have
a netToMaf.  You can see an example of how phyloFit was run for the
D. melanogaster Conservation track in our source tree
(kent/src/hg/makeDb/doc/dm2.txt, search for "PHASTCONS 15WAY" and then
search for phyloFit).  

If you have only two species, there is not much phylogenetic
information for phyloFit, but I suppose it could still make a
substitution rate model.  If you have more than two species, you will
also need multiz from http://www.bx.psu.edu/miller_lab/ .


> 8. Run phastCons to get .wig output which can be uploaded as a conservation
> track

Yes.  Adam Siepel also has a new method of scoring conservation,
phyloP, which we offer in addition to phastCons scores on newer
Conservation tracks.  I have no idea whether one would be more
appropriate than the other when working with only two species.


> Since I am only comparing two chromosomes at a time, I think only a single
> chain is generated which doesn't have to be netted. Am I missing anything?

Netting is still necessary, in order to get single-coverage alignments
on which conservation scores are computed.  axtChain usually produces
many chains that cover the same position on the reference
genome/chromosome.  The netting process selects the highest-scoring
chain as the top level, and then fills in gaps (unaligned areas) using
the next highest-scoring chain, and then fills in that chain's gaps
using the next highest-scoring chain and so on.  It is kind of like
extracting a global alignment from a sea of chained local alignments.


Max Haeussler contributed a very useful genomewiki page about
reconstructing our alignment & conservation pipeline (perhaps you have
seen it already?):

http://genomewiki.ucsc.edu/index.php/Whole_genome_alignment_howto

Hope that helps, and please send more questions to [email protected]
as you have them,

Angie

----- "Bremen Braun" <[email protected]> wrote:

> From: "Bremen Braun" <[email protected]>
> To: [email protected]
> Sent: Thursday, December 3, 2009 9:31:48 AM GMT -08:00 US/Canada Pacific
> Subject: [Genome] Generate conservation info for interspecies genome 
> comparisons
>
> Hello,
> 
> I have sequences of similar chromosomes for different species that I
> want to
> compare. I would like to be able to generate a conservation track such
> as
> the one seen here:
> http://tinyurl.com/yz6o6fu
> 
> I looked at an example of steps to be taken. Could you please
> verify/clarify
> for me? Let's assume I want to compare 2 chromosomes.
> 1. Mask both chromosomes
> 2. Align both chromosomes with blastz
> 3. Chain using axtChain (QUESTION: I see this program takes two
> directories
> as arguments. If I wish only to compare two chromosomes, would these
> directories have only one file each?)
> 4. Get size of each chromosome for chain filtering using faSize
> 5. Sort and filter chains
>    a. use chainMergeSort using chain from step 3
>    b. prenet with chainPreNet using chain from 5.a., size of target
> chromosome gotten from step 4, and size of query chromosome from step
> 4 as
> arguments
> 6. Netting?
> 7. Convert to .maf and use phyloFit
> 8. Run phastCons to get .wig output which can be uploaded as a
> conservation
> track
> 
> Since I am only comparing two chromosomes at a time, I think only a
> single
> chain is generated which doesn't have to be netted. Am I missing
> anything?
> 
> Thanks,
> Bremen
> _______________________________________________
> Genome maillist  -  [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
_______________________________________________
Genome maillist  -  [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome

Reply via email to