Angie,

1) Unfortunately, running trf prints no error message, but issuing "echo $?"
reveals it is indeed exiting with error code 1 which is interpreted by
trfBig with strerror. My guess is this is a problem running the precompiled
binary with Snow Leopard so I tried it with 32 bit Linux. This time trfBig
returns "Done.maybeSystem: ... failed (exit status 1): Success", so I guess
that means it's working properly.

2) As for using -faQ and -faT flags for axtChain (axtChain -faQ -faT -psl
-linearGap=loose $i ${args[0]} ${args[1]} $(basename $i .psl).chain), I get
a segfault so I opted to use directories for now.

3) The "hashMustFindVal" from earlier was because I had switched some
arguments.

Thanks for your help and hopefully I have it working now

Bremen

On Wed, Dec 9, 2009 at 1:39 PM, Angie Hinrichs <[email protected]> wrote:

> Hi Bremen,
>
> Regarding the error messages from bigTrf/trf -- what do you see if you run
> the trf command (copied from the maybeSystem(...)) in your shell?  I'm
> hoping that there will be a more informative error message from trf itself.
> If trf is failing to run on your sequence, then the trf command line and
> sequence would be a good test case for the TRF developers.
>
> To help determine why we're not finding the expected sequence name in the
> hash of chrom names to sizes, can you send the following?:
> * contents of your chr1.sizes and chr2.sizes files
> * header lines of the two fasta files passed to lastz/blastz
> * the first line that begins with "chain " in all.chain
>
>
> Angie
>
> ----- "Bremen Braun" <[email protected]> wrote:
> > From: "Bremen Braun" <[email protected]>
> > To: "Angie Hinrichs" <[email protected]>
> > Cc: [email protected]
> > Sent: Tuesday, December 8, 2009 11:51:37 AM GMT -08:00 US/Canada Pacific
> > Subject: Re: [Genome] Generate conservation info for interspecies genome
>  comparisons
>
> >
> > Angie,
> > Thanks for the reply. I now have some questions regarding tool behavior.
> >
> > 1. Following Max's guide, I started by masking both fasta files to be
> compared with trfBig using Tandem Repeats Finder v4.04. However, upon
> running I get messages from stdout such as the following:
> >
> > Freeing Memory...
> > Resolving output...
> > Done.maybeSystem: system(cd .;   trf ./chr.tf 2 7 7 80 10 50 2000 -m )
> failed (exit status 1): Unknown error: 0
> >
> > and
> >
> > Freeing Memory...
> > Resolving output...
> > Done.maybeSystem: system(cd .; trf ./chr2.tf 2 7 7 80 10 50 2000 -m )
> failed (exit status 1): Cannot allocate memory
> >
> > I am running OS X v10.6.2 on a quad core with 6 GB of RAM and using the
> newest trfBig as far as I know.
> >
> > 2. chainPreNet generates the following error:
> >
> > Got 1 chroms in directory/chr1.sizes, 1 in directory/chr2.sizes
> > hashMustFindVal: 'chr2' not found
> > Finishing nets
> > writing stdout
> > writing /dev/null
> > Couldn't open /proc/self/stat , No such file or directory
> > hashMustFindVal: 'chr2' not found
> > Got 1 chroms in directory/chr1.sizes, 1 in directory/chr2.sizes
> > Finishing nets
> > writing stdout
> > writing /dev/null
> > Couldn't open /proc/self/stat , No such file or directory
> >
> > when running line:
> >
> > chainMergeSort *.chain > all.chain
> > chainPreNet all.chain chr1.sizes chr2.sizes stdout \
> > | chainNet stdin -minSpace=1 chr1.sizes chr2.sizes stdout /dev/null \
> > | netSyntenic stdin out.net
> >
> > All the tools I am using were compiled from the newest version of jksrc.
> Any ideas what the problems are?
> >
> > Bremen
> >
> >
> > On Mon, Dec 7, 2009 at 11:57 PM, Angie Hinrichs <[email protected]>wrote:
> >
>>
>> Hi Bremen,
>> >
>> > You have the basic flow correct.  Some details added below:
>> >
>> >
>> > > 2. Align both chromosomes with blastz
>> >
>> >
>> We now use lastz, an improved version of blastz
>> > (http://www.bx.psu.edu/miller_lab/, search for lastz) but blastz is
>> > sufficient.  The output format of blastz, LAV, must be converted to
>> > either PSL or AXT so that axtChain can read the alignments (we have
>> > lavToAxt and lavToPsl programs; PSL is more compact).  lastz can
>> > produce axt directly with the --output=axt option.
>> >
>> >
>> >
>> > > 3. Chain using axtChain (QUESTION: I see this program takes two
>> directories
>> > > as arguments. If I wish only to compare two chromosomes, would these
>> > > directories have only one file each?)
>> >
>> >
>> Yes, that would work.  However, you can also use the -faQ and -faT
>> > options, and give fasta files instead of directories.  To see a
>> > description of all axtChain options, run "axtChain" with no arguments.
>> > Here is an example usage of -faQ and -faT:
>> >
>> > axtChain -faQ -faT in.axt oneChrom.fa otherChrom.fa chroms.chain
>> >
>> > Another alternative is to convert your fasta files into the compact
>> > format 2bit like this:
>> >
>> > faToTwoBit oneChrom.fa oneChrom.2bit
>> >
>> > -- then you can give axtChain [and lastz but not blastz] the .2bit file
>> > instead of the directory, and don't have to pass -faQ / -faT.
>> >
>> >
>> >
>> > > 4. Get size of each chromosome for chain filtering using faSize
>> >
>> >
>> Yes.  Note that the sequence name and size must appear in a
>> > tab-separated file, so use the -detailed flag like this:
>> >
>> > faSize -detailed oneChrom.fa > oneChrom.sizes
>> >
>> >
>> >
>> > > 5. Sort and filter chains
>> > >    a. use chainMergeSort using chain from step 3
>> > >    b. prenet with chainPreNet using chain from 5.a., size of target
>> > >       chromosome gotten from step 4, and size of query chromosome from
>> > >       step 4 as arguments
>> > > 6. Netting?
>> >
>> >
>> Yes.  We pipe the output of chainPreNet to chainNet (and pipe that to
>> > netSyntenic) like this:
>> >
>> > chainPreNet chroms.chain oneChrom.sizes otherChrom.sizes stdout \
>> > | chainNet stdin -minSpace=1 oneChrom.sizes otherChrom.sizes stdout
>> /dev/null \
>> > | netSyntenic stdin chroms.net
>> >
>> >
>> >
>> > > 7. Convert to .maf and use phyloFit
>> >
>> >
>> Yes.  For historical reasons we use netToAxt | axtToMaf, we don't have
>> > a netToMaf.  You can see an example of how phyloFit was run for the
>> > D. melanogaster Conservation track in our source tree
>> > (kent/src/hg/makeDb/doc/dm2.txt, search for "PHASTCONS 15WAY" and then
>> > search for phyloFit).
>> >
>> > If you have only two species, there is not much phylogenetic
>> > information for phyloFit, but I suppose it could still make a
>> > substitution rate model.  If you have more than two species, you will
>> > also need multiz from http://www.bx.psu.edu/miller_lab/ .
>> >
>> >
>> >
>> > > 8. Run phastCons to get .wig output which can be uploaded as a
>> conservation
>> > > track
>> >
>> >
>> Yes.  Adam Siepel also has a new method of scoring conservation,
>> > phyloP, which we offer in addition to phastCons scores on newer
>> > Conservation tracks.  I have no idea whether one would be more
>> > appropriate than the other when working with only two species.
>> >
>> >
>> >
>> > > Since I am only comparing two chromosomes at a time, I think only a
>> single
>> > > chain is generated which doesn't have to be netted. Am I missing
>> anything?
>> >
>> >
>> Netting is still necessary, in order to get single-coverage alignments
>> > on which conservation scores are computed.  axtChain usually produces
>> > many chains that cover the same position on the reference
>> > genome/chromosome.  The netting process selects the highest-scoring
>> > chain as the top level, and then fills in gaps (unaligned areas) using
>> > the next highest-scoring chain, and then fills in that chain's gaps
>> > using the next highest-scoring chain and so on.  It is kind of like
>> > extracting a global alignment from a sea of chained local alignments.
>> >
>> >
>> > Max Haeussler contributed a very useful genomewiki page about
>> > reconstructing our alignment & conservation pipeline (perhaps you have
>> > seen it already?):
>> >
>> >
>> > http://genomewiki.ucsc.edu/index.php/Whole_genome_alignment_howto
>> >
>> >
>> Hope that helps, and please send more questions to [email protected]
>> > as you have them,
>> >
>> > Angie
>> >
>> > ----- "Bremen Braun" <[email protected]> wrote:
>> >
>> > > From: "Bremen Braun" <[email protected]>
>> > > To: [email protected]
>> > > Sent: Thursday, December 3, 2009 9:31:48 AM GMT -08:00 US/Canada
>> Pacific
>> > > Subject: [Genome] Generate conservation info for interspecies genome
>> comparisons
>> >
>> > >
>> > > Hello,
>> > >
>> > > I have sequences of similar chromosomes for different species that I
>> > > want to
>> > > compare. I would like to be able to generate a conservation track such
>> > > as
>> > > the one seen here:
>> > > http://tinyurl.com/yz6o6fu
>> > >
>> > > I looked at an example of steps to be taken. Could you please
>> > > verify/clarify
>> > > for me? Let's assume I want to compare 2 chromosomes.
>> > > 1. Mask both chromosomes
>> > > 2. Align both chromosomes with blastz
>> > > 3. Chain using axtChain (QUESTION: I see this program takes two
>> > > directories
>> > > as arguments. If I wish only to compare two chromosomes, would these
>> > > directories have only one file each?)
>> > > 4. Get size of each chromosome for chain filtering using faSize
>> > > 5. Sort and filter chains
>> > >    a. use chainMergeSort using chain from step 3
>> > >    b. prenet with chainPreNet using chain from 5.a., size of target
>> > > chromosome gotten from step 4, and size of query chromosome from step
>> > > 4 as
>> > > arguments
>> > > 6. Netting?
>> > > 7. Convert to .maf and use phyloFit
>> > > 8. Run phastCons to get .wig output which can be uploaded as a
>> > > conservation
>> > > track
>> > >
>> > > Since I am only comparing two chromosomes at a time, I think only a
>> > > single
>> > > chain is generated which doesn't have to be netted. Am I missing
>> > > anything?
>> > >
>> > > Thanks,
>> > > Bremen
>> >
>> > > _______________________________________________
>> > > Genome maillist  -  [email protected]
>> > > https://lists.soe.ucsc.edu/mailman/listinfo/genome
>> >
>>
>
> >
>
_______________________________________________
Genome maillist  -  [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome

Reply via email to