Re: [Genome] Locating orthologous regions (not just genes) between human and chimp

Zeng, Jia Fri, 02 Sep 2011 13:32:49 -0700

Hi Hiram,

Thank you so much for your help. It worked like a charm.


I executed the commands one by one and generated all the rbest chains I need 
(panTro3 vs rheMac2 and hg19 vs rheMac2). 
In order to ensure that I was able to reproduce the rbest.chain you guys 
provided for hg19 vs panTro3, I also redid the one
between hg19 and panTro3 and used a diff to check whether it is the same as the 
one I downloaded from UCSC. 

I think the majority of the two files are identical but there is still some 
difference. I investigated a bit more and found that my
version contained a lot of the chromosomes such as chr6_cox_hap2, 
chr6_ssto_hap7, chr17_ctg5_hap1 etc which cannot be
located in the rbest.chain file from UCSC. 

My guess is that when I used the fetchChromSizes to get the hg19.chrom.sizes, 
many of the aforementioned chromosomes
were included but somehow they were not used in your version of the production. 
(Possibly due to a difference in the version
of the genome annotation?)

Could you please confirm my conjecture? If so, what should be the general rule 
of thumb for excluding some chromosomes 
in the analyses? 

Thank you.

- Jia
________________________________________
From: Hiram Clawson [[email protected]]
Sent: Wednesday, August 31, 2011 5:52 PM
To: Zeng, Jia
Cc: [email protected]
Subject: Re: [Genome] Locating orthologous regions (not just genes) between 
human and chimp

Good Afternoon Jia:

Please note the sequence of code generated by this script attached below.

--Hiram

Zeng, Jia wrote:
> Hi Katrina,
>
> Thank you very much for your helpful response. I did set up the kent source 
> tree and located the perl script.
> I also looked up the file /kent/src/hg/makeDb/doc/hg19.txt to grep the 
> particular command you guys used for
> generating the rbest chain:
>
> /cluster/bin/scripts/doRecipBest.pl -workhorse=`uname -n` -stop=recipBest 
> -buildDir=/hive/data/genomes/hg19/bed/lastz.$db hg19 $db >& $db.recipBest.log
>
> I scanned the perl script briefly and thought that I roughly understood the 
> basic usage of the tool. Then I tried
> to do the following to construct a rbest chain of panTro3 vs rheMac2:
>
> ./doRecipBest.pl -stop=recipBest -buildDir=~/data/panTro3 ~/data/panTro3 
> ~/data/rheMac2
>
> where the directory ~/data/panTro3 contains the file: 
> panTro3.rheMac2.all.chain and the directory ~/data/rheMac2 contains the file: 
> rheMac2.panTro3.all.chain
> Both are downloaded from the UCSC golden path repository.
>
> But somehow I think providing the chain files only is not sufficient for the 
> doRecipBest script to work out.
> Could you please let me know what files are assumed to exist in these 
> directories? And what is the buildDir in relation to the tDb and gDb?
>
> Thank you.
>
> Best,
> - Jia

#!/bin/csh -efx
# This script nets in both directions to get reciprocal best chains and nets.
# This script will fail if any of its commands fail.

cd /hive/data/genomes/hg19/bed/lastzPanTro3.2011-02-22/axtChain

# Swap hg19-best chains to be panTro3-referenced:
chainStitchId hg19.panTro3.over.chain.gz stdout \
| chainSwap stdin stdout \
| chainSort stdin panTro3.hg19.tBest.chain

# Net those on panTro3 to get panTro3-ref'd reciprocal best net:
chainPreNet panTro3.hg19.tBest.chain \
   /hive/data/genomes/{panTro3,hg19}/chrom.sizes stdout \
| chainNet -minSpace=1 -minScore=0 \
   stdin /hive/data/genomes/{panTro3,hg19}/chrom.sizes stdout /dev/null \
| netSyntenic stdin stdout \
| gzip -c > panTro3.hg19.rbest.net.gz

# Extract panTro3-ref'd reciprocal best chain:
netChainSubset panTro3.hg19.rbest.net.gz panTro3.hg19.tBest.chain stdout \
| chainStitchId stdin stdout \
| gzip -c > panTro3.hg19.rbest.chain.gz

# Swap to get hg19-ref'd reciprocal best chain:
chainSwap panTro3.hg19.rbest.chain.gz stdout \
| chainSort stdin stdout \
| gzip -c > hg19.panTro3.rbest.chain.gz

# Net those on hg19 to get hg19-ref'd reciprocal best net:
chainPreNet hg19.panTro3.rbest.chain.gz \
   /hive/data/genomes/{hg19,panTro3}/chrom.sizes stdout \
| chainNet -minSpace=1 -minScore=0 \
   stdin /hive/data/genomes/{hg19,panTro3}/chrom.sizes stdout /dev/null \
| netSyntenic stdin stdout \
| gzip -c > hg19.panTro3.rbest.net.gz

# Clean up the one temp file and make md5sum:
rm panTro3.hg19.tBest.chain
md5sum *.rbest.*.gz > md5sum.rbest.txt

# Create files for testing coverage of *.rbest.*.
netToBed -maxGap=1 panTro3.hg19.rbest.net.gz panTro3.hg19.rbest.net.bed
netToBed -maxGap=1 hg19.panTro3.rbest.net.gz hg19.panTro3.rbest.net.bed
chainToPsl panTro3.hg19.rbest.chain.gz \
   /hive/data/genomes/{panTro3,hg19}/chrom.sizes \
   /hive/data/genomes/panTro3/panTro3.2bit /hive/data/genomes/hg19/hg19.2bit \
   panTro3.hg19.rbest.chain.psl
chainToPsl hg19.panTro3.rbest.chain.gz \
   /hive/data/genomes/{hg19,panTro3}/chrom.sizes \
   /hive/data/genomes/hg19/hg19.2bit /hive/data/genomes/panTro3/panTro3.2bit \
   hg19.panTro3.rbest.chain.psl

# Verify that all coverage figures are equal:
set tChCov = `awk '{print $19;}' hg19.panTro3.rbest.chain.psl | sed -e 
's/,/\n/g' | awk 'BEGIN {N = 0;} {N += $1;} END {printf
"%d\n", N;}'`
set qChCov = `awk '{print $19;}' panTro3.hg19.rbest.chain.psl | sed -e 
's/,/\n/g' | awk 'BEGIN {N = 0;} {N += $1;} END {printf
"%d\n", N;}'`
set tNetCov = `awk 'BEGIN {N = 0;} {N += ($3 - $2);} END {printf "%d\n", N;}' 
hg19.panTro3.rbest.net.bed`
set qNetCov = `awk 'BEGIN {N = 0;} {N += ($3 - $2);} END {printf "%d\n", N;}' 
panTro3.hg19.rbest.net.bed`
if ($tChCov != $qChCov) then
   echo "Warning: hg19 rbest chain coverage $tChCov != panTro3 $qChCov"
endif
if ($tNetCov != $qNetCov) then
   echo "Warning: hg19 rbest net coverage $tNetCov != panTro3 $qNetCov"
endif
if ($tChCov != $tNetCov) then
   echo "Warning: hg19 rbest chain coverage $tChCov != net cov $tNetCov"
endif

mkdir experiments
mv *.bed *.psl experiments
# Make rbest net axt's download: one .axt per hg19 seq.
netSplit hg19.panTro3.rbest.net.gz rBestNet
chainSplit rBestChain hg19.panTro3.rbest.chain.gz
cd ..
mkdir axtRBestNet
foreach f (axtChain/rBestNet/*.net)
     netToAxt $f axtChain/rBestChain/$f:t:r.chain \
     /hive/data/genomes/hg19/hg19.2bit /hive/data/genomes/panTro3/panTro3.2bit 
stdout \
     | axtSort stdin stdout \
     | gzip -c > axtRBestNet/$f:t:r.hg19.panTro3.net.axt.gz
   end

# Make rbest mafNet for multiz: one .maf per hg19 seq.
mkdir mafRBestNet
foreach f (axtRBestNet/*.hg19.panTro3.net.axt.gz)
     axtToMaf -tPrefix=hg19. -qPrefix=panTro3. $f \
         /hive/data/genomes/{hg19,panTro3}/chrom.sizes \
             stdout \
       | gzip -c > mafRBestNet/$f:t:r:r:r:r:r.maf.gz
end


_______________________________________________
Genome maillist  -  [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome

Re: [Genome] Locating orthologous regions (not just genes) between human and chimp

Reply via email to