Hi Hiram,

Thank you for your prompt reply. I noticed that I neglected to do the axtSort 
step in my previous attempt and I 
redid it with the complete sequence. Now the diff output between my version and 
UCSC version is a lot smaller 
than before but I'm still seeing differences. Take chrY for example, the first 
difference occurred in the following entry:

UCSC: 6014 chrY 28725727 28727469 chr17 1974742 1976484 + 133306
Mine:  6014 chrY 28725727 28728984 chr17 1974742 1977914 + 241557

Just to be sure that we are using the same version of the data/programs, I list 
the exact places where I got them 
as follows:

hg19.2bit: http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/
rheMac2.2bit: http://hgdownload.cse.ucsc.edu/gbdb/rheMac2/
hg19.rheMac2.net.gz: http://hgdownload.cse.ucsc.edu/goldenPath/hg19/vsRheMac2/
hg19.rheMac2.all.chain.gz: 
http://hgdownload.cse.ucsc.edu/goldenPath/hg19/vsRheMac2/

Also, I'm using the executable compiled by Jim Kent and distributed through his 
webpage: 
http://hgwdev.cse.ucsc.edu/~kent/exe/linux/

As well, I wrote up the following bash script to complete my procedures 
automatically:

#!/bin/bash
tDb=hg19
qDb=rheMac2
baseDir=~/bcm/data/genomes/$tDb.$qDb
synNetDir=$baseDir/synNet
chainDir=$baseDir/chain 
axtDir=$baseDir/axt
tNibDir=~/bcm/data/genomes/$tDb/$tDb.2bit
qNibDir=~/bcm/data/genomes/$qDb/$qDb.2bit

if [ -e $synNetDir ]
then
  echo "$synNetDir already exists"
else
  netFilter -syn $baseDir/$tDb.$qDb.net.gz | netSplit stdin $synNetDir
fi
if [ -e $chainDir ]
then
  echo "$chainDir already exists"
else
  chainSplit $chainDir $baseDir/$tDb.$qDb.all.chain.gz
fi

cd $synNetDir

if [ -e $axtDir ]
then
  echo "$axtDir already exists"
else
  mkdir $axtDir
fi

for f in *.net
do
  echo $f
  if [[ $f =~ .+hap.+ ]]
  then
    echo "haplotype ignored (pseudochromosomes)"
  else
    inChain=$chainDir/${f/".net"/".chain"}
    outAxt=$axtDir/${f/".net"/".$tDb.$qDb.synNet.axt"}
    if [ -e $outAxt ]
    then 
      echo "$outAxt already exists"
    else
      netToAxt $f $inChain $tNibDir $qNibDir stdout | axtSort stdin $outAxt
    fi
  fi
done


Thank you very much for your time in advance.

Best,
- Jia

________________________________________
From: Hiram Clawson [[email protected]]
Sent: Thursday, November 10, 2011 7:01 PM
To: Zeng, Jia
Cc: [email protected]
Subject: Re: [Genome] About generating syntenic net alignments

Good Afternoon Jia:

I tried this sequence of commands here to see if there is any change
in our programs.  I obtain the identical results that we delivered in 2009.

I ran this csh script:

#!/bin/csh -efx

cd /tmp/synNet/axtChain

# filter net for synteny and create syntenic net mafs
netFilter -syn hg19.rheMac2.net.gz  \
    | netSplit stdin synNet
chainSplit chain hg19.rheMac2.all.chain.gz
cd /tmp/synNet
mkdir axtSynNet
foreach f (axtChain/synNet/*.net)
  netToAxt $f axtChain/chain/$f:t:r.chain \
    /scratch/data/hg19/hg19.2bit /scratch/data/rheMac2/rheMac2.2bit stdout \
  | axtSort stdin stdout | gzip -c > axtSynNet/$f:t:r.hg19.rheMac2.synNet.axt.gz
end

Using the files you find on hgdownload:

-rw-rw-r-- 1 80508577 Nov 10 16:22 hg19.rheMac2.net.gz
-rw-rw-r-- 1 54624334 Nov 10 16:23 hg19.rheMac2.all.chain.gz

-rw-rw-r-- 1 816241703 Mar  8  2009 /scratch/data/hg19/hg19.2bit
-rw-rw-r-- 1 746913825 Jan 30  2006 /scratch/data/rheMac2/rheMac2.2bit

Resulting file is identical to the hgdownload version:
-rw-rw-r-- 1   4677891 Nov 10 16:47 chrY.hg19.rheMac2.synNet.axt.gz

Check your procedure script and source files to see if they match the
example here.

--Hiram

----- Original Message -----
From: "Jia Zeng" <[email protected]>
To: [email protected]
Sent: Thursday, November 10, 2011 3:22:40 PM
Subject: [Genome] About generating syntenic net alignments

Hi,

I need to obtain the syntenic net alignments between several mammalian genomes 
and I decided to generate them myself.
Given the information provided in the script doBlastzChainNet.pl (in particular 
the doSyntenicNet subroutine), I located the
following script:

# filter net for synteny and create syntenic net mafs
netFilter -syn $tDb.$qDb.net.gz  \\
    | netSplit stdin synNet
chainSplit chain $tDb.$qDb.all.chain.gz
cd ..
mkdir $successDir
foreach f (axtChain/synNet/*.net)
  netToAxt \$f axtChain/chain/\$f:t:r.chain \\
    $defVars{'SEQ1_DIR'} $defVars{'SEQ2_DIR'} stdout \\
  | axtSort stdin stdout \\
  | axtToMaf -tPrefix=$tDb. -qPrefix=$qDb. stdin \\
    $defVars{SEQ1_LEN} $defVars{SEQ2_LEN} \\
    stdout \\
| gzip -c > mafSynNet/\$f:t:r:r:r:r:r.maf.gz
end
rm -fr $runDir/synNet
rm -fr $runDir/chain

Since the $tDb.$qDb.net.gz, $tDb.$qDb.all.chain.gz files can be downloaded from 
the goldenPath directory of the download
repository, I figured that I can easily complete the generation procedure 
following the script provided. (I do not need to represent
the alignments in MAF form so I stopped at step netToAxt and wrote the output 
axt to chr?.$tDb.$qDb.synNet.axt, following UCSC's
naming convention).

In order to ensure that I have successfully generated the synNet.axt files, I 
downloaded the following file from UCSC website:
chrY.hg19.rheMac2.synNet.axt.gz and compared it with the one generated by me. A 
diff indicated some differences in the two files
for instance, in UCSC's version, such an entry exists which is absent in my 
file, though the entries before and after it are identical.
92 chrY 873749 873891 chr3 158314483 158314625 - 9481

It seems that my version has filtered out more entries than UCSC's version. I 
am just wondering whether you could kindly offer an
input to this? Thank you very much.

Best,
- Jia

_______________________________________________
Genome maillist  -  [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome

Reply via email to