Hello, Michael.

We recommend reviewing the format pages for the chain and net data types:

Chain: http://genome.ucsc.edu/goldenPath/help/chain.html

Net: http://genome.ucsc.edu/goldenPath/help/net.html

I also suggest reading through the description page of any Chain/Net track.
The mm9/hg19 page is here:
http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=mm9&g=chainNetHg19
You may also want to read the article "Evolution's cauldron: Duplication,
deletion, and rearrangement in the mouse and human genomes", which is listed
in the References section of any Chain/Net track
(http://www.pnas.org/content/100/20/11484.full).
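
One detail from the chain format page that may bear on the chain 6 example
in the message below: when a chain's query strand is "-", its query start
and end are given on the reverse-complemented sequence, whereas net fill
lines, as I understand the net format, report the query start on the +
strand. The following is only a minimal sketch, not a UCSC tool. The helper
name chain_query_plus is hypothetical, and the sample header assumes that
chain 6 is on the "-" strand and that mm9 chr3 is 159599783 bases long
(please check both against the actual header in your chain file).

```shell
# Hypothetical helper (not part of any UCSC tool): print
# "<qName> <qStart on + strand> <length> <id>" for each chain header line.
# Chain header fields:
#   chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id
chain_query_plus() {
  awk '$1 == "chain" {
         qStart = $11
         if ($10 == "-") qStart = $9 - $12   # flip to + strand: qSize - qEnd
         print $8, qStart, $12 - $11, $13
       }' "$@"
}

# Illustrative header only: the score and target fields are made up, and the
# query size and strand are the assumptions stated above.
printf 'chain 1000 chr1 247249719 + 100 200 chr3 159599783 - 11 72800150 6\n' |
  chain_query_plus
# prints: chr3 86799633 72800139 6
```

Under those assumptions, the start position 11 from the chain files and the
start position 86799633 from the net file describe the same interval, seen
from opposite strands.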

Please contact us again at [email protected] if you have any further
questions.

---
Steve Heitner
UCSC Genome Bioinformatics Group

-----Original Message-----
From: [email protected] [mailto:[email protected]] On
Behalf Of Michael Yu
Sent: Wednesday, June 20, 2012 1:07 AM
To: [email protected]
Subject: [Genome] Inconsistency in chains between "net" files and "chain"
files

Dear Genome Browser Wizard,

I have been comparing the description of chains in *.net and *.chain files.
In particular, I compared the chains in the following files:

hg18-to-mm9 net file:
http://hgdownload.cse.ucsc.edu/goldenPath/hg18/vsMm9/hg18.mm9.net.gz
hg18-to-mm9 all.chain file:
http://hgdownload.cse.ucsc.edu/goldenPath/hg18/vsMm9/hg18.mm9.all.chain.gz
hg18-to-mm9 liftOver over.chain file:
http://hgdownload.cse.ucsc.edu/goldenPath/hg18/liftOver/hg18ToMm9.over.chain.gz

I noticed that the over.chain and all.chain files are consistent in their
chain descriptions (the chromosome, start position, and length in the query
genome, i.e. mm9), but both of those files are inconsistent with the
description in the net file.  I reached this conclusion after extracting
chain descriptions from each file using the commands:

sed -e 's/^[[:space:]]*//' < hg18.mm9.net | grep fill \
  | cut -d ' ' -f4,6,7,9 | sort -k4n > net_chains
grep chain hg18.mm9.all.chain | cut -d ' ' -f8,11-13 | sort -k4n \
  | awk '{print $1, $2, $3-$2, $4}' > all.chain_chains
grep chain hg18ToMm9.over.chain | cut -d ' ' -f8,11-13 | sort -k4n \
  | awk '{print $1, $2, $3-$2, $4}' > over.chain_chains

These commands generate 4-column files where each row describes a chain and
is of the format "<chromosome> <start_pos> <length> <chain_id>".   The rows
are also sorted by chain id in ascending order.  Some chains, such as the
chain with id 1, have the same descriptions across all three files, but many
chains do not.  For example, the chain with id 6 has start position 86799633
according to the net file, but it has start position 11 according to the
all.chain and over.chain files.  Interestingly, all three files are
consistent in saying that this chain is in chromosome 3 and has length
72800139.  More generally, it appears that the start positions are
inconsistent for many chains, but the chromosome and length are consistent
for all chains.

My understanding is that the net, all.chain, and over.chain files were
generated from the same original chains and the same assignment of chain IDs,
so I am confused about why there is this inconsistency.  Please let me know
if I am misunderstanding something or if my analysis is flawed.  Thank you!

Best,
Michael
_______________________________________________
Genome maillist  -  [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
