Hello, Michael. We recommend reviewing the format pages for the chain and net data types:
Chain: http://genome.ucsc.edu/goldenPath/help/chain.html
Net: http://genome.ucsc.edu/goldenPath/help/net.html

I also suggest reading through the description page of any Chain/Net track. The mm9/hg19 page is here: http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=mm9&g=chainNetHg19. You may also want to read the article "Evolution's cauldron: Duplication, deletion, and rearrangement in the mouse and human genomes" which is present in the References section of any Chain/Net track (http://www.pnas.org/content/100/20/11484.full).

Please contact us again at [email protected] if you have any further questions.

---
Steve Heitner
UCSC Genome Bioinformatics Group

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Michael Yu
Sent: Wednesday, June 20, 2012 1:07 AM
To: [email protected]
Subject: [Genome] Inconsistency in chains between "net" files and "chain" files

Dear Genome Browser Wizard,

I have been comparing the description of chains in *.net and *.chain files. In particular, I compared the chains in the following files:

hg18-to-mm9 net file: http://hgdownload.cse.ucsc.edu/goldenPath/hg18/vsMm9/hg18.mm9.net.gz
hg18-to-mm9 all.chain file: http://hgdownload.cse.ucsc.edu/goldenPath/hg18/vsMm9/hg18.mm9.all.chain.gz
hg18-to-mm9 liftOver over.chain file: http://hgdownload.cse.ucsc.edu/goldenPath/hg18/liftOver/hg18ToMm9.over.chain.gz

I noticed that the over.chain and all.chain files are consistent in their chain descriptions (the chromosome, start position, and length in the query genome, i.e. mm9), but both of those files are inconsistent with the description in the net file.
I reached this conclusion after extracting chain descriptions from every file using the commands

    sed -e 's/^[[:space:]]*//' < hg18.mm9.net | grep fill | cut -d ' ' -f4,6,7,9 | sort -k4n > net_chains
    grep chain hg18.mm9.all.chain | cut -d ' ' -f8,11-13 | sort -k4n | awk '{print $1, $2, $3-$2, $4}' > all.chain_chains
    grep chain hg18ToMm9.over.chain | cut -d ' ' -f8,11-13 | sort -k4n | awk '{print $1, $2, $3-$2, $4}' > over.chain_chains

These commands generate 4-column files where each row describes a chain and is of the format "<chromosome> <start_pos> <length> <chain_id>". The rows are also sorted by chain id in ascending order.

Some chains, such as the chain with id 1, have the same descriptions across all three files, but many chains do not. For example, the chain with id 6 has start position 86799633 according to the net file, but it has start position 11 according to the all.chain and over.chain files. Interestingly, all three files are consistent in saying that this chain is in chromosome 3 and has length 72800139. More generally, it appears that the start positions are inconsistent for many chains, but the chromosome and length are consistent for all chains.

My understanding is that the net, all.chain, and over.chain files were generated using the same original chains and assignment of chain ids, so I am confused about why there is this inconsistency. Please let me know if I am not understanding something correctly or my analysis is flawed.

Thank you!

Best,
Michael

_______________________________________________
Genome maillist - [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
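
One likely source of the start-position mismatch, going by the chain format page linked above: when the query strand (qStrand) in a chain header is "-", the query start and end are given on the reverse-complemented sequence, whereas the net file appears to report forward-strand positions. If that is the cause, the forward-strand start is qSize - qEnd. The sketch below is only an illustration under that assumption, not UCSC tooling: the function name chain_query_plus is hypothetical, and the sample header line is constructed from the numbers in the message, with made-up score and target fields and an assumed mm9 chr3 size of 159599783 that should be checked against the assembly's chromosome sizes.

```shell
# Sketch: print "<chromosome> <start_pos> <length> <chain_id>" for every chain
# header, converting minus-strand query starts to forward-strand coordinates.
# Header fields: chain score tName tSize tStrand tStart tEnd
#                qName qSize qStrand qStart qEnd id
chain_query_plus() {
  awk '$1 == "chain" {
    start = $11                            # qStart as written in the file
    if ($10 == "-") start = $9 - $12       # assumed conversion: qSize - qEnd
    print $8, start, $12 - $11, $13        # qName, start, length, id
  }' "$@"
}

# Hypothetical header line: only qStart (11), the length (72800139, so
# qEnd = 72800150) and the id (6) come from the message; the rest is assumed.
printf 'chain 1000 chr1 247249719 + 0 100 chr3 159599783 - 11 72800150 6\n' |
  chain_query_plus
```

With the assumed chromosome size, the converted start for this line is 86799633, the value the net file reports for chain 6. Running chain_query_plus on hg18.mm9.all.chain and sorting with sort -k4n would give a file that can be compared directly against net_chains.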
