Good Afternoon:

You are counting the bytes in the uncompressed chr3.fa.gz file,
which include the line breaks and the header line, rather than
the bases of the actual sequence. The sequence length is correct
both in the FASTA files and in the Genome Browser.

    $ faSize chr3.fa.gz 
    159599783 bases (3205871 N's 156393912 real 86095927 upper 70297985 lower) in 1 sequences in 1 files
    %44.05 masked total, %44.95 masked real

    $ zcat chr3.fa.gz | wc
    3191997 3191997 162791785

    $ mysql -ugenome -hgenome-mysql.cse.ucsc.edu -e 'select * from chromInfo where chrom="chr3";' mm9
    +-------+-----------+--------------------+
    | chrom | size      | fileName           |
    +-------+-----------+--------------------+
    | chr3  | 159599783 | /gbdb/mm9/mm9.2bit |
    +-------+-----------+--------------------+
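
If you want to reproduce the base count without faSize, here is a
minimal Python sketch. It is only an illustration, not how faSize is
implemented: the function name fasta_stats is hypothetical, and it
assumes header lines start with ">" and that every other line holds
only sequence characters plus a line break.

```python
import gzip


def fasta_stats(path):
    """Return (bases, total_bytes) for a plain or gzip-compressed FASTA file.

    bases       counts sequence characters only (no headers, no line breaks)
    total_bytes counts every uncompressed byte, like `zcat file | wc -c`
    """
    # Assumption: a ".gz" suffix means the file is gzip-compressed.
    opener = gzip.open if path.endswith(".gz") else open
    bases = 0
    total_bytes = 0
    with opener(path, "rb") as handle:
        for line in handle:
            total_bytes += len(line)
            if line.startswith(b">"):
                continue  # header line, not part of the sequence
            bases += len(line.rstrip(b"\r\n"))  # drop the line break
    return bases, total_bytes
```

Calling fasta_stats("chr3.fa.gz") should give a base count that agrees
with faSize and a byte count that agrees with wc. The wc numbers above
are consistent with this: assuming the header line is ">chr3",
162791785 bytes minus 3191997 line breaks minus 5 header characters
is 159599783.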

--Hiram

----- Original Message -----
From: "Wei Shi" <[email protected]>
To: [email protected]
Cc: "Moshe Olshansky" <[email protected]>
Sent: Thursday, November 3, 2011 3:43:00 PM
Subject: [Genome] chromosome length in UCSC genome browser

Dear UCSC Genome Browser developers,

We have found that the lengths of mouse chromosomes (mm9) displayed in the 
Genome Browser are shorter than their actual lengths obtained from the FASTA 
chromosome sequence data. I used the chromosome data downloaded from 
http://hgdownload.cse.ucsc.edu/goldenPath/mm9/chromosomes/ to get the actual 
chromosomal length.

For example, chromosome 3 has a length of 159,599,783 bases in the Genome 
Browser; however, the actual length of chromosome 3 is found to be 162,791,780 
bases from the sequence data. This caused a problem when we looked at mapping 
locations of next-generation sequencing reads which were mapped to the mouse 
reference genome. We observed our sequencing reads were shifted by ~2 million 
bases in the Genome Browser. We used mm9 in the Genome Browser and the 
sequencing reads were also mapping to mm9.

Could you please let us know why the chromosomal lengths are different, or are 
we using the Genome Browser in the wrong way?

Thanks,
------------------
Wei Shi, Ph.D
Bioinformatics Division
The Walter and Eliza Hall Institute of Medical Research
1G Royal Parade, Parkville, VIC 3052
Australia

_______________________________________________
Genome maillist  -  [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
