Good Afternoon:
The 162,791,785 figure counts every byte of the uncompressed chr3.fa.gz file,
including the line breaks and the ">chr3" header line, not just the sequence.
The sequence length is the same in the FASTA files and in the Genome Browser:
159,599,783 bases.
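A minimal sketch of the difference, using a tiny hypothetical FASTA file (toy.fa, with a made-up header chrT and wrapped at 5 bases per line) rather than the real chr3.fa.gz. It compares the raw byte count from wc with a count of sequence characters only:

```shell
# Build a tiny hypothetical FASTA file: one header line and 12 bases
# wrapped at 5 bases per line.
printf '>chrT\nACGTA\nCGTAC\nGT\n' > toy.fa

# wc -c counts every byte: the header, the bases and the newlines.
wc -c < toy.fa                                # 21

# Drop the header line and delete the newlines to count bases only.
grep -v '^>' toy.fa | tr -d '\n' | wc -c      # 12
```

Assuming chr3.fa.gz holds the single >chr3 record shown here, the same pipeline on the real file, zcat chr3.fa.gz | grep -v '^>' | tr -d '\n' | wc -c, should report 159599783, matching faSize.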
$ faSize chr3.fa.gz
159599783 bases (3205871 N's 156393912 real 86095927 upper 70297985 lower) in 1 sequences in 1 files
%44.05 masked total, %44.95 masked real
$ zcat chr3.fa.gz | wc
3191997 3191997 162791785
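The wc byte count can be reconciled with faSize by simple arithmetic. This assumes the only header is the 5-character name >chr3 and that every line, including the last, ends in a newline, so the 3191997 lines reported by wc mean 3191997 newline characters:

```shell
# sequence bases (faSize) + one newline per line (wc) + the 5 characters ">chr3"
echo $((159599783 + 3191997 + 5))    # 162791785, the wc byte count
```

Without the header term the sum is 162791780, exactly the chromosome 3 length given in the original question below, which suggests that figure still included the line breaks.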
$ mysql -ugenome -hgenome-mysql.cse.ucsc.edu -e 'select * from chromInfo where chrom="chr3";' mm9
+-------+-----------+--------------------+
| chrom | size      | fileName           |
+-------+-----------+--------------------+
| chr3  | 159599783 | /gbdb/mm9/mm9.2bit |
+-------+-----------+--------------------+
--Hiram
----- Original Message -----
From: "Wei Shi" <[email protected]>
To: [email protected]
Cc: "Moshe Olshansky" <[email protected]>
Sent: Thursday, November 3, 2011 3:43:00 PM
Subject: [Genome] chromosome length in UCSC genome browser
Dear UCSC Genome Browser developers,
We have found that the lengths of mouse chromosomes (mm9) displayed in the
Genome Browser are shorter than their actual lengths obtained from the FASTA
chromosome sequence data. I used the chromosome data downloaded from
http://hgdownload.cse.ucsc.edu/goldenPath/mm9/chromosomes/ to get the actual
chromosomal length.
For example, chromosome 3 has a length of 159,599,783 bases in the Genome
Browser, but we found its actual length to be 162,791,780 bases from the
sequence data. This caused a problem when we looked at the mapping locations
of next-generation sequencing reads that were mapped to the mouse reference
genome: our sequencing reads appeared shifted by ~2 million bases in the
Genome Browser. We used mm9 in the Genome Browser, and the sequencing reads
were also mapped to mm9.
Could you please let us know why the chromosome lengths differ, or whether
we are using the Genome Browser in the wrong way?
Thanks,
------------------
Wei Shi, Ph.D
Bioinformatics Division
The Walter and Eliza Hall Institute of Medical Research
1G Royal Parade, Parkville, VIC 3052
Australia
_______________________________________________
Genome maillist - [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome