Hi Sean,

The short answer is that we show the allele frequencies that were in dbSNP's 
b132 release download files, and those files don't contain as many counts as 
shown on the web page now.  I suspect that dbSNP updates data on their web 
pages more frequently than the release schedule, but of course it's best to ask 
them about that.  

Our process is to download dbSNP's database dump files by FTP after they announce a 
release (ftp://ftp.ncbi.nlm.nih.gov/snp/database/organism_data/human_9606/), 
and convert them into our local format.  Allele frequencies and counts come 
directly from dbSNP's SNPAlleleFreq table.  For snp132, we downloaded the table 
in Nov. 2010; at that time, there were almost no allele counts for 1000 
Genomes, so I asked dbSNP about it and they realized that their frequency 
tables had not been updated in some time.  At the end of Dec. 2010, they 
regenerated a few tables including SNPAlleleFreq.  I downloaded SNPAlleleFreq 
again in early Jan. 2011, and built our snp132 track, so our allele frequencies 
are from Dec. 2010.  This is what dbSNP's Dec. 2010 SNPAlleleFreq table has for 
rs538 and rs222:

mysql> select * from SNPAlleleFreq where snp_id = 538;
+--------+-----------+---------+-----------+---------------------+
| snp_id | allele_id | chr_cnt | freq      | last_updated_time   |
+--------+-----------+---------+-----------+---------------------+
|    538 |         2 |       1 | 0.0526316 | 2010-12-26 18:59:47 | 
|    538 |         4 |       4 |  0.210526 | 2010-12-26 18:59:47 | 
|    538 |         7 |      14 |  0.736842 | 2010-12-26 18:59:47 | 
+--------+-----------+---------+-----------+---------------------+

mysql> select * from SNPAlleleFreq where snp_id = 222;
+--------+-----------+---------+----------+---------------------+
| snp_id | allele_id | chr_cnt | freq     | last_updated_time   |
+--------+-----------+---------+----------+---------------------+
|    222 |         4 |    1823 | 0.714342 | 2010-12-26 18:59:47 | 
|    222 |         7 |     729 | 0.285658 | 2010-12-26 18:59:47 | 
+--------+-----------+---------+----------+---------------------+
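
As a quick sanity check on the rows above, here is a minimal Python sketch that 
sums chr_cnt per SNP and recomputes each frequency as chr_cnt divided by that 
sum.  That relationship is my reading of the numbers, not something dbSNP 
documents in this thread, and the ROWS list and summarize() helper are just 
illustrative names.  The totals it prints (19 for rs538, 2552 for rs222) are 
the same totals you see in the Genome Browser.

```python
# Rows copied by hand from the Dec. 2010 SNPAlleleFreq output above:
# (snp_id, allele_id, chr_cnt, freq)
ROWS = [
    (538, 2, 1, 0.0526316),
    (538, 4, 4, 0.210526),
    (538, 7, 14, 0.736842),
    (222, 4, 1823, 0.714342),
    (222, 7, 729, 0.285658),
]


def summarize(rows):
    """Return {snp_id: (total_chr_cnt, [(allele_id, recomputed_freq, stored_freq), ...])}.

    Assumption (not confirmed by dbSNP here): freq == chr_cnt / sum(chr_cnt)
    over all alleles of the same SNP.
    """
    # First pass: total chromosome count per SNP.
    totals = {}
    for snp_id, _allele_id, chr_cnt, _freq in rows:
        totals[snp_id] = totals.get(snp_id, 0) + chr_cnt

    # Second pass: recompute each allele's frequency from the counts.
    summary = {}
    for snp_id, allele_id, chr_cnt, freq in rows:
        entry = summary.setdefault(snp_id, (totals[snp_id], []))
        entry[1].append((allele_id, chr_cnt / totals[snp_id], freq))
    return summary


if __name__ == "__main__":
    for snp_id, (total, alleles) in summarize(ROWS).items():
        print(f"rs{snp_id}: total chr_cnt = {total}")
        for allele_id, recomputed, stored in alleles:
            print(f"  allele_id {allele_id}: recomputed {recomputed:.6f}, stored {stored}")
```

If the recomputed and stored values agree, the frequencies we show are simply 
the Dec. 2010 counts normalized per SNP, so any extra samples on dbSNP's web 
page would have been added after that table was generated.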

We are eagerly anticipating dbSNP's next release of human SNPs, which we hope 
will happen this fall (or perhaps even sooner).  dbSNP's summary page 
http://www.ncbi.nlm.nih.gov/projects/SNP/snp_summary.cgi indicates that since 
132 there have been 46M new submissions for human, 24M of which include 
genotypes and/or allele frequencies.  

Hope that helps, and please email us at genome@soe.ucsc.edu if you have any 
more questions,
Angie


----- Original Message -----
From: "Xiang Li" <x...@ambrygen.com>
To: genome@soe.ucsc.edu
Sent: Sunday, August 7, 2011 9:50:30 AM
Subject: Re: [Genome] allele_copy_num inconsistent with NCBI dbSNP record for rs538

The same case for rs222. There are 3200 allele samples based on
http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=222,

But there are only 2552 samples based on genome browser (T,C,
1823.000000,729.000000)

Did NCBI update their dbSNP database just recently? 

Sean

-----Original Message-----
From: genome-boun...@soe.ucsc.edu [mailto:genome-boun...@soe.ucsc.edu]
On Behalf Of Xiang Li
Sent: Sunday, August 07, 2011 8:15 AM
To: genome@soe.ucsc.edu
Subject: [Genome] allele_copy_num inconsistent with NCBI dbSNP record for rs538

Hi, Dear support,

 

Could you please help me understand why the allele_copy_num is different
from NCBI dbSNP record for rs538. 

 

Based on http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=538,
there are hundreds of samples in HapMap pilot studies.  

 

However, UCSC genome browser shows there are only 19 allele copy numbers
there:

------------------------------------------------------------------------

alleles:            G,T,C         allele_copy_num: 1.000000,4.000000,14.000000

------------------------------------------------------------------------

 

If I don't include the samples in HapMap pilot studies, the number would
match exactly.  This is just one of many examples.   Can you please help
me understand what rules were used by you to derive those numbers, such
as why not include HapMap pilot studies?

 

Thanks.

 

Best

 

Sean

 

_______________________________________________
Genome maillist  -  Genome@soe.ucsc.edu
https://lists.soe.ucsc.edu/mailman/listinfo/genome
