Hi Sean,

The short answer is that we show the allele frequencies that were in dbSNP's b132 release download files, and those files contain lower counts than the web page shows now. I suspect that dbSNP updates the data on its web pages more often than it publishes releases, but of course it's best to ask them about that.
Our process involves downloading dbSNP's database dump files by FTP after they announce a release (ftp://ftp.ncbi.nlm.nih.gov/snp/database/organism_data/human_9606/) and processing those into our local format. Allele frequencies and counts come directly from dbSNP's SNPAlleleFreq table.

For snp132, we downloaded the table in Nov. 2010. At that time there were almost no allele counts for 1000 Genomes, so I asked dbSNP about it, and they realized that their frequency tables had not been updated in some time. At the end of Dec. 2010, they regenerated a few tables, including SNPAlleleFreq. I downloaded SNPAlleleFreq again in early Jan. 2011 and built our snp132 track, so our allele frequencies are from Dec. 2010.

This is what dbSNP's Dec. 2010 SNPAlleleFreq table has for rs538 and rs222:

mysql> select * from SNPAlleleFreq where snp_id = 538;
+--------+-----------+---------+-----------+---------------------+
| snp_id | allele_id | chr_cnt | freq      | last_updated_time   |
+--------+-----------+---------+-----------+---------------------+
|    538 |         2 |       1 | 0.0526316 | 2010-12-26 18:59:47 |
|    538 |         4 |       4 |  0.210526 | 2010-12-26 18:59:47 |
|    538 |         7 |      14 |  0.736842 | 2010-12-26 18:59:47 |
+--------+-----------+---------+-----------+---------------------+

mysql> select * from SNPAlleleFreq where snp_id = 222;
+--------+-----------+---------+----------+---------------------+
| snp_id | allele_id | chr_cnt | freq     | last_updated_time   |
+--------+-----------+---------+----------+---------------------+
|    222 |         4 |    1823 | 0.714342 | 2010-12-26 18:59:47 |
|    222 |         7 |     729 | 0.285658 | 2010-12-26 18:59:47 |
+--------+-----------+---------+----------+---------------------+

We are eagerly anticipating dbSNP's next release of human SNPs, which hopefully will happen this fall (and perhaps even sooner).
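As a quick sanity check on the rows above, here is a minimal Python sketch that sums chr_cnt per SNP and compares each count's share of that total with the freq column. The rows are copied from the Dec. 2010 output; the function names are made up for illustration and are not part of our pipeline or dbSNP's tools. That freq is simply chr_cnt divided by the SNP's total is an inference from these rows, not something dbSNP has confirmed here.

```python
# Rows copied from the Dec. 2010 SNPAlleleFreq output quoted in this email:
# snp_id -> list of (allele_id, chr_cnt, freq)
ROWS = {
    538: [(2, 1, 0.0526316), (4, 4, 0.210526), (7, 14, 0.736842)],
    222: [(4, 1823, 0.714342), (7, 729, 0.285658)],
}


def total_chr_cnt(snp_id):
    """Sum of chr_cnt over all alleles of one SNP (hypothetical helper)."""
    return sum(cnt for _, cnt, _ in ROWS[snp_id])


def max_freq_error(snp_id):
    """Largest gap between chr_cnt / total and the stored freq value.

    Assumes freq is each allele's share of the SNP's total chr_cnt.
    """
    total = total_chr_cnt(snp_id)
    return max(abs(cnt / total - freq) for _, cnt, freq in ROWS[snp_id])


if __name__ == "__main__":
    for snp_id in sorted(ROWS):
        print(snp_id, total_chr_cnt(snp_id), max_freq_error(snp_id))
```

The totals come out to 19 for rs538 and 2552 for rs222, which are the sums of the allele_copy_num values the browser displays, and the stored frequencies agree with the recomputed ones to within rounding.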
dbSNP's summary page (http://www.ncbi.nlm.nih.gov/projects/SNP/snp_summary.cgi) indicates that since 132 there have been 46M new submissions for human, 24M of which include genotypes and/or allele frequencies.

Hope that helps, and please email us at genome@soe.ucsc.edu if you have any more questions,

Angie

----- Original Message -----
From: "Xiang Li" <x...@ambrygen.com>
To: genome@soe.ucsc.edu
Sent: Sunday, August 7, 2011 9:50:30 AM
Subject: Re: [Genome] allele_copy_num inconsistent with NCBI dbSNP record for rs538

The same case for rs222. There are 3200 allele samples based on http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=222, but there are only 2552 samples based on the genome browser (T,C, 1823.000000,729.000000).

Did NCBI update their dbSNP database just recently?

Sean

-----Original Message-----
From: genome-boun...@soe.ucsc.edu [mailto:genome-boun...@soe.ucsc.edu] On Behalf Of Xiang Li
Sent: Sunday, August 07, 2011 8:15 AM
To: genome@soe.ucsc.edu
Subject: [Genome] allele_copy_num inconsistent with NCBI dbSNP record for rs538

Hi, Dear support,

Could you please help me understand why the allele_copy_num is different from the NCBI dbSNP record for rs538? Based on http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=538, there are hundreds of samples in HapMap pilot studies. However, UCSC genome browser shows there are only 19 allele copy numbers there:

------------------------------------------------------------------------
alleles: G,T,C
allele_copy_num: 1.000000,4.000000,14.000000
------------------------------------------------------------------------

If I don't include the samples in HapMap pilot studies, the number would match exactly. This is just one of many examples.

Can you please help me understand what rules were used by you to derive those numbers, such as why not include HapMap pilot studies?

Thanks.
Best
Sean

_______________________________________________
Genome maillist - Genome@soe.ucsc.edu
https://lists.soe.ucsc.edu/mailman/listinfo/genome