Hi Nimrod,

The snp130 table contains dbSNP's annotations on each SNP's predicted 
functional role (in the 'func' field), which includes whether the SNP is 
coding-synonymous, coding-nonsynonymous, in a 5' or 3' UTR, in an 
intron, just near a gene, etc.  (See the SNPs (130) track description for a 
full list.)  dbSNP uses RefSeq Genes to make these predictions.
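
As a rough illustration (not code from the Genome Browser), here is a small 
Python sketch that turns a comma-separated 'func' value into the regions you 
listed.  The term names in the dictionary are my assumption of how the values 
are spelled in snp130; please check them against the track description before 
relying on them.  The function and dictionary names are made up for this 
example.

```python
# Assumed spellings of snp130 'func' terms; verify against the track description.
REGION_BY_FUNC = {
    "untranslated-5": "5' UTR",
    "untranslated-3": "3' UTR",
    "intron": "intron",
    "coding-synon": "coding (synonymous)",
    "missense": "coding (nonsynonymous)",
    "nonsense": "coding (nonsynonymous)",
    "frameshift": "coding (nonsynonymous)",
    "near-gene-5": "near gene",
    "near-gene-3": "near gene",
}


def regions_from_func(func_field):
    """Map a comma-separated func value to a sorted list of region labels."""
    terms = [t.strip() for t in func_field.split(",") if t.strip()]
    # Terms missing from the (assumed) dictionary are reported rather than dropped.
    regions = {REGION_BY_FUNC.get(t, "other (" + t + ")") for t in terms}
    return sorted(regions)


# A SNP can carry several terms when it overlaps more than one transcript.
print(regions_from_func("intron,untranslated-5"))
```

A SNP annotated against more than one transcript can yield more than one 
region, which is why the function returns a list.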

For determining the amino acid changes, I am happy to report that there 
is a somewhat new table in the hg18 database that already has the exact 
information you are looking to extract: snp130CodingDbSnp.

This table is what the Genome Browser uses to display coding changes 
when you click on a SNP and look at the details page.  For instance, if 
you click on rs17852585 in the Genome Browser and scroll down, you will see:

Coding annotations by dbSNP:
NM_000808: missense L (CTC) --> P (CCC)
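
If you pull rows from snp130CodingDbSnp yourself, a line like the one above 
can be rebuilt roughly as in the Python sketch below.  The column names 
(transcript, funcCodes, codons, peptides), the comma-separated list format, 
and the numeric function codes are assumptions on my part, so please compare 
them with the table's schema page first.  The sample row is hypothetical, 
built from the rs17852585 example rather than copied from the table.

```python
# Assumed dbSNP function codes used in funcCodes; verify before relying on them.
FUNC_CODE_NAMES = {3: "coding-synon", 41: "nonsense", 42: "missense", 44: "frameshift"}
REFERENCE_CODE = 8  # assumed code marking the reference ("cds-reference") allele


def split_list(field):
    """Split a comma-separated list field, ignoring one trailing comma."""
    if field.endswith(","):
        field = field[:-1]
    return field.split(",") if field else []


def describe_coding_row(row):
    """Return one summary line per non-reference allele in a table row (dict)."""
    codes = [int(c) for c in split_list(row["funcCodes"])]
    codons = split_list(row["codons"])
    peptides = split_list(row["peptides"])
    if REFERENCE_CODE not in codes or not (len(codes) == len(codons) == len(peptides)):
        return []  # cannot pair the alleles with a reference codon
    ref = codes.index(REFERENCE_CODE)
    lines = []
    for i, code in enumerate(codes):
        if i == ref:
            continue
        effect = FUNC_CODE_NAMES.get(code, "code %d" % code)
        lines.append("%s: %s %s (%s) --> %s (%s)" % (
            row["transcript"], effect,
            peptides[ref], codons[ref], peptides[i], codons[i]))
    return lines


# Hypothetical row based on the rs17852585 example (not a real table dump).
example = {"transcript": "NM_000808", "funcCodes": "8,42,",
           "codons": "CTC,CCC,", "peptides": "L,P,"}
for line in describe_coding_row(example):
    print(line)
```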

(Note that you can also see predicted coding changes for *any* gene or 
gene prediction track by clicking "Go to SNPs (130) track controls" and 
making selections in the "On details page, show function and coding 
differences relative to..." boxes.  This information is not stored in 
any table -- it is generated on the fly when you click on a SNP.)

I think that between the snp130 table and the snp130CodingDbSnp table, 
you should be able to find what you are looking for.  If you have any 
further questions, please feel free to write back to 
[email protected].  And thank you for searching the mailing list 
archives before asking your question!

--
Brooke Rhead
UCSC Genome Bioinformatics Group

On 06/11/10 05:40, nimrod rubinstein wrote:
> hi,
> 
> i have a list of SNPs and their locations on hg18. i'd like to use ucsc data
> to find out for each SNP whether it falls in a known gene and if so in which
> of the following regions: 5'utr/coding sequence/intron/3'utr. if it does
> fall inside the coding sequence i would additionally like to know whether it
> is a synonymous SNP or not, and if not what is the resulting amino acid change.
> 
> i read through the mailing archives and understood it's best to use refGene
> and refMrna for this task:
> for a given SNP coordinate i first check whether it falls inside any
> of refGene's transcription boundaries. if it does, i then determine in which
> region of the gene. if it falls inside one of the coding exons i then
> extract the relevant codon from refMrna - and here's where i'm stuck:
> 
> according to the coordinates in refGene i might determine that the SNP is in
> e.g., the 5'utr but according to the coordinates in the CDS file it may turn
> out that it's actually in the coding sequence, and the other way around (plus
> other similar combinations of that problem concerning the 3'utr and intron
> regions).
> 
> i understand that the genomic coordinates in refGene are the result of BLAT
> and those in the CDS file are local coordinates from NCBI. since the mapping
> of NCBI mRNAs to the genome is imperfect these location discrepancies occur.
> 
> so, if my description is correct is there any solution to my problem?
> if i understood or am doing something wrong i would greatly appreciate your
> corrections.
> 
> thank you very much for your time and help
> Nimrod Rubinstein
> The Department of Cell Research and Immunology
> Tel Aviv University
> _______________________________________________
> Genome maillist  -  [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome