hi,

i have a list of SNPs and their locations on hg18. i'd like to use ucsc data
to find out for each SNP whether it falls in a known gene and if so in which
of the following regions: 5'utr/coding sequence/intron/3'utr. if it does
fall inside the coding sequence i would additionally like to know whether it
is a synonymous SNP or not, and if not what is the resulting amino acid.
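
to make that last step concrete, here is a minimal python sketch of what i mean by checking whether a coding SNP is synonymous. it assumes the SNP has already been placed on the mRNA. the names (snp_effect, mrna_pos, cds_start, alt) are made up for illustration, positions are 0-based, the alternative allele is given on the mRNA strand, and the standard genetic code is used.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered over T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def snp_effect(mrna, cds_start, mrna_pos, alt):
    """Return (ref_aa, alt_aa, is_synonymous) for a SNP inside the CDS.

    mrna      -- mRNA sequence, sense strand (hypothetical input)
    cds_start -- 0-based offset of the first base of the start codon
    mrna_pos  -- 0-based position of the SNP within `mrna`
    alt       -- alternative allele, on the mRNA strand
    """
    seq = mrna.upper().replace("U", "T")
    offset = mrna_pos - cds_start
    if offset < 0:
        raise ValueError("SNP lies upstream of the CDS start")
    frame = offset % 3                      # position of the SNP inside its codon
    codon_start = mrna_pos - frame
    ref_codon = seq[codon_start:codon_start + 3]
    if len(ref_codon) != 3:
        raise ValueError("incomplete codon at the end of the sequence")
    alt_base = alt.upper().replace("U", "T")
    alt_codon = ref_codon[:frame] + alt_base + ref_codon[frame + 1:]
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    return ref_aa, alt_aa, ref_aa == alt_aa
```

the result depends entirely on cds_start being right for the sequence at hand, which is exactly the part i am unsure about below.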

i read through the mailing list archives and understood it's best to use refGene
and refMrna for this task:
for a given SNP coordinate i first check whether it falls inside any
of refGene's transcription boundaries. if it does, i then determine in which
region of the gene. if it falls inside one of the coding exons i then
extract the relevant codon from refMrna - and here's where i'm stuck:

according to the coordinates in refGene i might determine that the SNP is in
e.g., the 5'utr but according to the coordinates in the CDS file it may turn
out that it's actually in the coding sequence, and the other way around (plus
other similar combinations of that problem concerning the 3'utr and intron
regions).
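
for reference, the refGene-side check i describe above looks roughly like this in python. it is only a sketch: the Transcript tuple and classify_snp are my own illustrative names, the fields mirror refGene's strand/txStart/txEnd/cdsStart/cdsEnd/exonStarts/exonEnds columns, and i assume the SNP position is 0-based like refGene's start coordinates (end coordinates are exclusive).

```python
from typing import List, NamedTuple


class Transcript(NamedTuple):
    """Subset of one refGene row (illustrative container, not a UCSC API)."""
    strand: str             # "+" or "-"
    tx_start: int           # txStart, 0-based
    tx_end: int             # txEnd, exclusive
    cds_start: int          # cdsStart, 0-based
    cds_end: int            # cdsEnd, exclusive
    exon_starts: List[int]  # exonStarts, 0-based
    exon_ends: List[int]    # exonEnds, exclusive


def classify_snp(pos: int, tx: Transcript) -> str:
    """Name the region of `tx` that contains the 0-based genomic position `pos`."""
    if not (tx.tx_start <= pos < tx.tx_end):
        return "intergenic"
    in_exon = any(s <= pos < e for s, e in zip(tx.exon_starts, tx.exon_ends))
    if not in_exon:
        return "intron"
    if tx.cds_start == tx.cds_end:
        return "noncoding exon"      # transcript has no annotated CDS
    if tx.cds_start <= pos < tx.cds_end:
        return "cds"
    # Exonic but outside the CDS: which UTR it is depends on the strand.
    before_cds = pos < tx.cds_start
    if tx.strand == "+":
        return "5'utr" if before_cds else "3'utr"
    return "3'utr" if before_cds else "5'utr"
```

this only uses the genomic coordinates from refGene, so it is the answer from this function that can disagree with the CDS file's mRNA coordinates.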

i understand that the genomic coordinates in refGene are the result of BLAT
and those in the CDS file are local coordinates from NCBI. since the mapping
of NCBI mRNAs to the genome is imperfect, these location discrepancies occur.

so, if my description is correct, is there any solution to my problem?
if i misunderstood or am doing something wrong i would greatly appreciate your
corrections.

thank you very much for your time and help
Nimrod Rubinstein
The Department of Cell Research and Immunology
Tel Aviv University
_______________________________________________
Genome maillist  -  [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
