Hi John,

These previously-answered mailing list questions should be helpful:

https://lists.soe.ucsc.edu/pipermail/genome/2009-March/018553.html
https://lists.soe.ucsc.edu/pipermail/genome/2008-August/017007.html

I also suggest looking at the "description" column in the table schema (hit "view table schema" from the table browser or the SNP details page):

http://genome.ucsc.edu/cgi-bin/hgTables?db=hg19&hgta_group=varRep&hgta_track=snp135&hgta_table=snp135&hgta_doSchema=describe+table+schema

The refNCBI and refUCSC columns always contain the allele from the positive strand. The strand column denotes the strand of the alleles in the observed column.

It might be helpful to look at what we display when you click on the SNP in the Genome Browser. When the strand is negative, we reverse-complement the reference allele, but not the observed alleles:

    dbSNP build 135 rs1000073
    Strand: -
    Observed: C/T
    Reference allele: T

So, for rs1000073, NCBI and UCSC are in agreement that T is present on the negative strand in the GRCh37 reference genome at position chr1:157255396. The two alleles observed on the negative strand in that position are C and T. Also, if you click on "Re-alignment of the SNP's flanking sequences to the genomic sequence," you will see that the flanking sequences for rs1000073 from dbSNP align to the negative genomic strand.

I note that most SNPs (~51 million out of ~54 million) in the snp135 table have a strand of "+".

You should also be aware of the exceptions column of the snp135 table, which flags indicators of potential problems with a record, such as these (from the SNP track details page):

    ObservedContainsIupac - At least one observed allele from dbSNP contains an IUPAC ambiguous base (e.g., R, Y, N).
    ObservedMismatch - UCSC reference allele does not match any observed allele from dbSNP. This is tested only for SNPs whose class is single, in-del, insertion, deletion, mnp or mixed.
    ObservedTooLong - Observed allele not given (length too long).
    ObservedWrongFormat - Observed allele(s) from dbSNP have unexpected format for the given class.

I hope this helps explain the snp table. If you have further questions, please contact us again at [email protected].

--
Brooke Rhead
UCSC Genome Bioinformatics Group

On 12/21/11 4:07 AM, John Curtin wrote:
> Hi
> I am trying to determine strand orientation for SNPs using the
> output from "http://genome.ucsc.edu/cgi-bin/hgTables" (all SNPs(135)).
>
> This is in order to impute using 1000 genomes data which requires
> you to know the strand orientation of the data. I want to be sure that I fully understand this table and I am getting the information I need.
>
> In the attached file (exported from tables/SNPs) I have highlighted
> 4 columns. Does the "Strand" column refer to the "observed" column. i.e. the observed column for rs1000073 it is on the "-" strand because it is ("C/T"). For rs1000073 the "+" strand equivalent is "A" (from refUCSC). Obviously you cannot use this approach for AT or CG, but I have illumina data and 99% of SNPs are unambiguous.
> Is this correct?
> Regards
> John
>
>
> ************************************************************************************
> John A Curtin
> Lecturer in Functional Genomics
> Deputy Director, MRes in Translational Medicine
> University of Manchester
> CIGMR
> 2nd Floor, Stopford Building
> Oxford Road
> Manchester, M13 9PT
> [email protected] | Tel: 0161 275-5203 (CIGMR) | 0161 291-5867 (UHSM)
> http://www.medicine.manchester.ac.uk/staff/JohnCurtin
> Master of Research Translational Medicine:
> http://www.medicine.manchester.ac.uk/postgraduate/mres/TMInterMolecMRes/
> http://www.medicine.manchester.ac.uk/postgraduate/mres/TMPharmCancerMRes/
>
> _______________________________________________
> Genome maillist - [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
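
A minimal sketch of the strand logic described in the reply above, in case it helps when preparing data for imputation. It assumes you have already exported the strand, observed and refUCSC columns from snp135; the function names are hypothetical and not part of any UCSC tool. It puts the observed alleles on the positive strand by reverse-complementing them when strand is "-", and flags A/T and C/G SNPs, which cannot be oriented this way.

```python
# Hypothetical helpers; only the column names (strand, observed, refUCSC)
# come from the snp135 table discussed in this thread.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse-complement a nucleotide string; other characters (e.g. '-') pass through."""
    return seq.translate(_COMPLEMENT)[::-1]


def observed_on_plus_strand(observed, strand):
    """Return the observed alleles expressed on the positive strand."""
    alleles = observed.split("/")
    if strand == "-":
        alleles = [reverse_complement(a) for a in alleles]
    return alleles


def is_strand_ambiguous(observed):
    """True when the allele set equals its own reverse complement (e.g. A/T, C/G)."""
    alleles = set(observed.upper().split("/"))
    return alleles == {reverse_complement(a) for a in alleles}


# rs1000073 as shown above: strand "-", observed "C/T".
plus_alleles = observed_on_plus_strand("C/T", "-")
print(plus_alleles)                 # ['G', 'A']
print("A" in plus_alleles)          # refUCSC "A" (from the question) is among them
print(is_strand_ambiguous("C/T"))   # False
print(is_strand_ambiguous("A/T"))   # True
```

Records flagged in the exceptions column (for example ObservedMismatch) would still need to be checked separately; this sketch does not look at that column.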
