Hi Suganthi,

You're right -- these were not corrected for strand, and the schema description 
is incorrect.  I will revisit the HGDP data files and see if there's a way to 
identify and fix these cases.  

If you are comfortable with Perl or some other programming language, here is 
an expedient workaround: read in the genome FASTA sequence, discard the header 
lines and newlines, and store each chromosome as one large string (one chrom 
at a time is enough, since the input is sorted; read a chrom's sequence each 
time a new chrom appears).  For each SNP, look up the reference base at the 
given coord (substr).  If neither allele matches the ref, but the ref does 
match a complemented allele (I guess it had better), replace the given alleles 
with their complements.
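
The lookup-and-complement step described above could be sketched in Python 
roughly as follows.  This is only an illustration, not tested against the 
HGDP files: the function names are made up, the SNP position is assumed to be 
a 0-based offset into the chromosome string, and the alleles are assumed to 
arrive as a pair of single upper-case bases.

```python
# Hypothetical helper names; a sketch of the strand check, not a finished tool.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def read_fasta(lines):
    """Parse FASTA lines into {sequence name: upper-cased sequence string}."""
    seqs, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            # Keep only the first word of the header as the sequence name.
            name, chunks = (line[1:].split() or [""])[0], []
        elif line:
            chunks.append(line.upper())
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def fix_strand(ref_base, alleles):
    """Return the alleles on the reference strand, or None if unresolved."""
    ref_base = ref_base.upper()
    if ref_base in alleles:
        return tuple(alleles)  # already on the reference strand
    # Unknown symbols become "N" so they can never match the reference base.
    flipped = tuple(COMPLEMENT.get(a, "N") for a in alleles)
    if ref_base in flipped:
        return flipped  # reported on the opposite strand
    return None  # neither orientation matches the reference


# Toy example: chr1 is "ACGTTTGA", so the base at 0-based offset 5 is "T".
genome = read_fasta([">chr1", "ACGT", "TTGA"])
fixed = fix_strand(genome["chr1"][5], ("A", "G"))
```

For a whole genome you would load one chromosome at a time rather than 
everything at once, and a None result flags a SNP to inspect by hand.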

This would leave some ambiguity of ancestral vs derived if the dataset included 
C/G or A/T SNPs (complementing those gives back the same pair of alleles), but 
it doesn't (by design of the Illumina assay, pers. comm. Devin Absher).

Sorry for the inconvenience,
Angie


----- "Suganthi Bala" <[email protected]> wrote:

> From: "Suganthi Bala" <[email protected]>
> To: [email protected]
> Sent: Tuesday, September 14, 2010 7:50:28 PM GMT -08:00 US/Canada Pacific
> Subject: [Genome] HGDP SNP data
>
> Hi,
> 
> This pertains to the data that I downloaded for HGDP SNPs via the Table
> Browser for HG18 build. It appears that the SNPs are not always reported
> with respect to the forward strand of the reference genome even though that
> is what the table schema indicates. For eg, the following SNPs: rs2296441,
> rs12782963, rs4758443 etc.
> 
> Is it possible that it was mistakenly not corrected for strand orientation?
> If yes, is it possible to get a fixed file quickly? Thanks.
> 
> Best,
> Suganthi Bala
> Yale University
> _______________________________________________
> Genome maillist  -  [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
