Hi R-help folks, I have been doing some single SNP association work using snpMatrix. This works well, but produces a lot of false positives, because of population structure in my data. I would like to correct the p-values (which snpMatrix gives me) for population structure, possibly using principle component analysis (PCA).
My data is complicated, so here's a simple example of what I'd like to do: # 3x8 matrix of example snp data, 1 = allele A, -1 = allele B, 0 = hetero snp.data = matrix( c(0,1,0,-1,-1,1,1,-1,0,1,1,0,0,1,0,-1,-1,NA,0,-1,0,0,1,0), nrow=3, dimnames = list( c("bob", "frita", "trudy"), c("snp1", "snp2", "snp3", "snp4", "snp5","snp6", "snp7", "snp8") ) ) # phenotype data - resistant or susceptible to zombie infection phenotype.data = matrix( c("bob", "frita", "trudy", "resistant", "susceptible", "resistant"), nrow=3, dimnames = list( c("bob", "frita", "trudy"), c("rowNames", "cc") ) ) library("snpMatrix") # add one in the following line so genotypes are 0,1 or 2, so single.snp.tests() method doesn't complain snp.matrix <- as(snp.data + 1,'snp.matrix') single.snp.assoc <- single.snp.tests(cc, data=as.data.frame(phenotype.data), snp.data=snp.matrix ) Okay, so now I have p-values for the association between SNPs and resistance to zombie infection. I do this for PCA: snp.data = replace( snp.data, is.na(snp.data), 0) # workaround, b/c I can't get prcomp to ignore NAs no matter what I do pca = prcomp(snp.data) So, the question is, can I use the PCA data to correct my p-values (in single.snp.association) for population structure? In my real data, population structure is causing a lot of type I errors (false positive SNPs). I have read of some standalone software that does this sort of thing, for example EIGENSTRAT: http://www.biostat.jhsph.edu/~iruczins/teaching/misc/gwas/papers/price2006.pdf But I'd like to stick to R if possible. Any advice/comments welcome. ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.