Hi Ana,
You seem to be working on an identification or classification problem. Your sample plot didn't come through; perhaps try converting it to a PDF or PNG. I may be missing something, but I can't see how randomly selecting 30 values from almost 4 million is going to mean anything in terms of statistical significance. I hope you will pardon me for saying that it looks like a "p-trawl". It is easy to select the cases where the p-value is less than 0.05:

a[a$pvalue < 0.05,]

Maybe what you want to do is display this subset of your data as candidates for a match among the very large number of non-matches. Let's do a bit of damage to your sample data and add the proportions:

# p-values divided by 10 and a made-up pSNP column added for illustration
a<-read.table(text="rs pvalue pSNP
rs185642176 0.0267407 0.6
rs184120752 0.0787681 0.3
rs10904045 0.0508162 0.4
rs35849539 0.0875910 0.2
rs141633513 0.0787759 0.2
rs4468273 0.0542171 0.4
rs4567378 0.0539484 0.4
rs7084251 0.0126445 0.7
rs181605000 0.0787838 0.35
rs12255619 0.0192719 0.61
rs140367257 0.0788008 0.25
rs10904178 0.0969814 0.16
rs7918960 0.0436341 0.45
rs61688896 0.0526256 0.39
rs151283848 0.0787284 0.34
rs140174295 0.0989107 0.11
rs145945079 0.0787015 0.23
rs4881370 0.0455089 0.51
rs183895035 0.0787015 0.22
rs181749526 0.0787015 0.22",
header=TRUE,stringsAsFactors=FALSE)
# keep the rows with p < 0.05 (five of them in this sample)
alt05<-a[a$pvalue < 0.05,]
library(plotrix)
# rows: proportion, lower and upper limits (an arbitrary +/- 0.1), and N
segmat<-matrix(c(alt05$pSNP,alt05$pSNP-0.1,alt05$pSNP+0.1,rep(1,5)),
 nrow=4,byrow=TRUE)
rownames(segmat)<-c("prop","lower","upper","N")
centipede.plot(segmat,mar=c(4,6,3,4),
 main="Proportion of SNPs",
 left.labels=alt05$rs,right.labels=rep("",5))

This is probably not what you want, but it is a start.
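
If you do want to try the random subsets of 30 described in your message, here is a minimal sketch of how that could go. It is only an illustration under stated assumptions: the data frame is simulated with uniform p-values rather than your real data, and the names nsub, k, min_fdr and prop_hits, as well as the choice of 1000 subsets, are my own placeholders. The BH adjustment is applied within each subset of 30, as you guessed, not across the whole data set.

```r
# Simulated stand-in for the real data (an assumption, not Ana's data):
# 100000 made-up rs identifiers with uniform random p-values.
set.seed(42)
n <- 100000
a <- data.frame(rs = paste0("rs", seq_len(n)), pvalue = runif(n),
                stringsAsFactors = FALSE)

nsub <- 1000   # number of random subsets to draw (hypothetical choice)
k <- 30        # number of rs per subset, as in the question

# For each random subset, BH-adjust its 30 p-values on their own
# and keep the smallest adjusted value.
min_fdr <- replicate(nsub, {
  idx <- sample(nrow(a), k)
  min(p.adjust(a$pvalue[idx], method = "BH"))
})

# Proportion of subsets with at least one rs having FDR < 0.05
prop_hits <- mean(min_fdr < 0.05)
prop_hits
```

With purely random p-values like these, I would expect only a small fraction of the subsets to pass, which is why I doubt that the subsets by themselves tell you much. Replacing the simulated a with your own data frame should work as long as it has a pvalue column.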

Jim

On Fri, Jan 24, 2020 at 7:08 AM Ana Marija <sokovic.anamar...@gmail.com> wrote:
>
> Hello,
>
> I have a data frame which looks like this:
>
> > head(a,20)
>              rs   pvalue
>  1: rs185642176 0.267407
>  2: rs184120752 0.787681
>  3:  rs10904045 0.508162
>  4:  rs35849539 0.875910
>  5: rs141633513 0.787759
>  6:   rs4468273 0.542171
>  7:   rs4567378 0.539484
>  8:   rs7084251 0.126445
>  9: rs181605000 0.787838
> 10:  rs12255619 0.192719
> 11: rs140367257 0.788008
> 12:  rs10904178 0.969814
> 13:   rs7918960 0.436341
> 14:  rs61688896 0.526256
> 15: rs151283848 0.787284
> 16: rs140174295 0.989107
> 17: rs145945079 0.787015
> 18:   rs4881370 0.455089
> 19: rs183895035 0.787015
> 20: rs181749526 0.787015
> > dim(a)
> [1] 3859763       2
>
> What I would like to do is to take random subsets of 30 of those rs
> throughout the dataframe and find out which subsets of those generated
> have FDR value <0.05
>
> FDR I would calculate I guess with:
> a$fdr=p.adjust(a$pvalue,method="BH")
>
> but I also guess I would be calculating only FDR for a particular
> subset of 30 randomly chosen rs, not for the whole data set.
>
> The result I would like to present like in the attached plot. The
> x-axis say proportion of SNPs and in my case SNP is equivalent to rs
>
> Can you please help with this, I really don't have idea how to go about this.
>
> Thanks
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.