Hi Ana,
You seem to be working on an identification or classification problem.
Your sample plot didn't come through; perhaps try converting it to a
PDF or PNG.
I may be missing something, but I can't see how randomly selecting 30
values from almost 4 million is going to mean anything in terms of
statistical significance. I hope you will pardon me for saying that it
looks like a "p-trawl". It is easy to select cases where the p-value
is less than 0.05:

a[a$pvalue < 0.05,]
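
To show why that selection alone proves little, here is a small sketch
using simulated p-values (made up for illustration, not your data; the
names p_null, n_null and fdr_null are my own). With no real effects at
all, about 5% of the values still fall below 0.05, and a BH adjustment
over the whole set typically leaves very few of them, often none:

```r
# Simulated null p-values: uniform on [0, 1], i.e. no true associations
set.seed(42)
n_null <- 100000
p_null <- runif(n_null)

# Proportion passing the naive threshold; expected to be close to 0.05
mean(p_null < 0.05)

# Benjamini-Hochberg adjustment over the full set of tests
fdr_null <- p.adjust(p_null, method = "BH")

# Number of values still below 0.05 after adjustment
sum(fdr_null < 0.05)
```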

Maybe what you want to do is display this subset of your data as
candidates for a match among the very large number of non-matches.
Let's do a bit of damage to your sample data (I have divided the
p-values by 10 so that some fall below 0.05) and add some made-up
proportions:

a<-read.table(text="rs pvalue pSNP
 rs185642176 0.0267407 0.6
 rs184120752 0.0787681 0.3
 rs10904045 0.0508162 0.4
 rs35849539 0.0875910 0.2
 rs141633513 0.0787759 0.2
 rs4468273 0.0542171 0.4
 rs4567378 0.0539484 0.4
 rs7084251 0.0126445 0.7
 rs181605000 0.0787838 0.35
 rs12255619 0.0192719 0.61
 rs140367257 0.0788008 0.25
 rs10904178 0.0969814 0.16
 rs7918960 0.0436341 0.45
 rs61688896 0.0526256 0.39
 rs151283848 0.0787284 0.34
 rs140174295 0.0989107 0.11
 rs145945079 0.0787015 0.23
 rs4881370 0.0455089 0.51
 rs183895035 0.0787015 0.22
 rs181749526 0.0787015 0.22",
 header=TRUE,stringsAsFactors=FALSE)
# keep only the rows with p < 0.05
alt05<-a[a$pvalue < 0.05,]
n05<-nrow(alt05)
library(plotrix)
# rows: midpoint, lower limit, upper limit, N (limits are just pSNP -/+ 0.1)
segmat<-matrix(c(alt05$pSNP,alt05$pSNP-0.1,alt05$pSNP+0.1,rep(1,n05)),
 nrow=4,byrow=TRUE)
rownames(segmat)<-c("prop","lower","upper","N")
centipede.plot(segmat,mar=c(4,6,3,4),
 main="Proportion of SNPs",
 left.labels=alt05$rs,right.labels=rep("",n05))

This is probably not what you want, but it is a start.
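
If you do want to try the random-subset idea from your message, the
following is a rough sketch only. It uses simulated p-values as a
stand-in for a$pvalue, and the names p_sim, n_sets, set_size and
min_fdr are ones I made up. It draws subsets of 30, applies the BH
adjustment within each subset, and records the smallest adjusted value
per subset:

```r
set.seed(1)
p_sim <- runif(10000)   # simulated stand-in for a$pvalue
n_sets <- 200           # number of random subsets to draw (arbitrary)
set_size <- 30          # size of each subset

# For each subset: sample indices, adjust within the subset only,
# and keep the smallest BH-adjusted p-value
min_fdr <- replicate(n_sets, {
  idx <- sample(length(p_sim), set_size)
  min(p.adjust(p_sim[idx], method = "BH"))
})

# Proportion of subsets with at least one adjusted value below 0.05
mean(min_fdr < 0.05)
```

Bear in mind that adjusting within a subset of 30 is not the same as
adjusting over all of the tests, which is why I doubt the result would
mean much.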

Jim

On Fri, Jan 24, 2020 at 7:08 AM Ana Marija <sokovic.anamar...@gmail.com> wrote:
>
> Hello,
>
> I have a data frame which looks like this:
>
> > head(a,20)
>              rs   pvalue
>  1: rs185642176 0.267407
>  2: rs184120752 0.787681
>  3:  rs10904045 0.508162
>  4:  rs35849539 0.875910
>  5: rs141633513 0.787759
>  6:   rs4468273 0.542171
>  7:   rs4567378 0.539484
>  8:   rs7084251 0.126445
>  9: rs181605000 0.787838
> 10:  rs12255619 0.192719
> 11: rs140367257 0.788008
> 12:  rs10904178 0.969814
> 13:   rs7918960 0.436341
> 14:  rs61688896 0.526256
> 15: rs151283848 0.787284
> 16: rs140174295 0.989107
> 17: rs145945079 0.787015
> 18:   rs4881370 0.455089
> 19: rs183895035 0.787015
> 20: rs181749526 0.787015
> > dim(a)
> [1] 3859763       2
>
> What I would like to do is to take random subsets of 30 of those rs
> throughout the dataframe and find out which subsets of those generated
> have FDR value <0.05
>
> FDR I would calculate I guess with:
> a$fdr=p.adjust(a$pvalue,method="BH")
>
> but I also guess I would be calculating only FDR for a particular
> subset of 30 randomly chosen rs, not for the whole data set.
>
> I would like to present the result as in the attached plot. The
> x-axis shows the proportion of SNPs, and in my case SNP is equivalent to rs.
>
> Can you please help with this? I really have no idea how to go about it.
>
> Thanks
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
