Just to make the archives more complete and to simplify life for later readers: I think I have solved my problem using the caret package. It has a function named createDataPartition that, given a column of interest in a data.frame, splits the dataset into subsets that try to preserve the original class distribution of that column.
Here is the link to a tutorial: http://www.jstatsoft.org/v28/i05/paper
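A minimal sketch of such a split, assuming caret is installed. The toy data frame and its values are made up and only stand in for the real 200-row table; stratifying on the single column rs1385699_X is an illustrative choice, since createDataPartition takes one outcome vector.

```r
library(caret)

set.seed(42)  # reproducible split

# Toy stand-in for the real genotype table (values are made up)
df <- data.frame(
  Name        = sprintf("sample%03d", 1:200),
  Condition   = factor(sample(c("Case", "Control"), 200, replace = TRUE)),
  rs1385699_X = sample(c(-1, 1), 200, replace = TRUE, prob = c(0.3, 0.7))
)

# Stratify on one column of interest; p = 0.5 keeps about half the rows
# from each class of that column
idx <- createDataPartition(factor(df$rs1385699_X), p = 0.5, list = FALSE)
mysample <- df[idx, ]

# Compare the frequency of the 1 value before and after subsampling
mean(df$rs1385699_X == 1)
mean(mysample$rs1385699_X == 1)
```

Stratifying on one column does not by itself guarantee matching frequencies in the other SNP columns, so a per-column check such as prop.test is still worthwhile.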
thank you again
Guido

2012/6/14 R. Michael Weylandt <michael.weyla...@gmail.com>

> I think you're right -- prob probably isn't quite what you need (at
> least, directly): constrained sampling like this is a little trickier
> -- I'll leave this to someone who knows more than me.
>
> Michael
>
> On Thu, Jun 14, 2012 at 9:07 AM, Guido Leoni <guido.le...@gmail.com>
> wrote:
> > Sorry I'm not sure that prob is suitable for my purposes(but i'm quite
> > newbie with R).
> > If I correctly understand prob allows to set a weight for each row in the
> > original dataset in order to include the rows on the basis of their
> > weights). ... I'm not sure to correctly understanding ;-)
> > In my case all the rows are equally important. I need "simply " that my
> > subset has in each column the same frequency of 1 that in the original
> > dataset
> > Thank you again
> > Guido
> >
> > 2012/6/14 R. Michael Weylandt <michael.weyla...@gmail.com>
> >>
> >> sample() takes a prob = argument which lets you supply weights, which
> >> need not sum to one so, if I understand you, you could just pass TRUEs
> >> and FALSEs for those rows you want. If I'm wrong about that last bit,
> >> I'm still pretty confident sample(prob = ) is the way to go.
> >>
> >> Best,
> >> Michael
> >>
> >> On Thu, Jun 14, 2012 at 6:02 AM, Guido Leoni <guido.le...@gmail.com>
> >> wrote:
> >> > Dear list I wish to extract from a population genotypized for 10 SNP a
> >> > subsample of the same population of size n with similar allele
> >> > frequencies.
> >> > Essentially i have a matrix of 200 rows (df) like this
> >> > Name,Condition,rs1385699_X,rs6625163_X,rs962458_X,Rs4658627_1,
> >> > sample01,Case,1,1,1,-1
> >> > sample02,Control,1,1,1,1
> >> > sample06,Control,1,-1,1,0
> >> > sample10,Case,1,1,1,0
> >> > sample11,Control,1,1,1,1
> >> > sample24,Control,-1,-1,1,0
> >> > sample29,Control,1,-1,1,0
> >> > sample42,Case,-1,-1,1,0
> >> > sample64,Case,-1,1,1,0
> >> > ....
> >> > I'm interested to mantain in my subsample the same frequencies of those
> >> > observed for the 1 value in each column
> >> > I approached the problem with sample() function
> >> >
> >> > mysample<-df[sample(1:nrow(df),100,replace=F),]
> >> > Then I tested that the frequencies of each allele in mysample are not
> >> > statistically different respect to the initial dataset by mean of
> >> > prop.test
> >> > This seems to work but do you know if there is a package that can do the
> >> > same thing allowing for example a more strict control?
> >> > Thank you very much
> >> > Guido
> >> >
> >> > ______________________________________________
> >> > R-help@r-project.org mailing list
> >> > https://stat.ethz.ch/mailman/listinfo/r-help
> >> > PLEASE do read the posting guide
> >> > http://www.R-project.org/posting-guide.html
> >> > and provide commented, minimal, self-contained, reproducible code.
> >
> > --
> > Guido Leoni
> > National Research Institute on Food and Nutrition
> > (I.N.R.A.N.)
> > via Ardeatina 546
> > 00178 Rome
> > Italy
> >
> > tel + 39 06 51 49 41 (operator)
> > + 39 06 51 49 4498 (direct)