Re: [R] Question about sampling

Guido Leoni Thu, 14 Jun 2012 10:07:28 -0700

Just to make the archives more complete and to simplify life for future
readers:
I think I have solved my problem using the caret package.
This package has a function named createDataPartition that, once you
choose a column of interest in a data.frame, splits the dataset into
subsets that try to preserve the original class distribution.
Here is the link to a tutorial:
  http://www.jstatsoft.org/v28/i05/paper
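
As a rough illustration of the idea (not code taken from the tutorial), the
sketch below splits a small made-up table on its Condition column. The data
frame, its column names and p = 0.5 are assumptions chosen for the example,
and it assumes the caret package is installed. Note that createDataPartition
stratifies on a single vector, so balancing several SNP columns at once would
need an extra step, for example stratifying on a combination of columns.

```r
library(caret)

set.seed(123)

# Made-up data: 200 samples with a Case/Control label and one SNP column
df <- data.frame(
  Name      = sprintf("sample%03d", 1:200),
  Condition = factor(rep(c("Case", "Control"), times = c(80, 120))),
  snp       = sample(c(-1, 0, 1), 200, replace = TRUE)
)

# Row indices for a subset holding about half of each Condition class
idx <- createDataPartition(df$Condition, p = 0.5, list = FALSE)
subsample <- df[idx, ]

# Class proportions in the full data and in the subset, for comparison
prop.table(table(df$Condition))
prop.table(table(subsample$Condition))
```

Because the sampling is done within each level of Condition, the two tables
of proportions should be close to each other.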


thank you again
Guido

2012/6/14 R. Michael Weylandt <michael.weyla...@gmail.com>

> I think you're right -- prob probably isn't quite what you need (at
> least, directly): constrained sampling like this is a little trickier
> -- I'll leave this to someone who knows more than me.
>
> Michael
>
> On Thu, Jun 14, 2012 at 9:07 AM, Guido Leoni <guido.le...@gmail.com>
> wrote:
> > Sorry, I'm not sure that prob is suitable for my purposes (but I'm quite
> > new to R).
> > If I understand correctly, prob sets a weight for each row in the
> > original dataset, so that rows are included on the basis of their
> > weights... though I'm not sure I understand it correctly ;-)
> > In my case all the rows are equally important. I "simply" need my
> > subset to have, in each column, the same frequency of 1 as in the original
> > dataset.
> > Thank you again
> > Guido
> >
> > 2012/6/14 R. Michael Weylandt <michael.weyla...@gmail.com>
> >>
> >> sample() takes a prob = argument which lets you supply weights, which
> >> need not sum to one so, if I understand you, you could just pass TRUEs
> >> and FALSEs for those rows you want. If I'm wrong about that last bit,
> >> I'm still pretty confident sample(prob = ) is the way to go.
> >>
> >> Best,
> >> Michael
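
[Editor's sketch] A small base R illustration of the prob = argument described
in the message above. The vectors eligible and weights are made up for the
example; the logical flags are converted to 0/1 weights explicitly with
as.numeric().

```r
set.seed(1)

# Hypothetical flags marking which of six rows may be drawn
eligible <- c(TRUE, FALSE, TRUE, TRUE, FALSE, TRUE)

# Rows with weight 0 are never selected
picked <- sample(seq_along(eligible), size = 3, prob = as.numeric(eligible))
picked

# Weights need not sum to one: here row 1 is ten times as likely as the others
weights <- c(10, 1, 1, 1)
draws <- sample(4, size = 1000, replace = TRUE, prob = weights)
table(draws)
```

This only changes how likely each row is to be drawn; it does not by itself
constrain the column frequencies of the resulting subsample.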
> >>
> >> On Thu, Jun 14, 2012 at 6:02 AM, Guido Leoni <guido.le...@gmail.com>
> >> wrote:
> >> > Dear list, I wish to extract from a population genotyped for 10 SNPs a
> >> > subsample of the same population, of size n, with similar allele
> >> > frequencies.
> >> > Essentially I have a matrix of 200 rows (df) like this:
> >> > Name,Condition,rs1385699_X,rs6625163_X,rs962458_X,Rs4658627_1,
> >> > sample01,Case,1,1,1,-1
> >> > sample02,Control,1,1,1,1
> >> > sample06,Control,1,-1,1,0
> >> > sample10,Case,1,1,1,0
> >> > sample11,Control,1,1,1,1
> >> > sample24,Control,-1,-1,1,0
> >> > sample29,Control,1,-1,1,0
> >> > sample42,Case,-1,-1,1,0
> >> > sample64,Case,-1,1,1,0
> >> > ....
> >> > I would like my subsample to maintain the same frequencies as those
> >> > observed for the value 1 in each column.
> >> > I approached the problem with the sample() function:
> >> >
> >> > mysample <- df[sample(nrow(df), 100, replace = FALSE), ]
> >> > Then I tested that the frequencies of each allele in mysample are not
> >> > statistically different from those in the initial dataset by means of
> >> > prop.test.
> >> > This seems to work, but do you know if there is a package that can do
> >> > the same thing while allowing, for example, stricter control?
> >> > Thank you very much
> >> > Guido
> >> >
> >> >
> >> > ______________________________________________
> >> > R-help@r-project.org mailing list
> >> > https://stat.ethz.ch/mailman/listinfo/r-help
> >> > PLEASE do read the posting guide
> >> > http://www.R-project.org/posting-guide.html
> >> > and provide commented, minimal, self-contained, reproducible code.
> >
> >
> >
> >
> > --
> > Guido Leoni
> > National Research Institute on Food and Nutrition
> > (I.N.R.A.N.)
> > via Ardeatina 546
> > 00178 Rome
> > Italy
> >
> > tel     + 39 06 51 49 41 (operator)
> >         + 39 06 51 49 4498 (direct)
>
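
[Editor's sketch] The approach from the original question quoted above (a
simple random subsample, checked column by column with prop.test) could look
roughly like this in base R. The data are simulated, and the column names
snp_a, snp_b, snp_c, the tolerance tol and the retry limit max_tries are all
assumptions made for illustration. The retry loop is one possible way to get
stricter control: it keeps drawing until the frequency of 1 in every column is
within tol of the full dataset.

```r
set.seed(42)
n_total <- 200

# Simulated genotype table standing in for the real 200-row dataset
df <- data.frame(
  Name      = sprintf("sample%03d", seq_len(n_total)),
  Condition = sample(c("Case", "Control"), n_total, replace = TRUE),
  snp_a     = sample(c(-1, 0, 1), n_total, replace = TRUE, prob = c(0.2, 0.3, 0.5)),
  snp_b     = sample(c(-1, 0, 1), n_total, replace = TRUE, prob = c(0.4, 0.2, 0.4)),
  snp_c     = sample(c(-1, 0, 1), n_total, replace = TRUE, prob = c(0.1, 0.1, 0.8))
)
snp_cols <- c("snp_a", "snp_b", "snp_c")

# Frequency of the value 1 in each SNP column
freq_of_one <- function(d) colMeans(d[, snp_cols] == 1)

full_freq <- freq_of_one(df)
tol       <- 0.03   # largest accepted difference in any column
max_tries <- 5000   # give up after this many draws

# Redraw a 100-row subsample until every column is within tol
accepted <- NULL
for (i in seq_len(max_tries)) {
  candidate <- df[sample(nrow(df), 100, replace = FALSE), ]
  if (max(abs(freq_of_one(candidate) - full_freq)) <= tol) {
    accepted <- candidate
    break
  }
}
stopifnot(!is.null(accepted))

# Per-column prop.test comparing the subsample with the full dataset
p_values <- sapply(snp_cols, function(col) {
  prop.test(
    x = c(sum(accepted[[col]] == 1), sum(df[[col]] == 1)),
    n = c(nrow(accepted), nrow(df))
  )$p.value
})

rbind(full = full_freq, subsample = freq_of_one(accepted))
p_values
```

Since the subsample is drawn from the full dataset, the two groups passed to
prop.test are not independent, so the p-values are best read as a rough
sanity check rather than a formal test.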


