Re: [R] Fwd: rarefy a matrix of counts

Brian Frappier Wed, 11 Oct 2006 14:31:38 -0700

Thanks Manuel,
I have two problems with the approach you suggest. First, it does not seem
to result in a random sample without replacement: it generates a sample
based on a priori probabilities, not on physical selection and removal from
subsequent sampling.  I believe the sample function would achieve the same
result if I supplied probabilities.  Second, unfortunately I have many zero
values, as is often the case in ecological data!  Thanks again to everyone
for their help so far.  Physical selection is probably the only option for
sampling without replacement.  brian
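Physically expanding the list may not be strictly necessary, though. A minimal sketch, assuming only the rarefied counts per object type are needed (not the identity of individual candies): drawing `size` individuals without replacement from one column of counts follows a multivariate hypergeometric distribution, which base R can generate one category at a time with `rhyper()`. The helper `rarefy_hyper()` is a hypothetical name, not a function from any package.

```r
# Hypothetical helper: rarefy one vector of counts to `size` individuals,
# sampling WITHOUT replacement. Category i gets a hypergeometric draw of
# its own individuals versus those in the later categories, given the
# draws still to be made (i.e. a multivariate hypergeometric draw).
rarefy_hyper <- function(counts, size) {
  stopifnot(size <= sum(counts))
  out <- integer(length(counts))
  pop_left <- sum(counts)   # individuals not yet assigned to a category
  draws_left <- size        # draws not yet assigned to a category
  for (i in seq_along(counts)) {
    if (draws_left == 0) break
    pop_left <- pop_left - counts[i]   # individuals in later categories
    out[i] <- rhyper(1, m = counts[i], n = pop_left, k = draws_left)
    draws_left <- draws_left - out[i]
  }
  names(out) <- names(counts)
  out
}

counts <- matrix(c(400,  300, 2500,
                   100,    0,  200,
                   300, 1000,  500),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("red", "green", "black"),
                                 c("sample1", "sample2", "sample3")))

set.seed(1)
# one rarefied column of 100 individuals per sample; zero counts stay zero
rarefied <- apply(counts, 2, rarefy_hyper, size = 100)
rarefied
```

Because each category is drawn conditionally on the draws already made, every column total is exactly `size` and no category can exceed its original count; wrapping the `apply()` call in `replicate()` would give repeated rarefactions.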


On 10/11/06, Manuel Morales <[EMAIL PROTECTED]> wrote:
>
> On Wed, 2006-10-11 at 14:25 -0400, Brian Frappier wrote:
> > I tried all of the approaches below.
> >
> > the problem with:
> >
> > > x <- data.frame(matrix(NA,100,3))
> > > for (i in 2:ncol(DF)) x[,i-1] <- sample(rep(DF[,1], DF[,i]),100)
> > > if you want result in data frame
> > > or
> > > x<-vector("list", 3)
> > > for (i in 2:ncol(DF)) x[[i-1]] <- sample(rep(DF[,1], DF[,i]),100)
> >
> > is that this code still samples the rows, not the elements, i.e. it
> > returns 100 or 300 in the matrix cells instead of "red", rather than a
> > matrix of counts by color (object type) like:
> >
> >          x1    x2    x3
> > red      32     5    60
> > green    68    95    40
> > sum     100   100   100
> >
> > It looks like Tony is right: sampling without replacement requires
> > listing all of the elements to be sampled.
>
> <snip>
>
> How about the following approach which generates a new sample using the
> rMultinom function from Hmisc.
>
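> library(Hmisc)  # provides rMultinom()
>
> data <- matrix(c(400, 300, 2500, 100, 25, 200, 300, 1000, 500),
>                nrow=3, byrow=TRUE)
>
> col.sums <- colSums(data)
>
> # one row of category probabilities per sample (column of data)
> probs <- t(data)/col.sums
>
> # 100 independent draws per sample (i.e. with replacement);
> # w holds the drawn category numbers, one row per sample
> w <- rMultinom(probs,100)
>
> # counts per category (rows) for each sample (columns)
> apply(w, 1, table)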
>
> Note that I replaced the zero in your example data set with 25 because
> table() drops categories that were never drawn, so with zero values the
> per-sample tables differ in length and apply() returns a list instead of
> a matrix.
>
> HTH,
>
> Manuel
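For comparison, a similar multinomial draw is available in base R as `rmultinom()` (stats package). It returns counts directly, so a zero in the data simply yields a zero count and does not need replacing with 25. A sketch on the example matrix with the original zero restored (the row and column labels are added here for readability); like `rMultinom()`, each of the 100 draws is independent, so this is still sampling WITH replacement.

```r
data <- matrix(c(400,  300, 2500,
                 100,    0,  200,
                 300, 1000,  500),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("red", "green", "black"),
                               c("sample1", "sample2", "sample3")))

set.seed(1)
# One multinomial draw of size 100 per sample (column). rmultinom()
# normalises `prob` itself, so raw counts can be passed; a zero count
# gives probability 0 and therefore a zero in the result.
w <- apply(data, 2, function(counts) rmultinom(1, size = 100, prob = counts))
dimnames(w) <- dimnames(data)   # label rows and columns explicitly
w
```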
>
>
>
> > On 10/11/06, Tony Plate <[EMAIL PROTECTED]> wrote:
> > >
> > > Here's a way using apply(), and the prob= argument of sample():
> > >
> > > > df <- data.frame(sample1=c(red=400,green=100,black=300),
> > > + sample2=c(300,0,1000), sample3=c(2500,200,500))
> > > > df
> > >        sample1 sample2 sample3
> > > red       400     300    2500
> > > green     100       0     200
> > > black     300    1000     500
> > > > set.seed(1)
> > > > apply(df, 2, function(counts) sample(seq_along(counts), replace=TRUE,
> > > + size=7, prob=counts))
> > >       sample1 sample2 sample3
> > > [1,]       1       3       1
> > > [2,]       1       3       1
> > > [3,]       3       3       1
> > > [4,]       2       3       2
> > > [5,]       1       3       1
> > > [6,]       2       3       1
> > > [7,]       2       3       3
> > > >
> > >
> > > Note that this does sampling WITH replacement.
> > > AFAIK, sampling without replacement requires enumerating the entire
> > > population to be sampled from.  I.e., you cannot do
> > > > sample(1:3, prob=1:3, rep=F, size=4)
> > > instead of
> > > > sample(c(1,2,2,3,3,3), rep=F, size=4)
> > >
> > > -- Tony Plate
> > >
> > > From reading ?sample, I was a little unclear on whether sampling
> > > without replacement could work
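Tony's point can be checked directly with a small base-R sketch: with `replace = FALSE`, `prob=` weights the distinct values, and each value can be drawn at most once, so requesting more draws than there are values fails, whereas sampling from the enumerated population works.

```r
set.seed(1)
# Weighted sampling without replacement: each of the 3 values can be
# drawn at most once, so asking for 4 draws raises an error.
weighted <- tryCatch(sample(1:3, size = 4, replace = FALSE, prob = 1:3),
                     error = function(e) conditionMessage(e))
weighted

# Enumerated population, one element per individual: c(1, 2, 2, 3, 3, 3)
pop <- rep(1:3, times = 1:3)
enumerated <- sample(pop, size = 4, replace = FALSE)
table(factor(enumerated, levels = 1:3))
```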
> > >
> > > Petr Pikal wrote:
> > > > Hi
> > > >
> > > > a little bit different story, but
> > > >
> > > > x1 <- sample(c(rep("red",400),rep("green", 100),
> > > > rep("black",300)),100)
> > > >
> > > > might be close. With a data frame (if it is not big):
> > > >
> > > >
> > > >>DF
> > > >
> > > >   color sample1 sample2 sample3
> > > > 1   red     400     300    2500
> > > > 2 green     100       0     200
> > > > 3 black     300    1000     500
> > > >
> > > > # if you want the result in a data frame
> > > > x <- data.frame(matrix(NA,100,3))
> > > > for (i in 2:ncol(DF)) x[,i-1] <- sample(rep(DF[,1], DF[,i]),100)
> > > >
> > > > # or, if you want it in a list
> > > > x <- vector("list", 3)
> > > > for (i in 2:ncol(DF)) x[[i-1]] <- sample(rep(DF[,1], DF[,i]),100)
> > > >
> > > > Maybe somebody is clever enough to get rid of the for loop, but you
> > > > said you have 80 columns, which should be no problem.
> > > >
> > > > HTH
> > > > Petr
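One loop-free variant of this, sketched under the assumption that the colour names are in the first column of `DF` (as in the printout above) and that the goal is the matrix of counts per colour Brian asks for, uses `sapply()` over the sample columns; `factor(..., levels = )` makes colours that were never drawn show up as 0 instead of disappearing.

```r
DF <- data.frame(color   = c("red", "green", "black"),
                 sample1 = c(400, 100, 300),
                 sample2 = c(300, 0, 1000),
                 sample3 = c(2500, 200, 500),
                 stringsAsFactors = FALSE)

set.seed(1)
rarefied <- sapply(DF[-1], function(n) {
  pool <- rep(DF$color, times = n)           # one entry per candy
  picked <- sample(pool, size = 100)         # without replacement (default)
  table(factor(picked, levels = DF$color))   # keep colours with no draws
})
rarefied   # colours in rows, samples in columns; each column sums to 100
```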
> > > >
> > > > On 11 Oct 2006 at 10:11, Brian Frappier wrote:
> > > >
> > > > Date sent:            Wed, 11 Oct 2006 10:11:33 -0400
> > > > From:                 "Brian Frappier" <[EMAIL PROTECTED]>
> > > > To:                   "Petr Pikal" <[EMAIL PROTECTED]>
> > > > Subject:              Fwd: [R] rarefy a matrix of counts
> > > >
> > > >
> > > >>---------- Forwarded message ----------
> > > >>From: Brian Frappier <[EMAIL PROTECTED]>
> > > >>Date: Oct 11, 2006 10:10 AM
> > > >>Subject: Re: [R] rarefy a matrix of counts
> > > >>To: r-help@stat.math.ethz.ch
> > > >>
> > > >>Hi Petr,
> > > >>
> > > >>Thanks for your response.  I have data that looks like the following:
> > > >>
> > > >>               sample 1   sample 2   sample 3  ....
> > > >>red candy           400        300       2500
> > > >>green candy         100          0        200
> > > >>black candy         300       1000        500
> > > >>
> > > >>I don't want to randomly select either the samples (columns) or the
> > > >>"candy" types (rows), which is what sample would let me do, as you
> > > >>describe.  Instead, I want to randomly sample 100 candies from each
> > > >>sample and retain info on their associated type.  I could make a list
> > > >>of all the candies in each sample:
> > > >>
> > > >>sample 1
> > > >>red
> > > >>red
> > > >>red
> > > >>red
> > > >>green
> > > >>green
> > > >>black
> > > >>red
> > > >>black
> > > >>...
> > > >>
> > > >>and then randomly sample those rows.  Repeat for each sample.  But I
> > > >>am not sure how to do that without a lot of loops, and am wondering if
> > > >>there is an easier way in R.  Thanks!  I should have laid this out in
> > > >>the first email...sorry.
> > > >>
> > > >>
> > > >>On 10/11/06, Petr Pikal <[EMAIL PROTECTED]> wrote:
> > > >>
> > > >>>Hi
> > > >>>
> > > >>>I am not experienced in Matlab and from your explanation I do not
> > > >>>understand exactly what you want. It seems that you want to randomly
> > > >>>choose a sample of 100 rows from your matrix, which can be achieved
> > > >>>with sample.
> > > >>>
> > > >>>DF<-data.frame(rnorm(100), 1:100, 101:200, 201:300)
> > > >>>DF[sample(1:100, 10),]
> > > >>>
> > > >>>If you want to do this several times, you need to save your result,
> > > >>>and then it depends on what you want to do next. One suitable form
> > > >>>is a list of matrices, another is an array, and you can use a for
> > > >>>loop to fill it in.
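Petr's suggestion of storing repeated results in a list or an array might look like the following sketch, using his example `DF` of 100 rows; the 5 repetitions of 10 rows each are arbitrary assumptions. The array form works here only because all columns are numeric.

```r
DF <- data.frame(rnorm(100), 1:100, 101:200, 201:300)
n_rep <- 5   # number of repeated subsamples (arbitrary)

# Option 1: a list of data frames, filled in a for loop
subsamples <- vector("list", n_rep)
for (r in seq_len(n_rep)) {
  subsamples[[r]] <- DF[sample(nrow(DF), 10), ]
}

# Option 2: a 3-D array, rows x columns x repetitions
arr <- array(NA_real_, dim = c(10, ncol(DF), n_rep))
for (r in seq_len(n_rep)) {
  arr[, , r] <- as.matrix(DF[sample(nrow(DF), 10), ])
}
dim(arr)
```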
> > > >>>
> > > >>>HTH
> > > >>>Petr
> > > >>>
> > > >>>
> > > >>>On 10 Oct 2006 at 17:40, Brian Frappier wrote:
> > > >>>
> > > >>>Date sent:              Tue, 10 Oct 2006 17:40:47 -0400
> > > >>>From:                   "Brian Frappier" <[EMAIL PROTECTED]>
> > > >>>To:                     r-help@stat.math.ethz.ch
> > > >>>Subject:                [R] rarefy a matrix of counts
> > > >>>
> > > >>>
> > > >>>>Hi all,
> > > >>>>
> > > >>>>I have a matrix of counts for objects (rows) by samples (columns).
> > > >>>>I aimed for about 500 counts in each sample (I have about 80
> > > >>>>samples) and would now like to rarefy these down to 100 counts in
> > > >>>>each sample using simple random sampling without replacement.  I
> > > >>>>plan on rarefying several times for each sample.  I could do the
> > > >>>>tedious looping task of making a list of all objects (with its
> > > >>>>associated identifier) in each sample and then use the wonderful
> > > >>>>"sampling" package to select a sub-sample of 100 for each sample
> > > >>>>and thereby get a logical vector of inclusions.  I would then
> > > >>>>regroup the resulting logical vector into a vector of counts by
> > > >>>>object, rinse and repeat several times for each sample.
> > > >>>>
> > > >>>>Alternatively, using the same list, I could create a random index of
> > > >>>>integers between 1 and the number of objects in a sample (without
> > > >>>>repeats) and then select those objects from the list.  Again,
> > > >>>>rinse and repeat several times for each sample.
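The random-index idea above does not strictly need the list itself. A minimal sketch, assuming objects are identified by their row in the count matrix: draw distinct positions `1..sum(counts)` with `sample.int()` and map each position back to its object through the cumulative counts with `findInterval()`. The helper `rarefy_index()` is a hypothetical name.

```r
# Hypothetical helper: positions 1..N are laid out category by category,
# so position p belongs to the first category whose cumulative count
# reaches p. Categories with zero counts are skipped automatically.
rarefy_index <- function(counts, size) {
  idx <- sample.int(sum(counts), size)                 # distinct positions
  category <- findInterval(idx - 1, cumsum(counts)) + 1
  out <- tabulate(category, nbins = length(counts))    # counts per category
  names(out) <- names(counts)
  out
}

counts <- matrix(c(400,  300, 2500,
                   100,    0,  200,
                   300, 1000,  500),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("red", "green", "black"),
                                 c("sample1", "sample2", "sample3")))

set.seed(1)
apply(counts, 2, rarefy_index, size = 100)
```

This only stores `size` positions per sample instead of one entry per individual, although with about 500 counts per sample the expanded list would also be small.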
> > > >>>>
> > > >>>>Is there a way to directly rarefy a matrix of counts without
> > > >>>>having to create a list of objects first?  I am trying to switch
> > > >>>>to R from Matlab and am trying to pick up good programming habits
> > > >>>>from the start.
> > > >>>>
> > > >>>>Much appreciation!
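Since the plan is to rarefy each sample several times, one option is to wrap a single rarefaction of the whole matrix in a function and call it with `replicate()`, which stacks the results into a 3-D array (objects x samples x replicates). A sketch in base R; `rarefy_once()` is a hypothetical name, and the 10 replicates are an arbitrary choice.

```r
counts <- matrix(c(400,  300, 2500,
                   100,    0,  200,
                   300, 1000,  500),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("red", "green", "black"),
                                 c("sample1", "sample2", "sample3")))

# Hypothetical helper: rarefy every column of `counts` to `size`
# individuals by expanding the column into one category index per
# individual and sampling those without replacement.
rarefy_once <- function(counts, size) {
  apply(counts, 2, function(n) {
    pool <- rep(seq_along(n), times = n)             # category of each individual
    picked <- pool[sample.int(length(pool), size)]   # without replacement
    tabulate(picked, nbins = length(n))              # zero categories kept
  })
}

set.seed(1)
reps <- replicate(10, rarefy_once(counts, 100))      # 3 x 3 x 10 array
dimnames(reps) <- c(dimnames(counts), list(NULL))

# e.g. mean rarefied count per object and sample across the replicates
apply(reps, c(1, 2), mean)
```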
> > > >>>>
> > > >>>>
> > > >>>>______________________________________________
> > > >>>>R-help@stat.math.ethz.ch mailing list
> > > >>>>https://stat.ethz.ch/mailman/listinfo/r-help
> > > >>>>PLEASE do read the posting guide
> > > >>>>http://www.R-project.org/posting-guide.html and provide commented,
> > > >>>>minimal, self-contained, reproducible code.
> > > >>>
> > > >>>Petr Pikal
> > > >>>[EMAIL PROTECTED]
> > > >>>
> > > >>>
> > > >>
> > > >
> > > > Petr Pikal
> > > > [EMAIL PROTECTED]
> > > >
> > > >
> > >
> > >
> >
> --
> Manuel A. Morales
> http://mutualism.williams.edu
>
>
>
