Thanks Manuel,

My only problem with the approach you suggest is that it does not seem to produce a random sample without replacement: it generates a sample from the a priori probabilities, rather than physically selecting items and removing them from subsequent draws. I believe the sample function would achieve the same result if I supplied probabilities. Second, unfortunately I have many zero values, as is often the case in ecological data!

Thanks again to everyone for their help so far. Physical selection is probably the only option for sampling without replacement.

Brian
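The physical-selection idea can be sketched in base R as follows. This is only a minimal sketch, not a definitive implementation: the helper name rarefy_counts is hypothetical, the matrix reuses the example counts from this thread (including the zero), and it assumes every column total is at least the requested sample size. Each column is expanded into one entry per counted item, a subset is drawn without replacement, and tabulate() turns the draw back into counts, so types that were never drawn (or were zero to begin with) come back as 0.

```r
# Hypothetical helper: rarefy one vector of counts down to `size` items,
# sampling without replacement.
rarefy_counts <- function(counts, size) {
  stopifnot(sum(counts) >= size)
  # Enumerate the population: type index i appears counts[i] times.
  population <- rep(seq_along(counts), times = counts)
  # sample.int() draws positions without replacement by default.
  drawn <- population[sample.int(length(population), size)]
  # tabulate() reports a 0 for any type that was not drawn.
  tabulate(drawn, nbins = length(counts))
}

# Example counts from the thread: rows are object types, columns are samples.
counts <- matrix(c(400,  300, 2500,
                   100,    0,  200,
                   300, 1000,  500),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("red", "green", "black"),
                                 c("sample1", "sample2", "sample3")))

set.seed(1)
# Rarefy every column (sample) to 100 items.
rarefied <- apply(counts, 2, rarefy_counts, size = 100)
rownames(rarefied) <- rownames(counts)
rarefied
```

To rarefy several times, the apply() call can be wrapped in replicate(). The cost is memory: the expanded vector has one element per counted item, which is small for columns of a few hundred to a few thousand counts.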
On 10/11/06, Manuel Morales <[EMAIL PROTECTED]> wrote:
>
> On Wed, 2006-10-11 at 14:25 -0400, Brian Frappier wrote:
> > I tried all of the approaches below.
> >
> > The problem with:
> >
> > > x <- data.frame(matrix(NA,100,3))
> > > for (i in 2:ncol(DF)) x[,i-1] <- sample(rep(DF[,1], DF[,i]),100)
> > > if you want result in data frame
> > > or
> > > x<-vector("list", 3)
> > > for (i in 2:ncol(DF)) x[[i-1]] <- sample(rep(DF[,1], DF[,i]),100)
> >
> > is that this code still samples the rows, not the elements, i.e. returns 100
> > or 300 in the matrix cells instead of "red" or a matrix of counts by color
> > (object type) like:
> >
> >        x1   x2   x3
> > red    32    5   60
> > gr     68   95   40
> > sum   100  100  100
> >
> > It looks like Tony is right: sampling without replacement requires listing
> > of all elements to be sampled.
> >
> <snip>
>
> How about the following approach which generates a new sample using the
> rMultinom function from Hmisc.
>
> library(Hmisc)
>
> data <- matrix(c(400, 300, 2500, 100, 25, 200, 300, 1000, 500),
>                nrow=3, byrow=TRUE)
>
> col.sums <- apply(data,2,sum)
>
> probs <- t(data)/col.sums
>
> w <- rMultinom(probs,100)
>
> apply(w, 1, table)
>
> Note that I replaced the zero in your example data set with 25 because
> the table function doesn't seem to output the results nicely when there
> are zero values.
>
> HTH,
>
> Manuel
>
> >
> > > On 10/11/06, Tony Plate <[EMAIL PROTECTED]> wrote:
> > >
> > > Here's a way using apply(), and the prob= argument of sample():
> > >
> > > > df <- data.frame(sample1=c(red=400,green=100,black=300),
> > >                    sample2=c(300,0,1000), sample3=c(2500,200,500))
> > > > df
> > >       sample1 sample2 sample3
> > > red       400     300    2500
> > > green     100       0     200
> > > black     300    1000     500
> > > > set.seed(1)
> > > > apply(df, 2, function(counts) sample(seq(along=counts), rep=T,
> > >                                        size=7, prob=counts))
> > >      sample1 sample2 sample3
> > > [1,]       1       3       1
> > > [2,]       1       3       1
> > > [3,]       3       3       1
> > > [4,]       2       3       2
> > > [5,]       1       3       1
> > > [6,]       2       3       1
> > > [7,]       2       3       3
> > > >
> > >
> > > Note that this does sampling WITH replacement.
> > > AFAIK, sampling without replacement requires enumerating the entire
> > > population to be sampled from. I.e., you cannot do
> > > > sample(1:3, prob=1:3, rep=F, size=4)
> > > instead of
> > > > sample(c(1,2,2,3,3,3), rep=F, size=4)
> > >
> > > -- Tony Plate
> > >
> > > From reading ?sample, I was a little unclear on whether sampling
> > > without replacement could work
> > >
> > > Petr Pikal wrote:
> > > > Hi
> > > >
> > > > a little bit different story. But
> > > >
> > > > x1 <- sample(c(rep("red",400),rep("green", 100),
> > > >                rep("black",300)),100)
> > > >
> > > > is maybe close. With data frame (if it is not big)
> > > >
> > > >>DF
> > > >
> > > >   color sample1 sample2 sample3
> > > > 1   red     400     300    2500
> > > > 2 green     100       0     200
> > > > 3 black     300    1000     500
> > > >
> > > > x <- data.frame(matrix(NA,100,3))
> > > > for (i in 2:ncol(DF)) x[,i-1] <- sample(rep(DF[,1], DF[,i]),100)
> > > > if you want result in data frame
> > > > or
> > > > x<-vector("list", 3)
> > > > for (i in 2:ncol(DF)) x[[i-1]] <- sample(rep(DF[,1], DF[,i]),100)
> > > >
> > > > if you want it in list. Maybe somebody is clever enough to discard
> > > > for loop but you said you have 80 columns which shall be no problem.
> > > >
> > > > HTH
> > > > Petr
> > > >
> > > > On 11 Oct 2006 at 10:11, Brian Frappier wrote:
> > > >
> > > > Date sent: Wed, 11 Oct 2006 10:11:33 -0400
> > > > From: "Brian Frappier" <[EMAIL PROTECTED]>
> > > > To: "Petr Pikal" <[EMAIL PROTECTED]>
> > > > Subject: Fwd: [R] rarefy a matrix of counts
> > > >
> > > >>---------- Forwarded message ----------
> > > >>From: Brian Frappier <[EMAIL PROTECTED]>
> > > >>Date: Oct 11, 2006 10:10 AM
> > > >>Subject: Re: [R] rarefy a matrix of counts
> > > >>To: r-help@stat.math.ethz.ch
> > > >>
> > > >>Hi Petr,
> > > >>
> > > >>Thanks for your response. I have data that looks like the following:
> > > >>
> > > >>             sample 1  sample 2  sample 3 ....
> > > >>red candy       400       300      2500
> > > >>green candy     100         0       200
> > > >>black candy     300      1000       500
> > > >>
> > > >>I don't want to randomly select either the samples (columns) or the
> > > >>"candy" types (rows), which sample as you state would allow me.
> > > >>Instead, I want to randomly sample 100 candies from each sample and
> > > >>retain info on their associated type. I could make a list of all the
> > > >>candies in each sample:
> > > >>
> > > >>sample 1
> > > >>red
> > > >>red
> > > >>red
> > > >>red
> > > >>green
> > > >>green
> > > >>black
> > > >>red
> > > >>black
> > > >>...
> > > >>
> > > >>and then randomly sample those rows. Repeat for each sample. But, I
> > > >>am not sure how to do that without a lot of loops, and am wondering if
> > > >>there is an easier way in R. Thanks! I should have laid this out in
> > > >>the first email...sorry.
> > > >>
> > > >>
> > > >>On 10/11/06, Petr Pikal <[EMAIL PROTECTED]> wrote:
> > > >>
> > > >>>Hi
> > > >>>
> > > >>>I am not experienced in Matlab and from your explanation I do not
> > > >>>understand what exactly you want. It seems that you want to randomly
> > > >>>choose a sample of 100 rows from your matrix, which can be achieved by
> > > >>>sample.
> > > >>>
> > > >>>DF<-data.frame(rnorm(100), 1:100, 101:200, 201:300)
> > > >>>DF[sample(1:100, 10),]
> > > >>>
> > > >>>If you want to do this several times, you need to save your result
> > > >>>and then it depends on what you want to do next. One suitable form
> > > >>>is a list of matrices, the other is an array, and you can use a for
> > > >>>loop for completing it.
> > > >>>
> > > >>>HTH
> > > >>>Petr
> > > >>>
> > > >>>
> > > >>>On 10 Oct 2006 at 17:40, Brian Frappier wrote:
> > > >>>
> > > >>>Date sent: Tue, 10 Oct 2006 17:40:47 -0400
> > > >>>From: "Brian Frappier" <[EMAIL PROTECTED]>
> > > >>>To: r-help@stat.math.ethz.ch
> > > >>>Subject: [R] rarefy a matrix of counts
> > > >>>
> > > >>>
> > > >>>>Hi all,
> > > >>>>
> > > >>>>I have a matrix of counts for objects (rows) by samples (columns).
> > > >>>>I aimed for about 500 counts in each sample (I have about 80
> > > >>>>samples) and would now like to rarefy these down to 100 counts in
> > > >>>>each sample using simple random sampling without replacement. I
> > > >>>>plan on rarefying several times for each sample. I could do the
> > > >>>>tedious looping task of making a list of all objects (with its
> > > >>>>associated identifier) in each sample and then use the wonderful
> > > >>>>"sampling" package to select a sub-sample of 100 for each sample
> > > >>>>and thereby get a logical vector of inclusions. I would then
> > > >>>>regroup the resulting logical vector into a vector of counts by
> > > >>>>object, rinse and repeat several times for each sample.
> > > >>>>
> > > >>>>Alternately, using the same list, I could create a random index of
> > > >>>>integers between 1 and the number of objects for a sample (without
> > > >>>>repeats) and then select those objects from the list. Again,
> > > >>>>rinse and repeat several times for each sample.
> > > >>>>
> > > >>>>Is there a way to directly rarefy a matrix of counts without
> > > >>>>having to create a list of objects first?
> > > >>>>I am trying to switch
> > > >>>>to R from Matlab and am trying to pick up good programming habits
> > > >>>>from the start.
> > > >>>>
> > > >>>>Much appreciation!
> > > >>>>
> > > >>>> [[alternative HTML version deleted]]
> > > >>>>
> > > >>>>______________________________________________
> > > >>>>R-help@stat.math.ethz.ch mailing list
> > > >>>>https://stat.ethz.ch/mailman/listinfo/r-help
> > > >>>>PLEASE do read the posting guide
> > > >>>>http://www.R-project.org/posting-guide.html and provide commented,
> > > >>>>minimal, self-contained, reproducible code.
> > > >>>
> > > >>>Petr Pikal
> > > >>>[EMAIL PROTECTED]
> > > >
> > > > Petr Pikal
> > > > [EMAIL PROTECTED]
>
> --
> Manuel A. Morales
> http://mutualism.williams.edu