Two things to note:

(1) rep() can be vectorized:

> rep(1:3, 2:4)
[1] 1 1 2 2 2 3 3 3 3

(2) you will likely get much better performance if you work with
integers and convert to strings after sampling (or use factors), e.g.:

> c("red","green","blue")[sample(rep(1:3,c(400,100,300)), 5)]
[1] "red"  "blue" "red"  "red"  "red"

-- Tony Plate

Brian Frappier wrote:
> I tried all of the approaches below.
>
> The problem with:
>
> > x <- data.frame(matrix(NA,100,3))
> > for (i in 2:ncol(DF)) x[,i-1] <- sample(rep(DF[,1], DF[,i]),100)
> > if you want result in data frame
> > or
> > x<-vector("list", 3)
> > for (i in 2:ncol(DF)) x[[i-1]] <- sample(rep(DF[,1], DF[,i]),100)
>
> is that this code still samples the rows, not the elements, i.e. returns
> 100 or 300 in the matrix cells instead of "red" or a matrix of counts by
> color (object type) like:
>
>        x1   x2   x3
> red    32    5   60
> gr     68   95   40
> sum   100  100  100
>
> It looks like Tony is right: sampling without replacement requires
> listing all of the elements to be sampled. But the code Petr provided
>
> x1 <- sample(c(rep("red",400),rep("green", 100),rep("black",300)),100)
>
> did give me a clue of how to quickly make such a list using the 'rep'
> command. I will for-loop a rep statement using my original matrix to
> create a list of elements for each sample.
>
> Thanks Petr and Tony for your help!
>
> On 10/11/06, *Tony Plate* <[EMAIL PROTECTED]> wrote:
>
> Here's a way using apply(), and the prob= argument of sample():
>
> > df <- data.frame(sample1=c(red=400,green=100,black=300),
>     sample2=c(300,0,1000), sample3=c(2500,200,500))
> > df
>       sample1 sample2 sample3
> red       400     300    2500
> green     100       0     200
> black     300    1000     500
> > set.seed(1)
> > apply(df, 2, function(counts) sample(seq(along=counts), rep=T,
>     size=7, prob=counts))
>      sample1 sample2 sample3
> [1,]       1       3       1
> [2,]       1       3       1
> [3,]       3       3       1
> [4,]       2       3       2
> [5,]       1       3       1
> [6,]       2       3       1
> [7,]       2       3       3
> >
>
> Note that this does sampling WITH replacement.
> AFAIK, sampling without replacement requires enumerating the entire
> population to be sampled from. I.e., you cannot do
> > sample(1:3, prob=1:3, rep=F, size=4)
> instead of
> > sample(c(1,2,2,3,3,3), rep=F, size=4)
>
> -- Tony Plate
>
> From reading ?sample, I was a little unclear on whether sampling
> without replacement could work
>
> Petr Pikal wrote:
> > Hi
> >
> > a little bit different story. But
> >
> > x1 <- sample(c(rep("red",400),rep("green", 100),
> > rep("black",300)),100)
> >
> > is maybe close. With data frame (if it is not big)
> >
> >>DF
> >
> >   color sample1 sample2 sample3
> > 1   red     400     300    2500
> > 2 green     100       0     200
> > 3 black     300    1000     500
> >
> > x <- data.frame(matrix(NA,100,3))
> > for (i in 2:ncol(DF)) x[,i-1] <- sample(rep(DF[,1], DF[,i]),100)
> > if you want result in data frame
> > or
> > x<-vector("list", 3)
> > for (i in 2:ncol(DF)) x[[i-1]] <- sample(rep(DF[,1], DF[,i]),100)
> >
> > if you want it in list. Maybe somebody is clever enough to discard
> > for loop but you said you have 80 columns which shall be no problem.
> >
> > HTH
> > Petr
> >
> > On 11 Oct 2006 at 10:11, Brian Frappier wrote:
> >
> > Date sent: Wed, 11 Oct 2006 10:11:33 -0400
> > From: "Brian Frappier" <[EMAIL PROTECTED]>
> > To: "Petr Pikal" <[EMAIL PROTECTED]>
> > Subject: Fwd: [R] rarefy a matrix of counts
> >
> >>---------- Forwarded message ----------
> >>From: Brian Frappier <[EMAIL PROTECTED]>
> >>Date: Oct 11, 2006 10:10 AM
> >>Subject: Re: [R] rarefy a matrix of counts
> >>To: r-help@stat.math.ethz.ch
> >>
> >>Hi Petr,
> >>
> >>Thanks for your response. I have data that looks like the
> >>following:
> >>
> >>              sample 1  sample 2  sample 3 ....
> >>red candy          400       300      2500
> >>green candy        100         0       200
> >>black candy        300      1000       500
> >>
> >>I don't want to randomly select either the samples (columns) or the
> >>"candy" types (rows), which sample as you state would allow me.
> >>Instead, I want to randomly sample 100 candies from each sample and
> >>retain info on their associated type. I could make a list of all the
> >>candies in each sample:
> >>
> >>sample 1
> >>red
> >>red
> >>red
> >>red
> >>green
> >>green
> >>black
> >>red
> >>black
> >>...
> >>
> >>and then randomly sample those rows. Repeat for each sample. But I
> >>am not sure how to do that without a lot of loops, and am wondering if
> >>there is an easier way in R. Thanks! I should have laid this out in
> >>the first email...sorry.
> >>
> >>
> >>On 10/11/06, Petr Pikal <[EMAIL PROTECTED]> wrote:
> >>
> >>>Hi
> >>>
> >>>I am not experienced in Matlab and from your explanation I do not
> >>>understand what exactly you want. It seems that you want to randomly
> >>>choose a sample of 100 rows from your matrix, which can be achieved
> >>>by sample.
> >>>
> >>>DF<- data.frame(rnorm(100), 1:100, 101:200, 201:300)
> >>>DF[sample(1:100, 10),]
> >>>
> >>>If you want to do this several times, you need to save your result
> >>>and then it depends on what you want to do next. One suitable form
> >>>is a list of matrices, the other is an array, and you can use a for
> >>>loop for completing it.
> >>>
> >>>HTH
> >>>Petr
> >>>
> >>>
> >>>On 10 Oct 2006 at 17:40, Brian Frappier wrote:
> >>>
> >>>Date sent: Tue, 10 Oct 2006 17:40:47 -0400
> >>>From: "Brian Frappier" <[EMAIL PROTECTED]>
> >>>To: r-help@stat.math.ethz.ch
> >>>Subject: [R] rarefy a matrix of counts
> >>>
> >>>
> >>>>Hi all,
> >>>>
> >>>>I have a matrix of counts for objects (rows) by samples (columns).
> >>>>I aimed for about 500 counts in each sample (I have about 80
> >>>>samples) and would now like to rarefy these down to 100 counts in
> >>>>each sample using simple random sampling without replacement. I
> >>>>plan on rarefying several times for each sample. I could do the
> >>>>tedious looping task of making a list of all objects (with its
> >>>>associated identifier) in each sample and then use the wonderful
> >>>>"sampling" package to select a sub-sample of 100 for each sample
> >>>>and thereby get a logical vector of inclusions. I would then
> >>>>regroup the resulting logical vector into a vector of counts by
> >>>>object, rinse and repeat several times for each sample.
> >>>>
> >>>>Alternately, using the same list, I could create a random index of
> >>>>integers between 1 and the number of objects for a sample (without
> >>>>repeats) and then select those objects from the list. Again,
> >>>>rinse and repeat several times for each sample.
> >>>>
> >>>>Is there a way to directly rarefy a matrix of counts without
> >>>>having to create a list of objects first? I am trying to switch
> >>>>to R from Matlab and am trying to pick up good programming habits
> >>>>from the start.
> >>>>
> >>>>Much appreciation!
> >>>>
> >>>>______________________________________________
> >>>>R-help@stat.math.ethz.ch mailing list
> >>>>https://stat.ethz.ch/mailman/listinfo/r-help
> >>>>PLEASE do read the posting guide
> >>>>http://www.R-project.org/posting-guide.html and provide commented,
> >>>>minimal, self-contained, reproducible code.
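
For anyone finding this thread later, here is a minimal sketch of how the two notes at the top (vectorized rep() and sampling integers rather than strings) might be combined to rarefy every column of a count matrix without replacement. It is only an illustration under stated assumptions: the function name rarefy_counts and the object name counts are made up for this example, the data are the counts quoted in the thread, and each column is assumed to total at least the requested sample size.

```r
# Count table from the thread: rows are object types, columns are samples.
counts <- matrix(c(400, 100,  300,
                   300,   0, 1000,
                  2500, 200,  500),
                 nrow = 3,
                 dimnames = list(c("red", "green", "black"),
                                 c("sample1", "sample2", "sample3")))

# rarefy_counts() is a hypothetical helper, not an existing R function.
# For each column it enumerates the population as integer row indices,
# draws `size` of them without replacement, and tabulates the draws by row.
rarefy_counts <- function(counts, size) {
  # Assumption: every column holds at least `size` objects.
  stopifnot(all(colSums(counts) >= size))
  out <- vapply(seq_len(ncol(counts)), function(j) {
    n <- counts[, j]
    population <- rep(seq_along(n), n)   # vectorized rep(), integers only
    # sample.int() picks positions, so a length-1 population is handled safely
    drawn <- population[sample.int(length(population), size)]
    tabulate(drawn, nbins = length(n))   # counts per row, zeros included
  }, integer(nrow(counts)))
  matrix(out, nrow = nrow(counts), dimnames = dimnames(counts))
}

set.seed(1)
rarefied <- rarefy_counts(counts, 100)
rarefied
colSums(rarefied)   # every column sums to 100
```

The result is a matrix of counts by object type, in the same layout as the input, so the call can simply be repeated (for example inside replicate()) to rarefy several times. Enumerating the population is cheap at these sizes; for very large counts a different approach may be preferable.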