Here's the script I wrote to randomly sample without replacement from a csv file of counts for various object classes (columns) in 76 samples (rows):

The data file "common_macro_raw.csv" was:

SiteID,Scaling_factor,Collembola,Hydrachnida,Nematomorpha,Oligochaeta,Turbellaria,Glossiphonidae,Hirudinidae,Gammaridae,Asellidae,Baetidae,Ephemerellidae,Ephemeridae,Heptageniidae,Leptophlebiidae,Siphlonuridae,Chloroperlidae,Leuctridae,Nemouridae,Peltoperlidae,Perlidae,Perlolidae,Pteronarcyidae,Brachycentridae,Glossosomatidae,Hydropsychidae,Hydroptilidae,Lepidostomatidae,Leptoceridae,Limnephilidae,Molannidae,Odontoceridae,Philopotamidae,Phryganeidae,Polycentropidae,Rhyacophilidae,Uenoidae,Corixidae,Corydalidae,Sialidae,Chrysolmelidae,Dytiscidae,Elmidae,Psephenidae,Athericidae,Blepharicidae,Ceratopogonidae,Chironomidae,Dixidae,Empididae,Psychodidae,Simuliidae,Strationyidae,Tabanidae,Tipulidae,Aeshnidae,Calopterygidae,Cordulegastridae,Gomphidae,Libellulidae,Pyralidae,Planorbidae,Sphaeridae
1100291,1,1,3,2,2,0,0,0,0,0,4,66,1,2,11,1,10,21,0,0,0,1,0,0,1,0,0,3,0,0,0,3,0,0,0,8,0,0,0,1,0,0,71,0,1,0,5,121,0,1,0,2,0,0,15,0,0,0,0,0,0,1,12
2400143,1.88 ,0,0,0,25,0,0,0,0,0,6,8,0,17,3,0,11,9,1,6,0,1,3,0,4,0,0,1,0,0,0,4,38,0,0,8,2,0,0,0,0,11,25,0,1,0,2,29,0,0,0,22,0,0,8,0,0,2,5,0,0,0,0
2500364,1,0,4,0,6,0,0,0,0,0,66,0,0,63,0,0,55,14,3,0,0,0,0,0,4,0,0,1,0,2,0,0,11,0,0,18,0,0,0,0,0,0,2,0,2,0,0,86,0,0,0,9,0,0,10,0,0,0,0,0,1,0,0
2600075,1,0,1,0,15,0,0,0,0,0...etc
The program requires two loops, but took less than a second to run on my 1.8 GHz machine:

# Reads matrix of raw macroinvertebrate counts from the subsampling prior to
# large-rare search and scaling for sub-sampling effort
rm(list=ls())
library(stats)
master_data = read.csv("common_macro_raw.csv", row.names=1)
counts = master_data[,2:ncol(master_data)]

# These loops will extract a stream's assemblage, create a list of buggies identified,
# take a random sample of 100 buggies without replacement, and then re-combine the
# resulting list into a vector of counts by taxa
taxa_codes = c(1:ncol(counts))  # a sequential integer for each taxon that will be the index for the subsequent lists
rarified_samples = numeric()
for (x in 1:nrow(counts)) {
  temp_counts = counts[x,]
  full_list = rep(taxa_codes, times=temp_counts)
  stream_rand = sum(temp_counts)/100*master_data[x,1]  # puts new scaling factor in first column of stream_rand
  rare_list = sample(full_list, 100)
  for (i in 1:ncol(counts)) {
    temp_sum = sum(rare_list==i)
    stream_rand = c(stream_rand, temp_sum)
  }
  rarified_samples = rbind(rarified_samples, stream_rand)
}
# SiteID is consumed as the row names by row.names=1 above, so take it from there
rownames(rarified_samples) = rownames(master_data)
colnames(rarified_samples) = colnames(master_data)
write.csv(rarified_samples, file = "rarified_samples.csv")

You could add another for loop that appends as many iterations as needed to the output file.

Thanks for all of your input, it helped tremendously.
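A more compact variant of the script above is possible; the following is only a rough sketch under stated assumptions, not a tested replacement. It keeps the same idea (enumerate every individual with rep(), draw 100 without replacement with sample, then count per taxon) but replaces the inner counting loop with tabulate(). The names rarefy_counts, toy and n_iter are hypothetical, and the toy table just reuses the candy counts quoted later in this thread, with samples as rows.

```r
# Hypothetical helper (not part of the original script): rarefy one vector of
# counts down to `size` individuals by simple random sampling without replacement.
rarefy_counts <- function(counts, size = 100) {
  counts <- as.integer(counts)
  if (sum(counts) < size) stop("fewer than 'size' individuals in this sample")
  # Enumerate every individual by its taxon index, e.g. c(2, 0, 1) -> 1 1 3
  individuals <- rep(seq_along(counts), times = counts)
  # Draw positions rather than values, so a length-one vector is never
  # misread by sample() as the range 1:n
  picked <- individuals[sample.int(length(individuals), size)]
  # Count how many picked individuals fall in each taxon (zeros are kept)
  tabulate(picked, nbins = length(counts))
}

# Toy data standing in for the real count table (samples as rows, taxa as columns)
toy <- rbind(siteA = c(400, 100, 300),
             siteB = c(300,   0, 1000),
             siteC = c(2500, 200, 500))
colnames(toy) <- c("red", "green", "black")

set.seed(1)
# apply() returns taxa x samples, so transpose back to samples x taxa
rarefied <- t(apply(toy, 1, rarefy_counts, size = 100))
colnames(rarefied) <- colnames(toy)

# Several independent rarefactions: an array of samples x taxa x iterations
n_iter <- 5
repeats <- replicate(n_iter, t(apply(toy, 1, rarefy_counts, size = 100)))
```

Each row of rarefied should sum to 100, and repeats[, , k] holds the k-th rarefaction. The scaling-factor column from the original script is left out here to keep the sketch short.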
On 10/12/06, Petr Pikal <[EMAIL PROTECTED]> wrote:
>
> Hi
>
> On 11 Oct 2006 at 12:54, Tony Plate wrote:
>
> Date sent: Wed, 11 Oct 2006 12:54:44 -0600
> From: Tony Plate <[EMAIL PROTECTED]>
> To: Brian Frappier <[EMAIL PROTECTED]>
> Copies to: Petr Pikal <[EMAIL PROTECTED]>, r-help@stat.math.ethz.ch
> Subject: Re: [R] Fwd: rarefy a matrix of counts
>
> > Two things to note:
> >
> > (1) rep() can be vectorized:
> > > rep(1:3, 2:4)
> > [1] 1 1 2 2 2 3 3 3 3
> > >
> >
> > (2) you will likely get much better performance if you work with
> > integers and convert to strings after sampling (or use factors), e.g.:
>
> that is what I actually used in my suggestion (I hope).
>
> > DF
>   color sample1 sample2 sample3
> 1   red     400     300    2500
> 2 green     100       0     200
> 3 black     300    1000     500
>
> notice that red, green, black is not **row names** but a column in
> data frame.
> That is why following code gives red, green, etc.
>
> x <- data.frame(matrix(NA,100,3))
> for (i in 2:ncol(DF)) x[,i-1] <- sample(rep(DF[,1], DF[,i]),100)
> if you want result in data frame
> or
> x<-vector("list", 3)
> for (i in 2:ncol(DF)) x[[i-1]] <- sample(rep(DF[,1], DF[,i]),100)
>
> > > c("red","green","blue")[sample(rep(1:3,c(400,100,300)), 5)]
> > [1] "red" "blue" "red" "red" "red"
> > >
> >
> > -- Tony Plate
> >
> > <snip>
> >
> > > is that this code still samples the rows, not the elements, i.e.
>
> No, see above.
>
> > > returns 100 or 300 in the matrix cells instead of "red" or a matrix
> > > of counts by color (object type) like:
> > >      x1  x2  x3
> > > red  32   5  60
> > > gr   68  95  40
> > > sum 100 100 100
>
> something like
>
> sapply(x,table)
>       X1 X2 X3
> black 36 79 15
> green 14  0  9
> red   50 21 76
>
> HTH
> Petr
>
> > >
> > > It looks like Tony is right: sampling without replacement requires
> > > listing of all elements to be sampled.
> > > But, the code Petr provided
> > >
> > > x1 <- sample(c(rep("red",400),rep("green", 100),rep("black",300)),100)
> > >
> > > did give me a clue of how to quickly make such a list using the
> > > 'rep' command. I will for-loop a rep statement using my original
> > > matrix to create a list of elements for each sample:
> > >
> > > Thanks Petr and Tony for your help!
> > >
> > > On 10/11/06, *Tony Plate* <[EMAIL PROTECTED]> wrote:
> > >
> > > Here's a way using apply(), and the prob= argument of sample():
> > >
> > > > df <- data.frame(sample1=c(red=400,green=100,black=300),
> > >     sample2=c(300,0,1000), sample3=c(2500,200,500))
> > > > df
> > >       sample1 sample2 sample3
> > > red       400     300    2500
> > > green     100       0     200
> > > black     300    1000     500
> > > > set.seed(1)
> > > > apply(df, 2, function(counts) sample(seq(along=counts), rep=T,
> > >     size=7, prob=counts))
> > >      sample1 sample2 sample3
> > > [1,]       1       3       1
> > > [2,]       1       3       1
> > > [3,]       3       3       1
> > > [4,]       2       3       2
> > > [5,]       1       3       1
> > > [6,]       2       3       1
> > > [7,]       2       3       3
> > > >
> > >
> > > Note that this does sampling WITH replacement.
> > > AFAIK, sampling without replacement requires enumerating the
> > > entire population to be sampled from. I.e., you cannot do
> > > > sample(1:3, prob=1:3, rep=F, size=4)
> > > instead of
> > > > sample(c(1,2,2,3,3,3), rep=F, size=4)
> > >
> > > -- Tony Plate
> > >
> > > From reading ?sample, I was a little unclear on whether sampling
> > > without replacement could work
> > >
> > > Petr Pikal wrote:
> > > > Hi
> > > >
> > > > a little bit different story. But
> > > >
> > > > x1 <- sample(c(rep("red",400),rep("green", 100),
> > > >     rep("black",300)),100)
> > > >
> > > > is maybe close.
> > > > With data frame (if it is not big)
> > > >
> > > >>DF
> > > >
> > > >   color sample1 sample2 sample3
> > > > 1   red     400     300    2500
> > > > 2 green     100       0     200
> > > > 3 black     300    1000     500
> > > >
> > > > x <- data.frame(matrix(NA,100,3))
> > > > for (i in 2:ncol(DF)) x[,i-1] <- sample(rep(DF[,1], DF[,i]),100)
> > > >
> > > > if you want result in data frame or
> > > >
> > > > x<-vector("list", 3)
> > > > for (i in 2:ncol(DF)) x[[i-1]] <- sample(rep(DF[,1], DF[,i]),100)
> > > >
> > > > if you want it in list. Maybe somebody is clever enough to
> > > > discard for loop but you said you have 80 columns which shall
> > > > be no problem.
> > > >
> > > > HTH
> > > > Petr
> > > >
> > > > On 11 Oct 2006 at 10:11, Brian Frappier wrote:
> > > >
> > > > Date sent: Wed, 11 Oct 2006 10:11:33 -0400
> > > > From: "Brian Frappier" <[EMAIL PROTECTED]>
> > > > To: "Petr Pikal" <[EMAIL PROTECTED]>
> > > > Subject: Fwd: [R] rarefy a matrix of counts
> > > >
> > > >>---------- Forwarded message ----------
> > > >>From: Brian Frappier <[EMAIL PROTECTED]>
> > > >>Date: Oct 11, 2006 10:10 AM
> > > >>Subject: Re: [R] rarefy a matrix of counts
> > > >>To: r-help@stat.math.ethz.ch
> > > >>
> > > >>Hi Petr,
> > > >>
> > > >>Thanks for your response. I have data that looks like the following:
> > > >>
> > > >>             sample 1  sample 2  sample 3  ....
> > > >>red candy         400       300      2500
> > > >>green candy       100         0       200
> > > >>black candy       300      1000       500
> > > >>
> > > >>I don't want to randomly select either the samples (columns)
> > > >>or the "candy" types (rows), which sample as you state would
> > > >>allow me. Instead, I want to randomly sample 100 candies from
> > > >>each sample and retain info on their associated type.
> > > >>I could make a list of all the candies in each sample:
> > > >>
> > > >>sample 1
> > > >>red
> > > >>red
> > > >>red
> > > >>red
> > > >>green
> > > >>green
> > > >>black
> > > >>red
> > > >>black
> > > >>...
> > > >>
> > > >>and then randomly sample those rows. Repeat for each sample. But, I
> > > >>am not sure how to do that without a lot of loops, and am
> > > >>wondering if there is an easier way in R. Thanks! I should
> > > >>have laid this out in the first email...sorry.
> > > >>
> > > >>
> > > >>On 10/11/06, Petr Pikal <[EMAIL PROTECTED]> wrote:
> > > >>
> > > >>>Hi
> > > >>>
> > > >>>I am not experienced in Matlab and from your explanation I
> > > >>>do not understand what exactly do you want. It seems that
> > > >>>you want randomly choose a sample of 100 rows from your
> > > >>>matrix, what can be achieved by sample.
> > > >>>
> > > >>>DF<- data.frame(rnorm(100), 1:100, 101:200, 201:300)
> > > >>>DF[sample(1:100, 10),]
> > > >>>
> > > >>>If you want to do this several times, you need to save your
> > > >>>result and then it depends on what you want to do next. One
> > > >>>suitable form is list of matrices the other is array and you
> > > >>>can use for loop for completing it.
> > > >>>
> > > >>>HTH
> > > >>>Petr
> > > >>>
> > > >>>
> > > >>>On 10 Oct 2006 at 17:40, Brian Frappier wrote:
> > > >>>
> > > >>>Date sent: Tue, 10 Oct 2006 17:40:47 -0400
> > > >>>From: "Brian Frappier" <[EMAIL PROTECTED]>
> > > >>>To: r-help@stat.math.ethz.ch
> > > >>>Subject: [R] rarefy a matrix of counts
> > > >>>
> > > >>>
> > > >>>>Hi all,
> > > >>>>
> > > >>>>I have a matrix of counts for objects (rows) by samples
> > > >>>>(columns).
> > > >>>>I aimed for about 500 counts in each sample (I have about
> > > >>>>80 samples) and would now like to rarefy these down to 100
> > > >>>>counts in each sample using simple random sampling without
> > > >>>>replacement. I plan on rarefying several times for each
> > > >>>>sample. I could do the tedious looping task of making a
> > > >>>>list of all objects (with its associated identifier) in
> > > >>>>each sample and then use the wonderful "sampling" package
> > > >>>>to select a sub-sample of 100 for each sample and thereby
> > > >>>>get a logical vector of inclusions. I would then regroup
> > > >>>>the resulting logical vector into a vector of counts by
> > > >>>>object, rinse and repeat several times for each sample.
> > > >>>>
> > > >>>>Alternately, using the same list, I could create a random
> > > >>>>index of integers between 1 and the number of objects for a
> > > >>>>sample (without repeats) and then select those objects from
> > > >>>>the list. Again, rinse and repeat several times for each
> > > >>>>sample.
> > > >>>>
> > > >>>>Is there a way to directly rarefy a matrix of counts
> > > >>>>without having to create a list of objects first? I am
> > > >>>>trying to switch to R from Matlab and am trying to pick up
> > > >>>>good programming habits from the start.
> > > >>>>
> > > >>>>Much appreciation!
> > > >>>>
> > > >>>> [[alternative HTML version deleted]]
> > > >>>>
> > > >>>>______________________________________________
> > > >>>>R-help@stat.math.ethz.ch mailing list
> > > >>>>https://stat.ethz.ch/mailman/listinfo/r-help
> > > >>>>PLEASE do read the posting guide
> > > >>>>http://www.R-project.org/posting-guide.html and provide
> > > >>>>commented, minimal, self-contained, reproducible code.