[R] random section of samples based on group membership
Hi all,

I have a matrix of 474 rows (samples) and 565 columns (variables). Each of the 474 samples belongs to one of 120 groups, and the group membership is stored as a column in the matrix. For example, the group column would be:

1 1 1 2 2 2 . . . 120 120

I want to randomly select one sample from each group. Not all groups have the same number of samples; some have 4, some 3, etc. Is there a function to do this, or would I need to write a looping statement to look at each successive group?

I basically want to combine the randomly selected samples from the 120 groups into a new matrix in order to perform a cluster analysis.

Thanks,
Wade

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] random section of samples based on group membership
Dear Wade,

Say that your groups are

  groups <- sort(sample(1:10, 100, replace = TRUE))

Create a dummy vector of row numbers

  rows <- 1:length(groups)

Then

  tapply(rows, groups, function(x) sample(x, 1))

does the trick to select the row numbers you need for your sampling.

Sincerely,

Carlos J. Gil Bellosta
http://www.datanalytics.com
http://www.data-mining-blog.com
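The tapply() call above returns row numbers; the remaining step is to use them to subset the matrix. Below is a minimal sketch on a small made-up matrix `m` standing in for Wade's data (the names `m`, `pick_one`, `picked`, `sub` and the column name "group" are hypothetical). One caveat, documented in ?sample: sample(x, 1) on a numeric x of length one draws from 1:x rather than returning x, so a group with a single member could yield a row belonging to another group. Drawing a position with sample.int() sidesteps that.

```r
# Hypothetical stand-in for Wade's data: 10 samples, 3 variables, with the
# group labels in a column named "group" (group 3 has a single member).
set.seed(1)
m <- cbind(matrix(round(runif(30), 2), nrow = 10),
           group = c(1, 1, 1, 2, 2, 3, 4, 4, 4, 4))

# Pick one row index per group. Indexing by a random position avoids the
# sample(x, 1) pitfall for single-member groups, where sample(6, 1) would
# draw from 1:6 instead of returning 6.
pick_one <- function(idx) idx[sample.int(length(idx), 1)]
rows <- seq_len(nrow(m))
picked <- as.integer(tapply(rows, m[, "group"], pick_one))

# Subset the original matrix: one randomly chosen row per group,
# ordered by group.
sub <- m[picked, , drop = FALSE]
print(sub)
```

The result `sub` is an ordinary matrix, so it can be passed straight to a clustering function such as dist() followed by hclust().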
Re: [R] random section of samples based on group membership
Well, how you do it might be a matter of taste with respect to how you want the results. You could try using by() together with sample() (here the group labels are assumed to be in column 3 of x):

  by(x, x[,3], function(y){y[sample(nrow(y),1),]})

This returns a list-like "by" object with one element for each sample group. You can then combine the list back into a matrix.

That's my naive solution; no doubt there will be half a dozen better ways to go about it. Also, some of the clustering functions I have seen will sample for you.

--
Regards,

Mike Nielsen
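A minimal sketch of the combining step, assuming (as in the by() call above) that the group labels sit in column 3. The toy matrix `x` and the names `per_group` and `picked` are made up for illustration. by() converts the matrix to a data frame and hands each group's rows to the function, so do.call(rbind, ...) stacks the one-row results and as.matrix() turns the all-numeric data frame back into a matrix.

```r
# Hypothetical toy data: two variables plus a group column in position 3.
set.seed(2)
x <- cbind(v1 = round(rnorm(9), 2), v2 = round(rnorm(9), 2),
           group = c(1, 1, 1, 2, 2, 3, 3, 3, 3))

# One randomly chosen row (as a one-row data frame) per group.
per_group <- by(x, x[, 3], function(y) y[sample(nrow(y), 1), ])

# Stack the per-group rows and convert back to a numeric matrix.
picked <- as.matrix(do.call(rbind, per_group))
print(picked)
```

Because the function samples a row position with sample(nrow(y), 1) rather than sampling row labels, a group containing a single row is handled correctly.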
Re: [R] random section of samples based on group membership
On Mon, 24 Jul 2006 11:18:10 -0400, Wade Wall [EMAIL PROTECTED] wrote:

> Is there a function to do this, or would I need to write a looping statement to look at each successive group?

I use the following for that (some of it hacked from help(sample)):

  ## Like sample(), but a length-one x is returned as-is instead of
  ## being treated as sample(1:x).
  .resample <- function(x, size, ...) {
    if (length(x) <= 1) {
      if (!missing(size) && size == 0) x[FALSE] else x
    } else sample(x, size, ...)
  }

  ## Return 'size' randomly chosen rows of x within each level of 'by'
  ## (selected rows keep their original order).
  randpick <- function(x, by, size = 1, ...) {
    nx <- seq(nrow(x))
    ind <- unlist(tapply(nx, by, .resample, size, ...))
    x[nx %in% ind, ]
  }

So, for instance:

R> randpick(Indometh, Indometh$Subject, 3)
   Subject time conc
2        1 0.50 0.94
7        1 3.00 0.12
11       1 8.00 0.05
15       2 1.00 0.70
16       2 1.25 0.64
19       2 4.00 0.20
25       3 0.75 1.16
29       3 3.00 0.22
32       3 6.00 0.08
34       4 0.25 1.85
43       4 6.00 0.07
44       4 8.00 0.07
48       5 1.00 0.39
54       5 6.00 0.10
55       5 8.00 0.06
58       6 0.75 1.03
64       6 5.00 0.13
65       6 6.00 0.10

R> randpick(Indometh, Indometh$Subject, 2)
   Subject time conc
8        1 4.00 0.11
10       1 6.00 0.07
14       2 0.75 0.71
20       2 5.00 0.25
23       3 0.25 2.72
28       3 2.00 0.39
39       4 2.00 0.40
43       4 6.00 0.07
48       5 1.00 0.39
52       5 4.00 0.11
57       6 0.50 1.44
66       6 8.00 0.09

The 'by' argument allows sampling within any combination of factors desired.

Cheers,
--
Seb
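Since randpick() passes `by` straight through to tapply() as its INDEX, a list of factors works there as well, and each combination of levels becomes its own sampling stratum. The sketch below shows the underlying tapply() call on R's built-in warpbreaks data (factors wool and tension), picking one row per wool/tension combination; the names `idx` and `picked` are illustrative only.

```r
# Built-in warpbreaks data: wool (A, B) crossed with tension (L, M, H).
set.seed(3)

# With a list of factors, tapply() returns a 2 x 3 matrix holding one
# randomly chosen row index for each wool/tension cell.
idx <- tapply(seq_len(nrow(warpbreaks)),
              list(warpbreaks$wool, warpbreaks$tension),
              function(i) i[sample.int(length(i), 1)])

# Keep the selected rows in their original order.
picked <- warpbreaks[sort(as.vector(idx)), ]
print(picked)
```

With randpick() defined as above, randpick(warpbreaks, list(warpbreaks$wool, warpbreaks$tension)) should select rows the same way, since unlist() leaves the index matrix unchanged and %in% then matches it against the row numbers.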