[R] random section of samples based on group membership

2006-07-24 Thread Wade Wall
Hi all,

I have a matrix of 474 rows (samples) with 565 columns (variables).
Each of the 474 samples belongs to one of 120 groups, with the
groupings as a column in the above matrix. For example, the group
column would be:

1
1
1
2
2
2
.
.
.
120
120

I want to randomly select one sample from each group. Not all the groups
have the same number of samples; some have 4, some 3, etc. Is there a
function to do this, or would I need to write a looping statement to
look at each successive group?

I basically want to combine the randomly selected samples from the 120
groups into a new matrix in order to perform a cluster analysis.

Thanks,
Wade

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] random section of samples based on group membership

2006-07-24 Thread Carlos J. Gil Bellosta
Dear Wade,

Say that your groups are

groups <- sort(sample(1:10, 100, replace = TRUE))

Create a dummy

rows <- 1:length(groups)

Then

tapply( rows, groups, function(x) sample(x, 1))

does the trick to select the row numbers you need for your sampling.
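
To go from those row numbers to the sampled matrix, the tapply() result can
be used directly as a row index. What follows is only a minimal sketch on
made-up data: the matrix x, the helper pick_one and the objects picked and
x_sub are illustrative names, not part of the original question. One caveat
worth guarding against: sample(x, 1) treats a single number x as 1:x, so a
group with only one member could otherwise return a row from another group.

```r
set.seed(1)  # only to make the illustration repeatable

# Hypothetical data: 12 samples in 5 groups of unequal size, 3 variables
groups <- c(1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 5, 5)
x <- cbind(group = groups, matrix(rnorm(length(groups) * 3), ncol = 3))

rows <- seq_along(groups)

# sample(r, 1) would draw from 1:r when r is a single number,
# so a lone row number is returned unchanged
pick_one <- function(r) if (length(r) == 1) r else sample(r, 1)

picked <- tapply(rows, groups, pick_one)

# One randomly chosen row per group, collected into a new matrix
x_sub <- x[picked, , drop = FALSE]
```

Here x_sub has one row per group, in group order, and can be passed on to
the clustering step.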

Sincerely,

Carlos J. Gil Bellosta
http://www.datanalytics.com
http://www.data-mining-blog.com




Re: [R] random section of samples based on group membership

2006-07-24 Thread Mike Nielsen
Well, how you do it might be a matter of taste with respect to how you
want the results.

You could try using by() with sample():

by(x,x[,3],function(y){y[sample(nrow(y),1),]})

This will return a list with one list element for each sample group.
You can then combine the list back into a matrix.
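
One possible way to do that combining step is do.call() with rbind(). This
is a sketch on made-up data, assuming (as in the call above) that the group
label sits in column 3; the names x, picks and x_sub are illustrative only.

```r
set.seed(2)  # only to make the illustration repeatable

# Hypothetical data: two variables plus a group label in column 3
x <- cbind(a = rnorm(9), b = rnorm(9), grp = c(1, 1, 1, 2, 2, 3, 3, 3, 3))

# by() coerces a matrix to a data frame, so each element of the
# result is a one-row data frame
picks <- by(x, x[, 3], function(y) y[sample(nrow(y), 1), ])

# Stack the one-row pieces, then convert back to a numeric matrix
x_sub <- as.matrix(do.call(rbind, picks))
```

Because the index passed to sample() is nrow(y) rather than a vector of row
numbers, a group with a single row is handled correctly here.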

That's my naive solution; no doubt there will be half a dozen better
ways to go about it.

Also, some of the clustering functions I have seen will sample for you.






-- 
Regards,

Mike Nielsen



Re: [R] random section of samples based on group membership

2006-07-24 Thread Sebastian Luque
On Mon, 24 Jul 2006 11:18:10 -0400,
Wade Wall [EMAIL PROTECTED] wrote:

> Hi all, I have a matrix of 474 rows (samples) with 565 columns
> (variables).  each of the 474 samples belong to one of 120 groups, with
> the groupings as a column in the above matrix. For example, the group
> column would be:
>
> 1 1 1 2 2 2 .  .  .  120 120
>
> I want to randomly select one from each group.  Not all the groups have
> the same number of samples, some have 4, some 3 etc.  Is there a
> function to do this, or would I need to write a looping statement to
> look at each successive group?

I use the following for that (some of it hacked from help(sample)):

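## Safe wrapper around sample(): a vector of length 0 or 1 is returned
## as is instead of being expanded to 1:x
.resample <- function(x, size, ...) {
    if (length(x) <= 1) {
        if (!missing(size) && size == 0) x[FALSE] else x
    } else sample(x, size, ...)
}


## Pick 'size' random rows of 'x' within each level of 'by'
randpick <- function(x, by, size = 1, ...)
{
    nx <- seq(nrow(x))
    ind <- unlist(tapply(nx, by, .resample, size, ...))
    x[nx %in% ind, ]
}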


So, for instance:

R> randpick(Indometh, Indometh$Subject, 3)
   Subject time conc
2        1 0.50 0.94
7        1 3.00 0.12
11       1 8.00 0.05
15       2 1.00 0.70
16       2 1.25 0.64
19       2 4.00 0.20
25       3 0.75 1.16
29       3 3.00 0.22
32       3 6.00 0.08
34       4 0.25 1.85
43       4 6.00 0.07
44       4 8.00 0.07
48       5 1.00 0.39
54       5 6.00 0.10
55       5 8.00 0.06
58       6 0.75 1.03
64       6 5.00 0.13
65       6 6.00 0.10
R> randpick(Indometh, Indometh$Subject, 2)
   Subject time conc
8        1 4.00 0.11
10       1 6.00 0.07
14       2 0.75 0.71
20       2 5.00 0.25
23       3 0.25 2.72
28       3 2.00 0.39
39       4 2.00 0.40
43       4 6.00 0.07
48       5 1.00 0.39
52       5 4.00 0.11
57       6 0.50 1.44
66       6 8.00 0.09


The 'by' argument allows sampling within any combination of factors
desired.
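
As a sketch of what "any combination of factors" can look like, 'by' may be
given as a list of two factors. The example below repeats the two functions
from this post so that it runs on its own, and uses the built-in warpbreaks
data purely as an illustration (that choice of dataset is an assumption of
this example, not something from the thread).

```r
# Helper and sampler as in the post above
.resample <- function(x, size, ...) {
    if (length(x) <= 1) {
        if (!missing(size) && size == 0) x[FALSE] else x
    } else sample(x, size, ...)
}

randpick <- function(x, by, size = 1, ...) {
    nx <- seq(nrow(x))
    ind <- unlist(tapply(nx, by, .resample, size, ...))
    x[nx %in% ind, ]
}

set.seed(3)  # only to make the illustration repeatable

# warpbreaks has 9 rows for each of the 2 x 3 wool/tension combinations;
# draw 2 rows within every combination
res <- randpick(warpbreaks, list(warpbreaks$wool, warpbreaks$tension), 2)

# Cross-tabulate the draws to check the count per combination
table(res$wool, res$tension)
```

For the original question, the same call with the group column as 'by' and
the default size = 1 would return one row per group.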


Cheers,

-- 
Seb
