Re: [R] Splitting a data column randomly into 3 groups
Hi Bert and All: good morning

I promise this will be the last time I write about this topic. With your help, I came up with the R function below. It works for all sample sizes. I have also provided three simple examples.

with many thanks
abou

### Here it is ###

Random.Sample.IDs <- function(N, n, ngroups) {
  # N = population size, n = sample size, ngroups = number of groups
  # NOTE: the index arithmetic below is written for ngroups = 3 only
  population.IDs <- seq(1, N, by = 1)
  sample.IDs <- sample(population.IDs, n)

  # print sample.IDs in a column format
  sample.IDs.in.column <- data.frame(sample.IDs)
  print(sample.IDs.in.column)

  reminder.n <- n %% ngroups   # leftover items after equal division
  m <- n %/% 3                 # base group size
  s <- sample(1:n, n)          # random permutation of 1..n

  if (reminder.n == 0) {
    group1.IDs <- sample.IDs[s[1:m]]
    group2.IDs <- sample.IDs[s[(m+1):(2*m)]]
    group3.IDs <- sample.IDs[s[(m*2+1):(3*m)]]
  } else if (reminder.n == 1) {
    group1.IDs <- sample.IDs[s[1:(m+1)]]
    group2.IDs <- sample.IDs[s[(m+2):(2*m+1)]]
    group3.IDs <- sample.IDs[s[(m*2+2):(3*m+1)]]
  } else if (reminder.n == 2) {
    group1.IDs <- sample.IDs[s[1:(m+1)]]
    group2.IDs <- sample.IDs[s[(m+2):(2*m+2)]]
    group3.IDs <- sample.IDs[s[(m*2+3):(3*m+2)]]
  }

  # pad the shorter groups with NA so that cbind() gets equal-length columns
  nn <- max(length(group1.IDs), length(group2.IDs), length(group3.IDs))
  length(group1.IDs) <- nn
  length(group2.IDs) <- nn
  length(group3.IDs) <- nn

  groups.IDs <- cbind(group1.IDs, group2.IDs, group3.IDs)
  groups.IDs
}

# Examples
Random.Sample.IDs(100, 12, 3)  # group sizes are equal (n1 = n2 = n3 = 4)
Random.Sample.IDs(100, 13, 3)  # group sizes are NOT equal (n1 = 5, n2 = 4, n3 = 4)
Random.Sample.IDs(100, 17, 3)  # group sizes are NOT equal (n1 = 6, n2 = 6, n3 = 5)

__
*AbouEl-Makarim Aboueissa, PhD*
*Professor, Statistics and Data Science*
*Graduate Coordinator*
*Department of Mathematics and Statistics*
*University of Southern Maine*

On Sun, Sep 5, 2021 at 6:50 PM Bert Gunter wrote:

> In case anyone is still interested in my query, note that if there are
> n total items to be split into g groups as evenly as possible, if we
> define this as at most two different size groups whose size differs by
> 1, then:
>
> if n = k*g + r, where 0 <= r < g,
> then n = k*(g - r) + (k + 1)*r .
> i.e. g-r groups of size k and r groups of size k+1
>
> So using R's modular arithmetic operators, which are handy to know
> about, we have:
>
> r = n %% g and k = n %/% g .
>
> (and note that you should disregard my previous stupid remark about
> numerical analysis).
>
> Cheers,
> Bert
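Abou's function above branches on the remainder and is written for three groups. As a hedged sketch only, the same idea can be extended to any number of groups by using the k and r decomposition from Bert's quoted message. The function name `random_sample_groups` and the choice to return a list (rather than an NA-padded matrix) are assumptions made for illustration, not something posted in the thread.

```r
# Hypothetical generalisation: draw n IDs out of N, then cut them into
# ngroups blocks whose sizes differ by at most 1.
random_sample_groups <- function(N, n, ngroups) {
  stopifnot(n <= N, ngroups >= 1, ngroups <= n)
  sample.IDs <- sample(seq_len(N), n)   # step 1: n IDs, already in random order
  k <- n %/% ngroups                    # base group size
  r <- n %% ngroups                     # number of groups that get one extra ID
  sizes <- rep(c(k + 1, k), times = c(r, ngroups - r))
  labels <- rep(seq_len(ngroups), times = sizes)
  # consecutive blocks of a random permutation are themselves random groups
  split(sample.IDs, labels)
}

set.seed(1)                             # only to make the sketch reproducible
groups <- random_sample_groups(100, 17, 3)
lengths(groups)                         # group sizes: 6, 6, 5
```

Returning a list avoids the NA padding that `cbind()` needs when the groups have unequal lengths.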
Re: [R] Splitting a data column randomly into 3 groups
In case anyone is still interested in my query, note that if there are n total items to be split into g groups as evenly as possible, and we define this as at most two different group sizes that differ by 1, then:

if n = k*g + r, where 0 <= r < g,
then n = k*(g - r) + (k + 1)*r ,
i.e. g-r groups of size k and r groups of size k+1.

So using R's modular arithmetic operators, which are handy to know about, we have:

r = n %% g and k = n %/% g .

(and note that you should disregard my previous stupid remark about numerical analysis).

Cheers,
Bert

On Sat, Sep 4, 2021 at 3:34 PM Bert Gunter wrote:
>
> I have a more general problem for you.
>
> Given n items and 2 <= g < [...] groups that are "as equal as possible."
>
> First, operationally define "as equal as possible."
> Second, define the algorithm to carry out the definition. Hint: Note
> that sum{m[i]} for i <= g must sum to n, where m[i] is the number of
> items in the ith group.
> Third, write R code for the algorithm. Exercise for the reader.
>
> I may be wrong, but I think numerical analysts might also have a
> little fun here.
>
> Randomization, of course, is trivial.
>
> Cheers,
> Bert
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
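The size rule in Bert's message above (r = n %% g, k = n %/% g) can be checked against the three sample sizes from the thread. This is a small sketch; the helper name `group_sizes` is made up for illustration, and it puts the larger groups first, which is an arbitrary choice.

```r
# Sizes of g groups that are "as equal as possible" for n items:
# r groups of size k + 1 followed by g - r groups of size k.
group_sizes <- function(n, g) {
  k <- n %/% g   # integer quotient: the smaller group size
  r <- n %% g    # remainder: how many groups get one extra item
  c(rep(k + 1, r), rep(k, g - r))
}

group_sizes(204, 3)  # 68 68 68
group_sizes(112, 3)  # 38 37 37
group_sizes(284, 3)  # 95 95 94
```

The sizes always add up to n because k*(g - r) + (k + 1)*r = k*g + r = n.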
Re: [R] Splitting a data column randomly into 3 groups
Abou,

I believe I addressed this issue in a private message the other day. As a general rule, truncating can leave a remainder.

If
M = length(whatever)/3
then M is not necessarily an integer. Its fractional part can be .333... or .666... as well as 0.

Now R may silently truncate something like 100/3 when you use it as an index, as you seem to, and treat it as if you had typed 33. Same for 2*M.

In your code, you used integer division and that is a truncation too!

m1 <- n1 %/% 3
s1 <- sample(1:n1, n1)
group1.IDs <- sample1.IDs[s1[1:m1]]
group2.IDs <- sample1.IDs[s1[(m1+1):(2*m1)]]
group3.IDs <- sample1.IDs[s1[(m1*2+1):(3*m1)]]

A proper solution accounts for any leftover items. One method is to leave all extra items till the end and have:

MAX <- length(original or whatever)
group3.IDs <- sample1.IDs[s1[(m1*2+1):MAX]]

The last group then might have one or two extra items. Another is to go for a second sweep and take any leftover items and move one each into whatever groups you wish for some balance.

Or, as discussed, there are packages available that let you specify the percentages you want and that handle these edge cases too.
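One way to read the "leave all extra items till the end" suggestion above is sketched here for the n = 112 case from the thread. The variable names are shortened stand-ins for the poster's `sample2.IDs` and friends, and the seed is only there so the sketch is reproducible.

```r
set.seed(42)                      # reproducibility of the sketch only
sample.IDs <- sample(1:266, 112)  # stand-in for sample2.IDs
n <- length(sample.IDs)
m <- n %/% 3                      # 37; truncation leaves 1 item over
s <- sample(n)                    # random permutation of 1..n

group1 <- sample.IDs[s[1:m]]
group2 <- sample.IDs[s[(m + 1):(2 * m)]]
group3 <- sample.IDs[s[(2 * m + 1):n]]   # run to n instead of 3*m

c(length(group1), length(group2), length(group3))  # 37 37 38
```

With this variant the last group absorbs the whole remainder, so for three groups it can end up one or two items larger than the others.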
Re: [R] Splitting a data column randomly into 3 groups
I have a more general problem for you.

Given n items and 2 <= g < [...] groups that are "as equal as possible."

First, operationally define "as equal as possible."
Second, define the algorithm to carry out the definition. Hint: Note that sum{m[i]} for i <= g must sum to n, where m[i] is the number of items in the ith group.
Third, write R code for the algorithm. Exercise for the reader.

I may be wrong, but I think numerical analysts might also have a little fun here.

Randomization, of course, is trivial.

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Sat, Sep 4, 2021 at 2:13 PM AbouEl-Makarim Aboueissa wrote:
>
> Dear Thomas:
>
> Thank you very much for your input in this matter.
>
> The core part of this R code(s) (please see below) was written by
> *Richard O'Keefe*. I had three examples with different sample sizes.
>
> *First sample of size n1 = 204* divided randomly into three groups of
> sizes 68. *No problems with this one*.
>
> *The second sample of size n2 = 112* divided randomly into three groups
> of sizes 37, 37, and 38. BUT this R code generated three groups of equal
> sizes (37, 37, and 37). *How to fix the code to make sure that the output
> will be three groups of sizes 37, 37, and 38*.
>
> *The third sample of size n3 = 284* divided randomly into three groups
> of sizes 94, 95, and 95. BUT this R code generated three groups of equal
> sizes (94, 94, and 94). *Again*, *how to fix the code to make sure that
> the output will be three groups of sizes 94, 95, and 95*.
Re: [R] Splitting a data column randomly into 3 groups
Dear Thomas:

Thank you very much for your input in this matter.

The core part of this R code (please see below) was written by *Richard O'Keefe*. I had three examples with different sample sizes.

*First sample of size n1 = 204* divided randomly into three groups of size 68 each. *No problems with this one*.

*The second sample of size n2 = 112* should be divided randomly into three groups of sizes 37, 37, and 38. BUT this R code generated three groups of equal sizes (37, 37, and 37). *How to fix the code to make sure that the output will be three groups of sizes 37, 37, and 38*.

*The third sample of size n3 = 284* should be divided randomly into three groups of sizes 94, 95, and 95. BUT this R code generated three groups of equal sizes (94, 94, and 94). *Again*, *how to fix the code to make sure that the output will be three groups of sizes 94, 95, and 95*.

With many thanks
abou

### ------------------ #

N1 <- 485
population1.IDs <- seq(1, N1, by = 1)
population1.IDs

n1 <- 204   # in this case the size of each of the three groups = 68
sample1.IDs <- sample(population1.IDs, n1)
sample1.IDs

n1 <- length(sample1.IDs)

m1 <- n1 %/% 3
s1 <- sample(1:n1, n1)
group1.IDs <- sample1.IDs[s1[1:m1]]
group2.IDs <- sample1.IDs[s1[(m1+1):(2*m1)]]
group3.IDs <- sample1.IDs[s1[(m1*2+1):(3*m1)]]

groups.IDs <- cbind(group1.IDs, group2.IDs, group3.IDs)
groups.IDs

### ------------------

N2 <- 266
population2.IDs <- seq(1, N2, by = 1)
population2.IDs

n2 <- 112   # in this case the sizes of the three groups are (37, 37, and 38)
            # BUT this code generates three groups of equal sizes (37, 37, and 37)
sample2.IDs <- sample(population2.IDs, n2)
sample2.IDs

n2 <- length(sample2.IDs)

m2 <- n2 %/% 3
s2 <- sample(1:n2, n2)
group1.IDs <- sample2.IDs[s2[1:m2]]
group2.IDs <- sample2.IDs[s2[(m2+1):(2*m2)]]
group3.IDs <- sample2.IDs[s2[(m2*2+1):(3*m2)]]

groups.IDs <- cbind(group1.IDs, group2.IDs, group3.IDs)
groups.IDs

### ------------------

N3 <- 674
population3.IDs <- seq(1, N3, by = 1)
population3.IDs

n3 <- 284   # in this case the sizes of the three groups are (94, 95, and 95)
            # BUT this code generates three groups of equal sizes (94, 94, and 94)
sample3.IDs <- sample(population3.IDs, n3)
sample3.IDs

n3 <- length(sample3.IDs)

m3 <- n3 %/% 3
s3 <- sample(1:n3, n3)
group1.IDs <- sample3.IDs[s3[1:m3]]
group2.IDs <- sample3.IDs[s3[(m3+1):(2*m3)]]
group3.IDs <- sample3.IDs[s3[(m3*2+1):(3*m3)]]

groups.IDs <- cbind(group1.IDs, group2.IDs, group3.IDs)
groups.IDs

On Sat, Sep 4, 2021 at 11:54 AM Thomas Subia wrote:

> Abou,
>
> I’ve been following your question on how to split a data column randomly
> into 3 groups using R.
>
> My method may not be amenable to a large set of data, but it is surely
> worth considering since it makes sense intuitively.
>
> mydata <- LETTERS[1:11]
>
> > mydata
> [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K"
>
> # Let’s choose a random sample of size 4 from mydata
>
> > random_grp1
> [1] "J" "H" "D" "A"
>
> Now my next random selection of data is defined by
>
> data_wo_random <- setdiff(mydata,random_grp1)
>
> # this makes sense because I need to choose random data from a set which
> is defined by the difference of the sets mydata and random_grp1
>
> > data_wo_random
> [1] "B" "C" "E" "F" "G" "I" "K"
>
> This is great! So now I can randomly select data of any size from this set.
>
> Repeating this process can easily generate subgroups of your original
> dataset of any size you want.
>
> Surely this method could be improved so that this could be done
> automatically.
>
> Nevertheless, this is an intuitive method which I believe is easier to
> understand than some of the other methods posted.
>
> Hope this helps!
>
> Thomas Subia
> Statistician

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
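Thomas's sample-then-setdiff idea, which he notes could be automated, might be wrapped in a loop roughly as follows. This is only a sketch: the helper name `split_by_setdiff` is hypothetical, and it assumes the values in `x` are unique, because `setdiff()` works on sets and would silently drop repeated values.

```r
# Hypothetical helper: repeat "draw a group, then remove it with setdiff()"
# once for each requested group size.
split_by_setdiff <- function(x, sizes) {
  stopifnot(sum(sizes) <= length(x), !anyDuplicated(x))
  remaining <- x
  groups <- vector("list", length(sizes))
  for (i in seq_along(sizes)) {
    # draw positions rather than values, so a one-element numeric
    # 'remaining' is not misread by sample() as the range 1:remaining
    groups[[i]] <- remaining[sample.int(length(remaining), sizes[i])]
    remaining <- setdiff(remaining, groups[[i]])
  }
  groups
}

set.seed(123)                 # reproducibility of the sketch only
mydata <- LETTERS[1:11]
grps <- split_by_setdiff(mydata, c(4, 4, 3))
lengths(grps)                 # 4 4 3
```

For data with repeated values, the same loop would have to track row positions instead of the values themselves.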
Re: [R] Splitting a data column randomly into 3 groups
Hi Richard:

Thank you very much for your help in this matter.

with thanks
abou

On Fri, Sep 3, 2021 at 10:25 AM Richard O'Keefe wrote:

> Your question is ambiguous.
> One reading is
> n <- length(table$Data)
> m <- n %/% 3
> s <- sample(1:n, n)
> X <- table$Data[s[1:m]]
> Y <- table$Data[s[(m+1):(2*m)]]
> Z <- table$Data[s[(m*2+1):(3*m)]]
Re: [R] Splitting a data column randomly into 3 groups
Hi Avi: good morning

Again, many thanks to all of you. I appreciate all that you are doing. You are good.

I did it in Minitab. It cost me a little more time, but that is okay. It was a little confusing for me to do it in R, because in *Step 1* I have to select a random sample of size n = 204 (say) out of N = 700 (say). Then in *Step 2* I have to allocate the 204 randomly selected observations into three groups of equal sample sizes.

Again, thank you very much, and sorry if I bothered you.

with many thanks
abou

On Thu, Sep 2, 2021 at 10:42 PM Avi Gross via R-help wrote:

> Abou,
>
> I am not trying to be negative. Assuming you are a professor of
> Statistics, your request seems odd as what you are asking about is very
> routine in much of statistical work where you want to make a model or
> something using just part of your data and need to reserve some to check if
> you perhaps trained an algorithm too much for the original data used.
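The two steps Abou describes above (draw n = 204 out of N = 700, then allocate the sampled observations to three equal groups) can be sketched in base R as follows. The IDs 1..700 stand in for the real data, which was only attached to the original post, and the data frame layout is an assumption.

```r
set.seed(2021)                     # reproducibility of the sketch only
N <- 700
n <- 204

# Step 1: simple random sample of n IDs out of the N population IDs
sampled <- sample(seq_len(N), n)

# Step 2: shuffle a balanced vector of group labels and attach it
group <- sample(rep(1:3, length.out = n))
allocation <- data.frame(ID = sampled, group = group)

table(allocation$group)            # 68 observations in each group
```

Because the labels are built with `rep(..., length.out = n)`, the same two lines also cope with an n that is not a multiple of 3; the group sizes then differ by at most 1.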
Re: [R] Splitting a data column randomly into 3 groups
Your question is ambiguous.
One reading is

n <- length(table$Data)
m <- n %/% 3
s <- sample(1:n, n)
X <- table$Data[s[1:m]]
Y <- table$Data[s[(m+1):(2*m)]]
Z <- table$Data[s[(m*2+1):(3*m)]]

On Fri, 3 Sept 2021 at 13:31, AbouEl-Makarim Aboueissa wrote:
>
> Dear All:
>
> How to split a column data *randomly* into three groups. Please see the
> attached data. I need to split column #2 titled "Data"
>
> with many thanks
> abou
Re: [R] Splitting a data column randomly into 3 groups
Abou,

I am not trying to be negative. Assuming you are a professor of Statistics, your request seems odd, as what you are asking about is very routine in much of statistical work, where you want to build a model using just part of your data and need to reserve some to check whether you perhaps trained an algorithm too much on the original data.

A simple online search before asking questions here is appreciated. I did a quick search for something like “R split data into three parts” and see several applicable answers.

There are people on this forum who actually get paid to do nontrivial tasks and do not mind helping in spots, but feel sort of used if expected to write a serious amount of code and perhaps then be asked to redo it with more bells and whistles added. A recent badly phrased request comes to mind where several of us provided an answer only to find out it was for a different scenario, …

So let me continue with a serious answer. May we assume you KNOW how to read the data in to something like a data.frame? If so, and if you see no need or value in doing this the hard way, then your question could have been to ask if there is an R built-in function or perhaps a package already set up to solve it quickly. Again, a simple online search can do wonders. Here, for example, is a package called caret, and this page discusses splitting data multiple ways:

https://topepo.github.io/caret/data-splitting.html

There are other such pages suggesting how to do it using base R.

Here is one that gives an example of how to make three unequal partitions:

inds <- partition(iris$Sepal.Length, p = c(train = 0.6, valid = 0.2, test = 0.2))

There is more to do below, but in the above you would use whatever names you want instead of train/valid/test and set all three to 0.33 and so on.

I repeat that what you want to do strikes some of us as a fairly routine thing to do, and lots of people have written how they have done it, so you can pick and choose, or redo it on your own. If what you have is a homework assignment, the appropriate thing is to have you learn to use some technique yourself and perhaps get minor help when it fails. But if you will be doing this regularly, use of some packages is highly valuable.

Good Luck.

From: AbouEl-Makarim Aboueissa
Sent: Thursday, September 2, 2021 9:51 PM
To: Avi Gross
Cc: R mailing list
Subject: Re: [R] Splitting a data column randomly into 3 groups

Sorry, please forget about it. I believe that I was very serious when I posted my question.

with thanks
abou

On Thu, Sep 2, 2021 at 9:42 PM Avi Gross via R-help <r-help@r-project.org> wrote:

What is stopping you Abou?

Some of us here start wondering if we have better things to do than homework for others. Help is supposed to come after they try and encounter issues that we may help with.

So think about your problem. You supplied data in a file that is NOT in CSV format but is in tab-separated format.

You need to get it in to your program and store it in something. It looks like you have 204 items, so 1/3 of those would be exactly 68.

So if your data is in an object like a vector or data.frame, you want to choose random numbers between 1 and 204. How do you do that? You need 1/3 of the length of the object's items, in your case 68.

Now extract the items with those indices into, say, A1. Extract all the rest into a temporary item. Make another 68 random indices, with no overlap, and copy those items into A2 and the ones that do not have those into A3, and you are sort of done, other than some cleanup or whatever.

There are many ways to do the above, and I am sure packages too. But since you have made no visible effort, I personally am not going to pick anything in particular. Had you shown some text and code along the lines of the above and just wanted to know how to copy just the ones that were not selected, we could easily ...

-----Original Message-----
From: R-help <r-help-boun...@r-project.org> On Behalf Of AbouEl-Makarim Aboueissa
Sent: Thursday, September 2, 2021 9:30 PM
To: R mailing list <r-help@r-project.org>
Subject: [R] Splitting a data column randomly into 3 groups

Dear All:

How to split a column data *randomly* into three groups. Please see the attached data. I need to split column #2 titled "Data"

with many thanks
abou
Re: [R] Splitting a data column randomly into 3 groups
Hi Abou,
One way is to shuffle the original data frame using sample() and split up the result into three equal parts. I was going to provide example code, but Avi's response popped up and I kind of agree with him.

Jim

On Fri, Sep 3, 2021 at 11:31 AM AbouEl-Makarim Aboueissa wrote:
>
> Dear All:
>
> How to split a column data *randomly* into three groups. Please see the
> attached data. I need to split column #2 titled "Data"
>
> with many thanks
> abou
> __
>
>
> *AbouEl-Makarim Aboueissa, PhD*
>
> *Professor, Statistics and Data Science*
> *Graduate Coordinator*
>
> *Department of Mathematics and Statistics*
> *University of Southern Maine*

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
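The shuffle-then-split idea in the message above can be sketched as follows. This is only an illustration in base R, not code posted in the thread: the data frame `dat` holds just the first 12 rows of the posted data, the names `shuffled`, `block`, and `parts` are made up for the example, and the block labelling assumes the number of rows is divisible by three.

```r
# Sketch of the shuffle-then-split idea, base R only.
# `dat` is a small stand-in for the posted ID/Data file (assumption).
set.seed(1)   # reproducibility of the example only
dat <- data.frame(ID   = 1:12,
                  Data = c(366, 394, 222, 396, 399, 158,
                           361, 426, 255, 32, 31, 53))

# 1. Shuffle the rows: sample() on the row count gives a random permutation
shuffled <- dat[sample(nrow(dat)), ]

# 2. Cut the shuffled rows into three consecutive blocks of equal size
block <- rep(1:3, each = nrow(dat) / 3)   # assumes nrow(dat) is divisible by 3
parts <- split(shuffled, block)

sapply(parts, nrow)   # 4 rows in each part
```

Each element of `parts` is itself a data frame, so the ID column travels with the Data values and shows which original rows landed in which group.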
Re: [R] Splitting a data column randomly into 3 groups
Sorry, please forget about it. I believe that I am very serious when I posted my question.

with thanks
abou
__
*AbouEl-Makarim Aboueissa, PhD*
*Professor, Statistics and Data Science*
*Graduate Coordinator*
*Department of Mathematics and Statistics*
*University of Southern Maine*

On Thu, Sep 2, 2021 at 9:42 PM Avi Gross via R-help wrote:

> What is stopping you Abou?
>
> Some of us here start wondering if we have better things to do than
> homework for others. Help is supposed to be after they try and encounter
> issues that we may help with.
>
> So think about your problem. You supplied data in a file that is NOT in
> CSV format but is in Tab separated format.
>
> You need to get it in to your program and store it in something. It looks
> like you have 204 items so 1/3 of those would be exactly 68.
>
> So if your data is in an object like a vector or data.frame, you want to
> choose random number between 1 and 204. How do you do that? You need 1/3 of
> the length of the object items, in your case 68.
>
> Now extract the items with those indices into say A1. Extract all the
> rest into a temporary item.
>
> Make another 68 random indices, with no overlap, and copy those items into
> A2 and the ones that do not have those into A3 and you are sort of done,
> other than some cleanup or whatever.
>
> There are many ways to do the above and I am sure packages too.
>
> But since you have made no visible effort, I personally am not going to
> pick anything in particular.
>
> Had you shown some text and code along the lines of the above and just
> wanted to know how to copy just the ones that were not selected, we could
> easily ...
>
>
> -----Original Message-----
> From: R-help On Behalf Of AbouEl-Makarim Aboueissa
> Sent: Thursday, September 2, 2021 9:30 PM
> To: R mailing list
> Subject: [R] Splitting a data column randomly into 3 groups
>
> Dear All:
>
> How to split a column data *randomly* into three groups. Please see the
> attached data. I need to split column #2 titled "Data"
>
> with many thanks
> abou
> __
>
>
> *AbouEl-Makarim Aboueissa, PhD*
>
> *Professor, Statistics and Data Science*
> *Graduate Coordinator*
>
> *Department of Mathematics and Statistics*
> *University of Southern Maine*
Re: [R] Splitting a data column randomly into 3 groups
What is stopping you Abou?

Some of us here start wondering if we have better things to do than homework for others. Help is supposed to be after they try and encounter issues that we may help with.

So think about your problem. You supplied data in a file that is NOT in CSV format but is in Tab separated format.

You need to get it in to your program and store it in something. It looks like you have 204 items so 1/3 of those would be exactly 68.

So if your data is in an object like a vector or data.frame, you want to choose random number between 1 and 204. How do you do that? You need 1/3 of the length of the object items, in your case 68.

Now extract the items with those indices into say A1. Extract all the rest into a temporary item.

Make another 68 random indices, with no overlap, and copy those items into A2 and the ones that do not have those into A3 and you are sort of done, other than some cleanup or whatever.

There are many ways to do the above and I am sure packages too.

But since you have made no visible effort, I personally am not going to pick anything in particular.

Had you shown some text and code along the lines of the above and just wanted to know how to copy just the ones that were not selected, we could easily ...

-----Original Message-----
From: R-help On Behalf Of AbouEl-Makarim Aboueissa
Sent: Thursday, September 2, 2021 9:30 PM
To: R mailing list
Subject: [R] Splitting a data column randomly into 3 groups

Dear All:

How to split a column data *randomly* into three groups. Please see the attached data. I need to split column #2 titled "Data"

with many thanks
abou
__
*AbouEl-Makarim Aboueissa, PhD*
*Professor, Statistics and Data Science*
*Graduate Coordinator*
*Department of Mathematics and Statistics*
*University of Southern Maine*
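The index-based steps described in the message above (read the tab-separated data, draw one third of the indices, set the rest aside, draw again, keep the leftovers) can be sketched in base R as below. This is a hedged illustration only: the inline string `txt` stands in for the attached file, whose name and location are not given in the thread, and it holds just nine of the posted rows; the seed and the helper names `idx1`, `rest`, `idx2`, and `idx3` are likewise assumptions made for the example.

```r
# Sketch of the index-based steps, base R only.
# `txt` is a stand-in for the tab-separated file (assumption).
txt <- "ID\tData\n1\t366\n2\t394\n3\t222\n4\t396\n5\t399\n6\t158\n7\t361\n8\t426\n9\t255"
dat <- read.delim(text = txt)           # read.delim() expects tab-separated input

set.seed(2021)                          # reproducibility of the example only
n <- nrow(dat)                          # 204 for the posted data, 9 here
k <- n %/% 3                            # group size: 68 for the posted data, 3 here

idx1 <- sample(seq_len(n), k)           # k random row indices, no repeats
rest <- setdiff(seq_len(n), idx1)       # every index not chosen for A1
idx2 <- rest[sample(length(rest), k)]   # k more indices, no overlap with idx1
idx3 <- setdiff(rest, idx2)             # whatever is left over

A1 <- dat$Data[idx1]
A2 <- dat$Data[idx2]
A3 <- dat$Data[idx3]
```

Indexing `rest` by `sample(length(rest), k)` rather than calling `sample(rest, k)` avoids a known quirk of `sample()`: when its first argument is a single number, it samples from `1:x` instead of from that one value.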
[R] Splitting a data column randomly into 3 groups
Dear All:

How to split a column data *randomly* into three groups. Please see the attached data. I need to split column #2 titled "Data"

with many thanks
abou
__
*AbouEl-Makarim Aboueissa, PhD*
*Professor, Statistics and Data Science*
*Graduate Coordinator*
*Department of Mathematics and Statistics*
*University of Southern Maine*

ID Data
1 366
2 394
3 222
4 396
5 399
6 158
7 361
8 426
9 255
10 32
11 31
12 53
13 377
14 405
15 448
16 362
17 260
18 90
19 95
20 8
21 385
22 306
23 154
24 345
25 136
26 39
27 472
28 19
29 404
30 463
31 134
32 72
33 477
34 22
35 240
36 389
37 482
38 287
39 180
40 140
41 456
42 403
43 81
44 425
45 57
46 251
47 421
48 343
49 310
50 62
51 412
52 93
53 111
54 148
55 311
56 430
57 12
58 100
59 437
60 363
61 126
62 367
63 165
64 272
65 171
66 167
67 234
68 113
69 315
70 175
71 484
72 379
73 474
74 216
75 250
76 177
77 293
78 133
79 203
80 408
81 150
82 155
83 223
84 381
85 336
86 368
87 290
88 359
89 333
90 219
91 455
92 427
93 444
94 178
95 302
96 221
97 248
98 160
99 304
100 56
101 25
102 400
103 485
104 89
105 254
106 186
107 283
108 431
109 188
110 354
111 119
112 67
113 415
114 346
115 319
116 344
117 121
118 34
119 288
120 416
121 308
122 340
123 166
124 443
125 388
126 286
127 245
128 406
129 253
130 395
131 274
132 428
133 329
134 410
135 127
136 420
137 187
138 244
139 125
140 137
141 206
142 205
143 327
144 211
145 7
146 192
147 317
148 60
149 54
150 4
151 434
152 233
153 47
154 280
155 76
156 398
157 320
158 347
159 453
160 465
161 382
162 476
163 213
164 418
165 409
166 230
167 3
168 229
169 436
170 262
171 77
172 207
173 118
174 99
175 243
176 27
177 479
178 438
179 152
180 109
181 330
182 17
183 179
184 323
185 124
186 296
187 435
188 225
189 128
190 84
191 316
192 195
193 74
194 138
195 149
196 63
197 249
198 104
199 35
200 228
201 44
202 275
203 259
204 356