Re: [R] Splitting a data column randomly into 3 groups

2021-09-06 Thread AbouEl-Makarim Aboueissa
Hi Bert and All: good morning

I promise this will be the last time I write about this topic.

I came up with this R function (please see below), for sure with your help.
It works for all sample sizes. I also provided three different simple
examples.

with many thanks
abou

##Here it is###

Random.Sample.IDs <- function (N,n, ngroups){
# N = population size, n = sample size, ngroups = number of groups
# (note: the group construction below is written for ngroups = 3)

population.IDs <- seq(1, N, by = 1)
sample.IDs <- sample(population.IDs,n)

# to print sample.IDs in a column format
# --
sample.IDs.in.column<-data.frame(sample.IDs)
print(sample.IDs.in.column)

reminder.n<-n%%ngroups
reminder.n

n.final<-n-reminder.n
n.final

  m <- n %/% 3
  m
  s <- sample(1:n, n)

if (reminder.n == 0) {

  group1.IDs <- sample.IDs[s[1:m]]
  group2.IDs <- sample.IDs[s[(m+1):(2*m)]]
  group3.IDs <- sample.IDs[s[(m*2+1):(3*m)]]

} else if(reminder.n == 1){

  group1.IDs <- sample.IDs[s[1:(m+1)]]
  group2.IDs <- sample.IDs[s[(m+2):(2*m+1)]]
  group3.IDs <- sample.IDs[s[(m*2+2):(3*m+1)]]

} else if(reminder.n == 2){

  group1.IDs <- sample.IDs[s[1:(m+1)]]
  group2.IDs <- sample.IDs[s[(m+2):(2*m+2)]]
  group3.IDs <- sample.IDs[s[(m*2+3):(3*m+2)]]
}
nn<-max(length(group1.IDs),length(group2.IDs),length(group3.IDs))
nn
length(group1.IDs) <- nn
length(group2.IDs) <- nn
length(group3.IDs) <- nn

groups.IDs <-cbind(group1.IDs,group2.IDs,group3.IDs)

groups.IDs

}


#  Examples
#  

Random.Sample.IDs (100,12,3)  # group sizes are equal (n1=n2=n3=4)

Random.Sample.IDs (100,13,3)  # group sizes are NOT equal (n1=5, n2=4, n3=4)

Random.Sample.IDs (100,17,3)  # group sizes are NOT equal (n1=6, n2=6, n3=5)
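For readers who need more than three groups, the same idea can probably be written without the if/else chain. The following is only a sketch under stated assumptions: the name random_sample_groups is hypothetical (it is not from the thread), the group sizes follow the "sizes differ by at most 1" rule discussed in this thread, and shorter groups are padded with NA as in the function above.

```r
# Hypothetical generalisation of Random.Sample.IDs to any number of groups.
random_sample_groups <- function(N, n, ngroups) {
  stopifnot(n <= N, ngroups >= 1, ngroups <= n)
  # Step 1: draw n IDs from 1..N; sample.int() returns them in random order
  sample.IDs <- sample.int(N, n)
  k <- n %/% ngroups                    # base group size
  r <- n %% ngroups                     # number of groups that get one extra ID
  sizes <- rep(c(k + 1, k), times = c(r, ngroups - r))
  # Step 2: cut the randomly ordered IDs into consecutive blocks
  labels <- rep(seq_len(ngroups), times = sizes)
  groups <- split(sample.IDs, labels)
  # Pad shorter groups with NA so they fit side by side in a matrix
  padded <- lapply(groups, function(g) { length(g) <- max(sizes); g })
  out <- do.call(cbind, padded)
  colnames(out) <- paste0("group", seq_len(ngroups), ".IDs")
  out
}

set.seed(1)                        # only to make the illustration reproducible
random_sample_groups(100, 13, 3)   # column sizes 5, 4, 4 (NA-padded)
```

Because the IDs from step 1 are already in random order, cutting them into consecutive blocks gives a random allocation without a second shuffle.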


__


*AbouEl-Makarim Aboueissa, PhD*

*Professor, Statistics and Data Science*
*Graduate Coordinator*

*Department of Mathematics and Statistics*
*University of Southern Maine*




Re: [R] Splitting a data column randomly into 3 groups

2021-09-05 Thread Bert Gunter
In case anyone is still interested in my query, note that if there are
n total items to be split into g groups as evenly as possible, if we
define this as at most two different size groups whose size differs by
1, then:

if n = k*g + r, where 0 <= r < g,
then n = k*(g - r) + (k + 1)*r  .
i.e. g-r groups of size k and r groups of size k+1

So using R's modular arithmetic operators, which are handy to know
about, we have:

r = n %% g and k = n %/% g .

(and note that you should disregard my previous stupid remark about
numerical analysis).

Cheers,
Bert
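The size rule described in the message above can be sketched in a few lines of R. This is an illustration only; the helper name group_sizes is an assumption and does not appear in the thread.

```r
# Sizes of g groups that split n items as evenly as possible:
# r groups receive k + 1 items and the remaining g - r groups receive k items.
group_sizes <- function(n, g) {
  stopifnot(n >= 0, g >= 1)
  k <- n %/% g   # integer quotient: base size of every group
  r <- n %% g    # remainder: number of groups that get one extra item
  rep(c(k + 1, k), times = c(r, g - r))
}

group_sizes(204, 3)   # 68 68 68
group_sizes(112, 3)   # 38 37 37
group_sizes(284, 3)   # 95 95 94
```

The sizes always add up to n, because k*(g - r) + (k + 1)*r = k*g + r = n.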


On Sat, Sep 4, 2021 at 3:34 PM Bert Gunter  wrote:
>
> I have a more general problem for you.
>
> Given n items and 2 <=g < groups that are as "equal as possible."
>
> First, operationally define "as equal as possible."
> Second, define the algorithm to carry out the definition. Hint: Note
> that sum{m[i]} for i <=g must sum to n, where m[i] is the number of
> items in the ith group.
> Third, write R code for the algorithm. Exercise for the reader.
>
> I may be wrong, but I think numerical analysts might also have a
> little fun here.
>
> Randomization, of course, is trivial.
>
> Cheers,
> Bert
>
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>

Re: [R] Splitting a data column randomly into 3 groups

2021-09-04 Thread Avi Gross via R-help
Abou,

I believe I addressed this issue in a private message the other day.

As a general rule, truncating can leave a remainder. If 
M  = length(whatever)/3 

Then M is no longer an integer. It can be a number ending in .333... or .666... 
as well as 0.

Now R may silently truncate something like 100/3, which you seem to use, and make 
it be as if you typed 33. Same for 2*M. In your code, you used integer division 
and that is a truncation too!

  m1 <- n1 %/% 3
  s1 <- sample(1:n1, n1)
  group1.IDs <- sample1.IDs[s1[1:m1]]
  group2.IDs <- sample1.IDs[s1[(m1+1):(2*m1)]]
  group3.IDs <- sample1.IDs[s1[(m1*2+1):(3*m1)]]

A proper solution accounts for any leftover items. One method is to leave all 
extra items till the end and have:

MAX <- length(sample1.IDs)   # total number of items being split
group3.IDs <- sample1.IDs[s1[(m1*2+1):MAX]]


The last group then might have one or two extra items. Another is to go for  a 
second sweep and take any leftover items and move one each into whatever groups 
you wish for some balance.

Or, as discussed, there are packages available that let you specify percentages 
you want and handle these edge cases too.
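Both ways of handling the leftover items can be sketched as follows. This is a minimal illustration, assuming the n = 112 out of N = 266 example from the thread; the object names group1, group2, group3, groups and leftover are made up for the sketch.

```r
set.seed(123)                        # for a reproducible illustration only
sample.IDs <- sample.int(266, 112)   # 112 IDs, as in the second example
n <- length(sample.IDs)
m <- n %/% 3                         # 37; one item would otherwise be dropped
s <- sample.int(n)                   # random order of the positions 1..n

# Method 1: the last group runs to the end and absorbs the leftovers
group1 <- sample.IDs[s[1:m]]
group2 <- sample.IDs[s[(m + 1):(2 * m)]]
group3 <- sample.IDs[s[(2 * m + 1):n]]
lengths(list(group1, group2, group3))   # 37 37 38

# Method 2: truncate first, then hand each leftover item to a different group
groups <- list(sample.IDs[s[1:m]],
               sample.IDs[s[(m + 1):(2 * m)]],
               sample.IDs[s[(2 * m + 1):(3 * m)]])
leftover <- sample.IDs[s[seq_len(n - 3 * m) + 3 * m]]
for (i in seq_along(leftover)) {
  groups[[i]] <- c(groups[[i]], leftover[i])
}
lengths(groups)                          # 38 37 37
```

With three groups there are at most two leftover items, so method 2 never gives any group more than one extra item.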


Re: [R] Splitting a data column randomly into 3 groups

2021-09-04 Thread Bert Gunter
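I have a more general problem for you.

Given n items and 2 <=g < groups that are as "equal as possible."

First, operationally define "as equal as possible."
Second, define the algorithm to carry out the definition. Hint: Note
that sum{m[i]} for i <=g must sum to n, where m[i] is the number of
items in the ith group.
Third, write R code for the algorithm. Exercise for the reader.

I may be wrong, but I think numerical analysts might also have a
little fun here.

Randomization, of course, is trivial.

Cheers,
Bert


Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Sat, Sep 4, 2021 at 2:13 PM AbouEl-Makarim Aboueissa
 wrote: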
>
> Dear Thomas:
>
>
> Thank you very much for your input in this matter.
>
>
> The core part of this R code(s) (please see below) was written by *Richard
> O'Keefe*. I had three examples with different sample sizes.
>
>
>
> *First sample of size n1 = 204* divided randomly into three groups of sizes
> 68. *No problems with this one*.
>
>
>
> *The second sample of size n2 = 112* divided randomly into three groups of
> sizes 37, 37, and 38. BUT this R code generated three groups of equal sizes
> (37, 37, and 37). *How to fix the code to make sure that the output will be
> three groups of sizes 37, 37, and 38*.
>
>
>
> *The third sample of size n3 = 284* divided randomly into three groups of
> sizes 94, 95, and 95. BUT this R code generated three groups of equal sizes
> (94, 94, and 94). *Again*, h*ow to fix the code to make sure that the
> output will be three groups of sizes 94, 95, and 95*.
>
>
> With many thanks
>
> abou
>
>
> ###     #
>
>
> N1 <- 485
> population1.IDs <- seq(1, N1, by = 1)
>  population1.IDs
>
> n1 <- 204   # in this case the size of each of the three groups = 68
> sample1.IDs <- sample(population1.IDs,n1)
>  sample1.IDs
>
>   n1 <- length(sample1.IDs)
>
>   m1 <- n1 %/% 3
>   s1 <- sample(1:n1, n1)
>   group1.IDs <- sample1.IDs[s1[1:m1]]
>   group2.IDs <- sample1.IDs[s1[(m1+1):(2*m1)]]
>   group3.IDs <- sample1.IDs[s1[(m1*2+1):(3*m1)]]
>
> groups.IDs <-cbind(group1.IDs,group2.IDs,group3.IDs)
>
> groups.IDs
>
>
> ### --
>
>
> N2 <- 266
> population2.IDs <- seq(1, N2, by = 1)
>  population2.IDs
>
> n2 <- 112   # in this case the sizes of the three groups should be (37, 37, and 38)
>             # BUT this code generates three groups of equal sizes (37, 37, and 37)
> sample2.IDs <- sample(population2.IDs,n2)
>  sample2.IDs
>
>   n2 <- length(sample2.IDs)
>
>   m2 <- n2 %/% 3
>   s2 <- sample(1:n2, n2)
>   group1.IDs <- sample2.IDs[s2[1:m2]]
>   group2.IDs <- sample2.IDs[s2[(m2+1):(2*m2)]]
>   group3.IDs <- sample2.IDs[s2[(m2*2+1):(3*m2)]]
>
> groups.IDs <-cbind(group1.IDs,group2.IDs,group3.IDs)
>
> groups.IDs
>
>
> ### --
>
>
>
> N3 <- 674
> population3.IDs <- seq(1, N3, by = 1)
>  population3.IDs
>
> n3 <- 284   # in this case the sizes of the three groups should be (94, 95, and 95)
>             # BUT this code generates three groups of equal sizes (94, 94, and 94)
> sample3.IDs <- sample(population3.IDs,n3)
>  sample3.IDs
>
>   n3 <- length(sample3.IDs)
>
>   m3 <- n3 %/% 3
>   s3 <- sample(1:n3, n3)
>   group1.IDs <- sample3.IDs[s3[1:m3]]
>   group2.IDs <- sample3.IDs[s3[(m3+1):(2*m3)]]
>   group3.IDs <- sample3.IDs[s3[(m3*2+1):(3*m3)]]
>
> groups.IDs <-cbind(group1.IDs,group2.IDs,group3.IDs)
>
> groups.IDs
>
> __
>
>
> *AbouEl-Makarim Aboueissa, PhD*
>
> *Professor, Statistics and Data Science*
> *Graduate Coordinator*
>
> *Department of Mathematics and Statistics*
> *University of Southern Maine*
>
>
>
> On Sat, Sep 4, 2021 at 11:54 AM Thomas Subia  wrote:
>
> > Abou,
> >
> >
> >
> > I’ve been following your question on how to split a data column randomly
> > into 3 groups using R.
> >
> >
> >
> > My method may not be amenable for a large set of data but it is surely worth
> > considering since it makes sense intuitively.
> >
> >
> >
> > mydata <- LETTERS[1:11]
> >
> > > mydata
> >
> > [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K"
> >
> >
> >
> > # Let’s choose a random sample of size 4 from mydata
> >
> > > random_grp1
> >
> > [1] "J" "H" "D" "A"
> >
> >
> >
> > Now my next random selection of data is defined by
> >
> > data_wo_random <- setdiff(mydata,random_grp1)
> >
> > # this makes sense because I need to choose random data from a set which
> > is defined by the difference of the sets mydata and random_grp1
> >
> >
> >
> > > data_wo_random
> >
> > [1] "B" "C" "E" "F" "G" "I" "K"
> >
> >
> >
> > This is great! So now I can randomly select data of any size from this set.
> >
> > Repeating this process can easily generate subgroups of your original
> > dataset of any size you want.
> >
> >
> >
> > Surely this method could be improved so that this could be done
> > automatically.
> >
> > Nevertheless, this is an intuitive method which I believe is easier to
> > understand than some of the other methods posted.
> >
> >
> >
> > Hope this helps!
> >
> >
> >
> > Thomas Subia
> >
> > Statistician
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
>

Re: [R] Splitting a data column randomly into 3 groups

2021-09-04 Thread AbouEl-Makarim Aboueissa
Dear Thomas:


Thank you very much for your input in this matter.


The core part of this R code(s) (please see below) was written by *Richard
O'Keefe*. I had three examples with different sample sizes.



*First sample of size n1 = 204* divided randomly into three groups of sizes
68. *No problems with this one*.



*The second sample of size n2 = 112* divided randomly into three groups of
sizes 37, 37, and 38. BUT this R code generated three groups of equal sizes
(37, 37, and 37). *How to fix the code to make sure that the output will be
three groups of sizes 37, 37, and 38*.



*The third sample of size n3 = 284* divided randomly into three groups of
sizes 94, 95, and 95. BUT this R code generated three groups of equal sizes
(94, 94, and 94). *Again*, h*ow to fix the code to make sure that the
output will be three groups of sizes 94, 95, and 95*.


With many thanks

abou


###     #


N1 <- 485
population1.IDs <- seq(1, N1, by = 1)
 population1.IDs

n1 <- 204   # in this case the size of each of the three groups = 68
sample1.IDs <- sample(population1.IDs,n1)
 sample1.IDs

  n1 <- length(sample1.IDs)

  m1 <- n1 %/% 3
  s1 <- sample(1:n1, n1)
  group1.IDs <- sample1.IDs[s1[1:m1]]
  group2.IDs <- sample1.IDs[s1[(m1+1):(2*m1)]]
  group3.IDs <- sample1.IDs[s1[(m1*2+1):(3*m1)]]

groups.IDs <-cbind(group1.IDs,group2.IDs,group3.IDs)

groups.IDs


### --


N2 <- 266
population2.IDs <- seq(1, N2, by = 1)
 population2.IDs

n2 <- 112   # in this case the sizes of the three groups should be (37, 37, and 38)
            # BUT this code generates three groups of equal sizes (37, 37, and 37)
sample2.IDs <- sample(population2.IDs,n2)
 sample2.IDs

  n2 <- length(sample2.IDs)

  m2 <- n2 %/% 3
  s2 <- sample(1:n2, n2)
  group1.IDs <- sample2.IDs[s2[1:m2]]
  group2.IDs <- sample2.IDs[s2[(m2+1):(2*m2)]]
  group3.IDs <- sample2.IDs[s2[(m2*2+1):(3*m2)]]

groups.IDs <-cbind(group1.IDs,group2.IDs,group3.IDs)

groups.IDs


### --



N3 <- 674
population3.IDs <- seq(1, N3, by = 1)
 population3.IDs

n3 <- 284   # in this case the sizes of the three groups should be (94, 95, and 95)
            # BUT this code generates three groups of equal sizes (94, 94, and 94)
sample3.IDs <- sample(population3.IDs,n3)
 sample3.IDs

  n3 <- length(sample3.IDs)

  m3 <- n3 %/% 3
  s3 <- sample(1:n3, n3)
  group1.IDs <- sample3.IDs[s3[1:m3]]
  group2.IDs <- sample3.IDs[s3[(m3+1):(2*m3)]]
  group3.IDs <- sample3.IDs[s3[(m3*2+1):(3*m3)]]

groups.IDs <-cbind(group1.IDs,group2.IDs,group3.IDs)

groups.IDs

__


*AbouEl-Makarim Aboueissa, PhD*

*Professor, Statistics and Data Science*
*Graduate Coordinator*

*Department of Mathematics and Statistics*
*University of Southern Maine*



On Sat, Sep 4, 2021 at 11:54 AM Thomas Subia  wrote:

> Abou,
>
>
>
> I’ve been following your question on how to split a data column randomly
> into 3 groups using R.
>
>
>
> My method may not be amenable for a large set of data but it is surely worth
> considering since it makes sense intuitively.
>
>
>
> mydata <- LETTERS[1:11]
>
> > mydata
>
> [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K"
>
>
>
> # Let’s choose a random sample of size 4 from mydata
>
> > random_grp1
>
> [1] "J" "H" "D" "A"
>
>
>
> Now my next random selection of data is defined by
>
> data_wo_random <- setdiff(mydata,random_grp1)
>
> # this makes sense because I need to choose random data from a set which
> is defined by the difference of the sets mydata and random_grp1
>
>
>
> > data_wo_random
>
> [1] "B" "C" "E" "F" "G" "I" "K"
>
>
>
> This is great! So now I can randomly select data of any size from this set.
>
> Repeating this process can easily generate subgroups of your original
> dataset of any size you want.
>
>
>
> Surely this method could be improved so that this could be done
> automatically.
>
> Nevertheless, this is an intuitive method which I believe is easier to
> understand than some of the other methods posted.
>
>
>
> Hope this helps!
>
>
>
> Thomas Subia
>
> Statistician
>
>
>
>
>
>
>
>
>
>
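The sample()/setdiff() procedure quoted above could be automated roughly as follows. This is a sketch only: the helper name split_by_setdiff and the group sizes c(4, 4, 3) are assumptions for illustration, and because setdiff() treats its arguments as sets, the sketch requires that the data contain no duplicated values.

```r
# Hypothetical helper automating the repeated sample()/setdiff() steps.
split_by_setdiff <- function(x, sizes) {
  stopifnot(sum(sizes) == length(x), !anyDuplicated(x))
  remaining <- x
  groups <- vector("list", length(sizes))
  for (i in seq_along(sizes)) {
    # index-based sampling avoids sample()'s special case for a single number
    groups[[i]] <- remaining[sample.int(length(remaining), sizes[i])]
    # drop the chosen items so that later groups cannot reuse them
    remaining <- setdiff(remaining, groups[[i]])
  }
  groups
}

mydata <- LETTERS[1:11]
set.seed(2021)   # for reproducibility of the illustration only
split_by_setdiff(mydata, c(4, 4, 3))
```

For data with repeated values, splitting the positions seq_along(x) instead of the values themselves would avoid the set semantics of setdiff().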

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Splitting a data column randomly into 3 groups

2021-09-03 Thread AbouEl-Makarim Aboueissa
Hi Richard:

Thank you very much for your help in this matter.

with thanks
abou
__


*AbouEl-Makarim Aboueissa, PhD*

*Professor, Statistics and Data Science*
*Graduate Coordinator*

*Department of Mathematics and Statistics*
*University of Southern Maine*



On Fri, Sep 3, 2021 at 10:25 AM Richard O'Keefe  wrote:

> Your question is ambiguous.
> One reading is
>   n <- length(table$Data)
>   m <- n %/% 3
>   s <- sample(1:n, n)
>   X <- table$Data[s[1:m]]
>   Y <- table$Data[s[(m+1):(2*m)]]
>   Z <- table$Data[s[(m*2+1):(3*m)]]
>
>
>
>
> On Fri, 3 Sept 2021 at 13:31, AbouEl-Makarim Aboueissa
>  wrote:
> >
> > Dear All:
> >
> > How to split a column data *randomly* into three groups. Please see the
> > attached data. I need to split column #2 titled "Data"
> >
> > with many thanks
> > abou
> > __
> >
> >
> > *AbouEl-Makarim Aboueissa, PhD*
> >
> > *Professor, Statistics and Data Science*
> > *Graduate Coordinator*
> >
> > *Department of Mathematics and Statistics*
> > *University of Southern Maine*
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>



Re: [R] Splitting a data column randomly into 3 groups

2021-09-03 Thread AbouEl-Makarim Aboueissa
Hi Avi: good morning

Again, many thanks to all of you. I appreciate all that you are doing. You
are good. I did it in Minitab. It cost me a little more time, but that is
okay.

It was a little confusing for me to do it in R, because in *Step 1* I
have to select a random sample of size n=204 (say) out of N=700 (say). Then
in *Step 2* I have to allocate the 204 randomly selected observations into
three groups of equal sample sizes.
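The two steps just described might look like this in base R. It is a minimal sketch, assuming the population is identified by the IDs 1..N; the object names allocation and groups are made up for the illustration.

```r
set.seed(2021)                     # illustration only
N <- 700
n <- 204

# Step 1: simple random sample of n IDs out of 1..N, without replacement
sample.IDs <- sample.int(N, n)

# Step 2: attach a shuffled group label (1, 2 or 3) to each sampled ID
group <- sample(rep(1:3, length.out = n))
allocation <- data.frame(ID = sample.IDs, group = group)

table(allocation$group)            # 68 IDs in each group
groups <- split(allocation$ID, allocation$group)
```

Since rep(1:3, length.out = n) recycles the labels, the same two lines also cope with sample sizes that are not multiples of 3; the group sizes then differ by at most one.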

Again, thank you very much, and sorry if I bothered you.


with many thanks
abou
__


*AbouEl-Makarim Aboueissa, PhD*

*Professor, Statistics and Data Science*
*Graduate Coordinator*

*Department of Mathematics and Statistics*
*University of Southern Maine*



Re: [R] Splitting a data column randomly into 3 groups

2021-09-03 Thread Richard O'Keefe
Your question is ambiguous.
One reading is
  n <- length(table$Data)
  m <- n %/% 3
  s <- sample(1:n, n)
  X <- table$Data[s[1:m]]
  Y <- table$Data[s[(m+1):(2*m)]]
  Z <- table$Data[s[(m*2+1):(3*m)]]
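To try this reading without the attached file, a toy data frame can stand in for it. The values below are invented for illustration; only the column name Data follows the question. The sketch also shows that n %% 3 values remain unassigned when n is not a multiple of 3.

```r
# Toy stand-in for the attached data (made-up values)
set.seed(1)
table <- data.frame(ID = 1:11, Data = round(rnorm(11), 2))

n <- length(table$Data)   # 11
m <- n %/% 3              # 3
s <- sample(1:n, n)       # random permutation of the row positions
X <- table$Data[s[1:m]]
Y <- table$Data[s[(m+1):(2*m)]]
Z <- table$Data[s[(m*2+1):(3*m)]]

c(length(X), length(Y), length(Z))   # 3 3 3
n - 3 * m                            # 2 values are not assigned to any group
unused <- table$Data[s[seq_len(n - 3 * m) + 3 * m]]
```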




On Fri, 3 Sept 2021 at 13:31, AbouEl-Makarim Aboueissa
 wrote:
>
> Dear All:
>
> How to split a column data *randomly* into three groups. Please see the
> attached data. I need to split column #2 titled "Data"
>
> with many thanks
> abou
> __
>
>
> *AbouEl-Makarim Aboueissa, PhD*
>
> *Professor, Statistics and Data Science*
> *Graduate Coordinator*
>
> *Department of Mathematics and Statistics*
> *University of Southern Maine*
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



Re: [R] Splitting a data column randomly into 3 groups

2021-09-02 Thread Avi Gross via R-help
Abou,

 

I am not trying to be negative. Assuming you are a professor of Statistics, 
your request seems odd as what you are asking about is very routine in much of 
statistical work where you want to make a model or something using just part of 
your data and need to reserve some to check if you perhaps trained an algorithm 
too much for the original data used.

 

A simple online search before asking questions here is appreciated. I did a 
quick search for something like “R split data into three parts” and see several 
applicable answers.

 

There are people on this forum who actually get paid to do nontrivial tasks and 
do not mind helping in spots but feel sort of used if expected to write a serious 
amount of code and perhaps then be asked to redo it with more bells and 
whistles added. A recent badly phrased request comes to mind where several of 
us provided an answer only to find out it was for a different scenario, …

 

So let me continue with a serious answer. May we assume you KNOW how to read 
the data in to something like a data.frame? If so, and if you see no need or 
value in doing this the hard way, then your question could have been to ask if 
there is an R built-in function or perhaps a package already set to solve it 
quickly. Again, a simple online search can do wonders.  Here, for example, is a 
package called caret and this page discusses splitting data multiple ways:

 

https://topepo.github.io/caret/data-splitting.html

 

There are other such pages suggesting how to do it using base R.

 

Here is one that gives an example on how to make  three unequal partitions:

 

inds <- partition(iris$Sepal.Length, p = c(train = 0.6, valid = 0.2, test = 
0.2))

 

 

There is more to do below but in the above, you would use whatever names you 
want instead of train/valid/test and set all three to 0.33 and so on.
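
For anyone who would rather not depend on whichever package supplies partition(), here is a rough base R sketch of the same idea with three equal shares. The seed, the group names and the variable names are only illustrative choices, not anything taken from the page mentioned above.

```r
# Sketch: three random groups of (nearly) equal size using base R only.
set.seed(42)                      # arbitrary seed, only for reproducibility
x <- iris$Sepal.Length            # same example vector as in the call above
n <- length(x)                    # 150 observations

# Build balanced labels first, then shuffle them so assignment is random
labels <- rep(c("train", "valid", "test"), length.out = n)
groups <- sample(labels)

# inds is a list of index vectors, one element per group
inds <- split(seq_len(n), groups)
lengths(inds)                     # how many indices landed in each group

# The values belonging to one group are then x[inds$train], and so on
```

Shuffling a balanced label vector keeps the group sizes within one of each other even when the length is not a multiple of three.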

 

I repeat that what you want to do strikes some of us as a fairly routine thing
to do, and lots of people have written about how they have done it; you can
pick and choose, or redo it on your own. If what you have is a homework
assignment, the appropriate thing is to have you learn to use some technique
yourself and perhaps get minor help when it fails. But if you will be doing
this regularly, use of some packages is highly valuable.

Good Luck.

From: AbouEl-Makarim Aboueissa
Sent: Thursday, September 2, 2021 9:51 PM
To: Avi Gross
Cc: R mailing list
Subject: Re: [R] Splitting a data column randomly into 3 groups

Sorry, please forget about it. I believe that I am very serious when I posted
my question.

with thanks
abou
__

AbouEl-Makarim Aboueissa, PhD
Professor, Statistics and Data Science
Graduate Coordinator
Department of Mathematics and Statistics
University of Southern Maine

On Thu, Sep 2, 2021 at 9:42 PM Avi Gross via R-help <r-help@r-project.org> wrote:

What is stopping you Abou?

Some of us here start wondering if we have better things to do than homework 
for others. Help is supposed to be after they try and encounter issues that we 
may help with.

So think about your problem. You supplied data in a file that is NOT in CSV 
format but is in Tab separated format.

You need to get it in to your program and store it in something. It looks like 
you have 204 items so 1/3 of those would be exactly 68.

So if your data is in an object like a vector or data.frame, you want to choose 
random number between 1 and 204. How do you do that? You need 1/3 of the length 
of the object items, in your case 68.

Now extract the items with  those indices into say A1. Extract all the rest 
into a temporary item.

Make another 68 random indices, with no overlap, and copy those items into A2 
and the ones that do not have those into A3 and you are sort of done, other 
than some cleanup or whatever.

There are many ways to do the above and I am sure packages too.

But since you have made no visible effort, I personally am not going to pick 
anything in particular.

Had you shown some text and code along the lines of the above and just wanted 
to know how to copy just the ones that were not selected, we could easily ...


-Original Message-
From: R-help <r-help-boun...@r-project.org> On Behalf Of AbouEl-Makarim Aboueissa
Sent: Thursday, September 2, 2021 9:30 PM
To: R mailing list <r-help@r-project.org>
Subject: [R] Splitting a data column randomly into 3 groups

Dear All:

How to split a column data *randomly* into three groups. Please see the 
attached data. I need to split column #2 titled "Data"

with many thanks
abou
__


*AbouEl-Makarim Aboueissa, PhD*

*Professor, Statistics and Data Science* *Graduate Coordinator*

*Department of Mathematics and Statistics* *University of Southern Maine*


Re: [R] Splitting a data column randomly into 3 groups

2021-09-02 Thread Jim Lemon
Hi Abou,
One way is to shuffle the original data frame using sample() and
split up the result into three equal parts.
I was going to provide example code, but Avi's response popped up and
I kind of agree with him.
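
For the record, a minimal sketch of that shuffle-and-split idea might look like the following. The data frame here is a made-up stand-in with the same shape as the attached file (204 rows, columns ID and Data), and the names dat, shuffled, grp and parts are only illustrative.

```r
# Sketch: shuffle the rows, then cut the shuffled frame into three blocks.
set.seed(1)                                   # arbitrary seed for reproducibility
dat <- data.frame(ID = 1:204,
                  Data = sample(1:500, 204))  # made-up values, not the real data

# sample(nrow(dat)) is a random permutation of the row numbers
shuffled <- dat[sample(nrow(dat)), ]

# Group labels 1,1,...,2,2,...,3,3,... covering every row
grp <- sort(rep(1:3, length.out = nrow(shuffled)))

# parts is a list of three data frames
parts <- split(shuffled, grp)
sapply(parts, nrow)                           # rows per group
```

Because the labels are built with length.out, the same lines should also cope with a row count that is not a multiple of three; the first groups would simply be one row larger.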

Jim

On Fri, Sep 3, 2021 at 11:31 AM AbouEl-Makarim Aboueissa
 wrote:
>
> Dear All:
>
> How to split a column data *randomly* into three groups. Please see the
> attached data. I need to split column #2 titled "Data"
>
> with many thanks
> abou
> __
>
>
> *AbouEl-Makarim Aboueissa, PhD*
>
> *Professor, Statistics and Data Science*
> *Graduate Coordinator*
>
> *Department of Mathematics and Statistics*
> *University of Southern Maine*


Re: [R] Splitting a data column randomly into 3 groups

2021-09-02 Thread AbouEl-Makarim Aboueissa
Sorry, please forget about it. I believe that I am very serious when I
posted my question.

with thanks
abou
__


*AbouEl-Makarim Aboueissa, PhD*

*Professor, Statistics and Data Science*
*Graduate Coordinator*

*Department of Mathematics and Statistics*
*University of Southern Maine*



On Thu, Sep 2, 2021 at 9:42 PM Avi Gross via R-help 
wrote:

> What is stopping you Abou?
>
> Some of us here start wondering if we have better things to do than
> homework for others. Help is supposed to be after they try and encounter
> issues that we may help with.
>
> So think about your problem. You supplied data in a file that is NOT in
> CSV format but is in Tab separated format.
>
> You need to get it in to your program and store it in something. It looks
> like you have 204 items so 1/3 of those would be exactly 68.
>
> So if your data is in an object like a vector or data.frame, you want to
> choose random number between 1 and 204. How do you do that? You need 1/3 of
> the length of the object items, in your case 68.
>
> Now extract the items with  those indices into say A1. Extract all the
> rest into a temporary item.
>
> Make another 68 random indices, with no overlap, and copy those items into
> A2 and the ones that do not have those into A3 and you are sort of done,
> other than some cleanup or whatever.
>
> There are many ways to do the above and I am sure packages too.
>
> But since you have made no visible effort, I personally am not going to
> pick anything in particular.
>
> Had you shown some text and code along the lines of the above and just
> wanted to know how to copy just the ones that were not selected, we could
> easily ...
>
>
> -Original Message-
> From: R-help  On Behalf Of AbouEl-Makarim
> Aboueissa
> Sent: Thursday, September 2, 2021 9:30 PM
> To: R mailing list 
> Subject: [R] Splitting a data column randomly into 3 groups
>
> Dear All:
>
> How to split a column data *randomly* into three groups. Please see the
> attached data. I need to split column #2 titled "Data"
>
> with many thanks
> abou
> __
>
>
> *AbouEl-Makarim Aboueissa, PhD*
>
> *Professor, Statistics and Data Science* *Graduate Coordinator*
>
> *Department of Mathematics and Statistics* *University of Southern Maine*
>


Re: [R] Splitting a data column randomly into 3 groups

2021-09-02 Thread Avi Gross via R-help
What is stopping you Abou?

Some of us here start wondering if we have better things to do than homework 
for others. Help is supposed to be after they try and encounter issues that we 
may help with.

So think about your problem. You supplied data in a file that is NOT in CSV 
format but is in Tab separated format.

You need to get it into your program and store it in something. It looks like
you have 204 items so 1/3 of those would be exactly 68.

So if your data is in an object like a vector or data.frame, you want to choose
random numbers between 1 and 204, without repeats. How do you do that? You need
1/3 of the length of the object, in your case 68.

Now extract the items with  those indices into say A1. Extract all the rest 
into a temporary item.

Make another 68 random indices, with no overlap, and copy those items into A2 
and the ones that do not have those into A3 and you are sort of done, other 
than some cleanup or whatever.
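
As one possible reading of the steps just described, a base R sketch could go as follows. The data frame is a made-up stand-in for the attached file (which, being tab separated, could presumably be read with read.delim()), and the index names idx1, idx2, idx3 are hypothetical; only A1, A2 and A3 come from the description above.

```r
# Sketch: draw 68 indices, then 68 more from what is left, and keep the rest.
set.seed(2021)                                # arbitrary seed for reproducibility
dat <- data.frame(ID = 1:204,
                  Data = sample(1:500, 204))  # made-up values, not the real data
n <- nrow(dat)                                # 204
k <- n %/% 3                                  # 68

idx1 <- sample(n, k)                          # first third, chosen at random
rest <- setdiff(seq_len(n), idx1)             # everything not yet selected
idx2 <- sample(rest, k)                       # second third, no overlap with idx1
idx3 <- setdiff(rest, idx2)                   # whatever remains

A1 <- dat$Data[idx1]
A2 <- dat$Data[idx2]
A3 <- dat$Data[idx3]
```

Note that sample(rest, k) is only safe here because rest holds more than one value; with a single remaining number, sample() would treat it as an upper bound instead.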

There are many ways to do the above and I am sure packages too.

But since you have made no visible effort, I personally am not going to pick 
anything in particular.

Had you shown some text and code along the lines of the above and just wanted 
to know how to copy just the ones that were not selected, we could easily ...


-Original Message-
From: R-help  On Behalf Of AbouEl-Makarim 
Aboueissa
Sent: Thursday, September 2, 2021 9:30 PM
To: R mailing list 
Subject: [R] Splitting a data column randomly into 3 groups

Dear All:

How to split a column data *randomly* into three groups. Please see the 
attached data. I need to split column #2 titled "Data"

with many thanks
abou
__


*AbouEl-Makarim Aboueissa, PhD*

*Professor, Statistics and Data Science* *Graduate Coordinator*

*Department of Mathematics and Statistics* *University of Southern Maine*



[R] Splitting a data column randomly into 3 groups

2021-09-02 Thread AbouEl-Makarim Aboueissa
Dear All:

How to split a column data *randomly* into three groups. Please see the
attached data. I need to split column #2 titled "Data"

with many thanks
abou
__


*AbouEl-Makarim Aboueissa, PhD*

*Professor, Statistics and Data Science*
*Graduate Coordinator*

*Department of Mathematics and Statistics*
*University of Southern Maine*
ID Data
1   366
2   394
3   222
4   396
5   399
6   158
7   361
8   426
9   255
10  32
11  31
12  53
13  377
14  405
15  448
16  362
17  260
18  90
19  95
20  8
21  385
22  306
23  154
24  345
25  136
26  39
27  472
28  19
29  404
30  463
31  134
32  72
33  477
34  22
35  240
36  389
37  482
38  287
39  180
40  140
41  456
42  403
43  81
44  425
45  57
46  251
47  421
48  343
49  310
50  62
51  412
52  93
53  111
54  148
55  311
56  430
57  12
58  100
59  437
60  363
61  126
62  367
63  165
64  272
65  171
66  167
67  234
68  113
69  315
70  175
71  484
72  379
73  474
74  216
75  250
76  177
77  293
78  133
79  203
80  408
81  150
82  155
83  223
84  381
85  336
86  368
87  290
88  359
89  333
90  219
91  455
92  427
93  444
94  178
95  302
96  221
97  248
98  160
99  304
100 56
101 25
102 400
103 485
104 89
105 254
106 186
107 283
108 431
109 188
110 354
111 119
112 67
113 415
114 346
115 319
116 344
117 121
118 34
119 288
120 416
121 308
122 340
123 166
124 443
125 388
126 286
127 245
128 406
129 253
130 395
131 274
132 428
133 329
134 410
135 127
136 420
137 187
138 244
139 125
140 137
141 206
142 205
143 327
144 211
145 7
146 192
147 317
148 60
149 54
150 4
151 434
152 233
153 47
154 280
155 76
156 398
157 320
158 347
159 453
160 465
161 382
162 476
163 213
164 418
165 409
166 230
167 3
168 229
169 436
170 262
171 77
172 207
173 118
174 99
175 243
176 27
177 479
178 438
179 152
180 109
181 330
182 17
183 179
184 323
185 124
186 296
187 435
188 225
189 128
190 84
191 316
192 195
193 74
194 138
195 149
196 63
197 249
198 104
199 35
200 228
201 44
202 275
203 259
204 356