Re: [R] r-data partitioning considering two variables (character and numeric)
Thanks Bert, worked nicely. Yes, genotypes with only one ID will be
eliminated before partitioning the data.

Best regards

Ahmed Attia

On Mon, Aug 27, 2018 at 8:09 PM, Bert Gunter wrote:
> Just partition the unique stand_ID's and select on them using %in%, say:
>
> id <- unique(dataGenotype$stand_ID)
> tst <- sample(id, floor(length(id)/2))
> wh <- dataGenotype$stand_ID %in% tst ## logical vector
> test <- dataGenotype[wh,]
> train <- dataGenotype[!wh,]
>
> There are a million variations on this theme I'm sure.
>
> -- Bert

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] r-data partitioning considering two variables (character and numeric)
Sorry, my bad -- careless reading: you need to do the partitioning within
genotype. Something like:

by(dataGenotype, dataGenotype$Genotype, function(x) {
   u <- unique(x$stand_ID)                               ## IDs in this genotype
   tst <- x$stand_ID %in% sample(u, floor(length(u)/2))  ## rows of sampled IDs
   list(test = x[tst,], train = x[!tst,])
})

This will give a list, each component of which splits one Genotype into
test and train data frame subsets by ID. These lists of data frames can
then be recombined into a single test and a single train data frame by,
e.g., an appropriate rbind() call.

HOWEVER, note that you will need to modify this function to decide what to
do if/when there is only one ID in a Genotype, as Don MacQueen already
pointed out.

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Mon, Aug 27, 2018 at 4:09 PM Bert Gunter wrote:
> Just partition the unique stand_ID's and select on them using %in%, say:
>
> id <- unique(dataGenotype$stand_ID)
> tst <- sample(id, floor(length(id)/2))
> wh <- dataGenotype$stand_ID %in% tst ## logical vector
> test <- dataGenotype[wh,]
> train <- dataGenotype[!wh,]
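The within-genotype split and the rbind() recombination described in this message can be sketched as below. This is only one possible reading, not Bert's own code: the toy data frame, the made-up genotype X99 (a single stand, ID 50), the fixed seed, and the choice to drop single-stand genotypes (as Ahmed says he will) are all assumptions for illustration.

```r
set.seed(42)  # assumption: fixed seed only so the example is repeatable

# Toy stand-in for dataGenotype; X99 / stand 50 is hypothetical
dataGenotype <- data.frame(
  Genotype = rep(c("H13", "H15", "U21", "X99"), times = c(8, 7, 5, 2)),
  stand_ID = rep(c(7, 18, 21, 9, 20, 32, 3, 67, 50),
                 times = c(3, 3, 2, 3, 3, 1, 3, 2, 2))
)

# Drop genotypes that have only one distinct stand_ID
n_ids <- tapply(dataGenotype$stand_ID, dataGenotype$Genotype,
                function(s) length(unique(s)))
keep <- dataGenotype[dataGenotype$Genotype %in% names(n_ids)[n_ids > 1], ]

# Within each genotype, send floor(half) of its stands to test
pieces <- lapply(split(keep, keep$Genotype), function(x) {
  u <- unique(x$stand_ID)
  # Sample positions rather than values: sample(u, k) treats a single
  # number u as 1:u, which is easy to trip over
  pick <- u[sample.int(length(u), floor(length(u) / 2))]
  tst <- x$stand_ID %in% pick
  list(test = x[tst, ], train = x[!tst, ])
})

# Recombine the per-genotype pieces into one test and one train data frame
test  <- do.call(rbind, lapply(pieces, `[[`, "test"))
train <- do.call(rbind, lapply(pieces, `[[`, "train"))
```

With at least two stands per remaining genotype, every genotype ends up with at least one stand on each side, and no stand is shared between test and train.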
Re: [R] r-data partitioning considering two variables (character and numeric)
And yes, I ignored Genotype, but for the example data none of the stand_ID
values are present in more than one Genotype, so it doesn't matter. If
that's not true in general, then constructing the grp variable is a little
more complex, but the principle is the same.

--
Don MacQueen
Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062
Lab cell 925-724-7509

On 8/27/18, 4:10 PM, "R-help on behalf of MacQueen, Don via R-help" wrote:

    You could start with split()

    grp <- rep('', nrow(mydata) )
    grp[mydata$stand_ID %in% c(7,9,67)] <- 'A-training'
    grp[mydata$stand_ID %in% c(3,18,20,21,32)] <- 'B-testing'

    split(mydata, grp)

    or perhaps

    grp <- ifelse( mydata$stand_ID %in% c(7,9,67) , 'A-training', 'B-testing' )
    split(mydata, grp)

    -Don
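For the case Don mentions, where a stand_ID value can occur under more than one Genotype, one way to build grp is to key on the Genotype/stand_ID combination instead of stand_ID alone. A minimal sketch follows; the data frame, the "_" separator and the chosen training keys are hypothetical and only illustrate the idea.

```r
# Hypothetical data in which stand_ID 7 occurs under two genotypes
mydata <- data.frame(
  Genotype = c("H13", "H13", "H13", "H15", "H15", "H15"),
  stand_ID = c(7, 7, 18, 7, 20, 20)
)

# One key per Genotype/stand_ID combination
key <- paste(mydata$Genotype, mydata$stand_ID, sep = "_")

# Assumed choice of training combinations, for illustration only
train_keys <- c("H13_7", "H15_20")
grp <- ifelse(key %in% train_keys, "A-training", "B-testing")

parts <- split(mydata, grp)
```

Here H13 stand 7 goes to training while H15 stand 7 goes to testing, which a test on stand_ID alone could not express.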
Re: [R] r-data partitioning considering two variables (character and numeric)
You could start with split()

grp <- rep('', nrow(mydata) )
grp[mydata$stand_ID %in% c(7,9,67)] <- 'A-training'
grp[mydata$stand_ID %in% c(3,18,20,21,32)] <- 'B-testing'

split(mydata, grp)

or perhaps

grp <- ifelse( mydata$stand_ID %in% c(7,9,67) , 'A-training', 'B-testing' )
split(mydata, grp)

-Don

--
Don MacQueen
Lawrence Livermore National Laboratory

On 8/27/18, 3:54 PM, "R-help on behalf of Ahmed Attia" wrote:

    I would like to partition the following dataset (dataGenotype) based
    on two variables; Genotype and stand_ID, for example, for Genotype
    H13: stand_ID number 7 may go to training and stand_ID number 18 and
    21 may go to testing.
    [...]
    How can I partition the data by Genotype and stand_ID together?
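split() returns a named list, so the two subsets can be pulled out by group label. A small sketch, assuming a cut-down mydata that carries only the Genotype and stand_ID columns with the same row counts per stand as the posted table:

```r
# Cut-down stand-in for mydata (key columns only)
mydata <- data.frame(
  Genotype = rep(c("H13", "H15", "U21"), times = c(8, 7, 5)),
  stand_ID = rep(c(7, 18, 21, 9, 20, 32, 3, 67),
                 times = c(3, 3, 2, 3, 3, 1, 3, 2))
)

train_ids <- c(7, 9, 67)  # stands listed under A-training in the post
grp <- ifelse(mydata$stand_ID %in% train_ids, "A-training", "B-testing")

parts <- split(mydata, grp)
train <- parts[["A-training"]]
test  <- parts[["B-testing"]]

nrow(train)  # 8
nrow(test)   # 12
```

The row counts match the A-training and B-testing tables in the original question.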
Re: [R] r-data partitioning considering two variables (character and numeric)
Just partition the unique stand_ID's and select on them using %in%, say:

id <- unique(dataGenotype$stand_ID)
tst <- sample(id, floor(length(id)/2))
wh <- dataGenotype$stand_ID %in% tst ## logical vector
test <- dataGenotype[wh,]
train <- dataGenotype[!wh,]

There are a million variations on this theme I'm sure.

-- Bert

Bert Gunter

On Mon, Aug 27, 2018 at 3:54 PM Ahmed Attia wrote:
> I would like to partition the following dataset (dataGenotype) based
> on two variables; Genotype and stand_ID, for example, for Genotype
> H13: stand_ID number 7 may go to training and stand_ID number 18 and
> 21 may go to testing.
> [...]
> How can I partition the data by Genotype and stand_ID together?
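A runnable version of this ID-level split, with a check that no stand ends up on both sides, might look like the following. The toy data frame (key columns only) and the fixed seed are assumptions for the example. Note that this version does not stratify by Genotype, which is the point Bert's follow-up message addresses.

```r
set.seed(1)  # assumption: fixed seed only so the example is repeatable

# Toy stand-in for dataGenotype with just the two key columns
dataGenotype <- data.frame(
  Genotype = rep(c("H13", "H15", "U21"), times = c(8, 7, 5)),
  stand_ID = rep(c(7, 18, 21, 9, 20, 32, 3, 67),
                 times = c(3, 3, 2, 3, 3, 1, 3, 2))
)

id  <- unique(dataGenotype$stand_ID)      # 8 distinct stands
tst <- sample(id, floor(length(id) / 2))  # 4 stands drawn for testing
wh  <- dataGenotype$stand_ID %in% tst     # TRUE for rows of sampled stands

test  <- dataGenotype[wh, ]
train <- dataGenotype[!wh, ]

# No stand appears on both sides
length(intersect(test$stand_ID, train$stand_ID))  # 0

# A genotype may still land mostly or entirely on one side
table(dataGenotype$Genotype, ifelse(wh, "test", "train"))
```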
[R] r-data partitioning considering two variables (character and numeric)
I would like to partition the following dataset (dataGenotype) based
on two variables; Genotype and stand_ID, for example, for Genotype
H13: stand_ID number 7 may go to training and stand_ID number 18 and
21 may go to testing.

Genotype  stand_ID  Inventory_date  stemC       mheight
H13       7         5/18/2006       1940.1075   11.33995
H13       7         11/1/2008       10898.9597  23.20395
H13       7         4/14/2009       12830.1284  23.77395
H13       18        11/3/2005       2726.42     13.4432
H13       18        6/30/2008       12226.1554  24.091967
H13       18        4/14/2009       14141.68    25.0922
H13       21        5/18/2006       4981.7158   15.7173
H13       21        4/14/2009       20327.0667  27.9155
H15       9         3/31/2006       3570.06     14.7898
H15       9         11/1/2008       15138.8383  26.2088
H15       9         4/14/2009       17035.4688  26.8778
H15       20        1/18/2005       3016.881    14.1886
H15       20        10/4/2006       8330.4688   20.19425
H15       20        6/30/2008       13576.5     25.4774
H15       32        2/1/2006        3426.2525   14.31815
U21       3         1/9/2006        3660.416    15.09925
U21       3         6/30/2008       13236.29    24.27634
U21       3         4/14/2009       16124.192   25.79562
U21       67        11/4/2005       2812.8425   13.60485
U21       67        4/14/2009       13468.455   24.6203

And the desired output is the following;

A-training

Genotype  stand_ID  Inventory_date  stemC       mheight
H13       7         5/18/2006       1940.1075   11.33995
H13       7         11/1/2008       10898.9597  23.20395
H13       7         4/14/2009       12830.1284  23.77395
H15       9         3/31/2006       3570.06     14.7898
H15       9         11/1/2008       15138.8383  26.2088
H15       9         4/14/2009       17035.4688  26.8778
U21       67        11/4/2005       2812.8425   13.60485
U21       67        4/14/2009       13468.455   24.6203

B-testing

Genotype  stand_ID  Inventory_date  stemC       mheight
H13       18        11/3/2005       2726.42     13.4432
H13       18        6/30/2008       12226.1554  24.091967
H13       18        4/14/2009       14141.68    25.0922
H13       21        5/18/2006       4981.7158   15.7173
H13       21        4/14/2009       20327.0667  27.9155
H15       20        1/18/2005       3016.881    14.1886
H15       20        10/4/2006       8330.4688   20.19425
H15       20        6/30/2008       13576.5     25.4774
H15       32        2/1/2006        3426.2525   14.31815
U21       3         1/9/2006        3660.416    15.09925
U21       3         6/30/2008       13236.29    24.27634
U21       3         4/14/2009       16124.192   25.79562

I tried the following code;

library(caret)
dataPartitioning <-
createDataPartition(dataGenotype$stand_ID,1,list=F,p=0.2)
train = dataGenotype[dataPartitioning,]
test = dataGenotype[-dataPartitioning,]

Also tried

createDataPartition(unique(dataGenotype$stand_ID),1,list=F,p=0.2)

It did not produce the desired output, the data are partitioned within
the stand_ID. For example, one row of stand_ID 7 goes to training and
two rows of stand_ID 7 go to testing. How can I partition the data by
Genotype and stand_ID together?

Ahmed Attia
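To experiment with the question, it helps to have the posted table as a data frame. The sketch below is one way to read it in. The values are transcribed from the post, on the assumption that the run-together fields in the archived copy separate as shown here; converting Inventory_date to Date is an addition, not something the post does.

```r
# Values transcribed from the post; field boundaries in the archived copy
# were run together, so the separation below is an assumption
dataGenotype <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
Genotype stand_ID Inventory_date stemC mheight
H13  7 5/18/2006  1940.1075 11.33995
H13  7 11/1/2008 10898.9597 23.20395
H13  7 4/14/2009 12830.1284 23.77395
H13 18 11/3/2005  2726.42   13.4432
H13 18 6/30/2008 12226.1554 24.091967
H13 18 4/14/2009 14141.68   25.0922
H13 21 5/18/2006  4981.7158 15.7173
H13 21 4/14/2009 20327.0667 27.9155
H15  9 3/31/2006  3570.06   14.7898
H15  9 11/1/2008 15138.8383 26.2088
H15  9 4/14/2009 17035.4688 26.8778
H15 20 1/18/2005  3016.881  14.1886
H15 20 10/4/2006  8330.4688 20.19425
H15 20 6/30/2008 13576.5    25.4774
H15 32 2/1/2006   3426.2525 14.31815
U21  3 1/9/2006   3660.416  15.09925
U21  3 6/30/2008 13236.29   24.27634
U21  3 4/14/2009 16124.192  25.79562
U21 67 11/4/2005  2812.8425 13.60485
U21 67 4/14/2009 13468.455  24.6203
")

# Dates are month/day/year in the post
dataGenotype$Inventory_date <- as.Date(dataGenotype$Inventory_date,
                                       format = "%m/%d/%Y")

# Rows per stand within each genotype
with(dataGenotype, table(Genotype, stand_ID))
```

The cross-tabulation shows which stands belong to which genotype, which is the unit the split has to respect.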