Sorry, my bad -- careless reading: you need to do the partitioning within genotype. Something like:

by(dataGenotype, dataGenotype$Genotype, function(x) {
   u <- unique(x$stand_ID)   ## stand IDs within this Genotype
   tst <- x$stand_ID %in% sample(u, floor(length(u)/2))   ## logical vector
   list(test = x[tst, ], train = x[!tst, ])
})

This returns a list with one component per Genotype. Each component is
itself a list holding the test and train data frame subsets for that
Genotype, split by stand_ID. These lists of data frames can then be
recombined into a single test and a single train data frame by, e.g., an
appropriate rbind() call.

HOWEVER, note that you will need to modify this function to decide what to
do if/when there is only one ID in a Genotype, as Don MacQueen already
pointed out. As written, floor(1/2) is 0, so every row of such a Genotype
ends up in train.

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Mon, Aug 27, 2018 at 4:09 PM Bert Gunter <bgunter.4...@gmail.com> wrote:

> Just partition the unique stand_ID's and select on them using %in% , say:
>
> id <- unique(dataGenotype$stand_ID)
> tst <- sample(id, floor(length(id)/2))
> wh <- dataGenotype$stand_ID %in% tst  ## logical vector
> test <- dataGenotype[wh,]
> train <- dataGenotype[!wh,]
>
> There are a million variations on this theme I'm sure.
>
> -- Bert
>
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along and
> sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>
> On Mon, Aug 27, 2018 at 3:54 PM Ahmed Attia <ahmedati...@gmail.com> wrote:
>
>> I would like to partition the following dataset (dataGenotype) based
>> on two variables; Genotype and stand_ID, for example, for Genotype
>> H13: stand_ID number 7 may go to training and stand_ID number 18 and
>> 21 may go to testing.
>>
>> Genotype  stand_ID  Inventory_date  stemC       mheight
>> H13       7         5/18/2006       1940.1075   11.33995
>> H13       7         11/1/2008       10898.9597  23.20395
>> H13       7         4/14/2009       12830.1284  23.77395
>> H13       18        11/3/2005       2726.42     13.4432
>> H13       18        6/30/2008       12226.1554  24.091967
>> H13       18        4/14/2009       14141.68    25.0922
>> H13       21        5/18/2006       4981.7158   15.7173
>> H13       21        4/14/2009       20327.0667  27.9155
>> H15       9         3/31/2006       3570.06     14.7898
>> H15       9         11/1/2008       15138.8383  26.2088
>> H15       9         4/14/2009       17035.4688  26.8778
>> H15       20        1/18/2005       3016.881    14.1886
>> H15       20        10/4/2006       8330.4688   20.19425
>> H15       20        6/30/2008       13576.5     25.4774
>> H15       32        2/1/2006        3426.2525   14.31815
>> U21       3         1/9/2006        3660.416    15.09925
>> U21       3         6/30/2008       13236.29    24.27634
>> U21       3         4/14/2009       16124.192   25.79562
>> U21       67        11/4/2005       2812.8425   13.60485
>> U21       67        4/14/2009       13468.455   24.6203
>>
>> And the desired output is the following;
>>
>> A-training
>>
>> Genotype  stand_ID  Inventory_date  stemC       mheight
>> H13       7         5/18/2006       1940.1075   11.33995
>> H13       7         11/1/2008       10898.9597  23.20395
>> H13       7         4/14/2009       12830.1284  23.77395
>> H15       9         3/31/2006       3570.06     14.7898
>> H15       9         11/1/2008       15138.8383  26.2088
>> H15       9         4/14/2009       17035.4688  26.8778
>> U21       67        11/4/2005       2812.8425   13.60485
>> U21       67        4/14/2009       13468.455   24.6203
>>
>> B-testing
>>
>> Genotype  stand_ID  Inventory_date  stemC       mheight
>> H13       18        11/3/2005       2726.42     13.4432
>> H13       18        6/30/2008       12226.1554  24.091967
>> H13       18        4/14/2009       14141.68    25.0922
>> H13       21        5/18/2006       4981.7158   15.7173
>> H13       21        4/14/2009       20327.0667  27.9155
>> H15       20        1/18/2005       3016.881    14.1886
>> H15       20        10/4/2006       8330.4688   20.19425
>> H15       20        6/30/2008       13576.5     25.4774
>> H15       32        2/1/2006        3426.2525   14.31815
>> U21       3         1/9/2006        3660.416    15.09925
>> U21       3         6/30/2008       13236.29    24.27634
>> U21       3         4/14/2009       16124.192   25.79562
>>
>> I tried the following code;
>>
>> library(caret)
>> dataPartitioning <-
>> createDataPartition(dataGenotype$stand_ID,1,list=F,p=0.2)
>> train = dataGenotype[dataPartitioning,]
>> test = dataGenotype[-dataPartitioning,]
>>
>> Also tried
>>
>> createDataPartition(unique(dataGenotype$stand_ID),1,list=F,p=0.2)
>>
>> It did not produce the desired output, the data are partitioned within
>> the stand_ID. For example, one row of stand_ID 7 goes to training and
>> two rows of stand_ID 7 go to testing. How can I partition the data by
>> Genotype and stand_ID together?
>>
>>
>>
>> Ahmed Attia
>>
>> ______________________________________________
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
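
P.S. For the recombination step and the single-ID case mentioned at the top, here is a minimal sketch rather than a definitive solution. It uses split() plus lapply() in place of by(), which makes pulling out the pieces a little more direct. The helper name split_by_stand and the rule "a Genotype with only one stand_ID goes entirely to train" are assumptions of mine, not something settled in this thread. The toy data frame reuses a few rows from the posted data, with U21 cut down to one stand purely to exercise that case.

```r
## Hypothetical helper: split one Genotype's rows into test/train by stand_ID.
split_by_stand <- function(x) {
  u <- unique(x$stand_ID)
  ## Assumption: with a single stand_ID there is nothing to split,
  ## so all rows go to train and test gets a zero-row data frame.
  if (length(u) < 2) {
    return(list(test = x[0, ], train = x))
  }
  ## Index into u instead of calling sample(u, ...) directly, because
  ## sample() treats a single number n as 1:n.
  picked <- u[sample.int(length(u), floor(length(u) / 2))]
  tst <- x$stand_ID %in% picked
  list(test = x[tst, ], train = x[!tst, ])
}

## A few rows taken from the posted data; U21 reduced to one stand on purpose.
toy <- data.frame(
  Genotype = c("H13", "H13", "H13", "H13", "H15", "H15", "U21"),
  stand_ID = c(7, 7, 18, 21, 9, 20, 3),
  stemC    = c(1940.1075, 10898.9597, 2726.42, 4981.7158,
               3570.06, 3016.881, 3660.416)
)

set.seed(1)  ## only so the split is repeatable
parts <- lapply(split(toy, toy$Genotype), split_by_stand)

## Recombine the per-Genotype pieces into one test and one train data frame.
test  <- do.call(rbind, lapply(parts, `[[`, "test"))
train <- do.call(rbind, lapply(parts, `[[`, "train"))
```

Every row lands in exactly one of test or train, and no stand_ID is shared between them. If you would rather drop single-stand Genotypes or send them to test, change the early return accordingly.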