Re: [R] r-data partitioning considering two variables (character and numeric)

2018-08-27 Thread Ahmed Attia
Thanks Bert, worked nicely. Yes, genotypes with only one ID will be
eliminated before partitioning the data.


Best regards

Ahmed Attia






On Mon, Aug 27, 2018 at 8:09 PM, Bert Gunter  wrote:
> Just partition the unique stand_ID's and select on them using %in% , say:
>
> id <- unique(dataGenotype$stand_ID)
> tst <- sample(id, floor(length(id)/2))
> wh <- dataGenotype$stand_ID %in% tst ## logical vector
> test<- dataGenotype[wh,]
> train <- dataGenotype[!wh,]
>
> There are a million variations on this theme I'm sure.
>
> -- Bert
>
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along and
> sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] r-data partitioning considering two variables (character and numeric)

2018-08-27 Thread Bert Gunter
Sorry, my bad -- careless reading: you need to do the partitioning within
genotype.
Something like:

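by(dataGenotype, dataGenotype$Genotype, function(x) {
  ## unique stand_IDs within this Genotype
  u <- unique(x$stand_ID)
  ## rows whose stand_ID falls in a random half of those IDs go to test
  tst <- x$stand_ID %in% sample(u, floor(length(u)/2))
  list(test = x[tst, ], train = x[!tst, ])
})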


This will give a list with one component per Genotype; each component holds
that Genotype's test and train data frame subsets, split by stand_ID. These
lists of data frames can then be recombined into a single test and a single
train data frame by, e.g., an appropriate rbind() call.
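
The recombination step might be sketched as follows. This is only an illustration under assumptions: the toy data frame carries just the Genotype and stand_ID columns from the original post, set.seed() is added purely for reproducibility, and lapply() over split() is used in place of by() so that the result is a plain list.

```r
# Toy data: Genotype and stand_ID only, with the row counts of the posted example
dataGenotype <- data.frame(
  Genotype = rep(c("H13", "H15", "U21"), times = c(8, 7, 5)),
  stand_ID = c(7, 7, 7, 18, 18, 18, 21, 21,
               9, 9, 9, 20, 20, 20, 32,
               3, 3, 3, 67, 67)
)

set.seed(1)  # assumption: seed chosen arbitrarily, only for reproducibility

# Split the stand_IDs within each Genotype into test and train
parts <- lapply(split(dataGenotype, dataGenotype$Genotype), function(x) {
  u <- unique(x$stand_ID)
  tst <- x$stand_ID %in% sample(u, floor(length(u) / 2))
  list(test = x[tst, ], train = x[!tst, ])
})

# Pull out the "test" (resp. "train") piece of every Genotype and stack them
test  <- do.call(rbind, lapply(parts, `[[`, "test"))
train <- do.call(rbind, lapply(parts, `[[`, "train"))
```

With three stand_IDs in H13 and H15 and two in U21, floor(length(u)/2) sends one stand_ID per Genotype to test and the rest to train.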


HOWEVER, note that you will need to modify this function to decide what to
do if/when there is only one ID in a Genotype, as Don MacQueen already
pointed out.
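
One possible way to deal with that case, sketched under assumptions: count the distinct stand_IDs per Genotype and set aside any Genotype that has fewer than two before partitioning. The Genotype "X99" and its stand_ID below are hypothetical, added only so the toy data contains a single-ID case; whether dropping such genotypes is appropriate depends on the analysis.

```r
# Toy data; "X99" is a hypothetical Genotype with a single stand_ID
dataGenotype <- data.frame(
  Genotype = c("H13", "H13", "H13", "H13", "X99", "X99"),
  stand_ID = c(7, 7, 18, 18, 5, 5)
)

# Number of distinct stand_IDs in each Genotype
n_ids <- tapply(dataGenotype$stand_ID, dataGenotype$Genotype,
                function(s) length(unique(s)))

# Keep only Genotypes that can contribute to both test and train
keep <- names(n_ids)[n_ids >= 2]
dataMulti <- dataGenotype[dataGenotype$Genotype %in% keep, ]
```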

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Mon, Aug 27, 2018 at 4:09 PM Bert Gunter  wrote:

> Just partition the unique stand_ID's and select on them using %in% , say:
>
> id <- unique(dataGenotype$stand_ID)
> tst <- sample(id, floor(length(id)/2))
> wh <- dataGenotype$stand_ID %in% tst ## logical vector
> test<- dataGenotype[wh,]
> train <- dataGenotype[!wh,]
>
> There are a million variations on this theme I'm sure.
>
> -- Bert

Re: [R] r-data partitioning considering two variables (character and numeric)

2018-08-27 Thread MacQueen, Don via R-help
And yes, I ignored Genotype, but for the example data none of the stand_ID 
values are present in more than one Genotype, so it doesn't matter. If that's 
not true in general, then constructing the grp variable is a little more 
complex, but the principle is the same.
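
A minimal sketch of that more general case, assuming a stand_ID can occur under more than one Genotype: build a combined key from both columns and assign groups on the key. The toy rows and the choice of training keys are hypothetical, picked only for illustration.

```r
# Hypothetical toy data in which stand_ID 7 appears under two Genotypes
mydata <- data.frame(
  Genotype = c("H13", "H13", "H15", "H15", "H15"),
  stand_ID = c(7, 18, 7, 9, 9)
)

# Combined Genotype/stand_ID key, e.g. "H13_7"
key <- paste(mydata$Genotype, mydata$stand_ID, sep = "_")

# Hypothetical selection of keys for training; everything else is testing
train_keys <- c("H13_7", "H15_9")
grp <- ifelse(key %in% train_keys, "A-training", "B-testing")

parts <- split(mydata, grp)
```

Here stand_ID 7 goes to training for H13 but to testing for H15, which a grp built on stand_ID alone could not express.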

--
Don MacQueen
Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062
Lab cell 925-724-7509
 
 

On 8/27/18, 4:10 PM, "R-help on behalf of MacQueen, Don via R-help" 
 wrote:

You could start with split()

grp <- rep('', nrow(mydata) )
grp[mydata$stand_ID %in% c(7,9,67)] <- 'A-training'
grp[mydata$stand_ID %in% c(3,18,20,21,32)] <- 'B-testing'

split(mydata, grp)

or perhaps

grp <- ifelse(  mydata$stand_ID %in% c(7,9,67) , 'A-training', 'B-testing' )
split(mydata, grp)

-Don


Re: [R] r-data partitioning considering two variables (character and numeric)

2018-08-27 Thread MacQueen, Don via R-help
You could start with split()

grp <- rep('', nrow(mydata) )
grp[mydata$stand_ID %in% c(7,9,67)] <- 'A-training'
grp[mydata$stand_ID %in% c(3,18,20,21,32)] <- 'B-testing'

split(mydata, grp)

or perhaps

grp <- ifelse(  mydata$stand_ID %in% c(7,9,67) , 'A-training', 'B-testing' )
split(mydata, grp)

-Don

--
Don MacQueen
Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062
Lab cell 925-724-7509
 
 



Re: [R] r-data partitioning considering two variables (character and numeric)

2018-08-27 Thread Bert Gunter
Just partition the unique stand_ID's and select on them using %in% , say:

id <- unique(dataGenotype$stand_ID)
tst <- sample(id, floor(length(id)/2))
wh <- dataGenotype$stand_ID %in% tst ## logical vector
test<- dataGenotype[wh,]
train <- dataGenotype[!wh,]

There are a million variations on this theme I'm sure.
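
As a quick check that a split of this kind keeps every stand_ID on one side only, something like the following could be run. It is a sketch on toy data (Genotype and stand_ID columns only, row counts as in the posted example); the set.seed() call is an assumption added for reproducibility.

```r
# Toy data: Genotype and stand_ID only, with the row counts of the posted example
dataGenotype <- data.frame(
  Genotype = rep(c("H13", "H15", "U21"), times = c(8, 7, 5)),
  stand_ID = c(7, 7, 7, 18, 18, 18, 21, 21,
               9, 9, 9, 20, 20, 20, 32,
               3, 3, 3, 67, 67)
)

set.seed(42)  # assumption: arbitrary seed, only for reproducibility

id  <- unique(dataGenotype$stand_ID)
tst <- sample(id, floor(length(id) / 2))
wh  <- dataGenotype$stand_ID %in% tst
test  <- dataGenotype[wh, ]
train <- dataGenotype[!wh, ]

# stand_IDs present on both sides; an empty result means no stand_ID was split
shared <- intersect(test$stand_ID, train$stand_ID)

# Which Genotypes ended up on each side (this split does not balance them)
table(test$Genotype)
table(train$Genotype)
```

Because the IDs are sampled without regard to Genotype, a Genotype can land entirely in test or entirely in train, which the two table() calls make visible.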

-- Bert


Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )




[R] r-data partitioning considering two variables (character and numeric)

2018-08-27 Thread Ahmed Attia
I would like to partition the following dataset (dataGenotype) based
on two variables, Genotype and stand_ID. For example, for Genotype
H13, stand_ID 7 may go to training and stand_IDs 18 and 21 may go to
testing.

Genotype  stand_ID  Inventory_date  stemC       mheight
H13       7         5/18/2006       1940.1075   11.33995
H13       7         11/1/2008       10898.9597  23.20395
H13       7         4/14/2009       12830.1284  23.77395
H13       18        11/3/2005       2726.42     13.4432
H13       18        6/30/2008       12226.1554  24.091967
H13       18        4/14/2009       14141.68    25.0922
H13       21        5/18/2006       4981.7158   15.7173
H13       21        4/14/2009       20327.0667  27.9155
H15       9         3/31/2006       3570.06     14.7898
H15       9         11/1/2008       15138.8383  26.2088
H15       9         4/14/2009       17035.4688  26.8778
H15       20        1/18/2005       3016.881    14.1886
H15       20        10/4/2006       8330.4688   20.19425
H15       20        6/30/2008       13576.5     25.4774
H15       32        2/1/2006        3426.2525   14.31815
U21       3         1/9/2006        3660.416    15.09925
U21       3         6/30/2008       13236.29    24.27634
U21       3         4/14/2009       16124.192   25.79562
U21       67        11/4/2005       2812.8425   13.60485
U21       67        4/14/2009       13468.455   24.6203

And the desired output is the following;

A-training

Genotype  stand_ID  Inventory_date  stemC       mheight
H13       7         5/18/2006       1940.1075   11.33995
H13       7         11/1/2008       10898.9597  23.20395
H13       7         4/14/2009       12830.1284  23.77395
H15       9         3/31/2006       3570.06     14.7898
H15       9         11/1/2008       15138.8383  26.2088
H15       9         4/14/2009       17035.4688  26.8778
U21       67        11/4/2005       2812.8425   13.60485
U21       67        4/14/2009       13468.455   24.6203

B-testing

Genotype  stand_ID  Inventory_date  stemC       mheight
H13       18        11/3/2005       2726.42     13.4432
H13       18        6/30/2008       12226.1554  24.091967
H13       18        4/14/2009       14141.68    25.0922
H13       21        5/18/2006       4981.7158   15.7173
H13       21        4/14/2009       20327.0667  27.9155
H15       20        1/18/2005       3016.881    14.1886
H15       20        10/4/2006       8330.4688   20.19425
H15       20        6/30/2008       13576.5     25.4774
H15       32        2/1/2006        3426.2525   14.31815
U21       3         1/9/2006        3660.416    15.09925
U21       3         6/30/2008       13236.29    24.27634
U21       3         4/14/2009       16124.192   25.79562

I tried the following code;

library(caret)
dataPartitioning <- createDataPartition(dataGenotype$stand_ID,1,list=F,p=0.2)
train = dataGenotype[dataPartitioning,]
test = dataGenotype[-dataPartitioning,]

Also tried

createDataPartition(unique(dataGenotype$stand_ID),1,list=F,p=0.2)

Neither produced the desired output; the data are partitioned within
stand_ID. For example, one row of stand_ID 7 goes to training and
two rows of stand_ID 7 go to testing. How can I partition the data by
Genotype and stand_ID together?



Ahmed Attia
