Re: [R] Subsetting dataframes

2007-07-19 Thread CG Pettersson

Thanks a lot.
But an ignorant R user like me needed the code example that Jim Holtman
posted off-list earlier today to understand that:

x62_samvar$cn <- x62_samvar$cn[,drop=TRUE]

was the way to code. Thank you both!
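
A minimal sketch of that assignment on a made-up data frame, in case it helps anyone reading the archive. The names `trials` and `kept` and all values are invented for illustration; only the `[, drop=TRUE]` idiom comes from the thread. The last line uses droplevels(), which as far as I know was only added in R 2.12.0, so it is not an option on the R 2.5.1 mentioned here.

```r
# Toy stand-in for the trial data (hypothetical names and values).
trials <- data.frame(
  cn    = factor(c("Arkadia", "Astoria", "Barke", "Quench")),
  yield = c(6500, 6800, 6700, 7000)
)

# Subsetting keeps the rows we ask for, but not only their levels.
kept <- subset(trials, cn %in% c("Astoria", "Quench"))

# The fix from the thread: re-index the factor with drop = TRUE
# so that levels without any observations are discarded.
kept$cn <- kept$cn[, drop = TRUE]
levels(kept$cn)

# Alternative for later R versions only (assumption: R >= 2.12.0):
# droplevels() removes unused levels from every factor column at once.
kept2 <- droplevels(subset(trials, cn %in% c("Astoria", "Quench")))
levels(kept2$cn)
```

Either form should be applied before the factor is handed to a modelling function, since such functions generally work from the stored levels rather than from the values present.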

/CG

On Thu, July 19, 2007 3:01 pm, Uwe Ligges said:
>
>
> CG Pettersson wrote:
>> Dear all!
>>
>> W2k, R 2.5.1
>>
>> I am working with an ongoing malting barley variety evaluation within
>> Sweden. The structure is 25 cultivars tested each year at four sites, in

/snip

>>
>> Where do I go wrong and how do I use subset in a proper way?
>
>
> So you have to drop the levels you are excluding. Example:
>
>    x <- factor(letters[1:4])
>    x
>    x[1:2]
>    x[1:2, drop=TRUE]
>
>
> Uwe Ligges
>
>
>
>
>> Thanks
>> /CG
>>
>

-- 
CG Pettersson, PhD
Swedish University of Agricultural Sciences (SLU)
Dept. of Crop Production Ecology. Box 7043.
SE-750 07 Uppsala, Sweden
[EMAIL PROTECTED]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Subsetting dataframes

2007-07-19 Thread Uwe Ligges



CG Pettersson wrote:
> Dear all!
> 
> W2k, R 2.5.1
> 
> I am working with an ongoing malting barley variety evaluation within
> Sweden. The structure is 25 cultivars tested each year at four sites, in
> field trials with three replicates and 'lattice' structure (the replicates
> are divided into five sub-blocks in a structured way). As we normally
> keep around 15 varieties from each year to the next and take in 10 new
> ones for the next year, we have tested a total of 72 different varieties
> over five years.
> 
> I store the data in a field trial database, and generate text tables with
> the subset of data I want and import the frame to R. I take in all
> cultivars in R and use 'subset' to select what I want to look at. Using
> lme{nlme} works with no problems to get mean results over the years, but
> as I now have a number of years I want to analyse the general site x
> cultivar relation. I am testing AMMI{agricolae} for this and it seems to
> work except for the subsetting. This is what happens:
> 
> If I do the subsetting like this:
> 
> x62_samvar <- subset(x62_5, cn %in%
> c("Astoria","Barke","Christina","Makof", "Prestige","Publican","Quench"))
> 
> A test run with AMMI seems to work in the first part:
> 
>> AMMI(site, cn, rep, yield)
> 
> ANALYSIS AMMI:  yield
> Class level information
> 
> ENV:  Hag Klb Bjt Ska
> GEN:  Astoria Prestige Makof Christina Publican Quench
> REP:  1 2 3
> 
> Number of observations:  240
> 
> model Y: yield  ~ ENV + REP%in%ENV + GEN + ENV:GEN
> 
> Analysis of Variance Table
> 
> Response: Y
>            Df    Sum Sq   Mean Sq F value    Pr(>F)
> ENV         3 120092418  40030806 90.0424 1.665e-06 ***
> REP(ENV)    8   3556620    444578  0.5674  0.803923
> GEN         5  21376142   4275228  5.4564 9.680e-05 ***
> ENV:GEN    15  28799807   1919987  2.4504  0.002555 **
> Residuals 208 162973213    783525
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> 
> Coeff var   Mean yield
> 13.08629 6764.098
> 
> After this something goes wrong, as AMMI finds a cultivar name not
> selected in the subsetting. (The plotting might go wrong anyhow, but I
> haven't got that far yet):
> 
> Error in model.frame.default(Terms, newdata, na.action = na.action, xlev =
> object$xlevels) :
> factor 'y' has new level(s) Arkadia
> 
> 
> Looking at the dataframe using
> 
>> edit(x62_samvar)
> 
> only shows the selected lines, but using levels() gives another answer as
> 
>> levels(x62_samvar$cn)
> 
> gives back all 72 cultivar names used during the five years (starting with
> Arkadia).
> 
> Where do I go wrong and how do I use subset in a proper way?


So you have to drop the levels you are excluding. Example:

   x <- factor(letters[1:4])
   x
   x[1:2]
   x[1:2, drop=TRUE]
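
A hedged sketch of what the example is meant to show, with the levels checked explicitly rather than read off the printout. The variable names `y`, `z` and `w` are added here for illustration; re-creating the factor with factor() is a second way to get the same result, not something proposed in the thread.

```r
x <- factor(letters[1:4])

# Plain indexing keeps the full level set of the original factor.
y <- x[1:2]
levels(y)

# drop = TRUE discards levels that no longer occur in the result.
z <- x[1:2, drop = TRUE]
levels(z)

# Calling factor() again on the subset also rebuilds the levels
# from the values that are actually present.
w <- factor(y)
levels(w)
```

The values of `y`, `z` and `w` are the same; only the stored level sets differ, which is what matters to functions that build a model frame from the levels.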


Uwe Ligges




> Thanks
> /CG
>

[R] Subsetting dataframes

2007-07-19 Thread CG Pettersson

Dear all!

W2k, R 2.5.1

I am working with an ongoing malting barley variety evaluation within
Sweden. The structure is 25 cultivars tested each year at four sites, in
field trials with three replicates and 'lattice' structure (the replicates
are divided into five sub-blocks in a structured way). As we normally
keep around 15 varieties from each year to the next and take in 10 new
ones for the next year, we have tested a total of 72 different varieties
over five years.

I store the data in a field trial database, and generate text tables with
the subset of data I want and import the frame to R. I take in all
cultivars in R and use 'subset' to select what I want to look at. Using
lme{nlme} works with no problems to get mean results over the years, but
as I now have a number of years I want to analyse the general site x
cultivar relation. I am testing AMMI{agricolae} for this and it seems to
work except for the subsetting. This is what happens:

If I do the subsetting like this:

x62_samvar <- subset(x62_5, cn %in%
c("Astoria","Barke","Christina","Makof", "Prestige","Publican","Quench"))

A test run with AMMI seems to work in the first part:

> AMMI(site, cn, rep, yield)

ANALYSIS AMMI:  yield
Class level information

ENV:  Hag Klb Bjt Ska
GEN:  Astoria Prestige Makof Christina Publican Quench
REP:  1 2 3

Number of observations:  240

model Y: yield  ~ ENV + REP%in%ENV + GEN + ENV:GEN

Analysis of Variance Table

Response: Y
           Df    Sum Sq   Mean Sq F value    Pr(>F)
ENV         3 120092418  40030806 90.0424 1.665e-06 ***
REP(ENV)    8   3556620    444578  0.5674  0.803923
GEN         5  21376142   4275228  5.4564 9.680e-05 ***
ENV:GEN    15  28799807   1919987  2.4504  0.002555 **
Residuals 208 162973213    783525
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Coeff var   Mean yield
13.08629 6764.098

After this something goes wrong, as AMMI finds a cultivar name not
selected in the subsetting. (The plotting might go wrong anyhow, but I
haven't got that far yet):

Error in model.frame.default(Terms, newdata, na.action = na.action, xlev =
object$xlevels) :
factor 'y' has new level(s) Arkadia


Looking at the dataframe using

> edit(x62_samvar)

only shows the selected lines, but using levels() gives another answer as

> levels(x62_samvar$cn)

gives back all 72 cultivar names used during the five years (starting with
Arkadia).

Where do I go wrong and how do I use subset in a proper way?
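
The symptom described above can probably be reproduced without the trial data. The following is a minimal sketch on an invented four-row data frame (the names `trials` and `kept` and all yields are made up); it only demonstrates that subset() leaves the level set of a factor column untouched, not what AMMI does internally.

```r
# Toy stand-in for x62_5 (hypothetical values).
trials <- data.frame(
  cn    = factor(c("Arkadia", "Astoria", "Barke", "Quench")),
  yield = c(6500, 6800, 6700, 7000)
)

# Select two cultivars the same way as in the post.
kept <- subset(trials, cn %in% c("Astoria", "Quench"))

nrow(kept)        # only the selected rows remain
levels(kept$cn)   # but every original level is still listed
table(kept$cn)    # the unselected cultivars appear with a count of 0
```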

Thanks
/CG

-- 
CG Pettersson, PhD
Swedish University of Agricultural Sciences (SLU)
Dept. of Crop Production Ecology. Box 7043.
SE-750 07 Uppsala, Sweden
[EMAIL PROTECTED]
