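Consider an alternative. It is density() that is complaining about being passed a data frame, not subset() misbehaving: with select=, subset() returns a data frame even when only one column is selected. Extracting the column with $ gives density() the numeric vector it expects: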

density(subset(mydf, ht >= 150.0 & wt <= 150.0)$age)

Call:
        density.default(x = subset(mydf, ht >= 150 & wt <= 150)$age)

Data: subset(mydf, ht >= 150 & wt <= 150)$age (29 obs.); Bandwidth 'bw' = 5.816

       x                y
 Min.   : 4.553   Min.   :3.781e-05
 1st Qu.:22.776   1st Qu.:3.108e-03
 Median :41.000   Median :1.775e-02
 Mean   :41.000   Mean   :1.370e-02
 3rd Qu.:59.224   3rd Qu.:2.128e-02
 Max.   :77.447   Max.   :2.665e-02
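
For comparison, here is a minimal sketch of the class difference, reusing the mydf construction from the original post. It assumes a current version of R, where subset() on a data frame accepts a drop argument that is passed on to "["; the names sub_df, sub_vec, br_vec and d are only illustrative.

```r
set.seed(123)
mydf <- data.frame(ht = 150 + 10 * rnorm(100),
                   wt = 150 + 10 * rnorm(100),
                   age = sample(20:60, size = 100, replace = TRUE))

# subset() with select= keeps the data frame structure, even for one column
sub_df <- subset(mydf, ht >= 150 & wt <= 150, select = age)
is.data.frame(sub_df)

# drop = TRUE is handed to "[" and collapses the single column to a vector
sub_vec <- subset(mydf, ht >= 150 & wt <= 150, select = age, drop = TRUE)
is.numeric(sub_vec)

# "[" with a single column name drops to a vector by default
br_vec <- mydf[mydf$ht >= 150 & mydf$wt <= 150, "age"]
identical(sub_vec, br_vec)

# density() accepts the vector form
d <- density(sub_vec)
```

Whether to use $age or drop = TRUE is a matter of taste; both hand density() a plain vector.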


--
David Winsemius


On Jan 20, 2009, at 6:02 PM, Steven McKinney wrote:

Hi all,

Can anyone explain why the following use of
the subset() function produces a different
outcome than the use of the "[" extractor?

The subset() function as used in

density(subset(mydf, ht >= 150.0 & wt <= 150.0, select = c(age)))

appears to me from documentation to be equivalent to

density(mydf[mydf$ht >= 150.0 & mydf$wt <= 150.0, "age"])

(modulo exclusion of NAs) but use of the former yields an
error from density.default() (shown below).
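
The "modulo exclusion of NAs" caveat can be illustrated with a small sketch. The toy data frame df, its values, and the names from_subset and from_bracket are made up for illustration and are not part of the example further down.

```r
# Toy data with a missing value in the column used by the condition
df <- data.frame(ht = c(160, NA, 140), age = c(30, 40, 50))

# subset() treats an NA condition as FALSE, so that row is dropped
from_subset <- subset(df, ht >= 150)$age

# "[" keeps a slot for the NA condition, which shows up as an NA element
from_bracket <- df[df$ht >= 150, "age"]

from_subset
from_bracket
```

Here from_subset has one element while from_bracket has two, the second being NA.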


Is this a bug in the subset() machinery?  Or is it
a documentation issue, either in the subset()
documentation or in the density() documentation?

I'm seeing issues such as this with newcomers to R
who initially seem to prefer using subset() instead
of the bracket extractor.  At this point these functions
are clearly not exchangeable.  Should code be patched
so that they are, or documentation amended to show
when use of subset() is not appropriate?

### Bug in subset()?

set.seed(123)
mydf <- data.frame(ht = 150 + 10 * rnorm(100),
                   wt = 150 + 10 * rnorm(100),
                   age = sample(20:60, size = 100, replace = TRUE)
                   )


density(subset(mydf, ht >= 150.0 & wt <= 150.0, select = c(age)))
Error in density.default(subset(mydf, ht >= 150 & wt <= 150, select = c(age))) :
 argument 'x' must be numeric


density(mydf[mydf$ht >= 150.0 & mydf$wt <= 150.0, "age"])

Call:
        density.default(x = mydf[mydf$ht >= 150 & mydf$wt <= 150, "age"])

Data: mydf[mydf$ht >= 150 & mydf$wt <= 150, "age"] (29 obs.); Bandwidth 'bw' = 5.816

      x                y
Min.   : 4.553   Min.   :3.781e-05
1st Qu.:22.776   1st Qu.:3.108e-03
Median :41.000   Median :1.775e-02
Mean   :41.000   Mean   :1.370e-02
3rd Qu.:59.224   3rd Qu.:2.128e-02
Max.   :77.447   Max.   :2.665e-02


sessionInfo()
R version 2.8.0 Patched (2008-11-06 r46845)
powerpc-apple-darwin9.5.0

locale:
C

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base

loaded via a namespace (and not attached):
[1] Matrix_0.999375-16 grid_2.8.0 lattice_0.17-15 lme4_0.99875-9
[5] nlme_3.1-89







Steven McKinney

Statistician
Molecular Oncology and Breast Cancer Program
British Columbia Cancer Research Centre

email: smckinney +at+ bccrc +dot+ ca

tel: 604-675-8000 x7561

BCCRC
Molecular Oncology
675 West 10th Ave, Floor 4
Vancouver B.C.
V5Z 1L3
Canada

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
