on 01/20/2009 05:02 PM Steven McKinney wrote:
> Hi all,
>
> Can anyone explain why the following use of
> the subset() function produces a different
> outcome than the use of the "[" extractor?
>
> The subset() function as used in
>
> density(subset(mydf, ht >= 150.0 & wt <= 150.0, select = c(age)))

Here you are asking density() to be run on a data frame, which is what
subset() returns, even when you select a single column. Thus, you get an
error, since density() expects a numeric vector. No bug in either subset()
or the documentation.

You could do this:

density(subset(mydf, ht >= 150.0 & wt <= 150.0, select = age)[[1]])

> appears to me from documentation to be equivalent to
>
> density(mydf[mydf$ht >= 150.0 & mydf$wt <= 150.0, "age"])

Here you are running density() on a vector, so it works. This is because
"[.data.frame" defaults to 'drop = TRUE' when a single column is selected,
which means that the returned result is coerced to the lowest possible
dimension. Thus, rather than a single-column data frame, a vector is
returned. The result from subset() is equivalent to using 'drop = FALSE'.

HTH,

Marc Schwartz

> (modulo exclusion of NAs) but use of the former yields an
> error from density.default() (shown below).
>
>
> Is this a bug in the subset() machinery? Or is it
> a documentation issue for the subset() function
> documentation or density() documentation?
>
> I'm seeing issues such as this with newcomers to R
> who initially seem to prefer using subset() instead
> of the bracket extractor. At this point these functions
> are clearly not exchangeable. Should code be patched
> so that they are, or documentation amended to show
> when use of subset() is not appropriate?
>
>> ### Bug in subset()?
>
>> set.seed(123)
>> mydf <- data.frame(ht = 150 + 10 * rnorm(100),
> +                   wt = 150 + 10 * rnorm(100),
> +                   age = sample(20:60, size = 100, replace = TRUE)
> +                   )
>
>
>> density(subset(mydf, ht >= 150.0 & wt <= 150.0, select = c(age)))
> Error in density.default(subset(mydf, ht >= 150 & wt <= 150, select = c(age))) :
>   argument 'x' must be numeric
>
>
>> density(mydf[mydf$ht >= 150.0 & mydf$wt <= 150.0, "age"])
>
> Call:
>         density.default(x = mydf[mydf$ht >= 150 & mydf$wt <= 150, "age"])
>
> Data: mydf[mydf$ht >= 150 & mydf$wt <= 150, "age"] (29 obs.);  Bandwidth 'bw' = 5.816
>
>        x                y
>  Min.   : 4.553   Min.   :3.781e-05
>  1st Qu.:22.776   1st Qu.:3.108e-03
>  Median :41.000   Median :1.775e-02
>  Mean   :41.000   Mean   :1.370e-02
>  3rd Qu.:59.224   3rd Qu.:2.128e-02
>  Max.   :77.447   Max.   :2.665e-02
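
For anyone who wants to check the point about 'drop' directly, here is a minimal sketch. It rebuilds mydf as in the quoted session; the object names (keep, by_subset, by_bracket, by_bracket_df, d_bracket, and so on) are only illustrative and not from the original post. It compares what subset() and "[" return, then shows a few ways to hand density() a vector while still using subset().

```r
# Rebuild the example data frame from the quoted session.
set.seed(123)
mydf <- data.frame(ht  = 150 + 10 * rnorm(100),
                   wt  = 150 + 10 * rnorm(100),
                   age = sample(20:60, size = 100, replace = TRUE))

# Row filter used by both approaches (illustrative name).
keep <- mydf$ht >= 150 & mydf$wt <= 150

# subset() keeps the data frame structure, even for a single column.
by_subset <- subset(mydf, ht >= 150 & wt <= 150, select = age)

# "[" with a single column drops to a plain vector by default ...
by_bracket <- mydf[keep, "age"]

# ... unless drop = FALSE is given, which keeps a one-column data frame.
by_bracket_df <- mydf[keep, "age", drop = FALSE]

is.data.frame(by_subset)      # data frame, so density() rejects it
is.numeric(by_bracket)        # plain vector, so density() accepts it
is.data.frame(by_bracket_df)  # data frame again

# Ways to pass a vector to density() while still using subset():
d_extract <- density(subset(mydf, ht >= 150 & wt <= 150, select = age)[[1]])
d_dollar  <- density(subset(mydf, ht >= 150 & wt <= 150)$age)
d_drop    <- density(subset(mydf, ht >= 150 & wt <= 150, select = age,
                            drop = TRUE))

# The bracket version from the original post, for comparison.
d_bracket <- density(by_bracket)
```

All four density() calls operate on the same values, so the estimates should agree; only the recorded call and data name differ. Which form to use is mostly a matter of taste, though the drop = TRUE variant only yields a vector when exactly one column is selected.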