On Nov 23, 2009, at 7:34 AM, Peter Ehlers wrote:


Alan Kelly wrote:
Dear list,
I have a data frame (birth) with mixed variables (numeric and alphanumeric). One variable "t1stvisit" was originally coded as numeric with values 1,2, and 3. After attaching the data frame, this is what I see when I use str(t1stvisit)
actually, str(birth), I suspect, but not important.
$ t1stvisit: int  1 1 1 1 1 1 1 1 2 2 ...
This is as expected.
I then convert t1stvisit to a factor and to avoid creating a second copy of this variable independent of the data frame I use:
birth$t1stvisit = as.factor(birth$t1stvisit)
if I check that the conversion has worked:
is.factor(t1stvisit)
[1] FALSE
Now the only object present in the workspace is the data frame "birth" and, as noted, I have not created any new variables. So why does R still treat t1stvisit as numeric?
is.factor(t1stvisit)
[1] FALSE
Yet when I try the following:
> is.factor(birth$t1stvisit)
[1] TRUE
So, there appear to be two versions of "t1stvisit": the original numeric version and the correct factor version, although ls() only shows "birth" as present in the workspace.
If I type:
> summary(t1stvisit)
  Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
 1.000   1.000   2.000   1.574   2.000   3.000  29.000
I get the numeric version, but if I try
summary(birth$t1stvisit)
  1    2    3 NA's
180  169   22   29
I get the factor version.
Frankly, I feel that this behaviour is non-intuitive and potentially problematic. Nor have I seen warnings about this in the various textbooks on R.
Can anyone comment on why this should occur?

I haven't looked at discussions of 'attach()' for a while,
since I rarely use it nowadays (I find with() more convenient
most of the time), but Chapter 6 in 'An Introduction to R'
does discuss it.

There are indeed two versions of 'birth'.
Your basic problem is which version of 'birth' is being modified.
Hint: it's NOT the attached version.
Small example:

dat <- data.frame(x=1:3)
attach(dat)
dat$y <- 4:6
y
#Error: object 'y' not found
dat$y
#[1] 4 5 6

BTW, you don't need as.factor(); use factor().
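
To tie this back to the original question, here is a minimal sketch using a made-up five-row stand-in for 'birth' (the values are hypothetical, not Alan's data). It assumes nothing else named t1stvisit exists in the workspace. It shows that attach() places a copy on the search path, so a later change to birth$t1stvisit is not seen through the bare name until the data frame is detached and attached again.

```r
# Hypothetical stand-in for the real 'birth' data frame
birth <- data.frame(t1stvisit = c(1L, 1L, 2L, 3L, NA))

attach(birth)                                # a copy goes on the search path
birth$t1stvisit <- factor(birth$t1stvisit)   # modifies the workspace object only

stale_is_factor <- is.factor(t1stvisit)        # bare name finds the attached copy
df_is_factor    <- is.factor(birth$t1stvisit)  # the modified data frame

detach(birth)                                # drop the stale copy
attach(birth)                                # attach the current version
fresh_is_factor <- is.factor(t1stvisit)
detach(birth)

print(c(stale = stale_is_factor, df = df_is_factor, fresh = fresh_is_factor))
```

Referring to birth$t1stvisit throughout, or using with(birth, ...), avoids the stale copy altogether.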

-Peter Ehlers

Alan;

Let me second Peter's advice. attach() creates more problems than it solves. When I ran his code above, I did get output from y, but it was not the 4:6 vector; it was an unrelated object left in my workspace from a prior project. You should also be wary, however, of unexpected (to some of us newbies anyway) behavior with with():

> with(dat, z<- x + y)
> dat
  x y
1 1 4
2 2 5
3 3 6
Since with() is a function, the assignment to z was local to its evaluation environment and was discarded when the call returned, so dat is unchanged.

More effective this way.
> dat$z <- with(dat, x+y)
> dat
  x y z
1 1 4 5
2 2 5 7
3 3 6 9
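
For completeness, a small sketch of two other base R ways to get the same result, within() and transform(), both of which return a modified copy of the data frame that has to be assigned. The names dat1 and dat2 are just for illustration.

```r
dat <- data.frame(x = 1:3, y = 4:6)

# The assignment inside with() is local to the call; dat gains no column
with(dat, z <- x + y)
has_z <- "z" %in% names(dat)

# within() evaluates the expression and returns a modified copy
dat1 <- within(dat, z <- x + y)

# transform() takes the new column as a named argument
dat2 <- transform(dat, z = x + y)

print(has_z)
print(dat1)
print(dat2)
```

Either form keeps the new column tied to the data frame, which is the point of avoiding attach() in the first place.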

--
David



Many thanks,
Alan Kelly
Dr. Alan Kelly
Department of Public Health & Primary Care
Trinity College Dublin
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



David Winsemius, MD
Heritage Laboratories
West Hartford, CT
