On Nov 23, 2009, at 7:34 AM, Peter Ehlers wrote:
Alan Kelly wrote:
Dear list,
I have a data frame (birth) with mixed variables (numeric and
alphanumeric). One variable "t1stvisit" was originally coded as
numeric with values 1,2, and 3. After attaching the data frame,
this is what I see when I use str(t1stvisit)
actually, str(birth), I suspect, but not important.
$ t1stvisit: int 1 1 1 1 1 1 1 1 2 2 ...
This is as expected.
I then convert t1stvisit to a factor and to avoid creating a second
copy of this variable independent of the data frame I use:
birth$t1stvisit = as.factor(birth$t1stvisit)
If I check that the conversion has worked:
is.factor(t1stvisit)
[1] FALSE
Now the only object present in the workspace is the data frame
"birth" and, as noted, I have not created any new variables. So
why does R still treat t1stvisit as numeric?
is.factor(t1stvisit)
[1] FALSE
Yet when I try the following:
> is.factor(birth$t1stvisit)
[1] TRUE
So, there appear to be two versions of "t1stvisit" - the original
numeric version and the correct factor version although ls() only
shows "birth" as present in the workspace.
If I type:
> summary(t1stvisit)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
  1.000   1.000   2.000   1.574   2.000   3.000  29.000
I get the numeric version, but if I try
summary(birth$t1stvisit)
   1    2    3 NA's
 180  169   22   29
I get the factor version.
Frankly I feel that this behaviour is non-intuitive and potentially
problematic. Nor have I seen warnings about this in the various
text books on R.
Can anyone comment on why this should occur?
I haven't looked at discussions of 'attach()' for a while,
since I rarely use it nowadays (I find with() more convenient
most of the time), but Chapter 6 in 'An Introduction to R'
does discuss it.
There are indeed two versions of 'birth'.
Your basic problem is which version of 'birth' is being modified.
Hint: it's NOT the attached version.
Small example:
dat <- data.frame(x=1:3)
attach(dat)
dat$y <- 4:6
y
#Error: object 'y' not found
dat$y
#[1] 4 5 6
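
The same effect can be reproduced with the original variable name. The following is only a sketch: the data frame below is a toy stand-in with invented values, not the poster's data, and it assumes no other object called t1stvisit exists in the workspace. It illustrates that attach() places a copy of the data frame's columns on the search path, so later changes to birth are not seen until the data frame is detached and attached again.

```r
# Toy stand-in for the poster's data frame; these values are invented.
birth <- data.frame(t1stvisit = c(1L, 1L, 2L, 3L, NA))

attach(birth)                               # copies the columns onto the search path
birth$t1stvisit <- factor(birth$t1stvisit)  # changes only the workspace object

stale   <- is.factor(t1stvisit)             # FALSE: found in the attached copy
current <- is.factor(birth$t1stvisit)       # TRUE: found in the workspace object

detach(birth)
attach(birth)                               # re-attach to pick up the modified version
refreshed <- is.factor(t1stvisit)           # TRUE
detach(birth)

print(c(stale = stale, current = current, refreshed = refreshed))
```

Converting the variable before calling attach(), or not attaching at all and referring to birth$t1stvisit directly, avoids the stale copy.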
BTW, you don't need as.factor(); use factor().
-Peter Ehlers
Alan;
Let me second Peter's advice. "Attach" creates more problems than it
solves. When I ran his code above, I got output from y but it was not
the 4:6 vector but something else that was in my workspace from a
prior project. You should also be wary, however, of unexpected (to
some of us newbies anyway) behavior with "with":
> with(dat, z<- x + y)
> dat
  x y
1 1 4
2 2 5
3 3 6
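Since with() is a function, the assignment to z was local to the
environment in which the expression was evaluated, so dat is unchanged.
It is more effective to assign the result of with() to a new column: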
> dat$z <- with(dat, x+y)
> dat
  x y z
1 1 4 5
2 2 5 7
3 3 6 9
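
As a possible alternative to the pattern above, base R's within() and transform() return a modified copy of the data frame. This is a sketch using the same toy dat as in the thread; the names res, unchanged, dat_within and dat_transform are made up for illustration.

```r
dat <- data.frame(x = 1:3, y = 4:6)

# with() returns the value of the expression but leaves dat untouched.
res <- with(dat, z <- x + y)
unchanged <- !("z" %in% names(dat))   # TRUE: no column z was added to dat

# within() evaluates the expression and returns a modified copy of dat.
dat_within <- within(dat, z <- x + y)

# transform() does the same using name = value arguments.
dat_transform <- transform(dat, z = x + y)

print(dat_within)
```

In both cases the result still has to be assigned (for example back to dat) for the new column to persist.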
--
David
Many thanks,
Alan Kelly
Dr. Alan Kelly
Department of Public Health & Primary Care
Trinity College Dublin
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
David Winsemius, MD
Heritage Laboratories
West Hartford, CT