On 12/01/2009 6:12 PM, Terry Therneau wrote:
Context:
R version 2.7.1 (2008-06-23)
I don't know when this was upgraded in the department; I just ran into the
aberrant behavior today.
Problem:
Our group BY CHOICE does not change character variables into factors by
default. I can get into a long argument as to why later, and will give one
example of why below.
The default behavior of S, Splus and R has been to create dummy variables for
factor, character, and logical variables. This is good.
Why has R suddenly gotten a compulsion to put out a warning message for any
model where we do this? I contend that it is
I think you need to be more specific. I just tried
x <- sample(letters[1:4], 100, rep=T)
y <- rnorm(x)
lm(y ~ x)
and got a warning in all R versions I tried back to 2.4.1. In 2.3.1
this was an error.
So I suspect the change you saw was to some other modelling function
besides lm(), and I would guess that it came from making it consistent
with lm().
But it would help if you told us which function, and which version
you're comparing 2.7.1 with.
Now, it probably does make sense to suppress that warning. I guess it
was probably introduced because we used to give an error, and someone
was being conservative and didn't think error-ful behaviour should go to
accepted behaviour in one step. But maybe it's time for the second step.
Duncan Murdoch
- confusing
- unnecessary
- and wrong.
It is certainly confusing, as it implies a behavior change when there has been
no change.
The fact that the factor command was used behind the scenes is irrelevant to
anyone - who cares HOW the rules are implemented? Is there soon going to
be a message of "WARNING: logical variable turned into numeric"? It is the
sensible next step.
Wrong because the data element in question is not converted - not at the user
level at least. It would be much more proper to say "was treated as a factor by
model.matrix"; but that is a semantic issue.
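To make the "not converted at the user level" point concrete, here is a minimal sketch. The data frame `d` and its columns are invented for illustration. It fits a model on a character column, then checks that the column is still character while the design matrix carries the dummy columns.

```r
# Hypothetical data: a character predictor with four distinct values
set.seed(1)
d <- data.frame(x = rep(letters[1:4], 25), stringsAsFactors = FALSE)
d$y <- rnorm(nrow(d))

# Fit quietly; some older R versions warn here about a factor conversion
fit <- suppressWarnings(lm(y ~ x, data = d))

# The user's column is unchanged by the fit ...
class(d$x)

# ... while the design matrix holds treatment-contrast dummy columns
mm <- suppressWarnings(model.matrix(y ~ x, data = d))
colnames(mm)
```

The conversion happens only inside the model frame built for the fit; `d$x` itself is never modified.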
In any case, to the real question.
1. What is the easiest way to eliminate this? I would prefer not to have to
change the source code and recompile the local versions. Because of namespaces
it is not as easy as adding a repaired copy to our local library and loading it
first. I remember some discussion about forcing a change into another name
space but I've lost the link to it.
I'll do this if we must, but it is such a hassle to keep updating the change
with new releases of the package. It might still be less than dealing with the
training/answer questions burden for our group, which is quite large.
2. Is there any hope of undoing this? Its only real impact is to annoy, and
I've always disliked systems that preach at me. (Detested is more like it, I
still remember how hard it was to delete certain files in Digital's TOPS os,
which was sure you ``didn't actually want to do that''.) Allowing some global
"stop preaching" option would be a fix, or in the same vein to have it look at
the existing stringsAsFactors option for a "this person knows so don't bother
him" hint.
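Short of patching the source, one possible workaround is to muffle only this warning at the call site. The sketch below assumes the warning text contains the phrase "converted to a factor"; that wording is an assumption and should be checked against the installed version. `quiet_factor` is a hypothetical helper name, not part of R.

```r
# Hypothetical helper: evaluate an expression, muffling only warnings
# whose message contains `pattern`; all other warnings pass through.
quiet_factor <- function(expr, pattern = "converted to a factor") {
  withCallingHandlers(
    expr,
    warning = function(w) {
      if (grepl(pattern, conditionMessage(w), fixed = TRUE)) {
        invokeRestart("muffleWarning")
      }
    }
  )
}

# Usage with a character predictor
x <- rep(letters[1:4], 25)
set.seed(1)
y <- rnorm(length(x))
fit <- quiet_factor(lm(y ~ x))
```

Unlike a blanket `suppressWarnings()`, this leaves unrelated warnings from the model fit visible, at the cost of depending on the exact message text.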
If this addition followed on some discussion, please point me to it.
The reaction of other experienced users in our group has been the same when I
pointed this out. So I am not alone in asking "why?"
Terry Therneau
PS. Here are two interrelated reasons we don't autoconvert:
1. Subject id. Factors give no advantage for a unique id, and some clear
problems. In particular when one creates a subset - everyone over 60 say -
there is no good reason to remember all the ids you didn't select.
2. Subject id. I work on a lot of studies of fractures and fracture risk. A
time-trend model might be
gam(fracture ~ subject + x1 + x2 + ..., subset=(sex=='F'))
Fracture risk for males and females is so different that separate models are
the sensible thing. If subject is a factor before the call, then my model has a
zillion unneeded levels. There are other ways out of this issue, but avoiding
factors is the easiest.
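The level-retention behavior behind both points can be seen in a small sketch. The ids and the `sex` vector below are invented for illustration.

```r
# Hypothetical study data: six subjects, ids stored both ways
id_chr <- c("s1", "s2", "s3", "s4", "s5", "s6")
id_fac <- factor(id_chr)
sex    <- c("F", "M", "F", "M", "F", "M")

# Subsetting a character vector keeps only what was selected
length(unique(id_chr[sex == "F"]))       # 3

# Subsetting a factor keeps every original level, selected or not
nlevels(id_fac[sex == "F"])              # 6

# One of the "other ways out": drop the unused levels explicitly
nlevels(droplevels(id_fac[sex == "F"]))  # 3
```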
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel