On Dec 12, 2011, at 3:38 PM, Uwe Ligges wrote:



On 12.12.2011 19:36, Brian Jensvold wrote:
I am doing a logistic regression, and by accident I included a field
which has the two-digit abbreviation for all 50 states, labeled "st". I was
surprised to see that glm did not come up with an error message but
instead appears to have automatically broken this field down into
individual fields (stAK and stAL).  Does R really know to turn all
categorical variables into binary dummy variables?

Yes.

I have tried answering the question on my own and have found:

When including categorical variables in a regression, the default in R
is to set the first level as the base.  Is there an option to specify a
different level as the base?

Well, reorder the levels of the factor so that the most appropriate base level comes first. This simplifies life, since it is from then on the base level for all the models you try to fit.
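
The reordering advice above can be sketched with base R's relevel(). The data here are a made-up toy vector, and the variable name "st" simply mirrors the column from the question; treat this as an illustration under those assumptions, not the original poster's data.

```r
# Hypothetical toy data; 'st' stands in for the state column from the question.
st <- factor(c("AK", "AL", "CT", "AL", "CT", "AK"))
levels(st)    # levels are sorted alphabetically, so "AK" comes first (the base)

# relevel() moves the chosen level to the front, making it the base level.
st2 <- relevel(st, ref = "CT")
levels(st2)   # "CT" now comes first; the remaining levels keep their order

# Equivalent alternative: spell out the complete level order explicitly.
st3 <- factor(st, levels = c("CT", "AK", "AL"))
```

Because the level order is stored in the factor itself, every model fitted afterwards with st2 uses "CT" as the base without further options.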


My next/same question is: what does it mean to "set the first level as
the base"? Does this mean it turns each value into a unique binary
result?

What is a "unique binary result"?

Actually, the base level is included in the intercept of your model, and the coefficients for the other levels show their differences from it.
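
A small sketch of what that means for a logistic regression, using invented data with three states and assuming R's default treatment contrasts: the intercept is the log-odds for the base level, and each remaining coefficient is the difference in log-odds from that base.

```r
# Hypothetical toy data: three states, four binary outcomes each.
d <- data.frame(
  st = factor(rep(c("AK", "AL", "CT"), each = 4)),
  y  = c(1, 0, 0, 0,   1, 1, 0, 0,   1, 1, 1, 0)
)

fit <- glm(y ~ st, data = d, family = binomial)
names(coef(fit))   # "(Intercept)" "stAL" "stCT"; the base level AK has no column

# Observed proportion of 1s per state, converted to log-odds.
p        <- tapply(d$y, d$st, mean)
log_odds <- qlogis(p)

# Intercept vs. log-odds of the base level (AK).
c(coef(fit)[["(Intercept)"]], log_odds[["AK"]])

# stAL vs. the difference in log-odds between AL and AK.
c(coef(fit)[["stAL"]], log_odds[["AL"]] - log_odds[["AK"]])
```

Each printed pair should agree up to the convergence tolerance of glm, since a model with a single factor reproduces the per-level proportions.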

Just to expand a bit on Uwe's efforts, for which we are all in his debt: you might see that one state level is missing from the coefficients; that level is generally absorbed into the reference level. I would have thought it to be "AK", but apparently you do see that abbreviation. Factor variables get handled auto-magically by regression functions.
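
To see the "auto-magic" directly, one can inspect the design matrix that the modelling functions build from a factor. This is a minimal sketch on a made-up vector, assuming the default treatment contrasts; model.matrix() is the base R function that performs the expansion.

```r
# Hypothetical toy factor standing in for the 'st' column.
st <- factor(c("AK", "AL", "CT", "AL"))

# model.matrix() shows the columns a regression function would actually use.
mm <- model.matrix(~ st)
colnames(mm)   # "(Intercept)" "stAL" "stCT"
mm             # one 0/1 indicator column per non-base level
```

The first level ("AK" here) gets no column of its own; rows belonging to it have zeros in every indicator column and are represented by the intercept alone.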

Uwe Ligges


David Winsemius, MD
Heritage Laboratories
West Hartford, CT

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
