On 23 Jan 2013, at 21:36, "Francesco Sarracino" <f.sarrac...@gmail.com> wrote:

> .... what I meant refers to the fact that I've read on "an R and
> S-plus companion to applied regression" about methods to alter the encoding
> of factors when using contrasts in regressions. These are options (for
> contrasts) that can be easily set as "option('contrasts')". This command
> changes the way R creates the dummies out of a factor and various methods
> are available.
> I was expecting that R might have had something similar that applied to my
> case, thus changing the way R attaches numeric values to my dummy variable.
> I am just surprised that such option doesn't exist. I was having wrong
> expectations.

Such options do exist, but at modelling time, not factor creation/conversion 
time.

When created, by calls to 'factor' or in functions like 'read.table', factors 
are stored internally as integers with a list of labels (what you see as factor 
levels) that go with each integer. Those internal integers start at 1 and go 
up. You can set the ordering of those labels (by specifying the "levels" 
argument in factor()) so that, for example, yes and no can be associated with 
(numeric) factor levels 1 and 2 respectively instead of the default ordering 
which would put 'no' alphabetically before 'yes'. (I find this choice 
particularly useful for orderings like "high", "medium", "low" for which the 
alphabetic ordering is not exactly intuitive; similarly alphabetic ordering 
puts '1', '2', '10' in the order '1', '10', '2' and so on, so that often needs 
specifying manually. It's also useful to specify levels if you want things like 
boxplots to come out in a particular order, as boxplots by default use the 
order of the factor levels).
The internal integer values are returned by 'as.numeric'. If your factor level 
labels - which are always character - are also interpretable as numbers, you 
need 'as.character' to return the character strings and then 'as.numeric' to 
convert those. 
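
As a small illustration of the points above, here is a minimal sketch. The data values and the object names (answer, f_default, f_ordered, f_num) are made up for the example; only factor(), levels(), as.numeric() and as.character() are real R functions.

```r
# Hypothetical example data; the values are illustrative only.
answer <- c("yes", "no", "yes", "no", "yes")

# Default: levels are sorted alphabetically, so "no" gets internal code 1.
f_default <- factor(answer)
levels(f_default)        # "no"  "yes"
as.numeric(f_default)    # 2 1 2 1 2

# Explicit level order: "yes" gets code 1 and "no" gets code 2.
f_ordered <- factor(answer, levels = c("yes", "no"))
levels(f_ordered)        # "yes" "no"
as.numeric(f_ordered)    # 1 2 1 2 1

# Labels that look like numbers: as.numeric() alone returns the internal
# codes, so go via as.character() to recover the values themselves.
f_num <- factor(c("10", "2", "1", "10"))
levels(f_num)                      # "1"  "10" "2"
as.numeric(f_num)                  # 2 3 1 2    (internal codes)
as.numeric(as.character(f_num))    # 10 2 1 10  (the values)
```

Note that neither call changes the data; only the mapping between labels and internal codes differs.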

Now, up to this point you just have more or less arbitrary integers associated 
with the original factor levels (the degree of arbitrariness depends on whether 
you specified the level order or let R use its default). These integers are not 
the contrasts used in model fitting. Contrasts are set at model matrix building 
time; they are not a fixed attribute of the factor. The internal numbering of 
levels affects contrasts only to the extent that the numerical values used in 
setting contrasts are usually in the same order as the factor levels. You can 
see which functions are used to associate contrasts with factor levels by 
calling options("contrasts"). You can inspect the numerical values that would 
currently be used for a given factor with a call to contrasts(). You can change the 
contrast assignments globally using options() or explicitly in some model calls 
(lm, for example, has a contrasts argument) and if you like you can write your 
own contrast functions to set any values you like. The most common are probably 
treatment contrasts, which set the first factor level as the intercept and the 
rest as (unit) differences from that, and sum-to-zero contrasts, which do what 
they say, setting contrasts that sum to zero by choosing a set like (-1, 0, 1). 
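
To make that concrete, here is a rough sketch of inspecting and changing contrasts. The factor 'dose', the response 'y' and the fitted-model names are hypothetical and exist only for illustration; options(), contrasts(), contr.treatment(), contr.sum(), lm() and model.matrix() are standard R.

```r
# Hypothetical three-level factor with an explicit level order.
dose <- factor(c("low", "medium", "high", "low", "medium", "high"),
               levels = c("low", "medium", "high"))

# Names of the contrast functions currently in use
# (first entry for unordered factors, second for ordered ones).
options("contrasts")

# Contrast matrix that would be used for this factor under those settings.
contrasts(dose)

# The two common schemes for a three-level factor.
contr.treatment(3)   # first level is the baseline (a row of zeros)
contr.sum(3)         # each column sums to zero

# Hypothetical response, used only to show the 'contrasts' argument of lm().
set.seed(1)
y <- rnorm(6)
fit_default <- lm(y ~ dose)                                    # current global setting
fit_sum <- lm(y ~ dose, contrasts = list(dose = "contr.sum"))  # per-model choice
model.matrix(fit_sum)   # the coding actually used in the fit

# Global change, then restore the previous setting.
old <- options(contrasts = c("contr.sum", "contr.poly"))
contrasts(dose)
options(old)
```

The factor itself is identical in both fits; only the model matrix built from it changes.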

So you actually have a great deal of control over both the order in which 
labels are associated with factor levels and the (separate) values of contrasts 
associated with those factor levels at modelling time. 

The cost of that control is some complexity, and the time needed to learn 
what's going on to use it all properly. 

Hope that helps ...


S Ellison


