Re: [R] Column names of model.matrix's output with contrast.arg

2024-06-17 Thread John Fox

Dear Christophe and Ben,

Also see the car package for replacements for contr.treatment(), 
contr.sum(), and contr.helmert() -- e.g., help("contr.Sum", package="car").


These functions have been in the car package for more than two decades, 
and AFAIK, no one uses them (including myself). I didn't write a 
replacement for contr.poly() because the current coefficient labeling 
seemed reasonably transparent.


Best,
 John

--
John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://www.john-fox.ca/

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] Column names of model.matrix's output with contrast.arg

2024-06-17 Thread Ben Bolker

  It's sorta-kinda-obliquely-partially documented in the examples:

zapsmall(cP <- contr.poly(3)) # Linear and Quadratic

output:

             .L         .Q
[1,] -0.7071068  0.4082483
[2,]  0.0000000 -0.8164966
[3,]  0.7071068  0.4082483

FWIW the faux package provides better-named alternatives.



--
Dr. Benjamin Bolker
Professor, Mathematics & Statistics and Biology, McMaster University
Director, School of Computational Science and Engineering
(Acting) Graduate chair, Mathematics & Statistics
> E-mail is sent at my convenience; I don't expect replies outside of 
working hours.




Re: [R] Column names of model.matrix's output with contrast.arg

2024-06-17 Thread Christophe Dutang
Thanks for your reply.

It might be good to document the naming convention in ?contrasts. It is hard to 
guess that .L stands for linear, .Q for quadratic, .C for cubic and ^n for other 
degrees.
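
For what it is worth, the convention can be read off the column names that contr.poly() returns. A minimal base-R sketch (the object name `cn` is mine, only for illustration):

```r
# Column names produced by contr.poly() for a factor with 6 levels:
# the first three degrees get letters, higher degrees get "^degree".
cn <- colnames(contr.poly(6))
print(cn)
# [1] ".L" ".Q" ".C" "^4" "^5"
```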

For contr.sum, we could have used .Sum, .Sum…

Maybe the examples in ?model.matrix should use names in the dd objects, so that 
we can see when names are dropped.

Kind regards, Christophe




Re: [R] Column names of model.matrix's output with contrast.arg

2024-06-14 Thread peter dalgaard
You're at the mercy of the various contr.XXX functions. They may or may not set 
the colnames on the matrices that they generate. 
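
A quick way to see which of the base-R contrast functions set column names is to ask them directly. This is only a sketch; the level names are borrowed from the original post and the list name `cn` is an arbitrary choice:

```r
lev <- c("O1", "O2", "O3")  # example level names, as in the original post

# Collect the column names each contrast function puts on its matrix.
# NULL means the function sets no column names, in which case model.matrix()
# labels the columns by index (outcome1, outcome2, ...).
cn <- list(
  treatment = colnames(contr.treatment(lev)),
  sum       = colnames(contr.sum(lev)),
  helmert   = colnames(contr.helmert(lev)),
  poly      = colnames(contr.poly(length(lev)))
)
str(cn)
```

Only contr.treatment() and contr.poly() return named columns here; contr.sum() and contr.helmert() return NULL.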

The rationale for (not) setting them is not perfectly transparent, but you 
obviously cannot use level names on contr.poly, so it uses .L, .Q, etc. 

In MASS, contr.sdif is careful about labeling the columns with the levels that 
are being diff'ed. 

For contr.treatment, there is a straightforward connection to 0/1 dummy 
variables, so level names there are natural.

One could use levels in contr.sum and contr.helmert, but it might confuse users, 
since the comparisons are with the average of all levels or of the preceding 
levels. (It can be quite confusing when coding is +1 for male and -1 for female, 
so that the gender difference is twice the coefficient.)
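
The factor of two in the parenthetical can be checked on toy data. The numbers below are made up purely for illustration, and the level order is set so that male gets +1 and female gets -1 under contr.sum:

```r
# Toy data (invented for illustration): three observations per group.
y   <- c(10, 12, 11, 15, 17, 16)
sex <- factor(rep(c("male", "female"), each = 3),
              levels = c("male", "female"))  # male first: male = +1, female = -1

fit <- lm(y ~ sex, contrasts = list(sex = "contr.sum"))

b          <- unname(coef(fit)["sex1"])  # coefficient of the +1/-1 column
diff_means <- mean(y[sex == "male"]) - mean(y[sex == "female"])

# The male - female difference in means equals twice the coefficient.
all.equal(diff_means, 2 * b)
```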

-pd


-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd@cbs.dk  Priv: pda...@gmail.com



[R] Column names of model.matrix's output with contrast.arg

2024-06-14 Thread Christophe Dutang
Dear list,

Changing the default contrasts used in glm() made me aware of how model.matrix() 
sets column names.

With default contrasts, model.matrix() uses the level values to name the 
columns. However, with other contrasts, model.matrix() uses the level indexes. 
I don't see anything in the documentation about this. Such behavior does not 
seem natural to me.

Any comment is welcome.

An example is below.

Kind regards, Christophe  


# example from ?glm
counts <- c(18,17,15,20,10,20,25,13,12)
outcome <- paste0("O", gl(3,1,9))
treatment <- paste0("T", gl(3,3))

# design matrices: default (treatment), sum and Helmert contrasts for outcome
X3 <- model.matrix(counts ~ outcome + treatment)
X4 <- model.matrix(counts ~ outcome + treatment,
                   contrasts.arg = list(outcome = "contr.sum"))
X5 <- model.matrix(counts ~ outcome + treatment,
                   contrasts.arg = list(outcome = "contr.helmert"))

# check with original factor
cbind.data.frame(X3, outcome)
cbind.data.frame(X4, outcome)
cbind.data.frame(X5, outcome)

# same issue with glm
glm.D93 <- glm(counts ~ outcome + treatment, family = poisson())
glm.D94 <- glm(counts ~ outcome + treatment, family = poisson(),
               contrasts = list(outcome = "contr.sum"))
glm.D95 <- glm(counts ~ outcome + treatment, family = poisson(),
               contrasts = list(outcome = "contr.helmert"))

coef(glm.D93)
coef(glm.D94)
coef(glm.D95)

# check linear predictor
cbind(X3 %*% coef(glm.D93), predict(glm.D93))
cbind(X4 %*% coef(glm.D94), predict(glm.D94))
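
One possible workaround, which is my own sketch and not something proposed in the thread, is to pass a contrast matrix whose columns are already named, since model.matrix() appends the contrast column names to the factor name when they exist. The choice of labels (the first two levels) and the name `X4named` are assumptions for illustration; a label such as outcomeO1 should still be read as "O1 versus the average of all levels".

```r
counts    <- c(18, 17, 15, 20, 10, 20, 25, 13, 12)
outcome   <- factor(paste0("O", gl(3, 1, 9)))
treatment <- factor(paste0("T", gl(3, 3)))

# Sum-to-zero contrasts, with columns labelled by the level each one carries.
# The last level (O3) has no column of its own under contr.sum.
cs <- contr.sum(levels(outcome))
colnames(cs) <- levels(outcome)[-nlevels(outcome)]

X4named <- model.matrix(counts ~ outcome + treatment,
                        contrasts.arg = list(outcome = cs))
colnames(X4named)
```

The same named matrix can be supplied to glm() through its contrasts argument, so the coefficient names follow the labels too.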

-
Christophe DUTANG
LJK, Ensimag, Grenoble INP, UGA, France
ILB research fellow
Web: http://dutangc.free.fr
