Re: [Rd] named arguments in formula and terms

2017-03-13 Thread Achim Zeileis

Martin, thanks for the follow-up!

On Mon, 13 Mar 2017, Martin Maechler wrote:


Dear Achim,


Achim Zeileis 
on Fri, 10 Mar 2017 15:02:38 +0100 writes:


   > Hi, we came across the following unexpected (for us)
   > behavior in terms.formula: When determining whether a term
   > is duplicated, only the order of the arguments in function
   > calls seems to be checked but not their names. Thus the
   > terms f(x, a = z) and f(x, b = z) are deemed to be
   > duplicated and one of the terms is thus dropped.

   R> attr(terms(y ~ f(x, a = z) + f(x, b = z)), "term.labels")
   > [1] "f(x, a = z)"

   > However, changing the arguments or the order of arguments
   > keeps both terms:

   R> attr(terms(y ~ f(x, a = z) + f(x, b = zz)), "term.labels")
   > [1] "f(x, a = z)" "f(x, b = zz)"
   R> attr(terms(y ~ f(x, a = z) + f(b = z, x)), "term.labels")
   > [1] "f(x, a = z)" "f(b = z, x)"

   > Is this intended behavior or needed for certain terms?

   > We came across this problem when setting up certain smooth
   > regressors with different kinds of patterns. As a trivial
   > simplified example we can generate the same kind of
   > problem with rep(). Consider the two dummy variables rep(x
   > = 0:1, each = 4) and rep(x = 0:1, times = 4). With the
   > response y = 1:8 I get:

   R> lm((1:8) ~ rep(x = 0:1, each = 4) + rep(x = 0:1, times = 4))

   > Call:
   > lm(formula = (1:8) ~ rep(x = 0:1, each = 4) + rep(x = 0:1, times = 4))

   > Coefficients:
   >            (Intercept)  rep(x = 0:1, each = 4)
   >                    2.5                     4.0

   > So while the model is identified because the two
   > regressors are not the same, terms.formula does not
   > recognize this and drops the second regressor.  What I
   > would have wanted can be obtained by switching the
   > arguments:

   R> lm((1:8) ~ rep(each = 4, x = 0:1) + rep(x = 0:1, times = 4))

   > Call:
   > lm(formula = (1:8) ~ rep(each = 4, x = 0:1) + rep(x = 0:1, times = 4))

   > Coefficients:
   >             (Intercept)   rep(each = 4, x = 0:1)  rep(x = 0:1, times = 4)
   >                       2                        4                        1

   > Of course, here I could avoid the problem by setting up
   > proper factors etc. But to me this looks like a potential bug
   > in terms.formula...

I agree that there is a bug.


OK, good. I just wasn't sure whether I had missed some documentation 
somewhere that this is intended behavior.



According to https://www.r-project.org/bugs.html
I have generated an R bugzilla account for you so you can report
it there (for "book keeping", posterity, etc).


Thanks, I had already looked at that but waited for feedback on this list 
first.



   > Thanks in advance for any insights, Z

and thank *you* (and Nikolaus ?) for the report!


No problem. Niki found the problem and I came up with the simplified 
example. In any case, I just posted a slightly modified version of my 
e-mail as #17235 on Bugzilla:


https://bugs.R-project.org/bugzilla/show_bug.cgi?id=17235

Thanks & best wishes,
Z



Best regards,
Martin




__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] named arguments in formula and terms

2017-03-13 Thread Martin Maechler
Dear Achim,

> Achim Zeileis 
> on Fri, 10 Mar 2017 15:02:38 +0100 writes:

> Hi, we came across the following unexpected (for us)
> behavior in terms.formula: When determining whether a term
> is duplicated, only the order of the arguments in function
> calls seems to be checked but not their names. Thus the
> terms f(x, a = z) and f(x, b = z) are deemed to be
> duplicated and one of the terms is thus dropped.

R> attr(terms(y ~ f(x, a = z) + f(x, b = z)), "term.labels")
> [1] "f(x, a = z)"

> However, changing the arguments or the order of arguments
> keeps both terms:

R> attr(terms(y ~ f(x, a = z) + f(x, b = zz)), "term.labels")
> [1] "f(x, a = z)" "f(x, b = zz)"
R> attr(terms(y ~ f(x, a = z) + f(b = z, x)), "term.labels")
> [1] "f(x, a = z)" "f(b = z, x)"

> Is this intended behavior or needed for certain terms?

> We came across this problem when setting up certain smooth
> regressors with different kinds of patterns. As a trivial
> simplified example we can generate the same kind of
> problem with rep(). Consider the two dummy variables rep(x
> = 0:1, each = 4) and rep(x = 0:1, times = 4). With the
> response y = 1:8 I get:

R> lm((1:8) ~ rep(x = 0:1, each = 4) + rep(x = 0:1, times = 4))

> Call:
> lm(formula = (1:8) ~ rep(x = 0:1, each = 4) + rep(x = 0:1, times = 4))

> Coefficients:
>            (Intercept)  rep(x = 0:1, each = 4)
>                    2.5                     4.0

> So while the model is identified because the two
> regressors are not the same, terms.formula does not
> recognize this and drops the second regressor.  What I
> would have wanted can be obtained by switching the
> arguments:

R> lm((1:8) ~ rep(each = 4, x = 0:1) + rep(x = 0:1, times = 4))

> Call:
> lm(formula = (1:8) ~ rep(each = 4, x = 0:1) + rep(x = 0:1, times = 4))

> Coefficients:
>             (Intercept)   rep(each = 4, x = 0:1)  rep(x = 0:1, times = 4)
>                       2                        4                        1

> Of course, here I could avoid the problem by setting up
> proper factors etc. But to me this looks like a potential bug
> in terms.formula...

I agree that there is a bug.
According to https://www.r-project.org/bugs.html
I have generated an R bugzilla account for you so you can report
it there (for "book keeping", posterity, etc).

> Thanks in advance for any insights, Z

and thank *you* (and Nikolaus ?) for the report!

Best regards,
Martin



[Rd] named arguments in formula and terms

2017-03-10 Thread Achim Zeileis
Hi, we came across the following unexpected (for us) behavior in 
terms.formula: When determining whether a term is duplicated, only the 
order of the arguments in function calls seems to be checked but not their 
names. Thus the terms f(x, a = z) and f(x, b = z) are deemed to be 
duplicated and one of the terms is thus dropped.


R> attr(terms(y ~ f(x, a = z) + f(x, b = z)), "term.labels")
[1] "f(x, a = z)"

However, changing the arguments or the order of arguments keeps both 
terms:


R> attr(terms(y ~ f(x, a = z) + f(x, b = zz)), "term.labels")
[1] "f(x, a = z)"  "f(x, b = zz)"
R> attr(terms(y ~ f(x, a = z) + f(b = z, x)), "term.labels")
[1] "f(x, a = z)" "f(b = z, x)"
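
A minimal sketch for checking which written terms survive in terms(), assuming the right-hand side is a plain sum of terms (no `*`, `:`, `-` or `.`). The helper names rhs_summands() and dropped_terms() are hypothetical and not part of R. The result depends on the R version in use, so no output is shown.

```r
# Hypothetical helper: split an expression into its top-level "+" summands.
rhs_summands <- function(expr) {
  is_binary_plus <- is.call(expr) &&
    identical(expr[[1L]], as.name("+")) &&
    length(expr) == 3L
  if (is_binary_plus) {
    c(rhs_summands(expr[[2L]]), rhs_summands(expr[[3L]]))
  } else {
    list(expr)
  }
}

# Hypothetical helper: labels that were written in a two-sided formula
# but are missing from the "term.labels" attribute of terms().
dropped_terms <- function(formula) {
  written <- vapply(
    rhs_summands(formula[[3L]]),
    function(e) paste(deparse(e), collapse = " "),
    character(1)
  )
  kept <- attr(terms(formula), "term.labels")
  setdiff(written, kept)
}

# f, x, z are never evaluated here; terms() only inspects the formula.
dropped_terms(y ~ f(x, a = z) + f(x, b = z))
```

On an R version that shows the behavior described in this message, the call should return "f(x, b = z)". If both terms are kept, it returns character(0).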

Is this intended behavior or needed for certain terms?

We came across this problem when setting up certain smooth regressors with 
different kinds of patterns. As a trivial simplified example we can 
generate the same kind of problem with rep(). Consider the two dummy 
variables rep(x = 0:1, each = 4) and rep(x = 0:1, times = 4). With the 
response y = 1:8 I get:


R> lm((1:8) ~ rep(x = 0:1, each = 4) + rep(x = 0:1, times = 4))

Call:
lm(formula = (1:8) ~ rep(x = 0:1, each = 4) + rep(x = 0:1, times = 4))

Coefficients:
           (Intercept)  rep(x = 0:1, each = 4)
                   2.5                     4.0

So while the model is identified because the two regressors are not the 
same, terms.formula does not recognize this and drops the second regressor. 
What I would have wanted can be obtained by switching the arguments:


R> lm((1:8) ~ rep(each = 4, x = 0:1) + rep(x = 0:1, times = 4))

Call:
lm(formula = (1:8) ~ rep(each = 4, x = 0:1) + rep(x = 0:1, times = 4))

Coefficients:
            (Intercept)   rep(each = 4, x = 0:1)  rep(x = 0:1, times = 4)
                      2                        4                        1

Of course, here I could avoid the problem by setting up proper factors 
etc. But to me this looks like a potential bug in terms.formula...
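
One way to sidestep the issue, sketched under the assumption that the regressors can be computed before the model is fitted: store them as ordinary columns so that the formula contains only plain variable names. The column names block and alt are made up for illustration.

```r
# Precompute both dummy variables as ordinary columns; the formula then
# contains only plain symbols and no function calls with named arguments.
d <- data.frame(
  y     = 1:8,
  block = rep(x = 0:1, each  = 4),  # 0 0 0 0 1 1 1 1
  alt   = rep(x = 0:1, times = 4)   # 0 1 0 1 0 1 0 1
)

fit <- lm(y ~ block + alt, data = d)

# Both regressors are kept as separate terms.
attr(terms(fit), "term.labels")
coef(fit)
```

With both regressors retained, the fit should give the coefficients 2, 4 and 1 for the intercept, block and alt.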


Thanks in advance for any insights,
Z
