[Rd] [patch] ?confint: "assumes asymptotic normality"

2017-07-20 Thread Scott Kostyshak
From ?confint:

"Computes confidence intervals" and "The default method assumes
asymptotic normality"

For me, a "confidence interval" implies an exact confidence interval in
formal statistics (I concede that when speaking, the term is often used
more loosely). And of course, even if a test statistic is asymptotically
normal (so the assumption is satisfied), the finite distribution might
not be normal and thus an exact confidence interval would not be
computed.

Attached is a patch that simply changes "asymptotic normality" to
"normality" in confint.Rd. This encourages the user of the function to
think about whether their asymptotically normal statistic is "normal
enough" in a finite sample to get something reliable from confint().

Alternatively, we could instead change "Computes confidence intervals"
to "Computes asymptotic confidence intervals".

I hope I'm not being too pedantic here.

Scott


-- 
Scott Kostyshak
Assistant Professor of Economics
University of Florida
https://people.clas.ufl.edu/skostyshak/

Index: src/library/stats/man/confint.Rd
===================================================================
--- src/library/stats/man/confint.Rd(revision 72930)
+++ src/library/stats/man/confint.Rd(working copy)
@@ -31,7 +31,7 @@
 }
 \details{
   \code{confint} is a generic function.  The default method assumes
-  asymptotic normality, and needs suitable \code{\link{coef}} and
+  normality, and needs suitable \code{\link{coef}} and
   \code{\link{vcov}} methods to be available.  The default method can be
   called directly for comparison with other methods.
 
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

[Rd] configure check might fail for texi2any on Solaris

2017-07-20 Thread Gábor Csárdi
This is R-patched, from 15th of July. I get:

configure:5117: found /opt/csw/bin/texi2any
configure:5129: result: /opt/csw/bin/texi2any
configure:5141: checking whether texi2any is at least 5.1
configure:5163: result:no

However:

/opt/csw/bin/texi2any --version
texi2any (GNU texinfo) 6.1

Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Does the configure check fail or am I missing something?

Thanks,
Gabor



Re: [Rd] configure check might fail for texi2any on Solaris

2017-07-20 Thread Gábor Csárdi
Seems like I just need to put /usr/xpg4/bin first in the PATH, which has
a full-featured grep program, and then the configure test passes.

Gabor
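For what it's worth, the configure check boils down to extracting the version number from the first line of `texi2any --version` and comparing it against 5.1, roughly as in the sketch below (the sed pattern and comparison are illustrative, not configure's literal code). A non-POSIX sed/grep earlier in the PATH can make the extraction come up empty, which is how a perfectly good texi2any 6.1 ends up reported as "no":

```shell
# Extract "6.1" from the --version banner (illustrative, not configure's code):
line='texi2any (GNU texinfo) 6.1'
version=$(printf '%s\n' "$line" | sed -n '1s/.* \([0-9][0-9.]*\)$/\1/p')
major=${version%%.*}
rest=${version#*.}; minor=${rest%%.*}
# Compare against the required minimum, 5.1:
if [ "$major" -gt 5 ] || { [ "$major" -eq 5 ] && [ "$minor" -ge 1 ]; }; then
  echo "texi2any is at least 5.1: yes"
else
  echo "texi2any is at least 5.1: no"
fi
```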



[Rd] Separate packages per windows subarch in repository

2017-07-20 Thread Iago Mosqueira
Hi,

I maintain a repository of R packages, where some of them contain
executable binaries. I need to separate those compiled for 32 and 64 bit in
Windows, but I could not figure out how to do either of the two options I can think of:

1. Have subarch subfolders in PKG/inst/bin so that the right one is
installed or called

2. Have separate versions of the packages accessible in the same repository
for each subarch, e.g. bin/windows/contrib/3.4/i386

Could I do the first via a configure.win script?

Is the second option possible?

Any pointers to the documentation I might have missed would be appreciated.

Best regards,


Iago




Re: [Rd] Separate packages per windows subarch in repository

2017-07-20 Thread Uwe Ligges



On 20.07.2017 13:55, Iago Mosqueira wrote:

Hi,

I maintain a repository of R packages, where some of them contain
executable binaries. I need to separate those compiled for 32 and 64 bit in
Windows, but I could not figure out how to do either of the two options I can think of:

1. Have subarch subfolders in PKG/inst/bin so that the right one is
installed or called


Use 1., i.e. the same approach as for dlls, where we have the two 
subdirs for the two archs.


I think this will be difficult in Makevars.win when you build the 
executable binaries as part of the installation process, hence I think 
an appropriate Makefile.win should be used.


Best,
Uwe Ligges








Re: [Rd] Separate packages per windows subarch in repository

2017-07-20 Thread Jeroen Ooms
On Thu, Jul 20, 2017 at 1:55 PM, Iago Mosqueira
 wrote:
>
> I maintain a repository of R packages, where some of them contain
> executable binaries. I need to separate those compiled for 32 and 64 bit in
> Windows.

Have a look at the antiword package. It has a simple Makevars which
builds an antiword$(WIN) executable: just 'antiword' on unix, and
antiword32.exe + antiword64.exe on multiarch Windows.
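For readers who don't want to dig through antiword, the idea can be sketched with a hypothetical Makevars.win along these lines (the target name and rules are invented for illustration, not copied from antiword; on Windows, R sets R_ARCH per subarchitecture):

```make
# Hypothetical sketch, not antiword's actual Makevars.win.
# R_ARCH is "/i386" for 32-bit builds and "/x64" for 64-bit builds.
WIN = 64
ifeq "$(R_ARCH)" "/i386"
WIN = 32
endif

all: mytool$(WIN).exe

mytool$(WIN).exe: mytool.c
	$(CC) $(CFLAGS) -o $@ mytool.c
```

Because the suffix carries the architecture, both executables can coexist in the installed package and the R wrapper can pick the right one at run time.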



Re: [Rd] [patch] ?confint: "assumes asymptotic normality"

2017-07-20 Thread Martin Maechler
> Scott Kostyshak 
> on Thu, 20 Jul 2017 03:28:37 -0400 writes:

>> From ?confint:
> "Computes confidence intervals" and "The default method assumes
> asymptotic normality"

> For me, a "confidence interval" implies an exact confidence interval in
> formal statistics (I concede that when speaking, the term is often used
> more loosely). And of course, even if a test statistic is asymptotically
> normal (so the assumption is satisfied), the finite distribution might
> not be normal and thus an exact confidence interval would not be
> computed.

> Attached is a patch that simply changes "asymptotic normality" to
> "normality" in confint.Rd. This encourages the user of the function to
> think about whether their asymptotically normal statistic is "normal
> enough" in a finite sample to get something reliable from confint().

> Alternatively, we could instead change "Computes confidence intervals"
> to "Computes asymptotic confidence intervals".

> I hope I'm not being too pedantic here.

well, it's just at the 97.5% border line of "too pedantic"  ...
















;-)

I think you are right with your first proposal to drop
"asymptotic" here.  After all, there's the explicit 'fac <- qnorm(a)'.

One could consider making 'qnorm' an argument of the
default method to allow more general distributional assumptions,
but it may be wiser to have useRs write their own
confint.() method, notably for cases where
diag(vcov(object)) is an efficiency waste...
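A sketch of what such a user-written method could look like, for a hypothetical class "myfit" whose objects store standard errors directly, so that the full vcov matrix is never formed (the class name and its $coefficients/$se fields are invented for illustration):

```r
## Hypothetical S3 method: CI from stored standard errors, no vcov needed.
confint.myfit <- function(object, parm, level = 0.95, ...) {
  a   <- (1 - level) / 2
  fac <- qnorm(c(a, 1 - a))
  est <- object$coefficients
  if (missing(parm)) parm <- names(est)
  ci  <- est[parm] + outer(object$se[parm], fac)
  colnames(ci) <- paste(format(100 * c(a, 1 - a), trim = TRUE), "%")
  ci
}

## Toy object, exercising the method through the generic:
obj <- structure(list(coefficients = c(b = 2), se = c(b = 0.5)),
                 class = "myfit")
confint(obj)   # 2 -/+ qnorm(0.975) * 0.5
```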

Martin





Re: [Rd] Separate packages per windows subarch in repository

2017-07-20 Thread Iago Mosqueira
Thanks. I did not explain clearly that the executables are not compiled
during package compilation, apologies. They are compiled beforehand, as
they use ADMB (AD Model Builder), and are placed in inst/bin/windows.

I assume Makefile.win could copy the appropriate one from PKG/bin/$arch to
PKG/inst/bin/ so that the right one is installed?
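Assuming that layout (prebuilt binaries shipped under PKG/bin/i386 and PKG/bin/x64 — paths invented for illustration), a minimal src/Makefile.win could indeed just copy the right file, e.g.:

```make
# Hypothetical sketch of a src/Makefile.win; paths are assumptions.
# R_ARCH is "/i386" or "/x64" depending on the subarchitecture being built.
all:
	mkdir -p ../inst/bin
	cp ../bin$(R_ARCH)/mymodel.exe ../inst/bin/mymodel.exe
```

Note, though, that this runs when the Windows binary package is built, not when that binary package is later installed, which matters for the question below about R CMD INSTALL --build.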

Cheers,


Iago




[Rd] Change in print.function between R 3.4.1 and R 3.4.0

2017-07-20 Thread nicola farina
Dear all,

I just installed R 3.4.1 and noticed a change in how user-defined functions
are printed. A small example:

string<-"f<-function(x){\n\tx^2\n}"
cat(string,file="tmp00a.R")
source("tmp00a.R")
f

And this is what I see:

#R 3.4.0
function(x){
x^2
}

#R 3.4.1
function(x){
\tx^2
}

Seems that in 3.4.1 the tab character isn't "rendered". This is rather
annoying since it becomes very difficult to inspect the source code of user
defined functions. This behaviour seems to be present just for the tab
character (\n and \r are displayed correctly).

I'm on Ubuntu 14, 64bit. If you need more details I will gladly provide
what I can. Here is my sessionInfo():

R version 3.4.1 (2017-06-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.5 LTS

Matrix products: default
BLAS: /usr/local/R-3.4.1/lib/libRblas.so
LAPACK: /usr/local/R-3.4.1/lib/libRlapack.so

locale:
 [1] LC_CTYPE=it_IT.UTF-8   LC_NUMERIC=C
 [3] LC_TIME=it_IT.UTF-8LC_COLLATE=it_IT.UTF-8
 [5] LC_MONETARY=it_IT.UTF-8LC_MESSAGES=it_IT.UTF-8
 [7] LC_PAPER=it_IT.UTF-8   LC_NAME=C
 [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=it_IT.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_3.4.1 tools_3.4.1

(the value of sessionInfo() on R 3.4.0 is obviously identical aside from
the versions).


Thank you for your attention and your incredible work.

Nicola Farina




[Rd] Wrongly converging glm()

2017-07-20 Thread Harm-Jan Westra
Dear R-core,


I have found an edge-case where the glm function falsely concludes that the 
model has converged. The issue is the following: my data contains a number of 
covariates, one of these covariates has a very small variance. For most of the 
rows of this covariate, the value is 0, except for one of the rows, where it is 
1.


The glm function correctly determines the beta and standard error estimates for 
all other covariates.


I've placed the data here: http://www.harmjanwestra.nl/rtestdata.txt


The model I'm using is very simple:


data <- read.table("rtestdata.txt")

model <- glm(data[,1] ~ data[,2] + data[,3] + data[,4] + data[,5] + data[,6] + 
data[,7] + data[,8] + data[,9] + data[,10] + data[,11] + data[,12] + data[,13] 
+ data[,14], family=binomial("logit"))

summary(model)


You will see that for covariate data[,13], the beta/coefficient estimate is 
around -9. The number of iterations that has been performed is 8, and 
model$converged returns TRUE.


I've used some alternate logistic regression code in C 
(https://github.com/czep/mlelr/blob/master/src/mlelr.c), which produces 
identical estimates for the other covariates and comparable deviance values. 
However, using this C code, I'm seeing that the estimate for data[,13] is 
around -100 (since I'm allowing a maximum of 100 MLE iterations). There, the 
conclusion is that the model does not converge.


The difference between the two pieces of code is that in R, the glm() function 
determines convergence of the whole model by measuring the difference between 
deviance of the current iteration versus the deviance of the prior iteration, 
and calls the model converged when it reaches a certain epsilon value. In the 
C code, the model is converged only when none of the parameters has changed markedly 
compared to the previous iteration.


I think both approaches are valid, although the R variant (while faster) makes 
it vulnerable to wrongly concluding convergence in edge cases such as the one 
presented above, resulting in wrong coefficient estimates. For people wanting 
to use logistic regression in a training/prediction kind of setting, using 
these estimates might influence their predictive performance.


The problem here is that the glm function does not return any warnings when one 
of the covariates in the model does not converge. For someone who is not paying 
attention, this may lead them to conclude there is nothing wrong with their 
data. In my opinion, the default behavior in this case should therefore be to 
conclude that the model did not converge, or at least to show a warning message.
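The behaviour is easy to reproduce without the linked file; the following self-contained sketch builds synthetic data with the same pathology (a covariate that is 1 in a single row, with a failure in that row — quasi-complete separation), standing in for the real data set:

```r
set.seed(42)
n <- 1000
y <- rbinom(n, 1, 0.5)
x <- numeric(n)
x[1] <- 1   # covariate is 0 everywhere except one row...
y[1] <- 0   # ...and the outcome there is a failure: quasi-separation

fit <- glm(y ~ x, family = binomial("logit"))
fit$converged                                  # TRUE: deviance criterion met
coef(fit)[["x"]]                               # large negative; no finite MLE
summary(fit)$coefficients["x", "Std. Error"]   # enormous standard error
```

The deviance stops changing because the single affected fitted probability is already tiny, even though the coefficient itself would keep drifting toward -Inf with more iterations.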


Please let me know whether you believe this is an issue, and whether I can 
provide additional information.


With kind regards,


Harm-Jan Westra












Re: [Rd] Separate packages per windows subarch in repository

2017-07-20 Thread Iago Mosqueira
I have something working using configure.win, but this selects the
executable at the time the Windows version of the package is created via R CMD
INSTALL --build.

Could there be any way to do so at installation time from the binary
package?

Thanks,


Iago




Re: [Rd] Change in print.function between R 3.4.1 and R 3.4.0

2017-07-20 Thread Martin Maechler
> nicola farina 
> on Thu, 20 Jul 2017 16:51:54 +0200 writes:

> Dear all,
> I just installed R 3.4.1 and noticed a change in how user-defined 
functions
> are printed. A small example:

> string<-"f<-function(x){\n\tx^2\n}"
> cat(string,file="tmp00a.R")
> source("tmp00a.R")
> f

> And this is what I see:

> #R 3.4.0
> function(x){
> x^2
> }

> #R 3.4.1
> function(x){
> \tx^2
> }

> Seems that in 3.4.1 the tab character isn't "rendered". This is rather
> annoying since it becomes very difficult to inspect the source code of 
user
> defined functions. This behaviour seems to be present just for the tab
> character (\n and \r are displayed correctly).

[..]

> Thank you for your attention and your incredible work.

Thank you for the flowers.

Believe it or not, I detected this bug about 8 hours ago, after typing

  menu

at the R prompt after receiving a bug report about it.

It came from fixing bug PR#16732 (which was about rendering
Japanese fonts in a function definition when printing)

  https://bugs.r-project.org/bugzilla/show_bug.cgi?id=16732

A (probably shortest possible!) symptom of the bug is
that in R <= 3.4.0,

> quote(-"\n")
-"\n"
> 

whereas in R 3.4.1 (and the current newer development versions of R)

> quote(-"\n")
-"\\n"
> 


Ideally, fixing this (wrong duplication of "\") will not make
bug 16732 resurface.

I expect a bug fix by tomorrow.
If this is a big problem for you, you will have to upgrade to
 "R 3.4.1 patched"
during the weekend.
(Downgrading to R 3.4.0 is not a good idea, compared to the above!)

Martin Maechler,
ETH Zurich


> Nicola Farina

> [[alternative HTML version deleted]]
 ^
(do learn to post in plain text on R-devel, please)



Re: [Rd] Wrongly converging glm()

2017-07-20 Thread Joris Meys
Allow me to chime in. That's an interesting case you present, but as far as
I'm concerned the algorithm did converge. The estimate of -9.25 has an
estimated standard error of 72.4, meaning that frequentists would claim the
true value would lie anywhere between appx. -151 and 132 (CI) and hence the
estimate from the glm algorithm is perfectly compatible with the one from
the C++ code. And as the glm algorithm uses a different convergence rule,
the algorithm rightfully reported it converged. It's not because another
algorithm based on another rule doesn't converge, that the one glm uses
didn't.

On top of that: In both cases the huge standard error on that estimate
clearly tells you that the estimate should not be trusted, and the fit is
unstable. That's to be expected, given the insane imbalance in your data,
especially for the 13th column. If my students incorporated that
variable in a generalized linear model and tried to formulate a conclusion
based on that coefficient, they would fail the exam. So if somebody does this
analysis and tries to draw any conclusion whatsoever from that estimate,
maybe they should leave the analysis to somebody who does know what they're
doing.
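The interval quoted above is just the usual Wald computation; as a quick check:

```r
est <- -9.25
se  <- 72.4
est + c(-1, 1) * qnorm(0.975) * se   # about -151.2 and 132.7
```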

Cheers
Joris




-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Mathematical Modelling, Statistics and Bio-Informatics

tel :  +32 (0)9 264 61 79
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php




Re: [Rd] Wrongly converging glm()

2017-07-20 Thread Harm-Jan Westra
Dear Joris,


I agree that such a covariate should not be used in the analysis, and fully 
agree with your assessment. However, your response assumes that everybody who 
uses R knows what they're doing, which is a dangerous assumption to make. I bet 
there are a lot of people who blindly trust the output from R, even when 
there's clearly something wrong with the estimates.


In terms of your conclusion that the C++ estimate corresponds to a value within 
the R estimated confidence interval: if I allow the C++ code to run for 1000 
iterations, its estimate would be around -1000. It simply never converges.


I think there's nothing wrong with letting the user know there might be 
something wrong with one of the estimates, especially if your code can easily 
figure it out for you, by adding an additional rule. Not everyone is always 
paying attention (even if they know what they're doing).


With kind regards,


Harm-Jan




Re: [Rd] Wrongly converging glm()

2017-07-20 Thread Simon Bonner
In defence of Harm-Jan's original post I would say that there is a difference 
between true convergence and satisfying a convergence criterion. 

In my view the algorithm has not converged. This is a case of quasi-complete 
separation -- there are both successes and failures when x13=0 but only failures 
when x13=1. As a result, the likelihood has no maximum and increases, albeit 
slightly, as the associated coefficient tends to infinity while maximizing over 
the other parameters. The estimate given is not the MLE, and the standard error 
is not meaningful because the conditions for convergence of MLEs to their 
asymptotic normal distribution have been violated.

I agree with Joris that someone familiar with logistic regression should be 
able to identify this situation -- though the solution is not as simple as 
throwing out the covariate. Suppose that there had been many failures when 
x13=1, not just 1. The same problem would arise, but x13 is clearly an 
important covariate. Removing it from the analysis is not the thing to do. A 
better solution is to penalize the likelihood or (and I'm showing my true 
colours here) conduct a Bayesian analysis. 

Regarding the statement that the algorithm has converged, perhaps R should say 
more truthfully that the convergence criterion has been satisfied -- but that 
might lead to more confusion. In this case, any convergence criterion will be 
satisfied eventually. If you increase the maximum number of iterations in the C 
implementation then the other convergence criterion will be satisfied and the 
code will say that the algorithm has converged. 

In the end, it's up to the analyst to be aware of the pitfalls and how to 
address them.
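As a practical aid for that, a cheap post-fit screen along these lines can flag the situation. The thresholds are ad hoc and the function is a sketch, not a substitute for proper separation diagnostics (dedicated tools exist, e.g. the safeBinaryRegression and logistf packages):

```r
## Flag coefficients whose standard errors have blown up, and count fitted
## probabilities pinned near 0 or 1 -- both symptoms of (quasi-)separation.
check_separation <- function(fit, se_max = 10, eps = 1e-6) {
  se <- summary(fit)$coefficients[, "Std. Error"]
  p  <- fitted(fit)
  list(suspect_terms = names(se)[se > se_max],
       pinned_probs  = sum(p < eps | p > 1 - eps))
}

## The same kind of data as in the original report:
set.seed(1)
y <- rbinom(200, 1, 0.5)
x <- numeric(200)
x[1] <- 1
y[1] <- 0
check_separation(glm(y ~ x, family = binomial))$suspect_terms   # "x"
```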

Cheers,

Simon


[Rd] matrices with names

2017-07-20 Thread William Dunlap via R-devel
How should R deal with matrices that have a 'names' attribute?  S (and S+)
did not allow an object to have both dims and names but R does.  However,
some R functions copy the dims but not the names to the returned value and
some copy both.  I don't see a pattern to it.  Is there a general rule for
when the names on a matrix should be copied to the return value of a
function?

> x <- matrix(11,1,1)
> names(x)<-"One"
> dput(x)
structure(11, .Dim = c(1L, 1L), .Names = "One")
> dput(log2(x))
structure(3.4594316186373, .Dim = c(1L, 1L), .Names = "One")
> dput(pchisq(x,8))
structure(0.798300801297471, .Dim = c(1L, 1L), .Names = "One")
> dput(x+1)
structure(12, .Dim = c(1L, 1L))
> dput(x > 3)
structure(TRUE, .Dim = c(1L, 1L))
> dput(!x)
structure(FALSE, .Names = "One", .Dim = c(1L, 1L))
> dput(-x)
structure(-11, .Dim = c(1L, 1L), .Names = "One")
> dput(0-x)
structure(-11, .Dim = c(1L, 1L))

The binary operators don't copy, unary operators do copy, and many other
low-level functions do copy.
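Until the copying is made consistent, a hedged workaround is to re-apply the names yourself (my own sketch; `keep_names` is a hypothetical helper, not part of base R):

```r
## Sketch: wrap a function so that a 'names' attribute dropped from its
## matrix result is restored from the first argument.
keep_names <- function(f) {
  function(x, ...) {
    out <- f(x, ...)
    if (is.null(names(out)) && length(out) == length(x))
      names(out) <- names(x)
    out
  }
}

x <- matrix(11, 1, 1); names(x) <- "One"
`%plus%` <- keep_names(`+`)
names(x %plus% 1)  # "One", whereas plain x + 1 drops the names
```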

Bill Dunlap
TIBCO Software
wdunlap tibco.com

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Wrongly converging glm()

2017-07-20 Thread Joris Meys
On Thu, Jul 20, 2017 at 6:21 PM, Harm-Jan Westra  wrote:

> Dear Joris,
>
>
> I agree that such a covariate should not be used in the analysis, and
> fully agree with your assessment. However, your response assumes that
> everybody who uses R knows what they're doing, which is a dangerous
> assumption to make. I bet there are a lot of people who blindly trust the
> output from R, even when there's clearly something wrong with the estimates.
>

You missed my point then. I don't assume that everybody who uses R knows
what they're doing. Actually, I know for a fact quite a few people using R
have absolutely no clue about what they are doing. My point is that
everybody using R should first do the effort of learning what they're
doing. And if they don't, one shouldn't blame R. There's a million
different cases where both algorithms would converge and the resulting
estimates are totally meaningless regardless. R cannot be held responsible
for that.


>
>
> In terms of your conclusion that the C++ estimate corresponds to a value
> within the R estimated confidence interval: if I allow the C++ code to run
> for 1000 iterations, its estimate would be around -1000. It simply never
> converges.
>

I didn't test that far, and you're right in the sense that -100 is indeed
not the final estimate. After looking at the C code, it appears as if the
author of that code combines a Newton-Raphson approach with a different
convergence rule. And then it's quite understandable that it doesn't converge.
You can wildly vary that estimate; the effect it has on the Jacobian, log
likelihood, or deviance will be insignificant. So the model won't improve,
it would just move all over the parameter space.


>
>
> I think there's nothing wrong with letting the user know there might be
> something wrong with one of the estimates, especially if your code can
> easily figure it out for you, by adding an additional rule. Not everyone is
> always paying attention (even if they know what they're doing).
>

If R would do that, it wouldn't start the fitting procedure but just return
an error "Your analysis died due to a lack of useable data." . Because
that's the problem here.


>
>
> With kind regards,
>
>
> Harm-Jan
>
>
> 
> From: Joris Meys 
> Sent: Thursday, July 20, 2017 11:38 AM
> To: Harm-Jan Westra
> Cc: r-devel@r-project.org
> Subject: Re: [Rd] Wrongly converging glm()
>
> Allow me to chime in. That's an interesting case you present, but as far
> as I'm concerned the algorithm did converge. The estimate of -9.25 has an
> estimated standard error of 72.4, meaning that frequentists would claim the
> true value would lie anywhere between appx. -151 and 132 (CI) and hence the
> estimate from the glm algorithm is perfectly compatible with the one from
> the C++ code. And as the glm algorithm uses a different convergence rule,
> the algorithm rightfully reported it converged. It's not because another
> algorithm based on another rule doesn't converge, that the one glm uses
> didn't.
>
> On top of that: In both cases the huge standard error on that estimate
> clearly tells you that the estimate should not be trusted, and the fit is
> unstable. That's to be expected, given the insane imbalance in your data,
> especially for the 13th column. If my students incorporated that variable
> in a generalized linear model and tried to formulate a conclusion based on
> that coefficient, they would fail the exam. So if somebody does this
> analysis and tries to draw any conclusion whatsoever on that estimate,
> maybe they should leave the analysis to somebody who does know what they're
> doing.
>
> Cheers
> Joris
>
> On Thu, Jul 20, 2017 at 5:02 PM, Harm-Jan Westra <
> westra.harm...@outlook.com> wrote:
> Dear R-core,
>
>
> I have found an edge-case where the glm function falsely concludes that
> the model has converged. The issue is the following: my data contains a
> number of covariates, one of these covariates has a very small variance.
> For most of the rows of this covariate, the value is 0, except for one of
> the rows, where it is 1.
>
>
> The glm function correctly determines the beta and standard error
> estimates for all other covariates.
>
>
> I've placed the data here: http://www.harmjanwestra.nl/rtestdata.txt
>
>
> The model I'm using is very simple:
>
>
> data <- read.table("rtestdata.txt")
>
> model <- glm(data[,1] ~ data[,2] + data[,3] + data[,4] + data[,5] +
> data[,6] + data[,7] + data[,8] + data[,9] + data[,10] + data[,11] +
> data[,12] + data[,13] + data[,14], family=binomial("logit"))
>
> summary(model)
>
>
> You will see that for covariate data[,13], the beta/coefficient estimate
> is around -9. The number of iterations that has been performed is 8, and
> model$converged returns TRUE.
>
>
> I've used some alternate logistic regression code in C (
> https://github.com/czep/mlelr/blob/master/src/mlelr.c), which produces
> identical estimates for the other covariates and com

Re: [Rd] [patch] ?confint: "assumes asymptotic normality"

2017-07-20 Thread Scott Kostyshak
On Thu, Jul 20, 2017 at 04:21:04PM +0200, Martin Maechler wrote:
> > Scott Kostyshak 
> > on Thu, 20 Jul 2017 03:28:37 -0400 writes:
> 
> >> From ?confint:
> > "Computes confidence intervals" and "The default method assumes
> > asymptotic normality"
> 
> > For me, a "confidence interval" implies an exact confidence interval in
> > formal statistics (I concede that when speaking, the term is often used
> > more loosely). And of course, even if a test statistic is asymptotically
> > normal (so the assumption is satisfied), the finite distribution might
> > not be normal and thus an exact confidence interval would not be
> > computed.
> 
> > Attached is a patch that simply changes "asymptotic normality" to
> > "normality" in confint.Rd. This encourages the user of the function to
> > think about whether their asymptotically normal statistic is "normal
> > enough" in a finite sample to get something reliable from confint().
> 
> > Alternatively, we could instead change "Computes confidence intervals"
> > to "Computes asymptotic confidence intervals".
> 
> > I hope I'm not being too pedantic here.
> 
> well, it's just at the 97.5% border line of "too pedantic"  ...

:)

> ;-)
> 
> I think you are right with your first proposal to drop
> "asymptotic" here.  After all, there's the explicit 'fac <- qnorm(a)'.

Note that I received a private email that my message was indeed too
pedantic and expressed disagreement with the proposal. I'm not sure if
they intended it to be private so I will respond in private and see if
they feel like bringing the discussion on the list. Or perhaps this
minor (and perhaps controversial?) issue is not worth any additional
time.

> One could consider making 'qnorm' an argument of the
> default method to allow more general distributional assumptions,
> but it may be wiser to have useRs write their own
> confint.<class>() method, notably for cases where
> diag(vcov(object)) is an efficiency waste...
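A user-supplied method along those lines might look like this (a rough sketch of my own; `confint.myfit` is a hypothetical method name, and the t quantile is just one illustration of a "more general distributional assumption"):

```r
## Sketch: a class-specific confint method that swaps qnorm for qt.
## 'myfit' is a hypothetical class; adjust the df extraction to your model.
confint.myfit <- function(object, parm, level = 0.95, ...) {
  cf <- coef(object)
  if (missing(parm)) parm <- names(cf)
  ses <- sqrt(diag(vcov(object)))          # the efficiency waste noted above
  a <- (1 - level) / 2
  fac <- qt(c(a, 1 - a), df = df.residual(object))  # t instead of normal
  ci <- cf[parm] + ses[parm] %o% fac
  colnames(ci) <- paste0(100 * c(a, 1 - a), " %")
  ci
}
```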

Thanks for your comments,

Scott

> Martin
> 
> 
> > Scott
> 
> 
> > -- 
> > Scott Kostyshak
> > Assistant Professor of Economics
> > University of Florida
> > https://people.clas.ufl.edu/skostyshak/
> 
> 
> > --
> > Index: src/library/stats/man/confint.Rd
> > ===
> > --- src/library/stats/man/confint.Rd(revision 72930)
> > +++ src/library/stats/man/confint.Rd(working copy)
> > @@ -31,7 +31,7 @@
> > }
> > \details{
> > \code{confint} is a generic function.  The default method assumes
> > -  asymptotic normality, and needs suitable \code{\link{coef}} and
> > +  normality, and needs suitable \code{\link{coef}} and
> > \code{\link{vcov}} methods to be available.  The default method can be
> > called directly for comparison with other methods.
>  
> 
> > --
> > __
> > R-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Wrongly converging glm()

2017-07-20 Thread Harm-Jan Westra
Dear Simon,


Thanks for your response. I have a suggestion that could be non-intrusive, but 
still provide some additional info to the user.


The glm function already checks for collinearity of the input, and you can 
easily check which covariate was aliased as a result, using 
summary(model)$aliased.


My suggestion would therefore be to add an additional variable to the model 
summary, e.g. summary(model)$estimateconverged, which states whether the MLE 
has converged for that particular covariate.


This would provide users a way to perform sanity checks on the models they fit: 
sometimes you're running hundreds or millions of models, making it infeasible 
to check every single one of them. I agree that investigating the estimates + 
standard errors would be a solution, but then again, the estimate that R 
produces for such a covariate might as well be random.
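A post-fit sanity check of the kind I have in mind could be as simple as this (my own sketch; `sane_fit` is a hypothetical helper and the standard-error cutoff is an arbitrary illustration, not a recommended threshold):

```r
## Sketch: a cheap per-model sanity check for batch fitting of glms.
sane_fit <- function(model, se.cutoff = 10) {
  s <- summary(model)
  !any(s$aliased) &&                    # no covariates dropped for collinearity
    model$converged &&                  # deviance criterion satisfied
    all(s$coefficients[, "Std. Error"] < se.cutoff)  # no exploding SEs
}
```

When fitting millions of models, a check like this at least flags fits whose estimates should not be trusted, even if it cannot say why.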


With kind regards,


Harm-Jan



From: Simon Bonner 
Sent: Thursday, July 20, 2017 12:32 PM
To: Joris Meys; Harm-Jan Westra
Cc: r-devel@r-project.org
Subject: RE: [Rd] Wrongly converging glm()

In defence of Harm-Jan's original post I would say that there is a difference 
between true convergence and satisfying a convergence criterion.

In my view the algorithm has not converged. This is a case of quasi-complete 
separation -- there are both successes and failures when x13=0 but only failures 
when x13=1. As a result, the likelihood has no maximum and increases, albeit 
slightly, as the associated coefficient tends to infinity while maximizing over 
the other parameters. The estimate given is not the MLE and the standard error 
is not meaningful because the conditions for convergence of MLEs to their 
asymptotic normal distribution have been violated.

I agree with Joris that someone familiar with logistic regression should be 
able to identify this situation -- though the solution is not as simple as 
throwing out the covariate. Suppose that there had been many failures when 
x13=1, not just 1. The same problem would arise, but x13 is clearly an 
important covariate. Removing it from the analysis is not the thing to do. A 
better solution is to penalize the likelihood or (and I'm showing my true 
colours here) conduct a Bayesian analysis.

Regarding the statement that the algorithm has converged, perhaps R should say 
more truthfully that the convergence criterion has been satisfied -- but that 
might lead to more confusion. In this case, any convergence criterion will be 
satisfied eventually. If you increase the maximum number of iterations in the C 
implementation then the other convergence criterion will be satisfied and the 
code will say that the algorithm has converged.

In the end, it's up to the analyst to be aware of the pitfalls and how to 
address them.
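Separation of this kind can be screened for before fitting. A minimal heuristic sketch of my own (`separation_check` is a hypothetical helper; the test only makes sense for categorical covariates, since a continuous covariate will almost always produce empty cells):

```r
## Sketch: flag covariates whose cross-table with the binary response has an
## empty cell -- the signature of (quasi-)complete separation.
separation_check <- function(y, X) {
  suspect <- vapply(X, function(x) {
    tab <- table(factor(y), factor(x))
    any(tab == 0)
  }, logical(1))
  names(X)[suspect]
}
## e.g. separation_check(data[, 1], data[, -1]) should flag the 13th covariate
```

When separation is present, penalized fits such as Firth's bias reduction (available in the logistf and brglm2 packages) are one way to proceed, as is the Bayesian route.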

Cheers,

Simon

> -Original Message-
> From: R-devel [mailto:r-devel-boun...@r-project.org] On Behalf Of Joris Meys
> Sent: July 20, 2017 11:39 AM
> To: Harm-Jan Westra 
> Cc: r-devel@r-project.org
> Subject: Re: [Rd] Wrongly converging glm()
>
> Allow me to chime in. That's an interesting case you present, but as far as 
> I'm
> concerned the algorithm did converge. The estimate of -9.25 has an estimated
> standard error of 72.4, meaning that frequentists would claim the true value
> would lie anywhere between appx. -151 and 132 (CI) and hence the estimate
> from the glm algorithm is perfectly compatible with the one from the C++ code.
> And as the glm algorithm uses a different convergence rule, the algorithm
> rightfully reported it converged. It's not because another algorithm based on
> another rule doesn't converge, that the one glm uses didn't.
>
> On top of that: In both cases the huge standard error on that estimate clearly
> tells you that the estimate should not be trusted, and the fit is unstable. 
> That's
> to be expected, given the insane imbalance in your data, especially for the
> 13th column. If my students incorporated that variable in a generalized
> linear model and tried to formulate a conclusion based on that coefficient,
> they would fail the exam. So if somebody does this analysis and tries to
> draw any conclusion
> whatsoever on that estimate, maybe they should leave the analysis to
> somebody who does know what they're doing.
>
> Cheers
> Joris
>
> On Thu, Jul 20, 2017 at 5:02 PM, Harm-Jan Westra
>  > wrote:
>
> > Dear R-core,
> >
> >
> > I have found an edge-case where the glm function falsely concludes
> > that the model has converged. The issue is the following: my data
> > contains a number of covariates, one of these covariates has a very small
> variance.
> > For most of the rows of this covariate, the value is 0, except for one
> > of the rows, where it is 1.
> >
> >
> > The glm function correctly determines the beta and standard error
> > estimates for all other covariates.
> >
> >
> > I've placed the data here: http://www.harmjanwestra.nl/rtestdata.txt
> >
> >
> > The model I'm using is very simple:
> >

Re: [Rd] Wrongly converging glm()

2017-07-20 Thread Harm-Jan Westra
My apologies if I seemed to ‘blame R’. This was in no way my intention. I get 
the feeling that you’re missing my point as well.

I observed something that I thought was confusing, when comparing two more or 
less identical methods (when validating the C code), and wanted to make a 
suggestion as to how to help future R users. Note that I already acknowledged 
that my data was bad. Note that I also mention that the way R determines 
convergence is a valid approach.

What strikes me as odd is that R would warn you when your data is faulty for a 
function such as cor(), but not for glm(). I don’t see why you wouldn’t want to 
check both convergence criteria if you know multiple such criteria exist. It 
would make the software more user friendly in the end.

It may be true that there are millions of edge cases causing issues with glm(), 
as you say, but here I am presenting an edge case that can be easily detected, 
by checking whether the difference in beta estimates between the current and 
previous iteration is bigger than a certain epsilon value.

I agree ‘that everybody using R should first do the effort of learning what 
they're doing’, but it is a bit of a non-argument, because we all know that 
the world just doesn’t work that way; plus, this is one of the arguments that 
has held for example the Linux community back for quite a while (i.e. let’s not 
make the software more user friendly because the user should be more 
knowledgeable).

Harm-Jan


From: Joris Meys
Sent: Thursday, July 20, 2017 13:16
To: Harm-Jan Westra
Cc: r-devel@r-project.org
Subject: Re: [Rd] Wrongly converging glm()



On Thu, Jul 20, 2017 at 6:21 PM, Harm-Jan Westra 
mailto:westra.harm...@outlook.com>> wrote:
Dear Joris,


I agree that such a covariate should not be used in the analysis, and fully 
agree with your assessment. However, your response assumes that everybody who 
uses R knows what they're doing, which is a dangerous assumption to make. I bet 
there are a lot of people who blindly trust the output from R, even when 
there's clearly something wrong with the estimates.

You missed my point then. I don't assume that everybody who uses R knows what 
they're doing. Actually, I know for a fact quite a few people using R have 
absolutely no clue about what they are doing. My point is that everybody using 
R should first do the effort of learning what they're doing. And if they don't, 
one shouldn't blame R. There's a million different cases where both algorithms 
would converge and the resulting estimates are totally meaningless regardless. 
R cannot be held responsible for that.



In terms of your conclusion that the C++ estimate corresponds to a value within 
the R estimated confidence interval: if I allow the C++ code to run for 1000 
iterations, its estimate would be around -1000. It simply never converges.

I didn't test that far, and you're right in the sense that -100 is indeed not 
the final estimate. After looking at the C code, it appears as if the author of 
that code combines a Newton-Raphson approach with a different convergence rule. 
And then it's quite understandable that it doesn't converge. You can wildly 
vary that estimate; the effect it has on the Jacobian, log likelihood, or 
deviance will be insignificant. So the model won't improve, it would just move all over 
the parameter space.



I think there's nothing wrong with letting the user know there might be 
something wrong with one of the estimates, especially if your code can easily 
figure it out for you, by adding an additional rule. Not everyone is always 
paying attention (even if they know what they're doing).

If R would do that, it wouldn't start the fitting procedure but just return an 
error "Your analysis died due to a lack of useable data." . Because that's the 
problem here.



With kind regards,


Harm-Jan



From: Joris Meys mailto:jorism...@gmail.com>>
Sent: Thursday, July 20, 2017 11:38 AM
To: Harm-Jan Westra
Cc: r-devel@r-project.org
Subject: Re: [Rd] Wrongly converging glm()

Allow me to chime in. That's an interesting case you present, but as far as I'm 
concerned the algorithm did converge. The estimate of -9.25 has an estimated 
standard error of 72.4, meaning that frequentists would claim the true value 
would lie anywhere between appx. -151 and 132 (CI) and hence the estimate from 
the glm algorithm is perfectly compatible with the one from the C++ code. And 
as the glm algorithm uses a different convergence rule, the algorithm 
rightfully reported it converged. It's not because another algorithm based on 
another rule doesn't converge, that the one glm uses didn't.

On top of that: In both cases the huge standard error on that estimate clearly 
tells you that the estimate should not be trusted, and the fit is unstable. 
That's to be expected, given the insane imbalance in your

Re: [Rd] Wrongly converging glm()

2017-07-20 Thread Mark Leeds
Hi Harm-Jan. I've been following this thread to some degree and just want
to add that this issue is not specific to the GLM. It's a problem with
optimization of functions in general. I was using Rvmmin with constraints,
which is an extremely solid optimization package written by John Nash (it
uses a modified BFGS algorithm), and it took me two years to realize that,
although my optimization generally converged, there was an identifiability
issue with my model that basically meant that the results meant nothing. I only
eventually found this out because, in the econometrics literature,  the
type of economic model I was estimating ( rational expectations ) is known
to have an identifiability issue. I guess if I was an economics expert, I
may have been able to know this but, in general, I think what you are
asking
optimization code to do is EXTREMELY DIFFICULT.

John Nash can say more because he's THE optimization masteR but it's much
more difficult to write optimization algorithms with convergence rules that
are able to identify when mathematical convergence ( norm near zero say )
is not necessarily model convergence. That I can tell you from experience
!!!








On Thu, Jul 20, 2017 at 2:32 PM, Harm-Jan Westra  wrote:

> My apologies if I seemed to ‘blame R’. This was in no way my intention. I
> get the feeling that you’re missing my point as well.
>
> I observed something that I thought was confusing, when comparing two more
> or less identical methods (when validating the C code), and wanted to make
> a suggestion as to how to help future R users. Note that I already
> acknowledged that my data was bad. Note that I also mention that the way R
> determines convergence is a valid approach.
>
> What strikes me as odd is that R would warn you when your data is faulty
> for a function such as cor(), but not for glm(). I don’t see why you
> wouldn’t want to check both convergence criteria if you know multiple of
> such criteria exist. It would make the software more user friendly in the
> end.
>
> It may be true that there are millions of edge cases causing issues with
> glm(), as you say, but here I am presenting an edge case that can be easily
> detected, by checking whether the difference in beta estimates between the
> current and previous iteration is bigger than a certain epsilon value.
>
> I agree ‘that everybody using R should first do the effort of learning
> what they're doing’, but it is a bit of a non-argument, because we all know
> that, the world just doesn’t work that way, plus this is one of the
> arguments that has held for example the Linux community back for quite a
> while (i.e. let’s not make the software more user friendly because the user
> should be more knowledgeable).
>
> Harm-Jan
>
>
> From: Joris Meys
> Sent: Thursday, July 20, 2017 13:16
> To: Harm-Jan Westra
> Cc: r-devel@r-project.org
> Subject: Re: [Rd] Wrongly converging glm()
>
>
>
> On Thu, Jul 20, 2017 at 6:21 PM, Harm-Jan Westra <
> westra.harm...@outlook.com> wrote:
> Dear Joris,
>
>
> I agree that such a covariate should not be used in the analysis, and
> fully agree with your assessment. However, your response assumes that
> everybody who uses R knows what they're doing, which is a dangerous
> assumption to make. I bet there are a lot of people who blindly trust the
> output from R, even when there's clearly something wrong with the estimates.
>
> You missed my point then. I don't assume that everybody who uses R knows
> what they're doing. Actually, I know for a fact quite a few people using R
> have absolutely no clue about what they are doing. My point is that
> everybody using R should first do the effort of learning what they're
> doing. And if they don't, one shouldn't blame R. There's a million
> different cases where both algorithms would converge and the resulting
> estimates are totally meaningless regardless. R cannot be held responsible
> for that.
>
>
>
> In terms of your conclusion that the C++ estimate corresponds to a value
> within the R estimated confidence interval: if I allow the C++ code to run
> for 1000 iterations, its estimate would be around -1000. It simply never
> converges.
>
> I didn't test that far, and you're right in the sense that -100 is indeed
> not the final estimate. After looking at the C code, it appears as if the
> author of that code combines a Newton-Raphson approach with a different
> convergence rule. And then it's quite understandable that it doesn't converge.
> You can wildly vary that estimate; the effect it has on the Jacobian, log
> likelihood or deviance will be insignificant. So the model won't improve,
> it would just move all over the parameter space.
>
>
>
> I think there's nothing wrong with letting the user know there might be
> something wrong with one of the estimates, especially if your code can
> easily figure it out for you, by addi

Re: [Rd] Wrongly converging glm()

2017-07-20 Thread Harm-Jan Westra
Dear Mark,

I agree that convergence is a problem that applies to optimization in general, 
where the function you’re trying to optimize may have more than one local 
minimum. In your case, you probably would have to try different starting points 
for the MLE procedure. This should not be the case for logistic regression 
however (unless, like in my data, you have something that defies your model 
assumptions; check Simon Bonner’s response).

Still, I would think it would be a bit odd if the deviance wouldn’t change but 
one of the model parameters did after the next MLE iteration. It would tell me 
that these parameters wouldn’t add to the model fit, which in my opinion would 
be useful debugging information, even when I would be hitting a local minimum 
(it could even help inform me that there is another, more optimal, solution?). 
Probably I should try to figure out whether this observation is also true for 
other models/link functions (I honestly don’t know).

However, thanks to your response, I can see that my suggestion is probably not 
applicable to all glm link functions, and I see how implementation of my 
proposed ‘warning system’ could be confusing to the user. Thanks a lot!

With kind regards,

Harm-Jan

From: Mark Leeds
Sent: Thursday, July 20, 2017 14:54
To: Harm-Jan Westra
Cc: Joris Meys; 
r-devel@r-project.org
Subject: Re: [Rd] Wrongly converging glm()

Hi Harm-Jan. I've been following this thread to some degree and just want to 
add that this issue is not specific to the GLM. It's a problem with 
optimization of functions in general. I was using Rvmmin with constraints, 
which is an extremely solid optimization package written by John Nash (it 
uses a modified BFGS algorithm), and it took me two years to realize that, 
although my optimization generally converged, there was an identifiability 
issue with my model that basically meant that the results meant nothing. I only eventually 
found this out because, in the econometrics literature,  the type of economic 
model I was estimating ( rational expectations ) is known to have an 
identifiability issue. I guess if I was an economics expert, I  may have been 
able to know this but, in general, I think what you are asking
optimization code to do is EXTREMELY DIFFICULT.

John Nash can say more because he's THE optimization masteR but it's much more 
difficult to write optimization algorithms with convergence rules that are able 
to identify when mathematical convergence ( norm near zero say ) is not 
necessarily model convergence. That I can tell you from experience !!!





On Thu, Jul 20, 2017 at 2:32 PM, Harm-Jan Westra 
mailto:westra.harm...@outlook.com>> wrote:
My apologies if I seemed to ‘blame R’. This was in no way my intention. I get 
the feeling that you’re missing my point as well.

I observed something that I thought was confusing, when comparing two more or 
less identical methods (when validating the C code), and wanted to make a 
suggestion as to how to help future R users. Note that I already acknowledged 
that my data was bad. Note that I also mention that the way R determines 
convergence is a valid approach.

What strikes me as odd is that R would warn you when your data is faulty for a 
function such as cor(), but not for glm(). I don’t see why you wouldn’t want to 
check both convergence criteria if you know multiple such criteria exist. It 
would make the software more user friendly in the end.

It may be true that there are millions of edge cases causing issues with glm(), 
as you say, but here I am presenting an edge case that can be easily detected, 
by checking whether the difference in beta estimates between the current and 
previous iteration is bigger than a certain epsilon value.

I agree ‘that everybody using R should first do the effort of learning what 
they're doing’, but it is a bit of a non-argument, because we all know that 
the world just doesn’t work that way; plus, this is one of the arguments that 
has held for example the Linux community back for quite a while (i.e. let’s not 
make the software more user friendly because the user should be more 
knowledgeable).

Harm-Jan


From: Joris Meys>
Sent: Thursday, July 20, 2017 13:16
To: Harm-Jan 
Westra>
Cc: 
r-devel@r-project.org>
Subject: Re: [Rd] Wrongly converging glm()



On Thu, Jul 20, 2017 at 6:21 PM, Harm-Jan Westra 
mailto:westra.harm...@outlook.com>>>
 wrote:
Dear Joris,


I agree that such a covariate should not be used in the analysis, and fully 
agree with your assessment. However, your response assumes that everybody who 
uses R knows wha

Re: [Rd] Wrongly converging glm()

2017-07-20 Thread Joris Meys
On Thu, Jul 20, 2017 at 8:32 PM, Harm-Jan Westra  wrote:

> My apologies if I seemed to ‘blame R’. This was in no way my intention. I
> get the feeling that you’re missing my point as well.
>

I get that now. But you're on R-devel and you started with the claim that R
"falsely reports...". That looks like a bug report, and that's why I
initially answered that R correctly reports it converged. Maybe to the
wrong value, but it converged.


>
>
> What strikes me as odd is that R would warn you when your data is faulty
> for a function such as cor(), but not for glm(). I don’t see why you
> wouldn’t want to check both convergence criteria if you know multiple of
> such criteria exist. It would make the software more user friendly in the
> end.
>

The unfitness of the data bears no relation to the convergence criterion
and vice versa. These data checks should be done before the convergence
algorithm is even started, and as Mark Leeds also indicated, that's one
hell of a job to do. That said, the glm function has an argument "method"
by which you can provide an alternative version of glm.fit().  Adapting
that one to use another convergence criterion is rather trivial, so
technically R even allows you to do that out of the box. No patches needed.
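A minimal sketch of that adaptation (my own hack, not a supported interface; `glm.fit2` is a hypothetical name, and forcing one extra IRLS step via a near-zero epsilon, as well as the 1e-6 cutoff, are arbitrary illustrative choices):

```r
## Sketch: wrap glm.fit so that, after the deviance criterion is satisfied,
## one extra IRLS step is forced and the coefficient movement is inspected.
glm.fit2 <- function(x, y, ..., control = glm.control()) {
  fit <- glm.fit(x, y, ..., control = control)
  control$epsilon <- .Machine$double.xmin  # criterion effectively never met
  control$maxit <- fit$iter + 1            # so exactly one extra step runs
  refit <- suppressWarnings(glm.fit(x, y, ..., control = control))
  if (max(abs(coef(refit) - coef(fit)), na.rm = TRUE) > 1e-6)
    warning("coefficients still moving after reported convergence; ",
            "a covariate may be (quasi-)separated")
  fit
}
## usage sketch:
## glm(data[, 1] ~ ., data = data[, -1], family = binomial, method = glm.fit2)
```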


>
>
> I agree ‘that everybody using R should first do the effort of learning
> what they're doing’, but it is a bit of a non-argument, because we all know
> that, the world just doesn’t work that way, plus this is one of the
> arguments that has held for example the Linux community back for quite a
> while (i.e. let’s not make the software more user friendly because the user
> should be more knowledgeable).
>

That's a wrong analogy imho. You can expect Linux to be user friendly, but
not "I will detect every logical fallacy in the article you're writing in
this text editor" friendly. And honestly, that's a bit what you're asking R
to do here. I understand why, but there's always cases that will be missed.
And I wouldn't dare to speak in the name of the R core team, but I can
imagine they have a little more urgent issues than helping my students to
pass their statistics course ;-)

Cheers
Joris


-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Mathematical Modelling, Statistics and Bio-Informatics

tel :  +32 (0)9 264 61 79
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

[Rd] [PATCH] Fix missing break

2017-07-20 Thread Steve Grubb
Hello,

There appears to be a break missing in the switch/case for the LISTSXP case.
If this is supposed to fall through, I'd suggest a comment so that others
know it's by design.

Signed-off-by: Steve Grubb 

Index: src/main/builtin.c
===
--- src/main/builtin.c  (revision 72935)
+++ src/main/builtin.c  (working copy)
@@ -888,6 +888,7 @@
SETCAR(t, CAR(x));
SET_TAG(t, TAG(x));
}
+   break;
 case VECSXP:
for (i = 0; i < len; i++)
if (i < lenx) {

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] [PATCH] Fix bad free in connections

2017-07-20 Thread Steve Grubb
Hello, 

There are times when b points to buf which is a stack variable. This
leads to a bad free. The current test actually guarantees the stack
will try to get freed. Simplest to just drop the variable and directly
test if b should get freed.


Signed-off-by: Steve Grubb 


Index: src/main/connections.c
===
--- src/main/connections.c  (revision 72935)
+++ src/main/connections.c  (working copy)
@@ -421,7 +421,6 @@
 char buf[BUFSIZE], *b = buf;
 int res;
 const void *vmax = NULL; /* -Wall*/
-int usedVasprintf = FALSE;
 va_list aq;
 
 va_copy(aq, ap);
@@ -434,7 +433,7 @@
b = buf;
buf[BUFSIZE-1] = '\0';
warning(_("printing of extremely long output is truncated"));
-   } else usedVasprintf = TRUE;
+   }
 }
 #else
 if(res >= BUFSIZE) { /* res is the desired output length */
@@ -481,7 +480,7 @@
 } else
con->write(b, 1, res, con);
 if(vmax) vmaxset(vmax);
-if(usedVasprintf) free(b);
+if(b != buf) free(b);
 return res;
 }

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Wrongly converging glm()

2017-07-20 Thread Harm-Jan Westra
Dear Joris,

I’ll be more careful in my wording next time; thanks for the pointer, and 
thanks for the discussion. This whole process has been quite educational! 😉. I 
think we’ve reached a consensus here, where the situation as it is right now 
has been chosen to allow for flexibility of R’s glm() function.

With kind regards,

Harm-Jan



From: Joris Meys
Sent: Thursday, July 20, 2017 16:06
To: Harm-Jan Westra
Cc: r-devel@r-project.org
Subject: Re: [Rd] Wrongly converging glm()




On Thu, Jul 20, 2017 at 8:32 PM, Harm-Jan Westra 
<westra.harm...@outlook.com> wrote:
My apologies if I seemed to ‘blame R’. This was in no way my intention. I get 
the feeling that you’re missing my point as well.

I get that now. But you're on R-devel and you started with the claim that R 
"falsely reports...". That looks like a bug report, and that's why I initially 
answered that R correctly reports it converged. Maybe to the wrong value, but 
it converged.


What strikes me as odd is that R would warn you when your data is faulty for a 
function such as cor(), but not for glm(). I don’t see why you wouldn’t want to 
check both convergence criteria if you know multiple of such criteria exist. It 
would make the software more user friendly in the end.

The unfitness of the data bears no relation to the convergence criterion and 
vice versa. These data checks should be done before the convergence algorithm 
is even started, and as Mark Leeds also indicated, that's one hell of a job to 
do. That said, the glm function has an argument "method" by which you can 
provide an alternative version of glm.fit().  Adapting that one to use another 
convergence criterion is rather trivial, so technically R even allows you to do 
that out of the box. No patches needed.


I agree ‘that everybody using R should first do the effort of learning what 
they're doing’, but it is a bit of a non-argument, because we all know that, 
the world just doesn’t work that way, plus this is one of the arguments that 
has held for example the Linux community back for quite a while (i.e. let’s not 
make the software more user friendly because the user should be more 
knowledgeable).

That's a wrong analogy imho. You can expect Linux to be user friendly, but not 
"I will detect every logical fallacy in the article you're writing in this text 
editor" friendly. And honestly, that's a bit what you're asking R to do here. I 
understand why, but there's always cases that will be missed. And I wouldn't 
dare to speak in the name of the R core team, but I can imagine they have a 
little more urgent issues than helping my students to pass their statistics 
course ;-)
Cheers
Joris


--
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Mathematical Modelling, Statistics and Bio-Informatics

tel :  +32 (0)9 264 61 79
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php



__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

[Rd] [PATCH] Fix memory leak in PicTeXDeviceDriver

2017-07-20 Thread Steve Grubb
Hello,

This patch fixes a memory leak due to ptd going out of scope
before it's assigned to dd.

Signed-off-by: Steve Grubb 

Index: src/library/grDevices/src/devPicTeX.c
===
--- src/library/grDevices/src/devPicTeX.c   (revision 72935)
+++ src/library/grDevices/src/devPicTeX.c   (working copy)
@@ -665,8 +665,10 @@
 ptd->width = width;
 ptd->height = height;
 
-if( ! PicTeX_Open(dd, ptd) ) 
+if( ! PicTeX_Open(dd, ptd) ) {
+free(ptd);
return FALSE;
+}
 
 /* Base Pointsize */
 /* Nominal Character Sizes in Pixels */

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] [PATCH] Fix fscanf specifier in InIntegerAscii

2017-07-20 Thread Steve Grubb
Hello,

The SMBUF_SIZED_STRING format allows fscanf to read up to 511 bytes. The buffer
at line 1382 is only 128 bytes. The fscanf format specifier ought to be
given an explicit field width to prevent a stack overrun.

Signed-off-by: Steve Grubb 

Index: saveload.c
===
--- src/main/saveload.c (revision 72935)
+++ src/main/saveload.c (working copy)
@@ -1379,7 +1379,7 @@
 {
 char buf[128];
 int x, res;
-res = fscanf(fp, SMBUF_SIZED_STRING, buf);
+res = fscanf(fp, "%127s", buf);
 if(res != 1) error(_("read error"));
 if (strcmp(buf, "NA") == 0)
return NA_INTEGER;

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] [PATCH] Fix status in main

2017-07-20 Thread Steve Grubb
Hello,

This is a patch to fix what appears to be a simple typo. The warning says
"invalid 'status', 0 assumed", but the code then sets runLast to 0 instead.

Signed-off-by: Steve Grubb 

Index: src/main/main.c
===
--- src/main/main.c (revision 72935)
+++ src/main/main.c (working copy)
@@ -1341,7 +1341,7 @@
 status = asInteger(CADR(args));
 if (status == NA_INTEGER) {
warning(_("invalid 'status', 0 assumed"));
-   runLast = 0;
+   status = 0;
 }
 runLast = asLogical(CADDR(args));
 if (runLast == NA_LOGICAL) {

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] [patch] ?confint: "assumes asymptotic normality"

2017-07-20 Thread peter dalgaard

> On 20 Jul 2017, at 19:46 , Scott Kostyshak  wrote:
> 
> On Thu, Jul 20, 2017 at 04:21:04PM +0200, Martin Maechler wrote:
>>> Scott Kostyshak 
>>>on Thu, 20 Jul 2017 03:28:37 -0400 writes:
>> 
>>> From ?confint:
>>> "Computes confidence intervals" and "The default method assumes
>>> asymptotic normality"
>> 
>>> For me, a "confidence interval" implies an exact confidence interval in
>>> formal statistics (I concede that when speaking, the term is often used
>>> more loosely). And of course, even if a test statistic is asymptotically
>>> normal (so the assumption is satisfied), the finite distribution might
>>> not be normal and thus an exact confidence interval would not be
>>> computed.
>> 
>>> Attached is a patch that simply changes "asymptotic normality" to
>>> "normality" in confint.Rd. This encourages the user of the function to
>>> think about whether their asymptotically normal statistic is "normal
>>> enough" in a finite sample to get something reliable from confint().
>> 
>>> Alternatively, we could instead change "Computes confidence intervals"
>>> to "Computes asymptotic confidence intervals".
>> 
>>> I hope I'm not being too pedantic here.
>> 
>> well, it's just at the 97.5% border line of "too pedantic"  ...
> 
> :)
> 
>> ;-)
>> 
>> I think you are right with your first proposal to drop
>> "asymptotic" here.  After all, there's the explicit 'fac <- qnorm(a)'.
> 
> Note that I received a private email that my message was indeed too
> pedantic and expressed disagreement with the proposal. I'm not sure if
> they intended it to be private so I will respond in private and see if
> they feel like bringing the discussion on the list. Or perhaps this
> minor (and perhaps controversial?) issue is not worth any additional
> time.

At any rate, it is important not to let the pedantry cause the text to become 
misleading. If you just write "assumes normality", readers may consider the 
procedure to be simply wrong when the estimator (or worse: the original data) 
is not normally distributed. And "computes asymptotic c.i." is just wrong, 
because they are sometimes exact. 

It may be necessary to spell things out more extensively. Something like "the 
default method assumes normality and that the s.e. is known. For asymptotically 
normally distributed estimators, it yields an asymptotic confidence interval."

-pd

  

> 
>> One could consider to make  'qnorm' an argument of the
>> default method to allow more general distributional assumptions,
>> but it may be wiser to have useRs write their own
>> confint.() method, notably for cases where
>> diag(vcov(object)) is an efficiency waste...
> 
> Thanks for your comments,
> 
> Scott
> 
>> Martin
>> 
>> 
>>> Scott
>> 
>> 
>>> -- 
>>> Scott Kostyshak
>>> Assistant Professor of Economics
>>> University of Florida
>>> https://people.clas.ufl.edu/skostyshak/
>> 
>> 
>>> --
>>> Index: src/library/stats/man/confint.Rd
>>> ===
>>> --- src/library/stats/man/confint.Rd(revision 72930)
>>> +++ src/library/stats/man/confint.Rd(working copy)
>>> @@ -31,7 +31,7 @@
>>> }
>>> \details{
>>> \code{confint} is a generic function.  The default method assumes
>>> -  asymptotic normality, and needs suitable \code{\link{coef}} and
>>> +  normality, and needs suitable \code{\link{coef}} and
>>> \code{\link{vcov}} methods to be available.  The default method can be
>>> called directly for comparison with other methods.
>> 
>> 
>>> --
>>> __
>>> R-devel@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
> 
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd@cbs.dk  Priv: pda...@gmail.com

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] [PATCH] Fix missing break

2017-07-20 Thread Duncan Murdoch
Thanks for posting this series of patches.  Unfortunately, there's a 
good chance they'll get lost in all the traffic on R-devel.  If you 
don't hear that they've been fixed in the next couple of weeks, could 
you post them to bugs.r-project.org, and post future patches there as well?


In examples like the one below, if you have R code that shows symptoms, 
it would really help in the bug report.  Otherwise, someone else will 
have to analyze the code to decide whether it's a bug or missing 
comment.  That takes time, and if there are no known symptoms, it's 
likely to be assigned a low priority.  The sad truth is that very few 
members of R Core are currently actively fixing bugs.


Duncan Murdoch



On 20/07/2017 5:02 PM, Steve Grubb wrote:

Hello,

There appears to be a break missing in the switch/case for the LISTSXP case.
If this is supposed to fall through, I'd suggest a comment so that others
know it's by design.

Signed-off-by: Steve Grubb 

Index: src/main/builtin.c
===
--- src/main/builtin.c  (revision 72935)
+++ src/main/builtin.c  (working copy)
@@ -888,6 +888,7 @@
SETCAR(t, CAR(x));
SET_TAG(t, TAG(x));
}
+   break;
 case VECSXP:
for (i = 0; i < len; i++)
if (i < lenx) {

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] [PATCH] Fix missing break

2017-07-20 Thread Steve Grubb
On Thursday, July 20, 2017 7:41:00 PM EDT Duncan Murdoch wrote:
> Thanks for posting this series of patches.  Unfortunately, there's a
> good chance they'll get lost in all the traffic on R-devel.  If you
> don't hear that they've been fixed in the next couple of weeks, could
> you post them to bugs.r-project.org, and post future patches there as well?

That was my first inclination. But there is no way to create an account, unlike 
most open source projects I work with. And I work with quite a lot.


> In examples like the one below, if you have R code that shows symptoms,
> it would really help in the bug report. 

I am hoping that we can look at the code as seasoned programmers and say yeah, 
that is a bug. I ran the code through Coverity and have quite a lot of 
problems to tell you about. I sent these 5 out as tests to see how this 
community works. I am new to this community, but not necessarily to R, and just 
want to contribute back to something I am using. But believe me, I have a 
bunch more that seasoned programmers can eyeball and say yep - that's a bug.


> Otherwise, someone else will have to analyze the code to decide whether it's
> a bug or missing comment.  That takes time, and if there are no known
> symptoms, it's likely to be assigned a low priority.  The sad truth is that
> very few members of R Core are currently actively fixing bugs.

That's a shame. I'd be happy to give the scan to people in core so they can 
see what the lay of the land looks like. R works amazingly well. So much so I 
decided to dig deeper. I'd recommend to the core developers that they ask to 
get on Coverity's open source scan list.

https://scan.coverity.com/

It's free to open source projects like this. :-)

-Steve


> On 20/07/2017 5:02 PM, Steve Grubb wrote:
> > Hello,
> > 
> > There appears to be a break missing in the switch/case for the LISTSXP
> > case. If this is supposed to fall through, I'd suggest a comment so that
> > others know it's by design.
> > 
> > Signed-off-by: Steve Grubb 
> > 
> > Index: src/main/builtin.c
> > ===
> > --- src/main/builtin.c  (revision 72935)
> > +++ src/main/builtin.c  (working copy)
> > @@ -888,6 +888,7 @@
> > 
> > SETCAR(t, CAR(x));
> > SET_TAG(t, TAG(x));
> > 
> > }
> > 
> > +   break;
> > 
> >  case VECSXP:
> > for (i = 0; i < len; i++)
> > 
> > if (i < lenx) {
> > 
> > __
> > R-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel