Re: [Rd] Rscript failing with h5 reveals bugs in h5 (and 'R CMD check')

2017-12-28 Thread Martin Maechler
> Duncan Murdoch 
> on Wed, 27 Dec 2017 06:13:12 -0500 writes:

> On 26/12/2017 9:40 AM, Dirk Eddelbuettel wrote:
>> 
>> On 26 December 2017 at 22:14, Sun Yijiang wrote: | Thanks
>> for the solution.  Now I know the work-arounds, but still
>> don't | quite get it. Why does R_DEFAULT_PACKAGES has
>> anything to do with | library(methods)?
>> 
>> Because it governs which packages are loaded by default.
>> And while R also loads 'methods', Rscript does
>> not. Source of endless confusion.

> Mostly irrelevant correction of the jargon: that setting
> controls which packages are "attached" by default.
> library(h5) would be enough to load methods, because h5
> imports things from methods.  But loading doesn't put a
> package on the search list.  library(methods) both loads
> methods (if it hasn't already been loaded), and attaches
> it.

>> 
>> | If library(h5) works, it should just work, not depend
>> on an environment variable.
>> 
>> Every package using S4 will fail under Rscript unless
>> 'methods' explicitly.

> That's not quite true (or quite English, as per
> fortune(112)).  The "gmp" package imports methods, and
> it works in Rscript.  What doesn't work is to expect
> library(h5) or library(gmp) to cause methods functions
> like show() to be available to the user.

But indeed, in this case Sun's  test.R  script did not use any
such user level functions, and it still did not work when
methods is not attached...
and indeed that's the case also with R if you run it without
loading methods e.g. by

  R_DEFAULT_PACKAGES=NULL R CMD BATCH test.R

shows the same error...

===> There is really a bug in  h5 :  It does not import enough
from methods or it would all work fine, even with Rscript !!

===> So we have something relevant to R-devel , actually at
least one bug in R's  checking :

Why did
R   CMD check --as-cran h5

not see that h5 defines methods for initialize() but never
imports that from methods ?
and so "should not work" when methods is not attached

--

After all, the fact that the default packages attached at the
beginning differ between R and Rscript  has contributed to
revealing a bug in both 'h5' and R's checking procedures.

Maybe we should keep Rscript's incompatibility therefore ;-)

Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Numerical stability in chisq.test

2017-12-28 Thread peter dalgaard


> On 28 Dec 2017, at 13:08 , Kurt Hornik  wrote:
> 
>> Jan Motl writes:
> 
>> The chisq.test on line 57 contains following code:
>>  STATISTIC <- sum(sort((x - E)^2/E, decreasing = TRUE))
> 
> The preceding 2 lines seem relevant:
> 
>## Sorting before summing may look strange, but seems to be
>## a sensible way to deal with rounding issues (PR#3486):
>STATISTIC <- sum(sort((x - E) ^ 2 / E, decreasing = TRUE))
> 
> -k

My thoughts too. PR 3486 is about simulated tables that theoretically have 
STATISTIC equal to the one observed, but come out slightly different, messing 
up the simulated p value. The sort is not actually intended to squeeze the very 
last bit of accuracy out of the computation, just to make sure that the 
round-off affects equivalent tables in the same way. "Fixing" the code may 
therefore unfix PR#3486; at the very least some care is required if this is 
modified.  

-pd


> 
>> However, based on book "Accuracy and stability of numerical algorithms" 
>> available from:
>>  
>> http://ftp.demec.ufpr.br/CFD/bibliografia/Higham_2002_Accuracy%20and%20Stability%20of%20Numerical%20Algorithms.pdf
>> Table 4.1 on page 89, it is better to sort the data in increasing order than 
>> in decreasing order, when the data are non-negative.
> 
>> An example:
>>  x = matrix(c(rep(1.1, 1)), 10^16, nrow = 10001, ncol = 1)# We 
>> have a vector with 1*1.1 and 1*10^16
>>  c(sum(sort(x, decreasing = TRUE)), sum(sort(x, decreasing = FALSE)))
>> The result:
>>  100010996 100011000
>> When we sort the data in the increasing order, we get the correct result. If 
>> we sort the data in the decreasing order, we get a result that is off by 4.
> 
>> Shouldn't the sort be in the increasing order rather than in the decreasing 
>> order?
> 
>> Best regards,
>> Jan Motl
> 
> 
>> PS: This post is based on discussion on 
>> https://stackoverflow.com/questions/47847295/why-does-chisq-test-sort-data-in-descending-order-before-summation
>>  and the response from the post to r-h...@r-project.org.
>> __
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
> 
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd@cbs.dk  Priv: pda...@gmail.com

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Numerical stability in chisq.test

2017-12-28 Thread Kurt Hornik
> Jan Motl writes:

> The chisq.test on line 57 contains following code:
>   STATISTIC <- sum(sort((x - E)^2/E, decreasing = TRUE))

The preceding 2 lines seem relevant:

## Sorting before summing may look strange, but seems to be
## a sensible way to deal with rounding issues (PR#3486):
STATISTIC <- sum(sort((x - E) ^ 2 / E, decreasing = TRUE))

-k

> However, based on book "Accuracy and stability of numerical algorithms" 
> available from:
>   
> http://ftp.demec.ufpr.br/CFD/bibliografia/Higham_2002_Accuracy%20and%20Stability%20of%20Numerical%20Algorithms.pdf
> Table 4.1 on page 89, it is better to sort the data in increasing order than 
> in decreasing order, when the data are non-negative.

> An example:
>   x = matrix(c(rep(1.1, 1)), 10^16, nrow = 10001, ncol = 1)# We 
> have a vector with 1*1.1 and 1*10^16
>   c(sum(sort(x, decreasing = TRUE)), sum(sort(x, decreasing = FALSE)))
> The result:
>   100010996 100011000
> When we sort the data in the increasing order, we get the correct result. If 
> we sort the data in the decreasing order, we get a result that is off by 4.

> Shouldn't the sort be in the increasing order rather than in the decreasing 
> order?

> Best regards,
> Jan Motl


> PS: This post is based on discussion on 
> https://stackoverflow.com/questions/47847295/why-does-chisq-test-sort-data-in-descending-order-before-summation
>  and the response from the post to r-h...@r-project.org.
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel