Re: [Rd] CRAN indices out of whack (for at least macOS)

2018-02-05 Thread Thierry Onkelinx
Another benefit of Winston's proposal is that it makes it easy to
install specific package versions from source. For the time being I'm
using a construct like
https://github.com/inbo/Rstable/blob/master/cran_install.sh to
generate a Docker image.
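
For illustration only (this is not the content of the linked script): a minimal Python sketch of why pinning a version takes extra work with the current layout. It assumes the usual CRAN arrangement, where the current source tarball sits in src/contrib and superseded ones sit in src/contrib/Archive/<package>/. The function name candidate_source_urls is hypothetical.

```python
def candidate_source_urls(package, version, repo="https://cloud.r-project.org"):
    """Return the places a given source version may live, in lookup order."""
    tarball = f"{package}_{version}.tar.gz"
    base = repo.rstrip("/") + "/src/contrib"
    return [
        f"{base}/{tarball}",                    # still the current release
        f"{base}/Archive/{package}/{tarball}",  # already superseded
    ]


# A caller has to try both locations, because the right one changes
# the moment a newer version is published.
for url in candidate_source_urls("digest", "0.6.14"):
    print(url)
```

With current and old versions in one directory, the list would shrink to a single URL that stays valid after a new release.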

Best regards,

ir. Thierry Onkelinx
Statisticus / Statistician

Vlaamse Overheid / Government of Flanders
INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE
AND FOREST
Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
thierry.onkel...@inbo.be
Havenlaan 88 bus 73, 1000 Brussel
www.inbo.be

///
To call in the statistician after the experiment is done may be no
more than asking him to perform a post-mortem examination: he may be
able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does
not ensure that a reasonable answer can be extracted from a given body
of data. ~ John Tukey
///





Re: [Rd] CRAN indices out of whack (for at least macOS)

2018-02-03 Thread Winston Chang
Although it may not have been the cause of this particular index
inconsistency, there are other causes of intermittent index
inconsistencies. They could be avoided if there were a different
directory structure on CRAN servers.

One of the causes of inconsistencies is caching. With
cloud.r-project.org (note that this is not cran.r-project.org),
there is a CDN in front of the server; the CDN has caching endpoints
around the world, and will serve files to the user from the nearest
endpoint.

The cache timeout for each file is 30 minutes. Suppose a user
downloads file X from some endpoint at 1:00. If the endpoint doesn't
already have X in the cache, then it will fetch the file from the
server, and then send it to the user. The endpoint will consider the
cached file valid until 1:30. If another user requests X at 1:20, the
endpoint will serve up the file from its cache without checking with
the server. If someone requests X at 1:40, the endpoint will check
with the server to see if its cached version is still valid (and
download an updated version if necessary), then it will send the file
to the user.
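
The timeline above can be sketched in a few lines of Python. This is a simplified model, not the CDN's actual logic: time is counted in minutes after 1:00, the class name Endpoint is made up, and an expired entry is simply fetched again rather than revalidated with the server.

```python
class Endpoint:
    """One CDN endpoint with a per-file cache and a fixed time-to-live."""

    def __init__(self, origin, ttl_minutes=30):
        self.origin = origin      # dict standing in for the server: name -> content
        self.ttl = ttl_minutes
        self.cache = {}           # name -> (content, minute it was fetched)

    def get(self, name, now):
        entry = self.cache.get(name)
        if entry is not None and now - entry[1] < self.ttl:
            return entry[0], "cache"       # still considered valid
        content = self.origin[name]        # expired or never seen: ask the server
        self.cache[name] = (content, now)
        return content, "origin"


origin = {"X": "v1"}
edge = Endpoint(origin)
print(edge.get("X", now=0))    # 1:00, fetched from the server
origin["X"] = "v2"             # the server's copy changes
print(edge.get("X", now=20))   # 1:20, old copy served from the cache
print(edge.get("X", now=40))   # 1:40, cache expired, new copy fetched
```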

Because the caching is on a per-file basis, this can lead to a
situation where the PACKAGES file served by an endpoint is out of sync
with the .tgz package files. Imagine this scenario:

1:00 Someone downloads PACKAGES. It is not yet in the endpoint's
cache, so it fetches it from the server. This version of PACKAGES says
that the current version of PkgA is 1.0.
1:10 The server performs an rsync from the central CRAN mirror. It
gets an updated version of PACKAGES, which says that the current
version of PkgA is 2.0. The rsync also removes the PkgA_1.0.tgz file
and adds PkgA_2.0.tgz.
1:20 Someone else wants to install PkgA, so their R session first
downloads PACKAGES, which points to PkgA_1.0.tgz. Then R tries to
download PkgA_1.0.tgz; it is not in the endpoint's cache, so the
endpoint tries to fetch it from the server, but the file is no longer
present there, so the server returns a 404 (Not Found) response. The
endpoint passes this on to the R session, and the package installation fails.

Anyone else who tries to install PkgA (and hits the same CDN endpoint)
will get the same installation failure, until the cache for PACKAGES
expires at 1:30. However, another person who happens to hit another
endpoint may be able to install PkgA, because each endpoint does its
caching independently.
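
The scenario can be replayed with a small Python model. The assumptions are loud ones: PACKAGES is reduced to the single file name it points to, times are minutes after 1:00, and Endpoint, NotFound and install are hypothetical names, not part of R or of any CDN.

```python
class NotFound(Exception):
    """Stands in for an HTTP 404 response from the server."""


class Endpoint:
    def __init__(self, origin, ttl=30):
        self.origin, self.ttl, self.cache = origin, ttl, {}

    def get(self, name, now):
        hit = self.cache.get(name)
        if hit and now - hit[1] < self.ttl:
            return hit[0]                     # served from this endpoint's cache
        if name not in self.origin:
            raise NotFound(name)              # the server no longer has the file
        self.cache[name] = (self.origin[name], now)
        return self.origin[name]


def install(endpoint, now):
    """Fetch the index, then the file it names, as an R session would."""
    filename = endpoint.get("PACKAGES", now)
    return endpoint.get(filename, now)


origin = {"PACKAGES": "PkgA_1.0.tgz", "PkgA_1.0.tgz": "binary 1.0"}
near, far = Endpoint(origin), Endpoint(origin)

near.get("PACKAGES", now=0)                   # 1:00, index cached on one endpoint

# 1:10, the rsync replaces the index and swaps the package file
origin["PACKAGES"] = "PkgA_2.0.tgz"
del origin["PkgA_1.0.tgz"]
origin["PkgA_2.0.tgz"] = "binary 2.0"

try:
    result_near = install(near, now=20)       # 1:20, stale index, missing file
except NotFound as missing:
    result_near = f"404 for {missing}"
result_far = install(far, now=20)             # another endpoint, nothing cached

print(result_near)
print(result_far)
print(install(near, now=30))                  # 1:30, the stale index has expired
```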

Something similar can happen even without a CDN, because download.packages()
caches the contents of PACKAGES. However, that can be worked around by
telling download.packages() to not use the cache, or by simply
restarting R.

One reason that package installations fail in these cases is that the
current version of a package is in one directory, and the old
(archived) versions of a package are in another directory. If current
and old versions were in the same directory, then package installation
would not fail.
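
A toy Python comparison of the two layouts, under the same simplification as before (a directory is a dict of file names, and a missing key plays the role of a 404). The names fetch, split_layout and shared_layout are illustrative only.

```python
def fetch(index_entry, directory):
    """Return the file a (possibly stale) index entry points to, or None for a 404."""
    return directory.get(index_entry)


stale_index_entry = "PkgA_1.0.tgz"   # what a cached PACKAGES still says

# Current layout: after the sync, the directory that PACKAGES describes
# holds only the new version; 1.0 has moved elsewhere.
split_layout = {"PkgA_2.0.tgz": "binary 2.0"}

# Proposed layout: old and current versions side by side.
shared_layout = {"PkgA_1.0.tgz": "binary 1.0", "PkgA_2.0.tgz": "binary 2.0"}

print(fetch(stale_index_entry, split_layout))    # nothing to serve
print(fetch(stale_index_entry, shared_layout))   # the old version is still there
```

In the shared layout a user behind a stale index installs version 1.0 instead of getting an error, and picks up 2.0 once the cached index expires.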


-Winston




__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] CRAN indices out of whack (for at least macOS)

2018-01-31 Thread Simon Urbanek
Dirk,

yes, thanks, the edge server that serves the Mac binaries to CRAN has run out
of disk space (due to the size of CRAN itself), so the sync was incomplete.
It is fixed now -- you can check by using the macOS master server as a mirror:
https://r.research.att.com/ -- and the fix will propagate through the other
mirrors as usual.

Thanks,
Simon






Re: [Rd] CRAN indices out of whack (for at least macOS)

2018-01-31 Thread Dirk Eddelbuettel

Bumping this as we now have two more issue tickets filed and a fresh SO
question.

Is anybody looking at this? Simon?

Dirk


-- 
http://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org



[Rd] CRAN indices out of whack (for at least macOS)

2018-01-30 Thread Dirk Eddelbuettel

I have received three distinct (non-)bug reports where someone claimed a
recent package of mine was broken ... simply because the macOS binary was not
there.

Is there something wrong with the cronjob providing the indices? Why is it
pointing people to binaries that do not exist?

Concretely, file

  https://cloud.r-project.org/bin/macosx/el-capitan/contrib/3.4/PACKAGES

contains

  Package: digest
  Version: 0.6.15
  Title: Create Compact Hash Digests of R Objects
  Depends: R (>= 2.4.1)
  Suggests: knitr, rmarkdown
  Built: R 3.4.3; x86_64-apple-darwin15.6.0; 2018-01-29 05:21:06 UTC; unix
  Archs: digest.so.dSYM

yet the _same directory_ only has:

  digest_0.6.14.tgz 15-Jan-2018 21:36   157K

I presume this is a temporary accident.
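
A mismatch like this can be detected mechanically. The following Python sketch is one way to do it, not an existing CRAN tool: it reads the "Field: value" records of a PACKAGES file, derives the expected <Package>_<Version>.tgz names, and reports those absent from a directory listing. The function names are made up, and continuation lines in multi-line fields are not handled.

```python
def parse_packages(text):
    """Parse 'Field: value' records separated by blank lines into dicts."""
    records = []
    for block in text.strip().split("\n\n"):
        record = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")   # split on the first colon only
            if sep:
                record[key.strip()] = value.strip()
        if record:
            records.append(record)
    return records


def missing_binaries(index_text, listing, ext=".tgz"):
    """Return the file names the index promises but the listing lacks."""
    present = set(listing)
    expected = [
        f"{r['Package']}_{r['Version']}{ext}" for r in parse_packages(index_text)
    ]
    return [name for name in expected if name not in present]


index_text = """\
Package: digest
Version: 0.6.15
Depends: R (>= 2.4.1)
"""

print(missing_binaries(index_text, ["digest_0.6.14.tgz"]))
```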

We are all spoiled by you all providing such a wonderfully robust and
well-oiled service---so again big THANKS for that--but today something is out
of order.

Dirk

-- 
http://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org
