Re: [Rd] A demonstrated shortcoming of the R package management system

2023-08-08 Thread Dirk Eddelbuettel


Hi Hadley,

On 8 August 2023 at 08:34, Hadley Wickham wrote:
| Do you think it's worth also/instead considering a fix to S4 to avoid
| this caching issue in future R versions?

That is somewhat orthogonal to my point of "'some uses' of the 20-year-old S4
system (which as we know is fairly widely used 'out there') break
deployments" and the related "this is also a PITA for binary distributors".

The existing body of code seems to need some help.

| (This is top of mind for me as we consider the design of S7, and I
| recently made a note to ensure we avoid similar problems there:
| https://github.com/RConsortium/OOP-WG/issues/317)

I haven't followed the S7 repo closely but peek every couple of months. It
seems sensible to avoid repeating shortcomings identified elsewhere.

Best,  Dirk

-- 
dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] A demonstrated shortcoming of the R package management system

2023-08-08 Thread Hadley Wickham
Hi Dirk,

Do you think it's worth also/instead considering a fix to S4 to avoid
this caching issue in future R versions?

(This is top of mind for me as we consider the design of S7, and I
recently made a note to ensure we avoid similar problems there:
https://github.com/RConsortium/OOP-WG/issues/317)

Hadley

-- 
http://hadley.nz



Re: [Rd] A demonstrated shortcoming of the R package management system

2023-08-07 Thread Dirk Eddelbuettel


Hi Ivan,

I usually 'mentally applaud' when reading your replies on list but not here.

On 7 August 2023 at 16:15, Ivan Krylov wrote:
| SeuratObject 4.1.3. The breakage definitely exists, but not on the
| source package level.

You seem to overlook that a large part of the R Universe only works off
source distributions and, as one cannot run source, builds binaries off them.

The breakage is real.

Also, implying that the packages in question do not run reverse-dependency
checks is not helpful. Of course they do.

[ And as an illustrative aside, part of the 'third problem' I mentioned in my
initial email concerning Debian is that eg (some) Debian maintainers insist
on autopkgtests (a good idea in theory) and get terribly hung up when they
manage to mismatch package relations (ie 'skew' from CRAN). While that is in
part self-imposed, some troubles are real and eg Matrix 1.6.0 only got to
'testing' now after release manager intervention. (And no Mikael, that was
not Seurat related but thanks for the tip in your email, already passed it
on.)  Of course we have other issues too there with eg exotic non-CRAN
platforms (now including i386) breaking. But that is outside of this thread.
Thanks for bearing with me. ]

Dirk

-- 
dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org



Re: [Rd] A demonstrated shortcoming of the R package management system

2023-08-07 Thread Ivan Krylov
On Sun, 6 Aug 2023 16:05:03 -0500
Dirk Eddelbuettel wrote:

> One possibility may be to add a new (versioned) field 'Breaks:'.
> Matrix could then have added 'Breaks: SeuratObject (<= 4.1.3)'
> preventing an installation of Matrix 1.6.0 when SeuratObject 4.1.3
> (or earlier) is present, but permitting an update to Matrix 1.6.0
> alongside a new version, say, 4.1.4 of SeuratObject which could
> itself have a versioned Depends: Matrix (>= 1.6.0).

I wouldn't entirely agree that Matrix 1.6.0 breaks SeuratObject 4.1.3,
given that it's still possible to install first Matrix 1.6.0 and then
SeuratObject 4.1.3. The breakage definitely exists, but not on the
source package level.

It may also not be easy for the package developer to notice breaking a
binary package while performing reverse dependency checks, in time to
add such a notice to their package. The recommended way to do that is
tools::check_packages_in_dir(), which works on source packages.

Would it help to reframe the problem in terms of binary packages
acquiring dependency constraints that are more strict than those of the
corresponding source packages? If a package imports S4 classes
from another package and thus ends up caching their definitions, R
could compute a hash of the classes being imported, store it together
with the installed package and complain noisily if the hash doesn't
match later at load time. This could be used to detect such problems
automatically (but could also result in false positives!).
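
To make the hashing idea a little more concrete, here is a minimal base-R
sketch. The helper name class_fingerprint() is hypothetical, the choice of
what goes into the hash (slots, superclass names, validity function) is an
assumption, and redefining a toy class within one session only simulates an
upstream update. It is not something R does today.

```r
library(methods)

# Hypothetical helper: reduce an S4 class definition to an MD5 fingerprint.
class_fingerprint <- function(cls) {
  def <- getClassDef(cls)
  if (is.null(def)) stop("class not found: ", cls)
  validity <- def@validity
  parts <- list(
    slots    = getSlots(def),          # slot names and their classes
    contains = names(def@contains),    # names of the superclasses
    validity = if (is.function(validity)) deparse(validity) else NULL
  )
  tmp <- tempfile()
  on.exit(unlink(tmp))
  writeBin(serialize(parts, NULL), tmp)
  unname(tools::md5sum(tmp))
}

# "Install time": the upstream class as the importing package first saw it.
setClass("DemoSparse", representation(x = "numeric"))
fp_install <- class_fingerprint("DemoSparse")

# "Load time": a simulated upstream update adds a validity method.
setClass("DemoSparse", representation(x = "numeric"),
         validity = function(object) TRUE)
fp_load <- class_fingerprint("DemoSparse")

if (!identical(fp_install, fp_load))
  message("S4 class 'DemoSparse' changed since installation; reinstall advised")
```

A real implementation would have to decide which parts of a class definition
matter, which is where the false positives mentioned above would come from.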

This is not the only way a binary package could accidentally depend on
internals of another binary package. I remember reading about (but
cannot find it now!) some packages "importing" a function from ggplot2
(I think) by assigning it into their namespace:

 foo <- ggplot2::useful_function

This worked for quite a while, but later broke because
ggplot2::useful_function called an internal function which ceased to
exist in a new version of ggplot2. This is arguably a bug and probably
even harder to track, but are there any other ways to catch a "binary
dependency" for a package?
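
The failure mode can be imitated without ggplot2. In the sketch below an
environment stands in for the upstream namespace; the names upstream,
internal_helper and foo_safe are made up for illustration, and
useful_function is the placeholder name from above. This is an assumption
about the mechanism, not a reproduction of the actual incident.

```r
# An environment plays the role of the upstream package namespace.
upstream <- new.env()
local({
  internal_helper <- function(x) x * 2
  useful_function <- function(x) internal_helper(x) + 1
}, envir = upstream)

# The downstream package copies the function object once, at build time.
foo <- upstream$useful_function
before <- foo(1)

# Simulated upstream update: the internal helper disappears and the
# exported function is rewritten so that it no longer needs it.
rm("internal_helper", envir = upstream)
upstream$useful_function <- function(x) x * 2 + 1

# The stale copy still carries the old body and now fails ...
after <- tryCatch(foo(1), error = function(e) conditionMessage(e))

# ... while a wrapper that looks the function up at call time keeps working.
foo_safe <- function(...) upstream$useful_function(...)
foo_safe(1)
```

Resolving the function at call time, as foo_safe() does, avoids freezing the
old body into the installed package.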

-- 
Best regards,
Ivan



Re: [Rd] A demonstrated shortcoming of the R package management system

2023-08-06 Thread Ben Bolker
  I would support this suggestion.  There is a similar binary 
dependency chain from Matrix → TMB → glmmTMB; we have implemented 
various checks to make users aware that they need to reinstall from 
source, and to some extent we've tried to push out synchronous updates 
(i.e., push an update of TMB to CRAN every time Matrix changes, and an 
update of glmmTMB after that), but centralized machinery for this would 
certainly be nice.


  FWIW some of the machinery is here: 
https://github.com/glmmTMB/glmmTMB/blob/d9ee7b043281341429381faa19b5e53cb5a378c3/glmmTMB/R/utils.R#L209-L295 
-- it relies on a Makefile rule that caches the current installed 
version of TMB: 
https://github.com/glmmTMB/glmmTMB/blob/d9ee7b043281341429381faa19b5e53cb5a378c3/glmmTMB/R/utils.R#L209-L295
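
As a rough illustration of this kind of check (not the glmmTMB code itself),
a load-time comparison could look like the sketch below. The function name
check_built_against() and the idea of passing the recorded version in as a
string are assumptions made for the example.

```r
# Hypothetical load-time check: 'built_version' stands for a version string
# that the build step would have recorded for the dependency.
check_built_against <- function(dep, built_version,
                                current = utils::packageVersion(dep)) {
  built <- package_version(built_version)
  if (built != current) {
    warning(sprintf(
      "built against %s %s but %s is installed; please reinstall from source",
      dep, as.character(built), as.character(current)), call. = FALSE)
    return(FALSE)
  }
  TRUE
}

# Using 'methods' only because it is always available; a matching version
# passes silently.
ok <- check_built_against("methods",
                          as.character(utils::packageVersion("methods")))
```

A package could call such a function from .onLoad() so that users see the
warning as soon as the versions drift apart.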



  cheers
   Ben Bolker




[Rd] A demonstrated shortcoming of the R package management system

2023-08-06 Thread Dirk Eddelbuettel


CRAN, by relying on the powerful package management system that is part of R,
provides an unparalleled framework for extending R with nearly 20k packages.

We recently encountered an issue that highlights a missing element in the
otherwise outstanding package management system. So we would like to start a
discussion about enhancing its feature set. As shown below, a mechanism to
force reinstallation of packages may be needed.

A demo is included below; it is reproducible in a container. We find the
easiest/fastest reproduction is to save the code snippet below in the
current directory as eg 'matrixIssue.R' and run it in a container as

   docker run --rm -ti -v `pwd`:/mnt rocker/r2u Rscript /mnt/matrixIssue.R
  
This runs in under two minutes. It first installs the older Matrix, next
installs SeuratObject, and then removes the older Matrix, making the
(already installed) current Matrix version the default. This simulates a
package update for Matrix which, as the final snippet demonstrates, silently
breaks SeuratObject because the cached S4 method Csparse_validate is now
missing. So SeuratObject, once installed under Matrix 1.5.1, becomes unusable
under Matrix 1.6.0.

What this shows is that a call to update.packages() will silently corrupt an
existing installation.  We understand that this was known and addressed at
CRAN by rebuilding all binary packages (for macOS and Windows).

But it leaves both users relying on source installation as well as
distributors of source packages in a dire situation. It hurt me three times:
my default R installation was affected with unit tests (involving
SeuratObject) silently failing. It similarly broke our CI setup at work.  And
it created a fairly bad headache for the Debian packaging I am involved with
(and I surmise it affects other distros similarly).

It would be good to have a mechanism where a package, when being upgraded,
could flag that 'more actions are required' by the system (administrator).
We think this example demonstrates that we need such a mechanism to avoid
(silently !!) breaking existing installations, possibly by forcing
reinstallation of other packages.  R knows the package dependency graph and
could trigger this, possibly after an 'opt-in' variable the user / admin
sets.
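
As a sketch of how the dependency graph could drive this, the reverse
dependencies of an upgraded package can be listed with
tools::package_dependencies(). The helper name reinstall_candidates() is
hypothetical, and the small 'db' matrix is toy metadata for illustration,
not the real DESCRIPTION contents of these packages.

```r
# Hypothetical helper: installed packages that declare a dependency on 'pkg'
# and hence are candidates for reinstallation after 'pkg' is upgraded.
reinstall_candidates <- function(pkg, db = utils::installed.packages(),
                                 recursive = FALSE) {
  deps <- tools::package_dependencies(
    pkg, db = db,
    which = c("Depends", "Imports", "LinkingTo"),
    recursive = recursive, reverse = TRUE
  )
  sort(unique(as.character(unlist(deps, use.names = FALSE))))
}

# Toy package database (illustrative values only).
db <- matrix(
  c("Matrix",       "R (>= 3.5.0), methods", NA,                   NA,
    "SeuratObject", "R (>= 4.0.0)",          "Matrix (>= 1.5.0)",  NA,
    "TMB",          NA,                      "Matrix",             "Matrix",
    "glmmTMB",      NA,                      "TMB",                "TMB"),
  ncol = 4, byrow = TRUE,
  dimnames = list(NULL, c("Package", "Depends", "Imports", "LinkingTo"))
)

direct <- reinstall_candidates("Matrix", db)
chain  <- reinstall_candidates("Matrix", db, recursive = TRUE)
```

With the default 'db', the result could be passed to install.packages() with
type = "source" once the user or admin has opted in.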

One possibility may be to add a new (versioned) field 'Breaks:'. Matrix could
then have added 'Breaks: SeuratObject (<= 4.1.3)' preventing an installation
of Matrix 1.6.0 when SeuratObject 4.1.3 (or earlier) is present, but
permitting an update to Matrix 1.6.0 alongside a new version, say, 4.1.4 of
SeuratObject which could itself have a versioned Depends: Matrix (>= 1.6.0).
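
To illustrate how such a field might be evaluated, here is a minimal sketch.
The 'Breaks:' field does not exist in R today, and the function names
parse_breaks() and blocked_by() as well as the accepted syntax are
assumptions modelled on the existing versioned Depends: notation.

```r
# Parse a hypothetical field such as "SeuratObject (<= 4.1.3), Foo (< 2.0)".
parse_breaks <- function(field) {
  entries <- trimws(strsplit(field, ",", fixed = TRUE)[[1]])
  pattern <- paste0("^([[:alnum:].]+)[[:space:]]*\\([[:space:]]*",
                    "(<=|<|>=|>|==)[[:space:]]*([0-9.-]+)[[:space:]]*\\)$")
  m <- regmatches(entries, regexec(pattern, entries))
  do.call(rbind, lapply(m, function(x) {
    if (length(x) != 4L) stop("cannot parse 'Breaks' entry")
    data.frame(package = x[2], op = x[3], version = x[4],
               stringsAsFactors = FALSE)
  }))
}

# Return the installed packages that would block the upgrade.
# 'installed' is a named character vector: package name -> version string.
blocked_by <- function(breaks_field, installed) {
  spec <- parse_breaks(breaks_field)
  hit <- vapply(seq_len(nrow(spec)), function(i) {
    pkg <- spec$package[i]
    if (!pkg %in% names(installed)) return(FALSE)
    cmp <- utils::compareVersion(installed[[pkg]], spec$version[i])
    switch(spec$op[i],
           "<=" = cmp <= 0, "<" = cmp < 0,
           ">=" = cmp >= 0, ">" = cmp > 0, "==" = cmp == 0)
  }, logical(1))
  spec$package[hit]
}

old <- blocked_by("SeuratObject (<= 4.1.3)",
                  c(Matrix = "1.5.1", SeuratObject = "4.1.3"))
new <- blocked_by("SeuratObject (<= 4.1.3)",
                  c(Matrix = "1.5.1", SeuratObject = "4.1.4"))
```

An installer could refuse the upgrade, or queue the blocking packages for
reinstallation, whenever blocked_by() returns a non-empty result.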

Regards,  Dirk


## Code example follows. Recommended to run the rocker/r2u container.
## Could also run 'apt update -qq; apt upgrade -y' but not required
## Thanks to my colleague Paul Hoffman for the core of this example

## now have Matrix 1.6.0 because r2u and CRAN remain current but we can
## install an older Matrix
remotes::install_version('Matrix', '1.5.1')

## we can confirm that we have Matrix 1.5.1
packageVersion("Matrix")

## we now install SeuratObject from source and to speed things up we first
## install the binary
install.packages("SeuratObject")   # in this container via bspm/r2u as binary
## and then force a source installation (turning bspm off) _while Matrix is
## at 1.5.1_
if (requireNamespace("bspm", quietly=TRUE)) bspm::disable()
Sys.setenv(PKG_CXXFLAGS='-Wno-ignored-attributes')  # Eigen compilation noise silencer
install.packages('SeuratObject')

## we now remove the Matrix package version 1.5.1 we installed into
## /usr/local leaving 1.6.0
remove.packages("Matrix")
packageVersion("Matrix")

## and we now run a bit of SeuratObject code that is now broken as
## Csparse_validate is gone
suppressMessages(library(SeuratObject))
data('pbmc_small')
graph <- pbmc_small[['RNA_snn']]
class(graph)
getClass('Graph')
show(graph) # this fails


-- 
dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org
