Re: R and R modules (and a Ruby twist)

2015-09-25 Thread Ricardo Wurmus

Pjotr Prins writes:

> On Thu, Sep 24, 2015 at 11:40:57AM +0200, Ricardo Wurmus wrote:
>> Maybe it would be best to append the R version to the site-library
>> directory.  I don’t think we should go further than that and bring in
>> the Guix hash, because I’m willing to trust that packages built with
>> version 3.2.2 are compatible with R 3.2.2, even if the inputs to our R
>> package changed and thus the hash is different.
>
> The exception I can think of is when R provides compile time switches
> for blas or ssl (for example). We don't do that now (Nix does!), but
> if you had two R's with the same version number, it could just be that
> a module 'lifts' that dependency and strictly works with one R (and
> not the other).

Isn’t that expected, though?  That’s a property of the particular
variant of R in use, then, not a problem with the package.

> It is the same for Ruby, Perl, Python, Apache, Firefox, etc. Anything
> that allows for building 'site' modules.

I don’t disagree in general.  There may be cases where the variant of
the build-time dependency must be identical to that used at runtime.
But I don’t think this is true for more than a few special packages.
Take R as an example.  Most packages are written in pure R, and thus
only depend on features provided by R.  What features are provided by
the language depends only on the version, not on configure flags.

If a user builds a variant of R that lacks Cairo, for example, then
certain packages won’t work as intended.  But does this mean that we
need to disallow installing packages that would have reduced feature
sets for a mutilated version of R in that case?

> I know this is mostly theoretical at this stage, but why not encourage
> strict isolation of interpreter+modules? That is the only way we'll
> guarantee independence between graphs. Nix/Guix does such a great job
> there, and now we allow interpreters to 'leak' their environments,
> just because of their convention and our trust in things that ought to
> work. And all it costs us is a partial SHA added to the path. So for
> Ruby it would be
>
>   ~/.guix-profile/lib/ruby/2.2.0-edb92950/
>
> instead of
>
>   ~/.guix-profile/lib/ruby/2.2.0/
>
> Personally I can live with the status quo, but somehow I prefer the
> exact isolation. Maybe it will come when someone gets hurt.

For R, Perl, Ruby and Python we are often forced to propagate inputs, so
that they end up in the profile and can be loaded by looking up the path
to the union in some environment variable, such as R_LIBS_SITE,
GEM_HOME, or PYTHONPATH.  These environment variables do not make a
distinction between versions or variants.  (Only Perl allows for a
distinction between major versions by having the major version number as
part of its environment variable: PERL5LIB.)

How would a *user* make sure to use different sets of packages with
different variants of languages?  At the moment, the only way is to
manually set the environment variable to point to the desired path.
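To illustrate why a single search-path variable cannot tell variants
apart, here is a minimal sketch in Ruby of a first-match lookup over a
colon-separated list of directories.  The helper names “first_match” and
“demo_lookup” and the directory names are made up for illustration; the
real loaders do more than this, but as far as I know they also walk the
list in order.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical helper: return the first directory in a colon-separated
# search path that contains an entry called NAME, or nil if none does.
def first_match(search_path, name)
  search_path.split(File::PATH_SEPARATOR).find do |dir|
    File.exist?(File.join(dir, name))
  end
end

# Build two throwaway "libraries" that both contain a package "qtl" and
# report which of them the lookup picks.
def demo_lookup
  Dir.mktmpdir do |root|
    lib_a = File.join(root, "site-library-a")
    lib_b = File.join(root, "site-library-b")
    [lib_a, lib_b].each { |lib| FileUtils.mkdir_p(File.join(lib, "qtl")) }

    path = [lib_a, lib_b].join(File::PATH_SEPARATOR)
    # Whichever directory comes first wins, no matter which interpreter
    # variant is doing the lookup.
    File.basename(first_match(path, "qtl"))
  end
end
```

Nothing in the lookup knows which interpreter is asking, which is why
the only knob a user has is the order and content of the variable.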

With propagated inputs we cannot achieve as much isolation as we would
like to.  There might be a way to patch the mechanisms that these
languages use to load additional libraries/packages, such that they load
dependencies by full path rather than by simple name, similar to how we
patch ‘dlopen()’ calls in C programmes.

Only if we can avoid using these inflexible environment variables can we
achieve the kind of isolation you try to get by adding a partial hash to
the output directory.

Just a data point: last time I checked, Ruby’s “require” method accepts
a full path instead of a bare feature name.  Might there be a way to
forgo propagating inputs by patching all “require $string” statements
in Ruby sources in a build phase, much like we automatically patch
shebangs?
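A rough sketch of what such a rewrite could look like, written in Ruby
for convenience.  The helper “patch_requires” and the LOCATIONS mapping
are hypothetical; this is not an existing build phase.  It only handles
the plain “require "name"” form at the start of a line and leaves
“require_relative”, parenthesised calls, and computed names alone.

```ruby
# Hypothetical build-phase helper: rewrite `require "name"` lines so that
# they load NAME from an absolute path.  LOCATIONS maps a feature name to
# the file that should be loaded; names without an entry are left as-is.
def patch_requires(source, locations)
  source.gsub(/^([ \t]*)require\s+(["'])([^"']+)\2/) do
    indent, quote, name = $1, $2, $3
    target = locations[name]
    # Keep the original text when we do not know where NAME lives.
    target ? "#{indent}require #{quote}#{target}#{quote}" : $&
  end
end

# Usage with a placeholder path (not a real store item):
#   patch_requires("require 'foo'\n",
#                  { "foo" => "/gnu/store/<hash>-ruby-foo/lib/foo.rb" })
```

A real phase would also have to decide how to build the mapping from a
package’s inputs, which is the harder part.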

To note: this would make it impossible for users to override
libraries/modules by adding an alternative directory containing a
modified version of a module to the list of search paths in the
consulted environment variable.  That’s akin to disabling the
LD_LIBRARY_PATH feature in C programmes.

~~ Ricardo



Re: R and R modules (and a Ruby twist)

2015-09-24 Thread Ricardo Wurmus

Pjotr Prins writes:

> When we add an R module, such as R-qtl, the R-build-system does not
> provide R itself as a propagated input, i.e., the R interpreter is not
> in the profile. In the R world this is kinda odd.  Almost all modules
> are used from within R, i.e. you start up R and
>
>   library(qtl)
>   do something with R/qtl
>
> Most people would use that module in interactive mode. In the
> current package install, R is not included as a symlink and needs to be
> installed separately.

Correct.  I didn’t think of it as a problem as I assumed people would
have R installed in their profile if they wanted to interactively use an
R package.  But now that you mention it, I think it might lead to
problems (see below).

> It is one other thing I am trying to think through. With a standard R
> distribution, every package is strictly aligned with the interpreter
> (they get installed from inside R).
>
> With Guix's rolling model of package updates, modules may go out of sync
> - even if they are correctly linked with an underlying R. So mixing
> interpreters and modules/packages may potentially give problems.

Users can have any number of “libraries” (directories containing
installed R packages) in R_LIBS_SITE.  Currently, our R package suggests
setting R_LIBS_SITE to “$profile/site-library” and the r-build-system
installs packages to “$out/site-library”.

We could add a level for the R version, e.g. “$out/site-library/3.2.2/”,
but it should be noted that R_LIBS_SITE makes no distinction for
different versions of R.  It’s just a single list of directories.  I
don’t know what would happen if you had

  R_LIBS_SITE=$HOME/site-library/3.2.2:$HOME/site-library/3.1.3

and then ran one or the other version of R.  (Note that currently there
can only be one version of R in a single profile anyway.)

I guess the problem is with updates.  If you had R 3.1.3 in your profile
and installed a new R package that is then built with the latest version
of R (3.2.2), this might lead to problems actually using the package in
an R session using version 3.1.3.

Maybe it would be best to append the R version to the site-library
directory.  I don’t think we should go further than that and bring in
the Guix hash, because I’m willing to trust that packages built with
version 3.2.2 are compatible with R 3.2.2, even if the inputs to our R
package changed and thus the hash is different.
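As a sketch of the versioned layout, here are two small helpers in
Ruby.  Both names are hypothetical and the layout is only the proposal
from above, not what the r-build-system does today.  The second helper
shows the kind of filtering that R_LIBS_SITE itself does not perform.

```ruby
# Hypothetical helpers; the directory layout is only a proposal.

# Directory that a package built for R_VERSION would be installed to.
def versioned_site_library(out, r_version)
  File.join(out, "site-library", r_version)
end

# Keep only those R_LIBS_SITE entries whose last path component matches
# the version of the running interpreter.
def libs_for_version(r_libs_site, r_version)
  r_libs_site.split(":").select { |dir| File.basename(dir) == r_version }
end
```

With the example value above, selecting for “3.1.3” would leave only
the second directory, so an R 3.1.3 session would never see packages
built against 3.2.2.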

~~ Ricardo



Re: R and R modules (and a Ruby twist)

2015-09-24 Thread Pjotr Prins

On Thu, Sep 24, 2015 at 11:40:57AM +0200, Ricardo Wurmus wrote:
> Maybe it would be best to append the R version to the site-library
> directory.  I don’t think we should go further than that and bring in
> the Guix hash, because I’m willing to trust that packages built with
> version 3.2.2 are compatible with R 3.2.2, even if the inputs to our R
> package changed and thus the hash is different.

The exception I can think of is when R provides compile time switches
for blas or ssl (for example). We don't do that now (Nix does!), but
if you had two R's with the same version number, it could just be that
a module 'lifts' that dependency and strictly works with one R (and
not the other).

It is the same for Ruby, Perl, Python, Apache, Firefox, etc. Anything
that allows for building 'site' modules.

I know this is mostly theoretical at this stage, but why not encourage
strict isolation of interpreter+modules? That is the only way we'll
guarantee independence between graphs. Nix/Guix does such a great job
there, and now we allow interpreters to 'leak' their environments,
just because of their convention and our trust in things that ought to
work. And all it costs us is a partial SHA added to the path. So for
Ruby it would be

  ~/.guix-profile/lib/ruby/2.2.0-edb92950/

instead of

  ~/.guix-profile/lib/ruby/2.2.0/

Personally I can live with the status quo, but somehow I prefer the
exact isolation. Maybe it will come when someone gets hurt.
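To make the idea concrete, a minimal sketch in Ruby of how such a
directory name could be derived.  The helper “isolated_lib_dir” and its
BUILD_DESCRIPTION argument are assumptions for illustration only; the
suffix here is simply the first eight hex digits of a SHA-256 digest,
which is not how Guix computes its store hashes.

```ruby
require "digest/sha2"

# Hypothetical helper: derive a versioned library directory with a short
# suffix computed from a string describing the interpreter build.
# BUILD_DESCRIPTION is an illustrative stand-in, not a Guix store hash.
def isolated_lib_dir(profile, version, build_description)
  suffix = Digest::SHA256.hexdigest(build_description)[0, 8]
  File.join(profile, "lib", "ruby", "#{version}-#{suffix}")
end
```

Two interpreters with the same version but different build descriptions
would then get different module directories.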

Pj.