On Wed, Feb 8, 2023 at 12:22 PM Iñaki Ucar <iu...@fedoraproject.org> wrote:
>
> On Wed, 8 Feb 2023 at 19:59, Henrik Bengtsson
> <henrik.bengts...@gmail.com> wrote:
> >
> > I just want to add a few reasons that I know of for why users are
> > still on Red Hat/CentOS 7 and learned from being deeply involved with
> > big academic and research high-performance compute (HPC) environments.
> > These systems are not like your regular sailing boat, but more like a
> > giant container ship; much harder to navigate, slower to react, you
> > cannot just cruise around and pop into any harbor you like, or whenever
> > you like. It takes much more effort, more logistics, and more
> > people to operate them. If you mess up, the damage is much bigger.
>
> I'm fully aware of, and I understand, all the technical and
> organizational reasons why there are CentOS 7 systems out there. I
> only challenge a single point (cherry-picked from your list):
>
> > * The majority of users and sysadmins prefer stability over being
> > able to run the latest tools.
>
> This is simply not true. In general, sysadmins do prefer stability,
> but users want the latest tools (otherwise, this very thread would not
> exist, QED). And the first thing is hardly compatible with the second
> one. That is, without containers, which brings us to the next point.

We might operate in different environments, but there are lots of labs
that keep the exact same pipeline for years (5-10 years), because "it
works", and because if they change anything, they might have to
re-analyze all their old data to avoid batch effects that stem purely
from different versions of the algorithms. I can agree with this
strategy too, especially if your data are huge and staging them back
onto the compute environment from cold storage is a huge task in
itself.  Then there are other reasons, such as users being less savvy,
or having bad memories from the last time they tried this (e.g. years
ago), when everything broke and it took them weeks or months to sort
out.  I'm not trying to make fun of anyone here - it's just that on
big clusters with many users, the skill-level spectrum is very wide.

>
> > * Although you might want to tell everyone to just run a new version
> > via Linux containers, it's not the magic sauce for all of the above.
> > Savvy users might be able to do it, but not your average users. Also,
> > this basically puts the common sysadmin burden on the end-users, who
> > now have to keep their container stacks up-to-date and in sync.  In
> > contrast to a homogeneous environment, this strategy increases the
> > support burden on sysadmins, because they will get many more questions
> > and requests for troubleshooting on very specific setups.
>
> How is that so? Let's say a user wants the latest version of R.
> Nothing prevents a sysadmin from setting up a script called "R" in the PATH
> that runs e.g. the r2u container [1] with the proper mounts. And
> that's it: the user runs "R" and receives the latest version (and even
> package installations seem to be blazing fast now!) without even
> knowing that it's running inside a container.
>
> I know, you are thinking "security", "permissions"...

I'm actually thinking maintenance and support. When you bring in Linux
containers, you basically introduce a bunch of new compute
environments in addition to your host system. So, instead of the
support team (often the same people as the sysadmins) having to
understand and answer questions for a homogeneous environment, they
now have to be up-to-date with different versions of CentOS/Rocky,
Ubuntu, Debian, ... and different container images. In R we often have
a hard time even getting users to report their sessionInfo() - now
imagine getting their container details.  If admins start providing
one-off container images, that becomes an added maintenance load. But,
I agree, Linux containers are great and make it possible for a lot of
users to run analyses that they otherwise would not be able to run on
the host system.
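
For concreteness, the kind of "R" wrapper Iñaki describes could be a
tiny shell script placed early in the PATH, along the lines of the
sketch below (the image name, tag, and bind mounts here are just
assumptions and would need tailoring per site):

  #!/bin/sh
  # Sketch of a container-backed "R" wrapper; adjust image and mounts locally.
  exec podman run --rm -ti \
    -v "$HOME:$HOME" -w "$PWD" \
    docker.io/eddelbuettel/r2u:22.04 R "$@"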

>
> $ yum install podman
>
> Drop-in replacement for docker, but rootless, daemonless. Also there's
> a similar thing called Apptainer [2], formerly Singularity, that was
> specifically designed with HPC in mind, now part of the Linux
> Foundation.
>
> [1] https://github.com/eddelbuettel/r2u
> [2] https://apptainer.org/

Yes, Singularity/Apptainer is awesome, especially since Docker is
mostly considered a no-no in HPC environments. The minimal, or even
zero, use of SUID these days is great. That it runs as a regular
process, as the user themselves, with good default bind mounts is also
neat.  These things get even better with newer Linux kernels, which,
by the way, is another motivation for upgrading the OS.
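
For example, a user can run R straight from a registry image, as
themselves, without any daemon or root involvement - just a sketch,
and the image/tag below are an assumption, not a recommendation:

  $ apptainer exec docker://rocker/r-ver:4.2.2 R --version
  $ apptainer exec docker://rocker/r-ver:4.2.2 Rscript -e 'sessionInfo()'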

That said, with Apptainer and the like, the user might run into
conflicts here, similar to what we see when users install software via
Conda, which often creates a parallel software stack alongside that of
the host system.  Taking R as an example, when a user installs
packages, they end up in R_LIBS_USER=~/R/%p-library/%v (*).  This is
the same directory regardless of whether R runs on the host system, in
a Linux container, or in Linux containers based on different OSes.
So, if they end up running a little bit here and there, which is not
unreasonable to expect if they work on different projects, then there
will be a mishmash of R package binaries that are not compatible with
each other.  This happens a ton when people use Conda.  Of course, a
savvy user will at some point figure this out and configure their
R_LIBS_USER to be specific to the environment they run in, but the
majority won't notice until it's too late.  And, boom, now you're
adding lots of load on the support team, and troubleshooting and
undoing these conflicts wastes a lot of effort.  In the worst case,
the user does not reach out for help, but instead struggles in silence
and keeps working with something half broken.  From my experience at
UCSF (~2,000 users on two big clusters), this is unfortunately not
that uncommon.

(*) My wish would be for R to also include the OS name and version in
the default R_LIBS_USER, something like
R_LIBS_USER=~/R/%O-%p-library/%v, where %O would be a new conversion
specification that expands to, say, "centos-7" or "ubuntu-22.04".
That would mitigate lots of these issues automatically.
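
Until something like that exists, a sysadmin or a savvy user can
approximate it today by deriving the prefix themselves, e.g. in a
shell profile (just a sketch; the exact naming is up to taste, and %p
and %v are still expanded by R itself):

  # e.g. in /etc/profile.d/r-libs.sh or the user's ~/.bashrc
  if [ -r /etc/os-release ]; then
    . /etc/os-release
    export R_LIBS_USER="~/R/${ID}-${VERSION_ID}-%p-library/%v"
  fi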

Thanks for the feedback and questions,

Henrik

> > What R Core, Gabor, and many others are doing, often silently in the
> > background, is to make sure R works smoothly for many R users out
> > there, whatever operating system and version they may be on. This is
> > so essential to R's success, and, more importantly, for research and
> > science to be able to move forward.
>
> +1000
>
> --
> Iñaki Úcar
