"Jonathan Dowland" <j...@debian.org> writes:

> Else-thread, Russ begs people to stop doing this. I agree people
> shouldn't! We should also work on education and promotion of the
> alternatives.

Also, helping people use better tools for managing workloads like this
that make their lives easier and have better semantics, thus improving
life for everyone.

I'm suggesting solutions that I don't have time to help implement, and of
course it will take a long time for better tools to filter into all those
clusters, so this doesn't address the immediate problem of this thread
(hence the subject change).  But based on my past experience with these
types of systems, I bet a lot of the patterns captured in software are
older ones.  Linux has a *lot* of facilities today that it didn't have, or
at least weren't widely used, five years ago.  It would be great to help
some of those improvements filter down, because they can make a lot of
these problems go away.

For example, take the case of scratch space for batch computing.  The
logical lifespan for temporary files for a batch computing job is the
lifetime of the job, whatever that may be.  (I know there are exceptions,
but here I'm just talking about defaults.)  Previously one would have to
build support into the batch job management system for creating and
managing those per-job temporary directories, and ensure the jobs support
TMPDIR or other environment variables to control where they store data,
and everyone was doing this independently.  (I've done a *lot* of this
kind of thing, once upon a time.)

But now we have mount namespaces, and systemd has PrivateTmp that builds
on top of that.  So if the job is managed by an execution manager, it can
create per-job temporary directories and it may already support (as
systemd does) the semantics of deleting the contents of those directories
on job exit, and it bind-mounts those into the process space and the
process is none the wiser.  I think all of the desirable glue may not
fully be there (controlling what underlying file system is used for
PrivateTmp, ensuring they're also excluded from normal cleanup, etc.), but
this is very close to a much better way of handling this problem that
still exposes /tmp and /var/tmp to the job so that none of the
often-crufty scientific computing software has to change.

The new capabilities that Linux now has due to namespaces are marvellous
and solve a whole lot of problems that I didn't realize were even
solvable, and right now I suspect there are huge opportunities for
substantial improvements without a whole lot of effort by just plumbing
those facilities through to higher-level layers like batch systems.  Whole
classes of long-standing problems would just disappear, or at least be
far, far easier to manage.

Substantial, substantial caveat: I have been out of this world for a
while, and maybe most of this work has already been done?  That would be
amazing.  The best possible response to this post would be for someone to
tell me I'm five years behind and the batch systems have already picked up
this work and we can just point people at them.

-- 
Russ Allbery (r...@debian.org)              <https://www.eyrie.org/~eagle/>

Reply via email to