Hi Dag Sverre,
As discussed during the conf call, HashDist plans some nice tools that
would be very useful to support in EasyBuild.
If you need any feedback from us on some of the stuff you're working on,
let us know.
If you want to integrate support for the HashDist tools in EasyBuild,
we'll be glad to help where we can (time permitting).
regards,
Kenneth
On 11/27/2012 10:09 AM, Dag Sverre Seljebotn wrote:
Hi EasyBuild folks, (with CC to Hashdist list)
I'm funded to work for two months on the Hashdist project (which only
exists on paper at the moment), and had a nice conversation with
Kenneth and Jens from EasyBuild today. The conclusion seems to be that
Hashdist and EasyBuild may complement one another nicely and are
mostly orthogonal.
Minutes from our call:
https://github.com/hpcugent/easybuild/wiki/Notes-on-EasyBuild-HashDist-conf-call-%2820121126%29
The aim of Hashdist is to accelerate the development of existing and
future (scientific) software distribution systems, by providing some
core tools that can be shared between them.
Currently, all software distribution systems lack the features I
need/want. Instead of wasting my time attempting to write yet
another distribution framework and getting 10% of the way there, I want
to develop just those features that I think are missing, and then hope
that existing systems such as EasyBuild pick them up and use them.
Thus Hashdist is not meant to be used directly (except perhaps for a
few power-users) but rather as a component in other distribution systems.
Hashdist will be a set of loosely coupled tools. Below is my
personal wishlist, which may be adjusted as the project proceeds:
a) A source store mechanism for downloading and hashing source code
(the hashing bit being the important part).
You already have a lot of this in EasyBuild, but some others don't.
Perhaps you can ignore this, or only use it to get the hashes to pass
on to b).
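A minimal sketch of what the hashing part of a) could look like, in
Python. The function names, the choice of SHA-256 and the
<store>/<hash>/<filename> layout are assumptions made for illustration;
none of them are decided for Hashdist:

```python
import hashlib
import shutil
from pathlib import Path


def hash_source_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def store_source(path, store_dir):
    """Copy a source archive into store_dir/<hash>/<filename>.

    Hypothetical layout: the hash of the content decides where the
    archive lives, so the same archive is only ever stored once.
    """
    path = Path(path)
    digest = hash_source_file(path)
    target_dir = Path(store_dir) / digest
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / path.name
    if not target.exists():
        shutil.copy2(path, target)
    return target
```

The digest returned by hash_source_file() is the piece that would be
handed on to b).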
b) A "prefix database system" based on hashing; e.g.,
~/.hashdist/artifacts/numpy/1.7/a4324sdfq32r
(The exact path-name pattern will probably be configurable, that's one
of the things I want to engage you in discussing.)
This is what you already have in EasyBuild except that a cryptographic
hash is included in the path-name, so that if you make a minor change
such as a minor-version gcc upgrade, or change CFLAGS, or apply a
minor patch to your git tree and want to quickly try it out, this can
change the hash and cause a new parallel build/installation.
The "try it out" bit is important. I want jumping around between
slightly different software stacks to be as quick and easy as using
git, and this relies on the hashes being quite reliable.
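To illustrate the idea in b), here is a rough Python sketch of deriving
an install prefix from a hash of everything that went into the build.
The contents of the spec dict, the use of canonical JSON plus SHA-256,
and the 12-character truncation are all assumptions for the sake of the
example, not a settled design:

```python
import hashlib
import json
from pathlib import Path


def build_spec_hash(spec, length=12):
    """Hash a build specification (a JSON-serializable dict).

    Keys are sorted so that the same spec always gives the same hash,
    regardless of the order in which it was written down.
    """
    canonical = json.dumps(spec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:length]


def artifact_path(root, name, version, spec):
    """Return <root>/<name>/<version>/<hash> for the given build spec."""
    return Path(root) / name / version / build_spec_hash(spec)


# Hypothetical build spec; the field names are made up for this example.
spec = {
    "name": "numpy",
    "version": "1.7",
    "cflags": "-O2",
    "dependencies": {"gcc": "4.6.3"},
}
prefix_o2 = artifact_path("~/.hashdist/artifacts", "numpy", "1.7", spec)

# Changing CFLAGS gives a different hash, hence a parallel installation.
prefix_o3 = artifact_path(
    "~/.hashdist/artifacts", "numpy", "1.7", dict(spec, cflags="-O3")
)
```

Because the dependencies' own hashes would be part of the spec, a change
further down the stack (e.g., a gcc upgrade) would propagate to
everything built on top of it.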
c) A tool for capturing the system software and hashing it and making
a prefix that symlinks to it; i.e.,
~/.hashdist/artifacts/gcc/4.6.3/34qw3da32e4q2 # symlinks to /usr/...
The point is simply that if the system software is upgraded we want to
track it somehow in the hashes of the dependencies. (Details on this
to be hashed out but I have some main ideas ready.)
d) A light-weight (optional!) jail tool to make sure that all
dependencies are explicitly stated when creating packages, so that the
following command would fail if anything is pulled in from /usr/lib
which wasn't first accessed through a symlink created in c) above:
LD_PRELOAD=hdistjail.so gcc ....
e) Garbage collection to remove prefixes that are no longer used
f) A tool to build "profiles", which are prefixes that mostly symlink
to other software. (However, there are some non-trivial cases. For
example, I want to allow Python and Python packages to live in
different prefixes but still be usable together without relying on
setting PYTHONPATH; this can be handled by copying the python
executable instead of symlinking it.)
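The simple case of f) could be sketched roughly as follows in Python.
The function name and the "fail on conflicts" policy are assumptions for
illustration, and the non-trivial cases (such as copying the python
executable) are deliberately not handled:

```python
import os
from pathlib import Path


def build_profile(profile_dir, prefixes):
    """Create profile_dir as a tree of symlinks into the given prefixes.

    Directories are created for real; every file becomes a symlink to
    the corresponding file in its prefix. If two prefixes provide the
    same relative path, FileExistsError is raised (an assumed policy).
    """
    profile_dir = Path(profile_dir)
    for prefix in prefixes:
        prefix = Path(prefix).resolve()
        for dirpath, _dirnames, filenames in os.walk(prefix):
            rel = Path(dirpath).relative_to(prefix)
            target_dir = profile_dir / rel
            target_dir.mkdir(parents=True, exist_ok=True)
            for filename in filenames:
                link = target_dir / filename
                if link.is_symlink() or link.exists():
                    raise FileExistsError(
                        f"{link} is provided by more than one prefix"
                    )
                link.symlink_to(Path(dirpath) / filename)
```

A call such as build_profile("myprofile", [python_prefix, numpy_prefix])
(with hypothetical prefix paths) would then give a single directory
whose bin/, lib/ etc. point into the individual hashed prefixes.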
g) And finally, a tool like "modules" that knows about the above and
allows inserting prefixes/profiles into the environment.
For desktop users in particular I think it's very important to be able
to simply call /some/path/to/python, e.g., without having to have
PYTHONPATH, LD_LIBRARY_PATH and so on set up correctly (which is
doable, but takes some extra effort). This may not affect EasyBuild
that much, but can help explain some of the design decisions in
Hashdist as I go along.
Dag Sverre