Re: [easybuild] dateutil and python module builds

Fotis Georgatos Mon, 16 Mar 2015 10:27:50 +0100 (CET)

Hello Stuart,

first, sorry for replying to this email so many days later; I wanted to give
it more time.
In fact, it’s a lengthy response (be aware: it will be TL;DR for most others).

On Mar 5, 2015, at 4:55 PM, Stuart Barkley <stua...@4gh.net> wrote:
> Excuse the following paragraph of ranting, easybuild is still looking
> like a good and useful tool.  I'm an old curmudgeon who does not like
> some of the trends on the Internet as a whole.

Yeah, we all do that sometimes :-P

> I find it disgusting that a lot of modern software will just
> arbitrarily download things from the Internet.  Perl started doing

And it is disgusting, indeed!

In fact, it is *two* sins wrapped into one, since it breaks two software
quality features:
a) generality - it assumes net connectivity, which is not always available
b) orthogonality - the version is implicitly hardwired

Definitions for software quality (my take) visible here:
https://github.com/fgeorgatos/easybuild.experimental/wiki/HPC-Software-Quality

> some of this late in it's life and is somewhat controllable.  Python
> seems to have embraced this early in it's life.  I believe that
> software builders really need to know and explicitly state/request all
> of the libraries their software uses.  Not doing this can build large,
> complex or other long term issues into software without anyone
> knowing.  This issue is part of why I support our systems not having a
> direct Internet connection.

Your approach is basically the fail-early one, so that you can collect
dependencies up-front. I am not that fanatical in this respect, but I am glad
others are :)
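
To make "fail-early" concrete, here is a minimal sketch in Python (my own
illustration, not something EasyBuild ships): before any build step runs, it
checks that every declared dependency can be located, and reports all missing
ones at once instead of fetching anything from the net. The function names and
the dependency list are hypothetical.

```python
import importlib.util


def missing_modules(required):
    """Return the top-level module names from `required` that cannot be found."""
    # find_spec() only locates a module; it neither imports nor downloads it.
    # Assumption: names are top-level (no dots), so no parent package is imported.
    return [name for name in required if importlib.util.find_spec(name) is None]


def check_dependencies(required):
    """Stop before the build starts, naming every absent dependency at once."""
    missing = missing_modules(required)
    if missing:
        raise SystemExit("missing dependencies: " + ", ".join(missing))


# Hypothetical dependency list; these are standard-library names, so it passes.
check_dependencies(["os", "json"])
```

The point is only the ordering: the full dependency list is stated and verified
first, so nothing is discovered (or downloaded) halfway through a build.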

>> There are some ways (e.g., doing the build in a jail-like
>> environment), but they're not trivial.
> 
> My initial hope was that easybuild supported building software outside
> of the live environment (mock, chroot, fakeroot).  This is not the
> case and I can live with it for now but I hope this is/becomes part of
> the future direction.

What you are suggesting, in effect, is to confine not only the build process
but also the build environment.
I think everyone who has been in this business for a while has agreed to go
in this direction (in fact, docker has been mentioned quite a few times).
E.g. I’ve seen rjeschmi on IRC testing PRs from docker?!

To be fair, this may be beyond and/or orthogonal to EasyBuild:
- nobody stops anyone from setting up an isolated build environment under
  chroot or whatever
However, while this works well for a single build, it is trickier for builds
with dependencies!
IMHO it’s all doable; it just takes someone with a strong itch to jump on the
problem.

In fact, I would like that kind of feature to be combinable with this effort:
https://github.com/hpcugent/easybuild-framework/pull/1170 ## how to call fpm from EB
because then you can end up with tightly-controlled builds where no bits can
spill in or out of the build process (you may obtain bit-level control over
the build process).

Even then, different code paths may still be taken depending on the
architecture (think: AVX vs non-AVX builds); however, there would be far better
traceability than what we have now (where a single header file can alter a
build wildly).

>> I'll see what I can do to get six and ecdsa properly included where
>> needed, as dependencies for dateutil and paramiko, resp., thanks for
>> bringing this up.
> 
> thanks.

+1, thanks.

>> Just to clarify: the easyconfigs that are part of the EasyBuild
>> installation are meant to be treated as examples (which does not
>> mean they should contain inconsistencies like the one you reported).
> 
> This is also where I find the current easybuild process a little
> awkward.  I generally prefer examples to be out of the main execution
> path.  i.e.  If these are really examples, they should not be used
> automatically.  If they are a default configuration, they should work
> and be maintained (which is a lot of work for 2000+ configurations).

I think your last sentence explains why they are called examples :)

i.e. the original EB authors try to disclaim responsibility for fitness
of the builds “for any particular purpose”. I find that fair game.

IMHO, as time advances and community knowledge is pooled together,
they will start looking less and less like examples and more and more
like production configurations - for most. In fact, here is the current status
summary:
- If you have RHEL6, x86_64 w. AVX, Infiniband & CUDA - you have
  *near-production* builds
- If you have a different OS/architecture - you should largely treat them as
  examples
> I presume this can be addressed by not installing the easyconfig
> package, but I would still like to have the examples handy for
> reference.

IMHO, you can’t boost your HPC software build productivity without the
examples; you’d have to reinvent far too much of the wheel, starting from zero,
to get there.

A relatively straightforward approach, which I’ve used for a while, is the
following:
- fork & clone the easyconfigs git repo
- create your site’s customisation branch in it (git checkout -b XXX)
- modify/hack that tree to your heart’s liking, commit/push etc
- feed it to easybuild via -r XXXpath
Rinse, repeat. That should work.

At some point, you’ll know it’s the right time to contribute back changes
which others can tap into. Please do that!

> It is a little more unclear how easyblocks are to be treated.  So far
> I have not had to look at or deal with any of the easy blocks.

Although many of us have expressed unease with the fact
that easyconfigs & easyblocks are effectively one configuration set split in
two, the fact of life is that plain easyconfigs are not sufficient for all
kinds of build processes - this is where easyblocks come into play, with
Python code.

For the moment, I’d suggest you check which easyblocks are generic
and which ones are not. This is a major distinction, and it may prompt us,
in the future, to rethink how the code is factored and how it should be.

> After things settle down I expect to have a consistent collection of
> easyconfig files in the ebfiles_repo directory.  I do notice that
> these files have an extra buildstats clause in them which causes minor
> difficulties diffing my original files against the built versions to
> ensure consistency.

IMHO, you should only use those for human reference, rather than for diffing.

Let git do the diffing for you (see the git workflow proposed above);
it should be good enough to get you going for a while,
until you find your own direction and can fly on your own :)

> This is what I'm doing and I expect to spend considerable time
> building our first production set of applications.  Easybuild does
> cover a large portion of our user software needs.  At this time I have
> no need/desire to address any application unless it already has an
> example easyconfig (in the future I do expect to start adding other or
> unique software).

One important trick, to help grow the community:
a) invite your users to write easyconfigs/easyblocks
b) if they can’t write easyconfigs/easyblocks, ask them to shell-script the
   build process
c) it’s easy/deterministic to convert shell scripts to an EB build process
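
To illustrate step (c): an easyconfig is itself written in Python syntax, so a
user’s classic configure/make/make install script maps onto it almost line by
line. What follows is only a sketch under assumptions: the package "mytool",
its example.org URLs, the configure flag and the sanity-check path are all
made up; the parameter names and the generic ConfigureMake easyblock are the
usual EasyBuild ones.

```python
# User's shell script (step b), shown here as comments for comparison:
#   tar xzf mytool-1.0.tar.gz && cd mytool-1.0
#   ./configure --prefix=$PREFIX --enable-shared
#   make
#   make install

easyblock = 'ConfigureMake'   # generic easyblock: configure, make, make install

name = 'mytool'               # hypothetical package name
version = '1.0'               # hypothetical version

homepage = 'http://example.org/mytool'                    # placeholder URL
description = "Hypothetical tool, used only to illustrate the conversion"

toolchain = {'name': 'goolf', 'version': '1.4.10'}        # toolchain named in this thread

source_urls = ['http://example.org/downloads']            # placeholder URL
sources = ['%(name)s-%(version)s.tar.gz']                 # expands to mytool-1.0.tar.gz

configopts = '--enable-shared'   # the install prefix is supplied by EasyBuild

# Made-up check: the build counts as successful only if this file exists.
sanity_check_paths = {
    'files': ['bin/mytool'],
    'dirs': [],
}

moduleclass = 'tools'
```

Everything the script hardwired (version, prefix, compiler) becomes an explicit
parameter, which is exactly why the user should own the script and the
conversion can stay mechanical.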

It is very important to be emphatic with your users that THEY are the owners
of step (b).
In fact, only the expert user really knows which kind of build process is the
right one for their purpose
(proof: tell them that you will toss a coin to decide whether you’ll do a 32b
or 64b build - see how they react).

The fact of life is that scientific software has far too large a configuration
vector to handle, and it is the users’ responsibility to define it - for the
sake of reproducibility.
More often than not, this includes the exact versions of software that should
be employed.

> A follow on issue is when to update our private configuration with new
> versions of software.  I hope this or the HPCBIOS project can provide
> some backend support for this.

OK. Over this spring, I expect to do a major rework of the LifeScience/Bioinfo
collection; if you are in this area, hang on tight for major improvements and
a worthy update.

However, I’d expect other fellows to have a strong itch for other science
domains and to come up with the definitions that suit their objectives. I’ll
certainly help with the write-up of “policies” as well as related testing, to
ensure at least some generality.

Some scientific fields that others could experiment with, which may emerge
anytime in the future:
* climate science (most work is finished here; somebody is just needed to
  validate it)
* density-functional theory (DFT codes) - someone bother with this please ;-)
* benchmarking codes (incl. io* & friends) - I am sure you’d appreciate :)
* visualisation, rendering & typesetting (half-baked recipes largely exist for
  this)

Any takers?

I have access to hundreds of clusters/sites if somebody wants to get serious
about testing.

>> See numpy-1.7.1-goolf-1.4.10-Python-2.7.3.eb for example. The numpy
>> that is included in Python-2.7.3-goolf-1.4.10.eb is older (v1.6.1),
>> so a seperate easyconfig file for numpy allows to install a newer
>> version of numpy than the one that comes with the Python module.
> 
> I did see that numpy and scipy and a few others were already included
> in the Python build.  I didn't notice the version numbers differences
> but it is good to see support for other versions available.  Can these
> modules be loaded on top of a python already containing older
> versions?
> 
> What is current thinking on whether things like scipy should be built
> into the python build or be separate modules?

Apparently, the answer is that you can do both, and it makes sense to do both:
- the more specific version can be loaded after the python default.

AFAIK, this works well IFF there is a depends-on relationship AND a
prepend-path clause (i.e. this combination ensures correct PYTHONPATH ordering,
regardless of loads/unloads);
if I’m missing something here, please somebody stop me right now :)
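
A small experiment shows why the prepend-path part matters. This is a sketch
under assumptions, not how the modules tool is implemented: prepend_path() is
a simplified stand-in for a modulefile’s prepend-path, "fakepkg" stands in for
numpy, and both copies are placed on PYTHONPATH for simplicity (a bundled copy
would normally sit in Python’s own site-packages, which is searched after
PYTHONPATH anyway). The version strings are the ones mentioned in this thread.

```python
import os
import subprocess
import sys
import tempfile


def prepend_path(current, new_dir):
    """Simplified stand-in for a modulefile's 'prepend-path' on a path variable."""
    return new_dir if not current else new_dir + os.pathsep + current


with tempfile.TemporaryDirectory() as root:
    # Two stand-in install trees, each shipping its own copy of 'fakepkg'.
    for label, version in (("python_bundle", "1.6.1"), ("separate_module", "1.7.1")):
        pkg_dir = os.path.join(root, label)
        os.makedirs(pkg_dir)
        with open(os.path.join(pkg_dir, "fakepkg.py"), "w") as handle:
            handle.write("__version__ = %r\n" % version)

    # Load order: the Python default first, the more specific module on top.
    pythonpath = ""
    pythonpath = prepend_path(pythonpath, os.path.join(root, "python_bundle"))
    pythonpath = prepend_path(pythonpath, os.path.join(root, "separate_module"))

    # A fresh interpreter searches PYTHONPATH entries from left to right.
    env = dict(os.environ, PYTHONPATH=pythonpath)
    picked_version = subprocess.check_output(
        [sys.executable, "-c", "import fakepkg; print(fakepkg.__version__)"],
        env=env,
    ).decode().strip()

print(picked_version)  # prints 1.7.1: the module loaded last wins
```

Because the later load lands in front, the separate module shadows the bundled
copy; the depends-on relationship is what guarantees that the loads happen in
that order.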

> For now I'm planning to build a single consistent set of software
> (with just the goolf toolchain).  Every 6-8 months or so would be an
> updated release with freshened software.

Everybody has contemplated a twice-a-year update of software collections;
in practice, it has proven to be a bit more tricky. It’s all a matter
of putting man-hours together to get things going.

> My real hope (but not expectation) is that scientific software
> developers will improve their processes to better address
> compatibility between software versions.

IMHO, it is their responsibility to fix those processes; however, it is also
our responsibility (as systems people or user supporters) to tell them the
how/what.

If we don’t publicly share the expectations and hopes, they won’t happen - as
simple as that.

I’d very much welcome a hall-of-non-fame for year 2015, for software not
fulfilling these:
https://github.com/fgeorgatos/easybuild.experimental/wiki/HPC-Software-Quality
(e.g. I’m guilty of failing the *testable* requirement with my own tool
http://cern.ch/fotis/QTOP)

Come on, speak your mind; get that rant out in the wild ;-)

> I'm still getting used to the log files and am finding some issues in
> their usability, but that is subject for a separate discussing at
> another time.

If you have ideas on how to improve them, they are certainly welcome.
A good way to get that discussion started is to point out how others do it better.

> Other things I plan to do are building documentation, demonstration,
> example and performance cases for each major software package.  In
> part, this becomes site specific but I also hope for some support from
> the HPCBIOS and similar efforts.

I’d prefer that people see HPCBIOS more as a collection of RFC-like definitions
(as the IETF used to do):
- common and well-defined configuration aspects of HPC sites, which do not
  constrain implementation
In fact, two independent implementations which prove a compatibility layer make
for a good HPCBIOS case!
By no means do we suppose that all these definitions are good for all sites at
once (i.e. do mix’n’match).

btw.
the standard IETF doctrine for RFCs (which I personally found very productive
in the 90s) has been:
- rough consensus and running code

I hope we can achieve something similar in our field of interest.

best,
Fotis

-- 
echo "sysadmin know better bash than english" | sed s/min/mins/ \
  | sed 's/better bash/bash better/' # signal detected in a CERN forum
