On 01/16/2013 11:58 AM, Riccardo Murri wrote:
Hi,

thanks for the prompt replies.

On Tue, Jan 15, 2013 at 2:09 PM, Kenneth Hoste <[email protected]> wrote:
Like Stijn mentioned, usually you really don't want to use the system
compilers/libraries, and an OS update might break all the built software
(sometimes in very subtle ways). You can get around it and define a
non-dummy toolchain that basically just uses the system stuff, but you
probably don't want to (although you may not realize that yet ;-)).
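For context, the "dummy" toolchain mentioned here is EasyBuild's way of building against whatever compiler and libraries are already on the system. A minimal sketch of an easyconfig using it might look like the following; the package name, version, and URLs are illustrative assumptions, not a tested configuration:

```python
# Hypothetical easyconfig sketch relying on the system compiler via
# EasyBuild's 'dummy' toolchain; name/version/source values are
# illustrative, not a verified configuration.
name = 'zlib'
version = '1.2.7'

homepage = 'http://www.zlib.net/'
description = "zlib compression library"

# 'dummy' means: no EasyBuild-provided compiler, just use what's in $PATH
toolchain = {'name': 'dummy', 'version': 'dummy'}

source_urls = ['http://zlib.net/']
sources = [SOURCE_TAR_GZ]  # EasyBuild template constant
```

A "non-dummy toolchain that just uses the system stuff" would instead wrap the system GCC and libraries into a named toolchain, so easyconfigs could depend on it like any other toolchain.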

Can you clarify why you want to use the system compilers/libs? Is it just to
avoid needing to build a full toolchain stack?
Well, that was just an idea that we're exploring - I'm still not sure
about it.  However:

- the system toolchain has a much wider community of users and
   vendor/upstream support.  I see this as an advantage because bugs in
   Ubuntu's GCC or ATLAS packages would probably already have been spotted
   by someone else.  With a custom-compiled toolchain (or a "completely
   isolated HPC environment" as Fotis put it), more support activity is
   on our shoulders as the user community is basically restricted to
   the local users.

True, but whenever you do run into an issue, you'll need to make your own compiler/library build anyway, or wait until it gets fixed upstream.

With a self-built toolchain, you fix the issue (if you can) and continue your work.

- Ubuntu and Debian do not upgrade the compiler in stable releases,
   exactly for the point Stijn and you mentioned: almost every piece of
   a GNU/Linux system can be broken by a mad upgrade of the compiler,
   not just HPC software.

Are minor upgrades (e.g. GCC 4.6.3 to GCC 4.6.4) also prohibited? Although less likely, those may cause problems as well.

What about libraries? ATLAS? OpenMPI? I doubt they also stay away from updating those, which is just as potentially harmful as updating compilers.

The idea is to keep the stack of things you depend upon fixed, whatever happens. That's very important in the HPC world to (try and) ensure reproducibility of results of scientific experiments.

If some bug is causing problems, you just build a new version of a compiler/lib, provide an update of the toolchain, and rebuild the software you had trouble with.

- our cluster is heterogeneous: ATLAS built on one node might not be
   compatible with other nodes.  With the system packages, this won't
   happen (at the price of a performance drop, but see the next point).
- the cluster is mainly for HTC usage rather than HPC.  In practice
   this means we're less concerned with raw performance and more with
   the robustness of the system in processing a large number of jobs.

We have a very heterogeneous setup too. As Stijn already mentioned, we make dedicated builds for each architecture (we have 6 different ones right now).
Although that comes with some build/space overhead, it works out quite well.
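A common way to wire up such per-architecture builds is to select the software stack at login time based on the detected hardware. A minimal sketch, assuming builds are installed under a /apps/<arch>/ layout (the path is an assumption, not Kenneth's actual setup):

```shell
# Hypothetical sketch: point users at an architecture-specific module
# tree at login; the /apps/<arch>/modules/all layout is an assumption.
arch=$(uname -m)   # e.g. x86_64, ppc64
export MODULEPATH="/apps/${arch}/modules/all:${MODULEPATH}"
echo "using software stack for ${arch}"
```

With one module tree per architecture, the same module name resolves to the build matching the node a job lands on.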

If you want to make sure the end-user software you've built keeps working for a long time and across multiple OS updates, which is equally important in HPC and HTC imho, I'd stay away from the system-provided stuff (which is fine for building system tools, don't get me wrong).

But as I said, the whole idea is more a feeling than a strong opinion,
so do not bash me about it :-)

We can see where the idea that this might be better is coming from, but in our view it's an option you will regret later on.


regards,

Kenneth
