On 10/18/2016 at 16:20 Ludovic Courtès writes:

> Hello,
>
> I’m trying to gather a “wish list” of things to be done to facilitate
> the use of Guix on clusters and for high-performance computing (HPC).

The scheduler that I am most familiar with, SGE, assumes that compute
hosts are heterogeneous and that each has a fixed software and/or
hardware configuration. As a result, users need to specify the
resources a given job requires, such as SW packages, number of CPUs,
and memory. These requirements in turn control where a given job can
run. QMAKE, the integration of GNU Make with the SGE scheduler, further
allows a make recipe step to request specific resources for the SGE job
that processes that step.
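
To make that matching model concrete, here is a rough Python sketch. It
is not SGE code and does not use any SGE API; the host names, the
resource fields, and the eligible_hosts helper are all made up for
illustration. It only shows how a job's request narrows the set of
fixed-configuration hosts it can run on.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """A compute host with a fixed hardware and software configuration."""
    name: str
    cpus: int
    memory_gb: int
    packages: frozenset = field(default_factory=frozenset)


@dataclass
class JobRequest:
    """Resources a job (or a single make recipe step) asks for."""
    cpus: int = 1
    memory_gb: int = 1
    packages: frozenset = field(default_factory=frozenset)


def eligible_hosts(job, hosts):
    """Return the names of hosts that satisfy every requirement of the job."""
    return [
        h.name
        for h in hosts
        if h.cpus >= job.cpus
        and h.memory_gb >= job.memory_gb
        and job.packages <= h.packages  # every requested package is installed
    ]


# Hypothetical cluster: each host has a fixed configuration.
HOSTS = [
    Host("node1", cpus=8, memory_gb=32, packages=frozenset({"gcc", "R"})),
    Host("node2", cpus=32, memory_gb=256, packages=frozenset({"gcc", "openmpi"})),
    Host("node3", cpus=4, memory_gb=16, packages=frozenset({"R"})),
]
```

In this toy model a job that needs R, 4 CPUs, and 16 GB can only land on
node1 or node3, because the package set is a property of the host.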

While SGE is dated and can be a bear to use, it provides a useful
yardstick for HPC/Cluster functionality. So it is useful to consider how
Guix(SD) might impact this model. Presumably a defining characteristic
of GuixSD clusters is that the software configuration of compute hosts
no longer needs to be fixed and the user can "dial in" a specific SW
configuration for each job step.  This is in many ways a good thing. But
it also generates new requirements. How does one specify the SW config
for a given job or recipe step:

1) VM image?

2) VM?

3) Installed System Packages?

4) Installed (user) packages?

Based on my experiments with Guix/Debian, GuixSD, VMs, and VM images, it
is not obvious to me which of these levels of abstraction is
appropriate. Perhaps any mix should be supported. In any case, tools to
manage this aspect of a GuixSD cluster are needed. And they need to be
integrated with the cluster scheduler to produce a manageable GuixSD HPC
cluster.
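
As a rough illustration of what "any mix" could look like from the
scheduler's side, here is a small Python sketch of a job step that
carries its own software specification. Everything in it is
hypothetical: the SoftwareLevel names simply mirror the four options
listed above, and the reference strings are placeholder file names, not
anything Guix or SGE defines.

```python
from dataclasses import dataclass
from enum import Enum


class SoftwareLevel(Enum):
    """The four candidate levels of abstraction listed above."""
    VM_IMAGE = "vm-image"
    VM = "vm"
    SYSTEM_PACKAGES = "system-packages"
    USER_PACKAGES = "user-packages"


@dataclass(frozen=True)
class SoftwareSpec:
    level: SoftwareLevel
    reference: str  # placeholder, e.g. an image file or a package list


@dataclass(frozen=True)
class JobStep:
    command: str
    software: tuple  # one or more SoftwareSpec entries, so levels can be mixed


def levels_used(step):
    """Return the sorted level names a job step asks for."""
    return sorted(spec.level.value for spec in step.software)


# Hypothetical step that mixes a base VM image with user-level packages.
step = JobStep(
    command="make all",
    software=(
        SoftwareSpec(SoftwareLevel.VM_IMAGE, "base-image"),
        SoftwareSpec(SoftwareLevel.USER_PACKAGES, "user-package-list"),
    ),
)
```

A scheduler integration would then have to decide, per step, whether a
host can provide the requested levels, which is exactly the tooling gap
described above.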

The most forward-thinking group that I know discarded their cluster
hardware a year ago to replace it with StarCluster
(http://star.mit.edu/cluster/). StarCluster automates the creation,
care, and feeding of HPC clusters on AWS using the Grid Engine
scheduler and AMIs. The group has a full-time "StarCluster jockey" who
manages their cluster and they seem quite happy with the approach. So
you may want to consider StarCluster as a model when you think of
cluster management requirements.
