On 10/18/2016 at 16:20 Ludovic Courtès writes:
> Hello,
>
> I’m trying to gather a “wish list” of things to be done to facilitate
> the use of Guix on clusters and for high-performance computing (HPC).

The scheduler that I am most familiar with, SGE, assumes that compute
hosts are heterogeneous and that each has a fixed software and/or
hardware configuration. As a result, users need to specify the
resources a given job needs, such as software packages, number of CPUs,
and/or memory. These requirements in turn control where a given job can
run. QMAKE, the integration of GNU Make with the SGE scheduler, further
allows a make recipe step to request specific resources for the SGE job
that processes that step.

While SGE is dated and can be a bear to use, it provides a useful
yardstick for HPC/cluster functionality, so it is useful to consider how
Guix(SD) might impact this model. Presumably a defining characteristic
of GuixSD clusters is that the software configuration of compute hosts
no longer needs to be fixed, and the user can "dial in" a specific
software configuration for each job step. This is in many ways a good
thing, but it also generates new requirements. How does one specify the
software configuration for a given job or recipe step?

1) VM image?
2) VM?
3) Installed system packages?
4) Installed (user) packages?

Based on my experiments with Guix on Debian, GuixSD, VMs, and VM images,
it is not obvious to me which of these levels of abstraction is
appropriate. Perhaps any mix should be supported. In any case, tools to
manage this aspect of a GuixSD cluster are needed, and they need to be
integrated with the cluster scheduler to produce a manageable GuixSD HPC
cluster.

The most forward-thinking group that I know discarded their cluster
hardware a year ago and replaced it with starcluster
(http://star.mit.edu/cluster/). Starcluster automates the creation,
care, and feeding of HPC clusters on AWS using the Grid Engine scheduler
and AMIs. The group has a full-time "starcluster jockey" who manages
their cluster, and they seem quite happy with the approach. So you may
want to consider starcluster as a model when you think about cluster
management requirements.
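
P.S. To make the point about resource requests controlling where a job
can run a bit more concrete, here is a toy sketch in Python. It is not
SGE or Guix code: the Host and JobRequest types, the eligible_hosts
function, and the node and package names are all hypothetical, made up
for illustration. It only contrasts the two models, assuming that under
a Guix-style cluster the software is provisioned per job, so only
hardware constrains placement.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Host:
    # Hypothetical description of a compute host.
    name: str
    cpus: int
    memory_gb: int
    packages: frozenset = field(default_factory=frozenset)


@dataclass(frozen=True)
class JobRequest:
    # Hypothetical description of what a job (or make recipe step) needs.
    cpus: int = 1
    memory_gb: int = 1
    packages: frozenset = field(default_factory=frozenset)


def eligible_hosts(job, hosts, packages_fixed=True):
    """Return the names of hosts that could run the job.

    packages_fixed=True models a fixed-configuration cluster: a host
    must already have every requested package installed.
    packages_fixed=False models the assumed per-job provisioning case:
    only CPU count and memory are checked.
    """
    result = []
    for host in hosts:
        if host.cpus < job.cpus or host.memory_gb < job.memory_gb:
            continue  # not enough hardware
        if packages_fixed and not job.packages <= host.packages:
            continue  # required software is missing on this host
        result.append(host.name)
    return result


HOSTS = [
    Host("node1", cpus=8, memory_gb=32, packages=frozenset({"gcc", "openmpi"})),
    Host("node2", cpus=16, memory_gb=64, packages=frozenset({"gcc"})),
    Host("node3", cpus=4, memory_gb=8, packages=frozenset({"gcc", "openmpi"})),
]

JOB = JobRequest(cpus=8, memory_gb=16, packages=frozenset({"openmpi"}))

print(eligible_hosts(JOB, HOSTS))                        # fixed software
print(eligible_hosts(JOB, HOSTS, packages_fixed=False))  # per-job software
```

With fixed software only node1 qualifies; once software is no longer a
placement constraint, node2 becomes usable as well. The open question
from the list above remains what the "packages" field should really
be: a VM image, a VM, system packages, or user packages.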