Hi,
First of all, thanks for sharing all of this information.
I also participated in a similar study in 2017: https://arxiv.org/pdf/1709.10140.pdf
Eduardo is now working for Sylabs. I also gave some talks in France about containers in HPC, e.g.: http://devlog.cnrs.fr/_media/jdev2017/dev2017_p8_rdernat_short.pdf
Those technologies have changed a lot since then, but the conclusions are almost the same, even if this paper (https://arxiv.org/abs/1905.08415) focused on MPI.
I have been using Singularity since version 1, and I have had it installed on our HTC systems (in production) since March 2017, starting with version 2.2. It works well. I noticed some issues from time to time and submitted them on GitHub, along with requests for new features. Sometimes I even wrote some pieces of code or documentation for them, or for the container ecosystem. The Sylabs team is pretty active and fixed all of those bugs except one (CRIU / DMTCP [*]). I am happy to have had it installed all this time, since I work mainly with a community of bioinformatics users. The bioinformatics software landscape is full of different languages with many dependencies. Before Singularity, I installed all software and packages statically. In the end there were more than 150 applications, but combining these applications was the most complicated part. Limitations of Singularity include MPI jobs, where you still need to be very careful about the MPI versions you use (container vs. host); a minimal launch sketch is below.
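For readers who haven't tried it, the usual pattern is the "hybrid" model: the host mpirun starts the ranks and each rank runs inside the container, which only works smoothly when the MPI inside the image closely matches the host MPI. A minimal sketch, where the image name, rank count and application path are made up for illustration:

    # Host MPI launches the processes; each rank executes inside the container.
    # Works best when container and host MPI are the same (or ABI-compatible) versions.
    mpirun -np 64 singularity exec my_mpi_app-openmpi3.1.sif /opt/app/bin/my_mpi_app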
I have also been using nvidia-docker with user namespaces for GPUs, for about 2 years now. This is a standalone service, not connected to our clusters, mostly used for deep learning. However, I tried Singularity on it a while ago (using the gpu4singularity code from NIH, before the "--nv" option appeared in Singularity) and it also works fine.
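These days the "--nv" flag does the driver plumbing for you: it bind-mounts the host NVIDIA driver libraries and device files into the container at run time. Just to illustrate (the image and script names here are hypothetical):

    # "--nv" exposes the host GPU driver stack inside the container.
    singularity exec --nv tensorflow-gpu.sif python3 train.py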
IMHO, the main downside of Singularity is that Sylabs is developing new releases too quickly, and most of the time new releases are targeting security bugs... So, as an admin, you need to upgrade quite often, provided your OS is still compatible with it... Alternatives are Charliecloud, udocker or Shifter. But we chose Singularity because it was the most active project, with many contributors. Some people could also argue that Singularity is moving too fast towards a "cloud model", allowing the use of K8s, but I do not agree: the HPC community has been there for them from the beginning, and I think (am almost sure) that they won't turn their back on it.
Note that some big new players could become interesting in the future, because big companies are developing those products: Kata Containers, with a mix of VM and container advantages, supported by Intel and using a lightweight QEMU, and Podman from Red Hat's Atomic project. However, for the time being, HPC is not on their priority list, and Red Hat, as well as NVIDIA, has already collaborated with Singularity (in many ways).
IMO, all HPC systems should now allow users to launch "containerized" jobs (and not with Docker, for obvious security reasons). Many scientific workflows are now designed to run either on HPC/HTC/grid or cloud platforms. If, as the sysadmin of a big HPC platform, you don't allow that kind of job, the risk is that many people will leave your platform to compute elsewhere (e.g. in the cloud, or on another platform that allows containerized tasks), even if you could argue that they won't get the best hardware performance. If users already have a pipeline or a single application packaged in containers (many apps are now "containerized"), they won't be very excited about converting everything to get it working on your HPC system [**]. So you have to be proactive, and collaborate with your users, to provide the best and most secure platform to run those jobs; a small batch-script sketch follows.
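In practice, "allowing containerized jobs" can be as simple as letting users run the container runtime unprivileged inside a normal batch job. A rough Slurm sketch, where the image path, resources and command are invented for the example:

    #!/bin/bash
    #SBATCH --job-name=container-job
    #SBATCH --ntasks=1
    #SBATCH --time=01:00:00
    # The scheduler stays in charge of resources; the container only
    # provides the software stack.
    singularity exec /shared/images/my_pipeline.sif my_pipeline --input "$HOME/data"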
We are currently developing a WebUI that generates Docker and Singularity recipes [***]; all contributions are welcome (note that some issue descriptions are still in French). This is still a WIP, and the recipes may not be secure for now.
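For those who have never written one, a Singularity recipe (definition file) is just a short declarative text file. This is not necessarily what our WebUI produces, only a minimal hand-written sketch; the base image and the package are arbitrary examples:

    Bootstrap: docker
    From: ubuntu:18.04

    %post
        # install whatever the container should provide (samtools is just an example)
        apt-get update && apt-get install -y --no-install-recommends samtools
        rm -rf /var/lib/apt/lists/*

    %runscript
        exec samtools "$@"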
Best regards,
Rémy.

[*] Since version 3.2, Singularity provides a way to stop/resume jobs with the OCI subcommands, using the cgroup freezer.
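If it helps, this is roughly what that OCI workflow looks like. The bundle path and container ID below are arbitrary examples, the oci subcommands need root, and pause/resume are the parts backed by the cgroup freezer:

    sudo singularity oci mount /tmp/image.sif /var/tmp/mybundle   # mount the SIF as an OCI bundle
    sudo singularity oci run -b /var/tmp/mybundle myjob &
    sudo singularity oci pause myjob     # freeze every process in the container
    sudo singularity oci resume myjob    # thaw them again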
[**] OK, many developers worked hard in the past to get their software working on HPC systems (using OpenMPI, CUDA, OpenACC, or whatever), and they will certainly continue to do so. But most academic/scientific users used to have no easy access to a cloud (and now many of them do). What would their users gain from an optimized app on an HPC system, compared with distributing their (sequential?) jobs across many clouds? That is a real question, and I think there is no single good answer, as it depends on the size and type of the problem and on the software used, but I may be wrong about it.
[***] https://gitlab.mbb.univ-montp2.fr/jlopez/wicopa/ Contributors need an account on our GitLab; you can email me or create an issue here (https://kimura.univ-montp2.fr/calcul/helpdesk_NewTicket.html) to get one.

On 26/05/2019 at 14:17, Benjamin Redling wrote:
> Good news. I'll try it out, again.
>
> On 26 May 2019 13:57:05 MESZ, INKozin <i.n.ko...@googlemail.com> wrote:
>> For what it's worth, Singularity worked well for me the last time I tried it. I think it was shortly after NVIDIA had announced support for it.
>>
>> On Sun, 26 May 2019 at 11:11, Benjamin Redling <benjamin.ra...@uni-jena.de> wrote:
>>> On 23/05/2019 16.13, Loncaric, Josip via Beowulf wrote:
>>>> "Charliecloud" is a more secure approach to containers in HPC:
>>> I tried Singularity shortly before and during 2.3 with GPUs -- didn't work, documented issue, maybe solved. Stopped caring. Shortly afterwards I read about Charliecloud and tried it -- didn't work, too many issues. Stopped caring. So, "more secure" on paper (fewer lines of code) doesn't get any work done.
>>> My advice to anyone with a working setup: try it out if time permits, but don't bother too much and definitely don't advertise it to third parties beforehand.
>>> Regards, Benjamin
>>> --
>>> FSU Jena | https://JULIELab.de/Staff/Redling/ ☎ +49 3641 9 44323
--
Dernat Rémy
Plateforme MBB - ISEM Montpellier
_______________________________________________ Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit https://beowulf.org/cgi-bin/mailman/listinfo/beowulf