Re: If someone wants to join (Was: Any input for some talk about usage of Debian in HPC)
The call is today (2024-06-20) at 17:00 UTC (19:00 CEST, for example).

Timezone converter: https://time.is/1700_20_June_2024_in_UTC?Debian_GNU/Linux_for_Scientific_Research

Thank you Andreas for presenting!

On 20/06/2024 13.15, Andreas Tille wrote:
> Hi,
>
> feel free to pick the relevant data from this document
>
> https://docs.google.com/document/d/1rcvtsD5QVmmLxSDNxm6FBrWdI0GjXSZtfJ-vkGfKPO4
>
> if you want to join. Thanks to all who provided input which was really valuable for me.
>
> Kind regards
> Andreas.
>
> On Sun, May 19, 2024 at 01:12:40PM +0200, Andreas Tille wrote:
>> Hi,
>>
>> I have an invitation to have some talk with the title
>>
>> Debian GNU/Linux for Scientific Research
>>
>> Abstract:
>>
>> Over the past decade, Enterprise Linux has dominated large-scale research computing infrastructure. However, recent developments have sparked increased interest in community-led alternatives. Debian GNU/Linux, a long-standing choice among researchers for supporting scientific work, is experiencing a renewed interest for High-Throughput Computing (HTC) and High-Performance Computing (HPC) applications. This presentation will provide an overview of how Debian is being utilized to support scientific research and will include a case study showcasing the migration of HTC operations from Enterprise Linux 7 (EL7) to Debian.
>>
>> While I could talk about Debian Science and Debian Med in general it would be cool to reference to some real life examples where Debian is used in Science and what might be the reason to use Debian.
>>
>> I personally would like to stress the "we package what we use" aspect and the "we mentor upstream to merge competence of the program with packaging skills" idea. Any input would be welcome to cover more ideas.
>>
>> Kind regards
>> Andreas.
>>
>> --
>> https://fam-tille.de
If someone wants to join (Was: Any input for some talk about usage of Debian in HPC)
Hi,

feel free to pick the relevant data from this document

https://docs.google.com/document/d/1rcvtsD5QVmmLxSDNxm6FBrWdI0GjXSZtfJ-vkGfKPO4

if you want to join. Thanks to all who provided input which was really valuable for me.

Kind regards
Andreas.

On Sun, May 19, 2024 at 01:12:40PM +0200, Andreas Tille wrote:
> Hi,
>
> I have an invitation to have some talk with the title
>
> Debian GNU/Linux for Scientific Research
>
> Abstract:
>
> Over the past decade, Enterprise Linux has dominated large-scale research computing infrastructure. However, recent developments have sparked increased interest in community-led alternatives. Debian GNU/Linux, a long-standing choice among researchers for supporting scientific work, is experiencing a renewed interest for High-Throughput Computing (HTC) and High-Performance Computing (HPC) applications. This presentation will provide an overview of how Debian is being utilized to support scientific research and will include a case study showcasing the migration of HTC operations from Enterprise Linux 7 (EL7) to Debian.
>
> While I could talk about Debian Science and Debian Med in general it would be cool to reference to some real life examples where Debian is used in Science and what might be the reason to use Debian.
>
> I personally would like to stress the "we package what we use" aspect and the "we mentor upstream to merge competence of the program with packaging skills" idea. Any input would be welcome to cover more ideas.
>
> Kind regards
> Andreas.
>
> --
> https://fam-tille.de

--
https://fam-tille.de
Re: Any input for some talk about usage of Debian in HPC
On 20/05/2024 21:00, Steven Robbins wrote:
> Hello,
>
> On Sunday, May 19, 2024 9:31:02 A.M. CDT Tony Travis wrote:
>> You can't ignore the host OS when you talk about HPC applications and the HEP (High Energy Physics) community put a lot of effort into developing good node provisioning systems and job-scheduling for HPC. Consequently, there was a significant bias towards support for HEP applications running under CentOS and less support for bioinformatics.
>
> I've been out of academia for decades, but HEP was my first love and neuroimaging my second, so this paragraph really piqued my interest. Can you briefly say what are the different needs of HEP and bioinformatics and how they are in conflict?

Hi, Steve.

Many HEP applications involve a lot of floating-point arithmetic and are computationally intensive. By contrast, most bioinformatics applications do not require floating-point arithmetic: they are dominated by speed of memory access and memory size. Optimisations used in HEP calculations to keep everything in the high-speed CPU cache don't help with this kind of memory access.

One area of bioinformatics in particular that has this sort of memory requirement is sequence alignment and sequence assembly. Some efforts have been made to speed this up using GPGPU and SIMD CPU instructions, but I've found it all very complicated and disappointing, to be honest.

A recent success for GPGPU applications in bioinformatics is, however, base-calling of 'long' DNA sequencing reads and TensorFlow ML (Machine Learning) methods for error-correcting DNA/RNA sequence reads and e.g. predicting Transcription Factor Binding Sites etc. None of this requires the floating-point calculations in the frequency/Fourier domain that many HEP applications do.

I must admit that my views are largely based on experience of helping my friend do DFT (Density Functional Theory) simulations of protein 'docking' domains on a Beowulf cluster that we built for chemical modelling and bioinformatics! I also worked for several years doing image analysis with Physicists :-)

Bye,

Tony.

--
Minke Informatics Limited, Registered in Scotland - Company No. SC419028
Registered Office: 3 Donview, Bridge of Alford, AB33 8QJ, Scotland (UK)
tel. +44(0)19755 63548 http://minke-informatics.co.uk
mob. +44(0)7985 078324 mailto:tony.tra...@minke-informatics.co.uk
Re: Any input for some talk about usage of Debian in HPC
Hello,

On Sunday, May 19, 2024 9:31:02 A.M. CDT Tony Travis wrote:
> You can't ignore the host OS when you talk about HPC applications and the HEP (High Energy Physics) community put a lot of effort into developing good node provisioning systems and job-scheduling for HPC. Consequently, there was a significant bias towards support for HEP applications running under CentOS and less support for bioinformatics.

I've been out of academia for decades, but HEP was my first love and neuroimaging my second, so this paragraph really piqued my interest. Can you briefly say what are the different needs of HEP and bioinformatics and how they are in conflict?

-Steve
Re: Any input for some talk about usage of Debian in HPC
On Sun, 2024-05-19 at 13:12 +0200, Andreas Tille wrote:
> Hi,
>
> I have an invitation to have some talk with the title
>
> Debian GNU/Linux for Scientific Research

I learned from some of the LIGO sysadmins at Caltech that some of the LIGO systems are using Debian.

The LIGO documentation

https://computing.docs.ligo.org/guide/software/debian/

mentions a server for their custom apt repository. I thought I'd look around that server, and it gives this information page about their cluster:

https://hypatia.aei.mpg.de/cgi-bin/hypatia-index.cgi?p=main

It looks like a fairly large HPC cluster running Debian and using the Debian tool FAI.

Diane
Re: Any input for some talk about usage of Debian in HPC
On 19/05/2024 12:12, Andreas Tille wrote:
> Hi,
>
> I have an invitation to have some talk with the title
>
> Debian GNU/Linux for Scientific Research
>
> Abstract:
>
> Over the past decade, Enterprise Linux has dominated large-scale research computing infrastructure. However, recent developments have sparked increased interest in community-led alternatives. Debian GNU/Linux, a long-standing choice among researchers for supporting scientific work, is experiencing a renewed interest for High-Throughput Computing (HTC) and High-Performance Computing (HPC) applications. This presentation will provide an overview of how Debian is being utilized to support scientific research and will include a case study showcasing the migration of HTC operations from Enterprise Linux 7 (EL7) to Debian.
>
> While I could talk about Debian Science and Debian Med in general it would be cool to reference to some real life examples where Debian is used in Science and what might be the reason to use Debian.

Hi, Andreas.

The Sanger Centre in the UK use Ubuntu + OpenStack + Ceph:

https://www.sanger.ac.uk/group/core-software-services/

I realise that it's not Debian, but it is based on Debian. I went there many years ago when they were running Debian on DEC Alpha AXPs, but they moved to CentOS because many other Academic HPC centres were using it, including ours when I worked at the University of Aberdeen. This was not a good experience, and they decided to change to Ubuntu mainly because of the support provided by Canonical for OpenStack and Ceph.

However, in my opinion, CentOS/RHEL is not a good platform for bioinformatics because the 'Enterprise' approach stifles innovation. You can't ignore the host OS when you talk about HPC applications and the HEP (High Energy Physics) community put a lot of effort into developing good node provisioning systems and job-scheduling for HPC. Consequently, there was a significant bias towards support for HEP applications running under CentOS and less support for bioinformatics.
This was partly the motivation underlying our development of Bio-Linux in order to provide biologists with an alternative platform running on their own hardware instead of struggling to get the IT department to port the software they wanted to use to CentOS. In that respect the Debian-Med project was fundamentally important in helping biologists do their work outside of the centrally managed 'Enterprise' oriented IT policy imposed on us by Universities and Research Institutes.

The Sanger Centre provide a centrally managed HPC that is 'biologist-friendly' and, I think, is an excellent model of how it should be done. However, it does not support the view that Debian should be the HPC OS because the main reason they chose Ubuntu was the commercial support for OpenStack and Ceph provided by Canonical.

> I personally would like to stress the "we package what we use" aspect and the "we mentor upstream to merge competence of the program with packaging skills" idea. Any input would be welcome to cover more ideas.

As you might remember, I built and I advocate the use of 'departmental' or 'research-group' clusters. These are much more powerful than an individual biologist's personal laptop, but are under the administrative control of the department or research group that funded their purchase.

In the past, I've used various HPC node-provisioning, cluster filesystem and job submission systems running under one version or another of Bio-Linux, now using your "med-bio" meta-package to provide bioinformatics software instead of the discontinued Bio-Linux packages. However, I've recently set up a 3-node 'Proxmox-VE' cluster:

https://www.proxmox.com/en/proxmox-virtual-environment/overview

[Proxmox is a GPL server management system based on Debian]

I'm using the Proxmox cluster for a bioinformatics in schools project with the University of Edinburgh:

https://4273pi.org/

I'm also planning to use it for a new project with the IAEA in Vienna.
I think that giving biologists the choice of running the software they want under the OS they choose is very important when innovation is the priority of an organisation rather than centralisation of IT systems to reduce cost.

You can, of course, use Proxmox-VE as the node-provisioning and shared filesystem of an HPC cluster. Or, simply provide biologists with VMs running their OS of choice, administered by themselves, e.g. a Bio-Linux VM or vanilla Debian etc.

Finally, don't forget about Amdahl's Law:

https://en.wikipedia.org/wiki/Amdahl%27s_law

There is really no such thing as an HPC or HTP 'application', because it's the underlying resource management system of an HPC cluster that provides the 'HP'. In my experience, most bioinformatics applications are 'embarrassingly' parallel and in this case processes do not communicate with each other. The 'HP' is achieved by managing the workflow efficiently usi
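
To make the Amdahl's Law point above concrete, here is a minimal Python sketch of the standard formula, speedup = 1 / ((1 - p) + p / n), where p is the fraction of the work that can run in parallel and n is the number of workers. The function name `amdahl_speedup` and the example values (95% parallel, 64 workers) are assumptions chosen for illustration only; they are not measurements from any cluster mentioned in this thread.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Theoretical speedup from Amdahl's Law.

    parallel_fraction: share of the runtime that can be parallelised (0..1).
    workers: number of processes or nodes sharing the parallel part.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Hypothetical workloads: a fully independent ('embarrassingly'
    # parallel) job and one with a 5% serial part, on 1 to 64 workers.
    for workers in (1, 4, 16, 64):
        ideal = amdahl_speedup(1.0, workers)
        mostly = amdahl_speedup(0.95, workers)
        print(f"{workers:3d} workers: p=1.00 -> {ideal:6.2f}x, "
              f"p=0.95 -> {mostly:6.2f}x")
```

With p = 1.0 the speedup simply equals the number of workers, which is the idealised 'embarrassingly' parallel case where processes never communicate. With even a 5% serial part, 64 workers give only roughly a 15x speedup, and no number of workers can exceed 20x. That is one way to read the remark that the 'HP' comes from how the workload is scheduled and split up, not from the application itself.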
Any input for some talk about usage of Debian in HPC
Hi,

I have an invitation to have some talk with the title

Debian GNU/Linux for Scientific Research

Abstract:

Over the past decade, Enterprise Linux has dominated large-scale research computing infrastructure. However, recent developments have sparked increased interest in community-led alternatives. Debian GNU/Linux, a long-standing choice among researchers for supporting scientific work, is experiencing a renewed interest for High-Throughput Computing (HTC) and High-Performance Computing (HPC) applications. This presentation will provide an overview of how Debian is being utilized to support scientific research and will include a case study showcasing the migration of HTC operations from Enterprise Linux 7 (EL7) to Debian.

While I could talk about Debian Science and Debian Med in general it would be cool to reference to some real life examples where Debian is used in Science and what might be the reason to use Debian.

I personally would like to stress the "we package what we use" aspect and the "we mentor upstream to merge competence of the program with packaging skills" idea. Any input would be welcome to cover more ideas.

Kind regards
Andreas.

--
https://fam-tille.de