Re: Any input for some talk about usage of Debian in HPC

2024-05-20 Thread Tony Travis


On 20/05/2024 21:00, Steven Robbins wrote:

> Hello,
>
> On Sunday, May 19, 2024 9:31:02 A.M. CDT Tony Travis wrote:
>
> > You can't ignore the host OS when you talk about HPC applications and
> > the HEP (High Energy Physics) community put a lot of effort into
> > developing good node provisioning systems and job-scheduling for HPC.
> > Consequently, there was a significant bias towards support for HEP
> > applications running under CentOS and less support for bioinformatics.
>
> I've been out of academia for decades, but HEP was my first love and
> neuroimaging my second, so this paragraph really piqued my interest.  Can you
> briefly say what are the different needs of HEP and bioinformatics and how they
> are in conflict?


Hi, Steve.

Many HEP applications involve a lot of floating-point arithmetic and are
computationally intensive. By contrast, most bioinformatics applications
do not require floating-point arithmetic: they are dominated by speed of
memory access and memory size. Optimisations used in HEP calculations to
keep everything in the high-speed CPU cache don't help with this kind of
memory access.


One area of bioinformatics in particular that has this sort of memory 
requirement is sequence alignment and sequence assembly. Some efforts 
have been made to speed this up using GPGPU and SIMD CPU instructions, 
but I've found it all very complicated and disappointing to be honest.
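
To illustrate the point about integer arithmetic and memory, here is a minimal Python sketch of global (Needleman-Wunsch style) alignment scoring. The function name `global_alignment_score` and the scoring values are assumptions chosen for illustration; production aligners use far more elaborate indexing and heuristics. The sketch only shows that the core recurrence needs nothing but integer additions and comparisons, while the table it fills grows with the product of the two sequence lengths.

```python
def global_alignment_score(a: str, b: str,
                           match: int = 1,
                           mismatch: int = -1,
                           gap: int = -2) -> int:
    """Return the best global alignment score of sequences a and b.

    Illustrative scoring scheme (assumed, not from any specific tool):
    +1 per match, -1 per mismatch, -2 per gap position.
    """
    rows, cols = len(a) + 1, len(b) + 1

    # The full dynamic-programming table: (len(a)+1) x (len(b)+1) integers.
    # Memory use, not floating-point throughput, is what grows here.
    table = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against the empty sequence costs one gap per symbol.
    for i in range(1, rows):
        table[i][0] = i * gap
    for j in range(1, cols):
        table[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            table[i][j] = max(
                table[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                table[i - 1][j] + gap,       # gap in b
                table[i][j - 1] + gap,       # gap in a
            )

    return table[-1][-1]


if __name__ == "__main__":
    print(global_alignment_score("GATTACA", "GATTACA"))
    print(global_alignment_score("ACGT", "ACT"))
```

For two sequences of length n and m the table holds (n+1) x (m+1) integers, so doubling both lengths roughly quadruples the memory touched.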


Recent successes for GPGPU applications in bioinformatics are, however,
base-calling of 'long' DNA sequencing reads and TensorFlow ML (Machine
Learning) methods for, e.g., error-correcting DNA/RNA sequence reads and
predicting Transcription Factor Binding Sites.


None of this requires the use of floating point calculations in the 
frequency/Fourier domain that many HEP applications do. I must admit 
that my views are largely based on experience of helping my friend do 
DFT (Density Functional Theory) simulations of protein 'docking' domains on a 
Beowulf cluster that we built for chemical modelling and bioinformatics!


I also worked for several years doing image analysis with Physicists :-)

Bye,

  Tony.

--
Minke Informatics Limited, Registered in Scotland - Company No. SC419028
Registered Office: 3 Donview, Bridge of Alford, AB33 8QJ, Scotland (UK)
tel. +44(0)19755 63548    http://minke-informatics.co.uk
mob. +44(0)7985 078324    mailto:tony.tra...@minke-informatics.co.uk

Re: Any input for some talk about usage of Debian in HPC

2024-05-20 Thread Steven Robbins

Hello,

On Sunday, May 19, 2024 9:31:02 A.M. CDT Tony Travis wrote:

> You can't ignore the host OS when you talk about HPC applications and
> the HEP (High Energy Physics) community put a lot of effort into
> developing good node provisioning systems and job-scheduling for HPC.
> Consequently, there was a significant bias towards support for HEP
> applications running under CentOS and less support for bioinformatics.

I've been out of academia for decades, but HEP was my first love and 
neuroimaging my second, so this paragraph really piqued my interest.  Can you 
briefly say what are the different needs of HEP and bioinformatics and how they 
are in conflict?

-Steve


Re: Any input for some talk about usage of Debian in HPC

2024-05-19 Thread Diane Trout

On Sun, 2024-05-19 at 13:12 +0200, Andreas Tille wrote:
> Hi,
> 
> I have an invitation to have some talk with the title
> 
>    Debian GNU/Linux for Scientific Research
> 
> 

I learned from some of the LIGO sysadmins at Caltech that some of the
LIGO systems are using Debian.

The LIGO documentation
https://computing.docs.ligo.org/guide/software/debian/

mentions a server for their custom apt repository. I looked around that
server and found this information page about their cluster:
https://hypatia.aei.mpg.de/cgi-bin/hypatia-index.cgi?p=main

It looks like a fairly large HPC cluster running Debian and using the
Debian tool FAI.

Diane

Re: Any input for some talk about usage of Debian in HPC

2024-05-19 Thread Tony Travis


On 19/05/2024 12:12, Andreas Tille wrote:

> Hi,
>
> I have an invitation to have some talk with the title
>
>    Debian GNU/Linux for Scientific Research
>
> Abstract:
>
> Over the past decade, Enterprise Linux has dominated large-scale
> research computing infrastructure. However, recent developments have
> sparked increased interest in community-led alternatives. Debian
> GNU/Linux, a long-standing choice among researchers for supporting
> scientific work, is experiencing a renewed interest for High-Throughput
> Computing (HTC) and High-Performance Computing (HPC) applications.  This
> presentation will provide an overview of how Debian is being utilized to
> support scientific research and will include a case study showcasing the
> migration of HTC operations from Enterprise Linux 7 (EL7) to Debian.
>
> While I could talk about Debian Science and Debian Med in general it
> would be cool to reference to some real life examples where Debian is
> used in Science and what might be the reason to use Debian.


Hi, Andreas.

The Sanger Centre in the UK use Ubuntu + OpenStack + Ceph:

https://www.sanger.ac.uk/group/core-software-services/

I realise that it's not Debian, but it is based on Debian. I went there 
many years ago when they were running Debian on DEC Alpha AXPs, but 
they moved to CentOS because many other Academic HPC centres were using 
it, including ours when I worked at the University of Aberdeen.


This was not a good experience, and they decided to change to Ubuntu 
mainly because of the support provided by Canonical for OpenStack and 
Ceph. However, in my opinion, CentOS/RHEL is not a good platform for 
bioinformatics because the 'Enterprise' approach stifles innovation.


You can't ignore the host OS when you talk about HPC applications and 
the HEP (High Energy Physics) community put a lot of effort into 
developing good node provisioning systems and job-scheduling for HPC. 
Consequently, there was a significant bias towards support for HEP 
applications running under CentOS and less support for bioinformatics.


This was partly the motivation underlying our development of Bio-Linux 
in order to provide biologists with an alternative platform running on 
their own hardware instead of struggling to get the IT department to 
port the software they wanted to use to CentOS. In that respect the 
Debian-Med project was fundamentally important in helping biologists do 
their work outside of the centrally managed 'Enterprise' oriented IT 
policy imposed on us by Universities and Research Institutes.


The Sanger Centre provide a centrally managed HPC that is 
'biologist-friendly' and, I think, is an excellent model of how it 
should be done. However, it does not support the view that Debian should 
be the HPC OS because the main reason they chose Ubuntu was the 
commercial support for OpenStack and Ceph provided by Canonical.



> I personally would like to stress the "we package what we use" aspect
> and the "we mentor upstream to merge competence of the program with
> packaging skills" idea.  Any input would be welcome to cover more ideas.


As you might remember, I built and I advocate the use of 'departmental' 
or 'research-group' clusters. These are much more powerful than an 
individual biologist's personal laptop, but are under the administrative 
control of the department or research group that funded their purchase.


In the past, I've used various HPC node-provisioning, cluster filesystem 
and job submission systems running under one version or another of 
Bio-Linux, now using your "med-bio" meta-package to provide 
bioinformatics software instead of the discontinued Bio-Linux packages.


However, I've recently set up a 3-node 'Proxmox-VE' cluster:


https://www.proxmox.com/en/proxmox-virtual-environment/overview


[Proxmox is a GPL server management system based on Debian]

I'm using the Proxmox cluster for a bioinformatics in schools project 
with the University of Edinburgh:



https://4273pi.org/


I'm also planning to use it for a new project with the IAEA in Vienna.

I think that giving biologists the choice of running the software they 
want under the OS they choose is very important when innovation is the 
priority of an organisation rather than centralisation of IT systems to 
reduce cost. You can, of course, use Proxmox-VE as the node-provisioning 
and shared filesystem of an HPC cluster. Or, simply provide biologists 
with VMs running their OS of choice, administered by themselves e.g. a 
Bio-Linux VM or vanilla Debian etc. etc.


Finally, don't forget about Amdahl's Law:


https://en.wikipedia.org/wiki/Amdahl%27s_law
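
As a reminder of what the law says: if a fraction p of a job's runtime can be spread over N workers and the rest stays serial, the best possible speedup is 1 / ((1 - p) + p / N). A minimal Python sketch follows; the function name `amdahl_speedup` and the sample fractions are illustrative assumptions, not taken from any particular scheduler or cluster.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Theoretical speedup under Amdahl's law.

    parallel_fraction: share of the runtime (0.0 to 1.0) that scales
                       perfectly across workers.
    workers:           number of workers (nodes, cores, ...), at least 1.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")

    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Hypothetical workloads: half serial, mostly parallel, fully parallel.
    for p in (0.5, 0.95, 1.0):
        for n in (4, 64, 1024):
            print(f"p={p:<4} N={n:<5} speedup={amdahl_speedup(p, n):.2f}")
```

With p = 0.95 the speedup can never exceed 20 however many nodes are added, whereas a workload with p close to 1 scales almost linearly with the number of workers.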


There is really no such thing as an HPC or HTP 'application', because 
it's the underlying resource management system of an HPC cluster that 
provides the 'HP'. In my experience, most bioinformatics applications 
are 'embarrassingly' parallel and in this case processes do not 
communicate with each other. The 'HP' is achieved by managing the 
workflow efficiently usi
