Re: [slurm-users] Detecting non-MPI jobs running on multiple nodes

2022-09-29 Thread Loris Bennett
"Bernstein, Noam CIV USN NRL (6393) Washington DC (USA)"
 writes:

>  On Sep 29, 2022, at 10:34 AM, Steffen Grunewald wrote:
>
>  Hi Noam,
>
>  I'm wondering why one would want to know that - given that there are
>  approaches to multi-node operation beyond MPI (Charm++ comes to mind)?
>
> The thread title requested a way of detecting non-MPI jobs running on 
> multiple nodes.  I assumed that the requester knows, maybe based on their 
> users' software, that there are no legitimate ways for them to run on 
> multiple nodes without MPI.
> Actually, we have users that run embarrassingly parallel jobs which just ssh 
> to the other nodes and gather files, so clearly it can be done in a useful 
> way with very low-tech approaches, but that's an oddball (and just plain
> old) software package.

There may indeed be legitimate ways for non-MPI jobs to be running on
multiple nodes, but that's a bit of an edge case.  However, such cases
would be fine, as long as the resources requested are being used
efficiently.  Thus, Ward's suggestion about checking for cgroups seems
the most general solution.  Having said that, it would also be useful to
then check the head node for 'mpirun' or similar.
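
A very rough, untested sketch of such a check (it assumes passwordless ssh
from the admin host to the batch hosts, and takes the absence of 'mpirun',
'mpiexec' and 'srun' on the head node as the suspicious sign):

  #!/usr/bin/env python3
  # Untested sketch: flag running multi-node jobs whose batch host is
  # not running any of the usual MPI launchers.  Assumes passwordless
  # ssh to the compute nodes.
  import subprocess

  # %A = job ID, %D = number of nodes, %B = batch (head) host
  squeue = subprocess.run(
      ["squeue", "-h", "-t", "RUNNING", "-o", "%A %D %B"],
      capture_output=True, text=True, check=True,
  )

  for line in squeue.stdout.splitlines():
      jobid, nnodes, batch_host = line.split()
      if int(nnodes) < 2:
          continue  # only multi-node jobs are of interest
      # Pattern is quoted so the remote shell doesn't treat '|' as a pipe
      check = subprocess.run(
          ["ssh", batch_host, "pgrep -f 'mpirun|mpiexec|srun'"],
          capture_output=True, text=True,
      )
      if check.returncode != 0:  # pgrep found no launcher process
          print(f"job {jobid}: {nnodes} nodes, "
                f"no MPI launcher on {batch_host}")

One would still have to whitelist legitimate non-MPI multi-node jobs like
the ones Noam mentions, of course.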

Cheers,

Loris



Re: [slurm-users] Detecting non-MPI jobs running on multiple nodes

2022-09-29 Thread Bernstein, Noam CIV USN NRL (6393) Washington DC (USA)
On Sep 29, 2022, at 10:34 AM, Steffen Grunewald <steffen.grunew...@aei.mpg.de> wrote:

Hi Noam,

I'm wondering why one would want to know that - given that there are
approaches to multi-node operation beyond MPI (Charm++ comes to mind)?

The thread title requested a way of detecting non-MPI jobs running on multiple 
nodes.  I assumed that the requester knows, maybe based on their users' 
software, that there are no legitimate ways for them to run on multiple nodes 
without MPI. Actually, we have users that run embarrassingly parallel jobs 
which just ssh to the other nodes and gather files, so clearly it can be done 
in a useful way with very low-tech approaches, but that's an oddball (and just
plain old) software package.


Re: [slurm-users] Detecting non-MPI jobs running on multiple nodes

2022-09-29 Thread Steffen Grunewald
On Thu, 2022-09-29 at 14:03:58 +, Bernstein, Noam CIV USN NRL (6393) 
Washington DC (USA) wrote:
> Can you check slurm for a job that requests multiple nodes but doesn't have 
> mpirun (or srun, or mpiexec) running on its head node?

Hi Noam,

I'm wondering why one would want to know that - given that there are
approaches to multi-node operation beyond MPI (Charm++ comes to mind)?

Best,
 Steffen

-- 
Steffen Grunewald, Cluster Administrator
Max Planck Institute for Gravitational Physics (Albert Einstein Institute)
Am Mühlenberg 1 * D-14476 Potsdam-Golm * Germany
~~~
Fon: +49-331-567 7274
Mail: steffen.grunewald(at)aei.mpg.de
~~~



Re: [slurm-users] Detecting non-MPI jobs running on multiple nodes

2022-09-29 Thread Ward Poelmans

Hi Loris,

On 29/09/2022 09:26, Loris Bennett wrote:


I can see that this is potentially not easy, since an MPI job might
still have phases where only one core is actually being used.


Slurm will create the needed cgroups on all the nodes that are part of the job
when the job starts.  So a cron job on each node could check whether any job
cgroups there contain no processes at all.
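
A rough sketch of that check, to be run from cron on each compute node
(it assumes cgroup v1 with Slurm's cpuset hierarchy under
/sys/fs/cgroup/cpuset/slurm; the path differs with cgroup v2 or other
controllers):

  #!/usr/bin/env python3
  # Rough sketch: report Slurm job cgroups on this node that contain
  # no processes.  Assumes cgroup v1 with the cpuset controller;
  # adjust CGROUP_ROOT for your setup (cgroup v2 uses a unified tree).
  from pathlib import Path

  CGROUP_ROOT = Path("/sys/fs/cgroup/cpuset/slurm")

  for job_dir in CGROUP_ROOT.glob("uid_*/job_*"):
      pids = (job_dir / "cgroup.procs").read_text().split()
      if not pids:
          print(f"{job_dir.name}: cgroup exists but has no processes")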

Ward




Re: [slurm-users] Detecting non-MPI jobs running on multiple nodes

2022-09-29 Thread Bernstein, Noam CIV USN NRL (6393) Washington DC (USA)
Can you check slurm for a job that requests multiple nodes but doesn't have 
mpirun (or srun, or mpiexec) running on its head node?


Re: [slurm-users] Detecting non-MPI jobs running on multiple nodes

2022-09-29 Thread Loris Bennett
Hi Ole,

Ole Holm Nielsen writes:

> Hi Loris,
>
> On 9/29/22 09:26, Loris Bennett wrote:
>> Has anyone already come up with a good way to identify non-MPI jobs which
>> request multiple cores but don't restrict themselves to a single node,
>> leaving cores idle on all but the first node?
>> I can see that this is potentially not easy, since an MPI job might
>> still have phases where only one core is actually being used.
>
> Just an idea: The "pestat -F" tool[1] will tell you if any nodes have an
> "unexpected" CPU load.  If you see the same JobID runing on multiple nodes 
> with a too low CPU load, that might point to a job such as you describe.
>
> /Ole
>
> [1] https://github.com/OleHolmNielsen/Slurm_tools/tree/master/pestat

I do already use 'pestat -F', although this flags over 100 of our 170
nodes, so it results in a bit of information overload.  I guess it would
be nice if the sensitivity of the flagging could be tweaked on the
command line, so that only the worst nodes are shown.

I also use some wrappers around 'sueff' from

  https://github.com/ubccr/stubl

to generate part of an ASCII dashboard (an dasciiboard?), which looks
like

  Username  Mem_Request  Max_Mem_Use  CPU_Efficiency  Number_of_CPUs_In_Use
  alpha     42000M       0.03Gn       48.80%          (0.98 of 2)
  beta      10500M       11.01Gn      99.55%          (3.98 of 4)
  gamma     8000M        8.39Gn       99.64%          (63.77 of 64)
  ...
  chi       varied       3.96Gn       83.65%          (248.44 of 297)
  phi       1800M        1.01Gn       98.79%          (248.95 of 252)
  omega     16G          4.61Gn       99.69%          (127.60 of 128)

  == Above data from: Thu 29 Sep 15:26:29 CEST 2022 =================

and just loops every 30 seconds.  This is what I use to spot users with
badly configured jobs.

However, I'd really like to be able to identify non-MPI jobs on multiple
nodes automatically.

Cheers,

Loris

-- 
Dr. Loris Bennett (Herr/Mr)
ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de



Re: [slurm-users] Detecting non-MPI jobs running on multiple nodes

2022-09-29 Thread Loris Bennett
Hi Davide,

That is an interesting idea.  We already do some averaging, but over the
whole of the past month.  For each user we use the output of seff to
generate two scatterplots: CPU-efficiency vs. CPU-hours and
memory-efficiency vs. GB-hours.  See
 
  
https://www.fu-berlin.de/en/sites/high-performance-computing/Dokumentation/Statistik

However, I am mainly interested in being able to cancel some of the inefficient
jobs before they have run for too long.

Cheers,

Loris

Davide DelVento writes:

> At my previous job, cron jobs ran on every node measuring possibly
> idle cores; the measurements were averaged over the duration of each
> job and reported (the day after) via email to the user support team.
> I believe they stopped doing so when compute became (relatively)
> cheap and memory and I/O became the expensive resources.
>
> I know it does not help you much, but perhaps it is something to think about.
>
> On Thu, Sep 29, 2022 at 1:29 AM Loris Bennett wrote:
>>
>> Hi,
>>
>> Has anyone already come up with a good way to identify non-MPI jobs which
>> request multiple cores but don't restrict themselves to a single node,
>> leaving cores idle on all but the first node?
>>
>> I can see that this is potentially not easy, since an MPI job might
>> still have phases where only one core is actually being used.
>>
>> Cheers,
>>
>> Loris
>>
>> --
>> Dr. Loris Bennett (Herr/Mr)
>> ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
>>
-- 
Dr. Loris Bennett (Herr/Mr)
ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de



Re: [slurm-users] Detecting non-MPI jobs running on multiple nodes

2022-09-29 Thread Ole Holm Nielsen

Hi Loris,

On 9/29/22 09:26, Loris Bennett wrote:

Has anyone already come up with a good way to identify non-MPI jobs which
request multiple cores but don't restrict themselves to a single node,
leaving cores idle on all but the first node?

I can see that this is potentially not easy, since an MPI job might
still have phases where only one core is actually being used.


Just an idea: The "pestat -F" tool[1] will tell you if any nodes have an 
"unexpected" CPU load.  If you see the same JobID runing on multiple nodes 
with a too low CPU load, that might point to a job such as you describe.
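
For example, something like this untested sketch could count how many
flagged nodes each job shows up on (it assumes pestat's default column
layout, where each line ends with JobID/User pairs):

  #!/usr/bin/env python3
  # Untested sketch: count how many "pestat -F" flagged nodes each job
  # appears on.  Assumes the default column layout (Hostname Partition
  # State Use/Tot CPUload Memsize Freemem JobID User ...).
  import re
  import subprocess
  from collections import Counter

  # pestat colours flagged values; strip any ANSI escape codes first
  ansi = re.compile(r"\x1b\[[0-9;]*m")
  out = ansi.sub("", subprocess.run(["pestat", "-F"],
                                    capture_output=True, text=True).stdout)

  nodes_per_job = Counter()
  for line in out.splitlines():
      fields = line.split()
      if len(fields) < 9:         # header lines, or nodes with no jobs
          continue
      for jobid in fields[7::2]:  # the trailing JobID/User pairs
          if jobid.isdigit():
              nodes_per_job[jobid] += 1

  for jobid, n in nodes_per_job.most_common():
      if n > 1:
          print(f"job {jobid} is running on {n} flagged nodes")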


/Ole

[1] https://github.com/OleHolmNielsen/Slurm_tools/tree/master/pestat



Re: [slurm-users] Detecting non-MPI jobs running on multiple nodes

2022-09-29 Thread Davide DelVento
At my previous job, cron jobs ran on every node measuring possibly
idle cores; the measurements were averaged over the duration of each
job and reported (the day after) via email to the user support team.
I believe they stopped doing so when compute became (relatively)
cheap and memory and I/O became the expensive resources.

I know it does not help you much, but perhaps it is something to think about.

On Thu, Sep 29, 2022 at 1:29 AM Loris Bennett wrote:
>
> Hi,
>
> Has anyone already come up with a good way to identify non-MPI jobs which
> request multiple cores but don't restrict themselves to a single node,
> leaving cores idle on all but the first node?
>
> I can see that this is potentially not easy, since an MPI job might
> still have phases where only one core is actually being used.
>
> Cheers,
>
> Loris
>
> --
> Dr. Loris Bennett (Herr/Mr)
> ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
>



[slurm-users] Detecting non-MPI jobs running on multiple nodes

2022-09-29 Thread Loris Bennett
Hi,

Has anyone already come up with a good way to identify non-MPI jobs which
request multiple cores but don't restrict themselves to a single node,
leaving cores idle on all but the first node?

I can see that this is potentially not easy, since an MPI job might
still have phases where only one core is actually being used.

Cheers,

Loris

-- 
Dr. Loris Bennett (Herr/Mr)
ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de