Re: [slurm-users] Using oversubscribe to hammer a node

2023-01-19 Thread Loris Bennett
Hi Rob,

"Groner, Rob"  writes:

> I'm trying to set up a specific partition where users can fight with the OS
> for dominance.  The oversubscribe property sounds like what I want, as it says
> "More than one job can execute simultaneously on the same compute resource."
> That's exactly what I want.  I've set up a node with 48 CPUs and
> oversubscribe set to force:4.  I then execute a job that requests 48 cpus,
> and that starts running.  I execute another job asking for 48 cores, and it
> gets assigned to the node...but it is not running, it's suspended.  I can
> execute 2 more jobs, and they'll all go on the node (so, 4x), but 3 will be
> suspended at any time.  I see the time slicing going on, but that isn't what
> I thought it would be...I thought all 4 tasks per cpu would be running at the
> same time.  Basically, I want the CPU/OS to work out the sharing of
> resources.  Otherwise, if one of the tasks that is running is just sitting
> there doing nothing, it's going to do that for its 30 seconds while other
> tasks are suspended, right?

Is --oversubscribe set for the jobs?
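
(If not, it can be requested explicitly at submission time, e.g.

    sbatch --oversubscribe --ntasks=48 job.sh

where job.sh is just a placeholder for the job script, although with the
partition set to FORCE I would expect oversubscription to be applied
regardless of what the job asks for.)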

> What I want to see is 4x the node's CPUs in tasks all running at the same 
> time, not time slicing, just for jobs using this partition.  Is that a thing?

It might be a thing.  I'm not sure it is a very sensible thing.  Time
slicing and context switching are still going to take place, with each
process getting a quarter of a core on average.  It is not clear that
you will actually increase throughput this way.  I would probably first
turn on hyperthreading to deal with jobs which have intermittent
CPU usage.
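
If you do enable hyperthreading, the node definition in slurm.conf then needs
to expose the threads, e.g. something like

    NodeName=node01 Sockets=2 CoresPerSocket=24 ThreadsPerCore=2 CPUs=96

where the socket and core counts are purely illustrative for a 48-core node.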

Still, since Slurm offers the possibility of oversubscription, I assume
there must be a use-case.

Cheers,

Loris

-- 
Dr. Loris Bennett (Herr/Mr)
ZEDAT, Freie Universität Berlin



Re: [slurm-users] Slurm - UnkillableStepProgram

2023-01-19 Thread Christopher Samuel

On 1/19/23 5:01 am, Stefan Staeglich wrote:


Hi,


Hiya,


I'm wondering where the UnkillableStepProgram is actually executed. According
to Mike it has to be available on every one of the compute nodes. This makes
sense only if it is executed there.


That's right, it's only executed on compute nodes.


But the slurm.conf man page of 21.08.x states:
UnkillableStepProgram
   Must be executable by user SlurmUser.  The file must be
accessible by the primary and backup control machines.

So I would expect it's executed on the controller node.


That's strange, my slurm.conf man page from a system still running 21.08 
says:


UNKILLABLE STEP PROGRAM SCRIPT
   This program can be used to take special actions to clean up
   the unkillable processes and/or notify system administrators.
   The program will be run as SlurmdUser (usually "root") on
   the compute node where UnkillableStepTimeout was triggered.

Ah, I see, there's a later "FILE AND DIRECTORY PERMISSIONS" part which 
has the text that you've found - that part's wrong! :-)
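
In case it's useful as a starting point, a minimal sketch of such a script
might look like the following - the log path and what gets collected are just
illustrative, adapt to taste:

    #!/bin/bash
    # Runs as SlurmdUser (usually root) on the compute node where
    # UnkillableStepTimeout was triggered; collect some state for debugging.
    LOG=/var/log/slurm/unkillable-$(hostname)-$(date +%s).log
    {
        echo "Unkillable step detected on $(hostname) at $(date)"
        # SLURM_JOB_ID may or may not be present in the script's
        # environment, so treat it as optional.
        echo "Job: ${SLURM_JOB_ID:-unknown}"
        ps auxf
        dmesg | tail -n 50
    } > "$LOG" 2>&1
    # Optionally notify the admins, e.g.:
    # mail -s "Unkillable step on $(hostname)" root < "$LOG"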


All the best,
Chris
--
Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA




[slurm-users] How to get allocated cpu cores for nodes in tres mode

2023-01-19 Thread Lu Weizheng
Hi all.

At my site, I configure the CPU and GPU resources as TRES 
(https://slurm.schedmd.com/tres.html), so multiple jobs can co-run on the same 
node.
The users want to know how many cores remain unallocated when they are 
submitting jobs, to help them choose which partition to use.
Is there any command to check how many CPU cores are still unallocated?
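
(The closest I have found so far is sinfo's CPU state field, e.g.

    sinfo -N -o "%N %C"

which prints allocated/idle/other/total CPUs per node, but I am not sure
whether that is the right approach when resources are managed as TRES.)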

Thanks!


Re: [slurm-users] Job cancelled into the future

2023-01-19 Thread Reed Dier
Just to hopefully close this out, I believe I was actually able to resolve this 
in “user-land” rather than mucking with the database.

I was able to requeue the bad jid’s, and they went pending.
Then I updated the jobs to a time limit of 60.
Then I scancelled the jobs, and they returned to a cancelled state, before they 
rolled off within about 10 minutes.
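
For the archives, the commands were roughly the following (the job ID is a
placeholder):

    scontrol requeue <jobid>
    scontrol update JobId=<jobid> TimeLimit=60
    scancel <jobid>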

Surprised I didn’t think to try requeueing earlier, but here’s to hoping that 
this did the trick, and I will have more accurate reporting and fewer “more 
time than is possible” log errors.

Thanks,
Reed

> On Jan 17, 2023, at 11:29 AM, Reed Dier  wrote:
> 
> So I was going to take a stab at trying to rectify this after taking care of 
> post-holiday matters.
> 
> Paste of the $CLUSTER_job_table table where I think I see the issue, and now 
> I just want to sanity check my steps to remediate.
> https://rentry.co/qhw6mg  (pastebin alternative 
> because markdown is paywalled for pastebin).
> 
> There are a number of job steps with a timelimit of 4294967295, whereas the 
> others of the same job array are 525600.
> Obviously I want to edit those time limits to sane limits (match them to the 
> others).
> I don’t see anything in the $CLUSTER_step_table that looks like it would need 
> to be modified to match, though I could be wrong.
> 
> But then the part of getting slurm to pick it up is where I’m wanting to make 
> sure I’m on the right page.
> Should I manually update the mod_time timestamp and slurm will catch that at 
> its next rollup?
> Or will slurm catch the change in the time limit and update the mod_time when 
> it sees it upon rollup?
> 
> I also don’t see any documentation stating how to manually trigger a rollup, 
> either via slurmdbd.conf or command line flag.
> Will it automagically perform a rollup at some predefined, non-configurable 
> interval, or when restarting the daemon?
> 
> Apologies if this is all trivial information, just trying to measure twice 
> and cut once.
> 
> Appreciate everyone’s help so far.
> 
> Thanks,
> Reed
> 
>> On Dec 23, 2022, at 7:18 PM, Chris Samuel  wrote:
>> 
>> On 20/12/22 6:01 pm, Brian Andrus wrote:
>> 
>>> You may want to dump the database, find what table/records need updated and 
>>> try updating them. If anything went south, you could restore from the dump.
>> 
>> +lots to making sure you've got good backups first, and stop slurmdbd before 
>> you start on the backups and don't restart it until you've made the changes, 
>> including setting the rollup times to be before the jobs started to make 
>> sure that the rollups include these changes!
>> 
>> When you start slurmdbd after making the changes it should see that it needs 
>> to do rollups and kick those off.
>> 
>> All the best,
>> Chris
>> -- 
>> Chris Samuel  :  http://www.csamuel.org/   :  
>> Berkeley, CA, USA



[slurm-users] Using oversubscribe to hammer a node

2023-01-19 Thread Groner, Rob
I'm trying to set up a specific partition where users can fight with the OS for 
dominance.  The oversubscribe property sounds like what I want, as it says 
"More than one job can execute simultaneously on the same compute resource."  
That's exactly what I want.  I've set up a node with 48 CPUs and oversubscribe 
set to force:4.  I then execute a job that requests 48 cpus, and that starts 
running.  I execute another job asking for 48 cores, and it gets assigned to 
the node...but it is not running, it's suspended.  I can execute 2 more jobs, 
and they'll all go on the node (so, 4x), but 3 will be suspended at any time.  I 
see the time slicing going on, but that isn't what I thought it would be...I 
thought all 4 tasks per cpu would be running at the same time.  Basically, I 
want the CPU/OS to work out the sharing of resources.  Otherwise, if one of the 
tasks that is running is just sitting there doing nothing, it's going to do 
that for its 30 seconds while other tasks are suspended, right?
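
The relevant bits of my slurm.conf look roughly like this (node and partition
names are placeholders):

    NodeName=node01 CPUs=48
    PartitionName=hammer Nodes=node01 OverSubscribe=FORCE:4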

What I want to see is 4x the node's CPUs in tasks all running at the same time, 
not time slicing, just for jobs using this partition.  Is that a thing?

Thanks.



Re: [slurm-users] Slurm - UnkillableStepProgram

2023-01-19 Thread Stefan Staeglich
Hi,

I'm wondering where the UnkillableStepProgram is actually executed. According 
to Mike it has to be available on every one of the compute nodes. This makes 
sense only if it is executed there.

But the slurm.conf man page of 21.08.x states:
   UnkillableStepProgram
  Must be executable by user SlurmUser.  The file must be 
accessible by the primary and backup control machines.

So I would expect it's executed on the controller node.

Best,
Stefan

Am Dienstag, 23. März 2021, 05:30:01 CET schrieb Chris Samuel:
> Hi Mike,
> 
> On 22/3/21 7:12 pm, Yap, Mike wrote:
> > # I presume UnkillableStepTimeout is set in slurm.conf and it acts as a
> > timer to trigger UnkillableStepProgram
> 
> That is correct.
> 
> > # UnkillableStepProgram can be used to send email or reboot a compute node
> > – question is how do we configure it ?
> 
> Also - or to automate collecting debug info (which is what we do) and
> then we manually intervene to reboot the node once we've determined
> there's no more useful info to collect.
> 
> It's just configured in your slurm.conf.
> 
> UnkillableStepProgram=/path/to/the/unkillable/step/script.sh
> 
> Of course this script has to be present on every compute node.
> 
> All the best,
> Chris


-- 
Stefan Stäglich,  Universität Freiburg,  Institut für Informatik
Georges-Köhler-Allee,  Geb.52,   79110 Freiburg,Germany

E-Mail : staeg...@informatik.uni-freiburg.de
WWW: ml.informatik.uni-freiburg.de
Telefon: +49 761 203-8223




Re: [slurm-users] srun jobfarming hassle question

2023-01-19 Thread Ohlerich, Martin
Hello Björn-Helge.


Thanks for reminding me about /sys/fs for checking OOM issues. I had already 
lost sight of that again.

In this case, there are more steps involved (one for each srun call). I'm not 
sure whether cgroup handles each separately, or just on a per-node basis. If the 
latter ... why do I have to specify --mem at all in each single srun step call? 
That is somehow illogical, imho. It would semantically mean "Please tell me the 
resources you need so that I can find reasonable slots to run your task. But 
don't worry! On a node, I don't care much anyway. Do as you like, as long as 
the node's total memory consumption stays below the threshold ...!" ;)
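
For context, what I am doing is roughly the following (task names and memory
values are made up):

    # inside one job allocation, several steps run concurrently on the node
    srun --ntasks=1 --mem=10G ./task_a &
    srun --ntasks=1 --mem=20G ./task_b &
    wait
    # (depending on the Slurm version, --exact may also be needed for the
    # steps to actually run side by side)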


Anyway, I will see soon. The user I am currently supporting with this runs 
quite memory-consuming stuff (bioinformatics). 

Thank you again!
Cheers, Martin



From: slurm-users  on behalf of 
Bjørn-Helge Mevik 
Sent: Thursday, 19 January 2023 08:23
To: slurm-us...@schedmd.com
Subject: Re: [slurm-users] srun jobfarming hassle question

"Ohlerich, Martin"  writes:

> Hello Björn-Helge.
>
>
> Sigh ...
>
> First of all, of course, many thanks! This indeed helped a lot!

Good!

> b) This only works if I have to specify --mem for a task. Although
> manageable, I wonder why one needs to be that restrictive. In
> principle, in the use case outlined, one task could use a bit less
> memory, and the other may require a bit more the half of the node's
> available memory. (So clearly this isn't always predictable.) I only
> hope that in such cases the second task does not die from OOM ... (I
> will know soon, I guess.)

As I understand it, Slurm (at least with cgroups) will only kill a step if it
uses more memory *in total* on a node than the job got allocated to the
node.  So if a job has 10 GiB allocated on a node, and a step runs two
tasks there, one task could use 9 GiB and the other 1 GiB without the
step being killed.

You can inspect the memory limits that are in effect in cgroups (v1) in
/sys/fs/cgroup/memory/slurm/uid_<uid>/job_<jobid> (usual location, at
least).
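
For example, from within a running job you could check the limit with
something like (assuming the layout above and cgroup v1):

    cat /sys/fs/cgroup/memory/slurm/uid_$(id -u)/job_${SLURM_JOB_ID}/memory.limit_in_bytes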

--
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo