[slurm-users] Re: problem with squeue --json with version 24.05.1

2024-07-03 Thread Ümit Seren via slurm-users
We experience the same issue.

SLURM 24.05.1 segfaults with squeue --json and squeue --json=v0.0.41 but works
with squeue --json=v0.0.40.
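
Until the crash is fixed, pinning the data_parser version explicitly is the
workaround we use. A minimal sketch (the jq filter is illustrative only and
assumes the top-level "jobs" array of the JSON output):

# Workaround sketch: request the older data_parser explicitly and count
# the jobs in the returned JSON.
squeue --json=v0.0.40 | jq '.jobs | length'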


From: Markus Köberl via slurm-users 
Date: Wednesday, 3. July 2024 at 15:15
To: Joshua Randall 
Cc: slurm-users@lists.schedmd.com 
Subject: [slurm-users] Re: problem with squeue --json with version 24.05.1
On Wednesday, 3 July 2024 13:26:25 CEST Joshua Randall wrote:
> Markus,
>
> I had a similar problem after upgrading from v23 to v24 but found that
> specifying _any_ valid data version worked for me, it was only
> specifying `--json` without a version that triggered an error (which
> in my case was I believe a segfault from sinfo rather than a malloc
> error from squeue - but as these are both memory issues it seems
> possible they could both potentially arise from the same underlying
> library issue presenting differently in different CLI tools). So the
> underlying issue _may_ be with the logic that attempts to determine
> what the latest data version is and to load that, whereas specifying
> any valid version explicitly may work.
>
> Are you able to run `squeue --json=v0.0.41` successfully?

It seems to be a problem only with squeue and data parser version v0.0.41;
it also affects 24.05.0 in the same way.

$ squeue --json=v0.0.41
malloc(): corrupted top size
Aborted

data parser version v0.0.40 works, v0.0.39 does not return anything.


regards
Markus
--
Markus Koeberl
Graz University of Technology
Signal Processing and Speech Communication Laboratory
E-mail: markus.koeb...@tugraz.at

-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


Re: [slurm-users] slurmctld/slurmdbd (code=exited, status=217/USER)

2024-01-19 Thread Ümit Seren
Looks like the slurm user does not exist on the system.
Did you run slurmctld and slurmdbd as root before?
If you remove the two lines (User, Group), the services will start.
But it is recommended to create a dedicated slurm user for that:
https://slurm.schedmd.com/quickstart_admin.html#daemons
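
For reference, a minimal sketch of creating such a dedicated user (the UID/GID
of 450 is an arbitrary example; use the same IDs on every node of the cluster):

# Create a system group and user for Slurm; IDs are placeholders and
# must match across all cluster nodes.
groupadd --system --gid 450 slurm
useradd  --system --uid 450 --gid slurm --shell /sbin/nologin \
         --home-dir /var/lib/slurm --comment "Slurm workload manager" slurm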



On Fri, Jan 19, 2024, 16:02 Miriam Olmi  wrote:

> Hi all,
>
> I am having some issue with the new version of slurm 23.11.0-1.
>
> I had already installed and configured slurm 23.02.3-1 on my cluster and
> all the services were active and running properly.
>
> After installing the new version of Slurm with the same procedure, the
> slurmctld and slurmdbd daemons all fail to start with the same error:
>
>  (code=exited, status=217/USER)
>
> And investigating the problem with the command journalctl -xe I find:
>
> slurmctld.service: Failed to determine user credentials: No such process
> slurmctld.service: Failed at step USER spawning /usr/sbin/slurmctld: No
> such process
>
>
> I had a look at the slurmctld.service file for both the slurm versions and
> I found the following differences in the [Service] section.
>
> From the slurmctld.service file of slurm 23.02.3-1:
>
> [Service]
> Type=simple
> EnvironmentFile=-/etc/sysconfig/slurmctld
> EnvironmentFile=-/etc/default/slurmctld
> ExecStart=/usr/sbin/slurmctld -D -s $SLURMCTLD_OPTIONS
> ExecReload=/bin/kill -HUP $MAINPID
> LimitNOFILE=65536
> TasksMax=infinity
>
>
> From the slurmctld.service file of slurm 23.11.0-1:
>
> [Service]
> Type=notify
> EnvironmentFile=-/etc/sysconfig/slurmctld
> EnvironmentFile=-/etc/default/slurmctld
> User=slurm
> Group=slurm
> ExecStart=/usr/sbin/slurmctld --systemd $SLURMCTLD_OPTIONS
> ExecReload=/bin/kill -HUP $MAINPID
> LimitNOFILE=65536
> TasksMax=infinity
>
>
> I think the presence of the new lines regarding the slurm user might be the
> problem, but I am not sure and I have no idea how to solve it.
>
> Can anyone help me?
>
> Thanks in advance,
> Miriam
>
>
>
>
>


Re: [slurm-users] Need help with running multiple instances/executions of a batch script in parallel (with NVIDIA HGX A100 GPU as a Gres)

2024-01-19 Thread Ümit Seren
Maybe also post the output of scontrol show job <jobid> to check the other
resources allocated for the job.
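
For example (the job id is a placeholder; exact field names depend on your
Slurm version):

# Show only the resource-related fields of a job's allocation.
scontrol show job 217 | grep -E 'TRES|Gres|NumNodes|NumCPUs'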



On Thu, Jan 18, 2024, 19:22 Kherfani, Hafedh (Professional Services, TC) <
hafedh.kherf...@hpe.com> wrote:

> Hi Ümit, Troy,
>
>
>
> I removed the line “#SBATCH --gres=gpu:1” and changed the sbatch
> directive “--gpus-per-node=4” to “--gpus-per-node=1”, but I am still getting
> the same result: when running multiple sbatch commands for the same script,
> only one job (the first execution) is running, and all subsequent jobs are in
> a pending state (REASON reported as “Resources” for the immediately following
> job in the queue, and “Priority” for the remaining ones) …
>
>
>
> As for the output of the “scontrol show job <jobid>” command: I don’t see a
> “TRES” field on its own; I see the field “TresPerNode=gres/gpu:1” (the value
> at the end of the line corresponds to the value specified in the
> “--gpus-per-node=” directive).
>
>
>
> PS: Is it normal/expected (in the output of scontrol show job command) to
> have “Features=(null)” ? I was expecting to see Features=gpu ….
>
>
>
>
>
> Best regards,
>
>
>
> *Hafedh *
>
>
>
> *From:* slurm-users  *On Behalf Of
> *Baer, Troy
> *Sent:* Thursday, 18 January 2024 3:47 PM
> *To:* Slurm User Community List 
> *Subject:* Re: [slurm-users] Need help with running multiple
> instances/executions of a batch script in parallel (with NVIDIA HGX A100
> GPU as a Gres)
>
>
>
> Hi Hafedh,
>
>
>
> Your job script has the sbatch directive “--gpus-per-node=4” set.  I
> suspect that if you look at what’s allocated to the running job by doing
> “scontrol show job <jobid>” and looking at the TRES field, it’s been
> allocated 4 GPUs instead of one.
>
>
>
> Regards,
>
> --Troy
>
>
>
> *From:* slurm-users  *On Behalf Of
> *Kherfani, Hafedh (Professional Services, TC)
> *Sent:* Thursday, January 18, 2024 9:38 AM
> *To:* Slurm User Community List 
> *Subject:* Re: [slurm-users] Need help with running multiple
> instances/executions of a batch script in parallel (with NVIDIA HGX A100
> GPU as a Gres)
>
>
>
>
> Hi Noam and Matthias,
>
>
>
> Thanks both for your answers.
>
>
>
> I replaced the “#SBATCH --gres=gpu:4“ directive (in the batch script) with
> “#SBATCH --gres=gpu:1“ as you suggested, but it didn’t make a difference:
> running this batch script 3 times results in the first job being in a running
> state, while the second and third jobs remain in a pending state …
>
>
>
> [slurmtest@c-a100-master test-batch-scripts]$ cat gpu-job.sh
>
> #!/bin/bash
>
> #SBATCH --job-name=gpu-job
>
> #SBATCH --partition=gpu
>
> #SBATCH --nodes=1
>
> #SBATCH --gpus-per-node=4
>
> #SBATCH --gres=gpu:1        # Changed from ‘4’ to ‘1’
>
> #SBATCH --tasks-per-node=1
>
> #SBATCH --output=gpu_job_output.%j
>
> #SBATCH --error=gpu_job_error.%j
>
>
>
> hostname
>
> date
>
> sleep 40
>
> pwd
>
>
>
> [slurmtest@c-a100-master test-batch-scripts]$ sbatch gpu-job.sh
>
> Submitted batch job *217*
>
> [slurmtest@c-a100-master test-batch-scripts]$ squeue
>
>  JOBID PARTITION NAME USER ST   TIME  NODES
> NODELIST(REASON)
>
>217   gpu  gpu-job slurmtes  R   0:02  1
> c-a100-cn01
>
> [slurmtest@c-a100-master test-batch-scripts]$ sbatch gpu-job.sh
>
> Submitted batch job *218*
>
> [slurmtest@c-a100-master test-batch-scripts]$ sbatch gpu-job.sh
>
> Submitted batch job *219*
>
> [slurmtest@c-a100-master test-batch-scripts]$ squeue
>
>  JOBID PARTITION NAME USER ST   TIME  NODES
> NODELIST(REASON)
>
>219   gpu  gpu-job slurmtes *PD*   0:00  1
> (Priority)
>
>218   gpu  gpu-job slurmtes *PD*   0:00  1
> (Resources)
>
>217   gpu  gpu-job slurmtes  *R*   0:07  1
> c-a100-cn01
>
>
>
> Basically I’m seeking some help/hints on how to tell Slurm, from the batch
> script for example, “I want only 1 or 2 GPUs to be used/consumed by this
> job”, so that I can run the batch script/job a couple of times with the
> sbatch command and confirm that we can indeed have multiple jobs each using
> a GPU and running in parallel at the same time.
>
>
>
> Makes sense?
>
>
>
>
>
> Best regards,
>
>
>
> *Hafedh *
>
>
>
> *From:* slurm-users  *On Behalf Of
> *Bernstein, Noam CIV USN NRL (6393) Washington DC (USA)
> *Sent:* Thursday, 18 January 2024 2:30 PM
> *To:* Slurm User Community List 
> *Subject:* Re: [slurm-users] Need help with running multiple
> instances/executions of a batch script in parallel (with NVIDIA HGX A100
> GPU as a Gres)
>
>
>
> On Jan 18, 2024, at 7:31 AM, Matthias Loose  wrote:
>
>
>
> Hi Hafedh,
>
> I'm no expert on the GPU side of SLURM, but looking at your current
> configuration, to me it's working as intended at the moment

Re: [slurm-users] Need help with running multiple instances/executions of a batch script in parallel (with NVIDIA HGX A100 GPU as a Gres)

2024-01-18 Thread Ümit Seren
This line also has to be changed:

#SBATCH --gpus-per-node=4 → #SBATCH --gpus-per-node=1

--gpus-per-node seems to be the new parameter that is replacing the --gres=
one, so you can remove the --gres line completely.
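
For clarity, a minimal sketch of how the batch script would then look
(directives copied from your script, with only the GPU request changed;
this assumes you want exactly one GPU per job):

#!/bin/bash
#SBATCH --job-name=gpu-job
#SBATCH --partition=gpu
#SBATCH --nodes=1
#SBATCH --gpus-per-node=1      # request a single GPU; no --gres line needed
#SBATCH --tasks-per-node=1
#SBATCH --output=gpu_job_output.%j
#SBATCH --error=gpu_job_error.%j

hostname
date
sleep 40
pwd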

Best
Ümit


From: slurm-users  on behalf of 
Kherfani, Hafedh (Professional Services, TC) 
Date: Thursday, 18. January 2024 at 15:40
To: Slurm User Community List 
Subject: Re: [slurm-users] Need help with running multiple instances/executions 
of a batch script in parallel (with NVIDIA HGX A100 GPU as a Gres)
Hi Noam and Matthias,

Thanks both for your answers.

I replaced the “#SBATCH --gres=gpu:4“ directive (in the batch script) with
“#SBATCH --gres=gpu:1“ as you suggested, but it didn’t make a difference:
running this batch script 3 times results in the first job being in a running
state, while the second and third jobs remain in a pending state …

[slurmtest@c-a100-master test-batch-scripts]$ cat gpu-job.sh
#!/bin/bash
#SBATCH --job-name=gpu-job
#SBATCH --partition=gpu
#SBATCH --nodes=1
#SBATCH --gpus-per-node=4
#SBATCH --gres=gpu:1        # Changed from ‘4’ to ‘1’
#SBATCH --tasks-per-node=1
#SBATCH --output=gpu_job_output.%j
#SBATCH --error=gpu_job_error.%j

hostname
date
sleep 40
pwd

[slurmtest@c-a100-master test-batch-scripts]$ sbatch gpu-job.sh
Submitted batch job 217
[slurmtest@c-a100-master test-batch-scripts]$ squeue
 JOBID PARTITION NAME USER ST   TIME  NODES 
NODELIST(REASON)
   217   gpu  gpu-job slurmtes  R   0:02  1 c-a100-cn01
[slurmtest@c-a100-master test-batch-scripts]$ sbatch gpu-job.sh
Submitted batch job 218
[slurmtest@c-a100-master test-batch-scripts]$ sbatch gpu-job.sh
Submitted batch job 219
[slurmtest@c-a100-master test-batch-scripts]$ squeue
 JOBID PARTITION NAME USER ST   TIME  NODES 
NODELIST(REASON)
   219   gpu  gpu-job slurmtes PD   0:00  1 (Priority)
   218   gpu  gpu-job slurmtes PD   0:00  1 (Resources)
   217   gpu  gpu-job slurmtes  R   0:07  1 c-a100-cn01

Basically I’m seeking some help/hints on how to tell Slurm, from the batch
script for example, “I want only 1 or 2 GPUs to be used/consumed by this job”,
so that I can run the batch script/job a couple of times with the sbatch
command and confirm that we can indeed have multiple jobs each using a GPU and
running in parallel at the same time.

Makes sense?


Best regards,

Hafedh

From: slurm-users  On Behalf Of 
Bernstein, Noam CIV USN NRL (6393) Washington DC (USA)
Sent: Thursday, 18 January 2024 2:30 PM
To: Slurm User Community List 
Subject: Re: [slurm-users] Need help with running multiple instances/executions 
of a batch script in parallel (with NVIDIA HGX A100 GPU as a Gres)

On Jan 18, 2024, at 7:31 AM, Matthias Loose <m.lo...@mindcode.de> wrote:

Hi Hafedh,

I'm no expert on the GPU side of SLURM, but looking at your current
configuration, to me it's working as intended at the moment. You have defined 4
GPUs and start multiple jobs, each consuming 4 GPUs. So the jobs wait for the
resource to be free again.

I think what you need to look into is the MPS plugin, which seems to do what 
you are trying to achieve:
https://slurm.schedmd.com/gres.html#MPS_Management

I agree with the first paragraph.  How many GPUs are you expecting each job to 
use? I'd have assumed, based on the original text, that each job is supposed to 
use 1 GPU, and the 4 jobs were supposed to be running side-by-side on the one 
node you have (with 4 GPUs).  If so, you need to tell each job to request only 
1 GPU, and currently each one is requesting 4.

If your jobs are actually supposed to be using 4 GPUs each, I still don't see 
any advantage to MPS (at least in what is my usual GPU usage pattern): all the 
jobs will take longer to finish, because they are sharing the fixed resource. 
If they take turns, at least the first ones finish as fast as they can, and the 
last one will finish no later than it would have if they were all time-sharing 
the GPUs.  I guess NVIDIA had something in mind when they developed MPS, so I 
guess our pattern may not be typical (or at least not universal), and in that 
case the MPS plugin may well be what you need.
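
For anyone who does want GPU sharing, a minimal sketch of the MPS setup the
linked gres.html page describes (node name, counts and device files are
illustrative, not taken from the poster's cluster):

# slurm.conf
GresTypes=gpu,mps
NodeName=c-a100-cn01 Gres=gpu:4,mps=400

# gres.conf on the node
Name=gpu File=/dev/nvidia[0-3]
Name=mps Count=400

# in the job script: request a share of one GPU instead of a whole GPU
#SBATCH --gres=mps:100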


Re: [slurm-users] External Authentication Integration with JWKS and RS256 Tokens

2023-10-05 Thread Ümit Seren
I would suggest increasing the log verbosity of slurmrestd and checking whether
there is more information in the log file.

On Thu, Oct 5, 2023 at 3:34 PM Laurence  wrote:

> Coming back to this, it is failing again and I don't know why.
>
> *slurmctld: error: failed to verify jwt, rc=22*
> *slurmctld: error: could not find matching kid or decode failed*
>
> The kids seem to match, and the Python code I have verifies the JWT against
> the JWKS. Does anyone have any ideas about what the issue might be? The JWKS
> can be found at the following URL.
>
> https://auth.cern.ch/auth/realms/cern/protocol/openid-connect/certs
>
> Cheers,
>
> Laurence
> On 27/03/2023 11:07, Laurence Field wrote:
>
> Hi Ümit,
>
> Thanks for the reply. Yes, it looks like this is the issue. Although the
> master branch suggests that the claim_field can also be used, this is not in
> the version we have deployed.
>
> Cheers,
>
> Laurence
> On 24.03.23 16:51, Ümit Seren wrote:
>
> Looks like you are missing the username field in the JWT token:
> https://github.com/SchedMD/slurm/blob/slurm-22-05-8-1/src/plugins/auth/jwt/auth_jwt.c#L419
> You have to make sure that your JWT token contains the SLURM username as
> an attribute (https://slurm.schedmd.com/jwt.html#compatibility).
>
>
>
> On Fri, Mar 24, 2023 at 4:40 PM Laurence Field 
> wrote:
>
>> Hi,
>>
>> After verifying the JWT and JWKS with some Python code, it magically
>> seems to work. At least the error has changed to *auth_p_verify:
>> jwt_get_grant failure. *This suggests I need to update something in the
>> authorization policy. Will do that now but if anyone has done this before
>> and can give me some hints, they would be most welcome.
>>
>> Cheers,
>>
>> Laurence
>> On 24.03.23 10:41, Laurence Field wrote:
>>
>> Hi Ümit,
>>
>> Thanks for your reply. We are using Keycloak and the JWKS does contain
>> this parameter. I will continue to debug but any suggestions would be
>> greatly appreciated.
>>
>> Cheers,
>>
>> Laurence
>> On 23.03.23 11:42, Ümit Seren wrote:
>>
>> If you use AzureAD as your identity provider beware that their JWKS json
>> doesn't contain the alg parameter.
>> We opened an issue: https://bugs.schedmd.com/show_bug.cgi?id=16168 and
>> it is confirmed.
>> As a workaround you can use this jq query to add the alg to the jwks json
>> that you get from AzureAD:
>> curl -s https://login.microsoftonline.com/TENANT/discovery/v2.0/keys |
>> jq '.keys |= map(.alg="RS256")' > $TMPFILE
>>
>> Hope this helps
>> Best
>> Ümit
>>
>> On Thu, Mar 23, 2023 at 11:26 AM Laurence  wrote:
>>
>>> Hi,
>>>
>>> I am trying to configure SLURM to use external authentication for JWT as
>>> described in the documentation.
>>>
>>> https://slurm.schedmd.com/jwt.html
>>>
>>> JWT Authentication worked when I tested the setup for standalone use but
>>> am having difficulty with tokens from our oauth provider.
>>>
>>> My first question is has anyone successfully done this? My second
>>> question is on the example code to verify the jwt key. Is the example up to
>>> date as it doesn't work for me. The final question is does anyone have any
>>> suggestions on the concrete error reported in the slurmctld log.
>>>
>>> *slurmctld: error: failed to verify jwt, rc=22*
>>> *slurmctld: error: could not find matching kid or decode failed*
>>>
>>> Thanks,
>>>
>>> Laurence
>>>
>>


Re: [slurm-users] External Authentication Integration with JWKS and RS256 Tokens

2023-03-24 Thread Ümit Seren
Looks like you are missing the username field in the JWT token:
https://github.com/SchedMD/slurm/blob/slurm-22-05-8-1/src/plugins/auth/jwt/auth_jwt.c#L419
You have to make sure that your JWT token contains the SLURM username as an
attribute (https://slurm.schedmd.com/jwt.html#compatibility).
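
A quick way to check which claims a token actually carries is to decode its
payload locally. A hedged sketch (assumes the token is in $SLURM_JWT and that
jq is available):

# Decode the JWT payload (second dot-separated part, base64url-encoded)
# and pretty-print the claims to verify the username field is present.
payload=$(printf '%s' "$SLURM_JWT" | cut -d. -f2 | tr '_-' '/+')
case $(( ${#payload} % 4 )) in      # restore base64 padding
  2) payload="${payload}==" ;;
  3) payload="${payload}=" ;;
esac
printf '%s' "$payload" | base64 -d | jq .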



On Fri, Mar 24, 2023 at 4:40 PM Laurence Field 
wrote:

> Hi,
>
> After verifying the JWT and JWKS with some Python code, it magically seems
> to work. At least the error has changed to *auth_p_verify: jwt_get_grant
> failure. *This suggests I need to update something in the authorization
> policy. Will do that now but if anyone has done this before and can give me
> some hints, they would be most welcome.
>
> Cheers,
>
> Laurence
> On 24.03.23 10:41, Laurence Field wrote:
>
> Hi Ümit,
>
> Thanks for your reply. We are using Keycloak and the JWKS does contain
> this parameter. I will continue to debug but any suggestions would be
> greatly appreciated.
>
> Cheers,
>
> Laurence
> On 23.03.23 11:42, Ümit Seren wrote:
>
> If you use AzureAD as your identity provider beware that their JWKS json
> doesn't contain the alg parameter.
> We opened an issue: https://bugs.schedmd.com/show_bug.cgi?id=16168 and it
> is confirmed.
> As a workaround you can use this jq query to add the alg to the jwks json
> that you get from AzureAD:
> curl -s https://login.microsoftonline.com/TENANT/discovery/v2.0/keys | jq
> '.keys |= map(.alg="RS256")' > $TMPFILE
>
> Hope this helps
> Best
> Ümit
>
> On Thu, Mar 23, 2023 at 11:26 AM Laurence  wrote:
>
>> Hi,
>>
>> I am trying to configure SLURM to use external authentication for JWT as
>> described in the documentation.
>>
>> https://slurm.schedmd.com/jwt.html
>>
>> JWT Authentication worked when I tested the setup for standalone use but
>> am having difficulty with tokens from our oauth provider.
>>
>> My first question is has anyone successfully done this? My second
>> question is on the example code to verify the jwt key. Is the example up to
>> date as it doesn't work for me. The final question is does anyone have any
>> suggestions on the concrete error reported in the slurmctld log.
>>
>> *slurmctld: error: failed to verify jwt, rc=22*
>> *slurmctld: error: could not find matching kid or decode failed*
>>
>> Thanks,
>>
>> Laurence
>>
>


Re: [slurm-users] External Authentication Integration with JWKS and RS256 Tokens

2023-03-23 Thread Ümit Seren
If you use AzureAD as your identity provider beware that their JWKS json
doesn't contain the alg parameter.
We opened an issue: https://bugs.schedmd.com/show_bug.cgi?id=16168 and it
is confirmed.
As a workaround you can use this jq query to add the alg to the jwks json
that you get from AzureAD:
curl -s https://login.microsoftonline.com/TENANT/discovery/v2.0/keys | jq
'.keys |= map(.alg="RS256")' > $TMPFILE

Hope this helps
Best
Ümit

On Thu, Mar 23, 2023 at 11:26 AM Laurence  wrote:

> Hi,
>
> I am trying to configure SLURM to use external authentication for JWT as
> described in the documentation.
>
> https://slurm.schedmd.com/jwt.html
>
> JWT Authentication worked when I tested the setup for standalone use but
> am having difficulty with tokens from our oauth provider.
>
> My first question is has anyone successfully done this? My second question
> is on the example code to verify the jwt key. Is the example up to date as
> it doesn't work for me. The final question is does anyone have any
> suggestions on the concrete error reported in the slurmctld log.
>
> *slurmctld: error: failed to verify jwt, rc=22*
> *slurmctld: error: could not find matching kid or decode failed*
>
> Thanks,
>
> Laurence
>


Re: [slurm-users] speed / efficiency of sacct vs. scontrol

2023-02-27 Thread Ümit Seren
As a side note:
In Slurm 23.x a new rate-limiting feature for client RPC calls was added (see
this commit:
https://github.com/SchedMD/slurm/commit/674f118140e171d10c2501444a0040e1492f4eab#diff-b4e84d09d9b1d817a964fb78baba0a2ea6316bfc10c1405329a95ad0353ca33e
).
This would give operators the ability to limit the negative effect of workflow
managers on the scheduler.
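
A hedged sketch of what enabling it looks like in slurm.conf; the option names
and values here are written from memory and should be checked against the
slurm.conf man page of your release, and the numbers are arbitrary examples:

# Enable per-user RPC rate limiting in slurmctld (23.02+).
SlurmctldParameters=rl_enable,rl_bucket_size=30,rl_refill_rate=2,rl_refill_period=1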


On Mon, Feb 27, 2023 at 4:57 PM Davide DelVento 
wrote:

> > > And if you are seeing a workflow management system causing trouble on
> > > your system, probably the most sustainable way of getting this resolved
> > > is to file issues or pull requests with the respective project, with
> > > suggestions like the ones you made. For snakemake, a second good point
> > > to currently chime in, would be the issue discussing Slurm job array
> > > support: https://github.com/snakemake/snakemake/issues/301
> >
> > I have to disagree here.  I think the onus is on the people in a given
> > community to ensure that their software behaves well on the systems they
> > want to use, not on the operators of those system.  Those of us running
> > HPC systems often have to deal with a very large range of different
> > pieces of software and time and personell are limited.  If some program
> > used by only a subset of the users is causing disruption, then it
> > already costs us time and energy to mitigate those effects.  Even if I
> > had the appropriate skill set, I don't see my self be writing many
> > patches for workflow managers any time soon.
>
> As someone who has worked in both roles (and to a degree still is) and
> therefore can better understand the perspective from both parties, I
> side more with David than with Loris here.
>
> Yes, David wrote "or pull requests", but that's an OR.
>
> Loris, if you know or experience a problem, it takes close to zero
> time to file a bug report educating the author of the software about
> the problem (or pointing them to places where they can educate
> themselves). Otherwise they will never know about it, they will never
> fix it, and potentially they think it's fine and will make the problem
> worse. Yes, you could alternatively forbid the use of the problematic
> software on the machine (I've done that on our systems), but users
> with those needs will find ways to create the very same problem, and
> perhaps worse, in other ways (they have done it on our system). Yes,
> time is limited, and as operators of HPC systems we often don't have
> the time to understand all the nuances and needs of all the users, but
> that's not the point I am advocating. In fact it does seem to me that
> David is putting the onus on himself and his community to make the
> software behave correctly, and he is trying to educate himself about
> what "correct" is like. So just give him the input he's looking for,
> both here and (if and when snakemake causes troubles on your system)
> by opening tickets on that repo, explaining the problem (definitely
> not writing a PR for you, sorry David)
>
>


Re: [slurm-users] job_container/tmpfs and autofs

2023-01-12 Thread Ümit Seren
We had the same issue when we switched to the job_container plugin. We ended up
running cvmfs_config probe as part of the health-check tool so that the CVMFS
repos stay mounted. However, after we switched on power saving we ran into some
race conditions (a job landed on a node before CVMFS was mounted), so we ended
up switching to static mounts for the CVMFS repos on the compute nodes.
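
For reference, a hedged sketch of the health-check step mentioned above
(the repository names are placeholders):

# Probe the CVMFS repositories so autofs keeps them mounted; a nonzero
# exit lets the health-check tool flag the node.
cvmfs_config probe atlas.cern.ch cms.cern.ch || exit 1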

Best
Ümit

On Thu, Jan 12, 2023, 09:17 Bjørn-Helge Mevik  wrote:

> In my opinion, the problem is with autofs, not with tmpfs.  Autofs
> simply doesn't work well when you are using detached fs name spaces and
> bind mounting.  We ran into this problem years ago (with an inhouse
> spank plugin doing more or less what tmpfs does), and ended up simply
> not using autofs.
>
> I guess you could try using systemd's auto-mounting features, but I have
> no idea if they work better than autofs in situations like this.
>
> We ended up using a system where the prolog script mounts any needed
> file systems, and then the healthcheck script unmounts file systems that
> are no longer needed.
>
> --
> Regards,
> Bjørn-Helge Mevik, dr. scient,
> Department for Research Computing, University of Oslo
>


Re: [slurm-users] GPU-node not waking up after power-save

2022-10-13 Thread Ümit Seren
We use power saving with our GPU nodes and they power up fine. They take a bit
longer to boot, but that’s it.
What do you mean by “not waking up”?
Is the power-on script not called?
Best
Ümit

From: slurm-users  on behalf of Loris 
Bennett 
Date: Thursday, 13. October 2022 at 08:14
To: Slurm Users Mailing List 
Subject: [slurm-users] GPU-node not waking up after power-save
Hi,

> We use Slurm's power saving mechanism to switch off idle nodes.  However,
we don't currently use it for our GPU nodes.  This is because in the
past these nodes failed to wake up again when jobs were submitted to the
GPU partition.  Before we look at the issue due to the current energy
situation, I was wondering whether this a problem others have (had).

So does power-saving work in general for GPU nodes and, if so, are there
any extra steps one needs to take in order to set things up properly?

Cheers,

Loris

--
Dr. Loris Bennett (Herr/Mr)
ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de


Re: [slurm-users] Providing users with info on wait time vs. run time

2022-09-16 Thread Ümit Seren
On Fri, Sep 16, 2022 at 3:43 PM Sebastian Potthoff <
s.potth...@uni-muenster.de> wrote:

> Hi Hermann,
>
> So you both are happily(?) ignoring this warning the "Prolog and Epilog
> Guide",
> right? :-)
>
> "Prolog and Epilog scripts [...] should not call Slurm commands (e.g.
> squeue,
> scontrol, sacctmgr, etc)."
>
>
> We have probably been doing this since before the warning was added to
> the documentation.  So we are "ignorantly ignoring" the advice :-/
>
>
> Same here :) But if $SLURM_JOB_STDOUT is not defined as documented … what
> can you do.
>

FYI: SLURM_JOB_STDOUT among other ENV variables was added in 22.05 (see
https://slurm.schedmd.com/news.html) so it might not be available if you
have an older SLURM version.
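
A hedged sketch of a fallback for older versions, in the spirit of the scontrol
approach quoted above:

# Use SLURM_JOB_STDOUT when the epilog provides it (22.05+), otherwise
# fall back to parsing scontrol output.
StdOut=${SLURM_JOB_STDOUT:-$(scontrol show job "$SLURM_JOB_ID" | awk -F= '/StdOut/{print $2}')}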



>
> May I ask how big your clusters are (number of nodes) and how heavily they
> are
> used (submitted jobs per hour)?
>
>
> We have around 500 nodes (mostly 2x18 cores). Jobs ending (i.e. calling
> the epilog script) varies quite a lot between 1000 and 15k a day, so
> something in between 40 and 625 Jobs/hour. During those peaks Slurm can
> become noticeably slower, however usually it runs fine.
>
> Sebastian
>
> Am 16.09.2022 um 15:15 schrieb Loris Bennett :
>
> Hi Hermann,
>
> Hermann Schwärzler  writes:
>
> Hi Loris,
> hi Sebastian,
>
> thanks for the information on how you are doing this.
> So you both are happily(?) ignoring this warning the "Prolog and Epilog
> Guide",
> right? :-)
>
> "Prolog and Epilog scripts [...] should not call Slurm commands (e.g.
> squeue,
> scontrol, sacctmgr, etc)."
>
>
> We have probably been doing this since before the warning was added to
> the documentation.  So we are "ignorantly ignoring" the advice :-/
>
> May I ask how big your clusters are (number of nodes) and how heavily they
> are
> used (submitted jobs per hour)?
>
>
> We have around 190 32-core nodes.  I don't know how I would easily find
> out the average number of jobs per hour.  The only problems we have had
> with submission have been when people have written their own mechanisms
> for submitting thousands of jobs.  Once we get them to use job array,
> such problems generally disappear.
>
> Cheers,
>
> Loris
>
> Regards,
> Hermann
>
> On 9/16/22 9:09 AM, Loris Bennett wrote:
>
> Hi Hermann,
> Sebastian Potthoff  writes:
>
> Hi Hermann,
>
> I happened to read along this conversation and was just solving this issue
> today. I added this part to the epilog script to make it work:
>
> # Add job report to stdout
> StdOut=$(/usr/bin/scontrol show job=$SLURM_JOB_ID | /usr/bin/grep StdOut |
> /usr/bin/xargs | /usr/bin/awk 'BEGIN { FS = "=" } ; { print $2 }')
>
> NODELIST=($(/usr/bin/scontrol show hostnames))
>
> # Only add to StdOut file if it exists and if we are the first node
> if [ "$(/usr/bin/hostname -s)" = "${NODELIST[0]}" -a ! -z "${StdOut}" ]
> then
>   echo "# JOB REPORT
> ##" >> $StdOut
>   /usr/bin/seff $SLURM_JOB_ID >> $StdOut
>   echo
> "###"
> >> $StdOut
> fi
>
> We do something similar.  At the end of our script pointed to by
> EpilogSlurmctld we have
>   OUT=`scontrol show jobid ${job_id} | awk -F= '/ StdOut/{print $2}'`
>   if [ ! -f "$OUT" ]; then
> exit
>   fi
>   printf "\n== Epilog Slurmctld
> ==\n\n" >>  ${OUT}
>   seff ${SLURM_JOB_ID} >> ${OUT}
>   printf "\n==\n" >> ${OUT}
>
>   chown ${user} ${OUT}
> Cheers,
> Loris
>
>   Contrary to what it says in the slurm docs
> https://slurm.schedmd.com/prolog_epilog.html  I was not able to use the
> env var SLURM_JOB_STDOUT, so I had to fetch it via scontrol. In addition I
> had to
> make sure it is only called by the „leading“ node as the epilog script
> will be called by ALL nodes of a multinode job and they would all call seff
> and clutter up the output. Last thing was to check if StdOut is
> not of length zero (i.e. it exists). Interactive jobs would otherwise
> cause the node to drain.
>
> Maybe this helps.
>
> Kind regards
> Sebastian
>
> PS: goslmailer looks quite nice with its recommendations! Will definitely
> look into it.
>
> --
> Westfälische Wilhelms-Universität (WWU) Münster
> WWU IT
> Sebastian Potthoff (eScience / HPC)
>
>  Am 15.09.2022 um 18:07 schrieb Hermann Schwärzler <
> hermann.schwaerz...@uibk.ac.at>:
>
>  Hi Ole,
>
>  On 9/15/22 5:21 PM, Ole Holm Nielsen wrote:
>
>  On 15-09-2022 16:08, Hermann Schwärzler wrote:
>
>  Just out of curiosity: how do you insert the output of seff into the
> out-file of a job?
>
>  Use the "smail" tool from the slurm-contribs RPM and set this in
> slurm.conf:
>  MailProg=/usr/bin/smail
>
>  Maybe I am missing something but from what I can tell smail sends an
> email and does *not* change or append to the .out file of a job...
>
>  Regards,
>  Hermann
>
>
>
> --
> Dr. Loris Bennett (Herr/Mr)
> ZEDAT, Freie Universität Berlin

Re: [slurm-users] Rolling upgrade of compute nodes

2022-05-30 Thread Ümit Seren
We did a couple of major and minor SLURM upgrades without draining the compute 
nodes.
Once slurmdbd and slurmctld were updated to the new major version, we did a 
package update on the compute nodes and restarted slurmd on them.
The existing running jobs continued to run fine, and new jobs on the same
compute nodes were started by the updated slurmd daemon and also worked fine.

So, for us this worked smoothly.
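
For anyone wanting the same sequence as a checklist, a hedged sketch (package
names assume SchedMD's RPM packaging; adjust for your distribution and
deployment tooling):

# 1. Upgrade the accounting daemon first, then the controller.
systemctl stop slurmdbd  && dnf update slurm-slurmdbd  && systemctl start slurmdbd
systemctl stop slurmctld && dnf update slurm-slurmctld && systemctl start slurmctld
# 2. Then, on each compute node (e.g. fanned out with pdsh/clush):
dnf update slurm slurm-slurmd && systemctl restart slurmd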

Best
Ümit


From: slurm-users  on behalf of Ole Holm 
Nielsen 
Date: Monday, 30. May 2022 at 20:58
To: slurm-users@lists.schedmd.com 
Subject: Re: [slurm-users] Rolling upgrade of compute nodes
On 30-05-2022 19:34, Chris Samuel wrote:
> On 30/5/22 10:06 am, Chris Samuel wrote:
>
>> If you switch that symlink those jobs will pick up the 20.11 srun
>> binary and that's where you may come unstuck.
>
> Just to quickly fix that, srun talks to slurmctld (which would also be
> 20.11 for you), slurmctld will talk to the slurmd's running the job
> (which would be 19.05, so OK) but then the slurmd would try and launch a
> 20.11 slurmstepd and that is where I suspect things could come undone.

How about restarting all slurmd's at version 20.11 in one shot?  No
reboot will be required.  There will be running 19.05 slurmstepd's for
the running job steps, even though slurmd is at 20.11.  You could
perhaps restart 20.11 slurmd one partition at a time in order to see if
it works correctly on a small partition of the cluster.
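
A hedged sketch of restarting slurmd one partition at a time (the partition
name is a placeholder; clush/pdsh-style fan-out tooling is assumed):

# Restart slurmd on all nodes of one partition, then check before moving on.
clush -w "$(sinfo -h -p small -o '%N')" 'systemctl restart slurmd'
sinfo -p small -o '%N %T %E'   # verify node states and reasons afterwards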

I think we have done this successfully when we install new RPMs on *all*
compute nodes in one shot, and I'm not aware of any job crashes.  Your
mileage may vary depending on job types!

Question: Does anyone have bad experiences with upgrading slurmd while
the cluster is running production?

/Ole