[slurm-users] Re: Naive SLURM question: equivalent to LSF pre-exec

2024-02-14 Thread Paul Edmon via slurm-users
You probably want the Prolog option: 
https://slurm.schedmd.com/slurm.conf.html#OPT_Prolog along with: 
https://slurm.schedmd.com/slurm.conf.html#OPT_ForceRequeueOnFail
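As an illustration only, here is a minimal sketch of a node Prolog that mirrors the `bsub -E` check from the question. The install path /etc/slurm/prolog.sh, the 10-second timeout and the helper name check_required_files are assumptions. Prolog and PrologFlags=ForceRequeueOnFail are the slurm.conf settings linked above; ForceRequeueOnFail is set as a PrologFlags value.

```shell
#!/bin/bash
# Hypothetical node Prolog, assumed to be installed as /etc/slurm/prolog.sh
# and enabled in slurm.conf with:
#   Prolog=/etc/slurm/prolog.sh
#   PrologFlags=ForceRequeueOnFail
# slurmd runs it as a privileged user (normally root) before the job starts
# on the node; a non-zero exit status marks the Prolog as failed.

# Files that must be reachable before any job may start here. The path is
# the example from the bsub -E command in the question.
REQUIRED_FILES=(/nfs/someplace/file_I_know_exists)

# Return 0 if every argument is an existing regular file, 1 otherwise.
check_required_files() {
    local f
    for f in "$@"; do
        # Accessing an automounted path triggers the mount; timeout stops a
        # hung NFS server from blocking the Prolog indefinitely.
        if ! timeout 10 test -f "$f"; then
            echo "prolog: required file missing or unreachable: $f" >&2
            return 1
        fi
    done
    return 0
}

# The script exits with the status of this final command.
check_required_files "${REQUIRED_FILES[@]}"
```

Unlike `bsub -E`, this check runs for every job on the node, and a failure drains the node as well as sending the job back to the queue.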


-Paul Edmon-

On 2/14/2024 8:38 AM, Cutts, Tim via slurm-users wrote:


Hi, I apologise if I’ve failed to find this in the documentation (and 
am happy to be told to RTFM) but a recent issue for one of my users 
resulted in a question I couldn’t answer.


LSF has a feature called a Pre-Exec where a script executes to check 
whether a node is ready to run a task.  So, you can run arbitrary 
checks and go back to the queue if they fail.


For example, if I have some automounted filesystems, and I want to be 
able to check for failure of the automounter, in an LSF world, I can do:


  bsub -E "test -f /nfs/someplace/file_I_know_exists" my_job.sh

What’s the equivalent in SLURM?

Thanks,

Tim

--

*Tim Cutts*

Scientific Computing Platform Lead

AstraZeneca

Find out more about R&D IT Data, Analytics & AI and how we can support 
you by visiting our Service Catalogue.

AstraZeneca UK Limited is a company incorporated in England and Wales 
with registered number:03674842 and its registered office at 1 Francis 
Crick Avenue, Cambridge Biomedical Campus, Cambridge, CB2 0AA.


This e-mail and its attachments are intended for the above named 
recipient only and may contain confidential and privileged 
information. If they have come to you in error, you must not copy or 
show them to anyone; instead, please reply to this e-mail, 
highlighting the error to the sender and then immediately delete the 
message. For information about how AstraZeneca UK Limited and its 
affiliates may process information, personal data and monitor 
communications, please see our privacy notice at www.astrazeneca.com 




-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com


[slurm-users] Re: Naive SLURM question: equivalent to LSF pre-exec

2024-02-14 Thread Paul Raines via slurm-users


The Prolog will run with every job, not just "as asked for" by the user.
Also, it runs as root or the slurm user, not as the user who submitted
the job. For that, one would use TaskProlog, but at that point I don't
think there is any way to abort or requeue the job from TaskProlog.
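For contrast, a TaskProlog does run as the user, but, as the paragraph above suggests, it offers no clear way to requeue. A minimal sketch could therefore only warn and leave a flag for the job script to act on. The install path /etc/slurm/task_prolog.sh, the helper task_prolog_check and the variable PREEXEC_CHECK_FAILED are made up for illustration; the `export` and `print` output lines are the directives slurmd reads from a TaskProlog's standard output.

```shell
#!/bin/bash
# Hypothetical TaskProlog, assumed to be enabled in slurm.conf with:
#   TaskProlog=/etc/slurm/task_prolog.sh
# It runs as the submitting user. slurmd interprets lines on its stdout:
#   "export NAME=value"  sets NAME in the task's environment
#   "print text"         writes text to the task's standard output

REQUIRED_FILE=/nfs/someplace/file_I_know_exists

# Emit a warning and a flag variable if the path is not a regular file.
task_prolog_check() {
    local f="$1"
    if ! timeout 10 test -f "$f"; then
        echo "print WARNING: $f is not reachable on ${HOSTNAME:-this node}"
        # PREEXEC_CHECK_FAILED is a made-up name for the job script to test.
        echo "export PREEXEC_CHECK_FAILED=1"
    fi
}

task_prolog_check "$REQUIRED_FILE"
```

The job script would still need its own test of PREEXEC_CHECK_FAILED, so the job fails early rather than going back to the queue.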

The Prolog script could check for an environment variable set by the
user, such as SLURM_USER_PROLOG, and, if it exists, use 'su' to run the
script it names as the submitting user. Then, if that returns a non-zero
value, exit with that value.  Even with 'su' there are security
issues one has to think through here.
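The idea in the paragraph above might look roughly like this fragment of a root-run Prolog. It is a sketch under stated assumptions: SLURM_USER_PROLOG is the hypothetical variable named above and is assumed to hold the path of the user's check script; whether it reaches the Prolog's environment at all depends on how it is propagated and should be verified for your Slurm version. SLURM_JOB_USER is assumed to hold the submitting user's name, and run_user_prolog is a made-up helper.

```shell
#!/bin/bash
# Fragment of a hypothetical root-run Prolog. Assumptions:
#   SLURM_USER_PROLOG - made-up variable holding the path of the user's
#                       check script; it must somehow reach the Prolog's
#                       environment, which is not guaranteed.
#   SLURM_JOB_USER    - the submitting user's name, set by Slurm.

# Run the user's check as that user; return its exit status.
run_user_prolog() {
    local user="$1" script="$2"
    # Nothing requested: let the job start.
    if [[ -z "$script" ]]; then
        return 0
    fi
    # Refuse to run anything if the submitting user is unknown.
    if [[ -z "$user" ]]; then
        return 1
    fi
    # Run the check as the submitting user, never as root. su exits with
    # the status of the command it ran, so that status is passed through.
    su -c "$script" "$user"
}

# A non-zero status here fails the Prolog, which drains the node.
run_user_prolog "${SLURM_JOB_USER:-}" "${SLURM_USER_PROLOG:-}"
```

Because su passes the check's exit status through, the Prolog fails exactly when the user's check fails; the security caveats above still apply.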

The requeuing thing is a bit tricky.  I would not necessarily
set ForceRequeueOnFail, as some Prolog scripts probably really
want some jobs just cancelled. Also, a Prolog failure will put the node
in a drain state, which is not necessarily what an admin wants
when a user's prolog script fails.

Not sure there is any good way to do this with safe requeuing.

-- Paul Raines (http://help.nmr.mgh.harvard.edu)



On Wed, 14 Feb 2024 9:32am, Paul Edmon via slurm-users wrote:


You probably want the Prolog option: 
https://slurm.schedmd.com/slurm.conf.html#OPT_Prolog 
along with: 
https://slurm.schedmd.com/slurm.conf.html#OPT_ForceRequeueOnFail


-Paul Edmon-


The information in this e-mail is intended only for the person to whom it is 
addressed.  If you believe this e-mail was sent to you in error and the e-mail 
contains patient