Re: [slurm-users] How to deal with jobs that need to be restarted several time

2019-03-13 Thread Selch, Brigitte (FIDF)
Hello,

Yes, that's it.
I can use salloc instead of sbatch.
The user can then test and rerun the job within this interactive Slurm allocation.
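
A minimal sketch of that workflow, assuming a site where salloc opens a shell for the allocation when no command is given (the default behaviour). The SALLOC variable, the hold_allocation name, the resource values, and ./my_job.sh are placeholders for illustration and are not taken from the thread:

```shell
# Hypothetical override so the command line can be previewed with echo.
SALLOC=${SALLOC:-salloc}

hold_allocation() {
    # Request the resources once; extra arguments pass through to salloc.
    "$SALLOC" --nodes=1 --time=02:00:00 "$@"
}

# Inside the shell that salloc opens, the job step can be rerun as often
# as needed without giving the resources back:
#   srun ./my_job.sh     # fails -> fix the input -> srun again
#   exit                 # release the allocation when finished
```

The allocation stays in place until the shell exits or the time limit is reached, so each retry starts without waiting in the queue again.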

Thank you,

Brigitte Selch

-----Original Message-----
From: slurm-users  On Behalf Of Renfro, Michael
Sent: Tuesday, March 12, 2019 15:33
To: Slurm User Community List
Subject: Re: [slurm-users] How to deal with jobs that need to be restarted 
several time

If the failures happen right after the job starts (or close enough), I’d use an 
interactive session with srun (or some other wrapper that calls srun, such as 
fisbatch).

Our hpcshell wrapper for srun is just a bash function:

=

hpcshell ()
{
    # Start an interactive shell on a compute node; extra srun options pass through.
    srun --partition=interactive "$@" --pty bash -i
}

=

The interactive partition argument is optional, but we use it as a time- and 
resource-limited partition with a higher priority. I always recommend that our 
users develop and debug with interactive jobs, and only submit the full 
production job with sbatch after all the easy bugs have been identified.

--
Mike Renfro, PhD / HPC Systems Administrator, Information Technology Services
931 372-3601 / Tennessee Tech University





MAN Truck & Bus AG
Registered office: Munich
Commercial register: Amtsgericht München, HRB 86963
Chairman of the Supervisory Board: Andreas Renschler
Executive Board: Joachim Drees (Chairman), Dirk Große-Loheide, Dr. Carsten Intra, 
Michael Kobriger, Jan-Henrik Lafrentz, Göran Nyberg, Dr. Frederik Zohm

You can find information about how we process your personal data and your 
rights in our data protection notice: www.man.eu/data-protection-notice

This e-mail (including any attachments) is confidential and may be privileged.
If you have received it by mistake, please notify the sender by e-mail and 
delete this message from your system.
Any unauthorised use or dissemination of this e-mail in whole or in part is 
strictly prohibited.
Please note that e-mails are susceptible to change.
MAN Truck & Bus AG (including its group companies) shall not be liable for the 
improper or incomplete transmission of the information contained in this 
communication nor for any delay in its receipt.
MAN Truck & Bus AG (or its group companies) does not guarantee that the 
integrity of this communication has been maintained nor that this communication 
is free of viruses, interceptions or interference.



Re: [slurm-users] How to deal with jobs that need to be restarted several time

2019-03-12 Thread Renfro, Michael
If the failures happen right after the job starts (or close enough), I’d use an 
interactive session with srun (or some other wrapper that calls srun, such as 
fisbatch).

Our hpcshell wrapper for srun is just a bash function:

=

hpcshell ()
{
    # Start an interactive shell on a compute node; extra srun options pass through.
    srun --partition=interactive "$@" --pty bash -i
}

=
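
As a usage sketch for the wrapper above: any extra srun options given to hpcshell land between the partition flag and `--pty`. The SRUN variable below is a hypothetical addition that lets the command line be previewed with echo; the option values are examples only.

```shell
# Hypothetical override for dry runs; defaults to the real srun.
SRUN=${SRUN:-srun}

hpcshell() {
    # Quoting "$@" keeps each extra option intact as a single argument.
    "$SRUN" --partition=interactive "$@" --pty bash -i
}

# Request 4 cores for one hour on the interactive partition:
#   hpcshell --cpus-per-task=4 --time=01:00:00
# Preview the arguments instead of starting a session:
#   SRUN=echo hpcshell --cpus-per-task=4 --time=01:00:00
```

Inside the resulting shell the job can be started, fixed, and started again while the resources stay allocated.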

The interactive partition argument is optional, but we use it as a time- and 
resource-limited partition with a higher priority. I always recommend that our 
users develop and debug with interactive jobs, and only submit the full 
production job with sbatch after all the easy bugs have been identified.
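
Once the commands run cleanly in an interactive session, they can be moved into a batch script. The following is only a sketch: the file name production_job.sh, the program ./my_simulation, and the resource values are placeholders, not anything from our site.

```shell
# Write a minimal batch script containing the commands that already
# worked interactively.
cat > production_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=production
#SBATCH --nodes=1
#SBATCH --time=24:00:00
srun ./my_simulation
EOF

# Submit only after the interactive tests pass:
#   sbatch production_job.sh
```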

-- 
Mike Renfro, PhD / HPC Systems Administrator, Information Technology Services
931 372-3601 / Tennessee Tech University

> On Mar 12, 2019, at 9:26 AM, Selch, Brigitte (FIDF)  
> wrote:
> 
> Hello,
>  
> Some jobs have to be restarted several times until they run.
> Users start the job, it fails, they make some changes,
> they start the job again, it fails again, and so on.
>  
> So they want to keep the resources until the job is running properly.
>  
> Is there a way to ‘inherit’ allocated resources
> from one job to the next?
>  
> Or is there some other way to do this?
>  
> All our jobs are submitted with sbatch.
>  
> Thank you,
> Brigitte Selch



[slurm-users] How to deal with jobs that need to be restarted several time

2019-03-12 Thread Selch, Brigitte (FIDF)
Hello,

Some jobs have to be restarted several times until they run.
Users start the job, it fails, they make some changes,
they start the job again, it fails again, and so on.

So they want to keep the resources until the job is running properly.

Is there a way to 'inherit' allocated resources
from one job to the next?

Or is there some other way to do this?

All our jobs are submitted with sbatch.

Thank you,
Brigitte Selch



Kind regards,
Brigitte Selch

MAN Truck & Bus AG
IT Produktentwicklung Simulation (FIDF)
Vogelweiher Str. 33
90441 Nürnberg

Phone +49 911 420 6056
brigitte.se...@man.eu



