[slurm-users] All user's jobs killed at the same time on all nodes

2018-06-29 Thread Matteo Guglielmi
Dear community, I have a user who usually submits 36 (identical) jobs at a time using a simple for loop, thus jobs are sbatched all at the same time. Each job requests a single core and all jobs are independent from one another (read different input files and write to different output files). Jobs

Re: [slurm-users] All user's jobs killed at the same time on all nodes

2018-06-29 Thread John Hearns
Matteo, a stupid question but if these are single CPU jobs why is mpirun being used? Is your user using these 36 jobs to construct a parallel job to run charmm? If the mpirun is killed, yes all the other processes which are started by it on the other compute nodes will be killed. I suspect your u

Re: [slurm-users] All user's jobs killed at the same time on all nodes

2018-06-29 Thread Paddy Doyle
Hi Matteo, On Fri, Jun 29, 2018 at 10:13:33AM +, Matteo Guglielmi wrote: > Dear community, > > I have a user who usually submits 36 (identical) jobs at a time using a > simple for loop, > thus jobs are sbatched all at the same time. > > Each job requests a single core and all jobs are independ

Re: [slurm-users] All user's jobs killed at the same time on all nodes

2018-06-29 Thread Matteo Guglielmi
ity List Subject: Re: [slurm-users] All user's jobs killed at the same time on all nodes Matteo, a stupid question but if these are single CPU jobs why is mpirun being used? Is your user using these 36 jobs to construct a parallel job to run charmm? If the mpirun is killed, yes all the othe

Re: [slurm-users] All user's jobs killed at the same time on all nodes

2018-06-29 Thread John Hearns
em is killed, why > would all others go down as well? > > > That would make sense if a single mpirun is running 36 tasks... but the > user is not doing this. > > ________________ > From: slurm-users on behalf of > John Hearns > Sent: Friday, June 29,

Re: [slurm-users] All user's jobs killed at the same time on all nodes

2018-06-29 Thread Thomas M. Payerle
/exec/gnu_M/charmm < >> newphcnl99a0.inp > newphcnl99a0.out >> >> >> >> >> so they are all independent mpiruns... if one of them is killed, why >> would all others go down as well? >> >> >> That would make sense if a single mpirun is running 36 tasks... b

Re: [slurm-users] All user's jobs killed at the same time on all nodes

2018-07-02 Thread Matteo Guglielmi
lor=auto moha From: slurm-users on behalf of Thomas M. Payerle Sent: Friday, June 29, 2018 7:34:09 PM To: Slurm User Community List Subject: Re: [slurm-users] All user's jobs killed at the same time on all nodes A couple comments/possible suggestions. First,

Re: [slurm-users] All user's jobs killed at the same time on all nodes

2018-07-02 Thread John Hearns
S21:46 0:00 -bash > moha 194080 0.0 0.0 151060 1820 pts/4 R+ 21:52 0:00 ps aux > moha 194081 0.0 0.0 112664 972 pts/4 S+ 21:52 0:00 grep > --color=auto moha > > > ________________________ > From: slurm-users on behalf of > Thomas M. Payerle

Re: [slurm-users] All user's jobs killed at the same time on all nodes

2018-07-02 Thread Matteo Guglielmi
lly you get the f. out of here twice a day so that my jobs can start running. Hhahaha!!! From: slurm-users on behalf of John Hearns Sent: Monday, July 2, 2018 12:37:13 PM To: Slurm User Community List Subject: Re: [slurm-users] All user's jobs killed