[slurm-dev] Re: _slurm_cgroup_destroy message?

2014-11-19 Thread Bjørn-Helge Mevik

David Bigagli da...@schedmd.com writes:

 Yes, this is fixed in commit c8f34560c87c in 14.03.11, which has not been
 released yet. However, it is straightforward to back-port.

Thanks!  We'll consider back-porting it.  (We just upgraded to 14.03.7,
but due to slurmctld consistently aborting while running the test suite
for versions 14.03.8--14.03.10, we didn't upgrade to .10.)
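
If it helps anyone else: the back-port should amount to cherry-picking that
commit onto the release we are running; a rough sketch, where the tag name is
an assumption about how SchedMD tags releases:

git checkout -b cgroup-destroy-fix slurm-14-03-7-1   # assumed tag for 14.03.7
git cherry-pick c8f34560c87c                         # fix referenced above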

-- 
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo

[slurm-dev] Re: How many accounts can SLURM support?

2014-11-19 Thread Ryan Cox

Dave,

I have done testing on 5- to 6-year-old hardware with 100,000 users randomly 
distributed across 10,000 accounts at semi-random depths, most of them 
between 1 and 4 levels from root but some much deeper than that, plus 
100,000 jobs pending.  slurmctld startup time was really long, but once it 
was running, fairshare and decay iterations in all fairshare algorithms took 
50-150 milliseconds, depending on how you measure it.  Those calculations 
run no more frequently than once per minute and can be configured to run 
less frequently.
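
That interval is controlled by PriorityCalcPeriod in slurm.conf; a minimal
sketch, with the value here chosen only as an example:

PriorityCalcPeriod=5   # minutes between fairshare/decay recalculations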


You shouldn't have any problems.

Ryan

On 11/18/2014 12:30 PM, David Lipowitz wrote:

Does anyone have a sense of how far SLURM scales regarding accounts 
and sub-accounts?


In our batch environment, all jobs need to run under the same service 
account for a number of reasons (which I won't go into here).  Since 
our scheduler knows which end user is actually submitting the job, 
we'd like to handle prioritization by creating sub-accounts for each 
user under each of the leaf accounts depicted below:


root
 |
 +- query
 |    |
 |    +- type_a
 |    |
 |    +- type_b
 |    |
 |    +- type_c
 |    |
 |    +- type_d
 |
 +- process


So I'd have five accounts, one for each type of query and another for 
the process account:


query_type_a_dlipowitz
query_type_b_dlipowitz
query_type_c_dlipowitz
query_type_d_dlipowitz

process_dlipowitz


And each other user would have five analogous accounts.
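
Creating that layout would just be a few sacctmgr calls per user; a rough
sketch, where the cluster name and the service user name are made up for
illustration:

sacctmgr -i add account query,process cluster=mycluster
sacctmgr -i add account type_a,type_b,type_c,type_d parent=query cluster=mycluster
sacctmgr -i add account query_type_a_dlipowitz parent=type_a cluster=mycluster
sacctmgr -i add user svcbatch account=query_type_a_dlipowitz cluster=mycluster
(and similarly for type_b through type_d, process, and each user)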

Given that we have 600 users, can SLURM handle 3000 sub-accounts like 
this?  If we doubled in size, could SLURM handle 6000?


Thanks for any insight you might be able to offer.


Cheers,
Dave




[slurm-dev] Re: How many accounts can SLURM support?

2014-11-19 Thread David Lipowitz
Sounds like our account model should be fine.  Thanks!


Cheers,
Dave


[slurm-dev] gres/gpu unable to set CUDA_VISIBLE_DEVICES, no device files configured

2014-11-19 Thread Felix Wolfheimer
I'm having a problem configuring slurm with GPU resources. My machines each have
four Tesla GPUs, and the gres.conf file contains a single line:

NodeName=gpunode[13-14] Name=gpu File=/dev/nvidia[0-3]

The device files are there and GPU computing is working when started
outside of SLURM:

crw-rw-rw-. 1 root root 195,   0 Sep 26 15:48 /dev/nvidia0
crw-rw-rw-. 1 root root 195,   1 Sep 26 15:48 /dev/nvidia1
crw-rw-rw-. 1 root root 195,   2 Sep 26 15:48 /dev/nvidia2
crw-rw-rw-. 1 root root 195,   3 Sep 26 15:48 /dev/nvidia3

The resources are configured in slurm.conf as well:

GresTypes=gpu
NodeName=gpunode[13-14] Feature=Tesla,K20Xm Gres=gpu:4 CPUs=16 Sockets=2 CoresPerSocket=8 State=UNKNOWN
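
One way to double-check that both files are being picked up (gpunode13 is
just one of the two nodes above; the comments describe what I would expect,
not verified output):

scontrol show node gpunode13 | grep -i gres   # should show Gres=gpu:4 if slurm.conf is read
# on the GPU node itself, running slurmd in the foreground with extra
# verbosity shows whether gres.conf was parsed at startup:
slurmd -D -vvv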

However, when I submit a job requesting these resources (--gres=gpu:4), the
job goes to the running state and then hangs forever. The slurmd log file
for the job shows the following lines when the job starts up.

[2014-11-19T15:16:12.676] [16] gres: gpu state for job 16
[2014-11-19T15:16:12.676] [16]   gres_cnt:4 node_cnt:1
[2014-11-19T15:16:12.676] [16]   gres_bit_alloc:NULL
[2014-11-19T15:16:12.676] [16]   gres_bit_step_alloc:NULL
[2014-11-19T15:16:12.676] [16]   gres_cnt_step_alloc:NULL
[2014-11-19T15:16:12.676] [16] gres/gpu unable to set CUDA_VISIBLE_DEVICES, no device files configured

The "srun -s hostname" command that the job uses to get the names of all
allocated machines gets stuck in the job context. The stdout file contains a
single line saying "srun: Job step creation temporarily disabled, retrying",
but in fact it is not retrying; it just stays stuck there. Everything works
fine when no GPUs are requested.

This is for version 14.03.10. Does anyone have an idea?


[slurm-dev] user exclusivity on node

2014-11-19 Thread Marcin Sliwowski


Was wondering if any of SLURM's internal schedulers have an equivalent of 
Maui's nodeaccesspolicy singleuser setting.


In Maui this means that multiple jobs owned by the same user may share a 
single node, but jobs from different users may not.


I am using the cons_res selection plugin with sched/backfill for 
SchedulerType. --exclusive only allows a single job on any given node, and 
without --exclusive, jobs from many users can land on a single node.
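
For reference, in slurm.conf terms the setup is roughly the following (the
SelectTypeParameters value is just a placeholder, not my actual setting):

SchedulerType=sched/backfill
SelectType=select/cons_res
SelectTypeParameters=CR_Core   # placeholder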


Thanks

--
Marcin Sliwowski | SysAdmin@RENCI | 919-445-0479


[slurm-dev] sbatch --array question and a tale of job and task confusion

2014-11-19 Thread Balt.Indermuehle
Hi all,

I'm submitting computing jobs using the sbatch --array option. I'd like to get 
notified about the progress of the master job, but not about the progress of 
each of its up to several hundred sub-tasks.

I use the sbatch command like this to launch the jobs:

sbatch --array 0-59 --mail-type=ALL --mail-user=myemail rfi.job

but that notifies me of the progress of each of the 60 sub tasks, which is not 
what I intended.

The reason I think this should notify me only for the parent job is that when 
the rfi.job batch script is called for each array item, the SLURM_JOB_ID in 
the environment is the same for each of the 60 calls, which hints that all 
of the tasks run under the same job ID. SLURM_ARRAY_TASK_ID, as expected, 
changes with every call.

Showing the job queue using squeue, I see all tasks having the same job ID 
with _tasknumber appended, again as expected.

But when using sacct, I see that each array task has a different job ID. This 
is where the confusion enters: are they really separate jobs, or are they one 
parent job with sub-tasks? If they really are separate jobs, that would explain 
the email behaviour. But that also points at a fairly obvious need: isn't there 
a built-in way to group or attach an array of jobs to one master job, and get 
notified when the first sub-task begins execution and when the last sub-task 
has completed, plus a notification if any of the sub-tasks have failed?
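
One workaround that comes to mind, sketched below but untested; it assumes
--parsable and dependencies on a whole array are supported by the Slurm
version in use:

jobid=$(sbatch --parsable --array=0-59 rfi.job)
# trailing no-op job: runs only after every array element has finished,
# and is the only job that sends mail
sbatch --dependency=afterany:${jobid} --mail-type=END --mail-user=myemail --wrap "true"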

Any input greatly appreciated.

Cheers

- Balt

Dr Balthasar Indermuehle
CSIRO Astronomy and Space Science
t: +61 2 9372 4274 m: +61 4 2791 2856


[slurm-dev] Re: sbatch --array question and a tale of job and task confusion

2014-11-19 Thread Andrew Elwell
I'll add that this is (most likely) being seen on slurm 2.6.6 on a Cray
using ALPS.
/me waves to Balt