We have a heterogeneous mix of nodes, most 32-core but one group 36-core, grouped into homogeneous partitions. We'd like to be able to specify multiple partitions so that a job can run on any homogeneous group, and it would be nice if we could run on all such nodes using 32 cores per node. To try to do this, I created an additional partition for the 36-core nodes (call them n2019) which specifies a max CPU count of 64:

    PartitionName=n2019    DefMemPerCPU=2631 Nodes=compute-4-[0-47]
    PartitionName=n2019_32 DefMemPerCPU=2631 Nodes=compute-4-[0-47] MaxCPUsPerNode=64
    PartitionName=n2021    DefMemPerCPU=2960 Nodes=compute-7-[0-18]
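Concretely, the goal is to be able to submit something like the following (exact memory/time options aside) and have it land on whichever homogeneous group is free, using 32 cores per node either way:

    sbatch --partition=n2019_32,n2021 --ntasks=128 --ntasks-per-core=1 --exclusive job.pbs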
However, if I try to run a 128-task, 1-task-per-core job on n2019_32, the sbatch fails:

    sbatch --ntasks=128 --exclusive --partition=n2019_32 --ntasks-per-core=1 job.pbs
    sbatch: error: Batch job submission failed: Requested node configuration is not available

(Please ignore the ".pbs" - it's a relic, and the job script works with Slurm.) The identical command with "n2019" or "n2021" as the partition works, although the former uses 36 cores per node. If I specify multiple partitions, the job only actually runs when nodes outside of n2019 (which covers the same node set as n2019_32) are available. The job header includes only the walltime, job name, stdout/stderr files, shell, and a job array range. I tried adding "-v" to the sbatch command to see if that gives more useful info, but I couldn't get any more insight.

Does anyone have any idea why it's rejecting my job?

thanks,
Noam
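P.S. In case it helps, the job script header is essentially just the following; the names, time, and array range here are placeholders, not the real values:

    #!/bin/bash
    #SBATCH --time=24:00:00            # walltime (placeholder)
    #SBATCH --job-name=myjob           # job name (placeholder)
    #SBATCH --output=myjob.%A_%a.out   # stdout file (placeholder)
    #SBATCH --error=myjob.%A_%a.err    # stderr file (placeholder)
    #SBATCH --array=1-10               # job array range (placeholder)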