[slurm-dev] Re: Error: Unable to contact slurm controller

2014-08-21 Thread Gerry Creager - NOAA Affiliate
No, slurmctld isn't running. Now. It was when I started, but I suspect I
made at least one mod too many to slurm.conf. When I try to start
slurmctld, I get these in slurmctld.log:
[2014-08-21T09:30:09.626] debug2: No ApbasilTimeout configured (65534)
[2014-08-21T09:30:09.630] debug2: No ApbasilTimeout configured (65534)
[2014-08-21T09:30:09.673] fatal: system has no usable batch compute nodes


I've just made a mod to slurm.conf that makes sure there's a default
partition. I'd had named partitions in previously, but got some errors and
warnings when trying to get the partition naming right in #SBATCH, so I'd
gone back to the default config.

This appears to have started with a reboot several days ago. I'm now making
sure it's not something deeper causing a Gemini network problem.

Thanks, Trey!
gerry


On Wed, Aug 20, 2014 at 10:11 PM, Trey Dockendorf treyd...@tamu.edu wrote:


 Is slurmctld running?  My guess is that you need at least one partition
 defined in addition to the DEFAULT partition.  Try creating a partition
 with any name, which will inherit everything from DEFAULT.

 - Trey

 =

 Trey Dockendorf
 Systems Analyst I
 Texas A&M University
 Academy for Advanced Telecommunications and Learning Technologies
 Phone: (979)458-2396
 Email: treyd...@tamu.edu
 Jabber: treyd...@tamu.edu

 - Original Message -
  From: Gerry Creager - NOAA Affiliate gerry.crea...@noaa.gov
  To: slurm-dev slurm-dev@schedmd.com
  Sent: Wednesday, August 20, 2014 4:40:40 PM
  Subject: [slurm-dev] Re: Error: Unable to contact slurm controller
 
 
  Hi, Trey
 
 
  That's what I am intuiting, as well, but:
 
 
 
  gerry@loki:~/software/wrf/NME/DART_Lanai/models/wrf/work> egrep '^(PartitionName|NodeName)' /opt/slurm/default/etc/slurm.conf

  NodeName=nid00[002-007,024-029,040-043,046-049,052-055,064-071,088-099,100-103,120-127,136-151,160-167,184-199,216-223,232-247,256-263,280-287] Sockets=4 CoresPerSocket=4 ThreadsPerCore=1 RealMemory=65536
  PartitionName=DEFAULT Shared=EXCLUSIVE State=UP DefaultTime=60 Nodes=nid00[002-007,024-029,040-043,046-049,052-055,064-071,088-099,100-103,120-127,136-151,160-167,184-199,216-223,232-247,256-263,280-287] MaxNodes=12
 
 
  looks pretty normal.
 
 
  gerry
 
 
 
 
 
  On Wed, Aug 20, 2014 at 4:25 PM, Trey Dockendorf treyd...@tamu.edu wrote:
 
 
 
  What's your slurm.conf look like? Do you have valid Nodes and
  Partitions defined?
 
  For example:
 
  egrep '^(PartitionName|NodeName)' /etc/slurm/slurm.conf
 
  Sounds like invalid slurm.conf is preventing slurmctld from starting.
 
  - Trey
 
  =
 
  Trey Dockendorf
  Systems Analyst I
  Texas A&M University
  Academy for Advanced Telecommunications and Learning Technologies
  Phone: (979)458-2396
  Email: treyd...@tamu.edu
  Jabber: treyd...@tamu.edu
 
 
 
  - Original Message -
   From: Gerry Creager - NOAA Affiliate  gerry.crea...@noaa.gov 
   To: slurm-dev  slurm-dev@schedmd.com 
   Sent: Wednesday, August 20, 2014 4:09:25 PM
   Subject: [slurm-dev] Re: Error: Unable to contact slurm controller
  
  
   Moe,
  
  
   Thanks. I've tried. I'm noting a pair of errors in the
   slurmctld.log
   file:
  
  
  
    [2014-08-20T15:58:58.458] debug: No DownNodes
    [2014-08-20T15:58:58.458] fatal: No PartitionName information available!
  
  
   So far, Google hasn't helped me much in this regard.
  
  
   gerry
  
  
  
   On Wed, Aug 20, 2014 at 11:39 AM,  je...@schedmd.com  wrote:
  
  
  
   Try this:
    http://slurm.schedmd.com/troubleshoot.html
  
  
  
   Quoting Gerry Creager - NOAA Affiliate  gerry.crea...@noaa.gov :
  
  
  
   I'm trying to learn how to use and administer slurm on a new Cray
   system,
   and started seeing this yesterday:
    squeue
    slurm_load_jobs error: Unable to contact slurm controller (connect failure)
  
   I'm at a loss as to how to proceed.
  
   Thanks, Gerry
   --
   Gerry Creager
   NSSL/CIMMS
   405.325.6371
   ++
   “Big whorls have little whorls,
   That feed on their velocity;
   And little whorls have lesser whorls,
   And so on to viscosity.”
   Lewis Fry Richardson (1881-1953)
  
  
   --
   Morris Moe Jette
   CTO, SchedMD LLC
  
   Slurm User Group Meeting
   September 23-24, Lugano, Switzerland
    Find out more http://slurm.schedmd.com/slurm_ug_agenda.html
  
  
  
  
   --
  
   Gerry Creager
   NSSL/CIMMS
   405.325.6371
   ++
  
   “Big whorls have little whorls,
   That feed on their velocity;
   And little whorls have lesser whorls,
   And so on to viscosity.”
   Lewis Fry Richardson (1881-1953)
 
 
 
  --
 
  Gerry Creager
  NSSL/CIMMS
  405.325.6371
  ++
 
  “Big whorls have little whorls,
  That feed on their velocity;
  And little whorls have lesser whorls,
  And so on to viscosity.”
  Lewis Fry Richardson (1881-1953)




-- 
Gerry Creager
NSSL/CIMMS
405.325.6371
++
“Big whorls have little whorls,
That feed on their velocity;
And little whorls have lesser whorls,
And so on to viscosity.”
Lewis Fry Richardson (1881-1953)

[slurm-dev] Storing the job submission script in the accounting database

2014-08-21 Thread Antony Cleave


Is it possible to store the job submission script and the environment 
variables passed to it in the accounting database, or to log this data 
automatically to /path/to/spylog/SLURM_JOB_ID.log files in SLURM?


I'm interested in analysing what the cluster is used for over time and 
this would be a good start in working out what is really being submitted.
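If it is not supported natively, one possible fallback would be a thin wrapper 
around sbatch that archives the script and the submission environment before 
handing off to the real binary. A rough sketch only, where the spy directory, 
the real sbatch path, and the "last argument is the script" heuristic are all 
assumptions:

#!/bin/bash
# sbatch wrapper: archive the submitted script and environment, then submit.
SPYDIR=/path/to/spylog                     # assumed site-writable directory
REAL_SBATCH=/opt/slurm/default/bin/sbatch  # assumed path to the real sbatch

out=$("$REAL_SBATCH" "$@") || exit $?
echo "$out"
jobid=$(echo "$out" | awk '{print $NF}')   # parses "Submitted batch job <id>"

# Naive guess: the last command-line argument is the batch script.
for arg in "$@"; do script=$arg; done
[ -n "$jobid" ] && [ -f "$script" ] && cp "$script" "$SPYDIR/$jobid.script"
env > "$SPYDIR/$jobid.env"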


Thanks

Antony


[slurm-dev] Intel MPI Performance inconsistency (and workaround)

2014-08-21 Thread Jesse Stroik


Slurmites,

We recently noticed sporadic performance inconsistencies on one of our 
clusters. We discovered that if we restarted slurmd in an interactive 
shell, we observed correct performance.


To track down the cause, we ran:

(1) single-node linpack
(2) dual node mp_linpack
(3) mpptest

On affected nodes, single-node Linpack performance was normal, while 
mp_linpack reached only about 85% of the expected performance.


mpptest, which measures MPI performance, was our smoking gun. Latencies 
were 10x higher than expected (~20us instead of < 2us). We were able to 
consistently reproduce the issue with freshly imaged or freshly rebooted 
nodes. Upon restarting slurmd on each execution node manually, MPI 
latencies immediately improved to the expected < 2us across our set of 
tested nodes.
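For reference, the manual restart was just a service bounce fanned out over 
the nodes, along these lines (pdsh, the node range, and the systemd-vs-init 
variants are placeholders for whatever a given site uses):

pdsh -w node[001-064] 'systemctl restart slurmd'   # or: /etc/init.d/slurm restart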


The cluster is under fairly heavy use right now, so we don't have the 
luxury of diagnosing this thoroughly and determining the cause. We wanted 
to share this experience in case it helps other users, or in case any 
slurm developers would like us to file a bug report and gather further 
information.


Best,
Jesse Stroik

University of Wisconsin


[slurm-dev] Re: Account / partition association on heterogeneous clusters

2014-08-21 Thread Jesse Stroik


We ended up working around this by writing a program that provides users 
with the appropriate settings.
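A stripped-down sketch of the kind of lookup involved (a sketch only; it 
assumes each association carries a Partition field, as in the table quoted 
below, and that sacctmgr's parsable output is used):

#!/bin/bash
# Print the account to pass to sbatch/srun -A for the partition given as $1.
part=$1
sacctmgr -nP show assoc user="$USER" format=account,partition |
    awk -F'|' -v p="$part" '$2 == p { print $1; exit }'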


It may be worth considering for future releases of slurm to either 
automatically use any valid account matching a user-partition request, or 
to allow administrators to set a default account for each user-partition 
combination.


Best,
Jesse Stroik
University of Wisconsin

On 8/13/2014 12:43 PM, Jesse Stroik wrote:


Our cluster has two primary groups of users. The user groups each have
a different account from which we designate shares and for which we
provide accounting information.

We are in the process of adding nodes for which CPU time has a very
different practical value to the end users. If users used these nodes,
their shares would provide them less value.

To mitigate this, we've created a new partition ('amd') and we've set up
an additional account for each group.

group1
group2
group1-amd
group2-amd

Each user has multiple associations. For example:

group1 ourcluster 9000
group2 ourcluster 1000

group1 ourcluster alice 100 regular-partition
group1-amd ourcluster alice 100 amd-partition

group2 ourcluster bob 100 regular-partition
group2-amd ourcluster bob 100 amd-partition

One issue with this is that users who can only use one partition get a
smaller share of the total system. Another is that we cannot set a
default account per user-partition combination. For example, if Alice
wants to submit to --partition amd-partition, she must also specify -A
group1-amd or she will get an invalid user/partition error.
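Concretely, with the names above (job.sh is just a placeholder):

sbatch --partition=amd-partition -A group1-amd job.sh   # works
sbatch --partition=amd-partition job.sh                 # fails with the invalid
                                                        # user/partition error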

Is there a better way to do this? We don't see a way to allow SLURM to
search the association tables for a valid account for the user/partition
combination.

Best,
Jesse Stroik
University of Wisconsin


[slurm-dev] Re: Intel MPI Performance inconsistency (and workaround)

2014-08-21 Thread Kilian Cavalotti

Hi Jesse,

Just a shot in the dark, but do you use task affinity or CPU binding?

Cheers,
-- 
Kilian


[slurm-dev] Re: Intel MPI Performance inconsistency (and workaround)

2014-08-21 Thread Jesse Stroik


Yes, but we aren't specifying it for all of these jobs. In the config we 
have:



---
TaskPlugin=task/affinity
TaskPluginParam=Sched
SelectTypeParameters=CR_CPU_Memory,CR_CORE_DEFAULT_DIST_BLOCK
---

And we typically suggest --cpu_bind=cores --distribution=block:block 
for srun in our documentation. However, we did not specify --cpu_bind or 
--distribution as arguments to the job for mpptest or for mp_linpack. 
And we noticed that despite the CR_CORE_DEFAULT_DIST_BLOCK setting, we 
still needed to specify --distribution=block:block for our binding to be 
correct for OpenMP+MPI hybrid jobs.
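For an OpenMP+MPI job, those suggestions end up on a launch line along these 
lines (the task/thread counts and the binary name are made-up placeholders):

export OMP_NUM_THREADS=4
srun --ntasks=8 --cpus-per-task=4 --cpu_bind=cores --distribution=block:block ./hybrid_app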


Best,
Jesse


On 8/21/2014 2:14 PM, Kilian Cavalotti wrote:


Hi Jesse,

Just a shot in the dark, but do you use task affinity or CPU binding?

Cheers,



[slurm-dev] Re: Intel MPI Performance inconsistency (and workaround)

2014-08-21 Thread Christopher Samuel

On 22/08/14 04:43, Jesse Stroik wrote:

 We recently noticed sporadic performance inconsistencies on one of our
 clusters.

What distro is this?  Are you using cgroups?

cheers,
Chris
-- 
 Christopher Samuel        Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci