[slurm-dev] Re: Slurm versions 17.02.5 and 17.11.0-pre1 are now available

2017-07-15 Thread Andrew Elwell

> Slurm version 17.11.0-pre1 is the first pre-release of version 17.11, to
> be
> released in November 2017. This version contains the support for
> scheduling of
> a workload across a set (federation) of clusters which is described in
> some
> detail here:
> https://slurm.schedmd.com/SLUG16/FederatedScheduling.pdf

Something that seems to be missing in the PDF (unless it's in the
"Magic: TBD" part) is the ability for a federated job to have
dependencies on sibling jobs - is this still part of the workflow?

ie
Federation = MySite
sibling cluster 1 = BigCray
sibling cluster 2 = PrePostCluster

ideally we'd like a user, probably logged into BigCray as their
local cluster, to submit a job with
step1 - serial work on PrePostCluster
step2 - large srun on BigCray, dependency = afterok: PrePostCluster:step1
step3 - small parallel cleanup on PrePostCluster, dependency=afterok:
BigCray:step2
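
As a sketch of what I mean (the syntax here is my invention - nothing
in the PDF confirms that --dependency can reference a job on a sibling
cluster, which is exactly what I'm asking - and the script names are
made up):

jid1=$(sbatch --parsable --clusters=PrePostCluster step1_serial.sh)
jid2=$(sbatch --parsable --clusters=BigCray \
       --dependency=afterok:${jid1} step2_srun.sh)
sbatch --clusters=PrePostCluster --dependency=afterok:${jid2} step3_cleanup.sh

(and yes, with --clusters, --parsable prints "jobid;cluster" rather
than a bare jobid, so even the plumbing above is hand-waving)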


Is this still on schedule for the initial 17.11 release or will it
land in a later update or release?


Andrew (trying to work out if we'll have time to test 17.11 before
upgrading all clusters in the 1st week of Jan)


[slurm-dev] Re: rpmbuild from tarball

2017-06-18 Thread Andrew Elwell

Yes - the *schedmd* supplied tarballs (as you point to in the wiki
page) work fine. The *github* ones don't.

If you compare them, the github ones have a different slurm.spec,
hence my asking whether there's another script needed to sed them.



On 19 June 2017 at 14:01, Ole Holm Nielsen wrote:
> On 06/19/2017 05:36 AM, Andrew Elwell wrote:
>> I've just tried and failed to get the github release
>>
>> (https://github.com/SchedMD/slurm/releases) of 16.05.10-2 to build
>> using the 'rpmbuild -ta tarball' trick - it's failing on line 88 of
>> the spec
>>
>> ie
>>>
>>> Name:    see META file
>>> Version: see META file
>>> Release: see META file
>
>
> Works like a charm on CentOS 7.3!  Do you have all the prerequisites
> installed?  See
> https://wiki.fysik.dtu.dk/niflheim/Slurm_installation#build-slurm-rpms
>
> /Ole


[slurm-dev] rpmbuild from tarball

2017-06-18 Thread Andrew Elwell

I've just tried and failed to get the github release
(https://github.com/SchedMD/slurm/releases) of 16.05.10-2 to build
using the 'rpmbuild -ta tarball' trick - it's failing on line 88 of
the spec

ie
> Name:    see META file
> Version: see META file
> Release: see META file

However, the tarball from the schedmd website has them hard-coded:

aelwell@badger:~/compile$ diff -ur slurm-slurm-16-05-10-2/ slurm-16.05.10-2/
diff -ur slurm-slurm-16-05-10-2/META slurm-16.05.10-2/META
--- slurm-slurm-16-05-10-2/META 2017-03-03 08:42:11.0 +0800
+++ slurm-16.05.10-2/META 2017-03-03 08:52:56.0 +0800
@@ -1,36 +1,11 @@
-##
-# Metadata for RPM/TAR makefile targets
-##
-# See src/api/Makefile.am for guidance on setting API_ values
-##
-  Meta: 1
-  Name: slurm
-  Major: 16
-  Minor: 05
-  Micro: 10
-  Version: 16.05.10
-  Release: 2
-# Include leading zero for all pre-releases
-
-##
-#  When making a new Major/Minor version update
-#  src/common/slurm_protocol_common.h
-#  with a new SLURM_PROTOCOL_VERSION signifing the old one and the version
-#  it was so the slurmdbd can continue to send the old protocol version.
-#  In src/common/slurm_protocol_util.c check_header_version()
-#  need to be updated also when changes are added also.
-#  In src/plugins/slurmctld/nonstop/msg.c needs to have version_string updated.
-#  The META of libsmd needs to reflect this version and API_CURRENT as well.
-#
-#  NOTE: The API version can not be the same as the Slurm version above.  The
-#version in the code is referenced as a uint16_t which if 1403 was the
-#API_CURRENT it would go over the limit.  So keep is a relatively
-#small number.
-#
-#  NOTE: The values below are used to set up environment variables in
-#the config.h file that may be used throughout Slurm, so don't remove
-# them.
-##
-  API_CURRENT: 30
-  API_AGE: 0
-  API_REVISION: 0
+  Api_age:   0
+  Api_current:   30
+  Api_revision:  0
+  Major: 16
+  Meta:  1
+  Micro: 10
+  Minor: 05
+  Name:  slurm
+  Release:   2
+  Release_tags:  dist
+  Version:   16.05.10
diff -ur slurm-slurm-16-05-10-2/slurm.spec slurm-16.05.10-2/slurm.spec
--- slurm-slurm-16-05-10-2/slurm.spec 2017-03-03 08:42:11.0 +0800
+++ slurm-16.05.10-2/slurm.spec 2017-03-03 08:52:54.0 +0800
@@ -85,15 +85,15 @@
 %slurm_with_opt sgijob
 %endif

-Name:    see META file
-Version: see META file
-Release: see META file
+Name:    slurm
+Version: 16.05.10
+Release: 2%{?dist}

 Summary: Slurm Workload Manager

 License: GPL
 Group: System Environment/Base
-Source: %{name}-%{version}-%{release}.tgz
+Source: slurm-16.05.10-2.tar.bz2
 BuildRoot: %{_tmppath}/%{name}-%{version}-%{release}
 URL: http://slurm.schedmd.com/

@@ -431,7 +431,7 @@
 #

 %prep
-%setup -n %{name}-%{version}-%{release}
+%setup -n slurm-16.05.10-2

 %build
 %configure \
@@ -648,8 +648,8 @@
 Cflags: -I\${includedir}
 Libs: -L\${libdir} -lslurm
 Description: Slurm API
-Name: %{name}
-Version: %{version}
+Name: slurm
+Version: 16.05.10
 EOF

 %if %{slurm_with bluegene}



Is there a magic script that sets these?
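
If there isn't one, I guess a quick hack that reads META and patches
the spec would do the trick - this is just my sketch of what the
release process might do, not an official SchedMD tool:

name=$(awk '$1 == "Name:" {print $2}' META)
version=$(awk '$1 == "Version:" {print $2}' META)
release=$(awk '$1 == "Release:" {print $2}' META)
sed -i -e "s/^Name:.*see META file/Name:    ${name}/" \
       -e "s/^Version:.*see META file/Version: ${version}/" \
       -e "s/^Release:.*see META file/Release: ${release}%{?dist}/" \
    slurm.spec
# (the Source: and %setup lines would need the same treatment)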

Andrew


[slurm-dev] Re: Fwd: how to perform a DB upgrade?

2017-01-04 Thread Andrew Elwell

> 4. Start the new slurmdbd

Do this part by hand (ie, slurmdbd -Dvvv): the migration makes startup
take longer than an init script / systemctl timeout allows, so it'll
be flagged as 'failed'.

When I did this for our test cluster, the initial startup of slurmdbd
took ~30 mins or so to update from 14.11.x to 16.05.x

Once that was done, I ctrl-C'd the slurmdbd and started it as normal via
"systemctl start slurmdbd"
(yeah I know. systemd...)
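
For the record, the rough sequence was (package and database names
from memory - adjust for your setup):

systemctl stop slurmdbd
mysqldump slurm_acct_db > slurm_acct_db.pre-upgrade.sql  # belt and braces
yum update 'slurm*'
slurmdbd -D -vvv    # foreground; wait for the schema conversion to finish
# ctrl-C once it goes quiet, then:
systemctl start slurmdbd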


[slurm-dev] 404 on webpages

2016-11-03 Thread Andrew Elwell

Paging schedmd peeps:

http://slurm.schedmd.com/man_index.html
leads to a bunch of 404s


[slurm-dev] slurmdbd user across multiple clusters

2016-11-01 Thread Andrew Elwell

In the docs for slurmdbd 16.05 it states[1]:

SlurmUser
The name of the user that the slurmctld daemon executes as. This user
must exist on the machine executing the Slurm Database Daemon and have
the same user ID as the hosts on which slurmctld execute. For security
purposes, a user other than "root" is recommended. The default value
is "root".

However, what should this be set to when we have a mix of clusters?
The "normal" cluster nodes have slurmuser=slurm, but for the crays
(running ALPS) this needs to be slurmuser=root [2]

Will Bad Things(TM) happen if I leave this set to slurmuser=slurm on
our DBD host?
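
For concreteness, the three configs involved look like this (the last
line is the one I'm unsure about):

# slurm.conf on the "normal" clusters
SlurmUser=slurm

# slurm.conf on the crays (ALPS)
SlurmUser=root

# slurmdbd.conf on the shared DBD host - currently:
SlurmUser=slurm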

[1] http://slurm.schedmd.com/slurmdbd.conf.html
[2] http://slurm.schedmd.com/cray_alps.html

Andrew


[slurm-dev] Re: sreport "duplicate" lines

2016-10-20 Thread Andrew Elwell

> Looks like you've somehow created partition specific associations for
> some people - not something we do at all.

ISTR this was because 2.6 didn't let us have an overall restriction
for the cluster and a sub-restriction on the number of jobs to run in
a (debug) partition

I could understand that in the case of 'maali', which only has a
top-level assoc, but not wdavey, who has the same assocs as me yet
only one line showing in sreport.

I couldn't see any obvious field to add to the sreport format= list
to display the missing difference, ie

sreport cluster AccountUtilizationByUser -th start=2016-07-01
end=2016-10-01 cluster=magnus user=aelwell tree
format="cluster,u,l,partition"
 Unknown field 'partition'

The magic seems to be in format_list in
https://github.com/SchedMD/slurm/blob/master/src/sreport/cluster_reports.c
but I'm still trying to work out where...

Andrew


[slurm-dev] Re: sreport "duplicate" lines

2016-10-20 Thread Andrew Elwell

Yep, and for that particular account, not all of the members are
showing twice - I can't work out what causes it



Cluster/Account/User Utilization 1 Jul 00:00 - 30 Sep 23:59 (7948800 secs)
Time reported in CPU Hours

   Cluster    Account     Login       Proper Name    Used Energy
---------- ---------- --------- ----------------- ------- ------
    magnus pawsey0001                               397565      0
    magnus pawsey0001     achew       Ashley Chew      175      0
    magnus pawsey0001     achew       Ashley Chew      136      0
    magnus pawsey0001   aelwell     Andrew Elwell        2      0
    magnus pawsey0001   aelwell     Andrew Elwell     2236      0
    magnus pawsey0001 bskjerven    Brian Skjerven      275      0
    magnus pawsey0001 bskjerven    Brian Skjerven        9      0
    magnus pawsey0001   charris   Christopher Ha+      309      0
    magnus pawsey0001   charris   Christopher Ha+       12      0
    magnus pawsey0001     cyang     Charlene Yang      830      0
    magnus pawsey0001     cyang     Charlene Yang     1912      0
    magnus pawsey0001    darran      Darran Carey        7      0
    magnus pawsey0001    darran      Darran Carey       30      0
    magnus pawsey0001 ddeeptim+   Deva Deeptimah+     2212      0
    magnus pawsey0001 ddeeptim+   Deva Deeptimah+     4130      0
    magnus pawsey0001     maali        Black Swan      170      0
    magnus pawsey0001    moshea       Mark O'Shea        1      0
    magnus pawsey0001    moshea       Mark O'Shea        4      0
    magnus pawsey0001   mshaikh   Mohsin Ahmed S+     1460      0
    magnus pawsey0001   mshaikh   Mohsin Ahmed S+     4538      0
    magnus pawsey0001     pryan         Paul Ryan     1611      0
    magnus pawsey0001     pryan         Paul Ryan    14397      0
    magnus pawsey0001    reaper   Daniel Grimwood       17      0
    magnus pawsey0001    reaper   Daniel Grimwood      225      0
    magnus pawsey0001    wdavey     William Davey        0      0

$ sacctmgr show assoc user=maali account=pawsey0001 cluster=magnus
   Cluster    Account   User  Partition  Share  ...     QOS
---------- ---------- ------ ---------- ------  ...  ------
    magnus pawsey0001  maali             parent 32 128 normal

$ sacctmgr show assoc user=wdavey account=pawsey0001 cluster=magnus
   Cluster    Account   User  Partition  Share  ...     QOS
---------- ---------- ------ ---------- ------  ...  ------
    magnus pawsey0001 wdavey             parent 32 128 normal
    magnus pawsey0001 wdavey     debugq  parent  1   4 normal
    magnus pawsey0001 wdavey      workq  parent 32 128 normal




Should I log this with our vendor (for official support) or directly
in the schedmd bugzilla?


[slurm-dev] sreport "duplicate" lines

2016-10-20 Thread Andrew Elwell

Hi folks,

When running sreport (on both 14.11 and 16.05) I'm seeing "duplicate"
user lines with different usage totals. Can someone say what's being
added up separately here? It seems to be summing something differently
and I can't work out what makes it split into two:


$ sreport cluster AccountUtilizationByUser start=2016-07-01
end=2016-10-01 account=pawsey0001 user=aelwell cluster=magnus -t h



Cluster/Account/User Utilization 1 Jul 00:00 - 30 Sep 23:59 (7948800 secs)
Time reported in CPU Hours

   Cluster    Account   Login   Proper Name   Used Energy
---------- ---------- ------- ------------- ------ ------
    magnus pawsey0001 aelwell Andrew Elwell      2      0
    magnus pawsey0001 aelwell Andrew Elwell   2236      0


$ sacctmgr show assoc user=aelwell account=pawsey0001 cluster=magnus
   Cluster    Account    User  Partition  Share  ...     QOS
---------- ---------- ------- ---------- ------  ...  ------
    magnus pawsey0001 aelwell             parent 32 128 normal
    magnus pawsey0001 aelwell     debugq  parent  1   4 normal
    magnus pawsey0001 aelwell      workq  parent 32 128 normal



Is there a way of getting sreport to give me a single aggregated
total, or failing that, to show why one "usage" is 2h and the other
2236h?

Andrew


[slurm-dev] Re: Packaging for fedora (and EPEL)

2016-10-17 Thread Andrew Elwell

> I've had consistent success with the documented system - "rpmbuild
> slurm-.tgz" then yum installing the resulting files, using 15.x,
> 16.05 and 17.02.

Yup, it seems to build well enough but then fails a few picky rpmlint
rules - nothing too major, and it *could* be worked around with local
patches, but hey, I'd rather get them upstream (lazy future maintainer)

Andrew


[slurm-dev] Packaging for fedora (and EPEL)

2016-10-17 Thread Andrew Elwell

Hi folks,

I see from https://bugzilla.redhat.com/show_bug.cgi?id=1149566 that
there have been a few unsuccessful attempts to get slurm into fedora
(and potentially EPEL)

Is anyone on this list actively working on it at the moment? I'll
update the bugzilla ticket to prod the last potential packager, but
failing that I'm offering to work on it.

My plan is to get 16.05 into fedora, but not into EPEL itself (the
supported life of a given release is just too short to match the RHEL
timeline); however, I'll probably make "unofficial" srpms publicly
available that should meet all the epel packaging requirements.

schedmd people - as some of this may involve patches to the spec file
amongst other things, what's the best way to progress it - attach a
diff to something on your bugzilla page rather than a git pull request?

Andrew


[slurm-dev] Re: rpm dependencies in 16.05.5

2016-10-13 Thread Andrew Elwell

> I have a Wiki page describing how to install Munge and Slurm on CentOS 7:

Thanks Ole, there are some good notes in there that I'll use.

My original question was more of a packaging issue - in this case I
don't mind installing the rest of the slurm binaries, but ideally I'd
like our slurmdbd host to be just that: slurmdbd alone (and as few
other installed applications as possible).


%if %{slurm_with munge}
%package munge
Summary: Slurm authentication and crypto implementation using Munge
Group: System Environment/Base
Requires: slurm munge
BuildRequires: munge-devel munge-libs
Obsoletes: slurm-auth-munge
%description munge
Slurm authentication and crypto implementation using Munge. Used to
authenticate user originating an RPC, digitally sign and/or encrypt messages
%endif

> Requires: slurm munge
seems to be the culprit
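
An untested local workaround would be to relax that dependency before
rebuilding - my own hack, not an upstream-sanctioned change:

--- slurm.spec.orig
+++ slurm.spec
-Requires: slurm munge
+Requires: munge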


[slurm-dev] rpm dependencies in 16.05.5

2016-10-13 Thread Andrew Elwell

Hi folks,

I've just built 16.05.5 into rpms (using the rpmbuild -ta
slurm*.tar.bz2 method) to update a CentOS 7 slurmdbd host.

According to http://slurm.schedmd.com/accounting.html

"Note that SlurmDBD relies upon existing Slurm plugins for
authentication and Slurm sql for database use, but the other Slurm
commands and daemons are not required on the host where SlurmDBD is
installed. Install the slurmdbd, slurm-plugins, and slurm-sql RPMs on
the computer when SlurmDBD is to execute. If you want munge
authentication, which is highly recommended, you will also need to
install the slurm-munge RPM."

so installing just slurmdbd, slurm-plugins, and slurm-sql works (yum
localinstall), but slurmdbd then fails to start, as expected:

[2016-10-13T20:19:46.931] error: Couldn't find the specified plugin
name for auth/munge looking at all files
[2016-10-13T20:19:46.931] error: cannot find auth plugin for auth/munge
[2016-10-13T20:19:46.931] error: cannot create auth context for auth/munge
[2016-10-13T20:19:46.931] fatal: Unable to initialize auth/munge
authentication plugin

however it's not possible to cleanly install slurm-munge without slurm:

[root@ae-test01 ~]# yum localinstall
rpmbuild/RPMS/x86_64/slurm-munge-16.05.5-1.el7.centos.x86_64.rpm
Loaded plugins: fastestmirror
Examining rpmbuild/RPMS/x86_64/slurm-munge-16.05.5-1.el7.centos.x86_64.rpm:
slurm-munge-16.05.5-1.el7.centos.x86_64
Marking rpmbuild/RPMS/x86_64/slurm-munge-16.05.5-1.el7.centos.x86_64.rpm
to be installed
Resolving Dependencies
--> Running transaction check
---> Package slurm-munge.x86_64 0:16.05.5-1.el7.centos will be installed
--> Processing Dependency: slurm for package:
slurm-munge-16.05.5-1.el7.centos.x86_64
base                                                | 3.6 kB  00:00:00
extras                                              | 3.4 kB  00:00:00
updates                                             | 3.4 kB  00:00:00
--> Finished Dependency Resolution
Error: Package: slurm-munge-16.05.5-1.el7.centos.x86_64
(/slurm-munge-16.05.5-1.el7.centos.x86_64)
   Requires: slurm
 You could try using --skip-broken to work around the problem
 You could try running: rpm -Va --nofiles --nodigest


[root@ae-test01 ~]# yum localinstall
rpmbuild/RPMS/x86_64/slurm-munge-16.05.5-1.el7.centos.x86_64.rpm
rpmbuild/RPMS/x86_64/slurm-16.05.5-1.el7.centos.x86_64.rpm
Loaded plugins: fastestmirror
Examining rpmbuild/RPMS/x86_64/slurm-munge-16.05.5-1.el7.centos.x86_64.rpm:
slurm-munge-16.05.5-1.el7.centos.x86_64
Marking rpmbuild/RPMS/x86_64/slurm-munge-16.05.5-1.el7.centos.x86_64.rpm
to be installed
Examining rpmbuild/RPMS/x86_64/slurm-16.05.5-1.el7.centos.x86_64.rpm:
slurm-16.05.5-1.el7.centos.x86_64
Marking rpmbuild/RPMS/x86_64/slurm-16.05.5-1.el7.centos.x86_64.rpm to
be installed
Resolving Dependencies
--> Running transaction check
---> Package slurm.x86_64 0:16.05.5-1.el7.centos will be installed
---> Package slurm-munge.x86_64 0:16.05.5-1.el7.centos will be installed
--> Finished Dependency Resolution

Dependencies Resolved


================================================================================
 Package       Arch     Version                Repository                  Size
================================================================================
Installing:
 slurm         x86_64   16.05.5-1.el7.centos   /slurm-16.05.5-1.el7.centos.x86_64        85 M
 slurm-munge   x86_64   16.05.5-1.el7.centos   /slurm-munge-16.05.5-1.el7.centos.x86_64  44 k

Transaction Summary
================================================================================
Install  2 Packages

Total size: 85 M
Installed size: 85 M
Is this ok [y/d/N]: y


So - is this just a broken spec file that sets unneeded dependencies,
or are the docs wrong that you don't need to install slurm?

Andrew


[slurm-dev] Re: Remote Visualization and Slurm

2016-08-18 Thread Andrew Elwell

> If anyone has a working remote visualization cluster that integrates well
> with slurm, I would love to hear from you.

We're using 'strudel'
https://www.massive.org.au/userguide/cluster-instructions/strudel
and our local instructions are
https://support.pawsey.org.au/documentation/display/US/Getting+started%3A+Remote+visualisation+with+Strudel

Andrew


[slurm-dev] Re: Cray Resource Utilization Reporting (RUR) via plugin

2015-03-20 Thread Andrew Elwell

Thanks Danny,

Native is on our roadmap, but we're rolling out 14.11.x onto the
production systems first.

We'll see what RUR offers vs native when we get the next set of
changes through our test/dev system

Andrew


[slurm-dev] Cray Resource Utilization Reporting (RUR) via plugin

2015-03-20 Thread Andrew Elwell

Hi All,

We're investigating the possibility of enabling RUR on our XC30s,
with the end goal of integrating this into the slurmdbd for jobs.

Is anyone else working on this? If not, is anyone else interested?

I know that there's already
./acct_gather_energy/cray/acct_gather_energy_cray.c but I don't see
anything to interact with RUR.

Andrew


[slurm-dev] FlexLM integration - roughly how much work?

2015-01-15 Thread Andrew Elwell

Hi Folks,


At the Lugano meeting last year, SchedMD said that the FlexLM
integration had come off the short-term roadmap in favour of other
features.

We're interested in the possibility of holding jobs until certain
licences are available (hello ansys) rather than having them run and
fail. Can anyone speculate roughly how much work is involved to finish
the current implementation? Would this feature be of interest to any
other users if we got it into a fork or pull-requestable branch?

Andrew

[slurm-dev] Re: ReqNodeNotAvail instead of not accepting at all

2014-12-04 Thread Andrew Elwell
>
> Is there any way to configure slurm so that it won't accept jobs
> with non-existent features?
>

Do you have

EnforcePartLimits=YES

in your config file?


[slurm-dev] Re: sbatch --array question and a tale of job and task confusion

2014-11-19 Thread Andrew Elwell
I'll add that this is (most likely) being seen on slurm 2.6.6 on a Cray
using ALPS.
/me waves to Balt


[slurm-dev] Re: including config files

2014-10-23 Thread Andrew Elwell

> That's a new feature in Slurm v14.11.

ah right (digs out git blame - so it is). Is there any equivalent
functionality or variable parsing in older (2.6.9 or 14.03) releases
prior to Natan's patch?


[slurm-dev] including config files

2014-10-22 Thread Andrew Elwell

Hi Folks,

According to the docs (http://slurm.schedmd.com/slurm.conf.html) it
should be possible to have

include otherconfig.conf

in my slurm.conf,

however I'd like to make this ${ClusterName}.conf - is it possible to
do this?

I see that in src/common/parse_config.c there seems to be some sort of
hook for this

static char *_parse_for_format(s_p_hashtbl_t *f_hashtbl, char *path)
{
        char *filename = xstrdup(path);
        char *format = NULL;
        char *tmp_str = NULL;

        while (1) {
                if ((format = strstr(filename, "%c"))) { /* ClusterName */
                        if (!s_p_get_string(&tmp_str, "ClusterName", f_hashtbl)) {
                                error("%s: Did not get ClusterName for include "
                                      "path", __func__);
                                xfree(filename);
                                break;
                        }
                        xstrtolower(tmp_str);
but I can't work out how to do this
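
Reading that, it looks like "%c" in an include path gets substituted
with the lowercased ClusterName, so my guess is something like the
following in slurm.conf - though I haven't managed to get it working:

ClusterName=magnus
Include /etc/slurm/%c.conf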

Related: I also notice that it seems to need a fully qualified path -
is there a flexible shorthand for the same directory as the existing
config files?

Many thanks

Andrew


[slurm-dev] Re: Error: Unable to contact slurm controller

2014-08-24 Thread Andrew Elwell
Hi Gerry,


> [2014-08-21T09:30:09.673] fatal: system has no usable batch compute nodes
>

We see this on our systems (running Slurm + ALPS/Basil rather than
native) when slurmctld starts before the sdb has a list of batch
nodes. It's bitten us when we've set the nodes to interactive rather
than batch, and more regularly when we've restarted the sdb and
slurmctld has started too early in the boot process. (a quick 'service
slurm restart' sorts that though)
Andrew


[slurm-dev] --parsable(2) option for squeue / sinfo

2014-06-12 Thread Andrew Elwell

Hi folks,

Wishlist item -- would it be easy to port the parsable flags into
squeue? From a very quick glance over the code, it seems that sreport
and sacctmgr use common/print_fields.c, but that's not used by
squeue.
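
The closest workaround I've found is an explicit format string with a
delimiter, which isn't quite the same thing:

squeue --noheader -o '%i|%u|%T|%M'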

Many thanks

Andrew


[slurm-dev] Re: Pbs to slurmdbd

2014-01-28 Thread Andrew Elwell

> You might take a look at the moab_2_slurmdb.pl script in 
> contribs/slurmdb-direct.


Thanks - I figured that was a good start - my concern was the
> use lib qw(/home/da/slurm/1.3/
line in the code - I wasn't sure how much bitrot had set in, and
whether it would still work with 2.6.x :-)


[slurm-dev] Pbs to slurmdbd

2014-01-28 Thread Andrew Elwell
Hi folks,

We're migrating from PBS Pro to slurm mid CPU-accounting cycle. Since
slurmdbd/sreport looks nicer than grepping through pbs logs for usage (no
gold on this cluster), is there a way to populate slurmdbd records from pbs
until we migrate?

(I.e. has anyone done this already rather than me coding from scratch?)

Andrew


[slurm-dev] Installation onto an XC30

2013-09-26 Thread Andrew Elwell

Hi Folks,

I'm trying to install slurm (2.6.2) onto our Cray XC30 -- I've been
following the guide at http://slurm.schedmd.com/cray.html and Gerrit's
paper from CUG11, but I've got a few questions about daemon placement
and configuration.

1) we use eslogin nodes (and other external services) so the
instructions to enable the cray job pam module will fail as
/proc/cray/ -- we can (and have) enabled this on the
internal service nodes -- is this an issue?

2) I'm basically replicating our current PBS Pro daemon placement --
using the mom nodes for the slurmd, and our aux node (currently
running flexlm) as the slurmctld (I'm not using our sdb node as that
can't mount the common NFS share with the slurm binaries/configs) --
is this the right approach?

3) we're using a central slurmdbd on a separate host (the plan is to
bring all the site accounting from the various clusters together) --
as we use LDAP for all our user details, is there an existing quick
way to seed the initial 'sacctmgr add user <...>' stage (or do I hack
up an ldapsearch script to do it)?
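
Something like the following is what I had in mind if I end up
scripting it myself (the LDAP filter and the account name are guesses
for our directory - adjust to suit):

ldapsearch -x -LLL '(objectClass=posixAccount)' uid |
  awk '$1 == "uid:" {print $2}' |
  while read u; do
      sacctmgr -i add user name="$u" account=default
  done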

Many thanks

Andrew