[slurm-dev] SchedMD is hiring!

2017-06-13 Thread Danny Auble


SchedMD has opened another engineering position.  Details can be found 
at https://www.ziprecruiter.com/job/9aad3ef4


If you are interested in working for SchedMD as a full-time Slurm 
development and support engineer, please submit your resume at 
https://www.ziprecruiter.com/job/9aad3ef4 or to j...@schedmd.com


Feel free to contact ja...@schedmd.com with any questions regarding this 
opportunity.


Warmest Regards,
Danny


[slurm-dev] Slurm version 17.02.3 is now available

2017-05-10 Thread Danny Auble


We are pleased to announce the release of Slurm version 17.02.3, which 
contains 40 bug fixes developed over the past month.


Slurm can be downloaded from https://www.schedmd.com/downloads.php

Changes are as follows (which is a cut and paste of NEWS)...

* Changes in Slurm 17.02.3
==
 -- Increase --cpu_bind and --mem_bind field length limits.
 -- Fix segfault when using AdminComment field with job arrays.
 -- Clear Dependency field when all dependencies are satisfied.
 -- Add --array-unique to squeue which will display one unique pending job
array element per line.
 -- Reset backfill timers correctly without skipping over them in certain
circumstances.
 -- When running the "scontrol top" command, make sure that all of the 
user's
jobs have a priority that is lower than the selected job. Previous 
logic
would permit other jobs with equal priority (no jobs with higher 
priority).

 -- Fix perl api so we always get an allocation when calling Slurm::new().
 -- Fix issue with cleaning up cpuset and devices cgroups when multiple 
steps

end at the same time.
 -- Document that PriorityFlags option of DEPTH_OBLIVIOUS precludes the 
use of

FAIR_TREE.
 -- Fix issue if an invalid message came in a Slurm daemon/command may 
abort.

 -- Make it impossible to use CR_CPU* along with CR_ONE_TASK_PER_CORE. The
options are mutually exclusive.
 -- ALPS - Fix scheduling when ALPS doesn't agree with Slurm on what nodes
are free.
 -- When removing a partition make sure it isn't part of a reservation.
 -- Fix seg fault when attempting to load a non-existent burst buffer plugin.
 -- Fix to backfill scheduling with respect to QOS and association limits. Jobs
submitted to multiple partitions are most likely to be affected.
 -- sched/backfill: Improve assoc_limit_stop configuration parameter support.
 -- CRAY - Add ansible play and README.
 -- sched/backfill: Fix bug related to advanced reservations and the need to
reboot nodes to change KNL mode.
 -- Preempt plugins - fix check for 'preempt_youngest_first' option.
 -- Preempt plugins - fix incorrect casts in preempt_youngest_first mode.
 -- Preempt/job_prio - fix incorrect casts in sort function.
 -- Fix to make task/affinity work with ldoms where there are more than 64
cpus on the node.
 -- When using node_features/knl_generic make it so the slurmd doesn't segfault
when shutting down.
 -- Fix potential double-xfree() when using job arrays that can lead to
slurmctld crashing.
 -- Fix priority/multifactor priorities on a slurmctld restart if not using
accounting_storage/[mysql|slurmdbd].
 -- Fix NULL dereference reported by CLANG.
 -- Update proctrack documentation to strongly encourage use of
proctrack/cgroup.
 -- Fix potential memory leak if job fails to begin after nodes have been
selected for a job.
 -- Handle a job that made it out of the select plugin without a job_resrcs
pointer.
 -- Fix potential race condition when persistent connections are being closed
at shutdown.
 -- Fix incorrect lock levels when submitting a batch job or updating a job
in general.
 -- CRAY - Move delay waiting for job cleanup to after we check once.
 -- MYSQL - Fix memory leak when loading archived jobs into the database.
 -- Fix potential race condition when starting the priority/multifactor
plugin's decay thread.
 -- Sanity check to make sure we have started a job in acct_policy.c before we
clear it as started.
 -- Allow reboot program to use arguments.
 -- Message Aggr - Remove race condition on slurmd shutdown with respect to
destroying a mutex.
 -- Fix updating job priority on multiple partitions to be correct.
 -- Don't remove admin comment when updating a job.
 -- Return error when bad separator is given for scontrol update job licenses.


[slurm-dev] Slurm version 17.02.0 is now available

2017-02-23 Thread Danny Auble


After 9 months of development we are pleased to announce the 
availability of Slurm version 17.02.0.


A brief description of what is contained in this release and other notes 
about it appear below.  For a fuller description please consult 
the RELEASE_NOTES file available in the source.


Thanks to all involved!

Slurm downloads are available from https://schedmd.com/downloads.php.

RELEASE NOTES FOR SLURM VERSION 17.02
23 February 2017

IMPORTANT NOTES:
THE MAXJOBID IS NOW 67,108,863. ANY PRE-EXISTING JOBS WILL CONTINUE TO RUN BUT
NEW JOB IDS WILL BE WITHIN THE NEW MAXJOBID RANGE. Adjust your configured
MaxJobID value as needed to eliminate any confusion.
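
For example, the corresponding slurm.conf line (shown here with the new
default value; pick whatever limit makes sense for your site) is simply:

MaxJobId=67043328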

If using the slurmdbd (Slurm DataBase Daemon) you must update this first.
The 17.02 slurmdbd will work with Slurm daemons of version 15.08 and above.
You will not need to update all clusters at the same time, but it is very
important to update slurmdbd first and have it running before updating
any other clusters making use of it.  No real harm will come from updating
your systems before the slurmdbd, but they will not talk to each other
until you do.  Also, at least the first time running the slurmdbd you need to
make sure your my.cnf file has innodb_buffer_pool_size equal to at least 64M.

You can accomplish this by adding the line

innodb_buffer_pool_size=64M

under the [mysqld] reference in the my.cnf file and restarting the mysqld. The
buffer pool size must be smaller than the size of the MySQL tmpdir. This is
needed when converting large tables over to the new database schema.
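
As an illustration only, the relevant my.cnf fragment would look something
like this (any other [mysqld] settings at your site stay as they are):

[mysqld]
innodb_buffer_pool_size=64M

Then restart mysqld (for example "systemctl restart mysqld", or your
distribution's equivalent) before starting the 17.02 slurmdbd.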

Slurm can be upgraded from version 15.08 or 16.05 to version 17.02 without loss
of jobs or other state information. Upgrading directly from an earlier version
of Slurm will result in loss of state information.

If using SPANK plugins that use the Slurm APIs, they should be recompiled when
upgrading Slurm to a new major release.

NOTE: systemd services files are installed automatically, but not enabled.
  You will need to manually enable them on the appropriate systems:
  - Controller: systemctl enable slurmctld
  - Database: systemctl enable slurmdbd
  - Compute Nodes: systemctl enable slurmd

NOTE: If you are not using Munge, but are using the "service" scripts to
  start Slurm daemons, then you will need to remove this check from the
  etc/slurm*service scripts.

NOTE: If you are upgrading with any jobs from 14.03 or earlier
  (i.e. quick upgrade from 14.03 -> 15.08 -> 17.02) you will need
  to wait until after those jobs are gone before you upgrade to 17.02.

HIGHLIGHTS
==
 -- Added infrastructure for managing workload across a federation of clusters.
(partial functionality in version 17.02, fully operational in May 2017)
 -- In order to support federated jobs, the MaxJobID configuration parameter
default value has been reduced from 2,147,418,112 to 67,043,328 and its
maximum value is now 67,108,863. Upon upgrading, any pre-existing jobs that
have a job ID above the new range will continue to run and new jobs will get
job IDs in the new range.
 -- Added "MailDomain" configuration parameter to qualify email addresses.
 -- Automatically clean up task/cgroup cpuset and devices cgroups after steps
are completed.
 -- Added burst buffer support for job arrays. Added new SchedulerParameters
configuration parameter of bb_array_stage_cnt=# to indicate how many pending
tasks of a job array should be made available for burst buffer resource
allocation.
 -- Added new sacctmgr commands: "shutdown" (shutdown the server), "list stats"
(get server statistics) and "clear stats" (clear server statistics).
 -- The database index for jobs is now 64 bits.  If you happen to be close to
4 billion jobs in your database you will want to update your slurmctld at
the same time as your slurmdbd to prevent roll over of this variable, as
it is 32 bit in previous versions of Slurm.
 -- All memory values (in MB) are now 64 bit. Previously, nodes with very large
amounts of memory would not schedule or enforce memory limits correctly.
 -- Removed AIX, BlueGene/L and BlueGene/P support.
 -- Removed sched/wiki and sched/wiki2 plugins and associated code.
 -- Added PrologFlags=Serial to disable concurrent execution of prolog/epilog
scripts.


[slurm-dev] Slurm versions 16.05.9 and 17.02.0-0rc1 are now available

2017-01-31 Thread Danny Auble


We are pleased to announce the availability of Slurm versions 16.05.9 
and 17.02.0-0rc1 (release candidate 1).


16.05.9 contains around 25 rather minor bug fixes. Please upgrade at 
your leisure.


The rc release contains all of the features intended for release 17.02. 
Development has ended for this release and we are continuing with our 
testing phase which will most likely result in another rc before we tag 
17.02.0 near the middle of February. A description of what this release 
contains is in the RELEASE_NOTES file available in the source. Your help 
in hardening this version is greatly appreciated. You are invited to 
download this version and assist in testing. As with all rc releases you 
should be able to install and not worry about protocol/state changes 
going forward with the version.


Slurm downloads are available from https://schedmd.com/downloads.php.

Reading from NEWS for 16.05.9...

* Changes in Slurm 16.05.9
==
 -- Fix parsing of SBCAST_COMPRESS environment variable in sbcast.
 -- Change some debug messages to errors in task/cgroup plugin.
 -- backfill scheduler: Stop trying to determine expected start time for a job
after 2 seconds of wall time. This can happen if there are many running jobs
and a pending job can not be started soon.
 -- Improve performance of cr_sort_part_rows() in cons_res plugin.
 -- CRAY - Fix deadlock issue when updating accounting in the slurmctld and
scheduling a Datawarp job.
 -- Correct the job state accounting information for jobs requeued due to burst
buffer errors.
 -- burst_buffer/cray - Avoid "pre_run" operation if not using buffer (i.e.
just creating or deleting a persistent burst buffer).
 -- Fix slurm.spec file support for BlueGene builds.
 -- Fix missing TRES read lock in acct_policy_job_runnable_pre_select() code.
 -- Fix debug2 message printing value using wrong array index in
_qos_job_runnable_post_select().
 -- Prevent job timeout on node power up.
 -- MYSQL - Fix minor memory leak when querying steps and the sql fails.
 -- Make it so sacctmgr accepts column headers like MaxTRESPU and not MaxTRESP.
 -- Only look at SLURM_STEP_KILLED_MSG_NODE_ID on startup, to avoid a race
condition later when looking at a step's env.
 -- Make backfill scheduler behave like regular scheduler in respect to
'assoc_limit_stop'.
 -- Allow a lower version client command to talk to a higher version controller
using the multi-cluster options (e.g. squeue -M).
 -- slurmctld/agent race condition fix: Prevent job launch while PrologSlurmctld
daemon is running or node boot in progress.
 -- MYSQL - Fix a few other minor memory leaks when uncommon failures occur.
 -- burst_buffer/cray - Fix race condition that could cause multiple batch job
launch requests resulting in drained nodes.
 -- Correct logic to purge old reservations.
 -- Fix DBD cache restore from previous versions.
 -- Fix to logic for getting expected start time of existing job ID with
explicit begin time that is in the past.
 -- Clear job's reason of "BeginTime" in a more timely fashion and/or 
prevents

them from being stuck in a PENDING state.
 -- Make sure acct policy limits imposed on a job are correct after 
requeue.


Reading from NEWS for 17.02.0-0rc1...

* Changes in Slurm 17.02.0rc1
==
 -- Add port info to 'sinfo' and 'scontrol show node'.
 -- Fix errant definition of USE_64BIT_BITSTR which can lead to core dumps.
 -- Move BatchScript to end of each job's information when using
"scontrol -dd show job" to make it more readable.
 -- Add SchedulerParameters configuration parameter of "default_gbytes", which
treats a numeric-only (no suffix) value for memory and tmp disk space as being
in units of Gigabytes. Mostly for compatibility with LSF.
 -- Fix race condition in srun/sattach logic which would prevent srun from
terminating.
 -- Bitstring operations are now 64bit instead of 32bit.
 -- Replace hweight() function in bitstring with faster version.
 -- scancel would treat a non-numeric argument as the name of jobs to be
cancelled (a non-documented feature). Cancelling jobs by name now requires
the "--jobname=" command line argument.
 -- scancel modified to note that no jobs satisfy the filter options when the
--verbose option is used along with one or more job filters (e.g. "--qos=").
 -- Change _pack_cred to use pack_bit_str_hex instead of pack_bit_fmt for
better scalability and performance.
 -- Add BootTime configuration parameter to knl.conf file to optimize resource
allocations with respect to required node reboots.
 -- Add node_features_p_boot_time() to node_features plugin to optimize
scheduling with respect to node reboots.
 -- Avoid allocating resources to a job in the event that its run time plus boot
time (if needed) extend into an advanced reservation.
 -- Burst_buffer/cray - Avoid stage-out operation if job never started.
 -- node

[slurm-dev] Slurm version 16.05.7 is now available

2016-12-08 Thread Danny Auble


We are pleased to announce the immediate availability of Slurm 16.05.7.  
It contains about 40 relatively minor bug fixes.


Slurm downloads are available from 
https://www.schedmd.com/downloads.php.  You may notice this is a change 
in location; https://www.schedmd.com/#repos will still work for the time 
being, but it is a good idea to update your links sooner rather than later.


Changes are listed below or available as always in the NEWS file.

* Changes in Slurm 16.05.7
==
 -- Fix issue in the priority/multifactor plugin where, on a slurmctld restart,
more time is accounted for than should be allowed.
 -- cray/burst_buffer - If total_space in a pool decreases, reset used_space
rather than trying to account for buffer allocations in progress.
 -- cray/burst_buffer - Fix for double counting of used_space at slurmctld
startup.
 -- Fix regression in 16.05.6 where if you request multiple cpus per task (-c2)
and request --ntasks-per-core=1 and only 1 task on the node
the slurmd would abort with an infinite loop fatal error.
 -- cray/burst_buffer - Internally track both allocated and unusable space.
The reported UsedSpace in a pool is now the allocated space (previously was
unusable space). Base available space on whichever value leaves least free
space.
 -- cray/burst_buffer - Preserve job ID and don't translate to job array ID.
 -- cray/burst_buffer - Update "instance" parsing to match updated dw_wlm_cli
output.
 -- sched/backfill - Ensure we don't try to start a job that was already started
and requeued by the main scheduling logic.
 -- job_submit/lua - add access to the job features field in job_record.
 -- select/linear plugin modified to better support heterogeneous clusters when
topology/none is also configured.
 -- Permit cancellation of jobs in configuring state.
 -- acct_gather_energy/rapl - prevent segfault in slurmd from race to gather
data at slurmd startup.
 -- Integrate node_feature/knl_generic with "hbm" GRES information.
 -- Fix output routines to prevent rounding the TRES values for memory or BB.
 -- switch/cray plugin - fix use after free error.
 -- docs - elaborate on how to clear TRES limits in sacctmgr.
 -- knl_cray plugin - Avoid abort from backup slurmctld at start time.
 -- cgroup plugins - fix two minor memory leaks.
 -- If a node is booting for some job, don't allocate additional jobs to the
node until the boot completes.
 -- testsuite - fix job id output in test17.39.
 -- Modify backfill algorithm to improve performance with large numbers of
running jobs. Group running jobs that end in a "similar" time frame using a
time window that grows exponentially rather than linearly. After one second
of wall time, simulate the termination of all remaining running jobs in
order to respond in a reasonable time frame.
 -- Fix slurm_job_cpus_allocated_str_on_node_id() API call.
 -- sched/backfill plugin: Make malloc match data type (defined as uint32_t and
allocated as int).
 -- srun - prevent segfault when terminating job step before step has launched.
 -- sacctmgr - prevent segfault when trying to reset usage for an invalid
account name.
 -- Make the openssl crypto plugin compile with openssl >= 1.1.
 -- Fix SuspendExcNodes and SuspendExcParts on slurmctld reconfiguration.
 -- sbcast - prevent segfault in slurmd due to race condition between file
transfers from separate jobs using zlib compression.
 -- cray/burst_buffer - Increase time to synchronize operations between threads
from 5 to 60 seconds ("setup" operation time observed over 17 seconds).
 -- node_features/knl_cray - Fix possible race condition when changing node
state that could result in old KNL mode as an active feature.
 -- Make sure if a job can't run because of resources we also check accounting
limits after the node selection to make sure it doesn't violate those limits,
and if it does, change the reason for waiting so we don't reserve resources
on jobs violating accounting limits.
 -- NRT - Make it so a system running against IBM's PE will work with PE
version 1.3.
 -- NRT - Make it so protocols pgas and test are allowed to be used.
 -- NRT - Make it so you can have more than 1 protocol listed in MP_MSG_API.
 -- cray/burst_buffer - If slurmctld daemon restarts with pending job and burst
buffer having unknown file stage-in status, teardown the buffer, defer the
job, and start stage-in over again.
 -- On state restore in the slurmctld don't overwrite the mem_spec_limit given
from the slurm.conf when using FastSchedule=0.
 -- Recognize a KNL's proper NUMA count (rather than setting it to the value
in slurm.conf) when using FastSchedule=0.
 -- Fix parsing in regression test1.92 for some prompts.
 -- sbcast - use slurmd's gid cache rather than a separate lookup.
 -- slurmd - return error if setgroups() call fails in _drop_privileges().
 -- Remove error 

[slurm-dev] SchedMD support

2016-11-17 Thread Danny Auble
SchedMD has 2 technical teams. One team is dedicated to Slurm development and 
one team is dedicated to commercial Slurm support. At this time SchedMD does 
not have resources allocated to the community mailing list. 

You can find out more about commercial Slurm support by emailing 
sa...@schedmd.com 

Danny 

[slurm-dev] Slurm at SC16!

2016-11-12 Thread Danny Auble


We hope you all come by the booth this week at SC16.  Our booth number 
is #412.


We will also be having a Birds of a Feather session at the conference on 
Nov 17, 12:15-13:15.  More information can be found at

http://sc16.supercomputing.org/presentation/?id=bof101&sess=sess321.

We also just updated our website (http://schedmd.com) as well as the 
slurm documentation (http://slurm.schedmd.com)!  Please go and see what 
you think.


See you all next week!

Danny


[slurm-dev] Slurm versions 16.05.6 and 17.02.0-pre3 are now available

2016-10-27 Thread Danny Auble
Slurm version 16.05.6 is now available and includes around 40 bug fixes 
developed over the past month.


We have also made the third pre-release of version 17.02, which is under 
development and scheduled for release in February 2017.


Slurm downloads are available from http://www.schedmd.com/#repos.

We are excited to see you all next month at SC16; please feel free to 
come by our booth, #412.


The Slurm BoF will be Thursday, November 17th 12:15pm - 1:15pm in room 355-E

More information about that can be found at 
http://sc16.supercomputing.org/presentation/?id=bof101&sess=sess321.


[slurm-dev] Slurm version 16.05.4 is now available

2016-08-12 Thread Danny Auble


We are pleased to announce the availability of Slurm version 16.05.4.  
It contains about 30 bug fixes developed over the past few weeks as 
listed below.


Slurm downloads are available from:
http://www.schedmd.com/#repos

* Changes in Slurm 16.05.4
==
 -- Fix potential deadlock if running with message aggregation.
 -- Streamline when schedule() is called when running with message aggregation
on batch script completion.
 -- Fix incorrect casting when [un]packing derived_ec on slurmdb_job_rec_t.
 -- Document that persistent burst buffers can not be created or destroyed using
the salloc or srun --bb options.
 -- Add support for setting the SLURM_JOB_ACCOUNT, SLURM_JOB_QOS and
SLURM_JOB_RESERVATION environment variables for the salloc command.
Document the same environment variables for the salloc, sbatch and srun
commands in their man pages.
 -- Fix issue where sacctmgr load cluster.cfg wouldn't load associations
that had a partition in them.
 -- Don't return the extern step from sstat by default.
 -- In sstat print 'extern' instead of 4294967295 for the extern step.
 -- Make advanced reservations work properly with core specialization.
 -- Fix race condition in the account_gather plugin that could result in a job
stuck in COMPLETING state.
 -- Regression test fixes if SelectTypePlugin not managing memory and no node
memory size set (defaults to 1 MB per node).
 -- Add missing partition write locks to _slurm_rpc_dump_nodes/node_single to
prevent a race condition leading to inconsistent sinfo results.
 -- Fix task:CPU binding logic for some processors. This bug was introduced
in version 16.05.1 to address a KNL binding problem.
 -- Fix two minor memory leaks in slurmctld.
 -- Improve partition-specific limit logging from slurmctld daemon.
 -- Fix incorrect access check when using MaxNodes setting on the partition.
 -- Fix issue with sacctmgr when specifying a list of clusters to query.
 -- Fix issue when calculating future StartTime for a job.
 -- Make EnforcePartLimit support logic work with any ordering of partitions
in job submit request.
 -- Prevent restoration of wrong CPU governor and frequency when using
multiple task plugins.
 -- Prevent slurmd abort if hwloc library fails to populate the "children"
arrays (observed with hwloc version "dev-333-g85ea6e4").
 -- burst_buffer/cray: Add "--groupid" to DataWarp "setup" command.
 -- Fix lustre profiling putting it in the Filesystem dataset instead of the
Network dataset.
 -- Fix profiling documentation and code to be consistent with
Filesystem instead of Lustre.
 -- Correct the way watts is calculated in the rapl plugin when using a poll
frequency other than AcctGatherNodeFreq.
 -- Don't abort step launch if job reaches expected end time while node is
configuring/booting (NOTE: The job end time will be adjusted after node
becomes ready for use).
 -- Fix several print routines to respect a custom output delimiter when
printing NO_VAL or INFINITE.
 -- Correct documented configurations where --ntasks-per-core and
--ntasks-per-socket are supported.
 -- task/affinity plugin buffer allocated too small, can corrupt memory.


[slurm-dev] Re: Per Job Usage

2016-06-15 Thread Danny Auble


Paul, you can use

sacct --format=AllocTRES,Elapsed -p -X

which should give you the TRES allocated to the job and the elapsed time 
it was allocated. There is no ElapsedRaw, but I'm guessing you could 
write a little script to give you what you want. CPUTimeRaw is just what 
was actually used by a step and doesn't have anything to do with the 
fairshare calculation at all.
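
For what it's worth, a little script along these lines (a rough sketch only;
it assumes the AllocTRES string contains a "cpu=" entry and that Elapsed is
in [D-]HH:MM:SS form) would turn that output into per-job CPU-seconds:

#!/bin/bash
# Rough sketch: per-job CPU-seconds from AllocTRES and Elapsed
# (parsable output, no header, allocations only).
sacct --format=AllocTRES,Elapsed -p -X -n | while IFS='|' read -r tres elapsed _; do
    # Pull the cpu count out of the TRES string, e.g. "cpu=4,mem=8G,node=1" -> 4
    cpus=$(echo "$tres" | tr ',' '\n' | awk -F= '$1=="cpu"{print $2}')
    [ -z "$cpus" ] && continue
    # Convert [D-]HH:MM:SS elapsed time to seconds
    secs=$(echo "$elapsed" | awk -F'[-:]' '{ if (NF==4) print $1*86400+$2*3600+$3*60+$4; else print $1*3600+$2*60+$3 }')
    echo "cpu-seconds: $((cpus * secs))  (AllocTRES=$tres Elapsed=$elapsed)"
done

The same idea extends to memory or other TRES by pulling the matching entry
out of AllocTRES instead of "cpu".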


I will make note that while the values should match, the database isn't 
part of the calculation for fairshare.  It just stores the information.  
I will also make note that unless you have PriorityDecayHalfLife=0 (hard 
limits) it will be very difficult to reverse engineer the fairshare 
value, as decay will most likely take place many times and you will have 
to take that into consideration.
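
As a rough illustration of that, assuming the usual decay behavior: usage is 
decayed so that it halves every PriorityDecayHalfLife.  With a half-life of 
7 days, for example, usage charged 14 days ago now contributes only about 
0.5^(14/7) = 0.25 of its raw value, so summing raw per-job numbers from 
sacct will generally not reproduce the decayed usage that sshare reports.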


Danny

On 06/15/16 07:33, Paul Edmon wrote:


The CPUTimeRaw is useful but it would be good to have a TRESTimeRaw as 
well that was the sum of all the TRES charges.  In reality what I want 
to do is check the math that Slurm is doing to convince a user that it 
is doing the math right.  Also it's a good way of spotting if a 
specific job is anomalous, which could cause errant TRES charges 
(especially if the DB entry for that job is corrupted or wrong for 
various reasons).


-Paul Edmon-

On 06/15/2016 10:27 AM, Thomas M. Payerle wrote:


I have only a little experience with TRES stuff, but in Slurm version 14 and
earlier (pre-TRES) there is a CPUTimeRaw field you can display, which should
be the same as RawUsage.

On Slurm version 15 and later (post-TRES), it looks like the CPUTimeRaw
is still there.  I would expect it to contain CPUs allocated * elapsed wall
time in seconds, which by default is still RawUsage (but one can now define
RawUsage to have other components).  Presumably one could write a filter
which takes Elapsed time and AllocTRES to produce per-TRES usage, but I
do not see that as directly output by sacct (but I may be missing something)




On Tue, 14 Jun 2016, Paul Edmon wrote:


So I don't thread hijack, here is my post in a new thread.

Is there a way in sacct (or some other command) to get the RawUsage 
of a job, rather than the aggregate calculated by sshare?  I want to 
see the breakdown of which job charged how much against a user's 
total usage and hence fairshare score. Essentially I want to 
reconstruct the math it used for a specific user to see if there was 
a job that it charged inordinately more for and why.


My first logical step was to look at sacct but I didn't see an entry 
that simply listed RawUsage for the job in terms of TRES. Even 
better would be a breakdown of memory and cpu charges.


-Paul Edmon-



Tom Payerle
IT-ETI-EUS paye...@umd.edu
4254 Stadium Dr(301) 405-6135
University of Maryland
College Park, MD 20742-4111


[slurm-dev] Slurm 16.05.0 and 15.08.12 are now available

2016-05-31 Thread Danny Auble


We are pleased to announce the release of 16.05.0! It contains many new 
features and performance enhancements. Please read the RELEASE_NOTES 
file to get an idea of the new items that have been added. The online 
Slurm documentation has been updated to reflect this release.


We have also released one of the last tags of 15.08 in the form of 15.08.12.

Both versions can be downloaded from the normal spot 
http://schedmd.com/#repos.


* Changes in Slurm 15.08.12
===
 -- Do not attempt to power down a node which has never responded if the
slurmctld daemon restarts without state.
 -- Fix for possible slurmstepd segfault on invalid user ID.
 -- MySQL - Fix for possible race condition when archiving multiple clusters
at the same time.
 -- Fix compile for when you don't have hwloc.
 -- Fix issue where daemons would only listen on specific address given in
slurm.conf instead of all.  If looking for specific addresses use
TopologyParam options No*InAddrAny.
 -- Cray - Better robustness when dealing with the aeld interface.
 -- job_submit.lua - add array_inx value for job arrays.
 -- Perlapi - Remove unneeded/undefined mutex.
 -- Fix issue when TopologyParam=NoInAddrAny is set the responses wouldn't
make it to the slurmctld when using message aggregation.
 -- MySQL - Fix potential memory leak when rolling up data.
 -- Fix issue with clustername file when running on NFS with root_squash.
 -- Fix race condition with respect to cleaning up the profiling threads
when in use.
 -- Fix issues when building on NetBSD.
 -- Fix jobcomp/elasticsearch build when libcurl is installed in a
non-standard location.
 -- Fix MemSpecLimit to explicitly require TaskPlugin=task/cgroup and
ConstrainRAMSpace set in cgroup.conf.
 -- MYSQL - Fix order of operations issue where, if the database is locked up
and the slurmctld doesn't wait long enough for the response, it would give
up leaving the connection open and create a situation where the next message
sent could receive the response of the first one.

* Changes in Slurm 16.05.0
=
 -- Update seff to fix warnings with ncpus, and list slurm-perlapi dependency
in spec file.
 -- Fix testsuite to consistent use /usr/bin/env {bash,expect} construct.
 -- Cray - Ensure that step completion messages get to the database.
 -- Fix step cpus_per_task calculation for heterogeneous job allocation.
 -- Fix --with-json= configure option to use specified path.
 -- Add back thread_id to "thread_id" LogTimeFormat to distinguish between
multiple threads with the same name. Now displays thread name and id.
 -- Change how Slurm determines the NUMA count of a node. Ignore KNL NUMA
that only include memory.
 -- Cray - Fix node list parsing in capmc_suspend/resume programs.
 -- Fix sbatch #BSUB parsing for -W and -M options.
 -- Fix GRES task layout bug that could cause slurmctld to abort.
 -- Fix to --gres-flags=enforce-binding logic when multiple sockets needed.


[slurm-dev] Re: 16.05?

2016-05-29 Thread Danny Auble
Most likely Tuesday. 

On May 29, 2016 9:01:22 PM PDT, Lachlan Musicman  wrote:
>I know "it will be ready when it's ready" but I am about to deploy to
>production - how far off is the official 16.05 release?
>
>cheers
>L.
>--
>The most dangerous phrase in the language is, "We've always done it
>this
>way."
>
>- Grace Hopper


[slurm-dev] Slurm versions 15.08.11 and 16.05.0-rc1 now available

2016-05-03 Thread Danny Auble


We are pleased to announce the availability of Slurm versions 15.08.11 
and 16.05.0-rc1 (release candidate 1).


15.08.11 contains around 25 rather minor bug fixes, detailed below. 
Please upgrade at your leisure.


The rc release contains all of the features intended for release 16.05.  
Development has ended for this release and we are continuing with our 
testing phase which will most likely result in another rc before we tag 
16.05.0 near the end of the month.  A description of what this release 
contains is in the RELEASE_NOTES file available in the source.  Your 
help in hardening this version is greatly appreciated.  You are invited 
to download this version and assist in testing.


Slurm downloads are available from http://schedmd.com/#repos

* Changes in Slurm 15.08.11
===
 -- Fix for job "--contiguous" option that could cause job 
allocation/launch

failure or slurmctld crash.
 -- Fix to setup logs for single-character program names correctly.
 -- Backfill scheduling performance enhancement with large number of 
running

jobs.
 -- Reset job's prolog_running counter on slurmctld restart or reconfigure.
 -- burst_buffer/cray - Update job's prolog_running counter if pre_run 
fails.
 -- MYSQL - Make the error message more specific when removing a 
reservation

and it doesn't meet basic requirements.
 -- burst_buffer/cray - Fix for script creating or deleting persistent 
buffer

would fail "paths" operation and hold the job.
 -- power/cray - Prevent possible divide by zero.
 -- power/cray - Fix bug introduced in 15.08.10 preventing operation in many
cases.
 -- Prevent deadlock for flow of data to the slurmdbd when sending reservation
that wasn't set up correctly.
 -- burst_buffer/cray - Don't call Datawarp "paths" function if script includes
only create or destroy of persistent burst buffer. Some versions of Datawarp
software return an error for such scripts, causing the job to be held.
 -- Fix potential issue when adding and removing TRES which could result
in the slurmdbd segfaulting.
 -- Add cast to memory limit calculation to prevent integer overflow for
very large memory values.
 -- Bluegene - Fix issue with reservations resizing under the covers on a
restart of the slurmctld.
 -- Avoid error message of "Requested cpu_bind option requires entire 
node to

be allocated; disabling affinity" being generated in some cases where
task/affinity and task/cgroup plugins used together.
 -- Fix version issue when packing GRES information between 2 different 
versions

of Slurm.
 -- Fix for srun hanging with OpenMPI and PMIx
 -- Better initialization of node_ptr when dealing with protocol_version.
 -- Fix incorrect type when initializing header of a message.
 -- MYSQL - Fix incorrect usage of limit and union.
 -- MYSQL - Remove 'ignore' from alter ignore when updating a table.
 -- Documentation - update prolog_epilog page to reflect current behavior
if the Prolog fails.
 -- Documentation - clarify behavior of 'srun --export=NONE' in man page.
 -- Fix potential gres underflow on restart of slurmctld.
 -- Fix sacctmgr to remove a user who has no associations.

* Changes in Slurm 16.05.0rc1
==
 -- Remove the SchedulerParameters option of "assoc_limit_continue", making it
the default value. Add option of "assoc_limit_stop". If "assoc_limit_stop"
is set and a job cannot start due to association limits, then do not attempt
to initiate any lower priority jobs in that partition. Setting this can
decrease system throughput and utilization, but avoids potentially starving
larger jobs by preventing them from launching indefinitely.
 -- Update a node's socket and cores per socket counts as needed after a node
boot to reflect configuration changes which can occur on KNL processors.
Note that the node's total core count must not change, only the distribution
of cores across varying socket counts (KNL NUMA nodes treated as sockets by
Slurm).
 -- Rename partition configuration from "Shared" to "OverSubscribe". Rename
salloc, sbatch, srun option from "--shared" to "--oversubscribe". The old
options will continue to function. Output field names also changed in
scontrol, sinfo, squeue and sview.
 -- Add SLURM_UMASK environment variable to user job.
 -- knl_conf: Added new configuration parameter of CapmcPollFreq.
 -- squeue: remove errant spaces in column formats for "squeue -o %all".
 -- Add ARRAY_TASKS mail option to send emails to each task in a job array.
 -- Change default compression library for sbcast to lz4.
 -- select/cray - Initiate step node health check at start of step termination
rather than after application completely ends so that NHC can capture
information about hung (non-killable) processes.
 -- Add --units=[KMGTP] option to sacct to display values in specific unit type.
 -- Modify sacct and sacctmgr to display TRES val

[slurm-dev] Re: Need to restart slurmctld when adding user to accounting

2016-03-30 Thread Danny Auble
Chris is right.  If you ever have this problem it should be fairly clearly 
marked in both slurmctld and slurmdbd logs when it fails.  Usually a firewall 
like iptables is to blame or different slurm users set in the various .conf 
files as mentioned before. 

On March 30, 2016 5:57:19 PM PDT, Gene Soudlenkov  
wrote:
>
>We've been having the same problem for years - and we still need to do
>it.
>
>Gene
>
>On 31/03/16 13:46, Christopher Samuel wrote:
>> On 31/03/16 11:33, Terri Knight wrote:
>>
>>> Upon further testing, I only need restart the slurmctld daemon to
>get
>>> the new user added such that he can run a job.
>> I think when you add a user with sacctmgr slurmdbd will try and do an
>> RPC to slurmctld on the registered clusters to inform them of this.
>>
>> If slurmdbd can't do so then you should see an error logged in the
>> slurmdbd logs and consequently slurmctld won't realise this new user
>> exists until it reloads its list of users from slurmdbd (say on a
>restart).
>>
>> Check your slurmdbd logs and also check that:
>>
>> sacctmgr list cluster format=cluster,controlhost
>>
>> reports an IP address that slurmdbd can talk to for each cluster.
>>
>> Best of luck,
>> Chris
>
>-- 
>New Zealand eScience Infrastructure
>Centre for eResearch
>The University of Auckland
>e: g.soudlen...@auckland.ac.nz
>p: +64 9 3737599 ext 89834 c: +64 21 840 825 f: +64 9 373 7453
>w: www.nesi.org.nz


[slurm-dev] Slurm version 15.08.7 is now available

2016-01-20 Thread Danny Auble


We are pleased to announce the availability of Slurm version 15.08.7. It 
contains 46 relatively minor bug fixes you may find interesting.   We 
recommend upgrading to 15.08.7 at your earliest convenience.


Slurm downloads are available from http://schedmd.com/#repos.

Here is a list of what has changed...

* Changes in Slurm 15.08.7
==
 -- sched/backfill: If a job can not be started within the configured
backfill_window, set its start time to 0 (unknown) rather than the end
of the backfill_window.
 -- Remove the 1024-character limit on lines in batch scripts.
 -- burst_buffer/cray: Round up swap size by configured granularity.
 -- select/cray: Log repeated aeld reconnects.
 -- task/affinity: Disable core-level task binding if more CPUs required than
available cores.
 -- Preemption/gang scheduling: If a job is suspended at slurmctld restart or
reconfiguration time, then leave it suspended rather than resume+suspend.
 -- Don't use lower weight nodes for job allocation when topology/tree used.
 -- BGQ - If a cable goes into error state remove the underlying block on
a dynamic system and mark the block in error on a static/overlap system.
 -- BGQ - Fix regression in 9cc4ae8add7f where blocks would be deleted on
static/overlap systems when some hardware issue happens when restarting
the slurmctld.
 -- Log if CLOUD node configured without a resume/suspend program or suspend
time.
 -- MYSQL - Better locking around g_qos_count which was previously unprotected.
 -- Correct size of buffer used for jobid2str to avoid truncation.
 -- Fix allocation/distribution of tasks across multiple nodes when
--hint=nomultithread is requested.
 -- If a reservation's nodes value is "all" then track the current 
nodes in the

system, even if those nodes change.
 -- Fix formatting if using "tree" option with sreport.
 -- Make it so sreport prints out a line for non-existent TRES instead of
error message.
 -- Set job's reason to "Priority" when higher priority job in that 
partition

(or reservation) can not start rather than leaving the reason set to
"Resources".
 -- Fix memory corruption when a new non-generic TRES is added to the
DBD for the first time.  The corruption is only noticed at shutdown.
 -- burst_buffer/cray - Improve tracking of allocated resources to 
handle race

condition when reading state while buffer allocation is in progress.
 -- If a job is submitted only with -c option and numcpus is updated before
the job starts update the cpus_per_task appropriately.
 -- Update salloc/sbatch/srun documentation to mention time granularity.
 -- Fixed memory leak when freeing assoc_mgr_info_msg_t.
 -- Prevent possible use of empty reservation core bitmap, causing abort.
 -- Remove unneeded pack32's from qos_rec when qos_rec is NULL.
 -- Make sacctmgr print MaxJobsPerUser when adding/altering a QOS.
 -- Correct dependency formatting to print array task ids if set.
 -- Update sacctmgr help with current QOS options.
 -- Update slurmstepd to initialize authentication before task launch.
 -- burst_buffer/cray: Eliminate need for dedicated nodes.
 -- If no MsgAggregationParams is set don't set the internal string to
anything.  The slurmd will process things correctly after the fact.
 -- Fix output from api when printing job step not found.
 -- Don't allow user specified reservation names to disrupt the normal
reservation sequence numbering scheme.
 -- Fix scontrol to be able to accept TRES as an option when creating
a reservation.
 -- contrib/torque/qstat.pl - return exit code of zero even with no records
printed for 'qstat -u'.
 -- When a reservation is created or updated, compress user provided node names
using hostlist functions (e.g. translate user input of "Nodes=tux1,tux2"
into "Nodes=tux[1-2]").
 -- Change output routines for scontrol show partition/reservation to handle
unexpectedly large strings.
 -- Add more partition fields to "scontrol write config" output file.
 -- Backfill scheduling fix: If a job can't be started due to a "group" resource
limit, rather than reserve resources for it when the next job ends, don't
reserve any resources for it.
 -- Avoid slurmstepd abort if malloc fails during accounting gather operation.
 -- Fix nodes from being overallocated when allocation straddles multiple nodes.
 -- Fix memory leak in slurmctld job array logic.
 -- Prevent decrementing of TRESRunMins when AccountingStorageEnforce=limits is
not set.
 -- Fix backfill scheduling bug which could postpone the scheduling of jobs due
to avoidance of nodes in COMPLETING state.
 -- Properly account for memory, CPUs and GRES when slurmctld is reconfigured
while there is a suspended job. Previous logic would add the CPUs, but not
memory or GPUs. This would result in underflow/overflow errors in select
cons_res plugin.
 -- Strip flags from a job state in qstat wrapper b

[slurm-dev] Re: slurmd can't mount cpuacct cgroup namespace on RHEL 7.2 ?

2015-12-23 Thread Danny Auble
I don't currently see any. In the case of the account gather you not only have 
the overhead of the cgroup reading but you also have the reading of proc, since 
the cgroup doesn't contain all the stats you need, only a very small subset.  
So it just takes longer. The proctrack plugin is very fast and I highly recommend 
using it, just not the account gather.  The task plugin also adds overhead as 
well depending on the cgroups you use, but most people won't notice.  You would 
have to be running 10s of jobs a sec to really notice.  It does add quite a bit 
of functionality so it isn't something I generally discourage. 
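
As a rough sketch of that advice (site-specific; adjust to taste and see 
cgroup.conf for the cgroup settings themselves), the relevant slurm.conf 
lines could look like:

# Fast process tracking/containment via cgroups (recommended above)
ProctrackType=proctrack/cgroup
# Task binding via cgroups; adds some overhead but useful functionality
TaskPlugin=task/cgroup
# Leave accounting gather on the proc-based plugin rather than cgroup
JobAcctGatherType=jobacct_gather/linux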

On December 23, 2015 3:40:28 PM PST, Christopher Samuel  
wrote:
>
>Hiya Danny,
>
>On 23/12/15 14:41, Danny Auble wrote:
>
>> We do use cgroups to get the processes, at least if you use the
>cgroup
>> proctrack plugin. The only cgroup plugin I usually don't suggest is
>the
>> account gather one. I don't believe scalability is an issue when done
>> this way.
>
>What scalability issues do you see with the cgroup plugin, and what
>causes them?
>
>Does it take longer to read from cgroup filesystems?
>
>All the best!
>Chris
>-- 
> Christopher SamuelSenior Systems Administrator
> VLSCI - Victorian Life Sciences Computation Initiative
> Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
> http://www.vlsci.org.au/  http://twitter.com/vlsci


[slurm-dev] Re: slurmd can't mount cpuacct cgroup namespace on RHEL 7.2 ?

2015-12-22 Thread Danny Auble
We do use cgroups to get the processes, at least if you use the cgroup 
proctrack plugin. The only cgroup plugin I usually don't suggest is the account 
gather one.  I don't believe scalability is an issue when done this way. 

On December 22, 2015 7:34:25 PM PST, gareth.willi...@csiro.au wrote:
>Hi Danny,
>
>You *might* get better scalability querying based on cgroups.  In the
>past we had problems on a large numa system where /proc was repeatedly
>traversed by an ancient version of torque looking at all processes to
>update job stats. With cgroups, for each job you can easily identify
>just that jobs processes and then get info from wherever
>(handling/ignoring the case where processes finish or spawn between
>identification and stats collection).
>
>With slurm, I’ve just been waiting for cgroup support to mature and
>figured it would become the recommended way sooner or later and I’m
>following the discussion with interest.  That said, I’m not aware of
>mooted cgroups performance issues while my cpusets experience becoming
>increasingly dated.
>
>Gareth
>
>From: Danny Auble [mailto:d...@schedmd.com]
>Sent: Wednesday, 23 December 2015 12:56 PM
>To: slurm-dev 
>Subject: [slurm-dev] Re: slurmd can't mount cpuacct cgroup namespace on
>RHEL 7.2 ?
>
>All the JobAcctGatherType does is poll, proc or cgroup, for memory. The
>memory is slightly different but I wouldn't say better. I have actually
>preferred proc just since it has been around for a lot longer. Also
>because of the overhead involved in polling the cgroup, it is much
>slower.
>
>On December 22, 2015 5:47:23 PM PST, Christopher Samuel
>mailto:sam...@unimelb.edu.au>> wrote:
>
>On 23/12/15 12:27, Danny Auble wrote:
>
> Chris I will make note there is very little the
>JobAcctGatherType=jobacct_gather/cgroup plugin will add, mostly it will
> just slow things down, most of the stats are still pulled from proc.
>
>Oh interesting, it doesn't get a more accurate view of memory use?
>
>I would have thought that cgroups memory.stat would provide a much
>better measure of that than the polling Slurm would be doing (as it you
>could miss peaks with polling, whereas cgroups are less likely to - I
>would have thought).
>
>cheers!
>Chris


[slurm-dev] Re: slurmd can't mount cpuacct cgroup namespace on RHEL 7.2 ?

2015-12-22 Thread Danny Auble
All the JobAcctGatherType does is poll, proc or cgroup, for memory.  The memory 
is slightly different but I wouldn't say better.  I have actually preferred 
proc just since it has been around for a lot longer.  Also because of the 
overhead involved in polling the cgroup, it is much slower. 


On December 22, 2015 5:47:23 PM PST, Christopher Samuel  
wrote:
>
>On 23/12/15 12:27, Danny Auble wrote:
>
>> Chris I will make note there is very little the
>> JobAcctGatherType=jobacct_gather/cgroup plugin will add, mostly it
>will
>> just slow things down, most of the stats are still pulled from proc.
>
>Oh interesting, it doesn't get a more accurate view of memory use?
>
>I would have thought that cgroups memory.stat would provide a much
>better measure of that than the polling Slurm would be doing (as it you
>could miss peaks with polling, whereas cgroups are less likely to - I
>would have thought).
>
>cheers!
>Chris
>-- 
> Christopher SamuelSenior Systems Administrator
> VLSCI - Victorian Life Sciences Computation Initiative
> Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
> http://www.vlsci.org.au/  http://twitter.com/vlsci


[slurm-dev] Re: slurmd can't mount cpuacct cgroup namespace on RHEL 7.2 ?

2015-12-22 Thread Danny Auble
Chris, I will make note there is very little the 
JobAcctGatherType=jobacct_gather/cgroup plugin will add; mostly it will just 
slow things down, since most of the stats are still pulled from proc. 

I would advise against using this in production; I have yet to find a 
compelling case otherwise. 

If you have found one please share :-). 

Danny 

On December 22, 2015 5:22:17 PM PST, Christopher Samuel  
wrote:
>
>On 23/12/15 03:20, je...@schedmd.com wrote:
>
>> Changed for the next major Slurm release, version 16.05:
>
>Thanks Moe & Janne, I just added:
>
>CgroupMountpoint=/sys/fs/cgroup
>
>to my cgroup.conf on this test system to mimic this commit, reverted
>to:
>
>JobAcctGatherType=jobacct_gather/cgroup
>
>in slurm.conf, rebooted the compute nodes and it does appear to be
>working now.
>
>Thanks so much!
>Chris
>-- 
> Christopher SamuelSenior Systems Administrator
> VLSCI - Victorian Life Sciences Computation Initiative
> Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
> http://www.vlsci.org.au/  http://twitter.com/vlsci


[slurm-dev] Slurm version 15.08.6 now available

2015-12-18 Thread Danny Auble


We are pleased to announce the availability of Slurm version 15.08.6. 
This release is primarily in response to the regression in 15.08.5 with 
respect to finding the lua library.  It also contains a few other minor 
bug fixes you may find interesting.  Slurm downloads are available from: 
http://www.schedmd.com/#repos


We hope everyone has a great holiday and thanks for a great year!

* Changes in Slurm 15.08.6
==
 -- In slurmctld log file, log duplicate job ID found by slurmd. Previously was
being logged as prolog/epilog failure.
 -- If a job is requeued while in the process of being launched, remove its
job ID from slurmd's record of active jobs in order to avoid generating a
duplicate job ID error when launched for the second time (which would
drain the node).
 -- Cleanup messages when handling job script and environment variables in
older directory structure formats.
 -- Prevent triggering gang scheduling within a partition if configured with
PreemptType=partition_prio and PreemptMode=suspend,gang.
 -- Decrease parallelism in job cancel request to prevent denial of service
when canceling huge numbers of jobs.
 -- If all ephemeral ports are in use, try using other port numbers.
 -- Revert way lib lua is handled when doing a dlopen, fixing a regression in
15.08.5.
 -- Set the debug level of the rmdir message in xcgroup_delete() to debug2.
 -- Fix the qstat wrapper when user is removed from the system but still
has running jobs.
 -- Log the request to terminate a job at info level if DebugFlags includes
the Steps keyword.
 -- Fix potential memory corruption in _slurm_rpc_epilog_complete as well as
_slurm_rpc_complete_job_allocation.
 -- Fix cosmetic display of AccountingStorageEnforce option "nosteps" when
in use.
 -- If a job can never be started due to unsatisfied job dependencies, report
the full original job dependency specification rather than the dependencies
remaining to be satisfied (typically NULL).
 -- Refactor logic to synchronize active batch jobs and their script/environment
files, reducing overhead dramatically for large numbers of active jobs.
 -- Avoid hard-link/copy of script/environment files for job arrays. Use the
master job record file for all tasks of the job array.
NOTE: Job arrays submitted to Slurm version 15.08.6 or later will fail if
the slurmctld daemon is downgraded to an earlier version of Slurm.
 -- Move slurmctld mail handler to separate thread for improved performance.
 -- Fix containment of adopted processes from pam_slurm_adopt.
 -- If a pending job array has multiple reasons for being in a pending state,
then print all reasons in a comma separated list.


[slurm-dev] Re: Slurm version 15.08.5 now available

2015-12-10 Thread Danny Auble


I'm not Moe, but I would say yes, definitely.  Mixing and matching will 
most likely lead you to heartache ;).


On 12/10/15 17:45, Christopher Samuel wrote:

On 11/12/15 11:56, je...@schedmd.com wrote:


We are pleased to announce the availability of Slurm version 15.08.5

Thanks Moe - silly question - do you need to recompile plugins if going
from 15.08.4 to 15.08.5 ?

cheers,
Chris


[slurm-dev] Re: MsgAggregation Parameters

2015-12-08 Thread Danny Auble


Hey Paul,  Unless you have a very busy cluster (100s of jobs a second) 
or are running very large jobs (>2000 nodes) I don't think this will be 
very useful.  But I would expect 
MsgAggregationParams=WindowMsgs=10,WindowTime=10 to be more what you 
would want.  WindowTime=100 may be too long of a wait.  I am surprised 
at the threading of your slurmctld though, I would expect it to have 
much less threading.  Be sure to restart all your slurmd's as well as 
the slurmctld when you change the parameter. Try again and see if 
lowering the WindowTime down improves your situation.
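
For reference, that corresponds to a line along these lines in slurm.conf on 
the slurmctld and all slurmd nodes (a sketch only; tune the values for your 
cluster):

MsgAggregationParams=WindowMsgs=10,WindowTime=10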


Danny

On 12/08/15 10:25, Paul Edmon wrote:


We recently upgraded to 15.08.4 from 14.11 and we wanted to try out 
the MsgAggregation to see if that would improve cluster throughput and 
responsiveness.  However when we turned it on with the settings of 
WindowMsgs=10 and WindowTime=100 everything slowed to a crawl and it 
looked like the slurmctld was threading like crazy.  When we turned it 
off everything returned to normal.  Does any one have any suggestions 
or guidelines for what to set the MsgAggregationParam to?  I'm 
guessing it depends on the size of the cluster as we have the same 
settings on our test cluster but it is about 10 times smaller in terms 
of number of nodes than our main one.  I'm guessing this is a scaling 
problem.


Thoughts?  Anyone else using MsgAggregation?

-Paul Edmon-


[slurm-dev] Slurm version 15.08.4 is now available

2015-11-13 Thread Danny Auble


Slurm version 15.08.4 is now available; it includes about 25 bug fixes 
developed over the past couple of weeks.


One notable fix is found in commits 8e66e2677 and d72f132d42 which will 
fix a slurmctld bug in which a pending job array could be canceled by a 
user different from the owner or the administrator. This appears to 
exist in the 15.08.* as well as the 14.11.* branches.


It is recommended you update at your earliest convenience.  If upgrading 
isn't an option, generating a patch from those 2 commits is recommended.
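
As an illustration only (verify those commit IDs against your checkout of the 
Slurm source first), the patches could be generated and applied roughly like 
this:

git format-patch -1 8e66e2677
git format-patch -1 d72f132d42
git am *.patch

or apply the generated files with "patch -p1" against an unpacked source tree.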


Details about the changes are listed in the distribution's NEWS file. 
Slurm downloads are available from http://www.schedmd.com/#repos.


See you all at SC15 next week, Slurm booth #1851!

* Changes in Slurm 15.08.4
==
 -- Fix typo for the "devices" cgroup subsystem in pam_slurm_adopt.c
 -- Fix TRES_MAX flag to work correctly.
 -- Improve the systemd startup files.
 -- Added burst_buffer.conf flag parameter of "TeardownFailure" which will
teardown and remove a burst buffer after failed stage-in or stage-out.
By default, the buffer will be preserved for analysis and manual teardown.
 -- Prevent a core dump in srun if the signal handler runs during the job
allocation causing the step context to be NULL.
 -- Don't fail job if multiple prolog operations in progress at slurmctld
restart time.
 -- Burst_buffer/cray: Fix to purge terminated jobs with burst buffer errors.
 -- Burst_buffer/cray: Don't stall scheduling of other jobs while a stage-in
is in progress.
 -- Make it possible to query 'extern' step with sstat.
 -- Make 'extern' step show up in the database.
 -- MYSQL - Quote assoc table name in mysql query.
 -- Make SLURM_ARRAY_TASK_MIN, SLURM_ARRAY_TASK_MAX, and SLURM_ARRAY_TASK_STEP
environment variables available to PrologSlurmctld and EpilogSlurmctld.
 -- Fix slurmctld bug in which a pending job array could be canceled
by a user different from the owner or the administrator.
 -- Support taking node out of FUTURE state with "scontrol reconfig" command.
 -- Sched/backfill: Fix to properly enforce SchedulerParameters of
bf_max_job_array_resv.
 -- Enable operator to reset sdiag data.
 -- jobcomp/elasticsearch plugin: Add array_job_id and array_task_id fields.
 -- Remove duplicate #define IS_NODE_POWER_UP.
 -- Added SchedulerParameters option of max_script_size.
 -- Add REQUEST_ADD_EXTERN_PID option to add pid to the slurmstepd's extern
step.
 -- Add unique identifiers to anchor tags in HTML generated from the man pages.

 -- Add with_freeipmi option to spec file.
 -- Minor elasticsearch code improvements.


[slurm-dev] Re: Partition QoS

2015-11-10 Thread Danny Auble
I'm guessing you also added the qos to your partition line in your slurm.conf 
as well. 
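
For example (a sketch only; the partition name, node list and QOS name below 
are placeholders for whatever was created with sacctmgr):

PartitionName=whatever Nodes=node[01-10] QOS=part_whatever State=UP

followed by an "scontrol reconfigure" so the slurmctld picks up the change.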

On November 10, 2015 5:08:24 PM PST, Paul Edmon  wrote:
>I did that but it didn't pick it up.  I must need to reconfigure again 
>after I made the qos.  I will have to try it again.  Let you know how
>it 
>goes.
>
>-Paul Edmon-
>
>On 11/10/2015 5:40 PM, Douglas Jacobsen wrote:
>> Re: [slurm-dev] Partition QoS
>> Hi Paul,
>>
>> I did this by creating the qos, e.g. sacctmgr create qos
>part_whatever
>> Then in slurm.conf setting qos=part_whatever in the "whatever" 
>> partition definition.
>>
>> scontrol reconfigure
>>
>> finally, set the limits on the qos:
>> sacctmgr modify qos set MaxJobsPerUser=5 where name=part_whatever
>> ...
>>
>> -Doug
>>
>> 
>> Doug Jacobsen, Ph.D.
>> NERSC Computer Systems Engineer
>> National Energy Research Scientific Computing Center 
>> 
>>
>> - __o
>> -- _ '\<,_
>> --(_)/  (_)__
>>
>>
>> On Tue, Nov 10, 2015 at 2:29 PM, Paul Edmon wrote:
>>
>>
>> In 15.08 you are able to set QoS limits directly on a partition. 
>> So how do you actually accomplish this?  I've tried a couple of
>> ways, but no luck.  I haven't seen a demo of how to do this
>> anywhere either.  My goal is to set up a partition with the
>> following QoS parameters:
>>
>> MaxJobsPerUser=5
>> MaxSubmitJobsPerUser=5
>> MaxCPUsPerUser=128
>>
>> Thanks for the info.
>>
>> -Paul Edmon-
>>
>>


[slurm-dev] Re: fix a bug in accounting_storage/mysql

2015-11-10 Thread Danny Auble


Thanks! this was committed in commit edd932af36d08.

Danny

On 11/10/15 14:42, Hongjia Cao wrote:

This will be triggered when a non-privileged user executes "sreport"
and the cluster name contains a special character such as '-' .


[slurm-dev] Mailing list back up

2015-11-10 Thread Danny Auble


Sorry for the delay on the list messages.  We just switched servers and 
there was a problem with the transition that was just fixed.  I don't 
believe any messages were lost.  If you don't see yours please re-send.


Danny


[slurm-dev] Re: PMI2 in Slurm 14.11.8 ?

2015-09-01 Thread Danny Auble
I'm fairly sure if you install via rpm it will be there.  Contribs isn't built 
through the normal make as was pointed out, but it is through the rpm process.  

On September 1, 2015 8:24:21 PM PDT, Ralph Castain  wrote:
>
>Danny, Moe, etal: can you confirm that pmi2 is intentionally -not-
>being installed by default?
>
>
>> On Sep 1, 2015, at 8:18 PM, Christopher Samuel
> wrote:
>> 
>> 
>> Hi Ralph,
>> 
>> On 02/09/15 13:10, Ralph Castain wrote:
>> 
>>> Might be my bad, Chris - it was my understanding that PMI2 support
>>> was to be installed by default in Slurm releases post 14.03.
>> 
>> Well to be fair to you srun does list it as an option even though it
>> doesn't appear to be installed..
>> 
>> [samuel@snowy-m ~]$ srun --mpi=list
>> srun: MPI types are...
>> [...]
>> srun: mpi/pmi2
>> srun: mpi/openmpi
>> [...]
>> 
>> All the best,
>> Chris
>> -- 
>> Christopher Samuel    Senior Systems Administrator
>> VLSCI - Victorian Life Sciences Computation Initiative
>> Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
>> http://www.vlsci.org.au/  http://twitter.com/vlsci


[slurm-dev] Slurm versions 15.08.0 and 14.11.9 have been released!

2015-08-31 Thread Danny Auble
OLDER VERSION!


We have also released one of the last tags of 14.11 in the form of 14.11.9.

Changes are listed here:

 -- Correct "sdiag" backfill cycle time calculation if it yields locks. A
microsecond value was being treated as a second value resulting in an
overflow in the calcuation.
 -- Fix segfault when updating timelimit on jobarray task.
 -- Fix to job array update logic that can result in a task ID of 4294967294.
 -- Fix of job array update, previous logic could fail to update some tasks
of a job array for some fields.
 -- CRAY - Fix seg fault if a blade is replaced and slurmctld is restarted.
 -- Fix plane distribution to allocate in blocks rather than cyclically.
 -- squeue - Remove newline from job array ID value printed.
 -- squeue - Enable filtering for job state SPECIAL_EXIT.
 -- Prevent job array task ID being inappropriately set to NO_VAL.
 -- MYSQL - Make it so you don't have to restart the slurmctld
to gain the correct limit when a parent account is root and you
remove a subaccount's limit which exists on the parent account.
 -- MYSQL - Close chance of setting the wrong limit on an association
when removing a limit from an association on multiple clusters
at the same time.
 -- MYSQL - Fix minor memory leak when modifying an association but no
change was made.
 -- srun command line of either --mem or --mem-per-cpu will override both the
    SLURM_MEM_PER_CPU and SLURM_MEM_PER_NODE environment variables.
 -- Prevent slurmctld abort on update of advanced reservation that contains
    no nodes.
 -- ALPS - Revert commit 2c95e2d22 which also removes commit 2e2de6a4 allowing
    cray with the SubAllocate option to work as it did with 2.5.
 -- Properly parse CPU frequency data on POWER systems.
 -- Correct sacct man pages describing -i option.
 -- Capture salloc/srun information in sdiag statistics.
 -- Fix bug in node selection with topology optimization.
 -- Don't set distribution when srun requests 0 memory.
 -- Read in correct number of nodes from SLURM_HOSTFILE when specifying nodes
    and --distribution=arbitrary.
 -- Fix segfault in Bluegene setups where RebootQOSList is defined in
bluegene.conf and accounting is not setup.
 -- MYSQL - Update mod_time when updating a start job record or adding one.
 -- MYSQL - Fix issue where, if an association id ever changes while at least
    a portion of a job array is pending after its initial start in the
    database, it could create another row for the remaining array instead
    of using the already existing row.
 -- Fix scheduling anomaly with job arrays submitted to multiple partitions,
    jobs could be started out of priority order.
 -- If a host has suspended jobs do not reboot it. Reboot only hosts
with no jobs in any state.
 -- ALPS - Fix issue when using --exclusive flag on srun to do the correct
thing (-F exclusive) instead of -F share.
 -- Fix various memory leaks in the Perl API.
 -- Fix a bug in the controller which displayed jobs in CF state as RUNNING.
 -- Preserve advanced _core_ reservation when nodes added/removed/resized on
    slurmctld restart. Rebuild core_bitmap as needed.
 -- Fix for non-standard Munge port location for srun/pmi use.
 -- Fix gang scheduling/preemption issue that could cancel job at startup.
 -- Fix a bug in squeue which prevented squeue -tPD to print array jobs.
 -- Sort job arrays in job queue according to array_task_id when priorities
    are equal.
 -- Fix segfault in sreport when there was no response from the dbd.
 -- ALPS - Fix compile to not link against -ljob and -lexpat with every lib
or binary.
 -- Fix testing for CR_Memory when CR_Memory and CR_ONE_TASK_PER_CORE are used
    with select/linear.
 -- MySQL - Fix minor memory leak if a connection ever goes away whilst
    using it.
 -- ALPS - Make it so srun --hint=nomultithread works correctly.
 -- Prevent job array task ID from being reported as NO_VAL if last task in
    the array gets requeued.
 -- Fix some potential deadlock issues when state files don't exist in the
association manager.
 -- Correct RebootProgram logic when executed outside of a maintenance
reservation.
 -- Requeue job if possible when slurmstepd aborts.

Both versions can be downloaded from the normal spot 
http://schedmd.com/#repos.


--
Danny Auble
President, SchedMD LLC
Commercial Slurm Development and Support
===
Slurm User Group Meeting, 15-16 September 2015, Washington D.C.
http://slurm.schedmd.com/slurm_ug_agenda.html


[slurm-dev] Re: srun to existing allocation, but just a specific node

2015-08-26 Thread Danny Auble


If you don't request a node count '-N1' the srun will grab all the nodes
in the allocation.  --nodelist says "give me at least this one node plus
anything else to fulfill my request".


Try -N1 -n1 and you should get a srun that only runs on one of the nodes 
in the allocation.  As long as you are in the allocation you shouldn't 
need --jobid=.


srun -N1 -n1 --nodelist=node1 ps -ef

should get you what you want.
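
From outside the allocation the same idea still works if you put the jobid
back in (the jobid here is a placeholder):

  srun --jobid=<jobid> -N1 -n1 --nodelist=node1 ps -ef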

On 08/26/15 08:06, Thompson, Matt[SCIENCE SYSTEMS AND APPLICATIONS INC] 
(GSFC-610.1) wrote:


SLURM Dev,

I'm hoping you can help me with something. I recently had need to 
figure out what's going on inside a running batch job. I had a 
suspicion that maybe a job that should use, say, 48 cores on a few 
nodes managed to somehow pack them all on a single node due to my 
idiocy with an mpirun command.


So, I have a job-id and I know I can do, say:

  srun --jobid= ps -ef

and I'll get a ps on the nodes in that allocation. But, if that 
allocation has, say, 14 nodes, I get 14 nodes worth of information 
that is hard to parse out since ps doesn't prepend/print hostname[1].


I thought maybe there is a way to run the srun command on just one of 
the nodes in the allocation and I tried:


  srun --jobid= --nodelist=node1 ps -ef

where node1 is one of the nodes in the allocation. But, no, that 
doesn't seem to do what I'd hoped as I still get every node running ps.


Now, I'm sure I could whip up a bash script which tests for the 
hostname and runs a command only if that matches the one I want, but I 
was hoping for a nice simple way with srun itself to do this.


Matt

[1] That I know of. I didn't see "hostname" in the ps manpage.


[slurm-dev] Re: Adding a new cluster to slurmdbd - how to get existing accounts/associations to be recognised?

2015-08-20 Thread Danny Auble
You need to add the accounts to the cluster.  If you want it set up like your
other cluster, an easy way to do that is to use sacctmgr to dump the existing
cluster, change the cluster name in the file, and load it back in with sacctmgr.
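
Roughly (the old cluster name and the file name here are just placeholders):

  sacctmgr dump oldcluster file=snowy.cfg
  # edit snowy.cfg and change the cluster name at the top to snowy
  sacctmgr load file=snowy.cfg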

On August 20, 2015 7:43:22 PM PDT, Chris Samuel  wrote:
>
>Hi there,
>
>We've upgraded our slurmdbd to 14.11.8 as part of our prep for a new
>cluster
>and I'm having issues with the creation of the new cluster.
>
>We can add the new cluster "snowy" to slurmdbd with sacctmgr, but any
>changes
>to accounts after that now fail with:
>
># sacctmgr modify user set defaultaccount=VLSCI where name=samuel
>Can't modify because these users aren't associated with new default
>account 'vlsci'...
>  U = samuel C = snowy
>
>Indeed if I list associations there are none for the new cluster.
>
>We don't seem to have hit this issue in the past when adding new
>machines.
>
>I can't modify the cluster list for an existing account as even though
>the
>website says:
>
>http://slurm.schedmd.com/accounting.html
>
># When either adding or modifying an account, the following sacctmgr
>options
># are available:
>#
># Cluster= Only add this account to these clusters. The account is
>added to all
>#defined clusters by default.
>
>I get:
>
># sacctmgr -i modify account set Cluster="" where account=vlsci
> Can't modify the cluster of an account
>
>Any ideas please?
>
>thanks!
>Chris
>-- 
> Christopher Samuel    Senior Systems Administrator
> VLSCI - Victorian Life Sciences Computation Initiative
> Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
> http://www.vlsci.org.au/  http://twitter.com/vlsci


[slurm-dev] Re: gmail spam filters?

2015-07-30 Thread Danny Auble


Looks like our server "sometimes" decided to use the ipv6 address to 
send mail instead of the blessed ipv4 address, hence the sporadic 
success rate.  This has been corrected so hopefully this will be fixed.


Thanks for reporting,
Danny

On 07/30/15 06:36, Nick Semenkovich wrote:

The message warns that the list isn't following any of the bulk sender
rules: https://support.google.com/mail/answer/81126?hl=en#authentication

Headers show the big issues are:
- No SPF records
- Rewriting the sender (spoofing this message to send as me, rather
than "sent on behalf of schedmd")

On Thu, Jul 30, 2015 at 6:52 AM, David Carlet  wrote:

I've gotten it to the point where it only throws one out of every 4 or 5
emails into the spam folder; and I've submitted a few to their spam filter
team so they will hopefully learn why.  Assuming that the submit window is
not a window of lies, anyway.

It's quite annoying for sure.

On Thu, Jul 30, 2015 at 7:44 AM, Michael Di Domenico
 wrote:


is anyone else having an issue using a gmail address for the slurm
mailling lists?  Gmail keeps blocking all the slurm mail for my
account and marking it as Spam.  A little yellow box pops up and says
this message is in violation of gmails bulk sender something or other







[slurm-dev] Re: node renaming

2015-07-28 Thread Danny Auble


Is there a reason for the reservation?  You could down the partition if 
you want to wait for running jobs to complete before hand which would 
also stop new jobs from starting as stated by Marcin.  I would not 
expect any running jobs to survive renaming nodes.


When you don't have any running jobs, or you don't care about the running
jobs, I would:


1. Issue scontrol shutdown
2. Change your slurm.conf with the appropriate node names
3. Restart all your daemons
4. Don't touch the database :)
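
A minimal sketch of steps 1-3 (node names and service commands here are only
illustrative; use whatever init mechanism your installation uses):

  scontrol shutdown                    # stop slurmctld and all slurmd's
  # edit slurm.conf everywhere, e.g.
  #   NodeName=old[001-100] ...  ->  NodeName=new[001-100] NodeAddr=...
  service slurmctld start              # on the controller
  service slurmd start                 # on each compute node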

Any pending jobs (as long as they didn't request any specific old node 
name) should be just fine.  As long as you have the addresses right on 
your nodes I wouldn't expect anything else is needed.


You can always set up a test system and do these steps before hand so 
you don't experience anything unexpected.


Danny


On 07/28/15 08:04, Andrew E. Bruno wrote:

On Tue, Jul 28, 2015 at 07:21:13AM -0700, Marcin Stolarek wrote:

2015-07-28 15:47 GMT+02:00 Andrew E. Bruno :


On Tue, Jul 28, 2015 at 06:30:09AM -0700, Marcin Stolarek wrote:

2015-07-28 15:08 GMT+02:00 Andrew E. Bruno :


We need to rename all the nodes in our cluster. Our thinking is to put
in a full-system reservation:

 scontrol create reservation  nodes=ALL ..

Take the nodes down and rename them. Then bring slurm backup configured
with the new names.

What will happen when we bring slurm backup with all new node names?
Does the reservation store specific nodenames? or will the slurmdbd/ctl
handle this gracefully?

Any suggestions on how best to rename all the nodes in a cluster?


As I understand it, you want to remove nodes and add new ones with new names.
In database/accounting these names have no special format, they are just
"strings".
So... old nodes (old names/ip's) from slurm.conf are going to be down and
you need to add new entries to slurm.conf, but
maybe I'm not getting what the problem is..?

Wondering how the reservation will be handled.. when all the old names
go away. Are you suggesting just renaming the nodes directly in the db?


but why you want to change anything in db? Historical jobs where running on
nodes with old names.

I haven't checked this in the code, but I'm pretty sure that "nodes=all" is
changed to the list of all hosts configured in slurm.conf, so if you add
new nodes (with new names) they are not going to be in the previously created
reservation.

If you don't want to run jobs on the newly added nodes, you can change
partition state to DOWN.

Perfect, setting the partition down should work great. Thanks for the
tip.


cheers,
marcin


[slurm-dev] Re: Accounting frequency, running jobs and sacct

2015-07-15 Thread Danny Auble
sstat should work with mpi jobs as well, just as long as srun is the launcher. 

On July 15, 2015 9:11:22 PM PDT, Christopher Samuel  
wrote:
>
>On 14/07/15 00:22, Danny Auble wrote:
>
>> sstat will work while the step is running and sacct will start to
>> display the stats after completion.
>
>Great thanks - we'll make suggestions to users who want to check their
>(non-MPI) jobs at runtime to use srun to launch the binary instead.
>
>All the best!
>Chris
>-- 
> Christopher Samuel    Senior Systems Administrator
> VLSCI - Victorian Life Sciences Computation Initiative
> Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
> http://www.vlsci.org.au/  http://twitter.com/vlsci


[slurm-dev] Re: requeue with preemption not working

2015-07-15 Thread Danny Auble


Hey Jackie, only batch jobs can be requeued.  Otherwise they get 
canceled.  If you have level "debug" debugging on you would get a 
message like "Job-requeue can only be done for batch jobs" for non-batch 
jobs right before the "had to be killed" message.


Are you seeing this for batch jobs as well?  If so, make sure your
slurm.conf doesn't have JobRequeue=0, or you will have to pass the --requeue
option to each sbatch to allow requeueing.
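
For example, either set requeueing on by default in slurm.conf:

  JobRequeue=1

or request it per job (the script name is hypothetical):

  sbatch --requeue job.sh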


Let me know if that helps or not

On 07/15/15 15:32, Jacqueline Scoggins wrote:

Can someone help assist with this behavior we are seeing?

Slurm - 14.3.8
Linux - SL 6.6

trying to setup preemption via qos
/etc/slurm/slurm.conf -
   PreemptMode=REQUEUE
   PreemptType=preempt/qos


qos settings are as follows:

Name  Priority  GraceTime  Preempt  PreemptMode  Flags  UsageThres  UsageFactor
GrpCPUs  GrpCPUMins  GrpCPURunMins  GrpJobs  GrpMem  GrpNodes  GrpSubmit  GrpWall
MaxCPUs  MaxCPUMins  MaxNodes  MaxWall  MaxCPUsPU  MaxJobsPU  MaxNodesPU  MaxSubmitPU

     normal     0  00:00:00             cluster  1.00
   lr_debug     1  00:00:00  pr_normal  cluster  1.00  400:30:00
  lr_normal  1000  00:00:00  pr_normal  cluster  1.00  64  3-00:00:00
   c_serial  1000  00:00:00  pr_normal  cluster  1.00  7  1
  pr_normal     0  00:00:00             requeue  1.00  3  3-00:00:00



Jobs are being preempted by lr_normal queued job but instead of being 
requeued they are cancelled.


[2015-07-15T14:56:17.888] job_signal 9 of running job 93 successful 0x8008

[2015-07-15T14:56:17.888] preempted job 93 had to be killed

[2015-07-15T14:56:17.940] completing job 93 status 15

How does slurm decide if REQUEUE will cancel or requeue a job and can 
a user specify to only do a requeue within sbatch or srun?



Thanks


Jackie




[slurm-dev] Re: timeout issues

2015-07-14 Thread Danny Auble


Perhaps teaching the users about job arrays would help a lot in this 
situation.  They could submit all 20-30k jobs with only 1 sbatch 
command.  It is much more efficient for the scheduler and would probably 
eliminate almost all the timeout issues you are seeing.
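
For example (the script and input naming are hypothetical, and the array size
is subject to MaxArraySize/MaxJobCount in slurm.conf):

  sbatch --array=1-20000 wrapper.sh
  # inside wrapper.sh, pick this task's work unit, e.g.
  #   INPUT=input.${SLURM_ARRAY_TASK_ID}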


On 07/14/15 08:42, Charles Johnson wrote:

slurm 14.11.7
cgroups implemented
backfill implemented

We have a small cluster -- ~650 nodes and ~6500 processors. We are 
looking for ways to lessen the impact of a busy scheduler for users 
who submit jobs with an automated submission process. Their job 
monitoring will fail with:


squeue: error: slurm_receive_msg: Socket timed out on send/recv operation
slurm_load_jobs error: Socket timed out on send/recv operation

We are using back-fill:

SchedulerParameters=bf_interval=120,bf_continue,bf_resolution=300,bf_max_job_test=2000,bf_max_job_user=100,max_sched_time=2 



Our cluster generally has numerous small, single-core jobs; and when a user
submits 20,000 or 30,000 jobs the system can fail to respond to
squeue, or even sbatch.


One user has suggested we write a wrapper for certain commands, like 
squeue, which auto re-try when such messages are returned. This 
doesn't seem like the appropriate "fix." IMHO, a better approach would 
be to "fix" the submission systems that some users have.


Are there other who have faced this issue?  I have thought about 
caching the output to squeue in a file, refreshing the file in a 
timely way, and pointing an squeue wrapper to return that; but again 
that doesn't seem like a good approach.


Any suggestions would be great.

Charles



[slurm-dev] Re: Accounting frequency, running jobs and sacct

2015-07-13 Thread Danny Auble


Hey Chris,

On 07/12/15 19:04, Christopher Samuel wrote:

Hi Danny,

On 10/07/15 22:30, Danny Auble wrote:


What is your jobaccountgather set to in your slurm.conf?

I think this is what you mean, I believe it's the default.

JobAcctGatherFrequency=30

We're using cgroups to gather the info:

JobAcctGatherType=jobacct_gather/cgroup
I would recommend jobacct_gather/linux over cgroup.  cgroup adds quite a 
bit of overhead with very little benefit, but this was what I was 
looking for.



Stats should only show up from jobs launched from srun.

Do you mean during runtime, or at completion?
sstat will work while the step is running and sacct will start to 
display the stats after completion.


MPI jobs launched either directly with srun or via Open-MPI's mpirun
(which uses srun to invoke orted to launch MPI ranks) do not report any
stats until completion.
Expected, no stats are sent to the database until after the step 
completes, use sstat beforehand.
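
For example, while the step is still running (jobid and step id are
placeholders):

  sstat -j <jobid>.<stepid> --format=JobID,MaxRSS,AveCPU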


We do get stats for jobs that just use sbatch (and no srun) once they
complete.

Expected, for the batch step anyway.


The manual page for sbatch says that memory stats are collected by
default every 30 seconds.
Correct, they are.  Just use sstat if you want to see any running step's 
info.


All the best,
Chris


[slurm-dev] Re: Accounting frequency, running jobs and sacct

2015-07-10 Thread Danny Auble
What is your jobaccountgather set to in your slurm.conf? 

Stats should only show up from jobs launched from srun. 

On July 9, 2015 10:03:21 PM PDT, Christopher Samuel  
wrote:
>
>Hi folks,
>
>The Slurm 14.03.11 sbatch man page says for --acctg-freq=
>
># Define the job accounting and profiling sampling intervals.
>
>Now our default is the Slurm default of 30 seconds, but even when I
>set:
>
>sbatch --acctg-freq=task=5
>
>I do not see anything useful for things like MaxRSS or similar in sacct
>for my running jobs, even after 5 minutes of runtime.
>
>Currently I'm running two identical NAMD jobs, one launched with mpirun
>and one with srun, but the sacct information isn't getting updated for
>either.
>
>If I run (as root):
>
>sacct --state R -o maxrss
>
>then there is no MaxRSS reported for any running jobs, even the longest
>currently (20 hours old).
>
>We are running using slurmdbd with the MySQL backend.
>
>Am I misunderstanding something?
>
>All the best,
>Chris
>-- 
> Christopher Samuel    Senior Systems Administrator
> VLSCI - Victorian Life Sciences Computation Initiative
> Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
> http://www.vlsci.org.au/  http://twitter.com/vlsci


[slurm-dev] Re: cgroup setup and cpuset issues

2015-06-10 Thread Danny Auble
You can use include files in your slurm.conf
(http://slurm.schedmd.com/slurm.conf.html); just have that file be local
on each cluster.  I am guessing you already have something like this,
since you would need a different cluster name for each system.  I would
use jobacct_gather/linux; the cgroup one adds quite a bit of overhead
with little benefit.
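
For example (the file name is only an illustration):

  # in the shared slurm.conf
  Include /etc/slurm/local.conf

  # /etc/slurm/local.conf, kept locally on each cluster, carries the
  # cluster-specific settings (ClusterName, cgroup/accounting options, etc.)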


On 06/10/15 09:36, Jacqueline Scoggins wrote:

I also have another question.  What will be the impact of these 
settings if this is not a single system with one slurm configuration 
but  multiple clusters with one slurm configuration and some cluster 
groups are not going to be using cgroups?


i.e.  - Shared resources owned by separate PI's and they have their 
own set of policies in regards to the run time environment. Exclusive 
resources owned by the IT department that all approved PI's and staff 
members can use and they will be controlled by our policies regarding 
their run time environment (hence cgroup settings).


ProctrackType=proctrack/cgroup
TaskPlugin=task/cgroup
JobAcctGatherType=jobacct_gather/cgroup



Thanks

Jackie

On Wed, Jun 10, 2015 at 8:26 AM, Peter Kjellstrom wrote:



What is the user trying to run?

We've seen that, for example, IntelMPI-4.x has major problems setting
up its pinning in a slurm-created cgroup with size less than a full
node. In this case actual pinning is random/corrupt and the only
work-around is to request a whole node.

/Peter K

On Tue, 09 Jun 2015 17:51:20 -0700
Jacqueline Scoggins <jscogg...@lbl.gov>
wrote:

> First round of testing cgroups and noticed that no matter how many
> cpus requested (-n x) the users job is only running on one cpu.
...
> Thanks in advanced for your assistance.
>
>
> Jackie Scoggins






[slurm-dev] Re: cgroup setup and cpuset issues

2015-06-10 Thread Danny Auble
I would make sure the application is an actual parallel application.  
How are you verifying it is only running on 1 cpu? Could you send what 
the user is running?


On a sort of related note, since you are using cgroups in the task 
plugin you might also try


ProctrackType=proctrack/cgroup

Since you are using slurmdbd accounting you probably don't need 
JobCompType=jobcomp/filetxt either.


With respect to task affinity, I wouldn't expect it to matter here as we 
advise most people to use it as it will bind tasks to specific cpus 
usually increasing performance.


Danny

On 06/10/15 07:27, Ryan Cox wrote:

Jackie,

You probably want to try TaskAffinity=no.  I remember that we had some 
weird behavior when we had it set to yes.  Task affinity is used to 
pin tasks to certain cpus but the cgroup already limits them to the 
allocated set of cpus, so it seems redundant.


Ryan

On 06/09/2015 06:51 PM, Jacqueline Scoggins wrote:

First round of testing cgroups and noticed that no matter how many 
cpus requested (-n x) the users job is only running on one cpu.


Current configuration:

slurm.conf  -

SlurmUser=slurm

SlurmdUser=root

SlurmctldPort=6817

SlurmdPort=6818

AuthType=auth/munge

CryptoType=crypto/munge

CompleteWait=0

StateSaveLocation=/tmp

SlurmdSpoolDir=/tmp/slurmd

SlurmctldPidFile=/var/run/slurmctld.pid

SlurmdPidFile=/var/run/slurmd.pid

SwitchType=switch/none

MpiDefault=none

CacheGroups=0

KillOnBadExit=1

JobRequeue=0

ReturnToService=1

TreeWidth=4096

MaxJobCount=10

TaskPlugin=task/cgroup

TopologyPlugin=topology/tree

MessageTimeout=60

SlurmctldTimeout=300

SlurmdTimeout=300

InactiveLimit=0

MinJobAge=300

KillWait=30

Waittime=0

SchedulerType=sched/backfill

SchedulerParameters=bf_continue

SelectType=select/cons_res

SelectTypeParameters=CR_CPU_Memory

ProctrackType=proctrack/linuxproc

FastSchedule=0

PriorityType=priority/multifactor

PriorityDecayHalfLife=14-0

PriorityUsageResetPeriod=None

PriorityWeightFairshare=100

PriorityWeightQOS=10

PriorityWeightAge=1000

PriorityWeightPartition=0

PriorityWeightJobSize=1000

PriorityMaxAge=06:00:00

PriorityFlags=Ticket_based

SlurmctldDebug=4

SlurmctldLogFile=/local/slurm/log/slurmctld.log

SlurmdDebug=4

SlurmdLogFile=/local/slurm/log/slurmd.log

JobCompType=jobcomp/filetxt

JobCompLoc=/var/spool/slurm/jobs/complete

JobAcctGatherType=jobacct_gather/linux

JobAcctGatherFrequency=10

AccountingStorageType=accounting_storage/slurmdbd

AccountingStorageEnforce=associations,limits,qos

AccountingStorageHost=phoenix.scs.lbl.gov 

HealthCheckProgram=/usr/sbin/nhc

HealthCheckInterval=300

NodeName=n0[000-018] NodeAddr=10.0.17.[0-18] CPUs=8 Sockets=2 
CoresPerSocket=4 Feature=lr_phi


PartitionName=c_shared Nodes=n0[000-008] Shared=yes

PartitionName=regular Nodes=n0[009-018] Shared=Exclusive


cgroup.conf


CgroupAutomount=yes

CgroupReleaseAgentDir="/etc/slurm/cgroup"

ConstrainCores=yes

TaskAffinity=yes

ConstrainRAMSpace=no


Is there something I am missing?  I tried it with only TaskAffinity
without ConstrainRAMSpace=no but that did not make any difference.



Slurm version = 14.03.8

OS = SL 6.6


Please advise if I need to configure something else to make it work.


Thanks in advanced for your assistance.


Jackie Scoggins



--
Ryan Cox
Operations Director
Fulton Supercomputing Lab
Brigham Young University




[slurm-dev] Re: Slurm and docker/containers

2015-06-04 Thread Danny Auble
Were they linking to slurm or just calling commands? Unless they link there 
shouldn't be any license issues. 

On June 4, 2015 4:40:14 PM PDT, Christopher Samuel  
wrote:
>
>On 05/06/15 06:59, Michael Jennings wrote:
>
>> My team had a very (!!) productive and interesting discussion
>yesterday
>> with some folks who have succeeded in integrating Docker and SLURM to
>> the point that users can specify Docker repositories in which they
>> want SLURM to run their jobs, and SLURM will do so.
>
>How did they deal with the license incompatibility that Ralph
>mentioned?
>
>All the best
>Chris
>-- 
> Christopher Samuel    Senior Systems Administrator
> VLSCI - Victorian Life Sciences Computation Initiative
> Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
> http://www.vlsci.org.au/  http://twitter.com/vlsci


[slurm-dev] Re: Slurm versions 14.11.7 and 15.08.0-0pre5 are now available

2015-05-21 Thread Danny Auble

Most likely

http://bugs.schedmd.com/show_bug.cgi?id=858


On 05/21/15 13:59, Chris Read wrote:

Hooray for memory accounting!

Does this mean it will be possible to include memory usage in the 
Fairshare calculation too?


Chris

On Thu, May 21, 2015 at 3:40 PM, Danny Auble <d...@schedmd.com> wrote:



Slurm version 14.11.7 is now available with quite a few bug fixes as
listed below.

A development tag for 15.08 (pre5) has also been made.  It
represents the current state of Slurm development for the release
planned in August 2015 and is intended for development and test
purposes only.  One notable enhancement included is the idea of
Trackable Resources (TRES) for accounting for cpu, memory, energy,
GRES, licenses, etc.

Both are available for download at
http://slurm.schedmd.com/download.html

Notable changes for these versions are these...

* Changes in Slurm 14.11.7
==
 -- Initialize some variables used with the srun --no-alloc option
that may
cause random failures.
 -- Add SchedulerParameters option of sched_min_interval that
controls the
minimum time interval between any job scheduling action. The
default value
is zero (disabled).
 -- Change default SchedulerParameters=max_sched_time from 4
seconds to 2.
 -- Refactor scancel so that all pending jobs are cancelled before
starting
cancellation of running jobs. Otherwise they happen in
parallel and the
pending jobs can be scheduled on resources as the running jobs
are being
cancelled.
 -- ALPS - Add new cray.conf variable NoAPIDSignalOnKill. When set
to yes this
will make it so the slurmctld will not signal the apid's in a
batch job.
Instead it relies on the rpc coming from the slurmctld to kill
the job to
end things correctly.
 -- ALPS - Have the slurmstepd running a batch job wait for an
ALPS release
before ending the job.
 -- Initialize variables in consumable resource plugin to prevent
core dump.
 -- Fix scancel bug which could return an error on attempt to
signal a job step.
 -- In slurmctld communication agent, make the thread timeout be
the configured
value of MessageTimeout rather than 30 seconds.
 -- sshare -U/--Users only flag was used uninitialized.
 -- Cray systems, add "plugstack.conf.template" sample SPANK
configuration file.
 -- BLUEGENE - Set DB2NOEXITLIST when starting the slurmctld
daemon to avoid
random crashing in db2 when the slurmctld is exiting.
 -- Make full node reservations display correctly the core count
instead of
cpu count.
 -- Preserve original errno on execve() failure in task plugin.
 -- Add SLURM_JOB_NAME env variable to an salloc's environment.
 -- Overwrite SLURM_JOB_NAME in an srun when it gets an allocation.
 -- Make sure each job has a wckey if that is something that is
tracked.
 -- Make sure old step data is cleared when job is requeued.
 -- Load libtinfo as needed when building ncurses tools.
 -- Fix small memory leak in backup controller.
 -- Fix segfault when backup controller takes control for second time.
 -- Cray - Fix backup controller running native Slurm.
 -- Provide prototypes for init_setproctitle()/fini_setproctitle
on NetBSD.
 -- Add configuration test to find out the full path to su command.
 -- preempt/job_prio plugin: Fix for possible infinite loop when
identifying
preemptable jobs.
 -- preempt/job_prio plugin: Implement the concept of Warm-up Time
here. Use
the QoS GraceTime as the amount of time to wait before preempting.
Basically, skip preemption if your time is not up.
 -- Make srun wait KillWait time when a task is cancelled.
 -- switch/cray: Revert logic added to 14.11.6 that set
"PMI_CRAY_NO_SMP_ENV=1"
if CR_PACK_NODES is configured.
 -- Prevent users from setting job's partition to an invalid
partition.

* Changes in Slurm 15.08.0pre5
==
 -- Add jobcomp/elasticsearch plugin. Libcurl is required for
build. Configure
the server as follows:
"JobCompLoc=http://YOUR_ELASTICSEARCH_SERVER:9200";.
 -- Scancel logic large re-written to better support job arrays.
 -- Added a slurm.conf parameter PrologEpilogTimeout to control
how long
prolog/epilog can run.
 -- Added TRES (Trackable resources) to track Mem, GRES, license, etc
utilization.
 -- Add re-entrant versions of glibc time functions (e.g.
localtime) to Slurm
in order to eliminate rare deadlock of slurmstepd fork and
exec calls.
 -- Constrain kernel memory (if available) in cgroups.
 

[slurm-dev] Slurm versions 14.11.7 and 15.08.0-0pre5 are now available

2015-05-21 Thread Danny Auble


Slurm version 14.11.7 is now available with quite a few bug fixes as
listed below.

A development tag for 15.08 (pre5) has also been made.  It represents 
the current state of Slurm development for the release planned in August 
2015 and is intended for development and test purposes only.  One 
notable enhancement included is the idea of Trackable Resources (TRES) 
for accounting for cpu, memory, energy, GRES, licenses, etc.


Both are available for download at
http://slurm.schedmd.com/download.html

Notable changes for these versions are these...

* Changes in Slurm 14.11.7
==
 -- Initialize some variables used with the srun --no-alloc option that may
cause random failures.
 -- Add SchedulerParameters option of sched_min_interval that controls the
    minimum time interval between any job scheduling action. The default value
    is zero (disabled).
 -- Change default SchedulerParameters=max_sched_time from 4 seconds to 2.
 -- Refactor scancel so that all pending jobs are cancelled before starting
    cancellation of running jobs. Otherwise they happen in parallel and the
    pending jobs can be scheduled on resources as the running jobs are being
    cancelled.
 -- ALPS - Add new cray.conf variable NoAPIDSignalOnKill.  When set to yes
    this will make it so the slurmctld will not signal the apid's in a batch
    job.  Instead it relies on the rpc coming from the slurmctld to kill the
    job to end things correctly.
 -- ALPS - Have the slurmstepd running a batch job wait for an ALPS release
before ending the job.
 -- Initialize variables in consumable resource plugin to prevent core 
dump.
 -- Fix scancel bug which could return an error on attempt to signal a 
job step.
 -- In slurmctld communication agent, make the thread timeout be the
    configured value of MessageTimeout rather than 30 seconds.
 -- sshare -U/--Users only flag was used uninitialized.
 -- Cray systems, add "plugstack.conf.template" sample SPANK 
configuration file.
 -- BLUEGENE - Set DB2NOEXITLIST when starting the slurmctld daemon to avoid
    random crashing in db2 when the slurmctld is exiting.
 -- Make full node reservations display correctly the core count instead of
cpu count.
 -- Preserve original errno on execve() failure in task plugin.
 -- Add SLURM_JOB_NAME env variable to an salloc's environment.
 -- Overwrite SLURM_JOB_NAME in an srun when it gets an allocation.
 -- Make sure each job has a wckey if that is something that is tracked.
 -- Make sure old step data is cleared when job is requeued.
 -- Load libtinfo as needed when building ncurses tools.
 -- Fix small memory leak in backup controller.
 -- Fix segfault when backup controller takes control for second time.
 -- Cray - Fix backup controller running native Slurm.
 -- Provide prototypes for init_setproctitle()/fini_setproctitle on NetBSD.
 -- Add configuration test to find out the full path to su command.
 -- preempt/job_prio plugin: Fix for possible infinite loop when identifying
    preemptable jobs.
 -- preempt/job_prio plugin: Implement the concept of Warm-up Time here.  Use
    the QoS GraceTime as the amount of time to wait before preempting.
    Basically, skip preemption if your time is not up.
 -- Make srun wait KillWait time when a task is cancelled.
 -- switch/cray: Revert logic added to 14.11.6 that set "PMI_CRAY_NO_SMP_ENV=1"
    if CR_PACK_NODES is configured.
 -- Prevent users from setting job's partition to an invalid partition.

* Changes in Slurm 15.08.0pre5
==
 -- Add jobcomp/elasticsearch plugin. Libcurl is required for build. Configure
    the server as follows: "JobCompLoc=http://YOUR_ELASTICSEARCH_SERVER:9200".
 -- Scancel logic large re-written to better support job arrays.
 -- Added a slurm.conf parameter PrologEpilogTimeout to control how long
prolog/epilog can run.
 -- Added TRES (Trackable resources) to track Mem, GRES, license, etc
utilization.
 -- Add re-entrant versions of glibc time functions (e.g. localtime) to Slurm
    in order to eliminate rare deadlock of slurmstepd fork and exec calls.
 -- Constrain kernel memory (if available) in cgroups.
 -- Add PrologFlags option of "Contain" to create a proctrack container at
job resource allocation time.
 -- Disable the OOM Killer in slurmd and slurmstepd's memory cgroup when using
    MemSpecLimit.


[slurm-dev] Re: hyper-threading question

2015-05-21 Thread Danny Auble


Try adding --hint=nomultithread to your srun line.
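
For the 32-core (64-thread) nodes described below that would be, e.g.:

  srun -n 32 --hint=nomultithread ./app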

On 05/21/15 01:26, Maciej L. Olchowik wrote:

Hello,

How could a user specify that he does not want to use hyper-threading in his 
jobscript?

We have a Cray XC40 system with 32 cores (64 threads) on each compute nodes 
(two Haswell sockets). The performance of some jobs varies between identical 
runs and we have found that this is due to threads allocation on the same core.

We enabled task/affinity plugin and looked at cpu_bind option. We have found 
that the following does what we want:
srun -n 32 
--cpu_bind=map_cpu:0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31
 ./app

However, is there a shorter/nicer way of just "disabling" hyper-threading on a 
per-job basis?

Maciej

--
Maciej Olchowik
HPC Systems Administrator
KAUST Supercomputing Laboratory (KSL)
Al Khawarizmi Bldg. (1) Room 0134
Thuwal, Kingdom of Saudi Arabia
tel +966 12 808 0684



This message and its contents including attachments are intended solely for the 
original recipient. If you are not the intended recipient or have received this 
message in error, please notify me immediately and delete this message from 
your computer system. Any unauthorized use or distribution is prohibited. 
Please consider the environment before printing this email.


[slurm-dev] Re: Usage of the "deleted" column in clusterName_job_table table

2015-05-13 Thread Danny Auble
I would be very cautious when trying to override or use the database 
schema directly as it changes quite regularly.  There is no guarantee 
accessing a Slurm table directly today in a certain fashion will work in 
the next major release.


On 05/13/15 10:17, Bruce Roberts wrote:
I believe it is used when determining the last submitted job when 
modifying a job as well as when archiving jobs, at least that is what 
the code says.


On 05/13/15 10:12, Andy Riebs wrote:

No, I don't know how it's used. Anyone else care to answer?

Andy

On 05/13/2015 12:26 PM, Anatoliy Kovalenko wrote:
Re: [slurm-dev] Re: Usage of the "deleted" column in 
clusterName_job_table table
Thanks for your advice about "new table". However it will a bit 
difficult to support it in our case. Still, we need to know answer 
to our question about "deleted" column in job_table - do you maybe 
know how it is used?



2015-05-12 16:03 GMT+03:00 Andy Riebs:


Hi,

In exchange for Slurm automatically handling database setup (and
reconfiguration when you upgrade to a newer version of Slurm),
you have to allow it to do whatever it wants to do with its tables.

Rather than adding (or reusing) an existing column in a table, I
would suggest creating new tables with names such as
"local_clusterName_job_table_extension" that have only your
added columns and appropriate keys. These tables should persist
over (most?) Slurm upgrades.

Andy

On 05/12/2015 07:27 AM, Anatoliy Kovalenko wrote:

We have added one custom column to the SLURM clusterName_job_table.
But when we restart slurmdbd, it is deleted. Why does this
happen, and is it possible to modify the SLURM db (tables) without
disrupting SLURM's work?
We have also noticed that in table clusterName_job_table there
is a column called "deleted". Is it used by SLURM at any
moment? Is this column (deleted) anyhow related to task
archival?  If it is unused - can we use for our custom purposes
and sometimes put "1" there instead of 0?  will it anyhow
affect SLURM work?











[slurm-dev] Re: Issues with cons_res

2015-05-07 Thread Danny Auble
How did you install?  My guess is it isn't a full install like Moe said.  I 
would remove the PluginDir option since it will default to where you configured 
it to be.  Based on you pointing to /usr/lib64 as the location on your one node 
I'm surprised it didn't work.  

On May 7, 2015 8:13:35 PM PDT, David Lin  wrote:
>Hi Danny,
>No that doesn't work,
>
>starting slurmd: slurmd: error: Couldn't find the specified plugin name
>
>for select/cons_res looking at all files
>slurmd: error: cannot find select plugin for select/cons_res
>slurmd: fatal: Can't find plugin for select/cons_res
>
>David
>
>On 05/07/2015 07:39 PM, Danny Auble wrote:
>> What happens if you set
>>
>> PluginDir=/usr/lib64
>>
>>
>>
>> On May 7, 2015 6:10:19 PM PDT, David Lin 
>wrote:
>>
>> Hi Moe,
>> I do have the Slurm plugins installed, and I do see the file
>> /usr/lib64/select_cons_res.so
>> my slurm.conf also has PluginDir=/usr/lib64/slurm
>> I've pasted my full slurm.conf below just in case.
>>
>> Thanks!
>> David
>>
>>
>> # slurm.conf file generated by configurator.html.
>> # Put this file on all nodes of your cluster.
>> # See the slurm.conf man page for more information.
>> #
>> ControlMachine=rsg-master
>> ControlAddr=171.64.74.213
>> #BackupController=
>> #BackupAddr=
>> #
>> AuthType=auth/munge
>> CacheGroups=0
>> #CheckpointType=checkpoint/none
>> CryptoType=crypto/munge
>> #DisableRootJobs=NO
>> #EnforcePartLimits=NO
>> #Epilog=
>> #EpilogSlurmctld=
>> #FirstJobId=1
>> #MaxJobId=99
>> #GresTypes=
>> #GroupUpdateForce=0
>> #GroupUpdateTime=600
>> #JobCheckpointDir=/var/slurm/checkpoint
>> #JobCredentialPrivateKey=
>> #JobCredentialPublicCertificate=
>> #JobFileAppend=0
>> #JobRequeue=1
>> #JobSubmitPlugins=1
>> #KillOnBadExit=0
>> #LaunchType=launch/slurm
>> #Licenses=foo*4,bar
>> #MailProg=/bin/mail
>> #MaxJobCount=5000
>> #MaxStepCount=4
>> #MaxTasksPerNode=128
>> MpiDefault=none
>> #MpiParams=ports=#-#
>> PluginDir=/usr/lib64/slurm
>> #PlugStackConfig=
>> #PrivateData=jobs
>> ProctrackType=proctrack/pgid
>> #Prolog=
>> #PrologFlags=
>> #PrologSlurmctld=
>> #PropagatePrioProcess=0
>> #PropagateResourceLimits=
>> #PropagateResourceLimitsExcept=
>> #RebootProgram=
>> ReturnToService=2
>> #SallocDefaultCommand=
>> SlurmctldPidFile=/var/run/slurmctld.pid
>> SlurmctldPort=6817
>> SlurmdPidFile=/var/run/slurmd.pid
>> SlurmdPort=6818
>> SlurmdSpoolDir=/var/spool/slurmd
>> SlurmUser=slurm
>> #SlurmdUser=root
>> #SrunEpilog=
>> #SrunProlog=
>> StateSaveLocation=/var/spool
>> SwitchType=switch/none
>> #TaskEpilog=
>> TaskPlugin=task/none
>> #TaskPluginParam=
>> #TaskProlog=
>> #TopologyPlugin=topology/tree
>> #TmpFS=/tmp
>> #TrackWCKey=no
>> #TreeWidth=
>> #UnkillableStepProgram=
>> #UsePAM=0
>> #
>> #
>> # TIMERS
>> #BatchStartTimeout=10
>> #CompleteWait=0
>> #EpilogMsgTime=2000
>> #GetEnvTimeout=2
>> #HealthCheckInterval=0
>> #HealthCheckProgram=
>> InactiveLimit=0
>> KillWait=30
>> #MessageTimeout=10
>> #ResvOverRun=0
>> MinJobAge=300
>> #OverTimeLimit=0
>> SlurmctldTimeout=120
>> SlurmdTimeout=300
>> #UnkillableStepTimeout=60
>> #VSizeFactor=0
>> Waittime=0
>> #
>> #
>> # SCHEDULING
>> #DefMemPerCPU=0
>> FastSchedule=0
>> #MaxMemPerCPU=0
>> #SchedulerRootFilter=1
>> #SchedulerTimeSlice=30
>> SchedulerType=sched/backfill
>> SchedulerPort=7321
>> SelectType=select/cons_res
>> SelectTypeParameters=CR_Core_Memory
>> #
>> #
>> # JOB PRIORITY
>> #PriorityFlags=
>> #PriorityType=priority/basic
>> #PriorityDecayHalfLife=
>> #PriorityCalcPeriod=
>> #PriorityFavorSmall=
>> #PriorityMaxAge=
>> #PriorityUsageResetPeriod=
>> #PriorityWeightAge=
>>

[slurm-dev] Re: Issues with cons_res

2015-05-07 Thread Danny Auble
What happens if you set 

PluginDir=/usr/lib64



On May 7, 2015 6:10:19 PM PDT, David Lin  wrote:
>
>Hi Moe,
>I do have the Slurm plugins installed, and I do see the file 
>/usr/lib64/select_cons_res.so
>my slurm.conf also has PluginDir=/usr/lib64/slurm
>I've pasted my full slurm.conf below just in case.
>
>Thanks!
>David
>
>
># slurm.conf file generated by configurator.html.
># Put this file on all nodes of your cluster.
># See the slurm.conf man page for more information.
>#
>ControlMachine=rsg-master
>ControlAddr=171.64.74.213
>#BackupController=
>#BackupAddr=
>#
>AuthType=auth/munge
>CacheGroups=0
>#CheckpointType=checkpoint/none
>CryptoType=crypto/munge
>#DisableRootJobs=NO
>#EnforcePartLimits=NO
>#Epilog=
>#EpilogSlurmctld=
>#FirstJobId=1
>#MaxJobId=99
>#GresTypes=
>#GroupUpdateForce=0
>#GroupUpdateTime=600
>#JobCheckpointDir=/var/slurm/checkpoint
>#JobCredentialPrivateKey=
>#JobCredentialPublicCertificate=
>#JobFileAppend=0
>#JobRequeue=1
>#JobSubmitPlugins=1
>#KillOnBadExit=0
>#LaunchType=launch/slurm
>#Licenses=foo*4,bar
>#MailProg=/bin/mail
>#MaxJobCount=5000
>#MaxStepCount=4
>#MaxTasksPerNode=128
>MpiDefault=none
>#MpiParams=ports=#-#
>PluginDir=/usr/lib64/slurm
>#PlugStackConfig=
>#PrivateData=jobs
>ProctrackType=proctrack/pgid
>#Prolog=
>#PrologFlags=
>#PrologSlurmctld=
>#PropagatePrioProcess=0
>#PropagateResourceLimits=
>#PropagateResourceLimitsExcept=
>#RebootProgram=
>ReturnToService=2
>#SallocDefaultCommand=
>SlurmctldPidFile=/var/run/slurmctld.pid
>SlurmctldPort=6817
>SlurmdPidFile=/var/run/slurmd.pid
>SlurmdPort=6818
>SlurmdSpoolDir=/var/spool/slurmd
>SlurmUser=slurm
>#SlurmdUser=root
>#SrunEpilog=
>#SrunProlog=
>StateSaveLocation=/var/spool
>SwitchType=switch/none
>#TaskEpilog=
>TaskPlugin=task/none
>#TaskPluginParam=
>#TaskProlog=
>#TopologyPlugin=topology/tree
>#TmpFS=/tmp
>#TrackWCKey=no
>#TreeWidth=
>#UnkillableStepProgram=
>#UsePAM=0
>#
>#
># TIMERS
>#BatchStartTimeout=10
>#CompleteWait=0
>#EpilogMsgTime=2000
>#GetEnvTimeout=2
>#HealthCheckInterval=0
>#HealthCheckProgram=
>InactiveLimit=0
>KillWait=30
>#MessageTimeout=10
>#ResvOverRun=0
>MinJobAge=300
>#OverTimeLimit=0
>SlurmctldTimeout=120
>SlurmdTimeout=300
>#UnkillableStepTimeout=60
>#VSizeFactor=0
>Waittime=0
>#
>#
># SCHEDULING
>#DefMemPerCPU=0
>FastSchedule=0
>#MaxMemPerCPU=0
>#SchedulerRootFilter=1
>#SchedulerTimeSlice=30
>SchedulerType=sched/backfill
>SchedulerPort=7321
>SelectType=select/cons_res
>SelectTypeParameters=CR_Core_Memory
>#
>#
># JOB PRIORITY
>#PriorityFlags=
>#PriorityType=priority/basic
>#PriorityDecayHalfLife=
>#PriorityCalcPeriod=
>#PriorityFavorSmall=
>#PriorityMaxAge=
>#PriorityUsageResetPeriod=
>#PriorityWeightAge=
>#PriorityWeightFairshare=
>#PriorityWeightJobSize=
>#PriorityWeightPartition=
>#PriorityWeightQOS=
>#
>#
># LOGGING AND ACCOUNTING
>#AccountingStorageEnforce=0
>#AccountingStorageHost=
>#AccountingStorageLoc=
>#AccountingStoragePass=
>#AccountingStoragePort=
>AccountingStorageType=accounting_storage/none
>#AccountingStorageUser=
>AccountingStoreJobComment=YES
>ClusterName=cluster
>#DebugFlags=
>#JobCompHost=
>#JobCompLoc=
>#JobCompPass=
>#JobCompPort=
>JobCompType=jobcomp/none
>#JobCompUser=
>#JobContainerType=job_container/none
>JobAcctGatherFrequency=30
>JobAcctGatherType=jobacct_gather/none
>SlurmctldDebug=9
>SlurmctldLogFile=/var/log/slurmctld.log
>SlurmdDebug=9
>SlurmdLogFile=/var/log/slurmd.log
>#SlurmSchedLogFile=
>#SlurmSchedLogLevel=
>#
>#
># POWER SAVE SUPPORT FOR IDLE NODES (optional)
>#SuspendProgram=
>#ResumeProgram=
>#SuspendTimeout=
>#ResumeTimeout=
>#ResumeRate=
>#SuspendExcNodes=
>#SuspendExcParts=
>#SuspendRate=
>#SuspendTime=
>#
>#
># COMPUTE NODES
>NodeName=rsg[4-7]  State=UNKNOWN CPUs=24
>Sockets=2 
>CoresPerSocket=6 ThreadsPerCore=2
>NodeName=rsg[12-15]  State=UNKNOWN CPUs=24
>Sockets=2 
>CoresPerSocket=6 ThreadsPerCore=2
>NodeName=rsg[16-31]  State=UNKNOWN CPUs=32
>Sockets=2 
>CoresPerSocket=8 ThreadsPerCore=2
>
>
>
>
>
>On 05/07/2015 05:59 PM, Moe Jette wrote:
>>
>> It looks like you didn't install the RPM with Slurm plugins.
>>
>> Quoting David Lin :
>>> Hello,
>>>
>>> I am having some issues with the select/cons_res mode of slurm. When
>I
>>> tried to execute a job such as srun -N 2 -n 2 hostname, I get this
>>>
>>> $ srun -N 2 -n 2 -q RHEL6 hostname
>>> srun: error: slurm_receive_msg: Zero Bytes were transmitted or
>received
>>> srun: error: Unable to allocate resources: Zero Bytes were
>transmitted
>>> or received
>>>
>>> and on the slurmctld log, I see this
>>>
>>> [2015-05-07T16:52:43.264] error: we don't have select plugin type
>102
>>> [2015-05-07T16:52:43.264] error: select_g_select_jobinfo_unpack:
>unpack
>>> error
>>> [2015-05-07T16:52:43.264] error: Malformed RPC of type
>>> REQUEST_RESOURCE_ALLOCATION(4001) received
>>> [2015-05-07T16:52:43.264] error: slurm_receive_msg: Header lengths
>are
>>> longer than data received
>>> [2015-05-07T16:52:43.274] error: slurm_receive_msg: Header leng

[slurm-dev] Re: cgroups support in slurm (sbatch vs salloc)

2015-05-07 Thread Danny Auble


salloc should be used to start an interactive session.  If you would
like to end up on a node in the allocation by default you should look at
the slurm.conf SallocDefaultCommand option and reference this FAQ:
http://slurm.schedmd.com/faq.html#salloc_default_command.


Doing it your way with srun will consume resources leaving you in the 
position you are currently in.
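
As a sketch of the sort of line that option takes (illustrative only; see the
FAQ above for the exact command recommended for your version):

  SallocDefaultCommand="srun -n1 -N1 --mem-per-cpu=0 --pty --preserve-env --mpi=none $SHELL"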


On 05/07/15 09:35, Igor Kozin wrote:

Chris, Mehdi, thank you. I must say that based on what I'd read salloc appeared to me as 
a command to start interactive jobs while srun is to "Run parallel jobs".
If I get it right now, srun must be used to start interactive sessions
srun --ntasks=1 --mem-per-cpu=1000 --pty /bin/bash

(and salloc should probably be removed from the list of tools available to our
users).

Now, if I set in slurm.conf

DefMemPerCPU=800
MaxMemPerCPU=1600

and run srun --ntasks=1 --pty /bin/bash

I get
memory.limit_in_bytes 838860800

I can still override the max mem limit on the command line but at the cost of 
having more cores

srun --ntasks=1 --mem-per-cpu=2000 --pty /bin/bash
memory.limit_in_bytes 2097152000
cpuset.cpus 0-2

It's only when I hit the limit of number of cores x 1600 MB I get an error.

srun --ntasks=1 --mem-per-cpu=2 --pty /bin/bash
srun: Force Terminated job 52
srun: error: CPU count per node can not be satisfied
srun: error: Unable to allocate resources: Requested node configuration is not 
available

So far so good.


-Original Message-
From: Mehdi Denou [mailto:mehdi.de...@atos.net]
Sent: 07 May 2015 13:07
To: slurm-dev
Subject: [slurm-dev] Re: cgroups support in slurm (sbatch vs salloc)


Here is another example which is (from my point of view) less confusing:

[root@host1 ~]# salloc -N 1
salloc: Granted job allocation 8
[root@host1 ~]# srun hostname
host9
[root@host1 ~]# hostname
host1
[root@host1 ~]# exit
exit
salloc: Relinquishing job allocation 8
salloc: Job allocation 8 has been revoked.
[root@host1 ~]#


Le 07/05/2015 13:28, Chris Samuel a écrit :

On Thu, 7 May 2015 04:01:25 AM Igor Kozin wrote:


My real question is why running
salloc --mem-per-cpu=1000 --ntasks=1 bash
does not create cgroups and therefore gets you an unlimited interactive
session?

My understanding is that salloc will give you a session on the same node you
run it, and you then need to use srun to launch a process on the assigned
compute node (and thus into the relevant control group).

To demonstrate, here is an example from one of our systems (Slurm 14.03.11),
first just running hostname in salloc so you can see the shell is on the same
node:

[samuel@merri ~]$ salloc hostname
salloc: Pending job allocation 2096414
salloc: job 2096414 queued and waiting for resources
salloc: job 2096414 has been allocated resources
salloc: Granted job allocation 2096414
merri
salloc: Relinquishing job allocation 2096414
[samuel@merri ~]$


Now running hostname with srun inside salloc to show it appears on the compute
node instead:

[samuel@merri ~]$ salloc srun hostname
salloc: Pending job allocation 2096415
salloc: job 2096415 queued and waiting for resources
salloc: job 2096415 has been allocated resources
salloc: Granted job allocation 2096415
Scratch directory /scratch/merri/jobs/2096415 has been allocated
merri009
salloc: Relinquishing job allocation 2096415


Now to demonstrate that the one on the login node has (as expected) no cgroup
whilst the one run with srun does run inside a cgroup:

[samuel@merri ~]$ salloc cat /proc/self/cpuset
salloc: Pending job allocation 2096416
salloc: job 2096416 queued and waiting for resources
salloc: job 2096416 has been allocated resources
salloc: Granted job allocation 2096416
/
salloc: Relinquishing job allocation 2096416
salloc: Job allocation 2096416 has been revoked.
[samuel@merri ~]$

[samuel@merri ~]$ salloc srun cat /proc/self/cpuset
salloc: Pending job allocation 2096417
salloc: job 2096417 queued and waiting for resources
salloc: job 2096417 has been allocated resources
salloc: Granted job allocation 2096417
Scratch directory /scratch/merri/jobs/2096417 has been allocated
/slurm/uid_500/job_2096417/step_0
salloc: Relinquishing job allocation 2096417
salloc: Job allocation 2096417 has been revoked.
[samuel@merri ~]$


Hope that helps!

All the best,
Chris


[slurm-dev] Re: slurmdbd association lifetime/expiry

2015-04-21 Thread Danny Auble


Currently there isn't any straightforward way to expire a limit.

Based on the talk you referenced it looks like expiry was something they
wanted to add.  If you know the end time of the project, the easiest way
to make it so the account can't run is to set GrpCPUMins=0 when the
end time arrives, or just remove the account from Slurm.  I am unaware
of anything that does this automatically with Slurm, but a cronjob
running a script every day or so might give you what you are looking for.
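
For example, such a script could run something along these lines when a
project's end date passes (the account name here is hypothetical):

  sacctmgr -i modify account set GrpCPUMins=0 where account=myproject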



On 04/21/15 10:06, Danny Auble wrote:
Have you looked at sreport?  I think the Cluster 
UserUtilizationByAccount report would give you what you are looking for.


On 04/21/15 05:53, Maciej L. Olchowik wrote:

Dear all,

For our accounting needs, we are currently running slurmdbd with the 
sbank scripts:

http://slurm.schedmd.com/slurm_ug_2012/SUG2012-TCHPC-sbank.pdf

We find that this setup does almost everything we need, apart from 
one thing. We need to be able to track core-hour allocations on per 
project basis. For example, a project would be allowed to run for 3 
months and it's remaining core-hour allocation will expire after 
that. Is it possible to set an expiry date on the slurmdbd 
association (specifically we use GrpCPUMins parameter)?


We know that we can somewhat track allocations via:
   sacctmgr list transactions

but how do we keep track and enforce the expiry date? Does anyone 
have a neat solution to this problem?


many thanks for any pointers,

Maciej

--
Maciej Olchowik
HPC Systems Administrator
KAUST Supercomputing Laboratory (KSL)
Al Khawarizmi Bldg. (1) Room 0134
Thuwal, Kingdom of Saudi Arabia
tel +966 2 808 0684



This message and its contents including attachments are intended 
solely for the original recipient. If you are not the intended 
recipient or have received this message in error, please notify me 
immediately and delete this message from your computer system. Any 
unauthorized use or distribution is prohibited. Please consider the 
environment before printing this email.




[slurm-dev] Re: slurmdbd association lifetime/expiry

2015-04-21 Thread Danny Auble


Have you looked at sreport?  I think the Cluster 
UserUtilizationByAccount report would give you what you are looking for.


On 04/21/15 05:53, Maciej L. Olchowik wrote:

Dear all,

For our accounting needs, we are currently running slurmdbd with the sbank 
scripts:
http://slurm.schedmd.com/slurm_ug_2012/SUG2012-TCHPC-sbank.pdf

We find that this setup does almost everything we need, apart from one thing. 
We need to be able to track core-hour allocations on per project basis. For 
example, a project would be allowed to run for 3 months and it's remaining 
core-hour allocation will expire after that. Is it possible to set an expiry 
date on the slurmdbd association (specifically we use GrpCPUMins parameter)?

We know that we can somewhat track allocations via:
   sacctmgr list transactions

but how do we keep track and enforce the expiry date? Does anyone have a neat 
solution to this problem?

many thanks for any pointers,

Maciej

--
Maciej Olchowik
HPC Systems Administrator
KAUST Supercomputing Laboratory (KSL)
Al Khawarizmi Bldg. (1) Room 0134
Thuwal, Kingdom of Saudi Arabia
tel +966 2 808 0684



This message and its contents including attachments are intended solely for the 
original recipient. If you are not the intended recipient or have received this 
message in error, please notify me immediately and delete this message from 
your computer system. Any unauthorized use or distribution is prohibited. 
Please consider the environment before printing this email.


[slurm-dev] Re: Multi-Cluster installation update-safe?

2015-04-16 Thread Danny Auble
Yes Ulf, your understanding is correct.  The DBD can talk to 2 previous 
versions.  As long as the DBD is updated first you should be fine. 

On April 16, 2015 3:33:22 AM PDT, Ulf Markwardt  
wrote:
>Dear Slurm developers,
>
>before I set up Slurm with multi-cluster support, I would like to make 
>sure that it will be update-safe: It certainly will happen that I can 
>only update the Slurm installation on _one_ cluster at a time, maybe 
>with e.g. a week in between.
>
>Is it a design principle that the latest slurmdbd can communicate with
>a 
>slightly older slurmctrld (e.g. 15.08 with 14.11)?
>
>Thank you,
>Ulf
>-- 
>___
>Dr. Ulf Markwardt
>
>Technische Universität Dresden
>Center for Information Services and High Performance Computing (ZIH)
>01062 Dresden, Germany
>
>Phone: (+49) 351/463-33640  WWW:  http://www.tu-dresden.de/zih


[slurm-dev] Re: Cray Resource Utilization Reporting (RUR) via plugin

2015-03-20 Thread Danny Auble
There are currently no plans to utilize RUR information.

I would suggest running natively (without ALPS) and using normal accounting.
What from RUR is missing if you do that?

The energy accounting that is currently there is for when running native.



On March 20, 2015 7:38:15 AM CDT, Andrew Elwell  wrote:
>
>Hi All,
>
>We're investigating the possibility of enabling RUR on our XC30's,
>with the end goal of integrating this into the slurmdbd for jobs.
>
>Is anyone else working on this? if not, is anyone else interested?
>
>I know that there's already
>./acct_gather_energy/cray/acct_gather_energy_cray.c but I don't see
>anything to interact with RUR.
>
>Andrew


[slurm-dev] Re: slurmdbd segmentation fault 14.11.4

2015-03-16 Thread Danny Auble
Yann, this is fixed in commit 2e2d924e3d0 and will be in 14.11.5 when 
released.  The patch is quite small; all you will need to do is patch 
your code, recompile the mysql plugin and restart your slurmdbd.


The problem comes from setting a node down without a reason given.
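
A rough sketch of that patch-and-rebuild sequence, assuming a git checkout of
your 14.11 sources (adapt to however you build and install):

    git cherry-pick 2e2d924e3d0            # or apply the equivalent patch file
    cd src/plugins/accounting_storage/mysql
    make && make install                   # rebuild just the mysql plugin
    # then restart slurmdbd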

Danny

On 03/16/2015 08:22 AM, Yann Sagon wrote:

Re: [slurm-dev] Re: slurmdbd segmentation fault 14.11.4
#0  0x2b69ed3df395 in __strcasecmp_l_sse42 () from /lib64/libc.so.6
#1  0x2b69edc866de in as_mysql_node_down 
(mysql_conn=0x2b6a04000960, node_ptr=0x2b6a000ffac0, 
event_time=1426513631, reason=, reason_uid=200) 
at as_mysql_cluster.c:1085
#2  0x00522c70 in clusteracct_storage_g_node_down 
(db_conn=0x2b6a04000960, node_ptr=0x2b6a000ffac0, 
event_time=1426513631, reason=0x0, reason_uid=200) at 
slurm_accounting_storage.c:703
#3  0x0042ac39 in _node_state (slurmdbd_conn=0x2b69fc0008d0, 
in_buffer=0x2b6a04045d10, out_buffer=0x2b6a000ffce8, 
uid=0x2b6a000ffe58) at proc_req.c:2792
#4  0x00424b10 in proc_req (slurmdbd_conn=0x2b69fc0008d0, 
msg=0x2b6a04000d90 "\005\230", msg_size=40, first=false, 
out_buffer=0x2b6a000ffce8, uid=0x2b6a000ffe58) at proc_req.c:387
#5  0x0042d3d8 in _send_mult_msg 
(slurmdbd_conn=0x2b69fc0008d0, in_buffer=0x2b6a04000910, 
out_buffer=0x2b6a000ffe48, uid=0x2b6a000ffe58) at proc_req.c:3764
#6  0x00424cfa in proc_req (slurmdbd_conn=0x2b69fc0008d0, 
msg=0x2b6a0408fdb0 "\005", , msg_size=72054, 
first=false, out_buffer=0x2b6a000ffe48, uid=0x2b6a000ffe58)

at proc_req.c:444
#7  0x0043178c in _service_connection (arg=0x2b69fc0008d0) at 
rpc_mgr.c:232

#8  0x2b69ed09e9d1 in start_thread () from /lib64/libpthread.so.0
#9  0x2b69ed39c8fd in clone () from /lib64/libc.so.6



2015-03-16 15:37 GMT+01:00 Morris Jette:


What is in the core dump back trace?


On March 16, 2015 7:30:07 AM PDT, Yann Sagon <ysa...@gmail.com> wrote:

I have the same problem again after several days.

slurmdbd -Dv

[...]
slurmdbd: debug2: Everything rolled up
slurmdbd: debug4: got 0 commits
slurmdbd: debug4: got 0 commits
slurmdbd: debug2: Opened connection 7 from 192.168.100.1
slurmdbd: debug:  DBD_INIT: CLUSTER:baobab VERSION:7168
UID:200 IP:192.168.100.1 CONN:7
slurmdbd: debug2: acct_storage_p_get_connection: request new
connection 1
slurmdbd: debug2: DBD_REGISTER_CTLD: called for baobab(6817)
slurmdbd: debug2: slurmctld at ip:192.168.100.1, port:6817
slurmdbd: debug4: got 0 commits
slurmdbd: debug2: DBD_NODE_STATE: NODE:node130 STATE:DOWN
REASON:(null) UID:200 TIME:1426513631
Segmentation fault (core dumped)

For info: node130 was put in drain by slurm because a SPANK
plugin returned an error (legitimate, I have since fixed it).
I have since restarted node130 and set it to idle, with no change.




2015-03-09 15:37 GMT+01:00 Yann Sagon <ysa...@gmail.com>:

I tried the hard way: reverted back to the previous slurmdbd,
restored the mysql backup, tried to remove some entries from
the db. No luck.

Then I had a look at the dbd.messages file
in StateSaveLocation. I suppose that is where the
messages to be flushed to the db are stored when the db is
not available.
I stopped slurmctld, removed the dbd.messages file, started
slurmctld, then started slurmdbd. Everything seems fine; the
daemon is not crashing. I'm keeping the dbd.messages file
on my hd in case someone is interested in discovering the
cause of the crash.



2015-03-08 10:53 GMT+01:00 Yann Sagon <ysa...@gmail.com>:

I just noticed that slurmdbd is crashing as soon as I
start it.

/usr/sbin/slurmdbd -D
slurmdbd: debug3: Trying to load plugin
/usr/lib64/slurm/auth_munge.so
slurmdbd: debug:  auth plugin for Munge
(http://code.google.com/p/munge/) loaded
slurmdbd: debug3: Success.
slurmdbd: debug3: Trying to load plugin
/usr/lib64/slurm/accounting_storage_mysql.so
slurmdbd: debug2: mysql_connect() called for db slurm
slurmdbd: Accounting storage MYSQL plugin loaded
slurmdbd: debug3: Success.
slurmdbd: pidfile not locked, assuming no running daemon
slurmdbd: debug2: ArchiveDir= /var/lib/slurmdbd
slurmdbd: debug2: ArchiveScript = (null)
slurmdbd: debug2: AuthInfo  = (null)
slurmdbd: debug2: AuthType  = auth/munge
slurmdbd: debug2: CommitDelay   = 0
slurmdbd: debug2: DbdAddr   = master
slurmdbd: debug2: DbdBa

[slurm-dev] Re: SlurmDBD Archiving

2015-03-10 Thread Danny Auble
The fatal you received means your query lasted more than 15 minutes, mysql 
deemed it hung and aborted. You can increase the timeout for 
innodb_lock_wait_timeout in your my.cnf and try again, but that generally isn't 
a good idea. You can safely try again as many times as you would like and 
perhaps it will finish or just wait for the normal purge. 

This timeout should only occur when running the archive dump with sacctmgr, 
under normal purge with the dbd this wouldn't happen. 
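
For reference, raising that timeout would look something like this in my.cnf (the
value is in seconds; 900 corresponds to the 15 minutes mentioned above, and as
noted this generally isn't a good idea):

    [mysqld]
    innodb_lock_wait_timeout = 900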

On March 10, 2015 6:26:19 AM PDT, Paul Edmon  wrote:
>
>So when I tried to do an archive dump I got the following error. What 
>does this mean?
>
>[root@holy-slurm01 slurm]# sacctmgr -i archive dump
>sacctmgr: error: slurmdbd: Getting response to message type 1459
>sacctmgr: error: slurmdbd: DBD_ARCHIVE_DUMP failure: No error
>  Problem dumping archive: Unspecified error
>
>It also caused the slurmdbd to crash so I had to restart it.  Here is 
>the log:
>
>Mar 10 08:18:21 holy-slurm01 slurmctld[20144]: slurmdbd: agent queue
>size 50
>Mar 10 09:20:57 holy-slurm01 slurmdbd[47429]: error: mysql_query
>failed: 
>1205 Lock wait timeout exceeded; try restarting transaction
>Mar 10 09:20:57 holy-slurm01 slurmdbd[47429]: fatal: mysql gave 
>ER_LOCK_WAIT_TIMEOUT as an error. The only way to fix this is restart 
>the calling program
>Mar 10 09:20:57 holy-slurm01 slurmctld[20144]: error: slurmdbd: Getting
>
>response to message type 1407
>Mar 10 09:20:57 holy-slurm01 slurmctld[20144]: slurmdbd: reopening 
>connection
>Mar 10 09:21:02 holy-slurm01 slurmctld[20144]: error: slurmdbd: Sending
>
>message type 1472: 11: Connection refused
>Mar 10 09:21:02 holy-slurm01 slurmctld[20144]: error: slurmdbd: 
>DBD_SEND_MULT_JOB_START failure: Connection refused
>
>It did manage to dump:
>
>[root@holy-slurm01 archive]# ls -ltr
>total 160260
>-rw--- 1 slurm slurm_users 163847476 Mar 10 09:16 
>odyssey_event_archive_2013-08-01T00:00:00_2014-08-31T23:59:59
>-rw--- 1 slurm slurm_users253639 Mar 10 09:17 
>odyssey_suspend_archive_2013-08-01T00:00:00_2014-08-31T23:59:59
>
>Is it safe to try again?
>
>-Paul Edmon-
>
>On 03/06/2015 03:07 PM, Paul Edmon wrote:
>>
>> Ah, okay, that was the command I was looking for.  I wasn't sure how 
>> to force it.  Thanks.
>>
>> -Paul Edmon-
>>
>> On 03/06/2015 01:43 PM, Danny Auble wrote:
>>>
>>> It looks like I might stand corrected though.  It looks like you
>will 
>>> have to wait for the month to go by before the purge starts.
>>>
>>> With a lot of jobs it may take a while depending on the speed of
>your 
>>> disk and such.  If you have the debugflag=DB_USAGE you should see
>the 
>>> sql statements go by.  This will be quite verbose, so it might not
>be 
>>> that great of an idea.  You can edit the slurmdbd.conf file with the
>
>>> flag and run "sacctmgr reconfig" to add or remove the flag.
>>>
>>> You can force a purge which in turn will force an archive with
>>>
>>> sacctmgr archive dump
>>>
>>> see http://slurm.schedmd.com/sacctmgr.html
>>>
>>> Danny
>>>
>>> On 03/06/2015 10:12 AM, Paul Edmon wrote:
>>>>
>>>> How long does that typically take?  Because I have done it on our 
>>>> large job database and I have seen nothing yet.
>>>>
>>>> -Paul Edmon
>>>>
>>>> On 03/06/2015 01:07 PM, Danny Auble wrote:
>>>>>
>>>>> If you had older than 6 month data I would expect it to purge on 
>>>>> restart of the slurmdbd.  You will see a message in the log when 
>>>>> the archive file is created.
>>>>>
>>>>> On 03/06/2015 09:58 AM, Paul Edmon wrote:
>>>>>>
>>>>>> Okay, that's what I suspected.  We set it to 6 months.  So I
>guess 
>>>>>> then the purge will happen on April 1st.
>>>>>>
>>>>>> -Paul Edmon-
>>>>>>
>>>>>> On 03/06/2015 12:33 PM, Danny Auble wrote:
>>>>>>>
>>>>>>> Paul, do you have Purge* set up in the slurmdbd.conf? Archiving 
>>>>>>> takes place during the Purge process.  If no Purge values are
>set 
>>>>>>> archiving will never take place since nothing is ever purged. 
>>>>>>> When Purge values are set the related archivings take place on
>an 
>>>>>>> hourly, daily, and monthly basis depending on the units your 
>>>>>>> purge values are set to.
>>>>>>>
>>>>>>> If PurgeJobs=2months the archive would take place at the 
>>>>>>> beginning of each month.  If it were set to 2hours it would 
>>>>>>> happen each hour. The purge will also happen on slurmdbd startup
>
>>>>>>> as well if running things for the first time.
>>>>>>>
>>>>>>> Danny
>>>>>>>
>>>>>>>
>>>>>>> On 03/06/2015 08:20 AM, Paul Edmon wrote:
>>>>>>>>
>>>>>>>> So we recently turned this on to archive jobs older than 6 
>>>>>>>> months. However when we restarted slurmdbd nothing happened, at
>
>>>>>>>> least no file was deposited at the specified archive location. 
>>>>>>>> Is there a way to force it to purge? When is the archiving 
>>>>>>>> scheduled to be done? We definitely have jobs older than 6 
>>>>>>>> months in the database, I'm just curious about the schedule of 
>>>>>>>> when the archiving is done.
>>>>>>>>
>>>>>>>> -Paul Edmon-


[slurm-dev] Re: SlurmDBD Archiving

2015-03-06 Thread Danny Auble


It looks like I might stand corrected though.  It looks like you will 
have to wait for the month to go by before the purge starts.


With a lot of jobs it may take a while depending on the speed of your 
disk and such.  If you have the debugflag=DB_USAGE you should see the 
sql statements go by.  This will be quite verbose, so it might not be 
that great of an idea.  You can edit the slurmdbd.conf file with the 
flag and run "sacctmgr reconfig" to add or remove the flag.


You can force a purge which in turn will force an archive with

sacctmgr archive dump

see http://slurm.schedmd.com/sacctmgr.html

Danny

On 03/06/2015 10:12 AM, Paul Edmon wrote:


How long does that typically take?  Because I have done it on our 
large job database and I have seen nothing yet.


-Paul Edmon

On 03/06/2015 01:07 PM, Danny Auble wrote:


If you had older than 6 month data I would expect it to purge on 
restart of the slurmdbd.  You will see a message in the log when the 
archive file is created.


On 03/06/2015 09:58 AM, Paul Edmon wrote:


Okay, that's what I suspected.  We set it to 6 months.  So I guess 
then the purge will happen on April 1st.


-Paul Edmon-

On 03/06/2015 12:33 PM, Danny Auble wrote:


Paul, do you have Purge* set up in the slurmdbd.conf? Archiving 
takes place during the Purge process.  If no Purge values are set 
archiving will never take place since nothing is ever purged. When 
Purge values are set the related archivings take place on an 
hourly, daily, and monthly basis depending on the units your purge 
values are set to.


If PurgeJobs=2months the archive would take place at the beginning 
of each month.  If it were set to 2hours it would happen each hour. 
The purge will also happen on slurmdbd startup as well if running 
things for the first time.


Danny


On 03/06/2015 08:20 AM, Paul Edmon wrote:


So we recently turned this on to archive jobs older than 6 months. 
However when we restarted slurmdbd nothing happened, at least no 
file was deposited at the specified archive location. Is there a 
way to force it to purge? When is the archiving scheduled to be 
done?  We definitely have jobs older than 6 months in the 
database, I'm just curious about the schedule of when the 
archiving is done.


-Paul Edmon-


[slurm-dev] Re: SlurmDBD Archiving

2015-03-06 Thread Danny Auble


If you had older than 6 month data I would expect it to purge on restart 
of the slurmdbd.  You will see a message in the log when the archive 
file is created.


On 03/06/2015 09:58 AM, Paul Edmon wrote:


Okay, that's what I suspected.  We set it to 6 months.  So I guess 
then the purge will happen on April 1st.


-Paul Edmon-

On 03/06/2015 12:33 PM, Danny Auble wrote:


Paul, do you have Purge* set up in the slurmdbd.conf?  Archiving 
takes place during the Purge process.  If no Purge values are set 
archiving will never take place since nothing is ever purged. When 
Purge values are set the related archivings take place on an hourly, 
daily, and monthly basis depending on the units your purge values are 
set to.


If PurgeJobs=2months the archive would take place at the beginning of 
each month.  If it were set to 2hours it would happen each hour. The 
purge will also happen on slurmdbd startup as well if running things 
for the first time.


Danny


On 03/06/2015 08:20 AM, Paul Edmon wrote:


So we recently turned this on to archive jobs older than 6 months. 
However when we restarted slurmdbd nothing happened, at least no 
file was deposited at the specified archive location. Is there a way 
to force it to purge? When is the archiving scheduled to be done?  
We definitely have jobs older than 6 months in the database, I'm 
just curious about the schedule of when the archiving is done.


-Paul Edmon-


[slurm-dev] Re: SlurmDBD Archiving

2015-03-06 Thread Danny Auble


Paul, do you have Purge* set up in the slurmdbd.conf?  Archiving takes 
place during the Purge process.  If no Purge values are set archiving 
will never take place since nothing is ever purged.  When Purge values 
are set the related archivings take place on an hourly, daily, and 
monthly basis depending on the units your purge values are set to.


If PurgeJobs=2months the archive would take place at the beginning of 
each month.  If it were set to 2hours it would happen each hour. The 
purge will also happen on slurmdbd startup as well if running things for 
the first time.
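
For example, a minimal slurmdbd.conf sketch for a six-month retention policy
(parameter names per the slurmdbd.conf man page; adjust the directory and the
retention periods to your site):

    ArchiveDir=/var/lib/slurmdbd/archive
    ArchiveJobs=yes
    ArchiveSteps=yes
    PurgeJobAfter=6months
    PurgeStepAfter=6months

With month-based units the purge, and therefore the archive, runs monthly as
described above.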


Danny


On 03/06/2015 08:20 AM, Paul Edmon wrote:


So we recently turned this on to archive jobs older than 6 months. 
However when we restarted slurmdbd nothing happened, at least no file 
was deposited at the specified archive location.  Is there a way to 
force it to purge? When is the archiving scheduled to be done?  We 
definitely have jobs older than 6 months in the database, I'm just 
curious about the schedule of when the archiving is done.


-Paul Edmon-


[slurm-dev] Re: Per-partition QOS limits?

2015-02-19 Thread Danny Auble
There may be round about ways of doing this in 14.11, but this kind of 
functionality is already in 15.08.  There we added the idea of a 
"Partition QOS" to a partition thus giving all the limits of a QOS to 
the partition.  The jobs ran will still have their own regular QOS.  The 
Partition QOS will override the job's QOS.  If the opposite is desired 
you need to have the job's QOS have the 'PartitionQOS' flag which will 
reverse the order of precedence.
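
As a sketch of what that looks like in 15.08 (names and numbers here are
illustrative; see the qos.shtml page mentioned below for the authoritative
syntax):

    # create a QOS carrying the limits the partition should enforce
    sacctmgr add qos part_week MaxWall=7-00:00:00 GrpCPUs=160

    # slurm.conf: attach it to the partition
    PartitionName=week Nodes=node[01-10] QOS=part_week State=UP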


You can read more about this in the 15.08 docs if you download a pre2 
release and look at the qos.shtml in doc/html dir.


Danny

On 02/19/2015 12:01 PM, Chris Read wrote:

Per-partition QOS limits?
Greetings...

We currently rely heavily on QOS based time and resource limits, but 
up until recently have been running in a single partition. Now that we 
have multiple partitions we've seen the need for some of these limits 
(for example GrpCPUs) to be limited per partition.


I see no easy way to do that right now other than creating new 
partitions, which I don't want to do. Are there any plans to make this 
easy to do?


Anyone else interested in such a feature?

Thanks,

Chris




[slurm-dev] Re: Job on wrong node

2015-02-04 Thread Danny Auble



On 02/04/2015 11:23 AM, Ulf Markwardt wrote:



DebugFlags=NO_CONF_HASH

But we do have different slurm.conf files due to different energy
sensors, prolog/epilog scripts.
The NO_CONF_HASH is very dangerous in most systems.  It should be 
avoided at all cost.


It is interesting you have different sensors per node.  I could 
understand having NO_CONF_HASH set in this case.  We are thinking of 
adding a new kind of slurm.conf include that doesn't get added to the 
hash, in which you could put node-specific information like this, and 
then you could remove NO_CONF_HASH.


You might be able to get around the pro/epilog issue by having a master 
pro/epilog that in turn calls different ones depending on the node.  
Adding the new include file would also eliminate this issue.  This 
doesn't exist today, but is being thought about.
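
A hypothetical master prolog along those lines (the script name, path and layout
are all assumptions, not an existing Slurm feature):

    #!/bin/bash
    # /etc/slurm/prolog.sh - shared by every node via Prolog= in slurm.conf
    node_specific=/etc/slurm/prolog.d/$(hostname -s)
    # run the per-node script if one exists for this host, otherwise do nothing
    if [ -x "$node_specific" ]; then
        exec "$node_specific" "$@"
    fi
    exit 0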






I am guessing the slurm.conf file on your nodes may be in sync, but
perhaps the slurmd on the troubled nodes may be running with an old
version.

All show slurm 14.11.3
I meant an older version of the file, not Slurm :).  With NO_CONF_HASH 
set there isn't a real good way to verify the slurmd's are all running 
the same slurm.conf.


I would suggest issuing a "scontrol shutdown" then restarting all your 
nodes and your controller.  If you still see the problem after that then 
indeed something else is the matter.  Perhaps routing tables or 
something else.


U



[slurm-dev] Re: Job on wrong node

2015-02-04 Thread Danny Auble


Make sure you don't have

DebugFlags=NO_CONF_HASH

in your slurm.conf.

Then in your slurmctld.log verify you don't see any messages like

error: Node snowflake0 appears to have a different slurm.conf than the 
slurmctld.  This could cause issues with communication and 
functionality.  Please review both files and make sure they are the 
same.  If this is expected ignore, and set DebugFlags=NO_CONF_HASH in 
your slurm.conf.


I am guessing the slurm.conf file on your nodes may be in sync, but 
perhaps the slurmd on the troubled nodes may be running with an old version.
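
A quick way to check that (a sketch; assumes passwordless ssh and the usual
/etc/slurm/slurm.conf path):

    for n in $(sinfo -h -N -o '%N' | sort -u); do
        ssh "$n" md5sum /etc/slurm/slurm.conf
    done | sort | uniq -c

Every node should report the same checksum as the controller.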


Danny

On 02/04/2015 10:36 AM, Ulf Markwardt wrote:

Dear Moe and Danny,


I would also check that your configured addresses for the nodes in
slurm.conf are correct (e.g. NodeName and NodeAddr match in slurm.conf).

Quoting Danny Auble :

Ulf, I would verify the slurm.conf is the same in each node.


after initial confusion with diverging slurm.conf files we have a provisioning 
tool which simply ensures that our config files are in sync (apart 
from different energy sensors, GPU things, etc.). They are always 
updated at the start of the slurm daemon.


I have checked right now, there are no differences in hostnames, 
partitions, addresses, etc.


Best regards,
ulf



[slurm-dev] Re: Job on wrong node

2015-02-04 Thread Danny Auble
Ulf, I would verify the slurm.conf is the same in each node. 

On February 4, 2015 3:41:35 AM PST, Ulf Markwardt wrote:
>Dear all, 
>
>we see messages like this:
>
> grep "wrong node" /var/log/slurm/slurmctld.log
>
>[2015-02-04T04:27:05.591] error: Registered job 11923579.0 on wrong
>node taurusi3033
>[2015-02-04T04:27:05.591] error: Registered job 11900205.4294967294 on
>wrong node taurusi3033
>[2015-02-04T04:27:05.591] error: Registered job 11925038.0 on wrong
>node taurusi3033
>[2015-02-04T08:59:23.360] error: Registered job 11923729.0 on wrong
>node taurusi3019
>[2015-02-04T09:23:23.143] error: Registered job 11923729.0 on wrong
>node taurusi3107
>[2015-02-04T11:01:58.993] error: Batch completion for job 11923075 sent
>from wrong node (taurusi3178 rather than taurusi3084), ignored request
>[2015-02-04T11:28:31.198] error: Batch completion for job 11925657 sent
>from wrong node (taurusi3137 rather than taurusi1235), ignored request
>[2015-02-04T12:17:06.055] error: Registered job 11925657.0 on wrong
>node taurusi3137
>
>What can possibly have gone wrong here? I have no clue!
>(Slurm 14.11.03)
>
>Thank you
>Ulf
>
>-- 
>___
>Dr. Ulf Markwardt
>
>Technische Universität Dresden
>Center for Information Services and High Performance Computing (ZIH)
>01062 Dresden, Germany
>
>Phone: (+49) 351/463-33640  WWW:  http://www.tu-dresden.de/zih


[slurm-dev] Slurm versions 14.11.3 is now available

2015-01-08 Thread Danny Auble


Slurm version 14.11.3 is now available. Version 14.11.3 includes quite 
a few bug fixes, most of which are relatively minor. There were also a 
few more major issues fixed that previously could cause various daemons 
to seg fault in corner-case scenarios.


Anyone running 14.11 is encouraged to upgrade to 14.11.3.  Everyone else 
is encouraged to do the same :).


The new tarball can be downloaded from http://schedmd.com/#repos

* Changes in Slurm 14.11.3
==
 -- Prevent vestigial job record when canceling a pending job array record.
 -- Fixed squeue core dump.
 -- Fix job array hash table bug, could result in slurmctld infinite loop or
    invalid memory reference.
 -- In srun honor ntasks_per_node before looking at cpu count when the user
    doesn't request a number of tasks.
 -- Fix ghost job when submitting job after all jobids are exhausted.
 -- MySQL - Enhanced coordinator security checks.
 -- Fix for task/affinity if an admin configures a node for having threads
    but then sets CPUs to only represent the number of cores on the node.
 -- Make it so previous versions of salloc/srun work with newer versions
    of Slurm daemons.
 -- Avoid delay on commit for PMI rank 0 to improve performance with some
    MPI implementations.
 -- auth/munge - Correct logic to read old format AccountingStoragePass.
 -- Reset node "RESERVED" state as appropriate when deleting a maintenance
    reservation.
 -- Prevent a job manually suspended from being resumed by gang scheduler once
    free resources are available.
 -- Prevent invalid job array task ID value if a task is started using gang
    scheduling.
 -- Fixes for clean build on FreeBSD.
 -- Fix documentation bugs in slurm.conf.5. DenyAccount should be DenyAccounts.
 -- For backward compatibility with older versions of OMPI not compiled
    with --with-pmi restore the SLURM_STEP_RESV_PORTS in the job environment.
 -- Update the html documentation describing the integration with openmpi.
 -- Fix sacct when searching by nodelist.
 -- Fix cosmetic info statements when dealing with a job array task instead of
    a normal job.
 -- Fix segfault with job arrays.
 -- Correct the sbatch pbs parser to process -j.
 -- BGQ - Put print statement under a DebugFlag.  This was just an oversight.
 -- BLUEGENE - Remove check that would erroneously remove the CONFIGURING
    flag from a job while the job is waiting for a block to boot.
 -- Fix segfault in slurmstepd when job exceeded memory limit.
 -- Fix race condition that could start a job that is dependent upon a job array
    before all tasks of that job array complete.
 -- PMI2 race condition fix.


[slurm-dev] Slurm version 14.11.2 and 15.08.0-pre1 are now available

2014-12-12 Thread Danny Auble


Slurm versions 14.11.2 and 15.08.0-pre1 are now available. Version 
14.11.2 includes quite a few relatively minor bug fixes.


Version 15.08.0 is under active development and its release is planned 
in August 2015.  While this is the first pre-release there is already 
quite a bit of new functionality.


Both versions can be downloaded from http://schedmd.com/#repos

Highlights of the two versions are as follows:

* Changes in Slurm 14.11.2
==
 -- Fix Centos5 compile errors.
 -- Fix issue with association hash not getting the correct index which
    could result in seg fault.
 -- Fix salloc/sbatch -B segfault.
 -- Avoid huge malloc if GRES configured with "Type" and huge "Count".
 -- Fix jobs from starting in overlapping reservations that won't finish before
    a "maint" reservation begins.
 -- When node gets drained while in state mixed display its status as draining
    in sinfo output.
 -- Allow priority/multifactor to work with sched/wiki(2) if all priorities
    have no weight.  This allows for association and QOS decay limits to work.
 -- Fix "squeue --start" to override SQUEUE_FORMAT env variable.
 -- Fix scancel to be able to cancel multiple jobs that are space delimited.
 -- Log Cray MPI job calling exit() without mpi_fini(), but do not treat it as
    a fatal error. This partially reverts logic added in version 14.03.9.
 -- sview - Fix displaying of suspended steps elapsed times.
 -- Increase number of messages that get cached before throwing them away
    when the DBD is down.
 -- Fix jobs from starting in overlapping reservations that won't finish before
    a "maint" reservation begins.
 -- Restore GRES functionality with select/linear plugin. It was broken in
    version 14.03.10.
 -- Fix bug with GRES having multiple types that can cause slurmctld abort.
 -- Fix squeue issue with not recognizing "localhost" in --nodelist option.
 -- Make sure the bitstrings for a partitions Allow/DenyQOS are up to date
    when running from cache.
 -- Add smap support for job arrays and larger job ID values.
 -- Fix possible race condition when attempting to use QOS on a system running
    accounting_storage/filetxt.
 -- Fix issue with accounting_storage/filetxt and job arrays not being printed
    correctly.
 -- In proctrack/linuxproc and proctrack/pgid, check the result of strtol()
    for error condition rather than errno, which might have a vestigial error
    code.
 -- Improve information recording for jobs deferred due to advanced
    reservation.
 -- Exports eio_new_initial_obj to the plugins and initialize kvs_seq on
    mpi/pmi2 setup to support launching.

* Changes in Slurm 15.08.0pre1
==
 -- Add sbcast support for file transfer to resources allocated to a job step
    rather than a job allocation.
 -- Change structures with association in them to assoc to save space.
 -- Add support for job dependencies jointed with OR operator (e.g.
    "--depend=afterok:123?afternotok:124").
 -- Add "--bb" (burst buffer specification) option to salloc, sbatch, and srun.
 -- Added configuration parameters BurstBufferParameters and BurstBufferType.
 -- Added burst_buffer plugin infrastructure (needs many more functions).
 -- Make it so when the fanout logic comes across a node that is down we abandon
    the tree to avoid worst case scenarios when the entire branch is down and
    we have to try each serially.
 -- Add better error reporting of invalid partitions at submission time.
 -- Move will-run test for multiple clusters from the sbatch code into the API
    so that it can be used with DRMAA.
 -- If a non-exclusive allocation requests --hint=nomultithread on a
    CR_CORE/SOCKET system lay out tasks correctly.
 -- Avoid including unused CPUs in a job's allocation when cores or sockets are
    allocated.
 -- Added new job state of STOPPED indicating processes have been stopped with a
    SIGSTOP (using scancel or sview), but retain its allocated CPUs. Job state
    returns to RUNNING when SIGCONT is sent (also using scancel or sview).
 -- Added EioTimeout parameter to slurm.conf. It is the number of seconds srun
    waits for slurmstepd to close the TCP/IP connection used to relay data
    between the user application and srun when the user application terminates.
 -- Remove slurmctld/dynalloc plugin as the work was never completed, so it is
    not worth the effort of continued support at this time.
 -- Remove DynAllocPort configuration parameter.
 -- Add advance reservation flag of "replace" that causes allocated resources
    to be replaced with idle resources. This maintains a pool of available
    resources that maintains a constant size (to the extent possible).
 -- Added SchedulerParameters option of "bf_busy_nodes". When selecting
    resources for pending jobs to reserve for future execution (i.e. the job
    can not be started immediately), then preferentially select nodes that are
    in use. This will tend to

[slurm-dev] Re: Slurm 14.11.0 is now available

2014-11-14 Thread Danny Auble
Taras, I don't see what you see, and I wouldn't expect what you are 
seeing.  The auxdir/x_ac_hwloc.m4 doesn't appear to work that way when 
checking things, or when setting things up.


Danny

On 11/14/2014 01:21 AM, Taras Shapovalov wrote:

Re: [slurm-dev] Slurm 14.11.0 is now available
Hi Danny,

FYI: With the new configure script, --with-hwloc should now point to the 
'include' directory (in previous versions it was the hwloc top 
installation directory). I am not sure this was changed intentionally.


Thanks,

Taras

On Fri, Nov 14, 2014 at 3:19 AM, Danny Auble <d...@schedmd.com> wrote:



Slurm version 14.11.0 is now available. This is a major Slurm
release with many new features. See the RELEASE_NOTES and NEWS
files in the distribution for detailed descriptions of the
changes, a few of which are noted below.

Upgrading from Slurm versions 2.6 or 14.03 should proceed without
loss of jobs or other state.  Just be sure to upgrade the slurmdbd
first. (Upgrades from pre-releases of version 14.11 may result in job
loss.)

Slurm downloads are available from http://www.schedmd.com/#repos.

Thanks to all those who helped make this release!

Highlights of changes in Slurm version 14.11.0 include:
 -- Added job array data structure and removed 64k array size
restriction.
 -- Added support for reserving CPUs and/or memory on a compute
node for system
use.
 -- Added support for allocation of generic resources by model
type for
heterogeneous systems (e.g. request a Kepler GPU, a Tesla GPU,
or a GPU of
any type).
 -- Added support for non-consumable generic resources that are
limited, but
can be shared between jobs.
 -- Added support for automatic job requeue policy based on exit
value.
 -- Refactor job_submit/lua interface. LUA FUNCTIONS NEED TO
CHANGE! The
lua script no longer needs to explicitly load meta-tables, but
information
is available directly using names slurm.reservations,
slurm.jobs,
slurm.log_info, etc. Also, the job_submit.lua script is
reloaded when
updated without restarting the slurmctld daemon.
 -- Eliminate native Cray specific port management. Native Cray
systems must
now use the MpiParams configuration parameter to specify ports
to be used
for communications. When upgrading Native Cray systems from
version 14.03,
all running jobs should be killed and the switch_cray_state
file (in
SaveStateLocation of the nodes where the slurmctld daemon
runs) must be
explicitly deleted.






[slurm-dev] Re: SLURM Table/Column definitions for Ruby on Rails

2014-11-14 Thread Danny Auble



On 11/14/2014 08:51 AM, Charles Johnson wrote:


I have been charged with writing a ruby-on-rails (RoR) web app for
reporting usage statistics that we supply to end users. I have written
one that we currently use, but the data comes from a different
scheduler/resource manager system. We do not currently use slurm, but
it is under very active testing, and so far we have not encountered
any deal-breakers that would prevent us from adopting slurm, hence my
assignment.

Outside of the slurm source, and simply examining the tables and their
columns in our MariaDB database, is there any documentation about the
nature of the data stored, the table structures, the
interrelationships (i.e., data integrity rules) between tables and the
relationships between the two databases slurm_acct_db and
slurm_jobcomp_db?
The slurm_jobcomp_db is mostly a subset of what is stored in the 
$CLUSTER_job_table table in the slurm_acct_db.  You probably don't need 
to run the job_comp plugin when using accounting through the slurmdbd, as 
it won't buy you much except duplicate data.

Rather than hitting the two databases directly, I will probably do a
nightly extract into a third database that de-normalizes the data
somewhat (i.e., a data-warehouse), but I want to make sure that the
data slurm captures is accurately reported.

Any pointers would be gratefully appreciated.
I would suggest not talking directly to the database as it changes often 
between releases.  If you are looking into dumping data from one DB to 
another I would suggest using sacct as that should always work the same 
even when the tables change in the database.
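
For instance, a nightly extract could be driven by something along these lines
(a sketch; pick whatever fields you need from the sacct man page):

    sacct -a -X -P -S 2014-11-01 -E 2014-11-02 \
          -o JobID,User,Account,Partition,Elapsed,CPUTimeRAW,State

The parsable (-P) output is straightforward to load into a warehouse table.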


Charles
-- 
Charles Johnson, Vanderbilt University

Advanced Computing Center for Research and Education
1231 18th Avenue South
Hill Center, Suite 146
Nashville, TN 37212
Office: (615) 343-4134
Cell  : (615) 478-7788


[slurm-dev] Slurm 14.11.0 is now available

2014-11-13 Thread Danny Auble


Slurm version 14.11.0 is now available. This is a major Slurm release 
with many new features. See the RELEASE_NOTES and NEWS files in the 
distribution for detailed descriptions of the changes, a few of which 
are noted below.


Upgrading from Slurm versions 2.6 or 14.03 should proceed without loss 
of jobs or other state.  Just be sure to upgrade the slurmdbd first. 
(Upgrades from pre-releases of version 14.11 may result in job loss.)


Slurm downloads are available from http://www.schedmd.com/#repos.

Thanks to all those who helped make this release!

Highlights of changes in Slurm version 14.11.0 include:
 -- Added job array data structure and removed 64k array size restriction.
 -- Added support for reserving CPUs and/or memory on a compute node for system
    use.
 -- Added support for allocation of generic resources by model type for
    heterogeneous systems (e.g. request a Kepler GPU, a Tesla GPU, or a GPU of
    any type).
 -- Added support for non-consumable generic resources that are limited, but
    can be shared between jobs.
 -- Added support for automatic job requeue policy based on exit value.
 -- Refactor job_submit/lua interface. LUA FUNCTIONS NEED TO CHANGE! The
    lua script no longer needs to explicitly load meta-tables, but information
    is available directly using names slurm.reservations, slurm.jobs,
    slurm.log_info, etc. Also, the job_submit.lua script is reloaded when
    updated without restarting the slurmctld daemon.
 -- Eliminate native Cray specific port management. Native Cray systems must
    now use the MpiParams configuration parameter to specify ports to be used
    for communications. When upgrading Native Cray systems from version 14.03,
    all running jobs should be killed and the switch_cray_state file (in
    SaveStateLocation of the nodes where the slurmctld daemon runs) must be
    explicitly deleted.


[slurm-dev] Slurm versions 14.03.10 and 14.11.0-rc3 are now available

2014-11-03 Thread Danny Auble


Slurm version 14.03.10 includes quite a few relatively minor bug fixes, 
and will most likely be the last 14.03 release.  Thanks to all those who 
helped make this a very stable release.


We hope to officially tag 14.11.0 before SC14.  Version 14.11.0-rc3 
includes a few bug fixes discovered in recent testing but is looking 
very stable. Thanks to everyone participating in the testing!  If you 
can, please test this release so we can attempt to fix as many issues as 
we can before we tag 14.11.0.


Just a heads up, version 15.08 has already started development; we will 
most likely tag a pre1 of it later this month.


Slurm downloads are available from http://www.schedmd.com/#repos.

Here are some snips from the NEWS file on what has changed since the 
last releases.


* Changes in Slurm 14.03.10
===
 -- Fix a few sacctmgr error messages.
 -- Treat non-zero SlurmSchedLogLevel without SlurmSchedLogFile as a fatal
    error.
 -- Correct sched_config.html documentation SchedulingParameters
    should be SchedulerParameters.
 -- When using gres and cgroup ConstrainDevices set correct access
    permission for the batch step.
 -- Fix minor memory leak in jobcomp/mysql on slurmctld reconfig.
 -- Fix bug that prevented preservation of a job's GRES bitmap on slurmctld
    restart or reconfigure (bug was introduced in 14.03.5 "Clear record of a
    job's gres when requeued" and only applies when GRES mapped to specific
    files).
 -- BGQ: Fix race condition when job fails due to hardware failure and is
    requeued. Previous code could result in slurmctld abort with NULL pointer.
 -- Prevent negative job array index, which could cause slurmctld to crash.
 -- Fix issue with squeue/scontrol showing correct node_cnt when only tasks
    are specified.
 -- Check the status of the database connection before using it.
 -- ALPS - If an allocation requests -n set the BASIL -N option to the
    amount of tasks / number of node.
 -- ALPS - Don't set the env var APRUN_DEFAULT_MEMORY, it is not needed anymore.
 -- Fix potential buffer overflow.
 -- Give better estimates on pending node count if no node count is requested.
 -- BLUEGENE - Fix issue where requeuing jobs could cause an assert.

* Changes in Slurm 14.11.0rc3
=
 -- Allow envs to override autotools binaries in autogen.sh
 -- Added system services files.
 -- If the jobs pends with DependencyNeverSatisfied keep it pending even after
    the job which it was depending upon was cleaned.
 -- Let operators (in addition to user root and SlurmUser) see job script for
    other user's jobs.
 -- Perl API modified to return node state of MIXED rather than ALLOCATED if
    only some CPUs allocated.
 -- Double Munge connect retry timeout from 1 to 2 seconds.
 -- sview - Remove unneeded code that was resolved globally in commit
    98e24b0dedc.
 -- Collect and report the accounting of the batch step and its children.
 -- Add configure checks for faccessat and eaccess, and make use of one of
    them if available.
 -- Make configure --enable-developer also set --enable-debug
 -- Introduce a SchedulerParameters variable kill_invalid_depend, if set
    then jobs pending with invalid dependency are going to be terminated.
 -- Move spank_user_task() call in slurmstepd after the task_g_pre_launch()
    so that the task affinity information is available to spank.
 -- Make /etc/init.d/slurm script return value 3 when the daemon is
    not running. This is required by Linux Standard Base Core
    Specification 3.1


[slurm-dev] Re: Non static partition definition

2014-10-30 Thread Danny Auble


Keep in mind that if you use GrpNodes, aren't doing whole-node allocations, 
and multiple jobs using the QOS land on the same node, that node gets 
counted multiple times in the limit.  I would suggest using GrpCPUs 
instead if not always using whole-node allocations.
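
For example (a sketch; the names and numbers are illustrative):

    sacctmgr add qos weeklong MaxWall=7-00:00:00 GrpCPUs=160 Flags=DenyOnLimit
    sacctmgr modify user where name=someuser set qos+=weeklong

so the long-running QOS is capped by CPUs rather than by a node count that can
be double-counted.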


Danny

On 10/30/2014 12:38 PM, Brown George Andrew wrote:

Hi,

To go into more specifics I'm wanting to be able to limit the number of nodes 
or cores providing the ability to run jobs with a wall time up to one week, all 
other nodes defaulting to 1 day. So I'd set say Q1 to have a 
MaxWallDurationPerJob of 24 hours and set it as the default then add another 
QOS with MaxWallDurationPerJob as a week and GrpNodes to N. Where this was 
previously two partitions I would now have a single partition with a max wall 
time of 1 week and two QOSes. In this case I'd want users to be able to do 
exactly what you highlight as a bug.

For completeness I would also set the DenyOnLimit flag.

In your case perhaps the MaxNodes setting in sacctmgr may help? In 14.03 
features were added which now allow you to more finely control which accounts 
get used with partitions as well as QOSes, this may be of interest.

Kind regards,
George

From: Tingyang Xu [tix11...@engr.uconn.edu]
Sent: 30 October 2014 19:42
To: slurm-dev
Subject: [slurm-dev] Re: Non static partition definition

Hello George,
We do have the same issue now. I think the QOS solution has a bug. For
example, assume that Partition B allows two QOSes, Q1 and Q2, and you set up
GrpNodes=10 on both Q1 and Q2. Then the users can actually use 20 nodes if
they submit jobs to Q1 and Q2, respectively.

Best,
Tingyang Xu

-Original Message-
From: Brown George Andrew
Sent: Thursday, October 30, 2014 2:36 PM
To: slurm-dev
Subject: [slurm-dev] Re: Non static partition definition


Thanks for the quick replies!

Indeed a QOS seems like what I want here. Sorry I was stuck thinking in
partitions and clearly was having some tunnel vision.

Cheers,
George

From: je...@schedmd.com [je...@schedmd.com]
Sent: 30 October 2014 19:08
To: slurm-dev
Subject: [slurm-dev] Re: Non static partition definition

In addition to a QOS, an advanced reservation may also satisfy your needs:
http://slurm.schedmd.com/reservations.html

Quoting Ryan Cox :


George,

Wouldn't a QOS with GrpNodes=10 accomplish that?

Ryan

On 10/30/2014 11:47 AM, Brown George Andrew wrote:

Hi,

I would like to have a partition of N nodes without statically
defining which nodes should belong to a partition and I'm trying to
work out the best way to achieve this.

Currently I have partitions which span across all the nodes in my
cluster with differing settings, but I would like some of these to
only occupy a subset of the cluster. I could say define partition A
which can use all nodes but partition B may only access nodes
01-10. But I would like avoid partition B being reduced in size in
the event of maintenance or hardware failure.

I'm thinking the way to do this would be via a plugin. I would keep
all partitions spanning all nodes in the cluster but upon
submission check how many nodes are in use on the requested
partition. If there were say already 10 nodes in use in partition B
the job should be queued. However things then get a bit more
complex as to when slurm should de-queue and then run the job.

Is there a native method to do this in slurm? Essentially I would
like something like the MaxNodes option that exists for partitions
today but have it limit the total number of nodes used by jobs
submitted to that partition rather than just a limit per job.

Many thanks,
George


--
Morris "Moe" Jette
CTO, SchedMD LLC


[slurm-dev] Re: SLURM installing into $prefix/lib and $prefix/lib64?

2014-10-15 Thread Danny Auble


What rpmbuild does is set the --libdir, so things should work correctly 
using that method.  I can verify that running configure without --libdir 
results in just $prefix/lib no matter the arch.  I can see how that is 
confusing.  If someone submits a patch with a reasonable solution we 
will evaluate it and get it in the next version.  Don't worry about the 
exotic systems like BG or Cray, we will.


As you are probably aware, both rpmbuild and standalone configure 
use perl to get the PERLARCHLIB and only replace the prefix of that; you 
can find what we do in the slurm.spec if you are interested.  I don't 
think changing that should happen.  For what it is worth, PERLARCHLIB is 
lib on a 64bit ubuntu install. Redhat/CentOS appears to have lib64.
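
In other words, if you configure by hand on a lib64 distribution, something
like this (a sketch; prefix is illustrative) mirrors what rpmbuild does for you:

    ./configure --prefix=/opt/slurm --libdir=/opt/slurm/lib64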


On 10/14/2014 06:52 PM, Jeff Squyres (jsquyres) wrote:

Moe / Danny / SchedMD in general --

Comments?


On Oct 14, 2014, at 5:41 PM, Michael Jennings  wrote:


Yes.  Or perhaps configure could just error out if it detects a
mismatch (and therefore let a human figure out, perhaps by supplying
--libdir to override lib tool's default install location, etc., or
passing PERLARCHLIB=...).

I agree with you.  As for what the right answer is, hopefully Moe or
Danny will chime in at this point.  Or someone from SchedMD.  It's an
administrative choice as to which is the right way to go.

I think most folks would expect PERLARCHLIB to correlate with $libdir
(and $prefix/lib*) when using a custom prefix; I think it's quite
reasonable to take specific actions when --prefix is given (outside of
/usr and /usr/local) that aren't taken otherwise.  But as I said
before, I don't have the BG/Cray experience to say what those systems'
expectations or caveats are.  :-)

Michael

--
Michael Jennings 
Senior HPC Systems Engineer
High-Performance Computing Services
Lawrence Berkeley National Laboratory
Bldg 50B-3209EW: 510-495-2687
MS 050B-3209  F: 510-486-8615




[slurm-dev] Re: sbatch option to constrain one task per core

2014-10-15 Thread Danny Auble
What happens if you use srun instead of mpirun?
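
Something along these lines run inside the allocation, as a sketch (option
names per the srun man page of this release):

    srun -n 8 --ntasks-per-core=1 --cpu_bind=cores nwchem_64to32 $JOB.nwc

That lets Slurm do the binding rather than relying on mpirun's placement.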

On October 15, 2014 5:31:42 AM PDT, Edrisse Chermak wrote:
>
>My mistake, I forgot some important lscpu NUMA output :
>
>NUMA node0 CPU(s): 0,4,8,12,16,20,24,28
>NUMA node1 CPU(s): 32,36,40,44,48,52,56,60
>NUMA node2 CPU(s): 1,5,9,13,17,21,25,29
>NUMA node3 CPU(s): 33,37,41,45,49,53,57,61
>NUMA node4 CPU(s): 2,6,10,14,18,22,26,30
>NUMA node5 CPU(s): 34,38,42,46,50,54,58,62
>NUMA node6 CPU(s): 35,39,43,47,51,55,59,63
>NUMA node7 CPU(s): 3,7,11,15,19,23,27,31
>
>Thanks in advance,
>Edrisse
>
>On 10/15/2014 03:24 PM, Edrisse Chermak wrote:
>> Dear Slurm Developers and Users,
>>
>> I would like to constrain an 8 cpu job to run in one socket of 16
>cpu,
>> with one task per core.
>> Unfortunately, when using the script :
>> ---
>> sbatch -J $JOB -N 1 -B '1:8:1' --ntasks-per-socket=8
>> --ntasks-per-core=1 << eof
>> ...
>> mpirun -np 8 nwchem_64to32 $JOB.nwc >& $JOB.out
>> ...
>> eof
>> ---
>> top command on compute node shows 2 tasks running on the same core :
>> ---
>> $ top
>> 11838 11846 51 edrisse   20   0 12.3g 9452  95m R 46.7  0.0 0:01.43
>> nwchem_64to32
>> 11838 11845 59 edrisse   20   0 12.3g 9600  96m R 46.4  0.0 0:01.42
>> nwchem_64to32
>> 11838 11844 47 edrisse   20   0 12.3g 9592  95m R 46.4  0.0 0:01.42
>> nwchem_64to32
>> 11838 11843 43 edrisse   20   0 12.3g 9844  96m R 46.4  0.0 0:01.42
>> nwchem_64to32
>> 11838 11842  3 edrisse   20   0 12.3g 9.8m  96m R 46.4  0.0 0:01.43
>> nwchem_64to32
>> 11838 11841 35 edrisse   20   0 12.3g 9.8m  92m R 45.7  0.0 0:01.41
>> nwchem_64to32
>> 11838 11840 39 edrisse   20   0 12.3g  10m  96m R 46.1  0.0 0:01.42
>> nwchem_64to32
>> 11838 11839 55 edrisse   20   0 12.3g  10m 109m R 46.4  0.0 0:01.42
>> nwchem_64to32
>> ---
>> Unfortunately, cpu 55 and cpu 51 own to the same core in our node's
>> architecture: (see NUMA node7)
>> ---
>> $ lscpu
>> CPU(s):64
>> On-line CPU(s) list:   0-63
>> Thread(s) per core:2
>> Core(s) per socket:8
>> Socket(s): 4
>> NUMA node(s):  8
>> ...
>> NUMA node0 CPU(s): 0,4,8,12,16,20,24,28
>> ...
>> NUMA node7 CPU(s): 3,7,11,15,19,23,27,31
>> ---
>> I perhaps missed something, if you could guide me to the right option
>> it would be great.
>> I also attached my slurm.conf file.
>>
>> Best Regards,
>> Edrisse
>
>
>


[slurm-dev] Re: Clusters API

2014-10-09 Thread Danny Auble


Nate, check out commit e49fcbeb48647460fda59895c22afc4b080efe89.

It should give you what you want.  Since working_cluster_rec was already 
in slurmdb.h I ended up sticking with that instead of changing the 
name.  This is also only in 14.11, but the patch should apply cleanly to 
14.03.


Danny

On 10/09/2014 12:13 PM, Nate Coraor wrote:

On Thu, Oct 9, 2014 at 2:04 PM, Danny Auble  wrote:

Nate,

Does

diff --git a/src/common/slurmdb_defs.c b/src/common/slurmdb_defs.c
index d094b91..01673fc 100644
--- a/src/common/slurmdb_defs.c
+++ b/src/common/slurmdb_defs.c
@@ -49,6 +49,8 @@
  #include "src/common/slurm_auth.h"
  #include "src/slurmdbd/read_config.h"

+strong_alias(working_cluster_rec, slurm_working_cluster_rec);
+
  slurmdb_cluster_rec_t *working_cluster_rec = NULL;

  static void _free_res_cond_members(slurmdb_res_cond_t *res_cond);

Fix your situation?  You would probably have to change your references to
slurm_working_cluster_rec, but that is probably safer.

Danny

Hi Danny,

I had thought about proposing that change, it works just fine. Should
it be slurmdb_working_cluster_rec? Could you add it to slurmdb.h as
well? I have a few other missing symbols I'm using as well, if you're
going to add anything to the public includes:

 
https://github.com/natefoo/slurm-drmaa/blob/master/slurm_drmaa/slurm_missing.h

Thanks,
--nate


[slurm-dev] Re: Clusters API

2014-10-09 Thread Danny Auble


Nate,

Does

diff --git a/src/common/slurmdb_defs.c b/src/common/slurmdb_defs.c
index d094b91..01673fc 100644
--- a/src/common/slurmdb_defs.c
+++ b/src/common/slurmdb_defs.c
@@ -49,6 +49,8 @@
 #include "src/common/slurm_auth.h"
 #include "src/slurmdbd/read_config.h"

+strong_alias(working_cluster_rec, slurm_working_cluster_rec);
+
 slurmdb_cluster_rec_t *working_cluster_rec = NULL;

 static void _free_res_cond_members(slurmdb_res_cond_t *res_cond);

Fix your situation?  You would probably have to change your references 
to slurm_working_cluster_rec, but that is probably safer.


Danny

On 10/09/2014 10:22 AM, Nate Coraor wrote:

Here's the fairly complete form of it, in case anyone else finds this
useful. As the documentation mentions, all I have to do is compile a
standalone libslurmdb.so for the application, with working_cluster_rec
public and everything works great:

 https://github.com/natefoo/slurm-drmaa

Thanks,
--nate

On Wed, Oct 8, 2014 at 11:47 AM, Nate Coraor  wrote:

Hi all,

For the last few days I've been working on adding support for the
-M/--clusters option to slurm-drmaa. I have it working but it's taken
a few hacks:

1. I cannot see any public way to actually use the multi-cluster
functionality. It's possible to query slurmdbd for all the cluster
info you can get with sacctmgr via the public API, but I had to expose
working_cluster_rec in libslurmdb to be able to use those cluster
records for submit and status requests. Of course this will not work
with any standard installations (and may not be entirely safe).

2. I'm using the non-published slurmdb_get_info_clusters() function to
get cluster records. It's possible to get them with the public
slurmdb_clusters_get() function, but for some reason the
plugin_id_select returned by the latter (and indeed, as found in
slurmdbd's database) is incorrect for both clusters (both are 101 in
slurmdbd, should be 2).

Could anyone provide guidance on how to fix these? Am I going at this all wrong?

Thanks,
--nate


[slurm-dev] Re: Checking on array jobs within slurm accounting DB and via sacct

2014-09-26 Thread Danny Auble


Depending on what commit you upgrade to, yes; anything in 14.03 is in 
14.11.  Right now I wouldn't suggest running 14.11 in production 
since it is still under development.  If this feature is something you 
really need, I would suggest getting a 14.03.8 tag, cherry-picking the 
14.11 commit, massaging it, and running it that way.
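
Roughly, a sketch (the tag name is illustrative; the commit hash is the one
referenced below):

    git checkout slurm-14-03-8-1
    git cherry-pick d23590dbc94e40a0963fc8d1cee0e6145f782f5c
    # resolve the pack/unpack conflicts mentioned, then rebuild as usual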


On 09/26/2014 11:08 AM, John Desantis wrote:

Danny,

Thank you for your response.  We'll schedule an upgrade to address the issue.

Could you tell me if commit 6aadcf15355dfe (introduced in 14.03.4)
will still be present?

John DeSantis

2014-09-26 13:45 GMT-04:00 Danny Auble :

John, this was fixed in 14.11 (commit
d23590dbc94e40a0963fc8d1cee0e6145f782f5c).  Since structures had to change
it wasn't possible to fix previous versions.  The patch might go in cleanly
to 14.03, but will probably need some massaging with the packs and unpacks.
Using this patch will also break backwards compatibility which you may or
may not care about.

Danny

On 09/26/2014 10:19 AM, John Desantis wrote:

Hello all,

First and foremost since this is my first post to the list, I'd like
to thank the Slurm developers for a great and gratis product!

Anyways, to the point.

We have users submitting array jobs via sbatch and using
"-a/--array=n-n" without an issue.  When these jobs are running, we
can use 'squeue' to see tasks under the form of "jobnumber_task".
When we try to query these jobs via the accounting database (checking
on job_table, step_table, and jobcomp_table) and via sacct -j
"jobnumber", we're not getting the complete set of information
associated with the job(batch and exec hosts, etc.).  If the job is
currently running, we can use scontrol to see the job and its steps,
and the full set of information we're looking for.

When I used scontrol to view an array job, I saw that "JobId" for each
of the array tasks incremented based upon the step, e.g.:

JobId=23383 ArrayJobId=23383 ArrayTaskId=1
JobId=23384 ArrayJobId=23383 ArrayTaskId=2
JobId=23385 ArrayJobId=23383 ArrayTaskId=3

When I tried to query any of the successive JobId's via sacct or the
DB itself, I didn't get any information.  Only the real JobId "23383"
returned a result within sacct and the DB.  I was able to glean node
information from the scheduler and control daemon logs by looking for
the JobId's listed above.

I did find a previous post
https://www.mail-archive.com/slurm-dev@schedmd.com/msg03344.html which
seems to be my question as well.

Thanks for any insight which can be provided,

John DeSantis


[slurm-dev] Re: Checking on array jobs within slurm accounting DB and via sacct

2014-09-26 Thread Danny Auble


John, this was fixed in 14.11 (commit 
d23590dbc94e40a0963fc8d1cee0e6145f782f5c).  Since structures had to 
change it wasn't possible to fix previous versions.  The patch might go 
in cleanly to 14.03, but will probably need some massaging with the 
packs and unpacks.  Using this patch will also break backwards 
compatibility which you may or may not care about.


Danny

On 09/26/2014 10:19 AM, John Desantis wrote:

Hello all,

First and foremost since this is my first post to the list, I'd like
to thank the Slurm developers for a great and gratis product!

Anyways, to the point.

We have users submitting array jobs via sbatch and using
"-a/--array=n-n" without an issue.  When these jobs are running, we
can use 'squeue' to see tasks under the form of "jobnumber_task".
When we try to query these jobs via the accounting database (checking
on job_table, step_table, and jobcomp_table) and via sacct -j
"jobnumber", we're not getting the complete set of information
associated with the job(batch and exec hosts, etc.).  If the job is
currently running, we can use scontrol to see the job and its steps,
and the full set of information we're looking for.

When I used scontrol to view an array job, I saw that "JobId" for each
of the array tasks incremented based upon the step, e.g.:

JobId=23383 ArrayJobId=23383 ArrayTaskId=1
JobId=23384 ArrayJobId=23383 ArrayTaskId=2
JobId=23385 ArrayJobId=23383 ArrayTaskId=3

When I tried to query any of the successive JobId's via sacct or the
DB itself, I didn't get any information.  Only the real JobId "23383"
returned a result within sacct and the DB.  I was able to glean node
information from the scheduler and control daemon logs by looking for
the JobId's listed above.

I did find a previous post
https://www.mail-archive.com/slurm-dev@schedmd.com/msg03344.html which
seems to be my question as well.

Thanks for any insight which can be provided,

John DeSantis


[slurm-dev] Re: "dummy" slurm for PC

2014-09-23 Thread Danny Auble

Check out

http://slurm.schedmd.com/faq.html#multi_slurmd

which doesn't require a cluster or extra nodes to emulate a cluster with 
most of the bells and whistles.


Keep in mind Slurm will not run on Windows, but it should work on your Mac.
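
A rough sketch of that setup (the configure flag and node definitions follow
the FAQ entry above; hostnames, ports and counts are arbitrary):

    ./configure --enable-multiple-slurmd ...

    # slurm.conf: several "nodes" all backed by localhost
    NodeName=tux[1-4] NodeHostname=localhost Port=[17001-17004] CPUs=2
    PartitionName=debug Nodes=tux[1-4] Default=YES State=UP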

On 09/23/2014 12:34 PM, Sakhile Masoka wrote:

"dummy" slurm for PC
Hi

I want to install SLURM on my Macbook pro and Windows machine for 
testing purposes. The Center is considering moving to SLURM and I want 
to try it out before I install it on my test cluster. Is there a 
"dummy" version I can download and test policies?


Regards
Sakhile, CHPC





[slurm-dev] Re: Implementing fair-share policy using BLCR

2014-09-23 Thread Danny Auble


Or just use the all_partitions job_submit plugin.
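
In slurm.conf that is a single line (a sketch):

    JobSubmitPlugins=all_partitions

which makes jobs that don't specify a partition eligible for all of them.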

On 09/23/2014 09:52 AM, Kilian Cavalotti wrote:

Hi,

On Tue, Sep 23, 2014 at 7:18 AM, Yann Sagon  wrote:

To lessen the problem of having to deal with two queues, you can specify both 
queues when you submit a job: --partition=queue1,queue2, and the 
first one that is free is selected.

You can even define an env variable in users' environment so they
don't have to type anything. "export SLURM_PARTITION=queue1,queue2"
would do the same. Note that for sbatch, it's SBATCH_PARTITION, and
SALLOC_PARTITION for salloc.

Cheers,


[slurm-dev] Slurm versions 14.03.8 and 14.11.0-pre5 are now available

2014-09-17 Thread Danny Auble


Slurm versions 14.03.8 and 14.11.0-pre5 are now available. Version 
14.03.8 includes quite a few relatively minor bug fixes.


Version 14.11.0 is under active development and its release is planned 
for November 2014.  Many of its features and performance enhancements 
will be discussed next week at SLUG 2014 in Lugano, Switzerland.


Note to all developers, code freeze for new features in 14.11 will be at 
the end of this month (September).


Slurm downloads are available from http://www.schedmd.com/#repos.

Highlights of the two versions are as follows:

* Changes in Slurm 14.03.8
==
 -- Fix minor memory leak when Job doesn't have nodes on it (Meaning the job
    has finished)
 -- Fix sinfo/sview to be able to query against nodes in reserved and other
    states.
 -- Make sbatch/salloc read in (SLURM|(SBATCH|SALLOC))_HINT in order to
    handle sruns in the script that will use it.
 -- srun properly interprets a leading "." in the executable name based upon
    the working directory of the compute node rather than the submit host.
 -- Fix Lustre misspellings in hdf5 guide
 -- Fix wrong reference in slurm.conf man page to what --profile option should
    be used for AcctGatherFilesystemType.
 -- Update HDF5 document to point out the SlurmdUser is who creates the
    ProfileHDF5Dir directory as well as all its sub-directories and files.
 -- CRAY NATIVE - Remove error message for srun's ran inside an salloc that
    had --network= specified.
 -- Defer job step initiation if required GRES are in use by other steps rather
    than immediately returning an error.
 -- Deprecate --cpu_bind from sbatch and salloc.  These never worked correctly
    and only caused confusion since the cpu_bind options mostly refer to a
    step we opted to only allow srun to set them in future versions.
 -- Modify sgather to work if Nodename and NodeHostname differ.
 -- Changed use of JobContainerPlugin where it should be JobContainerType.
 -- Fix for possible error if job has GRES, but the step explicitly requests a
    GRES count of zero.
 -- Make "srun --gres=none ..." work when executed without a job allocation.
 -- Change the global eio_shutdown_time to a field in eio handle.
 -- Advanced reservation fixes for heterogeneous systems, especially when
    reserving cores.
 -- If --hint=nomultithread is used in a job allocation make sure any srun's
    ran inside the allocation can read the environment correctly.
 -- If batchdir can't be made set errno correctly so the slurmctld is notified
    correctly.
 -- Remove repeated batch complete if batch directory isn't able to be made
    since the slurmd will send the same message.
 -- sacctmgr fix default format for list transactions.
 -- BLUEGENE - Fix backfill issue with backfilling jobs on blocks already
    reserved for higher priority jobs.
 -- When creating job arrays the job specification files for each element
    are hard links to the first element's specification files. If the controller
    fails to make the links the files are copied instead.
 -- Fix error handling for job array create failure due to inability to copy
    job files (script and environment).
 -- Added patch in the contribs directory for integrating make version 4.0 with
    Slurm and renamed the previous patch "make-3.81.slurm.patch".
 -- Don't wait for an update message from the DBD to finish before sending rc
    message back.  In slow systems with many associations this could speed
    responsiveness in sacctmgr after adding associations.
 -- Eliminate race condition in enforcement of MaxJobCount limit for job arrays.
 -- Fix anomaly allocating cores for GRES with specific device/CPU mapping.
 -- cons_res - When requesting exclusive access make sure we set the number
    of cpus in the job_resources_t structure so as nodes finish the correct
    cpu count is displayed in the user tools.
 -- If the job_submit plugin calls take longer than 1 second to run, print a
    warning.
 -- Make sure transfer_s_p_options transfers all the portions of the
    s_p_options_t struct.
 -- Correct the srun man page: the SLURM_CPU_BIND_VERBOSE, SLURM_CPU_BIND_TYPE
    and SLURM_CPU_BIND_LIST environment variables are set only when the
    task/affinity plugin is configured.
 -- sacct - Initialize variables correctly to avoid incorrect structure
    reference.
 -- Performance adjustment to avoid calling a function multiple times when it
    only needs to be called once.
 -- Give more correct waiting reason if job is waiting on association/QOS
    MaxNode limit.
 -- DB - When sending lft updates to the slurmctld only send non-deleted lfts.
 -- BLUEGENE - Fix documentation on how to build a reservation less than
    a midplane.
 -- If Slurmctld fails to read the job environment consider it an error
    and abort the job.
 -- Add the name of the node a job is running on to the message printed by
    slurmstepd when terminating a job.
 -- Remove unsupported options from sacctmgr help

[slurm-dev] Re: job exit codes

2014-07-29 Thread Danny Auble


Upgrade and see if you get different behavior, as this was fixed in 
14.03.05 ;).


On 07/29/2014 12:26 PM, Bill Wichser wrote:


Lol.  Missed that!  14.03.04

On 07/29/2014 02:01 PM, Danny Auble wrote:

14.03.05?

On July 29, 2014 8:41:25 AM PDT, Bill Wichser  
wrote:



Version currently demonstrating this is: 14.03

Bill

On 07/25/2014 09:44 PM, Danny Auble wrote:

What version are you using?

On July 25, 2014 5:12:22 PM PDT, Bill Wichser
 wrote:


Thanks. I knew that with our implementation of PBS it was always
this
way. But there was no indication from Slurm docs that the lower
7 bits
(-128) also applied for slurm.

My exit codes from sacct are always 137:0 and 139:0 from 
these jobs.


Bill

On 7/25/2014 6:22 PM, Danny Auble wrote:


Paul is correct,

Before 14.03.5 Slurm didn't obey POSIX convention but now does.

Basically if the job was signaled in some fashion the exit code is
increased by 128 to show this is the case.

As an example on the command line, if I do a simple sleep and
ctrl-C
it the exit code would be 130

sleep 1000
^C
echo $?
130

Before 14.03.5 srun wouldn't return just 15 in this case but we wanted
to be POSIX compliant so we modified it to increase the exit_code as
it should to be compliant.

What does sacct tell you on the jobs? For the exit code of 137 I
would expect you would get a ExitCode of 0:9 meaning you had an
exit
code of 0 but it was signaled with a SIGKILL. For the 139 I 
would

expect a 0:11 meaning a Seg Fault happened just as Paul said.

Danny

On 07/25/2014 03:06 PM, Bill Wichser wrote:


 From the documentation there is no clear explanation which
I find
explaining the exit codes of jobs. I have a user
experiencing exit
codes of 137 and 139. Can anyone help me to locate what this
8 bit
unsigned integer references?

Thanks,
Bill




[slurm-dev] Re: job exit codes

2014-07-29 Thread Danny Auble
14.03.05?

On July 29, 2014 8:41:25 AM PDT, Bill Wichser  wrote:
>
>Version currently demonstrating this is: 14.03
>
>Bill
>
>On 07/25/2014 09:44 PM, Danny Auble wrote:
>> What version are you using?
>>
>> On July 25, 2014 5:12:22 PM PDT, Bill Wichser 
>wrote:
>>
>>
>> Thanks.  I knew that with our implementation of PBS it was always
>this
>> way.  But there was no indication from Slurm docs that the lower
>7 bits
>> (-128) also applied for slurm.
>>
>> My exit codes from sacct are always 137:0 and 139:0 from these
>jobs.
>>
>> Bill
>>
>> On 7/25/2014 6:22 PM, Danny Auble wrote:
>>
>>
>> Paul is correct,
>>
>> Before 14.03.5 Slurm didn't obey POSIX convention but now
>does.
>>
>> Basically if the job was signaled in some fashion the exit
>code is
>> increased by 128 to show this is the case.
>>
>> As an example on the command line, if I do a simple sleep and
>> ctrl-C
>> it the exit code would be 130
>>
>> sleep 1000
>> ^C
>> echo $?
>> 130
>>
>> Before 14.03.5 srun wouldn't return just 15 in this case but
>we
>> wanted
>> to be POSIX compliant so we modified it to increase the
>> exit_code as
>> it should to be compliant.
>>
>> What does sacct tell you on the jobs? For the exit code of
>137 I
>> would expect you would get a ExitCode of 0:9 meaning you had
>an
>> exit
>> code of 0 but it was signaled with a SIGKILL. For the 139 I
>would
>> expect a 0:11 meaning a Seg Fault happened just as Paul said.
>>
>> Danny
>>
>> On 07/25/2014 03:06 PM, Bill Wichser wrote:
>>
>>
>>  From the documentation there is no clear explanation
>which
>> I find
>> explaining the exit codes of jobs. I have a user
>> experiencing exit
>> codes of 137 and 139. Can anyone help me to locate what
>this
>> 8 bit
>> unsigned integer references?
>>
>> Thanks,
>> Bill
>>


[slurm-dev] Re: job exit codes

2014-07-25 Thread Danny Auble
What version are you using? 

On July 25, 2014 5:12:22 PM PDT, Bill Wichser  wrote:
>
>Thanks.  I knew that with our implementation of PBS it was always this 
>way.  But there was no indication from Slurm docs that the lower 7 bits
>
>(-128) also applied for slurm.
>
>My exit codes from sacct are always 137:0 and 139:0 from these jobs.
>
>Bill
>
>On 7/25/2014 6:22 PM, Danny Auble wrote:
>>
>> Paul is correct,
>>
>> Before 14.03.5 Slurm didn't obey POSIX convention but now does.
>>
>> Basically if the job was signaled in some fashion the exit code is 
>> increased by 128 to show this is the case.
>>
>> As an example on the command line, if I do a simple sleep and ctrl-C 
>> it the exit code would be 130
>>
>> sleep 1000
>> ^C
>> echo $?
>> 130
>>
>> Before 14.03.5 srun wouldn't return just 15 in this case but we
>wanted 
>> to be POSIX compliant so we modified it to increase the exit_code as 
>> it should to be compliant.
>>
>> What does sacct tell you on the jobs?  For the exit code of 137 I 
>> would expect you would get a ExitCode of 0:9 meaning you had an exit 
>> code of 0 but it was signaled with a SIGKILL.  For the 139 I would 
>> expect a 0:11 meaning a Seg Fault happened just as Paul said.
>>
>> Danny
>>
>> On 07/25/2014 03:06 PM, Bill Wichser wrote:
>>>
>>> From the documentation there is no clear explanation which I find 
>>> explaining the exit codes of jobs.  I have a user experiencing exit 
>>> codes of 137 and 139.  Can anyone help me to locate what this 8 bit 
>>> unsigned integer references?
>>>
>>> Thanks,
>>> Bill


[slurm-dev] Re: job exit codes

2014-07-25 Thread Danny Auble


Paul is correct,

Before 14.03.5 Slurm didn't obey POSIX convention but now does.

Basically if the job was signaled in some fashion the exit code is 
increased by 128 to show this is the case.


As an example on the command line, if I do a simple sleep and ctrl-C it 
the exit code would be 130


sleep 1000
^C
echo $?
130

Before 14.03.5 srun wouldn't return just 15 in this case but we wanted 
to be POSIX compliant so we modified it to increase the exit_code as it 
should to be compliant.


What does sacct tell you on the jobs?  For the exit code of 137 I would 
expect you would get a ExitCode of 0:9 meaning you had an exit code of 0 
but it was signaled with a SIGKILL.  For the 139 I would expect a 0:11 
meaning a Seg Fault happened just as Paul said.
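
One way to reproduce the same 128+signal arithmetic from a plain shell (a rough
sketch; the exact reporting depends on the shell doing it):

sleep 1000 &
kill -9 %1         # SIGKILL is signal 9
wait %1; echo $?   # prints 137 = 128 + 9; a segfault (signal 11) would give 139 = 128 + 11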


Danny

On 07/25/2014 03:06 PM, Bill Wichser wrote:


From the documentation there is no clear explanation which I find 
explaining the exit codes of jobs.  I have a user experiencing exit 
codes of 137 and 139.  Can anyone help me to locate what this 8 bit 
unsigned integer references?


Thanks,
Bill


[slurm-dev] Re: Interactive job array

2014-07-18 Thread Danny Auble
Would running steps (multiple sruns) inside of an allocation give you 
what you are looking for?
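
For instance, something along these lines (a sketch; task.sh and the counts are
placeholders) runs each element as a separate job step inside one allocation:

salloc -n 16
# then, inside the allocation:
for i in $(seq 1 100); do
    srun -n1 --exclusive ./task.sh "$i" &   # each srun becomes one job step
done
wait                                        # block until all steps have finished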


On 07/18/2014 09:31 AM, Nicolas GRANDEMANGE wrote:

Re: [slurm-dev] Re: Interactive job array
Hi Julien,

I believe you can retrieve the job id (like David did) and then use 
the 'afterany' dependency with a fake 'true' command:


bash$ JID=`sbatch --array=1-1000 -o /dev/null test.sh | awk '{print $4}'`
bash$ srun -d "afterany:$JID" true
srun: job 788910 queued and waiting for resources
srun: job 788910 has been allocated resources

But, I don't think there is any easy way to get an aggregated return code
like with the Sun Grid Engine --sync option:

man qsub
If -sync y is used in conjunction  with  -t  n[-m[:i]],
qsub  will  wait  for  all  the job's tasks to complete
before exiting.  If all the job's tasks  complete  suc-
cessfully,  qsub's  exit code will be that of the first
completed job tasks with a non-zero exit code, or 0  if
all job tasks exited with an exit code of 0

Regards


--
Nicolas Grandemange




[slurm-dev] Re: pmi and hwloc

2014-07-15 Thread Danny Auble

In 14.11 it doesn't.

On 07/15/2014 02:18 PM, Andy Riebs wrote:
Is there a reason to have libpmi depend on hwloc for some 
architectures, even though it's not relevant for RHEL x86_64 clusters 
today?


Andy

On 07/13/2014 10:19 AM, Ralph Castain wrote:
Just to clarify something: this only occurs when --with-pmi is 
provided. We *never* link directly against libslurm for licensing 
reasons, and --with-slurm doesn't cause us to link against any Slurm 
libraries.


So the only impact here is that we would have to drop support for 
directly launching apps using srun, and require the use of mpirun 
instead. Regrettable, but my point is to clarify that this doesn't 
preclude use of OMPI under Slurm environments.


Obviously, we would prefer to see it resolved, and that libpmi stand 
alone as an LGPL library :-)  This goes beyond what Mike is 
requesting, which is to at least remove the hwloc dependency as PMI 
clearly doesn't require it.



On Jul 13, 2014, at 4:24 AM, Mike Dubman wrote:



Hi guys,
The new SLURM 14.x series contains a "-lhwloc" dependency 
mentioned in the dependency_libs= string in the slurm-provided .la 
files:

libpmi.la
libslurmdb.la
libslurm.la

This breaks OMPI compilation when either the --with-pmi or 
--with-slurm flag is provided to OMPI "configure".
I checked the previous SLURM 2.6.x version and it does not have such 
a dependency on hwloc.

http://www.open-mpi.org/community/lists/devel/2014/07/15130.php
Please fix.
Thanks
Kind Regards,
Mike Dubman | R&D Senior Director, HPC
Tel: +972 (74) 712 9214 | Fax: +972 (74) 712 9111
Mellanox Ltd. 13 Zarchin St., Bldg B, Raanana 43662, Israel








[slurm-dev] RE: CPU/GPU utilization userwise

2014-07-10 Thread Danny Auble


Unless you enforce associations you do not need to create them 
manually.  You can use the database to only store jobs and steps.


On 07/10/2014 01:20 AM, Loris Bennett wrote:

Hi Dhvanika,

 writes:


Hi guys

I am trying to understand the database.

For SLURM to record user-wise CPU utilization, is it a MUST to have the user-account
association defined in SLURM accounting?

Regards

Dhvani

As it says in the documentation (http://slurm.schedmd.com/accounting.html):

,-
| Accounting records are maintained based upon what we refer to as an
| Association, which consists of four elements: cluster, account, user
| names and an optional partition name
`-

So, yes, associations are a must.  The idea is that a given user could
work in multiple groups or projects on a given cluster, so you need the
associations to keep track of how the CPU time was used.

Cheers

Loris


[slurm-dev] Re: Documentation enhancement request- mysql

2014-07-08 Thread Danny Auble


Thanks for the tip Chris, I hadn't noticed it before.  It is committed 
in 14.11 commit da24acfc359ef9e28866c82fb9f9e9880235fcaa


The DBD will now print a nice error about InnoDB not existing and halt 
if it isn't available.


Danny

On 07/03/2014 08:23 PM, Christopher Samuel wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 04/07/14 10:07, Danny Auble wrote:


That is exactly what it does. Evidently if InnoDB isn't there it
just goes along quietly without it, without raising any issue.

Wow, I'd assumed it would report an error in that case, but apparently
not.   This describes what happens and how to get it to fail instead.

http://dev.mysql.com/doc/refman/5.6/en/storage-engine-setting.html

# By default, a warning is generated whenever CREATE TABLE or
# ALTER TABLE cannot use the default storage engine. To prevent
# confusing, unintended behavior if the desired engine is
# unavailable, enable the NO_ENGINE_SUBSTITUTION SQL mode. If
# the desired engine is unavailable, this setting produces an
# error instead of a warning, and the table is not created or
# altered. See Section 5.1.7, “Server SQL Modes”.
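
In my.cnf terms that is roughly (a sketch; the exact option spelling varies a
little between MySQL versions):

[mysqld]
default_storage_engine = InnoDB
sql_mode = NO_ENGINE_SUBSTITUTION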

cheers!
Chris
- -- 
  Christopher Samuel    Senior Systems Administrator

  VLSCI - Victorian Life Sciences Computation Initiative
  Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
  http://www.vlsci.org.au/  http://twitter.com/vlsci

-BEGIN PGP SIGNATURE-
Version: GnuPG v1
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iEYEARECAAYFAlO2HdIACgkQO2KABBYQAh+x+gCfcA8tzRbYvK3X+kZz3DSQgohS
i6AAn1I72z5GRUCEnN+qncD3Dgxz3269
=I5Bo
-END PGP SIGNATURE-


[slurm-dev] Re: Minor edit to SLURM FAQ

2014-07-07 Thread Danny Auble
Brian, the current documentation is at http://slurm.schedmd.com. The 
LLNL documentation is years old but remains around for internal 
reasons.  This particular FAQ is the same though in both places. I'll 
see that it is updated to something similar to your suggestion in the 
current docs.


Thanks,
Danny

On 07/07/2014 10:35 AM, Adams, Brian M wrote:


Hello all (please advise if this is the wrong list for this),

Could you please make a minor update to this FAQ entry to help with 
user confusion: 
https://computing.llnl.gov/linux/slurm/faq.html#mpi_symbols? 
Specifically, Dakota versions older than 5.2 did indeed have a global 
regcomp symbol (my apologies for that), but it does not exist in 5.2, 
5.3, 5.4, or 6.0.  Perhaps this rewording would help:


For example DAKOTA, versions 5.1 and 
older, contains a function named regcomp, which will get used rather 
than the POSIX regex functions. Rename DAKOTA's function and 
references from regcomp to something else to make it work properly.


Thanks,

Brian

-
Brian M. Adams, PhD (bria...@sandia.gov )
Optimization and Uncertainty Quantification
Sandia National Laboratories, Albuquerque, NM
http://www.sandia.gov/~briadam 





[slurm-dev] Re: Documentation enhancement request- mysql

2014-07-03 Thread Danny Auble
That is exactly what it does.  Evidently if InnoDB isn't there it just goes 
along quietly without it, without raising any issue.

On July 3, 2014 4:38:24 PM MST, Christopher Samuel  
wrote:
>
>-BEGIN PGP SIGNED MESSAGE-
>Hash: SHA1
>
>On 28/06/14 06:32, Kevin M. Hildebrand wrote:
>
>> I might have missed it, but I didn't see any place in the SLURM
>> install docs that mention that InnoDB is a requirement when using
>> MySQL.
>
>I'd suggest that for these cases it should use Engine=InnoDB when
>creating MySQL tables, that way you don't need to remember.
>
>All the best,
>Chris
>- -- 
> Christopher Samuel    Senior Systems Administrator
> VLSCI - Victorian Life Sciences Computation Initiative
> Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
> http://www.vlsci.org.au/  http://twitter.com/vlsci
>
>-BEGIN PGP SIGNATURE-
>Version: GnuPG v1
>Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/
>
>iEYEARECAAYFAlO16RsACgkQO2KABBYQAh8I2wCeMQUiMlrDsCMV4kuMHUZig+Ld
>b5IAniZ2PuWHp4hRXCK7NZwC+ywMAeEH
>=rqjS
>-END PGP SIGNATURE-


[slurm-dev] Re: Documentation enhancement request- mysql

2014-07-03 Thread Danny Auble
This has been added to the accounting web page.  Thanks for the note.  I 
agree with the frustration of MySQL just chirping right along; no error 
is given even though we specifically set the engine to InnoDB.  Hopefully 
this has only bitten a couple of people, and hopefully the documentation 
update will help others.


Danny

On 06/27/2014 01:31 PM, Kevin M. Hildebrand wrote:


I might have missed it, but I didn't see any place in the SLURM 
install docs that mention that InnoDB is a requirement when using MySQL.


I've just spent a couple of hours debugging why when using sacctmgr, 
answering "no" to any of the commit questions is the same as answering 
"yes".


It turns out that my MySQL default engine is MyISAM, which doesn't 
support transactions, so rollbacks on "no" responses don't work.


The most frustrating part is that MySQL runs along happily without any 
warnings to tell you that all of the transaction controls are being 
ignored.


So, I'd suggest adding a warning to the install docs that InnoDB is 
required and must be the default when the tables are initially created.
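
For anyone checking their own setup, the default engine is easy to inspect from
the mysql client (a sketch; the variable is default_storage_engine on MySQL 5.5+,
older releases call it storage_engine):

mysql> SHOW VARIABLES LIKE '%storage_engine%';
mysql> SET GLOBAL default_storage_engine = InnoDB;   -- or set it permanently in my.cnf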


Thanks!
Kevin

--

Kevin Hildebrand

Division of IT

University of Maryland, College Park





[slurm-dev] Re: [PATCH 2/2] Fix memory management on error in sacct load

2014-06-25 Thread Danny Auble


Ah, well that makes sense then, your patch will fix this as well since 
the free will only happen once.


Thanks,
Danny

On 06/25/2014 06:50 AM, Rémi Palancher wrote:


Hi Danny,

Le 25/06/2014 15:05, Danny Auble a écrit :


Thanks Rémi.  This patch does appear good, but the real question is
where the l=0xb0 for file_opts->wckey_list came from.  This appears to
represent memory corruption, but it is unclear how.


After few breakpoints and step-by-step debugging session, gdb gave me 
the answer: xfree(file_opts) in _destroy_sacctmgr_file_opts().


Once the second option without '=' is detected in _parse_options(), 
the if (file_opts->name) branch is taken and 
_destroy_sacctmgr_file_opts() is called for the first time with this 
file_opts struct:


(gdb) n
226 if (file_opts->wckey_list) {
(gdb) p *file_opts
$8 = {admin = SLURMDB_ADMIN_NOTSET, classification = 0, coord_list = 
0x0, def_acct = 0x0, ... , qos_list = 0x0, wckey_list = 0x0}


(gdb) n
230 xfree(file_opts);
(gdb) p *file_opts
$9 = {admin = SLURMDB_ADMIN_NOTSET, classification = 0, coord_list = 
0x0, def_acct = 0x0, ... , qos_list = 0x0, wckey_list = 0x0}


(gdb) n
232 }
(gdb) p *file_opts
Cannot access memory at address 0x0

Then, when back in _parse_options():

(gdb) n
_parse_options (options=0x7ff7e98a 
"name1:name2:Description='none':Organization='none':Fairshare=1\n") at 
file_functions.c:292

292 break;
(gdb) p *file_opts
$10 = {admin = 4147285656, classification = 32767, coord_list = 0x0, 
def_acct = 0x0, ... , qos_list = 0x0, wckey_list = 0xb0}

 ^^

The structure pointed to by file_opts has been free'd in _destroy() but 
the pointer in _parse_options() still refers to this memory space. The 
pointer should be set to NULL right after _destroy() to avoid extra 
references to this free'd memory.


Then, at the end _parse_options(), the second call to _destroy() is 
done with a pointer to unknown garbage.


What do you think?

I couldn't tell if this behaviour is due to my specific combination of 
compiler, architecture, (and whatever) though.


[slurm-dev] Re: [PATCH 0/2] Patch set for bugs with sacctmgr load

2014-06-25 Thread Danny Auble


We prefer getting patches in an attached file form.  This inline method 
will work though, but in the future adding the patch as an attachment is 
better.  Using http://bugs.schedmd.com is even better since it is easier 
to keep track of there.  It is easy to add a file to the bug and it 
won't get lost in the list.


We do not prefer GitHub Pull Requests and usually ask those submitting 
them to resubmit them in the fashion described above.


Thanks,
Danny

On 06/25/2014 05:16 AM, Rémi Palancher wrote:

From: Rémi Palancher 

Hi developers,

Here are 2 patches for 2 bugs I found while using `sacctmgr load`.

I didn't find developer instructions nor your preferred way to receive
patches, so these patches are formatted and sent directly by git to this
mailing-list. If you prefer another way (GitHub PR?) please tell me!

I'm looking forward to reading your comments on these patches.

Rémi Palancher (2):
   Increased BUFFER_SIZE of sacctmgr load to 512KB
   Fix memory management on error in sacct load

  src/sacctmgr/file_functions.c |   18 ++
  src/sacctmgr/sacctmgr.c   |2 --
  src/sacctmgr/sacctmgr.h   |2 +-
  3 files changed, 3 insertions(+), 19 deletions(-)



[slurm-dev] Re: [PATCH 1/2] Increased BUFFER_SIZE of sacctmgr load to 512KB

2014-06-25 Thread Danny Auble


This will work, but I am thinking of doing as src/common/parse_config.c 
does and using stat() to get the size we should use, instead of a 
hard-coded value.


Removing the extra #define in sacctmgr.c is obviously a good catch as 
well ;).


Thanks,
Danny

On 06/25/2014 05:14 AM, Rémi Palancher wrote:

From: Rémi Palancher 

The macro BUFFER_SIZE used by sacctmgr was set to 4KB. This macro is
used by load_sacctmgr_cfg_file() in src/sacctmgr/file_functions.c to
set the size of the buffer which stores the lines read in dump files
that could be loaded by sacctmgr (notably the ones generated by
`sacctmgr dump`).

This size of 4KB is too small for very long lines that could be present
in these files. For instance, this bug was encountered with a 236763B
User line full of many WCKeys.

This commit proposes to increase its size to 512KB. This is obviously
not a general solution but it should cover most use cases.

The macro definition is also removed from src/sacctmgr/sacctmgr.c since it
was useless in this file.

Signed-off-by: Rémi Palancher 
---
  src/sacctmgr/sacctmgr.c |2 --
  src/sacctmgr/sacctmgr.h |2 +-
  2 files changed, 1 insertions(+), 3 deletions(-)

diff --git a/src/sacctmgr/sacctmgr.c b/src/sacctmgr/sacctmgr.c
index cd27722..1b1af40 100644
--- a/src/sacctmgr/sacctmgr.c
+++ b/src/sacctmgr/sacctmgr.c
@@ -43,8 +43,6 @@
  #include "src/common/xsignal.h"
  #include "src/common/proc_args.h"
  
-#define BUFFER_SIZE 4096
-
  char *command_name;
  int exit_code;   /* sacctmgr's exit code, =1 on any error at any time */
  int exit_flag;/* program to terminate if =1 */
diff --git a/src/sacctmgr/sacctmgr.h b/src/sacctmgr/sacctmgr.h
index f6068d1..974a3ba 100644
--- a/src/sacctmgr/sacctmgr.h
+++ b/src/sacctmgr/sacctmgr.h
@@ -86,7 +86,7 @@
  
  #define CKPT_WAIT	10

  #define   MAX_INPUT_FIELDS 128
-#define BUFFER_SIZE 4096
+#define BUFFER_SIZE 524288
  
  typedef enum {

/* COMMON */


[slurm-dev] Re: [PATCH 2/2] Fix memory management on error in sacct load

2014-06-25 Thread Danny Auble


Thanks Rémi.  This patch does appear good, but the real question is 
where the l=0xb0 for file_opts->wckey_list came from.  This appears to 
represent memory corruption, but it is unclear how.

In any case, this should work just fine since the if (exit_code) { at 
the bottom of the function will catch things, as you have noted.


While this particular code is the same in 14.03 I would strongly advise 
upgrading to 14.03 or at least 2.6.  There have been numerous bug fixes 
and feature/performance enhancements since 2.5.


Danny

On 06/25/2014 05:12 AM, Rémi Palancher wrote:

From: Rémi Palancher 

Homogenize memory management after error detection in _parse_options()
with exit_code and breaks only. Always delay memory free of structure
file_opts at the end of the function when exit_code is checked. This
avoids double free() and segfaults with certain errors. For instance,
with a line0 with 2 options w/o '=':

   # cat slurmdbd.dump
   Cluster - test:Fairshare=1:QOS='normal'
   Parent - root
   Account - name1:name2:Description='none':Organization='none':Fairshare=1

   # sacctmgr load slurmdbd.dump
   For cluster test
Bad format on name2: End your option with an '=' sign
Segmentation fault

Here is the backtrace according to gdb (with slurm 2.5.7 but code did
not change on relevant parts except line numbers):

   Program received signal SIGSEGV, Segmentation fault.
   __pthread_mutex_lock (mutex=0xd8) at pthread_mutex_lock.c:50
   50  pthread_mutex_lock.c: No such file or directory.
   (gdb) bt
   #0  __pthread_mutex_lock (mutex=0xd8) at pthread_mutex_lock.c:50
   #1  0x0046716a in list_destroy (l=0xb0) at list.c:306
   #2  0x00438bd0 in _destroy_sacctmgr_file_opts (object=0x79b3d8) at 
file_functions.c:227
   #3  0x00439bc1 in _parse_options (options=0x7ff7e98a 
"name1:name2:Description='none':Organization='none':Fairshare=1\n") at 
file_functions.c:542
   #4  0x0043e5d1 in load_sacctmgr_cfg_file (argc=1, argv=0x78f0b0) at 
file_functions.c:2240
   #5  0x00440b81 in _process_command (argc=2, argv=0x78f0a8) at 
sacctmgr.c:395
   #6  0x00440649 in main (argc=3, argv=0x7fffec88) at 
sacctmgr.c:217

After this commit, the result is better:

   # sacctmgr load slurmdbd.dump
   For cluster test
Bad format on name2: End your option with an '=' sign
Problem with line(3)
Problem with requests: Unspecified error

All these removed _destroy() calls should not leak memory. Here is the valgrind
summary as proof:

Before:

  # valgrind sacctmgr load slurmdbd.dump

   ...
   ==21552== Invalid free() / delete / delete[]
   ==21552==at 0x4C240FD: free (vg_replace_malloc.c:366)
   ==21552==by 0x463A27: slurm_xfree (xmalloc.c:270)
   ==21552==by 0x438BF9: _destroy_sacctmgr_file_opts (file_functions.c:230)
   ==21552==by 0x439C05: _parse_options (file_functions.c:548)
   ==21552==by 0x43E5F5: load_sacctmgr_cfg_file (file_functions.c:2243)
   ==21552==by 0x440BA4: _process_command (sacctmgr.c:395)
   ==21552==by 0x44066C: main (sacctmgr.c:217)
   ==21552==  Address 0x5cd8360 is 0 bytes inside a block of size 168 free'd
   ==21552==at 0x4C240FD: free (vg_replace_malloc.c:366)
   ==21552==by 0x463A27: slurm_xfree (xmalloc.c:270)
   ==21552==by 0x438BF9: _destroy_sacctmgr_file_opts (file_functions.c:230)
   ==21552==by 0x438E5D: _parse_options (file_functions.c:291)
   ==21552==by 0x43E5F5: load_sacctmgr_cfg_file (file_functions.c:2243)
   ==21552==by 0x440BA4: _process_command (sacctmgr.c:395)
   ==21552==by 0x44066C: main (sacctmgr.c:217)
   ==21552==
Problem with line(3)
Problem with requests: Unspecified error
   ==21552==
   ==21552== HEAP SUMMARY:
   ==21552== in use at exit: 66,471 bytes in 819 blocks
   ==21552==   total heap usage: 3,819 allocs, 3,001 frees, 363,156 bytes 
allocated
   ==21552==
   ==21552== LEAK SUMMARY:
   ==21552==definitely lost: 60 bytes in 1 blocks
   ==21552==indirectly lost: 240 bytes in 10 blocks
   ==21552==  possibly lost: 35,300 bytes in 516 blocks
   ==21552==still reachable: 30,871 bytes in 292 blocks
   ==21552== suppressed: 0 bytes in 0 blocks
   ==21552== Rerun with --leak-check=full to see details of leaked memory
   ==21552==
   ==21552== For counts of detected and suppressed errors, rerun with: -v
   ==21552== ERROR SUMMARY: 11 errors from 11 contexts (suppressed: 4 from 4)

After:

   # valgrind sacctmgr load slurmdbd.dump

   ==26228== Memcheck, a memory error detector
   ==26228== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al.
   ==26228== Using Valgrind-3.6.0.SVN-Debian and LibVEX; rerun with -h for 
copyright info
   ==26228== Command: sacctmgr load slurmdbd.dump
   ==26228==
   For cluster test
Bad format on name2: End your option with an '=' sign
Problem with line(3)
Problem with requests: Unspecified error
   ==26228==
   ==26228== HEAP SUMMARY:
   ==26228== in use

[slurm-dev] Re: backfill breaking out too early

2014-06-05 Thread Danny Auble


Hey Michael,

A commit in 14.03.1 that may be related to what you are seeing is 
e94f10b8a2f85936e487358a0da001a271898d4f.  It is a partial revert of 
commit 9b1dadea4eb823b5ef29d8b4ee56cb6b7c3be22f which first appeared in 
2.6.8.  Try applying that (or upgrading to 14.03) and see if it fixes 
your issue.
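
If upgrading right away is not an option, one way to pull just that commit into
a local source tree is roughly (a sketch; the branch name is a guess, and the
cherry-pick should be checked against 2.6.9):

git clone https://github.com/SchedMD/slurm.git
cd slurm
git checkout slurm-2.6      # or whatever branch/tag your 2.6.9 build came from
git cherry-pick e94f10b8a2f85936e487358a0da001a271898d4f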


I think Paul is correct though.  I don't think this is the backfill 
loop, but the normal scheduler.


Danny

On 06/05/2014 11:16 AM, Michael Gutteridge wrote:

I'm running slurm 2.6.9: I've got the backfill scheduler set up with
some pretty ridiculous parameters as we have a large number of queued
jobs of various dimensions:

SchedulerParameters=default_queue_depth=1,bf_continue,bf_interval=120,bf_max_job_user=1,bf_resolution=600,bf_window=4320,bf_max_job_part=1

This has been working fine (backfill was effectively going through the
full queue) but today it appears to have stopped: jobs which should be
backfilled onto idle resources aren't being run.  The scheduler log
shows:

[2014-06-04T13:16:10.107] sched: Running job scheduler
[2014-06-04T13:16:10.111] sched: JobId=7060218. State=PENDING.
Reason=Resources. Priority=10850. Partition=campus.
[2014-06-04T13:16:10.111] sched: JobId=7060219. State=PENDING.
Reason=Priority(Priority), Priority=10850, Partition=campus.
[2014-06-04T13:16:10.111] sched: already tested 3 jobs, breaking out

My understanding is that it shouldn't hit that limit until
default_queue_depth.  Has my controller lost its mind?  I've got a
nearly identical test setup where this is working as I'd expect.

Any hints appreciated... thanks much

Michael


[slurm-dev] Re: Required RPMs

2014-05-29 Thread Danny Auble


Only install the DBD on the controller if that is where you plan on 
running it.  Usually there is only one per enterprise.  The pam RPM is 
needed only on the compute nodes.  You could also install the 
sj* packages on the compute nodes.


DBD node

slurm-2.6.4-1.el6.x86_64
slurm-devel-2.6.4-1.el6.x86_64
slurm-munge-2.6.4-1.el6.x86_64
slurm-plugins-2.6.4-1.el6.x86_64
slurm-slurmdbd-2.6.4-1.el6.x86_64
slurm-sql-2.6.4-1.el6.x86_64

controller:

slurm-2.6.4-1.el6.x86_64
slurm-devel-2.6.4-1.el6.x86_64
slurm-munge-2.6.4-1.el6.x86_64
slurm-perlapi-2.6.4-1.el6.x86_64
slurm-plugins-2.6.4-1.el6.x86_64
slurm-sjobexit-2.6.4-1.el6.x86_64
slurm-sjstat-2.6.4-1.el6.x86_64
slurm-torque-2.6.4-1.el6.x86_64

Keep in mind 2.6.4 is not recommended for new installs.  Please install 
14.03.3-2 or at least 2.6.9.
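
In other words, something like this on the controller (a sketch; substitute the
version you actually built and your platform's RPM names):

rpm -ivh slurm-14.03.3-2.el6.x86_64.rpm \
         slurm-devel-14.03.3-2.el6.x86_64.rpm \
         slurm-munge-14.03.3-2.el6.x86_64.rpm \
         slurm-perlapi-14.03.3-2.el6.x86_64.rpm \
         slurm-plugins-14.03.3-2.el6.x86_64.rpm \
         slurm-sjobexit-14.03.3-2.el6.x86_64.rpm \
         slurm-sjstat-14.03.3-2.el6.x86_64.rpm \
         slurm-torque-14.03.3-2.el6.x86_64.rpm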


Danny

On 05/29/2014 01:15 PM, Jonathan Mills wrote:


controller:

# rpm -qa | grep -i slurm | sort
slurm-2.6.4-1.el6.x86_64
slurm-devel-2.6.4-1.el6.x86_64
slurm-munge-2.6.4-1.el6.x86_64
slurm-pam_slurm-2.6.4-1.el6.x86_64
slurm-perlapi-2.6.4-1.el6.x86_64
slurm-plugins-2.6.4-1.el6.x86_64
slurm-sjobexit-2.6.4-1.el6.x86_64
slurm-sjstat-2.6.4-1.el6.x86_64
slurm-slurmdbd-2.6.4-1.el6.x86_64
slurm-sql-2.6.4-1.el6.x86_64
slurm-torque-2.6.4-1.el6.x86_64


node:

[root@compute-0-0 ~]# rpm -qa | grep -i slurm | sort
slurm-2.6.4-1.el6.x86_64
slurm-devel-2.6.4-1.el6.x86_64
slurm-munge-2.6.4-1.el6.x86_64
slurm-pam_slurm-2.6.4-1.el6.x86_64
slurm-perlapi-2.6.4-1.el6.x86_64
slurm-plugins-2.6.4-1.el6.x86_64
slurm-torque-2.6.4-1.el6.x86_64

On 05/29/2014 04:11 PM, Brian Baughman wrote:

Greetings,

I am installing SLURM for the first time and am using the RPMs to 
make distribution across our cluster easier. The documentation:


http://slurm.schedmd.com/quickstart_admin.html

does not provide a list of required RPMs for the controller 
or nodes. There are 11 RPMs generated in the rpm build process and 
some have names which would indicate they are not needed on all 
systems or on nodes. Does anyone have a list of which RPMs are 
required for the controller? For the nodes? Thank you for your help.


Regards,
Brian







[slurm-dev] Re: HDF5 Profile Plugin setup

2014-05-28 Thread Danny Auble
From: Daniel Milroy [mailto:daniel.mil...@colorado.edu]
Sent: Wednesday, May 28, 2014 10:08
To: slurm-dev
Subject: [slurm-dev] Re: HDF5 Profile Plugin setup

Hi Nancy and Rod,

I believe that slurm was built properly on the runtime system.
Slurm was configured with the --with-hdf5=yes option, and config.log
indicates that the hdf5 libs were found:

configure:20180: checking hdf5.h usability
configure:20180: gcc -c -g -O2 -pthread -fno-gcse -I/include conftest.c >&5
configure:20180: $? = 0
configure:20180: result: yes
configure:20180: checking hdf5.h presence
configure:20180: gcc -E -I/include conftest.c
configure:20180: $? = 0
configure:20180: result: yes
configure:20180: checking for hdf5.h
configure:20180: result: yes
configure:20188: checking for H5Fcreate in -lhdf5
configure:20213: gcc -o conftest -g -O2 -pthread -fno-gcse
-I/include -L/usr/lib64  conftest.c -lhdf5  -lm -lz  -lhdf5 >&5
configure:20213: $? = 0
configure:20222: result: yes
configure:20234: checking for main in -lhdf5_hl
configure:20253: gcc -o conftest -g -O2 -pthread -fno-gcse
-I/include -L/usr/lib64  conftest.c -lhdf5_hl  -lm -lz  -lhdf5 >&5
configure:20253: $? = 0
configure:20262: result: yes
configure:20274: checking for matching HDF5 Fortran wrapper
configure:20278: result: /usr/bin/h5fc

The required shared object is in
/curc/slurm/slurm/14.03.3/lib/slurm/acct_gather_profile_hdf5.so.


Thank you,

Dan Milroy

From: Nancy Kritkausky [mailto:nancy.kritkau...@bull.com]
Sent: Wednesday, May 28, 2014 10:55 AM
To: slurm-dev
Subject: [slurm-dev] Re: HDF5 Profile Plugin setup

Dan,
You can check your installation to make sure the library is there.
The name of the library is acct_gather_profile_hdf5.so.  It is
normally installed under /usr/lib64/slurm.  But depending on your
.configure it could be elsewhere, including /usr/share.  As Rod
said, if hdf5 is not installed, it will not be built.
Hope this helps too,
Nancy
From: Rod Schultz [mailto:rod.schu...@bull.com]
Sent: Wednesday, May 28, 2014 09:23
To: slurm-dev
Subject: [slurm-dev] Re: HDF5 Profile Plugin setup

Dan,

Do you have HDF5 installed on your system? Both the runtime system
and the system upon which you built slurm.

At configure time, there is a dependency on hdf5 being installed.

The first couple of errors appear to be caused by not finding the
library. This is probably the result of a build problem.

The last few come from continued parsing of acct_gather.conf.
The parsing of this file involves calling parsers in each
sub-account-gather plugin. If the plugin isn’t installed, items in
the file are considered errors.

Rod



From: Daniel Milroy [mailto:daniel.mil...@colorado.edu]
Sent: Wednesday, May 28, 2014 8:37 AM
To: slurm-dev
Subject: [slurm-dev] Re: HDF5 Profile Plugin setup

Hi Danny,

There wasn’t anything in the “Profiling Using HDF5 User Guide” that
indicated that I should load the plugin via spank.  It was a result
of research into enabling the plugin since various combinations of
the parameters weren’t working.

Removing the reference to the lustre acct_gather shared object in
plugstack.conf and restarting the service yields:

error: Couldn't find the specified plugin name for
acct_gather_profile/hdf5 looking at all files
error: cannot find acct_gather_profile plugin for acct_gather_profile/hdf5
fatal: ProfileHDF5Default can not be set to NotSet, please specify a
valid option
error: Parsing error at unrecognized key: ProfileHDF5Dir
error: Parse error in file /curc/slurm/slurm/etc/acct_gather.conf
line 1: "ProfileHDF5Dir=/curc/slurm/slurm/acct"
error: Parsing error at unrecognized key: ProfileHDF5Default
error: Parse error in file /curc/slurm/slurm/etc/acct_gather.conf
line 2: "ProfileHDF5Default=Filesystem"


Regards,

Dan Milroy

From: Danny Auble [mailto:d...@schedmd.com]
Sent: Tuesday, May 27, 2014 12:29 PM
To: slurm-dev
Subject: [slurm-dev] Re: HDF5 Profile Plugin setup

Dan, I wouldn't expect spank would be needed to load this plugin.

Try taking the line out of your plugstack.conf and see if that works
for you.  Was there something in the documentation
(http://slurm.schedmd.com/hdf5_profile_user_guide.html) that lead
you down this path?

Danny
On 05/23/2014 03:59 PM, Daniel Milroy wrote:
Hello,

I’ve been experiencing difficulties enabling the
AcctGatherProfileType/hdf5 plugin for the Lustre filesystem.  So far
I’ve set the following parameters:

slurm.conf
 AcctGatherProfileType=acct_gather_profile/hdf5
AcctGatherFilesystemType=acct_gather_filesystem/lustre

acct_gather.conf
 ProfileHDF5Dir=/curc/slurm/slurm/acct
ProfileHDF5Default=Filesystem

plugstack.conf
 required
/curc/slurm/slurm/current/lib/slurm/acct_gather_filesystem_lustre.so

Upon job submission, I receive the following error:
salloc: error: spank:
"/curc/slurm/slurm/14.03.3/lib/slurm/acct_gather_filesystem_lustre.so"
exports 0 symbols
salloc: error: spank: /curc/slurm/slurm/etc/plugst

[slurm-dev] Re: migration and node communication error

2014-05-27 Thread Danny Auble
ruser

1000214tcp6  ::.212.199   nlockmgr   superuser


Is there something that is not running that should be running?


I even changed logging to debug4 and I still did not see any reason 
why.   Should I up the logging higher?



Thanks


Jackie




On Tue, May 27, 2014 at 7:24 PM, Danny Auble <d...@schedmd.com> wrote:


Jackie, what does the slurmd log look like on one of these nodes?
The * means just what you thought, no communication.

Make sure you can ping the address from the slurmctld.

Your timeout should be fine.

Danny

On May 27, 2014 4:40:23 PM PDT, Jacqueline Scoggins <jscogg...@lbl.gov> wrote:

I just migrated over 611 nodes to slurm from moab/torque (the last
set of our nodes) and noticed that a subset of the nodes, around 39
or so, show down with a * after the word down.  I have tried to
change the state to IDLE but the log files show Communication
connection failure rpc:1008 errors and I can't seem to see what is
causing this.


Any ideas of what to troubleshoot would be helpful.  Tried
munge -n | ssh nodename unmunge, so munge is communicating just
fine.  Does it have anything to do with any of the scheduler
parameters?  My thought is that the message timeout is too low
for a cluster of this size: 1831 nodes.

Current setting is MessageTimeout  = 60 sec

Should I increase it to 5 minutes or at least 2 minutes?

Jackie






[slurm-dev] Re: migration and node communication error

2014-05-27 Thread Danny Auble
Jackie, what does the slurmd log look like on one of these nodes?   The * means 
just what you thought, no communication. 

Make sure you can ping the address from the slurmctld. 

Your timeout should be fine. 
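
A few commands that help when chasing this sort of thing (the node name here is
just an example):

sinfo -R                                    # list down/drained nodes with the recorded reason
scontrol show node n0123                    # per-node detail, including Reason= if set
scontrol update NodeName=n0123 State=RESUME # once the cause is fixed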

Danny 

On May 27, 2014 4:40:23 PM PDT, Jacqueline Scoggins  wrote:
>I just migrated over 611  nodes to slurm from moab/torque.  The last
>set of
>our nodes and noticed that a subset of the nodes around 39 or so show
>down
>with a * after the word down.  I have tried to change the state to IDLE
>but
>the log files shows - Communication connection failure rpc:1008 errors
>and
>I can't seem to see what is causing this.
>
>
>Any ideas of what to troubleshoot would be helpful.  Tried the munge -n
>|
>ssh nodename unmunge so munge is communicating just fine.  Does it have
>anything to do with any of the scheduler parameters.  My thoughts are
>that
>the Timeout for message timeout is too low for a cluster of this size:
> 1831 nodes.
>
>Current setting is MessageTimeout  = 60 sec
>
>should I increase it to 5 minutes or at least 2 minutes?
>
>Jackie


[slurm-dev] Re: HDF5 Profile Plugin setup

2014-05-27 Thread Danny Auble

Dan, I wouldn't expect spank would be needed to load this plugin.

Try taking the line out of your plugstack.conf and see if that works for 
you.  Was there something in the documentation 
(http://slurm.schedmd.com/hdf5_profile_user_guide.html) that lead you 
down this path?


Danny

On 05/23/2014 03:59 PM, Daniel Milroy wrote:


Hello,

I’ve been experiencing difficulties enabling the 
AcctGatherProfileType/hdf5 plugin for the Lustre filesystem.  So far 
I’ve set the following parameters:


slurm.conf

AcctGatherProfileType=acct_gather_profile/hdf5

AcctGatherFilesystemType=acct_gather_filesystem/lustre

acct_gather.conf

ProfileHDF5Dir=/curc/slurm/slurm/acct

ProfileHDF5Default=Filesystem

plugstack.conf

required 
/curc/slurm/slurm/current/lib/slurm/acct_gather_filesystem_lustre.so


Upon job submission, I receive the following error:

salloc: error: spank: 
"/curc/slurm/slurm/14.03.3/lib/slurm/acct_gather_filesystem_lustre.so" 
exports 0 symbols


salloc: error: spank: /curc/slurm/slurm/etc/plugstack.conf:2: Failed 
to load plugin 
/curc/slurm/slurm/14.03.3/lib/slurm/acct_gather_filesystem_lustre.so. 
Aborting.


salloc: error: Failed to initialize plugin stack

Please let me know what I can do to properly enable this plugin.

Regards,

Dan Milroy





[slurm-dev] -m*:cyclic with -c > 1

2014-05-20 Thread Danny Auble


The current task/affinity plugin, when cyclically binding tasks that use more 
than 1 cpu, will bind to cpus cyclically as well.


We feel such a multi-cpu task should instead be bound in more of a block 
fashion, so a task is bound to nearby cpus instead of being spread, 
potentially across sockets.


For example, on a 2-socket system with 6 cores per socket, a request of 
-n2 -c2 would result in the following binding...


task 0 : socket 0 core 0 and socket 1 core 0
task 1 : socket 0 core 1 and socket 1 core 1
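
For reference, the binding actually applied today can be seen with something
like this (a sketch; the verbose report format depends on the task/affinity
configuration):

srun -n2 -c2 --cpu_bind=verbose,cores /bin/true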

Our idea would be to change this default behavior to have the following 
binding...


task 0 : socket 0 cores 0-1
task 1 : socket 1 cores 0-1

We will probably introduce a new distribution method so if someone 
wanted to get the full cyclic distribution they could.


Does anyone have any feelings on this matter one way or the other?

Thanks,
Danny


[slurm-dev] Re: Bizarre exit code and job re-queue on node crash

2014-05-08 Thread Danny Auble


Michael, this was fixed in 2.6.4, commit d7dfa58ef.

Danny

On 05/08/2014 03:25 PM, Michael Gutteridge wrote:

Hi all..

I've run into a curious situation on our production cluster (Slurm
2.6.2, MWM 7.1.2).  It appears that sometimes when a node crashes the
job on that node is being requeued contrary to the configuration
(JobRequeue=0).  It doesn't appear that the requeue flag is being set
on the job, either.  Nor do I think Moab is at fault as it notes that
the job completed, but doesn't see the requeue.

This is curious to me as we are using the wiki2 scheduler, so it
appears that the requeue decision was made internal to Slurm (i.e.
without consulting MWM).

So... logs.  slurmctld.log:


[2014-05-07T23:24:35.143] sched: Allocate JobId=6568293
NodeList=gizmof84 #CPUs=4
...
[2014-05-08T02:53:47.852] error: Nodes gizmof84 not responding
...
[2014-05-08T02:58:01.150] Batch JobId=6568293 missing from node 0
[2014-05-08T02:58:01.150] completing job 6568293
[2014-05-08T02:58:01.150] Job 6568293 cancelled from interactive user
[2014-05-08T02:58:01.150] Requeue JobId=6568293 due to node failure
[2014-05-08T02:58:01.151] sched: job_complete for JobId=6568293
successful, exit code=4294967294
...
[2014-05-08T02:58:01.863] requeue batch job 6568293
...
[2014-05-08T02:58:11.591] _slurm_rpc_submit_batch_job JobId=6577619 usec=2788
[2014-05-08T02:58:12.024] completing job 6568293
[2014-05-08T02:58:12.025] sched: job_complete for JobId=6568293
successful, exit code=0
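
A side note on that value: it is just a negative number rendered through an
unsigned 32-bit field rather than a meaningful exit status on its own:

4294967294 = 2^32 - 2 = 0xFFFFFFFE, i.e. -2 stored as an unsigned 32-bit integer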

That exit code looks really horrible.  I'm also certain this job
wasn't cancelled by the user (assuming that is what the "cancelled
from interactive user" message means).  Is it possible that the job
record was somehow corrupted as the node failed?  Anyway, on the
slurmd.log there's indications that munge took some time getting
sorted out (communications errors), but once it does:

[2014-05-08T02:58:01.122] Purging vestigal job script
/var/tmp/slurmd/job6568293/slurm_script
[2014-05-08T02:58:11.061] reissued job credential for job 6568293
[2014-05-08T02:58:11.135] Launching batch job 6568293 for UID 45402
[2014-05-08T02:58:11.151] Received cpu frequency information for 4 cpus
[2014-05-08T02:58:11.164] [6568293] checkpoint/blcr init
[2014-05-08T02:58:12.000] [6568293] sending
REQUEST_COMPLETE_BATCH_SCRIPT, error:0
[2014-05-08T02:58:12.007] [6568293] done with job

I don't have a solid reproducible example yet, but I thought I'd see
if this is a known issue or if anyone has any thoughts on how to get
this to properly fail the jobs.

Thanks much

Michael


[slurm-dev] Re: Slurm version 14.03.2 is now available

2014-05-03 Thread Danny Auble

Commit 36b69754131b9d76b1e0ded2c8f0abafc360712e

Please check again.

On 05/03/2014 02:34 AM, Anthony Alba wrote:

Re: [slurm-dev] Slurm version 14.03.2 is now available
Hi Danny, there does not seem to be a git tag for slurm-14-03-2.


On Sat, May 3, 2014 at 5:21 AM, Danny Auble <mailto:d...@schedmd.com>> wrote:



We are pleased to announce Slurm 14.03.2, available at
http://www.schedmd.com/#repos.

Please upgrade at your earliest convenience.

Here is a list of changes/fixes since 14.03.1-2.

 -- Fix race condition if PrologFlags=Alloc,NoHold is used.
 -- Cray - Make NPC only limit running other NPC jobs on shared blades instead
    of limited non NPC jobs.
 -- Fix for sbatch #PBS -m (mail) option parsing.
 -- Fix job dependency bug. Jobs dependent upon multiple other jobs may start
    prematurely.
 -- Set "Reason" field for all elements of a job array on short-circuited
    scheduling for job arrays.
 -- Allow -D option of salloc/srun/sbatch to specify relative path.
 -- Added SchedulerParameter of batch_sched_delay to permit many batch jobs
    to be submitted between each scheduling attempt to reduce overhead of
    scheduling logic.
 -- Added job reason of "SchedTimeout" if the scheduler was not able to reach
    the job to attempt scheduling it.
 -- Add job's exit state and exit code to email message.
 -- scontrol hold/release accepts job name option (in addition to job ID).
 -- Handle when trying to cancel a step that hasn't started yet better.
 -- Handle Max/GrpCPU limits better
 -- Add --priority option to salloc, sbatch and srun commands.
 -- Honor partition priorities over job priorities.
 -- Fix sacct -c when using jobcomp/filetxt to read newer variables
 -- Fix segfault of sacct -c if spaces are in the variables.
 -- Release held job only with "scontrol release " and not by resetting
    the job's priority. This is needed to support job arrays better.
 -- Correct squeue command not to merge jobs with state pending and completing
    together.
 -- Fix issue where user is requesting --acctg-freq=0 and no memory limits.
 -- Fix issue with GrpCPURunMins if a job's timelimit is altered while the job
    is running.
 -- Temporary fix for handling our typemap for the perl api with newer perl.
 -- Fix allowgroup on bad group seg fault with the controller.
 -- Handle node ranges better when dealing with accounting max node limits.





