[slurm-dev] Re: Finding job command after fails

2017-10-18 Thread Marcin Stolarek
Some time ago we were using the slurmctld prolog for this.
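
A rough sketch of what such a prolog/epilog could look like; the script path
and log directory are only assumptions:

  #!/bin/bash
  # referenced from slurm.conf, e.g. Epilog=/etc/slurm/save_job_info.sh
  # (or Prolog=... if you prefer to capture it at start time)
  LOGDIR=/var/log/slurm/job_info
  mkdir -p "$LOGDIR"
  # the job is still known to slurmctld while the prolog/epilog runs
  scontrol show job "$SLURM_JOB_ID" > "$LOGDIR/${SLURM_JOB_ID}.txt" 2>&1

With something like that in place, the Command= line (and the rest of the
job description) survives after the record disappears from scontrol.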

2017-10-16 16:36 GMT+02:00 Ryan Richholt :

> Thanks, that sounds like a good idea. A prolog script could also handle
> this right? That way if the node crashes while the job is running, it would
> still be saved.
>
> On Mon, Oct 16, 2017 at 3:20 AM Merlin Hartley <
> merlin-sl...@mrc-mbu.cam.ac.uk> wrote:
>
>> You could also use a simple epilog script to save the output of ‘scontrol
>> show job’ to a file/database.
>>
>> M
>>
>>
>> --
>> Merlin Hartley
>> Computer Officer
>> MRC Mitochondrial Biology Unit
>> Cambridge, CB2 0XY
>> United Kingdom
>>
>> On 15 Oct 2017, at 20:49, Ryan Richholt  wrote:
>>
>> Is there any way to get the job command with sacct?
>>
>> For example, if I submit a job like this:
>>
>> $ sbatch testArgs.sh hey there
>>
>> I can get the full command from "scontrol show job":
>>
>>   ...
>>   Command=/home/rrichholt/scripts/testArgs.sh hey there
>>   ...
>>
>> But, that information is not available long-term with sacct.
>>
>> To explain why I would like this:
>>
>> I'm dealing with a workflow that submits lots of jobs for different
>> projects. Each submits the same script, but the first argument points to a
>> different project directory. When jobs fail, it's very hard to tell which
>> project they were working on, because "scontrol show job" only lasts for
>> 300 seconds. Sometimes they fail at night and I don't know until the next
>> morning.
>>
>>
>>


[slurm-dev] Re: How to limit number of running jobs in a partition?

2017-09-29 Thread Marcin Stolarek
You may take a look at this plugin; it does what you need: it unshares
(unmounts) the filesystem from the job's mount namespace if the license was
not specified.

https://github.com/fafik23/slurm_plugins/tree/master/unshare

2017-09-29 14:44 GMT+02:00 E V :

>
> Thanks, I hadn't paid attention to the license options since I thought
> they worked just like Gres. Looks like that will work nicely, I'll
> read up on them some more and start testing it out.
>
> On Thu, Sep 28, 2017 at 4:42 PM, Marcin Stolarek
>  wrote:
> > You may consider specification of licenses in slurm.conf and for jobs.
> For
> > example license=storage1*3 and #SBATCH -l storage1.
> >
> > If you cannot rely on users you can use submit plugins and namespaces to
> > umount storage for jobs without specific license.
> >
> > cheers,
> > Marcin
> >
> > 2017-09-28 17:12 GMT+02:00 E V :
> >>
> >>
> >> Looking through the man page on slurm.conf I don't see a way to set a
> >> limit for a partition on the max number of jobs to start up at a time.
> >> Is there such a thing?
> >>
> >> What I'm trying to accomplish is essentially IO throttling. I have
> >> different storage systems hooked up to a set of compute nodes. Let's
> >> say storage x can support 1 job at a time well, i.e. that job is now
> >> CPU bound. Running more then 1 becomes disk bound and they thrash and
> >> throughput goes down. Storage y can support 3 jobs at a time before
> >> throughput starts dropping. So I'd like to allow only 1 job to run at
> >> a time using x, but up to 3 using y across any of the nodes in the
> >> partition.
> >>
> >> So the 2 ways I thought of to accomplish this are either create
> >> separate partitions for x & y and limit the total number of jobs that
> >> can be run from that partition at a time(which I can't find out how to
> >> do,) or use a single partition and have Gres handle the job counting
> >> with an x & y gres. However, Gres doesn't appear to have the concept
> >> of an overall count for the partition or cluster, only a per node
> >> count. So I'm stumped. Am I missing something, or is there another way
> >> of accomplishing this?
> >
> >
>


[slurm-dev] Re: Don't allow scheduling when slurmdbd is down

2017-09-28 Thread Marcin Stolarek
What about a solution for the root cause? Shouldn't ntpd fix it?

2017-09-29 4:15 GMT+02:00 J.R. W :

>
> Hello,
>
> When our slurmdbd goes down (usually from a munge timestamps), is there a
> way to not allow anyone to submit jobs via slurmctl? When slurmdbd goes
> down, the account limits go with it and people can just hammer our cluster
> with any amount of jobs. Is there a way to make slurmdbd be a requisite for
> the slurmctld?
>
> Jordan


[slurm-dev] Re: How to limit number of running jobs in a partition?

2017-09-28 Thread Marcin Stolarek
You may consider specifying licenses in slurm.conf and requesting them for
jobs, for example Licenses=storage1:3 in slurm.conf and #SBATCH -L storage1
in the job script.
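
Spelled out a little more, with made-up license names and counts:

  # slurm.conf
  Licenses=storage1:1,storage2:3

  # job script
  #SBATCH -L storage2

Slurm then never runs more jobs holding a given license than the configured
count, across the whole cluster, which gives you the per-storage job limit
without extra partitions.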

If you cannot rely on users, you can use a submit plugin and mount
namespaces to unmount the storage for jobs that did not request the
corresponding license.

cheers,
Marcin

2017-09-28 17:12 GMT+02:00 E V :

>
> Looking through the man page on slurm.conf I don't see a way to set a
> limit for a partition on the max number of jobs to start up at a time.
> Is there such a thing?
>
> What I'm trying to accomplish is essentially IO throttling. I have
> different storage systems hooked up to a set of compute nodes. Let's
> say storage x can support 1 job at a time well, i.e. that job is now
> CPU bound. Running more then 1 becomes disk bound and they thrash and
> throughput goes down. Storage y can support 3 jobs at a time before
> throughput starts dropping. So I'd like to allow only 1 job to run at
> a time using x, but up to 3 using y across any of the nodes in the
> partition.
>
> So the 2 ways I thought of to accomplish this are either create
> separate partitions for x & y and limit the total number of jobs that
> can be run from that partition at a time(which I can't find out how to
> do,) or use a single partition and have Gres handle the job counting
> with an x & y gres. However, Gres doesn't appear to have the concept
> of an overall count for the partition or cluster, only a per node
> count. So I'm stumped. Am I missing something, or is there another way
> of accomplishing this?
>


[slurm-dev] Re: Job stuck in CONFIGURING, node is 'mix~'

2017-09-25 Thread Marcin Stolarek
I think that all you needed was to set the node state to DOWN/FAIL and then
RESUME, without actually rebooting the node. Did you try this? I remember
that in the FAQ this was suggested for jobs stuck in the CG state.
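
For example (the node name is only a placeholder):

  scontrol update NodeName=node01 State=DOWN Reason="job stuck in CONFIGURING"
  scontrol update NodeName=node01 State=RESUME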

cheers,
Marcin


[slurm-dev] Re: defaults, passwd and data

2017-09-24 Thread Marcin Stolarek
2017-09-24 9:13 GMT+02:00 Lachlan Musicman :

>
> On 24 September 2017 at 16:20, Daniel Letai  wrote:
>
>> Hello,
>>
>> B. We have active directory(AD) in our faculty, and We prefer manage
>> users/groups from there , is it possible? any guide available somewhere?
>>
>>
>> Search this mailing list, this question pops up every now and again,
>> there is no builtin solution.
>> You should consider using accounting, but if you decide to incorporate AD
>> into slurm accounting, you will have to decide how to group users and
>> accounts (create correct rules).
>>
>
>
> We are successfully using FreeIPA/SSSD in a one way trust with AD. This
> works fine with slurm.
>
So do I; however, I'm using sssd with the AD provider, joined directly into
the AD domain. It's tricky and requires a good understanding of sssd, but it
works... in general.

A separate issue is that Slurm accounting requirements may be much more
complicated than simply allowing or denying groups access to resources. In
that scenario you can think about an additional submit plugin utilizing your
AD groups, or simply configure Slurm accounts/WCKeys on top of your AD
accounts.

cheers,
Marcin


[slurm-dev] Re: Accounting using LDAP ?

2017-09-20 Thread Marcin Stolarek
Christopher,

If you want to use advanced Slurm features you'll have to disable Slurm
management in Bright. It provides really basic functionality for those who
would like to get a cluster started very fast; however, when your
configuration complexity grows, you have to manage Slurm directly...

cheers,
Marcin

2017-09-20 9:14 GMT+02:00 Loris Bennett :

>
> Hi Chris,
>
> Christopher Samuel  writes:
>
> > On 20/09/17 15:53, Loris Bennett wrote:
> >
> >> Having said that, the only scenario I can see being easily automated is
> >> one where each user only has one association, namely with their Unix
> >> group, and everyone has equal shares.  This is our set up, but as soon
> >> as you have, say, users with multiple associations and/or membership in
> some
> >> associations confers more shares automation becomes very difficult.
> >
> > The user management system we use adds/removes users to accounts (which
> > map to projects in our lingo) whenever a user is added/removed to a
> > project as well as creating/deleting them.  Users can change their
> > default project which changes their default account in Slurm.
>
> Is the user management system homegrown or something more generally
> available?
>
> Cheers,
>
> Loris
>
> --
> Dr. Loris Bennett (Mr.)
> ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
>


[slurm-dev] Re: Can you run SLURM on a single node ?

2017-08-17 Thread Marcin Stolarek
However, I'd advise you to create a VM with dedicated CPUs to act as a
"login node". If you allow people to log in to the one node that is also the
compute node, you have to bind the ssh processes to a dedicated CPU to
prevent resource usage outside of Slurm.
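
One way to do that binding on a systemd-based node (this assumes a systemd
recent enough to support AllowedCPUs and that ssh sessions land in
user.slice; it is only one option):

  # confine interactive sessions to two cores, leaving the rest to slurmd
  systemctl set-property user.slice AllowedCPUs=0-1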

2017-08-10 15:15 GMT+02:00 Benjamin Redling :

>
> Am 10. August 2017 13:47:21 MESZ, schrieb Sean McGrath <
> smcg...@tchpc.tcd.ie>:
> >
> >Yes, you can run slurm on a single node. There is no need for for a
> >different
> >head and compute node(s).
> >
> >You will need to set Shared=Yes if you want multiple people to be able
> >to run on
> >the machine simultaneously.
> >
> >The slurm.conf will have a single node defined in it.
> >
> >Best
> >
> >Sean
> >
> >On Thu, Aug 10, 2017 at 05:39:29AM -0600, Carlos Lijeron wrote:
> >
> >> Hi Everyone,
> >>
> >> In order to use resources more efficiently on a server that has 64
> >CPU Cores and 1 TB of RAM, is it possible to use SLURM on a stand alone
> >server, or do you always need a head node and compute nodes to setup
> >the clients?   Please advise.
> >>
> >> Thank you.
> >>
> >>
> >> Carlos.
> >>
> >>
>
> AFAIK "Shared" is about resources and known as  "OverSubscribe" in newer
> versions.
> As long as constrains are resource based and nodes are not reserved
> exclusively to a single user multiple jobs from different users are
> possible even without oversubscription.
>
> BR
> --
> FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
> vox: +49 3641 9 44323 | fax: +49 3641 9 44321
>


[slurm-dev] Re: How to print information message at submission time.

2017-06-19 Thread Marcin Stolarek
I don't think it's possible to print a message if the job is accepted into
the queue.

2017-06-19 18:16 GMT+02:00 :

> Hello,
>
> I'm using *job_submit* plugin (C langage) to manage users job submission
> on ours systems.
>
> I would like to print an information message at the terminal each time a
> job is submitted and I don't find how to do.
>
>
>
> It works fine in case of error message using *err_msg* parameter of the
> *job_submit* function :
>
> *extern int job_submit(struct job_descriptor *job_desc, uint32_t
> submit_uid, char **err_msg)*
>
>
>
>
> Best regards,
> Gerard Gil
>
>
> Centre Informatique National de l'Enseignement Superieur
> 950, rue de Saint Priest
> 34097 Montpellier CEDEX 5
> FRANCE
>
>
>


[slurm-dev] Re: Looking for distributions of wait times for jobs submitted over the past year

2017-06-17 Thread Marcin Stolarek
On Thu, 15 Jun 2017 at 21:58, Barry Moore  wrote:

> Thanks alot! These will be very helpful!
>
> On Thu, Jun 15, 2017 at 3:52 PM, Kilian Cavalotti <
> kilian.cavalotti.w...@gmail.com> wrote:
>
>>
>> Hi Barry,
>>
>> On Thu, Jun 15, 2017 at 9:16 AM, Barry Moore  wrote:
>> > Does anyone have a script or knowledge of how to query wait times for
>> Slurm
>> > jobs in the last year or so?
>>
>> With the help of histogram.py from
>> https://github.com/bitly/data_hacks, you can have a one-liner:
>>
>> $ SLURM_TIME_FORMAT="%s" sacct -nDX -o submit,start -S $(date -d "now
>> -1 year" +%Y-%m-%d) | awk '{w=($2-$1)/60; if (w>=0) print w}' |
>> histogram.py -p --no-mvsd -f "%8.0f"
>> # NumSamples = 255101; Min = 0.00; Max = 18688.60
>> # each ∎ represents a count of 3299
>>   0 - 1869 [247430]: ∎∎∎ (96.99%)
>>1869 - 3738 [  3437]: ∎ (1.35%)
>>3738 - 5607 [  1974]:  (0.77%)
>>5607 - 7475 [  1045]:  (0.41%)
>>7475 - 9344 [   677]:  (0.27%)
>>9344 -11213 [53]:  (0.02%)
>>   11213 -13082 [55]:  (0.02%)
>>   13082 -14951 [46]:  (0.02%)
>>   14951 -16820 [95]:  (0.04%)
>>   16820 -18689 [   289]:  (0.11%)
>>
>> That's wait times in minutes for the past year. You can add the
>> relevant divider in the awk part to get that in other time units.
>> Also, adjusting bins or using log scale in histogram.py options could
>> help.
>>
>> Cheers,
>> --
>> Kilian
>>
>
>
>
> --
> Barry E Moore II, PhD
> E-mail: bmoor...@pitt.edu
>
> Assistant Research Professor
> Center for Simulation and Modeling
> University of Pittsburgh
> Pittsburgh, PA 15260
>


[slurm-dev] Re: Looking for distributions of wait times for jobs submitted over the past year

2017-06-15 Thread Marcin Stolarek
My advice would be to export the accounting data to XDMoD.

Cheers
Marcin

On Thu, 15 Jun 2017 at 18:17, Barry Moore  wrote:

> Hey All,
>
> Does anyone have a script or knowledge of how to query wait times for
> Slurm jobs in the last year or so?
>
> Thank you,
>
> Barry
>
> --
> Barry E Moore II, PhD
> E-mail: bmoor...@pitt.edu
>
> Assistant Research Professor
> Center for Simulation and Modeling
> University of Pittsburgh
> Pittsburgh, PA 15260
>


[slurm-dev] Re: LDAP required?

2017-04-10 Thread Marcin Stolarek
but... is LDAP such a big issue?

2017-04-10 22:03 GMT+02:00 Jeff White :

> Using Salt/Ansible/Chef/Puppet/Engine is another way to get it done.
> Define your users in states/playbooks/whatever and don't bother with
> painful LDAP or ancient NIS solutions.
>
> --
> Jeff White
> HPC Systems Engineer
> Information Technology Services - WSU
>
> On 04/10/2017 09:39 AM, Alexey Safonov wrote:
>
> If you don't want to share passwd and setup LDAP which is complex task you
> can setup NIS. It will take 30 minutes of your time
>
> Alex
>
> 11 апр. 2017 г. 0:35 пользователь "Raymond Wan" 
> написал:
>
>>
>> Dear all,
>>
>> I'm trying to set up a small cluster of computers (i.e., less than 5
>> nodes).  I don't expect the number of nodes to ever get larger than
>> this.
>>
>> For SLURM to work, I understand from web pages such as
>> https://slurm.schedmd.com/accounting.html
>> 
>> that UIDs need to be shared
>> across nodes.  Based on this web page, it seems sharing /etc/passwd
>> between nodes appears sufficient.  The word LDAP is mentioned at the
>> end of the paragraph as an alternative.
>>
>> I guess what I would like to know is whether it is acceptable to
>> completely avoid LDAP and use the approach mentioned there?  The
>> reason I'm asking is that I seem to be having a very nasty time
>> setting up LDAP.  It doesn't seem as "easy" as I thought it would be
>> [perhaps it was my fault for thinking it would be easy...].
>>
>> If I can set up a small cluster without LDAP, that would be great.
>> But beyond this web page, I am wondering if there are suggestions for
>> "best practices".  For example, in practice, do most administrators
>> use LDAP?  If so and if it'll pay off in the end, then I can consider
>> continuing with setting it up...
>>
>> Thanks a lot!
>>
>> Ray
>>
>
>


[slurm-dev] Re: Fwd: job requeued in held state

2017-04-03 Thread Marcin Stolarek
Have you checked the slurmd/prolog logs? It looks like your job was eligible
to run, but failed to start on the compute node. If it failed in the prolog
you can requeue the job without it being held by setting
SchedulerParameters=nohold_on_prolog_fail.

cheers,
Marcin

2017-04-03 21:31 GMT+02:00 Chris Woelkers - NOAA Affiliate <
chris.woelk...@noaa.gov>:

>
> I am running a small HPC, only 24 nodes, via slurm and am having an
> issue where one of the users is unable to submit any jobs.
> The user is new and whenever a job is submitted it shows the "job
> requeued in held state" state and is never actually ran. We have left
> the job sitting for over three days and it does not start. We have
> tried releasing the job and it does not start. Here are the log
> entries after an attempted release:
>
> [2017-04-03T19:16:24.173] sched: update_job: releasing hold for job_id
> 1938 uid 0
> [2017-04-03T19:16:24.174] _slurm_rpc_update_job complete JobId=1938
> uid=0 usec=375
> [2017-04-03T19:16:24.919] sched: Allocate JobId=1938
> NodeList=rhinonode[07-14] #CPUs=192
> [2017-04-03T19:16:25.017] _slurm_rpc_requeue: Processing RPC:
> REQUEST_JOB_REQUEUE from uid=0
> [2017-04-03T19:16:25.035] Requeuing JobID=1938 State=0x0 NodeCnt=0
>
> The user has the same permissions as the older users that can run jobs.
> The script that is being run is a simple test script and no matter
> where the output is redirected, an NFS mount(for our SAN), the local
> home directory, or the tmp directory, the result is the same.
>
> Any idea as to what might be happening?
>
> Thanks,
>
> Chris Woelkers
> Caelum Research Corp.
> Linux Server and Network Administrator
> NOAA GLERL
>


[slurm-dev] Re: Priority blocking jobs despite idle machines

2017-03-25 Thread Marcin Stolarek
Shorten your time specification for this job if possible.  Ask your admins
:)
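
For a job that is still pending that could be as simple as (the job id is
taken from the listing above, the new limit is only an example):

  scontrol update JobId=84973 TimeLimit=04:00:00

A shorter limit makes it much easier for the backfill scheduler to place the
job on the idle nodes.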

2017-03-24 16:32 GMT+01:00 Stefan Doerr :

> Hm I don't know how since only the admins have access to that stuff. I
> could ask them if you could be a bit more specific :)
>
> On Fri, Mar 24, 2017 at 1:39 PM, Jared David Baker 
> wrote:
>
>> Stefan,
>>
>>
>>
>> I believe I experienced this as well. Very similar situation at least
>> (and still looking into it). Can you provide your conf files?
>>
>>
>>
>> Best, Jared
>>
>>
>>
>> *From: *Roe Zohar 
>> *Reply-To: *slurm-dev 
>> *Date: *Friday, March 24, 2017 at 6:17 AM
>> *To: *slurm-dev 
>> *Subject: *[slurm-dev] Re: Priority blocking jobs despite idle machines
>>
>>
>>
>> I had the same problem. Didn't find a solution.
>>
>>
>>
>> On Mar 24, 2017 12:59 PM, "Stefan Doerr"  wrote:
>>
>> OK so after some investigation it seems that the problem is when there
>> are miha jobs pending in the queue (I truncated the squeue before for
>> brevity).
>>
>> These miha jobs require ace1, ace2 machines so they are pending since
>> these machines are full right now.
>>
>> For some reason SLURM thinks that because the miha jobs are pending it
>> cannot work on my jobs (which don't require specific machines) and puts
>> mine pending as well.
>>
>>
>>
>> Once we cancelled the pending miha jobs (leaving the running ones
>> running), I cancelled also my pending jobs, resent them and it worked.
>>
>>
>>
>> This seems to me like a quite problematic limitation in SLURM.
>>
>> Any opinions on this?
>>
>>
>>
>> On Fri, Mar 24, 2017 at 10:43 AM, Stefan Doerr 
>> wrote:
>>
>> Hi I was wondering about the following. I have this situation
>>
>>
>>
>>  84974 multiscal testpock   sdoerr PD   0:00  1
>> (Priority)
>>
>>  84973 multiscal testpock   sdoerr PD   0:00  1
>> (Priority)
>>
>>  81538 multiscalRC_f7 miha  R   17:41:56  1 ace2
>>
>>  81537 multiscalRC_f6 miha  R   17:42:00  1 ace2
>>
>>  81536 multiscalRC_f5 miha  R   17:42:04  1 ace2
>>
>>  81535 multiscalRC_f4 miha  R   17:42:08  1 ace2
>>
>>  81534 multiscalRC_f3 miha  R   17:42:12  1 ace1
>>
>>  81533 multiscalRC_f2 miha  R   17:42:16  1 ace1
>>
>>  81532 multiscalRC_f1 miha  R   17:42:20  1 ace1
>>
>>  81531 multiscalRC_f0 miha  R   17:42:24  1 ace1
>>
>>
>>
>>
>>
>> [sdoerr@xxx Fri10:35 slurmtest]  sinfo
>>
>> PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
>>
>> multiscale*  up   infinite  1 drain* ace3
>>
>> multiscale*  up   infinite  1  down* giallo
>>
>> multiscale*  up   infinite  2mix ace[1-2]
>>
>> multiscale*  up   infinite  5   idle arancio,loro,oliva,rosa,suo
>>
>>
>>
>>
>>
>> The miha jobs use exclude nodes to run only on machines with good GPUs
>> (ace1, ace2)
>>
>> As you can see I have 5 machines idle which could serve my jobs but my
>> jobs are for some reason stuck in pending due to "priority". I am indeed
>> very sure that these 5 nodes satisfy the hardware requirements for my jobs
>> (also ran them yesterday).
>>
>>
>>
>> It's just that for some reason, which we have had before, these
>> node-excluding miha jobs seem to get the rest stuck in priority. If we
>> cancel them, then mine will go through to the idle machines. However we
>> cannot figure out what is the cause for that. I paste below the scontrol
>> show job for one miha and one of my jobs.
>>
>>
>>
>> Many thanks!
>>
>>
>>
>>
>>
>> JobId=81534 JobName=RC_f3
>>
>>UserId=miha(3056) GroupId=lab(3000)
>>
>>Priority=33 Nice=0 Account=lab QOS=normal
>>
>>JobState=RUNNING Reason=None Dependency=(null)
>>
>>Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
>>
>>RunTime=17:44:07 TimeLimit=UNLIMITED TimeMin=N/A
>>
>>SubmitTime=2017-03-23T16:53:14 EligibleTime=2017-03-23T16:53:14
>>
>>StartTime=2017-03-23T16:53:15 EndTime=Unknown
>>
>>PreemptTime=None SuspendTime=None SecsPreSuspend=0
>>
>>Partition=multiscale AllocNode:Sid=blu:26225
>>
>>ReqNodeList=(null) ExcNodeList=arancio,giallo,loro,oliva,pink,rosa,suo
>>
>>NodeList=ace1
>>
>>BatchHost=ace1
>>
>>NumNodes=1 NumCPUs=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
>>
>>TRES=cpu=1,mem=11500,node=1,gres/gpu=1
>>
>>Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
>>
>>MinCPUsNode=1 MinMemoryNode=11500M MinTmpDiskNode=0
>>
>>Features=(null) Gres=gpu:1 Reservation=(null)
>>
>>Shared=OK Contiguous=0 Licenses=(null) Network=(null)
>>
>>Power= SICP=0
>>
>>
>>
>>
>>
>> JobId=84973 JobName=testpock2
>>
>>UserId=sdoerr(3041) GroupId=lab(3000)
>>
>>Priority=33 Nice=0 Account=lab QOS=normal
>>
>>JobState=PENDING Reason=Priority Dependency=(null)
>>
>>Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
>>
>>RunTime=00:00:00 TimeLimit=UNLIMITED TimeMin=N/A
>>
>>SubmitTime=2017-03-2

[slurm-dev] sreport inconsistency

2017-03-17 Thread Marcin Stolarek
I've observed that the utilization and top-users listings look inconsistent
to me.
Do I understand correctly that the percentages used by individual users
should sum to the cluster utilization's allocated percentage?

cheers,
Marcin

# sreport cluster utilization Start=2017-03-01 -t percent

Cluster Utilization 2017-03-01T00:00:00 - 2017-03-16T23:59:59
Use reported in Percentage of Total

  Cluster    Allocated  Down    PLND Down  Idle    Reserved  Reported
  ---------  ---------  ------  ---------  ------  --------  --------
  slurm_cl+  44.16%     34.18%  0.00%      20.87%  0.80%     100.00%

# sreport user topusage Start=2017-03-01  -t percent

Top 10 Users 2017-03-01T00:00:00 - 2017-03-16T23:59:59 (1382400 secs)
Use reported in Percentage of Total

  Cluster    Login  Proper Name  Account   Used     Energy
  ---------  -----  -----------  --------  -------  -------
  slurm_cl+  dXXX   RXX          root      33.86%   0.00%
  slurm_cl+  lXXX   LXX          root       0.44%   0.00%
  slurm_cl+  sXXX   BXXl         root       0.20%   0.00%
  slurm_cl+  fXXX   NXX          root       0.06%   0.00%
  slurm_cl+  qXXX   SXX          root       0.00%   0.00%


[slurm-dev] RE: Job-Specific Working Directory on Local Scratch

2017-03-14 Thread Marcin Stolarek
What about setting a randomly named working directory in a submit plugin
and creating that directory in the prolog?
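
The prolog side could be as small as this rough sketch, assuming the submit
plugin has already pointed the job's WorkDir at a path under a local scratch
prefix (the /scratch prefix is made up):

  #!/bin/bash
  workdir=$(scontrol show job "$SLURM_JOB_ID" | sed -n 's/.*WorkDir=//p')
  case "$workdir" in
    /scratch/*)
      mkdir -p "$workdir"
      chown "$SLURM_JOB_UID" "$workdir"
      ;;
  esac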


[slurm-dev] Re: Power outage causes wrong reports

2017-02-21 Thread Marcin Stolarek
If you are on Slurm 16, you can try:

sacctmgr show RunawayJobs

From the man page: "Used only with the list or show command to report
current jobs that have been orphaned on the local cluster and are now
runaway. If there are jobs in this state it will also give you an option to
'fix' them."

2017-01-23 11:39 GMT+01:00 Paddy Doyle :

>
> Hi Lucas,
>
> This old thread might help:
>
> https://groups.google.com/forum/#!topic/slurm-devel/TQcerLLEKAU
>
> Paddy
>
> On Fri, Jan 20, 2017 at 10:00:00AM -0800, Lucas Vuotto wrote:
>
> >
> > Hi all,
> > sreport was showing that an user was using more CPU hours per week
> > than available. After checking the output of sacct, we found that some
> > jobs from an array didn't ended:
> >
> > $ sacct -j 69204 -o jobid%-14,state%6,start,elapsed,end
> >
> >  JobID  State   StartElapsed End
> >
> > -- -- --- -- ---
> > 69204_[1-1000] FAILED 2016-11-09T17:46:50   00:00:00 2016-11-09T17:46:50
> > 69204_1FAILED 2016-11-09T17:46:44 71-20:25:55Unknown
> > 69204_2FAILED 2016-11-09T17:46:44 71-20:25:55Unknown
> > [...]
> > 69204_295  FAILED 2016-11-09T17:46:46 71-20:25:53Unknown
> > 69204_296  FAILED 2016-11-09T17:46:46 71-20:25:53Unknown
> > 69204_297  FAILED 2016-11-09T17:46:46   00:00:00 2016-11-09T17:46:46
> > [...]
> > 69204_999  FAILED 2016-11-09T17:46:50   00:00:00 2016-11-09T17:46:50
> >
> > It seems that somehow those jobs got stucked (~72 days after
> > 2016-11-09 is today, 2017-01-20, and that's why the wrong reports).
> > scancel says that 69204 is an invalid job id.
> >
> > Any idea on how to fix this? We're thinking about deleting the entries
> > of those jobs in the DB. Is it safe to run "arbitrary" commands in the
> > DB, bypassing slurmdbd?
> >
> > Thanks in advance.
> >
> >
> > -- lv.
> >
> >
> > -- lv.
> >
>
> --
> Paddy Doyle
> Trinity Centre for High Performance Computing,
> Lloyd Building, Trinity College Dublin, Dublin 2, Ireland.
> Phone: +353-1-896-3725
> http://www.tchpc.tcd.ie/
>


[slurm-dev] RE: A little bit help from my slurm-friends

2017-02-16 Thread Marcin Stolarek
You can also have a submit plugin that puts a job into multiple partitions
if none is specified. This should reduce the drawback of having multiple
partitions.

However, I think that with features and the topology plugin you should be
able to avoid a multiple-partition setup altogether.

cheers
Marcin

2017-01-17 9:49 GMT+01:00 David WALTER :

>
> Thanks Paul for your response and your advices.
>
> That's actually the reason why they asked me to set 3 and now 4
> partitions. As we have now 4 different generation of nodes with significant
> differences of hardware (not the same CPU, not the same amount of RAM) we
> thought that it was a good solution.
>
> I will test with people to adjust the solution with the needs of the many.
>
> Thanks again
>
> --
> David WALTER
> The computer guy
> david.wal...@ens.fr
> 01/44/32/27/94
>
> INSERM U960
> Laboratoire de Neurosciences Cognitives
> Ecole Normale Supérieure
> 29, rue d'Ulm
> 75005 Paris
>
> -Message d'origine-
> De : Paul Edmon [mailto:ped...@cfa.harvard.edu]
> Envoyé : lundi 16 janvier 2017 16:37
> À : slurm-dev
> Objet : [slurm-dev] RE: A little bit help from my slurm-friends
>
>
> I agree having multiple partitions will decrease efficiency of the
> scheduler.  That said if you have to do it, you have to do it.  Using the
> features is a good way to go if people need specific ones.  I could see
> having multiple partitions so you can charge differently for each
> generation of hardware, as run times will invariably be different.
> Still if that isn't a concern just have a single queue.
>
> For multifactor I would turn on fairshare and age.  JobSize really isn't
> useful unless you have people running multicore jobs and you want to
> prioritize, or deprioritize those.
>
> If you end up in a multipartition scenario then I recommend having a
> backfill queue that underlies all the partitions and setting up REQUEUE on
> that partition.  That way people can farm idle cycles.  This is especially
> good for people who are hardware agnostic and don't really care when their
> jobs get done but rather just have a ton to do that can be interrupted at
> any moment. That's what we do here and we have 110 partitions.  Our
> backfill queue does a pretty good job up picking up the idle cores but
> still there is structural inefficiencies with that many partitions so we
> never get above about 70% usage of our hardware.
>
> So just keep that in mind when you are setting things up.  More partitions
> means more structural inefficiency but it does give you other benefits such
> as isolating hardware for specific use.  It really depends on what you
> need.  I highly recommend experimenting to figure out what fits you and
> your users best.
>
> -Paul Edmon-
>
> On 1/16/2017 10:16 AM, Loris Bennett wrote:
> > David WALTER  writes:
> >
> >> Dear Loris,
> >>
> >> Thanks for your response !
> >>
> >> I'm going to look on this features in slurm.conf.  I only configured
> >> the CPUs, Sockets per node. Do you have any example or link to
> >> explain me how it's working and what can I use ?
> > It's not very complicated.  A feature is just a label, so if you had
> > some nodes with Intel processors and some with AMD, you could attach
> > the features, e.g.
> >
> > NodeName=node[001,002] Procs=12 Sockets=2 CoresPerSocket=6
> > ThreadsPerCore=1 RealMemory=42000 State=unknown Feature=intel
> > NodeName=node[003,004] Procs=12 Sockets=2 CoresPerSocket=6
> > ThreadsPerCore=1 RealMemory=42000 State=unknown Feature=amd
> >
> > Users then just request the required CPU type in their batch scripts
> > as a constraint, e.g:
> >
> > #SBATCH --constraint="intel"
> >
> >> My goal is to respond to people needs and launch their jobs as fast
> >> as possible without losing time when one partition is idle whereas
> >> the others are fully loaded.
> > The easiest way to avoid the problem you describe is to just have one
> > partition.  If you have multiple partitions, the users have to
> > understand what the differences are so that they can choose sensibly.
> >
> >> That's why I thought the fair share factor was the best solution
> > Fairshare won't really help you with the problem that one partition
> > might be full while another is empty.  It will just affect the
> > ordering of jobs in the full partition, although the weight of the
> > partition term in the priority expression can affect the relative
> > attractiveness of the partitions.
> >
> > In general, however, I would suggest you start with a simple set-up.
> > You can always add to it later to address specific issues as they arise.
> > For instance, you could start with one partition and two QOS: one for
> > normal jobs and one for test jobs.  The latter could have a higher
> > priority, but only a short maximum run-time and possibly a low maximum
> > number of jobs per user.
> >
> > Cheers,
> >
> > Loris
> >
>


[slurm-dev] Re: Standard suspend/resume scripts?

2017-02-16 Thread Marcin Stolarek
I don't think there is one approach that is good for everyone; it depends on
the way you manage the cluster. If you have diskless/stateless nodes it can
also be a good idea to force a power-off with ipmitool.

2017-02-16 8:19 GMT+01:00 Loris Bennett :

>
> Lachlan Musicman  writes:
>
> > Re: [slurm-dev] Standard suspend/resume scripts?
> >
> > If you are looking to suspend and resume jobs, use scontrol:
> >
> > scontrol suspend 
> > scontrol resume 
> >
> > https://slurm.schedmd.com/scontrol.html
> >
> > The docs you are pointing to look more like taking nodes offline in
> times of low usage?
>
> Yes, because that's what I'm interested in ;-)
>
> Cheers,
>
> Loris
>
> --
> Dr. Loris Bennett (Mr.)
> ZEDAT, Freie Universität Berlin Email loris.benn...@fu-berlin.de
>


[slurm-dev] Re: build slurm on musl?

2017-02-14 Thread Marcin Stolarek
can't you just use a different libc (glibc) for Slurm?

2017-01-16 20:31 GMT+01:00 Rowan, Jim :

>
> Hi,
>
> We're trying to bring up slurm (compute nodes, not the master) on a
> platform that uses musl for libc.   Musl doesn’t support lazy binding of
> symbols in dynamic objects — a feature that seems to be a cornerstone of
> the plugin implementation.
>
> Has anyone done work on getting around this issue?Ideas to share on
> what approach to take?   Our cluster is very simple; we don’t need any of
> the fancy things that plugins provide, but it appears that even linear
> scheduling is a plugin.
>
>
>
>
> Jim Rowan
> j...@codeaurora.org
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted
> by the Linux Foundation
>
>


[slurm-dev] Re: Numbering of physical and hyper cores

2017-02-14 Thread Marcin Stolarek
I don't think you can specify it. I believe Slurm makes use of an external
library to recognize the core topology.
To check whether this is a bug or designed behaviour, you can try the
--hint=nomultithread option to srun.

cheers,
Marcin

2017-02-09 8:21 GMT+01:00 Ulf Markwardt :

> Dear all,
>
> where can I tell Slurm what core numbers belong to the same physical core?
>
> The physical cores on our KNL are 0-63, followed by hyperthreads 64-255.
>   cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list
>   0,64,128,192
>
> When I ask for 4 cores with "srun --pty -c 4 -p knl bash" I see:
>taurusknl1 /home/mark taskset -pc $$
>pid 285662's current affinity list: 0,64,128,192
> but these are not 4 cores but only one core!
>
> It looks like Slurm does not recognize the numbering scheme for the
> cores on the node. Where can I specify this?
>
>
> Thank you,
> Ulf
>
> "scontrol show node " says:
>CoreSpecCount=1 CPUSpecList=252-255
> this, again, are 4 threads on 4 different cores!
>
> This is my node entry for this guy:
> NodeName=taurusknl[1] Sockets=1 CoresPerSocket=64 ThreadsPerCore=4
> State=UNKNOWN RealMemory=94000 Weight=64 CoreSpecCount=1
>
>
> --
> ___
> Dr. Ulf Markwardt
>
> Technische Universität Dresden
> Center for Information Services and High Performance Computing (ZIH)
> 01062 Dresden, Germany
>
> Phone: (+49) 351/463-33640  WWW:  http://www.tu-dresden.de/zih
>
>


[slurm-dev] Re: Removing partition killed jobs

2017-02-14 Thread Marcin Stolarek
I think that generally you should set this partition to the DRAIN state.
This will prevent new job submissions but still allow all running and
pending jobs to execute.
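
For example, using one of the partition names from your message:

  scontrol update PartitionName=gpu_fermi State=DRAIN

Once the jobs that still reference it have finished, the partition can be
removed from slurm.conf without slurmctld killing anything.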

I don't think it's possible to change the partition of running jobs; for
pending jobs you could have simply done: scontrol update job=JOBID
partition=NEWPART

If you remove the partition from the configuration, then the job structure
holds a reference to a non-existent partition. Such a structure cannot be
recreated when you restart Slurm, so the jobs were removed.

cheers,
Marcin

2017-02-13 18:12 GMT+01:00 Nicholas McCollum :

> Just ran into an issue while removing old nodes that I had in a
> partition.  We use a job_submit.lua script that has every job submitted
> to every partition.
>
> local submit_part = ""
>   if asc_job_partition == "" then
> asc_job_partition = "dmc,uv"
>   end
>   if string.find( asc_job_partition , 'dmc' ) then
> submit_part = (submit_part .. "dmc-ivy-bridge,dmc-haswell,dmc-
> broadwell,gpu_kepler,gpu_pascal,")
>   end
>   if string.find( asc_job_partition , 'uv' ) then
> submit_part = (submit_part .. "uv,")
>   end
>   if string.find( asc_job_partition , 'knl' ) then
> submit_part = (submit_part .. "knl,")
>   end
>   --[[ Strips the last comma off the string and writes the new info ]]-
> -
>   job_desc.partition = string.sub(submit_part, 1, -2)
>
> I had previously gpu_fermi and gpu_tesla in my string.  All currently
> running jobs had submit partitions with this string in it, although all
> of the nodes were drained in the gpu_fermi and gpu_tesla partitions.
>
> I went ahead and removed the line from my job_submit.lua, and removed
> the partition and nodes from the slurm.conf.  I then initiated a
> scontrol reconfigure.
>
> After a minute or two I noticed then, that all the jobs had
> disappeared.  Nothing pending, nothing running.
>
> The logs showed 'error: Invalid partition (gpu_fermi) for job 70496'
> and that SLURM had sent a terminate command to all of the running jobs
> because of this.
>
> So, a cautionary tale for all.  I think in the future I will edit my
> job_submit.lua script and wait for all the jobs that have ran through
> it to finish before removing partitions.
>
> My question for the group is, other than the above mentioned method, is
> there something I could have done differently to prevent SLURM from
> killing jobs when removing partitions?
>
> Thanks!
>
> --
> Nicholas McCollum
> HPC Systems Administrator
> Alabama Supercomputer Authority


[slurm-dev] Re: Rename a SLURM account?

2017-02-14 Thread Marcin Stolarek
Maybe set up a test account and try a simple update on the database:
update clustername_assoc_table set acct="new" where id_assoc=YOURASSOCID;

I haven't checked the details, but I think that only id_assoc is used as a
foreign key (or assumed to be a key) within Slurm, and acct should only be a
string used to display a human-readable name.

I'm not sure about this and I haven't tested it.

cheers,
Marcin


2017-02-13 20:12 GMT+01:00 Ryan Novosielski :

> Yeah, I was afraid of that. My only issue with that is that it will
> negatively affect accounting data (or may positively impact a user when it
> should not, by associating their usage with a different/no-longer-existent
> lab). I don’t know that there’s any way around that, is there?
>
> > On Feb 11, 2017, at 7:51 AM, Will French 
> wrote:
> >
> >
> > Your only option is to delete the association and recreate it with the
> appropriate account name. An association is a user+account+[partition] so
> none of those can be changed once the association exists.
> >
> > Will
> >
> >> On Feb 10, 2017, at 5:37 PM, Ryan Novosielski 
> wrote:
> >>
> >> Hope someone has an idea here:
> >>
> >> We apparently accidentally named an account incorrectly at my
> organization. Trying to update it, I got the error message “Can’t modify
> the name of an account”. Is there any other recourse that anyone’s aware
> of? I’m running SLURM 15.08.x at the moment.
> >>
> >> Thanks.
> >>
> >> --
> >> 
> >> || \\UTGERS,  |---*
> O*---
> >> ||_// the State   | Ryan Novosielski - novos...@rutgers.edu
> >> || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS
> Campus
> >> ||  \\of NJ   | Office of Advanced Research Computing - MSB
> C630, Newark
> >>   `'
> >>
>
> --
> 
> || \\UTGERS, |---*
> O*---
> ||_// the State  | Ryan Novosielski - novos...@rutgers.edu
> || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS
> Campus
> ||  \\of NJ  | Office of Advanced Research Computing - MSB C630, Newark
>  `'
>
>


[slurm-dev] Re: Stopping compute usage on login nodes

2017-02-10 Thread Marcin Stolarek
On the cluster I've been managing we had a solution with pam_script that
chose two random cores for each user and bound his session to those (a
second session reuses the same cores). I think it's quite a good solution,
since
1) a user is not able to take all of the server's resources, and
2) the probability that two users are bound to the same resources is
decreased (so one user will not affect the others).
It can be tuned by changing the two cores to whatever number is optimal for
the login node's resources and the number of users logged in; a simplified
sketch of the idea follows below.
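
A much simplified sketch of the idea, shown here as a profile script rather
than the pam_script we actually used (core counts and the guard are
assumptions):

  # e.g. /etc/profile.d/login_cores.sh on the login node
  if [ -z "$SLURM_JOB_ID" ] && [ "$(nproc)" -ge 4 ]; then   # skip real batch jobs
      ncores=$(nproc)
      pair=$(( ( $(id -u) % (ncores / 2) ) * 2 ))           # deterministic per user
      taskset -pc "${pair},$((pair + 1))" $$ >/dev/null     # bind shell and children
  fi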

In addition to this we had a simple cron job to notify the admins when a
user process's CPU time exceeded 2 minutes and the difference between real
time and CPU time was small.

cheers,
Marcin

2017-02-09 20:01 GMT+01:00 Ryan Novosielski :

> I have used ulimits in the past to limit users to 768MB of RAM per
> process. This seemed to be enough to run anything they were actually
> supposed to be running. I would use cgroups on a more modern system (this
> was RHEL5).
>
> A related question: we used cgroups on a CentOS 6 system, but then
> switched our accounts to private user groups as opposed to a more general
> "hpcusers" group. It doesn't seem like there is a way to use cgroups on a
> secondary group, or any other easy way to do this. The setup was that the
> main user group was limited to "most" of the machine and users were limited
> to some percentage of the most. With users not sharing any group, this
> stopped working. Anyone know of an alternative (I guess doing it based on
> excluding system users and applying limits to everyone else, but this seems
> hamfisted).
>
> --
> 
> || \UTGERS,   |---*O
> *---
> ||_// the State | Ryan Novosielski - novos...@rutgers.edu
> || \ University | Sr. Technologist - 973/972.0922 <(973)%20972-0922>
> (2x0922) ~*~ RBHS Campus
> ||  \of NJ | Office of Advanced Research Computing - MSB C630,
> Newark
> `'
>
> On Feb 9, 2017, at 13:05, Ole Holm Nielsen 
> wrote:
>
> We limit the cpu times in /etc/security/limits.conf so that user processes
> have a maximum of 10 minutes. It doesn't eliminate the problem completely,
> but it's fairly effective on users who misunderstood the role of login
> nodes.
>
>
>
> On Thu, Feb 9, 2017 at 6:38 PM +0100, "Jason Bacon" 
> wrote:
>
> We simply make it impossible to run computational software on the head
>> nodes.
>>
>> 1.No scientific software packages are installed on the local disk.
>> 2.Our NFS-mounted application directory is mounted with noexec.
>>
>> Regards,
>>
>>  Jason
>>
>> On 02/09/17 07:09, John Hearns wrote:
>> >
>> > Does anyone have a good suggestion for this problem?
>> >
>> > On a cluster I am implementing I noticed a user is running a code on
>> > 16 cores, on one of the login nodes, outside the batch system.
>> >
>> > What are the accepted techniques to combat this? Other than applying a
>> > LART, if you all know what this means.
>> >
>> > On one system I set up a year or so ago I was asked to implement a
>> > shell timeout, so if the user was idle for 30 minutes they would be
>> > logged out.
>> >
>> > This actually is quite easy to set up as I recall.
>> >
>> > I guess in this case as the user is connected to a running process
>> > then they are not ‘idle’.
>> >
>> > Any views or opinions presented in this email are solely those of the
>> > author and do not necessarily represent those of the company.
>> > Employees of XMA Ltd are expressly required not to make defamatory
>> > statements and not to infringe or authorise any infringement of
>> > copyright or any other legal right by email communications. Any such
>> > communication is contrary to company policy and outside the scope of
>> > the employment of the individual concerned. The company will not
>> > accept any liability in respect of such communication, and the
>> > employee responsible will be personally liable for any damages or
>> > other liability arising. XMA Limited is registered in England and
>> > Wales (registered no. 2051703). Registered Office: Wilford Industrial
>> > Estate, Ruddington Lane, Wilford, Nottingham, NG11 7EP
>>
>>
>> --
>> Earth is a beta site.
>>
>>


[slurm-dev] Re: prioritize based on walltime request

2016-11-27 Thread Marcin Stolarek

Probably the solution for you is to use the OStrich priority plugin:
http://www.mimuw.edu.pl/~krzadca/ostrich/
You may have to play with the campaign setup in the plugin code.

cheers,
Marcin


[slurm-dev] Re: sreport empty

2016-10-19 Thread Marcin Stolarek

The Account field is empty; have you enabled AccountingStorageEnforce?
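
Two quick things to check, without modifying anything:

  scontrol show config | grep AccountingStorageEnforce
  sacctmgr show associations format=cluster,account,user

If jobs run without an association/account, sreport may have nothing to
aggregate even though sacct still shows the raw job records.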


cheers,
Marcin

2016-10-19 16:50 GMT+02:00 Russell Jones :
> Hi all,
>
> This is on Slurm 14.11.7.
>
> When trying to get a report via sreport, regardless of if I run it as root
> or a user, and on the slurm controller or a node, the report is empty.
> "sacct" reports information back as expected for all users.
>
> I've seen a few messages about this issue and have verified I am not running
> into those problems (date and time not synced, controller node not knowing
> what cluster it belongs to, etc).
>
> Any additional help in figuring out what's going on would be great! Here's
> an example of my issue:
>
>
> [root@rdhpc-n5 ~]# /opt/slurm/14.11.7/bin/sacct -S 10/18
>JobIDJobName  PartitionAccount  AllocCPUS  State ExitCode
>  -- -- -- -- -- 
> 85 bash  rdhpc80 CANCELLED+  0:0
> 85.0  pmi_proxy   10  COMPLETED  0:0
> 87   sbatch Acceptanc+ 8  COMPLETED  0:0
> 87.batch  batch8  COMPLETED  0:0
> 88 bash  rdhpc80 CANCELLED+  0:0
> 88.0  pmi_proxy   10  COMPLETED  0:0
> 88.1  pmi_proxy   10  COMPLETED  0:0
> 88.2  pmi_proxy   10  COMPLETED  0:0
> 89   run_rdhpc+  rdhpc32 FAILED  1:0
> 89.batch  batch   32 FAILED  1:0
> 90   run_rdhpc+  rdhpc32  COMPLETED  0:0
> 90.batch  batch   32  COMPLETED  0:0
> 90.0  orted3  COMPLETED  0:0
> 91   run_rdhpc+  rdhpc32  COMPLETED  0:0
> 91.batch  batch   32  COMPLETED  0:0
> 91.0  orted3  COMPLETED  0:0
> 92   run_rdhpc+  rdhpc32  COMPLETED  0:0
> 92.batch  batch   32  COMPLETED  0:0
> 92.0  orted3  COMPLETED  0:0
> 93 bash  rdhpc80  COMPLETED  0:0
> 93.0  pmi_proxy   10  COMPLETED  0:0
> 94   run_rdhpc+  rdhpc32  COMPLETED  0:0
> 94.batch  batch   32  COMPLETED  0:0
> 94.0  orted3  COMPLETED  0:0
> 95   run_rdhpc+  rdhpc32  COMPLETED  0:0
> 95.batch  batch   32  COMPLETED  0:0
> 95.0  orted3  COMPLETED  0:0
>
>
>
> [root@rdhpc-n5 ~]# /opt/slurm/14.11.7/bin/sreport user topusage start=10/18
> 
> Top 10 Users 2016-10-18T00:00:00 - 2016-10-18T23:59:59 (86400 secs)
> Time reported in CPU Minutes
> 
>   Cluster Login Proper Name Account   Used Energy
> - - --- --- -- --
>


[slurm-dev] Re: auto detect Node definition details

2016-10-02 Thread Marcin Stolarek
Yes, I think this is how it normally works; however, I always preferred
specifying the resources explicitly in slurm.conf.
Check the manual.
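
For reference, the usual way is to let the node print its own definition and
paste/adjust it in slurm.conf; the output will look something like (values
are just an example):

  $ slurmd -C
  NodeName=node01 CPUs=16 Boards=1 SocketsPerBoard=2 CoresPerSocket=8 ThreadsPerCore=1 RealMemory=64213
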
Cheers
Marcin

On Thursday, 29 September 2016, Tus  wrote:

>
> Hi All,
>
> Is there a way to auto detect node details that go in slurm.conf? If I
> just have the NodeName in there can slurm get the basic info (i.e cpu,
> sockets, threads)?
>
> Thanks
>


[slurm-dev] Re: Send mail from SLURM

2016-09-16 Thread Marcin Stolarek
yes, you need to set up mail sending :)

check your MTA configuration.
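
Two quick checks (the address is a placeholder):

  scontrol show config | grep MailProg
  echo "test body" | mail -s "slurm mail test" someone@example.org

If the manual test never arrives, the problem is in the MTA (relay host,
DNS, firewall), not in Slurm.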

cheers
marcin

2016-09-16 21:37 GMT+02:00 Fanny Pagés Díaz :

> I need to send notifications from slurm to any mail? It works only for my
> local mail. I have to make some settings? thanks
>


[slurm-dev] RE: Combating idle interactive sessions

2016-08-30 Thread Marcin Stolarek
Maybe you can create a partition/QOS for interactive jobs and use a
job_submit plugin to force all interactive jobs to use it?
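
The QOS side of that could be as simple as (name and limits are only
examples):

  sacctmgr add qos interactive
  sacctmgr modify qos interactive set MaxWall=08:00:00 MaxJobsPerUser=2

and then have the job_submit plugin set that QOS whenever the submission
carries no batch script, which is usually a good sign of an interactive job.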

cheers,
Marcin

2016-08-30 8:09 GMT+02:00 John Hearns :

> I worked on the same problem in my last job, where engineers had
> interactive sessions used for post simulation analysis and visualization.
> Only this was using PBSPro.
> As I recall, the engineers did not directly run the command to start an
> interactive session so we just started those interactive jobs in a queue
> with limits.
>
> You could use a "wrapper script" maybe for your interactive sessions? That
> does not stop inventive users from figuring out what limits the wrapper
> script sets though.
>
> There are also idle timeouts for bash and C-shells which I used.
> On the compute nodes you can edit the /etc/profile scripts for bash and C,
> and put it a bit of logic.  On a PBS job there are environment variables
> set, so you can
> sayIF $PBS_INTERACTIVE THEN   set timeout
>
>
>
>
>
>
>
>
>
>
>
> --
> *From:* T Friddy [t.friddy...@gmail.com]
> *Sent:* 29 August 2016 21:19
> *To:* slurm-dev
> *Subject:* [slurm-dev] Combating idle interactive sessions
>
> Hi All,
>
> I'm having an issue with idle interactive sessions taking up resources for
> many hours. I was hoping I could restrict all interactive sessions to a
> single partition and then lower the time limit on that partition, but I
> haven't been able to find anything about partition specification pertaining
> to job type. Is it possible to do this? If not, is there another way to
> control such a scenario other than trusting users not to leave idle
> terminals open?
>
> Thanks
> Any views or opinions presented in this email are solely those of the
> author and do not necessarily represent those of the company. Employees of
> XMA Ltd are expressly required not to make defamatory statements and not to
> infringe or authorise any infringement of copyright or any other legal
> right by email communications. Any such communication is contrary to
> company policy and outside the scope of the employment of the individual
> concerned. The company will not accept any liability in respect of such
> communication, and the employee responsible will be personally liable for
> any damages or other liability arising. XMA Limited is registered in
> England and Wales (registered no. 2051703). Registered Office: Wilford
> Industrial Estate, Ruddington Lane, Wilford, Nottingham, NG11 7EP
>


[slurm-dev] Re: SLURM daemon doesn't start

2016-08-30 Thread Marcin Stolarek
Just run slurmctld in the foreground and check the output. If you still
don't know the cause of the problem, paste a few lines here.
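
i.e. something like:

  slurmctld -D -vvv

-D keeps it in the foreground and the -v flags raise the verbosity, so the
reason it refuses to start usually shows up in the last few lines.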

cheers,
Marcin


[slurm-dev] RE: Remote Visualization and Slurm

2016-08-17 Thread Marcin Stolarek
You can try http://www.qoscosgrid.org/trac/qcg

cheers,
Marcin

2016-08-17 17:31 GMT+02:00 John Hearns :

> Nicholas,
> As you say there are several solutions out there.
>
> The one I have has experience with is NICE Software, which I admit I
> integrated with PBS Pro.
> When looking at the code though there are the options to use with SLurm.
>
> Please send me an email off list and I can give more information.
>
>
>
> -Original Message-
> From: Nicholas McCollum [mailto:nmccol...@asc.edu]
> Sent: 17 August 2016 15:33
> To: slurm-dev 
> Subject: [slurm-dev] Remote Visualization and Slurm
>
> Hello All,
>
> I've been looking for remote visualization solutions that integrate with
> slurm; While I have found several companies that say they could work with
> slurm, I have yet to find any that can actually show me a product.
>
> Essentially I am wanting a solution where a class of 40 students could log
> on to our supercomputers, be provided a remote CentOS VM desktop that is
> attached to a GPU so that they can either interactively send jobs to the
> GPU that they are assigned or run programs like Spartan, Maestro, Abacus,
> ANSYS, etc.
>
> If anyone has a working remote visualization cluster that integrates well
> with slurm, I would love to hear from you.
>
> Thanks!
>
> ---
> Nicholas McCollum
> HPC Systems Administrator
> Alabama Supercomputer Authority
> Any views or opinions presented in this email are solely those of the
> author and do not necessarily represent those of the company. Employees of
> XMA Ltd are expressly required not to make defamatory statements and not to
> infringe or authorise any infringement of copyright or any other legal
> right by email communications. Any such communication is contrary to
> company policy and outside the scope of the employment of the individual
> concerned. The company will not accept any liability in respect of such
> communication, and the employee responsible will be personally liable for
> any damages or other liability arising. XMA Limited is registered in
> England and Wales (registered no. 2051703). Registered Office: Wilford
> Industrial Estate, Ruddington Lane, Wilford, Nottingham, NG11 7EP
>


[slurm-dev] Storage accounting, with web presentation

2016-07-28 Thread Marcin Stolarek
Hi,

This is not related to Slurm, but I don't know a better place to ask such a
question. Surely on the clusters you manage there is a need to present the
space used by projects and users on particular file systems.
I'm not aware of any open-source solution providing something like an
"accounting portal" for storage; does anyone know of such a project?

I'm thinking about a solution where the used space is periodically updated
by a script getting data from quota, or a simple du, putting it into a
database, and then displaying it through a web interface.

How do you deal with this on your clusters?
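
The collection side I have in mind would be almost trivial, something like
this cron'd sketch (paths and the CSV target are made up):

  #!/bin/bash
  out=/var/www/storage-usage/usage.csv
  today=$(date +%F)
  for d in /projects/*; do
      printf '%s,%s,%s\n' "$today" "${d##*/}" "$(du -sb "$d" | cut -f1)" >> "$out"
  done

The harder part is the web interface on top of the collected numbers, which
is exactly what I was hoping already exists somewhere.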

cheers,
Marcin


[slurm-dev] Re: SlurmUser question

2016-07-21 Thread Marcin Stolarek
2016-07-21 6:59 GMT+02:00 Barbara Krasovec :

>
> I re-checked, it is recommended to run slurm controller as slurm user and
> slurmd on worker nodes can run under any user.. By default they all run as
> root. Slurm user doesn't need login privileges.
>

slurmd needs to fork processes as arbitrary cluster users, so of course you
can use any user if it has CAP_SETUID and CAP_SETGID (maybe CAP_CHOWN?), but
normally this means that it is running as root.

cheers,
marcin


[slurm-dev] Re: SPANK plugin to access job info at submission stage

2016-07-19 Thread Marcin Stolarek
check this:
http://slurm.schedmd.com/job_submit_plugins.html

However, for directly accessing the options as they were specified on the
command line you would probably need to work with a wrapper; inside the
plugin you work on the job descriptor structure.

cheers,
marcin

2016-07-19 7:53 GMT+02:00 Yong Qin :

> Hi,
>
> I'm trying to write a plugin to filter jobs at submission time (accept or
> deny with an error msg). I have to admit that I have not started reading
> the job submission plugin architecture yet and I will do that if there's
> really no way to implement it as a SPANK plugin.
>
> My understanding up to this point is, to achieve this goal the most likely
> callback is slurm_spank_init() (local context). However at this stage there
> is no way to access any job related information until the job is allocated.
> Ideally I would like to access the job submission line in its original form
> (-n 4 -t 20:0:0 --mem 2g, etc.) so that I can be as thorough as possible
> when parsing it. Is there any way to access that information as I describe?
> Thanks for shedding the light.
>
> Yong Qin
>


[slurm-dev] Process finished but jobs still "R" in squeue

2016-07-13 Thread Marcin Stolarek
Hi guys,

I have a cluster with a few nodes. Users are submitting job arrays with
around 50k tasks per array, and some tasks finish in less than a second. I
have observed that a lot of tasks stay in the running state for a few
minutes even though the user process finished after a second.

I'm using 14.11.6; anyone with a similar workload or similar observations?

cheers,
marcin


[slurm-dev] Re: TMPDIR, clean up and prolog/epilog

2016-06-26 Thread Marcin Stolarek
This was discussed a number of times before. You can check the list
archive, or start for instance with:
https://github.com/fafik23/slurm_plugins/tree/master/bindtmp
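
The core of most of those solutions is a small prolog/epilog pair plus a
task prolog; a minimal sketch of the prolog side (the path layout is only an
assumption):

  #!/bin/bash
  # prolog: create a per-job directory on the local SSD
  tmp=/tmp/slurm_${SLURM_JOB_ID}
  mkdir -p "$tmp"
  chown "$SLURM_JOB_UID" "$tmp"
  chmod 700 "$tmp"
  # the matching epilog simply does: rm -rf /tmp/slurm_${SLURM_JOB_ID}

A task prolog can then point TMPDIR at it; if I remember correctly, only
output lines of the form "export TMPDIR=/tmp/slurm_..." (or "print ...") are
acted upon, which would also explain why a plain echo in --task-prolog seems
to produce no output.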

cheers
marcin

2016-06-24 7:22 GMT+02:00 Lachlan Musicman :

> We are transitioning from Torque/Maui to SLURM and have only just noticed
> that SLURM puts all files in /tmp and doesn't create a per job/user TMPDIR.
>
> On searching, we have found a number of options for creation of TMPDIR on
> the fly using SPANK and lua and prolog/epilog.
>
> I am looking for something relatively benign, since this we are still
> learning the new paradigm.
>
> One thing in particular: our /tmp files are SSD local to CPU rather than
> on a shared filesystem for speed, so we will need to remove the tmps
>
> So I was looking at the --prolog and --task-prolog options, doing a little
> testing on how I might export TMPDIR
>
> I had a very simple
>
> srun --prolog=/data/pro.sh --task-prolog=/data/t-pro.sh -l hostname
>
>  pro.sh
>
>  #!/bin/bash
>  echo "PROLOG: this is from the prologue. currently on `hostname`"
>
>  t-pro.sh
>
>  #!/bin/bash
>  echo "TASK-PROLOG: this is from the task-prologue. currently on
> `hostname`"
>
> /data is a shared file system and is the WORKDIR
>
> I'm getting results from --prolog but not from --task-prolog.
> Running this instead:
>
> srun --task-prolog=/data/t-pro.sh -l hostname
>
> I confirm still no output from task-prolog.
>
> What am I doing wrong?
>
> (both scripts have a+x)
>
> cheers
> L.
>
> --
> The most dangerous phrase in the language is, "We've always done it this
> way."
>
> - Grace Hopper
>


[slurm-dev] Re: Change in RealMemory after OS upgrade

2016-05-24 Thread Marcin Stolarek
Hi Ray,

Can you, for example, check whether sys/systemcfg.h is present or absent on
the new and old systems? Is hwloc installed on both?

cheers,
marcin




[slurm-dev] Re: Get triggering node as command line argument in the triggered program?

2016-05-19 Thread Marcin Stolarek
check here:
http://slurm.schedmd.com/strigger.html

The first paragraph ends with: "A hostlist expression for the nodelist or
job ID is passed as an argument to the program."
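
So the triggered script only needs to look at its first argument; a minimal
sketch (the log path and mail address are placeholders, the script path is
the one from your strigger call):

  #!/bin/bash
  # /etc/slurm-llnl/slurm_node_down.sh
  down_nodes="$1"
  echo "$(date '+%F %T') nodes down: ${down_nodes}" >> /var/log/slurm/node_down.log
  # echo "" | mail -s "SLURM nodes down: ${down_nodes}" admin@example.org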

2016-05-19 0:50 GMT+02:00 Michael Basilyan :

> Looks like it already does pass it as an argument -- it's just not
> documented?
>
> On Wed, May 18, 2016 at 3:26 PM, Michael Basilyan 
> wrote:
>
>> I'm doing this to trigger slurm_node_down.sh when a machine goes done. My
>> only problem is that I don't know what machine actually went *down*...
>> so that I can take care of it.
>>
>> sudo -u slurm strigger --set -nlinux[1-50] --down \
>>  --program=/etc/slurm-llnl/slurm_node_down.sh -v \
>>  --flags=PERM --offset=120
>>
>>
>> I used to do this to get a list of nodes currently down but that doesn't
>> work when the nodes have State=CLOUD
>>
>> down_nodes=`sinfo --states=down -h -N`
>> down_nodes=`echo "$down_nodes" | awk '{print $1}'`
>>
>> Thanks,
>> Mike
>>
>
>


[slurm-dev] Re: slurm limits

2016-05-17 Thread Marcin Stolarek
2016-05-18 7:49 GMT+02:00 remi marchal :

> Dear Slurm users,
>
> I am quite new in the community and so don’t have a strong experience with
> Slurm.
>
> My question is quite simple.
>
> I would like to fix limitation in terms of the number of running jobs
> allowed by users.
>
> I seen from the website that it’s possible to fix limitations either
> through mysql database or through reading a simple file. I strongly prefer
> to have it written on a simple file.
>
> Does anybody have experience with such kind of limitations and how to set
> it up.
>
> Regards,
>
> Rémi
>
>
As far as I know you have to configure the accounting database (I'd suggest
slurmdbd rather than using MySQL directly) for per-user job limits.

cheers,
marcin


[slurm-dev] Re: SPANK vs Task Plugin

2016-05-17 Thread Marcin Stolarek
2016-05-17 22:41 GMT+02:00 Tanner Satchwell :

> I am writing a plugin that isolates temporary files per job and per user.
> I have written it as a Task plugin, but recently came across some similar
> work as a SPANK plugin. I want the plugin to run at the start of a new job,
> and it needs to be run as root.
>
> What are the pros and cons of SPANK vs Task? Which should I ultimately use?
>

I think you should use SPANK. The main advantage of SPANK that I know of is that
you can build these plugins without access to the Slurm source code, so it is a
very good place for site-specific plugins.
Task plugins rarely carry local modifications.
Can you use multiple task plugins at once? I'm not sure, but if not, that's another
disadvantage - you would then be unable to use, for example, task/cpuset. If you
modify a task plugin locally, you need to reapply the changes after every upgrade;
a SPANK plugin can easily be reused with a new version.

cheers,
marcin


[slurm-dev] Re: Change in RealMemory after OS upgrade

2016-05-16 Thread Marcin Stolarek
2016-05-17 5:22 GMT+02:00 Raymond Wan :

>
> Hi Marcin,
>
>
> On Mon, May 16, 2016 at 5:51 PM, Marcin Stolarek
>  wrote:
> > 2016-05-14 16:49 GMT+02:00 Raymond Wan :
> >> Anyway, after upgrading the OS, something strange happened.  The queue
> >> failed to start and I got error messages in the log about the
> >> RealMemory being insufficient.  I ran "slrumd -C" again, and the value
> >> of RealMemory changed!  For example, from 32058 to 32057.  So...I
> >> changed it in slurm.conf and restarted slurm and all is fine now.
> >>
> >> But, I was wondering ... is this something I should be alarmed about?
> >> How could I "lose" memory during an OS upgrade.  Is it an
> >> approximation and there is some kind of rounding error?
> >>
> >> Ray
> >
> >
> >
> > It looks like this value comes from get_memory function:
> >
> https://github.com/SchedMD/slurm/blob/9d5ad6398b85c185756f93b44efbc4fbab028c81/src/slurmd/slurmd/get_mach_stat.c
> >
> > The actual source (if you check the code) can be _system_configuration,
> or
> > _SC_PHYS_PAGES  or hw.physmem. I think that upgrading your OS caused for
> > example sys/systemcfg.h library installation or removal, which in fact
> > changed the source of the information and.. that's why the number
> changed.
>
>
> Thanks a lot for finding the place in the source code!  I have to
> admit that I find SLURM's source code to be a bit intimidating and
> never had the courage the look through it myself.  (Nor do I have the
> expertise to know I've found what I'm looking for...)
>
> I'm relieved that my system isn't mysteriously losing memory with the
> upgrade -- thanks a lot for your reply!
>

It may still be interesting to check what changed during the upgrade; with that
information, removing this inconsistency will probably be easy. Can you check it?

cheers,
marcin


[slurm-dev] RE: Jobs are waiting for resources with some partitions

2016-05-16 Thread Marcin Stolarek
2016-05-13 16:02 GMT+02:00 David Ramírez :

> Thanks Carlos!!
>
>
>
> My users need to set the partition manually, I see. Time to train the users.
>
>
You can also write your own job_submit plugin to change the partition the user
specified, or to decide with some logic which partition should be chosen by
default.

cheers,
marcin


[slurm-dev] Re: Change in RealMemory after OS upgrade

2016-05-16 Thread Marcin Stolarek
2016-05-14 16:49 GMT+02:00 Raymond Wan :

>
> Hi all,
>
> I've recently upgraded the Ubuntu OS on a few independent (i.e., they
> are not part of the same cluster) servers.  This brought me up from
> version 14.11 of SLURM  to 15.08 .
>
> While things have gone fine, on at least two occasions, I've noticed a
> problem.  At the end of slurm.conf, I have an entry like this:
>
> NodeName=mynode CPUs=24 SocketsPerBoard=2 CoresPerSocket=6
> ThreadsPerCore=2 RealMemory=32057 State=UNKNOWN
>
> After using the web-based configuration tool, some web site (sorry, I
> forgot where I saw it) suggested I run "slurmd -C" and then copy the
> values to the end of the file. That is what I've done.
>
> Anyway, after upgrading the OS, something strange happened.  The queue
> failed to start and I got error messages in the log about the
> RealMemory being insufficient.  I ran "slrumd -C" again, and the value
> of RealMemory changed!  For example, from 32058 to 32057.  So...I
> changed it in slurm.conf and restarted slurm and all is fine now.
>
> But, I was wondering ... is this something I should be alarmed about?
> How could I "lose" memory during an OS upgrade.  Is it an
> approximation and there is some kind of rounding error?
>
> Ray
>


It looks like this value comes from get_memory function:
https://github.com/SchedMD/slurm/blob/9d5ad6398b85c185756f93b44efbc4fbab028c81/src/slurmd/slurmd/get_mach_stat.c

The actual source (if you check the code) can be _system_configuration,
_SC_PHYS_PAGES or hw.physmem. I think that upgrading your OS caused, for example,
the sys/systemcfg.h header to be installed or removed, which in turn changed the
source of the information - and that's why the number changed.

cheers,
marcin


[slurm-dev] Re: slurm and ldap

2016-05-10 Thread Marcin Stolarek
2016-05-10 20:02 GMT+02:00 remi marchal :

> dear Marcin
>
> here is the output of the getent
>
> remarche:*:1050:502: remarche:/home/users/remarche:/bin/bash
>

Have you restarted slurmd after configuring LDAP authentication? If you have
"CacheGroups" enabled in slurm.conf it may be a problem, but it's disabled by
default.

cheers,
marcin


[slurm-dev] Re: slurm and ldap

2016-05-10 Thread Marcin Stolarek
2016-05-10 17:43 GMT+02:00 remi marchal :

> Dear Slurm users,
>
> I am quite new in the slurm community.
>
> I set up a slurm cluster using a munge authentification and would like to
> allow ldap users to submit jobs.
>
> ldap authentification works perfectly on the all hosts.
>
> When I submit a job using a local user, everything works fine.
>
> However, when I submit a job using an ldap account, it remains pended and
> I see the following error message in the slurmd logfile:
>
> [2016-05-10T17:14:45.213] Launching batch job 126 for UID 1050
> [2016-05-10T17:14:45.213] error: _send_slurmstepd_init: getpwuid_r: No
> error
>

Can you provide the output of `getent passwd | grep 1050` on the compute
node?

Have you configured passwd/shadow source appropriately?

cheers,
marcin


[slurm-dev] Re: Difficulty using reboot_nodes or similar for maintenance, SLURM 15.08

2016-05-09 Thread Marcin Stolarek
2016-05-07 6:43 GMT+02:00 Ryan Novosielski :

>
> Hi all,
>
> What I want to do is to be able to use reboot_nodes as it is described in
> the manual. The trouble is that my nodes return to service before the user
> filesystems are mounted. I haven't been able to resolve that problem.


Is it CentOS/RHEL 7? If so, maybe the solution to the main problem is to make the
service depend on the particular filesystem rather than on the generic
"filesystems" target. I haven't solved this myself, but I remember a friend
describing such a problem.
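
If it is systemd-based, a sketch of such a dependency would be a drop-in for the
slurmd unit (assuming /home is the user filesystem in question; the file name is
just an example):

# /etc/systemd/system/slurmd.service.d/wait-for-mounts.conf
[Unit]
# do not start slurmd until the listed mount point is available
RequiresMountsFor=/home
After=remote-fs.target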

cheers,
marcin


[slurm-dev] Re: DB on worker nodes

2016-03-24 Thread Marcin Stolarek
2016-03-24 1:26 GMT+01:00 Lachlan Musicman :

> Hi,
>
> I'm just configuring a script to deploy worker nodes. I've realised that
> version #1, made many moons ago, installed MySQL/MariaDB.
>
> But now that I look at my worker nodes, I don't think that they need mysql
> on them.
>
> Can any one confirm or deny if they do?
>


For sure you don't need MariaDB on a worker node to run a "Hello World" job through
Slurm, but you will if you want your jobs to load data into MySQL and then run
scripts against it :)

cheers,
marcin


[slurm-dev] Re: Problem using slurm 14.11.9 :

2015-10-19 Thread Marcin Stolarek
2015-10-19 18:21 GMT+02:00 :

> Hello,
> is there someone who can explain such kind of message in *slurmctld.log* :
>
> *debug: not the right user 2279 != 1761*
>

Have you enabled debug4 messages?
This message can simply mean that the association being iterated over has not been
added to the assoc_list in assoc_mgr.c.

I think there is nothing to worry about.

cheers,
marcin

>
>
> Thanks,
>
> Best regards,
> Gerard Gil
>
> Departement Calcul Intensif
> Centre Informatique National de l'Enseignement Superieur
> 950, rue de Saint Priest
> 34097 Montpellier CEDEX 5
> FRANCE
>
> tel :  (334) 67 14 14 14
> fax : (334) 67 52 37 63
> web : http://www.cines.fr
>
> --
>
> *De: *"gil" 
> *À: *"slurm-dev" 
> *Cc: *"gil" 
> *Envoyé: *Mercredi 7 Octobre 2015 09:42:31
> *Objet: *Problem using  --ntasks (slurm 14.11.9)
>
> Hello,
> we have just upgraded our configuration from SLURM 2.6.9 to SLURM 14.11.9.
>
> We are facing a new issue with jobs using --ntasks.
>
>
> The following variables* SLURM_NTASKS*,  *SLURM_NPROCS* and *
> SLURM_STEP_NUM_TASKS* are set with wrong values when  the job is
> submitted using sbatch command :
>
>
>
> slurm script exemple :
>
> #!/bin/bash
>
>
> *#SBATCH --nodes=4   #SBATCH --ntasks=7 #SBATCH
> --ntasks-per-node=2*
> ...
>
>
>
> In a "normal" case slurm* 2.6.9* we get :
>
>
> *SLURM_NTASKS=7 SLURM_NPROCS=7 SLURM_STEP_NUM_TASKS=7*
>
>
> With slurm version *14.11.9*, when the job is submitted with *sbatch*
> command we get :
>
>
>
> *SLURM_NTASKS=8 SLURM_NPROCS=8 SLURM_STEP_NUM_TASKS=8*
>
>
> With slurm version *14.11.9*, when the job is submitted with *salloc*
> command we get :
>
>
>
> *SLURM_NTASKS=7 SLURM_NPROCS=7 SLURM_STEP_NUM_TASKS=7*
>
>
>
> The only way we found to workaround the problem is to set these tree
> variables "by hand" inside the slurm script as the first command before job
> steps :
>
> #!/bin/bash
>
>
> *#SBATCH --nodes=4   #SBATCH --ntasks=7 #SBATCH
> --ntasks-per-node=2*
> ...
>
>
> *SLURM_NTASKS=7 SLURM_NPROCS=7 SLURM_STEP_NUM_TASKS=7*
>
>
>
> Any idea about this problem ?
>
> How can we solve it ?
>
>
> Best Regards,
> Gerard Gil
>
> Departement Calcul Intensif
> Centre Informatique National de l'Enseignement Superieur
> 950, rue de Saint Priest
> 34097 Montpellier CEDEX 5
> FRANCE
>
> tel :  (334) 67 14 14 14
> fax : (334) 67 52 37 63
> web : http://www.cines.fr
>
>


[slurm-dev] Re: Automatic failover and replication

2015-10-18 Thread Marcin Stolarek
2015-10-19 1:06 GMT+02:00 Walter Landry :

>
> Hello,
>
> I administer a fairly busy system that runs about about 30,000 jobs a
> day.  I would like to set up an automatic failover system so that I
> can take the scheduler down and have a backup scheduler handle jobs.
> I see that there is a BackupController option in the config files, but
> I am unclear how it actually works.  Does it replicate the jobs
> database, so that if I start job 1138 on the master scheduler, will
> the backup scheduler also see the same job 1138?
>
> Thank you,
> Walter Landry
>

so... from man slurm.conf about BackupController
"[...]The backup controller recovers state information from the
StateSaveLocation directory,
  which must be readable and writable from both the primary and
backup controllers.[...]"
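
So the backup does not replicate the job database itself; both controllers read
and write the same state directory, and the backup takes over after the primary
has been unresponsive for SlurmctldTimeout, so a job such as 1138 should be
recovered from the saved state. A sketch of the relevant slurm.conf lines (host
names and path are only examples):

ControlMachine=ctl1
BackupController=ctl2
# must live on shared storage, readable and writable by both controllers
StateSaveLocation=/shared/slurm/state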


[slurm-dev] Re: Limit access to reconfiguration of Slurm (p.ex. accounting, limits) to certain hosts?

2015-09-29 Thread Marcin Stolarek
As far as I remember, the easy way was to modify auth/munge not to trust root from
a particular host.
Cheers,
Marcin

W dniu środa, 30 września 2015 Christopher Samuel 
napisał(a):

>
> On 30/09/15 01:07, Thomas Orgis wrote:
>
> > Given the traffic on this list on other topics, may I assume the
> > answer to this question is "no", without being too rude?
>
> It's probably also that nobody has tried it (we certainly haven't), the
> only people who are meant to have root access are those on our staff.
>
> > I might prepare a patch to limit administrative actions to localhost at
> > some point in time.
>
> You probably want an optional configurable list of multiple CIDR IP
> ranges in slurmdbd.conf as there's no guarantee that slurmdbd (which is
> what sacctmgr will be talking to) is running on the same host as slurmctld.
>
> For instance we have many clusters talking back to a central host
> running slurmdbd (and only slurmdbd) and have a variety of various
> systems that talk to it for different tasks.
>
> All the best,
> Chris
> --
>  Christopher SamuelSenior Systems Administrator
>  VLSCI - Victorian Life Sciences Computation Initiative
>  Email: sam...@unimelb.edu.au  Phone: +61 (0)3 903 55545
>  http://www.vlsci.org.au/  http://twitter.com/vlsci
>


[slurm-dev] Re: pam_slurm: how can I exclude some users from pam_slurm?

2015-09-24 Thread Marcin Stolarek
pam_listfile before pam_slurm with "sufficient" key word in pam.d/ssh
configuration?
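
A sketch of such a stack (the user-list file name is just an example):

# /etc/pam.d/sshd -- fragment, order matters
# users listed in the file are allowed in regardless of pam_slurm
account    sufficient   pam_listfile.so item=user sense=allow file=/etc/security/admin_users onerr=fail
account    required     pam_slurm.so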

cheers,
marcin

2015-09-25 6:18 GMT+02:00 Koji Tanaka :

> Hello Slurm Community,
>
> Is there a way to exclude some users from pam_slurm?
>
> I've successfully set up ssh restriction with using pam_slurm, but there's
> one problem. When we deploy our system, we use a regular user
> account+sudo+ansible, instead of logging in as root. So if a compute node
> has a problem on slurm, the deploying-user won't be able to login to the
> node. The simple solution is to enable root ssh login, but is there a way
> to exclude the deploy-user from pam_slurm restriction?
>
> Thank you and best regards,
> Koji
>


[slurm-dev] Re: Disk I/O as consumable?

2015-09-10 Thread Marcin Stolarek
2015-09-10 22:10 GMT+02:00 Kilian Cavalotti :

>
> On Tue, Sep 8, 2015 at 5:01 AM, Marcin Stolarek
>  wrote:
> > using specified mountpoint, but... thats not real IOPS threshold.
> Currently
> > I don't how any linux mechanism that allows limitting process to
> specified
> > number of I/O operations per second. At our side we've been considering
> > writing our own fusefs with this functionality.
>
> Wouldn't the cgroup blkio subsytem achieve this?
> https://www.kernel.org/doc/Documentation/cgroups/blkio-controller.txt
>
> I see that there is support for it in Slurm code, but it has been
> commented 2 years ago "until full kernel support is ready".
>
> https://github.com/SchedMD/slurm/blob/master/src/plugins/jobacct_gather/cgroup/jobacct_gather_cgroup_blkio.c
>
> Not sure exactly what this entails, but maybe kernel support is ready now?

:)
>

It's probably a good question for RHEL/CentOS/Scientific Linux 7. I'm not sure
whether real Lustre support for RHEL 7 exists yet, so in a lot of production
environments it may still be a case of "until full kernel support is ready" :)

cheers,
marcin

>
> Cheers,
> --
> Kilian
>


[slurm-dev] Re: Disk I/O as consumable?

2015-09-08 Thread Marcin Stolarek
2015-09-08 12:55 GMT+02:00 Raymond Wan :

>
> Dear all,
>
> I'm trying to figure out how to configure a "cluster" with a single
> computer (i.e., execution and master node is the same).  After I
> figure this out, I hope that setting up a cluster with multiple nodes
> is not too difficult.
>
> In particular, I think the default setting permits only a single job
> per node at a time.  However, I'd like to set things up so that more
> than one job can run at a time.
>
> I'm looking at the CPU Management User and Administrator Guide [1],
> and in particular, the Consumable Resources in Slurm page [2].  I hope
> I'm on the right track?
>
> In the examples, I understand the memory (CR_Memory) example.  But, I
> don't quite understand the CR_CPU_Memory example.  What is -N and -n?
> The manpages says -N is the number of nodes...so with only one node,
> that is meaningless in my case.  -n is "number of tasks".  Is "number
> of tasks" the same as "number of CPUs"?
>
> Is there a reason why the example used both -N and -n and not just -n?
>  Do the two parameters interact somehow?
>
> If I have a computer with 2 cores and 10 threads each, that is 20
> CPUs.  So, -n can range from 1 to 20?
>
> And under SelectTypeParameters, if I set CR_CPU_Memory, then a job
> enters the running state if both CPU and Memory is available.
>
> So far, I hope I'm correct?  If so, then my "real" question is that
> the jobs I would like to run are mainly I/O intensive.  CPU and Memory
> usage is important, but the bottleneck is probably disk I/O.  If I've
> set up k disk partitions using object store, I'd like no more than k
> jobs to run at a time and I'd like each one to write to a different
> partition.
>
> I *think* this is "impossible" to do since it would be hard to force
> users to write to one partition and not any others.  But, I thought
> I'd ask anyway in case there is something within SLURM that I've
> missed.  Any suggestions?
>

You can use something like this:
 https://github.com/fafik23/slurm_plugins/blob/master/unshare/unshare.c
It uses the unshare syscall / Linux namespaces to unmount the specified filesystems
from the job's namespace. You can use licenses to limit, to some degree, the number
of jobs using a specified mountpoint, but that's not a real IOPS threshold.
Currently I don't know of any Linux mechanism that limits a process to a specified
number of I/O operations per second; on our side we've been considering writing our
own FUSE filesystem with this functionality.

If you are using local disks, GRES may fit better than licenses...
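
A sketch of the license-based counting (license names and counts are only examples,
and it relies on users actually requesting the license):

# slurm.conf: one license per storage partition, count = jobs it can sustain
Licenses=scratch_a:1,scratch_b:1

# in a job script that writes to scratch_a
#SBATCH --licenses=scratch_a:1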

cheers,
marcin


[slurm-dev] Re: extend user ssh permissions

2015-09-01 Thread Marcin Stolarek
2015-09-01 19:50 GMT+02:00 Jan Schulze :

>
> Dear all,
>
> this is slurm 14.11.6 on a ROCKS 6.2 cluster.
>
> I have a perhaps trivial question concerning the user permissions for the
> ssh login to computing nodes. The default setting allows our users to login
> to computing nodes exclusively if they have a job running. Can one extend
> these permissions such that the users are also allowed to log in to the
> computing node if they have a suspended job (changed to suspended state by
> root using scontrol) on it?
>
> Thanks.
>
> greetings
>
>
> Jan Schulze
>

Check the lines listed below in 'contribs/pam/pam_slurm.c'; I assume you are using
it with host-based SSH?




job_info_t *j = &msg->job_array[i];
/* This is probably the line you can remove to achieve your goal: */
if (j->job_state == JOB_RUNNING) {
        DBG ("jobid %ld: nodes=\"%s\"", j->job_id, j->nodes);
        if (_hostrange_member(nodename, j->nodes) ) {



cheers,
marcin


[slurm-dev] Re: problem starting slurm on stateless node

2015-08-12 Thread Marcin Stolarek
2015-08-12 19:46 GMT+02:00 Trevor Gale :

> Thank you for your reply!
>
> I found that the error was being caused by the var/log/* directories being
> excluded, as well as the hostname being changed on the node when I switched
> to Warewulf. I thought about using the file store to provision the
> slurm.conf, but I ended up adding it to my NFS exports and just mounting
> it. I am using a separate network for NFS/WW so my applications still have
> exclusive use of the IB.
>



Thanks,
> Trevor
>
> On Aug 12, 2015, at 12:51 PM, James Armstrong 
> wrote:
>
> Trevor,
>   I also have a warewulf provisioned cluster, and I have noticed that the
> default rule when creating a vnfs is to exclude all /var/log/* directories
> (/etc/warewulf/vnfs.conf) and I don't think the slurmd executable will
> create it if doesn't exist. I have implemented various solutions from
> editing the init.d slurm script to create the log directory to using the
> wwsh file system to create it. The simplest solution is to just edit the
> slurm.conf to point the log file to somewhere else (an NFS mount) or not
> have it at all. I would recommend against having it write to somewhere on
> the VNFS as this will be a drain on your available ram.
>
>   You also need to add /etc/slurm/slurm.conf to the wwsh file system to
> keep all your nodes updated.
>
> Hope this helps
>
> James.
>
> --
> *From: *"Trevor Gale" 
> *To: *"slurm-dev" 
> *Sent: *Friday, 7 August, 2015 5:21:25 PM
> *Subject: *[slurm-dev] problem starting slurm on stateless node
>
>
> Hello all,
>
> I’m working on a small test cluster (2 nodes linked with eth and IB) and
> am trying to install slurm on them. I have installed Slurm numerous times
> on a normal system but I am having issues starting the slurm service on the
> compute node. I am using werewulf to boot my nodes statelessly, so I
> installed munge and slurm in a chroot on my head node and then provisioned
> it to my compute node. When my compute node boots, the munge daemon is
> running, but when I try to start the slurm daemon I get no output. Also, if
> I query the status of the slurm daemon I get no output. However, if I run
> “slurmd -C” I see the expected output of all the resources on my node. My
> head nodes ctl daemon is running but It cannot connect to the compute nodes
> daemon. I also have the exact users and slurm.conf on both nodes (they are
> mounted with NFS).
>
Can't you just start slurmd in the foreground and check why it's not starting? That
is probably what's behind what you call "no output"?

cheers,
marcin




> My slurm.conf specifies to create log files in /var/log/slurm, but this
> folder was not created even though the slurm daemon appears to be running.
> I’m guessing there is some sort of issue with ownership of the slurm files
> that is causing this. When I installed munge I had to go through and fix
> the owners on a number of files and directories. Does anyone have any
> indication of what files might cause this?
>
> Thanks,
> Trevor
>
>
>


[slurm-dev] Re: Logging job executing time.

2015-08-10 Thread Marcin Stolarek
2015-08-10 17:35 GMT+02:00 Zentz, Scott C. :

> Hello Everyone,
>
>
>
> The email’s from SLURM contain the job completion time but
> I was wondering if there was a way to get the job completion time  from
> either an srun or sbatch command and have the time logged to a file. Is
> that possible?
>
>
>

check AccountingStorageLoc in man slurm.conf
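
For a plain-file setup, a sketch of the relevant slurm.conf lines (the paths are
just examples; if you run slurmdbd you would query the end time with sacct
instead):

# per-job accounting records written to a flat file
AccountingStorageType=accounting_storage/filetxt
AccountingStorageLoc=/var/log/slurm/accounting
# or the simpler job completion log, one line per finished job incl. end time
JobCompType=jobcomp/filetxt
JobCompLoc=/var/log/slurm/job_completions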

cheers
marcin


[slurm-dev] Re: node renaming

2015-07-28 Thread Marcin Stolarek
2015-07-28 15:47 GMT+02:00 Andrew E. Bruno :

>
> On Tue, Jul 28, 2015 at 06:30:09AM -0700, Marcin Stolarek wrote:
> > 2015-07-28 15:08 GMT+02:00 Andrew E. Bruno :
> >
> > >
> > > We need to rename all the nodes in our cluster. Our thinking is to put
> > > in a full-system reservation:
> > >
> > > scontrol create reservation  nodes=ALL ..
> > >
> > > Take the nodes down and rename them. Then bring slurm backup configured
> > > with the new names.
> > >
> > > What will happen when we bring slurm backup with all new node names?
> > > Does the reservation store specific nodenames? or will the slurmdbd/ctl
> > > handle this gracefully?
> > >
> > > Any suggestions on how best to rename all the nodes in a cluster?
> > >
> >
> > As I understand you want to remove nodes and add new - with new names.
> > In database/accounting this names have no special format, they are just
> > "strings".
> > So... old nodes (old names/ip's) from slurm.conf are going to be down and
> > you need to add new entries to slurm.conf, but
> > maybe I'm not getting what the problem is..?
>
> Wondering how the reservation will be handled.. when all the old names
> go away. Are you suggesting just renaming the nodes directly in the db?
>

But why do you want to change anything in the DB? Historical jobs were running on
nodes with the old names.

I haven't checked this in the code, but I'm pretty sure that "nodes=ALL" is
expanded to the list of all hosts configured in slurm.conf, so if you add new nodes
(with new names) they are not going to be part of the previously created
reservation.

If you don't want to run jobs on the newly added nodes, you can change
partition state to DOWN.

cheers,
marcin


[slurm-dev] Re: node renaming

2015-07-28 Thread Marcin Stolarek
2015-07-28 15:08 GMT+02:00 Andrew E. Bruno :

>
> We need to rename all the nodes in our cluster. Our thinking is to put
> in a full-system reservation:
>
> scontrol create reservation  nodes=ALL ..
>
> Take the nodes down and rename them. Then bring slurm backup configured
> with the new names.
>
> What will happen when we bring slurm backup with all new node names?
> Does the reservation store specific nodenames? or will the slurmdbd/ctl
> handle this gracefully?
>
> Any suggestions on how best to rename all the nodes in a cluster?
>

As I understand it, you want to remove nodes and add new ones with new names.
In the database/accounting these names have no special format; they are just
strings.
So the old nodes (old names/IPs) from slurm.conf are going to be down and you need
to add new entries to slurm.conf - but maybe I'm not getting what the problem is?

cheers,
marcin


[slurm-dev] Re: Off-topic: What accounting system do you use?

2015-06-24 Thread Marcin Stolarek
Sorry for the previous mails (keyboard problem).

We are using Slurm accounting with XDMoD (http://xdmod.sourceforge.net/) for
graphical presentation. It's nice, and I hope the group of people using and
developing this tool will make it even better :)

cheers,
marcin


[slurm-dev] Re: Off-topic: What accounting system do you use?

2015-06-24 Thread Marcin Stolarek
2015-06-24 23:12 GMT+02:00 Marcin Stolarek :

>
>
> 2015-06-24 16:43 GMT+02:00 Veronique Legrand :
>
>>
>> On 24/06/15 16:04, Bjørn-Helge Mevik wrote:
>>
>>> (Apologies for this slightly off-topic question.)
>>>
>>> We are currently using Gold
>>> (http://www.adaptivecomputing.com/products/open-source/gold/) to manage
>>> allocations and accounting, but are looking for alternative solutions.
>>>
>>> It would be very interesting to know what people on this list use for
>>> accounting.
>>>
>>>


[slurm-dev] Re: Off-topic: What accounting system do you use?

2015-06-24 Thread Marcin Stolarek
2015-06-24 16:43 GMT+02:00 Veronique Legrand :

>
> On 24/06/15 16:04, Bjørn-Helge Mevik wrote:
>
>> (Apologies for this slightly off-topic question.)
>>
>> We are currently using Gold
>> (http://www.adaptivecomputing.com/products/open-source/gold/) to manage
>> allocations and accounting, but are looking for alternative solutions.
>>
>> It would be very interesting to know what people on this list use for
>> accounting.
>>
>>  Hi,
>
> We use the slurm accounting. More precisely the jobacctgather plugin with
> task=30 for the moment. I am also looking for information about accounting
> solutions.
> Are you using gold together with slurm?
>
> Best regards,
>
> Véronique
>
> --
> Véronique Legrand
> IT engineer - Scientific computing
> IT center for biology - C3BI
> Institut Pasteur, Paris
>
> mail: veronique.legr...@pasteur.fr
> Tel: 01 44 38 95 03
>
>
>
>
>


[slurm-dev] Changing /dev file permissions for particular user

2015-06-24 Thread Marcin Stolarek
Hey!

I've got one user I trust and know isn't going to do anything malicious; he needs
direct access to a file in /dev (/dev/cpu/*/msr in particular).

Has anybody looked at how to do such a thing in Slurm? We are thinking about doing
it in the prologue and changing it back in the epilogue, after checking that the
node is exclusive to user X. Do you know whether the file permissions can be
changed in a user namespace, or how to achieve this with Slurm on Linux?
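
For the prologue/epilogue idea, a rough sketch of what we have in mind (the UID is
a placeholder and the exclusivity check is left out):

#!/bin/bash
# prolog fragment, runs as root on the node
TRUSTED_UID=1234              # placeholder for the trusted user's numeric UID
if [ "$SLURM_JOB_UID" = "$TRUSTED_UID" ]; then
    chown "$SLURM_JOB_UID" /dev/cpu/*/msr
    chmod u+rw /dev/cpu/*/msr
fi

#!/bin/bash
# matching epilog fragment: restore the default ownership and permissions
chown root /dev/cpu/*/msr
chmod 600 /dev/cpu/*/msr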

cheers,
marcin


[slurm-dev] Re: Job name truncated in email

2015-06-24 Thread Marcin Stolarek
2015-06-24 22:27 GMT+02:00 Moe Jette :

>
> It's open source. Help yourself to it.

Like it! :)


[slurm-dev] Re: Segment Fault in Slurmctld

2015-04-23 Thread Marcin Stolarek
2015-04-21 19:16 GMT+02:00 Dinesh Kumar :

>  Hi Everyone,
>   I am supporting cons_res plugin but my code is segment faulted, so
> please give me the hints to use gdb with core. Thanks for ur help in
> advance.
>
>
> check this page:


https://www.google.pl/search?q=man+ddb&gws_rd=cr,ssl&ei=wOc4VbOiF8jhaLT5gKgG#q=man+gdb+core

cheers,
marcin


[slurm-dev] Separate slurm-realdev list

2015-03-10 Thread Marcin Stolarek
Hi guys,

One of the ideas that came on the last slurm user group was to create a
separate list for "more advanced topics".

Any news on this?

cheers,
marcin


[slurm-dev] Re: Rounding up of resource requests

2015-02-14 Thread Marcin Stolarek
You may also be interested in using this:
https://github.com/cinek810/misc/tree/master/jobsubmit/job_sane

This plugin allows you to reject jobs that try to use more than one node without
using full nodes. It uses a separate configuration file to exempt specific users'
jobs from the check and to define feature combinations that force job placement on
full nodes for heterogeneous clusters.

Rounding up the Slurm job specification may not work, because users can, for
example, pass -np to mpirun or set OMP_NUM_THREADS, so the "rounded" resources may
never be used.

Let me know if you are interested in using this plugin.

cheers,
marcin

2015-02-12 4:47 GMT+01:00 :

>
> You can use a job submit plugin to modify any job submit request. See:
> http://slurm.schedmd.com/job_submit_plugins.html
>
>
>
> Quoting Simon Michnowicz :
>
>> Dear Group,
>> is it possible to configure SLURM so that it rounds up resource requests
>> for hardware, For example, if a user requests 11 cores out of a 12 core
>> machine we would like to force this to be rounded up to12.  Another
>> example
>> is the use of a GPU, so if a user wants 1 core of a socket with a GPU
>> attached, we force them to request the entire socket (for performance
>> reasons)
>>
>> regards
>>
>> --
>> Simon Michnowicz
>> Monash e-Research Centre
>> PH:   (03) 9902 0794
>> Mob: 0418 302 046
>> www.monash.edu.au/eresearch
>>
>
>
> --
> Morris "Moe" Jette
> CTO, SchedMD LLC
> Commercial Slurm Development and Support
>


[slurm-dev] accounting from previous version of slurm

2015-01-22 Thread Marcin Stolarek
Hi guys,

I've noticed the problem with getting accounting data (with sacct) which
was generated in the past, before slurm upgrade.
Now I'm running 14.03.5, previous version was 2.6.x. Is it known problem,
or should check the code?

cheers,
marcin


[slurm-dev] Re: Environment in prolog/epilog

2015-01-19 Thread Marcin Stolarek
2015-01-19 17:35 GMT+01:00 Uwe Sauter :

>
> Hi,
>
> is there a list of SLURM environment variables which I can access in the
> different prolog/epilog scripts?
>
> Specifically is it possible to get a list of nodes for a job in in the
> PrologSlurmctld script although this runs on the controller host?
>
> Perhaps this information could be added to
> http://slurm.schedmd.com/prolog_epilog.html
>


Can you just use `env > some_file`  in your prolog to get the list?
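
Something along these lines (a sketch; the output path is only an example) will
show exactly which variables your version exports to PrologSlurmctld:

#!/bin/bash
# PrologSlurmctld sketch: dump the environment the controller provides
env | sort > "/var/log/slurm/prologctld_env.${SLURM_JOB_ID}"
# on our setups SLURM_JOB_NODELIST shows up in that list, but please verify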

cheers,
marcin

>
>
> Thanks,
>
> Uwe
>


[slurm-dev] Re: Number of running jobs as a priority factor

2014-12-15 Thread Marcin Stolarek
2014-12-16 4:08 GMT+01:00 :
>
>
> Quoting "Skouson, Gary B" :
>
>  I'm looking for a way to prioritize jobs so that users with jobs running
>> get lower priority than those without jobs running.  I'd like the priority
>> to be independent of the job size or past usage.  For example, one user may
>> have three single-node jobs and another user has a single 300-node job
>> running.  I'd like the user with the single job running to have their
>> pending jobs get priority, until both users have the same number running.
>> Similarly, a user with several 100-node jobs would get lower priority than
>> a user with nothing running that submits a single-node job.
>>
>> From what I can tell, the current priority policies have cpu-hours as
>> part of the priority equation, but not "number of currently running jobs".
>> Any ideas, or pointers to something I've missed?  Any other way to do this?
>>
>
I think OStrich may be an interesting idea for you, please check:
http://www.mimuw.edu.pl/~krzadca/ostrich/index.html

It is not exactly what you described, but once you get into the mathematics and
theory behind it, it may turn out to fit your needs.

We have been using this plugin for 6 months and we are quite satisfied.

cheers,
marcin


[slurm-dev] Re: How to get SLURM to honor TmpDisk reservations?

2014-11-13 Thread Marcin Stolarek
2014-11-12 17:06 GMT+01:00 David Lipowitz :

>  Hi,
>
> We're doing a SLURM proof-of-concept and the management of temp space is
> really important for what we want to do.  We set up a few virtual machines
> as a test with the following slurm.conf settings:
>
> FastSchedule=2
> SelectType=select/cons_res
> SelectTypeParameters=CR_CPU_Memory
> NodeName=test[1-3] NodeAddr=xxx.xxx.xxx.[1-3] CPUs=4 State=UNKNOWN
> TmpDisk=1024 RealMemory=512
>
>
> The CPUs setting above works as expected: We see jobs waiting when number
> of jobs per node exceeds 4, so that's good.
>
> We also see jobs waiting when the total memory consumption as specified
> using sbatch's --mem option exceeds the RealMemory setting.  Also good.
>
> But we can't get jobs to wait when TmpDisk is exceeded using the sbatch's
> --tmp option.  Any suggestions?
>

As far as I remember, --tmp is not treated as a consumable resource. You can define
a license (if the filesystem is shared) or a GRES if it's per compute node, and set
up quotas in the prologue/epilogue using these values.
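
A sketch of the GRES variant for node-local scratch (names and sizes are examples;
actual disk usage would still need to be policed by quotas in the
prologue/epilogue):

# slurm.conf
GresTypes=tmpdisk
NodeName=test[1-3] CPUs=4 RealMemory=512 Gres=tmpdisk:1024 State=UNKNOWN

# gres.conf on each node (here 1 unit = 1 MB of local scratch)
Name=tmpdisk Count=1024

# job requesting 200 MB of local scratch
#SBATCH --gres=tmpdisk:200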

cheers,
marcin


[slurm-dev] Re: OpenMPI, mpirun and suspend/gang

2014-11-09 Thread Marcin Stolarek
W dniu niedziela, 9 listopada 2014 Ralph Castain 
napisał(a):

>
> What stop signal is being sent, and where? We will catch and suspend the
> job on receipt of a SIGTSTP signal by mpirun.
>
>
> > On Nov 9, 2014, at 6:47 AM, Jason Bacon >
> wrote:
> >
> >
> >
> > Does anyone have SUSPEND,GANG working with openmpi via mpirun?
> >
> > I've set up a low-priority queue, which seems to be working, except that
> for openmpi jobs, only the processes on the MPI root node seem to be
> getting the stop signal.
> >
> > From slurm.conf:
> >
> > SelectType=select/cons_res
> > SelectTypeParameters=CR_Core_Memory
> > PreemptMode=SUSPEND,GANG
> > PreemptType=preempt/partition_prio
> >
> > MpiDefault=none
> >
> > I've also tried --mca orte_forward_job_control 1, but it had no apparent
> effect.
> >
> > Thanks,
> >
> >   Jason
>
Hi Jason,

I've been testing the setup with the freezer cgroup. It works fine with MPI jobs,
but intensive testing showed that for multi-node jobs the job that should be
suspended sometimes keeps using CPUs on one of the nodes; this happens only for a
small percentage of the tests.
Currently I'm not able to reproduce the issue reliably, so I'm allowing only
single-node jobs on the lower-priority partition.

Cheers,
Marcin


[slurm-dev] Re: shellshock patch uses a different function export, caused some errors on our Slurm cluster

2014-09-29 Thread Marcin Stolarek
2014-09-29 11:10 GMT+02:00 Alan Orth :

>
> Wow, well spotted.  I came here to see if anyone had reported this same
> issue with environment modules, as I noticed several of my jobs failing
> on our cluster this morning.  Turns out, I'm probably the only one who
> had failed jobs, as I have a long-running tmux session open on the head
> node, and therefore old bash. ;)
>
> Other users wouldn't have noticed because we updated all of our
> infrastructure in one go using ansible[0]

   ^ +1  :)


> last Friday.
>
> In any case, glad to be in good company.  Cheers!
>
> Alan
>
> [0]
>
> http://mjanja.co.ke/2014/09/update-hosts-via-ansible-to-mitigate-bash-shellshock-vulnerability/
>
> On 09/29/2014 08:27 AM, Christopher Samuel wrote:
> > On 27/09/14 08:30, John Brunelle wrote:
> >
> >> This caused a bit of trouble for us when we patched some head nodes
> >> before compute nodes.
> > We did some testing to confirm that:
> >
> > A) If you update a login node before compute nodes jobs will fail as
> > John describes.
> >
> > B) If you update a compute node when there are jobs queued under the
> > previous bash then they will fail when they run there (also cannot find
> > modules, even though a prologue of ours sets BASH_ENV to force the env
> > vars to get set).
> >
> >
> > Our way to (hopefully safely) upgrade our x86-64 clusters was:
> >
> > 0) Note that our slurmctld runs on the cluster management node which is
> > separate to the login nodes and not accessible to users.
> >
> > 1) Kick all the users off the login nodes, update bash, reboot them
> > (ours come back with nologin enabled to stop users getting back on
> > before we're ready).
> >
> > 2) Set all partitions down to stop new jobs starting
> >
> > 3) Move all compute nodes to an "old" partition
> >
> > 4) Move all queued (pending) jobs to the "old" partition
> >
> > 5) Update bash on any idle nodes and move them back to our "main"
> > (default) partition
> >
> > 6) Set an AllowGroups on the "old" partition so users can't submit jobs
> > to it by accident.
> >
> > 7) Let users back onto the login nodes.
> >
> > 8) Set partitions back to "up" to start jobs going again.
> >
> >
> > Hope this helps folks..
> >
> > cheers!
> > Chris
>
> --
> Alan Orth
> alan.o...@gmail.com
> http://alaninkenya.org
> http://mjanja.co.ke
> "I have always wished for my computer to be as easy to use as my
> telephone; my wish has come true because I can no longer figure out how to
> use my telephone." -Bjarne Stroustrup, inventor of C++
> GPG public key ID: 0x8cb0d0acb5cd81ec209c6cdfbd1a0e09c2f836c0
>


[slurm-dev] Re: shellshock patch uses a different function export, caused some errors on our Slurm cluster

2014-09-28 Thread Marcin Stolarek
2014-09-27 0:30 GMT+02:00 John Brunelle :

>
> Though I hope everyone is putting the bash shellshock patching in
> their rearview mirror, it might still help to be aware of a change to
> function exports that the latest version introduced.  Instead of the
> corresponding environment variable being named "myfunction", it's now
> "BASH_FUNC_myfunction()".
>
> This caused a bit of trouble for us when we patched some head nodes
> before compute nodes.  Since job environments are created on the
> submission host, but run on the compute host, the compute hosts didn't
> understand/accept the environment variable definition.  Along with the
> error message, our jobs lost the ability to load software environment
> modules (which is implemented with bash functions).
>
> Though not specific to Slurm, I think it's relevant here because of
> the sharing of environments across hosts that comes up in this
> context.  We wrote up a bit more detail here:
>
>
> https://rc.fas.harvard.edu/shellshock-update-issue-temporarily-affecting-slurm-jobs-software-modules/
>
> Hope that helps someone,
>
> John
>
> John Brunelle
> Harvard University FAS Research Computing, Informatics and Scientific
> Applications
> john_brune...@harvard.edu  @jabrcx
>

Did I understand you correctly that in your configuration it's possible to start an
interactive shell with:
srun --pty bash
and, because this is a non-login shell, the environment has to be set on the submit
host?

We force our users to always run bash in login (-l) mode; in that case the
environment is set on the worker nodes. I believe that's common.

cheers,
marcin


[slurm-dev] Re: job pending, not starting

2014-09-23 Thread Marcin Stolarek
2014-09-23 20:23 GMT+02:00 Eva Hocks :

>
>
>
> How can I get a job started after it was pending with
>
> JobState=PENDING Reason=AssociationJobLimit
>
>
> I removed the qos job limit with no success, I removed the user from the
> qos with no success. I tried the scontrol StartTime=now with no success.
>
> So how can I get the job running? What is the equivalent to torque
> "qrun" ?
>
There is none; what you can do instead is change the job priority with
scontrol update JobId=<jobid> Priority=<value>

cheers,
marcin

>
> Thanks
> Eva
>


[slurm-dev] Re: can't see completed jobs with squeue

2014-09-10 Thread Marcin Stolarek
2014-09-10 15:10 GMT+02:00 Erica Riello :

>  Hi all,
>
> I'm running Slurm 14.03.07 and I'd like to configure it in order to
> preserve completed jobs for 5 minutes so that if I run squeue a short while
> after a job completion, it would show me the job.
>
> What do I have to add in slurm.conf to enable this feature?
>
>

Start by reading the manual - it is really annoying to see so many mails about
simple problems from one person on the list.


regards,
marcin

> Regards,
>
> --
> ===
> Erica Riello
> Aluna Engenharia de Computação PUC-Rio
>
>


[slurm-dev] Re: Upgrading and not losing jobs

2014-08-24 Thread Marcin Stolarek
2014-08-25 7:28 GMT+02:00 Dennis Zheleznyak :

>  Both running and queued
>

From a theoretical point of view it's possible to upgrade from 2.4.4 to 14.11, but
you definitely have to do it step by step, since the Slurm protocol is only
compatible across three consecutive versions.

However, the overall success also depends on your skills and experience. For
example, I'd suggest increasing SlurmdTimeout to a few hours. You should also
follow the procedure of upgrading slurmdbd first, and check whether all plugins
(SPANK, job_submit) used at your site (and present in the configuration) will still
be available. If you have written your own plugin, you should check whether it
compiles against the new Slurm version (for example, the job_submit plugin API has
changed).

cheers,
marcin

>
>
> On Sun, Aug 24, 2014 at 12:22 PM, Chris Samuel 
> wrote:
>
>>
>> On Sun, 24 Aug 2014 01:58:05 AM Dennis Zheleznyak wrote:
>>
>> > I'm upgrading Slurm from 2.4.4 to the latest 14.X version, when I tried
>> to
>> > simulate it in a virtual environment the running jobs were deleted every
>> > single time.
>>
>> As Uwe said I suspect that's too large a jump to be supported, you might
>> want
>> to test 2.4.4 -> 2.6.9 first to see if that will work.
>>
>> Also - do you mean keeping running jobs or just queued jobs?
>>
>> Best of luck,
>> Chris
>> --
>>  Christopher SamuelSenior Systems Administrator
>>  VLSCI - Victorian Life Sciences Computation Initiative
>>  Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
>>  http://www.vlsci.org.au/  http://twitter.com/vlsci
>>
>
>


[slurm-dev] Re: Storing the job submission script in the accounting database

2014-08-22 Thread Marcin Stolarek
2014-08-22 10:39 GMT+02:00 Antony Cleave :

>  This is great news, the final part of the puzzle is how do you access the
> text of the job script?  I'd assume that there is an environment variable
> with the script in it but I can't find it in the prolog guide. Getting it
> into mysql won't be a problem
>
> Cheers
>
> Antony
>
This:

SCRIPT=$(base64 -w0 /var/spool/slurm/job.$JOBID/script)
ENVIRONMENT=$(base64 -w0 /var/spool/slurm/job.$JOBID/environment)

is part of our script. The exact location may depend on your configuration - check
StateSaveLocation in your slurm.conf.

You may also be interested in slurmmon; you can find the slurmmon project on
GitHub.
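
For context, a stripped-down sketch of such a PrologSlurmctld (the table name,
database and credentials file are placeholders, and the job.$JOBID path assumes the
flat StateSaveLocation layout of this Slurm generation):

#!/bin/bash
JOBID=$SLURM_JOB_ID
SCRIPT=$(base64 -w0 "/var/spool/slurm/job.${JOBID}/script")
ENVIRONMENT=$(base64 -w0 "/var/spool/slurm/job.${JOBID}/environment")
# store the encoded script and environment for later inspection
mysql --defaults-file=/etc/slurm/spy.my.cnf \
      -e "INSERT INTO job_scripts (job_id, script, environment) VALUES (${JOBID}, '${SCRIPT}', '${ENVIRONMENT}')" \
      slurm_spy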

cheers,
marcin



>
>
> On 21/08/2014 19:18, Marcin Stolarek wrote:
>
>
>
> W dniu czwartek, 21 sierpnia 2014 Antony Cleave 
> napisał(a):
>
>>
>> Is it possible to store the job submission script and the environment
>> variables passed  to it in the account database or log this data
>> automatically to /path/to/spylog/.log files  in SLURM?
>>
>> I'm interested in analysing what the cluster is used for over time and
>> this would be a good start in working out what is really being submitted.
>
>
>  You can slurmctld prologue, we use this and put every job script into
> mysql.
> Cheers,
> Marcin
>
>>
>> Thanks
>>
>> Antony
>>
>
>


[slurm-dev] Re: Storing the job submission script in the accounting database

2014-08-21 Thread Marcin Stolarek
W dniu czwartek, 21 sierpnia 2014 Antony Cleave 
napisał(a):

>
> Is it possible to store the job submission script and the environment
> variables passed  to it in the account database or log this data
> automatically to /path/to/spylog/.log files  in SLURM?
>
> I'm interested in analysing what the cluster is used for over time and
> this would be a good start in working out what is really being submitted.


You can use the slurmctld prologue; we use this and put every job script into
MySQL.
Cheers,
Marcin

>
> Thanks
>
> Antony
>


[slurm-dev] Re: How to size the controller systems

2014-08-18 Thread Marcin Stolarek
W dniu poniedziałek, 18 sierpnia 2014 Jason Bacon 
napisał(a):

>
> The controller generally shouldn't require much, but if you're running
> Linux, be aware that the way memory use is measured in recent kernels makes
> it look like slurmctld is using a lot of RAM
>

Can you point me to detailed information about that ? How is the memory
measured?

>  when multiple threads are active.  I had to up the per-process limit to
> 10G on our CentOS 6.5 controller nodes, even though slurmctld was using
> less than 1G in reality.
>
> Regards,
>
> Jason
>
> On 8/18/14 1:08 PM, Louis Capps wrote:
>
> Hi,
> We are looking at using SLURM for a large 6000 node cluster and need more
> info on the support systems.  Can you point me to a sizing guide or info on
> the requirements for the primary and backup controllers for SLURM including
> CPU, memory and local disk requirements?
>
> Thx,
> Louis
>
>
>
> ***
> Louis Capps (lca...@us.ibm.com
> )
>   --- Systems Architect - Federal High Performance Computing - US Federal
> IMT - IBM Corporation
>   --- Office (512)286-5556, t/l 363-5556 --- fax 678-6146 --- cell
> (512)796-4501
>   --- Bld 045, 3C80, Austin, TX
> http://www-1.ibm.com/servers/deepcomputing/
> http://www-03.ibm.com/systems/clusters/
>
> ***
>
>
>
> --
> ~~~
>   Jason W. Bacon
>   jwba...@tds.net 
>
>   Circumstances don't make a man:
>   They reveal him.
> ~~~
>
>


[slurm-dev] Re: Re:

2014-08-06 Thread Marcin Stolarek
2014-08-06 17:50 GMT+02:00 Erica Riello :

>  Uwe,
>
> I've created slurm user and edited the value in slurm.conf. However, it
> still shows the same error message:
>
> > sudo slurmctld -D -
> slurmctld: pidfile not locked, assuming no running daemon
> slurmctld: error: Configured MailProg is invalid
> slurmctld: error: Job accounting information gathered, but not stored
> slurmctld: fatal: Incorrect permissions on state save loc: /var/spool
>
> There's no /var/log/audit directory, is it supposed to be there?
>
> I'm a newbie, I don't know what's SELinux...
>
 http://pl.wikipedia.org/wiki/Security-Enhanced_Linux
;-)

>
> The file slurm.conf is attached to this email.
>
> Regards,
>
> Erica
>
>
>
> 2014-08-05 17:55 GMT-03:00 Uwe Sauter :
>
>  I suggest that you create the slurm user as described in the guide.
>> Check again the permissions of all folders that are configured in
>> slurm.conf. (Does munge run as a separate user?)
>>
>> If you still have problems then it would be good if you post your
>> complete slurm.conf so the list can check that for errors.
>>
>> Also take a look into /var/log/audit/audit.log (at least on RHEL based
>> distributions this is the file where SELinux logs errors).
>>
>> I'm not sure if SELinux could be another piece in this puzzle but you
>> could also try turning off SELinux temporarily (run setenforce 0, this
>> won't survive a reboot).
>>
>>
>> Regards,
>>
>> Uwe
>>
>> Am 05.08.2014 22:39, schrieb Erica Riello:
>>
>> Uwe,
>>
>>  Thank you for the info.
>>
>>  I compiled it myself, following the instructions available in the
>> website.
>>
>>  Regards,
>>
>>  Erica.
>>
>>
>> 2014-08-05 17:32 GMT-03:00 Uwe Sauter :
>>
>>>  Erica,
>>>
>>> sure it could be changed. But then you have to change all the
>>> permissions of the different folders that slurm uses to reflect that.
>>>
>>> Running daemons as root is discouraged since a few years for security
>>> reasons. This doesn't necessarly mean that the daemon is started by that
>>> user account but that it gives up all those privileges that it doesn't need
>>> right after the start and that it also changes the EUID (effective user id)
>>> once it did setup all the things it needs root privileges for (like binding
>>> a socket <1024).
>>> The keyword to search for is "capabilities".  A very good and detailed
>>> book about this and many other topics is "The Linux Programming Interface"
>>> by Michael Kerrisk (http://man7.org/tlpi/)
>>>
>>> Can you tell us how you installed slurm? Did you compile it yourself or
>>> used RPMs? If you used RPMs, those should have created the user account I
>>> was referring to.
>>>
>>> I didn't have problems installing by following
>>> http://slurm.schedmd.com/quickstart_admin.html
>>>
>>> Regards,
>>>
>>> Uwe
>>>
>>>
>>> Am 05.08.2014 22:20, schrieb Erica Riello:
>>>
>>> Hi Uwe,
>>>
>>>  I thought it could be changed to another user. Well, if it is the user
>>> who will run the daemons, shouldn't it be root?
>>>
>>>  Regards,
>>>
>>>  Erica
>>>
>>>
>>> 2014-08-05 17:10 GMT-03:00 Uwe Sauter :
>>>
  Hi Erica,

 I think you misunderstood the concept of "service user" in Linux.

 SlurmUser in the slurm.conf doesn't mean which user should be able to
 use SLURM (submit jobs, etc.) but which system user will run the slurm
 control daemon and slurm database daemon. This user is usually called
 "slurm".

 Here's the line of my /etc/passwd for this user:

 slurm:x:222:222:SLURM Manager:/:/bin/false


 Regards,

 Uwe


 Am 05.08.2014 22:04, schrieb Erica Riello:

 Kiran,

  that's exactly what I've done, but I still get the same error
 messages.

  Erica


  2014-08-05 15:54 GMT-03:00 Kiran Thyagaraja :

>
> You should be ideally specifying a directory under /var/spool
> e.g /var/spool/slurm and then change its permissions so that the
> SlurmUser can write to it.
>
> Kiran
>
> On 08/05/2014 01:33 PM, Mike Johnson wrote:
>
>> Re: [slurm-dev]
>>
>> Hi Erica
>>
>> What's the value of SlurmUser in the slurm.conf?
>>
>> You'll need to make sure the MailProg exists and is executable by the
>> SlurmUser too
>>
>> Mike
>>
>>
>>  On 5 August 2014 18:20, Erica Riello > > wrote:
>>
>> Hi all,
>>
>> I've been trying to run the slurm controller daemon, but I get an
>> error I have no idea how to solve:
>>
>> > sudo slurmctld -Dc
>> slurmctld: pidfile not locked, assuming no running daemon
>> slurmctld: error: Configured MailProg is invalid
>> slurmctld: error: Job accounting information gathered, but not
>> stored
>> slurmctld: fatal: Incorrect permissions on state save loc:
>> /var/spool
>>
>>
>> Munge daemon is running and /var/spool permissions are:
>>
>>

[slurm-dev] Re: Checkpoint support using BLCR - Steps and needed packages

2014-08-05 Thread Marcin Stolarek
2014-08-05 19:11 GMT+02:00 Trey Dockendorf :

>
> I have found that in order to support SUSPEND preemption we can not use
> CR_Memory or Memory as a consumable resource.  I've seen that if a
> preemptable partition has requested 15900MB of RAM on a 16GB node then the
> job will not be preempted and understandably so.  Now I'm looking at how to
> implement Preemption using Checkpoint.  However I'm unable to find any
> documentation on the exact behavior, configuration and necessary packages.
>

The job can be preempted only if it can still fit in RAM. For example, if a 512 GB
memory job were to be preempted, it would take a lot of time to swap out the whole
memory. It's better to check this at the queueing-system level rather than assume
that you can use swap (I'm not sure how that would work, for instance, on a
BlueGene system).



>
> I have rebuilt the BLCR SRPM for my cluster, and am unsure which packages
> are necessary for the various systems.  I have the SLURM controller, SLURM
> compute nodes and SLURM submit hosts (login nodes) that do not run the
> slurm daemon but only submit jobs.
>
> I'm also unsure what the expected behavior of when a job is preempted and
> checkpointed.  Will the job's state be saved?  The documentation mentions
> ImageDir but does not mention how it's set outside of interactive scontrol
> commands.  If I enable PreemptMode=CHECKPOINT, I'm just not clear on what
> the expected behavior will be for a user's job.
>
> Any guidance on how other sites have implemented BLCR checkpointing, and
> your experiences would be useful.
>

It's quite difficult stuff, and it's much more on the MPI and BLCR side than on
Slurm's.

cheers,
marcin


[slurm-dev] Re: Preemption and job cancellation

2014-08-05 Thread Marcin Stolarek
2014-08-05 23:47 GMT+02:00 Satrajit Ghosh :

>  hi
>
> out cluster is setup with the configuration below. yet we have been having
> a lot of jobs cancelled when preempted:
>
> slurmd[node004]: *** JOB 79188 CANCELLED AT 2014-08-05T15:31:41 DUE TO
> PREEMPTION ***
> i thought the settings would simply suspend the job instead of canceling
> it.
>
> cheers,
>
> satra
>
> Partial configuration
> ---
>
> PreemptMode=GANG,SUSPEND
>
> PreemptType=preempt/partition_prio
>
> # default
>
> SchedulerTimeSlice=30
>
> DefMemPerCPU=2048
>
> DefMemPerNode=2048
>
> PartitionName=DEFAULT MaxTime=7-0 DefaultTime=24:00:00
>
> # Partitions
>
> PartitionName=defq Default=NO MinNodes=1 DefaultTime=1-00:00:00
> MaxTime=7-00:00:00 AllowGroups=ALL Priority=1 DisableRootJobs=NO
> RootOnly=NO Hidden=YES Shared=NO GraceTime=0 ReqResv=NO
> PreemptMode=GANG,SUSPEND State=UP
>
> PartitionName=om_all_nodes Default=YES MinNodes=1 DefaultTime=1-00:00:00
> MaxTime=7-00:00:00 AllowGroups=ALL Priority=1 DisableRootJobs=NO
> RootOnly=NO Hidden=NO Shared=FORCE:4 GraceTime=0 ReqResv=NO
> PreemptMode=GANG,SUSPEND State=UP Nodes=node[001-030]
>
> PartitionName=om_interactive Default=NO MinNodes=1 MaxNodes=1
> DefaultTime=01:00:00 MaxTime=01:00:00 AllowGroups=ALL Priority=10
> DisableRootJobs=NO RootOnly=NO Hidden=NO Shared=FORCE:1 GraceTime=0
> MaxCPUsPerNode=32 ReqResv=NO PreemptMode=GANG,SUSPEND State=UP Nodes=node017
>
>
If I remember the logic correctly, it will try to suspend your job, but if the
plugin (proctrack?) fails to suspend it, the job will be killed.

Are you using the cgroup freezer or SIGSTOP to suspend your jobs?

hope this can help

marcin


[slurm-dev] Re: Enforcing qos limits without associations limits

2014-08-01 Thread Marcin Stolarek
2014-08-01 0:22 GMT+02:00 Trey Dockendorf :

>
> I don't have a solution regarding the removal of accounting enforcement,
> but what is it your storing in LDAP that is checked by your plugin?
>

We prefer using a job_submit plugin that connects to the LDAP database, because
this guarantees that closing an account in the web interface (which is not managed
by the cluster administrators) immediately affects the ability to submit jobs.



>
> We are still migrating our Torque/Maui cluster to SLURM and part of the
> migration includes moving from /etc/passwd based user management to LDAP.
>  I've gone to considerable trouble to script the importing of our LDAP into
> slurmdbd.  Right now my script only queries LDAP and converts various
> attributes into a sacctmgr import file.  The code is a proof-of-concept and
> eventually will be changed to perform regular checks that slurmdbd matches
> LDAP.
>

We have our own code that works the same way (however, we have our own LDAP schema
for accounts; honestly, we are merging two different schemas from different
sources). Our backup file has more than 4,700 lines and currently we are unable to
load it; the error message is very helpful :) "Unspecified error".


> Something similar to what I'm doing could possibly be easier than having
> the SLURM code changed.
>
An easier solution often doesn't mean a better one ;-)

We only want to use the QoS values allowed for each user, but we are forced to
generate the dump file and load it from cron - doesn't that look ugly? :)


cheers,
marcin


[slurm-dev] Enforcing qos limits without associations limits

2014-07-31 Thread Marcin Stolarek
Hi guys,

In our installation we have a separate job_submit plugin which checks account
validity directly in LDAP. We would like to disable associations enforcement, but
in the current configuration we are using QoS limits which limit the number of jobs
and cores per user (a user can choose QoS normal - with longer jobs and a lower
number of running/allocated jobs and cores - or QoS short, which allows more
resources with a lower walltime limit).

I haven't checked the code yet; do you think there is an easy way to remove the
accounting enforcement dependencies?
cheers,
marcin


[slurm-dev] Re: even CPU load

2014-07-28 Thread Marcin Stolarek
2014-07-28 8:00 GMT+02:00 Леонид Коньков :

>  Hi.
>
> I want my CPUs be loaded as even as possible. I have 22 nodes
> (motherboards) with 1 CPU each with 8 CPU cores each. ( My English is far
> from perfect, and I'm not shure what is what in you terminology.) I want to
> load 100 onethread tasks with 4-5 tasks on one CPU. But tasks always fill
> all 8 cores of CPU and leave some CPUs idle.
>
You mean it shouldn't leave any CPU idle? Or do you want the node to stay
responsive and available for interactive work while tasks are running?


>
> <--distribution=cyclic> and <--hint=memory_bound> don't help.
>
> Leo.
>

If I understood you correctly, you can limit the slurmd cpuset to a subset of your
cores, so that jobs running under this cpuset won't use more than what is assigned
to the whole cgroup/cpuset/slurm/ group.

cheers,
marcin


[slurm-dev] Re: Limiting job array concurrency

2014-07-22 Thread Marcin Stolarek
2014-07-21 14:43 GMT+02:00 Yuri D'Elia :

>
> Is there a way for an user to specify an upper bound for the number of
> jobs running simultaneously on an array?
>
> A practical example for this scenario is to limit concurrent connections
> to a shared resource that is external to the cluster. As such, I don't
> want really to specify a "Gres" resource, as this would also bind jobs
> to the node/s where the resource was assigned and would be not much
> better than -w.
>
> Thanks.
>

Create licences for this resource?
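
A sketch of what that would look like (license name and counts are examples only):

# slurm.conf: at most 10 jobs may hold the external resource at once
Licenses=extdb:10

# in the array job script: each task that touches the resource requests one
#SBATCH --array=1-500
#SBATCH --licenses=extdb:1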

cheers,
marcin


[slurm-dev] Job Submit plugin API

2014-07-16 Thread Marcin Stolarek
Hi guys,

I've been checking job submit plugins API description:
http://slurm.schedmd.com/job_submit_plugins.html

And I was surprised that the third argument of the job_modify function is
documented as: part_list (input) List of pointers to partitions which this
user is authorized to use.

I've checked the job_modify definition in job_submit_partition.c (14.03.5)
and it looks like:
extern int job_modify(struct job_descriptor *job_desc,
  struct job_record *job_ptr, uint32_t submit_uid)


Is there an error in the documentation?

cheers,
marcin


[slurm-dev] Re: Segfault in gres.c:2945 with 14.03.5

2014-07-15 Thread Marcin Stolarek
2014-07-15 15:43 GMT+02:00 Markus Blank-Burian :

> Hi,
>
> after job 436172 completed, the slurmctld daemon segfaulted. Starting
> slurmctld again reproduces the segfault. Debugging with gdb shows the
> following backtrace. How can I fix this without losing the complete state?
>
> Markus
>
>
check this bug:
 http://bugs.schedmd.com/show_bug.cgi?id=958

>
> slurmctld: _sync_nodes_to_comp_job: Job 436172 in completing state
> [New Thread 0x72906700 (LWP 15397)]
> [New Thread 0x72805700 (LWP 15398)]
> slurmctld: debug:  Priority MULTIFACTOR plugin loaded
> slurmctld: debug2: _adjust_limit_usage: job 436172: MPC: job_memory set to
> 16384
> slurmctld: debug2: Spawning RPC agent for msg_type REQUEST_TERMINATE_JOB
> [New Thread 0x72704700 (LWP 15399)]
> slurmctld: _sync_nodes_to_comp_job: completing 1 jobs
> slurmctld: debug:  Updating partition uid access list
> slurmctld: Recovered state of 0 reservations
> slurmctld: State of 0 triggers recovered
> [New Thread 0x72603700 (LWP 15400)]
> slurmctld: debug2: got 1 threads to send out
> slurmctld: read_slurm_conf: backup_controller not specified.
> slurmctld: cons_res: select_p_reconfigure
> slurmctld: cons_res: select_p_node_init
> slurmctld: cons_res: preparing for 3 partitions
> [New Thread 0x72502700 (LWP 15401)]
> [New Thread 0x72401700 (LWP 15402)]
> slurmctld: debug2: Tree head got back 0 looking for 1
> slurmctld: Running as primary controller
> slurmctld: Registering slurmctld at port 6817 with slurmdbd.
> slurmctld: debug2: Tree head got back 1
> [Thread 0x72401700 (LWP 15402) exited]
> slurmctld: cleanup_completing: job 436172 completion process took 2671
> seconds
>
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x72502700 (LWP 15401)]
> 0x0054e0a7 in gres_plugin_job_clear (job_gres_list=<optimized out>) at gres.c:2945
> 2945            FREE_NULL_BITMAP(job_state_ptr->gres_bit_step_alloc[i]);
> (gdb) bt
> #0  0x0054e0a7 in gres_plugin_job_clear (job_gres_list=<optimized out>) at gres.c:2945
> #1  0x0048e350 in delete_step_records (job_ptr=job_ptr@entry=0xb85b08) at step_mgr.c:263
> #2  0x0045d7d3 in cleanup_completing (job_ptr=job_ptr@entry=0xb85b08) at job_scheduler.c:3057
> #3  0x0046713c in make_node_idle (node_ptr=0x7ff728, job_ptr=job_ptr@entry=0xb85b08) at node_mgr.c:3072
> #4  0x0044bac6 in job_epilog_complete (job_id=436172, node_name=0x7fffecc8 "kaa-23", return_code=return_code@entry=0) at job_mgr.c:10265
> #5  0x00436d7c in _thread_per_group_rpc (args=0x7fffe8000a28) at agent.c:923
> #6  0x77486ed3 in start_thread (arg=0x72502700) at pthread_create.c:308
> #7  0x771bbe2d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
> (gdb) list
> 2940                if (!job_gres_ptr)
> 2941                        continue;
> 2942                job_state_ptr = (gres_job_state_t *) job_gres_ptr->gres_data;
> 2943                for (i = 0; i < job_state_ptr->node_cnt; i++) {
> 2944                        FREE_NULL_BITMAP(job_state_ptr->gres_bit_alloc[i]);
> 2945                        FREE_NULL_BITMAP(job_state_ptr->gres_bit_step_alloc[i]);
> 2946                }
> 2947                xfree(job_state_ptr->gres_bit_alloc);
> 2948                xfree(job_state_ptr->gres_bit_step_alloc);
> 2949                xfree(job_state_ptr->gres_cnt_step_alloc);
> (gdb)


[slurm-dev] Re: problem with slurm job step creation

2014-05-30 Thread Marcin Stolarek
2014-05-28 19:05 GMT+02:00 :

>  Hi,
>
>
>
> I am facing below error with builtin  scheduling in shared mode of GPUs
> (cons_resource,CR_CORE_MEMORY) .
>
>
>
> Facing below message and  in output file.
>
>
>
>
>
> "srun: Job step creation temporarily disabled, retrying"
>
Are you trying this with an interactive job?

I've seen such a problem with gres and interactive jobs; batch jobs were
working just fine, but I haven't had time to examine this deeply.
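
In case it helps to reproduce, the failing pattern was roughly the following
(the gres name is just an example from our gres.conf):

    salloc --gres=gpu:1 -n1
    srun --gres=gpu:1 hostname    # step creation kept being postponed

while the same request inside an sbatch script started fine.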

cheers,
marcin

>
>
> Thanks in  Advance.
>
>
>
> --
>
> Regards,
>
> *Yogendra Sharma*
>
>
>
>
>



-- 
Marcin Stolarek
Interdisciplinary Centre for Mathematical and Computational Modelling (ICM),
University of Warsaw, Poland


[slurm-dev] How is job_suspend function defined?

2014-05-29 Thread Marcin Stolarek
Hi,

I'm trying to understand how the job suspension mechanism works in my
configuration, but I got stuck looking for the definition of the job_suspend
function (used in slurmctld/gang.c).

I suppose this depends on my proctrack plugin configuration, but... I cannot
figure out where this connection (in my case probably with
_slurm_cgroup_suspend) is made in the code.
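
My working assumption (not verified in the code yet) is that with
proctrack/cgroup the suspension ultimately freezes the job's freezer cgroup,
i.e. something roughly equivalent to:

    # the path layout below is a guess, for illustration only
    echo FROZEN > /sys/fs/cgroup/freezer/slurm/uid_1000/job_12345/freezer.state
    # and THAWED again on resume

but I'd still like to see where job_suspend is wired to the plugin call.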

thanks in advance,
marcin

-- 
Marcin Stolarek
Interdisciplinary Centre for Mathematical and Computational Modelling (ICM),
University of Warsaw, Poland


[slurm-dev] How to check if job is interactive

2014-05-21 Thread Marcin Stolarek

Hi,

I'm wondering if it's possible to recognize a job as interactive within
a job submit plugin. Is it possible to get this information from the
job_desc structure, or to check whether the job was submitted with --pty?

Does an interactive job really differ from a batch job, or is the shell
simply the binary being run in such a job?
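
The only heuristic I can think of so far is checking whether the job carries
a batch script at all; a minimal, untested sketch (it assumes that
job_desc->script is NULL for non-batch submissions, uses the 2.6/14.03-era
job_submit signature, and omits the usual plugin boilerplate):

    #include "slurm/slurm_errno.h"
    #include "src/common/log.h"
    #include "src/slurmctld/slurmctld.h"

    extern int job_submit(struct job_descriptor *job_desc, uint32_t submit_uid)
    {
            /* sbatch fills in the script; salloc/srun allocations leave it NULL */
            if (job_desc->script == NULL)
                    info("job_submit: uid %u submitted what looks like an "
                         "interactive job", submit_uid);
            return SLURM_SUCCESS;
    }

But that still would not tell a --pty srun apart from a plain salloc.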

cheers,
marcin


[slurm-dev] job_submit plugin init function

2014-05-08 Thread Marcin Stolarek
Hi Guys,

I'm writing a job submit plugin for which I find it convenient to have a
configuration file. I've seen that the job_submit/pbs plugin has a function
called init, however this is not documented in:
http://slurm.schedmd.com/job_submit_plugins.html
and... for now I believe it doesn't work.

My question is: does any mechanism exist to load a job_submit plugin
configuration file only once, on slurmctld start?
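
To make it concrete, what I would like to be able to write (a completely
untested sketch; the config file path is made up) is:

    #include <stdio.h>
    #include <string.h>
    #include "slurm/slurm_errno.h"

    static char backend_host[256];    /* read once, reused on every submission */

    /* ideally called once, when slurmctld loads the plugin */
    extern int init(void)
    {
            FILE *fp = fopen("/etc/slurm/job_submit_example.conf", "r");
            if (!fp)
                    return SLURM_ERROR;
            if (fgets(backend_host, sizeof(backend_host), fp))
                    backend_host[strcspn(backend_host, "\n")] = '\0';
            fclose(fp);
            return SLURM_SUCCESS;
    }

    extern int fini(void)
    {
            return SLURM_SUCCESS;
    }

so that the file is parsed on slurmctld start instead of on every submission.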

cheers
marcin
-- 
Marcin Stolarek
Interdisciplinary Centre for Mathematical and Computational Modelling (ICM),
University of Warsaw, Poland


[slurm-dev] Re: slurm_update error

2014-05-05 Thread Marcin Stolarek
2014-04-30 21:33 GMT+02:00 jeff wang :

>  Hello,
>
> I am using slurm  2.6.6. When I tried to change a job's size using the
> command:
>
> scontrol update JobId=2440 NumCPUs=3.
>
> It gave me an error:
>
> slurm_update error: Requested operation is presently disabled
>
> Can anyone help me ?
>
Was the job already running?
Check the FAQ: https://computing.llnl.gov/linux/slurm/faq.html#job_size

cheers,
marcin


[slurm-dev] Re: Shared TmpFS

2014-04-25 Thread Marcin Stolarek
2014-04-25 15:25 GMT+02:00 Barbara Krasovec :

>  Hello!
>
> As far as I know SlurmSpoolDir is a directory for slurmd state
> information, slurm TMPFS is the directory that the daemon should use when
> SlurmSpoolDir runs out of space.
>
man slurm.conf /TmpFS

  TmpFS
      Fully qualified pathname of the file system available to user jobs for
      temporary storage. This parameter is used in establishing a node's
      TmpDisk space. The default value is "/tmp".

  SlurmdSpoolDir
      Fully qualified pathname of a directory into which the slurmd daemon's
      state information and batch job script information are written. This
      must be a common pathname for all nodes, but should represent a
      directory which is local to each node (reference a local file system).
      The default value is "/var/spool/slurmd". Any "%h" within the name is
      replaced with the hostname on which the slurmd is running. Any "%n"
      within the name is replaced with the SLURM node name on which the
      slurmd is running.

so difficult...
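
For what it's worth, on a compute node this typically ends up as something
like the following in slurm.conf (the paths are only an example):

    TmpFS=/scratch/local
    SlurmdSpoolDir=/var/spool/slurmd.%n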

cheers,
marcin


[slurm-dev] Re: Guidance on planning a slurmdbd outage

2014-04-07 Thread Marcin Stolarek
2014-04-07 8:24 GMT+02:00 Christopher Samuel :

>
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
>
> Hi folks,
>
> We're doing some migrations of VMs to new infrastructure and one of
> them runs our central slurmdbd for our 3 Intel clusters and our
> BlueGene/Q.
>
> Now the Slurm accounting page says:
>
> # If SlurmDBD is configured for use but not responding then slurmctld
> # will utilize an internal cache until SlurmDBD is returned to service.
>
> We think it'll take between 30 minutes and an hour to do the move, are
> there any guesstimates on how much memory this cache uses if we had 1
> job per minute completing through a slurmctld ?
>
> I suspect it's not a great deal but it'd be good to have an idea how
> long we could safely run without a slurmdbd to talk to.
>

I believe it is not going to be a problem ;)

We have had longer outages of slurmdbd, under a real HTC load. Of course it's
important to have enough free memory, but I assume you are not running on
32MB of RAM.

cheers,
marcin


-- 
Marcin Stolarek
Interdisciplinary Centre for Mathematical and Computational Modelling (ICM),
University of Warsaw, Poland


[slurm-dev] select plugins doesn't correctly deal with PreemptMode=off

2014-03-18 Thread Marcin Stolarek
Hi guys,

I've been testing job preemption and found a bug in the implementation of
the PreemptMode=off option for partitions. The examples from
http://slurm.schedmd.com/preempt.html present non-preemptable partitions
with PreemptMode=off only when the option is set for the highest-priority
partition. In our environment we wanted to implement partitions like:

PartitionName=priousers       [...] Priority=30 PreemptMode=suspend
PartitionName=users           [...] Priority=20 PreemptMode=suspend Shared=FORCE:1
PartitionName=external-users  [...] Priority=10 PreemptMode=off     Shared=FORCE:1

so the priority for external-users is the lowest, but their jobs should not
be preempted. Without the patch, jobs from the external-users partition were
considered for preemption and were finally killed with:
slurmctld/gang.c:671
    if (rc != SLURM_SUCCESS) {
        rc = job_signal(job_ptr->job_id, SIGKILL, 0, 0, true);
        if (rc == SLURM_SUCCESS)
            info("preempted job %u had to be killed",
                 job_ptr->job_id);
        else {
            info("preempted job %u kill failure %s",
                 job_ptr->job_id, slurm_strerror(rc));
        }
    }


because they cannot be suspended, requeued, etc., since they are not in a
partition with an appropriate PreemptMode.

The needed change is in the select plugin being used, in our case cons_res,
but from checking the code the problem will be similar in the other plugins.
I've changed a condition in plugins/select/cons_res/job_test.c:2326 to:
    if (p_ptr->part_ptr->priority <= jp_ptr->part_ptr->priority &&
        p_ptr->part_ptr->preempt_mode != PREEMPT_MODE_OFF)

This is probably the only change needed to implement the correct behaviour
(I was testing on a workstation with 3 partitions), but I'd also recommend an
additional change in the preemptable_candidates list creation.

However, it is already implicitly checked (line 1558 of select_cons_res.c)
whether a preemptable candidate is a PREEMPT_MODE_OFF job:
    mode = slurm_job_preempt_mode(tmp_job_ptr);
    if ((mode != PREEMPT_MODE_REQUEUE) &&
        (mode != PREEMPT_MODE_CHECKPOINT) &&
        (mode != PREEMPT_MODE_CANCEL))
        continue;   /* can't remove job */

I think it's more efficient not to include jobs from a partition with
PREEMPT_MODE_OFF in the preemptable_candidates list at all, which can be done
in plugins/preempt/partition_prio/preempt_partition_prio.c:113:

    if ((job_p->part_ptr == NULL) ||
        (job_p->part_ptr->priority >= job_ptr->part_ptr->priority) ||
        (job_p->part_ptr->preempt_mode == PREEMPT_MODE_OFF))
        /* also skip jobs whose partition has PreemptMode=off */
        continue;



I've attached a patch with my changes; it also adds some additional debug
output guarded by the appropriate debug flags. I was working on 2.6.7, but a
quick review of the code shows that there were no changes in the 14.03-rc1
version.

cheers,
marcin
===
Marcin Stolarek
Interdisciplinary Centre for Mathematical and Computational Modelling (ICM),
University of Warsaw, Poland


patch
Description: Binary data


[slurm-dev]

2014-03-13 Thread Marcin Stolarek
Hi guys,

On our cluster we have run into a situation where we want to change the
SlurmdSpoolDir location; do you know of any way to do this without draining
the whole cluster?

cheers,
marcin


Marcin Stolarek
Interdisciplinary Center for Mathematical and Computational Modeling,
University of Warsaw


[slurm-dev] Re: Is MaxJobs enforced for associations?

2014-03-04 Thread Marcin Stolarek
2014-03-03 23:46 GMT+01:00 Christopher Samuel :

>
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
>
> On 04/03/14 04:53, Lyn Gerner wrote:
>
> > Have you also set AccountingStorageEnforce appropriately, as
> > described here: http://slurm.schedmd.com/resource_limits.html ?
>
> We've had the association one from the start (we rely on hard limits
> for CPU time for projects - i.e. accounts) so my first go with
> association based limits should have worked.
>
> However, I didn't when I sent that email for the QoS based limits
> though I fixed it shortly afterwards.
>
> I have a suspicion though that in the short window between me sending
> that email and adding qos as an enforcement a number of the problem
> users jobs finished, taking them below the 192 job cut off. :-/
>
> My next task is to figure out how to implement the equivalent of
> Maui/Moab's MAXIJOB such that any more than (say) 5 waiting jobs of a
> user get marked as QOSResourceLimit, and then patch our local showq to
> treat those as "blocked" and hide them from other users.
>

  bf_max_job_user=#
      The maximum number of jobs per user to attempt backfill scheduling for,
      not counting jobs which cannot be started due to an association
      resource limit. One can set this limit to prevent users from flooding
      the backfill queue with jobs that cannot start and that prevent jobs
      from other users to start. This is similar to the MAXIJOB limit in
      Maui. The default value is 0, which means no limit. This option applies
      only to SchedulerType=sched/backfill.
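
So something along these lines in slurm.conf should be the closest
equivalent (the value 5 just matches your example):

    SchedulerType=sched/backfill
    SchedulerParameters=bf_max_job_user=5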

cheers,
marcin


[slurm-dev] Re: Reserving a node for short jobs between specific hours

2014-02-14 Thread Marcin Stolarek

2014-02-14 4:50 GMT+01:00 Christopher Samuel :
>
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
>
> Hi there,
>
> I'm looking at tweaking our Slurm config to get it to handle our
> workload better and it seems to be doing pretty well, but there's one
> thing missing that I'd like to get going.
>
> When we had Torque and Moab (and I used to do the same with Maui at
> $JOB--) we had a standing reservation that would reserve a node on the
> cluster for jobs of less than 15 minutes between 8am and 8pm for testing.
>
> Basically any job that fitted the criteria would get attracted into
> the reservation and run there in preference to other nodes and those
> that didn't would be kept off it so people could get quick turnarounds.
>
> We can't use partitions for this as the user shouldn't need to know
> about this (and has never needed to know in the past, so it'd be a
> regression from their point of view) plus we'd need to have scripts to
> shuffle short jobs between partitions at 8am and 8pm and bring that
> partition up/down as appropriate.
>
> My reading of the manual page isn't encouraging, but I could just be
> not seeing the wood for the trees.  Any ideas please?
>

You can create a reservation for the root account (I mean the parent of all
accounts) and write your own job submit plugin to add this reservation to
jobs shorter than 15 minutes and to remove it (if set by the user) from jobs
longer than 15 minutes.
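
A rough, untested sketch of such a plugin (it assumes a standing reservation
already exists and is named "short", that time_limit is expressed in minutes,
uses the 2.6/14.03-era job_submit signature, and omits the usual plugin
boilerplate):

    #include <string.h>
    #include "slurm/slurm_errno.h"
    #include "src/common/xmalloc.h"
    #include "src/common/xstring.h"
    #include "src/slurmctld/slurmctld.h"

    extern int job_submit(struct job_descriptor *job_desc, uint32_t submit_uid)
    {
            if ((job_desc->time_limit != NO_VAL) &&
                (job_desc->time_limit <= 15)) {
                    /* short job: attract it into the standing reservation */
                    if (!job_desc->reservation)
                            job_desc->reservation = xstrdup("short");
            } else if (job_desc->reservation &&
                       !strcmp(job_desc->reservation, "short")) {
                    /* long job: keep it off the reserved node */
                    xfree(job_desc->reservation);   /* also NULLs the pointer */
            }
            return SLURM_SUCCESS;
    }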

cheers,
marcin


[slurm-dev] Re: Don't allow SSH Login

2014-02-06 Thread Marcin Stolarek
2014-02-06 Dennis Zheleznyak :

>  Hi everyone,
>
> I have a 4 node cluster - 3 compute servers and one storage server, all
> running CentOS 6.5.
>
> I want to allow ssh login only to one specific compute server and only
> from there users will be able to run jobs. If i'll disable ssh login to the
> rest of the compute servers, slurm won't work since it relies on SSH
>
Hi Dennis,

How does slurm rely on ssh?

cheers,
marcin

