[slurm-dev] Infrequent output flush of STDOUT and STDERR in slurm jobs
Dear everyone, I am seeing only very infrequent flushes (roughly every 30 minutes) of STDOUT and STDERR in my Slurm system (Slurm 14.11.8 on RHEL7; all nodes write their output to a shared network filesystem). I'd like to monitor the output 'live' via tailf or something similar, as if I were running the program directly on the nodes. There are hints that I should use the unbuffered option, e.g. 'srun --unbuffered [...]' or 'stdbuf -o0 -e0' at the beginning of my batch script, but neither works as expected. Am I missing a way to get the output flushed more frequently, or is that simply not supported yet? I'd very much welcome a feature like that! Looking forward to your answers! Best wishes, Felix Willenborg -- Felix Willenborg Arbeitsgruppe Machine Learning Department für Medizinische Physik und Akustik Fakultät für Medizin und Gesundheitswissenschaften Carl von Ossietzky Universität Oldenburg Küpkersweg 74, 26129 Oldenburg Tel: +49 441 798 3945 https://www.uni-oldenburg.de/machine-learning/
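For illustration, the kind of batch script I have been trying looks roughly like this (a sketch; my_program is a placeholder for the actual job, and the two variants are simply the hints mentioned above):

#!/bin/bash
#SBATCH --job-name=flush-test
#SBATCH --output=flush-test.%j.out

# variant 1: ask srun not to buffer the step's output
srun --unbuffered ./my_program

# variant 2: disable stdio buffering for the command itself
stdbuf -o0 -e0 ./my_program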
[slurm-dev] Re: SLURM ERROR! NEED HELP
Which OS are you using on both nodes and how exactly did you turn off the firewall? Best, Felix Am 05.07.2017 um 16:23 schrieb Said Mohamed Said: > Sinfo -R gives "NODE IS NOT RESPONDING" > ping gives successful results from both nodes > > I really can not figure out what is causing the problem. > > Regards, > Said > -------- > *From:* Felix Willenborg > *Sent:* Wednesday, July 5, 2017 9:07:05 PM > *To:* slurm-dev > *Subject:* [slurm-dev] Re: SLURM ERROR! NEED HELP > > When the nodes change to the down state, what is 'sinfo -R' saying? > Sometimes it gives you a reason for that. > > Best, > Felix > > Am 05.07.2017 um 13:16 schrieb Said Mohamed Said: >> Thank you Adam, For NTP I did that as well before posting but didn't >> fix the issue. >> >> Regards, >> Said >> >> *From:* Adam Huffman >> *Sent:* Wednesday, July 5, 2017 8:11:03 PM >> *To:* slurm-dev >> *Subject:* [slurm-dev] Re: SLURM ERROR! NEED HELP >> >> >> I've seen something similar when node clocks were skewed. >> >> Worth checking that NTP is running and they're all synchronised. >> >> On Wed, Jul 5, 2017 at 12:06 PM, Said Mohamed Said >> wrote: >> > Thank you all for suggestions. I turned off firewall on both >> machines but >> > still no luck. I can confirm that No managed switch is preventing >> the nodes >> > from communicating. If you check the log file, there is >> communication for >> > about 4mins and then the node state goes down. >> > Any other idea? >> > >> > From: Ole Holm Nielsen >> > Sent: Wednesday, July 5, 2017 7:07:15 PM >> > To: slurm-dev >> > Subject: [slurm-dev] Re: SLURM ERROR! NEED HELP >> > >> > >> > On 07/05/2017 11:40 AM, Felix Willenborg wrote: >> >> in my network I encountered that managed switches were preventing >> >> necessary network communication between the nodes, on which SLURM >> >> relies. You should check if you're using managed switches to connect >> >> nodes to the network and if so, if they're blocking communication on >> >> slurm ports. >> > >> > Managed switches should permit IP layer 2 traffic just like unmanaged >> > switches! We only have managed Ethernet switches, and they work >> without >> > problems. >> > >> > Perhaps you meant that Ethernet switches may perform some firewall >> > functions by themselves? >> > >> > Firewalls must be off between Slurm compute nodes as well as the >> > controller host. See >> > >> https://wiki.fysik.dtu.dk/niflheim/Slurm_configuration#configure-firewall-for-slurm-daemons >> > >> > /Ole >
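In case it helps to be concrete: on RHEL/CentOS 7, "turning off the firewall" would usually mean something like the following (a sketch; adjust to whatever firewall your distribution actually uses):

# is firewalld running?
firewall-cmd --state
systemctl status firewalld

# stop and disable it on the controller and on all compute nodes
systemctl stop firewalld
systemctl disable firewalld

# if plain iptables is used instead, list the active rules
iptables -L -n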
[slurm-dev] Re: SLURM ERROR! NEED HELP
When the nodes change to the down state, what is 'sinfo -R' saying? Sometimes it gives you a reason for that. Best, Felix Am 05.07.2017 um 13:16 schrieb Said Mohamed Said: > Thank you Adam, For NTP I did that as well before posting but didn't > fix the issue. > > Regards, > Said > > *From:* Adam Huffman > *Sent:* Wednesday, July 5, 2017 8:11:03 PM > *To:* slurm-dev > *Subject:* [slurm-dev] Re: SLURM ERROR! NEED HELP > > > I've seen something similar when node clocks were skewed. > > Worth checking that NTP is running and they're all synchronised. > > On Wed, Jul 5, 2017 at 12:06 PM, Said Mohamed Said > wrote: > > Thank you all for suggestions. I turned off firewall on both > machines but > > still no luck. I can confirm that No managed switch is preventing > the nodes > > from communicating. If you check the log file, there is > communication for > > about 4mins and then the node state goes down. > > Any other idea? > > > > From: Ole Holm Nielsen > > Sent: Wednesday, July 5, 2017 7:07:15 PM > > To: slurm-dev > > Subject: [slurm-dev] Re: SLURM ERROR! NEED HELP > > > > > > On 07/05/2017 11:40 AM, Felix Willenborg wrote: > >> in my network I encountered that managed switches were preventing > >> necessary network communication between the nodes, on which SLURM > >> relies. You should check if you're using managed switches to connect > >> nodes to the network and if so, if they're blocking communication on > >> slurm ports. > > > > Managed switches should permit IP layer 2 traffic just like unmanaged > > switches! We only have managed Ethernet switches, and they work without > > problems. > > > > Perhaps you meant that Ethernet switches may perform some firewall > > functions by themselves? > > > > Firewalls must be off between Slurm compute nodes as well as the > > controller host. See > > > https://wiki.fysik.dtu.dk/niflheim/Slurm_configuration#configure-firewall-for-slurm-daemons > > > > /Ole
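For reference, these are the commands I usually use to inspect and clear the node state (node01 is just a placeholder for your node name):

sinfo -R                                      # down/drained nodes with the recorded reason
scontrol show node node01 | grep -i reason    # more detail for a single node
scontrol update NodeName=node01 State=RESUME  # return the node to service once the cause is fixed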
[slurm-dev] Re: SLURM ERROR! NEED HELP
Hi, in my network I found that managed switches were blocking network communication between the nodes that SLURM relies on. You should check whether you're using managed switches to connect the nodes to the network and, if so, whether they block traffic on the Slurm ports. Best, Felix Am 05.07.2017 um 11:30 schrieb Ole Holm Nielsen: > > On 07/05/2017 11:25 AM, Ole Holm Nielsen wrote: >> Could it be that you have enabled the firewall on the compute nodes? >> The firewall must be turned off (this requirement isn't documented >> anywhere). >> >> You may want to go through my Slurm deployment Wiki at >> https://wiki.fysik.dtu.dk/niflheim/Niflheim7_Getting_started to see >> if anything obvious is missing in your configuration. > > Correction to the web page: https://wiki.fysik.dtu.dk/niflheim/SLURM > > Sorry, > Ole
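A quick way to test this (a sketch; nc, the hostname "controller", and the default ports 6817/6818 are assumptions, use the SlurmctldPort/SlurmdPort values from your own slurm.conf):

# from a compute node: is slurmctld on the controller reachable?
nc -zv controller 6817

# from the controller: is slurmd on a compute node reachable?
nc -zv node01 6818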
[slurm-dev] gpu_mem is not respected (?)
Dear all, I configured a GPU cluster with GRES the following way:

slurm.conf:
[...]
GresTypes=gpu,gpu_mem
[...]
NodeName=hpcg01 NodeAddr=x.x.x.x CPUs=12 RealMemory=128741 Sockets=2 CoresPerSocket=6 ThreadsPerCore=1 State=IDLE Gres=gpu_mem:6143,gpu:titanb:2
[...]

gres.conf (on controller):
NodeName=hpcg1 Type=titanb Name=gpu File=/dev/nvidia[0,1]
[...]
NodeName=hpcg[1,3-6] Name=gpu_mem Count=6143
[...]

gres.conf (on node hpcg01):
Name=gpu Type=titanb File=/dev/nvidia0
Name=gpu Type=titanb File=/dev/nvidia1
Name=gpu_mem Count=6143

When I run batch scripts with the flag --gres=gpu:1,gpu_mem:2000 and the program allocates more than 2000 MB of GPU memory, the program keeps running. Shouldn't it be terminated when it exceeds the limit, as with --mem? If it should be, then I have configured something wrong. Does anybody spot a mistake I made? I've been looking for a solution for days and I'm running out of ideas. Looking forward to your answers. Thanks in advance! Best wishes, Felix Willenborg
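For completeness, this is roughly how such a job is submitted (a sketch; train.py and the --mem value are placeholders for the program that allocates the GPU memory and its host memory request):

#!/bin/bash
#SBATCH --gres=gpu:1,gpu_mem:2000
#SBATCH --mem=4000

srun python train.py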
[slurm-dev] Re: Restrict access for a user group to certain nodes
Thanks Brian and Magnus for your answers. Reservations were exactly what I was looking for. Thank you very much! Have a nice day, Felix Am 01.12.2016 um 16:14 schrieb Magnus Jonsson: > > Hi! > > You could either setup a partition for your tests with group > restrictions or you can use the reservation feature depending on your > exact use case. > > /Magnus > > On 2016-12-01 15:54, Felix Willenborg wrote: >> >> Dear everybody, >> >> I'd like to restrict submissions from a certain user group or allow only >> one certain user group to submit jobs to certain nodes. Does Slurm offer >> groups which can handle such an occassion? It'd be prefered if there is >> a linux user group support, because this would save time setting up a >> new user group environment. >> >> The intention is that only administrators can submit jobs to those >> certain nodes to perform some tests, which might be disturbed by users >> submitting their jobs to those nodes. Various Search Engines didn't >> offer answers to my question, which is why I'm writing you here. >> >> Looking forward to some answers! >> >> Best, >> Felix Willenborg >> >
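For anyone finding this thread later, the kind of reservation I set up looks roughly like this (a sketch; the reservation name, user names, node list and duration are placeholders):

scontrol create reservation ReservationName=admin_tests Users=admin1,admin2 Nodes=node[05-06] StartTime=now Duration=7-00:00:00 Flags=IGNORE_JOBS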
[slurm-dev] Restrict access for a user group to certain nodes
Dear everybody, I'd like to restrict submissions from a certain user group, or allow only one particular user group to submit jobs to certain nodes. Does Slurm offer groups that can handle such an occasion? Support for Linux user groups would be preferred, because that would save the time of setting up a new group environment. The intention is that only administrators can submit jobs to those nodes to run some tests, which might otherwise be disturbed by users submitting their jobs there. Various search engines didn't offer answers to my question, which is why I'm writing to you here. Looking forward to some answers! Best, Felix Willenborg
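To make this more concrete, something along these lines in slurm.conf is what I have in mind (a sketch; the partition name, node list and group name are made up):

PartitionName=admintest Nodes=node[05-06] AllowGroups=sysadmins Default=NO MaxTime=INFINITE State=UP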
[slurm-dev] Re: One CPU always reserved for one GPU
Hey everyone, sorry for digging out this old post, but unfortunately I have searched very intensively and didn't find any solution. My suggestion would therefore be to add a flag for partitions like "MinCPUsPerGPUs" or something similar. Would that be useful? Or maybe someone has a good idea to solve my problem another way? Best, Felix Willenborg Am 07.03.2016 um 12:55 schrieb Felix Willenborg: > Dear Lachele, > > your suggestion is great! It would work if we'd have a complete > homogenic cluster - which is unfortunately not the case :(. All nodes > have at least two graphic cards, one has 3, another has 4. Also one Node > has a CPU with 16 cores. With MaxCPUsPerNode in slurm.conf for a > partition I'd exclude hardware which would never be used. That would be > very sad. > > Best, > Felix > > On 01.03.2016 23:29, Lachele Foley wrote: >> We do exactly that. We use the CPUs as the consumable resource rather >> than the GPUs for that reason. We also limit memory use as needed. >> You might want to see the configuration issues we ran into and solved >> as recorded in the thread at the link below. >> >> https://groups.google.com/forum/#!topic/slurm-devel/x6VaKfrdH5Y >> >> >> On Tue, Mar 1, 2016 at 1:27 PM, John Desantis wrote: >>> Felix, >>> >>> Although I haven't run into a use-case like yours (yet), my initial >>> thought was to use the flag "MaxCPUsPerNode" in your configuration: >>> >>> 'Maximum number of CPUs on any node available to all jobs from this >>> partition. This can be especially useful to schedule GPUs. For >>> example a node can be associated with two Slurm partitions (e.g. >>> "cpu" and "gpu") and the partition/queue "cpu" could be limited to >>> only a subset of the node’s CPUs, insuring that one or more CPUs would >>> be available to jobs in the "gpu" partition/queue.' >>> >>> HTH, >>> John DeSantis >>> >>> >>> >>> 2016-03-01 9:05 GMT-05:00 Felix Willenborg >>> : >>>> Hey folks, >>>> >>>> I'm kind of new to SLURM and we're setting it up in our work group with our >>>> nodes. Our cluster contains per node 2 GPUs and 12 CPU cores. >>>> >>>> The GPUs are configured with gres like this : >>>> Name=gpu_mem Count=6143 >>>> Name=gpu File=/dev/nvidia0 >>>> Name=gpu File=/dev/nvidia1 >>>> #Name=bandwidth count=4G >>>> (Somehow the bandwith plugin isn't available in the repository slurm and >>>> I'm >>>> getting error messages with that. That's why it's commented out. Is it even >>>> necessary?) >>>> >>>> The nodes are defined like that in the slurm.conf : >>>> [...] >>>> NodeName=node01 NodeAddr=<...> CPUs=12 RealMemory=128740 Sockets=2 >>>> CoresPerSocket=6 ThreadsPerCore=1 State=UNKNOWN >>>> Gres=gpu:3,gpu_mem:12287#,bandwidth:4G >>>> >>>> >>>> We'd like to have a situation where one CPU is always available for one GPU >>>> and only can allocated with one GPU, because we often had the situation >>>> that >>>> reservations were made where all CPUs were allocated and we couldn't use >>>> the >>>> GPUs anymore. I searched on the internet and didn't find any similiar cases >>>> which could help me. The only thing I found was adding "CPUS=0,1" at the >>>> end >>>> of every Name=gpu ... in gres.conf. Would this already do it? And if not, >>>> what can I do? I've got the feeling that I could solve my problem with >>>> SLURM >>>> in many ways. We're using SLURM version 14.11.8. >>>> >>>> Looking forward to some answers! >>>> >>>> Best wishes, >>>> Felix Willenborg >>
[slurm-dev] Re: One CPU always reserved for one GPU
Dear Lachele, your suggestion is great! It would work if we had a completely homogeneous cluster - which is unfortunately not the case :(. All nodes have at least two graphics cards, one has 3, another has 4. Also, one node has a CPU with 16 cores. With MaxCPUsPerNode in slurm.conf for a partition I would exclude hardware that could never be used. That would be very sad. Best, Felix On 01.03.2016 23:29, Lachele Foley wrote: > We do exactly that. We use the CPUs as the consumable resource rather > than the GPUs for that reason. We also limit memory use as needed. > You might want to see the configuration issues we ran into and solved > as recorded in the thread at the link below. > > https://groups.google.com/forum/#!topic/slurm-devel/x6VaKfrdH5Y > > > On Tue, Mar 1, 2016 at 1:27 PM, John Desantis wrote: >> Felix, >> >> Although I haven't run into a use-case like yours (yet), my initial >> thought was to use the flag "MaxCPUsPerNode" in your configuration: >> >> 'Maximum number of CPUs on any node available to all jobs from this >> partition. This can be especially useful to schedule GPUs. For >> example a node can be associated with two Slurm partitions (e.g. >> "cpu" and "gpu") and the partition/queue "cpu" could be limited to >> only a subset of the node’s CPUs, insuring that one or more CPUs would >> be available to jobs in the "gpu" partition/queue.' >> >> HTH, >> John DeSantis >> >> >> >> 2016-03-01 9:05 GMT-05:00 Felix Willenborg >> : >>> Hey folks, >>> >>> I'm kind of new to SLURM and we're setting it up in our work group with our >>> nodes. Our cluster contains per node 2 GPUs and 12 CPU cores. >>> >>> The GPUs are configured with gres like this : >>> Name=gpu_mem Count=6143 >>> Name=gpu File=/dev/nvidia0 >>> Name=gpu File=/dev/nvidia1 >>> #Name=bandwidth count=4G >>> (Somehow the bandwith plugin isn't available in the repository slurm and I'm >>> getting error messages with that. That's why it's commented out. Is it even >>> necessary?) >>> >>> The nodes are defined like that in the slurm.conf : >>> [...] >>> NodeName=node01 NodeAddr=<...> CPUs=12 RealMemory=128740 Sockets=2 >>> CoresPerSocket=6 ThreadsPerCore=1 State=UNKNOWN >>> Gres=gpu:3,gpu_mem:12287#,bandwidth:4G >>> >>> >>> We'd like to have a situation where one CPU is always available for one GPU >>> and only can allocated with one GPU, because we often had the situation that >>> reservations were made where all CPUs were allocated and we couldn't use the >>> GPUs anymore. I searched on the internet and didn't find any similiar cases >>> which could help me. The only thing I found was adding "CPUS=0,1" at the end >>> of every Name=gpu ... in gres.conf. Would this already do it? And if not, >>> what can I do? I've got the feeling that I could solve my problem with SLURM >>> in many ways. We're using SLURM version 14.11.8. >>> >>> Looking forward to some answers! >>> >>> Best wishes, >>> Felix Willenborg > >
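Just to spell out why a single MaxCPUsPerNode does not quite fit here (a sketch with made-up partition and node names, using the GPU counts mentioned above):

# works for the 12-core nodes with 2 GPUs: leaves 2 CPUs for GPU jobs
PartitionName=cpu Nodes=node[01-05] MaxCPUsPerNode=10 State=UP
# but the node with 3 GPUs would need MaxCPUsPerNode=9, the one with 4 GPUs
# MaxCPUsPerNode=8, and the 16-core node yet another value, while
# MaxCPUsPerNode is a single value per partition.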
[slurm-dev] One CPU always reserved for one GPU
Hey folks, I'm kind of new to SLURM and we're setting it up for the nodes in our work group. Each node in our cluster has 2 GPUs and 12 CPU cores. The GPUs are configured with GRES like this:

Name=gpu_mem Count=6143
Name=gpu File=/dev/nvidia0
Name=gpu File=/dev/nvidia1
#Name=bandwidth count=4G

(Somehow the bandwidth plugin isn't available in the packaged Slurm and I'm getting error messages with it. That's why it's commented out. Is it even necessary?)

The nodes are defined like this in slurm.conf:

[...]
NodeName=node01 NodeAddr=<...> CPUs=12 RealMemory=128740 Sockets=2 CoresPerSocket=6 ThreadsPerCore=1 State=UNKNOWN Gres=gpu:3,gpu_mem:12287#,bandwidth:4G

We'd like a situation where one CPU is always kept available for each GPU and can only be allocated together with that GPU, because we often had the situation that reservations were made in which all CPUs were allocated and we couldn't use the GPUs anymore. I searched the internet and didn't find any similar cases that could help me. The only thing I found was adding "CPUS=0,1" at the end of every Name=gpu ... line in gres.conf. Would that already do it? And if not, what can I do? I've got the feeling that my problem could be solved with SLURM in many ways. We're using SLURM version 14.11.8. Looking forward to some answers! Best wishes, Felix Willenborg
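For reference, the gres.conf variant I found mentioned would look roughly like this (a sketch; whether pinning the GPUs to CPUs this way actually reserves those CPUs for GPU jobs, rather than only binding GPU jobs to them, is exactly what I'm unsure about):

Name=gpu_mem Count=6143
Name=gpu File=/dev/nvidia0 CPUs=0
Name=gpu File=/dev/nvidia1 CPUs=1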
[slurm-dev] Re: Slurm is refusing to establish a connection between nodes and controller
So I tried installing the latest Slurm package (14.11.4-1), unfortunately with no success. I made sure the Infiniband plugin was compiled, that it is loaded by slurmd, and that an acct_gather.conf is available. Still, I have the same problem. I assume I'm not configuring Slurm correctly with regard to Infiniband. Where could I be making a mistake?
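For reference, the Infiniband accounting pieces of my setup look roughly like this (a sketch of my current understanding, not a verified configuration; the port number and frequencies are placeholders):

# slurm.conf
AcctGatherInfinibandType=acct_gather_infiniband/ofed
JobAcctGatherFrequency=task=30,network=30

# acct_gather.conf
InfinibandOFEDPort=1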
[slurm-dev] slurm-dev Re: Slurm is refusing to establish a connection between nodes and controller
@Yann Sagon: That is a good point, I hadn't noticed it. I'm new to this particular system; everything was already installed and configured when I took it over. I'll keep it in mind, because I'll have to discuss an upgrade with the group. Maybe I should set it up from scratch then. @John Desantis: I checked the node names and their IP addresses. /etc/hosts contains the correct IP address with the full DNS name as well as the short node name, e.g. "node01", and "ping node01" works. @Trevor Cooper: The munge daemon reports no errors in munged.log, and all nodes sync their time against the same NTP server. A manual check also showed the exact same time on every node. @Moe Jette: I already went through the "Slurm is not responding" points; unfortunately none of them helped. Thanks to everybody who has answered so far!
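For completeness, the manual checks I ran look like this (node01 stands in for each compute node):

# verify that a munge credential from the controller is accepted on a node
munge -n | ssh node01 unmunge

# compare clocks
date; ssh node01 date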
[slurm-dev] Slurm is refusing to establish a connection between nodes and controller
Hi there, first of all, I'm fairly new to Slurm, so I may well have missed something very basic here. I'm trying to set up six to seven nodes with homogeneous hardware as SLURM nodes. The nodes are connected via Infiniband. As the controller I have a system whose hardware differs a little from the nodes. To keep munge.key and slurm.conf identical on all systems I use Salt. So far so good. The problem I see is that no node responds to the master when "sinfo" is run on the controller. Confusingly, "scontrol ping" on every node says that the primary controller is up. Another thing that seems weird: when I watch the controller's log file, it reports that a node is found when slurmd on that node is restarted, and roughly one minute later the connection is lost again. I have checked pretty much everything that came to mind, such as blocked ports or wrongly set user/group permissions. Maybe you have an idea... I've run out of them. Below is the anonymized slurm.conf as well as the slurmctld.log and slurmd.log of one node. (The commands I've been using while debugging are listed after the config.) I'm looking forward to some help!! Best wishes, Felix Willenborg

slurm.conf

# slurm.conf file generated by configurator.html.
# Put this file on all nodes of your cluster.
# See the slurm.conf man page for more information.
#
ControlMachine=erica
ControlAddr=***.***.***.***
#BackupController=
#BackupAddr=
#
AuthType=auth/munge
CacheGroups=0
#CheckpointType=checkpoint/none
CryptoType=crypto/munge
#DisableRootJobs=NO
#EnforcePartLimits=NO
#Epilog=
#EpilogSlurmctld=
#FirstJobId=1
#MaxJobId=99
#GresTypes=gpu
#GroupUpdateForce=0
#GroupUpdateTime=600
#JobCheckpointDir=/var/slurm/checkpoint
#JobCredentialPrivateKey=
#JobCredentialPublicCertificate=
#JobFileAppend=0
#JobRequeue=1
#JobSubmitPlugins=1
#KillOnBadExit=0
#Licenses=foo*4,bar
#MailProg=/bin/mail
#MaxJobCount=5000
#MaxStepCount=4
#MaxTasksPerNode=128
MpiDefault=none
#MpiParams=ports=#-#
#PluginDir=
#PlugStackConfig=
#PrivateData=jobs
ProctrackType=proctrack/pgid
#Prolog=
#PrologSlurmctld=
#PropagatePrioProcess=0
#PropagateResourceLimits=
#PropagateResourceLimitsExcept=
ReturnToService=1
#SallocDefaultCommand=
SlurmctldPidFile=/var/run/slurm-llnl/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurm-llnl/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/lib/slurm-llnl/slurmd
SlurmUser=slurm
#SlurmdUser=root
#SrunEpilog=
#SrunProlog=
StateSaveLocation=/var/lib/slurm-llnl/slurmctld
SwitchType=switch/none
#TaskEpilog=
TaskPlugin=task/none
#TaskPluginParam=
#TaskProlog=
#TopologyPlugin=topology/tree
#TmpFs=/tmp
#TrackWCKey=no
#TreeWidth=
#UnkillableStepProgram=
#UsePAM=1
#
#
# TIMERS
#BatchStartTimeout=10
#CompleteWait=0
#EpilogMsgTime=2000
#GetEnvTimeout=2
#HealthCheckInterval=0
#HealthCheckProgram=
InactiveLimit=0
KillWait=30
#MessageTimeout=10
#ResvOverRun=0
MinJobAge=300
#OverTimeLimit=0
SlurmctldTimeout=120
SlurmdTimeout=7200
#UnkillableStepTimeout=60
#VSizeFactor=0
Waittime=0
#
#
# SCHEDULING
#DefMemPerCPU=0
FastSchedule=1
#MaxMemPerCPU=0
#SchedulerRootFilter=1
#SchedulerTimeSlice=30
SchedulerType=sched/backfill
SchedulerPort=7321
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory
#
#
# JOB PRIORITY
#PriorityType=priority/basic
#PriorityDecayHalfLife=
#PriorityCalcPeriod=
#PriorityFavorSmall=
#PriorityMaxAge=
#PriorityUsageResetPeriod=
#PriorityWeightAge=
#PriorityWeightFairshare=
#PriorityWeightJobSize=
#PriorityWeightPartition=
#PriorityWeightQOS=
#
#
# LOGGING AND ACCOUNTING
#AccountingStorageEnforce=0
#AccountingStorageHost=
AccountingStorageLoc=/var/log/slurm-llnl/accounting
#AccountingStoragePass=
#AccountingStoragePort=
AccountingStorageType=accounting_storage/filetxt
#AccountingStorageUser=
AccountingStoreJobComment=YES
ClusterName=cluster
#DebugFlags=
#JobCompHost=
#JobCompLoc=
#JobCompPass=
#JobCompPort=
JobCompType=jobcomp/none
#JobCompUser=
JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/linux
SlurmctldDebug=7
SlurmctldLogFile=/var/log/slurm-llnl/slurmctld.log
SlurmdDebug=7
SlurmdLogFile=/var/log/slurm-llnl/slurmd.log
#SlurmSchedLogFile=
#SlurmSchedLogLevel=
#
#
# POWER SAVE SUPPORT FOR IDLE NODES (optional)
#SuspendProgram=
#ResumeProgram=
#SuspendTimeout=
#ResumeTimeout=
#ResumeRate=
#SuspendExcNodes=
#SuspendExcParts=
#SuspendRate=
#SuspendTime=
#
#
# COMPUTE NODES
#NodeName=node[01-06] CPUs=12 RealMemory=128910 Sockets=2 CoresPerSocket=6 ThreadsPerCore=1 State=UNKNOWN
NodeName=node01 NodeAddr=***.***.***.51 CPUs=12 RealMemory=128910 Sockets=2 CoresPerSocket=6 ThreadsPerCore=1 State=UNKNOWN
NodeName=node02 NodeAddr=***.***.***.52 CPUs=12 RealMemory=128910 Sockets=2 CoresPerSocket=6 ThreadsPerCore=1 State=UNKNOWN
NodeName=node03 NodeAddr=***.***.***.53 CPUs=12 RealMemory=128910 Sockets=2 CoresPerSock
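These are the commands I've been using while debugging, in case they help interpret the logs (node01 stands in for one of the anonymized nodes above):

# on a compute node: run slurmd in the foreground with verbose logging
slurmd -D -vvvv

# on the controller: check that slurmctld is listening and that the node is known
ss -tlnp | grep 6817
scontrol ping
scontrol show node node01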