Re: [slurm-users] Correct way to give srun and sbatch different MaxTime values?

2020-08-06 Thread Jaekyeom Kim
Thank you for the answer.
I wasn't aware of that file.
I'll look into it!

Best,
Jaekyeom


On Wed, Aug 5, 2020 at 3:27 AM Renfro, Michael  wrote:

> Untested, but you should be able to use a job_submit.lua file to detect if
> the job was started with srun or sbatch:
>
>- Check with (job_desc.script == nil or job_desc.script == '')
>- Adjust job_desc.time_limit accordingly
>
> Here, I just gave people a shell function "hpcshell", which automatically
> drops them in a time-limited partition. Easier for them, fewer idle
> resources for everyone:
>
> hpcshell ()
> {
> srun --partition=interactive $@ --pty $SHELL -i
> }
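>
> Roughly, the job_submit.lua logic for that would look something like this
> (untested sketch; the 60-minute cap is just an example value):
>
> function slurm_job_submit(job_desc, part_list, submit_uid)
>     -- batch jobs arrive with a script; srun/salloc submissions do not
>     if job_desc.script == nil or job_desc.script == '' then
>         local interactive_limit = 60  -- minutes
>         if job_desc.time_limit == slurm.NO_VAL or job_desc.time_limit > interactive_limit then
>             job_desc.time_limit = interactive_limit
>         end
>     end
>     return slurm.SUCCESS
> end
>
> function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
>     return slurm.SUCCESS
> end
>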
> --
> *From:* slurm-users  on behalf of
> Jaekyeom Kim 
> *Sent:* Tuesday, August 4, 2020 5:35 AM
> *To:* slurm-us...@schedmd.com 
> *Subject:* [slurm-users] Correct way to give srun and sbatch different
> MaxTime values?
>
>
> Hi,
>
> I'd like to prevent my Slurm users from taking up resources with dummy
> shell process jobs left unaware/intentionally.
> To that end, I simply want to put a tougher maximum time limit for srun
> only.
> One possible way might be to wrap the srun binary.
> But could someone tell me if there is any proper way to do it, please?
>
> Best,
> Jaekyeom
>
>


Re: [slurm-users] Slurmstepd errors

2020-08-06 Thread Williams, Jenny Avis
We ran into a similar error -- 

A response from schedmd:
https://bugs.schedmd.com/show_bug.cgi?id=3890

Remediation steps that got us past this particular issue until we could update:
Check the messages log for "xcgroup_instantiate" errors and close (drain) the
compute node hosts that show them. A reboot clears the condition.
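
Roughly (a sketch only -- adjust the node list, log path and pdsh to whatever you use):

  pdsh -w nid000[29-32] 'grep -q "xcgroup_instantiate error" /var/log/messages && echo AFFECTED' 2>/dev/null \
    | awk -F: '/AFFECTED/ {print $1}' \
    | while read node; do
        scontrol update NodeName=$node State=DRAIN Reason="xcgroup_instantiate error, reboot pending"
      done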



-Original Message-
From: slurm-users  On Behalf Of Matthew 
BETTINGER
Sent: Tuesday, July 28, 2020 12:53 AM
To: Slurm User Community List 
Subject: [slurm-users] Slurmstepd errors

Hello,

We are running Slurm 17.02.6 on a Cray system, and all of a sudden we have been
receiving these error messages from slurmstepd.  Not sure what triggers this?

srun -N 4 -n 4 hostname
nid00031
slurmstepd: error: task/cgroup: unable to add task[pid=903] to memory cg '(null)'
nid00029
nid00030
slurmstepd: error: task/cgroup: unable to add task[pid=50322] to memory cg '(null)'
nid00032

The jobs seem to be running but this sort of just popped up for some reason.

Configuration data as of 2020-07-27T23:51:10
AccountingStorageBackupHost = (null)
AccountingStorageEnforce = none
AccountingStorageHost   1
AccountingStorageLoc= N/A
AccountingStoragePort   = 6819
AccountingStorageTRES   = gres/gpu,gres/craynetwork,bb/cray,cpu,mem,energy,node
AccountingStorageType   = accounting_storage/slurmdbd
AccountingStorageUser   = N/A
AccountingStoreJobComment = Yes
AcctGatherEnergyType= acct_gather_energy/rapl
AcctGatherFilesystemType = acct_gather_filesystem/none
AcctGatherInfinibandType = acct_gather_infiniband/none
AcctGatherNodeFreq  = 30 sec
AcctGatherProfileType   = acct_gather_profile/none
AllowSpecResourcesUsage = 1
AuthInfo= (null)
AuthType= auth/munge
BackupAddr  = hickory-2
BackupController= hickory-2
BatchStartTimeout   = 10 sec
BOOT_TIME   = 2020-07-27T14:27:51
BurstBufferType = burst_buffer/cray
CacheGroups = 0
CheckpointType  = checkpoint/none
ChosLoc = (null)
ClusterName = hickory
CompleteWait= 0 sec
ControlAddr = hickory-1
ControlMachine  = hickory-1
CoreSpecPlugin  = cray
CpuFreqDef  = Performance
CpuFreqGovernors= Performance,OnDemand
CryptoType  = crypto/munge
DebugFlags  = (null)
DefMemPerNode   = UNLIMITED
DisableRootJobs = No
EioTimeout  = 60
EnforcePartLimits   = NO
Epilog  = (null)
EpilogMsgTime   = 2000 usec
EpilogSlurmctld = (null)
ExtSensorsType  = ext_sensors/none
ExtSensorsFreq  = 0 sec
FairShareDampeningFactor = 1
FastSchedule= 0
FirstJobId  = 1
GetEnvTimeout   = 2 sec
GresTypes   = gpu,craynetwork
GroupUpdateForce= 1
GroupUpdateTime = 600 sec
HASH_VAL= Match
HealthCheckInterval = 0 sec
HealthCheckNodeState= ANY
HealthCheckProgram  = (null)
InactiveLimit   = 0 sec
JobAcctGatherFrequency  = 30
JobAcctGatherType   = jobacct_gather/linux
JobAcctGatherParams = (null)
JobCheckpointDir= /var/slurm/checkpoint
JobCompHost = localhost
JobCompLoc  = /var/log/slurm_jobcomp.log
JobCompPort = 0
JobCompType = jobcomp/none
JobCompUser = root
JobContainerType= job_container/cncu
JobCredentialPrivateKey = (null)
JobCredentialPublicCertificate = (null)
JobFileAppend   = 0
JobRequeue  = 1
JobSubmitPlugins= cray
KeepAliveTime   = SYSTEM_DEFAULT
KillOnBadExit   = 1
KillWait= 30 sec
LaunchParameters= (null)
LaunchType  = launch/slurm
Layouts =
Licenses= (null)
LicensesUsed= (null)
MailDomain  = (null)
MailProg= /bin/mail
MaxArraySize= 1001
MaxJobCount = 1
MaxJobId= 67043328
MaxMemPerCPU= 128450
MaxStepCount= 4
MaxTasksPerNode = 512
MCSPlugin   = mcs/none
MCSParameters   = (null)
MemLimitEnforce = Yes
MessageTimeout  = 10 sec
MinJobAge   = 300 sec
MpiDefault  = none
MpiParams   = ports=2-32767
MsgAggregationParams= (null)
NEXT_JOB_ID = 2760029
NodeFeaturesPlugins = (null)
OverTimeLimit   = 0 min
PluginDir   = /opt/slurm/17.02.6/lib64/slurm
PlugStackConfig = /etc/opt/slurm/plugstack.conf
PowerParameters = (null)
PowerPlugin =
PreemptMode = CANCEL
PreemptType = preempt/partition_prio
PriorityParameters  = (null)
PriorityDecayHalfLife   = 7-00:00:00
PriorityCalcPeriod  = 00:05:00
PriorityFavorSmall  = No
PriorityFlags   =
PriorityMaxAge  = 7-00:00:00
PriorityUsageResetPeriod = 

[slurm-users] Tuning MaxJobs and MaxJobsSubmit per user and for the whole cluster?

2020-08-06 Thread Hoyle, Alan P
I can't find any advice online about how to tune things like MaxJobs on a 
per-cluster or per-user basis.

As far as I can tell, a default install sets the cluster-wide MaxJobs to 10,000, 
with MaxSubmit the same.  Those seem pretty low to me:  are there resources that 
get consumed if MaxSubmit is much higher, or can we raise it without much worry?

Is there advice anywhere about tuning these?  When I google, all I can find are 
the generic "here's how to change this" and various universities' documentation 
of "here are the limits we have set."
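
For concreteness, the knobs I think I'm asking about are the following (and I
may well have this wrong; the account/user names are just examples):

  # slurm.conf (cluster-wide): number of job records slurmctld keeps in memory;
  # as I understand it, raising this mainly costs slurmctld memory
  MaxJobCount=10000

  # per-account / per-user caps live in the accounting database instead, e.g.:
  sacctmgr modify account where name=some_lab set MaxJobs=2000 MaxSubmitJobs=5000
  sacctmgr modify user where name=some_user set MaxJobs=500 MaxSubmitJobs=1000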

-alan


--
Alan Hoyle - al...@unc.edu
Bioinformatics Scientist
UNC Lineberger - Bioinformatics Core
https://lbc.unc.edu/



Re: [slurm-users] Reservation vs. Draining for Maintenance?

2020-08-06 Thread Christopher Samuel

On 8/6/20 10:13 am, Jason Simms wrote:

Later this month, I will have to bring down, patch, and reboot all nodes 
in our cluster for maintenance. The two options available to set nodes 
into a maintenance mode seem to be either: 1) creating a system-wide 
reservation, or 2) setting all nodes into a DRAIN state.


We use both. :-)

So for cases where we need to do a system wide outage for some reason we 
will put reservations on in advance to ensure the system is drained for 
the maintenance.


But for rolling upgrades we will build a new image, set nodes to use it 
and then do something like:


scontrol reboot ASAP nextstate=resume reason="Rolling upgrade" [nodes]

That will allow running jobs to complete, drain all the nodes and when 
idle they'll reboot into the new image and resume themselves once 
they're back up and slurmd has started and checked in.
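
You can watch the rollout with a plain sinfo format string, e.g.:

  sinfo -o "%20N %12T %E"   # node list, node state (draining, reboot flag, etc.) and reason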


We use the same mechanism when we need to reboot nodes for other 
maintenance activities, say when huge pages are too fragmented and the 
only way to reclaim them is to reboot the node (these checks happen in 
the node epilog).


We paid for enhancements to Slurm 18.08 to ensure that slurmctld took 
these node states into account when scheduling jobs so that large jobs 
(as in requiring most of the nodes in the system) do not lose their 
scheduling window when a node has to be rebooted for this reason.


All the best,
Chris
--
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



[slurm-users] Compute node OS and firmware updates

2020-08-06 Thread Ole Holm Nielsen
Regarding the question of methods for Slurm compute node OS and firmware 
updates, we have for a long time used rolling updates while the cluster 
is in full production, so that we do not waste any resources.  When 
entire partitions are upgraded in this way, there is no risk of starting 
new jobs on nodes with differing states of OS and firmware, while 
running jobs continue on the not-yet-updated nodes.


The basic idea (which was provided by Niels Carl Hansen, ncwh -at- 
cscaa.dk) is to run a crontab script "update.sh" whenever a node is 
rebooted.  Use scontrol to reboot the nodes as they become idle, thereby 
performing the updates that you want.  Remove the crontab job as part of 
the update.sh script.
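
Schematically, the node-side part looks something like this (a heavily
simplified sketch; the real script and instructions are at the link below):

  #!/bin/bash
  # update.sh - run once from an "@reboot /root/update.sh" crontab entry
  yum -y update                                # or dnf/apt, firmware update tools, etc.
  crontab -l | grep -v update.sh | crontab -   # remove the @reboot entry again
  scontrol update NodeName=$(hostname -s) State=RESUME   # hand the node back to Slurm

Nodes are rebooted as they become idle with "scontrol reboot", as described above.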


The update.sh script and instructions for usage are in:
https://github.com/OleHolmNielsen/Slurm_tools/tree/master/nodes

Comments are welcome.

/Ole

--
Ole Holm Nielsen
PhD, Senior HPC Officer
Department of Physics, Technical University of Denmark



Re: [slurm-users] Reservation vs. Draining for Maintenance?

2020-08-06 Thread Thomas M. Payerle
We usually set up a reservation for maintenance.  This prevents jobs
from starting if they are not expected to end before the reservation
(maintenance) starts.
As Paul indicated, this causes nodes to become idle (and the pending job queue
to grow) as the maintenance time approaches, but it avoids requiring users to
resubmit partially completed jobs, especially since many of our users do
not adequately checkpoint.

Draining all of the nodes has the disadvantage of potentially increasing
cluster idle time even more --- if your maximum walltime is 3 days and you
start draining at T-3d, and all jobs on the nodes have a walltime of at most
1d, then the cluster is completely idle at T-2d.  That is fine if you can
carry out the maintenance then and finish 2d early, but problematic if you
can't, as no jobs can run during those 2 days.  With a reservation, short
jobs continue to run until the reservation starts.

But draining nodes is useful when you can carry out the maintenance early as
nodes become available, and particularly in cases where only a limited
number of nodes are involved.



On Thu, Aug 6, 2020 at 1:54 PM Paul Edmon  wrote:

> Because we want to maximize usage we actually have opted to just cancel
> all running jobs the day of.  We send out notification to all the users
> that this will happen.  We haven't really seen any complaints and we've
> been doing this for years.  At the start of the outage we set all
> partitions to down, then run a cancel over all the running jobs.  Pending
> jobs are left in place, and users are allowed to submit work during the
> outage and when we reopen everything gets going again.
>
> So there is a third option, though you have to accept that jobs will be
> cancelled to pull it off.
>
> -Paul Edmon-
> On 8/6/2020 1:13 PM, Jason Simms wrote:
>
> Hello all,
>
> Later this month, I will have to bring down, patch, and reboot all nodes
> in our cluster for maintenance. The two options available to set nodes into
> a maintenance mode seem to be either: 1) creating a system-wide
> reservation, or 2) setting all nodes into a DRAIN state.
>
> I'm not sure it really matters either way, but is there any preference one
> way or the other? Any gotchas I should be aware of?
>
> Warmest regards,
> Jason
>
> --
> *Jason L. Simms, Ph.D., M.P.H.*
> Manager of Research and High-Performance Computing
> XSEDE Campus Champion
> Lafayette College
> Information Technology Services
> 710 Sullivan Rd | Easton, PA 18042
> Office: 112 Skillman Library
> p: (610) 330-5632
>
>

-- 
Tom Payerle
DIT-ACIGS/Mid-Atlantic Crossroads   paye...@umd.edu
5825 University Research Park   (301) 405-6135
University of Maryland
College Park, MD 20740-3831


Re: [slurm-users] Reservation vs. Draining for Maintenance?

2020-08-06 Thread Ing. Gonzalo E. Arroyo
When I need to do something like this, I let Slurm's automatic management do
the job. I simply shut the node down over SSH, replace whatever needs
replacing, then power it back on and everything starts up OK. Another option
is to call resume in case of any failure, and to restart the slurm services
on the nodes... Regards

*Ing. Gonzalo E. Arroyo - CPA Profesional*
IFIMAR - CONICET
*www.ifimar-conicet.gob.ar *

This message is confidential. It may also contain information that is
privileged or not authorized to be disclosed. If you have received it by
mistake, delete it from your system. You should not copy the messsage nor
disclose its contents to anyone. Thanks.


On Thu, Aug 6, 2020 at 2:13 PM Jason Simms ()
wrote:

> Hello all,
>
> Later this month, I will have to bring down, patch, and reboot all nodes
> in our cluster for maintenance. The two options available to set nodes into
> a maintenance mode seem to be either: 1) creating a system-wide
> reservation, or 2) setting all nodes into a DRAIN state.
>
> I'm not sure it really matters either way, but is there any preference one
> way or the other? Any gotchas I should be aware of?
>
> Warmest regards,
> Jason
>
> --
> *Jason L. Simms, Ph.D., M.P.H.*
> Manager of Research and High-Performance Computing
> XSEDE Campus Champion
> Lafayette College
> Information Technology Services
> 710 Sullivan Rd | Easton, PA 18042
> Office: 112 Skillman Library
> p: (610) 330-5632
>


Re: [slurm-users] Reservation vs. Draining for Maintenance?

2020-08-06 Thread Ole Holm Nielsen

On 06-08-2020 19:13, Jason Simms wrote:
Later this month, I will have to bring down, patch, and reboot all nodes 
in our cluster for maintenance. The two options available to set nodes 
into a maintenance mode seem to be either: 1) creating a system-wide 
reservation, or 2) setting all nodes into a DRAIN state.


I'm not sure it really matters either way, but is there any preference 
one way or the other? Any gotchas I should be aware of?


I'd recommend using a reservation because you can define a specific 
maintenance period way ahead of time.  You ought to create the 
reservation in advance, at least the greatest MaxTime of all partitions 
in slurm.conf before the maintenance starts, so that you won't have any 
jobs still running when the reservation sets in.  Jobs can then continue 
to run until the very last minute!
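
For example, something along these lines (the date and length are of course
just placeholders; flags=maint,ignore_jobs is the usual recipe for a
full-system maintenance window):

  scontrol create reservation reservationname=maintenance \
      starttime=2020-08-24T08:00:00 duration=08:00:00 \
      users=root flags=maint,ignore_jobs nodes=ALL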


I have some notes on reservations in
https://wiki.fysik.dtu.dk/niflheim/SLURM#resource-reservation

Draining nodes is a bad idea, IMHO, because you'll have a lot of drained 
nodes from now until your maintenance period, causing lost resources.


The way I prefer to do upgrades is actually neither 1) nor 2).  I make 
rolling (minor) upgrades of the compute node OS and firmware while the 
cluster is in full production in order to avoid lost resources.  I will 
post my upgrade script to this list in a separate message.


/Ole



Re: [slurm-users] Reservation vs. Draining for Maintenance?

2020-08-06 Thread Paul Edmon
Because we want to maximize usage we actually have opted to just cancel 
all running jobs the day of.  We send out notification to all the users 
that this will happen.  We haven't really seen any complaints and we've 
been doing this for years.  At the start of the outage we set all 
partitions to down, then run a cancel over all the running jobs.  
Pending jobs are left in place, and users are allowed to submit work 
during the outage and when we reopen everything gets going again.


So there is a third option, though you have to accept that jobs will be 
cancelled to pull it off.
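
In script form, the day-of steps are roughly this (a sketch, not our actual
tooling; run as a Slurm administrator):

  # mark every partition down so nothing new starts
  for p in $(sinfo -h -o %R | sort -u); do
      scontrol update PartitionName=$p State=DOWN
  done
  # cancel the running jobs; pending jobs stay queued
  squeue -h -t RUNNING -o %i | xargs -r scancel
  # after the outage, set each partition back to State=UP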


-Paul Edmon-

On 8/6/2020 1:13 PM, Jason Simms wrote:

Hello all,

Later this month, I will have to bring down, patch, and reboot all 
nodes in our cluster for maintenance. The two options available to set 
nodes into a maintenance mode seem to be either: 1) creating a 
system-wide reservation, or 2) setting all nodes into a DRAIN state.


I'm not sure it really matters either way, but is there any preference 
one way or the other? Any gotchas I should be aware of?


Warmest regards,
Jason

--
*Jason L. Simms, Ph.D., M.P.H.*
Manager of Research and High-Performance Computing
XSEDE Campus Champion
Lafayette College
Information Technology Services
710 Sullivan Rd | Easton, PA 18042
Office: 112 Skillman Library
p: (610) 330-5632


[slurm-users] Reservation vs. Draining for Maintenance?

2020-08-06 Thread Jason Simms
Hello all,

Later this month, I will have to bring down, patch, and reboot all nodes in
our cluster for maintenance. The two options available to set nodes into a
maintenance mode seem to be either: 1) creating a system-wide reservation,
or 2) setting all nodes into a DRAIN state.

I'm not sure it really matters either way, but is there any preference one
way or the other? Any gotchas I should be aware of?

Warmest regards,
Jason

-- 
*Jason L. Simms, Ph.D., M.P.H.*
Manager of Research and High-Performance Computing
XSEDE Campus Champion
Lafayette College
Information Technology Services
710 Sullivan Rd | Easton, PA 18042
Office: 112 Skillman Library
p: (610) 330-5632


Re: [slurm-users] Debugging communication problems

2020-08-06 Thread Gerhard Strangar
Gerhard Strangar wrote:

> I'm experiencing a connectivity problem and I'm out of ideas, why this
> is happening. I'm running a slurmctld on a multihomed host.
> 
> (10.9.8.0/8) - master - (10.11.12.0/8)
> There is no routing between these two subnets.

My topology.conf contained a loop, which resulted in incorrect message
forwarding.
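
In case it helps anyone else: topology.conf has to describe a loop-free tree,
e.g. (switch and node names made up):

  SwitchName=leaf1 Nodes=node[01-16]
  SwitchName=leaf2 Nodes=node[17-32]
  SwitchName=spine Switches=leaf1,leaf2
  # a switch reachable via two different parents (or, transitively, via itself)
  # creates the kind of loop that broke message forwarding for me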

Gerhard



Re: [slurm-users] Billing issue

2020-08-06 Thread Bas van der Vlies
On Thu, 2020-08-06 at 09:30 -0400, Paul Raines wrote:
> Bas
> 
> Does that mean you are setting PriorityFlags=MAX_TRES ?
> 
YES

> Also does anyone understand this from the slurm.conf docs:
> 
>The weighted amount of a resource can be adjusted by adding a suffix of
>K,M,G,T or P after the billing weight. For example, a memory weight of
>"mem=.25" on a job allocated 8GB will be billed 2048 (8192MB *.25) units. A
>memory weight of "mem=.25G" on the same job will be billed 2 (8192MB *
>(.25/1024)) units.
> 
> Where is this "2048" and "2" factor coming from in those two calculations?
> 
The job is using 8 GB = 8192 MB --> 0.25 * 8192 = 2048
With .25G the weight becomes .25/1024 per MB --> 8192 * 0.25/1024 = 2

Memory suffixes always scale by a factor of 1024

> On Thu, 6 Aug 2020 6:46am, Bas van der Vlies wrote:
> 
> > On 06/08/20 10:00, Bas van der Vlies wrote:
> > 
> > Tks for the answer.
> > 
> > > > We have also node with GPU's (dfiferent types) and some cost more the
> > > > others.
> > > The partitions always have the same type of nodes not mixed,eg:
> > >  * TRESBillingWeights=CPU=3801.0,Mem=502246.0T,GRES/gpu=22807.0,GRES/gpu:titanrtx=22807.0
> > >  * node type: 24 cores 4 GPU's  MemSpecLimit=1024 Memory=191488
> > >  * the max is 91228 SBU
> > Still missing something. Can't come to your result.
> > 24*3801+502246*(191488-1024)/(1024**5)+4*22807 = 182452 (rounded), about
> > twice your result.
> > 
> > we have MAX(core, mem, gres). all resources can have the score: 91228
> >  24 * 3801 = 91224
> >  mem same value
> >  4 gres * 22807 = 91228
> > 
> > So we take one of these maximum values, divide it by 1000 and round it.
> > Hopefully this explains it.
> > 
> > DIFA - Dip. di Fisica e Astronomia
> > Servizi Informatici
> > Alma Mater Studiorum - Università di Bologna
> > V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
> > tel.: +39 051 20 95786
> > 
> > 




Re: [slurm-users] Billing issue

2020-08-06 Thread Paul Raines


Bas

Does that mean you are setting PriorityFlags=MAX_TRES ?

Also does anyone understand this from the slurm.conf docs:

  The weighted amount of a resource can be adjusted by adding a suffix of
  K,M,G,T or P after the billing weight. For example, a memory weight of
  "mem=.25" on a job allocated 8GB will be billed 2048 (8192MB *.25) units. A
  memory weight of "mem=.25G" on the same job will be billed 2 (8192MB *
  (.25/1024)) units.

Where is this "2048" and "2" factor coming from in those two calculations?

On Thu, 6 Aug 2020 6:46am, Bas van der Vlies wrote:



On 06/08/20 10:00, Bas van der Vlies wrote:

Tks for the answer.


We also have nodes with GPUs (different types) and some cost more than others.

The partitions always have the same type of nodes not mixed,eg:
 * TRESBillingWeights=CPU=3801.0,Mem=502246.0T,GRES/gpu=22807.0,GRES/gpu:titanrtx=22807.0
 * node type: 24 cores 4 GPU's  MemSpecLimit=1024 Memory=191488
 * the max is 91228 SBU

Still missing something. Can't come to your result.
24*3801+502246*(191488-1024)/(1024**5)+4*22807 = 182452 (rounded), about
twice your result.

we have MAX(core, mem, gres). all resources can have the score: 91228
 24 * 3801 = 91224
 mem same value
 4 gres * 22807 = 91228

So we take one of these maximum values, divide it by 1000 and round it.
Hopefully this explains it.

DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786




Re: [slurm-users] Billing issue

2020-08-06 Thread Diego Zuccato
On 06/08/20 12:46, Bas van der Vlies wrote:

> we have MAX(core, mem, gres). all resources can have the score: 91228
Ah, Ok. So you have
PriorityFlags=MAX_TRES
too.

> So we take one of these maximum values we dived it again by 1000 and round 
> it. Hopefully this explains it.
Yup, tks.

Now I have to understand why my definition of

NodeName=str957-mtx-00 Sockets=2 CoresPerSocket=14
ThreadsPerCore=2 Feature=ib,nonblade,intel MemSpecLimit=1024

gives

# scontrol show node str957-mtx-00
scontrol: error: NodeNames=str957-mtx-00 MemSpecLimit=1024 is invalid,
reset to 0
NodeName=str957-mtx-00 CoresPerSocket=14
   CPUAlloc=0 CPUTot=56 CPULoad=N/A
   AvailableFeatures=ib,nonblade,intel
   ActiveFeatures=ib,nonblade,intel
   Gres=(null)
   NodeAddr=str957-mtx-00 NodeHostName=str957-mtx-00
   RealMemory=257407 AllocMem=0 FreeMem=N/A Sockets=2 Boards=1
   State=DOWN* ThreadsPerCore=2 TmpDisk=482683 Weight=1 Owner=N/A
MCS_label=N/A
   Partitions=matrix,debug
   BootTime=None SlurmdStartTime=None
   CfgTRES=cpu=56,mem=257407M,billing=56
   AllocTRES=
   CapWatts=n/a
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
   Reason=Not responding [slurm@2020-08-04T08:33:54]

I tried lowering it down to 2, same result...

Node is turned off for maintenance, but RealMemory is OK...

-- 
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786



Re: [slurm-users] Billing issue

2020-08-06 Thread Bas van der Vlies


On 06/08/20 10:00, Bas van der Vlies wrote:

Tks for the answer.

>> We have also node with GPU's (dfiferent types) and some cost more the others.
> The partitions always have the same type of nodes not mixed,eg:
>  * TRESBillingWeights=CPU=3801.0,Mem=502246.0T,GRES/gpu=22807.0,GRES/gpu:titanrtx=22807.0
>  * node type: 24 cores 4 GPU's  MemSpecLimit=1024 Memory=191488
>  * the max is 91228 SBU
Still missing something. Can't come to your result.
24*3801+502246*(191488-1024)/(1024**5)+4*22807 = 182452 (rounded), about
twice your result.

we have MAX(core, mem, gres). all resources can have the score: 91228
  24 * 3801 = 91224
  mem same value
  4 gres * 22807 = 91228

So we take one of these maximum values, divide it by 1000 and round it.
Hopefully this explains it.

DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786



Re: [slurm-users] Billing issue

2020-08-06 Thread Diego Zuccato
On 06/08/20 10:00, Bas van der Vlies wrote:

Tks for the answer.

> We have nodes with 16 cores and 96GB of ram and this are the cheapest nodes 
> they
> cost in our model. 
Theoretical 6GB/core. 5.625 net.

> We multiple everything by 1000 to avoid slurm's behaviour of truncating the
> result. The node has the following spec: RealMemory=93184 and 
> MemSpecLimit=1024
I overlooked MemSpecLimit. Now added. Very useful.

> The memory tresbillingweight is calculates  with this formula:
>   * price_per_core * num_cores / memory * 1024 * 1024`. Memory is `RealMemory 
> -
> MemSpecLimit`
Ok.

> We have also node with GPU's (dfiferent types) and some cost more the others.
> The partitions always have the same type of nodes not mixed,eg: 
>  * TRESBillingWeights=CPU=3801.0,Mem=502246.0T,GRES/gpu=22807.0,GRES/gpu:titanrtx=22807.0
>  * node type: 24 cores 4 GPU's  MemSpecLimit=1024 Memory=191488   
>  * the max is 91228 SBU
Still missing something. Can't come to your result.
24*3801+502246*(191488-1024)/(1024**5)+4*22807 = 182452 (rounded), about
twice your result.

> This are all shared partitions show if an user request more memory then the
> default per core then it will be charged for the extra memory.  If user use 
> more
> memory per gpu then it will be charged for the extra memory. This memory for a
> GPU machine  is  more expensive then a CPU machine.
Uh? How? The formula is linear...

> The one with different tressbillingweight per job type (serial or parallel) 
> can
> not be solved with this setting.  It must be solved that defaults for this
> partition is always 1/4 of the node even if you request just 1 core.
Can't see how to force it :(

-- 
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786



Re: [slurm-users] Billing issue

2020-08-06 Thread Bas van der Vlies
Hi Diego,

 Yes, this can be tricky; we also use this feature. The billing is on partition
level, so you can set different schemas.

We have nodes with 16 cores and 96GB of RAM; these are the cheapest nodes in
our model and cost 1 SBU (System Billing Unit). For this node type we have the
following setting:
 * TRESBillingWeights=CPU=1000.0,Mem=182044.0T

We multiply everything by 1000 to avoid Slurm's behaviour of truncating the
result. The node has the following spec: RealMemory=93184 and MemSpecLimit=1024

The memory TRESBillingWeight is calculated with this formula:
  * `price_per_core * num_cores / memory * 1024 * 1024`, where memory is
`RealMemory - MemSpecLimit`

We also have nodes with GPUs (different types) and some cost more than others.
The partitions always have the same type of nodes, not mixed, e.g.:
 * TRESBillingWeights=CPU=3801.0,Mem=502246.0T,GRES/gpu=22807.0,GRES/gpu:titanrtx=22807.0
 * node type: 24 cores, 4 GPUs, MemSpecLimit=1024, Memory=191488
 * the max is 91228 SBU
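
As a back-of-the-envelope check of those numbers (if I read the weight
suffixes right, the T on the memory weight means it is applied per TB, i.e.
MB / 1024^2):

  awk 'BEGIN {
      cpu  = 24 * 3801                           # CPU term                                   -> 91224
      gres = 4 * 22807                           # GPU (GRES) term                            -> 91228
      mem  = (191488 - 1024) / 1024^2 * 502246   # (RealMemory - MemSpecLimit) in TB * weight -> ~91228
      max  = cpu; if (gres > max) max = gres; if (mem > max) max = mem
      printf "cpu=%d  gres=%d  mem=%.0f  max=%d\n", cpu, gres, mem, max
  }'

With PriorityFlags=MAX_TRES the billing is the maximum of those three terms,
which we then divide by 1000 again (the factor we multiplied all weights with).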

These are all shared partitions, so if a user requests more memory than the
default per core, they will be charged for the extra memory.  If a user uses
more memory per GPU, they will be charged for the extra memory. This memory is
more expensive on a GPU machine than on a CPU machine.

The case of different TRESBillingWeights per job type (serial or parallel)
cannot be solved with this setting.  It has to be solved by making the default
for that partition always 1/4 of the node, even if a user requests just 1 core.

regards,



On Thu, 2020-08-06 at 08:55 +0200, Diego Zuccato wrote:
> Hello all.
> 
> How can I configure Slurm to bill users for the actual "percentage" of a
> node's use?
> 
> I mean that an user requesting one core and all the RAM should be billed
> for the whole node, since the other cores can't be allocated to another job.
> The ideal would be to bill max(ReqCores, ReqMem * MaxMem/TotalCores) .
> 
> To make things harder, on another partition I'd need to bill for a
> minimim of 1/4 of the cores and 1/4 of the RAM (to discourage serial
> jobs in favor of parallel ones), but keeping the same logic above: an
> user requesting 1 core and 2G RAM will be billed 1/4 of the node.
> 
> Is it doable? It's a couple weeks I'm looking at the docs, but I still
> miss something...
> 
> TIA!
> 

