Re: [slurm-users] Slurm Upgrade from 17.02

2020-02-20 Thread Steven Senator (slurm-dev-list)
When upgrading to 18.08 it is prudent to add the following lines to your
/etc/my.cnf, as per
  https://slurm.schedmd.com/accounting.html
  https://slurm.schedmd.com/SLUG19/High_Throughput_Computing.pdf (slide #6)

[mysqld]
innodb_buffer_pool_size=1G
innodb_log_file_size=64M
innodb_lock_wait_timeout=900

If the node on which mysql is running has sufficient memory you may
want to increase the innodb_buffer_pool_size beyond 1G. That's just
the minimum threshold below which slurm complains. We use 8G, for
example, because it fits our churn rate for {job arrival, job dispatch
to run state} in RAM and our nodes have enough RAM to accommodate an 8G
cache. (References on tuning are below.)
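
As a rough illustration, the [mysqld] settings above could be checked
before an upgrade with a small script. This is only a sketch: the
function names are made up for this example, it treats the three values
from the snippet as lower bounds, and it assumes the settings live in a
single file rather than in included fragments.

```python
import configparser

# Values taken from the [mysqld] snippet above; sizes are in bytes.
RECOMMENDED = {
    "innodb_buffer_pool_size": 1024**3,    # 1G
    "innodb_log_file_size": 64 * 1024**2,  # 64M
    "innodb_lock_wait_timeout": 900,       # seconds
}

SUFFIXES = {"K": 1024, "M": 1024**2, "G": 1024**3}


def parse_size(text):
    """Convert a my.cnf value such as '1G', '64M' or '900' to an integer."""
    text = text.strip()
    if text and text[-1].upper() in SUFFIXES:
        return int(text[:-1]) * SUFFIXES[text[-1].upper()]
    return int(text)


def check_mysqld_settings(cnf_text):
    """Return {option: problem} for settings missing or below the values above."""
    # Drop '!include' / '!includedir' directives, which configparser cannot parse.
    lines = [line for line in cnf_text.splitlines()
             if not line.lstrip().startswith("!")]
    parser = configparser.ConfigParser(
        allow_no_value=True, interpolation=None, strict=False,
        inline_comment_prefixes=("#",))
    # my.cnf accepts both dashes and underscores in option names.
    parser.optionxform = lambda name: name.lower().replace("-", "_")
    parser.read_string("\n".join(lines))

    problems = {}
    for option, minimum in RECOMMENDED.items():
        raw = parser.get("mysqld", option, fallback=None)
        if raw is None:
            problems[option] = "not set"
        elif parse_size(raw) < minimum:
            problems[option] = f"{raw} is below {minimum}"
    return problems
```

Calling check_mysqld_settings(open("/etc/my.cnf").read()) returns an
empty dict when all three settings are present and large enough.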

When you reset this, you will also need to remove the previous innodb
caches, which are probably in /var/lib/mysql. When we did this we
removed and recreated the slurm_acct_db, although that was partially
motivated by the fact that this coincided with an OS and database
patch upgrade and a major accounting and allocation cycle.
  0. Stop slurmctld, slurmdbd.
  1. Create a dump of your database. (mysqldump ...)
  2. Verify that the dump is complete and valid.
  3. Remove the slurm_acct_db. (mysql -e "drop database slurm_acct_db;")
  4. Stop your mysql instance cleanly.
  5. Check the logs. Verify that the mysql instance was stopped cleanly.
  6. rm /var/lib/mysql/ib_logfile? /var/lib/mysql/ibdata1
  7. Put the new lines as above into /etc/my.cnf with the log file
sized appropriately.
  8. Start mysql.
  9. Verify it started cleanly.
  10. Restart the slurm dbd manually, possibly in non-daemon mode.
(slurmdbd -D -vv)
  11. sacctmgr create cluster 

If you want to restore the data back into the database, do it
*before* step 10 (restarting slurmdbd) so that the schema conversion
can be performed. I like using multiple "-vv" so that I can see some of
the messages as that conversion process proceeds.
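
For step 2, one cheap sanity check is to look for the trailer comment
that mysqldump normally writes as its last line ("-- Dump completed on
..."). The sketch below assumes the dump was made with comments enabled
(not with --skip-comments or --compact); the function name is made up
for this example. It only shows that the dump ran to the end, not that
the data is valid. Restoring into a scratch database is the stronger
test.

```python
def dump_looks_complete(path, tail_bytes=4096):
    """Heuristic: True if the file ends with mysqldump's completion comment."""
    with open(path, "rb") as handle:
        handle.seek(0, 2)                       # jump to the end of the file
        size = handle.tell()
        handle.seek(max(0, size - tail_bytes))  # read only the last few KB
        tail = handle.read().decode("utf-8", errors="replace")
    return "-- Dump completed" in tail
```

For example, dump_looks_complete("slurm_acct_db.sql") with whatever
file name you gave your dump.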

Some references on mysql innodb_buffer_pool_size tuning:
  https://scalegrid.io/blog/calculating-innodb-buffer-pool-size-for-your-mysql-server/
  https://mariadb.com/kb/en/innodb-system-variables/#innodb_buffer_pool_size
  https://mariadb.com/kb/en/innodb-buffer-pool/
  https://www.percona.com/blog/2015/06/02/80-ram-tune-innodb_buffer_pool_size/
  https://dev.mysql.com/doc/refman/5.7/en/innodb-buffer-pool-resize.html

Hope this helps,
 -Steve Senator

On Wed, Feb 19, 2020 at 7:12 AM Ricardo Gregorio wrote:
>
> hi all,
>
> I am putting together an upgrade plan for slurm on our HPC. We are currently
> running old version 17.02.11. Would you guys advise us upgrading to 18.08 or
> 19.05?
>
> I understand we will have to also upgrade the version of mariadb from 5.5 to
> 10.X and pay attention to 'long db upgrade from 17.02 to 18.X or 19.X' and
> 'bug 6796' amongst other things.
>
> We would appreciate your comments/recommendations
>
> Regards,
> Ricardo Gregorio
> Research and Systems Administrator
> Operations ITS
>
> Rothamsted Research is a company limited by guarantee, registered in England 
> at Harpenden, Hertfordshire, AL5 2JQ under the registration number 2393175 
> and a not for profit charity number 802038.



Re: [slurm-users] Slurm Upgrade from 17.02

2020-02-20 Thread Ricardo Gregorio
Thank you Ole/Chris/Marcus. Your input is much appreciated.

Ole, I was (and am) basing my upgrade plan on the documentation at the link
you sent me. In fact your wiki is always my first stop when learning about or
troubleshooting SLURM issues, even before the SLURM docs pages. Excellent work,
well done.

Regards,
Ricardo Gregorio


-Original Message-
From: slurm-users On Behalf Of Ole Holm Nielsen
Sent: 19 February 2020 14:41
To: slurm-users@lists.schedmd.com
Subject: Re: [slurm-users] Slurm Upgrade from 17.02

On 2/19/20 3:10 PM, Ricardo Gregorio wrote:
> I am putting together an upgrade plan for slurm on our HPC. We are
> currently running old version 17.02.11. Would you guys advise us
> upgrading to 18.08 or 19.05?

You should be able to upgrade 2 Slurm major versions in one step.  The
18.08 version is just about to become unsupported since 20.02 will be released 
shortly.  We use 19.05.5.

I have collected a number of upgrading details in my Slurm Wiki page:
https://wiki.fysik.dtu.dk/niflheim/Slurm_installation#upgrading-slurm

You really, really want to perform a dry-run Slurm database upgrade on a test 
machine before doing the real upgrade!  See
https://wiki.fysik.dtu.dk/niflheim/Slurm_installation#make-a-dry-run-database-upgrade

> I understand we will have to also upgrade the version of mariadb from
> 5.5 to 10.X and pay attention to 'long db upgrade from 17.02 to 18.X or 19.X'
> and 'bug 6796' amongst other things.

We use the default MariaDB 5.5 in CentOS 7.7.  Upgrading to MariaDB 10 seems to 
have quite a number of unresolved installation issues, so I would skip that for 
now.  Se s

> We would appreciate your comments/recommendations

Slurm 19.05 works great for us.  We're happy with our SchedMD support contract.

/Ole




Re: [slurm-users] Slurm Upgrade from 17.02

2020-02-19 Thread Marcus Wagner

Hi Ricardo,

If I remember right, you can only upgrade two versions further. So you
WILL have to upgrade to 18.08, even if you want to use 19.05 or the
coming 20.02.


17.02 -> 17.11 -> 18.08 -> 19.05 -> 20.02
  ^                 ^
  |                 |
  |- you are here   |- "farthest jump" to a newer version in one step.


As SchedMD introduced cons_tres in 19.05, cons_res will become deprecated
in future versions. The way you request GPUs is more consistent in the new
version. So, I would upgrade to 19.05. You will still have to upgrade to
18.08 as a first step, though.
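
The two-version rule can be written down as a small sketch that lists
the intermediate releases to install. The release list is the one from
the diagram above; the function name and the choice to always take the
farthest allowed jump are assumptions made for this example.

```python
# Release order as listed in the diagram above.
RELEASES = ["17.02", "17.11", "18.08", "19.05", "20.02"]
MAX_JUMP = 2  # a release can be upgraded at most two major versions forward


def upgrade_path(current, target, releases=RELEASES, max_jump=MAX_JUMP):
    """Return the releases to install, in order, to get from current to target."""
    start, end = releases.index(current), releases.index(target)
    if end < start:
        raise ValueError("downgrades are not covered by this sketch")
    path = []
    position = start
    while position < end:
        # Take the farthest jump allowed without overshooting the target.
        position = min(position + max_jump, end)
        path.append(releases[position])
    return path


print(upgrade_path("17.02", "19.05"))  # ['18.08', '19.05']
print(upgrade_path("17.02", "20.02"))  # ['18.08', '20.02']
```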



Best
Marcus


On 2/19/20 3:10 PM, Ricardo Gregorio wrote:
> hi all,
>
> I am putting together an upgrade plan for slurm on our HPC. We are
> currently running old version 17.02.11. Would you guys advise us
> upgrading to 18.08 or 19.05?
>
> I understand we will have to also upgrade the version of mariadb from
> 5.5 to 10.X and pay attention to 'long db upgrade from 17.02 to 18.X
> or 19.X' and 'bug 6796' amongst other things.
>
> We would appreciate your comments/recommendations
>
> Regards,
> Ricardo Gregorio
> Research and Systems Administrator
> Operations ITS




--
Marcus Wagner, Dipl.-Inf.

IT Center
Abteilung: Systeme und Betrieb
RWTH Aachen University
Seffenter Weg 23
52074 Aachen
Tel: +49 241 80-24383
Fax: +49 241 80-624383
wag...@itc.rwth-aachen.de
www.itc.rwth-aachen.de



Re: [slurm-users] Slurm Upgrade from 17.02

2020-02-19 Thread Chris Samuel

On 19/2/20 6:10 am, Ricardo Gregorio wrote:
> I am putting together an upgrade plan for slurm on our HPC. We are
> currently running old version 17.02.11. Would you guys advise us
> upgrading to 18.08 or 19.05?


Slurm versions only support upgrading from 2 major versions back, so you 
could only upgrade from 17.02 to 17.11 or 18.08.  I'd suggest going 
straight to 18.08.


Remember you have to upgrade slurmdbd first, then upgrade slurmctld and 
then finally the slurmd's.
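
If you script the rollout, that ordering can be enforced by sorting
hosts by the daemon they run before touching them. This is only a
sketch; the function name and the host names in the usage note are
made up for this example.

```python
# Daemon upgrade order as described above.
UPGRADE_ORDER = ["slurmdbd", "slurmctld", "slurmd"]


def order_hosts_for_upgrade(hosts):
    """Sort (hostname, daemon) pairs into the upgrade order given above."""
    # sorted() is stable, so hosts running the same daemon keep their order.
    return sorted(hosts, key=lambda item: UPGRADE_ORDER.index(item[1]))
```

For example, order_hosts_for_upgrade([("node01", "slurmd"), ("ctl",
"slurmctld"), ("db", "slurmdbd")]) puts db first, then ctl, then node01.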


Also, as Ole points out, 20.02 is due out soon at which point 18.08 gets 
retired from support, so you'd probably want to jump to 19.05 from 18.08.


Don't forget to take backups first!  We do a mysqldump of the whole 
accounting DB and rsync backups of our state directories before an upgrade.


Best of luck!
Chris
--
 Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



Re: [slurm-users] Slurm Upgrade from 17.02

2020-02-19 Thread Ole Holm Nielsen

On 2/19/20 3:10 PM, Ricardo Gregorio wrote:
> I am putting together an upgrade plan for slurm on our HPC. We are
> currently running old version 17.02.11. Would you guys advise us upgrading
> to 18.08 or 19.05?


You should be able to upgrade 2 Slurm major versions in one step.  The 
18.08 version is just about to become unsupported since 20.02 will be 
released shortly.  We use 19.05.5.


I have collected a number of upgrading details in my Slurm Wiki page:
https://wiki.fysik.dtu.dk/niflheim/Slurm_installation#upgrading-slurm

You really, really want to perform a dry-run Slurm database upgrade on a 
test machine before doing the real upgrade!  See

https://wiki.fysik.dtu.dk/niflheim/Slurm_installation#make-a-dry-run-database-upgrade

> I understand we will have to also upgrade the version of mariadb from 5.5
> to 10.X and pay attention to 'long db upgrade from 17.02 to 18.X or 19.X'
> and 'bug 6796' amongst other things.


We use the default MariaDB 5.5 in CentOS 7.7.  Upgrading to MariaDB 10 
seems to have quite a number of unresolved installation issues, so I would 
skip that for now.  Se s



> We would appreciate your comments/recommendations


Slurm 19.05 works great for us.  We're happy with our SchedMD support 
contract.


/Ole



[slurm-users] Slurm Upgrade from 17.02

2020-02-19 Thread Ricardo Gregorio
hi all,

I am putting together an upgrade plan for slurm on our HPC. We are currently 
running old version 17.02.11. Would you guys advise us upgrading to 18.08 or 
19.05?

I understand we will have to also upgrade the version of mariadb from 5.5 to 
10.X and pay attention to 'long db upgrade from 17.02 to 18.X or 19.X' and 'bug 
6796' amongst other things.

We would appreciate your comments/recommendations

Regards,
Ricardo Gregorio
Research and Systems Administrator
Operations ITS


