Re: [slurm-users] Need help with controller issues

2019-12-12 Thread Chris Samuel

On 12/12/19 8:14 am, Dean Schulze wrote:

configure:5021: gcc -o conftest -I/usr/include/mysql -g -O2   conftest.c -L/usr/lib/x86_64-linux-gnu -lmysqlclient -lpthread -lz -lm -lrt -latomic -lssl -lcrypto -ldl  >&5

/usr/bin/ld: cannot find -lssl
/usr/bin/ld: cannot find -lcrypto
collect2: error: ld returned 1 exit status


That looks like your failure: you're missing the package that provides the
libraries it's trying to link against. On Debian/Ubuntu I suspect that's
libssl-dev.
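If you'd rather not read all of config.log by eye, a small grep pipeline can pull out every "cannot find -l..." failure. This is only a sketch, assuming the usual autoconf config.log layout and GNU or POSIX grep; the helper name missing_libs is made up for illustration. On Debian/Ubuntu both libssl and libcrypto come from libssl-dev.

```shell
#!/usr/bin/env bash
# missing_libs (hypothetical helper): print each library the linker
# reported as missing in an autoconf config.log, one name per line.
missing_libs() {
  local log="${1:-config.log}"
  # Matching lines look like: /usr/bin/ld: cannot find -lssl
  grep -o 'cannot find -l[A-Za-z0-9_+.-]*' "$log" \
    | sed 's/^cannot find -l//' \
    | sort -u
}
```

Run it from the top of the Slurm source tree after ./configure, e.g. `missing_libs config.log`. For the two ld lines quoted above it prints crypto and ssl.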


All the best,
Chris
--
 Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA



Re: [slurm-users] Need help with controller issues

2019-12-12 Thread Dean Schulze
Thanks for mentioning the config.log file.  It has dozens of errors in it,
yet ./configure completes and doesn't report any errors.

Here's what got me past the problem with the mysql plugin. A configure test
program that links with -lssl and -lcrypto was failing. The
solution was

sudo apt-get install libssl-dev

I also added

sudo apt-get install g++
sudo apt install build-essential

to eliminate some other failures.  Thanks to all who responded here, I now
have slurmctld and slurmdbd running.

The config.log still has dozens of errors in it, almost all due to failed
include statements.  I'll open another thread about those.
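Many of those config.log errors are normally harmless: configure compiles small test programs to probe for optional headers and features, and a failed probe just records "no". One way to see which checks actually came out negative is to pair each "checking ..." line with its "result: no" line. A sketch, assuming the configure:NNNN: prefixes seen in this thread; failed_checks is a hypothetical name.

```shell
#!/usr/bin/env bash
# failed_checks (hypothetical helper): list every configure check whose
# recorded result was "no", using the configure:NNNN: prefixes in config.log.
failed_checks() {
  awk '
    # Remember the description from the most recent "checking ..." line.
    /^configure:[0-9]+: checking / {
      sub(/^configure:[0-9]+: checking /, "")
      check = $0
      next
    }
    # Print it when the result is exactly "no" (not "none required").
    /^configure:[0-9]+: result: no$/ || /^configure:[0-9]+: result: no / { print check }
  ' "${1:-config.log}"
}
```

Usage: `failed_checks config.log | less`. A "no" next to a header you don't need is fine; a "no" next to something Slurm needs for a plugin you want is worth chasing.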





Re: [slurm-users] Need help with controller issues

2019-12-12 Thread Dean Schulze
There's a mysql test failure in config.log; it looks like a couple of
libraries are missing. The config.log also shows errors because g++ isn't
present, and dozens more from failed includes. I probably need the g++
packages on my Ubuntu instance.

But ./configure completes successfully in spite of dozens of failures.


configure:4890: checking for mysql_config
configure:4908: found /usr/bin/mysql_config
configure:4920: result: /usr/bin/mysql_config
configure:5021: gcc -o conftest -I/usr/include/mysql -g -O2   conftest.c -L/usr/lib/x86_64-linux-gnu -lmysqlclient -lpthread -lz -lm -lrt -latomic -lssl -lcrypto -ldl  >&5
/usr/bin/ld: cannot find -lssl
/usr/bin/ld: cannot find -lcrypto
collect2: error: ld returned 1 exit status
configure:5021: $? = 1
configure: failed program was:
| /* confdefs.h */
| #define PACKAGE_NAME "slurm"
| #define PACKAGE_TARNAME "slurm"
| #define PACKAGE_VERSION "19.05"
| #define PACKAGE_STRING "slurm 19.05"
| #define PACKAGE_BUGREPORT ""
| #define PACKAGE_URL "https://slurm.schedmd.com"
| #define PROJECT "slurm"
| #define SLURM_API_VERSION 0x22
| #define SLURM_API_CURRENT 34
| #define SLURM_API_MAJOR 34
| #define SLURM_API_AGE 0
| #define SLURM_API_REVISION 0
| #define VERSION "19.05.4"
| #define SLURM_VERSION_NUMBER 0x130504
| #define SLURM_MAJOR "19"
| #define SLURM_MINOR "05"
| #define SLURM_MICRO "4"
| #define RELEASE "1"
| #define SLURM_VERSION_STRING "19.05.4"
| /* end confdefs.h.  */
| #include 
| int
| main ()
| {
|
| MYSQL mysql;
| (void) mysql_init();
| (void) mysql_close();
|
|   ;
|   return 0;
| }
configure:5041: WARNING: *** MySQL test program execution failed. A thread-safe MySQL library is required.




Re: [slurm-users] Need help with controller issues

2019-12-12 Thread Gennaro Oliva
Hi Dean,

On Wed, Dec 11, 2019 at 04:04:44PM -0700, Dean Schulze wrote:
> I tried again with a completely new system (virtual machine).  I used the
> latest source, I used mysql instead of mariadb, and I installed all the
> client and dev libs (below).  I still get the same error.  It doesn't
> build the /usr/lib/slurm/accounting_storage_mysql.so file.

On Debian it builds fine with default-libmysqlclient-dev installed.

Best regards,
-- 
Gennaro Oliva



Re: [slurm-users] Need help with controller issues

2019-12-12 Thread William Brown
I looked back through the list to November, when I had the same problem
building with MariaDB:
>>>> On 11-11-2019 21:23, William Brown wrote:
>>>>> I have in fact found the answer by looking harder.
>>>>>
>>>>> The config.log clearly showed that the build of the test MySQL
>>>>> program failed, which is why it was set to be excluded.
>>>>>
>>>>> It failed to link against '-lmariadb'.  It turns out that library is
>>>>> no longer in MariaDB or MariaDB-devel, it is separately packaged in
>>>>> MariaDB-shared.  That may of course be because I have built MariaDB
>>>>> 10.4 from the mariadb.org site, because CentOS 7 only ships with the
>>>>> extremely old version 5.5.
>>>>>
>>>>> Once I installed the missing package it built the RPMs just fine.
>>>>> However, it would be easier to link against static MariaDB
>>>>> libraries, as I now have to install MariaDB-shared on every server
>>>>> that will run slurmd, i.e. all compute nodes.  I expect that with a
>>>>> closer look at the build options there is a way to do this,
>>>>> perhaps with linker flags.

I think that even if you are building with MySQL on a clean VM, you need to
look at the detailed log of the build of the accounting components.

The configure script looks for the mysql_config or mariadb_config commands,
and if it finds either one it assumes that you want to build with MySQL
support.  In your case you actually want it, whereas I really didn't, as I
use slurmdbd rather than a direct connection to MySQL.  It was then a matter
of having the right RPMs installed, and for MariaDB the missing bit was
MariaDB-shared (this is for RHEL/CentOS).  As soon as I had installed that,
I was able to get accounting_storage_mysql to build.
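To see in advance which libraries will be needed, you can ask the same config tool for its link flags: every -l name it reports has to be linkable, which is exactly where -lssl and -lcrypto failed earlier in this thread. A sketch, assuming mysql_config or mariadb_config is on PATH (both accept --libs); the order tried here and the helper name db_link_libs are my own choices, not necessarily what Slurm's configure does.

```shell
#!/usr/bin/env bash
# db_link_libs (hypothetical helper): print the libraries that
# mysql_config or mariadb_config asks to link against, one per line.
db_link_libs() {
  local cfg
  for cfg in mysql_config mariadb_config; do
    if command -v "$cfg" >/dev/null 2>&1; then
      # Output looks like: -L/usr/lib/x86_64-linux-gnu -lmysqlclient -lpthread ...
      "$cfg" --libs | tr ' ' '\n' | sed -n 's/^-l//p'
      return 0
    fi
  done
  echo "neither mysql_config nor mariadb_config is on PATH" >&2
  return 1
}
```

If a name in that list has no matching -dev/-devel package installed, the configure test program will fail to link and the plugin will be skipped.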



Re: [slurm-users] Need help with controller issues

2019-12-11 Thread dean.w.schulze
Is that logged somewhere or do I need to capture the output from the make
command to a file?
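configure already writes its checks to config.log, but make output only goes to the terminal. One way to keep a copy without losing make's exit status is to pipe through tee and return the first command's status from bash's PIPESTATUS array. A sketch; run_logged and the log file names are made up.

```shell
#!/usr/bin/env bash
# run_logged (hypothetical helper): run a command, show its output, save
# stdout and stderr to a log file, and return the command's own exit status.
run_logged() {
  local log="$1"
  shift
  "$@" 2>&1 | tee "$log"
  # Without PIPESTATUS the function would return tee's status instead.
  return "${PIPESTATUS[0]}"
}

# Usage (log names are arbitrary):
#   run_logged configure.out ./configure --prefix=/tmp/slurm-build
#   run_logged make.out make -j"$(nproc)"
```

The resulting make.out is the kind of build output that could be attached to a bug report.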





Re: [slurm-users] Need help with controller issues

2019-12-11 Thread Kurt H Maier
On Wed, Dec 11, 2019 at 04:04:44PM -0700, Dean Schulze wrote:
> I tried again with a completely new system (virtual machine).  I used the
> latest source, I used mysql instead of mariadb, and I installed all the
> client and dev libs (below).  I still get the same error.  It doesn't
> build the /usr/lib/slurm/accounting_storage_mysql.so file.
> 
> Could the ./configure command be the problem?  Here's how I run it:

It's going to be extremely difficult to diagnose this without the output
from the build process.  Perhaps you could attach this to the bug report
you opened about this issue.

khm



Re: [slurm-users] Need help with controller issues

2019-12-11 Thread Dean Schulze
I tried again with a completely new system (virtual machine).  I used the
latest source, I used mysql instead of mariadb, and I installed all the
client and dev libs (below).  I still get the same error.  It doesn't
build the /usr/lib/slurm/accounting_storage_mysql.so file.

Could the ./configure command be the problem?  Here's how I run it:

./configure --prefix=/tmp/slurm-build --sysconfdir=/etc/slurm --enable-pam \
  --with-pam_dir=/lib/x86_64-linux-gnu/security/ --without-shared-libslurm


$ dpkg -l | grep mysql
ii  libmysqlclient-dev      5.7.28-0ubuntu0.18.04.4  amd64  MySQL database development files
ii  libmysqlclient20:amd64  5.7.28-0ubuntu0.18.04.4  amd64  MySQL database client library
ii  libmysqld-dev           5.7.28-0ubuntu0.18.04.4  amd64  MySQL embedded database development files
ii  mysql-client            5.7.28-0ubuntu0.18.04.4  all    MySQL database client (metapackage depending on the latest version)
ii  mysql-client-5.7        5.7.28-0ubuntu0.18.04.4  amd64  MySQL database client binaries
ii  mysql-client-core-5.7   5.7.28-0ubuntu0.18.04.4  amd64  MySQL database core client binaries
ii  mysql-common            5.8+1.0.4                all    MySQL database common files, e.g. /etc/mysql/my.cnf
ii  mysql-server            5.7.28-0ubuntu0.18.04.4  all    MySQL database server (metapackage depending on the latest version)
ii  mysql-server-5.7        5.7.28-0ubuntu0.18.04.4  amd64  MySQL database server binaries and system database setup
ii  mysql-server-core-5.7   5.7.28-0ubuntu0.18.04.4  amd64  MySQL database server binaries





Re: [slurm-users] Need help with controller issues

2019-12-11 Thread Eli V
Look for libmariadb-client. That's needed for slurmdbd on Debian.

On Wed, Dec 11, 2019 at 11:43 AM Dean Schulze  wrote:
>
> Turns out I've already got libmariadb-dev installed:
>
> $ dpkg -l | grep maria
> ii  libmariadb-dev            3.0.3-1build1               amd64  MariaDB Connector/C, development files
> ii  libmariadb3:amd64         3.0.3-1build1               amd64  MariaDB Connector/C
> ii  mariadb-client-10.1       1:10.1.43-0ubuntu0.18.04.1  amd64  MariaDB database client binaries
> ii  mariadb-client-core-10.1  1:10.1.43-0ubuntu0.18.04.1  amd64  MariaDB database core client binaries
> ii  mariadb-common            1:10.1.43-0ubuntu0.18.04.1  all    MariaDB common metapackage
> ii  mariadb-server            1:10.1.43-0ubuntu0.18.04.1  all    MariaDB database server (metapackage depending on the latest version)
> ii  mariadb-server-10.1       1:10.1.43-0ubuntu0.18.04.1  amd64  MariaDB database server binaries
> ii  mariadb-server-core-10.1  1:10.1.43-0ubuntu0.18.04.1  amd64  MariaDB database core server files
>
> On Wed, Dec 11, 2019 at 9:04 AM Dean Schulze  wrote:
>>
>> These are the packages I installed prior to building slurm:
>>
>> libmariadb-client-lgpl-dev
>> libmysqlclient-dev
>> mariadb-server
>>
>> This installs MariaDB 10.1.43, which is old.
>>
>> On the Ubuntu site (https://packages.ubuntu.com/search?keywords=mariadb) 
>> there's a package called
>>
>> libmariadb-dev
>>
>> Maybe this is the one I'm missing for accounting_storage?
>>
>> The Ubuntu site also shows packages for mariadb and mariadbd, but I don't 
>> know what the difference is between mariadb and mariadbd.  Do I need the 
>> mariadbd packages as well as the mariadb packages?



Re: [slurm-users] Need help with controller issues

2019-12-10 Thread William Brown
The latest MariaDB packaging is different: a third RPM is needed in
addition to the client and development packages.  I'm away from my desk,
but the details are on the MariaDB site.

William



Re: [slurm-users] Need help with controller issues

2019-12-10 Thread Chris Samuel
On Tuesday, 10 December 2019 1:57:59 PM PST Dean Schulze wrote:

> This bug report from a couple of years ago indicates a source code issue:
> 
> https://bugs.schedmd.com/show_bug.cgi?id=3278
> 
> This must have been fixed by now, though.
> 
> I built using slurm-19.05.2.  Does anyone know if this has been fixed in
> 19.05.4?

I don't think this is a Slurm issue. Have you checked that you have the
MariaDB development package for your distro installed before trying to build
Slurm? It will skip things it doesn't find, and that could explain what you're
seeing.
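Because configure skips the MySQL plugin without failing, it can be worth checking that the plugin was actually built and installed before starting slurmdbd. A sketch; the default directory is /usr/lib/slurm, the path from the slurmdbd log in this thread, and with a different --prefix I'd assume the plugins land under that prefix's lib/slurm instead. has_mysql_plugin is a hypothetical name.

```shell
#!/usr/bin/env bash
# has_mysql_plugin (hypothetical helper): succeed only if the
# accounting_storage/mysql plugin exists in the given plugin directory.
has_mysql_plugin() {
  local dir="${1:-/usr/lib/slurm}"
  if [ -e "$dir/accounting_storage_mysql.so" ]; then
    echo "found: $dir/accounting_storage_mysql.so"
  else
    echo "missing: $dir/accounting_storage_mysql.so (check config.log for the MySQL test)" >&2
    return 1
  fi
}
```

If it reports missing, the "MySQL test program execution failed" warning in config.log is the usual place to start.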

All the best,
Chris
-- 
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA






Re: [slurm-users] Need help with controller issues

2019-12-10 Thread Dean Schulze
There's a problem with the accounting_storage/mysql plugin:

$ sudo  slurmdbd -D -
slurmdbd: debug:  Log file re-opened
slurmdbd: pidfile not locked, assuming no running daemon
slurmdbd: debug3: Trying to load plugin /usr/lib/slurm/auth_munge.so
slurmdbd: debug:  Munge authentication plugin loaded
slurmdbd: debug3: Success.
slurmdbd: debug3: Trying to load plugin /usr/lib/slurm/accounting_storage_mysql.so
slurmdbd: error: Couldn't find the specified plugin name for accounting_storage/mysql looking at all files
slurmdbd: error: cannot find accounting_storage plugin for accounting_storage/mysql
slurmdbd: error: cannot create accounting_storage context for accounting_storage/mysql
slurmdbd: fatal: Unable to initialize accounting_storage/mysql accounting storage plugin


This bug report from a couple of years ago indicates a source code issue:

https://bugs.schedmd.com/show_bug.cgi?id=3278

This must have been fixed by now, though.

I built using slurm-19.05.2.  Does anyone know if this has been fixed in
19.05.4?





Re: [slurm-users] Need help with controller issues

2019-12-10 Thread Dean Schulze
$ systemctl status slurmdbd
● slurmdbd.service - Slurm DBD accounting daemon
   Loaded: loaded (/etc/systemd/system/slurmdbd.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Tue 2019-12-10 13:33:28 MST; 40min ago
  Process: 787 ExecStart=/usr/sbin/slurmdbd $SLURMDBD_OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 791 (code=exited, status=1/FAILURE)

Dec 10 13:33:28 ubuntu-controller.liqid.com systemd[1]: Starting Slurm DBD accounting daemon...
Dec 10 13:33:28 ubuntu-controller.liqid.com systemd[1]: Started Slurm DBD accounting daemon.
Dec 10 13:33:28 ubuntu-controller.liqid.com slurmdbd[791]: fatal: Unable to initialize accounting_storage/mysql accounting storage plugin
Dec 10 13:33:28 ubuntu-controller.liqid.com systemd[1]: slurmdbd.service: Main process exited, code=exited, status=1/FAILURE
Dec 10 13:33:28 ubuntu-controller.liqid.com systemd[1]: slurmdbd.service: Failed with result 'exit-code'.
$ systemctl status slurmctld
● slurmctld.service - Slurm controller daemon
   Loaded: loaded (/etc/systemd/system/slurmctld.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Tue 2019-12-10 13:33:28 MST; 41min ago
  Process: 788 ExecStart=/usr/sbin/slurmctld $SLURMCTLD_OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 796 (code=exited, status=1/FAILURE)

Dec 10 13:33:28 ubuntu-controller.liqid.com systemd[1]: Starting Slurm controller daemon...
Dec 10 13:33:28 ubuntu-controller.liqid.com systemd[1]: Started Slurm controller daemon.
Dec 10 13:33:28 ubuntu-controller.liqid.com slurmctld[796]: fatal: You are running with a database but for some reason we have no TRES from it.  Th
Dec 10 13:33:28 ubuntu-controller.liqid.com systemd[1]: slurmctld.service: Main process exited, code=exited, status=1/FAILURE
Dec 10 13:33:28 ubuntu-controller.liqid.com systemd[1]: slurmctld.service: Failed with result 'exit-code'.
$
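The slurmctld line above stops mid-sentence ("no TRES from it.  Th"), which looks like the shortened line view of systemctl status; `journalctl -u <unit> --no-pager` or `systemctl status -l` should show it in full. Since slurmctld appears to be dying because it gets nothing from the database, slurmdbd is likely the one to fix first. A sketch that reports both units in that order, assuming systemd's systemctl and journalctl; slurm_service_report is a hypothetical name.

```shell
#!/usr/bin/env bash
# slurm_service_report (hypothetical helper): show the state of slurmdbd and
# slurmctld, plus recent journal lines for any unit that is not active.
slurm_service_report() {
  local svc state
  for svc in slurmdbd slurmctld; do
    state=$(systemctl is-active "$svc" 2>/dev/null) || true
    echo "$svc: ${state:-unknown}"
    if [ "$state" != "active" ] && command -v journalctl >/dev/null 2>&1; then
      # Last 5 journal entries for the unit, printed without a pager.
      journalctl -u "$svc" -n 5 --no-pager 2>/dev/null || true
    fi
  done
}
```

Running it with sudo may be needed to see the journal entries, depending on how journal access is configured.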

One issue is with a database plugin. During database setup this command
failed:

sudo systemctl enable mysql

I did this instead:

sudo systemctl enable mariadb.service

Maybe there is some config that has to be modified to use MariaDB instead of
MySQL?
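As far as I know, MariaDB is used through the same accounting_storage/mysql plugin, so StorageType=accounting_storage/mysql in slurmdbd.conf is expected either way, and slurmdbd reaches the database over StorageHost/StoragePort rather than through the systemd unit name. The "Unable to initialize accounting_storage/mysql" error above points at the plugin itself rather than that setting. To double-check what slurmdbd.conf actually contains, a tiny reader works. A sketch: the helper name conf_value is hypothetical, /etc/slurm/slurmdbd.conf is assumed from the --sysconfdir used in this thread, and it only handles simple Key=Value lines without spaces in the value.

```shell
#!/usr/bin/env bash
# conf_value (hypothetical helper): print the value of Key from a simple
# Key=Value config file such as slurmdbd.conf; the last occurrence wins.
conf_value() {
  local key="$1" conf="${2:-/etc/slurm/slurmdbd.conf}"
  awk -v k="$key" '
    /^[ \t]*#/ { next }                  # skip comment lines
    {
      line = $0
      gsub(/[ \t]/, "", line)            # drop spaces and tabs
      eq = index(line, "=")
      if (eq > 1 && substr(line, 1, eq - 1) == k) v = substr(line, eq + 1)
    }
    END { if (v == "") exit 1; print v }
  ' "$conf"
}
```

For example, `conf_value StorageType` and `conf_value DbdPort` print those two settings, and the command fails if the key is absent.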




Re: [slurm-users] Need help with controller issues

2019-12-10 Thread Renfro, Michael
What do you get from

systemctl status slurmdbd
systemctl status slurmctld

I’m assuming at least slurmdbd isn’t running.




[slurm-users] Need help with controller issues

2019-12-10 Thread Dean Schulze
I'm trying to set up my first slurm installation following these
instructions:

https://github.com/nateGeorge/slurm_gpu_ubuntu

I've had to deviate a little bit because I'm using virtual machines that
don't have GPUs, so I don't have a gres.conf file and in
/etc/slurm/slurm.conf I don't have an entry like Gres=gpu:2 on the last
line.

On my controller VM I get errors when trying to run simple commands:

$ sinfo
slurm_load_partitions: Unable to contact slurm controller (connect failure)

$ sudo sacctmgr add cluster compute-cluster
sacctmgr: error: slurm_persist_conn_open_without_init: failed to open persistent connection to localhost:6819: Connection refused
sacctmgr: error: slurmdbd: Sending PersistInit msg: Connection refused
sacctmgr: error: Problem talking to the database: Connection refused


Something is supposed to be running on port 6819, but netstat shows nothing
using that port.  What is supposed to be running on 6819?

My database (MariaDB) is running. I can connect to it with `sudo mysql -u
root`.

When I boot my controller, which services are supposed to be running, and
on which ports?

Thanks.
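For reference, Slurm's default ports are 6817 for slurmctld (SlurmctldPort), 6818 for slurmd (SlurmdPort) and 6819 for slurmdbd (DbdPort). So "Connection refused" on localhost:6819 means slurmdbd isn't listening, and the sinfo error means slurmctld isn't reachable. On a controller you'd normally expect slurmctld and slurmdbd; slurmd usually runs on the compute nodes. A sketch that checks which default ports have a listener, assuming iproute2's ss (the usual replacement for netstat) and default port settings; both helper names are made up.

```shell
#!/usr/bin/env bash
# slurm_daemon_for_port (hypothetical helper): map Slurm's default ports
# to the daemon that normally listens on them.
slurm_daemon_for_port() {
  case "$1" in
    6817) echo slurmctld ;;   # SlurmctldPort default
    6818) echo slurmd ;;      # SlurmdPort default
    6819) echo slurmdbd ;;    # DbdPort default
    *) echo unknown; return 1 ;;
  esac
}

# check_slurm_ports (hypothetical helper): report whether anything is
# listening on each default port, using ss from iproute2.
check_slurm_ports() {
  local port
  for port in 6817 6818 6819; do
    if ss -ltn "sport = :$port" 2>/dev/null | grep -q LISTEN; then
      echo "$port $(slurm_daemon_for_port "$port"): listening"
    else
      echo "$port $(slurm_daemon_for_port "$port"): not listening"
    fi
  done
}
```

If slurm.conf or slurmdbd.conf set non-default ports, the list in check_slurm_ports would need to match them.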