Re: [slurm-users] Munge decode failing on new node

2020-04-17 Thread Dean Schulze
Just noticed this.  On the problem node the munged.log file has an entry
every 1:40 (100 seconds):

2020-04-17 15:31:02 -0600 Info:  Invalid credential
2020-04-17 15:32:42 -0600 Info:  Invalid credential
2020-04-17 15:34:22 -0600 Info:  Invalid credential

This happens on the failed node and two other nodes that work.  Two nodes
that work (including the controller) don't have this message.
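
To confirm the interval, the gaps between entries can be computed straight
from the log. A minimal sketch, assuming the munged.log line format shown
above and that all entries fall on the same day; the function name
munge_log_gaps is made up for illustration:

```shell
# Print the gap in seconds between consecutive "Invalid credential" entries.
# Reads the named log files, or stdin when no file is given.
# Assumes field 2 of each line is HH:MM:SS and entries do not cross midnight.
munge_log_gaps() {
    awk '/Invalid credential/ {
        split($2, t, ":")
        now = t[1] * 3600 + t[2] * 60 + t[3]
        if (seen) print now - prev
        prev = now
        seen = 1
    }' "$@"
}
```

Run it as "munge_log_gaps /var/log/munge/munged.log" (the log path is an
assumption; use wherever munged writes its log on your nodes). A steady
value would suggest one client retrying on a timer.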



On Fri, Apr 17, 2020 at 2:00 PM Riebs, Andy  wrote:

> A couple of quick checks to see if the problem is munge:
>
> 1.   On the problem node, try
> $ echo foo | munge | unmunge
>
> 2.   If (1) works, try this from the node running slurmctld to the
> problem node
> slurm-node$ echo foo | ssh node munge | unmunge
>
>
>
> *From:* slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] *On
> Behalf Of *Dean Schulze
> *Sent:* Friday, April 17, 2020 3:40 PM
> *To:* Slurm User Community List 
> *Subject:* Re: [slurm-users] Munge decode failing on new node
>
>
>
> There is no ntp service running on any of my nodes, and all but this one
> is working.  I haven't heard that ntp is a requirement for slurm, just that
> the time be synchronized across the cluster.  And it is.
>
>
>
> On Wed, Apr 15, 2020 at 12:17 PM Carlos Fenoy  wrote:
>
> I’d check ntp as your encoding time seems odd to me
>
>
>
> On Wed, 15 Apr 2020 at 19:59, Dean Schulze 
> wrote:
>
> I've installed two new nodes onto my slurm cluster.  One node works, but
> the other one complains about an invalid credential for munge.  I've
> verified that the munge.key is the same as on all other nodes with
>
>
> sudo cksum /etc/munge/munge.key
>
>
>
> I recopied a munge.key from a node that works.  I've verified that munge
> uid and gid are the same on the nodes.  The time is in sync on all nodes.
>
>
>
> Here is what is in the slurmd.log:
>
>
>
>  error: Unable to register: Unable to contact slurm controller (connect
> failure)
>  error: Munge decode failed: Invalid credential
>  ENCODED: Wed Dec 31 17:00:00 1969
>  DECODED: Wed Dec 31 17:00:00 1969
>  error: authentication: Invalid authentication credential
>  error: slurm_receive_msg_and_forward: Protocol authentication error
>  error: service_connection: slurm_receive_msg: Protocol authentication
> error
>  error: Unable to register: Unable to contact slurm controller (connect
> failure)
>
>
>
> I've checked in the munged.log and all it says is
>
>
>
> Invalid credential
>
>
>
> Thanks for your help
>
> --
>
> --
> Carles Fenoy
>
>


Re: [slurm-users] Munge decode failing on new node

2020-04-17 Thread Dean Schulze
Both work.  The only discrepancy is that the slurm controller output had
these two lines:

UID:  ??? (1000)
GID:  ??? (1000)

Like the controller doesn't know the username for UID 1000.

But it returned success 0
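
The ??? entries suggest the name lookup for UID/GID 1000 fails on the
controller, which may or may not matter here. A small sketch for spotting
such entries in unmunge output, assuming the field layout quoted above;
unresolved_ids is a hypothetical helper name:

```shell
# Read unmunge output on stdin and print the UID/GID fields whose
# name did not resolve (shown by unmunge as "???").
unresolved_ids() {
    awk '($1 == "UID:" || $1 == "GID:") && $2 == "???" {
        gsub(/[()]/, "", $3)   # strip the parentheses around the number
        print $1, $3
    }'
}
```

For example: echo foo | ssh node munge | unmunge | unresolved_ids
and then "getent passwd 1000" on the controller to see whether the account
exists there.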

On Fri, Apr 17, 2020 at 2:00 PM Riebs, Andy  wrote:

> A couple of quick checks to see if the problem is munge:
>
> 1.   On the problem node, try
> $ echo foo | munge | unmunge
>
> 2.   If (1) works, try this from the node running slurmctld to the
> problem node
> slurm-node$ echo foo | ssh node munge | unmunge
>
>
>
> *From:* slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] *On
> Behalf Of *Dean Schulze
> *Sent:* Friday, April 17, 2020 3:40 PM
> *To:* Slurm User Community List 
> *Subject:* Re: [slurm-users] Munge decode failing on new node
>
>
>
> There is no ntp service running on any of my nodes, and all but this one
> is working.  I haven't heard that ntp is a requirement for slurm, just that
> the time be synchronized across the cluster.  And it is.
>
>
>
> On Wed, Apr 15, 2020 at 12:17 PM Carlos Fenoy  wrote:
>
> I’d check ntp as your encoding time seems odd to me
>
>
>
> On Wed, 15 Apr 2020 at 19:59, Dean Schulze 
> wrote:
>
> I've installed two new nodes onto my slurm cluster.  One node works, but
> the other one complains about an invalid credential for munge.  I've
> verified that the munge.key is the same as on all other nodes with
>
>
> sudo cksum /etc/munge/munge.key
>
>
>
> I recopied a munge.key from a node that works.  I've verified that munge
> uid and gid are the same on the nodes.  The time is in sync on all nodes.
>
>
>
> Here is what is in the slurmd.log:
>
>
>
>  error: Unable to register: Unable to contact slurm controller (connect
> failure)
>  error: Munge decode failed: Invalid credential
>  ENCODED: Wed Dec 31 17:00:00 1969
>  DECODED: Wed Dec 31 17:00:00 1969
>  error: authentication: Invalid authentication credential
>  error: slurm_receive_msg_and_forward: Protocol authentication error
>  error: service_connection: slurm_receive_msg: Protocol authentication
> error
>  error: Unable to register: Unable to contact slurm controller (connect
> failure)
>
>
>
> I've checked in the munged.log and all it says is
>
>
>
> Invalid credential
>
>
>
> Thanks for your help
>
> --
>
> --
> Carles Fenoy
>
>


Re: [slurm-users] Munge decode failing on new node

2020-04-17 Thread Riebs, Andy
A couple of quick checks to see if the problem is munge:

1.   On the problem node, try
$ echo foo | munge | unmunge

2.   If (1) works, try this from the node running slurmctld to the problem 
node
slurm-node$ echo foo | ssh node munge | unmunge

From: slurm-users [mailto:slurm-users-boun...@lists.schedmd.com] On Behalf Of 
Dean Schulze
Sent: Friday, April 17, 2020 3:40 PM
To: Slurm User Community List 
Subject: Re: [slurm-users] Munge decode failing on new node

There is no ntp service running on any of my nodes, and all but this one is 
working.  I haven't heard that ntp is a requirement for slurm, just that the 
time be synchronized across the cluster.  And it is.

On Wed, Apr 15, 2020 at 12:17 PM Carlos Fenoy wrote:
I’d check ntp as your encoding time seems odd to me

On Wed, 15 Apr 2020 at 19:59, Dean Schulze wrote:
I've installed two new nodes onto my slurm cluster.  One node works, but the 
other one complains about an invalid credential for munge.  I've verified that 
the munge.key is the same as on all other nodes with

sudo cksum /etc/munge/munge.key

I recopied a munge.key from a node that works.  I've verified that munge uid 
and gid are the same on the nodes.  The time is in sync on all nodes.

Here is what is in the slurmd.log:

 error: Unable to register: Unable to contact slurm controller (connect failure)
 error: Munge decode failed: Invalid credential
 ENCODED: Wed Dec 31 17:00:00 1969
 DECODED: Wed Dec 31 17:00:00 1969
 error: authentication: Invalid authentication credential
 error: slurm_receive_msg_and_forward: Protocol authentication error
 error: service_connection: slurm_receive_msg: Protocol authentication error
 error: Unable to register: Unable to contact slurm controller (connect failure)

I've checked in the munged.log and all it says is

Invalid credential

Thanks for your help
--
--
Carles Fenoy


[slurm-users] Alternative to munge for use with slurm?

2020-04-17 Thread Dean Schulze
Is there an alternative to munge when running slurm?  Munge issues are a
common problem in slurm, and munge doesn't give any useful information when
a problem occurs.  An alternative that at least gave some useful
information when a problem occurs would be a big improvement.

Thanks.


Re: [slurm-users] Munge decode failing on new node

2020-04-17 Thread Dean Schulze
There is no ntp service running on any of my nodes, and all but this one is
working.  I haven't heard that ntp is a requirement for slurm, just that
the time be synchronized across the cluster.  And it is.
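
For what it's worth, the ENCODED/DECODED times in the slurmd.log quoted
below are exactly the Unix epoch (time 0) rendered in a UTC-7 zone. That
would be consistent with timestamps that were never filled in rather than
with a skewed clock, though that is an interpretation. A quick way to see
it, assuming GNU date (the "-d @0" syntax) and using the POSIX zone string
MST7 as a stand-in for the cluster's timezone; epoch_zero_local is a
made-up name:

```shell
# Print Unix time 0 as it would appear in the given timezone.
# Assumes GNU date; "-d @0" means "0 seconds since the epoch".
# LC_ALL=C keeps the day and month names in English.
epoch_zero_local() {
    TZ="$1" LC_ALL=C date -d @0 '+%a %b %e %H:%M:%S %Y'
}
```

"epoch_zero_local MST7" prints Wed Dec 31 17:00:00 1969, the same string
as in the log.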

On Wed, Apr 15, 2020 at 12:17 PM Carlos Fenoy  wrote:

> I’d check ntp as your encoding time seems odd to me
>
> On Wed, 15 Apr 2020 at 19:59, Dean Schulze 
> wrote:
>
>> I've installed two new nodes onto my slurm cluster.  One node works, but
>> the other one complains about an invalid credential for munge.  I've
>> verified that the munge.key is the same as on all other nodes with
>>
>> sudo cksum /etc/munge/munge.key
>>
>> I recopied a munge.key from a node that works.  I've verified that munge
>> uid and gid are the same on the nodes.  The time is in sync on all nodes.
>>
>> Here is what is in the slurmd.log:
>>
>>  error: Unable to register: Unable to contact slurm controller (connect
>> failure)
>>  error: Munge decode failed: Invalid credential
>>  ENCODED: Wed Dec 31 17:00:00 1969
>>  DECODED: Wed Dec 31 17:00:00 1969
>>  error: authentication: Invalid authentication credential
>>  error: slurm_receive_msg_and_forward: Protocol authentication error
>>  error: service_connection: slurm_receive_msg: Protocol authentication
>> error
>>  error: Unable to register: Unable to contact slurm controller (connect
>> failure)
>>
>> I've checked in the munged.log and all it says is
>>
>> Invalid credential
>>
>> Thanks for your help
>>
> --
> --
> Carles Fenoy
>


Re: [slurm-users] follow-up: [Still broken]CentOS 7 CUDA 8.0 can't find plugin cons_tres

2020-04-17 Thread Lisa Kay Weihl
I went back and built the slurm-19.05.6 rpms using:

 rpmbuild -ta slurm-19.05.6.tar.bz2

It still failed with:

Error: Package: slurm-19.05.6-1.el7.x86_64
Requires: libnvidia-ml.so.1()(64bit)

Now I remember why I went back to 18.08. It was because this post 
https://lists.schedmd.com/pipermail/slurm-users/2019-August/003910.html 
reported the same errors. He said he had no issues with 18.08 and he was 
looking to use GPUs. I guess that's why I thought 18.08 supported cons_tres.

I followed the rest of that thread and his issues match mine pretty closely, 
although AdvancedHPC installed CUDA 8, I assume from the NVIDIA rpm, 
because /etc/yum.repos.d contains a cuda file and looks similar to other 
machines where I've installed CUDA via that method.

It was suggested that he go back to version 10.0 of CUDA because the newer 
CUDAs don't build links properly but we are back even further than 10 so I 
figured that must be okay.

libnvidia-ml is there, as evidenced by ldconfig -p:

libnvidia-ml.so.1 (lib6.x86-64) => /lib64/libnvidia-ml.so.1
libnvidia-ml.so.1 (lib6) => /lib/libnvidia-ml.so.1
libnvidia-ml.so (lib6.x86-64) => /lib64/libnvidia-ml.so
libnvidia-ml.so (lib6) => /lib/libnvidia-ml.so
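
Note that ldconfig only shows what the dynamic linker can find, while
yum/rpm resolve a Requires: line against what the installed packages
declare they provide. If the NVIDIA driver was installed outside of rpm,
that could explain the mismatch, though that is a guess. A sketch of the
check, where check_rpm_provider is a hypothetical helper name:

```shell
# Report whether any installed rpm package provides the given capability.
# Uses "rpm -q --whatprovides", which exits non-zero when nothing provides it.
check_rpm_provider() {
    cap="$1"
    if rpm -q --whatprovides "$cap" >/dev/null 2>&1; then
        echo "provided"
    else
        echo "missing"
    fi
}
```

Usage: check_rpm_provider 'libnvidia-ml.so.1()(64bit)'
If it prints "missing", the library is on disk but unknown to the rpm
database, which is what the install error is complaining about.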

The gentleman in the thread was going to report back if rolling back to CUDA 
10.0 helped him but I never saw another post.

I also found a post about adding some linker switches to slurm.spec before 
building the rpms but that was for CentOS 8. Even if I add those and rebuild 
the rpms I get the same error message.


I'm at a loss for what combination I need to make this work. 


--
Lisa Weihl 
Systems Administrator, Computer Science 
Bowling Green State University
Tel: (419) 372-0116   |    Fax: (419) 372-8061
lwe...@bgsu.edu
www.bgsu.edu

-Original Message-
From: slurm-users  On Behalf Of 
slurm-users-requ...@lists.schedmd.com
Sent: Friday, April 17, 2020 10:00 AM
To: slurm-users@lists.schedmd.com
Subject: [EXTERNAL] slurm-users Digest, Vol 30, Issue 35

Today's Topics:

   1. Re: slurm-20.02.1-1 failed rpmbuild with error File not found
  (Ole Holm Nielsen)
   2. Re: [EXTERNAL] Follow-up-slurm-users Digest, Vol 30, Issue 32
  (Lisa Kay Weihl)
   3. Re: [EXTERNAL] Follow-up-slurm-users Digest, Vol 30, Issue 32
  (Renfro, Michael)


--

Message: 1
Date: Fri, 17 Apr 2020 14:11:03 +0200
From: Ole Holm Nielsen 
To: 
Subject: Re: [slurm-users] slurm-20.02.1-1 failed rpmbuild with error
File not found
Message-ID: <2a452504-183a-6208-f367-f5ae2d03d...@fysik.dtu.dk>
Content-Type: text/plain; charset="utf-8"; format=flowed

On 17-04-2020 11:47, Ole Holm Nielsen wrote:
> On 17-04-2020 10:38, Christian Anthon wrote:
>> It would be neat to have these build requirements / install 
>> requirements built into the spec file.
> 
> I agree with you, and it seems that the SchedMD pages no longer list 
> the build prerequisites (I think there was some information in the past).
> Try googling for "slurm build prerequisites" and see which pages this 
> gives you :-)

If you read the page 
https://slurm.schedmd.com/quickstart_admin.html
carefully, please note the section starting with:

> Optional Slurm plugins will be built automatically when the configure script 
> detects that the required build requirements are present. Build dependencies 
> for various plugins and commands are denoted below: 

A list of optional software is given, but not in a format that is immediately 
applicable to any particular Linux distribution.  For CentOS you should consult
https://wiki.fysik.dtu.dk/niflheim/Slurm_installation#build-slurm-rpms

> The Slurm 

Re: [slurm-users] [EXTERNAL] Follow-up-slurm-users Digest, Vol 30, Issue 32

2020-04-17 Thread Renfro, Michael
Can’t speak for everyone, but I went to Slurm 19.05 some months back, and 
haven't had any problems with CUDA 10.0 or 10.1 (or 8.0, 9.0, or 9.1).

> On Apr 17, 2020, at 8:46 AM, Lisa Kay Weihl  wrote:
> 
> Wow. I did not catch that version issue. I saw that there were issues with 
> the newest Slurm and how CUDA 10+ installs so I avoided that even though we 
> have CUDA 8. I did have Slurm 19 downloaded so I'm thinking I ran into an 
> issue with that and went back to 18 but now that I have more experience 
> setting it up I'll wipe the 18 install and start over. Fingers crossed for 
> success!
> 
> Thanks for your help!
> 
> --
> Lisa Weihl
> Systems Administrator, Computer Science
> Bowling Green State University
> Tel: (419) 372-0116   |Fax: (419) 372-8061
> lwe...@bgsu.edu
> www.bgsu.edu
> 
> -Original Message-
> From: slurm-users  On Behalf Of 
> slurm-users-requ...@lists.schedmd.com
> Sent: Thursday, April 16, 2020 6:39 PM
> To: slurm-users@lists.schedmd.com
> Subject: [EXTERNAL] slurm-users Digest, Vol 30, Issue 32
> 
> Today's Topics:
> 
>   1. CentOS 7 CUDA 8.0 can't find plugin cons_tres (Lisa Kay Weihl)
>   2. Re: [EXTERNAL] CentOS 7 CUDA 8.0 can't find plugin cons_tres
>  (Sean Crosby)
> 
> 
> --
> 
> Message: 1
> Date: Thu, 16 Apr 2020 19:00:03 +
> From: Lisa Kay Weihl 
> To: "slurm-users@lists.schedmd.com" 
> Subject: [slurm-users] CentOS 7 CUDA 8.0 can't find plugin cons_tres
> Message-ID:
>
> 
> 
> Content-Type: text/plain; charset="utf-8"
> 
> I have a standalone server with 4 GeForce RTX 2080 Ti. The purpose is to 
> serve as a compute server for data science jobs. My department chair wants a 
> job scheduler on it. I have installed SLURM (18.08.9). That works just fine 
> in a basic configuration, but when I attempt to add GresTypes=gpu and then add 
> Gres=gpu:4 to the end of the node description:
> 
> 
> NodeName=cs-datasci CPUs=24 RealMemory=385405 Sockets=2 CoresPerSocket=6 
> ThreadsPerCore=2 State=UNKNOWN Gres=gpu:4
> 
> and then try to restart slurmd I get an error that it cannot find the plugin
> 
> slurmd: error: Couldn't find the specified plugin name for select/cons_tres 
> looking at all files
> 
> slurmd: error: cannot find select plugin for select/cons_tres
> 
> slurmd: fatal: Can't find plugin for select/cons_tres
> 
> The system was prebuilt by AdvancedHPC with CentOS 7 and CUDA 8.0
> 
> I usually keep notes when I'm installing things but in this case I wasn't 
> jotting things down as I went. I think I started with the instructions on 
> this page: 
> https://slurm.schedmd.com/quickstart_admin.html
>  and went with the usual ./configure, make, make install.
> 
> I have a feeling maybe something did not work and I switched to the rpm 
> packages based on some other web pages I saw because if I do a yum list 
> installed | grep slurm I see a lot of packages. The problem is I was 
> interrupted with other tasks and my memory was somewhat rusty when I came 
> back to this.
> 
> When I went looking for this error I saw there were some issues with the 
> newest SLURM and CUDA 10.2 but I didn't think that should be an issue because 
> I was at CUDA 8.0.  Just in case I backed down to SLURM 18.
> 
> I'm willing to start all over if anyone thinks cleaning up and rebuilding 
> will help that. I do see libraries in /etc/lib64/slurm but I also see 2 files 
> in /usr/local/lib/slurm/src so I'm not sure if that's left over from trying 
> to install from source.  All the daemons are in /usr/sbin and user commands 
> in /usr/bin
> 
> I'm a newbie at this and very frustrated. Can anyone help?
> 
> 

Re: [slurm-users] [EXTERNAL] Follow-up-slurm-users Digest, Vol 30, Issue 32

2020-04-17 Thread Lisa Kay Weihl
Wow. I did not catch that version issue. I saw that there were issues with the 
newest Slurm and how CUDA 10+ installs so I avoided that even though we have 
CUDA 8. I did have Slurm 19 downloaded so I'm thinking I ran into an issue with 
that and went back to 18 but now that I have more experience setting it up I'll 
wipe the 18 install and start over. Fingers crossed for success!
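
On the version issue: as far as I know, select/cons_tres first shipped in
the 19.05 release series, so an 18.08 install has no such plugin to load.
A minimal sketch of a version check, where supports_cons_tres is a made-up
helper and the 19.05 cutoff is stated from memory of the release notes:

```shell
# Succeed (exit 0) if the given Slurm version string is 19.05 or newer.
# Expects versions in the usual MAJOR.MINOR.PATCH form, e.g. 19.05.6.
supports_cons_tres() {
    major=${1%%.*}     # text before the first dot
    rest=${1#*.}       # text after the first dot
    minor=${rest%%.*}  # text between the first and second dot
    [ "$major" -gt 19 ] || { [ "$major" -eq 19 ] && [ "$minor" -ge 5 ]; }
}
```

Usage, assuming slurmd -V prints something like "slurm 19.05.6":
supports_cons_tres "$(slurmd -V | awk '{print $2}')" && echo "cons_tres should be available"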

Thanks for your help!

--
Lisa Weihl 
Systems Administrator, Computer Science 
Bowling Green State University
Tel: (419) 372-0116   |    Fax: (419) 372-8061
lwe...@bgsu.edu
www.bgsu.edu

-Original Message-
From: slurm-users  On Behalf Of 
slurm-users-requ...@lists.schedmd.com
Sent: Thursday, April 16, 2020 6:39 PM
To: slurm-users@lists.schedmd.com
Subject: [EXTERNAL] slurm-users Digest, Vol 30, Issue 32

Today's Topics:

   1. CentOS 7 CUDA 8.0 can't find plugin cons_tres (Lisa Kay Weihl)
   2. Re: [EXTERNAL] CentOS 7 CUDA 8.0 can't find plugin cons_tres
  (Sean Crosby)


--

Message: 1
Date: Thu, 16 Apr 2020 19:00:03 +
From: Lisa Kay Weihl 
To: "slurm-users@lists.schedmd.com" 
Subject: [slurm-users] CentOS 7 CUDA 8.0 can't find plugin cons_tres
Message-ID:



Content-Type: text/plain; charset="utf-8"

I have a standalone server with 4 GeForce RTX 2080 Ti. The purpose is to serve 
as a compute server for data science jobs. My department chair wants a job 
scheduler on it. I have installed SLURM (18.08.9). That works just fine in a 
basic configuration, but when I attempt to add GresTypes=gpu and then add 
Gres=gpu:4 to the end of the node description:


NodeName=cs-datasci CPUs=24 RealMemory=385405 Sockets=2 CoresPerSocket=6 
ThreadsPerCore=2 State=UNKNOWN Gres=gpu:4

and then try to restart slurmd I get an error that it cannot find the plugin

slurmd: error: Couldn't find the specified plugin name for select/cons_tres 
looking at all files

slurmd: error: cannot find select plugin for select/cons_tres

slurmd: fatal: Can't find plugin for select/cons_tres

The system was prebuilt by AdvancedHPC with CentOS 7 and CUDA 8.0

I usually keep notes when I'm installing things but in this case I wasn't 
jotting things down as I went. I think I started with the instructions on this 
page: 
https://slurm.schedmd.com/quickstart_admin.html
 and went with the usual ./configure, make, make install.

I have a feeling maybe something did not work and I switched to the rpm 
packages based on some other web pages I saw because if I do a yum list 
installed | grep slurm I see a lot of packages. The problem is I was 
interrupted with other tasks and my memory was somewhat rusty when I came back 
to this.

When I went looking for this error I saw there were some issues with the newest 
SLURM and CUDA 10.2 but I didn't think that should be an issue because I was at 
CUDA 8.0.  Just in case I backed down to SLURM 18.

I'm willing to start all over if anyone thinks cleaning up and rebuilding will 
help that. I do see libraries in /etc/lib64/slurm but I also see 2 files in 
/usr/local/lib/slurm/src so I'm not sure if that's left over from trying to 
install from source.  All the daemons are in /usr/sbin and user commands in 
/usr/bin

I'm a newbie at this and very frustrated. Can anyone help?

***

Lisa Weihl Systems Administrator

Computer Science, Bowling Green State University
Tel: (419) 372-0116   |Fax: (419) 372-8061
lwe...@bgsu.edu
http://www.bgsu.edu/?

Re: [slurm-users] slurm-20.02.1-1 failed rpmbuild with error File not found

2020-04-17 Thread Ole Holm Nielsen

On 17-04-2020 11:47, Ole Holm Nielsen wrote:

On 17-04-2020 10:38, Christian Anthon wrote:
It would be neat to have these build requirements / install 
requirements built into the spec file.


I agree with you, and it seems that the SchedMD pages no longer list the 
build prerequisites (I think there was some information in the past). 
Try googling for "slurm build prerequisites" and see which pages this 
gives you :-)


If you read the page https://slurm.schedmd.com/quickstart_admin.html 
carefully, please note the section starting with:


Optional Slurm plugins will be built automatically when the configure script detects that the required build requirements are present. Build dependencies for various plugins and commands are denoted below: 


A list of optional software is given, but not in a format that is 
immediately applicable to any particular Linux distribution.  For CentOS 
you should consult 
https://wiki.fysik.dtu.dk/niflheim/Slurm_installation#build-slurm-rpms


The Slurm build system searches for installed software and omits Slurm 
components where it didn't find the prerequisites installed on the system.


To submit a bug report against the slurm.spec file, you would need to 
have a support contract with SchedMD.  We get a lot of benefit from 
having such a support contract ;-)


A bug report for slurm.spec has been submitted at 
https://bugs.schedmd.com/show_bug.cgi?id=8882


/Ole



Re: [slurm-users] slurm-20.02.1-1 failed rpmbuild with error File not found

2020-04-17 Thread Ole Holm Nielsen

On 17-04-2020 10:38, Christian Anthon wrote:
It would be neat to have these build requirements / install requirements 
built into the spec file.


I agree with you, and it seems that the SchedMD pages no longer list the 
build prerequisites (I think there was some information in the past). 
Try googling for "slurm build prerequisites" and see which pages this 
gives you :-)


The Slurm build system searches for installed software and omits Slurm 
components where it didn't find the prerequisites installed on the system.


To submit a bug report against the slurm.spec file, you would need to 
have a support contract with SchedMD.  We get a lot of benefit from 
having such a support contract ;-)


Best regards,
Ole


On 17/04/2020 10.08, Ole Holm Nielsen wrote:

Hi Felix,

Please make sure to install all prerequisite packages on the Slurm 
build host.  I have summarized this information in my Slurm Wiki page:

https://wiki.fysik.dtu.dk/niflheim/Slurm_installation#build-slurm-rpms

/Ole


On 17-04-2020 09:11, Felix Farcas wrote:
I am trying to build a rpm for a new server and I get the following 
error:


Requires(interp): /bin/sh /bin/sh /bin/sh
Requires(rpmlib): rpmlib(FileDigests) <= 4.6.0-1 
rpmlib(PayloadFilesHavePrefix) <= 4.0-1 rpmlib(CompressedFileNames) 
<= 3.0.4-1

Requires(post): /bin/sh
Requires(preun): /bin/sh
Requires(postun): /bin/sh
Requires: libc.so.6()(64bit) libc.so.6(GLIBC_2.14)(64bit) 
libc.so.6(GLIBC_2.2.5)(64bit) libc.so.6(GLIBC_2.3.2)(64bit) 
libc.so.6(GLIBC_2.3.4)(64bit) libc.so.6(GLIBC_2.3)(64bit) 
libc.so.6(GLIBC_2.4)(64bit) libc.so.6(GLIBC_2.7)(64bit) 
libdl.so.2()(64bit) libpam_misc.so.0()(64bit) 
libpam_misc.so.0(LIBPAM_MISC_1.0)(64bit) libpam.so.0()(64bit) 
libpam.so.0(LIBPAM_1.0)(64bit) libpthread.so.0()(64bit) 
libpthread.so.0(GLIBC_2.2.5)(64bit) 
libpthread.so.0(GLIBC_2.3.2)(64bit) libresolv.so.2()(64bit) 
libslurmfull.so()(64bit) libutil.so.1()(64bit) 
libutil.so.1(GLIBC_2.2.5)(64bit)

Processing files: slurm-slurmdbd-20.02.1-1.el7.x86_64
error: File not found: 
/root/rpmbuild/BUILDROOT/slurm-20.02.1-1.el7.x86_64/usr/lib64/slurm/accounting_storage_mysql.so 



RPM build errors:
 File not found: 
/root/rpmbuild/BUILDROOT/slurm-20.02.1-1.el7.x86_64/usr/lib64/slurm/accounting_storage_mysql.so 

 File not found: 
/root/rpmbuild/BUILDROOT/slurm-20.02.1-1.el7.x86_64/usr/lib64/slurm/accounting_storage_mysql.so 



How may I find this file?








Re: [slurm-users] slurm-20.02.1-1 failed rpmbuild with error File not found

2020-04-17 Thread Felix Farcas

Hello

I installed mariadb-server and mariadb-devel, and everything worked fine.

Thank you

Felix

On 4/17/2020 11:38 AM, Christian Anthon wrote:
It would be neat to have these build requirements / install 
requirements built into the spec file.


Cheers, Christian.

On 17/04/2020 10.08, Ole Holm Nielsen wrote:

Hi Felix,

Please make sure to install all prerequisite packages on the Slurm 
build host.  I have summarized this information in my Slurm Wiki page:

https://wiki.fysik.dtu.dk/niflheim/Slurm_installation#build-slurm-rpms

/Ole


On 17-04-2020 09:11, Felix Farcas wrote:
I am trying to build a rpm for a new server and I get the following 
error:


Requires(interp): /bin/sh /bin/sh /bin/sh
Requires(rpmlib): rpmlib(FileDigests) <= 4.6.0-1 
rpmlib(PayloadFilesHavePrefix) <= 4.0-1 rpmlib(CompressedFileNames) 
<= 3.0.4-1

Requires(post): /bin/sh
Requires(preun): /bin/sh
Requires(postun): /bin/sh
Requires: libc.so.6()(64bit) libc.so.6(GLIBC_2.14)(64bit) 
libc.so.6(GLIBC_2.2.5)(64bit) libc.so.6(GLIBC_2.3.2)(64bit) 
libc.so.6(GLIBC_2.3.4)(64bit) libc.so.6(GLIBC_2.3)(64bit) 
libc.so.6(GLIBC_2.4)(64bit) libc.so.6(GLIBC_2.7)(64bit) 
libdl.so.2()(64bit) libpam_misc.so.0()(64bit) 
libpam_misc.so.0(LIBPAM_MISC_1.0)(64bit) libpam.so.0()(64bit) 
libpam.so.0(LIBPAM_1.0)(64bit) libpthread.so.0()(64bit) 
libpthread.so.0(GLIBC_2.2.5)(64bit) 
libpthread.so.0(GLIBC_2.3.2)(64bit) libresolv.so.2()(64bit) 
libslurmfull.so()(64bit) libutil.so.1()(64bit) 
libutil.so.1(GLIBC_2.2.5)(64bit)

Processing files: slurm-slurmdbd-20.02.1-1.el7.x86_64
error: File not found: 
/root/rpmbuild/BUILDROOT/slurm-20.02.1-1.el7.x86_64/usr/lib64/slurm/accounting_storage_mysql.so 



RPM build errors:
 File not found: 
/root/rpmbuild/BUILDROOT/slurm-20.02.1-1.el7.x86_64/usr/lib64/slurm/accounting_storage_mysql.so 

 File not found: 
/root/rpmbuild/BUILDROOT/slurm-20.02.1-1.el7.x86_64/usr/lib64/slurm/accounting_storage_mysql.so 



How may I find this file?




--
Dr. Ing. Farcas Felix
National Institute of Research and Development
of Isotopic and Molecular Technology,
IT - Department - Cluj-Napoca, Romania
yahoo id: felixfarcas
skype id: felix.farcas
mobile: +40-742-195323






Re: [slurm-users] slurm-20.02.1-1 failed rpmbuild with error File not found

2020-04-17 Thread Christian Anthon
It would be neat to have these build requirements / install requirements 
built into the spec file.


Cheers, Christian.

On 17/04/2020 10.08, Ole Holm Nielsen wrote:

Hi Felix,

Please make sure to install all prerequisite packages on the Slurm 
build host.  I have summarized this information in my Slurm Wiki page:

https://wiki.fysik.dtu.dk/niflheim/Slurm_installation#build-slurm-rpms

/Ole


On 17-04-2020 09:11, Felix Farcas wrote:
I am trying to build a rpm for a new server and I get the following 
error:


Requires(interp): /bin/sh /bin/sh /bin/sh
Requires(rpmlib): rpmlib(FileDigests) <= 4.6.0-1 
rpmlib(PayloadFilesHavePrefix) <= 4.0-1 rpmlib(CompressedFileNames) 
<= 3.0.4-1

Requires(post): /bin/sh
Requires(preun): /bin/sh
Requires(postun): /bin/sh
Requires: libc.so.6()(64bit) libc.so.6(GLIBC_2.14)(64bit) 
libc.so.6(GLIBC_2.2.5)(64bit) libc.so.6(GLIBC_2.3.2)(64bit) 
libc.so.6(GLIBC_2.3.4)(64bit) libc.so.6(GLIBC_2.3)(64bit) 
libc.so.6(GLIBC_2.4)(64bit) libc.so.6(GLIBC_2.7)(64bit) 
libdl.so.2()(64bit) libpam_misc.so.0()(64bit) 
libpam_misc.so.0(LIBPAM_MISC_1.0)(64bit) libpam.so.0()(64bit) 
libpam.so.0(LIBPAM_1.0)(64bit) libpthread.so.0()(64bit) 
libpthread.so.0(GLIBC_2.2.5)(64bit) 
libpthread.so.0(GLIBC_2.3.2)(64bit) libresolv.so.2()(64bit) 
libslurmfull.so()(64bit) libutil.so.1()(64bit) 
libutil.so.1(GLIBC_2.2.5)(64bit)

Processing files: slurm-slurmdbd-20.02.1-1.el7.x86_64
error: File not found: 
/root/rpmbuild/BUILDROOT/slurm-20.02.1-1.el7.x86_64/usr/lib64/slurm/accounting_storage_mysql.so 



RPM build errors:
 File not found: 
/root/rpmbuild/BUILDROOT/slurm-20.02.1-1.el7.x86_64/usr/lib64/slurm/accounting_storage_mysql.so 

 File not found: 
/root/rpmbuild/BUILDROOT/slurm-20.02.1-1.el7.x86_64/usr/lib64/slurm/accounting_storage_mysql.so 



How may I find this file?






Re: [slurm-users] slurm-20.02.1-1 failed rpmbuild with error File not found

2020-04-17 Thread Christian Anthon
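Strictly speaking this is a mistake in the RPM spec file, but you just 
need mariadb-devel (or possibly mysql-devel) installed on the build host. 
Without it the accounting_storage_mysql plugin is not built, so the 
packaging step cannot find the .so file.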
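
For longer build logs, a small script can pull out the missing files and attach a package hint. This is a minimal sketch, not part of Slurm or rpmbuild: the names `DEVEL_HINTS`, `missing_files` and `hints` are invented here, and the only hint entry is the one given in this thread.

```python
import re
from typing import Dict, List

# Hypothetical hint table; the single entry reflects the advice in this thread.
DEVEL_HINTS = {
    "accounting_storage_mysql.so": "mariadb-devel (or possibly mysql-devel)",
}


def missing_files(log: str) -> List[str]:
    """Return the unique paths that rpmbuild reported as 'File not found'.

    Long lines are often wrapped, so the path may sit on the line after
    the marker; the whitespace match in the pattern spans that line break.
    """
    paths = re.findall(r"File not found:\s*(\S+)", log)
    return list(dict.fromkeys(paths))  # de-duplicate, keep first-seen order


def hints(log: str) -> Dict[str, str]:
    """Map each missing path to a package hint, or 'unknown'."""
    return {
        path: DEVEL_HINTS.get(path.rsplit("/", 1)[-1], "unknown")
        for path in missing_files(log)
    }
```

Feeding it the saved rpmbuild output, for example `hints(open("build.log").read())` with a hypothetical `build.log`, lists each missing file once together with the package to try.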


Cheers, Christian.

On 17/04/2020 09.11, Felix Farcas wrote:

Hello

I am trying to build an RPM for a new server and I get the following 
error:


Requires(interp): /bin/sh /bin/sh /bin/sh
Requires(rpmlib): rpmlib(FileDigests) <= 4.6.0-1 
rpmlib(PayloadFilesHavePrefix) <= 4.0-1 rpmlib(CompressedFileNames) <= 
3.0.4-1

Requires(post): /bin/sh
Requires(preun): /bin/sh
Requires(postun): /bin/sh
Requires: libc.so.6()(64bit) libc.so.6(GLIBC_2.14)(64bit) 
libc.so.6(GLIBC_2.2.5)(64bit) libc.so.6(GLIBC_2.3.2)(64bit) 
libc.so.6(GLIBC_2.3.4)(64bit) libc.so.6(GLIBC_2.3)(64bit) 
libc.so.6(GLIBC_2.4)(64bit) libc.so.6(GLIBC_2.7)(64bit) 
libdl.so.2()(64bit) libpam_misc.so.0()(64bit) 
libpam_misc.so.0(LIBPAM_MISC_1.0)(64bit) libpam.so.0()(64bit) 
libpam.so.0(LIBPAM_1.0)(64bit) libpthread.so.0()(64bit) 
libpthread.so.0(GLIBC_2.2.5)(64bit) 
libpthread.so.0(GLIBC_2.3.2)(64bit) libresolv.so.2()(64bit) 
libslurmfull.so()(64bit) libutil.so.1()(64bit) 
libutil.so.1(GLIBC_2.2.5)(64bit)

Processing files: slurm-slurmdbd-20.02.1-1.el7.x86_64
error: File not found: 
/root/rpmbuild/BUILDROOT/slurm-20.02.1-1.el7.x86_64/usr/lib64/slurm/accounting_storage_mysql.so


RPM build errors:
    File not found: 
/root/rpmbuild/BUILDROOT/slurm-20.02.1-1.el7.x86_64/usr/lib64/slurm/accounting_storage_mysql.so
    File not found: 
/root/rpmbuild/BUILDROOT/slurm-20.02.1-1.el7.x86_64/usr/lib64/slurm/accounting_storage_mysql.so


How may I find this file?

Thank you

Felix





[slurm-users] slurm-20.02.1-1 failed rpmbuild with error File not found

2020-04-17 Thread Felix Farcas

Hello

I am trying to build an RPM for a new server and I get the following error:

Requires(interp): /bin/sh /bin/sh /bin/sh
Requires(rpmlib): rpmlib(FileDigests) <= 4.6.0-1 
rpmlib(PayloadFilesHavePrefix) <= 4.0-1 rpmlib(CompressedFileNames) <= 
3.0.4-1

Requires(post): /bin/sh
Requires(preun): /bin/sh
Requires(postun): /bin/sh
Requires: libc.so.6()(64bit) libc.so.6(GLIBC_2.14)(64bit) 
libc.so.6(GLIBC_2.2.5)(64bit) libc.so.6(GLIBC_2.3.2)(64bit) 
libc.so.6(GLIBC_2.3.4)(64bit) libc.so.6(GLIBC_2.3)(64bit) 
libc.so.6(GLIBC_2.4)(64bit) libc.so.6(GLIBC_2.7)(64bit) 
libdl.so.2()(64bit) libpam_misc.so.0()(64bit) 
libpam_misc.so.0(LIBPAM_MISC_1.0)(64bit) libpam.so.0()(64bit) 
libpam.so.0(LIBPAM_1.0)(64bit) libpthread.so.0()(64bit) 
libpthread.so.0(GLIBC_2.2.5)(64bit) libpthread.so.0(GLIBC_2.3.2)(64bit) 
libresolv.so.2()(64bit) libslurmfull.so()(64bit) libutil.so.1()(64bit) 
libutil.so.1(GLIBC_2.2.5)(64bit)

Processing files: slurm-slurmdbd-20.02.1-1.el7.x86_64
error: File not found: 
/root/rpmbuild/BUILDROOT/slurm-20.02.1-1.el7.x86_64/usr/lib64/slurm/accounting_storage_mysql.so


RPM build errors:
    File not found: 
/root/rpmbuild/BUILDROOT/slurm-20.02.1-1.el7.x86_64/usr/lib64/slurm/accounting_storage_mysql.so
    File not found: 
/root/rpmbuild/BUILDROOT/slurm-20.02.1-1.el7.x86_64/usr/lib64/slurm/accounting_storage_mysql.so


How may I find this file?

Thank you

Felix

--
Dr. Ing. Farcas Felix
National Institute of Research and Development
of Isotopic and Molecular Technology,
IT - Department - Cluj-Napoca, Romania
yahoo id: felixfarcas
skype id: felix.farcas
mobile: +40-742-195323



