[slurm-users] Trouble installing slurm-19.05.1-2.el7.centos.x86_64

2019-08-08 Thread Lou Nicotra
I am running into an error while trying to install
slurm-19.05.1-2.el7.centos.x86_64. The error is as follows:
root@panther02 x86_64# rpm -Uvh slurm-19.05.1-2.el7.centos.x86_64.rpm
error: Failed dependencies:
        libnvidia-ml.so.1()(64bit) is needed by slurm-19.05.1-2.el7.centos.x86_64

The packages are built with rpmbuild, and the build completes with no errors:
+ cd /root/rpmbuild/BUILD
+ cd slurm-19.05.1-2
+ rm -rf /root/rpmbuild/BUILDROOT/slurm-19.05.1-2.el7.centos.x86_64
+ exit 0

Investigation of the output while building the rpm package shows that
nvidia-ml is found:
checking for nvmlInit in -lnvidia-ml... yes
.
.
libtool: compile:  gcc -DHAVE_CONFIG_H -I. -I../../../..
-I../../../../slurm -I../../../.. -I../../../../src/common
-I/usr/local/cuda/include -I/usr/cuda/include -DNUMA_VERSION1_COMPATIBILITY
-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions
-fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches
-m64 -mtune=generic -pthread -ggdb3 -Wall -g -O1 -fno-strict-aliasing -c
gpu_nvml.c  -fPIC -DPIC -o .libs/gpu_nvml.o
libtool: compile:  gcc -DHAVE_CONFIG_H -I. -I../../../..
-I../../../../slurm -I../../../.. -I../../../../src/common
-I/usr/local/cuda/include -I/usr/cuda/include -DNUMA_VERSION1_COMPATIBILITY
-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions
-fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches
-m64 -mtune=generic -pthread -ggdb3 -Wall -g -O1 -fno-strict-aliasing -c
gpu_nvml.c -o gpu_nvml.o >/dev/null 2>&1
/bin/sh ../../../../libtool  --tag=CC   --mode=link gcc
 -DNUMA_VERSION1_COMPATIBILITY -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2
-fexceptions -fstack-protector-strong --param=ssp-buffer-size=4
-grecord-gcc-switches   -m64 -mtune=generic -pthread -ggdb3 -Wall -g -O1
-fno-strict-aliasing -module -avoid-version --export-dynamic -Wl,-z,relro
-o gpu_nvml.la -rpath /usr/lib64/slurm gpu_nvml.lo -lnvidia-ml
libtool: link: gcc -shared  -fPIC -DPIC  .libs/gpu_nvml.o   -lnvidia-ml  -O2
-g -fstack-protector-strong -grecord-gcc-switches -m64 -mtune=generic
-pthread -ggdb3 -g -O1 -Wl,-z -Wl,relro   -pthread -Wl,-soname
-Wl,gpu_nvml.so -o .libs/gpu_nvml.so

The Makefile in /root/rpmbuild/BUILD/slurm-19.05.1-2/src includes:
NVML_LIBS = -lnvidia-ml
The previous release (slurm-18.08.8) did not include this line, and I was able
to compile and install that release with no issues after building it with
rpmbuild.

My LD_LIBRARY_PATH is
/usr/lib64:/usr/lib:/usr/local/lib64:/usr/local/lib:/var/local/miniconda2/lib/:

Can anyone provide suggestions on working out this issue?

Thanks.
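
A minimal sketch of how one might list the NVIDIA-related requirements recorded in the built package before trying to install it. The package filename comes from the error above; the helper name filter_nvidia_requires is hypothetical, and the default path assumes the command is run from the directory holding the rpm.

```shell
#!/bin/sh
# Keep only the NVIDIA-related lines from a list of RPM requirements on stdin.
# (Hypothetical helper name; prints nothing if there are no such lines.)
filter_nvidia_requires() {
    grep -i 'nvidia' || true
}

# Assumed default; pass the real path to the package as the first argument.
pkg="${1:-slurm-19.05.1-2.el7.centos.x86_64.rpm}"

if command -v rpm >/dev/null 2>&1 && [ -f "$pkg" ]; then
    # -q: query, -p: read a package file, --requires: list its dependencies
    rpm -qp --requires "$pkg" | filter_nvidia_requires
fi
```

If the output contains libnvidia-ml.so.1()(64bit), the requirement was recorded at build time and rpm will insist on it at install time.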

Re: [slurm-users] Trouble installing slurm-19.05.1-2.el7.centos.x86_64

2019-08-11 Thread Barbara Krašovec
What if you try to run ldconfig manually before building the rpm?

Cheers,

Barbara



Re: [slurm-users] Trouble installing slurm-19.05.1-2.el7.centos.x86_64

2019-08-15 Thread Lou Nicotra
I tried running ldconfig manually with slurm-19.05.1-2, as suggested, and it
fails the same way:
error: Failed dependencies:
        libnvidia-ml.so.1()(64bit) is needed by slurm-19.05.1-2.el7.centos.x86_64

ldconfig -p shows:
root@panther02 slurm# ldconfig -p|grep libnvidia-ml.
        libnvidia-ml.so.1 (libc6,x86-64) => /usr/lib64/libnvidia-ml.so.1
        libnvidia-ml.so.1 (libc6) => /lib/libnvidia-ml.so.1
        libnvidia-ml.so (libc6,x86-64) => /usr/lib64/libnvidia-ml.so
        libnvidia-ml.so (libc6) => /lib/libnvidia-ml.so

I just tried the latest release, slurm-19.05.2, and it fails in the same way:
root@panther02 x86_64# rpm -Uvh slurm-19.05.2-1.el7.centos.x86_64.rpm
error: Failed dependencies:
        libnvidia-ml.so.1()(64bit) is needed by slurm-19.05.2-1.el7.centos.x86_64

I reinstalled slurm-18.08.8 and it installs with no issues, just like
slurm-18.08.3 and slurm-18.08.4 did. All were built on the same machine with
the rpmbuild -ta command:
root@panther02 slurm-18.08.8# rpm -Uvh slurm-18.08.8-1.el7.centos.x86_64.rpm
Preparing...                          # [100%]
Updating / installing...
   1:slurm-18.08.8-1.el7.centos       # [100%]

Oh, well...

Lou
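
The ldconfig output above shows that the library is visible to the dynamic linker, but rpm resolves dependencies against the capabilities recorded in its own database, not against the linker cache. A library can therefore be listed by ldconfig -p and still count as missing for rpm. The following is a rough sketch of comparing the two views; the capability string is copied from the error message, and the helper name provider_status is hypothetical.

```shell
#!/bin/sh
# Capability string copied from the "Failed dependencies" error in this thread.
cap='libnvidia-ml.so.1()(64bit)'

# Classify the output of `rpm -q --whatprovides CAP` read from stdin.
# rpm prints "no package provides CAP" when no installed package declares it.
provider_status() {
    text="$(cat)"
    case "$text" in
        ''|'no package provides'*) echo "unprovided" ;;
        *) echo "provided" ;;
    esac
}

if command -v ldconfig >/dev/null 2>&1; then
    # What the dynamic linker can find on disk.
    ldconfig -p | grep 'libnvidia-ml\.so' || echo "not in linker cache"
fi

if command -v rpm >/dev/null 2>&1; then
    # What the RPM database says is provided by an installed package.
    { rpm -q --whatprovides "$cap" 2>&1 || true; } | provider_status
fi
```

If the first check lists the library while the second prints "unprovided", the files are present but no installed package advertises them to rpm.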




Re: [slurm-users] Trouble installing slurm-19.05.1-2.el7.centos.x86_64

2019-08-15 Thread Philip Kovacs
> I have tried running ldconfig manually as suggested with slurm-19.05.1-2 and
> it fails the same way...
> error: Failed dependencies:
>         libnvidia-ml.so.1()(64bit) is needed by slurm-19.05.1-2.el7.centos.x86_64

Lou, that's a packaging mistake on the part of the person who created that el7
centos bundle. What no doubt happened was that they had the nvidia proprietary
libs/headers installed on the machine when they configured slurm. That caused
slurm to see the nvidia drivers and configure for them, so everyone who
installs that package now requires the nvidia library. That's definitely a bug
from a licensing perspective, since nvidia is a closed, proprietary driver.

If someone absolutely forced you to use that slurm bundle, you could install
the nvidia proprietary driver separately, perhaps through a repo like
negativo17's, but that will taint your kernel.

I would inform the packager that they made a mistake.
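
One way to confirm that the build picked up NVML is to look at the NEEDED entries of the plugin that the build log shows being linked with -lnvidia-ml, since rpmbuild's automatic dependency generation derives requirements of the form libfoo.so.N()(64bit) from those entries. This is only a sketch: the default plugin path is the libtool output name from the build log, relative to the plugin's build directory, and the helper name needed_libs is hypothetical.

```shell
#!/bin/sh
# Print the library names from the NEEDED entries of `readelf -d` output on stdin.
needed_libs() {
    sed -n 's/.*(NEEDED).*\[\(.*\)].*/\1/p'
}

# Assumed default, taken from the libtool link line in the build log;
# pass the real path to gpu_nvml.so as the first argument.
plugin="${1:-.libs/gpu_nvml.so}"

if command -v readelf >/dev/null 2>&1 && [ -f "$plugin" ]; then
    # -d prints the dynamic section, which holds the NEEDED entries.
    readelf -d "$plugin" | needed_libs
fi
```

A libnvidia-ml.so.1 line in the output would explain where the package's requirement comes from.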


Re: [slurm-users] Trouble installing slurm-19.05.1-2.el7.centos.x86_64

2019-08-15 Thread Brian Andrus

Lou,

Are you installing on the same machine you built it on?

Were the nvidia libraries installed by RPM or by a 'make install' on the box
you compiled on?


Brian Andrus


Re: [slurm-users] Trouble installing slurm-19.05.1-2.el7.centos.x86_64

2019-08-16 Thread Lou Nicotra
Brian, the package is being built and installed on the master server. I am
testing by removing all instances of V18 and installing the newly created
V19 slurm rpms. I get the error message on the slurm rpm install; all
others (ctl, db, ... ) install fine.

After I get the error message, I remove all the V19 rpms and reinstall V18
using the same procedure with no issues, and the system sees all nodes as
it did before I tried to install V19.

The nvidia libraries are installed via the official Nvidia rpm,
cuda-repo-rhel7-10-1-local-10.1.105-418.39-1.0-1.x86_64.rpm, which
supports CUDA 10. This is a multi-GPU server currently used by multiple users
(DNN training) with no errors of any type while using the nvidia libs/code.

The nvidia-smi command shows:
NVIDIA-SMI 418.39   Driver Version: 418.39   CUDA Version: 10.1

So it is definitely something new in the V19 release. I have installed
18.08.0, .3, .4 and .8 on the same server and nodes since September 2018 using
the same procedures and never had any issues. We are currently running 18.08.8.

Thanks.
Lou


Re: [slurm-users] Trouble installing slurm-19.05.1-2.el7.centos.x86_64

2019-08-16 Thread Brian Andrus
Ah. I suspect your issue may be CUDA 10.1, which does not
create/register all the appropriate symlinks and "provides".

I ran into that trying to install tensorflow.

If you can, downgrade to 10.0, which does a better job of installing itself.

Brian
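
The suspicion about missing "provides" can be checked directly, under the assumption that the library file is owned by an installed RPM at all. The sketch below asks rpm which package owns the file and whether that package declares the capability from the error message. The library path is the one from the ldconfig output earlier in the thread; the helper name declares_capability is hypothetical.

```shell
#!/bin/sh
# Path from the ldconfig output in this thread; override with the first argument.
lib="${1:-/usr/lib64/libnvidia-ml.so.1}"
cap='libnvidia-ml.so.1()(64bit)'

# Succeed if the provides list on stdin contains exactly the capability in $1.
# Trailing whitespace is stripped first, in case rpm pads unversioned entries.
declares_capability() {
    sed 's/[[:space:]]*$//' | grep -Fx -- "$1" >/dev/null
}

if command -v rpm >/dev/null 2>&1 && [ -e "$lib" ]; then
    if owners="$(rpm -qf "$lib" 2>/dev/null)"; then
        # rpm -qf may name more than one owning package, so check each.
        for pkg in $owners; do
            if rpm -q --provides "$pkg" | declares_capability "$cap"; then
                echo "$pkg declares $cap"
            else
                echo "$pkg does NOT declare $cap"
            fi
        done
    else
        echo "$lib is not owned by any installed RPM package"
    fi
fi
```

Either a "does NOT declare" line or the "not owned" message would be consistent with the install failure reported above.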


Re: [slurm-users] Trouble installing slurm-19.05.1-2.el7.centos.x86_64

2019-08-16 Thread Lou Nicotra
Ok, thank you so much for that hint... I will try doing that and report
back.

Thanks!
Lou

On Fri, Aug 16, 2019 at 11:05 AM Brian Andrus  wrote:

> Ah. I suspect your issue may be the cuda. 10.1 which does not
> create/register all the appropriate symlinks and "provides".
> I ran into that trying to install tensorflow.
>
> If you can, downgrade to 10.0, which does a better job of installing
> itself.
>
> Brian
> On 8/16/2019 5:47 AM, Lou Nicotra wrote:
>
> Brian, the package is being built and installed on the master server. I
> am testing by removing all instances of V18 and installing the newly
> created V19 slurm rpms. I get the error message on the slurm rpm install;
> all others (ctl, db, ...) install fine.
>
> After I get the error message, I remove all rpms from V19 and reinstall
> V18 using the same procedure with no issues... And the system sees all
> nodes as it did before trying to install V19.
>
> The NVIDIA libraries are installed via the official NVIDIA rpm,
> cuda-repo-rhel7-10-1-local-10.1.105-418.39-1.0-1.x86_64.rpm,
> supporting cuda10. The multi-GPU server is currently used by multiple users
> (DNN training) with no errors of any type while utilizing the nvidia libs/code.
>
> nvidia-smi command shows:
> NVIDIA-SMI 418.39   Driver Version: 418.39   CUDA Version: 10.1
>
> So, it is definitely something new to the V19 release... I have installed
> 18.08.0, .3, .4 and .8 on the same server and nodes since Sep of 2018 using
> the same procedures and never had any issues... Currently running 18.08.8
>
> Thanks.
> Lou
>
> On Thu, Aug 15, 2019 at 3:07 PM Brian Andrus wrote:
>
>> Lou,
>>
>> Are you installing on the same machine you built?
>>
>> Are the nvidia libraries installed by RPM or a 'make install' on the box
>> you compiled it on?
>>
>> Brian Andrus
>> On 8/15/2019 7:53 AM, Lou Nicotra wrote:
>>
>> I have tried running ldconfig manually as suggested with
>> slurm-19.05.1-2 and it fails the same way...
>> error: Failed dependencies:
>> libnvidia-ml.so.1()(64bit) is needed by
>> slurm-19.05.1-2.el7.centos.x86_64
>>
>> ldconfig -p shows:
>> root@panther02 slurm# ldconfig -p|grep libnvidia-ml.
>> libnvidia-ml.so.1 (libc6,x86-64) => /usr/lib64/libnvidia-ml.so.1
>> libnvidia-ml.so.1 (libc6) => /lib/libnvidia-ml.so.1
>> libnvidia-ml.so (libc6,x86-64) => /usr/lib64/libnvidia-ml.so
>> libnvidia-ml.so (libc6) => /lib/libnvidia-ml.so
>>
>> Just tried the latest release slurm-19.05.2 and it fails in the same
>> way...
>> root@panther02 x86_64# rpm -Uvh slurm-19.05.2-1.el7.centos.x86_64.rpm
>> error: Failed dependencies:
>> libnvidia-ml.so.1()(64bit) is needed by
>> slurm-19.05.2-1.el7.centos.x86_64
>>
>> Reinstalled slurm-18.08.8 and it installs with no issues... Just
>> like slurm-18.08.03 and slurm-18.08.4 did...  All built on the same machine
>> with rpmbuild -ta command...
>> root@panther02 slurm-18.08.8# rpm -Uvh slurm-18.08.8-1.el7.centos.x86_64.rpm
>> Preparing...                          ################################# [100%]
>> Updating / installing...
>>    1:slurm-18.08.8-1.el7.centos       ################################# [100%]
>>
>> Oh, well...
>>
>> Lou
>>
>>
>>
>> On Mon, Aug 12, 2019 at 1:32 AM Barbara Krašovec wrote:
>>
>>> What if you try to run ldconfig manually before building the rpm?
>>>
>>> Cheers,
>>>
>>> Barbara
>>> On 8/8/19 5:57 PM, Lou Nicotra wrote:
>>>
>>> I am running into an error while trying to
>>> install slurm-19.05.1-2.el7.centos.x86_64... Error is as follows:
>>> root@panther02 x86_64# rpm -Uvh slurm-19.05.1-2.el7.centos.x86_64.rpm
>>> error: Failed dependencies:
>>> libnvidia-ml.so.1()(64bit) is needed by
>>> slurm-19.05.1-2.el7.centos.x86_64
>>>
>>> Packages are built using rpmbuild... And complete with no errors...
>>> + cd /root/rpmbuild/BUILD
>>> + cd slurm-19.05.1-2
>>> + rm -rf /root/rpmbuild/BUILDROOT/slurm-19.05.1-2.el7.centos.x86_64
>>> + exit 0
>>>
>>> Investigation of the output while building the rpm package shows that
>>> nvidia-ml is found:
>>> checking for nvmlInit in -lnvidia-ml... yes
>>> .
>>> .
>>> libtool: compile:  gcc -DHAVE_CONFIG_H -I. -I../../../..
>>> -I../../../../slurm -I../../../.. -I../../../../src/common
>>> -I/usr/local/cuda/include -I/usr/cuda/include -DNUMA_VERSION1_COMPATIBILITY
>>> -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions
>>> -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches
>>> -m64 -mtune=generic -pthread -ggdb3 -Wall -g -O1 -fno-strict-aliasing -c
>>> gpu_nvml.c  -fPIC -DPIC -o .libs/gpu_nvml.o
>>> libtool: compile:  gcc -DHAVE_CONFIG_H -I. -I../../../..
>>> -I../../../../slurm -I../../../.. -I../../../../src/common
>>> -I/usr/local/cuda/include -I/usr/cuda/include -DNUMA_VERSION1_COMPATIBILITY
>>> -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions
>>> -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches
>>> -m64 -mtune=generic -pthread -ggdb3 -Wall -g -O1 -fno-strict-aliasing -c
>>