Re: [CentOS] Update to Centos 7.7 / Arch ppc64le / Problem with nvidia driver

2019-09-27 Thread Ralf Aumüller

Hello Fabian,

On 26.09.19 09:46, Fabian Arrotin wrote:

> On 25/09/2019 10:30, Ralf Aumüller wrote:
>> ...
>> today I updated a CentOS 7.6 ppc64le machine to CentOS 7.7. After reboot
>> to the new kernel (4.18.0-80.7.2.el7.ppc64le) dkms could not build the
>> nvidia-module.
>> ...
>> Any comments?

Thanks for your quick response.


> Well, if you use that kernel, that means you're on the Power9 variant, and
> that architecture doesn't exist anymore upstream (so no RHEL 7.7 for
> Power9).
> As almost all packages are just ppc64le (which still exists upstream),
> the decision was to still provide 7.7.1908 for Power9 users, but using
> the kernel from CentOS 8, rebuilt for CentOS 7 (the same is also true for
> aarch64).
>
> For that kernel to be built, we had to use a newer gcc, which you can
> find/use through devtoolset-8:
> http://mirror.centos.org/altarch/7/sclo/ppc64le/rh/devtoolset-8/


OK. So I will try to install devtoolset-8 and build the nvidia driver with that gcc.
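
A rough sketch of that plan, assuming the devtoolset-8 repository linked above
is already configured for yum and that the NVIDIA module is registered with
dkms. The function name rebuild_with_devtoolset and the DRY_RUN switch are made
up for illustration, and whether the NVIDIA build then really picks up the
newer gcc is not confirmed anywhere in this thread.

```shell
#!/bin/sh
# Sketch only: rebuild DKMS modules for one kernel with gcc from devtoolset-8.
# Assumption: the devtoolset-8 repo is configured and dkms knows the module.

# Print the command instead of running it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# $1: kernel version to build for (defaults to the running kernel).
rebuild_with_devtoolset() {
    kernel=${1:-$(uname -r)}
    # Install the newer compiler from the software collection.
    run yum install -y devtoolset-8-gcc &&
    # Run dkms inside the collection so that its gcc comes first in PATH.
    run scl enable devtoolset-8 "dkms autoinstall -k $kernel"
}

# Usage (as root), first as a dry run:
#   DRY_RUN=1 rebuild_with_devtoolset 4.18.0-80.7.2.el7.ppc64le
#   rebuild_with_devtoolset 4.18.0-80.7.2.el7.ppc64le
```

With DRY_RUN=1 the function only prints the two commands, which makes it easy
to review them before touching the system.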


> Curious: which kind of machine do you have that has both a Power9 and
> nvidia? That seems to *not* be an IBM node, but a kind of OpenPOWER
> workstation?


It's an IBM Power System AC922 (8335-GTH) with Nvidia Tesla V100 graphics cards.
The supercomputer "Summit" uses these nodes (https://www.olcf.ornl.gov/summit/).


> PS2: worth creating a bug report on https://bugs.centos.org for easier
> tracking and also indexing, so that other people in your situation can
> follow the bug report (indexed by crawlers) and eventually discussion can
> happen there.


I will do that.

Best regards,
Ralf

___
CentOS mailing list
CentOS@centos.org
https://lists.centos.org/mailman/listinfo/centos


[CentOS] Update to Centos 7.7 / Arch ppc64le / Problem with nvidia driver

2019-09-25 Thread Ralf Aumüller

Hello,

Today I updated a CentOS 7.6 ppc64le machine to CentOS 7.7. After rebooting into
the new kernel (4.18.0-80.7.2.el7.ppc64le), dkms could not build the nvidia module.


Error message from dkms:

Compiler version check failed:

The major and minor number of the compiler used to
compile the kernel:

gcc version 8.3.1 20190311 (Red Hat 8.3.1-3) (GCC)

does not match the compiler used here:

cc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39)


Output of /proc/version with the new kernel running:
Linux version 4.18.0-80.7.2.el7.ppc64le (mockbu...@ppc64le-01.bsys.centos.org) 
(gcc version 8.3.1 20190311 (Red Hat 8.3.1-3) (GCC)) #1 SMP Thu Sep 12 15:45:05 
UTC 2019


The problem seems to be:

The kernel was compiled with gcc 8.3.1, but the installed gcc is 4.8.5. All
previous kernels were compiled with gcc 4.8.5. See:


# cat /usr/src/kernels/*/include/generated/compile.h | grep LINUX_COMPILER
#define LINUX_COMPILER "gcc version 4.8.5 20150623 (Red Hat 4.8.5-36) (GCC)"
#define LINUX_COMPILER "gcc version 4.8.5 20150623 (Red Hat 4.8.5-36) (GCC)"
#define LINUX_COMPILER "gcc version 4.8.5 20150623 (Red Hat 4.8.5-36) (GCC)"
#define LINUX_COMPILER "gcc version 4.8.5 20150623 (Red Hat 4.8.5-36) (GCC)"
#define LINUX_COMPILER "gcc version 4.8.5 20150623 (Red Hat 4.8.5-36) (GCC)"
#define LINUX_COMPILER "gcc version 8.3.1 20190311 (Red Hat 8.3.1-3) (GCC)"
#define LINUX_COMPILER "gcc version 8.3.1 20190311 (Red Hat 8.3.1-3) (GCC)"
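
The same mismatch can be checked by hand before a build is attempted. This is a
minimal sketch, assuming the two version strings have the shapes shown above.
The function names are invented here, and this is not the script that printed
the error message.

```shell
#!/bin/sh
# Print "major.minor" of the gcc named in a /proc/version style string.
kernel_gcc_version() {
    printf '%s\n' "$1" | sed -n 's/.*gcc version \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Print "major.minor" from the first line of "cc --version" output,
# e.g. "cc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39)".
cc_version() {
    printf '%s\n' "$1" | sed -n '1s/^[^ ]* ([^)]*) \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Succeed only if both strings parse and agree on major.minor.
same_compiler() {
    k=$(kernel_gcc_version "$1")
    c=$(cc_version "$2")
    [ -n "$k" ] && [ "$k" = "$c" ]
}

# Usage on a live system:
#   same_compiler "$(cat /proc/version)" "$(cc --version)" || echo "compiler mismatch"
```

Only major and minor are compared, matching the wording of the error message;
the patch level and the Red Hat release suffix are ignored.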

Any comments?

Best regards,
Ralf


Re: [CentOS] Update from 6.6 to 6.7 > automount logs error message

2015-09-04 Thread Ralf Aumüller
Hello,

>>> after an update from 6.6 to 6.7 the following error message is logged to
>>> /var/log/messages when I login (per ssh):
>>>
>>> Aug 11 16:31:21 a1234 automount[1598]: set_tsd_user_vars: failed to get
>>> passwd info from getpwuid_r

Did some more tests:

I compiled autofs with logging of the UID/GID in the autofs function
"set_tsd_user_vars". Just before the error is logged, autofs tries to get
password info for e.g. UID 409651584 and GID 4294936577 (which don't exist).
Then the error message is logged.
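
Whether such an ID is known to the configured user and group databases (local
files or NIS) can be checked from a shell. This is a small sketch, assuming
getent is available; the helper names are made up here.

```shell
#!/bin/sh
# Succeed if the numeric UID resolves through the configured passwd sources.
uid_known() {
    getent passwd "$1" >/dev/null
}

# Succeed if the numeric GID resolves through the configured group sources.
gid_known() {
    getent group "$1" >/dev/null
}

# Usage with the values logged above:
#   uid_known 409651584  || echo "no passwd entry for UID 409651584"
#   gid_known 4294936577 || echo "no group entry for GID 4294936577"
```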

A fully updated 6.7 system running latest 6.6 kernel
(2.6.32-504.30.3.el6.x86_64) won't print the error message. I checked the
changelog of kernel 2.6.32-573.3.1.el6.x86_64 and found some autofs patches
since version 504.30.3.

But I can't test any further because the kernel SRPM no longer includes the
individual patches.

Maybe someone with deeper kernel knowledge has an idea?

Best regards,
Ralf









[CentOS] Update from 6.6 to 6.7 > automount logs error message

2015-08-11 Thread Ralf Aumüller
Hello,

After an update from 6.6 to 6.7, the following error message is logged to
/var/log/messages when I log in (via ssh):

Aug 11 16:31:21 a1234 automount[1598]: set_tsd_user_vars: failed to get passwd
info from getpwuid_r

Checked all log-files of my systems running 6.6 with same configuration -- never
got such a message (We use NFS/autofs for home-directories, NIS and tcsh (login
shell)).

Everything seems to work -- but before I update all machines to 6.7 I want to
know what's going on.

Any comments?

Best regards,
Ralf


Re: [CentOS] md5sum mismatch between CentOS 6.4 and 6.5 repository

2013-12-09 Thread Ralf Aumüller
Hello,

On 12/09/2013 03:16 PM, Johnny Hughes wrote:
> Yes, it is signed by the same key ... but the rpm is not identical. (The
> difference being a different md5sum because rpm metadata (signature
> time) is different).
> 
> He does not say HOW the installation fails ... one would have to look at
> the kickstart file to see.

Anaconda crashes while installing packages. Checking the log files from the
traceback, I found a message about a wrong md5sum for the package python-slip-dbus.

> If the ks install is looking at a real 6.5 and pointing at it, it should
> work fine.  If he is pointing at something else, maybe not.  I would
> personally just delete the files in question and rsync again to make
> sure they are replaced.
> 
> Rsync, if it sees the same file and the same date, will not validate the
> crc without a -c switch ... that would take forever for a whole tree, so
> I would delete the files in question and sync again from a 6.5 tree.

That was the way I fixed the kickstart installation.

Thank you very much for your explanation of the md5sum difference.

Best regards,
Ralf


[CentOS] md5sum mismatch between CentOS 6.4 and 6.5 repository

2013-12-06 Thread Ralf Aumüller
Hello,

When I download python-slip-dbus-0.2.20-1.el6_2.noarch.rpm from the CentOS 6.5
repository, its md5sum differs from that of the same file downloaded from 6.4.

wget http://msync.centos.org/centos-6/6.5/os/x86_64/Packages/python-slip-dbus-0.2.20-1.el6_2.noarch.rpm \
 -O python-slip-dbus-0.2.20-1.el6_2.noarch.rpm.65
wget http://msync.centos.org/centos-6/6.4/os/x86_64/Packages/python-slip-dbus-0.2.20-1.el6_2.noarch.rpm \
 -O python-slip-dbus-0.2.20-1.el6_2.noarch.rpm.64

ls -l
. 30844 Mar 26  2012 python-slip-dbus-0.2.20-1.el6_2.noarch.rpm.64
. 30844 Mar 26  2012 python-slip-dbus-0.2.20-1.el6_2.noarch.rpm.65

md5sum python-slip-dbus-0.2.20-1.el6_2.noarch.rpm.*
20bb02e6f3b7b71e09dcaff7f3b0ca02  python-slip-dbus-0.2.20-1.el6_2.noarch.rpm.64
d37fe4404a7a5fdb27b29f9b5ed09c73  python-slip-dbus-0.2.20-1.el6_2.noarch.rpm.65

Any comments?

Background:
We have a local CentOS mirror and after updating to 6.5 the kickstart
installation fails because of the wrong md5sum of python-slip-dbus. We mirror
with rsync (no -c) and so we had the version from 6.4 in our 6.5 repository.

(The same seems to be true for python-paste-script-1.7.3-5.el6_3.noarch.rpm and
slf4j-javadoc-1.5.8-8.el6.noarch.rpm)
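
To find such stale copies in a mirror without checksumming the whole tree, one
can look for file pairs that agree in size and modification time but not in
content. This is a minimal sketch, assuming GNU stat and md5sum. The function
names are invented for illustration, and it only imitates the size and time
comparison that rsync applies by default when -c is not given.

```shell
#!/bin/sh
# Size in bytes and modification time (seconds since the epoch) of a file.
size_mtime() {
    stat -c '%s %Y' "$1"
}

# MD5 of the file content only (first field of the md5sum output).
content_md5() {
    md5sum "$1" | cut -d ' ' -f 1
}

# Succeed if the two files look identical to a size+mtime comparison
# but their contents differ, i.e. a copy that a plain rsync would skip.
stale_copy() {
    [ "$(size_mtime "$1")" = "$(size_mtime "$2")" ] &&
    [ "$(content_md5 "$1")" != "$(content_md5 "$2")" ]
}

# Usage with the two downloads above:
#   stale_copy python-slip-dbus-0.2.20-1.el6_2.noarch.rpm.64 \
#              python-slip-dbus-0.2.20-1.el6_2.noarch.rpm.65 && echo "stale"
```

For a handful of known files, running rsync with -c (--checksum) on just those
paths is another way to force a content comparison.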

Best regards,
Ralf