[Kernel-packages] [Bug 1960256] Re: compilation errors due to "peermem" module

2022-02-07 Thread D M
** Summary changed:

- compilation due to "peermem" module
+ compilation errors due to "peermem" module

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to nvidia-graphics-drivers-470 in Ubuntu.
https://bugs.launchpad.net/bugs/1960256

Title:
  compilation errors due to "peermem" module

Status in nvidia-graphics-drivers-470 package in Ubuntu:
  New

Bug description:
  Installed with the following:

  apt-get install --no-install-recommends nvidia-driver-470 nvidia-
  modprobe libnvidia-cfg1-470 libnvidia-common-470 libnvidia-compute-470
  libnvidia-decode-470 libnvidia-encode-470 libnvidia-extra-470
  libnvidia-fbc1-470 libnvidia-gl-470 libnvidia-ifr1-470 nvidia-compute-
  utils-470 nvidia-dkms-470 nvidia-driver-470 nvidia-kernel-common-470
  nvidia-kernel-source-470 nvidia-utils-470 xserver-xorg-video-
  nvidia-470

  Seems to generate /var/crash/nvidia-dkms-470.0.crash (tail -20):

 /usr/bin/ld.bfd -m elf_x86_64  -z max-page-size=0x20-r -o 
/var/lib/dkms/nvidia/470.103.01/build/nvidia-drm.o 
/var/lib/dkms/nvidia/470.103.01/build/nvidia-drm/nvidia-drm.o 
/var/lib/dkms/nvidia/470.103.01/build/nvidia-drm/nvidia-drm-drv.o […]   { echo  
/var/lib/dkms/nvidia/470.103.01/build/nvidia-drm/nvidia-drm.o  […] 
/var/lib/dkms/nvidia/470.103.01/build/nvidia-drm/nvidia-drm-format.o;  echo; } 
> /var/lib/dkms/nvidia/470.103.01/build/nvidia-drm.mod
   make -f ./scripts/Makefile.modpost
 sed 's/ko$/o/' /var/lib/dkms/nvidia/470.103.01/build/modules.order | 
scripts/mod/modpost -m -a -i ./Module.symvers -I 
/var/lib/dkms/nvidia/470.103.01/build/Module.symvers -e 
/usr/src/ofa_kernel/default/Module.symvers -o 
/var/lib/dkms/nvidia/470.103.01/build/Module.symvers -s -T - 
   FATAL: parse error in symbol dump file
   scripts/Makefile.modpost:93: recipe for target '__modpost' failed
   make[2]: *** [__modpost] Error 1
   Makefile:1675: recipe for target 'modules' failed
   make[1]: *** [modules] Error 2
   make[1]: Leaving directory '/usr/src/linux-headers-5.4.0-97-generic'
   Makefile:80: recipe for target 'modules' failed
   make: *** [modules] Error 2
  DKMSKernelVersion: 5.4.0-97-generic
  Date: Mon Feb  7 11:32:36 2022
  Package: nvidia-dkms-470 470.103.01-0ubuntu0.18.04.1
  PackageVersion: 470.103.01-0ubuntu0.18.04.1
  SourcePackage: nvidia-graphics-drivers-470
  Title: nvidia-dkms-470 470.103.01-0ubuntu0.18.04.1: nvidia kernel module 
failed to build

  
  The nvidia module shows as added:

  # dkms status -k `uname -r`
  iser, 4.7: added
  kernel-mft-dkms, 4.13.0, 5.4.0-97-generic, x86_64: installed
  knem, 1.1.3.90mlnx1: added
  mlnx-ofed-kernel, 4.7: added
  nvidia, 470.103.01: added
  rshim, 1.8, 5.4.0-97-generic, x86_64: installed
  srp, 4.7: added

  However, doing a build gets:

   dkms build nvidia/470.103.01

  Kernel preparation unnecessary for this kernel.  Skipping...
  applying patch 
disable_fstack-clash-protection_fcf-protection.patch...patching file Kbuild
  Hunk #1 succeeded at 82 (offset 11 lines).

  
  Building module:
  cleaning build area...
  unset ARCH; [ ! -h /usr/bin/cc ] && export CC=/usr/bin/gcc; env NV_VERBOSE=1 
'make' -j16 NV_EXCLUDE_BUILD_MODULES='' KERNEL_UNAME=5.4.0-97-generic 
IGNORE_XEN_PRESENCE=1 IGNORE_CC_MISMATCH=1 
SYSSRC=/lib/modules/5.4.0-97-generic/build LD=/usr/bin/ld.bfd 
modules.(bad exit status: 2)
  ERROR: Cannot create report: [Errno 17] File exists: 
'/var/crash/nvidia-dkms-470.0.crash'
  Error! Bad return status for module build on kernel: 5.4.0-97-generic (x86_64)
  Consult /var/lib/dkms/nvidia/470.103.01/build/make.log for more information.

  Which I suspect is related to the "peermem" module:

{ echo  /var/lib/dkms/nvidia/470.103.01/build/nvidia-drm/nvidia-drm.o 
/var/lib/dkms/nvidia/470.103.01/build/nvidia-drm/nvidia-drm-drv.o [...] 
/var/lib/dkms/nvidia/470.103.01/build/nvidia-drm/nvidia-drm-format.o;  echo; } 
> /var/lib/dkms/nvidia/470.103.01/build/nvidia-drm.mod
/usr/bin/ld.bfd -m elf_x86_64  -z max-page-size=0x20-r -o 
/var/lib/dkms/nvidia/470.103.01/build/nvidia-peermem.o 
/var/lib/dkms/nvidia/470.103.01/build/nvidia-peermem/nvidia-peermem.o
{ echo  
/var/lib/dkms/nvidia/470.103.01/build/nvidia-peermem/nvidia-peermem.o;  echo; } 
> /var/lib/dkms/nvidia/470.103.01/build/nvidia-peermem.mod
  make -f ./scripts/Makefile.modpost
sed 's/ko$/o/' /var/lib/dkms/nvidia/470.103.01/build/modules.order | 
scripts/mod/modpost -m -a -i ./Module.symvers -I 
/var/lib/dkms/nvidia/470.103.01/build/Module.symvers -e 
/usr/src/ofa_kernel/default/Module.symvers -o 
/var/lib/dkms/nvidia/470.103.01/build/Module.symvers -s -T - 
  FATAL: parse error in symbol dump file
  scripts/Makefile.modpost:93: recipe for target '__modpost' failed
  make[2]: *** [__modpost] Error 1
  Makefile:1675: recipe for target 'modules' failed
  make[1]: *** [modules] Error 2
  make[1]: Leaving directory '/usr/src/linux-headers-5.4.0-97-generic'
  Makefile:80: recipe for target 'modules' failed
  make: *** [modules] 
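
The "FATAL: parse error in symbol dump file" message above comes from modpost while it reads the Module.symvers files named on its command line. One way to narrow down which file it rejects is to count the tab-separated fields per line in each candidate. This is only a sketch: the function name symvers_field_counts is hypothetical, the two paths are taken from the log above, and which field count is "right" depends on the kernel version, so the output is a hint rather than a verdict.

```shell
#!/bin/sh
# Print "<field count> <number of lines>" for a Module.symvers file.
# A file whose lines do not all share one field count, or whose count
# differs from the kernel's own Module.symvers, is worth a closer look.
symvers_field_counts() {
    # $1: path to a Module.symvers file (tab-separated)
    awk -F'\t' '{ counts[NF]++ } END { for (n in counts) print n, counts[n] }' "$1" | sort -n
}

# Paths as they appear in the modpost invocation and make output above.
for f in /usr/src/linux-headers-5.4.0-97-generic/Module.symvers \
         /usr/src/ofa_kernel/default/Module.symvers; do
    if [ -r "$f" ]; then
        echo "== $f"
        symvers_field_counts "$f"
    fi
done
```

If the two files report different field counts, that would be consistent with the file under /usr/src/ofa_kernel having been generated in a different symbol dump layout than the one this kernel's modpost expects, though it does not prove it.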

[Kernel-packages] [Bug 1960256] [NEW] compilation due to "peermem" module

2022-02-07 Thread D M
Public bug reported:

Installed with the following:

apt-get install --no-install-recommends nvidia-driver-470 nvidia-
modprobe libnvidia-cfg1-470 libnvidia-common-470 libnvidia-compute-470
libnvidia-decode-470 libnvidia-encode-470 libnvidia-extra-470 libnvidia-
fbc1-470 libnvidia-gl-470 libnvidia-ifr1-470 nvidia-compute-utils-470
nvidia-dkms-470 nvidia-driver-470 nvidia-kernel-common-470 nvidia-
kernel-source-470 nvidia-utils-470 xserver-xorg-video-nvidia-470

Seems to generate /var/crash/nvidia-dkms-470.0.crash (tail -20):

   /usr/bin/ld.bfd -m elf_x86_64  -z max-page-size=0x20-r -o 
/var/lib/dkms/nvidia/470.103.01/build/nvidia-drm.o 
/var/lib/dkms/nvidia/470.103.01/build/nvidia-drm/nvidia-drm.o 
/var/lib/dkms/nvidia/470.103.01/build/nvidia-drm/nvidia-drm-drv.o […]   { echo  
/var/lib/dkms/nvidia/470.103.01/build/nvidia-drm/nvidia-drm.o  […] 
/var/lib/dkms/nvidia/470.103.01/build/nvidia-drm/nvidia-drm-format.o;  echo; } 
> /var/lib/dkms/nvidia/470.103.01/build/nvidia-drm.mod
 make -f ./scripts/Makefile.modpost
   sed 's/ko$/o/' /var/lib/dkms/nvidia/470.103.01/build/modules.order | 
scripts/mod/modpost -m -a -i ./Module.symvers -I 
/var/lib/dkms/nvidia/470.103.01/build/Module.symvers -e 
/usr/src/ofa_kernel/default/Module.symvers -o 
/var/lib/dkms/nvidia/470.103.01/build/Module.symvers -s -T - 
 FATAL: parse error in symbol dump file
 scripts/Makefile.modpost:93: recipe for target '__modpost' failed
 make[2]: *** [__modpost] Error 1
 Makefile:1675: recipe for target 'modules' failed
 make[1]: *** [modules] Error 2
 make[1]: Leaving directory '/usr/src/linux-headers-5.4.0-97-generic'
 Makefile:80: recipe for target 'modules' failed
 make: *** [modules] Error 2
DKMSKernelVersion: 5.4.0-97-generic
Date: Mon Feb  7 11:32:36 2022
Package: nvidia-dkms-470 470.103.01-0ubuntu0.18.04.1
PackageVersion: 470.103.01-0ubuntu0.18.04.1
SourcePackage: nvidia-graphics-drivers-470
Title: nvidia-dkms-470 470.103.01-0ubuntu0.18.04.1: nvidia kernel module failed 
to build


The nvidia module shows as added:

# dkms status -k `uname -r`
iser, 4.7: added
kernel-mft-dkms, 4.13.0, 5.4.0-97-generic, x86_64: installed
knem, 1.1.3.90mlnx1: added
mlnx-ofed-kernel, 4.7: added
nvidia, 470.103.01: added
rshim, 1.8, 5.4.0-97-generic, x86_64: installed
srp, 4.7: added

However, doing a build gets:

 dkms build nvidia/470.103.01

Kernel preparation unnecessary for this kernel.  Skipping...
applying patch disable_fstack-clash-protection_fcf-protection.patch...patching 
file Kbuild
Hunk #1 succeeded at 82 (offset 11 lines).


Building module:
cleaning build area...
unset ARCH; [ ! -h /usr/bin/cc ] && export CC=/usr/bin/gcc; env NV_VERBOSE=1 
'make' -j16 NV_EXCLUDE_BUILD_MODULES='' KERNEL_UNAME=5.4.0-97-generic 
IGNORE_XEN_PRESENCE=1 IGNORE_CC_MISMATCH=1 
SYSSRC=/lib/modules/5.4.0-97-generic/build LD=/usr/bin/ld.bfd 
modules.(bad exit status: 2)
ERROR: Cannot create report: [Errno 17] File exists: 
'/var/crash/nvidia-dkms-470.0.crash'
Error! Bad return status for module build on kernel: 5.4.0-97-generic (x86_64)
Consult /var/lib/dkms/nvidia/470.103.01/build/make.log for more information.

Which I suspect is related to the "peermem" module:

  { echo  /var/lib/dkms/nvidia/470.103.01/build/nvidia-drm/nvidia-drm.o 
/var/lib/dkms/nvidia/470.103.01/build/nvidia-drm/nvidia-drm-drv.o [...] 
/var/lib/dkms/nvidia/470.103.01/build/nvidia-drm/nvidia-drm-format.o;  echo; } 
> /var/lib/dkms/nvidia/470.103.01/build/nvidia-drm.mod
  /usr/bin/ld.bfd -m elf_x86_64  -z max-page-size=0x20-r -o 
/var/lib/dkms/nvidia/470.103.01/build/nvidia-peermem.o 
/var/lib/dkms/nvidia/470.103.01/build/nvidia-peermem/nvidia-peermem.o
  { echo  
/var/lib/dkms/nvidia/470.103.01/build/nvidia-peermem/nvidia-peermem.o;  echo; } 
> /var/lib/dkms/nvidia/470.103.01/build/nvidia-peermem.mod
make -f ./scripts/Makefile.modpost
  sed 's/ko$/o/' /var/lib/dkms/nvidia/470.103.01/build/modules.order | 
scripts/mod/modpost -m -a -i ./Module.symvers -I 
/var/lib/dkms/nvidia/470.103.01/build/Module.symvers -e 
/usr/src/ofa_kernel/default/Module.symvers -o 
/var/lib/dkms/nvidia/470.103.01/build/Module.symvers -s -T - 
FATAL: parse error in symbol dump file
scripts/Makefile.modpost:93: recipe for target '__modpost' failed
make[2]: *** [__modpost] Error 1
Makefile:1675: recipe for target 'modules' failed
make[1]: *** [modules] Error 2
make[1]: Leaving directory '/usr/src/linux-headers-5.4.0-97-generic'
Makefile:80: recipe for target 'modules' failed
make: *** [modules] Error 2

The nvidia-peermem module does not seem to be supported on (unpatched?) 5.4
kernels, according to this thread I found:

https://forums.linuxmint.com/viewtopic.php?p=2106512#p2106512

Downloading and running "NVIDIA-Linux-x86_64-470.103.01.run" directly
also fails, UNLESS the following option is used:

  --no-peermem
  Do not install the nvidia-peermem kernel module. This kernel module 
provides support for peer-to-peer memory sharing with Mellanox HCAs (Host 
Channel Adapters) via GPUDirect RDMA (Remote Direct Memory Access).

BUT