On 22/06/2024 10.46, Adam D. Barratt wrote:
On Sat, 2024-06-22 at 01:10 +0200, Andreas Beckmann wrote:
On 21/06/2024 19.05, Adam D. Barratt wrote:
On Tue, 2024-06-11 at 02:02 +0200, Andreas Beckmann wrote:
A new upstream release of the nvidia drivers in non-free is needed for fixing a few new CVEs.

The ppc64el build failed:

FATAL: modpost: GPL-incompatible module nvidia.ko uses GPL-only symbol 'rcu_read_unlock_strict'
make[5]: *** [/usr/src/linux-headers-5.10.0-30-common/scripts/Makefile.modpost:123: /<<PKGBUILDDIR>>/kernel-source-tree/Module.symvers] Error 1

OK, I can reproduce that with linux-headers-5.10.0-30-powerpc64le but
not with linux-headers-5.10.0-28-powerpc64le (nor with
linux-headers-6.8.12-powerpc64le)

"Yay".

This happened:

There are two commits in 6.8 that modify (the arch independent) pfn_valid() in include/linux/mmzone.h to fix race conditions:

5ec8e8ea8b7783fab150cf86404fc38cb4db8800 (v6.8-rc1)
introduces usage of rcu_read_lock()/rcu_read_unlock()
(which are (transitively) GPL-only symbols)

f6564fce256a3944aa1bc76cb3c40e792d97c1eb (v6.8-rc3)
switches that to rcu_read_lock_sched()/rcu_read_unlock_sched()
(which are not)

Both commits got backported to Linux 6.1 (in bookworm) in v6.1.76/v6.1.77 but so far only the first got backported to Linux 5.10 (in bullseye) in v5.10.210. I just filed #1074170 for the potentially missing backport in the bullseye-pu kernel.
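
As a rough illustration of why modpost rejects the build: the check amounts to a non-GPL module referencing a symbol that is exported GPL-only. The C sketch below is a simplified model, not modpost's actual code. The names struct export_entry, exports and first_gpl_violation() are hypothetical, and the two table entries are taken from the error message above and from the EXPORT_SYMBOL(__virt_addr_valid) case described in this mail.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for one exported kernel symbol. */
struct export_entry {
    const char *name;
    bool gpl_only;   /* true if only GPL-compatible modules may use it */
};

/* Illustrative table; the flags follow the build log and this mail. */
static const struct export_entry exports[] = {
    { "rcu_read_unlock_strict", true  },
    { "__virt_addr_valid",      false },
};

/*
 * Return the first GPL-only symbol referenced by a module that is not
 * GPL-compatible, or NULL if there is no violation.
 */
static const char *first_gpl_violation(const char *const *undefined,
                                       size_t count,
                                       bool module_is_gpl_compatible)
{
    size_t i, j;

    if (module_is_gpl_compatible)
        return NULL;

    for (i = 0; i < count; i++) {
        for (j = 0; j < sizeof(exports) / sizeof(exports[0]); j++) {
            if (strcmp(undefined[i], exports[j].name) == 0 &&
                exports[j].gpl_only)
                return undefined[i];
        }
    }
    return NULL;
}
```

In this model, an inline helper such as pfn_valid() that expands to a call of rcu_read_unlock_strict adds that name to the module's undefined symbols, which is how the dependency becomes "transitive" for nvidia.ko.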

While the nvidia driver stopped using pfn_valid() in 470.239.06, it still uses the (arch specific) virt_addr_valid() macro.

On ppc64el (arch/powerpc/include/asm/page.h) this macro calls the arch independent pfn_valid() (which is transitively GPL-only).

On amd64 (arch/x86/include/asm/page.h) this macro uses EXPORT_SYMBOL(__virt_addr_valid) from arch/x86/mm/physaddr.c

On arm64 (arch/arm64/include/asm/memory.h) this macro calls the arch specific pfn_valid() (due to CONFIG_HAVE_ARCH_PFN_VALID=y).

I'm adding a patch that, on ppc64el only and for Linux >= 5.10.210 && Linux < 5.11, introduces nv_pfn_valid(), which is the pfn_valid() from 5.10.210 plus the changes from f6564fce256a3944aa1bc76cb3c40e792d97c1eb, as well as nv_virt_addr_valid(), which uses it.
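
The version window described above can be sketched as a small, stand-alone C predicate. This is only an illustration of the condition as stated in the prose, assuming the usual KERNEL_VERSION() encoding. The function needs_nv_pfn_valid() and its parameters are hypothetical and do not appear in the driver, which uses a preprocessor guard instead.

```c
#include <stdbool.h>

/*
 * Same encoding as the kernel's KERNEL_VERSION() macro; recent kernels
 * clamp the sublevel to 255, which is mirrored here.
 */
#define KERNEL_VERSION(a, b, c) \
    (((a) << 16) + ((b) << 8) + ((c) > 255 ? 255 : (c)))

/*
 * Hypothetical helper: true if the replacement pfn_valid() would be
 * needed, i.e. on ppc64el, without an arch-specific pfn_valid(), for
 * 5.10.210 <= version < 5.11.
 */
static bool needs_nv_pfn_valid(int version_code, bool is_ppc64le,
                               bool have_arch_pfn_valid)
{
    if (!is_ppc64le || have_arch_pfn_valid)
        return false;

    return version_code >= KERNEL_VERSION(5, 10, 210) &&
           version_code <  KERNEL_VERSION(5, 11, 0);
}
```

For example, 5.10.218 on ppc64el falls inside the window, while 5.10.209 and any 6.1.x kernel fall outside it and keep using the kernel's own virt_addr_valid().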

It has only been lightly tested:
- building a module for linux-headers-5.10.0-30-powerpc64le now succeeds
- building a module for linux-headers-5.10.0-28-powerpc64le (5.10.209) still succeeds
- building a module for linux-headers-5.10.0-30-amd64 still succeeds (patch is theoretically a no-op on amd64)
- untested on arm64, but patch is theoretically a no-op on arm64

I'm not routing this patch through sid and bookworm for now, therefore the versions of the bullseye uploads (just done) are
- nvidia-graphics-drivers 470.256.02-2
- nvidia-graphics-drivers-tesla-470 470.256.02-1~deb11u2
Do you need separate opu requests for these?

The GPL-only symbol usage bug is also reproducible on ppc64el when trying to build a module for
linux-headers-5.10.0-30-powerpc64le (5.10.218-1) from
- nvidia-tesla-418-kernel-dkms
- nvidia-tesla-450-kernel-dkms
- nvidia-tesla-460-kernel-dkms
(no bugs filed, yet)

I'm not going to address these now for the imminent point release; perhaps that can be resolved on the kernel side: #1074170

If we can't find a fix in time, do we need to skip all of nvidia-* for
the bullseye point release?

That shouldn't be necessary ;-)
In the worst case (the package still FTBFSing) it should be sufficient to hold back the failing source packages.

Andreas
diff --git a/debian/changelog b/debian/changelog
index ed562f33d..001505b03 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -1,3 +1,11 @@
+nvidia-graphics-drivers (470.256.02-2) bullseye; urgency=medium
+
+  * ppc64el: Use pfn_valid() variant with rcu_read_{,un}lock_sched() for
+    Linux 5.10 from 5.10.210 onwards to avoid using GPL symbols.
+  * Upload to bullseye.
+
+ -- Andreas Beckmann <a...@debian.org>  Mon, 24 Jun 2024 09:13:50 +0200
+
 nvidia-graphics-drivers (470.256.02-1) bullseye; urgency=medium
 
   * New upstream LTS and Tesla branch release 470.256.02 (2024-06-04).
@@ -13,6 +21,7 @@ nvidia-graphics-drivers (470.256.02-1) bullseye; urgency=medium
   * Move the libnvidia-glvkspirv dependency to libnvidia-(e)glcore.
     (Cf. #1064194)
   * Bump Standards-Version to 4.7.0. No changes needed.
+  * Upload to bullseye.
 
  -- Andreas Beckmann <a...@debian.org>  Sun, 09 Jun 2024 09:55:50 +0200
 
@@ -54,6 +63,7 @@ nvidia-graphics-drivers (470.223.02-2) bullseye; urgency=medium
   * nvidia-detect: Drop support for Tesla 450 drivers (EoL).
   * *-common: Drop alternative Suggests on EoL Tesla 450 packages that have
     been turned into transitional packages.
+  * Upload to bullseye.
 
  -- Andreas Beckmann <a...@debian.org>  Wed, 21 Feb 2024 09:55:22 +0100
 
diff --git a/debian/module/debian/patches/0035-use-pfn_valid-variant-with-rcu_read_-un-lock_sched.patch b/debian/module/debian/patches/0035-use-pfn_valid-variant-with-rcu_read_-un-lock_sched.patch
new file mode 100644
index 000000000..62ff9ee55
--- /dev/null
+++ b/debian/module/debian/patches/0035-use-pfn_valid-variant-with-rcu_read_-un-lock_sched.patch
@@ -0,0 +1,80 @@
+From e23e041bd9ec3858dda734e38dec065befb2a45b Mon Sep 17 00:00:00 2001
+From: Andreas Beckmann <a...@debian.org>
+Date: Mon, 24 Jun 2024 02:31:03 +0200
+Subject: [PATCH] use pfn_valid() variant with rcu_read_{,un}lock_sched()
+
+---
+ common/inc/nv-linux.h | 43 +++++++++++++++++++++++++++++++++++++++++++
+ nvidia/nv-vtophys.c   |  2 +-
+ 2 files changed, 44 insertions(+), 1 deletion(-)
+
+diff --git a/common/inc/nv-linux.h b/common/inc/nv-linux.h
+index e095a89..5cd5abc 100644
+--- a/common/inc/nv-linux.h
++++ b/common/inc/nv-linux.h
+@@ -2014,6 +2014,49 @@ static inline void nv_mutex_destroy(struct mutex *lock)
+ 
+ }
+ 
++#if defined(CONFIG_HAVE_ARCH_PFN_VALID) || \
++	!defined(NVCPU_PPC64LE) || \
++	LINUX_VERSION_CODE < KERNEL_VERSION(5,10,210) || \
++	LINUX_VERSION_CODE > KERNEL_VERSION(5,11,0)
++#  define nv_virt_addr_valid virt_addr_valid
++#else
++/* - based on pfn_valid() from v5.10.210 which uses
++     rcu_read_lock()/rcu_read_unlock() from
++     5ec8e8ea8b7783fab150cf86404fc38cb4db8800 (v6.8-rc1/v6.1.76)
++   - applied rcu_read_lock_sched()/rcu_read_unlock_sched() switch from
++     f6564fce256a3944aa1bc76cb3c40e792d97c1eb (v6.8-rc3/v6.1.77)
++     which is not yet backported to 5.10
++*/
++static inline int nv_pfn_valid(unsigned long pfn)
++{
++        struct mem_section *ms;
++        int ret;
++
++        if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
++                return 0;
++        ms = __pfn_to_section(pfn);
++        rcu_read_lock_sched();
++        if (!valid_section(ms)) {
++                rcu_read_unlock_sched();
++                return 0;
++        }
++        /*
++         * Traditionally early sections always returned pfn_valid() for
++         * the entire section-sized span.
++         */
++        ret = early_section(ms) || pfn_section_valid(ms, pfn);
++        rcu_read_unlock_sched();
++
++        return ret;
++}
++
++#define nv_virt_addr_valid(vaddr)  ({                                   \
++        unsigned long _addr = (unsigned long)vaddr;                     \
++        _addr >= PAGE_OFFSET && _addr < (unsigned long)high_memory &&   \
++        nv_pfn_valid(virt_to_pfn(_addr));                               \
++})
++#endif
++
+ #define NV_CHECK_EXPORT_SYMBOL(symbol)  (NV_IS_EXPORT_SYMBOL_PRESENT_##symbol && \
+                                          !NV_IS_EXPORT_SYMBOL_GPL_##symbol)
+ #endif  /* _NV_LINUX_H_ */
+diff --git a/nvidia/nv-vtophys.c b/nvidia/nv-vtophys.c
+index 628a07b..3f158d5 100644
+--- a/nvidia/nv-vtophys.c
++++ b/nvidia/nv-vtophys.c
+@@ -16,7 +16,7 @@
+ NvU64 NV_API_CALL nv_get_kern_phys_address(NvU64 address)
+ {
+     /* direct-mapped kernel address */
+-    if (virt_addr_valid(address))
++    if (nv_virt_addr_valid(address))
+         return __pa(address);
+ 
+     nv_printf(NV_DBG_ERRORS,
+-- 
+2.20.1
+
diff --git a/debian/module/debian/patches/conftest-verbose.patch b/debian/module/debian/patches/conftest-verbose.patch
index f79fc31ad..826248926 100644
--- a/debian/module/debian/patches/conftest-verbose.patch
+++ b/debian/module/debian/patches/conftest-verbose.patch
@@ -3,7 +3,7 @@ Description: dump the generated conftest headers
 
 --- a/Kbuild
 +++ b/Kbuild
-@@ -130,6 +130,16 @@ NV_CONFTEST_HEADERS += $(obj)/conftest/h
+@@ -120,6 +120,16 @@ NV_CONFTEST_HEADERS += $(obj)/conftest/h
  NV_CONFTEST_HEADERS += $(NV_CONFTEST_COMPILE_TEST_HEADERS)
  
  
@@ -20,7 +20,7 @@ Description: dump the generated conftest headers
  #
  # Generate a header file for a single conftest compile test. Each compile test
  # header depends on conftest.sh, as well as the generated conftest/headers.h
-@@ -154,6 +164,8 @@ define NV_GENERATE_COMPILE_TEST_HEADER
+@@ -144,6 +154,8 @@ define NV_GENERATE_COMPILE_TEST_HEADER
  	@mkdir -p $(obj)/conftest
  	@# concatenate /dev/null to prevent cat from hanging when $$^ is empty
  	@cat $$^ /dev/null > $$@
@@ -29,7 +29,7 @@ Description: dump the generated conftest headers
  endef
  
  #
-@@ -173,9 +185,11 @@ $(eval $(call NV_GENERATE_COMPILE_TEST_H
+@@ -163,9 +175,11 @@ $(eval $(call NV_GENERATE_COMPILE_TEST_H
  $(eval $(call NV_GENERATE_COMPILE_TEST_HEADER,symbols,$(NV_CONFTEST_SYMBOL_COMPILE_TESTS)))
  $(eval $(call NV_GENERATE_COMPILE_TEST_HEADER,types,$(NV_CONFTEST_TYPE_COMPILE_TESTS)))
  
@@ -42,7 +42,7 @@ Description: dump the generated conftest headers
  
  
  # Each of these headers is checked for presence with a test #include; a
-@@ -256,8 +270,9 @@ NV_HEADER_PRESENCE_PART = $(addprefix $(
+@@ -246,8 +260,9 @@ NV_HEADER_PRESENCE_PART = $(addprefix $(
  
  # Define a rule to check the header $(1).
  define NV_HEADER_PRESENCE_CHECK
@@ -53,7 +53,7 @@ Description: dump the generated conftest headers
  	@$$(NV_CONFTEST_CMD) test_kernel_header '$$(NV_CONFTEST_CFLAGS)' '$(1)' > $$@
  endef
  
-@@ -267,6 +282,8 @@ $(foreach header,$(NV_HEADER_PRESENCE_TE
+@@ -257,6 +272,8 @@ $(foreach header,$(NV_HEADER_PRESENCE_TE
  # Concatenate all of the parts into headers.h.
  $(obj)/conftest/headers.h: $(call NV_HEADER_PRESENCE_PART,$(NV_HEADER_PRESENCE_TESTS))
  	@cat $^ > $@
@@ -62,7 +62,7 @@ Description: dump the generated conftest headers
  
  clean-dirs := $(obj)/conftest
  
-@@ -287,7 +304,8 @@ BUILD_SANITY_CHECKS = \
+@@ -277,7 +294,8 @@ BUILD_SANITY_CHECKS = \
  
  .PHONY: $(BUILD_SANITY_CHECKS)
  
diff --git a/debian/module/debian/patches/series.in b/debian/module/debian/patches/series.in
index 3d094678c..a20785277 100644
--- a/debian/module/debian/patches/series.in
+++ b/debian/module/debian/patches/series.in
@@ -5,10 +5,11 @@ bashisms.patch
 0001-some-power-management-features-were-not-yet-in-Linux.patch
 0033-refuse-to-load-legacy-module-if-IBT-is-enabled.patch
 0034-fix-typos.patch
+0035-use-pfn_valid-variant-with-rcu_read_-un-lock_sched.patch
 
 # build system updates
 fragile-ARCH.patch
+conftest-verbose.patch
 use-kbuild-compiler.patch
 use-kbuild-flags.patch
-conftest-verbose.patch
 conftest-prefer-arch-headers.patch
diff --git a/debian/changelog b/debian/changelog
index b5f02601c..abc77fa50 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -1,3 +1,17 @@
+nvidia-graphics-drivers-tesla-470 (470.256.02-1~deb11u2) bullseye; urgency=medium
+
+  * Rebuild as Tesla 470 driver for bullseye.
+
+ -- Andreas Beckmann <a...@debian.org>  Mon, 24 Jun 2024 10:58:46 +0200
+
+nvidia-graphics-drivers (470.256.02-2) bullseye; urgency=medium
+
+  * ppc64el: Use pfn_valid() variant with rcu_read_{,un}lock_sched() for
+    Linux 5.10 from 5.10.210 onwards to avoid using GPL symbols.
+  * Upload to bullseye.
+
+ -- Andreas Beckmann <a...@debian.org>  Mon, 24 Jun 2024 09:13:50 +0200
+
 nvidia-graphics-drivers-tesla-470 (470.256.02-1~deb11u1) bullseye; urgency=medium
 
   * Rebuild for bullseye.
@@ -136,6 +150,7 @@ nvidia-graphics-drivers (470.223.02-2) bullseye; urgency=medium
   * nvidia-detect: Drop support for Tesla 450 drivers (EoL).
   * *-common: Drop alternative Suggests on EoL Tesla 450 packages that have
     been turned into transitional packages.
+  * Upload to bullseye.
 
  -- Andreas Beckmann <a...@debian.org>  Wed, 21 Feb 2024 09:55:22 +0100
 
diff --git a/debian/module/debian/patches/0035-use-pfn_valid-variant-with-rcu_read_-un-lock_sched.patch b/debian/module/debian/patches/0035-use-pfn_valid-variant-with-rcu_read_-un-lock_sched.patch
new file mode 100644
index 000000000..62ff9ee55
--- /dev/null
+++ b/debian/module/debian/patches/0035-use-pfn_valid-variant-with-rcu_read_-un-lock_sched.patch
@@ -0,0 +1,80 @@
+From e23e041bd9ec3858dda734e38dec065befb2a45b Mon Sep 17 00:00:00 2001
+From: Andreas Beckmann <a...@debian.org>
+Date: Mon, 24 Jun 2024 02:31:03 +0200
+Subject: [PATCH] use pfn_valid() variant with rcu_read_{,un}lock_sched()
+
+---
+ common/inc/nv-linux.h | 43 +++++++++++++++++++++++++++++++++++++++++++
+ nvidia/nv-vtophys.c   |  2 +-
+ 2 files changed, 44 insertions(+), 1 deletion(-)
+
+diff --git a/common/inc/nv-linux.h b/common/inc/nv-linux.h
+index e095a89..5cd5abc 100644
+--- a/common/inc/nv-linux.h
++++ b/common/inc/nv-linux.h
+@@ -2014,6 +2014,49 @@ static inline void nv_mutex_destroy(struct mutex *lock)
+ 
+ }
+ 
++#if defined(CONFIG_HAVE_ARCH_PFN_VALID) || \
++	!defined(NVCPU_PPC64LE) || \
++	LINUX_VERSION_CODE < KERNEL_VERSION(5,10,210) || \
++	LINUX_VERSION_CODE > KERNEL_VERSION(5,11,0)
++#  define nv_virt_addr_valid virt_addr_valid
++#else
++/* - based on pfn_valid() from v5.10.210 which uses
++     rcu_read_lock()/rcu_read_unlock() from
++     5ec8e8ea8b7783fab150cf86404fc38cb4db8800 (v6.8-rc1/v6.1.76)
++   - applied rcu_read_lock_sched()/rcu_read_unlock_sched() switch from
++     f6564fce256a3944aa1bc76cb3c40e792d97c1eb (v6.8-rc3/v6.1.77)
++     which is not yet backported to 5.10
++*/
++static inline int nv_pfn_valid(unsigned long pfn)
++{
++        struct mem_section *ms;
++        int ret;
++
++        if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
++                return 0;
++        ms = __pfn_to_section(pfn);
++        rcu_read_lock_sched();
++        if (!valid_section(ms)) {
++                rcu_read_unlock_sched();
++                return 0;
++        }
++        /*
++         * Traditionally early sections always returned pfn_valid() for
++         * the entire section-sized span.
++         */
++        ret = early_section(ms) || pfn_section_valid(ms, pfn);
++        rcu_read_unlock_sched();
++
++        return ret;
++}
++
++#define nv_virt_addr_valid(vaddr)  ({                                   \
++        unsigned long _addr = (unsigned long)vaddr;                     \
++        _addr >= PAGE_OFFSET && _addr < (unsigned long)high_memory &&   \
++        nv_pfn_valid(virt_to_pfn(_addr));                               \
++})
++#endif
++
+ #define NV_CHECK_EXPORT_SYMBOL(symbol)  (NV_IS_EXPORT_SYMBOL_PRESENT_##symbol && \
+                                          !NV_IS_EXPORT_SYMBOL_GPL_##symbol)
+ #endif  /* _NV_LINUX_H_ */
+diff --git a/nvidia/nv-vtophys.c b/nvidia/nv-vtophys.c
+index 628a07b..3f158d5 100644
+--- a/nvidia/nv-vtophys.c
++++ b/nvidia/nv-vtophys.c
+@@ -16,7 +16,7 @@
+ NvU64 NV_API_CALL nv_get_kern_phys_address(NvU64 address)
+ {
+     /* direct-mapped kernel address */
+-    if (virt_addr_valid(address))
++    if (nv_virt_addr_valid(address))
+         return __pa(address);
+ 
+     nv_printf(NV_DBG_ERRORS,
+-- 
+2.20.1
+
diff --git a/debian/module/debian/patches/series.in b/debian/module/debian/patches/series.in
index cda151f88..a20785277 100644
--- a/debian/module/debian/patches/series.in
+++ b/debian/module/debian/patches/series.in
@@ -5,6 +5,7 @@ bashisms.patch
 0001-some-power-management-features-were-not-yet-in-Linux.patch
 0033-refuse-to-load-legacy-module-if-IBT-is-enabled.patch
 0034-fix-typos.patch
+0035-use-pfn_valid-variant-with-rcu_read_-un-lock_sched.patch
 
 # build system updates
 fragile-ARCH.patch
