On 22/06/2024 10.46, Adam D. Barratt wrote:
> On Sat, 2024-06-22 at 01:10 +0200, Andreas Beckmann wrote:
>> On 21/06/2024 19.05, Adam D. Barratt wrote:
>>> On Tue, 2024-06-11 at 02:02 +0200, Andreas Beckmann wrote:
>>>> A new upstream release of the nvidia drivers in non-free is needed
>>>> for fixing a few new CVEs.
>>>
>>> The ppc64el build failed:
>>>
>>> FATAL: modpost: GPL-incompatible module nvidia.ko uses GPL-only symbol 'rcu_read_unlock_strict'
>>> make[5]: *** [/usr/src/linux-headers-5.10.0-30-common/scripts/Makefile.modpost:123: /<<PKGBUILDDIR>>/kernel-source-tree/Module.symvers] Error 1
>>
>> OK, I can reproduce that with linux-headers-5.10.0-30-powerpc64le but
>> not with linux-headers-5.10.0-28-powerpc64le (nor with
>> linux-headers-6.8.12-powerpc64le)
>
> "Yay".
This happened:
There are two commits in 6.8 that modify (the arch-independent)
pfn_valid() in include/linux/mmzone.h to fix race conditions:
- 5ec8e8ea8b7783fab150cf86404fc38cb4db8800 (v6.8-rc1)
  introduces usage of rcu_read_lock()/rcu_read_unlock()
  (which are (transitively) GPL-only symbols)
- f6564fce256a3944aa1bc76cb3c40e792d97c1eb (v6.8-rc3)
  switches that to rcu_read_lock_sched()/rcu_read_unlock_sched()
  (which are not)
Both commits got backported to Linux 6.1 (in bookworm) in
v6.1.76/v6.1.77 but so far only the first got backported to Linux 5.10
(in bullseye) in v5.10.210.
I just filed #1074170 for the potentially missing backport in the
bullseye-pu kernel.
While the nvidia driver stopped using pfn_valid() in 470.239.06, it
still uses the (arch specific) virt_addr_valid() macro.
- On ppc64el (arch/powerpc/include/asm/page.h) this macro calls the
  arch-independent pfn_valid() (which is transitively GPL-only).
- On amd64 (arch/x86/include/asm/page.h) this macro uses
  EXPORT_SYMBOL(__virt_addr_valid) from arch/x86/mm/physaddr.c.
- On arm64 (arch/arm64/include/asm/memory.h) this macro calls the
  arch-specific pfn_valid() (due to CONFIG_HAVE_ARCH_PFN_VALID=y).
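To illustrate why only ppc64el trips over this: modpost rejects a module whose license is not GPL-compatible as soon as it references a GPL-only export. The following is a simplified user-space model of that rule, not modpost's actual code. modpost_accepts() and enum export_kind are names made up for this sketch, and the license list is reproduced from memory of the kernel's license_is_gpl_compatible(), so treat it as an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* How a kernel symbol was exported (hypothetical enum for this sketch). */
enum export_kind { EXPORT_PLAIN, EXPORT_GPL_ONLY };

/* Model of the kernel's list of GPL-compatible MODULE_LICENSE() strings. */
static bool license_is_gpl_compatible(const char *license)
{
    static const char *const ok[] = {
        "GPL", "GPL v2", "GPL and additional rights",
        "Dual BSD/GPL", "Dual MIT/GPL", "Dual MPL/GPL",
    };

    assert(license != NULL);
    for (size_t i = 0; i < sizeof(ok) / sizeof(ok[0]); i++)
        if (strcmp(license, ok[i]) == 0)
            return true;
    return false;
}

/*
 * Simplified rule: a plain export is usable by any module, a GPL-only
 * export only by modules with a GPL-compatible license.
 */
static bool modpost_accepts(const char *license, enum export_kind kind)
{
    return kind == EXPORT_PLAIN || license_is_gpl_compatible(license);
}
```

In this model, modpost_accepts("NVIDIA", EXPORT_GPL_ONLY) is false, which corresponds to the ppc64el case where rcu_read_unlock_strict gets pulled in, while modpost_accepts("NVIDIA", EXPORT_PLAIN) is true, which corresponds to amd64 and __virt_addr_valid. "NVIDIA" merely stands in for any non-GPL license string.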
I'm adding a patch that (on ppc64el only, for Linux >= 5.10.210 &&
Linux < 5.11) introduces nv_pfn_valid(), which is the pfn_valid() from
5.10.210 plus the changes from f6564fce256a3944aa1bc76cb3c40e792d97c1eb,
as well as nv_virt_addr_valid(), which uses it.
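The version window can be sketched in plain C as follows. This is a user-space illustration under assumptions, not the code from the patch: kernel_version() and needs_nv_pfn_valid() are hypothetical helpers, and the packing mirrors the kernel's KERNEL_VERSION() macro only for components that fit in 8 bits. The patch itself expresses the inverse condition in the preprocessor and tests "> KERNEL_VERSION(5,11,0)", which differs from the range below only for exactly 5.11.0.

```c
#include <assert.h>
#include <stdbool.h>

/* Pack a version like KERNEL_VERSION(a, b, c); assumes b and c fit in 8 bits. */
static unsigned int kernel_version(unsigned int a, unsigned int b, unsigned int c)
{
    assert(b <= 255 && c <= 255);
    return (a << 16) + (b << 8) + c;
}

/*
 * Hypothetical helper: true when the private nv_pfn_valid() replacement
 * would be compiled in, i.e. on ppc64el without an arch-specific
 * pfn_valid(), for Linux >= 5.10.210 and < 5.11.
 */
static bool needs_nv_pfn_valid(unsigned int version_code,
                               bool is_ppc64le,
                               bool have_arch_pfn_valid)
{
    if (have_arch_pfn_valid || !is_ppc64le)
        return false;
    return version_code >= kernel_version(5, 10, 210) &&
           version_code < kernel_version(5, 11, 0);
}
```

For example, needs_nv_pfn_valid(kernel_version(5, 10, 218), true, false) is true (linux-headers-5.10.0-30-powerpc64le), while the same call with kernel_version(5, 10, 209) is false, so the 5.10.0-28 headers keep using the stock virt_addr_valid().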
It has only been lightly tested:
- building a module for linux-headers-5.10.0-30-powerpc64le now succeeds
- building a module for linux-headers-5.10.0-28-powerpc64le (5.10.209)
  still succeeds
- building a module for linux-headers-5.10.0-30-amd64 still succeeds
  (patch is theoretically a no-op on amd64)
- untested on arm64, but patch is theoretically a no-op on arm64
I'm not routing this patch through sid and bookworm for now, therefore
the versions of the bullseye uploads (just done) are
- nvidia-graphics-drivers 470.256.02-2
- nvidia-graphics-drivers-tesla-470 470.256.02-1~deb11u2
Do you need separate opu requests for these?
The GPL-only symbol usage bug is also reproducible on ppc64el when
trying to build a module for
linux-headers-5.10.0-30-powerpc64le (5.10.218-1) from
- nvidia-tesla-418-kernel-dkms
- nvidia-tesla-450-kernel-dkms
- nvidia-tesla-460-kernel-dkms
(no bugs filed, yet)
I'm not going to address these now for the imminent point release;
perhaps that can be resolved on the kernel side: #1074170
> If we can't find a fix in time, do we need to skip all of nvidia-* for
> the bullseye point release?
That shouldn't be necessary ;-)
In the worst case (the package still FTBFSing) it should be sufficient
to hold back the failing source packages.
Andreas
diff --git a/debian/changelog b/debian/changelog
index ed562f33d..001505b03 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -1,3 +1,11 @@
+nvidia-graphics-drivers (470.256.02-2) bullseye; urgency=medium
+
+ * ppc64el: Use pfn_valid() variant with rcu_read_{,un}lock_sched() for
+ Linux 5.10 from 5.10.210 onwards to avoid using GPL symbols.
+ * Upload to bullseye.
+
+ -- Andreas Beckmann <a...@debian.org> Mon, 24 Jun 2024 09:13:50 +0200
+
nvidia-graphics-drivers (470.256.02-1) bullseye; urgency=medium
* New upstream LTS and Tesla branch release 470.256.02 (2024-06-04).
@@ -13,6 +21,7 @@ nvidia-graphics-drivers (470.256.02-1) bullseye; urgency=medium
* Move the libnvidia-glvkspirv dependency to libnvidia-(e)glcore.
(Cf. #1064194)
* Bump Standards-Version to 4.7.0. No changes needed.
+ * Upload to bullseye.
-- Andreas Beckmann <a...@debian.org> Sun, 09 Jun 2024 09:55:50 +0200
@@ -54,6 +63,7 @@ nvidia-graphics-drivers (470.223.02-2) bullseye; urgency=medium
* nvidia-detect: Drop support for Tesla 450 drivers (EoL).
* *-common: Drop alternative Suggests on EoL Tesla 450 packages that have
been turned into transitional packages.
+ * Upload to bullseye.
-- Andreas Beckmann <a...@debian.org> Wed, 21 Feb 2024 09:55:22 +0100
diff --git a/debian/module/debian/patches/0035-use-pfn_valid-variant-with-rcu_read_-un-lock_sched.patch b/debian/module/debian/patches/0035-use-pfn_valid-variant-with-rcu_read_-un-lock_sched.patch
new file mode 100644
index 000000000..62ff9ee55
--- /dev/null
+++ b/debian/module/debian/patches/0035-use-pfn_valid-variant-with-rcu_read_-un-lock_sched.patch
@@ -0,0 +1,80 @@
+From e23e041bd9ec3858dda734e38dec065befb2a45b Mon Sep 17 00:00:00 2001
+From: Andreas Beckmann <a...@debian.org>
+Date: Mon, 24 Jun 2024 02:31:03 +0200
+Subject: [PATCH] use pfn_valid() variant with rcu_read_{,un}lock_sched()
+
+---
+ common/inc/nv-linux.h | 43 +++++++++++++++++++++++++++++++++++++++++++
+ nvidia/nv-vtophys.c | 2 +-
+ 2 files changed, 44 insertions(+), 1 deletion(-)
+
+diff --git a/common/inc/nv-linux.h b/common/inc/nv-linux.h
+index e095a89..5cd5abc 100644
+--- a/common/inc/nv-linux.h
++++ b/common/inc/nv-linux.h
+@@ -2014,6 +2014,49 @@ static inline void nv_mutex_destroy(struct mutex *lock)
+
+ }
+
++#if defined(CONFIG_HAVE_ARCH_PFN_VALID) || \
++ !defined(NVCPU_PPC64LE) || \
++ LINUX_VERSION_CODE < KERNEL_VERSION(5,10,210) || \
++ LINUX_VERSION_CODE > KERNEL_VERSION(5,11,0)
++# define nv_virt_addr_valid virt_addr_valid
++#else
++/* - based on pfn_valid() from v5.10.210 which uses
++ rcu_read_lock()/rcu_read_unlock() from
++ 5ec8e8ea8b7783fab150cf86404fc38cb4db8800 (v6.8-rc1/v6.1.76)
++ - applied rcu_read_lock_sched()/rcu_read_unlock_sched() switch from
++ f6564fce256a3944aa1bc76cb3c40e792d97c1eb (v6.8-rc3/v6.1.77)
++ which is not yet backported to 5.10
++*/
++static inline int nv_pfn_valid(unsigned long pfn)
++{
++ struct mem_section *ms;
++ int ret;
++
++ if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
++ return 0;
++ ms = __pfn_to_section(pfn);
++ rcu_read_lock_sched();
++ if (!valid_section(ms)) {
++ rcu_read_unlock_sched();
++ return 0;
++ }
++ /*
++ * Traditionally early sections always returned pfn_valid() for
++ * the entire section-sized span.
++ */
++ ret = early_section(ms) || pfn_section_valid(ms, pfn);
++ rcu_read_unlock_sched();
++
++ return ret;
++}
++
++#define nv_virt_addr_valid(vaddr) ({ \
++ unsigned long _addr = (unsigned long)vaddr; \
++ _addr >= PAGE_OFFSET && _addr < (unsigned long)high_memory && \
++ nv_pfn_valid(virt_to_pfn(_addr)); \
++})
++#endif
++
+ #define NV_CHECK_EXPORT_SYMBOL(symbol) (NV_IS_EXPORT_SYMBOL_PRESENT_##symbol && \
+ !NV_IS_EXPORT_SYMBOL_GPL_##symbol)
+ #endif /* _NV_LINUX_H_ */
+diff --git a/nvidia/nv-vtophys.c b/nvidia/nv-vtophys.c
+index 628a07b..3f158d5 100644
+--- a/nvidia/nv-vtophys.c
++++ b/nvidia/nv-vtophys.c
+@@ -16,7 +16,7 @@
+ NvU64 NV_API_CALL nv_get_kern_phys_address(NvU64 address)
+ {
+ /* direct-mapped kernel address */
+- if (virt_addr_valid(address))
++ if (nv_virt_addr_valid(address))
+ return __pa(address);
+
+ nv_printf(NV_DBG_ERRORS,
+--
+2.20.1
+
diff --git a/debian/module/debian/patches/conftest-verbose.patch b/debian/module/debian/patches/conftest-verbose.patch
index f79fc31ad..826248926 100644
--- a/debian/module/debian/patches/conftest-verbose.patch
+++ b/debian/module/debian/patches/conftest-verbose.patch
@@ -3,7 +3,7 @@ Description: dump the generated conftest headers
--- a/Kbuild
+++ b/Kbuild
-@@ -130,6 +130,16 @@ NV_CONFTEST_HEADERS += $(obj)/conftest/h
+@@ -120,6 +120,16 @@ NV_CONFTEST_HEADERS += $(obj)/conftest/h
NV_CONFTEST_HEADERS += $(NV_CONFTEST_COMPILE_TEST_HEADERS)
@@ -20,7 +20,7 @@ Description: dump the generated conftest headers
#
# Generate a header file for a single conftest compile test. Each compile test
# header depends on conftest.sh, as well as the generated conftest/headers.h
-@@ -154,6 +164,8 @@ define NV_GENERATE_COMPILE_TEST_HEADER
+@@ -144,6 +154,8 @@ define NV_GENERATE_COMPILE_TEST_HEADER
@mkdir -p $(obj)/conftest
@# concatenate /dev/null to prevent cat from hanging when $$^ is empty
@cat $$^ /dev/null > $$@
@@ -29,7 +29,7 @@ Description: dump the generated conftest headers
endef
#
-@@ -173,9 +185,11 @@ $(eval $(call NV_GENERATE_COMPILE_TEST_H
+@@ -163,9 +175,11 @@ $(eval $(call NV_GENERATE_COMPILE_TEST_H
$(eval $(call NV_GENERATE_COMPILE_TEST_HEADER,symbols,$(NV_CONFTEST_SYMBOL_COMPILE_TESTS)))
$(eval $(call NV_GENERATE_COMPILE_TEST_HEADER,types,$(NV_CONFTEST_TYPE_COMPILE_TESTS)))
@@ -42,7 +42,7 @@ Description: dump the generated conftest headers
# Each of these headers is checked for presence with a test #include; a
-@@ -256,8 +270,9 @@ NV_HEADER_PRESENCE_PART = $(addprefix $(
+@@ -246,8 +260,9 @@ NV_HEADER_PRESENCE_PART = $(addprefix $(
# Define a rule to check the header $(1).
define NV_HEADER_PRESENCE_CHECK
@@ -53,7 +53,7 @@ Description: dump the generated conftest headers
@$$(NV_CONFTEST_CMD) test_kernel_header '$$(NV_CONFTEST_CFLAGS)' '$(1)' > $$@
endef
-@@ -267,6 +282,8 @@ $(foreach header,$(NV_HEADER_PRESENCE_TE
+@@ -257,6 +272,8 @@ $(foreach header,$(NV_HEADER_PRESENCE_TE
# Concatenate all of the parts into headers.h.
$(obj)/conftest/headers.h: $(call NV_HEADER_PRESENCE_PART,$(NV_HEADER_PRESENCE_TESTS))
@cat $^ > $@
@@ -62,7 +62,7 @@ Description: dump the generated conftest headers
clean-dirs := $(obj)/conftest
-@@ -287,7 +304,8 @@ BUILD_SANITY_CHECKS = \
+@@ -277,7 +294,8 @@ BUILD_SANITY_CHECKS = \
.PHONY: $(BUILD_SANITY_CHECKS)
diff --git a/debian/module/debian/patches/series.in b/debian/module/debian/patches/series.in
index 3d094678c..a20785277 100644
--- a/debian/module/debian/patches/series.in
+++ b/debian/module/debian/patches/series.in
@@ -5,10 +5,11 @@ bashisms.patch
0001-some-power-management-features-were-not-yet-in-Linux.patch
0033-refuse-to-load-legacy-module-if-IBT-is-enabled.patch
0034-fix-typos.patch
+0035-use-pfn_valid-variant-with-rcu_read_-un-lock_sched.patch
# build system updates
fragile-ARCH.patch
+conftest-verbose.patch
use-kbuild-compiler.patch
use-kbuild-flags.patch
-conftest-verbose.patch
conftest-prefer-arch-headers.patch
diff --git a/debian/changelog b/debian/changelog
index b5f02601c..abc77fa50 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -1,3 +1,17 @@
+nvidia-graphics-drivers-tesla-470 (470.256.02-1~deb11u2) bullseye; urgency=medium
+
+ * Rebuild as Tesla 470 driver for bullseye.
+
+ -- Andreas Beckmann <a...@debian.org> Mon, 24 Jun 2024 10:58:46 +0200
+
+nvidia-graphics-drivers (470.256.02-2) bullseye; urgency=medium
+
+ * ppc64el: Use pfn_valid() variant with rcu_read_{,un}lock_sched() for
+ Linux 5.10 from 5.10.210 onwards to avoid using GPL symbols.
+ * Upload to bullseye.
+
+ -- Andreas Beckmann <a...@debian.org> Mon, 24 Jun 2024 09:13:50 +0200
+
nvidia-graphics-drivers-tesla-470 (470.256.02-1~deb11u1) bullseye; urgency=medium
* Rebuild for bullseye.
@@ -136,6 +150,7 @@ nvidia-graphics-drivers (470.223.02-2) bullseye; urgency=medium
* nvidia-detect: Drop support for Tesla 450 drivers (EoL).
* *-common: Drop alternative Suggests on EoL Tesla 450 packages that have
been turned into transitional packages.
+ * Upload to bullseye.
-- Andreas Beckmann <a...@debian.org> Wed, 21 Feb 2024 09:55:22 +0100
diff --git a/debian/module/debian/patches/0035-use-pfn_valid-variant-with-rcu_read_-un-lock_sched.patch b/debian/module/debian/patches/0035-use-pfn_valid-variant-with-rcu_read_-un-lock_sched.patch
new file mode 100644
index 000000000..62ff9ee55
--- /dev/null
+++ b/debian/module/debian/patches/0035-use-pfn_valid-variant-with-rcu_read_-un-lock_sched.patch
@@ -0,0 +1,80 @@
+From e23e041bd9ec3858dda734e38dec065befb2a45b Mon Sep 17 00:00:00 2001
+From: Andreas Beckmann <a...@debian.org>
+Date: Mon, 24 Jun 2024 02:31:03 +0200
+Subject: [PATCH] use pfn_valid() variant with rcu_read_{,un}lock_sched()
+
+---
+ common/inc/nv-linux.h | 43 +++++++++++++++++++++++++++++++++++++++++++
+ nvidia/nv-vtophys.c | 2 +-
+ 2 files changed, 44 insertions(+), 1 deletion(-)
+
+diff --git a/common/inc/nv-linux.h b/common/inc/nv-linux.h
+index e095a89..5cd5abc 100644
+--- a/common/inc/nv-linux.h
++++ b/common/inc/nv-linux.h
+@@ -2014,6 +2014,49 @@ static inline void nv_mutex_destroy(struct mutex *lock)
+
+ }
+
++#if defined(CONFIG_HAVE_ARCH_PFN_VALID) || \
++ !defined(NVCPU_PPC64LE) || \
++ LINUX_VERSION_CODE < KERNEL_VERSION(5,10,210) || \
++ LINUX_VERSION_CODE > KERNEL_VERSION(5,11,0)
++# define nv_virt_addr_valid virt_addr_valid
++#else
++/* - based on pfn_valid() from v5.10.210 which uses
++ rcu_read_lock()/rcu_read_unlock() from
++ 5ec8e8ea8b7783fab150cf86404fc38cb4db8800 (v6.8-rc1/v6.1.76)
++ - applied rcu_read_lock_sched()/rcu_read_unlock_sched() switch from
++ f6564fce256a3944aa1bc76cb3c40e792d97c1eb (v6.8-rc3/v6.1.77)
++ which is not yet backported to 5.10
++*/
++static inline int nv_pfn_valid(unsigned long pfn)
++{
++ struct mem_section *ms;
++ int ret;
++
++ if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
++ return 0;
++ ms = __pfn_to_section(pfn);
++ rcu_read_lock_sched();
++ if (!valid_section(ms)) {
++ rcu_read_unlock_sched();
++ return 0;
++ }
++ /*
++ * Traditionally early sections always returned pfn_valid() for
++ * the entire section-sized span.
++ */
++ ret = early_section(ms) || pfn_section_valid(ms, pfn);
++ rcu_read_unlock_sched();
++
++ return ret;
++}
++
++#define nv_virt_addr_valid(vaddr) ({ \
++ unsigned long _addr = (unsigned long)vaddr; \
++ _addr >= PAGE_OFFSET && _addr < (unsigned long)high_memory && \
++ nv_pfn_valid(virt_to_pfn(_addr)); \
++})
++#endif
++
+ #define NV_CHECK_EXPORT_SYMBOL(symbol) (NV_IS_EXPORT_SYMBOL_PRESENT_##symbol && \
+ !NV_IS_EXPORT_SYMBOL_GPL_##symbol)
+ #endif /* _NV_LINUX_H_ */
+diff --git a/nvidia/nv-vtophys.c b/nvidia/nv-vtophys.c
+index 628a07b..3f158d5 100644
+--- a/nvidia/nv-vtophys.c
++++ b/nvidia/nv-vtophys.c
+@@ -16,7 +16,7 @@
+ NvU64 NV_API_CALL nv_get_kern_phys_address(NvU64 address)
+ {
+ /* direct-mapped kernel address */
+- if (virt_addr_valid(address))
++ if (nv_virt_addr_valid(address))
+ return __pa(address);
+
+ nv_printf(NV_DBG_ERRORS,
+--
+2.20.1
+
diff --git a/debian/module/debian/patches/series.in b/debian/module/debian/patches/series.in
index cda151f88..a20785277 100644
--- a/debian/module/debian/patches/series.in
+++ b/debian/module/debian/patches/series.in
@@ -5,6 +5,7 @@ bashisms.patch
0001-some-power-management-features-were-not-yet-in-Linux.patch
0033-refuse-to-load-legacy-module-if-IBT-is-enabled.patch
0034-fix-typos.patch
+0035-use-pfn_valid-variant-with-rcu_read_-un-lock_sched.patch
# build system updates
fragile-ARCH.patch