Hi,

On Tue, Jun 24, 2025 at 04:41:33PM +0200, Christoph Berg wrote:
> Re: Bertrand Drouvot
> > Yes, I think compat_uptr_t usage is missing in do_pages_stat() (while it's 
> > used
> > in do_pages_move()).
> 
> I was also reading the kernel source around that place but you spotted
> the problem before me. This patch resolves the issue here:
> 
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 8cf0f9c9599..542c81ec3ed 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -2444,7 +2444,13 @@ static int do_pages_stat(struct mm_struct *mm, 
> unsigned long nr_pages,
>               if (copy_to_user(status, chunk_status, chunk_nr * 
> sizeof(*status)))
>                       break;
> 
> -             pages += chunk_nr;
> +             if (in_compat_syscall()) {
> +                     compat_uptr_t __user *pages32 = (compat_uptr_t __user 
> *)pages;
> +
> +                     pages32 += chunk_nr;
> +                     pages = (const void __user * __user *) pages32;
> +             } else
> +                     pages += chunk_nr;
>               status += chunk_nr;
>               nr_pages -= chunk_nr;
>       }
> 

Thanks! Yeah, I had the same kind of patch idea in mind.

> > +               #define NUMA_QUERY_CHUNK_SIZE 16  /* has to be <= 
> > DO_PAGES_STAT_CHUNK_NR (do_pages_stat())*/
> > +
> > +               for (uint64 chunk_start = 0; chunk_start < 
> > shm_ent_page_count; chunk_start += NUMA_QUERY_CHUNK_SIZE) {
> 
> Perhaps optimize it to this:
> 
> #if sizeof(void *) == 4
> #define NUMA_QUERY_CHUNK_SIZE 16  /* has to be <= DO_PAGES_STAT_CHUNK_NR 
> (do_pages_stat())*/
> #else
> #define NUMA_QUERY_CHUNK_SIZE 1024
> #endif
> 
> ... or some other bigger number.

I had in mind to split the batch size on the PG side only for 32-bits, what 
about
the attached?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
>From 8f38a49f7e929fa385b3784725e8a1955db9fc57 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot...@gmail.com>
Date: Wed, 25 Jun 2025 05:06:40 +0000
Subject: [PATCH v1] Work around Linux kernel bug in do_pages_stat()

do_pages_stat() is already handling the input arrays correctly in
32-bit mode, but at the end of the "while (nr_pages)" loop, it
incorrectly advances the pages pointer with the wrong word size.

Work around is to ensure that pg_numa_query_pages() does not pass more than
DO_PAGES_STAT_CHUNK_NR (see do_pages_stat()) pages to do_pages_stat() so that
the wrong pointer arithmetic has no effect (as the pages variable is not being
used).

Linux kernel bug reported here: https://marc.info/?l=linux-mm&m=175077821909222&w=2
---
 src/port/pg_numa.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/src/port/pg_numa.c b/src/port/pg_numa.c
index 4b487a2a4e8..86921a156b4 100644
--- a/src/port/pg_numa.c
+++ b/src/port/pg_numa.c
@@ -16,6 +16,7 @@
 #include "c.h"
 #include <unistd.h>
 
+#include "miscadmin.h"
 #include "port/pg_numa.h"
 
 /*
@@ -46,7 +47,31 @@ pg_numa_init(void)
 int
 pg_numa_query_pages(int pid, unsigned long count, void **pages, int *status)
 {
+/*
+ * Work around Linux kernel bug in 32-bit compat mode: do_pages_stat() has
+ * incorrect pointer arithmetic for more than DO_PAGES_STAT_CHUNK_NR pages.
+ */
+#if SIZEOF_SIZE_T == 4
+#define NUMA_QUERY_CHUNK_SIZE 16	/* has to be <= DO_PAGES_STAT_CHUNK_NR
+									 * (do_pages_stat()) */
+	for (size_t chunk_start = 0; chunk_start < count; chunk_start += NUMA_QUERY_CHUNK_SIZE)
+	{
+		int			result;
+		uint64		chunk_size = Min(NUMA_QUERY_CHUNK_SIZE, count - chunk_start);
+
+		CHECK_FOR_INTERRUPTS();
+
+		result = numa_move_pages(pid, chunk_size, &pages[chunk_start], NULL,
+								 &status[chunk_start], 0);
+
+		if (result != 0)
+			return result;
+	}
+
+	return 0;
+#else
 	return numa_move_pages(pid, count, pages, NULL, status, 0);
+#endif
 }
 
 int
-- 
2.34.1

Reply via email to