Hi, On Tue, Jun 24, 2025 at 04:41:33PM +0200, Christoph Berg wrote: > Re: Bertrand Drouvot > > Yes, I think compat_uptr_t usage is missing in do_pages_stat() (while it's > > used > > in do_pages_move()). > > I was also reading the kernel source around that place but you spotted > the problem before me. This patch resolves the issue here: > > diff --git a/mm/migrate.c b/mm/migrate.c > index 8cf0f9c9599..542c81ec3ed 100644 > --- a/mm/migrate.c > +++ b/mm/migrate.c > @@ -2444,7 +2444,13 @@ static int do_pages_stat(struct mm_struct *mm, > unsigned long nr_pages, > if (copy_to_user(status, chunk_status, chunk_nr * > sizeof(*status))) > break; > > - pages += chunk_nr; > + if (in_compat_syscall()) { > + compat_uptr_t __user *pages32 = (compat_uptr_t __user > *)pages; > + > + pages32 += chunk_nr; > + pages = (const void __user * __user *) pages32; > + } else > + pages += chunk_nr; > status += chunk_nr; > nr_pages -= chunk_nr; > } >
Thanks! Yeah, I had the same kind of patch idea in mind. > > + #define NUMA_QUERY_CHUNK_SIZE 16 /* has to be <= > > DO_PAGES_STAT_CHUNK_NR (do_pages_stat())*/ > > + > > + for (uint64 chunk_start = 0; chunk_start < > > shm_ent_page_count; chunk_start += NUMA_QUERY_CHUNK_SIZE) { > > Perhaps optimize it to this: > > #if sizeof(void *) == 4 > #define NUMA_QUERY_CHUNK_SIZE 16 /* has to be <= DO_PAGES_STAT_CHUNK_NR > (do_pages_stat())*/ > #else > #define NUMA_QUERY_CHUNK_SIZE 1024 > #endif > > ... or some other bigger number. I had in mind to split the batch size on the PG side only for 32-bits, what about the attached? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
>From 8f38a49f7e929fa385b3784725e8a1955db9fc57 Mon Sep 17 00:00:00 2001 From: Bertrand Drouvot <bertranddrouvot...@gmail.com> Date: Wed, 25 Jun 2025 05:06:40 +0000 Subject: [PATCH v1] Work around Linux kernel bug in do_pages_stat() do_pages_stat() is already handling the input arrays correctly in 32-bit mode, but at the end of the "while (nr_pages)" loop, it incorrectly advances the pages pointer with the wrong word size. Work around is to ensure that pg_numa_query_pages() does not pass more than DO_PAGES_STAT_CHUNK_NR (see do_pages_stat()) pages to do_pages_stat() so that the wrong pointer arithmetic has no effect (as the pages variable is not being used). Linux kernel bug reported here: https://marc.info/?l=linux-mm&m=175077821909222&w=2 --- src/port/pg_numa.c | 25 +++++++++++++++++++++++++ 1 file changed, 25 insertions(+) diff --git a/src/port/pg_numa.c b/src/port/pg_numa.c index 4b487a2a4e8..86921a156b4 100644 --- a/src/port/pg_numa.c +++ b/src/port/pg_numa.c @@ -16,6 +16,7 @@ #include "c.h" #include <unistd.h> +#include "miscadmin.h" #include "port/pg_numa.h" /* @@ -46,7 +47,31 @@ pg_numa_init(void) int pg_numa_query_pages(int pid, unsigned long count, void **pages, int *status) { +/* + * Work around Linux kernel bug in 32-bit compat mode: do_pages_stat() has + * incorrect pointer arithmetic for more than DO_PAGES_STAT_CHUNK_NR pages. + */ +#if SIZEOF_SIZE_T == 4 +#define NUMA_QUERY_CHUNK_SIZE 16 /* has to be <= DO_PAGES_STAT_CHUNK_NR + * (do_pages_stat()) */ + for (size_t chunk_start = 0; chunk_start < count; chunk_start += NUMA_QUERY_CHUNK_SIZE) + { + int result; + uint64 chunk_size = Min(NUMA_QUERY_CHUNK_SIZE, count - chunk_start); + + CHECK_FOR_INTERRUPTS(); + + result = numa_move_pages(pid, chunk_size, &pages[chunk_start], NULL, + &status[chunk_start], 0); + + if (result != 0) + return result; + } + + return 0; +#else return numa_move_pages(pid, count, pages, NULL, status, 0); +#endif } int -- 2.34.1