On Thu, Jun 26, 2025 at 12:05:11PM +0530, Dev Jain wrote:
> On 26/06/25 11:12 am, Donet Tom wrote:
> > On Thu, Jun 26, 2025 at 09:27:30AM +0530, Dev Jain wrote:
> > > On 25/06/25 10:47 pm, Donet Tom wrote:
> > > > On Wed, Jun 25, 2025 at 06:22:53PM +0530, Dev Jain wrote:
> > > > > On 19/06/25 1:53 pm, Donet Tom wrote:
> > > > > > On Wed, Jun 18, 2025 at 08:13:54PM +0530, Dev Jain wrote:
> > > > > > > On 18/06/25 8:05 pm, Lorenzo Stoakes wrote:
> > > > > > > > On Wed, Jun 18, 2025 at 07:47:18PM +0530, Dev Jain wrote:
> > > > > > > > > On 18/06/25 7:37 pm, Lorenzo Stoakes wrote:
> > > > > > > > > > On Wed, Jun 18, 2025 at 07:28:16PM +0530, Dev Jain wrote:
> > > > > > > > > > > On 18/06/25 5:27 pm, Lorenzo Stoakes wrote:
> > > > > > > > > > > > On Wed, Jun 18, 2025 at 05:15:50PM +0530, Dev Jain wrote:
> > > > > > > > > > > > Are you accounting for sys.max_map_count? If not, then
> > > > > > > > > > > > you'll be hitting that first.
> > > > > > > > > > >
> > > > > > > > > > > run_vmtests.sh will run the test in overcommit mode so
> > > > > > > > > > > that won't be an issue.
> > > > > > > > > >
> > > > > > > > > > Umm, what? You mean overcommit all mode, and that has no
> > > > > > > > > > bearing on the max mapping count check.
> > > > > > > > > >
> > > > > > > > > > In do_mmap():
> > > > > > > > > >
> > > > > > > > > >         /* Too many mappings? */
> > > > > > > > > >         if (mm->map_count > sysctl_max_map_count)
> > > > > > > > > >                 return -ENOMEM;
> > > > > > > > > >
> > > > > > > > > > As well as numerous other checks in mm/vma.c.
> > > > > > > > >
> > > > > > > > > Ah sorry, didn't look at the code properly just assumed that
> > > > > > > > > overcommit_always meant overriding this.
> > > > > > > >
> > > > > > > > No problem! It's hard to be aware of everything in mm :)
> > > > > > > >
> > > > > > > > > > I'm not sure why an overcommit toggle is even necessary when
> > > > > > > > > > you could use MAP_NORESERVE or simply map PROT_NONE to avoid
> > > > > > > > > > the OVERCOMMIT_GUESS limits?
> > > > > > > > > >
> > > > > > > > > > I'm pretty confused as to what this test is really achieving
> > > > > > > > > > honestly. This isn't a useful way of asserting mmap()
> > > > > > > > > > behaviour as far as I can tell.
> > > > > > > > >
> > > > > > > > > Well, seems like a useful way to me at least : ) Not sure if
> > > > > > > > > you are in the mood to discuss that but if you'd like me to
> > > > > > > > > explain from start to end what the test is doing, I can do
> > > > > > > > > that : )
> > > > > > > >
> > > > > > > > I just don't have time right now, I guess I'll have to come back
> > > > > > > > to it later... it's not the end of the world for it to be iffy
> > > > > > > > in my view as long as it passes, but it might just not be of
> > > > > > > > great value.
> > > > > > > >
> > > > > > > > Philosophically I'd rather we didn't assert internal
> > > > > > > > implementation details like where we place mappings in userland
> > > > > > > > memory. At no point do we promise to not leave larger gaps if we
> > > > > > > > feel like it :)
> > > > > > >
> > > > > > > You have a fair point. Anyhow a debate for another day.
> > > > > > >
> > > > > > > > I'm guessing, reading more, the _real_ test here is some
> > > > > > > > mathematical assertion about layout from HIGH_ADDR_SHIFT -> end
> > > > > > > > of address space when using hints.
> > > > > > > >
> > > > > > > > But again I'm not sure that achieves much and again also is
> > > > > > > > asserting internal implementation details.
> > > > > > > >
> > > > > > > > Correct behaviour of this kind of thing probably better belongs
> > > > > > > > to tests in the userland VMA testing I'd say.
> > > > > > > >
> > > > > > > > Sorry I don't mean to do down work you've done before, just
> > > > > > > > giving an honest technical appraisal!
> > > > > > >
> > > > > > > Nah, it will be rather hilarious to see it all go down the drain xD
> > > > > > >
> > > > > > > > Anyway don't let this block work to fix the test if it's
> > > > > > > > failing. We can revisit this later.
> > > > > > >
> > > > > > > Sure. @Aboorva and Donet, I still believe that the correct
> > > > > > > approach is to elide the gap check at the crossing boundary. What
> > > > > > > do you think?
> > > > > >
> > > > > > One problem I am seeing with this approach is that, since the hint
> > > > > > address is generated randomly, the VMAs are also being created at
> > > > > > random locations based on the hint address. So, for the VMAs
> > > > > > created at high addresses, we cannot guarantee that the gaps
> > > > > > between them will be aligned to MAP_CHUNK_SIZE.
> > > > > >
> > > > > > High address VMAs
> > > > > > -----------------
> > > > > > 1000000000000-1000040000000 r--p 00000000 00:00 0
> > > > > > 2000000000000-2000040000000 r--p 00000000 00:00 0
> > > > > > 4000000000000-4000040000000 r--p 00000000 00:00 0
> > > > > > 8000000000000-8000040000000 r--p 00000000 00:00 0
> > > > > > e80009d260000-fffff9d260000 r--p 00000000 00:00 0
> > > > > >
> > > > > > I have a different approach to solve this issue.
> > > > > >
> > > > > > From 0 to 128TB, we map memory directly without using any hint. For
> > > > > > the range above 256TB up to 512TB, we perform the mapping using
> > > > > > hint addresses. In the current test, we use random hint addresses,
> > > > > > but I have modified it to generate hint addresses linearly starting
> > > > > > from 128TB.
> > > > > >
> > > > > > With this change:
> > > > > >
> > > > > > The 0-128TB range is mapped without hints and verified accordingly.
> > > > > >
> > > > > > The 128TB-512TB range is mapped using linear hint addresses and
> > > > > > then verified.
> > > > > >
> > > > > > Below are the VMAs obtained with this approach:
> > > > > >
> > > > > > 10000000-10010000 r-xp 00000000 fd:05 135019531
> > > > > > 10010000-10020000 r--p 00000000 fd:05 135019531
> > > > > > 10020000-10030000 rw-p 00010000 fd:05 135019531
> > > > > > 20000000-10020000000 r--p 00000000 00:00 0
> > > > > > 10020800000-10020830000 rw-p 00000000 00:00 0
> > > > > > 1004bcf0000-1004c000000 rw-p 00000000 00:00 0
> > > > > > 1004c000000-7fff8c000000 r--p 00000000 00:00 0
> > > > > > 7fff8c130000-7fff8c360000 r-xp 00000000 fd:00 792355
> > > > > > 7fff8c360000-7fff8c370000 r--p 00230000 fd:00 792355
> > > > > > 7fff8c370000-7fff8c380000 rw-p 00240000 fd:00 792355
> > > > > > 7fff8c380000-7fff8c460000 r-xp 00000000 fd:00 792358
> > > > > > 7fff8c460000-7fff8c470000 r--p 000d0000 fd:00 792358
> > > > > > 7fff8c470000-7fff8c480000 rw-p 000e0000 fd:00 792358
> > > > > > 7fff8c490000-7fff8c4d0000 r--p 00000000 00:00 0
> > > > > > 7fff8c4d0000-7fff8c4e0000 r-xp 00000000 00:00 0
> > > > > > 7fff8c4e0000-7fff8c530000 r-xp 00000000 fd:00 792351
> > > > > > 7fff8c530000-7fff8c540000 r--p 00040000 fd:00 792351
> > > > > > 7fff8c540000-7fff8c550000 rw-p 00050000 fd:00 792351
> > > > > > 7fff8d000000-7fffcd000000 r--p 00000000 00:00 0
> > > > > > 7fffe9c80000-7fffe9d90000 rw-p 00000000 00:00 0
> > > > > > 800000000000-2000000000000 r--p 00000000 00:00 0 -> High Address (128TB to 512TB)
> > > > > >
> > > > > > diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c
> > > > > > index 4c4c35eac15e..0be008cba4b0 100644
> > > > > > --- a/tools/testing/selftests/mm/virtual_address_range.c
> > > > > > +++ b/tools/testing/selftests/mm/virtual_address_range.c
> > > > > > @@ -56,21 +56,21 @@
> > > > > >  #ifdef __aarch64__
> > > > > >  #define HIGH_ADDR_MARK ADDR_MARK_256TB
> > > > > > -#define HIGH_ADDR_SHIFT 49
> > > > > > +#define HIGH_ADDR_SHIFT 48
> > > > > >  #define NR_CHUNKS_LOW NR_CHUNKS_256TB
> > > > > >  #define NR_CHUNKS_HIGH NR_CHUNKS_3840TB
> > > > > >  #else
> > > > > >  #define HIGH_ADDR_MARK ADDR_MARK_128TB
> > > > > > -#define HIGH_ADDR_SHIFT 48
> > > > > > +#define HIGH_ADDR_SHIFT 47
> > > > > >  #define NR_CHUNKS_LOW NR_CHUNKS_128TB
> > > > > >  #define NR_CHUNKS_HIGH NR_CHUNKS_384TB
> > > > > >  #endif
> > > > > >
> > > > > > -static char *hint_addr(void)
> > > > > > +static char *hint_addr(int hint)
> > > > > >  {
> > > > > > -        int bits = HIGH_ADDR_SHIFT + rand() % (63 - HIGH_ADDR_SHIFT);
> > > > > > +        unsigned long addr = ((1UL << HIGH_ADDR_SHIFT) + (hint * MAP_CHUNK_SIZE));
> > > > > >
> > > > > > -        return (char *) (1UL << bits);
> > > > > > +        return (char *) (addr);
> > > > > >  }
> > > > > >
> > > > > >  static void validate_addr(char *ptr, int high_addr)
> > > > > > @@ -217,7 +217,7 @@ int main(int argc, char *argv[])
> > > > > >          }
> > > > > >
> > > > > >          for (i = 0; i < NR_CHUNKS_HIGH; i++) {
> > > > > > -                hint = hint_addr();
> > > > > > +                hint = hint_addr(i);
> > > > > >                  hptr[i] = mmap(hint, MAP_CHUNK_SIZE, PROT_READ,
> > > > > >                                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> > > > >
> > > > > Ah you sent it here, thanks. This is fine really, but the mystery is
> > > > > something else.
> > > >
> > > > Thanks Dev
> > > >
> > > > I can send out v2 with this patch included, right?
> > >
> > > Sorry not yet :) this patch will just hide the real problem, which
> > > is, after the hint addresses get exhausted, why on ppc the kernel
> > > cannot find a VMA to install despite having such large gaps between
> > > VMAs.
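
[Aside: a minimal standalone sketch, not part of the selftest, of what the
hint_addr(int hint) change above produces. The values mirror the patched
non-aarch64 case (HIGH_ADDR_SHIFT 47) and assume the selftest's 16GB
MAP_CHUNK_SIZE; consecutive hints differ by exactly one chunk, which is why
the gaps between the resulting VMAs stay MAP_CHUNK_SIZE-aligned.]

#include <stdio.h>

#define MAP_CHUNK_SIZE  17179869184UL  /* 16GB, assumed from the selftest */
#define HIGH_ADDR_SHIFT 47             /* hints start at 128TB after the patch */

/* Linear hint generation, mirroring the patch above. */
static unsigned long hint_addr(int hint)
{
        return (1UL << HIGH_ADDR_SHIFT) + (unsigned long)hint * MAP_CHUNK_SIZE;
}

int main(void)
{
        /* First hints: 128TB, 128TB + 16GB, 128TB + 32GB, ... */
        for (int i = 0; i < 4; i++)
                printf("hint %d -> 0x%lx\n", i, hint_addr(i));

        /* The stride between consecutive hints is exactly MAP_CHUNK_SIZE. */
        printf("stride = 0x%lx\n", hint_addr(1) - hint_addr(0));
        return 0;
}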
> >
> > I think there is some confusion here, so let me clarify.
> >
> > On PowerPC, mmap is able to find VMAs both with and without a hint.
> > There is no issue there. If you look at the test, from 0 to 128TB we
> > are mapping without any hint, and the VMAs are getting created as
> > expected.
> >
> > Above 256TB, we are mapping with random hint addresses, and with
> > those hints, all VMAs are being created above 258TB. No mmap call is
> > failing in this case.
> >
> > The problem is with the test itself: since we are providing random
> > hint addresses, the VMAs are also being created at random locations.
> >
> > Below are the VMAs created with the hint addresses:
> >
> > 1. 256TB hint address
> > 1000000000000-1000040000000 r--p 00000000 00:00 0 [anon:virtual_address_range]
> >
> > 2. 512TB hint address
> > 2000000000000-2000040000000 r--p 00000000 00:00 0 [anon:virtual_address_range]
> >
> > 3. 1024TB hint address
> > 4000000000000-4000040000000 r--p 00000000 00:00 0 [anon:virtual_address_range]
> >
> > 4. 2048TB hint address
> > 8000000000000-8000040000000 r--p 00000000 00:00 0 [anon:virtual_address_range]
> >
> > 5. Above 3096TB hint address
> > eb95410220000-fffff90220000 r--p 00000000 00:00 0 [anon:virtual_address_range]
> >
> > We support up to 4PB, and since the hint addresses are random, the
> > VMAs are created at random locations.
> >
> > With sequential hint addresses from 128TB to 512TB, we provide the
> > hint addresses in order, and the VMAs are created at the hinted
> > addresses.
> >
> > Within 512TB, we were able to test both high and low addresses, so I
> > thought sequential hinting would be a good approach. Since there has
> > been a lot of confusion, I'm considering adding a complete 4PB
> > allocation test: from 0 to 128TB we allocate without any hint, and
> > from 128TB onward we use sequential hint addresses.
> >
> > diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c
> > index e24c36a39f22..f2009d23f8b2 100644
> > --- a/tools/testing/selftests/mm/virtual_address_range.c
> > +++ b/tools/testing/selftests/mm/virtual_address_range.c
> > @@ -50,6 +50,7 @@
> >  #define NR_CHUNKS_256TB (NR_CHUNKS_128TB * 2UL)
> >  #define NR_CHUNKS_384TB (NR_CHUNKS_128TB * 3UL)
> >  #define NR_CHUNKS_3840TB (NR_CHUNKS_128TB * 30UL)
> > +#define NR_CHUNKS_3968TB (NR_CHUNKS_128TB * 31UL)
> >
> >  #define ADDR_MARK_128TB (1UL << 47) /* First address beyond 128TB */
> >  #define ADDR_MARK_256TB (1UL << 48) /* First address beyond 256TB */
> > @@ -59,6 +60,11 @@
> >  #define HIGH_ADDR_SHIFT 49
> >  #define NR_CHUNKS_LOW NR_CHUNKS_256TB
> >  #define NR_CHUNKS_HIGH NR_CHUNKS_3840TB
> > +#elif defined(__PPC64__)
> > +#define HIGH_ADDR_MARK ADDR_MARK_128TB
> > +#define HIGH_ADDR_SHIFT 47
> > +#define NR_CHUNKS_LOW NR_CHUNKS_128TB
> > +#define NR_CHUNKS_HIGH NR_CHUNKS_3968TB
> >  #else
> >  #define HIGH_ADDR_MARK ADDR_MARK_128TB
> >  #define HIGH_ADDR_SHIFT 48
> >
> > With this the test is passing.
>
> Ah okay, this was the problem: PPC got extended to 52 bits and the
> test was not updated. This is the correct fix, you can go ahead with
> this one.
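
[Aside: a quick, illustrative check of the arithmetic behind the new
NR_CHUNKS_3968TB define and the 52-bit PPC64 address space mentioned above.
It restates the selftest's 16GB MAP_CHUNK_SIZE and NR_CHUNKS_128TB as
assumptions and is not part of the patch.]

#include <assert.h>
#include <stdio.h>

#define SZ_1TB           (1UL << 40)
#define MAP_CHUNK_SIZE   17179869184UL                     /* 16GB */
#define NR_CHUNKS_128TB  ((128 * SZ_1TB) / MAP_CHUNK_SIZE) /* 8192 chunks */
#define NR_CHUNKS_3968TB (NR_CHUNKS_128TB * 31UL)

int main(void)
{
        unsigned long low  = NR_CHUNKS_128TB  * MAP_CHUNK_SIZE; /* no-hint range */
        unsigned long high = NR_CHUNKS_3968TB * MAP_CHUNK_SIZE; /* hinted range  */

        printf("low span : %lu TB\n", low / SZ_1TB);   /* 128  */
        printf("high span: %lu TB\n", high / SZ_1TB);  /* 3968 */

        /* 128TB + 3968TB == 4096TB == 1UL << 52, the full 4PB VA space. */
        assert(low + high == 1UL << 52);
        return 0;
}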

Thanks Dev
> > >
> > > It should be quite easy to trace which function is failing. Can you
> > > please do some debugging for me? Otherwise I will have to go ahead
> > > with setting up a PPC VM and testing myself :)
> > >
> > > > > > Can we fix it this way?
>
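
[Aside: on the sys.max_map_count point raised earlier in the thread, a
hypothetical helper, not part of the selftest, that compares the number of
chunks a run would try to map against vm.max_map_count read from procfs.
The chunk counts assume the non-aarch64 defaults (16GB chunks, i.e. 8192
chunks per 128TB); if the planned count exceeded the sysctl, the map_count
check in do_mmap() would fail the test before the address space ran out.]

#include <stdio.h>

#define NR_CHUNKS_128TB 8192UL                  /* 128TB / 16GB, assumed */
#define NR_CHUNKS_384TB (NR_CHUNKS_128TB * 3UL)

int main(void)
{
        unsigned long max_map_count = 0;
        FILE *f = fopen("/proc/sys/vm/max_map_count", "r");

        if (!f || fscanf(f, "%lu", &max_map_count) != 1) {
                perror("max_map_count");
                return 1;
        }
        fclose(f);

        /* Low (no hint) chunks plus high (hinted) chunks, x86_64 defaults. */
        unsigned long planned = NR_CHUNKS_128TB + NR_CHUNKS_384TB;

        printf("planned mappings: %lu, vm.max_map_count: %lu\n",
               planned, max_map_count);
        if (planned >= max_map_count)
                printf("map_count limit would be hit before the VA space runs out\n");
        return 0;
}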