On Thu, Jun 26, 2025 at 12:05:11PM +0530, Dev Jain wrote:
> 
> On 26/06/25 11:12 am, Donet Tom wrote:
> > On Thu, Jun 26, 2025 at 09:27:30AM +0530, Dev Jain wrote:
> > > On 25/06/25 10:47 pm, Donet Tom wrote:
> > > > On Wed, Jun 25, 2025 at 06:22:53PM +0530, Dev Jain wrote:
> > > > > On 19/06/25 1:53 pm, Donet Tom wrote:
> > > > > > On Wed, Jun 18, 2025 at 08:13:54PM +0530, Dev Jain wrote:
> > > > > > > On 18/06/25 8:05 pm, Lorenzo Stoakes wrote:
> > > > > > > > On Wed, Jun 18, 2025 at 07:47:18PM +0530, Dev Jain wrote:
> > > > > > > > > On 18/06/25 7:37 pm, Lorenzo Stoakes wrote:
> > > > > > > > > > On Wed, Jun 18, 2025 at 07:28:16PM +0530, Dev Jain wrote:
> > > > > > > > > > > On 18/06/25 5:27 pm, Lorenzo Stoakes wrote:
> > > > > > > > > > > > On Wed, Jun 18, 2025 at 05:15:50PM +0530, Dev Jain 
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > Are you accounting for the vm.max_map_count sysctl?
> > > > > > > > > > > > If not, then you'll be hitting that first.
> > > > > > > > > > > run_vmtests.sh will run the test in overcommit mode so 
> > > > > > > > > > > that won't be an issue.
> > > > > > > > > > Umm, what? You mean overcommit all mode, and that has no 
> > > > > > > > > > bearing on the max
> > > > > > > > > > mapping count check.
> > > > > > > > > > 
> > > > > > > > > > In do_mmap():
> > > > > > > > > > 
> > > > > > > > > >     /* Too many mappings? */
> > > > > > > > > >     if (mm->map_count > sysctl_max_map_count)
> > > > > > > > > >             return -ENOMEM;
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > As well as numerous other checks in mm/vma.c.
> > > > > > > > > Ah sorry, didn't look at the code properly, I just assumed
> > > > > > > > > that overcommit_always meant overriding this.
> > > > > > > > No problem! It's hard to be aware of everything in mm :)
> > > > > > > > 
> > > > > > > > > > I'm not sure why an overcommit toggle is even necessary 
> > > > > > > > > > when you could use
> > > > > > > > > > MAP_NORESERVE or simply map PROT_NONE to avoid the 
> > > > > > > > > > OVERCOMMIT_GUESS limits?
> > > > > > > > > > 
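(Side note, for anyone following along: below is a minimal, untested sketch
of the MAP_NORESERVE route suggested above; the 1GB size is arbitrary and
this is not part of the selftest.)

    #include <stdio.h>
    #include <sys/mman.h>

    int main(void)
    {
            size_t sz = 1UL << 30; /* 1GB, arbitrary */

            /*
             * A writable private anonymous mapping is normally charged
             * against the commit limit; MAP_NORESERVE skips that
             * accounting, so OVERCOMMIT_GUESS does not get in the way
             * even with the default overcommit_memory setting.
             */
            void *p = mmap(NULL, sz, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE,
                           -1, 0);

            if (p == MAP_FAILED) {
                    perror("mmap");
                    return 1;
            }
            printf("mapped %zu bytes at %p\n", sz, p);
            munmap(p, sz);
            return 0;
    }
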
> > > > > > > > > > I'm pretty confused as to what this test is really 
> > > > > > > > > > achieving honestly. This
> > > > > > > > > > isn't a useful way of asserting mmap() behaviour as far as 
> > > > > > > > > > I can tell.
> > > > > > > > > Well, seems like a useful way to me at least : ) Not sure if 
> > > > > > > > > you are in the mood
> > > > > > > > > to discuss that but if you'd like me to explain from start to 
> > > > > > > > > end what the test
> > > > > > > > > is doing, I can do that : )
> > > > > > > > > 
> > > > > > > > I just don't have time right now, I guess I'll have to come 
> > > > > > > > back to it
> > > > > > > > later... it's not the end of the world for it to be iffy in my 
> > > > > > > > view as long as
> > > > > > > > it passes, but it might just not be of great value.
> > > > > > > > 
> > > > > > > > Philosophically I'd rather we didn't assert internal 
> > > > > > > > implementation details like
> > > > > > > > where we place mappings in userland memory. At no point do we 
> > > > > > > > promise to not
> > > > > > > > leave larger gaps if we feel like it :)
> > > > > > > You have a fair point. Anyhow a debate for another day.
> > > > > > > 
> > > > > > > > I'm guessing, reading more, the _real_ test here is some 
> > > > > > > > mathematical assertion
> > > > > > > > about layout from HIGH_ADDR_SHIFT -> end of address space when 
> > > > > > > > using hints.
> > > > > > > > 
> > > > > > > > But again I'm not sure that achieves much and again also is 
> > > > > > > > asserting internal
> > > > > > > > implementation details.
> > > > > > > > 
> > > > > > > > Correct behaviour of this kind of thing probably better belongs 
> > > > > > > > to tests in the
> > > > > > > > userland VMA testing I'd say.
> > > > > > > > 
> > > > > > > > Sorry I don't mean to do down work you've done before, just 
> > > > > > > > giving an honest
> > > > > > > > technical appraisal!
> > > > > > > Nah, it will be rather hilarious to see it all go down the drain 
> > > > > > > xD
> > > > > > > 
> > > > > > > > Anyway don't let this block work to fix the test if it's 
> > > > > > > > failing. We can revisit
> > > > > > > > this later.
> > > > > > > Sure. @Aboorva and Donet, I still believe that the correct 
> > > > > > > approach is to elide
> > > > > > > the gap check at the crossing boundary. What do you think?
> > > > > > > 
> > > > > > One problem I am seeing with this approach is that, since the hint
> > > > > > address is generated randomly, the VMAs are also created at random
> > > > > > locations based on the hint address. So, for the VMAs created at
> > > > > > high addresses, we cannot guarantee that the gaps between them will
> > > > > > be aligned to MAP_CHUNK_SIZE.
> > > > > > 
> > > > > > High address VMAs
> > > > > > -----------------
> > > > > > 1000000000000-1000040000000 r--p 00000000 00:00 0
> > > > > > 2000000000000-2000040000000 r--p 00000000 00:00 0
> > > > > > 4000000000000-4000040000000 r--p 00000000 00:00 0
> > > > > > 8000000000000-8000040000000 r--p 00000000 00:00 0
> > > > > > e80009d260000-fffff9d260000 r--p 00000000 00:00 0
> > > > > > 
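(To make the alignment point concrete, here is a quick, untested check of
the gap below that last high mapping, assuming the test's 16GB
MAP_CHUNK_SIZE; the addresses are taken from the dump above.)

    #include <stdio.h>

    #define MAP_CHUNK_SIZE (1UL << 34) /* 16GB, as in the selftest */

    int main(void)
    {
            unsigned long prev_end   = 0x8000040000000UL; /* end of the 2048TB VMA */
            unsigned long next_start = 0xe80009d260000UL; /* start of the last VMA */
            unsigned long gap = next_start - prev_end;

            /* A non-zero remainder means the gap is not a multiple of MAP_CHUNK_SIZE */
            printf("gap = %lx, gap %% MAP_CHUNK_SIZE = %lx\n",
                   gap, gap % MAP_CHUNK_SIZE);
            return 0;
    }
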
> > > > > > I have a different approach to solve this issue.
> > > > > > 
> > > > > > From 0 to 128TB, we map memory directly without using any hint.
> > > > > > Above that, the mapping is done using hint addresses. In the
> > > > > > current test the hint addresses are random (starting at 256TB); I
> > > > > > have modified it to generate hint addresses linearly starting from
> > > > > > 128TB, so the hinted range covers 128TB up to 512TB.
> > > > > > 
> > > > > > With this change:
> > > > > > 
> > > > > > The 0–128TB range is mapped without hints and verified accordingly.
> > > > > > 
> > > > > > The 128TB–512TB range is mapped using linear hint addresses and 
> > > > > > then verified.
> > > > > > 
> > > > > > Below are the VMAs obtained with this approach:
> > > > > > 
> > > > > > 10000000-10010000 r-xp 00000000 fd:05 135019531
> > > > > > 10010000-10020000 r--p 00000000 fd:05 135019531
> > > > > > 10020000-10030000 rw-p 00010000 fd:05 135019531
> > > > > > 20000000-10020000000 r--p 00000000 00:00 0
> > > > > > 10020800000-10020830000 rw-p 00000000 00:00 0
> > > > > > 1004bcf0000-1004c000000 rw-p 00000000 00:00 0
> > > > > > 1004c000000-7fff8c000000 r--p 00000000 00:00 0
> > > > > > 7fff8c130000-7fff8c360000 r-xp 00000000 fd:00 792355
> > > > > > 7fff8c360000-7fff8c370000 r--p 00230000 fd:00 792355
> > > > > > 7fff8c370000-7fff8c380000 rw-p 00240000 fd:00 792355
> > > > > > 7fff8c380000-7fff8c460000 r-xp 00000000 fd:00 792358
> > > > > > 7fff8c460000-7fff8c470000 r--p 000d0000 fd:00 792358
> > > > > > 7fff8c470000-7fff8c480000 rw-p 000e0000 fd:00 792358
> > > > > > 7fff8c490000-7fff8c4d0000 r--p 00000000 00:00 0
> > > > > > 7fff8c4d0000-7fff8c4e0000 r-xp 00000000 00:00 0
> > > > > > 7fff8c4e0000-7fff8c530000 r-xp 00000000 fd:00 792351
> > > > > > 7fff8c530000-7fff8c540000 r--p 00040000 fd:00 792351
> > > > > > 7fff8c540000-7fff8c550000 rw-p 00050000 fd:00 792351
> > > > > > 7fff8d000000-7fffcd000000 r--p 00000000 00:00 0
> > > > > > 7fffe9c80000-7fffe9d90000 rw-p 00000000 00:00 0
> > > > > > 800000000000-2000000000000 r--p 00000000 00:00 0    -> High Address (128TB to 512TB)
> > > > > > 
> > > > > > diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c
> > > > > > index 4c4c35eac15e..0be008cba4b0 100644
> > > > > > --- a/tools/testing/selftests/mm/virtual_address_range.c
> > > > > > +++ b/tools/testing/selftests/mm/virtual_address_range.c
> > > > > > @@ -56,21 +56,21 @@
> > > > > >     #ifdef __aarch64__
> > > > > >     #define HIGH_ADDR_MARK  ADDR_MARK_256TB
> > > > > > -#define HIGH_ADDR_SHIFT 49
> > > > > > +#define HIGH_ADDR_SHIFT 48
> > > > > >     #define NR_CHUNKS_LOW   NR_CHUNKS_256TB
> > > > > >     #define NR_CHUNKS_HIGH  NR_CHUNKS_3840TB
> > > > > >     #else
> > > > > >     #define HIGH_ADDR_MARK  ADDR_MARK_128TB
> > > > > > -#define HIGH_ADDR_SHIFT 48
> > > > > > +#define HIGH_ADDR_SHIFT 47
> > > > > >     #define NR_CHUNKS_LOW   NR_CHUNKS_128TB
> > > > > >     #define NR_CHUNKS_HIGH  NR_CHUNKS_384TB
> > > > > >     #endif
> > > > > > -static char *hint_addr(void)
> > > > > > +static char *hint_addr(int hint)
> > > > > >     {
> > > > > > -       int bits = HIGH_ADDR_SHIFT + rand() % (63 - HIGH_ADDR_SHIFT);
> > > > > > +       unsigned long addr = ((1UL << HIGH_ADDR_SHIFT) + (hint * MAP_CHUNK_SIZE));
> > > > > > -       return (char *) (1UL << bits);
> > > > > > +       return (char *) (addr);
> > > > > >     }
> > > > > >     static void validate_addr(char *ptr, int high_addr)
> > > > > > @@ -217,7 +217,7 @@ int main(int argc, char *argv[])
> > > > > >            }
> > > > > >            for (i = 0; i < NR_CHUNKS_HIGH; i++) {
> > > > > > -               hint = hint_addr();
> > > > > > +               hint = hint_addr(i);
> > > > > >                    hptr[i] = mmap(hint, MAP_CHUNK_SIZE, PROT_READ,
> > > > > >                                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> > > > > Ah you sent it here, thanks. This is fine really, but the mystery is
> > > > > something else.
> > > > > 
> > > > Thanks Dev
> > > > 
> > > > I can send out v2 with this patch included, right?
> > > Sorry, not yet :) This patch will just hide the real problem,
> > > which is: after the hint addresses get exhausted, why the kernel
> > > on ppc cannot find space to install a VMA despite having such
> > > large gaps between VMAs.
> > 
> > I think there is some confusion here, so let me clarify.
> > 
> > On PowerPC, mmap is able to find VMAs both with and without a hint.
> > There is no issue there. If you look at the test, from 0 to 128TB we
> > are mapping without any hint, and the VMAs are getting created as
> > expected.
> > 
> > Above 256TB, we are mapping with random hint addresses, and with
> > those hints, all VMAs are being created above 256TB. No mmap call
> > is failing in this case.
> > 
> > The problem is with the test itself: since we are providing random
> > hint addresses, the VMAs are also being created at random locations.
> > 
> > Below are the VMAs created with the hint addresses:
> > 
> > 1. 256TB hint address
> > 1000000000000-1000040000000 r--p 00000000 00:00 0    [anon:virtual_address_range]
> > 
> > 2. 512TB hint address
> > 2000000000000-2000040000000 r--p 00000000 00:00 0    [anon:virtual_address_range]
> > 
> > 3. 1024TB hint address
> > 4000000000000-4000040000000 r--p 00000000 00:00 0    [anon:virtual_address_range]
> > 
> > 4. 2048TB hint address
> > 8000000000000-8000040000000 r--p 00000000 00:00 0    [anon:virtual_address_range]
> > 
> > 5. Hint addresses above 3096TB
> > eb95410220000-fffff90220000 r--p 00000000 00:00 0    [anon:virtual_address_range]
> > 
> > 
> > We support up to 4PB, and since the hint addresses are random,
> > the VMAs are created at random locations.
> > 
> > With sequential hint addresses from 128TB to 512TB, we provide the
> > hint addresses in order, and the VMAs are created at the hinted
> > addresses.
> > 
> > Within 512TB, we were able to test both high and low addresses, so
> > I thought sequential hinting would be a good approach. Since there
> > has been a lot of confusion, I’m considering adding a complete 4PB
> > allocation test — from 0 to 128TB we allocate without any hint, and
> > from 128TB onward we use sequential hint addresses.
> > 
> > diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c
> > index e24c36a39f22..f2009d23f8b2 100644
> > --- a/tools/testing/selftests/mm/virtual_address_range.c
> > +++ b/tools/testing/selftests/mm/virtual_address_range.c
> > @@ -50,6 +50,7 @@
> >   #define NR_CHUNKS_256TB   (NR_CHUNKS_128TB * 2UL)
> >   #define NR_CHUNKS_384TB   (NR_CHUNKS_128TB * 3UL)
> >   #define NR_CHUNKS_3840TB  (NR_CHUNKS_128TB * 30UL)
> > +#define NR_CHUNKS_3968TB  (NR_CHUNKS_128TB * 31UL)
> >   #define ADDR_MARK_128TB  (1UL << 47) /* First address beyond 128TB */
> >   #define ADDR_MARK_256TB  (1UL << 48) /* First address beyond 256TB */
> > @@ -59,6 +60,11 @@
> >   #define HIGH_ADDR_SHIFT 49
> >   #define NR_CHUNKS_LOW   NR_CHUNKS_256TB
> >   #define NR_CHUNKS_HIGH  NR_CHUNKS_3840TB
> > +#elif defined(__PPC64__)
> > +#define HIGH_ADDR_MARK  ADDR_MARK_128TB
> > +#define HIGH_ADDR_SHIFT 47
> > +#define NR_CHUNKS_LOW   NR_CHUNKS_128TB
> > +#define NR_CHUNKS_HIGH  NR_CHUNKS_3968TB
> >   #else
> >   #define HIGH_ADDR_MARK  ADDR_MARK_128TB
> >   #define HIGH_ADDR_SHIFT 48
> > 
> > 
> > With this the test is passing.
> 
> Ah okay, this was the problem: PPC got extended to 52 bits and the
> test was not updated. This is the correct fix, you can go ahead
> with this one.


Thanks Dev
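
For the archives, the arithmetic behind the new define, assuming the test's
MAP_CHUNK_SIZE is still 16GB: a 52-bit VA space is 4096TB, the low 128TB is
mapped without hints, and the remaining 3968TB (31 * 128TB) is covered by
hinted chunks. A quick sanity check:

    #include <stdio.h>

    #define SZ_1TB          (1UL << 40)
    #define MAP_CHUNK_SIZE  (1UL << 34)  /* 16GB, as in the selftest */

    int main(void)
    {
            unsigned long chunks_128tb = (128 * SZ_1TB) / MAP_CHUNK_SIZE;

            /* 52-bit VA: 4096TB total, 128TB low (no hint), 3968TB hinted */
            printf("NR_CHUNKS_128TB  = %lu\n", chunks_128tb);         /* 8192   */
            printf("NR_CHUNKS_3968TB = %lu\n", chunks_128tb * 31UL);  /* 253952 */
            return 0;
    }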
 
> > 
> > 
> > 
> > > It should be quite easy to trace which function is failing. Can you
> > > please do some debugging for me? Otherwise I will have to go ahead
> > > with setting up a PPC VM and testing myself :)
> > > 
> > > > > > Can we fix it this way?
> 
