On 26/06/25 11:12 am, Donet Tom wrote:
On Thu, Jun 26, 2025 at 09:27:30AM +0530, Dev Jain wrote:
On 25/06/25 10:47 pm, Donet Tom wrote:
On Wed, Jun 25, 2025 at 06:22:53PM +0530, Dev Jain wrote:
On 19/06/25 1:53 pm, Donet Tom wrote:
On Wed, Jun 18, 2025 at 08:13:54PM +0530, Dev Jain wrote:
On 18/06/25 8:05 pm, Lorenzo Stoakes wrote:
On Wed, Jun 18, 2025 at 07:47:18PM +0530, Dev Jain wrote:
On 18/06/25 7:37 pm, Lorenzo Stoakes wrote:
On Wed, Jun 18, 2025 at 07:28:16PM +0530, Dev Jain wrote:
On 18/06/25 5:27 pm, Lorenzo Stoakes wrote:
On Wed, Jun 18, 2025 at 05:15:50PM +0530, Dev Jain wrote:
Are you accounting for sys.max_map_count? If not, then you'll be hitting that
first.
run_vmtests.sh will run the test in overcommit mode so that won't be an issue.
Umm, what? You mean overcommit all mode, and that has no bearing on the max
mapping count check.
In do_mmap():
/* Too many mappings? */
if (mm->map_count > sysctl_max_map_count)
return -ENOMEM;
As well as numerous other checks in mm/vma.c.
Ah sorry, didn't look at the code properly just assumed that overcommit_always
meant overriding
this.
No problem! It's hard to be aware of everything in mm :)
I'm not sure why an overcommit toggle is even necessary when you could use
MAP_NORESERVE or simply map PROT_NONE to avoid the OVERCOMMIT_GUESS limits?
I'm pretty confused as to what this test is really achieving honestly. This
isn't a useful way of asserting mmap() behaviour as far as I can tell.
Well, seems like a useful way to me at least : ) Not sure if you are in the mood
to discuss that but if you'd like me to explain from start to end what the test
is doing, I can do that : )
I just don't have time right now, I guess I'll have to come back to it
later... it's not the end of the world for it to be iffy in my view as long as
it passes, but it might just not be of great value.
Philosophically I'd rather we didn't assert internal implementation details like
where we place mappings in userland memory. At no point do we promise to not
leave larger gaps if we feel like it :)
You have a fair point. Anyhow a debate for another day.
I'm guessing, reading more, the _real_ test here is some mathematical assertion
about layout from HIGH_ADDR_SHIFT -> end of address space when using hints.
But again I'm not sure that achieves much and again also is asserting internal
implementation details.
Correct behaviour of this kind of thing probably better belongs to tests in the
userland VMA testing I'd say.
Sorry I don't mean to do down work you've done before, just giving an honest
technical appraisal!
Nah, it will be rather hilarious to see it all go down the drain xD
Anyway don't let this block work to fix the test if it's failing. We can revisit
this later.
Sure. @Aboorva and Donet, I still believe that the correct approach is to elide
the gap check at the crossing boundary. What do you think?
One problem I am seeing with this approach is that, since the hint address
is generated randomly, the VMAs are also being created at randomly based on
the hint address.So, for the VMAs created at high addresses, we cannot guarantee
that the gaps between them will be aligned to MAP_CHUNK_SIZE.
High address VMAs
-----------------
1000000000000-1000040000000 r--p 00000000 00:00 0
2000000000000-2000040000000 r--p 00000000 00:00 0
4000000000000-4000040000000 r--p 00000000 00:00 0
8000000000000-8000040000000 r--p 00000000 00:00 0
e80009d260000-fffff9d260000 r--p 00000000 00:00 0
I have a different approach to solve this issue.
From 0 to 128TB, we map memory directly without using any hint. For the
range above
256TB up to 512TB, we perform the mapping using hint addresses. In the current
test,
we use random hint addresses, but I have modified it to generate hint addresses
linearly
starting from 128TB.
With this change:
The 0–128TB range is mapped without hints and verified accordingly.
The 128TB–512TB range is mapped using linear hint addresses and then verified.
Below are the VMAs obtained with this approach:
10000000-10010000 r-xp 00000000 fd:05 135019531
10010000-10020000 r--p 00000000 fd:05 135019531
10020000-10030000 rw-p 00010000 fd:05 135019531
20000000-10020000000 r--p 00000000 00:00 0
10020800000-10020830000 rw-p 00000000 00:00 0
1004bcf0000-1004c000000 rw-p 00000000 00:00 0
1004c000000-7fff8c000000 r--p 00000000 00:00 0
7fff8c130000-7fff8c360000 r-xp 00000000 fd:00 792355
7fff8c360000-7fff8c370000 r--p 00230000 fd:00 792355
7fff8c370000-7fff8c380000 rw-p 00240000 fd:00 792355
7fff8c380000-7fff8c460000 r-xp 00000000 fd:00 792358
7fff8c460000-7fff8c470000 r--p 000d0000 fd:00 792358
7fff8c470000-7fff8c480000 rw-p 000e0000 fd:00 792358
7fff8c490000-7fff8c4d0000 r--p 00000000 00:00 0
7fff8c4d0000-7fff8c4e0000 r-xp 00000000 00:00 0
7fff8c4e0000-7fff8c530000 r-xp 00000000 fd:00 792351
7fff8c530000-7fff8c540000 r--p 00040000 fd:00 792351
7fff8c540000-7fff8c550000 rw-p 00050000 fd:00 792351
7fff8d000000-7fffcd000000 r--p 00000000 00:00 0
7fffe9c80000-7fffe9d90000 rw-p 00000000 00:00 0
800000000000-2000000000000 r--p 00000000 00:00 0 -> High Address (128TB to
512TB)
diff --git a/tools/testing/selftests/mm/virtual_address_range.c
b/tools/testing/selftests/mm/virtual_address_range.c
index 4c4c35eac15e..0be008cba4b0 100644
--- a/tools/testing/selftests/mm/virtual_address_range.c
+++ b/tools/testing/selftests/mm/virtual_address_range.c
@@ -56,21 +56,21 @@
#ifdef __aarch64__
#define HIGH_ADDR_MARK ADDR_MARK_256TB
-#define HIGH_ADDR_SHIFT 49
+#define HIGH_ADDR_SHIFT 48
#define NR_CHUNKS_LOW NR_CHUNKS_256TB
#define NR_CHUNKS_HIGH NR_CHUNKS_3840TB
#else
#define HIGH_ADDR_MARK ADDR_MARK_128TB
-#define HIGH_ADDR_SHIFT 48
+#define HIGH_ADDR_SHIFT 47
#define NR_CHUNKS_LOW NR_CHUNKS_128TB
#define NR_CHUNKS_HIGH NR_CHUNKS_384TB
#endif
-static char *hint_addr(void)
+static char *hint_addr(int hint)
{
- int bits = HIGH_ADDR_SHIFT + rand() % (63 - HIGH_ADDR_SHIFT);
+ unsigned long addr = ((1UL << HIGH_ADDR_SHIFT) + (hint *
MAP_CHUNK_SIZE));
- return (char *) (1UL << bits);
+ return (char *) (addr);
}
static void validate_addr(char *ptr, int high_addr)
@@ -217,7 +217,7 @@ int main(int argc, char *argv[])
}
for (i = 0; i < NR_CHUNKS_HIGH; i++) {
- hint = hint_addr();
+ hint = hint_addr(i);
hptr[i] = mmap(hint, MAP_CHUNK_SIZE, PROT_READ,
MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
Ah you sent it here, thanks. This is fine really, but the mystery is
something else.
Thanks Dev
I can send out v2 with this patch included, right?
Sorry not yet :) this patch will just hide the real problem, which
is, after the hint addresses get exhausted, why on ppc the kernel
cannot find a VMA to install despite having such large gaps between
VMAs.
I think there is some confusion here, so let me clarify.
On PowerPC, mmap is able to find VMAs both with and without a hint.
There is no issue there. If you look at the test, from 0 to 128TB we
are mapping without any hint, and the VMAs are getting created as
expected.
Above 256TB, we are mapping with random hint addresses, and with
those hints, all VMAs are being created above 258TB. No mmap call
is failing in this case.
The problem is with the test itself: since we are providing random
hint addresses, the VMAs are also being created at random locations.
Below is the VMAs created with hint addreess
1. 256TB hint address
1000000000000-1000040000000 r--p 00000000 00:00 0
[anon:virtual_address_range]
2. 512TB hint address
2000000000000-2000040000000 r--p 00000000 00:00 0
[anon:virtual_address_range]
3. 1024TB Hint address
4000000000000-4000040000000 r--p 00000000 00:00 0
[anon:virtual_address_range]
4. 2048TB hint Address
8000000000000-8000040000000 r--p 00000000 00:00 0
[anon:virtual_address_range]
5. above 3096 Hint address
eb95410220000-fffff90220000 r--p 00000000 00:00 0
[anon:virtual_address_range].
We support up to 4PB, and since the hint addresses are random,
the VMAs are created at random locations.
I am still not getting you. What happens on my system (and what should
happen) is, suppose after you get the VMA layout you described above,
you again do an mmap with a random high addr hint. Suppose that the hint
addr is 1 << 50. Then, mmap will see that there already is a VMA there. So,
we will fall back to top down allocation, and the fifth VMA described
above will get expanded downwards. This will keep happening till the gap
between the fifth VMA start and fourth VMA end is less than MAP_CHUNK_SIZE.
Then, mmap will extend the fourth VMA downwards, and so on. Eventually
all the gaps will be less than MAP_CHUNK_SIZE.
With sequential hint addresses from 128TB to 512TB, we provide the
hint addresses in order, and the VMAs are created at the hinted
addresses.
Within 512TB, we were able to test both high and low addresses, so
I thought sequential hinting would be a good approach. Since there
has been a lot of confusion, I’m considering adding a complete 4PB
allocation test — from 0 to 128TB we allocate without any hint, and
from 128TB onward we use sequential hint addresses.
diff --git a/tools/testing/selftests/mm/virtual_address_range.c
b/tools/testing/selftests/mm/virtual_address_range.c
index e24c36a39f22..f2009d23f8b2 100644
--- a/tools/testing/selftests/mm/virtual_address_range.c
+++ b/tools/testing/selftests/mm/virtual_address_range.c
@@ -50,6 +50,7 @@
#define NR_CHUNKS_256TB (NR_CHUNKS_128TB * 2UL)
#define NR_CHUNKS_384TB (NR_CHUNKS_128TB * 3UL)
#define NR_CHUNKS_3840TB (NR_CHUNKS_128TB * 30UL)
+#define NR_CHUNKS_3968TB (NR_CHUNKS_128TB * 31UL)
#define ADDR_MARK_128TB (1UL << 47) /* First address beyond 128TB */
#define ADDR_MARK_256TB (1UL << 48) /* First address beyond 256TB */
@@ -59,6 +60,11 @@
#define HIGH_ADDR_SHIFT 49
#define NR_CHUNKS_LOW NR_CHUNKS_256TB
#define NR_CHUNKS_HIGH NR_CHUNKS_3840TB
+#elif defined(__PPC64__)
+#define HIGH_ADDR_MARK ADDR_MARK_128TB
+#define HIGH_ADDR_SHIFT 47
+#define NR_CHUNKS_LOW NR_CHUNKS_128TB
+#define NR_CHUNKS_HIGH NR_CHUNKS_3968TB
#else
#define HIGH_ADDR_MARK ADDR_MARK_128TB
#define HIGH_ADDR_SHIFT 48
With this the test is passing.
It should be quite easy to trace which function is failing. Can you
please do some debugging for me? Otherwise I will have to go ahead
with setting up a PPC VM and testing myself :)
Can we fix it this way?