On 14/04/2026 06:09, Dev Jain wrote:
> 
> 
> On 14/04/26 12:57 am, David Hildenbrand (Arm) wrote:
>> On 4/10/26 16:30, Dev Jain wrote:
>>> The original version of mremap_test (7df666253f26: "kselftests: vm: add
>>> mremap tests") validated remapped contents byte-by-byte and printed a
>>> mismatch index in case the bytes streams are not equal. That made
>>> validation expensive in both cases: for "no mismatch" (the common case when
>>> mremap is not buggy), it still walked all bytes in C; for "mismatch", it
>>> broke out of the loop after printing the mismatch index.
>>>
>>> Later, my commit 7033c6cc9620 ("selftests/mm: mremap_test: optimize
>>> execution time from minutes to seconds using chunkwise memcmp") tried to
>>> optimize both cases by using chunk-wise memcmp() and only scanning bytes
>>> within a range which has been determined by memcmp as mismatching.
>>>
>>> But get_sqrt() in that commit is buggy: `high = mid - 1` is applied
>>> unconditionally. This makes the speed of checking the mismatch index
>>> suboptimal.
>>
>> So is that the only problem with 7033c6cc9620: the speed?
> 
> Yes.
> 
> I'll explain the algorithm in 7033c6cc9620.
> 
> The problem statement is: given two buffers of equal length n, find the
> first mismatch index.
> 
> Algorithm: Divide the buffers into sqrt(n) chunks. Do a memcmp() over
> each chunk. If all of them succeed, the buffers are equal, giving the
> result in O(sqrt(n)) * t, where t = time taken by memcmp().
> 
> Otherwise, worst case is that we find the mismatch in the last chunk.
> Now brute-force iterate this chunk to find the mismatch. Since chunk
> size is sqrt(n), complexity is again
> sqrt(n) * t + sqrt(n) = O(sqrt(n)) * t.
> 
> So if get_sqrt() computes a wrong square root, we lose this time
> complexity.
> 
> Maybe there is an optimal value of x = #number of chunks of the buffer,
> which may not be sqrt(n).
> 
> But given the information we have, a CS course on algorithms will
> say this is one of the optimal ways to do it.
> 
>>
>>>
>>> The mismatch index does not provide useful debugging value here: if
>>> validation fails, we know mremap behavior is wrong, and the specific byte
>>> offset does not make root-causing easier.
>>
>> Fully agreed.
>>
>>>
>>> So instead of fixing get_sqrt(), bite the bullet, drop mismatch index
>>> scanning and just compare the two byte streams with memcmp().
>>
>> How does this affect the execution time of the test?
> 
> I just checked with ./mremap_test -t 0, the variance is very high on my
> system.
> 
> In the common case of the test passing:
> 
> before patch, there are multiple sub-length calls to memcmp.
> after patch, there is a single full-length call to memcmp.
> 
> So the time should reduce but may not be very distinguishable.

My intuition would be the opposite; if you hafve a 4096 byte buffer, I would
have thought that a single memcmp would be significantly faster than sqrt(4096)
= 64 calls, each over 64 bytes.

If you want to keep the common case fast, but also find the first differing
offset on failure, I expect you can exploit the fact that the buffers are all
page aligned. With some prompting, Codex gave me this:


---8<---
static size_t first_mismatch_offset(const void *buf1, const void *buf2,
                                    size_t len)
{
        const uint64_t *ptr1 = buf1;
        const uint64_t *ptr2 = buf2;
        size_t word;
        size_t words = len / sizeof(*ptr1);

        assert(!((uintptr_t)buf1 & (sizeof(*ptr1) - 1)));
        assert(!((uintptr_t)buf2 & (sizeof(*ptr2) - 1)));
        assert(!(len & (sizeof(*ptr1) - 1)));

        if (!memcmp(buf1, buf2, len))
                return len;

        for (word = 0; word < words; word++) {
                if (ptr1[word] != ptr2[word]) {
                        const unsigned char *bytes1 =
                                (const unsigned char *)&ptr1[word];
                        const unsigned char *bytes2 =
                                (const unsigned char *)&ptr2[word];
                        size_t i;

                        for (i = 0; i < sizeof(*ptr1); i++) {
                                if (bytes1[i] != bytes2[i])
                                        return word * sizeof(*ptr1) + i;
                        }
                }
        }

        return len;
}
---8<---

I've not benchmarked it though...

Thanks,
Ryan



> 
>>
>>>
>>> Reported-by: Sarthak Sharma <[email protected]>
>>> Signed-off-by: Dev Jain <[email protected]>
>>
>> Fixes: 7033c6cc9620 ("selftests/mm: mremap_test: optimize execution time
>> from minutes to seconds using chunkwise memcmp")
>>
>> ?
> 
> Not needed. 7033c6cc9620 does not create any incorrectness in the checking
> of mismatch index.
> 
>>
>>> ---
>>> Sorry for sending two patchsets the same day - the problem was made known
>>> to me today, and I couldn't help myself but fix it immediately, imagine
>>> my embarrassment when I found out that I made a typo in the binary search
>>> code which I had been writing consistently throughout college :)
>>
>> :)
>>
>>>
>>> Applies on mm-unstable.
>>>
>>>  tools/testing/selftests/mm/mremap_test.c | 109 +++--------------------
>>>  1 file changed, 10 insertions(+), 99 deletions(-)
>>
>> I mean, it certainly looks like a nice cleanup.
>>
> 


Reply via email to