On 14/04/26 3:17 pm, David Laight wrote:
> On Tue, 14 Apr 2026 10:01:57 +0200
> "David Hildenbrand (Arm)" <[email protected]> wrote:
> 
>> On 4/14/26 07:09, Dev Jain wrote:
>>>
>>>
>>> On 14/04/26 12:57 am, David Hildenbrand (Arm) wrote:  
>>>> On 4/10/26 16:30, Dev Jain wrote:  
>>>>> The original version of mremap_test (7df666253f26: "kselftests: vm: add
>>>>> mremap tests") validated remapped contents byte-by-byte and printed a
>>>>> mismatch index in case the bytes streams are not equal. That made
>>>>> validation expensive in both cases: for "no mismatch" (the common case 
>>>>> when
>>>>> mremap is not buggy), it still walked all bytes in C; for "mismatch", it
>>>>> broke out of the loop after printing the mismatch index.
>>>>>
>>>>> Later, my commit 7033c6cc9620 ("selftests/mm: mremap_test: optimize
>>>>> execution time from minutes to seconds using chunkwise memcmp") tried to
>>>>> optimize both cases by using chunk-wise memcmp() and only scanning bytes
>>>>> within a range which has been determined by memcmp as mismatching.
>>>>>
>>>>> But get_sqrt() in that commit is buggy: `high = mid - 1` is applied
>>>>> unconditionally. This makes the speed of checking the mismatch index
>>>>> suboptimal.  
>>>>
>>>> So is that the only problem with 7033c6cc9620: the speed?  
>>>
>>> Yes.
>>>
>>> I'll explain the algorithm in 7033c6cc9620.
>>>
>>> The problem statement is: given two buffers of equal length n, find the
>>> first mismatch index.
>>>
>>> Algorithm: Divide the buffers into sqrt(n) chunks. Do a memcmp() over
>>> each chunk. If all of them succeed, the buffers are equal, giving the
>>> result in O(sqrt(n)) * t, where t = time taken by memcmp().
>>>
>>> Otherwise, worst case is that we find the mismatch in the last chunk.
>>> Now brute-force iterate this chunk to find the mismatch. Since chunk
>>> size is sqrt(n), complexity is again
>>> sqrt(n) * t + sqrt(n) = O(sqrt(n)) * t.
>>>
>>> So if get_sqrt() computes a wrong square root, we lose this time
>>> complexity.  
>>
>> Ah, thanks for clarifying.
>>
>>>
>>> Maybe there is an optimal value of x = #number of chunks of the buffer,
>>> which may not be sqrt(n).
>>>
>>> But given the information we have, a CS course on algorithms will
>>> say this is one of the optimal ways to do it.
>>>   
>>>>  
>>>>>
>>>>> The mismatch index does not provide useful debugging value here: if
>>>>> validation fails, we know mremap behavior is wrong, and the specific byte
>>>>> offset does not make root-causing easier.  
>>>>
>>>> Fully agreed.
>>>>  
>>>>>
>>>>> So instead of fixing get_sqrt(), bite the bullet, drop mismatch index
>>>>> scanning and just compare the two byte streams with memcmp().  
>>>>
>>>> How does this affect the execution time of the test?  
>>>
>>> I just checked with ./mremap_test -t 0, the variance is very high on my
>>> system.
>>>
>>> In the common case of the test passing:
>>>
>>> before patch, there are multiple sub-length calls to memcmp.
>>> after patch, there is a single full-length call to memcmp.
>>>
>>> So the time should reduce but may not be very distinguishable.  
>>
>> Okay, so doesn't matter. I agree that we should simplify all that.
>>
>> The exact index is irrelevant. Whoever wants to debug the test failure
>> could modify the test to find that out. It's one of the tests we don't
>> really expect to fail (often).
>>
>>>   
>>>>  
>>>>>
>>>>> Reported-by: Sarthak Sharma <[email protected]>
>>>>> Signed-off-by: Dev Jain <[email protected]>  
>>>>
>>>> Fixes: 7033c6cc9620 ("selftests/mm: mremap_test: optimize execution time
>>>> from minutes to seconds using chunkwise memcmp")
>>>>
>>>> ?  
>>>
>>> Not needed. 7033c6cc9620 does not create any incorrectness in the checking
>>> of mismatch index.  
>>
>> Yes, agreed.
>>
>>
>> I would suggest to rewrite/simplify/clarify the patch description, not
>> talking about "buggy" etc, focusing on the simplification.
>>
>> "
>> The original version of mremap_test (7df666253f26: "kselftests: vm: add
>> mremap tests") validated remapped contents byte-by-byte and printed a
>> mismatch index in case the bytes streams didn't match. That was rather
>> inefficient, especially also if the test passed.
>>
>> Later, commit 7033c6cc9620 ("selftests/mm: mremap_test: optimize
>> execution time from minutes to seconds using chunkwise memcmp") used
>> memcmp() on bigger chunks, to fallback to byte-wise scanning to detect
>> the problematic index only if it discovered a problem.
>>
>> However, the implementation is overly complicated (e.g., get_sqrt() is
>> currently not optimal) and we don't really have to report the exact
>> index: whoever debugs the failing test can figure that out.
>>
>> Let's simplify by just comparing both byte streams with memcmp() and not
>> detecting the exact failed index.
>> "
> 
> ISTM that if you get a failure it doesn't really matter how long a
> second scan takes.
> So a simple byte loop that reports the offset and data of the first
> error and counts the number of errors is more than enough.

You are right in saying that it won't matter how long the second scan
takes, but it also doesn't matter what is the mismatch offset : ) so
let us remove some code while we can!

> 
>       David
> 
> 


Reply via email to