On 14 Apr 2026, at 6:38, David Hildenbrand (Arm) wrote:

> On 4/13/26 21:20, Zi Yan wrote:
>> This check ensures the correctness of collapsing read-only THPs for FSes
>> after READ_ONLY_THP_FOR_FS is enabled by default for all FSes supporting
>> PMD THP pagecache.
>>
>> READ_ONLY_THP_FOR_FS only supports read-only fds and uses mapping->nr_thps
>> and inode->i_writecount to prevent any write to read-only to-be-collapsed
>> folios. In upcoming commits, READ_ONLY_THP_FOR_FS will be removed and the
>> aforementioned mechanism will go away too. To ensure khugepaged functions
>> as expected after the changes, roll back if any folio is dirty after
>> try_to_unmap_flush(), since a dirty folio means this read-only folio
>> received writes via mmap between try_to_unmap() and try_to_unmap_flush()
>> through cached TLB entries, and khugepaged does not support collapsing
>> writable pagecache folios.
>>
>> Signed-off-by: Zi Yan <[email protected]>
>> ---
>>  mm/khugepaged.c | 18 ++++++++++++++++++
>>  1 file changed, 18 insertions(+)
>>
>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>> index d2f0acd2dac2..ec609e53082e 100644
>> --- a/mm/khugepaged.c
>> +++ b/mm/khugepaged.c
>> @@ -2121,6 +2121,24 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
>>       */
>>      try_to_unmap_flush();
>>
>> +    /*
>> +     * At this point, all folios are locked, unmapped, and all cached
>> +     * mappings in TLBs are flushed. No one else is able to write to these
>> +     * folios, since
>> +     * 1. writes via FS ops require folio locks (see write_begin_get_folio());
>> +     * 2. writes via mmap require taking a fault and locking folio locks.
>> +     *
>
> maybe simplify to "folios, since that would require taking the folio lock 
> first."

Sure.

>
>> +     * khugepaged only works for read-only fd, make sure all folios are
>> +     * clean, since writes via mmap can happen between try_to_unmap() and
>> +     * try_to_unmap_flush() via cached TLB entries.
>
>
> IIRC, after successful try_to_unmap() the PTE dirty bit would be synced to the
> folio. That's what you care about, not about any stale TLB entries.

Right. I missed the PTE dirty bit to folio dirtiness part.

>
> The important part is that the PTE dirty bit has already been
> transferred to the folio by the time the folio is unmapped.
>
> So can't we simply test for dirty folios after the refcount check (where
> we made sure the folio is no longer mapped)?
>

I think so.

>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index b2ac28ddd480..920e16067134 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -2089,6 +2089,14 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
>                         goto out_unlock;
>                 }
>
> +               /* ... */
> +               if (!is_shmem && folio_test_dirty(folio)) {
> +                       result = SCAN_PAGE_DIRTY_OR_WRITEBACK;
> +                       xas_unlock_irq(&xas);
> +                       folio_putback_lru(folio);
> +                       goto out_unlock;
> +               }
> +
>                 /*
>                  * Accumulate the folios that are being collapsed.
>
>
> I guess we don't have to recheck folio_test_writeback() ?

Right, writeback needs to take the folio lock and we are holding it, so
others can only dirty the folio but cannot initiate writeback.

Best Regards,
Yan, Zi
