collapse_file() is capable of collapsing pagecache folios from writable
files to PMD folios.  Enable collapse of clean pagecache folios from
writable files, in addition to folios from read-only files, by removing
the inode_is_open_for_write() check from file_thp_enabled() and calling
filemap_flush() only if the file is read-only (not open for write).

This means that, for writable files, userspace needs to explicitly flush
dirty pagecache folios before khugepaged can collapse them, or use
madvise(MADV_COLLAPSE), which does the flush on retry.  The reason is
that blindly enabling collapse of dirty pagecache folios from writable
files would make khugepaged flush these folios all the time, and causing
system-level pagecache flushes is undesirable.

To properly support dirty pagecache folio collapse, filemap_flush() needs
to be avoided.  Merging the associated buffers, instead of dropping them
with filemap_release_folio(), might also be needed.

NOTE: this breaks the khugepaged selftest for writable file pagecache
collapse, which expects the collapse to always fail.  The next commit
fixes it.

Signed-off-by: Zi Yan <[email protected]>
Reviewed-by: Lance Yang <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Baolin Wang <[email protected]>
Cc: Barry Song <[email protected]>
Cc: Chris Mason <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: David Hildenbrand (Arm) <[email protected]>
Cc: David Sterba <[email protected]>
Cc: Dev Jain <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Liam Howlett <[email protected]>
Cc: Lorenzo Stoakes <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Nico Pache <[email protected]>
Cc: Ryan Roberts <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Suren Baghdasaryan <[email protected]>
Cc: Vlastimil Babka <[email protected]>
---
 mm/huge_memory.c |  2 +-
 mm/khugepaged.c  | 15 +++++++++------
 2 files changed, 10 insertions(+), 7 deletions(-)
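
The userspace side described in the commit message can be sketched as
below.  This is only an illustration under stated assumptions, not part of
the patch: flush_then_collapse() is a hypothetical helper name, fsync() is
assumed to be an acceptable way to "explicitly flush" the file, and the
fallback MADV_COLLAPSE value is the one from the asm-generic uapi headers,
used only when the libc headers do not provide it.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_COLLAPSE
#define MADV_COLLAPSE 25	/* asm-generic uapi value, Linux 6.1+ */
#endif

/*
 * Hypothetical helper: write back the file's dirty pagecache, then ask
 * the kernel to collapse the mapped range [addr, addr + len).
 *
 * fd must refer to the file that backs the mapping at addr.
 * Returns 0 on success or a negative errno value on failure.
 */
static int flush_then_collapse(int fd, void *addr, size_t len)
{
	/* Make the folios clean so the dirty check no longer rejects them. */
	if (fsync(fd) != 0)
		return -errno;

	/* Request a synchronous, best-effort collapse of the range. */
	if (madvise(addr, len, MADV_COLLAPSE) != 0)
		return -errno;

	return 0;
}
```

A caller that only wants khugepaged to pick the range up later could stop
after the fsync() step; the madvise() call is for callers that want the
collapse attempted immediately.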

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index d055f53be8502..c565b2a651e06 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -97,7 +97,7 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
        if (!mapping_pmd_folio_support(vma->vm_file->f_mapping))
                return false;
 
-       return !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode);
+       return S_ISREG(inode->i_mode);
 }
 
 /* If returns true, we are unable to access the VMA's folios. */
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index c743ec41a7b8b..395c40c24dbc5 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -2342,18 +2342,21 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
                        } else if (folio_test_dirty(folio)) {
                                /*
                                 * This page is dirty because it hasn't
-                                * been flushed since first write. There
-                                * won't be new dirty pages.
+                                * been flushed since first write.
                                 *
-                                * Trigger async flush here and hope the
-                                * writeback is done when khugepaged
-                                * revisits this page.
+                                * Trigger async flush for read-only files and
+                                * hope the writeback is done when khugepaged
+                                * revisits this page. Writable files can have
+                                * their folios dirty at any time; blindly
+                                * flushing them would cause undesirable
+                                * system-wide writeback.
                                 *
                                 * This is a one-off situation. We are not
                                 * forcing writeback in loop.
                                 */
                                xas_unlock_irq(&xas);
-                               filemap_flush(mapping);
+                               if (!inode_is_open_for_write(mapping->host))
+                                       filemap_flush(mapping);
                                result = SCAN_PAGE_DIRTY_OR_WRITEBACK;
                                goto xa_unlocked;
                        } else if (folio_test_writeback(folio)) {
-- 
2.53.0

