On Mon, Nov 17, 2025 at 11:16:53AM -0700, Nico Pache wrote:
>On Sat, Nov 8, 2025 at 7:40 PM Wei Yang <[email protected]> wrote:
>>
>> On Wed, Oct 22, 2025 at 12:37:15PM -0600, Nico Pache wrote:
>> >There are cases where, if an attempted collapse fails, all subsequent
>> >orders are guaranteed to also fail. Avoid these collapse attempts by
>> >bailing out early.
>> >
>> >Signed-off-by: Nico Pache <[email protected]>
>> >---
>> > mm/khugepaged.c | 31 ++++++++++++++++++++++++++++++-
>> > 1 file changed, 30 insertions(+), 1 deletion(-)
>> >
>> >diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>> >index e2319bfd0065..54f5c7888e46 100644
>> >--- a/mm/khugepaged.c
>> >+++ b/mm/khugepaged.c
>> >@@ -1431,10 +1431,39 @@ static int collapse_scan_bitmap(struct mm_struct 
>> >*mm, unsigned long address,
>> >                       ret = collapse_huge_page(mm, address, referenced,
>> >                                                unmapped, cc, mmap_locked,
>> >                                                order, offset);
>> >-                      if (ret == SCAN_SUCCEED) {
>> >+
>> >+                      /*
>> >+                       * Analyze failure reason to determine next action:
>> >+                       * - goto next_order: try smaller orders in same 
>> >region
>> >+                       * - continue: try other regions at same order
>> >+                       * - break: stop all attempts (system-wide failure)
>> >+                       */
>> >+                      switch (ret) {
>> >+                      /* Cases were we should continue to the next region 
>> >*/
>> >+                      case SCAN_SUCCEED:
>> >                               collapsed += 1UL << order;
>> >+                              fallthrough;
>> >+                      case SCAN_PTE_MAPPED_HUGEPAGE:
>> >                               continue;
>> >+                      /* Cases were lower orders might still succeed */
>> >+                      case SCAN_LACK_REFERENCED_PAGE:
>> >+                      case SCAN_EXCEED_NONE_PTE:
>> >+                      case SCAN_EXCEED_SWAP_PTE:
>> >+                      case SCAN_EXCEED_SHARED_PTE:
>> >+                      case SCAN_PAGE_LOCK:
>> >+                      case SCAN_PAGE_COUNT:
>> >+                      case SCAN_PAGE_LRU:
>> >+                      case SCAN_PAGE_NULL:
>> >+                      case SCAN_DEL_PAGE_LRU:
>> >+                      case SCAN_PTE_NON_PRESENT:
>> >+                      case SCAN_PTE_UFFD_WP:
>> >+                      case SCAN_ALLOC_HUGE_PAGE_FAIL:
>> >+                              goto next_order;
>> >+                      /* All other cases should stop collapse attempts */
>> >+                      default:
>> >+                              break;
>> >                       }
>> >+                      break;
>>
>> One question here:
>
>Hi Wei Yang,
>
>Sorry I forgot to get back to this email.
>

No problem, thanks for taking a look.

>>
>> Suppose we have iterated several orders and not collapse successfully yet. So
>> the mthp_bitmap_stack[] would look like this:
>>
>> [8 7 6 6]
>>        ^
>>        |
>
>so we always pop before pushing. So it would go
>
>[9]
>pop
>if (collapse fails)
>[8 8]
>lets say we pop and successfully collapse a order 8
>[8]
>Then we fail the other order 8
>[7 7]
>now if we succeed the first order 7
>[7 6 6]
>I believe we are now in the state you wanted to describe.
>
>>
>> Now we found this one pass the threshold check, but it fails with other
>> result.
>
>ok lets say we pass the threshold checks, but the collapse fails for
>any reason that is described in the
>/* Cases were lower orders might still succeed */
>In this case we would continue to order 5 (or lower). Once we are done
>with this branch of the tree we go back to the other order 6 collapse.
>and eventually the order 7.
>
>>
>> Current code looks it would give up at all, but we may still have a chance to
>> collapse the above 3 range?
>
>for cases under /* All other cases should stop collapse attempts */
>Yes we would bail out and skip some collapses. I tried to think about
>all the cases were we would still want to continue trying, vs cases
>where the system is probably out of resources or hitting some major
>failure, and we should just break out (as others will probably fail
>too).
>

Thanks, your explanation is very clear.

>But this is also why I separated this patch out on its own. I was
>hoping to have some more focus on the different cases, and make sure I
>handled them in the best possible way. So I really appreciate the
>question :)
>
>* I did some digging through old message to find this *
>
>I believe these are the remaining cases. If these are hit I figured
>it's better to abort.
>

I agree we need to take care of those cases.

>/* cases where we must stop collapse attempts */
>case SCAN_CGROUP_CHARGE_FAIL:
>case SCAN_COPY_MC:
>case SCAN_ADDRESS_RANGE:
>case SCAN_PMD_NULL:
>case SCAN_ANY_PROCESS:
>case SCAN_VMA_NULL:
>case SCAN_VMA_CHECK:
>case SCAN_SCAN_ABORT:
>case SCAN_PMD_NONE:
>case SCAN_PAGE_ANON:
>case SCAN_PMD_MAPPED:
>case SCAN_FAIL:
>
>Please let me know if you think we should move these to either the
>`continue` or `next order` cases.

Take a look into these cases, it looks good to me now.

Also one of my concern is this coding style is a little hard to maintain. In
case we introduce a new result, we should remember to add it here. Otherwise
we may stop the collapse too early.

While it maybe a separate work after this patch set merged.

>
>Cheers,
>-- Nico
>
>>
>> >               }
>> >
>> > next_order:
>> >--
>> >2.51.0
>>
>> --
>> Wei Yang
>> Help you, Help me
>>

-- 
Wei Yang
Help you, Help me

Reply via email to