On 2020-09-17 17:27, HORIGUCHI NAOYA wrote:
Sorry, I modified the patches based on a different assumption from yours.
I first thought of taking the page off after confirming that the error page
has been freed back to buddy. This approach leaves open the possibility of
reusing the error page (which is acceptable), but it is simpler and less invasive.

Your approach removes the error page from the page allocator's control at
free time. It leaves no possibility of reusing the error page, but the changes
are tightly coupled with the page free code.

This is a tradeoff between complexity and completeness of soft offline.
Now I'm not sure I could insist on my own opinion without providing
working code, so it's OK for me to take yours.

Yeah, you are right, it is a tradeoff.
I would suggest taking this path now, and if it proves to be problematic in some way, we can always
do the following:

free_page
 take_it_off_buddy
  OK: mark it as hwpoison and increment refcount
  NOT_OK (raced with allocation): oops, sorry
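
To make the fallback sequence above concrete, here is a rough userspace sketch of it in C. This is only a toy model under my own assumptions: struct toy_page, toy_free_page(), toy_take_off_buddy(), toy_alloc_page() and toy_soft_offline() are hypothetical names for illustration, not kernel APIs and not code from the patches, and the race is simulated with a flag rather than real concurrency.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a page; the fields are assumptions, not struct page. */
struct toy_page {
	bool in_buddy;	/* page sits on the allocator's free list */
	bool hwpoison;	/* page is marked as containing a memory error */
	int refcount;	/* 0 while free, > 0 while someone holds it */
};

enum offline_result { OFFLINE_OK, OFFLINE_RACED };

/* Step 1: drop the page back onto the (toy) free list. */
static void toy_free_page(struct toy_page *p)
{
	p->refcount = 0;
	p->in_buddy = true;
}

/* Stand-in for a concurrent allocation grabbing the page. */
static void toy_alloc_page(struct toy_page *p)
{
	p->in_buddy = false;
	p->refcount = 1;
}

/* Step 2: pull the page off the free list; fails if it is no longer there. */
static bool toy_take_off_buddy(struct toy_page *p)
{
	if (!p->in_buddy)
		return false;
	assert(p->refcount == 0);	/* a free page has no users */
	p->in_buddy = false;
	return true;
}

/* free_page -> take_it_off_buddy -> OK / NOT_OK, as in the outline above. */
static enum offline_result toy_soft_offline(struct toy_page *p, bool simulate_race)
{
	toy_free_page(p);
	if (simulate_race)
		toy_alloc_page(p);	/* someone allocated it in between */

	if (!toy_take_off_buddy(p))
		return OFFLINE_RACED;	/* NOT_OK: oops, sorry */

	p->hwpoison = true;		/* OK: mark it as hwpoison ... */
	p->refcount++;			/* ... and pin it so it is never reused */
	return OFFLINE_OK;
}
```

In the raced case the sketch leaves the page untouched in the hands of its new owner, which is the "possibility of reusing the error page" mentioned earlier in the thread.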

The test passed in my environment, so this is fine.

Thanks for trying it out.


If they do, I will try to see if Andrew can squeeze the above changes into [1],
where they belong.

Yes, proposing the fix for mmhwpoison-rework-soft-offline-for-in-use-pages.patch
seems fine to me.

Again, sorry for modifying code without asking.

No worries, I will do a couple of tests on my own and then I will talk to Andrew to see if we can squeeze the changes in there.

