Re: [RFC PATCH v3 2/2] mm: remove extra ZONE_DEVICE struct page refcount
On 10/7/20 10:17 PM, Ram Pai wrote:
> On Thu, Oct 01, 2020 at 11:17:15AM -0700, Ralph Campbell wrote:
>> ZONE_DEVICE struct pages have an extra reference count that complicates the
>> code for put_page() and several places in the kernel that need to check the
>> reference count to see that a page is not being used (gup, compaction,
>> migration, etc.). Clean up the code so the reference count doesn't need to
>> be treated specially for ZONE_DEVICE.
>
> I was hoping this patch would resolve a page-reference issue that we run
> into at random times while migrating a page to a device page backed by
> secure-memory. Unfortunately I continue to see the problem. There is a
> reference held on that page, which fails the migration.
>
> FYI
> RP

I'm willing to look into it but I would need more information.
Can you give any more details about the conditions when it happens?
It would be great if you have a program that reproduces the problem.
Re: [RFC PATCH v3 2/2] mm: remove extra ZONE_DEVICE struct page refcount
On Thu, Oct 01, 2020 at 11:17:15AM -0700, Ralph Campbell wrote:
> ZONE_DEVICE struct pages have an extra reference count that complicates the
> code for put_page() and several places in the kernel that need to check the
> reference count to see that a page is not being used (gup, compaction,
> migration, etc.). Clean up the code so the reference count doesn't need to
> be treated specially for ZONE_DEVICE.

I was hoping this patch would resolve a page-reference issue that we run
into at random times while migrating a page to a device page backed by
secure-memory. Unfortunately I continue to see the problem. There is a
reference held on that page, which fails the migration.

FYI
RP
Re: [RFC PATCH v3 2/2] mm: remove extra ZONE_DEVICE struct page refcount
On 10/1/20 10:59 PM, Christoph Hellwig wrote:
> On Thu, Oct 01, 2020 at 11:17:15AM -0700, Ralph Campbell wrote:
>> ZONE_DEVICE struct pages have an extra reference count that complicates the
>> code for put_page() and several places in the kernel that need to check the
>> reference count to see that a page is not being used (gup, compaction,
>> migration, etc.). Clean up the code so the reference count doesn't need to
>> be treated specially for ZONE_DEVICE.
>>
>> Signed-off-by: Ralph Campbell
>
> Looks good,
>
> Reviewed-by: Christoph Hellwig

Thanks for the review.

I still have reservations about making this an official patch.
Did you see the updated cover letter?
Basically, I'm concerned about ZONE_DEVICE struct pages being inserted into
the process page table with a zero reference count with vmf_insert_mixed().
If it is to be a non-zero reference count, then DAX, pmem, and other uses of
ZONE_DEVICE pages need to be changed (or vmf_insert_mixed()) to inc/dec in
appropriate places but I don't feel I know that code well enough to make
those changes.
Re: [RFC PATCH v3 2/2] mm: remove extra ZONE_DEVICE struct page refcount
On Thu, Oct 01, 2020 at 11:17:15AM -0700, Ralph Campbell wrote:
> ZONE_DEVICE struct pages have an extra reference count that complicates the
> code for put_page() and several places in the kernel that need to check the
> reference count to see that a page is not being used (gup, compaction,
> migration, etc.). Clean up the code so the reference count doesn't need to
> be treated specially for ZONE_DEVICE.
>
> Signed-off-by: Ralph Campbell

Looks good,

Reviewed-by: Christoph Hellwig
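
For readers following the thread, the special case that the patch description refers to can be illustrated with a small user-space model. This is only a sketch and not kernel code: the struct and every `toy_*` name are hypothetical, and it assumes (as the description implies) that under the old scheme an unused ZONE_DEVICE page sits at a reference count of one rather than zero.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy user-space model, NOT kernel code. All names are hypothetical. */
struct toy_page {
	int refcount;
	bool is_zone_device;
	bool freed;
};

/*
 * Old scheme (assumed): a ZONE_DEVICE page carries one extra reference,
 * so it counts as unused at refcount 1, while other pages are unused at 0.
 */
int toy_idle_count_old(const struct toy_page *p)
{
	return p->is_zone_device ? 1 : 0;
}

/* Every "is this page unused?" check needs the special case. */
bool toy_page_in_use_old(const struct toy_page *p)
{
	return p->refcount > toy_idle_count_old(p);
}

/* Dropping a reference frees the page at a type-dependent threshold. */
void toy_put_page_old(struct toy_page *p)
{
	p->refcount--;
	if (p->refcount == toy_idle_count_old(p))
		p->freed = true;
}

/* Scheme after the cleanup: every page type is unused at refcount 0. */
bool toy_page_in_use_new(const struct toy_page *p)
{
	return p->refcount > 0;
}

void toy_put_page_new(struct toy_page *p)
{
	if (--p->refcount == 0)
		p->freed = true;
}
```

In this model a device page with one user starts at refcount 2 under the old scheme and at refcount 1 under the new one. In both cases a single put releases it, but only the old scheme needs the per-type threshold in each check.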