Re: [RFC PATCH v3 2/2] mm: remove extra ZONE_DEVICE struct page refcount

2020-10-08 Thread Ralph Campbell



On 10/7/20 10:17 PM, Ram Pai wrote:
> On Thu, Oct 01, 2020 at 11:17:15AM -0700, Ralph Campbell wrote:
> > ZONE_DEVICE struct pages have an extra reference count that complicates the
> > code for put_page() and several places in the kernel that need to check the
> > reference count to see that a page is not being used (gup, compaction,
> > migration, etc.). Clean up the code so the reference count doesn't need to
> > be treated specially for ZONE_DEVICE.
>
> I was hoping this patch would resolve a page-reference issue that we run
> into at random times while migrating a page to a device page backed by
> secure-memory.
>
> Unfortunately I continue to see the problem. There is a reference
> held on that page, which fails the migration.
>
> FYI
> RP


I'm willing to look into it but I would need more information.
Can you give any more details about the conditions when it happens?
It would be great if you have a program that reproduces the problem.


Re: [RFC PATCH v3 2/2] mm: remove extra ZONE_DEVICE struct page refcount

2020-10-07 Thread Ram Pai
On Thu, Oct 01, 2020 at 11:17:15AM -0700, Ralph Campbell wrote:
> ZONE_DEVICE struct pages have an extra reference count that complicates the
> code for put_page() and several places in the kernel that need to check the
> reference count to see that a page is not being used (gup, compaction,
> migration, etc.). Clean up the code so the reference count doesn't need to
> be treated specially for ZONE_DEVICE.


I was hoping this patch would resolve a page-reference issue that we run
into at random times while migrating a page to a device page backed by
secure-memory.

Unfortunately I continue to see the problem. There is a reference
held on that page, which fails the migration.

FYI
RP


Re: [RFC PATCH v3 2/2] mm: remove extra ZONE_DEVICE struct page refcount

2020-10-05 Thread Ralph Campbell



On 10/1/20 10:59 PM, Christoph Hellwig wrote:
> On Thu, Oct 01, 2020 at 11:17:15AM -0700, Ralph Campbell wrote:
> > ZONE_DEVICE struct pages have an extra reference count that complicates the
> > code for put_page() and several places in the kernel that need to check the
> > reference count to see that a page is not being used (gup, compaction,
> > migration, etc.). Clean up the code so the reference count doesn't need to
> > be treated specially for ZONE_DEVICE.
> >
> > Signed-off-by: Ralph Campbell
>
> Looks good,
>
> Reviewed-by: Christoph Hellwig


Thanks for the review.

I still have reservations about making this an official patch.
Did you see the updated cover letter?
Basically, I'm concerned about vmf_insert_mixed() inserting ZONE_DEVICE
struct pages into the process page table with a zero reference count.
If the reference count is to be non-zero, then DAX, pmem, and other users
of ZONE_DEVICE pages (or vmf_insert_mixed() itself) need to be changed to
increment and decrement the count in the appropriate places, but I don't
feel I know that code well enough to make those changes.
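
To make the concern above concrete, here is a minimal user-space sketch,
not kernel code. Every name in it (struct toy_page, the toy_* helpers, the
TOY_* constants) is hypothetical and only models the counting. It assumes
that "idle" is decided purely by comparing the reference count with a fixed
value, and that a mapping may or may not take a reference of its own.

```c
/* Toy user-space model of the refcount concern; none of these are kernel APIs. */
#include <assert.h>
#include <stdbool.h>

struct toy_page {
	int refcount;	/* stands in for the struct page reference count */
	bool mapped;	/* true while a page table entry points at the page */
};

/* Scheme with the extra reference: an unused ZONE_DEVICE page rests at 1. */
#define TOY_OLD_IDLE_COUNT 1
/* Scheme without it: an unused page rests at 0, like other pages. */
#define TOY_NEW_IDLE_COUNT 0

static void toy_get_page(struct toy_page *p)
{
	p->refcount++;
}

static void toy_put_page(struct toy_page *p)
{
	assert(p->refcount > 0);	/* dropping below zero would be a bug */
	p->refcount--;
}

/* "Is this page unused?" answered from the count alone. */
static bool toy_page_is_idle(const struct toy_page *p, int idle_count)
{
	return p->refcount == idle_count;
}

/* Models a mapping that does not take its own reference. */
static void toy_map_no_ref(struct toy_page *p)
{
	p->mapped = true;
}

/* Models the alternative: the mapping holds a reference while it exists. */
static void toy_map_with_ref(struct toy_page *p)
{
	toy_get_page(p);
	p->mapped = true;
}

static void toy_unmap_with_ref(struct toy_page *p)
{
	p->mapped = false;
	toy_put_page(p);
}
```

In this model, a page mapped through toy_map_no_ref() still looks idle to
toy_page_is_idle(), which is the situation I am worried about. Mapping
through toy_map_with_ref() avoids that, but only if every mapper and
unmapper is changed to pair the get and the put.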


Re: [RFC PATCH v3 2/2] mm: remove extra ZONE_DEVICE struct page refcount

2020-10-01 Thread Christoph Hellwig
On Thu, Oct 01, 2020 at 11:17:15AM -0700, Ralph Campbell wrote:
> ZONE_DEVICE struct pages have an extra reference count that complicates the
> code for put_page() and several places in the kernel that need to check the
> reference count to see that a page is not being used (gup, compaction,
> migration, etc.). Clean up the code so the reference count doesn't need to
> be treated specially for ZONE_DEVICE.
> 
> Signed-off-by: Ralph Campbell 

Looks good,

Reviewed-by: Christoph Hellwig