On Mon, Apr 24, 2017 at 10:30 AM, Kirill A. Shutemov
<kir...@shutemov.name> wrote:
> On Mon, Apr 24, 2017 at 10:23:59AM -0700, Dan Williams wrote:
>> On Sun, Apr 23, 2017 at 4:31 PM, Kirill A. Shutemov
>> <kir...@shutemov.name> wrote:
>> > On Thu, Apr 20, 2017 at 02:46:51PM -0700, Dan Williams wrote:
>> >> On Sat, Mar 18, 2017 at 2:52 AM, tip-bot for Kirill A. Shutemov
>> >> <tip...@zytor.com> wrote:
>> >> > Commit-ID:  2947ba054a4dabbd82848728d765346886050029
>> >> > Gitweb:     
>> >> > http://git.kernel.org/tip/2947ba054a4dabbd82848728d765346886050029
>> >> > Author:     Kirill A. Shutemov <kirill.shute...@linux.intel.com>
>> >> > AuthorDate: Fri, 17 Mar 2017 00:39:06 +0300
>> >> > Committer:  Ingo Molnar <mi...@kernel.org>
>> >> > CommitDate: Sat, 18 Mar 2017 09:48:03 +0100
>> >> >
>> >> > x86/mm/gup: Switch GUP to the generic get_user_page_fast() 
>> >> > implementation
>> >> >
>> >> > This patch provides all required callbacks required by the generic
>> >> > get_user_pages_fast() code and switches x86 over - and removes
>> >> > the platform specific implementation.
>> >> >
>> >> > Signed-off-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com>
>> >> > Cc: Andrew Morton <a...@linux-foundation.org>
>> >> > Cc: Aneesh Kumar K . V <aneesh.ku...@linux.vnet.ibm.com>
>> >> > Cc: Borislav Petkov <b...@alien8.de>
>> >> > Cc: Catalin Marinas <catalin.mari...@arm.com>
>> >> > Cc: Dann Frazier <dann.fraz...@canonical.com>
>> >> > Cc: Dave Hansen <dave.han...@intel.com>
>> >> > Cc: H. Peter Anvin <h...@zytor.com>
>> >> > Cc: Linus Torvalds <torva...@linux-foundation.org>
>> >> > Cc: Peter Zijlstra <pet...@infradead.org>
>> >> > Cc: Rik van Riel <r...@redhat.com>
>> >> > Cc: Steve Capper <steve.cap...@linaro.org>
>> >> > Cc: Thomas Gleixner <t...@linutronix.de>
>> >> > Cc: linux-a...@vger.kernel.org
>> >> > Cc: linux...@kvack.org
>> >> > Link: 
>> >> > http://lkml.kernel.org/r/20170316213906.89528-1-kirill.shute...@linux.intel.com
>> >> > [ Minor readability edits. ]
>> >> > Signed-off-by: Ingo Molnar <mi...@kernel.org>
>> >>
>> >> I'm still trying to spot the bug, but bisect points to this patch as
>> >> the point at which my unit tests start failing with the following
>> >> signature:
>> >>
>> >> [   35.423841] WARNING: CPU: 8 PID: 245 at lib/percpu-refcount.c:155
>> >> percpu_ref_switch_to_atomic_rcu+0x1f5/0x200
>> >
>> > Okay, I've tracked it down. The issue is triggered by replacment
>> > get_page() with page_cache_get_speculative().
>> >
>> > page_cache_get_speculative() doesn't have get_zone_device_page(). :-|
>> >
>> > And I think it's your bug, Dan: it's wrong to have
>> > get_/put_zone_device_page() in get_/put_page(). I must be handled by
>> > page_ref_* machinery to catch all cases where we manipulate with page
>> > refcount.
>>
>> The page_ref conversion landed in 4.6 *after* the ZONE_DEVICE
>> implementation that landed in 4.5, so there was a missed conversion of
>> the zone-device reference counting to page_ref.
>
> Fair enough.
>
> But get_page_unless_zero() definitely predates ZONE_DEVICE. :)
>

It does, but that's deliberate. A ZONE_DEVICE page never has a zero
reference count, it's always owned by the device, never by the page
allocator. ZONE_DEVICE overrides the ->lru list_head to store private
device information and we rely on the behavior that a non-zero
reference means the page is not added to any lru or page cache list.

Reply via email to