On Sun, Apr 23, 2017 at 4:31 PM, Kirill A. Shutemov <[email protected]> wrote:
> On Thu, Apr 20, 2017 at 02:46:51PM -0700, Dan Williams wrote:
>> On Sat, Mar 18, 2017 at 2:52 AM, tip-bot for Kirill A. Shutemov
>> <[email protected]> wrote:
>> > Commit-ID: 2947ba054a4dabbd82848728d765346886050029
>> > Gitweb:
>> > http://git.kernel.org/tip/2947ba054a4dabbd82848728d765346886050029
>> > Author: Kirill A. Shutemov <[email protected]>
>> > AuthorDate: Fri, 17 Mar 2017 00:39:06 +0300
>> > Committer: Ingo Molnar <[email protected]>
>> > CommitDate: Sat, 18 Mar 2017 09:48:03 +0100
>> >
>> > x86/mm/gup: Switch GUP to the generic get_user_page_fast() implementation
>> >
>> > This patch provides all required callbacks required by the generic
>> > get_user_pages_fast() code and switches x86 over - and removes
>> > the platform specific implementation.
>> >
>> > Signed-off-by: Kirill A. Shutemov <[email protected]>
>> > Cc: Andrew Morton <[email protected]>
>> > Cc: Aneesh Kumar K . V <[email protected]>
>> > Cc: Borislav Petkov <[email protected]>
>> > Cc: Catalin Marinas <[email protected]>
>> > Cc: Dann Frazier <[email protected]>
>> > Cc: Dave Hansen <[email protected]>
>> > Cc: H. Peter Anvin <[email protected]>
>> > Cc: Linus Torvalds <[email protected]>
>> > Cc: Peter Zijlstra <[email protected]>
>> > Cc: Rik van Riel <[email protected]>
>> > Cc: Steve Capper <[email protected]>
>> > Cc: Thomas Gleixner <[email protected]>
>> > Cc: [email protected]
>> > Cc: [email protected]
>> > Link:
>> > http://lkml.kernel.org/r/[email protected]
>> > [ Minor readability edits. ]
>> > Signed-off-by: Ingo Molnar <[email protected]>
>>
>> I'm still trying to spot the bug, but bisect points to this patch as
>> the point at which my unit tests start failing with the following
>> signature:
>>
>> [ 35.423841] WARNING: CPU: 8 PID: 245 at lib/percpu-refcount.c:155
>> percpu_ref_switch_to_atomic_rcu+0x1f5/0x200
>
> Okay, I've tracked it down.
> The issue is triggered by the replacement of
> get_page() with page_cache_get_speculative().
>
> page_cache_get_speculative() doesn't have get_zone_device_page(). :-|
>
> And I think it's your bug, Dan: it's wrong to have
> get_/put_zone_device_page() in get_/put_page(). It must be handled by
> page_ref_* machinery to catch all cases where we manipulate the page
> refcount.

The page_ref conversion landed in 4.6 *after* the ZONE_DEVICE implementation
that landed in 4.5, so there was a missed conversion of the zone-device
reference counting to page_ref.

> Back to the big picture:
>
> I hate that we need to have such additional code in page refcount
> primitives. I worked hard to remove compound page ugliness from there and
> now zone_device creeping in...
>
> Is it the only option?

Not sure, I need to spend some time to understand what page_ref means to
ZONE_DEVICE.
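
For illustration only, here is a toy userspace model of the imbalance discussed above. It is a sketch under assumptions, not kernel code: every toy_* name is hypothetical, and it assumes the put side still drops the device-pagemap reference while the speculative get path only touches the page refcount. Under that assumption each speculative get followed by a put leaves the pagemap count one too low, which would be consistent with a percpu_ref warning about a non-positive count.

```c
/* Toy userspace model, NOT kernel code. All toy_* names are hypothetical. */
#include <assert.h>
#include <stdio.h>

struct toy_pgmap { long ref; };   /* stands in for the pagemap's percpu_ref */
struct toy_page  { long refcount; struct toy_pgmap *pgmap; };

/* Models get_page() carrying the zone-device hook: takes both references. */
static void toy_get_page(struct toy_page *p)
{
	p->refcount++;
	if (p->pgmap)
		p->pgmap->ref++;
}

/* Models a speculative get that only bumps the page refcount. */
static void toy_get_speculative(struct toy_page *p)
{
	p->refcount++;
}

/* Models put_page() carrying the zone-device hook: drops both references. */
static void toy_put_page(struct toy_page *p)
{
	p->refcount--;
	if (p->pgmap)
		p->pgmap->ref--;
}

/* Returns the pagemap count after one get/put pair; 0 means balanced. */
long toy_get_put_balance(int speculative)
{
	struct toy_pgmap map = { 0 };
	struct toy_page page = { 1, &map };

	if (speculative)
		toy_get_speculative(&page);
	else
		toy_get_page(&page);
	toy_put_page(&page);
	return map.ref;
}

#ifdef TOY_MODEL_MAIN
int main(void)
{
	assert(toy_get_put_balance(0) == 0);
	printf("get_page path:    pgmap ref = %ld\n", toy_get_put_balance(0));
	printf("speculative path: pgmap ref = %ld\n", toy_get_put_balance(1));
	return 0;
}
#endif
```

Building with -DTOY_MODEL_MAIN gives a small program that prints both balances. In this model, moving the pagemap accounting into the one helper that every get and put goes through removes the asymmetry, which is the direction suggested in the quoted text.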

