On Fri, Apr 21, 2017 at 7:16 AM, Kirill A. Shutemov
<kir...@shutemov.name> wrote:
> On Thu, Apr 20, 2017 at 02:46:51PM -0700, Dan Williams wrote:
>> On Sat, Mar 18, 2017 at 2:52 AM, tip-bot for Kirill A. Shutemov
>> <tip...@zytor.com> wrote:
>> > Commit-ID:  2947ba054a4dabbd82848728d765346886050029
>> > Gitweb:     http://git.kernel.org/tip/2947ba054a4dabbd82848728d765346886050029
>> > Author:     Kirill A. Shutemov <kirill.shute...@linux.intel.com>
>> > AuthorDate: Fri, 17 Mar 2017 00:39:06 +0300
>> > Committer:  Ingo Molnar <mi...@kernel.org>
>> > CommitDate: Sat, 18 Mar 2017 09:48:03 +0100
>> >
>> > x86/mm/gup: Switch GUP to the generic get_user_page_fast() implementation
>> >
>> > This patch provides all required callbacks required by the generic
>> > get_user_pages_fast() code and switches x86 over - and removes
>> > the platform specific implementation.
>> >
>> > Signed-off-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com>
>> > Cc: Andrew Morton <a...@linux-foundation.org>
>> > Cc: Aneesh Kumar K . V <aneesh.ku...@linux.vnet.ibm.com>
>> > Cc: Borislav Petkov <b...@alien8.de>
>> > Cc: Catalin Marinas <catalin.mari...@arm.com>
>> > Cc: Dann Frazier <dann.fraz...@canonical.com>
>> > Cc: Dave Hansen <dave.han...@intel.com>
>> > Cc: H. Peter Anvin <h...@zytor.com>
>> > Cc: Linus Torvalds <torva...@linux-foundation.org>
>> > Cc: Peter Zijlstra <pet...@infradead.org>
>> > Cc: Rik van Riel <r...@redhat.com>
>> > Cc: Steve Capper <steve.cap...@linaro.org>
>> > Cc: Thomas Gleixner <t...@linutronix.de>
>> > Cc: linux-a...@vger.kernel.org
>> > Cc: linux...@kvack.org
>> > Link: http://lkml.kernel.org/r/20170316213906.89528-1-kirill.shute...@linux.intel.com
>> > [ Minor readability edits. ]
>> > Signed-off-by: Ingo Molnar <mi...@kernel.org>
>>
>> I'm still trying to spot the bug, but bisect points to this patch as
>> the point at which my unit tests start failing with the following
>> signature:
>
> I can't find the issue either.
>
> Is it something reproducible without hardware? In KVM?

You can do it in KVM: just boot with the memmap=nn!ss parameter (size,
then start address) to simulate pmem. In this case I'm booting with
memmap=4G!8G. You should also specify "nokaslr".
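
A minimal sketch of how that command line could be assembled for a KVM
guest. The helper name pmem_cmdline, the bzImage path, the guest memory
size, and the root= argument are all placeholders and not part of the
original report; the guest is assumed to have enough RAM to cover the
marked range.

```shell
# Hypothetical helper: build the kernel arguments for a simulated-pmem guest.
# The first argument is the region size, the second its start address.
pmem_cmdline() {
    size=$1
    start=$2
    printf 'memmap=%s!%s nokaslr' "$size" "$start"
}

# The values used in this thread: a 4G region starting at the 8G mark.
KERNEL_ARGS=$(pmem_cmdline 4G 8G)
echo "$KERNEL_ARGS"

# Assumed invocation (kernel path, memory size and root device are placeholders):
# qemu-system-x86_64 -enable-kvm -m 16G -kernel ./bzImage \
#     -append "$KERNEL_ARGS root=/dev/sda1"
```

Inside the guest, the simulated region should then show up as a pmem
device once the nd_e820 and nd_pmem modules are loaded.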

> If yes, could you share the test-case?

Yes, run:

    ./autogen.sh
    ./configure CFLAGS='-g -O0' --prefix=/usr --sysconfdir=/etc --libdir=/usr/lib64
    make TESTS=device-dax check

...from a checkout of the ndctl project:

    https://github.com/pmem/ndctl

Let me know if you run into any problems getting the test to build or run.

>
>> [   35.423841] WARNING: CPU: 8 PID: 245 at lib/percpu-refcount.c:155
>> percpu_ref_switch_to_atomic_rcu+0x1f5/0x200
>> [   35.425328] percpu ref (dax_pmem_percpu_release [dax_pmem]) <= 0
>> (0) after switching to atomic
>> [   35.425329] Modules linked in: ip6t_rpfilter ip6t_REJECT
>> nf_reject_ipv6 xt_conntrack ebtable_nat ebtable_broute bridge stp llc
>> ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6
>> ip6table_mangle ip6table_raw ip6table_security iptable_nat
>> nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack
>> iptable_mangle iptable_raw iptable_security ebtable_filter ebtables
>> ip6table_filter ip6_tables crct10dif_pclmul crc32_pclmul crc32c_intel
>> ghash_clmulni_intel nd_pmem(O) dax_pmem(O) nd_btt(O) dax(O) serio_raw
>> nfit(O) nd_e820(O) libnvdimm(O) tpm_tis tpm_tis_core
>> tpm nfit_test_iomap(O) nfsd nfs_acl
>> [   35.433683] CPU: 8 PID: 245 Comm: rcuos/29 Tainted: G           O
>>  4.11.0-rc2+ #55
>> [   35.435538] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
>> BIOS 1.9.3-1.fc25 04/01/2014
>> [   35.437500] Call Trace:
>> [   35.438270]  dump_stack+0x86/0xc3
>> [   35.439156]  __warn+0xcb/0xf0
>> [   35.439995]  warn_slowpath_fmt+0x5f/0x80
>> [   35.440962]  ? rcu_nocb_kthread+0x27a/0x500
>> [   35.441957]  ? dax_pmem_percpu_exit+0x50/0x50 [dax_pmem]
>> [   35.443107]  percpu_ref_switch_to_atomic_rcu+0x1f5/0x200
>> [   35.444251]  ? percpu_ref_exit+0x60/0x60
>> [   35.445206]  rcu_nocb_kthread+0x327/0x500
>> [   35.446186]  ? rcu_nocb_kthread+0x27a/0x500
>> [   35.447188]  kthread+0x10c/0x140
>> [   35.448058]  ? rcu_eqs_enter+0x50/0x50
>> [   35.448990]  ? kthread_create_on_node+0x60/0x60
>> [   35.450038]  ret_from_fork+0x31/0x40
>> [   35.450976] ---[ end trace eaa40898a09519b5 ]---
>>
>> This is similar to the backtrace when we were not properly handling
>> pud faults and was fixed with this commit: 220ced1676c4 "mm: fix
>> get_user_pages() vs device-dax pud mappings"
>>
>> I've found some missing _devmap checks in the generic
>> get_user_pages_fast() path, but this does not fix the regression:
>
> I don't see these in x86 GUP. Was the bug there too?

No, it wasn't. The test runs fine with v4.11-rc7, so perhaps I'm
looking in the wrong place...
