Re: [v2 PATCH 0/2] rhashtable: rhashtable with duplicate objects

2016-09-19 Thread Johannes Berg
On Mon, 2016-09-19 at 18:48 +0800, Herbert Xu wrote:
> On Mon, Sep 19, 2016 at 12:10:27PM +0200, Johannes Berg wrote:
> > 
> > Btw, for debug I put
> > 
> > BUG_ON(atomic_read(&ht->nelems) < 0);
> > 
> > after the atomic_dec() in __rhashtable_remove_fast_one(). That
> > makes
> > the kernel crash instantly on the buggy code, and I just have to
> > run a
> > single test ("wpas_ctrl_interface_add_many") to get there.
> 
> Aha I see the problem now.  The nelems logic on remove is broken.

I looked at it for a long time, but didn't see it :) But yeah, I've
come to the same conclusion by adding debugging of the chains etc.

> I'll send out a v3.

I'll test it when I have it :)

johannes


Re: [v2 PATCH 0/2] rhashtable: rhashtable with duplicate objects

2016-09-19 Thread Herbert Xu
On Mon, Sep 19, 2016 at 12:10:27PM +0200, Johannes Berg wrote:
> Btw, for debug I put
> 
> BUG_ON(atomic_read(&ht->nelems) < 0);
> 
> after the atomic_dec() in __rhashtable_remove_fast_one(). That makes
> the kernel crash instantly on the buggy code, and I just have to run a
> single test ("wpas_ctrl_interface_add_many") to get there.

Aha I see the problem now.  The nelems logic on remove is broken.
I'll send out a v3.

Thanks!
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [v2 PATCH 0/2] rhashtable: rhashtable with duplicate objects

2016-09-19 Thread Johannes Berg
Btw, for debug I put

BUG_ON(atomic_read(&ht->nelems) < 0);

after the atomic_dec() in __rhashtable_remove_fast_one(). That makes
the kernel crash instantly on the buggy code, and I just have to run a
single test ("wpas_ctrl_interface_add_many") to get there.

johannes


Re: [v2 PATCH 0/2] rhashtable: rhashtable with duplicate objects

2016-09-19 Thread Johannes Berg
On Mon, 2016-09-19 at 12:02 +0200, Johannes Berg wrote:
> On Mon, 2016-09-19 at 11:54 +0200, Johannes Berg wrote:
> > 
> > > 
> > > 
> > > The stack trace is useless, but my other annotation showed that
> > > the
> > > table's nelems *underflowed* to -1, so now the worker will
> > > continue
> > > to try to grow it forever.
> > > 
> > 
> > And this *was* actually a case of duplication, afaict, since it was
> > multiple virtual interfaces on the same device all connecting to
> > the
> > same AP.
> 
> It seems that __rhashtable_remove_fast_one() should return 0 even in
> the case of err==1 for the "skip all the maintenance due to list
> deletion"?
> 
> --- a/include/linux/rhashtable.h
> +++ b/include/linux/rhashtable.h
> @@ -1009,7 +1009,7 @@ static inline int __rhashtable_remove_fast_one(
> err = 0;
> }
>  
> -   return err;
> +   return err < 0 ? err : 0;
>  }
> 

No, that's obviously bogus and already handled - wrong case anyway.
Sorry.

johannes


Re: [v2 PATCH 0/2] rhashtable: rhashtable with duplicate objects

2016-09-19 Thread Johannes Berg
On Mon, 2016-09-19 at 11:54 +0200, Johannes Berg wrote:
> > 
> > The stack trace is useless, but my other annotation showed that the
> > table's nelems *underflowed* to -1, so now the worker will continue
> > to try to grow it forever.
> > 
> 
> And this *was* actually a case of duplication, afaict, since it was
> multiple virtual interfaces on the same device all connecting to the
> same AP.

It seems that __rhashtable_remove_fast_one() should return 0 even in
the case of err==1 for the "skip all the maintenance due to list
deletion"?

--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -1009,7 +1009,7 @@ static inline int __rhashtable_remove_fast_one(
err = 0;
}
 
-   return err;
+   return err < 0 ? err : 0;
 }

But that in itself doesn't help.

johannes


Re: [v2 PATCH 0/2] rhashtable: rhashtable with duplicate objects

2016-09-19 Thread Johannes Berg

> The stack trace is useless, but my other annotation showed that the
> table's nelems *underflowed* to -1, so now the worker will continue
> to try to grow it forever.
> 

And this *was* actually a case of duplication, afaict, since it was
multiple virtual interfaces on the same device all connecting to the
same AP.

johannes


Re: [v2 PATCH 0/2] rhashtable: rhashtable with duplicate objects

2016-09-19 Thread Johannes Berg
On Mon, 2016-09-19 at 17:34 +0800, Herbert Xu wrote:
> On Mon, Sep 19, 2016 at 11:27:24AM +0200, Johannes Berg wrote:
> > 
> > 
> > I have a feeling there's a bug with ht->nelems, since the crash is
> > always in the grow worker, but I haven't quite put my finger on it
> > yet.
> 
> Can you show me a stack trace?
> 
The stack trace is useless, but my other annotation showed that the
table's nelems *underflowed* to -1, so now the worker will continue to
try to grow it forever.

johannes


Re: [v2 PATCH 0/2] rhashtable: rhashtable with duplicate objects

2016-09-19 Thread Johannes Berg
On Mon, 2016-09-19 at 17:34 +0800, Herbert Xu wrote:
> On Mon, Sep 19, 2016 at 11:27:24AM +0200, Johannes Berg wrote:
> > 
> > 
> > I have a feeling there's a bug with ht->nelems, since the crash is
> > always in the grow worker, but I haven't quite put my finger on it
> > yet.
> 
> Can you show me a stack trace?

Oops, just deleted the old ones to reproduce with some debug printks :)

It's always while trying to grow a table from the worker; it kills
processes, but eventually no process are left and the kernel panics.

I'll send you a trace when I have one again - I'm running a new
reproduction now.

johannes


Re: [v2 PATCH 0/2] rhashtable: rhashtable with duplicate objects

2016-09-19 Thread Herbert Xu
On Mon, Sep 19, 2016 at 11:27:24AM +0200, Johannes Berg wrote:
> 
> I have a feeling there's a bug with ht->nelems, since the crash is
> always in the grow worker, but I haven't quite put my finger on it yet.

Can you show me a stack trace?

Thanks,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [v2 PATCH 0/2] rhashtable: rhashtable with duplicate objects

2016-09-19 Thread Johannes Berg
On Mon, 2016-09-19 at 11:27 +0200, Johannes Berg wrote:
> > 
> I have a feeling there's a bug with ht->nelems, since the crash is
> always in the grow worker, but I haven't quite put my finger on it
> yet.
> 
Btw, I should not actually get into the duplicate case here.

johannes


Re: [v2 PATCH 0/2] rhashtable: rhashtable with duplicate objects

2016-09-19 Thread Johannes Berg

> What does your test suite actually do? Is it something that I
> can run without special hardware?

Yes, it's pretty simple - it spins up a number of VMs with hwsim and
just runs a lot of tests:

https://w1.fi/cgit/hostap/tree/tests/hwsim

I've attached a kernel .config you can use for it.

I'm running 7 VMs (vm/parallel-vm.py 7), which makes it faster, but you
can probably reproduce the bug better with a single VM.

I have a feeling there's a bug with ht->nelems, since the crash is
always in the grow worker, but I haven't quite put my finger on it yet.

johannes#
# Automatically generated file; DO NOT EDIT.
# Linux/x86 4.8.0-rc4 Kernel Configuration
#
CONFIG_64BIT=y
CONFIG_X86_64=y
CONFIG_X86=y
CONFIG_INSTRUCTION_DECODER=y
CONFIG_OUTPUT_FORMAT="elf64-x86-64"
CONFIG_ARCH_DEFCONFIG="arch/x86/configs/x86_64_defconfig"
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_MMU=y
CONFIG_ARCH_MMAP_RND_BITS_MIN=28
CONFIG_ARCH_MMAP_RND_BITS_MAX=32
CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MIN=8
CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MAX=16
CONFIG_NEED_DMA_MAP_STATE=y
CONFIG_NEED_SG_DMA_LENGTH=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_BUG_RELATIVE_POINTERS=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_ARCH_HAS_CPU_RELAX=y
CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y
CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
CONFIG_ARCH_WANT_HUGE_PMD_SHARE=y
CONFIG_ARCH_WANT_GENERAL_HUGETLB=y
CONFIG_ZONE_DMA32=y
CONFIG_AUDIT_ARCH=y
CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_X86_64_SMP=y
CONFIG_ARCH_SUPPORTS_UPROBES=y
CONFIG_FIX_EARLYCON_MEM=y
CONFIG_DEBUG_RODATA=y
CONFIG_PGTABLE_LEVELS=4
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
CONFIG_CONSTRUCTORS=y
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_EXTABLE_SORT=y

#
# General setup
#
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_CROSS_COMPILE=""
# CONFIG_COMPILE_TEST is not set
CONFIG_LOCALVERSION=""
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_BZIP2=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_HAVE_KERNEL_XZ=y
CONFIG_HAVE_KERNEL_LZO=y
CONFIG_HAVE_KERNEL_LZ4=y
# CONFIG_KERNEL_GZIP is not set
CONFIG_KERNEL_BZIP2=y
# CONFIG_KERNEL_LZMA is not set
# CONFIG_KERNEL_XZ is not set
# CONFIG_KERNEL_LZO is not set
# CONFIG_KERNEL_LZ4 is not set
CONFIG_DEFAULT_HOSTNAME="(none)"
# CONFIG_SWAP is not set
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
# CONFIG_POSIX_MQUEUE is not set
# CONFIG_CROSS_MEMORY_ATTACH is not set
# CONFIG_FHANDLE is not set
CONFIG_USELIB=y
# CONFIG_AUDIT is not set
CONFIG_HAVE_ARCH_AUDITSYSCALL=y

#
# IRQ subsystem
#
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_GENERIC_PENDING_IRQ=y
CONFIG_IRQ_DOMAIN=y
CONFIG_IRQ_DOMAIN_HIERARCHY=y
# CONFIG_IRQ_DOMAIN_DEBUG is not set
CONFIG_IRQ_FORCED_THREADING=y
CONFIG_SPARSE_IRQ=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_ARCH_CLOCKSOURCE_DATA=y
CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y
CONFIG_GENERIC_CMOS_UPDATE=y

#
# Timers subsystem
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
CONFIG_NO_HZ_IDLE=y
# CONFIG_NO_HZ_FULL is not set
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y

#
# CPU/Task time and stats accounting
#
CONFIG_TICK_CPU_ACCOUNTING=y
# CONFIG_VIRT_CPU_ACCOUNTING_GEN is not set
# CONFIG_IRQ_TIME_ACCOUNTING is not set
CONFIG_BSD_PROCESS_ACCT=y
# CONFIG_BSD_PROCESS_ACCT_V3 is not set
CONFIG_TASKSTATS=y
CONFIG_TASK_DELAY_ACCT=y
CONFIG_TASK_XACCT=y
CONFIG_TASK_IO_ACCOUNTING=y

#
# RCU Subsystem
#
CONFIG_PREEMPT_RCU=y
# CONFIG_RCU_EXPERT is not set
CONFIG_SRCU=y
# CONFIG_TASKS_RCU is not set
CONFIG_RCU_STALL_COMMON=y
# CONFIG_TREE_RCU_TRACE is not set
# CONFIG_RCU_EXPEDITE_BOOT is not set
# CONFIG_BUILD_BIN2C is not set
# CONFIG_IKCONFIG is not set
CONFIG_LOG_BUF_SHIFT=21
CONFIG_LOG_CPU_MAX_BUF_SHIFT=17
CONFIG_NMI_LOG_BUF_SHIFT=13
CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y
CONFIG_ARCH_SUPPORTS_NUMA_BALANCING=y
CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH=y
CONFIG_ARCH_SUPPORTS_INT128=y
# CONFIG_CGROUPS is not set
# CONFIG_CHECKPOINT_RESTORE is not set
# CONFIG_NAMESPACES is not set
# CONFIG_SCHED_AUTOGROUP is not set
# CONFIG_SYSFS_DEPRECATED is not set
# CONFIG_RELAY is not set
# CONFIG_BLK_DEV_INITRD is not set
CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE=y
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
CONFIG_SYSCTL=y
CONFIG_ANON_INODES=y
CONFIG_SYSCTL_EXCEPTION_TRACE=y
CONFIG_HAVE_PCSPKR_PLATFORM=y
CONFIG_BPF=y
CONFIG_EXPERT=y
CONFIG_MULTIUSER=y
CONFIG_SGETMASK_SYSCALL=y
CONFIG_SYSFS_SYSCALL=y
CONFIG_SYSCTL_SYSCALL=y
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
CONFIG_KALLSYMS_ABSOLUTE_PERCPU=y
CONFIG_KALLSYMS_BASE_RELATIVE=y
CONFIG_PRINTK=y
CONFIG_PRINTK_NMI=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_PCSPKR_PLATFORM=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_EPO

Re: [v2 PATCH 0/2] rhashtable: rhashtable with duplicate objects

2016-09-19 Thread Herbert Xu
On Mon, Sep 19, 2016 at 11:15:30AM +0200, Johannes Berg wrote:
> On Mon, 2016-09-19 at 16:40 +0800, Herbert Xu wrote:
> 
> > I've tested the rhlist code with test_rhashtable but I haven't
> > tested the mac80211 conversion.  So please give it a go and see
> > if it still works.
> 
> This is still running out of memory on my test suite.
> 
> Somehow I don't see kmemleak kicking in, so I'll have to find the bug
> manually :)

What does your test suite actually do? Is it something that I
can run without special hardware?

Thanks,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [v2 PATCH 0/2] rhashtable: rhashtable with duplicate objects

2016-09-19 Thread Johannes Berg
On Mon, 2016-09-19 at 16:40 +0800, Herbert Xu wrote:

> I've tested the rhlist code with test_rhashtable but I haven't
> tested the mac80211 conversion.  So please give it a go and see
> if it still works.

This is still running out of memory on my test suite.

Somehow I don't see kmemleak kicking in, so I'll have to find the bug
manually :)

johannes


[v2 PATCH 0/2] rhashtable: rhashtable with duplicate objects

2016-09-19 Thread Herbert Xu
v2 contains a reworked insertion slowpath to ensure that the
spinlock for the table we're inserting into is taken.

This series contains one two patches.  The first adds the rhlist
interface and the second converts mac80211 to use it.  If this works
out I'll then proceed to convert the other insecure_elasticity
users over to this.

I've tested the rhlist code with test_rhashtable but I haven't
tested the mac80211 conversion.  So please give it a go and see
if it still works.

Thanks!
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt