On Tue, 2012-10-23 at 17:15 -0700, Peter LaDow wrote:
> (Sorry for the subject change, but I wanted to try to pull in those who
> work on RT issues, and the subject didn't make that obvious. Please
> search for the same subject without the trailing "RT Linux" text.)
>
> Well, more information. Even with SMP enabled (and presumably
> migrate_enable() calling preempt_disable()), we still got the lockup in
> iptables-restore. I did more digging, and it looks like the complete
> stack trace includes a call from iptables-restore (through setsockopt()
> with IPT_SO_SET_ADD_COUNTERS). This seems to be a potential
> multiple-writer case, where the counters are updated both through the
> syscall and by the kernel as it filters packets.
>
> I think there might be a race on the update of xt_recseq.sequence,
> since the RT patches remove the spinlock from seqlock_t. Thus multiple
> writers can corrupt the sequence count.
> And I thought the SMP configuration would disable preemption when
> local_bh_disable() is called. And indeed, looking at the disassembly, I
> see preempt_disable() (which, though optimized, goes to
> add_preempt_count()) being called.
>
> Yet we still see the lockup in get_counters() in iptables, which, it
> seems, is because of some sort of problem with the sequence. It doesn't
> appear to be related to preemption, and perhaps there is some other
> corruption of the sequence counter happening. But the only places I see
> it modified are xt_write_recseq_begin() and xt_write_recseq_end(),
> which are used only in the netfilter code (ip6_tables.c, ip_tables.c,
> and arp_tables.c). And every single call is preceded by a call to
> local_bh_disable().
>
> This problem is a huge one for us, and so far I'm unable to track down
> how this is occurring.
>
> Any other tips? I presume this is the proper place for RT issues.

Hmm... Could you try the following patch?

diff --git a/include/linux/netfilter/x_tables.h b/include/linux/netfilter/x_tables.h
index 8d674a7..053f9e7 100644
--- a/include/linux/netfilter/x_tables.h
+++ b/include/linux/netfilter/x_tables.h
@@ -471,30 +471,24 @@ DECLARE_PER_CPU(seqcount_t, xt_recseq);
  *
  * Begin packet processing : all readers must wait the end
  * 1) Must be called with preemption disabled
- * 2) softirqs must be disabled too (or we should use this_cpu_add())
  * Returns :
  *  1 if no recursion on this cpu
  *  0 if recursion detected
  */
 static inline unsigned int xt_write_recseq_begin(void)
 {
-	unsigned int addend;
-
+	unsigned int old, new;
+	seqcount_t *ptr = &__get_cpu_var(xt_recseq);
 	/*
 	 * Low order bit of sequence is set if we already
 	 * called xt_write_recseq_begin().
 	 */
-	addend = (__this_cpu_read(xt_recseq.sequence) + 1) & 1;
-
-	/*
-	 * This is kind of a write_seqcount_begin(), but addend is 0 or 1
-	 * We dont check addend value to avoid a test and conditional jump,
-	 * since addend is most likely 1
-	 */
-	__this_cpu_add(xt_recseq.sequence, addend);
-	smp_wmb();
-
-	return addend;
+	old = ptr->sequence;
+	new = old | 1;
+	/* Note : cmpxchg() is a memory barrier, we dont need smp_wmb() */
+	if (old != new && cmpxchg(&ptr->sequence, old, new) == old)
+		return 1;
+	return 0;
 }
 
 /**