(Sorry for the subject change, but I wanted to try and pull in those who work on RT issues, and the subject didn't make that obvious. Please search for the same subject without the RT Linux trailing text.)
Well, more information. Even with SMP enabled (and presumably the migrate_enable having calls to preempt_disable), we still got the lockup in iptables-restore. I did more digging, and it looks like the complete stack trace includes a call from iptables-restore (through setsockopt with IPT_SO_SET_ADD_COUNTERS). This seems to be a potential multiple writer case where the counters are updated through the syscall and the kernel is updating the counters as it filters packets. I think there might be a race on the update to xt_recseq.sequence, since the RT patches remove the spinlock in seqlock_t. Thus multiple writers can corrupt the sequence count. And I thought the SMP configuration would disable preemption when local_bh_disable() is called. And indeed, looking at the disassembly, I see preempt_disable() (though optimized, goes to add_preempt_count() ) is being called. Yet we still see the lockup in the get_counters() in iptables. Which, it seems, is because of some sort of problem with the sequence. It doesn't appear to be related to the preemption, and perhaps there is some other corruption of the sequence counter happening. But the only places I see it modified is in xt_write_recseq_begin and xt_write_recseq_end, which is only in the netfilter code (ip6_tables.c, ip_tables.c, and arp_tables.c). And every single call is preceeded by a call to local_bh_disable(). This problem is a huge one for us. And so far I'm unable to track down how this is occurring. Any other tips? I presume this is the proper place for RT issues. Thanks, Pete -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/