Re: Process Hang in __read_seqcount_begin

2012-10-23 Thread Peter LaDow
(Sorry for the subject change, but I wanted to try and pull in those
who work on RT issues, and the subject didn't make that obvious.
Please search for the same subject without the RT Linux trailing
text.)

Well, more information.  Even with SMP enabled (and presumably the
migrate_enable having calls to preempt_disable), we still got the
lockup in iptables-restore.  I did more digging, and it looks like the
complete stack trace includes a call from iptables-restore (through
setsockopt with IPT_SO_SET_ADD_COUNTERS).  This seems to be a
potential multiple writer case where the counters are updated through
the syscall and the kernel is updating the counters as it filters
packets.

I think there might be a race on the update to xt_recseq.sequence,
since the RT patches remove the spinlock in seqlock_t.  Thus multiple
writers can corrupt the sequence count.  And I thought the SMP
configuration would disable preemption when local_bh_disable() is
called.  And indeed, looking at the disassembly, I see
preempt_disable() (though optimized, goes to add_preempt_count() ) is
being called.

Yet we still see the lockup in the get_counters() in iptables.  Which,
it seems, is because of some sort of problem with the sequence.  It
doesn't appear to be related to the preemption, and perhaps there is
some other corruption of the sequence counter happening.  But the only
places I see it modified is in xt_write_recseq_begin and
xt_write_recseq_end, which is only in the netfilter code
(ip6_tables.c, ip_tables.c, and arp_tables.c).  And every single call
is preceeded by a call to local_bh_disable().

This problem is a huge one for us.  And so far I'm unable to track
down how this is occurring.

Any other tips?  I presume this is the proper place for RT issues.

Thanks,
Pete
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Process Hang in __read_seqcount_begin

2012-10-24 Thread Peter LaDow
On Tue, Oct 23, 2012 at 9:32 PM, Eric Dumazet  wrote:
> Could you try following patch ?

Thanks for the suggestion. But I have a question about the patch below.

> +   /* Note : cmpxchg() is a memory barrier, we dont need smp_wmb() */
> +   if (old != new && cmpxchg(&ptr->sequence, old, new) == old)
> +   return 1;
> +   return 0;

Looking at arch/powerpc/include/asm/system.h, cmpxchg is defined as a
call to __cmpxchg_u32 (we are 32-bit, and I presume the size is 32
bits):

static __always_inline unsigned long
__cmpxchg_u32(volatile unsigned int *p, unsigned long old, unsigned long new)
{
unsigned int prev;

__asm__ __volatile__ (
PPC_RELEASE_BARRIER
"1: lwarx   %0,0,%2 # __cmpxchg_u32\n\
cmpw0,%0,%3\n\
bne-2f\n"
PPC405_ERR77(0,%2)
"   stwcx.  %4,0,%2\n\
bne-1b"
PPC_ACQUIRE_BARRIER
"\n\
2:"
: "=&r" (prev), "+m" (*p)
: "r" (p), "r" (old), "r" (new)
: "cc", "memory");

return prev;
}

And the interesting part is PPC_RELEASE_BARRIER and
PPC_ACQUIRE_BARRIER.  Both of these are noops in non SMP systems.
>From arch/powerpc/include/asm/sync.h:

#ifdef CONFIG_SMP
#define __PPC_ACQUIRE_BARRIER   \
START_LWSYNC_SECTION(97);   \
isync;  \
MAKE_LWSYNC_SECTION_ENTRY(97, __lwsync_fixup);
#define PPC_ACQUIRE_BARRIER "\n" stringify_in_c(__PPC_ACQUIRE_BARRIER)
#define PPC_RELEASE_BARRIER stringify_in_c(LWSYNC) "\n"
#else
#define PPC_ACQUIRE_BARRIER
#define PPC_RELEASE_BARRIER
#endif

So, if these are noops, does this really become an atomic operation?

Thanks,
Pete
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH RT 3/4] net: netfilter: Serialize xt_write_recseq sections on RT

2012-10-31 Thread Peter LaDow
On Tue, Oct 30, 2012 at 5:33 PM, Steven Rostedt  wrote:
> From: Thomas Gleixner 
>
> The netfilter code relies only on the implicit semantics of
> local_bh_disable() for serializing wt_write_recseq sections. RT breaks
> that and needs explicit serialization here.
>
> Reported-by: Peter LaDow 
> Signed-off-by: Thomas Gleixner 

> diff --git a/include/linux/locallock.h b/include/linux/locallock.h
> index f1804a3..a5eea5d 100644

> diff --git a/include/linux/netfilter/x_tables.h 
> b/include/linux/netfilter/x_tables.h
> index 32cddf7..bed90da2 100644

> diff --git a/net/netfilter/core.c b/net/netfilter/core.c
> index 899b71c..5db16ea 100644

I'm trying these out right now.  We've applied these patches to
3.0.36-rt58 (rather than pull in the full 3.0.48-rt72 -- too much risk
for us right now to do a full kernel change).  I am setting up a
3.0.48-rt72-rc1 fixture as well to test the release candidate.

Thanks,
Pete
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH RT 3/4] net: netfilter: Serialize xt_write_recseq sections on RT

2012-11-01 Thread Peter LaDow
On Thu, Nov 1, 2012 at 2:26 PM, Thomas Gleixner  wrote:
> Cough. You are missing a boat load of crucial fixes. There is a damned
> good reason why 3.0.stable got 12 updates and the -rt version 14.

I don't doubt there are.  But we've only experienced one problem
between 3.0.36-rt58 and 3.0.48-rt72.  Indeed, it might be easier to
evaluate the risk if all the changelogs were readily available.  So
far, however, we haven't discovered any other issues that give us any
concerns.

> Your risk assessment is definitley interesting.

We have specifically tested 3.0.36-rt58 for performance in our
application.  Moving to 3.0.48-rt72 poses risk about whether would
still be able to meet our performance requirements.  And with regard
to stability, the only issue we've had since our earlier moved to
3.0.36-rt58 was the netfilter problem.  We initially only saw the
problem on PPP related interfaces, so we reverted to 3.0.3 (non-rt)
for PPP related issues.

Now, if you think there are some issues that should raise eyebrows, or
can point me to a list of changelogs, I'd be happy to evaluate them.

Thanks,
Pete
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH RT 3/4] net: netfilter: Serialize xt_write_recseq sections on RT

2012-11-01 Thread Peter LaDow
On Tue, Oct 30, 2012 at 5:33 PM, Steven Rostedt  wrote:
> From: Thomas Gleixner 
>
> The netfilter code relies only on the implicit semantics of
> local_bh_disable() for serializing wt_write_recseq sections. RT breaks
> that and needs explicit serialization here.
>
> Reported-by: Peter LaDow 
> Signed-off-by: Thomas Gleixner 

> diff --git a/include/linux/locallock.h b/include/linux/locallock.h
> index f1804a3..a5eea5d 100644

> diff --git a/include/linux/netfilter/x_tables.h 
> b/include/linux/netfilter/x_tables.h
> index 32cddf7..bed90da2 100644

> diff --git a/net/netfilter/core.c b/net/netfilter/core.c
> index 899b71c..5db16ea 100644

I'm have a setup running 3.0.48-rt72.  It's been running about 8 hours
so far, and tomorrow I'll know if there's been any problems.  I'm
confident things will be fine tomorrow, and at that time I'll be glad
to attach a Tested-by tag.

Thanks,
Pete
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH RT 3/4] net: netfilter: Serialize xt_write_recseq sections on RT

2012-11-01 Thread Peter LaDow
> git log v3.0.36-rt58..3.0.48-rt72
>
> That's what a source version control system is designed for AFAICT

Thanks for the tip.  I (naively) presumed there were published
changelogs and was looking for them.  Nor did I know the git logs were
limited to releases, and didn't look there because I feared a huge
number of mid-release commits.  Thanks for the pointer.

I've gone through all the commits from 3.0.36-rt58 to 3.0.48-rt72 and
see a few items we need to pull in.  The majority are irrelevant to
our platform/configuration (ALSA, SCSI, x86, etc).  But the few I did
find point to us moving to rt72 when we can.

Thanks,
Pete
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH RT 3/4] net: netfilter: Serialize xt_write_recseq sections on RT

2012-11-02 Thread Peter LaDow
On Thu, Nov 1, 2012 at 5:36 PM, Peter LaDow  wrote:
> I'm have a setup running 3.0.48-rt72.  It's been running about 8 hours
> so far, and tomorrow I'll know if there's been any problems.  I'm
> confident things will be fine tomorrow, and at that time I'll be glad
> to attach a Tested-by tag.

Ok.  We've had 3.0.48-rt72 running on our setup that used to exhibit
the corruption of the sequence count xt_recseq for over 24 hours.  No
further corruptions occurred.  We've also tested this same version
with our PPP setup and it has not exhibited any problems.  Feel free
to add a Tested-by tag on the 3.0.48-rt72 patch for the netfilter
code.

Thanks,
Pete
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH RT 3/4] net: netfilter: Serialize xt_write_recseq sections on RT

2012-11-02 Thread Peter LaDow
On a separate note, I want to thank everyone that helped with this
issue, especially Eric and Thomas, to and Steven and Thomas for
schooling me on the changelog extraction.  This problem was a big one
for us that we were struggling to understand.  All the help is greatly
appreciated.

Thanks,
Pete
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Process Hang in __read_seqcount_begin

2012-10-26 Thread Peter LaDow
On Tue, Oct 23, 2012 at 9:32 PM, Eric Dumazet  wrote:
> Could you try following patch ?

So, I applied your patch.  And so far, it seems to have fixed the
issue.  I've had my systems running for 48 hours, and no lockup in
iptables.  Usually, I could get a lockup to occur within 12 to 24
hours, and running this long tells me this patch may have things
fixed.

Now, I have a couple of questions.

First, I'm not sure how this actually fixes things.  It does seem
there was a race before.  But it isn't clear to me how this patch
eliminates the race.  I fear that this patch only reduces the window
in which a race could occur, but doesn't eliminate it completely.

Second, perhaps use seqlock_t instead of a seqcount_t for xt_recseq?
This eliminates the problem.  But given that it has been using
seqcount_t for a long time, and is still using it, and nobody else has
had this issue appear, makes me wonder if this problem isn't something
unique to the RT patches.  Perhaps only use seqlock_t in built for
PREEMPT_RT_FULL?

Third, recent RT patches (such as 3.6.2-rt4) have added
preempt_disable_rt() and preempt_enable_rt() calls inside of
read_seqcount_begin() and write_seqcount_end() respectively.  The call
to local_bh_disable/local_bh_enable doesn't do anything to
disable/enable preemption (in 3.6.2-rt4), but it does in 3.03.36-rt58.
 But in 3.0.36-rt58 it only did so if not in an atomic context.  And
it doesn't appear to be an atomic context since local_bh_disable
increments preempt_count by SOFTIRQ_OFFSET.  And it appeared that
building with SMP enabled, even though it did call
preempt_disable_rt(), indirectly, through local_bh_disable(), but the
lockup still occurred.

Finally, after more testing, if this patch proves a solution to the
problem, we could apply it locally.  But what kind of testing would be
required as part of a submission back to the general kernel/RT folk?
The patch is easy enough to generate, but if it can't be proven that
this patch actually fixes anything, I fail to see how it would be
useful.

Thanks,
Pete LaDow
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Process Hang in __read_seqcount_begin

2012-10-26 Thread Peter LaDow
(I've added netfilter and linux-rt-users to try to pull in more help).

On Fri, Oct 26, 2012 at 9:48 AM, Eric Dumazet  wrote:
> Upstream kernel is fine, there is no race, as long as :
>
> local_bh_disable() disables BH and preemption.

Looking at the unpatched code in net/ipv4/netfilter/ip_tables.c, it
doesn't appear that any of the code checks the return value for
xt_write_receq_begin to determine if it is safe to write.  And neither
does the newly patched code.  How did the mainline code prevent
corruption of the tables it is updating?

Why isn't there something like:

  while ( (addend = xt_write_recseq_begin()) == 0 );

To make sure that only one person has write access to the tables?
Better yet, why not use a seqlock_t instead?

> Apparently RT changes this, so RT needs to change the code.

The RT patch only touches local_bh_disable/enable, not the code in
ip_tables.c.  Does the local_bh_disable/enable in the mainline code
protect against multiple writers?

> cmpxchg() has strong guarantees (and is also slower than upstream code),
> and seems a reasonable way to fix RT/iptables

I see this now.  And I agree that your patch would prevent corruption
of the sequence counter.

Thanks,
Pete
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Process Hang in __read_seqcount_begin

2012-10-26 Thread Peter LaDow
On Fri, Oct 26, 2012 at 2:05 PM, Eric Dumazet  wrote:
> Do you know what is per cpu data in linux kernel ?

I sorta did.  But since your response, I did more reading, and now I
see what you mean.  But I don't think this is a per cpu issue.  More
below.

> Because its not needed. Really I dont know why you want that.
>
> Once you are sure a thread cannot be interrupted by a softirq, and
> cannot migrate to another cpu, access to percpu data doesnt need other
> synchronization at all.

Because there are multiple entry points on the same CPU.  In
net/ipv4/netfilter/ip_tables, there are two entries to
xt_write_recseq_begin().  The first is in ipt_do_table and the other
is in get_counters.  Where we are seeing the lockup is with a
getsockopt syscall leading to do_counters.  The other path is through
ipt_do_table, which is installed as a hook.  I'm not sure from what
context the hooks are called, but it is certainly from a different
context than the syscall.

> Following sequence is safe :
>
> addend = (__this_cpu_read(xt_recseq.sequence) + 1) & 1;
> /*
>  * This is kind of a write_seqcount_begin(), but addend is 0 or 1
>  * We dont check addend value to avoid a test and conditional jump,
>  * since addend is most likely 1
>  */
> __this_cpu_add(xt_recseq.sequence, addend);

If this were safe, we wouldn't be seeing this lockup and your patch
wouldn't be needed.  So it seems that your patch doesn't really
address the issue that we are not "sure a thread cannot be interrupted
by a softirq, and cannot migrate to another cpu".  Well, we know it
cannot migrate to another CPU, because there isn't another CPU.  So
apparently, it can be interrupted by a softirq.  So local_bh_disable
isn't doing anything useful in the RT patches with regard to this.

As I mentioned earlier, I think perhaps what your patch did was ensure
an atomic update of the sequence counter.  But it does nothing to
synchronize two writers.  If they were already synchronized (such as
via the call to local_bh_disable), then we wouldn't see sequence
counter corruption, no?

Pete
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Process Hang in __read_seqcount_begin

2012-10-30 Thread Peter LaDow
Ok.  More of an update.  We've managed to create a scenario that
exhibits the problem much earlier.  We can now cause the lockup to
occur within a few hours (rather than the 12 to 24 hours in our other
scenario).

 Our setup is to to have a a lot of traffic constantly being processed
by the netfilter code.  After about 2 hours, any external attempt to
read the table entries (such as with getsocktopt and
IPT_SO_GET_ENTRIES) triggers the lockup.  What is strange is that this
does not appear until after a couple of hours of heavy traffic.  We
cannot trigger this problem in the first hour, rarely in the second
hour, and always after the second hour.

Now, our original setup did not have as much traffic.  But based upon
a quick, back of the napkin computation, it seems to occurring after a
certain amount of traffic.  I can try to get more firm numbers.  But
this kind of behavior hints less at a race condition between two
writers, and is instead somehow dependent upon the amount of traffic.
Indeed, my test program only uses IPT_SO_GET_ENTRIES which does not
trigger the second path do do_add_counters.  So I'm no longer thinking
the path through setsockopts is a cause of the problem.

So instead, it seems that the only way there could be multiple writers
(assuming that is the problem), is if there are multiple contexts
through which ipt_do_table() is called.  So far, my perusal of the
code indicates only through the hooks in each of the iptables modules.
 And it isn't clear to me how these are called.  But it does seem that
even with the patch Eric provided (which fixes the seqcount update),
there is still a potential problem.

If indeed we have multiple contexts executing ipt_do_table(), it is
possible for more than just the seqcount to be corrupted.  Indeed, it
seems that any updates to the internal structures could cause
problems.  It isn't clear to me if there is anything modified here,
other than the counters, so I'm not sure if there are any other
issues.  But regardless, if the counters could become corrupted, then
it is possible to break any rules that use them.

Anyway, based on earlier discussion, is there any reason not to use a
lock (presuming any solution properly takes into account possible
recursion)?  I understand that the mainline is protected, but perhaps
in the RT version we can use seqlock (and prevent a recursive lock)?

Thanks,
Pete LaDow
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Process Hang in __read_seqcount_begin

2012-10-22 Thread Peter LaDow
I posted this problem some time back on the linux-rt-users and
netfilter lists.  Since then, we thought we had a workaround to avoid
this problem, so we dropped the issue.  But now 5 months later, the
problem has reappeared.  And this time it is much more serious and
much more difficult to re-create.  After perusing both those lists,
I'm not sure if those were the proper places to post.  The netfilter
list seems to be more focused on the user space side of things, and
the RT page indicates that kernel side RT issues should go to lkml.

Anyway, here's a repost of that problem from July.  Perhaps somebody
here can point us in the right direction.

We are running 3.0.36-rt57 on a powerpc box.  During some testing with
heavy loads and interfaces coming up/going down (specifically PPP), we
have run into a case where iptables hangs and cannot be killed.  It
requires a reboot to fix the problem.

Connecting the BDI and debugging the kernel, we get:

#0  get_counters (t=0xdd5145a0, counters=0xe3458000)
at include/linux/seqlock.h:66
#1  0xc026b4ac in do_ipt_get_ctl (sk=,
cmd=, user=0x10612078, len=)
at net/ipv4/netfilter/ip_tables.c:918
#2  0xc06c in nf_sockopt (sk=, pf=2 '\002',
val=, opt=, len=0xdd4c7d4c,
get=1) at net/netfilter/nf_sockopt.c:109
#3  0xc0236b1c in ip_getsockopt (sk=0xdf071480, level=,
optname=65, optval=0x10612078 ,
optlen=0xbfbe0c2c) at net/ipv4/ip_sockglue.c:1308
#4  0xc02522a8 in raw_getsockopt (sk=0xdf071480, level=,
optname=, optval=,
optlen=) at net/ipv4/raw.c:811
#5  0xc01f4c38 in sock_common_getsockopt (sock=,
level=, optname=,
optval=, optlen=)
at net/core/sock.c:2157
#6  0xc01f2df8 in sys_getsockopt (fd=, level=0,
optname=65, optval=0x10612078 ,
optlen=0xbfbe0c2c) at net/socket.c:1839
#7  0xc01f45b4 in sys_socketcall (call=15, args=)
at net/socket.c:2421

It seems to be stuck in __read_seqcount_begin.  From include/linux/seqlock.h:

static inline unsigned __read_seqcount_begin(const seqcount_t *s)
{
unsigned ret;

repeat:
ret = ACCESS_ONCE(s->sequence);
if (unlikely(ret & 1)) {
cpu_relax();
  <- It is always here
goto repeat;
}
return ret;
}

I've been scouring the mailing lists and Google searches trying to
find something, but thus far have come up with nothing.

Any tips would be appreciated.

Thanks,
Pete
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Process Hang in __read_seqcount_begin

2012-10-22 Thread Peter LaDow
On Mon, Oct 22, 2012 at 10:01 AM, Eric Dumazet  wrote:
> This looks like a corruption of s->sequence, and is value is odd, even
> if no writer is alive.
>
> Does local_bh_disable() disables preemption on RT ?

Hmmm

With PREEMPT_RT_FULL defined (as we have):

void local_bh_disable(void)
{
migrate_disable();
current->softirq_nestcnt++;
}

And the RT patches add the following:

#ifdef CONFIG_PREEMPT_RT_FULL
# define preempt_disable_rt()  preempt_disable()
# define preempt_enable_rt()   preempt_enable()
# define preempt_disable_nort()do { } while (0)
# define preempt_enable_nort() do { } while (0)
# ifdef CONFIG_SMP
   extern void migrate_disable(void);
   extern void migrate_enable(void);
# else /* CONFIG_SMP */
#  define migrate_disable()do { } while (0)
#  define migrate_enable() do { } while (0)
# endif /* CONFIG_SMP */
#else
# define preempt_disable_rt()  do { } while (0)
# define preempt_enable_rt()   do { } while (0)
# define preempt_disable_nort()preempt_disable()
# define preempt_enable_nort() preempt_enable()
# define migrate_disable() preempt_disable()
# define migrate_enable()  preempt_enable()
#endif

And since we are not SMP, local_bh_disable() essentially does nothing.
 These definitions are consistent across all the RT patches, up to
3.6.2-rt4 (as far as I can tell).

Now, is preemption required to be disabled in non-SMP systems?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Process Hang in __read_seqcount_begin

2012-10-22 Thread Peter LaDow
> Now, is preemption required to be disabled in non-SMP systems?

I did more digging, and I found this.

In linux/netfilter/x_tables.h, there is the definition of
xt_write_recseq_begin.  This function updates the sequence number for
the sequence locks.  This is called in the iptables kernel code.  From
the header for the function:

/**
 * xt_write_recseq_begin - start of a write section
 *
 * Begin packet processing : all readers must wait the end
 * 1) Must be called with preemption disabled
 * 2) softirqs must be disabled too (or we should use irqsafe_cpu_add())
 * Returns :
 *  1 if no recursion on this cpu
 *  0 if recursion detected
 */

Note #1.  Your earlier response what that local_bh_disable should
disable preemption.  Indeed, looking at the calls to
xt_write_recseq_begin in the code, all are preceeded by a call to
local_bh_disable().  This call (local_bh_disable) is modified by the
RT patch.  The standard kernel calls __local_bh_disable, which does
quite a bit of work.  From the standard kernel:

static inline void _local_bh_enable_ip(unsigned long ip)
{
WARN_ON_ONCE(in_irq() || irqs_disabled());
#ifdef CONFIG_TRACE_IRQFLAGS
local_irq_disable();
#endif
/*
 * Are softirqs going to be turned on now:
 */
if (softirq_count() == SOFTIRQ_DISABLE_OFFSET)
trace_softirqs_on(ip);
/*
 * Keep preemption disabled until we are done with
 * softirq processing:
 */
sub_preempt_count(SOFTIRQ_DISABLE_OFFSET - 1);

if (unlikely(!in_interrupt() && local_softirq_pending()))
do_softirq();

dec_preempt_count();
#ifdef CONFIG_TRACE_IRQFLAGS
local_irq_enable();
#endif
preempt_check_resched();
}

But the patch changes things:

void local_bh_disable(void)
{
   migrate_disable();
   current->softirq_nestcnt++;
}

And the call to migrate_disable() is defined as a macro, depending
upon the configuration.  From the patch:

#ifdef CONFIG_PREEMPT_RT_FULL
# define preempt_disable_rt()   preempt_disable()
# define preempt_enable_rt()preempt_enable()
# define preempt_disable_nort() do { } while (0)
# define preempt_enable_nort()  do { } while (0)
# ifdef CONFIG_SMP
   extern void migrate_disable(void);
   extern void migrate_enable(void);
# else /* CONFIG_SMP */
#  define migrate_disable() do { } while (0)
#  define migrate_enable()  do { } while (0)
# endif /* CONFIG_SMP */
#else
# define preempt_disable_rt()   do { } while (0)
# define preempt_enable_rt()do { } while (0)
# define preempt_disable_nort() preempt_disable()
# define preempt_enable_nort()  preempt_enable()
# define migrate_disable()  preempt_disable()
# define migrate_enable()   preempt_enable()
#endif

In our configuration, we have CONFIG_PREEMPT_RT_FULL defined, and do
not have CONFIG_SMP defined.  So migrate_enable() becomes a NOP.  This
seems to violate the requirement on the xt_write_recseq_begin
requirement.  However, enabling SMP does enable the actual function.

Should local_bh_disable() call the actual function in non-SMP systems
to meet the requirements for xt_write_recseq_begin()?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/