Re: the very first step towards MULTIPROCESSOR friendly PF

2015-09-07 Thread Alexandr Nedvedicky
Hello,

please do not use the patch I've sent last Friday. It's wrong. It might work,
but it's a dead end in evolution. The thing is I misunderstood (read: ignored)
all the constraints behind packets and interrupts in OpenBSD kernel.

see mikeb's email below.


please do not wast your cycles with the stuff from last Friday.

regards
sasha

cut here to get mike's email...

8<---8<-8<

Hi,

We will certainly discuss it next week, but I think this is a wrong
direction.  rw-locks don't mix with interrupts at all, grabbing locks
for every packet should also be avoided if possible.

You have mentioned deferred rule gc that you've implemented.
perhaps this is more important to do right now.  Get rid of as much
memory operations during packet processing as possible.
This is something that can be committed early.

Cheers,
Mike



Re: the very first step towards MULTIPROCESSOR friendly PF

2015-09-04 Thread Alexandr Nedvedicky
Hello,

after reading emails from Philip Guenther and Mark Kettenis, doing some RTFM on
locking in OpenBSD kernel I have a new patch. I call it as a smp-step-0.

Patch introduces a KERNEL_LOCK() to PF. Many dances with KERNEL_LOCK() happens
in pf_test(). From future work point of view there are distinct parts as
follows in pf_test():

- packet sanitization with reassembly. We grab and drop KERNEL_LOCK()
  pf_noralize_ip*() functions around  pf_reassemble*(). Fragment cache
  will get its own rw-lock later.

- as soon as packet is sanitized (or should say normalized), we do
  state check. State check will get its own rw-lock (I call it
  smp-step-1)

- if no state is found PF proceeds to rule check, which will get
  its rw-lock too (smp-step-2).

Basically KERNEL_LOCK() is placed everywhere, where individual rw-locks will be
introduced later.

As Philip Guenther has pointed out earlier in email, the user-threads must
grab KERNEL_LOCK() before they will jump to rw-locks. The game is easy for PF
in ioctl() subsystem, since everything there will be user thread.

For packets (pf_test()) function the game is not that clear, since not all
packets will be running in user-thread context. My assumption is the packets
bound to loopback and local outbound packets are running in user-threads.  All
other packets (inbound and forwarded) are running in interrupt-threads. For
this reason I'd like to introduce the extra code to pf_test(), which decides
whether packet requires KERNEL_LOCK(), because it is running as a user thread.

I'd like to get the patch in, before proceeding to next SMP step.

thanks and
regards
sasha

8<---8<---8<--8<

Index: net/pf.c
===
RCS file: /cvs/src/sys/net/pf.c,v
retrieving revision 1.937
diff -u -p -r1.937 pf.c
--- net/pf.c1 Sep 2015 19:12:25 -   1.937
+++ net/pf.c4 Sep 2015 12:52:55 -
@@ -6316,6 +6316,7 @@ pf_test(sa_family_t af, int fwdir, struc
union pf_headers pdhdrs;
int  dir = (fwdir == PF_FWD) ? PF_OUT : fwdir;
u_int32_tqid, pqid = 0;
+   int  have_lock = 0;
 
if (!pf_status.running)
return (PF_PASS);
@@ -6349,6 +6350,26 @@ pf_test(sa_family_t af, int fwdir, struc
return (PF_PASS);
}
 
+   /*
+* Unlike kernel-threads, the user-threads require to have
+* KENREL_LOCK() before they will touch rw-locks.
+*
+* The assumption here is fairly simple:
+*  - all inbound packets run as interrupt (kernel-thread),
+*unless they are bound to loopback
+*
+*  - outbound packets run as kernel-thread, if they
+*are marked by PF_TAG_KERNEL flag.
+*
+* in all other cases we must assume packet runs in user-thread
+* context and we should get KERNEL_LOCK().
+*/
+   if ((dir == PF_IN) && (ifp->if_type != IFT_LOOP)) {
+   (*m0)->m_pkthdr.pf.flags |= PF_TAG_KERNEL;
+   } else if (((*m0)->m_pkthdr.pf.flags & PF_TAG_KERNEL) == 0) {
+   PF_LOCK(have_lock);
+   }
+
action = pf_setup_pdesc(&pd, &pdhdrs, af, dir, kif, *m0, &reason);
if (action != PF_PASS) {
 #if NPFLOG > 0
@@ -6399,6 +6420,7 @@ pf_test(sa_family_t af, int fwdir, struc
 * handle fragments that aren't reassembled by
 * normalization
 */
+   PF_LOCK(have_lock);
action = pf_test_rule(&pd, &r, &s, &a, &ruleset);
if (action != PF_PASS)
REASON_SET(&reason, PFRES_FRAG);
@@ -6413,6 +6435,7 @@ pf_test(sa_family_t af, int fwdir, struc
"dropping IPv6 packet with ICMPv4 payload");
goto done;
}
+   PF_LOCK(have_lock);
action = pf_test_state_icmp(&pd, &s, &reason);
if (action == PF_PASS || action == PF_AFRT) {
 #if NPFSYNC > 0
@@ -6437,6 +6460,7 @@ pf_test(sa_family_t af, int fwdir, struc
"dropping IPv4 packet with ICMPv6 payload");
goto done;
}
+   PF_LOCK(have_lock);
action = pf_test_state_icmp(&pd, &s, &reason);
if (action == PF_PASS || action == PF_AFRT) {
 #if NPFSYNC > 0
@@ -6461,6 +6485,7 @@ pf_test(sa_family_t af, int fwdir, struc
if (action == PF_DROP)
goto done;
}
+   PF_LOCK(have_lock);
action = pf_test_state(&pd, &s, &reason);
if (action == PF_PASS || action == PF_AFRT) {
 #if NPFSYNC > 0
@@ -6658,6 +6683,8 @@ done:
else if (!s->if_index_out && dir == PF_OUT)
 

Re: the very first step towards MULTIPROCESSOR friendly PF

2015-08-27 Thread Martin Pieuchot
On 27/08/15(Thu) 14:19, Alexandr Nedvedicky wrote:
> 
> 
> > > Assuming the locking in MULTIPROCESSOR goes like:
> > >interrupt grabs splsoftnet -> ip_input -> PF grabs KERNEL_LOCK()
> > > We need to take care of ioctl() call path and purge thread. Those need to
> > > get synchronize with packets using KERNEL_LOCK(). They should not to mess 
> > > with
> > > splsoftnet() any more in MULTIPROCESSOR version.
> > 
> > As long as we are using (soft-)interrupts to process incoming packets
> > they should.
> > 
> > If you don't raise the SPL level to softnet while CPU0 is executing some
> > code in ioctl() path and it takes an interrupt at this point it might end
> > up executing ip_input() *before* returning to the ioctl() (process)
> > context.  This scenario might corrupt the states you're trying to protect.
> > 
> 
> just to verify I got everything right here. I assume I've got MULTIPROCESSOR
> kernel with my broken patch, which basically trades or splfotnet() locks for
> KERNEL_LOCKS()...
> 
> We have ioctl() operation, which attempts to insert a rule into a global
> ruleset, ioctl() grabs KERNEL_LOCK() only and gets to work. (yes my broken
> patch does not grab splsoftnet()).
> 
> while ioctl() is working there is interrupt, which delivers packet from NIC.
> since no one holds splsoftnet() on CPU packet is free to go and enters
> firewall (pf_test()).  And that's very bad, interrupt tries to grab
> KERNEL_LOCK(), which is still held by ioctl() packet has just interrupted. One
> of the two bad things will happen:
> 
>   a) deadlock, we will be waiting for KERNEL_LOCK(), which is still held
>  by interrupted ioctl() operation.
> 
>   b) since KERNEL_LOCK() is recursive, the pf_test() will be free to
>  go possibly finding a list of rules in inconsistent state. In this
>  case no deadlock happens, but PF plays game with pointers...
> 
> as Kettenis pointed out the KERNEL_LOCK() is recursive, I have an itchy 
> feeling
> we are taking b) option. Is that right?

Yep :)



Re: the very first step towards MULTIPROCESSOR friendly PF

2015-08-27 Thread Alexandr Nedvedicky


> > Assuming the locking in MULTIPROCESSOR goes like:
> >  interrupt grabs splsoftnet -> ip_input -> PF grabs KERNEL_LOCK()
> > We need to take care of ioctl() call path and purge thread. Those need to
> > get synchronize with packets using KERNEL_LOCK(). They should not to mess 
> > with
> > splsoftnet() any more in MULTIPROCESSOR version.
> 
> As long as we are using (soft-)interrupts to process incoming packets
> they should.
> 
> If you don't raise the SPL level to softnet while CPU0 is executing some
> code in ioctl() path and it takes an interrupt at this point it might end
> up executing ip_input() *before* returning to the ioctl() (process)
> context.  This scenario might corrupt the states you're trying to protect.
> 

just to verify I got everything right here. I assume I've got MULTIPROCESSOR
kernel with my broken patch, which basically trades or splfotnet() locks for
KERNEL_LOCKS()...

We have ioctl() operation, which attempts to insert a rule into a global
ruleset, ioctl() grabs KERNEL_LOCK() only and gets to work. (yes my broken
patch does not grab splsoftnet()).

while ioctl() is working there is interrupt, which delivers packet from NIC.
since no one holds splsoftnet() on CPU packet is free to go and enters
firewall (pf_test()).  And that's very bad, interrupt tries to grab
KERNEL_LOCK(), which is still held by ioctl() packet has just interrupted. One
of the two bad things will happen:

a) deadlock, we will be waiting for KERNEL_LOCK(), which is still held
   by interrupted ioctl() operation.

b) since KERNEL_LOCK() is recursive, the pf_test() will be free to
   go possibly finding a list of rules in inconsistent state. In this
   case no deadlock happens, but PF plays game with pointers...

as Kettenis pointed out the KERNEL_LOCK() is recursive, I have an itchy feeling
we are taking b) option. Is that right?

> > Does that idea sound right? If not what piece I'm still missing in puzzle?
> 
> I think you're assuming that the "softnet" code path is running in a
> thread which is not (yet) the case.

yes, my context is not fully switched from Solaris to OpenBSD yet...

thanks and
regards
sasha



Re: the very first step towards MULTIPROCESSOR friendly PF

2015-08-27 Thread Martin Pieuchot
On 27/08/15(Thu) 11:10, Alexandr Nedvedicky wrote:
> On Wed, Aug 26, 2015 at 06:12:10PM +0200, Mark Kettenis wrote:
> > > Date: Wed, 26 Aug 2015 17:30:14 +0200
> > > From: Alexandr Nedvedicky 
> > > 
> > > Hello,
> > > 
> > > I'm not sure I got everything right in Calgary. So this patch should
> > > roughly illustrates how I think we should start moving forward to
> > > make PF MULTIPROCESSOR friendly. It's quite likely my proposal/way
> > > is completely off, I'll be happy if you put me back to ground.
> > > 
> > > The brief summary of what patch is trying to achieve is as follows:
> > > 
> > >   patch trades all splsoftnet() for KERNEL_LOCK() when it gets compiled
> > >   with MULTIPROCESSOR option on.
> > > 
> > >   if MULTIPROCESSOR option is off, the compiler produces PF, which uses
> > >   splsoftnet.
> > > 
> > >   To achieve this the patch introduces macros PF_LOCK()/PF_UNLOCK(),
> > >   which expand to KERNEL_LOCK()/KERNEL_UNLOCK(), when MULTIPROCESSOR is 
> > > on.
> > >   On the other hand if MULTIPROCESSOR is off the PF_*LOCK() macros become
> > >   splsoftnet()/splx()
> > 
> > I don't think this will work.  Even on MULTIPROCESSOR kernels you'll
> > need to raise the spl to prevent soft interrupts from running on the
> > same CPU.  KERNEL_LOCK() will not prevent this from happening as it is
> > a recursive lock.  This is why OpenBSD's mutexes (spinning locks)
> > raise the spl.
> > 
> > So I think you'll have to define PF_LOCK()/PF_UNLOCK() to do the spl
> > stuff even for MULTIPROCESSOR kernels.
> 
> Thanks for putting my feet to ground. I'm trying to make some sense from that.
> So if I understand splsotfnet() right in MULTIPROCESSOR kernel, it prevents
> CPU from handling another network interrupt until the first one is
> handled. So that splsoftnet should be raised by network driver, before
> packet reaches PF (pf_test() function).

pf_test() is not always called in interrupt context.  When it is called
from a process context you want to make sure the CPU wont try to execute
a softnet interrupt that might corrupt the states it is currently
messing with.

> the pf_test() must protect consistency of firewall data, which are shared
> between CPUs (?can we say kernel threads?). So for the first phase, we opt for
> KERNEL_LOCK() being grabbed by pf_test() as soon as it gets called. It makes 
> PF
> to still process single packet at time. All other CPUs will have to wait on
> KERNEL_LOCK(), holding their splsoftnets(). Do I still follow the concept, or
> am I wrong again?

You're correct.  We cannot say kernel threads right now because incoming
packets are *still* processed in (soft-)interrupt context.

> Assuming the locking in MULTIPROCESSOR goes like:
>interrupt grabs splsoftnet -> ip_input -> PF grabs KERNEL_LOCK()
> We need to take care of ioctl() call path and purge thread. Those need to
> get synchronize with packets using KERNEL_LOCK(). They should not to mess with
> splsoftnet() any more in MULTIPROCESSOR version.

As long as we are using (soft-)interrupts to process incoming packets
they should.

If you don't raise the SPL level to softnet while CPU0 is executing some
code in ioctl() path and it takes an interrupt at this point it might end
up executing ip_input() *before* returning to the ioctl() (process)
context.  This scenario might corrupt the states you're trying to protect.

> Does that idea sound right? If not what piece I'm still missing in puzzle?

I think you're assuming that the "softnet" code path is running in a
thread which is not (yet) the case.

Martin



Re: the very first step towards MULTIPROCESSOR friendly PF

2015-08-27 Thread Alexandr Nedvedicky
On Wed, Aug 26, 2015 at 06:12:10PM +0200, Mark Kettenis wrote:
> > Date: Wed, 26 Aug 2015 17:30:14 +0200
> > From: Alexandr Nedvedicky 
> > 
> > Hello,
> > 
> > I'm not sure I got everything right in Calgary. So this patch should
> > roughly illustrates how I think we should start moving forward to
> > make PF MULTIPROCESSOR friendly. It's quite likely my proposal/way
> > is completely off, I'll be happy if you put me back to ground.
> > 
> > The brief summary of what patch is trying to achieve is as follows:
> > 
> > patch trades all splsoftnet() for KERNEL_LOCK() when it gets compiled
> > with MULTIPROCESSOR option on.
> > 
> > if MULTIPROCESSOR option is off, the compiler produces PF, which uses
> > splsoftnet.
> > 
> > To achieve this the patch introduces macros PF_LOCK()/PF_UNLOCK(),
> > which expand to KERNEL_LOCK()/KERNEL_UNLOCK(), when MULTIPROCESSOR is 
> > on.
> > On the other hand if MULTIPROCESSOR is off the PF_*LOCK() macros become
> > splsoftnet()/splx()
> 
> I don't think this will work.  Even on MULTIPROCESSOR kernels you'll
> need to raise the spl to prevent soft interrupts from running on the
> same CPU.  KERNEL_LOCK() will not prevent this from happening as it is
> a recursive lock.  This is why OpenBSD's mutexes (spinning locks)
> raise the spl.
> 
> So I think you'll have to define PF_LOCK()/PF_UNLOCK() to do the spl
> stuff even for MULTIPROCESSOR kernels.

Thanks for putting my feet to ground. I'm trying to make some sense from that.
So if I understand splsotfnet() right in MULTIPROCESSOR kernel, it prevents
CPU from handling another network interrupt until the first one is
handled. So that splsoftnet should be raised by network driver, before
packet reaches PF (pf_test() function).

the pf_test() must protect consistency of firewall data, which are shared
between CPUs (?can we say kernel threads?). So for the first phase, we opt for
KERNEL_LOCK() being grabbed by pf_test() as soon as it gets called. It makes PF
to still process single packet at time. All other CPUs will have to wait on
KERNEL_LOCK(), holding their splsoftnets(). Do I still follow the concept, or
am I wrong again?

Assuming the locking in MULTIPROCESSOR goes like:
 interrupt grabs splsoftnet -> ip_input -> PF grabs KERNEL_LOCK()
We need to take care of ioctl() call path and purge thread. Those need to
get synchronize with packets using KERNEL_LOCK(). They should not to mess with
splsoftnet() any more in MULTIPROCESSOR version.

Does that idea sound right? If not what piece I'm still missing in puzzle?

thanks and
regards
sasha



Re: the very first step towards MULTIPROCESSOR friendly PF

2015-08-26 Thread Mark Kettenis
> Date: Wed, 26 Aug 2015 17:30:14 +0200
> From: Alexandr Nedvedicky 
> 
> Hello,
> 
> I'm not sure I got everything right in Calgary. So this patch should
> roughly illustrates how I think we should start moving forward to
> make PF MULTIPROCESSOR friendly. It's quite likely my proposal/way
> is completely off, I'll be happy if you put me back to ground.
> 
> The brief summary of what patch is trying to achieve is as follows:
> 
>   patch trades all splsoftnet() for KERNEL_LOCK() when it gets compiled
>   with MULTIPROCESSOR option on.
> 
>   if MULTIPROCESSOR option is off, the compiler produces PF, which uses
>   splsoftnet.
> 
>   To achieve this the patch introduces macros PF_LOCK()/PF_UNLOCK(),
>   which expand to KERNEL_LOCK()/KERNEL_UNLOCK(), when MULTIPROCESSOR is 
> on.
>   On the other hand if MULTIPROCESSOR is off the PF_*LOCK() macros become
>   splsoftnet()/splx()

I don't think this will work.  Even on MULTIPROCESSOR kernels you'll
need to raise the spl to prevent soft interrupts from running on the
same CPU.  KERNEL_LOCK() will not prevent this from happening as it is
a recursive lock.  This is why OpenBSD's mutexes (spinning locks)
raise the spl.

So I think you'll have to define PF_LOCK()/PF_UNLOCK() to do the spl
stuff even for MULTIPROCESSOR kernels.