[RFC 1/2] bridge: avoid ptype_all packet handling

2007-03-02 Thread Stephen Hemminger
On Fri, 02 Mar 2007 13:26:38 -0800 (PST)
David Miller [EMAIL PROTECTED] wrote:

 From: Stephen Hemminger [EMAIL PROTECTED]
 Date: Wed, 28 Feb 2007 17:18:46 -0800
 
  I was measuring bridging/routing performance and noticed this.
  
  The current code runs the all packet type handlers before calling the
  bridge hook.  If an application (like some DHCP clients) is using AF_PACKET,
  this means that each received packet gets run through the Berkeley Packet 
  Filter
  code in sk_run_filter (slow).
 
 I know we closed this out by saying that even though performance
 sucks, we can't really apply this without breaking things.

wrong.

 What would be broken is if the DHCP client isn't specifying
 a device ifindex when it binds the AF_PACKET socket.  That
 would be an easy way to fix this performance problem at the
 application level.
 
 The DHCP client should only care about a particular interface's
 traffic, the one it wants to listen on.


My assumption is that when bridging, the normal stack path only has
to receive those packets that it would receive if it was not doing
bridging.

A better version of the patch is:
==

The current code runs the all packet type handlers before calling the
bridge hook.  If an application (like some DHCP clients) is using AF_PACKET,
this means that each received packet gets run through the Berkeley Packet Filter
code in sk_run_filter. This is significant overhead.

By moving the bridging hook to run first, the packets flowing through
the bridge get filtered out there first. This results in a 14%
improvement in performance.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]
---
 net/core/dev.c |   24 
 1 file changed, 12 insertions(+), 12 deletions(-)

--- netem.orig/net/core/dev.c
+++ netem/net/core/dev.c
@@ -1702,9 +1702,12 @@ struct net_bridge_fdb_entry *(*br_fdb_ge
unsigned char *addr);
 void (*br_fdb_put_hook)(struct net_bridge_fdb_entry *ent);
 
-static __inline__ int handle_bridge(struct sk_buff **pskb,
-   struct packet_type **pt_prev, int *ret,
-   struct net_device *orig_dev)
+/*
+ * If bridge module is loaded call bridging hook.
+ * when it returns 1, this is a non-local packet
+ */
+int (*br_handle_frame_hook)(struct net_bridge_port *p, struct sk_buff **pskb) 
__read_mostly;
+static int handle_bridge(struct sk_buff **pskb)
 {
struct net_bridge_port *port;
 
@@ -1712,15 +1715,10 @@ static __inline__ int handle_bridge(stru
(port = rcu_dereference((*pskb)-dev-br_port)) == NULL)
return 0;
 
-   if (*pt_prev) {
-   *ret = deliver_skb(*pskb, *pt_prev, orig_dev);
-   *pt_prev = NULL;
-   }
-
return br_handle_frame_hook(port, pskb);
 }
 #else
-#define handle_bridge(skb, pt_prev, ret, orig_dev) (0)
+#define handle_bridge(pskb)0
 #endif
 
 #ifdef CONFIG_NET_CLS_ACT
@@ -1799,6 +1797,9 @@ int netif_receive_skb(struct sk_buff *sk
}
 #endif
 
+   if (handle_bridge(skb))
+   goto out;
+
list_for_each_entry_rcu(ptype, ptype_all, list) {
if (!ptype-dev || ptype-dev == skb-dev) {
if (pt_prev)
@@ -1826,9 +1827,6 @@ int netif_receive_skb(struct sk_buff *sk
 ncls:
 #endif
 
-   if (handle_bridge(skb, pt_prev, ret, orig_dev))
-   goto out;
-
type = skb-protocol;
list_for_each_entry_rcu(ptype, ptype_base[ntohs(type)15], list) {
if (ptype-type == type 



-- 
Stephen Hemminger [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC 1/2] bridge: avoid ptype_all packet handling

2007-03-02 Thread Stephen Hemminger
On Fri, 02 Mar 2007 13:26:38 -0800 (PST)
David Miller [EMAIL PROTECTED] wrote:

 From: Stephen Hemminger [EMAIL PROTECTED]
 Date: Wed, 28 Feb 2007 17:18:46 -0800
 
  I was measuring bridging/routing performance and noticed this.
  
  The current code runs the all packet type handlers before calling the
  bridge hook.  If an application (like some DHCP clients) is using AF_PACKET,
  this means that each received packet gets run through the Berkeley Packet 
  Filter
  code in sk_run_filter (slow).
 
 I know we closed this out by saying that even though performance
 sucks, we can't really apply this without breaking things.

wrong.

 What would be broken is if the DHCP client isn't specifying
 a device ifindex when it binds the AF_PACKET socket.  That
 would be an easy way to fix this performance problem at the
 application level.
 
 The DHCP client should only care about a particular interface's
 traffic, the one it wants to listen on.


My assumption is that when bridging, the normal stack path only has
to receive those packets that it would receive if it was not doing
bridging.

A better version of the patch is:
==

The current code runs the all packet type handlers before calling the
bridge hook.  If an application (like some DHCP clients) is using AF_PACKET,
this means that each received packet gets run through the Berkeley Packet Filter
code in sk_run_filter. This is significant overhead.

By moving the bridging hook to run first, the packets flowing through
the bridge get filtered out there first. This results in a 14%
improvement in performance.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]
---
 net/core/dev.c |   24 
 1 file changed, 12 insertions(+), 12 deletions(-)

--- netem.orig/net/core/dev.c
+++ netem/net/core/dev.c
@@ -1702,9 +1702,12 @@ struct net_bridge_fdb_entry *(*br_fdb_ge
unsigned char *addr);
 void (*br_fdb_put_hook)(struct net_bridge_fdb_entry *ent);
 
-static __inline__ int handle_bridge(struct sk_buff **pskb,
-   struct packet_type **pt_prev, int *ret,
-   struct net_device *orig_dev)
+/*
+ * If bridge module is loaded call bridging hook.
+ * when it returns 1, this is a non-local packet
+ */
+int (*br_handle_frame_hook)(struct net_bridge_port *p, struct sk_buff **pskb) 
__read_mostly;
+static int handle_bridge(struct sk_buff **pskb)
 {
struct net_bridge_port *port;
 
@@ -1712,15 +1715,10 @@ static __inline__ int handle_bridge(stru
(port = rcu_dereference((*pskb)-dev-br_port)) == NULL)
return 0;
 
-   if (*pt_prev) {
-   *ret = deliver_skb(*pskb, *pt_prev, orig_dev);
-   *pt_prev = NULL;
-   }
-
return br_handle_frame_hook(port, pskb);
 }
 #else
-#define handle_bridge(skb, pt_prev, ret, orig_dev) (0)
+#define handle_bridge(pskb)0
 #endif
 
 #ifdef CONFIG_NET_CLS_ACT
@@ -1799,6 +1797,9 @@ int netif_receive_skb(struct sk_buff *sk
}
 #endif
 
+   if (handle_bridge(skb))
+   goto out;
+
list_for_each_entry_rcu(ptype, ptype_all, list) {
if (!ptype-dev || ptype-dev == skb-dev) {
if (pt_prev)
@@ -1826,9 +1827,6 @@ int netif_receive_skb(struct sk_buff *sk
 ncls:
 #endif
 
-   if (handle_bridge(skb, pt_prev, ret, orig_dev))
-   goto out;
-
type = skb-protocol;
list_for_each_entry_rcu(ptype, ptype_base[ntohs(type)15], list) {
if (ptype-type == type 



-- 
Stephen Hemminger [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC 1/2] bridge: avoid ptype_all packet handling

2007-03-02 Thread David Miller
From: Stephen Hemminger [EMAIL PROTECTED]
Date: Fri, 2 Mar 2007 14:09:29 -0800

 On Fri, 02 Mar 2007 13:26:38 -0800 (PST)
 David Miller [EMAIL PROTECTED] wrote:
 
  From: Stephen Hemminger [EMAIL PROTECTED]
  Date: Wed, 28 Feb 2007 17:18:46 -0800
  
   I was measuring bridging/routing performance and noticed this.
   
   The current code runs the all packet type handlers before calling the
   bridge hook.  If an application (like some DHCP clients) is using 
   AF_PACKET,
   this means that each received packet gets run through the Berkeley Packet 
   Filter
   code in sk_run_filter (slow).
  
  I know we closed this out by saying that even though performance
  sucks, we can't really apply this without breaking things.
 
 wrong.

I disagee, and your patch is still broken because as Jamal
pointed out (which you didn't address in any way) this breaks
traffic classification of bridged traffic as well.

If someone wants their network tap to hear all traffic, they do mean
all traffic, and this includes potentially seeing it multiple times
when things like bridging and virtual devices decap incoming frames.

We can't apply this.

Back to a workable solution, why doesn't DHCP specify a specific
device?  It would fix this performance problem completely, at the
application level.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC 1/2] bridge: avoid ptype_all packet handling

2007-03-02 Thread David Miller
From: David Miller [EMAIL PROTECTED]
Date: Fri, 02 Mar 2007 14:48:18 -0800 (PST)

 Back to a workable solution, why doesn't DHCP specify a specific
 device?  It would fix this performance problem completely, at the
 application level.

Since nobody seems to be able to be bothered to actually look
at what DHCP clients are doing, I actually did and it's no
surprise that broken stuff is happening here.

Here is how dhcp3-3.0.3 binds AF_PACKET sockets, in common/lpf.c:

struct sockaddr sa;
 ...
/* Bind to the interface name */
memset (sa, 0, sizeof sa);
sa.sa_family = AF_PACKET;
strncpy (sa.sa_data, (const char *)info - ifp, sizeof sa.sa_data);
if (bind (sock, sa, sizeof sa)) {
if (errno == ENOPROTOOPT || errno == EPROTONOSUPPORT ||
errno == ESOCKTNOSUPPORT || errno == EPFNOSUPPORT ||
errno == EAFNOSUPPORT || errno == EINVAL) {
log_error (socket: %m - make sure);
log_error (CONFIG_PACKET (Packet socket) %s,
   and CONFIG_FILTER);
log_error ((Socket Filtering) are enabled %s,
   in your kernel);
log_fatal (configuration!);
}
log_fatal (Bind socket to interface: %m);
}

So it puts a string into the sockaddr data, and there
is no mention of sockaddr_ll, which is what is supposed to be
provided as the socket address here, in the entire DHCP tree.

I'm tempted to say I must be missing something here, since I can't see
how this could possible work at all.  The string passed in should
be interpreted as the ifindex value, and thus trigger a -ENODEV
return from AF_PACKET's bind() implementation.

My suspicions are confirmed by the patch here:

http://kernel.org/pub/linux/kernel/people/chuyee/patches/dhcp-3.0/dhcp-3.0-linux_cooked_packet.patch

Really, this bogus bind() explains everything.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC 1/2] bridge: avoid ptype_all packet handling

2007-03-02 Thread Stephen Hemminger
On Fri, 02 Mar 2007 15:18:03 -0800 (PST)
David Miller [EMAIL PROTECTED] wrote:

 From: David Miller [EMAIL PROTECTED]
 Date: Fri, 02 Mar 2007 14:48:18 -0800 (PST)
 
  Back to a workable solution, why doesn't DHCP specify a specific
  device?  It would fix this performance problem completely, at the
  application level.
 
 Since nobody seems to be able to be bothered to actually look
 at what DHCP clients are doing, I actually did and it's no
 surprise that broken stuff is happening here.

I was in middle of checking that..

 Here is how dhcp3-3.0.3 binds AF_PACKET sockets, in common/lpf.c:
 
   struct sockaddr sa;
  ...
   /* Bind to the interface name */
   memset (sa, 0, sizeof sa);
   sa.sa_family = AF_PACKET;
   strncpy (sa.sa_data, (const char *)info - ifp, sizeof sa.sa_data);
   if (bind (sock, sa, sizeof sa)) {
   if (errno == ENOPROTOOPT || errno == EPROTONOSUPPORT ||
   errno == ESOCKTNOSUPPORT || errno == EPFNOSUPPORT ||
   errno == EAFNOSUPPORT || errno == EINVAL) {
   log_error (socket: %m - make sure);
   log_error (CONFIG_PACKET (Packet socket) %s,
  and CONFIG_FILTER);
   log_error ((Socket Filtering) are enabled %s,
  in your kernel);
   log_fatal (configuration!);
   }
   log_fatal (Bind socket to interface: %m);
   }
 
 So it puts a string into the sockaddr data, and there
 is no mention of sockaddr_ll, which is what is supposed to be
 provided as the socket address here, in the entire DHCP tree.
 
 I'm tempted to say I must be missing something here, since I can't see
 how this could possible work at all.  The string passed in should
 be interpreted as the ifindex value, and thus trigger a -ENODEV
 return from AF_PACKET's bind() implementation.
 
 My suspicions are confirmed by the patch here:
 
 http://kernel.org/pub/linux/kernel/people/chuyee/patches/dhcp-3.0/dhcp-3.0-linux_cooked_packet.patch

Can you get FC fixed?

 Really, this bogus bind() explains everything.

Should we add a warning to kernel log, to make distro's fix it?

It might make sense to add a per-device ptype_dev list in network device?



-- 
Stephen Hemminger [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC 1/2] bridge: avoid ptype_all packet handling

2007-03-02 Thread David Miller
From: Stephen Hemminger [EMAIL PROTECTED]
Date: Fri, 2 Mar 2007 15:34:14 -0800

 Can you get FC fixed?

I am not the DHCP package maintainer. :-)

I'm up to my earfulls already dealing with people trying
to slug broken patches into the kernel networking that paper
around application bugs. ;)

 Should we add a warning to kernel log, to make distro's fix it?

Unfortunately it looks like a properly formed sockaddr_ll,
the ifindex is in fact zero, so there is nothing we can do
to warn about this case.

The sockaddr_ll sits after the first sockaddr string in the ifreq, and
the rest remains initialized to zeros, thus the bind() succeeds.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC 1/2] bridge: avoid ptype_all packet handling

2007-03-02 Thread Herbert Xu
David Miller [EMAIL PROTECTED] wrote:
 
 I'm tempted to say I must be missing something here, since I can't see
 how this could possible work at all.  The string passed in should
 be interpreted as the ifindex value, and thus trigger a -ENODEV
 return from AF_PACKET's bind() implementation.

This is using packet_bind_spkt which uses a name instead of ifindex.

As you may recall, I've made a patch to convert it to use the new
(actually it's not-so-new anymore) AF_PACKET interface.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC 1/2] bridge: avoid ptype_all packet handling

2007-03-02 Thread David Miller
From: Herbert Xu [EMAIL PROTECTED]
Date: Sat, 03 Mar 2007 16:38:45 +1100

 This is using packet_bind_spkt which uses a name instead of ifindex.

So it should be just fine, it should be binding to a specific
device (by name instead of ifindex) and therefore it should
only trigger the pt_all hook when the packet arrives on that
specific device.

 As you may recall, I've made a patch to convert it to use the new
 (actually it's not-so-new anymore) AF_PACKET interface.

That's right.

So it's still a mystery why dhcp is causing bridge devices
to trigger the network tap paths on Stephen's machine.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html