from:"Thomas Graf"

Re: [RFC net-next 3/3] mpls: new ipmpls device for encapsulating IP packets as mpls

2015-06-02 Thread Thomas Graf

On 06/02/15 at 01:26pm, Eric W. Biederman wrote:
> What we really want here is xfrm-lite.  By lite I mean the tunnel
> selection criteria is simple enough that it fits into the normal
> routing table instead of having to do weird flow based magic that
> is rarely needed.
>
> I believe what we want are the xfrm stacking of dst entries.

I assume you are referring to reusing the selector and stacked
dst. I considered that for the transmit side.

Can you elaborate on this some more? What would this look like
for the specific case of VXLAN? Any thoughts on the receive
side? You also mention that you dislike the net_device approach.
What do you suggest instead? The encapsulation is often postponed
to after the packet is fully constructed. Where should it get
hooked into?
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC net-next 0/3] IP imposition of per-nh MPLS encap

2015-06-02 Thread Thomas Graf

On 06/02/15 at 02:28pm, Robert Shearman wrote:
> Nesting attributes inside the RTA_ENCAP blob should be supported by the
> patch series today. Something like this:

Sure. I'm not seeing such a construct for the MPLS case yet.

I'm happy to rebase my patches on top of your nexthop implementation.
It is definitely superior. Are you maintaining a git tree somewhere?

[net-next RFC 04/14] route: Extend flow representation with tunnel key

2015-06-01 Thread Thomas Graf

Add a new flowi_tunnel structure which is a subset of ip_tunnel_key
to allow routes to match on tunnel metadata. For now, the tunnel id
is added to flowi_tunnel which allows for routes to be bound to
specific virtual tunnels.

Signed-off-by: Thomas Graf tg...@suug.ch
---
 include/net/flow.h   |  7 +++
 include/net/ip_tunnels.h | 10 ++
 net/ipv4/route.c |  2 ++
 3 files changed, 19 insertions(+)

diff --git a/include/net/flow.h b/include/net/flow.h
index 8109a15..c15fb5e 100644
--- a/include/net/flow.h
+++ b/include/net/flow.h
@@ -19,6 +19,10 @@
 
 #define LOOPBACK_IFINDEX   1
 
+struct flowi_tunnel {
+   __be64  tun_id;
+};
+
 struct flowi_common {
int flowic_oif;
int flowic_iif;
@@ -30,6 +34,7 @@ struct flowi_common {
 #define FLOWI_FLAG_ANYSRC  0x01
 #define FLOWI_FLAG_KNOWN_NH  0x02
__u32   flowic_secid;
+   struct flowi_tunnel flowic_tun_key;
 };
 
 union flowi_uli {
@@ -66,6 +71,7 @@ struct flowi4 {
 #define flowi4_proto   __fl_common.flowic_proto
 #define flowi4_flags   __fl_common.flowic_flags
 #define flowi4_secid   __fl_common.flowic_secid
+#define flowi4_tun_key __fl_common.flowic_tun_key
 
/* (saddr,daddr) must be grouped, same order as in IP header */
__be32  saddr;
@@ -165,6 +171,7 @@ struct flowi {
 #define flowi_proto  u.__fl_common.flowic_proto
 #define flowi_flags  u.__fl_common.flowic_flags
 #define flowi_secid  u.__fl_common.flowic_secid
+#define flowi_tun_key  u.__fl_common.flowic_tun_key
 } __attribute__((__aligned__(BITS_PER_LONG/8)));
 
 static inline struct flowi *flowi4_to_flowi(struct flowi4 *fl4)
diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
index 8b76ba1..df8cfd3 100644
--- a/include/net/ip_tunnels.h
+++ b/include/net/ip_tunnels.h
@@ -12,6 +12,7 @@
 #include <net/ip.h>
 #include <net/netns/generic.h>
 #include <net/rtnetlink.h>
+#include <net/flow.h>
 
 #if IS_ENABLED(CONFIG_IPV6)
 #include <net/ipv6.h>
@@ -337,6 +338,15 @@ static inline void *ip_tunnel_info_opts(struct ip_tunnel_info *info,
return info + 1;
 }
 
+static inline void ip_tunnel_derive_key(struct sk_buff *skb,
+   struct flowi_tunnel *key)
+{
+   struct ip_tunnel_info *tun_info = skb_shinfo(skb)->tun_info;
+
+   if (tun_info && tun_info->mode == IP_TUNNEL_INFO_RX)
+   key->tun_id = tun_info->key.tun_id;
+}
+
 #endif /* CONFIG_INET */
 
 #endif /* __NET_IP_TUNNELS_H */
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index f605598..6e8e1be 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -109,6 +109,7 @@
 #include <linux/kmemleak.h>
 #endif
 #include <net/secure_seq.h>
+#include <net/ip_tunnels.h>
 
 #define RT_FL_TOS(oldflp4) \
((oldflp4)->flowi4_tos & (IPTOS_RT_MASK | RTO_ONLINK))
@@ -1716,6 +1717,7 @@ static int ip_route_input_slow(struct sk_buff *skb, __be32 daddr, __be32 saddr,
fl4.flowi4_scope = RT_SCOPE_UNIVERSE;
fl4.daddr = daddr;
fl4.saddr = saddr;
+   ip_tunnel_derive_key(skb, &fl4.flowi4_tun_key);
err = fib_lookup(net, &fl4, &res);
if (err != 0) {
if (!IN_DEV_FORWARD(in_dev))
-- 
2.3.5
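
For readers without a kernel tree at hand, the key derivation in this
patch can be modelled in userspace roughly as follows. This is only a
sketch: struct tunnel_info, struct flow_tunnel, struct packet and
derive_key() are hypothetical stand-ins for ip_tunnel_info, flowi_tunnel,
sk_buff and ip_tunnel_derive_key(), and byte order is ignored.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins for the kernel types in the patch. */
enum { TUNNEL_INFO_RX, TUNNEL_INFO_TX };

struct tunnel_info {
	uint64_t tun_id;	/* stands in for key.tun_id */
	uint8_t mode;		/* TUNNEL_INFO_RX or TUNNEL_INFO_TX */
};

struct flow_tunnel {
	uint64_t tun_id;	/* the only field the patch adds for now */
};

struct packet {
	struct tunnel_info *tun_info;	/* NULL if no tunnel metadata is attached */
};

/* Copy the tunnel id into the flow key only for receive side metadata.
 * Transmit instructions and packets without metadata leave the key as is.
 */
static void derive_key(const struct packet *pkt, struct flow_tunnel *key)
{
	const struct tunnel_info *info = pkt->tun_info;

	if (info && info->mode == TUNNEL_INFO_RX)
		key->tun_id = info->tun_id;
}
```

A caller would zero the flow key first, as the route lookup does with
its flowi4, so that packets without receive metadata match on tunnel
id 0.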


[net-next RFC 02/14] ip_tunnel: support per packet tunnel metadata

2015-06-01 Thread Thomas Graf

This allows attaching an ip_tunnel_info metadata structure to skbs
via skb_shared_info to represent receive side tunnel information
as well as transmit side encapsulation instructions.

The new field is added to skb_shared_info as the field is typically
immutable after it has been attached. A new flag indicates whether
the metadata is meant for receive or transmit. This allows keeping
receive metadata attached to the skb all the way through the
forwarding path without mistaking it for transmit instructions. The
tun_info pointer is thus only released if a packet which has been
received on a tunnel is being forwarded to a tunnel device again.

Since transmit instructions are immutable per flow which attaches
them to the skb, a reference count is introduced which allows
reusing the metadata for many packets. Therefore, when a route later
on receives the capability to attach tunnel metadata, it will only
have to allocate the metadata once and can simply increment the
reference counter for each packet that uses that instruction set.

Signed-off-by: Thomas Graf tg...@suug.ch
---
 include/linux/skbuff.h|  1 +
 include/net/ip_tunnels.h  | 45 +
 net/core/skbuff.c |  8 
 net/ipv4/ip_tunnel_core.c | 15 +++
 4 files changed, 69 insertions(+)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 6b41c15..83f9a59 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -323,6 +323,7 @@ struct skb_shared_info {
unsigned short  gso_segs;
unsigned short  gso_type;
struct sk_buff  *frag_list;
+   struct ip_tunnel_info   *tun_info;
struct skb_shared_hwtstamps hwtstamps;
u32 tskey;
__be32  ip6_frag_id;
diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
index 6b9d559..3968705 100644
--- a/include/net/ip_tunnels.h
+++ b/include/net/ip_tunnels.h
@@ -38,10 +38,20 @@ struct ip_tunnel_key {
__be16  tp_dst;
 } __packed __aligned(4); /* Minimize padding. */
 
+/* Indicates whether the tunnel info structure represents receive
+ * or transmit tunnel parameters.
+ */
+enum {
+   IP_TUNNEL_INFO_RX,
+   IP_TUNNEL_INFO_TX,
+};
+
 struct ip_tunnel_info {
struct ip_tunnel_key  key;
const void  *options;
+   atomic_t  refcnt;
u8  options_len;
+   u8  mode;
 };
 
 /* 6rd prefix/relay information */
@@ -284,6 +294,41 @@ static inline void iptunnel_xmit_stats(int err,
}
 }
 
+struct ip_tunnel_info *ip_tunnel_info_alloc(size_t optslen, gfp_t flags);
+
+static inline void ip_tunnel_info_get(struct ip_tunnel_info *info)
+{
+   atomic_inc(&info->refcnt);
+}
+
+static inline void ip_tunnel_info_put(struct ip_tunnel_info *info)
+{
+   if (!info)
+   return;
+
+   if (atomic_dec_and_test(&info->refcnt))
+   kfree(info);
+}
+
+static inline int skb_attach_tunnel_info(struct sk_buff *skb,
+struct ip_tunnel_info *info)
+{
+   if (skb_unclone(skb, GFP_ATOMIC))
+   return -ENOMEM;
+
+   ip_tunnel_info_put(skb_shinfo(skb)->tun_info);
+   ip_tunnel_info_get(info);
+   skb_shinfo(skb)->tun_info = info;
+
+   return 0;
+}
+
+static inline void skb_release_tunnel_info(struct sk_buff *skb)
+{
+   ip_tunnel_info_put(skb_shinfo(skb)->tun_info);
+   skb_shinfo(skb)->tun_info = NULL;
+}
+
 #endif /* CONFIG_INET */
 
 #endif /* __NET_IP_TUNNELS_H */
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 9bac0e6..dbbace2 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -69,6 +69,7 @@
 #include <net/sock.h>
 #include <net/checksum.h>
 #include <net/ip6_checksum.h>
+#include <net/ip_tunnels.h>
 #include <net/xfrm.h>
 
 #include <asm/uaccess.h>
@@ -594,6 +595,8 @@ static void skb_release_data(struct sk_buff *skb)
uarg->callback(uarg, true);
}
 
+   ip_tunnel_info_put(shinfo->tun_info);
+
if (shinfo->frag_list)
kfree_skb_list(shinfo->frag_list);
 
@@ -985,6 +988,11 @@ static void copy_skb_header(struct sk_buff *new, const struct sk_buff *old)
skb_shinfo(new)->gso_size = skb_shinfo(old)->gso_size;
skb_shinfo(new)->gso_segs = skb_shinfo(old)->gso_segs;
skb_shinfo(new)->gso_type = skb_shinfo(old)->gso_type;
+
+   if (skb_shinfo(old)->tun_info) {
+   ip_tunnel_info_get(skb_shinfo(old)->tun_info);
+   skb_shinfo(new)->tun_info = skb_shinfo(old)->tun_info;
+   }
 }
 
 static inline int skb_alloc_rx_flag(const struct sk_buff *skb)
diff --git a/net/ipv4/ip_tunnel_core.c b/net/ipv4/ip_tunnel_core.c
index 6a51a71..bbd4f91 100644
--- a/net/ipv4/ip_tunnel_core.c
+++ b/net/ipv4/ip_tunnel_core.c
@@ -190,3 +190,18 @@ struct rtnl_link_stats64 *ip_tunnel_get_stats64(struct net_device *dev,
return tot;
 }
 EXPORT_SYMBOL_GPL(ip_tunnel_get_stats64
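
The sharing scheme described in the commit message (allocate the
metadata once, take one reference per packet) can be sketched in
userspace as below. Everything here is a hypothetical model, not the
kernel code: C11 atomics replace atomic_t, malloc()/free() replace the
kernel allocator, and the initial reference count of 1 held by the
creator is an assumption, since the body of ip_tunnel_info_alloc() is
not visible in this excerpt. The live_infos counter exists only so the
lifetime can be observed.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical model of reference counted tunnel metadata. */
struct tunnel_info {
	atomic_int refcnt;
	int tun_id;
};

struct packet {
	struct tunnel_info *tun_info;
};

static int live_infos;	/* number of allocated, not yet freed objects */

/* Assumption: the creator starts out holding one reference. */
static struct tunnel_info *info_alloc(int tun_id)
{
	struct tunnel_info *info = malloc(sizeof(*info));

	if (!info)
		return NULL;
	atomic_init(&info->refcnt, 1);
	info->tun_id = tun_id;
	live_infos++;
	return info;
}

static void info_get(struct tunnel_info *info)
{
	atomic_fetch_add(&info->refcnt, 1);
}

static void info_put(struct tunnel_info *info)
{
	if (!info)
		return;
	/* atomic_fetch_sub() returns the old value; 1 means last reference */
	if (atomic_fetch_sub(&info->refcnt, 1) == 1) {
		free(info);
		live_infos--;
	}
}

/* Take the new reference before dropping the old one so that
 * re-attaching the same object cannot free it in between.
 */
static void packet_attach(struct packet *pkt, struct tunnel_info *info)
{
	info_get(info);
	info_put(pkt->tun_info);
	pkt->tun_info = info;
}

static void packet_release(struct packet *pkt)
{
	info_put(pkt->tun_info);
	pkt->tun_info = NULL;
}
```

In this model a route would call info_alloc() once and packet_attach()
for every packet it forwards; the object is freed when the last packet
and the route have dropped their references.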

[net-next RFC 01/14] ip_tunnel: Make ovs_tunnel_info and ovs_key_ipv4_tunnel generic

2015-06-01 Thread Thomas Graf

Rename the tunnel metadata data structures currently internal to
OVS and make them generic for use by all IP tunnels.

Both structures are kernel internal and will stay that way. Their
members are exposed to user space through individual Netlink
attributes by OVS. It will therefore be possible to extend/modify
these structures without affecting user ABI.

Signed-off-by: Thomas Graf tg...@suug.ch
---
 include/net/ip_tunnels.h | 63 +
 include/uapi/linux/openvswitch.h |  2 +-
 net/openvswitch/actions.c|  2 +-
 net/openvswitch/datapath.h   |  5 +--
 net/openvswitch/flow.c   |  4 +--
 net/openvswitch/flow.h   | 76 ++--
 net/openvswitch/flow_netlink.c   | 16 -
 net/openvswitch/flow_netlink.h   |  2 +-
 net/openvswitch/vport-geneve.c   | 17 +
 net/openvswitch/vport-gre.c  | 16 -
 net/openvswitch/vport-vxlan.c| 18 +-
 net/openvswitch/vport.c  | 30 
 net/openvswitch/vport.h  | 12 +++
 13 files changed, 128 insertions(+), 135 deletions(-)

diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
index d8214cb..6b9d559 100644
--- a/include/net/ip_tunnels.h
+++ b/include/net/ip_tunnels.h
@@ -22,6 +22,28 @@
 /* Keep error state on tunnel for 30 sec */
 #define IPTUNNEL_ERR_TIMEO (30*HZ)
 
+/* Used to memset ip_tunnel padding. */
+#define IP_TUNNEL_KEY_SIZE \
+   (offsetof(struct ip_tunnel_key, tp_dst) +   \
+FIELD_SIZEOF(struct ip_tunnel_key, tp_dst))
+
+struct ip_tunnel_key {
+   __be64  tun_id;
+   __be32  ipv4_src;
+   __be32  ipv4_dst;
+   __be16  tun_flags;
+   __u8  ipv4_tos;
+   __u8  ipv4_ttl;
+   __be16  tp_src;
+   __be16  tp_dst;
+} __packed __aligned(4); /* Minimize padding. */
+
+struct ip_tunnel_info {
+   struct ip_tunnel_key  key;
+   const void  *options;
+   u8  options_len;
+};
+
 /* 6rd prefix/relay information */
 #ifdef CONFIG_IPV6_SIT_6RD
 struct ip_tunnel_6rd_parm {
@@ -136,6 +158,47 @@ int ip_tunnel_encap_add_ops(const struct ip_tunnel_encap_ops *op,
 int ip_tunnel_encap_del_ops(const struct ip_tunnel_encap_ops *op,
unsigned int num);
 
+static inline void __ip_tunnel_info_init(struct ip_tunnel_info *tun_info,
+__be32 saddr, __be32 daddr,
+u8 tos, u8 ttl,
+__be16 tp_src, __be16 tp_dst,
+__be64 tun_id, __be16 tun_flags,
+const void *opts, u8 opts_len)
+{
+   tun_info->key.tun_id = tun_id;
+   tun_info->key.ipv4_src = saddr;
+   tun_info->key.ipv4_dst = daddr;
+   tun_info->key.ipv4_tos = tos;
+   tun_info->key.ipv4_ttl = ttl;
+   tun_info->key.tun_flags = tun_flags;
+
+   /* For the tunnel types on the top of IPsec, the tp_src and tp_dst of
+* the upper tunnel are used.
+* E.g: GRE over IPSEC, the tp_src and tp_port are zero.
+*/
+   tun_info->key.tp_src = tp_src;
+   tun_info->key.tp_dst = tp_dst;
+
+   /* Clear struct padding. */
+   if (sizeof(tun_info->key) != IP_TUNNEL_KEY_SIZE)
+   memset((unsigned char *)&tun_info->key + IP_TUNNEL_KEY_SIZE,
+  0, sizeof(tun_info->key) - IP_TUNNEL_KEY_SIZE);
+
+   tun_info->options = opts;
+   tun_info->options_len = opts_len;
+}
+
+static inline void ip_tunnel_info_init(struct ip_tunnel_info *tun_info,
+  const struct iphdr *iph,
+  __be16 tp_src, __be16 tp_dst,
+  __be64 tun_id, __be16 tun_flags,
+  const void *opts, u8 opts_len)
+{
+   __ip_tunnel_info_init(tun_info, iph->saddr, iph->daddr,
+ iph->tos, iph->ttl, tp_src, tp_dst,
+ tun_id, tun_flags, opts, opts_len);
+}
+
 #ifdef CONFIG_INET
 
 int ip_tunnel_init(struct net_device *dev);
diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index bbd49a0..fffe317 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -319,7 +319,7 @@ enum ovs_key_attr {
 * the accepted length of the array. */
 
 #ifdef __KERNEL__
-   OVS_KEY_ATTR_TUNNEL_INFO,  /* struct ovs_tunnel_info */
+   OVS_KEY_ATTR_TUNNEL_INFO,  /* struct ip_tunnel_info */
 #endif
__OVS_KEY_ATTR_MAX
 };
diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index b491c1c..34cad57 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -610,7
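
The IP_TUNNEL_KEY_SIZE trick in the ip_tunnels.h hunk above can be
tried outside the kernel with the sketch below. struct tunnel_key and
the helper are hypothetical userspace copies that rely on the GCC/Clang
attribute syntax. The stated purpose in the patch is only "clear struct
padding"; that this is done so keys can be compared or hashed as raw
bytes is an assumption on my side.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical userspace copy of the key layout from the patch. */
struct tunnel_key {
	uint64_t tun_id;
	uint32_t ipv4_src;
	uint32_t ipv4_dst;
	uint16_t tun_flags;
	uint8_t ipv4_tos;
	uint8_t ipv4_ttl;
	uint16_t tp_src;
	uint16_t tp_dst;
} __attribute__((packed, aligned(4)));

/* Bytes covered by real members: offset of the last member plus its size. */
#define TUNNEL_KEY_SIZE \
	(offsetof(struct tunnel_key, tp_dst) + \
	 sizeof(((struct tunnel_key *)0)->tp_dst))

/* Zero whatever lies between the last member and sizeof(struct). */
static void tunnel_key_clear_padding(struct tunnel_key *key)
{
	if (sizeof(*key) != TUNNEL_KEY_SIZE)
		memset((unsigned char *)key + TUNNEL_KEY_SIZE, 0,
		       sizeof(*key) - TUNNEL_KEY_SIZE);
}
```

With this particular layout the members add up to 24 bytes, which is
already a multiple of the 4 byte alignment, so the memset is skipped.
The check starts to matter once a member is added that leaves a tail
gap.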

[net-next RFC 11/14] openvswitch: Use regular VXLAN net_device device

2015-06-01 Thread Thomas Graf

This gets rid of all OVS specific VXLAN code in the receive and
transmit path by using a VXLAN net_device to represent the vport.
Only a small shim layer remains which takes care of handling the
VXLAN specific OVS Netlink configuration.

Unexports vxlan_sock_add(), vxlan_sock_release(), vxlan_xmit_skb()
since they are no longer needed.

Signed-off-by: Thomas Graf tg...@suug.ch
Signed-off-by: Pravin B Shelar pshe...@nicira.com
---
 drivers/net/vxlan.c|  23 +--
 include/net/vxlan.h|  14 +-
 net/openvswitch/Kconfig|  12 --
 net/openvswitch/Makefile   |   1 -
 net/openvswitch/flow_netlink.c |   5 +-
 net/openvswitch/vport-netdev.c | 176 +-
 net/openvswitch/vport-vxlan.c  | 322 -
 7 files changed, 193 insertions(+), 360 deletions(-)
 delete mode 100644 net/openvswitch/vport-vxlan.c

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 3acab95..b696871 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -74,6 +74,10 @@ static struct rtnl_link_ops vxlan_link_ops;
 
 static const u8 all_zeros_mac[ETH_ALEN];
 
+static struct vxlan_sock *vxlan_sock_add(struct net *net, __be16 port,
+vxlan_rcv_t *rcv, void *data,
+bool no_share, u32 flags);
+
 /* per-network namespace private data for this module */
 struct vxlan_net {
struct list_head  vxlan_list;
@@ -1020,7 +1024,7 @@ static bool vxlan_group_used(struct vxlan_net *vn, struct vxlan_dev *dev)
return false;
 }
 
-void vxlan_sock_release(struct vxlan_sock *vs)
+static void vxlan_sock_release(struct vxlan_sock *vs)
 {
struct sock *sk = vs->sock->sk;
struct net *net = sock_net(sk);
@@ -1036,7 +1040,6 @@ void vxlan_sock_release(struct vxlan_sock *vs)
 
queue_work(vxlan_wq, &vs->del_work);
 }
-EXPORT_SYMBOL_GPL(vxlan_sock_release);
 
 /* Update multicast group membership when first VNI on
  * multicast address is brought up
@@ -1761,10 +1764,10 @@ err:
 }
 #endif
 
-int vxlan_xmit_skb(struct rtable *rt, struct sock *sk, struct sk_buff *skb,
-  __be32 src, __be32 dst, __u8 tos, __u8 ttl, __be16 df,
-  __be16 src_port, __be16 dst_port,
-  struct vxlan_metadata *md, bool xnet, u32 vxflags)
+static int vxlan_xmit_skb(struct rtable *rt, struct sock *sk, struct sk_buff *skb,
+ __be32 src, __be32 dst, __u8 tos, __u8 ttl, __be16 df,
+ __be16 src_port, __be16 dst_port,
+ struct vxlan_metadata *md, bool xnet, u32 vxflags)
 {
struct vxlanhdr *vxh;
int min_headroom;
@@ -1834,7 +1837,6 @@ int vxlan_xmit_skb(struct rtable *rt, struct sock *sk, struct sk_buff *skb,
   ttl, df, src_port, dst_port, xnet,
   !(vxflags & VXLAN_F_UDP_CSUM));
 }
-EXPORT_SYMBOL_GPL(vxlan_xmit_skb);
 
 /* Bypass encapsulation if the destination is local */
 static void vxlan_encap_bypass(struct sk_buff *skb, struct vxlan_dev *src_vxlan,
@@ -2609,9 +2611,9 @@ static struct vxlan_sock *vxlan_socket_create(struct net *net, __be16 port,
return vs;
 }
 
-struct vxlan_sock *vxlan_sock_add(struct net *net, __be16 port,
- vxlan_rcv_t *rcv, void *data,
- bool no_share, u32 flags)
+static struct vxlan_sock *vxlan_sock_add(struct net *net, __be16 port,
+vxlan_rcv_t *rcv, void *data,
+bool no_share, u32 flags)
 {
struct vxlan_net *vn = net_generic(net, vxlan_net_id);
struct vxlan_sock *vs;
@@ -2632,7 +2634,6 @@ struct vxlan_sock *vxlan_sock_add(struct net *net, __be16 port,
 
return vxlan_socket_create(net, port, rcv, data, flags);
 }
-EXPORT_SYMBOL_GPL(vxlan_sock_add);
 
 static int vxlan_dev_configure(struct net *src_net, struct net_device *dev,
   struct vxlan_config *conf)
diff --git a/include/net/vxlan.h b/include/net/vxlan.h
index c037b27..d3ce81f 100644
--- a/include/net/vxlan.h
+++ b/include/net/vxlan.h
@@ -197,19 +197,13 @@ struct vxlan_dev {
 VXLAN_F_REMCSUM_NOPARTIAL |\
 VXLAN_F_FLOW_BASED)
 
-struct vxlan_sock *vxlan_sock_add(struct net *net, __be16 port,
- vxlan_rcv_t *rcv, void *data,
- bool no_share, u32 flags);
-
 struct net_device *vxlan_dev_create(struct net *net, const char *name,
u8 name_assign_type, struct vxlan_config *conf);
 
-void vxlan_sock_release(struct vxlan_sock *vs);
-
-int vxlan_xmit_skb(struct rtable *rt, struct sock *sk, struct sk_buff *skb,
-  __be32 src, __be32 dst, __u8 tos, __u8 ttl, __be16 df,
-  __be16 src_port, __be16 dst_port, struct

[net-next RFC 10/14] openvswitch: Abstract vport name through ovs_vport_name()

2015-06-01 Thread Thomas Graf

This allows getting rid of the get_name() vport ops later on.

Signed-off-by: Thomas Graf tg...@suug.ch
---
 net/openvswitch/datapath.c   | 4 ++--
 net/openvswitch/vport-internal_dev.c | 1 -
 net/openvswitch/vport-netdev.c   | 6 --
 net/openvswitch/vport-netdev.h   | 1 -
 net/openvswitch/vport.c  | 4 ++--
 net/openvswitch/vport.h  | 5 +
 6 files changed, 9 insertions(+), 12 deletions(-)

diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index c3ecfd4..8986558 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -176,7 +176,7 @@ static inline struct datapath *get_dp(struct net *net, int dp_ifindex)
 const char *ovs_dp_name(const struct datapath *dp)
 {
struct vport *vport = ovs_vport_ovsl_rcu(dp, OVSP_LOCAL);
-   return vport->ops->get_name(vport);
+   return ovs_vport_name(vport);
 }
 
 static int get_dpifindex(const struct datapath *dp)
@@ -1786,7 +1786,7 @@ static int ovs_vport_cmd_fill_info(struct vport *vport, struct sk_buff *skb,
if (nla_put_u32(skb, OVS_VPORT_ATTR_PORT_NO, vport->port_no) ||
nla_put_u32(skb, OVS_VPORT_ATTR_TYPE, vport->ops->type) ||
nla_put_string(skb, OVS_VPORT_ATTR_NAME,
-  vport->ops->get_name(vport)))
+  ovs_vport_name(vport)))
goto nla_put_failure;
 
ovs_vport_get_stats(vport, &vport_stats);
diff --git a/net/openvswitch/vport-internal_dev.c b/net/openvswitch/vport-internal_dev.c
index a2c205d..c058bbf 100644
--- a/net/openvswitch/vport-internal_dev.c
+++ b/net/openvswitch/vport-internal_dev.c
@@ -242,7 +242,6 @@ static struct vport_ops ovs_internal_vport_ops = {
.type   = OVS_VPORT_TYPE_INTERNAL,
.create = internal_dev_create,
.destroy= internal_dev_destroy,
-   .get_name   = ovs_netdev_get_name,
.send   = internal_dev_recv,
 };
 
diff --git a/net/openvswitch/vport-netdev.c b/net/openvswitch/vport-netdev.c
index cb22051..ef11a41 100644
--- a/net/openvswitch/vport-netdev.c
+++ b/net/openvswitch/vport-netdev.c
@@ -170,11 +170,6 @@ static void netdev_destroy(struct vport *vport)
call_rcu(&vport->rcu, free_port_rcu);
 }
 
-const char *ovs_netdev_get_name(const struct vport *vport)
-{
-   return vport->dev->name;
-}
-
 static unsigned int packet_length(const struct sk_buff *skb)
 {
unsigned int length = skb->len - ETH_HLEN;
@@ -222,7 +217,6 @@ static struct vport_ops ovs_netdev_vport_ops = {
.type   = OVS_VPORT_TYPE_NETDEV,
.create = netdev_create,
.destroy= netdev_destroy,
-   .get_name   = ovs_netdev_get_name,
.send   = netdev_send,
 };
 
diff --git a/net/openvswitch/vport-netdev.h b/net/openvswitch/vport-netdev.h
index 1c52aed..684fb88 100644
--- a/net/openvswitch/vport-netdev.h
+++ b/net/openvswitch/vport-netdev.h
@@ -26,7 +26,6 @@
 
 struct vport *ovs_netdev_get_vport(struct net_device *dev);
 
-const char *ovs_netdev_get_name(const struct vport *);
 void ovs_netdev_detach_dev(struct vport *);
 
 int __init ovs_netdev_init(void);
diff --git a/net/openvswitch/vport.c b/net/openvswitch/vport.c
index af23ba0..d14f594 100644
--- a/net/openvswitch/vport.c
+++ b/net/openvswitch/vport.c
@@ -113,7 +113,7 @@ struct vport *ovs_vport_locate(const struct net *net, const char *name)
struct vport *vport;
 
hlist_for_each_entry_rcu(vport, bucket, hash_node)
-   if (!strcmp(name, vport->ops->get_name(vport)) &&
+   if (!strcmp(name, ovs_vport_name(vport)) &&
net_eq(ovs_dp_get_net(vport->dp), net))
return vport;
 
@@ -226,7 +226,7 @@ struct vport *ovs_vport_add(const struct vport_parms *parms)
}
 
bucket = hash_bucket(ovs_dp_get_net(vport->dp),
-vport->ops->get_name(vport));
+ovs_vport_name(vport));
hlist_add_head_rcu(&vport->hash_node, bucket);
return vport;
}
diff --git a/net/openvswitch/vport.h b/net/openvswitch/vport.h
index e05ec68..1a689c2 100644
--- a/net/openvswitch/vport.h
+++ b/net/openvswitch/vport.h
@@ -237,6 +237,11 @@ static inline void ovs_skb_postpush_rcsum(struct sk_buff *skb,
skb->csum = csum_add(skb->csum, csum_partial(start, len, 0));
 }
 
+static inline const char *ovs_vport_name(struct vport *vport)
+{
+   return vport->dev ? vport->dev->name : vport->ops->get_name(vport);
+}
+
 int ovs_vport_ops_register(struct vport_ops *ops);
 void ovs_vport_ops_unregister(struct vport_ops *ops);
 
-- 
2.3.5
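
The transitional accessor added in vport.h prefers the net_device name
and only falls back to the get_name() callback for vports that do not
have a device yet. A self-contained sketch of that pattern, with
hypothetical cut-down versions of struct vport, struct vport_ops and
struct net_device and an invented legacy_get_name() callback:

```c
#include <stddef.h>

struct vport;

/* Hypothetical, cut-down stand-ins for the OVS structures. */
struct vport_ops {
	const char *(*get_name)(const struct vport *vport);
};

struct net_device {
	char name[16];
};

struct vport {
	struct net_device *dev;		/* NULL until the vport type is converted */
	const struct vport_ops *ops;
};

/* Prefer the device name; use the callback only when no device exists. */
static const char *vport_name(const struct vport *vport)
{
	return vport->dev ? vport->dev->name : vport->ops->get_name(vport);
}

/* Invented example callback for a vport type without a net_device. */
static const char *legacy_get_name(const struct vport *vport)
{
	(void)vport;
	return "legacy0";
}
```

Once every vport type carries a dev pointer, the fallback branch and
the callback can be dropped without touching the callers.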


[net-next RFC 09/14] openvswitch: Move dev pointer into vport itself

2015-06-01 Thread Thomas Graf

This is the first step in representing all OVS vports as regular
struct net_devices. Move the net_device pointer into the vport
structure itself to get rid of struct vport_netdev.

Signed-off-by: Thomas Graf tg...@suug.ch
Signed-off-by: Pravin B Shelar pshe...@nicira.com
---
 net/openvswitch/datapath.c   |  7 +--
 net/openvswitch/dp_notify.c  |  5 +--
 net/openvswitch/vport-internal_dev.c | 37 +++-
 net/openvswitch/vport-netdev.c   | 84 
 net/openvswitch/vport-netdev.h   | 12 --
 net/openvswitch/vport.h  |  3 +-
 6 files changed, 58 insertions(+), 90 deletions(-)

diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 3315e3a..c3ecfd4 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -188,7 +188,7 @@ static int get_dpifindex(const struct datapath *dp)
 
local = ovs_vport_rcu(dp, OVSP_LOCAL);
if (local)
-   ifindex = netdev_vport_priv(local)->dev->ifindex;
+   ifindex = local->dev->ifindex;
else
ifindex = 0;
 
@@ -2205,13 +2205,10 @@ static void __net_exit list_vports_from_net(struct net *net, struct net *dnet,
struct vport *vport;
 
hlist_for_each_entry(vport, &dp->ports[i], dp_hash_node) {
-   struct netdev_vport *netdev_vport;
-
if (vport->ops->type != OVS_VPORT_TYPE_INTERNAL)
continue;
 
-   netdev_vport = netdev_vport_priv(vport);
-   if (dev_net(netdev_vport->dev) == dnet)
+   if (dev_net(vport->dev) == dnet)
list_add(&vport->detach_list, head);
}
}
diff --git a/net/openvswitch/dp_notify.c b/net/openvswitch/dp_notify.c
index 2c631fe..a7a80a6 100644
--- a/net/openvswitch/dp_notify.c
+++ b/net/openvswitch/dp_notify.c
@@ -58,13 +58,10 @@ void ovs_dp_notify_wq(struct work_struct *work)
struct hlist_node *n;
 
hlist_for_each_entry_safe(vport, n, &dp->ports[i], dp_hash_node) {
-   struct netdev_vport *netdev_vport;
-
if (vport->ops->type != OVS_VPORT_TYPE_NETDEV)
continue;
 
-   netdev_vport = netdev_vport_priv(vport);
-   if (!(netdev_vport->dev->priv_flags & IFF_OVS_DATAPATH))
+   if (!(vport->dev->priv_flags & IFF_OVS_DATAPATH))
dp_detach_port_notify(vport);
}
}
diff --git a/net/openvswitch/vport-internal_dev.c b/net/openvswitch/vport-internal_dev.c
index 6a55f71..a2c205d 100644
--- a/net/openvswitch/vport-internal_dev.c
+++ b/net/openvswitch/vport-internal_dev.c
@@ -156,49 +156,44 @@ static void do_setup(struct net_device *netdev)
 static struct vport *internal_dev_create(const struct vport_parms *parms)
 {
struct vport *vport;
-   struct netdev_vport *netdev_vport;
struct internal_dev *internal_dev;
int err;
 
-   vport = ovs_vport_alloc(sizeof(struct netdev_vport),
-   &ovs_internal_vport_ops, parms);
+   vport = ovs_vport_alloc(0, &ovs_internal_vport_ops, parms);
if (IS_ERR(vport)) {
err = PTR_ERR(vport);
goto error;
}
 
-   netdev_vport = netdev_vport_priv(vport);
-
-   netdev_vport->dev = alloc_netdev(sizeof(struct internal_dev),
-parms->name, NET_NAME_UNKNOWN,
-do_setup);
-   if (!netdev_vport->dev) {
+   vport->dev = alloc_netdev(sizeof(struct internal_dev),
+ parms->name, NET_NAME_UNKNOWN, do_setup);
+   if (!vport->dev) {
err = -ENOMEM;
goto error_free_vport;
}
 
-   dev_net_set(netdev_vport->dev, ovs_dp_get_net(vport->dp));
-   internal_dev = internal_dev_priv(netdev_vport->dev);
+   dev_net_set(vport->dev, ovs_dp_get_net(vport->dp));
+   internal_dev = internal_dev_priv(vport->dev);
internal_dev->vport = vport;
 
/* Restrict bridge port to current netns. */
if (vport->port_no == OVSP_LOCAL)
-   netdev_vport->dev->features |= NETIF_F_NETNS_LOCAL;
+   vport->dev->features |= NETIF_F_NETNS_LOCAL;
 
rtnl_lock();
-   err = register_netdevice(netdev_vport->dev);
+   err = register_netdevice(vport->dev);
if (err)
goto error_free_netdev;
 
-   dev_set_promiscuity(netdev_vport->dev, 1);
+   dev_set_promiscuity(vport->dev, 1);
rtnl_unlock();
-   netif_start_queue(netdev_vport->dev);
+   netif_start_queue(vport->dev

[net-next RFC 03/14] vxlan: Flow based tunneling

2015-06-01 Thread Thomas Graf

Allows putting a VXLAN device into a new flow-based mode in which it
will populate a tunnel info structure for each packet received. The
metadata structure will contain the outer header and tunnel header
fields which have been stripped off. Layers further up in the stack
such as routing, tc or netfilter can later match on these fields.

On the transmit side, it allows skbs to carry their own encapsulation
instructions thus allowing encapsulation parameters to be set per
flow/route.

This prepares the VXLAN device to be steered by the routing subsystem
which will allow supporting encapsulation for a large number of tunnel
endpoints and tunnel ids through a single net_device which improves
the scalability of current VXLAN tunnels.

Signed-off-by: Thomas Graf tg...@suug.ch
Signed-off-by: Pravin B Shelar pshe...@nicira.com
---
 drivers/net/vxlan.c  | 147 ---
 include/linux/skbuff.h   |   1 +
 include/net/ip_tunnels.h |   8 +++
 include/net/route.h  |   8 +++
 include/net/vxlan.h  |   4 +-
 include/uapi/linux/if_link.h |   1 +
 6 files changed, 146 insertions(+), 23 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 34c519e..d5edba5 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1164,10 +1164,12 @@ static struct vxlanhdr *vxlan_remcsum(struct sk_buff *skb, struct vxlanhdr *vh,
 /* Callback from net/ipv4/udp.c to receive packets */
 static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
 {
+   struct ip_tunnel_info *tun_info = NULL;
struct vxlan_sock *vs;
struct vxlanhdr *vxh;
u32 flags, vni;
-   struct vxlan_metadata md = {0};
+   struct vxlan_metadata _md;
+   struct vxlan_metadata *md = &_md;
 
/* Need Vxlan and inner Ethernet header to be present */
if (!pskb_may_pull(skb, VXLAN_HLEN))
@@ -1202,6 +1204,33 @@ static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
vni &= VXLAN_VNI_MASK;
}
 
+   if (vs->flags & VXLAN_F_FLOW_BASED) {
+   const struct iphdr *iph = ip_hdr(skb);
+
+   /* TODO: Consider optimizing by looking up in flow cache */
+   tun_info = ip_tunnel_info_alloc(sizeof(*md), GFP_ATOMIC);
+   if (!tun_info)
+   goto drop;
+
+   tun_info->key.ipv4_src = iph->saddr;
+   tun_info->key.ipv4_dst = iph->daddr;
+   tun_info->key.ipv4_tos = iph->tos;
+   tun_info->key.ipv4_ttl = iph->ttl;
+   tun_info->key.tp_src = udp_hdr(skb)->source;
+   tun_info->key.tp_dst = udp_hdr(skb)->dest;
+
+   tun_info->mode = IP_TUNNEL_INFO_RX;
+   tun_info->key.tun_flags = TUNNEL_KEY;
+   tun_info->key.tun_id = cpu_to_be64(vni >> 8);
+   if (udp_hdr(skb)->check != 0)
+   tun_info->key.tun_flags |= TUNNEL_CSUM;
+
+   md = ip_tunnel_info_opts(tun_info, sizeof(*md));
+   skb_attach_tunnel_info(skb, tun_info);
+   } else {
+   memset(md, 0, sizeof(*md));
+   }
+
/* For backwards compatibility, only allow reserved fields to be
 * used by VXLAN extensions if explicitly requested.
 */
@@ -1209,13 +1238,16 @@ static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
struct vxlanhdr_gbp *gbp;
 
gbp = (struct vxlanhdr_gbp *)vxh;
-   md.gbp = ntohs(gbp->policy_id);
+   md->gbp = ntohs(gbp->policy_id);
+
+   if (tun_info)
+   tun_info->key.tun_flags |= TUNNEL_VXLAN_OPT;
 
if (gbp->dont_learn)
-   md.gbp |= VXLAN_GBP_DONT_LEARN;
+   md->gbp |= VXLAN_GBP_DONT_LEARN;
 
if (gbp->policy_applied)
-   md.gbp |= VXLAN_GBP_POLICY_APPLIED;
+   md->gbp |= VXLAN_GBP_POLICY_APPLIED;
 
flags &= ~VXLAN_GBP_USED_BITS;
}
@@ -1233,8 +1265,8 @@ static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
goto bad_flags;
}
 
-   md.vni = vxh->vx_vni;
-   vs->rcv(vs, skb, &md);
+   md->vni = vxh->vx_vni;
+   vs->rcv(vs, skb, md);
return 0;
 
 drop:
@@ -1254,6 +1286,7 @@ error:
 static void vxlan_rcv(struct vxlan_sock *vs, struct sk_buff *skb,
  struct vxlan_metadata *md)
 {
+   struct ip_tunnel_info *tun_info = skb_shinfo(skb)->tun_info;
struct iphdr *oip = NULL;
struct ipv6hdr *oip6 = NULL;
struct vxlan_dev *vxlan;
@@ -1263,7 +1296,12 @@ static void vxlan_rcv(struct vxlan_sock *vs, struct sk_buff *skb,
int err = 0;
union vxlan_addr *remote_ip;
 
-   vni = ntohl(md->vni) >> 8;
+   /* For flow based devices, map all packets to VNI 0 */
+   if (vs->flags & VXLAN_F_FLOW_BASED)
+   vni = 0;
+   else
+   vni = ntohl(md
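
As background for the VNI handling in the hunks above: in the VXLAN
header the 24 bit VNI sits in the upper three bytes of the second 32 bit
word, followed by one reserved byte, so the receive path converts the
word to host order and shifts right by 8. A userspace sketch, where
struct vxlan_hdr and both helpers are hypothetical names rather than
the kernel's:

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Hypothetical userspace mirror of the 8 byte VXLAN header. */
struct vxlan_hdr {
	uint32_t vx_flags;	/* network byte order */
	uint32_t vx_vni;	/* network byte order: 24 bit VNI + 8 reserved bits */
};

/* Extract the 24 bit VNI as a host order integer. */
static uint32_t vxlan_vni(const struct vxlan_hdr *vxh)
{
	return ntohl(vxh->vx_vni) >> 8;
}

/* Inverse operation: build the on-wire field from a host order VNI. */
static uint32_t vxlan_vni_field(uint32_t vni)
{
	return htonl(vni << 8);
}
```

For example, VNI 0x123456 is carried on the wire as the bytes
12 34 56 00.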

[net-next RFC 00/14] Convert OVS tunnel vports to use regular net_devices

2015-06-01 Thread Thomas Graf

This is the first series in a greater effort to bring the scalability
and programmability advantages of OVS to the rest of the network
stack and to get rid of as much OVS specific code as possible.

This first series focuses on getting rid of OVS tunnel vports and using
regular tunnel net_devices instead. As part of this effort, the
routing subsystem is extended with support for flow based tunneling.
In this new tunneling mode, the route is able to match on tunnel
information as well as set tunnel encapsulation parameters per route.
This allows performing L3 forwarding for a large number of tunnel
endpoints and virtual networks using a single tunnel net_device.

TODO:
 - Geneve support
 - IPv6 support
 - Benchmarks

Pravin Shelar (1):
  openvswitch: Use regular GRE net_device instead of vport

Thomas Graf (13):
  ip_tunnel: Make ovs_tunnel_info and ovs_key_ipv4_tunnel generic
  ip_tunnel: support per packet tunnel metadata
  vxlan: Flow based tunneling
  route: Extend flow representation with tunnel key
  route: Per route tunnel metadata with RTA_TUNNEL
  fib: Add fib rule match on tunnel id
  vxlan: Factor out device configuration
  openvswitch: Allocate & attach ip_tunnel_info for tunnel set action
  openvswitch: Move dev pointer into vport itself
  openvswitch: Abstract vport name through ovs_vport_name()
  openvswitch: Use regular VXLAN net_device device
  vxlan: remove indirect call to vxlan_rcv() and vni member
  arp: Associate ARP requests with tunnel info

 drivers/net/vxlan.c  | 663 ---
 include/linux/skbuff.h   |   2 +
 include/net/fib_rules.h  |   1 +
 include/net/flow.h   |   7 +
 include/net/ip_fib.h |   3 +
 include/net/ip_tunnels.h | 127 ++-
 include/net/route.h  |  18 +
 include/net/vxlan.h  |  82 -
 include/uapi/linux/fib_rules.h   |   2 +-
 include/uapi/linux/if_link.h |   1 +
 include/uapi/linux/openvswitch.h |   2 +-
 include/uapi/linux/rtnetlink.h   |  16 +
 net/core/dev.c   |   5 +-
 net/core/fib_rules.c |  17 +-
 net/core/skbuff.c|   8 +
 net/ipv4/arp.c   |   8 +
 net/ipv4/fib_frontend.c  |  57 +++
 net/ipv4/fib_semantics.c |  45 +++
 net/ipv4/ip_gre.c| 161 -
 net/ipv4/ip_tunnel_core.c|  15 +
 net/ipv4/route.c |  32 +-
 net/openvswitch/Kconfig  |  12 -
 net/openvswitch/Makefile |   2 -
 net/openvswitch/actions.c|  10 +-
 net/openvswitch/datapath.c   |  19 +-
 net/openvswitch/datapath.h   |   5 +-
 net/openvswitch/dp_notify.c  |   5 +-
 net/openvswitch/flow.c   |   4 +-
 net/openvswitch/flow.h   |  77 +---
 net/openvswitch/flow_netlink.c   |  78 -
 net/openvswitch/flow_netlink.h   |   3 +-
 net/openvswitch/vport-geneve.c   |  17 +-
 net/openvswitch/vport-gre.c  | 313 -
 net/openvswitch/vport-internal_dev.c |  38 +-
 net/openvswitch/vport-netdev.c   | 271 +++---
 net/openvswitch/vport-netdev.h   |  13 -
 net/openvswitch/vport-vxlan.c| 322 -
 net/openvswitch/vport.c  |  34 +-
 net/openvswitch/vport.h  |  21 +-
 39 files changed, 1334 insertions(+), 1182 deletions(-)
 delete mode 100644 net/openvswitch/vport-gre.c
 delete mode 100644 net/openvswitch/vport-vxlan.c

-- 
2.3.5


[net-next RFC 07/14] vxlan: Factor out device configuration

2015-06-01 Thread Thomas Graf

This factors the device configuration out of the RTNL newlink
API, which allows for in-kernel creation of VXLAN net_devices.

Signed-off-by: Thomas Graf <tg...@suug.ch>
---
 drivers/net/vxlan.c | 332 
 include/net/vxlan.h |  59 ++
 2 files changed, 236 insertions(+), 155 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index d5edba5..3acab95 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -54,10 +54,6 @@
 
 #define PORT_HASH_BITS 8
 #define PORT_HASH_SIZE  (1<<PORT_HASH_BITS)
-#define VNI_HASH_BITS  10
-#define VNI_HASH_SIZE  (1<<VNI_HASH_BITS)
-#define FDB_HASH_BITS  8
-#define FDB_HASH_SIZE  (1<<FDB_HASH_BITS)
 #define FDB_AGE_DEFAULT 300 /* 5 min */
 #define FDB_AGE_INTERVAL (10 * HZ) /* rescan interval */
 
@@ -74,6 +70,7 @@ module_param(log_ecn_error, bool, 0644);
 MODULE_PARM_DESC(log_ecn_error, "Log packets received with corrupted ECN");
 
 static int vxlan_net_id;
+static struct rtnl_link_ops vxlan_link_ops;
 
 static const u8 all_zeros_mac[ETH_ALEN];
 
@@ -84,21 +81,6 @@ struct vxlan_net {
 	spinlock_t	  sock_lock;
 };
 
-union vxlan_addr {
-   struct sockaddr_in sin;
-   struct sockaddr_in6 sin6;
-   struct sockaddr sa;
-};
-
-struct vxlan_rdst {
-   union vxlan_addr remote_ip;
-   __be16   remote_port;
-   u32  remote_vni;
-   u32  remote_ifindex;
-   struct list_head list;
-   struct rcu_head  rcu;
-};
-
 /* Forwarding table entry */
 struct vxlan_fdb {
 	struct hlist_node hlist;	/* linked list of entries */
@@ -111,31 +93,6 @@ struct vxlan_fdb {
 	u8		  eth_addr[ETH_ALEN];
 };
 
-/* Pseudo network device */
-struct vxlan_dev {
-	struct hlist_node hlist;	/* vni hash table */
-	struct list_head  next;		/* vxlan's per namespace list */
-	struct vxlan_sock *vn_sock;	/* listening socket */
-	struct net_device *dev;
-	struct net	  *net;		/* netns for packet i/o */
-	struct vxlan_rdst default_dst;	/* default destination */
-	union vxlan_addr  saddr;	/* source address */
-	__be16		  dst_port;
-	__u16		  port_min;	/* source port range */
-	__u16		  port_max;
-	__u8		  tos;		/* TOS override */
-	__u8		  ttl;
-	u32		  flags;	/* VXLAN_F_* in vxlan.h */
-
-	unsigned long	  age_interval;
-	struct timer_list age_timer;
-	spinlock_t	  hash_lock;
-	unsigned int	  addrcnt;
-	unsigned int	  addrmax;
-
-	struct hlist_head fdb_head[FDB_HASH_SIZE];
-};
-
 /* salt for hash table */
 static u32 vxlan_salt __read_mostly;
 static struct workqueue_struct *vxlan_wq;
@@ -345,7 +302,7 @@ static int vxlan_fdb_info(struct sk_buff *skb, struct vxlan_dev *vxlan,
 	if (send_ip && vxlan_nla_put_addr(skb, NDA_DST, &rdst->remote_ip))
 		goto nla_put_failure;
 
-	if (rdst->remote_port && rdst->remote_port != vxlan->dst_port &&
+	if (rdst->remote_port && rdst->remote_port != vxlan->cfg.dst_port &&
 	    nla_put_be16(skb, NDA_PORT, rdst->remote_port))
 		goto nla_put_failure;
 	if (rdst->remote_vni != vxlan->default_dst.remote_vni &&
@@ -749,7 +706,8 @@ static int vxlan_fdb_create(struct vxlan_dev *vxlan,
 		if (!(flags & NLM_F_CREATE))
 			return -ENOENT;
 
-		if (vxlan->addrmax && vxlan->addrcnt >= vxlan->addrmax)
+		if (vxlan->cfg.addrmax &&
+		    vxlan->addrcnt >= vxlan->cfg.addrmax)
 			return -ENOSPC;
 
 		/* Disallow replace to add a multicast entry */
@@ -835,7 +793,7 @@ static int vxlan_fdb_parse(struct nlattr *tb[], struct vxlan_dev *vxlan,
 			return -EINVAL;
 		*port = nla_get_be16(tb[NDA_PORT]);
 	} else {
-		*port = vxlan->dst_port;
+		*port = vxlan->cfg.dst_port;
 	}
 
 	if (tb[NDA_VNI]) {
@@ -1021,7 +979,7 @@ static bool vxlan_snoop(struct net_device *dev,
 			vxlan_fdb_create(vxlan, src_mac, src_ip,
 					 NUD_REACHABLE,
 					 NLM_F_EXCL|NLM_F_CREATE,
-					 vxlan->dst_port,
+					 vxlan->cfg.dst_port,
 					 vxlan->default_dst.remote_vni,
 					 0, NTF_SELF);
 		spin_unlock(&vxlan->hash_lock);
@@ -1945,7 +1903,7 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
 	u32 flags = vxlan->flags;
 
 	if (rdst) {
-		dst_port = rdst->remote_port ? rdst->remote_port : vxlan->dst_port;
+		dst_port = rdst->remote_port ? rdst->remote_port : vxlan->cfg.dst_port

[net-next RFC 14/14] arp: Associate ARP requests with tunnel info

2015-06-01 Thread Thomas Graf

Since ARP performs its own route lookup, any tunnel metadata
returned by that lookup must be attached to the skb manually.

Signed-off-by: Thomas Graf <tg...@suug.ch>
---
 net/ipv4/arp.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
index 933a928..6cf0502 100644
--- a/net/ipv4/arp.c
+++ b/net/ipv4/arp.c
@@ -489,6 +489,7 @@ struct sk_buff *arp_create(int type, int ptype, __be32 dest_ip,
 	unsigned char *arp_ptr;
 	int hlen = LL_RESERVED_SPACE(dev);
 	int tlen = dev->needed_tailroom;
+	struct rtable *rt;
 
 	/*
 	 *	Allocate a buffer
@@ -577,6 +578,13 @@ struct sk_buff *arp_create(int type, int ptype, __be32 dest_ip,
 	}
 	memcpy(arp_ptr, &dest_ip, 4);
 
+	rt = ip_route_output(dev_net(dev), dest_ip, src_ip, 0, dev->ifindex);
+	if (!IS_ERR(rt)) {
+		if (rt->rt_tun_info)
+			skb_attach_tunnel_info(skb, rt->rt_tun_info);
+		ip_rt_put(rt);
+	}
+
return skb;
 
 out:
-- 
2.3.5


[net-next RFC 06/14] fib: Add fib rule match on tunnel id

2015-06-01 Thread Thomas Graf

This adds the ability to select a routing table based on the tunnel
id, which allows separate routing tables to be maintained for each
virtual tunnel network.

ip rule add from all tunnel-id 100 lookup 100
ip rule add from all tunnel-id 200 lookup 200
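
The match semantics behind these rules can be sketched in user space as
follows. This is only a model under stated assumptions, not the kernel
code: struct rule_model, struct flow_model, rule_matches_tunnel() and
select_table() are hypothetical names, byte order is ignored (the real
field is a __be64), and a tun_id of zero in a rule is taken to mean
"do not match on tunnel id", as in the fib_rule_match() hunk of this
patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for struct fib_rule and struct flowi. */
struct rule_model {
	uint64_t tun_id;	/* 0: rule does not select on tunnel id */
	uint32_t table;		/* routing table to use on a match */
};

struct flow_model {
	uint64_t tun_id;	/* tunnel id the packet arrived with */
};

/* A rule with tun_id == 0 matches any flow; otherwise ids must be equal. */
static bool rule_matches_tunnel(const struct rule_model *rule,
				const struct flow_model *flow)
{
	if (rule->tun_id && rule->tun_id != flow->tun_id)
		return false;
	return true;
}

/* Return the table of the first matching rule, or 0 if no rule matches. */
static uint32_t select_table(const struct rule_model *rules, int n,
			     const struct flow_model *flow)
{
	for (int i = 0; i < n; i++)
		if (rule_matches_tunnel(&rules[i], flow))
			return rules[i].table;
	return 0;
}
```

With the two rules above modelled as { 100, 100 } and { 200, 200 }, a
flow carrying tunnel id 200 selects table 200, and a flow with an id
that no rule names falls through.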

Signed-off-by: Thomas Graf <tg...@suug.ch>
---
 include/net/fib_rules.h|  1 +
 include/uapi/linux/fib_rules.h |  2 +-
 net/core/fib_rules.c   | 17 +++--
 3 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/include/net/fib_rules.h b/include/net/fib_rules.h
index 6d67383..822ed1e 100644
--- a/include/net/fib_rules.h
+++ b/include/net/fib_rules.h
@@ -19,6 +19,7 @@ struct fib_rule {
u8  action;
/* 3 bytes hole, try to use */
u32 target;
+   __be64  tun_id;
struct fib_rule __rcu   *ctarget;
struct net  *fr_net;
 
diff --git a/include/uapi/linux/fib_rules.h b/include/uapi/linux/fib_rules.h
index 2b82d7e..96161b8 100644
--- a/include/uapi/linux/fib_rules.h
+++ b/include/uapi/linux/fib_rules.h
@@ -43,7 +43,7 @@ enum {
FRA_UNUSED5,
FRA_FWMARK, /* mark */
FRA_FLOW,   /* flow/class id */
-   FRA_UNUSED6,
+   FRA_TUN_ID,
FRA_SUPPRESS_IFGROUP,
FRA_SUPPRESS_PREFIXLEN,
FRA_TABLE,  /* Extended table id */
diff --git a/net/core/fib_rules.c b/net/core/fib_rules.c
index 9a12668..6da78c9 100644
--- a/net/core/fib_rules.c
+++ b/net/core/fib_rules.c
@@ -186,6 +186,9 @@ static int fib_rule_match(struct fib_rule *rule, struct fib_rules_ops *ops,
 	if ((rule->mark ^ fl->flowi_mark) & rule->mark_mask)
 		goto out;
 
+	if (rule->tun_id && (rule->tun_id != fl->flowi_tun_key.tun_id))
+		goto out;
+
 	ret = ops->match(rule, fl, flags);
 out:
 	return (rule->flags & FIB_RULE_INVERT) ? !ret : ret;
@@ -330,6 +333,9 @@ static int fib_nl_newrule(struct sk_buff *skb, struct nlmsghdr* nlh)
 	if (tb[FRA_FWMASK])
 		rule->mark_mask = nla_get_u32(tb[FRA_FWMASK]);
 
+	if (tb[FRA_TUN_ID])
+		rule->tun_id = nla_get_be64(tb[FRA_TUN_ID]);
+
 	rule->action = frh->action;
 	rule->flags = frh->flags;
 	rule->table = frh_get_table(frh, tb);
@@ -473,6 +479,10 @@ static int fib_nl_delrule(struct sk_buff *skb, struct nlmsghdr* nlh)
 		    (rule->mark_mask != nla_get_u32(tb[FRA_FWMASK])))
 			continue;
 
+		if (tb[FRA_TUN_ID] &&
+		    (rule->tun_id != nla_get_be64(tb[FRA_TUN_ID])))
+			continue;
+
 		if (!ops->compare(rule, frh, tb))
 			continue;
 
@@ -535,7 +545,8 @@ static inline size_t fib_rule_nlmsg_size(struct fib_rules_ops *ops,
 			 + nla_total_size(4) /* FRA_SUPPRESS_PREFIXLEN */
 			 + nla_total_size(4) /* FRA_SUPPRESS_IFGROUP */
 			 + nla_total_size(4) /* FRA_FWMARK */
-			 + nla_total_size(4); /* FRA_FWMASK */
+			 + nla_total_size(4) /* FRA_FWMASK */
+			 + nla_total_size(8); /* FRA_TUN_ID */
 
 	if (ops->nlmsg_payload)
 		payload += ops->nlmsg_payload(rule);
@@ -591,7 +602,9 @@ static int fib_nl_fill_rule(struct sk_buff *skb, struct fib_rule *rule,
 	    ((rule->mark_mask || rule->mark) &&
 	     nla_put_u32(skb, FRA_FWMASK, rule->mark_mask)) ||
 	    (rule->target &&
-	     nla_put_u32(skb, FRA_GOTO, rule->target)))
+	     nla_put_u32(skb, FRA_GOTO, rule->target)) ||
+	    (rule->tun_id &&
+	     nla_put_be64(skb, FRA_TUN_ID, rule->tun_id)))
 		goto nla_put_failure;
 
 	if (rule->suppress_ifgroup != -1) {
-- 
2.3.5


[net-next RFC 05/14] route: Per route tunnel metadata with RTA_TUNNEL

2015-06-01 Thread Thomas Graf

Introduces a new Netlink attribute RTA_TUNNEL which allows routes
to set tunnel transmit metadata and specify the tunnel endpoint or
tunnel id on a per route basis. The route must point to a tunnel
device which understands per skb tunnel metadata and has been put
into the respective mode.
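
The lookup order that results on the transmit side can be modelled in
user space as below. This is a hedged sketch rather than the kernel
implementation: struct tun_info_model, struct route_model, struct
packet_model and packet_tunnel_info() are hypothetical names that
mirror the skb_tunnel_info() hunk in this patch, where metadata
attached directly to the packet wins and the route's metadata is only
a fallback.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct ip_tunnel_info. */
struct tun_info_model {
	unsigned long long tun_id;
};

/* Hypothetical stand-in for struct rtable with its new rt_tun_info. */
struct route_model {
	const struct tun_info_model *rt_tun_info;	/* may be NULL */
};

/* Hypothetical stand-in for an skb. */
struct packet_model {
	const struct tun_info_model *tun_info;	/* per-packet metadata, may be NULL */
	const struct route_model *route;	/* attached route, may be NULL */
};

/* Prefer per-packet metadata, then fall back to the route's metadata. */
static const struct tun_info_model *
packet_tunnel_info(const struct packet_model *pkt)
{
	if (pkt->tun_info)
		return pkt->tun_info;
	if (pkt->route)
		return pkt->route->rt_tun_info;
	return NULL;
}
```

A tunnel device operating in this mode would consult such a helper on
transmit and treat a NULL result as "no encapsulation parameters
available".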

Signed-off-by: Thomas Graf <tg...@suug.ch>
---
 include/net/ip_fib.h   |  3 +++
 include/net/ip_tunnels.h   |  1 -
 include/net/route.h| 10 
 include/uapi/linux/rtnetlink.h | 16 
 net/ipv4/fib_frontend.c| 57 ++
 net/ipv4/fib_semantics.c   | 45 +
 net/ipv4/route.c   | 30 +-
 net/openvswitch/vport.h|  1 +
 8 files changed, 161 insertions(+), 2 deletions(-)

diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index 54271ed..1cd7cf8 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -22,6 +22,7 @@
 #include <net/fib_rules.h>
 #include <net/inetpeer.h>
 #include <linux/percpu.h>
+#include <net/ip_tunnels.h>
 
 struct fib_config {
 	u8			fc_dst_len;
@@ -44,6 +45,7 @@ struct fib_config {
 	u32			fc_flow;
 	u32			fc_nlflags;
 	struct nl_info		fc_nlinfo;
+	struct ip_tunnel_info	fc_tunnel;
  };
 
 struct fib_info;
@@ -117,6 +119,7 @@ struct fib_info {
 #ifdef CONFIG_IP_ROUTE_MULTIPATH
 	int			fib_power;
 #endif
+	struct ip_tunnel_info	*fib_tunnel;
 	struct rcu_head		rcu;
 	struct fib_nh		fib_nh[0];
 #define fib_dev		fib_nh[0].nh_dev
diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
index df8cfd3..b4ab930 100644
--- a/include/net/ip_tunnels.h
+++ b/include/net/ip_tunnels.h
@@ -9,7 +9,6 @@
 #include <net/dsfield.h>
 #include <net/gro_cells.h>
 #include <net/inet_ecn.h>
-#include <net/ip.h>
 #include <net/netns/generic.h>
 #include <net/rtnetlink.h>
 #include <net/flow.h>
diff --git a/include/net/route.h b/include/net/route.h
index 6ede321..dbda603 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -28,6 +28,7 @@
 #include <net/inetpeer.h>
 #include <net/flow.h>
 #include <net/inet_sock.h>
+#include <net/ip_tunnels.h>
 #include <linux/in_route.h>
 #include <linux/rtnetlink.h>
 #include <linux/rcupdate.h>
@@ -66,6 +67,7 @@ struct rtable {
 
 	struct list_head	rt_uncached;
 	struct uncached_list	*rt_uncached_list;
+	struct ip_tunnel_info	*rt_tun_info;
 };
 
 static inline bool rt_is_input_route(const struct rtable *rt)
@@ -198,6 +200,8 @@ struct in_ifaddr;
 void fib_add_ifaddr(struct in_ifaddr *);
 void fib_del_ifaddr(struct in_ifaddr *, struct in_ifaddr *);
 
+int fib_dump_tun_info(struct sk_buff *skb, struct ip_tunnel_info *tun_info);
+
 static inline void ip_rt_put(struct rtable *rt)
 {
 	/* dst_release() accepts a NULL parameter.
@@ -317,9 +321,15 @@ static inline int ip4_dst_hoplimit(const struct dst_entry *dst)
 
 static inline struct ip_tunnel_info *skb_tunnel_info(struct sk_buff *skb)
 {
+	struct rtable *rt;
+
 	if (skb_shinfo(skb)->tun_info)
 		return skb_shinfo(skb)->tun_info;
 
+	rt = skb_rtable(skb);
+	if (rt)
+		return rt->rt_tun_info;
+
 	return NULL;
 }
 
diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index 17fb02f..1f7aa68 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -286,6 +286,21 @@ enum rt_class_t {
 
 /* Routing message attributes */
 
+enum rta_tunnel_t {
+   RTA_TUN_UNSPEC,
+   RTA_TUN_ID,
+   RTA_TUN_DST,
+   RTA_TUN_SRC,
+   RTA_TUN_TTL,
+   RTA_TUN_TOS,
+   RTA_TUN_SPORT,
+   RTA_TUN_DPORT,
+   RTA_TUN_FLAGS,
+   __RTA_TUN_MAX,
+};
+
+#define RTA_TUN_MAX (__RTA_TUN_MAX - 1)
+
 enum rtattr_type_t {
RTA_UNSPEC,
RTA_DST,
@@ -308,6 +323,7 @@ enum rtattr_type_t {
RTA_VIA,
RTA_NEWDST,
RTA_PREF,
+   RTA_TUNNEL, /* destination VTEP */
__RTA_MAX
 };
 
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 872494e..bfa77a6 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -580,6 +580,57 @@ int ip_rt_ioctl(struct net *net, unsigned int cmd, void __user *arg)
 	return -EINVAL;
 }
 
+static const struct nla_policy tunnel_policy[RTA_TUN_MAX + 1] = {
+	[RTA_TUN_ID]	= { .type = NLA_U64 },
+	[RTA_TUN_DST]	= { .type = NLA_U32 },
+	[RTA_TUN_SRC]	= { .type = NLA_U32 },
+	[RTA_TUN_TTL]	= { .type = NLA_U8 },
+	[RTA_TUN_TOS]	= { .type = NLA_U8 },
+	[RTA_TUN_SPORT]	= { .type = NLA_U16 },
+	[RTA_TUN_DPORT]	= { .type = NLA_U16 },
+	[RTA_TUN_FLAGS]	= { .type = NLA_U16 },
+};
+
+static int parse_rta_tunnel(struct fib_config *cfg, struct nlattr *attr)
+{
+	struct nlattr *tb[RTA_TUN_MAX

[net-next RFC 12/14] vxlan: remove indirect call to vxlan_rcv() and vni member

2015-06-01 Thread Thomas Graf

With the removal of the special treatment of OVS VXLAN vports, the
indirect call to vxlan_rcv() can be avoided and the VNI member
in vxlan_metadata can be removed.
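
For readers unfamiliar with the "vni >> 8" that now appears at the call
site: the VXLAN header carries the 24-bit VNI in the upper three bytes
of a 32-bit big-endian word, with the low byte reserved. A minimal
user-space sketch of that conversion follows; vni_from_wire() and
vni_to_wire() are hypothetical helper names, not functions from this
patch.

```c
#include <stdint.h>
#include <arpa/inet.h>	/* ntohl(), htonl() */

/* Extract the 24-bit VNI from the on-wire 32-bit field.
 * The low 8 bits of the host-order value are reserved and dropped. */
static uint32_t vni_from_wire(uint32_t vx_vni_be)
{
	return ntohl(vx_vni_be) >> 8;
}

/* Build the on-wire field from a VNI; bits above 24 are masked off. */
static uint32_t vni_to_wire(uint32_t vni)
{
	return htonl((vni & 0xffffffu) << 8);
}
```

The bytes 00 00 64 00 on the wire therefore decode to VNI 100.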

Signed-off-by: Thomas Graf <tg...@suug.ch>
---
 drivers/net/vxlan.c | 225 +---
 include/net/vxlan.h |   7 --
 2 files changed, 107 insertions(+), 125 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index b696871..9cc7d5a 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -75,7 +75,6 @@ static struct rtnl_link_ops vxlan_link_ops;
 static const u8 all_zeros_mac[ETH_ALEN];
 
 static struct vxlan_sock *vxlan_sock_add(struct net *net, __be16 port,
-					 vxlan_rcv_t *rcv, void *data,
 					 bool no_share, u32 flags);
 
 /* per-network namespace private data for this module */
@@ -1122,6 +1121,102 @@ static struct vxlanhdr *vxlan_remcsum(struct sk_buff *skb, struct vxlanhdr *vh,
 	return vh;
 }
 
+static void vxlan_rcv(struct vxlan_sock *vs, struct sk_buff *skb,
+		      struct vxlan_metadata *md, __u32 vni)
+{
+	struct ip_tunnel_info *tun_info = skb_shinfo(skb)->tun_info;
+	struct iphdr *oip = NULL;
+	struct ipv6hdr *oip6 = NULL;
+	struct vxlan_dev *vxlan;
+	struct pcpu_sw_netstats *stats;
+	union vxlan_addr saddr;
+	int err = 0;
+	union vxlan_addr *remote_ip;
+
+	/* For flow based devices, map all packets to VNI 0 */
+	if (vs->flags & VXLAN_F_FLOW_BASED)
+		vni = 0;
+
+	/* Is this VNI defined? */
+	vxlan = vxlan_vs_find_vni(vs, vni);
+	if (!vxlan)
+		goto drop;
+
+	remote_ip = &vxlan->default_dst.remote_ip;
+	skb_reset_mac_header(skb);
+	skb_scrub_packet(skb, !net_eq(vxlan->net, dev_net(vxlan->dev)));
+	skb->protocol = eth_type_trans(skb, vxlan->dev);
+	skb_postpull_rcsum(skb, eth_hdr(skb), ETH_HLEN);
+
+	/* Ignore packet loops (and multicast echo) */
+	if (ether_addr_equal(eth_hdr(skb)->h_source, vxlan->dev->dev_addr))
+		goto drop;
+
+	/* Re-examine inner Ethernet packet */
+	if (remote_ip->sa.sa_family == AF_INET) {
+		oip = ip_hdr(skb);
+		saddr.sin.sin_addr.s_addr = oip->saddr;
+		saddr.sa.sa_family = AF_INET;
+
+		if (tun_info) {
+			tun_info->key.ipv4_src = oip->saddr;
+			tun_info->key.ipv4_dst = oip->daddr;
+			tun_info->key.ipv4_tos = oip->tos;
+			tun_info->key.ipv4_ttl = oip->ttl;
+		}
+#if IS_ENABLED(CONFIG_IPV6)
+	} else {
+		oip6 = ipv6_hdr(skb);
+		saddr.sin6.sin6_addr = oip6->saddr;
+		saddr.sa.sa_family = AF_INET6;
+
+		/* TODO : Fill IPv6 tunnel info */
+#endif
+	}
+
+	if ((vxlan->flags & VXLAN_F_LEARN) &&
+	    vxlan_snoop(skb->dev, &saddr, eth_hdr(skb)->h_source))
+		goto drop;
+
+	skb_reset_network_header(skb);
+	if (!(vs->flags & VXLAN_F_FLOW_BASED))
+		skb->mark = md->gbp;
+
+	if (oip6)
+		err = IP6_ECN_decapsulate(oip6, skb);
+	if (oip)
+		err = IP_ECN_decapsulate(oip, skb);
+
+	if (unlikely(err)) {
+		if (log_ecn_error) {
+			if (oip6)
+				net_info_ratelimited("non-ECT from %pI6\n",
+						     &oip6->saddr);
+			if (oip)
+				net_info_ratelimited("non-ECT from %pI4 with TOS=%#x\n",
+						     &oip->saddr, oip->tos);
+		}
+		if (err > 1) {
+			++vxlan->dev->stats.rx_frame_errors;
+			++vxlan->dev->stats.rx_errors;
+			goto drop;
+		}
+	}
+
+	stats = this_cpu_ptr(vxlan->dev->tstats);
+	u64_stats_update_begin(&stats->syncp);
+	stats->rx_packets++;
+	stats->rx_bytes += skb->len;
+	u64_stats_update_end(&stats->syncp);
+
+	netif_rx(skb);
+
+	return;
+drop:
+	/* Consume bad packet */
+	kfree_skb(skb);
+}
+
 /* Callback from net/ipv4/udp.c to receive packets */
 static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
 {
@@ -1226,8 +1321,7 @@ static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
 		goto bad_flags;
 	}
 
-	md->vni = vxh->vx_vni;
-	vs->rcv(vs, skb, md);
+	vxlan_rcv(vs, skb, md, vni >> 8);
 	return 0;
 
 drop:
@@ -1244,105 +1338,6 @@ error:
 	return 1;
 }
 
-static void vxlan_rcv(struct vxlan_sock *vs, struct sk_buff *skb,
-		      struct vxlan_metadata *md)
-{
-	struct ip_tunnel_info *tun_info = skb_shinfo(skb)->tun_info;
-	struct iphdr *oip = NULL;
-	struct ipv6hdr *oip6 = NULL;
-	struct

[net-next RFC 13/14] openvswitch: Use regular GRE net_device instead of vport

2015-06-01 Thread Thomas Graf

From: Pravin Shelar <pshe...@nicira.com>

Removes all of the OVS specific GRE code and makes OVS use a
GRE net_device.

Signed-off-by: Pravin B Shelar <pshe...@nicira.com>
---
 net/core/dev.c |   5 +-
 net/ipv4/ip_gre.c  | 161 -
 net/openvswitch/Makefile   |   1 -
 net/openvswitch/vport-gre.c| 313 -
 net/openvswitch/vport-netdev.c |   7 +-
 5 files changed, 168 insertions(+), 319 deletions(-)
 delete mode 100644 net/openvswitch/vport-gre.c

diff --git a/net/core/dev.c b/net/core/dev.c
index 594163d..656f3b4 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6969,6 +6969,9 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
 	INIT_LIST_HEAD(&dev->ptype_all);
 	INIT_LIST_HEAD(&dev->ptype_specific);
 	dev->priv_flags = IFF_XMIT_DST_RELEASE | IFF_XMIT_DST_RELEASE_PERM;
+
+	strcpy(dev->name, name);
+	dev->name_assign_type = name_assign_type;
 	setup(dev);
 
 	dev->num_tx_queues = txqs;
@@ -6983,8 +6986,6 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
 		goto free_all;
 #endif
 
-	strcpy(dev->name, name);
-	dev->name_assign_type = name_assign_type;
 	dev->group = INIT_NETDEV_GROUP;
 	if (!dev->ethtool_ops)
 		dev->ethtool_ops = &default_ethtool_ops;
diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index 5fd7064..b37515e 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -25,6 +25,7 @@
 #include <linux/udp.h>
 #include <linux/if_arp.h>
 #include <linux/mroute.h>
+#include <linux/if_vlan.h>
 #include <linux/init.h>
 #include <linux/in6.h>
 #include <linux/inetdevice.h>
@@ -115,6 +116,8 @@ static bool log_ecn_error = true;
 module_param(log_ecn_error, bool, 0644);
 MODULE_PARM_DESC(log_ecn_error, "Log packets received with corrupted ECN");
 
+#define GRE_TAP_FB_NAME "gretap0"
+
 static struct rtnl_link_ops ipgre_link_ops __read_mostly;
 static int ipgre_tunnel_init(struct net_device *dev);
 
@@ -217,7 +220,17 @@ static int ipgre_rcv(struct sk_buff *skb, const struct tnl_ptk_info *tpi)
 				  iph->saddr, iph->daddr, tpi->key);
 
 	if (tunnel) {
+
 		skb_pop_mac_header(skb);
+		if (tunnel->dev == itn->fb_tunnel_dev) {
+			struct ip_tunnel_info *tun_info;
+
+			tun_info = ip_tunnel_info_alloc(0, GFP_ATOMIC);
+
+			/* TODO: setup tun info from tpi */
+			skb_attach_tunnel_info(skb, tun_info);
+		}
+
 		ip_tunnel_rcv(tunnel, skb, tpi, log_ecn_error);
 		return PACKET_RCVD;
 	}
@@ -287,6 +300,135 @@ out:
 	return NETDEV_TX_OK;
 }
 
+/* TODO: share xmit code */
+static inline struct rtable *tunnel_route_lookup(struct net *net,
+						 const struct ip_tunnel_key *key,
+						 u32 mark,
+						 struct flowi4 *fl,
+						 u8 protocol)
+{
+	struct rtable *rt;
+
+	memset(fl, 0, sizeof(*fl));
+	fl->daddr = key->ipv4_dst;
+	fl->saddr = key->ipv4_src;
+	fl->flowi4_tos = RT_TOS(key->ipv4_tos);
+	fl->flowi4_mark = mark;
+	fl->flowi4_proto = protocol;
+
+	rt = ip_route_output_key(net, fl);
+	return rt;
+}
+
+
+/* Returns the least-significant 32 bits of a __be64. */
+static __be32 be64_get_low32(__be64 x)
+{
+#ifdef __BIG_ENDIAN
+	return (__force __be32)x;
+#else
+	return (__force __be32)((__force u64)x >> 32);
+#endif
+}
+
+static __be16 filter_tnl_flags(__be16 flags)
+{
+	return flags & (TUNNEL_CSUM | TUNNEL_KEY);
+}
+
+
+static struct sk_buff *__build_header(struct sk_buff *skb,
+				      const struct ip_tunnel_info *tun_info,
+				      int tunnel_hlen)
+{
+	struct tnl_ptk_info tpi;
+
+	skb = gre_handle_offloads(skb, !!(tun_info->key.tun_flags & TUNNEL_CSUM));
+	if (IS_ERR(skb))
+		return skb;
+
+	tpi.flags = filter_tnl_flags(tun_info->key.tun_flags);
+	tpi.proto = htons(ETH_P_TEB);
+	tpi.key = be64_get_low32(tun_info->key.tun_id);
+	tpi.seq = 0;
+	gre_build_header(skb, &tpi, tunnel_hlen);
+
+	return skb;
+}
+
+static netdev_tx_t gre_fb_xmit(struct sk_buff *skb,
+			       struct net_device *dev)
+{
+	struct net *net = dev_net(dev);
+	struct ip_tunnel_info *tun_info;
+	const struct ip_tunnel_key *key;
+	struct flowi4 fl;
+	struct rtable *rt;
+	int min_headroom;
+	int tunnel_hlen;
+	__be16 df;
+	int err;
+
+	tun_info = skb_shinfo(skb)->tun_info;
+	if (unlikely(!tun_info)) {
+		err = -EINVAL;
+		goto err_free_skb;
+	}
+
+	key = &tun_info->key;
+
+	rt =

[net-next RFC 08/14] openvswitch: Allocate & attach ip_tunnel_info for tunnel set action

2015-06-01 Thread Thomas Graf

Make use of the new skb tunnel metadata field by allocating an
ip_tunnel_info per OVS tunnel set action and then attaching that
metadata to each skb that passes the set action.

The old egress_tun_info via the OVS_CB() is left in place until
all tunnel vports have been converted to the new method.
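
Sharing one metadata object between a flow's set action and every skb
that passes it implies some form of lifetime tracking; the
ip_tunnel_info_put() call in the diff below suggests reference
counting. The following user-space sketch shows that general pattern
only. It is an assumption about the shape of the scheme, not the code
from this series: struct tun_meta, tun_meta_alloc(), tun_meta_get()
and tun_meta_put() are hypothetical names, and C11 atomics stand in
for the kernel's own primitives.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical refcounted metadata object. */
struct tun_meta {
	atomic_int refcnt;
	unsigned long long tun_id;
};

/* Allocate an object holding one reference, owned by the caller. */
static struct tun_meta *tun_meta_alloc(unsigned long long tun_id)
{
	struct tun_meta *m = malloc(sizeof(*m));

	if (!m)
		return NULL;
	atomic_init(&m->refcnt, 1);
	m->tun_id = tun_id;
	return m;
}

/* Take an extra reference, e.g. when attaching to a packet. */
static void tun_meta_get(struct tun_meta *m)
{
	atomic_fetch_add(&m->refcnt, 1);
}

/* Drop a reference; returns 1 if this was the last one and the
 * object has been freed, 0 otherwise. */
static int tun_meta_put(struct tun_meta *m)
{
	if (atomic_fetch_sub(&m->refcnt, 1) == 1) {
		free(m);
		return 1;
	}
	return 0;
}
```

In this model the set action would own the initial reference and each
packet would hold its own for as long as the metadata stays attached.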

Signed-off-by: Thomas Graf <tg...@suug.ch>
Signed-off-by: Pravin B Shelar <pshe...@nicira.com>
---
 net/openvswitch/actions.c  |  8 +-
 net/openvswitch/datapath.c |  8 +++---
 net/openvswitch/flow.h |  5 
 net/openvswitch/flow_netlink.c | 59 +-
 net/openvswitch/flow_netlink.h |  1 +
 5 files changed, 69 insertions(+), 12 deletions(-)

diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index 34cad57..484d965 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -726,7 +726,13 @@ static int execute_set_action(struct sk_buff *skb,
 {
 	/* Only tunnel set execution is supported without a mask. */
 	if (nla_type(a) == OVS_KEY_ATTR_TUNNEL_INFO) {
-		OVS_CB(skb)->egress_tun_info = nla_data(a);
+		struct ovs_tunnel_info *tun = nla_data(a);
+
+		skb_attach_tunnel_info(skb, tun->info);
+
+		/* FIXME: Remove when all vports have been converted */
+		OVS_CB(skb)->egress_tun_info = tun->info;
+
 		return 0;
 	}
 
diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 3b90461..3315e3a 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -1004,7 +1004,7 @@ static int ovs_flow_cmd_new(struct sk_buff *skb, struct genl_info *info)
 		}
 		ovs_unlock();
 
-		ovs_nla_free_flow_actions(old_acts);
+		ovs_nla_free_flow_actions_rcu(old_acts);
 		ovs_flow_free(new_flow, false);
 	}
 
@@ -1016,7 +1016,7 @@ err_unlock_ovs:
 	ovs_unlock();
 	kfree_skb(reply);
 err_kfree_acts:
-	kfree(acts);
+	ovs_nla_free_flow_actions(acts);
 err_kfree_flow:
 	ovs_flow_free(new_flow, false);
 error:
@@ -1143,7 +1143,7 @@ static int ovs_flow_cmd_set(struct sk_buff *skb, struct genl_info *info)
 	if (reply)
 		ovs_notify(&dp_flow_genl_family, reply, info);
 	if (old_acts)
-		ovs_nla_free_flow_actions(old_acts);
+		ovs_nla_free_flow_actions_rcu(old_acts);
 
 	return 0;
 
@@ -1151,7 +1151,7 @@ err_unlock_ovs:
 	ovs_unlock();
 	kfree_skb(reply);
 err_kfree_acts:
-	kfree(acts);
+	ovs_nla_free_flow_actions(acts);
 error:
 	return error;
 }
diff --git a/net/openvswitch/flow.h b/net/openvswitch/flow.h
index cadc6c5..193eab9 100644
--- a/net/openvswitch/flow.h
+++ b/net/openvswitch/flow.h
@@ -45,6 +45,11 @@ struct sk_buff;
 #define TUN_METADATA_OPTS(flow_key, opt_len) \
 	((void *)((flow_key)->tun_opts + TUN_METADATA_OFFSET(opt_len)))
 
+struct ovs_tunnel_info
+{
+	struct ip_tunnel_info	*info;
+};
+
 #define OVS_SW_FLOW_KEY_METADATA_SIZE			\
 	(offsetof(struct sw_flow_key, recirc_id) +	\
 	FIELD_SIZEOF(struct sw_flow_key, recirc_id))
diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
index ecfa530..35086c6 100644
--- a/net/openvswitch/flow_netlink.c
+++ b/net/openvswitch/flow_netlink.c
@@ -1548,11 +1548,45 @@ static struct sw_flow_actions *nla_alloc_flow_actions(int size, bool log)
 	return sfa;
 }
 
+static void ovs_nla_free_set_action(const struct nlattr *a)
+{
+	const struct nlattr *ovs_key = nla_data(a);
+	struct ovs_tunnel_info *ovs_tun;
+
+	switch (nla_type(ovs_key)) {
+	case OVS_KEY_ATTR_TUNNEL_INFO:
+		ovs_tun = nla_data(ovs_key);
+		ip_tunnel_info_put(ovs_tun->info);
+		break;
+	}
+}
+
+void ovs_nla_free_flow_actions(struct sw_flow_actions *sf_acts)
+{
+	const struct nlattr *a;
+	int rem;
+
+	nla_for_each_attr(a, sf_acts->actions, sf_acts->actions_len, rem) {
+		switch (nla_type(a)) {
+		case OVS_ACTION_ATTR_SET:
+			ovs_nla_free_set_action(a);
+			break;
+		}
+	}
+
+	kfree(sf_acts);
+}
+
+static void __ovs_nla_free_flow_actions(struct rcu_head *head)
+{
+	ovs_nla_free_flow_actions(container_of(head, struct sw_flow_actions, rcu));
+}
+
 /* Schedules 'sf_acts' to be freed after the next RCU grace period.
  * The caller must hold rcu_read_lock for this to be sensible. */
-void ovs_nla_free_flow_actions(struct sw_flow_actions *sf_acts)
+void ovs_nla_free_flow_actions_rcu(struct sw_flow_actions *sf_acts)
 {
-	kfree_rcu(sf_acts, rcu);
+	call_rcu(&sf_acts->rcu, __ovs_nla_free_flow_actions);
 }
 
 static struct nlattr *reserve_sfa_size(struct sw_flow_actions **sfa,
@@ -1747,6 +1781,7 @@ static int validate_and_copy_set_tun(const struct nlattr *attr,
 	struct sw_flow_match match

Re: [net-next RFC 05/14] route: Per route tunnel metadata with RTA_TUNNEL

2015-06-01 Thread Thomas Graf

On 06/01/15 at 05:51pm, Robert Shearman wrote:
> On 01/06/15 15:27, Thomas Graf wrote:
>> Introduces a new Netlink attribute RTA_TUNNEL which allows routes
>> to set tunnel transmit metadata and specify the tunnel endpoint or
>> tunnel id on a per route basis. The route must point to a tunnel
>> device which understands per skb tunnel metadata and has been put
>> into the respective mode.
>
> We've been discussing something similar for the purposes of IP over MPLS,
> but most of the attributes for IP tunnels aren't relevant for MPLS. It'd be
> great if we can come up with something general enough that can serve both
> purposes. I've just sent a patch series ([RFC net-next 0/3] IP imposition
> of per-nh MPLS encap) which I believe would allow this.

Nice! At first glance, your series looks like an excellent complement
to this series. I'll comment directly in your series.

Re: [RFC net-next 0/3] IP imposition of per-nh MPLS encap

2015-06-01 Thread Thomas Graf

On 06/01/15 at 05:46pm, Robert Shearman wrote:
> In order to be able to function as a Label Edge Router in an MPLS
> network, it is necessary to be able to take IP packets and impose an
> MPLS encap and forward them out. The traditional approach of setting
> up an interface for each tunnel endpoint doesn't scale for the
> common MPLS use-cases where each IP route tends to be assigned a
> different label as encap.
>
> The solution suggested here for further discussion is to provide the
> facility to define encap data on a per-nexthop basis using a new
> netlink attribute, RTA_ENCAP, which would be opaque to the IPv4/IPv6
> forwarding code, but interpreted by the virtual interface assigned to
> the nexthop.

RTA_ENCAP is currently a binary blob specific to each encapsulation
type interface. I guess this should be converted to a set of nested
Netlink attributes for each type of encap to make it extensible in
the future.
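
To illustrate what a nested layout means on the wire, here is a
self-contained user-space sketch of a nested type-length-value
encoding. It mirrors the general Netlink attribute layout (16-bit
length, 16-bit type, payload padded to 4 bytes) but deliberately
defines its own struct attr_hdr instead of using kernel headers. The
attribute types ENCAP_MPLS and ENCAP_MPLS_LABEL and the helpers
attr_put(), nest_start() and nest_end() are hypothetical and exist
only for this example; details such as the nested flag bit are left
out.

```c
#include <stdint.h>
#include <string.h>

#define ATTR_HDRLEN	4u
#define ATTR_ALIGN(len)	(((len) + 3u) & ~3u)

/* Same shape as a Netlink attribute header: length, then type. */
struct attr_hdr {
	uint16_t len;	/* header + payload, excluding padding */
	uint16_t type;
};

/* Hypothetical attribute types, for illustration only. */
enum { ENCAP_UNSPEC, ENCAP_MPLS, ENCAP_MPLS_LABEL };

/* Append one attribute at offset 'off'.
 * Returns the new (aligned) offset, or 0 if it does not fit. */
static size_t attr_put(uint8_t *buf, size_t size, size_t off,
		       uint16_t type, const void *data, uint16_t dlen)
{
	struct attr_hdr hdr = { (uint16_t)(ATTR_HDRLEN + dlen), type };
	size_t end = off + ATTR_ALIGN(ATTR_HDRLEN + dlen);

	if (end > size)
		return 0;
	memset(buf + off, 0, end - off);	/* also zeroes the padding */
	memcpy(buf + off, &hdr, sizeof(hdr));
	if (dlen)
		memcpy(buf + off + ATTR_HDRLEN, data, dlen);
	return end;
}

/* Open a nested attribute: an empty header whose length is fixed up later. */
static size_t nest_start(uint8_t *buf, size_t size, size_t off, uint16_t type)
{
	return attr_put(buf, size, off, type, NULL, 0);
}

/* Close the nested attribute that starts at 'start' and ends at 'end'. */
static void nest_end(uint8_t *buf, size_t start, size_t end)
{
	uint16_t len = (uint16_t)(end - start);

	memcpy(buf + start, &len, sizeof(len));	/* len is the first field */
}
```

A parser that does not know ENCAP_MPLS can skip the whole nest using
the outer length, which is what makes such a layout extensible per
encap type.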

What is your plan regarding the receive side and on the matching of
encap fields? Storing the receive parameters is what led me to
storing it in skb_shared_info.

Re: netlink: Disable insertions/removals during rehash

2015-05-20 Thread Thomas Graf

On 05/15/15 at 08:06am, Herbert Xu wrote:
> On Thu, May 14, 2015 at 07:37:56AM -0700, Eric Dumazet wrote:
> >
> > This solves the corruption thanks Herbert.
>
> Great.
>
> > But wasn't rhashtable meant to be faster ? ;)
>
> Is it, that's news to me :)

Eric, can you share the scripts you used to test this?

Re: [PATCH net-next 6/6] netlink: allow to listen all netns

2015-05-06 Thread Thomas Graf

On 05/06/15 at 11:58am, Nicolas Dichtel wrote:
 More accurately, listen all netns that have a nsid assigned into the netns
 where the netlink socket is opened.
 For this purpose, a netlink socket option is added:
 NETLINK_LISTEN_ALL_NSID. When this option is set on a netlink socket, this
 socket will receive netlink notifications from all netns that have a nsid
 assigned into the netns where the socket has been opened. The nsid is sent
 to userland via an anscillary data.
 
 With this patch, a daemon needs only one socket to listen many netns. This
 is useful when the number of netns is high.
 
 Signed-off-by: Nicolas Dichtel nicolas.dich...@6wind.com

[...]

 +/* This function returns true is the peer netns has an id assigned into the
 + * current netns.
 + */
 +bool peernet_has_id(struct net *net, struct net *peer)
 +{
 + return peernet2id(net, peer) = 0;
 +}

Missing export?

> +
>  struct net *get_net_ns_by_id(struct net *net, int id)
>  {
>  	unsigned long flags;
> diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
> index ec4adbdcb9b4..bdbde542e952 100644
> --- a/net/netlink/af_netlink.c
> +++ b/net/netlink/af_netlink.c
> @@ -83,6 +83,7 @@ struct listeners {
>  #define NETLINK_RECV_PKTINFO 0x2
>  #define NETLINK_BROADCAST_SEND_ERROR 0x4
>  #define NETLINK_RECV_NO_ENOBUFS  0x8
> +#define NETLINK_LISTEN_ALL   0x10

Maybe name this NETLINK_LISTEN_ALL_NSID just to make it clear?

> +		if (!file_ns_capable(sk->sk_socket->file, p->net->user_ns,
> +				     CAP_NET_BROADCAST))
> +			return;
> +	}
> +	NETLINK_CB(p->skb).net = p->net;

Does this need a get_net()? The netns could disappear while the skb is
queued, right?

Re: rhashtable: Add cap on number of elements in hash table

2015-04-24 Thread Thomas Graf

On 04/24/15 at 08:57am, Herbert Xu wrote:
> It seems that I lost track somewhere along the line.  I meant
> to add an explicit limit on the overall number of entries since
> that was what users like netlink expected but never got around
> to doing it.  Instead it seems that we're currently relying on
> the rht_grow_above_100 to protect us.

Can we please just take Johannes's fix as-is first? It fixes
the bug at hand in an isolated manner without introducing any
new knobs. Your patch includes his fix as-is without modification
anyway.

> So here is a patch that adds an explicit limit and fixes the
> problem Johannes reported.
>
> ---8<---
> We currently have no limit on the number of elements in a hash table.
> This is very bad especially considering that some rhashtable users
> had such a limit before the conversion and relied on it for defence
> against DoS attacks.

Which users are you talking about? Both Netlink and TIPC still
have an upper limit. nft sets are controlled by privileged users.

> We already have a maximum hash table size limit but its enforcement
> is only by luck and results in a nasty WARN_ON.

As I stated earlier, this is no longer the case and thus this
paragraph only confuses the commit message.

> This patch adds a new paramater insecure_max_entries which becomes
> the cap on the table.  If unset it defaults to max_size.  If it is
> also zero it means that there is no cap on the number of elements
> in the table.  However, the table will grow whenever the utilisation
> hits 100% and if that growth fails, you will get ENOMEM on insertion.

Last time we discussed this it was said that the caller should enforce
the limit like Netlink does.

I'm fine with adding an upper max but I'd like to discuss that in the
context of a full series which converts all existing enforcements and
also contains a testing mechanism to verify this. Also, unless you can
show me where this is currently a real bug, this is really net-next
material.
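
For illustration only, caller-side enforcement of an entry limit could look
roughly like the userspace sketch below. The names toy_capped_set and
toy_capped_insert, and the choice of -E2BIG as the error code, are assumptions
made for this example; they are not taken from the Netlink or rhashtable code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical wrapper a caller might keep next to its hash table. */
struct toy_capped_set {
	size_t nelems;      /* elements currently stored */
	size_t max_entries; /* cap chosen by the caller, 0 means no cap */
};

/* Check the cap before inserting; returns 0 on success, -E2BIG if full. */
static int toy_capped_insert(struct toy_capped_set *s)
{
	assert(s != NULL);

	if (s->max_entries && s->nelems >= s->max_entries)
		return -E2BIG;

	s->nelems++; /* stand-in for the real table insertion */
	return 0;
}
```

The point of the sketch is only that the limit lives with the caller, so the
table itself needs no additional knob.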

Re: rhashtable: Add cap on number of elements in hash table

2015-04-24 Thread Thomas Graf

On 04/24/15 at 04:12pm, Herbert Xu wrote:
> On Fri, Apr 24, 2015 at 09:06:08AM +0100, Thomas Graf wrote:
> >
> > Which users are you talking about? Both Netlink and TIPC still
> > have an upper limit. nft sets are controlled by privileged users.
>
> There is no limit in netlink apart from UINT_MAX AFAICS.  Allowing
> UINT_MAX entries into a hash table limited to 64K is not a good
> thing.

OK, so you are saying that the Netlink limit is too low? Then let's
fix that.

You are claiming that the rhashtable conversion removed a cap. I'm
not seeing such a change. Can you point me to where netlink_insert()
enforced a cap pre-rhashtable?

Re: [PATCH] rhashtable: don't attempt to grow when at max_size

2015-04-23 Thread Thomas Graf

On 04/23/15 at 04:38pm, Johannes Berg wrote:
> From: Johannes Berg johannes.b...@intel.com
>
> The conversion of mac80211's station table to rhashtable had a bug
> that I found by accident in code review, that hadn't been found as
> rhashtable apparently managed to have a maximum hash chain length
> of one (!) in all our testing.

This is the desired chain length ;-)

> In order to test the bug and verify the fix I set my rhashtable's
> max_size very low (4) in order to force getting hash collisions.
>
> At that point, rhashtable WARNed in rhashtable_insert_rehash() but
> didn't actually reject the hash table insertion. This caused it to
> lose insertions - my master list of stations would have 9 entries,
> but the rhashtable only had 5. This may warrant a deeper look, but
> that WARN_ON() just shouldn't happen.

The warning got fixed recently (51bb8e331b) and
rhashtable_insert_rehash() now only allows a single rehash if at
max_size already. It will now return -EBUSY.

Insertions may still fail while the table is above 100% utilization,
so this fix is absolutely needed nonetheless.

> Fix this by not returning true from rht_grow_above_100() when the
> rhashtable's max_size has been reached - in this case the user is
> explicitly configuring it to be at most that big, so even if it's
> now above 100% it shouldn't attempt to resize.

Good catch. I wonder whether we want to trigger a rehash at a periodic
interval in this situation or just leave it up to the user to set up
a timer.

> This fixes the lost insertion issue and consequently allows my
> code to display its error (and verify my fix for it.)
>
> Signed-off-by: Johannes Berg johannes.b...@intel.com

Acked-by: Thomas Graf tg...@suug.ch
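
As a rough userspace sketch of the check described in the quoted commit
message: report "grow" only while the table is over 100% utilization and has
not yet reached its configured maximum. struct toy_table and
toy_grow_above_100() are hypothetical names for this example and do not
reproduce the kernel's structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the table state. */
struct toy_table {
	size_t nelems;   /* number of stored elements */
	size_t size;     /* current number of buckets */
	size_t max_size; /* user-configured bucket limit, 0 = unlimited */
};

/* True only if the table is above 100% utilization AND may still grow. */
static bool toy_grow_above_100(const struct toy_table *t)
{
	assert(t != NULL && t->size > 0);

	return t->nelems > t->size &&
	       (t->max_size == 0 || t->size < t->max_size);
}
```

With max_size set to 4 and 9 elements stored, as in the test setup described
above, the sketch returns false, so no resize would be attempted.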

Re: [RFC 1/3] tc: fix return values of ingress qdisc

2015-04-23 Thread Thomas Graf

On 04/22/15 at 04:29pm, Cong Wang wrote:
> On Wed, Apr 22, 2015 at 3:04 PM, Alexei Starovoitov a...@plumgrid.com wrote:
> > On 4/21/15 9:59 PM, Cong Wang wrote:
> >>
> >> On Tue, Apr 21, 2015 at 12:27 PM, Alexei Starovoitov a...@plumgrid.com
> >> wrote:
> >>>
> >>> ingress qdisc should return NET_XMIT_* values just like all other qdiscs.
> >>
> >>
> >> XMIT already means egress...
> >
> >
> > may be then it should be renamed as well.
> > from include/linux/netdevice.h:
> > /* qdisc ->enqueue() return codes. */
> > #define NET_XMIT_SUCCESS        0x00
> > ...
> >
> > the point is that qdisc->enqeue() must return NET_XMIT_* values.
> > ingress qdisc is violating this and therefore should be fixed.
>
> XMIT is non-sense for ingress, you really need to pick another
> name for it if TC_ACT_OK isn't okay for you (it is okay for me).

You transmit into a qdisc. If that terminology doesn't suit
you then rename it to NET_QUEUE_* but moving away from returning
TC_ACT_* is definitely the right thing to do here.
 

Re: [RFC 3/3] tc: cleanup tc_classify

2015-04-23 Thread Thomas Graf

On 04/22/15 at 04:38pm, Cong Wang wrote:
> On Wed, Apr 22, 2015 at 3:27 PM, Alexei Starovoitov a...@plumgrid.com wrote:
> > On 4/21/15 10:05 PM, Cong Wang wrote:
> >>
> >> On Tue, Apr 21, 2015 at 12:27 PM, Alexei Starovoitov a...@plumgrid.com
> >> wrote:
> >>>
> >>> introduce tc_classify_act() and qdisc_drop_bypass() helper functions to
> >>> reduce
> >>> copy-paste among different qdiscs

I like this cleanup. It aligns all skb dropping in qdiscs to a
qdisc_drop*() function.

> >> I don't think qdisc_drop_bypass() is more readable than without it,
> >> maybe you need a better name, or just leave the code as it is.
> >
> >
> > what would be a better name? I'm open to suggestions.
>
> My reading for qdisc_drop_bypass() is it bypasses packet
> dropping for some case, apparently doesn't match its definition.
>
> I can't think out a better name therefore I don't think it deserves
> a function, just leave as it is.

Interesting logic ;-)

[PATCH net 2/2] rhashtable: Do not schedule more than one rehash if we can't grow further

2015-04-22 Thread Thomas Graf

The current code only stops inserting rehashes into the chain when
no resizes are scheduled. As long as resizes are scheduled and
insertions happen above the utilization watermark, more and more
rehashes will be scheduled.

This led to a perfect DoS storm with thousands of rehashes
scheduled, which in turn led to thousands of spinlocks being taken
sequentially.

Instead, only allow either a series of resizes or a single rehash.
Drop any further rehashes and return -EBUSY.

Fixes: ccd57b1bd324 (rhashtable: Add immediate rehash during insertion)
Signed-off-by: Thomas Graf tg...@suug.ch
Acked-by: Herbert Xu herb...@gondor.apana.org.au
---
 lib/rhashtable.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index f648cfd..b28df40 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -405,8 +405,8 @@ int rhashtable_insert_rehash(struct rhashtable *ht)
 
 	if (rht_grow_above_75(ht, tbl))
 		size *= 2;
-	/* More than two rehashes (not resizes) detected. */
-	else if (WARN_ON(old_tbl != tbl && old_tbl->size == size))
+	/* Do not schedule more than one rehash */
+	else if (old_tbl != tbl)
 		return -EBUSY;
 
 	new_tbl = bucket_table_alloc(ht, size, GFP_ATOMIC);
-- 
2.3.5
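
The decision made by the patched branch can be sketched in userspace as
follows. This is only an approximation under stated assumptions:
toy_next_table_size() and its parameters are hypothetical, and the
rehash_pending flag stands in for the old_tbl != tbl comparison in the diff.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Decide the size of the next table, or refuse.
 *   cur_size:       buckets in the table being inserted into
 *   nelems:         elements currently stored
 *   rehash_pending: a newer table is already chained (old_tbl != tbl)
 * Returns 0 and stores the size to allocate, or -EBUSY.
 */
static int toy_next_table_size(size_t cur_size, size_t nelems,
			       bool rehash_pending, size_t *new_size)
{
	assert(cur_size > 0 && new_size != NULL);

	if (nelems > cur_size / 4 * 3) {
		/* Above 75% utilization: a resize is always allowed. */
		*new_size = cur_size * 2;
		return 0;
	}

	/* Same-size rehash: allow only one to be outstanding. */
	if (rehash_pending)
		return -EBUSY;

	*new_size = cur_size;
	return 0;
}
```

A series of resizes still goes through because the first branch does not look
at rehash_pending; only repeated same-size rehashes are dropped.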


[PATCH net 1/2] rhashtable: Schedule async resize when sync realloc fails

2015-04-22 Thread Thomas Graf

When rhashtable_insert_rehash() fails with ENOMEM, this indicates that
we can't allocate the necessary memory in the current context but the
limits as set by the user would still allow the table to grow.

Thus attempt an async resize in the background where we can allocate
using GFP_KERNEL which is more likely to succeed. The insertion itself
will still fail to indicate pressure.

This fixes a bug where the table would never continue growing once the
utilization is above 100%.

Fixes: ccd57b1bd324 (rhashtable: Add immediate rehash during insertion)
Signed-off-by: Thomas Graf tg...@suug.ch
---
 lib/rhashtable.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index 4898442..f648cfd 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -410,8 +410,13 @@ int rhashtable_insert_rehash(struct rhashtable *ht)
 		return -EBUSY;
 
 	new_tbl = bucket_table_alloc(ht, size, GFP_ATOMIC);
-	if (new_tbl == NULL)
+	if (new_tbl == NULL) {
+		/* Schedule async resize/rehash to try allocation
+		 * non-atomic context.
+		 */
+		schedule_work(&ht->run_work);
 		return -ENOMEM;
+	}
 
 	err = rhashtable_rehash_attach(ht, tbl, new_tbl);
 	if (err) {
-- 
2.3.5


[PATCH net 0/2 v2] rhashtable rehashing fixes

2015-04-22 Thread Thomas Graf

Some rhashtable rehashing bugs found while testing with the
next rhashtable self-test queued up for the next devel cycle:

https://github.com/tgraf/net-next/commits/rht

v2:
 - Moved schedule_work() call into rhashtable_insert_rehash()

Thomas Graf (2):
  rhashtable: Schedule async resize when sync realloc fails
  rhashtable: Do not schedule more than one rehash if we can't grow
further

 lib/rhashtable.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

-- 
2.3.5


Re: [PATCH net 1/2] rhashtable: Schedule async resize when sync realloc fails

2015-04-22 Thread Thomas Graf

On 04/21/15 at 10:10pm, David Miller wrote:
> From: Herbert Xu herb...@gondor.apana.org.au
> Date: Wed, 22 Apr 2015 08:36:34 +0800
>
> > On Tue, Apr 21, 2015 at 02:55:34PM +0200, Thomas Graf wrote:
> >> When rhashtable_insert_rehash() fails with ENOMEM, this indicates that
> >> we can't allocate the necessary memory in the current context but the
> >> limits as set by the user would still allow to grow.
> >>
> >> Thus attempt an async resize in the background where we can allocate
> >> using GFP_KERNEL which is more likely to succeed. The insertion itself
> >> will still fail to indicate pressure.
> >>
> >> This fixes a bug where the table would never continue growing once the
> >> utilization is above 100%.
> >>
> >> Fixes: ccd57b1bd324 (rhashtable: Add immediate rehash during insertion)
> >> Signed-off-by: Thomas Graf tg...@suug.ch
> >
> > Good catch.  But I think this call should happen in
> > rhashtable_insert_rehash since it's on the slow-path.
>
> Ok, then I expect a respin of this series.

Agreed, respinning.

[PATCH net 2/2] rhashtable: Do not schedule more than one rehash if we can't grow further

2015-04-21 Thread Thomas Graf

The current code only stops inserting rehashes into the chain when
no resizes are scheduled. As long as resizes are scheduled and
insertions happen above the utilization watermark, more and more
rehashes will be scheduled.

This led to a perfect DoS storm with thousands of rehashes
scheduled, which in turn led to thousands of spinlocks being taken
sequentially.

Instead, only allow either a series of resizes or a single rehash.
Drop any further rehashes and return -EBUSY.

Fixes: ccd57b1bd324 (rhashtable: Add immediate rehash during insertion)
Signed-off-by: Thomas Graf tg...@suug.ch
---
 lib/rhashtable.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index 4898442..cb819ed 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -405,8 +405,8 @@ int rhashtable_insert_rehash(struct rhashtable *ht)
 
 	if (rht_grow_above_75(ht, tbl))
 		size *= 2;
-	/* More than two rehashes (not resizes) detected. */
-	else if (WARN_ON(old_tbl != tbl && old_tbl->size == size))
+	/* Do not schedule more than one rehash */
+	else if (old_tbl != tbl)
 		return -EBUSY;
 
 	new_tbl = bucket_table_alloc(ht, size, GFP_ATOMIC);
-- 
2.3.5


[PATCH net 0/2] rhashtable rehashing fixes

2015-04-21 Thread Thomas Graf

Some rhashtable rehashing bugs found while testing with the
next rhashtable self-test queued up for the next devel cycle:

https://github.com/tgraf/net-next/commits/rht

Thomas Graf (2):
  rhashtable: Schedule async resize when sync realloc fails
  rhashtable: Do not schedule more than one rehash if we can't grow
further

 include/linux/rhashtable.h | 2 ++
 lib/rhashtable.c           | 4 ++--
 2 files changed, 4 insertions(+), 2 deletions(-)

-- 
2.3.5


[PATCH net 1/2] rhashtable: Schedule async resize when sync realloc fails

2015-04-21 Thread Thomas Graf

When rhashtable_insert_rehash() fails with ENOMEM, this indicates that
we can't allocate the necessary memory in the current context but the
limits as set by the user would still allow the table to grow.

Thus attempt an async resize in the background where we can allocate
using GFP_KERNEL which is more likely to succeed. The insertion itself
will still fail to indicate pressure.

This fixes a bug where the table would never continue growing once the
utilization is above 100%.

Fixes: ccd57b1bd324 (rhashtable: Add immediate rehash during insertion)
Signed-off-by: Thomas Graf tg...@suug.ch
---
 include/linux/rhashtable.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index e23d242..7040b5c 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -593,6 +593,8 @@ slow_path:
 		spin_unlock_bh(lock);
 		err = rhashtable_insert_rehash(ht);
 		rcu_read_unlock();
+		if (err == -ENOMEM)
+			schedule_work(&ht->run_work);
 		if (err)
 			return err;
 
-- 
2.3.5


Re: Revert net: Reset secmark when scrubbing packet

2015-04-16 Thread Thomas Graf

On 04/16/15 at 04:12pm, Herbert Xu wrote:
> On Thu, Apr 16, 2015 at 05:02:15PM +1000, James Morris wrote:
> >
> > They don't support namespaces, and maintaining the label is critical for
> > SELinux, at least, which mediates security for the system as a whole.
>
> Thanks for the confirmation James, I thought this looked a bit
> dodgy :)
>
> ---8<---
> This patch reverts commit b8fb4e0648a2ab3734140342002f68fb0c7d1602
> because the secmark must be preserved even when a packet crosses
> namespace boundaries.  The reason is that security labels apply to
> the system as a whole and is not per-namespace.

No objection to reverting, _BUT_ just because security labels
apply to the system as a whole does not mean that both the packet
in the underlay and overlay belong to the same context.

The point here was to not blindly inherit the security context of a
packet based on the outer or inner header. Someone tagging all
packets addressed to the host itself with a SELinux context may not
expect that SELinux context to be preserved into a namespaced tenant.

Re: [v3] skbuff: Do not scrub skb mark within the same name space

2015-04-16 Thread Thomas Graf

On 04/16/15 at 09:03am, Herbert Xu wrote:
> The commit ea23192e8e577dfc51e0f4fc5ca113af334edff9 (tunnels:
> harmonize cleanup done on skb on rx path) broke anyone trying to
> use netfilter marking across IPv4 tunnels.  While most of the
> fields that are cleared by skb_scrub_packet don't matter, the
> netfilter mark must be preserved.
>
> This patch rearranges skb_scrub_packet to preserve the mark field.
>
> Fixes: ea23192e8e57 (tunnels: harmonize cleanup done on skb on rx path)
> Signed-off-by: Herbert Xu herb...@gondor.apana.org.au

Acked-by: Thomas Graf tg...@suug.ch

We should also add a flag to veth which explicitly allows preserving
the mark into the namespace.

[RFC] ethtool netlink interface

2008-02-25 Thread Thomas Graf

Hello,

Before I continue to finish this work I'd like to get a few comments
on my implementation attempt.

The following patch implements the ETHTOOL_SSET and ETHTOOL_GSET
commands via netlink. The individual commands are implemented as
separate functions and hooked into a table holding a validate,
set and fill function for each command. Additionally, an entry must
be made in the attribute policy to validate attributes when received.

Each ethtool command bundle is stored as a nested attribute in
the regular link netlink message, therefore, unlike the ioctl
interface, multiple ethtool commands can be issued in the same
message allowing for links to be fully configured with a single
message.

There is one big disadvantage: due to the nature of ioctl it is
basically not possible to share any code between the ioctl and
netlink implementations. This implies duplicating code unless we
want to do the same hack as the fib frontend and construct
netlink messages inside the kernel.

Index: net-2.6.26/include/linux/if_link.h
===
--- net-2.6.26.orig/include/linux/if_link.h 2008-02-22 14:13:22.0 +0100
+++ net-2.6.26/include/linux/if_link.h  2008-02-22 14:40:24.0 +0100
@@ -79,6 +79,7 @@
 	IFLA_LINKINFO,
 #define IFLA_LINKINFO IFLA_LINKINFO
 	IFLA_NET_NS_PID,
+	IFLA_ETHTOOL,
 	__IFLA_MAX
 };
 
Index: net-2.6.26/net/core/ethtool.c
===
--- net-2.6.26.orig/net/core/ethtool.c  2008-02-22 14:13:22.0 +0100
+++ net-2.6.26/net/core/ethtool.c   2008-02-25 13:51:23.0 +0100
@@ -18,6 +18,7 @@
 #include <linux/ethtool.h>
 #include <linux/netdevice.h>
 #include <asm/uaccess.h>
+#include <net/rtnetlink.h>
 
 /*
  * Some useful ethtool_ops methods that're device independent.
@@ -977,6 +978,136 @@
 	return rc;
 }
 
+static int validate_settings(struct net_device *dev, struct nlattr *attr)
+{
+	if (!dev->ethtool_ops->get_settings)
+		return -EOPNOTSUPP;
+
+	return 0;
+}
+
+static int set_settings(struct net_device *dev, struct nlattr *attr)
+{
+	return dev->ethtool_ops->set_settings(dev, nla_data(attr));
+}
+
+static int fill_settings(struct sk_buff *skb, struct net_device *dev)
+{
+	const struct ethtool_ops *ops = dev->ethtool_ops;
+	struct ethtool_cmd cmd = { ETHTOOL_GSET };
+	int err;
+
+	if (!ops->get_settings)
+		return 0;
+
+	if ((err = ops->get_settings(dev, &cmd)) < 0)
+		return err;
+
+	return nla_put(skb, IFLA_ET_SETTINGS, sizeof(cmd), &cmd);
+}
+
+static struct {
+	int (*validate)(struct net_device *, struct nlattr *);
+	int (*exec)(struct net_device *, struct nlattr *);
+	int (*fill)(struct sk_buff *, struct net_device *);
+} nlops[IFLA_ET_MAX+1] = {
+	[IFLA_ET_SETTINGS] = { .validate = validate_settings,
+			       .exec = set_settings,
+			       .fill = fill_settings, },
+};
+
+static const struct nla_policy ethtool_policy[IFLA_ET_MAX+1] = {
+	[IFLA_ET_SETTINGS]	= { .len = sizeof(struct ethtool_cmd) },
+};
+
+int ethtool_validate_nlattr(struct net_device *dev, struct nlattr *cfg)
+{
+	const struct ethtool_ops *ops;
+	struct nlattr *attr;
+	int err, remaining = 0;
+
+	if (!capable(CAP_NET_ADMIN))
+		return -EPERM;
+
+	if (!netif_device_present(dev))
+		return -ENODEV;
+
+	if (!(ops = dev->ethtool_ops))
+		return -EOPNOTSUPP;
+
+	if ((err = nla_validate_nested(cfg, IFLA_ET_MAX, ethtool_policy)) < 0)
+		goto errout;
+
+	nla_for_each_nested(attr, cfg, remaining) {
+		if (nlops[attr->nla_type].validate) {
+			err = nlops[attr->nla_type].validate(dev, attr);
+			if (err < 0)
+				goto errout;
+		}
+	}
+
+errout:
+	return err;
+}
+
+int ethtool_execute_nlattr(struct net_device *dev, struct nlattr *et_attr)
+{
+	const struct ethtool_ops *ops = dev->ethtool_ops;
+	struct nlattr *attr;
+	unsigned long old_features;
+	int err, remaining = 0;
+
+	if (ops->begin && (err = ops->begin(dev)) < 0)
+		return err;
+
+	old_features = dev->features;
+
+	nla_for_each_nested(attr, et_attr, remaining) {
+		if (nlops[attr->nla_type].exec) {
+			if ((err = nlops[attr->nla_type].exec(dev, attr)) < 0)
+				goto errout;
+		}
+	}
+
+	err = 0;
+errout:
+	if (ops->complete)
+		ops->complete(dev);
+
+	if (old_features != dev->features)
+		netdev_features_change(dev);
+
+	return err;
+}
+
+int ethtool_fill_nlattr(struct sk_buff *skb, struct net_device *dev)
+{
+	struct nlattr *attr;
+	int nfilled = 0, i, err = -EMSGSIZE;
+
+

Re: [RFC] ethtool netlink interface

2008-02-25 Thread Thomas Graf

* Jeff Garzik [EMAIL PROTECTED] 2008-02-25 12:30
> However, I would think it inconsistent to only do SSET/GSET.  If others
> are OK with this patch, are you open to implementing the full set of
> ethtool operations?

Of course, I would also provide a documented userspace API within libnl.

[RTNL]: Validate hardware and broadcast address attribute for RTM_NEWLINK

2008-02-22 Thread Thomas Graf

RTM_NEWLINK allows for already existing links to be modified. For this
purpose do_setlink() is called which expects address attributes with a
payload length of at least dev->addr_len. This patch adds the necessary
validation for the RTM_NEWLINK case.

The address length for links to be created is not checked for now as the
actual attribute length is used when copying the address to the netdevice
structure. It might make sense to report an error if less than addr_len
bytes are provided but enforcing this might break drivers trying to be
smart with not transmitting all zero addresses.

Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: net-2.6.26/net/core/rtnetlink.c
===
--- net-2.6.26.orig/net/core/rtnetlink.c    2008-02-22 01:50:53.0 +0100
+++ net-2.6.26/net/core/rtnetlink.c 2008-02-22 11:28:59.0 +0100
@@ -726,6 +726,21 @@
 	return net;
 }
 
+static int validate_linkmsg(struct net_device *dev, struct nlattr *tb[])
+{
+	if (dev) {
+		if (tb[IFLA_ADDRESS] &&
+		    nla_len(tb[IFLA_ADDRESS]) < dev->addr_len)
+			return -EINVAL;
+
+		if (tb[IFLA_BROADCAST] &&
+		    nla_len(tb[IFLA_BROADCAST]) < dev->addr_len)
+			return -EINVAL;
+	}
+
+	return 0;
+}
+
 static int do_setlink(struct net_device *dev, struct ifinfomsg *ifm,
 		      struct nlattr **tb, char *ifname, int modified)
 {
@@ -910,12 +925,7 @@
 		goto errout;
 	}
 
-	if (tb[IFLA_ADDRESS] &&
-	    nla_len(tb[IFLA_ADDRESS]) < dev->addr_len)
-		goto errout_dev;
-
-	if (tb[IFLA_BROADCAST] &&
-	    nla_len(tb[IFLA_BROADCAST]) < dev->addr_len)
+	if ((err = validate_linkmsg(dev, tb)) < 0)
 		goto errout_dev;
 
 	err = do_setlink(dev, ifm, tb, ifname, 0);
@@ -1036,6 +1046,9 @@
 	else
 		dev = NULL;
 
+	if ((err = validate_linkmsg(dev, tb)) < 0)
+		return err;
+
 	if (tb[IFLA_LINKINFO]) {
 		err = nla_parse_nested(linkinfo, IFLA_INFO_MAX,
 				       tb[IFLA_LINKINFO], ifla_info_policy);
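
Stripped of the kernel types, the length check performed by
validate_linkmsg() amounts to the following userspace sketch. struct toy_attr
and toy_validate_addr() are hypothetical stand-ins for struct nlattr and the
nla_len() comparison; an absent attribute is modelled as a NULL pointer.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a parsed netlink attribute. */
struct toy_attr {
	const void *data; /* attribute payload */
	size_t len;       /* payload length in bytes */
};

/*
 * An absent attribute is fine. A present one must carry at least
 * addr_len bytes, because the consumer copies that many.
 */
static int toy_validate_addr(const struct toy_attr *attr, size_t addr_len)
{
	if (attr && attr->len < addr_len)
		return -EINVAL;

	return 0;
}
```

A 4-byte payload against a 6-byte device address length is rejected, while a
missing attribute passes, matching the "only validate what was supplied"
behaviour of the patch.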

[RTNL]: Add missing link netlink attribute policy definitions

2008-02-19 Thread Thomas Graf

IFLA_LINK is no longer a write-only attribute on the kernel
side and must thus be validated. Same goes for the newly
introduced IFLA_LINKINFO.

Fixes undefined behaviour if either of the attributes is
not well formed.

Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: net-2.6.26/net/core/rtnetlink.c
===
--- net-2.6.26.orig/net/core/rtnetlink.c    2008-02-19 20:30:08.0 +0100
+++ net-2.6.26/net/core/rtnetlink.c 2008-02-20 00:39:54.0 +0100
@@ -693,10 +693,12 @@
 	[IFLA_BROADCAST]	= { .type = NLA_BINARY, .len = MAX_ADDR_LEN },
 	[IFLA_MAP]		= { .len = sizeof(struct rtnl_link_ifmap) },
 	[IFLA_MTU]		= { .type = NLA_U32 },
+	[IFLA_LINK]		= { .type = NLA_U32 },
 	[IFLA_TXQLEN]		= { .type = NLA_U32 },
 	[IFLA_WEIGHT]		= { .type = NLA_U32 },
 	[IFLA_OPERSTATE]	= { .type = NLA_U8 },
 	[IFLA_LINKMODE]		= { .type = NLA_U8 },
+	[IFLA_LINKINFO]		= { .type = NLA_NESTED },
 	[IFLA_NET_NS_PID]	= { .type = NLA_U32 },
 };
 

Re: update frequency for stats in /proc/net/dev

2007-12-18 Thread Thomas Graf

* Mark Seger [EMAIL PROTECTED] 2007-12-18 08:37
> Anyhow, I just wanted to let people know that ALL tools that monitor
> once a second on older counters will get the wrong numbers and tools
> that correct for the wrong number by using fractional intervals (and I
> suspect mine is the only one that does) but run on newer kernels will
> also get the wrong numbers.  In any event, if anyone is interested in
> trying out collectl - it monitors a  LOT more than just networks - you
> can snag a copy of from http://collectl.sourceforge.net/ if you'd like
> to take if for a drive.  The website has a lot of output examples to
> give you a better idea what it can do.  I even included a writeup about
> the odd network performance observations at
> http://collectl.sourceforge.net/NetworkStats.html

I've solved this problem by using netlink to read the interface counters
ten times per second and maintaining my own counter, from which I calculate
the rate exactly once per second/minute/hour. The rate per second may
still be inaccurate to some degree, so I keep a history of 2-5
rates and take them into account to smooth the result. This works
fairly well with _all_ operating systems.
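
A minimal userspace sketch of that scheme, under stated assumptions: the raw
interface counter is taken to be 32 bits wide, the history depth is fixed at
4, and all names (counter_acc, rate_est and their helpers) are invented for
this example. Reading the counters via netlink is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RATE_HISTORY 4 /* assumed depth; anything from 2 to 5 would do */

/* Own 64-bit counter fed from frequent samples of a 32-bit raw counter. */
struct counter_acc {
	uint32_t last_raw;
	uint64_t total;
};

static void acc_init(struct counter_acc *a, uint32_t raw)
{
	a->last_raw = raw;
	a->total = 0;
}

/* Call several times per second; unsigned subtraction absorbs one wrap. */
static void acc_sample(struct counter_acc *a, uint32_t raw)
{
	a->total += (uint32_t)(raw - a->last_raw);
	a->last_raw = raw;
}

/* Per-second rate estimator smoothed over the last few intervals. */
struct rate_est {
	uint64_t last;               /* own counter at the last boundary */
	uint64_t hist[RATE_HISTORY]; /* most recent per-interval deltas */
	size_t n;                    /* valid entries in hist */
	size_t next;                 /* slot to overwrite next */
};

static void rate_init(struct rate_est *e, uint64_t counter)
{
	*e = (struct rate_est){ .last = counter };
}

/* Call exactly once per interval; returns the averaged rate. */
static uint64_t rate_update(struct rate_est *e, uint64_t counter)
{
	uint64_t sum = 0;
	size_t i;

	assert(e != NULL);

	e->hist[e->next] = counter - e->last;
	e->last = counter;
	e->next = (e->next + 1) % RATE_HISTORY;
	if (e->n < RATE_HISTORY)
		e->n++;

	for (i = 0; i < e->n; i++)
		sum += e->hist[i];

	return sum / e->n;
}
```

Sampling much faster than the rate is computed keeps the own counter correct
across wraps of the raw counter, and the short history damps the jitter of
any single interval.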

Re: ip neigh show not showing arp cache entries?

2007-12-17 Thread Thomas Graf

* Patrick McHardy [EMAIL PROTECTED] 2007-12-18 00:51
> Chris Friesen wrote:
> > Patrick McHardy wrote:
> >
> >> From a kernel perspective there are only complete dumps, the
> >> filtering is done by iproute. So the fact that it shows them
> >> when querying specifically implies there is a bug in the
> >> iproute neighbour filter. Does it work if you omit all
> >> from the ip neigh show command?
> >
> > Omitting all gives identical results.  It is still missing entries
> > when compared with the output of arp.
>
>
> In that case the easiest way to debug this is probably if you
> add some debugging to ip/ipneigh.c:print_neigh() since I'm
> unable to reproduce this problem. A printf for all the filter
> conditions (= return 0) at the top should do.

Alternatively, you can download libnl and run

NLCB=debug src/nl-neigh-dump brief

and check if the netlink message is sent by the kernel for the
neighbour in question.

Re: libnl - netlink library: Memory leak in address cache?

2007-12-13 Thread Thomas Graf

* Joerg Pommnitz [EMAIL PROTECTED] 2007-12-11 06:52
> I think the leak comes from addr_msg_parser. The newly created address object
> gets added to the cache with nl_cache_add wich takes a reference, so the
> reference in addr_msg_parser should be dropped, e.g. the following patch
> might be correct:

That's correct, thanks for catching this.

[IPv4] ESP: Discard dummy packets introduced in rfc4303

2007-12-10 Thread Thomas Graf

RFC4303 introduces dummy packets with a nexthdr value of 59
to implement traffic confidentiality. Such packets need to
be dropped silently, and no attempt may be made to parse the
payload as it consists of random data.

Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: net-2.6.25/net/ipv4/esp4.c
===
--- net-2.6.25.orig/net/ipv4/esp4.c 2007-12-10 15:57:23.0 +0100
+++ net-2.6.25/net/ipv4/esp4.c  2007-12-10 16:06:10.0 +0100
@@ -9,6 +9,7 @@
 #include <linux/pfkeyv2.h>
 #include <linux/random.h>
 #include <linux/spinlock.h>
+#include <linux/in6.h>
 #include <net/icmp.h>
 #include <net/protocol.h>
 #include <net/udp.h>
@@ -233,6 +234,10 @@
 
 	/* ... check padding bits here. Silly. :-) */
 
+	/* RFC4303: Drop dummy packets without any error */
+	if (nexthdr[1] == IPPROTO_NONE)
+		goto out;
+
 	iph = ip_hdr(skb);
 	ihl = iph->ihl * 4;
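
Outside the kernel, the dummy-packet test boils down to looking at the last
byte of the ESP trailer. The sketch below assumes a buffer holding the
decrypted payload with the integrity check value already stripped, so that
the final two bytes are the pad length and the next header field;
toy_esp_is_dummy() and TOY_IPPROTO_NONE are names made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_IPPROTO_NONE 59 /* "no next header", same value as IPPROTO_NONE */

/*
 * buf/len: decrypted ESP payload without the ICV.
 * Returns 1 if this is a dummy packet that should be dropped silently,
 * 0 if it carries a real next header, -1 if it is too short to tell.
 */
static int toy_esp_is_dummy(const uint8_t *buf, size_t len)
{
	assert(buf != NULL);

	if (len < 2)
		return -1;

	/* buf[len - 2] is the pad length, buf[len - 1] the next header. */
	return buf[len - 1] == TOY_IPPROTO_NONE;
}
```

The check runs before any attempt to interpret the payload, which is the
ordering the patch enforces as well.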
 

[IPv6] ESP: Discard dummy packets introduced in rfc4303

2007-12-10 Thread Thomas Graf

RFC4303 introduces dummy packets with a nexthdr value of 59
to implement traffic confidentiality. Such packets need to
be dropped silently, and no attempt may be made to parse the
payload as it consists of random data.

Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: net-2.6.25/net/ipv6/esp6.c
===
--- net-2.6.25.orig/net/ipv6/esp6.c 2007-12-10 16:06:02.0 +0100
+++ net-2.6.25/net/ipv6/esp6.c  2007-12-10 16:08:02.0 +0100
@@ -238,6 +238,12 @@
 		}
 		/* ... check padding bits here. Silly. :-) */
 
+		/* RFC4303: Drop dummy packets without any error */
+		if (nexthdr[1] == IPPROTO_NONE) {
+			ret = -EINVAL;
+			goto out;
+		}
+
 		pskb_trim(skb, skb->len - alen - padlen - 2);
 		ret = nexthdr[1];
 	}

Re: Regression in current git - Network Manager fails (bisected)

2007-10-25 Thread Thomas Graf

* Dan Williams [EMAIL PROTECTED] 2007-10-23 10:10
 Should I make NM disable ACKs for now until it gets fixed?

The reason libnl enables ACKs by default is to give the
application using it clear synchronisation points. For
change requests that means the interface function won't
return until the change has been committed as it will
call nl_wait_for_ack(). So if you disable it in NM and
run it on old kernels still using async netlink, you
won't know when the change actually takes effect, which
might break things if you rely on that ordering.

I think providing an invalid message handler which returns
NL_OK if nlmsg_type is NLMSG_DONE, or NLMSG_ERROR && err == 0,
would be better if you need some kind of workaround. As
those messages are always last, this should never cause
real trouble.
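
The decision such a handler has to make could look roughly like this. It
is a self-contained sketch, not libnl code: the demo_ types and constants
are stand-ins that mirror NLMSG_ERROR, NLMSG_DONE and the NL_OK/NL_STOP
callback actions, and returning "stop" for everything else is an
assumption of mine:

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins mirroring values from <linux/netlink.h> and libnl. */
#define DEMO_NLMSG_ERROR 2
#define DEMO_NLMSG_DONE  3

enum demo_cb_action { DEMO_NL_OK, DEMO_NL_SKIP, DEMO_NL_STOP };

/* Same layout as struct nlmsghdr. */
struct demo_nlmsghdr {
    uint32_t nlmsg_len;
    uint16_t nlmsg_type;
    uint16_t nlmsg_flags;
    uint32_t nlmsg_seq;
    uint32_t nlmsg_pid;
};

/*
 * Decide what to do with a message the library flagged as invalid.
 * err is the error code carried by an NLMSG_ERROR message (0 means
 * plain ACK); it is ignored for other message types.
 */
static enum demo_cb_action
demo_invalid_msg_handler(const struct demo_nlmsghdr *hdr, int err)
{
    if (hdr->nlmsg_type == DEMO_NLMSG_DONE)
        return DEMO_NL_OK;      /* end of dump, harmless */
    if (hdr->nlmsg_type == DEMO_NLMSG_ERROR && err == 0)
        return DEMO_NL_OK;      /* stray ACK, harmless */
    return DEMO_NL_STOP;        /* anything else is a real problem */
}
```

In a real application the function would be registered as the invalid
message callback and would read err from the NLMSG_ERROR payload.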

Re: Regression in current git - Network Manager fails (bisected)

2007-10-23 Thread Thomas Graf

* Dan Williams [EMAIL PROTECTED] 2007-10-22 11:57
 On Mon, 2007-10-22 at 13:22 +0400, Denis V. Lunev wrote:
  We have spent some time with the problem with Alexey and there are no 
  guesses for now.
  
  Is it possible to name exact version of Network Manager and all 
  libraries related + provide us an output of strace with full buffers 
  send/received from netlink. Something like
   strace -v -x -s 32768 nm
 
 NM uses netlink in two places; libnl (from Thomas Graf) and some custom
 code for listening for interface up/down events and wireless events.
 
 It looks like that code comes from libnl's lib/handlers.c where it
 thinks the received message is invalid.
 
 I'm pretty sure the code that checks carrier status of the device isn't
 libnl code; so maybe the error message (which should get fixed of
 course) isn't in the same path as the link detection.
 
 The link detection comes from src/nm-netlink-monitor.c, so maybe we
 should look at debugging there.

The patch introduced a change in semantics because it removed the
special ACK handling after a dump was started.

I will look into this.

Re: Regression in current git - Network Manager fails (bisected)

2007-10-23 Thread Thomas Graf

* Denis V. Lunev [EMAIL PROTECTED] 2007-10-23 17:09
 I have reproduced the problem with one-line test.
 ./nl-route-get 192.168.1.1
 The problem is with this message:
 
 -- Debug: Sent Message:
 --   BEGIN NETLINK MESSAGE 
 ---
   [HEADER] 16 octets
 .nlmsg_len = 20
 .nlmsg_type = 18 <route/link>
 .nlmsg_flags = 773 <REQUEST,ACK,ROOT,MATCH>
 .nlmsg_seq = 1193143772
 .nlmsg_pid = 8233
   [PAYLOAD] 16 octets
 00 1d fa 20 00 00 00 00 81 0e 02 00 00 00 00 00   ... 
 ---  END NETLINK MESSAGE 
 ---
 it starts dump and requests ACK.

libnl sets the ACK bit for all requests unless the application
disables this behaviour.
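
For reference, the flags value 773 in the quoted dump decodes as follows.
This is a small standalone sketch; the DEMO_ constants copy the NLM_F_*
values from <linux/netlink.h> and demo_is_dump_with_ack() is a
hypothetical helper:

```c
#include <stdint.h>

/* Flag values as defined in <linux/netlink.h>. */
#define DEMO_NLM_F_REQUEST 0x001
#define DEMO_NLM_F_ACK     0x004
#define DEMO_NLM_F_ROOT    0x100
#define DEMO_NLM_F_MATCH   0x200
#define DEMO_NLM_F_DUMP    (DEMO_NLM_F_ROOT | DEMO_NLM_F_MATCH)

/* Return 1 if a request asks for a dump and for an ACK at the same time. */
static int demo_is_dump_with_ack(uint16_t flags)
{
    return (flags & DEMO_NLM_F_DUMP) == DEMO_NLM_F_DUMP &&
           (flags & DEMO_NLM_F_ACK) != 0;
}
```

773 is 0x305, i.e. REQUEST | ACK | ROOT | MATCH, so the helper returns 1
for the message above.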

Re: [PATCH] fix ACK processing after netlink_dump_start

2007-10-23 Thread Thomas Graf

* Denis V. Lunev [EMAIL PROTECTED] 2007-10-23 18:40
 Revert to original netlink behavior. Do not reply with ACK if the
 netlink dump has been successfully started.
 
 libnl has been broken by the cd40b7d3983c708aabe3d3008ec64ffce56d33b0
 The following command reproduces the problem:
    ./nl-route-get 192.168.1.1
 
 Signed-off-by: Denis V. Lunev [EMAIL PROTECTED]

ACK. Thank you for taking care of this.
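
The behaviour being restored can be modelled in a few lines. This is a
simplified illustration of the rule stated in the patch description, not
the kernel implementation; demo_should_send_ack() and its parameters are
hypothetical:

```c
#include <stdint.h>

#define DEMO_NLM_F_ACK 0x004 /* same value as NLM_F_ACK in <linux/netlink.h> */

/*
 * Simplified model of the decision after a request has been processed:
 *  - a failed request always gets an error message;
 *  - a successfully started dump gets no ACK, its data follows instead;
 *  - any other successful request is ACKed only if NLM_F_ACK was set.
 * Returns 1 if an ACK or error message should be sent.
 */
static int demo_should_send_ack(uint16_t flags, int err, int dump_started)
{
    if (err != 0)
        return 1;
    if (dump_started)
        return 0;
    return (flags & DEMO_NLM_F_ACK) != 0;
}
```

With the request quoted earlier in the thread (flags 773, dump started
successfully) the model returns 0, which is what libnl expects.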

Re: [PATCH - net-2.6.24 1/2] Introduce and use print_ip

2007-09-20 Thread Thomas Graf

* Joe Perches [EMAIL PROTECTED] 2007-09-19 23:53
 This removes the uses of NIPQUAD and HIPQUAD in
 drivers/net and net
 
 IPV4 Use:
 
   DECLARE_IP_BUF(ipbuf);
   __be32 addr;
   print_ip(ipbuf, addr)
 
 Signed-off-by:  Joe Perches [EMAIL PROTECTED]
 
 please pull from:
 git pull http://repo.or.cz/r/linux-2.6/trivial-mods.git print_ipv4

Including a patch for review would be helpful.

Re: [PATCH - net-2.6.24 0/2] Introduce and use print_ip and print_ipv6

2007-09-20 Thread Thomas Graf

* Joe Perches [EMAIL PROTECTED] 2007-09-19 23:53
 In the same vein as print_mac, the implementations
 introduce declaration macros:
   DECLARE_IP_BUF(var)
   DECLARE_IPV6_BUF(var)
 and functions:
   print_ip
   print_ipv6
   print_ipv6_nofmt
 
 IPV4 Use:
 
   DECLARE_IP_BUF(ipbuf);
   __be32 addr;
   print_ip(ipbuf, addr);
 
 IPV6 use:
 
   DECLARE_IPV6_BUF(ipv6buf);
   const struct in6_addr *addr;
   print_ipv6(ipv6buf, addr);
 and
   print_ipv6_nofmt(ipv6buf, addr);
 
 compiled x86, defconfig and allyesconfig

What exactly is the advantage of this?

[NETLINK]: Introduce nested and byteorder flag to netlink attribute

2007-09-12 Thread Thomas Graf

This change allows the generic attribute interface to be used within
the netfilter subsystem where this flag was initially introduced.

The byte-order flag is as yet unused; its intended use is to
allow automatic byte order conversions for all atomic types.

Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: net-2.6.24/include/linux/netlink.h
===
--- net-2.6.24.orig/include/linux/netlink.h 2007-09-12 13:29:49.0 +0200
+++ net-2.6.24/include/linux/netlink.h  2007-09-12 13:59:41.0 +0200
@@ -129,6 +129,20 @@
    __u16   nla_type;
 };
 
+/*
+ * nla_type (16 bits)
+ * +---+---+-------------------------------+
+ * | N | O | Attribute Type                |
+ * +---+---+-------------------------------+
+ * N := Carries nested attributes
+ * O := Payload stored in network byte order
+ *
+ * Note: The N and O flag are mutually exclusive.
+ */
+#define NLA_F_NESTED           (1 << 15)
+#define NLA_F_NET_BYTEORDER    (1 << 14)
+#define NLA_TYPE_MASK          ~(NLA_F_NESTED | NLA_F_NET_BYTEORDER)
+
 #define NLA_ALIGNTO            4
 #define NLA_ALIGN(len)         (((len) + NLA_ALIGNTO - 1) & ~(NLA_ALIGNTO - 1))
 #define NLA_HDRLEN             ((int) NLA_ALIGN(sizeof(struct nlattr)))
Index: net-2.6.24/include/net/netlink.h
===
--- net-2.6.24.orig/include/net/netlink.h   2007-09-12 13:29:50.0 +0200
+++ net-2.6.24/include/net/netlink.h    2007-09-12 14:17:56.0 +0200
@@ -667,6 +667,15 @@
 }
 
 /**
+ * nla_type - attribute type
+ * @nla: netlink attribute
+ */
+static inline int nla_type(const struct nlattr *nla)
+{
+   return nla->nla_type & NLA_TYPE_MASK;
+}
+
+/**
  * nla_data - head of payload
  * @nla: netlink attribute
  */
Index: net-2.6.24/net/ipv4/fib_frontend.c
===
--- net-2.6.24.orig/net/ipv4/fib_frontend.c 2007-09-12 13:29:51.0 +0200
+++ net-2.6.24/net/ipv4/fib_frontend.c  2007-09-12 13:59:41.0 +0200
@@ -487,7 +487,7 @@
    }
 
    nlmsg_for_each_attr(attr, nlh, sizeof(struct rtmsg), remaining) {
-       switch (attr->nla_type) {
+       switch (nla_type(attr)) {
        case RTA_DST:
            cfg->fc_dst = nla_get_be32(attr);
            break;
Index: net-2.6.24/net/ipv4/fib_semantics.c
===
--- net-2.6.24.orig/net/ipv4/fib_semantics.c    2007-09-12 13:29:51.0 +0200
+++ net-2.6.24/net/ipv4/fib_semantics.c 2007-09-12 13:59:41.0 +0200
@@ -743,7 +743,7 @@
    int remaining;
 
    nla_for_each_attr(nla, cfg->fc_mx, cfg->fc_mx_len, remaining) {
-       int type = nla->nla_type;
+       int type = nla_type(nla);
 
        if (type) {
            if (type > RTAX_MAX)
Index: net-2.6.24/net/ipv6/route.c
===
--- net-2.6.24.orig/net/ipv6/route.c2007-09-12 13:29:51.0 +0200
+++ net-2.6.24/net/ipv6/route.c 2007-09-12 13:59:41.0 +0200
@@ -1278,7 +1278,7 @@
    int remaining;
 
    nla_for_each_attr(nla, cfg->fc_mx, cfg->fc_mx_len, remaining) {
-       int type = nla->nla_type;
+       int type = nla_type(nla);
 
        if (type) {
            if (type > RTAX_MAX) {
Index: net-2.6.24/net/netlabel/netlabel_cipso_v4.c
===
--- net-2.6.24.orig/net/netlabel/netlabel_cipso_v4.c    2007-09-12 13:29:51.0 +0200
+++ net-2.6.24/net/netlabel/netlabel_cipso_v4.c 2007-09-12 13:59:41.0 +0200
@@ -130,7 +130,7 @@
        return -EINVAL;
 
    nla_for_each_nested(nla, info->attrs[NLBL_CIPSOV4_A_TAGLST], nla_rem)
-       if (nla->nla_type == NLBL_CIPSOV4_A_TAG) {
+       if (nla_type(nla) == NLBL_CIPSOV4_A_TAG) {
            if (iter >= CIPSO_V4_TAG_MAXCNT)
                return -EINVAL;
            doi_def->tags[iter++] = nla_get_u8(nla);
@@ -192,13 +192,13 @@
    nla_for_each_nested(nla_a,
                info->attrs[NLBL_CIPSOV4_A_MLSLVLLST],
                nla_a_rem)
-       if (nla_a->nla_type == NLBL_CIPSOV4_A_MLSLVL) {
+       if (nla_type(nla_a) == NLBL_CIPSOV4_A_MLSLVL) {
            if (nla_validate_nested(nla_a,
                        NLBL_CIPSOV4_A_MAX,
                        netlbl_cipsov4_genl_policy) != 0)
                    goto add_std_failure;
            nla_for_each_nested(nla_b, nla_a, nla_b_rem)
-               switch (nla_b->nla_type) {
+               switch (nla_type
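
To see what the new accessor does, here is a standalone userspace sketch
of the masking introduced by this patch. The demo_ names are stand-ins
for the kernel definitions so the snippet compiles on its own, and
demo_nla_is_nested() is an extra helper added only for illustration:

```c
#include <stdint.h>

#define DEMO_NLA_F_NESTED        (1 << 15)
#define DEMO_NLA_F_NET_BYTEORDER (1 << 14)
#define DEMO_NLA_TYPE_MASK       (~(DEMO_NLA_F_NESTED | DEMO_NLA_F_NET_BYTEORDER))

/* Same layout as struct nlattr. */
struct demo_nlattr {
    uint16_t nla_len;
    uint16_t nla_type;
};

/* Attribute type with the two flag bits stripped. */
static int demo_nla_type(const struct demo_nlattr *nla)
{
    return nla->nla_type & DEMO_NLA_TYPE_MASK;
}

/* 1 if the attribute is marked as carrying nested attributes. */
static int demo_nla_is_nested(const struct demo_nlattr *nla)
{
    return (nla->nla_type & DEMO_NLA_F_NESTED) != 0;
}
```

Code that switches on the raw nla_type field would stop matching as soon
as a sender sets one of the flag bits, which is why the callers in the
diff are converted to the accessor.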

Re: [PATCH 1/1] ipv6: corrects sended rtnetlink message

2007-09-12 Thread Thomas Graf

* Milan Kocian [EMAIL PROTECTED] 2007-09-12 16:50
 However I still think that this notitfication is redundant. I tried to look
 at XORP, bird, USAGI , quagga and to see RTM_DELLINK handling. And imho
 nobody depends on RTM_DELLINK message from ipv6.

Send a patch to remove it and we'll see if anyone complains.

Re: [PATCH] devinet: show all addresses assigned to interface

2007-09-06 Thread Thomas Graf

* Stephen Hemminger [EMAIL PROTECTED] 2007-09-06 16:10
 Bug: http://bugzilla.kernel.org/show_bug.cgi?id=8876
 
 Not all ips are shown by ip addr show command when IPs number assigned to an
 interface is more than 60-80 (in fact it depends on broadcast/label etc
 presence on each address).

The more attributes are assigned to an address, the sooner the netlink
message will be full.

 Steps to reproduce:
 It's terribly simple to reproduce:
 
 # for i in $(seq 1 100); do ip ad add 10.0.$i.1/24 dev eth10 ; done
 # ip addr show
 
 this will _not_ show all IPs.
 Looks like the problem is in netlink/ipv4 message processing.
 
 This is fix from bug submitter, it looks correct.

The fix is correct.
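
A rough back-of-the-envelope calculation shows why the limit depends on
the attributes. This is a sketch under assumptions: the attribute set
(IFA_ADDRESS and IFA_LOCAL always, IFA_BROADCAST and IFA_LABEL
optionally) and the 4096 byte buffer in the note below are my guesses
for illustration, and all demo_ names are hypothetical:

```c
#include <stddef.h>

#define DEMO_ALIGN4(len)   (((size_t)(len) + 3u) & ~(size_t)3u)
#define DEMO_NLMSG_HDRLEN  16u /* sizeof(struct nlmsghdr) */
#define DEMO_IFADDRMSG_LEN 8u  /* sizeof(struct ifaddrmsg) */
#define DEMO_ATTR_HDRLEN   4u  /* attribute header */

/* Space one attribute occupies: header + payload, padded to 4 bytes. */
static size_t demo_attr_size(size_t payload)
{
    return DEMO_ALIGN4(DEMO_ATTR_HDRLEN + payload);
}

/*
 * Size of one IPv4 address message. label_len is the string length of
 * the label without the terminating NUL; 0 means no label attribute.
 */
static size_t demo_addr_msg_size(int has_broadcast, size_t label_len)
{
    size_t size = DEMO_NLMSG_HDRLEN + DEMO_IFADDRMSG_LEN
                + demo_attr_size(4)     /* IFA_ADDRESS */
                + demo_attr_size(4);    /* IFA_LOCAL */

    if (has_broadcast)
        size += demo_attr_size(4);      /* IFA_BROADCAST */
    if (label_len > 0)
        size += demo_attr_size(label_len + 1);  /* IFA_LABEL */
    return size;
}

/* How many messages of msg_size fit into a buffer of buf_size bytes. */
static size_t demo_addrs_per_buffer(size_t buf_size, size_t msg_size)
{
    return msg_size ? buf_size / msg_size : 0;
}
```

With these assumptions a bare address takes 40 bytes and one with a
broadcast address and the label "eth10" takes 60 bytes, so a 4096 byte
buffer holds 102 or 68 of them respectively.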

Re: some weird corruption in net-2.6.24

2007-09-04 Thread Thomas Graf

* Herbert Xu [EMAIL PROTECTED] 2007-09-04 07:05
 Thomas Graf [EMAIL PROTECTED] wrote:
  
  I've been trying to reproduce this, what happens on my system
  is that when the ISAKMP SA lifetime is exceeded the rekeying
  fails and my connection dies. I can reproduce this back to
  2.6.22 and it doesn't seem related to my recent xfrm_user work.
  It looks like this behaviour is hiding the bug you are seeing.
 
 Could you try extending the ISAKMP SA life time so that it is
 longer than the IPSec SA life time?

Yes, in this case the IPSec SA rekeying works just fine.
I can't spot any signs of corruption or the like.

Re: some weird corruption in net-2.6.24

2007-09-03 Thread Thomas Graf

* David Miller [EMAIL PROTECTED] 2007-08-30 22:39
 
 Every so often some piece of userland dies, and often it's
 bad enough that my desktop session logs out.
 
 I've been trying to find some clues and it seems to happen
 about as often as openswan rekeys my VPN, so one suspect
 area is the netlink cleanups to xfrm_user.
 
 I plan to do some auditing of those changes looking for
 errors, but if someone can beat me to it... :-)

I've been trying to reproduce this, what happens on my system
is that when the ISAKMP SA lifetime is exceeded the rekeying
fails and my connection dies. I can reproduce this back to
2.6.22 and it doesn't seem related to my recent xfrm_user work.
It looks like this behaviour is hiding the bug you are seeing.

[NET] atm: Fix build errors after conversion to pr_debug()

2007-08-27 Thread Thomas Graf

Fixes ancient ATM debug code to at least compile again.

Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: net-2.6.24/net/atm/signaling.c
===
--- net-2.6.24.orig/net/atm/signaling.c 2007-08-27 09:53:40.0 +0200
+++ net-2.6.24/net/atm/signaling.c  2007-08-27 09:55:16.0 +0200
@@ -89,9 +89,9 @@ static int sigd_send(struct atm_vcc *vcc
 
    msg = (struct atmsvc_msg *) skb->data;
    atomic_sub(skb->truesize, &sk_atm(vcc)->sk_wmem_alloc);
-   pr_debug("sigd_send %d (0x%lx)\n",(int) msg->type,
-     (unsigned long) msg->vcc);
    vcc = *(struct atm_vcc **) &msg->vcc;
+   pr_debug("sigd_send %d (0x%lx)\n",(int) msg->type,
+     (unsigned long) vcc);
    sk = sk_atm(vcc);
 
    switch (msg->type) {
Index: net-2.6.24/net/atm/common.c
===
--- net-2.6.24.orig/net/atm/common.c2007-08-27 09:56:06.0 +0200
+++ net-2.6.24/net/atm/common.c 2007-08-27 09:56:16.0 +0200
@@ -497,7 +497,7 @@ int vcc_recvmsg(struct kiocb *iocb, stru
    if (error)
        return error;
    sock_recv_timestamp(msg, sk, skb);
-   pr_debug("RcvM %d -= %d\n", atomic_read(&sk->rmem_alloc), skb->truesize);
+   pr_debug("RcvM %d -= %d\n", atomic_read(&sk->sk_rmem_alloc), skb->truesize);
    atm_return(vcc, skb->truesize);
    skb_free_datagram(sk, skb);
    return copied;
Index: net-2.6.24/net/atm/raw.c
===
--- net-2.6.24.orig/net/atm/raw.c   2007-08-27 09:57:56.0 +0200
+++ net-2.6.24/net/atm/raw.c2007-08-27 09:58:09.0 +0200
@@ -32,8 +32,8 @@ static void atm_pop_raw(struct atm_vcc *
 {
    struct sock *sk = sk_atm(vcc);
 
-   pr_debug("APopR (%d) %d -= %d\n", vcc->vci, sk->sk_wmem_alloc,
-       skb->truesize);
+   pr_debug("APopR (%d) %d -= %d\n", vcc->vci,
+       atomic_read(&sk->sk_wmem_alloc), skb->truesize);
    atomic_sub(skb->truesize, &sk->sk_wmem_alloc);
    dev_kfree_skb_any(skb);
    sk->sk_write_space(sk);
Index: net-2.6.24/net/atm/pppoatm.c
===
--- net-2.6.24.orig/net/atm/pppoatm.c   2007-08-27 10:01:34.0 +0200
+++ net-2.6.24/net/atm/pppoatm.c2007-08-27 10:02:05.0 +0200
@@ -165,9 +165,8 @@ static void pppoatm_push(struct atm_vcc 
            pvcc->chan.mtu += LLC_LEN;
            break;
        }
-       pr_debug("(unit %d): Couldn't autodetect yet "
+       pr_debug("Couldn't autodetect yet "
            "(skb: %02X %02X %02X %02X %02X %02X)\n",
-           pvcc->chan.unit,
            skb->data[0], skb->data[1], skb->data[2],
            skb->data[3], skb->data[4], skb->data[5]);
        goto error;
@@ -195,8 +194,7 @@ static int pppoatm_send(struct ppp_chann
 {
    struct pppoatm_vcc *pvcc = chan_to_pvcc(chan);
    ATM_SKB(skb)->vcc = pvcc->atmvcc;
-   pr_debug("(unit %d): pppoatm_send (skb=0x%p, vcc=0x%p)\n",
-       pvcc->chan.unit, skb, pvcc->atmvcc);
+   pr_debug("pppoatm_send (skb=0x%p, vcc=0x%p)\n", skb, pvcc->atmvcc);
    if (skb->data[0] == '\0' && (pvcc->flags & SC_COMP_PROT))
        (void) skb_pull(skb, 1);
    switch (pvcc->encaps) { /* LLC encapsulation needed */
@@ -221,16 +219,14 @@ static int pppoatm_send(struct ppp_chann
            goto nospace;
        break;
    case e_autodetect:
-       pr_debug("(unit %d): Trying to send without setting encaps!\n",
-           pvcc->chan.unit);
+       pr_debug("Trying to send without setting encaps!\n");
        kfree_skb(skb);
        return 1;
    }
 
    atomic_add(skb->truesize, &sk_atm(ATM_SKB(skb)->vcc)->sk_wmem_alloc);
    ATM_SKB(skb)->atm_options = ATM_SKB(skb)->vcc->atm_options;
-   pr_debug("(unit %d): atm_skb(%p)->vcc(%p)->dev(%p)\n",
-       pvcc->chan.unit, skb, ATM_SKB(skb)->vcc,
+   pr_debug("atm_skb(%p)->vcc(%p)->dev(%p)\n", skb, ATM_SKB(skb)->vcc,
        ATM_SKB(skb)->vcc->dev);
    return ATM_SKB(skb)->vcc->send(ATM_SKB(skb)->vcc, skb)
        ? DROP_PACKET : 1;

[NET] 82596: Add missing parenthesis

2007-08-27 Thread Thomas Graf


Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: net-2.6.24/drivers/net/82596.c
===
--- net-2.6.24.orig/drivers/net/82596.c 2007-08-27 14:43:16.0 +0200
+++ net-2.6.24/drivers/net/82596.c  2007-08-27 14:43:51.0 +0200
@@ -1562,7 +1562,7 @@ static void set_multicast_list(struct ne
            memcpy(cp, dmi->dmi_addr, 6);
            if (i596_debug > 1)
                DEB(DEB_MULTI,printk(KERN_INFO "%s: Adding address " MAC_FMT "\n",
-                   dev->name, MAC_ARG(cp));
+                   dev->name, MAC_ARG(cp)));
        }
        i596_add_cmd(dev, &cmd->cmd);
    }

[XFRM] policy: Replace magic number with XFRM_POLICY_OUT

2007-08-25 Thread Thomas Graf


Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: net-2.6.24/net/xfrm/xfrm_policy.c
===
--- net-2.6.24.orig/net/xfrm/xfrm_policy.c  2007-08-24 13:11:17.0 +0200
+++ net-2.6.24/net/xfrm/xfrm_policy.c   2007-08-24 13:11:48.0 +0200
@@ -1477,7 +1477,7 @@ restart:
    pol_dead = 0;
    xfrm_nr = 0;
 
-   if (sk && sk->sk_policy[1]) {
+   if (sk && sk->sk_policy[XFRM_POLICY_OUT]) {
        policy = xfrm_sk_policy_lookup(sk, XFRM_POLICY_OUT, fl);
        if (IS_ERR(policy))
            return PTR_ERR(policy);

Re: [PATCH] [XFRM] : Fix pointer copy size for encap_tmpl and coaddr.

2007-08-24 Thread Thomas Graf

* Masahide NAKAMURA [EMAIL PROTECTED] 2007-08-24 19:05
 This is minor fix about sizeof argument using with kmemdup().

Thanks for catching this!

Re: net-2.6.24 failure with netconsole

2007-08-22 Thread Thomas Graf

* Andrew Morton [EMAIL PROTECTED] 2007-08-21 22:54
 Which used to be a BUG.  It later oopsed via a null-pointer deref in
 net_rx_action(), which is a much preferable result.

I fixed this already:

Index: net-2.6.24/include/linux/netpoll.h
===
--- net-2.6.24.orig/include/linux/netpoll.h 2007-08-22 01:02:14.0 +0200
+++ net-2.6.24/include/linux/netpoll.h  2007-08-22 01:02:30.0 +0200
@@ -75,7 +75,7 @@ static inline void *netpoll_poll_lock(st
    struct net_device *dev = napi->dev;
 
    rcu_read_lock(); /* deal with race on ->npinfo */
-   if (dev->npinfo) {
+   if (dev && dev->npinfo) {
        spin_lock(&napi->poll_lock);
        napi->poll_owner = smp_processor_id();
        return napi;

[PATCH 00/16] xfrm netlink interface cleanups

2007-08-22 Thread Thomas Graf

This patchset converts the xfrm netlink bits over to the type
safe netlink interface and does some cleanups.

 xfrm_user.c | 1041 
 1 file changed, 433 insertions(+), 608 deletions(-)


[PATCH 01/16] [XFRM] netlink: Use nlmsg_put() instead of NLMSG_PUT()

2007-08-22 Thread Thomas Graf

Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: net-2.6.24/net/xfrm/xfrm_user.c
===
--- net-2.6.24.orig/net/xfrm/xfrm_user.c2007-08-20 17:09:48.0 
+0200
+++ net-2.6.24/net/xfrm/xfrm_user.c 2007-08-21 16:10:34.0 +0200
@@ -588,10 +588,10 @@ static int dump_one_state(struct xfrm_st
if (sp-this_idx  sp-start_idx)
goto out;
 
-   nlh = NLMSG_PUT(skb, NETLINK_CB(in_skb).pid,
-   sp-nlmsg_seq,
-   XFRM_MSG_NEWSA, sizeof(*p));
-   nlh-nlmsg_flags = sp-nlmsg_flags;
+   nlh = nlmsg_put(skb, NETLINK_CB(in_skb).pid, sp-nlmsg_seq,
+   XFRM_MSG_NEWSA, sizeof(*p), sp-nlmsg_flags);
+   if (nlh == NULL)
+   return -EMSGSIZE;
 
p = NLMSG_DATA(nlh);
copy_to_user_state(x, p);
@@ -633,7 +633,6 @@ out:
sp-this_idx++;
return 0;
 
-nlmsg_failure:
 rtattr_failure:
nlmsg_trim(skb, b);
return -1;
@@ -1276,11 +1275,11 @@ static int dump_one_policy(struct xfrm_p
if (sp-this_idx  sp-start_idx)
goto out;
 
-   nlh = NLMSG_PUT(skb, NETLINK_CB(in_skb).pid,
-   sp-nlmsg_seq,
-   XFRM_MSG_NEWPOLICY, sizeof(*p));
+   nlh = nlmsg_put(skb, NETLINK_CB(in_skb).pid, sp-nlmsg_seq,
+   XFRM_MSG_NEWPOLICY, sizeof(*p), sp-nlmsg_flags);
+   if (nlh == NULL)
+   return -EMSGSIZE;
p = NLMSG_DATA(nlh);
-   nlh-nlmsg_flags = sp-nlmsg_flags;
 
copy_to_user_policy(xp, p, dir);
if (copy_to_user_tmpl(xp, skb)  0)
@@ -1449,9 +1448,10 @@ static int build_aevent(struct sk_buff *
struct xfrm_lifetime_cur ltime;
unsigned char *b = skb_tail_pointer(skb);
 
-   nlh = NLMSG_PUT(skb, c-pid, c-seq, XFRM_MSG_NEWAE, sizeof(*id));
+   nlh = nlmsg_put(skb, c-pid, c-seq, XFRM_MSG_NEWAE, sizeof(*id), 0);
+   if (nlh == NULL)
+   return -EMSGSIZE;
id = NLMSG_DATA(nlh);
-   nlh-nlmsg_flags = 0;
 
memcpy(id-sa_id.daddr, x-id.daddr,sizeof(x-id.daddr));
id-sa_id.spi = x-id.spi;
@@ -1483,7 +1483,6 @@ static int build_aevent(struct sk_buff *
return skb-len;
 
 rtattr_failure:
-nlmsg_failure:
nlmsg_trim(skb, b);
return -1;
 }
@@ -1866,9 +1865,10 @@ static int build_migrate(struct sk_buff 
unsigned char *b = skb_tail_pointer(skb);
int i;
 
-   nlh = NLMSG_PUT(skb, 0, 0, XFRM_MSG_MIGRATE, sizeof(*pol_id));
+   nlh = nlmsg_put(skb, 0, 0, XFRM_MSG_MIGRATE, sizeof(*pol_id), 0);
+   if (nlh == NULL)
+   return -EMSGSIZE;
pol_id = NLMSG_DATA(nlh);
-   nlh-nlmsg_flags = 0;
 
/* copy data from selector, dir, and type to the pol_id */
memset(pol_id, 0, sizeof(*pol_id));
@@ -2045,20 +2045,16 @@ static int build_expire(struct sk_buff *
struct nlmsghdr *nlh;
unsigned char *b = skb_tail_pointer(skb);
 
-   nlh = NLMSG_PUT(skb, c-pid, 0, XFRM_MSG_EXPIRE,
-   sizeof(*ue));
+   nlh = nlmsg_put(skb, c-pid, 0, XFRM_MSG_EXPIRE, sizeof(*ue), 0);
+   if (nlh == NULL)
+   return -EMSGSIZE;
ue = NLMSG_DATA(nlh);
-   nlh-nlmsg_flags = 0;
 
copy_to_user_state(x, ue-state);
ue-hard = (c-data.hard != 0) ? 1 : 0;
 
nlh-nlmsg_len = skb_tail_pointer(skb) - b;
return skb-len;
-
-nlmsg_failure:
-   nlmsg_trim(skb, b);
-   return -1;
 }
 
 static int xfrm_exp_state_notify(struct xfrm_state *x, struct km_event *c)
@@ -2108,9 +2104,11 @@ static int xfrm_notify_sa_flush(struct k
return -ENOMEM;
b = skb-tail;
 
-   nlh = NLMSG_PUT(skb, c-pid, c-seq,
-   XFRM_MSG_FLUSHSA, sizeof(*p));
-   nlh-nlmsg_flags = 0;
+   nlh = nlmsg_put(skb, c-pid, c-seq, XFRM_MSG_FLUSHSA, sizeof(*p), 0);
+   if (nlh == NULL) {
+   kfree_skb(skb);
+   return -EMSGSIZE;
+   }
 
p = NLMSG_DATA(nlh);
p-proto = c-data.proto;
@@ -2119,10 +2117,6 @@ static int xfrm_notify_sa_flush(struct k
 
NETLINK_CB(skb).dst_group = XFRMNLGRP_SA;
return netlink_broadcast(xfrm_nl, skb, 0, XFRMNLGRP_SA, GFP_ATOMIC);
-
-nlmsg_failure:
-   kfree_skb(skb);
-   return -1;
 }
 
 static inline int xfrm_sa_len(struct xfrm_state *x)
@@ -2162,8 +2156,9 @@ static int xfrm_notify_sa(struct xfrm_st
return -ENOMEM;
b = skb-tail;
 
-   nlh = NLMSG_PUT(skb, c-pid, c-seq, c-event, headlen);
-   nlh-nlmsg_flags = 0;
+   nlh = nlmsg_put(skb, c-pid, c-seq, c-event, headlen, 0);
+   if (nlh == NULL)
+   goto nlmsg_failure;
 
p = NLMSG_DATA(nlh);
if (c-event == XFRM_MSG_DELSA) {
@@ -2233,10 +2228,10 @@ static int build_acquire(struct sk_buff 
unsigned char *b = skb_tail_pointer(skb);
__u32 seq = xfrm_get_acqseq

[PATCH 09/16] [XFRM] netlink: Use nlmsg_parse() to parse attributes

2007-08-22 Thread Thomas Graf

Uses nlmsg_parse() to parse the attributes. This actually changes
behaviour as unknown attributes (type > MAXTYPE) no longer cause
an error. Instead, unknown attributes will be ignored henceforth
to keep older kernels compatible with more recent userspace tools.

Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: net-2.6.24/net/xfrm/xfrm_user.c
===
--- net-2.6.24.orig/net/xfrm/xfrm_user.c    2007-08-21 17:07:38.0 +0200
+++ net-2.6.24/net/xfrm/xfrm_user.c 2007-08-21 17:31:04.0 +0200
@@ -1890,7 +1890,7 @@ static int xfrm_send_migrate(struct xfrm
 }
 #endif
 
-#define XMSGSIZE(type) NLMSG_LENGTH(sizeof(struct type))
+#define XMSGSIZE(type) sizeof(struct type)
 
 static const int xfrm_msg_min[XFRM_NR_MSGTYPES] = {
[XFRM_MSG_NEWSA   - XFRM_MSG_BASE] = XMSGSIZE(xfrm_usersa_info),
@@ -1906,13 +1906,13 @@ static const int xfrm_msg_min[XFRM_NR_MS
[XFRM_MSG_UPDSA   - XFRM_MSG_BASE] = XMSGSIZE(xfrm_usersa_info),
[XFRM_MSG_POLEXPIRE   - XFRM_MSG_BASE] = XMSGSIZE(xfrm_user_polexpire),
[XFRM_MSG_FLUSHSA - XFRM_MSG_BASE] = XMSGSIZE(xfrm_usersa_flush),
-   [XFRM_MSG_FLUSHPOLICY - XFRM_MSG_BASE] = NLMSG_LENGTH(0),
+   [XFRM_MSG_FLUSHPOLICY - XFRM_MSG_BASE] = 0,
[XFRM_MSG_NEWAE   - XFRM_MSG_BASE] = XMSGSIZE(xfrm_aevent_id),
[XFRM_MSG_GETAE   - XFRM_MSG_BASE] = XMSGSIZE(xfrm_aevent_id),
[XFRM_MSG_REPORT  - XFRM_MSG_BASE] = XMSGSIZE(xfrm_user_report),
[XFRM_MSG_MIGRATE - XFRM_MSG_BASE] = XMSGSIZE(xfrm_userpolicy_id),
-   [XFRM_MSG_GETSADINFO  - XFRM_MSG_BASE] = NLMSG_LENGTH(sizeof(u32)),
-   [XFRM_MSG_GETSPDINFO  - XFRM_MSG_BASE] = NLMSG_LENGTH(sizeof(u32)),
+   [XFRM_MSG_GETSADINFO  - XFRM_MSG_BASE] = sizeof(u32),
+   [XFRM_MSG_GETSPDINFO  - XFRM_MSG_BASE] = sizeof(u32),
 };
 
 #undef XMSGSIZE
@@ -1946,9 +1946,9 @@ static struct xfrm_link {
 
 static int xfrm_user_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
 {
-   struct rtattr *xfrma[XFRMA_MAX];
+   struct nlattr *xfrma[XFRMA_MAX+1];
    struct xfrm_link *link;
-   int type, min_len;
+   int type, err;
 
    type = nlh->nlmsg_type;
    if (type > XFRM_MSG_MAX)
@@ -1970,30 +1970,16 @@ static int xfrm_user_rcv_msg(struct sk_b
        return netlink_dump_start(xfrm_nl, skb, nlh, link->dump, NULL);
    }
 
-   memset(xfrma, 0, sizeof(xfrma));
-
-   if (nlh->nlmsg_len < (min_len = xfrm_msg_min[type]))
-       return -EINVAL;
-
-   if (nlh->nlmsg_len > min_len) {
-       int attrlen = nlh->nlmsg_len - NLMSG_ALIGN(min_len);
-       struct rtattr *attr = (void *) nlh + NLMSG_ALIGN(min_len);
-
-       while (RTA_OK(attr, attrlen)) {
-           unsigned short flavor = attr->rta_type;
-           if (flavor) {
-               if (flavor > XFRMA_MAX)
-                   return -EINVAL;
-               xfrma[flavor - 1] = attr;
-           }
-           attr = RTA_NEXT(attr, attrlen);
-       }
-   }
+   /* FIXME: Temporary hack, nlmsg_parse() starts at xfrma[1], old code
+    * expects first attribute at xfrma[0] */
+   err = nlmsg_parse(nlh, xfrm_msg_min[type], xfrma-1, XFRMA_MAX, NULL);
+   if (err < 0)
+       return err;
 
    if (link->doit == NULL)
        return -EINVAL;
 
-   return link->doit(skb, nlh, xfrma);
+   return link->doit(skb, nlh, (struct rtattr **) xfrma);
 }
 
 static void xfrm_netlink_rcv(struct sock *sk, int len)

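
The change in behaviour described in this patch (skip unknown attribute
types instead of failing) can be shown with a small standalone parser.
This is a sketch of the general technique only; demo_parse_attrs() and
struct demo_attr are hypothetical and much simpler than nlmsg_parse(),
which also validates against a policy:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_ALIGN4(len) (((size_t)(len) + 3u) & ~(size_t)3u)

/* Attribute header, same layout as struct nlattr. */
struct demo_attr {
    uint16_t len;   /* header + payload, excluding padding */
    uint16_t type;
};

/*
 * Walk a buffer of attributes and store a pointer to the start of each
 * one in tb[type]. tb must have maxtype + 1 entries. Attributes with a
 * type above maxtype are skipped instead of rejected. Returns 0 on
 * success, -1 if an attribute header is malformed.
 */
static int demo_parse_attrs(const uint8_t *buf, size_t len,
                            const uint8_t **tb, unsigned int maxtype)
{
    size_t off = 0;

    memset(tb, 0, sizeof(tb[0]) * ((size_t)maxtype + 1));

    while (len - off >= sizeof(struct demo_attr)) {
        struct demo_attr hdr;
        size_t step;

        memcpy(&hdr, buf + off, sizeof(hdr));
        if (hdr.len < sizeof(hdr) || hdr.len > len - off)
            return -1;              /* malformed attribute */

        if (hdr.type <= maxtype)
            tb[hdr.type] = buf + off;
        /* else: unknown type, silently ignored */

        step = DEMO_ALIGN4(hdr.len);
        if (step > len - off)
            break;                  /* last attribute, no trailing padding */
        off += step;
    }
    return 0;
}
```

Note that the table is indexed by the attribute type itself, so it needs
maxtype + 1 entries. That is the reason for the XFRMA_MAX+1 array size
and the temporary xfrma-1 hack in the diff.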

[PATCH 08/16] [XFRM] netlink: Use nlmsg_new() and type-safe size calculation helpers

2007-08-22 Thread Thomas Graf

Moves all complex message size calculations into their own inlined
helper functions and makes use of the type-safe netlink interface.

Using nlmsg_new() simplifies the calculation as it accounts for the
netlink header length itself.

Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: net-2.6.24/net/xfrm/xfrm_user.c
===
--- net-2.6.24.orig/net/xfrm/xfrm_user.c2007-08-21 17:04:46.0 
+0200
+++ net-2.6.24/net/xfrm/xfrm_user.c 2007-08-21 17:07:38.0 +0200
@@ -670,7 +670,7 @@ static struct sk_buff *xfrm_state_netlin
struct xfrm_dump_info info;
struct sk_buff *skb;
 
-   skb = alloc_skb(NLMSG_GOODSIZE, GFP_ATOMIC);
+   skb = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_ATOMIC);
if (!skb)
return ERR_PTR(-ENOMEM);
 
@@ -688,6 +688,13 @@ static struct sk_buff *xfrm_state_netlin
return skb;
 }
 
+static inline size_t xfrm_spdinfo_msgsize(void)
+{
+   return NLMSG_ALIGN(4)
+  + nla_total_size(sizeof(struct xfrmu_spdinfo))
+  + nla_total_size(sizeof(struct xfrmu_spdhinfo));
+}
+
 static int build_spdinfo(struct sk_buff *skb, u32 pid, u32 seq, u32 flags)
 {
struct xfrmk_spdinfo si;
@@ -729,12 +736,8 @@ static int xfrm_get_spdinfo(struct sk_bu
    u32 *flags = nlmsg_data(nlh);
    u32 spid = NETLINK_CB(skb).pid;
    u32 seq = nlh->nlmsg_seq;
-   int len = NLMSG_LENGTH(sizeof(u32));
 
-   len += RTA_SPACE(sizeof(struct xfrmu_spdinfo));
-   len += RTA_SPACE(sizeof(struct xfrmu_spdhinfo));
-
-   r_skb = alloc_skb(len, GFP_ATOMIC);
+   r_skb = nlmsg_new(xfrm_spdinfo_msgsize(), GFP_ATOMIC);
    if (r_skb == NULL)
        return -ENOMEM;
 
@@ -744,6 +747,13 @@ static int xfrm_get_spdinfo(struct sk_bu
return nlmsg_unicast(xfrm_nl, r_skb, spid);
 }
 
+static inline size_t xfrm_sadinfo_msgsize(void)
+{
+   return NLMSG_ALIGN(4)
+  + nla_total_size(sizeof(struct xfrmu_sadhinfo))
+  + nla_total_size(4); /* XFRMA_SAD_CNT */
+}
+
 static int build_sadinfo(struct sk_buff *skb, u32 pid, u32 seq, u32 flags)
 {
struct xfrmk_sadinfo si;
@@ -779,13 +789,8 @@ static int xfrm_get_sadinfo(struct sk_bu
    u32 *flags = nlmsg_data(nlh);
    u32 spid = NETLINK_CB(skb).pid;
    u32 seq = nlh->nlmsg_seq;
-   int len = NLMSG_LENGTH(sizeof(u32));
-
-   len += RTA_SPACE(sizeof(struct xfrmu_sadhinfo));
-   len += RTA_SPACE(sizeof(u32));
-
-   r_skb = alloc_skb(len, GFP_ATOMIC);
 
+   r_skb = nlmsg_new(xfrm_sadinfo_msgsize(), GFP_ATOMIC);
    if (r_skb == NULL)
        return -ENOMEM;
 
@@ -1311,7 +1316,7 @@ static struct sk_buff *xfrm_policy_netli
struct xfrm_dump_info info;
struct sk_buff *skb;
 
-   skb = alloc_skb(NLMSG_GOODSIZE, GFP_KERNEL);
+   skb = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
if (!skb)
return ERR_PTR(-ENOMEM);
 
@@ -1425,6 +1430,14 @@ static int xfrm_flush_sa(struct sk_buff 
return 0;
 }
 
+static inline size_t xfrm_aevent_msgsize(void)
+{
+   return NLMSG_ALIGN(sizeof(struct xfrm_aevent_id))
+  + nla_total_size(sizeof(struct xfrm_replay_state))
+  + nla_total_size(sizeof(struct xfrm_lifetime_cur))
+  + nla_total_size(4) /* XFRM_AE_RTHR */
+  + nla_total_size(4); /* XFRM_AE_ETHR */
+}
 
 static int build_aevent(struct sk_buff *skb, struct xfrm_state *x, struct 
km_event *c)
 {
@@ -1469,19 +1482,9 @@ static int xfrm_get_ae(struct sk_buff *s
    int err;
    struct km_event c;
    struct xfrm_aevent_id *p = nlmsg_data(nlh);
-   int len = NLMSG_LENGTH(sizeof(struct xfrm_aevent_id));
    struct xfrm_usersa_id *id = &p->sa_id;
 
-   len += RTA_SPACE(sizeof(struct xfrm_replay_state));
-   len += RTA_SPACE(sizeof(struct xfrm_lifetime_cur));
-
-   if (p->flags&XFRM_AE_RTHR)
-       len+=RTA_SPACE(sizeof(u32));
-
-   if (p->flags&XFRM_AE_ETHR)
-       len+=RTA_SPACE(sizeof(u32));
-
-   r_skb = alloc_skb(len, GFP_ATOMIC);
+   r_skb = nlmsg_new(xfrm_aevent_msgsize(), GFP_ATOMIC);
    if (r_skb == NULL)
        return -ENOMEM;
 
@@ -1824,6 +1827,13 @@ static int copy_to_user_migrate(struct x
return nla_put(skb, XFRMA_MIGRATE, sizeof(um), um);
 }
 
+static inline size_t xfrm_migrate_msgsize(int num_migrate)
+{
+   return NLMSG_ALIGN(sizeof(struct xfrm_userpolicy_id))
+  + nla_total_size(sizeof(struct xfrm_user_migrate) * num_migrate)
+  + userpolicy_type_attrsize();
+}
+
 static int build_migrate(struct sk_buff *skb, struct xfrm_migrate *m,
 int num_migrate, struct xfrm_selector *sel,
 u8 dir, u8 type)
@@ -1861,12 +1871,8 @@ static int xfrm_send_migrate(struct xfrm
 struct xfrm_migrate *m, int num_migrate)
 {
struct sk_buff *skb;
-   size_t len
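
To make the arithmetic of these helpers concrete, here is a userspace
sketch of the size calculation. It assumes a 4 byte attribute header and
4 byte alignment as in the netlink headers; struct demo_sadhinfo is a
hypothetical stand-in with two 32-bit fields, and the demo_ functions
only mimic nla_total_size() and xfrm_sadinfo_msgsize():

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_ALIGN4(len) (((size_t)(len) + 3u) & ~(size_t)3u)
#define DEMO_NLA_HDRLEN  4u /* aligned size of the attribute header */

/* Space one attribute occupies: header + payload, padded to 4 bytes. */
static size_t demo_nla_total_size(size_t payload)
{
    return DEMO_ALIGN4(DEMO_NLA_HDRLEN + payload);
}

/* Hypothetical stand-in for the hash info payload (two 32-bit counters). */
struct demo_sadhinfo {
    uint32_t sadhcnt;
    uint32_t sadhmcnt;
};

/*
 * Payload size of the reply: a u32 of flags followed by two attributes.
 * The netlink message header is not included, the allocator adds it.
 */
static size_t demo_sadinfo_msgsize(void)
{
    return DEMO_ALIGN4(4)
         + demo_nla_total_size(sizeof(struct demo_sadhinfo))
         + demo_nla_total_size(4);
}
```

Under these assumptions the reply payload is 4 + 12 + 8 = 24 bytes.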

[PATCH 02/16] [XFRM] netlink: Use nlmsg_end() and nlmsg_cancel()

2007-08-22 Thread Thomas Graf

Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: net-2.6.24/net/xfrm/xfrm_user.c
===
--- net-2.6.24.orig/net/xfrm/xfrm_user.c2007-08-21 16:10:34.0 
+0200
+++ net-2.6.24/net/xfrm/xfrm_user.c 2007-08-21 16:12:20.0 +0200
@@ -583,7 +583,6 @@ static int dump_one_state(struct xfrm_st
struct sk_buff *skb = sp->out_skb;
struct xfrm_usersa_info *p;
struct nlmsghdr *nlh;
-   unsigned char *b = skb_tail_pointer(skb);
 
if (sp->this_idx < sp->start_idx)
goto out;
@@ -628,14 +627,14 @@ static int dump_one_state(struct xfrm_st
if (x->lastused)
RTA_PUT(skb, XFRMA_LASTUSED, sizeof(x->lastused), &x->lastused);
 
-   nlh->nlmsg_len = skb_tail_pointer(skb) - b;
+   nlmsg_end(skb, nlh);
 out:
sp->this_idx++;
return 0;
 
 rtattr_failure:
-   nlmsg_trim(skb, b);
-   return -1;
+   nlmsg_cancel(skb, nlh);
+   return -EMSGSIZE;
 }
 
 static int xfrm_dump_sa(struct sk_buff *skb, struct netlink_callback *cb)
@@ -1270,7 +1269,6 @@ static int dump_one_policy(struct xfrm_p
struct sk_buff *in_skb = sp->in_skb;
struct sk_buff *skb = sp->out_skb;
struct nlmsghdr *nlh;
-   unsigned char *b = skb_tail_pointer(skb);
 
if (sp->this_idx < sp->start_idx)
goto out;
@@ -1289,14 +1287,14 @@ static int dump_one_policy(struct xfrm_p
if (copy_to_user_policy_type(xp->type, skb) < 0)
goto nlmsg_failure;
 
-   nlh->nlmsg_len = skb_tail_pointer(skb) - b;
+   nlmsg_end(skb, nlh);
 out:
sp->this_idx++;
return 0;
 
 nlmsg_failure:
-   nlmsg_trim(skb, b);
-   return -1;
+   nlmsg_cancel(skb, nlh);
+   return -EMSGSIZE;
 }
 
 static int xfrm_dump_policy(struct sk_buff *skb, struct netlink_callback *cb)
@@ -1446,7 +1444,6 @@ static int build_aevent(struct sk_buff *
struct xfrm_aevent_id *id;
struct nlmsghdr *nlh;
struct xfrm_lifetime_cur ltime;
-   unsigned char *b = skb_tail_pointer(skb);
 
nlh = nlmsg_put(skb, c->pid, c->seq, XFRM_MSG_NEWAE, sizeof(*id), 0);
if (nlh == NULL)
@@ -1479,12 +1476,11 @@ static int build_aevent(struct sk_buff *
RTA_PUT(skb,XFRMA_ETIMER_THRESH,sizeof(u32),&etimer);
}
 
-   nlh->nlmsg_len = skb_tail_pointer(skb) - b;
-   return skb->len;
+   return nlmsg_end(skb, nlh);
 
 rtattr_failure:
-   nlmsg_trim(skb, b);
-   return -1;
+   nlmsg_cancel(skb, nlh);
+   return -EMSGSIZE;
 }
 
 static int xfrm_get_ae(struct sk_buff *skb, struct nlmsghdr *nlh,
@@ -1862,7 +1858,6 @@ static int build_migrate(struct sk_buff 
struct xfrm_migrate *mp;
struct xfrm_userpolicy_id *pol_id;
struct nlmsghdr *nlh;
-   unsigned char *b = skb_tail_pointer(skb);
int i;
 
nlh = nlmsg_put(skb, 0, 0, XFRM_MSG_MIGRATE, sizeof(*pol_id), 0);
@@ -1883,11 +1878,10 @@ static int build_migrate(struct sk_buff 
goto nlmsg_failure;
}
 
-   nlh->nlmsg_len = skb_tail_pointer(skb) - b;
-   return skb->len;
+   return nlmsg_end(skb, nlh);
 nlmsg_failure:
-   nlmsg_trim(skb, b);
-   return -1;
+   nlmsg_cancel(skb, nlh);
+   return -EMSGSIZE;
 }
 
 static int xfrm_send_migrate(struct xfrm_selector *sel, u8 dir, u8 type,
@@ -2043,7 +2037,6 @@ static int build_expire(struct sk_buff *
 {
struct xfrm_user_expire *ue;
struct nlmsghdr *nlh;
-   unsigned char *b = skb_tail_pointer(skb);
 
nlh = nlmsg_put(skb, c->pid, 0, XFRM_MSG_EXPIRE, sizeof(*ue), 0);
if (nlh == NULL)
@@ -2053,8 +2046,7 @@ static int build_expire(struct sk_buff *
copy_to_user_state(x, &ue->state);
ue->hard = (c->data.hard != 0) ? 1 : 0;
 
-   nlh->nlmsg_len = skb_tail_pointer(skb) - b;
-   return skb->len;
+   return nlmsg_end(skb, nlh);
 }
 
 static int xfrm_exp_state_notify(struct xfrm_state *x, struct km_event *c)
@@ -2096,13 +2088,11 @@ static int xfrm_notify_sa_flush(struct k
struct xfrm_usersa_flush *p;
struct nlmsghdr *nlh;
struct sk_buff *skb;
-   sk_buff_data_t b;
int len = NLMSG_LENGTH(sizeof(struct xfrm_usersa_flush));
 
skb = alloc_skb(len, GFP_ATOMIC);
if (skb == NULL)
return -ENOMEM;
-   b = skb->tail;
 
nlh = nlmsg_put(skb, c->pid, c->seq, XFRM_MSG_FLUSHSA, sizeof(*p), 0);
if (nlh == NULL) {
@@ -2113,7 +2103,7 @@ static int xfrm_notify_sa_flush(struct k
p = NLMSG_DATA(nlh);
p->proto = c->data.proto;
 
-   nlh->nlmsg_len = skb->tail - b;
+   nlmsg_end(skb, nlh);
 
NETLINK_CB(skb).dst_group = XFRMNLGRP_SA;
return netlink_broadcast(xfrm_nl, skb, 0, XFRMNLGRP_SA, GFP_ATOMIC);
@@ -2140,7 +2130,6 @@ static int xfrm_notify_sa(struct xfrm_st
struct xfrm_usersa_id *id;
struct nlmsghdr *nlh;
struct sk_buff *skb

[PATCH 16/16] [XFRM] netlink: Inline attach_encap_tmpl(), attach_sec_ctx(), and attach_one_addr()

2007-08-22 Thread Thomas Graf

These functions are only used once and are a lot easier to understand if
inlined directly into the function.

Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: net-2.6.24/net/xfrm/xfrm_user.c
===
--- net-2.6.24.orig/net/xfrm/xfrm_user.c2007-08-21 23:05:30.0 
+0200
+++ net-2.6.24/net/xfrm/xfrm_user.c 2007-08-22 16:45:31.0 +0200
@@ -214,23 +214,6 @@ static int attach_one_algo(struct xfrm_a
return 0;
 }
 
-static int attach_encap_tmpl(struct xfrm_encap_tmpl **encapp, struct nlattr 
*rta)
-{
-   struct xfrm_encap_tmpl *p, *uencap;
-
-   if (!rta)
-   return 0;
-
-   uencap = nla_data(rta);
-   p = kmemdup(uencap, sizeof(*p), GFP_KERNEL);
-   if (!p)
-   return -ENOMEM;
-
-   *encapp = p;
-   return 0;
-}
-
-
 static inline int xfrm_user_sec_ctx_size(struct xfrm_sec_ctx *xfrm_ctx)
 {
int len = 0;
@@ -242,33 +225,6 @@ static inline int xfrm_user_sec_ctx_size
return len;
 }
 
-static int attach_sec_ctx(struct xfrm_state *x, struct nlattr *u_arg)
-{
-   struct xfrm_user_sec_ctx *uctx;
-
-   if (!u_arg)
-   return 0;
-
-   uctx = nla_data(u_arg);
-   return security_xfrm_state_alloc(x, uctx);
-}
-
-static int attach_one_addr(xfrm_address_t **addrpp, struct nlattr *rta)
-{
-   xfrm_address_t *p, *uaddrp;
-
-   if (!rta)
-   return 0;
-
-   uaddrp = nla_data(rta);
-   p = kmemdup(uaddrp, sizeof(*p), GFP_KERNEL);
-   if (!p)
-   return -ENOMEM;
-
-   *addrpp = p;
-   return 0;
-}
-
 static void copy_from_user_state(struct xfrm_state *x, struct xfrm_usersa_info *p)
 {
memcpy(&x->id, &p->id, sizeof(x->id));
@@ -340,15 +296,27 @@ static struct xfrm_state *xfrm_state_con
   xfrm_calg_get_byname,
   attrs[XFRMA_ALG_COMP])))
goto error;
-   if ((err = attach_encap_tmpl(&x->encap, attrs[XFRMA_ENCAP])))
-   goto error;
-   if ((err = attach_one_addr(&x->coaddr, attrs[XFRMA_COADDR])))
-   goto error;
+
+   if (attrs[XFRMA_ENCAP]) {
+   x->encap = kmemdup(nla_data(attrs[XFRMA_ENCAP]),
+  sizeof(*x->encap), GFP_KERNEL);
+   if (x->encap == NULL)
+   goto error;
+   }
+
+   if (attrs[XFRMA_COADDR]) {
+   x->coaddr = kmemdup(nla_data(attrs[XFRMA_COADDR]),
+   sizeof(*x->coaddr), GFP_KERNEL);
+   if (x->coaddr == NULL)
+   goto error;
+   }
+
err = xfrm_init_state(x);
if (err)
goto error;
 
-   if ((err = attach_sec_ctx(x, attrs[XFRMA_SEC_CTX])))
+   if (attrs[XFRMA_SEC_CTX] &&
+   security_xfrm_state_alloc(x, nla_data(attrs[XFRMA_SEC_CTX])))
goto error;
 
x->km.seq = p->seq;


[PATCH 06/16] [XFRM] netlink: Move algorithm length calculation to its own function

2007-08-22 Thread Thomas Graf

Adds alg_len() to calculate the properly padded length of an
algorithm attribute to simplify the code.
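
The length rule behind alg_len() can be tried outside the kernel. The
following is only a userspace sketch: struct demo_algo and demo_alg_len()
are hypothetical stand-ins for struct xfrm_algo and alg_len(), and the
field layout is an assumption made for illustration. The key length is
given in bits, so it is rounded up to whole bytes and added to the size
of the fixed header.

```c
#include <stddef.h>

/* Hypothetical userspace stand-in for the kernel's struct xfrm_algo. */
struct demo_algo {
	char alg_name[64];
	unsigned int alg_key_len;	/* key length in bits */
	char alg_key[];			/* key material follows the header */
};

/* Fixed header size plus the key length rounded up to whole bytes. */
static size_t demo_alg_len(const struct demo_algo *alg)
{
	return sizeof(*alg) + (alg->alg_key_len + 7) / 8;
}
```

A 128-bit key adds 16 bytes to the header size, a 129-bit key adds 17.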

Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: net-2.6.24/net/xfrm/xfrm_user.c
===
--- net-2.6.24.orig/net/xfrm/xfrm_user.c2007-08-21 16:16:03.0 
+0200
+++ net-2.6.24/net/xfrm/xfrm_user.c 2007-08-21 17:03:43.0 +0200
@@ -33,6 +33,11 @@
 #endif
 #include <linux/audit.h>
 
+static inline int alg_len(struct xfrm_algo *alg)
+{
+   return sizeof(*alg) + ((alg->alg_key_len + 7) / 8);
+}
+
 static int verify_one_alg(struct rtattr **xfrma, enum xfrm_attr_type_t type)
 {
struct rtattr *rt = xfrma[type - 1];
@@ -232,7 +237,6 @@ static int attach_one_algo(struct xfrm_a
struct rtattr *rta = u_arg;
struct xfrm_algo *p, *ualg;
struct xfrm_algo_desc *algo;
-   int len;
 
if (!rta)
return 0;
@@ -244,8 +248,7 @@ static int attach_one_algo(struct xfrm_a
return -ENOSYS;
*props = algo->desc.sadb_alg_id;
 
-   len = sizeof(*ualg) + (ualg->alg_key_len + 7U) / 8;
-   p = kmemdup(ualg, len, GFP_KERNEL);
+   p = kmemdup(ualg, alg_len(ualg), GFP_KERNEL);
if (!p)
return -ENOMEM;
 
@@ -617,11 +620,9 @@ static int dump_one_state(struct xfrm_st
copy_to_user_state(x, p);
 
if (x->aalg)
-   NLA_PUT(skb, XFRMA_ALG_AUTH,
-   sizeof(*(x->aalg))+(x->aalg->alg_key_len+7)/8, x->aalg);
+   NLA_PUT(skb, XFRMA_ALG_AUTH, alg_len(x->aalg), x->aalg);
if (x->ealg)
-   NLA_PUT(skb, XFRMA_ALG_CRYPT,
-   sizeof(*(x->ealg))+(x->ealg->alg_key_len+7)/8, x->ealg);
+   NLA_PUT(skb, XFRMA_ALG_CRYPT, alg_len(x->ealg), x->ealg);
if (x->calg)
NLA_PUT(skb, XFRMA_ALG_COMP, sizeof(*(x->calg)), x->calg);
 
@@ -2072,9 +2073,9 @@ static inline int xfrm_sa_len(struct xfr
 {
int l = 0;
if (x->aalg)
-   l += RTA_SPACE(sizeof(*x->aalg) + (x->aalg->alg_key_len+7)/8);
+   l += RTA_SPACE(alg_len(x->aalg));
if (x->ealg)
-   l += RTA_SPACE(sizeof(*x->ealg) + (x->ealg->alg_key_len+7)/8);
+   l += RTA_SPACE(alg_len(x->ealg));
if (x->calg)
l += RTA_SPACE(sizeof(*x->calg));
if (x->encap)
@@ -2127,11 +2128,9 @@ static int xfrm_notify_sa(struct xfrm_st
copy_to_user_state(x, p);
 
if (x->aalg)
-   NLA_PUT(skb, XFRMA_ALG_AUTH,
-   sizeof(*(x->aalg))+(x->aalg->alg_key_len+7)/8, x->aalg);
+   NLA_PUT(skb, XFRMA_ALG_AUTH, alg_len(x->aalg), x->aalg);
if (x->ealg)
-   NLA_PUT(skb, XFRMA_ALG_CRYPT,
-   sizeof(*(x->ealg))+(x->ealg->alg_key_len+7)/8, x->ealg);
+   NLA_PUT(skb, XFRMA_ALG_CRYPT, alg_len(x->ealg), x->ealg);
if (x->calg)
NLA_PUT(skb, XFRMA_ALG_COMP, sizeof(*(x->calg)), x->calg);
 


[PATCH 12/16] [XFRM] netlink: Rename attribute array from xfrma[] to attrs[]

2007-08-22 Thread Thomas Graf

Increases readability a lot.

Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: net-2.6.24/net/xfrm/xfrm_user.c
===
--- net-2.6.24.orig/net/xfrm/xfrm_user.c2007-08-21 17:34:10.0 
+0200
+++ net-2.6.24/net/xfrm/xfrm_user.c 2007-08-21 17:34:29.0 +0200
@@ -38,9 +38,9 @@ static inline int alg_len(struct xfrm_al
return sizeof(*alg) + ((alg->alg_key_len + 7) / 8);
 }
 
-static int verify_one_alg(struct rtattr **xfrma, enum xfrm_attr_type_t type)
+static int verify_one_alg(struct rtattr **attrs, enum xfrm_attr_type_t type)
 {
-   struct rtattr *rt = xfrma[type];
+   struct rtattr *rt = attrs[type];
struct xfrm_algo *algp;
 
if (!rt)
@@ -75,18 +75,18 @@ static int verify_one_alg(struct rtattr 
return 0;
 }
 
-static void verify_one_addr(struct rtattr **xfrma, enum xfrm_attr_type_t type,
+static void verify_one_addr(struct rtattr **attrs, enum xfrm_attr_type_t type,
   xfrm_address_t **addrp)
 {
-   struct rtattr *rt = xfrma[type];
+   struct rtattr *rt = attrs[type];
 
if (rt && addrp)
*addrp = RTA_DATA(rt);
 }
 
-static inline int verify_sec_ctx_len(struct rtattr **xfrma)
+static inline int verify_sec_ctx_len(struct rtattr **attrs)
 {
-   struct rtattr *rt = xfrma[XFRMA_SEC_CTX];
+   struct rtattr *rt = attrs[XFRMA_SEC_CTX];
struct xfrm_user_sec_ctx *uctx;
 
if (!rt)
@@ -101,7 +101,7 @@ static inline int verify_sec_ctx_len(str
 
 
 static int verify_newsa_info(struct xfrm_usersa_info *p,
-struct rtattr **xfrma)
+struct rtattr **attrs)
 {
int err;
 
@@ -125,35 +125,35 @@ static int verify_newsa_info(struct xfrm
err = -EINVAL;
switch (p->id.proto) {
case IPPROTO_AH:
-   if (!xfrma[XFRMA_ALG_AUTH]  ||
-   xfrma[XFRMA_ALG_CRYPT]  ||
-   xfrma[XFRMA_ALG_COMP])
+   if (!attrs[XFRMA_ALG_AUTH]  ||
+   attrs[XFRMA_ALG_CRYPT]  ||
+   attrs[XFRMA_ALG_COMP])
goto out;
break;
 
case IPPROTO_ESP:
-   if ((!xfrma[XFRMA_ALG_AUTH] &&
-!xfrma[XFRMA_ALG_CRYPT])   ||
-   xfrma[XFRMA_ALG_COMP])
+   if ((!attrs[XFRMA_ALG_AUTH] &&
+!attrs[XFRMA_ALG_CRYPT])   ||
+   attrs[XFRMA_ALG_COMP])
goto out;
break;
 
case IPPROTO_COMP:
-   if (!xfrma[XFRMA_ALG_COMP]  ||
-   xfrma[XFRMA_ALG_AUTH]   ||
-   xfrma[XFRMA_ALG_CRYPT])
+   if (!attrs[XFRMA_ALG_COMP]  ||
+   attrs[XFRMA_ALG_AUTH]   ||
+   attrs[XFRMA_ALG_CRYPT])
goto out;
break;
 
 #if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
case IPPROTO_DSTOPTS:
case IPPROTO_ROUTING:
-   if (xfrma[XFRMA_ALG_COMP]   ||
-   xfrma[XFRMA_ALG_AUTH]   ||
-   xfrma[XFRMA_ALG_CRYPT]  ||
-   xfrma[XFRMA_ENCAP]  ||
-   xfrma[XFRMA_SEC_CTX]||
-   !xfrma[XFRMA_COADDR])
+   if (attrs[XFRMA_ALG_COMP]   ||
+   attrs[XFRMA_ALG_AUTH]   ||
+   attrs[XFRMA_ALG_CRYPT]  ||
+   attrs[XFRMA_ENCAP]  ||
+   attrs[XFRMA_SEC_CTX]||
+   !attrs[XFRMA_COADDR])
goto out;
break;
 #endif
@@ -162,13 +162,13 @@ static int verify_newsa_info(struct xfrm
goto out;
}
 
-   if ((err = verify_one_alg(xfrma, XFRMA_ALG_AUTH)))
+   if ((err = verify_one_alg(attrs, XFRMA_ALG_AUTH)))
goto out;
-   if ((err = verify_one_alg(xfrma, XFRMA_ALG_CRYPT)))
+   if ((err = verify_one_alg(attrs, XFRMA_ALG_CRYPT)))
goto out;
-   if ((err = verify_one_alg(xfrma, XFRMA_ALG_COMP)))
+   if ((err = verify_one_alg(attrs, XFRMA_ALG_COMP)))
goto out;
-   if ((err = verify_sec_ctx_len(xfrma)))
+   if ((err = verify_sec_ctx_len(attrs)))
goto out;
 
err = -EINVAL;
@@ -298,12 +298,12 @@ static void copy_from_user_state(struct 
  * somehow made shareable and move it to xfrm_state.c - JHS
  *
 */
-static void xfrm_update_ae_params(struct xfrm_state *x, struct rtattr **xfrma)
+static void xfrm_update_ae_params(struct xfrm_state *x, struct rtattr **attrs)
 {
-   struct rtattr *rp = xfrma[XFRMA_REPLAY_VAL];
-   struct rtattr *lt = xfrma[XFRMA_LTIME_VAL];
-   struct rtattr *et = xfrma[XFRMA_ETIMER_THRESH];
-   struct rtattr *rt = xfrma[XFRMA_REPLAY_THRESH];
+   struct rtattr *rp = attrs[XFRMA_REPLAY_VAL

[PATCH 10/16] [XFRM] netlink: Establish an attribute policy

2007-08-22 Thread Thomas Graf

Adds a policy defining the minimal payload lengths for all the attributes,
allowing most attribute validation checks to be removed from the middle
of the code path. Makes updates more consistent as many format errors
are recognised earlier, before any changes have been attempted.
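
The idea of checking minimal payload lengths up front can be sketched in
userspace. Everything in this sketch is hypothetical (the DEMO_ATTR_*
types, struct demo_attr, demo_policy[] and demo_parse()); it does not
reproduce the kernel's nla_policy machinery, only the principle that all
attributes are validated against a table before any of them is acted on.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical attribute types; the real XFRMA_* values are not used here. */
enum {
	DEMO_ATTR_UNSPEC,
	DEMO_ATTR_THRESH,	/* carries a 32-bit value */
	DEMO_ATTR_ADDR,		/* carries a 16-byte address */
	DEMO_ATTR_MAX = DEMO_ATTR_ADDR
};

struct demo_attr {
	uint16_t type;
	uint16_t len;		/* payload length in bytes */
	const void *data;
};

/* Policy: minimal payload length per attribute type (0 = no requirement). */
static const uint16_t demo_policy[DEMO_ATTR_MAX + 1] = {
	[DEMO_ATTR_THRESH] = sizeof(uint32_t),
	[DEMO_ATTR_ADDR]   = 16,
};

/*
 * Validate every attribute against the policy in one pass.
 * Returns 0 with tb[] indexed by type, or -1 on the first payload that
 * is too short; on failure the caller must not use tb[].
 */
static int demo_parse(const struct demo_attr *attrs, size_t n,
		      const struct demo_attr *tb[DEMO_ATTR_MAX + 1])
{
	size_t i;

	for (i = 0; i <= DEMO_ATTR_MAX; i++)
		tb[i] = NULL;

	for (i = 0; i < n; i++) {
		uint16_t type = attrs[i].type;

		if (type == DEMO_ATTR_UNSPEC || type > DEMO_ATTR_MAX)
			continue;	/* unknown types are skipped in this sketch */
		if (attrs[i].len < demo_policy[type])
			return -1;	/* payload too short */
		tb[type] = &attrs[i];
	}
	return 0;
}
```

Because the length check happens once in the parser, code that later
reads tb[DEMO_ATTR_THRESH] can dereference the payload without
repeating the check, which is the effect the removed checks in this
patch rely on.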

Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: net-2.6.24/net/xfrm/xfrm_user.c
===
--- net-2.6.24.orig/net/xfrm/xfrm_user.c2007-08-21 17:31:04.0 
+0200
+++ net-2.6.24/net/xfrm/xfrm_user.c 2007-08-21 17:31:56.0 +0200
@@ -42,19 +42,12 @@ static int verify_one_alg(struct rtattr 
 {
struct rtattr *rt = xfrma[type - 1];
struct xfrm_algo *algp;
-   int len;
 
if (!rt)
return 0;
 
-   len = (rt->rta_len - sizeof(*rt)) - sizeof(*algp);
-   if (len < 0)
-   return -EINVAL;
-
algp = RTA_DATA(rt);
-
-   len -= (algp->alg_key_len + 7U) / 8;
-   if (len < 0)
+   if (RTA_PAYLOAD(rt) < alg_len(algp))
return -EINVAL;
 
switch (type) {
@@ -82,55 +75,25 @@ static int verify_one_alg(struct rtattr 
return 0;
 }
 
-static int verify_encap_tmpl(struct rtattr **xfrma)
-{
-   struct rtattr *rt = xfrma[XFRMA_ENCAP - 1];
-   struct xfrm_encap_tmpl *encap;
-
-   if (!rt)
-   return 0;
-
-   if ((rt->rta_len - sizeof(*rt)) < sizeof(*encap))
-   return -EINVAL;
-
-   return 0;
-}
-
-static int verify_one_addr(struct rtattr **xfrma, enum xfrm_attr_type_t type,
+static void verify_one_addr(struct rtattr **xfrma, enum xfrm_attr_type_t type,
   xfrm_address_t **addrp)
 {
struct rtattr *rt = xfrma[type - 1];
 
-   if (!rt)
-   return 0;
-
-   if ((rt->rta_len - sizeof(*rt)) < sizeof(**addrp))
-   return -EINVAL;
-
-   if (addrp)
+   if (rt && addrp)
*addrp = RTA_DATA(rt);
-
-   return 0;
 }
 
 static inline int verify_sec_ctx_len(struct rtattr **xfrma)
 {
struct rtattr *rt = xfrma[XFRMA_SEC_CTX - 1];
struct xfrm_user_sec_ctx *uctx;
-   int len = 0;
 
if (!rt)
return 0;
 
-   if (rt->rta_len < sizeof(*uctx))
-   return -EINVAL;
-
uctx = RTA_DATA(rt);
-
-   len += sizeof(struct xfrm_user_sec_ctx);
-   len += uctx->ctx_len;
-
-   if (uctx->len != len)
+   if (uctx->len != (sizeof(struct xfrm_user_sec_ctx) + uctx->ctx_len))
return -EINVAL;
 
return 0;
@@ -205,12 +168,8 @@ static int verify_newsa_info(struct xfrm
goto out;
if ((err = verify_one_alg(xfrma, XFRMA_ALG_COMP)))
goto out;
-   if ((err = verify_encap_tmpl(xfrma)))
-   goto out;
if ((err = verify_sec_ctx_len(xfrma)))
goto out;
-   if ((err = verify_one_addr(xfrma, XFRMA_COADDR, NULL)))
-   goto out;
 
err = -EINVAL;
switch (p->mode) {
@@ -339,9 +298,8 @@ static void copy_from_user_state(struct 
  * somehow made shareable and move it to xfrm_state.c - JHS
  *
 */
-static int xfrm_update_ae_params(struct xfrm_state *x, struct rtattr **xfrma)
+static void xfrm_update_ae_params(struct xfrm_state *x, struct rtattr **xfrma)
 {
-   int err = - EINVAL;
struct rtattr *rp = xfrma[XFRMA_REPLAY_VAL-1];
struct rtattr *lt = xfrma[XFRMA_LTIME_VAL-1];
struct rtattr *et = xfrma[XFRMA_ETIMER_THRESH-1];
@@ -349,8 +307,6 @@ static int xfrm_update_ae_params(struct 
 
if (rp) {
struct xfrm_replay_state *replay;
-   if (RTA_PAYLOAD(rp) < sizeof(*replay))
-   goto error;
replay = RTA_DATA(rp);
memcpy(&x->replay, replay, sizeof(*replay));
memcpy(&x->preplay, replay, sizeof(*replay));
@@ -358,8 +314,6 @@ static int xfrm_update_ae_params(struct 
 
if (lt) {
struct xfrm_lifetime_cur *ltime;
-   if (RTA_PAYLOAD(lt) < sizeof(*ltime))
-   goto error;
ltime = RTA_DATA(lt);
x->curlft.bytes = ltime->bytes;
x->curlft.packets = ltime->packets;
@@ -367,21 +321,11 @@ static int xfrm_update_ae_params(struct 
x->curlft.use_time = ltime->use_time;
}
 
-   if (et) {
-   if (RTA_PAYLOAD(et) < sizeof(u32))
-   goto error;
+   if (et)
x->replay_maxage = *(u32*)RTA_DATA(et);
-   }
 
-   if (rt) {
-   if (RTA_PAYLOAD(rt) < sizeof(u32))
-   goto error;
+   if (rt)
x->replay_maxdiff = *(u32*)RTA_DATA(rt);
-   }
-
-   return 0;
-error:
-   return err;
 }
 
 static struct xfrm_state *xfrm_state_construct(struct xfrm_usersa_info *p,
@@ -429,9 +373,7 @@ static struct xfrm_state *xfrm_state_con
 
/* override default values from above

[PATCH 14/16] [XFRM] netlink: Use nla_memcpy() in xfrm_update_ae_params()

2007-08-22 Thread Thomas Graf

Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: net-2.6.24/net/xfrm/xfrm_user.c
===
--- net-2.6.24.orig/net/xfrm/xfrm_user.c2007-08-21 17:35:13.0 
+0200
+++ net-2.6.24/net/xfrm/xfrm_user.c 2007-08-21 17:36:59.0 +0200
@@ -303,20 +303,12 @@ static void xfrm_update_ae_params(struct
struct nlattr *rt = attrs[XFRMA_REPLAY_THRESH];
 
if (rp) {
-   struct xfrm_replay_state *replay;
-   replay = nla_data(rp);
-   memcpy(&x->replay, replay, sizeof(*replay));
-   memcpy(&x->preplay, replay, sizeof(*replay));
+   nla_memcpy(&x->replay, rp, sizeof(x->replay));
+   nla_memcpy(&x->preplay, rp, sizeof(x->preplay));
}
 
-   if (lt) {
-   struct xfrm_lifetime_cur *ltime;
-   ltime = nla_data(lt);
-   x->curlft.bytes = ltime->bytes;
-   x->curlft.packets = ltime->packets;
-   x->curlft.add_time = ltime->add_time;
-   x->curlft.use_time = ltime->use_time;
-   }
+   if (lt)
+   nla_memcpy(&x->curlft, lt, sizeof(x->curlft));
 
if (et)
x->replay_maxage = nla_get_u32(et);


[PATCH 13/16] [XFRM] netlink: Use nlattr instead of rtattr

2007-08-22 Thread Thomas Graf

Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: net-2.6.24/net/xfrm/xfrm_user.c
===
--- net-2.6.24.orig/net/xfrm/xfrm_user.c2007-08-21 17:34:29.0 
+0200
+++ net-2.6.24/net/xfrm/xfrm_user.c 2007-08-21 17:35:13.0 +0200
@@ -38,16 +38,16 @@ static inline int alg_len(struct xfrm_al
return sizeof(*alg) + ((alg->alg_key_len + 7) / 8);
 }
 
-static int verify_one_alg(struct rtattr **attrs, enum xfrm_attr_type_t type)
+static int verify_one_alg(struct nlattr **attrs, enum xfrm_attr_type_t type)
 {
-   struct rtattr *rt = attrs[type];
+   struct nlattr *rt = attrs[type];
struct xfrm_algo *algp;
 
if (!rt)
return 0;
 
-   algp = RTA_DATA(rt);
-   if (RTA_PAYLOAD(rt) < alg_len(algp))
+   algp = nla_data(rt);
+   if (nla_len(rt) < alg_len(algp))
return -EINVAL;
 
switch (type) {
@@ -75,24 +75,24 @@ static int verify_one_alg(struct rtattr 
return 0;
 }
 
-static void verify_one_addr(struct rtattr **attrs, enum xfrm_attr_type_t type,
+static void verify_one_addr(struct nlattr **attrs, enum xfrm_attr_type_t type,
   xfrm_address_t **addrp)
 {
-   struct rtattr *rt = attrs[type];
+   struct nlattr *rt = attrs[type];
 
if (rt && addrp)
-   *addrp = RTA_DATA(rt);
+   *addrp = nla_data(rt);
 }
 
-static inline int verify_sec_ctx_len(struct rtattr **attrs)
+static inline int verify_sec_ctx_len(struct nlattr **attrs)
 {
-   struct rtattr *rt = attrs[XFRMA_SEC_CTX];
+   struct nlattr *rt = attrs[XFRMA_SEC_CTX];
struct xfrm_user_sec_ctx *uctx;
 
if (!rt)
return 0;
 
-   uctx = RTA_DATA(rt);
+   uctx = nla_data(rt);
if (uctx->len != (sizeof(struct xfrm_user_sec_ctx) + uctx->ctx_len))
return -EINVAL;
 
@@ -101,7 +101,7 @@ static inline int verify_sec_ctx_len(str
 
 
 static int verify_newsa_info(struct xfrm_usersa_info *p,
-struct rtattr **attrs)
+struct nlattr **attrs)
 {
int err;
 
@@ -191,16 +191,15 @@ out:
 
 static int attach_one_algo(struct xfrm_algo **algpp, u8 *props,
   struct xfrm_algo_desc *(*get_byname)(char *, int),
-  struct rtattr *u_arg)
+  struct nlattr *rta)
 {
-   struct rtattr *rta = u_arg;
struct xfrm_algo *p, *ualg;
struct xfrm_algo_desc *algo;
 
if (!rta)
return 0;
 
-   ualg = RTA_DATA(rta);
+   ualg = nla_data(rta);
 
algo = get_byname(ualg->alg_name, 1);
if (!algo)
@@ -216,15 +215,14 @@ static int attach_one_algo(struct xfrm_a
return 0;
 }
 
-static int attach_encap_tmpl(struct xfrm_encap_tmpl **encapp, struct rtattr 
*u_arg)
+static int attach_encap_tmpl(struct xfrm_encap_tmpl **encapp, struct nlattr 
*rta)
 {
-   struct rtattr *rta = u_arg;
struct xfrm_encap_tmpl *p, *uencap;
 
if (!rta)
return 0;
 
-   uencap = RTA_DATA(rta);
+   uencap = nla_data(rta);
p = kmemdup(uencap, sizeof(*p), GFP_KERNEL);
if (!p)
return -ENOMEM;
@@ -245,26 +243,25 @@ static inline int xfrm_user_sec_ctx_size
return len;
 }
 
-static int attach_sec_ctx(struct xfrm_state *x, struct rtattr *u_arg)
+static int attach_sec_ctx(struct xfrm_state *x, struct nlattr *u_arg)
 {
struct xfrm_user_sec_ctx *uctx;
 
if (!u_arg)
return 0;
 
-   uctx = RTA_DATA(u_arg);
+   uctx = nla_data(u_arg);
return security_xfrm_state_alloc(x, uctx);
 }
 
-static int attach_one_addr(xfrm_address_t **addrpp, struct rtattr *u_arg)
+static int attach_one_addr(xfrm_address_t **addrpp, struct nlattr *rta)
 {
-   struct rtattr *rta = u_arg;
xfrm_address_t *p, *uaddrp;
 
if (!rta)
return 0;
 
-   uaddrp = RTA_DATA(rta);
+   uaddrp = nla_data(rta);
p = kmemdup(uaddrp, sizeof(*p), GFP_KERNEL);
if (!p)
return -ENOMEM;
@@ -298,23 +295,23 @@ static void copy_from_user_state(struct 
  * somehow made shareable and move it to xfrm_state.c - JHS
  *
 */
-static void xfrm_update_ae_params(struct xfrm_state *x, struct rtattr **attrs)
+static void xfrm_update_ae_params(struct xfrm_state *x, struct nlattr **attrs)
 {
-   struct rtattr *rp = attrs[XFRMA_REPLAY_VAL];
-   struct rtattr *lt = attrs[XFRMA_LTIME_VAL];
-   struct rtattr *et = attrs[XFRMA_ETIMER_THRESH];
-   struct rtattr *rt = attrs[XFRMA_REPLAY_THRESH];
+   struct nlattr *rp = attrs[XFRMA_REPLAY_VAL];
+   struct nlattr *lt = attrs[XFRMA_LTIME_VAL];
+   struct nlattr *et = attrs[XFRMA_ETIMER_THRESH];
+   struct nlattr *rt = attrs[XFRMA_REPLAY_THRESH];
 
if (rp) {
struct xfrm_replay_state *replay;
-   replay = RTA_DATA(rp

[PATCH 05/16] [XFRM] netlink: Use nla_put()/NLA_PUT() variants

2007-08-22 Thread Thomas Graf

Also makes use of copy_sec_ctx() in another place and removes
duplicated code.

Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: net-2.6.24/net/xfrm/xfrm_user.c
===
--- net-2.6.24.orig/net/xfrm/xfrm_user.c2007-08-21 16:15:03.0 
+0200
+++ net-2.6.24/net/xfrm/xfrm_user.c 2007-08-21 16:16:03.0 +0200
@@ -576,6 +576,27 @@ struct xfrm_dump_info {
int this_idx;
 };
 
+static int copy_sec_ctx(struct xfrm_sec_ctx *s, struct sk_buff *skb)
+{
+   int ctx_size = sizeof(struct xfrm_sec_ctx) + s->ctx_len;
+   struct xfrm_user_sec_ctx *uctx;
+   struct nlattr *attr;
+
+   attr = nla_reserve(skb, XFRMA_SEC_CTX, ctx_size);
+   if (attr == NULL)
+   return -EMSGSIZE;
+
+   uctx = nla_data(attr);
+   uctx->exttype = XFRMA_SEC_CTX;
+   uctx->len = ctx_size;
+   uctx->ctx_doi = s->ctx_doi;
+   uctx->ctx_alg = s->ctx_alg;
+   uctx->ctx_len = s->ctx_len;
+   memcpy(uctx + 1, s->ctx_str, s->ctx_len);
+
+   return 0;
+}
+
 static int dump_one_state(struct xfrm_state *x, int count, void *ptr)
 {
struct xfrm_dump_info *sp = ptr;
@@ -596,43 +617,32 @@ static int dump_one_state(struct xfrm_st
copy_to_user_state(x, p);
 
if (x->aalg)
-   RTA_PUT(skb, XFRMA_ALG_AUTH,
+   NLA_PUT(skb, XFRMA_ALG_AUTH,
sizeof(*(x->aalg))+(x->aalg->alg_key_len+7)/8, x->aalg);
if (x->ealg)
-   RTA_PUT(skb, XFRMA_ALG_CRYPT,
+   NLA_PUT(skb, XFRMA_ALG_CRYPT,
sizeof(*(x->ealg))+(x->ealg->alg_key_len+7)/8, x->ealg);
if (x->calg)
-   RTA_PUT(skb, XFRMA_ALG_COMP, sizeof(*(x->calg)), x->calg);
+   NLA_PUT(skb, XFRMA_ALG_COMP, sizeof(*(x->calg)), x->calg);
 
if (x->encap)
-   RTA_PUT(skb, XFRMA_ENCAP, sizeof(*x->encap), x->encap);
+   NLA_PUT(skb, XFRMA_ENCAP, sizeof(*x->encap), x->encap);
 
-   if (x->security) {
-   int ctx_size = sizeof(struct xfrm_sec_ctx) +
-   x->security->ctx_len;
-   struct rtattr *rt = __RTA_PUT(skb, XFRMA_SEC_CTX, ctx_size);
-   struct xfrm_user_sec_ctx *uctx = RTA_DATA(rt);
-
-   uctx->exttype = XFRMA_SEC_CTX;
-   uctx->len = ctx_size;
-   uctx->ctx_doi = x->security->ctx_doi;
-   uctx->ctx_alg = x->security->ctx_alg;
-   uctx->ctx_len = x->security->ctx_len;
-   memcpy(uctx + 1, x->security->ctx_str, x->security->ctx_len);
-   }
+   if (x->security && copy_sec_ctx(x->security, skb) < 0)
+   goto nla_put_failure;
 
if (x->coaddr)
-   RTA_PUT(skb, XFRMA_COADDR, sizeof(*x->coaddr), x->coaddr);
+   NLA_PUT(skb, XFRMA_COADDR, sizeof(*x->coaddr), x->coaddr);
 
if (x->lastused)
-   RTA_PUT(skb, XFRMA_LASTUSED, sizeof(x->lastused), &x->lastused);
+   NLA_PUT_U64(skb, XFRMA_LASTUSED, x->lastused);
 
nlmsg_end(skb, nlh);
 out:
sp->this_idx++;
return 0;
 
-rtattr_failure:
+nla_put_failure:
nlmsg_cancel(skb, nlh);
return -EMSGSIZE;
 }
@@ -1193,32 +1203,9 @@ static int copy_to_user_tmpl(struct xfrm
up->ealgos = kp->ealgos;
up->calgos = kp->calgos;
}
-   RTA_PUT(skb, XFRMA_TMPL,
-   (sizeof(struct xfrm_user_tmpl) * xp->xfrm_nr),
-   vec);
-
-   return 0;
-
-rtattr_failure:
-   return -1;
-}
-
-static int copy_sec_ctx(struct xfrm_sec_ctx *s, struct sk_buff *skb)
-{
-   int ctx_size = sizeof(struct xfrm_sec_ctx) + s->ctx_len;
-   struct rtattr *rt = __RTA_PUT(skb, XFRMA_SEC_CTX, ctx_size);
-   struct xfrm_user_sec_ctx *uctx = RTA_DATA(rt);
-
-   uctx->exttype = XFRMA_SEC_CTX;
-   uctx->len = ctx_size;
-   uctx->ctx_doi = s->ctx_doi;
-   uctx->ctx_alg = s->ctx_alg;
-   uctx->ctx_len = s->ctx_len;
-   memcpy(uctx + 1, s->ctx_str, s->ctx_len);
-   return 0;
 
- rtattr_failure:
-   return -1;
+   return nla_put(skb, XFRMA_TMPL,
+  sizeof(struct xfrm_user_tmpl) * xp->xfrm_nr, vec);
 }
 
 static inline int copy_to_user_state_sec_ctx(struct xfrm_state *x, struct 
sk_buff *skb)
@@ -1240,17 +1227,11 @@ static inline int copy_to_user_sec_ctx(s
 #ifdef CONFIG_XFRM_SUB_POLICY
 static int copy_to_user_policy_type(u8 type, struct sk_buff *skb)
 {
-   struct xfrm_userpolicy_type upt;
+   struct xfrm_userpolicy_type upt = {
+   .type = type,
+   };
 
-   memset(&upt, 0, sizeof(upt));
-   upt.type = type;
-
-   RTA_PUT(skb, XFRMA_POLICY_TYPE, sizeof(upt), &upt);
-
-   return 0;
-
-rtattr_failure:
-   return -1;
+   return nla_put(skb, XFRMA_POLICY_TYPE, sizeof(upt), &upt);
 }
 
 #else
@@ -1440,7 +1421,6 @@ static int build_aevent(struct sk_buff *
 {
struct xfrm_aevent_id *id;
struct nlmsghdr *nlh;
-   struct xfrm_lifetime_cur ltime

[PATCH 04/16] [XFRM] netlink: Use nlmsg_broadcast() and nlmsg_unicast()

2007-08-22 Thread Thomas Graf

This simplifies successful return codes from >0 to 0.
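
The return-code convention can be illustrated with a small userspace
sketch. demo_raw_send() and demo_send() are hypothetical names, not
kernel functions: the first stands in for a low-level send that reports
a positive value on success, the second is a wrapper in the style of
the nlmsg_*() helpers that folds any positive result into 0 so callers
only need to check for negative errors.

```c
/* Hypothetical low-level send: positive value on success, negative on error. */
static int demo_raw_send(int payload_len)
{
	return payload_len > 0 ? payload_len : -1;
}

/* Wrapper: any positive result becomes 0, errors pass through unchanged. */
static int demo_send(int payload_len)
{
	int err = demo_raw_send(payload_len);

	if (err > 0)
		err = 0;
	return err;
}
```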

Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: net-2.6.24/net/xfrm/xfrm_user.c
===
--- net-2.6.24.orig/net/xfrm/xfrm_user.c2007-08-21 16:13:57.0 
+0200
+++ net-2.6.24/net/xfrm/xfrm_user.c 2007-08-21 16:15:03.0 +0200
@@ -800,8 +800,7 @@ static int xfrm_get_sa(struct sk_buff *s
if (IS_ERR(resp_skb)) {
err = PTR_ERR(resp_skb);
} else {
-   err = netlink_unicast(xfrm_nl, resp_skb,
- NETLINK_CB(skb).pid, MSG_DONTWAIT);
+   err = nlmsg_unicast(xfrm_nl, resp_skb, NETLINK_CB(skb).pid);
}
xfrm_state_put(x);
 out_noput:
@@ -882,8 +881,7 @@ static int xfrm_alloc_userspi(struct sk_
goto out;
}
 
-   err = netlink_unicast(xfrm_nl, resp_skb,
- NETLINK_CB(skb).pid, MSG_DONTWAIT);
+   err = nlmsg_unicast(xfrm_nl, resp_skb, NETLINK_CB(skb).pid);
 
 out:
xfrm_state_put(x);
@@ -1393,9 +1391,8 @@ static int xfrm_get_policy(struct sk_buf
if (IS_ERR(resp_skb)) {
err = PTR_ERR(resp_skb);
} else {
-   err = netlink_unicast(xfrm_nl, resp_skb,
- NETLINK_CB(skb).pid,
- MSG_DONTWAIT);
+   err = nlmsg_unicast(xfrm_nl, resp_skb,
+   NETLINK_CB(skb).pid);
}
} else {
xfrm_audit_log(NETLINK_CB(skb).loginuid, NETLINK_CB(skb).sid,
@@ -1525,8 +1522,7 @@ static int xfrm_get_ae(struct sk_buff *s
 
if (build_aevent(r_skb, x, &c) < 0)
BUG();
-   err = netlink_unicast(xfrm_nl, r_skb,
- NETLINK_CB(skb).pid, MSG_DONTWAIT);
+   err = nlmsg_unicast(xfrm_nl, r_skb, NETLINK_CB(skb).pid);
spin_unlock_bh(&x->lock);
xfrm_state_put(x);
return err;
@@ -1903,9 +1899,7 @@ static int xfrm_send_migrate(struct xfrm
if (build_migrate(skb, m, num_migrate, sel, dir, type) < 0)
BUG();
 
-   NETLINK_CB(skb).dst_group = XFRMNLGRP_MIGRATE;
-   return netlink_broadcast(xfrm_nl, skb, 0, XFRMNLGRP_MIGRATE,
-GFP_ATOMIC);
+   return nlmsg_multicast(xfrm_nl, skb, 0, XFRMNLGRP_MIGRATE, GFP_ATOMIC);
 }
 #else
 static int xfrm_send_migrate(struct xfrm_selector *sel, u8 dir, u8 type,
@@ -2061,8 +2055,7 @@ static int xfrm_exp_state_notify(struct 
if (build_expire(skb, x, c) < 0)
BUG();
 
-   NETLINK_CB(skb).dst_group = XFRMNLGRP_EXPIRE;
-   return netlink_broadcast(xfrm_nl, skb, 0, XFRMNLGRP_EXPIRE, GFP_ATOMIC);
+   return nlmsg_multicast(xfrm_nl, skb, 0, XFRMNLGRP_EXPIRE, GFP_ATOMIC);
 }
 
 static int xfrm_aevent_state_notify(struct xfrm_state *x, struct km_event *c)
@@ -2079,8 +2072,7 @@ static int xfrm_aevent_state_notify(stru
if (build_aevent(skb, x, c) < 0)
BUG();
 
-   NETLINK_CB(skb).dst_group = XFRMNLGRP_AEVENTS;
-   return netlink_broadcast(xfrm_nl, skb, 0, XFRMNLGRP_AEVENTS, GFP_ATOMIC);
+   return nlmsg_multicast(xfrm_nl, skb, 0, XFRMNLGRP_AEVENTS, GFP_ATOMIC);
 }
 
 static int xfrm_notify_sa_flush(struct km_event *c)
@@ -2105,8 +2097,7 @@ static int xfrm_notify_sa_flush(struct k
 
nlmsg_end(skb, nlh);
 
-   NETLINK_CB(skb).dst_group = XFRMNLGRP_SA;
-   return netlink_broadcast(xfrm_nl, skb, 0, XFRMNLGRP_SA, GFP_ATOMIC);
+   return nlmsg_multicast(xfrm_nl, skb, 0, XFRMNLGRP_SA, GFP_ATOMIC);
 }
 
 static inline int xfrm_sa_len(struct xfrm_state *x)
@@ -2175,8 +2166,7 @@ static int xfrm_notify_sa(struct xfrm_st
 
nlmsg_end(skb, nlh);
 
-   NETLINK_CB(skb).dst_group = XFRMNLGRP_SA;
-   return netlink_broadcast(xfrm_nl, skb, 0, XFRMNLGRP_SA, GFP_ATOMIC);
+   return nlmsg_multicast(xfrm_nl, skb, 0, XFRMNLGRP_SA, GFP_ATOMIC);
 
 nlmsg_failure:
 rtattr_failure:
@@ -2262,8 +2252,7 @@ static int xfrm_send_acquire(struct xfrm
if (build_acquire(skb, x, xt, xp, dir) < 0)
BUG();
 
-   NETLINK_CB(skb).dst_group = XFRMNLGRP_ACQUIRE;
-   return netlink_broadcast(xfrm_nl, skb, 0, XFRMNLGRP_ACQUIRE, GFP_ATOMIC);
+   return nlmsg_multicast(xfrm_nl, skb, 0, XFRMNLGRP_ACQUIRE, GFP_ATOMIC);
 }
 
 /* User gives us xfrm_user_policy_info followed by an array of 0
@@ -2371,8 +2360,7 @@ static int xfrm_exp_policy_notify(struct
if (build_polexpire(skb, xp, dir, c) < 0)
BUG();
 
-   NETLINK_CB(skb).dst_group = XFRMNLGRP_EXPIRE;
-   return netlink_broadcast(xfrm_nl, skb, 0, XFRMNLGRP_EXPIRE, GFP_ATOMIC);
+   return nlmsg_multicast(xfrm_nl, skb, 0, XFRMNLGRP_EXPIRE, GFP_ATOMIC);
 }
 
 static int xfrm_notify_policy(struct xfrm_policy *xp, int dir, struct km_event 
*c)
@@ -2423,8 +2411,7 @@ static

[PATCH 03/16] [XFRM] netlink: Use nlmsg_data() instead of NLMSG_DATA()

2007-08-22 Thread Thomas Graf

Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: net-2.6.24/net/xfrm/xfrm_user.c
===
--- net-2.6.24.orig/net/xfrm/xfrm_user.c2007-08-21 16:12:20.0 
+0200
+++ net-2.6.24/net/xfrm/xfrm_user.c 2007-08-21 16:13:57.0 +0200
@@ -443,7 +443,7 @@ error_no_put:
 static int xfrm_add_sa(struct sk_buff *skb, struct nlmsghdr *nlh,
struct rtattr **xfrma)
 {
-   struct xfrm_usersa_info *p = NLMSG_DATA(nlh);
+   struct xfrm_usersa_info *p = nlmsg_data(nlh);
struct xfrm_state *x;
int err;
struct km_event c;
@@ -520,7 +520,7 @@ static int xfrm_del_sa(struct sk_buff *s
struct xfrm_state *x;
int err = -ESRCH;
struct km_event c;
-   struct xfrm_usersa_id *p = NLMSG_DATA(nlh);
+   struct xfrm_usersa_id *p = nlmsg_data(nlh);
 
x = xfrm_user_state_lookup(p, xfrma, &err);
if (x == NULL)
@@ -592,7 +592,7 @@ static int dump_one_state(struct xfrm_st
if (nlh == NULL)
return -EMSGSIZE;
 
-   p = NLMSG_DATA(nlh);
+   p = nlmsg_data(nlh);
copy_to_user_state(x, p);
 
if (x->aalg)
@@ -715,7 +715,7 @@ static int xfrm_get_spdinfo(struct sk_bu
struct rtattr **xfrma)
 {
struct sk_buff *r_skb;
-   u32 *flags = NLMSG_DATA(nlh);
+   u32 *flags = nlmsg_data(nlh);
u32 spid = NETLINK_CB(skb).pid;
u32 seq = nlh->nlmsg_seq;
int len = NLMSG_LENGTH(sizeof(u32));
@@ -765,7 +765,7 @@ static int xfrm_get_sadinfo(struct sk_bu
struct rtattr **xfrma)
 {
struct sk_buff *r_skb;
-   u32 *flags = NLMSG_DATA(nlh);
+   u32 *flags = nlmsg_data(nlh);
u32 spid = NETLINK_CB(skb).pid;
u32 seq = nlh->nlmsg_seq;
int len = NLMSG_LENGTH(sizeof(u32));
@@ -787,7 +787,7 @@ static int xfrm_get_sadinfo(struct sk_bu
 static int xfrm_get_sa(struct sk_buff *skb, struct nlmsghdr *nlh,
struct rtattr **xfrma)
 {
-   struct xfrm_usersa_id *p = NLMSG_DATA(nlh);
+   struct xfrm_usersa_id *p = nlmsg_data(nlh);
struct xfrm_state *x;
struct sk_buff *resp_skb;
int err = -ESRCH;
@@ -841,7 +841,7 @@ static int xfrm_alloc_userspi(struct sk_
int family;
int err;
 
-   p = NLMSG_DATA(nlh);
+   p = nlmsg_data(nlh);
err = verify_userspi_info(p);
if (err)
goto out_noput;
@@ -1130,7 +1130,7 @@ static struct xfrm_policy *xfrm_policy_c
 static int xfrm_add_policy(struct sk_buff *skb, struct nlmsghdr *nlh,
struct rtattr **xfrma)
 {
-   struct xfrm_userpolicy_info *p = NLMSG_DATA(nlh);
+   struct xfrm_userpolicy_info *p = nlmsg_data(nlh);
struct xfrm_policy *xp;
struct km_event c;
int err;
@@ -1277,8 +1277,8 @@ static int dump_one_policy(struct xfrm_p
XFRM_MSG_NEWPOLICY, sizeof(*p), sp->nlmsg_flags);
if (nlh == NULL)
return -EMSGSIZE;
-   p = NLMSG_DATA(nlh);
 
+   p = nlmsg_data(nlh);
copy_to_user_policy(xp, p, dir);
if (copy_to_user_tmpl(xp, skb) < 0)
goto nlmsg_failure;
@@ -1351,7 +1351,7 @@ static int xfrm_get_policy(struct sk_buf
struct km_event c;
int delete;
 
-   p = NLMSG_DATA(nlh);
+   p = nlmsg_data(nlh);
delete = nlh->nlmsg_type == XFRM_MSG_DELPOLICY;
 
err = copy_from_user_policy_type(&type, xfrma);
@@ -1420,7 +1420,7 @@ static int xfrm_flush_sa(struct sk_buff 
struct rtattr **xfrma)
 {
struct km_event c;
-   struct xfrm_usersa_flush *p = NLMSG_DATA(nlh);
+   struct xfrm_usersa_flush *p = nlmsg_data(nlh);
struct xfrm_audit audit_info;
int err;
 
@@ -1448,8 +1448,8 @@ static int build_aevent(struct sk_buff *
nlh = nlmsg_put(skb, c->pid, c->seq, XFRM_MSG_NEWAE, sizeof(*id), 0);
if (nlh == NULL)
return -EMSGSIZE;
-   id = NLMSG_DATA(nlh);
 
+   id = nlmsg_data(nlh);
memcpy(&id->sa_id.daddr, &x->id.daddr,sizeof(x->id.daddr));
id->sa_id.spi = x->id.spi;
id->sa_id.family = x->props.family;
@@ -1490,7 +1490,7 @@ static int xfrm_get_ae(struct sk_buff *s
struct sk_buff *r_skb;
int err;
struct km_event c;
-   struct xfrm_aevent_id *p = NLMSG_DATA(nlh);
+   struct xfrm_aevent_id *p = nlmsg_data(nlh);
int len = NLMSG_LENGTH(sizeof(struct xfrm_aevent_id));
struct xfrm_usersa_id *id = &p->sa_id;
 
@@ -1538,7 +1538,7 @@ static int xfrm_new_ae(struct sk_buff *s
struct xfrm_state *x;
struct km_event c;
int err = - EINVAL;
-   struct xfrm_aevent_id *p = NLMSG_DATA(nlh);
+   struct xfrm_aevent_id *p = nlmsg_data(nlh);
struct rtattr *rp = xfrma[XFRMA_REPLAY_VAL-1];
struct rtattr *lt = xfrma[XFRMA_LTIME_VAL-1];
 
@@ -1602,7 +1602,7 @@ static int xfrm_add_pol_expire(struct sk
struct rtattr

[PATCH 15/16] [XFRM] netlink: Remove dependency on rtnetlink

2007-08-22 Thread Thomas Graf

Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: net-2.6.24/net/xfrm/xfrm_user.c
===
--- net-2.6.24.orig/net/xfrm/xfrm_user.c 2007-08-21 17:36:59.0 +0200
+++ net-2.6.24/net/xfrm/xfrm_user.c 2007-08-21 17:37:18.0 +0200
@@ -19,7 +19,6 @@
 #include <linux/string.h>
 #include <linux/net.h>
 #include <linux/skbuff.h>
-#include <linux/rtnetlink.h>
 #include <linux/pfkeyv2.h>
 #include <linux/ipsec.h>
 #include <linux/init.h>


[PATCH 07/16] [XFRM] netlink: Clear up some of the CONFIG_XFRM_SUB_POLICY ifdef mess

2007-08-22 Thread Thomas Graf

Moves all of the SUB_POLICY ifdefs related to the attribute size
calculation into a function.

Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: net-2.6.24/net/xfrm/xfrm_user.c
===
--- net-2.6.24.orig/net/xfrm/xfrm_user.c 2007-08-21 17:03:43.0 +0200
+++ net-2.6.24/net/xfrm/xfrm_user.c 2007-08-21 17:04:46.0 +0200
@@ -1224,6 +1224,14 @@ static inline int copy_to_user_sec_ctx(s
}
return 0;
 }
+static inline size_t userpolicy_type_attrsize(void)
+{
+#ifdef CONFIG_XFRM_SUB_POLICY
+   return nla_total_size(sizeof(struct xfrm_userpolicy_type));
+#else
+   return 0;
+#endif
+}
 
 #ifdef CONFIG_XFRM_SUB_POLICY
 static int copy_to_user_policy_type(u8 type, struct sk_buff *skb)
@@ -1857,9 +1865,7 @@ static int xfrm_send_migrate(struct xfrm
 
len = RTA_SPACE(sizeof(struct xfrm_user_migrate) * num_migrate);
len += NLMSG_SPACE(sizeof(struct xfrm_userpolicy_id));
-#ifdef CONFIG_XFRM_SUB_POLICY
-   len += RTA_SPACE(sizeof(struct xfrm_userpolicy_type));
-#endif
+   len += userpolicy_type_attrsize();
skb = alloc_skb(len, GFP_ATOMIC);
if (skb == NULL)
return -ENOMEM;
@@ -2214,9 +2220,7 @@ static int xfrm_send_acquire(struct xfrm
len = RTA_SPACE(sizeof(struct xfrm_user_tmpl) * xp->xfrm_nr);
len += NLMSG_SPACE(sizeof(struct xfrm_user_acquire));
len += RTA_SPACE(xfrm_user_sec_ctx_size(x->security));
-#ifdef CONFIG_XFRM_SUB_POLICY
-   len += RTA_SPACE(sizeof(struct xfrm_userpolicy_type));
-#endif
+   len += userpolicy_type_attrsize();
skb = alloc_skb(len, GFP_ATOMIC);
if (skb == NULL)
return -ENOMEM;
@@ -2322,9 +2326,7 @@ static int xfrm_exp_policy_notify(struct
len = RTA_SPACE(sizeof(struct xfrm_user_tmpl) * xp->xfrm_nr);
len += NLMSG_SPACE(sizeof(struct xfrm_user_polexpire));
len += RTA_SPACE(xfrm_user_sec_ctx_size(xp->security));
-#ifdef CONFIG_XFRM_SUB_POLICY
-   len += RTA_SPACE(sizeof(struct xfrm_userpolicy_type));
-#endif
+   len += userpolicy_type_attrsize();
skb = alloc_skb(len, GFP_ATOMIC);
if (skb == NULL)
return -ENOMEM;
@@ -2349,9 +2351,7 @@ static int xfrm_notify_policy(struct xfr
len += RTA_SPACE(headlen);
headlen = sizeof(*id);
}
-#ifdef CONFIG_XFRM_SUB_POLICY
-   len += RTA_SPACE(sizeof(struct xfrm_userpolicy_type));
-#endif
+   len += userpolicy_type_attrsize();
len += NLMSG_SPACE(headlen);
 
skb = alloc_skb(len, GFP_ATOMIC);
@@ -2401,9 +2401,7 @@ static int xfrm_notify_policy_flush(stru
struct nlmsghdr *nlh;
struct sk_buff *skb;
int len = 0;
-#ifdef CONFIG_XFRM_SUB_POLICY
-   len += RTA_SPACE(sizeof(struct xfrm_userpolicy_type));
-#endif
+   len += userpolicy_type_attrsize();
len += NLMSG_LENGTH(0);
 
skb = alloc_skb(len, GFP_ATOMIC);


Re: [RFC] Wild and crazy ideas involving struct sk_buff

2007-08-22 Thread Thomas Graf

* Paul Moore [EMAIL PROTECTED] 2007-08-22 16:31
 We're currently talking about several different ideas to solve the problem, 
 including leveraging the sk_buff.secmark field, and one of the ideas was to 
 add an additional field to the sk_buff structure.  Knowing how well that idea 
 would go over (lead balloon is probably an understatement at best) I started 
 looking at what I might be able to remove from the sk_buff struct to make 
 room for a new field (the new field would be a u32).  Looking at the sk_buff 
 structure it appears that the sk_buff.dev and sk_buff.iif fields are a bit 
 redundant and removing the sk_buff.dev field could free 32/64 bits depending 
 on the platform.  Is there any reason (performance?) for keeping the 
 sk_buff.dev field around?  Would the community be open to patches which 
 removed it and transition users over to the sk_buff.iif field?  Finally, 
 assuming the sk_buff.dev field was removed, would the community be open to 
 adding a new LSM/SELinux related u32 field to the sk_buff struct?

This reminds me of an idea someone brought up a while ago: it involved
having a way to attach additional space to an sk_buff for all the
different marks and other non-essential fields.

I think skb->dev is required because we need to hold a reference on the
device while a packet being processed is put on a queue somewhere.

Re: [PATCH 3/4 - rev 2] Initilize and populate age field

2007-08-21 Thread Thomas Graf

* Varun Chandramohan [EMAIL PROTECTED] 2007-08-20 13:46
 The age field is filled with the current time at the time of creation of the 
 route. When the routes are dumped
 then the age value stored in the route structure is subtracted from the 
 current time value and the difference is the age expressed in secs.
 
 Signed-off-by: Varun Chandramohan [EMAIL PROTECTED]
 @@ -985,6 +987,14 @@ int fib_dump_info(struct sk_buff *skb, u
   NLA_PUT_U32(skb, RTA_FLOW, fi->fib_nh[0].nh_tclassid);
  #endif
   }
 +
 + do_gettimeofday(tv);
 + if (!*age) {
 + *age = timeval_to_sec(tv);
 + NLA_PUT_U32(skb, RTA_AGE, *age);

Why don't you take the timestamp at the time of allocating the alias?
This time-since-first-dump is very confusing.

 + } else {
 + NLA_PUT_U32(skb, RTA_AGE, timeval_to_sec(tv) - *age);
 + }
  #ifdef CONFIG_IP_ROUTE_MULTIPATH
   if (fi->fib_nhs > 1) {
   struct rtnexthop *rtnh;
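
The suggestion above, taking the timestamp when the alias is allocated
rather than at the first dump, could look roughly like the following
userspace sketch. The struct and function names are hypothetical and are
not taken from the patch under review; they only illustrate that the age
reported at dump time is the current time minus the creation time.

```c
#include <time.h>

/* Hypothetical stand-in for the route alias; only the timestamp matters. */
struct route_alias {
	time_t created;	/* set once, when the alias is allocated */
};

/* Called at allocation time: remember when the route came into existence. */
static void route_alias_init(struct route_alias *fa, time_t now)
{
	fa->created = now;
}

/* Called at dump time: the age in seconds is now minus the creation time. */
static long route_alias_age(const struct route_alias *fa, time_t now)
{
	return (long)(now - fa->created);
}
```

With this split the dump path never writes to the stored value, so the
reported age does not depend on when the route was first dumped.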

Re: [PATCH 3/4 - rev 2] Initilize and populate age field

2007-08-21 Thread Thomas Graf

* Varun Chandramohan [EMAIL PROTECTED] 2007-08-21 16:52
 I know its a bit confusing but let me explain the reason. In my first
 version patch i used fn_hash_insert() (place where alias is created)as
 place to insert my current time in the age field.
 This will eventually call fib_dump_info() for inserting the age filed
 attribute into the skb. Now in both places i have to call
 do_gettimeofday(). Its obvious that i need it in fn_hash_insert(), its
 also need in fib_dump_info() as it is the same function called for
 retrieving and dumping the age value to the userspace. So as you are
 aware that before we dump it to userspace we need to subtract the value
 with current time i need to call do_gettimeofday() twice. To avoid this
 i did as above.

At least put a comment there, it's far from obvious.

[NET]: Don't do netpoll on per cpu backlog napi struct

2007-08-21 Thread Thomas Graf

The per cpu backlog napi struct can't do netpoll and has the
dev member set to NULL. Fixes an oops on boot when netpoll is
enabled.

Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: net-2.6.24/include/linux/netpoll.h
===
--- net-2.6.24.orig/include/linux/netpoll.h 2007-08-22 01:02:14.0 +0200
+++ net-2.6.24/include/linux/netpoll.h  2007-08-22 01:02:30.0 +0200
@@ -75,7 +75,7 @@ static inline void *netpoll_poll_lock(st
struct net_device *dev = napi->dev;
 
rcu_read_lock(); /* deal with race on ->npinfo */
-   if (dev->npinfo) {
+   if (dev && dev->npinfo) {
spin_lock(&napi->poll_lock);
napi->poll_owner = smp_processor_id();
return napi;

Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCPportsfrom the host TCP port space.

2007-08-20 Thread Thomas Graf

* Felix Marti [EMAIL PROTECTED] 2007-08-20 12:02
 These graphic adapters provide a wealth of features that you can take
 advantage of to bring these amazing graphics to life. General purpose
 CPUs cannot keep up. Chelsio offload devices do the same thing in the
 realm of networking. - Will there be things you can't do, probably yes,
 but as I said, there are lots of knobs to turn (and the latest and
 greatest feature that gets hyped up might not always be the best thing
 since sliced bread anyway; what happened to BIC love? ;)

GPUs have almost no influence on system security; the network stack, OTOH,
is probably the most vulnerable part of an operating system. Even if all
vendors implemented all the features collected over the last years
properly, which seems unlikely, having such an essential and critical
part depend on the vendor of my network card, without being able to even
verify it properly, is truly frightening.

Re: [GENETLINK]: Question: global lock (genl_mutex) possible refinement?

2007-08-16 Thread Thomas Graf

* Richard MUSIL [EMAIL PROTECTED] 2007-07-24 13:09
 Thomas Graf wrote:
  Please provide a new overall patch which is not based on your
  initial patch so I can review your idea properly.
 
 Here it goes (merging two previous patches). I have diffed
 against v2.6.22, which I am using currently as my base:

Sorry for taking so long.

 @@ -150,9 +176,9 @@ int genl_register_ops(struct genl_family *family, struct genl_ops *ops)
   if (ops->policy)
   ops->flags |= GENL_CMD_CAP_HASPOL;
  
 - genl_lock();
 + genl_fam_lock(family);
   list_add_tail(&ops->ops_list, &family->ops_list);
 - genl_unlock();
 + genl_fam_unlock(family);

For registering operations, it is sufficient to just acquire the
family lock, the family itself can't disappear while holding it.

 @@ -216,8 +242,9 @@ int genl_register_family(struct genl_family *family)
   goto errout;
  
   INIT_LIST_HEAD(&family->ops_list);
 + mutex_init(&family->lock);
  
 - genl_lock();
 + genl_fam_lock(family);
  
   if (genl_family_find_byname(family->name)) {
   err = -EEXIST;
 @@ -251,14 +278,14 @@ int genl_register_family(struct genl_family *family)
   family->attrbuf = NULL;
  
   list_add_tail(&family->family_list, genl_family_chain(family->id));
 - genl_unlock();
 + genl_fam_unlock(family);

This looks good.

 @@ -303,38 +332,57 @@ static int genl_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
   struct genlmsghdr *hdr = nlmsg_data(nlh);
   int hdrlen, err;
  
 + genl_fam_lock(NULL);
   family = genl_family_find_byid(nlh->nlmsg_type);
 - if (family == NULL)
 + if (family == NULL) {
 + genl_fam_unlock(NULL);
   return -ENOENT;
 + }
 +
 + /* get particular family lock, but release global family lock
 +  * so registering operations for other families are possible */
 + genl_onefam_lock(family);
 + genl_fam_unlock(NULL);

I don't like having two locks for something as trivial as this.
Basically the only reason the global lock is required here is to
protect from family removal which can be avoided otherwise by
using RCU list operations.

Therefore, I'd propose the following lock semantics:
Use a separate global mutex to protect writes to the family list, and
make the read side lockless using RCU for looking up the family during
message processing. Use a per-family lock to protect writes to the
operations list and to serialize message processing with unregister
operations.
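
A rough userspace sketch of the semantics proposed above, with loud
caveats: this is not kernel code, all names are hypothetical, and a C11
atomic pointer table stands in for the RCU-protected family list. It only
shows who takes which lock. Unregistration is left out on purpose, since
a real implementation would have to wait for a grace period
(synchronize_rcu() in the kernel) before a family could go away.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define MAX_FAMILIES	16
#define MAX_OPS		8

struct family {
	int id;
	pthread_mutex_t lock;	/* protects ops[], serializes message processing */
	int ops[MAX_OPS];
	int n_ops;
};

/* Global mutex: taken only by writers of the family table. */
static pthread_mutex_t family_list_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for the RCU list: slots are published with atomic stores. */
static _Atomic(struct family *) family_table[MAX_FAMILIES];

/* Writer side: register a family under the global mutex. */
static int register_family(struct family *f)
{
	int err = -1;

	pthread_mutex_lock(&family_list_mutex);
	if (f->id >= 0 && f->id < MAX_FAMILIES &&
	    atomic_load(&family_table[f->id]) == NULL) {
		pthread_mutex_init(&f->lock, NULL);
		f->n_ops = 0;
		atomic_store(&family_table[f->id], f);	/* publish */
		err = 0;
	}
	pthread_mutex_unlock(&family_list_mutex);
	return err;
}

/* Reader side: lockless lookup, no global mutex involved. */
static struct family *find_family(int id)
{
	if (id < 0 || id >= MAX_FAMILIES)
		return NULL;
	return atomic_load(&family_table[id]);
}

/* Writing to the operations list only needs the family lock. */
static int register_op(struct family *f, int cmd)
{
	int err = -1;

	pthread_mutex_lock(&f->lock);
	if (f->n_ops < MAX_OPS) {
		f->ops[f->n_ops++] = cmd;
		err = 0;
	}
	pthread_mutex_unlock(&f->lock);
	return err;
}

/* Message processing: lockless lookup, then serialize on the family lock. */
static int process_message(int family_id, int cmd)
{
	struct family *f = find_family(family_id);
	int i, found = -1;

	if (f == NULL)
		return -1;
	pthread_mutex_lock(&f->lock);
	for (i = 0; i < f->n_ops; i++)
		if (f->ops[i] == cmd)
			found = 0;
	pthread_mutex_unlock(&f->lock);
	return found;
}
```

Registering operations for one family never blocks message processing
for another, and registering a new family does not contend with lookups.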

Re: Potential u32 classifier bug.

2007-08-15 Thread Thomas Graf

* Waskiewicz Jr, Peter P [EMAIL PROTECTED] 2007-08-09 18:07
 My big question is: Has anyone recently used the 802_3 protocol in tc
 with u32 and actually gotten it to work?  I can't see how the
 u32_classify() code can look at the mac header, since it is using the
 network header accessor to start looking.  I think this is an issue with
 the classification code, but I'm looking to see if I'm doing something
 stupid before I really start digging into this mess.

There is this very horrible way of using the u32 classifier with a
negative offset to look into the ethernet header.

You might want to look into the cmp ematch which can be attached to
almost any classifier. It allows basing offsets on any layer thus
making ethernet header filtering trivial.
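
To illustrate why a negative offset reaches the ethernet header: the
classifier's offsets are relative to the network header, and the
ethernet header sits directly in front of it. The following is a
standalone userspace sketch, not kernel or tc code; the helper names
are made up for the example.

```c
#include <stdint.h>

#define ETH_HLEN 14	/* dst MAC (6) + src MAC (6) + ethertype (2) */

/* The reference point for offsets: the start of the network header. */
static const uint8_t *network_header(const uint8_t *frame)
{
	return frame + ETH_HLEN;
}

/* Read a big-endian 16-bit value at a signed offset from the network header. */
static uint16_t read_be16_at(const uint8_t *nh, int offset)
{
	const uint8_t *p = nh + offset;

	return (uint16_t)((p[0] << 8) | p[1]);
}
```

Here read_be16_at(nh, -2) yields the ethertype and offset -14 is the
first byte of the destination MAC address.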

Re: [GENETLINK] some thoughts on the usage

2007-08-15 Thread Thomas Graf

* Richard MUSIL [EMAIL PROTECTED] 2007-08-10 10:45
 I have noticed that although ops for each family are the same (each
 device is functionally same) I cannot use same genl_ops struct for
 registration, because it uses internal member to link in list. Therefore
 it is necessary to allocate new genl_ops for each device and pass it to
 registration. But I cannot officially use this list to track those
 genl_ops (so I can properly destroy them later), because there is no
 interface. So I need to redo the management of the structures on my own.

The intended usage of the interface in your example would be to register
only one genetlink family, say tpm, register one set of operations
and then have an attribute in every message which specifies which TPM
device to use. This helps keeping the total number of genetlink families
down.
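
As a rough illustration of that layout, here is a userspace sketch with
one set of handlers and a device-index attribute carried in every
message. None of these names come from genetlink or from an existing TPM
driver; the message struct is a hypothetical stand-in for the parsed
attributes.

```c
#include <stddef.h>

#define TPM_MAX_DEVICES 4

enum tpm_cmd { TPM_CMD_PING = 1, TPM_CMD_RESET = 2 };

/* Hypothetical per-device state. */
struct tpm_device {
	int present;
	int pings;
	int resets;
};

/* Hypothetical parsed message: one command plus a device-index attribute. */
struct tpm_msg {
	enum tpm_cmd cmd;
	unsigned int dev_index;	/* the attribute that selects the device */
};

static struct tpm_device tpm_devices[TPM_MAX_DEVICES];

/* Look up the device named by the attribute; NULL if it does not exist. */
static struct tpm_device *tpm_lookup(const struct tpm_msg *msg)
{
	if (msg->dev_index >= TPM_MAX_DEVICES)
		return NULL;
	if (!tpm_devices[msg->dev_index].present)
		return NULL;
	return &tpm_devices[msg->dev_index];
}

/* One handler serves every device; no per-device ops are registered. */
static int tpm_doit(const struct tpm_msg *msg)
{
	struct tpm_device *dev = tpm_lookup(msg);

	if (dev == NULL)
		return -1;
	switch (msg->cmd) {
	case TPM_CMD_PING:
		dev->pings++;
		return 0;
	case TPM_CMD_RESET:
		dev->resets++;
		return 0;
	}
	return -1;
}
```

Adding another device only means marking another table slot as present;
the set of registered operations stays the same.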

 The second inconvenience is that for each family I register, I also
 register basically same ops (basically means, the definitions, and doit,
 dumpit handlers are same, though the structures are at different
 addresses for reasons described above). When the handler receives the
 message it needs to associate the message with the actual device it is
 handling. This could be done through family lookup (using
 nlmsghdr::nlmsg_type), but I wondered if it would make sense to extend
 genl_family for user custom data pointer and then pass this custom data
 (or genl_family reference) to each handler (for example inside
 genl_info). It is already parsed by genetlink layer, so it should not
 slow things down.

That's not a bad idea, although I think we should try and keep the
generic netlink part as simple as possible. There is a family specific
header, referred to as user header in genl_info which is basically
what you're looking for with the custom header. I believe making the
generic netlink family aware of anything beyond family id and operations
id only complicates things.

Re: Potential u32 classifier bug.

2007-08-15 Thread Thomas Graf

* Waskiewicz Jr, Peter P [EMAIL PROTECTED] 2007-08-15 11:02
  There is this very horrible way of using the u32 classifier 
  with a negative offset to look into the ethernet header.
 
 Based on this, it sounds like u32 using protocol 802_3 is broken?

You might be expecting too much from u32. The protocol given
to u32 is just a filter; it doesn't imply anything beyond that.
u32 has its uses the way it is, and that's why we've added an ematch
rather than extending u32 itself.

[NEIGH]: Combine neighbour cleanup and release

2007-07-31 Thread Thomas Graf

Introduces neigh_cleanup_and_release() to be used after a
neighbour has been removed from its neighbour table. Serves
as preparation to add event notifications.

Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: net-2.6/net/core/neighbour.c
===
--- net-2.6.orig/net/core/neighbour.c   2007-07-22 11:41:46.0 +0200
+++ net-2.6/net/core/neighbour.c 2007-07-22 11:42:02.0 +0200
@@ -104,6 +104,14 @@ static int neigh_blackhole(struct sk_buf
return -ENETDOWN;
 }
 
+static void neigh_cleanup_and_release(struct neighbour *neigh)
+{
+   if (neigh->parms->neigh_cleanup)
+   neigh->parms->neigh_cleanup(neigh);
+
+   neigh_release(neigh);
+}
+
 /*
  * It is random distribution in the interval (1/2)*base...(3/2)*base.
  * It corresponds to default IPv6 settings and is not overridable,
@@ -140,9 +148,7 @@ static int neigh_forced_gc(struct neigh_
n->dead = 1;
shrunk  = 1;
write_unlock(&n->lock);
-   if (n->parms->neigh_cleanup)
-   n->parms->neigh_cleanup(n);
-   neigh_release(n);
+   neigh_cleanup_and_release(n);
continue;
}
write_unlock(&n->lock);
@@ -213,9 +219,7 @@ static void neigh_flush_dev(struct neigh
NEIGH_PRINTK2("neigh %p is stray.\n", n);
}
write_unlock(&n->lock);
-   if (n->parms->neigh_cleanup)
-   n->parms->neigh_cleanup(n);
-   neigh_release(n);
+   neigh_cleanup_and_release(n);
}
}
 }
@@ -676,9 +680,7 @@ static void neigh_periodic_timer(unsigne
*np = n->next;
n->dead = 1;
write_unlock(&n->lock);
-   if (n->parms->neigh_cleanup)
-   n->parms->neigh_cleanup(n);
-   neigh_release(n);
+   neigh_cleanup_and_release(n);
continue;
}
write_unlock(&n->lock);
@@ -2094,11 +2096,8 @@ void __neigh_for_each_release(struct nei
} else
np = &n->next;
write_unlock(&n->lock);
-   if (release) {
-   if (n->parms->neigh_cleanup)
-   n->parms->neigh_cleanup(n);
-   neigh_release(n);
-   }
+   if (release)
+   neigh_cleanup_and_release(n);
}
}
 }

[NEIGH]: Netlink notifications

2007-07-31 Thread Thomas Graf

Currently neighbour event notifications are limited to update
notifications and only sent if the ARP daemon is enabled. This
patch extends the existing notification code by also reporting
neighbours being removed due to gc or administratively and
removes the dependency on the ARP daemon. This makes it possible to keep
track of neighbour states without periodically fetching the
complete neighbour table.

Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: net-2.6/net/core/neighbour.c
===
--- net-2.6.orig/net/core/neighbour.c   2007-07-22 11:42:02.0 +0200
+++ net-2.6/net/core/neighbour.c 2007-07-22 11:49:15.0 +0200
@@ -54,9 +54,8 @@
 #define PNEIGH_HASHMASK 0xF
 
 static void neigh_timer_handler(unsigned long arg);
-#ifdef CONFIG_ARPD
-static void neigh_app_notify(struct neighbour *n);
-#endif
+static void __neigh_notify(struct neighbour *n, int type, int flags);
+static void neigh_update_notify(struct neighbour *neigh);
 static int pneigh_ifdown(struct neigh_table *tbl, struct net_device *dev);
 void neigh_changeaddr(struct neigh_table *tbl, struct net_device *dev);
 
@@ -109,6 +108,7 @@ static void neigh_cleanup_and_release(st
if (neigh->parms->neigh_cleanup)
neigh->parms->neigh_cleanup(neigh);
 
+   __neigh_notify(neigh, RTM_DELNEIGH, 0);
neigh_release(neigh);
 }
 
@@ -829,13 +829,10 @@ static void neigh_timer_handler(unsigned
 out:
write_unlock(&neigh->lock);
}
+
if (notify)
-   call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, neigh);
+   neigh_update_notify(neigh);
 
-#ifdef CONFIG_ARPD
-   if (notify && neigh->parms->app_probes)
-   neigh_app_notify(neigh);
-#endif
neigh_release(neigh);
 }
 
@@ -1064,11 +1061,8 @@ out:
write_unlock_bh(&neigh->lock);
 
if (notify)
-   call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, neigh);
-#ifdef CONFIG_ARPD
-   if (notify && neigh->parms->app_probes)
-   neigh_app_notify(neigh);
-#endif
+   neigh_update_notify(neigh);
+
return err;
 }
 
@@ -2001,6 +1995,11 @@ nla_put_failure:
return -EMSGSIZE;
 }
 
+static void neigh_update_notify(struct neighbour *neigh)
+{
+   call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, neigh);
+   __neigh_notify(neigh, RTM_NEWNEIGH, 0);
+}
 
 static int neigh_dump_table(struct neigh_table *tbl, struct sk_buff *skb,
struct netlink_callback *cb)
@@ -2420,7 +2419,6 @@ static const struct file_operations neig
 
 #endif /* CONFIG_PROC_FS */
 
-#ifdef CONFIG_ARPD
 static inline size_t neigh_nlmsg_size(void)
 {
return NLMSG_ALIGN(sizeof(struct ndmsg))
@@ -2452,16 +2450,11 @@ errout:
rtnl_set_sk_err(RTNLGRP_NEIGH, err);
 }
 
+#ifdef CONFIG_ARPD
 void neigh_app_ns(struct neighbour *n)
 {
__neigh_notify(n, RTM_GETNEIGH, NLM_F_REQUEST);
 }
-
-static void neigh_app_notify(struct neighbour *n)
-{
-   __neigh_notify(n, RTM_NEWNEIGH, 0);
-}
-
 #endif /* CONFIG_ARPD */
 
 #ifdef CONFIG_SYSCTL

[RTNETLINK]: Fix warning for !CONFIG_KMOD

2007-07-31 Thread Thomas Graf

replay label is unused otherwise.

Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: net-2.6/net/core/rtnetlink.c
===
--- net-2.6.orig/net/core/rtnetlink.c   2007-07-22 11:41:46.0 +0200
+++ net-2.6/net/core/rtnetlink.c 2007-07-22 12:04:27.0 +0200
@@ -952,7 +952,9 @@ static int rtnl_newlink(struct sk_buff *
struct nlattr *linkinfo[IFLA_INFO_MAX+1];
int err;
 
+#ifdef CONFIG_KMOD
 replay:
+#endif
err = nlmsg_parse(nlh, sizeof(*ifm), tb, IFLA_MAX, ifla_policy);
if (err < 0)
return err;

Re: [GENETLINK]: Question: global lock (genl_mutex) possible refinement?

2007-07-24 Thread Thomas Graf

* Richard MUSIL [EMAIL PROTECTED] 2007-07-23 18:45
 I have been giving it a second thought and came up with something more
 complex. The idea is to have locking granularity at the level of
 individual families.

I agree in general, it would make up a better solution.

However, your initial patch allows operations and families to be
unregistered while messages of the same family are being processed,
which must not be allowed.

Please provide a new overall patch which is not based on your
initial patch so I can review your idea properly.

[GENETLINK]: Correctly report errors while registering a multicast group

2007-07-24 Thread Thomas Graf


Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: net-2.6/net/netlink/genetlink.c
===
--- net-2.6.orig/net/netlink/genetlink.c 2007-07-23 21:54:35.0 +0200
+++ net-2.6/net/netlink/genetlink.c 2007-07-23 21:54:54.0 +0200
@@ -196,7 +196,7 @@ int genl_register_mc_group(struct genl_f
genl_ctrl_event(CTRL_CMD_NEWMCAST_GRP, grp);
  out:
genl_unlock();
-   return 0;
+   return err;
 }
 EXPORT_SYMBOL(genl_register_mc_group);
 

[GENETLINK]: Fix adjustment of number of multicast groups

2007-07-24 Thread Thomas Graf

The current calculation of the maximum number of genetlink
multicast groups seems odd, fix it.

Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: net-2.6/net/netlink/genetlink.c
===
--- net-2.6.orig/net/netlink/genetlink.c 2007-07-23 22:03:02.0 +0200
+++ net-2.6/net/netlink/genetlink.c 2007-07-23 22:05:12.0 +0200
@@ -184,7 +184,7 @@ int genl_register_mc_group(struct genl_f
}
 
err = netlink_change_ngroups(genl_sock,
-sizeof(unsigned long) * NETLINK_GENERIC);
+mc_groups_longs * BITS_PER_LONG);
if (err)
goto out;
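
For illustration only, the two expressions can be compared in a small
userspace sketch. It assumes that mc_groups_longs is the number of
unsigned longs in the multicast group bitmap, as the name and the fix
suggest; BITS_PER_LONG is redefined locally because the kernel macro is
not available in userspace, and the function names are made up.

```c
#include <limits.h>
#include <stddef.h>

#define NETLINK_GENERIC	16	/* netlink protocol number, not a group count */
#define BITS_PER_LONG	(sizeof(unsigned long) * CHAR_BIT)

/* Old expression: bytes per long multiplied by a protocol number. */
static size_t ngroups_old(void)
{
	return sizeof(unsigned long) * NETLINK_GENERIC;
}

/* Fixed expression: one multicast group per bit in the bitmap. */
static size_t ngroups_fixed(size_t mc_groups_longs)
{
	return mc_groups_longs * BITS_PER_LONG;
}
```

The old value is a constant that never follows the bitmap as it grows,
while the fixed value scales with the number of longs allocated.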
 

[GENETLINK]: Fix race in genl_unregister_mc_groups()

2007-07-24 Thread Thomas Graf

family->mcast_groups is protected by genl_lock, so the lock must
be held while accessing the list in genl_unregister_mc_groups().
This requires adding a non-locking variant of genl_unregister_mc_group().

Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: net-2.6/net/netlink/genetlink.c
===
--- net-2.6.orig/net/netlink/genetlink.c 2007-07-23 22:08:04.0 +0200
+++ net-2.6/net/netlink/genetlink.c 2007-07-23 22:09:08.0 +0200
@@ -200,6 +200,18 @@ int genl_register_mc_group(struct genl_f
 }
 EXPORT_SYMBOL(genl_register_mc_group);
 
+static void __genl_unregister_mc_group(struct genl_family *family,
+  struct genl_multicast_group *grp)
+{
+   BUG_ON(grp->family != family);
+   netlink_clear_multicast_users(genl_sock, grp->id);
+   clear_bit(grp->id, mc_groups);
+   list_del(&grp->list);
+   genl_ctrl_event(CTRL_CMD_DELMCAST_GRP, grp);
+   grp->id = 0;
+   grp->family = NULL;
+}
+
 /**
  * genl_unregister_mc_group - unregister a multicast group
  *
@@ -217,14 +229,8 @@ EXPORT_SYMBOL(genl_register_mc_group);
 void genl_unregister_mc_group(struct genl_family *family,
  struct genl_multicast_group *grp)
 {
-   BUG_ON(grp->family != family);
genl_lock();
-   netlink_clear_multicast_users(genl_sock, grp->id);
-   clear_bit(grp->id, mc_groups);
-   list_del(&grp->list);
-   genl_ctrl_event(CTRL_CMD_DELMCAST_GRP, grp);
-   grp->id = 0;
-   grp->family = NULL;
+   genl_unregister_mc_group(family, grp);
genl_unlock();
 }
 
@@ -232,8 +238,10 @@ static void genl_unregister_mc_groups(st
 {
struct genl_multicast_group *grp, *tmp;
 
+   genl_lock();
list_for_each_entry_safe(grp, tmp, &family->mcast_groups, list)
-   genl_unregister_mc_group(family, grp);
+   __genl_unregister_mc_group(family, grp);
+   genl_unlock();
 }
 
 /**

Re: [GENETLINK]: Fix race in genl_unregister_mc_groups()

2007-07-24 Thread Thomas Graf

* Brian Haley [EMAIL PROTECTED] 2007-07-24 12:14
 Thomas Graf wrote:
 @@ -217,14 +229,8 @@ EXPORT_SYMBOL(genl_register_mc_group);
  void genl_unregister_mc_group(struct genl_family *family,
struct genl_multicast_group *grp)
  {
 -BUG_ON(grp->family != family);
  genl_lock();
 -netlink_clear_multicast_users(genl_sock, grp->id);
 -clear_bit(grp->id, mc_groups);
 -list_del(&grp->list);
 -genl_ctrl_event(CTRL_CMD_DELMCAST_GRP, grp);
 -grp->id = 0;
 -grp->family = NULL;
 +genl_unregister_mc_group(family, grp);
  genl_unlock();
  }
 
 Shouldn't this be __genl_unregister_mc_group(family, grp) ?

Yes, thank you for noticing.

[REPOST][GENETLINK]: Fix race in genl_unregister_mc_groups()

2007-07-24 Thread Thomas Graf

family->mcast_groups is protected by genl_lock, so the lock must
be held while accessing the list in genl_unregister_mc_groups().
This requires adding a non-locking variant of genl_unregister_mc_group().

Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: net-2.6/net/netlink/genetlink.c
===
--- net-2.6.orig/net/netlink/genetlink.c 2007-07-23 22:08:04.0 +0200
+++ net-2.6/net/netlink/genetlink.c 2007-07-24 23:51:11.0 +0200
@@ -200,6 +200,18 @@ int genl_register_mc_group(struct genl_f
 }
 EXPORT_SYMBOL(genl_register_mc_group);
 
+static void __genl_unregister_mc_group(struct genl_family *family,
+  struct genl_multicast_group *grp)
+{
+   BUG_ON(grp->family != family);
+   netlink_clear_multicast_users(genl_sock, grp->id);
+   clear_bit(grp->id, mc_groups);
+   list_del(&grp->list);
+   genl_ctrl_event(CTRL_CMD_DELMCAST_GRP, grp);
+   grp->id = 0;
+   grp->family = NULL;
+}
+
 /**
  * genl_unregister_mc_group - unregister a multicast group
  *
@@ -217,14 +229,8 @@ EXPORT_SYMBOL(genl_register_mc_group);
 void genl_unregister_mc_group(struct genl_family *family,
  struct genl_multicast_group *grp)
 {
-   BUG_ON(grp->family != family);
genl_lock();
-   netlink_clear_multicast_users(genl_sock, grp->id);
-   clear_bit(grp->id, mc_groups);
-   list_del(&grp->list);
-   genl_ctrl_event(CTRL_CMD_DELMCAST_GRP, grp);
-   grp->id = 0;
-   grp->family = NULL;
+   __genl_unregister_mc_group(family, grp);
genl_unlock();
 }
 
@@ -232,8 +238,10 @@ static void genl_unregister_mc_groups(st
 {
    struct genl_multicast_group *grp, *tmp;
 
+   genl_lock();
    list_for_each_entry_safe(grp, tmp, &family->mcast_groups, list)
-       genl_unregister_mc_group(family, grp);
+       __genl_unregister_mc_group(family, grp);
+   genl_unlock();
 }
 
 /**
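
For readers following along, the locking pattern in the patch above can be sketched in plain userspace C. This is only an illustration, not the kernel code: the toy_* names, the flag that stands in for genl_lock(), and the hand-rolled singly linked list are all assumptions made for the example. It shows why the list walk has to call a non-locking helper once the caller already holds the lock.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for genl_lock()/genl_unlock(): a non-recursive lock
 * modelled as a flag, so a double acquisition trips an assertion
 * instead of deadlocking. */
static int toy_lock_held;

static void toy_lock(void)
{
	assert(!toy_lock_held);	/* a real mutex would deadlock here */
	toy_lock_held = 1;
}

static void toy_unlock(void)
{
	toy_lock_held = 0;
}

struct toy_group {
	int id;
	struct toy_group *next;
};

struct toy_family {
	struct toy_group *groups;
};

/* Non-locking variant: the caller must already hold the lock. */
static void toy_unregister_group_nolock(struct toy_family *fam,
					struct toy_group *grp)
{
	struct toy_group **pp;

	assert(toy_lock_held);
	for (pp = &fam->groups; *pp; pp = &(*pp)->next) {
		if (*pp == grp) {
			*pp = grp->next;	/* unlink from the list */
			break;
		}
	}
	grp->id = 0;
	grp->next = NULL;
}

/* Locking wrapper for callers that do not hold the lock. */
static void toy_unregister_group(struct toy_family *fam,
				 struct toy_group *grp)
{
	toy_lock();
	toy_unregister_group_nolock(fam, grp);
	toy_unlock();
}

/* Remove every group: take the lock once around the whole walk and
 * use the non-locking variant for each entry. */
static void toy_unregister_groups(struct toy_family *fam)
{
	struct toy_group *grp, *tmp;

	toy_lock();
	for (grp = fam->groups; grp; grp = tmp) {
		tmp = grp->next;	/* saved because grp is unlinked below */
		toy_unregister_group_nolock(fam, grp);
	}
	toy_unlock();
}
```

Calling the locking toy_unregister_group() from inside toy_unregister_groups() would trip the assertion in toy_lock(), which is the userspace analogue of the problem Brian pointed out in the first version of the patch.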

Re: [GENETLINK]: Question: global lock (genl_mutex) possible refinement?

2007-07-23 Thread Thomas Graf

* Richard MUSIL [EMAIL PROTECTED] 2007-07-20 18:15
 Patrick McHardy wrote:
  Export the lock/unlock/.. functions. You'll also need a new version 
  similar to __rtnl_unlock.
 
 Patrick, you might feel, I am not reading your lines, but in fact I do.
 The problem is that I do not feel competent to follow/propose such
 changes. So what I propose here (in included patch) is the least change
 scenario, which I can think of and on which I feel safe.
 
 If there are some other changes required, as you suggested for example
 exporting lock from genetlink module, I hope authors of genetlink will
 comment on that. Currently, I do not see any reason for that, but this
 could be due to my limited knowledge.

Actually there is no reason not to use separate locks for
message serialization and for protecting the list of registered
families. There is only one lock simply because it never occurred
to me that anybody would want to register a new genetlink family
while processing a message.

Alternatively you could also postpone the registration of the new
genetlink family to a workqueue.
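
A rough userspace sketch of that second option, assuming a small pending array in place of a real workqueue. Every name here (wq_*) is hypothetical and nothing in it is genetlink API. The handler only queues the request while the lock is held, and the registration itself runs later with the lock free.

```c
#include <assert.h>

#define WQ_MAX 8

static int wq_lock_held;		/* toy non-recursive lock, stands in for genl_mutex */
static int wq_registered[WQ_MAX];	/* families registered so far */
static int wq_nregistered;
static int wq_pending[WQ_MAX];		/* registrations queued by handlers */
static int wq_npending;

static void wq_lock(void)
{
	assert(!wq_lock_held);		/* a real mutex would deadlock here */
	wq_lock_held = 1;
}

static void wq_unlock(void)
{
	wq_lock_held = 0;
}

/* Registration proper: takes the lock, so it must not run in a handler. */
static int wq_register_family(int id)
{
	int err = -1;

	wq_lock();
	if (wq_nregistered < WQ_MAX) {
		wq_registered[wq_nregistered++] = id;
		err = 0;
	}
	wq_unlock();
	return err;
}

/* Safe to call from a handler: only records the request. */
static int wq_defer_register(int id)
{
	if (wq_npending == WQ_MAX)
		return -1;
	wq_pending[wq_npending++] = id;
	return 0;
}

/* Message processing: the handler runs with the lock held. */
static int wq_process_message(int new_family_id)
{
	int err;

	wq_lock();
	err = wq_defer_register(new_family_id);
	wq_unlock();
	return err;
}

/* The "work item": runs later, outside of message processing. */
static void wq_run_deferred(void)
{
	int i;

	for (i = 0; i < wq_npending; i++)
		wq_register_family(wq_pending[i]);
	wq_npending = 0;
}
```

Calling wq_register_family() directly from wq_process_message() would hit the assertion in wq_lock(), which mirrors the situation Richard ran into with the single global lock.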

[NEIGH]: Combine neighbour cleanup and release

2007-07-22 Thread Thomas Graf

Introduces neigh_cleanup_and_release() to be used after a
neighbour has been removed from its neighbour table. Serves
as preparation to add event notifications.

Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: net-2.6/net/core/neighbour.c
===
--- net-2.6.orig/net/core/neighbour.c   2007-07-22 11:41:46.0 +0200
+++ net-2.6/net/core/neighbour.c   2007-07-22 11:42:02.0 +0200
@@ -104,6 +104,14 @@ static int neigh_blackhole(struct sk_buf
    return -ENETDOWN;
 }
 
+static void neigh_cleanup_and_release(struct neighbour *neigh)
+{
+   if (neigh->parms->neigh_cleanup)
+       neigh->parms->neigh_cleanup(neigh);
+
+   neigh_release(neigh);
+}
+
 /*
  * It is random distribution in the interval (1/2)*base...(3/2)*base.
  * It corresponds to default IPv6 settings and is not overridable,
@@ -140,9 +148,7 @@ static int neigh_forced_gc(struct neigh_
                n->dead = 1;
                shrunk  = 1;
                write_unlock(&n->lock);
-               if (n->parms->neigh_cleanup)
-                   n->parms->neigh_cleanup(n);
-               neigh_release(n);
+               neigh_cleanup_and_release(n);
                continue;
            }
            write_unlock(&n->lock);
@@ -213,9 +219,7 @@ static void neigh_flush_dev(struct neigh
                NEIGH_PRINTK2("neigh %p is stray.\n", n);
            }
            write_unlock(&n->lock);
-           if (n->parms->neigh_cleanup)
-               n->parms->neigh_cleanup(n);
-           neigh_release(n);
+           neigh_cleanup_and_release(n);
        }
    }
 }
@@ -676,9 +680,7 @@ static void neigh_periodic_timer(unsigne
            *np = n->next;
            n->dead = 1;
            write_unlock(&n->lock);
-           if (n->parms->neigh_cleanup)
-               n->parms->neigh_cleanup(n);
-           neigh_release(n);
+           neigh_cleanup_and_release(n);
            continue;
        }
        write_unlock(&n->lock);
@@ -2094,11 +2096,8 @@ void __neigh_for_each_release(struct nei
            } else
                np = &n->next;
            write_unlock(&n->lock);
-           if (release) {
-               if (n->parms->neigh_cleanup)
-                   n->parms->neigh_cleanup(n);
-               neigh_release(n);
-           }
+           if (release)
+               neigh_cleanup_and_release(n);
        }
    }
 }

[NEIGH]: Netlink notifications

2007-07-22 Thread Thomas Graf

Currently neighbour event notifications are limited to update
notifications and only sent if the ARP daemon is enabled. This
patch extends the existing notification code by also reporting
neighbours being removed due to gc or administratively and
removes the dependency on the ARP daemon. This makes it possible
to keep track of neighbour states without periodically fetching
the complete neighbour table.

Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: net-2.6/net/core/neighbour.c
===
--- net-2.6.orig/net/core/neighbour.c   2007-07-22 11:42:02.0 +0200
+++ net-2.6/net/core/neighbour.c   2007-07-22 11:49:15.0 +0200
@@ -54,9 +54,8 @@
 #define PNEIGH_HASHMASK 0xF
 
 static void neigh_timer_handler(unsigned long arg);
-#ifdef CONFIG_ARPD
-static void neigh_app_notify(struct neighbour *n);
-#endif
+static void __neigh_notify(struct neighbour *n, int type, int flags);
+static void neigh_update_notify(struct neighbour *neigh);
 static int pneigh_ifdown(struct neigh_table *tbl, struct net_device *dev);
 void neigh_changeaddr(struct neigh_table *tbl, struct net_device *dev);
 
@@ -109,6 +108,7 @@ static void neigh_cleanup_and_release(st
    if (neigh->parms->neigh_cleanup)
        neigh->parms->neigh_cleanup(neigh);
 
+   __neigh_notify(neigh, RTM_DELNEIGH, 0);
    neigh_release(neigh);
 }
 
@@ -829,13 +829,10 @@ static void neigh_timer_handler(unsigned
 out:
        write_unlock(&neigh->lock);
    }
+
    if (notify)
-       call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, neigh);
+       neigh_update_notify(neigh);
 
-#ifdef CONFIG_ARPD
-   if (notify && neigh->parms->app_probes)
-       neigh_app_notify(neigh);
-#endif
    neigh_release(neigh);
 }
 
@@ -1064,11 +1061,8 @@ out:
    write_unlock_bh(&neigh->lock);
 
    if (notify)
-       call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, neigh);
-#ifdef CONFIG_ARPD
-   if (notify && neigh->parms->app_probes)
-       neigh_app_notify(neigh);
-#endif
+       neigh_update_notify(neigh);
+
    return err;
 }
 
@@ -2001,6 +1995,11 @@ nla_put_failure:
    return -EMSGSIZE;
 }
 
+static void neigh_update_notify(struct neighbour *neigh)
+{
+   call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, neigh);
+   __neigh_notify(neigh, RTM_NEWNEIGH, 0);
+}
 
 static int neigh_dump_table(struct neigh_table *tbl, struct sk_buff *skb,
                             struct netlink_callback *cb)
@@ -2420,7 +2419,6 @@ static const struct file_operations neig
 
 #endif /* CONFIG_PROC_FS */
 
-#ifdef CONFIG_ARPD
 static inline size_t neigh_nlmsg_size(void)
 {
    return NLMSG_ALIGN(sizeof(struct ndmsg))
@@ -2452,16 +2450,11 @@ errout:
        rtnl_set_sk_err(RTNLGRP_NEIGH, err);
 }
 
+#ifdef CONFIG_ARPD
 void neigh_app_ns(struct neighbour *n)
 {
    __neigh_notify(n, RTM_GETNEIGH, NLM_F_REQUEST);
 }
-
-static void neigh_app_notify(struct neighbour *n)
-{
-   __neigh_notify(n, RTM_NEWNEIGH, 0);
-}
-
 #endif /* CONFIG_ARPD */
 
 #ifdef CONFIG_SYSCTL
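
With a change like this applied, a userspace listener could look roughly like the sketch below. It uses the standard rtnetlink socket interface (AF_NETLINK, NETLINK_ROUTE, the RTMGRP_NEIGH group mask). The helper names open_neigh_listener() and neigh_event_name() are made up for the example, and error handling is kept to a minimum.

```c
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Open a NETLINK_ROUTE socket subscribed to neighbour events.
 * Returns the descriptor, or -1 on error. */
static int open_neigh_listener(void)
{
	struct sockaddr_nl addr;
	int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);

	if (fd < 0)
		return -1;

	memset(&addr, 0, sizeof(addr));
	addr.nl_family = AF_NETLINK;
	addr.nl_groups = RTMGRP_NEIGH;	/* bitmask form of RTNLGRP_NEIGH */

	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Map a received message header to a short description. */
static const char *neigh_event_name(const struct nlmsghdr *nlh)
{
	switch (nlh->nlmsg_type) {
	case RTM_NEWNEIGH:
		return "new or updated";
	case RTM_DELNEIGH:
		return "removed";
	case RTM_GETNEIGH:
		return "probe request";	/* only sent with CONFIG_ARPD */
	default:
		return "other";
	}
}
```

A caller would recv() on the returned descriptor and walk the buffer with NLMSG_OK() and NLMSG_NEXT(), handing each header to neigh_event_name().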

[RTNETLINK]: Fix warning if !CONFIG_KMOD

2007-07-22 Thread Thomas Graf

replay label is unused otherwise.

Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: net-2.6/net/core/rtnetlink.c
===
--- net-2.6.orig/net/core/rtnetlink.c   2007-07-22 11:41:46.0 +0200
+++ net-2.6/net/core/rtnetlink.c   2007-07-22 12:04:27.0 +0200
@@ -952,7 +952,9 @@ static int rtnl_newlink(struct sk_buff *
    struct nlattr *linkinfo[IFLA_INFO_MAX+1];
    int err;
 
+#ifdef CONFIG_KMOD
 replay:
+#endif
    err = nlmsg_parse(nlh, sizeof(*ifm), tb, IFLA_MAX, ifla_policy);
    if (err < 0)
        return err;

Re: [PATCH] fix race in AF_UNIX

2007-06-18 Thread Thomas Graf

* Miklos Szeredi [EMAIL PROTECTED] 2007-06-18 11:44
 Garbage collection only ever happens, if the app is sending AF_UNIX
 sockets over AF_UNIX sockets.  Which is a rather rare case.  And which
 is basically why this bug went unnoticed for so long.
 
 So my second patch only affects the performance of _exactly_ those
 apps which might well be bitten by the bug itself.

That's not entirely true. It affects all applications using
AF_UNIX sockets while file descriptors are being transferred. I
agree that the performance impact is not severe on most systems,
but if file descriptors are being transferred continuously by just
a single application it can become rather severe.

Re: [PATCH] fix race in AF_UNIX

2007-06-18 Thread Thomas Graf

* Thomas Graf [EMAIL PROTECTED] 2007-06-18 12:32
 * Miklos Szeredi [EMAIL PROTECTED] 2007-06-18 11:44
  Garbage collection only ever happens, if the app is sending AF_UNIX
  sockets over AF_UNIX sockets.  Which is a rather rare case.  And which
  is basically why this bug went unnoticed for so long.
  
  So my second patch only affects the performance of _exactly_ those
  apps which might well be bitten by the bug itself.
 
 That's not entirely true. It affects all applications using
 AF_UNIX sockets while file descriptors are being transferred. I
 agree that the performance impact is not severe on most systems,
 but if file descriptors are being transferred continuously by just
 a single application it can become rather severe.

Also think of the scenario where an application, deliberately or not,
begins a file descriptor transfer using sendmsg() and the receiving
side never invokes recvmsg() to decrement the inflight counters
again. Every unix socket that gets closed would result in a gc call
locking all sockets.
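
For context, a file descriptor transfer over AF_UNIX looks roughly like this from userspace. The sketch uses the standard SCM_RIGHTS control message; send_fd() and recv_fd() are illustrative helper names, not an existing API. Between the sendmsg() and the matching recvmsg() the passed descriptor counts as in flight, which is the state the scenario above leaves hanging.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized and aligned for exactly one descriptor. */
union fd_cmsg_buf {
	char buf[CMSG_SPACE(sizeof(int))];
	struct cmsghdr align;
};

/* Pass fd over the AF_UNIX socket sock. Returns 0 on success. */
static int send_fd(int sock, int fd)
{
	char dummy = 'x';	/* at least one byte of real data is sent */
	struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
	union fd_cmsg_buf u;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	memset(&msg, 0, sizeof(msg));
	memset(&u, 0, sizeof(u));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor from sock. Returns it, or -1 on error. */
static int recv_fd(int sock)
{
	char dummy;
	struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
	union fd_cmsg_buf u;
	struct msghdr msg;
	struct cmsghdr *cmsg;
	int fd;

	memset(&msg, 0, sizeof(msg));
	memset(&u, 0, sizeof(u));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	if (recvmsg(sock, &msg, 0) <= 0)
		return -1;

	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;

	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```

A sender that calls send_fd() repeatedly while the peer never calls recv_fd() keeps the descriptors in flight for as long as they sit in the receive queue.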
