Re: [RFC net-next 3/3] mpls: new ipmpls device for encapsulating IP packets as mpls
On 06/02/15 at 01:26pm, Eric W. Biederman wrote:
> What we really want here is xfrm-lite. By lite I mean the tunnel
> selection criteria is simple enough that it fits into the normal
> routing table instead of having to do weird flow based magic that
> is rarely needed.
>
> I believe what we want are the xfrm stacking of dst entries.

I assume you are referring to reusing the selector and stacked dst.
I considered that for the transmit side.

Can you elaborate on this some more? What would this look like for
the specific case of VXLAN? Any thoughts on the receive side?

You also mention that you dislike the net_device approach. What do
you suggest instead? The encapsulation is often postponed until after
the packet is fully constructed. Where should it get hooked into?

-- 
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Re: [RFC net-next 0/3] IP imposition of per-nh MPLS encap
On 06/02/15 at 02:28pm, Robert Shearman wrote:
> Nesting attributes inside the RTA_ENCAP blob should be supported by
> the patch series today. Something like this:

Sure. I'm not seeing such a construct for the MPLS case yet.

I'm happy to rebase my patches on top of your nexthop implementation.
It is definitely superior. Are you maintaining a git tree somewhere?
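In netlink, a nested attribute is an ordinary attribute whose payload is itself a sequence of attributes, each padded to a 4-byte boundary, with the outer length covering everything inside it. The following is a minimal userspace sketch of that layout, assuming a header shaped like `struct rtattr` (16-bit length, 16-bit type). The attribute numbers, the `buf` type and all helper names are hypothetical; they are not the values or APIs used by the RTA_ENCAP patches.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same shape as struct rtattr: total length (header + payload), then type. */
struct attr_hdr {
	uint16_t len;
	uint16_t type;
};

#define ATTR_ALIGNTO	((size_t)4)
#define ATTR_ALIGN(n)	(((n) + ATTR_ALIGNTO - 1) & ~(ATTR_ALIGNTO - 1))
#define ATTR_HDRLEN	ATTR_ALIGN(sizeof(struct attr_hdr))
#define ATTR_ERR	((size_t)-1)

/* Hypothetical attribute numbers, for illustration only. */
enum { EX_ATTR_ENCAP = 1 };		/* outer container (RTA_ENCAP's role) */
enum { EX_ENCAP_MPLS_LABELS = 1 };	/* nested: array of 32-bit labels */

struct buf {
	unsigned char data[256];
	size_t len;
};

/* Append one attribute; returns its offset, or ATTR_ERR if it does not fit. */
static size_t attr_put(struct buf *b, uint16_t type, const void *payload,
		       size_t plen)
{
	size_t off = b->len;
	size_t total = ATTR_HDRLEN + ATTR_ALIGN(plen);
	struct attr_hdr h;

	if (total > sizeof(b->data) - off || ATTR_HDRLEN + plen > UINT16_MAX)
		return ATTR_ERR;

	h.len = (uint16_t)(ATTR_HDRLEN + plen);	/* excludes trailing pad */
	h.type = type;
	memcpy(b->data + off, &h, sizeof(h));
	if (plen)
		memcpy(b->data + off + ATTR_HDRLEN, payload, plen);
	/* Zero the alignment padding after the payload. */
	memset(b->data + off + ATTR_HDRLEN + plen, 0, ATTR_ALIGN(plen) - plen);
	b->len = off + total;
	return off;
}

/* A nest starts as an attribute with an empty payload ... */
static size_t nest_start(struct buf *b, uint16_t type)
{
	return attr_put(b, type, NULL, 0);
}

/* ... and its length is patched once all children have been appended. */
static void nest_end(struct buf *b, size_t start)
{
	struct attr_hdr h;

	if (start == ATTR_ERR)
		return;
	memcpy(&h, b->data + start, sizeof(h));
	h.len = (uint16_t)(b->len - start);
	memcpy(b->data + start, &h, sizeof(h));
}
```

With two 32-bit labels nested inside the container, the outer header reports 16 bytes: its own 4-byte header plus the 12-byte inner attribute (4-byte header, 8 bytes of labels).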
[net-next RFC 04/14] route: Extend flow representation with tunnel key
Add a new flowi_tunnel structure which is a subset of ip_tunnel_key to
allow routes to match on tunnel metadata. For now, the tunnel id is
added to flowi_tunnel which allows for routes to be bound to specific
virtual tunnels.

Signed-off-by: Thomas Graf <tg...@suug.ch>
---
 include/net/flow.h | 7 +++
 include/net/ip_tunnels.h | 10 ++
 net/ipv4/route.c | 2 ++
 3 files changed, 19 insertions(+)

diff --git a/include/net/flow.h b/include/net/flow.h
index 8109a15..c15fb5e 100644
--- a/include/net/flow.h
+++ b/include/net/flow.h
@@ -19,6 +19,10 @@
 
 #define LOOPBACK_IFINDEX	1
 
+struct flowi_tunnel {
+	__be64			tun_id;
+};
+
 struct flowi_common {
 	int	flowic_oif;
 	int	flowic_iif;
@@ -30,6 +34,7 @@ struct flowi_common {
 #define FLOWI_FLAG_ANYSRC		0x01
 #define FLOWI_FLAG_KNOWN_NH		0x02
 	__u32	flowic_secid;
+	struct flowi_tunnel flowic_tun_key;
 };
 
 union flowi_uli {
@@ -66,6 +71,7 @@ struct flowi4 {
 #define flowi4_proto		__fl_common.flowic_proto
 #define flowi4_flags		__fl_common.flowic_flags
 #define flowi4_secid		__fl_common.flowic_secid
+#define flowi4_tun_key		__fl_common.flowic_tun_key
 
 	/* (saddr,daddr) must be grouped, same order as in IP header */
 	__be32			saddr;
@@ -165,6 +171,7 @@ struct flowi {
 #define flowi_proto	u.__fl_common.flowic_proto
 #define flowi_flags	u.__fl_common.flowic_flags
 #define flowi_secid	u.__fl_common.flowic_secid
+#define flowi_tun_key	u.__fl_common.flowic_tun_key
 } __attribute__((__aligned__(BITS_PER_LONG/8)));
 
 static inline struct flowi *flowi4_to_flowi(struct flowi4 *fl4)
diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
index 8b76ba1..df8cfd3 100644
--- a/include/net/ip_tunnels.h
+++ b/include/net/ip_tunnels.h
@@ -12,6 +12,7 @@
 #include <net/ip.h>
 #include <net/netns/generic.h>
 #include <net/rtnetlink.h>
+#include <net/flow.h>
 
 #if IS_ENABLED(CONFIG_IPV6)
 #include <net/ipv6.h>
@@ -337,6 +338,15 @@ static inline void *ip_tunnel_info_opts(struct ip_tunnel_info *info,
 	return info + 1;
 }
 
+static inline void ip_tunnel_derive_key(struct sk_buff *skb,
+					struct flowi_tunnel *key)
+{
+	struct ip_tunnel_info *tun_info = skb_shinfo(skb)->tun_info;
+
+	if (tun_info && tun_info->mode == IP_TUNNEL_INFO_RX)
+		key->tun_id = tun_info->key.tun_id;
+}
+
 #endif /* CONFIG_INET */
 
 #endif /* __NET_IP_TUNNELS_H */
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index f605598..6e8e1be 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -109,6 +109,7 @@
 #include <linux/kmemleak.h>
 #endif
 #include <net/secure_seq.h>
+#include <net/ip_tunnels.h>
 
 #define RT_FL_TOS(oldflp4) \
 	((oldflp4)->flowi4_tos & (IPTOS_RT_MASK | RTO_ONLINK))
@@ -1716,6 +1717,7 @@ static int ip_route_input_slow(struct sk_buff *skb, __be32 daddr, __be32 saddr,
 	fl4.flowi4_scope = RT_SCOPE_UNIVERSE;
 	fl4.daddr = daddr;
 	fl4.saddr = saddr;
+	ip_tunnel_derive_key(skb, &fl4.flowi4_tun_key);
 	err = fib_lookup(net, &fl4, &res);
 	if (err != 0) {
 		if (!IN_DEV_FORWARD(in_dev))
-- 
2.3.5
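The lookup key only picks up tunnel metadata that was attached on receive; transmit instructions that happen to sit on the skb are ignored. A simplified userspace model of that rule is sketched below. The types are stand-ins rather than the kernel structures, `tun_id` is kept in host byte order for readability (the kernel uses `__be64`), and `build_key()` is a hypothetical helper. Because `derive_key()` writes the field only on the receive path, the caller has to start from a zeroed key; the sketch does that with a designated initializer.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the structures used by the patch. */
enum { INFO_RX, INFO_TX };

struct tunnel_info {
	uint64_t tun_id;	/* kernel: __be64 in network byte order */
	int mode;		/* INFO_RX or INFO_TX */
};

struct flow_tunnel {
	uint64_t tun_id;
};

struct flow_key {
	uint32_t daddr;
	uint32_t saddr;
	struct flow_tunnel tun;
};

/* Same rule as ip_tunnel_derive_key(): only receive-side metadata
 * contributes to the lookup key. */
static void derive_key(const struct tunnel_info *info, struct flow_tunnel *key)
{
	if (info && info->mode == INFO_RX)
		key->tun_id = info->tun_id;
}

/* Hypothetical helper: unnamed members (tun) are zero-initialized, so a
 * packet without RX metadata gets tun_id == 0 rather than stack garbage. */
static struct flow_key build_key(uint32_t daddr, uint32_t saddr,
				 const struct tunnel_info *info)
{
	struct flow_key fl = { .daddr = daddr, .saddr = saddr };

	derive_key(info, &fl.tun);
	return fl;
}
```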
[net-next RFC 02/14] ip_tunnel: support per packet tunnel metadata
This allows attaching an ip_tunnel_info metadata structure to skbs via
skb_shared_info to represent receive side tunnel information as well
as transmit side encapsulation instructions. The new field is added to
skb_shared_info as the field is typically immutable after it has been
attached.

A new flag indicates whether the metadata is meant for receive or
transmit. This allows receive metadata to stay attached to the skb all
the way through the forwarding path without being mistaken for
transmit instructions. The tun_info pointer is thus only released if a
packet which has been received on a tunnel is forwarded to a tunnel
device again.

Since transmit instructions are immutable for the flow that attaches
them to the skb, a reference count is introduced which allows the
metadata to be reused for many packets. Therefore, when a route later
on receives the capability to attach tunnel metadata, it will only have
to allocate the metadata once and can simply increment the reference
counter for each packet that uses that instruction set.

Signed-off-by: Thomas Graf <tg...@suug.ch>
---
 include/linux/skbuff.h | 1 +
 include/net/ip_tunnels.h | 45 +
 net/core/skbuff.c | 8
 net/ipv4/ip_tunnel_core.c | 15 +++
 4 files changed, 69 insertions(+)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 6b41c15..83f9a59 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -323,6 +323,7 @@ struct skb_shared_info {
 	unsigned short	gso_segs;
 	unsigned short  gso_type;
 	struct sk_buff	*frag_list;
+	struct ip_tunnel_info *tun_info;
 	struct skb_shared_hwtstamps hwtstamps;
 	u32		tskey;
 	__be32          ip6_frag_id;
diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
index 6b9d559..3968705 100644
--- a/include/net/ip_tunnels.h
+++ b/include/net/ip_tunnels.h
@@ -38,10 +38,20 @@ struct ip_tunnel_key {
 	__be16			tp_dst;
 } __packed __aligned(4); /* Minimize padding. */
 
+/* Indicates whether the tunnel info structure represents receive
+ * or transmit tunnel parameters.
+ */
+enum {
+	IP_TUNNEL_INFO_RX,
+	IP_TUNNEL_INFO_TX,
+};
+
 struct ip_tunnel_info {
 	struct ip_tunnel_key	key;
 	const void		*options;
+	atomic_t		refcnt;
 	u8			options_len;
+	u8			mode;
 };
 
 /* 6rd prefix/relay information */
@@ -284,6 +294,41 @@ static inline void iptunnel_xmit_stats(int err,
 	}
 }
 
+struct ip_tunnel_info *ip_tunnel_info_alloc(size_t optslen, gfp_t flags);
+
+static inline void ip_tunnel_info_get(struct ip_tunnel_info *info)
+{
+	atomic_inc(&info->refcnt);
+}
+
+static inline void ip_tunnel_info_put(struct ip_tunnel_info *info)
+{
+	if (!info)
+		return;
+
+	if (atomic_dec_and_test(&info->refcnt))
+		kfree(info);
+}
+
+static inline int skb_attach_tunnel_info(struct sk_buff *skb,
+					 struct ip_tunnel_info *info)
+{
+	if (skb_unclone(skb, GFP_ATOMIC))
+		return -ENOMEM;
+
+	ip_tunnel_info_put(skb_shinfo(skb)->tun_info);
+	ip_tunnel_info_get(info);
+	skb_shinfo(skb)->tun_info = info;
+
+	return 0;
+}
+
+static inline void skb_release_tunnel_info(struct sk_buff *skb)
+{
+	ip_tunnel_info_put(skb_shinfo(skb)->tun_info);
+	skb_shinfo(skb)->tun_info = NULL;
+}
+
 #endif /* CONFIG_INET */
 
 #endif /* __NET_IP_TUNNELS_H */
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 9bac0e6..dbbace2 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -69,6 +69,7 @@
 #include <net/sock.h>
 #include <net/checksum.h>
 #include <net/ip6_checksum.h>
+#include <net/ip_tunnels.h>
 #include <net/xfrm.h>
 
 #include <asm/uaccess.h>
@@ -594,6 +595,8 @@ static void skb_release_data(struct sk_buff *skb)
 			uarg->callback(uarg, true);
 	}
 
+	ip_tunnel_info_put(shinfo->tun_info);
+
 	if (shinfo->frag_list)
 		kfree_skb_list(shinfo->frag_list);
 
@@ -985,6 +988,11 @@ static void copy_skb_header(struct sk_buff *new, const struct sk_buff *old)
 	skb_shinfo(new)->gso_size = skb_shinfo(old)->gso_size;
 	skb_shinfo(new)->gso_segs = skb_shinfo(old)->gso_segs;
 	skb_shinfo(new)->gso_type = skb_shinfo(old)->gso_type;
+
+	if (skb_shinfo(old)->tun_info) {
+		ip_tunnel_info_get(skb_shinfo(old)->tun_info);
+		skb_shinfo(new)->tun_info = skb_shinfo(old)->tun_info;
+	}
 }
 
 static inline int skb_alloc_rx_flag(const struct sk_buff *skb)
diff --git a/net/ipv4/ip_tunnel_core.c b/net/ipv4/ip_tunnel_core.c
index 6a51a71..bbd4f91 100644
--- a/net/ipv4/ip_tunnel_core.c
+++ b/net/ipv4/ip_tunnel_core.c
@@ -190,3 +190,18 @@ struct rtnl_link_stats64 *ip_tunnel_get_stats64(struct net_device *dev,
 	return tot;
 }
 EXPORT_SYMBOL_GPL(ip_tunnel_get_stats64
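The reference counting described above lets one allocation back many packets: the owner (for example a route) keeps one reference and every packet carrying the metadata holds another. The sketch below models that in userspace with C11 atomics. `struct packet` stands in for the skb's shared info, the `frees` counter exists only so the behaviour can be observed, and the `skb_unclone()` step is left out. One deliberate difference from `skb_attach_tunnel_info()` in the patch: the sketch takes the new reference before dropping the old one, so re-attaching the same metadata to a packet cannot free it in between.

```c
#include <stdatomic.h>
#include <stdlib.h>

struct tunnel_info {
	atomic_int refcnt;
	unsigned long long tun_id;
};

struct packet {
	struct tunnel_info *tun_info;	/* role of skb_shinfo(skb)->tun_info */
};

static int frees;	/* counts releases, for the example only */

/* The creator (e.g. a route) owns the initial reference. */
static struct tunnel_info *info_alloc(unsigned long long tun_id)
{
	struct tunnel_info *info = malloc(sizeof(*info));

	if (!info)
		return NULL;
	atomic_init(&info->refcnt, 1);
	info->tun_id = tun_id;
	return info;
}

static void info_get(struct tunnel_info *info)
{
	atomic_fetch_add(&info->refcnt, 1);
}

static void info_put(struct tunnel_info *info)
{
	if (!info)
		return;
	/* atomic_fetch_sub() returns the old value: 1 means last reference. */
	if (atomic_fetch_sub(&info->refcnt, 1) == 1) {
		free(info);
		frees++;
	}
}

/* Attach: take the new reference first, then drop the previous one. */
static void packet_attach(struct packet *p, struct tunnel_info *info)
{
	struct tunnel_info *old = p->tun_info;

	info_get(info);
	p->tun_info = info;
	info_put(old);
}

/* Role of skb_release_data(): a freed packet drops its reference. */
static void packet_free(struct packet *p)
{
	info_put(p->tun_info);
	p->tun_info = NULL;
}
```

After three packets share one allocation the count is 4 (route plus three packets); the memory is released only when the route drops its own reference after the packets are gone.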
[net-next RFC 01/14] ip_tunnel: Make ovs_tunnel_info and ovs_key_ipv4_tunnel generic
Rename the tunnel metadata data structures currently internal to OVS
and make them generic for use by all IP tunnels.

Both structures are kernel internal and will stay that way. Their
members are exposed to user space through individual Netlink
attributes by OVS. It will therefore be possible to extend/modify
these structures without affecting user ABI.

Signed-off-by: Thomas Graf <tg...@suug.ch>
---
 include/net/ip_tunnels.h | 63 +
 include/uapi/linux/openvswitch.h | 2 +-
 net/openvswitch/actions.c | 2 +-
 net/openvswitch/datapath.h | 5 +--
 net/openvswitch/flow.c | 4 +--
 net/openvswitch/flow.h | 76 ++--
 net/openvswitch/flow_netlink.c | 16 -
 net/openvswitch/flow_netlink.h | 2 +-
 net/openvswitch/vport-geneve.c | 17 +
 net/openvswitch/vport-gre.c | 16 -
 net/openvswitch/vport-vxlan.c | 18 +-
 net/openvswitch/vport.c | 30
 net/openvswitch/vport.h | 12 +++
 13 files changed, 128 insertions(+), 135 deletions(-)

diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
index d8214cb..6b9d559 100644
--- a/include/net/ip_tunnels.h
+++ b/include/net/ip_tunnels.h
@@ -22,6 +22,28 @@
 /* Keep error state on tunnel for 30 sec */
 #define IPTUNNEL_ERR_TIMEO	(30*HZ)
 
+/* Used to memset ip_tunnel padding. */
+#define IP_TUNNEL_KEY_SIZE					\
+	(offsetof(struct ip_tunnel_key, tp_dst) +		\
+	 FIELD_SIZEOF(struct ip_tunnel_key, tp_dst))
+
+struct ip_tunnel_key {
+	__be64			tun_id;
+	__be32			ipv4_src;
+	__be32			ipv4_dst;
+	__be16			tun_flags;
+	__u8			ipv4_tos;
+	__u8			ipv4_ttl;
+	__be16			tp_src;
+	__be16			tp_dst;
+} __packed __aligned(4); /* Minimize padding. */
+
+struct ip_tunnel_info {
+	struct ip_tunnel_key	key;
+	const void		*options;
+	u8			options_len;
+};
+
 /* 6rd prefix/relay information */
 #ifdef CONFIG_IPV6_SIT_6RD
 struct ip_tunnel_6rd_parm {
@@ -136,6 +158,47 @@ int ip_tunnel_encap_add_ops(const struct ip_tunnel_encap_ops *op,
 int ip_tunnel_encap_del_ops(const struct ip_tunnel_encap_ops *op,
 			    unsigned int num);
 
+static inline void __ip_tunnel_info_init(struct ip_tunnel_info *tun_info,
+					 __be32 saddr, __be32 daddr,
+					 u8 tos, u8 ttl,
+					 __be16 tp_src, __be16 tp_dst,
+					 __be64 tun_id, __be16 tun_flags,
+					 const void *opts, u8 opts_len)
+{
+	tun_info->key.tun_id = tun_id;
+	tun_info->key.ipv4_src = saddr;
+	tun_info->key.ipv4_dst = daddr;
+	tun_info->key.ipv4_tos = tos;
+	tun_info->key.ipv4_ttl = ttl;
+	tun_info->key.tun_flags = tun_flags;
+
+	/* For the tunnel types on the top of IPsec, the tp_src and tp_dst of
+	 * the upper tunnel are used.
+	 * E.g: GRE over IPSEC, the tp_src and tp_port are zero.
+	 */
+	tun_info->key.tp_src = tp_src;
+	tun_info->key.tp_dst = tp_dst;
+
+	/* Clear struct padding. */
+	if (sizeof(tun_info->key) != IP_TUNNEL_KEY_SIZE)
+		memset((unsigned char *)&tun_info->key + IP_TUNNEL_KEY_SIZE,
+		       0, sizeof(tun_info->key) - IP_TUNNEL_KEY_SIZE);
+
+	tun_info->options = opts;
+	tun_info->options_len = opts_len;
+}
+
+static inline void ip_tunnel_info_init(struct ip_tunnel_info *tun_info,
+				       const struct iphdr *iph,
+				       __be16 tp_src, __be16 tp_dst,
+				       __be64 tun_id, __be16 tun_flags,
+				       const void *opts, u8 opts_len)
+{
+	__ip_tunnel_info_init(tun_info, iph->saddr, iph->daddr,
+			      iph->tos, iph->ttl, tp_src, tp_dst,
+			      tun_id, tun_flags, opts, opts_len);
+}
+
 #ifdef CONFIG_INET
 
 int ip_tunnel_init(struct net_device *dev);
diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index bbd49a0..fffe317 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -319,7 +319,7 @@ enum ovs_key_attr {
 				 * the accepted length of the array. */
 
 #ifdef __KERNEL__
-	OVS_KEY_ATTR_TUNNEL_INFO,  /* struct ovs_tunnel_info */
+	OVS_KEY_ATTR_TUNNEL_INFO,  /* struct ip_tunnel_info */
 #endif
 	__OVS_KEY_ATTR_MAX
 };
diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index b491c1c..34cad57 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -610,7
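`IP_TUNNEL_KEY_SIZE` marks where the last member of the key ends, and `__ip_tunnel_info_init()` zeroes everything from there up to `sizeof(key)`, so two keys with equal fields are also equal byte for byte (which matters if keys are compared or hashed as raw memory). A userspace sketch of the same technique follows. `struct tun_key` mirrors the field order of `struct ip_tunnel_key` but uses fixed-width host types and drops the `__packed __aligned(4)` attributes; all names are illustrative. With this field order the key happens to have no padding at all, so the `memset()` is a no-op here and only takes effect if members are added or reordered so that a tail gap appears.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Userspace mirror of struct ip_tunnel_key (host-order stand-in types). */
struct tun_key {
	uint64_t tun_id;
	uint32_t ipv4_src;
	uint32_t ipv4_dst;
	uint16_t tun_flags;
	uint8_t  ipv4_tos;
	uint8_t  ipv4_ttl;
	uint16_t tp_src;
	uint16_t tp_dst;
};

/* Same construction as IP_TUNNEL_KEY_SIZE: the offset just past the last
 * member. sizeof(((T *)0)->m) is what the kernel's FIELD_SIZEOF() expands to. */
#define TUN_KEY_SIZE \
	(offsetof(struct tun_key, tp_dst) + sizeof(((struct tun_key *)0)->tp_dst))

static void tun_key_init(struct tun_key *k, uint64_t id, uint32_t src,
			 uint32_t dst, uint8_t tos, uint8_t ttl,
			 uint16_t tp_src, uint16_t tp_dst, uint16_t flags)
{
	k->tun_id = id;
	k->ipv4_src = src;
	k->ipv4_dst = dst;
	k->ipv4_tos = tos;
	k->ipv4_ttl = ttl;
	k->tun_flags = flags;
	k->tp_src = tp_src;
	k->tp_dst = tp_dst;

	/* Like the kernel macro, only padding after the last member is
	 * cleared; the field order above leaves no interior holes. */
	if (sizeof(*k) != TUN_KEY_SIZE)
		memset((unsigned char *)k + TUN_KEY_SIZE, 0,
		       sizeof(*k) - TUN_KEY_SIZE);
}
```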
[net-next RFC 11/14] openvswitch: Use regular VXLAN net_device device
This gets rid of all OVS specific VXLAN code in the receive and
transmit path by using a VXLAN net_device to represent the vport.
Only a small shim layer remains which takes care of handling the
VXLAN specific OVS Netlink configuration.

Unexports vxlan_sock_add(), vxlan_sock_release(), vxlan_xmit_skb()
since they are no longer needed.

Signed-off-by: Thomas Graf <tg...@suug.ch>
Signed-off-by: Pravin B Shelar <pshe...@nicira.com>
---
 drivers/net/vxlan.c | 23 +--
 include/net/vxlan.h | 14 +-
 net/openvswitch/Kconfig | 12 --
 net/openvswitch/Makefile | 1 -
 net/openvswitch/flow_netlink.c | 5 +-
 net/openvswitch/vport-netdev.c | 176 +-
 net/openvswitch/vport-vxlan.c | 322 -
 7 files changed, 193 insertions(+), 360 deletions(-)
 delete mode 100644 net/openvswitch/vport-vxlan.c

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 3acab95..b696871 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -74,6 +74,10 @@ static struct rtnl_link_ops vxlan_link_ops;
 
 static const u8 all_zeros_mac[ETH_ALEN];
 
+static struct vxlan_sock *vxlan_sock_add(struct net *net, __be16 port,
+					 vxlan_rcv_t *rcv, void *data,
+					 bool no_share, u32 flags);
+
 /* per-network namespace private data for this module */
 struct vxlan_net {
 	struct list_head  vxlan_list;
@@ -1020,7 +1024,7 @@ static bool vxlan_group_used(struct vxlan_net *vn, struct vxlan_dev *dev)
 	return false;
 }
 
-void vxlan_sock_release(struct vxlan_sock *vs)
+static void vxlan_sock_release(struct vxlan_sock *vs)
 {
 	struct sock *sk = vs->sock->sk;
 	struct net *net = sock_net(sk);
@@ -1036,7 +1040,6 @@ void vxlan_sock_release(struct vxlan_sock *vs)
 
 	queue_work(vxlan_wq, &vs->del_work);
 }
-EXPORT_SYMBOL_GPL(vxlan_sock_release);
 
 /* Update multicast group membership when first VNI on
  * multicast address is brought up
@@ -1761,10 +1764,10 @@ err:
 }
 #endif
 
-int vxlan_xmit_skb(struct rtable *rt, struct sock *sk, struct sk_buff *skb,
-		   __be32 src, __be32 dst, __u8 tos, __u8 ttl, __be16 df,
-		   __be16 src_port, __be16 dst_port,
-		   struct vxlan_metadata *md, bool xnet, u32 vxflags)
+static int vxlan_xmit_skb(struct rtable *rt, struct sock *sk, struct sk_buff *skb,
+			  __be32 src, __be32 dst, __u8 tos, __u8 ttl, __be16 df,
+			  __be16 src_port, __be16 dst_port,
+			  struct vxlan_metadata *md, bool xnet, u32 vxflags)
 {
 	struct vxlanhdr *vxh;
 	int min_headroom;
@@ -1834,7 +1837,6 @@ int vxlan_xmit_skb(struct rtable *rt, struct sock *sk, struct sk_buff *skb,
 			     ttl, df, src_port, dst_port, xnet,
 			     !(vxflags & VXLAN_F_UDP_CSUM));
 }
-EXPORT_SYMBOL_GPL(vxlan_xmit_skb);
 
 /* Bypass encapsulation if the destination is local */
 static void vxlan_encap_bypass(struct sk_buff *skb, struct vxlan_dev *src_vxlan,
@@ -2609,9 +2611,9 @@ static struct vxlan_sock *vxlan_socket_create(struct net *net, __be16 port,
 	return vs;
 }
 
-struct vxlan_sock *vxlan_sock_add(struct net *net, __be16 port,
-				  vxlan_rcv_t *rcv, void *data,
-				  bool no_share, u32 flags)
+static struct vxlan_sock *vxlan_sock_add(struct net *net, __be16 port,
+					 vxlan_rcv_t *rcv, void *data,
+					 bool no_share, u32 flags)
 {
 	struct vxlan_net *vn = net_generic(net, vxlan_net_id);
 	struct vxlan_sock *vs;
@@ -2632,7 +2634,6 @@ struct vxlan_sock *vxlan_sock_add(struct net *net, __be16 port,
 
 	return vxlan_socket_create(net, port, rcv, data, flags);
 }
-EXPORT_SYMBOL_GPL(vxlan_sock_add);
 
 static int vxlan_dev_configure(struct net *src_net, struct net_device *dev,
 			       struct vxlan_config *conf)
diff --git a/include/net/vxlan.h b/include/net/vxlan.h
index c037b27..d3ce81f 100644
--- a/include/net/vxlan.h
+++ b/include/net/vxlan.h
@@ -197,19 +197,13 @@ struct vxlan_dev {
 			 VXLAN_F_REMCSUM_NOPARTIAL |	\
 			 VXLAN_F_FLOW_BASED)
 
-struct vxlan_sock *vxlan_sock_add(struct net *net, __be16 port,
-				  vxlan_rcv_t *rcv, void *data,
-				  bool no_share, u32 flags);
-
 struct net_device *vxlan_dev_create(struct net *net, const char *name,
 				    u8 name_assign_type, struct vxlan_config *conf);
 
-void vxlan_sock_release(struct vxlan_sock *vs);
-
-int vxlan_xmit_skb(struct rtable *rt, struct sock *sk, struct sk_buff *skb,
-		   __be32 src, __be32 dst, __u8 tos, __u8 ttl, __be16 df,
-		   __be16 src_port, __be16 dst_port, struct
[net-next RFC 10/14] openvswitch: Abstract vport name through ovs_vport_name()
This makes it possible to get rid of the get_name() vport op later on.

Signed-off-by: Thomas Graf <tg...@suug.ch>
---
 net/openvswitch/datapath.c | 4 ++--
 net/openvswitch/vport-internal_dev.c | 1 -
 net/openvswitch/vport-netdev.c | 6 -
 net/openvswitch/vport-netdev.h | 1 -
 net/openvswitch/vport.c | 4 ++--
 net/openvswitch/vport.h | 5 +
 6 files changed, 9 insertions(+), 12 deletions(-)

diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index c3ecfd4..8986558 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -176,7 +176,7 @@ static inline struct datapath *get_dp(struct net *net, int dp_ifindex)
 const char *ovs_dp_name(const struct datapath *dp)
 {
 	struct vport *vport = ovs_vport_ovsl_rcu(dp, OVSP_LOCAL);
-	return vport->ops->get_name(vport);
+	return ovs_vport_name(vport);
 }
 
 static int get_dpifindex(const struct datapath *dp)
@@ -1786,7 +1786,7 @@ static int ovs_vport_cmd_fill_info(struct vport *vport, struct sk_buff *skb,
 	if (nla_put_u32(skb, OVS_VPORT_ATTR_PORT_NO, vport->port_no) ||
 	    nla_put_u32(skb, OVS_VPORT_ATTR_TYPE, vport->ops->type) ||
 	    nla_put_string(skb, OVS_VPORT_ATTR_NAME,
-			   vport->ops->get_name(vport)))
+			   ovs_vport_name(vport)))
 		goto nla_put_failure;
 
 	ovs_vport_get_stats(vport, &vport_stats);
diff --git a/net/openvswitch/vport-internal_dev.c b/net/openvswitch/vport-internal_dev.c
index a2c205d..c058bbf 100644
--- a/net/openvswitch/vport-internal_dev.c
+++ b/net/openvswitch/vport-internal_dev.c
@@ -242,7 +242,6 @@ static struct vport_ops ovs_internal_vport_ops = {
 	.type		= OVS_VPORT_TYPE_INTERNAL,
 	.create		= internal_dev_create,
 	.destroy	= internal_dev_destroy,
-	.get_name	= ovs_netdev_get_name,
 	.send		= internal_dev_recv,
 };
 
diff --git a/net/openvswitch/vport-netdev.c b/net/openvswitch/vport-netdev.c
index cb22051..ef11a41 100644
--- a/net/openvswitch/vport-netdev.c
+++ b/net/openvswitch/vport-netdev.c
@@ -170,11 +170,6 @@ static void netdev_destroy(struct vport *vport)
 	call_rcu(&vport->rcu, free_port_rcu);
 }
 
-const char *ovs_netdev_get_name(const struct vport *vport)
-{
-	return vport->dev->name;
-}
-
 static unsigned int packet_length(const struct sk_buff *skb)
 {
 	unsigned int length = skb->len - ETH_HLEN;
@@ -222,7 +217,6 @@ static struct vport_ops ovs_netdev_vport_ops = {
 	.type		= OVS_VPORT_TYPE_NETDEV,
 	.create		= netdev_create,
 	.destroy	= netdev_destroy,
-	.get_name	= ovs_netdev_get_name,
 	.send		= netdev_send,
 };
 
diff --git a/net/openvswitch/vport-netdev.h b/net/openvswitch/vport-netdev.h
index 1c52aed..684fb88 100644
--- a/net/openvswitch/vport-netdev.h
+++ b/net/openvswitch/vport-netdev.h
@@ -26,7 +26,6 @@
 
 struct vport *ovs_netdev_get_vport(struct net_device *dev);
 
-const char *ovs_netdev_get_name(const struct vport *);
 void ovs_netdev_detach_dev(struct vport *);
 
 int __init ovs_netdev_init(void);
diff --git a/net/openvswitch/vport.c b/net/openvswitch/vport.c
index af23ba0..d14f594 100644
--- a/net/openvswitch/vport.c
+++ b/net/openvswitch/vport.c
@@ -113,7 +113,7 @@ struct vport *ovs_vport_locate(const struct net *net, const char *name)
 	struct vport *vport;
 
 	hlist_for_each_entry_rcu(vport, bucket, hash_node)
-		if (!strcmp(name, vport->ops->get_name(vport)) &&
+		if (!strcmp(name, ovs_vport_name(vport)) &&
 		    net_eq(ovs_dp_get_net(vport->dp), net))
 			return vport;
 
@@ -226,7 +226,7 @@ struct vport *ovs_vport_add(const struct vport_parms *parms)
 		}
 
 		bucket = hash_bucket(ovs_dp_get_net(vport->dp),
-				     vport->ops->get_name(vport));
+				     ovs_vport_name(vport));
 		hlist_add_head_rcu(&vport->hash_node, bucket);
 		return vport;
 	}
diff --git a/net/openvswitch/vport.h b/net/openvswitch/vport.h
index e05ec68..1a689c2 100644
--- a/net/openvswitch/vport.h
+++ b/net/openvswitch/vport.h
@@ -237,6 +237,11 @@ static inline void ovs_skb_postpush_rcsum(struct sk_buff *skb,
 		skb->csum = csum_add(skb->csum, csum_partial(start, len, 0));
 }
 
+static inline const char *ovs_vport_name(struct vport *vport)
+{
+	return vport->dev ? vport->dev->name : vport->ops->get_name(vport);
+}
+
 int ovs_vport_ops_register(struct vport_ops *ops);
 void ovs_vport_ops_unregister(struct vport_ops *ops);
 
-- 
2.3.5
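`ovs_vport_name()` prefers the name of the backing net_device and falls back to the old `get_name()` callback only for vports that do not have one yet, which lets call sites be converted before the callback is removed. A userspace model of that fallback is sketched below; the structures are trimmed-down stand-ins, and `legacy_name` / `legacy_get_name()` are hypothetical.

```c
#include <stddef.h>

#define IFNAMSIZ 16	/* same value as in <linux/if.h> */

struct net_device {
	char name[IFNAMSIZ];
};

struct vport;

struct vport_ops {
	const char *(*get_name)(const struct vport *vport);
};

struct vport {
	struct net_device *dev;		/* NULL for vports not yet converted */
	const struct vport_ops *ops;
	const char *legacy_name;	/* hypothetical per-type name storage */
};

/* Same shape as ovs_vport_name(): device name first, callback as fallback. */
static const char *vport_name(const struct vport *vport)
{
	return vport->dev ? vport->dev->name : vport->ops->get_name(vport);
}

static const char *legacy_get_name(const struct vport *vport)
{
	return vport->legacy_name;
}

static const struct vport_ops legacy_ops = { .get_name = legacy_get_name };
```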
[net-next RFC 09/14] openvswitch: Move dev pointer into vport itself
This is the first step in representing all OVS vports as regular
struct net_devices. Move the net_device pointer into the vport
structure itself to get rid of struct netdev_vport.

Signed-off-by: Thomas Graf <tg...@suug.ch>
Signed-off-by: Pravin B Shelar <pshe...@nicira.com>
---
 net/openvswitch/datapath.c | 7 +--
 net/openvswitch/dp_notify.c | 5 +--
 net/openvswitch/vport-internal_dev.c | 37 +++-
 net/openvswitch/vport-netdev.c | 84
 net/openvswitch/vport-netdev.h | 12 --
 net/openvswitch/vport.h | 3 +-
 6 files changed, 58 insertions(+), 90 deletions(-)

diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 3315e3a..c3ecfd4 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -188,7 +188,7 @@ static int get_dpifindex(const struct datapath *dp)
 
 	local = ovs_vport_rcu(dp, OVSP_LOCAL);
 	if (local)
-		ifindex = netdev_vport_priv(local)->dev->ifindex;
+		ifindex = local->dev->ifindex;
 	else
 		ifindex = 0;
 
@@ -2205,13 +2205,10 @@ static void __net_exit list_vports_from_net(struct net *net, struct net *dnet,
 		struct vport *vport;
 
 		hlist_for_each_entry(vport, &dp->ports[i], dp_hash_node) {
-			struct netdev_vport *netdev_vport;
-
 			if (vport->ops->type != OVS_VPORT_TYPE_INTERNAL)
 				continue;
 
-			netdev_vport = netdev_vport_priv(vport);
-			if (dev_net(netdev_vport->dev) == dnet)
+			if (dev_net(vport->dev) == dnet)
 				list_add(&vport->detach_list, head);
 		}
 	}
diff --git a/net/openvswitch/dp_notify.c b/net/openvswitch/dp_notify.c
index 2c631fe..a7a80a6 100644
--- a/net/openvswitch/dp_notify.c
+++ b/net/openvswitch/dp_notify.c
@@ -58,13 +58,10 @@ void ovs_dp_notify_wq(struct work_struct *work)
 			struct hlist_node *n;
 
 			hlist_for_each_entry_safe(vport, n, &dp->ports[i], dp_hash_node) {
-				struct netdev_vport *netdev_vport;
-
 				if (vport->ops->type != OVS_VPORT_TYPE_NETDEV)
 					continue;
 
-				netdev_vport = netdev_vport_priv(vport);
-				if (!(netdev_vport->dev->priv_flags & IFF_OVS_DATAPATH))
+				if (!(vport->dev->priv_flags & IFF_OVS_DATAPATH))
 					dp_detach_port_notify(vport);
 			}
 		}
diff --git a/net/openvswitch/vport-internal_dev.c b/net/openvswitch/vport-internal_dev.c
index 6a55f71..a2c205d 100644
--- a/net/openvswitch/vport-internal_dev.c
+++ b/net/openvswitch/vport-internal_dev.c
@@ -156,49 +156,44 @@ static void do_setup(struct net_device *netdev)
 static struct vport *internal_dev_create(const struct vport_parms *parms)
 {
 	struct vport *vport;
-	struct netdev_vport *netdev_vport;
 	struct internal_dev *internal_dev;
 	int err;
 
-	vport = ovs_vport_alloc(sizeof(struct netdev_vport),
-				&ovs_internal_vport_ops, parms);
+	vport = ovs_vport_alloc(0, &ovs_internal_vport_ops, parms);
 	if (IS_ERR(vport)) {
 		err = PTR_ERR(vport);
 		goto error;
 	}
 
-	netdev_vport = netdev_vport_priv(vport);
-
-	netdev_vport->dev = alloc_netdev(sizeof(struct internal_dev),
-					 parms->name, NET_NAME_UNKNOWN,
-					 do_setup);
-	if (!netdev_vport->dev) {
+	vport->dev = alloc_netdev(sizeof(struct internal_dev),
+				  parms->name, NET_NAME_UNKNOWN, do_setup);
+	if (!vport->dev) {
 		err = -ENOMEM;
 		goto error_free_vport;
 	}
 
-	dev_net_set(netdev_vport->dev, ovs_dp_get_net(vport->dp));
-	internal_dev = internal_dev_priv(netdev_vport->dev);
+	dev_net_set(vport->dev, ovs_dp_get_net(vport->dp));
+	internal_dev = internal_dev_priv(vport->dev);
 	internal_dev->vport = vport;
 
 	/* Restrict bridge port to current netns. */
 	if (vport->port_no == OVSP_LOCAL)
-		netdev_vport->dev->features |= NETIF_F_NETNS_LOCAL;
+		vport->dev->features |= NETIF_F_NETNS_LOCAL;
 
 	rtnl_lock();
-	err = register_netdevice(netdev_vport->dev);
+	err = register_netdevice(vport->dev);
 	if (err)
 		goto error_free_netdev;
 
-	dev_set_promiscuity(netdev_vport->dev, 1);
+	dev_set_promiscuity(vport->dev, 1);
 	rtnl_unlock();
-	netif_start_queue(netdev_vport->dev
[net-next RFC 03/14] vxlan: Flow based tunneling
Allows putting a VXLAN device into a new flow-based mode in which it
will populate a tunnel info structure for each packet received. The
metadata structure will contain the outer header and tunnel header
fields which have been stripped off. Layers further up in the stack
such as routing, tc or netfilter can later match on these fields.

On the transmit side, it allows skbs to carry their own encapsulation
instructions, thus allowing encapsulation parameters to be set per
flow/route.

This prepares the VXLAN device to be steered by the routing subsystem,
which will allow supporting encapsulation for a large number of tunnel
endpoints and tunnel ids through a single net_device, improving the
scalability of current VXLAN tunnels.

Signed-off-by: Thomas Graf <tg...@suug.ch>
Signed-off-by: Pravin B Shelar <pshe...@nicira.com>
---
 drivers/net/vxlan.c | 147 ---
 include/linux/skbuff.h | 1 +
 include/net/ip_tunnels.h | 8 +++
 include/net/route.h | 8 +++
 include/net/vxlan.h | 4 +-
 include/uapi/linux/if_link.h | 1 +
 6 files changed, 146 insertions(+), 23 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 34c519e..d5edba5 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1164,10 +1164,12 @@ static struct vxlanhdr *vxlan_remcsum(struct sk_buff *skb, struct vxlanhdr *vh,
 /* Callback from net/ipv4/udp.c to receive packets */
 static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
 {
+	struct ip_tunnel_info *tun_info = NULL;
 	struct vxlan_sock *vs;
 	struct vxlanhdr *vxh;
 	u32 flags, vni;
-	struct vxlan_metadata md = {0};
+	struct vxlan_metadata _md;
+	struct vxlan_metadata *md = &_md;
 
 	/* Need Vxlan and inner Ethernet header to be present */
 	if (!pskb_may_pull(skb, VXLAN_HLEN))
@@ -1202,6 +1204,33 @@ static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
 		vni &= VXLAN_VNI_MASK;
 	}
 
+	if (vs->flags & VXLAN_F_FLOW_BASED) {
+		const struct iphdr *iph = ip_hdr(skb);
+
+		/* TODO: Consider optimizing by looking up in flow cache */
+		tun_info = ip_tunnel_info_alloc(sizeof(*md), GFP_ATOMIC);
+		if (!tun_info)
+			goto drop;
+
+		tun_info->key.ipv4_src = iph->saddr;
+		tun_info->key.ipv4_dst = iph->daddr;
+		tun_info->key.ipv4_tos = iph->tos;
+		tun_info->key.ipv4_ttl = iph->ttl;
+		tun_info->key.tp_src = udp_hdr(skb)->source;
+		tun_info->key.tp_dst = udp_hdr(skb)->dest;
+
+		tun_info->mode = IP_TUNNEL_INFO_RX;
+		tun_info->key.tun_flags = TUNNEL_KEY;
+		tun_info->key.tun_id = cpu_to_be64(vni >> 8);
+		if (udp_hdr(skb)->check != 0)
+			tun_info->key.tun_flags |= TUNNEL_CSUM;
+
+		md = ip_tunnel_info_opts(tun_info, sizeof(*md));
+		skb_attach_tunnel_info(skb, tun_info);
+	} else {
+		memset(md, 0, sizeof(*md));
+	}
+
 	/* For backwards compatibility, only allow reserved fields to be
 	 * used by VXLAN extensions if explicitly requested.
 	 */
@@ -1209,13 +1238,16 @@ static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
 		struct vxlanhdr_gbp *gbp;
 
 		gbp = (struct vxlanhdr_gbp *)vxh;
-		md.gbp = ntohs(gbp->policy_id);
+		md->gbp = ntohs(gbp->policy_id);
+
+		if (tun_info)
+			tun_info->key.tun_flags |= TUNNEL_VXLAN_OPT;
 
 		if (gbp->dont_learn)
-			md.gbp |= VXLAN_GBP_DONT_LEARN;
+			md->gbp |= VXLAN_GBP_DONT_LEARN;
 
 		if (gbp->policy_applied)
-			md.gbp |= VXLAN_GBP_POLICY_APPLIED;
+			md->gbp |= VXLAN_GBP_POLICY_APPLIED;
 
 		flags &= ~VXLAN_GBP_USED_BITS;
 	}
@@ -1233,8 +1265,8 @@ static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
 		goto bad_flags;
 	}
 
-	md.vni = vxh->vx_vni;
-	vs->rcv(vs, skb, &md);
+	md->vni = vxh->vx_vni;
+	vs->rcv(vs, skb, md);
 	return 0;
 
 drop:
@@ -1254,6 +1286,7 @@ error:
 static void vxlan_rcv(struct vxlan_sock *vs, struct sk_buff *skb,
 		      struct vxlan_metadata *md)
 {
+	struct ip_tunnel_info *tun_info = skb_shinfo(skb)->tun_info;
 	struct iphdr *oip = NULL;
 	struct ipv6hdr *oip6 = NULL;
 	struct vxlan_dev *vxlan;
@@ -1263,7 +1296,12 @@ static void vxlan_rcv(struct vxlan_sock *vs, struct sk_buff *skb,
 	int err = 0;
 	union vxlan_addr *remote_ip;
 
-	vni = ntohl(md->vni) >> 8;
+	/* For flow based devices, map all packets to VNI 0 */
+	if (vs->flags & VXLAN_F_FLOW_BASED)
+		vni = 0;
+	else
+		vni = ntohl(md
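In flow-based mode the receive path stores the VNI as the 64-bit tunnel id: after `ntohl()` the VNI occupies the upper 24 bits of the second header word, hence `cpu_to_be64(vni >> 8)`, while the device itself maps every packet to VNI 0. The sketch below parses the 8-byte VXLAN header as laid out in RFC 7348 (flags byte with the I bit 0x08, 24-bit VNI in bytes 4 to 6, reserved byte 7) and serializes the result as a big-endian 64-bit id. It is a userspace illustration with hypothetical helper names, not the kernel's code, and it ignores the GBP and remote checksum extensions.

```c
#include <stddef.h>
#include <stdint.h>

#define VXLAN_HDR_LEN	8
#define VXLAN_FLAG_I	0x08	/* "VNI is valid" flag (RFC 7348) */

/* Parse the VNI out of a VXLAN header. Returns 0 on success, -1 if the
 * header is too short or the I flag is not set. */
static int vxlan_parse_vni(const uint8_t *hdr, size_t len, uint32_t *vni)
{
	uint32_t word;

	if (len < VXLAN_HDR_LEN || !(hdr[0] & VXLAN_FLAG_I))
		return -1;

	/* Second 32-bit word in host order (what ntohl(vx_vni) yields):
	 * VNI in bits 31..8, reserved byte in bits 7..0. */
	word = (uint32_t)hdr[4] << 24 | (uint32_t)hdr[5] << 16 |
	       (uint32_t)hdr[6] << 8 | (uint32_t)hdr[7];
	*vni = word >> 8;
	return 0;
}

/* Byte layout produced by cpu_to_be64(v): most significant byte first. */
static void put_be64(uint8_t out[8], uint64_t v)
{
	for (int i = 7; i >= 0; i--) {
		out[i] = (uint8_t)v;
		v >>= 8;
	}
}
```

For a header carrying VNI 0x123456, the resulting tunnel id bytes are `00 00 00 00 00 12 34 56`.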
[net-next RFC 00/14] Convert OVS tunnel vports to use regular net_devices
This is the first series in a greater effort to bring the scalability
and programmability advantages of OVS to the rest of the network
stack and to get rid of as much OVS specific code as possible.

This first series focuses on getting rid of OVS tunnel vports and
using regular tunnel net_devices instead. As part of this effort, the
routing subsystem is extended with support for flow based tunneling.
In this new tunneling mode, the route is able to match on tunnel
information as well as set tunnel encapsulation parameters per route.
This allows L3 forwarding to be performed for a large number of tunnel
endpoints and virtual networks using a single tunnel net_device.

TODO:
 - Geneve support
 - IPv6 support
 - Benchmarks

Pravin Shelar (1):
  openvswitch: Use regular GRE net_device instead of vport

Thomas Graf (13):
  ip_tunnel: Make ovs_tunnel_info and ovs_key_ipv4_tunnel generic
  ip_tunnel: support per packet tunnel metadata
  vxlan: Flow based tunneling
  route: Extend flow representation with tunnel key
  route: Per route tunnel metadata with RTA_TUNNEL
  fib: Add fib rule match on tunnel id
  vxlan: Factor out device configuration
  openvswitch: Allocate & attach ip_tunnel_info for tunnel set action
  openvswitch: Move dev pointer into vport itself
  openvswitch: Abstract vport name through ovs_vport_name()
  openvswitch: Use regular VXLAN net_device device
  vxlan: remove indirect call to vxlan_rcv() and vni member
  arp: Associate ARP requests with tunnel info

 drivers/net/vxlan.c | 663 ---
 include/linux/skbuff.h | 2 +
 include/net/fib_rules.h | 1 +
 include/net/flow.h | 7 +
 include/net/ip_fib.h | 3 +
 include/net/ip_tunnels.h | 127 ++-
 include/net/route.h | 18 +
 include/net/vxlan.h | 82 -
 include/uapi/linux/fib_rules.h | 2 +-
 include/uapi/linux/if_link.h | 1 +
 include/uapi/linux/openvswitch.h | 2 +-
 include/uapi/linux/rtnetlink.h | 16 +
 net/core/dev.c | 5 +-
 net/core/fib_rules.c | 17 +-
 net/core/skbuff.c | 8 +
 net/ipv4/arp.c | 8 +
 net/ipv4/fib_frontend.c | 57 +++
 net/ipv4/fib_semantics.c | 45 +++
 net/ipv4/ip_gre.c | 161 -
 net/ipv4/ip_tunnel_core.c | 15 +
 net/ipv4/route.c | 32 +-
 net/openvswitch/Kconfig | 12 -
 net/openvswitch/Makefile | 2 -
 net/openvswitch/actions.c | 10 +-
 net/openvswitch/datapath.c | 19 +-
 net/openvswitch/datapath.h | 5 +-
 net/openvswitch/dp_notify.c | 5 +-
 net/openvswitch/flow.c | 4 +-
 net/openvswitch/flow.h | 77 +---
 net/openvswitch/flow_netlink.c | 78 -
 net/openvswitch/flow_netlink.h | 3 +-
 net/openvswitch/vport-geneve.c | 17 +-
 net/openvswitch/vport-gre.c | 313 -
 net/openvswitch/vport-internal_dev.c | 38 +-
 net/openvswitch/vport-netdev.c | 271 +++---
 net/openvswitch/vport-netdev.h | 13 -
 net/openvswitch/vport-vxlan.c | 322 -
 net/openvswitch/vport.c | 34 +-
 net/openvswitch/vport.h | 21 +-
 39 files changed, 1334 insertions(+), 1182 deletions(-)
 delete mode 100644 net/openvswitch/vport-gre.c
 delete mode 100644 net/openvswitch/vport-vxlan.c

-- 
2.3.5
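The central idea of the series is that the route, rather than a per-tunnel device, carries the encapsulation parameters, so one net_device can serve many endpoints and virtual networks. A toy model of that lookup follows: a flat table, a longest-prefix match, and per-route encapsulation data (remote endpoint and VNI). The table layout and all names are hypothetical; they say nothing about how RTA_TUNNEL actually encodes the metadata, and the kernel uses its FIB trie rather than a linear scan.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-route encapsulation parameters. */
struct encap {
	uint32_t remote;	/* outer destination (tunnel endpoint), host order */
	uint32_t vni;
};

struct route {
	uint32_t prefix;	/* host order */
	uint8_t plen;		/* prefix length, 0..32 */
	struct encap encap;
};

static uint32_t mask_of(uint8_t plen)
{
	/* Avoid the undefined 32-bit shift for a /0 route. */
	return plen == 0 ? 0 : 0xFFFFFFFFu << (32 - plen);
}

/* Longest-prefix match over a flat table; returns NULL if nothing matches. */
static const struct encap *lookup_encap(const struct route *tbl, size_t n,
					uint32_t daddr)
{
	const struct route *best = NULL;

	for (size_t i = 0; i < n; i++) {
		uint32_t m = mask_of(tbl[i].plen);

		if ((daddr & m) == (tbl[i].prefix & m) &&
		    (!best || tbl[i].plen > best->plen))
			best = &tbl[i];
	}
	return best ? &best->encap : NULL;
}
```

With routes for 10.0.0.0/8 (VNI 100) and 10.1.0.0/16 (VNI 200) behind the same device, 10.1.2.3 picks VNI 200 and 10.2.0.1 falls back to VNI 100.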
[net-next RFC 07/14] vxlan: Factor out device configuration
This factors the device configuration out of the RTNL newlink API, which
allows for in-kernel creation of VXLAN net_devices.

Signed-off-by: Thomas Graf <tg...@suug.ch>
---
 drivers/net/vxlan.c | 332 
 include/net/vxlan.h |  59 ++
 2 files changed, 236 insertions(+), 155 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index d5edba5..3acab95 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -54,10 +54,6 @@
 #define PORT_HASH_BITS	8
 #define PORT_HASH_SIZE  (1<<PORT_HASH_BITS)
-#define VNI_HASH_BITS	10
-#define VNI_HASH_SIZE	(1<<VNI_HASH_BITS)
-#define FDB_HASH_BITS	8
-#define FDB_HASH_SIZE	(1<<FDB_HASH_BITS)
 #define FDB_AGE_DEFAULT 300 /* 5 min */
 #define FDB_AGE_INTERVAL (10 * HZ)	/* rescan interval */
 
@@ -74,6 +70,7 @@ module_param(log_ecn_error, bool, 0644);
 MODULE_PARM_DESC(log_ecn_error, "Log packets received with corrupted ECN");
 
 static int vxlan_net_id;
+static struct rtnl_link_ops vxlan_link_ops;
 
 static const u8 all_zeros_mac[ETH_ALEN];
 
@@ -84,21 +81,6 @@ struct vxlan_net {
 	spinlock_t	  sock_lock;
 };
 
-union vxlan_addr {
-	struct sockaddr_in sin;
-	struct sockaddr_in6 sin6;
-	struct sockaddr sa;
-};
-
-struct vxlan_rdst {
-	union vxlan_addr	 remote_ip;
-	__be16			 remote_port;
-	u32			 remote_vni;
-	u32			 remote_ifindex;
-	struct list_head	 list;
-	struct rcu_head		 rcu;
-};
-
 /* Forwarding table entry */
 struct vxlan_fdb {
 	struct hlist_node hlist;	/* linked list of entries */
@@ -111,31 +93,6 @@ struct vxlan_fdb {
 	u8		  eth_addr[ETH_ALEN];
 };
 
-/* Pseudo network device */
-struct vxlan_dev {
-	struct hlist_node hlist;	/* vni hash table */
-	struct list_head  next;		/* vxlan's per namespace list */
-	struct vxlan_sock *vn_sock;	/* listening socket */
-	struct net_device *dev;
-	struct net	  *net;		/* netns for packet i/o */
-	struct vxlan_rdst default_dst;	/* default destination */
-	union vxlan_addr  saddr;	/* source address */
-	__be16		  dst_port;
-	__u16		  port_min;	/* source port range */
-	__u16		  port_max;
-	__u8		  tos;		/* TOS override */
-	__u8		  ttl;
-	u32		  flags;	/* VXLAN_F_* in vxlan.h */
-
-	unsigned long	  age_interval;
-	struct timer_list age_timer;
-	spinlock_t	  hash_lock;
-	unsigned int	  addrcnt;
-	unsigned int	  addrmax;
-
-	struct hlist_head fdb_head[FDB_HASH_SIZE];
-};
-
 /* salt for hash table */
 static u32 vxlan_salt __read_mostly;
 static struct workqueue_struct *vxlan_wq;
@@ -345,7 +302,7 @@ static int vxlan_fdb_info(struct sk_buff *skb, struct vxlan_dev *vxlan,
 	if (send_ip && vxlan_nla_put_addr(skb, NDA_DST, &rdst->remote_ip))
 		goto nla_put_failure;
 
-	if (rdst->remote_port && rdst->remote_port != vxlan->dst_port &&
+	if (rdst->remote_port && rdst->remote_port != vxlan->cfg.dst_port &&
 	    nla_put_be16(skb, NDA_PORT, rdst->remote_port))
 		goto nla_put_failure;
 	if (rdst->remote_vni != vxlan->default_dst.remote_vni &&
@@ -749,7 +706,8 @@ static int vxlan_fdb_create(struct vxlan_dev *vxlan,
 		if (!(flags & NLM_F_CREATE))
 			return -ENOENT;
 
-		if (vxlan->addrmax && vxlan->addrcnt >= vxlan->addrmax)
+		if (vxlan->cfg.addrmax &&
+		    vxlan->addrcnt >= vxlan->cfg.addrmax)
 			return -ENOSPC;
 
 		/* Disallow replace to add a multicast entry */
@@ -835,7 +793,7 @@ static int vxlan_fdb_parse(struct nlattr *tb[], struct vxlan_dev *vxlan,
 			return -EINVAL;
 		*port = nla_get_be16(tb[NDA_PORT]);
 	} else {
-		*port = vxlan->dst_port;
+		*port = vxlan->cfg.dst_port;
 	}
 
 	if (tb[NDA_VNI]) {
@@ -1021,7 +979,7 @@ static bool vxlan_snoop(struct net_device *dev,
 		vxlan_fdb_create(vxlan, src_mac, src_ip,
 				 NUD_REACHABLE,
 				 NLM_F_EXCL|NLM_F_CREATE,
-				 vxlan->dst_port,
+				 vxlan->cfg.dst_port,
 				 vxlan->default_dst.remote_vni,
 				 0, NTF_SELF);
 		spin_unlock(&vxlan->hash_lock);
@@ -1945,7 +1903,7 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
 	u32 flags = vxlan->flags;
 
 	if (rdst) {
-		dst_port = rdst->remote_port ? rdst->remote_port : vxlan->dst_port;
+		dst_port = rdst->remote_port ? rdst->remote_port : vxlan->cfg.dst_port
[net-next RFC 14/14] arp: Associate ARP requests with tunnel info
Since ARP performs its own route lookup call, any tunnel metadata returned
by that lookup must be attached to the skb manually.

Signed-off-by: Thomas Graf <tg...@suug.ch>
---
 net/ipv4/arp.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
index 933a928..6cf0502 100644
--- a/net/ipv4/arp.c
+++ b/net/ipv4/arp.c
@@ -489,6 +489,7 @@ struct sk_buff *arp_create(int type, int ptype, __be32 dest_ip,
 	unsigned char *arp_ptr;
 	int hlen = LL_RESERVED_SPACE(dev);
 	int tlen = dev->needed_tailroom;
+	struct rtable *rt;
 
 	/*
 	 *	Allocate a buffer
@@ -577,6 +578,13 @@ struct sk_buff *arp_create(int type, int ptype, __be32 dest_ip,
 	}
 	memcpy(arp_ptr, &dest_ip, 4);
 
+	rt = ip_route_output(dev_net(dev), dest_ip, src_ip, 0, dev->ifindex);
+	if (!IS_ERR(rt)) {
+		if (rt->rt_tun_info)
+			skb_attach_tunnel_info(skb, rt->rt_tun_info);
+		ip_rt_put(rt);
+	}
+
 	return skb;
 
 out:
-- 
2.3.5
[net-next RFC 06/14] fib: Add fib rule match on tunnel id
This adds the ability to select a routing table based on the tunnel id,
which makes it possible to maintain separate routing tables for each
virtual tunnel network.

  ip rule add from all tunnel-id 100 lookup 100
  ip rule add from all tunnel-id 200 lookup 200

Signed-off-by: Thomas Graf <tg...@suug.ch>
---
 include/net/fib_rules.h        |  1 +
 include/uapi/linux/fib_rules.h |  2 +-
 net/core/fib_rules.c           | 17 +++--
 3 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/include/net/fib_rules.h b/include/net/fib_rules.h
index 6d67383..822ed1e 100644
--- a/include/net/fib_rules.h
+++ b/include/net/fib_rules.h
@@ -19,6 +19,7 @@ struct fib_rule {
 	u8			action;
 	/* 3 bytes hole, try to use */
 	u32			target;
+	__be64			tun_id;
 	struct fib_rule __rcu	*ctarget;
 	struct net		*fr_net;
 
diff --git a/include/uapi/linux/fib_rules.h b/include/uapi/linux/fib_rules.h
index 2b82d7e..96161b8 100644
--- a/include/uapi/linux/fib_rules.h
+++ b/include/uapi/linux/fib_rules.h
@@ -43,7 +43,7 @@ enum {
 	FRA_UNUSED5,
 	FRA_FWMARK,	/* mark */
 	FRA_FLOW,	/* flow/class id */
-	FRA_UNUSED6,
+	FRA_TUN_ID,
 	FRA_SUPPRESS_IFGROUP,
 	FRA_SUPPRESS_PREFIXLEN,
 	FRA_TABLE,	/* Extended table id */
diff --git a/net/core/fib_rules.c b/net/core/fib_rules.c
index 9a12668..6da78c9 100644
--- a/net/core/fib_rules.c
+++ b/net/core/fib_rules.c
@@ -186,6 +186,9 @@ static int fib_rule_match(struct fib_rule *rule, struct fib_rules_ops *ops,
 	if ((rule->mark ^ fl->flowi_mark) & rule->mark_mask)
 		goto out;
 
+	if (rule->tun_id && (rule->tun_id != fl->flowi_tun_key.tun_id))
+		goto out;
+
 	ret = ops->match(rule, fl, flags);
 out:
 	return (rule->flags & FIB_RULE_INVERT) ? !ret : ret;
@@ -330,6 +333,9 @@ static int fib_nl_newrule(struct sk_buff *skb, struct nlmsghdr* nlh)
 	if (tb[FRA_FWMASK])
 		rule->mark_mask = nla_get_u32(tb[FRA_FWMASK]);
 
+	if (tb[FRA_TUN_ID])
+		rule->tun_id = nla_get_be64(tb[FRA_TUN_ID]);
+
 	rule->action = frh->action;
 	rule->flags = frh->flags;
 	rule->table = frh_get_table(frh, tb);
@@ -473,6 +479,10 @@ static int fib_nl_delrule(struct sk_buff *skb, struct nlmsghdr* nlh)
 		    (rule->mark_mask != nla_get_u32(tb[FRA_FWMASK])))
 			continue;
 
+		if (tb[FRA_TUN_ID] &&
+		    (rule->tun_id != nla_get_be64(tb[FRA_TUN_ID])))
+			continue;
+
 		if (!ops->compare(rule, frh, tb))
 			continue;
 
@@ -535,7 +545,8 @@ static inline size_t fib_rule_nlmsg_size(struct fib_rules_ops *ops,
 			 + nla_total_size(4) /* FRA_SUPPRESS_PREFIXLEN */
 			 + nla_total_size(4) /* FRA_SUPPRESS_IFGROUP */
 			 + nla_total_size(4) /* FRA_FWMARK */
-			 + nla_total_size(4); /* FRA_FWMASK */
+			 + nla_total_size(4) /* FRA_FWMASK */
+			 + nla_total_size(8); /* FRA_TUN_ID */
 
 	if (ops->nlmsg_payload)
 		payload += ops->nlmsg_payload(rule);
@@ -591,7 +602,9 @@ static int fib_nl_fill_rule(struct sk_buff *skb, struct fib_rule *rule,
 	    ((rule->mark_mask || rule->mark) &&
 	     nla_put_u32(skb, FRA_FWMASK, rule->mark_mask)) ||
 	    (rule->target &&
-	     nla_put_u32(skb, FRA_GOTO, rule->target)))
+	     nla_put_u32(skb, FRA_GOTO, rule->target)) ||
+	    (rule->tun_id &&
+	     nla_put_be64(skb, FRA_TUN_ID, rule->tun_id)))
 		goto nla_put_failure;
 
 	if (rule->suppress_ifgroup != -1) {
-- 
2.3.5
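The matching rule added to fib_rule_match() can be modeled in userspace. This is a minimal sketch, not kernel code: `struct rule_model`, `RULE_FLAG_INVERT` and `select_table()` are hypothetical stand-ins, and `ops->match()` is assumed to always succeed so that only the tunnel id check is exercised. A zero `tun_id` acts as a wildcard, exactly as in the patch. Comparing two `__be64` values for equality does not depend on byte order, so plain integers are used here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RULE_FLAG_INVERT 0x2 /* local stand-in for FIB_RULE_INVERT */

/* Hypothetical reduced rule: only the fields the tunnel id match touches. */
struct rule_model {
	uint64_t tun_id; /* 0 means "match any tunnel id" */
	uint32_t flags;
	uint32_t table;  /* table to use when the rule matches */
};

/* Mirrors the patched fib_rule_match(), assuming ops->match() returns 1. */
static bool rule_matches_tun_id(const struct rule_model *r, uint64_t flow_tun_id)
{
	bool ret = false;

	if (r->tun_id && r->tun_id != flow_tun_id)
		goto out;

	ret = true;
out:
	return (r->flags & RULE_FLAG_INVERT) ? !ret : ret;
}

/* Walk the rules in priority order and return the first matching table. */
static uint32_t select_table(const struct rule_model *rules, size_t n,
			     uint64_t flow_tun_id, uint32_t fallback)
{
	for (size_t i = 0; i < n; i++)
		if (rule_matches_tun_id(&rules[i], flow_tun_id))
			return rules[i].table;
	return fallback;
}
```

With the two `ip rule` lines from the commit message, a flow whose tunnel id is 100 resolves to table 100, and a flow whose tunnel id is 300 falls through to the fallback (for example, 254, the main table).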
[net-next RFC 05/14] route: Per route tunnel metadata with RTA_TUNNEL
Introduces a new Netlink attribute RTA_TUNNEL which allows routes to set
tunnel transmit metadata and specify the tunnel endpoint or tunnel id on a
per-route basis. The route must point to a tunnel device which understands
per-skb tunnel metadata and has been put into the respective mode.

Signed-off-by: Thomas Graf <tg...@suug.ch>
---
 include/net/ip_fib.h           |  3 +++
 include/net/ip_tunnels.h       |  1 -
 include/net/route.h            | 10 
 include/uapi/linux/rtnetlink.h | 16 
 net/ipv4/fib_frontend.c        | 57 ++
 net/ipv4/fib_semantics.c       | 45 +
 net/ipv4/route.c               | 30 +-
 net/openvswitch/vport.h        |  1 +
 8 files changed, 161 insertions(+), 2 deletions(-)

diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index 54271ed..1cd7cf8 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -22,6 +22,7 @@
 #include <net/fib_rules.h>
 #include <net/inetpeer.h>
 #include <linux/percpu.h>
+#include <net/ip_tunnels.h>
 
 struct fib_config {
 	u8			fc_dst_len;
@@ -44,6 +45,7 @@ struct fib_config {
 	u32			fc_flow;
 	u32			fc_nlflags;
 	struct nl_info		fc_nlinfo;
+	struct ip_tunnel_info	fc_tunnel;
  };
 
 struct fib_info;
@@ -117,6 +119,7 @@ struct fib_info {
 #ifdef CONFIG_IP_ROUTE_MULTIPATH
 	int			fib_power;
 #endif
+	struct ip_tunnel_info	*fib_tunnel;
 	struct rcu_head		rcu;
 	struct fib_nh		fib_nh[0];
 #define fib_dev		fib_nh[0].nh_dev
diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
index df8cfd3..b4ab930 100644
--- a/include/net/ip_tunnels.h
+++ b/include/net/ip_tunnels.h
@@ -9,7 +9,6 @@
 #include <net/dsfield.h>
 #include <net/gro_cells.h>
 #include <net/inet_ecn.h>
-#include <net/ip.h>
 #include <net/netns/generic.h>
 #include <net/rtnetlink.h>
 #include <net/flow.h>
diff --git a/include/net/route.h b/include/net/route.h
index 6ede321..dbda603 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -28,6 +28,7 @@
 #include <net/inetpeer.h>
 #include <net/flow.h>
 #include <net/inet_sock.h>
+#include <net/ip_tunnels.h>
 #include <linux/in_route.h>
 #include <linux/rtnetlink.h>
 #include <linux/rcupdate.h>
@@ -66,6 +67,7 @@ struct rtable {
 
 	struct list_head	rt_uncached;
 	struct uncached_list	*rt_uncached_list;
+	struct ip_tunnel_info	*rt_tun_info;
 };
 
 static inline bool rt_is_input_route(const struct rtable *rt)
@@ -198,6 +200,8 @@ struct in_ifaddr;
 void fib_add_ifaddr(struct in_ifaddr *);
 void fib_del_ifaddr(struct in_ifaddr *, struct in_ifaddr *);
 
+int fib_dump_tun_info(struct sk_buff *skb, struct ip_tunnel_info *tun_info);
+
 static inline void ip_rt_put(struct rtable *rt)
 {
 	/* dst_release() accepts a NULL parameter.
@@ -317,9 +321,15 @@ static inline int ip4_dst_hoplimit(const struct dst_entry *dst)
 
 static inline struct ip_tunnel_info *skb_tunnel_info(struct sk_buff *skb)
 {
+	struct rtable *rt;
+
 	if (skb_shinfo(skb)->tun_info)
 		return skb_shinfo(skb)->tun_info;
 
+	rt = skb_rtable(skb);
+	if (rt)
+		return rt->rt_tun_info;
+
 	return NULL;
 }
 
diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index 17fb02f..1f7aa68 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -286,6 +286,21 @@ enum rt_class_t {
 
 /* Routing message attributes */
 
+enum rta_tunnel_t {
+	RTA_TUN_UNSPEC,
+	RTA_TUN_ID,
+	RTA_TUN_DST,
+	RTA_TUN_SRC,
+	RTA_TUN_TTL,
+	RTA_TUN_TOS,
+	RTA_TUN_SPORT,
+	RTA_TUN_DPORT,
+	RTA_TUN_FLAGS,
+	__RTA_TUN_MAX,
+};
+
+#define RTA_TUN_MAX (__RTA_TUN_MAX - 1)
+
 enum rtattr_type_t {
 	RTA_UNSPEC,
 	RTA_DST,
@@ -308,6 +323,7 @@ enum rtattr_type_t {
 	RTA_VIA,
 	RTA_NEWDST,
 	RTA_PREF,
+	RTA_TUNNEL,	/* destination VTEP */
 	__RTA_MAX
 };
 
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 872494e..bfa77a6 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -580,6 +580,57 @@ int ip_rt_ioctl(struct net *net, unsigned int cmd, void __user *arg)
 	return -EINVAL;
 }
 
+static const struct nla_policy tunnel_policy[RTA_TUN_MAX + 1] = {
+	[RTA_TUN_ID]		= { .type = NLA_U64 },
+	[RTA_TUN_DST]		= { .type = NLA_U32 },
+	[RTA_TUN_SRC]		= { .type = NLA_U32 },
+	[RTA_TUN_TTL]		= { .type = NLA_U8 },
+	[RTA_TUN_TOS]		= { .type = NLA_U8 },
+	[RTA_TUN_SPORT]		= { .type = NLA_U16 },
+	[RTA_TUN_DPORT]		= { .type = NLA_U16 },
+	[RTA_TUN_FLAGS]		= { .type = NLA_U16 },
+};
+
+static int parse_rta_tunnel(struct fib_config *cfg, struct nlattr *attr)
+{
+	struct nlattr *tb[RTA_TUN_MAX
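RTA_TUNNEL is a nest: one outer attribute whose payload is a sequence of RTA_TUN_* children, each padded to 4 bytes. The following minimal userspace sketch shows that layout. It uses a locally defined header modeled on `struct nlattr` (16-bit length including the header, then a 16-bit type) rather than the real libnl or kernel helpers. The child type values follow the enum order in the patch. `ATTR_TUNNEL = 21` is an assumption: it is simply the next slot after RTA_PREF (20) in this RFC, not a stable uAPI value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Netlink-style attribute header: total length (header + payload), then type. */
struct tlv_hdr {
	uint16_t len;
	uint16_t type;
};

#define TLV_ALIGN(n) (((n) + 3u) & ~3u)
#define TLV_HDRLEN   ((uint16_t)TLV_ALIGN(sizeof(struct tlv_hdr)))

/* Hypothetical values following the rta_tunnel_t order in the RFC patch. */
enum { TUN_ID = 1, TUN_DST = 2, TUN_TTL = 4 };
#define ATTR_TUNNEL 21 /* assumed slot after RTA_PREF (20) */

/* Append one attribute at *off, zero its padding, return its start offset. */
static size_t put_attr(uint8_t *buf, size_t *off, uint16_t type,
		       const void *data, uint16_t dlen)
{
	size_t start = *off;
	struct tlv_hdr h = { (uint16_t)(TLV_HDRLEN + dlen), type };

	memcpy(buf + start, &h, sizeof(h));
	if (dlen)
		memcpy(buf + start + TLV_HDRLEN, data, dlen);
	memset(buf + start + h.len, 0, TLV_ALIGN(h.len) - h.len);
	*off = start + TLV_ALIGN(h.len);
	return start;
}

/* Once all children are appended, patch the nest length (like nla_nest_end). */
static void end_nest(uint8_t *buf, size_t start, size_t off)
{
	struct tlv_hdr h;

	memcpy(&h, buf + start, sizeof(h));
	h.len = (uint16_t)(off - start);
	memcpy(buf + start, &h, sizeof(h));
}

/* Build RTA_TUNNEL { TUN_ID, TUN_DST, TUN_TTL }; returns total bytes used. */
static size_t build_tunnel_attr(uint8_t *buf, uint64_t tun_id,
				uint32_t dst_be, uint8_t ttl)
{
	size_t off = 0;
	size_t nest = put_attr(buf, &off, ATTR_TUNNEL, NULL, 0);

	put_attr(buf, &off, TUN_ID, &tun_id, sizeof(tun_id));
	put_attr(buf, &off, TUN_DST, &dst_be, sizeof(dst_be));
	put_attr(buf, &off, TUN_TTL, &ttl, sizeof(ttl));
	end_nest(buf, nest, off);
	return off;
}
```

The nest takes 4 + 12 + 8 + 8 = 32 bytes, because the 1-byte TTL payload is padded up to the 4-byte boundary. The same layout would carry RTA_ENCAP's nested per-encap attributes discussed later in this thread.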
[net-next RFC 12/14] vxlan: remove indirect call to vxlan_rcv() and vni member
With the removal of the special treatment of OVS VXLAN vports, the
indirect call to vxlan_rcv() can be avoided and the VNI member in
vxlan_metadata can be removed.

Signed-off-by: Thomas Graf <tg...@suug.ch>
---
 drivers/net/vxlan.c | 225 +---
 include/net/vxlan.h |   7 --
 2 files changed, 107 insertions(+), 125 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index b696871..9cc7d5a 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -75,7 +75,6 @@ static struct rtnl_link_ops vxlan_link_ops;
 static const u8 all_zeros_mac[ETH_ALEN];
 
 static struct vxlan_sock *vxlan_sock_add(struct net *net, __be16 port,
-					 vxlan_rcv_t *rcv, void *data,
 					 bool no_share, u32 flags);
 
 /* per-network namespace private data for this module */
@@ -1122,6 +1121,102 @@ static struct vxlanhdr *vxlan_remcsum(struct sk_buff *skb, struct vxlanhdr *vh,
 	return vh;
 }
 
+static void vxlan_rcv(struct vxlan_sock *vs, struct sk_buff *skb,
+		      struct vxlan_metadata *md, __u32 vni)
+{
+	struct ip_tunnel_info *tun_info = skb_shinfo(skb)->tun_info;
+	struct iphdr *oip = NULL;
+	struct ipv6hdr *oip6 = NULL;
+	struct vxlan_dev *vxlan;
+	struct pcpu_sw_netstats *stats;
+	union vxlan_addr saddr;
+	int err = 0;
+	union vxlan_addr *remote_ip;
+
+	/* For flow based devices, map all packets to VNI 0 */
+	if (vs->flags & VXLAN_F_FLOW_BASED)
+		vni = 0;
+
+	/* Is this VNI defined? */
+	vxlan = vxlan_vs_find_vni(vs, vni);
+	if (!vxlan)
+		goto drop;
+
+	remote_ip = &vxlan->default_dst.remote_ip;
+	skb_reset_mac_header(skb);
+	skb_scrub_packet(skb, !net_eq(vxlan->net, dev_net(vxlan->dev)));
+	skb->protocol = eth_type_trans(skb, vxlan->dev);
+	skb_postpull_rcsum(skb, eth_hdr(skb), ETH_HLEN);
+
+	/* Ignore packet loops (and multicast echo) */
+	if (ether_addr_equal(eth_hdr(skb)->h_source, vxlan->dev->dev_addr))
+		goto drop;
+
+	/* Re-examine inner Ethernet packet */
+	if (remote_ip->sa.sa_family == AF_INET) {
+		oip = ip_hdr(skb);
+		saddr.sin.sin_addr.s_addr = oip->saddr;
+		saddr.sa.sa_family = AF_INET;
+
+		if (tun_info) {
+			tun_info->key.ipv4_src = oip->saddr;
+			tun_info->key.ipv4_dst = oip->daddr;
+			tun_info->key.ipv4_tos = oip->tos;
+			tun_info->key.ipv4_ttl = oip->ttl;
+		}
+#if IS_ENABLED(CONFIG_IPV6)
+	} else {
+		oip6 = ipv6_hdr(skb);
+		saddr.sin6.sin6_addr = oip6->saddr;
+		saddr.sa.sa_family = AF_INET6;
+
+		/* TODO : Fill IPv6 tunnel info */
+#endif
+	}
+
+	if ((vxlan->flags & VXLAN_F_LEARN) &&
+	    vxlan_snoop(skb->dev, &saddr, eth_hdr(skb)->h_source))
+		goto drop;
+
+	skb_reset_network_header(skb);
+	if (!(vs->flags & VXLAN_F_FLOW_BASED))
+		skb->mark = md->gbp;
+
+	if (oip6)
+		err = IP6_ECN_decapsulate(oip6, skb);
+	if (oip)
+		err = IP_ECN_decapsulate(oip, skb);
+
+	if (unlikely(err)) {
+		if (log_ecn_error) {
+			if (oip6)
+				net_info_ratelimited("non-ECT from %pI6\n",
+						     &oip6->saddr);
+			if (oip)
+				net_info_ratelimited("non-ECT from %pI4 with TOS=%#x\n",
+						     &oip->saddr, oip->tos);
+		}
+		if (err > 1) {
+			++vxlan->dev->stats.rx_frame_errors;
+			++vxlan->dev->stats.rx_errors;
+			goto drop;
+		}
+	}
+
+	stats = this_cpu_ptr(vxlan->dev->tstats);
+	u64_stats_update_begin(&stats->syncp);
+	stats->rx_packets++;
+	stats->rx_bytes += skb->len;
+	u64_stats_update_end(&stats->syncp);
+
+	netif_rx(skb);
+
+	return;
+drop:
+	/* Consume bad packet */
+	kfree_skb(skb);
+}
+
 /* Callback from net/ipv4/udp.c to receive packets */
 static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
 {
@@ -1226,8 +1321,7 @@ static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
 		goto bad_flags;
 	}
 
-	md->vni = vxh->vx_vni;
-	vs->rcv(vs, skb, md);
+	vxlan_rcv(vs, skb, md, vni >> 8);
 	return 0;
 
 drop:
@@ -1244,105 +1338,6 @@ error:
 	return 1;
 }
 
-static void vxlan_rcv(struct vxlan_sock *vs, struct sk_buff *skb,
-		      struct vxlan_metadata *md)
-{
-	struct ip_tunnel_info *tun_info = skb_shinfo(skb)->tun_info;
-	struct iphdr *oip = NULL;
-	struct ipv6hdr *oip6 = NULL;
-	struct
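The new call passes `vni >> 8` because the VNI occupies the upper 24 bits of the second 32-bit word of the VXLAN header, and the low byte is reserved (RFC 7348). The following sketch parses that header from raw bytes and applies the flow-based rule that every packet is looked up under VNI 0. It assumes `vni` holds the host-order value of `vx_vni`, as `ntohl()` would produce. The helper names are hypothetical, and only the I flag (0x08 in the first byte) is checked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VXLAN_HDR_LEN   8
#define VXLAN_FLAG_VNI  0x08000000u /* "I" flag: first byte 0x08, host order */

/* Read a big-endian 32-bit word, like ntohl() on a __be32 in memory. */
static uint32_t load_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Returns true and stores the 24-bit VNI if the header has a valid VNI. */
static bool vxlan_parse_vni(const uint8_t hdr[VXLAN_HDR_LEN], uint32_t *vni_out)
{
	uint32_t flags = load_be32(hdr);     /* word 0: flags + reserved */
	uint32_t vni = load_be32(hdr + 4);   /* word 1: VNI (24 bits) + reserved */

	if (!(flags & VXLAN_FLAG_VNI))
		return false;
	*vni_out = vni >> 8;                 /* drop the reserved low byte */
	return true;
}

/* Flow-based devices map every packet to VNI 0 for the device lookup. */
static uint32_t vxlan_lookup_vni(uint32_t vni, bool flow_based)
{
	return flow_based ? 0 : vni;
}
```

For example, the header bytes `08 00 00 00 00 01 2c 00` carry VNI 300 (0x12c). A flow-based device would still look that packet up under VNI 0 and leave the per-tunnel detail to the attached metadata.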
[net-next RFC 13/14] openvswitch: Use regular GRE net_device instead of vport
From: Pravin Shelar <pshe...@nicira.com>

Removes all of the OVS-specific GRE code and makes OVS use a GRE
net_device.

Signed-off-by: Pravin B Shelar <pshe...@nicira.com>
---
 net/core/dev.c                 |   5 +-
 net/ipv4/ip_gre.c              | 161 -
 net/openvswitch/Makefile       |   1 -
 net/openvswitch/vport-gre.c    | 313 -
 net/openvswitch/vport-netdev.c |   7 +-
 5 files changed, 168 insertions(+), 319 deletions(-)
 delete mode 100644 net/openvswitch/vport-gre.c

diff --git a/net/core/dev.c b/net/core/dev.c
index 594163d..656f3b4 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6969,6 +6969,9 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
 	INIT_LIST_HEAD(&dev->ptype_all);
 	INIT_LIST_HEAD(&dev->ptype_specific);
 	dev->priv_flags = IFF_XMIT_DST_RELEASE | IFF_XMIT_DST_RELEASE_PERM;
+
+	strcpy(dev->name, name);
+	dev->name_assign_type = name_assign_type;
 	setup(dev);
 
 	dev->num_tx_queues = txqs;
@@ -6983,8 +6986,6 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
 		goto free_all;
 #endif
 
-	strcpy(dev->name, name);
-	dev->name_assign_type = name_assign_type;
 	dev->group = INIT_NETDEV_GROUP;
 	if (!dev->ethtool_ops)
 		dev->ethtool_ops = &default_ethtool_ops;
diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index 5fd7064..b37515e 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -25,6 +25,7 @@
 #include <linux/udp.h>
 #include <linux/if_arp.h>
 #include <linux/mroute.h>
+#include <linux/if_vlan.h>
 #include <linux/init.h>
 #include <linux/in6.h>
 #include <linux/inetdevice.h>
@@ -115,6 +116,8 @@ static bool log_ecn_error = true;
 module_param(log_ecn_error, bool, 0644);
 MODULE_PARM_DESC(log_ecn_error, "Log packets received with corrupted ECN");
 
+#define GRE_TAP_FB_NAME "gretap0"
+
 static struct rtnl_link_ops ipgre_link_ops __read_mostly;
 static int ipgre_tunnel_init(struct net_device *dev);
 
@@ -217,7 +220,17 @@ static int ipgre_rcv(struct sk_buff *skb, const struct tnl_ptk_info *tpi)
 				  iph->saddr, iph->daddr, tpi->key);
 
 	if (tunnel) {
+		skb_pop_mac_header(skb);
+		if (tunnel->dev == itn->fb_tunnel_dev) {
+			struct ip_tunnel_info *tun_info;
+
+			tun_info = ip_tunnel_info_alloc(0, GFP_ATOMIC);
+
+			/* TODO: setup tun info from tpi */
+			skb_attach_tunnel_info(skb, tun_info);
+		}
+
 		ip_tunnel_rcv(tunnel, skb, tpi, log_ecn_error);
 		return PACKET_RCVD;
 	}
@@ -287,6 +300,135 @@ out:
 	return NETDEV_TX_OK;
 }
 
+/* TODO: share xmit code */
+static inline struct rtable *tunnel_route_lookup(struct net *net,
+						 const struct ip_tunnel_key *key,
+						 u32 mark,
+						 struct flowi4 *fl,
+						 u8 protocol)
+{
+	struct rtable *rt;
+
+	memset(fl, 0, sizeof(*fl));
+	fl->daddr = key->ipv4_dst;
+	fl->saddr = key->ipv4_src;
+	fl->flowi4_tos = RT_TOS(key->ipv4_tos);
+	fl->flowi4_mark = mark;
+	fl->flowi4_proto = protocol;
+
+	rt = ip_route_output_key(net, fl);
+	return rt;
+}
+
+
+/* Returns the least-significant 32 bits of a __be64. */
+static __be32 be64_get_low32(__be64 x)
+{
+#ifdef __BIG_ENDIAN
+	return (__force __be32)x;
+#else
+	return (__force __be32)((__force u64)x >> 32);
+#endif
+}
+
+static __be16 filter_tnl_flags(__be16 flags)
+{
+	return flags & (TUNNEL_CSUM | TUNNEL_KEY);
+}
+
+
+static struct sk_buff *__build_header(struct sk_buff *skb,
+				      const struct ip_tunnel_info *tun_info,
+				      int tunnel_hlen)
+{
+	struct tnl_ptk_info tpi;
+
+	skb = gre_handle_offloads(skb, !!(tun_info->key.tun_flags & TUNNEL_CSUM));
+	if (IS_ERR(skb))
+		return skb;
+
+	tpi.flags = filter_tnl_flags(tun_info->key.tun_flags);
+	tpi.proto = htons(ETH_P_TEB);
+	tpi.key = be64_get_low32(tun_info->key.tun_id);
+	tpi.seq = 0;
+	gre_build_header(skb, &tpi, tunnel_hlen);
+
+	return skb;
+}
+
+static netdev_tx_t gre_fb_xmit(struct sk_buff *skb,
+			       struct net_device *dev)
+{
+	struct net *net = dev_net(dev);
+	struct ip_tunnel_info *tun_info;
+	const struct ip_tunnel_key *key;
+	struct flowi4 fl;
+	struct rtable *rt;
+	int min_headroom;
+	int tunnel_hlen;
+	__be16 df;
+	int err;
+
+	tun_info = skb_shinfo(skb)->tun_info;
+	if (unlikely(!tun_info)) {
+		err = -EINVAL;
+		goto err_free_skb;
+	}
+
+	key = &tun_info->key;
+
+	rt =
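be64_get_low32() can look odd at first: on little-endian hosts it shifts *right* by 32 to get the *low* half. This is because it works on the raw in-memory bits of a big-endian value, where the least significant 32 bits are bytes 4..7. The following userspace sketch reproduces that trick on plain `uint64_t`/`uint32_t` and detects endianness at run time instead of using `__BIG_ENDIAN`. It then decodes the resulting GRE key. All helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static int host_is_little_endian(void)
{
	const uint16_t probe = 1;
	uint8_t first;

	memcpy(&first, &probe, 1);
	return first == 1;
}

/* Mirror of be64_get_low32(): 'raw' holds the in-memory bits of a big-endian
 * 64-bit value; the result holds the in-memory bits of its low 32-bit half
 * (still big-endian). */
static uint32_t be64_get_low32_raw(uint64_t raw)
{
	if (host_is_little_endian())
		return (uint32_t)(raw >> 32); /* bytes 4..7 are the high half of a LE load */
	return (uint32_t)raw;                 /* bytes 4..7 are already the low half */
}

/* Serialize a host-order value as 8 big-endian bytes (network order). */
static void store_be64(uint8_t out[8], uint64_t v)
{
	for (int i = 7; i >= 0; i--) {
		out[i] = (uint8_t)(v & 0xff);
		v >>= 8;
	}
}

/* Derive the host-order 32-bit GRE key from a host-order 64-bit tunnel id. */
static uint32_t gre_key_from_tun_id(uint64_t tun_id)
{
	uint8_t be64[8], be32[4];
	uint64_t raw;
	uint32_t raw32;

	store_be64(be64, tun_id);
	memcpy(&raw, be64, sizeof(raw));     /* plays the role of a __be64 */
	raw32 = be64_get_low32_raw(raw);     /* plays the role of a __be32 */
	memcpy(be32, &raw32, sizeof(raw32));
	return ((uint32_t)be32[0] << 24) | ((uint32_t)be32[1] << 16) |
	       ((uint32_t)be32[2] << 8) | (uint32_t)be32[3];
}
```

On either byte order, a tunnel id of 0x1122334455667788 yields the GRE key 0x55667788. The upper 32 bits of the 64-bit tun_id cannot be represented in the GRE key field.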
[net-next RFC 08/14] openvswitch: Allocate & attach ip_tunnel_info for tunnel set action
Make use of the new skb tunnel metadata field by allocating an
ip_tunnel_info per OVS tunnel set action and then attaching that metadata
to each skb that passes the set action.

The old egress_tun_info via the OVS_CB() is left in place until all
tunnel vports have been converted to the new method.

Signed-off-by: Thomas Graf <tg...@suug.ch>
Signed-off-by: Pravin B Shelar <pshe...@nicira.com>
---
 net/openvswitch/actions.c      |  8 +-
 net/openvswitch/datapath.c     |  8 +++---
 net/openvswitch/flow.h         |  5 
 net/openvswitch/flow_netlink.c | 59 +-
 net/openvswitch/flow_netlink.h |  1 +
 5 files changed, 69 insertions(+), 12 deletions(-)

diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index 34cad57..484d965 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -726,7 +726,13 @@ static int execute_set_action(struct sk_buff *skb,
 {
 	/* Only tunnel set execution is supported without a mask. */
 	if (nla_type(a) == OVS_KEY_ATTR_TUNNEL_INFO) {
-		OVS_CB(skb)->egress_tun_info = nla_data(a);
+		struct ovs_tunnel_info *tun = nla_data(a);
+
+		skb_attach_tunnel_info(skb, tun->info);
+
+		/* FIXME: Remove when all vports have been converted */
+		OVS_CB(skb)->egress_tun_info = tun->info;
+
 		return 0;
 	}
 
diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 3b90461..3315e3a 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -1004,7 +1004,7 @@ static int ovs_flow_cmd_new(struct sk_buff *skb, struct genl_info *info)
 		}
 		ovs_unlock();
 
-		ovs_nla_free_flow_actions(old_acts);
+		ovs_nla_free_flow_actions_rcu(old_acts);
 		ovs_flow_free(new_flow, false);
 	}
 
@@ -1016,7 +1016,7 @@ err_unlock_ovs:
 	ovs_unlock();
 	kfree_skb(reply);
 err_kfree_acts:
-	kfree(acts);
+	ovs_nla_free_flow_actions(acts);
 err_kfree_flow:
 	ovs_flow_free(new_flow, false);
 error:
@@ -1143,7 +1143,7 @@ static int ovs_flow_cmd_set(struct sk_buff *skb, struct genl_info *info)
 	if (reply)
 		ovs_notify(&dp_flow_genl_family, reply, info);
 	if (old_acts)
-		ovs_nla_free_flow_actions(old_acts);
+		ovs_nla_free_flow_actions_rcu(old_acts);
 
 	return 0;
 
@@ -1151,7 +1151,7 @@ err_unlock_ovs:
 	ovs_unlock();
 	kfree_skb(reply);
 err_kfree_acts:
-	kfree(acts);
+	ovs_nla_free_flow_actions(acts);
 error:
 	return error;
 }
diff --git a/net/openvswitch/flow.h b/net/openvswitch/flow.h
index cadc6c5..193eab9 100644
--- a/net/openvswitch/flow.h
+++ b/net/openvswitch/flow.h
@@ -45,6 +45,11 @@ struct sk_buff;
 #define TUN_METADATA_OPTS(flow_key, opt_len) \
 	((void *)((flow_key)->tun_opts + TUN_METADATA_OFFSET(opt_len)))
 
+struct ovs_tunnel_info
+{
+	struct ip_tunnel_info *info;
+};
+
 #define OVS_SW_FLOW_KEY_METADATA_SIZE			\
 	(offsetof(struct sw_flow_key, recirc_id) +	\
 	FIELD_SIZEOF(struct sw_flow_key, recirc_id))
diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
index ecfa530..35086c6 100644
--- a/net/openvswitch/flow_netlink.c
+++ b/net/openvswitch/flow_netlink.c
@@ -1548,11 +1548,45 @@ static struct sw_flow_actions *nla_alloc_flow_actions(int size, bool log)
 	return sfa;
 }
 
+static void ovs_nla_free_set_action(const struct nlattr *a)
+{
+	const struct nlattr *ovs_key = nla_data(a);
+	struct ovs_tunnel_info *ovs_tun;
+
+	switch (nla_type(ovs_key)) {
+	case OVS_KEY_ATTR_TUNNEL_INFO:
+		ovs_tun = nla_data(ovs_key);
+		ip_tunnel_info_put(ovs_tun->info);
+		break;
+	}
+}
+
+void ovs_nla_free_flow_actions(struct sw_flow_actions *sf_acts)
+{
+	const struct nlattr *a;
+	int rem;
+
+	nla_for_each_attr(a, sf_acts->actions, sf_acts->actions_len, rem) {
+		switch (nla_type(a)) {
+		case OVS_ACTION_ATTR_SET:
+			ovs_nla_free_set_action(a);
+			break;
+		}
+	}
+
+	kfree(sf_acts);
+}
+
+static void __ovs_nla_free_flow_actions(struct rcu_head *head)
+{
+	ovs_nla_free_flow_actions(container_of(head, struct sw_flow_actions, rcu));
+}
+
 /* Schedules 'sf_acts' to be freed after the next RCU grace period.
  * The caller must hold rcu_read_lock for this to be sensible.
  */
-void ovs_nla_free_flow_actions(struct sw_flow_actions *sf_acts)
+void ovs_nla_free_flow_actions_rcu(struct sw_flow_actions *sf_acts)
 {
-	kfree_rcu(sf_acts, rcu);
+	call_rcu(&sf_acts->rcu, __ovs_nla_free_flow_actions);
 }
 
 static struct nlattr *reserve_sfa_size(struct sw_flow_actions **sfa,
@@ -1747,6 +1781,7 @@ static int validate_and_copy_set_tun(const struct nlattr *attr,
 	struct sw_flow_match match
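The switch from kfree_rcu() to call_rcu() is needed because kfree_rcu() can only kfree() the object. Freeing a flow's actions now also has to drop the reference each tunnel set action holds on its ip_tunnel_info. The following single-threaded userspace sketch models that ownership. The types and helpers are hypothetical and use a plain `int` refcount where the kernel would use atomics and RCU. It also assumes that attaching the metadata to a packet takes its own reference; the patch does not show this.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a refcounted ip_tunnel_info. */
struct tun_info_model {
	int refcnt;
	unsigned long long tun_id;
};

static int freed_count; /* instrumentation for the sketch only */

static struct tun_info_model *tun_info_alloc(unsigned long long id)
{
	struct tun_info_model *t = calloc(1, sizeof(*t));

	if (t) {
		t->refcnt = 1; /* reference owned by the set action */
		t->tun_id = id;
	}
	return t;
}

static struct tun_info_model *tun_info_get(struct tun_info_model *t)
{
	t->refcnt++;
	return t;
}

static void tun_info_put(struct tun_info_model *t)
{
	if (t && --t->refcnt == 0) {
		free(t);
		freed_count++;
	}
}

/* A flow's action list owns one reference per tunnel set action. */
struct actions_model {
	struct tun_info_model *set_tun[4];
	int n;
};

/* Counterpart of ovs_nla_free_flow_actions(): walk the actions, drop refs. */
static void actions_free(struct actions_model *acts)
{
	for (int i = 0; i < acts->n; i++)
		tun_info_put(acts->set_tun[i]);
	acts->n = 0;
}
```

If the flow is deleted while a packet still carries the metadata, the object survives until the packet drops its own reference.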
Re: [net-next RFC 05/14] route: Per route tunnel metadata with RTA_TUNNEL
On 06/01/15 at 05:51pm, Robert Shearman wrote:
> On 01/06/15 15:27, Thomas Graf wrote:
> > Introduces a new Netlink attribute RTA_TUNNEL which allows routes to set
> > tunnel transmit metadata and specify the tunnel endpoint or tunnel id on
> > a per route basis. The route must point to a tunnel device which
> > understands per skb tunnel metadata and has been put into the
> > respective mode.
>
> We've been discussing something similar for the purposes of IP over
> MPLS, but most of the attributes for IP tunnels aren't relevant for
> MPLS. It'd be great if we can come up with something general enough
> that can serve both purposes. I've just sent a patch series
> ([RFC net-next 0/3] IP imposition of per-nh MPLS encap) which I believe
> would allow this.

Nice! At first glance, your series looks like an excellent complement
to this series. I'll comment directly in your series.
Re: [RFC net-next 0/3] IP imposition of per-nh MPLS encap
On 06/01/15 at 05:46pm, Robert Shearman wrote:
> In order to be able to function as a Label Edge Router in an MPLS
> network, it is necessary to be able to take IP packets and impose an
> MPLS encap and forward them out. The traditional approach of setting up
> an interface for each "tunnel" endpoint doesn't scale for the common
> MPLS use-cases where each IP route tends to be assigned a different
> label as encap.
>
> The solution suggested here for further discussion is to provide the
> facility to define encap data on a per-nexthop basis using a new
> netlink attribute, RTA_ENCAP, which would be opaque to the IPv4/IPv6
> forwarding code, but interpreted by the virtual interface assigned to
> the nexthop.

RTA_ENCAP is currently a binary blob specific to each encapsulation type
interface. I guess this should be converted to a set of nested Netlink
attributes for each type of encap to make it extendible in the future.

What is your plan regarding the receive side and on the matching of
encap fields? Storing the receive parameters is what led me to storing
it in skb_shared_info.
Re: netlink: Disable insertions/removals during rehash
On 05/15/15 at 08:06am, Herbert Xu wrote:
> On Thu, May 14, 2015 at 07:37:56AM -0700, Eric Dumazet wrote:
> > This solves the corruption thanks Herbert.
>
> Great.
>
> > But wasn't rhashtable meant to be faster? ;)
>
> Is it, that's news to me :)

Eric, can you share the scripts you used to test this?
Re: [PATCH net-next 6/6] netlink: allow to listen all netns
On 05/06/15 at 11:58am, Nicolas Dichtel wrote:
> More accurately, listen all netns that have a nsid assigned into the
> netns where the netlink socket is opened.
> For this purpose, a netlink socket option is added:
> NETLINK_LISTEN_ALL_NSID. When this option is set on a netlink socket,
> this socket will receive netlink notifications from all netns that have
> a nsid assigned into the netns where the socket has been opened.
> The nsid is sent to userland via ancillary data.
>
> With this patch, a daemon needs only one socket to listen many netns.
> This is useful when the number of netns is high.
>
> Signed-off-by: Nicolas Dichtel <nicolas.dich...@6wind.com>

[...]

> +/* This function returns true if the peer netns has an id assigned into the
> + * current netns.
> + */
> +bool peernet_has_id(struct net *net, struct net *peer)
> +{
> +	return peernet2id(net, peer) >= 0;
> +}

Missing export?

> +
>  struct net *get_net_ns_by_id(struct net *net, int id)
>  {
>  	unsigned long flags;
> diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
> index ec4adbdcb9b4..bdbde542e952 100644
> --- a/net/netlink/af_netlink.c
> +++ b/net/netlink/af_netlink.c
> @@ -83,6 +83,7 @@ struct listeners {
>  #define NETLINK_RECV_PKTINFO		0x2
>  #define NETLINK_BROADCAST_SEND_ERROR	0x4
>  #define NETLINK_RECV_NO_ENOBUFS		0x8
> +#define NETLINK_LISTEN_ALL		0x10

Maybe name this NETLINK_LISTEN_ALL_NSID just to make it clear?

> +		if (!file_ns_capable(sk->sk_socket->file, p->net->user_ns,
> +				     CAP_NET_BROADCAST))
> +			return;
> +	}
> +	NETLINK_CB(p->skb).net = p->net;

Does this need a get_net()? The netns could disappear while the skb is
queued, right?
Re: rhashtable: Add cap on number of elements in hash table
On 04/24/15 at 08:57am, Herbert Xu wrote:
> It seems that I lost track somewhere along the line. I meant
> to add an explicit limit on the overall number of entries since
> that was what users like netlink expected but never got around
> to doing it. Instead it seems that we're currently relying on
> the rht_grow_above_100 to protect us.

Can we please just take Johannes's fix as-is first? It fixes the bug at
hand in an isolated manner without introducing any new knobs. Your patch
includes his fix as-is without modification anyway.

> So here is a patch that adds an explicit limit and fixes the
> problem Johannes reported.
>
> ---8<---
> We currently have no limit on the number of elements in a hash table.
> This is very bad especially considering that some rhashtable users
> had such a limit before the conversion and relied on it for defence
> against DoS attacks.

Which users are you talking about? Both Netlink and TIPC still have an
upper limit. nft sets are controlled by privileged users.

> We already have a maximum hash table size limit but its enforcement
> is only by luck and results in a nasty WARN_ON.

As I stated earlier, this is no longer the case and thus this paragraph
only confuses the commit message.

> This patch adds a new parameter insecure_max_entries which becomes
> the cap on the table. If unset it defaults to max_size. If it is
> also zero it means that there is no cap on the number of elements
> in the table. However, the table will grow whenever the utilisation
> hits 100% and if that growth fails, you will get ENOMEM on insertion.

Last time we discussed this it was said that the caller should enforce
the limit like Netlink does.

I'm fine with adding an upper max but I'd like to discuss that in the
context of a full series which converts all existing enforcements and
also contains a testing mechanism to verify this.

Also, unless you can show me where this is currently a real bug, this is
really net-next material.
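"The caller should enforce the limit" can be done with a counter reserved before each insert and released on removal. It has the same shape as the `addrmax` check in vxlan_fdb_create() shown earlier in this thread, which returns -ENOSPC. The following minimal sketch uses C11 atomics. `struct capped_table` and its helpers are hypothetical and are not part of the rhashtable API. The reserve-then-roll-back pattern keeps concurrent inserters from overshooting the cap.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical caller-side bookkeeping kept next to an rhashtable. */
struct capped_table {
	atomic_uint nelems;
	unsigned int max_entries; /* 0 = no cap */
};

static void capped_init(struct capped_table *t, unsigned int max_entries)
{
	atomic_init(&t->nelems, 0);
	t->max_entries = max_entries;
}

/* Reserve a slot before inserting; roll back and fail if the cap is hit. */
static int capped_reserve(struct capped_table *t)
{
	unsigned int n = atomic_fetch_add(&t->nelems, 1) + 1;

	if (t->max_entries && n > t->max_entries) {
		atomic_fetch_sub(&t->nelems, 1);
		return -ENOSPC;
	}
	return 0;
}

/* Call after a successful removal (or when the actual insert fails). */
static void capped_release(struct capped_table *t)
{
	atomic_fetch_sub(&t->nelems, 1);
}
```

The table itself stays unaware of the cap, which is the division of responsibility argued for above. Centralizing it in rhashtable as proposed would remove the need for each user to carry this bookkeeping.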
Re: rhashtable: Add cap on number of elements in hash table
On 04/24/15 at 04:12pm, Herbert Xu wrote:
> On Fri, Apr 24, 2015 at 09:06:08AM +0100, Thomas Graf wrote:
> > Which users are you talking about? Both Netlink and TIPC still have
> > an upper limit. nft sets are controlled by privileged users.
>
> There is no limit in netlink apart from UINT_MAX AFAICS. Allowing
> UINT_MAX entries into a hash table limited to 64K is not a good thing.

OK, so you are saying that the Netlink limit is too low? Then let's fix
that.

You are claiming that the rhashtable conversion removed a cap. I'm not
seeing such a change. Can you point me to where netlink_insert()
enforced a cap pre-rhashtable?
Re: [PATCH] rhashtable: don't attempt to grow when at max_size
On 04/23/15 at 04:38pm, Johannes Berg wrote: From: Johannes Berg johannes.b...@intel.com The conversion of mac80211's station table to rhashtable had a bug that I found by accident in code review, that hadn't been found as rhashtable apparently managed to have a maximum hash chain length of one (!) in all our testing. This is the desired chain length ;-) In order to test the bug and verify the fix I set my rhashtable's max_size very low (4) in order to force getting hash collisions. At that point, rhashtable WARNed in rhashtable_insert_rehash() but didn't actually reject the hash table insertion. This caused it to lose insertions - my master list of stations would have 9 entries, but the rhashtable only had 5. This may warrant a deeper look, but that WARN_ON() just shouldn't happen. The warning got fixed recently (51bb8e331b) and rhashtable_insert_rehash() now only allows a single rehash if at max_size already. It will now return -EBUSY. Insertions may still fail while the table is above 100% utilization so this fix is absolutely needed though. Fix this by not returning true from rht_grow_above_100() when the rhashtable's max_size has been reached - in this case the user is explicitly configuring it to be at most that big, so even if it's now above 100% it shouldn't attempt to resize. Good catch. I wonder whether we want to trigger a periodic rehash in an interval in this situation or just leave this up to the user to set up a timer himself. This fixes the lost insertion issue and consequently allows my code to display its error (and verify my fix for it.) Signed-off-by: Johannes Berg johannes.b...@intel.com Acked-by: Thomas Graf tg...@suug.ch
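Below is a user-space model of the check Johannes describes. "Above 100%" is taken to mean more elements than buckets, and growth is no longer requested once the bucket count has reached max_size (0 meaning no limit). This is a sketch, not the kernel's rht_grow_above_100(); the function name and parameters are illustrative.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true if the table should grow: utilisation is above 100%
 * (more elements than buckets) AND the user-configured max_size has not
 * been reached yet. max_size == 0 means "no limit". */
bool grow_above_100(size_t nelems, size_t buckets, size_t max_size)
{
	return nelems > buckets && (!max_size || buckets < max_size);
}
```

With max_size = 4, as in the test described above, a 4-bucket table holding 5 entries no longer asks to grow, so the insertion path does not attempt an impossible resize.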
Re: [RFC 1/3] tc: fix return values of ingress qdisc
On 04/22/15 at 04:29pm, Cong Wang wrote: On Wed, Apr 22, 2015 at 3:04 PM, Alexei Starovoitov a...@plumgrid.com wrote: On 4/21/15 9:59 PM, Cong Wang wrote: On Tue, Apr 21, 2015 at 12:27 PM, Alexei Starovoitov a...@plumgrid.com wrote: ingress qdisc should return NET_XMIT_* values just like all other qdiscs. XMIT already means egress... maybe then it should be renamed as well. from include/linux/netdevice.h: /* qdisc ->enqueue() return codes. */ #define NET_XMIT_SUCCESS 0x00 ... the point is that qdisc->enqueue() must return NET_XMIT_* values. ingress qdisc is violating this and therefore should be fixed. XMIT is nonsense for ingress, you really need to pick another name for it if TC_ACT_OK isn't okay for you (it is okay for me). You transmit into a qdisc. If that terminology doesn't suit you then rename it to NET_QUEUE_* but moving away from returning TC_ACT_* is definitely the right thing to do here.
Re: [RFC 3/3] tc: cleanup tc_classify
On 04/22/15 at 04:38pm, Cong Wang wrote: On Wed, Apr 22, 2015 at 3:27 PM, Alexei Starovoitov a...@plumgrid.com wrote: On 4/21/15 10:05 PM, Cong Wang wrote: On Tue, Apr 21, 2015 at 12:27 PM, Alexei Starovoitov a...@plumgrid.com wrote: introduce tc_classify_act() and qdisc_drop_bypass() helper functions to reduce copy-paste among different qdiscs I like this cleanup. It aligns all skb dropping in qdiscs to a qdisc_drop*() function. I don't think qdisc_drop_bypass() is more readable than without it, maybe you need a better name, or just leave the code as it is. what would be a better name? I'm open to suggestions. My reading for qdisc_drop_bypass() is it bypasses packet dropping for some case, apparently doesn't match its definition. I can't think out a better name therefore I don't think it deserves a function, just leave as it is. Interesting logic ;-)
[PATCH net 2/2] rhashtable: Do not schedule more than one rehash if we can't grow further
The current code only stops inserting rehashes into the chain when no resizes are scheduled. As long as resizes are scheduled and while inserting above the utilization watermark, more and more rehashes will be scheduled. This led to a perfect DoS storm with thousands of rehashes scheduled, which in turn led to thousands of spinlocks being taken sequentially. Instead, only allow either a series of resizes or a single rehash. Drop any further rehashes and return -EBUSY. Fixes: ccd57b1bd324 (rhashtable: Add immediate rehash during insertion) Signed-off-by: Thomas Graf tg...@suug.ch Acked-by: Herbert Xu herb...@gondor.apana.org.au --- lib/rhashtable.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/lib/rhashtable.c b/lib/rhashtable.c index f648cfd..b28df40 100644 --- a/lib/rhashtable.c +++ b/lib/rhashtable.c @@ -405,8 +405,8 @@ int rhashtable_insert_rehash(struct rhashtable *ht) if (rht_grow_above_75(ht, tbl)) size *= 2; - /* More than two rehashes (not resizes) detected. */ - else if (WARN_ON(old_tbl != tbl && old_tbl->size == size)) + /* Do not schedule more than one rehash */ + else if (old_tbl != tbl) return -EBUSY; new_tbl = bucket_table_alloc(ht, size, GFP_ATOMIC); -- 2.3.5
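The decision made by the new hunk can be modelled in isolation. This is a simplified sketch, and the names are stand-ins: rehash_pending plays the role of old_tbl != tbl (another table is already chained) and above_75 plays the role of rht_grow_above_75(). Neither plan_rehash() nor its signature exists in lib/rhashtable.c.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* above_75:       utilisation is above the 75% grow watermark
 * rehash_pending: a new table is already attached to the chain
 * *size:          in = current bucket count, out = size to allocate
 * Returns 0 if a new table should be allocated, -EBUSY otherwise. */
int plan_rehash(bool above_75, bool rehash_pending, size_t *size)
{
	if (above_75)
		*size *= 2;     /* growing: a series of resizes is allowed */
	else if (rehash_pending)
		return -EBUSY;  /* at most one same-size rehash in flight */
	return 0;
}
```

Growing keeps taking priority, while a second same-size rehash is refused. This is what stops the rehash storm described above.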
[PATCH net 1/2] rhashtable: Schedule async resize when sync realloc fails
When rhashtable_insert_rehash() fails with ENOMEM, this indicates that we can't allocate the necessary memory in the current context but the limits as set by the user would still allow the table to grow. Thus attempt an async resize in the background where we can allocate using GFP_KERNEL which is more likely to succeed. The insertion itself will still fail to indicate pressure. This fixes a bug where the table would never continue growing once the utilization is above 100%. Fixes: ccd57b1bd324 (rhashtable: Add immediate rehash during insertion) Signed-off-by: Thomas Graf tg...@suug.ch --- lib/rhashtable.c | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/lib/rhashtable.c b/lib/rhashtable.c index 4898442..f648cfd 100644 --- a/lib/rhashtable.c +++ b/lib/rhashtable.c @@ -410,8 +410,13 @@ int rhashtable_insert_rehash(struct rhashtable *ht) return -EBUSY; new_tbl = bucket_table_alloc(ht, size, GFP_ATOMIC); - if (new_tbl == NULL) + if (new_tbl == NULL) { + /* Schedule async resize/rehash to try allocation +* non-atomic context. +*/ + schedule_work(&ht->run_work); return -ENOMEM; + } err = rhashtable_rehash_attach(ht, tbl, new_tbl); if (err) { -- 2.3.5
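Here is a rough user-space illustration of the fallback pattern. It tries the atomic allocation first. If that fails, it queues deferred work that can use a sleeping allocator, while still failing the current insertion. struct table_ctl, alloc_atomic() and insert_rehash() are hypothetical. The work_scheduled flag merely stands in for schedule_work(&ht->run_work).

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical control block: simulates GFP_ATOMIC failure and records
 * that deferred (process-context) work was queued. */
struct table_ctl {
	bool atomic_alloc_fails;
	bool work_scheduled;  /* stands in for schedule_work(&ht->run_work) */
};

static void *alloc_atomic(struct table_ctl *ctl, size_t size)
{
	return ctl->atomic_alloc_fails ? NULL : malloc(size);
}

/* Try to allocate the new bucket array right away. On failure, defer the
 * resize and still return -ENOMEM so the insertion signals pressure. */
int insert_rehash(struct table_ctl *ctl, size_t size)
{
	void *new_tbl = alloc_atomic(ctl, size);

	if (new_tbl == NULL) {
		ctl->work_scheduled = true;
		return -ENOMEM;
	}
	free(new_tbl);  /* a real implementation would attach the new table */
	return 0;
}
```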
[PATCH net 0/2 v2] rhashtable rehashing fixes
Some rhashtable rehashing bugs found while testing with the next rhashtable self-test queued up for the next devel cycle: https://github.com/tgraf/net-next/commits/rht v2: - Moved schedule_work() call into rhashtable_insert_rehash() Thomas Graf (2): rhashtable: Schedule async resize when sync realloc fails rhashtable: Do not schedule more than one rehash if we can't grow further lib/rhashtable.c | 11 --- 1 file changed, 8 insertions(+), 3 deletions(-) -- 2.3.5
Re: [PATCH net 1/2] rhashtable: Schedule async resize when sync realloc fails
On 04/21/15 at 10:10pm, David Miller wrote: From: Herbert Xu herb...@gondor.apana.org.au Date: Wed, 22 Apr 2015 08:36:34 +0800 On Tue, Apr 21, 2015 at 02:55:34PM +0200, Thomas Graf wrote: When rhashtable_insert_rehash() fails with ENOMEM, this indicates that we can't allocate the necessary memory in the current context but the limits as set by the user would still allow to grow. Thus attempt an async resize in the background where we can allocate using GFP_KERNEL which is more likely to succeed. The insertion itself will still fail to indicate pressure. This fixes a bug where the table would never continue growing once the utilization is above 100%. Fixes: ccd57b1bd324 (rhashtable: Add immediate rehash during insertion) Signed-off-by: Thomas Graf tg...@suug.ch Good catch. But I think this call should happen in rhashtable_insert_rehash since it's on the slow-path. Ok, then I expect a respin of this series. Agreed, respinning.
[PATCH net 2/2] rhashtable: Do not schedule more than one rehash if we can't grow further
The current code only stops inserting rehashes into the chain when no resizes are scheduled. As long as resizes are scheduled and while inserting above the utilization watermark, more and more rehashes will be scheduled. This led to a perfect DoS storm with thousands of rehashes scheduled, which in turn led to thousands of spinlocks being taken sequentially. Instead, only allow either a series of resizes or a single rehash. Drop any further rehashes and return -EBUSY. Fixes: ccd57b1bd324 (rhashtable: Add immediate rehash during insertion) Signed-off-by: Thomas Graf tg...@suug.ch --- lib/rhashtable.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/lib/rhashtable.c b/lib/rhashtable.c index 4898442..cb819ed 100644 --- a/lib/rhashtable.c +++ b/lib/rhashtable.c @@ -405,8 +405,8 @@ int rhashtable_insert_rehash(struct rhashtable *ht) if (rht_grow_above_75(ht, tbl)) size *= 2; - /* More than two rehashes (not resizes) detected. */ - else if (WARN_ON(old_tbl != tbl && old_tbl->size == size)) + /* Do not schedule more than one rehash */ + else if (old_tbl != tbl) return -EBUSY; new_tbl = bucket_table_alloc(ht, size, GFP_ATOMIC); -- 2.3.5
[PATCH net 0/2] rhashtable rehashing fixes
Some rhashtable rehashing bugs found while testing with the next rhashtable self-test queued up for the next devel cycle: https://github.com/tgraf/net-next/commits/rht Thomas Graf (2): rhashtable: Schedule async resize when sync realloc fails rhashtable: Do not schedule more than one rehash if we can't grow further include/linux/rhashtable.h | 2 ++ lib/rhashtable.c | 4 ++-- 2 files changed, 4 insertions(+), 2 deletions(-) -- 2.3.5
[PATCH net 1/2] rhashtable: Schedule async resize when sync realloc fails
When rhashtable_insert_rehash() fails with ENOMEM, this indicates that we can't allocate the necessary memory in the current context but the limits as set by the user would still allow the table to grow. Thus attempt an async resize in the background where we can allocate using GFP_KERNEL which is more likely to succeed. The insertion itself will still fail to indicate pressure. This fixes a bug where the table would never continue growing once the utilization is above 100%. Fixes: ccd57b1bd324 (rhashtable: Add immediate rehash during insertion) Signed-off-by: Thomas Graf tg...@suug.ch --- include/linux/rhashtable.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h index e23d242..7040b5c 100644 --- a/include/linux/rhashtable.h +++ b/include/linux/rhashtable.h @@ -593,6 +593,8 @@ slow_path: spin_unlock_bh(lock); err = rhashtable_insert_rehash(ht); rcu_read_unlock(); + if (err == -ENOMEM) + schedule_work(&ht->run_work); if (err) return err; -- 2.3.5
Re: Revert net: Reset secmark when scrubbing packet
On 04/16/15 at 04:12pm, Herbert Xu wrote: On Thu, Apr 16, 2015 at 05:02:15PM +1000, James Morris wrote: They don't support namespaces, and maintaining the label is critical for SELinux, at least, which mediates security for the system as a whole. Thanks for the confirmation James, I thought this looked a bit dodgy :) ---8--- This patch reverts commit b8fb4e0648a2ab3734140342002f68fb0c7d1602 because the secmark must be preserved even when a packet crosses namespace boundaries. The reason is that security labels apply to the system as a whole and are not per-namespace. No objection to reverting, _BUT_ just because security labels apply to the system as a whole does not mean that the packet in the underlay and the one in the overlay belong to the same context. The point here was to not blindly inherit the security context of a packet based on the outer or inner header. Someone tagging all packets addressed to the host itself with an SELinux context may not expect that SELinux context to be preserved into a namespaced tenant.
Re: [v3] skbuff: Do not scrub skb mark within the same name space
On 04/16/15 at 09:03am, Herbert Xu wrote: The commit ea23192e8e577dfc51e0f4fc5ca113af334edff9 (tunnels: harmonize cleanup done on skb on rx path) broke anyone trying to use netfilter marking across IPv4 tunnels. While most of the fields that are cleared by skb_scrub_packet don't matter, the netfilter mark must be preserved. This patch rearranges skb_scrub_packet to preserve the mark field. Fixes: ea23192e8e57 (tunnels: harmonize cleanup done on skb on rx path) Signed-off-by: Herbert Xu herb...@gondor.apana.org.au Acked-by: Thomas Graf tg...@suug.ch We should also add a flag to veth which explicitly allows preserving the mark into the namespace.
[RFC] ethtool netlink interface
Hello, Before I continue to finish this work I'd like to get a few comments on my implementation attempt. The following patch implements the ETHTOOL_SSET and ETHTOOL_GSET commands via netlink. The individual commands are implemented as separate functions and hooked into a table holding a validate, set and fill function for each command. Additionally, an entry must be made in the attribute policy to validate attributes when received. Each ethtool command bundle is stored as a nested attribute in the regular link netlink message; therefore, unlike the ioctl interface, multiple ethtool commands can be issued in the same message, allowing links to be fully configured with a single message. There is one big disadvantage: due to the nature of ioctl it is basically not possible to share any code between the ioctl and netlink implementations. This implies duplicating code unless we want to do the same hack as the fib frontend, which constructs netlink messages inside the kernel. Index: net-2.6.26/include/linux/if_link.h === --- net-2.6.26.orig/include/linux/if_link.h 2008-02-22 14:13:22.0 +0100 +++ net-2.6.26/include/linux/if_link.h 2008-02-22 14:40:24.0 +0100 @@ -79,6 +79,7 @@ IFLA_LINKINFO, #define IFLA_LINKINFO IFLA_LINKINFO IFLA_NET_NS_PID, + IFLA_ETHTOOL, __IFLA_MAX }; Index: net-2.6.26/net/core/ethtool.c === --- net-2.6.26.orig/net/core/ethtool.c 2008-02-22 14:13:22.0 +0100 +++ net-2.6.26/net/core/ethtool.c 2008-02-25 13:51:23.0 +0100 @@ -18,6 +18,7 @@ #include <linux/ethtool.h> #include <linux/netdevice.h> #include <asm/uaccess.h> +#include <net/rtnetlink.h> /* * Some useful ethtool_ops methods that're device independent.
@@ -977,6 +978,136 @@ return rc; } +static int validate_settings(struct net_device *dev, struct nlattr *attr) +{ + if (!dev->ethtool_ops->get_settings) + return -EOPNOTSUPP; + + return 0; +} + +static int set_settings(struct net_device *dev, struct nlattr *attr) +{ + return dev->ethtool_ops->set_settings(dev, nla_data(attr)); +} + +static int fill_settings(struct sk_buff *skb, struct net_device *dev) +{ + const struct ethtool_ops *ops = dev->ethtool_ops; + struct ethtool_cmd cmd = { ETHTOOL_GSET }; + int err; + + if (!ops->get_settings) + return 0; + + if ((err = ops->get_settings(dev, &cmd)) < 0) + return err; + + return nla_put(skb, IFLA_ET_SETTINGS, sizeof(cmd), &cmd); +} + +static struct { + int (*validate)(struct net_device *, struct nlattr *); + int (*exec)(struct net_device *, struct nlattr *); + int (*fill)(struct sk_buff *, struct net_device *); +} nlops[IFLA_ET_MAX+1] = { + [IFLA_ET_SETTINGS] = { .validate = validate_settings, + .exec = set_settings, + .fill = fill_settings, }, +}; + +static const struct nla_policy ethtool_policy[IFLA_ET_MAX+1] = { + [IFLA_ET_SETTINGS] = { .len = sizeof(struct ethtool_cmd) }, +}; + +int ethtool_validate_nlattr(struct net_device *dev, struct nlattr *cfg) +{ + const struct ethtool_ops *ops; + struct nlattr *attr; + int err, remaining = 0; + + if (!capable(CAP_NET_ADMIN)) + return -EPERM; + + if (!netif_device_present(dev)) + return -ENODEV; + + if (!(ops = dev->ethtool_ops)) + return -EOPNOTSUPP; + + if ((err = nla_validate_nested(cfg, IFLA_ET_MAX, ethtool_policy)) < 0) + goto errout; + + nla_for_each_nested(attr, cfg, remaining) { + if (nlops[attr->nla_type].validate) { + err = nlops[attr->nla_type].validate(dev, attr); + if (err < 0) + goto errout; + } + } + +errout: + return err; +} + +int ethtool_execute_nlattr(struct net_device *dev, struct nlattr *et_attr) +{ + const struct ethtool_ops *ops = dev->ethtool_ops; + struct nlattr *attr; + unsigned long old_features; + int err, remaining = 0; + + if (ops->begin && (err = ops->begin(dev)) < 0) +
return err; + + old_features = dev->features; + + nla_for_each_nested(attr, et_attr, remaining) { + if (nlops[attr->nla_type].exec) { + if ((err = nlops[attr->nla_type].exec(dev, attr)) < 0) + goto errout; + } + } + + err = 0; +errout: + if (ops->complete) + ops->complete(dev); + + if (old_features != dev->features) + netdev_features_change(dev); + + return err; +} + +int ethtool_fill_nlattr(struct sk_buff *skb, struct net_device *dev) +{ + struct nlattr *attr; + int nfilled = 0, i, err = -EMSGSIZE; +
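The table-driven dispatch described in the RFC uses one validate and one exec handler per attribute type. It mirrors the split between ethtool_validate_nlattr() and ethtool_execute_nlattr(), and can be sketched in plain C as below. The types are hypothetical stand-ins for struct nlattr and struct net_device. Unlike the quoted patch, this sketch bounds-checks the attribute type before indexing the table.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical attribute/device stand-ins for struct nlattr/net_device. */
enum { ET_UNSPEC, ET_SETTINGS, ET_MAX = ET_SETTINGS };

struct attr { int type; int value; };
struct dev { int speed; int has_set_op; };

struct et_op {
	int (*validate)(struct dev *, const struct attr *);
	int (*exec)(struct dev *, const struct attr *);
};

static int validate_settings(struct dev *d, const struct attr *a)
{
	(void)a;
	return d->has_set_op ? 0 : -EOPNOTSUPP;
}

static int set_settings(struct dev *d, const struct attr *a)
{
	d->speed = a->value;
	return 0;
}

static const struct et_op ops[ET_MAX + 1] = {
	[ET_SETTINGS] = { .validate = validate_settings, .exec = set_settings },
};

/* Validate every attribute first, then execute them all, so a bad
 * attribute late in the message does not leave the device half-configured.
 * Types that are out of range or have no handler are skipped. */
int apply_attrs(struct dev *d, const struct attr *attrs, size_t n)
{
	size_t i;
	int err;

	for (i = 0; i < n; i++) {
		int t = attrs[i].type;
		if (t > 0 && t <= ET_MAX && ops[t].validate &&
		    (err = ops[t].validate(d, &attrs[i])) < 0)
			return err;
	}
	for (i = 0; i < n; i++) {
		int t = attrs[i].type;
		if (t > 0 && t <= ET_MAX && ops[t].exec &&
		    (err = ops[t].exec(d, &attrs[i])) < 0)
			return err;
	}
	return 0;
}
```

Adding another ethtool command in this scheme means adding one table entry rather than another branch in the parsing loop.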
Re: [RFC] ethtool netlink interface
* Jeff Garzik [EMAIL PROTECTED] 2008-02-25 12:30 However, I would think it inconsistent to only do SSET/GSET. If others are OK with this patch, are you open to implementing the full set of ethtool operations? Of course, I would also provide a documented userspace api within libnl.
[RTNL]: Validate hardware and broadcast address attribute for RTM_NEWLINK
RTM_NEWLINK allows for already existing links to be modified. For this purpose do_setlink() is called which expects address attributes with a payload length of at least dev->addr_len. This patch adds the necessary validation for the RTM_NEWLINK case. The address length for links to be created is not checked for now as the actual attribute length is used when copying the address to the netdevice structure. It might make sense to report an error if less than addr_len bytes are provided but enforcing this might break drivers trying to be smart by not transmitting all-zero addresses. Signed-off-by: Thomas Graf [EMAIL PROTECTED] Index: net-2.6.26/net/core/rtnetlink.c === --- net-2.6.26.orig/net/core/rtnetlink.c 2008-02-22 01:50:53.0 +0100 +++ net-2.6.26/net/core/rtnetlink.c 2008-02-22 11:28:59.0 +0100 @@ -726,6 +726,21 @@ return net; } +static int validate_linkmsg(struct net_device *dev, struct nlattr *tb[]) +{ + if (dev) { + if (tb[IFLA_ADDRESS] && + nla_len(tb[IFLA_ADDRESS]) < dev->addr_len) + return -EINVAL; + + if (tb[IFLA_BROADCAST] && + nla_len(tb[IFLA_BROADCAST]) < dev->addr_len) + return -EINVAL; + } + + return 0; +} + static int do_setlink(struct net_device *dev, struct ifinfomsg *ifm, struct nlattr **tb, char *ifname, int modified) { @@ -910,12 +925,7 @@ goto errout; } - if (tb[IFLA_ADDRESS] && - nla_len(tb[IFLA_ADDRESS]) < dev->addr_len) - goto errout_dev; - - if (tb[IFLA_BROADCAST] && - nla_len(tb[IFLA_BROADCAST]) < dev->addr_len) + if ((err = validate_linkmsg(dev, tb)) < 0) goto errout_dev; err = do_setlink(dev, ifm, tb, ifname, 0); @@ -1036,6 +1046,9 @@ else dev = NULL; + if ((err = validate_linkmsg(dev, tb)) < 0) + return err; + if (tb[IFLA_LINKINFO]) { err = nla_parse_nested(linkinfo, IFLA_INFO_MAX, tb[IFLA_LINKINFO], ifla_info_policy);
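The rule validate_linkmsg() enforces can be restated as a self-contained function. When an existing device is being modified, any supplied hardware or broadcast address payload must be at least addr_len bytes long. struct attr and validate_link_addrs() are hypothetical; a NULL pointer models an absent attribute.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a parsed attribute; NULL means "absent". */
struct attr { size_t len; };

/* When modifying an existing device (dev_exists != 0), a supplied address
 * or broadcast attribute must carry at least dev_addr_len bytes, because
 * do_setlink() copies dev_addr_len bytes from it. New devices are not
 * checked, matching the patch. */
int validate_link_addrs(const struct attr *address, const struct attr *broadcast,
			size_t dev_addr_len, int dev_exists)
{
	if (dev_exists) {
		if (address && address->len < dev_addr_len)
			return -EINVAL;
		if (broadcast && broadcast->len < dev_addr_len)
			return -EINVAL;
	}
	return 0;
}
```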
[RTNL]: Add missing link netlink attribute policy definitions
IFLA_LINK is no longer a write-only attribute on the kernel side and must thus be validated. The same goes for the newly introduced IFLA_LINKINFO. Fixes undefined behaviour if either of the attributes is not well-formed. Signed-off-by: Thomas Graf [EMAIL PROTECTED] Index: net-2.6.26/net/core/rtnetlink.c === --- net-2.6.26.orig/net/core/rtnetlink.c 2008-02-19 20:30:08.0 +0100 +++ net-2.6.26/net/core/rtnetlink.c 2008-02-20 00:39:54.0 +0100 @@ -693,10 +693,12 @@ [IFLA_BROADCAST] = { .type = NLA_BINARY, .len = MAX_ADDR_LEN }, [IFLA_MAP] = { .len = sizeof(struct rtnl_link_ifmap) }, [IFLA_MTU] = { .type = NLA_U32 }, + [IFLA_LINK] = { .type = NLA_U32 }, [IFLA_TXQLEN] = { .type = NLA_U32 }, [IFLA_WEIGHT] = { .type = NLA_U32 }, [IFLA_OPERSTATE] = { .type = NLA_U8 }, [IFLA_LINKMODE] = { .type = NLA_U8 }, + [IFLA_LINKINFO] = { .type = NLA_NESTED }, [IFLA_NET_NS_PID] = { .type = NLA_U32 }, };
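A missing policy entry matters because, without one, an attribute's payload length is never checked before it is read. Below is a simplified model of per-type minimum-length validation. The enum names are hypothetical, and returning -ERANGE for a short payload is an assumption made for illustration.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified policy model: each attribute type maps to a
 * kind, and each kind implies a minimum payload length. */
enum kind { K_UNSPEC, K_U8, K_U32, K_NESTED };
enum { A_UNSPEC, A_MTU, A_LINK, A_LINKINFO, A_MAX = A_LINKINFO };

static const enum kind policy[A_MAX + 1] = {
	[A_MTU]      = K_U32,
	[A_LINK]     = K_U32,    /* newly validated, like IFLA_LINK */
	[A_LINKINFO] = K_NESTED, /* newly validated, like IFLA_LINKINFO */
};

static size_t min_len(enum kind k)
{
	switch (k) {
	case K_U8:  return sizeof(uint8_t);
	case K_U32: return sizeof(uint32_t);
	default:    return 0; /* nested/unspec: no fixed minimum modelled here */
	}
}

/* Returns -ERANGE (assumed) if a typed attribute is shorter than its type
 * requires; unknown types are ignored. */
int validate_attr(int type, size_t payload_len)
{
	if (type <= 0 || type > A_MAX)
		return 0;
	return payload_len < min_len(policy[type]) ? -ERANGE : 0;
}
```

Had A_LINK been left out of the table, it would default to K_UNSPEC. A 2-byte payload would then pass, and a later 4-byte read would run past the attribute.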
Re: update frequency for stats in /proc/net/dev
* Mark Seger [EMAIL PROTECTED] 2007-12-18 08:37 Anyhow, I just wanted to let people know that ALL tools that monitor once a second on older counters will get the wrong numbers and tools that correct for the wrong number by using fractional intervals (and I suspect mine is the only one that does) but run on newer kernels will also get the wrong numbers. In any event, if anyone is interested in trying out collectl - it monitors a LOT more than just networks - you can snag a copy from http://collectl.sourceforge.net/ if you'd like to take it for a drive. The website has a lot of output examples to give you a better idea what it can do. I even included a writeup about the odd network performance observations at http://collectl.sourceforge.net/NetworkStats.html I've solved this problem by using netlink to read the interface counters ten times per second and maintaining my own counter from which I calculate the rate exactly once per second/minute/hour. The rate per second may still be inaccurate to some degree, therefore I keep a history of 2-5 rates and take them into account to smooth the result. This works fairly well with _all_ operating systems.
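The smoothing approach described above can be sketched as a small ring buffer of recent rates whose mean is reported. The structure and function names are hypothetical. RATE_HISTORY = 4 is just one value from the 2-5 range mentioned, and last_count should be seeded with the first counter sample.

```c
#include <stddef.h>
#include <stdint.h>

#define RATE_HISTORY 4 /* number of recent rates averaged (2-5 suggested) */

/* Hypothetical rate tracker: a rate is derived once per interval from a
 * frequently sampled counter and averaged with the previous few rates. */
struct rate_tracker {
	uint64_t last_count;           /* seed with the first counter sample */
	double history[RATE_HISTORY];
	size_t nrates;                 /* valid entries, up to RATE_HISTORY */
	size_t next;                   /* ring-buffer write position */
};

/* Record the counter value at the end of an interval of `secs` seconds
 * and return the smoothed rate in units per second. Unsigned subtraction
 * also handles a single 64-bit counter wrap. */
double rate_update(struct rate_tracker *rt, uint64_t count, double secs)
{
	double rate = (double)(count - rt->last_count) / secs;
	double sum = 0.0;
	size_t i;

	rt->last_count = count;
	rt->history[rt->next] = rate;
	rt->next = (rt->next + 1) % RATE_HISTORY;
	if (rt->nrates < RATE_HISTORY)
		rt->nrates++;

	for (i = 0; i < rt->nrates; i++)
		sum += rt->history[i];
	return sum / rt->nrates;
}
```

For example, starting from last_count = 1000, samples of 2000 and then 4000 one second apart yield smoothed rates of 1000 and then 1500.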
Re: ip neigh show not showing arp cache entries?
* Patrick McHardy [EMAIL PROTECTED] 2007-12-18 00:51 Chris Friesen wrote: Patrick McHardy wrote: From a kernel perspective there are only complete dumps, the filtering is done by iproute. So the fact that it shows them when querying specifically implies there is a bug in the iproute neighbour filter. Does it work if you omit all from the ip neigh show command? Omitting all gives identical results. It is still missing entries when compared with the output of arp. In that case the easiest way to debug this is probably if you add some debugging to ip/ipneigh.c:print_neigh() since I'm unable to reproduce this problem. A printf for all the filter conditions (= return 0) at the top should do. Alternatively, you can download libnl and run NLCB=debug src/nl-neigh-dump brief and check if the netlink message is sent by the kernel for the neighbour in question.
Re: libnl - netlink library: Memory leak in address cache?
* Joerg Pommnitz [EMAIL PROTECTED] 2007-12-11 06:52 I think the leak comes from addr_msg_parser. The newly created address object gets added to the cache with nl_cache_add which takes a reference, so the reference in addr_msg_parser should be dropped, e.g. the following patch might be correct: That's correct, thanks for catching this.
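The ownership rule behind the leak is simple. If adding an object to a cache takes its own reference, the code that allocated the object must drop its initial reference afterwards. The obj_* and cache_add() names below are hypothetical and are not libnl's API.

```c
#include <stdlib.h>

/* Hypothetical reference-counted object (not libnl's API). */
struct obj { int refcnt; };

struct obj *obj_alloc(void)
{
	struct obj *o = calloc(1, sizeof(*o));
	if (o)
		o->refcnt = 1; /* the creator holds the first reference */
	return o;
}

void obj_get(struct obj *o) { o->refcnt++; }

void obj_put(struct obj *o)
{
	if (--o->refcnt == 0)
		free(o);
}

/* Adding to a cache takes its own reference, like nl_cache_add(). */
void cache_add(struct obj **slot, struct obj *o)
{
	obj_get(o);
	*slot = o;
}

/* Parser pattern from the report: after handing the object to the cache,
 * drop the creator's reference, otherwise the count never reaches zero. */
struct obj *parse_into_cache(struct obj **slot)
{
	struct obj *o = obj_alloc();
	if (!o)
		return NULL;
	cache_add(slot, o);
	obj_put(o); /* the fix: release the parser's reference */
	return *slot;
}
```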
[IPv4] ESP: Discard dummy packets introduced in rfc4303
RFC4303 introduces dummy packets with a nexthdr value of 59 to implement traffic flow confidentiality. Such packets must be dropped silently, and no attempt may be made to parse the payload, as it consists of random data. Signed-off-by: Thomas Graf [EMAIL PROTECTED] Index: net-2.6.25/net/ipv4/esp4.c === --- net-2.6.25.orig/net/ipv4/esp4.c 2007-12-10 15:57:23.0 +0100 +++ net-2.6.25/net/ipv4/esp4.c 2007-12-10 16:06:10.0 +0100 @@ -9,6 +9,7 @@ #include <linux/pfkeyv2.h> #include <linux/random.h> #include <linux/spinlock.h> +#include <linux/in6.h> #include <net/icmp.h> #include <net/protocol.h> #include <net/udp.h> @@ -233,6 +234,10 @@ /* ... check padding bits here. Silly. :-) */ + /* RFC4303: Drop dummy packets without any error */ + if (nexthdr[1] == IPPROTO_NONE) + goto out; + iph = ip_hdr(skb); ihl = iph->ihl * 4;
[IPv6] ESP: Discard dummy packets introduced in rfc4303
RFC4303 introduces dummy packets with a nexthdr value of 59 to implement traffic flow confidentiality. Such packets must be dropped silently, and no attempt may be made to parse the payload, as it consists of random data. Signed-off-by: Thomas Graf [EMAIL PROTECTED] Index: net-2.6.25/net/ipv6/esp6.c === --- net-2.6.25.orig/net/ipv6/esp6.c 2007-12-10 16:06:02.0 +0100 +++ net-2.6.25/net/ipv6/esp6.c 2007-12-10 16:08:02.0 +0100 @@ -238,6 +238,12 @@ } /* ... check padding bits here. Silly. :-) */ + /* RFC4303: Drop dummy packets without any error */ + if (nexthdr[1] == IPPROTO_NONE) { + ret = -EINVAL; + goto out; + } + pskb_trim(skb, skb->len - alen - padlen - 2); ret = nexthdr[1]; }
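Both patches add the same trailer check; a user-space sketch of it follows. After decryption and ICV removal, the last two bytes of the ESP payload are the Pad Length and Next Header fields, and a Next Header of 59 (IPPROTO_NONE) marks a dummy packet. The enum and function names are hypothetical. The padding bound check is an extra safety measure that is not present in the quoted hunks.

```c
#include <stddef.h>
#include <stdint.h>

#define NEXTHDR_NONE 59 /* IPPROTO_NONE: "no next header" */

/* Hypothetical classification of a decrypted ESP payload. */
enum esp_verdict { ESP_MALFORMED = -1, ESP_DELIVER = 0, ESP_DROP_DUMMY = 1 };

/* buf/len is the decrypted plaintext with the ICV already removed, so the
 * last two bytes are the ESP trailer: Pad Length, then Next Header.
 * Dummy packets are dropped silently without parsing the payload. */
enum esp_verdict esp_classify(const uint8_t *buf, size_t len, uint8_t *proto)
{
	uint8_t padlen, nexthdr;

	if (len < 2)
		return ESP_MALFORMED;
	padlen = buf[len - 2];
	nexthdr = buf[len - 1];
	if ((size_t)padlen + 2 > len)
		return ESP_MALFORMED; /* padding would extend past the payload */
	if (nexthdr == NEXTHDR_NONE)
		return ESP_DROP_DUMMY;
	*proto = nexthdr;
	return ESP_DELIVER;
}
```

The two kernel hunks differ in how the drop is reported: the IPv4 path jumps to out, while the IPv6 path sets ret = -EINVAL first. The sketch only classifies the packet.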
Re: Regression in current git - Network Manager fails (bisected)
* Dan Williams [EMAIL PROTECTED] 2007-10-23 10:10 Should I make NM disable ACKs for now until it gets fixed? The reason libnl enables ACKs by default is to give the application using it clear synchronisation points. For change requests that means the interface function won't return until the change has been committed, as it will call nl_wait_for_ack(). So if you disable it in NM and run it on old kernels still using async netlink, you won't be sure when the change is actually being done, so this might break things if you rely on it. I think providing an invalid message handler which returns NL_OK if nlmsg_type is NLMSG_DONE, or NLMSG_ERROR with err == 0, would be better if you need some kind of workaround. As those messages are always last this should never cause real trouble.
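Below is a sketch of the check such an "invalid message" handler could perform, using only the UAPI definitions from <linux/netlink.h>. How it is wired into libnl's callback set is not shown, and the function name is hypothetical.

```c
#include <linux/netlink.h>
#include <stdbool.h>

/* Returns true for the messages described above as safe to accept:
 * NLMSG_DONE, or an NLMSG_ERROR whose error field is 0 (i.e. an ACK).
 * Assumes nlh->nlmsg_len has already been checked to cover the header. */
bool nl_is_benign_terminator(const struct nlmsghdr *nlh)
{
	if (nlh->nlmsg_type == NLMSG_DONE)
		return true;
	if (nlh->nlmsg_type == NLMSG_ERROR &&
	    nlh->nlmsg_len >= NLMSG_LENGTH(sizeof(struct nlmsgerr))) {
		const struct nlmsgerr *e = NLMSG_DATA(nlh);
		return e->error == 0;
	}
	return false;
}
```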
Re: Regression in current git - Network Manager fails (bisected)
* Dan Williams [EMAIL PROTECTED] 2007-10-22 11:57 On Mon, 2007-10-22 at 13:22 +0400, Denis V. Lunev wrote: We have spent some time with the problem with Alexey and there are no guesses for now. Is it possible to name exact version of Network Manager and all libraries related + provide us an output of strace with full buffers send/received from netlink. Something like strace -v -x -s 32768 nm NM uses netlink in two places; libnl (from Thomas Graf) and some custom code for listening for interface up/down events and wireless events. It looks like that code comes from libnl's lib/handlers.c where it thinks the received message is invalid. I'm pretty sure the code that checks carrier status of the device isn't libnl code; so maybe the error message (which should get fixed of course) isn't in the same path as the link detection. The link detection comes from src/nm-netlink-monitor.c, so maybe we should look at debugging there. The patch introduced a change in semantics because it removed the special ACK handling after a dump was started. I will look into this.
Re: Regression in current git - Network Manager fails (bisected)
* Denis V. Lunev [EMAIL PROTECTED] 2007-10-23 17:09 I have reproduced the problem with one-line test. ./nl-route-get 192.168.1.1 The problem is with this message: -- Debug: Sent Message: -- BEGIN NETLINK MESSAGE --- [HEADER] 16 octets .nlmsg_len = 20 .nlmsg_type = 18 route/link .nlmsg_flags = 773 REQUEST,ACK,ROOT,MATCH .nlmsg_seq = 1193143772 .nlmsg_pid = 8233 [PAYLOAD] 16 octets 00 1d fa 20 00 00 00 00 81 0e 02 00 00 00 00 00 ... --- END NETLINK MESSAGE --- it starts dump and requests ACK. libnl sets the ACK bit for all requests unless the application disables this behaviour.
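Consider the flags in the debug output above. 773 is 0x305, i.e. NLM_F_REQUEST | NLM_F_ACK | NLM_F_ROOT | NLM_F_MATCH. ROOT and MATCH together form NLM_F_DUMP, so this is a dump request that also asks for an ACK. Below is a small helper (hypothetical name) that prints the same symbolic form from the <linux/netlink.h> constants; only the flags relevant to GET requests are decoded.

```c
#include <linux/netlink.h>
#include <stdio.h>

/* Print the symbolic GET-request flags contained in nlmsg_flags,
 * comma-separated, in the order listed in the table. */
void print_nlmsg_flags(unsigned int flags)
{
	static const struct { unsigned int bit; const char *name; } tbl[] = {
		{ NLM_F_REQUEST, "REQUEST" }, { NLM_F_MULTI, "MULTI" },
		{ NLM_F_ACK, "ACK" },         { NLM_F_ECHO, "ECHO" },
		{ NLM_F_ROOT, "ROOT" },       { NLM_F_MATCH, "MATCH" },
	};
	const char *sep = "";
	size_t i;

	for (i = 0; i < sizeof(tbl) / sizeof(tbl[0]); i++) {
		if (flags & tbl[i].bit) {
			printf("%s%s", sep, tbl[i].name);
			sep = ",";
		}
	}
	printf("\n");
}
```

Calling print_nlmsg_flags(773) prints REQUEST,ACK,ROOT,MATCH, matching the libnl debug line.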
Re: [PATCH] fix ACK processing after netlink_dump_start
* Denis V. Lunev [EMAIL PROTECTED] 2007-10-23 18:40 Revert to original netlink behavior. Do not reply with ACK if the netlink dump has been successfully started. libnl has been broken by the cd40b7d3983c708aabe3d3008ec64ffce56d33b0 The following command reproduces the problem: /nl-route-get 192.168.1.1 Signed-off-by: Denis V. Lunev [EMAIL PROTECTED] ACK. Thank you for taking care of this.
Re: [PATCH - net-2.6.24 1/2] Introduce and use print_ip
* Joe Perches [EMAIL PROTECTED] 2007-09-19 23:53 This removes the uses of NIPQUAD and HIPQUAD in drivers/net and net IPV4 Use: DECLARE_IP_BUF(ipbuf); __be32 addr; print_ip(ipbuf, addr) Signed-off-by: Joe Perches [EMAIL PROTECTED] please pull from: git pull http://repo.or.cz/r/linux-2.6/trivial-mods.git print_ipv4 Including a patch for review would be helpful.
Re: [PATCH - net-2.6.24 0/2] Introduce and use print_ip and print_ipv6
* Joe Perches [EMAIL PROTECTED] 2007-09-19 23:53 In the same vein as print_mac, the implementations introduce declaration macros: DECLARE_IP_BUF(var) DECLARE_IPV6_BUF(var) and functions: print_ip print_ipv6 print_ipv6_nofmt IPV4 Use: DECLARE_IP_BUF(ipbuf); __be32 addr; print_ip(ipbuf, addr); IPV6 use: DECLARE_IPV6_BUF(ipv6buf); const struct in6_addr *addr; print_ipv6(ipv6buf, addr); and print_ipv6_nofmt(ipv6buf, addr); compiled x86, defconfig and allyesconfig What exactly is the advantage of this?
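For readers weighing the proposal, here is a user-space analogue of the caller-owned-buffer pattern it describes. This is not the proposed kernel code. The macro and function names mirror the RFC for readability, and the address is taken in network byte order as a plain uint32_t instead of __be32.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <stdio.h>

/* Caller-owned buffer large enough for "255.255.255.255" plus NUL. */
#define IP_BUF_LEN 16
#define DECLARE_IP_BUF(var) char var[IP_BUF_LEN]

/* Format a network-byte-order IPv4 address into buf and return buf, so
 * the call can be used directly as a "%s" argument. */
const char *print_ip(char *buf, uint32_t be_addr)
{
	uint32_t a = ntohl(be_addr);

	snprintf(buf, IP_BUF_LEN, "%u.%u.%u.%u",
		 (unsigned)(a >> 24) & 0xff, (unsigned)(a >> 16) & 0xff,
		 (unsigned)(a >> 8) & 0xff, (unsigned)a & 0xff);
	return buf;
}
```

Called as print_ip(ipbuf, htonl(0xc0a80101)), it yields 192.168.1.1. Because each call site declares its own buffer, several addresses can appear in one printf without sharing static storage.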
[NETLINK]: Introduce nested and byteorder flag to netlink attribute
This change allows the generic attribute interface to be used within the netfilter subsystem, where this flag was initially introduced. The byte-order flag is as yet unused; its intended use is to allow automatic byte-order conversions for all atomic types.

Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: net-2.6.24/include/linux/netlink.h
===================================================================
--- net-2.6.24.orig/include/linux/netlink.h	2007-09-12 13:29:49.0 +0200
+++ net-2.6.24/include/linux/netlink.h	2007-09-12 13:59:41.0 +0200
@@ -129,6 +129,20 @@
 	__u16		nla_type;
 };
 
+/*
+ * nla_type (16 bits)
+ * +---+---+-------------------------------+
+ * | N | O | Attribute Type                |
+ * +---+---+-------------------------------+
+ * N := Carries nested attributes
+ * O := Payload stored in network byte order
+ *
+ * Note: The N and O flag are mutually exclusive.
+ */
+#define NLA_F_NESTED		(1 << 15)
+#define NLA_F_NET_BYTEORDER	(1 << 14)
+#define NLA_TYPE_MASK		~(NLA_F_NESTED | NLA_F_NET_BYTEORDER)
+
 #define NLA_ALIGNTO		4
 #define NLA_ALIGN(len)		(((len) + NLA_ALIGNTO - 1) & ~(NLA_ALIGNTO - 1))
 #define NLA_HDRLEN		((int) NLA_ALIGN(sizeof(struct nlattr)))
Index: net-2.6.24/include/net/netlink.h
===================================================================
--- net-2.6.24.orig/include/net/netlink.h	2007-09-12 13:29:50.0 +0200
+++ net-2.6.24/include/net/netlink.h	2007-09-12 14:17:56.0 +0200
@@ -667,6 +667,15 @@
 }
 
 /**
+ * nla_type - attribute type
+ * @nla: netlink attribute
+ */
+static inline int nla_type(const struct nlattr *nla)
+{
+	return nla->nla_type & NLA_TYPE_MASK;
+}
+
+/**
  * nla_data - head of payload
  * @nla: netlink attribute
  */
Index: net-2.6.24/net/ipv4/fib_frontend.c
===================================================================
--- net-2.6.24.orig/net/ipv4/fib_frontend.c	2007-09-12 13:29:51.0 +0200
+++ net-2.6.24/net/ipv4/fib_frontend.c	2007-09-12 13:59:41.0 +0200
@@ -487,7 +487,7 @@
 	}
 
 	nlmsg_for_each_attr(attr, nlh, sizeof(struct rtmsg), remaining) {
-		switch (attr->nla_type) {
+		switch (nla_type(attr)) {
 		case RTA_DST:
 			cfg->fc_dst = nla_get_be32(attr);
 			break;
Index: net-2.6.24/net/ipv4/fib_semantics.c
===================================================================
--- net-2.6.24.orig/net/ipv4/fib_semantics.c	2007-09-12 13:29:51.0 +0200
+++ net-2.6.24/net/ipv4/fib_semantics.c	2007-09-12 13:59:41.0 +0200
@@ -743,7 +743,7 @@
 		int remaining;
 
 		nla_for_each_attr(nla, cfg->fc_mx, cfg->fc_mx_len, remaining) {
-			int type = nla->nla_type;
+			int type = nla_type(nla);
 
 			if (type) {
 				if (type > RTAX_MAX)
Index: net-2.6.24/net/ipv6/route.c
===================================================================
--- net-2.6.24.orig/net/ipv6/route.c	2007-09-12 13:29:51.0 +0200
+++ net-2.6.24/net/ipv6/route.c	2007-09-12 13:59:41.0 +0200
@@ -1278,7 +1278,7 @@
 		int remaining;
 
 		nla_for_each_attr(nla, cfg->fc_mx, cfg->fc_mx_len, remaining) {
-			int type = nla->nla_type;
+			int type = nla_type(nla);
 
 			if (type) {
 				if (type > RTAX_MAX) {
Index: net-2.6.24/net/netlabel/netlabel_cipso_v4.c
===================================================================
--- net-2.6.24.orig/net/netlabel/netlabel_cipso_v4.c	2007-09-12 13:29:51.0 +0200
+++ net-2.6.24/net/netlabel/netlabel_cipso_v4.c	2007-09-12 13:59:41.0 +0200
@@ -130,7 +130,7 @@
 		return -EINVAL;
 
 	nla_for_each_nested(nla, info->attrs[NLBL_CIPSOV4_A_TAGLST], nla_rem)
-		if (nla->nla_type == NLBL_CIPSOV4_A_TAG) {
+		if (nla_type(nla) == NLBL_CIPSOV4_A_TAG) {
 			if (iter >= CIPSO_V4_TAG_MAXCNT)
 				return -EINVAL;
 			doi_def->tags[iter++] = nla_get_u8(nla);
@@ -192,13 +192,13 @@
 
 	nla_for_each_nested(nla_a,
 			    info->attrs[NLBL_CIPSOV4_A_MLSLVLLST],
 			    nla_a_rem)
-		if (nla_a->nla_type == NLBL_CIPSOV4_A_MLSLVL) {
+		if (nla_type(nla_a) == NLBL_CIPSOV4_A_MLSLVL) {
 			if (nla_validate_nested(nla_a,
 						NLBL_CIPSOV4_A_MAX,
 						netlbl_cipsov4_genl_policy) != 0)
 					goto add_std_failure;
 			nla_for_each_nested(nla_b, nla_a, nla_b_rem)
-				switch (nla_b->nla_type) {
+				switch (nla_type
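The reason every caller switches from reading `nla->nla_type` directly to `nla_type()` is that, once a sender sets NLA_F_NESTED or NLA_F_NET_BYTEORDER, the raw field no longer equals the attribute type. A minimal userspace sketch of that masking, reusing the flag definitions from the patch; `struct nlattr_sketch` and the `sketch_*` helpers are stand-ins, not kernel API.

```c
#include <stdint.h>

/* Flag layout from the patch (include/linux/netlink.h). */
#define NLA_F_NESTED		(1 << 15)
#define NLA_F_NET_BYTEORDER	(1 << 14)
#define NLA_TYPE_MASK		~(NLA_F_NESTED | NLA_F_NET_BYTEORDER)

/* Stand-in for struct nlattr: 16-bit length, 16-bit type. */
struct nlattr_sketch {
	uint16_t nla_len;
	uint16_t nla_type;
};

/* Same logic as nla_type(): strip the two flag bits. */
static int sketch_nla_type(const struct nlattr_sketch *nla)
{
	return nla->nla_type & NLA_TYPE_MASK;
}

/* Test the N flag separately. */
static int sketch_nla_is_nested(const struct nlattr_sketch *nla)
{
	return (nla->nla_type & NLA_F_NESTED) != 0;
}
```

With a nested attribute of type 5, the raw field holds 0x8005; a comparison such as `nla->nla_type == 5` would silently fail, while `sketch_nla_type()` still yields 5.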
Re: [PATCH 1/1] ipv6: corrects sended rtnetlink message
* Milan Kocian [EMAIL PROTECTED] 2007-09-12 16:50
> However, I still think that this notification is redundant. I tried to
> look at XORP, bird, USAGI and quagga to see their RTM_DELLINK handling,
> and IMHO nobody depends on the RTM_DELLINK message from ipv6.

Send a patch to remove it and we'll see if anyone complains.
Re: [PATCH] devinet: show all addresses assigned to interface
* Stephen Hemminger [EMAIL PROTECTED] 2007-09-06 16:10
> Bug: http://bugzilla.kernel.org/show_bug.cgi?id=8876
> Not all IPs are shown by the "ip addr show" command when more than
> 60-80 IPs are assigned to an interface (the exact number depends on
> whether each address has a broadcast address, label, etc.).
> The more attributes an address has, the sooner the netlink message
> fills up.
>
> Steps to reproduce:
> It's terribly simple to reproduce:
>
> # for i in $(seq 1 100); do ip ad add 10.0.$i.1/24 dev eth10 ; done
> # ip addr show
>
> This will _not_ show all IPs.
> Looks like the problem is in netlink/ipv4 message processing.
>
> This is the fix from the bug submitter; it looks correct.

The fix is correct.
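Background on why the address count matters: a netlink dump fills one buffer per callback invocation and must record exactly where it stopped, both which interface and which address within it, so the next invocation can resume. The sketch below is a hypothetical userspace model of that two-level cursor. It is not the devinet code or the submitted fix; `struct dump_cursor` and `dump_one_message()` are invented names, and a fixed per-address cost stands in for the variable attribute sizes.

```c
#include <stddef.h>

/* Hypothetical resume point kept between dump invocations. */
struct dump_cursor {
	size_t dev_idx;   /* interface to resume at */
	size_t addr_idx;  /* address within that interface to resume at */
};

/* Emit as many addresses as fit into one message of `room` bytes,
 * each costing `cost` bytes, starting at the cursor. Returns how many
 * were emitted; the cursor is left pointing at the first one that did
 * not fit, so the next call continues there. */
static size_t dump_one_message(const size_t *naddrs, size_t ndevs,
			       size_t cost, size_t room,
			       struct dump_cursor *cur)
{
	size_t used = 0, emitted = 0;

	for (; cur->dev_idx < ndevs; cur->dev_idx++, cur->addr_idx = 0) {
		for (; cur->addr_idx < naddrs[cur->dev_idx]; cur->addr_idx++) {
			if (used + cost > room)
				return emitted;   /* message full: resume here */
			used += cost;
			emitted++;
		}
	}
	return emitted;
}
```

If the inner index were not saved (or were reset on every call), an interface with more addresses than fit in one message would either be truncated or repeat its first addresses, which matches the symptom reported above.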
Re: some weird corruption in net-2.6.24
* Herbert Xu [EMAIL PROTECTED] 2007-09-04 07:05
> Thomas Graf [EMAIL PROTECTED] wrote:
> > I've been trying to reproduce this, what happens on my system is that
> > when the ISAKMP SA lifetime is exceeded the rekeying fails and my
> > connection dies. I can reproduce this back to 2.6.22 and it doesn't
> > seem related to my recent xfrm_user work. It looks like this behaviour
> > is hiding the bug you are seeing.
>
> Could you try extending the ISAKMP SA lifetime so that it is longer
> than the IPSec SA lifetime?

Yes, in this case the IPsec SA rekeying works just fine. I can't spot
any signs of corruption or the like.
Re: some weird corruption in net-2.6.24
* David Miller [EMAIL PROTECTED] 2007-08-30 22:39
> Every so often some piece of userland dies, and often it's bad enough
> that my desktop session logs out.
>
> I've been trying to find some clues and it seems to happen about as
> often as openswan rekeys my VPN, so one suspect area is the netlink
> cleanups to xfrm_user.
>
> I plan to do some auditing of those changes looking for errors, but if
> someone can beat me to it... :-)

I've been trying to reproduce this. What happens on my system is that
when the ISAKMP SA lifetime is exceeded, the rekeying fails and my
connection dies. I can reproduce this back to 2.6.22, and it doesn't
seem related to my recent xfrm_user work. It looks like this behaviour
is hiding the bug you are seeing.
[NET] atm: Fix build errors after conversion to pr_debug()
Fixes ancient ATM debug code to at least compile again.

Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: net-2.6.24/net/atm/signaling.c
===================================================================
--- net-2.6.24.orig/net/atm/signaling.c	2007-08-27 09:53:40.0 +0200
+++ net-2.6.24/net/atm/signaling.c	2007-08-27 09:55:16.0 +0200
@@ -89,9 +89,9 @@ static int sigd_send(struct atm_vcc *vcc
 
 	msg = (struct atmsvc_msg *) skb->data;
 	atomic_sub(skb->truesize, &sk_atm(vcc)->sk_wmem_alloc);
-	pr_debug("sigd_send %d (0x%lx)\n",(int) msg->type,
-	  (unsigned long) msg->vcc);
 	vcc = *(struct atm_vcc **) &msg->vcc;
+	pr_debug("sigd_send %d (0x%lx)\n",(int) msg->type,
+	  (unsigned long) vcc);
 	sk = sk_atm(vcc);
 
 	switch (msg->type) {
Index: net-2.6.24/net/atm/common.c
===================================================================
--- net-2.6.24.orig/net/atm/common.c	2007-08-27 09:56:06.0 +0200
+++ net-2.6.24/net/atm/common.c	2007-08-27 09:56:16.0 +0200
@@ -497,7 +497,7 @@ int vcc_recvmsg(struct kiocb *iocb, stru
 	if (error)
 		return error;
 	sock_recv_timestamp(msg, sk, skb);
-	pr_debug("RcvM %d -= %d\n", atomic_read(&sk->rmem_alloc), skb->truesize);
+	pr_debug("RcvM %d -= %d\n", atomic_read(&sk->sk_rmem_alloc), skb->truesize);
 	atm_return(vcc, skb->truesize);
 	skb_free_datagram(sk, skb);
 	return copied;
Index: net-2.6.24/net/atm/raw.c
===================================================================
--- net-2.6.24.orig/net/atm/raw.c	2007-08-27 09:57:56.0 +0200
+++ net-2.6.24/net/atm/raw.c	2007-08-27 09:58:09.0 +0200
@@ -32,8 +32,8 @@ static void atm_pop_raw(struct atm_vcc *
 {
 	struct sock *sk = sk_atm(vcc);
 
-	pr_debug("APopR (%d) %d -= %d\n", vcc->vci, sk->sk_wmem_alloc,
-	    skb->truesize);
+	pr_debug("APopR (%d) %d -= %d\n", vcc->vci,
+	    atomic_read(&sk->sk_wmem_alloc), skb->truesize);
 	atomic_sub(skb->truesize, &sk->sk_wmem_alloc);
 	dev_kfree_skb_any(skb);
 	sk->sk_write_space(sk);
Index: net-2.6.24/net/atm/pppoatm.c
===================================================================
--- net-2.6.24.orig/net/atm/pppoatm.c	2007-08-27 10:01:34.0 +0200
+++ net-2.6.24/net/atm/pppoatm.c	2007-08-27 10:02:05.0 +0200
@@ -165,9 +165,8 @@ static void pppoatm_push(struct atm_vcc
 			pvcc->chan.mtu += LLC_LEN;
 			break;
 		}
-		pr_debug("(unit %d): Couldn't autodetect yet "
+		pr_debug("Couldn't autodetect yet "
 		    "(skb: %02X %02X %02X %02X %02X %02X)\n",
-		    pvcc->chan.unit,
 		    skb->data[0], skb->data[1], skb->data[2],
 		    skb->data[3], skb->data[4], skb->data[5]);
 		goto error;
@@ -195,8 +194,7 @@ static int pppoatm_send(struct ppp_chann
 {
 	struct pppoatm_vcc *pvcc = chan_to_pvcc(chan);
 	ATM_SKB(skb)->vcc = pvcc->atmvcc;
-	pr_debug("(unit %d): pppoatm_send (skb=0x%p, vcc=0x%p)\n",
-	    pvcc->chan.unit, skb, pvcc->atmvcc);
+	pr_debug("pppoatm_send (skb=0x%p, vcc=0x%p)\n", skb, pvcc->atmvcc);
 	if (skb->data[0] == '\0' && (pvcc->flags & SC_COMP_PROT))
 		(void) skb_pull(skb, 1);
 	switch (pvcc->encaps) {		/* LLC encapsulation needed */
@@ -221,16 +219,14 @@ static int pppoatm_send(struct ppp_chann
 			goto nospace;
 		break;
 	case e_autodetect:
-		pr_debug("(unit %d): Trying to send without setting encaps!\n",
-		    pvcc->chan.unit);
+		pr_debug("Trying to send without setting encaps!\n");
 		kfree_skb(skb);
 		return 1;
 	}
 
 	atomic_add(skb->truesize, &sk_atm(ATM_SKB(skb)->vcc)->sk_wmem_alloc);
 	ATM_SKB(skb)->atm_options = ATM_SKB(skb)->vcc->atm_options;
-	pr_debug("(unit %d): atm_skb(%p)->vcc(%p)->dev(%p)\n",
-	    pvcc->chan.unit, skb, ATM_SKB(skb)->vcc,
+	pr_debug("atm_skb(%p)->vcc(%p)->dev(%p)\n", skb, ATM_SKB(skb)->vcc,
 	    ATM_SKB(skb)->vcc->dev);
 	return ATM_SKB(skb)->vcc->send(ATM_SKB(skb)->vcc, skb)
 	    ? DROP_PACKET : 1;
[NET] 82596: Add missing parenthesis
Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: net-2.6.24/drivers/net/82596.c
===================================================================
--- net-2.6.24.orig/drivers/net/82596.c	2007-08-27 14:43:16.0 +0200
+++ net-2.6.24/drivers/net/82596.c	2007-08-27 14:43:51.0 +0200
@@ -1562,7 +1562,7 @@ static void set_multicast_list(struct ne
 			memcpy(cp, dmi->dmi_addr, 6);
 			if (i596_debug > 1)
 				DEB(DEB_MULTI,printk(KERN_INFO "%s: Adding address " MAC_FMT "\n",
-						dev->name, MAC_ARG(cp));
+						dev->name, MAC_ARG(cp)));
 		}
 		i596_add_cmd(dev, &cmd->cmd);
 	}
[XFRM] policy: Replace magic number with XFRM_POLICY_OUT
Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: net-2.6.24/net/xfrm/xfrm_policy.c
===================================================================
--- net-2.6.24.orig/net/xfrm/xfrm_policy.c	2007-08-24 13:11:17.0 +0200
+++ net-2.6.24/net/xfrm/xfrm_policy.c	2007-08-24 13:11:48.0 +0200
@@ -1477,7 +1477,7 @@ restart:
 	pol_dead = 0;
 	xfrm_nr = 0;
 
-	if (sk && sk->sk_policy[1]) {
+	if (sk && sk->sk_policy[XFRM_POLICY_OUT]) {
 		policy = xfrm_sk_policy_lookup(sk, XFRM_POLICY_OUT, fl);
 		if (IS_ERR(policy))
 			return PTR_ERR(policy);
Re: [PATCH] [XFRM] : Fix pointer copy size for encap_tmpl and coaddr.
* Masahide NAKAMURA [EMAIL PROTECTED] 2007-08-24 19:05
> This is a minor fix to the sizeof argument used with kmemdup().

Thanks for catching this!
Re: net-2.6.24 failure with netconsole
* Andrew Morton [EMAIL PROTECTED] 2007-08-21 22:54
> Which used to be a BUG. It later oopsed via a null-pointer deref in
> net_rx_action(), which is a much preferable result.

I fixed this already:

Index: net-2.6.24/include/linux/netpoll.h
===================================================================
--- net-2.6.24.orig/include/linux/netpoll.h	2007-08-22 01:02:14.0 +0200
+++ net-2.6.24/include/linux/netpoll.h	2007-08-22 01:02:30.0 +0200
@@ -75,7 +75,7 @@ static inline void *netpoll_poll_lock(st
 	struct net_device *dev = napi->dev;
 
 	rcu_read_lock(); /* deal with race on ->npinfo */
-	if (dev->npinfo) {
+	if (dev && dev->npinfo) {
 		spin_lock(&napi->poll_lock);
 		napi->poll_owner = smp_processor_id();
 		return napi;
[PATCH 00/16] xfrm netlink interface cleanups
This patchset converts the xfrm netlink bits over to the type-safe netlink interface and does some cleanups.

 xfrm_user.c | 1041
 1 file changed, 433 insertions(+), 608 deletions(-)
[PATCH 01/16] [XFRM] netlink: Use nlmsg_put() instead of NLMSG_PUT()
Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: net-2.6.24/net/xfrm/xfrm_user.c
===================================================================
--- net-2.6.24.orig/net/xfrm/xfrm_user.c	2007-08-20 17:09:48.0 +0200
+++ net-2.6.24/net/xfrm/xfrm_user.c	2007-08-21 16:10:34.0 +0200
@@ -588,10 +588,10 @@ static int dump_one_state(struct xfrm_st
 	if (sp->this_idx < sp->start_idx)
 		goto out;
 
-	nlh = NLMSG_PUT(skb, NETLINK_CB(in_skb).pid,
-			sp->nlmsg_seq,
-			XFRM_MSG_NEWSA, sizeof(*p));
-	nlh->nlmsg_flags = sp->nlmsg_flags;
+	nlh = nlmsg_put(skb, NETLINK_CB(in_skb).pid, sp->nlmsg_seq,
+			XFRM_MSG_NEWSA, sizeof(*p), sp->nlmsg_flags);
+	if (nlh == NULL)
+		return -EMSGSIZE;
 
 	p = NLMSG_DATA(nlh);
 	copy_to_user_state(x, p);
@@ -633,7 +633,6 @@ out:
 	sp->this_idx++;
 	return 0;
 
-nlmsg_failure:
 rtattr_failure:
 	nlmsg_trim(skb, b);
 	return -1;
@@ -1276,11 +1275,11 @@ static int dump_one_policy(struct xfrm_p
 	if (sp->this_idx < sp->start_idx)
 		goto out;
 
-	nlh = NLMSG_PUT(skb, NETLINK_CB(in_skb).pid,
-			sp->nlmsg_seq,
-			XFRM_MSG_NEWPOLICY, sizeof(*p));
+	nlh = nlmsg_put(skb, NETLINK_CB(in_skb).pid, sp->nlmsg_seq,
+			XFRM_MSG_NEWPOLICY, sizeof(*p), sp->nlmsg_flags);
+	if (nlh == NULL)
+		return -EMSGSIZE;
 	p = NLMSG_DATA(nlh);
-	nlh->nlmsg_flags = sp->nlmsg_flags;
 
 	copy_to_user_policy(xp, p, dir);
 	if (copy_to_user_tmpl(xp, skb) < 0)
@@ -1449,9 +1448,10 @@ static int build_aevent(struct sk_buff *
 	struct xfrm_lifetime_cur ltime;
 	unsigned char *b = skb_tail_pointer(skb);
 
-	nlh = NLMSG_PUT(skb, c->pid, c->seq, XFRM_MSG_NEWAE, sizeof(*id));
+	nlh = nlmsg_put(skb, c->pid, c->seq, XFRM_MSG_NEWAE, sizeof(*id), 0);
+	if (nlh == NULL)
+		return -EMSGSIZE;
 	id = NLMSG_DATA(nlh);
-	nlh->nlmsg_flags = 0;
 
 	memcpy(&id->sa_id.daddr, &x->id.daddr,sizeof(x->id.daddr));
 	id->sa_id.spi = x->id.spi;
@@ -1483,7 +1483,6 @@ static int build_aevent(struct sk_buff *
 	return skb->len;
 
 rtattr_failure:
-nlmsg_failure:
 	nlmsg_trim(skb, b);
 	return -1;
 }
@@ -1866,9 +1865,10 @@ static int build_migrate(struct sk_buff
 	unsigned char *b = skb_tail_pointer(skb);
 	int i;
 
-	nlh = NLMSG_PUT(skb, 0, 0, XFRM_MSG_MIGRATE, sizeof(*pol_id));
+	nlh = nlmsg_put(skb, 0, 0, XFRM_MSG_MIGRATE, sizeof(*pol_id), 0);
+	if (nlh == NULL)
+		return -EMSGSIZE;
 	pol_id = NLMSG_DATA(nlh);
-	nlh->nlmsg_flags = 0;
 
 	/* copy data from selector, dir, and type to the pol_id */
 	memset(pol_id, 0, sizeof(*pol_id));
@@ -2045,20 +2045,16 @@ static int build_expire(struct sk_buff *
 	struct nlmsghdr *nlh;
 	unsigned char *b = skb_tail_pointer(skb);
 
-	nlh = NLMSG_PUT(skb, c->pid, 0, XFRM_MSG_EXPIRE,
-			sizeof(*ue));
+	nlh = nlmsg_put(skb, c->pid, 0, XFRM_MSG_EXPIRE, sizeof(*ue), 0);
+	if (nlh == NULL)
+		return -EMSGSIZE;
 	ue = NLMSG_DATA(nlh);
-	nlh->nlmsg_flags = 0;
 
 	copy_to_user_state(x, &ue->state);
 	ue->hard = (c->data.hard != 0) ? 1 : 0;
 
 	nlh->nlmsg_len = skb_tail_pointer(skb) - b;
 	return skb->len;
-
-nlmsg_failure:
-	nlmsg_trim(skb, b);
-	return -1;
 }
 
 static int xfrm_exp_state_notify(struct xfrm_state *x, struct km_event *c)
@@ -2108,9 +2104,11 @@ static int xfrm_notify_sa_flush(struct k
 		return -ENOMEM;
 	b = skb->tail;
 
-	nlh = NLMSG_PUT(skb, c->pid, c->seq,
-			XFRM_MSG_FLUSHSA, sizeof(*p));
-	nlh->nlmsg_flags = 0;
+	nlh = nlmsg_put(skb, c->pid, c->seq, XFRM_MSG_FLUSHSA, sizeof(*p), 0);
+	if (nlh == NULL) {
+		kfree_skb(skb);
+		return -EMSGSIZE;
+	}
 
 	p = NLMSG_DATA(nlh);
 	p->proto = c->data.proto;
@@ -2119,10 +2117,6 @@ static int xfrm_notify_sa_flush(struct k
 	NETLINK_CB(skb).dst_group = XFRMNLGRP_SA;
 
 	return netlink_broadcast(xfrm_nl, skb, 0, XFRMNLGRP_SA, GFP_ATOMIC);
-
-nlmsg_failure:
-	kfree_skb(skb);
-	return -1;
 }
 
 static inline int xfrm_sa_len(struct xfrm_state *x)
@@ -2162,8 +2156,9 @@ static int xfrm_notify_sa(struct xfrm_st
 		return -ENOMEM;
 	b = skb->tail;
 
-	nlh = NLMSG_PUT(skb, c->pid, c->seq, c->event, headlen);
-	nlh->nlmsg_flags = 0;
+	nlh = nlmsg_put(skb, c->pid, c->seq, c->event, headlen, 0);
+	if (nlh == NULL)
+		goto nlmsg_failure;
 
 	p = NLMSG_DATA(nlh);
 	if (c->event == XFRM_MSG_DELSA) {
@@ -2233,10 +2228,10 @@ static int build_acquire(struct sk_buff
 	unsigned char *b = skb_tail_pointer(skb);
 	__u32 seq = xfrm_get_acqseq
[PATCH 09/16] [XFRM] netlink: Use nlmsg_parse() to parse attributes
Uses nlmsg_parse() to parse the attributes. This actually changes behaviour, as unknown attributes (type > MAXTYPE) no longer cause an error. Instead, unknown attributes will be ignored henceforth to keep older kernels compatible with more recent userspace tools.

Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: net-2.6.24/net/xfrm/xfrm_user.c
===================================================================
--- net-2.6.24.orig/net/xfrm/xfrm_user.c	2007-08-21 17:07:38.0 +0200
+++ net-2.6.24/net/xfrm/xfrm_user.c	2007-08-21 17:31:04.0 +0200
@@ -1890,7 +1890,7 @@ static int xfrm_send_migrate(struct xfrm
 }
 #endif
 
-#define XMSGSIZE(type) NLMSG_LENGTH(sizeof(struct type))
+#define XMSGSIZE(type) sizeof(struct type)
 
 static const int xfrm_msg_min[XFRM_NR_MSGTYPES] = {
 	[XFRM_MSG_NEWSA - XFRM_MSG_BASE] = XMSGSIZE(xfrm_usersa_info),
@@ -1906,13 +1906,13 @@ static const int xfrm_msg_min[XFRM_NR_MS
 	[XFRM_MSG_UPDSA - XFRM_MSG_BASE] = XMSGSIZE(xfrm_usersa_info),
 	[XFRM_MSG_POLEXPIRE - XFRM_MSG_BASE] = XMSGSIZE(xfrm_user_polexpire),
 	[XFRM_MSG_FLUSHSA - XFRM_MSG_BASE] = XMSGSIZE(xfrm_usersa_flush),
-	[XFRM_MSG_FLUSHPOLICY - XFRM_MSG_BASE] = NLMSG_LENGTH(0),
+	[XFRM_MSG_FLUSHPOLICY - XFRM_MSG_BASE] = 0,
 	[XFRM_MSG_NEWAE - XFRM_MSG_BASE] = XMSGSIZE(xfrm_aevent_id),
 	[XFRM_MSG_GETAE - XFRM_MSG_BASE] = XMSGSIZE(xfrm_aevent_id),
 	[XFRM_MSG_REPORT - XFRM_MSG_BASE] = XMSGSIZE(xfrm_user_report),
 	[XFRM_MSG_MIGRATE - XFRM_MSG_BASE] = XMSGSIZE(xfrm_userpolicy_id),
-	[XFRM_MSG_GETSADINFO - XFRM_MSG_BASE] = NLMSG_LENGTH(sizeof(u32)),
-	[XFRM_MSG_GETSPDINFO - XFRM_MSG_BASE] = NLMSG_LENGTH(sizeof(u32)),
+	[XFRM_MSG_GETSADINFO - XFRM_MSG_BASE] = sizeof(u32),
+	[XFRM_MSG_GETSPDINFO - XFRM_MSG_BASE] = sizeof(u32),
 };
 
 #undef XMSGSIZE
@@ -1946,9 +1946,9 @@ static struct xfrm_link {
 
 static int xfrm_user_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
 {
-	struct rtattr *xfrma[XFRMA_MAX];
+	struct nlattr *xfrma[XFRMA_MAX+1];
 	struct xfrm_link *link;
-	int type, min_len;
+	int type, err;
 
 	type = nlh->nlmsg_type;
 	if (type > XFRM_MSG_MAX)
@@ -1970,30 +1970,16 @@ static int xfrm_user_rcv_msg(struct sk_b
 		return netlink_dump_start(xfrm_nl, skb, nlh, link->dump, NULL);
 	}
 
-	memset(xfrma, 0, sizeof(xfrma));
-
-	if (nlh->nlmsg_len < (min_len = xfrm_msg_min[type]))
-		return -EINVAL;
-
-	if (nlh->nlmsg_len > min_len) {
-		int attrlen = nlh->nlmsg_len - NLMSG_ALIGN(min_len);
-		struct rtattr *attr = (void *) nlh + NLMSG_ALIGN(min_len);
-
-		while (RTA_OK(attr, attrlen)) {
-			unsigned short flavor = attr->rta_type;
-			if (flavor) {
-				if (flavor > XFRMA_MAX)
-					return -EINVAL;
-				xfrma[flavor - 1] = attr;
-			}
-			attr = RTA_NEXT(attr, attrlen);
-		}
-	}
+	/* FIXME: Temporary hack, nlmsg_parse() starts at xfrma[1], old code
+	 * expects first attribute at xfrma[0] */
+	err = nlmsg_parse(nlh, xfrm_msg_min[type], xfrma-1, XFRMA_MAX, NULL);
+	if (err < 0)
+		return err;
 
 	if (link->doit == NULL)
 		return -EINVAL;
 
-	return link->doit(skb, nlh, xfrma);
+	return link->doit(skb, nlh, (struct rtattr **) xfrma);
 }
 
 static void xfrm_netlink_rcv(struct sock *sk, int len)
--
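The behaviour change is easiest to see in a table-driven parser. The sketch below walks a buffer of netlink-style attributes (16-bit length including the 4-byte header, 16-bit type, payload padded to 4 bytes) and stores each one in `tb[type]`. As in the patch description, types above `maxtype` are skipped instead of rejected, and type 0 is not stored, which is why the FIXME above passes `xfrma-1`. This is a hypothetical userspace model in the spirit of nla_parse(), not its implementation; all names are invented.

```c
#include <stdint.h>
#include <string.h>

/* Attribute header: length (header + payload) and type. */
struct nla_hdr {
	uint16_t nla_len;
	uint16_t nla_type;
};

#define SK_NLA_ALIGN(len) (((len) + 3U) & ~3U)
#define SK_NLA_HDRLEN     ((int) SK_NLA_ALIGN(sizeof(struct nla_hdr)))

/* Fill tb[0..maxtype] with pointers to the attributes found in buf.
 * Returns 0, or -1 if an attribute length is malformed. */
static int sketch_parse(const unsigned char *buf, int len,
			const struct nla_hdr **tb, int maxtype)
{
	memset(tb, 0, sizeof(*tb) * (maxtype + 1));

	while (len >= SK_NLA_HDRLEN) {
		struct nla_hdr h;
		int step;

		memcpy(&h, buf, sizeof(h));
		if (h.nla_len < sizeof(h) || h.nla_len > len)
			return -1;            /* truncated or bogus length */
		if (h.nla_type > 0 && h.nla_type <= maxtype)
			tb[h.nla_type] = (const struct nla_hdr *) buf;
		/* unknown (too large) types are silently ignored */

		step = (int) SK_NLA_ALIGN(h.nla_len);
		if (step > len)
			step = len;           /* last attribute may lack padding */
		len -= step;
		buf += step;
	}
	return 0;
}
```

An older kernel built with a smaller `maxtype` therefore still accepts messages from newer tools that append attributes it does not know.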
[PATCH 08/16] [XFRM] netlink: Use nlmsg_new() and type-safe size calculation helpers
Moves all complex message size calculations into their own inlined helper functions and makes use of the type-safe netlink interface. Using nlmsg_new() simplifies the calculation itself, as it accounts for the netlink header length by itself.

Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: net-2.6.24/net/xfrm/xfrm_user.c
===================================================================
--- net-2.6.24.orig/net/xfrm/xfrm_user.c	2007-08-21 17:04:46.0 +0200
+++ net-2.6.24/net/xfrm/xfrm_user.c	2007-08-21 17:07:38.0 +0200
@@ -670,7 +670,7 @@ static struct sk_buff *xfrm_state_netlin
 	struct xfrm_dump_info info;
 	struct sk_buff *skb;
 
-	skb = alloc_skb(NLMSG_GOODSIZE, GFP_ATOMIC);
+	skb = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_ATOMIC);
 	if (!skb)
 		return ERR_PTR(-ENOMEM);
 
@@ -688,6 +688,13 @@ static struct sk_buff *xfrm_state_netlin
 	return skb;
 }
 
+static inline size_t xfrm_spdinfo_msgsize(void)
+{
+	return NLMSG_ALIGN(4)
+	       + nla_total_size(sizeof(struct xfrmu_spdinfo))
+	       + nla_total_size(sizeof(struct xfrmu_spdhinfo));
+}
+
 static int build_spdinfo(struct sk_buff *skb, u32 pid, u32 seq, u32 flags)
 {
 	struct xfrmk_spdinfo si;
@@ -729,12 +736,8 @@ static int xfrm_get_spdinfo(struct sk_bu
 	u32 *flags = nlmsg_data(nlh);
 	u32 spid = NETLINK_CB(skb).pid;
 	u32 seq = nlh->nlmsg_seq;
-	int len = NLMSG_LENGTH(sizeof(u32));
 
-	len += RTA_SPACE(sizeof(struct xfrmu_spdinfo));
-	len += RTA_SPACE(sizeof(struct xfrmu_spdhinfo));
-
-	r_skb = alloc_skb(len, GFP_ATOMIC);
+	r_skb = nlmsg_new(xfrm_spdinfo_msgsize(), GFP_ATOMIC);
 	if (r_skb == NULL)
 		return -ENOMEM;
 
@@ -744,6 +747,13 @@ static int xfrm_get_spdinfo(struct sk_bu
 	return nlmsg_unicast(xfrm_nl, r_skb, spid);
 }
 
+static inline size_t xfrm_sadinfo_msgsize(void)
+{
+	return NLMSG_ALIGN(4)
+	       + nla_total_size(sizeof(struct xfrmu_sadhinfo))
+	       + nla_total_size(4); /* XFRMA_SAD_CNT */
+}
+
 static int build_sadinfo(struct sk_buff *skb, u32 pid, u32 seq, u32 flags)
 {
 	struct xfrmk_sadinfo si;
@@ -779,13 +789,8 @@ static int xfrm_get_sadinfo(struct sk_bu
 	u32 *flags = nlmsg_data(nlh);
 	u32 spid = NETLINK_CB(skb).pid;
 	u32 seq = nlh->nlmsg_seq;
-	int len = NLMSG_LENGTH(sizeof(u32));
-
-	len += RTA_SPACE(sizeof(struct xfrmu_sadhinfo));
-	len += RTA_SPACE(sizeof(u32));
-
-	r_skb = alloc_skb(len, GFP_ATOMIC);
 
+	r_skb = nlmsg_new(xfrm_sadinfo_msgsize(), GFP_ATOMIC);
 	if (r_skb == NULL)
 		return -ENOMEM;
 
@@ -1311,7 +1316,7 @@ static struct sk_buff *xfrm_policy_netli
 	struct xfrm_dump_info info;
 	struct sk_buff *skb;
 
-	skb = alloc_skb(NLMSG_GOODSIZE, GFP_KERNEL);
+	skb = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
 	if (!skb)
 		return ERR_PTR(-ENOMEM);
 
@@ -1425,6 +1430,14 @@ static int xfrm_flush_sa(struct sk_buff
 	return 0;
 }
 
+static inline size_t xfrm_aevent_msgsize(void)
+{
+	return NLMSG_ALIGN(sizeof(struct xfrm_aevent_id))
+	       + nla_total_size(sizeof(struct xfrm_replay_state))
+	       + nla_total_size(sizeof(struct xfrm_lifetime_cur))
+	       + nla_total_size(4) /* XFRM_AE_RTHR */
+	       + nla_total_size(4); /* XFRM_AE_ETHR */
+}
 static int build_aevent(struct sk_buff *skb, struct xfrm_state *x, struct km_event *c)
 {
@@ -1469,19 +1482,9 @@ static int xfrm_get_ae(struct sk_buff *s
 	int err;
 	struct km_event c;
 	struct xfrm_aevent_id *p = nlmsg_data(nlh);
-	int len = NLMSG_LENGTH(sizeof(struct xfrm_aevent_id));
 	struct xfrm_usersa_id *id = &p->sa_id;
 
-	len += RTA_SPACE(sizeof(struct xfrm_replay_state));
-	len += RTA_SPACE(sizeof(struct xfrm_lifetime_cur));
-
-	if (p->flags&XFRM_AE_RTHR)
-		len+=RTA_SPACE(sizeof(u32));
-
-	if (p->flags&XFRM_AE_ETHR)
-		len+=RTA_SPACE(sizeof(u32));
-
-	r_skb = alloc_skb(len, GFP_ATOMIC);
+	r_skb = nlmsg_new(xfrm_aevent_msgsize(), GFP_ATOMIC);
 	if (r_skb == NULL)
 		return -ENOMEM;
 
@@ -1824,6 +1827,13 @@ static int copy_to_user_migrate(struct x
 	return nla_put(skb, XFRMA_MIGRATE, sizeof(um), &um);
 }
 
+static inline size_t xfrm_migrate_msgsize(int num_migrate)
+{
+	return NLMSG_ALIGN(sizeof(struct xfrm_userpolicy_id))
+	       + nla_total_size(sizeof(struct xfrm_user_migrate) * num_migrate)
+	       + userpolicy_type_attrsize();
+}
+
 static int build_migrate(struct sk_buff *skb, struct xfrm_migrate *m,
 			 int num_migrate, struct xfrm_selector *sel,
 			 u8 dir, u8 type)
@@ -1861,12 +1871,8 @@ static int xfrm_send_migrate(struct xfrm
 			     struct xfrm_migrate *m, int num_migrate)
 {
 	struct sk_buff *skb;
-	size_t len
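The helpers above only add up payload sizes, because nlmsg_new() adds the netlink message header itself. Each attribute costs a 4-byte header plus its payload, rounded up to a 4-byte boundary. The sketch below reproduces that arithmetic in userspace; the alignment constants match `<linux/netlink.h>`, while the `_sketch` function names and `sadinfo_like_msgsize()` are hypothetical stand-ins.

```c
#include <stddef.h>

/* 4-byte alignment and a 4-byte attribute header (two 16-bit fields). */
#define NLA_ALIGNTO	4
#define NLA_ALIGN(len)	(((len) + NLA_ALIGNTO - 1) & ~(NLA_ALIGNTO - 1))
#define NLA_HDRLEN	((int) NLA_ALIGN(sizeof(unsigned short) * 2))

/* Header plus payload, unpadded. */
static int nla_attr_size_sketch(int payload)
{
	return NLA_HDRLEN + payload;
}

/* Header plus payload, padded to the next 4-byte boundary. */
static int nla_total_size_sketch(int payload)
{
	return NLA_ALIGN(nla_attr_size_sketch(payload));
}

/* Shaped like xfrm_sadinfo_msgsize(): a 4-byte fixed part, one
 * attribute of hdr_payload bytes and one u32 attribute. */
static size_t sadinfo_like_msgsize(int hdr_payload)
{
	return NLA_ALIGN(4) + nla_total_size_sketch(hdr_payload)
	       + nla_total_size_sketch(4);
}
```

For example, a 5-byte payload needs 4 + 5 = 9 bytes and is padded to 12; a u32 attribute always takes 8.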
[PATCH 02/16] [XFRM] netlink: Use nlmsg_end() and nlmsg_cancel()
Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: net-2.6.24/net/xfrm/xfrm_user.c
===================================================================
--- net-2.6.24.orig/net/xfrm/xfrm_user.c	2007-08-21 16:10:34.0 +0200
+++ net-2.6.24/net/xfrm/xfrm_user.c	2007-08-21 16:12:20.0 +0200
@@ -583,7 +583,6 @@ static int dump_one_state(struct xfrm_st
 	struct sk_buff *skb = sp->out_skb;
 	struct xfrm_usersa_info *p;
 	struct nlmsghdr *nlh;
-	unsigned char *b = skb_tail_pointer(skb);
 
 	if (sp->this_idx < sp->start_idx)
 		goto out;
@@ -628,14 +627,14 @@ static int dump_one_state(struct xfrm_st
 	if (x->lastused)
 		RTA_PUT(skb, XFRMA_LASTUSED, sizeof(x->lastused), &x->lastused);
 
-	nlh->nlmsg_len = skb_tail_pointer(skb) - b;
+	nlmsg_end(skb, nlh);
 out:
 	sp->this_idx++;
 	return 0;
 
 rtattr_failure:
-	nlmsg_trim(skb, b);
-	return -1;
+	nlmsg_cancel(skb, nlh);
+	return -EMSGSIZE;
 }
 
 static int xfrm_dump_sa(struct sk_buff *skb, struct netlink_callback *cb)
@@ -1270,7 +1269,6 @@ static int dump_one_policy(struct xfrm_p
 	struct sk_buff *in_skb = sp->in_skb;
 	struct sk_buff *skb = sp->out_skb;
 	struct nlmsghdr *nlh;
-	unsigned char *b = skb_tail_pointer(skb);
 
 	if (sp->this_idx < sp->start_idx)
 		goto out;
@@ -1289,14 +1287,14 @@ static int dump_one_policy(struct xfrm_p
 	if (copy_to_user_policy_type(xp->type, skb) < 0)
 		goto nlmsg_failure;
 
-	nlh->nlmsg_len = skb_tail_pointer(skb) - b;
+	nlmsg_end(skb, nlh);
 out:
 	sp->this_idx++;
 	return 0;
 
 nlmsg_failure:
-	nlmsg_trim(skb, b);
-	return -1;
+	nlmsg_cancel(skb, nlh);
+	return -EMSGSIZE;
 }
 
 static int xfrm_dump_policy(struct sk_buff *skb, struct netlink_callback *cb)
@@ -1446,7 +1444,6 @@ static int build_aevent(struct sk_buff *
 	struct xfrm_aevent_id *id;
 	struct nlmsghdr *nlh;
 	struct xfrm_lifetime_cur ltime;
-	unsigned char *b = skb_tail_pointer(skb);
 
 	nlh = nlmsg_put(skb, c->pid, c->seq, XFRM_MSG_NEWAE, sizeof(*id), 0);
 	if (nlh == NULL)
@@ -1479,12 +1476,11 @@ static int build_aevent(struct sk_buff *
 		RTA_PUT(skb,XFRMA_ETIMER_THRESH,sizeof(u32),&etimer);
 	}
 
-	nlh->nlmsg_len = skb_tail_pointer(skb) - b;
-	return skb->len;
+	return nlmsg_end(skb, nlh);
 
 rtattr_failure:
-	nlmsg_trim(skb, b);
-	return -1;
+	nlmsg_cancel(skb, nlh);
+	return -EMSGSIZE;
 }
 
 static int xfrm_get_ae(struct sk_buff *skb, struct nlmsghdr *nlh,
@@ -1862,7 +1858,6 @@ static int build_migrate(struct sk_buff
 	struct xfrm_migrate *mp;
 	struct xfrm_userpolicy_id *pol_id;
 	struct nlmsghdr *nlh;
-	unsigned char *b = skb_tail_pointer(skb);
 	int i;
 
 	nlh = nlmsg_put(skb, 0, 0, XFRM_MSG_MIGRATE, sizeof(*pol_id), 0);
@@ -1883,11 +1878,10 @@ static int build_migrate(struct sk_buff
 			goto nlmsg_failure;
 	}
 
-	nlh->nlmsg_len = skb_tail_pointer(skb) - b;
-	return skb->len;
+	return nlmsg_end(skb, nlh);
 nlmsg_failure:
-	nlmsg_trim(skb, b);
-	return -1;
+	nlmsg_cancel(skb, nlh);
+	return -EMSGSIZE;
 }
 
 static int xfrm_send_migrate(struct xfrm_selector *sel, u8 dir, u8 type,
@@ -2043,7 +2037,6 @@ static int build_expire(struct sk_buff *
 {
 	struct xfrm_user_expire *ue;
 	struct nlmsghdr *nlh;
-	unsigned char *b = skb_tail_pointer(skb);
 
 	nlh = nlmsg_put(skb, c->pid, 0, XFRM_MSG_EXPIRE, sizeof(*ue), 0);
 	if (nlh == NULL)
@@ -2053,8 +2046,7 @@ static int build_expire(struct sk_buff *
 	copy_to_user_state(x, &ue->state);
 	ue->hard = (c->data.hard != 0) ? 1 : 0;
 
-	nlh->nlmsg_len = skb_tail_pointer(skb) - b;
-	return skb->len;
+	return nlmsg_end(skb, nlh);
 }
 
 static int xfrm_exp_state_notify(struct xfrm_state *x, struct km_event *c)
@@ -2096,13 +2088,11 @@ static int xfrm_notify_sa_flush(struct k
 	struct xfrm_usersa_flush *p;
 	struct nlmsghdr *nlh;
 	struct sk_buff *skb;
-	sk_buff_data_t b;
 	int len = NLMSG_LENGTH(sizeof(struct xfrm_usersa_flush));
 
 	skb = alloc_skb(len, GFP_ATOMIC);
 	if (skb == NULL)
 		return -ENOMEM;
-	b = skb->tail;
 
 	nlh = nlmsg_put(skb, c->pid, c->seq, XFRM_MSG_FLUSHSA, sizeof(*p), 0);
 	if (nlh == NULL) {
@@ -2113,7 +2103,7 @@ static int xfrm_notify_sa_flush(struct k
 	p = NLMSG_DATA(nlh);
 	p->proto = c->data.proto;
 
-	nlh->nlmsg_len = skb->tail - b;
+	nlmsg_end(skb, nlh);
 
 	NETLINK_CB(skb).dst_group = XFRMNLGRP_SA;
 
 	return netlink_broadcast(xfrm_nl, skb, 0, XFRMNLGRP_SA, GFP_ATOMIC);
@@ -2140,7 +2130,6 @@ static int xfrm_notify_sa(struct xfrm_st
 	struct xfrm_usersa_id *id;
 	struct nlmsghdr *nlh;
 	struct sk_buff *skb
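The nlmsg_put() / nlmsg_end() / nlmsg_cancel() trio used throughout this series replaces the hand-rolled "remember the tail, compute the length, trim on failure" code. Below is a hypothetical userspace model of the same pattern over a fixed buffer. It is not the kernel implementation: `struct msgbuf` and the `msg_*` functions are invented, and a bare 4-byte length field stands in for struct nlmsghdr.

```c
#include <stdint.h>
#include <string.h>

#define BUF_SIZE 64

struct msgbuf {
	unsigned char data[BUF_SIZE];
	size_t tail;                      /* bytes currently in use */
};

/* Like nlmsg_put(): reserve a length field for a new message.
 * Returns its offset, or -1 if the buffer is full. */
static long msg_put(struct msgbuf *b)
{
	uint32_t zero = 0;
	long start = (long) b->tail;

	if (BUF_SIZE - b->tail < sizeof(zero))
		return -1;
	memcpy(b->data + b->tail, &zero, sizeof(zero));
	b->tail += sizeof(zero);
	return start;
}

/* Append an attribute payload; -1 if it does not fit. */
static int msg_append(struct msgbuf *b, const void *p, size_t n)
{
	if (BUF_SIZE - b->tail < n)
		return -1;
	memcpy(b->data + b->tail, p, n);
	b->tail += n;
	return 0;
}

/* Like nlmsg_end(): store the final length and return the total used. */
static size_t msg_end(struct msgbuf *b, long start)
{
	uint32_t len = (uint32_t) (b->tail - (size_t) start);

	memcpy(b->data + start, &len, sizeof(len));
	return b->tail;
}

/* Like nlmsg_cancel(): drop everything added since msg_put(). */
static void msg_cancel(struct msgbuf *b, long start)
{
	b->tail = (size_t) start;
}

/* A builder shaped like the converted xfrm functions. */
static int build_msg(struct msgbuf *b, const void *payload, size_t n)
{
	long start = msg_put(b);

	if (start < 0)
		return -1;                /* -EMSGSIZE in the kernel */
	if (msg_append(b, payload, n) < 0) {
		msg_cancel(b, start);     /* no half-built message left behind */
		return -1;
	}
	return (int) msg_end(b, start);
}
```

The point of the pattern is that a failed builder leaves the buffer exactly as it found it, so a dump can stop cleanly and retry the same object in the next buffer.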
[PATCH 16/16] [XFRM] netlink: Inline attach_encap_tmpl(), attach_sec_ctx(), and attach_one_addr()
These functions are only used once and are a lot easier to understand if inlined directly into the function.

Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: net-2.6.24/net/xfrm/xfrm_user.c
===================================================================
--- net-2.6.24.orig/net/xfrm/xfrm_user.c	2007-08-21 23:05:30.0 +0200
+++ net-2.6.24/net/xfrm/xfrm_user.c	2007-08-22 16:45:31.0 +0200
@@ -214,23 +214,6 @@ static int attach_one_algo(struct xfrm_a
 	return 0;
 }
 
-static int attach_encap_tmpl(struct xfrm_encap_tmpl **encapp, struct nlattr *rta)
-{
-	struct xfrm_encap_tmpl *p, *uencap;
-
-	if (!rta)
-		return 0;
-
-	uencap = nla_data(rta);
-	p = kmemdup(uencap, sizeof(*p), GFP_KERNEL);
-	if (!p)
-		return -ENOMEM;
-
-	*encapp = p;
-	return 0;
-}
-
-
 static inline int xfrm_user_sec_ctx_size(struct xfrm_sec_ctx *xfrm_ctx)
 {
 	int len = 0;
@@ -242,33 +225,6 @@ static inline int xfrm_user_sec_ctx_size
 	return len;
 }
 
-static int attach_sec_ctx(struct xfrm_state *x, struct nlattr *u_arg)
-{
-	struct xfrm_user_sec_ctx *uctx;
-
-	if (!u_arg)
-		return 0;
-
-	uctx = nla_data(u_arg);
-	return security_xfrm_state_alloc(x, uctx);
-}
-
-static int attach_one_addr(xfrm_address_t **addrpp, struct nlattr *rta)
-{
-	xfrm_address_t *p, *uaddrp;
-
-	if (!rta)
-		return 0;
-
-	uaddrp = nla_data(rta);
-	p = kmemdup(uaddrp, sizeof(*p), GFP_KERNEL);
-	if (!p)
-		return -ENOMEM;
-
-	*addrpp = p;
-	return 0;
-}
-
 static void copy_from_user_state(struct xfrm_state *x, struct xfrm_usersa_info *p)
 {
 	memcpy(&x->id, &p->id, sizeof(x->id));
@@ -340,15 +296,27 @@ static struct xfrm_state *xfrm_state_con
 				   xfrm_calg_get_byname,
 				   attrs[XFRMA_ALG_COMP])))
 		goto error;
-	if ((err = attach_encap_tmpl(&x->encap, attrs[XFRMA_ENCAP])))
-		goto error;
-	if ((err = attach_one_addr(&x->coaddr, attrs[XFRMA_COADDR])))
-		goto error;
+
+	if (attrs[XFRMA_ENCAP]) {
+		x->encap = kmemdup(nla_data(attrs[XFRMA_ENCAP]),
+				   sizeof(x->encap), GFP_KERNEL);
+		if (x->encap == NULL)
+			goto error;
+	}
+
+	if (attrs[XFRMA_COADDR]) {
+		x->coaddr = kmemdup(nla_data(attrs[XFRMA_COADDR]),
+				    sizeof(x->coaddr), GFP_KERNEL);
+		if (x->coaddr == NULL)
+			goto error;
+	}
+
 	err = xfrm_init_state(x);
 	if (err)
 		goto error;
 
-	if ((err = attach_sec_ctx(x, attrs[XFRMA_SEC_CTX])))
+	if (attrs[XFRMA_SEC_CTX] &&
+	    security_xfrm_state_alloc(x, nla_data(attrs[XFRMA_SEC_CTX])))
 		goto error;
 
 	x->km.seq = p->seq;
--
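Note that, as posted, the inlined code passes `sizeof(x->encap)` and `sizeof(x->coaddr)` to kmemdup(). Those are the sizes of the pointers, not of the structures they point to; the removed helpers used `sizeof(*p)`. This appears to be what Masahide NAKAMURA's "[XFRM]: Fix pointer copy size for encap_tmpl and coaddr" addresses. The userspace sketch below shows the correct form. `struct encap_tmpl_sketch` is a hypothetical stand-in sized only to be larger than a pointer, and `memdup_sketch()` imitates kmemdup() with malloc().

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct xfrm_encap_tmpl. */
struct encap_tmpl_sketch {
	unsigned short encap_type;
	unsigned short encap_sport;
	unsigned short encap_dport;
	unsigned char  encap_oa[16];
};

/* Userspace imitation of kmemdup(): allocate len bytes and copy src. */
static void *memdup_sketch(const void *src, size_t len)
{
	void *p = malloc(len);

	if (p != NULL)
		memcpy(p, src, len);
	return p;
}

/* sizeof(*dst) is the size of the structure; sizeof(dst) would be the
 * size of a pointer and would copy only a truncated prefix. */
static struct encap_tmpl_sketch *dup_encap(const void *payload)
{
	struct encap_tmpl_sketch *dst;

	dst = memdup_sketch(payload, sizeof(*dst));
	return dst;
}
```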
[PATCH 06/16] [XFRM] netlink: Move algorithm length calculation to its own function
Adds alg_len() to calculate the properly padded length of an algorithm
attribute to simplify the code.

Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: net-2.6.24/net/xfrm/xfrm_user.c
===================================================================
--- net-2.6.24.orig/net/xfrm/xfrm_user.c	2007-08-21 16:16:03.0 +0200
+++ net-2.6.24/net/xfrm/xfrm_user.c	2007-08-21 17:03:43.0 +0200
@@ -33,6 +33,11 @@
 #endif
 #include <linux/audit.h>
 
+static inline int alg_len(struct xfrm_algo *alg)
+{
+	return sizeof(*alg) + ((alg->alg_key_len + 7) / 8);
+}
+
 static int verify_one_alg(struct rtattr **xfrma, enum xfrm_attr_type_t type)
 {
 	struct rtattr *rt = xfrma[type - 1];
@@ -232,7 +237,6 @@ static int attach_one_algo(struct xfrm_a
 	struct rtattr *rta = u_arg;
 	struct xfrm_algo *p, *ualg;
 	struct xfrm_algo_desc *algo;
-	int len;
 
 	if (!rta)
 		return 0;
@@ -244,8 +248,7 @@ static int attach_one_algo(struct xfrm_a
 		return -ENOSYS;
 	*props = algo->desc.sadb_alg_id;
 
-	len = sizeof(*ualg) + (ualg->alg_key_len + 7U) / 8;
-	p = kmemdup(ualg, len, GFP_KERNEL);
+	p = kmemdup(ualg, alg_len(ualg), GFP_KERNEL);
 	if (!p)
 		return -ENOMEM;
 
@@ -617,11 +620,9 @@ static int dump_one_state(struct xfrm_st
 	copy_to_user_state(x, p);
 
 	if (x->aalg)
-		NLA_PUT(skb, XFRMA_ALG_AUTH,
-			sizeof(*(x->aalg))+(x->aalg->alg_key_len+7)/8, x->aalg);
+		NLA_PUT(skb, XFRMA_ALG_AUTH, alg_len(x->aalg), x->aalg);
 	if (x->ealg)
-		NLA_PUT(skb, XFRMA_ALG_CRYPT,
-			sizeof(*(x->ealg))+(x->ealg->alg_key_len+7)/8, x->ealg);
+		NLA_PUT(skb, XFRMA_ALG_CRYPT, alg_len(x->ealg), x->ealg);
 	if (x->calg)
 		NLA_PUT(skb, XFRMA_ALG_COMP, sizeof(*(x->calg)), x->calg);
 
@@ -2072,9 +2073,9 @@ static inline int xfrm_sa_len(struct xfr
 {
 	int l = 0;
 	if (x->aalg)
-		l += RTA_SPACE(sizeof(*x->aalg) + (x->aalg->alg_key_len+7)/8);
+		l += RTA_SPACE(alg_len(x->aalg));
 	if (x->ealg)
-		l += RTA_SPACE(sizeof(*x->ealg) + (x->ealg->alg_key_len+7)/8);
+		l += RTA_SPACE(alg_len(x->ealg));
 	if (x->calg)
 		l += RTA_SPACE(sizeof(*x->calg));
 	if (x->encap)
@@ -2127,11 +2128,9 @@ static int xfrm_notify_sa(struct xfrm_st
 	copy_to_user_state(x, p);
 
 	if (x->aalg)
-		NLA_PUT(skb, XFRMA_ALG_AUTH,
-			sizeof(*(x->aalg))+(x->aalg->alg_key_len+7)/8, x->aalg);
+		NLA_PUT(skb, XFRMA_ALG_AUTH, alg_len(x->aalg), x->aalg);
 	if (x->ealg)
-		NLA_PUT(skb, XFRMA_ALG_CRYPT,
-			sizeof(*(x->ealg))+(x->ealg->alg_key_len+7)/8, x->ealg);
+		NLA_PUT(skb, XFRMA_ALG_CRYPT, alg_len(x->ealg), x->ealg);
 	if (x->calg)
 		NLA_PUT(skb, XFRMA_ALG_COMP, sizeof(*(x->calg)), x->calg);
 
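The padding arithmetic that alg_len() centralises can be checked in user space. The sketch below mirrors the shape of struct xfrm_algo (a fixed-size name, a key length counted in bits, then the key bytes) under the hypothetical name demo_algo; the field sizes are assumptions for illustration, not taken from this patch.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct xfrm_algo: name, key length in
 * bits, then the key bytes as a flexible array member. */
struct demo_algo {
	char alg_name[64];
	unsigned int alg_key_len;	/* key length in bits */
	char alg_key[];
};

/* Same formula as alg_len(): header size plus the key length rounded
 * up from bits to whole bytes. */
static size_t demo_alg_len(const struct demo_algo *alg)
{
	return sizeof(*alg) + ((alg->alg_key_len + 7) / 8);
}
```

A 128-bit key therefore adds exactly 16 bytes to the header, while a 9-bit key still needs 2 bytes.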
[PATCH 12/16] [XFRM] netlink: Rename attribute array from xfrma[] to attrs[]
Increases readability a lot.

Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: net-2.6.24/net/xfrm/xfrm_user.c
===================================================================
--- net-2.6.24.orig/net/xfrm/xfrm_user.c	2007-08-21 17:34:10.0 +0200
+++ net-2.6.24/net/xfrm/xfrm_user.c	2007-08-21 17:34:29.0 +0200
@@ -38,9 +38,9 @@ static inline int alg_len(struct xfrm_al
 	return sizeof(*alg) + ((alg->alg_key_len + 7) / 8);
 }
 
-static int verify_one_alg(struct rtattr **xfrma, enum xfrm_attr_type_t type)
+static int verify_one_alg(struct rtattr **attrs, enum xfrm_attr_type_t type)
 {
-	struct rtattr *rt = xfrma[type];
+	struct rtattr *rt = attrs[type];
 	struct xfrm_algo *algp;
 
 	if (!rt)
@@ -75,18 +75,18 @@ static int verify_one_alg(struct rtattr
 	return 0;
 }
 
-static void verify_one_addr(struct rtattr **xfrma, enum xfrm_attr_type_t type,
+static void verify_one_addr(struct rtattr **attrs, enum xfrm_attr_type_t type,
 			    xfrm_address_t **addrp)
 {
-	struct rtattr *rt = xfrma[type];
+	struct rtattr *rt = attrs[type];
 
 	if (rt && addrp)
 		*addrp = RTA_DATA(rt);
 }
 
-static inline int verify_sec_ctx_len(struct rtattr **xfrma)
+static inline int verify_sec_ctx_len(struct rtattr **attrs)
 {
-	struct rtattr *rt = xfrma[XFRMA_SEC_CTX];
+	struct rtattr *rt = attrs[XFRMA_SEC_CTX];
 	struct xfrm_user_sec_ctx *uctx;
 
 	if (!rt)
@@ -101,7 +101,7 @@ static inline int verify_sec_ctx_len(str
 
 
 static int verify_newsa_info(struct xfrm_usersa_info *p,
-			     struct rtattr **xfrma)
+			     struct rtattr **attrs)
 {
 	int err;
 
@@ -125,35 +125,35 @@ static int verify_newsa_info(struct xfrm
 	err = -EINVAL;
 	switch (p->id.proto) {
 	case IPPROTO_AH:
-		if (!xfrma[XFRMA_ALG_AUTH]	||
-		    xfrma[XFRMA_ALG_CRYPT]	||
-		    xfrma[XFRMA_ALG_COMP])
+		if (!attrs[XFRMA_ALG_AUTH]	||
+		    attrs[XFRMA_ALG_CRYPT]	||
+		    attrs[XFRMA_ALG_COMP])
 			goto out;
 		break;
 
 	case IPPROTO_ESP:
-		if ((!xfrma[XFRMA_ALG_AUTH] &&
-		     !xfrma[XFRMA_ALG_CRYPT])	||
-		    xfrma[XFRMA_ALG_COMP])
+		if ((!attrs[XFRMA_ALG_AUTH] &&
+		     !attrs[XFRMA_ALG_CRYPT])	||
+		    attrs[XFRMA_ALG_COMP])
 			goto out;
 		break;
 
 	case IPPROTO_COMP:
-		if (!xfrma[XFRMA_ALG_COMP]	||
-		    xfrma[XFRMA_ALG_AUTH]	||
-		    xfrma[XFRMA_ALG_CRYPT])
+		if (!attrs[XFRMA_ALG_COMP]	||
+		    attrs[XFRMA_ALG_AUTH]	||
+		    attrs[XFRMA_ALG_CRYPT])
 			goto out;
 		break;
 
 #if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
 	case IPPROTO_DSTOPTS:
 	case IPPROTO_ROUTING:
-		if (xfrma[XFRMA_ALG_COMP]	||
-		    xfrma[XFRMA_ALG_AUTH]	||
-		    xfrma[XFRMA_ALG_CRYPT]	||
-		    xfrma[XFRMA_ENCAP]		||
-		    xfrma[XFRMA_SEC_CTX]	||
-		    !xfrma[XFRMA_COADDR])
+		if (attrs[XFRMA_ALG_COMP]	||
+		    attrs[XFRMA_ALG_AUTH]	||
+		    attrs[XFRMA_ALG_CRYPT]	||
+		    attrs[XFRMA_ENCAP]		||
+		    attrs[XFRMA_SEC_CTX]	||
+		    !attrs[XFRMA_COADDR])
 			goto out;
 		break;
 #endif
@@ -162,13 +162,13 @@ static int verify_newsa_info(struct xfrm
 		goto out;
 	}
 
-	if ((err = verify_one_alg(xfrma, XFRMA_ALG_AUTH)))
+	if ((err = verify_one_alg(attrs, XFRMA_ALG_AUTH)))
 		goto out;
-	if ((err = verify_one_alg(xfrma, XFRMA_ALG_CRYPT)))
+	if ((err = verify_one_alg(attrs, XFRMA_ALG_CRYPT)))
 		goto out;
-	if ((err = verify_one_alg(xfrma, XFRMA_ALG_COMP)))
+	if ((err = verify_one_alg(attrs, XFRMA_ALG_COMP)))
 		goto out;
-	if ((err = verify_sec_ctx_len(xfrma)))
+	if ((err = verify_sec_ctx_len(attrs)))
 		goto out;
 
 	err = -EINVAL;
@@ -298,12 +298,12 @@ static void copy_from_user_state(struct
  * somehow made shareable and move it to xfrm_state.c - JHS
  *
 */
-static void xfrm_update_ae_params(struct xfrm_state *x, struct rtattr **xfrma)
+static void xfrm_update_ae_params(struct xfrm_state *x, struct rtattr **attrs)
 {
-	struct rtattr *rp = xfrma[XFRMA_REPLAY_VAL];
-	struct rtattr *lt = xfrma[XFRMA_LTIME_VAL];
-	struct rtattr *et = xfrma[XFRMA_ETIMER_THRESH];
-	struct rtattr *rt = xfrma[XFRMA_REPLAY_THRESH];
+	struct rtattr *rp = attrs[XFRMA_REPLAY_VAL
[PATCH 10/16] [XFRM] netlink: Establish an attribute policy
Adds a policy defining the minimal payload lengths for all the attributes
allowing for most attribute validation checks to be removed from in the
middle of the code path. Makes updates more consistent as many format
errors are recognised earlier, before any changes have been attempted.

Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: net-2.6.24/net/xfrm/xfrm_user.c
===================================================================
--- net-2.6.24.orig/net/xfrm/xfrm_user.c	2007-08-21 17:31:04.0 +0200
+++ net-2.6.24/net/xfrm/xfrm_user.c	2007-08-21 17:31:56.0 +0200
@@ -42,19 +42,12 @@ static int verify_one_alg(struct rtattr
 {
 	struct rtattr *rt = xfrma[type - 1];
 	struct xfrm_algo *algp;
-	int len;
 
 	if (!rt)
 		return 0;
 
-	len = (rt->rta_len - sizeof(*rt)) - sizeof(*algp);
-	if (len < 0)
-		return -EINVAL;
-
 	algp = RTA_DATA(rt);
-
-	len -= (algp->alg_key_len + 7U) / 8;
-	if (len < 0)
+	if (RTA_PAYLOAD(rt) < alg_len(algp))
 		return -EINVAL;
 
 	switch (type) {
@@ -82,55 +75,25 @@ static int verify_one_alg(struct rtattr
 	return 0;
 }
 
-static int verify_encap_tmpl(struct rtattr **xfrma)
-{
-	struct rtattr *rt = xfrma[XFRMA_ENCAP - 1];
-	struct xfrm_encap_tmpl *encap;
-
-	if (!rt)
-		return 0;
-
-	if ((rt->rta_len - sizeof(*rt)) < sizeof(*encap))
-		return -EINVAL;
-
-	return 0;
-}
-
-static int verify_one_addr(struct rtattr **xfrma, enum xfrm_attr_type_t type,
+static void verify_one_addr(struct rtattr **xfrma, enum xfrm_attr_type_t type,
 			   xfrm_address_t **addrp)
 {
 	struct rtattr *rt = xfrma[type - 1];
 
-	if (!rt)
-		return 0;
-
-	if ((rt->rta_len - sizeof(*rt)) < sizeof(**addrp))
-		return -EINVAL;
-
-	if (addrp)
+	if (rt && addrp)
 		*addrp = RTA_DATA(rt);
-
-	return 0;
 }
 
 static inline int verify_sec_ctx_len(struct rtattr **xfrma)
 {
 	struct rtattr *rt = xfrma[XFRMA_SEC_CTX - 1];
 	struct xfrm_user_sec_ctx *uctx;
-	int len = 0;
 
 	if (!rt)
 		return 0;
 
-	if (rt->rta_len < sizeof(*uctx))
-		return -EINVAL;
-
 	uctx = RTA_DATA(rt);
-
-	len += sizeof(struct xfrm_user_sec_ctx);
-	len += uctx->ctx_len;
-
-	if (uctx->len != len)
+	if (uctx->len != (sizeof(struct xfrm_user_sec_ctx) +
+			  uctx->ctx_len))
 		return -EINVAL;
 
 	return 0;
@@ -205,12 +168,8 @@ static int verify_newsa_info(struct xfrm
 		goto out;
 	if ((err = verify_one_alg(xfrma, XFRMA_ALG_COMP)))
 		goto out;
-	if ((err = verify_encap_tmpl(xfrma)))
-		goto out;
 	if ((err = verify_sec_ctx_len(xfrma)))
 		goto out;
-	if ((err = verify_one_addr(xfrma, XFRMA_COADDR, NULL)))
-		goto out;
 
 	err = -EINVAL;
 	switch (p->mode) {
@@ -339,9 +298,8 @@ static void copy_from_user_state(struct
  * somehow made shareable and move it to xfrm_state.c - JHS
  *
 */
-static int xfrm_update_ae_params(struct xfrm_state *x, struct rtattr **xfrma)
+static void xfrm_update_ae_params(struct xfrm_state *x, struct rtattr **xfrma)
 {
-	int err = - EINVAL;
 	struct rtattr *rp = xfrma[XFRMA_REPLAY_VAL-1];
 	struct rtattr *lt = xfrma[XFRMA_LTIME_VAL-1];
 	struct rtattr *et = xfrma[XFRMA_ETIMER_THRESH-1];
@@ -349,8 +307,6 @@ static int xfrm_update_ae_params(struct
 
 	if (rp) {
 		struct xfrm_replay_state *replay;
-		if (RTA_PAYLOAD(rp) < sizeof(*replay))
-			goto error;
 		replay = RTA_DATA(rp);
 		memcpy(&x->replay, replay, sizeof(*replay));
 		memcpy(&x->preplay, replay, sizeof(*replay));
@@ -358,8 +314,6 @@ static int xfrm_update_ae_params(struct
 
 	if (lt) {
 		struct xfrm_lifetime_cur *ltime;
-		if (RTA_PAYLOAD(lt) < sizeof(*ltime))
-			goto error;
 		ltime = RTA_DATA(lt);
 		x->curlft.bytes = ltime->bytes;
 		x->curlft.packets = ltime->packets;
@@ -367,21 +321,11 @@ static int xfrm_update_ae_params(struct
 		x->curlft.use_time = ltime->use_time;
 	}
 
-	if (et) {
-		if (RTA_PAYLOAD(et) < sizeof(u32))
-			goto error;
+	if (et)
 		x->replay_maxage = *(u32*)RTA_DATA(et);
-	}
 
-	if (rt) {
-		if (RTA_PAYLOAD(rt) < sizeof(u32))
-			goto error;
+	if (rt)
 		x->replay_maxdiff = *(u32*)RTA_DATA(rt);
-	}
-
-	return 0;
-error:
-	return err;
 }
 
 static struct xfrm_state *xfrm_state_construct(struct xfrm_usersa_info *p,
@@ -429,9 +373,7 @@ static struct xfrm_state *xfrm_state_con
 	/* override default values from above
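The idea behind an attribute policy (check every attribute's minimum payload length once, up front, so handlers can read payloads without re-checking) can be sketched in plain C. Every name here (demo_attr, demo_policy_minlen, demo_validate) is hypothetical; in the kernel the equivalent role is played by a struct nla_policy table consulted while the attributes are parsed. Ignoring unknown attribute types is likewise an assumption of the sketch.

```c
#include <stddef.h>
#include <errno.h>

/* Hypothetical attribute: a type and the length of its payload. */
struct demo_attr {
	unsigned short type;
	size_t payload_len;
};

enum { DEMO_A_UNSPEC, DEMO_A_REPLAY_THRESH, DEMO_A_LTIME_VAL, DEMO_A_MAX };

/* Minimum payload length per attribute type (0 = no check). */
static const size_t demo_policy_minlen[DEMO_A_MAX] = {
	[DEMO_A_REPLAY_THRESH] = sizeof(unsigned int),
	[DEMO_A_LTIME_VAL]     = 4 * sizeof(unsigned long long),
};

/* Validate all attributes before touching any state: 0 if every known
 * attribute meets its minimum length, -EINVAL otherwise. */
static int demo_validate(const struct demo_attr *attrs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		unsigned short t = attrs[i].type;

		if (t >= DEMO_A_MAX)
			continue;	/* unknown type: ignored in this sketch */
		if (attrs[i].payload_len < demo_policy_minlen[t])
			return -EINVAL;
	}
	return 0;
}
```

Because the whole request is checked before any handler runs, a malformed attribute is rejected without half-applying an update, which is the consistency gain the patch description mentions.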
[PATCH 14/16] [XFRM] netlink: Use nla_memcpy() in xfrm_update_ae_params()
Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: net-2.6.24/net/xfrm/xfrm_user.c
===================================================================
--- net-2.6.24.orig/net/xfrm/xfrm_user.c	2007-08-21 17:35:13.0 +0200
+++ net-2.6.24/net/xfrm/xfrm_user.c	2007-08-21 17:36:59.0 +0200
@@ -303,20 +303,12 @@ static void xfrm_update_ae_params(struct
 	struct nlattr *rt = attrs[XFRMA_REPLAY_THRESH];
 
 	if (rp) {
-		struct xfrm_replay_state *replay;
-		replay = nla_data(rp);
-		memcpy(&x->replay, replay, sizeof(*replay));
-		memcpy(&x->preplay, replay, sizeof(*replay));
+		nla_memcpy(&x->replay, rp, sizeof(x->replay));
+		nla_memcpy(&x->preplay, rp, sizeof(x->preplay));
 	}
 
-	if (lt) {
-		struct xfrm_lifetime_cur *ltime;
-		ltime = nla_data(lt);
-		x->curlft.bytes = ltime->bytes;
-		x->curlft.packets = ltime->packets;
-		x->curlft.add_time = ltime->add_time;
-		x->curlft.use_time = ltime->use_time;
-	}
+	if (lt)
+		nla_memcpy(&x->curlft, lt, sizeof(x->curlft));
 
 	if (et)
 		x->replay_maxage = nla_get_u32(et);
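nla_memcpy() copies at most the attribute's payload length into the destination, bounded by the count argument, and returns the number of bytes copied. The sketch below models that clamping on a hypothetical flat attribute (demo_nla); it is not the kernel implementation, and leaving any remainder of the destination untouched is a simplification of this illustration.

```c
#include <string.h>

/* Hypothetical flat attribute: payload pointer plus payload length. */
struct demo_nla {
	const void *data;
	int len;
};

/* Copy min(count, payload length) bytes into dest; return bytes copied. */
static int demo_nla_memcpy(void *dest, const struct demo_nla *src, int count)
{
	int minlen = count < src->len ? count : src->len;

	memcpy(dest, src->data, (size_t)minlen);
	return minlen;
}
```

The clamp is what lets the patch pass sizeof(x->curlft) directly: a destination smaller than the payload is never overrun.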
[PATCH 13/16] [XFRM] netlink: Use nlattr instead of rtattr
Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: net-2.6.24/net/xfrm/xfrm_user.c
===================================================================
--- net-2.6.24.orig/net/xfrm/xfrm_user.c	2007-08-21 17:34:29.0 +0200
+++ net-2.6.24/net/xfrm/xfrm_user.c	2007-08-21 17:35:13.0 +0200
@@ -38,16 +38,16 @@ static inline int alg_len(struct xfrm_al
 	return sizeof(*alg) + ((alg->alg_key_len + 7) / 8);
 }
 
-static int verify_one_alg(struct rtattr **attrs, enum xfrm_attr_type_t type)
+static int verify_one_alg(struct nlattr **attrs, enum xfrm_attr_type_t type)
 {
-	struct rtattr *rt = attrs[type];
+	struct nlattr *rt = attrs[type];
 	struct xfrm_algo *algp;
 
 	if (!rt)
 		return 0;
 
-	algp = RTA_DATA(rt);
-	if (RTA_PAYLOAD(rt) < alg_len(algp))
+	algp = nla_data(rt);
+	if (nla_len(rt) < alg_len(algp))
 		return -EINVAL;
 
 	switch (type) {
@@ -75,24 +75,24 @@ static int verify_one_alg(struct rtattr
 	return 0;
 }
 
-static void verify_one_addr(struct rtattr **attrs, enum xfrm_attr_type_t type,
+static void verify_one_addr(struct nlattr **attrs, enum xfrm_attr_type_t type,
 			    xfrm_address_t **addrp)
 {
-	struct rtattr *rt = attrs[type];
+	struct nlattr *rt = attrs[type];
 
 	if (rt && addrp)
-		*addrp = RTA_DATA(rt);
+		*addrp = nla_data(rt);
 }
 
-static inline int verify_sec_ctx_len(struct rtattr **attrs)
+static inline int verify_sec_ctx_len(struct nlattr **attrs)
 {
-	struct rtattr *rt = attrs[XFRMA_SEC_CTX];
+	struct nlattr *rt = attrs[XFRMA_SEC_CTX];
 	struct xfrm_user_sec_ctx *uctx;
 
 	if (!rt)
 		return 0;
 
-	uctx = RTA_DATA(rt);
+	uctx = nla_data(rt);
 	if (uctx->len != (sizeof(struct xfrm_user_sec_ctx) +
 			  uctx->ctx_len))
 		return -EINVAL;
@@ -101,7 +101,7 @@ static inline int verify_sec_ctx_len(str
 
 
 static int verify_newsa_info(struct xfrm_usersa_info *p,
-			     struct rtattr **attrs)
+			     struct nlattr **attrs)
 {
 	int err;
 
@@ -191,16 +191,15 @@ out:
 
 static int attach_one_algo(struct xfrm_algo **algpp, u8 *props,
 			   struct xfrm_algo_desc *(*get_byname)(char *, int),
-			   struct rtattr *u_arg)
+			   struct nlattr *rta)
 {
-	struct rtattr *rta = u_arg;
 	struct xfrm_algo *p, *ualg;
 	struct xfrm_algo_desc *algo;
 
 	if (!rta)
 		return 0;
 
-	ualg = RTA_DATA(rta);
+	ualg = nla_data(rta);
 
 	algo = get_byname(ualg->alg_name, 1);
 	if (!algo)
@@ -216,15 +215,14 @@ static int attach_one_algo(struct xfrm_a
 	return 0;
 }
 
-static int attach_encap_tmpl(struct xfrm_encap_tmpl **encapp, struct rtattr *u_arg)
+static int attach_encap_tmpl(struct xfrm_encap_tmpl **encapp, struct nlattr *rta)
 {
-	struct rtattr *rta = u_arg;
 	struct xfrm_encap_tmpl *p, *uencap;
 
 	if (!rta)
 		return 0;
 
-	uencap = RTA_DATA(rta);
+	uencap = nla_data(rta);
 	p = kmemdup(uencap, sizeof(*p), GFP_KERNEL);
 	if (!p)
 		return -ENOMEM;
@@ -245,26 +243,25 @@ static inline int xfrm_user_sec_ctx_size
 	return len;
 }
 
-static int attach_sec_ctx(struct xfrm_state *x, struct rtattr *u_arg)
+static int attach_sec_ctx(struct xfrm_state *x, struct nlattr *u_arg)
 {
 	struct xfrm_user_sec_ctx *uctx;
 
 	if (!u_arg)
 		return 0;
 
-	uctx = RTA_DATA(u_arg);
+	uctx = nla_data(u_arg);
 	return security_xfrm_state_alloc(x, uctx);
 }
 
-static int attach_one_addr(xfrm_address_t **addrpp, struct rtattr *u_arg)
+static int attach_one_addr(xfrm_address_t **addrpp, struct nlattr *rta)
 {
-	struct rtattr *rta = u_arg;
 	xfrm_address_t *p, *uaddrp;
 
 	if (!rta)
 		return 0;
 
-	uaddrp = RTA_DATA(rta);
+	uaddrp = nla_data(rta);
 	p = kmemdup(uaddrp, sizeof(*p), GFP_KERNEL);
 	if (!p)
 		return -ENOMEM;
@@ -298,23 +295,23 @@ static void copy_from_user_state(struct
  * somehow made shareable and move it to xfrm_state.c - JHS
  *
 */
-static void xfrm_update_ae_params(struct xfrm_state *x, struct rtattr **attrs)
+static void xfrm_update_ae_params(struct xfrm_state *x, struct nlattr **attrs)
 {
-	struct rtattr *rp = attrs[XFRMA_REPLAY_VAL];
-	struct rtattr *lt = attrs[XFRMA_LTIME_VAL];
-	struct rtattr *et = attrs[XFRMA_ETIMER_THRESH];
-	struct rtattr *rt = attrs[XFRMA_REPLAY_THRESH];
+	struct nlattr *rp = attrs[XFRMA_REPLAY_VAL];
+	struct nlattr *lt = attrs[XFRMA_LTIME_VAL];
+	struct nlattr *et = attrs[XFRMA_ETIMER_THRESH];
+	struct nlattr *rt = attrs[XFRMA_REPLAY_THRESH];
 
 	if (rp) {
 		struct xfrm_replay_state *replay;
-		replay = RTA_DATA(rp
[PATCH 05/16] [XFRM] netlink: Use nla_put()/NLA_PUT() variants
Also makes use of copy_sec_ctx() in another place and removes
duplicated code.

Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: net-2.6.24/net/xfrm/xfrm_user.c
===================================================================
--- net-2.6.24.orig/net/xfrm/xfrm_user.c	2007-08-21 16:15:03.0 +0200
+++ net-2.6.24/net/xfrm/xfrm_user.c	2007-08-21 16:16:03.0 +0200
@@ -576,6 +576,27 @@ struct xfrm_dump_info {
 	int this_idx;
 };
 
+static int copy_sec_ctx(struct xfrm_sec_ctx *s, struct sk_buff *skb)
+{
+	int ctx_size = sizeof(struct xfrm_sec_ctx) + s->ctx_len;
+	struct xfrm_user_sec_ctx *uctx;
+	struct nlattr *attr;
+
+	attr = nla_reserve(skb, XFRMA_SEC_CTX, ctx_size);
+	if (attr == NULL)
+		return -EMSGSIZE;
+
+	uctx = nla_data(attr);
+	uctx->exttype = XFRMA_SEC_CTX;
+	uctx->len = ctx_size;
+	uctx->ctx_doi = s->ctx_doi;
+	uctx->ctx_alg = s->ctx_alg;
+	uctx->ctx_len = s->ctx_len;
+	memcpy(uctx + 1, s->ctx_str, s->ctx_len);
+
+	return 0;
+}
+
 static int dump_one_state(struct xfrm_state *x, int count, void *ptr)
 {
 	struct xfrm_dump_info *sp = ptr;
@@ -596,43 +617,32 @@ static int dump_one_state(struct xfrm_st
 	copy_to_user_state(x, p);
 
 	if (x->aalg)
-		RTA_PUT(skb, XFRMA_ALG_AUTH,
+		NLA_PUT(skb, XFRMA_ALG_AUTH,
 			sizeof(*(x->aalg))+(x->aalg->alg_key_len+7)/8, x->aalg);
 	if (x->ealg)
-		RTA_PUT(skb, XFRMA_ALG_CRYPT,
+		NLA_PUT(skb, XFRMA_ALG_CRYPT,
 			sizeof(*(x->ealg))+(x->ealg->alg_key_len+7)/8, x->ealg);
 	if (x->calg)
-		RTA_PUT(skb, XFRMA_ALG_COMP, sizeof(*(x->calg)), x->calg);
+		NLA_PUT(skb, XFRMA_ALG_COMP, sizeof(*(x->calg)), x->calg);
 
 	if (x->encap)
-		RTA_PUT(skb, XFRMA_ENCAP, sizeof(*x->encap), x->encap);
+		NLA_PUT(skb, XFRMA_ENCAP, sizeof(*x->encap), x->encap);
 
-	if (x->security) {
-		int ctx_size = sizeof(struct xfrm_sec_ctx) +
-				x->security->ctx_len;
-		struct rtattr *rt = __RTA_PUT(skb, XFRMA_SEC_CTX, ctx_size);
-		struct xfrm_user_sec_ctx *uctx = RTA_DATA(rt);
-
-		uctx->exttype = XFRMA_SEC_CTX;
-		uctx->len = ctx_size;
-		uctx->ctx_doi = x->security->ctx_doi;
-		uctx->ctx_alg = x->security->ctx_alg;
-		uctx->ctx_len = x->security->ctx_len;
-		memcpy(uctx + 1, x->security->ctx_str, x->security->ctx_len);
-	}
+	if (x->security && copy_sec_ctx(x->security, skb) < 0)
+		goto nla_put_failure;
 
 	if (x->coaddr)
-		RTA_PUT(skb, XFRMA_COADDR, sizeof(*x->coaddr), x->coaddr);
+		NLA_PUT(skb, XFRMA_COADDR, sizeof(*x->coaddr), x->coaddr);
 
 	if (x->lastused)
-		RTA_PUT(skb, XFRMA_LASTUSED, sizeof(x->lastused), &x->lastused);
+		NLA_PUT_U64(skb, XFRMA_LASTUSED, x->lastused);
 
 	nlmsg_end(skb, nlh);
 out:
 	sp->this_idx++;
 	return 0;
 
-rtattr_failure:
+nla_put_failure:
 	nlmsg_cancel(skb, nlh);
 	return -EMSGSIZE;
 }
@@ -1193,32 +1203,9 @@ static int copy_to_user_tmpl(struct xfrm
 		up->ealgos = kp->ealgos;
 		up->calgos = kp->calgos;
 	}
-	RTA_PUT(skb, XFRMA_TMPL,
-		(sizeof(struct xfrm_user_tmpl) * xp->xfrm_nr),
-		vec);
-
-	return 0;
-
-rtattr_failure:
-	return -1;
-}
-
-static int copy_sec_ctx(struct xfrm_sec_ctx *s, struct sk_buff *skb)
-{
-	int ctx_size = sizeof(struct xfrm_sec_ctx) + s->ctx_len;
-	struct rtattr *rt = __RTA_PUT(skb, XFRMA_SEC_CTX, ctx_size);
-	struct xfrm_user_sec_ctx *uctx = RTA_DATA(rt);
-
-	uctx->exttype = XFRMA_SEC_CTX;
-	uctx->len = ctx_size;
-	uctx->ctx_doi = s->ctx_doi;
-	uctx->ctx_alg = s->ctx_alg;
-	uctx->ctx_len = s->ctx_len;
-	memcpy(uctx + 1, s->ctx_str, s->ctx_len);
-	return 0;
 
- rtattr_failure:
-	return -1;
+	return nla_put(skb, XFRMA_TMPL,
+		       sizeof(struct xfrm_user_tmpl) * xp->xfrm_nr, vec);
 }
 
 static inline int copy_to_user_state_sec_ctx(struct xfrm_state *x, struct sk_buff *skb)
@@ -1240,17 +1227,11 @@ static inline int copy_to_user_sec_ctx(s
 #ifdef CONFIG_XFRM_SUB_POLICY
 static int copy_to_user_policy_type(u8 type, struct sk_buff *skb)
 {
-	struct xfrm_userpolicy_type upt;
+	struct xfrm_userpolicy_type upt = {
+		.type = type,
+	};
 
-	memset(&upt, 0, sizeof(upt));
-	upt.type = type;
-
-	RTA_PUT(skb, XFRMA_POLICY_TYPE, sizeof(upt), &upt);
-
-	return 0;
-
-rtattr_failure:
-	return -1;
+	return nla_put(skb, XFRMA_POLICY_TYPE, sizeof(upt), &upt);
 }
 
 #else
@@ -1440,7 +1421,6 @@ static int build_aevent(struct sk_buff *
 {
 	struct xfrm_aevent_id *id;
 	struct nlmsghdr *nlh;
-	struct xfrm_lifetime_cur ltime
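The new copy_sec_ctx() follows a reserve-then-fill pattern: nla_reserve() claims header plus payload space in the message (or fails, which the caller maps to -EMSGSIZE), and the payload is then written in place. The user-space sketch below assumes a hypothetical fixed buffer (demo_msg, demo_reserve) and a 4-byte alignment mirroring netlink's attribute alignment; it zeroes payload and padding for simplicity, which is a choice of this sketch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_ALIGN(len) (((len) + 3) & ~(size_t)3)	/* 4-byte alignment */

/* Hypothetical attribute header: total length and type, 16 bits each. */
struct demo_hdr {
	uint16_t len;
	uint16_t type;
};

/* Hypothetical message buffer with a fixed capacity. */
struct demo_msg {
	unsigned char buf[64];
	size_t used;
};

/* Reserve header + payload space; return a pointer to the payload, or
 * NULL when the buffer is too small (the caller maps that to -EMSGSIZE). */
static void *demo_reserve(struct demo_msg *m, uint16_t type, size_t payload)
{
	size_t total = DEMO_ALIGN(sizeof(struct demo_hdr) + payload);
	struct demo_hdr h = { (uint16_t)(sizeof(h) + payload), type };
	unsigned char *p;

	if (m->used + total > sizeof(m->buf))
		return NULL;		/* nothing written on failure */
	p = m->buf + m->used;
	memcpy(p, &h, sizeof(h));
	memset(p + sizeof(h), 0, total - sizeof(h));	/* payload + padding */
	m->used += total;
	return p + sizeof(h);
}
```

Reserving first means a too-small buffer is detected before any field of the security context is written.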
[PATCH 04/16] [XFRM] netlink: Use nlmsg_broadcast() and nlmsg_unicast()
This simplifies successful return codes from >0 to 0.

Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: net-2.6.24/net/xfrm/xfrm_user.c
===================================================================
--- net-2.6.24.orig/net/xfrm/xfrm_user.c	2007-08-21 16:13:57.0 +0200
+++ net-2.6.24/net/xfrm/xfrm_user.c	2007-08-21 16:15:03.0 +0200
@@ -800,8 +800,7 @@ static int xfrm_get_sa(struct sk_buff *s
 	if (IS_ERR(resp_skb)) {
 		err = PTR_ERR(resp_skb);
 	} else {
-		err = netlink_unicast(xfrm_nl, resp_skb,
-				      NETLINK_CB(skb).pid, MSG_DONTWAIT);
+		err = nlmsg_unicast(xfrm_nl, resp_skb, NETLINK_CB(skb).pid);
 	}
 	xfrm_state_put(x);
 out_noput:
@@ -882,8 +881,7 @@ static int xfrm_alloc_userspi(struct sk_
 		goto out;
 	}
 
-	err = netlink_unicast(xfrm_nl, resp_skb,
-			      NETLINK_CB(skb).pid, MSG_DONTWAIT);
+	err = nlmsg_unicast(xfrm_nl, resp_skb, NETLINK_CB(skb).pid);
 
 out:
 	xfrm_state_put(x);
@@ -1393,9 +1391,8 @@ static int xfrm_get_policy(struct sk_buf
 		if (IS_ERR(resp_skb)) {
 			err = PTR_ERR(resp_skb);
 		} else {
-			err = netlink_unicast(xfrm_nl, resp_skb,
-					      NETLINK_CB(skb).pid,
-					      MSG_DONTWAIT);
+			err = nlmsg_unicast(xfrm_nl, resp_skb,
+					    NETLINK_CB(skb).pid);
 		}
 	} else {
 		xfrm_audit_log(NETLINK_CB(skb).loginuid, NETLINK_CB(skb).sid,
@@ -1525,8 +1522,7 @@ static int xfrm_get_ae(struct sk_buff *s
 
 	if (build_aevent(r_skb, x, &c) < 0)
 		BUG();
-	err = netlink_unicast(xfrm_nl, r_skb,
-			      NETLINK_CB(skb).pid, MSG_DONTWAIT);
+	err = nlmsg_unicast(xfrm_nl, r_skb, NETLINK_CB(skb).pid);
 	spin_unlock_bh(&x->lock);
 	xfrm_state_put(x);
 	return err;
@@ -1903,9 +1899,7 @@ static int xfrm_send_migrate(struct xfrm
 	if (build_migrate(skb, m, num_migrate, sel, dir, type) < 0)
 		BUG();
 
-	NETLINK_CB(skb).dst_group = XFRMNLGRP_MIGRATE;
-	return netlink_broadcast(xfrm_nl, skb, 0, XFRMNLGRP_MIGRATE,
-				 GFP_ATOMIC);
+	return nlmsg_multicast(xfrm_nl, skb, 0, XFRMNLGRP_MIGRATE, GFP_ATOMIC);
 }
 #else
 static int xfrm_send_migrate(struct xfrm_selector *sel, u8 dir, u8 type,
@@ -2061,8 +2055,7 @@ static int xfrm_exp_state_notify(struct
 	if (build_expire(skb, x, c) < 0)
 		BUG();
 
-	NETLINK_CB(skb).dst_group = XFRMNLGRP_EXPIRE;
-	return netlink_broadcast(xfrm_nl, skb, 0, XFRMNLGRP_EXPIRE, GFP_ATOMIC);
+	return nlmsg_multicast(xfrm_nl, skb, 0, XFRMNLGRP_EXPIRE, GFP_ATOMIC);
 }
 
 static int xfrm_aevent_state_notify(struct xfrm_state *x, struct km_event *c)
@@ -2079,8 +2072,7 @@ static int xfrm_aevent_state_notify(stru
 	if (build_aevent(skb, x, c) < 0)
 		BUG();
 
-	NETLINK_CB(skb).dst_group = XFRMNLGRP_AEVENTS;
-	return netlink_broadcast(xfrm_nl, skb, 0, XFRMNLGRP_AEVENTS, GFP_ATOMIC);
+	return nlmsg_multicast(xfrm_nl, skb, 0, XFRMNLGRP_AEVENTS, GFP_ATOMIC);
 }
 
 static int xfrm_notify_sa_flush(struct km_event *c)
@@ -2105,8 +2097,7 @@ static int xfrm_notify_sa_flush(struct k
 
 	nlmsg_end(skb, nlh);
 
-	NETLINK_CB(skb).dst_group = XFRMNLGRP_SA;
-	return netlink_broadcast(xfrm_nl, skb, 0, XFRMNLGRP_SA, GFP_ATOMIC);
+	return nlmsg_multicast(xfrm_nl, skb, 0, XFRMNLGRP_SA, GFP_ATOMIC);
 }
 
 static inline int xfrm_sa_len(struct xfrm_state *x)
@@ -2175,8 +2166,7 @@ static int xfrm_notify_sa(struct xfrm_st
 
 	nlmsg_end(skb, nlh);
 
-	NETLINK_CB(skb).dst_group = XFRMNLGRP_SA;
-	return netlink_broadcast(xfrm_nl, skb, 0, XFRMNLGRP_SA, GFP_ATOMIC);
+	return nlmsg_multicast(xfrm_nl, skb, 0, XFRMNLGRP_SA, GFP_ATOMIC);
 
 nlmsg_failure:
 rtattr_failure:
@@ -2262,8 +2252,7 @@ static int xfrm_send_acquire(struct xfrm
 	if (build_acquire(skb, x, xt, xp, dir) < 0)
 		BUG();
 
-	NETLINK_CB(skb).dst_group = XFRMNLGRP_ACQUIRE;
-	return netlink_broadcast(xfrm_nl, skb, 0, XFRMNLGRP_ACQUIRE, GFP_ATOMIC);
+	return nlmsg_multicast(xfrm_nl, skb, 0, XFRMNLGRP_ACQUIRE, GFP_ATOMIC);
 }
 
 /* User gives us xfrm_user_policy_info followed by an array of 0
@@ -2371,8 +2360,7 @@ static int xfrm_exp_policy_notify(struct
 	if (build_polexpire(skb, xp, dir, c) < 0)
 		BUG();
 
-	NETLINK_CB(skb).dst_group = XFRMNLGRP_EXPIRE;
-	return netlink_broadcast(xfrm_nl, skb, 0, XFRMNLGRP_EXPIRE, GFP_ATOMIC);
+	return nlmsg_multicast(xfrm_nl, skb, 0, XFRMNLGRP_EXPIRE, GFP_ATOMIC);
 }
 
 static int xfrm_notify_policy(struct xfrm_policy *xp, int dir, struct km_event *c)
@@ -2423,8 +2411,7 @@ static
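The simplified return codes come from the wrapper itself: netlink_unicast() reports the number of bytes sent on success, and nlmsg_unicast() folds any positive value into 0 so callers only need to test for negative errors. (nlmsg_multicast() likewise sets NETLINK_CB(skb).dst_group itself, which is why those assignments disappear from the callers.) A minimal sketch with a hypothetical send stub, demo_raw_send, standing in for the raw netlink call:

```c
#include <errno.h>

/* Hypothetical stand-in for a raw send: returns bytes queued (> 0)
 * on success or a negative errno value on failure. */
static int demo_raw_send(int fail, int nbytes)
{
	return fail ? -ENOBUFS : nbytes;
}

/* Wrapper in the style of nlmsg_unicast(): success is always 0. */
static int demo_unicast(int fail, int nbytes)
{
	int err = demo_raw_send(fail, nbytes);

	if (err > 0)
		err = 0;
	return err;
}
```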
[PATCH 03/16] [XFRM] netlink: Use nlmsg_data() instead of NLMSG_DATA()
Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: net-2.6.24/net/xfrm/xfrm_user.c
===================================================================
--- net-2.6.24.orig/net/xfrm/xfrm_user.c	2007-08-21 16:12:20.0 +0200
+++ net-2.6.24/net/xfrm/xfrm_user.c	2007-08-21 16:13:57.0 +0200
@@ -443,7 +443,7 @@ error_no_put:
 static int xfrm_add_sa(struct sk_buff *skb, struct nlmsghdr *nlh,
 		struct rtattr **xfrma)
 {
-	struct xfrm_usersa_info *p = NLMSG_DATA(nlh);
+	struct xfrm_usersa_info *p = nlmsg_data(nlh);
 	struct xfrm_state *x;
 	int err;
 	struct km_event c;
@@ -520,7 +520,7 @@ static int xfrm_del_sa(struct sk_buff *s
 	struct xfrm_state *x;
 	int err = -ESRCH;
 	struct km_event c;
-	struct xfrm_usersa_id *p = NLMSG_DATA(nlh);
+	struct xfrm_usersa_id *p = nlmsg_data(nlh);
 
 	x = xfrm_user_state_lookup(p, xfrma, &err);
 	if (x == NULL)
@@ -592,7 +592,7 @@ static int dump_one_state(struct xfrm_st
 	if (nlh == NULL)
 		return -EMSGSIZE;
 
-	p = NLMSG_DATA(nlh);
+	p = nlmsg_data(nlh);
 	copy_to_user_state(x, p);
 
 	if (x->aalg)
@@ -715,7 +715,7 @@ static int xfrm_get_spdinfo(struct sk_bu
 		struct rtattr **xfrma)
 {
 	struct sk_buff *r_skb;
-	u32 *flags = NLMSG_DATA(nlh);
+	u32 *flags = nlmsg_data(nlh);
 	u32 spid = NETLINK_CB(skb).pid;
 	u32 seq = nlh->nlmsg_seq;
 	int len = NLMSG_LENGTH(sizeof(u32));
@@ -765,7 +765,7 @@ static int xfrm_get_sadinfo(struct sk_bu
 		struct rtattr **xfrma)
 {
 	struct sk_buff *r_skb;
-	u32 *flags = NLMSG_DATA(nlh);
+	u32 *flags = nlmsg_data(nlh);
 	u32 spid = NETLINK_CB(skb).pid;
 	u32 seq = nlh->nlmsg_seq;
 	int len = NLMSG_LENGTH(sizeof(u32));
@@ -787,7 +787,7 @@ static int xfrm_get_sadinfo(struct sk_bu
 static int xfrm_get_sa(struct sk_buff *skb, struct nlmsghdr *nlh,
 		struct rtattr **xfrma)
 {
-	struct xfrm_usersa_id *p = NLMSG_DATA(nlh);
+	struct xfrm_usersa_id *p = nlmsg_data(nlh);
 	struct xfrm_state *x;
 	struct sk_buff *resp_skb;
 	int err = -ESRCH;
@@ -841,7 +841,7 @@ static int xfrm_alloc_userspi(struct sk_
 	int family;
 	int err;
 
-	p = NLMSG_DATA(nlh);
+	p = nlmsg_data(nlh);
 	err = verify_userspi_info(p);
 	if (err)
 		goto out_noput;
@@ -1130,7 +1130,7 @@ static struct xfrm_policy *xfrm_policy_c
 static int xfrm_add_policy(struct sk_buff *skb, struct nlmsghdr *nlh,
 		struct rtattr **xfrma)
 {
-	struct xfrm_userpolicy_info *p = NLMSG_DATA(nlh);
+	struct xfrm_userpolicy_info *p = nlmsg_data(nlh);
 	struct xfrm_policy *xp;
 	struct km_event c;
 	int err;
@@ -1277,8 +1277,8 @@ static int dump_one_policy(struct xfrm_p
 			      XFRM_MSG_NEWPOLICY, sizeof(*p), sp->nlmsg_flags);
 	if (nlh == NULL)
 		return -EMSGSIZE;
-	p = NLMSG_DATA(nlh);
 
+	p = nlmsg_data(nlh);
 	copy_to_user_policy(xp, p, dir);
 	if (copy_to_user_tmpl(xp, skb) < 0)
 		goto nlmsg_failure;
@@ -1351,7 +1351,7 @@ static int xfrm_get_policy(struct sk_buf
 	struct km_event c;
 	int delete;
 
-	p = NLMSG_DATA(nlh);
+	p = nlmsg_data(nlh);
 	delete = nlh->nlmsg_type == XFRM_MSG_DELPOLICY;
 
 	err = copy_from_user_policy_type(&type, xfrma);
@@ -1420,7 +1420,7 @@ static int xfrm_flush_sa(struct sk_buff
 		struct rtattr **xfrma)
 {
 	struct km_event c;
-	struct xfrm_usersa_flush *p = NLMSG_DATA(nlh);
+	struct xfrm_usersa_flush *p = nlmsg_data(nlh);
 	struct xfrm_audit audit_info;
 	int err;
 
@@ -1448,8 +1448,8 @@ static int build_aevent(struct sk_buff *
 	nlh = nlmsg_put(skb, c->pid, c->seq, XFRM_MSG_NEWAE, sizeof(*id), 0);
 	if (nlh == NULL)
 		return -EMSGSIZE;
-	id = NLMSG_DATA(nlh);
 
+	id = nlmsg_data(nlh);
 	memcpy(&id->sa_id.daddr, &x->id.daddr,sizeof(x->id.daddr));
 	id->sa_id.spi = x->id.spi;
 	id->sa_id.family = x->props.family;
@@ -1490,7 +1490,7 @@ static int xfrm_get_ae(struct sk_buff *s
 	struct sk_buff *r_skb;
 	int err;
 	struct km_event c;
-	struct xfrm_aevent_id *p = NLMSG_DATA(nlh);
+	struct xfrm_aevent_id *p = nlmsg_data(nlh);
 	int len = NLMSG_LENGTH(sizeof(struct xfrm_aevent_id));
 	struct xfrm_usersa_id *id = &p->sa_id;
 
@@ -1538,7 +1538,7 @@ static int xfrm_new_ae(struct sk_buff *s
 	struct xfrm_state *x;
 	struct km_event c;
 	int err = - EINVAL;
-	struct xfrm_aevent_id *p = NLMSG_DATA(nlh);
+	struct xfrm_aevent_id *p = nlmsg_data(nlh);
 	struct rtattr *rp = xfrma[XFRMA_REPLAY_VAL-1];
 	struct rtattr *lt = xfrma[XFRMA_LTIME_VAL-1];
 
@@ -1602,7 +1602,7 @@ static int xfrm_add_pol_expire(struct sk
 		struct rtattr
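nlmsg_data() and NLMSG_DATA() both locate the payload that begins right after the aligned netlink message header. A user-space check against the uapi macros from <linux/netlink.h> (assumed available on a Linux build host), with a hypothetical helper demo_payload written in the style of nlmsg_data():

```c
#include <linux/netlink.h>

/* Payload begins NLMSG_HDRLEN bytes (the aligned header size) after
 * the start of the message. */
static void *demo_payload(struct nlmsghdr *nlh)
{
	return (unsigned char *)nlh + NLMSG_HDRLEN;
}
```

The helper form is a typed function rather than a macro, which is the readability gain of the conversion; the address it computes is the same.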
[PATCH 15/16] [XFRM] netlink: Remove dependency on rtnetlink
Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: net-2.6.24/net/xfrm/xfrm_user.c
===================================================================
--- net-2.6.24.orig/net/xfrm/xfrm_user.c	2007-08-21 17:36:59.0 +0200
+++ net-2.6.24/net/xfrm/xfrm_user.c	2007-08-21 17:37:18.0 +0200
@@ -19,7 +19,6 @@
 #include <linux/string.h>
 #include <linux/net.h>
 #include <linux/skbuff.h>
-#include <linux/rtnetlink.h>
 #include <linux/pfkeyv2.h>
 #include <linux/ipsec.h>
 #include <linux/init.h>
[PATCH 07/16] [XFRM] netlink: Clear up some of the CONFIG_XFRM_SUB_POLICY ifdef mess
Moves all of the SUB_POLICY ifdefs related to the attribute size
calculation into a function.

Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: net-2.6.24/net/xfrm/xfrm_user.c
===================================================================
--- net-2.6.24.orig/net/xfrm/xfrm_user.c	2007-08-21 17:03:43.0 +0200
+++ net-2.6.24/net/xfrm/xfrm_user.c	2007-08-21 17:04:46.0 +0200
@@ -1224,6 +1224,14 @@ static inline int copy_to_user_sec_ctx(s
 	}
 	return 0;
 }
+static inline size_t userpolicy_type_attrsize(void)
+{
+#ifdef CONFIG_XFRM_SUB_POLICY
+	return nla_total_size(sizeof(struct xfrm_userpolicy_type));
+#else
+	return 0;
+#endif
+}
 
 #ifdef CONFIG_XFRM_SUB_POLICY
 static int copy_to_user_policy_type(u8 type, struct sk_buff *skb)
@@ -1857,9 +1865,7 @@ static int xfrm_send_migrate(struct xfrm
 
 	len = RTA_SPACE(sizeof(struct xfrm_user_migrate) * num_migrate);
 	len += NLMSG_SPACE(sizeof(struct xfrm_userpolicy_id));
-#ifdef CONFIG_XFRM_SUB_POLICY
-	len += RTA_SPACE(sizeof(struct xfrm_userpolicy_type));
-#endif
+	len += userpolicy_type_attrsize();
 	skb = alloc_skb(len, GFP_ATOMIC);
 	if (skb == NULL)
 		return -ENOMEM;
@@ -2214,9 +2220,7 @@ static int xfrm_send_acquire(struct xfrm
 	len = RTA_SPACE(sizeof(struct xfrm_user_tmpl) * xp->xfrm_nr);
 	len += NLMSG_SPACE(sizeof(struct xfrm_user_acquire));
 	len += RTA_SPACE(xfrm_user_sec_ctx_size(x->security));
-#ifdef CONFIG_XFRM_SUB_POLICY
-	len += RTA_SPACE(sizeof(struct xfrm_userpolicy_type));
-#endif
+	len += userpolicy_type_attrsize();
 	skb = alloc_skb(len, GFP_ATOMIC);
 	if (skb == NULL)
 		return -ENOMEM;
@@ -2322,9 +2326,7 @@ static int xfrm_exp_policy_notify(struct
 	len = RTA_SPACE(sizeof(struct xfrm_user_tmpl) * xp->xfrm_nr);
 	len += NLMSG_SPACE(sizeof(struct xfrm_user_polexpire));
 	len += RTA_SPACE(xfrm_user_sec_ctx_size(xp->security));
-#ifdef CONFIG_XFRM_SUB_POLICY
-	len += RTA_SPACE(sizeof(struct xfrm_userpolicy_type));
-#endif
+	len += userpolicy_type_attrsize();
 	skb = alloc_skb(len, GFP_ATOMIC);
 	if (skb == NULL)
 		return -ENOMEM;
@@ -2349,9 +2351,7 @@ static int xfrm_notify_policy(struct xfr
 		len += RTA_SPACE(headlen);
 		headlen = sizeof(*id);
 	}
-#ifdef CONFIG_XFRM_SUB_POLICY
-	len += RTA_SPACE(sizeof(struct xfrm_userpolicy_type));
-#endif
+	len += userpolicy_type_attrsize();
 	len += NLMSG_SPACE(headlen);
 
 	skb = alloc_skb(len, GFP_ATOMIC);
@@ -2401,9 +2401,7 @@ static int xfrm_notify_policy_flush(stru
 	struct nlmsghdr *nlh;
 	struct sk_buff *skb;
 	int len = 0;
-#ifdef CONFIG_XFRM_SUB_POLICY
-	len += RTA_SPACE(sizeof(struct xfrm_userpolicy_type));
-#endif
+	len += userpolicy_type_attrsize();
 	len += NLMSG_LENGTH(0);
 
 	skb = alloc_skb(len, GFP_ATOMIC);
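userpolicy_type_attrsize() sizes the attribute with nla_total_size(), which adds the attribute header to the payload and rounds up to the 4-byte attribute alignment. The arithmetic is re-derived below with hypothetical demo_ names, assuming the 4-byte header of struct nlattr (two 16-bit fields). For an attribute this small the result equals what the old RTA_SPACE() produced, since both use a 4-byte header and 4-byte alignment.

```c
#include <stddef.h>

#define DEMO_ALIGNTO	4U
#define DEMO_ALIGN(len)	(((len) + DEMO_ALIGNTO - 1) & ~(size_t)(DEMO_ALIGNTO - 1))
#define DEMO_HDRLEN	DEMO_ALIGN(sizeof(unsigned short) * 2)	/* nla_len + nla_type */

/* Header plus payload, without trailing padding (cf. nla_attr_size). */
static size_t demo_attr_size(size_t payload)
{
	return DEMO_HDRLEN + payload;
}

/* Header plus payload, padded to the alignment (cf. nla_total_size). */
static size_t demo_total_size(size_t payload)
{
	return DEMO_ALIGN(demo_attr_size(payload));
}
```

A 6-byte payload, for instance, occupies 10 bytes of attribute but 12 bytes of message space once padded.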
Re: [RFC] Wild and crazy ideas involving struct sk_buff
* Paul Moore [EMAIL PROTECTED] 2007-08-22 16:31
> We're currently talking about several different ideas to solve the
> problem, including leveraging the sk_buff.secmark field, and one of
> the ideas was to add an additional field to the sk_buff structure.
> Knowing how well that idea would go over (lead balloon is probably an
> understatement at best) I started looking at what I might be able to
> remove from the sk_buff struct to make room for a new field (the new
> field would be a u32). Looking at the sk_buff structure it appears
> that the sk_buff.dev and sk_buff.iif fields are a bit redundant and
> removing the sk_buff.dev field could free 32/64 bits depending on the
> platform. Is there any reason (performance?) for keeping the
> sk_buff.dev field around? Would the community be open to patches which
> removed it and transition users over to the sk_buff.iif field?
> Finally, assuming the sk_buff.dev field was removed, would the
> community be open to adding a new LSM/SELinux related u32 field to the
> sk_buff struct?

This reminds me of an idea someone brought up a while ago: it involved
having a way to attach additional space to an sk_buff for all the
different marks and other non-essential fields.

I think skb->dev is required because we need to hold a reference on the
device while a packet being processed is put on a queue somewhere.
Re: [PATCH 3/4 - rev 2] Initialize and populate age field
* Varun Chandramohan [EMAIL PROTECTED] 2007-08-20 13:46
> The age field is filled with the current time at the time of creation
> of the route. When the routes are dumped then the age value stored in
> the route structure is subtracted from the current time value and the
> difference is the age expressed in secs.
>
> Signed-off-by: Varun Chandramohan [EMAIL PROTECTED]
>
> @@ -985,6 +987,14 @@ int fib_dump_info(struct sk_buff *skb, u
>  		NLA_PUT_U32(skb, RTA_FLOW, fi->fib_nh[0].nh_tclassid);
>  #endif
>  	}
> +
> +	do_gettimeofday(&tv);
> +	if (!*age) {
> +		*age = timeval_to_sec(tv);
> +		NLA_PUT_U32(skb, RTA_AGE, *age);

Why don't you take the timestamp at the time of allocating the alias?
This time-since-first-dump is very confusing.

> +	} else {
> +		NLA_PUT_U32(skb, RTA_AGE, timeval_to_sec(tv) - *age);
> +	}
>  #ifdef CONFIG_IP_ROUTE_MULTIPATH
>  	if (fi->fib_nhs > 1) {
>  		struct rtnexthop *rtnh;
Re: [PATCH 3/4 - rev 2] Initialize and populate age field
* Varun Chandramohan [EMAIL PROTECTED] 2007-08-21 16:52
> I know its a bit confusing but let me explain the reason. In my first
> version patch i used fn_hash_insert() (place where alias is
> created)as place to insert my current time in the age field. This will
> eventually call fib_dump_info() for inserting the age filed attribute
> into the skb. Now in both places i have to call do_gettimeofday(). Its
> obvious that i need it in fn_hash_insert(), its also need in
> fib_dump_info() as it is the same function called for retrieving and
> dumping the age value to the userspace. So as you are aware that
> before we dump it to userspace we need to subtract the value with
> current time i need to call do_gettimeofday() twice. To avoid this i
> did as above.

At least put a comment there, it's far from obvious.
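The alternative suggested in this thread amounts to stamping the creation time once, when the alias is allocated, and deriving the age on every dump, so that a dump never has to write state. A user-space sketch with hypothetical names (demo_alias, demo_alias_init, demo_alias_age); the current time is passed in by the caller (for example from time(NULL)), whereas the patch under review used do_gettimeofday(). The real alias layout is not shown in the thread.

```c
#include <time.h>

/* Hypothetical route alias carrying only its creation timestamp. */
struct demo_alias {
	time_t created;		/* seconds, stamped once at allocation */
};

/* Stamp the alias when it is created (the "allocation" step). */
static void demo_alias_init(struct demo_alias *fa, time_t now)
{
	fa->created = now;
}

/* Age reported at dump time: elapsed seconds since creation. The alias
 * is not modified, so repeated dumps agree with each other. */
static unsigned long demo_alias_age(const struct demo_alias *fa, time_t now)
{
	return now > fa->created ? (unsigned long)(now - fa->created) : 0;
}
```

This keeps one clock read per event (allocation, then each dump) and avoids the time-since-first-dump semantics the reviewer found confusing.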
[NET]: Don't do netpoll on per cpu backlog napi struct
The per-cpu backlog napi struct can't do netpoll and has the dev member set to NULL. Fixes an oops on boot when netpoll is enabled.

Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: net-2.6.24/include/linux/netpoll.h
===
--- net-2.6.24.orig/include/linux/netpoll.h	2007-08-22 01:02:14.0 +0200
+++ net-2.6.24/include/linux/netpoll.h	2007-08-22 01:02:30.0 +0200
@@ -75,7 +75,7 @@ static inline void *netpoll_poll_lock(st
 	struct net_device *dev = napi->dev;
 	rcu_read_lock(); /* deal with race on ->npinfo */
-	if (dev->npinfo) {
+	if (dev && dev->npinfo) {
 		spin_lock(&napi->poll_lock);
 		napi->poll_owner = smp_processor_id();
 		return napi;
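The fix checks the device pointer before dereferencing it. Below is a userspace sketch of that condition. `struct fake_netdev`, `struct fake_napi` and `napi_wants_netpoll_lock()` are hypothetical simplifications, not the kernel structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified models of net_device and napi_struct. */
struct fake_netdev {
	void *npinfo;			/* non-NULL when netpoll is attached */
};

struct fake_napi {
	struct fake_netdev *dev;	/* NULL for the per-cpu backlog napi */
};

/* Mirrors the patched test: short-circuit on a NULL dev before ->npinfo. */
static bool napi_wants_netpoll_lock(const struct fake_napi *napi)
{
	const struct fake_netdev *dev = napi->dev;

	return dev != NULL && dev->npinfo != NULL;
}
```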
Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.
* Felix Marti [EMAIL PROTECTED] 2007-08-20 12:02
> These graphic adapters provide a wealth of features that you can take advantage of to bring these amazing graphics to life. General purpose CPUs cannot keep up. Chelsio offload devices do the same thing in the realm of networking. Will there be things you can't do? Probably yes, but as I said, there are lots of knobs to turn (and the latest and greatest feature that gets hyped up might not always be the best thing since sliced bread anyway; what happened to BIC love? ;)

GPUs have almost no influence on system security; the network stack, OTOH, is probably the most vulnerable part of an operating system. Even if all vendors implemented all the features collected over the last years properly, which seems unlikely, having such an essential and critical part depend on the vendor of my network card, without even being able to verify it properly, is truly frightening.
Re: [GENETLINK]: Question: global lock (genl_mutex) possible refinement?
* Richard MUSIL [EMAIL PROTECTED] 2007-07-24 13:09
> Thomas Graf wrote:
> > Please provide a new overall patch which is not based on your initial patch so I can review your idea properly.
>
> Here it goes (merging two previous patches). I have diffed against v2.6.22, which I am currently using as my base:

Sorry for taking so long.

> @@ -150,9 +176,9 @@ int genl_register_ops(struct genl_family *family, struct genl_ops *ops)
>  	if (ops->policy)
>  		ops->flags |= GENL_CMD_CAP_HASPOL;
> -	genl_lock();
> +	genl_fam_lock(family);
>  	list_add_tail(&ops->ops_list, &family->ops_list);
> -	genl_unlock();
> +	genl_fam_unlock(family);

For registering operations it is sufficient to acquire just the family lock; the family itself can't disappear while it is held.

> @@ -216,8 +242,9 @@ int genl_register_family(struct genl_family *family)
>  		goto errout;
>  	INIT_LIST_HEAD(&family->ops_list);
> +	mutex_init(&family->lock);
> -	genl_lock();
> +	genl_fam_lock(family);
>  	if (genl_family_find_byname(family->name)) {
>  		err = -EEXIST;
> @@ -251,14 +278,14 @@ int genl_register_family(struct genl_family *family)
>  		family->attrbuf = NULL;
>  	list_add_tail(&family->family_list, genl_family_chain(family->id));
> -	genl_unlock();
> +	genl_fam_unlock(family);

This looks good.

> @@ -303,38 +332,57 @@ static int genl_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
>  	struct genlmsghdr *hdr = nlmsg_data(nlh);
>  	int hdrlen, err;
> +	genl_fam_lock(NULL);
>  	family = genl_family_find_byid(nlh->nlmsg_type);
> -	if (family == NULL)
> +	if (family == NULL) {
> +		genl_fam_unlock(NULL);
>  		return -ENOENT;
> +	}
> +
> +	/* get particular family lock, but release global family lock
> +	 * so registering operations for other families are possible */
> +	genl_onefam_lock(family);
> +	genl_fam_unlock(NULL);

I don't like having two locks for something as trivial as this. Basically the only reason the global lock is required here is to protect against family removal, which can otherwise be avoided by using RCU list operations.
Therefore, I'd propose the following lock semantics: use a dedicated global mutex to protect writes to the family list, and make the reading side lockless using RCU for looking up the family upon message processing. Use a per-family lock to protect writes to the operations list and to serialize message processing with unregister operations.
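A minimal userspace sketch of the proposed lock split, assuming POSIX threads: one global mutex guards writes to the family list, and each family has its own mutex for its operations and for message processing. All names are hypothetical. Two simplifications matter. Portable C has no RCU, so the lookup briefly takes the list mutex where the proposal would use a lockless RCU read. The sketch also has no unregister path, which a real version needs and which RCU is meant to make safe.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_FAMILIES 8
#define MAX_OPS 4

/* Hypothetical, simplified family; not the kernel's struct genl_family. */
struct family {
	int id;
	pthread_mutex_t lock;	/* per-family: ops list + message processing */
	int ops[MAX_OPS];	/* registered command numbers */
	int nops;
};

/* Global mutex: serializes writers of the family list only. */
static pthread_mutex_t family_list_lock = PTHREAD_MUTEX_INITIALIZER;
static struct family *families[MAX_FAMILIES];

static int register_family(struct family *f, int id)
{
	int i, err = -ENOSPC;

	f->id = id;
	f->nops = 0;
	pthread_mutex_init(&f->lock, NULL);

	pthread_mutex_lock(&family_list_lock);
	for (i = 0; i < MAX_FAMILIES; i++) {
		if (families[i] == NULL) {
			families[i] = f;
			err = 0;
			break;
		}
	}
	pthread_mutex_unlock(&family_list_lock);
	return err;
}

/* Adding an operation needs only the family's own lock. */
static int register_op(struct family *f, int cmd)
{
	int err = -ENOSPC;

	pthread_mutex_lock(&f->lock);
	if (f->nops < MAX_OPS) {
		f->ops[f->nops++] = cmd;
		err = 0;
	}
	pthread_mutex_unlock(&f->lock);
	return err;
}

/*
 * Lookup. The proposal makes this path lockless with RCU; portable C has
 * no RCU, so this sketch briefly takes the list mutex instead.
 */
static struct family *find_family(int id)
{
	struct family *found = NULL;
	int i;

	pthread_mutex_lock(&family_list_lock);
	for (i = 0; i < MAX_FAMILIES; i++) {
		if (families[i] != NULL && families[i]->id == id) {
			found = families[i];
			break;
		}
	}
	pthread_mutex_unlock(&family_list_lock);
	return found;
}

/* Message processing holds only the family lock, never the global one. */
static int process_msg(int family_id, int cmd)
{
	struct family *f = find_family(family_id);
	int i, err = -EOPNOTSUPP;

	if (f == NULL)
		return -ENOENT;

	pthread_mutex_lock(&f->lock);
	for (i = 0; i < f->nops; i++) {
		if (f->ops[i] == cmd) {
			err = 0;
			break;
		}
	}
	pthread_mutex_unlock(&f->lock);
	return err;
}
```

Because processing never holds `family_list_lock` while it works, the global mutex stays available to writers registering other families. Build with `-pthread`.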
Re: Potential u32 classifier bug.
* Waskiewicz Jr, Peter P [EMAIL PROTECTED] 2007-08-09 18:07
> My big question is: has anyone recently used the 802_3 protocol in tc with u32 and actually gotten it to work? I can't see how the u32_classify() code can look at the MAC header, since it uses the network header accessor to start looking. I think this is an issue with the classification code, but I'm checking whether I'm doing something stupid before I really start digging into this mess.

There is this very horrible way of using the u32 classifier with a negative offset to look into the ethernet header. You might want to look into the cmp ematch, which can be attached to almost any classifier. It allows basing offsets on any layer, thus making ethernet header filtering trivial.
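The sketch below illustrates the difference between the two approaches. A field can be read relative to the link-layer start, which is the idea behind basing cmp offsets on a layer, or relative to the network header, where u32 starts and so needs a negative offset. `struct pkt`, `enum layer` and `read_be16()` are hypothetical and are not the kernel ematch code. The sample assumes an untagged Ethernet frame with a 14-byte header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layer selector, loosely modelled on basing offsets on a layer. */
enum layer { LAYER_LINK, LAYER_NETWORK };

struct pkt {
	const uint8_t *data;	/* starts at the link-layer header */
	size_t len;
	size_t network_off;	/* where the network header starts */
};

/* Read a big-endian 16-bit value 'off' bytes from the chosen layer's start. */
static int read_be16(const struct pkt *p, enum layer l, long off, uint16_t *out)
{
	long base = (l == LAYER_LINK) ? 0 : (long)p->network_off;
	long pos = base + off;

	if (pos < 0 || (size_t)pos + 2 > p->len)
		return -1;	/* out of bounds */
	*out = (uint16_t)((p->data[pos] << 8) | p->data[pos + 1]);
	return 0;
}
```

For an untagged frame the EtherType is at link offset 12, which is the same as network offset -2. The link-relative form stays correct without needing to know where the network header begins.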
Re: [GENETLINK] some thoughts on the usage
* Richard MUSIL [EMAIL PROTECTED] 2007-08-10 10:45
> I have noticed that although the ops for each family are the same (each device is functionally the same), I cannot use the same genl_ops struct for registration, because it uses an internal member to link into a list. Therefore it is necessary to allocate new genl_ops for each device and pass them to registration. But I cannot officially use this list to track those genl_ops (so I can properly destroy them later), because there is no interface for it. So I need to redo the management of the structures on my own.

The intended usage of the interface in your example would be to register only one genetlink family, say "tpm", register one set of operations, and then have an attribute in every message which specifies which TPM device to use. This helps keep the total number of genetlink families down.

> The second inconvenience is that for each family I register, I also register basically the same ops (basically meaning that the definitions and the doit and dumpit handlers are the same, though the structures are at different addresses for the reasons described above). When the handler receives a message it needs to associate the message with the actual device it is handling. This could be done through a family lookup (using nlmsghdr::nlmsg_type), but I wondered if it would make sense to extend genl_family with a user custom data pointer and then pass this custom data (or a genl_family reference) to each handler (for example inside genl_info). It is already parsed by the genetlink layer, so it should not slow things down.

That's not a bad idea, although I think we should try to keep the generic netlink part as simple as possible. There is a family-specific header, referred to as the user header in genl_info, which is basically what you're looking for with the custom header. I believe making generic netlink aware of anything beyond the family id and operation id only complicates things.
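Here is a sketch of the suggested design: a single family, one shared handler, and the target device selected by an attribute carried in each message. `struct tpm_dev`, `struct tpm_msg`, `tpm_devs[]` and `tpm_cmd_doit()` are hypothetical. Real code would parse the attribute from the netlink message rather than receive a pre-filled struct.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_TPM 4

/* Hypothetical per-device state; not an existing driver structure. */
struct tpm_dev {
	int index;
	int requests;
};

/* Device table indexed by the value of the device attribute. */
static struct tpm_dev *tpm_devs[MAX_TPM];

/* Hypothetical parsed message: the device index arrives as an attribute. */
struct tpm_msg {
	int has_dev_attr;
	int dev_index;
};

/* One doit-style handler shared by every device of the single family. */
static int tpm_cmd_doit(const struct tpm_msg *msg)
{
	struct tpm_dev *dev;

	if (!msg->has_dev_attr)
		return -EINVAL;		/* the attribute is mandatory */
	if (msg->dev_index < 0 || msg->dev_index >= MAX_TPM)
		return -ENODEV;
	dev = tpm_devs[msg->dev_index];
	if (dev == NULL)
		return -ENODEV;

	dev->requests++;		/* stand-in for the real work */
	return 0;
}
```

With this layout, one static `genl_ops` array can be registered once, and there are no per-device copies to track or free.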
Re: Potential u32 classifier bug.
* Waskiewicz Jr, Peter P [EMAIL PROTECTED] 2007-08-15 11:02
> > There is this very horrible way of using the u32 classifier with a negative offset to look into the ethernet header.
>
> Based on this, it sounds like u32 using protocol 802_3 is broken?

You might be expecting too much from u32. The protocol given to u32 is just a filter; it doesn't imply anything beyond that. u32 has its uses the way it is, which is why we've added an ematch rather than extending u32 itself.
[NEIGH]: Combine neighbour cleanup and release
Introduces neigh_cleanup_and_release() to be used after a neighbour has been removed from its neighbour table. Serves as preparation to add event notifications.

Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: net-2.6/net/core/neighbour.c
===
--- net-2.6.orig/net/core/neighbour.c	2007-07-22 11:41:46.0 +0200
+++ net-2.6/net/core/neighbour.c	2007-07-22 11:42:02.0 +0200
@@ -104,6 +104,14 @@ static int neigh_blackhole(struct sk_buf
 	return -ENETDOWN;
 }
 
+static void neigh_cleanup_and_release(struct neighbour *neigh)
+{
+	if (neigh->parms->neigh_cleanup)
+		neigh->parms->neigh_cleanup(neigh);
+
+	neigh_release(neigh);
+}
+
 /*
  * It is random distribution in the interval (1/2)*base...(3/2)*base.
  * It corresponds to default IPv6 settings and is not overridable,
@@ -140,9 +148,7 @@ static int neigh_forced_gc(struct neigh_
 				n->dead = 1;
 				shrunk	= 1;
 				write_unlock(&n->lock);
-				if (n->parms->neigh_cleanup)
-					n->parms->neigh_cleanup(n);
-				neigh_release(n);
+				neigh_cleanup_and_release(n);
 				continue;
 			}
 			write_unlock(&n->lock);
@@ -213,9 +219,7 @@ static void neigh_flush_dev(struct neigh
 				NEIGH_PRINTK2("neigh %p is stray.\n", n);
 			}
 			write_unlock(&n->lock);
-			if (n->parms->neigh_cleanup)
-				n->parms->neigh_cleanup(n);
-			neigh_release(n);
+			neigh_cleanup_and_release(n);
 		}
 	}
 }
@@ -676,9 +680,7 @@ static void neigh_periodic_timer(unsigne
 			*np = n->next;
 			n->dead = 1;
 			write_unlock(&n->lock);
-			if (n->parms->neigh_cleanup)
-				n->parms->neigh_cleanup(n);
-			neigh_release(n);
+			neigh_cleanup_and_release(n);
 			continue;
 		}
 		write_unlock(&n->lock);
@@ -2094,11 +2096,8 @@ void __neigh_for_each_release(struct nei
 			} else
 				np = &n->next;
 			write_unlock(&n->lock);
-			if (release) {
-				if (n->parms->neigh_cleanup)
-					n->parms->neigh_cleanup(n);
-				neigh_release(n);
-			}
+			if (release)
+				neigh_cleanup_and_release(n);
 		}
 	}
 }
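A userspace model of the helper this patch factors out: run the optional per-parms cleanup hook, then drop the reference. `struct neigh`, `struct neigh_parms`, `neigh_put()` and `cleanup_and_release()` are simplified, hypothetical stand-ins. The real `neigh_release()` frees the entry when the last reference goes away, and this sketch only decrements a counter.

```c
#include <assert.h>
#include <stddef.h>

struct neigh;

/* Hypothetical, simplified stand-ins for neigh_parms and neighbour. */
struct neigh_parms {
	void (*neigh_cleanup)(struct neigh *n);	/* optional hook */
};

struct neigh {
	struct neigh_parms *parms;
	int refcnt;
	int cleaned;
};

/* Stand-in for neigh_release(): the kernel frees on the last put. */
static void neigh_put(struct neigh *n)
{
	n->refcnt--;
}

/* Single exit path for "removed from the table": hook first, then put. */
static void cleanup_and_release(struct neigh *n)
{
	if (n->parms->neigh_cleanup)
		n->parms->neigh_cleanup(n);
	neigh_put(n);
}

/* Example hook used to observe that cleanup ran. */
static void mark_cleaned(struct neigh *n)
{
	n->cleaned = 1;
}
```

Having one helper means a later change, such as sending a deletion notification, only needs to touch one place instead of four.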
[NEIGH]: Netlink notifications
Currently neighbour event notifications are limited to update notifications and are only sent if the ARP daemon is enabled. This patch extends the existing notification code by also reporting neighbours being removed, whether due to gc or administratively, and removes the dependency on the ARP daemon. This makes it possible to keep track of neighbour states without periodically fetching the complete neighbour table.

Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: net-2.6/net/core/neighbour.c
===
--- net-2.6.orig/net/core/neighbour.c	2007-07-22 11:42:02.0 +0200
+++ net-2.6/net/core/neighbour.c	2007-07-22 11:49:15.0 +0200
@@ -54,9 +54,8 @@
 #define PNEIGH_HASHMASK		0xF
 
 static void neigh_timer_handler(unsigned long arg);
-#ifdef CONFIG_ARPD
-static void neigh_app_notify(struct neighbour *n);
-#endif
+static void __neigh_notify(struct neighbour *n, int type, int flags);
+static void neigh_update_notify(struct neighbour *neigh);
 static int pneigh_ifdown(struct neigh_table *tbl, struct net_device *dev);
 void neigh_changeaddr(struct neigh_table *tbl, struct net_device *dev);
 
@@ -109,6 +108,7 @@ static void neigh_cleanup_and_release(st
 	if (neigh->parms->neigh_cleanup)
 		neigh->parms->neigh_cleanup(neigh);
 
+	__neigh_notify(neigh, RTM_DELNEIGH, 0);
 	neigh_release(neigh);
 }
 
@@ -829,13 +829,10 @@ static void neigh_timer_handler(unsigned
 out:
 		write_unlock(&neigh->lock);
 	}
+
 	if (notify)
-		call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, neigh);
+		neigh_update_notify(neigh);
 
-#ifdef CONFIG_ARPD
-	if (notify && neigh->parms->app_probes)
-		neigh_app_notify(neigh);
-#endif
 	neigh_release(neigh);
 }
 
@@ -1064,11 +1061,8 @@ out:
 	write_unlock_bh(&neigh->lock);
 
 	if (notify)
-		call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, neigh);
-#ifdef CONFIG_ARPD
-	if (notify && neigh->parms->app_probes)
-		neigh_app_notify(neigh);
-#endif
+		neigh_update_notify(neigh);
+
 	return err;
 }
 
@@ -2001,6 +1995,11 @@ nla_put_failure:
 	return -EMSGSIZE;
 }
 
+static void neigh_update_notify(struct neighbour *neigh)
+{
+	call_netevent_notifiers(NETEVENT_NEIGH_UPDATE, neigh);
+	__neigh_notify(neigh, RTM_NEWNEIGH, 0);
+}
 
 static int neigh_dump_table(struct neigh_table *tbl, struct sk_buff *skb,
 			    struct netlink_callback *cb)
@@ -2420,7 +2419,6 @@ static const struct file_operations neig
 
 #endif /* CONFIG_PROC_FS */
 
-#ifdef CONFIG_ARPD
 static inline size_t neigh_nlmsg_size(void)
 {
 	return NLMSG_ALIGN(sizeof(struct ndmsg))
@@ -2452,16 +2450,11 @@ errout:
 	rtnl_set_sk_err(RTNLGRP_NEIGH, err);
 }
 
+#ifdef CONFIG_ARPD
 void neigh_app_ns(struct neighbour *n)
 {
 	__neigh_notify(n, RTM_GETNEIGH, NLM_F_REQUEST);
 }
-
-static void neigh_app_notify(struct neighbour *n)
-{
-	__neigh_notify(n, RTM_NEWNEIGH, 0);
-}
-
 #endif /* CONFIG_ARPD */
 
 #ifdef CONFIG_SYSCTL
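The event flow after this patch can be modelled in a few lines. An update fires both the in-kernel netevent chain and a netlink RTM_NEWNEIGH message. A removal emits an RTM_DELNEIGH message. Neither depends on CONFIG_ARPD. The event log and function names below are hypothetical and only record what would be sent.

```c
#include <assert.h>

/* Hypothetical event log standing in for the two notification channels. */
enum ev { EV_NETEVENT_UPDATE, EV_NL_NEWNEIGH, EV_NL_DELNEIGH };

#define EV_MAX 8
static enum ev ev_log[EV_MAX];
static int ev_count;

static void emit(enum ev e)
{
	if (ev_count < EV_MAX)
		ev_log[ev_count++] = e;
}

/* Update path: notifier chain first, then the netlink NEWNEIGH message. */
static void update_notify(void)
{
	emit(EV_NETEVENT_UPDATE);
	emit(EV_NL_NEWNEIGH);
}

/* Removal path (cleanup and release): a netlink DELNEIGH message. */
static void cleanup_and_release_notify(void)
{
	emit(EV_NL_DELNEIGH);
}
```

A listener subscribed to the neighbour multicast group would then see both additions and removals, which is what allows tracking state without periodic full dumps.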
[RTNETLINK]: Fix warning for !CONFIG_KMOD
The replay label is unused otherwise.

Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: net-2.6/net/core/rtnetlink.c
===
--- net-2.6.orig/net/core/rtnetlink.c	2007-07-22 11:41:46.0 +0200
+++ net-2.6/net/core/rtnetlink.c	2007-07-22 12:04:27.0 +0200
@@ -952,7 +952,9 @@ static int rtnl_newlink(struct sk_buff *
 	struct nlattr *linkinfo[IFLA_INFO_MAX+1];
 	int err;
 
+#ifdef CONFIG_KMOD
 replay:
+#endif
 	err = nlmsg_parse(nlh, sizeof(*ifm), tb, IFLA_MAX, ifla_policy);
 	if (err < 0)
 		return err;
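The fix guards the label with the same condition as its only `goto`, so builds without that option do not see an unused label (for example, GCC's `-Wunused-label`). Below is a self-contained sketch of the pattern. `CONFIG_RETRY_ON_MISSING_MODULE` and `lookup()` are hypothetical, and `lookup()` fails once, as if a module had to be loaded first.

```c
#include <assert.h>

/* Hypothetical config switch; define it to enable the retry path. */
/* #define CONFIG_RETRY_ON_MISSING_MODULE */

static int attempts;

/* Hypothetical lookup that fails on the first call only. */
static int lookup(void)
{
	return ++attempts > 1 ? 0 : -1;
}

static int do_request(void)
{
	int err;

#ifdef CONFIG_RETRY_ON_MISSING_MODULE
replay:		/* only exists when the goto below exists */
#endif
	err = lookup();
#ifdef CONFIG_RETRY_ON_MISSING_MODULE
	if (err < 0)
		goto replay;
#endif
	return err;
}
```

With the switch left undefined, as above, the first call returns the failure directly and no label is emitted.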
Re: [GENETLINK]: Question: global lock (genl_mutex) possible refinement?
* Richard MUSIL [EMAIL PROTECTED] 2007-07-23 18:45
> I have been giving it a second thought and came up with something more complex. The idea is to have locking granularity at the level of individual families.

I agree in general; it would make for a better solution. However, your initial patch allows operations and families to be unregistered while messages of the same family are being processed, which must not be allowed.

Please provide a new overall patch which is not based on your initial patch so I can review your idea properly.
[GENETLINK]: Correctly report errors while registering a multicast group
Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: net-2.6/net/netlink/genetlink.c
===
--- net-2.6.orig/net/netlink/genetlink.c	2007-07-23 21:54:35.0 +0200
+++ net-2.6/net/netlink/genetlink.c	2007-07-23 21:54:54.0 +0200
@@ -196,7 +196,7 @@ int genl_register_mc_group(struct genl_f
 	genl_ctrl_event(CTRL_CMD_NEWMCAST_GRP, grp);
  out:
 	genl_unlock();
-	return 0;
+	return err;
 }
 EXPORT_SYMBOL(genl_register_mc_group);
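This fix follows the usual single-exit pattern: every failure sets `err` and jumps to `out`, where the lock is dropped. The function must then return `err` and not a constant. Here is a userspace sketch of that pattern. `register_group()`, `fake_lock()` and `fake_unlock()` are hypothetical, and the lock is only a depth counter so that balance can be checked.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical lock stand-ins; the counter lets a test check balance. */
static int lock_depth;
static void fake_lock(void) { lock_depth++; }
static void fake_unlock(void) { lock_depth--; }

/* Single unlock/exit path; every error is funnelled through 'err'. */
static int register_group(int free_ids)
{
	int err = 0;

	fake_lock();
	if (free_ids == 0) {
		err = -ENOMEM;
		goto out;
	}
	/* ... allocate an id and announce the group here ... */
out:
	fake_unlock();
	return err;	/* "return 0;" here would hide the failure */
}
```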
[GENETLINK]: Fix adjustment of number of multicast groups
The current calculation of the maximum number of genetlink multicast groups seems odd, fix it.

Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: net-2.6/net/netlink/genetlink.c
===
--- net-2.6.orig/net/netlink/genetlink.c	2007-07-23 22:03:02.0 +0200
+++ net-2.6/net/netlink/genetlink.c	2007-07-23 22:05:12.0 +0200
@@ -184,7 +184,7 @@ int genl_register_mc_group(struct genl_f
 	}
 
 	err = netlink_change_ngroups(genl_sock,
-				     sizeof(unsigned long) * NETLINK_GENERIC);
+				     mc_groups_longs * BITS_PER_LONG);
 	if (err)
 		goto out;
 
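A small worked example of why the old expression was wrong. `NETLINK_GENERIC` is the protocol number 16, so `sizeof(unsigned long) * NETLINK_GENERIC` gives 128 on a 64-bit build no matter how many longs the group bitmap actually has. The number of usable groups is the bitmap size in bits. The helper below computes that, with `BITS_PER_LONG` written portably as `sizeof(unsigned long) * CHAR_BIT`. The function name is hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Bits available in a bitmap of 'longs' unsigned longs (longs * BITS_PER_LONG). */
static size_t groups_capacity(size_t longs)
{
	return longs * sizeof(unsigned long) * CHAR_BIT;
}
```

On a typical LP64 system, `groups_capacity(1)` is 64 and `groups_capacity(2)` is 128. The capacity now grows with the bitmap instead of being fixed.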
[GENETLINK]: Fix race in genl_unregister_mc_groups()
family->mcast_groups is protected by genl_lock, so it must be held while accessing the list in genl_unregister_mc_groups(). Requires adding a non-locking variant of genl_unregister_mc_group().

Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: net-2.6/net/netlink/genetlink.c
===
--- net-2.6.orig/net/netlink/genetlink.c	2007-07-23 22:08:04.0 +0200
+++ net-2.6/net/netlink/genetlink.c	2007-07-23 22:09:08.0 +0200
@@ -200,6 +200,18 @@ int genl_register_mc_group(struct genl_f
 }
 EXPORT_SYMBOL(genl_register_mc_group);
 
+static void __genl_unregister_mc_group(struct genl_family *family,
+				       struct genl_multicast_group *grp)
+{
+	BUG_ON(grp->family != family);
+	netlink_clear_multicast_users(genl_sock, grp->id);
+	clear_bit(grp->id, mc_groups);
+	list_del(&grp->list);
+	genl_ctrl_event(CTRL_CMD_DELMCAST_GRP, grp);
+	grp->id = 0;
+	grp->family = NULL;
+}
+
 /**
  * genl_unregister_mc_group - unregister a multicast group
  *
@@ -217,14 +229,8 @@ EXPORT_SYMBOL(genl_register_mc_group);
 void genl_unregister_mc_group(struct genl_family *family,
 			      struct genl_multicast_group *grp)
 {
-	BUG_ON(grp->family != family);
 	genl_lock();
-	netlink_clear_multicast_users(genl_sock, grp->id);
-	clear_bit(grp->id, mc_groups);
-	list_del(&grp->list);
-	genl_ctrl_event(CTRL_CMD_DELMCAST_GRP, grp);
-	grp->id = 0;
-	grp->family = NULL;
+	genl_unregister_mc_group(family, grp);
 	genl_unlock();
 }
 
@@ -232,8 +238,10 @@ static void genl_unregister_mc_groups(st
 {
 	struct genl_multicast_group *grp, *tmp;
 
+	genl_lock();
 	list_for_each_entry_safe(grp, tmp, &family->mcast_groups, list)
-		genl_unregister_mc_group(family, grp);
+		__genl_unregister_mc_group(family, grp);
+	genl_unlock();
 }
 
 /**
Re: [GENETLINK]: Fix race in genl_unregister_mc_groups()
* Brian Haley [EMAIL PROTECTED] 2007-07-24 12:14
> Thomas Graf wrote:
> > @@ -217,14 +229,8 @@ EXPORT_SYMBOL(genl_register_mc_group);
> >  void genl_unregister_mc_group(struct genl_family *family,
> >  			      struct genl_multicast_group *grp)
> >  {
> > -	BUG_ON(grp->family != family);
> >  	genl_lock();
> > -	netlink_clear_multicast_users(genl_sock, grp->id);
> > -	clear_bit(grp->id, mc_groups);
> > -	list_del(&grp->list);
> > -	genl_ctrl_event(CTRL_CMD_DELMCAST_GRP, grp);
> > -	grp->id = 0;
> > -	grp->family = NULL;
> > +	genl_unregister_mc_group(family, grp);
> >  	genl_unlock();
> >  }
>
> Shouldn't this be __genl_unregister_mc_group(family, grp)?

Yes, thanks for noticing.
[REPOST][GENETLINK]: Fix race in genl_unregister_mc_groups()
family->mcast_groups is protected by genl_lock, so it must be held while accessing the list in genl_unregister_mc_groups(). Requires adding a non-locking variant of genl_unregister_mc_group().

Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: net-2.6/net/netlink/genetlink.c
===
--- net-2.6.orig/net/netlink/genetlink.c	2007-07-23 22:08:04.0 +0200
+++ net-2.6/net/netlink/genetlink.c	2007-07-24 23:51:11.0 +0200
@@ -200,6 +200,18 @@ int genl_register_mc_group(struct genl_f
 }
 EXPORT_SYMBOL(genl_register_mc_group);
 
+static void __genl_unregister_mc_group(struct genl_family *family,
+				       struct genl_multicast_group *grp)
+{
+	BUG_ON(grp->family != family);
+	netlink_clear_multicast_users(genl_sock, grp->id);
+	clear_bit(grp->id, mc_groups);
+	list_del(&grp->list);
+	genl_ctrl_event(CTRL_CMD_DELMCAST_GRP, grp);
+	grp->id = 0;
+	grp->family = NULL;
+}
+
 /**
  * genl_unregister_mc_group - unregister a multicast group
  *
@@ -217,14 +229,8 @@ EXPORT_SYMBOL(genl_register_mc_group);
 void genl_unregister_mc_group(struct genl_family *family,
 			      struct genl_multicast_group *grp)
 {
-	BUG_ON(grp->family != family);
 	genl_lock();
-	netlink_clear_multicast_users(genl_sock, grp->id);
-	clear_bit(grp->id, mc_groups);
-	list_del(&grp->list);
-	genl_ctrl_event(CTRL_CMD_DELMCAST_GRP, grp);
-	grp->id = 0;
-	grp->family = NULL;
+	__genl_unregister_mc_group(family, grp);
 	genl_unlock();
 }
 
@@ -232,8 +238,10 @@ static void genl_unregister_mc_groups(st
 {
 	struct genl_multicast_group *grp, *tmp;
 
+	genl_lock();
 	list_for_each_entry_safe(grp, tmp, &family->mcast_groups, list)
-		genl_unregister_mc_group(family, grp);
+		__genl_unregister_mc_group(family, grp);
+	genl_unlock();
 }
 
 /**
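The patch uses the common kernel convention of a `__`-prefixed worker that expects the caller to hold the lock, plus a public wrapper that takes the lock. A bulk operation then takes the lock once and calls the worker in a loop. If it called the locking wrapper inside the loop, it would self-deadlock on a non-recursive mutex. Below is a userspace sketch assuming POSIX threads. The singly linked list and all names are hypothetical, and the kernel code uses `struct list_head`.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical group list; the kernel uses struct list_head instead. */
struct group {
	struct group *next;
	int id;
};

static pthread_mutex_t groups_lock = PTHREAD_MUTEX_INITIALIZER;
static struct group *groups;

/* Caller must hold groups_lock: unlink the group *pp points at, reset it. */
static void __unregister_group(struct group **pp)
{
	struct group *g = *pp;

	*pp = g->next;
	g->next = NULL;
	g->id = 0;
}

/* Public wrapper for a single group: takes the lock itself. */
static void unregister_group(struct group *g)
{
	struct group **pp;

	pthread_mutex_lock(&groups_lock);
	for (pp = &groups; *pp; pp = &(*pp)->next) {
		if (*pp == g) {
			__unregister_group(pp);
			break;
		}
	}
	pthread_mutex_unlock(&groups_lock);
}

/* Bulk removal: lock once, then use only the non-locking worker. */
static void unregister_all_groups(void)
{
	pthread_mutex_lock(&groups_lock);
	while (groups)
		__unregister_group(&groups);
	pthread_mutex_unlock(&groups_lock);
}
```

Build with `-pthread`.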
Re: [GENETLINK]: Question: global lock (genl_mutex) possible refinement?
* Richard MUSIL [EMAIL PROTECTED] 2007-07-20 18:15
> Patrick McHardy wrote:
> > Export the lock/unlock/.. functions. You'll also need a new version similar to __rtnl_unlock.
>
> Patrick, you might feel I am not reading your lines, but in fact I do. The problem is that I do not feel competent to follow or propose such changes. So what I propose here (in the included patch) is the least-change scenario I can think of and one I feel safe with. If other changes are required, such as exporting the lock from the genetlink module as you suggested, I hope the genetlink authors will comment on that. Currently I do not see any reason for it, but this could be due to my limited knowledge.

Actually, there is no reason not to use separate locks for message serialization and for protecting the list of registered families. There is only one lock simply because I never thought anybody would register a new genetlink family while processing a message.

Alternatively, you could also postpone the registration of the new genetlink family to a workqueue.
Re: [PATCH] fix race in AF_UNIX
* Miklos Szeredi [EMAIL PROTECTED] 2007-06-18 11:44
> Garbage collection only ever happens if the app is sending AF_UNIX sockets over AF_UNIX sockets. Which is a rather rare case. And which is basically why this bug went unnoticed for so long.
>
> So my second patch only affects the performance of _exactly_ those apps which might well be bitten by the bug itself.

That's not entirely true. It affects all applications using AF_UNIX sockets while file descriptors are being transferred. I agree that the performance impact is not severe on most systems, but if file descriptors are being transferred continuously by just a single application it can become rather severe.
Re: [PATCH] fix race in AF_UNIX
* Thomas Graf [EMAIL PROTECTED] 2007-06-18 12:32
> * Miklos Szeredi [EMAIL PROTECTED] 2007-06-18 11:44
> > Garbage collection only ever happens if the app is sending AF_UNIX sockets over AF_UNIX sockets. Which is a rather rare case. And which is basically why this bug went unnoticed for so long.
> >
> > So my second patch only affects the performance of _exactly_ those apps which might well be bitten by the bug itself.
>
> That's not entirely true. It affects all applications using AF_UNIX sockets while file descriptors are being transferred. I agree that the performance impact is not severe on most systems, but if file descriptors are being transferred continuously by just a single application it can become rather severe.

Also think of the scenario where an application, deliberately or not, begins a file descriptor transfer using sendmsg() and the receiving side never invokes recvmsg() to decrement the inflight counters again. Every unix socket that gets closed would then result in a gc call that locks all sockets.
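The scenario above can be reduced to a toy model. A global count of file descriptors in flight goes up on an SCM_RIGHTS send and down when the descriptor is received. Closing any unix socket triggers a collector pass while that count is non-zero. This is a hypothetical, simplified model of the condition described in the mail, not the kernel's af_unix code. It shows how one unread descriptor makes every unrelated close pay for a gc pass.

```c
#include <assert.h>

/* Hypothetical model: one global count of fds sitting in socket queues. */
static int inflight;
static int gc_runs;

/* sendmsg() carrying an fd via SCM_RIGHTS puts it in flight. */
static void send_fd(void)
{
	inflight++;
}

/* recvmsg() on the peer takes it out of flight again. */
static void recv_fd(void)
{
	if (inflight > 0)
		inflight--;
}

/* Closing any unix socket runs the collector while anything is in flight. */
static void close_unix_socket(void)
{
	if (inflight > 0)
		gc_runs++;	/* stands in for a gc pass that locks all sockets */
}
```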