[PATCH v4 60/79] include/uapi/linux/atm_zatm.h: include linux/time.h
Fixes userspace compile error:

error: field ‘real’ has incomplete type
 struct timeval real; /* real (wall-clock) time */

Signed-off-by: Mikko Rapeli
---
 include/uapi/linux/atm_zatm.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/uapi/linux/atm_zatm.h b/include/uapi/linux/atm_zatm.h
index 10f0fa2..adbaa6c 100644
--- a/include/uapi/linux/atm_zatm.h
+++ b/include/uapi/linux/atm_zatm.h
@@ -14,6 +14,7 @@
 #include
 #include
+#include <linux/time.h>
 #define ZATM_GETPOOL _IOW('a',ATMIOC_SARPRV+1,struct atmif_sioc) /* get pool statistics */
--
2.5.0
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v4 57/79] include/uapi/linux/openvswitch.h: use __u32 from linux/types.h
Fixes userspace compiler error:

error: unknown type name ‘uint32_t’

Signed-off-by: Mikko Rapeli
---
 include/uapi/linux/openvswitch.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index 32e07d8..80c39a1 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -612,8 +612,8 @@ enum ovs_hash_alg {
  * @hash_basis: basis used for computing hash.
  */
 struct ovs_action_hash {
-	uint32_t hash_alg; /* One of ovs_hash_alg. */
-	uint32_t hash_basis;
+	__u32 hash_alg; /* One of ovs_hash_alg. */
+	__u32 hash_basis;
 };

 /**
--
2.5.0
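For readers who want to see why the type swap is layout-neutral, here is a minimal userspace sketch. It assumes a Linux system with the kernel UAPI headers installed; struct example_action_hash and example_hash_size() are hypothetical stand-ins, not names from the patch.

```c
#include <assert.h>
#include <linux/types.h>	/* provides __u32 without needing <stdint.h> */

/* Hypothetical stand-in for struct ovs_action_hash: the same two fields. */
struct example_action_hash {
	__u32 hash_alg;		/* one of a set of algorithm ids */
	__u32 hash_basis;	/* basis used for computing the hash */
};

/* Two 32-bit fields: the size is what the uint32_t version would have. */
_Static_assert(sizeof(struct example_action_hash) == 8,
	       "two __u32 fields should occupy 8 bytes");

/* Hypothetical helper so the size can be checked at run time too. */
unsigned long example_hash_size(void)
{
	return (unsigned long)sizeof(struct example_action_hash);
}
```

The point of the sketch is only that a header written this way compiles on its own, without the includer having to pull in <stdint.h> first.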
RE: [PATCH] netlink: trim skb to exact size to avoid MSG_TRUNC
>-Original Message- >From: netdev-ow...@vger.kernel.org [mailto:netdev-ow...@vger.kernel.org] On >Behalf Of Thomas Graf >Sent: Wednesday, October 14, 2015 12:45 AM >To: Arad, Ronen >Cc: netdev@vger.kernel.org >Subject: Re: [PATCH] netlink: trim skb to exact size to avoid MSG_TRUNC > >On 10/13/15 at 05:52pm, Arad, Ronen wrote: >> [@Ronen] My reader as I described above is providing a larger message >> which I'm trying to properly size. I'm aware that libnl shields >> applications from the need to know and provide properly sized buffer by >> peeking or/and re-allocating a buffer. >> My issue is with iproute2 "ip link show" and "bridge vlan show" commands. > >> >> >I'm just trying to understand which exact case you are solving here. >> Allocation is always performed by alloc_size which could be >> nlk->max_recvmsg_len (only when min_dump_alloc is sufficiently small) and >> upon failure falling back to alloc_min_size. >> The trimming of the skb space is common regardless of the allocation call. >> I tried to submit the minimal patch to address the issue. If you think the >> Re-organized code is better I can re-submit a V2. > >I was about to suggest the same code change after initial discussion ;-) > >So you are fixing the case where >2x messages fit the padded skb size. [@Ronen] I'm not sure I understand this statement. I'm fixing the padding of the skb such that reader could have reasonable buffer size based on the largest netdev. It is just happened that the skb allocation was about double the size. Probably because the allocation was some kind of power of 2 and the requested size was slightly above the next lower power of 2. On a separate patch titled [PATCH net-next v3] netlink: Rightsize IFLA_AF_SPEC size calculation I'm reducing the over-estimation of the buffer size for "ip link" requests. It turned out that VLAN information space was added to unrelated dump requests since ext_filter_mask was not passed to rtnl_link_get_af_size(). 
The "rightsizing" patch also reduces the buffer size of compressed VLANs dump.
Non-compressed VLANs dump will continue to require more than the 16KiB buffer
size from somewhere around 1800 VLANs and above (based on 8 bytes per VLAN plus
other attributes consuming about 1700 bytes). Using the same logic the full
range of 4094 VLANs uncompressed would take roughly 34500 bytes. It looks like
a "safe" iproute2 statically sized buffer would have to be about 36000 bytes
or so.

>This was not clear from the commit message. I would appreciate a note
>in the commit message and updated code comment to reflect this.
>
>The fix is definitely not incorrect and the penalty for readers which
>peek first is less than I thought since nlk->max_recvmsg_len is at
>least 16K in size. Since most peekers will double buffer sizes they
>will most likely end up growing nlk->max_recvmsg_len after the first
>read.

[@Ronen] nlk->max_recvmsg_len is actually capped at 16KiB.

	/* Record the max length of recvmsg() calls for future allocations */
	nlk->max_recvmsg_len = max(nlk->max_recvmsg_len, len);
	nlk->max_recvmsg_len = min_t(size_t, nlk->max_recvmsg_len,
				     16384);

>
>However, if alloc_size is > 16K, we would have typically ended up with a
>giant skb which peeking users were able to take advantage of while
>with this fix this is no longer the case.

[@Ronen] As I noted above, peeking reader could only enjoy saving up to 16KiB.

>
>However #2, I'll see if it makes sense to look at MSG_PEEK in recvmsg
>and change nlk->max_recvmsg_len accordingly so we take advantage of
>the full skb size on sockets which perform peeking. Given that both
>reader behaviours can be preserved, I'm good with your proposed v2.

[@Ronen] I'll submit my suggested v2.
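The size estimates and the 16KiB cap discussed in this message can be checked with a small sketch. The constants are the approximate figures quoted above (8 bytes per VLAN, about 1700 bytes of other attributes); the function names are hypothetical and the capping helper only mirrors the max()/min_t() pair quoted from the kernel, in plain userspace C.

```c
#include <assert.h>
#include <stddef.h>

#define BYTES_PER_VLAN		8u	/* figure quoted in the message */
#define OTHER_ATTRS_BYTES	1700u	/* "about 1700 bytes", approximate */
#define RECVMSG_LEN_CAP		16384u	/* the 16KiB cap on max_recvmsg_len */

/* Estimated size of an uncompressed VLAN dump for one netdev. */
size_t est_dump_size(unsigned int num_vlans)
{
	return (size_t)num_vlans * BYTES_PER_VLAN + OTHER_ATTRS_BYTES;
}

/* Largest VLAN count whose estimated dump still fits in buf_size bytes. */
unsigned int max_vlans_within(size_t buf_size)
{
	if (buf_size < OTHER_ATTRS_BYTES)
		return 0;
	return (unsigned int)((buf_size - OTHER_ATTRS_BYTES) / BYTES_PER_VLAN);
}

/* Userspace mirror of the quoted logic: track the maximum, capped at 16KiB. */
size_t record_recvmsg_len(size_t current, size_t len)
{
	size_t v = current > len ? current : len;

	return v < RECVMSG_LEN_CAP ? v : RECVMSG_LEN_CAP;
}
```

With these figures, est_dump_size(4094) gives 34452 bytes (the "roughly 34500" above), and max_vlans_within(16384) gives 1835, consistent with "somewhere around 1800 VLANs".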
[Question]: iproute2 extension for supporting lightweight tunnel
Hi,

I am trying to use "lightweight tunnels" after building the Linux kernel from
source with "Lightweight & flow based encapsulation" support. Can you tell me
how to get the iproute2 extension that supports the following command from the
commit log (commit ID e69724f32e62502a6e686eae36b7aadfeea60dca)?

"ip route add 40.1.1.1/32 encap vxlan id 10 dst 50.1.1.2 dev vxlan0"

Thanks.

Best Regards
Mengke
[patch net] mlxsw: core: Fix race condition in __mlxsw_emad_transmit
From: Ido Schimmel

Under certain conditions EMAD responses can be returned from the device
even before setting trans_active. This will cause the EMAD Rx listener
to drop the EMAD response - as there are no active transactions - and
timeouts will be generated.

Fix this by setting trans_active before transmitting the EMAD skb.

Fixes: 4ec14b7634b2 ("mlxsw: Add interface to access registers and process events")
Signed-off-by: Ido Schimmel
Signed-off-by: Jiri Pirko
---
 drivers/net/ethernet/mellanox/mlxsw/core.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.c b/drivers/net/ethernet/mellanox/mlxsw/core.c
index dbcaf5d..28c19cc 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.c
@@ -374,26 +374,31 @@ static int __mlxsw_emad_transmit(struct mlxsw_core *mlxsw_core,
 	int err;
 	int ret;

+	mlxsw_core->emad.trans_active = true;
+
 	err = mlxsw_core_skb_transmit(mlxsw_core->driver_priv, skb, tx_info);
 	if (err) {
 		dev_err(mlxsw_core->bus_info->dev, "Failed to transmit EMAD (tid=%llx)\n",
 			mlxsw_core->emad.tid);
 		dev_kfree_skb(skb);
-		return err;
+		goto trans_inactive_out;
 	}

-	mlxsw_core->emad.trans_active = true;
 	ret = wait_event_timeout(mlxsw_core->emad.wait,
 				 !(mlxsw_core->emad.trans_active),
 				 msecs_to_jiffies(MLXSW_EMAD_TIMEOUT_MS));
 	if (!ret) {
 		dev_warn(mlxsw_core->bus_info->dev, "EMAD timed-out (tid=%llx)\n",
 			 mlxsw_core->emad.tid);
-		mlxsw_core->emad.trans_active = false;
-		return -EIO;
+		err = -EIO;
+		goto trans_inactive_out;
 	}

 	return 0;
+
+trans_inactive_out:
+	mlxsw_core->emad.trans_active = false;
+	return err;
 }

 static int mlxsw_emad_process_status(struct mlxsw_core *mlxsw_core,
--
1.9.3
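The ordering problem described in this commit message can be illustrated with a small single-threaded model. This is only a sketch under stated assumptions: struct emad_model and all model_* / transmit_* names are hypothetical, and the "device" is modelled as answering synchronously inside the transmit call, which stands in for a response that arrives before the flag is set.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the state touched by the EMAD transmit path. */
struct emad_model {
	bool trans_active;	/* a transaction is waiting for its response */
	int responses_seen;	/* responses accepted by the Rx listener */
	int responses_dropped;	/* responses dropped: no active transaction */
};

/* Rx listener: a response is only accepted while a transaction is active. */
void model_rx(struct emad_model *m)
{
	if (!m->trans_active) {
		m->responses_dropped++;
		return;
	}
	m->responses_seen++;
	m->trans_active = false;	/* would wake the waiter in real code */
}

/* Models a device whose response arrives before transmit returns. */
int model_hw_transmit(struct emad_model *m)
{
	model_rx(m);
	return 0;
}

/* Old ordering: flag set after transmit, so the early response is lost. */
bool transmit_flag_after(struct emad_model *m)
{
	model_hw_transmit(m);
	m->trans_active = true;		/* too late, response already dropped */
	return m->responses_seen > 0;	/* false: real code would time out */
}

/* Fixed ordering: flag set first and cleared again if transmit fails. */
bool transmit_flag_before(struct emad_model *m)
{
	m->trans_active = true;
	if (model_hw_transmit(m) != 0) {
		m->trans_active = false;
		return false;
	}
	return m->responses_seen > 0;
}
```

In the model, the old ordering records one dropped response and leaves the transaction pending forever, while the fixed ordering accepts the response.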
[PATCH v4 39/79] include/uapi/linux/if_pppox.h: include linux/if.h
Fixes userspace compilation error:

error: ‘IFNAMSIZ’ undeclared here (not in a function)

Signed-off-by: Mikko Rapeli
---
 include/uapi/linux/if_pppox.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/uapi/linux/if_pppox.h b/include/uapi/linux/if_pppox.h
index e128769..473c3c4 100644
--- a/include/uapi/linux/if_pppox.h
+++ b/include/uapi/linux/if_pppox.h
@@ -21,6 +21,7 @@
 #include
 #include
+#include <linux/if.h>
 #include
 #include
--
2.5.0
[PATCH v4 44/79] include/uapi/linux/if_pppox.h: include linux/in.h and linux/in6.h
Fixes userspace compilation errors:

error: field ‘addr’ has incomplete type
 struct sockaddr_in addr; /* IP address and port to send to */
error: field ‘addr’ has incomplete type
 struct sockaddr_in6 addr; /* IP address and port to send to */

Signed-off-by: Mikko Rapeli
---
 include/uapi/linux/if_pppox.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/uapi/linux/if_pppox.h b/include/uapi/linux/if_pppox.h
index 473c3c4..d37bbb1 100644
--- a/include/uapi/linux/if_pppox.h
+++ b/include/uapi/linux/if_pppox.h
@@ -24,6 +24,8 @@
 #include
 #include
 #include
+#include <linux/in.h>
+#include <linux/in6.h>
 /* For user-space programs to pick up these definitions
  * which they wouldn't get otherwise without defining __KERNEL__
--
2.5.0
[PATCH v4 43/79] include/uapi/linux/if_pppol2tp.h: include linux/in.h and linux/in6.h
Fixes userspace compilation errors like:

error: field ‘addr’ has incomplete type
 struct sockaddr_in addr; /* IP address and port to send to */
 ^
error: field ‘addr’ has incomplete type
 struct sockaddr_in6 addr; /* IP address and port to send to */

Signed-off-by: Mikko Rapeli
---
 include/uapi/linux/if_pppol2tp.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/if_pppol2tp.h b/include/uapi/linux/if_pppol2tp.h
index 163e8ad..4bd1f55 100644
--- a/include/uapi/linux/if_pppol2tp.h
+++ b/include/uapi/linux/if_pppol2tp.h
@@ -16,7 +16,8 @@
 #define _UAPI__LINUX_IF_PPPOL2TP_H
 #include
-
+#include <linux/in.h>
+#include <linux/in6.h>
 /* Structure used to connect() the socket to a particular tunnel UDP
  * socket over IPv4.
--
2.5.0
[PATCH v4 40/79] include/uapi/linux/if_tunnel.h: include linux/if.h, linux/ip.h and linux/in6.h
Fixes userspace compilation errors like:

error: field ‘iph’ has incomplete type
error: field ‘prefix’ has incomplete type

Signed-off-by: Mikko Rapeli
---
 include/uapi/linux/if_tunnel.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/include/uapi/linux/if_tunnel.h b/include/uapi/linux/if_tunnel.h
index af4de90..8afe695 100644
--- a/include/uapi/linux/if_tunnel.h
+++ b/include/uapi/linux/if_tunnel.h
@@ -2,6 +2,9 @@
 #define _UAPI_IF_TUNNEL_H_
 #include
+#include <linux/if.h>
+#include <linux/ip.h>
+#include <linux/in6.h>
 #include
--
2.5.0
Re: [PATCH v4 1/4] Produce system time from correlated clocksource
On Wed, Oct 14, 2015 at 06:57:33PM -0700, Christopher Hall wrote:
> >>+#define SHADOW_HISTORY_DEPTH 7
> >
> >And that number is 7 because?
>
> Due to power of 2 it will be 8 instead. As above the useful history is 8-2*1
> ms (1 ms is the minimum jiffy length). Array size 4 would not be enough
> history for the DSP which requires 4 ms of history, in the worst case.

Just as I suspected, the magic number 7 is based on the needs of one
particular user. What about the next user who comes along needing 10
milliseconds?

That will not do. Any new interface should be generic enough to
support a wide range of users. So I think this approach is all wrong.

Here is an idea for you to consider. Instead of mucking with the TK,
let the user code (possibly in-kernel) sample ART/sys pairs and
interpolate the ART/dev time stamps. That way, the user can choose
the range and resolution that he needs.

> The audio driver is structured in such a way that it's simpler to provide a
> value rather than a callback.

Can you please provide a link to the audio driver that uses this new
interface?

Thanks,
Richard
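The interpolation idea suggested above could look roughly like the following sketch. It is not code from the patch series: struct art_sys_pair and art_to_sys() are hypothetical names, the two samples are assumed to bracket the device time stamp, and plain linear interpolation is assumed to be good enough over the short sampling interval.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sample: one ART counter value and the system time read with it. */
struct art_sys_pair {
	uint64_t art;		/* ART counter value, in ticks */
	uint64_t sys_ns;	/* system time at that instant, in nanoseconds */
};

/*
 * Map a device time stamp taken in ART ticks to system time by linear
 * interpolation between two bracketing samples a (earlier) and b (later).
 * Returns 0 on success, -1 if the samples do not bracket the time stamp.
 * Assumes the interval is short enough that off * d_sys fits in 64 bits.
 */
int art_to_sys(const struct art_sys_pair *a, const struct art_sys_pair *b,
	       uint64_t art, uint64_t *sys_ns)
{
	uint64_t d_art, d_sys, off;

	if (b->art <= a->art || b->sys_ns < a->sys_ns)
		return -1;
	if (art < a->art || art > b->art)
		return -1;

	d_art = b->art - a->art;
	d_sys = b->sys_ns - a->sys_ns;
	off = art - a->art;

	*sys_ns = a->sys_ns + (off * d_sys) / d_art;
	return 0;
}
```

With this shape the caller decides how often to sample and how many pairs to keep, which is the "range and resolution" choice mentioned above.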
Re: [patch net-next v5 3/8] switchdev: allow caller to explicitly request attr_set as deferred
Thu, Oct 15, 2015 at 06:34:01AM CEST, sfel...@gmail.com wrote: >On Wed, Oct 14, 2015 at 10:40 AM, Jiri Pirko wrote: >> From: Jiri Pirko >> >> Caller should know if he can call attr_set directly (when holding RTNL) >> or if he has to defer the att_set processing for later. >> >> This also allows drivers to sleep inside attr_set and report operation >> status back to switchdev core. Switchdev core then warns if status is >> not ok, instead of silent errors happening in drivers. >> >> Benefit from newly introduced switchdev deferred ops infrastructure. >> >> Signed-off-by: Jiri Pirko >> --- >> include/net/switchdev.h | 1 + >> net/bridge/br_stp.c | 3 +- >> net/switchdev/switchdev.c | 108 >> ++ >> 3 files changed, 46 insertions(+), 66 deletions(-) >> >> diff --git a/include/net/switchdev.h b/include/net/switchdev.h >> index d1c7f90..f7de6f8 100644 >> --- a/include/net/switchdev.h >> +++ b/include/net/switchdev.h >> @@ -17,6 +17,7 @@ >> >> #define SWITCHDEV_F_NO_RECURSE BIT(0) >> #define SWITCHDEV_F_SKIP_EOPNOTSUPPBIT(1) >> +#define SWITCHDEV_F_DEFER BIT(2) >> >> struct switchdev_trans_item { >> struct list_head list; >> diff --git a/net/bridge/br_stp.c b/net/bridge/br_stp.c >> index db6d243de..80c34d7 100644 >> --- a/net/bridge/br_stp.c >> +++ b/net/bridge/br_stp.c >> @@ -41,13 +41,14 @@ void br_set_state(struct net_bridge_port *p, unsigned >> int state) >> { >> struct switchdev_attr attr = { >> .id = SWITCHDEV_ATTR_ID_PORT_STP_STATE, >> + .flags = SWITCHDEV_F_DEFER, >> .u.stp_state = state, >> }; >> int err; >> >> p->state = state; >> err = switchdev_port_attr_set(p->dev, &attr); >> - if (err && err != -EOPNOTSUPP) >> + if (err) >> br_warn(p->br, "error setting offload STP state on port >> %u(%s)\n", >> (unsigned int) p->port_no, p->dev->name); >> } > >Should this part of the patch be moved to patch 6/8 where >switchdev_deferred_process() is called from del_nbp()? No, this part relates to the fact that attr_set now does not defer automagically. 
So caller must say when to defer. So having this here is correct.
RE: [PATCH net-next v2] netlink: Rightsize IFLA_AF_SPEC size calculation
>-----Original Message-----
>From: David Miller [mailto:da...@davemloft.net]
>Sent: Wednesday, October 14, 2015 7:24 PM
>To: Arad, Ronen
>Cc: netdev@vger.kernel.org
>Subject: Re: [PATCH net-next v2] netlink: Rightsize IFLA_AF_SPEC size
>calculation
>
>From: Ronen Arad
>Date: Wed, 14 Oct 2015 08:51:28 -0700
>
>> @@ -900,7 +901,7 @@ static noinline size_t if_nlmsg_size(const struct net_device *dev,
>> + rtnl_vfinfo_size(dev, ext_filter_mask) /* IFLA_VFINFO_LIST */
>> + rtnl_port_size(dev, ext_filter_mask) /* IFLA_VF_PORTS + IFLA_PORT_SELF */
>> + rtnl_link_get_size(dev) /* IFLA_LINKINFO */
>> - + rtnl_link_get_af_size(dev) /* IFLA_AF_SPEC */
>> ++ rtnl_link_get_af_size(dev, ext_filter_mask) /* IFLA_AF_SPEC */
>> + nla_total_size(MAX_PHYS_ITEM_ID_LEN) /* IFLA_PHYS_PORT_ID */
>
>Please don't change the indentation on this line, keep it matching
>the indentation of all of the surrounding lines of this expression.

[@Ronen] Sure. V3 submitted. My editor didn't like the indentation of the
surrounding lines which are one less than two TAB spaces but consistency is
important.

>
>Thanks.
[PATCH net-next v3] netlink: Rightsize IFLA_AF_SPEC size calculation
if_nlmsg_size() overestimates the minimum allocation size of netlink dump request (when called from rtnl_calcit()) or the size of the message (when called from rtnl_getlink()). This is because ext_filter_mask is not supported by rtnl_link_get_af_size() and rtnl_link_get_size(). The over-estimation is significant when at least one netdev has many VLANs configured (8 bytes for each configured VLAN). This patch-set "rightsizes" the protocol specific attribute size calculation by propagating ext_filter_mask to rtnl_link_get_af_size() and adding this a argument to get_link_af_size op in rtnl_af_ops. Bridge module already used filtering aware sizing for notifications. br_get_link_af_size_filtered() is consistent with the modified get_link_af_size op so it replaces br_get_link_af_size() in br_af_ops. br_get_link_af_size() becomes unused and thus removed. --- include/net/rtnetlink.h | 3 ++- net/bridge/br_netlink.c | 21 + net/core/rtnetlink.c| 8 net/ipv4/devinet.c | 4 ++-- net/ipv6/addrconf.c | 3 ++- 5 files changed, 11 insertions(+), 28 deletions(-) diff --git a/include/net/rtnetlink.h b/include/net/rtnetlink.h index aff6ceb..2f87c1b 100644 --- a/include/net/rtnetlink.h +++ b/include/net/rtnetlink.h @@ -124,7 +124,8 @@ struct rtnl_af_ops { int (*fill_link_af)(struct sk_buff *skb, const struct net_device *dev, u32 ext_filter_mask); - size_t (*get_link_af_size)(const struct net_device *dev); + size_t (*get_link_af_size)(const struct net_device *dev, + u32 ext_filter_mask); int (*validate_link_af)(const struct net_device *dev, const struct nlattr *attr); diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c index 94b4de8..40197ff 100644 --- a/net/bridge/br_netlink.c +++ b/net/bridge/br_netlink.c @@ -1214,29 +1214,10 @@ static int br_fill_info(struct sk_buff *skb, const struct net_device *brdev) return 0; } -static size_t br_get_link_af_size(const struct net_device *dev) -{ - struct net_bridge_port *p; - struct net_bridge *br; - int num_vlans = 0; - - if 
(br_port_exists(dev)) { - p = br_port_get_rtnl(dev); - num_vlans = br_get_num_vlan_infos(nbp_vlan_group(p), - RTEXT_FILTER_BRVLAN); - } else if (dev->priv_flags & IFF_EBRIDGE) { - br = netdev_priv(dev); - num_vlans = br_get_num_vlan_infos(br_vlan_group(br), - RTEXT_FILTER_BRVLAN); - } - - /* Each VLAN is returned in bridge_vlan_info along with flags */ - return num_vlans * nla_total_size(sizeof(struct bridge_vlan_info)); -} static struct rtnl_af_ops br_af_ops __read_mostly = { .family = AF_BRIDGE, - .get_link_af_size = br_get_link_af_size, + .get_link_af_size = br_get_link_af_size_filtered, }; struct rtnl_link_ops br_link_ops __read_mostly = { diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index 2477595..7c78b5a 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -497,7 +497,8 @@ void rtnl_af_unregister(struct rtnl_af_ops *ops) } EXPORT_SYMBOL_GPL(rtnl_af_unregister); -static size_t rtnl_link_get_af_size(const struct net_device *dev) +static size_t rtnl_link_get_af_size(const struct net_device *dev, + u32 ext_filter_mask) { struct rtnl_af_ops *af_ops; size_t size; @@ -509,7 +510,7 @@ static size_t rtnl_link_get_af_size(const struct net_device *dev) if (af_ops->get_link_af_size) { /* AF_* + nested data */ size += nla_total_size(sizeof(struct nlattr)) + - af_ops->get_link_af_size(dev); + af_ops->get_link_af_size(dev, ext_filter_mask); } } @@ -900,7 +901,7 @@ static noinline size_t if_nlmsg_size(const struct net_device *dev, + rtnl_vfinfo_size(dev, ext_filter_mask) /* IFLA_VFINFO_LIST */ + rtnl_port_size(dev, ext_filter_mask) /* IFLA_VF_PORTS + IFLA_PORT_SELF */ + rtnl_link_get_size(dev) /* IFLA_LINKINFO */ - + rtnl_link_get_af_size(dev) /* IFLA_AF_SPEC */ + + rtnl_link_get_af_size(dev, ext_filter_mask) /* IFLA_AF_SPEC */ + nla_total_size(MAX_PHYS_ITEM_ID_LEN) /* IFLA_PHYS_PORT_ID */ + nla_total_size(MAX_PHYS_ITEM_ID_LEN) /* IFLA_PHYS_SWITCH_ID */ + nla_total_size(1); /* IFLA_PROTO_DOWN */ @@ -3443,4 +3444,3 @@ void __init rtnetlink_init(void) 
rtnl_register(PF_BRIDGE, RTM_DELLINK, rtnl_bridge_dellin
Re: [PATCH v4 1/4] Produce system time from correlated clocksource
On Wed, Oct 14, 2015 at 07:34:03PM -0700, Christopher Hall wrote:
> I hope this is helpful.

Thanks. So the DSP does not produce or consume system time stamps.
Fine. Still I fail to understand why you need the system time.

Thomas seems to say that there are *other* applications that will want
to transform device time into system time, but why does your audio
application use the system time, when the audio-to-ptp time is directly
available, without any man in the middle?

Thanks,
Richard
Re: [patch net-next v5 3/8] switchdev: allow caller to explicitly request attr_set as deferred
On Wed, Oct 14, 2015 at 10:40 AM, Jiri Pirko wrote: > From: Jiri Pirko > > Caller should know if he can call attr_set directly (when holding RTNL) > or if he has to defer the att_set processing for later. > > This also allows drivers to sleep inside attr_set and report operation > status back to switchdev core. Switchdev core then warns if status is > not ok, instead of silent errors happening in drivers. > > Benefit from newly introduced switchdev deferred ops infrastructure. > > Signed-off-by: Jiri Pirko > --- > include/net/switchdev.h | 1 + > net/bridge/br_stp.c | 3 +- > net/switchdev/switchdev.c | 108 > ++ > 3 files changed, 46 insertions(+), 66 deletions(-) > > diff --git a/include/net/switchdev.h b/include/net/switchdev.h > index d1c7f90..f7de6f8 100644 > --- a/include/net/switchdev.h > +++ b/include/net/switchdev.h > @@ -17,6 +17,7 @@ > > #define SWITCHDEV_F_NO_RECURSE BIT(0) > #define SWITCHDEV_F_SKIP_EOPNOTSUPPBIT(1) > +#define SWITCHDEV_F_DEFER BIT(2) > > struct switchdev_trans_item { > struct list_head list; > diff --git a/net/bridge/br_stp.c b/net/bridge/br_stp.c > index db6d243de..80c34d7 100644 > --- a/net/bridge/br_stp.c > +++ b/net/bridge/br_stp.c > @@ -41,13 +41,14 @@ void br_set_state(struct net_bridge_port *p, unsigned int > state) > { > struct switchdev_attr attr = { > .id = SWITCHDEV_ATTR_ID_PORT_STP_STATE, > + .flags = SWITCHDEV_F_DEFER, > .u.stp_state = state, > }; > int err; > > p->state = state; > err = switchdev_port_attr_set(p->dev, &attr); > - if (err && err != -EOPNOTSUPP) > + if (err) > br_warn(p->br, "error setting offload STP state on port > %u(%s)\n", > (unsigned int) p->port_no, p->dev->name); > } Should this part of the patch be moved to patch 6/8 where switchdev_deferred_process() is called from del_nbp()? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Patch net-next 2/4] net_sched: update hierarchical backlog too
On Wed, Oct 14, 2015 at 5:11 AM, Jamal Hadi Salim wrote:
> On 10/12/15 14:38, Cong Wang wrote:
>>
>> When the bottom qdisc decides to, for example, drop some packet,
>> it calls qdisc_tree_decrease_qlen() to update the queue length
>> for all its ancestors, we need to update the backlog too to
>> keep the stats on root qdisc accurate.
>>
>
>
> There is more than one change in there (the codel change seems
> out of place and i wasnt sure why it was needed).

I thought it is clear that when codel decides to drop some packets
we don't know how many bytes it drops, we only know how many packets
before my patch. For example,

-	qdisc_tree_decrease_qlen(sch, q->cstats.drop_count);
+	qdisc_tree_reduce_backlog(sch, q->cstats.drop_count,
+				  q->cstats.drop_len);

This clearly means I need some codel stats from codel to pass
to qdisc_tree_reduce_backlog(), this is why the codel part is necessary.

> Also it seems possible you are double-dipping in some cases;
> i dont have time to scrutinize - but looking at codel_change() change
> when the queue limit is exceeded you will end up affecting backlog from
> both qdisc_qstats_backlog_dec() and your new
> qdisc_tree_reduce_backlog()

Nope, qdisc_qstats_backlog_dec() decreases the backlog of itself,
qdisc_tree_reduce_backlog() decreases its upper qdiscs'. It is correct
as it was.

Thanks.
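The split described in this reply (a qdisc adjusts its own counters, the tree helper adjusts only the ancestors) can be pictured with a small userspace model. This is a sketch, not kernel code: struct model_qdisc and the model_* functions are hypothetical, and the parent pointer stands in for the kernel's lookup of the parent qdisc.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a qdisc with a link to its parent in the hierarchy. */
struct model_qdisc {
	struct model_qdisc *parent;	/* NULL for the root */
	unsigned int qlen;		/* queued packets */
	unsigned int backlog;		/* queued bytes */
};

/* Reduce packet and byte counts on every ancestor of sch, not on sch itself. */
void model_tree_reduce_backlog(struct model_qdisc *sch,
			       unsigned int n, unsigned int len)
{
	struct model_qdisc *p;

	for (p = sch->parent; p != NULL; p = p->parent) {
		p->qlen -= n;
		p->backlog -= len;
	}
}

/* A drop in sch: adjust its own counters once, then propagate upwards. */
void model_drop(struct model_qdisc *sch, unsigned int n, unsigned int len)
{
	sch->qlen -= n;
	sch->backlog -= len;
	model_tree_reduce_backlog(sch, n, len);
}
```

In this model each level is decremented exactly once per drop, so there is no double counting, and backlog / qlen stays meaningful as an average packet length at every level.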
[PATCH net-next] net: hisilicon: fixes a bug when using ethtool -S
From: lipeng

This patch fixes a bug in the hns driver. When statistics are read with
ethtool -S, three counters are reported under the wrong names, because
the strings associated with the registers are wrong. Correct the strings
that give the wrong information.

Signed-off-by: lipeng
Signed-off-by: yankejian
Signed-off-by: Yisen Zhuang
---
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_xgmac.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_xgmac.c b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_xgmac.c
index dab5ecf..802d554 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_xgmac.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_xgmac.c
@@ -51,9 +51,9 @@ static const struct mac_stats_string g_xgmac_stats_string[] = {
 	{"xgmac_rx_bad_pkt_from_dsaf", MAC_STATS_FIELD_OFF(rx_bad_from_sw)},
 	{"xgmac_tx_bad_pkt_64tomax", MAC_STATS_FIELD_OFF(tx_bad_pkts)},
-	{"xgmac_rx_not_well_pkt", MAC_STATS_FIELD_OFF(rx_fragment_err)},
-	{"xgmac_rx_good_well_pkt", MAC_STATS_FIELD_OFF(rx_undersize)},
-	{"xgmac_rx_total_pkt", MAC_STATS_FIELD_OFF(rx_under_min)},
+	{"xgmac_rx_bad_pkts_minto64", MAC_STATS_FIELD_OFF(rx_fragment_err)},
+	{"xgmac_rx_good_pkts_minto64", MAC_STATS_FIELD_OFF(rx_undersize)},
+	{"xgmac_rx_total_pkts_minto64", MAC_STATS_FIELD_OFF(rx_under_min)},
 	{"xgmac_rx_pkt_64", MAC_STATS_FIELD_OFF(rx_64bytes)},
 	{"xgmac_rx_pkt_65to127", MAC_STATS_FIELD_OFF(rx_65to127)},
 	{"xgmac_rx_pkt_128to255", MAC_STATS_FIELD_OFF(rx_128to255)},
--
1.9.1
Re: [Patch net-next 3/4] sch_htb: update backlog as well
On Wed, Oct 14, 2015 at 5:25 AM, Jamal Hadi Salim wrote:
> On 10/12/15 14:38, Cong Wang wrote:
>>
>> It is odd to see qlen!=0 but backlog==0, for a real example:
>>
>
> Backlog is a transient stat so a lot of times it should be 0. Only when
> the CPU is sending faster than the link can handle should you see
> the backlog grow (and eventually drain to 0).

Of course. But in my case, we were sending a burst of traffic while
with a lower HTB bw limit, so we can consistently see backlog!=0
for many seconds.

>
> Even though your explanation above is inaccurate I think the spirit
> of the patch looks reasonable. i.e keeping track of all additions to
> the queue and removals from the queue in the backlog stats is useful.
> However, you need to be extremely careful: This should only be done
> at exactly the spot the packet is enqueued (and not by a parent's
> enqueue asking for hierarchical enques).

The reason why I care about backlog and qlen is I want to know
the average length of each packet in backlog, to check if it is
a GSO packet at least.

>
> I think some more work is needed Cong for this general patchset.
>

Sure, I could miss something somewhere, just point it out. :)

Thanks.
Re: [Patch net-next 1/4] net_sched: introduce qdisc_replace() helper
On Wed, Oct 14, 2015 at 4:56 AM, Jamal Hadi Salim wrote:
> On 10/12/15 14:38, Cong Wang wrote:
>>
>> Remove nearly duplicated code and prepare for the following patch.
>>
>
>
> Cong - like Dave, I dont see equivalence in some of these
> changes.
> Example not sure how the qfq grafting invocation of
> qfq_purge_queue fits in. There are a few others.

drr_purge_queue() and qfq_purge_queue() are both qdisc_reset() +
qdisc_tree_decrease_qlen():

static void drr_purge_queue(struct drr_class *cl)
{
	unsigned int len = cl->qdisc->q.qlen;

	qdisc_reset(cl->qdisc);
	qdisc_tree_decrease_qlen(cl->qdisc, len);
}

static void qfq_purge_queue(struct qfq_class *cl)
{
	unsigned int len = cl->qdisc->q.qlen;

	qdisc_reset(cl->qdisc);
	qdisc_tree_decrease_qlen(cl->qdisc, len);
}

Or you mean the order of calling them??
Re: [Patch net-next 1/4] net_sched: introduce qdisc_replace() helper
On Tue, Oct 13, 2015 at 6:54 PM, David Miller wrote:
> From: Cong Wang
> Date: Mon, 12 Oct 2015 11:38:00 -0700
>
>> Remove nearly duplicated code and prepare for the following patch.
>>
>> Cc: Jamal Hadi Salim
>> Signed-off-by: Cong Wang
>
> This isn't an equivalent transformation:
>
>> +static inline struct Qdisc *qdisc_replace(struct Qdisc *sch, struct Qdisc *new,
>> +					  struct Qdisc **pold)
>> +{
>> +	struct Qdisc *old;
>> +
>> +	sch_tree_lock(sch);
>> +	old = *pold;
>> +	*pold = new;
>> +	if (old != NULL) {
>> +		qdisc_tree_decrease_qlen(old, old->q.qlen);
>> +		qdisc_reset(old);
>> +	}
>> +	sch_tree_unlock(sch);
>> +
>> +	return old;
>> +}
>> +
>
> Is not the same as:
>
>> diff --git a/net/sched/sch_drr.c b/net/sched/sch_drr.c
>> index f26bdea..c76cdd4 100644
>> --- a/net/sched/sch_drr.c
>> +++ b/net/sched/sch_drr.c
>> @@ -226,11 +226,7 @@ static int drr_graft_class(struct Qdisc *sch, unsigned long arg,
>> 		new = &noop_qdisc;
>> 	}
>>
>> -	sch_tree_lock(sch);
>> -	drr_purge_queue(cl);
>> -	*old = cl->qdisc;
>> -	cl->qdisc = new;
>> -	sch_tree_unlock(sch);
>> +	*old = qdisc_replace(sch, new, &cl->qdisc);
>> 	return 0;
>> }
>>
>
> This.
>
> If you want to change semantics, you must do it explicitly in a separate
> commit with a detailed commit message explaining how and why.

If you meant drr_purge_queue(), it is the same:

static void drr_purge_queue(struct drr_class *cl)
{
	unsigned int len = cl->qdisc->q.qlen;

	qdisc_reset(cl->qdisc);
	qdisc_tree_decrease_qlen(cl->qdisc, len);
}

Or if you mean the 'if', always having one if doesn't harm, does it?
Re: [PATCH] ipconfig: send Client-identifier in DHCP requests
On Thu, Oct 15, 2015 at 11:27 AM, kbuild test robot wrote:
> Hi Li,
>
> [auto build test WARNING on net/master -- if it's inappropriate base, please
> suggest rules for selecting the more suitable base]
>
> url:
> https://github.com/0day-ci/linux/commits/roy-qing-li-gmail-com/ipconfig-send-Client-identifier-in-DHCP-requests/20151015-105553
> config: parisc-c3000_defconfig (attached as .config)
> reproduce:
> wget
> https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross
> -O ~/bin/make.cross
> chmod +x ~/bin/make.cross
> # save the attached .config to linux build tree
> make.cross ARCH=parisc
>
> All warnings (new ones prefixed by >>):
>
>>> net/ipv4/ipconfig.c:148:13: warning: 'dhcp_client_identifier' defined but
>>> not used [-Wunused-variable]
> static char dhcp_client_identifier[253] __initdata;
> ^

Thanks, I will fix it

-Roy
Re: [PATCH] ipconfig: send Client-identifier in DHCP requests
Hi Li, [auto build test WARNING on net/master -- if it's inappropriate base, please suggest rules for selecting the more suitable base] url: https://github.com/0day-ci/linux/commits/roy-qing-li-gmail-com/ipconfig-send-Client-identifier-in-DHCP-requests/20151015-105553 config: parisc-c3000_defconfig (attached as .config) reproduce: wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # save the attached .config to linux build tree make.cross ARCH=parisc All warnings (new ones prefixed by >>): >> net/ipv4/ipconfig.c:148:13: warning: 'dhcp_client_identifier' defined but >> not used [-Wunused-variable] static char dhcp_client_identifier[253] __initdata; ^ vim +/dhcp_client_identifier +148 net/ipv4/ipconfig.c 132 133 static int ic_host_name_set __initdata; /* Host name set by us? */ 134 135 __be32 ic_myaddr = NONE;/* My IP address */ 136 static __be32 ic_netmask = NONE;/* Netmask for local subnet */ 137 __be32 ic_gateway = NONE; /* Gateway IP address */ 138 139 __be32 ic_addrservaddr = NONE; /* IP Address of the IP addresses'server */ 140 141 __be32 ic_servaddr = NONE; /* Boot server IP address */ 142 143 __be32 root_server_addr = NONE; /* Address of NFS server */ 144 u8 root_server_path[256] = { 0, }; /* Path to mount as root */ 145 146 /* vendor class identifier */ 147 static char vendor_class_identifier[253] __initdata; > 148 static char dhcp_client_identifier[253] __initdata; 149 150 /* Persistent data: */ 151 152 static int ic_proto_used; /* Protocol used, if any */ 153 static __be32 ic_nameservers[CONF_NAMESERVERS_MAX]; /* DNS Server IP addresses */ 154 static u8 ic_domain[64];/* DNS (not NIS) domain name */ 155 156 /* --- 0-DAY kernel test infrastructureOpen Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation .config.gz Description: Binary data
Re: [PATCH] ipconfig: send Client-identifier in DHCP requests
Hi Li, [auto build test WARNING on net/master -- if it's inappropriate base, please suggest rules for selecting the more suitable base] url: https://github.com/0day-ci/linux/commits/roy-qing-li-gmail-com/ipconfig-send-Client-identifier-in-DHCP-requests/20151015-105553 config: parisc-defconfig (attached as .config) reproduce: wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # save the attached .config to linux build tree make.cross ARCH=parisc All warnings (new ones prefixed by >>): net/ipv4/ipconfig.c: In function 'ic_proto_name': >> net/ipv4/ipconfig.c:1584:4: warning: ignoring return value of 'kstrtou8', >> declared with attribute warn_unused_result [-Wunused-result] kstrtou8(client_id, 0, dhcp_client_identifier); ^ vim +/kstrtou8 +1584 net/ipv4/ipconfig.c 1568 return 0; 1569 } 1570 #ifdef CONFIG_IP_PNP_DHCP 1571 else if (!strncmp(name, "dhcp", 4)) { 1572 char *client_id; 1573 1574 ic_proto_enabled &= ~IC_RARP; 1575 client_id = strstr(name, "dhcp,"); 1576 if (client_id) { 1577 char *v; 1578 1579 client_id = client_id + 5; 1580 v = strchr(client_id, ','); 1581 if (!v) 1582 return 1; 1583 *v = 0; > 1584 kstrtou8(client_id, 0, dhcp_client_identifier); 1585 strncpy(dhcp_client_identifier + 1, v + 1, 251); 1586 *v = ','; 1587 } 1588 return 1; 1589 } 1590 #endif 1591 #ifdef CONFIG_IP_PNP_BOOTP 1592 else if (!strcmp(name, "bootp")) { --- 0-DAY kernel test infrastructureOpen Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation .config.gz Description: Binary data
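The warning above is about a conversion whose result is never checked. As a hedged illustration of the usual fix, here is a userspace sketch that uses `strtoul()` as a stand-in for the kernel helper; `parse_u8()` is a hypothetical name and is not part of the patch. The point is only that the caller gets an error code it can act on, and that the output is left untouched on failure.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Userspace stand-in for a checked u8 conversion (hypothetical helper).
 * Returns 0 on success and a negative errno value on failure; *out is
 * written only on success.
 */
static int parse_u8(const char *s, unsigned char *out)
{
	char *end;
	unsigned long v;

	if (s == NULL || *s == '\0')
		return -EINVAL;

	errno = 0;
	v = strtoul(s, &end, 0);	/* base 0: accepts decimal, 0x.., 0.. */
	if (errno != 0 || *end != '\0')
		return -EINVAL;		/* overflow or trailing garbage */
	if (v > 255)
		return -ERANGE;		/* does not fit in one byte */

	*out = (unsigned char)v;
	return 0;
}
```

A caller would then write something like `if (parse_u8(type_str, &id[0])) return 0;` instead of discarding the result, so a malformed type field is rejected rather than silently sent.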
Re: [PATCH net-next 0/4] net: dsa: mv88e6xxx: fix hardware bridging
On 10/14/2015 07:52 PM, Andrew Lunn wrote: On Wed, Oct 14, 2015 at 09:28:55PM -0400, Vivien Didelot wrote: On Oct. Thursday 15 (42) 12:46 AM, Andrew Lunn wrote: On Sun, Oct 11, 2015 at 06:08:34PM -0400, Vivien Didelot wrote: DSA and its drivers currently hook the NETDEV_CHANGEUPPER net_device event in order to configure the VLAN map of every port. This VLAN map is a feature of these switch chips to hardcode and restrict which output ports a given input port can egress frames to. A Linux bridge is a simple untagged VLAN propagated by the bridge code itself. With a proper 802.1Q support, a driver does not need this hook anymore, and will simply program the related VLAN object. This patchset improves the hardware bridging code in the mv88e6xxx driver with a strict 802.1Q mode. Hi Vivien I just tested this as part of net-next/master, and found a problem If i do: ip link set lan0 up ip addr add 192.168.10.2/24 dev lan0 It will not ping. Looking in sys/kernel/debug/dsa0/stats i see broadcast packets, probably ARP, being received at the port. But they are not being forwarded out the CPU port. If however i do brctl addbr br0 brctl addif br0 lan0 ip addr add 192.168.10.2/24 dev br0 ip link set br0 up i can ping. So it looks like we are too restrictive by default. You should be able to use interfaces as they are, without a bridge. Correct, if the ports are not in a VLAN by default, they cannot talk. Hi Vivien This is a regression. Ports of the switch should work like normal Linux interfaces. And up until now, they did. This patchset changed that. As Florian pointed out, these interfaces are separated from each other. So you need something like a bridge per port by default, which then gets removed and replaced when a port is added to a Linux bridge. We also need to take care of VLANs. When the port is not a member of a linux bridge, i expect all VLAN tagged frames to be received, as well as untagged frames. This is normal Linux behaviour. 
But i never got around to testing this with DSA. There was a reason for the original code. I had wondered how it is now supposed to work. Guess this exchange explains it. Looking forward to see how it is going to be fixed, and too bad I don't have time to be more involved. Guenter -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next] switchdev: enforce no pvid flag in vlan ranges
On Wed, Oct 14, 2015 at 10:42 AM, Ido Schimmel wrote: > Wed, Oct 14, 2015 at 08:14:24PM IDT, sfel...@gmail.com wrote: >>On Wed, Oct 14, 2015 at 8:25 AM, Vivien Didelot >> wrote: >>> On Oct. Wednesday 14 (42) 09:14 AM, Ido Schimmel wrote: Tue, Oct 13, 2015 at 05:32:26PM IDT, vivien.dide...@savoirfairelinux.com wrote: >On Oct. Tuesday 13 (42) 11:31 AM, Ido Schimmel wrote: >> Mon, Oct 12, 2015 at 08:36:25PM IDT, >> vivien.dide...@savoirfairelinux.com wrote: >> >Hi guys, >> > >> >On Oct. Monday 12 (42) 02:01 PM, Nikolay Aleksandrov wrote: >> >> From: Nikolay Aleksandrov >> >> >> >> We shouldn't allow BRIDGE_VLAN_INFO_PVID flag in VLAN ranges. >> >> >> >> Signed-off-by: Nikolay Aleksandrov >> >> --- >> >> net/switchdev/switchdev.c | 3 +++ >> >> 1 file changed, 3 insertions(+) >> >> >> >> diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c >> >> index 6e4a4f9ad927..256c596de896 100644 >> >> --- a/net/switchdev/switchdev.c >> >> +++ b/net/switchdev/switchdev.c >> >> @@ -720,6 +720,9 @@ static int switchdev_port_br_afspec(struct >> >> net_device *dev, >> >> if (vlan.vid_begin) >> >> return -EINVAL; >> >> vlan.vid_begin = vinfo->vid; >> >> + /* don't allow range of pvids */ >> >> + if (vlan.flags & BRIDGE_VLAN_INFO_PVID) >> >> + return -EINVAL; >> >> } else if (vinfo->flags & >> >> BRIDGE_VLAN_INFO_RANGE_END) { >> >> if (!vlan.vid_begin) >> >> return -EINVAL; >> >> -- >> >> 2.4.3 >> >> >> > >> >Yes the patch looks good, but it is a minor check though. I hope the >> >subject of this thread is making sense. >> > >> >VLAN ranges seem to have been included for an UX purpose (so commands >> >look like Cisco IOS). We don't want to change any existing interface, >> >so >> >we pushed that down to drivers, with the only valid reason that, maybe >> >one day, an hardware can be capable of programming a range on a >> >per-port >> >basis. >> Hi, >> >> That's actually what we are doing in mlxsw. We can do up to 256 entries >> in >> one go. We've yet to submit this part. 
> >Perfect Ido, thanks for pointing this out! I'm OK with the range then. > >So there is now a very last question in my head for this, which is more >a matter of kernel design. Should the user be aware of such underlying >support? In other words, would it make sense to do this in a driver: > >foo_port_vlan_add(struct net_device *dev, > struct switchdev_obj_port_vlan *vlan) >{ >if (vlan->vid_begin != vlan->vid_end) >return -ENOTSUPP; /* or something more relevant for user */ > >return foo_port_single_vlan_add(dev, vlan->vid_begin); >} > >So drivers keep being simple, and we can easily propagate the fact that >one-or-all VLAN is not supportable, vs. the VLAN feature itself is not >implemented and must be done in software. I think that if you want to keep it simple, then Scott's advice from the previous thread is the most appropriate one. I believe the hardware you are using is simply not meant to support multiple 802.1Q bridges. >>> >>> You mean allowing only one Linux bridge over an hardware switch? >>> >>> It would for sure simplify how, as developers and users, we represent a >>> physical switch. But I am not sure how to achieve that and I don't have >>> strong opinions on this TBH. >> >>Hi Vivien, I think it's possible to keep switch ports on just one >>bridge if we do a little bit of work on the NETDEV_CHANGEUPPER >>notifier. This will give you the driver-level control you want. Do >>you have time to investigate? The idea is: >> >>1) In your driver's handler for NETDEV_CHANGEUPPER, if switch port is >>being added to a second bridge,then return NOTIFY_BAD. Your driver >>needs to track the bridge count. >> >>2) In __netdev_upper_dev_link(), check the return code from the >>call_netdevice_notifiers_info(NETDEV_CHANGEUPPER, ...) call, and if >>NOTIFY_BAD, abort the linking operation (goto rollback_xxx). >> > Hi, > > We are doing something similar in mlxsw (not upstream yet). 
Jiri > introduced PRE_CHANGEUPPER, which is called from the function you > mentioned, but before the linking operation (so that you don't need to > rollback). Oh, cool. > If the notification is about a linking operation and the master is a > bridge different than the current one, then NOTIFY_BAD is returned. So you're wanting to restrict to just one bridge also? Or is NOTIFY_BAD returned for some other reason? I g
Re: [PATCH v2 1/3] unix: fix use-after-free in unix_dgram_poll()
> X-Signed-Off-By: Rainer Weikusat

Hi,

So the patches I've posted and yours both use the idea of relaying the remote peer wakeup via callbacks that are internal to net/unix, such that we avoid exposing the remote peer wakeup to the external poll()/select()/epoll(). They differ in when and how those callbacks are registered/unregistered.

So I think your approach here will generally keep the peer wait wakeup queue to its absolute minimum, by removing from that queue when we set POLLOUT; however, it requires taking the peer waitqueue lock on every poll() call. So I think there are tradeoffs here vs. what I've posted. For example, if there are a lot of writers against one 'server' socket, there is going to be a lot of lock contention with your approach here. So I think the performance is going to depend on the workload that is tested.

I don't have a specific workload that I am trying to solve here, but since you introduced the waiting on the remote peer queue, perhaps you can post numbers comparing the patches that have been posted for the workload that this was developed for? I will send you the latest version of what I have privately, so as not to overly spam the list.

Thanks,

-Jason
[PATCH] ipconfig: send Client-identifier in DHCP requests
From: Li RongQing A dhcp server may provide parameters to a client from a pool of IP addresses and using a shared rootfs, or provide a specific set of parameters for a specific client, usually using the MAC address to identify each client individually. The dhcp protocol also specifies a client-id field which can be used to determine the correct parameters to supply when no MAC address is available. There is currently no way to tell the kernel to supply a specific client-id, only the userspace dhcp clients support this feature, but this can not be used when the network is needed before userspace is available such as when the root filesystem is on NFS. This patch is to be able to do something like "ip=dhcp,client_id_type, client_id_value", as a kernel parameter to enable the kernel to identify itself to the server. Signed-off-by: Li RongQing --- Documentation/filesystems/nfs/nfsroot.txt | 3 +++ net/ipv4/ipconfig.c | 28 +++- 2 files changed, 30 insertions(+), 1 deletion(-) diff --git a/Documentation/filesystems/nfs/nfsroot.txt b/Documentation/filesystems/nfs/nfsroot.txt index 2d66ed6..bb5ab6d 100644 --- a/Documentation/filesystems/nfs/nfsroot.txt +++ b/Documentation/filesystems/nfs/nfsroot.txt @@ -157,6 +157,9 @@ ip=::: both:use both BOOTP and RARP but not DHCP (old option kept for backwards compatibility) + if dhcp is used, the client identifier can be used by following + format "ip=dhcp,client-id-type,client-id-value" + Default: any IP address of first nameserver. 
diff --git a/net/ipv4/ipconfig.c b/net/ipv4/ipconfig.c index ed4ef09..57c4fd4 100644 --- a/net/ipv4/ipconfig.c +++ b/net/ipv4/ipconfig.c @@ -145,6 +145,7 @@ u8 root_server_path[256] = { 0, }; /* Path to mount as root */ /* vendor class identifier */ static char vendor_class_identifier[253] __initdata; +static char dhcp_client_identifier[253] __initdata; /* Persistent data: */ @@ -728,6 +729,16 @@ ic_dhcp_init_options(u8 *options) memcpy(e, vendor_class_identifier, len); e += len; } + len = strlen(dhcp_client_identifier + 1); + /* the minimum length of identifier is 2, include 1 byte type, +* and can not be larger than the length of options +*/ + if (len >= 1 && len < 312 - (e - options) - 1) { + *e++ = 61; + *e++ = len + 1; + memcpy(e, dhcp_client_identifier, len + 1); + e += len + 1; + } } *e++ = 255; /* End of the list */ @@ -1557,8 +1568,23 @@ static int __init ic_proto_name(char *name) return 0; } #ifdef CONFIG_IP_PNP_DHCP - else if (!strcmp(name, "dhcp")) { + else if (!strncmp(name, "dhcp", 4)) { + char *client_id; + ic_proto_enabled &= ~IC_RARP; + client_id = strstr(name, "dhcp,"); + if (client_id) { + char *v; + + client_id = client_id + 5; + v = strchr(client_id, ','); + if (!v) + return 1; + *v = 0; + kstrtou8(client_id, 0, dhcp_client_identifier); + strncpy(dhcp_client_identifier + 1, v + 1, 251); + *v = ','; + } return 1; } #endif -- 2.1.4 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
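To make the option layout in the patch easier to follow, here is a minimal userspace sketch of appending a client-identifier option to an options buffer. It is an illustration under assumptions, not the kernel code: `put_client_id()` and its parameters are hypothetical, and the layout assumed is code byte 61, a length byte covering type plus value, the type byte, then the value, with one byte kept free for the end marker (255) that the patch writes afterwards.

```c
#include <stddef.h>
#include <string.h>

#define DHCP_OPT_CLIENT_ID 61	/* option code used in the patch */

/*
 * Append (code, length, type, value) at offset `used` of an options buffer
 * holding `cap` bytes, leaving one byte free for the end marker.
 * Returns the new offset, or 0 if the option is empty or does not fit.
 */
static size_t put_client_id(unsigned char *opts, size_t cap, size_t used,
			    unsigned char type, const char *value)
{
	size_t vlen = strlen(value);
	size_t need = 2 + 1 + vlen;	/* code + length + type + value */

	if (vlen == 0 || vlen > 254)	/* length byte covers type + value */
		return 0;
	if (used > cap || need + 1 > cap - used)
		return 0;		/* no room, counting the end marker */

	opts[used++] = DHCP_OPT_CLIENT_ID;
	opts[used++] = (unsigned char)(1 + vlen);
	opts[used++] = type;
	memcpy(opts + used, value, vlen);
	return used + vlen;
}
```

The sketch counts the two header bytes and the end marker explicitly when checking the remaining space, which keeps the bound easy to audit.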
Re: [PATCH net-next 0/4] net: dsa: mv88e6xxx: fix hardware bridging
On Wed, Oct 14, 2015 at 09:28:55PM -0400, Vivien Didelot wrote:
> On Oct. Thursday 15 (42) 12:46 AM, Andrew Lunn wrote:
> > On Sun, Oct 11, 2015 at 06:08:34PM -0400, Vivien Didelot wrote:
> > > DSA and its drivers currently hook the NETDEV_CHANGEUPPER net_device event in
> > > order to configure the VLAN map of every port.
> > >
> > > This VLAN map is a feature of these switch chips to hardcode and restrict which
> > > output ports a given input port can egress frames to.
> > >
> > > A Linux bridge is a simple untagged VLAN propagated by the bridge code itself.
> > > With a proper 802.1Q support, a driver does not need this hook anymore, and
> > > will simply program the related VLAN object.
> > >
> > > This patchset improves the hardware bridging code in the mv88e6xxx driver with
> > > a strict 802.1Q mode.
> >
> > Hi Vivien
> >
> > I just tested this as part of net-next/master, and found a problem
> >
> > If i do:
> >
> > ip link set lan0 up
> > ip addr add 192.168.10.2/24 dev lan0
> >
> > It will not ping. Looking in sys/kernel/debug/dsa0/stats i see
> > broadcast packets, probably ARP, being received at the port.
> > But they are not being forwarded out the CPU port.
> >
> > If however i do
> >
> > brctl addbr br0
> > brctl addif br0 lan0
> > ip addr add 192.168.10.2/24 dev br0
> > ip link set br0 up
> >
> > i can ping.
> >
> > So it looks like we are too restrictive by default. You should be able
> > to use interfaces as they are, without a bridge.
>
> Correct, if the ports are not in a VLAN by default, they cannot talk.

Hi Vivien

This is a regression. Ports of the switch should work like normal Linux interfaces. And up until now, they did. This patchset changed that.

As Florian pointed out, these interfaces are separated from each other. So you need something like a bridge per port by default, which then gets removed and replaced when a port is added to a Linux bridge.

We also need to take care of VLANs. When the port is not a member of a linux bridge, i expect all VLAN tagged frames to be received, as well as untagged frames. This is normal Linux behaviour. But i never got around to testing this with DSA.

Andrew
Re: [PATCH v4 3/4] Add PTP_SYS_OFFSET_PRECISE for driver crosstimestamping
On Tue, 13 Oct 2015 06:59:26 -0700, Richard Cochran wrote:
> On Mon, Oct 12, 2015 at 11:45:21AM -0700, Christopher S. Hall wrote:
>> +struct ptp_sys_offset_precise {
>> +	unsigned int rsv[4];	/* Reserved for future use. */
>> +	struct ptp_clock_time dev;
>> +	struct ptp_clock_time sys;
>> +};
>> +
>
> Please put the reserved field at the bottom.
>
> Also, since we reading the raw monotonic time under the hood, we might
> as well return it in

Good idea.

Thanks,
Chris
Re: [PATCH net-next v5] net: ipv6: Make address flushing on ifdown optional
On 10/14/15 7:06 PM, David Miller wrote:
> From: David Ahern
> Date: Wed, 14 Oct 2015 10:09:59 -0600
>
>> This latest patch makes IPv6 static addresses on par with IPv4,
>> including error paths.
>
> I don't agree with ipv4's behavior... and just because ipv4 does
> something poorly doesn't mean we get a free pass to replicate that
> laziness in ipv6.

As I stated, this patch makes IPv6 on par with IPv4 with regards to saving the address and lack of error handling back to the user should a failure happen on a link up.

Yes, it is best to give the user notification of a failure, but step back for a moment and look at the bigger picture: At best the address is saved and restored on a link up (the expected outcome for 99.99...% of the time). At worst the address is removed because the prefix route fails a memory allocation and the user is not notified. But that is exactly what happens today: the address is dropped and the user has to restore it.

As for the 1 failure path, it's a GFP_ATOMIC memory allocation failure. Frankly, if that happens, lack of an address on an interface is the least of the user's problems.

As for the options to fix this existing shortcoming:

1. The existing call_netdevice_notifiers infra does not allow a notifier to 'fail' the transaction and roll it back or even to give the user an error message.

2. Stashing the prefix route has its merits but it has to deal with error paths as well. What if the address is deleted? What if the mask is changed while the device is in a down state? What if the device is deleted? Sure, handle those cases, but what other paths are missing from that list?

Both paths introduce a lot of complexity, all b/c we want to save the address on a link down and restore the route on a link up. Why not take this as a starting point that at least does the right thing almost every time?

David
Re: [PATCH v4 1/4] Produce system time from correlated clocksource
Richard,

On Tue, 13 Oct 2015 14:12:24 -0700, Richard Cochran wrote:
> On Tue, Oct 13, 2015 at 09:15:51PM +0200, Thomas Gleixner wrote:
>
> Can we at least have a explanation of how the firmware operates? How
> are (ART,sys) pairs are generated, and how they are supposed to get
> into the DSP?

I'll give it a try. The audio controller has a set of registers almost exactly like those on the network device. The e1000e patch adds the e1000e_phc_get_ts() function. It writes a register to start the cross-timestamp process and some time later the hardware sets a bit indicating that it's finished. In the case of the network, the host polls for this bit to be set, indicating the cross-timestamp registers have valid data. In the audio DSP case, it is the DSP that's doing the polling and it can only poll once per millisecond.

The transfers look like:

Host -PCI (write request) -> DSP
[Transaction started from host]
DSP -PCI (write to initiate)-> Audio controller
[Transaction started from DSP]
DSP <-PCI (read to poll status)- Audio Controller
[Transaction Complete from DSP perspective]
DSP <-PCI (read (ART,device) pair)- Audio Controller
DSP -PCI (write notification) -> Host
[Transaction complete from Host perspective]
Host <-PCI read (ART,device) pair- DSP

I hope this is helpful.

Thanks.
Chris
[PATCH 1/1] xen-netfront: update num_queues to real created
xennet_create_queues() may sometimes fail to create all of the requested
queues, so num_queues must be updated to the number actually created to
avoid a NULL pointer dereference.

Signed-off-by: Joe Jin
Cc: Wei Liu
Cc: Ian Campbell
Cc: David S. Miller
---
 drivers/net/xen-netfront.c |    9 ++++++---
 1 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index f821a97..d580aec 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -1746,7 +1746,7 @@ static int xennet_create_queues(struct netfront_info *info,
 		dev_err(&info->netdev->dev, "no queues\n");
 		return -EINVAL;
 	}
-	return 0;
+	return num_queues;
 }
 
 /* Common code used when first setting up, and when resuming. */
@@ -1788,9 +1788,12 @@ static int talk_to_netback(struct xenbus_device *dev,
 	if (info->queues)
 		xennet_destroy_queues(info);
 
-	err = xennet_create_queues(info, num_queues);
-	if (err < 0)
+	/* Update queues number to real created */
+	num_queues = xennet_create_queues(info, num_queues);
+	if (num_queues < 0) {
+		err = num_queues;
 		goto destroy_ring;
+	}
 
 	/* Create shared ring, alloc event channel -- for each queue */
 	for (i = 0; i < num_queues; ++i) {
-- 
1.7.1
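The pattern this patch describes, returning the number of queues actually created and having the caller iterate over only that many, can be sketched in plain userspace C. Everything here is hypothetical (`toy_queue`, `toy_create_queues()`, the `fail_at` knob that simulates a setup failure); it is not the driver code. One caveat worth stating: the count is kept in a signed `int` here, because a negative error return can only be detected if the variable receiving it is signed.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a per-queue structure. */
struct toy_queue {
	int id;
};

/*
 * Try to set up `wanted` queues. `fail_at` simulates a failure at that
 * index (pass a value >= wanted for no failure). Returns the number of
 * usable queues, or -1 if none could be created.
 */
static int toy_create_queues(struct toy_queue **out, int wanted, int fail_at)
{
	struct toy_queue *q;
	int created = 0;

	if (wanted <= 0)
		return -1;
	q = calloc((size_t)wanted, sizeof(*q));
	if (q == NULL)
		return -1;

	while (created < wanted && created != fail_at) {
		q[created].id = created;
		created++;
	}
	if (created == 0) {
		free(q);
		return -1;
	}
	*out = q;
	return created;
}

/* Caller pattern: adopt the returned count before touching any queue. */
static int toy_sum_ids(int wanted, int fail_at)
{
	struct toy_queue *queues = NULL;
	int n = toy_create_queues(&queues, wanted, fail_at);
	int i, sum = 0;

	if (n < 0)
		return n;
	for (i = 0; i < n; i++)		/* loop bound is n, not wanted */
		sum += queues[i].id;
	free(queues);
	return sum;
}
```

With four queues requested and a simulated failure at index 2, the caller in this sketch walks only the two queues that exist.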
Re: [PATCH V2 2/2] bpf: control a set of perf events by creating a new ioctl PERF_EVENT_IOC_SET_ENABLER
On 2015/10/15 5:28, Alexei Starovoitov wrote:
> On 10/14/15 5:37 AM, Kaixu Xia wrote:
>> +	event->p_sample_disable = &enabler_event->sample_disable;
>
> I don't like it as a concept and it's buggy implementation.
> What happens here when enabler is alive, but other event is destroyed?
>
>> --- a/kernel/trace/bpf_trace.c
>> +++ b/kernel/trace/bpf_trace.c
>> @@ -221,9 +221,12 @@ static u64 bpf_perf_event_sample_control(u64 r1, u64 index, u64 flag, u64 r4, u6
>> 	struct bpf_array *array = container_of(map, struct bpf_array, map);
>> 	struct perf_event *event;
>>
>> -	if (unlikely(index >= array->map.max_entries))
>> +	if (unlikely(index > array->map.max_entries))
>> 		return -E2BIG;
>>
>> +	if (index == array->map.max_entries)
>> +		index = 0;
>
> what is this hack for ?
>
> Either use notification and user space disable or
> call bpf_perf_event_sample_control() manually for each cpu.

I will discard the current implementation, which controls a set of perf events through the 'enabler' event. Calling bpf_perf_event_sample_control() manually for each cpu is fine. Maybe we can add a loop to control all the events stored in maps by judging the index, OK?
Re: [PATCH net-next v2] netlink: Rightsize IFLA_AF_SPEC size calculation
From: Ronen Arad
Date: Wed, 14 Oct 2015 08:51:28 -0700

> @@ -900,7 +901,7 @@ static noinline size_t if_nlmsg_size(const struct net_device *dev,
> + rtnl_vfinfo_size(dev, ext_filter_mask) /* IFLA_VFINFO_LIST */
> + rtnl_port_size(dev, ext_filter_mask) /* IFLA_VF_PORTS + IFLA_PORT_SELF */
> + rtnl_link_get_size(dev) /* IFLA_LINKINFO */
> -+ rtnl_link_get_af_size(dev) /* IFLA_AF_SPEC */
> + + rtnl_link_get_af_size(dev, ext_filter_mask) /* IFLA_AF_SPEC */
> + nla_total_size(MAX_PHYS_ITEM_ID_LEN) /* IFLA_PHYS_PORT_ID */

Please don't change the indentation on this line, keep it matching the
indentation of all of the surrounding lines of this expression.

Thanks.
Re: [PATCH net-next] drivers/net: get rid of unnecessary initializations in .get_drvinfo()
From: Ivan Vecera
Date: Wed, 14 Oct 2015 18:27:52 +0200

> Many drivers initialize uselessly n_priv_flags, n_stats, testinfo_len,
> eedump_len & regdump_len fields in their .get_drvinfo() ethtool op.
> It's not necessary as these fields is filled in ethtool_get_drvinfo().
>
> Signed-off-by: Ivan Vecera
...
> drivers/net/usb/sr9800.c| 1 -

Please fix this unused variable warning added to this file and resubmit, thanks.
Re: [PATCH net-next V1 0/4] Mellanox driver update, Oct 14 2015
From: Or Gerlitz
Date: Wed, 14 Oct 2015 17:43:44 +0300

> Hi Dave,
>
> This series contains two more patches from Eli, patch from Majd
> to support PCI error handlers and a fix from Jack to mlx4 VFs
> when probed without a provisioned mac address.
>
> The patch set applied on top of net-next commit bbb300e "Merge branch
> 'bridge-vlan'"
>
> changes from V0:
> - made the health flag int --> bool to address comment from Dave on patch #1
> - fixed sparse warning noted by the 0-day build tests in patch #2

Series applied, thanks.
Re: [PATCH v4 1/4] Produce system time from correlated clocksource
Thomas, On Tue, 13 Oct 2015 12:42:52 -0700, Thomas Gleixner wrote: On Mon, 12 Oct 2015, Christopher S. Hall wrote: audio. This wants to be a seperate patch, really. OK. This makes sense, I'll do this the next time. +/* This needs to be 3 or greater for backtracking to be useful */ Why? The current index points to a copy and the next may be being changed by update_wall_time(). Leaving n-2 entries available with useful history in them. I'll add more descriptive comments here. +#define SHADOW_HISTORY_DEPTH 7 And that number is 7 because? Due to power of 2 it will be 8 instead. As above the useful history is 8-2*1 ms (1 ms is the minimum jiffy length). Array size 4 would not be enough history for the DSP which requires 4 ms of history, in the worst case. +static int shadow_index = -1; /* incremented to zero in What's the point of this? Aside of that, please do not use tail comments. It's removed. A check for validity is added below and this isn't necessary. That's silly. Make DEPTH a power of 2 and do: idx = (idx + 1) & (DEPTH - 1); This is changed. + true : *shadow_index_out < shadow_index; All this can go away. Yes. + /* Also make sure that entry is valid based on current shadow_index */ + *shadow_index_io = ret; + return true; You surely try hard to do stuff in the most unreadable way. Is like this easier to follow? +static struct timekeeper *search_shadow_history(cycles_t cycles, + struct clocksource *cs) +{ + struct timekeeper *tk = &tk_core.timekeeper; + int srchidx = shadow_index; + cycles_t cycles_start, cycles_end; + + cycles_start = tk->tkr_mono.cycle_last; + do { + srchidx = !srchidx-- ? 
srchidx+SHADOW_HISTORY_DEPTH : srchidx; + tk = shadow_timekeeper + srchidx; + + /* The next shadow entry may be in flight, don't use it */ + if (srchidx == ((shadow_index+1) & (SHADOW_HISTORY_DEPTH-1))) + return NULL; + + /* Make sure timekeeper is related to clock on this interval */ + if (tk->tkr_mono.clock != cs) + return NULL; + + cycles_end = cycles_start; + cycles_start = tk->tkr_mono.cycle_last; + } while (!cycle_between(cycles_start, cycles, cycles_end)); + + return tk; +} A check for validity is added here using the clocksource pointer. and inside of get_correlated_timestamp(): +* into account. If the value is in the past, try to backtrack +*/ + cycles_end = tk->tkr_mono.read(tk->tkr_mono.clock); + cycles_start = tk->tkr_mono.cycle_last; + if (!cycle_between(cycles_start, cycles, cycles_end)) { + tk = search_shadow_history(cycles, crs->related_cs); + if (!tk) + return -EAGAIN; + } + /* +* Get a timestamp from the device if get_ts is non-NULL +*/ + if( crt->get_ts ) { + ret = crt->get_ts(crt); + if (ret) + return ret; + } What's the point of this? Why are you not making the few lines which you can actually reuse a helper function and leave the PTP code alone? The audio driver is structured in such a way that it's simpler to provide a value rather than a callback. I changed this to allow the audio developers to provide an ART value as input. If a callback is provided, the resulting counter value is guaranteed to be later than cycle_last and there is no need to do extra checking (the goto skips that check). Is this an answer to your question? So I reached enf of patch and did not find anything in timekeeping_init() which tells that the index is incremented to 0. It really would need a comment, but why do you want to do that at all. It does not matter whether the first entry is at 0 or 1. You need a validity check for the entries anyway. I think this should be resolved. There's no sensitivity with regard to the start index with an added validity check. 
Thanks, Chris -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
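The two ideas discussed in this exchange, wrapping the history index with a power-of-two mask and walking backwards through validated entries to find the interval that contains a counter value, can be sketched in userspace as follows. This is an illustrative model only: `struct snap`, `struct history`, `hist_push()` and `hist_search()` are hypothetical names, the half-open interval convention is an assumption, counter wraparound is ignored, and no locking or sequence counting is modelled.

```c
#include <stddef.h>
#include <stdint.h>

#define HIST_DEPTH 8u			/* must be a power of two */
#define HIST_MASK  (HIST_DEPTH - 1u)

struct snap {
	uint64_t cycle_last;	/* counter value when the snapshot was taken */
	int valid;		/* 0 until the slot has been written once */
};

struct history {
	struct snap slot[HIST_DEPTH];
	unsigned int head;	/* index of the newest snapshot */
};

/* Advance the write index with a mask instead of a compare-and-wrap. */
static void hist_push(struct history *h, uint64_t cycle_last)
{
	h->head = (h->head + 1u) & HIST_MASK;
	h->slot[h->head].cycle_last = cycle_last;
	h->slot[h->head].valid = 1;
}

/*
 * Walk backwards from the newest snapshot and return the one whose
 * interval [cycle_last, next newer cycle_last) contains `cycles`, with
 * `now` closing the newest interval. The slot after head is never read,
 * modelling an entry that a writer may be filling. Returns NULL if no
 * valid entry covers the value.
 */
static const struct snap *hist_search(const struct history *h,
				      uint64_t cycles, uint64_t now)
{
	unsigned int in_flight = (h->head + 1u) & HIST_MASK;
	unsigned int idx = h->head;
	uint64_t end = now;

	while (idx != in_flight) {
		const struct snap *s = &h->slot[idx];

		if (!s->valid)
			return NULL;	/* ran out of written history */
		if (cycles >= s->cycle_last && cycles < end)
			return s;
		end = s->cycle_last;
		idx = (idx - 1u) & HIST_MASK;
	}
	return NULL;
}
```

Because every slot carries its own validity flag, the starting value of the index does not matter in this sketch, which matches the point made in the thread that a validity check is needed regardless.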
Re: [PATCH net v2 1/1] tipc: move fragment importance field to new header position
From: Jon Maloy
Date: Wed, 14 Oct 2015 09:23:18 -0400

> In commit e3eea1eb47a ("tipc: clean up handling of message priorities")
> we introduced a field in the packet header for keeping track of the
> priority of fragments, since this value is not present in the specified
> protocol header. Since the value so far only is used at the transmitting
> end of the link, we have not yet officially defined it as part of the
> protocol.
>
> Unfortunately, the field we use for keeping this value, bits 13-15 in
> word 5, has turned out to be a poor choice; it is already used by the
> broadcast protocol for carrying the 'network id' field of the sending
> node. Since packet fragments also need to be transported across the
> broadcast protocol, the risk of conflict is obvious, and we see this
> happen when we use network identities larger than 2^13-1. This has
> escaped our testing because we have so far only been using small network
> id values.
>
> We now move this field to bits 0-2 in word 9, a field that is guaranteed
> to be unused by all involved protocols.
>
> Fixes: e3eea1eb47a ("tipc: clean up handling of message priorities")
> Signed-off-by: Jon Maloy
> Acked-by: Ying Xue

Applied and queued up for -stable, thanks.
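The threshold of 2^13-1 mentioned in the commit message follows from simple bit arithmetic, which the sketch below illustrates. It is a simplified model, not the TIPC header code: `field_get()` and `field_set()` are hypothetical helpers working on a host-order 32-bit word, bit 0 is taken to be the least significant bit, and the sketch assumes the network id sits in the low bits of the same word as the 3-bit field at bits 13-15.

```c
#include <stdint.h>

/*
 * Read a field of `width` bits starting at bit `pos` (bit 0 = LSB).
 * Callers must keep pos + width <= 32.
 */
static uint32_t field_get(uint32_t word, unsigned int pos, unsigned int width)
{
	uint32_t mask = (width >= 32u) ? 0xffffffffu : ((1u << width) - 1u);

	return (word >> pos) & mask;
}

/* Return `word` with the field at `pos` replaced by `val`. */
static uint32_t field_set(uint32_t word, unsigned int pos, unsigned int width,
			  uint32_t val)
{
	uint32_t mask = (width >= 32u) ? 0xffffffffu : ((1u << width) - 1u);

	mask <<= pos;
	return (word & ~mask) | ((val << pos) & mask);
}
```

Under these assumptions, an id of 8191 (2^13-1) leaves bits 13-15 clear, so `field_get(8191, 13, 3)` is 0, while an id of 8192 sets bit 13 and a reader of the 3-bit field sees a nonzero value that was never written as a priority. A field at bits 0-2 of a different, otherwise unused word cannot overlap in this way.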
Re: [PATCH net-next] tcp/dccp: fix potential NULL deref in __inet_inherit_port()
From: Eric Dumazet
Date: Wed, 14 Oct 2015 05:58:38 -0700

> From: Eric Dumazet
>
> As we no longer hold listener lock in fast path, it is possible that a
> child is created right after listener freed its bound port, if a close()
> is done while incoming packets are processed.
>
> __inet_inherit_port() must detect this and return an error,
> so that caller can free the child earlier.
>
> Fixes: e994b2f0fb92 ("tcp: do not lock listener to process SYN packets")
> Fixes: 079096f103fa ("tcp/dccp: install syn_recv requests into ehash table")
> Signed-off-by: Eric Dumazet

Applied.
Re: [PATCH net-next] tcp: avoid spurious SYN flood detection at listen() time
From: Eric Dumazet
Date: Wed, 14 Oct 2015 06:16:49 -0700

> From: Eric Dumazet
>
> At listen() time, there is a small window where listener is visible with
> a zero backlog, triggering a spurious "Possible SYN flooding on port"
> message.
>
> Nothing prevents us from setting the correct backlog.
>
> Signed-off-by: Eric Dumazet

Applied.
Re: [PATCH] net: phy: aquantia/teranetics: Convert to use module_phy_driver macro
From: Axel Lin
Date: Wed, 14 Oct 2015 18:30:48 +0800

> Use module_phy_driver macro to simplify the code a bit.
>
> Signed-off-by: Axel Lin

Applied to net-next, thanks.
[PATCH v3 1/1] eventfd: implementation of EFD_MASK flag
From: Martin Sustrik

When implementing network protocols in user space, one has to implement
fake file descriptors to represent the sockets for the protocol. Polling
on such fake file descriptors is a problem (poll/select/epoll accept only
true file descriptors) and forces protocol implementers to use various
workarounds resulting in complex, non-standard and convoluted APIs.

More generally, the ability to create full-blown file descriptors for
userspace-to-userspace signalling is missing. While eventfd(2) goes half
the way towards this goal, it has the following shortcomings:

I.  There's no way to signal POLLPRI, POLLHUP etc.
II. There's no way to signal an arbitrary combination of POLL* flags. Most
    notably, simultaneous !POLLIN and !POLLOUT, which is a perfectly valid
    combination for a network protocol (rx buffer is empty and tx buffer
    is full), cannot be signaled using eventfd.

This patch implements a new EFD_MASK flag which solves the above problems.

The semantics of EFD_MASK are as follows:

eventfd(2):

If eventfd is created with the EFD_MASK flag set, it is initialised in such
a way as to signal no events on the file descriptor when it is polled on.
The 'initval' argument is ignored.

write(2):

User is allowed to write only buffers containing a 32-bit value
representing any combination of event flags as defined by the poll(2)
function (POLLIN, POLLOUT, POLLERR, POLLHUP etc.). Specified events will
be signaled when polling (select, poll, epoll) on the eventfd is done
later on.

read(2):

read is not supported and will fail with EINVAL.

select(2), poll(2) and similar:

When polling on the eventfd marked by the EFD_MASK flag, all the events
specified in the last written event flags shall be signaled.
Signed-off-by: Martin Sustrik [dhobs...@igel.co.jp: Rebased, and resubmitted for Linux 4.3] Signed-off-by: Damian Hobson-Garcia --- fs/eventfd.c | 102 ++- include/linux/eventfd.h | 16 +-- include/uapi/linux/eventfd.h | 33 ++ 3 files changed, 126 insertions(+), 25 deletions(-) create mode 100644 include/uapi/linux/eventfd.h diff --git a/fs/eventfd.c b/fs/eventfd.c index 8d0c0df..1310779 100644 --- a/fs/eventfd.c +++ b/fs/eventfd.c @@ -2,6 +2,7 @@ * fs/eventfd.c * * Copyright (C) 2007 Davide Libenzi + * Copyright (C) 2013 Martin Sustrik * */ @@ -22,18 +23,31 @@ #include #include +#define EFD_SHARED_FCNTL_FLAGS (O_CLOEXEC | O_NONBLOCK) +#define EFD_FLAGS_SET (EFD_SHARED_FCNTL_FLAGS | EFD_SEMAPHORE | EFD_MASK) +#define EFD_MASK_VALID_EVENTS (POLLIN | POLLPRI | POLLOUT | POLLERR | POLLHUP) + struct eventfd_ctx { struct kref kref; wait_queue_head_t wqh; - /* -* Every time that a write(2) is performed on an eventfd, the -* value of the __u64 being written is added to "count" and a -* wakeup is performed on "wqh". A read(2) will return the "count" -* value to userspace, and will reset "count" to zero. The kernel -* side eventfd_signal() also, adds to the "count" counter and -* issue a wakeup. -*/ - __u64 count; + union { + /* +* Every time that a write(2) is performed on an eventfd, the +* value of the __u64 being written is added to "count" and a +* wakeup is performed on "wqh". A read(2) will return the +* "count" value to userspace, and will reset "count" to zero. +* The kernel side eventfd_signal() also, adds to the "count" +* counter and issue a wakeup. +*/ + __u64 count; + + /* +* When using eventfd in EFD_MASK mode this stracture stores the +* current events to be signaled on the eventfd (events member) +* along with opaque user-defined data (data member). 
+*/ + __u32 events; + }; unsigned int flags; }; @@ -134,6 +148,14 @@ static unsigned int eventfd_poll(struct file *file, poll_table *wait) return events; } +static unsigned int eventfd_mask_poll(struct file *file, poll_table *wait) +{ + struct eventfd_ctx *ctx = file->private_data; + + poll_wait(file, &ctx->wqh, wait); + return ctx->events; +} + static void eventfd_ctx_do_read(struct eventfd_ctx *ctx, __u64 *cnt) { *cnt = (ctx->flags & EFD_SEMAPHORE) ? 1 : ctx->count; @@ -239,6 +261,14 @@ static ssize_t eventfd_read(struct file *file, char __user *buf, size_t count, return put_user(cnt, (__u64 __user *) buf) ? -EFAULT : sizeof(cnt); } +static ssize_t eventfd_mask_read(struct file *file, char __user *buf, + size_t count, + loff_t *ppos) +{ + return -EINVAL; +} + + static ssize_t eventfd_write(struct file *file, const char __user *buf, size_t count, loff_t *ppos) { @@ -286,6 +31
Re: [PATCH] ethtool: Use kcalloc instead of kmalloc for ethtool_get_strings
From: Joe Perches
Date: Wed, 14 Oct 2015 01:09:40 -0700

> It seems that kernel memory can leak into userspace by a
> kmalloc, ethtool_get_strings, then copy_to_user sequence.
>
> Avoid this by using kcalloc to zero fill the copied buffer.
>
> Signed-off-by: Joe Perches

Applied and queued up for -stable, thanks Joe.
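The leak pattern in the quoted commit message can be sketched in userspace C, with calloc() standing in for kcalloc() and malloc() for kmalloc(). Everything here is illustrative: the function names are hypothetical, GSTRING_LEN is assumed to mirror ETH_GSTRING_LEN, and the "stale" heap contents are simulated with a fill pattern rather than read from uninitialized memory.

```c
#include <stdlib.h>
#include <string.h>

#define GSTRING_LEN 32 /* assumed to mirror ETH_GSTRING_LEN */

/*
 * Allocate 'n' fixed-size string slots and copy one name into each.
 * Only the name and its terminating NUL are written, as a driver's
 * get_strings callback might do; the rest of each slot is left alone.
 */
static char *alloc_strings(const char *const *names, size_t n, int zeroed)
{
    char *buf = zeroed ? calloc(n, GSTRING_LEN) : malloc(n * GSTRING_LEN);

    if (!buf)
        return NULL;
    if (!zeroed)
        memset(buf, 0xAA, n * GSTRING_LEN); /* simulate stale heap data */

    for (size_t i = 0; i < n; i++) {
        char *slot = buf + i * GSTRING_LEN;
        size_t len = strlen(names[i]);

        if (len >= GSTRING_LEN)
            len = GSTRING_LEN - 1;
        memcpy(slot, names[i], len);
        slot[len] = '\0';
    }
    return buf;
}

/* Count non-zero bytes that follow the terminating NUL of each slot. */
static size_t stale_bytes(const char *buf, size_t n)
{
    size_t stale = 0;

    for (size_t i = 0; i < n; i++) {
        const char *slot = buf + i * GSTRING_LEN;

        for (size_t j = strlen(slot) + 1; j < GSTRING_LEN; j++)
            if (slot[j] != 0)
                stale++;
    }
    return stale;
}
```

Copying the whole buffer out, as copy_to_user does in the sequence described above, would expose every byte counted by stale_bytes(). With the zeroed allocation that count is zero.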
[PATCH v3 0/1] Generalize poll events from eventfd
Using eventfd user space can generate POLLIN/POLLOUT events but some
applications may want to generate POLLPRI/POLLERR events as well. This
patch submission aims to generalize the events generated by an eventfd.

This is a resubmission of a patch from Feb 2013[1]. The original
discussion trailed off without any conclusion, but the original author
has recently confirmed[2] that this functionality is still useful, so I
volunteered to rebase and resubmit the patch for discussion.

[1] https://lkml.org/lkml/2013/2/18/147
[2] https://lkml.org/lkml/2015/7/9/153

Changes in v3:
* replace efd_mask structure with scalar 'events' variable.

Changes in v2:
* rebased on Linux v4.3-rc1
* Move file operation implementations for EFD_MASK to a separate structure
* Remove 'data' element from efd_mask structure
* read() is no longer supported when EFD_MASK is set (fails with EINVAL)
* eventfd_ctx_fileget() now returns EINVAL when EFD_MASK is set,
  eliminating the possibility of triggering the original BUG_ON() macros
  which have now been removed.

Thank you,
Damian

Martin Sustrik (1):
  eventfd: implementation of EFD_MASK flag

 fs/eventfd.c                 | 91 ++--
 include/linux/eventfd.h      | 16 +---
 include/uapi/linux/eventfd.h | 40 +++
 3 files changed, 121 insertions(+), 26 deletions(-)
 create mode 100644 include/uapi/linux/eventfd.h

--
1.9.1
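The POLLIN/POLLOUT behaviour of a plain eventfd that this cover letter starts from can be observed with a short userspace sketch. It uses only the existing eventfd(2) and poll(2) interfaces, not the EFD_MASK flag proposed by the patch; the helper names efd_poll_now() and efd_add() are made up for illustration.

```c
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Return the events poll() reports for 'fd' right now (zero timeout), or -1. */
static int efd_poll_now(int fd)
{
    struct pollfd p = { .fd = fd, .events = POLLIN | POLLOUT | POLLPRI };

    if (poll(&p, 1, 0) < 0)
        return -1;
    return p.revents;
}

/* Add 'n' to the eventfd counter; returns 0 on success, -1 on failure. */
static int efd_add(int fd, uint64_t n)
{
    return write(fd, &n, sizeof(n)) == (ssize_t)sizeof(n) ? 0 : -1;
}
```

A freshly created eventfd with a zero counter polls as writable but not readable. After efd_add(fd, 1) it also polls readable. Nothing written this way makes it report POLLPRI, which is the gap the cover letter describes.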
Re: [PATCH net-next 0/4] net: dsa: mv88e6xxx: fix hardware bridging
On 14/10/15 18:28, Vivien Didelot wrote: > On Oct. Thursday 15 (42) 12:46 AM, Andrew Lunn wrote: >> On Sun, Oct 11, 2015 at 06:08:34PM -0400, Vivien Didelot wrote: >>> DSA and its drivers currently hook the NETDEV_CHANGEUPPER net_device event >>> in >>> order to configure the VLAN map of every port. >>> >>> This VLAN map is a feature of these switch chips to hardcode and restrict >>> which >>> output ports a given input port can egress frames to. >>> >>> A Linux bridge is a simple untagged VLAN propagated by the bridge code >>> itself. >>> With a proper 802.1Q support, a driver does not need this hook anymore, and >>> will simply program the related VLAN object. >>> >>> This patchset improves the hardware bridging code in the mv88e6xxx driver >>> with >>> a strict 802.1Q mode. >> >> Hi Vivien >> >> I just tested this as part of net-next/master, and found a problem >> >> If i do: >> >> ip link set lan0 up >> ip addr add 192.168.10.2/24 dev lan0 >> >> It will not ping. Looking in sys/kernel/debug/dsa0/stats i see >> broadcast packets, probably ARP, being received at the port. >> But they are not being forwarded out the CPU port. >> >> If however i do >> >> brctl addbr br0 >> brctl addif br0 lan0 >> ip addr add 192.168.10.2/24 dev br0 >> ip link set br0 up >> >> i can ping. >> >> So it looks like we are too restrictive by default. You should be able >> to use interfaces as they are, without a bridge. > > Correct, if the ports are not in a VLAN by default, they cannot talk. The expectation for DSA devices, if no bridge device is configured is to have each port be able to talk to the CPU port only, but this has to work out of the box. > > If you want to, I think the special VLAN 0 can be used for that purpose. > IIRC, in a given configuration, Linux add the interfaces (thus programs > the hardware) with VLAN 0. I'm not sure when, maybe when the > .ndo_vlan_rx_add_vid is implemented, I need to give it a shot. But if you do that, won't that put all DSA ports into VLAN 0? 
Would not that break isolation between each ports as expected for a DSA switch? > > Otherwise, I can send you a patch configuring the VLAN 0 on switch > setup if this is the behavior we want. > > Thanks, > -v > -- Florian -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next 0/4] net: dsa: mv88e6xxx: fix hardware bridging
On Oct. Thursday 15 (42) 12:46 AM, Andrew Lunn wrote: > On Sun, Oct 11, 2015 at 06:08:34PM -0400, Vivien Didelot wrote: > > DSA and its drivers currently hook the NETDEV_CHANGEUPPER net_device event > > in > > order to configure the VLAN map of every port. > > > > This VLAN map is a feature of these switch chips to hardcode and restrict > > which > > output ports a given input port can egress frames to. > > > > A Linux bridge is a simple untagged VLAN propagated by the bridge code > > itself. > > With a proper 802.1Q support, a driver does not need this hook anymore, and > > will simply program the related VLAN object. > > > > This patchset improves the hardware bridging code in the mv88e6xxx driver > > with > > a strict 802.1Q mode. > > Hi Vivien > > I just tested this as part of net-next/master, and found a problem > > If i do: > > ip link set lan0 up > ip addr add 192.168.10.2/24 dev lan0 > > It will not ping. Looking in sys/kernel/debug/dsa0/stats i see > broadcast packets, probably ARP, being received at the port. > But they are not being forwarded out the CPU port. > > If however i do > > brctl addbr br0 > brctl addif br0 lan0 > ip addr add 192.168.10.2/24 dev br0 > ip link set br0 up > > i can ping. > > So it looks like we are too restrictive by default. You should be able > to use interfaces as they are, without a bridge. Correct, if the ports are not in a VLAN by default, they cannot talk. If you want to, I think the special VLAN 0 can be used for that purpose. IIRC, in a given configuration, Linux add the interfaces (thus programs the hardware) with VLAN 0. I'm not sure when, maybe when the .ndo_vlan_rx_add_vid is implemented, I need to give it a shot. Otherwise, I can send you a patch configuring the VLAN 0 on switch setup if this is the behavior we want. Thanks, -v -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next v6 01/10] qed: Add module with basic common support
From: Yuval Mintz
Date: Wed, 14 Oct 2015 09:24:05 +0300

> +int qed_qm_pf_rt_init(struct qed_hwfn *p_hwfn,
> +                      struct qed_ptt *p_ptt,
> +                      u8 port_id,
> +                      u8 pf_id,
> +                      u8 max_phys_tcs_per_port,
> +                      bool is_first_pf,
> +                      u32 num_pf_cids,
> +                      u32 num_vf_cids,
> +                      u32 num_tids,
> +                      u16 start_pq,
> +                      u16 num_pf_pqs,
> +                      u16 num_vf_pqs,
> +                      u8 start_vport,
> +                      u8 num_vports,
> +                      u8 pf_wfq,
> +                      u32 pf_rl,
> +                      struct init_qm_pq_params *pq_params,
> +                      struct init_qm_vport_params *vport_params);

Sorry, this is completely ridiculous. No function should have so many
parameters.

If you need to pass this much information to a function, create a
structure in which to contain the values and pass a reference to that.
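The suggestion above can be sketched as follows. This is only an illustration of the pattern: the structure name qm_pf_rt_init_params, the function name qm_pf_rt_init and its stand-in body are hypothetical and are not taken from the qed driver. The field names follow the quoted declaration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint8_t u8;
typedef uint16_t u16;
typedef uint32_t u32;

/* Opaque in this sketch; the real definitions live in the driver. */
struct qed_hwfn;
struct qed_ptt;
struct init_qm_pq_params;
struct init_qm_vport_params;

/* Hypothetical parameter block gathering the former scalar arguments. */
struct qm_pf_rt_init_params {
    u8 port_id;
    u8 pf_id;
    u8 max_phys_tcs_per_port;
    bool is_first_pf;
    u32 num_pf_cids;
    u32 num_vf_cids;
    u32 num_tids;
    u16 start_pq;
    u16 num_pf_pqs;
    u16 num_vf_pqs;
    u8 start_vport;
    u8 num_vports;
    u8 pf_wfq;
    u32 pf_rl;
    struct init_qm_pq_params *pq_params;
    struct init_qm_vport_params *vport_params;
};

/*
 * Stand-in body: rejects a missing parameter block and otherwise returns
 * the total number of PQs requested, just to show the fields being read.
 */
static int qm_pf_rt_init(struct qed_hwfn *p_hwfn, struct qed_ptt *p_ptt,
                         const struct qm_pf_rt_init_params *p)
{
    (void)p_hwfn;
    (void)p_ptt;

    if (!p)
        return -1;
    return p->num_pf_pqs + p->num_vf_pqs;
}
```

A caller fills in the structure with designated initializers, so each value is named at the call site and unset fields default to zero.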
RE: [PATCH net-next 0/4] Rightsize IFLA_AF_SPEC size calculation
>-----Original Message-----
>From: David Miller [mailto:da...@davemloft.net]
>Sent: Wednesday, October 14, 2015 6:44 PM
>To: Arad, Ronen
>Cc: netdev@vger.kernel.org
>Subject: Re: [PATCH net-next 0/4] Rightsize IFLA_AF_SPEC size calculation
>
>From: Ronen Arad
>Date: Tue, 13 Oct 2015 22:58:30 -0700
>
>> if_nlmsg_size() overestimates the minimum allocation size of netlink dump
>> request (when called from rtnl_calcit()) or the size of the message (when
>> called from rtnl_getlink()). This is because ext_filter_mask is not
>> supported by rtnl_link_get_af_size() and rtnl_link_get_size().
>>
>> The over-estimation is significant when at least one netdev has many VLANs
>> configured (8 bytes for each configured VLAN).
>>
>> This patch-set "rightsizes" the protocol specific attribute size
>> calculation by propagating ext_filter_mask to rtnl_link_get_af_size() and
>> adding optional filtering aware get_af_size_filtered op in struct
>> rtnl_af_ops. Bridge module, which already used filtering aware sizing for
>> notification, is enhanced to do the same for netlink dump requests.
>
>There are only three implementations of get_link_af_size, so please just
>simply change its signature by adding the ext_filter_mask parameter instead
>of creating a completely new operation.

[@Ronen] I've already submitted a V2 that does that in a simplified
single part patch.
Re: [PATCH net-next 0/4] Rightsize IFLA_AF_SPEC size calculation
From: Ronen Arad
Date: Tue, 13 Oct 2015 22:58:30 -0700

> if_nlmsg_size() overestimates the minimum allocation size of netlink dump
> request (when called from rtnl_calcit()) or the size of the message (when
> called from rtnl_getlink()). This is because ext_filter_mask is not
> supported by rtnl_link_get_af_size() and rtnl_link_get_size().
>
> The over-estimation is significant when at least one netdev has many VLANs
> configured (8 bytes for each configured VLAN).
>
> This patch-set "rightsizes" the protocol specific attribute size
> calculation by propagating ext_filter_mask to rtnl_link_get_af_size() and
> adding optional filtering aware get_af_size_filtered op in struct
> rtnl_af_ops. Bridge module, which already used filtering aware sizing for
> notification, is enhanced to do the same for netlink dump requests.

There are only three implementations of get_link_af_size, so please just
simply change its signature by adding the ext_filter_mask parameter instead
of creating a completely new operation.
Re: [PATCH net-next] net: hisilicon net: fix a bug about led
From: yankejian
Date: Wed, 14 Oct 2015 10:28:57 +0800

> From: lipeng
>
> this patch fixes a bug in hns driver. the link led is on at the beginning,
> but at this time the ethernet port is on down status. it needs to reset
> the led status on init sequence.
>
> Signed-off-by: lipeng
> Signed-off-by: yankejian

Applied.
Re: [PATCH net-next] cxgb4i: Increased the value of MAX_IMM_TX_PKT_LEN from 128 to 256 bytes
From: Karen Xie
Date: Tue, 13 Oct 2015 17:13:59 -0700

> This helps improving the latency of small packets.
>
> Signed-off-by: Rakesh Ranjan
> Signed-off-by: Karen Xie

Applied, thanks.
Re: [PATCH 1/1] net: phy: bcm-phy-lib: Fix module license issue
From: Arun Parameswaran
Date: Tue, 13 Oct 2015 13:40:12 -0700

> The 'bcm-phy-lib.c', added as a part of the commit
> "net: phy: Add Broadcom phy library for common interfaces"
> was missing the module license. This was causing an issue
> when the library is built as a module; "module license
> 'unspecified' taints kernel".
>
> This patch fixes the issue by adding the module license,
> author and description to the bcm-phy-lib.c file.
>
> Fixes: a1cba5613edf5 ("net: phy: Add Broadcom phy library for common interfaces")
> Signed-off-by: Arun Parameswaran

This patch doesn't apply to my net-next tree at all.
Re: [PATCH RESEND 1/2] ixgb:Remove reducant error path after call to ixgb_sw_init in the function ixgb_probe
On Wed, 2015-10-14 at 18:57 -0400, Nicholas Krause wrote:
> This removes the redundant error path and now no longer used goto
> label err_sw_init after the call to ixgb_probe in the function
> ixgb_sw_init after calling this function due to it always returning
> zero as it is guaranteed to run successfully without any issues.
>
> Signed-off-by: Nicholas Krause
> ---
>  drivers/net/ethernet/intel/ixgb/ixgb_main.c | 5 +
>  1 file changed, 1 insertion(+), 4 deletions(-)

This driver (ixgb), as well as e100 and e1000, is in maintenance mode,
which means bug fixes ONLY!

Is this patch necessary? Answer: No
Is this a bug fix? Answer: No
Should you have sent this patch? See answers to previous questions.

Please ask these questions to yourself when putting together a patch
against these drivers (listed above). With that said, dropping this
series.
Re: pull-request: can-next 2015-09-17
From: Marc Kleine-Budde
Date: Tue, 13 Oct 2015 18:08:01 +0200

> this is a pull request of 4 patches for net-next/master.
>
> Two patches are by Gerhard Bertelsmann, fixing some problems in the
> sun4i driver. The patch by Arnd Bergmann stops using timeval for the
> CAN broadcast manager. The last patch by Alexandre Belloni removes the
> otherwise unused struct at91_can_data from the driver.

Pulled, thank you.
Re: pull-request: mac80211 2015-10-13
From: Johannes Berg
Date: Tue, 13 Oct 2015 10:59:47 +0200

> There are just two small fixes, but I didn't really want to wait since
> I have nothing else pending.
>
> Let me know if there's any problem.

Pulled, thanks.
linux-next: manual merge of the net-next tree with the net tree
Hi all, Today's linux-next merge of the net-next tree got a conflict in: net/switchdev/switchdev.c between commit: 87aaf2caed84 ("switchdev: check if the vlan id is in the proper vlan range") from the net tree and commits: 7ea6eb3f56f4 ("switchdev: introduce transaction item queue for attr_set and obj_add") ab0690023018 ("net: switchdev: abstract object in add/del ops" from the net-next tree. I fixed it up (see below) and can carry the fix as necessary (no action is required). -- Cheers, Stephen Rothwells...@canb.auug.org.au diff --cc net/switchdev/switchdev.c index 77f5d17e2612,b8aaf820ef65.. --- a/net/switchdev/switchdev.c +++ b/net/switchdev/switchdev.c @@@ -16,7 -16,7 +16,8 @@@ #include #include #include +#include + #include #include #include @@@ -635,32 -722,33 +723,35 @@@ static int switchdev_port_br_afspec(str if (nla_len(attr) != sizeof(struct bridge_vlan_info)) return -EINVAL; vinfo = nla_data(attr); + if (!vinfo->vid || vinfo->vid >= VLAN_VID_MASK) + return -EINVAL; - vlan->flags = vinfo->flags; + vlan.flags = vinfo->flags; if (vinfo->flags & BRIDGE_VLAN_INFO_RANGE_BEGIN) { - if (vlan->vid_begin) + if (vlan.vid_begin) + return -EINVAL; + vlan.vid_begin = vinfo->vid; + /* don't allow range of pvids */ + if (vlan.flags & BRIDGE_VLAN_INFO_PVID) return -EINVAL; - vlan->vid_begin = vinfo->vid; } else if (vinfo->flags & BRIDGE_VLAN_INFO_RANGE_END) { - if (!vlan->vid_begin) + if (!vlan.vid_begin) return -EINVAL; - vlan->vid_end = vinfo->vid; - if (vlan->vid_end <= vlan->vid_begin) + vlan.vid_end = vinfo->vid; + if (vlan.vid_end <= vlan.vid_begin) return -EINVAL; - err = f(dev, &obj); + err = f(dev, &vlan.obj); if (err) return err; - memset(vlan, 0, sizeof(*vlan)); + memset(&vlan, 0, sizeof(vlan)); } else { - if (vlan->vid_begin) + if (vlan.vid_begin) return -EINVAL; - vlan->vid_begin = vinfo->vid; - vlan->vid_end = vinfo->vid; - err = f(dev, &obj); + vlan.vid_begin = vinfo->vid; + vlan.vid_end = vinfo->vid; + err = f(dev, &vlan.obj); if (err) return err; - 
memset(vlan, 0, sizeof(*vlan)); + memset(&vlan, 0, sizeof(vlan)); } } -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
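The checks combined by the merge resolution above (the VLAN id bound from the net tree, plus the range handling and the "no range of pvids" rule from net-next) can be restated as a standalone userspace sketch. It is not the switchdev code: the types, the function name and the output array are hypothetical, and the constants are assumed to mirror VLAN_VID_MASK and the BRIDGE_VLAN_INFO_* flag bits.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Values assumed to mirror VLAN_VID_MASK and the BRIDGE_VLAN_INFO_* flags. */
#define VID_MASK         0x0fff
#define INFO_PVID        (1 << 1)
#define INFO_RANGE_BEGIN (1 << 3)
#define INFO_RANGE_END   (1 << 4)

struct vlan_info  { uint16_t flags; uint16_t vid; };
struct vlan_range { uint16_t flags; uint16_t vid_begin; uint16_t vid_end; };

/*
 * Walk 'n' entries and store each completed range in 'out' (capacity 'cap').
 * Returns the number of ranges stored, or a negative errno value.
 */
static int parse_vlan_ranges(const struct vlan_info *in, size_t n,
                             struct vlan_range *out, size_t cap)
{
    struct vlan_range cur = {0};
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        const struct vlan_info *v = &in[i];
        int complete = 0;

        /* Bound check on the VLAN id. */
        if (!v->vid || v->vid >= VID_MASK)
            return -EINVAL;

        cur.flags = v->flags;
        if (v->flags & INFO_RANGE_BEGIN) {
            if (cur.vid_begin)
                return -EINVAL;
            cur.vid_begin = v->vid;
            /* don't allow range of pvids */
            if (cur.flags & INFO_PVID)
                return -EINVAL;
        } else if (v->flags & INFO_RANGE_END) {
            if (!cur.vid_begin)
                return -EINVAL;
            cur.vid_end = v->vid;
            if (cur.vid_end <= cur.vid_begin)
                return -EINVAL;
            complete = 1;
        } else {
            /* A single VLAN is a range of length one. */
            if (cur.vid_begin)
                return -EINVAL;
            cur.vid_begin = v->vid;
            cur.vid_end = v->vid;
            complete = 1;
        }

        if (complete) {
            if (count == cap)
                return -ENOSPC;
            out[count++] = cur;
            cur = (struct vlan_range){0};
        }
    }
    return (int)count;
}
```

Where the kernel code hands each completed object to the callback f(), this sketch appends it to an array so the result can be inspected directly.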
Exporting some of sysctls from net/ipv4 and net/core to a net namespace
Hi,

It seems that, due to the following patch set in Linux v3.5

[PATCH net-next 00/19] net: Sysctl simplications and enhancements
http://comments.gmane.org/gmane.linux.network/227965

some of the previously visible sysctl variables in net/core and net/ipv4
have become invisible.

Is there a possibility that bringing some of those parameters back as
read-only in a net namespace could be considered?

Thanks in advance.

--
Regards,
Thomas
Re: [PATCH net-next v5] net: ipv6: Make address flushing on ifdown optional
From: David Ahern
Date: Wed, 14 Oct 2015 10:09:59 -0600

> This latest patch makes IPv6 static addresses on par with IPv4,
> including error paths.

I don't agree with ipv4's behavior... and just because ipv4 does
something poorly doesn't mean we get a free pass to replicate that
laziness in ipv6.
Re: [PATCH net] openvswitch: Scrub skb between namespaces
On Wed, Oct 14, 2015 at 11:10 AM, Joe Stringer wrote: > If OVS receives a packet from another namespace, then the packet should > be scrubbed. However, people have already begun to rely on the behaviour > that skb->mark is preserved across namespaces, so retain this one field. > > This is mainly to address information leakage between namespaces when > using OVS internal ports, but by placing it in ovs_vport_receive() it is > more generally applicable, meaning it should not be overlooked if other > port types are allowed to be moved into namespaces in future. > > Signed-off-by: Joe Stringer > --- > I originally proposed this patch as part of the conntrack changes to OVS, > and there was some discussion on that thread, culminating here: > http://www.spinics.net/lists/netdev/msg338626.html > > We also discussed this a bit in Seattle, however I didn't follow up > immediately so I don't exactly recall what the consensus was. Following > Jesse's direction in the above thread, I'm proposing that we preserve the > mark, but scrub the rest. Also fixed the use-after-free bug present in the > previous version. > > I think this is relevant for 'net', because this is the first time that > the metadata_dst and nfct are exposed (albeit indirectly) through OVS so it > would be nice to get agreement on the expected behaviour. > --- > net/openvswitch/vport.c | 7 +++ > 1 file changed, 7 insertions(+) > > diff --git a/net/openvswitch/vport.c b/net/openvswitch/vport.c > index fc5c0b9ccfe9..70f19ea99b92 100644 > --- a/net/openvswitch/vport.c > +++ b/net/openvswitch/vport.c > @@ -440,10 +440,17 @@ int ovs_vport_receive(struct vport *vport, struct > sk_buff *skb, > const struct ip_tunnel_info *tun_info) > { > struct sw_flow_key key; > + u32 mark = skb->mark; > int error; > > OVS_CB(skb)->input_vport = vport; > OVS_CB(skb)->mru = 0; > + if (dev_net(skb->dev) != ovs_dp_get_net(vport->dp)) { This should be marked as unlikely. 
> + skb_scrub_packet(skb, true); > + tun_info = NULL; > + } > + skb->mark = mark; Lets move this to skb scrub block. in other cases this not required. > + > /* Extract flow from 'skb' into 'key'. */ > error = ovs_flow_key_extract(tun_info, skb, &key); > if (unlikely(error)) { > -- > 2.1.4 > -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] sunrpc: fix waitqueue_active without memory barrier in sunrpc
J. Bruce Fields wrote: > On Wed, Oct 14, 2015 at 03:57:13AM +, Kosuke Tatsukawa wrote: >> J. Bruce Fields wrote: >> > On Mon, Oct 12, 2015 at 10:41:06AM +, Kosuke Tatsukawa wrote: >> >> J. Bruce Fields wrote: >> >> > On Fri, Oct 09, 2015 at 06:29:44AM +, Kosuke Tatsukawa wrote: >> >> >> Neil Brown wrote: >> >> >> > Kosuke Tatsukawa writes: >> >> >> > >> >> >> >> There are several places in net/sunrpc/svcsock.c which calls >> >> >> >> waitqueue_active() without calling a memory barrier. Add a memory >> >> >> >> barrier just as in wq_has_sleeper(). >> >> >> >> >> >> >> >> I found this issue when I was looking through the linux source code >> >> >> >> for places calling waitqueue_active() before wake_up*(), but without >> >> >> >> preceding memory barriers, after sending a patch to fix a similar >> >> >> >> issue in drivers/tty/n_tty.c (Details about the original issue can >> >> >> >> be >> >> >> >> found here: https://lkml.org/lkml/2015/9/28/849). >> >> >> > >> >> >> > hi, >> >> >> > this feels like the wrong approach to the problem. It requires extra >> >> >> > 'smb_mb's to be spread around which are hard to understand as easy to >> >> >> > forget. >> >> >> > >> >> >> > A quick look seems to suggest that (nearly) every waitqueue_active() >> >> >> > will need an smb_mb. Could we just put the smb_mb() inside >> >> >> > waitqueue_active()?? >> >> >> >> >> >> >> >> >> There are around 200 occurrences of waitqueue_active() in the kernel >> >> >> source, and most of the places which use it before wake_up are either >> >> >> protected by some spin lock, or already has a memory barrier or some >> >> >> kind of atomic operation before it. >> >> >> >> >> >> Simply adding smp_mb() to waitqueue_active() would incur extra cost in >> >> >> many cases and won't be a good idea. 
>> >> >> >> >> >> Another way to solve this problem is to remove the waitqueue_active(), >> >> >> making the code look like this; >> >> >>if (wq) >> >> >>wake_up_interruptible(wq); >> >> >> This also fixes the problem because the spinlock in the wake_up*() acts >> >> >> as a memory barrier and prevents the code from being reordered by the >> >> >> CPU (and it also makes the resulting code is much simpler). >> >> > >> >> > I might not care which we did, except I don't have the means to test >> >> > this quickly, and I guess this is some of our most frequently called >> >> > code. >> >> > >> >> > I suppose your patch is the most conservative approach, as the >> >> > alternative is a spinlock/unlock in wake_up_interruptible, which I >> >> > assume is necessarily more expensive than an smp_mb(). >> >> > >> >> > As far as I can tell it's been this way since forever. (Well, since a >> >> > 2002 patch "NFSD: TCP: rationalise locking in RPC server routines" which >> >> > removed some spinlocks from the data_ready routines.) >> >> > >> >> > I don't understand what the actual race is yet (which code exactly is >> >> > missing the wakeup in this case? nfsd threads seem to instead get >> >> > woken up by the wake_up_process() in svc_xprt_do_enqueue().) >> >> >> >> Thank you for the reply. I tried looking into this. >> >> >> >> The callbacks in net/sunrpc/svcsock.c are set up in svc_tcp_init() and >> >> svc_udp_init(), which are both called from svc_setup_socket(). >> >> svc_setup_socket() is called (indirectly) from lockd, nfsd, and nfsv4 >> >> callback port related code. >> >> >> >> Maybe I'm wrong, but there might not be any kernel code that is using >> >> the socket's wait queue in this case. >> > >> > As Trond points out there are probably waiters internal to the >> > networking code. >> >> Trond and Bruce, thank you for the comment. I was able to find the call >> to the wait function that was called from nfsd. 
>> >> sk_stream_wait_connect() and sk_stream_wait_memory() were called from >> either do_tcp_sendpages() or tcp_sendmsg() called from within >> svc_send(). sk_stream_wait_connect() shouldn't be called at this point, >> because the socket has already been used to receive the rpc request. >> >> On the wake_up side, sk_write_space() is called from the following >> locations. The relevant ones seems to be preceded by atomic_sub or a >> memory barrier. >> + ksocknal_write_space >> [drivers/staging/lustre/lnet/klnds/socklnd/socklnd_lib.c:633] >> + atm_pop_raw [net/atm/raw.c:40] >> + sock_setsockopt [net/core/sock.c:740] >> + sock_wfree [net/core/sock.c:1630] >> Preceded by atomic_sub in sock_wfree() >> + ccid3_hc_tx_packet_recv [net/dccp/ccids/ccid3.c:442] >> + do_tcp_sendpages [net/ipv4/tcp.c:1008] >> + tcp_sendmsg [net/ipv4/tcp.c:1300] >> + do_tcp_setsockopt [net/ipv4/tcp.c:2597] >> + tcp_new_space [net/ipv4/tcp_input.c:4885] >> Preceded by smp_mb__after_atomic in tcp_check_space() >> + llc_conn_state_process [net/llc/llc_conn.c:148] >> + pipe_rcv_status [net/phonet/pep.c:312] >> + pipe_do_rcv [net/phonet/pep.c:440] >> + pipe_start_flow_control [net/phonet/pep.c:554] >> + svc_sock_set
Re: [PATCH net-next] switchdev: enforce no pvid flag in vlan ranges
On Oct. Wednesday 14 (42) 03:08 PM, Florian Fainelli wrote: > On 14/10/15 11:51, Vivien Didelot wrote: > > On Oct. Wednesday 14 (42) 08:42 PM, Ido Schimmel wrote: > >> Wed, Oct 14, 2015 at 08:14:24PM IDT, sfel...@gmail.com wrote: > >>> On Wed, Oct 14, 2015 at 8:25 AM, Vivien Didelot > >>> wrote: > On Oct. Wednesday 14 (42) 09:14 AM, Ido Schimmel wrote: > > Tue, Oct 13, 2015 at 05:32:26PM IDT, > > vivien.dide...@savoirfairelinux.com wrote: > >> On Oct. Tuesday 13 (42) 11:31 AM, Ido Schimmel wrote: > >>> Mon, Oct 12, 2015 at 08:36:25PM IDT, > >>> vivien.dide...@savoirfairelinux.com wrote: > Hi guys, > > On Oct. Monday 12 (42) 02:01 PM, Nikolay Aleksandrov wrote: > > From: Nikolay Aleksandrov > > > > We shouldn't allow BRIDGE_VLAN_INFO_PVID flag in VLAN ranges. > > > > Signed-off-by: Nikolay Aleksandrov > > --- > > net/switchdev/switchdev.c | 3 +++ > > 1 file changed, 3 insertions(+) > > > > diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c > > index 6e4a4f9ad927..256c596de896 100644 > > --- a/net/switchdev/switchdev.c > > +++ b/net/switchdev/switchdev.c > > @@ -720,6 +720,9 @@ static int switchdev_port_br_afspec(struct > > net_device *dev, > > if (vlan.vid_begin) > > return -EINVAL; > > vlan.vid_begin = vinfo->vid; > > + /* don't allow range of pvids */ > > + if (vlan.flags & BRIDGE_VLAN_INFO_PVID) > > + return -EINVAL; > > } else if (vinfo->flags & > > BRIDGE_VLAN_INFO_RANGE_END) { > > if (!vlan.vid_begin) > > return -EINVAL; > > -- > > 2.4.3 > > > > Yes the patch looks good, but it is a minor check though. I hope the > subject of this thread is making sense. > > VLAN ranges seem to have been included for an UX purpose (so commands > look like Cisco IOS). We don't want to change any existing > interface, so > we pushed that down to drivers, with the only valid reason that, > maybe > one day, an hardware can be capable of programming a range on a > per-port > basis. > >>> Hi, > >>> > >>> That's actually what we are doing in mlxsw. 
We can do up to 256 > >>> entries in > >>> one go. We've yet to submit this part. > >> > >> Perfect Ido, thanks for pointing this out! I'm OK with the range then. > >> > >> So there is now a very last question in my head for this, which is more > >> a matter of kernel design. Should the user be aware of such underlying > >> support? In other words, would it make sense to do this in a driver: > >> > >>foo_port_vlan_add(struct net_device *dev, > >> struct switchdev_obj_port_vlan *vlan) > >>{ > >>if (vlan->vid_begin != vlan->vid_end) > >>return -ENOTSUPP; /* or something more relevant for user */ > >> > >>return foo_port_single_vlan_add(dev, vlan->vid_begin); > >>} > >> > >> So drivers keep being simple, and we can easily propagate the fact that > >> one-or-all VLAN is not supportable, vs. the VLAN feature itself is not > >> implemented and must be done in software. > > I think that if you want to keep it simple, then Scott's advice from the > > previous thread is the most appropriate one. I believe the hardware you > > are using is simply not meant to support multiple 802.1Q bridges. > > You mean allowing only one Linux bridge over an hardware switch? > > It would for sure simplify how, as developers and users, we represent a > physical switch. But I am not sure how to achieve that and I don't have > strong opinions on this TBH. > >>> > >>> Hi Vivien, I think it's possible to keep switch ports on just one > >>> bridge if we do a little bit of work on the NETDEV_CHANGEUPPER > >>> notifier. This will give you the driver-level control you want. Do > >>> you have time to investigate? The idea is: > >>> > >>> 1) In your driver's handler for NETDEV_CHANGEUPPER, if switch port is > >>> being added to a second bridge,then return NOTIFY_BAD. Your driver > >>> needs to track the bridge count. > >>> > >>> 2) In __netdev_upper_dev_link(), check the return code from the > >>> call_netdevice_notifiers_info(NETDEV_CHANGEUPPER, ...) 
call, and if > >>> NOTIFY_BAD, abort the linking operation (goto rollback_xxx). > >>> > >> Hi, > >> > >> We are doing something similar in mlxsw (not upstream yet). Jiri > >> introduced PRE_CHANGEUPPER, which is called from the function you > >
[PATCH net-next] net: Fix suspicious RCU usage in fib_rebalance
This command:

    ip route add 192.168.1.0/24 nexthop via 10.2.1.5 dev eth1 nexthop via 10.2.2.5 dev eth2

generated this suspicious RCU usage message:

[ 63.249262]
[ 63.249939] ===
[ 63.251571] [ INFO: suspicious RCU usage. ]
[ 63.253250] 4.3.0-rc3+ #298 Not tainted
[ 63.254724] ---
[ 63.256401] ../include/linux/inetdevice.h:205 suspicious rcu_dereference_check() usage!
[ 63.259450]
[ 63.259450] other info that might help us debug this:
[ 63.259450]
[ 63.262297]
[ 63.262297] rcu_scheduler_active = 1, debug_locks = 1
[ 63.264647] 1 lock held by ip/2870:
[ 63.265896] #0: (rtnl_mutex){+.+.+.}, at: [] rtnl_lock+0x12/0x14
[ 63.268858]
[ 63.268858] stack backtrace:
[ 63.270409] CPU: 4 PID: 2870 Comm: ip Not tainted 4.3.0-rc3+ #298
[ 63.272478] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[ 63.275745] 0001 8800b8c9f8b8 8125f73c 88013afcf301
[ 63.278185] 8800bab7a380 8800b8c9f8e8 8107bf30 8800bb728000
[ 63.280634] 880139fe9a60 880139fe9a00 8800b8c9f908
[ 63.283177] Call Trace:
[ 63.283959] [] dump_stack+0x4c/0x68
[ 63.285593] [] lockdep_rcu_suspicious+0xfa/0x103
[ 63.287500] [] __in_dev_get_rcu+0x48/0x4f
[ 63.289169] [] fib_rebalance+0x3e/0x127
[ 63.290753] [] ? rcu_read_unlock+0x3e/0x5f
[ 63.292442] [] fib_create_info+0xaf9/0xdcc
[ 63.294093] [] ? sched_clock_local+0x12/0x75
[ 63.295791] [] fib_table_insert+0x8c/0x451
[ 63.297493] [] ? fib_get_table+0x36/0x43
[ 63.299109] [] inet_rtm_newroute+0x43/0x51
[ 63.300709] [] rtnetlink_rcv_msg+0x182/0x195
[ 63.302334] [] ? trace_hardirqs_on+0xd/0xf
[ 63.303888] [] ? rtnl_lock+0x12/0x14
[ 63.305346] [] ? __rtnl_unlock+0x12/0x12
[ 63.306878] [] netlink_rcv_skb+0x3d/0x90
[ 63.308437] [] rtnetlink_rcv+0x21/0x28
[ 63.309916] [] netlink_unicast+0xfa/0x17f
[ 63.311447] [] netlink_sendmsg+0x297/0x2dc
[ 63.313029] [] sock_sendmsg_nosec+0x12/0x1d
[ 63.314597] [] ___sys_sendmsg+0x196/0x21b
[ 63.316125] [] ? native_sched_clock+0x1f/0x3c
[ 63.317671] [] ? sched_clock_local+0x12/0x75
[ 63.319185] [] ? sched_clock_cpu+0x9d/0xb6
[ 63.320693] [] ? __lock_is_held+0x32/0x54
[ 63.322145] [] ? __fget_light+0x4b/0x77
[ 63.323541] [] __sys_sendmsg+0x3d/0x5b
[ 63.324947] [] SyS_sendmsg+0xd/0x19
[ 63.326274] [] entry_SYSCALL_64_fastpath+0x12/0x6f

It looks like all of the code paths to fib_rebalance are under rtnl.

Fixes: 0e884c78ee19 ("ipv4: L3 hash-based multipath")
Cc: Peter Nørlund
Signed-off-by: David Ahern
---
 net/ipv4/fib_semantics.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index af77298c8b4f..42778d9d71e5 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -545,7 +545,7 @@ static void fib_rebalance(struct fib_info *fi)
 		if (nh->nh_flags & RTNH_F_DEAD)
 			continue;
 
-		in_dev = __in_dev_get_rcu(nh->nh_dev);
+		in_dev = __in_dev_get_rtnl(nh->nh_dev);
 
 		if (in_dev &&
 		    IN_DEV_IGNORE_ROUTES_WITH_LINKDOWN(in_dev) &&
@@ -559,7 +559,7 @@ static void fib_rebalance(struct fib_info *fi)
 	change_nexthops(fi) {
 		int upper_bound;
 
-		in_dev = __in_dev_get_rcu(nexthop_nh->nh_dev);
+		in_dev = __in_dev_get_rtnl(nexthop_nh->nh_dev);
 
 		if (nexthop_nh->nh_flags & RTNH_F_DEAD) {
 			upper_bound = -1;
--
1.9.1
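The difference between the two accessors can be illustrated with a small userspace model. This is only a sketch: every `model_*` name, the boolean lock flags and the report counter are hypothetical stand-ins, not kernel APIs. It assumes what the trace reports, namely that the RCU accessor is validated against the RCU read-side lock, while the RTNL accessor is validated against rtnl_mutex.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; only the fields the
 * sketch needs are modelled. */
struct model_in_device {
	int ignore_routes_with_linkdown;
};

struct model_net_device {
	struct model_in_device *ip_ptr;
};

/* Model of the lock state that lockdep would track. */
static bool model_rcu_read_lock_held;
static bool model_rtnl_held;

/* Counts how many times a "suspicious usage" report would be emitted. */
static int model_suspicious_reports;

/* Models an accessor that is only valid inside an RCU read-side section. */
static struct model_in_device *
model_in_dev_get_rcu(const struct model_net_device *dev)
{
	assert(dev != NULL);
	if (!model_rcu_read_lock_held)
		model_suspicious_reports++;	/* lockdep would complain here */
	return dev->ip_ptr;
}

/* Models an accessor that is only valid with the RTNL mutex held. */
static struct model_in_device *
model_in_dev_get_rtnl(const struct model_net_device *dev)
{
	assert(dev != NULL);
	if (!model_rtnl_held)
		model_suspicious_reports++;	/* lockdep would complain here */
	return dev->ip_ptr;
}
```

With only the RTNL flag set, which mirrors the "1 lock held by ip/2870" line in the report, the RCU variant bumps the counter while the RTNL variant does not. Both return the same pointer; only the locking assumption being checked differs.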
Re: NULL pointer dereference in rt6_get_cookie()
On Thu, Oct 15, 2015 at 12:34:13AM +0200, Phil Sutter wrote: > Hi Martin, > > On Tue, Oct 13, 2015 at 11:14:21PM -0700, Martin KaFai Lau wrote: > > On Tue, Oct 13, 2015 at 09:26:41PM +0200, Phil Sutter wrote: > > > I have backed up the rt pointer at top of the function and restored it > > > before pr_err, this is the output: > > > > > > | rt6i_dst:2001:4dd0:ff3b:13::/64 rt6i_gateway::: rt6i_flags:4001 > > > dst.flags: > > Hi Phil, Can you try the following patch and report the pr_err? > > Probably needless to say, but with your patch applied the Oops does not > occur anymore. This is the log output: Thanks for testing it. The patch may need a bit refactoring work and I will post it soon. > > | [ 46.518869] IPv6: ort:8800cbb5b800 rt6i_dst:[2001:4dd0:ff3b:13::]/64 > rt6i_gateway:[::] rt6i_flags:4001 dst.flags:0020 > | [ 46.518874] IPv6: rt:8800cb07a000 rt6i_dst:[2001:4dd0:ff3b:13::]/64 > rt6i_gateway:[::] rt6i_flags:0001 dst.flags: > | [ 46.529171] IPv6: ort:8800cbb5b800 rt6i_dst:[2001:4dd0:ff3b:13::]/64 > rt6i_gateway:[::] rt6i_flags:4001 dst.flags:0020 > | [ 46.529174] IPv6: rt:8800cb07b500 rt6i_dst:[2001:4dd0:ff3b:13::]/64 > rt6i_gateway:[::] rt6i_flags:0001 dst.flags: > | [ 46.529187] IPv6: ort:8800cbb5b800 rt6i_dst:[2001:4dd0:ff3b:13::]/64 > rt6i_gateway:[::] rt6i_flags:4001 dst.flags:0020 > | [ 46.529189] IPv6: rt:8800cb07ad80 rt6i_dst:[2001:4dd0:ff3b:13::]/64 > rt6i_gateway:[::] rt6i_flags:0001 dst.flags: > | [ 47.532014] IPv6: ort:8800cbb5b800 rt6i_dst:[2001:4dd0:ff3b:13::]/64 > rt6i_gateway:[::] rt6i_flags:4001 dst.flags:0020 > | [ 47.532021] IPv6: rt:8800cb07a000 rt6i_dst:[2001:4dd0:ff3b:13::]/64 > rt6i_gateway:[::] rt6i_flags:0001 dst.flags: > | [ 47.532028] IPv6: ort:8800cbb5b800 rt6i_dst:[2001:4dd0:ff3b:13::]/64 > rt6i_gateway:[::] rt6i_flags:4001 dst.flags:0020 > | [ 47.532031] IPv6: rt:8800cb07b500 rt6i_dst:[2001:4dd0:ff3b:13::]/64 > rt6i_gateway:[::] rt6i_flags:0001 dst.flags: > | [ 49.536010] IPv6: ort:8800cbb5b800 rt6i_dst:[2001:4dd0:ff3b:13::]/64 > 
rt6i_gateway:[::] rt6i_flags:4001 dst.flags:0020 > | [ 49.536014] IPv6: rt:8800cb07ad80 rt6i_dst:[2001:4dd0:ff3b:13::]/64 > rt6i_gateway:[::] rt6i_flags:0001 dst.flags: > | [ 49.536021] IPv6: ort:8800cbb5b800 rt6i_dst:[2001:4dd0:ff3b:13::]/64 > rt6i_gateway:[::] rt6i_flags:4001 dst.flags:0020 > | [ 49.536024] IPv6: rt:8800cb07a180 rt6i_dst:[2001:4dd0:ff3b:13::]/64 > rt6i_gateway:[::] rt6i_flags:0001 dst.flags: > | [ 53.544013] IPv6: ort:8800cbb5b800 rt6i_dst:[2001:4dd0:ff3b:13::]/64 > rt6i_gateway:[::] rt6i_flags:4001 dst.flags:0020 > | [ 53.544020] IPv6: rt:8800cb07a300 rt6i_dst:[2001:4dd0:ff3b:13::]/64 > rt6i_gateway:[::] rt6i_flags:0001 dst.flags: > | [ 53.544028] IPv6: ort:8800cbb5b800 rt6i_dst:[2001:4dd0:ff3b:13::]/64 > rt6i_gateway:[::] rt6i_flags:4001 dst.flags:0020 > | [ 53.544031] IPv6: rt:8800cb07b980 rt6i_dst:[2001:4dd0:ff3b:13::]/64 > rt6i_gateway:[::] rt6i_flags:0001 dst.flags: > > In case the amount of log entries is surprising: my test-case is > mounting two NFS shares over IPsec. No idea if that's relevant or not. I also don't know why xfrm_lookup() errors out and then triggers make_blackhole() but I believe it should not affect the fix here. Thanks, Martin -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next 0/4] net: dsa: mv88e6xxx: fix hardware bridging
On Sun, Oct 11, 2015 at 06:08:34PM -0400, Vivien Didelot wrote:
> DSA and its drivers currently hook the NETDEV_CHANGEUPPER net_device event in
> order to configure the VLAN map of every port.
>
> This VLAN map is a feature of these switch chips to hardcode and restrict
> which output ports a given input port can egress frames to.
>
> A Linux bridge is a simple untagged VLAN propagated by the bridge code itself.
> With a proper 802.1Q support, a driver does not need this hook anymore, and
> will simply program the related VLAN object.
>
> This patchset improves the hardware bridging code in the mv88e6xxx driver with
> a strict 802.1Q mode.

Hi Vivien

I just tested this as part of net-next/master, and found a problem.

If I do:

    ip link set lan0 up
    ip addr add 192.168.10.2/24 dev lan0

it will not ping. Looking in sys/kernel/debug/dsa0/stats I see broadcast
packets, probably ARP, being received at the port. But they are not
being forwarded out the CPU port.

If however I do

    brctl addbr br0
    brctl addif br0 lan0
    ip addr add 192.168.10.2/24 dev br0
    ip link set br0 up

I can ping.

So it looks like we are too restrictive by default. You should be able
to use interfaces as they are, without a bridge.

Andrew
Re: NULL pointer dereference in rt6_get_cookie()
Hi Martin, On Tue, Oct 13, 2015 at 11:14:21PM -0700, Martin KaFai Lau wrote: > On Tue, Oct 13, 2015 at 09:26:41PM +0200, Phil Sutter wrote: > > I have backed up the rt pointer at top of the function and restored it > > before pr_err, this is the output: > > > > | rt6i_dst:2001:4dd0:ff3b:13::/64 rt6i_gateway::: rt6i_flags:4001 > > dst.flags: > Hi Phil, Can you try the following patch and report the pr_err? Probably needless to say, but with your patch applied the Oops does not occur anymore. This is the log output: | [ 46.518869] IPv6: ort:8800cbb5b800 rt6i_dst:[2001:4dd0:ff3b:13::]/64 rt6i_gateway:[::] rt6i_flags:4001 dst.flags:0020 | [ 46.518874] IPv6: rt:8800cb07a000 rt6i_dst:[2001:4dd0:ff3b:13::]/64 rt6i_gateway:[::] rt6i_flags:0001 dst.flags: | [ 46.529171] IPv6: ort:8800cbb5b800 rt6i_dst:[2001:4dd0:ff3b:13::]/64 rt6i_gateway:[::] rt6i_flags:4001 dst.flags:0020 | [ 46.529174] IPv6: rt:8800cb07b500 rt6i_dst:[2001:4dd0:ff3b:13::]/64 rt6i_gateway:[::] rt6i_flags:0001 dst.flags: | [ 46.529187] IPv6: ort:8800cbb5b800 rt6i_dst:[2001:4dd0:ff3b:13::]/64 rt6i_gateway:[::] rt6i_flags:4001 dst.flags:0020 | [ 46.529189] IPv6: rt:8800cb07ad80 rt6i_dst:[2001:4dd0:ff3b:13::]/64 rt6i_gateway:[::] rt6i_flags:0001 dst.flags: | [ 47.532014] IPv6: ort:8800cbb5b800 rt6i_dst:[2001:4dd0:ff3b:13::]/64 rt6i_gateway:[::] rt6i_flags:4001 dst.flags:0020 | [ 47.532021] IPv6: rt:8800cb07a000 rt6i_dst:[2001:4dd0:ff3b:13::]/64 rt6i_gateway:[::] rt6i_flags:0001 dst.flags: | [ 47.532028] IPv6: ort:8800cbb5b800 rt6i_dst:[2001:4dd0:ff3b:13::]/64 rt6i_gateway:[::] rt6i_flags:4001 dst.flags:0020 | [ 47.532031] IPv6: rt:8800cb07b500 rt6i_dst:[2001:4dd0:ff3b:13::]/64 rt6i_gateway:[::] rt6i_flags:0001 dst.flags: | [ 49.536010] IPv6: ort:8800cbb5b800 rt6i_dst:[2001:4dd0:ff3b:13::]/64 rt6i_gateway:[::] rt6i_flags:4001 dst.flags:0020 | [ 49.536014] IPv6: rt:8800cb07ad80 rt6i_dst:[2001:4dd0:ff3b:13::]/64 rt6i_gateway:[::] rt6i_flags:0001 dst.flags: | [ 49.536021] IPv6: ort:8800cbb5b800 
rt6i_dst:[2001:4dd0:ff3b:13::]/64 rt6i_gateway:[::] rt6i_flags:4001 dst.flags:0020 | [ 49.536024] IPv6: rt:8800cb07a180 rt6i_dst:[2001:4dd0:ff3b:13::]/64 rt6i_gateway:[::] rt6i_flags:0001 dst.flags: | [ 53.544013] IPv6: ort:8800cbb5b800 rt6i_dst:[2001:4dd0:ff3b:13::]/64 rt6i_gateway:[::] rt6i_flags:4001 dst.flags:0020 | [ 53.544020] IPv6: rt:8800cb07a300 rt6i_dst:[2001:4dd0:ff3b:13::]/64 rt6i_gateway:[::] rt6i_flags:0001 dst.flags: | [ 53.544028] IPv6: ort:8800cbb5b800 rt6i_dst:[2001:4dd0:ff3b:13::]/64 rt6i_gateway:[::] rt6i_flags:4001 dst.flags:0020 | [ 53.544031] IPv6: rt:8800cb07b980 rt6i_dst:[2001:4dd0:ff3b:13::]/64 rt6i_gateway:[::] rt6i_flags:0001 dst.flags: In case the amount of log entries is surprising: my test-case is mounting two NFS shares over IPsec. No idea if that's relevant or not. Cheers, Phil -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH nf-next 5/6] netfilter-ipv4: code indentation
Use tabs instead of spaces to indent code. No changes detected by objdiff. Signed-off-by: Ian Morris --- net/ipv4/netfilter/ip_tables.c| 6 +++--- net/ipv4/netfilter/ipt_SYNPROXY.c | 2 +- net/ipv4/netfilter/iptable_security.c | 2 +- 3 files changed, 5 insertions(+), 5 deletions(-) diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c index 3991a87..b99affa 100644 --- a/net/ipv4/netfilter/ip_tables.c +++ b/net/ipv4/netfilter/ip_tables.c @@ -431,8 +431,8 @@ ipt_do_table(struct sk_buff *skb, } while (!acpar.hotdrop); pr_debug("Exiting %s; sp at %u\n", __func__, stackidx); - xt_write_recseq_end(addend); - local_bh_enable(); + xt_write_recseq_end(addend); + local_bh_enable(); #ifdef DEBUG_ALLOW_ALL return NF_ACCEPT; @@ -484,7 +484,7 @@ mark_source_chains(const struct xt_table_info *newinfo, unsigned int oldpos, size; if ((strcmp(t->target.u.user.name, - XT_STANDARD_TARGET) == 0) && + XT_STANDARD_TARGET) == 0) && t->verdict < -NF_MAX_VERDICT - 1) { duprintf("mark_source_chains: bad " "negative verdict (%i)\n", diff --git a/net/ipv4/netfilter/ipt_SYNPROXY.c b/net/ipv4/netfilter/ipt_SYNPROXY.c index 6a6e762..ff746b33 100644 --- a/net/ipv4/netfilter/ipt_SYNPROXY.c +++ b/net/ipv4/netfilter/ipt_SYNPROXY.c @@ -231,7 +231,7 @@ synproxy_send_client_ack(const struct synproxy_net *snet, synproxy_build_options(nth, opts); synproxy_send_tcp(snet, skb, nskb, skb->nfct, IP_CT_ESTABLISHED_REPLY, - niph, nth, tcp_hdr_size); + niph, nth, tcp_hdr_size); } static bool diff --git a/net/ipv4/netfilter/iptable_security.c b/net/ipv4/netfilter/iptable_security.c index f534e2f..c2e23d5 100644 --- a/net/ipv4/netfilter/iptable_security.c +++ b/net/ipv4/netfilter/iptable_security.c @@ -79,7 +79,7 @@ static int __init iptable_security_init(void) int ret; ret = register_pernet_subsys(&iptable_security_net_ops); -if (ret < 0) + if (ret < 0) return ret; sectbl_ops = xt_hook_link(&security_table, iptable_security_hook); -- 1.9.1 -- To unsubscribe from this list: send the line 
"unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH nf-next 4/6] netfilter-ipv4: function definition layout
Use tabs instead of spaces to indent second line of parameters in function definitions. No changes detected by objdiff. Signed-off-by: Ian Morris --- net/ipv4/netfilter/arp_tables.c | 6 +++--- net/ipv4/netfilter/ip_tables.c | 6 +++--- 2 files changed, 6 insertions(+), 6 deletions(-) diff --git a/net/ipv4/netfilter/arp_tables.c b/net/ipv4/netfilter/arp_tables.c index eb6663bd..11dccba 100644 --- a/net/ipv4/netfilter/arp_tables.c +++ b/net/ipv4/netfilter/arp_tables.c @@ -632,7 +632,7 @@ static inline void cleanup_entry(struct arpt_entry *e) * newinfo). */ static int translate_table(struct xt_table_info *newinfo, void *entry0, - const struct arpt_replace *repl) + const struct arpt_replace *repl) { struct arpt_entry *iter; unsigned int i; @@ -892,7 +892,7 @@ static int compat_table_info(const struct xt_table_info *info, #endif static int get_info(struct net *net, void __user *user, -const int *len, int compat) + const int *len, int compat) { char name[XT_TABLE_MAXNAMELEN]; struct xt_table *t; @@ -1069,7 +1069,7 @@ static int __do_replace(struct net *net, const char *name, } static int do_replace(struct net *net, const void __user *user, - unsigned int len) + unsigned int len) { int ret; struct arpt_replace tmp; diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c index 08b7ab0..3991a87 100644 --- a/net/ipv4/netfilter/ip_tables.c +++ b/net/ipv4/netfilter/ip_tables.c @@ -804,7 +804,7 @@ cleanup_entry(struct ipt_entry *e, struct net *net) newinfo) */ static int translate_table(struct net *net, struct xt_table_info *newinfo, void *entry0, -const struct ipt_replace *repl) + const struct ipt_replace *repl) { struct ipt_entry *iter; unsigned int i; @@ -1078,7 +1078,7 @@ static int compat_table_info(const struct xt_table_info *info, #endif static int get_info(struct net *net, void __user *user, -const int *len, int compat) + const int *len, int compat) { char name[XT_TABLE_MAXNAMELEN]; struct xt_table *t; @@ -1304,7 +1304,7 @@ do_replace(struct net 
*net, const void __user *user, unsigned int len) static int do_add_counters(struct net *net, const void __user *user, -unsigned int len, int compat) + unsigned int len, int compat) { unsigned int i; struct xt_counters_info tmp; -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH nf-next 6/6] netfilter-ipv4: whitespace around operators
This patch cleanses whitespace around arithmetical operators. No changes detected by objdiff. Signed-off-by: Ian Morris --- net/ipv4/netfilter/ipt_CLUSTERIP.c | 8 net/ipv4/netfilter/ipt_ah.c| 2 +- net/ipv4/netfilter/nf_nat_snmp_basic.c | 2 +- 3 files changed, 6 insertions(+), 6 deletions(-) diff --git a/net/ipv4/netfilter/ipt_CLUSTERIP.c b/net/ipv4/netfilter/ipt_CLUSTERIP.c index 3f32c03..4a9e6db 100644 --- a/net/ipv4/netfilter/ipt_CLUSTERIP.c +++ b/net/ipv4/netfilter/ipt_CLUSTERIP.c @@ -492,14 +492,14 @@ static void arp_print(struct arp_payload *payload) { #define HBUFFERLEN 30 char hbuffer[HBUFFERLEN]; - int j,k; + int j, k; - for (k=0, j=0; k < HBUFFERLEN-3 && j < ETH_ALEN; j++) { + for (k = 0, j = 0; k < HBUFFERLEN - 3 && j < ETH_ALEN; j++) { hbuffer[k++] = hex_asc_hi(payload->src_hw[j]); hbuffer[k++] = hex_asc_lo(payload->src_hw[j]); - hbuffer[k++]=':'; + hbuffer[k++] = ':'; } - hbuffer[--k]='\0'; + hbuffer[--k] = '\0'; pr_debug("src %pI4@%s, dst %pI4\n", &payload->src_ip, hbuffer, &payload->dst_ip); diff --git a/net/ipv4/netfilter/ipt_ah.c b/net/ipv4/netfilter/ipt_ah.c index 14a2aa8..a787d07 100644 --- a/net/ipv4/netfilter/ipt_ah.c +++ b/net/ipv4/netfilter/ipt_ah.c @@ -25,7 +25,7 @@ spi_match(u_int32_t min, u_int32_t max, u_int32_t spi, bool invert) bool r; pr_debug("spi_match:%c 0x%x <= 0x%x <= 0x%x\n", invert ? '!' : ' ', min, spi, max); - r=(spi >= min && spi <= max) ^ invert; + r = (spi >= min && spi <= max) ^ invert; pr_debug(" result %s\n", r ? 
"PASS" : "FAILED"); return r; } diff --git a/net/ipv4/netfilter/nf_nat_snmp_basic.c b/net/ipv4/netfilter/nf_nat_snmp_basic.c index 8e3dffa..89be5c5 100644 --- a/net/ipv4/netfilter/nf_nat_snmp_basic.c +++ b/net/ipv4/netfilter/nf_nat_snmp_basic.c @@ -1156,7 +1156,7 @@ static int snmp_parse_mangle(unsigned char *msg, } if (obj->type == SNMP_IPADDR) - mangle_address(ctx.begin, ctx.pointer - 4 , map, check); + mangle_address(ctx.begin, ctx.pointer - 4, map, check); kfree(obj->id); kfree(obj); -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
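The loop touched by the ipt_CLUSTERIP.c hunk can be tried outside the kernel with the userspace sketch below. `model_hex_hi()`, `model_hex_lo()` and `format_hw_addr()` are hypothetical names introduced here; the first two are assumed to behave like the kernel's hex_asc_hi()/hex_asc_lo() helpers and produce lowercase hex digits.

```c
#include <assert.h>
#include <stddef.h>

#define ETH_ALEN	6	/* bytes in an Ethernet hardware address */
#define HBUFFERLEN	30	/* same buffer size as in the patch */

/* Assumed equivalents of hex_asc_hi()/hex_asc_lo(): upper and lower nibble
 * of a byte as a lowercase hex character. */
static char model_hex_hi(unsigned char b)
{
	return "0123456789abcdef"[(b & 0xf0) >> 4];
}

static char model_hex_lo(unsigned char b)
{
	return "0123456789abcdef"[b & 0x0f];
}

/* Writes hw as "xx:xx:xx:xx:xx:xx" into hbuffer, NUL-terminated. */
static void format_hw_addr(const unsigned char hw[ETH_ALEN],
			   char hbuffer[HBUFFERLEN])
{
	int j, k;

	assert(hw != NULL && hbuffer != NULL);

	/* Each iteration emits two hex digits and a ':' (three bytes). */
	for (k = 0, j = 0; k < HBUFFERLEN - 3 && j < ETH_ALEN; j++) {
		hbuffer[k++] = model_hex_hi(hw[j]);
		hbuffer[k++] = model_hex_lo(hw[j]);
		hbuffer[k++] = ':';
	}
	/* Overwrite the trailing ':' with the string terminator. */
	hbuffer[--k] = '\0';
}
```

The `k < HBUFFERLEN - 3` bound leaves room for the three bytes written per iteration, and the final `--k` steps back onto the last separator so that it becomes the terminator.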
[PATCH nf-next 0/6] coding style improvements: netfilter-ipv4
This series of patches improves the coding style of the netfilter-ipv4 code by
addressing some issues detected by checkpatch.

The changes were previously submitted as part of a larger monolithic patch but
on advice from Pablo, these are being re-sent in smaller, more structured
batches.

Ian Morris (6):
  netfilter-ipv4: Line layout whitespace fixes
  netfilter-ipv4: label placement
  netfilter-ipv4: ternary operator layout
  netfilter-ipv4: function definition layout
  netfilter-ipv4: code indentation
  netfilter-ipv4: whitespace around operators

 net/ipv4/netfilter/arp_tables.c        | 12 ++--
 net/ipv4/netfilter/ip_tables.c         | 20 ++--
 net/ipv4/netfilter/ipt_CLUSTERIP.c     | 8
 net/ipv4/netfilter/ipt_ECN.c           | 2 +-
 net/ipv4/netfilter/ipt_SYNPROXY.c      | 2 +-
 net/ipv4/netfilter/ipt_ah.c            | 2 +-
 net/ipv4/netfilter/iptable_security.c  | 2 +-
 net/ipv4/netfilter/nf_nat_pptp.c       | 2 +-
 net/ipv4/netfilter/nf_nat_snmp_basic.c | 4 ++--
 9 files changed, 27 insertions(+), 27 deletions(-)

--
1.9.1
[PATCH nf-next 2/6] netfilter-ipv4: label placement
Whitespace cleansing: Labels should not be indented. No changes detected by objdiff. Signed-off-by: Ian Morris --- net/ipv4/netfilter/arp_tables.c | 2 +- net/ipv4/netfilter/ip_tables.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/net/ipv4/netfilter/arp_tables.c b/net/ipv4/netfilter/arp_tables.c index 2dad3e1..7300616 100644 --- a/net/ipv4/netfilter/arp_tables.c +++ b/net/ipv4/netfilter/arp_tables.c @@ -468,7 +468,7 @@ static int mark_source_chains(const struct xt_table_info *newinfo, pos = newpos; } } - next: +next: duprintf("Finished chain %u\n", hook); } return 1; diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c index 42d0946..3be2a4d 100644 --- a/net/ipv4/netfilter/ip_tables.c +++ b/net/ipv4/netfilter/ip_tables.c @@ -549,7 +549,7 @@ mark_source_chains(const struct xt_table_info *newinfo, pos = newpos; } } - next: +next: duprintf("Finished chain %u\n", hook); } return 1; -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH nf-next 1/6] netfilter-ipv4: Line layout whitespace fixes
Cleanses some whitespace issues by removing a leading space before a tab. No changes detected by objdiff. Signed-off-by: Ian Morris --- net/ipv4/netfilter/ipt_ECN.c | 2 +- net/ipv4/netfilter/nf_nat_pptp.c | 2 +- net/ipv4/netfilter/nf_nat_snmp_basic.c | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/net/ipv4/netfilter/ipt_ECN.c b/net/ipv4/netfilter/ipt_ECN.c index 2707652..6592708 100644 --- a/net/ipv4/netfilter/ipt_ECN.c +++ b/net/ipv4/netfilter/ipt_ECN.c @@ -24,7 +24,7 @@ MODULE_AUTHOR("Harald Welte "); MODULE_DESCRIPTION("Xtables: Explicit Congestion Notification (ECN) flag modification"); /* set ECT codepoint from IP header. - * return false if there was an error. */ + * return false if there was an error. */ static inline bool set_ect_ip(struct sk_buff *skb, const struct ipt_ECN_info *einfo) { diff --git a/net/ipv4/netfilter/nf_nat_pptp.c b/net/ipv4/netfilter/nf_nat_pptp.c index 657d230..d5726f7 100644 --- a/net/ipv4/netfilter/nf_nat_pptp.c +++ b/net/ipv4/netfilter/nf_nat_pptp.c @@ -16,7 +16,7 @@ * (C) 2006-2012 Patrick McHardy * * TODO: - NAT to a unique tuple, not to TCP source port - *(needs netfilter tuple reservation) + *(needs netfilter tuple reservation) */ #include diff --git a/net/ipv4/netfilter/nf_nat_snmp_basic.c b/net/ipv4/netfilter/nf_nat_snmp_basic.c index 7c67667..8e3dffa 100644 --- a/net/ipv4/netfilter/nf_nat_snmp_basic.c +++ b/net/ipv4/netfilter/nf_nat_snmp_basic.c @@ -891,7 +891,7 @@ static void fast_csum(__sum16 *csum, /* * Mangle IP address. - * - begin points to the start of the snmp messgae + * - begin points to the start of the snmp messgae * - addr points to the start of the address */ static inline void mangle_address(unsigned char *begin, -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH nf-next 3/6] netfilter-ipv4: ternary operator layout
Correct whitespace layout of ternary operators in the netfilter-ipv4 code. No changes detected by objdiff. Signed-off-by: Ian Morris --- net/ipv4/netfilter/arp_tables.c | 4 ++-- net/ipv4/netfilter/ip_tables.c | 6 +++--- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/net/ipv4/netfilter/arp_tables.c b/net/ipv4/netfilter/arp_tables.c index 7300616..eb6663bd 100644 --- a/net/ipv4/netfilter/arp_tables.c +++ b/net/ipv4/netfilter/arp_tables.c @@ -186,7 +186,7 @@ static inline int arp_packet_match(const struct arphdr *arphdr, if (FWINV(ret != 0, ARPT_INV_VIA_IN)) { dprintf("VIA in mismatch (%s vs %s).%s\n", indev, arpinfo->iniface, - arpinfo->invflags&ARPT_INV_VIA_IN ?" (INV)":""); + arpinfo->invflags & ARPT_INV_VIA_IN ? " (INV)" : ""); return 0; } @@ -195,7 +195,7 @@ static inline int arp_packet_match(const struct arphdr *arphdr, if (FWINV(ret != 0, ARPT_INV_VIA_OUT)) { dprintf("VIA out mismatch (%s vs %s).%s\n", outdev, arpinfo->outiface, - arpinfo->invflags&ARPT_INV_VIA_OUT ?" (INV)":""); + arpinfo->invflags & ARPT_INV_VIA_OUT ? " (INV)" : ""); return 0; } diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c index 3be2a4d..08b7ab0 100644 --- a/net/ipv4/netfilter/ip_tables.c +++ b/net/ipv4/netfilter/ip_tables.c @@ -102,7 +102,7 @@ ip_packet_match(const struct iphdr *ip, if (FWINV(ret != 0, IPT_INV_VIA_IN)) { dprintf("VIA in mismatch (%s vs %s).%s\n", indev, ipinfo->iniface, - ipinfo->invflags&IPT_INV_VIA_IN ?" (INV)":""); + ipinfo->invflags & IPT_INV_VIA_IN ? " (INV)" : ""); return false; } @@ -111,7 +111,7 @@ ip_packet_match(const struct iphdr *ip, if (FWINV(ret != 0, IPT_INV_VIA_OUT)) { dprintf("VIA out mismatch (%s vs %s).%s\n", outdev, ipinfo->outiface, - ipinfo->invflags&IPT_INV_VIA_OUT ?" (INV)":""); + ipinfo->invflags & IPT_INV_VIA_OUT ? 
" (INV)" : ""); return false; } @@ -120,7 +120,7 @@ ip_packet_match(const struct iphdr *ip, FWINV(ip->protocol != ipinfo->proto, IPT_INV_PROTO)) { dprintf("Packet protocol %hi does not match %hi.%s\n", ip->protocol, ipinfo->proto, - ipinfo->invflags&IPT_INV_PROTO ? " (INV)":""); + ipinfo->invflags & IPT_INV_PROTO ? " (INV)" : ""); return false; } -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH net 1/1] via-rhine: fix VLAN receive handling regression.
From: Andrej Ota

Because eth_type_trans() consumes ethernet header worth of bytes, a call
to read TCI from end of packet using rhine_rx_vlan_tag() no longer works
as it's reading from an invalid offset.

Tested to be working on PCEngines Alix board.

Fixes: 810f19bcb862 ("via-rhine: add consistent memory barrier in vlan receive code.")
Signed-off-by: Andrej Ota
Acked-by: Francois Romieu
---
 Applies fine as of 0f8b8e28fb3241f9fd82ce13bac2b40c35e987e0 ("tipc: eliminate
 risk of stalled link synchronization").

 Andrej posted it on l-k the 2015/10/04, see
 http://marc.info/?l=linux-kernel&m=144398918324349

 Kernel v4.2 exhibits the regression. Stable v4.[01] kernels don't.

 drivers/net/ethernet/via/via-rhine.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/via/via-rhine.c b/drivers/net/ethernet/via/via-rhine.c
index a832637..2b7550c 100644
--- a/drivers/net/ethernet/via/via-rhine.c
+++ b/drivers/net/ethernet/via/via-rhine.c
@@ -2134,10 +2134,11 @@ static int rhine_rx(struct net_device *dev, int limit)
 			}
 
 			skb_put(skb, pkt_len);
-			skb->protocol = eth_type_trans(skb, dev);
 
 			rhine_rx_vlan_tag(skb, desc, data_size);
 
+			skb->protocol = eth_type_trans(skb, dev);
+
 			netif_receive_skb(skb);
 
 			u64_stats_update_begin(&rp->rx_stats.syncp);
--
2.4.3
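Why the ordering matters can be seen in a small userspace model. This is a sketch under assumptions: `struct model_skb`, `model_pull_eth_header()` and `model_read_tci()` are hypothetical, and the tag offset used in the usage note is an arbitrary illustration rather than the driver's real computation. The sketch relies only on what the commit message states: consuming the Ethernet header moves the data pointer, so a tag offset applied to that pointer afterwards lands on the wrong bytes.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_ETH_HLEN 14	/* size of an Ethernet header in bytes */

/* Hypothetical, heavily reduced stand-in for a socket buffer. */
struct model_skb {
	unsigned char *data;	/* start of the not-yet-consumed bytes */
	size_t len;		/* number of not-yet-consumed bytes */
};

/* Models the side effect described in the commit message: the Ethernet
 * header is consumed, so the data pointer advances past it. */
static void model_pull_eth_header(struct model_skb *skb)
{
	assert(skb != NULL && skb->len >= MODEL_ETH_HLEN);
	skb->data += MODEL_ETH_HLEN;
	skb->len -= MODEL_ETH_HLEN;
}

/* Reads a big-endian 16-bit value located tci_offset bytes past the
 * current data pointer. */
static unsigned int model_read_tci(const struct model_skb *skb,
				   size_t tci_offset)
{
	const unsigned char *p = skb->data + tci_offset;

	return ((unsigned int)p[0] << 8) | p[1];
}
```

If a tag is stored 64 bytes past the start of the buffer, reading it before the pull returns the stored value, while reading it with the same offset after the pull looks 14 bytes too far. Calling the tag reader before the header is consumed, as the patch does, avoids the shift.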
Re: [PATCH net-next] switchdev: enforce no pvid flag in vlan ranges
On 14/10/15 11:51, Vivien Didelot wrote:
> On Oct. Wednesday 14 (42) 08:42 PM, Ido Schimmel wrote:
>> Wed, Oct 14, 2015 at 08:14:24PM IDT, sfel...@gmail.com wrote:
>> >On Wed, Oct 14, 2015 at 8:25 AM, Vivien Didelot wrote:
>> >> On Oct. Wednesday 14 (42) 09:14 AM, Ido Schimmel wrote:
>> >>> Tue, Oct 13, 2015 at 05:32:26PM IDT, vivien.dide...@savoirfairelinux.com wrote:
>> >>> >On Oct. Tuesday 13 (42) 11:31 AM, Ido Schimmel wrote:
>> >>> >> Mon, Oct 12, 2015 at 08:36:25PM IDT, vivien.dide...@savoirfairelinux.com wrote:
>> >>> >> >Hi guys,
>> >>> >> >
>> >>> >> >On Oct. Monday 12 (42) 02:01 PM, Nikolay Aleksandrov wrote:
>> >>> >> >> From: Nikolay Aleksandrov
>> >>> >> >>
>> >>> >> >> We shouldn't allow BRIDGE_VLAN_INFO_PVID flag in VLAN ranges.
>> >>> >> >>
>> >>> >> >> Signed-off-by: Nikolay Aleksandrov
>> >>> >> >> ---
>> >>> >> >>  net/switchdev/switchdev.c | 3 +++
>> >>> >> >>  1 file changed, 3 insertions(+)
>> >>> >> >>
>> >>> >> >> diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
>> >>> >> >> index 6e4a4f9ad927..256c596de896 100644
>> >>> >> >> --- a/net/switchdev/switchdev.c
>> >>> >> >> +++ b/net/switchdev/switchdev.c
>> >>> >> >> @@ -720,6 +720,9 @@ static int switchdev_port_br_afspec(struct net_device *dev,
>> >>> >> >>  			if (vlan.vid_begin)
>> >>> >> >>  				return -EINVAL;
>> >>> >> >>  			vlan.vid_begin = vinfo->vid;
>> >>> >> >> +			/* don't allow range of pvids */
>> >>> >> >> +			if (vlan.flags & BRIDGE_VLAN_INFO_PVID)
>> >>> >> >> +				return -EINVAL;
>> >>> >> >>  		} else if (vinfo->flags & BRIDGE_VLAN_INFO_RANGE_END) {
>> >>> >> >>  			if (!vlan.vid_begin)
>> >>> >> >>  				return -EINVAL;
>> >>> >> >> --
>> >>> >> >> 2.4.3
>> >>> >> >>
>> >>> >> >
>> >>> >> >Yes the patch looks good, but it is a minor check though. I hope the subject of this thread is making sense.
>> >>> >> >
>> >>> >> >VLAN ranges seem to have been included for an UX purpose (so commands look like Cisco IOS). We don't want to change any existing interface, so we pushed that down to drivers, with the only valid reason that, maybe one day, an hardware can be capable of programming a range on a per-port basis.
>> >>> >> Hi,
>> >>> >>
>> >>> >> That's actually what we are doing in mlxsw. We can do up to 256 entries in one go. We've yet to submit this part.
>> >>> >
>> >>> >Perfect Ido, thanks for pointing this out! I'm OK with the range then.
>> >>> >
>> >>> >So there is now a very last question in my head for this, which is more a matter of kernel design. Should the user be aware of such underlying support? In other words, would it make sense to do this in a driver:
>> >>> >
>> >>> >    foo_port_vlan_add(struct net_device *dev,
>> >>> >                      struct switchdev_obj_port_vlan *vlan)
>> >>> >    {
>> >>> >        if (vlan->vid_begin != vlan->vid_end)
>> >>> >            return -ENOTSUPP; /* or something more relevant for user */
>> >>> >
>> >>> >        return foo_port_single_vlan_add(dev, vlan->vid_begin);
>> >>> >    }
>> >>> >
>> >>> >So drivers keep being simple, and we can easily propagate the fact that one-or-all VLAN is not supportable, vs. the VLAN feature itself is not implemented and must be done in software.
>> >>> I think that if you want to keep it simple, then Scott's advice from the previous thread is the most appropriate one. I believe the hardware you are using is simply not meant to support multiple 802.1Q bridges.
>> >>
>> >> You mean allowing only one Linux bridge over an hardware switch?
>> >>
>> >> It would for sure simplify how, as developers and users, we represent a physical switch. But I am not sure how to achieve that and I don't have strong opinions on this TBH.
>> >
>> >Hi Vivien, I think it's possible to keep switch ports on just one bridge if we do a little bit of work on the NETDEV_CHANGEUPPER notifier. This will give you the driver-level control you want. Do you have time to investigate? The idea is:
>> >
>> >1) In your driver's handler for NETDEV_CHANGEUPPER, if switch port is being added to a second bridge, then return NOTIFY_BAD. Your driver needs to track the bridge count.
>> >
>> >2) In __netdev_upper_dev_link(), check the return code from the call_netdevice_notifiers_info(NETDEV_CHANGEUPPER, ...) call, and if NOTIFY_BAD, abort the linking operation (goto rollback_xxx).
>> >
>> Hi,
>>
>> We are doing something similar in mlxsw (not upstream yet). Jiri introduced PRE_CHANGEUPPER, which is called from the function you mentioned, but before the linking operation (so that you don't need to rollback).
>>
>> If the notification is about a linking operation and the master is a bridge different than the current one, then NOTIFY_BAD is returned.
>
> Great, I'll wait for this then.
>
> Scott,
Re: [PATCH net-next] bpf: Need to call bpf_prog_uncharge_memlock from bpf_prog_put
On 10/14/15 2:40 PM, Tom Herbert wrote:
> Currently, bpf_prog_uncharge_memlock is only called from __prog_put_rcu in the bpf_prog_release path. Need to call this from bpf_prog_put also to get correct accounting.
>
> Fixes: commit aaac3ba95e4c8b49 ("bpf: charge user for creation of BPF maps and programs")
> Signed-off-by: Tom Herbert

ohh. right. good catch. thanks!

Acked-by: Alexei Starovoitov
[PATCH net-next] bpf: Need to call bpf_prog_uncharge_memlock from bpf_prog_put
Currently, bpf_prog_uncharge_memlock is only called from __prog_put_rcu in the bpf_prog_release path. Need to call this from bpf_prog_put also to get correct accounting.

Fixes: commit aaac3ba95e4c8b49 ("bpf: charge user for creation of BPF maps and programs")
Signed-off-by: Tom Herbert
---
 kernel/bpf/syscall.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index f640e5f..687dd6c 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -520,6 +520,7 @@ void bpf_prog_put(struct bpf_prog *prog)
 {
 	if (atomic_dec_and_test(&prog->aux->refcnt)) {
 		free_used_maps(prog->aux);
+		bpf_prog_uncharge_memlock(prog);
 		bpf_prog_free(prog);
 	}
 }
--
2.4.6
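For readers outside the kernel tree, the accounting rule behind this fix can be modelled in plain C: whatever is charged when the object is created has to be uncharged on every path that drops the last reference. The following is only a userspace sketch using C11 atomics; the sketch_* names and both structs are hypothetical stand-ins, not the kernel's bpf_prog or user_struct.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-user locked-memory counter. */
struct sketch_user {
	atomic_long locked_pages;
};

/* Hypothetical stand-in for a refcounted program object. */
struct sketch_prog {
	atomic_int refcnt;
	long pages;
	struct sketch_user *user;
};

/* Create a program holding one reference and charge its pages to the user. */
static struct sketch_prog *sketch_prog_create(struct sketch_user *user, long pages)
{
	struct sketch_prog *prog = malloc(sizeof(*prog));

	if (!prog)
		return NULL;
	atomic_init(&prog->refcnt, 1);
	prog->pages = pages;
	prog->user = user;
	atomic_fetch_add(&user->locked_pages, pages);
	return prog;
}

/* Drop one reference; the final put must undo the charge before freeing. */
static void sketch_prog_put(struct sketch_prog *prog)
{
	/* atomic_fetch_sub() returns the previous value: 1 means last reference. */
	if (atomic_fetch_sub(&prog->refcnt, 1) == 1) {
		atomic_fetch_sub(&prog->user->locked_pages, prog->pages);
		free(prog);
	}
}
```

If the uncharge line in sketch_prog_put() were missing, locked_pages would stay inflated after the program is gone, which is the kind of accounting leak the patch above addresses.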
Re: [PATCH V2 2/2] bpf: control a set of perf events by creating a new ioctl PERF_EVENT_IOC_SET_ENABLER
On 10/14/15 5:37 AM, Kaixu Xia wrote:
> +	event->p_sample_disable = &enabler_event->sample_disable;

I don't like it as a concept, and its implementation is buggy. What happens here when the enabler is alive, but the other event is destroyed?

> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -221,9 +221,12 @@ static u64 bpf_perf_event_sample_control(u64 r1, u64 index, u64 flag, u64 r4, u6
>  	struct bpf_array *array = container_of(map, struct bpf_array, map);
>  	struct perf_event *event;
>
> -	if (unlikely(index >= array->map.max_entries))
> +	if (unlikely(index > array->map.max_entries))
>  		return -E2BIG;
>
> +	if (index == array->map.max_entries)
> +		index = 0;

What is this hack for? Either use notification and user space disable or call bpf_perf_event_sample_control() manually for each cpu.
Re: [RFC PATCH] RDS: convert bind hash table to re-sizable hashtable
On 10/14/15 2:15 PM, Santosh Shilimkar wrote:
> From: Santosh Shilimkar
>
> To further improve the RDS connection scalability on massive systems where the number of sockets grows into tens of thousands, there is a need for a larger bind hashtable. A pre-allocated 8K or 16K table is not very flexible in terms of memory utilisation. The rhashtable infrastructure gives us the flexibility to grow the hashtable based on use and also comes with built-in efficient bucket (chain) handling.
>
> Cc: David Laight
> Cc: David Miller
> Signed-off-by: Santosh Shilimkar
> ---
> As promised in the last series review, here is an RFC to convert RDS to make use of re-sizable hash tables. I haven't turned on auto shrinking on purpose.

Ignore the automatic_shrinking remark since the patch has it enabled.

>  net/rds/af_rds.c | 10 -
>  net/rds/bind.c | 127 ---
>  net/rds/rds.h | 7 ++-
>  3 files changed, 58 insertions(+), 86 deletions(-)
>
> diff --git a/net/rds/af_rds.c b/net/rds/af_rds.c
> index 384ea1e..b5476aeb 100644
> --- a/net/rds/af_rds.c
> +++ b/net/rds/af_rds.c
> @@ -573,6 +573,7 @@ static void rds_exit(void)
>  	rds_threads_exit();
>  	rds_stats_exit();
>  	rds_page_exit();
> +	rds_bind_lock_destroy();
>  	rds_info_deregister_func(RDS_INFO_SOCKETS, rds_sock_info);
>  	rds_info_deregister_func(RDS_INFO_RECV_MESSAGES, rds_sock_inc_info);
>  }
> @@ -582,11 +583,14 @@ static int rds_init(void)
>  {
>  	int ret;
>
> -	rds_bind_lock_init();
> +	ret = rds_bind_lock_init();
> +	if (ret)
> +		goto out;
>
>  	ret = rds_conn_init();
>  	if (ret)
> -		goto out;
> +		goto out_bind;
> +
>  	ret = rds_threads_init();
>  	if (ret)
>  		goto out_conn;
> @@ -620,6 +624,8 @@ out_conn:
>  	rds_conn_exit();
>  	rds_cong_exit();
>  	rds_page_exit();
> +out_bind:
> +	rds_bind_lock_destroy();
>  out:
>  	return ret;
>  }
> diff --git a/net/rds/bind.c b/net/rds/bind.c
> index bc6b93e..199e4cc 100644
> --- a/net/rds/bind.c
> +++ b/net/rds/bind.c
> @@ -38,54 +38,18 @@
>  #include
>  #include "rds.h"
>
> -struct bind_bucket {
> -	rwlock_t		lock;
> -	struct hlist_head	head;
> +static struct rhashtable bind_hash_table;
> +
> +static struct rhashtable_params ht_parms = {
> +	.nelem_hint = 768,
> +	.key_len = sizeof(u64),
> +	.key_offset = offsetof(struct rds_sock, rs_bound_key),
> +	.head_offset = offsetof(struct rds_sock, rs_bound_node),
> +	.max_size = 16384,
> +	.min_size = 1024,
> +	.automatic_shrinking = true,
>  };
>
> -#define BIND_HASH_SIZE 1024
> -static struct bind_bucket bind_hash_table[BIND_HASH_SIZE];
> -
> -static struct bind_bucket *hash_to_bucket(__be32 addr, __be16 port)
> -{
> -	return bind_hash_table + (jhash_2words((u32)addr, (u32)port, 0) &
> -				  (BIND_HASH_SIZE - 1));
> -}
> -
> -/* must hold either read or write lock (write lock for insert != NULL) */
> -static struct rds_sock *rds_bind_lookup(struct bind_bucket *bucket,
> -					__be32 addr, __be16 port,
> -					struct rds_sock *insert)
> -{
> -	struct rds_sock *rs;
> -	struct hlist_head *head = &bucket->head;
> -	u64 cmp;
> -	u64 needle = ((u64)be32_to_cpu(addr) << 32) | be16_to_cpu(port);
> -
> -	hlist_for_each_entry(rs, head, rs_bound_node) {
> -		cmp = ((u64)be32_to_cpu(rs->rs_bound_addr) << 32) |
> -		      be16_to_cpu(rs->rs_bound_port);
> -
> -		if (cmp == needle) {
> -			rds_sock_addref(rs);
> -			return rs;
> -		}
> -	}
> -
> -	if (insert) {
> -		/*
> -		 * make sure our addr and port are set before
> -		 * we are added to the list.
> -		 */
> -		insert->rs_bound_addr = addr;
> -		insert->rs_bound_port = port;
> -		rds_sock_addref(insert);
> -
> -		hlist_add_head(&insert->rs_bound_node, head);
> -	}
> -	return NULL;
> -}
> -
>  /*
>   * Return the rds_sock bound at the given local address.
>   *
> @@ -94,18 +58,14 @@ static struct rds_sock *rds_bind_lookup(struct bind_bucket *bucket,
>   */
>  struct rds_sock *rds_find_bound(__be32 addr, __be16 port)
>  {
> +	u64 key = ((u64)addr << 32) | port;
>  	struct rds_sock *rs;
> -	unsigned long flags;
> -	struct bind_bucket *bucket = hash_to_bucket(addr, port);
>
> -	read_lock_irqsave(&bucket->lock, flags);
> -	rs = rds_bind_lookup(bucket, addr, port, NULL);
> -	read_unlock_irqrestore(&bucket->lock, flags);
> -
> -	if (rs && sock_flag(rds_rs_to_sk(rs), SOCK_DEAD)) {
> -		rds_sock_put(rs);
> +	rs = rhashtable_lookup_fast(&bind_hash_table, &key, ht_parms);
> +	if (rs && !sock_flag(rds_rs_to_sk(rs), SOCK_DEAD))
> +		rds_sock_addref(rs);
> +	else
>  		rs = NULL;
> -	}
>
>  	rdsdebug("returning rs %p for %pI4:%u\n",
Re: [PATCH V2 1/2] bpf: control the trace data output on current cpu when perf sampling
On 10/14/15 5:37 AM, Kaixu Xia wrote:
> This patch adds the flag sample_disable to control the trace data output process when perf sampling. By setting this flag and integrating with ebpf, we can control the data output process and get the samples we are most interested in.
>
> The bpf helper bpf_perf_event_sample_control() can control the perf_event on current cpu.
>
> Signed-off-by: Kaixu Xia

...

> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -6337,6 +6337,9 @@ static int __perf_event_overflow(struct perf_event *event,
>  		irq_work_queue(&event->pending);
>  	}
>
> +	if (!atomic_read(&event->sample_disable))
> +		return ret;
> +

the condition check and the name are inconsistent. It's either if (!enabled) return or if (disabled) return

>  	if (event->overflow_handler)
>  		event->overflow_handler(event, data, regs);
>  	else
> @@ -7709,6 +7712,14 @@ static void account_event(struct perf_event *event)
>  	account_event_cpu(event, event->cpu);
>  }
>
> +static void perf_event_check_sample_flag(struct perf_event *event)
> +{
> +	if (event->attr.sample_disable == 1)
> +		atomic_set(&event->sample_disable, 0);
> +	else
> +		atomic_set(&event->sample_disable, 1);
> +}

why introduce new attribute for this? we already have 'disabled' flag.

> +static u64 bpf_perf_event_sample_control(u64 r1, u64 index, u64 flag, u64 r4, u64 r5)
> +{
> +	struct bpf_map *map = (struct bpf_map *) (unsigned long) r1;
> +	struct bpf_array *array = container_of(map, struct bpf_array, map);
> +	struct perf_event *event;
> +
> +	if (unlikely(index >= array->map.max_entries))
> +		return -E2BIG;
> +
> +	event = (struct perf_event *)array->ptrs[index];
> +	if (!event)
> +		return -ENOENT;
> +
> +	if (flag)

please check only bit 0 and check that all other bits are zero as well for future extensibility.

> +		atomic_dec(&event->sample_disable);

it should be atomic_dec_if_positive();

> +	else
> +		atomic_inc(&event->sample_disable);

and atomic_add_unless() to make sure we don't wrap on either side.

> +const struct bpf_func_proto bpf_perf_event_sample_control_proto = {

static.
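The three review points on the helper (accept only bit 0 of the flag, never decrement below zero, never increment past the maximum) can be illustrated outside the kernel with C11 atomics. This is a hedged userspace sketch, not the kernel API: the sketch_* functions are hypothetical, they only mimic the bounded behaviour the reviewer asks for, and the -1 return merely stands in for an error code such as -EINVAL.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Decrement *v only while it is positive; returns true if it was decremented. */
static bool sketch_dec_if_positive(atomic_int *v)
{
	int old = atomic_load(v);

	while (old > 0) {
		/* On failure, old is reloaded with the current value and rechecked. */
		if (atomic_compare_exchange_weak(v, &old, old - 1))
			return true;
	}
	return false;
}

/* Increment *v unless it already sits at INT_MAX; returns true if incremented. */
static bool sketch_inc_unless_max(atomic_int *v)
{
	int old = atomic_load(v);

	while (old != INT_MAX) {
		if (atomic_compare_exchange_weak(v, &old, old + 1))
			return true;
	}
	return false;
}

/*
 * Model of the helper's flag handling: only bit 0 is meaningful, and any
 * other set bit is rejected so that those bits stay free for later use.
 */
static int sketch_sample_control(atomic_int *sample_disable, uint64_t flag)
{
	if (flag & ~UINT64_C(1))
		return -1;

	if (flag)
		sketch_dec_if_positive(sample_disable);
	else
		sketch_inc_unless_max(sample_disable);
	return 0;
}
```

The compare-and-exchange loops are one common way to get "modify unless at the limit" semantics; the direction of the flag (set means decrement) simply follows the quoted patch.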
[RFC PATCH] RDS: convert bind hash table to re-sizable hashtable
From: Santosh Shilimkar

To further improve the RDS connection scalability on massive systems where the number of sockets grows into tens of thousands, there is a need for a larger bind hashtable. A pre-allocated 8K or 16K table is not very flexible in terms of memory utilisation. The rhashtable infrastructure gives us the flexibility to grow the hashtable based on use and also comes with built-in efficient bucket (chain) handling.

Cc: David Laight
Cc: David Miller
Signed-off-by: Santosh Shilimkar
---
As promised in the last series review, here is an RFC to convert RDS to make use of re-sizable hash tables. I haven't turned on auto shrinking on purpose.

 net/rds/af_rds.c | 10 -
 net/rds/bind.c | 127 ---
 net/rds/rds.h | 7 ++-
 3 files changed, 58 insertions(+), 86 deletions(-)

diff --git a/net/rds/af_rds.c b/net/rds/af_rds.c
index 384ea1e..b5476aeb 100644
--- a/net/rds/af_rds.c
+++ b/net/rds/af_rds.c
@@ -573,6 +573,7 @@ static void rds_exit(void)
 	rds_threads_exit();
 	rds_stats_exit();
 	rds_page_exit();
+	rds_bind_lock_destroy();
 	rds_info_deregister_func(RDS_INFO_SOCKETS, rds_sock_info);
 	rds_info_deregister_func(RDS_INFO_RECV_MESSAGES, rds_sock_inc_info);
 }
@@ -582,11 +583,14 @@ static int rds_init(void)
 {
 	int ret;
 
-	rds_bind_lock_init();
+	ret = rds_bind_lock_init();
+	if (ret)
+		goto out;
 
 	ret = rds_conn_init();
 	if (ret)
-		goto out;
+		goto out_bind;
+
 	ret = rds_threads_init();
 	if (ret)
 		goto out_conn;
@@ -620,6 +624,8 @@ out_conn:
 	rds_conn_exit();
 	rds_cong_exit();
 	rds_page_exit();
+out_bind:
+	rds_bind_lock_destroy();
 out:
 	return ret;
 }
diff --git a/net/rds/bind.c b/net/rds/bind.c
index bc6b93e..199e4cc 100644
--- a/net/rds/bind.c
+++ b/net/rds/bind.c
@@ -38,54 +38,18 @@
 #include
 #include "rds.h"
 
-struct bind_bucket {
-	rwlock_t		lock;
-	struct hlist_head	head;
+static struct rhashtable bind_hash_table;
+
+static struct rhashtable_params ht_parms = {
+	.nelem_hint = 768,
+	.key_len = sizeof(u64),
+	.key_offset = offsetof(struct rds_sock, rs_bound_key),
+	.head_offset = offsetof(struct rds_sock, rs_bound_node),
+	.max_size = 16384,
+	.min_size = 1024,
+	.automatic_shrinking = true,
 };
 
-#define BIND_HASH_SIZE 1024
-static struct bind_bucket bind_hash_table[BIND_HASH_SIZE];
-
-static struct bind_bucket *hash_to_bucket(__be32 addr, __be16 port)
-{
-	return bind_hash_table + (jhash_2words((u32)addr, (u32)port, 0) &
-				  (BIND_HASH_SIZE - 1));
-}
-
-/* must hold either read or write lock (write lock for insert != NULL) */
-static struct rds_sock *rds_bind_lookup(struct bind_bucket *bucket,
-					__be32 addr, __be16 port,
-					struct rds_sock *insert)
-{
-	struct rds_sock *rs;
-	struct hlist_head *head = &bucket->head;
-	u64 cmp;
-	u64 needle = ((u64)be32_to_cpu(addr) << 32) | be16_to_cpu(port);
-
-	hlist_for_each_entry(rs, head, rs_bound_node) {
-		cmp = ((u64)be32_to_cpu(rs->rs_bound_addr) << 32) |
-		      be16_to_cpu(rs->rs_bound_port);
-
-		if (cmp == needle) {
-			rds_sock_addref(rs);
-			return rs;
-		}
-	}
-
-	if (insert) {
-		/*
-		 * make sure our addr and port are set before
-		 * we are added to the list.
-		 */
-		insert->rs_bound_addr = addr;
-		insert->rs_bound_port = port;
-		rds_sock_addref(insert);
-
-		hlist_add_head(&insert->rs_bound_node, head);
-	}
-	return NULL;
-}
-
 /*
  * Return the rds_sock bound at the given local address.
  *
@@ -94,18 +58,14 @@ static struct rds_sock *rds_bind_lookup(struct bind_bucket *bucket,
  */
 struct rds_sock *rds_find_bound(__be32 addr, __be16 port)
 {
+	u64 key = ((u64)addr << 32) | port;
 	struct rds_sock *rs;
-	unsigned long flags;
-	struct bind_bucket *bucket = hash_to_bucket(addr, port);
 
-	read_lock_irqsave(&bucket->lock, flags);
-	rs = rds_bind_lookup(bucket, addr, port, NULL);
-	read_unlock_irqrestore(&bucket->lock, flags);
-
-	if (rs && sock_flag(rds_rs_to_sk(rs), SOCK_DEAD)) {
-		rds_sock_put(rs);
+	rs = rhashtable_lookup_fast(&bind_hash_table, &key, ht_parms);
+	if (rs && !sock_flag(rds_rs_to_sk(rs), SOCK_DEAD))
+		rds_sock_addref(rs);
+	else
 		rs = NULL;
-	}
 
 	rdsdebug("returning rs %p for %pI4:%u\n",
 		 rs, &addr, ntohs(port));
@@ -116,10 +76,9 @@ struct rds_sock *rds_find_bound(__be32 addr, __be16 port)
 /* retur
Re: [PATCH] ethtool: Use kcalloc instead of kmalloc for ethtool_get_strings
On Wed, 2015-10-14 at 01:09 -0700, Joe Perches wrote:
> It seems that kernel memory can leak into userspace by a
> kmalloc, ethtool_get_strings, then copy_to_user sequence.
>
> Avoid this by using kcalloc to zero fill the copied buffer.
>
> Signed-off-by: Joe Perches
> ---
>
> stable too...
>
> On Tue, 2015-10-13 at 23:59 -0700, Jeff Kirsher wrote:
> > From: Jacob Keller
> []
> > diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c b/drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c
> []
> > @@ -206,13 +206,13 @@ static void fm10k_get_stat_strings(struct net_device *dev, u8 *data)
> >  	}
> >
> >  	for (i = 0; i < interface->hw.mac.max_queues; i++) {
> > -		sprintf(p, "tx_queue_%u_packets", i);
> > +		snprintf(p, ETH_GSTRING_LEN, "tx_queue_%u_packets", i);
>
> It seems these need a memset after the snprintf to zero fill
> bytes after the string terminating \0 to avoid leaking
> contents of any unset bytes.

Right. It used to be that all drivers were memcpy()ing from a static array which had all the necessary zero bytes, but now there are a bunch of them using s{,n}printf() or otherwise dynamically generating names for statistics or tests. And I don't think there's any snprintf()-alike function that will fix that.

At least these drivers aren't zero-padding all strings: bnx2x, bnad, i40e, i40evf, igb, ixgbe, liquidio, mlx4_en, mlx5e, nicvf, qlcnic, sfc, vxge.

Acked-by: Ben Hutchings

Ben.

> It'd probably be better to allocate a zeroed buffer instead.
>
> >  		p += ETH_GSTRING_LEN;
> > -		sprintf(p, "tx_queue_%u_bytes", i);
> > +		snprintf(p, ETH_GSTRING_LEN, "tx_queue_%u_bytes", i);
>
> so...
>
>  net/core/ethtool.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/core/ethtool.c b/net/core/ethtool.c
> index b495ab1..29edf74 100644
> --- a/net/core/ethtool.c
> +++ b/net/core/ethtool.c
> @@ -1284,7 +1284,7 @@ static int ethtool_get_strings(struct net_device *dev, void __user *useraddr)
>
>  	gstrings.len = ret;
>
> -	data = kmalloc(gstrings.len * ETH_GSTRING_LEN, GFP_USER);
> +	data = kcalloc(gstrings.len, ETH_GSTRING_LEN, GFP_USER);
>  	if (!data)
>  		return -ENOMEM;
>
>

--
Ben Hutchings
[W]e found...that it wasn't as easy to get programs right as we had thought. ... I realized that a large part of my life from then on was going to be spent in finding mistakes in my own programs. - Maurice Wilkes, 1949
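The leak discussed in this thread comes from fixed-width string slots: snprintf() writes the text and its terminating NUL but leaves the rest of the slot untouched, so whatever the allocator handed back is copied out along with the name. A minimal userspace sketch of the zero-filled-buffer approach follows; SKETCH_GSTRING_LEN and the sketch_* functions are hypothetical stand-ins for the kernel's ETH_GSTRING_LEN and a driver's get_strings callback, and calloc() plays the role of kcalloc().

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Local stand-in for the fixed slot width used by the strings table. */
#define SKETCH_GSTRING_LEN 32

/*
 * Build one fixed-width name per queue. The buffer comes from calloc(),
 * so every byte snprintf() does not write is already zero.
 */
static char *sketch_get_strings(unsigned int nqueues)
{
	char *data = calloc(nqueues, SKETCH_GSTRING_LEN);
	char *p = data;
	unsigned int i;

	if (!data)
		return NULL;

	for (i = 0; i < nqueues; i++) {
		snprintf(p, SKETCH_GSTRING_LEN, "tx_queue_%u_packets", i);
		p += SKETCH_GSTRING_LEN;
	}
	return data;
}

/* Return 1 if every byte after the string's NUL in this slot is zero. */
static int sketch_tail_is_zero(const char *slot)
{
	const char *nul = memchr(slot, '\0', SKETCH_GSTRING_LEN);
	size_t n = nul ? (size_t)(nul - slot) : SKETCH_GSTRING_LEN;

	for (; n < SKETCH_GSTRING_LEN; n++) {
		if (slot[n] != '\0')
			return 0;
	}
	return 1;
}
```

With malloc() in place of calloc(), sketch_tail_is_zero() would have no guarantee of succeeding, which is the situation the kcalloc change is meant to rule out for all drivers at once.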
Re: [PATCH net-next 3/3] tcp/dccp: fix race at listener dismantle phase
Hi Eric,

[auto build test WARNING on net-next/master -- if it's inappropriate base, please suggest rules for selecting the more suitable base]

url: https://github.com/0day-ci/linux/commits/Eric-Dumazet/tcp-dccp-make-our-listener-code-more-robust/20151015-020006
reproduce:
        # apt-get install sparse
        make ARCH=x86_64 allmodconfig
        make C=1 CF=-D__CHECK_ENDIAN__

sparse warnings: (new ones prefixed by >>)

>> net/ipv4/tcp_input.c:6238:17: sparse: context imbalance in 'tcp_conn_request' - unexpected unlock

vim +/tcp_conn_request +6238 net/ipv4/tcp_input.c

f7b3bec6 Florian Westphal 2014-11-03  6222
f7b3bec6 Florian Westphal 2014-11-03  6223  	if (want_cookie) {
f7b3bec6 Florian Westphal 2014-11-03  6224  		isn = cookie_init_sequence(af_ops, sk, skb, &req->mss);
f7b3bec6 Florian Westphal 2014-11-03  6225  		req->cookie_ts = tmp_opt.tstamp_ok;
f7b3bec6 Florian Westphal 2014-11-03  6226  		if (!tmp_opt.tstamp_ok)
f7b3bec6 Florian Westphal 2014-11-03  6227  			inet_rsk(req)->ecn_ok = 0;
f7b3bec6 Florian Westphal 2014-11-03  6228  	}
f7b3bec6 Florian Westphal 2014-11-03  6229
1fb6f159 Octavian Purdila 2014-06-25  6230  	tcp_rsk(req)->snt_isn = isn;
58d607d3 Eric Dumazet     2015-09-15  6231  	tcp_rsk(req)->txhash = net_tx_rndhash();
1fb6f159 Octavian Purdila 2014-06-25  6232  	tcp_openreq_init_rwin(req, sk, dst);
ca6fb065 Eric Dumazet     2015-10-02  6233  	if (!want_cookie) {
ca6fb065 Eric Dumazet     2015-10-02  6234  		tcp_reqsk_record_syn(sk, req, skb);
7656d842 Eric Dumazet     2015-10-04  6235  		fastopen_sk = tcp_try_fastopen(sk, skb, req, &foc, dst);
ca6fb065 Eric Dumazet     2015-10-02  6236  	}
7c85af88 Eric Dumazet     2015-09-24  6237  	if (fastopen_sk) {
ca6fb065 Eric Dumazet     2015-10-02 @6238  		af_ops->send_synack(fastopen_sk, dst, &fl, req,
ca6fb065 Eric Dumazet     2015-10-02  6239  				    skb_get_queue_mapping(skb), &foc, false);
7656d842 Eric Dumazet     2015-10-04  6240  		/* Add the child socket directly into the accept queue */
7656d842 Eric Dumazet     2015-10-04  6241  		inet_csk_reqsk_queue_add(sk, req, fastopen_sk);
7656d842 Eric Dumazet     2015-10-04  6242  		sk->sk_data_ready(sk);
7656d842 Eric Dumazet     2015-10-04  6243  		bh_unlock_sock(fastopen_sk);
7c85af88 Eric Dumazet     2015-09-24  6244  		sock_put(fastopen_sk);
7c85af88 Eric Dumazet     2015-09-24  6245  	} else {
9439ce00 Eric Dumazet     2015-03-17  6246  		tcp_rsk(req)->tfo_listener = false;

:: The code at line 6238 was first introduced by commit
:: ca6fb06518836ef9b65dc0aac02ff97704d52a05 tcp: attach SYNACK messages to request sockets instead of listener

:: TO: Eric Dumazet
:: CC: David S. Miller

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
Re: [PATCH net-next] route: fib_validate_source remove the <= RT_SCOPE_HOST test
Hello,

On Thu, 15 Oct 2015, lucien xin wrote:

> yeah, I don't understand why err > 0 is necessary to set IPSKB_DOREDIRECT
> to send redirects.
> FIB_RES_NH(res).nh_scope >= RT_SCOPE_HOST, what's that mean?

It tells us that packet comes from remote address that we can reach directly, without using gateway. The most common values for nh_scope are RT_SCOPE_LINK (when nh_gw is unicast address), RT_SCOPE_HOST (when nh_gw is not set or is local address) and RT_SCOPE_NOWHERE (when we have a local route). You can check fib_check_nh() and fib_create_info() for reference.

Regards

--
Julian Anastasov
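The comparison being asked about relies on the scope values being ordered, with narrower scopes having larger numbers. As a rough illustration only, the check can be written out as below; the SKETCH_SCOPE_* constants are local copies of the RT_SCOPE_LINK, RT_SCOPE_HOST and RT_SCOPE_NOWHERE values from the kernel's rtnetlink header, and sketch_sender_reachable_directly() is a hypothetical name, not a kernel function.

```c
/* Local copies of three scope values; a larger number means a narrower scope. */
enum sketch_scope {
	SKETCH_SCOPE_LINK    = 253, /* next hop has a unicast gateway */
	SKETCH_SCOPE_HOST    = 254, /* no gateway set, or gateway is a local address */
	SKETCH_SCOPE_NOWHERE = 255, /* local route */
};

/*
 * Mirrors the "nh_scope >= RT_SCOPE_HOST" test from the question: true when
 * the reverse route to the sender does not go through a gateway.
 */
static int sketch_sender_reachable_directly(enum sketch_scope nh_scope)
{
	return nh_scope >= SKETCH_SCOPE_HOST;
}
```

Under this reading, a next hop with scope LINK fails the test because a gateway sits between the host and the sender, while HOST and NOWHERE both pass.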
Re: [patch net-next v5 2/8] switchdev: make struct switchdev_attr parameter const for attr_set calls
On Oct. Wednesday 14 (42) 07:40 PM, Jiri Pirko wrote:
> From: Jiri Pirko
>
> Signed-off-by: Jiri Pirko

Reviewed-by: Vivien Didelot
Re: [PATCH net-next] switchdev: enforce no pvid flag in vlan ranges
On Oct. Wednesday 14 (42) 08:42 PM, Ido Schimmel wrote:
> Wed, Oct 14, 2015 at 08:14:24PM IDT, sfel...@gmail.com wrote:
> >On Wed, Oct 14, 2015 at 8:25 AM, Vivien Didelot wrote:
> >> On Oct. Wednesday 14 (42) 09:14 AM, Ido Schimmel wrote:
> >>> Tue, Oct 13, 2015 at 05:32:26PM IDT, vivien.dide...@savoirfairelinux.com wrote:
> >>> >On Oct. Tuesday 13 (42) 11:31 AM, Ido Schimmel wrote:
> >>> >> Mon, Oct 12, 2015 at 08:36:25PM IDT, vivien.dide...@savoirfairelinux.com wrote:
> >>> >> >Hi guys,
> >>> >> >
> >>> >> >On Oct. Monday 12 (42) 02:01 PM, Nikolay Aleksandrov wrote:
> >>> >> >> From: Nikolay Aleksandrov
> >>> >> >>
> >>> >> >> We shouldn't allow BRIDGE_VLAN_INFO_PVID flag in VLAN ranges.
> >>> >> >>
> >>> >> >> Signed-off-by: Nikolay Aleksandrov
> >>> >> >> ---
> >>> >> >>  net/switchdev/switchdev.c | 3 +++
> >>> >> >>  1 file changed, 3 insertions(+)
> >>> >> >>
> >>> >> >> diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
> >>> >> >> index 6e4a4f9ad927..256c596de896 100644
> >>> >> >> --- a/net/switchdev/switchdev.c
> >>> >> >> +++ b/net/switchdev/switchdev.c
> >>> >> >> @@ -720,6 +720,9 @@ static int switchdev_port_br_afspec(struct net_device *dev,
> >>> >> >>  			if (vlan.vid_begin)
> >>> >> >>  				return -EINVAL;
> >>> >> >>  			vlan.vid_begin = vinfo->vid;
> >>> >> >> +			/* don't allow range of pvids */
> >>> >> >> +			if (vlan.flags & BRIDGE_VLAN_INFO_PVID)
> >>> >> >> +				return -EINVAL;
> >>> >> >>  		} else if (vinfo->flags & BRIDGE_VLAN_INFO_RANGE_END) {
> >>> >> >>  			if (!vlan.vid_begin)
> >>> >> >>  				return -EINVAL;
> >>> >> >> --
> >>> >> >> 2.4.3
> >>> >> >>
> >>> >> >
> >>> >> >Yes the patch looks good, but it is a minor check though. I hope the subject of this thread is making sense.
> >>> >> >
> >>> >> >VLAN ranges seem to have been included for an UX purpose (so commands look like Cisco IOS). We don't want to change any existing interface, so we pushed that down to drivers, with the only valid reason that, maybe one day, an hardware can be capable of programming a range on a per-port basis.
> >>> >> Hi,
> >>> >>
> >>> >> That's actually what we are doing in mlxsw. We can do up to 256 entries in one go. We've yet to submit this part.
> >>> >
> >>> >Perfect Ido, thanks for pointing this out! I'm OK with the range then.
> >>> >
> >>> >So there is now a very last question in my head for this, which is more a matter of kernel design. Should the user be aware of such underlying support? In other words, would it make sense to do this in a driver:
> >>> >
> >>> >    foo_port_vlan_add(struct net_device *dev,
> >>> >                      struct switchdev_obj_port_vlan *vlan)
> >>> >    {
> >>> >        if (vlan->vid_begin != vlan->vid_end)
> >>> >            return -ENOTSUPP; /* or something more relevant for user */
> >>> >
> >>> >        return foo_port_single_vlan_add(dev, vlan->vid_begin);
> >>> >    }
> >>> >
> >>> >So drivers keep being simple, and we can easily propagate the fact that one-or-all VLAN is not supportable, vs. the VLAN feature itself is not implemented and must be done in software.
> >>> I think that if you want to keep it simple, then Scott's advice from the previous thread is the most appropriate one. I believe the hardware you are using is simply not meant to support multiple 802.1Q bridges.
> >>
> >> You mean allowing only one Linux bridge over an hardware switch?
> >>
> >> It would for sure simplify how, as developers and users, we represent a physical switch. But I am not sure how to achieve that and I don't have strong opinions on this TBH.
> >
> >Hi Vivien, I think it's possible to keep switch ports on just one bridge if we do a little bit of work on the NETDEV_CHANGEUPPER notifier. This will give you the driver-level control you want. Do you have time to investigate? The idea is:
> >
> >1) In your driver's handler for NETDEV_CHANGEUPPER, if switch port is being added to a second bridge, then return NOTIFY_BAD. Your driver needs to track the bridge count.
> >
> >2) In __netdev_upper_dev_link(), check the return code from the call_netdevice_notifiers_info(NETDEV_CHANGEUPPER, ...) call, and if NOTIFY_BAD, abort the linking operation (goto rollback_xxx).
> >
> Hi,
>
> We are doing something similar in mlxsw (not upstream yet). Jiri introduced PRE_CHANGEUPPER, which is called from the function you mentioned, but before the linking operation (so that you don't need to rollback).
>
> If the notification is about a linking operation and the master is a bridge different than the current one, then NOTIFY_BAD is returned.

Great, I'll wait fo
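The "one bridge per switch" idea discussed in this thread (veto the link before it happens, so nothing needs rolling back) can be modelled in a few lines of plain C. This is only a userspace sketch: every sketch_* name is hypothetical, the two return codes are local constants standing in for the kernel's notifier return values, and the real notifier plumbing is left out entirely.

```c
#include <stddef.h>

/* Local stand-ins for the notifier return values; not the kernel's definitions. */
enum { SKETCH_NOTIFY_DONE = 0, SKETCH_NOTIFY_BAD = 1 };

/* Hypothetical bridge handle; only its address matters here. */
struct sketch_bridge {
	int id;
};

/* Per-switch state the driver would have to track. */
struct sketch_switch {
	const struct sketch_bridge *bridge; /* bridge spanning the switch, or NULL */
	unsigned int nports_bridged;        /* ports currently enslaved to it */
};

/*
 * Pre-link check: veto the operation when a port would join a bridge
 * different from the one already using this switch.
 */
static int sketch_pre_changeupper(const struct sketch_switch *sw,
				  const struct sketch_bridge *upper, int linking)
{
	if (linking && sw->bridge && sw->bridge != upper)
		return SKETCH_NOTIFY_BAD;
	return SKETCH_NOTIFY_DONE;
}

/* Link a port; the check runs first, so a refusal leaves no state to undo. */
static int sketch_port_link(struct sketch_switch *sw,
			    const struct sketch_bridge *upper)
{
	if (sketch_pre_changeupper(sw, upper, 1) == SKETCH_NOTIFY_BAD)
		return -1;
	sw->bridge = upper;
	sw->nports_bridged++;
	return 0;
}

/* Unlink a port; when the last port leaves, the switch is free again. */
static void sketch_port_unlink(struct sketch_switch *sw)
{
	if (sw->nports_bridged && --sw->nports_bridged == 0)
		sw->bridge = NULL;
}
```

Running the check before any state changes is the design point raised in the reply about PRE_CHANGEUPPER: the refusal path is a plain early return rather than a rollback.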
Re: [PATCH net-next v2 6/6] net: dsa: remove port_fdb_getnext
On 13/10/15 09:46, Vivien Didelot wrote:
> No driver implements port_fdb_getnext anymore, and port_fdb_dump is
> preferred anyway, so remove this function from DSA.
>
> Signed-off-by: Vivien Didelot

Acked-by: Florian Fainelli
--
Florian
Re: [PATCH net-next v2 1/6] net: dsa: add port_fdb_dump function
On 13/10/15 09:46, Vivien Didelot wrote:
> Not all switch chips support a Get Next operation to iterate on its FDB.
> So add a more simple port_fdb_dump function for them.
>
> Signed-off-by: Vivien Didelot

Acked-by: Florian Fainelli
--
Florian
[PATCH v2 net-next 2/3] tcp/dccp: add inet_csk_reqsk_queue_drop_and_put() helper
Let's reduce the confusion about inet_csk_reqsk_queue_drop():
In many cases we also need to release reference on request socket, so add a helper to do this, reducing code size and complexity.

Fixes: 4bdc3d66147b ("tcp/dccp: fix behavior of stale SYN_RECV request sockets")
Signed-off-by: Eric Dumazet
---
 include/net/inet_connection_sock.h | 1 +
 net/dccp/ipv4.c | 2 +-
 net/dccp/ipv6.c | 2 +-
 net/ipv4/inet_connection_sock.c | 10 --
 net/ipv4/tcp_ipv4.c | 2 +-
 net/ipv6/tcp_ipv6.c | 2 +-
 6 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h
index 3208a65d1c28..89ecbc80b2ce 100644
--- a/include/net/inet_connection_sock.h
+++ b/include/net/inet_connection_sock.h
@@ -299,6 +299,7 @@ static inline int inet_csk_reqsk_queue_is_full(const struct sock *sk)
 }
 
 void inet_csk_reqsk_queue_drop(struct sock *sk, struct request_sock *req);
+void inet_csk_reqsk_queue_drop_and_put(struct sock *sk, struct request_sock *req);
 
 void inet_csk_destroy_sock(struct sock *sk);
 void inet_csk_prepare_forced_close(struct sock *sk);
diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c
index 644af510d932..59bc180b02d8 100644
--- a/net/dccp/ipv4.c
+++ b/net/dccp/ipv4.c
@@ -828,7 +828,7 @@ lookup:
 		if (likely(sk->sk_state == DCCP_LISTEN)) {
 			nsk = dccp_check_req(sk, skb, req);
 		} else {
-			inet_csk_reqsk_queue_drop(sk, req);
+			inet_csk_reqsk_queue_drop_and_put(sk, req);
 			goto lookup;
 		}
 		if (!nsk) {
diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c
index 68831931b1fe..d9cc731f2619 100644
--- a/net/dccp/ipv6.c
+++ b/net/dccp/ipv6.c
@@ -686,7 +686,7 @@ lookup:
 		if (likely(sk->sk_state == DCCP_LISTEN)) {
 			nsk = dccp_check_req(sk, skb, req);
 		} else {
-			inet_csk_reqsk_queue_drop(sk, req);
+			inet_csk_reqsk_queue_drop_and_put(sk, req);
 			goto lookup;
 		}
 		if (!nsk) {
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 514b9e910bd4..a5a1b54915e5 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -546,6 +546,13 @@ void inet_csk_reqsk_queue_drop(struct sock *sk, struct request_sock *req)
 }
 EXPORT_SYMBOL(inet_csk_reqsk_queue_drop);
 
+void inet_csk_reqsk_queue_drop_and_put(struct sock *sk, struct request_sock *req)
+{
+	inet_csk_reqsk_queue_drop(sk, req);
+	reqsk_put(req);
+}
+EXPORT_SYMBOL(inet_csk_reqsk_queue_drop_and_put);
+
 static void reqsk_timer_handler(unsigned long data)
 {
 	struct request_sock *req = (struct request_sock *)data;
@@ -608,8 +615,7 @@ static void reqsk_timer_handler(unsigned long data)
 		return;
 	}
 drop:
-	inet_csk_reqsk_queue_drop(sk_listener, req);
-	reqsk_put(req);
+	inet_csk_reqsk_queue_drop_and_put(sk_listener, req);
 }
 
 static void reqsk_queue_hash_req(struct request_sock *req,
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index aad2298de7ad..9c68cf3762c4 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1591,7 +1591,7 @@ process:
 		if (likely(sk->sk_state == TCP_LISTEN)) {
 			nsk = tcp_check_req(sk, skb, req, false);
 		} else {
-			inet_csk_reqsk_queue_drop(sk, req);
+			inet_csk_reqsk_queue_drop_and_put(sk, req);
 			goto lookup;
 		}
 		if (!nsk) {
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 7ce1c57199d1..acb06f86f372 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1386,7 +1386,7 @@ process:
 		if (likely(sk->sk_state == TCP_LISTEN)) {
 			nsk = tcp_check_req(sk, skb, req, false);
 		} else {
-			inet_csk_reqsk_queue_drop(sk, req);
+			inet_csk_reqsk_queue_drop_and_put(sk, req);
 			goto lookup;
 		}
 		if (!nsk) {
--
2.6.0.rc2.230.g3dd15c0
[PATCH v2 net-next 3/3] tcp/dccp: fix race at listener dismantle phase
Under stress, a close() on a listener can trigger the WARN_ON(sk->sk_ack_backlog) in inet_csk_listen_stop() We need to test if listener is still active before queueing a child in inet_csk_reqsk_queue_add() Create a common inet_child_forget() helper, and use it from inet_csk_reqsk_queue_add() and inet_csk_listen_stop() Signed-off-by: Eric Dumazet --- include/net/inet_connection_sock.h | 9 ++--- include/net/request_sock.h | 19 -- net/ipv4/inet_connection_sock.c| 71 ++ 3 files changed, 51 insertions(+), 48 deletions(-) diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h index 89ecbc80b2ce..8b0e3d8a4d81 100644 --- a/include/net/inet_connection_sock.h +++ b/include/net/inet_connection_sock.h @@ -268,13 +268,8 @@ struct dst_entry *inet_csk_route_child_sock(const struct sock *sk, struct sock *newsk, const struct request_sock *req); -static inline void inet_csk_reqsk_queue_add(struct sock *sk, - struct request_sock *req, - struct sock *child) -{ - reqsk_queue_add(&inet_csk(sk)->icsk_accept_queue, req, sk, child); -} - +void inet_csk_reqsk_queue_add(struct sock *sk, struct request_sock *req, + struct sock *child); void inet_csk_reqsk_queue_hash_add(struct sock *sk, struct request_sock *req, unsigned long timeout); diff --git a/include/net/request_sock.h b/include/net/request_sock.h index 2e73748956d5..a0dde04eb178 100644 --- a/include/net/request_sock.h +++ b/include/net/request_sock.h @@ -186,25 +186,6 @@ static inline bool reqsk_queue_empty(const struct request_sock_queue *queue) return queue->rskq_accept_head == NULL; } -static inline void reqsk_queue_add(struct request_sock_queue *queue, - struct request_sock *req, - struct sock *parent, - struct sock *child) -{ - spin_lock(&queue->rskq_lock); - req->sk = child; - sk_acceptq_added(parent); - - if (queue->rskq_accept_head == NULL) - queue->rskq_accept_head = req; - else - queue->rskq_accept_tail->dl_next = req; - - queue->rskq_accept_tail = req; - req->dl_next = NULL; - 
spin_unlock(&queue->rskq_lock); -} - static inline struct request_sock *reqsk_queue_remove(struct request_sock_queue *queue, struct sock *parent) { diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c index a5a1b54915e5..08eaa5e20574 100644 --- a/net/ipv4/inet_connection_sock.c +++ b/net/ipv4/inet_connection_sock.c @@ -764,6 +764,53 @@ int inet_csk_listen_start(struct sock *sk, const int nr_table_entries) } EXPORT_SYMBOL_GPL(inet_csk_listen_start); +static void inet_child_forget(struct sock *sk, struct request_sock *req, + struct sock *child) +{ + sk->sk_prot->disconnect(child, O_NONBLOCK); + + sock_orphan(child); + + percpu_counter_inc(sk->sk_prot->orphan_count); + + if (sk->sk_protocol == IPPROTO_TCP && tcp_rsk(req)->tfo_listener) { + BUG_ON(tcp_sk(child)->fastopen_rsk != req); + BUG_ON(sk != req->rsk_listener); + + /* Paranoid, to prevent race condition if +* an inbound pkt destined for child is +* blocked by sock lock in tcp_v4_rcv(). +* Also to satisfy an assertion in +* tcp_v4_destroy_sock(). +*/ + tcp_sk(child)->fastopen_rsk = NULL; + } + inet_csk_destroy_sock(child); + reqsk_put(req); +} + +void inet_csk_reqsk_queue_add(struct sock *sk, struct request_sock *req, + struct sock *child) +{ + struct request_sock_queue *queue = &inet_csk(sk)->icsk_accept_queue; + + spin_lock(&queue->rskq_lock); + if (unlikely(sk->sk_state != TCP_LISTEN)) { + inet_child_forget(sk, req, child); + } else { + req->sk = child; + req->dl_next = NULL; + if (queue->rskq_accept_head == NULL) + queue->rskq_accept_head = req; + else + queue->rskq_accept_tail->dl_next = req; + queue->rskq_accept_tail = req; + sk_acceptq_added(sk); + } + spin_unlock(&queue->rskq_lock); +} +EXPORT_SYMBOL(inet_csk_reqsk_queue_add); + /* * This routine closes sockets which have been at least partially * opened, but not yet accepted. @@ -790,31 +837,11 @@ void inet_csk_listen_stop(struct sock *sk) WARN_ON(sock_owned_by_user(child)); sock_hold(child); - sk->sk_prot->disc
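For readers following the idea behind the new inet_csk_reqsk_queue_add(), here is a minimal userspace sketch of the "check the listener state before linking the child" logic. It is not kernel code: struct listener, struct child, child_forget() and queue_add() are hypothetical stand-ins, and the spinlock that the patch takes around this logic is deliberately omitted, so the sketch only shows the branch and the list handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a child socket waiting to be accepted. */
struct child {
	struct child *next;
	bool destroyed;
};

/* Hypothetical stand-in for the listener and its accept queue. */
struct listener {
	bool listening;
	struct child *head;
	struct child *tail;
	int backlog;
};

/* Stand-in for inet_child_forget(): dispose of a child nobody will accept. */
static void child_forget(struct child *c)
{
	c->destroyed = true;
}

/* Returns true if the child was queued. In the kernel patch this body
 * runs with the queue lock held; the lock is left out of this sketch. */
static bool queue_add(struct listener *l, struct child *c)
{
	if (!l->listening) {
		/* Listener is being dismantled: do not queue. */
		child_forget(c);
		return false;
	}
	c->next = NULL;
	if (l->head == NULL)
		l->head = c;
	else
		l->tail->next = c;
	l->tail = c;
	l->backlog++;
	return true;
}
```

Calling queue_add() on a listener whose listening flag has been cleared leaves the backlog count untouched, which is the property the WARN_ON() in the commit message checks for.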
[PATCH v2 net-next 1/3] Revert "inet: fix double request socket freeing"
This reverts commit c69736696cf3742b37d850289dc0d7ead177bb14.

At the time of the above commit, tcp_req_err() and dccp_req_err() were dead code, as SYN_RECV request sockets were not yet in the ehash table.

The real bug was fixed later in a different commit.

We need to revert so as not to leak a refcount on the request socket.

inet_csk_reqsk_queue_drop_and_put() will be added in the following commit to make clear that inet_csk_reqsk_queue_drop() does not release the reference owned by the caller.

Signed-off-by: Eric Dumazet
---
 net/dccp/ipv4.c     | 2 +-
 net/ipv4/tcp_ipv4.c | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c
index 0dcf1963b323..644af510d932 100644
--- a/net/dccp/ipv4.c
+++ b/net/dccp/ipv4.c
@@ -208,7 +208,6 @@ void dccp_req_err(struct sock *sk, u64 seq)
 
 	if (!between48(seq, dccp_rsk(req)->dreq_iss, dccp_rsk(req)->dreq_gss)) {
 		NET_INC_STATS_BH(net, LINUX_MIB_OUTOFWINDOWICMPS);
-		reqsk_put(req);
 	} else {
 		/*
 		 * Still in RESPOND, just remove it silently.
@@ -218,6 +217,7 @@ void dccp_req_err(struct sock *sk, u64 seq)
 		 */
 		inet_csk_reqsk_queue_drop(req->rsk_listener, req);
 	}
+	reqsk_put(req);
 }
 EXPORT_SYMBOL(dccp_req_err);
 
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 1ff0923df715..aad2298de7ad 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -324,7 +324,6 @@ void tcp_req_err(struct sock *sk, u32 seq)
 
 	if (seq != tcp_rsk(req)->snt_isn) {
 		NET_INC_STATS_BH(net, LINUX_MIB_OUTOFWINDOWICMPS);
-		reqsk_put(req);
 	} else {
 		/*
 		 * Still in SYN_RECV, just remove it silently.
@@ -332,9 +331,10 @@ void tcp_req_err(struct sock *sk, u32 seq)
 		 * created socket, and POSIX does not want network
 		 * errors returned from accept().
 		 */
-		NET_INC_STATS_BH(net, LINUX_MIB_LISTENDROPS);
 		inet_csk_reqsk_queue_drop(req->rsk_listener, req);
+		NET_INC_STATS_BH(net, LINUX_MIB_LISTENDROPS);
 	}
+	reqsk_put(req);
 }
 EXPORT_SYMBOL(tcp_req_err);
 
-- 
2.6.0.rc2.230.g3dd15c0
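The refcount rule restored by this revert can be illustrated with a small userspace sketch. This is only a model under stated assumptions: struct req, req_put(), req_queue_drop() and req_err() are hypothetical names standing in for the request socket, reqsk_put(), inet_csk_reqsk_queue_drop() and tcp_req_err(). The point is that the error handler releases the caller's reference exactly once, after the branch, while dropping from the queue releases only the queue's own reference.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a request socket. */
struct req {
	int refcnt;		/* total references held */
	bool queued;		/* true while the listener queue holds one */
	unsigned int isn;	/* sequence number we sent */
};

/* Release one reference (stand-in for reqsk_put()). */
static void req_put(struct req *r)
{
	assert(r->refcnt > 0);
	r->refcnt--;
}

/* Unlink from the queue and release only the queue's reference
 * (stand-in for inet_csk_reqsk_queue_drop()). */
static void req_queue_drop(struct req *r)
{
	if (r->queued) {
		r->queued = false;
		req_put(r);
	}
}

/* Error handler: whichever branch runs, the caller's reference is
 * released exactly once at the end. */
static void req_err(struct req *r, unsigned int seq)
{
	if (seq == r->isn)
		req_queue_drop(r);	/* matching sequence: drop silently */
	req_put(r);			/* always release the caller's ref */
}
```

With a request that starts with two references (one for the queue, one for the caller), a matching sequence number ends at zero references, and a non-matching one leaves only the queue's reference.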
[PATCH v2 net-next 0/3] tcp/dccp: make our listener code more robust
This patch series addresses request socket leaks and a race in the listener dismantle phase.

It survives a stress test with listeners being added and removed quite randomly.

Eric Dumazet (3):
  Revert "inet: fix double request socket freeing"
  tcp/dccp: add inet_csk_reqsk_queue_drop_and_put() helper
  tcp/dccp: fix race at listener dismantle phase

 include/net/inet_connection_sock.h | 10 ++---
 include/net/request_sock.h         | 19 -
 net/dccp/ipv4.c                    |  4 +-
 net/dccp/ipv6.c                    |  2 +-
 net/ipv4/inet_connection_sock.c    | 81 +++---
 net/ipv4/tcp_ipv4.c                |  6 +--
 net/ipv6/tcp_ipv6.c                |  2 +-
 7 files changed, 67 insertions(+), 57 deletions(-)

-- 
2.6.0.rc2.230.g3dd15c0
Re: [PATCH net-next 3/3] tcp/dccp: fix race at listener dismantle phase
On Wed, 2015-10-14 at 10:58 -0700, Eric Dumazet wrote:
...
> diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
> index a5a1b54915e5..38b7ef8b0b78 100644
> --- a/net/ipv4/inet_connection_sock.c
> +++ b/net/ipv4/inet_connection_sock.c
> @@ -740,7 +740,7 @@ int inet_csk_listen_start(struct sock *sk, const int nr_table_entries)
>
>  	reqsk_queue_alloc(&icsk->icsk_accept_queue);
>
> -	sk->sk_max_ack_backlog = 0;
> +	sk->sk_max_ack_backlog = nr_table_entries;
>  	sk->sk_ack_backlog = 0;
>  	inet_csk_delack_init(sk);
>
> @@ -764,6 +764,53 @@ int inet_csk_listen_start(struct sock *sk, const int nr_table_entries)

Arg, this part was not meant to be there, sorry.

Will send a v2
[PATCH net 1/3] openvswitch: Reject ct_state masks for unknown bits
Currently, 0-bits are generated in ct_state where the bit position is undefined, and matches are accepted on these bit-positions. If userspace requests to match the 0-value for this bit then it may expect only a subset of traffic to match this value, whereas currently all packets will have this bit set to 0. Fix this by rejecting such masks.

Signed-off-by: Joe Stringer
---
 net/openvswitch/conntrack.h    | 11 +--
 net/openvswitch/flow_netlink.c |  5 -
 2 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/net/openvswitch/conntrack.h b/net/openvswitch/conntrack.h
index da8714942c95..2d42b3640117 100644
--- a/net/openvswitch/conntrack.h
+++ b/net/openvswitch/conntrack.h
@@ -35,12 +35,9 @@ void ovs_ct_fill_key(const struct sk_buff *skb, struct sw_flow_key *key);
 int ovs_ct_put_key(const struct sw_flow_key *key, struct sk_buff *skb);
 void ovs_ct_free_action(const struct nlattr *a);
 
-static inline bool ovs_ct_state_supported(u32 state)
-{
-	return !(state & ~(OVS_CS_F_NEW | OVS_CS_F_ESTABLISHED |
-			 OVS_CS_F_RELATED | OVS_CS_F_REPLY_DIR |
-			 OVS_CS_F_INVALID | OVS_CS_F_TRACKED));
-}
+#define CT_SUPPORTED_MASK (OVS_CS_F_NEW | OVS_CS_F_ESTABLISHED | \
+			   OVS_CS_F_RELATED | OVS_CS_F_REPLY_DIR | \
+			   OVS_CS_F_INVALID | OVS_CS_F_TRACKED)
 #else
 #include
 
@@ -94,5 +91,7 @@ static inline int ovs_ct_put_key(const struct sw_flow_key *key,
 }
 
 static inline void ovs_ct_free_action(const struct nlattr *a) { }
+
+#define CT_SUPPORTED_MASK 0
 #endif /* CONFIG_NF_CONNTRACK */
 #endif /* ovs_conntrack.h */
diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
index 171a691f1c32..bd710bc37469 100644
--- a/net/openvswitch/flow_netlink.c
+++ b/net/openvswitch/flow_netlink.c
@@ -816,7 +816,7 @@ static int metadata_from_nlattrs(struct net *net, struct sw_flow_match *match,
 	    ovs_ct_verify(net, OVS_KEY_ATTR_CT_STATE)) {
 		u32 ct_state = nla_get_u32(a[OVS_KEY_ATTR_CT_STATE]);
 
-		if (!is_mask && !ovs_ct_state_supported(ct_state)) {
+		if (ct_state & ~CT_SUPPORTED_MASK) {
 			OVS_NLERR(log, "ct_state flags %08x unsupported",
 				  ct_state);
 			return -EINVAL;
@@ -1099,6 +1099,9 @@ static void nlattr_set(struct nlattr *attr, u8 val,
 		} else {
 			memset(nla_data(nla), val, nla_len(nla));
 		}
+
+		if (nla_type(nla) == OVS_KEY_ATTR_CT_STATE)
+			*(u32 *)nla_data(nla) &= CT_SUPPORTED_MASK;
 	}
 }
 
-- 
2.1.4
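As a rough illustration of the two checks in this patch, the following userspace sketch rejects any value that touches an undefined bit and clamps an all-ones wildcard mask to the supported bits. The CS_F_* values below are made up for the example (the real OVS_CS_F_* constants come from the uapi header), and ct_state_check() and ct_state_full_mask() are hypothetical helper names.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Illustrative flag values only; not the real OVS_CS_F_* constants. */
#define CS_F_NEW         0x01u
#define CS_F_ESTABLISHED 0x02u
#define CS_F_RELATED     0x04u
#define CS_F_REPLY_DIR   0x08u
#define CS_F_INVALID     0x10u
#define CS_F_TRACKED     0x20u

#define CT_SUPPORTED_MASK (CS_F_NEW | CS_F_ESTABLISHED | \
			   CS_F_RELATED | CS_F_REPLY_DIR | \
			   CS_F_INVALID | CS_F_TRACKED)

/* Reject a key or a mask that sets any bit outside the supported set. */
static int ct_state_check(uint32_t ct_state)
{
	if (ct_state & ~CT_SUPPORTED_MASK)
		return -EINVAL;
	return 0;
}

/* Start from an all-ones mask, as a memset(0xff) would produce, and
 * clamp it so that undefined bits are never matched. */
static uint32_t ct_state_full_mask(void)
{
	uint32_t mask = 0xffffffffu;

	mask &= CT_SUPPORTED_MASK;
	assert((mask & ~CT_SUPPORTED_MASK) == 0);
	return mask;
}
```

Because the same check now applies to masks as well as keys, a request to match on an undefined bit fails up front instead of silently matching every packet.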
[PATCH net 3/3] openvswitch: Serialize nested ct actions if provided
If userspace provides a ct action with no nested mark or label, then the storage for these fields is zeroed. Later when actions are requested, such zeroed fields are serialized even though userspace didn't originally specify them. Fix the behaviour by ensuring that no action is serialized in this case, and reject actions where userspace attempts to set these fields with mask=0. This should make netlink marshalling consistent across deserialization/reserialization.

Reported-by: Jarno Rajahalme
Signed-off-by: Joe Stringer
---
 net/openvswitch/conntrack.c | 21 -
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 480dbb9095b7..ba29e6c2e0d4 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -540,6 +540,16 @@ static int ovs_ct_add_helper(struct ovs_conntrack_info *info, const char *name,
 	return 0;
 }
 
+static bool label_zero(const struct ovs_key_ct_labels *labels)
+{
+	int i;
+
+	for (i = 0; i < sizeof(*labels); i++)
+		if (labels->ct_labels[i])
+			return false;
+	return true;
+}
+
 static const struct ovs_ct_len_tbl ovs_ct_attr_lens[OVS_CT_ATTR_MAX + 1] = {
 	[OVS_CT_ATTR_COMMIT]	= { .minlen = 0, .maxlen = 0 },
 	[OVS_CT_ATTR_ZONE]	= { .minlen = sizeof(u16),
@@ -589,6 +599,10 @@ static int parse_ct(const struct nlattr *attr, struct ovs_conntrack_info *info,
 		case OVS_CT_ATTR_MARK: {
 			struct md_mark *mark = nla_data(a);
 
+			if (!mark->mask) {
+				OVS_NLERR(log, "ct_mark mask cannot be 0");
+				return -EINVAL;
+			}
 			info->mark = *mark;
 			break;
 		}
@@ -597,6 +611,10 @@ static int parse_ct(const struct nlattr *attr, struct ovs_conntrack_info *info,
 		case OVS_CT_ATTR_LABELS: {
 			struct md_labels *labels = nla_data(a);
 
+			if (label_zero(&labels->mask)) {
+				OVS_NLERR(log, "ct_labels mask cannot be 0");
+				return -EINVAL;
+			}
 			info->labels = *labels;
 			break;
 		}
@@ -707,11 +725,12 @@ int ovs_ct_action_to_attr(const struct ovs_conntrack_info *ct_info,
 	if (IS_ENABLED(CONFIG_NF_CONNTRACK_ZONES) &&
 	    nla_put_u16(skb, OVS_CT_ATTR_ZONE, ct_info->zone.id))
 		return -EMSGSIZE;
-	if (IS_ENABLED(CONFIG_NF_CONNTRACK_MARK) &&
+	if (IS_ENABLED(CONFIG_NF_CONNTRACK_MARK) && ct_info->mark.mask &&
 	    nla_put(skb, OVS_CT_ATTR_MARK, sizeof(ct_info->mark),
 		    &ct_info->mark))
 		return -EMSGSIZE;
 	if (IS_ENABLED(CONFIG_NF_CONNTRACK_LABELS) &&
+	    !label_zero(&ct_info->labels.mask) &&
 	    nla_put(skb, OVS_CT_ATTR_LABELS, sizeof(ct_info->labels),
 		    &ct_info->labels))
 		return -EMSGSIZE;
-- 
2.1.4
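The serialization rule from this patch (emit mark and labels only when their mask is non-zero) can be sketched in plain C as follows. This is an illustration rather than the OVS code: the struct layouts, the 16-byte label size, and the names labels_zero() and ct_attrs_to_emit() are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LABELS_LEN 16	/* assumed size, for this sketch only */

struct labels {
	uint8_t bytes[LABELS_LEN];
};

/* Hypothetical, simplified view of the parsed ct action. */
struct ct_info {
	struct { uint32_t value, mask; } mark;
	struct { struct labels value, mask; } labels;
};

/* True if every byte of the label set is zero. */
static bool labels_zero(const struct labels *l)
{
	size_t i;

	for (i = 0; i < sizeof(l->bytes); i++)
		if (l->bytes[i])
			return false;
	return true;
}

/* Count the optional attributes that would be written back out.
 * A zero mask means userspace never supplied the field, so it is
 * skipped rather than serialized as zeroes. */
static int ct_attrs_to_emit(const struct ct_info *info)
{
	int n = 0;

	if (info->mark.mask)
		n++;
	if (!labels_zero(&info->labels.mask))
		n++;
	return n;
}
```

A fully zeroed ct_info therefore produces no optional attributes, which is what keeps a dump of the action identical to what userspace originally sent.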
[PATCH net] openvswitch: Scrub skb between namespaces
If OVS receives a packet from another namespace, then the packet should be scrubbed. However, people have already begun to rely on the behaviour that skb->mark is preserved across namespaces, so retain this one field.

This is mainly to address information leakage between namespaces when using OVS internal ports, but by placing it in ovs_vport_receive() it is more generally applicable, meaning it should not be overlooked if other port types are allowed to be moved into namespaces in future.

Signed-off-by: Joe Stringer
---
I originally proposed this patch as part of the conntrack changes to OVS, and there was some discussion on that thread, culminating here:

http://www.spinics.net/lists/netdev/msg338626.html

We also discussed this a bit in Seattle, however I didn't follow up immediately so I don't exactly recall what the consensus was. Following Jesse's direction in the above thread, I'm proposing that we preserve the mark, but scrub the rest. Also fixed the use-after-free bug present in the previous version.

I think this is relevant for 'net', because this is the first time that the metadata_dst and nfct are exposed (albeit indirectly) through OVS so it would be nice to get agreement on the expected behaviour.
---
 net/openvswitch/vport.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/net/openvswitch/vport.c b/net/openvswitch/vport.c
index fc5c0b9ccfe9..70f19ea99b92 100644
--- a/net/openvswitch/vport.c
+++ b/net/openvswitch/vport.c
@@ -440,10 +440,17 @@ int ovs_vport_receive(struct vport *vport, struct sk_buff *skb,
 		      const struct ip_tunnel_info *tun_info)
 {
 	struct sw_flow_key key;
+	u32 mark = skb->mark;
 	int error;
 
 	OVS_CB(skb)->input_vport = vport;
 	OVS_CB(skb)->mru = 0;
+	if (dev_net(skb->dev) != ovs_dp_get_net(vport->dp)) {
+		skb_scrub_packet(skb, true);
+		tun_info = NULL;
+	}
+	skb->mark = mark;
+
 	/* Extract flow from 'skb' into 'key'. */
 	error = ovs_flow_key_extract(tun_info, skb, &key);
 	if (unlikely(error)) {
-- 
2.1.4
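The save, scrub, restore pattern used here can be shown with a small userspace model. Everything in it is hypothetical: struct pkt, its other_meta field, pkt_scrub() and pkt_receive() only stand in for the skb, its remaining metadata, skb_scrub_packet() and ovs_vport_receive().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily simplified packet metadata. */
struct pkt {
	uint32_t mark;
	uint32_t other_meta;	/* stands in for everything else scrubbed */
	bool has_tunnel_info;
	int netns;		/* namespace the packet arrived from */
};

/* Stand-in for the scrub: wipes all metadata, including the mark. */
static void pkt_scrub(struct pkt *p)
{
	p->mark = 0;
	p->other_meta = 0;
	p->has_tunnel_info = false;
}

/* Receive path: scrub only when crossing namespaces, but keep the
 * mark in every case by saving it first and restoring it after. */
static void pkt_receive(struct pkt *p, int datapath_netns)
{
	uint32_t mark = p->mark;

	if (p->netns != datapath_netns)
		pkt_scrub(p);
	p->mark = mark;
}
```

Restoring the mark unconditionally keeps the code path the same whether or not the scrub ran, which mirrors the structure of the hunk above.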
[PATCH net 2/3] openvswitch: Treat IP_CT_RELATED as new
New, related connections are marked as such as part of ovs_ct_lookup(), but they are not marked as "new" if the commit flag is used. Make this consistent by treating IP_CT_RELATED as new as well.

Reported-by: Jarno Rajahalme
Signed-off-by: Joe Stringer
---
 net/openvswitch/conntrack.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 80bf702715bb..480dbb9095b7 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -86,6 +86,8 @@ static u8 ovs_ct_get_state(enum ip_conntrack_info ctinfo)
 		ct_state |= OVS_CS_F_ESTABLISHED;
 		break;
 	case IP_CT_RELATED:
+		ct_state |= OVS_CS_F_NEW;
+		/* Fall through */
 	case IP_CT_RELATED_REPLY:
 		ct_state |= OVS_CS_F_RELATED;
 		break;
-- 
2.1.4
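A standalone sketch of the fall-through this patch introduces is shown below. The enum, the flag values and ct_get_state() are invented for the example and cover only the cases visible in the hunk; the real function handles more conntrack states.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative flag values only; not the real OVS_CS_F_* constants. */
#define CS_F_NEW         0x01u
#define CS_F_ESTABLISHED 0x02u
#define CS_F_RELATED     0x04u

/* Hypothetical subset of the conntrack info values. */
enum ct_kind {
	CT_ESTABLISHED,
	CT_RELATED,
	CT_RELATED_REPLY,
	CT_OTHER
};

static uint8_t ct_get_state(enum ct_kind ctinfo)
{
	uint8_t ct_state = 0;

	switch (ctinfo) {
	case CT_ESTABLISHED:
		ct_state |= CS_F_ESTABLISHED;
		break;
	case CT_RELATED:
		/* Original direction of a related connection: also new. */
		ct_state |= CS_F_NEW;
		/* Fall through */
	case CT_RELATED_REPLY:
		ct_state |= CS_F_RELATED;
		break;
	default:
		break;
	}
	return ct_state;
}
```

Only the CT_RELATED case picks up the extra "new" bit; the reply direction still reports "related" alone.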
Re: [PATCH net-next] drivers/net: get rid of unnecessary initializations in .get_drvinfo()
On Wed, 2015-10-14 at 18:27 +0200, Ivan Vecera wrote:
> Many drivers initialize uselessly n_priv_flags, n_stats, testinfo_len,
> eedump_len & regdump_len fields in their .get_drvinfo() ethtool op.
> It's not necessary as these fields are filled in ethtool_get_drvinfo().
>
> Signed-off-by: Ivan Vecera
[...]

Acked-by: Ben Hutchings

-- 
Ben Hutchings
[W]e found...that it wasn't as easy to get programs right as we had thought. ... I realized that a large part of my life from then on was going to be spent in finding mistakes in my own programs. - Maurice Wilkes, 1949
Re: [PATCH v3 3/5] net: phy: Add Broadcom phy library for common interfaces
On 10/06/2015 03:25 PM, Arun Parameswaran wrote: This patch adds the Broadcom phy library to consolidate common interfaces shared by Broadcom phy's. The BCM54612E is included in the Broadcom Community part portfolio (https://community.broadcom.com). However, I don't see this part explicitly supported by your phy library ( e.g., not included in broadcom_drivers[] in broadcom.c ). Can you please comment on whether this part is supported or the extent of changes required to establish and support a robust GigE connection between RGMII and CU? We're considering this part for a new embedded design, and we need an open source driver for it. Thanks Bob Moved the common interfaces to the 'bcm-phy-lib.c' and updated the Broadcom PHY drivers to use the new APIs. Signed-off-by: Arun Parameswaran --- drivers/net/phy/Kconfig | 6 ++ drivers/net/phy/Makefile | 1 + drivers/net/phy/bcm-phy-lib.c | 209 ++ drivers/net/phy/bcm-phy-lib.h | 37 drivers/net/phy/bcm63xx.c | 38 +--- drivers/net/phy/bcm7xxx.c | 127 ++--- drivers/net/phy/broadcom.c| 149 +- include/linux/brcmphy.h | 22 + 8 files changed, 333 insertions(+), 256 deletions(-) create mode 100644 drivers/net/phy/bcm-phy-lib.c create mode 100644 drivers/net/phy/bcm-phy-lib.h diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig index b57f6c2..606fdc9 100644 --- a/drivers/net/phy/Kconfig +++ b/drivers/net/phy/Kconfig @@ -69,8 +69,12 @@ config SMSC_PHY ---help--- Currently supports the LAN83C185, LAN8187 and LAN8700 PHYs +config BCM_NET_PHYLIB + tristate + config BROADCOM_PHY tristate "Drivers for Broadcom PHYs" + select BCM_NET_PHYLIB ---help--- Currently supports the BCM5411, BCM5421, BCM5461, BCM54616S, BCM5464, BCM5481 and BCM5482 PHYs. @@ -78,11 +82,13 @@ config BROADCOM_PHY config BCM63XX_PHY tristate "Drivers for Broadcom 63xx SOCs internal PHY" depends on BCM63XX + select BCM_NET_PHYLIB ---help--- Currently supports the 6348 and 6358 PHYs. 
config BCM7XXX_PHY tristate "Drivers for Broadcom 7xxx SOCs internal PHYs" + select BCM_NET_PHYLIB ---help--- Currently supports the BCM7366, BCM7439, BCM7445, and 40nm and 65nm generation of BCM7xxx Set Top Box SoCs. diff --git a/drivers/net/phy/Makefile b/drivers/net/phy/Makefile index f4e6eb9..6932475 100644 --- a/drivers/net/phy/Makefile +++ b/drivers/net/phy/Makefile @@ -12,6 +12,7 @@ obj-$(CONFIG_QSEMI_PHY) += qsemi.o obj-$(CONFIG_SMSC_PHY)+= smsc.o obj-$(CONFIG_TERANETICS_PHY) += teranetics.o obj-$(CONFIG_VITESSE_PHY) += vitesse.o +obj-$(CONFIG_BCM_NET_PHYLIB) += bcm-phy-lib.o obj-$(CONFIG_BROADCOM_PHY)+= broadcom.o obj-$(CONFIG_BCM63XX_PHY) += bcm63xx.o obj-$(CONFIG_BCM7XXX_PHY) += bcm7xxx.o diff --git a/drivers/net/phy/bcm-phy-lib.c b/drivers/net/phy/bcm-phy-lib.c new file mode 100644 index 000..13e161e --- /dev/null +++ b/drivers/net/phy/bcm-phy-lib.c @@ -0,0 +1,209 @@ +/* + * Copyright (C) 2015 Broadcom Corporation + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License as + * published by the Free Software Foundation version 2. + * + * This program is distributed "as is" WITHOUT ANY WARRANTY of any + * kind, whether express or implied; without even the implied warranty + * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ */ + +#include "bcm-phy-lib.h" +#include +#include +#include +#include + +#define MII_BCM_CHANNEL_WIDTH 0x2000 +#define BCM_CL45VEN_EEE_ADV 0x3c + +int bcm_phy_write_exp(struct phy_device *phydev, u16 reg, u16 val) +{ + int rc; + + rc = phy_write(phydev, MII_BCM54XX_EXP_SEL, reg); + if (rc < 0) + return rc; + + return phy_write(phydev, MII_BCM54XX_EXP_DATA, val); +} +EXPORT_SYMBOL_GPL(bcm_phy_write_exp); + +int bcm_phy_read_exp(struct phy_device *phydev, u16 reg) +{ + int val; + + val = phy_write(phydev, MII_BCM54XX_EXP_SEL, reg); + if (val < 0) + return val; + + val = phy_read(phydev, MII_BCM54XX_EXP_DATA); + + /* Restore default value. It's O.K. if this write fails. */ + phy_write(phydev, MII_BCM54XX_EXP_SEL, 0); + + return val; +} +EXPORT_SYMBOL_GPL(bcm_phy_read_exp); + +int bcm_phy_write_misc(struct phy_device *phydev, + u16 reg, u16 chl, u16 val) +{ + int rc; + int tmp; + + rc = phy_write(phydev, MII_BCM54XX_AUX_CTL, + MII_BCM54XX_AUXCTL_SHDWSEL_MISC); + if (rc < 0) + return rc; + + tmp = phy_read(phydev, MII_BCM54XX_AUX_CTL); + tmp |= MII_BCM
[PATCH net-next 2/3] tcp/dccp: add inet_csk_reqsk_queue_drop_and_put() helper
Let's reduce the confusion about inet_csk_reqsk_queue_drop() : In many cases we also need to release reference on request socket, so add a helper to do this, reducing code size and complexity. Fixes: 4bdc3d66147b ("tcp/dccp: fix behavior of stale SYN_RECV request sockets") Signed-off-by: Eric Dumazet --- include/net/inet_connection_sock.h | 1 + net/dccp/ipv4.c| 2 +- net/dccp/ipv6.c| 2 +- net/ipv4/inet_connection_sock.c| 10 -- net/ipv4/tcp_ipv4.c| 2 +- net/ipv6/tcp_ipv6.c| 2 +- 6 files changed, 13 insertions(+), 6 deletions(-) diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h index 3208a65d1c28..89ecbc80b2ce 100644 --- a/include/net/inet_connection_sock.h +++ b/include/net/inet_connection_sock.h @@ -299,6 +299,7 @@ static inline int inet_csk_reqsk_queue_is_full(const struct sock *sk) } void inet_csk_reqsk_queue_drop(struct sock *sk, struct request_sock *req); +void inet_csk_reqsk_queue_drop_and_put(struct sock *sk, struct request_sock *req); void inet_csk_destroy_sock(struct sock *sk); void inet_csk_prepare_forced_close(struct sock *sk); diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c index 644af510d932..59bc180b02d8 100644 --- a/net/dccp/ipv4.c +++ b/net/dccp/ipv4.c @@ -828,7 +828,7 @@ lookup: if (likely(sk->sk_state == DCCP_LISTEN)) { nsk = dccp_check_req(sk, skb, req); } else { - inet_csk_reqsk_queue_drop(sk, req); + inet_csk_reqsk_queue_drop_and_put(sk, req); goto lookup; } if (!nsk) { diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c index 68831931b1fe..d9cc731f2619 100644 --- a/net/dccp/ipv6.c +++ b/net/dccp/ipv6.c @@ -686,7 +686,7 @@ lookup: if (likely(sk->sk_state == DCCP_LISTEN)) { nsk = dccp_check_req(sk, skb, req); } else { - inet_csk_reqsk_queue_drop(sk, req); + inet_csk_reqsk_queue_drop_and_put(sk, req); goto lookup; } if (!nsk) { diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c index 514b9e910bd4..a5a1b54915e5 100644 --- a/net/ipv4/inet_connection_sock.c +++ b/net/ipv4/inet_connection_sock.c 
@@ -546,6 +546,13 @@ void inet_csk_reqsk_queue_drop(struct sock *sk, struct request_sock *req) } EXPORT_SYMBOL(inet_csk_reqsk_queue_drop); +void inet_csk_reqsk_queue_drop_and_put(struct sock *sk, struct request_sock *req) +{ + inet_csk_reqsk_queue_drop(sk, req); + reqsk_put(req); +} +EXPORT_SYMBOL(inet_csk_reqsk_queue_drop_and_put); + static void reqsk_timer_handler(unsigned long data) { struct request_sock *req = (struct request_sock *)data; @@ -608,8 +615,7 @@ static void reqsk_timer_handler(unsigned long data) return; } drop: - inet_csk_reqsk_queue_drop(sk_listener, req); - reqsk_put(req); + inet_csk_reqsk_queue_drop_and_put(sk_listener, req); } static void reqsk_queue_hash_req(struct request_sock *req, diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index aad2298de7ad..9c68cf3762c4 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -1591,7 +1591,7 @@ process: if (likely(sk->sk_state == TCP_LISTEN)) { nsk = tcp_check_req(sk, skb, req, false); } else { - inet_csk_reqsk_queue_drop(sk, req); + inet_csk_reqsk_queue_drop_and_put(sk, req); goto lookup; } if (!nsk) { diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c index 7ce1c57199d1..acb06f86f372 100644 --- a/net/ipv6/tcp_ipv6.c +++ b/net/ipv6/tcp_ipv6.c @@ -1386,7 +1386,7 @@ process: if (likely(sk->sk_state == TCP_LISTEN)) { nsk = tcp_check_req(sk, skb, req, false); } else { - inet_csk_reqsk_queue_drop(sk, req); + inet_csk_reqsk_queue_drop_and_put(sk, req); goto lookup; } if (!nsk) { -- 2.6.0.rc2.230.g3dd15c0 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
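To make the intent of the helper concrete, here is a tiny userspace model of "drop from the queue, then put the caller's reference". The names struct req, req_put(), queue_drop() and queue_drop_and_put() are hypothetical, and the refcount is a plain int rather than an atomic.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a request socket. */
struct req {
	int refcnt;
	bool hashed;	/* true while the hash table holds a reference */
};

/* Release one reference. */
static void req_put(struct req *r)
{
	assert(r->refcnt > 0);
	r->refcnt--;
}

/* Unhash; releases only the reference held by the hash table. */
static void queue_drop(struct req *r)
{
	if (r->hashed) {
		r->hashed = false;
		req_put(r);
	}
}

/* Combined helper: unhash, then release the caller's own reference. */
static void queue_drop_and_put(struct req *r)
{
	queue_drop(r);
	req_put(r);
}
```

A call site that used to pair the two calls by hand, such as the drop: label in reqsk_timer_handler(), becomes a single call, which makes it harder to forget the second release.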