[PATCH v4 60/79] include/uapi/linux/atm_zatm.h: include linux/time.h

2015-10-14 Thread Mikko Rapeli
Fixes userspace compile error:

error: field ‘real’ has incomplete type
 struct timeval real;  /* real (wall-clock) time */

Signed-off-by: Mikko Rapeli 
---
 include/uapi/linux/atm_zatm.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/uapi/linux/atm_zatm.h b/include/uapi/linux/atm_zatm.h
index 10f0fa2..adbaa6c 100644
--- a/include/uapi/linux/atm_zatm.h
+++ b/include/uapi/linux/atm_zatm.h
@@ -14,6 +14,7 @@
 
 #include 
 #include 
+#include 
 
 #define ZATM_GETPOOL   _IOW('a',ATMIOC_SARPRV+1,struct atmif_sioc)
/* get pool statistics */
-- 
2.5.0

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v4 57/79] include/uapi/linux/openvswitch.h: use __u32 from linux/types.h

2015-10-14 Thread Mikko Rapeli
Fixes userspace compiler error:

error: unknown type name ‘uint32_t’

Signed-off-by: Mikko Rapeli 
---
 include/uapi/linux/openvswitch.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index 32e07d8..80c39a1 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -612,8 +612,8 @@ enum ovs_hash_alg {
  * @hash_basis: basis used for computing hash.
  */
 struct ovs_action_hash {
-   uint32_t  hash_alg; /* One of ovs_hash_alg. */
-   uint32_t  hash_basis;
+   __u32  hash_alg; /* One of ovs_hash_alg. */
+   __u32  hash_basis;
 };
 
 /**
-- 
2.5.0
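A minimal sketch of why __u32 works where uint32_t did not. It assumes a Linux system with the exported uapi headers installed (e.g. the linux-libc-dev package). In that setup, <linux/types.h> by itself provides __u32, so a struct declared this way compiles in user space without <stdint.h>. The struct and helper names are hypothetical; only the field layout mirrors struct ovs_action_hash.

```c
/* Sketch: a uapi-style struct that depends only on <linux/types.h>. */
#include <assert.h>
#include <linux/types.h>	/* __u32 without needing <stdint.h> */

/* Hypothetical mirror of the patched struct ovs_action_hash layout. */
struct ovs_action_hash_sketch {
	__u32 hash_alg;		/* One of ovs_hash_alg. */
	__u32 hash_basis;	/* Basis used for computing hash. */
};

/* Hypothetical helper: fill the struct using only __u32 values. */
static struct ovs_action_hash_sketch make_hash_action(__u32 alg, __u32 basis)
{
	struct ovs_action_hash_sketch h = { .hash_alg = alg, .hash_basis = basis };

	return h;
}
```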



RE: [PATCH] netlink: trim skb to exact size to avoid MSG_TRUNC

2015-10-14 Thread Arad, Ronen


>-Original Message-
>From: netdev-ow...@vger.kernel.org [mailto:netdev-ow...@vger.kernel.org] On
>Behalf Of Thomas Graf
>Sent: Wednesday, October 14, 2015 12:45 AM
>To: Arad, Ronen
>Cc: netdev@vger.kernel.org
>Subject: Re: [PATCH] netlink: trim skb to exact size to avoid MSG_TRUNC
>
>On 10/13/15 at 05:52pm, Arad, Ronen wrote:
>> [@Ronen] My reader as I described above is providing a larger message
>> which I'm trying to properly size. I'm aware that libnl shields
>> applications from the need to know and provide properly sized buffer by
>> peeking or/and re-allocating a buffer.
>> My issue is with iproute2 "ip link show" and "bridge vlan show" commands.
>
>>
>> >I'm just trying to understand which exact case you are solving here.
>> Allocation is always performed by alloc_size which could be
>> nlk->max_recvmsg_len (only when min_dump_alloc is sufficiently small) and
>> upon failure falling back to alloc_min_size.
>> The trimming of the skb space is common regardless of the allocation call.
>> I tried to submit the minimal patch to address the issue. If you think the
>> re-organized code is better, I can re-submit a V2.
>
>I was about to suggest the same code change after initial discussion ;-)
>
>So you are fixing the case where >2x messages fit the padded skb size.
[@Ronen] I'm not sure I understand this statement. I'm fixing the padding
of the skb so that a reader can use a reasonable buffer size based on the
largest netdev. It just happened that the skb allocation was about double
the requested size, probably because the allocation was rounded up to a
power of 2 and the requested size was slightly above the next lower power
of 2.

In a separate patch titled
[PATCH net-next v3] netlink: Rightsize IFLA_AF_SPEC size calculation
I'm reducing the over-estimation of the buffer size for "ip link"
requests. It turned out that VLAN information space was added to
unrelated dump requests because ext_filter_mask was not passed to
rtnl_link_get_af_size().
The "rightsizing" patch also reduces the buffer size of compressed VLAN
dumps. Non-compressed VLAN dumps will continue to require more than a
16KiB buffer starting at around 1800 VLANs (based on 8 bytes per VLAN
plus other attributes consuming about 1700 bytes). By the same logic, the
full range of 4094 uncompressed VLANs would take roughly 34500 bytes, so a
"safe" statically sized iproute2 buffer would have to be about 36000 bytes.
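The arithmetic behind these estimates can be sketched as follows. This is an illustration under stated assumptions, not kernel code. Netlink attribute headers are assumed to be 4 bytes with 4-byte alignment (as with NLA_HDRLEN/NLA_ALIGN). struct bridge_vlan_info is assumed to carry a 4-byte payload, so each VLAN costs nla_total_size(4) = 8 bytes. The remaining attributes are assumed to take the roughly 1700 bytes estimated above. vlan_dump_size() and max_vlans_for_buffer() are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>

#define NLA_ALIGNTO	4
#define NLA_ALIGN(len)	(((len) + NLA_ALIGNTO - 1) & ~(NLA_ALIGNTO - 1))
#define NLA_HDRLEN	NLA_ALIGN(4)	/* struct nlattr is assumed to be 4 bytes */

/* Assumed size of all non-VLAN attributes (estimate from the text above). */
#define OTHER_ATTRS_BYTES	1700

/* Same shape as the kernel helper: aligned header plus payload. */
static size_t nla_total_size(size_t payload)
{
	return NLA_ALIGN(NLA_HDRLEN + payload);
}

/* Hypothetical: estimated size of a dump with nvlans uncompressed VLANs. */
static size_t vlan_dump_size(size_t nvlans)
{
	/* struct bridge_vlan_info is { __u16 flags; __u16 vid; } = 4 bytes */
	return OTHER_ATTRS_BYTES + nvlans * nla_total_size(4);
}

/* Hypothetical: largest VLAN count whose dump fits in buf_len bytes. */
static size_t max_vlans_for_buffer(size_t buf_len)
{
	if (buf_len < OTHER_ATTRS_BYTES)
		return 0;
	return (buf_len - OTHER_ATTRS_BYTES) / nla_total_size(4);
}
```

Under these assumptions, max_vlans_for_buffer(16384) gives 1835, consistent with "around 1800 VLANs". vlan_dump_size(4094) gives 34452 bytes, consistent with "roughly 34500 bytes".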

>This was not clear from the commit message. I would appreciate a note
>in the commit message and updated code comment to reflect this.
>
>The fix is definitely not incorrect and the penalty for readers which
>peek first is less than I thought since nlk->max_recvmsg_len is at
>least 16K in size. Since most peekers will double buffer sizes they
>will most likely end up growing nlk->max_recvmsg_len after the first
>read.
[@Ronen] nlk->max_recvmsg_len is actually capped at 16KiB.
/* Record the max length of recvmsg() calls for future allocations */
nlk->max_recvmsg_len = max(nlk->max_recvmsg_len, len);
nlk->max_recvmsg_len = min_t(size_t, nlk->max_recvmsg_len, 16384);
>
>However, if alloc_size is > 16K, we would have typically ended up with a
>giant skb which peeking users were able to take advantage of while
>with this fix this is no longer the case.
[@Ronen] As I noted above, a peeking reader could only benefit up to the
16KiB cap.
>
>However #2, I'll see if it makes sense to look at MSG_PEEK in recvmsg
>and change nlk->max_recvmsg_len accordingly so we take advantage of
>the full skb size on sockets which perform peeking. Given that both
>reader behaviours can be preserved, I'm good with your proposed v2.
[@Ronen] I'll submit my suggested v2.
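The two quoted lines keep a running maximum of recvmsg() lengths and then clamp it, so later allocations based on this history never exceed 16KiB. Below is a user-space sketch of the same bookkeeping. It assumes plain comparisons in place of the kernel's max()/min_t() macros. struct nl_sock_sketch and record_recvmsg_len() are hypothetical stand-ins for the netlink socket and the inline code in the receive path.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_RECVMSG_CAP	16384	/* cap used in the quoted kernel code */

/* Hypothetical stand-in for the relevant field of the netlink socket. */
struct nl_sock_sketch {
	size_t max_recvmsg_len;
};

/* Record the max length of recvmsg() calls, then cap it at 16KiB. */
static void record_recvmsg_len(struct nl_sock_sketch *nlk, size_t len)
{
	if (len > nlk->max_recvmsg_len)
		nlk->max_recvmsg_len = len;
	if (nlk->max_recvmsg_len > MAX_RECVMSG_CAP)
		nlk->max_recvmsg_len = MAX_RECVMSG_CAP;
}
```

For example, a peeking reader that retries with a 64KiB buffer still leaves the recorded length at 16384, which is the limit referred to above.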


[Question]: iproute2 extension for supporting lightweight tunnel

2015-10-14 Thread Liu, Mengke
Hi,

I am trying to use "lightweight tunnels" after building the Linux kernel
from source with "Lightweight & flow based encapsulation" support. Can you tell
me how to get the iproute2 extension that supports the following command from
the commit log (commit ID e69724f32e62502a6e686eae36b7aadfeea60dca)?
   "ip route add 40.1.1.1/32 encap vxlan id 10 dst 50.1.1.2 dev vxlan0"
Thanks.

Best Regards
Mengke



[patch net] mlxsw: core: Fix race condition in __mlxsw_emad_transmit

2015-10-14 Thread Jiri Pirko
From: Ido Schimmel 

Under certain conditions, EMAD responses can be returned from the device
even before trans_active is set. This causes the EMAD Rx listener to drop
the EMAD response, as there are no active transactions, and timeouts are
generated.

Fix this by setting trans_active before transmitting the EMAD skb.

Fixes: 4ec14b7634b2 ("mlxsw: Add interface to access registers and process events")
Signed-off-by: Ido Schimmel 
Signed-off-by: Jiri Pirko 
---
 drivers/net/ethernet/mellanox/mlxsw/core.c | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.c b/drivers/net/ethernet/mellanox/mlxsw/core.c
index dbcaf5d..28c19cc 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.c
@@ -374,26 +374,31 @@ static int __mlxsw_emad_transmit(struct mlxsw_core *mlxsw_core,
int err;
int ret;
 
+   mlxsw_core->emad.trans_active = true;
+
err = mlxsw_core_skb_transmit(mlxsw_core->driver_priv, skb, tx_info);
if (err) {
dev_err(mlxsw_core->bus_info->dev, "Failed to transmit EMAD (tid=%llx)\n",
mlxsw_core->emad.tid);
dev_kfree_skb(skb);
-   return err;
+   goto trans_inactive_out;
}
 
-   mlxsw_core->emad.trans_active = true;
ret = wait_event_timeout(mlxsw_core->emad.wait,
 !(mlxsw_core->emad.trans_active),
 msecs_to_jiffies(MLXSW_EMAD_TIMEOUT_MS));
if (!ret) {
dev_warn(mlxsw_core->bus_info->dev, "EMAD timed-out (tid=%llx)\n",
 mlxsw_core->emad.tid);
-   mlxsw_core->emad.trans_active = false;
-   return -EIO;
+   err = -EIO;
+   goto trans_inactive_out;
}
 
return 0;
+
+trans_inactive_out:
+   mlxsw_core->emad.trans_active = false;
+   return err;
 }
 
 static int mlxsw_emad_process_status(struct mlxsw_core *mlxsw_core,
-- 
1.9.3
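The ordering problem can be illustrated with a single-threaded model in which the device answers before the transmit call returns, the worst case the commit message describes. This is a sketch under assumptions, not the driver. struct emad_sketch and the functions below are hypothetical. The real code uses a wait queue and wait_event_timeout() rather than counters.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, single-threaded model of the EMAD transaction state. */
struct emad_sketch {
	bool trans_active;	/* a transaction is waiting for its response */
	int responses_seen;	/* responses accepted by the Rx listener */
	int responses_dropped;	/* responses dropped: no active transaction */
};

/* Rx listener: accept the response only if a transaction is active. */
static void emad_rx_listener(struct emad_sketch *e)
{
	if (!e->trans_active) {
		e->responses_dropped++;	/* the waiter will later time out */
		return;
	}
	e->responses_seen++;
	e->trans_active = false;	/* completes the transaction */
}

/* Model of a device that responds before transmit() even returns. */
static void fast_device_transmit(struct emad_sketch *e)
{
	emad_rx_listener(e);
}

/* Old order: transmit first, then mark the transaction active. */
static void transmit_old_order(struct emad_sketch *e)
{
	fast_device_transmit(e);
	e->trans_active = true;	/* too late: the response was already dropped */
}

/* Fixed order: mark the transaction active, then transmit. */
static void transmit_fixed_order(struct emad_sketch *e)
{
	e->trans_active = true;
	fast_device_transmit(e);
}
```

In the old order the response is dropped and trans_active stays true, so the waiter would time out. In the fixed order the response completes the transaction. The patch also clears trans_active on the transmit-error and timeout paths, which this model does not show.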



[PATCH v4 39/79] include/uapi/linux/if_pppox.h: include linux/if.h

2015-10-14 Thread Mikko Rapeli
Fixes userspace compilation error:

error: ‘IFNAMSIZ’ undeclared here (not in a function)

Signed-off-by: Mikko Rapeli 
---
 include/uapi/linux/if_pppox.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/uapi/linux/if_pppox.h b/include/uapi/linux/if_pppox.h
index e128769..473c3c4 100644
--- a/include/uapi/linux/if_pppox.h
+++ b/include/uapi/linux/if_pppox.h
@@ -21,6 +21,7 @@
 #include 
 
 #include 
+#include 
 #include 
 #include 
 
-- 
2.5.0



[PATCH v4 44/79] include/uapi/linux/if_pppox.h: include linux/in.h and linux/in6.h

2015-10-14 Thread Mikko Rapeli
Fixes userspace compilation errors:

error: field ‘addr’ has incomplete type
 struct sockaddr_in addr; /* IP address and port to send to */

error: field ‘addr’ has incomplete type
 struct sockaddr_in6 addr; /* IP address and port to send to */

Signed-off-by: Mikko Rapeli 
---
 include/uapi/linux/if_pppox.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/uapi/linux/if_pppox.h b/include/uapi/linux/if_pppox.h
index 473c3c4..d37bbb1 100644
--- a/include/uapi/linux/if_pppox.h
+++ b/include/uapi/linux/if_pppox.h
@@ -24,6 +24,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 /* For user-space programs to pick up these definitions
  * which they wouldn't get otherwise without defining __KERNEL__
-- 
2.5.0



[PATCH v4 43/79] include/uapi/linux/if_pppol2tp.h: include linux/in.h and linux/in6.h

2015-10-14 Thread Mikko Rapeli
Fixes userspace compilation errors like:

error: field ‘addr’ has incomplete type
 struct sockaddr_in addr; /* IP address and port to send to */
^
error: field ‘addr’ has incomplete type
 struct sockaddr_in6 addr; /* IP address and port to send to */

Signed-off-by: Mikko Rapeli 
---
 include/uapi/linux/if_pppol2tp.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/if_pppol2tp.h b/include/uapi/linux/if_pppol2tp.h
index 163e8ad..4bd1f55 100644
--- a/include/uapi/linux/if_pppol2tp.h
+++ b/include/uapi/linux/if_pppol2tp.h
@@ -16,7 +16,8 @@
 #define _UAPI__LINUX_IF_PPPOL2TP_H
 
 #include 
-
+#include 
+#include 
 
 /* Structure used to connect() the socket to a particular tunnel UDP
  * socket over IPv4.
-- 
2.5.0



[PATCH v4 40/79] include/uapi/linux/if_tunnel.h: include linux/if.h, linux/ip.h and linux/in6.h

2015-10-14 Thread Mikko Rapeli
Fixes userspace compilation errors like:

error: field ‘iph’ has incomplete type
error: field ‘prefix’ has incomplete type

Signed-off-by: Mikko Rapeli 
---
 include/uapi/linux/if_tunnel.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/include/uapi/linux/if_tunnel.h b/include/uapi/linux/if_tunnel.h
index af4de90..8afe695 100644
--- a/include/uapi/linux/if_tunnel.h
+++ b/include/uapi/linux/if_tunnel.h
@@ -2,6 +2,9 @@
 #define _UAPI_IF_TUNNEL_H_
 
 #include 
+#include 
+#include 
+#include 
 #include 
 
 
-- 
2.5.0



Re: [PATCH v4 1/4] Produce system time from correlated clocksource

2015-10-14 Thread Richard Cochran
On Wed, Oct 14, 2015 at 06:57:33PM -0700, Christopher Hall wrote:
> >>+#define SHADOW_HISTORY_DEPTH 7
> >
> >And that number is 7 because?
> 
> Due to power of 2 it will be 8 instead. As above the useful history is 8-2*1
> ms (1 ms is the minimum jiffy length).  Array size 4 would not be enough
> history for the DSP which requires 4 ms of history, in the worst case.

Just as I suspected, the magic number 7 is based on the needs of one
particular user.  What about the next user who comes along needing 10
milliseconds?  That will not do.  Any new interface should be generic
enough to support a wide range of users.

So I think this approach is all wrong.  Here is an idea for you to
consider.  Instead of mucking with the TK, let the user code (possibly
in-kernel) sample ART/sys pairs and interpolate the ART/dev time
stamps.  That way, the user can choose the range and resolution that
he needs.

> The audio driver is structured in such a way that it's simpler to provide a
> value rather than a callback.

Can you please provide a link to the audio driver that uses this new
interface?

Thanks,
Richard
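Richard's suggestion is that the caller samples ART/system-time pairs and converts device timestamps itself. This can be sketched as a linear interpolation between two samples that bracket the device timestamp. This is a hypothetical illustration: struct art_sys_pair and art_to_sys_ns() are not kernel APIs. System time is assumed to be in nanoseconds. The samples are assumed to be close enough together that the intermediate product fits in 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical correlated sample: ART counter value and system time (ns). */
struct art_sys_pair {
	uint64_t art;
	int64_t sys_ns;
};

/*
 * Map an ART timestamp to system time by linear interpolation between two
 * samples p0 and p1 that bracket it. Assumes p1.art > p0.art and that
 * (art - p0.art) * (p1.sys_ns - p0.sys_ns) fits in an int64_t.
 */
static int64_t art_to_sys_ns(struct art_sys_pair p0, struct art_sys_pair p1,
			     uint64_t art)
{
	int64_t dart = (int64_t)(p1.art - p0.art);
	int64_t dsys = p1.sys_ns - p0.sys_ns;
	int64_t off = (int64_t)(art - p0.art);

	return p0.sys_ns + off * dsys / dart;
}
```

The caller controls how many samples it keeps and how far apart they are, which is the range and resolution trade-off mentioned above.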



Re: [patch net-next v5 3/8] switchdev: allow caller to explicitly request attr_set as deferred

2015-10-14 Thread Jiri Pirko
Thu, Oct 15, 2015 at 06:34:01AM CEST, sfel...@gmail.com wrote:
>On Wed, Oct 14, 2015 at 10:40 AM, Jiri Pirko  wrote:
>> From: Jiri Pirko 
>>
>> Caller should know if he can call attr_set directly (when holding RTNL)
>> or if he has to defer the attr_set processing for later.
>>
>> This also allows drivers to sleep inside attr_set and report operation
>> status back to switchdev core. Switchdev core then warns if status is
>> not ok, instead of silent errors happening in drivers.
>>
>> Benefit from newly introduced switchdev deferred ops infrastructure.
>>
>> Signed-off-by: Jiri Pirko 
>> ---
>>  include/net/switchdev.h   |   1 +
>>  net/bridge/br_stp.c   |   3 +-
>>  net/switchdev/switchdev.c | 108 ++
>>  3 files changed, 46 insertions(+), 66 deletions(-)
>>
>> diff --git a/include/net/switchdev.h b/include/net/switchdev.h
>> index d1c7f90..f7de6f8 100644
>> --- a/include/net/switchdev.h
>> +++ b/include/net/switchdev.h
>> @@ -17,6 +17,7 @@
>>
>>  #define SWITCHDEV_F_NO_RECURSE BIT(0)
>>  #define SWITCHDEV_F_SKIP_EOPNOTSUPP BIT(1)
>> +#define SWITCHDEV_F_DEFER  BIT(2)
>>
>>  struct switchdev_trans_item {
>> struct list_head list;
>> diff --git a/net/bridge/br_stp.c b/net/bridge/br_stp.c
>> index db6d243de..80c34d7 100644
>> --- a/net/bridge/br_stp.c
>> +++ b/net/bridge/br_stp.c
>> @@ -41,13 +41,14 @@ void br_set_state(struct net_bridge_port *p, unsigned int state)
>>  {
>> struct switchdev_attr attr = {
>> .id = SWITCHDEV_ATTR_ID_PORT_STP_STATE,
>> +   .flags = SWITCHDEV_F_DEFER,
>> .u.stp_state = state,
>> };
>> int err;
>>
>> p->state = state;
>> err = switchdev_port_attr_set(p->dev, &attr);
>> -   if (err && err != -EOPNOTSUPP)
>> +   if (err)
>> br_warn(p->br, "error setting offload STP state on port %u(%s)\n",
>> (unsigned int) p->port_no, p->dev->name);
>>  }
>
>Should this part of the patch be moved to patch 6/8 where
>switchdev_deferred_process() is called from del_nbp()?

No, this part relates to the fact that attr_set no longer defers
automagically, so the caller must say when to defer. Having it here is
therefore correct.
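In the model described in this patch, the caller decides whether to defer. This can be sketched as a flag check. With the defer flag set, the request is queued and its status is reported when the queue is processed. Without it, the driver op runs immediately and its error is returned to the caller. This is a simplified model, not the switchdev implementation: all names are hypothetical, and only the bit position follows SWITCHDEV_F_DEFER.

```c
#include <assert.h>
#include <stddef.h>

#define BIT(n)		(1u << (n))
#define SKETCH_F_DEFER	BIT(2)	/* same bit position as SWITCHDEV_F_DEFER */
#define MAX_DEFERRED	8

/* Hypothetical attribute: flags plus a value to program into hardware. */
struct attr_sketch {
	unsigned int flags;
	int value;
};

/* Hypothetical driver op: may sleep and reports status back to the core. */
static int driver_attr_set(const struct attr_sketch *attr)
{
	return attr->value < 0 ? -22 /* -EINVAL */ : 0;
}

static struct attr_sketch deferred_q[MAX_DEFERRED];
static size_t deferred_len;
static int deferred_warnings;

/* The caller chooses: apply now (e.g. while holding RTNL) or queue it. */
static int attr_set_sketch(const struct attr_sketch *attr)
{
	if (attr->flags & SKETCH_F_DEFER) {
		if (deferred_len == MAX_DEFERRED)
			return -12;	/* -ENOMEM */
		deferred_q[deferred_len++] = *attr;
		return 0;	/* status is reported when processed */
	}
	return driver_attr_set(attr);
}

/* Process queued ops; failures are counted (warned) instead of silent. */
static void deferred_process_sketch(void)
{
	for (size_t i = 0; i < deferred_len; i++)
		if (driver_attr_set(&deferred_q[i]))
			deferred_warnings++;
	deferred_len = 0;
}
```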



RE: [PATCH net-next v2] netlink: Rightsize IFLA_AF_SPEC size calculation

2015-10-14 Thread Arad, Ronen


>-Original Message-
>From: David Miller [mailto:da...@davemloft.net]
>Sent: Wednesday, October 14, 2015 7:24 PM
>To: Arad, Ronen
>Cc: netdev@vger.kernel.org
>Subject: Re: [PATCH net-next v2] netlink: Rightsize IFLA_AF_SPEC size calculation
>
>From: Ronen Arad 
>Date: Wed, 14 Oct 2015 08:51:28 -0700
>
>> @@ -900,7 +901,7 @@ static noinline size_t if_nlmsg_size(const struct net_device *dev,
>> + rtnl_vfinfo_size(dev, ext_filter_mask) /* IFLA_VFINFO_LIST */
>> + rtnl_port_size(dev, ext_filter_mask) /* IFLA_VF_PORTS + IFLA_PORT_SELF */
>> + rtnl_link_get_size(dev) /* IFLA_LINKINFO */
>> -   + rtnl_link_get_af_size(dev) /* IFLA_AF_SPEC */
>> ++ rtnl_link_get_af_size(dev, ext_filter_mask) /* IFLA_AF_SPEC */
>> + nla_total_size(MAX_PHYS_ITEM_ID_LEN) /* IFLA_PHYS_PORT_ID */
>
>Please don't change the indentation on this line, keep it matching
>the indentation of all of the surrounding lines of this expression.
[@Ronen] Sure. V3 submitted. My editor didn't like the indentation of the
surrounding lines which are one less than two TAB spaces but consistency
is important. 
>
>Thanks.


[PATCH net-next v3] netlink: Rightsize IFLA_AF_SPEC size calculation

2015-10-14 Thread Ronen Arad
if_nlmsg_size() overestimates the minimum allocation size of a netlink
dump request (when called from rtnl_calcit()) or the size of the
message (when called from rtnl_getlink()). This is because
ext_filter_mask is not supported by rtnl_link_get_af_size() and
rtnl_link_get_size().

The over-estimation is significant when at least one netdev has many
VLANs configured (8 bytes for each configured VLAN).

This patch "rightsizes" the protocol-specific attribute size
calculation by propagating ext_filter_mask to rtnl_link_get_af_size()
and adding it as an argument to the get_link_af_size op in rtnl_af_ops.

The bridge module already uses filtering-aware sizing for notifications.
br_get_link_af_size_filtered() is consistent with the modified
get_link_af_size op, so it replaces br_get_link_af_size() in br_af_ops.
br_get_link_af_size() becomes unused and is therefore removed.
---
 include/net/rtnetlink.h |  3 ++-
 net/bridge/br_netlink.c | 21 +
 net/core/rtnetlink.c|  8 
 net/ipv4/devinet.c  |  4 ++--
 net/ipv6/addrconf.c |  3 ++-
 5 files changed, 11 insertions(+), 28 deletions(-)

diff --git a/include/net/rtnetlink.h b/include/net/rtnetlink.h
index aff6ceb..2f87c1b 100644
--- a/include/net/rtnetlink.h
+++ b/include/net/rtnetlink.h
@@ -124,7 +124,8 @@ struct rtnl_af_ops {
int (*fill_link_af)(struct sk_buff *skb,
const struct net_device *dev,
u32 ext_filter_mask);
-   size_t  (*get_link_af_size)(const struct net_device *dev);
+   size_t  (*get_link_af_size)(const struct net_device *dev,
+   u32 ext_filter_mask);
 
int (*validate_link_af)(const struct net_device *dev,
const struct nlattr *attr);
diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
index 94b4de8..40197ff 100644
--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -1214,29 +1214,10 @@ static int br_fill_info(struct sk_buff *skb, const struct net_device *brdev)
return 0;
 }
 
-static size_t br_get_link_af_size(const struct net_device *dev)
-{
-   struct net_bridge_port *p;
-   struct net_bridge *br;
-   int num_vlans = 0;
-
-   if (br_port_exists(dev)) {
-   p = br_port_get_rtnl(dev);
-   num_vlans = br_get_num_vlan_infos(nbp_vlan_group(p),
- RTEXT_FILTER_BRVLAN);
-   } else if (dev->priv_flags & IFF_EBRIDGE) {
-   br = netdev_priv(dev);
-   num_vlans = br_get_num_vlan_infos(br_vlan_group(br),
- RTEXT_FILTER_BRVLAN);
-   }
-
-   /* Each VLAN is returned in bridge_vlan_info along with flags */
-   return num_vlans * nla_total_size(sizeof(struct bridge_vlan_info));
-}
 
 static struct rtnl_af_ops br_af_ops __read_mostly = {
.family = AF_BRIDGE,
-   .get_link_af_size   = br_get_link_af_size,
+   .get_link_af_size   = br_get_link_af_size_filtered,
 };
 
 struct rtnl_link_ops br_link_ops __read_mostly = {
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 2477595..7c78b5a 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -497,7 +497,8 @@ void rtnl_af_unregister(struct rtnl_af_ops *ops)
 }
 EXPORT_SYMBOL_GPL(rtnl_af_unregister);
 
-static size_t rtnl_link_get_af_size(const struct net_device *dev)
+static size_t rtnl_link_get_af_size(const struct net_device *dev,
+   u32 ext_filter_mask)
 {
struct rtnl_af_ops *af_ops;
size_t size;
@@ -509,7 +510,7 @@ static size_t rtnl_link_get_af_size(const struct net_device *dev)
if (af_ops->get_link_af_size) {
/* AF_* + nested data */
size += nla_total_size(sizeof(struct nlattr)) +
-   af_ops->get_link_af_size(dev);
+   af_ops->get_link_af_size(dev, ext_filter_mask);
}
}
 
@@ -900,7 +901,7 @@ static noinline size_t if_nlmsg_size(const struct net_device *dev,
   + rtnl_vfinfo_size(dev, ext_filter_mask) /* IFLA_VFINFO_LIST */
   + rtnl_port_size(dev, ext_filter_mask) /* IFLA_VF_PORTS + IFLA_PORT_SELF */
   + rtnl_link_get_size(dev) /* IFLA_LINKINFO */
-  + rtnl_link_get_af_size(dev) /* IFLA_AF_SPEC */
+  + rtnl_link_get_af_size(dev, ext_filter_mask) /* IFLA_AF_SPEC */
   + nla_total_size(MAX_PHYS_ITEM_ID_LEN) /* IFLA_PHYS_PORT_ID */
   + nla_total_size(MAX_PHYS_ITEM_ID_LEN) /* IFLA_PHYS_SWITCH_ID */
   + nla_total_size(1); /* IFLA_PROTO_DOWN */
@@ -3443,4 +3444,3 @@ void __init rtnetlink_init(void)
rtnl_register(PF_BRIDGE, RTM_DELLINK, rtnl_bridge_dellin
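
The effect of passing ext_filter_mask down to the per-family size op can be sketched as below. This is a simplified model under stated assumptions. The device, op table and function names are hypothetical. RTEXT_FILTER_BRVLAN uses its uapi value (1 << 1). VLAN compression is ignored. The summation follows the structure of rtnl_link_get_af_size() shown in the diff: a nest header for IFLA_AF_SPEC, plus a header and payload per family.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NLA_HDRLEN		4	/* sizeof(struct nlattr), already aligned */
#define NLA_ALIGN(len)		(((len) + 3) & ~(size_t)3)
#define RTEXT_FILTER_BRVLAN	(1u << 1)	/* value from uapi rtnetlink.h */

static size_t nla_total_size(size_t payload)
{
	return NLA_ALIGN(NLA_HDRLEN + payload);
}

/* Hypothetical device: only the number of configured VLANs matters here. */
struct dev_sketch {
	int num_vlans;
};

/* Hypothetical per-family op, taking ext_filter_mask as in the patch. */
struct af_ops_sketch {
	size_t (*get_link_af_size)(const struct dev_sketch *dev,
				   uint32_t ext_filter_mask);
};

/* Bridge-like op: count VLAN entries only if the request asks for them. */
static size_t bridge_af_size(const struct dev_sketch *dev, uint32_t mask)
{
	if (!(mask & RTEXT_FILTER_BRVLAN))
		return 0;
	return (size_t)dev->num_vlans * nla_total_size(4);
}

/* Nest header for IFLA_AF_SPEC plus header and payload per family. */
static size_t link_af_size(const struct af_ops_sketch *ops, size_t n,
			   const struct dev_sketch *dev, uint32_t mask)
{
	size_t size = nla_total_size(NLA_HDRLEN);	/* IFLA_AF_SPEC */

	for (size_t i = 0; i < n; i++)
		if (ops[i].get_link_af_size)
			size += nla_total_size(NLA_HDRLEN) +
				ops[i].get_link_af_size(dev, mask);
	return size;
}
```

With 100 VLANs, the estimate is 816 bytes when the request asks for VLAN information and only 16 bytes when it does not. Before the patch, the bridge op counted the VLAN space in both cases.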

Re: [PATCH v4 1/4] Produce system time from correlated clocksource

2015-10-14 Thread Richard Cochran
On Wed, Oct 14, 2015 at 07:34:03PM -0700, Christopher Hall wrote:
> I hope this is helpful. Thanks.

So the DSP does not produce or consume system time stamps. Fine.
Still I fail to understand why you need the system time.

Thomas seems to say that there are *other* applications that will want
to transform device time into system time, but why does your audio
application use the system time, when the audio-to-ptp time is
directly available, without any man in the middle?

Thanks,
Richard


Re: [patch net-next v5 3/8] switchdev: allow caller to explicitly request attr_set as deferred

2015-10-14 Thread Scott Feldman
On Wed, Oct 14, 2015 at 10:40 AM, Jiri Pirko  wrote:
> From: Jiri Pirko 
>
> Caller should know if he can call attr_set directly (when holding RTNL)
> or if he has to defer the attr_set processing for later.
>
> This also allows drivers to sleep inside attr_set and report operation
> status back to switchdev core. Switchdev core then warns if status is
> not ok, instead of silent errors happening in drivers.
>
> Benefit from newly introduced switchdev deferred ops infrastructure.
>
> Signed-off-by: Jiri Pirko 
> ---
>  include/net/switchdev.h   |   1 +
>  net/bridge/br_stp.c   |   3 +-
>  net/switchdev/switchdev.c | 108 ++
>  3 files changed, 46 insertions(+), 66 deletions(-)
>
> diff --git a/include/net/switchdev.h b/include/net/switchdev.h
> index d1c7f90..f7de6f8 100644
> --- a/include/net/switchdev.h
> +++ b/include/net/switchdev.h
> @@ -17,6 +17,7 @@
>
>  #define SWITCHDEV_F_NO_RECURSE BIT(0)
>  #define SWITCHDEV_F_SKIP_EOPNOTSUPP BIT(1)
> +#define SWITCHDEV_F_DEFER  BIT(2)
>
>  struct switchdev_trans_item {
> struct list_head list;
> diff --git a/net/bridge/br_stp.c b/net/bridge/br_stp.c
> index db6d243de..80c34d7 100644
> --- a/net/bridge/br_stp.c
> +++ b/net/bridge/br_stp.c
> @@ -41,13 +41,14 @@ void br_set_state(struct net_bridge_port *p, unsigned int state)
>  {
> struct switchdev_attr attr = {
> .id = SWITCHDEV_ATTR_ID_PORT_STP_STATE,
> +   .flags = SWITCHDEV_F_DEFER,
> .u.stp_state = state,
> };
> int err;
>
> p->state = state;
> err = switchdev_port_attr_set(p->dev, &attr);
> -   if (err && err != -EOPNOTSUPP)
> +   if (err)
> br_warn(p->br, "error setting offload STP state on port %u(%s)\n",
> (unsigned int) p->port_no, p->dev->name);
>  }

Should this part of the patch be moved to patch 6/8 where
switchdev_deferred_process() is called from del_nbp()?


Re: [Patch net-next 2/4] net_sched: update hierarchical backlog too

2015-10-14 Thread Cong Wang
On Wed, Oct 14, 2015 at 5:11 AM, Jamal Hadi Salim  wrote:
> On 10/12/15 14:38, Cong Wang wrote:
>>
>> When the bottom qdisc decides to, for example, drop some packet,
>> it calls qdisc_tree_decrease_qlen() to update the queue length
>> for all its ancestors, we need to update the backlog too to
>> keep the stats on root qdisc accurate.
>>
>
>
> There is more than one change in there (the codel change seems
> out of place and I wasn't sure why it was needed).

I thought it was clear: before my patch, when codel decides to drop some
packets, we only know how many packets it drops, not how many bytes.
For example,

-   qdisc_tree_decrease_qlen(sch, q->cstats.drop_count);
+   qdisc_tree_reduce_backlog(sch, q->cstats.drop_count,
+ q->cstats.drop_len);

This means I need additional stats from codel (the dropped bytes) to pass
to qdisc_tree_reduce_backlog(); this is why the codel part is
necessary.


> Also it seems possible you are double-dipping in some cases;
> I don't have time to scrutinize, but looking at the codel_change() change,
> when the queue limit is exceeded you will end up affecting backlog from
> both qdisc_qstats_backlog_dec() and your new
> qdisc_tree_reduce_backlog()

Nope: qdisc_qstats_backlog_dec() decreases the qdisc's own backlog,
while qdisc_tree_reduce_backlog() decreases the backlog of its ancestor
qdiscs. It is correct as is.

Thanks.
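The split of responsibilities described above can be sketched with a toy qdisc tree. The dropping qdisc adjusts only its own counters, and the tree helper adjusts only its ancestors, so no node is decremented twice. This is an illustrative model, not the kernel's data structures. struct qdisc_sketch and the helpers are hypothetical. The real qdisc_tree_reduce_backlog() finds each parent by handle and notifies its class, which is omitted here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical qdisc node: only the counters discussed above. */
struct qdisc_sketch {
	struct qdisc_sketch *parent;
	unsigned int qlen;	/* packets queued at or below this node */
	unsigned int backlog;	/* bytes queued at or below this node */
};

/* Model of an enqueue at a leaf: each node on the path counts it once. */
static void enqueue_sketch(struct qdisc_sketch *leaf, unsigned int len)
{
	for (struct qdisc_sketch *q = leaf; q; q = q->parent) {
		q->qlen++;
		q->backlog += len;
	}
}

/* Like qdisc_qstats_backlog_dec() plus qlen--: touches only this qdisc. */
static void leaf_drop_self(struct qdisc_sketch *sch, unsigned int len)
{
	sch->qlen--;
	sch->backlog -= len;
}

/* Like qdisc_tree_reduce_backlog(): touches only the ancestors of sch. */
static void tree_reduce_backlog_sketch(struct qdisc_sketch *sch,
				       unsigned int n, unsigned int len)
{
	for (struct qdisc_sketch *q = sch->parent; q; q = q->parent) {
		q->qlen -= n;
		q->backlog -= len;
	}
}
```

After enqueuing 1500 and 500 bytes and dropping the 500-byte packet with both helpers, the leaf and the root each report one packet and 1500 bytes.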


[PATCH net-next] net: hisilicon: fixes a bug when using ethtool -S

2015-10-14 Thread yankejian
From: lipeng 

This patch fixes a bug in the hns driver. When reading statistics with
ethtool -S, three counters are shown with wrong names because the strings
associated with those registers are wrong. Correct these strings.

Signed-off-by: lipeng 
Signed-off-by: yankejian 
Signed-off-by: Yisen Zhuang 
---
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_xgmac.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_xgmac.c b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_xgmac.c
index dab5ecf..802d554 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_xgmac.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_xgmac.c
@@ -51,9 +51,9 @@ static const struct mac_stats_string g_xgmac_stats_string[] = {
{"xgmac_rx_bad_pkt_from_dsaf", MAC_STATS_FIELD_OFF(rx_bad_from_sw)},
{"xgmac_tx_bad_pkt_64tomax", MAC_STATS_FIELD_OFF(tx_bad_pkts)},
 
-   {"xgmac_rx_not_well_pkt", MAC_STATS_FIELD_OFF(rx_fragment_err)},
-   {"xgmac_rx_good_well_pkt", MAC_STATS_FIELD_OFF(rx_undersize)},
-   {"xgmac_rx_total_pkt", MAC_STATS_FIELD_OFF(rx_under_min)},
+   {"xgmac_rx_bad_pkts_minto64", MAC_STATS_FIELD_OFF(rx_fragment_err)},
+   {"xgmac_rx_good_pkts_minto64", MAC_STATS_FIELD_OFF(rx_undersize)},
+   {"xgmac_rx_total_pkts_minto64", MAC_STATS_FIELD_OFF(rx_under_min)},
{"xgmac_rx_pkt_64", MAC_STATS_FIELD_OFF(rx_64bytes)},
{"xgmac_rx_pkt_65to127", MAC_STATS_FIELD_OFF(rx_65to127)},
{"xgmac_rx_pkt_128to255", MAC_STATS_FIELD_OFF(rx_128to255)},
-- 
1.9.1
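The kind of bug fixed here can be illustrated with a small name-to-offset table. ethtool -S prints each string next to the counter at the paired offset, so a wrong string mislabels a correct counter. This sketch assumes MAC_STATS_FIELD_OFF() yields a byte offset within the stats structure, modeled here with offsetof. The structure, macro and read_stat() helper are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical counter block; field names follow the hunk above. */
struct mac_hw_stats_sketch {
	uint64_t rx_fragment_err;
	uint64_t rx_undersize;
	uint64_t rx_under_min;
	uint64_t rx_64bytes;
};

/* Assumed to behave like MAC_STATS_FIELD_OFF(): a byte offset. */
#define STATS_FIELD_OFF(field) offsetof(struct mac_hw_stats_sketch, field)

struct stats_string_sketch {
	const char *desc;	/* name printed by ethtool -S */
	size_t offset;		/* where the matching counter lives */
};

/* Each name must describe the counter at its paired offset. */
static const struct stats_string_sketch stats_strings[] = {
	{ "xgmac_rx_bad_pkts_minto64",   STATS_FIELD_OFF(rx_fragment_err) },
	{ "xgmac_rx_good_pkts_minto64",  STATS_FIELD_OFF(rx_undersize) },
	{ "xgmac_rx_total_pkts_minto64", STATS_FIELD_OFF(rx_under_min) },
	{ "xgmac_rx_pkt_64",             STATS_FIELD_OFF(rx_64bytes) },
};

/* Look up a counter by its ethtool name; returns 0 if the name is unknown. */
static int read_stat(const struct mac_hw_stats_sketch *hw, const char *name,
		     uint64_t *out)
{
	size_t n = sizeof(stats_strings) / sizeof(stats_strings[0]);

	for (size_t i = 0; i < n; i++) {
		if (strcmp(stats_strings[i].desc, name) == 0) {
			memcpy(out, (const char *)hw + stats_strings[i].offset,
			       sizeof(*out));
			return 1;
		}
	}
	return 0;
}
```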



Re: [Patch net-next 3/4] sch_htb: update backlog as well

2015-10-14 Thread Cong Wang
On Wed, Oct 14, 2015 at 5:25 AM, Jamal Hadi Salim  wrote:
> On 10/12/15 14:38, Cong Wang wrote:
>>
>> It is odd to see qlen!=0 but backlog==0, for a real example:
>>
>
> Backlog is a transient stat so a lot of times it should be 0. Only when
> the CPU is sending faster than the link can handle should you see
> the backlog grow (and eventually drain to 0).

Of course. But in my case we were sending a burst of traffic with a
lower HTB bandwidth limit, so we could consistently see backlog!=0
for many seconds.

>
> Even though your explanation above is inaccurate, I think the spirit
> of the patch looks reasonable, i.e., keeping track of all additions to
> the queue and removals from the queue in the backlog stats is useful.
> However, you need to be extremely careful: this should only be done
> at exactly the spot the packet is enqueued (and not by a parent's
> enqueue asking for hierarchical enqueues).

The reason I care about backlog and qlen is that I want to know the
average packet length in the backlog, at least to check whether the
packets are GSO packets.

>
> I think some more work is needed Cong for this general patchset.
>

Sure, I may have missed something somewhere; just point it out. :)

Thanks.
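The use of the two counters described above can be sketched as follows. Backlog divided by qlen gives the average queued packet size. An average above the largest possible non-GSO frame implies that at least one queued packet is a GSO packet. These are hypothetical helpers, not kernel APIs. They assume queued lengths include the link-layer header. With backlog==0 and qlen!=0 the average is 0, which is why a stale backlog defeats this check.

```c
#include <assert.h>
#include <stdbool.h>

/* Average bytes per queued packet; 0 when the queue is empty. */
static unsigned int avg_pkt_len(unsigned int backlog_bytes, unsigned int qlen)
{
	return qlen ? backlog_bytes / qlen : 0;
}

/*
 * If the average exceeds MTU plus link-layer header, at least one queued
 * packet must be larger than a single frame, i.e. a GSO packet. The
 * converse does not hold: a small average does not rule GSO out.
 */
static bool backlog_suggests_gso(unsigned int backlog_bytes, unsigned int qlen,
				 unsigned int mtu, unsigned int hard_header_len)
{
	return avg_pkt_len(backlog_bytes, qlen) > mtu + hard_header_len;
}
```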


Re: [Patch net-next 1/4] net_sched: introduce qdisc_replace() helper

2015-10-14 Thread Cong Wang
On Wed, Oct 14, 2015 at 4:56 AM, Jamal Hadi Salim  wrote:
> On 10/12/15 14:38, Cong Wang wrote:
>>
>> Remove nearly duplicated code and prepare for the following patch.
>>
>
>
> Cong - like Dave, I don't see equivalence in some of these
> changes.
> For example, I'm not sure how the qfq grafting invocation of
> qfq_purge_queue fits in. There are a few others.

drr_purge_queue() and qfq_purge_queue() are both
qdisc_reset() + qdisc_tree_decrease_qlen():


static void drr_purge_queue(struct drr_class *cl)
{
	unsigned int len = cl->qdisc->q.qlen;

	qdisc_reset(cl->qdisc);
	qdisc_tree_decrease_qlen(cl->qdisc, len);
}

static void qfq_purge_queue(struct qfq_class *cl)
{
	unsigned int len = cl->qdisc->q.qlen;

	qdisc_reset(cl->qdisc);
	qdisc_tree_decrease_qlen(cl->qdisc, len);
}

Or do you mean the order in which they are called?


Re: [Patch net-next 1/4] net_sched: introduce qdisc_replace() helper

2015-10-14 Thread Cong Wang
On Tue, Oct 13, 2015 at 6:54 PM, David Miller  wrote:
> From: Cong Wang 
> Date: Mon, 12 Oct 2015 11:38:00 -0700
>
>> Remove nearly duplicated code and prepare for the following patch.
>>
>> Cc: Jamal Hadi Salim 
>> Signed-off-by: Cong Wang 
>
> This isn't an equivalent transformation:
>
>> +static inline struct Qdisc *qdisc_replace(struct Qdisc *sch, struct Qdisc *new,
>> +   struct Qdisc **pold)
>> +{
>> + struct Qdisc *old;
>> +
>> + sch_tree_lock(sch);
>> + old = *pold;
>> + *pold = new;
>> + if (old != NULL) {
>> + qdisc_tree_decrease_qlen(old, old->q.qlen);
>> + qdisc_reset(old);
>> + }
>> + sch_tree_unlock(sch);
>> +
>> + return old;
>> +}
>> +
>
> Is not the same as:
>
>> diff --git a/net/sched/sch_drr.c b/net/sched/sch_drr.c
>> index f26bdea..c76cdd4 100644
>> --- a/net/sched/sch_drr.c
>> +++ b/net/sched/sch_drr.c
>> @@ -226,11 +226,7 @@ static int drr_graft_class(struct Qdisc *sch, unsigned long arg,
>>   new = &noop_qdisc;
>>   }
>>
>> - sch_tree_lock(sch);
>> - drr_purge_queue(cl);
>> - *old = cl->qdisc;
>> - cl->qdisc = new;
>> - sch_tree_unlock(sch);
>> + *old = qdisc_replace(sch, new, &cl->qdisc);
>>   return 0;
>>  }
>>
>
> This.
>
> If you want to change semantics, you must do it explicitly in a separate
> commit with a detailed commit message explaining how and why.

If you meant drr_purge_queue(),  it is same:

static void drr_purge_queue(struct drr_class *cl)
{
	unsigned int len = cl->qdisc->q.qlen;

	qdisc_reset(cl->qdisc);
	qdisc_tree_decrease_qlen(cl->qdisc, len);
}

Or, if you mean the 'if': always having that one 'if' doesn't hurt, does it?
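A minimal userspace model of the two points in question, assuming qdisc_tree_decrease_qlen() only subtracts from the ancestors of the qdisc it is given and qdisc_reset() empties that qdisc. The mock_* names are hypothetical stand-ins, not kernel API, and sch_tree_lock()/sch_tree_unlock() are left out. Under those assumptions, snapshotting the length before the reset (drr_purge_queue()) and passing the live length before the reset (qdisc_replace()) subtract the same amount from the parent, and the NULL check only changes behaviour when the slot was empty.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct Qdisc: just a backlog and a parent. */
struct mock_qdisc {
	unsigned int qlen;
	struct mock_qdisc *parent;
};

/* Models qdisc_tree_decrease_qlen(): subtract n from every ancestor. */
static void mock_tree_decrease_qlen(struct mock_qdisc *q, unsigned int n)
{
	for (struct mock_qdisc *p = q->parent; p != NULL; p = p->parent)
		p->qlen -= n;
}

/* Models qdisc_reset(): drop everything queued in this qdisc. */
static void mock_reset(struct mock_qdisc *q)
{
	q->qlen = 0;
}

/* drr_purge_queue()/qfq_purge_queue() order: snapshot, reset, decrease. */
static void mock_purge_queue(struct mock_qdisc *q)
{
	unsigned int len = q->qlen;

	mock_reset(q);
	mock_tree_decrease_qlen(q, len);
}

/* qdisc_replace() shape: swap the slot, then purge the old child if any.
 * Locking is omitted from the model. */
static struct mock_qdisc *mock_qdisc_replace(struct mock_qdisc *new_q,
					     struct mock_qdisc **pold)
{
	struct mock_qdisc *old = *pold;

	*pold = new_q;
	if (old != NULL) {
		mock_tree_decrease_qlen(old, old->qlen);
		mock_reset(old);
	}
	return old;
}
```

Both paths leave the parent's backlog reduced by exactly the child's former length; the only extra case the helper handles is an empty slot, where it just installs the new qdisc.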


Re: [PATCH] ipconfig: send Client-identifier in DHCP requests

2015-10-14 Thread Li RongQing
On Thu, Oct 15, 2015 at 11:27 AM, kbuild test robot  wrote:
> Hi Li,
>
> [auto build test WARNING on net/master -- if it's inappropriate base, please suggest rules for selecting the more suitable base]
>
> url:
> https://github.com/0day-ci/linux/commits/roy-qing-li-gmail-com/ipconfig-send-Client-identifier-in-DHCP-requests/20151015-105553
> config: parisc-c3000_defconfig (attached as .config)
> reproduce:
> wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
> chmod +x ~/bin/make.cross
> # save the attached .config to linux build tree
> make.cross ARCH=parisc
>
> All warnings (new ones prefixed by >>):
>
>>> net/ipv4/ipconfig.c:148:13: warning: 'dhcp_client_identifier' defined but not used [-Wunused-variable]
> static char dhcp_client_identifier[253] __initdata;
> ^


Thanks, I will fix it

-Roy
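One way to silence this class of warning, sketched under the assumption that the affected config (parisc-c3000_defconfig) does not enable CONFIG_IP_PNP_DHCP, so every user of the buffer is compiled out: declare the buffer inside the same #ifdef as the code that reads and writes it. The #define below stands in for the Kconfig-generated symbol, and set_client_identifier() is a hypothetical helper, not code from the patch. Marking the variable __maybe_unused would be the other common option.

```c
#include <assert.h>
#include <string.h>

/* Stand-in for the Kconfig symbol; remove it to model a config without DHCP.
 * The buffer and its only user then disappear together, so no warning. */
#define CONFIG_IP_PNP_DHCP 1

#ifdef CONFIG_IP_PNP_DHCP
/* Layout as in the patch: [0] = identifier type, [1..] = identifier text. */
static char dhcp_client_identifier[253];

/* Hypothetical helper: store type and value, keeping the last byte a NUL. */
static void set_client_identifier(unsigned char type, const char *value)
{
	dhcp_client_identifier[0] = (char)type;
	strncpy(dhcp_client_identifier + 1, value,
		sizeof(dhcp_client_identifier) - 2);
}
#endif
```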


Re: [PATCH] ipconfig: send Client-identifier in DHCP requests

2015-10-14 Thread kbuild test robot
Hi Li,

[auto build test WARNING on net/master -- if it's inappropriate base, please suggest rules for selecting the more suitable base]

url:
https://github.com/0day-ci/linux/commits/roy-qing-li-gmail-com/ipconfig-send-Client-identifier-in-DHCP-requests/20151015-105553
config: parisc-c3000_defconfig (attached as .config)
reproduce:
wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=parisc 

All warnings (new ones prefixed by >>):

>> net/ipv4/ipconfig.c:148:13: warning: 'dhcp_client_identifier' defined but not used [-Wunused-variable]
static char dhcp_client_identifier[253] __initdata;
^

vim +/dhcp_client_identifier +148 net/ipv4/ipconfig.c

   132  
   133  static int ic_host_name_set __initdata; /* Host name set by us? */
   134  
   135  __be32 ic_myaddr = NONE;/* My IP address */
   136  static __be32 ic_netmask = NONE;/* Netmask for local subnet */
   137  __be32 ic_gateway = NONE;   /* Gateway IP address */
   138  
   139  __be32 ic_addrservaddr = NONE;  /* IP Address of the IP addresses'server */
   140  
   141  __be32 ic_servaddr = NONE;  /* Boot server IP address */
   142  
   143  __be32 root_server_addr = NONE; /* Address of NFS server */
   144  u8 root_server_path[256] = { 0, };  /* Path to mount as root */
   145  
   146  /* vendor class identifier */
   147  static char vendor_class_identifier[253] __initdata;
 > 148  static char dhcp_client_identifier[253] __initdata;
   149  
   150  /* Persistent data: */
   151  
   152  static int ic_proto_used;   /* Protocol used, if any */
   153  static __be32 ic_nameservers[CONF_NAMESERVERS_MAX]; /* DNS Server IP addresses */
   154  static u8 ic_domain[64];/* DNS (not NIS) domain name */
   155  
   156  /*

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation




Re: [PATCH] ipconfig: send Client-identifier in DHCP requests

2015-10-14 Thread kbuild test robot
Hi Li,

[auto build test WARNING on net/master -- if it's inappropriate base, please suggest rules for selecting the more suitable base]

url:
https://github.com/0day-ci/linux/commits/roy-qing-li-gmail-com/ipconfig-send-Client-identifier-in-DHCP-requests/20151015-105553
config: parisc-defconfig (attached as .config)
reproduce:
wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=parisc 

All warnings (new ones prefixed by >>):

   net/ipv4/ipconfig.c: In function 'ic_proto_name':
>> net/ipv4/ipconfig.c:1584:4: warning: ignoring return value of 'kstrtou8', declared with attribute warn_unused_result [-Wunused-result]
   kstrtou8(client_id, 0, dhcp_client_identifier);
   ^

vim +/kstrtou8 +1584 net/ipv4/ipconfig.c

  1568  return 0;
  1569  }
  1570  #ifdef CONFIG_IP_PNP_DHCP
  1571  else if (!strncmp(name, "dhcp", 4)) {
  1572  char *client_id;
  1573  
  1574  ic_proto_enabled &= ~IC_RARP;
  1575  client_id = strstr(name, "dhcp,");
  1576  if (client_id) {
  1577  char *v;
  1578  
  1579  client_id = client_id + 5;
  1580  v = strchr(client_id, ',');
  1581  if (!v)
  1582  return 1;
  1583  *v = 0;
> 1584  kstrtou8(client_id, 0, dhcp_client_identifier);
  1585  strncpy(dhcp_client_identifier + 1, v + 1, 251);
  1586  *v = ',';
  1587  }
  1588  return 1;
  1589  }
  1590  #endif
  1591  #ifdef CONFIG_IP_PNP_BOOTP
  1592  else if (!strcmp(name, "bootp")) {
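The robot flags the ignored kstrtou8() result. A userspace sketch of the same "dhcp,<type>,<value>" parse with every step checked follows. strtoul() stands in for kstrtou8() and is only a rough analogue (it accepts leading whitespace, for instance); the output buffer uses the patch's layout, type byte first and identifier after it. Unlike the patch, the sketch copies the type field rather than temporarily writing a NUL into the command line. parse_u8() and parse_dhcp_client_id() are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Rough userspace analogue of kstrtou8(s, 0, &res): base auto-detection,
 * rejects empty input, trailing junk and values above 255. */
static int parse_u8(const char *s, unsigned char *res)
{
	char *end;
	unsigned long v;

	errno = 0;
	v = strtoul(s, &end, 0);
	if (end == s || *end != '\0')
		return -EINVAL;
	if (errno == ERANGE || v > 255)
		return -ERANGE;
	*res = (unsigned char)v;
	return 0;
}

/* Splits "dhcp,<type>,<value>" into buf[0] = type, buf + 1 = value.
 * Returns 0 or a negative errno; buf is untouched on failure. */
static int parse_dhcp_client_id(const char *arg, char *buf, size_t buflen)
{
	const char *type_str, *comma;
	char type_buf[8];
	unsigned char type;
	size_t type_len;
	int err;

	if (strncmp(arg, "dhcp,", 5) != 0)
		return -EINVAL;
	type_str = arg + 5;
	comma = strchr(type_str, ',');
	if (comma == NULL)
		return -EINVAL;
	type_len = (size_t)(comma - type_str);
	if (type_len == 0 || type_len >= sizeof(type_buf))
		return -EINVAL;
	memcpy(type_buf, type_str, type_len);
	type_buf[type_len] = '\0';

	err = parse_u8(type_buf, &type);
	if (err)
		return err;	/* the check the warning asks for */
	/* Room for the type byte, the value and its NUL terminator. */
	if (buflen < 2 || strlen(comma + 1) > buflen - 2)
		return -ERANGE;
	buf[0] = (char)type;
	strcpy(buf + 1, comma + 1);
	return 0;
}
```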



Re: [PATCH net-next 0/4] net: dsa: mv88e6xxx: fix hardware bridging

2015-10-14 Thread Guenter Roeck

On 10/14/2015 07:52 PM, Andrew Lunn wrote:

> On Wed, Oct 14, 2015 at 09:28:55PM -0400, Vivien Didelot wrote:
>> On Oct. Thursday 15 (42) 12:46 AM, Andrew Lunn wrote:
>>> On Sun, Oct 11, 2015 at 06:08:34PM -0400, Vivien Didelot wrote:
>>>> DSA and its drivers currently hook the NETDEV_CHANGEUPPER net_device event in
>>>> order to configure the VLAN map of every port.
>>>>
>>>> This VLAN map is a feature of these switch chips to hardcode and restrict which
>>>> output ports a given input port can egress frames to.
>>>>
>>>> A Linux bridge is a simple untagged VLAN propagated by the bridge code itself.
>>>> With a proper 802.1Q support, a driver does not need this hook anymore, and
>>>> will simply program the related VLAN object.
>>>>
>>>> This patchset improves the hardware bridging code in the mv88e6xxx driver with
>>>> a strict 802.1Q mode.
>>>
>>> Hi Vivien
>>>
>>> I just tested this as part of net-next/master, and found a problem
>>>
>>> If i do:
>>>
>>> ip link set lan0 up
>>> ip addr add 192.168.10.2/24 dev lan0
>>>
>>> It will not ping. Looking in sys/kernel/debug/dsa0/stats i see
>>> broadcast packets, probably ARP, being received at the port.
>>> But they are not being forwarded out the CPU port.
>>>
>>> If however i do
>>>
>>> brctl addbr br0
>>> brctl addif br0 lan0
>>> ip addr add 192.168.10.2/24 dev br0
>>> ip link set br0 up
>>>
>>> i can ping.
>>>
>>> So it looks like we are too restrictive by default. You should be able
>>> to use interfaces as they are, without a bridge.
>>
>> Correct, if the ports are not in a VLAN by default, they cannot talk.
>
> Hi Vivien
>
> This is a regression. Ports of the switch should work like normal
> Linux interfaces. And up until now, they did. This patchset changed
> that.
>
> As Florian pointed out, these interfaces are separated from each
> other. So you need something like a bridge per port by default, which
> then gets removed and replaced when a port is added to a Linux bridge.
>
> We also need to take care of VLANs. When the port is not a member of a
> linux bridge, i expect all VLAN tagged frames to be received, as well
> as untagged frames. This is normal Linux behaviour. But i never got
> around to testing this with DSA.

There was a reason for the original code. I had wondered how it is now
supposed to work. Guess this exchange explains it. Looking forward to see
how it is going to be fixed, and too bad I don't have time to be more
involved.

Guenter



Re: [PATCH net-next] switchdev: enforce no pvid flag in vlan ranges

2015-10-14 Thread Scott Feldman
On Wed, Oct 14, 2015 at 10:42 AM, Ido Schimmel  wrote:
> Wed, Oct 14, 2015 at 08:14:24PM IDT, sfel...@gmail.com wrote:
>>On Wed, Oct 14, 2015 at 8:25 AM, Vivien Didelot
>> wrote:
>>> On Oct. Wednesday 14 (42) 09:14 AM, Ido Schimmel wrote:
 Tue, Oct 13, 2015 at 05:32:26PM IDT, vivien.dide...@savoirfairelinux.com 
 wrote:
 >On Oct. Tuesday 13 (42) 11:31 AM, Ido Schimmel wrote:
 >> Mon, Oct 12, 2015 at 08:36:25PM IDT, 
 >> vivien.dide...@savoirfairelinux.com wrote:
 >> >Hi guys,
 >> >
 >> >On Oct. Monday 12 (42) 02:01 PM, Nikolay Aleksandrov wrote:
 >> >> From: Nikolay Aleksandrov 
 >> >>
 >> >> We shouldn't allow BRIDGE_VLAN_INFO_PVID flag in VLAN ranges.
 >> >>
 >> >> Signed-off-by: Nikolay Aleksandrov 
 >> >> ---
 >> >>  net/switchdev/switchdev.c | 3 +++
 >> >>  1 file changed, 3 insertions(+)
 >> >>
 >> >> diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
 >> >> index 6e4a4f9ad927..256c596de896 100644
 >> >> --- a/net/switchdev/switchdev.c
 >> >> +++ b/net/switchdev/switchdev.c
 >> >> @@ -720,6 +720,9 @@ static int switchdev_port_br_afspec(struct 
 >> >> net_device *dev,
 >> >> if (vlan.vid_begin)
 >> >> return -EINVAL;
 >> >> vlan.vid_begin = vinfo->vid;
 >> >> +   /* don't allow range of pvids */
 >> >> +   if (vlan.flags & BRIDGE_VLAN_INFO_PVID)
 >> >> +   return -EINVAL;
 >> >> } else if (vinfo->flags & 
 >> >> BRIDGE_VLAN_INFO_RANGE_END) {
 >> >> if (!vlan.vid_begin)
 >> >> return -EINVAL;
 >> >> --
 >> >> 2.4.3
 >> >>
 >> >
 >> >Yes the patch looks good, but it is a minor check though. I hope the
 >> >subject of this thread is making sense.
 >> >
 >> >VLAN ranges seem to have been included for an UX purpose (so commands
 >> >look like Cisco IOS). We don't want to change any existing interface, 
 >> >so
 >> >we pushed that down to drivers, with the only valid reason that, maybe
 >> >one day, an hardware can be capable of programming a range on a 
 >> >per-port
 >> >basis.
 >> Hi,
 >>
 >> That's actually what we are doing in mlxsw. We can do up to 256 entries 
 >> in
 >> one go. We've yet to submit this part.
 >
 >Perfect Ido, thanks for pointing this out! I'm OK with the range then.
 >
 >So there is now a very last question in my head for this, which is more
 >a matter of kernel design. Should the user be aware of such underlying
 >support? In other words, would it make sense to do this in a driver:
 >
 >foo_port_vlan_add(struct net_device *dev,
 >  struct switchdev_obj_port_vlan *vlan)
 >{
 >if (vlan->vid_begin != vlan->vid_end)
 >return -ENOTSUPP; /* or something more relevant for user */
 >
 >return foo_port_single_vlan_add(dev, vlan->vid_begin);
 >}
 >
 >So drivers keep being simple, and we can easily propagate the fact that
 >one-or-all VLAN is not supportable, vs. the VLAN feature itself is not
 >implemented and must be done in software.
 I think that if you want to keep it simple, then Scott's advice from the
 previous thread is the most appropriate one. I believe the hardware you
 are using is simply not meant to support multiple 802.1Q bridges.
>>>
>>> You mean allowing only one Linux bridge over an hardware switch?
>>>
>>> It would for sure simplify how, as developers and users, we represent a
>>> physical switch. But I am not sure how to achieve that and I don't have
>>> strong opinions on this TBH.
>>
>>Hi Vivien, I think it's possible to keep switch ports on just one
>>bridge if we do a little bit of work on the NETDEV_CHANGEUPPER
>>notifier.  This will give you the driver-level control you want.  Do
>>you have time to investigate?  The idea is:
>>
>>1) In your driver's handler for NETDEV_CHANGEUPPER, if switch port is
>>being added to a second bridge,then return NOTIFY_BAD.  Your driver
>>needs to track the bridge count.
>>
>>2) In __netdev_upper_dev_link(), check the return code from the
>>call_netdevice_notifiers_info(NETDEV_CHANGEUPPER, ...) call, and if
>>NOTIFY_BAD, abort the linking operation (goto rollback_xxx).
>>
> Hi,
>
> We are doing something similar in mlxsw (not upstream yet). Jiri
> introduced PRE_CHANGEUPPER, which is called from the function you
> mentioned, but before the linking operation (so that you don't need to
> rollback).

Oh, cool.

> If the notification is about a linking operation and the master is a
> bridge different than the current one, then NOTIFY_BAD is returned.

So you're wanting to restrict to just one bridge also?  Or is
NOTIFY_BAD returned for some other reason?  I g
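Scott's two steps and Ido's PRE_CHANGEUPPER remark amount to a veto before the link is made plus bookkeeping once it is done. A userspace model of just that decision logic, restricting a switch to one bridge, could look like this; the mock_* types and return codes are hypothetical stand-ins for the notifier plumbing (NOTIFY_OK/NOTIFY_BAD), and no kernel API is called.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical return codes standing in for NOTIFY_OK / NOTIFY_BAD. */
enum { MOCK_NOTIFY_OK = 0, MOCK_NOTIFY_BAD = 1 };

struct mock_bridge { int id; };

/* Per-switch driver state: which bridge owns the ports, and how many
 * ports are currently enslaved to it. */
struct mock_switch {
	const struct mock_bridge *bridge;
	unsigned int members;
};

/* PRE_CHANGEUPPER-style check, run before linking: refuse a second,
 * different bridge so nothing has to be rolled back afterwards. */
static int mock_precheck_upper(const struct mock_switch *sw,
			       const struct mock_bridge *upper, bool linking)
{
	if (!linking || sw->bridge == NULL || sw->bridge == upper)
		return MOCK_NOTIFY_OK;
	return MOCK_NOTIFY_BAD;
}

/* CHANGEUPPER-style bookkeeping, run after the link/unlink happened. */
static void mock_commit_upper(struct mock_switch *sw,
			      const struct mock_bridge *upper, bool linking)
{
	if (linking) {
		sw->bridge = upper;
		sw->members++;
	} else if (sw->members > 0 && --sw->members == 0) {
		sw->bridge = NULL;	/* last port left: any bridge allowed again */
	}
}
```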

Re: [PATCH v2 1/3] unix: fix use-after-free in unix_dgram_poll()

2015-10-14 Thread Jason Baron

> 
> X-Signed-Off-By: Rainer Weikusat 
> 

Hi,

The patches I've posted and yours both relay the remote peer wakeup
via callbacks that are internal to net/unix, so that the remote peer
wakeup is not exposed to the external poll()/select()/epoll(). They
differ in when and how those callbacks are registered and unregistered.

I think your approach will generally keep the peer wait wakeup queue
to its absolute minimum, by removing from that queue when we set
POLLOUT; however, it requires taking the peer waitqueue lock on every
poll() call. So there are tradeoffs versus what I've posted. For
example, if there are many writers against one 'server' socket, there
will be a lot of lock contention with your approach. The performance
is therefore going to depend on the workload being tested.

I don't have a specific workload that I am trying to solve here, but
since you introduced the waiting on the remote peer queue, perhaps you
can post numbers comparing the patches that have been posted for the
workload that this was developed for? I will send you the latest version
of what I have privately - so as not to overly spam the list.

Thanks,

-Jason


[PATCH] ipconfig: send Client-identifier in DHCP requests

2015-10-14 Thread roy . qing . li
From: Li RongQing 

A dhcp server may provide parameters to a client from a pool of IP
addresses and using a shared rootfs, or provide a specific set of
parameters for a specific client, usually using the MAC address to
identify each client individually. The dhcp protocol also specifies
a client-id field which can be used to determine the correct
parameters to supply when no MAC address is available. There is
currently no way to tell the kernel to supply a specific client-id;
only the userspace dhcp clients support this feature, and they cannot
be used when the network is needed before userspace is available, such
as when the root filesystem is on NFS.

This patch makes it possible to pass something like
"ip=dhcp,client_id_type,client_id_value" as a kernel parameter so that
the kernel can identify itself to the server.

Signed-off-by: Li RongQing 
---
 Documentation/filesystems/nfs/nfsroot.txt |  3 +++
 net/ipv4/ipconfig.c   | 28 +++-
 2 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/Documentation/filesystems/nfs/nfsroot.txt b/Documentation/filesystems/nfs/nfsroot.txt
index 2d66ed6..bb5ab6d 100644
--- a/Documentation/filesystems/nfs/nfsroot.txt
+++ b/Documentation/filesystems/nfs/nfsroot.txt
@@ -157,6 +157,9 @@ ip=:::
  both:use both BOOTP and RARP but not DHCP
   (old option kept for backwards compatibility)
 
+   if dhcp is used, the client identifier can be used by following
+   format "ip=dhcp,client-id-type,client-id-value"
+
 Default: any
 
   IP address of first nameserver.
diff --git a/net/ipv4/ipconfig.c b/net/ipv4/ipconfig.c
index ed4ef09..57c4fd4 100644
--- a/net/ipv4/ipconfig.c
+++ b/net/ipv4/ipconfig.c
@@ -145,6 +145,7 @@ u8 root_server_path[256] = { 0, };  /* Path to mount as root */
 
 /* vendor class identifier */
 static char vendor_class_identifier[253] __initdata;
+static char dhcp_client_identifier[253] __initdata;
 
 /* Persistent data: */
 
@@ -728,6 +729,16 @@ ic_dhcp_init_options(u8 *options)
memcpy(e, vendor_class_identifier, len);
e += len;
}
+   len = strlen(dhcp_client_identifier + 1);
+   /* the minimum length of identifier is 2, include 1 byte type,
+* and can not be larger than the length of options
+*/
+   if (len >= 1 && len < 312 - (e - options) - 1) {
+   *e++ = 61;
+   *e++ = len + 1;
+   memcpy(e, dhcp_client_identifier, len + 1);
+   e += len + 1;
+   }
}
 
*e++ = 255; /* End of the list */
@@ -1557,8 +1568,23 @@ static int __init ic_proto_name(char *name)
return 0;
}
 #ifdef CONFIG_IP_PNP_DHCP
-   else if (!strcmp(name, "dhcp")) {
+   else if (!strncmp(name, "dhcp", 4)) {
+   char *client_id;
+
ic_proto_enabled &= ~IC_RARP;
+   client_id = strstr(name, "dhcp,");
+   if (client_id) {
+   char *v;
+
+   client_id = client_id + 5;
+   v = strchr(client_id, ',');
+   if (!v)
+   return 1;
+   *v = 0;
+   kstrtou8(client_id, 0, dhcp_client_identifier);
+   strncpy(dhcp_client_identifier + 1, v + 1, 251);
+   *v = ',';
+   }
return 1;
}
 #endif
-- 
2.1.4
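Option 61 in the ic_dhcp_init_options() hunk is the DHCP Client-identifier option from RFC 2132: a tag byte, a length byte, then a type byte followed by the identifier. A sketch of appending it with the space budget spelled out term by term (tag, length, payload, and the End option that is written after the list), assuming the 312-byte options area implied by the patch's bound; put_client_id() is a hypothetical helper, not the patch's code.

```c
#include <assert.h>
#include <string.h>

#define DHCP_OPTIONS_LEN   312	/* options area size assumed from the patch */
#define DHCP_OPT_CLIENT_ID 61	/* RFC 2132 Client-identifier */

/* id[0] holds the type byte and id + 1 the NUL-terminated identifier, as in
 * dhcp_client_identifier. Appends the option at e if it fits while leaving
 * one byte for the End option; returns the new write position, or e
 * unchanged when the identifier is empty or does not fit. */
static unsigned char *put_client_id(unsigned char *options, unsigned char *e,
				    const char *id)
{
	size_t len = strlen(id + 1);
	size_t used = (size_t)(e - options);
	size_t need = 2 + (len + 1) + 1;	/* tag + length + payload + End */

	if (len < 1 || len + 1 > 255 || used + need > DHCP_OPTIONS_LEN)
		return e;
	*e++ = DHCP_OPT_CLIENT_ID;
	*e++ = (unsigned char)(len + 1);	/* payload = type byte + identifier */
	memcpy(e, id, len + 1);
	return e + len + 1;
}
```

Writing the bound as `used + need > DHCP_OPTIONS_LEN` makes it easy to compare against the `len < 312 - (e - options) - 1` check in the patch.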



Re: [PATCH net-next 0/4] net: dsa: mv88e6xxx: fix hardware bridging

2015-10-14 Thread Andrew Lunn
On Wed, Oct 14, 2015 at 09:28:55PM -0400, Vivien Didelot wrote:
> On Oct. Thursday 15 (42) 12:46 AM, Andrew Lunn wrote:
> > On Sun, Oct 11, 2015 at 06:08:34PM -0400, Vivien Didelot wrote:
> > > DSA and its drivers currently hook the NETDEV_CHANGEUPPER net_device 
> > > event in
> > > order to configure the VLAN map of every port.
> > > 
> > > This VLAN map is a feature of these switch chips to hardcode and restrict 
> > > which
> > > output ports a given input port can egress frames to.
> > > 
> > > A Linux bridge is a simple untagged VLAN propagated by the bridge code 
> > > itself.
> > > With a proper 802.1Q support, a driver does not need this hook anymore, 
> > > and
> > > will simply program the related VLAN object.
> > > 
> > > This patchset improves the hardware bridging code in the mv88e6xxx driver 
> > > with
> > > a strict 802.1Q mode.
> > 
> > Hi Vivien
> > 
> > I just tested this as part of net-next/master, and found a problem
> > 
> > If i do:
> > 
> > ip link set lan0 up
> > ip addr add 192.168.10.2/24 dev lan0
> > 
> > It will not ping. Looking in sys/kernel/debug/dsa0/stats i see
> > broadcast packets, probably ARP, being received at the port.
> > But they are not being forwarded out the CPU port.
> > 
> > If however i do
> > 
> > brctl addbr br0
> > brctl addif br0 lan0
> > ip addr add 192.168.10.2/24 dev br0
> > ip link set br0 up
> > 
> > i can ping.
> > 
> > So it looks like we are too restrictive by default. You should be able
> > to use interfaces as they are, without a bridge.
> 
> Correct, if the ports are not in a VLAN by default, they cannot talk.

Hi Vivien 

This is a regression. Ports of the switch should work like normal
Linux interfaces. And up until now, they did. This patchset changed
that.

As Florian pointed out, these interfaces are separated from each
other. So you need something like a bridge per port by default, which
then gets removed and replaced when a port is added to a Linux bridge.

We also need to take care of VLANs. When the port is not a member of a
linux bridge, i expect all VLAN tagged frames to be received, as well
as untagged frames. This is normal Linux behaviour. But i never got
around to testing this with DSA.

   Andrew
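Andrew's "bridge per port by default" can be modelled as a per-port egress mask, the "VLAN map" from Vivien's cover letter: a port outside any Linux bridge may reach only the CPU port, and a bridged port may also reach the other members of its bridge. This is a sketch; the port count, CPU port index and bridge-id encoding are hypothetical and say nothing about the mv88e6xxx register layout, and the VLAN-tagged traffic Andrew also raises is not modelled.

```c
#include <assert.h>
#include <stdint.h>

#define MOCK_NUM_PORTS 7
#define MOCK_CPU_PORT  5	/* hypothetical CPU port index */

/* bridge_of[p] is 0 for a standalone port, otherwise an arbitrary id of
 * the Linux bridge the port belongs to. The CPU port's entry is ignored.
 * Returns the bitmask of ports that 'port' may egress frames to. */
static uint16_t port_egress_mask(const int bridge_of[MOCK_NUM_PORTS], int port)
{
	uint16_t mask;

	if (port == MOCK_CPU_PORT) {
		/* The CPU port may reach every user port, but not itself. */
		mask = (uint16_t)((1u << MOCK_NUM_PORTS) - 1);
		return (uint16_t)(mask & ~(1u << MOCK_CPU_PORT));
	}

	mask = (uint16_t)(1u << MOCK_CPU_PORT);	/* always reach the CPU */
	if (bridge_of[port] == 0)
		return mask;	/* standalone: behaves like a bridge of its own */

	for (int p = 0; p < MOCK_NUM_PORTS; p++)
		if (p != port && p != MOCK_CPU_PORT &&
		    bridge_of[p] == bridge_of[port])
			mask |= (uint16_t)(1u << p);
	return mask;
}
```

When a port joins or leaves a bridge, recomputing the masks of that bridge's members is enough to move between the "one bridge per port" default and shared forwarding.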


Re: [PATCH v4 3/4] Add PTP_SYS_OFFSET_PRECISE for driver crosstimestamping

2015-10-14 Thread Christopher Hall
On Tue, 13 Oct 2015 06:59:26 -0700, Richard Cochran wrote:

> On Mon, Oct 12, 2015 at 11:45:21AM -0700, Christopher S. Hall wrote:
>> +struct ptp_sys_offset_precise {
>> +	unsigned int rsv[4];	/* Reserved for future use. */
>> +	struct ptp_clock_time dev;
>> +	struct ptp_clock_time sys;
>> +};
>> +
>
> Please put the reserved field at the bottom.  Also, since we reading
> the raw monotonic time under the hood, we might as well return it in

Good idea.

Thanks,
Chris


Re: [PATCH net-next v5] net: ipv6: Make address flushing on ifdown optional

2015-10-14 Thread David Ahern

On 10/14/15 7:06 PM, David Miller wrote:

From: David Ahern 
Date: Wed, 14 Oct 2015 10:09:59 -0600


This latest patch makes IPv6 static addresses on par with IPv4,
including error paths.


I don't agree with ipv4's behavior... and just because ipv4 does
something poorly doesn't mean we get a free pass to replicate that
lazyness in ipv6.



As I stated, this patch puts IPv6 on par with IPv4 with regard to 
saving the address and to the lack of error reporting back to the user 
should a failure happen on link up. Yes, it is best to notify the user 
of a failure, but step back for a moment and look at the bigger picture:


At best the address is saved and restored on a link up (the expected 
outcome for 99.99...% of the time). At worst the address is removed 
because the prefix route fails a memory allocation and the user is not 
notified. But that is exactly what happens today - the address is 
dropped and the user has to restore it.


As for the 1 failure path -- it's a GFP_ATOMIC memory allocation 
failure. Frankly if that happens lack of an address on an interface is 
the least of the user's problems.


As for the options to fix this existing shortcoming:

1. The existing call_netdevice_notifiers infra does not allow a notifier 
to 'fail' the transaction and roll it back or even to give the user an 
error message.


2. Stashing the prefix route has its merits but it has to deal with 
error paths as well. What if the address is deleted? What if the mask is 
changed while the device is a down state? What if the device is deleted? 
Sure, handle those cases but what other paths are missing from that list?


Both paths introduce a lot of complexity all b/c we want to save the 
address on a link and restore the route on a link up.


Why not take this as a start point that at least does the right thing 
almost every time?


David


Re: [PATCH v4 1/4] Produce system time from correlated clocksource

2015-10-14 Thread Christopher Hall

Richard,

On Tue, 13 Oct 2015 14:12:24 -0700, Richard Cochran wrote:

> On Tue, Oct 13, 2015 at 09:15:51PM +0200, Thomas Gleixner wrote:
>
> Can we at least have a explanation of how the firmware operates?  How
> are (ART,sys) pairs are generated, and how they are supposed to get
> into the DSP?


I'll give it a try. The audio controller has a set of registers almost  
exactly like those on the network device. The e1000e patch adds the  
e1000e_phc_get_ts() function. It writes a register to start the  
cross-timestamp process and some time later the hardware sets a bit  
indicating that it's finished.


In the case of the network, the host polls for this bit to be set,  
indicating the cross-timestamp registers have valid data.  In the audio  
DSP case, it is the DSP that's doing the polling and it can only poll once  
per millisecond.


The transfers look like:

Host -PCI (write request) -> DSP

[Transaction started from host]

DSP -PCI (write to initiate)-> Audio controller

[Transaction started from DSP]

DSP <-PCI (read to poll status)- Audio Controller

[Transaction Complete from DSP perspective]

DSP <-PCI (read (ART,device) pair)- Audio Controller

DSP -PCI (write notification) -> Host

[Transaction complete from Host perspective]

Host <-PCI read (ART,device) pair- DSP
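The transfers above reduce to trigger, poll a done bit, then read the latched pair. A hypothetical register model makes the timing constraint visible: whoever polls (the host for the network device, the DSP for audio) only sees the result on a poll, and the DSP polls only once per millisecond. The struct, the fixed latency of three polls and the latched values are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the cross-timestamp registers. */
struct mock_xts_dev {
	bool busy;
	unsigned int polls_until_done;	/* simulated hardware latency */
	uint64_t art, device;		/* latched (ART, device) pair */
};

/* Write the "start cross-timestamp" register. */
static void xts_trigger(struct mock_xts_dev *d)
{
	d->busy = true;
	d->polls_until_done = 3;
}

/* One status read; the model "hardware" latches the pair when it finishes. */
static bool xts_done(struct mock_xts_dev *d)
{
	if (d->busy && d->polls_until_done > 0 && --d->polls_until_done == 0) {
		d->busy = false;
		d->art = 1000;
		d->device = 2000;
	}
	return !d->busy;
}

/* Trigger, then poll at most max_polls times. Returns 0 and the pair on
 * success, -1 if the done bit was not seen in time. */
static int xts_capture(struct mock_xts_dev *d, unsigned int max_polls,
		       uint64_t *art, uint64_t *device)
{
	xts_trigger(d);
	for (unsigned int i = 0; i < max_polls; i++) {
		if (xts_done(d)) {
			*art = d->art;
			*device = d->device;
			return 0;
		}
	}
	return -1;
}
```

With a 1 ms poll interval, every extra poll ages the ART value by another millisecond, which is why the host side needs timekeeping history to correlate it.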


I hope this is helpful. Thanks.

Chris


[PATCH 1/1] xen-netfront: update num_queues to real created

2015-10-14 Thread Joe Jin
xennet_create_queues() may fail to create all of the requested queues,
so num_queues must be updated to the number actually created to avoid
a NULL pointer dereference.

Signed-off-by: Joe Jin 
Cc: Wei Liu 
Cc: Ian Campbell 
Cc: David S. Miller 
---
 drivers/net/xen-netfront.c |9 ++---
 1 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index f821a97..d580aec 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -1746,7 +1746,7 @@ static int xennet_create_queues(struct netfront_info *info,
dev_err(&info->netdev->dev, "no queues\n");
return -EINVAL;
}
-   return 0;
+   return num_queues;
 }
 
 /* Common code used when first setting up, and when resuming. */
@@ -1788,9 +1788,12 @@ static int talk_to_netback(struct xenbus_device *dev,
if (info->queues)
xennet_destroy_queues(info);
 
-   err = xennet_create_queues(info, num_queues);
-   if (err < 0)
+   /* Update queues number to real created */
+   num_queues = xennet_create_queues(info, num_queues);
+   if (num_queues < 0) {
+   err = num_queues;
goto destroy_ring;
+   }
 
/* Create shared ring, alloc event channel -- for each queue */
for (i = 0; i < num_queues; ++i) {
-- 
1.7.1
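An alternative shape for the same fix, sketched here, reports the number of created queues through an out-parameter and keeps the return value for the error code. This is not what the patch does, and the mock_* names are hypothetical. One reason to consider it: if the caller's num_queues is declared unsigned (the hunk does not show its declaration), `num_queues < 0` can never be true, so a negative errno stored there would go unnoticed.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical per-queue setup that fails from queue index fail_at on. */
static int mock_init_queue(unsigned int i, unsigned int fail_at)
{
	return i < fail_at ? 0 : -ENOMEM;
}

/* Create up to *num_queues queues and shrink *num_queues to the number
 * actually created. The count and the error code travel separately, so an
 * unsigned count never has to hold a negative errno. */
static int mock_create_queues(unsigned int *num_queues, unsigned int fail_at)
{
	unsigned int i;

	for (i = 0; i < *num_queues; i++)
		if (mock_init_queue(i, fail_at) < 0)
			break;
	*num_queues = i;
	return i == 0 ? -EINVAL : 0;	/* no queues at all is an error */
}
```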



Re: [PATCH V2 2/2] bpf: control a set of perf events by creating a new ioctl PERF_EVENT_IOC_SET_ENABLER

2015-10-14 Thread xiakaixu
On 2015/10/15 5:28, Alexei Starovoitov wrote:
> On 10/14/15 5:37 AM, Kaixu Xia wrote:
>> +event->p_sample_disable = &enabler_event->sample_disable;
> 
> I don't like it as a concept and it's buggy implementation.
> What happens here when enabler is alive, but other event is destroyed?
> 
>> --- a/kernel/trace/bpf_trace.c
>> +++ b/kernel/trace/bpf_trace.c
>> @@ -221,9 +221,12 @@ static u64 bpf_perf_event_sample_control(u64 r1, u64 index, u64 flag, u64 r4, u6
>>   struct bpf_array *array = container_of(map, struct bpf_array, map);
>>   struct perf_event *event;
>>
>> -if (unlikely(index >= array->map.max_entries))
>> +if (unlikely(index > array->map.max_entries))
>>   return -E2BIG;
>>
>> +if (index == array->map.max_entries)
>> +index = 0;
> 
> what is this hack for ?
> 
> Either use notification and user space disable or
> call bpf_perf_event_sample_control() manually for each cpu.

I will discard the current implementation that controls a set of
perf events through the 'enabler' event. Calling
bpf_perf_event_sample_control() manually for each cpu is fine. Maybe we
can add a loop that controls all the events stored in the map, selected
by the index. OK?
> 
> 
> 
> .
> 




Re: [PATCH net-next v2] netlink: Rightsize IFLA_AF_SPEC size calculation

2015-10-14 Thread David Miller
From: Ronen Arad 
Date: Wed, 14 Oct 2015 08:51:28 -0700

> @@ -900,7 +901,7 @@ static noinline size_t if_nlmsg_size(const struct net_device *dev,
>  + rtnl_vfinfo_size(dev, ext_filter_mask) /* IFLA_VFINFO_LIST */
>  + rtnl_port_size(dev, ext_filter_mask) /* IFLA_VF_PORTS + IFLA_PORT_SELF */
>  + rtnl_link_get_size(dev) /* IFLA_LINKINFO */
> -+ rtnl_link_get_af_size(dev) /* IFLA_AF_SPEC */
> + + rtnl_link_get_af_size(dev, ext_filter_mask) /* IFLA_AF_SPEC */
>  + nla_total_size(MAX_PHYS_ITEM_ID_LEN) /* IFLA_PHYS_PORT_ID */

Please don't change the indentation on this line, keep it matching
the indentation of all of the surrounding lines of this expression.

Thanks.


Re: [PATCH net-next] drivers/net: get rid of unnecessary initializations in .get_drvinfo()

2015-10-14 Thread David Miller
From: Ivan Vecera 
Date: Wed, 14 Oct 2015 18:27:52 +0200

> Many drivers initialize uselessly n_priv_flags, n_stats, testinfo_len,
> eedump_len & regdump_len fields in their .get_drvinfo() ethtool op.
> It's not necessary as these fields is filled in ethtool_get_drvinfo().
> 
> Signed-off-by: Ivan Vecera 
 ...
>  drivers/net/usb/sr9800.c| 1 -

Please fix this unused variable warning added to this file and
resubmit, thanks.


Re: [PATCH net-next V1 0/4] Mellanox driver update, Oct 14 2015

2015-10-14 Thread David Miller
From: Or Gerlitz 
Date: Wed, 14 Oct 2015 17:43:44 +0300

> Hi Dave,
> 
> This series contains two more patches from Eli, patch from Majd
> to support PCI error handlers and a fix from Jack to mlx4 VFs
> when probed without a provisioned mac address.
> 
> The patch set applied on top of net-next commit bbb300e "Merge branch 
> 'bridge-vlan'"
> 
> changes from V0:
>   - made the health flag int --> bool to address comment from Dave on patch #1
>   - fixed sparse warning noted by the 0-day build tests in patch #2

Series applied, thanks.


Re: [PATCH v4 1/4] Produce system time from correlated clocksource

2015-10-14 Thread Christopher Hall

Thomas,

On Tue, 13 Oct 2015 12:42:52 -0700, Thomas Gleixner wrote:

On Mon, 12 Oct 2015, Christopher S. Hall wrote:

audio.


This wants to be a seperate patch, really.


OK. This makes sense, I'll do this the next time.


+/* This needs to be 3 or greater for backtracking to be useful */


Why?


The current index points to a copy, and the next entry may be in the  
middle of being changed by update_wall_time(). That leaves n-2 entries  
available with useful history in them. I'll add more descriptive comments here.





+#define SHADOW_HISTORY_DEPTH 7


And that number is 7 because?


Since it needs to be a power of 2, it will be 8 instead. As above, the  
useful history is (8 - 2) * 1 ms = 6 ms (1 ms is the minimum jiffy  
length). An array size of 4 would give only (4 - 2) * 1 ms = 2 ms, which  
is not enough for the DSP; it requires 4 ms of history in the worst case.



+static int shadow_index = -1; /* incremented to zero in


What's the point of this? Aside of that, please do not use tail comments.


It's removed.  A check for validity is added below and this isn't  
necessary.



That's silly. Make DEPTH a power of 2 and do:

   idx = (idx + 1) & (DEPTH - 1);


This is changed.


+   true : *shadow_index_out < shadow_index;


All this can go away.


Yes.


+   /* Also make sure that entry is valid based on current shadow_index */
+   *shadow_index_io = ret;
+   return true;


You surely try hard to do stuff in the most unreadable way.


Is like this easier to follow?

+static struct timekeeper *search_shadow_history(cycles_t cycles,
+   struct clocksource *cs)
+{
+   struct timekeeper *tk = &tk_core.timekeeper;
+   int srchidx = shadow_index;
+   cycles_t cycles_start, cycles_end;
+
+   cycles_start = tk->tkr_mono.cycle_last;
+   do {
+   srchidx = !srchidx-- ? srchidx+SHADOW_HISTORY_DEPTH : srchidx;
+   tk = shadow_timekeeper + srchidx;
+
+   /* The next shadow entry may be in flight, don't use it */
+   if (srchidx == ((shadow_index+1) & (SHADOW_HISTORY_DEPTH-1)))
+   return NULL;
+
+   /* Make sure timekeeper is related to clock on this interval */
+   if (tk->tkr_mono.clock != cs)
+   return NULL;
+
+   cycles_end = cycles_start;
+   cycles_start = tk->tkr_mono.cycle_last;
+   } while (!cycle_between(cycles_start, cycles, cycles_end));
+
+   return tk;
+}
A check for validity is added here using the clocksource pointer.

and inside of get_correlated_timestamp():

+* into account. If the value is in the past, try to backtrack
+*/
+   cycles_end = tk->tkr_mono.read(tk->tkr_mono.clock);
+   cycles_start = tk->tkr_mono.cycle_last;
+   if (!cycle_between(cycles_start, cycles, cycles_end)) {
+   tk = search_shadow_history(cycles, crs->related_cs);
+   if (!tk)
+   return -EAGAIN;
+   }
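`cycle_between()` is called in both fragments but is not defined in this excerpt. One way to write such a check for a free-running counter that may wrap is to compare masked forward distances from the start of the interval, in the style of a clocksource `mask`. This is a sketch under that assumption: `cycle_between_masked()` and its `mask` parameter are hypothetical and are not the helper from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if 'test' lies in [before, after] on a counter
 * whose valid bits are given by 'mask' (2^bits - 1) and which may have
 * wrapped once between 'before' and 'after'. Unsigned subtraction followed
 * by masking turns each value into a forward distance from 'before'.
 */
static bool cycle_between_masked(uint64_t before, uint64_t test,
				 uint64_t after, uint64_t mask)
{
	return ((test - before) & mask) <= ((after - before) & mask);
}
```

For example, on an 8-bit counter (mask 0xff) that wrapped from 250 to 5, the value 2 is inside the interval and 100 is not.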




+   /*
+* Get a timestamp from the device if get_ts is non-NULL
+*/
+   if( crt->get_ts ) {
+   ret = crt->get_ts(crt);
+   if (ret)
+   return ret;
+   }


What's the point of this? Why are you not making the few lines which
you can actually reuse a helper function and leave the PTP code alone?


The audio driver is structured in such a way that it's simpler to provide  
a value rather than a callback.  I changed this to allow the audio  
developers to provide an ART value as input.  If a callback is provided,  
the resulting counter value is guaranteed to be later than cycle_last and  
there is no need to do extra checking (the goto skips that check).  Is  
this an answer to your question?



So I reached the end of the patch and did not find anything in
timekeeping_init() which tells that the index is incremented to 0. It
really would need a comment, but why do you want to do that at all. It
does not matter whether the first entry is at 0 or 1. You need a
validity check for the entries anyway.


I think this should be resolved.  There's no sensitivity with regard to  
the start index with an added validity check.


Thanks,
Chris


Re: [PATCH net v2 1/1] tipc: move fragment importance field to new header position

2015-10-14 Thread David Miller
From: Jon Maloy 
Date: Wed, 14 Oct 2015 09:23:18 -0400

> In commit e3eea1eb47a ("tipc: clean up handling of message priorities")
> we introduced a field in the packet header for keeping track of the
> priority of fragments, since this value is not present in the specified
> protocol header. Since the value has so far only been used at the
> transmitting end of the link, we have not yet officially defined it as
> part of the protocol.
> 
> Unfortunately, the field we use for keeping this value, bits 13-15 in
> word 5, has turned out to be a poor choice; it is already used by the
> broadcast protocol for carrying the 'network id' field of the sending
> node. Since packet fragments also need to be transported across the
> broadcast protocol, the risk of conflict is obvious, and we see this
> happen when we use network identities larger than 2^13-1. This has
> escaped our testing because we have so far only been using small network
> id values.
> 
> We now move this field to bits 0-2 in word 9, a field that is guaranteed
> to be unused by all involved protocols.
> 
> Fixes: e3eea1eb47a ("tipc: clean up handling of message priorities")
> Signed-off-by: Jon Maloy 
> Acked-by: Ying Xue 

Applied and queued up for -stable, thanks.
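Reading and writing a small field such as the 3-bit fragment importance at bits 0-2 of header word 9 comes down to shift-and-mask operations on a network-byte-order word. The following is a standalone sketch, not TIPC's code: the `hdr_bits()` and `hdr_set_bits()` helper names, the 10-word header struct and the constants are illustrative assumptions. Only the word and bit positions come from the commit message.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>

/* Word 9, bits 0-2, as described in the commit message. */
#define IMPORTANCE_WORD 9
#define IMPORTANCE_POS  0
#define IMPORTANCE_MASK 0x7u

/* Hypothetical header: each word is stored in network byte order. */
struct hdr_sketch {
	uint32_t w[10];
};

/* Extract the field of width 'mask' that starts at bit 'pos' of 'word'. */
static uint32_t hdr_bits(const struct hdr_sketch *h, int word, int pos,
			 uint32_t mask)
{
	return (ntohl(h->w[word]) >> pos) & mask;
}

/* Replace only that field, leaving the other bits of the word untouched. */
static void hdr_set_bits(struct hdr_sketch *h, int word, int pos,
			 uint32_t mask, uint32_t val)
{
	uint32_t host = ntohl(h->w[word]);

	host &= ~(mask << pos);
	host |= (val & mask) << pos;
	h->w[word] = htonl(host);
}
```

Because the commit message states that word 9 is unused by all involved protocols, setting the field there cannot clobber the network id the way the old bits 13-15 of word 5 could.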


Re: [PATCH net-next] tcp/dccp: fix potential NULL deref in __inet_inherit_port()

2015-10-14 Thread David Miller
From: Eric Dumazet 
Date: Wed, 14 Oct 2015 05:58:38 -0700

> From: Eric Dumazet 
> 
> As we no longer hold listener lock in fast path, it is possible that a
> child is created right after listener freed its bound port, if a close()
> is done while incoming packets are processed.
> 
> __inet_inherit_port() must detect this and return an error,
> so that caller can free the child earlier.
> 
> Fixes: e994b2f0fb92 ("tcp: do not lock listener to process SYN packets")
> Fixes: 079096f103fa ("tcp/dccp: install syn_recv requests into ehash table")
> Signed-off-by: Eric Dumazet 

Applied.


Re: [PATCH net-next] tcp: avoid spurious SYN flood detection at listen() time

2015-10-14 Thread David Miller
From: Eric Dumazet 
Date: Wed, 14 Oct 2015 06:16:49 -0700

> From: Eric Dumazet 
> 
> At listen() time, there is a small window where listener is visible with
> a zero backlog, triggering a spurious "Possible SYN flooding on port"
> message.
> 
> Nothing prevents us from setting the correct backlog.
> 
> Signed-off-by: Eric Dumazet 

Applied.


Re: [PATCH] net: phy: aquantia/teranetics: Convert to use module_phy_driver macro

2015-10-14 Thread David Miller
From: Axel Lin 
Date: Wed, 14 Oct 2015 18:30:48 +0800

> Use module_phy_driver macro to simplify the code a bit.
> 
> Signed-off-by: Axel Lin 

Applied to net-next, thanks.


[PATCH v3 1/1] eventfd: implementation of EFD_MASK flag

2015-10-14 Thread Damian Hobson-Garcia
From: Martin Sustrik 

When implementing network protocols in user space, one has to implement
fake file descriptors to represent the sockets for the protocol.

Polling on such fake file descriptors is a problem (poll/select/epoll
accept only true file descriptors) and forces protocol implementers to use
various workarounds resulting in complex, non-standard and convoluted APIs.

More generally, the ability to create full-blown file descriptors for
userspace-to-userspace signalling is missing. While eventfd(2) goes half
the way towards this goal, it has the following shortcomings:

I.  There's no way to signal POLLPRI, POLLHUP etc.
II. There's no way to signal an arbitrary combination of POLL* flags. Most
notably, simultaneous !POLLIN and !POLLOUT, which is a perfectly valid
combination for a network protocol (rx buffer is empty and tx buffer is
full), cannot be signaled using eventfd.

This patch implements new EFD_MASK flag which solves the above problems.

The semantics of EFD_MASK are as follows:

eventfd(2):

If eventfd is created with EFD_MASK flag set, it is initialised in such a
way as to signal no events on the file descriptor when it is polled on.
The 'initval' argument is ignored.

write(2):

User is allowed to write only buffers containing a 32-bit value
representing any combination of event flags as defined by the poll(2)
function (POLLIN, POLLOUT, POLLERR, POLLHUP etc.). Specified events
will be signaled when polling (select, poll, epoll) on the eventfd is
done later on.

read(2):

read is not supported and will fail with EINVAL.

select(2), poll(2) and similar:

When polling on an eventfd created with the EFD_MASK flag, all the events
specified in the most recently written event flags shall be signaled.
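The semantics above can be modelled in user space without the new uapi header. Because the header that defines the flag's value is not shown here, this sketch does not call the real eventfd(2). It is a model of the described behaviour, and every name in it is hypothetical. Rejecting bits outside EFD_MASK_VALID_EVENTS on write(2) is an assumption based on the macro the patch defines, since the write handler is cut off in this excerpt.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>

/* Mirrors EFD_MASK_VALID_EVENTS from the patch. */
#define MODEL_VALID_EVENTS (POLLIN | POLLPRI | POLLOUT | POLLERR | POLLHUP)

/* Hypothetical model of an eventfd context created with EFD_MASK. */
struct efd_mask_model {
	uint32_t events; /* events reported to select/poll/epoll */
};

/* eventfd(2): 'initval' is ignored and no events are signaled initially. */
static void model_create(struct efd_mask_model *m, unsigned int initval)
{
	(void)initval;
	m->events = 0;
}

/* write(2): only a single 32-bit event mask is accepted. */
static ssize_t model_write(struct efd_mask_model *m, const void *buf,
			   size_t len)
{
	uint32_t ev;

	if (len != sizeof(ev))
		return -EINVAL;
	memcpy(&ev, buf, sizeof(ev));
	/* Assumption: bits outside the valid set are rejected. */
	if (ev & ~(uint32_t)MODEL_VALID_EVENTS)
		return -EINVAL;
	m->events = ev; /* replaces, rather than adds to, the previous mask */
	return sizeof(ev);
}

/* read(2): not supported in EFD_MASK mode. */
static ssize_t model_read(const struct efd_mask_model *m, void *buf,
			  size_t len)
{
	(void)m;
	(void)buf;
	(void)len;
	return -EINVAL;
}

/* poll(2): report exactly the events from the last successful write. */
static unsigned int model_poll(const struct efd_mask_model *m)
{
	return m->events;
}
```

Writing 0 is how a protocol implementation would express the "!POLLIN and !POLLOUT" state from point II, which a counting eventfd cannot signal.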

Signed-off-by: Martin Sustrik 

[dhobs...@igel.co.jp: Rebased, and resubmitted for Linux 4.3]
Signed-off-by: Damian Hobson-Garcia 
---
 fs/eventfd.c | 102 ++-
 include/linux/eventfd.h  |  16 +--
 include/uapi/linux/eventfd.h |  33 ++
 3 files changed, 126 insertions(+), 25 deletions(-)
 create mode 100644 include/uapi/linux/eventfd.h

diff --git a/fs/eventfd.c b/fs/eventfd.c
index 8d0c0df..1310779 100644
--- a/fs/eventfd.c
+++ b/fs/eventfd.c
@@ -2,6 +2,7 @@
  *  fs/eventfd.c
  *
  *  Copyright (C) 2007  Davide Libenzi 
+ *  Copyright (C) 2013  Martin Sustrik 
  *
  */
 
@@ -22,18 +23,31 @@
 #include 
 #include 
 
+#define EFD_SHARED_FCNTL_FLAGS (O_CLOEXEC | O_NONBLOCK)
+#define EFD_FLAGS_SET (EFD_SHARED_FCNTL_FLAGS | EFD_SEMAPHORE | EFD_MASK)
+#define EFD_MASK_VALID_EVENTS (POLLIN | POLLPRI | POLLOUT | POLLERR | POLLHUP)
+
 struct eventfd_ctx {
struct kref kref;
wait_queue_head_t wqh;
-   /*
-* Every time that a write(2) is performed on an eventfd, the
-* value of the __u64 being written is added to "count" and a
-* wakeup is performed on "wqh". A read(2) will return the "count"
-* value to userspace, and will reset "count" to zero. The kernel
-* side eventfd_signal() also, adds to the "count" counter and
-* issue a wakeup.
-*/
-   __u64 count;
+   union {
+   /*
+* Every time that a write(2) is performed on an eventfd, the
+* value of the __u64 being written is added to "count" and a
+* wakeup is performed on "wqh". A read(2) will return the
+* "count" value to userspace, and will reset "count" to zero.
+* The kernel side eventfd_signal() also, adds to the "count"
+* counter and issue a wakeup.
+*/
+   __u64 count;
+
+   /*
+* When using eventfd in EFD_MASK mode, this field stores the
+* current events to be signaled on the eventfd; each write(2)
+* replaces the previous value.
+*/
+   __u32 events;
+   };
unsigned int flags;
 };
 
@@ -134,6 +148,14 @@ static unsigned int eventfd_poll(struct file *file, poll_table *wait)
return events;
 }
 
+static unsigned int eventfd_mask_poll(struct file *file, poll_table *wait)
+{
+   struct eventfd_ctx *ctx = file->private_data;
+
+   poll_wait(file, &ctx->wqh, wait);
+   return ctx->events;
+}
+
 static void eventfd_ctx_do_read(struct eventfd_ctx *ctx, __u64 *cnt)
 {
*cnt = (ctx->flags & EFD_SEMAPHORE) ? 1 : ctx->count;
@@ -239,6 +261,14 @@ static ssize_t eventfd_read(struct file *file, char __user *buf, size_t count,
return put_user(cnt, (__u64 __user *) buf) ? -EFAULT : sizeof(cnt);
 }
 
+static ssize_t eventfd_mask_read(struct file *file, char __user *buf,
+   size_t count,
+   loff_t *ppos)
+{
+   return -EINVAL;
+}
+
+
 static ssize_t eventfd_write(struct file *file, const char __user *buf, size_t count,
 loff_t *ppos)
 {
@@ -286,6 +31

Re: [PATCH] ethtool: Use kcalloc instead of kmalloc for ethtool_get_strings

2015-10-14 Thread David Miller
From: Joe Perches 
Date: Wed, 14 Oct 2015 01:09:40 -0700

> It seems that kernel memory can leak into userspace by a
> kmalloc, ethtool_get_strings, then copy_to_user sequence.
> 
> Avoid this by using kcalloc to zero fill the copied buffer.
> 
> Signed-off-by: Joe Perches 

Applied and queued up for -stable, thanks Joe.
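The leak pattern is to allocate `n * ETH_GSTRING_LEN` bytes, let the driver fill them, and copy the whole buffer out. Any bytes the driver does not write still hold old heap contents unless the allocation is zeroed. The following user-space sketch uses `calloc()` in place of `kcalloc()` to show the effect. `fill_strings()` is a hypothetical driver callback that names fewer strings than were requested, and `ETH_GSTRING_LEN` is redefined locally with its uapi value of 32.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define ETH_GSTRING_LEN 32 /* same value as in <linux/ethtool.h> */

/* Hypothetical driver callback that fills only the first string. */
static void fill_strings(char *data, size_t n)
{
	if (n > 0)
		strncpy(data, "rx_packets", ETH_GSTRING_LEN);
}

/*
 * Returns a buffer of n * ETH_GSTRING_LEN bytes that is safe to copy out:
 * calloc() zeroes it, so bytes the driver leaves untouched cannot carry
 * stale heap data.
 */
static char *get_strings(size_t n)
{
	char *data = calloc(n, ETH_GSTRING_LEN);

	if (!data)
		return NULL;
	fill_strings(data, n);
	return data;
}
```

With `malloc()` instead of `calloc()`, the last three 32-byte slots in a four-string request would be uninitialized, which is exactly what the patch prevents.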


[PATCH v3 0/1] Generalize poll events from eventfd

2015-10-14 Thread Damian Hobson-Garcia
Using eventfd, user space can generate POLLIN/POLLOUT events, but some
applications may want to generate POLLPRI/POLLERR events as well.
This patch submission aims to generalize the events generated by an
eventfd. This is a resubmission of a patch from Feb 2013[1]. The original
discussion trailed off without any conclusion, but the original author
has recently confirmed[2] that this functionality is still useful, so I
volunteered to rebase and resubmit the patch for discussion.

[1] https://lkml.org/lkml/2013/2/18/147
[2] https://lkml.org/lkml/2015/7/9/153

Changes in v3
-
* replace efd_mask structure with scalar 'events' variable.

Changes in v2
-

* rebased on Linux v4.3-rc1
* Move file operation implementations for EFD_MASK to a separate structure
* Remove 'data' element from efd_mask structure
* read() is no longer supported when EFD_MASK is set (fails with EINVAL)
* eventfd_ctx_fileget() now returns EINVAL when EFD_MASK is set, eliminating
  the possibility of triggering the original BUG_ON() macros which have now
  been removed.

Thank you,
Damian

Martin Sustrik (1):
  eventfd: implementation of EFD_MASK flag

 fs/eventfd.c | 91 ++--
 include/linux/eventfd.h  | 16 +---
 include/uapi/linux/eventfd.h | 40 +++
 3 files changed, 121 insertions(+), 26 deletions(-)
 create mode 100644 include/uapi/linux/eventfd.h

-- 
1.9.1



Re: [PATCH net-next 0/4] net: dsa: mv88e6xxx: fix hardware bridging

2015-10-14 Thread Florian Fainelli
On 14/10/15 18:28, Vivien Didelot wrote:
> On Oct. Thursday 15 (42) 12:46 AM, Andrew Lunn wrote:
>> On Sun, Oct 11, 2015 at 06:08:34PM -0400, Vivien Didelot wrote:
>>> DSA and its drivers currently hook the NETDEV_CHANGEUPPER net_device event in
>>> order to configure the VLAN map of every port.
>>>
>>> This VLAN map is a feature of these switch chips to hardcode and restrict which
>>> output ports a given input port can egress frames to.
>>>
>>> A Linux bridge is a simple untagged VLAN propagated by the bridge code itself.
>>> With a proper 802.1Q support, a driver does not need this hook anymore, and
>>> will simply program the related VLAN object.
>>>
>>> This patchset improves the hardware bridging code in the mv88e6xxx driver with
>>> a strict 802.1Q mode.
>>
>> Hi Vivien
>>
>> I just tested this as part of net-next/master, and found a problem
>>
>> If i do:
>>
>> ip link set lan0 up
>> ip addr add 192.168.10.2/24 dev lan0
>>
>> It will not ping. Looking in sys/kernel/debug/dsa0/stats i see
>> broadcast packets, probably ARP, being received at the port.
>> But they are not being forwarded out the CPU port.
>>
>> If however i do
>>
>> brctl addbr br0
>> brctl addif br0 lan0
>> ip addr add 192.168.10.2/24 dev br0
>> ip link set br0 up
>>
>> i can ping.
>>
>> So it looks like we are too restrictive by default. You should be able
>> to use interfaces as they are, without a bridge.
> 
> Correct, if the ports are not in a VLAN by default, they cannot talk.

The expectation for DSA devices, if no bridge device is configured is to
have each port be able to talk to the CPU port only, but this has to
work out of the box.
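The default Florian describes (a standalone port may reach only the CPU port, while ports in the same bridge may also reach each other) can be expressed as a per-port egress bitmap, which is what a port-based VLAN map encodes. This is a hedged sketch of that policy only. The port count, the CPU port number, the `bridge_of[]` encoding and `egress_mask()` are hypothetical and do not reflect mv88e6xxx register layouts.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_PORTS 7
#define CPU_PORT  6 /* hypothetical: last port is wired to the CPU */

/*
 * Hypothetical: bridge_of[p] is 0 for a standalone port, otherwise the id of
 * the bridge that port p belongs to. Returns the set of ports that 'port'
 * may egress frames to, one bit per port.
 */
static uint16_t egress_mask(int port, const int bridge_of[NUM_PORTS])
{
	uint16_t mask = 0;

	if (port == CPU_PORT) {
		/* The CPU port must be able to reach every user port. */
		for (int p = 0; p < NUM_PORTS; p++)
			if (p != CPU_PORT)
				mask |= 1u << p;
		return mask;
	}

	/* Every user port can always talk to the CPU, bridged or not. */
	mask |= 1u << CPU_PORT;

	/* Bridged ports may also forward to the other members of their bridge. */
	if (bridge_of[port] != 0)
		for (int p = 0; p < NUM_PORTS; p++)
			if (p != port && p != CPU_PORT &&
			    bridge_of[p] == bridge_of[port])
				mask |= 1u << p;

	return mask;
}
```

Under this policy a standalone port such as lan0 in Andrew's test reaches the CPU port without a bridge, while two standalone ports stay isolated from each other, which is the isolation concern raised about putting every port into VLAN 0.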

> 
> If you want to, I think the special VLAN 0 can be used for that purpose.
> IIRC, in a given configuration, Linux adds the interfaces (thus programs
> the hardware) with VLAN 0. I'm not sure when, maybe when the
> .ndo_vlan_rx_add_vid is implemented, I need to give it a shot.

But if you do that, won't that put all DSA ports into VLAN 0? Would not
that break isolation between each ports as expected for a DSA switch?

> 
> Otherwise, I can send you a patch configuring the VLAN 0 on switch
> setup if this is the behavior we want.
> 
> Thanks,
> -v
> 


-- 
Florian


Re: [PATCH net-next 0/4] net: dsa: mv88e6xxx: fix hardware bridging

2015-10-14 Thread Vivien Didelot
On Oct. Thursday 15 (42) 12:46 AM, Andrew Lunn wrote:
> On Sun, Oct 11, 2015 at 06:08:34PM -0400, Vivien Didelot wrote:
> > DSA and its drivers currently hook the NETDEV_CHANGEUPPER net_device event in
> > order to configure the VLAN map of every port.
> > 
> > This VLAN map is a feature of these switch chips to hardcode and restrict which
> > output ports a given input port can egress frames to.
> > 
> > A Linux bridge is a simple untagged VLAN propagated by the bridge code itself.
> > With a proper 802.1Q support, a driver does not need this hook anymore, and
> > will simply program the related VLAN object.
> > 
> > This patchset improves the hardware bridging code in the mv88e6xxx driver with
> > a strict 802.1Q mode.
> 
> Hi Vivien
> 
> I just tested this as part of net-next/master, and found a problem
> 
> If i do:
> 
> ip link set lan0 up
> ip addr add 192.168.10.2/24 dev lan0
> 
> It will not ping. Looking in sys/kernel/debug/dsa0/stats i see
> broadcast packets, probably ARP, being received at the port.
> But they are not being forwarded out the CPU port.
> 
> If however i do
> 
> brctl addbr br0
> brctl addif br0 lan0
> ip addr add 192.168.10.2/24 dev br0
> ip link set br0 up
> 
> i can ping.
> 
> So it looks like we are too restrictive by default. You should be able
> to use interfaces as they are, without a bridge.

Correct, if the ports are not in a VLAN by default, they cannot talk.

If you want to, I think the special VLAN 0 can be used for that purpose.
IIRC, in a given configuration, Linux adds the interfaces (thus programs
the hardware) with VLAN 0. I'm not sure when, maybe when the
.ndo_vlan_rx_add_vid is implemented, I need to give it a shot.

Otherwise, I can send you a patch configuring the VLAN 0 on switch
setup if this is the behavior we want.

Thanks,
-v


Re: [PATCH net-next v6 01/10] qed: Add module with basic common support

2015-10-14 Thread David Miller
From: Yuval Mintz 
Date: Wed, 14 Oct 2015 09:24:05 +0300

> +int qed_qm_pf_rt_init(struct qed_hwfn*p_hwfn,
> +   struct qed_ptt*p_ptt,
> +   u8port_id,
> +   u8pf_id,
> +   u8max_phys_tcs_per_port,
> +   bool  is_first_pf,
> +   u32   num_pf_cids,
> +   u32   num_vf_cids,
> +   u32   num_tids,
> +   u16   start_pq,
> +   u16   num_pf_pqs,
> +   u16   num_vf_pqs,
> +   u8start_vport,
> +   u8num_vports,
> +   u8pf_wfq,
> +   u32   pf_rl,
> +   struct init_qm_pq_params  *pq_params,
> +   struct init_qm_vport_params   *vport_params);

Sorry, this is completely ridiculous.

No function should have so many parameters.

If you need to pass this much information to a function, create a structure in
which to contain the values and pass a reference to that.
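The following sketches the suggested refactoring. The scalar arguments move into a parameter structure, and the function takes a pointer to it. The structure and function names are hypothetical, the field list is copied from the prototype quoted above, and the body is a placeholder. `struct qed_hwfn`, `struct qed_ptt` and the two `init_qm_*_params` structures are only forward-declared, and the kernel's u8/u16/u32 types are replaced by `<stdint.h>` types so that the example compiles on its own.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct qed_hwfn;
struct qed_ptt;
struct init_qm_pq_params;
struct init_qm_vport_params;

/* Hypothetical: the per-PF arguments of the original prototype, grouped. */
struct qm_pf_rt_init_params {
	uint8_t port_id;
	uint8_t pf_id;
	uint8_t max_phys_tcs_per_port;
	bool is_first_pf;
	uint32_t num_pf_cids;
	uint32_t num_vf_cids;
	uint32_t num_tids;
	uint16_t start_pq;
	uint16_t num_pf_pqs;
	uint16_t num_vf_pqs;
	uint8_t start_vport;
	uint8_t num_vports;
	uint8_t pf_wfq;
	uint32_t pf_rl;
	struct init_qm_pq_params *pq_params;
	struct init_qm_vport_params *vport_params;
};

/* Hypothetical replacement prototype: three arguments instead of eighteen. */
static int qm_pf_rt_init(struct qed_hwfn *p_hwfn, struct qed_ptt *p_ptt,
			 const struct qm_pf_rt_init_params *p)
{
	(void)p_hwfn;
	(void)p_ptt;
	if (!p)
		return -EINVAL;
	/* Placeholder: the real runtime-init programming would go here. */
	return 0;
}
```

Callers can then use designated initializers, so each value is labelled at the call site, and adding a field later does not change the function signature.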


RE: [PATCH net-next 0/4] Rightsize IFLA_AF_SPEC size calculation

2015-10-14 Thread Arad, Ronen


>-Original Message-
>From: David Miller [mailto:da...@davemloft.net]
>Sent: Wednesday, October 14, 2015 6:44 PM
>To: Arad, Ronen
>Cc: netdev@vger.kernel.org
>Subject: Re: [PATCH net-next 0/4] Rightsize IFLA_AF_SPEC size calculation
>
>From: Ronen Arad 
>Date: Tue, 13 Oct 2015 22:58:30 -0700
>
>> if_nlmsg_size() overestimates the minimum allocation size of netlink dump
>> request (when called from rtnl_calcit()) or the size of the message (when called
>> from rtnl_getlink()). This is because ext_filter_mask is not supported by
>> rtnl_link_get_af_size() and rtnl_link_get_size().
>>
>> The over-estimation is significant when at least one netdev has many VLANs
>> configured (8 bytes for each configured VLAN).
>>
>> This patch-set "rightsizes" the protocol specific attribute size calculation by
>> propagating ext_filter_mask to rtnl_link_get_af_size() and adding optional
>> filtering aware get_af_size_filtered op in struct rtnl_af_ops. Bridge module,
>> which already used filtering aware sizing for notification, is enhanced to do
>> the same for netlink dump requests.
>
>There are only three implementations of get_link_af_size, so please just simply
>change its signature by adding the ext_filter_mask parameter instead of creating
>a completely new operation.
[@Ronen] I've already submitted a V2 that does that in a simplified
single part patch.


Re: [PATCH net-next 0/4] Rightsize IFLA_AF_SPEC size calculation

2015-10-14 Thread David Miller
From: Ronen Arad 
Date: Tue, 13 Oct 2015 22:58:30 -0700

> if_nlmsg_size() overestimates the minimum allocation size of netlink dump
> request (when called from rtnl_calcit()) or the size of the message (when called
> from rtnl_getlink()). This is because ext_filter_mask is not supported by
> rtnl_link_get_af_size() and rtnl_link_get_size().
> 
> The over-estimation is significant when at least one netdev has many VLANs
> configured (8 bytes for each configured VLAN).
> 
> This patch-set "rightsizes" the protocol specific attribute size calculation by
> propagating ext_filter_mask to rtnl_link_get_af_size() and adding optional
> filtering aware get_af_size_filtered op in struct rtnl_af_ops. Bridge module,
> which already used filtering aware sizing for notification, is enhanced to do
> the same for netlink dump requests.

There are only three implementations of get_link_af_size, so please just simply
change its signature by adding the ext_filter_mask parameter instead of creating
a completely new operation.
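The requested change threads the caller's `ext_filter_mask` through the existing size callback instead of adding a second operation. Below is a simplified standalone sketch of that shape. `struct af_ops_sketch`, `bridge_af_size()`, `FILTER_BRVLAN` and the per-device VLAN count are hypothetical, the 8 bytes per VLAN is the figure from the cover letter, and any attributes other than the per-VLAN ones are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FILTER_BRVLAN  (1u << 1) /* hypothetical "include per-VLAN info" bit */
#define BYTES_PER_VLAN 8         /* per the cover letter: 8 bytes per VLAN */

struct dev_sketch {
	size_t num_vlans;
};

/* One op, with the filter mask added to its signature. */
struct af_ops_sketch {
	size_t (*get_link_af_size)(const struct dev_sketch *dev,
				   uint32_t ext_filter_mask);
};

/* Count per-VLAN attributes only when the requester asked for them. */
static size_t bridge_af_size(const struct dev_sketch *dev,
			     uint32_t ext_filter_mask)
{
	if (!(ext_filter_mask & FILTER_BRVLAN))
		return 0;
	return dev->num_vlans * BYTES_PER_VLAN;
}

static const struct af_ops_sketch bridge_ops = {
	.get_link_af_size = bridge_af_size,
};
```

A dump that does not request VLAN details then no longer reserves 800 bytes for a device with 100 VLANs.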


Re: [PATCH net-next] net: hisilicon net: fix a bug about led

2015-10-14 Thread David Miller
From: yankejian 
Date: Wed, 14 Oct 2015 10:28:57 +0800

> From: lipeng 
> 
> This patch fixes a bug in the hns driver. The link LED is on at the beginning,
> but at this point the Ethernet port is down. The LED status needs to be reset
> in the init sequence.
> 
> Signed-off-by: lipeng 
> Signed-off-by: yankejian 

Applied.


Re: [PATCH net-next] cxgb4i: Increased the value of MAX_IMM_TX_PKT_LEN from 128 to 256 bytes

2015-10-14 Thread David Miller
From: Karen Xie 
Date: Tue, 13 Oct 2015 17:13:59 -0700

> This helps improve the latency of small packets.
> 
> Signed-off-by: Rakesh Ranjan 
> Signed-off-by: Karen Xie 

Applied, thanks.


Re: [PATCH 1/1] net: phy: bcm-phy-lib: Fix module license issue

2015-10-14 Thread David Miller
From: Arun Parameswaran 
Date: Tue, 13 Oct 2015 13:40:12 -0700

> The 'bcm-phy-lib.c', added as a part of the commit
> "net: phy: Add Broadcom phy library for common interfaces"
> was missing the module license. This was causing an issue
> when the library is built as a module; "module license
> 'unspecified' taints kernel".
> 
> This patch fixes the issue by adding the module license,
> author and description to the bcm-phy-lib.c file.
> 
> Fixes: a1cba5613edf5 ("net: phy: Add Broadcom phy library for common interfaces")
> Signed-off-by: Arun Parameswaran 

This patch doesn't apply to my net-next tree at all.


Re: [PATCH RESEND 1/2] ixgb: Remove redundant error path after call to ixgb_sw_init in the function ixgb_probe

2015-10-14 Thread Jeff Kirsher
On Wed, 2015-10-14 at 18:57 -0400, Nicholas Krause wrote:
> This removes the redundant, and now unused, err_sw_init error path and
> goto label after the call to ixgb_sw_init() in ixgb_probe(), since
> ixgb_sw_init() always returns zero and is guaranteed to run successfully
> without any issues.
> 
> Signed-off-by: Nicholas Krause 
> ---
>  drivers/net/ethernet/intel/ixgb/ixgb_main.c | 5 +
>  1 file changed, 1 insertion(+), 4 deletions(-)

This driver (ixgb), as well as e100 and e1000 are in maintenance mode
which means bug fixes ONLY!

Is this patch necessary?  Answer: No
Is this a bug fix?  Answer: No
Should you have sent this patch?  See answers to previous questions.

Please ask these questions to yourself when putting together a patch
against these drivers (listed above).

With that said, dropping this series.



Re: pull-request: can-next 2015-09-17

2015-10-14 Thread David Miller
From: Marc Kleine-Budde 
Date: Tue, 13 Oct 2015 18:08:01 +0200

> this is a pull request of 4 patches for net-next/master.
> 
> Two patches are by Gerhard Bertelsmann, fixing some problems in the
> sun4i driver. The patch by Arnd Bergmann stops using timeval for the
> CAN broadcast manager. The last patch by Alexandre Belloni removes the
> otherwise unused struct at91_can_data from the driver.

Pulled, thank you.


Re: pull-request: mac80211 2015-10-13

2015-10-14 Thread David Miller
From: Johannes Berg 
Date: Tue, 13 Oct 2015 10:59:47 +0200

> There are just two small fixes, but I didn't really want to wait since
> I have nothing else pending.
> 
> Let me know if there's any problem.

Pulled, thanks.


linux-next: manual merge of the net-next tree with the net tree

2015-10-14 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the net-next tree got a conflict in:

  net/switchdev/switchdev.c

between commit:

  87aaf2caed84 ("switchdev: check if the vlan id is in the proper vlan range")

from the net tree and commits:

  7ea6eb3f56f4 ("switchdev: introduce transaction item queue for attr_set and obj_add")
  ab0690023018 ("net: switchdev: abstract object in add/del ops")

from the net-next tree.

I fixed it up (see below) and can carry the fix as necessary (no action
is required).

-- 
Cheers,
Stephen Rothwells...@canb.auug.org.au

diff --cc net/switchdev/switchdev.c
index 77f5d17e2612,b8aaf820ef65..
--- a/net/switchdev/switchdev.c
+++ b/net/switchdev/switchdev.c
@@@ -16,7 -16,7 +16,8 @@@
  #include 
  #include 
  #include 
 +#include 
+ #include 
  #include 
  #include 
  
@@@ -635,32 -722,33 +723,35 @@@ static int switchdev_port_br_afspec(str
if (nla_len(attr) != sizeof(struct bridge_vlan_info))
return -EINVAL;
vinfo = nla_data(attr);
 +  if (!vinfo->vid || vinfo->vid >= VLAN_VID_MASK)
 +  return -EINVAL;
-   vlan->flags = vinfo->flags;
+   vlan.flags = vinfo->flags;
if (vinfo->flags & BRIDGE_VLAN_INFO_RANGE_BEGIN) {
-   if (vlan->vid_begin)
+   if (vlan.vid_begin)
+   return -EINVAL;
+   vlan.vid_begin = vinfo->vid;
+   /* don't allow range of pvids */
+   if (vlan.flags & BRIDGE_VLAN_INFO_PVID)
return -EINVAL;
-   vlan->vid_begin = vinfo->vid;
} else if (vinfo->flags & BRIDGE_VLAN_INFO_RANGE_END) {
-   if (!vlan->vid_begin)
+   if (!vlan.vid_begin)
return -EINVAL;
-   vlan->vid_end = vinfo->vid;
-   if (vlan->vid_end <= vlan->vid_begin)
+   vlan.vid_end = vinfo->vid;
+   if (vlan.vid_end <= vlan.vid_begin)
return -EINVAL;
-   err = f(dev, &obj);
+   err = f(dev, &vlan.obj);
if (err)
return err;
-   memset(vlan, 0, sizeof(*vlan));
+   memset(&vlan, 0, sizeof(vlan));
} else {
-   if (vlan->vid_begin)
+   if (vlan.vid_begin)
return -EINVAL;
-   vlan->vid_begin = vinfo->vid;
-   vlan->vid_end = vinfo->vid;
-   err = f(dev, &obj);
+   vlan.vid_begin = vinfo->vid;
+   vlan.vid_end = vinfo->vid;
+   err = f(dev, &vlan.obj);
if (err)
return err;
-   memset(vlan, 0, sizeof(*vlan));
+   memset(&vlan, 0, sizeof(vlan));
}
}
  
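The merged loop body enforces several rules. Each VID must be in 1..4094 (0 and values >= VLAN_VID_MASK are rejected). A range is a RANGE_BEGIN entry followed by a RANGE_END entry with a larger VID. A PVID cannot start a range, and a plain entry cannot appear while a range is open. Below is a standalone sketch of that validation over an array of entries. The `vlan_entry` type, the `VF_*` flag values and `parse_vlans()` are hypothetical stand-ins for `struct bridge_vlan_info` and the `BRIDGE_VLAN_INFO_*` flags, and the callback `f()` is replaced by a range counter. As in the excerpt, a trailing RANGE_BEGIN without a matching end is not checked.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define VLAN_VID_MASK  0x0fff
#define VF_PVID        (1u << 0) /* hypothetical flag values */
#define VF_RANGE_BEGIN (1u << 1)
#define VF_RANGE_END   (1u << 2)

struct vlan_entry {
	uint16_t vid;
	uint16_t flags;
};

/* Returns the number of VID ranges found, or -EINVAL on a malformed list. */
static int parse_vlans(const struct vlan_entry *e, size_t n)
{
	uint16_t vid_begin = 0;
	int ranges = 0;

	for (size_t i = 0; i < n; i++) {
		if (!e[i].vid || e[i].vid >= VLAN_VID_MASK)
			return -EINVAL;
		if (e[i].flags & VF_RANGE_BEGIN) {
			if (vid_begin)
				return -EINVAL;
			vid_begin = e[i].vid;
			/* don't allow range of pvids */
			if (e[i].flags & VF_PVID)
				return -EINVAL;
		} else if (e[i].flags & VF_RANGE_END) {
			if (!vid_begin || e[i].vid <= vid_begin)
				return -EINVAL;
			ranges++; /* the merged code calls f() on [begin, end] here */
			vid_begin = 0;
		} else {
			if (vid_begin)
				return -EINVAL;
			ranges++; /* single VID: begin == end */
		}
	}
	return ranges;
}
```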


Exporting some of sysctls from net/ipv4 and net/core to a net namespace

2015-10-14 Thread Thomas Tanaka
Hi,

It seems that, due to the following patch set in Linux v3.5,
[PATCH net-next 00/19] net: Sysctl simplications and enhancements
http://comments.gmane.org/gmane.linux.network/227965
some of the previously visible sysctl variables in net/core and
net/ipv4 have become invisible.

Could bringing back some of those parameters to a net namespace, as
read-only, be considered?

Thanks in advance.

-- 
Regards,

Thomas


Re: [PATCH net-next v5] net: ipv6: Make address flushing on ifdown optional

2015-10-14 Thread David Miller
From: David Ahern 
Date: Wed, 14 Oct 2015 10:09:59 -0600

> This latest patch makes IPv6 static addresses on par with IPv4,
> including error paths.

I don't agree with ipv4's behavior... and just because ipv4 does
something poorly doesn't mean we get a free pass to replicate that
laziness in ipv6.


Re: [PATCH net] openvswitch: Scrub skb between namespaces

2015-10-14 Thread Pravin Shelar
On Wed, Oct 14, 2015 at 11:10 AM, Joe Stringer  wrote:
> If OVS receives a packet from another namespace, then the packet should
> be scrubbed. However, people have already begun to rely on the behaviour
> that skb->mark is preserved across namespaces, so retain this one field.
>
> This is mainly to address information leakage between namespaces when
> using OVS internal ports, but by placing it in ovs_vport_receive() it is
> more generally applicable, meaning it should not be overlooked if other
> port types are allowed to be moved into namespaces in future.
>
> Signed-off-by: Joe Stringer 
> ---
> I originally proposed this patch as part of the conntrack changes to OVS,
> and there was some discussion on that thread, culminating here:
> http://www.spinics.net/lists/netdev/msg338626.html
>
> We also discussed this a bit in Seattle, however I didn't follow up
> immediately so I don't exactly recall what the consensus was. Following
> Jesse's direction in the above thread, I'm proposing that we preserve the
> mark, but scrub the rest. Also fixed the use-after-free bug present in the
> previous version.
>
> I think this is relevant for 'net', because this is the first time that
> the metadata_dst and nfct are exposed (albeit indirectly) through OVS so it
> would be nice to get agreement on the expected behaviour.
> ---
>  net/openvswitch/vport.c | 7 +++
>  1 file changed, 7 insertions(+)
>
> diff --git a/net/openvswitch/vport.c b/net/openvswitch/vport.c
> index fc5c0b9ccfe9..70f19ea99b92 100644
> --- a/net/openvswitch/vport.c
> +++ b/net/openvswitch/vport.c
> @@ -440,10 +440,17 @@ int ovs_vport_receive(struct vport *vport, struct sk_buff *skb,
>   const struct ip_tunnel_info *tun_info)
>  {
> struct sw_flow_key key;
> +   u32 mark = skb->mark;
> int error;
>
> OVS_CB(skb)->input_vport = vport;
> OVS_CB(skb)->mru = 0;
> +   if (dev_net(skb->dev) != ovs_dp_get_net(vport->dp)) {
This should be marked as unlikely.

> +   skb_scrub_packet(skb, true);
> +   tun_info = NULL;
> +   }
> +   skb->mark = mark;
Let's move this into the skb scrub block; in other cases it is not required.

> +
> /* Extract flow from 'skb' into 'key'. */
> error = ovs_flow_key_extract(tun_info, skb, &key);
> if (unlikely(error)) {
> --
> 2.1.4
>
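Pravin's two review comments (mark the namespace check unlikely() and restore skb->mark only inside the scrub branch) can be illustrated with a small userspace model. This is a sketch under stated assumptions, not the actual net/openvswitch/vport.c code: struct model_skb, model_scrub() and model_vport_receive() are hypothetical stand-ins, an integer net_id replaces the struct net pointers compared via dev_net()/ovs_dp_get_net(), and unlikely() is defined locally with the GCC/Clang __builtin_expect().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for the kernel's branch hint (GCC/Clang builtin). */
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical, heavily reduced model of the sk_buff fields involved. */
struct model_skb {
	int net_id;       /* namespace the packet arrived from */
	uint32_t mark;    /* skb->mark: to be preserved across namespaces */
	const void *dst;  /* dropped by scrubbing in this model */
	const void *nfct; /* conntrack reference, dropped by scrubbing */
};

/* Hypothetical analogue of skb_scrub_packet(skb, true): clear the
 * per-namespace state, including the mark. */
static void model_scrub(struct model_skb *skb)
{
	skb->mark = 0;
	skb->dst = NULL;
	skb->nfct = NULL;
}

/* Returns the tunnel info pointer that flow extraction would see. */
static const void *model_vport_receive(struct model_skb *skb, int dp_net_id,
				       const void *tun_info)
{
	if (unlikely(skb->net_id != dp_net_id)) {
		/* Save and restore the mark only on the cross-namespace
		 * path, as suggested in review; the common path does no
		 * extra work. */
		uint32_t mark = skb->mark;

		model_scrub(skb);
		skb->mark = mark;
		tun_info = NULL; /* tunnel metadata must not leak either */
	}
	return tun_info;
}
```

With this structure the same-namespace fast path touches none of the packet's metadata, and the mark save/restore cost is paid only when a packet actually crosses namespaces.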


Re: [PATCH v2] sunrpc: fix waitqueue_active without memory barrier in sunrpc

2015-10-14 Thread Kosuke Tatsukawa
J. Bruce Fields wrote:
> On Wed, Oct 14, 2015 at 03:57:13AM +, Kosuke Tatsukawa wrote:
>> J. Bruce Fields wrote:
>> > On Mon, Oct 12, 2015 at 10:41:06AM +, Kosuke Tatsukawa wrote:
>> >> J. Bruce Fields wrote:
>> >> > On Fri, Oct 09, 2015 at 06:29:44AM +, Kosuke Tatsukawa wrote:
>> >> >> Neil Brown wrote:
>> >> >> > Kosuke Tatsukawa  writes:
>> >> >> > 
>> >> >> >> There are several places in net/sunrpc/svcsock.c which call
>> >> >> >> waitqueue_active() without calling a memory barrier.  Add a memory
>> >> >> >> barrier just as in wq_has_sleeper().
>> >> >> >>
>> >> >> >> I found this issue when I was looking through the linux source code
>> >> >> >> for places calling waitqueue_active() before wake_up*(), but without
>> >> >> >> preceding memory barriers, after sending a patch to fix a similar
>> >> >> >> issue in drivers/tty/n_tty.c  (Details about the original issue can be
>> >> >> >> found here: https://lkml.org/lkml/2015/9/28/849).
>> >> >> > 
>> >> >> > hi,
>> >> >> > this feels like the wrong approach to the problem.  It requires extra
>> >> >> > 'smp_mb's to be spread around which are hard to understand and easy to
>> >> >> > forget.
>> >> >> > 
>> >> >> > A quick look seems to suggest that (nearly) every waitqueue_active()
>> >> >> > will need an smp_mb.  Could we just put the smp_mb() inside
>> >> >> > waitqueue_active()??
>> >> >> 
>> >> >> 
>> >> >> There are around 200 occurrences of waitqueue_active() in the kernel
>> >> >> source, and most of the places which use it before wake_up are either
>> >> >> protected by some spin lock, or already have a memory barrier or some
>> >> >> kind of atomic operation before it.
>> >> >> 
>> >> >> Simply adding smp_mb() to waitqueue_active() would incur extra cost in
>> >> >> many cases and won't be a good idea.
>> >> >> 
>> >> >> Another way to solve this problem is to remove the waitqueue_active(),
>> >> >> making the code look like this;
>> >> >>     if (wq)
>> >> >>         wake_up_interruptible(wq);
>> >> >> This also fixes the problem because the spinlock in the wake_up*() acts
>> >> >> as a memory barrier and prevents the code from being reordered by the
>> >> >> CPU (and it also makes the resulting code much simpler).
>> >> > 
>> >> > I might not care which we did, except I don't have the means to test
>> >> > this quickly, and I guess this is some of our most frequently called
>> >> > code.
>> >> > 
>> >> > I suppose your patch is the most conservative approach, as the
>> >> > alternative is a spinlock/unlock in wake_up_interruptible, which I
>> >> > assume is necessarily more expensive than an smp_mb().
>> >> > 
>> >> > As far as I can tell it's been this way since forever.  (Well, since a
>> >> > 2002 patch "NFSD: TCP: rationalise locking in RPC server routines" which
>> >> > removed some spinlocks from the data_ready routines.)
>> >> > 
>> >> > I don't understand what the actual race is yet (which code exactly is
>> >> > missing the wakeup in this case?  nfsd threads seem to instead get
>> >> > woken up by the wake_up_process() in svc_xprt_do_enqueue().)
>> >> 
>> >> Thank you for the reply.  I tried looking into this.
>> >> 
>> >> The callbacks in net/sunrpc/svcsock.c are set up in svc_tcp_init() and
>> >> svc_udp_init(), which are both called from svc_setup_socket().
>> >> svc_setup_socket() is called (indirectly) from lockd, nfsd, and nfsv4
>> >> callback port related code.
>> >> 
>> >> Maybe I'm wrong, but there might not be any kernel code that is using
>> >> the socket's wait queue in this case.
>> > 
>> > As Trond points out there are probably waiters internal to the
>> > networking code.
>> 
>> Trond and Bruce, thank you for the comment.  I was able to find the call
>> to the wait function that was called from nfsd.
>> 
>> sk_stream_wait_connect() and sk_stream_wait_memory() were called from
>> either do_tcp_sendpages() or tcp_sendmsg() called from within
>> svc_send().  sk_stream_wait_connect() shouldn't be called at this point,
>> because the socket has already been used to receive the rpc request.
>> 
>> On the wake_up side, sk_write_space() is called from the following
>> locations.  The relevant ones seem to be preceded by atomic_sub or a
>> memory barrier.
>> + ksocknal_write_space [drivers/staging/lustre/lnet/klnds/socklnd/socklnd_lib.c:633]
>> + atm_pop_raw [net/atm/raw.c:40]
>> + sock_setsockopt [net/core/sock.c:740]
>> + sock_wfree [net/core/sock.c:1630]
>>   Preceded by atomic_sub in sock_wfree()
>> + ccid3_hc_tx_packet_recv [net/dccp/ccids/ccid3.c:442]
>> + do_tcp_sendpages [net/ipv4/tcp.c:1008]
>> + tcp_sendmsg [net/ipv4/tcp.c:1300]
>> + do_tcp_setsockopt [net/ipv4/tcp.c:2597]
>> + tcp_new_space [net/ipv4/tcp_input.c:4885]
>>   Preceded by smp_mb__after_atomic in tcp_check_space()
>> + llc_conn_state_process [net/llc/llc_conn.c:148]
>> + pipe_rcv_status [net/phonet/pep.c:312]
>> + pipe_do_rcv [net/phonet/pep.c:440]
>> + pipe_start_flow_control [net/phonet/pep.c:554]
>> + svc_sock_set
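The ordering problem in this thread is the classic lost wakeup: the waker publishes a condition and then checks waitqueue_active(), while the sleeper adds itself to the queue and then rechecks the condition. Without a full barrier on both sides, each side can read the other's stale value and nobody wakes up. Below is a minimal userspace sketch using C11 atomics, in which atomic_thread_fence(memory_order_seq_cst) plays the role of smp_mb(). The names struct model_wq, model_prepare_to_wait() and model_wake_if_active() are hypothetical; kernel code would use wq_has_sleeper() or an explicit smp_mb() instead.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model: one flag for "data ready", one counter standing in
 * for a non-empty wait queue. */
struct model_wq {
	atomic_bool data_ready;
	atomic_int nr_waiters;
};

/* Sleeper side: register on the queue, then recheck the condition.
 * Returns true if the caller may go to sleep. */
static bool model_prepare_to_wait(struct model_wq *wq)
{
	atomic_fetch_add_explicit(&wq->nr_waiters, 1, memory_order_relaxed);
	/* Full barrier: registration is ordered before reading the
	 * condition (set_current_state() supplies this in the kernel). */
	atomic_thread_fence(memory_order_seq_cst);
	return !atomic_load_explicit(&wq->data_ready, memory_order_relaxed);
}

/* Waker side: publish the condition, then check for sleepers.
 * Returns true if a wake_up would be issued. */
static bool model_wake_if_active(struct model_wq *wq)
{
	atomic_store_explicit(&wq->data_ready, true, memory_order_relaxed);
	/* Full barrier corresponding to the smp_mb() before
	 * waitqueue_active(). With both fences in place, either the
	 * sleeper sees data_ready or the waker sees the waiter. */
	atomic_thread_fence(memory_order_seq_cst);
	return atomic_load_explicit(&wq->nr_waiters, memory_order_relaxed) > 0;
}
```

Dropping either fence turns this into the store-buffering pattern, where both loads may return the old values; that is the case the smp_mb() in the patch (or the spinlock inside wake_up*()) rules out.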

Re: [PATCH net-next] switchdev: enforce no pvid flag in vlan ranges

2015-10-14 Thread Vivien Didelot
On Oct. Wednesday 14 (42) 03:08 PM, Florian Fainelli wrote:
> On 14/10/15 11:51, Vivien Didelot wrote:
> > On Oct. Wednesday 14 (42) 08:42 PM, Ido Schimmel wrote:
> >> Wed, Oct 14, 2015 at 08:14:24PM IDT, sfel...@gmail.com wrote:
> >>> On Wed, Oct 14, 2015 at 8:25 AM, Vivien Didelot
> >>>  wrote:
>  On Oct. Wednesday 14 (42) 09:14 AM, Ido Schimmel wrote:
> > Tue, Oct 13, 2015 at 05:32:26PM IDT, vivien.dide...@savoirfairelinux.com wrote:
> >> On Oct. Tuesday 13 (42) 11:31 AM, Ido Schimmel wrote:
> > >>> Mon, Oct 12, 2015 at 08:36:25PM IDT, vivien.dide...@savoirfairelinux.com wrote:
>  Hi guys,
> 
>  On Oct. Monday 12 (42) 02:01 PM, Nikolay Aleksandrov wrote:
> > From: Nikolay Aleksandrov 
> >
> > We shouldn't allow BRIDGE_VLAN_INFO_PVID flag in VLAN ranges.
> >
> > Signed-off-by: Nikolay Aleksandrov 
> > ---
> >  net/switchdev/switchdev.c | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
> > index 6e4a4f9ad927..256c596de896 100644
> > --- a/net/switchdev/switchdev.c
> > +++ b/net/switchdev/switchdev.c
> > @@ -720,6 +720,9 @@ static int switchdev_port_br_afspec(struct net_device *dev,
> > if (vlan.vid_begin)
> > return -EINVAL;
> > vlan.vid_begin = vinfo->vid;
> > +   /* don't allow range of pvids */
> > +   if (vlan.flags & BRIDGE_VLAN_INFO_PVID)
> > +   return -EINVAL;
> > } else if (vinfo->flags & BRIDGE_VLAN_INFO_RANGE_END) {
> > if (!vlan.vid_begin)
> > return -EINVAL;
> > --
> > 2.4.3
> >
> 
>  Yes the patch looks good, but it is a minor check though. I hope the
>  subject of this thread is making sense.
> 
>  VLAN ranges seem to have been included for a UX purpose (so commands
>  look like Cisco IOS). We don't want to change any existing interface, so
>  we pushed that down to drivers, with the only valid reason that, maybe
>  one day, some hardware will be capable of programming a range on a
>  per-port basis.
> >>> Hi,
> >>>
> > >>> That's actually what we are doing in mlxsw. We can do up to 256 entries in
> > >>> one go. We've yet to submit this part.
> >>
> >> Perfect Ido, thanks for pointing this out! I'm OK with the range then.
> >>
> >> So there is now a very last question in my head for this, which is more
> >> a matter of kernel design. Should the user be aware of such underlying
> >> support? In other words, would it make sense to do this in a driver:
> >>
> >>foo_port_vlan_add(struct net_device *dev,
> >>  struct switchdev_obj_port_vlan *vlan)
> >>{
> >>if (vlan->vid_begin != vlan->vid_end)
> >>return -ENOTSUPP; /* or something more relevant for user */
> >>
> >>return foo_port_single_vlan_add(dev, vlan->vid_begin);
> >>}
> >>
> >> So drivers keep being simple, and we can easily propagate the fact that
> >> one-or-all VLAN is not supportable, vs. the VLAN feature itself is not
> >> implemented and must be done in software.
> > I think that if you want to keep it simple, then Scott's advice from the
> > previous thread is the most appropriate one. I believe the hardware you
> > are using is simply not meant to support multiple 802.1Q bridges.
> 
>  You mean allowing only one Linux bridge over a hardware switch?
> 
>  It would for sure simplify how, as developers and users, we represent a
>  physical switch. But I am not sure how to achieve that and I don't have
>  strong opinions on this TBH.
> >>>
> >>> Hi Vivien, I think it's possible to keep switch ports on just one
> >>> bridge if we do a little bit of work on the NETDEV_CHANGEUPPER
> >>> notifier.  This will give you the driver-level control you want.  Do
> >>> you have time to investigate?  The idea is:
> >>>
> >>> 1) In your driver's handler for NETDEV_CHANGEUPPER, if switch port is
> > >>> being added to a second bridge, then return NOTIFY_BAD.  Your driver
> >>> needs to track the bridge count.
> >>>
> >>> 2) In __netdev_upper_dev_link(), check the return code from the
> >>> call_netdevice_notifiers_info(NETDEV_CHANGEUPPER, ...) call, and if
> >>> NOTIFY_BAD, abort the linking operation (goto rollback_xxx).
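Step 1 of the proposal above (the driver tracks which bridge its ports belong to and vetoes a second one) can be modelled in userspace as follows. This is a sketch under stated assumptions: struct model_switch, model_port_join() and model_port_leave() are hypothetical, bridges are identified by name for simplicity, and MODEL_NOTIFY_OK/MODEL_NOTIFY_BAD are local stand-ins for the NOTIFY_OK/NOTIFY_BAD values a real NETDEV_CHANGEUPPER handler would return. Step 2 (aborting the link in __netdev_upper_dev_link()) is core code and is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Local stand-ins for notifier return codes (hypothetical values). */
enum { MODEL_NOTIFY_OK = 1, MODEL_NOTIFY_BAD = 2 };

/* Hypothetical per-switch state: the one bridge in use and how many
 * ports are currently enslaved to it. */
struct model_switch {
	const char *bridge;
	unsigned int members;
};

/* Called when a port is about to join bridge 'br'. */
static int model_port_join(struct model_switch *sw, const char *br)
{
	if (sw->members > 0 && strcmp(sw->bridge, br) != 0)
		return MODEL_NOTIFY_BAD; /* second bridge: veto the link */
	sw->bridge = br;
	sw->members++;
	return MODEL_NOTIFY_OK;
}

/* Called when a port leaves its bridge. */
static void model_port_leave(struct model_switch *sw)
{
	if (sw->members > 0 && --sw->members == 0)
		sw->bridge = NULL; /* switch is free to join another bridge */
}
```

Keeping a member count rather than a single flag lets the switch accept a different bridge again once its last port has left the first one.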
> >>>
> >> Hi,
> >>
> >> We are doing something similar in mlxsw (not upstream yet). Jiri
> >> introduced PRE_CHANGEUPPER, which is called from the function you
> >
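The check added by Nikolay's patch can be seen in a small userspace model of the BEGIN/END range parsing in switchdev_port_br_afspec(). This is a sketch, not the kernel loop: struct model_vlan_info and model_parse_ranges() are hypothetical, only the PVID/RANGE_BEGIN/RANGE_END bits are modelled (with values written out locally to mirror BRIDGE_VLAN_INFO_* in include/uapi/linux/if_bridge.h), and the sanity checks other than the PVID one are this model's own, not necessarily the kernel's.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirrors of the BRIDGE_VLAN_INFO_* bits used here. */
#define MODEL_VLAN_INFO_PVID        (1 << 1)
#define MODEL_VLAN_INFO_RANGE_BEGIN (1 << 3)
#define MODEL_VLAN_INFO_RANGE_END   (1 << 4)

/* Hypothetical stand-in for struct bridge_vlan_info. */
struct model_vlan_info {
	uint16_t flags;
	uint16_t vid;
};

/* Returns how many VLAN objects (single VIDs or ranges) would be handed
 * to the driver, or -EINVAL if the attribute sequence is malformed. */
static int model_parse_ranges(const struct model_vlan_info *v, size_t n)
{
	uint16_t vid_begin = 0;
	int objects = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (v[i].flags & MODEL_VLAN_INFO_RANGE_BEGIN) {
			if (vid_begin)
				return -EINVAL; /* BEGIN while a range is open */
			/* The new check: a port has exactly one PVID, so a
			 * range of PVIDs is meaningless. */
			if (v[i].flags & MODEL_VLAN_INFO_PVID)
				return -EINVAL;
			vid_begin = v[i].vid;
		} else if (v[i].flags & MODEL_VLAN_INFO_RANGE_END) {
			if (!vid_begin || v[i].vid <= vid_begin)
				return -EINVAL; /* END without BEGIN, or empty */
			vid_begin = 0;
			objects++;
		} else {
			if (vid_begin)
				return -EINVAL; /* single VID inside a range */
			objects++;
		}
	}
	return vid_begin ? -EINVAL : objects; /* unterminated range */
}
```

A single VID carrying the PVID flag is still accepted; only a PVID flag on a range start is rejected, which matches the intent of the patch.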

[PATCH net-next] net: Fix suspicious RCU usage in fib_rebalance

2015-10-14 Thread David Ahern
This command:
  ip route add 192.168.1.0/24 nexthop via 10.2.1.5 dev eth1 nexthop via 10.2.2.5 dev eth2

generated this suspicious RCU usage message:

[ 63.249262]
[ 63.249939] ===
[ 63.251571] [ INFO: suspicious RCU usage. ]
[ 63.253250] 4.3.0-rc3+ #298 Not tainted
[ 63.254724] ---
[ 63.256401] ../include/linux/inetdevice.h:205 suspicious rcu_dereference_check() usage!
[ 63.259450]
[ 63.259450] other info that might help us debug this:
[ 63.259450]
[ 63.262297]
[ 63.262297] rcu_scheduler_active = 1, debug_locks = 1
[ 63.264647] 1 lock held by ip/2870:
[ 63.265896] #0: (rtnl_mutex){+.+.+.}, at: [] rtnl_lock+0x12/0x14
[ 63.268858]
[ 63.268858] stack backtrace:
[ 63.270409] CPU: 4 PID: 2870 Comm: ip Not tainted 4.3.0-rc3+ #298
[ 63.272478] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[ 63.275745] 0001 8800b8c9f8b8 8125f73c 88013afcf301
[ 63.278185] 8800bab7a380 8800b8c9f8e8 8107bf30 8800bb728000
[ 63.280634] 880139fe9a60  880139fe9a00 8800b8c9f908
[ 63.283177] Call Trace:
[ 63.283959] [] dump_stack+0x4c/0x68
[ 63.285593] [] lockdep_rcu_suspicious+0xfa/0x103
[ 63.287500] [] __in_dev_get_rcu+0x48/0x4f
[ 63.289169] [] fib_rebalance+0x3e/0x127
[ 63.290753] [] ? rcu_read_unlock+0x3e/0x5f
[ 63.292442] [] fib_create_info+0xaf9/0xdcc
[ 63.294093] [] ? sched_clock_local+0x12/0x75
[ 63.295791] [] fib_table_insert+0x8c/0x451
[ 63.297493] [] ? fib_get_table+0x36/0x43
[ 63.299109] [] inet_rtm_newroute+0x43/0x51
[ 63.300709] [] rtnetlink_rcv_msg+0x182/0x195
[ 63.302334] [] ? trace_hardirqs_on+0xd/0xf
[ 63.303888] [] ? rtnl_lock+0x12/0x14
[ 63.305346] [] ? __rtnl_unlock+0x12/0x12
[ 63.306878] [] netlink_rcv_skb+0x3d/0x90
[ 63.308437] [] rtnetlink_rcv+0x21/0x28
[ 63.309916] [] netlink_unicast+0xfa/0x17f
[ 63.311447] [] netlink_sendmsg+0x297/0x2dc
[ 63.313029] [] sock_sendmsg_nosec+0x12/0x1d
[ 63.314597] [] ___sys_sendmsg+0x196/0x21b
[ 63.316125] [] ? native_sched_clock+0x1f/0x3c
[ 63.317671] [] ? sched_clock_local+0x12/0x75
[ 63.319185] [] ? sched_clock_cpu+0x9d/0xb6
[ 63.320693] [] ? __lock_is_held+0x32/0x54
[ 63.322145] [] ? __fget_light+0x4b/0x77
[ 63.323541] [] __sys_sendmsg+0x3d/0x5b
[ 63.324947] [] SyS_sendmsg+0xd/0x19
[ 63.326274] [] entry_SYSCALL_64_fastpath+0x12/0x6f

It looks like all code paths leading to fib_rebalance() run under RTNL, so
use __in_dev_get_rtnl() there instead of __in_dev_get_rcu().

Fixes: 0e884c78ee19 ("ipv4: L3 hash-based multipath")
Cc: Peter Nørlund 
Signed-off-by: David Ahern 
---
 net/ipv4/fib_semantics.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index af77298c8b4f..42778d9d71e5 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -545,7 +545,7 @@ static void fib_rebalance(struct fib_info *fi)
if (nh->nh_flags & RTNH_F_DEAD)
continue;
 
-   in_dev = __in_dev_get_rcu(nh->nh_dev);
+   in_dev = __in_dev_get_rtnl(nh->nh_dev);
 
if (in_dev &&
IN_DEV_IGNORE_ROUTES_WITH_LINKDOWN(in_dev) &&
@@ -559,7 +559,7 @@ static void fib_rebalance(struct fib_info *fi)
change_nexthops(fi) {
int upper_bound;
 
-   in_dev = __in_dev_get_rcu(nexthop_nh->nh_dev);
+   in_dev = __in_dev_get_rtnl(nexthop_nh->nh_dev);
 
if (nexthop_nh->nh_flags & RTNH_F_DEAD) {
upper_bound = -1;
-- 
1.9.1
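For context on what the two loops touched by this patch compute: in the hash-based multipath scheme from 0e884c78ee19, each live nexthop gets a cumulative upper bound on a 31-bit hash space, and a flow goes to the first nexthop whose bound is at or above its hash; dead (or ignored link-down) nexthops get -1 so they never match. The userspace sketch below reproduces that arithmetic under assumptions: struct model_nh, model_rebalance() and model_select() are hypothetical, RTNH_F_DEAD and the IN_DEV_IGNORE_ROUTES_WITH_LINKDOWN case are folded into one alive flag, and the rounding mirrors DIV_ROUND_CLOSEST_ULL() as I understand the kernel code. The RCU-versus-RTNL accessor question the patch fixes is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reduced nexthop: weight plus one liveness flag standing
 * in for RTNH_F_DEAD and the ignored-link-down case. */
struct model_nh {
	int weight;
	bool alive;
	int32_t upper_bound;
};

/* Assign cumulative upper bounds over the 31-bit hash space. */
static void model_rebalance(struct model_nh *nh, size_t n)
{
	uint64_t total = 0, w = 0;
	size_t i;

	for (i = 0; i < n; i++)
		if (nh[i].alive)
			total += (uint64_t)nh[i].weight;

	for (i = 0; i < n; i++) {
		if (!nh[i].alive || total == 0) {
			nh[i].upper_bound = -1; /* never selected */
			continue;
		}
		w += (uint64_t)nh[i].weight;
		/* Round-to-closest division, like DIV_ROUND_CLOSEST_ULL();
		 * subtract 1 in 64 bits so the last bound is 2^31 - 1. */
		nh[i].upper_bound =
			(int32_t)((((w << 31) + total / 2) / total) - 1);
	}
}

/* Pick the first nexthop whose bound covers the 31-bit hash. */
static int model_select(const struct model_nh *nh, size_t n, uint32_t hash31)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (nh[i].upper_bound >= 0 &&
		    hash31 <= (uint32_t)nh[i].upper_bound)
			return (int)i;
	return -1;
}
```

With two equal-weight live nexthops the split falls at 2^30: hashes 0 to 1073741823 select the first, the rest select the second.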



Re: NULL pointer dereference in rt6_get_cookie()

2015-10-14 Thread Martin KaFai Lau
On Thu, Oct 15, 2015 at 12:34:13AM +0200, Phil Sutter wrote:
> Hi Martin,
>
> On Tue, Oct 13, 2015 at 11:14:21PM -0700, Martin KaFai Lau wrote:
> > On Tue, Oct 13, 2015 at 09:26:41PM +0200, Phil Sutter wrote:
> > > I have backed up the rt pointer at top of the function and restored it
> > > before pr_err, this is the output:
> > >
> > > | rt6i_dst:2001:4dd0:ff3b:13::/64 rt6i_gateway::: rt6i_flags:4001 dst.flags:
> > Hi Phil, Can you try the following patch and report the pr_err?
>
> Probably needless to say, but with your patch applied the Oops does not
> occur anymore. This is the log output:
Thanks for testing it.  The patch may need a bit of refactoring work and
I will post it soon.

>
> | [   46.518869] IPv6: ort:8800cbb5b800 rt6i_dst:[2001:4dd0:ff3b:13::]/64 rt6i_gateway:[::] rt6i_flags:4001 dst.flags:0020
> | [   46.518874] IPv6:  rt:8800cb07a000 rt6i_dst:[2001:4dd0:ff3b:13::]/64 rt6i_gateway:[::] rt6i_flags:0001 dst.flags:
> | [   46.529171] IPv6: ort:8800cbb5b800 rt6i_dst:[2001:4dd0:ff3b:13::]/64 rt6i_gateway:[::] rt6i_flags:4001 dst.flags:0020
> | [   46.529174] IPv6:  rt:8800cb07b500 rt6i_dst:[2001:4dd0:ff3b:13::]/64 rt6i_gateway:[::] rt6i_flags:0001 dst.flags:
> | [   46.529187] IPv6: ort:8800cbb5b800 rt6i_dst:[2001:4dd0:ff3b:13::]/64 rt6i_gateway:[::] rt6i_flags:4001 dst.flags:0020
> | [   46.529189] IPv6:  rt:8800cb07ad80 rt6i_dst:[2001:4dd0:ff3b:13::]/64 rt6i_gateway:[::] rt6i_flags:0001 dst.flags:
> | [   47.532014] IPv6: ort:8800cbb5b800 rt6i_dst:[2001:4dd0:ff3b:13::]/64 rt6i_gateway:[::] rt6i_flags:4001 dst.flags:0020
> | [   47.532021] IPv6:  rt:8800cb07a000 rt6i_dst:[2001:4dd0:ff3b:13::]/64 rt6i_gateway:[::] rt6i_flags:0001 dst.flags:
> | [   47.532028] IPv6: ort:8800cbb5b800 rt6i_dst:[2001:4dd0:ff3b:13::]/64 rt6i_gateway:[::] rt6i_flags:4001 dst.flags:0020
> | [   47.532031] IPv6:  rt:8800cb07b500 rt6i_dst:[2001:4dd0:ff3b:13::]/64 rt6i_gateway:[::] rt6i_flags:0001 dst.flags:
> | [   49.536010] IPv6: ort:8800cbb5b800 rt6i_dst:[2001:4dd0:ff3b:13::]/64 rt6i_gateway:[::] rt6i_flags:4001 dst.flags:0020
> | [   49.536014] IPv6:  rt:8800cb07ad80 rt6i_dst:[2001:4dd0:ff3b:13::]/64 rt6i_gateway:[::] rt6i_flags:0001 dst.flags:
> | [   49.536021] IPv6: ort:8800cbb5b800 rt6i_dst:[2001:4dd0:ff3b:13::]/64 rt6i_gateway:[::] rt6i_flags:4001 dst.flags:0020
> | [   49.536024] IPv6:  rt:8800cb07a180 rt6i_dst:[2001:4dd0:ff3b:13::]/64 rt6i_gateway:[::] rt6i_flags:0001 dst.flags:
> | [   53.544013] IPv6: ort:8800cbb5b800 rt6i_dst:[2001:4dd0:ff3b:13::]/64 rt6i_gateway:[::] rt6i_flags:4001 dst.flags:0020
> | [   53.544020] IPv6:  rt:8800cb07a300 rt6i_dst:[2001:4dd0:ff3b:13::]/64 rt6i_gateway:[::] rt6i_flags:0001 dst.flags:
> | [   53.544028] IPv6: ort:8800cbb5b800 rt6i_dst:[2001:4dd0:ff3b:13::]/64 rt6i_gateway:[::] rt6i_flags:4001 dst.flags:0020
> | [   53.544031] IPv6:  rt:8800cb07b980 rt6i_dst:[2001:4dd0:ff3b:13::]/64 rt6i_gateway:[::] rt6i_flags:0001 dst.flags:
>
> In case the amount of log entries is surprising: my test-case is
> mounting two NFS shares over IPsec. No idea if that's relevant or not.
I also don't know why xfrm_lookup() errors out and then triggers
make_blackhole() but I believe it should not affect the fix here.

Thanks,
Martin


Re: [PATCH net-next 0/4] net: dsa: mv88e6xxx: fix hardware bridging

2015-10-14 Thread Andrew Lunn
On Sun, Oct 11, 2015 at 06:08:34PM -0400, Vivien Didelot wrote:
> DSA and its drivers currently hook the NETDEV_CHANGEUPPER net_device event in
> order to configure the VLAN map of every port.
> 
> This VLAN map is a feature of these switch chips to hardcode and restrict which
> output ports a given input port can egress frames to.
> 
> A Linux bridge is a simple untagged VLAN propagated by the bridge code itself.
> With a proper 802.1Q support, a driver does not need this hook anymore, and
> will simply program the related VLAN object.
> 
> This patchset improves the hardware bridging code in the mv88e6xxx driver with
> a strict 802.1Q mode.

Hi Vivien

I just tested this as part of net-next/master, and found a problem

If I do:

ip link set lan0 up
ip addr add 192.168.10.2/24 dev lan0

It will not ping. Looking in sys/kernel/debug/dsa0/stats I see
broadcast packets, probably ARP, being received at the port.
But they are not being forwarded out the CPU port.

If, however, I do

brctl addbr br0
brctl addif br0 lan0
ip addr add 192.168.10.2/24 dev br0
ip link set br0 up

I can ping.

So it looks like we are too restrictive by default. You should be able
to use interfaces as they are, without a bridge.

   Andrew
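The VLAN map described in the cover letter is a per-port bitmask of the ports an ingress port may forward to, and Andrew's report amounts to a standalone port's mask not including the CPU port. The minimal model below shows the behaviour he expects: a port outside any bridge can still reach the CPU, and bridged ports can additionally reach each other. It assumes a hypothetical 7-port switch with the CPU on port 5 and uses hypothetical names (model_port_vlan_map(), bridge_of[]); it is not the mv88e6xxx register layout or the driver's code.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_NUM_PORTS 7
#define MODEL_CPU_PORT  5

/* bridge_of[p] == 0 means port p is standalone; any other value is a
 * hypothetical bridge identifier. Returns the egress mask for 'port'. */
static uint16_t model_port_vlan_map(const int bridge_of[MODEL_NUM_PORTS],
				    int port)
{
	uint16_t mask;
	int p;

	if (port == MODEL_CPU_PORT) {
		/* The CPU port may reach every user port. */
		mask = (uint16_t)((1u << MODEL_NUM_PORTS) - 1);
		return (uint16_t)(mask & ~(1u << MODEL_CPU_PORT));
	}

	mask = 1u << MODEL_CPU_PORT; /* always allow traffic to the CPU */
	if (bridge_of[port] == 0)
		return mask;         /* standalone: CPU only */

	for (p = 0; p < MODEL_NUM_PORTS; p++)
		if (p != port && p != MODEL_CPU_PORT &&
		    bridge_of[p] == bridge_of[port])
			mask |= (uint16_t)(1u << p);
	return mask;
}
```

In this model the CPU bit is set unconditionally before any bridge membership is considered, which is the default Andrew asks for: interfaces usable on their own, without a bridge.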


Re: NULL pointer dereference in rt6_get_cookie()

2015-10-14 Thread Phil Sutter
Hi Martin,

On Tue, Oct 13, 2015 at 11:14:21PM -0700, Martin KaFai Lau wrote:
> On Tue, Oct 13, 2015 at 09:26:41PM +0200, Phil Sutter wrote:
> > I have backed up the rt pointer at top of the function and restored it
> > before pr_err, this is the output:
> >
> > | rt6i_dst:2001:4dd0:ff3b:13::/64 rt6i_gateway::: rt6i_flags:4001 dst.flags:
> Hi Phil, Can you try the following patch and report the pr_err?

Probably needless to say, but with your patch applied the Oops does not
occur anymore. This is the log output:

| [   46.518869] IPv6: ort:8800cbb5b800 rt6i_dst:[2001:4dd0:ff3b:13::]/64 rt6i_gateway:[::] rt6i_flags:4001 dst.flags:0020
| [   46.518874] IPv6:  rt:8800cb07a000 rt6i_dst:[2001:4dd0:ff3b:13::]/64 rt6i_gateway:[::] rt6i_flags:0001 dst.flags:
| [   46.529171] IPv6: ort:8800cbb5b800 rt6i_dst:[2001:4dd0:ff3b:13::]/64 rt6i_gateway:[::] rt6i_flags:4001 dst.flags:0020
| [   46.529174] IPv6:  rt:8800cb07b500 rt6i_dst:[2001:4dd0:ff3b:13::]/64 rt6i_gateway:[::] rt6i_flags:0001 dst.flags:
| [   46.529187] IPv6: ort:8800cbb5b800 rt6i_dst:[2001:4dd0:ff3b:13::]/64 rt6i_gateway:[::] rt6i_flags:4001 dst.flags:0020
| [   46.529189] IPv6:  rt:8800cb07ad80 rt6i_dst:[2001:4dd0:ff3b:13::]/64 rt6i_gateway:[::] rt6i_flags:0001 dst.flags:
| [   47.532014] IPv6: ort:8800cbb5b800 rt6i_dst:[2001:4dd0:ff3b:13::]/64 rt6i_gateway:[::] rt6i_flags:4001 dst.flags:0020
| [   47.532021] IPv6:  rt:8800cb07a000 rt6i_dst:[2001:4dd0:ff3b:13::]/64 rt6i_gateway:[::] rt6i_flags:0001 dst.flags:
| [   47.532028] IPv6: ort:8800cbb5b800 rt6i_dst:[2001:4dd0:ff3b:13::]/64 rt6i_gateway:[::] rt6i_flags:4001 dst.flags:0020
| [   47.532031] IPv6:  rt:8800cb07b500 rt6i_dst:[2001:4dd0:ff3b:13::]/64 rt6i_gateway:[::] rt6i_flags:0001 dst.flags:
| [   49.536010] IPv6: ort:8800cbb5b800 rt6i_dst:[2001:4dd0:ff3b:13::]/64 rt6i_gateway:[::] rt6i_flags:4001 dst.flags:0020
| [   49.536014] IPv6:  rt:8800cb07ad80 rt6i_dst:[2001:4dd0:ff3b:13::]/64 rt6i_gateway:[::] rt6i_flags:0001 dst.flags:
| [   49.536021] IPv6: ort:8800cbb5b800 rt6i_dst:[2001:4dd0:ff3b:13::]/64 rt6i_gateway:[::] rt6i_flags:4001 dst.flags:0020
| [   49.536024] IPv6:  rt:8800cb07a180 rt6i_dst:[2001:4dd0:ff3b:13::]/64 rt6i_gateway:[::] rt6i_flags:0001 dst.flags:
| [   53.544013] IPv6: ort:8800cbb5b800 rt6i_dst:[2001:4dd0:ff3b:13::]/64 rt6i_gateway:[::] rt6i_flags:4001 dst.flags:0020
| [   53.544020] IPv6:  rt:8800cb07a300 rt6i_dst:[2001:4dd0:ff3b:13::]/64 rt6i_gateway:[::] rt6i_flags:0001 dst.flags:
| [   53.544028] IPv6: ort:8800cbb5b800 rt6i_dst:[2001:4dd0:ff3b:13::]/64 rt6i_gateway:[::] rt6i_flags:4001 dst.flags:0020
| [   53.544031] IPv6:  rt:8800cb07b980 rt6i_dst:[2001:4dd0:ff3b:13::]/64 rt6i_gateway:[::] rt6i_flags:0001 dst.flags:

In case the amount of log entries is surprising: my test-case is
mounting two NFS shares over IPsec. No idea if that's relevant or not.

Cheers, Phil


[PATCH nf-next 5/6] netfilter-ipv4: code indentation

2015-10-14 Thread Ian Morris
Use tabs instead of spaces to indent code.

No changes detected by objdiff.

Signed-off-by: Ian Morris 
---
 net/ipv4/netfilter/ip_tables.c| 6 +++---
 net/ipv4/netfilter/ipt_SYNPROXY.c | 2 +-
 net/ipv4/netfilter/iptable_security.c | 2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c
index 3991a87..b99affa 100644
--- a/net/ipv4/netfilter/ip_tables.c
+++ b/net/ipv4/netfilter/ip_tables.c
@@ -431,8 +431,8 @@ ipt_do_table(struct sk_buff *skb,
} while (!acpar.hotdrop);
pr_debug("Exiting %s; sp at %u\n", __func__, stackidx);
 
-   xt_write_recseq_end(addend);
-   local_bh_enable();
+   xt_write_recseq_end(addend);
+   local_bh_enable();
 
 #ifdef DEBUG_ALLOW_ALL
return NF_ACCEPT;
@@ -484,7 +484,7 @@ mark_source_chains(const struct xt_table_info *newinfo,
unsigned int oldpos, size;
 
if ((strcmp(t->target.u.user.name,
-   XT_STANDARD_TARGET) == 0) &&
+   XT_STANDARD_TARGET) == 0) &&
t->verdict < -NF_MAX_VERDICT - 1) {
duprintf("mark_source_chains: bad "
"negative verdict (%i)\n",
diff --git a/net/ipv4/netfilter/ipt_SYNPROXY.c b/net/ipv4/netfilter/ipt_SYNPROXY.c
index 6a6e762..ff746b33 100644
--- a/net/ipv4/netfilter/ipt_SYNPROXY.c
+++ b/net/ipv4/netfilter/ipt_SYNPROXY.c
@@ -231,7 +231,7 @@ synproxy_send_client_ack(const struct synproxy_net *snet,
synproxy_build_options(nth, opts);
 
synproxy_send_tcp(snet, skb, nskb, skb->nfct, IP_CT_ESTABLISHED_REPLY,
- niph, nth, tcp_hdr_size);
+ niph, nth, tcp_hdr_size);
 }
 
 static bool
diff --git a/net/ipv4/netfilter/iptable_security.c b/net/ipv4/netfilter/iptable_security.c
index f534e2f..c2e23d5 100644
--- a/net/ipv4/netfilter/iptable_security.c
+++ b/net/ipv4/netfilter/iptable_security.c
@@ -79,7 +79,7 @@ static int __init iptable_security_init(void)
int ret;
 
ret = register_pernet_subsys(&iptable_security_net_ops);
-if (ret < 0)
+   if (ret < 0)
return ret;
 
sectbl_ops = xt_hook_link(&security_table, iptable_security_hook);
-- 
1.9.1



[PATCH nf-next 4/6] netfilter-ipv4: function definition layout

2015-10-14 Thread Ian Morris
Use tabs instead of spaces to indent second line of parameters in
function definitions.

No changes detected by objdiff.

Signed-off-by: Ian Morris 
---
 net/ipv4/netfilter/arp_tables.c | 6 +++---
 net/ipv4/netfilter/ip_tables.c  | 6 +++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/net/ipv4/netfilter/arp_tables.c b/net/ipv4/netfilter/arp_tables.c
index eb6663bd..11dccba 100644
--- a/net/ipv4/netfilter/arp_tables.c
+++ b/net/ipv4/netfilter/arp_tables.c
@@ -632,7 +632,7 @@ static inline void cleanup_entry(struct arpt_entry *e)
  * newinfo).
  */
 static int translate_table(struct xt_table_info *newinfo, void *entry0,
-   const struct arpt_replace *repl)
+  const struct arpt_replace *repl)
 {
struct arpt_entry *iter;
unsigned int i;
@@ -892,7 +892,7 @@ static int compat_table_info(const struct xt_table_info *info,
 #endif
 
 static int get_info(struct net *net, void __user *user,
-const int *len, int compat)
+   const int *len, int compat)
 {
char name[XT_TABLE_MAXNAMELEN];
struct xt_table *t;
@@ -1069,7 +1069,7 @@ static int __do_replace(struct net *net, const char *name,
 }
 
 static int do_replace(struct net *net, const void __user *user,
-  unsigned int len)
+ unsigned int len)
 {
int ret;
struct arpt_replace tmp;
diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c
index 08b7ab0..3991a87 100644
--- a/net/ipv4/netfilter/ip_tables.c
+++ b/net/ipv4/netfilter/ip_tables.c
@@ -804,7 +804,7 @@ cleanup_entry(struct ipt_entry *e, struct net *net)
newinfo) */
 static int
 translate_table(struct net *net, struct xt_table_info *newinfo, void *entry0,
-const struct ipt_replace *repl)
+   const struct ipt_replace *repl)
 {
struct ipt_entry *iter;
unsigned int i;
@@ -1078,7 +1078,7 @@ static int compat_table_info(const struct xt_table_info *info,
 #endif
 
 static int get_info(struct net *net, void __user *user,
-const int *len, int compat)
+   const int *len, int compat)
 {
char name[XT_TABLE_MAXNAMELEN];
struct xt_table *t;
@@ -1304,7 +1304,7 @@ do_replace(struct net *net, const void __user *user, unsigned int len)
 
 static int
 do_add_counters(struct net *net, const void __user *user,
-unsigned int len, int compat)
+   unsigned int len, int compat)
 {
unsigned int i;
struct xt_counters_info tmp;
-- 
1.9.1



[PATCH nf-next 6/6] netfilter-ipv4: whitespace around operators

2015-10-14 Thread Ian Morris
This patch cleanses whitespace around arithmetical operators.

No changes detected by objdiff.

Signed-off-by: Ian Morris 
---
 net/ipv4/netfilter/ipt_CLUSTERIP.c | 8 
 net/ipv4/netfilter/ipt_ah.c| 2 +-
 net/ipv4/netfilter/nf_nat_snmp_basic.c | 2 +-
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/net/ipv4/netfilter/ipt_CLUSTERIP.c b/net/ipv4/netfilter/ipt_CLUSTERIP.c
index 3f32c03..4a9e6db 100644
--- a/net/ipv4/netfilter/ipt_CLUSTERIP.c
+++ b/net/ipv4/netfilter/ipt_CLUSTERIP.c
@@ -492,14 +492,14 @@ static void arp_print(struct arp_payload *payload)
 {
 #define HBUFFERLEN 30
char hbuffer[HBUFFERLEN];
-   int j,k;
+   int j, k;
 
-   for (k=0, j=0; k < HBUFFERLEN-3 && j < ETH_ALEN; j++) {
+   for (k = 0, j = 0; k < HBUFFERLEN - 3 && j < ETH_ALEN; j++) {
hbuffer[k++] = hex_asc_hi(payload->src_hw[j]);
hbuffer[k++] = hex_asc_lo(payload->src_hw[j]);
-   hbuffer[k++]=':';
+   hbuffer[k++] = ':';
}
-   hbuffer[--k]='\0';
+   hbuffer[--k] = '\0';
 
pr_debug("src %pI4@%s, dst %pI4\n",
 &payload->src_ip, hbuffer, &payload->dst_ip);
diff --git a/net/ipv4/netfilter/ipt_ah.c b/net/ipv4/netfilter/ipt_ah.c
index 14a2aa8..a787d07 100644
--- a/net/ipv4/netfilter/ipt_ah.c
+++ b/net/ipv4/netfilter/ipt_ah.c
@@ -25,7 +25,7 @@ spi_match(u_int32_t min, u_int32_t max, u_int32_t spi, bool invert)
bool r;
pr_debug("spi_match:%c 0x%x <= 0x%x <= 0x%x\n",
 invert ? '!' : ' ', min, spi, max);
-   r=(spi >= min && spi <= max) ^ invert;
+   r = (spi >= min && spi <= max) ^ invert;
pr_debug(" result %s\n", r ? "PASS" : "FAILED");
return r;
 }
diff --git a/net/ipv4/netfilter/nf_nat_snmp_basic.c b/net/ipv4/netfilter/nf_nat_snmp_basic.c
index 8e3dffa..89be5c5 100644
--- a/net/ipv4/netfilter/nf_nat_snmp_basic.c
+++ b/net/ipv4/netfilter/nf_nat_snmp_basic.c
@@ -1156,7 +1156,7 @@ static int snmp_parse_mangle(unsigned char *msg,
}
 
if (obj->type == SNMP_IPADDR)
-   mangle_address(ctx.begin, ctx.pointer - 4 , map, check);
+   mangle_address(ctx.begin, ctx.pointer - 4, map, check);
 
kfree(obj->id);
kfree(obj);
-- 
1.9.1
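The two functions touched in this patch are small enough to exercise outside the kernel. The sketch below reimplements the MAC formatting loop from arp_print() and the inverted range test from spi_match() in userspace. hex_asc_hi()/hex_asc_lo() are replaced by a local hex_digit() helper and the model_ names are hypothetical, so treat it as an illustration of the logic rather than the netfilter code itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define HBUFFERLEN 30
#define ETH_ALEN   6

/* Local replacement for the kernel's hex_asc lookups. */
static char hex_digit(unsigned int v)
{
	return "0123456789abcdef"[v & 0xf];
}

/* Same loop as arp_print(): two hex digits plus ':' per byte, then the
 * trailing ':' is overwritten with the terminator. */
static void model_format_mac(const uint8_t mac[ETH_ALEN],
			     char hbuffer[HBUFFERLEN])
{
	int j, k;

	for (k = 0, j = 0; k < HBUFFERLEN - 3 && j < ETH_ALEN; j++) {
		hbuffer[k++] = hex_digit(mac[j] >> 4);
		hbuffer[k++] = hex_digit(mac[j]);
		hbuffer[k++] = ':';
	}
	hbuffer[--k] = '\0';
}

/* Same expression as spi_match(): XOR with 'invert' flips the result. */
static bool model_spi_match(uint32_t min, uint32_t max, uint32_t spi,
			    bool invert)
{
	return (spi >= min && spi <= max) ^ invert;
}
```

Formatting the address 00:1b:21:aa:bb:cc uses 18 bytes of the 30-byte buffer, so the HBUFFERLEN - 3 bound never truncates a six-byte MAC.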



[PATCH nf-next 0/6] coding style improvements: netfilter-ipv4

2015-10-14 Thread Ian Morris
This series of patches improves the coding style of the netfilter-ipv4 
code by addressing some issues detected by checkpatch.

The changes were previously submitted as part of a larger monolithic 
patch but on advice from Pablo, these are being re-sent in smaller, 
more structured batches.

Ian Morris (6):
  netfilter-ipv4: Line layout whitespace fixes
  netfilter-ipv4: label placement
  netfilter-ipv4: ternary operator layout
  netfilter-ipv4: function definition layout
  netfilter-ipv4: code indentation
  netfilter-ipv4: whitespace around operators

 net/ipv4/netfilter/arp_tables.c| 12 ++--
 net/ipv4/netfilter/ip_tables.c | 20 ++--
 net/ipv4/netfilter/ipt_CLUSTERIP.c |  8 
 net/ipv4/netfilter/ipt_ECN.c   |  2 +-
 net/ipv4/netfilter/ipt_SYNPROXY.c  |  2 +-
 net/ipv4/netfilter/ipt_ah.c|  2 +-
 net/ipv4/netfilter/iptable_security.c  |  2 +-
 net/ipv4/netfilter/nf_nat_pptp.c   |  2 +-
 net/ipv4/netfilter/nf_nat_snmp_basic.c |  4 ++--
 9 files changed, 27 insertions(+), 27 deletions(-)

-- 
1.9.1



[PATCH nf-next 2/6] netfilter-ipv4: label placement

2015-10-14 Thread Ian Morris
Whitespace cleansing: Labels should not be indented.

No changes detected by objdiff.

Signed-off-by: Ian Morris 
---
 net/ipv4/netfilter/arp_tables.c | 2 +-
 net/ipv4/netfilter/ip_tables.c  | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/netfilter/arp_tables.c b/net/ipv4/netfilter/arp_tables.c
index 2dad3e1..7300616 100644
--- a/net/ipv4/netfilter/arp_tables.c
+++ b/net/ipv4/netfilter/arp_tables.c
@@ -468,7 +468,7 @@ static int mark_source_chains(const struct xt_table_info *newinfo,
pos = newpos;
}
}
-   next:
+next:
duprintf("Finished chain %u\n", hook);
}
return 1;
diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c
index 42d0946..3be2a4d 100644
--- a/net/ipv4/netfilter/ip_tables.c
+++ b/net/ipv4/netfilter/ip_tables.c
@@ -549,7 +549,7 @@ mark_source_chains(const struct xt_table_info *newinfo,
pos = newpos;
}
}
-   next:
+next:
duprintf("Finished chain %u\n", hook);
}
return 1;
-- 
1.9.1



[PATCH nf-next 1/6] netfilter-ipv4: Line layout whitespace fixes

2015-10-14 Thread Ian Morris
Cleanses some whitespace issues by removing a leading space before a tab.

No changes detected by objdiff.

Signed-off-by: Ian Morris 
---
 net/ipv4/netfilter/ipt_ECN.c   | 2 +-
 net/ipv4/netfilter/nf_nat_pptp.c   | 2 +-
 net/ipv4/netfilter/nf_nat_snmp_basic.c | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/netfilter/ipt_ECN.c b/net/ipv4/netfilter/ipt_ECN.c
index 2707652..6592708 100644
--- a/net/ipv4/netfilter/ipt_ECN.c
+++ b/net/ipv4/netfilter/ipt_ECN.c
@@ -24,7 +24,7 @@ MODULE_AUTHOR("Harald Welte ");
 MODULE_DESCRIPTION("Xtables: Explicit Congestion Notification (ECN) flag modification");
 
 /* set ECT codepoint from IP header.
- * return false if there was an error. */
+ * return false if there was an error. */
 static inline bool
 set_ect_ip(struct sk_buff *skb, const struct ipt_ECN_info *einfo)
 {
diff --git a/net/ipv4/netfilter/nf_nat_pptp.c b/net/ipv4/netfilter/nf_nat_pptp.c
index 657d230..d5726f7 100644
--- a/net/ipv4/netfilter/nf_nat_pptp.c
+++ b/net/ipv4/netfilter/nf_nat_pptp.c
@@ -16,7 +16,7 @@
  * (C) 2006-2012 Patrick McHardy 
  *
  * TODO: - NAT to a unique tuple, not to TCP source port
- *(needs netfilter tuple reservation)
+ *(needs netfilter tuple reservation)
  */
 
 #include 
diff --git a/net/ipv4/netfilter/nf_nat_snmp_basic.c b/net/ipv4/netfilter/nf_nat_snmp_basic.c
index 7c67667..8e3dffa 100644
--- a/net/ipv4/netfilter/nf_nat_snmp_basic.c
+++ b/net/ipv4/netfilter/nf_nat_snmp_basic.c
@@ -891,7 +891,7 @@ static void fast_csum(__sum16 *csum,
 
 /*
  * Mangle IP address.
- * - begin points to the start of the snmp messgae
+ * - begin points to the start of the snmp messgae
  *  - addr points to the start of the address
  */
 static inline void mangle_address(unsigned char *begin,
-- 
1.9.1



[PATCH nf-next 3/6] netfilter-ipv4: ternary operator layout

2015-10-14 Thread Ian Morris
Correct whitespace layout of ternary operators in the netfilter-ipv4
code.

No changes detected by objdiff.
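Because the binary & operator binds more tightly than ?:, adding spaces cannot change how these expressions parse, which is consistent with objdiff seeing no difference. A minimal user-space sketch; INV_VIA_IN is a hypothetical stand-in for the real IPT_INV_VIA_IN / ARPT_INV_VIA_IN flags:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for IPT_INV_VIA_IN / ARPT_INV_VIA_IN. */
#define INV_VIA_IN 0x01u

/* Pre-patch layout: no spaces around '&', '?' or ':'. */
static const char *inv_suffix_old(unsigned int invflags)
{
	return invflags&INV_VIA_IN ?" (INV)":"";
}

/* Post-patch layout. Bitwise '&' binds tighter than '?:', so both
 * functions parse as (invflags & INV_VIA_IN) ? " (INV)" : "". */
static const char *inv_suffix_new(unsigned int invflags)
{
	return invflags & INV_VIA_IN ? " (INV)" : "";
}

/* Checks that both layouts agree for every value of the low two bits. */
static void check_ternary_layout(void)
{
	for (unsigned int flags = 0; flags < 4; flags++)
		assert(strcmp(inv_suffix_old(flags), inv_suffix_new(flags)) == 0);
}
```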

Signed-off-by: Ian Morris 
---
 net/ipv4/netfilter/arp_tables.c | 4 ++--
 net/ipv4/netfilter/ip_tables.c  | 6 +++---
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/net/ipv4/netfilter/arp_tables.c b/net/ipv4/netfilter/arp_tables.c
index 7300616..eb6663bd 100644
--- a/net/ipv4/netfilter/arp_tables.c
+++ b/net/ipv4/netfilter/arp_tables.c
@@ -186,7 +186,7 @@ static inline int arp_packet_match(const struct arphdr *arphdr,
if (FWINV(ret != 0, ARPT_INV_VIA_IN)) {
dprintf("VIA in mismatch (%s vs %s).%s\n",
indev, arpinfo->iniface,
-   arpinfo->invflags&ARPT_INV_VIA_IN ?" (INV)":"");
+   arpinfo->invflags & ARPT_INV_VIA_IN ? " (INV)" : "");
return 0;
}
 
@@ -195,7 +195,7 @@ static inline int arp_packet_match(const struct arphdr *arphdr,
if (FWINV(ret != 0, ARPT_INV_VIA_OUT)) {
dprintf("VIA out mismatch (%s vs %s).%s\n",
outdev, arpinfo->outiface,
-   arpinfo->invflags&ARPT_INV_VIA_OUT ?" (INV)":"");
+   arpinfo->invflags & ARPT_INV_VIA_OUT ? " (INV)" : "");
return 0;
}
 
diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c
index 3be2a4d..08b7ab0 100644
--- a/net/ipv4/netfilter/ip_tables.c
+++ b/net/ipv4/netfilter/ip_tables.c
@@ -102,7 +102,7 @@ ip_packet_match(const struct iphdr *ip,
if (FWINV(ret != 0, IPT_INV_VIA_IN)) {
dprintf("VIA in mismatch (%s vs %s).%s\n",
indev, ipinfo->iniface,
-   ipinfo->invflags&IPT_INV_VIA_IN ?" (INV)":"");
+   ipinfo->invflags & IPT_INV_VIA_IN ? " (INV)" : "");
return false;
}
 
@@ -111,7 +111,7 @@ ip_packet_match(const struct iphdr *ip,
if (FWINV(ret != 0, IPT_INV_VIA_OUT)) {
dprintf("VIA out mismatch (%s vs %s).%s\n",
outdev, ipinfo->outiface,
-   ipinfo->invflags&IPT_INV_VIA_OUT ?" (INV)":"");
+   ipinfo->invflags & IPT_INV_VIA_OUT ? " (INV)" : "");
return false;
}
 
@@ -120,7 +120,7 @@ ip_packet_match(const struct iphdr *ip,
FWINV(ip->protocol != ipinfo->proto, IPT_INV_PROTO)) {
dprintf("Packet protocol %hi does not match %hi.%s\n",
ip->protocol, ipinfo->proto,
-   ipinfo->invflags&IPT_INV_PROTO ? " (INV)":"");
+   ipinfo->invflags & IPT_INV_PROTO ? " (INV)" : "");
return false;
}
 
-- 
1.9.1



[PATCH net 1/1] via-rhine: fix VLAN receive handling regression.

2015-10-14 Thread Francois Romieu
From: Andrej Ota 

Because eth_type_trans() consumes an Ethernet header's worth of bytes, a
subsequent call to rhine_rx_vlan_tag() to read the TCI from the end of the
packet no longer works: it reads from an invalid offset.

Tested and working on a PCEngines Alix board.
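The ordering problem can be illustrated with a simplified user-space model. This is a sketch under stated assumptions: fake_skb and its helpers are hypothetical, and the trailer reader simply assumes the TCI sits data_size bytes past skb->data rather than reproducing the driver's exact offset computation. Pulling ETH_HLEN bytes first moves the computed address 14 bytes past the tag:

```c
#include <assert.h>
#include <stdint.h>

#define ETH_HLEN 14 /* length of an Ethernet header */

/* Hypothetical, heavily simplified sk_buff: only data and len. */
struct fake_skb {
	const uint8_t *data;
	unsigned int len;
};

/* Models the buffer side effect of eth_type_trans(): the Ethernet
 * header is pulled, so data advances by ETH_HLEN bytes. */
static void fake_pull_eth_header(struct fake_skb *skb)
{
	assert(skb->len >= ETH_HLEN);
	skb->data += ETH_HLEN;
	skb->len -= ETH_HLEN;
}

/* Hypothetical trailer reader: assumes a big-endian 16-bit TCI located
 * data_size bytes past skb->data, i.e. right after the received frame.
 * The offset is relative to skb->data, so it breaks after a pull. */
static uint16_t fake_read_trailer_tci(const struct fake_skb *skb,
				      unsigned int data_size)
{
	const uint8_t *p = skb->data + data_size;

	return (uint16_t)((p[0] << 8) | p[1]);
}
```

Reading the tag before the pull, which is the order the patch restores, keeps the offset relative to the start of the frame.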

Fixes: 810f19bcb862 ("via-rhine: add consistent memory barrier in vlan receive code.")
Signed-off-by: Andrej Ota 
Acked-by: Francois Romieu 

---

 Applies fine as of 0f8b8e28fb3241f9fd82ce13bac2b40c35e987e0
 ("tipc: eliminate risk of stalled link synchronization").

 Andrej posted it on l-k the 2015/10/04, see
 http://marc.info/?l=linux-kernel&m=144398918324349

 Kernel v4.2 exhibits the regression. Stable v4.[01] kernels don't.

 drivers/net/ethernet/via/via-rhine.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/via/via-rhine.c b/drivers/net/ethernet/via/via-rhine.c
index a832637..2b7550c 100644
--- a/drivers/net/ethernet/via/via-rhine.c
+++ b/drivers/net/ethernet/via/via-rhine.c
@@ -2134,10 +2134,11 @@ static int rhine_rx(struct net_device *dev, int limit)
}
 
skb_put(skb, pkt_len);
-   skb->protocol = eth_type_trans(skb, dev);
 
rhine_rx_vlan_tag(skb, desc, data_size);
 
+   skb->protocol = eth_type_trans(skb, dev);
+
netif_receive_skb(skb);
 
u64_stats_update_begin(&rp->rx_stats.syncp);
-- 
2.4.3



Re: [PATCH net-next] switchdev: enforce no pvid flag in vlan ranges

2015-10-14 Thread Florian Fainelli
On 14/10/15 11:51, Vivien Didelot wrote:
> On Oct. Wednesday 14 (42) 08:42 PM, Ido Schimmel wrote:
>> Wed, Oct 14, 2015 at 08:14:24PM IDT, sfel...@gmail.com wrote:
>>> On Wed, Oct 14, 2015 at 8:25 AM, Vivien Didelot
>>>  wrote:
 On Oct. Wednesday 14 (42) 09:14 AM, Ido Schimmel wrote:
> Tue, Oct 13, 2015 at 05:32:26PM IDT, vivien.dide...@savoirfairelinux.com wrote:
>> On Oct. Tuesday 13 (42) 11:31 AM, Ido Schimmel wrote:
>>> Mon, Oct 12, 2015 at 08:36:25PM IDT, vivien.dide...@savoirfairelinux.com wrote:
 Hi guys,

 On Oct. Monday 12 (42) 02:01 PM, Nikolay Aleksandrov wrote:
> From: Nikolay Aleksandrov 
>
> We shouldn't allow BRIDGE_VLAN_INFO_PVID flag in VLAN ranges.
>
> Signed-off-by: Nikolay Aleksandrov 
> ---
>  net/switchdev/switchdev.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
> index 6e4a4f9ad927..256c596de896 100644
> --- a/net/switchdev/switchdev.c
> +++ b/net/switchdev/switchdev.c
> @@ -720,6 +720,9 @@ static int switchdev_port_br_afspec(struct net_device *dev,
> if (vlan.vid_begin)
> return -EINVAL;
> vlan.vid_begin = vinfo->vid;
> +   /* don't allow range of pvids */
> +   if (vlan.flags & BRIDGE_VLAN_INFO_PVID)
> +   return -EINVAL;
> } else if (vinfo->flags & BRIDGE_VLAN_INFO_RANGE_END) {
> if (!vlan.vid_begin)
> return -EINVAL;
> --
> 2.4.3
>

 Yes the patch looks good, but it is a minor check though. I hope the
 subject of this thread is making sense.

 VLAN ranges seem to have been included for an UX purpose (so commands
 look like Cisco IOS). We don't want to change any existing interface, so
 we pushed that down to drivers, with the only valid reason that, maybe
 one day, an hardware can be capable of programming a range on a per-port
 basis.
>>> Hi,
>>>
>>> That's actually what we are doing in mlxsw. We can do up to 256 entries in
>>> one go. We've yet to submit this part.
>>
>> Perfect Ido, thanks for pointing this out! I'm OK with the range then.
>>
>> So there is now a very last question in my head for this, which is more
>> a matter of kernel design. Should the user be aware of such underlying
>> support? In other words, would it make sense to do this in a driver:
>>
>>foo_port_vlan_add(struct net_device *dev,
>>  struct switchdev_obj_port_vlan *vlan)
>>{
>>if (vlan->vid_begin != vlan->vid_end)
>>return -ENOTSUPP; /* or something more relevant for user */
>>
>>return foo_port_single_vlan_add(dev, vlan->vid_begin);
>>}
>>
>> So drivers keep being simple, and we can easily propagate the fact that
>> one-or-all VLAN is not supportable, vs. the VLAN feature itself is not
>> implemented and must be done in software.
> I think that if you want to keep it simple, then Scott's advice from the
> previous thread is the most appropriate one. I believe the hardware you
> are using is simply not meant to support multiple 802.1Q bridges.

 You mean allowing only one Linux bridge over an hardware switch?

 It would for sure simplify how, as developers and users, we represent a
 physical switch. But I am not sure how to achieve that and I don't have
 strong opinions on this TBH.
>>>
>>> Hi Vivien, I think it's possible to keep switch ports on just one
>>> bridge if we do a little bit of work on the NETDEV_CHANGEUPPER
>>> notifier.  This will give you the driver-level control you want.  Do
>>> you have time to investigate?  The idea is:
>>>
>>> 1) In your driver's handler for NETDEV_CHANGEUPPER, if switch port is
>>> being added to a second bridge,then return NOTIFY_BAD.  Your driver
>>> needs to track the bridge count.
>>>
>>> 2) In __netdev_upper_dev_link(), check the return code from the
>>> call_netdevice_notifiers_info(NETDEV_CHANGEUPPER, ...) call, and if
>>> NOTIFY_BAD, abort the linking operation (goto rollback_xxx).
>>>
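The driver-side bookkeeping from step 1 can be sketched as follows. All names, including the FAKE_NOTIFY_* return codes, are hypothetical stand-ins rather than the kernel's notifier API. The check is kept separate from the state update so that it could run before linking, in the spirit of the PRE_CHANGEUPPER approach mentioned in this thread:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for notifier return codes. */
#define FAKE_NOTIFY_DONE 0
#define FAKE_NOTIFY_BAD  1

/* Hypothetical per-switch state: the bridge the ports belong to (if
 * any) and how many ports are currently members of it. */
struct fake_switch {
	const void *bridge;
	unsigned int nports;
};

/* Pre-link check: a port may join only the bridge already in use, or
 * any bridge if no port is bridged yet. No state is changed here. */
static int fake_can_join(const struct fake_switch *sw, const void *bridge)
{
	if (sw->bridge == NULL || sw->bridge == bridge)
		return FAKE_NOTIFY_DONE;
	return FAKE_NOTIFY_BAD; /* refuse a second bridge */
}

/* Post-link bookkeeping once the join has actually happened. */
static void fake_port_joined(struct fake_switch *sw, const void *bridge)
{
	assert(fake_can_join(sw, bridge) == FAKE_NOTIFY_DONE);
	sw->bridge = bridge;
	sw->nports++;
}

/* When the last port leaves, any bridge may be used again. */
static void fake_port_left(struct fake_switch *sw)
{
	assert(sw->nports > 0);
	if (--sw->nports == 0)
		sw->bridge = NULL;
}
```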
>> Hi,
>>
>> We are doing something similar in mlxsw (not upstream yet). Jiri
>> introduced PRE_CHANGEUPPER, which is called from the function you
>> mentioned, but before the linking operation (so that you don't need to
>> rollback).
>>
>> If the notification is about a linking operation and the master is a
>> bridge different than the current one, then NOTIFY_BAD is returned.
> 
> Great, I'll wait for this then.
> 
> Scott, 

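The check added by Nikolay's patch can be illustrated with a standalone validator for a list of VLAN entries. The flag values mirror BRIDGE_VLAN_INFO_* from <linux/if_bridge.h>; fake_vlan_info and validate_vlan_list() are hypothetical and only approximate the checks in switchdev_port_br_afspec():

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Values mirror BRIDGE_VLAN_INFO_* in <linux/if_bridge.h>. */
#define BRIDGE_VLAN_INFO_PVID        (1 << 1)
#define BRIDGE_VLAN_INFO_RANGE_BEGIN (1 << 3)
#define BRIDGE_VLAN_INFO_RANGE_END   (1 << 4)

/* Hypothetical simplified entry: flags plus VLAN id. */
struct fake_vlan_info {
	uint16_t flags;
	uint16_t vid;
};

/* Returns 0 if the list is well formed, -EINVAL otherwise. VIDs 0 and
 * 4095 are reserved, so begin == 0 can mean "no range is open". */
static int validate_vlan_list(const struct fake_vlan_info *v, size_t n)
{
	uint16_t begin = 0;

	assert(v != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (v[i].vid == 0 || v[i].vid >= 4095)
			return -EINVAL; /* reserved VID */
		if (v[i].flags & BRIDGE_VLAN_INFO_RANGE_BEGIN) {
			if (begin)
				return -EINVAL; /* previous range not closed */
			if (v[i].flags & BRIDGE_VLAN_INFO_PVID)
				return -EINVAL; /* a range of PVIDs is rejected */
			begin = v[i].vid;
		} else if (v[i].flags & BRIDGE_VLAN_INFO_RANGE_END) {
			if (!begin || v[i].vid <= begin)
				return -EINVAL; /* no begin, or end not after begin */
			begin = 0;
		} else if (begin) {
			return -EINVAL; /* single VLAN inside an open range */
		}
	}
	return begin ? -EINVAL : 0; /* unterminated range */
}
```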
Re: [PATCH net-next] bpf: Need to call bpf_prog_uncharge_memlock from bpf_prog_put

2015-10-14 Thread Alexei Starovoitov

On 10/14/15 2:40 PM, Tom Herbert wrote:

Currently, bpf_prog_uncharge_memlock is only called from __prog_put_rcu in
the bpf_prog_release path. It also needs to be called from bpf_prog_put to
get correct accounting.

Fixes: commit aaac3ba95e4c8b49 ("bpf: charge user for creation of BPF maps and programs")
Signed-off-by: Tom Herbert


ohh. right. good catch. thanks!
Acked-by: Alexei Starovoitov 



[PATCH net-next] bpf: Need to call bpf_prog_uncharge_memlock from bpf_prog_put

2015-10-14 Thread Tom Herbert
Currently, bpf_prog_uncharge_memlock is only called from __prog_put_rcu in
the bpf_prog_release path. It also needs to be called from bpf_prog_put to
get correct accounting.
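The invariant the fix restores is that whichever path drops the last reference must also undo the memlock charge taken when the program was loaded. A minimal user-space sketch with C11 atomics; fake_prog, charged_pages and the helper functions are hypothetical and are not the kernel API:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical counter standing in for the per-user locked-memory
 * accounting that the charge/uncharge helpers maintain. */
static atomic_long charged_pages;

struct fake_prog {
	atomic_int refcnt;
	long pages;
};

/* Allocates a program with one reference and charges its pages. */
static struct fake_prog *fake_prog_alloc(long pages)
{
	struct fake_prog *prog = malloc(sizeof(*prog));

	if (!prog)
		return NULL;
	atomic_init(&prog->refcnt, 1);
	prog->pages = pages;
	atomic_fetch_add(&charged_pages, pages); /* charge on creation */
	return prog;
}

static void fake_prog_get(struct fake_prog *prog)
{
	atomic_fetch_add(&prog->refcnt, 1);
}

/* Dropping the last reference must uncharge before freeing; otherwise
 * charged_pages only ever grows. */
static void fake_prog_put(struct fake_prog *prog)
{
	assert(atomic_load(&prog->refcnt) > 0);
	if (atomic_fetch_sub(&prog->refcnt, 1) == 1) {
		atomic_fetch_sub(&charged_pages, prog->pages); /* uncharge */
		free(prog);
	}
}
```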

Fixes: commit aaac3ba95e4c8b49 ("bpf: charge user for creation of BPF maps and programs")
Signed-off-by: Tom Herbert 
---
 kernel/bpf/syscall.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index f640e5f..687dd6c 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -520,6 +520,7 @@ void bpf_prog_put(struct bpf_prog *prog)
 {
if (atomic_dec_and_test(&prog->aux->refcnt)) {
free_used_maps(prog->aux);
+   bpf_prog_uncharge_memlock(prog);
bpf_prog_free(prog);
}
 }
-- 
2.4.6



Re: [PATCH V2 2/2] bpf: control a set of perf events by creating a new ioctl PERF_EVENT_IOC_SET_ENABLER

2015-10-14 Thread Alexei Starovoitov

On 10/14/15 5:37 AM, Kaixu Xia wrote:

+   event->p_sample_disable = &enabler_event->sample_disable;


I don't like it as a concept, and the implementation is buggy.
What happens here when enabler is alive, but other event is destroyed?


--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -221,9 +221,12 @@ static u64 bpf_perf_event_sample_control(u64 r1, u64 index, u64 flag, u64 r4, u6
struct bpf_array *array = container_of(map, struct bpf_array, map);
struct perf_event *event;

-   if (unlikely(index >= array->map.max_entries))
+   if (unlikely(index > array->map.max_entries))
return -E2BIG;

+   if (index == array->map.max_entries)
+   index = 0;


what is this hack for ?

Either use notification and user space disable or
call bpf_perf_event_sample_control() manually for each cpu.




Re: [RFC PATCH] RDS: convert bind hash table to re-sizable hashtable

2015-10-14 Thread santosh.shilim...@oracle.com

On 10/14/15 2:15 PM, Santosh Shilimkar wrote:

From: Santosh Shilimkar 

To further improve RDS connection scalability on massive systems,
where the number of sockets grows into the tens of thousands, a larger
bind hash table is needed. A pre-allocated 8K or 16K table is not very
flexible in terms of memory utilisation. The rhashtable infrastructure
gives us the flexibility to grow the hash table based on use, and it
also comes with built-in efficient bucket (chain) handling.

Cc: David Laight 
Cc: David Miller 
Signed-off-by: Santosh Shilimkar 
---
As promised in the last series review, here is an RFC to convert RDS to
use resizable hash tables. I haven't turned on automatic shrinking, on
purpose.


Ignore the automatic_shrinking remark, since the patch has it enabled.



  net/rds/af_rds.c |  10 -
  net/rds/bind.c   | 127 ---
  net/rds/rds.h|   7 ++-
  3 files changed, 58 insertions(+), 86 deletions(-)

diff --git a/net/rds/af_rds.c b/net/rds/af_rds.c
index 384ea1e..b5476aeb 100644
--- a/net/rds/af_rds.c
+++ b/net/rds/af_rds.c
@@ -573,6 +573,7 @@ static void rds_exit(void)
rds_threads_exit();
rds_stats_exit();
rds_page_exit();
+   rds_bind_lock_destroy();
rds_info_deregister_func(RDS_INFO_SOCKETS, rds_sock_info);
rds_info_deregister_func(RDS_INFO_RECV_MESSAGES, rds_sock_inc_info);
  }
@@ -582,11 +583,14 @@ static int rds_init(void)
  {
int ret;

-   rds_bind_lock_init();
+   ret = rds_bind_lock_init();
+   if (ret)
+   goto out;

ret = rds_conn_init();
if (ret)
-   goto out;
+   goto out_bind;
+
ret = rds_threads_init();
if (ret)
goto out_conn;
@@ -620,6 +624,8 @@ out_conn:
rds_conn_exit();
rds_cong_exit();
rds_page_exit();
+out_bind:
+   rds_bind_lock_destroy();
  out:
return ret;
  }
diff --git a/net/rds/bind.c b/net/rds/bind.c
index bc6b93e..199e4cc 100644
--- a/net/rds/bind.c
+++ b/net/rds/bind.c
@@ -38,54 +38,18 @@
  #include 
  #include "rds.h"

-struct bind_bucket {
-   rwlock_tlock;
-   struct hlist_head   head;
+static struct rhashtable bind_hash_table;
+
+static struct rhashtable_params ht_parms = {
+   .nelem_hint = 768,
+   .key_len = sizeof(u64),
+   .key_offset = offsetof(struct rds_sock, rs_bound_key),
+   .head_offset = offsetof(struct rds_sock, rs_bound_node),
+   .max_size = 16384,
+   .min_size = 1024,
+   .automatic_shrinking = true,
  };

-#define BIND_HASH_SIZE 1024
-static struct bind_bucket bind_hash_table[BIND_HASH_SIZE];
-
-static struct bind_bucket *hash_to_bucket(__be32 addr, __be16 port)
-{
-   return bind_hash_table + (jhash_2words((u32)addr, (u32)port, 0) &
- (BIND_HASH_SIZE - 1));
-}
-
-/* must hold either read or write lock (write lock for insert != NULL) */
-static struct rds_sock *rds_bind_lookup(struct bind_bucket *bucket,
-   __be32 addr, __be16 port,
-   struct rds_sock *insert)
-{
-   struct rds_sock *rs;
-   struct hlist_head *head = &bucket->head;
-   u64 cmp;
-   u64 needle = ((u64)be32_to_cpu(addr) << 32) | be16_to_cpu(port);
-
-   hlist_for_each_entry(rs, head, rs_bound_node) {
-   cmp = ((u64)be32_to_cpu(rs->rs_bound_addr) << 32) |
- be16_to_cpu(rs->rs_bound_port);
-
-   if (cmp == needle) {
-   rds_sock_addref(rs);
-   return rs;
-   }
-   }
-
-   if (insert) {
-   /*
-* make sure our addr and port are set before
-* we are added to the list.
-*/
-   insert->rs_bound_addr = addr;
-   insert->rs_bound_port = port;
-   rds_sock_addref(insert);
-
-   hlist_add_head(&insert->rs_bound_node, head);
-   }
-   return NULL;
-}
-
  /*
   * Return the rds_sock bound at the given local address.
   *
@@ -94,18 +58,14 @@ static struct rds_sock *rds_bind_lookup(struct bind_bucket *bucket,
   */
  struct rds_sock *rds_find_bound(__be32 addr, __be16 port)
  {
+   u64 key = ((u64)addr << 32) | port;
struct rds_sock *rs;
-   unsigned long flags;
-   struct bind_bucket *bucket = hash_to_bucket(addr, port);

-   read_lock_irqsave(&bucket->lock, flags);
-   rs = rds_bind_lookup(bucket, addr, port, NULL);
-   read_unlock_irqrestore(&bucket->lock, flags);
-
-   if (rs && sock_flag(rds_rs_to_sk(rs), SOCK_DEAD)) {
-   rds_sock_put(rs);
+   rs = rhashtable_lookup_fast(&bind_hash_table, &key, ht_parms);
+   if (rs && !sock_flag(rds_rs_to_sk(rs), SOCK_DEAD))
+   rds_sock_addref(rs);
+   else
rs = NULL;
-   }

rdsdebug("returning rs %p for %pI4:%u\n",

Re: [PATCH V2 1/2] bpf: control the trace data output on current cpu when perf sampling

2015-10-14 Thread Alexei Starovoitov

On 10/14/15 5:37 AM, Kaixu Xia wrote:

This patch adds the flag sample_disable to control the trace data
output process when perf sampling. By setting this flag and
integrating with ebpf, we can control the data output process and
get the samples we are most interested in.

The bpf helper bpf_perf_event_sample_control() can control the
perf_event on current cpu.

Signed-off-by: Kaixu Xia 

...

--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6337,6 +6337,9 @@ static int __perf_event_overflow(struct perf_event *event,
irq_work_queue(&event->pending);
}

+   if (!atomic_read(&event->sample_disable))
+   return ret;
+


the condition check and the name are inconsistent.
It's either
if (!enabled) return
or
if (disabled) return


if (event->overflow_handler)
event->overflow_handler(event, data, regs);
else
@@ -7709,6 +7712,14 @@ static void account_event(struct perf_event *event)
account_event_cpu(event, event->cpu);
  }

+static void perf_event_check_sample_flag(struct perf_event *event)
+{
+   if (event->attr.sample_disable == 1)
+   atomic_set(&event->sample_disable, 0);
+   else
+   atomic_set(&event->sample_disable, 1);
+}


why introduce new attribute for this?
we already have 'disabled' flag.


+static u64 bpf_perf_event_sample_control(u64 r1, u64 index, u64 flag, u64 r4, u64 r5)
+{
+   struct bpf_map *map = (struct bpf_map *) (unsigned long) r1;
+   struct bpf_array *array = container_of(map, struct bpf_array, map);
+   struct perf_event *event;
+
+   if (unlikely(index >= array->map.max_entries))
+   return -E2BIG;
+
+   event = (struct perf_event *)array->ptrs[index];
+   if (!event)
+   return -ENOENT;
+
+   if (flag)


please check only bit 0 and check that all other bits are zero as well
for future extensibility.


+   atomic_dec(&event->sample_disable);


it should be atomic_dec_if_positive();


+   else
+   atomic_inc(&event->sample_disable);


and atomic_add_unless()
to make sure we don't wrap on either side.


+const struct bpf_func_proto bpf_perf_event_sample_control_proto = {


static.



[RFC PATCH] RDS: convert bind hash table to re-sizable hashtable

2015-10-14 Thread Santosh Shilimkar
From: Santosh Shilimkar 

To further improve RDS connection scalability on massive systems,
where the number of sockets grows into the tens of thousands, a larger
bind hash table is needed. A pre-allocated 8K or 16K table is not very
flexible in terms of memory utilisation. The rhashtable infrastructure
gives us the flexibility to grow the hash table based on use, and it
also comes with built-in efficient bucket (chain) handling.

Cc: David Laight 
Cc: David Miller 
Signed-off-by: Santosh Shilimkar 
---
As promised in the last series review, here is an RFC to convert RDS to
use resizable hash tables. I haven't turned on automatic shrinking, on
purpose.
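The RFC's lookup key packs the bound address and port into a single u64 (key_len = sizeof(u64)). A user-space sketch of that packing, assuming the insert path fills rs_bound_key the same way (that part of the diff is truncated in this archive); the helper names are hypothetical. Because both sides pack the raw network-order values identically, no byte swapping is needed for exact-match lookups:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper mirroring the RFC's
 *   u64 key = ((u64)addr << 32) | port;
 * addr_be and port_be are raw network-order values (__be32/__be16). */
static uint64_t rds_bind_key(uint32_t addr_be, uint16_t port_be)
{
	return ((uint64_t)addr_be << 32) | port_be;
}

/* Hypothetical inverse helpers: the address occupies the high 32 bits,
 * the port the low 16 bits; bits 16..31 are always zero. */
static uint32_t rds_bind_key_addr(uint64_t key)
{
	return (uint32_t)(key >> 32);
}

static uint16_t rds_bind_key_port(uint64_t key)
{
	return (uint16_t)(key & 0xffff);
}

/* Checks that packing is lossless for one (address, port) pair. */
static void rds_bind_key_check(uint32_t addr_be, uint16_t port_be)
{
	uint64_t key = rds_bind_key(addr_be, port_be);

	assert(rds_bind_key_addr(key) == addr_be);
	assert(rds_bind_key_port(key) == port_be);
}
```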

 net/rds/af_rds.c |  10 -
 net/rds/bind.c   | 127 ---
 net/rds/rds.h|   7 ++-
 3 files changed, 58 insertions(+), 86 deletions(-)

diff --git a/net/rds/af_rds.c b/net/rds/af_rds.c
index 384ea1e..b5476aeb 100644
--- a/net/rds/af_rds.c
+++ b/net/rds/af_rds.c
@@ -573,6 +573,7 @@ static void rds_exit(void)
rds_threads_exit();
rds_stats_exit();
rds_page_exit();
+   rds_bind_lock_destroy();
rds_info_deregister_func(RDS_INFO_SOCKETS, rds_sock_info);
rds_info_deregister_func(RDS_INFO_RECV_MESSAGES, rds_sock_inc_info);
 }
@@ -582,11 +583,14 @@ static int rds_init(void)
 {
int ret;
 
-   rds_bind_lock_init();
+   ret = rds_bind_lock_init();
+   if (ret)
+   goto out;
 
ret = rds_conn_init();
if (ret)
-   goto out;
+   goto out_bind;
+
ret = rds_threads_init();
if (ret)
goto out_conn;
@@ -620,6 +624,8 @@ out_conn:
rds_conn_exit();
rds_cong_exit();
rds_page_exit();
+out_bind:
+   rds_bind_lock_destroy();
 out:
return ret;
 }
diff --git a/net/rds/bind.c b/net/rds/bind.c
index bc6b93e..199e4cc 100644
--- a/net/rds/bind.c
+++ b/net/rds/bind.c
@@ -38,54 +38,18 @@
 #include 
 #include "rds.h"
 
-struct bind_bucket {
-   rwlock_tlock;
-   struct hlist_head   head;
+static struct rhashtable bind_hash_table;
+
+static struct rhashtable_params ht_parms = {
+   .nelem_hint = 768,
+   .key_len = sizeof(u64),
+   .key_offset = offsetof(struct rds_sock, rs_bound_key),
+   .head_offset = offsetof(struct rds_sock, rs_bound_node),
+   .max_size = 16384,
+   .min_size = 1024,
+   .automatic_shrinking = true,
 };
 
-#define BIND_HASH_SIZE 1024
-static struct bind_bucket bind_hash_table[BIND_HASH_SIZE];
-
-static struct bind_bucket *hash_to_bucket(__be32 addr, __be16 port)
-{
-   return bind_hash_table + (jhash_2words((u32)addr, (u32)port, 0) &
- (BIND_HASH_SIZE - 1));
-}
-
-/* must hold either read or write lock (write lock for insert != NULL) */
-static struct rds_sock *rds_bind_lookup(struct bind_bucket *bucket,
-   __be32 addr, __be16 port,
-   struct rds_sock *insert)
-{
-   struct rds_sock *rs;
-   struct hlist_head *head = &bucket->head;
-   u64 cmp;
-   u64 needle = ((u64)be32_to_cpu(addr) << 32) | be16_to_cpu(port);
-
-   hlist_for_each_entry(rs, head, rs_bound_node) {
-   cmp = ((u64)be32_to_cpu(rs->rs_bound_addr) << 32) |
- be16_to_cpu(rs->rs_bound_port);
-
-   if (cmp == needle) {
-   rds_sock_addref(rs);
-   return rs;
-   }
-   }
-
-   if (insert) {
-   /*
-* make sure our addr and port are set before
-* we are added to the list.
-*/
-   insert->rs_bound_addr = addr;
-   insert->rs_bound_port = port;
-   rds_sock_addref(insert);
-
-   hlist_add_head(&insert->rs_bound_node, head);
-   }
-   return NULL;
-}
-
 /*
  * Return the rds_sock bound at the given local address.
  *
@@ -94,18 +58,14 @@ static struct rds_sock *rds_bind_lookup(struct bind_bucket *bucket,
  */
 struct rds_sock *rds_find_bound(__be32 addr, __be16 port)
 {
+   u64 key = ((u64)addr << 32) | port;
struct rds_sock *rs;
-   unsigned long flags;
-   struct bind_bucket *bucket = hash_to_bucket(addr, port);
 
-   read_lock_irqsave(&bucket->lock, flags);
-   rs = rds_bind_lookup(bucket, addr, port, NULL);
-   read_unlock_irqrestore(&bucket->lock, flags);
-
-   if (rs && sock_flag(rds_rs_to_sk(rs), SOCK_DEAD)) {
-   rds_sock_put(rs);
+   rs = rhashtable_lookup_fast(&bind_hash_table, &key, ht_parms);
+   if (rs && !sock_flag(rds_rs_to_sk(rs), SOCK_DEAD))
+   rds_sock_addref(rs);
+   else
rs = NULL;
-   }
 
rdsdebug("returning rs %p for %pI4:%u\n", rs, &addr,
ntohs(port));
@@ -116,10 +76,9 @@ struct rds_sock *rds_find_bound(__be32 addr, __be16 port)
 /* retur

Re: [PATCH] ethtool: Use kcalloc instead of kmalloc for ethtool_get_strings

2015-10-14 Thread Ben Hutchings
On Wed, 2015-10-14 at 01:09 -0700, Joe Perches wrote:
> It seems that kernel memory can leak into userspace by a
> kmalloc, ethtool_get_strings, then copy_to_user sequence.
> 
> Avoid this by using kcalloc to zero fill the copied buffer.
> 
> Signed-off-by: Joe Perches 
> ---
> 
> stable too...
> 
> On Tue, 2015-10-13 at 23:59 -0700, Jeff Kirsher wrote:
> > From: Jacob Keller 
> []
> > diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c b/drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c
> []
> > @@ -206,13 +206,13 @@ static void fm10k_get_stat_strings(struct net_device *dev, u8 *data)
> >  }
> >  
> >  for (i = 0; i < interface->hw.mac.max_queues; i++) {
> > -   sprintf(p, "tx_queue_%u_packets", i);
> > +   snprintf(p, ETH_GSTRING_LEN, "tx_queue_%u_packets", i);
> 
> It seems these need a memset after the snprintf to zero fill
> bytes after the string terminating \0 to avoid leaking
> contents of any unset bytes.

Right.  It used to be that all drivers were memcpy()ing from a static
array which had all the necessary zero bytes, but now there are a bunch
of them using s{,n}printf() or otherwise dynamically generating names
for statistics or tests.  And I don't think there's any snprintf()-
alike function that will fix that.

At least these drivers aren't zero-padding all strings: bnx2x, bnad,
i40e, i40evf, igb, ixgbe, liquidio, mlx4_en, mlx5e, nicvf, qlcnic, sfc,
vxge.

Acked-by: Ben Hutchings 

Ben.

> It'd probably be better to allocate a zeroed buffer instead.
> 
> >     p += ETH_GSTRING_LEN;
> > -   sprintf(p, "tx_queue_%u_bytes", i);
> > +   snprintf(p, ETH_GSTRING_LEN, "tx_queue_%u_bytes", i);
> 
> so...
> 
>  net/core/ethtool.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/core/ethtool.c b/net/core/ethtool.c
> index b495ab1..29edf74 100644
> --- a/net/core/ethtool.c
> +++ b/net/core/ethtool.c
> @@ -1284,7 +1284,7 @@ static int ethtool_get_strings(struct net_device *dev, void __user *useraddr)
>  
>   gstrings.len = ret;
>  
> - data = kmalloc(gstrings.len * ETH_GSTRING_LEN, GFP_USER);
> + data = kcalloc(gstrings.len, ETH_GSTRING_LEN, GFP_USER);
>   if (!data)
>   return -ENOMEM;
>  
> 
> 
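Ben's point is that s{,n}printf() writes only up to the terminating NUL, so any bytes after it in an ETH_GSTRING_LEN slot keep whatever the allocator returned. Zeroing the whole buffer up front, as the kcalloc() change does, covers every driver at once. A user-space sketch with calloc() standing in for kcalloc(); the function names are hypothetical:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define ETH_GSTRING_LEN 32 /* bytes per ethtool string slot */

/* Fills 'count' per-queue stat names into consecutive slots. */
static char *fill_queue_stat_strings(unsigned int count)
{
	/* Zeroed allocation: bytes snprintf() does not write stay 0. */
	char *data = calloc(count, ETH_GSTRING_LEN);
	char *p = data;

	if (!data)
		return NULL;
	for (unsigned int i = 0; i < count; i++) {
		snprintf(p, ETH_GSTRING_LEN, "tx_queue_%u_packets", i);
		p += ETH_GSTRING_LEN;
	}
	return data;
}

/* Returns 1 if every slot is NUL-terminated and all bytes after the
 * terminator are zero, i.e. nothing stale would reach user space. */
static int slots_zero_padded(const char *data, unsigned int count)
{
	assert(data != NULL);
	for (unsigned int i = 0; i < count; i++) {
		const char *slot = data + (size_t)i * ETH_GSTRING_LEN;
		const char *nul = memchr(slot, '\0', ETH_GSTRING_LEN);

		if (!nul)
			return 0; /* unterminated slot */
		for (const char *q = nul; q < slot + ETH_GSTRING_LEN; q++)
			if (*q != '\0')
				return 0;
	}
	return 1;
}
```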
-- 
Ben Hutchings
[W]e found...that it wasn't as easy to get programs right as we had thought.
... I realized that a large part of my life from then on was going to be spent
in finding mistakes in my own programs. - Maurice Wilkes, 1949




Re: [PATCH net-next 3/3] tcp/dccp: fix race at listener dismantle phase

2015-10-14 Thread kbuild test robot
Hi Eric,

[auto build test WARNING on net-next/master -- if it's inappropriate base, please suggest rules for selecting the more suitable base]

url:
https://github.com/0day-ci/linux/commits/Eric-Dumazet/tcp-dccp-make-our-listener-code-more-robust/20151015-020006
reproduce:
# apt-get install sparse
make ARCH=x86_64 allmodconfig
make C=1 CF=-D__CHECK_ENDIAN__


sparse warnings: (new ones prefixed by >>)

>> net/ipv4/tcp_input.c:6238:17: sparse: context imbalance in 'tcp_conn_request' - unexpected unlock

vim +/tcp_conn_request +6238 net/ipv4/tcp_input.c

f7b3bec6 Florian Westphal 2014-11-03  6222  
f7b3bec6 Florian Westphal 2014-11-03  6223  if (want_cookie) {
f7b3bec6 Florian Westphal 2014-11-03  6224  isn = cookie_init_sequence(af_ops, sk, skb, &req->mss);
f7b3bec6 Florian Westphal 2014-11-03  6225  req->cookie_ts = tmp_opt.tstamp_ok;
f7b3bec6 Florian Westphal 2014-11-03  6226  if (!tmp_opt.tstamp_ok)
f7b3bec6 Florian Westphal 2014-11-03  6227  inet_rsk(req)->ecn_ok = 0;
f7b3bec6 Florian Westphal 2014-11-03  6228  }
f7b3bec6 Florian Westphal 2014-11-03  6229  
1fb6f159 Octavian Purdila 2014-06-25  6230  tcp_rsk(req)->snt_isn = isn;
58d607d3 Eric Dumazet 2015-09-15  6231  tcp_rsk(req)->txhash = net_tx_rndhash();
1fb6f159 Octavian Purdila 2014-06-25  6232  tcp_openreq_init_rwin(req, sk, dst);
ca6fb065 Eric Dumazet 2015-10-02  6233  if (!want_cookie) {
ca6fb065 Eric Dumazet 2015-10-02  6234  tcp_reqsk_record_syn(sk, req, skb);
7656d842 Eric Dumazet 2015-10-04  6235  fastopen_sk = tcp_try_fastopen(sk, skb, req, &foc, dst);
ca6fb065 Eric Dumazet 2015-10-02  6236  }
7c85af88 Eric Dumazet 2015-09-24  6237  if (fastopen_sk) {
ca6fb065 Eric Dumazet 2015-10-02 @6238  af_ops->send_synack(fastopen_sk, dst, &fl, req,
ca6fb065 Eric Dumazet 2015-10-02  6239  skb_get_queue_mapping(skb), &foc, false);
7656d842 Eric Dumazet 2015-10-04  6240  /* Add the child socket directly into the accept queue */
7656d842 Eric Dumazet 2015-10-04  6241  inet_csk_reqsk_queue_add(sk, req, fastopen_sk);
7656d842 Eric Dumazet 2015-10-04  6242  sk->sk_data_ready(sk);
7656d842 Eric Dumazet 2015-10-04  6243  bh_unlock_sock(fastopen_sk);
7c85af88 Eric Dumazet 2015-09-24  6244  sock_put(fastopen_sk);
7c85af88 Eric Dumazet 2015-09-24  6245  } else {
9439ce00 Eric Dumazet 2015-03-17  6246  tcp_rsk(req)->tfo_listener = false;

:: The code at line 6238 was first introduced by commit
:: ca6fb06518836ef9b65dc0aac02ff97704d52a05 tcp: attach SYNACK messages to request sockets instead of listener

:: TO: Eric Dumazet 
:: CC: David S. Miller 

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


Re: [PATCH net-next] route: fib_validate_source remove the <= RT_SCOPE_HOST test

2015-10-14 Thread Julian Anastasov

Hello,

On Thu, 15 Oct 2015, lucien xin wrote:

> yeah, I don't understand why err > 0 is necessary to set IPSKB_DOREDIRECT
> to send redirects.
> FIB_RES_NH(res).nh_scope >= RT_SCOPE_HOST, what's that mean?

It tells us that packet comes from remote address that
we can reach directly, without using gateway. The most
common values for nh_scope are RT_SCOPE_LINK (when nh_gw is
unicast address), RT_SCOPE_HOST (when nh_gw is not set or
is local address) and RT_SCOPE_NOWHERE (when we have a
local route). You can check fib_check_nh() and
fib_create_info() for reference.
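Julian's explanation can be summarized in code. The constants below mirror enum rt_scope_t from <linux/rtnetlink.h> and are redefined locally so the sketch builds without kernel headers; the helper functions are hypothetical. Larger scope values mean closer destinations, so the >= RT_SCOPE_HOST test is true only for the HOST and NOWHERE scopes:

```c
#include <assert.h>
#include <string.h>

/* Values mirror enum rt_scope_t in <linux/rtnetlink.h>. */
enum {
	RT_SCOPE_UNIVERSE = 0,
	RT_SCOPE_SITE = 200,
	RT_SCOPE_LINK = 253,
	RT_SCOPE_HOST = 254,
	RT_SCOPE_NOWHERE = 255,
};

/* Hypothetical helper: the common nh_scope values described above. */
static const char *nh_scope_meaning(unsigned char nh_scope)
{
	switch (nh_scope) {
	case RT_SCOPE_LINK:
		return "nh_gw is a unicast gateway address";
	case RT_SCOPE_HOST:
		return "no nh_gw, or nh_gw is a local address";
	case RT_SCOPE_NOWHERE:
		return "local route";
	default:
		return "other";
	}
}

/* The test discussed in the thread: FIB_RES_NH(res).nh_scope >= RT_SCOPE_HOST. */
static int nh_scope_at_least_host(unsigned char nh_scope)
{
	return nh_scope >= RT_SCOPE_HOST;
}
```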

Regards

--
Julian Anastasov 


Re: [patch net-next v5 2/8] switchdev: make struct switchdev_attr parameter const for attr_set calls

2015-10-14 Thread Vivien Didelot
On Oct. Wednesday 14 (42) 07:40 PM, Jiri Pirko wrote:
> From: Jiri Pirko 
> 
> Signed-off-by: Jiri Pirko 

Reviewed-by: Vivien Didelot 


Re: [PATCH net-next] switchdev: enforce no pvid flag in vlan ranges

2015-10-14 Thread Vivien Didelot
On Oct. Wednesday 14 (42) 08:42 PM, Ido Schimmel wrote:
> Wed, Oct 14, 2015 at 08:14:24PM IDT, sfel...@gmail.com wrote:
> >On Wed, Oct 14, 2015 at 8:25 AM, Vivien Didelot
> > wrote:
> >> On Oct. Wednesday 14 (42) 09:14 AM, Ido Schimmel wrote:
> >>> Tue, Oct 13, 2015 at 05:32:26PM IDT, vivien.dide...@savoirfairelinux.com wrote:
> >>> >On Oct. Tuesday 13 (42) 11:31 AM, Ido Schimmel wrote:
> >>> >> Mon, Oct 12, 2015 at 08:36:25PM IDT, vivien.dide...@savoirfairelinux.com wrote:
> >>> >> >Hi guys,
> >>> >> >
> >>> >> >On Oct. Monday 12 (42) 02:01 PM, Nikolay Aleksandrov wrote:
> >>> >> >> From: Nikolay Aleksandrov 
> >>> >> >>
> >>> >> >> We shouldn't allow BRIDGE_VLAN_INFO_PVID flag in VLAN ranges.
> >>> >> >>
> >>> >> >> Signed-off-by: Nikolay Aleksandrov 
> >>> >> >> ---
> >>> >> >>  net/switchdev/switchdev.c | 3 +++
> >>> >> >>  1 file changed, 3 insertions(+)
> >>> >> >>
> >>> >> >> diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
> >>> >> >> index 6e4a4f9ad927..256c596de896 100644
> >>> >> >> --- a/net/switchdev/switchdev.c
> >>> >> >> +++ b/net/switchdev/switchdev.c
> >>> >> >> @@ -720,6 +720,9 @@ static int switchdev_port_br_afspec(struct net_device *dev,
> >>> >> >> if (vlan.vid_begin)
> >>> >> >> return -EINVAL;
> >>> >> >> vlan.vid_begin = vinfo->vid;
> >>> >> >> +   /* don't allow range of pvids */
> >>> >> >> +   if (vlan.flags & BRIDGE_VLAN_INFO_PVID)
> >>> >> >> +   return -EINVAL;
> >>> >> >> } else if (vinfo->flags & BRIDGE_VLAN_INFO_RANGE_END) {
> >>> >> >> if (!vlan.vid_begin)
> >>> >> >> return -EINVAL;
> >>> >> >> --
> >>> >> >> 2.4.3
> >>> >> >>
> >>> >> >
> >>> >> >Yes the patch looks good, but it is a minor check though. I hope the
> >>> >> >subject of this thread is making sense.
> >>> >> >
> >>> >> >VLAN ranges seem to have been included for a UX purpose (so commands
> >>> >> >look like Cisco IOS). We don't want to change any existing interface,
> >>> >> >so we pushed that down to drivers, with the only valid reason that,
> >>> >> >maybe one day, hardware could be capable of programming a range on a
> >>> >> >per-port basis.
> >>> >> Hi,
> >>> >>
> >>> >> That's actually what we are doing in mlxsw. We can do up to 256
> >>> >> entries in one go. We've yet to submit this part.
> >>> >
> >>> >Perfect Ido, thanks for pointing this out! I'm OK with the range then.
> >>> >
> >>> >So there is now a very last question in my head for this, which is more
> >>> >a matter of kernel design. Should the user be aware of such underlying
> >>> >support? In other words, would it make sense to do this in a driver:
> >>> >
> >>> >foo_port_vlan_add(struct net_device *dev,
> >>> >  struct switchdev_obj_port_vlan *vlan)
> >>> >{
> >>> >if (vlan->vid_begin != vlan->vid_end)
> >>> >return -ENOTSUPP; /* or something more relevant for user */
> >>> >
> >>> >return foo_port_single_vlan_add(dev, vlan->vid_begin);
> >>> >}
> >>> >
> >>> >So drivers keep being simple, and we can easily propagate the fact that
> >>> >one-or-all VLAN is not supportable, vs. the VLAN feature itself is not
> >>> >implemented and must be done in software.
> >>> I think that if you want to keep it simple, then Scott's advice from the
> >>> previous thread is the most appropriate one. I believe the hardware you
> >>> are using is simply not meant to support multiple 802.1Q bridges.
> >>
> >> You mean allowing only one Linux bridge over a hardware switch?
> >>
> >> It would for sure simplify how, as developers and users, we represent a
> >> physical switch. But I am not sure how to achieve that and I don't have
> >> strong opinions on this TBH.
> >
> >Hi Vivien, I think it's possible to keep switch ports on just one
> >bridge if we do a little bit of work on the NETDEV_CHANGEUPPER
> >notifier.  This will give you the driver-level control you want.  Do
> >you have time to investigate?  The idea is:
> >
> >1) In your driver's handler for NETDEV_CHANGEUPPER, if a switch port is
> >being added to a second bridge, then return NOTIFY_BAD.  Your driver
> >needs to track the bridge count.
> >
> >2) In __netdev_upper_dev_link(), check the return code from the
> >call_netdevice_notifiers_info(NETDEV_CHANGEUPPER, ...) call, and if
> >NOTIFY_BAD, abort the linking operation (goto rollback_xxx).
> >
> Hi,
> 
> We are doing something similar in mlxsw (not upstream yet). Jiri
> introduced PRE_CHANGEUPPER, which is called from the function you
> mentioned, but before the linking operation (so that you don't need to
> rollback).
> 
> If the notification is about a linking operation and the master is a
> bridge different than the current one, then NOTIFY_BAD is returned.
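The approach Scott and Ido describe, refusing to link a switch port to a second bridge from a notifier that runs before the link is made, can be modeled in userspace. This is a minimal sketch under stated assumptions: `struct sim_switch`, `sim_pre_changeupper()` and `sim_changeupper()` are hypothetical names, bridges are identified by plain positive integers, the return codes only stand in for NOTIFY_OK/NOTIFY_BAD, and the real notifier plumbing is omitted.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for NOTIFY_OK / NOTIFY_BAD; values are illustrative. */
enum sim_notify { SIM_NOTIFY_OK = 0, SIM_NOTIFY_BAD = 1 };

/* Hypothetical switch-wide driver state. */
struct sim_switch {
	int bridge_id;		/* 0 means no port is bridged yet */
	unsigned int nports;	/* ports currently enslaved to bridge_id */
};

/* Pre-link check, run before the upper device is linked (the role
 * PRE_CHANGEUPPER plays in the thread), so a refusal needs no rollback. */
enum sim_notify sim_pre_changeupper(const struct sim_switch *sw,
				    int bridge_id, bool linking)
{
	if (linking && sw->bridge_id != 0 && sw->bridge_id != bridge_id)
		return SIM_NOTIFY_BAD;	/* a second bridge: refuse */
	return SIM_NOTIFY_OK;
}

/* Post-(un)link bookkeeping: remember the bridge while any port uses it. */
void sim_changeupper(struct sim_switch *sw, int bridge_id, bool linking)
{
	if (linking) {
		sw->bridge_id = bridge_id;
		sw->nports++;
	} else if (sw->nports > 0 && --sw->nports == 0) {
		sw->bridge_id = 0;
	}
}
```

Tracking a port count rather than a single flag lets the switch accept a different bridge once every port has left the old one.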

Great, I'll wait fo
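
For the patch that opened this thread, the rule it enforces (a VLAN range may not carry the PVID flag, since a port has a single PVID) can be sketched as a userspace walk over a list of VLAN entries. This is a simplified model, not switchdev_port_br_afspec() itself: the SIM_VLAN_INFO_* values are assumed to mirror BRIDGE_VLAN_INFO_PVID, _RANGE_BEGIN and _RANGE_END from <linux/if_bridge.h>, and `struct sim_vlan_info` and `sim_count_vlans()` are hypothetical.

```c
#include <errno.h>
#include <stdint.h>

/* Assumed to mirror the BRIDGE_VLAN_INFO_* bits; redefined for userspace. */
#define SIM_VLAN_INFO_PVID        (1 << 1)
#define SIM_VLAN_INFO_RANGE_BEGIN (1 << 3)
#define SIM_VLAN_INFO_RANGE_END   (1 << 4)

struct sim_vlan_info {
	uint16_t flags;
	uint16_t vid;
};

/* A RANGE_BEGIN entry opens a range, a RANGE_END entry closes it, and any
 * other entry is a single VLAN. Returns the number of VLANs covered, or
 * -EINVAL for malformed input. */
int sim_count_vlans(const struct sim_vlan_info *v, int n)
{
	int begin = 0, total = 0;

	for (int i = 0; i < n; i++) {
		if (v[i].flags & SIM_VLAN_INFO_RANGE_BEGIN) {
			if (begin)
				return -EINVAL;	/* range already open */
			/* The check the patch adds: no range of PVIDs. */
			if (v[i].flags & SIM_VLAN_INFO_PVID)
				return -EINVAL;
			begin = v[i].vid;
		} else if (v[i].flags & SIM_VLAN_INFO_RANGE_END) {
			if (!begin || v[i].vid <= begin)
				return -EINVAL;	/* end without begin, or empty */
			total += v[i].vid - begin + 1;
			begin = 0;
		} else {
			if (begin)
				return -EINVAL;	/* single VLAN inside a range */
			total += 1;
		}
	}
	return begin ? -EINVAL : total;	/* an unterminated range is invalid */
}
```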

Re: [PATCH net-next v2 6/6] net: dsa: remove port_fdb_getnext

2015-10-14 Thread Florian Fainelli
On 13/10/15 09:46, Vivien Didelot wrote:
> No driver implements port_fdb_getnext anymore, and port_fdb_dump is
> preferred anyway, so remove this function from DSA.
> 
> Signed-off-by: Vivien Didelot 

Acked-by: Florian Fainelli 
-- 
Florian


Re: [PATCH net-next v2 1/6] net: dsa: add port_fdb_dump function

2015-10-14 Thread Florian Fainelli
On 13/10/15 09:46, Vivien Didelot wrote:
> Not all switch chips support a Get Next operation to iterate over their FDB.
> So add a simpler port_fdb_dump function for them.
> 
> Signed-off-by: Vivien Didelot 

Acked-by: Florian Fainelli 
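The difference between the two styles is who owns the iteration: with a dump operation the driver walks its own table and hands each entry to a callback, so a chip without a Get Next primitive can still implement it. A hypothetical userspace sketch; `struct sim_fdb_entry`, `sim_port_fdb_dump()` and `sim_count_cb()` are illustrative names, not DSA's actual types or signature.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical FDB entry; not DSA's actual type. */
struct sim_fdb_entry {
	uint8_t addr[6];
	uint16_t vid;
	int port;
};

typedef int (*sim_fdb_cb)(const struct sim_fdb_entry *e, void *ctx);

/* Dump style: the driver scans its table in whatever order suits the
 * hardware and calls back for each entry on 'port', stopping early if
 * the callback returns nonzero. */
int sim_port_fdb_dump(const struct sim_fdb_entry *table, size_t n, int port,
		      sim_fdb_cb cb, void *ctx)
{
	for (size_t i = 0; i < n; i++) {
		int err;

		if (table[i].port != port)
			continue;
		err = cb(&table[i], ctx);
		if (err)
			return err;
	}
	return 0;
}

/* Example callback: count the entries seen. */
int sim_count_cb(const struct sim_fdb_entry *e, void *ctx)
{
	(void)e;
	(*(int *)ctx)++;
	return 0;
}
```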
-- 
Florian


[PATCH v2 net-next 2/3] tcp/dccp: add inet_csk_reqsk_queue_drop_and_put() helper

2015-10-14 Thread Eric Dumazet
Let's reduce the confusion about inet_csk_reqsk_queue_drop():
in many cases we also need to release the reference on the request socket,
so add a helper to do this, reducing code size and complexity.
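A minimal userspace model of the reference counting involved, assuming (as a simplification) that a hashed request socket holds one reference for the lookup table and one for the caller that found it. `struct sim_reqsk` and the sim_* functions are hypothetical; only the drop versus drop-and-put split mirrors the patch.

```c
#include <stdbool.h>

/* Hypothetical request-socket model. */
struct sim_reqsk {
	int refcnt;
	bool hashed;
	bool freed;
};

void sim_reqsk_put(struct sim_reqsk *req)
{
	if (--req->refcnt == 0)
		req->freed = true;	/* stands in for the actual free */
}

/* Like inet_csk_reqsk_queue_drop(): unhash the request and release only
 * the table's reference. The caller's reference is left untouched. */
void sim_queue_drop(struct sim_reqsk *req)
{
	if (req->hashed) {
		req->hashed = false;
		sim_reqsk_put(req);
	}
}

/* Like the new helper: drop, then release the caller's reference too. */
void sim_queue_drop_and_put(struct sim_reqsk *req)
{
	sim_queue_drop(req);
	sim_reqsk_put(req);
}
```

A caller that uses the plain drop and forgets the extra put leaves the object with one live reference, which is the leak the helper makes harder to write.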

Fixes: 4bdc3d66147b ("tcp/dccp: fix behavior of stale SYN_RECV request sockets")
Signed-off-by: Eric Dumazet 
---
 include/net/inet_connection_sock.h |  1 +
 net/dccp/ipv4.c|  2 +-
 net/dccp/ipv6.c|  2 +-
 net/ipv4/inet_connection_sock.c| 10 --
 net/ipv4/tcp_ipv4.c|  2 +-
 net/ipv6/tcp_ipv6.c|  2 +-
 6 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h
index 3208a65d1c28..89ecbc80b2ce 100644
--- a/include/net/inet_connection_sock.h
+++ b/include/net/inet_connection_sock.h
@@ -299,6 +299,7 @@ static inline int inet_csk_reqsk_queue_is_full(const struct sock *sk)
 }
 
 void inet_csk_reqsk_queue_drop(struct sock *sk, struct request_sock *req);
+void inet_csk_reqsk_queue_drop_and_put(struct sock *sk, struct request_sock *req);
 
 void inet_csk_destroy_sock(struct sock *sk);
 void inet_csk_prepare_forced_close(struct sock *sk);
diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c
index 644af510d932..59bc180b02d8 100644
--- a/net/dccp/ipv4.c
+++ b/net/dccp/ipv4.c
@@ -828,7 +828,7 @@ lookup:
if (likely(sk->sk_state == DCCP_LISTEN)) {
nsk = dccp_check_req(sk, skb, req);
} else {
-   inet_csk_reqsk_queue_drop(sk, req);
+   inet_csk_reqsk_queue_drop_and_put(sk, req);
goto lookup;
}
if (!nsk) {
diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c
index 68831931b1fe..d9cc731f2619 100644
--- a/net/dccp/ipv6.c
+++ b/net/dccp/ipv6.c
@@ -686,7 +686,7 @@ lookup:
if (likely(sk->sk_state == DCCP_LISTEN)) {
nsk = dccp_check_req(sk, skb, req);
} else {
-   inet_csk_reqsk_queue_drop(sk, req);
+   inet_csk_reqsk_queue_drop_and_put(sk, req);
goto lookup;
}
if (!nsk) {
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 514b9e910bd4..a5a1b54915e5 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -546,6 +546,13 @@ void inet_csk_reqsk_queue_drop(struct sock *sk, struct request_sock *req)
 }
 EXPORT_SYMBOL(inet_csk_reqsk_queue_drop);
 
+void inet_csk_reqsk_queue_drop_and_put(struct sock *sk, struct request_sock *req)
+{
+   inet_csk_reqsk_queue_drop(sk, req);
+   reqsk_put(req);
+}
+EXPORT_SYMBOL(inet_csk_reqsk_queue_drop_and_put);
+
 static void reqsk_timer_handler(unsigned long data)
 {
struct request_sock *req = (struct request_sock *)data;
@@ -608,8 +615,7 @@ static void reqsk_timer_handler(unsigned long data)
return;
}
 drop:
-   inet_csk_reqsk_queue_drop(sk_listener, req);
-   reqsk_put(req);
+   inet_csk_reqsk_queue_drop_and_put(sk_listener, req);
 }
 
 static void reqsk_queue_hash_req(struct request_sock *req,
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index aad2298de7ad..9c68cf3762c4 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1591,7 +1591,7 @@ process:
if (likely(sk->sk_state == TCP_LISTEN)) {
nsk = tcp_check_req(sk, skb, req, false);
} else {
-   inet_csk_reqsk_queue_drop(sk, req);
+   inet_csk_reqsk_queue_drop_and_put(sk, req);
goto lookup;
}
if (!nsk) {
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 7ce1c57199d1..acb06f86f372 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1386,7 +1386,7 @@ process:
if (likely(sk->sk_state == TCP_LISTEN)) {
nsk = tcp_check_req(sk, skb, req, false);
} else {
-   inet_csk_reqsk_queue_drop(sk, req);
+   inet_csk_reqsk_queue_drop_and_put(sk, req);
goto lookup;
}
if (!nsk) {
-- 
2.6.0.rc2.230.g3dd15c0



[PATCH v2 net-next 3/3] tcp/dccp: fix race at listener dismantle phase

2015-10-14 Thread Eric Dumazet
Under stress, a close() on a listener can trigger the
WARN_ON(sk->sk_ack_backlog) in inet_csk_listen_stop()

We need to test if the listener is still active before queueing
a child in inet_csk_reqsk_queue_add()

Create a common inet_child_forget() helper, and use it
from inet_csk_reqsk_queue_add() and inet_csk_listen_stop()
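
The essence of the fix is that the listener state is tested under the same lock that protects the accept queue, so a child can never be queued after the queue has been drained. A simplified userspace model, assuming a C11 atomic_flag spinlock in place of rskq_lock and a `listening` flag in place of sk_state == TCP_LISTEN; all sim_* names are hypothetical, and forgetting a child is reduced to setting a flag.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct sim_child {
	struct sim_child *next;
	bool forgotten;		/* set instead of destroying the child */
};

struct sim_listener {
	atomic_flag lock;	/* stands in for rskq_lock */
	bool listening;		/* stands in for sk_state == TCP_LISTEN */
	struct sim_child *head, *tail;
	int backlog;		/* stands in for sk_ack_backlog */
};

static void sim_lock(struct sim_listener *l)
{
	while (atomic_flag_test_and_set_explicit(&l->lock, memory_order_acquire))
		;	/* spin */
}

static void sim_unlock(struct sim_listener *l)
{
	atomic_flag_clear_explicit(&l->lock, memory_order_release);
}

void sim_listener_init(struct sim_listener *l)
{
	atomic_flag_clear(&l->lock);
	l->listening = true;
	l->head = l->tail = NULL;
	l->backlog = 0;
}

/* State check and enqueue happen in one critical section. */
void sim_queue_add(struct sim_listener *l, struct sim_child *c)
{
	sim_lock(l);
	if (!l->listening) {
		c->forgotten = true;	/* like inet_child_forget() */
	} else {
		c->next = NULL;
		if (l->head == NULL)
			l->head = c;
		else
			l->tail->next = c;
		l->tail = c;
		l->backlog++;
	}
	sim_unlock(l);
}

/* Stop listening and drain the queue under the same lock. */
void sim_listen_stop(struct sim_listener *l)
{
	sim_lock(l);
	l->listening = false;
	for (struct sim_child *c = l->head; c; c = c->next) {
		c->forgotten = true;
		l->backlog--;
	}
	l->head = l->tail = NULL;
	sim_unlock(l);
}
```

Because both paths take the lock, a late sim_queue_add() either runs before the drain (and its child is drained) or after it (and sees `listening == false`), so the backlog cannot be left nonzero.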

Signed-off-by: Eric Dumazet 
---
 include/net/inet_connection_sock.h |  9 ++---
 include/net/request_sock.h | 19 --
 net/ipv4/inet_connection_sock.c| 71 ++
 3 files changed, 51 insertions(+), 48 deletions(-)

diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h
index 89ecbc80b2ce..8b0e3d8a4d81 100644
--- a/include/net/inet_connection_sock.h
+++ b/include/net/inet_connection_sock.h
@@ -268,13 +268,8 @@ struct dst_entry *inet_csk_route_child_sock(const struct sock *sk,
struct sock *newsk,
const struct request_sock *req);
 
-static inline void inet_csk_reqsk_queue_add(struct sock *sk,
-   struct request_sock *req,
-   struct sock *child)
-{
-   reqsk_queue_add(&inet_csk(sk)->icsk_accept_queue, req, sk, child);
-}
-
+void inet_csk_reqsk_queue_add(struct sock *sk, struct request_sock *req,
+ struct sock *child);
 void inet_csk_reqsk_queue_hash_add(struct sock *sk, struct request_sock *req,
   unsigned long timeout);
 
diff --git a/include/net/request_sock.h b/include/net/request_sock.h
index 2e73748956d5..a0dde04eb178 100644
--- a/include/net/request_sock.h
+++ b/include/net/request_sock.h
@@ -186,25 +186,6 @@ static inline bool reqsk_queue_empty(const struct request_sock_queue *queue)
return queue->rskq_accept_head == NULL;
 }
 
-static inline void reqsk_queue_add(struct request_sock_queue *queue,
-  struct request_sock *req,
-  struct sock *parent,
-  struct sock *child)
-{
-   spin_lock(&queue->rskq_lock);
-   req->sk = child;
-   sk_acceptq_added(parent);
-
-   if (queue->rskq_accept_head == NULL)
-   queue->rskq_accept_head = req;
-   else
-   queue->rskq_accept_tail->dl_next = req;
-
-   queue->rskq_accept_tail = req;
-   req->dl_next = NULL;
-   spin_unlock(&queue->rskq_lock);
-}
-
 static inline struct request_sock *reqsk_queue_remove(struct request_sock_queue *queue,
  struct sock *parent)
 {
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index a5a1b54915e5..08eaa5e20574 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -764,6 +764,53 @@ int inet_csk_listen_start(struct sock *sk, const int nr_table_entries)
 }
 EXPORT_SYMBOL_GPL(inet_csk_listen_start);
 
+static void inet_child_forget(struct sock *sk, struct request_sock *req,
+ struct sock *child)
+{
+   sk->sk_prot->disconnect(child, O_NONBLOCK);
+
+   sock_orphan(child);
+
+   percpu_counter_inc(sk->sk_prot->orphan_count);
+
+   if (sk->sk_protocol == IPPROTO_TCP && tcp_rsk(req)->tfo_listener) {
+   BUG_ON(tcp_sk(child)->fastopen_rsk != req);
+   BUG_ON(sk != req->rsk_listener);
+
+   /* Paranoid, to prevent race condition if
+* an inbound pkt destined for child is
+* blocked by sock lock in tcp_v4_rcv().
+* Also to satisfy an assertion in
+* tcp_v4_destroy_sock().
+*/
+   tcp_sk(child)->fastopen_rsk = NULL;
+   }
+   inet_csk_destroy_sock(child);
+   reqsk_put(req);
+}
+
+void inet_csk_reqsk_queue_add(struct sock *sk, struct request_sock *req,
+ struct sock *child)
+{
+   struct request_sock_queue *queue = &inet_csk(sk)->icsk_accept_queue;
+
+   spin_lock(&queue->rskq_lock);
+   if (unlikely(sk->sk_state != TCP_LISTEN)) {
+   inet_child_forget(sk, req, child);
+   } else {
+   req->sk = child;
+   req->dl_next = NULL;
+   if (queue->rskq_accept_head == NULL)
+   queue->rskq_accept_head = req;
+   else
+   queue->rskq_accept_tail->dl_next = req;
+   queue->rskq_accept_tail = req;
+   sk_acceptq_added(sk);
+   }
+   spin_unlock(&queue->rskq_lock);
+}
+EXPORT_SYMBOL(inet_csk_reqsk_queue_add);
+
 /*
  * This routine closes sockets which have been at least partially
  * opened, but not yet accepted.
@@ -790,31 +837,11 @@ void inet_csk_listen_stop(struct sock *sk)
WARN_ON(sock_owned_by_user(child));
sock_hold(child);
 
-   sk->sk_prot->disc

[PATCH v2 net-next 1/3] Revert "inet: fix double request socket freeing"

2015-10-14 Thread Eric Dumazet
This reverts commit c69736696cf3742b37d850289dc0d7ead177bb14.

At the time of above commit, tcp_req_err() and dccp_req_err()
were dead code, as SYN_RECV request sockets were not yet in ehash table.

The real bug was fixed later, in a different commit.

We need to revert it to avoid leaking a refcount on the request socket.

inet_csk_reqsk_queue_drop_and_put() will be added
in the following commit to make it clear that inet_csk_reqsk_queue_drop()
does not release the reference owned by the caller.

Signed-off-by: Eric Dumazet 
---
 net/dccp/ipv4.c | 2 +-
 net/ipv4/tcp_ipv4.c | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c
index 0dcf1963b323..644af510d932 100644
--- a/net/dccp/ipv4.c
+++ b/net/dccp/ipv4.c
@@ -208,7 +208,6 @@ void dccp_req_err(struct sock *sk, u64 seq)
 
if (!between48(seq, dccp_rsk(req)->dreq_iss, dccp_rsk(req)->dreq_gss)) {
NET_INC_STATS_BH(net, LINUX_MIB_OUTOFWINDOWICMPS);
-   reqsk_put(req);
} else {
/*
 * Still in RESPOND, just remove it silently.
@@ -218,6 +217,7 @@ void dccp_req_err(struct sock *sk, u64 seq)
 */
inet_csk_reqsk_queue_drop(req->rsk_listener, req);
}
+   reqsk_put(req);
 }
 EXPORT_SYMBOL(dccp_req_err);
 
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 1ff0923df715..aad2298de7ad 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -324,7 +324,6 @@ void tcp_req_err(struct sock *sk, u32 seq)
 
if (seq != tcp_rsk(req)->snt_isn) {
NET_INC_STATS_BH(net, LINUX_MIB_OUTOFWINDOWICMPS);
-   reqsk_put(req);
} else {
/*
 * Still in SYN_RECV, just remove it silently.
@@ -332,9 +331,10 @@ void tcp_req_err(struct sock *sk, u32 seq)
 * created socket, and POSIX does not want network
 * errors returned from accept().
 */
-   NET_INC_STATS_BH(net, LINUX_MIB_LISTENDROPS);
inet_csk_reqsk_queue_drop(req->rsk_listener, req);
+   NET_INC_STATS_BH(net, LINUX_MIB_LISTENDROPS);
}
+   reqsk_put(req);
 }
 EXPORT_SYMBOL(tcp_req_err);
 
-- 
2.6.0.rc2.230.g3dd15c0



[PATCH v2 net-next 0/3] tcp/dccp: make our listener code more robust

2015-10-14 Thread Eric Dumazet
This patch series addresses request socket leaks and the listener dismantle
phase. It survives a stress test with listeners being added and removed
quite randomly.

Eric Dumazet (3):
  Revert "inet: fix double request socket freeing"
  tcp/dccp: add inet_csk_reqsk_queue_drop_and_put() helper
  tcp/dccp: fix race at listener dismantle phase

 include/net/inet_connection_sock.h | 10 ++---
 include/net/request_sock.h | 19 -
 net/dccp/ipv4.c|  4 +-
 net/dccp/ipv6.c|  2 +-
 net/ipv4/inet_connection_sock.c| 81 +++---
 net/ipv4/tcp_ipv4.c|  6 +--
 net/ipv6/tcp_ipv6.c|  2 +-
 7 files changed, 67 insertions(+), 57 deletions(-)

-- 
2.6.0.rc2.230.g3dd15c0



Re: [PATCH net-next 3/3] tcp/dccp: fix race at listener dismantle phase

2015-10-14 Thread Eric Dumazet
On Wed, 2015-10-14 at 10:58 -0700, Eric Dumazet wrote:


...


> diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
> index a5a1b54915e5..38b7ef8b0b78 100644
> --- a/net/ipv4/inet_connection_sock.c
> +++ b/net/ipv4/inet_connection_sock.c
> @@ -740,7 +740,7 @@ int inet_csk_listen_start(struct sock *sk, const int nr_table_entries)
>  
>   reqsk_queue_alloc(&icsk->icsk_accept_queue);
>  
> - sk->sk_max_ack_backlog = 0;
> + sk->sk_max_ack_backlog = nr_table_entries;
>   sk->sk_ack_backlog = 0;
>   inet_csk_delack_init(sk);
>  
> @@ -764,6 +764,53 @@ int inet_csk_listen_start(struct sock *sk, const int nr_table_entries)


Arg, this part was not meant to be there, sorry. Will send a v2




[PATCH net 1/3] openvswitch: Reject ct_state masks for unknown bits

2015-10-14 Thread Joe Stringer
Currently, 0-bits are generated in ct_state where the bit position is
undefined, and matches are accepted on these bit-positions. If userspace
requests to match the 0-value for this bit then it may expect only a
subset of traffic to match this value, whereas currently all packets
will have this bit set to 0. Fix this by rejecting such masks.
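
A minimal sketch of the validation the patch moves to, assuming SIM_CS_F_* values that mirror OVS_CS_F_* in <linux/openvswitch.h> (0x01 through 0x20). `sim_check_ct_state()` and `sim_full_ct_state_mask()` are hypothetical helpers; the second models the clipping nlattr_set() now applies when an all-ones ct_state mask is synthesized.

```c
#include <errno.h>
#include <stdint.h>

/* Assumed to mirror OVS_CS_F_* from <linux/openvswitch.h>. */
#define SIM_CS_F_NEW          0x01
#define SIM_CS_F_ESTABLISHED  0x02
#define SIM_CS_F_RELATED      0x04
#define SIM_CS_F_REPLY_DIR    0x08
#define SIM_CS_F_INVALID      0x10
#define SIM_CS_F_TRACKED      0x20

#define SIM_CT_SUPPORTED_MASK (SIM_CS_F_NEW | SIM_CS_F_ESTABLISHED | \
			       SIM_CS_F_RELATED | SIM_CS_F_REPLY_DIR | \
			       SIM_CS_F_INVALID | SIM_CS_F_TRACKED)

/* Applies to keys and masks alike: any bit the datapath never generates
 * is rejected, since matching on it would silently match everything. */
int sim_check_ct_state(uint32_t ct_state)
{
	return (ct_state & ~(uint32_t)SIM_CT_SUPPORTED_MASK) ? -EINVAL : 0;
}

/* An exact-match mask is clipped to the supported bits. */
uint32_t sim_full_ct_state_mask(void)
{
	return 0xffffffffu & SIM_CT_SUPPORTED_MASK;
}
```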

Signed-off-by: Joe Stringer 
---
 net/openvswitch/conntrack.h| 11 +--
 net/openvswitch/flow_netlink.c |  5 -
 2 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/net/openvswitch/conntrack.h b/net/openvswitch/conntrack.h
index da8714942c95..2d42b3640117 100644
--- a/net/openvswitch/conntrack.h
+++ b/net/openvswitch/conntrack.h
@@ -35,12 +35,9 @@ void ovs_ct_fill_key(const struct sk_buff *skb, struct sw_flow_key *key);
 int ovs_ct_put_key(const struct sw_flow_key *key, struct sk_buff *skb);
 void ovs_ct_free_action(const struct nlattr *a);
 
-static inline bool ovs_ct_state_supported(u32 state)
-{
-   return !(state & ~(OVS_CS_F_NEW | OVS_CS_F_ESTABLISHED |
-OVS_CS_F_RELATED | OVS_CS_F_REPLY_DIR |
-OVS_CS_F_INVALID | OVS_CS_F_TRACKED));
-}
+#define CT_SUPPORTED_MASK (OVS_CS_F_NEW | OVS_CS_F_ESTABLISHED | \
+  OVS_CS_F_RELATED | OVS_CS_F_REPLY_DIR | \
+  OVS_CS_F_INVALID | OVS_CS_F_TRACKED)
 #else
 #include 
 
@@ -94,5 +91,7 @@ static inline int ovs_ct_put_key(const struct sw_flow_key *key,
 }
 
 static inline void ovs_ct_free_action(const struct nlattr *a) { }
+
+#define CT_SUPPORTED_MASK 0
 #endif /* CONFIG_NF_CONNTRACK */
 #endif /* ovs_conntrack.h */
diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
index 171a691f1c32..bd710bc37469 100644
--- a/net/openvswitch/flow_netlink.c
+++ b/net/openvswitch/flow_netlink.c
@@ -816,7 +816,7 @@ static int metadata_from_nlattrs(struct net *net, struct sw_flow_match *match,
ovs_ct_verify(net, OVS_KEY_ATTR_CT_STATE)) {
u32 ct_state = nla_get_u32(a[OVS_KEY_ATTR_CT_STATE]);
 
-   if (!is_mask && !ovs_ct_state_supported(ct_state)) {
+   if (ct_state & ~CT_SUPPORTED_MASK) {
OVS_NLERR(log, "ct_state flags %08x unsupported",
  ct_state);
return -EINVAL;
@@ -1099,6 +1099,9 @@ static void nlattr_set(struct nlattr *attr, u8 val,
} else {
memset(nla_data(nla), val, nla_len(nla));
}
+
+   if (nla_type(nla) == OVS_KEY_ATTR_CT_STATE)
+   *(u32 *)nla_data(nla) &= CT_SUPPORTED_MASK;
}
 }
 
-- 
2.1.4



[PATCH net 3/3] openvswitch: Serialize nested ct actions if provided

2015-10-14 Thread Joe Stringer
If userspace provides a ct action with no nested mark or label, then the
storage for these fields is zeroed. Later when actions are requested,
such zeroed fields are serialized even though userspace didn't
originally specify them. Fix the behaviour by ensuring that no action is
serialized in this case, and reject actions where userspace attempts to
set these fields with mask=0. This should make netlink marshalling
consistent across deserialization/reserialization.
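
A hypothetical userspace model of the round-trip rule: a mark or labels attribute with an all-zero mask is rejected on input, and on output an all-zero mask means "not specified", so nothing is emitted. The 16-byte label size and all sim_* names are assumptions for illustration.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SIM_LABELS_LEN 16	/* assumed 128-bit connection labels */

/* Hypothetical stand-ins for struct md_mark / struct md_labels. */
struct sim_mark { uint32_t value, mask; };
struct sim_labels { uint8_t value[SIM_LABELS_LEN], mask[SIM_LABELS_LEN]; };

static bool sim_labels_mask_zero(const struct sim_labels *l)
{
	for (size_t i = 0; i < SIM_LABELS_LEN; i++)
		if (l->mask[i])
			return false;
	return true;
}

/* Parsing: an explicit attribute with an all-zero mask would be
 * indistinguishable from an absent one, so reject it. */
int sim_parse_mark(const struct sim_mark *m)
{
	return m->mask ? 0 : -EINVAL;
}

int sim_parse_labels(const struct sim_labels *l)
{
	return sim_labels_mask_zero(l) ? -EINVAL : 0;
}

/* Serialization: emit an attribute only if it was specified.
 * Returns bit 0 for "mark emitted" and bit 1 for "labels emitted". */
unsigned int sim_attrs_to_emit(const struct sim_mark *m,
			       const struct sim_labels *l)
{
	unsigned int out = 0;

	if (m->mask)
		out |= 1u << 0;
	if (!sim_labels_mask_zero(l))
		out |= 1u << 1;
	return out;
}
```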

Reported-by: Jarno Rajahalme 
Signed-off-by: Joe Stringer 
---
 net/openvswitch/conntrack.c | 21 -
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 480dbb9095b7..ba29e6c2e0d4 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -540,6 +540,16 @@ static int ovs_ct_add_helper(struct ovs_conntrack_info *info, const char *name,
return 0;
 }
 
+static bool label_zero(const struct ovs_key_ct_labels *labels)
+{
+   int i;
+
+   for (i = 0; i < sizeof(*labels); i++)
+   if (labels->ct_labels[i])
+   return false;
+   return true;
+}
+
 static const struct ovs_ct_len_tbl ovs_ct_attr_lens[OVS_CT_ATTR_MAX + 1] = {
[OVS_CT_ATTR_COMMIT]= { .minlen = 0, .maxlen = 0 },
[OVS_CT_ATTR_ZONE]  = { .minlen = sizeof(u16),
@@ -589,6 +599,10 @@ static int parse_ct(const struct nlattr *attr, struct ovs_conntrack_info *info,
case OVS_CT_ATTR_MARK: {
struct md_mark *mark = nla_data(a);
 
+   if (!mark->mask) {
+   OVS_NLERR(log, "ct_mark mask cannot be 0");
+   return -EINVAL;
+   }
info->mark = *mark;
break;
}
@@ -597,6 +611,10 @@ static int parse_ct(const struct nlattr *attr, struct ovs_conntrack_info *info,
case OVS_CT_ATTR_LABELS: {
struct md_labels *labels = nla_data(a);
 
+   if (label_zero(&labels->mask)) {
+   OVS_NLERR(log, "ct_labels mask cannot be 0");
+   return -EINVAL;
+   }
info->labels = *labels;
break;
}
@@ -707,11 +725,12 @@ int ovs_ct_action_to_attr(const struct ovs_conntrack_info *ct_info,
if (IS_ENABLED(CONFIG_NF_CONNTRACK_ZONES) &&
nla_put_u16(skb, OVS_CT_ATTR_ZONE, ct_info->zone.id))
return -EMSGSIZE;
-   if (IS_ENABLED(CONFIG_NF_CONNTRACK_MARK) &&
+   if (IS_ENABLED(CONFIG_NF_CONNTRACK_MARK) && ct_info->mark.mask &&
nla_put(skb, OVS_CT_ATTR_MARK, sizeof(ct_info->mark),
&ct_info->mark))
return -EMSGSIZE;
if (IS_ENABLED(CONFIG_NF_CONNTRACK_LABELS) &&
+   !label_zero(&ct_info->labels.mask) &&
nla_put(skb, OVS_CT_ATTR_LABELS, sizeof(ct_info->labels),
&ct_info->labels))
return -EMSGSIZE;
-- 
2.1.4



[PATCH net] openvswitch: Scrub skb between namespaces

2015-10-14 Thread Joe Stringer
If OVS receives a packet from another namespace, then the packet should
be scrubbed. However, people have already begun to rely on the behaviour
that skb->mark is preserved across namespaces, so retain this one field.

This is mainly to address information leakage between namespaces when
using OVS internal ports, but by placing it in ovs_vport_receive() it is
more generally applicable, meaning it should not be overlooked if other
port types are allowed to be moved into namespaces in future.
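
A simplified model of the receive-path change, assuming a packet struct with just a namespace id, a mark and a conntrack pointer. `sim_scrub()` clears only those fields and stands in for skb_scrub_packet(skb, true), which resets more state than shown; all sim_* names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packet metadata: only the fields needed for the idea. */
struct sim_skb {
	int netns_id;		/* namespace of the receiving device */
	uint32_t mark;
	void *conntrack;	/* stands in for nfct */
};

/* Stand-in for a cross-namespace scrub: wipe namespace-local state. */
static void sim_scrub(struct sim_skb *skb)
{
	skb->mark = 0;
	skb->conntrack = NULL;
}

/* Save the mark, scrub (and drop tunnel info) if the packet came from a
 * different namespace than the datapath, then restore the mark. */
void sim_vport_receive(struct sim_skb *skb, int dp_netns_id,
		       const void **tun_info)
{
	uint32_t mark = skb->mark;

	if (skb->netns_id != dp_netns_id) {
		sim_scrub(skb);
		*tun_info = NULL;
	}
	skb->mark = mark;
}
```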

Signed-off-by: Joe Stringer 
---
I originally proposed this patch as part of the conntrack changes to OVS,
and there was some discussion on that thread, culminating here:
http://www.spinics.net/lists/netdev/msg338626.html

We also discussed this a bit in Seattle, however I didn't follow up
immediately so I don't exactly recall what the consensus was. Following
Jesse's direction in the above thread, I'm proposing that we preserve the
mark, but scrub the rest. Also fixed the use-after-free bug present in the
previous version.

I think this is relevant for 'net', because this is the first time that
the metadata_dst and nfct are exposed (albeit indirectly) through OVS so it
would be nice to get agreement on the expected behaviour.
---
 net/openvswitch/vport.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/net/openvswitch/vport.c b/net/openvswitch/vport.c
index fc5c0b9ccfe9..70f19ea99b92 100644
--- a/net/openvswitch/vport.c
+++ b/net/openvswitch/vport.c
@@ -440,10 +440,17 @@ int ovs_vport_receive(struct vport *vport, struct sk_buff *skb,
  const struct ip_tunnel_info *tun_info)
 {
struct sw_flow_key key;
+   u32 mark = skb->mark;
int error;
 
OVS_CB(skb)->input_vport = vport;
OVS_CB(skb)->mru = 0;
+   if (dev_net(skb->dev) != ovs_dp_get_net(vport->dp)) {
+   skb_scrub_packet(skb, true);
+   tun_info = NULL;
+   }
+   skb->mark = mark;
+
/* Extract flow from 'skb' into 'key'. */
error = ovs_flow_key_extract(tun_info, skb, &key);
if (unlikely(error)) {
-- 
2.1.4



[PATCH net 2/3] openvswitch: Treat IP_CT_RELATED as new

2015-10-14 Thread Joe Stringer
New, related connections are marked as such as part of ovs_ct_lookup(),
but they are not marked as "new" if the commit flag is used. Make this
consistent by treating IP_CT_RELATED as new as well.
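
A userspace sketch of the state mapping after this patch. The enum and flag values are hypothetical stand-ins (only the NEW, ESTABLISHED and RELATED bits are modeled, and the reply-direction bit is left out); the point is the fall-through from the related case into the related-reply case.

```c
#include <stdint.h>

/* Hypothetical mirror of enum ip_conntrack_info; values illustrative. */
enum sim_ctinfo {
	SIM_CT_ESTABLISHED,
	SIM_CT_RELATED,
	SIM_CT_NEW,
	SIM_CT_ESTABLISHED_REPLY,
	SIM_CT_RELATED_REPLY,
};

#define SIM_CS_F_NEW          0x01
#define SIM_CS_F_ESTABLISHED  0x02
#define SIM_CS_F_RELATED      0x04

uint8_t sim_ct_get_state(enum sim_ctinfo ctinfo)
{
	uint8_t ct_state = 0;

	switch (ctinfo) {
	case SIM_CT_ESTABLISHED:
	case SIM_CT_ESTABLISHED_REPLY:
		ct_state |= SIM_CS_F_ESTABLISHED;
		break;
	case SIM_CT_RELATED:
		/* The first packet of a related connection is also new. */
		ct_state |= SIM_CS_F_NEW;
		/* fall through */
	case SIM_CT_RELATED_REPLY:
		ct_state |= SIM_CS_F_RELATED;
		break;
	case SIM_CT_NEW:
		ct_state |= SIM_CS_F_NEW;
		break;
	}
	return ct_state;
}
```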

Reported-by: Jarno Rajahalme 
Signed-off-by: Joe Stringer 
---
 net/openvswitch/conntrack.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 80bf702715bb..480dbb9095b7 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -86,6 +86,8 @@ static u8 ovs_ct_get_state(enum ip_conntrack_info ctinfo)
ct_state |= OVS_CS_F_ESTABLISHED;
break;
case IP_CT_RELATED:
+   ct_state |= OVS_CS_F_NEW;
+   /* Fall through */
case IP_CT_RELATED_REPLY:
ct_state |= OVS_CS_F_RELATED;
break;
-- 
2.1.4



Re: [PATCH net-next] drivers/net: get rid of unnecessary initializations in .get_drvinfo()

2015-10-14 Thread Ben Hutchings
On Wed, 2015-10-14 at 18:27 +0200, Ivan Vecera wrote:
> Many drivers needlessly initialize the n_priv_flags, n_stats, testinfo_len,
> eedump_len and regdump_len fields in their .get_drvinfo() ethtool op.
> This is unnecessary, as these fields are filled in by ethtool_get_drvinfo().
> 
> Signed-off-by: Ivan Vecera 
[...]

Acked-by: Ben Hutchings 
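
Why the driver-side initializations are dead stores can be shown with a small model of the call order: the core zeroes the structure, calls the driver's .get_drvinfo(), then fills the counts from dedicated callbacks. This is a simplified sketch with hypothetical sim_* names and only two of the count fields; the real ethtool_get_drvinfo() only overwrites a count when the corresponding callback exists and succeeds.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical subset of struct ethtool_drvinfo. */
struct sim_drvinfo {
	char driver[32];
	uint32_t n_stats;
	uint32_t regdump_len;
};

/* Hypothetical driver op: only the identification string matters. */
static void sim_driver_get_drvinfo(struct sim_drvinfo *info)
{
	snprintf(info->driver, sizeof(info->driver), "%s", "foo");
	info->n_stats = 0;	/* dead store: the core overwrites it */
}

/* Hypothetical per-driver count callbacks. */
static int sim_driver_get_sset_count(void) { return 12; }
static int sim_driver_get_regs_len(void) { return 256; }

/* Core: zero, call the driver, then fill counts from the callbacks. */
void sim_core_get_drvinfo(struct sim_drvinfo *info)
{
	int rc;

	memset(info, 0, sizeof(*info));
	sim_driver_get_drvinfo(info);
	rc = sim_driver_get_sset_count();
	if (rc >= 0)
		info->n_stats = (uint32_t)rc;
	info->regdump_len = (uint32_t)sim_driver_get_regs_len();
}
```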

-- 
Ben Hutchings
[W]e found...that it wasn't as easy to get programs right as we had thought.
... I realized that a large part of my life from then on was going to be spent
in finding mistakes in my own programs. - Maurice Wilkes, 1949



Re: [PATCH v3 3/5] net: phy: Add Broadcom phy library for common interfaces

2015-10-14 Thread Robert E Cochran

On 10/06/2015 03:25 PM, Arun Parameswaran wrote:

This patch adds the Broadcom phy library to consolidate common
interfaces shared by Broadcom PHYs.

The BCM54612E is included in the Broadcom Community part portfolio
(https://community.broadcom.com). However, I don't see this part
explicitly supported by your phy library (e.g., it is not included in
broadcom_drivers[] in broadcom.c).

Can you please comment on whether this part is supported, or on the
extent of changes required to establish and support a robust GigE
connection between RGMII and CU?

We're considering this part for a new embedded design, and we need an
open source driver for it.

Thanks

Bob

Moved the common interfaces to the 'bcm-phy-lib.c' and updated
the Broadcom PHY drivers to use the new APIs.
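
The expansion-register helpers this patch moves into bcm-phy-lib.c follow a select-then-access pattern: write the register number to a selector register, then read or write a data register. A userspace model with a fake register file, assuming placeholder register numbers SIM_EXP_SEL and SIM_EXP_DATA (the real MII_BCM54XX_EXP_SEL/_DATA values are in <linux/brcmphy.h>) and ignoring the bank-select bits that real expansion addresses carry; all sim_* names are hypothetical.

```c
#include <stdint.h>

/* Placeholder register numbers for the select/data pair. */
#define SIM_EXP_SEL   0x17
#define SIM_EXP_DATA  0x15

/* A fake PHY: 32 MII registers plus an indirectly reachable bank. */
struct sim_phy {
	uint16_t mii[32];
	uint16_t exp[256];
};

static void sim_mdio_write(struct sim_phy *p, int reg, uint16_t val)
{
	if (reg == SIM_EXP_DATA)
		p->exp[p->mii[SIM_EXP_SEL] & 0xff] = val;	/* indirect */
	else
		p->mii[reg] = val;
}

static uint16_t sim_mdio_read(struct sim_phy *p, int reg)
{
	if (reg == SIM_EXP_DATA)
		return p->exp[p->mii[SIM_EXP_SEL] & 0xff];	/* indirect */
	return p->mii[reg];
}

/* Same shape as bcm_phy_write_exp(): select, then write data. */
void sim_write_exp(struct sim_phy *p, uint16_t reg, uint16_t val)
{
	sim_mdio_write(p, SIM_EXP_SEL, reg);
	sim_mdio_write(p, SIM_EXP_DATA, val);
}

/* Same shape as bcm_phy_read_exp(): select, read data, then restore
 * the selector to its default of 0. */
uint16_t sim_read_exp(struct sim_phy *p, uint16_t reg)
{
	uint16_t val;

	sim_mdio_write(p, SIM_EXP_SEL, reg);
	val = sim_mdio_read(p, SIM_EXP_DATA);
	sim_mdio_write(p, SIM_EXP_SEL, 0);
	return val;
}
```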

Signed-off-by: Arun Parameswaran 
---
  drivers/net/phy/Kconfig   |   6 ++
  drivers/net/phy/Makefile  |   1 +
  drivers/net/phy/bcm-phy-lib.c | 209 ++
  drivers/net/phy/bcm-phy-lib.h |  37 
  drivers/net/phy/bcm63xx.c |  38 +---
  drivers/net/phy/bcm7xxx.c | 127 ++---
  drivers/net/phy/broadcom.c| 149 +-
  include/linux/brcmphy.h   |  22 +
  8 files changed, 333 insertions(+), 256 deletions(-)
  create mode 100644 drivers/net/phy/bcm-phy-lib.c
  create mode 100644 drivers/net/phy/bcm-phy-lib.h

diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig
index b57f6c2..606fdc9 100644
--- a/drivers/net/phy/Kconfig
+++ b/drivers/net/phy/Kconfig
@@ -69,8 +69,12 @@ config SMSC_PHY
---help---
  Currently supports the LAN83C185, LAN8187 and LAN8700 PHYs
  
+config BCM_NET_PHYLIB
+   tristate
+
  config BROADCOM_PHY
tristate "Drivers for Broadcom PHYs"
+   select BCM_NET_PHYLIB
---help---
  Currently supports the BCM5411, BCM5421, BCM5461, BCM54616S, BCM5464,
  BCM5481 and BCM5482 PHYs.
@@ -78,11 +82,13 @@ config BROADCOM_PHY
  config BCM63XX_PHY
tristate "Drivers for Broadcom 63xx SOCs internal PHY"
depends on BCM63XX
+   select BCM_NET_PHYLIB
---help---
  Currently supports the 6348 and 6358 PHYs.
  
  config BCM7XXX_PHY
tristate "Drivers for Broadcom 7xxx SOCs internal PHYs"
+   select BCM_NET_PHYLIB
---help---
  Currently supports the BCM7366, BCM7439, BCM7445, and
  40nm and 65nm generation of BCM7xxx Set Top Box SoCs.
diff --git a/drivers/net/phy/Makefile b/drivers/net/phy/Makefile
index f4e6eb9..6932475 100644
--- a/drivers/net/phy/Makefile
+++ b/drivers/net/phy/Makefile
@@ -12,6 +12,7 @@ obj-$(CONFIG_QSEMI_PHY)   += qsemi.o
  obj-$(CONFIG_SMSC_PHY)+= smsc.o
  obj-$(CONFIG_TERANETICS_PHY)  += teranetics.o
  obj-$(CONFIG_VITESSE_PHY) += vitesse.o
+obj-$(CONFIG_BCM_NET_PHYLIB)   += bcm-phy-lib.o
  obj-$(CONFIG_BROADCOM_PHY)+= broadcom.o
  obj-$(CONFIG_BCM63XX_PHY) += bcm63xx.o
  obj-$(CONFIG_BCM7XXX_PHY) += bcm7xxx.o
diff --git a/drivers/net/phy/bcm-phy-lib.c b/drivers/net/phy/bcm-phy-lib.c
new file mode 100644
index 000..13e161e
--- /dev/null
+++ b/drivers/net/phy/bcm-phy-lib.c
@@ -0,0 +1,209 @@
+/*
+ * Copyright (C) 2015 Broadcom Corporation
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation version 2.
+ *
+ * This program is distributed "as is" WITHOUT ANY WARRANTY of any
+ * kind, whether express or implied; without even the implied warranty
+ * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ */
+
+#include "bcm-phy-lib.h"
+#include 
+#include 
+#include 
+#include 
+
+#define MII_BCM_CHANNEL_WIDTH 0x2000
+#define BCM_CL45VEN_EEE_ADV   0x3c
+
+int bcm_phy_write_exp(struct phy_device *phydev, u16 reg, u16 val)
+{
+   int rc;
+
+   rc = phy_write(phydev, MII_BCM54XX_EXP_SEL, reg);
+   if (rc < 0)
+   return rc;
+
+   return phy_write(phydev, MII_BCM54XX_EXP_DATA, val);
+}
+EXPORT_SYMBOL_GPL(bcm_phy_write_exp);
+
+int bcm_phy_read_exp(struct phy_device *phydev, u16 reg)
+{
+   int val;
+
+   val = phy_write(phydev, MII_BCM54XX_EXP_SEL, reg);
+   if (val < 0)
+   return val;
+
+   val = phy_read(phydev, MII_BCM54XX_EXP_DATA);
+
+   /* Restore default value.  It's O.K. if this write fails. */
+   phy_write(phydev, MII_BCM54XX_EXP_SEL, 0);
+
+   return val;
+}
+EXPORT_SYMBOL_GPL(bcm_phy_read_exp);
+
+int bcm_phy_write_misc(struct phy_device *phydev,
+  u16 reg, u16 chl, u16 val)
+{
+   int rc;
+   int tmp;
+
+   rc = phy_write(phydev, MII_BCM54XX_AUX_CTL,
+  MII_BCM54XX_AUXCTL_SHDWSEL_MISC);
+   if (rc < 0)
+   return rc;
+
+   tmp = phy_read(phydev, MII_BCM54XX_AUX_CTL);
+   tmp |= MII_BCM

[PATCH net-next 2/3] tcp/dccp: add inet_csk_reqsk_queue_drop_and_put() helper

2015-10-14 Thread Eric Dumazet
Let's reduce the confusion about inet_csk_reqsk_queue_drop():
in many cases we also need to release the reference on the request socket,
so add a helper to do this, reducing code size and complexity.

Fixes: 4bdc3d66147b ("tcp/dccp: fix behavior of stale SYN_RECV request sockets")
Signed-off-by: Eric Dumazet 
---
 include/net/inet_connection_sock.h |  1 +
 net/dccp/ipv4.c|  2 +-
 net/dccp/ipv6.c|  2 +-
 net/ipv4/inet_connection_sock.c| 10 --
 net/ipv4/tcp_ipv4.c|  2 +-
 net/ipv6/tcp_ipv6.c|  2 +-
 6 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h
index 3208a65d1c28..89ecbc80b2ce 100644
--- a/include/net/inet_connection_sock.h
+++ b/include/net/inet_connection_sock.h
@@ -299,6 +299,7 @@ static inline int inet_csk_reqsk_queue_is_full(const struct sock *sk)
 }
 
 void inet_csk_reqsk_queue_drop(struct sock *sk, struct request_sock *req);
+void inet_csk_reqsk_queue_drop_and_put(struct sock *sk, struct request_sock *req);
 
 void inet_csk_destroy_sock(struct sock *sk);
 void inet_csk_prepare_forced_close(struct sock *sk);
diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c
index 644af510d932..59bc180b02d8 100644
--- a/net/dccp/ipv4.c
+++ b/net/dccp/ipv4.c
@@ -828,7 +828,7 @@ lookup:
       if (likely(sk->sk_state == DCCP_LISTEN)) {
          nsk = dccp_check_req(sk, skb, req);
       } else {
-         inet_csk_reqsk_queue_drop(sk, req);
+         inet_csk_reqsk_queue_drop_and_put(sk, req);
          goto lookup;
       }
       if (!nsk) {
diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c
index 68831931b1fe..d9cc731f2619 100644
--- a/net/dccp/ipv6.c
+++ b/net/dccp/ipv6.c
@@ -686,7 +686,7 @@ lookup:
       if (likely(sk->sk_state == DCCP_LISTEN)) {
          nsk = dccp_check_req(sk, skb, req);
       } else {
-         inet_csk_reqsk_queue_drop(sk, req);
+         inet_csk_reqsk_queue_drop_and_put(sk, req);
          goto lookup;
       }
       if (!nsk) {
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 514b9e910bd4..a5a1b54915e5 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -546,6 +546,13 @@ void inet_csk_reqsk_queue_drop(struct sock *sk, struct request_sock *req)
 }
 EXPORT_SYMBOL(inet_csk_reqsk_queue_drop);
 
+void inet_csk_reqsk_queue_drop_and_put(struct sock *sk, struct request_sock *req)
+{
+   inet_csk_reqsk_queue_drop(sk, req);
+   reqsk_put(req);
+}
+EXPORT_SYMBOL(inet_csk_reqsk_queue_drop_and_put);
+
 static void reqsk_timer_handler(unsigned long data)
 {
    struct request_sock *req = (struct request_sock *)data;
@@ -608,8 +615,7 @@ static void reqsk_timer_handler(unsigned long data)
       return;
    }
 drop:
-   inet_csk_reqsk_queue_drop(sk_listener, req);
-   reqsk_put(req);
+   inet_csk_reqsk_queue_drop_and_put(sk_listener, req);
 }
 
 static void reqsk_queue_hash_req(struct request_sock *req,
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index aad2298de7ad..9c68cf3762c4 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1591,7 +1591,7 @@ process:
       if (likely(sk->sk_state == TCP_LISTEN)) {
          nsk = tcp_check_req(sk, skb, req, false);
       } else {
-         inet_csk_reqsk_queue_drop(sk, req);
+         inet_csk_reqsk_queue_drop_and_put(sk, req);
          goto lookup;
       }
       if (!nsk) {
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 7ce1c57199d1..acb06f86f372 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1386,7 +1386,7 @@ process:
       if (likely(sk->sk_state == TCP_LISTEN)) {
          nsk = tcp_check_req(sk, skb, req, false);
       } else {
-         inet_csk_reqsk_queue_drop(sk, req);
+         inet_csk_reqsk_queue_drop_and_put(sk, req);
          goto lookup;
       }
       if (!nsk) {
-- 
2.6.0.rc2.230.g3dd15c0


