date:20161011

Re: [PATCH v6] net: ip, diag -- Add diag interface for raw sockets

2016-10-11 Thread David Miller

From: Cyrill Gorcunov 
Date: Thu, 6 Oct 2016 13:00:55 +0300

> v6:
>  - use sdiag_raw_protocol() helper which will access @pad
>    structure used for raw sockets protocol specification:
>    we can't simply rename this member without breaking uapi.

Macros that look like function calls and are also lvalues tend to be
troublesome.

I know what you're trying to achieve, you want a named way to access
this so that the intent and semantics are clear.

But I'd rather you do something that provides a way by which normal
struct member accesses do the job, and your earlier patches achieved
this.
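
To illustrate the distinction drawn above, here is a minimal userspace
sketch. It assumes a made-up request layout (struct demo_req_old and
struct demo_req_new are hypothetical, not the real inet_diag_req_v2) and
shows one possible way to give an existing padding byte a second name
through an anonymous union, so that plain member access works and the
layout stays the same. It is not taken from any version of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical request layout, NOT the real inet_diag_req_v2. */
struct demo_req_old {
	uint8_t family;
	uint8_t protocol;
	uint8_t ext;
	uint8_t pad;		/* existing name that cannot be renamed */
	uint32_t states;
};

/* The style objected to: reads like a call, but is used as an lvalue. */
#define demo_raw_protocol(r) ((r)->pad)

/* Alternative: an anonymous union gives the same byte a second name. */
struct demo_req_new {
	uint8_t family;
	uint8_t protocol;
	uint8_t ext;
	union {
		uint8_t pad;
		uint8_t raw_protocol;
	};
	uint32_t states;
};

/* The layout must not change, otherwise existing users would break. */
_Static_assert(sizeof(struct demo_req_old) == sizeof(struct demo_req_new),
	       "size changed");
_Static_assert(offsetof(struct demo_req_old, pad) ==
	       offsetof(struct demo_req_new, raw_protocol),
	       "offset changed");

static uint8_t set_via_macro(struct demo_req_old *r, uint8_t proto)
{
	assert(r != NULL);
	demo_raw_protocol(r) = proto;	/* assignment to a "function call" */
	return r->pad;
}

static uint8_t set_via_member(struct demo_req_new *r, uint8_t proto)
{
	assert(r != NULL);
	r->raw_protocol = proto;	/* ordinary struct member access */
	return r->pad;			/* same byte under the old name */
}
```

Both helpers store into the same byte; the second form lets the compiler
and the reader treat the access like any other member.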

Re: [PATCH net-next v2] openvswitch: correctly fragment packet with mpls headers

2016-10-11 Thread David Miller

From: Jiri Benc 
Date: Wed,  5 Oct 2016 15:01:57 +0200

> If mpls headers were pushed to a defragmented packet, the refragmentation no
> longer works correctly after 48d2ab609b6b ("net: mpls: Fixups for GSO"). The
> network header has to be shifted after the mpls headers for the
> fragmentation and restored afterwards.
> 
> Fixes: 48d2ab609b6b ("net: mpls: Fixups for GSO")
> Signed-off-by: Jiri Benc 

Applied.

Re: [PATCH 1/2] net: mv643xx_eth: use phydev from struct net_device

2016-10-11 Thread David Miller

From: Philippe Reynes 
Date: Sun,  2 Oct 2016 12:06:48 +0200

> The private structure contains a pointer to phydev, but the structure
> net_device already contains such a pointer. So we can remove the pointer
> phydev in the private structure, and update the driver to use the
> one contained in struct net_device.
> 
> Signed-off-by: Philippe Reynes 

Applied.

Re: [PATCH] net: dsa: slave: use new api ethtool_{get|set}_link_ksettings

2016-10-11 Thread David Miller

From: Philippe Reynes 
Date: Sun,  9 Oct 2016 17:00:53 +0200

> The ethtool api {get|set}_settings is deprecated.
> We move this driver to new api {get|set}_link_ksettings.
> 
> Signed-off-by: Philippe Reynes 

Applied.

Re: [PATCH] net: ti: netcp_ethss: use new api ethtool_{get|set}_link_ksettings

2016-10-11 Thread David Miller

From: Philippe Reynes 
Date: Sat,  8 Oct 2016 19:48:15 +0200

> The ethtool api {get|set}_settings is deprecated.
> We move this driver to new api {get|set}_link_ksettings.
> 
> Signed-off-by: Philippe Reynes 

Applied.

Re: [PATCH v2 1/2] net: stmmac: use phydev from struct net_device

2016-10-11 Thread David Miller

From: Philippe Reynes 
Date: Mon,  3 Oct 2016 08:28:19 +0200

> The private structure contains a pointer to phydev, but the structure
> net_device already contains such a pointer. So we can remove the pointer
> phydev in the private structure, and update the driver to use the
> one contained in struct net_device.
> 
> Signed-off-by: Philippe Reynes 

Applied.

Re: [PATCH v2 2/2] net: stmmac: use new api ethtool_{get|set}_link_ksettings

2016-10-11 Thread David Miller

From: Philippe Reynes 
Date: Mon,  3 Oct 2016 08:28:20 +0200

> The ethtool api {get|set}_settings is deprecated.
> We move this driver to new api {get|set}_link_ksettings.
> 
> Signed-off-by: Philippe Reynes 

Applied.

Re: [PATCH] net: usb: lan78xx: use new api ethtool_{get|set}_link_ksettings

2016-10-11 Thread David Miller

From: Philippe Reynes 
Date: Sun,  9 Oct 2016 12:07:04 +0200

> The ethtool api {get|set}_settings is deprecated.
> We move this driver to new api {get|set}_link_ksettings.
> 
> Signed-off-by: Philippe Reynes 

Applied.

Re: [PATCH] net: ti: cpsw: use new api ethtool_{get|set}_link_ksettings

2016-10-11 Thread David Miller

From: Philippe Reynes 
Date: Sat,  8 Oct 2016 17:46:15 +0200

> The ethtool api {get|set}_settings is deprecated.
> We move this driver to new api {get|set}_link_ksettings.
> 
> Signed-off-by: Philippe Reynes 

Applied.

Re: [PATCH 2/2] net: mv643xx_eth: use new api ethtool_{get|set}_link_ksettings

2016-10-11 Thread David Miller

From: Philippe Reynes 
Date: Sun,  2 Oct 2016 12:06:49 +0200

> The ethtool api {get|set}_settings is deprecated.
> We move this driver to new api {get|set}_link_ksettings.
> 
> Signed-off-by: Philippe Reynes 

Applied.

Re: [PATCH] iwlwifi: pcie: reduce "unsupported splx" to a warning

2016-10-11 Thread Chris Rorvick

On Tue, Oct 11, 2016 at 5:11 AM, Paul Bolle  wrote:
> For what it's worth, on my machine I have twenty (!) SPLX entries, all
> reading:
> Name (SPLX, Package (0x04)
> {
> Zero,
> Package (0x03)
> {
> 0x8000,
> 0x8000,
> 0x8000
> },
>
> Package (0x03)
> {
>0x8000,
>0x8000,
>0x8000
> },
>
> Package (0x03)
> {
> 0x8000,
> 0x8000,
> 0x8000
> }
> })

I actually see exactly the same on my Dell XPS 13 (9350) when I use
acpidump, etc.  I typed the entry I included in the commit log by hand,
based on what the driver gets back from the SPLC method (I added a
function to dump the returned object).

Chris

Re: [PATCH net-next] tcp: Change txhash on some non-RTO retransmits

2016-10-11 Thread Yuchung Cheng

On Tue, Oct 11, 2016 at 6:01 PM, Yuchung Cheng  wrote:
> On Tue, Oct 11, 2016 at 2:08 PM, Lawrence Brakmo  wrote:
>> Yuchung, thank you for your comments. Responses inline.
>>
>> On 10/11/16, 12:49 PM, "Yuchung Cheng"  wrote:
>>
>>>On Mon, Oct 10, 2016 at 5:18 PM, Lawrence Brakmo  wrote:

>>>> The purpose of this patch is to help balance flows across paths. A new
>>>> sysctl "tcp_retrans_txhash_prob" specifies the probability (0-100) that
>>>> the txhash (IPv6 flowlabel) will be changed after a non-RTO retransmit.
>>>> A probability is used in order to control how many flows are moved
>>>> during a congestion event and prevent the congested path from becoming
>>>> under utilized (which could occur if too many flows leave the current
>>>> path). Txhash changes may be delayed in order to decrease the likelihood
>>>> that it will trigger retransmits due to too much reordering.
>>>>
>>>> Another sysctl "tcp_retrans_txhash_mode" determines the behavior after
>>>> RTOs. If the sysctl is 0, then after an RTO, only RTOs can trigger
>>>> txhash changes. The idea is to decrease the likelihood of going back
>>>> to a broken path. That is, we don't want flow balancing to trigger
>>>> changes to broken paths. The drawback is that flow balancing does
>>>> not work as well. If the sysctl is greater than 1, then we always
>>>> do flow balancing, even after RTOs.
>>>>
>>>> Tested with packetdrill tests (for correctness) and performance
>>>> experiments with 2 and 3 paths. Performance experiments looked at
>>>> aggregate goodput and fairness. For each run, we looked at the ratio of
>>>> the goodputs for the fastest and slowest flows. These were averaged for
>>>> all the runs. A fairness of 1 means all flows had the same goodput, a
>>>> fairness of 2 means the fastest flow was twice as fast as the slowest
>>>> flow.
>>>>
>>>> The setup for the performance experiments was 4 or 5 servers in a rack,
>>>> 10G links. I tested various probabilities, but 20 seemed to have the
>>>> best tradeoff for my setup (small RTTs).
>>>>
>>>>   --- node1 -
>>>> sender --- switch --- node2 - switch  receiver
>>>>   --- node3 -
>>>>
>>>> Scenario 1: One sender sends to one receiver through 2 routes (node1 or
>>>> node 2). The output from node1 and node2 is 1G (1gbit/sec). With only 2
>>>> flows, without flow balancing (prob=0) the average goodput is 1.6G vs.
>>>> 1.9G with flow balancing due to 2 flows ending up in one link and either
>>>> not moving or taking some time to move. Fairness was 1 in all cases.
>>>> For 7 flows, goodput was 1.9G for all, but fairness was 1.5, 1.4 or 1.2
>>>> for prob=0, prob=20,mode=0 and prob=20,mode=1 respectively. That is,
>>>> flow balancing increased fairness.
>>>>
>>>> Scenario 2: One sender to one receiver, through 3 routes (node1,...
>>>> node2). With 6 or 16 flows the goodput was the same for all, but
>>>> fairness was 1.8, 1.5 and 1.2 respectively. Interestingly, the worst
>>>> case fairness out of 10 runs were 2.2, 1.8 and 1.4 respectively. That is,
>>>> prob=20,mode=1 improved average and worst case fairness.
>>>I am wondering if we can build better API with routing layer to
>>>implement this type of feature, instead of creeping the tx_rehashing
>>>logic scatter in TCP. For example, we call dst_negative_advice on TCP
>>>write timeouts.
>>
>> Not sure. The route is not necessarily bad, may be temporarily congested
>> or they may all be congested. If all we want to do is change the txhash
>> (unlike dst_negative_advice), then calling a tx_rehashing function may
>> be the appropriate call.
>>
>>>
>>>On the patch itself, it seems aggressive to (attempt to) rehash every
>>>post-RTO retransmission. Also you can just use ca_state (==CA_Loss) to
>>>identify post-RTO retransmission directly.
>>
>> Thanks, I will add the test.
>>
>>>
>>>is this an implementation of the Flow Bender ?
>>>http://dl.acm.org/citation.cfm?id=2674985
>>
>> Part of flow bender, although there are also some similarities to flowlet
>> switching.
>>
>>>

>>>> Scenario 3: One sender to one receiver, 2 routes, one route drops 50% of
>>>> the packets. With 7 flows, goodput was the same 1.1G, but fairness was
>>>> 1.8, 2.0 and 2.1 respectively. That is, if there is a bad route, then
>>>> balancing, which does more re-routes, is less fair.
>>>>
>>>> Signed-off-by: Lawrence Brakmo 
>>>> ---
>>>>  Documentation/networking/ip-sysctl.txt | 15 +++
>>>>  include/linux/tcp.h|  4 +++-
>>>>  include/net/tcp.h  |  2 ++
>>>>  net/ipv4/sysctl_net_ipv4.c | 18 ++
>>>>  net/ipv4/tcp_input.c   | 10 ++

Re: [PATCH v4 1/3] skge: Rename LED_OFF and LED_ON in marvel skge driver to avoid conflicts with leds namespace

2016-10-11 Thread Zach Brown

On Tue, Oct 11, 2016 at 02:14:07PM -0700, Stephen Hemminger wrote:
> On Tue, 11 Oct 2016 15:26:18 -0500
> Zach Brown  wrote:
>
> > Adding led support for phy causes namespace conflicts for some
> > phy drivers.
> >
> > The Marvell skge driver declared an enum for representing the states of
> > Link LED Register. The enum contained constant LED_OFF which conflicted
> > with the declaration found in linux/leds.h.
> > LED_OFF changed to LED_REG_OFF
> > Also changed LED_ON to LED_REG_ON to avoid possible future conflict and
> > for consistency.
> >
> > Signed-off-by: Zach Brown 
>
> Sure, that's fine but not sure why skge would be including linux/leds.h
> anyway.

It's pretty convoluted. Here's the chain of includes.
skge -> netdevice -> dsa -> phy -> phy_led_triggers -> leds
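
Why an indirect include is enough to cause the conflict: in C, all
enumerators in a translation unit share one namespace, so two enums
cannot both define LED_OFF once both headers are visible. The sketch
below is a userspace stand-in; enum led_brightness only mimics names
from linux/leds.h, enum demo_link_led and demo_led_reg_value() are
hypothetical, and the values are illustrative.

```c
#include <assert.h>

/* Stand-in for the shared header (mimics names from linux/leds.h). */
enum led_brightness {
	LED_OFF  = 0,
	LED_HALF = 127,
	LED_FULL = 255,
};

/*
 * Driver-local register states. Naming the first enumerator LED_OFF
 * here would be a redeclaration error, because led_brightness already
 * owns that identifier. The LED_REG_ prefix avoids the clash.
 */
enum demo_link_led {
	LED_REG_OFF = 1 << 0,
	LED_REG_ON  = 1 << 1,
};

/* Pick the register bit for the requested state (0 = off, 1 = on). */
static int demo_led_reg_value(int on)
{
	assert(on == 0 || on == 1);
	return on ? LED_REG_ON : LED_REG_OFF;
}
```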

Re: [PATCH net-next] mlx5: Add MLX5_SET64_VCHK to fix BUILD_BUG_ON

2016-10-11 Thread Saeed Mahameed

On Wed, Oct 12, 2016 at 4:40 AM, Leon Romanovsky  wrote:
> On Tue, Oct 11, 2016 at 08:46:45AM -0700, Tom Herbert wrote:
>> On Tue, Oct 11, 2016 at 4:57 AM, Saeed Mahameed
>>  wrote:
>> > On Tue, Oct 11, 2016 at 7:50 PM, David Laight  
>> > wrote:
>> >> From: Tom Herbert
>> >>> Sent: 11 October 2016 05:22
>> >> ...
>> >>> Fix is to create MLX5_SET64_VCHK that takes an additional argument
>> >>> that is a constant. There are two callers of MLX5_SET64 that are
>> >>> trying to get a variable offset, change those to call MLX5_SET64_VCHK
>> >>> passing pas[0] as the argument to use in the offset check.
>> >>
>> >> I think I'd separate the array index instead.
>> >> Something like:
>> >>
>> >> #define MLX5_SET64_INDEXED(typ, p, fld, ndx, v) do { \
>> >> BUILD_BUG_ON(__mlx5_bit_off(typ, fld) % 64); \
>> >> __MLX5_SET64(typ, p, fld[ndx], v); \
>> >> } while (0)
>> >>
>> >> David
>> >
>> > Yes, I think this looks more natural, but instead MLX5_SET64_INDEXED,
>> > I prefer to have 2 macros
>> > MLX5_SET64(typ, p, fld, v) and MLX5_ARRAY_SET64(typ, p, fld, idx, v).
>> >
>> > Tom, do you want me to fix it ?
>> >
>> Please do.
>
> Saeed,
>
> Will you manage to send this patch before -rc1 is released? That way
> Linus's -rc1 will be clean of this build error.
>

Just submitted the patch. It seems that I have issues with my other
mailer; sometimes e-mails take a while to appear on the mailing list.
I hope the patch will arrive in time for Dave to pick it up.

Thanks,
-Saeed.

[PATCH V2 net-next] net/mlx5: Add MLX5_ARRAY_SET64 to fix BUILD_BUG_ON

2016-10-11 Thread Saeed Mahameed

From: Tom Herbert 

I am hitting this in mlx5:

drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c: In function
reclaim_pages_cmd.clone.0:
drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c:346: error: call
to __compiletime_assert_346 declared with attribute error:
BUILD_BUG_ON failed: __mlx5_bit_off(manage_pages_out, pas[i]) % 64
drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c: In function give_pages:
drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c:291: error: call
to __compiletime_assert_291 declared with attribute error:
BUILD_BUG_ON failed: __mlx5_bit_off(manage_pages_in, pas[i]) % 64

Problem is that this is doing a BUILD_BUG_ON on a non-constant
expression because of trying to take offset of pas[i] in the
structure.

Fix is to create MLX5_ARRAY_SET64 that takes an additional argument
that is the field index to separate between BUILD_BUG_ON on the array
constant field and the indexed field to assign the value to.
There are two callers of MLX5_SET64 that are trying to get a variable
offset, change those to call MLX5_ARRAY_SET64 passing 'pas' and 'i'
as the arguments to use in the offset check and the indexed value
assignment.

Fixes: a533ed5e179cd ("net/mlx5: Pages management commands via mlx5 ifc")
Signed-off-by: Tom Herbert 
Signed-off-by: Saeed Mahameed 
---

Hi Dave,

I hope this version of the patch will make it to -rc1. I made some changes
to the original version Tom submitted, following David Laight's suggestion
to separate the array index from the constant array field for a more
natural API.

Thanks,
Saeed.


 drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c |  4 ++--
 include/linux/mlx5/device.h | 13 +++--
 2 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c b/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
index d458515..cc4fd61 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
@@ -287,7 +287,7 @@ retry:
 
goto retry;
}
-   MLX5_SET64(manage_pages_in, in, pas[i], addr);
+   MLX5_ARRAY_SET64(manage_pages_in, in, pas, i, addr);
}
 
MLX5_SET(manage_pages_in, in, opcode, MLX5_CMD_OP_MANAGE_PAGES);
@@ -344,7 +344,7 @@ static int reclaim_pages_cmd(struct mlx5_core_dev *dev,
if (fwp->func_id != func_id)
continue;
 
-   MLX5_SET64(manage_pages_out, out, pas[i], fwp->addr);
+   MLX5_ARRAY_SET64(manage_pages_out, out, pas, i, fwp->addr);
i++;
}
 
diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h
index 77c1417..5827614 100644
--- a/include/linux/mlx5/device.h
+++ b/include/linux/mlx5/device.h
@@ -92,12 +92,21 @@ __mlx5_mask(typ, fld))
___t; \
 })
 
-#define MLX5_SET64(typ, p, fld, v) do { \
+#define __MLX5_SET64(typ, p, fld, v) do { \
BUILD_BUG_ON(__mlx5_bit_sz(typ, fld) != 64); \
-   BUILD_BUG_ON(__mlx5_bit_off(typ, fld) % 64); \
*((__be64 *)(p) + __mlx5_64_off(typ, fld)) = cpu_to_be64(v); \
 } while (0)
 
+#define MLX5_SET64(typ, p, fld, v) do { \
+   BUILD_BUG_ON(__mlx5_bit_off(typ, fld) % 64); \
+   __MLX5_SET64(typ, p, fld, v); \
+} while (0)
+
+#define MLX5_ARRAY_SET64(typ, p, fld, idx, v) do { \
+   BUILD_BUG_ON(__mlx5_bit_off(typ, fld) % 64); \
+   __MLX5_SET64(typ, p, fld[idx], v); \
+} while (0)
+
 #define MLX5_GET64(typ, p, fld) be64_to_cpu(*((__be64 *)(p) + __mlx5_64_off(typ, fld)))
 
 #define MLX5_GET64_PR(typ, p, fld) ({ \
-- 
2.7.4
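
The idea behind the fix can be reproduced in userspace. The sketch below
is only an analogy under stated assumptions: struct demo_cmd and
DEMO_ARRAY_SET64 are hypothetical, C11 _Static_assert stands in for
BUILD_BUG_ON, and offsetof() stands in for __mlx5_bit_off(). An offset
taken of pas[i] with a runtime i is not a constant expression, so the
compile-time check is applied to the array field alone and the index is
used only for the store.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical command layout; real mlx5 layouts come from mlx5_ifc.h. */
struct demo_cmd {
	uint64_t header;
	uint64_t pas[4];
};

/* Constant check on the array field itself, then an indexed store. */
#define DEMO_ARRAY_SET64(typ, p, fld, idx, v) do {			\
	_Static_assert(offsetof(typ, fld) % 8 == 0,			\
		       "array field must be 64-bit aligned");		\
	(p)->fld[idx] = (v);						\
} while (0)

/* Fill the first n entries and return their sum. */
static uint64_t demo_fill(struct demo_cmd *cmd, size_t n)
{
	uint64_t sum = 0;

	assert(n <= 4);
	for (size_t i = 0; i < n; i++) {
		/* i is a runtime value, yet the assertion stays constant */
		DEMO_ARRAY_SET64(struct demo_cmd, cmd, pas, i,
				 0x1000 * (i + 1));
		sum += cmd->pas[i];
	}
	return sum;
}
```

Passing the field and the index as separate macro arguments is what
keeps the asserted expression free of the loop variable.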

Re: [PATCH net-next] tcp: Change txhash on some non-RTO retransmits

2016-10-11 Thread Yuchung Cheng

On Tue, Oct 11, 2016 at 2:08 PM, Lawrence Brakmo  wrote:
> Yuchung, thank you for your comments. Responses inline.
>
> On 10/11/16, 12:49 PM, "Yuchung Cheng"  wrote:
>
>>On Mon, Oct 10, 2016 at 5:18 PM, Lawrence Brakmo  wrote:
>>>
>>> The purpose of this patch is to help balance flows across paths. A new
>>> sysctl "tcp_retrans_txhash_prob" specifies the probability (0-100) that
>>> the txhash (IPv6 flowlabel) will be changed after a non-RTO retransmit.
>>> A probability is used in order to control how many flows are moved
>>> during a congestion event and prevent the congested path from becoming
>>> under utilized (which could occur if too many flows leave the current
>>> path). Txhash changes may be delayed in order to decrease the likelihood
>>> that it will trigger retransmits due to too much reordering.
>>>
>>> Another sysctl "tcp_retrans_txhash_mode" determines the behavior after
>>> RTOs. If the sysctl is 0, then after an RTO, only RTOs can trigger
>>> txhash changes. The idea is to decrease the likelihood of going back
>>> to a broken path. That is, we don't want flow balancing to trigger
>>> changes to broken paths. The drawback is that flow balancing does
>>> not work as well. If the sysctl is greater than 1, then we always
>>> do flow balancing, even after RTOs.
>>>
>>> Tested with packetdrill tests (for correctness) and performance
>>> experiments with 2 and 3 paths. Performance experiments looked at
>>> aggregate goodput and fairness. For each run, we looked at the ratio of
>>> the goodputs for the fastest and slowest flows. These were averaged for
>>> all the runs. A fairness of 1 means all flows had the same goodput, a
>>> fairness of 2 means the fastest flow was twice as fast as the slowest
>>> flow.
>>>
>>> The setup for the performance experiments was 4 or 5 servers in a rack,
>>> 10G links. I tested various probabilities, but 20 seemed to have the
>>> best tradeoff for my setup (small RTTs).
>>>
>>>   --- node1 -
>>> sender --- switch --- node2 - switch  receiver
>>>   --- node3 -
>>>
>>> Scenario 1: One sender sends to one receiver through 2 routes (node1 or
>>> node 2). The output from node1 and node2 is 1G (1gbit/sec). With only 2
>>> flows, without flow balancing (prob=0) the average goodput is 1.6G vs.
>>> 1.9G with flow balancing due to 2 flows ending up in one link and either
>>> not moving or taking some time to move. Fairness was 1 in all cases.
>>> For 7 flows, goodput was 1.9G for all, but fairness was 1.5, 1.4 or 1.2
>>> for prob=0, prob=20,mode=0 and prob=20,mode=1 respectively. That is,
>>> flow balancing increased fairness.
>>>
>>> Scenario 2: One sender to one receiver, through 3 routes (node1,...
>>> node2). With 6 or 16 flows the goodput was the same for all, but
>>> fairness was 1.8, 1.5 and 1.2 respectively. Interestingly, the worst
>>> case fairness out of 10 runs were 2.2, 1.8 and 1.4 respectively. That is,
>>> prob=20,mode=1 improved average and worst case fairness.
>>I am wondering if we can build better API with routing layer to
>>implement this type of feature, instead of creeping the tx_rehashing
>>logic scatter in TCP. For example, we call dst_negative_advice on TCP
>>write timeouts.
>
> Not sure. The route is not necessarily bad, may be temporarily congested
> or they may all be congested. If all we want to do is change the txhash
> (unlike dst_negative_advice), then calling a tx_rehashing function may
> be the appropriate call.
>
>>
>>On the patch itself, it seems aggressive to (attempt to) rehash every
>>post-RTO retransmission. Also you can just use ca_state (==CA_Loss) to
>>identify post-RTO retransmission directly.
>
> Thanks, I will add the test.
>
>>
>>is this an implementation of the Flow Bender ?
>>http://dl.acm.org/citation.cfm?id=2674985
>
> Part of flow bender, although there are also some similarities to flowlet
> switching.
>
>>
>>>
>>> Scenario 3: One sender to one receiver, 2 routes, one route drops 50% of
>>> the packets. With 7 flows, goodput was the same 1.1G, but fairness was
>>> 1.8, 2.0 and 2.1 respectively. That is, if there is a bad route, then
>>> balancing, which does more re-routes, is less fair.
>>>
>>> Signed-off-by: Lawrence Brakmo 
>>> ---
>>>  Documentation/networking/ip-sysctl.txt | 15 +++
>>>  include/linux/tcp.h|  4 +++-
>>>  include/net/tcp.h  |  2 ++
>>>  net/ipv4/sysctl_net_ipv4.c | 18 ++
>>>  net/ipv4/tcp_input.c   | 10 ++
>>>  net/ipv4/tcp_output.c  | 23 ++-
>>>  net/ipv4/tcp_timer.c   |  4 
>>>  7 files changed, 74 insertions(+), 2 deletions(-)
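
The decision described in the quoted patch text can be sketched as a
small pure function. This is a reading of the description, not the patch
code: demo_should_rehash() and its parameters are hypothetical, the
random number is supplied by the caller, and two points are assumptions
on my part: that RTO-driven retransmits always rehash, and that any
nonzero mode means "always balance".

```c
#include <assert.h>
#include <stdint.h>

/*
 * Returns nonzero when a retransmit should change the txhash.
 *   prob      percentage 0..100 (mirrors tcp_retrans_txhash_prob)
 *   mode      0: after an RTO only RTOs may rehash; nonzero: always balance
 *   seen_rto  the flow has already experienced an RTO
 *   is_rto    this retransmit is RTO-driven
 *   rnd       caller-supplied random number
 */
static int demo_should_rehash(unsigned int prob, int mode, int seen_rto,
			      int is_rto, uint32_t rnd)
{
	assert(prob <= 100);

	if (is_rto)
		return 1;	/* assumption: RTOs always rehash */
	if (mode == 0 && seen_rto)
		return 0;	/* mode 0: no balancing once an RTO was seen */
	return (rnd % 100u) < prob;	/* move roughly prob% of the flows */
}
```

With prob=20, about one in five non-RTO retransmits would move a flow,
which is how the probability limits the number of flows leaving a
congested path at once.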

[PATCH net-next v2] bridge: add address and vlan to fdb warning messages

2016-10-11 Thread Roopa Prabhu

From: Roopa Prabhu 

This patch adds vlan and address to warning messages printed
in the bridge fdb code for debuggability.

Signed-off-by: Roopa Prabhu 
---
v2 - address comments from stephen
- use %u format specifier for vlan
- move print string to a single line

 net/bridge/br_fdb.c | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
index 6b43c8c..e4a4176 100644
--- a/net/bridge/br_fdb.c
+++ b/net/bridge/br_fdb.c
@@ -535,9 +535,8 @@ static int fdb_insert(struct net_bridge *br, struct net_bridge_port *source,
 */
if (fdb->is_local)
return 0;
-   br_warn(br, "adding interface %s with same address "
-  "as a received packet\n",
-  source ? source->dev->name : br->dev->name);
+   br_warn(br, "adding interface %s with same address as a received packet (addr:%pM, vlan:%u)\n",
+  source ? source->dev->name : br->dev->name, addr, vid);
fdb_delete(br, fdb);
}
 
@@ -583,9 +582,8 @@ void br_fdb_update(struct net_bridge *br, struct net_bridge_port *source,
/* attempt to update an entry for a local interface */
if (unlikely(fdb->is_local)) {
if (net_ratelimit())
-   br_warn(br, "received packet on %s with "
-   "own address as source address\n",
-   source->dev->name);
+   br_warn(br, "received packet on %s with own address as source address (addr:%pM, vlan:%u)\n",
+   source->dev->name, addr, vid);
} else {
/* fastpath: update of existing entry */
if (unlikely(source != fdb->dst)) {
-- 
1.9.1

Re: [PATCH net-next] bridge: add address and vlan to fdb warning messages

2016-10-11 Thread Roopa Prabhu

On 10/11/16, 3:17 PM, Stephen Hemminger wrote:
> On Tue, 11 Oct 2016 14:33:51 -0700
> Roopa Prabhu  wrote:
>
>> From: Roopa Prabhu 
>>
>> This patch adds vlan and address to warning messages printed
>> in the bridge fdb code for debuggability.
>>
>> Signed-off-by: Roopa Prabhu 
>> ---
>>  net/bridge/br_fdb.c | 11 ++-
>>  1 file changed, 6 insertions(+), 5 deletions(-)
>>
>> diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
>> index 6b43c8c..1257a88 100644
>> --- a/net/bridge/br_fdb.c
>> +++ b/net/bridge/br_fdb.c
>> @@ -536,8 +536,8 @@ static int fdb_insert(struct net_bridge *br, struct net_bridge_port *source,
>>  if (fdb->is_local)
>>  return 0;
>>  br_warn(br, "adding interface %s with same address "
>> -   "as a received packet\n",
>> -   source ? source->dev->name : br->dev->name);
>> +   "as a received packet (addr:%pM, vlan:%d)\n",
>> +   source ? source->dev->name : br->dev->name, addr, vid);
>>  fdb_delete(br, fdb);
>>  }
>>  
>> @@ -583,9 +583,10 @@ void br_fdb_update(struct net_bridge *br, struct net_bridge_port *source,
>>  /* attempt to update an entry for a local interface */
>>  if (unlikely(fdb->is_local)) {
>>  if (net_ratelimit())
>> -br_warn(br, "received packet on %s with "
>> -"own address as source address\n",
>> -source->dev->name);
>> +br_warn(br, "received packet on %s with own "
>> +"address as source address "
>> +"(addr:%pM, vlan:%d)\n",
>> +source->dev->name, addr, vid);
>>  } else {
> Isn't vlan unsigned here so print with %u
you are right. will fix it
> Also it would be good to make string format on one line to allow for easy search of source.
>   br_warn(br,
>   "received packet on %s with own address %pM vlan %u",
>   source->dev->name, addr, vid);

ack, was debating abt this.

v2 coming..thanks

Re: [PATCH net-next] bridge: add address and vlan to fdb warning messages

2016-10-11 Thread Stephen Hemminger

On Tue, 11 Oct 2016 14:33:51 -0700
Roopa Prabhu  wrote:

> From: Roopa Prabhu 
> 
> This patch adds vlan and address to warning messages printed
> in the bridge fdb code for debuggability.
> 
> Signed-off-by: Roopa Prabhu 
> ---
>  net/bridge/br_fdb.c | 11 ++-
>  1 file changed, 6 insertions(+), 5 deletions(-)
> 
> diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
> index 6b43c8c..1257a88 100644
> --- a/net/bridge/br_fdb.c
> +++ b/net/bridge/br_fdb.c
> @@ -536,8 +536,8 @@ static int fdb_insert(struct net_bridge *br, struct net_bridge_port *source,
>   if (fdb->is_local)
>   return 0;
>   br_warn(br, "adding interface %s with same address "
> -"as a received packet\n",
> -source ? source->dev->name : br->dev->name);
> +"as a received packet (addr:%pM, vlan:%d)\n",
> +source ? source->dev->name : br->dev->name, addr, vid);
>   fdb_delete(br, fdb);
>   }
>  
> @@ -583,9 +583,10 @@ void br_fdb_update(struct net_bridge *br, struct net_bridge_port *source,
>   /* attempt to update an entry for a local interface */
>   if (unlikely(fdb->is_local)) {
>   if (net_ratelimit())
> - br_warn(br, "received packet on %s with "
> - "own address as source address\n",
> - source->dev->name);
> + br_warn(br, "received packet on %s with own "
> + "address as source address "
> + "(addr:%pM, vlan:%d)\n",
> + source->dev->name, addr, vid);
>   } else {

Isn't vlan unsigned here? If so, print it with %u.
Also it would be good to put the format string on one line to allow for easy
search of the source.
br_warn(br,
"received packet on %s with own address %pM vlan %u",
source->dev->name, addr, vid);
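
A userspace sketch of both review points, assuming a hypothetical helper
demo_format_warning(): the kernel-only %pM specifier is replaced by six
%02x fields, the VLAN id is printed with %u, and the whole format string
sits on one source line so that a search for the message text finds it.
For a 16-bit id the printed digits are the same with %d after integer
promotion; %u simply matches the unsigned type.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Userspace stand-in for a br_warn() call; returns snprintf()'s result. */
static int demo_format_warning(char *buf, size_t len, const char *ifname,
			       const uint8_t addr[6], uint16_t vid)
{
	assert(buf != NULL && len > 0);
	/* One unbroken literal: grep for the message text finds this line. */
	return snprintf(buf, len,
			"received packet on %s with own address as source address (addr:%02x:%02x:%02x:%02x:%02x:%02x, vlan:%u)\n",
			ifname, addr[0], addr[1], addr[2], addr[3], addr[4],
			addr[5], (unsigned int)vid);
}
```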

Re: saddr based blackhole/unreachable route

2016-10-11 Thread Bjørnar Ness

2016-10-11 20:39 GMT+02:00 Bjørnar Ness :
> Hello, netdev
>
> In a typical setup (eth0=internet, eth1=lan) i populate routing table
> 100 with saddrs I want
> dropped, and: "ip ru a pref 100 lookup table 100"
>
> What I would expect to see is packets with a saddr in table 100 coming
> in eth0 will go out eth1,
> with replies being dropped, but I do not see the packets going out eth1 at all.
>
> Have tried searching and following the fib codepath, but have still
> not managed to understand what is really going on here.

Answering my own question: I guess it's rp_filter kicking in here, which
also explains why I don't get ICMP unreachable.

Is there a better way to do source based rtbh?

-- 
Bj(/)rnar

[PATCH] Temporary patch for arpd

2016-10-11 Thread Pascal

Hello. I found a remarkable bug in the arpd daemon of the iproute2 package.
arpd is completely unusable when run with the -f flag. On my amd64
server I get "Segmentation fault" regardless of the content of the
-f mac-list.txt file.
The source of misc/arpd.c is not complicated, and I found that the cause
of this bug is commit dd50247dba85255538d659551305b4bb75bcae62. I'm not a
C++ developer, but I suppose the segfault occurs because dbase->put() is
called with an uninitialized dbdat.data.
arpd.c also has a strange condition, "if (do_load || do_list)", that
prevents running the program with the -f argument.
I pulled the hexstring_a2n function out of utils.c from the previous commit
aeb199d5ce86c6c72decaac333cad5a7d7b38b3a and used it to populate
dbdat.data, after which the program works fine.
I hastily made a patch that brings the program back to life. Please look
into this problem, fix the bug, and test the program with the -f option.
PS: sorry for my English =)

0001-Temporary-patch-for-arpd.patch
Description: Binary data
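
The attached patch is not reproduced here. As a rough illustration of the
step the report describes (turning a hex string into bytes before they
are stored), here is a minimal sketch. demo_hexval() and
demo_hex_to_bytes() are hypothetical and are not the iproute2
hexstring_a2n() function; the input is assumed to be plain hex digits
without separators.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int demo_hexval(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/*
 * Parse a string such as "001122aabbcc" into out[].
 * Returns the number of bytes written, or -1 on a bad digit, an odd
 * number of digits, or an output buffer that is too small.
 */
static int demo_hex_to_bytes(const char *s, uint8_t *out, size_t outlen)
{
	size_t n = 0;

	assert(s != NULL && out != NULL);
	while (s[0] != '\0') {
		int hi, lo;

		if (s[1] == '\0' || n >= outlen)
			return -1;
		hi = demo_hexval(s[0]);
		lo = demo_hexval(s[1]);
		if (hi < 0 || lo < 0)
			return -1;
		out[n++] = (uint8_t)((hi << 4) | lo);
		s += 2;
	}
	return (int)n;
}
```

The point relevant to the report is that the destination buffer is
filled, and its length known, before anything is handed to a store call.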

[PATCH net-next] bridge: add address and vlan to fdb warning messages

2016-10-11 Thread Roopa Prabhu

From: Roopa Prabhu 

This patch adds vlan and address to warning messages printed
in the bridge fdb code for debuggability.

Signed-off-by: Roopa Prabhu 
---
 net/bridge/br_fdb.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
index 6b43c8c..1257a88 100644
--- a/net/bridge/br_fdb.c
+++ b/net/bridge/br_fdb.c
@@ -536,8 +536,8 @@ static int fdb_insert(struct net_bridge *br, struct net_bridge_port *source,
if (fdb->is_local)
return 0;
br_warn(br, "adding interface %s with same address "
-  "as a received packet\n",
-  source ? source->dev->name : br->dev->name);
+  "as a received packet (addr:%pM, vlan:%d)\n",
+  source ? source->dev->name : br->dev->name, addr, vid);
fdb_delete(br, fdb);
}
 
@@ -583,9 +583,10 @@ void br_fdb_update(struct net_bridge *br, struct net_bridge_port *source,
/* attempt to update an entry for a local interface */
if (unlikely(fdb->is_local)) {
if (net_ratelimit())
-   br_warn(br, "received packet on %s with "
-   "own address as source address\n",
-   source->dev->name);
+   br_warn(br, "received packet on %s with own "
+   "address as source address "
+   "(addr:%pM, vlan:%d)\n",
+   source->dev->name, addr, vid);
} else {
/* fastpath: update of existing entry */
if (unlikely(source != fdb->dst)) {
-- 
1.9.1

Re: [PATCH v4 1/3] skge: Rename LED_OFF and LED_ON in marvel skge driver to avoid conflicts with leds namespace

2016-10-11 Thread Stephen Hemminger

On Tue, 11 Oct 2016 15:26:18 -0500
Zach Brown  wrote:

> Adding led support for phy causes namespace conflicts for some
> phy drivers.
> 
> The Marvell skge driver declared an enum for representing the states of
> Link LED Register. The enum contained constant LED_OFF which conflicted
> with the declaration found in linux/leds.h.
> LED_OFF changed to LED_REG_OFF
> Also changed LED_ON to LED_REG_ON to avoid possible future conflict and
> for consistency.
> 
> Signed-off-by: Zach Brown 

Sure, that's fine but not sure why skge would be including linux/leds.h
anyway.

Re: [PATCH net-next] tcp: Change txhash on some non-RTO retransmits

2016-10-11 Thread Lawrence Brakmo

Yuchung, thank you for your comments. Responses inline.

On 10/11/16, 12:49 PM, "Yuchung Cheng"  wrote:

>On Mon, Oct 10, 2016 at 5:18 PM, Lawrence Brakmo  wrote:
>>
>> The purpose of this patch is to help balance flows across paths. A new
>> sysctl "tcp_retrans_txhash_prob" specifies the probability (0-100) that
>> the txhash (IPv6 flowlabel) will be changed after a non-RTO retransmit.
>> A probability is used in order to control how many flows are moved
>> during a congestion event and prevent the congested path from becoming
>> under utilized (which could occur if too many flows leave the current
>> path). Txhash changes may be delayed in order to decrease the likelihood
>> that it will trigger retransmits due to too much reordering.
>>
>> Another sysctl "tcp_retrans_txhash_mode" determines the behavior after
>> RTOs. If the sysctl is 0, then after an RTO, only RTOs can trigger
>> txhash changes. The idea is to decrease the likelihood of going back
>> to a broken path. That is, we don't want flow balancing to trigger
>> changes to broken paths. The drawback is that flow balancing does
>> not work as well. If the sysctl is greater than 1, then we always
>> do flow balancing, even after RTOs.
>>
>> Tested with packetdrill tests (for correctness) and performance
>> experiments with 2 and 3 paths. Performance experiments looked at
>> aggregate goodput and fairness. For each run, we looked at the ratio of
>> the goodputs for the fastest and slowest flows. These were averaged for
>> all the runs. A fairness of 1 means all flows had the same goodput, a
>> fairness of 2 means the fastest flow was twice as fast as the slowest
>> flow.
>>
>> The setup for the performance experiments was 4 or 5 servers in a rack,
>> 10G links. I tested various probabilities, but 20 seemed to have the
>> best tradeoff for my setup (small RTTs).
>>
>>   --- node1 -
>> sender --- switch --- node2 - switch  receiver
>>   --- node3 -
>>
>> Scenario 1: One sender sends to one receiver through 2 routes (node1 or
>> node 2). The output from node1 and node2 is 1G (1gbit/sec). With only 2
>> flows, without flow balancing (prob=0) the average goodput is 1.6G vs.
>> 1.9G with flow balancing due to 2 flows ending up in one link and either
>> not moving or taking some time to move. Fairness was 1 in all cases.
>> For 7 flows, goodput was 1.9G for all, but fairness was 1.5, 1.4 or 1.2
>> for prob=0, prob=20,mode=0 and prob=20,mode=1 respectively. That is,
>> flow balancing increased fairness.
>>
>> Scenario 2: One sender to one receiver, through 3 routes (node1,...
>> node2). With 6 or 16 flows the goodput was the same for all, but
>> fairness was 1.8, 1.5 and 1.2 respectively. Interestingly, the worst
>> case fairness out of 10 runs was 2.2, 1.8 and 1.4 respectively. That is,
>> prob=20,mode=1 improved average and worst case fairness.
>I am wondering if we can build a better API with the routing layer to
>implement this type of feature, instead of scattering the tx_rehashing
>logic throughout TCP. For example, we call dst_negative_advice on TCP
>write timeouts.

Not sure. The route is not necessarily bad, may be temporarily congested
or they may all be congested. If all we want to do is change the txhash
(unlike dst_negative_advice), then calling a tx_rehashing function may
be the appropriate call.
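
As a rough illustration of the policy described in the quoted commit
message (this is not the patch's code), the decision could be modelled in
userspace C as below. The function name, its parameters, and the idea of
passing the random draw in explicitly are all assumptions made for the
sketch. RTO-triggered retransmits are assumed to always change the txhash.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model of the rehash decision; none of these
 * names come from the patch.
 *
 * prob      - value of tcp_retrans_txhash_prob, 0..100
 * mode      - value of tcp_retrans_txhash_mode
 * after_rto - true if the flow has already experienced an RTO
 * is_rto    - true if this retransmit was caused by an RTO
 * draw      - caller-supplied random value in 0..99
 */
static bool should_rehash(int prob, int mode, bool after_rto,
			  bool is_rto, int draw)
{
	assert(prob >= 0 && prob <= 100);
	assert(draw >= 0 && draw < 100);

	/* Assumption: an RTO always changes the txhash in this model. */
	if (is_rto)
		return true;

	/* mode 0: once an RTO has happened, only RTOs may rehash. */
	if (mode == 0 && after_rto)
		return false;

	/* Otherwise move the flow with probability prob/100. */
	return draw < prob;
}
```

With prob=20, a draw of 0..19 moves the flow and a draw of 20..99 leaves
it on its current path, so roughly one retransmitting flow in five is
rehashed.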
 
>
>On the patch itself, it seems aggressive to (attempt to) rehash every
>post-RTO retransmission. Also you can just use ca_state (==CA_Loss) to
>identify post-RTO retransmission directly.

Thanks, I will add the test.

>
>is this an implementation of the Flow Bender ?
>http://dl.acm.org/citation.cfm?id=2674985

Part of flow bender, although there are also some similarities to flowlet
switching.

>
>>
>> Scenario 3: One sender to one receiver, 2 routes, one route drops 50% of
>> the packets. With 7 flows, goodput was the same 1.1G, but fairness was
>> 1.8, 2.0 and 2.1 respectively. That is, if there is a bad route, then
>> balancing, which does more re-routes, is less fair.
>>
>> Signed-off-by: Lawrence Brakmo 
>> ---
>>  Documentation/networking/ip-sysctl.txt | 15 +++
>>  include/linux/tcp.h|  4 +++-
>>  include/net/tcp.h  |  2 ++
>>  net/ipv4/sysctl_net_ipv4.c | 18 ++
>>  net/ipv4/tcp_input.c   | 10 ++
>>  net/ipv4/tcp_output.c  | 23 ++-
>>  net/ipv4/tcp_timer.c   |  4 
>>  7 files changed, 74 insertions(+), 2 deletions(-)
>>
>> diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
>> index 3db8c67..87a984c 100644
>> ---
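
The fairness figure used in the quoted results (ratio of the fastest to
the slowest flow's goodput per run, averaged over all runs) can be
sketched in a few lines of C. The function names are made up for this
example and the goodput values would come from whatever measurement tool
is in use.

```c
#include <assert.h>
#include <stddef.h>

/* Fairness of one run: goodput of fastest flow / goodput of slowest. */
static double run_fairness(const double *goodput, size_t nflows)
{
	double min, max;
	size_t i;

	assert(nflows > 0);
	min = max = goodput[0];
	for (i = 1; i < nflows; i++) {
		if (goodput[i] < min)
			min = goodput[i];
		if (goodput[i] > max)
			max = goodput[i];
	}
	assert(min > 0.0);	/* a stalled flow would make the ratio undefined */
	return max / min;
}

/* Reported fairness: arithmetic mean of the per-run ratios. */
static double mean_fairness(const double *per_run, size_t nruns)
{
	double sum = 0.0;
	size_t i;

	assert(nruns > 0);
	for (i = 0; i < nruns; i++)
		sum += per_run[i];
	return sum / (double)nruns;
}
```

A result of 1 means every flow in the run saw the same goodput, and 2
means the fastest flow was twice as fast as the slowest.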

[PATCH net] netvsc: fix checksum on UDP IPV6

2016-10-11 Thread Stephen Hemminger

The software calculation of UDP checksum in Netvsc driver was
only handling IPv4 case. By using skb_checksum_help() instead
all protocols can be handled. Rearrange code to eliminate goto
and look like other drivers.

This is a temporary solution; recent versions of Windows Server etc.
do support UDP checksum offload, but the driver needs to do the appropriate
negotiation with the host to validate it before using it. This will be done in
a later patch.

Please queue this for -stable as well.

Signed-off-by: Stephen Hemminger 
---
 drivers/net/hyperv/netvsc_drv.c | 72 +
 1 file changed, 22 insertions(+), 50 deletions(-)

diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 52eeb2f..9570d21 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -36,6 +36,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include "hyperv_net.h"
@@ -442,8 +443,6 @@ static int netvsc_start_xmit(struct sk_buff *skb, struct net_device *net)
}
 
net_trans_info = get_net_transport_info(skb, &hdr_offset);
-   if (net_trans_info == TRANSPORT_INFO_NOT_IP)
-   goto do_send;
 
/*
 * Setup the sendside checksum offload only if this is not a
@@ -478,56 +477,29 @@ static int netvsc_start_xmit(struct sk_buff *skb, struct net_device *net)
}
lso_info->lso_v2_transmit.tcp_header_offset = hdr_offset;
lso_info->lso_v2_transmit.mss = skb_shinfo(skb)->gso_size;
-   goto do_send;
-   }
-
-   if ((skb->ip_summed == CHECKSUM_NONE) ||
-   (skb->ip_summed == CHECKSUM_UNNECESSARY))
-   goto do_send;
-
-   rndis_msg_size += NDIS_CSUM_PPI_SIZE;
-   ppi = init_ppi_data(rndis_msg, NDIS_CSUM_PPI_SIZE,
-   TCPIP_CHKSUM_PKTINFO);
-
-   csum_info = (struct ndis_tcp_ip_checksum_info *)((void *)ppi +
-   ppi->ppi_offset);
-
-   if (net_trans_info & (INFO_IPV4 << 16))
-   csum_info->transmit.is_ipv4 = 1;
-   else
-   csum_info->transmit.is_ipv6 = 1;
-
-   if (net_trans_info & INFO_TCP) {
-   csum_info->transmit.tcp_checksum = 1;
-   csum_info->transmit.tcp_header_offset = hdr_offset;
-   } else if (net_trans_info & INFO_UDP) {
-   /* UDP checksum offload is not supported on ws2008r2.
-* Furthermore, on ws2012 and ws2012r2, there are some
-* issues with udp checksum offload from Linux guests.
-* (these are host issues).
-* For now compute the checksum here.
-*/
-   struct udphdr *uh;
-   u16 udp_len;
-
-   ret = skb_cow_head(skb, 0);
-   if (ret)
-   goto no_memory;
-
-   uh = udp_hdr(skb);
-   udp_len = ntohs(uh->len);
-   uh->check = 0;
-   uh->check = csum_tcpudp_magic(ip_hdr(skb)->saddr,
- ip_hdr(skb)->daddr,
- udp_len, IPPROTO_UDP,
- csum_partial(uh, udp_len, 0));
-   if (uh->check == 0)
-   uh->check = CSUM_MANGLED_0;
-
-   csum_info->transmit.udp_checksum = 0;
+   } else if (skb->ip_summed == CHECKSUM_PARTIAL) {
+   if (net_trans_info & INFO_TCP) {
+   rndis_msg_size += NDIS_CSUM_PPI_SIZE;
+   ppi = init_ppi_data(rndis_msg, NDIS_CSUM_PPI_SIZE,
+   TCPIP_CHKSUM_PKTINFO);
+
+   csum_info = (struct ndis_tcp_ip_checksum_info *)((void *)ppi +
+   ppi->ppi_offset);
+
+   if (net_trans_info & (INFO_IPV4 << 16))
+   csum_info->transmit.is_ipv4 = 1;
+   else
+   csum_info->transmit.is_ipv6 = 1;
+
+   csum_info->transmit.tcp_checksum = 1;
+   csum_info->transmit.tcp_header_offset = hdr_offset;
+   } else {
+   /* UDP checksum (and other) offload is not supported. */
+   if (skb_checksum_help(skb))
+   goto drop;
+   }
}
 
-do_send:
/* Start filling in the page buffers with the rndis hdr */
rndis_msg->msg_len += rndis_msg_size;
packet->total_data_buflen = rndis_msg->msg_len;
-- 
2.9.3
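
For readers who want to see what a software checksum fallback has to
compute, here is a userspace sketch of the 16-bit ones'-complement
Internet checksum together with the UDP rule, visible in the code removed
above, that a computed value of 0 is transmitted as 0xffff. This is not
the kernel's skb_checksum_help(); the function names are hypothetical and
the UDP pseudo-header is left out for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Ones'-complement sum over big-endian 16-bit words, folded to 16 bits
 * and inverted. The 32-bit accumulator is sufficient for buffers up to
 * 64 KiB, which covers a single IP packet.
 */
static uint16_t inet_checksum(const uint8_t *data, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	assert(data != NULL || len == 0);

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)data[i] << 8) | data[i + 1];

	/* An odd trailing byte is padded with a zero byte on the right. */
	if (len & 1)
		sum += (uint32_t)data[len - 1] << 8;

	/* Fold the carries back into the low 16 bits. */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)~sum;
}

/* In UDP, 0 means "no checksum", so a computed 0 is sent as 0xffff. */
static uint16_t udp_finalize_checksum(uint16_t csum)
{
	return csum == 0 ? 0xffff : csum;
}
```

The same arithmetic applies to IPv4 and IPv6; only the pseudo-header that
is summed in front of the UDP header differs, which is why a generic
helper can cover both where the removed IPv4-only code could not.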

Re: [RFC v3 3/3] phy,leds: add support for led triggers on phy link state change

2016-10-11 Thread Andrew Lunn

> Andrew, are you happy with this implementation?

Sorry, been too busy with other things to follow the discussion.

What would be nice to see is a comment about how the link to LEDs in
the PHYs is made. Often there are a couple of LEDs in the RJ45 socket
driven by the PHY. They can show link, speed, packet Rx/Tx, etc. How
are these triggers related to these LEDs?

Andrew

[PATCH net] ip6_tunnel: fix ip6_tnl_lookup

2016-10-11 Thread Vadim Fedorenko

The commit ea3dc9601bda ("ip6_tunnel: Add support for wildcard tunnel
endpoints.") introduced support for wildcards in tunnel endpoints,
but in some rare circumstances ip6_tnl_lookup selects the wrong tunnel
interface, relying only on the source or destination address of the packet
and not checking for the presence of a wildcard in the tunnel endpoints.
Later, in ip6_tnl_rcv, these packets can be discarded because of a
difference in ipproto even if the fallback device has the proper ipproto
configuration.

This patch adds checks for a wildcard endpoint in the tunnel, avoiding such
behavior.

Fixes: ea3dc9601bda ("ip6_tunnel: Add support for wildcard tunnel
endpoints.")

Signed-off-by: Vadim Fedorenko 
---
 net/ipv6/ip6_tunnel.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index 6a66adb..5692d6b 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -157,6 +157,7 @@ ip6_tnl_lookup(struct net *net, const struct in6_addr *remote, const struct in6_
hash = HASH(&any, local);
for_each_ip6_tunnel_rcu(ip6n->tnls_r_l[hash]) {
if (ipv6_addr_equal(local, &t->parms.laddr) &&
+   ipv6_addr_any(&t->parms.raddr) &&
(t->dev->flags & IFF_UP))
return t;
}
@@ -164,6 +165,7 @@ ip6_tnl_lookup(struct net *net, const struct in6_addr *remote, const struct in6_
hash = HASH(remote, &any);
for_each_ip6_tunnel_rcu(ip6n->tnls_r_l[hash]) {
if (ipv6_addr_equal(remote, &t->parms.raddr) &&
+   ipv6_addr_any(&t->parms.laddr) &&
(t->dev->flags & IFF_UP))
return t;
}
-- 
1.9.1
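
The effect of the two added checks can be illustrated with a small
userspace model of the lookup order. This is only a sketch: it scans a
plain array instead of the kernel's hash tables, and the types and
function names are invented for the example. An all-zero address stands
for a wildcard endpoint.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct addr6 { unsigned char b[16]; };

struct tunnel {
	struct addr6 laddr;	/* all-zero means wildcard */
	struct addr6 raddr;	/* all-zero means wildcard */
	bool up;
};

static bool addr_equal(const struct addr6 *a, const struct addr6 *b)
{
	return memcmp(a->b, b->b, sizeof(a->b)) == 0;
}

static bool addr_any(const struct addr6 *a)
{
	static const struct addr6 zero;

	return addr_equal(a, &zero);
}

static const struct tunnel *tnl_lookup(const struct tunnel *tbl, size_t n,
				       const struct addr6 *remote,
				       const struct addr6 *local)
{
	size_t i;

	/* Pass 1: both endpoints match exactly. */
	for (i = 0; i < n; i++)
		if (tbl[i].up && addr_equal(local, &tbl[i].laddr) &&
		    addr_equal(remote, &tbl[i].raddr))
			return &tbl[i];

	/* Pass 2: local matches and the tunnel's remote is a wildcard. */
	for (i = 0; i < n; i++)
		if (tbl[i].up && addr_equal(local, &tbl[i].laddr) &&
		    addr_any(&tbl[i].raddr))
			return &tbl[i];

	/* Pass 3: remote matches and the tunnel's local is a wildcard. */
	for (i = 0; i < n; i++)
		if (tbl[i].up && addr_equal(remote, &tbl[i].raddr) &&
		    addr_any(&tbl[i].laddr))
			return &tbl[i];

	return NULL;	/* caller would use the fallback device */
}
```

Without the addr_any() tests in passes 2 and 3, a tunnel configured with
two specific endpoints could be returned for a packet that shares only
one of them, which is the misselection the commit message describes.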

[PATCH v4 1/3] skge: Rename LED_OFF and LED_ON in marvel skge driver to avoid conflicts with leds namespace

2016-10-11 Thread Zach Brown

Adding led support for phy causes namespace conflicts for some
phy drivers.

The Marvell skge driver declared an enum for representing the states of
the Link LED Register. The enum contained the constant LED_OFF, which
conflicted with a declaration found in linux/leds.h.
LED_OFF is changed to LED_REG_OFF.
LED_ON is also changed to LED_REG_ON to avoid a possible future conflict
and for consistency.

Signed-off-by: Zach Brown 
---
 drivers/net/ethernet/marvell/skge.c | 6 +++---
 drivers/net/ethernet/marvell/skge.h | 4 ++--
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/marvell/skge.c b/drivers/net/ethernet/marvell/skge.c
index 7173836..783df01 100644
--- a/drivers/net/ethernet/marvell/skge.c
+++ b/drivers/net/ethernet/marvell/skge.c
@@ -1048,7 +1048,7 @@ static const char *skge_pause(enum pause_status status)
 static void skge_link_up(struct skge_port *skge)
 {
skge_write8(skge->hw, SK_REG(skge->port, LNK_LED_REG),
-   LED_BLK_OFF|LED_SYNC_OFF|LED_ON);
+   LED_BLK_OFF|LED_SYNC_OFF|LED_REG_ON);
 
netif_carrier_on(skge->netdev);
netif_wake_queue(skge->netdev);
@@ -1062,7 +1062,7 @@ static void skge_link_up(struct skge_port *skge)
 
 static void skge_link_down(struct skge_port *skge)
 {
-   skge_write8(skge->hw, SK_REG(skge->port, LNK_LED_REG), LED_OFF);
+   skge_write8(skge->hw, SK_REG(skge->port, LNK_LED_REG), LED_REG_OFF);
netif_carrier_off(skge->netdev);
netif_stop_queue(skge->netdev);
 
@@ -2668,7 +2668,7 @@ static int skge_down(struct net_device *dev)
if (hw->ports == 1)
free_irq(hw->pdev->irq, hw);
 
-   skge_write8(skge->hw, SK_REG(skge->port, LNK_LED_REG), LED_OFF);
+   skge_write8(skge->hw, SK_REG(skge->port, LNK_LED_REG), LED_REG_OFF);
if (is_genesis(hw))
genesis_stop(skge);
else
diff --git a/drivers/net/ethernet/marvell/skge.h b/drivers/net/ethernet/marvell/skge.h
index a2eb341..3ea151f 100644
--- a/drivers/net/ethernet/marvell/skge.h
+++ b/drivers/net/ethernet/marvell/skge.h
@@ -662,8 +662,8 @@ enum {
LED_BLK_OFF = 1<<4, /* Link LED Blinking Off */
LED_SYNC_ON = 1<<3, /* Use Sync Wire to switch LED */
LED_SYNC_OFF= 1<<2, /* Disable Sync Wire Input */
-   LED_ON  = 1<<1, /* switch LED on */
-   LED_OFF = 1<<0, /* switch LED off */
+   LED_REG_ON  = 1<<1, /* switch LED on */
+   LED_REG_OFF = 1<<0, /* switch LED off */
 };
 
 /* Receive GMAC FIFO (YUKON) */
-- 
2.7.4

[PATCH v4 0/3] Add support for led triggers on phy link state change

2016-10-11 Thread Zach Brown

Fix skge driver that declared enum constants that conflicted with enum
constants in linux/leds.h

Create function that encapsulates actions taken during the adjust phy link step
of phy state changes.

Add support for led triggers on phy link state changes by adding
a config option. When set the config option will create a set of led triggers
for each phy device. Users can use the led triggers to represent link state
changes on the phy.

v2:
 * New patch that creates phy_adjust_link function to encapsulate actions taken
   when adjusting phy link during phy state changes
 * led trigger speed strings changed to match existing phy speed strings
 * New function that maps speeds to led triggers
 * Replace magic constants with definitions when declaring trigger name
   buffer and number of triggers.
v3:
 * Changed LED_ON to LED_REG_ON in skge driver to avoid possible future
   conflict and improve consistency.
 * Dropped rtl8712 patch that was accepted separately.
v4:
 * tweaked commit message

Josh Cartwright (1):
  net: phy: leds: add support for led triggers on phy link state change

Zach Brown (2):
  skge: Rename LED_OFF and LED_ON in marvel skge driver to avoid
conflicts with leds namespace
  net: phy: Encapsulate actions performed during link state changes into
function phy_adjust_link

 drivers/net/ethernet/marvell/skge.c |   6 +-
 drivers/net/ethernet/marvell/skge.h |   4 +-
 drivers/net/phy/Kconfig |  13 +++-
 drivers/net/phy/Makefile|   1 +
 drivers/net/phy/phy.c   |  22 ---
 drivers/net/phy/phy_device.c|   4 ++
 drivers/net/phy/phy_led_triggers.c  | 121 
 include/linux/phy.h |   9 +++
 include/linux/phy_led_triggers.h|  52 
 9 files changed, 218 insertions(+), 14 deletions(-)
 create mode 100644 drivers/net/phy/phy_led_triggers.c
 create mode 100644 include/linux/phy_led_triggers.h

-- 
2.7.4

[PATCH v4 3/3] net: phy: leds: add support for led triggers on phy link state change

2016-10-11 Thread Zach Brown

From: Josh Cartwright 

Create an option CONFIG_LED_TRIGGER_PHY (default n), which will
create a set of led triggers for each instantiated PHY device.  There is
one LED trigger per link-speed, per-phy.

This allows for a user to configure their system to allow a set of LEDs
to represent link state changes on the phy.

Signed-off-by: Josh Cartwright 
Signed-off-by: Nathan Sullivan 
Signed-off-by: Zach Brown 
---
 drivers/net/phy/Kconfig|  13 +++-
 drivers/net/phy/Makefile   |   1 +
 drivers/net/phy/phy.c  |   1 +
 drivers/net/phy/phy_device.c   |   4 ++
 drivers/net/phy/phy_led_triggers.c | 121 +
 include/linux/phy.h|   9 +++
 include/linux/phy_led_triggers.h   |  52 
 7 files changed, 200 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/phy/phy_led_triggers.c
 create mode 100644 include/linux/phy_led_triggers.h

diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig
index 5078a0d..4fd912d 100644
--- a/drivers/net/phy/Kconfig
+++ b/drivers/net/phy/Kconfig
@@ -25,6 +25,18 @@ config MDIO_BCM_IPROC
  This module provides a driver for the MDIO busses found in the
  Broadcom iProc SoC's.
 
+config LED_TRIGGER_PHY
+   bool "Support LED triggers for tracking link state"
+   depends on LEDS_TRIGGERS
+   ---help---
+ Adds support for a set of LED trigger events per-PHY.  Link
+ state change will trigger the events, for consumption by an
+ LED class driver.  There are triggers for each link speed,
+ and are of the form:
+  <mii bus id>:<phy>:<speed>
+
+ Where speed is one of: 10Mbps, 100Mbps, 1Gbps, 2.5Gbps, or 10Gbps.
+
 config MDIO_BCM_UNIMAC
tristate "Broadcom UniMAC MDIO bus controller"
depends on HAS_IOMEM
@@ -40,7 +52,6 @@ config MDIO_BITBANG
  This module implements the MDIO bus protocol in software,
  for use by low level drivers that export the ability to
  drive the relevant pins.
-
  If in doubt, say N.
 
 config MDIO_BUS_MUX
diff --git a/drivers/net/phy/Makefile b/drivers/net/phy/Makefile
index e58667d..86d12cd 100644
--- a/drivers/net/phy/Makefile
+++ b/drivers/net/phy/Makefile
@@ -2,6 +2,7 @@
 
 libphy-y   := phy.o phy_device.o mdio_bus.o mdio_device.o
 libphy-$(CONFIG_SWPHY) += swphy.o
+libphy-$(CONFIG_LED_TRIGGER_PHY)   += phy_led_triggers.o
 
 obj-$(CONFIG_PHYLIB)   += libphy.o
 
diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index f5721db..e5f9fee7 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -896,6 +896,7 @@ EXPORT_SYMBOL(phy_start);
 static void phy_adjust_link(struct phy_device *phydev)
 {
phydev->adjust_link(phydev->attached_dev);
+   phy_led_trigger_change_speed(phydev);
 }
 
 /**
diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index e977ba9..4671c13 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -30,6 +30,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -57,6 +58,7 @@ static void phy_mdio_device_free(struct mdio_device *mdiodev)
 
 static void phy_device_release(struct device *dev)
 {
+   phy_led_triggers_unregister(to_phy_device(dev));
kfree(to_phy_device(dev));
 }
 
@@ -345,6 +347,8 @@ struct phy_device *phy_device_create(struct mii_bus *bus, int addr, int phy_id,
 
dev->state = PHY_DOWN;
 
+   phy_led_triggers_register(dev);
+
mutex_init(&dev->lock);
INIT_DELAYED_WORK(&dev->state_queue, phy_state_machine);
INIT_WORK(&dev->phy_queue, phy_change);
diff --git a/drivers/net/phy/phy_led_triggers.c b/drivers/net/phy/phy_led_triggers.c
new file mode 100644
index 000..32326d7
--- /dev/null
+++ b/drivers/net/phy/phy_led_triggers.c
@@ -0,0 +1,121 @@
+/* Copyright (C) 2016 National Instruments Corp.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+  * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+#include 
+#include 
+#include 
+
+static struct phy_led_trigger *phy_speed_to_led_trigger(struct phy_device *phy,
+   unsigned int speed)
+{
+   switch (speed) {
+   case SPEED_10:
+   return &phy->phy_led_trigger[0];
+   case SPEED_100:
+   return &phy->phy_led_trigger[1];
+   case SPEED_1000:
+   return &phy->phy_led_trigger[2];
+   case SPEED_2500:
+   return
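
The patch text is cut off above. As a self-contained illustration of the
speed-to-trigger mapping it starts to show, here is a userspace sketch.
The struct layout and function name are invented for the example, and the
array slots used for 2.5Gbps and 10Gbps are assumptions, since only the
first three cases are visible in the excerpt. The speed values match the
SPEED_* constants from ethtool (speed in Mb/s).

```c
#include <assert.h>
#include <stddef.h>

/* Speeds in Mb/s, as in the ethtool SPEED_* constants. */
enum {
	SPEED_10    = 10,
	SPEED_100   = 100,
	SPEED_1000  = 1000,
	SPEED_2500  = 2500,
	SPEED_10000 = 10000,
};

#define NUM_TRIGGERS 5

/* Hypothetical stand-ins for the kernel structures. */
struct led_trigger { const char *speed_name; };
struct phy { struct led_trigger trig[NUM_TRIGGERS]; };

/* Returns the trigger for a link speed, or NULL for an unknown speed. */
static struct led_trigger *speed_to_trigger(struct phy *phy,
					    unsigned int speed)
{
	assert(phy != NULL);

	switch (speed) {
	case SPEED_10:
		return &phy->trig[0];
	case SPEED_100:
		return &phy->trig[1];
	case SPEED_1000:
		return &phy->trig[2];
	case SPEED_2500:
		return &phy->trig[3];	/* assumed slot */
	case SPEED_10000:
		return &phy->trig[4];	/* assumed slot */
	default:
		return NULL;
	}
}
```

Returning NULL for an unlisted speed lets the caller simply skip the LED
update rather than indexing past the end of the array.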

[PATCH v4 2/3] net: phy: Encapsulate actions performed during link state changes into function phy_adjust_link

2016-10-11 Thread Zach Brown

During phy state machine state transitions some set of actions should
occur whenever the link state changes. These actions should be
encapsulated into a single function

This patch adds the phy_adjust_link function, which is called whenever
phydev->adjust_link would have been called before. Actions that should
occur whenever the phy link is adjusted can now be added to the
phy_adjust_link function.

Signed-off-by: Zach Brown 
---
 drivers/net/phy/phy.c | 21 +
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index c6f6683..f5721db 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -893,6 +893,11 @@ void phy_start(struct phy_device *phydev)
 }
 EXPORT_SYMBOL(phy_start);
 
+static void phy_adjust_link(struct phy_device *phydev)
+{
+   phydev->adjust_link(phydev->attached_dev);
+}
+
 /**
  * phy_state_machine - Handle the state machine
  * @work: work_struct that describes the work to be done
@@ -935,7 +940,7 @@ void phy_state_machine(struct work_struct *work)
if (!phydev->link) {
phydev->state = PHY_NOLINK;
netif_carrier_off(phydev->attached_dev);
-   phydev->adjust_link(phydev->attached_dev);
+   phy_adjust_link(phydev);
break;
}
 
@@ -948,7 +953,7 @@ void phy_state_machine(struct work_struct *work)
if (err > 0) {
phydev->state = PHY_RUNNING;
netif_carrier_on(phydev->attached_dev);
-   phydev->adjust_link(phydev->attached_dev);
+   phy_adjust_link(phydev);
 
} else if (0 == phydev->link_timeout--)
needs_aneg = true;
@@ -975,7 +980,7 @@ void phy_state_machine(struct work_struct *work)
}
phydev->state = PHY_RUNNING;
netif_carrier_on(phydev->attached_dev);
-   phydev->adjust_link(phydev->attached_dev);
+   phy_adjust_link(phydev);
}
break;
case PHY_FORCING:
@@ -991,7 +996,7 @@ void phy_state_machine(struct work_struct *work)
needs_aneg = true;
}
 
-   phydev->adjust_link(phydev->attached_dev);
+   phy_adjust_link(phydev);
break;
case PHY_RUNNING:
/* Only register a CHANGE if we are polling and link changed
@@ -1020,7 +1025,7 @@ void phy_state_machine(struct work_struct *work)
netif_carrier_off(phydev->attached_dev);
}
 
-   phydev->adjust_link(phydev->attached_dev);
+   phy_adjust_link(phydev);
 
if (phy_interrupt_is_valid(phydev))
err = phy_config_interrupt(phydev,
@@ -1030,7 +1035,7 @@ void phy_state_machine(struct work_struct *work)
if (phydev->link) {
phydev->link = 0;
netif_carrier_off(phydev->attached_dev);
-   phydev->adjust_link(phydev->attached_dev);
+   phy_adjust_link(phydev);
do_suspend = true;
}
break;
@@ -1054,7 +1059,7 @@ void phy_state_machine(struct work_struct *work)
} else  {
phydev->state = PHY_NOLINK;
}
-   phydev->adjust_link(phydev->attached_dev);
+   phy_adjust_link(phydev);
} else {
phydev->state = PHY_AN;
phydev->link_timeout = PHY_AN_TIMEOUT;
@@ -1070,7 +1075,7 @@ void phy_state_machine(struct work_struct *work)
} else  {
phydev->state = PHY_NOLINK;
}
-   phydev->adjust_link(phydev->attached_dev);
+   phy_adjust_link(phydev);
}
break;
}
-- 
2.7.4
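
The point of the wrapper is easiest to see in a stripped-down userspace
model. The sketch below reuses the kernel's names for readability, but the
structures are reduced to the bare minimum and the hook_calls counter is
purely hypothetical: it stands for any extra action that should run on
every link change.

```c
#include <assert.h>
#include <stddef.h>

struct net_device {
	int adjust_calls;	/* how often the driver callback ran */
};

struct phy_device {
	struct net_device *attached_dev;
	void (*adjust_link)(struct net_device *dev);
	int hook_calls;		/* hypothetical extra per-change action */
};

/* Example driver callback. */
static void demo_adjust_link(struct net_device *dev)
{
	dev->adjust_calls++;
}

/*
 * Single place through which every link adjustment goes. New actions
 * are added here once instead of at each call site in the state machine.
 */
static void phy_adjust_link(struct phy_device *phydev)
{
	assert(phydev != NULL && phydev->adjust_link != NULL);

	phydev->adjust_link(phydev->attached_dev);
	phydev->hook_calls++;
}
```

Because the state machine calls only phy_adjust_link(), a later change
that adds one more line to the wrapper takes effect at every call site
without touching them again.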

Re: [PATCH net-next] mlx5: Add MLX5_SET64_VCHK to fix BUILD_BUG_ON

2016-10-11 Thread Saeed Mahameed




On 10/11/2016 01:22 PM, Tom Herbert wrote:

I am hitting this in mlx5:

drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c: In function
‘reclaim_pages_cmd.clone.0’:
drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c:346: error: call
to ‘__compiletime_assert_346’ declared with attribute error:
BUILD_BUG_ON failed: __mlx5_bit_off(manage_pages_out, pas[i]) % 64
drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c: In function ‘give_pages’:
drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c:291: error: call
to ‘__compiletime_assert_291’ declared with attribute error:
BUILD_BUG_ON failed: __mlx5_bit_off(manage_pages_in, pas[i]) % 64

Problem is that this is doing a BUILD_BUG_ON on a non-constant
expression because of trying to take offset of pas[i] in the
structure.

Fix is to create MLX5_SET64_VCHK that takes an additional argument
that is a constant. There are two callers of MLX5_SET64 that are
trying to get a variable offset, change those to call MLX5_SET64_VCHK
passing pas[0] as the argument to use in the offset check.

Fixes: a533ed5e179cd ("net/mlx5: Pages management commands via mlx5 ifc")
Signed-off-by: Tom Herbert 


Acked-by: Saeed Mahameed
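
The underlying problem, a compile-time check applied to an expression
containing a run-time index, can be reproduced outside the kernel. The
sketch below is only an illustration: it uses C11 _Static_assert in place
of BUILD_BUG_ON, byte offsets instead of the bit offsets used by the mlx5
macros, and a made-up struct layout and macro name. It checks the offset
of the array member itself, which is a constant, and applies the index
only when storing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout, loosely modelled on a command with a page array. */
struct manage_pages_in {
	uint32_t opcode;
	uint32_t npages;
	uint64_t pas[4];
};

/*
 * Check alignment of the array field at build time (constant expression),
 * then store into element ndx at run time.
 */
#define SET64_INDEXED(typ, p, fld, ndx, v) do {				\
	_Static_assert(offsetof(typ, fld) % 8 == 0,			\
		       "field must be 64-bit aligned");			\
	((typ *)(p))->fld[(ndx)] = (uint64_t)(v);			\
} while (0)

static void fill_pages(struct manage_pages_in *in, const uint64_t *addrs,
		       size_t n)
{
	size_t i;

	assert(n <= 4);
	for (i = 0; i < n; i++)
		SET64_INDEXED(struct manage_pages_in, in, pas, i, addrs[i]);
}
```

Since every element of a 64-bit array shares the alignment of element 0,
checking the field once is sufficient, and the compiler never has to
evaluate an offset that depends on i.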

[PATCH net] Revert "net: Add driver helper functions to determine checksum offloadability"

2016-10-11 Thread Stephen Hemminger


This reverts commit 6ae23ad36253a8033c5714c52b691b84456487c5.

The code has been in the kernel since 4.4 but there is no in-tree
code that uses it. Unused code is broken code; remove it.

Signed-off-by: Stephen Hemminger 
---
 include/linux/netdevice.h |  78 --
 net/core/dev.c| 136 --
 2 files changed, 214 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 136ae6bb..793155e 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2649,71 +2649,6 @@ static inline void skb_gro_remcsum_cleanup(struct sk_buff *skb,
remcsum_unadjust((__sum16 *)ptr, grc->delta);
 }
 
-struct skb_csum_offl_spec {
-   __u16   ipv4_okay:1,
-   ipv6_okay:1,
-   encap_okay:1,
-   ip_options_okay:1,
-   ext_hdrs_okay:1,
-   tcp_okay:1,
-   udp_okay:1,
-   sctp_okay:1,
-   vlan_okay:1,
-   no_encapped_ipv6:1,
-   no_not_encapped:1;
-};
-
-bool __skb_csum_offload_chk(struct sk_buff *skb,
-   const struct skb_csum_offl_spec *spec,
-   bool *csum_encapped,
-   bool csum_help);
-
-static inline bool skb_csum_offload_chk(struct sk_buff *skb,
-   const struct skb_csum_offl_spec *spec,
-   bool *csum_encapped,
-   bool csum_help)
-{
-   if (skb->ip_summed != CHECKSUM_PARTIAL)
-   return false;
-
-   return __skb_csum_offload_chk(skb, spec, csum_encapped, csum_help);
-}
-
-static inline bool skb_csum_offload_chk_help(struct sk_buff *skb,
-const struct skb_csum_offl_spec *spec)
-{
-   bool csum_encapped;
-
-   return skb_csum_offload_chk(skb, spec, &csum_encapped, true);
-}
-
-static inline bool skb_csum_off_chk_help_cmn(struct sk_buff *skb)
-{
-   static const struct skb_csum_offl_spec csum_offl_spec = {
-   .ipv4_okay = 1,
-   .ip_options_okay = 1,
-   .ipv6_okay = 1,
-   .vlan_okay = 1,
-   .tcp_okay = 1,
-   .udp_okay = 1,
-   };
-
-   return skb_csum_offload_chk_help(skb, &csum_offl_spec);
-}
-
-static inline bool skb_csum_off_chk_help_cmn_v4_only(struct sk_buff *skb)
-{
-   static const struct skb_csum_offl_spec csum_offl_spec = {
-   .ipv4_okay = 1,
-   .ip_options_okay = 1,
-   .tcp_okay = 1,
-   .udp_okay = 1,
-   .vlan_okay = 1,
-   };
-
-   return skb_csum_offload_chk_help(skb, &csum_offl_spec);
-}
-
 static inline int dev_hard_header(struct sk_buff *skb, struct net_device *dev,
  unsigned short type,
  const void *daddr, const void *saddr,
@@ -3957,19 +3892,6 @@ static inline bool can_checksum_protocol(netdev_features_t features,
}
 }
 
-/* Map an ethertype into IP protocol if possible */
-static inline int eproto_to_ipproto(int eproto)
-{
-   switch (eproto) {
-   case htons(ETH_P_IP):
-   return IPPROTO_IP;
-   case htons(ETH_P_IPV6):
-   return IPPROTO_IPV6;
-   default:
-   return -1;
-   }
-}
-
 #ifdef CONFIG_BUG
 void netdev_rx_csum_fault(struct net_device *dev);
 #else
diff --git a/net/core/dev.c b/net/core/dev.c
index f1fe26f..593f427 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -139,7 +139,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 
 #include "net-sysfs.h"
@@ -2492,141 +2491,6 @@ out:
 }
 EXPORT_SYMBOL(skb_checksum_help);
 
-/* skb_csum_offload_check - Driver helper function to determine if a device
- * with limited checksum offload capabilities is able to offload the checksum
- * for a given packet.
- *
- * Arguments:
- *   skb - sk_buff for the packet in question
- *   spec - contains the description of what device can offload
- *   csum_encapped - returns true if the checksum being offloaded is
- *   encpasulated. That is it is checksum for the transport header
- *   in the inner headers.
- *   checksum_help - when set indicates that helper function should
- *   call skb_checksum_help if offload checks fail
- *
- * Returns:
- *   true: Packet has passed the checksum checks and should be offloadable to
- *the device (a driver may still need to check for additional
- *restrictions of its device)
- *   false: Checksum is not offloadable. If checksum_help was set then
- *skb_checksum_help was called to resolve checksum for non-GSO
- *packets and when IP protocol is not SCTP
- */
-bool __skb_csum_offload_chk(struct sk_buff *skb,
-   const

Re: [PATCH net-next] tcp: Change txhash on some non-RTO retransmits

2016-10-11 Thread Yuchung Cheng

On Mon, Oct 10, 2016 at 5:18 PM, Lawrence Brakmo  wrote:
>
> The purpose of this patch is to help balance flows across paths. A new
> sysctl "tcp_retrans_txhash_prob" specifies the probability (0-100) that
> the txhash (IPv6 flowlabel) will be changed after a non-RTO retransmit.
> A probability is used in order to control how many flows are moved
> during a congestion event and prevent the congested path from becoming
> under utilized (which could occur if too many flows leave the current
> path). Txhash changes may be delayed in order to decrease the likelihood
> that it will trigger retransmits due to too much reordering.
>
> Another sysctl "tcp_retrans_txhash_mode" determines the behavior after
> RTOs. If the sysctl is 0, then after an RTO, only RTOs can trigger
> txhash changes. The idea is to decrease the likelihood of going back
> to a broken path. That is, we don't want flow balancing to trigger
> changes to broken paths. The drawback is that flow balancing does
> not work as well. If the sysctl is greater than 0, then we always
> do flow balancing, even after RTOs.
>
> Tested with packetdrill tests (for correctness) and performance
> experiments with 2 and 3 paths. Performance experiments looked at
> aggregate goodput and fairness. For each run, we looked at the ratio of
> the goodputs for the fastest and slowest flows. These were averaged for
> all the runs. A fairness of 1 means all flows had the same goodput, a
> fairness of 2 means the fastest flow was twice as fast as the slowest
> flow.
>
> The setup for the performance experiments was 4 or 5 servers in a rack,
> 10G links. I tested various probabilities, but 20 seemed to have the
> best tradeoff for my setup (small RTTs).
>
>                   --- node1 ---
> sender --- switch --- node2 --- switch --- receiver
>                   --- node3 ---
>
> Scenario 1: One sender sends to one receiver through 2 routes (node1 or
> node 2). The output from node1 and node2 is 1G (1gbit/sec). With only 2
> flows, without flow balancing (prob=0) the average goodput is 1.6G vs.
> 1.9G with flow balancing due to 2 flows ending up in one link and either
> not moving or taking some time to move. Fairness was 1 in all cases.
> For 7 flows, goodput was 1.9G for all, but fairness was 1.5, 1.4 or 1.2
> for prob=0, prob=20,mode=0 and prob=20,mode=1 respectively. That is,
> flow balancing increased fairness.
>
> Scenario 2: One sender to one receiver, through 3 routes (node1,...
> node2). With 6 or 16 flows the goodput was the same for all, but
> fairness was 1.8, 1.5 and 1.2 respectively. Interestingly, the worst
> case fairness out of 10 runs was 2.2, 1.8 and 1.4 respectively. That is,
> prob=20,mode=1 improved average and worst case fairness.
I am wondering if we can build a better API with the routing layer to
implement this type of feature, instead of scattering the tx_rehashing
logic throughout TCP. For example, we call dst_negative_advice on TCP
write timeouts.

On the patch itself, it seems aggressive to (attempt to) rehash every
post-RTO retransmission. Also you can just use ca_state (==CA_Loss) to
identify post-RTO retransmission directly.

is this an implementation of the Flow Bender ?
http://dl.acm.org/citation.cfm?id=2674985

>
> Scenario 3: One sender to one receiver, 2 routes, one route drops 50% of
> the packets. With 7 flows, goodput was the same 1.1G, but fairness was
> 1.8, 2.0 and 2.1 respectively. That is, if there is a bad route, then
> balancing, which does more re-routes, is less fair.
>
> Signed-off-by: Lawrence Brakmo 
> ---
>  Documentation/networking/ip-sysctl.txt | 15 +++
>  include/linux/tcp.h|  4 +++-
>  include/net/tcp.h  |  2 ++
>  net/ipv4/sysctl_net_ipv4.c | 18 ++
>  net/ipv4/tcp_input.c   | 10 ++
>  net/ipv4/tcp_output.c  | 23 ++-
>  net/ipv4/tcp_timer.c   |  4 
>  7 files changed, 74 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
> index 3db8c67..87a984c 100644
> --- a/Documentation/networking/ip-sysctl.txt
> +++ b/Documentation/networking/ip-sysctl.txt
> @@ -472,6 +472,21 @@ tcp_max_reordering - INTEGER
> if paths are using per packet load balancing (like bonding rr mode)
> Default: 300
>
> +tcp_retrans_txhash_mode - INTEGER
> +   If zero, disable txhash recalculation due to non-RTO retransmissions
> +   after an RTO. The idea is that broken paths will trigger an RTO and
> +   we don't want going back to that path due to standard retransmissons
> +   (flow balancing). The drawback is that balancing is less robust.
> +   If greater than zero, can always (probabilistically) recalculate
> +   txhash after non-RTO retransmissions.
> +
> +tcp_retrans_txhash_prob - INTEGER
> +   Probability [0 to 100] that we

Re: [PATCH net-next] mlx5: Add MLX5_SET64_VCHK to fix BUILD_BUG_ON

2016-10-11 Thread Leon Romanovsky

On Tue, Oct 11, 2016 at 08:46:45AM -0700, Tom Herbert wrote:
> On Tue, Oct 11, 2016 at 4:57 AM, Saeed Mahameed wrote:
> > On Tue, Oct 11, 2016 at 7:50 PM, David Laight wrote:
> >> From: Tom Herbert
> >>> Sent: 11 October 2016 05:22
> >> ...
> >>> Fix is to create MLX5_SET64_VCHK that takes an additional argument
> >>> that is a constant. There are two callers of MLX5_SET64 that are
> >>> trying to get a variable offset, change those to call MLX5_SET64_VCHK
> >>> passing pas[0] as the argument to use in the offset check.
> >>
> >> I think I'd separate the array index instead.
> >> Something like:
> >>
> >> #define MLX5_SET64_INDEXED(typ, p, fld, ndx, v) do { \
> >> BUILD_BUG_ON(__mlx5_bit_off(typ, fld) % 64); \
> >> __MLX5_SET64(typ, p, fld[ndx], v); \
> >> } while (0)
> >>
> >> David
> >
> > Yes, I think this looks more natural, but instead of MLX5_SET64_INDEXED,
> > I prefer to have 2 macros
> > MLX5_SET64(typ, p, fld, v) and MLX5_ARRAY_SET64(typ, p, fld, idx, v).
> >
> > Tom, do you want me to fix it ?
> >
> Please do.

Saeed,

Will you manage to send this patch before -rc1 is released? That way
Linus's -rc1 will be free of this build error.

>
> > Thanks,
> > Saeed.



Re: RFH: problems with adjacency graph

2016-10-11 Thread David Ahern

On 10/11/16 12:54 AM, Jiri Pirko wrote:
>>
>> It seems like the complete mesh is not really needed, but cscope shows 
>> spectrum, ixgbe and bonding all using the for_each upper and lower device 
>> macros.
>>
>> Suggestions?
> 
> Well other possibility is to traverse the tree recursively. But that is
> exactly why the colided lists of all uppers/lowers were introduced to
> avoid this.

The simpler approach is to remove all_adj_list completely and iteratively walk 
the lower and upper lists in adj_list for the macros that walk all lower and 
all upper devices. Maintaining a complete linear list per netdev of all lower 
devices and all upper devices is just not going to work given all combinations 
of enslave orderings and considering stacks as high as 6 deep. As far as I can 
tell the netdev_for_each_all_lower_dev, netdev_for_each_all_lower_dev_rcu, and 
netdev_for_each_all_upper_dev_rcu macros are all used in the slow path so it 
should be ok. Removing all_adj_list significantly simplifies the dev insert and 
remove code.
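
To make the proposal concrete, here is a toy model of walking all lower devices using only each device's immediate lower list, instead of a flattened per-device list. It is a sketch under assumptions: `struct demo_dev`, `DEMO_MAX_LOWER`, `DEMO_MAX_DEPTH` and `demo_count_all_lower` are hypothetical names, a fixed-size array stands in for adj_list, and a device reachable through two paths would be visited once per path.

```c
#include <stddef.h>

#define DEMO_MAX_LOWER 4	/* hypothetical fan-out limit */
#define DEMO_MAX_DEPTH 8	/* assumed bound on stack height */

/* Each device records only its immediate lower neighbours. */
struct demo_dev {
	const char *name;
	struct demo_dev *lower[DEMO_MAX_LOWER];
	size_t n_lower;
};

/*
 * Visit every device below @top with an explicit stack (no recursion,
 * no flattened list). Returns the number of visits, or -1 if the
 * topology is deeper than DEMO_MAX_DEPTH.
 */
static int demo_count_all_lower(struct demo_dev *top)
{
	struct demo_dev *dev_stack[DEMO_MAX_DEPTH];
	size_t idx_stack[DEMO_MAX_DEPTH];
	size_t depth = 0;
	int visited = 0;

	dev_stack[0] = top;
	idx_stack[0] = 0;

	for (;;) {
		struct demo_dev *cur = dev_stack[depth];

		if (idx_stack[depth] < cur->n_lower) {
			/* Descend into the next unvisited lower device. */
			struct demo_dev *next = cur->lower[idx_stack[depth]++];

			visited++;
			if (depth + 1 >= DEMO_MAX_DEPTH)
				return -1;
			depth++;
			dev_stack[depth] = next;
			idx_stack[depth] = 0;
		} else if (depth == 0) {
			break;		/* all lowers of @top are done */
		} else {
			depth--;	/* backtrack to the upper device */
		}
	}
	return visited;
}
```

The cost is paid at walk time rather than at enslave time, which matches the observation above that the walking macros are only used in the slow path.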

Re: [PATCH] IB/ipoib: move back the IB LL address into the hard header

2016-10-11 Thread Doug Ledford

On 10/11/2016 2:30 PM, Jason Gunthorpe wrote:
> On Tue, Oct 11, 2016 at 02:17:51PM -0400, Doug Ledford wrote:
> 
>> Well, not exactly.  Even if we put 65520 into the scripts, the kernel
>> will silently drop it down to 65504.  It actually won't require anyone
>> change anything, they just won't get the full value.  I experimented
>> with this in the past for other reasons and an overly large MTU setting
>> just resulted in the max MTU.  I don't know if that's changed, but if it
>> still works that way, this is much less of an issue than it might
>> otherwise be.
> 
> So it is just docs and relying on PMTU? That is not as bad..
> 
> Still would be nice to avoid if at all possible..

I agree, but we have a test getting ready to commence.  We'll know
shortly how much the reduced MTU affects things because they aren't
going to alter any of their setup, just put the new kernel in place, and
see what happens.


-- 
Doug Ledford 
GPG Key ID: 0E572FDD




saddr based blackhole/unreachable route

2016-10-11 Thread Bjørnar Ness

Hello, netdev

In a typical setup (eth0=internet, eth1=lan) I populate routing table
100 with the saddrs I want dropped, and add the rule:
"ip ru a pref 100 lookup table 100"

What I would expect to see is that packets with a saddr in table 100
coming in on eth0 go out eth1, with the replies being dropped, but I do
not see the packets going out eth1 at all.

I have tried searching and following the fib code path, but have still
not managed to understand what is really going on here.

Is the saddr looked up in the routing table?
Why don't I get ICMP unreachable for unreachable routes?
Is tcpdump tricking me here?

I like the behavior, I just don't know if I can trust it.

Kernel 4.8.1

Regards,
-- 
Bj(/)rnar

Re: [PATCH] IB/ipoib: move back the IB LL address into the hard header

2016-10-11 Thread Jason Gunthorpe

On Tue, Oct 11, 2016 at 02:17:51PM -0400, Doug Ledford wrote:

> Well, not exactly.  Even if we put 65520 into the scripts, the kernel
> will silently drop it down to 65504.  It actually won't require anyone
> change anything, they just won't get the full value.  I experimented
> with this in the past for other reasons and an overly large MTU setting
> just resulted in the max MTU.  I don't know if that's changed, but if it
> still works that way, this is much less of an issue than it might
> otherwise be.

So it is just docs and relying on PMTU? That is not as bad..

Still would be nice to avoid if at all possible..

Jason

Re: [PATCH] IB/ipoib: move back the IB LL address into the hard header

2016-10-11 Thread Paolo Abeni

On Tue, 2016-10-11 at 11:42 -0600, Jason Gunthorpe wrote:
> On Tue, Oct 11, 2016 at 07:37:32PM +0200, Paolo Abeni wrote:
> > On Tue, 2016-10-11 at 11:32 -0600, Jason Gunthorpe wrote:
> > > On Tue, Oct 11, 2016 at 07:15:44PM +0200, Paolo Abeni wrote:
> > > 
> > > > Also the connected mode maximum mtu is reduced by 16 bytes to
> > > > cope with the increased hard header len.
> > > 
> > > Changing the MTU is going to cause annoying interop problems, can you
> > > avoid this?
> > 
> > I don't like changing the maximum MTU value, too, but I was unable to
> > find an alternative solution. The PMTU detection should protect against
> > such issues.
> 
> It is more that PMTU, we have instructed all users that is the MTU
> number needed to enable CM mode, so it appears in documentation,
> scripts, etc.

AFAICS the max MTU is already dependent on the underlying h/w; how are such
differences currently coped with? (I'm sorry, I lack a lot of IB
background)

Re: [PATCH] IB/ipoib: move back the IB LL address into the hard header

2016-10-11 Thread Jason Gunthorpe

On Tue, Oct 11, 2016 at 08:10:07PM +0200, Paolo Abeni wrote:

> The first s/g fragment (the head buffer) is not allocated with the page
> allocator, so perhaps there is some not too difficult/costly way out of
> this.

Keep in mind, there is nothing magic about the 16 SGL limit, other
than we know all hardware supports it. That can be bumped up and most
hardware will support a higher value.

We'd just have to figure out if any hardware breaks, Mellanox and Intel
should be able to respond to that question.

Jason

Re: [PATCH] IB/ipoib: move back the IB LL address into the hard header

2016-10-11 Thread Doug Ledford

On 10/11/2016 2:10 PM, Paolo Abeni wrote:
> On Tue, 2016-10-11 at 12:01 -0600, Jason Gunthorpe wrote:
>>> AFAICS the max mtu is already underlying h/w dependent, how does such
>>> differences are currently coped by ? (I'm sorry I lack some/a lot of IB
>>> back-ground)
>>
>> It isn't h/w dependent. In CM mode the MTU is 65520 because that is
>> what is hard coded into the ipoib driver. We tell everyone to use that
>> number. Eg see RH's docs on the subject:
>>
>> https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Networking_Guide/sec-Configuring_IPoIB.html
>>
>> AFAIK, today everyone just wires that number into their scripts, so we
>> have to mass change everything to the smaller number.

Well, not exactly.  Even if we put 65520 into the scripts, the kernel
will silently drop it down to 65504.  It actually won't require anyone
change anything, they just won't get the full value.  I experimented
with this in the past for other reasons and an overly large MTU setting
just resulted in the max MTU.  I don't know if that's changed, but if it
still works that way, this is much less of an issue than it might
otherwise be.

>> That sounds
>> really hard, IMHO if there is any way to avoid it we should, even if
>> it is a little costly.
> 
> Thank you for the details!
> 
> The first s/g fragment (the head buffer) is not allocated with the page
> allocator, so perhaps there is some not too difficult/costly way out of
> this.
> 


-- 
Doug Ledford 
GPG Key ID: 0E572FDD




Re: [PATCH] IB/ipoib: move back the IB LL address into the hard header

2016-10-11 Thread Paolo Abeni

On Tue, 2016-10-11 at 12:01 -0600, Jason Gunthorpe wrote:
> > AFAICS the max mtu is already underlying h/w dependent, how does such
> > differences are currently coped by ? (I'm sorry I lack some/a lot of IB
> > back-ground)
> 
> It isn't h/w dependent. In CM mode the MTU is 65520 because that is
> what is hard coded into the ipoib driver. We tell everyone to use that
> number. Eg see RH's docs on the subject:
> 
> https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Networking_Guide/sec-Configuring_IPoIB.html
> 
> AFAIK, today everyone just wires that number into their scripts, so we
> have to mass change everything to the smaller number. That sounds
> really hard, IMHO if there is any way to avoid it we should, even if
> it is a little costly.

Thank you for the details!

The first s/g fragment (the head buffer) is not allocated with the page
allocator, so perhaps there is some not too difficult/costly way out of
this.

Re: [PATCH net 3/3] openvswitch: add NETIF_F_HW_VLAN_STAG_TX to internal dev

2016-10-11 Thread Eric Garver

On Mon, Oct 10, 2016 at 05:02:44PM +0200, Jiri Benc wrote:
> The internal device does support 802.1AD offloading since 018c1dda5ff1
> ("openvswitch: 802.1AD Flow handling, actions, vlan parsing, netlink
> attributes").
> 
> Signed-off-by: Jiri Benc 

Acked-by: Eric Garver

Re: [PATCH net] Panic when tc_lookup_action_n finds a partially initialized action.

2016-10-11 Thread Cong Wang

On Tue, Oct 11, 2016 at 2:28 AM, Krister Johansen
 wrote:
> On Wed, Oct 05, 2016 at 11:01:38AM -0700, Cong Wang wrote:
>> Does the attached patch make any sense now? Our pernet init doesn't
>> rely on act_base, so even we have some race, the worst case is after
>> we initialize the pernet netns for an action but its ops still not
>> visible, which seems fine (at least no crash).
>
> I tried to reproduce the panic with this latest patch, but I am unable
> to do so.  The one difference I notice between this patch, and the one I

Nice, so the crash is fixed. I will send out my patch formally.

> sent to the list, is that with yours it takes much longer before we get
> any output from the simultaneous launch of these containers.  Presumably
> that's the extra latency added by allowing many extra modprobe calls to
> get spawned by request_module().

This is a different problem. When we register a pernet ops, ops->init()
is called for each container to initialize its pernet data structures;
this is why request_module() blocks waiting for register_pernet_subsys()
to finish. As we discussed earlier, this could be solved or worked around by
loading the modules prior to creating containers. Or, if this really needs to
be fixed, it should be fixed in register_pernet_subsys(): it is not specific
to tc actions, since many places in the networking subsystem load modules at
run time.

Thanks for testing!

Re: [PATCH] IB/ipoib: move back the IB LL address into the hard header

2016-10-11 Thread Jason Gunthorpe

On Tue, Oct 11, 2016 at 01:41:56PM -0400, Doug Ledford wrote:

> declare the header.  The problem then became that the sg setup is such
> that we are limited to 16 4k pages for the sg array, so that header had
> to come out of the 64k maximum mtu.

Oh, that clarifies things..

Hum, so various options become:
 - Use >=17 SGL entries when creating the QP. Is this possible
   on common adapters?
 - Use the FRWR infrastructure when necessary. Is there any chance
   the majority of skbs will have at least two physically
   contiguous pages to make this overhead rare? Perhaps as a fall
   back if many adaptors can do >=17 SGLs
 - Pad the hard header out to 4k and discard the first page
   when building the sgl
 - Memcopy the first ~8k into a contiguous 8k region on send
 - Move the pseudo header to the end so it can cross the page
   barrier without needing a sgl entry. (probably impossible?)
 
From Paolo

> AFAICS the max mtu is already underlying h/w dependent, how does such
> differences are currently coped by ? (I'm sorry I lack some/a lot of IB
> back-ground)

It isn't h/w dependent. In CM mode the MTU is 65520 because that is
what is hard coded into the ipoib driver. We tell everyone to use that
number. Eg see RH's docs on the subject:

https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Networking_Guide/sec-Configuring_IPoIB.html

AFAIK, today everyone just wires that number into their scripts, so we
have to mass change everything to the smaller number. That sounds
really hard, IMHO if there is any way to avoid it we should, even if
it is a little costly.

Jason

[Patch net] net_sched: reorder pernet ops and act ops registrations

2016-10-11 Thread Cong Wang

Krister reported a kernel NULL pointer dereference after
tcf_action_init_1() invokes a_o->init(). It is a race condition:
one thread calls tcf_register_action(), which initializes the
netns data only after putting the act ops in the global list,
while another thread searches the list and then calls
a_o->init(net, ...).

Fix this by moving the pernet ops registration before making
the action ops visible. This is fine because: a) we don't
rely on act_base in pernet ops->init(), b) in the worst case we
have a fully initialized netns but ops is still not ready so
new actions still can't be created.

Reported-by: Krister Johansen 
Tested-by: Krister Johansen 
Cc: Jamal Hadi Salim 
Signed-off-by: Cong Wang 
---
 net/sched/act_api.c | 19 +++++++++++--------
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index c910217..a512b18 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -341,22 +341,25 @@ int tcf_register_action(struct tc_action_ops *act,
 	if (!act->act || !act->dump || !act->init || !act->walk || !act->lookup)
 		return -EINVAL;
 
+	/* We have to register pernet ops before making the action ops visible,
+	 * otherwise tcf_action_init_1() could get a partially initialized
+	 * netns.
+	 */
+	ret = register_pernet_subsys(ops);
+	if (ret)
+		return ret;
+
 	write_lock(&act_mod_lock);
 	list_for_each_entry(a, &act_base, head) {
 		if (act->type == a->type || (strcmp(act->kind, a->kind) == 0)) {
 			write_unlock(&act_mod_lock);
+			unregister_pernet_subsys(ops);
 			return -EEXIST;
 		}
 	}
 	list_add_tail(&act->head, &act_base);
 	write_unlock(&act_mod_lock);
 
-	ret = register_pernet_subsys(ops);
-	if (ret) {
-		tcf_unregister_action(act, ops);
-		return ret;
-	}
-
 	return 0;
 }
 EXPORT_SYMBOL(tcf_register_action);
@@ -367,8 +370,6 @@ int tcf_unregister_action(struct tc_action_ops *act,
 	struct tc_action_ops *a;
 	int err = -ENOENT;
 
-	unregister_pernet_subsys(ops);
-
 	write_lock(&act_mod_lock);
 	list_for_each_entry(a, &act_base, head) {
 		if (a == act) {
@@ -378,6 +379,8 @@ int tcf_unregister_action(struct tc_action_ops *act,
 		}
 	}
 	write_unlock(&act_mod_lock);
+	if (!err)
+		unregister_pernet_subsys(ops);
 	return err;
 }
 EXPORT_SYMBOL(tcf_unregister_action);
-- 
2.1.0
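
The ordering change in this patch can be modelled in a small single-threaded sketch. Everything here is hypothetical (`demo_ops`, `demo_register`, `demo_lookup`, the fixed-size table), and it assumes each ops object is registered at most once. It only illustrates the rule from the commit message: set up the per-namespace state first, publish the ops second, and undo the first step if publishing fails.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_OPS 8

struct demo_ops {
	const char *kind;
	int pernet_ready;	/* stands in for initialized netns data */
};

static struct demo_ops *demo_base[DEMO_MAX_OPS];	/* "visible" ops */
static size_t demo_count;

static int demo_register(struct demo_ops *ops)
{
	size_t i;

	/* Step 1: initialize per-namespace state before anyone can see ops. */
	ops->pernet_ready = 1;

	/* Step 2: publish, rolling back step 1 on a duplicate or full table. */
	for (i = 0; i < demo_count; i++) {
		if (strcmp(demo_base[i]->kind, ops->kind) == 0) {
			ops->pernet_ready = 0;
			return -EEXIST;
		}
	}
	if (demo_count == DEMO_MAX_OPS) {
		ops->pernet_ready = 0;
		return -ENOSPC;
	}
	demo_base[demo_count++] = ops;
	return 0;
}

/* A lookup can only return ops whose per-namespace state is ready. */
static struct demo_ops *demo_lookup(const char *kind)
{
	size_t i;

	for (i = 0; i < demo_count; i++)
		if (strcmp(demo_base[i]->kind, kind) == 0)
			return demo_base[i];
	return NULL;
}
```

In the real code the publish step runs under a lock and the lookup runs concurrently; the sketch leaves that out and keeps only the ordering.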

Re: [PATCH net 2/3] openvswitch: fix vlan subtraction from packet length

2016-10-11 Thread Eric Garver

On Mon, Oct 10, 2016 at 05:02:43PM +0200, Jiri Benc wrote:
> When the packet has its vlan tag in skb->vlan_tci, the length of the VLAN
> header is not counted in skb->len. It doesn't make sense to subtract it.
> 
> Fixes: 018c1dda5ff1 ("openvswitch: 802.1AD Flow handling, actions, vlan parsing, netlink attributes")
> Signed-off-by: Jiri Benc 

Acked-by: Eric Garver

Re: [PATCH] IB/ipoib: move back the IB LL address into the hard header

2016-10-11 Thread Doug Ledford

On 10/11/2016 1:32 PM, Jason Gunthorpe wrote:
> On Tue, Oct 11, 2016 at 07:15:44PM +0200, Paolo Abeni wrote:
> 
>> Also the connected mode maximum mtu is reduced by 16 bytes to
>> cope with the increased hard header len.
> 
> Changing the MTU is going to cause annoying interop problems, can you
> avoid this?

(Paolo did the work I'm describing here, I'm just giving the explanation
he gave me):

Not using this particular solution I don't think.  We tried it without
increasing the declared hard header length and it broke when dealing
with skb_clone/GSO paths.  In order to make the LL pseudo header get
copied along with the rest of the encap and data on clone, we had to
declare the header.  The problem then became that the sg setup is such
that we are limited to 16 4k pages for the sg array, so that header had
to come out of the 64k maximum mtu.
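
The arithmetic behind that limit can be checked with a few lines of C. This is only a sketch: it assumes 4096-byte pages, takes the 4-byte encapsulation and 20-byte pseudo header lengths from the patch under discussion, and the `demo_` names are hypothetical.

```c
#define DEMO_PAGE_SIZE   4096u	/* assumption: 4k pages */
#define DEMO_ENCAP_LEN   4u	/* IPoIB encapsulation header */
#define DEMO_PSEUDO_LEN  20u	/* pseudo header added by the patch */
#define DEMO_HARD_LEN    (DEMO_ENCAP_LEN + DEMO_PSEUDO_LEN)

/* Pages (and thus SG entries) needed to receive mtu + header bytes. */
static unsigned int demo_cm_rx_sg(unsigned int mtu, unsigned int hdr_len)
{
	unsigned int buf_size = mtu + hdr_len;

	return (buf_size + DEMO_PAGE_SIZE - 1) / DEMO_PAGE_SIZE;
}
```

With these numbers, `demo_cm_rx_sg(65520, DEMO_ENCAP_LEN)` is 16, `demo_cm_rx_sg(65520, DEMO_HARD_LEN)` is 17, and `demo_cm_rx_sg(65504, DEMO_HARD_LEN)` is 16 again, which is why the maximum MTU drops by 16 bytes.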

-- 
Doug Ledford 
GPG Key ID: 0E572FDD


Re: [PATCH] IB/ipoib: move back the IB LL address into the hard header

2016-10-11 Thread Jason Gunthorpe

On Tue, Oct 11, 2016 at 07:37:32PM +0200, Paolo Abeni wrote:
> On Tue, 2016-10-11 at 11:32 -0600, Jason Gunthorpe wrote:
> > On Tue, Oct 11, 2016 at 07:15:44PM +0200, Paolo Abeni wrote:
> > 
> > > Also the connected mode maximum mtu is reduced by 16 bytes to
> > > cope with the increased hard header len.
> > 
> > Changing the MTU is going to cause annoying interop problems, can you
> > avoid this?
> 
> I don't like changing the maximum MTU value, too, but I was unable to
> find an alternative solution. The PMTU detection should protect against
> such issues.

It is more than PMTU: we have instructed all users that this is the MTU
number needed to enable CM mode, so it appears in documentation,
scripts, etc.

There is really no way to re-use some of the existing alignment
padding or exceed 64k?

Jason

Re: [PATCH net] Panic when tc_lookup_action_n finds a partially initialized action.

2016-10-11 Thread Cong Wang

On Sat, Oct 8, 2016 at 11:13 PM, Krister Johansen
 wrote:
> Hi Cong,
> Thanks for the follow-up.
>
> On Thu, Oct 06, 2016 at 12:01:15PM -0700, Cong Wang wrote:
>> On Wed, Oct 5, 2016 at 11:11 PM, Krister Johansen
>> > pernet_operations pointer.  The code in register_pernet_subsys() makes
>> > no attempt to check for duplicates.  If we add a pointer that's already
>> > in the list, and subsequently call unregister, the results seem
>> > undefined.  It looks like we'll remove the pernet_operations for the
>> > existing action, assuming we don't corrupt the list in the process.
>> >
>> > Is this actually safe?  If so, what corner case is the act->type /
>> > act->kind protecting us from?
>>
>> ops->type and ops->kind should be unique too, user-space already
>> relies on this (tc action ls action xxx). The code exists probably just
>> for sanity check.
>
> With that in mind, would it make sense to change the check to a WARN/BUG
> or some kind of assertion?  I mistakenly inferred that it was possible to
> legitimately end up in this scenario.


Yes, it makes sense to me.


>> > Part of the desire to inhibit extra modprobe calls is that if hundreds
>> > of these all start at once on boot, it's really unnecessary to have all
>> > of the rest of them wait while lots of extra modprobe calls are forked
>> > by the kernel.
>>
>> You can tell systemd to load these modules before starting these
>> containers to avoid blocking, no?
>
> That was exactly what I did to work around the panic until I was able to
> get a patch together.  The preload of the modules is still occurring,
> but I was hoping to excise that workaround entirely.

Or you can compile these modules into kernel, but I am not sure about
the dependencies. :-D

Thanks.

Re: [PATCH net 1/3] openvswitch: vlan: remove wrong likely statement

2016-10-11 Thread Eric Garver

On Mon, Oct 10, 2016 at 05:02:42PM +0200, Jiri Benc wrote:
> This code is called whenever flow key is being extracted from the packet.
> The packet may be as likely vlan tagged as not.
> 
> Fixes: 018c1dda5ff1 ("openvswitch: 802.1AD Flow handling, actions, vlan parsing, netlink attributes")
> Signed-off-by: Jiri Benc 

Acked-by: Eric Garver

Re: [PATCH] IB/ipoib: move back the IB LL address into the hard header

2016-10-11 Thread Paolo Abeni

On Tue, 2016-10-11 at 11:32 -0600, Jason Gunthorpe wrote:
> On Tue, Oct 11, 2016 at 07:15:44PM +0200, Paolo Abeni wrote:
> 
> > Also the connected mode maximum mtu is reduced by 16 bytes to
> > cope with the increased hard header len.
> 
> Changing the MTU is going to cause annoying interop problems, can you
> avoid this?

I don't like changing the maximum MTU value either, but I was unable to
find an alternative solution. PMTU detection should protect against
such issues.

Re: [PATCH] IB/ipoib: move back the IB LL address into the hard header

2016-10-11 Thread Jason Gunthorpe

On Tue, Oct 11, 2016 at 07:15:44PM +0200, Paolo Abeni wrote:

> Also the connected mode maximum mtu is reduced by 16 bytes to
> cope with the increased hard header len.

Changing the MTU is going to cause annoying interop problems, can you
avoid this?

Jason

[PATCH] IB/ipoib: move back the IB LL address into the hard header

2016-10-11 Thread Paolo Abeni

After commit 9207f9d45b0a ("net: preserve IP control block
during GSO segmentation"), the GSO CB and the IPoIB CB conflict.
That destroys the IPoIB address information cached there,
causing a severe performance regression, as better described here:

http://marc.info/?l=linux-kernel&m=146787279825501&w=2

This change moves the data cached by the IPoIB driver from the
skb control block into the IPoIB hard header, as was done before
commit 936d7de3d736 ("IPoIB: Stop lying about hard_header_len
and use skb->cb to stash LL addresses").
To avoid GRO issues, on packet reception the IPoIB driver
stashes a dummy pseudo header into the skb, so that received
packets actually have a hard header matching the declared length.
Also, the connected mode maximum MTU is reduced by 16 bytes to
cope with the increased hard header length.

After this commit, IPoIB performance is back to its pre-regression
value.
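
The receive-side handling of the dummy pseudo header can be pictured with a toy buffer model. This is a sketch, not driver code: `struct demo_skb`, `demo_push`, `demo_pull` and `demo_add_pseudo_hdr` are hypothetical stand-ins for the sk_buff helpers, there is no bounds checking, and the caller is assumed to have left at least 20 bytes of headroom.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_ENCAP_LEN  4u
#define DEMO_PSEUDO_LEN 20u
#define DEMO_HARD_LEN   (DEMO_ENCAP_LEN + DEMO_PSEUDO_LEN)

/* Toy stand-in for struct sk_buff: offsets into one flat buffer. */
struct demo_skb {
	unsigned char buf[128];
	size_t data;		/* offset of the first valid byte */
	size_t len;		/* number of valid bytes from data */
	size_t mac_header;	/* offset of the link-layer header */
};

/* Grow the valid area towards the front (assumes enough headroom). */
static unsigned char *demo_push(struct demo_skb *skb, size_t n)
{
	skb->data -= n;
	skb->len += n;
	return skb->buf + skb->data;
}

/* Shrink the valid area from the front. */
static void demo_pull(struct demo_skb *skb, size_t n)
{
	skb->data += n;
	skb->len -= n;
}

/* Mirrors the three steps of skb_add_pseudo_hdr() in the patch. */
static void demo_add_pseudo_hdr(struct demo_skb *skb)
{
	unsigned char *hdr = demo_push(skb, DEMO_PSEUDO_LEN);

	memset(hdr, 0, DEMO_PSEUDO_LEN);	/* dummy pseudo header */
	skb->mac_header = skb->data;		/* hard header starts here */
	demo_pull(skb, DEMO_HARD_LEN);		/* data now points at payload */
}
```

After the call, the distance between the recorded link-layer header and the payload equals the declared hard header length, which is the property the paragraph above relies on.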

Fixes: 9207f9d45b0a ("net: preserve IP control block during GSO segmentation")
Signed-off-by: Paolo Abeni 
---
 drivers/infiniband/ulp/ipoib/ipoib.h   | 24 
 drivers/infiniband/ulp/ipoib/ipoib_cm.c| 17 
 drivers/infiniband/ulp/ipoib/ipoib_ib.c| 12 +++---
 drivers/infiniband/ulp/ipoib/ipoib_main.c  | 54 --
 drivers/infiniband/ulp/ipoib/ipoib_multicast.c |  6 ++-
 5 files changed, 67 insertions(+), 46 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
index 9dbfcc0..5dd01fa 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -63,12 +63,14 @@ enum ipoib_flush_level {
 
 enum {
IPOIB_ENCAP_LEN   = 4,
+   IPOIB_PSEUDO_LEN  = 20,
+   IPOIB_HARD_LEN= IPOIB_ENCAP_LEN + IPOIB_PSEUDO_LEN,
 
IPOIB_UD_HEAD_SIZE= IB_GRH_BYTES + IPOIB_ENCAP_LEN,
IPOIB_UD_RX_SG= 2, /* max buffer needed for 4K mtu */
 
-   IPOIB_CM_MTU  = 0x10000 - 0x10, /* padding to align header to 16 */
-   IPOIB_CM_BUF_SIZE = IPOIB_CM_MTU  + IPOIB_ENCAP_LEN,
+   IPOIB_CM_MTU  = 0x10000 - 0x20, /* padding to align header to 16 */
+   IPOIB_CM_BUF_SIZE = IPOIB_CM_MTU  + IPOIB_HARD_LEN,
IPOIB_CM_HEAD_SIZE= IPOIB_CM_BUF_SIZE % PAGE_SIZE,
IPOIB_CM_RX_SG= ALIGN(IPOIB_CM_BUF_SIZE, PAGE_SIZE) / PAGE_SIZE,
IPOIB_RX_RING_SIZE= 256,
@@ -134,15 +136,21 @@ struct ipoib_header {
u16 reserved;
 };
 
-struct ipoib_cb {
-   struct qdisc_skb_cb qdisc_cb;
-   u8  hwaddr[INFINIBAND_ALEN];
+struct ipoib_pseudo_header {
+   u8  hwaddr[INFINIBAND_ALEN];
 };
 
-static inline struct ipoib_cb *ipoib_skb_cb(const struct sk_buff *skb)
+static inline void skb_add_pseudo_hdr(struct sk_buff *skb)
 {
-   BUILD_BUG_ON(sizeof(skb->cb) < sizeof(struct ipoib_cb));
-   return (struct ipoib_cb *)skb->cb;
+   char *data = skb_push(skb, IPOIB_PSEUDO_LEN);
+
+   /*
+* only the ipoib header is present now, make room for a dummy
+* pseudo header and set skb field accordingly
+*/
+   memset(data, 0, IPOIB_PSEUDO_LEN);
+   skb_reset_mac_header(skb);
+   skb_pull(skb, IPOIB_HARD_LEN);
 }
 
 /* Used for all multicast joins (broadcast, IPv4 mcast and IPv6 mcast) */
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
index 4ad297d..1b04c8a 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
@@ -63,6 +63,8 @@ MODULE_PARM_DESC(cm_data_debug_level,
 #define IPOIB_CM_RX_DELAY   (3 * 256 * HZ)
 #define IPOIB_CM_RX_UPDATE_MASK (0x3)
 
+#define IPOIB_CM_RX_RESERVE (ALIGN(IPOIB_HARD_LEN, 16) - IPOIB_ENCAP_LEN)
+
 static struct ib_qp_attr ipoib_cm_err_attr = {
.qp_state = IB_QPS_ERR
 };
@@ -146,15 +148,15 @@ static struct sk_buff *ipoib_cm_alloc_rx_skb(struct net_device *dev,
struct sk_buff *skb;
int i;
 
-   skb = dev_alloc_skb(IPOIB_CM_HEAD_SIZE + 12);
+   skb = dev_alloc_skb(ALIGN(IPOIB_CM_HEAD_SIZE, 16));
if (unlikely(!skb))
return NULL;
 
/*
-* IPoIB adds a 4 byte header. So we need 12 more bytes to align the
+* IPoIB adds a IPOIB_ENCAP_LEN byte header, this will align the
 * IP header to a multiple of 16.
 */
-   skb_reserve(skb, 12);
+   skb_reserve(skb, IPOIB_CM_RX_RESERVE);
 
mapping[0] = ib_dma_map_single(priv->ca, skb->data, IPOIB_CM_HEAD_SIZE,
   DMA_FROM_DEVICE);
@@ -624,9 +626,9 @@ void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc)
if (wc->byte_len < IPOIB_CM_COPYBREAK) {
int dlen = wc->byte_len;
 
-   small_skb = dev_alloc_skb(dlen + 12);
+   small_skb = dev_alloc_skb(dlen + IPOIB_CM_RX_RESERVE);
if

Re: [PATCH 04/10] i2c: i2c-sam: Add device tree bindings

2016-10-11 Thread Peter Rosin

On 2016-10-10 21:54, Rob Herring wrote:
> On Fri, Oct 07, 2016 at 06:18:32PM +0300, Pantelis Antoniou wrote:
>> From: Georgi Vlaev 
>>
>> Add binding document for the i2c driver of SAM FPGA.
>>
>> Signed-off-by: Georgi Vlaev 
>> [Ported from Juniper kernel]
>> Signed-off-by: Pantelis Antoniou 
>> ---
>>  .../devicetree/bindings/i2c/i2c-sam-mux.txt| 20 ++
>>  Documentation/devicetree/bindings/i2c/i2c-sam.txt  | 44 
>> ++
>>  2 files changed, 64 insertions(+)
>>  create mode 100644 Documentation/devicetree/bindings/i2c/i2c-sam-mux.txt
>>  create mode 100644 Documentation/devicetree/bindings/i2c/i2c-sam.txt
>>
>> diff --git a/Documentation/devicetree/bindings/i2c/i2c-sam-mux.txt b/Documentation/devicetree/bindings/i2c/i2c-sam-mux.txt
>> new file mode 100644
>> index 0000000..10ddffa
>> --- /dev/null
>> +++ b/Documentation/devicetree/bindings/i2c/i2c-sam-mux.txt
>> @@ -0,0 +1,20 @@
>> +Juniper's SAM FPGA I2C accelerator mux
>> +
>> +The SAM FPGA I2C mux is present only on Juniper SAM FPGA PTX series
>> +of routers.
>> +
>> +The definition of the i2c sam bus is located in the i2c-sam.txt document.
>> +
>> +Required properties:
>> +- compatible: should be "jnx,i2c-sam-mux".
>> +- reg: master number and mux number.
> 
> This is not how i2c muxes are done.
> 
>> +
>> +Optional properties:
>> +- speed: If present must be either 10 or 40. No other values supported.
>> +
>> +Examples:
>> +
>> +pe1i2c: i2c-sam-mux@1,0 {
> 
> i2c-mux@...
> 
>> +compatible = "jnx,i2c-sam-mux";
>> +reg = <1 0>;
>> +};
>> diff --git a/Documentation/devicetree/bindings/i2c/i2c-sam.txt b/Documentation/devicetree/bindings/i2c/i2c-sam.txt
>> new file mode 100644
>> index 0000000..4830b48
>> --- /dev/null
>> +++ b/Documentation/devicetree/bindings/i2c/i2c-sam.txt
>> @@ -0,0 +1,44 @@
>> +Juniper's SAM FPGA I2C accelerator
>> +
>> +The SAM FPGA accelerator is used to connect the large number of
>> +I2C muxes that are present on Juniper PTX series of routers.
>> +While it's an i2c bus, no other devices are located besides
>> +i2c-sam-mux devices.
>> +
>> +The definition of the i2c sam mux is located in the i2c-sam-mux.txt document.
>> +
>> +Required properties:
>> +- compatible: should be "jnx,i2c-sam".
>> +- #address-cells: should be 2.
>> +- #size-cells: should be 0.
>> +- mux-channels: number of mux channels present
> 
> What is this needed for?
> 
>> +
>> +Optional properties:
>> +- reg: offset and length of the register set for the device are optional since
>> +  typically the register range is provided by the parent SAM MFD device.
>> +- master-offset: Offset of where the master register memory starts.
>> +  Default value is 0x8000.
> 
> Make this required.
> 
>> +- reverse-fill: Fill the start entries of transactions in reverse order
> 
> Needs a better explanation.
> 
>> +- priority-tables: Use the pre-programmed priority tables in the FPGA
> 
> What does not present mean?
> 
>> +- i2c-options: list of options to be written to the option field in the
>> +  FPGA controlling things like SCL push-pull drives, hold-times, etc.
> 
>> +- bus-range: start of bus master range and number of masters.
> 
> Needs a better explanation.
> 
>> +
>> +Examples:
>> +
>> +i2c-sam {
>> +compatible = "jnx,i2c-sam";
>> +mux-channels = <2>;
>> +#size-cells = <0>;
>> +#address-cells = <2>;
>> +
>> +/* PE0 */ pe0i2c: i2c-sam-mux@0,0 {
> 
> i2c-mux@...

Hmm, I actually think i2c@... is the usual naming for i2c-mux children.

Cheers,
Peter

>> +compatible = "jnx,i2c-sam-mux";
>> +reg = <0 0>;
>> +};
>> +
>> +/* PE1 */ pe1i2c: i2c-sam-mux@1,0 {
>> +compatible = "jnx,i2c-sam-mux";
>> +reg = <1 0>;
>> +};
>> +};
>> -- 
>> 1.9.1
>>

Re: [PATCH net-next] mlx5: Add MLX5_SET64_VCHK to fix BUILD_BUG_ON

2016-10-11 Thread Tom Herbert

On Tue, Oct 11, 2016 at 4:57 AM, Saeed Mahameed
 wrote:
> On Tue, Oct 11, 2016 at 7:50 PM, David Laight  wrote:
>> From: Tom Herbert
>>> Sent: 11 October 2016 05:22
>> ...
>>> Fix is to create MLX5_SET64_VCHK that takes an additional argument
>>> that is a constant. There are two callers of MLX5_SET64 that are
>>> trying to get a variable offset, change those to call MLX5_SET64_VCHK
>>> passing pas[0] as the argument to use in the offset check.
>>
>> I think I'd separate the array index instead.
>> Something like:
>>
>> #define MLX5_SET64_INDEXED(typ, p, fld, ndx, v) do { \
>> BUILD_BUG_ON(__mlx5_bit_off(typ, fld) % 64); \
>> __MLX5_SET64(typ, p, fld[ndx], v); \
>> } while (0)
>>
>> David
>
> Yes, I think this looks more natural, but instead of MLX5_SET64_INDEXED,
> I prefer to have 2 macros
> MLX5_SET64(typ, p, fld, v) and MLX5_ARRAY_SET64(typ, p, fld, idx, v).
>
> Tom, do you want me to fix it ?
>
Please do.

> Thanks,
> Saeed.
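
The indexed-macro idea quoted above can be sketched outside the driver as follows. This is a minimal illustration, not the mlx5 code: `struct demo_cmd`, `DEMO_ARRAY_SET64` and `demo_fill` are hypothetical names, and `offsetof` with a C11 `_Static_assert` stands in for `__mlx5_bit_off` and `BUILD_BUG_ON`. The point is that the compile-time alignment check sees only the constant base offset of the array field, while the variable index is applied afterwards.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout standing in for an mlx5 command structure. */
struct demo_cmd {
	uint64_t hdr;
	uint64_t pas[4];
};

/*
 * Check the array's base offset at compile time (a constant expression),
 * then apply the run-time index when storing the value.
 */
#define DEMO_ARRAY_SET64(typ, p, fld, idx, v) do {			\
	_Static_assert(offsetof(typ, fld) % 8 == 0,			\
		       "field must be 64-bit aligned");			\
	((typ *)(p))->fld[(idx)] = (v);					\
} while (0)

/* Fill every pas[] slot; the index i is not a compile-time constant. */
static void demo_fill(struct demo_cmd *cmd, uint64_t base)
{
	for (size_t i = 0; i < 4; i++)
		DEMO_ARRAY_SET64(struct demo_cmd, cmd, pas, i, base + i);
}
```

Keeping the index as a separate macro argument is what lets the non-indexed macro retain its strict constant-offset check.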

RE: [PATCHv1 net] xen-netback: fix guest Rx stall detection (after guest Rx refactor)

2016-10-11 Thread Paul Durrant

> -Original Message-
> From: David Vrabel [mailto:david.vra...@citrix.com]
> Sent: 11 October 2016 16:48
> To: netdev@vger.kernel.org
> Cc: David Vrabel ; xen-de...@lists.xenproject.org;
> Paul Durrant ; Wei Liu 
> Subject: [PATCHv1 net] xen-netback: fix guest Rx stall detection (after guest
> Rx refactor)
> 
> If a VIF has been ready for rx_stall_timeout (60s by default) and an
> Rx ring is drained of all requests an Rx stall will be incorrectly
> detected.  When this occurs and the guest Rx queue is empty, the Rx
> ring's event index will not be set and the frontend will not raise an
> event when new requests are placed on the ring, permanently stalling
> the VIF.
> 
> This is a regression introduced by eb1723a29b9a7 (xen-netback:
> refactor guest rx).
> 
> Fix this by reinstating the setting of queue->last_rx_time when
> placing a packet onto the guest Rx ring.
> 
> Signed-off-by: David Vrabel 
Reviewed-by: Paul Durrant 

> ---
>  drivers/net/xen-netback/rx.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/net/xen-netback/rx.c b/drivers/net/xen-netback/rx.c
> index 8e9ade6..d69f2a9 100644
> --- a/drivers/net/xen-netback/rx.c
> +++ b/drivers/net/xen-netback/rx.c
> @@ -425,6 +425,8 @@ void xenvif_rx_skb(struct xenvif_queue *queue)
> 
>   xenvif_rx_next_skb(queue, &pkt);
> 
> + queue->last_rx_time = jiffies;
> +
>   do {
>   struct xen_netif_rx_request *req;
>   struct xen_netif_rx_response *rsp;
> --
> 2.1.4
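
The failure mode described in the commit message can be reproduced with a simplified model of the stall test. This is a sketch under assumptions: `struct demo_queue`, `demo_rx_stalled` and `demo_rx_skb` are hypothetical, time is a plain counter in seconds rather than jiffies, and the real check in the driver involves more state than shown here.

```c
#include <stdbool.h>

/* Toy stand-in for the per-queue state used by stall detection. */
struct demo_queue {
	unsigned long last_rx_time;	/* time of the last packet put on the ring */
	unsigned int unconsumed_reqs;	/* Rx requests the guest has posted */
};

/* Simplified stall test: no requests left and no Rx progress for too long. */
static bool demo_rx_stalled(const struct demo_queue *q, unsigned long now,
			    unsigned long timeout)
{
	return q->unconsumed_reqs == 0 && now - q->last_rx_time > timeout;
}

/* The fix: refresh the timestamp whenever a packet is placed on the ring. */
static void demo_rx_skb(struct demo_queue *q, unsigned long now)
{
	q->last_rx_time = now;
	if (q->unconsumed_reqs > 0)
		q->unconsumed_reqs--;
}
```

If the timestamp is never refreshed, a ring that is merely drained after the timeout has elapsed satisfies the stall test even though packets were flowing.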

Re: [PATCH kernel v2] PCI: Enable access to custom VPD for Chelsio devices (cxgb3)

2016-10-11 Thread Alexander Duyck

On Mon, Oct 10, 2016 at 9:08 PM, Alexey Kardashevskiy  wrote:
> On 11/10/16 02:23, Alexander Duyck wrote:
>> On Wed, Sep 28, 2016 at 10:21 PM, Alexey Kardashevskiy  
>> wrote:
>>> There is at least one Chelsio 10Gb card which uses VPD area to store
>>> some custom blocks (example below). However pci_vpd_size() returns
>>> the length of the first block only assuming that there can be only
>>> one VPD "End Tag" and VFIO blocks access beyond that offset
>>> (since 4e1a63555) which leads to the situation when the guest "cxgb3"
>>> driver fails to probe the device. The host system does not have this
>>> problem as the driver accesses the config space directly without
>>> pci_read_vpd()/...
>>>
>>> This adds a quirk to override the VPD size to a bigger value.
>>> The maximum size is taken from EEPROMSIZE in
>>> drivers/net/ethernet/chelsio/cxgb3/common.h. We do not read the tag
>>> as the cxgb3 driver does as the driver supports writing to EEPROM/VPD
>>> and when it writes, it only checks for 8192 bytes boundary. The quirk
>>> is registered for all devices supported by the cxgb3 driver.
>>>
>>> This adds a quirk to the PCI layer (not to the cxgb3 driver) as
>>> the cxgb3 driver itself accesses VPD directly and the problem only exists
>>> with the vfio-pci driver (when cxgb3 is not running on the host and
>>> may not be even loaded) which blocks accesses beyond the first block
>>> of VPD data. However vfio-pci itself does not have quirks mechanism so
>>> we add it to PCI.
>>>
>>> Tested on:
>>> Ethernet controller [0200]: Chelsio Communications Inc T310 10GbE Single 
>>> Port Adapter [1425:0030]
>>>
>>> This is its VPD:
>>>  Large item 42 bytes; name 0x2 Identifier String
>>> b'10 Gigabit Ethernet-SR PCI Express Adapter'
>>> #00 [EC] len=7: b'D76809 '
>>> #0a [FN] len=7: b'46K7897'
>>> #14 [PN] len=7: b'46K7897'
>>> #1e [MN] len=4: b'1037'
>>> #25 [FC] len=4: b'5769'
>>> #2c [SN] len=12: b'YL102035603V'
>>> #3b [NA] len=12: b'00145E992ED1'
>>>
>>> 0c00 Large item 16 bytes; name 0x2 Identifier String
>>> b'S310E-SR-X  '
>>> 0c13 Large item 234 bytes; name 0x10
>>> #00 [PN] len=16: b'TBD '
>>> #13 [EC] len=16: b'110107730D2 '
>>> #26 [SN] len=16: b'97YL102035603V  '
>>> #39 [NA] len=12: b'00145E992ED1'
>>> #48 [V0] len=6: b'175000'
>>> #51 [V1] len=6: b'26'
>>> #5a [V2] len=6: b'26'
>>> #63 [V3] len=6: b'2000  '
>>> #6c [V4] len=2: b'1 '
>>> #71 [V5] len=6: b'c2'
>>> #7a [V6] len=6: b'0 '
>>> #83 [V7] len=2: b'1 '
>>> #88 [V8] len=2: b'0 '
>>> #8d [V9] len=2: b'0 '
>>> #92 [VA] len=2: b'0 '
>>> #97 [RV] len=80: 
>>> b's\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'...
>>> 0d00 Large item 252 bytes; name 0x11
>>> #00 [VC] len=16: b'122310_1222 dp  '
>>> #13 [VD] len=16: b'610-0001-00 H1\x00\x00'
>>> #26 [VE] len=16: b'122310_1353 fp  '
>>> #39 [VF] len=16: b'610-0001-00 H1\x00\x00'
>>> #4c [RW] len=173: 
>>> b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'...
>>> 0dff Small item 0 bytes; name 0xf End Tag
>>>
>>> Signed-off-by: Alexey Kardashevskiy 
>>> ---
>>> Changes:
>>> v2:
>>> * used pci_set_vpd_size() helper
>>> * added explicit list of IDs from cxgb3 driver
>>> * added a note in the commit log why the quirk is not in cxgb3
>>> ---
>>>  drivers/pci/quirks.c | 22 ++
>>>  1 file changed, 22 insertions(+)
>>>
>>> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
>>> index 44e0ff3..b22fce5 100644
>>> --- a/drivers/pci/quirks.c
>>> +++ b/drivers/pci/quirks.c
>>> @@ -3243,6 +3243,28 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 
>>> PCI_DEVICE_ID_INTEL_CACTUS_RIDGE_4C
>>>  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 
>>> PCI_DEVICE_ID_INTEL_PORT_RIDGE,
>>> quirk_thunderbolt_hotplug_msi);
>>>
>>> +static void quirk_chelsio_extend_vpd(struct pci_dev *dev)
>>> +{
>>> +   if (!dev->vpd)
>>> +   return;
>>> +
>>> +   pci_set_vpd_size(dev, max_t(unsigned int, dev->vpd->len, 8192));
>>
>> What is the point of the max_t?  From what I can tell you aren't
>> writing 8192, you will always end up writing 32K since that is the
>> starting value for dev->vpd->len assuming there have yet to be any
>> reads.
>
> At this stage dev->vpd->len is always 32k? I thought here VPD was scanned
> already, I'll double check.

It only gets updated when an actual read is performed.  You cannot
rely on that occurring before your workaround.

>
>>
>> What you may want to do instead is just modify the pci_vpd_size
>> function so that you can use it in your quirk, and so that you can
>> pass an offset.  Then you could just start it with an offset of 0x0c00
>> and have it read to get the exact size of the region
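
The suggestion quoted above, scanning the VPD tags from a caller-supplied
offset until the End Tag is found, can be sketched roughly as follows. This
is a user-space illustration only, not the kernel's pci_vpd_size(); the name
vpd_size_from() and its arguments are hypothetical, and the buffer stands in
for data that the kernel would read from the device.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VPD_LARGE_FLAG 0x80u /* bit 7 set: large resource item */
#define VPD_SMALL_END  0x0fu /* small item name 0xf: End Tag */

/*
 * Hypothetical helper: walk the VPD items in buf starting at 'start' and
 * return the offset just past the first End Tag. Returns 0 if the data is
 * malformed or no End Tag is found within 'len' bytes.
 */
static size_t vpd_size_from(const uint8_t *buf, size_t len, size_t start)
{
	size_t off = start;

	assert(buf != NULL);

	while (off < len) {
		uint8_t tag = buf[off];
		size_t item_len;

		if (tag & VPD_LARGE_FLAG) {
			/* Large item: tag byte plus 2-byte little-endian length. */
			if (len - off < 3)
				return 0;
			item_len = buf[off + 1] | ((size_t)buf[off + 2] << 8);
			off += 3;
		} else {
			/* Small item: name in bits 6:3, length in bits 2:0. */
			item_len = tag & 0x07u;
			off += 1;
			if (((tag >> 3) & 0x0fu) == VPD_SMALL_END)
				return off;
		}

		if (item_len > len - off)
			return 0; /* item would run past the buffer */
		off += item_len;
	}

	return 0; /* ran out of data without seeing an End Tag */
}
```

Called with start set to 0 this reports the size of the first block; called
with start set to 0x0c00 on the dump above it would be expected to stop after
the End Tag at 0x0dff, assuming the listing is accurate.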

[PATCHv1 net] xen-netback: fix guest Rx stall detection (after guest Rx refactor)

2016-10-11 Thread David Vrabel

If a VIF has been ready for rx_stall_timeout (60s by default) and an
Rx ring is drained of all requests an Rx stall will be incorrectly
detected.  When this occurs and the guest Rx queue is empty, the Rx
ring's event index will not be set and the frontend will not raise an
event when new requests are placed on the ring, permanently stalling
the VIF.

This is a regression introduced by eb1723a29b9a7 (xen-netback:
refactor guest rx).

Fix this by reinstating the setting of queue->last_rx_time when
placing a packet onto the guest Rx ring.

Signed-off-by: David Vrabel 
---
 drivers/net/xen-netback/rx.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/xen-netback/rx.c b/drivers/net/xen-netback/rx.c
index 8e9ade6..d69f2a9 100644
--- a/drivers/net/xen-netback/rx.c
+++ b/drivers/net/xen-netback/rx.c
@@ -425,6 +425,8 @@ void xenvif_rx_skb(struct xenvif_queue *queue)
 
xenvif_rx_next_skb(queue, &pkt);
 
+   queue->last_rx_time = jiffies;
+
do {
struct xen_netif_rx_request *req;
struct xen_netif_rx_response *rsp;
-- 
2.1.4
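
The failure mode described in this patch can be illustrated with a small
user-space model. This is only a sketch under simplifying assumptions: the
struct and function names are hypothetical, time is a plain tick counter
rather than jiffies, and the real driver's ring handling is far more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified model of one guest Rx queue. */
struct rx_queue_model {
	uint32_t req_prod;      /* requests posted by the frontend */
	uint32_t req_cons;      /* requests consumed by the backend */
	uint64_t last_rx_time;  /* tick of the last packet put on the ring */
	uint64_t stall_timeout; /* ticks without progress that count as a stall */
	bool stalled;
};

/* A stall is reported when the ring is empty and nothing was delivered lately. */
static bool rx_queue_stalled(const struct rx_queue_model *q, uint64_t now)
{
	bool ring_empty = (uint32_t)(q->req_prod - q->req_cons) < 1;

	return !q->stalled && ring_empty &&
	       now - q->last_rx_time > q->stall_timeout;
}

/* Deliver one packet; updating last_rx_time is what the patch reinstates. */
static void rx_deliver(struct rx_queue_model *q, uint64_t now)
{
	assert(q->req_prod != q->req_cons); /* need a free request slot */
	q->last_rx_time = now;
	q->req_cons++;
}
```

If rx_deliver() did not refresh last_rx_time, a queue that had just drained
its ring would look stalled as soon as the VIF had been up for longer than the
timeout, which is the incorrect detection the commit message describes.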

Re: [PATCH] doc: fix wrongly referencing dev->skb_mark

2016-10-11 Thread Ryota Ozaki

On Mon, Oct 10, 2016 at 10:49 PM, Ido Schimmel  wrote:
> Hi,
>
> On Mon, Oct 10, 2016 at 08:15:39PM +0900, Ryota Ozaki wrote:
>> Section "Flooding L2 domain" says, to avoid duplicated flooding, if
>> skb->offload_fwd_mark is matched with dev->skb_mark, the kernel will
>> drop the packet. However, the relevant code in __dev_queue_xmit
>> compares skb->offload_fwd_mark with dev->offload_fwd_mark, not
>> dev->skb_mark. I guess the text is wrong.
>>
>> Signed-off-by: Ryota Ozaki 
>> ---
>>  Documentation/networking/switchdev.txt | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/Documentation/networking/switchdev.txt 
>> b/Documentation/networking/switchdev.txt
>> index 31c3911..d4124a0 100644
>> --- a/Documentation/networking/switchdev.txt
>> +++ b/Documentation/networking/switchdev.txt
>> @@ -286,8 +286,8 @@ otherwise there will be duplicate packets on the wire.
>>  To avoid duplicate packets, the device/driver should mark a packet as 
>> already
>>  forwarded using skb->offload_fwd_mark.  The same mark is set on the device
>>  ports in the domain using dev->offload_fwd_mark.  If the 
>> skb->offload_fwd_mark
>> -is non-zero and matches the forwarding egress port's dev->skb_mark, the 
>> kernel
>> -will drop the skb right before transmit on the egress port, with the
>> +is non-zero and matches the forwarding egress port's dev->offload_fwd_mark,
>> +the kernel will drop the skb right before transmit on the egress port, with 
>> the
>
> I think your tree isn't up to date. The flooding mechanism (and this
> document) were modified in commit 6bc506b4fb06 ("bridge: switchdev: Add
> forward mark support for stacked devices")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=6bc506b4fb065eac3d89ca1ce37082e174493d9e

Oh, right. I forgot to rebase my branch after the 4.9 merge period
and saw the file as of 4.8.

>
> Also, in the future, please specify to which tree (net, net-next) your
> patch should go. See:
> https://www.kernel.org/doc/Documentation/networking/netdev-FAQ.txt
> (under "How do I indicate which tree (net vs. net-next) my patch
> should be in?").

Thanks for the pointer. I'll follow the rule next time.

Thanks,
  ozaki-r

>
> Thanks!
>
>>  understanding that the device already forwarded the packet on same egress 
>> port.
>>  The driver can use switchdev_port_fwd_mark_set() to set a globally unique 
>> mark
>>  for port's dev->offload_fwd_mark, based on the port's parent ID (switch ID) 
>> and
>> --
>> 2.7.4
>>

Re: [RFC] net: phy: smsc: Disable auto-negotiation on startup

2016-10-11 Thread Jeremy Linton


On 10/10/2016 12:41 PM, Kyle Roeschley wrote:

Because the SMSC PHY completes auto-negotiation before the driver is
ready to handle interrupts, the PHY state machine never realizes that we
have a link. Clear the ANENABLE bit on initialization, which lets
genphy_config_aneg do its thing when that code is hit later.

While this patch does fix the problem we see (no link on boot without
re-plugging the cable), it seems like the generic PHY code should be
able to handle auto-negotiation completing before interrupts are
enabled. Submitted as an RFC in the hopes that someone has an idea as to
how that could be done.


Hi,

	Which smsc chip/driver? Maybe assuring the device interrupts are 
enabled before the phy is started is a solution?


	The whole problem sounds similar to what was recently happening in the 
smsc911x driver, but AFAIK that driver is basically only polling at this 
point so connecting the phy before the interrupts are enabled shouldn't 
be a problem.

This fix is copied from commit 99f81afc139c ("phy: micrel: Disable auto
negotiation on startup").

Signed-off-by: Kyle Roeschley 
---
 drivers/net/phy/smsc.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/drivers/net/phy/smsc.c b/drivers/net/phy/smsc.c
index b62c4aa..8de8011 100644
--- a/drivers/net/phy/smsc.c
+++ b/drivers/net/phy/smsc.c
@@ -62,6 +62,16 @@ static int smsc_phy_config_init(struct phy_device *phydev)
return rc;
}

+   if (phy_interrupt_is_valid(phydev)) {
+   rc = phy_read(phydev, MII_BMCR);
+   if (rc < 0)
+   return rc;
+
+   rc = phy_write(phydev, MII_BMCR, rc & ~BMCR_ANENABLE);
+   if (rc < 0)
+   return rc;
+   }
+
return smsc_phy_ack_interrupt(phydev);
 }
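
The read-modify-write in the hunk above can be tried out in user space with a
fake register file. This is a sketch only: struct fake_phy and the fake_*
helpers are hypothetical stand-ins for the phylib accessors, and only the
MII_BMCR and BMCR_ANENABLE values are taken from the kernel's mii.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MII_BMCR      0x00   /* Basic mode control register */
#define BMCR_ANENABLE 0x1000 /* Enable auto-negotiation */

/* Hypothetical stand-in for a PHY: 32 16-bit MII registers. */
struct fake_phy {
	uint16_t regs[32];
};

static int fake_phy_read(struct fake_phy *phy, int reg)
{
	assert(reg >= 0 && reg < 32);
	return phy->regs[reg];
}

static int fake_phy_write(struct fake_phy *phy, int reg, uint16_t val)
{
	assert(reg >= 0 && reg < 32);
	phy->regs[reg] = val;
	return 0;
}

/* Mirrors the shape of the added block: only touch BMCR in interrupt mode. */
static int sketch_config_init(struct fake_phy *phy, bool irq_valid)
{
	int rc;

	if (irq_valid) {
		rc = fake_phy_read(phy, MII_BMCR);
		if (rc < 0)
			return rc;

		/* Clear only the auto-negotiation enable bit. */
		rc = fake_phy_write(phy, MII_BMCR,
				    (uint16_t)(rc & ~BMCR_ANENABLE));
		if (rc < 0)
			return rc;
	}

	return 0;
}
```

In polling mode the register is left untouched, which matches the
phy_interrupt_is_valid() guard in the patch.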

Re: [PATCH] iwlwifi: pcie: reduce "unsupported splx" to a warning

2016-10-11 Thread Chris Rorvick

On Tue, Oct 11, 2016 at 9:09 AM, Chris Rorvick  wrote:
> I didn't receive your email so I'll try to respond via Paul's.

>>> If this is really bothering you, I guess I could apply this patch for
>>> now.  But as I said, this is not solving the actual problem.
>>
>> Bikeshedding: I think IWL_INFO() is more appropriate, as info doesn't
>> imply one needs to act on this message, while warn does imply that
>> action is needed.
>
> Agreed.  I still think making this a warning is appropriate, but it
> seems pretty clear this is not an error.  This has nothing to do with
> how much it bothers me.  An error tells the user something needs to be
> fixed, but in this case the interface is working fine.  Making it a
> warning with an improved message will result in fewer people wasting
> their time.

I found your original email on lkml.org... should have looked there in
the first place!  Yes, if there is a fix for the underlying issue then
that is obviously preferred.  When I investigated this I saw several
reports spanning at least a few distros and kernel versions with at
least some concluding "this is normal".

Again, thanks!

Chris

Re: [PATCH 1/1] vxlan: insert ipv6 macro

2016-10-11 Thread Jiri Benc

On Tue, 11 Oct 2016 16:23:31 +0800, zyjzyj2...@gmail.com wrote:
> --- a/drivers/net/vxlan.c
> +++ b/drivers/net/vxlan.c
> @@ -2647,15 +2647,15 @@ static struct socket *vxlan_create_sock(struct net 
> *net, bool ipv6,
>   int err;
>  
>   memset(&udp_conf, 0, sizeof(udp_conf));
> -
> +#if IS_ENABLED(CONFIG_IPV6)
>   if (ipv6) {
>   udp_conf.family = AF_INET6;
>   udp_conf.use_udp6_rx_checksums =
>   !(flags & VXLAN_F_UDP_ZERO_CSUM6_RX);
>   udp_conf.ipv6_v6only = 1;
> - } else {
> + } else 
> +#endif
>   udp_conf.family = AF_INET;
> - }

Zhu Yanjun, before posting patches such as the previous ones or
this one, please test whether they make any difference. In this case,
try to compile the code with IPv6 disabled before and after this patch,
disassemble and compare the results. You'll see that this patch is
pointless.

It's pretty obvious from the code but to be really sure, I've just
quickly built the vxlan module with IPv6 disabled. And indeed, as
expected, the compiler just inlined everything into vxlan_open. The
whole chain vxlan_open -> vxlan_sock_add -> __vxlan_sock_add (note that
there's only a single caller of __vxlan_sock_add with IPv6 disabled) ->
vxlan_socket_create -> vxlan_create_sock is inlined.

It also means the code in the "if (ipv6)" branch is completely
eliminated by the compiler even without ugly #ifdefs.

 Jiri
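
The effect described above can be mimicked outside the kernel with a small
sketch. All names here are hypothetical, SKETCH_IPV6_ENABLED merely stands in
for IS_ENABLED(CONFIG_IPV6), and whether a given compiler really drops the
branch depends on inlining and optimization level, so compare the disassembly
rather than trusting the sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for IS_ENABLED(CONFIG_IPV6); 0 models a build without IPv6. */
#define SKETCH_IPV6_ENABLED 0

/* Local stand-ins for AF_INET / AF_INET6 so no socket headers are needed. */
enum { SKETCH_AF_INET = 2, SKETCH_AF_INET6 = 10 };

/* Innermost helper: still contains an "if (ipv6)" branch, with no #ifdef. */
static inline int sketch_create_sock_family(bool ipv6)
{
	if (ipv6)
		return SKETCH_AF_INET6;
	return SKETCH_AF_INET;
}

/*
 * The only caller. With IPv6 disabled the argument is a compile-time
 * constant false, so once the helper is inlined the compiler can prove
 * the IPv6 branch is unreachable and discard it.
 */
static int sketch_open_family(void)
{
	bool ipv6 = SKETCH_IPV6_ENABLED != 0;

	assert(SKETCH_IPV6_ENABLED || !ipv6);
	return sketch_create_sock_family(ipv6);
}
```

Building this with optimization and inspecting the output of
sketch_open_family() is the same experiment as the one suggested for the
vxlan module, just on a much smaller scale.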

Re: [PATCH] iwlwifi: pcie: reduce "unsupported splx" to a warning

2016-10-11 Thread Chris Rorvick

Hi Luca,

I didn't receive your email so I'll try to respond via Paul's.

On Tue, Oct 11, 2016 at 5:11 AM, Paul Bolle  wrote:
>> This is not coming from the NIC itself, but from the platform's ACPI
>> tables.  Can you tell us which platform you are using?

Interesting.  I'm running a Dell XPS 13 9350.  I replaced the
factory-provided Broadcom card with an AC 8260.  I can update the
commit log to reflect this.

>> There are other things that look a bit inconsistent in this code...
>> I'll try to find the official ACPI table definitions for these entries
>> to make sure it's correct.
>
> When I looked into this error, some time ago, I searched around a bit
> for documentation on this splx stuff. Sadly, commit bcb079a14d75
> ("iwlwifi: pcie: retrieve and parse ACPI power limitations") provides
> very few clues and my searches turned up nothing useful. So a pointer
> or two would be really appreciated.

Ditto.

>> If this is really bothering you, I guess I could apply this patch for
>> now.  But as I said, this is not solving the actual problem.
>
> Bikeshedding: I think IWL_INFO() is more appropriate, as info doesn't
> imply one needs to act on this message, while warn does imply that
> action is needed.

Agreed.  I still think making this a warning is appropriate, but it
seems pretty clear this is not an error.  This has nothing to do with
how much it bothers me.  An error tells the user something needs to be
fixed, but in this case the interface is working fine.  Making it a
warning with an improved message will result in fewer people wasting
their time.

Thanks!

Chris

Re: slab corruption with current -git

2016-10-11 Thread Aaron Conole

Michal Kubecek  writes:

> On Mon, Oct 10, 2016 at 04:24:01AM -0400, David Miller wrote:
>> From: David Miller 
>> Date: Sun, 09 Oct 2016 23:57:45 -0400 (EDT)
>> 
>> This means that the netns is possibly getting freed up before we
>> unregister the netfilter hooks.
>
> Sounds a bit like the issue discussed here:
>
>   https://marc.info/?l=netfilter-devel&m=146980917627262&w=2
>
> Could it be (partly) the same race condition?

It looks like it's possible.  It appears that there could be a
long-standing race between these.  I'll look into it more carefully, and
discuss with Pablo and Florian when they're situated from netdev
conference.

-Aaron

[PATCH 1/1] vxlan: insert ipv6 macro

2016-10-11 Thread zyjzyj2000

From: Zhu Yanjun 

This code is specific to IPv6, so guard it with the CONFIG_IPV6
macro.

Signed-off-by: Zhu Yanjun 
---
 drivers/net/vxlan.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index e7d1668..9af6600 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -2647,15 +2647,15 @@ static struct socket *vxlan_create_sock(struct net 
*net, bool ipv6,
int err;
 
memset(&udp_conf, 0, sizeof(udp_conf));
-
+#if IS_ENABLED(CONFIG_IPV6)
if (ipv6) {
udp_conf.family = AF_INET6;
udp_conf.use_udp6_rx_checksums =
!(flags & VXLAN_F_UDP_ZERO_CSUM6_RX);
udp_conf.ipv6_v6only = 1;
-   } else {
+   } else 
+#endif
udp_conf.family = AF_INET;
-   }
 
udp_conf.local_udp_port = port;
 
-- 
2.7.4

Re: slab corruption with current -git

2016-10-11 Thread Michal Kubecek

On Mon, Oct 10, 2016 at 04:24:01AM -0400, David Miller wrote:
> From: David Miller 
> Date: Sun, 09 Oct 2016 23:57:45 -0400 (EDT)
> 
> This means that the netns is possibly getting freed up before we
> unregister the netfilter hooks.

Sounds a bit like the issue discussed here:

  https://marc.info/?l=netfilter-devel&m=146980917627262&w=2

Could it be (partly) the same race condition?

Michal Kubecek

Re: [PATCH net-next v10 1/1] net: phy: Cleanup the Edge-Rate feature in Microsemi PHYs.

2016-10-11 Thread Andrew Lunn

On Mon, Oct 10, 2016 at 04:13:45PM +0200, Allan W. Nielsen wrote:
> Edge-Rate cleanup include the following:
> - Updated device tree bindings documentation for edge-rate
> - The edge-rate is now specified as a "slowdown", meaning that it is now
>   being specified as positive values instead of negative (both
>   documentation and implementation wise).
> - Only explicitly documented values for "vsc8531,vddmac" and
>   "vsc8531,edge-slowdown" are accepted by the device driver.
> - Deleted include/dt-bindings/net/mscc-phy-vsc8531.h as it was not needed.
> - Read/validate devicetree settings in probe instead of init
> 
> Signed-off-by: Allan W. Nielsen 
> Signed-off-by: Raju Lakkaraju 

Hi Raju, Allan

Thanks for keeping at this.

Reviewed-by: Andrew Lunn 

Andrew

[PATCH] drivers/ptp: Fix kernel memory disclosure

2016-10-11 Thread Vlad Tsyrklevich

The reserved field precise_offset->rsv is not cleared before being
copied to user space, leaking kernel stack memory. Clear the struct
before it's copied.

Signed-off-by: Vlad Tsyrklevich 

---
 drivers/ptp/ptp_chardev.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/ptp/ptp_chardev.c b/drivers/ptp/ptp_chardev.c
index d637c93..58a97d4 100644
--- a/drivers/ptp/ptp_chardev.c
+++ b/drivers/ptp/ptp_chardev.c
@@ -193,6 +193,7 @@ long ptp_ioctl(struct posix_clock *pc, unsigned int cmd, 
unsigned long arg)
if (err)
break;
 
+   memset(&precise_offset, 0, sizeof(precise_offset));
ts = ktime_to_timespec64(xtstamp.device);
precise_offset.device.sec = ts.tv_sec;
precise_offset.device.nsec = ts.tv_nsec;
-- 
2.7.0
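
The pattern behind this fix, zeroing a stack structure before filling in only
some of its fields, can be sketched in user space as follows. The struct
layouts are hypothetical models loosely based on the PTP uapi types and are
not copied from the kernel headers; fill_precise_offset() is likewise an
illustrative name.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a timestamp with a reserved member. */
struct clock_time_sketch {
	int64_t sec;
	uint32_t nsec;
	uint32_t reserved;
};

/* Hypothetical model of the reply structure copied to user space. */
struct precise_offset_sketch {
	struct clock_time_sketch device;
	struct clock_time_sketch sys_realtime;
	struct clock_time_sketch sys_monoraw;
	unsigned int rsv[4]; /* never written by the fill code below */
};

/*
 * Clear the whole structure first, then set the fields we know about.
 * Without the memset(), 'reserved' and 'rsv' would keep whatever bytes
 * happened to be in that memory and would be copied out with the rest.
 */
static void fill_precise_offset(struct precise_offset_sketch *out,
				int64_t dev_sec, uint32_t dev_nsec)
{
	assert(out != NULL);

	memset(out, 0, sizeof(*out));
	out->device.sec = dev_sec;
	out->device.nsec = dev_nsec;
}
```

Clearing with memset() also zeroes any padding bytes between members, which
per-field initialization would not guarantee.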

[PATCH v2 iproute2 2/9] ife: print prio, mark and hash as unsigned

2016-10-11 Thread Jamal Hadi Salim

From: Roman Mashak 

Signed-off-by: Roman Mashak 
Signed-off-by: Jamal Hadi Salim 
---
 tc/m_ife.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tc/m_ife.c b/tc/m_ife.c
index a5a7516..588bad7 100644
--- a/tc/m_ife.c
+++ b/tc/m_ife.c
@@ -252,7 +252,7 @@ static int print_ife(struct action_util *au, FILE *f, 
struct rtattr *arg)
len = RTA_PAYLOAD(metalist[IFE_META_SKBMARK]);
if (len) {
mmark = 
rta_getattr_u32(metalist[IFE_META_SKBMARK]);
-   fprintf(f, "use mark %d ", mmark);
+   fprintf(f, "use mark %u ", mmark);
} else
fprintf(f, "allow mark ");
}
@@ -261,7 +261,7 @@ static int print_ife(struct action_util *au, FILE *f, 
struct rtattr *arg)
len = RTA_PAYLOAD(metalist[IFE_META_HASHID]);
if (len) {
mhash = 
rta_getattr_u32(metalist[IFE_META_HASHID]);
-   fprintf(f, "use hash %d ", mhash);
+   fprintf(f, "use hash %u ", mhash);
} else
fprintf(f, "allow hash ");
}
@@ -270,7 +270,7 @@ static int print_ife(struct action_util *au, FILE *f, 
struct rtattr *arg)
len = RTA_PAYLOAD(metalist[IFE_META_PRIO]);
if (len) {
mprio = 
rta_getattr_u32(metalist[IFE_META_PRIO]);
-   fprintf(f, "use prio %d ", mprio);
+   fprintf(f, "use prio %u ", mprio);
} else
fprintf(f, "allow prio ");
}
-- 
1.9.1
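
Why the conversion specifier matters for 32-bit metadata values can be seen
in a small sketch. format_mark() is a hypothetical helper that only mirrors
the shape of the fprintf() calls in the patch; it is not iproute2 code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format a 32-bit mark the way the patched code prints it, using %u.
 * Values above INT_MAX (for example 0xffffffff) would be shown as
 * negative numbers on typical platforms if %d were used instead.
 */
static int format_mark(char *buf, size_t len, uint32_t mark)
{
	assert(buf != NULL && len > 0);

	return snprintf(buf, len, "use mark %u ", (unsigned int)mark);
}
```

For a mark of 0xffffffff this yields the text "use mark 4294967295 ", which
is the value the user originally configured.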

Re: [PATCH net-next] mlx5: Add MLX5_SET64_VCHK to fix BUILD_BUG_ON

2016-10-11 Thread Saeed Mahameed

On Tue, Oct 11, 2016 at 7:50 PM, David Laight  wrote:
> From: Tom Herbert
>> Sent: 11 October 2016 05:22
> ...
>> Fix is to create MLX5_SET64_VCHK that takes an additional argument
>> that is a constant. There are two callers of MLX5_SET64 that are
>> trying to get a variable offset, change those to call MLX5_SET64_VCHK
>> passing pas[0] as the argument to use in the offset check.
>
> I think I'd separate the array index instead.
> Something like:
>
> #define MLX5_SET64_INDEXED(typ, p, fld, ndx, v) do { \
> BUILD_BUG_ON(__mlx5_bit_off(typ, fld) % 64); \
> __MLX5_SET64(typ, p, fld[ndx], v); \
> } while (0)
>
> David

Yes, I think this looks more natural, but instead of MLX5_SET64_INDEXED
I would prefer to have two macros:
MLX5_SET64(typ, p, fld, v) and MLX5_ARRAY_SET64(typ, p, fld, idx, v).

Tom, do you want me to fix it ?

Thanks,
Saeed.

[PATCH v2 iproute2 6/9] actions: add skbmod action

2016-10-11 Thread Jamal Hadi Salim

From: Jamal Hadi Salim 

This action is intended as an upgrade over pedit in terms of usability
(as well as operational debuggability).
Compare this:

sudo tc filter add dev $ETH parent 1: protocol ip prio 10 \
u32 match ip protocol 1 0xff flowid 1:2 \
action pedit munge offset -14 u8 set 0x02 \
munge offset -13 u8 set 0x15 \
munge offset -12 u8 set 0x15 \
munge offset -11 u8 set 0x15 \
munge offset -10 u16 set 0x1515 \
pipe

to:

sudo tc filter add dev $ETH parent 1: protocol ip prio 10 \
u32 match ip protocol 1 0xff flowid 1:2 \
action skbmod dmac 02:15:15:15:15:15

Or worse, try to debug a policy with destination mac, source mac and
ethertype. Then make that a hundred rules and you'll get my point.

The most important ethernet use case at the moment is redirecting or
mirroring packets to a remote machine. The dst mac address needs a rewrite
so that the packet neither confuses an interconnecting (learning) switch
nor gets dropped by the target machine (which looks at the dst mac).

In the future common use cases on pedit can be migrated to this action
(as an example different fields in ip v4/6, transports like tcp/udp/sctp
etc). For this first cut, this allows modifying basic ethernet header.

Signed-off-by: Jamal Hadi Salim 
---
 tc/m_skbmod.c | 260 ++
 1 file changed, 260 insertions(+)
 create mode 100644 tc/m_skbmod.c

diff --git a/tc/m_skbmod.c b/tc/m_skbmod.c
new file mode 100644
index 000..b7f4765
--- /dev/null
+++ b/tc/m_skbmod.c
@@ -0,0 +1,260 @@
+/*
+ * m_skbmod.c  skb modifier action module
+ *
+ * This program is free software; you can distribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ * Authors:  J Hadi Salim (j...@mojatatu.com)
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "rt_names.h"
+#include "utils.h"
+#include "tc_util.h"
+#include 
+
+static void skbmod_explain(void)
+{
+   fprintf(stderr,
+   "Usage:... skbmod {[set <SETTABLE>] [swap <SWAPABLE>]} 
[CONTROL] [index INDEX]\n");
+   fprintf(stderr, "where SETTABLE is: [dmac DMAC] [smac SMAC] [etype 
ETYPE] \n");
+   fprintf(stderr, "where SWAPABLE is: \"mac\" to swap mac addresses\n");
+   fprintf(stderr, "note: \"swap mac\" is done after any outstanding 
D/SMAC change\n");
+   fprintf(stderr,
+   "\tDMAC := 6 byte Destination MAC address\n"
+   "\tSMAC := optional 6 byte Source MAC address\n"
+   "\tETYPE := optional 16 bit ethertype\n"
+   "\tCONTROL := reclassify|pipe|drop|continue|ok\n"
+   "\tINDEX := skbmod index value to use\n");
+}
+
+static void skbmod_usage(void)
+{
+   skbmod_explain();
+   exit(-1);
+}
+
+static int parse_skbmod(struct action_util *a, int *argc_p, char ***argv_p,
+   int tca_id, struct nlmsghdr *n)
+{
+   int argc = *argc_p;
+   char **argv = *argv_p;
+   int ok = 0;
+   struct tc_skbmod p;
+   struct rtattr *tail;
+   char dbuf[ETH_ALEN];
+   char sbuf[ETH_ALEN];
+   __u16 skbmod_etype = 0;
+   char *daddr = NULL;
+   char *saddr = NULL;
+
+   memset(&p, 0, sizeof(p));
+   p.action = TC_ACT_PIPE; /* good default */
+
+   if (argc <= 0)
+   return -1;
+
+   while (argc > 0) {
+   if (matches(*argv, "skbmod") == 0) {
+   NEXT_ARG();
+   continue;
+   } else if (matches(*argv, "swap") == 0) {
+   NEXT_ARG();
+   continue;
+   } else if (matches(*argv, "mac") == 0) {
+   p.flags |= SKBMOD_F_SWAPMAC;
+   ok += 1;
+   } else if (matches(*argv, "set") == 0) {
+   NEXT_ARG();
+   continue;
+   } else if (matches(*argv, "etype") == 0) {
+   NEXT_ARG();
+   if (get_u16(&skbmod_etype, *argv, 0))
+   invarg("ethertype is invalid", *argv);
+   fprintf(stderr, "skbmod etype 0x%x\n", skbmod_etype);
+   p.flags |= SKBMOD_F_ETYPE;
+   ok += 1;
+   } else if (matches(*argv, "dmac") == 0) {
+   NEXT_ARG();
+   daddr = *argv;
+   if (sscanf(daddr, "%hhx:%hhx:%hhx:%hhx:%hhx:%hhx",
+  dbuf, dbuf + 1, dbuf + 2,
+  dbuf + 3, dbuf + 4, dbuf + 5) != 6) {
+   fprintf(stderr, "Invalid dst mac address %s\n",
+   daddr);

RE: [PATCH net-next] mlx5: Add MLX5_SET64_VCHK to fix BUILD_BUG_ON

2016-10-11 Thread David Laight

From: Tom Herbert
> Sent: 11 October 2016 05:22
...
> Fix is to create MLX5_SET64_VCHK that takes an additional argument
> that is a constant. There are two callers of MLX5_SET64 that are
> trying to get a variable offset, change those to call MLX5_SET64_VCHK
> passing pas[0] as the argument to use in the offset check.

I think I'd separate the array index instead.
Something like:

#define MLX5_SET64_INDEXED(typ, p, fld, ndx, v) do { \
BUILD_BUG_ON(__mlx5_bit_off(typ, fld) % 64); \
__MLX5_SET64(typ, p, fld[ndx], v); \
} while (0)

David
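
The idea of checking the field's alignment at build time while leaving the
array index free to vary at run time can be sketched in plain C11. This is an
illustration only: struct cmd_sketch and SKETCH_ARRAY_SET64 are hypothetical,
and the real mlx5 macros operate on bit offsets in firmware interface layouts
rather than on ordinary struct members.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical command layout with an array of 64-bit entries. */
struct cmd_sketch {
	uint64_t hdr;
	uint64_t pas[4];
};

/*
 * The compile-time check looks only at the array field itself, so it
 * stays a constant expression; the index is applied afterwards and may
 * be a run-time value.
 */
#define SKETCH_ARRAY_SET64(type, p, fld, idx, v) do {			\
	static_assert(offsetof(type, fld) % 8 == 0,			\
		      "array field must be 64-bit aligned");		\
	((type *)(p))->fld[idx] = (v);					\
} while (0)
```

A call such as SKETCH_ARRAY_SET64(struct cmd_sketch, &cmd, pas, i, addr)
compiles even when i is a loop variable, because the assertion never sees
the index.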

[PATCH v2 iproute2 9/9] man pages: add man page for skbmod action

2016-10-11 Thread Jamal Hadi Salim

From: Lucas Bates 

Signed-off-by: Lucas Bates 
Signed-off-by: Jamal Hadi Salim 
---
 man/man8/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/man/man8/Makefile b/man/man8/Makefile
index 4ad96ce..de6f249 100644
--- a/man/man8/Makefile
+++ b/man/man8/Makefile
@@ -16,7 +16,7 @@ MAN8PAGES = $(TARGETS) ip.8 arpd.8 lnstat.8 routel.8 rtacct.8 
rtmon.8 rtpr.8 ss.
tc-basic.8 tc-cgroup.8 tc-flow.8 tc-flower.8 tc-fw.8 tc-route.8 \
tc-tcindex.8 tc-u32.8 tc-matchall.8 \
tc-connmark.8 tc-csum.8 tc-mirred.8 tc-nat.8 tc-pedit.8 tc-police.8 \
-   tc-simple.8 tc-skbedit.8 tc-vlan.8 tc-xt.8  tc-ife.8 \
+   tc-simple.8 tc-skbedit.8 tc-vlan.8 tc-xt.8  tc-ife.8 tc-skbmod.8 \
devlink.8 devlink-dev.8 devlink-monitor.8 devlink-port.8 devlink-sb.8
 
 all: $(TARGETS)
-- 
1.9.1

[PATCH v2 iproute2 3/9] ife: improve help text

2016-10-11 Thread Jamal Hadi Salim

From: Roman Mashak 

Signed-off-by: Roman Mashak 
Signed-off-by: Jamal Hadi Salim 
---
 tc/m_ife.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tc/m_ife.c b/tc/m_ife.c
index 588bad7..862461b 100644
--- a/tc/m_ife.c
+++ b/tc/m_ife.c
@@ -29,12 +29,13 @@
 static void ife_explain(void)
 {
fprintf(stderr,
-   "Usage:... ife {decode|encode} {ALLOW|USE} [dst DMAC] [src 
SMAC] [type TYPE] [CONTROL] [index INDEX]\n");
+   "Usage:... ife {decode|encode} [{ALLOW|USE} ATTR] [dst DMAC] 
[src SMAC] [type TYPE] [CONTROL] [index INDEX]\n");
fprintf(stderr,
"\tALLOW := Encode direction. Allows encoding specified 
metadata\n"
"\t\t e.g \"allow mark\"\n"
"\tUSE := Encode direction. Enforce Static encoding of 
specified metadata\n"
"\t\t e.g \"use mark 0x12\"\n"
+   "\tATTR := mark (32-bit), prio (32-bit), tcindex (16-bit)\n"
"\tDMAC := 6 byte Destination MAC address to encode\n"
"\tSMAC := optional 6 byte Source MAC address to encode\n"
"\tTYPE := optional 16 bit ethertype to encode\n"
-- 
1.9.1

[PATCH v2 iproute2 1/9] ife action: allow specifying index in hex

2016-10-11 Thread Jamal Hadi Salim

From: Roman Mashak 

Signed-off-by: Roman Mashak 
Signed-off-by: Jamal Hadi Salim 
---
 tc/m_ife.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tc/m_ife.c b/tc/m_ife.c
index 0219760..a5a7516 100644
--- a/tc/m_ife.c
+++ b/tc/m_ife.c
@@ -152,7 +152,7 @@ static int parse_ife(struct action_util *a, int *argc_p, 
char ***argv_p,
if (argc) {
if (matches(*argv, "index") == 0) {
NEXT_ARG();
-   if (get_u32(, *argv, 10)) {
+   if (get_u32(, *argv, 0)) {
fprintf(stderr, "ife: Illegal \"index\"\n");
return -1;
}
-- 
1.9.1
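
The difference between parsing with base 10 and base 0 can be shown with a
small strtoul() wrapper. parse_u32() is a hypothetical helper that follows
the same "0 on success, -1 on error" convention; it is not the iproute2
get_u32() implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse 'arg' as an unsigned 32-bit value in the given base.
 * With base 0, strtoul() accepts "0x..." as hex and a leading "0" as
 * octal; with base 10 the string "0x10" is rejected because parsing
 * stops at the 'x'.
 */
static int parse_u32(uint32_t *val, const char *arg, int base)
{
	char *end;
	unsigned long res;

	assert(val != NULL);

	if (!arg || !*arg)
		return -1;

	errno = 0;
	res = strtoul(arg, &end, base);
	if (end == arg || *end != '\0')
		return -1; /* no digits, or trailing junk */
	if (errno == ERANGE || res > UINT32_MAX)
		return -1; /* out of range for 32 bits */

	*val = (uint32_t)res;
	return 0;
}
```

With base 0, an index given as 0x10 is accepted and read as 16, while plain
decimal input keeps working as before.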

[PATCH v2 iproute2 5/9] action gact: list pipe as a valid action

2016-10-11 Thread Jamal Hadi Salim

From: Craig Dillabaugh 

Signed-off-by: Craig Dillabaugh 
Signed-off-by: Jamal Hadi Salim 
---
 tc/m_gact.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tc/m_gact.c b/tc/m_gact.c
index 2bfd9a7..dc04b9f 100644
--- a/tc/m_gact.c
+++ b/tc/m_gact.c
@@ -45,7 +45,7 @@ explain(void)
 #ifdef CONFIG_GACT_PROB
fprintf(stderr, "Usage: ... gact <ACTION> [RAND] [INDEX]\n");
fprintf(stderr,
-   "Where: \tACTION := reclassify | drop | continue | pass\n"
+   "Where: \tACTION := reclassify | drop | continue | pass | 
pipe\n"
"\tRAND := random   \n"
"\tRANDTYPE := netrand | determ\n"
"\tVAL : = value not exceeding 1\n"
@@ -54,7 +54,7 @@ explain(void)
 #else
fprintf(stderr, "Usage: ... gact <ACTION> [INDEX]\n");
fprintf(stderr,
-   "Where: \tACTION := reclassify | drop | continue | pass\n"
+   "Where: \tACTION := reclassify | drop | continue | pass | 
pipe\n"
"\tINDEX := index value used\n"
"\n");
 #endif
-- 
1.9.1

[PATCH v2 iproute2 4/9] actions ife: Introduce encoding and decoding of tcindex metadata

2016-10-11 Thread Jamal Hadi Salim

From: Jamal Hadi Salim 

Signed-off-by: Jamal Hadi Salim 
---
 include/linux/tc_act/tc_ife.h |  3 ++-
 tc/m_ife.c| 29 +++--
 2 files changed, 25 insertions(+), 7 deletions(-)

diff --git a/include/linux/tc_act/tc_ife.h b/include/linux/tc_act/tc_ife.h
index 4ece02a..cd18360 100644
--- a/include/linux/tc_act/tc_ife.h
+++ b/include/linux/tc_act/tc_ife.h
@@ -32,8 +32,9 @@ enum {
 #define IFE_META_HASHID 2
 #defineIFE_META_PRIO 3
 #defineIFE_META_QMAP 4
+#defineIFE_META_TCINDEX 5
 /*Can be overridden at runtime by module option*/
-#define__IFE_META_MAX 5
+#define__IFE_META_MAX 6
 #define IFE_META_MAX (__IFE_META_MAX - 1)
 
 #endif
diff --git a/tc/m_ife.c b/tc/m_ife.c
index 862461b..e6f6153 100644
--- a/tc/m_ife.c
+++ b/tc/m_ife.c
@@ -67,6 +67,8 @@ static int parse_ife(struct action_util *a, int *argc_p, char 
***argv_p,
__u32 ife_prio_v = 0;
__u32 ife_mark = 0;
__u32 ife_mark_v = 0;
+   __u16 ife_tcindex = 0;
+   __u16 ife_tcindex_v = 0;
char *daddr = NULL;
char *saddr = NULL;
 
@@ -89,6 +91,8 @@ static int parse_ife(struct action_util *a, int *argc_p, char 
***argv_p,
ife_mark = IFE_META_SKBMARK;
} else if (matches(*argv, "prio") == 0) {
ife_prio = IFE_META_PRIO;
+   } else if (matches(*argv, "tcindex") == 0) {
+   ife_tcindex = IFE_META_TCINDEX;
} else {
fprintf(stderr, "Illegal meta define <%s>\n",
*argv);
@@ -106,6 +110,11 @@ static int parse_ife(struct action_util *a, int *argc_p, 
char ***argv_p,
if (get_u32(&ife_prio_v, *argv, 0))
invarg("ife prio val is invalid",
   *argv);
+   } else if (matches(*argv, "tcindex") == 0) {
+   NEXT_ARG();
+   if (get_u16(&ife_tcindex_v, *argv, 0))
+   invarg("ife tcindex val is invalid",
+  *argv);
} else {
fprintf(stderr, "Illegal meta use type <%s>\n",
*argv);
@@ -196,6 +205,13 @@ static int parse_ife(struct action_util *a, int *argc_p, 
char ***argv_p,
else
addattr_l(n, MAX_MSG, IFE_META_PRIO, NULL, 0);
}
+   if (ife_tcindex || ife_tcindex_v) {
+   if (ife_tcindex_v)
+   addattr_l(n, MAX_MSG, IFE_META_TCINDEX, &ife_tcindex_v,
+ 2);
+   else
+   addattr_l(n, MAX_MSG, IFE_META_TCINDEX, NULL, 0);
+   }
 
tail2->rta_len = (void *)NLMSG_TAIL(n) - (void *)tail2;
 
@@ -213,7 +229,7 @@ static int print_ife(struct action_util *au, FILE *f, 
struct rtattr *arg)
struct rtattr *tb[TCA_IFE_MAX + 1];
__u16 ife_type = 0;
__u32 mmark = 0;
-   __u32 mhash = 0;
+   __u16 mtcindex = 0;
__u32 mprio = 0;
int has_optional = 0;
SPRINT_BUF(b2);
@@ -258,13 +274,14 @@ static int print_ife(struct action_util *au, FILE *f, 
struct rtattr *arg)
fprintf(f, "allow mark ");
}
 
-   if (metalist[IFE_META_HASHID]) {
-   len = RTA_PAYLOAD(metalist[IFE_META_HASHID]);
+   if (metalist[IFE_META_TCINDEX]) {
+   len = RTA_PAYLOAD(metalist[IFE_META_TCINDEX]);
if (len) {
-   mhash = 
rta_getattr_u32(metalist[IFE_META_HASHID]);
-   fprintf(f, "use hash %u ", mhash);
+   mtcindex =
+   
rta_getattr_u16(metalist[IFE_META_TCINDEX]);
+   fprintf(f, "use tcindex %d ", mtcindex);
} else
-   fprintf(f, "allow hash ");
+   fprintf(f, "allow tcindex ");
}
 
if (metalist[IFE_META_PRIO]) {
-- 
1.9.1

[PATCH v2 iproute2 8/9] man pages: Add tc-ife to Makefile

2016-10-11 Thread Jamal Hadi Salim

From: Jamal Hadi Salim 

Signed-off-by: Jamal Hadi Salim 
---
 man/man8/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/man/man8/Makefile b/man/man8/Makefile
index 9213769..4ad96ce 100644
--- a/man/man8/Makefile
+++ b/man/man8/Makefile
@@ -16,7 +16,7 @@ MAN8PAGES = $(TARGETS) ip.8 arpd.8 lnstat.8 routel.8 rtacct.8 
rtmon.8 rtpr.8 ss.
tc-basic.8 tc-cgroup.8 tc-flow.8 tc-flower.8 tc-fw.8 tc-route.8 \
tc-tcindex.8 tc-u32.8 tc-matchall.8 \
tc-connmark.8 tc-csum.8 tc-mirred.8 tc-nat.8 tc-pedit.8 tc-police.8 \
-   tc-simple.8 tc-skbedit.8 tc-vlan.8 tc-xt.8 \
+   tc-simple.8 tc-skbedit.8 tc-vlan.8 tc-xt.8  tc-ife.8 \
devlink.8 devlink-dev.8 devlink-monitor.8 devlink-port.8 devlink-sb.8
 
 all: $(TARGETS)
-- 
1.9.1

[PATCH v2 iproute2 7/9] man pages: update ife action to include tcindex

2016-10-11 Thread Jamal Hadi Salim

From: Lucas Bates 

Signed-off-by: Lucas Bates 
Signed-off-by: Jamal Hadi Salim 
---
 man/man8/tc-ife.8 | 29 ++---
 1 file changed, 26 insertions(+), 3 deletions(-)

diff --git a/man/man8/tc-ife.8 b/man/man8/tc-ife.8
index 7b3601e..aaf0f97 100644
--- a/man/man8/tc-ife.8
+++ b/man/man8/tc-ife.8
@@ -5,8 +5,8 @@ IFE - encapsulate/decapsulate metadata
 .SH SYNOPSIS
 .in +8
 .ti -8
-.BR tc " ... " action ife"
-.I DIRECTION ACTION
+.BR tc " ... " " action ife"
+.IR DIRECTION " [ " ACTION " ] "
 .RB "[ " dst
 .IR DMAC " ] "
 .RB "[ " src
@@ -24,7 +24,13 @@ IFE - encapsulate/decapsulate metadata
 
 .ti -8
 .IR ACTION " := { "
-.BR allow " | " use " }"
+.BI allow " ATTR"
+.RB "| " use
+.IR "ATTR value" " }"
+
+.ti -8
+.IR ATTR " := { "
+.BR mark " | " prio " | " tcindex " }"
 
 .ti -8
 .IR CONTROL " := { "
@@ -50,6 +56,23 @@ Encode direction only. Allows encoding specified metadata.
 .B use
 Encode direction only. Enforce static encoding of specified metadata.
 .TP
+.BR mark " [ "
+.IR u32_value " ]"
+The value to set for the skb mark. The u32 value is required only when
+.BR use " is specified."
+.TP
+.BR prio " [ "
+.IR u32_value " ]"
+The value to set for priority in the skb structure. The u32 value is required
+only when
+.BR use " is specified."
+.TP
+.BR tcindex " ["
+.IR u16_value " ]"
+Value to set for the traffic control index in the skb structure. The u16 value
+is required only when
+.BR use " is specified."
+.TP
 .BI dmac " DMAC"
 .TQ
 .BI smac " SMAC"
-- 
1.9.1

[PATCH v2 iproute2 0/9] Cleanup backlog

2016-10-11 Thread Jamal Hadi Salim

From: Jamal Hadi Salim 


A variety of cleanups and new functionality I had sitting around in my
private tree.

Craig Dillabaugh (1):
  action gact: list pipe as a valid action

Jamal Hadi Salim (3):
  actions ife: Introduce encoding and decoding of tcindex metadata
  actions:  add skbmod action
  man pages: Add tc-ife to Makefile

Lucas Bates (2):
  man pages: update ife action to include tcindex
  man pages: add man page for skbmod action

Roman Mashak (3):
  ife action: allow specifying index in hex
  ife: print prio, mark and hash as unsigned
  ife: improve help text

 include/linux/tc_act/tc_ife.h |   3 +-
 man/man8/Makefile |   2 +-
 man/man8/tc-ife.8 |  29 -
 tc/m_gact.c   |   4 +-
 tc/m_ife.c|  38 --
 tc/m_skbmod.c | 260 ++
 6 files changed, 319 insertions(+), 17 deletions(-)
 create mode 100644 tc/m_skbmod.c

-- 
1.9.1

Re: [PATCH] iwlwifi: pcie: reduce "unsupported splx" to a warning

2016-10-11 Thread Paul Bolle

Hi Luca,

On Mon, 2016-10-10 at 17:02 +0300, Luca Coelho wrote:
> On Mon, 2016-10-10 at 02:19 -0500, Chris Rorvick wrote:
> This is not coming from the NIC itself, but from the platform's ACPI
> tables.  Can you tell us which platform you are using?

On my machine I'm seeing the same error as Chris. So what exactly do
you mean with "platform" here?

> > Name (SPLX, Package (0x04)
> > {
> > Zero,
> > Package (0x03)
> > {
> > 0,
> > 1200,
> > 1000
> > },
> > Package (0x03)
> > {
> > 0,
> > 1200,
> > 1000
> > },
> > Package (0x03)
> > {
> > 0,
> > 1200,
> > 1000
> > }
> > })
> 
> This is not the structure that we are expecting.  We expect this:
> 
>    Name (SPLX, Package (0x02)
>    {
>    Zero,
>    Package (0x03)
>    {
>    0x07,
>    ,
>    
>    }
>    })
> 
> ...as you correctly pointed out.  The data in the structure you have is
> not for WiFi (actually I don't think 0 is a valid value, but I'll
> double-check).

For what it's worth, on my machine I have twenty (!) SPLX entries, all
reading:
Name (SPLX, Package (0x04)
{
Zero, 
Package (0x03)
{
0x8000, 
0x8000, 
0x8000
}, 

Package (0x03)
{
   0x8000, 
   0x8000, 
   0x8000
}, 

Package (0x03)
{
0x8000, 
0x8000, 
0x8000
}
})

> There are other things that look a bit inconsistent in this code...
> I'll try to find the official ACPI table definitions for this entries
> to make sure it's correct.

When I looked into this error, some time ago, I searched around a bit
for documentation on this splx stuff. Sadly, commit bcb079a14d75
("iwlwifi: pcie: retrieve and parse ACPI power limitations") provides
very few clues and my searches turned up nothing useful. So a pointer
or two would be really appreciated.

> > --- a/drivers/net/wireless/intel/iwlwifi/pcie/drv.c
> > +++ b/drivers/net/wireless/intel/iwlwifi/pcie/drv.c
> > @@ -540,7 +540,7 @@ static u64 splx_get_pwr_limit(struct iwl_trans *trans, union acpi_object *splx)
> >     splx->package.count != 2 ||
> >     splx->package.elements[0].type != ACPI_TYPE_INTEGER ||
> >     splx->package.elements[0].integer.value != 0) {
> > -   IWL_ERR(trans, "Unsupported splx structure\n");
> > +   IWL_WARN(trans, "Unsupported splx structure, not limiting WiFi power\n");
> >     return 0;
> >     }
> 
> If this is really bothering you, I guess I could apply this patch for
> now.  But as I said, this is not solving the actual problem.

Bikeshedding: I think IWL_INFO() is more appropriate, as info doesn't
imply one needs to act on this message, while warn does imply that
action is needed.
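
For readers following along, the shape check in the quoted hunk can be sketched in userspace C roughly as follows. This is a minimal sketch under stated assumptions: struct obj, OBJ_INTEGER, OBJ_PACKAGE and splx_shape_ok() are hypothetical stand-ins and not the ACPICA types, and the package-type test is assumed to precede the three conditions visible in the hunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model of an ACPI object. */
enum obj_type { OBJ_INTEGER, OBJ_PACKAGE };

struct obj {
	enum obj_type type;
	uint64_t value;              /* meaningful for OBJ_INTEGER */
	size_t count;                /* meaningful for OBJ_PACKAGE */
	const struct obj *elements;  /* meaningful for OBJ_PACKAGE */
};

/*
 * Accept only a package of exactly two elements whose first element
 * is the integer 0. Anything else is treated as "unsupported".
 */
bool splx_shape_ok(const struct obj *splx)
{
	assert(splx);
	return splx->type == OBJ_PACKAGE &&
	       splx->count == 2 &&
	       splx->elements[0].type == OBJ_INTEGER &&
	       splx->elements[0].value == 0;
}
```

With this model, a Package (0x02) holding Zero and one sub-package passes, while the Package (0x04) tables shown in this thread fail on the count test alone, whatever their sub-packages contain.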

Thanks,


Paul Bolle

Re: [PATCH v4 04/10] ARM: dts: sun8i-h3: Add dt node for the syscon control module

2016-10-11 Thread Maxime Ripard

On Mon, Oct 10, 2016 at 02:50:21PM +0200, Jean-Francois Moine wrote:
> On Mon, 10 Oct 2016 14:31:51 +0200
> Maxime Ripard  wrote:
> 
> > Hi,
> > 
> > On Fri, Oct 07, 2016 at 10:25:51AM +0200, Corentin Labbe wrote:
> > > This patch add the dt node for the syscon register present on the
> > > Allwinner H3.
> > > 
> > > Only two register are present in this syscon and the only one useful is
> > > the one dedicated to EMAC clock.
> > > 
> > > Signed-off-by: Corentin Labbe 
> > > ---
> > >  arch/arm/boot/dts/sun8i-h3.dtsi | 5 +
> > >  1 file changed, 5 insertions(+)
> > > 
> > > diff --git a/arch/arm/boot/dts/sun8i-h3.dtsi b/arch/arm/boot/dts/sun8i-h3.dtsi
> > > index 8a95e36..1101d2f 100644
> > > --- a/arch/arm/boot/dts/sun8i-h3.dtsi
> > > +++ b/arch/arm/boot/dts/sun8i-h3.dtsi
> > > @@ -140,6 +140,11 @@
> > >   #size-cells = <1>;
> > >   ranges;
> > >  
> > > + syscon: syscon@01c0 {
> > > + compatible = "syscon";
> > 
> > It would be great to have a more specific compatible here in addition
> > to the syscon, like "allwinner,sun8i-h3-system-controller".
> 
> The System Control area is just like the PRCM area: it would be simpler
> to define the specific registers in the associated drivers.

Until you actually have to share those registers between different
devices, and then you're just screwed.

Maxime

-- 
Maxime Ripard, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com



Re: [PATCH v4 10/10] ARM: sunxi: Enable sun8i-emac driver on multi_v7_defconfig

2016-10-11 Thread Maxime Ripard

On Mon, Oct 10, 2016 at 03:09:43PM +0200, Jean-Francois Moine wrote:
> On Mon, 10 Oct 2016 14:35:11 +0200
> LABBE Corentin  wrote:
> 
> > On Mon, Oct 10, 2016 at 02:30:46PM +0200, Maxime Ripard wrote:
> > > On Fri, Oct 07, 2016 at 10:25:57AM +0200, Corentin Labbe wrote:
> > > > Enable the sun8i-emac driver in the multi_v7 default configuration
> > > > 
> > > > Signed-off-by: Corentin Labbe 
> > > > ---
> > > >  arch/arm/configs/multi_v7_defconfig | 1 +
> > > >  1 file changed, 1 insertion(+)
> > > > 
> > > > diff --git a/arch/arm/configs/multi_v7_defconfig b/arch/arm/configs/multi_v7_defconfig
> > > > index 5845910..f44d633 100644
> > > > --- a/arch/arm/configs/multi_v7_defconfig
> > > > +++ b/arch/arm/configs/multi_v7_defconfig
> > > > @@ -229,6 +229,7 @@ CONFIG_NETDEVICES=y
> > > >  CONFIG_VIRTIO_NET=y
> > > >  CONFIG_HIX5HD2_GMAC=y
> > > >  CONFIG_SUN4I_EMAC=y
> > > > +CONFIG_SUN8I_EMAC=y
> > > 
> > > Any reason to build it statically?
> > 
> > No, just copied the same than CONFIG_SUN4I_EMAC that probably do
> > not need it also.
> 
> All arm configs are done the same way, and, some day, the generic ARM
> V7 kernel will not be loadable in 1Gb RAM...

Yeah, if possible, I'd really like to avoid introducing statically
built drivers to multi_v7.

Maxime

-- 
Maxime Ripard, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com



[PATCH net-next 0/6] FUJITSU Extended Socket driver version 1.2

2016-10-11 Thread Taku Izumi

From: Taku 

This patchset updates the FUJITSU Extended Socket network driver to version 1.2.
It includes the following enhancements:
  - ethtool -d support
  - ethtool -S enhancement
  - Additional debugging features (tracepoints etc.)

Taku Izumi (6):
  fjes: ethtool -d support for fjes driver
  fjes: Enhance ethtool -S for fjes driver
  fjes: Add tracepoints in fjes driver
  fjes: Implement debug mode for fjes driver
  fjes: Add debugfs entry for EP status information in fjes driver
  fjes: Update fjes driver version : 1.2

 drivers/net/fjes/Makefile   |   2 +-
 drivers/net/fjes/fjes.h |  18 ++
 drivers/net/fjes/fjes_debugfs.c | 220 +++
 drivers/net/fjes/fjes_ethtool.c | 118 -
 drivers/net/fjes/fjes_hw.c  | 156 +++-
 drivers/net/fjes/fjes_hw.h  |  34 
 drivers/net/fjes/fjes_main.c|  63 ++-
 drivers/net/fjes/fjes_trace.c   |  30 
 drivers/net/fjes/fjes_trace.h   | 381 
 9 files changed, 1011 insertions(+), 11 deletions(-)
 create mode 100644 drivers/net/fjes/fjes_debugfs.c
 create mode 100644 drivers/net/fjes/fjes_trace.c
 create mode 100644 drivers/net/fjes/fjes_trace.h

-- 
2.6.6

Re: [PATCH net] Panic when tc_lookup_action_n finds a partially initialized action.

2016-10-11 Thread Krister Johansen

On Wed, Oct 05, 2016 at 11:01:38AM -0700, Cong Wang wrote:
> Does the attached patch make any sense now? Our pernet init doesn't
> rely on act_base, so even we have some race, the worst case is after
> we initialize the pernet netns for an action but its ops still not
> visible, which seems fine (at least no crash).

I tried to reproduce the panic with this latest patch, but I am unable
to do so.  The one difference I notice between this patch, and the one I
sent to the list, is that with yours it takes much longer before we get
any output from the simultaneous launch of these containers.  Presumably
that's the extra latency added by allowing many extra modprobe calls to
get spawned by request_module().

-K

[PATCH 1/6] fjes: ethtool -d support for fjes driver

2016-10-11 Thread Taku Izumi

This patch adds ethtool -d support to the fjes driver.
With ethtool -d, you can get a register dump of the
Extended Socket device.

  # ethtool -d es0

Offset  Values
--  --
0x0000: 01 00 00 00 08 00 00 00 00 00 00 00 00 00 00 00
0x0010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0020: 02 00 00 80 02 00 00 80 64 a6 58 08 07 00 00 00
0x0030: 00 00 00 00 28 80 00 00 00 00 f9 e3 06 00 00 00
0x0040: 00 00 00 00 18 00 00 00 80 a4 58 08 07 00 00 00
0x0050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0080: 00 00 00 00 00 00 e0 7f 00 00 01 00 00 00 01 00
0x0090: 00 00 00 00
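
The length of the dump above follows from the buffer size used in the patch: 37 words of 32 bits are 148 bytes (0x94), which is nine full rows of 16 bytes plus a final row of 4 bytes at offset 0x0090. A minimal sketch of that arithmetic and of a hex dump in a similar layout follows. REGS_WORDS, regs_len() and dump_regs() are hypothetical names for illustration, and the output format only approximates what ethtool prints.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define REGS_WORDS 37   /* same count as FJES_REGS_LEN in the patch */

/* Size of the register buffer in bytes: 37 * 4 = 148 = 0x94. */
size_t regs_len(void)
{
	return REGS_WORDS * sizeof(uint32_t);
}

/*
 * Print len bytes, 16 per row, each row prefixed with its offset.
 * Returns the number of rows printed.
 */
int dump_regs(FILE *out, const uint8_t *buf, size_t len)
{
	int rows = 0;

	assert(out && buf);
	for (size_t off = 0; off < len; off += 16) {
		fprintf(out, "0x%04zx:", off);
		for (size_t i = off; i < off + 16 && i < len; i++)
			fprintf(out, " %02x", buf[i]);
		fputc('\n', out);
		rows++;
	}
	return rows;
}
```

Slots that the driver never fills stay zero because the buffer is cleared first, which is why the dump shows zeros between the register groups.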

Signed-off-by: Taku Izumi 
---
 drivers/net/fjes/fjes_ethtool.c | 48 +
 1 file changed, 48 insertions(+)

diff --git a/drivers/net/fjes/fjes_ethtool.c b/drivers/net/fjes/fjes_ethtool.c
index 9c218e1..8397634 100644
--- a/drivers/net/fjes/fjes_ethtool.c
+++ b/drivers/net/fjes/fjes_ethtool.c
@@ -121,12 +121,60 @@ static int fjes_get_settings(struct net_device *netdev,
return 0;
 }
 
+static int fjes_get_regs_len(struct net_device *netdev)
+{
+#define FJES_REGS_LEN  37
+   return FJES_REGS_LEN * sizeof(u32);
+}
+
+static void fjes_get_regs(struct net_device *netdev,
+ struct ethtool_regs *regs, void *p)
+{
+   struct fjes_adapter *adapter = netdev_priv(netdev);
+   struct fjes_hw *hw = &adapter->hw;
+   u32 *regs_buff = p;
+
+   memset(p, 0, FJES_REGS_LEN * sizeof(u32));
+
+   regs->version = 1;
+
+   /* Information registers */
+   regs_buff[0] = rd32(XSCT_OWNER_EPID);
+   regs_buff[1] = rd32(XSCT_MAX_EP);
+
+   /* Device Control registers */
+   regs_buff[4] = rd32(XSCT_DCTL);
+
+   /* Command Control registers */
+   regs_buff[8] = rd32(XSCT_CR);
+   regs_buff[9] = rd32(XSCT_CS);
+   regs_buff[10] = rd32(XSCT_SHSTSAL);
+   regs_buff[11] = rd32(XSCT_SHSTSAH);
+
+   regs_buff[13] = rd32(XSCT_REQBL);
+   regs_buff[14] = rd32(XSCT_REQBAL);
+   regs_buff[15] = rd32(XSCT_REQBAH);
+
+   regs_buff[17] = rd32(XSCT_RESPBL);
+   regs_buff[18] = rd32(XSCT_RESPBAL);
+   regs_buff[19] = rd32(XSCT_RESPBAH);
+
+   /* Interrupt Control registers */
+   regs_buff[32] = rd32(XSCT_IS);
+   regs_buff[33] = rd32(XSCT_IMS);
+   regs_buff[34] = rd32(XSCT_IMC);
+   regs_buff[35] = rd32(XSCT_IG);
+   regs_buff[36] = rd32(XSCT_ICTL);
+}
+
 static const struct ethtool_ops fjes_ethtool_ops = {
.get_settings   = fjes_get_settings,
.get_drvinfo= fjes_get_drvinfo,
.get_ethtool_stats = fjes_get_ethtool_stats,
.get_strings  = fjes_get_strings,
.get_sset_count   = fjes_get_sset_count,
+   .get_regs   = fjes_get_regs,
+   .get_regs_len   = fjes_get_regs_len,
 };
 
 void fjes_set_ethtool_ops(struct net_device *netdev)
-- 
2.6.6

[GIT] Networking

2016-10-11 Thread David Miller


1) Netfilter list handling fix, from Linus.

2) RXRPC/AFS bug fixes (oops on call to serviceless endpoints,
   build warnings, missing notifications, etc.) from David Howells.

3) Kernel log message missing newlines, from Colin Ian King.

4) Don't enter direct reclaim in netlink dumps, the idea is to use a
   high order allocation first and fallback quickly to a 0-order
   allocation if such a high-order one cannot be done cheaply and
   without reclaim.  From Eric Dumazet.

5) Fix firmware download errors in btusb bluetooth driver, from Ethan
   Hsieh.

6) Missing Kconfig deps for QCOM_EMAC, from Geert Uytterhoeven.

7) Fix MDIO_XGENE dup Kconfig entry.  From Laura Abbott.

8) Constrain ipv6 rtr_solicits sysctl values properly, from Maciej
   Żenczykowski.
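
The allocation strategy described in item 4 (try a large buffer only if it is cheap, otherwise fall back quickly to a small one) can be illustrated in userspace C. This is only a sketch of the pattern, not the netlink code: cheap_alloc_fn, struct dump_buf, dump_buf_alloc() and the two example hooks are hypothetical, and plain malloc() stands in for the allocation that is allowed to work harder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical hook: returns NULL when the request cannot be satisfied
 * cheaply (the analogue of an allocation that would need reclaim).
 */
typedef void *(*cheap_alloc_fn)(size_t size);

struct dump_buf {
	void *data;
	size_t size;
};

struct dump_buf dump_buf_alloc(size_t preferred, size_t minimum,
			       cheap_alloc_fn try_cheap)
{
	struct dump_buf b = { NULL, 0 };

	assert(minimum > 0 && preferred >= minimum && try_cheap);

	/* Opportunistic attempt: large buffer, but only if it is cheap. */
	b.data = try_cheap(preferred);
	if (b.data) {
		b.size = preferred;
		return b;
	}

	/* Fallback: small buffer from the ordinary allocator. */
	b.data = malloc(minimum);
	if (b.data)
		b.size = minimum;
	return b;
}

/* Example hooks: one that always gives up, one that always succeeds. */
void *cheap_never(size_t size)
{
	(void)size;
	return NULL;
}

void *cheap_always(size_t size)
{
	return malloc(size);
}
```

The caller reads b.size to learn which buffer it got, so the rest of the dump logic does not need to know whether the opportunistic attempt succeeded.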

Please pull, thanks a lot!

The following changes since commit 4c1fad64eff481982349f5795b9c198c532b0f13:

  Merge tag 'for-f2fs-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs (2016-10-06 15:30:40 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git 

for you to fetch changes up to bd3769bfedb2b65af61744e9b40b1863e0870e2b:

  netfilter: Fix slab corruption. (2016-10-11 04:44:37 -0400)


Alex Sidorenko (1):
  Fixing a bug in team driver due to incorrect 'unsigned int' to 'int' conversion

Amitkumar Karwar (1):
  Bluetooth: btusb: add entry for Marvell 8997 chipset

Anoob Soman (1):
  packet: call fanout_release, while UNREGISTERING a netdev

Christophe Jaillet (1):
  wan/fsl_ucc_hdlc: Fix size used in dma_free_coherent()

Colin Ian King (3):
  net: axienet: Add missing \n to end of dev_err messages
  net: ps3_gelic: Add missing \n to end of deb_dbg message
  net: hns: Add missing \n to end of dev_err messages, tidy up text

David Howells (13):
  rxrpc: Accesses of rxrpc_local::service need to be RCU managed
  rxrpc: Fix duplicate const
  rxrpc: Fix oops on incoming call to serviceless endpoint
  rxrpc: Only ping for lost reply in client call
  rxrpc: Fix warning by splitting rxrpc_send_call_packet()
  rxrpc: Fix loss of PING RESPONSE ACK production due to PING ACKs
  rxrpc: Partially handle OpenAFS's improper termination of calls
  rxrpc: Queue the call on expiry
  rxrpc: Add missing notification
  rxrpc: Return negative error code to kernel service
  afs: Check for fatal error when in waiting for ack state
  rxrpc: Need to produce an ACK for service op if op takes a long time
  rxrpc: Don't request an ACK on the last DATA packet of a call's Tx phase

David S. Miller (6):
  Merge branch 'fman-next' of git://git.freescale.com/ppc/upstream/linux
  Merge branch 'xen-netback-rx-refactor'
  Merge tag 'rxrpc-rewrite-20161004' of git://git.kernel.org/.../dhowells/linux-fs
  Merge branch 'mediatek-hw-lro-chip-id-check'
  Merge branch 'for-upstream' of git://git.kernel.org/.../bluetooth/bluetooth
  Merge branch 'be2net-fixes'

David Vrabel (4):
  xen-netback: refactor guest rx
  xen-netback: immediately wake tx queue when guest rx queue has space
  xen-netback: process guest rx packets in batches
  xen-netback: batch copies for multiple to-guest rx packets

Eric Dumazet (1):
  netlink: do not enter direct reclaim from netlink_dump()

Ethan Hsieh (1):
  Bluetooth: btusb: Fix atheros firmware download error

Geert Uytterhoeven (1):
  ethernet: qualcomm: QCOM_EMAC should depend on HAS_DMA and HAS_IOMEM

Igal Liberman (1):
  fsl/fman: fix loadable module compilation

Jon Mason (1):
  net: bgmac: Fix errant feature flag check

Laura Abbott (1):
  drivers: net: phy: Correct duplicate MDIO_XGENE entry

Laurent Pinchart (1):
  dt-bindings: net: renesas-ravb: Add support for R8A7796 RAVB

Linus Torvalds (1):
  netfilter: Fix slab corruption.

Maciej Żenczykowski (1):
  ipv6 addrconf: disallow rtr_solicits < -1

Madalin Bucur (12):
  fsl/fman: split lines over 80 characters
  fsl/fman: small fixes
  fsl/fman: use of_get_phy_mode()
  fsl/fman: simplify device tree reads
  fsl/fman: return a phy_dev pointer from init
  fsl/fman: MEMAC may use QSGMII PHY interface mode
  fsl/fman: check pcsphy pointer before use
  fsl/fman: check of_get_phy_mode() return value
  fsl/fman: simplify redundant condition
  fsl/fman: fix return value checking
  fsl/fman: remove leftover comment
  MAINTAINERS: net: add entry for Freescale QorIQ DPAA FMan driver

Michał Narajowski (3):
  Bluetooth: Fix local name in scan rsp
  Bluetooth: Add appearance to default scan rsp data
  Bluetooth: Refactor append name and appearance

Mike Looijmans (1):
  devicetree: net: micrel-ksz90x1.txt: Properly explain skew settings

Mugunthan V N (1):
  drivers: net: cpsw-phy-sel: add support to configure rgmii internal delay

[PATCH 3/6] fjes: Add tracepoints in fjes driver

2016-10-11 Thread Taku Izumi

This patch adds tracepoints to the fjes driver.
They are useful for debugging.

Signed-off-by: Taku Izumi 
---
 drivers/net/fjes/Makefile |   2 +-
 drivers/net/fjes/fjes_hw.c|  25 +++-
 drivers/net/fjes/fjes_main.c  |   5 +
 drivers/net/fjes/fjes_trace.c |  30 
 drivers/net/fjes/fjes_trace.h | 312 ++
 5 files changed, 370 insertions(+), 4 deletions(-)
 create mode 100644 drivers/net/fjes/fjes_trace.c
 create mode 100644 drivers/net/fjes/fjes_trace.h

diff --git a/drivers/net/fjes/Makefile b/drivers/net/fjes/Makefile
index 523e3d7..6705d1b 100644
--- a/drivers/net/fjes/Makefile
+++ b/drivers/net/fjes/Makefile
@@ -27,4 +27,4 @@
 
 obj-$(CONFIG_FUJITSU_ES) += fjes.o
 
-fjes-objs := fjes_main.o fjes_hw.o fjes_ethtool.o
+fjes-objs := fjes_main.o fjes_hw.o fjes_ethtool.o fjes_trace.o
diff --git a/drivers/net/fjes/fjes_hw.c b/drivers/net/fjes/fjes_hw.c
index 82b56e8..dba59dc 100644
--- a/drivers/net/fjes/fjes_hw.c
+++ b/drivers/net/fjes/fjes_hw.c
@@ -21,6 +21,7 @@
 
 #include "fjes_hw.h"
 #include "fjes.h"
+#include "fjes_trace.h"
 
 static void fjes_hw_update_zone_task(struct work_struct *);
 static void fjes_hw_epstop_task(struct work_struct *);
@@ -371,7 +372,7 @@ fjes_hw_issue_request_command(struct fjes_hw *hw,
enum fjes_dev_command_response_e ret = FJES_CMD_STATUS_UNKNOWN;
union REG_CR cr;
union REG_CS cs;
-   int timeout;
+   int timeout = FJES_COMMAND_REQ_TIMEOUT * 1000;
 
cr.reg = 0;
cr.bits.req_start = 1;
@@ -408,6 +409,8 @@ fjes_hw_issue_request_command(struct fjes_hw *hw,
}
}
 
+   trace_fjes_hw_issue_request_command(&cr, &cs, timeout, ret);
+
return ret;
 }
 
@@ -427,11 +430,13 @@ int fjes_hw_request_info(struct fjes_hw *hw)
res_buf->info.code = 0;
 
ret = fjes_hw_issue_request_command(hw, FJES_CMD_REQ_INFO);
+   trace_fjes_hw_request_info(hw, res_buf);
 
result = 0;
 
if (FJES_DEV_COMMAND_INFO_RES_LEN((*hw->hw_info.max_epid)) !=
res_buf->info.length) {
+   trace_fjes_hw_request_info_err("Invalid res_buf");
result = -ENOMSG;
} else if (ret == FJES_CMD_STATUS_NORMAL) {
switch (res_buf->info.code) {
@@ -448,6 +453,7 @@ int fjes_hw_request_info(struct fjes_hw *hw)
result = -EPERM;
break;
case FJES_CMD_STATUS_TIMEOUT:
+   trace_fjes_hw_request_info_err("Timeout");
result = -EBUSY;
break;
case FJES_CMD_STATUS_ERROR_PARAM:
@@ -512,6 +518,8 @@ int fjes_hw_register_buff_addr(struct fjes_hw *hw, int dest_epid,
res_buf->share_buffer.length = 0;
res_buf->share_buffer.code = 0;
 
+   trace_fjes_hw_register_buff_addr_req(req_buf, buf_pair);
+
ret = fjes_hw_issue_request_command(hw, FJES_CMD_REQ_SHARE_BUFFER);
 
timeout = FJES_COMMAND_REQ_BUFF_TIMEOUT * 1000;
@@ -532,16 +540,20 @@ int fjes_hw_register_buff_addr(struct fjes_hw *hw, int dest_epid,
 
result = 0;
 
+   trace_fjes_hw_register_buff_addr(res_buf, timeout);
+
if (res_buf->share_buffer.length !=
-   FJES_DEV_COMMAND_SHARE_BUFFER_RES_LEN)
+   FJES_DEV_COMMAND_SHARE_BUFFER_RES_LEN) {
+   trace_fjes_hw_register_buff_addr_err("Invalid res_buf");
result = -ENOMSG;
-   else if (ret == FJES_CMD_STATUS_NORMAL) {
+   } else if (ret == FJES_CMD_STATUS_NORMAL) {
switch (res_buf->share_buffer.code) {
case FJES_CMD_REQ_RES_CODE_NORMAL:
result = 0;
set_bit(dest_epid, &hw->hw_info.buffer_share_bit);
break;
case FJES_CMD_REQ_RES_CODE_BUSY:
+   trace_fjes_hw_register_buff_addr_err("Busy Timeout");
result = -EBUSY;
break;
default:
@@ -554,6 +566,7 @@ int fjes_hw_register_buff_addr(struct fjes_hw *hw, int dest_epid,
result = -EPERM;
break;
case FJES_CMD_STATUS_TIMEOUT:
+   trace_fjes_hw_register_buff_addr_err("Timeout");
result = -EBUSY;
break;
case FJES_CMD_STATUS_ERROR_PARAM:
@@ -595,6 +608,7 @@ int fjes_hw_unregister_buff_addr(struct fjes_hw *hw, int dest_epid)
res_buf->unshare_buffer.length = 0;
res_buf->unshare_buffer.code = 0;
 
+   trace_fjes_hw_unregister_buff_addr_req(req_buf);
ret = fjes_hw_issue_request_command(hw, FJES_CMD_REQ_UNSHARE_BUFFER);
 
timeout = FJES_COMMAND_REQ_BUFF_TIMEOUT * 1000;
@@ -616,8 +630,11 @@ int fjes_hw_unregister_buff_addr(struct fjes_hw *hw, int dest_epid)
 
result = 0;
 
+

[PATCH 4/6] fjes: Implement debug mode for fjes driver

2016-10-11 Thread Taku Izumi

This patch implements a debug mode for the fjes driver.
You can get firmware activity information by enabling
debug mode. This is useful for debugging.

To enable debug mode, write the desired debug mode value to the
debug_mode file in debugfs:

  # echo 1 > /sys/kernel/debug/fjes/fjes.0/debug_mode

To disable debug mode, write 0 to the debug_mode file in debugfs:

  # echo 0 > /sys/kernel/debug/fjes/fjes.0/debug_mode

Firmware activity information can be retrieved via
/sys/kernel/debug/fjes/fjes.0/debug_data file.
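
The echo commands above can also be issued from a program. A minimal sketch follows, assuming only that the debug_mode file accepts a decimal number followed by a newline, as the echo examples suggest. write_uint_to_file() is a hypothetical helper, and the caller supplies the path.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Write value followed by a newline to the file at path.
 * Returns 0 on success and -1 on any failure.
 */
int write_uint_to_file(const char *path, unsigned int value)
{
	FILE *f;
	int ok;

	assert(path);
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fprintf(f, "%u\n", value) > 0;
	/* Buffered data is flushed on close, so check that result too. */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

For example, write_uint_to_file("/sys/kernel/debug/fjes/fjes.0/debug_mode", 1) would be the counterpart of the first echo command. Checking the fclose() result matters here because a rejected write may only surface when the buffered data is flushed.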

Signed-off-by: Taku Izumi 
---
 drivers/net/fjes/Makefile   |   2 +-
 drivers/net/fjes/fjes.h |  18 +
 drivers/net/fjes/fjes_debugfs.c | 169 
 drivers/net/fjes/fjes_hw.c  | 122 +
 drivers/net/fjes/fjes_hw.h  |  15 
 drivers/net/fjes/fjes_main.c|  12 ++-
 drivers/net/fjes/fjes_trace.h   |  69 
 7 files changed, 405 insertions(+), 2 deletions(-)
 create mode 100644 drivers/net/fjes/fjes_debugfs.c

diff --git a/drivers/net/fjes/Makefile b/drivers/net/fjes/Makefile
index 6705d1b..bc47b35 100644
--- a/drivers/net/fjes/Makefile
+++ b/drivers/net/fjes/Makefile
@@ -27,4 +27,4 @@
 
 obj-$(CONFIG_FUJITSU_ES) += fjes.o
 
-fjes-objs := fjes_main.o fjes_hw.o fjes_ethtool.o fjes_trace.o
+fjes-objs := fjes_main.o fjes_hw.o fjes_ethtool.o fjes_trace.o fjes_debugfs.o
diff --git a/drivers/net/fjes/fjes.h b/drivers/net/fjes/fjes.h
index a592fe2..3f4dc73 100644
--- a/drivers/net/fjes/fjes.h
+++ b/drivers/net/fjes/fjes.h
@@ -23,6 +23,7 @@
 #define FJES_H_
 
 #include 
+#include 
 
 #include "fjes_hw.h"
 
@@ -66,6 +67,11 @@ struct fjes_adapter {
bool interrupt_watch_enable;
 
struct fjes_hw hw;
+
+#ifdef CONFIG_DEBUG_FS
+   struct dentry *dbg_adapter;
+   struct debugfs_blob_wrapper blob;
+#endif /* CONFIG_DEBUG_FS */
 };
 
 extern char fjes_driver_name[];
@@ -74,4 +80,16 @@ extern const u32 fjes_support_mtu[];
 
 void fjes_set_ethtool_ops(struct net_device *);
 
+#ifdef CONFIG_DEBUG_FS
+void fjes_dbg_adapter_init(struct fjes_adapter *adapter);
+void fjes_dbg_adapter_exit(struct fjes_adapter *adapter);
+void fjes_dbg_init(void);
+void fjes_dbg_exit(void);
+#else
+static inline void fjes_dbg_adapter_init(struct fjes_adapter *adapter) {}
+static inline void fjes_dbg_adapter_exit(struct fjes_adapter *adapter) {}
+static inline void fjes_dbg_init(void) {}
+static inline void fjes_dbg_exit(void) {}
+#endif /* CONFIG_DEBUG_FS */
+
 #endif /* FJES_H_ */
diff --git a/drivers/net/fjes/fjes_debugfs.c b/drivers/net/fjes/fjes_debugfs.c
new file mode 100644
index 000..d868fe7
--- /dev/null
+++ b/drivers/net/fjes/fjes_debugfs.c
@@ -0,0 +1,169 @@
+/*
+ *  FUJITSU Extended Socket Network Device driver
+ *  Copyright (c) 2015-2016 FUJITSU LIMITED
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, see .
+ *
+ * The full GNU General Public License is included in this distribution in
+ * the file called "COPYING".
+ *
+ */
+
+/* debugfs support for fjes driver */
+
+#ifdef CONFIG_DEBUG_FS
+
+#include 
+#include 
+#include 
+
+#include "fjes.h"
+
+static struct dentry *fjes_debug_root;
+
+static ssize_t fjes_dbg_dbg_mode_read(struct file *file, char __user *ubuf,
+ size_t count, loff_t *ppos)
+{
+   struct fjes_adapter *adapter = file->private_data;
+   struct fjes_hw *hw = &adapter->hw;
+   char buf[64];
+   int size;
+
+   size = sprintf(buf, "%d\n", hw->debug_mode);
+
+   return simple_read_from_buffer(ubuf, count, ppos, buf, size);
+}
+
+static ssize_t fjes_dbg_dbg_mode_write(struct file *file,
+  const char __user *ubuf, size_t count,
+  loff_t *ppos)
+{
+   struct fjes_adapter *adapter = file->private_data;
+   struct fjes_hw *hw = &adapter->hw;
+   unsigned int value;
+   int ret;
+
+   ret = kstrtouint_from_user(ubuf, count, 10, &value);
+   if (ret)
+   return ret;
+
+   if (value) {
+   if (hw->debug_mode)
+   return -EPERM;
+
+   hw->debug_mode = value;
+
+   /* enable debug mode */
+   mutex_lock(&hw->hw_info.lock);
+   ret = fjes_hw_start_debug(hw);
+   mutex_unlock(&hw->hw_info.lock);
+
+   if (ret) {
+   hw->debug_mode = 0;
+   return ret;
+

[PATCH 2/6] fjes: Enhance ethtool -S for fjes driver

2016-10-11 Thread Taku Izumi

This patch enhances ethtool -S for fjes driver so that
EP related statistics can be retrieved.

The following statistics can be displayed via ethtool -S:

 ep%d_com_regist_buf_exec
 ep%d_com_unregist_buf_exec
 ep%d_send_intr_rx
 ep%d_send_intr_unshare
 ep%d_send_intr_zoneupdate
 ep%d_recv_intr_rx
 ep%d_recv_intr_unshare
 ep%d_recv_intr_stop
 ep%d_recv_intr_zoneupdate
 ep%d_tx_buffer_full
 ep%d_tx_dropped_not_shared
 ep%d_tx_dropped_ver_mismatch
 ep%d_tx_dropped_buf_size_mismatch
 ep%d_tx_dropped_vlanid_mismatch
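
How the statistic names above map onto fixed-width string slots can be sketched in userspace C. This is an illustration under stated assumptions, not the driver code: GSTRING_LEN mirrors the 32-byte ETH_GSTRING_LEN, and ep_stats_count() and ep_stats_names() are hypothetical helpers. The sketch uses snprintf() and counts truncated names, because with a one-digit endpoint id the longest name (ep%d_tx_dropped_buf_size_mismatch) is exactly 32 characters and so leaves no room for a terminator in a 32-byte slot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define GSTRING_LEN  32   /* same value as ETH_GSTRING_LEN */
#define EP_STATS_LEN 14   /* same count as FJES_EP_STATS_LEN */

static const char * const ep_stat_suffix[EP_STATS_LEN] = {
	"com_regist_buf_exec",    "com_unregist_buf_exec",
	"send_intr_rx",           "send_intr_unshare",
	"send_intr_zoneupdate",   "recv_intr_rx",
	"recv_intr_unshare",      "recv_intr_stop",
	"recv_intr_zoneupdate",   "tx_buffer_full",
	"tx_dropped_not_shared",  "tx_dropped_ver_mismatch",
	"tx_dropped_buf_size_mismatch",
	"tx_dropped_vlanid_mismatch",
};

/* Every endpoint except our own contributes EP_STATS_LEN entries. */
size_t ep_stats_count(int max_epid)
{
	assert(max_epid >= 1);
	return (size_t)(max_epid - 1) * EP_STATS_LEN;
}

/*
 * Fill out with one GSTRING_LEN-byte slot per name. Returns the number
 * of names written; *truncated receives how many did not fit.
 */
size_t ep_stats_names(char *out, int max_epid, int my_epid,
		      size_t *truncated)
{
	size_t n = 0;

	assert(out && truncated && max_epid >= 1);
	*truncated = 0;
	for (int ep = 0; ep < max_epid; ep++) {
		if (ep == my_epid)
			continue;
		for (int s = 0; s < EP_STATS_LEN; s++, n++) {
			char *slot = out + n * GSTRING_LEN;

			memset(slot, 0, GSTRING_LEN);
			if (snprintf(slot, GSTRING_LEN, "ep%d_%s", ep,
				     ep_stat_suffix[s]) >= GSTRING_LEN)
				(*truncated)++;
		}
	}
	return n;
}
```

The count returned by ep_stats_names() has to equal ep_stats_count() for the same max_epid, since ethtool pairs names and values by position.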

Signed-off-by: Taku Izumi 
---
 drivers/net/fjes/fjes_ethtool.c | 70 -
 drivers/net/fjes/fjes_hw.c  |  9 ++
 drivers/net/fjes/fjes_hw.h  | 19 +++
 drivers/net/fjes/fjes_main.c| 44 +++---
 4 files changed, 137 insertions(+), 5 deletions(-)

diff --git a/drivers/net/fjes/fjes_ethtool.c b/drivers/net/fjes/fjes_ethtool.c
index 8397634..68ef287 100644
--- a/drivers/net/fjes/fjes_ethtool.c
+++ b/drivers/net/fjes/fjes_ethtool.c
@@ -49,10 +49,18 @@ static const struct fjes_stats fjes_gstrings_stats[] = {
FJES_STAT("tx_dropped", stats64.tx_dropped),
 };
 
+#define FJES_EP_STATS_LEN 14
+#define FJES_STATS_LEN \
+   (ARRAY_SIZE(fjes_gstrings_stats) + \
+((&((struct fjes_adapter *)netdev_priv(netdev))->hw)->max_epid - 1) * \
+FJES_EP_STATS_LEN)
+
 static void fjes_get_ethtool_stats(struct net_device *netdev,
   struct ethtool_stats *stats, u64 *data)
 {
struct fjes_adapter *adapter = netdev_priv(netdev);
+   struct fjes_hw *hw = &adapter->hw;
+   int epidx;
char *p;
int i;
 
@@ -61,11 +69,39 @@ static void fjes_get_ethtool_stats(struct net_device *netdev,
data[i] = (fjes_gstrings_stats[i].sizeof_stat == sizeof(u64))
? *(u64 *)p : *(u32 *)p;
}
+   for (epidx = 0; epidx < hw->max_epid; epidx++) {
+   if (epidx == hw->my_epid)
+   continue;
+   data[i++] = hw->ep_shm_info[epidx].ep_stats
+   .com_regist_buf_exec;
+   data[i++] = hw->ep_shm_info[epidx].ep_stats
+   .com_unregist_buf_exec;
+   data[i++] = hw->ep_shm_info[epidx].ep_stats.send_intr_rx;
+   data[i++] = hw->ep_shm_info[epidx].ep_stats.send_intr_unshare;
+   data[i++] = hw->ep_shm_info[epidx].ep_stats
+   .send_intr_zoneupdate;
+   data[i++] = hw->ep_shm_info[epidx].ep_stats.recv_intr_rx;
+   data[i++] = hw->ep_shm_info[epidx].ep_stats.recv_intr_unshare;
+   data[i++] = hw->ep_shm_info[epidx].ep_stats.recv_intr_stop;
+   data[i++] = hw->ep_shm_info[epidx].ep_stats
+   .recv_intr_zoneupdate;
+   data[i++] = hw->ep_shm_info[epidx].ep_stats.tx_buffer_full;
+   data[i++] = hw->ep_shm_info[epidx].ep_stats
+   .tx_dropped_not_shared;
+   data[i++] = hw->ep_shm_info[epidx].ep_stats
+   .tx_dropped_ver_mismatch;
+   data[i++] = hw->ep_shm_info[epidx].ep_stats
+   .tx_dropped_buf_size_mismatch;
+   data[i++] = hw->ep_shm_info[epidx].ep_stats
+   .tx_dropped_vlanid_mismatch;
+   }
 }
 
 static void fjes_get_strings(struct net_device *netdev,
 u32 stringset, u8 *data)
 {
+   struct fjes_adapter *adapter = netdev_priv(netdev);
+   struct fjes_hw *hw = &adapter->hw;
u8 *p = data;
int i;
 
@@ -76,6 +112,38 @@ static void fjes_get_strings(struct net_device *netdev,
   ETH_GSTRING_LEN);
p += ETH_GSTRING_LEN;
}
+   for (i = 0; i < hw->max_epid; i++) {
+   if (i == hw->my_epid)
+   continue;
+   sprintf(p, "ep%u_com_regist_buf_exec", i);
+   p += ETH_GSTRING_LEN;
+   sprintf(p, "ep%u_com_unregist_buf_exec", i);
+   p += ETH_GSTRING_LEN;
+   sprintf(p, "ep%u_send_intr_rx", i);
+   p += ETH_GSTRING_LEN;
+   sprintf(p, "ep%u_send_intr_unshare", i);
+   p += ETH_GSTRING_LEN;
+   sprintf(p, "ep%u_send_intr_zoneupdate", i);
+   p += ETH_GSTRING_LEN;
+   sprintf(p, "ep%u_recv_intr_rx", i);
+   p += ETH_GSTRING_LEN;
+   sprintf(p, "ep%u_recv_intr_unshare", i);
+   p += ETH_GSTRING_LEN;
+   sprintf(p, "ep%u_recv_intr_stop", i);
+   p += ETH_GSTRING_LEN;
+

[PATCH 5/6] fjes: Add debugfs entry for EP status information in fjes driver

2016-10-11 Thread Taku Izumi

This patch adds a debugfs entry to show EP status information.
You can get each EP's status information as follows:

  # cat /sys/kernel/debug/fjes/fjes.0/status

EPID    STATUS           SAME_ZONE        CONNECTED
ep0     shared           Y                Y
ep1     -                -                -
ep2     unshared         N                N
ep3     unshared         N                N
ep4     unshared         N                N
ep5     unshared         N                N
ep6     unshared         N                N
ep7     unshared         N                N
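
The column layout of this table comes from fixed-width printf conversions. A minimal userspace sketch of one row follows, reusing the row format string from the patch ("ep%d\t%-16s %-16c %-16c\n"). format_status_row() is a hypothetical helper, and snprintf() stands in for seq_printf().

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Format one status row: endpoint id, a tab, then three columns that
 * are left-justified and padded to 16 characters each.
 * Returns the number of characters that make up the row.
 */
int format_status_row(char *out, size_t len, int epid,
		      const char *status, char same_zone, char connected)
{
	assert(out && status);
	return snprintf(out, len, "ep%d\t%-16s %-16c %-16c\n",
			epid, status, same_zone, connected);
}
```

For a one-digit endpoint id each row is 55 characters long: 4 for "ep0" plus the tab, three 16-character columns, two separating spaces and the newline. The row for the local endpoint can be produced with "-" as the status and '-' for both flags.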

Signed-off-by: Taku Izumi 
---
 drivers/net/fjes/fjes_debugfs.c | 51 +
 1 file changed, 51 insertions(+)

diff --git a/drivers/net/fjes/fjes_debugfs.c b/drivers/net/fjes/fjes_debugfs.c
index d868fe7..19528a5 100644
--- a/drivers/net/fjes/fjes_debugfs.c
+++ b/drivers/net/fjes/fjes_debugfs.c
@@ -97,6 +97,51 @@ static const struct file_operations fjes_dbg_dbg_mode_fops = {
.write  = fjes_dbg_dbg_mode_write,
 };
 
+static const char * const ep_status_string[] = {
+   "unshared",
+   "shared",
+   "waiting",
+   "complete",
+};
+
+static int fjes_dbg_status_show(struct seq_file *m, void *v)
+{
+   struct fjes_adapter *adapter = m->private;
+   struct fjes_hw *hw = &adapter->hw;
+   int max_epid = hw->max_epid;
+   int my_epid = hw->my_epid;
+   int epidx;
+
+   seq_puts(m, "EPID\tSTATUS           SAME_ZONE        CONNECTED\n");
+   for (epidx = 0; epidx < max_epid; epidx++) {
+   if (epidx == my_epid) {
+   seq_printf(m, "ep%d\t%-16c %-16c %-16c\n",
+  epidx, '-', '-', '-');
+   } else {
+   seq_printf(m, "ep%d\t%-16s %-16c %-16c\n",
+  epidx,
+  ep_status_string[fjes_hw_get_partner_ep_status(hw, epidx)],
+  fjes_hw_epid_is_same_zone(hw, epidx) ? 'Y' : 'N',
+  fjes_hw_epid_is_shared(hw->hw_info.share, epidx) ? 'Y' : 'N');
+   }
+   }
+
+   return 0;
+}
+
+static int fjes_dbg_status_open(struct inode *inode, struct file *file)
+{
+   return single_open(file, fjes_dbg_status_show, inode->i_private);
+}
+
+static const struct file_operations fjes_dbg_status_fops = {
+   .owner  = THIS_MODULE,
+   .open   = fjes_dbg_status_open,
+   .read   = seq_read,
+   .llseek = seq_lseek,
+   .release= single_release,
+};
+
 void fjes_dbg_adapter_init(struct fjes_adapter *adapter)
 {
const char *name = dev_name(&adapter->plat_dev->dev);
@@ -132,6 +177,12 @@ void fjes_dbg_adapter_init(struct fjes_adapter *adapter)
hw->hw_info.trace = NULL;
hw->hw_info.trace_size = 0;
}
+
+   pfile = debugfs_create_file("status", 0444, adapter->dbg_adapter,
+   adapter, &fjes_dbg_status_fops);
+   if (!pfile)
+   dev_err(&adapter->plat_dev->dev,
+   "debugfs status for %s failed\n", name);
 }
 
 void fjes_dbg_adapter_exit(struct fjes_adapter *adapter)
-- 
2.6.6
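
For readers who want to see what the row layout from fjes_dbg_status_show() produces, here is a minimal userspace sketch of the same format strings. It assumes snprintf() in place of seq_printf(); format_ep_row() and its parameters are hypothetical names, and the bounds check on the status index is an addition that the patch itself does not contain.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Same status names as ep_status_string[] in the patch. */
static const char * const ep_status_name[] = {
	"unshared", "shared", "waiting", "complete",
};

/* Format one status row into buf and return snprintf()'s result.
 * is_self selects the "- - -" row that the patch prints for my_epid. */
static int format_ep_row(char *buf, size_t len, int epidx, int is_self,
			 unsigned int status, int same_zone, int connected)
{
	if (is_self)
		return snprintf(buf, len, "ep%d\t%-16c %-16c %-16c\n",
				epidx, '-', '-', '-');

	/* Assumption: guard the table lookup; not present in the patch. */
	assert(status < sizeof(ep_status_name) / sizeof(ep_status_name[0]));

	return snprintf(buf, len, "ep%d\t%-16s %-16c %-16c\n",
			epidx, ep_status_name[status],
			same_zone ? 'Y' : 'N', connected ? 'Y' : 'N');
}
```

Each column is left-justified in a 16-character field, which is why the header needs matching padding to line up with the rows.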

[PATCH 6/6] fjes: Update fjes driver version : 1.2

2016-10-11 Thread Taku Izumi

From: Taku 

Signed-off-by: Taku Izumi 
---
 drivers/net/fjes/fjes_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/fjes/fjes_main.c b/drivers/net/fjes/fjes_main.c
index 359e7a5..f36eb4a 100644
--- a/drivers/net/fjes/fjes_main.c
+++ b/drivers/net/fjes/fjes_main.c
@@ -30,7 +30,7 @@
 #include "fjes_trace.h"
 
 #define MAJ 1
-#define MIN 1
+#define MIN 2
 #define DRV_VERSION __stringify(MAJ) "." __stringify(MIN)
 #define DRV_NAME   "fjes"
 char fjes_driver_name[] = DRV_NAME;
-- 
2.6.6

Re: slab corruption with current -git

2016-10-11 Thread David Miller

From: Linus Torvalds 
Date: Mon, 10 Oct 2016 22:47:50 -0700

> On Mon, Oct 10, 2016 at 10:39 PM, Linus Torvalds
>  wrote:
>>
>> I guess I will have to double-check that the slub corruption is gone
>> still with that fixed.
> 
> So I'm not getting any warnings now from SLUB debugging. So the
> original bug seems to not have re-surfaced, and the registration bug
> is gone, so now the unregistration doesn't warn about anything either.
> 
> But I only rebooted three times.

Looks good to me, I applied it to my tree with your signoff and will
send you a pull request right now.

Thanks!

Re: [PATCH net v2] L2TP:Adjust intf MTU,factor underlay L3,overlay L2

2016-10-11 Thread James Chapman

On 11/10/16 02:54, R Parameswaran wrote:
>
>
> Hi James,
>
> Please see inline:
>
> On Tue, Oct 4, 2016 at 12:53 AM, James Chapman wrote:
>
> On 04/10/16 04:12, R. Parameswaran wrote:
> >
> > Hi James,
> >
> > Please see inline, thanks for the reply:
> >
> > On Sat, 1 Oct 2016, James Chapman wrote:
> >
> >> On 30/09/16 03:39, R. Parameswaran wrote:
> > + /* Adjust MTU, factor overhead - underlay L3 hdr, overlay
> L2 hdr*/
> > + if (tunnel->sock->sk_family == AF_INET)
> > + overhead += (ETH_HLEN + sizeof(struct iphdr));
> > + else if (tunnel->sock->sk_family == AF_INET6)
> > + overhead += (ETH_HLEN + sizeof(struct ipv6hdr));
>  What about options in the IP header? If certain options are
> set on the
>  socket, the IP header may be larger.
> 
> >>> Thanks for the reply - It looks like IP options can only be
> >>> enabled through setsockopt on an application's socket (if
> there's any
> >>> other way to turn on IP options, please let me know - didn't
> see any
> >>> sysctl setting for transmit). This scenario would come
> >>> into picture when an application opens a raw IP or UDP socket
> such that it
> >>> routes into the L2TP logical interface.
> >> No. An L2TP daemon (userspace) will open a socket for each
> tunnel that
> >> it creates. Control and data packets use the same socket, which
> is the
> >> socket used by this code. It may set any options on its
> sockets. L2TP
> >> tunnel sockets can be created either by an L2TP daemon (managed
> tunnels)
> >> or by ip l2tp commands (unmanaged tunnels).
> >>
> > One Q I have is whether it would be sufficient to solve this for the
> > common case (i.e no IP options) and have an expectation that the
> > administrator will explicitly provision the mtu using the 'ip
> link ...
> > mtu'  command when dealing with infrequent occurrences like IP
> options?
> >
> > But looking at the code, it looks to be possible to pick up whether
> > options are enabled and how long the options are, from the
> ip_options struct
> > embedded in the tunnel socket. If you want me to, I can repost
> the patch
> > with this change (will need a few days) - please let me know if
> this is
> > what you had in mind.
> >
> >
> Yes, that's what I had in mind. But my preference would be that this
> would be a new function in the ip core, for use by any encap protocol,
> where appropriate.
>
> Discussed this with Nachi (nprachan), we were thinking of a new
> function in ip_sockglue.c which would take the tunnel socket as
> parameter, derive the underlay device MTU and compute the underlay L3
> overhead (IPv4/IPv6 header, UDP header if it is a UDP socket, and IP
> option length if the ip_options struct exists in the socket). The
> function would be agnostic to the tunnel type (although we could
> provision tunnel-type and encap type as parameters). Callers would
> call it to figure out the cumulative underlay L3 overhead and the
> underlay MTU, and then use these numbers in the MTU calculation for
> their specific tunnel type. Let me know if that is different from what
> you had in mind, and/or if you have any suggestions on which file to
> place this in. I'll try and have this re-posted  by the end of this
> week or by early next week.
>

I think keep it simple. A function to return the size of the IP header
associated with any IP socket, not necessarily a tunnel socket. Don't
mix in any MTU derivation logic or UDP header size etc.

Post code early as an RFC. You're more likely to get review feedback
from others.
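
A minimal userspace sketch of the kind of helper suggested above, assuming the only inputs are the address family and the length of any IP options set on the socket. struct sock_model and ip_hdr_size() are hypothetical names for illustration, not existing kernel API; a real version would read these values from the socket itself.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>   /* AF_INET, AF_INET6 */

/* Fixed header sizes; these match sizeof(struct iphdr) and
 * sizeof(struct ipv6hdr) in the kernel. */
#define IPV4_HDR_LEN 20u
#define IPV6_HDR_LEN 40u

/* Hypothetical stand-in for the socket state the helper would read. */
struct sock_model {
	int family;          /* AF_INET or AF_INET6 */
	unsigned int optlen; /* bytes of IP options set on the socket */
};

/* Return the IP header size for a socket, or 0 for an unknown family.
 * No MTU derivation and no UDP header: callers add those themselves. */
static unsigned int ip_hdr_size(const struct sock_model *sk)
{
	assert(sk != NULL);

	switch (sk->family) {
	case AF_INET:
		return IPV4_HDR_LEN + sk->optlen;
	case AF_INET6:
		return IPV6_HDR_LEN + sk->optlen;
	default:
		return 0;
	}
}
```

An encapsulation such as L2TP could then add its own overhead (for example ETH_HLEN and any UDP header) on top of the returned value.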

Re: BUILD_BUG_ON error in mlx5/core/pagealloc.c

2016-10-11 Thread Saeed Mahameed

On Tue, Oct 11, 2016 at 12:58 PM, Tom Herbert  wrote:
> On Mon, Oct 10, 2016 at 8:17 PM, Saeed Mahameed
>  wrote:
>>
>>
>> On Tuesday, October 11, 2016, Tom Herbert  wrote:
>>>
>>> On Mon, Oct 10, 2016 at 4:41 PM, Tom Herbert  wrote:
>>> > I am hitting this in mlx5
>>> >
>>> > drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c: In function
>>> > ‘reclaim_pages_cmd.clone.0’:
>>> > drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c:346: error: call
>>> > to ‘__compiletime_assert_346’ declared with attribute error:
>>> > BUILD_BUG_ON failed: __mlx5_bit_off(manage_pages_out, pas[i]) % 64
>>> > drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c: In function
>>> > ‘give_pages’:
>>> > drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c:291: error: call
>>> > to ‘__compiletime_assert_291’ declared with attribute error:
>>> > BUILD_BUG_ON failed: __mlx5_bit_off(manage_pages_in, pas[i]) % 64
>>> >
>>>
>>> The expression in BUILD_BUG_ON expands to "((unsigned)(unsigned
>>> long)(&(((struct mlx5_ifc_manage_pages_in_bits *)0)->pas[i]))) % 64".
>>> The variable array index to pas makes this expression non-constant, I
>>> imagine that is where the problem lies.
>>>
>>> > Bisecting points to:
>>> >
>>> > commit a533ed5e179cd15512d40282617909d3482a771c
>>> > Author: Saeed Mahameed 
>>> > Date:   Sun Jul 17 13:27:25 2016 +0300
>>> >
>>> > net/mlx5: Pages management commands via mlx5 ifc
>>
>>
>> Hi Tom,
>> We know about this issue, it happens only with gcc4.4  or gcc 4.2 i don't
>> remember exactly which.
>> Most likely it is a gcc issue.
>>
> I don't believe that is the case. As I pointed out this is giving
> BUILD_BUG_ON a non-constant expression. To fix this should just be a
> matter of using a constant argument for the BUILD_BUG_ON. AFAICT

Well, compilation works for us on most of our systems (where we have
newer gcc), so something just doesn't add up here!
Anyway, we didn't want to complicate the macro API and we wanted to
investigate a little bit more.

> there's only two instances of MLX5_SET64 where this can be a problem.
> I will post a fix shortly.

Thanks Tom, just saw it. I will ack it.
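
To illustrate the point about constant arguments, here is a small userspace sketch, not the fix that was posted. It assumes a hypothetical struct cmd_model standing in for the mlx5_ifc layout and uses C11 static_assert in place of BUILD_BUG_ON. The alignment of the array start is checked once at compile time, and the variable index is only used at runtime.

```c
#include <assert.h>   /* assert, static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

/* Hypothetical layout, loosely modelled on a command with a trailing
 * array of 64-bit page addresses. Not the real mlx5_ifc structure. */
struct cmd_model {
	uint32_t opcode;
	uint32_t num_entries;
	uint64_t pas[4];
};

/* offsetof() of the array itself involves no variable index, so this
 * is an integer constant expression. Every element is 8 bytes wide, so
 * an aligned first element implies that all elements are aligned. */
static_assert(offsetof(struct cmd_model, pas) % 8 == 0,
	      "pas[] must start on a 64-bit boundary");
static_assert(sizeof(uint64_t) == 8, "elements must be 64 bits wide");

/* Runtime store with a variable index; no compile-time check on i. */
static void set_pas(struct cmd_model *cmd, unsigned int i, uint64_t addr)
{
	assert(i < 4);
	cmd->pas[i] = addr;
}
```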

Re: RFH: problems with adjacency graph

2016-10-11 Thread Jiri Pirko

Tue, Oct 11, 2016 at 04:18:52AM CEST, dsah...@gmail.com wrote:
>Jiri / Veaceslav:
>
>As authors of the adjacency tracking code in dev.c I am hoping you can help 
>with suggested patches for a couple of problems. The start point needs to 
>include commit 93409033ae65 which resolved a different problem from what I am 
>seeing now.
>
>At the moment I have 2 cases both for this topology:
>+--------+
>|  myvrf |
>+--------+
>  |    |
>  |  +---------+
>  |  | macvlan |
>  |  +---------+
>  |    |
>  +----------+
>  |  bridge  |
>  +----------+
>      |
>  +--------+
>  | bond0  |
>  +--------+
>      |
>  +--------+
>  |  eth3  |
>  +--------+
>
>
>Base set of commands for both cases:
>
>ip link add bond1 type bond
>ip link set bond1 up
>ip link set eth3 down
>ip link set eth3 master bond1
>ip link set eth3 up
>
>ip link add bridge type bridge
>ip link set bridge up
>ip link add macvlan link bridge type macvlan
>ip link set macvlan up
>
>ip link add myvrf type vrf table 1234
>ip link set myvrf up
>
>ip link set bridge master myvrf
>
>
>
># case 1
>
>ip link set macvlan master myvrf
>ip link set bond1 master bridge
>
>ip link delete myvrf
>
>dmesg has a splat triggered in __netdev_adjacent_dev_remove() where you 
>currently see the BUG(). If you convert that to a WARN_ON (which it should be, 
>no need to panic on the remove path) it will show you 4 missing adjacencies: 
>eth3 - myvrf, myvrf - eth3, bond1 - myvrf and myvrf - bond1. All of those are 
>because the dev_link function does not link macvlan lower devices to myvrf 
>when it is enslaved. (Enable the debugging to see that those messages are 
>missing.)
>
>
>
>
># case 2
>
>This case just flips the ordering of the enslavements:
>
>ip link set bond1 master bridge
>ip link set macvlan master myvrf
>
>Then run:
>ip link delete bond1
>ip link delete myvrf
>
>The last command hangs because myvrf has a reference that has not been 
>released. If you do not have commit 93409033ae65 the delete of bond1 hangs for 
>the same reason. For this case, the debug messages show that the macvlan lower 
>devices (eth3 and bond1) are connected to myvrf on the enslavement, but the 
>link delete path only removes one of them, hence the unreleased refcnt on 
>myvrf.
>
>
>In the end it seems that the code for the dependency graph is not making the 
>complete mesh which causes problems on the tear down. I have attempted a few 
>changes that so far fix 1 problem and uncover a different one. Hence the 
>request for help from the authors of this code.

Agreed. We need to fix the code to work with duplicates so the graph is
complete.


>
>It seems like the complete mesh is not really needed, but cscope shows 
>spectrum, ixgbe and bonding all using the for_each upper and lower device 
>macros.
>
>Suggestions?

Well, the other possibility is to traverse the tree recursively. But the
collapsed lists of all uppers/lowers were introduced exactly to
avoid this.
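
For illustration only, a userspace sketch of the recursive alternative, assuming each device stores just its direct lower devices. struct dev_model and collect_lowers() are hypothetical and are not the netdev adjacency code. The walk records each lower device once even when it is reachable through two paths, as eth3 and bond1 are below myvrf in the topology above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_DEVS  8
#define MAX_LOWER 4

/* Hypothetical device model: each device lists only its direct lowers. */
struct dev_model {
	const char *name;
	struct dev_model *lower[MAX_LOWER];
	size_t n_lower;
};

/* Depth-first walk appending every reachable lower device to out[].
 * n is the number of entries already in out[]; the new count is
 * returned. A device reached via a second path is skipped. */
static size_t collect_lowers(const struct dev_model *dev,
			     const struct dev_model **out, size_t n)
{
	for (size_t i = 0; i < dev->n_lower; i++) {
		const struct dev_model *l = dev->lower[i];
		bool seen = false;

		for (size_t j = 0; j < n; j++)
			if (out[j] == l)
				seen = true;
		if (seen)
			continue;

		assert(n < MAX_DEVS);
		out[n++] = l;
		n = collect_lowers(l, out, n);
	}
	return n;
}
```

The cost is a walk on every query, which is the trade-off the precomputed all-uppers/all-lowers lists avoid.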


>
>David
