Re: [PATCH 1/1] bonding: restrict up state in 802.3ad mode

2016-01-06 Thread zhuyj

On 01/07/2016 10:43 AM, Tantilov, Emil S wrote:

-Original Message-
From: zhuyj [mailto:zyjzyj2...@gmail.com]
Sent: Tuesday, January 05, 2016 7:05 PM
To: Tantilov, Emil S; Michal Kubecek; Jay Vosburgh
Cc: vfal...@gmail.com; go...@cumulusnetworks.com; netdev@vger.kernel.org;
Shteinbock, Boris (Wind River)
Subject: Re: [PATCH 1/1] bonding: restrict up state in 802.3ad mode

On 01/06/2016 09:26 AM, Tantilov, Emil S wrote:

-Original Message-
From: netdev-ow...@vger.kernel.org [mailto:netdev-ow...@vger.kernel.org]

On

Behalf Of zhuyj
Sent: Monday, December 28, 2015 1:19 AM
To: Michal Kubecek; Jay Vosburgh
Cc: vfal...@gmail.com; go...@cumulusnetworks.com;

netdev@vger.kernel.org;

Shteinbock, Boris (Wind River)
Subject: Re: [PATCH 1/1] bonding: restrict up state in 802.3ad mode

On 12/28/2015 04:43 PM, Michal Kubecek wrote:

On Thu, Dec 17, 2015 at 01:57:16PM -0800, Jay Vosburgh wrote:

 wrote:

In 802.3ad mode, the speed and duplex is needed. But in some NIC,
there is a time span between NIC up state and getting speed and

duplex.

As such, sometimes a slave in 802.3ad mode is in up state without
speed and duplex. This will make bonding in 802.3ad mode can not
work well.
To make bonding driver be compatible with more NICs, it is
necessary to restrict the up state in 802.3ad mode.

What device is this?  It seems a bit odd that an Ethernet device
can be carrier up but not have the duplex and speed available.

...

In general, though, bonding expects a speed or duplex change to
be announced via a NETDEV_UPDATE or NETDEV_UP notifier, which would
propagate to the 802.3ad logic.

If the device here is going carrier up prior to having speed or
duplex available, then maybe it should call netdev_state_change() when
the duplex and speed are available, or delay calling

netif_carrier_on().

I have encountered this problem (NIC having carrier on before being

able

to detect speed/duplex and driver not notifying when speed/duplex
becomes available) with netxen cards earlier. But it was eventually
fixed in the driver by commit 9d01412ae76f ("netxen: Fix link event
handling.") so this example rather supports what you said.

 Michal

Kubecek

Thanks a lot.
I checked the commit 9d01412ae76f ("netxen: Fix link event
handling."). The symptoms are the same with mine.

The root cause is different. In my problem, the root cause is that LINKS
register[]  can not provide link_up and link_speed at the same time.
There is a time span between link_up and link_speed.

The LINK_UP and LINK_SPEED bits in the LINKS register for ixgbe HW are

updated

simultaneously. Do you have any proof to show the delay you are referring

to

as I am sure our HW engineers would like to know about it.

Sorry. I can not reproduce this problem locally. What I have is the
feedback from the customer.

So you are assuming that there is a delay due to the issue you are seeing?


Settings for eth0:
Supported ports: [ TP ]
Supported link modes:   100baseT/Full
1000baseT/Full
1baseT/Full
Supported pause frame use: No
Supports auto-negotiation: Yes
Advertised link modes:  100baseT/Full
1000baseT/Full
1baseT/Full
Advertised pause frame use: No
Advertised auto-negotiation: Yes
Speed: Unknown!
Duplex: Unknown! (255)
Port: Twisted Pair
PHYAD: 0
Transceiver: external
Auto-negotiation: on
MDI-X: Unknown
Supports Wake-on: d
Wake-on: d
Current message level: 0x0007 (7)
   drv probe link
Link detected: yes

The speed and the link state here are reported from
different sources:


Link detected: yes

Comes from a netif_carrier_ok() check. This is done via ethtool_op_get_link().

Only the speed is reported through the LINKS register - that is why it is 
reported
as "Unknown" - in other words link_up is false.

This is a trace from the case where the bonding driver reports 0 Mbps:

kworker/u48:1-27950 [010]   6493.084916: ixgbe_service_task: eth1: 
link_speed = 80, link_up = false
kworker/u48:1-27950 [011]   6493.184894: ixgbe_service_task: eth1: 
link_speed = 80, link_up = false
kworker/u48:1-27950 [000]   6494.439883: ixgbe_service_task: eth1: 
link_speed = 80, link_up = true
kworker/u48:1-27950 [000]   6494.464204: ixgbe_service_task: eth1: NIC 
Link is Up 10 Gbps, Flow Control: RX/TX
  kworker/0:2-1926  [000]   6494.464249: ixgbe_get_settings: eth1: 
link_speed = 80, link_up = false
   NetworkManager-3819  [008]   6494.464484: ixgbe_get_settings: eth1: 
link_speed = 80, link_up = false
kworker/u48:1-27950 [007]   6494.496886: bond_mii_monitor: bond0: link 
status definitely up for interface eth1, 0 Mbps full duplex
   NetworkManager-3819  [008]   6494.496967: ixgbe_get_settings: eth1: 
link_speed = 80, link

Re: header conflict introduced by change to netfilter_ipv4/ip_tables.h

2016-01-06 Thread Mikko Rapeli
On Wed, Jan 06, 2016 at 09:20:07AM -0800, Stephen Hemminger wrote:
> This commit breaks compilation of iproute2 with net-next.

Ok, linux/if.h and libc net/if.h have overlapping defines, and this is not
the only one. I saw lots of them in the core dump headers.

How should we handle them? Another ifndef for IFNAMSIZ into kernel uapi
headers?

-Mikko

> commit 1ffad83dffd675cd742286ae82dca7d746cb0da8
> Author: Mikko Rapeli 
> Date:   Thu Oct 15 07:56:30 2015 +0200
> 
> netfilter: fix include files for compilation
> 
> Add missing header dependencies and other small changes so that each file
> compiles alone in userspace.
> 
> Signed-off-by: Mikko Rapeli 
> Signed-off-by: Pablo Neira Ayuso 
> 
> For iproute2, a copy of kernel headers (from make install_headers) is used.
> After this change. the build of x_tables.c fails because IFNAMSIZ is already
> defined in net/if.h
> 
> gcc -Wall -Wstrict-prototypes  -Wmissing-prototypes -Wmissing-declarations 
> -Wold-style-definition -Wformat=2 -O2 -I../include -DRESOLVE_HOSTNAMES 
> -DLIBDIR=\"/usr/lib\" -DCONFDIR=\"/etc/iproute2\" -D_GNU_SOURCE 
> -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE  
> -DHAVE_SETNS -DHAVE_ELF -DCONFIG_GACT -DCONFIG_GACT_PROB 
> -DIPT_LIB_DIR=\"/lib/xtables\" -DYY_NO_INPUT -Wl,-export-dynamic -shared 
> -fpic -o q_atm.so q_atm.c -latm
> gcc -Wall -Wstrict-prototypes  -Wmissing-prototypes -Wmissing-declarations 
> -Wold-style-definition -Wformat=2 -O2 -I../include -DRESOLVE_HOSTNAMES 
> -DLIBDIR=\"/usr/lib\" -DCONFDIR=\"/etc/iproute2\" -D_GNU_SOURCE 
> -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE  
> -DHAVE_SETNS -DHAVE_ELF -DCONFIG_GACT -DCONFIG_GACT_PROB 
> -DIPT_LIB_DIR=\"/lib/xtables\" -DYY_NO_INPUT -Wl,-export-dynamic -shared 
> -fpic -o m_xt.so m_xt.c $(pkg-config xtables --cflags --libs)
> In file included from ../include/linux/netfilter_ipv4/ip_tables.h:20:0,
>  from m_xt.c:20:
> ../include/linux/if.h:26:0: warning: "IFNAMSIZ" redefined
>  #define IFNAMSIZ 16
>  ^
> In file included from m_xt.c:17:0:
> /usr/include/net/if.h:129:0: note: this is the location of the previous 
> definition
>  # define IFNAMSIZ IF_NAMESIZE
>  ^
> ../include/linux/if.h:71:2: error: redeclaration of enumerator ‘IFF_UP’
>   IFF_UP= 1<<0,  /* sysfs */
>   ^
> /usr/include/net/if.h:44:5: note: previous definition of ‘IFF_UP’ was here
>  IFF_UP = 0x1,  /* Interface is up.  */
>  ^
> ../include/linux/if.h:72:2: error: redeclaration of enumerator ‘IFF_BROADCAST’
>   IFF_BROADCAST   = 1<<1,  /* __volatile__ */
>   ^
> /usr/include/net/if.h:46:5: note: previous definition of ‘IFF_BROADCAST’ was 
> here
>  IFF_BROADCAST = 0x2, /* Broadcast address valid.  */
>  ^
> ../include/linux/if.h:73:2: error: redeclaration of enumerator ‘IFF_DEBUG’
>   IFF_DEBUG   = 1<<2,  /* sysfs */
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: WARN trace - skb_warn_bad_offload - vxlan - large udp packet - udp checksum disabled

2016-01-06 Thread Michal Kubecek
On Thu, Jan 07, 2016 at 03:36:27AM +0100, Hannes Frederic Sowa wrote:
> On 15.12.2015 02:35, Nelson, Shannon wrote:
> >Using a slightly modified version of udpspam (see diff below -
> >hopefully not mangled by corporate email servers), where I set the
> >SO_NO_CHECK socket option and can specify a large buffer size, I can
> >reliably get the following WARN trace.  I have reproduced this on
> >both ixgbe and i40e drivers using "udpspam-no-check 
> >6000".
> >
> >It looks to me like this is in the Tx path before we get to the
> >actual NIC drivers, but I may be wrong.
> 
> Does UFO offloading on the tunnel interface fix this error?
> 
> ethtool -K vxlan ufo off
> 
> If yes we can't feed SO_NO_CHECK udp packets into UFO as gso/ufo
> requires CHECKSUM_PARTIAL (or add some more logic into the skb or
> query socket status from skb_gso_segment).

I believe you are right. This sounds like the issue addressed by commit
acf8dd0a9d0b ("udp: only allow UFO for packets from SOCK_DGRAM sockets")
except for a regular SOCK_DGRAM socket with SO_NO_CHECK option. If so,
changing both instances of the check

(sk->sk_type == SOCK_DGRAM)

added by this commit to

(sk->sk_type == SOCK_DGRAM) && !sk->sk_no_check_tx

should fix the problem (and it should be done even if the issue reported
here is caused by something else).

Michal Kubecek

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/1] bonding: restrict up state in 802.3ad mode

2016-01-06 Thread zhuyj

Hi, Emil

Would you like to help me to make tests with this patch?
If the root cause is not the time span, I will make a new patch for this.

Thanks a lot.
Zhu Yanjun

On 01/07/2016 02:15 PM, zyjzyj2...@gmail.com wrote:

From: Zhu Yanjun 

In 802.3ad mode, the speed and duplex is needed. But in some NIC,
there is a time span between NIC up state and getting speed and duplex.
As such, sometimes a slave in 802.3ad mode is in up state without
speed and duplex. This will make bonding in 802.3ad mode can not
work well.
To make bonding driver be compatible with more NICs, it is
necessary to restrict the up state in 802.3ad mode.

Signed-off-by: Zhu Yanjun 
---
  drivers/net/bonding/bond_main.c |   11 +++
  1 file changed, 11 insertions(+)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 09f8a48..7df8af5 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1991,6 +1991,17 @@ static int bond_miimon_inspect(struct bonding *bond)
  
  		link_state = bond_check_dev_link(bond, slave->dev, 0);
  
+		if ((BMSR_LSTATUS == link_state) &&

+   (BOND_MODE(bond) == BOND_MODE_8023AD)) {
+   rtnl_lock();
+   bond_update_speed_duplex(slave);
+   rtnl_unlock();
+   if ((slave->speed == SPEED_UNKNOWN) ||
+   (slave->duplex == DUPLEX_UNKNOWN)) {
+   link_state = 0;
+   netdev_info(bond->dev, "In 802.3ad mode, it is not 
enough to up without speed and duplex");
+   }
+   }
switch (slave->link) {
case BOND_LINK_UP:
if (link_state)


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 1/1] bonding: restrict up state in 802.3ad mode

2016-01-06 Thread Tantilov, Emil S
>-Original Message-
>From: zhuyj [mailto:zyjzyj2...@gmail.com]
>Sent: Wednesday, January 06, 2016 7:34 PM
>To: Tantilov, Emil S; Michal Kubecek; Jay Vosburgh
>Cc: vfal...@gmail.com; go...@cumulusnetworks.com; netdev@vger.kernel.org;
>Shteinbock, Boris (Wind River)
>Subject: Re: [PATCH 1/1] bonding: restrict up state in 802.3ad mode
>
>On 01/07/2016 10:43 AM, Tantilov, Emil S wrote:
>>> -Original Message-
>>> From: zhuyj [mailto:zyjzyj2...@gmail.com]
>>> Sent: Tuesday, January 05, 2016 7:05 PM
>>> To: Tantilov, Emil S; Michal Kubecek; Jay Vosburgh
>>> Cc: vfal...@gmail.com; go...@cumulusnetworks.com;
>netdev@vger.kernel.org;
>>> Shteinbock, Boris (Wind River)
>>> Subject: Re: [PATCH 1/1] bonding: restrict up state in 802.3ad mode
>>>
>>> On 01/06/2016 09:26 AM, Tantilov, Emil S wrote:
> -Original Message-
> From: netdev-ow...@vger.kernel.org [mailto:netdev-
>ow...@vger.kernel.org]
>>> On
> Behalf Of zhuyj
> Sent: Monday, December 28, 2015 1:19 AM
> To: Michal Kubecek; Jay Vosburgh
> Cc: vfal...@gmail.com; go...@cumulusnetworks.com;
>>> netdev@vger.kernel.org;
> Shteinbock, Boris (Wind River)
> Subject: Re: [PATCH 1/1] bonding: restrict up state in 802.3ad mode
>
> On 12/28/2015 04:43 PM, Michal Kubecek wrote:
>> On Thu, Dec 17, 2015 at 01:57:16PM -0800, Jay Vosburgh wrote:
>>>  wrote:
 In 802.3ad mode, the speed and duplex is needed. But in some NIC,
 there is a time span between NIC up state and getting speed and
>>> duplex.
 As such, sometimes a slave in 802.3ad mode is in up state without
 speed and duplex. This will make bonding in 802.3ad mode can not
 work well.
 To make bonding driver be compatible with more NICs, it is
 necessary to restrict the up state in 802.3ad mode.
>>> What device is this?  It seems a bit odd that an Ethernet
>device
>>> can be carrier up but not have the duplex and speed available.
>> ...
>>> In general, though, bonding expects a speed or duplex change to
>>> be announced via a NETDEV_UPDATE or NETDEV_UP notifier, which would
>>> propagate to the 802.3ad logic.
>>>
>>> If the device here is going carrier up prior to having speed or
>>> duplex available, then maybe it should call netdev_state_change()
>when
>>> the duplex and speed are available, or delay calling
>>> netif_carrier_on().
>> I have encountered this problem (NIC having carrier on before being
>>> able
>> to detect speed/duplex and driver not notifying when speed/duplex
>> becomes available) with netxen cards earlier. But it was eventually
>> fixed in the driver by commit 9d01412ae76f ("netxen: Fix link event
>> handling.") so this example rather supports what you said.
>>
>>  Michal
>>> Kubecek
> Thanks a lot.
> I checked the commit 9d01412ae76f ("netxen: Fix link event
> handling."). The symptoms are the same with mine.
>
> The root cause is different. In my problem, the root cause is that
>LINKS
> register[]  can not provide link_up and link_speed at the same time.
> There is a time span between link_up and link_speed.
 The LINK_UP and LINK_SPEED bits in the LINKS register for ixgbe HW are
>>> updated
 simultaneously. Do you have any proof to show the delay you are
>referring
>>> to
 as I am sure our HW engineers would like to know about it.
>>> Sorry. I can not reproduce this problem locally. What I have is the
>>> feedback from the customer.
>> So you are assuming that there is a delay due to the issue you are
>seeing?
>
>Sure. Before I get the further feedback from the customer, I can not
>make further conclusion.
>My patch is based on the feedback from the customer.

Your patch is throwing an RTNL assertion warning:

RTNL: assertion failed at net/core/ethtool.c (357)

Looks like you may need to hold an RTNL lock for the slave before calling
bond_update_speed_duplex(), though I am not sure if it's a good idea in
general. 

Thanks,
Emil

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next] net, sched: add clsact qdisc

2016-01-06 Thread Alexei Starovoitov
On Wed, Jan 06, 2016 at 02:00:56AM +0100, Daniel Borkmann wrote:
> 
> I decided to extend the sch_ingress module with clsact functionality so
> that commonly used code can be reused, the module is being aliased with
> sch_clsact so that it can be auto-loaded properly. Alternative would have been
> to add a flag when initializing ingress to alter its behaviour plus aliasing
> to a different name (as it's more than just ingress). However, the first would
> end up, based on the flag, choosing the new/old behaviour by calling different
> function implementations to handle each anyway, the latter would require to
> register ingress qdisc once again under different alias. So, this really begs
> to provide a minimal, cleaner approach to have Qdisc_ops and Qdisc_class_ops
> by its own that share callbacks used by both.
...
> Signed-off-by: Daniel Borkmann 

we've been going back and forth on the design and this final approach
presented seems to be the best, since pros outweigh the cons.

Acked-by: Alexei Starovoitov 

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/1] bonding: restrict up state in 802.3ad mode

2016-01-06 Thread zhuyj

On 01/07/2016 10:43 AM, Tantilov, Emil S wrote:

-Original Message-
From: zhuyj [mailto:zyjzyj2...@gmail.com]
Sent: Tuesday, January 05, 2016 7:05 PM
To: Tantilov, Emil S; Michal Kubecek; Jay Vosburgh
Cc: vfal...@gmail.com; go...@cumulusnetworks.com; netdev@vger.kernel.org;
Shteinbock, Boris (Wind River)
Subject: Re: [PATCH 1/1] bonding: restrict up state in 802.3ad mode

On 01/06/2016 09:26 AM, Tantilov, Emil S wrote:

-Original Message-
From: netdev-ow...@vger.kernel.org [mailto:netdev-ow...@vger.kernel.org]

On

Behalf Of zhuyj
Sent: Monday, December 28, 2015 1:19 AM
To: Michal Kubecek; Jay Vosburgh
Cc: vfal...@gmail.com; go...@cumulusnetworks.com;

netdev@vger.kernel.org;

Shteinbock, Boris (Wind River)
Subject: Re: [PATCH 1/1] bonding: restrict up state in 802.3ad mode

On 12/28/2015 04:43 PM, Michal Kubecek wrote:

On Thu, Dec 17, 2015 at 01:57:16PM -0800, Jay Vosburgh wrote:

 wrote:

In 802.3ad mode, the speed and duplex is needed. But in some NIC,
there is a time span between NIC up state and getting speed and

duplex.

As such, sometimes a slave in 802.3ad mode is in up state without
speed and duplex. This will make bonding in 802.3ad mode can not
work well.
To make bonding driver be compatible with more NICs, it is
necessary to restrict the up state in 802.3ad mode.

What device is this?  It seems a bit odd that an Ethernet device
can be carrier up but not have the duplex and speed available.

...

In general, though, bonding expects a speed or duplex change to
be announced via a NETDEV_UPDATE or NETDEV_UP notifier, which would
propagate to the 802.3ad logic.

If the device here is going carrier up prior to having speed or
duplex available, then maybe it should call netdev_state_change() when
the duplex and speed are available, or delay calling

netif_carrier_on().

I have encountered this problem (NIC having carrier on before being

able

to detect speed/duplex and driver not notifying when speed/duplex
becomes available) with netxen cards earlier. But it was eventually
fixed in the driver by commit 9d01412ae76f ("netxen: Fix link event
handling.") so this example rather supports what you said.

 Michal

Kubecek

Thanks a lot.
I checked the commit 9d01412ae76f ("netxen: Fix link event
handling."). The symptoms are the same with mine.

The root cause is different. In my problem, the root cause is that LINKS
register[]  can not provide link_up and link_speed at the same time.
There is a time span between link_up and link_speed.

The LINK_UP and LINK_SPEED bits in the LINKS register for ixgbe HW are

updated

simultaneously. Do you have any proof to show the delay you are referring

to

as I am sure our HW engineers would like to know about it.

Sorry. I can not reproduce this problem locally. What I have is the
feedback from the customer.

So you are assuming that there is a delay due to the issue you are seeing?


Sure. Before I get the further feedback from the customer, I can not 
make further conclusion.

My patch is based on the feedback from the customer.




Settings for eth0:
Supported ports: [ TP ]
Supported link modes:   100baseT/Full
1000baseT/Full
1baseT/Full
Supported pause frame use: No
Supports auto-negotiation: Yes
Advertised link modes:  100baseT/Full
1000baseT/Full
1baseT/Full
Advertised pause frame use: No
Advertised auto-negotiation: Yes
Speed: Unknown!
Duplex: Unknown! (255)
Port: Twisted Pair
PHYAD: 0
Transceiver: external
Auto-negotiation: on
MDI-X: Unknown
Supports Wake-on: d
Wake-on: d
Current message level: 0x0007 (7)
   drv probe link
Link detected: yes

The speed and the link state here are reported from
different sources:
Sure. 
ixgbe_get_settings->hw->mac.ops.check_link(X540)->ixgbe_check_mac_link_generic
In this function ixgbe_check_mac_link_generic, the register IXGBE_LINKS 
is checked. link_up and

link_speed is got from this register.




Link detected: yes

Comes from a netif_carrier_ok() check. This is done via ethtool_op_get_link()

Only the speed is reported through the LINKS register - that is why it is 
reported
as "Unknown" - in other words link_up is false.

Sorry. I do not agree with you.

static inline bool netif_carrier_ok(const struct net_device *dev)
{
return !test_bit(__LINK_STATE_NOCARRIER, &dev->state);
}

netif_carrier_ok will check __LINK_STATE_NOCARRIER. This 
__LINK_STATE_NOCARRIER is set by netif_carrier_on.


/**
 *  netif_carrier_on - set carrier
 *  @dev: network device
 *
 * Device has detected that carrier.
 */
void netif_carrier_on(struct net_device *dev)
{
if (test_and_clear_bit(__LINK_STATE_NOCARRIER, &dev->state)) {
if (dev->reg_state == NETREG_UNINITIALIZED)

Re: [PATCH net-next] bpf: add skb_postpush_rcsum and fix dev_forward_skb occasions

2016-01-06 Thread kbuild test robot
Hi Daniel,

[auto build test ERROR on net-next/master]

url:
https://github.com/0day-ci/linux/commits/Daniel-Borkmann/bpf-add-skb_postpush_rcsum-and-fix-dev_forward_skb-occasions/20160107-090423
config: x86_64-lkp (attached as .config)
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64 

All errors (new ones prefixed by >>):

   In file included from include/net/sch_generic.h:8:0,
from include/linux/filter.h:16,
from include/net/sock.h:64,
from include/net/inet_sock.h:27,
from include/net/ip.h:30,
from net/core/filter.c:34:
   net/core/filter.c: In function 'bpf_clone_redirect':
>> net/core/filter.c:1530:19: error: 'struct sk_buff' has no member named 
>> 'tc_verd'
  if (G_TC_AT(skb2->tc_verd) & AT_INGRESS)
  ^
   include/uapi/linux/pkt_cls.h:11:25: note: in definition of macro '_TC_MAKE32'
#define _TC_MAKE32(x) ((x))
^
   include/uapi/linux/pkt_cls.h:54:26: note: in expansion of macro 
'_TC_GETVALUE'
#define G_TC_AT(x)   _TC_GETVALUE(x,S_TC_AT,M_TC_AT)
 ^
   net/core/filter.c:1530:7: note: in expansion of macro 'G_TC_AT'
  if (G_TC_AT(skb2->tc_verd) & AT_INGRESS)
  ^
   net/core/filter.c: In function 'skb_do_redirect':
   net/core/filter.c:1578:18: error: 'struct sk_buff' has no member named 
'tc_verd'
  if (G_TC_AT(skb->tc_verd) & AT_INGRESS)
 ^
   include/uapi/linux/pkt_cls.h:11:25: note: in definition of macro '_TC_MAKE32'
#define _TC_MAKE32(x) ((x))
^
   include/uapi/linux/pkt_cls.h:54:26: note: in expansion of macro 
'_TC_GETVALUE'
#define G_TC_AT(x)   _TC_GETVALUE(x,S_TC_AT,M_TC_AT)
 ^
   net/core/filter.c:1578:7: note: in expansion of macro 'G_TC_AT'
  if (G_TC_AT(skb->tc_verd) & AT_INGRESS)
  ^

vim +1530 net/core/filter.c

  1524  
  1525  skb2 = skb_clone(skb, GFP_ATOMIC);
  1526  if (unlikely(!skb2))
  1527  return -ENOMEM;
  1528  
  1529  if (BPF_IS_REDIRECT_INGRESS(flags)) {
> 1530  if (G_TC_AT(skb2->tc_verd) & AT_INGRESS)
  1531  skb_postpush_rcsum(skb2, skb_mac_header(skb2),
  1532 skb2->mac_len);
  1533  return dev_forward_skb(dev, skb2);

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: Binary data


Re: [PATCH v2 net-next] net: Implement fast csum_partial for x86_64

2016-01-06 Thread Hannes Frederic Sowa

On 07.01.2016 03:36, Tom Herbert wrote:

On Wed, Jan 6, 2016 at 5:52 PM, Hannes Frederic Sowa
 wrote:

Hi Tom,

On 05.01.2016 19:41, Tom Herbert wrote:


--- /dev/null
+++ b/arch/x86/lib/csum-partial_64.S
@@ -0,0 +1,147 @@
+/* Copyright 2016 Tom Herbert 
+ *
+ * Checksum partial calculation
+ *
+ * __wsum csum_partial(const void *buff, int len, __wsum sum)
+ *
+ * Computes the checksum of a memory block at buff, length len,
+ * and adds in "sum" (32-bit)
+ *
+ * Returns a 32-bit number suitable for feeding into itself
+ * or csum_tcpudp_magic
+ *
+ * Register usage:
+ *   %rdi: argument 1, buff
+ *   %rsi: argument 2, length
+ *   %rdx: argument 3, add in value



I think you forgot to carry-add-in the %rdx register.

The assembly code replaces do_csum but not csum_partial.


First instruction is: movl%edx, %eax  /* Initialize with
initial sum argument */


Ups, sorry. I only grepped through the source. Meanwhile my test also 
uses non-zero sums and shows that it is fine.


Bye,
Hannes


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 1/1] bonding: restrict up state in 802.3ad mode

2016-01-06 Thread Tantilov, Emil S
>-Original Message-
>From: zhuyj [mailto:zyjzyj2...@gmail.com]
>Sent: Tuesday, January 05, 2016 7:05 PM
>To: Tantilov, Emil S; Michal Kubecek; Jay Vosburgh
>Cc: vfal...@gmail.com; go...@cumulusnetworks.com; netdev@vger.kernel.org;
>Shteinbock, Boris (Wind River)
>Subject: Re: [PATCH 1/1] bonding: restrict up state in 802.3ad mode
>
>On 01/06/2016 09:26 AM, Tantilov, Emil S wrote:
>>> -Original Message-
>>> From: netdev-ow...@vger.kernel.org [mailto:netdev-ow...@vger.kernel.org]
>On
>>> Behalf Of zhuyj
>>> Sent: Monday, December 28, 2015 1:19 AM
>>> To: Michal Kubecek; Jay Vosburgh
>>> Cc: vfal...@gmail.com; go...@cumulusnetworks.com;
>netdev@vger.kernel.org;
>>> Shteinbock, Boris (Wind River)
>>> Subject: Re: [PATCH 1/1] bonding: restrict up state in 802.3ad mode
>>>
>>> On 12/28/2015 04:43 PM, Michal Kubecek wrote:
 On Thu, Dec 17, 2015 at 01:57:16PM -0800, Jay Vosburgh wrote:
>  wrote:
>> In 802.3ad mode, the speed and duplex is needed. But in some NIC,
>> there is a time span between NIC up state and getting speed and
>duplex.
>> As such, sometimes a slave in 802.3ad mode is in up state without
>> speed and duplex. This will make bonding in 802.3ad mode can not
>> work well.
>> To make bonding driver be compatible with more NICs, it is
>> necessary to restrict the up state in 802.3ad mode.
>   What device is this?  It seems a bit odd that an Ethernet device
> can be carrier up but not have the duplex and speed available.
 ...
>   In general, though, bonding expects a speed or duplex change to
> be announced via a NETDEV_UPDATE or NETDEV_UP notifier, which would
> propagate to the 802.3ad logic.
>
>   If the device here is going carrier up prior to having speed or
> duplex available, then maybe it should call netdev_state_change() when
> the duplex and speed are available, or delay calling
>netif_carrier_on().
 I have encountered this problem (NIC having carrier on before being
>able
 to detect speed/duplex and driver not notifying when speed/duplex
 becomes available) with netxen cards earlier. But it was eventually
 fixed in the driver by commit 9d01412ae76f ("netxen: Fix link event
 handling.") so this example rather supports what you said.

 Michal
>Kubecek
>>> Thanks a lot.
>>> I checked the commit 9d01412ae76f ("netxen: Fix link event
>>> handling."). The symptoms are the same with mine.
>>>
>>> The root cause is different. In my problem, the root cause is that LINKS
>>> register[]  can not provide link_up and link_speed at the same time.
>>> There is a time span between link_up and link_speed.
>> The LINK_UP and LINK_SPEED bits in the LINKS register for ixgbe HW are
>updated
>> simultaneously. Do you have any proof to show the delay you are referring
>to
>> as I am sure our HW engineers would like to know about it.
>Sorry. I can not reproduce this problem locally. What I have is the
>feedback from the customer.

So you are assuming that there is a delay due to the issue you are seeing?

>Settings for eth0:
>Supported ports: [ TP ]
>Supported link modes:   100baseT/Full
>1000baseT/Full
>1baseT/Full
>Supported pause frame use: No
>Supports auto-negotiation: Yes
>Advertised link modes:  100baseT/Full
>1000baseT/Full
>1baseT/Full
>Advertised pause frame use: No
>Advertised auto-negotiation: Yes
>Speed: Unknown!
>Duplex: Unknown! (255)
>Port: Twisted Pair
>PHYAD: 0
>Transceiver: external
>Auto-negotiation: on
>MDI-X: Unknown
>Supports Wake-on: d
>Wake-on: d
>Current message level: 0x0007 (7)
>   drv probe link
>Link detected: yes

The speed and the link state here are reported from
different sources:

>Link detected: yes

Comes from a netif_carrier_ok() check. This is done via ethtool_op_get_link().

Only the speed is reported through the LINKS register - that is why it is 
reported
as "Unknown" - in other words link_up is false.

This is a trace from the case where the bonding driver reports 0 Mbps:

   kworker/u48:1-27950 [010]   6493.084916: ixgbe_service_task: eth1: 
link_speed = 80, link_up = false
   kworker/u48:1-27950 [011]   6493.184894: ixgbe_service_task: eth1: 
link_speed = 80, link_up = false
   kworker/u48:1-27950 [000]   6494.439883: ixgbe_service_task: eth1: 
link_speed = 80, link_up = true
   kworker/u48:1-27950 [000]   6494.464204: ixgbe_service_task: eth1: NIC 
Link is Up 10 Gbps, Flow Control: RX/TX
 kworker/0:2-1926  [000]   6494.464249: ixgbe_get_settings: eth1: 
link_speed = 80, link_up = false
  NetworkManager-3819  [008]   6494.464484: ixgbe_get_settings: eth1: 
link_speed = 80, link_up = false
   kworker/u48:1-27950 [007]   6494.496886: bond_mii_monitor: bo

Re: [Intel-wired-lan] [PATCH 2/2] ixgbe: restrict synchronization of link_up and speed

2016-01-06 Thread zhuyj

On 01/06/2016 11:30 PM, Tantilov, Emil S wrote:

-Original Message-
From: zhuyj [mailto:zyjzyj2...@gmail.com]
Sent: Tuesday, January 05, 2016 9:42 PM
To: Tantilov, Emil S; Kirsher, Jeffrey T; Brandeburg, Jesse; Nelson,
Shannon; Wyborny, Carolyn; Skidmore, Donald C; Allan, Bruce W; Ronciak,
John; Williams, Mitch A; intel-wired-...@lists.osuosl.org;
netdev@vger.kernel.org; e1000-de...@lists.sourceforge.net
Cc: Viswanathan, Ven (Wind River); Shteinbock, Boris (Wind River); Bourg,
Vincent (Wind River)
Subject: Re: [Intel-wired-lan] [PATCH 2/2] ixgbe: restrict synchronization
of link_up and speed

On 12/31/2015 12:37 AM, Tantilov, Emil S wrote:

-Original Message-
From: zhuyj [mailto:zyjzyj2...@gmail.com]
Sent: Wednesday, December 30, 2015 12:20 AM
To: Tantilov, Emil S; Kirsher, Jeffrey T; Brandeburg, Jesse; Nelson,
Shannon; Wyborny, Carolyn; Skidmore, Donald C; Allan, Bruce W; Ronciak,
John; Williams, Mitch A; intel-wired-...@lists.osuosl.org;
netdev@vger.kernel.org; e1000-de...@lists.sourceforge.net
Cc: Viswanathan, Ven (Wind River); Shteinbock, Boris (Wind River);

Bourg,

Vincent (Wind River)
Subject: Re: [Intel-wired-lan] [PATCH 2/2] ixgbe: restrict

synchronization

of link_up and speed

On 12/30/2015 02:55 PM, Tantilov, Emil S wrote:

-Original Message-
From: zhuyj [mailto:zyjzyj2...@gmail.com]
Sent: Tuesday, December 29, 2015 6:49 PM
To: Tantilov, Emil S; Kirsher, Jeffrey T; Brandeburg, Jesse; Nelson,
Shannon; Wyborny, Carolyn; Skidmore, Donald C; Allan, Bruce W;

Ronciak,

John; Williams, Mitch A; intel-wired-...@lists.osuosl.org;
netdev@vger.kernel.org; e1000-de...@lists.sourceforge.net
Cc: Viswanathan, Ven (Wind River); Shteinbock, Boris (Wind River);

Bourg,

Vincent (Wind River)
Subject: Re: [Intel-wired-lan] [PATCH 2/2] ixgbe: restrict

synchronization

of link_up and speed

On 12/30/2015 12:18 AM, Tantilov, Emil S wrote:

-Original Message-
From: Intel-wired-lan [mailto:intel-wired-lan-

boun...@lists.osuosl.org]

On

Behalf Of zyjzyj2...@gmail.com
Sent: Monday, December 28, 2015 6:32 PM
To: Kirsher, Jeffrey T; Brandeburg, Jesse; Nelson, Shannon; Wyborny,
Carolyn; Skidmore, Donald C; Allan, Bruce W; Ronciak, John;

Williams,

Mitch

A; intel-wired-...@lists.osuosl.org; netdev@vger.kernel.org; e1000-
de...@lists.sourceforge.net
Cc: Viswanathan, Ven (Wind River); Shteinbock, Boris (Wind River);

Bourg,

Vincent (Wind River)
Subject: [Intel-wired-lan] [PATCH 2/2] ixgbe: restrict

synchronization

of

link_up and speed

From: Zhu Yanjun 

When the X540 NIC acts as a slave of some virtual NICs, it is very
important to synchronize link_up and link_speed, such as a bonding
driver in 802.3ad mode. When X540 NIC acts as an independent

interface,

it is not necessary to synchronize link_up and link_speed. That is,
the time span between link_up and link_speed is acceptable.

What exactly do you mean by "time span between link_up and

link_speed"?

In the previous mail, I show you some ethtool logs. In these logs,

there

is some
time with NIC up while speed is unknown. I think this "some time" is
time span between
link_up and link_speed. Please see the previous mail for details.

Was this when reporting the link state from check_link() (reading the

LINKS

register) or reporting the adapter->link_speed?


Where is it you think the de-synchronization occurs?

When a NIC interface acts as a slave, a flag "IFF_SLAVE" is set in
netdevice struct.
Before we enter this function, we check IFF_SLAVE flag. If this flag

is

set, we continue to check
link_speed. If not, this function is executed whether this link_speed

is

unknown or not.

I can already see this in your patch. I was asking about the reason why
your change is needed.

an extreme example, let us assume this scenario:

Is this the scenario you are trying to fix?

Sure. If IFF_SLAVE is checked, this scenario will not happen.

I already explained why this is not a valid scenario, but if you were able
to set it up somehow I'd like to know how you did it
Sorry. Why can we not disable auto-negotiate on ixgbe NIC ? To be 
honest, I am interested in it.

This is related with hardware or driver?

Thanks a lot.
Zhu Yanjun


If we are to enter ixgbe_watchdog_link_is_up() with unknown link this would
be an issue regardless of whether the interface is a part of a bond or not,
but you haven't provided any proof that this is the case. Do you have a
dmesg log that shows ixgbe reporting unknown speed?

Was your patch tested by the customer that reported this issue?

Thanks,
Emil




--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 net-next] net: Implement fast csum_partial for x86_64

2016-01-06 Thread Tom Herbert
On Wed, Jan 6, 2016 at 5:52 PM, Hannes Frederic Sowa
 wrote:
> Hi Tom,
>
> On 05.01.2016 19:41, Tom Herbert wrote:
>>
>> --- /dev/null
>> +++ b/arch/x86/lib/csum-partial_64.S
>> @@ -0,0 +1,147 @@
>> +/* Copyright 2016 Tom Herbert 
>> + *
>> + * Checksum partial calculation
>> + *
>> + * __wsum csum_partial(const void *buff, int len, __wsum sum)
>> + *
>> + * Computes the checksum of a memory block at buff, length len,
>> + * and adds in "sum" (32-bit)
>> + *
>> + * Returns a 32-bit number suitable for feeding into itself
>> + * or csum_tcpudp_magic
>> + *
>> + * Register usage:
>> + *   %rdi: argument 1, buff
>> + *   %rsi: argument 2, length
>> + *   %rdx: argument 3, add in value
>
>
> I think you forgot to carry-add-in the %rdx register.
>
> The assembly code replaces do_csum but not csum_partial.
>
First instruction is: movl%edx, %eax  /* Initialize with
initial sum argument */

Tom

> Bye,
> Hannes
>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: WARN trace - skb_warn_bad_offload - vxlan - large udp packet - udp checksum disabled

2016-01-06 Thread Hannes Frederic Sowa

On 15.12.2015 02:35, Nelson, Shannon wrote:

Using a slightly modified version of udpspam (see diff below - hopefully not mangled by 
corporate email servers), where I set the SO_NO_CHECK socket option and can specify a large 
buffer size, I can reliably get the following WARN trace.  I have reproduced this on both ixgbe 
and i40e drivers using "udpspam-no-check  6000".

It looks to me like this is in the Tx path before we get to the actual NIC 
drivers, but I may be wrong.


Does UFO offloading on the tunnel interface fix this error?

ethtool -K vxlan ufo off

If yes we can't feed SO_NO_CHECK udp packets into UFO as gso/ufo 
requires CHECKSUM_PARTIAL (or add some more logic into the skb or query 
socket status from skb_gso_segment).


Thanks,
Hannes

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: WARN trace - skb_warn_bad_offload - vxlan - large udp packet - udp checksum disabled

2016-01-06 Thread Jesse Brandeburg
On Tue, 15 Dec 2015 01:35:27 +
"Nelson, Shannon"  wrote:

> Using a slightly modified version of udpspam (see diff below - hopefully
> not mangled by corporate email servers), where I set the SO_NO_CHECK
> socket option and can specify a large buffer size, I can reliably get
> the following WARN trace.  I have reproduced this on both ixgbe and
> i40e drivers using "udpspam-no-check  6000".
> 
> It looks to me like this is in the Tx path before we get to the actual
> NIC drivers, but I may be wrong.
> 
> 
> [ 1757.644324] [ cut here ]
> [ 1757.644333] WARNING: CPU: 22 PID: 5537 at net/core/dev.c:2423 
> skb_warn_bad_offload+0x104/0x111()

Was anyone able to take a look at this?  There is a pretty clear
reproducer in the original mail.

I know SO_NO_CHECK is a lightly used option, but it does exist and we
found a userspace program that tries to use it.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Intel-wired-lan] [PATCH 2/2] ixgbe: restrict synchronization of link_up and speed

2016-01-06 Thread zhuyj

On 01/06/2016 11:30 PM, Tantilov, Emil S wrote:

-Original Message-
From: zhuyj [mailto:zyjzyj2...@gmail.com]
Sent: Tuesday, January 05, 2016 9:42 PM
To: Tantilov, Emil S; Kirsher, Jeffrey T; Brandeburg, Jesse; Nelson,
Shannon; Wyborny, Carolyn; Skidmore, Donald C; Allan, Bruce W; Ronciak,
John; Williams, Mitch A; intel-wired-...@lists.osuosl.org;
netdev@vger.kernel.org; e1000-de...@lists.sourceforge.net
Cc: Viswanathan, Ven (Wind River); Shteinbock, Boris (Wind River); Bourg,
Vincent (Wind River)
Subject: Re: [Intel-wired-lan] [PATCH 2/2] ixgbe: restrict synchronization
of link_up and speed

On 12/31/2015 12:37 AM, Tantilov, Emil S wrote:

-Original Message-
From: zhuyj [mailto:zyjzyj2...@gmail.com]
Sent: Wednesday, December 30, 2015 12:20 AM
To: Tantilov, Emil S; Kirsher, Jeffrey T; Brandeburg, Jesse; Nelson,
Shannon; Wyborny, Carolyn; Skidmore, Donald C; Allan, Bruce W; Ronciak,
John; Williams, Mitch A; intel-wired-...@lists.osuosl.org;
netdev@vger.kernel.org; e1000-de...@lists.sourceforge.net
Cc: Viswanathan, Ven (Wind River); Shteinbock, Boris (Wind River);

Bourg,

Vincent (Wind River)
Subject: Re: [Intel-wired-lan] [PATCH 2/2] ixgbe: restrict

synchronization

of link_up and speed

On 12/30/2015 02:55 PM, Tantilov, Emil S wrote:

-Original Message-
From: zhuyj [mailto:zyjzyj2...@gmail.com]
Sent: Tuesday, December 29, 2015 6:49 PM
To: Tantilov, Emil S; Kirsher, Jeffrey T; Brandeburg, Jesse; Nelson,
Shannon; Wyborny, Carolyn; Skidmore, Donald C; Allan, Bruce W;

Ronciak,

John; Williams, Mitch A; intel-wired-...@lists.osuosl.org;
netdev@vger.kernel.org; e1000-de...@lists.sourceforge.net
Cc: Viswanathan, Ven (Wind River); Shteinbock, Boris (Wind River);

Bourg,

Vincent (Wind River)
Subject: Re: [Intel-wired-lan] [PATCH 2/2] ixgbe: restrict

synchronization

of link_up and speed

On 12/30/2015 12:18 AM, Tantilov, Emil S wrote:

-Original Message-
From: Intel-wired-lan [mailto:intel-wired-lan-

boun...@lists.osuosl.org]

On

Behalf Of zyjzyj2...@gmail.com
Sent: Monday, December 28, 2015 6:32 PM
To: Kirsher, Jeffrey T; Brandeburg, Jesse; Nelson, Shannon; Wyborny,
Carolyn; Skidmore, Donald C; Allan, Bruce W; Ronciak, John;

Williams,

Mitch

A; intel-wired-...@lists.osuosl.org; netdev@vger.kernel.org; e1000-
de...@lists.sourceforge.net
Cc: Viswanathan, Ven (Wind River); Shteinbock, Boris (Wind River);

Bourg,

Vincent (Wind River)
Subject: [Intel-wired-lan] [PATCH 2/2] ixgbe: restrict

synchronization

of

link_up and speed

From: Zhu Yanjun 

When the X540 NIC acts as a slave of some virtual NICs, it is very
important to synchronize link_up and link_speed, such as a bonding
driver in 802.3ad mode. When X540 NIC acts as an independent

interface,

it is not necessary to synchronize link_up and link_speed. That is,
the time span between link_up and link_speed is acceptable.

What exactly do you mean by "time span between link_up and

link_speed"?

In the previous mail, I show you some ethtool logs. In these logs,

there

is some
time with NIC up while speed is unknown. I think this "some time" is
time span between
link_up and link_speed. Please see the previous mail for details.

Was this when reporting the link state from check_link() (reading the

LINKS

register) or reporting the adapter->link_speed?


Where is it you think the de-synchronization occurs?

When a NIC interface acts as a slave, a flag "IFF_SLAVE" is set in
netdevice struct.
Before we enter this function, we check IFF_SLAVE flag. If this flag

is

set, we continue to check
link_speed. If not, this function is executed whether this link_speed

is

unknown or not.

I can already see this in your patch. I was asking about the reason why
your change is needed.

an extreme example, let us assume this scenario:

Is this the scenario you are trying to fix?

Sure. If IFF_SLAVE is checked, this scenario will not happen.

I already explained why this is not a valid scenario, but if you were able
to set it up somehow I'd like to know how you did it
If it is not a valid scenario, maybe there is something wrong with NIC 
driver/hardware.

We should pay attention to it.

Zhu Yanjun



If we are to enter ixgbe_watchdog_link_is_up() with unknown link this would
be an issue regardless of whether the interface is a part of a bond or not,
but you haven't provided any proof that this is the case. Do you have a
dmesg log that shows ixgbe reporting unknown speed?

Was your patch tested by the customer that reported this issue?

Thanks,
Emil



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next] bpf: add skb_postpush_rcsum and fix dev_forward_skb occasions

2016-01-06 Thread Hannes Frederic Sowa

On 07.01.2016 02:01, Daniel Borkmann wrote:

+static inline void skb_postpush_rcsum(struct sk_buff *skb,
+ const void *start, unsigned int len)
+{
+   if (skb->ip_summed == CHECKSUM_COMPLETE)
+   skb->csum = csum_add(skb->csum, csum_partial(start, len, 0));


skb->csum = csum_partial(start, len, skb->csum);

should work without calling carry-add twice, no?

Bye,
Hannes


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 net-next] net: Implement fast csum_partial for x86_64

2016-01-06 Thread Hannes Frederic Sowa

Hi Tom,

On 05.01.2016 19:41, Tom Herbert wrote:

--- /dev/null
+++ b/arch/x86/lib/csum-partial_64.S
@@ -0,0 +1,147 @@
+/* Copyright 2016 Tom Herbert 
+ *
+ * Checksum partial calculation
+ *
+ * __wsum csum_partial(const void *buff, int len, __wsum sum)
+ *
+ * Computes the checksum of a memory block at buff, length len,
+ * and adds in "sum" (32-bit)
+ *
+ * Returns a 32-bit number suitable for feeding into itself
+ * or csum_tcpudp_magic
+ *
+ * Register usage:
+ *   %rdi: argument 1, buff
+ *   %rsi: argument 2, length
+ *   %rdx: argument 3, add in value


I think you forgot to carry-add-in the %rdx register.

The assembly code replaces do_csum but not csum_partial.

Bye,
Hannes

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v5 5/6] Add PTP_SYS_OFFSET_PRECISE for driver crosstimestamping

2016-01-06 Thread Christopher Hall

Hi Richard,

This all sounds fine.  Thanks for the feedback.  I'll roll this into the  
next patchset.


Chris

On Tue, 05 Jan 2016 07:27:32 -0800, Richard Cochran  
 wrote:

On Mon, Jan 04, 2016 at 04:45:22AM -0800, Christopher S. Hall wrote:

+   case PTP_SYS_OFFSET_PRECISE:
+   if (!ptp->info->getsynctime) {
+   err = -EINVAL;


-EOPNOTSUPP would be better here.


+   precise_offset.sys_real.sec =
+   div_u64_rem(ktime_to_ns(xtstamp.sys_realtime),
+   NSEC_PER_SEC, &rem);
+   precise_offset.sys_real.nsec = rem;


How about this instead:

ts = ktime_to_timespec64(xtstamp.sys_realtime);
precise_offset.sys_real.sec = ts.tv_sec;
precise_offset.sys_real.nsec = ts.tv_nsec;


+   precise_offset.sys_raw.sec =
+   div_u64_rem(ktime_to_ns(xtstamp.sys_monoraw),
+   NSEC_PER_SEC, &rem);
+   precise_offset.sys_raw.nsec = rem;
+   precise_offset.dev.sec =
+   div_u64_rem(ktime_to_ns(xtstamp.device), NSEC_PER_SEC,
+   &rem);
+   precise_offset.dev.nsec = rem;


And for these as well.

Thanks,
Richard

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [ovs-dev] [PATCH net 0/2] vxlan: Set a large MTU on ovs-created vxlan devices

2016-01-06 Thread Jesse Gross
On Wed, Jan 6, 2016 at 4:29 PM, David Wragg  wrote:
> Jesse Gross  writes:
>> On Wed, Jan 6, 2016 at 3:25 PM, David Wragg  wrote:
>>> I'm certainly open to suggestions of better ways to solve the problem.
>>
>> One option is to simply set the MTU on the device from userspace.
>
> If that worked I wouldn't be submitting a patch.
>
> The MTU value of 1500 is not merely the default.  It is also the maximum
> allowed for a vxlan netdev not associated with an underlying netdev.  If
> you do e.g. "ip link set dev vxlan-6784 mtu 8950", where vxlan-6784
> was created by an ovs vport, it fails with EINVAL.
>
> The first patch of the two submitted removes that limit.

I saw your first patch and I agree that it fixes a problem. I was
referring to the second patch.

>> The reality is that the code you're modifying is compatibility code.
>> Maybe we should make this change to preserve the old behavior or old
>> callers (although, again, it should not be just for VXLAN). But no new
>> features or tunnel types will be supported in this manner.
>
> That's fine.  Naturally, the ideal from our point of view is if the
> compatibility code is fully compatible, so we don't have to make changes
> on our side that involve different code paths for different kernel
> versions.  That's what my patches are intended to achieve.

The intention is to be fully backwards compatible with existing
software. If you want to take advantage of future functionality then,
yes, you will need to change to the new model.

I agree that behavior changed with existing compatibility code. I'm
fine with your series as long as you generalize it to all tunnel types
and not just VXLAN. Just be aware that you're going to have to find
another solution long term.

> Ok.  But please try to be gentle on the poor souls who have to come up
> with a single codebase that works on a range of kernel versions going
> back a few years.

I maintain a large program that needs to do this myself, so I am aware
that it can be challenging.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: question about vrf-lite

2016-01-06 Thread Li RongQing
>>
>> is it right?
>
>
> no. The above works fine for me. I literally copied and pasted all of the
> commands except the master ones which were adapted to my setup -- eth9 and
> eth11 for me instead of eth0 and eth1. tcpdump on N2, N3 show the right one
> is receiving packets based on which 'ping -I vrf' is run.
>
> Do tables 5 and 6 have the right routes?


Thanks, David;

it is not VRF issue, it is my configuration issue about qemu;

I am testing VRF on qemu, and the issue/solution is same as issue
under below link

https://lists.gnu.org/archive/html/qemu-discuss/2014-06/msg00059.html

-Roy
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH net-next] bpf: add skb_postpush_rcsum and fix dev_forward_skb occasions

2016-01-06 Thread Daniel Borkmann
Add a small helper skb_postpush_rcsum() and fix up redirect locations
that need CHECKSUM_COMPLETE fixups on ingress. dev_forward_skb() expects
a proper csum that covers also Ethernet header, f.e. since 2c26d34bbcc0
("net/core: Handle csum for CHECKSUM_COMPLETE VXLAN forwarding"), we
also do skb_postpull_rcsum() after pulling Ethernet header off via
eth_type_trans().

When using eBPF in a netns setup f.e. with vxlan in collect metadata mode,
I can trigger the following csum issue with an IPv6 setup:

  [  505.144065] dummy1: hw csum failure
  [...]
  [  505.144108] Call Trace:
  [  505.144112][] dump_stack+0x44/0x5c
  [  505.144134]  [] netdev_rx_csum_fault+0x3a/0x40
  [  505.144142]  [] __skb_checksum_complete+0xcf/0xe0
  [  505.144149]  [] nf_ip6_checksum+0xb2/0x120
  [  505.144161]  [] icmpv6_error+0x17e/0x328 
[nf_conntrack_ipv6]
  [  505.144170]  [] ? ip6t_do_table+0x2fa/0x645 [ip6_tables]
  [  505.144177]  [] ? ipv6_get_l4proto+0x65/0xd0 
[nf_conntrack_ipv6]
  [  505.144189]  [] nf_conntrack_in+0xc2/0x5a0 [nf_conntrack]
  [  505.144196]  [] ipv6_conntrack_in+0x1c/0x20 
[nf_conntrack_ipv6]
  [  505.144204]  [] nf_iterate+0x5d/0x70
  [  505.144210]  [] nf_hook_slow+0x66/0xc0
  [  505.144218]  [] ipv6_rcv+0x3f2/0x4f0
  [  505.144225]  [] ? ip6_make_skb+0x1b0/0x1b0
  [  505.144232]  [] __netif_receive_skb_core+0x36b/0x9a0
  [  505.144239]  [] ? __netif_receive_skb+0x18/0x60
  [  505.144245]  [] __netif_receive_skb+0x18/0x60
  [  505.144252]  [] process_backlog+0x9f/0x140
  [  505.144259]  [] net_rx_action+0x145/0x320
  [...]

What happens is that on ingress, we push Ethernet header back in, either
from cls_bpf or right before skb_do_redirect(), but without updating csum.
The "hw csum failure" can be fixed by using the new skb_postpush_rcsum()
helper for the dev_forward_skb() case to correct the csum diff again.

Fixes: 3896d655f4d4 ("bpf: introduce bpf_clone_redirect() helper")
Fixes: 27b29f63058d ("bpf: add bpf_redirect() helper")
Signed-off-by: Daniel Borkmann 
---
 ( Given -net pull req is already out and nobody seemed to have triggered
   this so far, net-next seems just fine imho. )

 include/linux/skbuff.h |  7 +++
 net/core/filter.c  | 17 +
 2 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 6b6bd42..f8fad71 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2805,6 +2805,13 @@ static inline void skb_postpull_rcsum(struct sk_buff 
*skb,
 
 unsigned char *skb_pull_rcsum(struct sk_buff *skb, unsigned int len);
 
+static inline void skb_postpush_rcsum(struct sk_buff *skb,
+ const void *start, unsigned int len)
+{
+   if (skb->ip_summed == CHECKSUM_COMPLETE)
+   skb->csum = csum_add(skb->csum, csum_partial(start, len, 0));
+}
+
 /**
  * pskb_trim_rcsum - trim received skb and update checksum
  * @skb: buffer to trim
diff --git a/net/core/filter.c b/net/core/filter.c
index 35e6fed..c6160ce 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -1368,8 +1368,9 @@ static u64 bpf_skb_store_bytes(u64 r1, u64 r2, u64 r3, 
u64 r4, u64 flags)
/* skb_store_bits cannot return -EFAULT here */
skb_store_bits(skb, offset, ptr, len);
 
-   if (BPF_RECOMPUTE_CSUM(flags) && skb->ip_summed == CHECKSUM_COMPLETE)
-   skb->csum = csum_add(skb->csum, csum_partial(ptr, len, 0));
+   if (BPF_RECOMPUTE_CSUM(flags))
+   skb_postpush_rcsum(skb, ptr, len);
+
return 0;
 }
 
@@ -1525,8 +1526,12 @@ static u64 bpf_clone_redirect(u64 r1, u64 ifindex, u64 
flags, u64 r4, u64 r5)
if (unlikely(!skb2))
return -ENOMEM;
 
-   if (BPF_IS_REDIRECT_INGRESS(flags))
+   if (BPF_IS_REDIRECT_INGRESS(flags)) {
+   if (G_TC_AT(skb2->tc_verd) & AT_INGRESS)
+   skb_postpush_rcsum(skb2, skb_mac_header(skb2),
+  skb2->mac_len);
return dev_forward_skb(dev, skb2);
+   }
 
skb2->dev = dev;
skb_sender_cpu_clear(skb2);
@@ -1569,8 +1574,12 @@ int skb_do_redirect(struct sk_buff *skb)
return -EINVAL;
}
 
-   if (BPF_IS_REDIRECT_INGRESS(ri->flags))
+   if (BPF_IS_REDIRECT_INGRESS(ri->flags)) {
+   if (G_TC_AT(skb->tc_verd) & AT_INGRESS)
+   skb_postpush_rcsum(skb, skb_mac_header(skb),
+  skb->mac_len);
return dev_forward_skb(dev, skb);
+   }
 
skb->dev = dev;
skb_sender_cpu_clear(skb);
-- 
1.9.3

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [ovs-dev] [PATCH net 0/2] vxlan: Set a large MTU on ovs-created vxlan devices

2016-01-06 Thread Jesse Gross
On Wed, Jan 6, 2016 at 4:14 PM, Hannes Frederic Sowa
 wrote:
> Hi,
>
>
> On 07.01.2016 00:57, Jesse Gross wrote:
>>
>> On Wed, Jan 6, 2016 at 3:25 PM, David Wragg  wrote:
>>>
>>> David Miller  writes:
>
> Prior to 4.3, openvswitch vxlan vports could transmit vxlan packets of
> any size, constrained only by the ability to transmit the resulting
> UDP packets.  4.3 introduced vxlan netdevs corresponding to vxlan
> vports.  These netdevs have an MTU, which limits the size of a packet
> that can be successfully vxlan-encapsulated.  The default value for
> this MTU is 1500, which is awkwardly small, and leads to a conspicuous
> change in behaviour for userspace.
>
> These two patches set the MTU on openvswitch-crated vxlan devices to
> be 65465 (the maximum IP packet size minus the vxlan-on-IPv6
> overhead), effectively restoring the behaviour prior to 4.3.  In order
> to accomplish this, the first patch removes the MTU constraint of 1500
> for vxlan netdevs without an underlying device.


 Is this really the right thing to do?
>>>
>>>
>>> I'm certainly open to suggestions of better ways to solve the problem.
>>
>>
>> One option is to simply set the MTU on the device from userspace.
>>
>> The reality is that the code you're modifying is compatibility code.
>> Maybe we should make this change to preserve the old behavior for old
>> callers (although, again, it should not be just for VXLAN). But no new
>> features or tunnel types will be supported in this manner.
>>
>> New or updated userspace programs should work by simply creating and
>> adding tunnel devices to OVS. That won't go through this path at all
>> so you're going to need to find another approach in the near future in
>> any case.
>
>
> I don't see any other way as to make MTUs part of the flow if we want to
> have correct ip_local_error notifications. And those must also work across
> VMs, so openvswitch in quasi brouting mode would need to emit ICMP PtBs
> (hopefully with a correct source address, otherwise uRPF kills them before
> reaching the applications) or do error signaling via virtio_net.

I actually implemented this a long time ago and then there was some
additional discussion on this about a year ago. I agree it's the right
solution overall but it's not entirely clearly to me how to get the
details correct.

> Either the openvswitch user space can feed those information to the datapath
> or the ovs dataplane can do a lookup on the outer ip address while filling
> out the metadata_dst and caching it in the flow or we just keep the dst in
> the flow anyway. So a net_device used by ovs has no real mtu anymore.

I agree that the concept of MTU is much more complicated than a single
number on a device, we just have to find the right way to model it.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next v3 8/8] geneve: break dependency to network drivers

2016-01-06 Thread Hannes Frederic Sowa

Hi Jesse,

On 07.01.2016 01:18, Jesse Gross wrote:

On Wed, Jan 6, 2016 at 3:39 PM, Hannes Frederic Sowa
 wrote:

Signed-off-by: Hannes Frederic Sowa 
---
  drivers/net/geneve.c | 30 +++---
  include/net/geneve.h |  7 +++
  2 files changed, 30 insertions(+), 7 deletions(-)


Thanks a lot for going through all the drivers! You found more things
than I thought. I noticed just a couple new things with this version
but overall I think it is a good direction.


diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index 24b077a32c1c9c..5b8d728b1204be 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -1110,7 +1110,7 @@ static struct device_type geneve_type = {
   * supply the listening GENEVE udp ports. Callers are expected
   * to implement the ndo_add_geneve_port.
   */
-void geneve_get_rx_port(struct net_device *dev)
+static void geneve_notify_refresh_netdev(struct net_device *dev)
  {
 struct net *net = dev_net(dev);
 struct geneve_net *gn = net_generic(net, geneve_net_id);


Both this function and the corresponding one in VXLAN assume that the
add port NDO is implemented and blindly dereference the pointer. Now
that all protocols get refreshed at the same time, we should check
first. There's also a comment above the function that says this, which
we can remove now.


Absolutely, will fix this in the next version. With the separate 
offloading types this wouldn't be possible. Thanks for seeing this!


I actually had this in a follow-up patch because I want to call this on 
NETDEV_REGISTER, too, so we don't need to add the geneve/vxlan functions 
to the drivers in ndo_start anymore.



I also think we can update the comments in netdevice.h for the
VXLAN/Geneve add/remove NDOs to say that they are protected by RTNL.


Will do. Thanks!


diff --git a/include/net/geneve.h b/include/net/geneve.h
index e6c23dc765f7ec..7d52077b72faa3 100644
--- a/include/net/geneve.h
+++ b/include/net/geneve.h
-#if IS_ENABLED(CONFIG_GENEVE)
-void geneve_get_rx_port(struct net_device *netdev);
-#else
  static inline void geneve_get_rx_port(struct net_device *netdev)
  {
+   call_netdevice_notifiers(NETDEV_REFRESH_OFFLOADS, netdev);
  }
-#endif


I think we should merge vxlan_get_rx_port() and geneve_get_rx_port()
into a single generic function. For drivers that support both, they
now have two calls to get notified for all offloaded ports. This
actually can cause problems related to duplicates, similar to what you
feared before.


Agreed. I add comments to the offloading functions in netdevice.h so 
they need to be implemented in a idempotent way and review the drivers 
how they deal with it.


Thanks for the review,
Hannes

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [ovs-dev] [PATCH net 0/2] vxlan: Set a large MTU on ovs-created vxlan devices

2016-01-06 Thread David Wragg
Jesse Gross  writes:
> On Wed, Jan 6, 2016 at 3:25 PM, David Wragg  wrote:
>> I'm certainly open to suggestions of better ways to solve the problem.
>
> One option is to simply set the MTU on the device from userspace.

If that worked I wouldn't be submitting a patch.

The MTU value of 1500 is not merely the default.  It is also the maximum
allowed for a vxlan netdev not associated with an underlying netdev.  If
you do e.g. "ip link set dev vxlan-6784 mtu 8950", where vxlan-6784
was created by an ovs vport, it fails with EINVAL.

The first patch of the two submitted removes that limit.

> The reality is that the code you're modifying is compatibility code.
> Maybe we should make this change to preserve the old behavior or old
> callers (although, again, it should not be just for VXLAN). But no new
> features or tunnel types will be supported in this manner.

That's fine.  Naturally, the ideal from our point of view is if the
compatibility code is fully compatible, so we don't have to make changes
on our side that involve different code paths for different kernel
versions.  That's what my patches are intended to achieve.

But we can live with such changes on our side, as long as there is some
reasonable way to do so.  In the case of this vxlan MTU issue, there
doesn't seem to be one.

> New or updated userspace programs should work by simply creating and
> adding tunnel devices to OVS. That won't go through this path at all
> so you're going to need to find another approach in the near future in
> any case.

Ok.  But please try to be gentle on the poor souls who have to come up
with a single codebase that works on a range of kernel versions going
back a few years.

David
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next v3 8/8] geneve: break dependency to network drivers

2016-01-06 Thread Jesse Gross
On Wed, Jan 6, 2016 at 3:39 PM, Hannes Frederic Sowa
 wrote:
> Signed-off-by: Hannes Frederic Sowa 
> ---
>  drivers/net/geneve.c | 30 +++---
>  include/net/geneve.h |  7 +++
>  2 files changed, 30 insertions(+), 7 deletions(-)

Thanks a lot for going through all the drivers! You found more things
than I thought. I noticed just a couple new things with this version
but overall I think it is a good direction.

> diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
> index 24b077a32c1c9c..5b8d728b1204be 100644
> --- a/drivers/net/geneve.c
> +++ b/drivers/net/geneve.c
> @@ -1110,7 +1110,7 @@ static struct device_type geneve_type = {
>   * supply the listening GENEVE udp ports. Callers are expected
>   * to implement the ndo_add_geneve_port.
>   */
> -void geneve_get_rx_port(struct net_device *dev)
> +static void geneve_notify_refresh_netdev(struct net_device *dev)
>  {
> struct net *net = dev_net(dev);
> struct geneve_net *gn = net_generic(net, geneve_net_id);

Both this function and the corresponding one in VXLAN assume that the
add port NDO is implemented and blindly dereference the pointer. Now
that all protocols get refreshed at the same time, we should check
first. There's also a comment above the function that says this, which
we can remove now.

I also think we can update the comments in netdevice.h for the
VXLAN/Geneve add/remove NDOs to say that they are protected by RTNL.

> diff --git a/include/net/geneve.h b/include/net/geneve.h
> index e6c23dc765f7ec..7d52077b72faa3 100644
> --- a/include/net/geneve.h
> +++ b/include/net/geneve.h
> -#if IS_ENABLED(CONFIG_GENEVE)
> -void geneve_get_rx_port(struct net_device *netdev);
> -#else
>  static inline void geneve_get_rx_port(struct net_device *netdev)
>  {
> +   call_netdevice_notifiers(NETDEV_REFRESH_OFFLOADS, netdev);
>  }
> -#endif

I think we should merge vxlan_get_rx_port() and geneve_get_rx_port()
into a single generic function. For drivers that support both, they
now have two calls to get notified for all offloaded ports. This
actually can cause problems related to duplicates, similar to what you
feared before.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [ovs-dev] [PATCH net 0/2] vxlan: Set a large MTU on ovs-created vxlan devices

2016-01-06 Thread Hannes Frederic Sowa

Hi,

On 07.01.2016 00:57, Jesse Gross wrote:

On Wed, Jan 6, 2016 at 3:25 PM, David Wragg  wrote:

David Miller  writes:

Prior to 4.3, openvswitch vxlan vports could transmit vxlan packets of
any size, constrained only by the ability to transmit the resulting
UDP packets.  4.3 introduced vxlan netdevs corresponding to vxlan
vports.  These netdevs have an MTU, which limits the size of a packet
that can be successfully vxlan-encapsulated.  The default value for
this MTU is 1500, which is awkwardly small, and leads to a conspicuous
change in behaviour for userspace.

These two patches set the MTU on openvswitch-crated vxlan devices to
be 65465 (the maximum IP packet size minus the vxlan-on-IPv6
overhead), effectively restoring the behaviour prior to 4.3.  In order
to accomplish this, the first patch removes the MTU constraint of 1500
for vxlan netdevs without an underlying device.


Is this really the right thing to do?


I'm certainly open to suggestions of better ways to solve the problem.


One option is to simply set the MTU on the device from userspace.

The reality is that the code you're modifying is compatibility code.
Maybe we should make this change to preserve the old behavior for old
callers (although, again, it should not be just for VXLAN). But no new
features or tunnel types will be supported in this manner.

New or updated userspace programs should work by simply creating and
adding tunnel devices to OVS. That won't go through this path at all
so you're going to need to find another approach in the near future in
any case.


I don't see any other way as to make MTUs part of the flow if we want to 
have correct ip_local_error notifications. And those must also work 
across VMs, so openvswitch in quasi brouting mode would need to emit 
ICMP PtBs (hopefully with a correct source address, otherwise uRPF kills 
them before reaching the applications) or do error signaling via virtio_net.


Either the openvswitch user space can feed those information to the 
datapath or the ovs dataplane can do a lookup on the outer ip address 
while filling out the metadata_dst and caching it in the flow or we just 
keep the dst in the flow anyway. So a net_device used by ovs has no real 
mtu anymore.


Bye,
Hannes

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] net: Add eth_platform_get_mac_address() helper.

2016-01-06 Thread David Miller
From: Joe Perches 
Date: Wed, 06 Jan 2016 15:32:24 -0800

> On Thu, 2016-01-07 at 00:26 +0100, Bjørn Mork wrote:
>> Joe Perches  writes:
>> > On Wed, 2016-01-06 at 16:33 -0500, David Miller wrote:
>> > > A repeating pattern in drivers has become to use OF node information
>> > > and, if not found, platform specific host information to extract the
>> > > ethernet address for a given device.
>> > []
>> > > diff --git a/include/linux/etherdevice.h b/include/linux/etherdevice.h
>> > []
>> > > @@ -485,3 +487,32 @@ static int __init eth_offload_init(void)
>> > >  }
>> > >  
>> > >  fs_initcall(eth_offload_init);
>> > > +
>> > > +unsigned char * __weak arch_get_platform_mac_address(void)
>> > > +{
>> > > +   return NULL;
>> > 
>> > WARN_ON_ONCE ?
>> 
>> That would prevent a driver from using this with additional fallback
>> methods.  For what reason?
> 
> It's declared __weak and should be overridden by
> an arch specific function.
> 
> A NULL address would cause a fault when using
> a function like copy_ether_addr.

The caller checks for NULL, and this is the default implementation
doing precisely what it is meant to do.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] net: Add eth_platform_get_mac_address() helper.

2016-01-06 Thread Joe Perches
On Wed, 2016-01-06 at 19:01 -0500, David Miller wrote:
> The caller checks for NULL, and this is the default implementation
> doing precisely what it is meant to do.

Yeah, I noticed that later. Nevermind...

cheers, Joe
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [ovs-dev] [PATCH net 0/2] vxlan: Set a large MTU on ovs-created vxlan devices

2016-01-06 Thread Jesse Gross
On Wed, Jan 6, 2016 at 3:25 PM, David Wragg  wrote:
> David Miller  writes:
>>> Prior to 4.3, openvswitch vxlan vports could transmit vxlan packets of
>>> any size, constrained only by the ability to transmit the resulting
>>> UDP packets.  4.3 introduced vxlan netdevs corresponding to vxlan
>>> vports.  These netdevs have an MTU, which limits the size of a packet
>>> that can be successfully vxlan-encapsulated.  The default value for
>>> this MTU is 1500, which is awkwardly small, and leads to a conspicuous
>>> change in behaviour for userspace.
>>>
>>> These two patches set the MTU on openvswitch-crated vxlan devices to
>>> be 65465 (the maximum IP packet size minus the vxlan-on-IPv6
>>> overhead), effectively restoring the behaviour prior to 4.3.  In order
>>> to accomplish this, the first patch removes the MTU constraint of 1500
>>> for vxlan netdevs without an underlying device.
>>
>> Is this really the right thing to do?
>
> I'm certainly open to suggestions of better ways to solve the problem.

One option is to simply set the MTU on the device from userspace.

The reality is that the code you're modifying is compatibility code.
Maybe we should make this change to preserve the old behavior for old
callers (although, again, it should not be just for VXLAN). But no new
features or tunnel types will be supported in this manner.

New or updated userspace programs should work by simply creating and
adding tunnel devices to OVS. That won't go through this path at all
so you're going to need to find another approach in the near future in
any case.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG] skb corruption and kernel panic at forwarding with fragmentation

2016-01-06 Thread Florian Westphal
Florian Westphal  wrote:
> Thadeu Lima de Souza Cascardo  wrote:
> > On Wed, Jan 06, 2016 at 11:11:41PM +0300, Konstantin Khlebnikov wrote:

[ skb_gso_segment uses skb->cb[], causes oops if ip_fragment is invoked
  on segmented skbs ]

> > I have hit this as well, this fixes it for me on an older kernel. Can you 
> > try it
> > on latest kernel?
> 
> > diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
> > index d8a1745..f44bc91 100644
> > --- a/net/ipv4/ip_output.c
> > +++ b/net/ipv4/ip_output.c
> > @@ -216,6 +216,7 @@ static int ip_finish_output_gso(struct sk_buff *skb)
> > netdev_features_t features;
> > struct sk_buff *segs;
> > int ret = 0;
> > +   struct inet_skb_parm ipcb;
> >  
> > if (skb_gso_network_seglen(skb) <= ip_skb_dst_mtu(skb))
> > return ip_finish_output2(skb);
> > @@ -227,6 +228,10 @@ static int ip_finish_output_gso(struct sk_buff *skb)
> >  * 2) skb arrived via virtio-net, we thus get TSO/GSO skbs directly
> >  * from host network stack.
> >  */
> > +   /* We need to save IPCB here because skb_gso_segment will use
> > +* SKB_GSO_CB.
> > +*/
> > +   ipcb = *IPCB(skb);
> > features = netif_skb_features(skb);
> > segs = skb_gso_segment(skb, features & ~NETIF_F_GSO_MASK);
> > if (IS_ERR_OR_NULL(segs)) {
> > @@ -241,6 +246,7 @@ static int ip_finish_output_gso(struct sk_buff *skb)
> > int err;
> >  
> > segs->next = NULL;
> > +   *IPCB(segs) = ipcb;
> > err = ip_fragment(segs, ip_finish_output2);
> >  
> > if (err && ret == 0)
> 
> I'm worried that this doesn't solve all cases. f.e. xfrm may also
> call skb_gso_segment(), and it will call into ipv4/ipv6 netfilter
> postrouting + ipv4 output functions...
> 
> nfqnl_enqueue_packet() is also affected.

... but it seems that those three are the only affected callers
of skb_gso_segment (tbf is ok since skb isn't owned by anyone,
ovs does save/restore already).

I think this patch is the right way, we just need similar
save/restore in nfqnl_enqueue_packet and xfrm_output_gso().

The latter two can be used by either ipv4 or ipv6 so it might
be preferable to just save/restore sizeof(struct skb_gso_cb);
or a union of inet_skb_parm+inet6_skb_parm.

Cascardo, can you cook a patch?

Thanks!
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] net: Add eth_platform_get_mac_address() helper.

2016-01-06 Thread Bjørn Mork
Joe Perches  writes:
> On Thu, 2016-01-07 at 00:26 +0100, Bjørn Mork wrote:
>> Joe Perches  writes:
>> > On Wed, 2016-01-06 at 16:33 -0500, David Miller wrote:
>> > > A repeating pattern in drivers has become to use OF node information
>> > > and, if not found, platform specific host information to extract the
>> > > ethernet address for a given device.
>> > []
>> > > diff --git a/include/linux/etherdevice.h b/include/linux/etherdevice.h
>> > []
>> > > @@ -485,3 +487,32 @@ static int __init eth_offload_init(void)
>> > >  }
>> > >  
>> > >  fs_initcall(eth_offload_init);
>> > > +
>> > > +unsigned char * __weak arch_get_platform_mac_address(void)
>> > > +{
>> > > +   return NULL;
>> > 
>> > WARN_ON_ONCE ?
>> 
>> That would prevent a driver from using this with additional fallback
>> methods.  For what reason?
>
> It's declared __weak and should be overridden by
> an arch specific function.

"should"?  Why?  Do you have a suggested implementation for - say - x86?

> A NULL address would cause a fault when using
> a function like copy_ether_addr.

You should not do that then :)


Bjørn
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH net-next v3 4/8] benet: add rtnl lock protection around be_open in be_resume

2016-01-06 Thread Hannes Frederic Sowa
Cc: Sathya Perla 
Cc: Ajit Khaparde 
Cc: Padmanabh Ratnakar 
Cc: Sriharsha Basavapatna 
Signed-off-by: Hannes Frederic Sowa 
---
 drivers/net/ethernet/emulex/benet/be_main.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/emulex/benet/be_main.c 
b/drivers/net/ethernet/emulex/benet/be_main.c
index f99de3657ce3b5..3200f48ddd5a68 100644
--- a/drivers/net/ethernet/emulex/benet/be_main.c
+++ b/drivers/net/ethernet/emulex/benet/be_main.c
@@ -4846,11 +4846,12 @@ static int be_resume(struct be_adapter *adapter)
if (status)
return status;
 
-   if (netif_running(netdev)) {
+   rtnl_lock();
+   if (netif_running(netdev))
status = be_open(netdev);
-   if (status)
-   return status;
-   }
+   rtnl_unlock();
+   if (status)
+   return status;
 
netif_device_attach(netdev);
 
-- 
2.5.0

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH net-next v3 1/8] qlcnic: protect qlcnic_82xx_io_slot_reset with rtnl lock

2016-01-06 Thread Hannes Frederic Sowa
Cc: dept-gelinuxnic...@qlogic.com
Signed-off-by: Hannes Frederic Sowa 
---
 drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c 
b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
index 1205f6f9c94173..1c29105b6c364f 100644
--- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
+++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
@@ -3952,8 +3952,14 @@ static pci_ers_result_t 
qlcnic_82xx_io_error_detected(struct pci_dev *pdev,
 
 static pci_ers_result_t qlcnic_82xx_io_slot_reset(struct pci_dev *pdev)
 {
-   return qlcnic_attach_func(pdev) ? PCI_ERS_RESULT_DISCONNECT :
-   PCI_ERS_RESULT_RECOVERED;
+   pci_ers_result_t res;
+
+   rtnl_lock();
+   res = qlcnic_attach_func(pdev) ? PCI_ERS_RESULT_DISCONNECT :
+PCI_ERS_RESULT_RECOVERED;
+   rtnl_unlock();
+
+   return res;
 }
 
 static void qlcnic_82xx_io_resume(struct pci_dev *pdev)
-- 
2.5.0

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH net-next v3 5/8] fm10k: add rtnl lock protection in fm10k_io_resume

2016-01-06 Thread Hannes Frederic Sowa
Cc: Jeff Kirsher 
Signed-off-by: Hannes Frederic Sowa 
---
 drivers/net/ethernet/intel/fm10k/fm10k_pci.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c 
b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
index 4eb7a6fa6b0ddc..846d9b731c8c14 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
@@ -2350,8 +2350,10 @@ static void fm10k_io_resume(struct pci_dev *pdev)
/* reset clock */
fm10k_ts_reset(interface);
 
+   rtnl_lock();
if (netif_running(netdev))
err = fm10k_open(netdev);
+   rtnl_lock();
 
/* final check of hardware state before registering the interface */
err = err ? : fm10k_hw_ready(interface);
-- 
2.5.0

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH net-next v3 8/8] geneve: break dependency to network drivers

2016-01-06 Thread Hannes Frederic Sowa
Signed-off-by: Hannes Frederic Sowa 
---
 drivers/net/geneve.c | 30 +++---
 include/net/geneve.h |  7 +++
 2 files changed, 30 insertions(+), 7 deletions(-)

diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index 24b077a32c1c9c..5b8d728b1204be 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -1110,7 +1110,7 @@ static struct device_type geneve_type = {
  * supply the listening GENEVE udp ports. Callers are expected
  * to implement the ndo_add_geneve_port.
  */
-void geneve_get_rx_port(struct net_device *dev)
+static void geneve_notify_refresh_netdev(struct net_device *dev)
 {
struct net *net = dev_net(dev);
struct geneve_net *gn = net_generic(net, geneve_net_id);
@@ -1128,7 +1128,6 @@ void geneve_get_rx_port(struct net_device *dev)
}
rcu_read_unlock();
 }
-EXPORT_SYMBOL_GPL(geneve_get_rx_port);
 
 /* Initialize the device structure. */
 static void geneve_setup(struct net_device *dev)
@@ -1450,6 +1449,24 @@ struct net_device *geneve_dev_create_fb(struct net *net, 
const char *name,
 }
 EXPORT_SYMBOL_GPL(geneve_dev_create_fb);
 
+static int geneve_notifier(struct notifier_block *unused,
+  unsigned long event, void *ptr)
+{
+   struct net_device *dev = netdev_notifier_info_to_dev(ptr);
+
+   switch (event) {
+   case NETDEV_REFRESH_OFFLOADS:
+   geneve_notify_refresh_netdev(dev);
+   break;
+   }
+
+   return NOTIFY_DONE;
+}
+
+static struct notifier_block geneve_notifier_block __read_mostly = {
+   .notifier_call = geneve_notifier,
+};
+
 static __net_init int geneve_init_net(struct net *net)
 {
struct geneve_net *gn = net_generic(net, geneve_net_id);
@@ -1502,11 +1519,17 @@ static int __init geneve_init_module(void)
if (rc)
goto out1;
 
-   rc = rtnl_link_register(&geneve_link_ops);
+   rc = register_netdevice_notifier(&geneve_notifier_block);
if (rc)
goto out2;
 
+   rc = rtnl_link_register(&geneve_link_ops);
+   if (rc)
+   goto out3;
+
return 0;
+out3:
+   unregister_netdevice_notifier(&geneve_notifier_block);
 out2:
unregister_pernet_subsys(&geneve_net_ops);
 out1:
@@ -1517,6 +1540,7 @@ late_initcall(geneve_init_module);
 static void __exit geneve_cleanup_module(void)
 {
rtnl_link_unregister(&geneve_link_ops);
+   unregister_netdevice_notifier(&geneve_notifier_block);
unregister_pernet_subsys(&geneve_net_ops);
 }
 module_exit(geneve_cleanup_module);
diff --git a/include/net/geneve.h b/include/net/geneve.h
index e6c23dc765f7ec..7d52077b72faa3 100644
--- a/include/net/geneve.h
+++ b/include/net/geneve.h
@@ -1,6 +1,8 @@
 #ifndef __NET_GENEVE_H
 #define __NET_GENEVE_H  1
 
+#include 
+
 #ifdef CONFIG_INET
 #include 
 #endif
@@ -62,13 +64,10 @@ struct genevehdr {
struct geneve_opt options[];
 };
 
-#if IS_ENABLED(CONFIG_GENEVE)
-void geneve_get_rx_port(struct net_device *netdev);
-#else
 static inline void geneve_get_rx_port(struct net_device *netdev)
 {
+   call_netdevice_notifiers(NETDEV_REFRESH_OFFLOADS, netdev);
 }
-#endif
 
 #ifdef CONFIG_INET
 struct net_device *geneve_dev_create_fb(struct net *net, const char *name,
-- 
2.5.0

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH net-next v3 3/8] ixgbe: add rtnl locking in service task around vxlan_get_rx_port

2016-01-06 Thread Hannes Frederic Sowa
Protect IXGBE_FLAG2_VXLAN_REREG_NEEDED flag with rtnl_lock as it seems
it could be concurrently modified by ixgbe_set_features and
ixgbe_del_vxlan_port.

Cc: Jeff Kirsher 
Signed-off-by: Hannes Frederic Sowa 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c 
b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index ea9537d0e63a9d..8bef07196c804d 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -7115,10 +7115,12 @@ static void ixgbe_service_task(struct work_struct *work)
return;
}
 #ifdef CONFIG_IXGBE_VXLAN
+   rtnl_lock();
if (adapter->flags2 & IXGBE_FLAG2_VXLAN_REREG_NEEDED) {
adapter->flags2 &= ~IXGBE_FLAG2_VXLAN_REREG_NEEDED;
vxlan_get_rx_port(adapter->netdev);
}
+   rtnl_unlock();
 #endif /* CONFIG_IXGBE_VXLAN */
ixgbe_reset_subtask(adapter);
ixgbe_phy_interrupt_subtask(adapter);
-- 
2.5.0

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH net-next v3 7/8] vxlan: break dependency to network drivers

2016-01-06 Thread Hannes Frederic Sowa
Signed-off-by: Hannes Frederic Sowa 
---
 drivers/net/vxlan.c | 17 +++--
 include/net/vxlan.h |  5 +
 2 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index fecf7b6c732e96..10f1304d58e1ea 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -2468,7 +2468,7 @@ static struct device_type vxlan_type = {
  * supply the listening VXLAN udp ports. Callers are expected
  * to implement the ndo_add_vxlan_port.
  */
-void vxlan_get_rx_port(struct net_device *dev)
+static void vxlan_notify_refresh_netdev(struct net_device *dev)
 {
struct vxlan_sock *vs;
struct net *net = dev_net(dev);
@@ -2488,7 +2488,6 @@ void vxlan_get_rx_port(struct net_device *dev)
}
spin_unlock(&vn->sock_lock);
 }
-EXPORT_SYMBOL_GPL(vxlan_get_rx_port);
 
 /* Initialize the device structure. */
 static void vxlan_setup(struct net_device *dev)
@@ -3164,20 +3163,26 @@ static void vxlan_handle_lowerdev_unregister(struct 
vxlan_net *vn,
unregister_netdevice_many(&list_kill);
 }
 
-static int vxlan_lowerdev_event(struct notifier_block *unused,
-   unsigned long event, void *ptr)
+static int vxlan_notifier(struct notifier_block *unused,
+ unsigned long event, void *ptr)
 {
struct net_device *dev = netdev_notifier_info_to_dev(ptr);
struct vxlan_net *vn = net_generic(dev_net(dev), vxlan_net_id);
 
-   if (event == NETDEV_UNREGISTER)
+   switch (event) {
+   case NETDEV_REFRESH_OFFLOADS:
+   vxlan_notify_refresh_netdev(dev);
+   break;
+   case NETDEV_UNREGISTER:
vxlan_handle_lowerdev_unregister(vn, dev);
+   break;
+   }
 
return NOTIFY_DONE;
 }
 
 static struct notifier_block vxlan_notifier_block __read_mostly = {
-   .notifier_call = vxlan_lowerdev_event,
+   .notifier_call = vxlan_notifier,
 };
 
 static __net_init int vxlan_init_net(struct net *net)
diff --git a/include/net/vxlan.h b/include/net/vxlan.h
index 0fb86442544b26..48d0450160c91f 100644
--- a/include/net/vxlan.h
+++ b/include/net/vxlan.h
@@ -242,13 +242,10 @@ static inline netdev_features_t 
vxlan_features_check(struct sk_buff *skb,
 /* IPv6 header + UDP + VXLAN + Ethernet header */
 #define VXLAN6_HEADROOM (40 + 8 + 8 + 14)
 
-#if IS_ENABLED(CONFIG_VXLAN)
-void vxlan_get_rx_port(struct net_device *netdev);
-#else
 static inline void vxlan_get_rx_port(struct net_device *netdev)
 {
+   call_netdevice_notifiers(NETDEV_REFRESH_OFFLOADS, netdev);
 }
-#endif
 
 static inline unsigned short vxlan_get_sk_family(struct vxlan_sock *vs)
 {
-- 
2.5.0

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH net-next v3 6/8] netdev: add netdevice notifier type to trigger a reprogramming of offloads

2016-01-06 Thread Hannes Frederic Sowa
Signed-off-by: Hannes Frederic Sowa 
---
 include/linux/netdevice.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 8d8e5ca951b493..9750e46760695d 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2183,6 +2183,7 @@ struct netdev_lag_lower_state_info {
 #define NETDEV_BONDING_INFO0x0019
 #define NETDEV_PRECHANGEUPPER  0x001A
 #define NETDEV_CHANGELOWERSTATE0x001B
+#define NETDEV_REFRESH_OFFLOADS0x001C
 
 int register_netdevice_notifier(struct notifier_block *nb);
 int unregister_netdevice_notifier(struct notifier_block *nb);
-- 
2.5.0

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH net-next v3 2/8] mlx4: add rtnl lock protection in mlx4_en_restart

2016-01-06 Thread Hannes Frederic Sowa
I don't really understand the use of the state_lock. Can't it be simply
replaced by rtnl_lock? It seems a lot of current users depend on
rtnl_lock anyway.

Anyway, fix this up for the moment.

Cc: Eugenia Emantayev 
Signed-off-by: Hannes Frederic Sowa 
---
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c 
b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index 0c7e3f69a73bb6..94abd0843901cf 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -1846,6 +1846,7 @@ static void mlx4_en_restart(struct work_struct *work)
 
en_dbg(DRV, priv, "Watchdog task called for port %d\n", priv->port);
 
+   rtnl_lock();
mutex_lock(&mdev->state_lock);
if (priv->port_up) {
mlx4_en_stop_port(dev, 1);
@@ -1853,6 +1854,7 @@ static void mlx4_en_restart(struct work_struct *work)
en_err(priv, "Failed restarting port %d\n", priv->port);
}
mutex_unlock(&mdev->state_lock);
+   rtnl_unlock();
 }
 
 static void mlx4_en_clear_stats(struct net_device *dev)
-- 
2.5.0

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH net-next v3 0/8] net: break dependency of drivers on geneve and vxlan

2016-01-06 Thread Hannes Frederic Sowa
Device drivers which support geneve or vxlan offloading have a dependency
on the correlating tunnel kernel modules. Thus those drivers automatically
load the geneve or vxlan modules. Break this dependency with this
small series.

Additionally this series features a review of the respective ->ndo_open
and other functions around vxlan_get_rx_port and geneve_get_rx_port.

* Result:
$ cd drivers/net/ethernet/
$ find . -name '*.ko' | xargs modinfo | egrep  '^depends:.*(vxlan|geneve)'  | 
wc -l
0

I also incorporated feedback from Jesse Gross to only use one new
netdevice notifiers type, namely NETDEV_REFRESH_OFFLOADS. Otherwise this
series is very much the same as v1.

Hannes Frederic Sowa (8):
  qlcnic: protect qlcnic_82xx_io_slot_reset with rtnl lock
  mlx4: add rtnl lock protection in mlx4_en_restart
  ixgbe: add rtnl locking in service task around vxlan_get_rx_port
  benet: add rtnl lock protection around be_open in be_resume
  fm10k: add rtnl lock protection in fm10k_io_resume
  netdev: add netdevice notifier type to trigger a reprogramming of
offloads
  vxlan: break dependency to network drivers
  geneve: break dependency to network drivers

 drivers/net/ethernet/emulex/benet/be_main.c  |  9 +++
 drivers/net/ethernet/intel/fm10k/fm10k_pci.c |  2 ++
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c|  2 ++
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c   |  2 ++
 drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c | 10 ++--
 drivers/net/geneve.c | 30 +---
 drivers/net/vxlan.c  | 17 +-
 include/linux/netdevice.h|  1 +
 include/net/geneve.h |  7 +++---
 include/net/vxlan.h  |  5 +---
 10 files changed, 62 insertions(+), 23 deletions(-)

-- 
2.5.0

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] net: Add eth_platform_get_mac_address() helper.

2016-01-06 Thread Joe Perches
On Thu, 2016-01-07 at 00:26 +0100, Bjørn Mork wrote:
> Joe Perches  writes:
> > On Wed, 2016-01-06 at 16:33 -0500, David Miller wrote:
> > > A repeating pattern in drivers has become to use OF node information
> > > and, if not found, platform specific host information to extract the
> > > ethernet address for a given device.
> > []
> > > diff --git a/include/linux/etherdevice.h b/include/linux/etherdevice.h
> > []
> > > @@ -485,3 +487,32 @@ static int __init eth_offload_init(void)
> > >  }
> > >  
> > >  fs_initcall(eth_offload_init);
> > > +
> > > +unsigned char * __weak arch_get_platform_mac_address(void)
> > > +{
> > > +   return NULL;
> > 
> > WARN_ON_ONCE ?
> 
> That would prevent a driver from using this with additional fallback
> methods.  For what reason?

It's declared __weak and should be overridden by
an arch specific function.

A NULL address would cause a fault when using
a function like copy_ether_addr.


> I don't have a specific usecase, but I can imagine drivers falling back
> to e.g a random address without wanting to be noisy about it.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] net: Add eth_platform_get_mac_address() helper.

2016-01-06 Thread Bjørn Mork
Joe Perches  writes:
> On Wed, 2016-01-06 at 16:33 -0500, David Miller wrote:
>> A repeating pattern in drivers has become to use OF node information
>> and, if not found, platform specific host information to extract the
>> ethernet address for a given device.
> []
>> diff --git a/include/linux/etherdevice.h b/include/linux/etherdevice.h
> []
>> @@ -485,3 +487,32 @@ static int __init eth_offload_init(void)
>>  }
>>  
>>  fs_initcall(eth_offload_init);
>> +
>> +unsigned char * __weak arch_get_platform_mac_address(void)
>> +{
>> +   return NULL;
>
> WARN_ON_ONCE ?

That would prevent a driver from using this with additional fallback
methods.  For what reason?

I don't have a specific usecase, but I can imagine drivers falling back
to e.g a random address without wanting to be noisy about it.


Bjørn
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net 0/2] vxlan: Set a large MTU on ovs-created vxlan devices

2016-01-06 Thread David Wragg
David Miller  writes:
>> Prior to 4.3, openvswitch vxlan vports could transmit vxlan packets of
>> any size, constrained only by the ability to transmit the resulting
>> UDP packets.  4.3 introduced vxlan netdevs corresponding to vxlan
>> vports.  These netdevs have an MTU, which limits the size of a packet
>> that can be successfully vxlan-encapsulated.  The default value for
>> this MTU is 1500, which is awkwardly small, and leads to a conspicuous
>> change in behaviour for userspace.
>> 
>> These two patches set the MTU on openvswitch-crated vxlan devices to
>> be 65465 (the maximum IP packet size minus the vxlan-on-IPv6
>> overhead), effectively restoring the behaviour prior to 4.3.  In order
>> to accomplish this, the first patch removes the MTU constraint of 1500
>> for vxlan netdevs without an underlying device.
>
> Is this really the right thing to do?

I'm certainly open to suggestions of better ways to solve the problem.

To be clear, the problem from our perspective is that a use of the
kernel openvswitch that worked fine in 4.2 and earlier is hobbled in
4.3.  Previously the MTU of an openvswitch-based vxlan overlay network
was constrained only by the MTU of the physical network.  In 4.3, we
can't take advantage of physical networks that support jumbo frames,
causing a huge hit to throughput across the overlay network.

The specific limit of 1500 seems very arbitrary.  For a vxlan overlay
network on top of a traditional ethernet network, the "correct" MTU for
the vxlan netdevs is 1450 rather than 1500.  And in general with
openvswitch, the destination for vxlan packets is determined on a
packet-by-packet basis, possibly involving different path MTUs of the
underlying network.  There is no single "correct" value.

> Won't we get a lot of fragmentation
> by using such a large MTU, especially since you're making it the default
> for OVS setups?

In the context of the openvswitch vxlan vport transmit path, I can't
find a place where the dev->mtu is used (and it would be surprising, on
the basis that the relevant parts of vxlan.c have not changed that much
since 4.2, when no netdev was involved in that path).

Considering non-openvswitch scenarios, when using vxlan netdevs
directly, a vxlan netdev locked to an underlying device supporting jumbo
frames can use a larger MTU.  It's only vxlan netdevs without an
underlying device that have the limit of 1500 imposed.  But why
shouldn't there be the same flexibility to select an MTU for best
performance in both cases?  Aren't the fragmentation concerns the same?

> Things like path MTU discovery hinge strongly upon accurate MTU settings.
> Otherwise they won't function properly.

True.  But in what sense is 1500 accurate?  Uses/users of the kernel
openvswitch code have always had to get this right, making sure that the
MTU set on a vxlan overlay network conforms to the underlying network
paths involved.

David
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] net: Add eth_platform_get_mac_address() helper.

2016-01-06 Thread Joe Perches
On Wed, 2016-01-06 at 16:33 -0500, David Miller wrote:
> A repeating pattern in drivers has become to use OF node information
> and, if not found, platform specific host information to extract the
> ethernet address for a given device.
[]
> diff --git a/include/linux/etherdevice.h b/include/linux/etherdevice.h
[]
> @@ -485,3 +487,32 @@ static int __init eth_offload_init(void)
>  }
>  
>  fs_initcall(eth_offload_init);
> +
> +unsigned char * __weak arch_get_platform_mac_address(void)
> +{
> +   return NULL;

WARN_ON_ONCE ?

> +}
> +
> +int eth_platform_get_mac_address(struct device *dev, u8 *mac_addr)
> +{
> +   const unsigned char *addr;
> +   struct device_node *dp;
> +
> +   if (dev_is_pci(dev))
> +   dp = pci_device_to_OF_node(to_pci_dev(dev));
> +   else
> +   dp = dev->of_node;
> +
> +   addr = NULL;
> +   if (dp)
> +   addr = of_get_mac_address(dp);
> +   if (!addr)
> +   addr = arch_get_platform_mac_address();
> +
> +   if (!addr)
> +   return -ENODEV;
> +
> +   ether_addr_copy(mac_addr, addr);

Couldn't some oddball arch have an unaligned addr?

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Fwd: [PATCH net 0/3] ipv4: fix various issues reported by sparse

2016-01-06 Thread Lance Richardson
- Forwarded Message -
> This trivial patch series addresses a number of endianness- and
> lock-related issues reported by sparse.
> 
> Lance Richardson (3):
>   ipv4: fix endianness warnings in ip_tunnel_core.c
>   ipv4: eliminate endianness warnings in ip_fib.h
>   ipv4: eliminate lock count warnings in ping.c
> 
>  include/net/ip_fib.h  |  3 ++-
>  net/ipv4/ip_tunnel_core.c | 16 
>  net/ipv4/ping.c   |  2 ++
>  3 files changed, 12 insertions(+), 9 deletions(-)
> 
> --
> 2.5.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

Apologies, I must have fumbled the "--cc-cmd=" option to git send-email.

Resending cover letter with cc: list.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net 0/2] vxlan: Set a large MTU on ovs-created vxlan devices

2016-01-06 Thread Jesse Gross
On Wed, Jan 6, 2016 at 12:59 PM, David Miller  wrote:
> From: David Wragg 
> Date: Wed,  6 Jan 2016 13:33:04 +
>
>> Prior to 4.3, openvswitch vxlan vports could transmit vxlan packets of
>> any size, constrained only by the ability to transmit the resulting
>> UDP packets.  4.3 introduced vxlan netdevs corresponding to vxlan
>> vports.  These netdevs have an MTU, which limits the size of a packet
>> that can be successfully vxlan-encapsulated.  The default value for
>> this MTU is 1500, which is awkwardly small, and leads to a conspicuous
>> change in behaviour for userspace.
>>
>> These two patches set the MTU on openvswitch-crated vxlan devices to
>> be 65465 (the maximum IP packet size minus the vxlan-on-IPv6
>> overhead), effectively restoring the behaviour prior to 4.3.  In order
>> to accomplish this, the first patch removes the MTU constraint of 1500
>> for vxlan netdevs without an underlying device.
>
> Is this really the right thing to do?  Won't we get a lot of fragmentation
> by using such a large MTU, especially since you're making it the default
> for OVS setups?
>
> Things like path MTU discovery hinge strongly upon accurate MTU settings.
> Otherwise they won't function properly.

At a minimum, I don't think this should be VXLAN specific. But I agree
that I'm not sure this is the right thing to do.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH net 3/3] ipv4: eliminate lock count warnings in ping.c

2016-01-06 Thread Lance Richardson
Add lock release/acquire annotations to ping_seq_start() and
ping_seq_stop() to satisfy sparse.

Signed-off-by: Lance Richardson 
---
 net/ipv4/ping.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/ipv4/ping.c b/net/ipv4/ping.c
index e89094a..c117b21 100644
--- a/net/ipv4/ping.c
+++ b/net/ipv4/ping.c
@@ -1063,6 +1063,7 @@ static struct sock *ping_get_idx(struct seq_file *seq, 
loff_t pos)
 }
 
 void *ping_seq_start(struct seq_file *seq, loff_t *pos, sa_family_t family)
+   __acquires(ping_table.lock)
 {
struct ping_iter_state *state = seq->private;
state->bucket = 0;
@@ -1094,6 +1095,7 @@ void *ping_seq_next(struct seq_file *seq, void *v, loff_t 
*pos)
 EXPORT_SYMBOL_GPL(ping_seq_next);
 
 void ping_seq_stop(struct seq_file *seq, void *v)
+   __releases(ping_table.lock)
 {
read_unlock_bh(&ping_table.lock);
 }
-- 
2.5.0

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH net 1/3] ipv4: fix endianness warnings in ip_tunnel_core.c

2016-01-06 Thread Lance Richardson
Eliminate endianness mismatch warnings (reported by sparse) in this file by
using appropriate nla_put_*()/nla_get_*() calls.

Signed-off-by: Lance Richardson 
---
 net/ipv4/ip_tunnel_core.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/net/ipv4/ip_tunnel_core.c b/net/ipv4/ip_tunnel_core.c
index 6cb9009..bf70c1c 100644
--- a/net/ipv4/ip_tunnel_core.c
+++ b/net/ipv4/ip_tunnel_core.c
@@ -251,7 +251,7 @@ static int ip_tun_build_state(struct net_device *dev, 
struct nlattr *attr,
tun_info = lwt_tun_info(new_state);
 
if (tb[LWTUNNEL_IP_ID])
-   tun_info->key.tun_id = nla_get_u64(tb[LWTUNNEL_IP_ID]);
+   tun_info->key.tun_id = nla_get_be64(tb[LWTUNNEL_IP_ID]);
 
if (tb[LWTUNNEL_IP_DST])
tun_info->key.u.ipv4.dst = nla_get_be32(tb[LWTUNNEL_IP_DST]);
@@ -266,7 +266,7 @@ static int ip_tun_build_state(struct net_device *dev, 
struct nlattr *attr,
tun_info->key.tos = nla_get_u8(tb[LWTUNNEL_IP_TOS]);
 
if (tb[LWTUNNEL_IP_FLAGS])
-   tun_info->key.tun_flags = nla_get_u16(tb[LWTUNNEL_IP_FLAGS]);
+   tun_info->key.tun_flags = nla_get_be16(tb[LWTUNNEL_IP_FLAGS]);
 
tun_info->mode = IP_TUNNEL_INFO_TX;
tun_info->options_len = 0;
@@ -281,12 +281,12 @@ static int ip_tun_fill_encap_info(struct sk_buff *skb,
 {
struct ip_tunnel_info *tun_info = lwt_tun_info(lwtstate);
 
-   if (nla_put_u64(skb, LWTUNNEL_IP_ID, tun_info->key.tun_id) ||
+   if (nla_put_be64(skb, LWTUNNEL_IP_ID, tun_info->key.tun_id) ||
nla_put_be32(skb, LWTUNNEL_IP_DST, tun_info->key.u.ipv4.dst) ||
nla_put_be32(skb, LWTUNNEL_IP_SRC, tun_info->key.u.ipv4.src) ||
nla_put_u8(skb, LWTUNNEL_IP_TOS, tun_info->key.tos) ||
nla_put_u8(skb, LWTUNNEL_IP_TTL, tun_info->key.ttl) ||
-   nla_put_u16(skb, LWTUNNEL_IP_FLAGS, tun_info->key.tun_flags))
+   nla_put_be16(skb, LWTUNNEL_IP_FLAGS, tun_info->key.tun_flags))
return -ENOMEM;
 
return 0;
@@ -346,7 +346,7 @@ static int ip6_tun_build_state(struct net_device *dev, 
struct nlattr *attr,
tun_info = lwt_tun_info(new_state);
 
if (tb[LWTUNNEL_IP6_ID])
-   tun_info->key.tun_id = nla_get_u64(tb[LWTUNNEL_IP6_ID]);
+   tun_info->key.tun_id = nla_get_be64(tb[LWTUNNEL_IP6_ID]);
 
if (tb[LWTUNNEL_IP6_DST])
tun_info->key.u.ipv6.dst = 
nla_get_in6_addr(tb[LWTUNNEL_IP6_DST]);
@@ -361,7 +361,7 @@ static int ip6_tun_build_state(struct net_device *dev, 
struct nlattr *attr,
tun_info->key.tos = nla_get_u8(tb[LWTUNNEL_IP6_TC]);
 
if (tb[LWTUNNEL_IP6_FLAGS])
-   tun_info->key.tun_flags = nla_get_u16(tb[LWTUNNEL_IP6_FLAGS]);
+   tun_info->key.tun_flags = nla_get_be16(tb[LWTUNNEL_IP6_FLAGS]);
 
tun_info->mode = IP_TUNNEL_INFO_TX | IP_TUNNEL_INFO_IPV6;
tun_info->options_len = 0;
@@ -376,12 +376,12 @@ static int ip6_tun_fill_encap_info(struct sk_buff *skb,
 {
struct ip_tunnel_info *tun_info = lwt_tun_info(lwtstate);
 
-   if (nla_put_u64(skb, LWTUNNEL_IP6_ID, tun_info->key.tun_id) ||
+   if (nla_put_be64(skb, LWTUNNEL_IP6_ID, tun_info->key.tun_id) ||
nla_put_in6_addr(skb, LWTUNNEL_IP6_DST, &tun_info->key.u.ipv6.dst) 
||
nla_put_in6_addr(skb, LWTUNNEL_IP6_SRC, &tun_info->key.u.ipv6.src) 
||
nla_put_u8(skb, LWTUNNEL_IP6_HOPLIMIT, tun_info->key.tos) ||
nla_put_u8(skb, LWTUNNEL_IP6_TC, tun_info->key.ttl) ||
-   nla_put_u16(skb, LWTUNNEL_IP6_FLAGS, tun_info->key.tun_flags))
+   nla_put_be16(skb, LWTUNNEL_IP6_FLAGS, tun_info->key.tun_flags))
return -ENOMEM;
 
return 0;
-- 
2.5.0

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH net 0/3] ipv4: fix various issues reported by sparse

2016-01-06 Thread Lance Richardson
This trivial patch series addresses a number of endianness- and
lock-related issues reported by sparse.

Lance Richardson (3):
  ipv4: fix endianness warnings in ip_tunnel_core.c
  ipv4: eliminate endianness warnings in ip_fib.h
  ipv4: eliminate lock count warnings in ping.c

 include/net/ip_fib.h  |  3 ++-
 net/ipv4/ip_tunnel_core.c | 16 
 net/ipv4/ping.c   |  2 ++
 3 files changed, 12 insertions(+), 9 deletions(-)

-- 
2.5.0

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH net 2/3] ipv4: eliminate endianness warnings in ip_fib.h

2016-01-06 Thread Lance Richardson
fib_multipath_hash() computes a hash using __be32 values, force
cast these to u32 to pacify sparse.

Signed-off-by: Lance Richardson 
---
 include/net/ip_fib.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index 9f4df68..7029527 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -325,7 +325,8 @@ extern u32 fib_multipath_secret __read_mostly;
 
 static inline int fib_multipath_hash(__be32 saddr, __be32 daddr)
 {
-   return jhash_2words(saddr, daddr, fib_multipath_secret) >> 1;
+   return jhash_2words((__force u32)saddr, (__force u32)daddr,
+   fib_multipath_secret) >> 1;
 }
 
 void fib_select_multipath(struct fib_result *res, int hash);
-- 
2.5.0

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next] bpf: cleanup bpf_prog_run_{save,clear}_cb helpers

2016-01-06 Thread Alexei Starovoitov
On Wed, Jan 06, 2016 at 10:32:16PM +0100, Daniel Borkmann wrote:
> Move the details behind the cb[] access into a small helper to decouple
> and make them generic for bpf_prog_run_save_cb()/bpf_prog_run_clear_cb()
> that was introduced via commit ff936a04e5f2 ("bpf: fix cb access in socket
> filter programs"). Also add a comment to better clarify what is done in
> bpf_skb_cb().
> 
> Signed-off-by: Daniel Borkmann 

Thanks. Great catch.
Acked-by: Alexei Starovoitov 

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] arm64: net: bpf: don't BUG() on large shifts

2016-01-06 Thread Alexei Starovoitov
On Wed, Jan 06, 2016 at 09:31:27PM +0100, Rabin Vincent wrote:
> On Tue, Jan 05, 2016 at 09:55:58AM -0800, Alexei Starovoitov wrote:
> > this one is better to be addressed in verifier instead of eBPF JITs.
> > Please reject it in check_alu_op() instead.
> 
> AFAICS the eBPF verifier is not called on the eBPF filters generated by
> the BPF->eBPF conversion in net/core/filter.c, so performing this check
> only in check_alu_op() will be insufficient.  So I think we'd need to
> add this check to bpf_check_classic() too.  Or am I missing something?

correct. the check is needed in bpf_check_classic() too and I believe
it's ok to tighten it up in this case, since >32 shift is
invalid/undefined anyway. We can either accept it as nop or K&=31
or error. I think returning error is more user friendly long term, though
there is a small risk of rejecting previously loadable broken programs.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[GIT] Networking

2016-01-06 Thread David Miller

As usual, there are a couple straggler bug fixes:

1) qlcnic_alloc_mbx_args() error returns are not checked in qlcnic driver.
   Fix from Insu Yun.

2) SKB refcounting bug in connector, from Florian Westphal.

3) vrf_get_saddr() has to propagate fib_lookup() errors to it's callers,
   from David Ahern.

4) Fix AF_UNIX splice/bind deadlock, from Rainer Weikusat.

5) qdisc_rcu_free() fails to free the per-cpu qstats.  Fix from John
   Fastabend.

6) vmxnet3 driver passes wrong page to dma_map_page(), fix from
   Shrikrishna Khare.

7) Don't allow zero cwnd in tcp_cwnd_reduction(), from Yuchung Cheng.

Please pull, thanks a lot!

The following changes since commit 9c982e86dbdbaa3fb248dfc776a32cbc8927:

  Merge tag 'pci-v4.4-fixes-3' of 
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci (2015-12-31 14:59:21 
-0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git 

for you to fetch changes up to 8b8a321ff72c785ed5e8b4cf6eda20b35d427390:

  tcp: fix zero cwnd in tcp_cwnd_reduction (2016-01-06 16:39:56 -0500)


Alan (1):
  mkiss: fix scribble on freed memory

David Ahern (1):
  net: Propagate lookup failure in l3mdev_get_saddr to caller

Florian Westphal (1):
  connector: bump skb->users before callback invocation

Francesco Ruggeri (1):
  net: possible use after free in dst_release

Hannes Frederic Sowa (1):
  bridge: Only call /sbin/bridge-stp for the initial network namespace

Insu Yun (2):
  qlcnic: correctly handle qlcnic_alloc_mbx_args
  cxgb4: correctly handling failed allocation

John Fastabend (1):
  net: sched: fix missing free per cpu on qstats

Kristian Evensen (1):
  net: qmi_wwan: Add WeTelecom-WPD600N

One Thousand Gnomes (1):
  6pack: fix free memory scribbles

Rabin Vincent (2):
  net: filter: make JITs zero A for SKF_AD_ALU_XOR_X
  ARM: net: bpf: fix zero right shift

Rainer Weikusat (1):
  af_unix: Fix splice-bind deadlock

Shrikrishna Khare (1):
  Driver: Vmxnet3: Fix regression caused by 5738a09

Yuchung Cheng (1):
  tcp: fix zero cwnd in tcp_cwnd_reduction

hayeswang (1):
  r8152: add reset_resume function

 arch/arm/net/bpf_jit_32.c   | 19 +++
 arch/mips/net/bpf_jit.c | 16 +---
 arch/powerpc/net/bpf_jit_comp.c | 13 ++---
 arch/sparc/net/bpf_jit_comp.c   | 17 ++---
 drivers/connector/connector.c   | 11 +++
 drivers/net/ethernet/chelsio/cxgb4/clip_tbl.c   |  4 
 drivers/net/ethernet/qlogic/qlcnic/qlcnic_ctx.c |  6 --
 drivers/net/hamradio/6pack.c|  6 ++
 drivers/net/hamradio/mkiss.c|  5 +
 drivers/net/usb/qmi_wwan.c  |  1 +
 drivers/net/usb/r8152.c | 10 +-
 drivers/net/vmxnet3/vmxnet3_drv.c   |  8 
 drivers/net/vmxnet3/vmxnet3_int.h   |  4 ++--
 drivers/net/vrf.c   | 10 +++---
 include/linux/filter.h  | 19 +++
 include/net/l3mdev.h| 16 ++--
 include/net/route.h |  7 ++-
 net/bridge/br_stp_if.c  |  5 -
 net/core/dst.c  |  3 ++-
 net/ipv4/raw.c  |  7 +--
 net/ipv4/tcp_input.c|  3 +++
 net/ipv4/udp.c  |  7 +--
 net/sched/sch_generic.c |  4 +++-
 net/unix/af_unix.c  | 66 
--
 24 files changed, 150 insertions(+), 117 deletions(-)
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG] skb corruption and kernel panic at forwarding with fragmentation

2016-01-06 Thread Florian Westphal
Thadeu Lima de Souza Cascardo  wrote:
> On Wed, Jan 06, 2016 at 11:11:41PM +0300, Konstantin Khlebnikov wrote:
> > On Wed, Jan 6, 2016 at 10:59 PM, Cong Wang  wrote:
> > > On Wed, Jan 6, 2016 at 11:15 AM, Konstantin Khlebnikov  
> > > wrote:
> > >> Looks like this happens because ip_options_fragment() relies on
> > >> correct ip options length in ip control block in skb. But in
> > >> ip_finish_output_gso() control block in segments is reused by
> > >> skb_gso_segment(). following ip_fragment() sees some garbage.
> > >>
> > >> In my case there was no ip options but length becomes non-zero and
> > >> ip_options_fragment() picked some bytes from payload and decides to
> > >> fill huge range with IPOPT_NOOP (1). One of that ones flipped nr_frags
> > >> in skb_shared_info at the end of data =)
> > >>
> > >
> > > Hmm, it looks like SKB_GSO_CB should be cleared after skb_gso_segment()
> > > since all the gso information should be saved in shared_info after it 
> > > finishes.
> > >
> > > Does a memset(0) on SKB_GSO_CB after skb_gso_segment() work as well?
> > 
> > This will break present logic around ip_options_fragment() - it clears
> > options from
> > second and following fragments. With zeroed cb it will do nothing.
> > 
> > ip_options_fragment() can get required information directly from ip header 
> > but
> > it also resets fields in IPCB -- probably it should stay valid here
> > and somebody else will use it later.
[..]
> I have hit this as well, this fixes it for me on an older kernel. Can you try 
> it
> on latest kernel?

> diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
> index d8a1745..f44bc91 100644
> --- a/net/ipv4/ip_output.c
> +++ b/net/ipv4/ip_output.c
> @@ -216,6 +216,7 @@ static int ip_finish_output_gso(struct sk_buff *skb)
>   netdev_features_t features;
>   struct sk_buff *segs;
>   int ret = 0;
> + struct inet_skb_parm ipcb;
>  
>   if (skb_gso_network_seglen(skb) <= ip_skb_dst_mtu(skb))
>   return ip_finish_output2(skb);
> @@ -227,6 +228,10 @@ static int ip_finish_output_gso(struct sk_buff *skb)
>* 2) skb arrived via virtio-net, we thus get TSO/GSO skbs directly
>* from host network stack.
>*/
> + /* We need to save IPCB here because skb_gso_segment will use
> +  * SKB_GSO_CB.
> +  */
> + ipcb = *IPCB(skb);
>   features = netif_skb_features(skb);
>   segs = skb_gso_segment(skb, features & ~NETIF_F_GSO_MASK);
>   if (IS_ERR_OR_NULL(segs)) {
> @@ -241,6 +246,7 @@ static int ip_finish_output_gso(struct sk_buff *skb)
>   int err;
>  
>   segs->next = NULL;
> + *IPCB(segs) = ipcb;
>   err = ip_fragment(segs, ip_finish_output2);
>  
>   if (err && ret == 0)

I'm worried that this doesn't solve all cases. f.e. xfrm may also
call skb_gso_segment(), and it will call into ipv4/ipv6 netfilter
postrouting + ipv4 output functions...

nfqnl_enqueue_packet() is also affected.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH net-next V1 10/12] net/mlx5_core: Export flow steering API

2016-01-06 Thread Saeed Mahameed
From: Maor Gottlieb 

Add exports to flow steering API for mlx5_ib usage.
The following functions are exported:

1. mlx5_create_auto_grouped_flow_table - used to create flow
table with auto flow grouping management (create and destroy
flow groups). In auto-grouped flow tables, we create groups
automatically if needed (if we don't find an existing
flow group with same match criteria when we add new rule).

2. mlx5_destroy_flow_table - used to destroy  a flow table.

3. mlx5_add_flow_rule - used to add flow rule into a flow table.

4. mlx5_del_flow_rule - used to delete flow rule from its flow table.

5. mlx5_get_flow_namespace - used to get a handle to the required
namespace sub-tree.

Signed-off-by: Maor Gottlieb 
Signed-off-by: Moni Shoua 
Signed-off-by: Matan Barak 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c |5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c 
b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index 757725b..6f68dba 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -702,6 +702,7 @@ struct mlx5_flow_table 
*mlx5_create_auto_grouped_flow_table(struct mlx5_flow_nam
 
return ft;
 }
+EXPORT_SYMBOL(mlx5_create_auto_grouped_flow_table);
 
 /* Flow table should be locked */
 static struct mlx5_flow_group *create_flow_group_common(struct mlx5_flow_table 
*ft,
@@ -1013,11 +1014,13 @@ unlock:
unlock_ref_node(&ft->node);
return rule;
 }
+EXPORT_SYMBOL(mlx5_add_flow_rule);
 
 void mlx5_del_flow_rule(struct mlx5_flow_rule *rule)
 {
tree_remove_node(&rule->node);
 }
+EXPORT_SYMBOL(mlx5_del_flow_rule);
 
 /* Assuming prio->node.children(flow tables) is sorted by level */
 static struct mlx5_flow_table *find_next_ft(struct mlx5_flow_table *ft)
@@ -1099,6 +1102,7 @@ int mlx5_destroy_flow_table(struct mlx5_flow_table *ft)
 
return err;
 }
+EXPORT_SYMBOL(mlx5_destroy_flow_table);
 
 void mlx5_destroy_flow_group(struct mlx5_flow_group *fg)
 {
@@ -1143,6 +1147,7 @@ struct mlx5_flow_namespace 
*mlx5_get_flow_namespace(struct mlx5_core_dev *dev,
 
return ns;
 }
+EXPORT_SYMBOL(mlx5_get_flow_namespace);
 
 static struct fs_prio *fs_create_prio(struct mlx5_flow_namespace *ns,
  unsigned prio, int max_ft)
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH net-next V1 12/12] IB/mlx5: Add flow steering support

2016-01-06 Thread Saeed Mahameed
From: Maor Gottlieb 

Adding flow steering support by creating a flow-table per
priority (if rules exist in the priority).
mlx5_ib uses autogrouping and thus only creates the
required destinations.

Signed-off-by: Maor Gottlieb 
Signed-off-by: Moni Shoua 
Signed-off-by: Matan Barak 
Signed-off-by: Saeed Mahameed 
---
 drivers/infiniband/hw/mlx5/main.c|  285 ++
 drivers/infiniband/hw/mlx5/mlx5_ib.h |   45 +-
 2 files changed, 329 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c 
b/drivers/infiniband/hw/mlx5/main.c
index e16f13f..01f7ef5 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -45,6 +45,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "user.h"
 #include "mlx5_ib.h"
 
@@ -1012,6 +1013,281 @@ static bool is_valid_attr(struct ib_flow_attr 
*flow_attr)
return !has_ipv4_spec || eth_type_ipv4;
 }
 
+static void put_flow_table(struct mlx5_ib_dev *dev,
+  struct mlx5_ib_flow_prio *prio, bool ft_added)
+{
+   prio->refcount -= !!ft_added;
+   if (!prio->refcount) {
+   mlx5_destroy_flow_table(prio->flow_table);
+   prio->flow_table = NULL;
+   }
+}
+
+static int mlx5_ib_destroy_flow(struct ib_flow *flow_id)
+{
+   struct mlx5_ib_dev *dev = to_mdev(flow_id->qp->device);
+   struct mlx5_ib_flow_handler *handler = container_of(flow_id,
+ struct 
mlx5_ib_flow_handler,
+ ibflow);
+   struct mlx5_ib_flow_handler *iter, *tmp;
+
+   mutex_lock(&dev->flow_db.lock);
+
+   list_for_each_entry_safe(iter, tmp, &handler->list, list) {
+   mlx5_del_flow_rule(iter->rule);
+   list_del(&iter->list);
+   kfree(iter);
+   }
+
+   mlx5_del_flow_rule(handler->rule);
+   put_flow_table(dev, &dev->flow_db.prios[handler->prio], true);
+   mutex_unlock(&dev->flow_db.lock);
+
+   kfree(handler);
+
+   return 0;
+}
+
+#define MLX5_FS_MAX_TYPES   10
+#define MLX5_FS_MAX_ENTRIES 32000UL
+static struct mlx5_ib_flow_prio *get_flow_table(struct mlx5_ib_dev *dev,
+   struct ib_flow_attr *flow_attr)
+{
+   struct mlx5_flow_namespace *ns = NULL;
+   struct mlx5_ib_flow_prio *prio;
+   struct mlx5_flow_table *ft;
+   int num_entries;
+   int num_groups;
+   int priority;
+   int err = 0;
+
+   if (flow_attr->type == IB_FLOW_ATTR_NORMAL) {
+   if (flow_is_multicast_only(flow_attr))
+   priority = MLX5_IB_FLOW_MCAST_PRIO;
+   else
+   priority = flow_attr->priority;
+   ns = mlx5_get_flow_namespace(dev->mdev,
+MLX5_FLOW_NAMESPACE_BYPASS);
+   num_entries = MLX5_FS_MAX_ENTRIES;
+   num_groups = MLX5_FS_MAX_TYPES;
+   prio = &dev->flow_db.prios[priority];
+   } else if (flow_attr->type == IB_FLOW_ATTR_ALL_DEFAULT ||
+  flow_attr->type == IB_FLOW_ATTR_MC_DEFAULT) {
+   ns = mlx5_get_flow_namespace(dev->mdev,
+MLX5_FLOW_NAMESPACE_LEFTOVERS);
+   build_leftovers_ft_param(&priority,
+&num_entries,
+&num_groups);
+   prio = &dev->flow_db.prios[MLX5_IB_FLOW_LEFTOVERS_PRIO];
+   }
+
+   if (!ns)
+   return ERR_PTR(-ENOTSUPP);
+
+   ft = prio->flow_table;
+   if (!ft) {
+   ft = mlx5_create_auto_grouped_flow_table(ns, priority,
+num_entries,
+num_groups);
+
+   if (!IS_ERR(ft)) {
+   prio->refcount = 0;
+   prio->flow_table = ft;
+   } else {
+   err = PTR_ERR(ft);
+   }
+   }
+
+   return err ? ERR_PTR(err) : prio;
+}
+
+static struct mlx5_ib_flow_handler *create_flow_rule(struct mlx5_ib_dev *dev,
+struct mlx5_ib_flow_prio 
*ft_prio,
+struct ib_flow_attr 
*flow_attr,
+struct 
mlx5_flow_destination *dst)
+{
+   struct mlx5_flow_table  *ft = ft_prio->flow_table;
+   struct mlx5_ib_flow_handler *handler;
+   void *ib_flow = flow_attr + 1;
+   u8 match_criteria_enable = 0;
+   unsigned int spec_index;
+   u32 *match_c;
+   u32 *match_v;
+   int err = 0;
+
+   if (!is_valid_attr(flow_attr))
+   return ERR_PTR(-EINVAL);
+
+   match_c = kzalloc(MLX5_ST_SZ_BYTES(fte_match_param), GFP_KERNEL);
+   match_v = kzalloc(MLX5

[PATCH net-next V1 08/12] net/mlx5_core: Enable flow steering support for the IB driver

2016-01-06 Thread Saeed Mahameed
From: Maor Gottlieb 

When the driver is loaded, we create flow steering namespace
for kernel bypass with nine priorities and another namespace
for leftovers(in order to catch packets that weren't matched).
Verbs applications will use these priorities.
we found nine as a number that balances the requirements from the
user and retains performance.

The bypass namespace is used by verbs applications that want to bypass
the kernel networking stack. The leftovers namespace is used by verbs
applications and the sniffer in order to catch packets that weren't
handled by any preceding rules.

Signed-off-by: Maor Gottlieb 
Signed-off-by: Moni Shoua 
Signed-off-by: Matan Barak 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c |   55 ++---
 include/linux/mlx5/device.h   |2 +
 include/linux/mlx5/fs.h   |2 +
 3 files changed, 51 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c 
b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index 96e287a..757725b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -40,18 +40,19 @@
 #define INIT_TREE_NODE_ARRAY_SIZE(...) (sizeof((struct 
init_tree_node[]){__VA_ARGS__}) /\
 sizeof(struct init_tree_node))
 
-#define ADD_PRIO(min_level_val, max_ft_val, caps_val,\
+#define ADD_PRIO(num_prios_val, min_level_val, max_ft_val, caps_val,\
 ...) {.type = FS_TYPE_PRIO,\
.min_ft_level = min_level_val,\
.max_ft = max_ft_val,\
+   .num_leaf_prios = num_prios_val,\
.caps = caps_val,\
.children = (struct init_tree_node[]) {__VA_ARGS__},\
.ar_size = INIT_TREE_NODE_ARRAY_SIZE(__VA_ARGS__) \
 }
 
-#define ADD_FT_PRIO(max_ft_val, ...)\
-   ADD_PRIO(0, max_ft_val, {},\
- __VA_ARGS__)\
+#define ADD_MULTIPLE_PRIO(num_prios_val, max_ft_val, ...)\
+   ADD_PRIO(num_prios_val, 0, max_ft_val, {},\
+__VA_ARGS__)\
 
 #define ADD_NS(...) {.type = FS_TYPE_NAMESPACE,\
.children = (struct init_tree_node[]) {__VA_ARGS__},\
@@ -66,7 +67,14 @@
 #define FS_REQUIRED_CAPS(...) {.arr_sz = INIT_CAPS_ARRAY_SIZE(__VA_ARGS__), \
   .caps = (long[]) {__VA_ARGS__} }
 
+#define LEFTOVERS_MAX_FT 1
+#define LEFTOVERS_NUM_PRIOS 1
+#define BY_PASS_PRIO_MAX_FT 1
+#define BY_PASS_MIN_LEVEL (KENREL_MIN_LEVEL + MLX5_BY_PASS_NUM_PRIOS +\
+  LEFTOVERS_MAX_FT)
+
 #define KERNEL_MAX_FT 2
+#define KERNEL_NUM_PRIOS 1
 #define KENREL_MIN_LEVEL 2
 
 struct node_caps {
@@ -79,14 +87,27 @@ static struct init_tree_node {
int ar_size;
struct node_caps caps;
int min_ft_level;
+   int num_leaf_prios;
int prio;
int max_ft;
 } root_fs = {
.type = FS_TYPE_NAMESPACE,
-   .ar_size = 1,
+   .ar_size = 3,
.children = (struct init_tree_node[]) {
-   ADD_PRIO(KENREL_MIN_LEVEL, 0, {},
-ADD_NS(ADD_FT_PRIO(KERNEL_MAX_FT))),
+   ADD_PRIO(0, BY_PASS_MIN_LEVEL, 0,
+
FS_REQUIRED_CAPS(FS_CAP(flow_table_properties_nic_receive.flow_modify_en),
+ 
FS_CAP(flow_table_properties_nic_receive.modify_root),
+ 
FS_CAP(flow_table_properties_nic_receive.identified_miss_table_mode),
+ 
FS_CAP(flow_table_properties_nic_receive.flow_table_modify)),
+ADD_NS(ADD_MULTIPLE_PRIO(MLX5_BY_PASS_NUM_PRIOS, 
BY_PASS_PRIO_MAX_FT))),
+   ADD_PRIO(0, KENREL_MIN_LEVEL, 0, {},
+ADD_NS(ADD_MULTIPLE_PRIO(KERNEL_NUM_PRIOS, 
KERNEL_MAX_FT))),
+   ADD_PRIO(0, BY_PASS_MIN_LEVEL, 0,
+
FS_REQUIRED_CAPS(FS_CAP(flow_table_properties_nic_receive.flow_modify_en),
+ 
FS_CAP(flow_table_properties_nic_receive.modify_root),
+ 
FS_CAP(flow_table_properties_nic_receive.identified_miss_table_mode),
+ 
FS_CAP(flow_table_properties_nic_receive.flow_table_modify)),
+ADD_NS(ADD_MULTIPLE_PRIO(LEFTOVERS_NUM_PRIOS, 
LEFTOVERS_MAX_FT))),
}
 };
 
@@ -1098,8 +1119,10 @@ struct mlx5_flow_namespace 
*mlx5_get_flow_namespace(struct mlx5_core_dev *dev,
return NULL;
 
switch (type) {
+   case MLX5_FLOW_NAMESPACE_BYPASS:
case MLX5_FLOW_NAMESPACE_KERNEL:
-   prio = 0;
+   case MLX5_FLOW_NAMESPACE_LEFTOVERS:
+   prio = type;
break;
case MLX5_FLOW_NAMESPACE_FDB:
if (dev->priv.fdb_root_ns)
@@ -1164,6 +1187,20 @@ static struct mlx5_flow_namespace 
*fs_create_namespace(struct fs_prio *prio)
return ns;
 }
 
+static in

[PATCH net-next V1 05/12] net/mlx5_core: Connect flow tables

2016-01-06 Thread Saeed Mahameed
From: Maor Gottlieb 

Flow tables from different priorities should be chained together.
When a packet arrives we search for a match in the
by-pass flow tables (first we search for a match in priority 0
and if we don't find a match we move to the next priority).
If we can't find a match in any of the bypass flow-tables, we continue
searching in the flow-tables of the next priority, which are the
kernel's flow tables.

Setting the miss flow table in a new flow table to be the next one in
the list is performed via create flow table API. If we want to change an
existing flow table, for example in order to point from an
existing flow table to the new next-in-list flow table, we use the
modify flow table API.

Signed-off-by: Maor Gottlieb 
Signed-off-by: Moni Shoua 
Signed-off-by: Matan Barak 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c  |7 +-
 drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.h  |3 +-
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c |  104 +++--
 3 files changed, 104 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c 
b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
index 2b55625..a9894d2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
@@ -58,7 +58,8 @@ int mlx5_cmd_update_root_ft(struct mlx5_core_dev *dev,
 
 int mlx5_cmd_create_flow_table(struct mlx5_core_dev *dev,
   enum fs_flow_table_type type, unsigned int level,
-  unsigned int log_size, unsigned int *table_id)
+  unsigned int log_size, struct mlx5_flow_table
+  *next_ft, unsigned int *table_id)
 {
u32 out[MLX5_ST_SZ_DW(create_flow_table_out)];
u32 in[MLX5_ST_SZ_DW(create_flow_table_in)];
@@ -69,6 +70,10 @@ int mlx5_cmd_create_flow_table(struct mlx5_core_dev *dev,
MLX5_SET(create_flow_table_in, in, opcode,
 MLX5_CMD_OP_CREATE_FLOW_TABLE);
 
+   if (next_ft) {
+   MLX5_SET(create_flow_table_in, in, table_miss_mode, 1);
+   MLX5_SET(create_flow_table_in, in, table_miss_id, next_ft->id);
+   }
MLX5_SET(create_flow_table_in, in, table_type, type);
MLX5_SET(create_flow_table_in, in, level, level);
MLX5_SET(create_flow_table_in, in, log_size, log_size);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.h 
b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.h
index 1ae9b68..9814d47 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.h
@@ -35,7 +35,8 @@
 
 int mlx5_cmd_create_flow_table(struct mlx5_core_dev *dev,
   enum fs_flow_table_type type, unsigned int level,
-  unsigned int log_size, unsigned int *table_id);
+  unsigned int log_size, struct mlx5_flow_table
+  *next_ft, unsigned int *table_id);
 
 int mlx5_cmd_destroy_flow_table(struct mlx5_core_dev *dev,
struct mlx5_flow_table *ft);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c 
b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index 64bdb54..c6f864d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -510,6 +510,48 @@ static struct mlx5_flow_table *find_prev_chained_ft(struct 
fs_prio *prio)
return find_closest_ft(prio, true);
 }
 
+static int connect_fts_in_prio(struct mlx5_core_dev *dev,
+  struct fs_prio *prio,
+  struct mlx5_flow_table *ft)
+{
+   struct mlx5_flow_table *iter;
+   int i = 0;
+   int err;
+
+   fs_for_each_ft(iter, prio) {
+   i++;
+   err = mlx5_cmd_modify_flow_table(dev,
+iter,
+ft);
+   if (err) {
+   mlx5_core_warn(dev, "Failed to modify flow table %d\n",
+  iter->id);
+   /* The driver is out of sync with the FW */
+   if (i > 1)
+   WARN_ON(true);
+   return err;
+   }
+   }
+   return 0;
+}
+
+/* Connect flow tables from previous priority of prio to ft */
+static int connect_prev_fts(struct mlx5_core_dev *dev,
+   struct mlx5_flow_table *ft,
+   struct fs_prio *prio)
+{
+   struct mlx5_flow_table *prev_ft;
+
+   prev_ft = find_prev_chained_ft(prio);
+   if (prev_ft) {
+   struct fs_prio *prev_prio;
+
+   fs_get_obj(prev_prio, prev_ft->node.parent);
+   return connect_fts_in_prio(dev, prev_prio, ft);
+   }
+   retur

[PATCH net-next V1 06/12] net/mlx5_core: Set priority attributes

2016-01-06 Thread Saeed Mahameed
From: Maor Gottlieb 

Each priority has two attributes:
1. max_ft - maximum allowed flow tables under this priority.
2. start_level - start level range of the flow tables
in the priority.

These attributes are set by traversing the tree nodes by
DFS and set start level and max flow tables to each priority.
Start level depends on the max flow tables of the prior priorities
in the tree.

The leaves of the trees have max_ft set in them. Each node accumulates
the max_ft of its children and set it accordingly.

Signed-off-by: Maor Gottlieb 
Signed-off-by: Moni Shoua 
Signed-off-by: Matan Barak 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c |   71 +++--
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.h |3 +
 2 files changed, 55 insertions(+), 19 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c 
b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index c6f864d..e1282e8 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -41,20 +41,19 @@
 sizeof(struct init_tree_node))
 
 #define INIT_PRIO(min_level_val, max_ft_val,\
- start_level_val, ...) {.type = FS_TYPE_PRIO,\
+ ...) {.type = FS_TYPE_PRIO,\
.min_ft_level = min_level_val,\
-   .start_level = start_level_val,\
.max_ft = max_ft_val,\
.children = (struct init_tree_node[]) {__VA_ARGS__},\
.ar_size = INIT_TREE_NODE_ARRAY_SIZE(__VA_ARGS__) \
 }
 
-#define ADD_PRIO(min_level_val, max_ft_val, start_level_val, ...)\
-   INIT_PRIO(min_level_val, max_ft_val, start_level_val,\
+#define ADD_PRIO(min_level_val, max_ft_val, ...)\
+   INIT_PRIO(min_level_val, max_ft_val,\
  __VA_ARGS__)\
 
-#define ADD_FT_PRIO(max_ft_val, start_level_val, ...)\
-   INIT_PRIO(0, max_ft_val, start_level_val,\
+#define ADD_FT_PRIO(max_ft_val, ...)\
+   INIT_PRIO(0, max_ft_val,\
  __VA_ARGS__)\
 
 #define ADD_NS(...) {.type = FS_TYPE_NAMESPACE,\
@@ -62,8 +61,6 @@
.ar_size = INIT_TREE_NODE_ARRAY_SIZE(__VA_ARGS__) \
 }
 
-#define KERNEL_START_LEVEL 0
-#define KERNEL_P0_START_LEVEL KERNEL_START_LEVEL
 #define KERNEL_MAX_FT 2
 #define KENREL_MIN_LEVEL 2
 static struct init_tree_node {
@@ -73,15 +70,12 @@ static struct init_tree_node {
int min_ft_level;
int prio;
int max_ft;
-   int start_level;
 } root_fs = {
.type = FS_TYPE_NAMESPACE,
.ar_size = 1,
.children = (struct init_tree_node[]) {
-   ADD_PRIO(KENREL_MIN_LEVEL, KERNEL_MAX_FT,
-KERNEL_START_LEVEL,
-ADD_NS(ADD_FT_PRIO(KERNEL_MAX_FT,
-   KERNEL_P0_START_LEVEL))),
+   ADD_PRIO(KENREL_MIN_LEVEL, 0,
+ADD_NS(ADD_FT_PRIO(KERNEL_MAX_FT))),
}
 };
 
@@ -1117,8 +,7 @@ struct mlx5_flow_namespace 
*mlx5_get_flow_namespace(struct mlx5_core_dev *dev,
 }
 
 static struct fs_prio *fs_create_prio(struct mlx5_flow_namespace *ns,
- unsigned prio, int max_ft,
- int start_level)
+ unsigned prio, int max_ft)
 {
struct fs_prio *fs_prio;
 
@@ -1131,7 +1124,6 @@ static struct fs_prio *fs_create_prio(struct 
mlx5_flow_namespace *ns,
tree_add_node(&fs_prio->node, &ns->node);
fs_prio->max_ft = max_ft;
fs_prio->prio = prio;
-   fs_prio->start_level = start_level;
list_add_tail(&fs_prio->node.list, &ns->node.children);
 
return fs_prio;
@@ -1177,8 +1169,7 @@ static int init_root_tree_recursive(int max_ft_level, 
struct init_tree_node *ini
return -ENOTSUPP;
 
fs_get_obj(fs_ns, fs_parent_node);
-   fs_prio = fs_create_prio(fs_ns, index, init_node->max_ft,
-init_node->start_level);
+   fs_prio = fs_create_prio(fs_ns, index, init_node->max_ft);
if (IS_ERR(fs_prio))
return PTR_ERR(fs_prio);
base = &fs_prio->node;
@@ -1245,6 +1236,46 @@ static struct mlx5_flow_root_namespace 
*create_root_ns(struct mlx5_core_dev *dev
return root_ns;
 }
 
+static void set_prio_attrs_in_prio(struct fs_prio *prio, int acc_level);
+
+static int set_prio_attrs_in_ns(struct mlx5_flow_namespace *ns, int acc_level)
+{
+   struct fs_prio *prio;
+
+   fs_for_each_prio(prio, ns) {
+/* This updates prio start_level and max_ft */
+   set_prio_attrs_in_prio(prio, acc_level);
+   acc_level += prio->max_ft;
+   }
+   return acc_level;
+}
+
+static void set_prio_attrs_in_prio(struct fs_prio *prio, int acc_level)
+{
+   struct mlx5_flow_namespace *ns;
+   int acc_level_ns = acc_level;
+
+   prio->start_level = acc

[PATCH net-next V1 09/12] net/mlx5_core: Make ipv4/ipv6 location more clear

2016-01-06 Thread Saeed Mahameed
From: Maor Gottlieb 

Change the mlx5 firmware interface header to make it
more clear which bytes should be used by IPv4 or
IPv6 addresses.

Signed-off-by: Maor Gottlieb 
Signed-off-by: Moni Shoua 
Signed-off-by: Matan Barak 
Signed-off-by: Saeed Mahameed 
---
 include/linux/mlx5/mlx5_ifc.h |   20 ++--
 1 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 7f16695..68d73f8 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -298,6 +298,22 @@ struct mlx5_ifc_odp_per_transport_service_cap_bits {
u8 reserved_1[0x1a];
 };
 
+struct mlx5_ifc_ipv4_layout_bits {
+   u8 reserved_0[0x60];
+
+   u8 ipv4[0x20];
+};
+
+struct mlx5_ifc_ipv6_layout_bits {
+   u8 ipv6[16][0x8];
+};
+
+union mlx5_ifc_ipv6_layout_ipv4_layout_auto_bits {
+   struct mlx5_ifc_ipv6_layout_bits ipv6_layout;
+   struct mlx5_ifc_ipv4_layout_bits ipv4_layout;
+   u8 reserved_0[0x80];
+};
+
 struct mlx5_ifc_fte_match_set_lyr_2_4_bits {
u8 smac_47_16[0x20];
 
@@ -328,9 +344,9 @@ struct mlx5_ifc_fte_match_set_lyr_2_4_bits {
u8 udp_sport[0x10];
u8 udp_dport[0x10];
 
-   u8 src_ip[4][0x20];
+   union mlx5_ifc_ipv6_layout_ipv4_layout_auto_bits src_ipv4_src_ipv6;
 
-   u8 dst_ip[4][0x20];
+   union mlx5_ifc_ipv6_layout_ipv4_layout_auto_bits dst_ipv4_dst_ipv6;
 };
 
 struct mlx5_ifc_fte_match_set_misc_bits {
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH net-next V1 04/12] net/mlx5_core: Introduce modify flow table command

2016-01-06 Thread Saeed Mahameed
From: Maor Gottlieb 

Introduce the modify flow table command. This command is used when
we want to change the next flow table of an existing flow table.
The next flow table is defined as the table we search (in order
to find a match), if we couldn't find a match in any of the flow table
entries in the current flow table.

Signed-off-by: Maor Gottlieb 
Signed-off-by: Moni Shoua 
Signed-off-by: Matan Barak 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c |   27 ++
 drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.h |4 ++
 include/linux/mlx5/mlx5_ifc.h|   56 --
 3 files changed, 83 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c 
b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
index d8b1195..2b55625 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
@@ -101,6 +101,33 @@ int mlx5_cmd_destroy_flow_table(struct mlx5_core_dev *dev,
  sizeof(out));
 }
 
+int mlx5_cmd_modify_flow_table(struct mlx5_core_dev *dev,
+  struct mlx5_flow_table *ft,
+  struct mlx5_flow_table *next_ft)
+{
+   u32 in[MLX5_ST_SZ_DW(modify_flow_table_in)];
+   u32 out[MLX5_ST_SZ_DW(modify_flow_table_out)];
+
+   memset(in, 0, sizeof(in));
+   memset(out, 0, sizeof(out));
+
+   MLX5_SET(modify_flow_table_in, in, opcode,
+MLX5_CMD_OP_MODIFY_FLOW_TABLE);
+   MLX5_SET(modify_flow_table_in, in, table_type, ft->type);
+   MLX5_SET(modify_flow_table_in, in, table_id, ft->id);
+   MLX5_SET(modify_flow_table_in, in, modify_field_select,
+MLX5_MODIFY_FLOW_TABLE_MISS_TABLE_ID);
+   if (next_ft) {
+   MLX5_SET(modify_flow_table_in, in, table_miss_mode, 1);
+   MLX5_SET(modify_flow_table_in, in, table_miss_id, next_ft->id);
+   } else {
+   MLX5_SET(modify_flow_table_in, in, table_miss_mode, 0);
+   }
+
+   return mlx5_cmd_exec_check_status(dev, in, sizeof(in), out,
+ sizeof(out));
+}
+
 int mlx5_cmd_create_flow_group(struct mlx5_core_dev *dev,
   struct mlx5_flow_table *ft,
   u32 *in,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.h 
b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.h
index 70d18ec..1ae9b68 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.h
@@ -40,6 +40,10 @@ int mlx5_cmd_create_flow_table(struct mlx5_core_dev *dev,
 int mlx5_cmd_destroy_flow_table(struct mlx5_core_dev *dev,
struct mlx5_flow_table *ft);
 
+int mlx5_cmd_modify_flow_table(struct mlx5_core_dev *dev,
+  struct mlx5_flow_table *ft,
+  struct mlx5_flow_table *next_ft);
+
 int mlx5_cmd_create_flow_group(struct mlx5_core_dev *dev,
   struct mlx5_flow_table *ft,
   u32 *in, unsigned int *group_id);
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 323e713..7f16695 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -194,7 +194,8 @@ enum {
MLX5_CMD_OP_QUERY_FLOW_GROUP  = 0x935,
MLX5_CMD_OP_SET_FLOW_TABLE_ENTRY  = 0x936,
MLX5_CMD_OP_QUERY_FLOW_TABLE_ENTRY= 0x937,
-   MLX5_CMD_OP_DELETE_FLOW_TABLE_ENTRY   = 0x938
+   MLX5_CMD_OP_DELETE_FLOW_TABLE_ENTRY   = 0x938,
+   MLX5_CMD_OP_MODIFY_FLOW_TABLE = 0x93c
 };
 
 struct mlx5_ifc_flow_table_fields_supported_bits {
@@ -260,7 +261,9 @@ struct mlx5_ifc_flow_table_prop_layout_bits {
u8 reserved_0[0x2];
u8 flow_modify_en[0x1];
u8 modify_root[0x1];
-   u8 reserved_1[0x1b];
+   u8 identified_miss_table_mode[0x1];
+   u8 flow_table_modify[0x1];
+   u8 reserved_1[0x19];
 
u8 reserved_2[0x2];
u8 log_max_ft_size[0x6];
@@ -5669,12 +5672,16 @@ struct mlx5_ifc_create_flow_table_in_bits {
 
u8 reserved_4[0x20];
 
-   u8 reserved_5[0x8];
+   u8 reserved_5[0x4];
+   u8 table_miss_mode[0x4];
u8 level[0x8];
u8 reserved_6[0x8];
u8 log_size[0x8];
 
-   u8 reserved_7[0x120];
+   u8 reserved_7[0x8];
+   u8 table_miss_id[0x18];
+
+   u8 reserved_8[0x100];
 };
 
 struct mlx5_ifc_create_flow_group_out_bits {
@@ -6975,4 +6982,45 @@ struct mlx5_ifc_set_flow_table_root_in_bits {
u8 reserved_5[0x140];
 };
 
+enum {
+   MLX5_MODIFY_FLOW_TABLE_MISS_TABLE_ID = 0x1,
+};
+
+struct mlx5_ifc_modify_flow_table_out_bits {

[PATCH net-next V1 11/12] IB/mlx5: Add flow steering utilities

2016-01-06 Thread Saeed Mahameed
From: Maor Gottlieb 

Add three utility functions for support flow steering:

1. Parsing verbs flow attributes hardware steering specs.

2. Check if flow is multicast - this is required in order to decide to
which flow table will we add the steering rule.

3. Set outer headers in flow match criteria to zeros.

Signed-off-by: Maor Gottlieb 
Signed-off-by: Moni Shoua 
Signed-off-by: Matan Barak 
Signed-off-by: Saeed Mahameed 
---
 drivers/infiniband/hw/mlx5/main.c |  177 +
 include/linux/mlx5/fs.h   |   10 ++
 2 files changed, 187 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c 
b/drivers/infiniband/hw/mlx5/main.c
index 7e97cb5..e16f13f 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -43,6 +43,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include "user.h"
 #include "mlx5_ib.h"
 
@@ -835,6 +837,181 @@ static int mlx5_ib_dealloc_pd(struct ib_pd *pd)
return 0;
 }
 
+static bool outer_header_zero(u32 *match_criteria)
+{
+   int size = MLX5_ST_SZ_BYTES(fte_match_param);
+   char *outer_headers_c = MLX5_ADDR_OF(fte_match_param, match_criteria,
+outer_headers);
+
+   return outer_headers_c[0] == 0 && !memcmp(outer_headers_c,
+ outer_headers_c + 1,
+ size - 1);
+}
+
+static int parse_flow_attr(u32 *match_c, u32 *match_v,
+  union ib_flow_spec *ib_spec)
+{
+   void *outer_headers_c = MLX5_ADDR_OF(fte_match_param, match_c,
+outer_headers);
+   void *outer_headers_v = MLX5_ADDR_OF(fte_match_param, match_v,
+outer_headers);
+   switch (ib_spec->type) {
+   case IB_FLOW_SPEC_ETH:
+   if (ib_spec->size != sizeof(ib_spec->eth))
+   return -EINVAL;
+
+   ether_addr_copy(MLX5_ADDR_OF(fte_match_set_lyr_2_4, 
outer_headers_c,
+dmac_47_16),
+   ib_spec->eth.mask.dst_mac);
+   ether_addr_copy(MLX5_ADDR_OF(fte_match_set_lyr_2_4, 
outer_headers_v,
+dmac_47_16),
+   ib_spec->eth.val.dst_mac);
+
+   if (ib_spec->eth.mask.vlan_tag) {
+   MLX5_SET(fte_match_set_lyr_2_4, outer_headers_c,
+vlan_tag, 1);
+   MLX5_SET(fte_match_set_lyr_2_4, outer_headers_v,
+vlan_tag, 1);
+
+   MLX5_SET(fte_match_set_lyr_2_4, outer_headers_c,
+first_vid, ntohs(ib_spec->eth.mask.vlan_tag));
+   MLX5_SET(fte_match_set_lyr_2_4, outer_headers_v,
+first_vid, ntohs(ib_spec->eth.val.vlan_tag));
+
+   MLX5_SET(fte_match_set_lyr_2_4, outer_headers_c,
+first_cfi,
+ntohs(ib_spec->eth.mask.vlan_tag) >> 12);
+   MLX5_SET(fte_match_set_lyr_2_4, outer_headers_v,
+first_cfi,
+ntohs(ib_spec->eth.val.vlan_tag) >> 12);
+
+   MLX5_SET(fte_match_set_lyr_2_4, outer_headers_c,
+first_prio,
+ntohs(ib_spec->eth.mask.vlan_tag) >> 13);
+   MLX5_SET(fte_match_set_lyr_2_4, outer_headers_v,
+first_prio,
+ntohs(ib_spec->eth.val.vlan_tag) >> 13);
+   }
+   MLX5_SET(fte_match_set_lyr_2_4, outer_headers_c,
+ethertype, ntohs(ib_spec->eth.mask.ether_type));
+   MLX5_SET(fte_match_set_lyr_2_4, outer_headers_v,
+ethertype, ntohs(ib_spec->eth.val.ether_type));
+   break;
+   case IB_FLOW_SPEC_IPV4:
+   if (ib_spec->size != sizeof(ib_spec->ipv4))
+   return -EINVAL;
+
+   MLX5_SET(fte_match_set_lyr_2_4, outer_headers_c,
+ethertype, 0x);
+   MLX5_SET(fte_match_set_lyr_2_4, outer_headers_v,
+ethertype, ETH_P_IP);
+
+   memcpy(MLX5_ADDR_OF(fte_match_set_lyr_2_4, outer_headers_c,
+   src_ipv4_src_ipv6.ipv4_layout.ipv4),
+  &ib_spec->ipv4.mask.src_ip,
+  sizeof(ib_spec->ipv4.mask.src_ip));
+   memcpy(MLX5_ADDR_OF(fte_match_set_lyr_2_4, outer_headers_v,
+   src_ipv4_src_ipv6.ipv4_layout.ipv4),
+  &ib_spec->ipv4.val.src_ip,
+  sizeof(ib_spec->ipv4.val.src_ip));
+   me

[PATCH net-next V1 00/12] net/mlx5_core: Enhance flow steering support

2016-01-06 Thread Saeed Mahameed
Hi Dave,

This series adds three new functionalists to the driver flow-steering
infrastructure:
auto-grouped flow tables, chaining of flow tables and updates for the
root flow table.

Changes since V0:
- Fixed improperly formatted comments.
- Compare value of ib_spec->eth.mask.ether_type in network byte order
  in ('IB/mlx5: Add flow steering utilities').

1. Auto-grouped flow tables - Flow table with auto grouping management.
When a flow table is created, hints regarding the number of rule types
and the number of rules are given in advance. Thus, a flow table is
divided into #NUM_TYPES+1 groups each contains
(#NUM_RULES)/(#NUM_TYPES+1) rules. The first #NUM_TYPES parts are groups
which are filled if the added rule matches the group specification or
the group is empty. The last part is filled by rules that can't fit
any of the former groups.

2. Chaining flow tables - Flow tables from different priorities are chained
together, if there is no match in flow table of priority i we continue
searching for a match in priority i+1. This is both true if priorities
i and i+1 belongs to the same namespace or not.

3. Updating the root flow table - the root flow table is the flow table
with the lowest level. The hardware start searching for a match in the
root flow table and continue according to the matches it find along
the way.

The first usage for the new functionality is flow steering for user-space
ConnectX-4 offloaded HW Eth RX queues done through the mlx5 IB driver.

When the mlx5 core driver is loaded, it opens three flow namespaces:
1. By-pass namespace (used by mlx5 IB driver).
2. Kernel namespace (used in order to get packets to the networking stack
through mlx5 EN driver).
3. Leftovers namespace (used by mlx5 IB and future sniffer)

The series is built as follows:

Patch #1 introduces auto-grouped flow tables support.

Patch #2 add utility functions for finding the next and the previous
flow tables in different priorities. This is used in order to chain
the flow tables in a downstream patch.

Patch #3 introduces a firmware command for updating the root flow table.

Patch #4 introduces modify flow table firmware command, this command is used
when we want to change the next flow table of an existing flow table.
This is used for chaining flow tables as well.

Patch #5 connect/disconnect flow tables. This is actually the chaining
process when we want to link flow tables. This means that if we couldn't
find a match in the first flow table, we'll continue in the chained
flow table.

Patch #6 updates priority's attributes that is required for flow table
level allocation. We update both the max_fts (the number of allowed FTs
in the sub-tree of this priority) and the start_level (which is the first
level we'll assign to the flow-tables created inside the priority).

Patch #7 adds checking of required device capabilities. Some namespaces
could be only created if the hardware supports certain attributes.
This is especially true for the Bypass and leftovers namespaces. This
adds a generic mechanism to check these required attributes.

Patch #8 creates two additional namespaces:
a. Bypass flow rules(has nine priorities)
b. Leftovers packets(have one priority) - for unmatched packets.

Patch #9 re-factors ipv4/ipv6 match fields in the mlx5 firmware interface
header to be more clear.

Patch #10 exports the flow steering API for mlx5_ib usage

Patch #11 and #12 implements the required support in mlx5_ib in order
to support the RDMA flow steering verbs.

Regards,
Moni, Matan and Maor


Maor Gottlieb (12):
  net/mlx5_core: Introduce flow steering autogrouped flow table
  net/mlx5_core: Add utilities to find next and prev flow-tables
  net/mlx5_core: Managing root flow table
  net/mlx5_core: Introduce modify flow table command
  net/mlx5_core: Connect flow tables
  net/mlx5_core: Set priority attributes
  net/mlx5_core: Initialize namespaces only when supported by device
  net/mlx5_core: Enable flow steering support for the IB driver
  net/mlx5_core: Make ipv4/ipv6 location more clear
  net/mlx5_core: Export flow steering API
  IB/mlx5: Add flow steering utilities
  IB/mlx5: Add flow steering support

 drivers/infiniband/hw/mlx5/main.c |  462 
 drivers/infiniband/hw/mlx5/mlx5_ib.h  |   45 ++-
 drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c  |   52 ++-
 drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.h  |9 +-
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c |  601 ++---
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.h |   14 +
 include/linux/mlx5/device.h   |2 +
 include/linux/mlx5/fs.h   |   18 +
 include/linux/mlx5/mlx5_ifc.h |  105 -
 9 files changed, 1232 insertions(+), 76 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger

Re: [PATCH net] tcp: fix zero cwnd in tcp_cwnd_reduction

2016-01-06 Thread David Miller
From: Yuchung Cheng 
Date: Wed,  6 Jan 2016 12:42:38 -0800

> Patch 3759824da87b ("tcp: PRR uses CRB mode by default and SS mode
> conditionally") introduced a bug that cwnd may become 0 when both
> inflight and sndcnt are 0 (cwnd = inflight + sndcnt). This may lead
> to a div-by-zero if the connection starts another cwnd reduction
> phase by setting tp->prior_cwnd to the current cwnd (0) in
> tcp_init_cwnd_reduction().
> 
> To prevent this we skip PRR operation when nothing is acked or
> sacked. Then cwnd must be positive in all cases as long as ssthresh
> is positive:
> 
> 1) The proportional reduction mode
>inflight > ssthresh > 0
> 
> 2) The reduction bound mode
>   a) inflight == ssthresh > 0
> 
>   b) inflight < ssthresh
>  sndcnt > 0 since newly_acked_sacked > 0 and inflight < ssthresh
> 
> Therefore in all cases inflight and sndcnt can not both be 0.
> We check invalid tp->prior_cwnd to avoid potential div0 bugs.
> 
> In reality this bug is triggered only with a sequence of less common
> events.  For example, the connection is terminating an ECN-triggered
> cwnd reduction with an inflight 0, then it receives reordered/old
> ACKs or DSACKs from prior transmission (which acks nothing). Or the
> connection is in fast recovery stage that marks everything lost,
> but fails to retransmit due to local issues, then receives data
> packets from other end which acks nothing.
> 
> Fixes: 3759824da87b ("tcp: PRR uses CRB mode by default and SS mode 
> conditionally")
> Reported-by: Oleksandr Natalenko 
> Signed-off-by: Yuchung Cheng 
> Signed-off-by: Neal Cardwell 
> Signed-off-by: Eric Dumazet 

Applied and queued up for -stable, thanks!
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH net-next V1 02/12] net/mlx5_core: Add utilities to find next and prev flow-tables

2016-01-06 Thread Saeed Mahameed
From: Maor Gottlieb 

Add two utility functions for find next and prev flow table.
Find next flow table function gets priority and return the
first flow table of the next priority in the tree.
Find prev flow table return the last flow table of
the previous priority in the tree.

These utility functions are used for chaining flow table from different
priorities.

Signed-off-by: Maor Gottlieb 
Signed-off-by: Moni Shoua 
Signed-off-by: Matan Barak 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c |   67 +
 1 files changed, 67 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c 
b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index 7d24bbb..c5a96e6 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -443,6 +443,73 @@ static struct mlx5_flow_table *alloc_flow_table(int level, 
int max_fte,
return ft;
 }
 
+/* If reverse is false, then we search for the first flow table in the
+ * root sub-tree from start(closest from right), else we search for the
+ * last flow table in the root sub-tree till start(closest from left).
+ */
+static struct mlx5_flow_table *find_closest_ft_recursive(struct fs_node  *root,
+struct list_head 
*start,
+bool reverse)
+{
+#define list_advance_entry(pos, reverse)   \
+   ((reverse) ? list_prev_entry(pos, list) : list_next_entry(pos, list))
+
+#define list_for_each_advance_continue(pos, head, reverse) \
+   for (pos = list_advance_entry(pos, reverse);\
+&pos->list != (head);  \
+pos = list_advance_entry(pos, reverse))
+
+   struct fs_node *iter = list_entry(start, struct fs_node, list);
+   struct mlx5_flow_table *ft = NULL;
+
+   if (!root)
+   return NULL;
+
+   list_for_each_advance_continue(iter, &root->children, reverse) {
+   if (iter->type == FS_TYPE_FLOW_TABLE) {
+   fs_get_obj(ft, iter);
+   return ft;
+   }
+   ft = find_closest_ft_recursive(iter, &iter->children, reverse);
+   if (ft)
+   return ft;
+   }
+
+   return ft;
+}
+
+/* If reverse if false then return the first flow table in next priority of
+ * prio in the tree, else return the last flow table in the previous priority
+ * of prio in the tree.
+ */
+static struct mlx5_flow_table *find_closest_ft(struct fs_prio *prio, bool 
reverse)
+{
+   struct mlx5_flow_table *ft = NULL;
+   struct fs_node *curr_node;
+   struct fs_node *parent;
+
+   parent = prio->node.parent;
+   curr_node = &prio->node;
+   while (!ft && parent) {
+   ft = find_closest_ft_recursive(parent, &curr_node->list, 
reverse);
+   curr_node = parent;
+   parent = curr_node->parent;
+   }
+   return ft;
+}
+
+/* Assuming all the tree is locked by mutex chain lock */
+static struct mlx5_flow_table *find_next_chained_ft(struct fs_prio *prio)
+{
+   return find_closest_ft(prio, false);
+}
+
+/* Assuming all the tree is locked by mutex chain lock */
+static struct mlx5_flow_table *find_prev_chained_ft(struct fs_prio *prio)
+{
+   return find_closest_ft(prio, true);
+}
+
 struct mlx5_flow_table *mlx5_create_flow_table(struct mlx5_flow_namespace *ns,
   int prio,
   int max_fte)
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH net-next V1 01/12] net/mlx5_core: Introduce flow steering autogrouped flow table

2016-01-06 Thread Saeed Mahameed
From: Maor Gottlieb 

When user add rule to autogrouped flow table, we search
for flow group with the same match criteria, if we don't
find such group then we create new flow group with the
required match criteria and insert the rule to this group.

We divide the flow table into required_groups + 1,
in order to reserve a part of the flow table for rules
which don't match any existing group.

Signed-off-by: Maor Gottlieb 
Signed-off-by: Moni Shoua 
Signed-off-by: Matan Barak 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c |  166 ++---
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.h |5 +
 include/linux/mlx5/fs.h   |6 +
 3 files changed, 158 insertions(+), 19 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c 
b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index f7d62fe..7d24bbb 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -85,6 +85,12 @@ static struct init_tree_node {
}
 };
 
+enum fs_i_mutex_lock_class {
+   FS_MUTEX_GRANDPARENT,
+   FS_MUTEX_PARENT,
+   FS_MUTEX_CHILD
+};
+
 static void del_rule(struct fs_node *node);
 static void del_flow_table(struct fs_node *node);
 static void del_flow_group(struct fs_node *node);
@@ -119,10 +125,11 @@ static void tree_get_node(struct fs_node *node)
atomic_inc(&node->refcount);
 }
 
-static void nested_lock_ref_node(struct fs_node *node)
+static void nested_lock_ref_node(struct fs_node *node,
+enum fs_i_mutex_lock_class class)
 {
if (node) {
-   mutex_lock_nested(&node->lock, SINGLE_DEPTH_NESTING);
+   mutex_lock_nested(&node->lock, class);
atomic_inc(&node->refcount);
}
 }
@@ -481,9 +488,7 @@ struct mlx5_flow_table *mlx5_create_flow_table(struct 
mlx5_flow_namespace *ns,
list_add_tail(&ft->node.list, &fs_prio->node.children);
fs_prio->num_ft++;
unlock_ref_node(&fs_prio->node);
-
return ft;
-
 free_ft:
kfree(ft);
 unlock_prio:
@@ -491,8 +496,32 @@ unlock_prio:
return ERR_PTR(err);
 }
 
-struct mlx5_flow_group *mlx5_create_flow_group(struct mlx5_flow_table *ft,
-  u32 *fg_in)
+struct mlx5_flow_table *mlx5_create_auto_grouped_flow_table(struct 
mlx5_flow_namespace *ns,
+   int prio,
+   int 
num_flow_table_entries,
+   int max_num_groups)
+{
+   struct mlx5_flow_table *ft;
+
+   if (max_num_groups > num_flow_table_entries)
+   return ERR_PTR(-EINVAL);
+
+   ft = mlx5_create_flow_table(ns, prio, num_flow_table_entries);
+   if (IS_ERR(ft))
+   return ft;
+
+   ft->autogroup.active = true;
+   ft->autogroup.required_groups = max_num_groups;
+
+   return ft;
+}
+
+/* Flow table should be locked */
+static struct mlx5_flow_group *create_flow_group_common(struct mlx5_flow_table 
*ft,
+   u32 *fg_in,
+   struct list_head
+   *prev_fg,
+   bool is_auto_fg)
 {
struct mlx5_flow_group *fg;
struct mlx5_core_dev *dev = get_dev(&ft->node);
@@ -505,18 +534,33 @@ struct mlx5_flow_group *mlx5_create_flow_group(struct 
mlx5_flow_table *ft,
if (IS_ERR(fg))
return fg;
 
-   lock_ref_node(&ft->node);
err = mlx5_cmd_create_flow_group(dev, ft, fg_in, &fg->id);
if (err) {
kfree(fg);
-   unlock_ref_node(&ft->node);
return ERR_PTR(err);
}
+
+   if (ft->autogroup.active)
+   ft->autogroup.num_groups++;
/* Add node to tree */
-   tree_init_node(&fg->node, 1, del_flow_group);
+   tree_init_node(&fg->node, !is_auto_fg, del_flow_group);
tree_add_node(&fg->node, &ft->node);
/* Add node to group list */
list_add(&fg->node.list, ft->node.children.prev);
+
+   return fg;
+}
+
+struct mlx5_flow_group *mlx5_create_flow_group(struct mlx5_flow_table *ft,
+  u32 *fg_in)
+{
+   struct mlx5_flow_group *fg;
+
+   if (ft->autogroup.active)
+   return ERR_PTR(-EPERM);
+
+   lock_ref_node(&ft->node);
+   fg = create_flow_group_common(ft, fg_in, &ft->node.children, false);
unlock_ref_node(&ft->node);
 
return fg;
@@ -614,7 +658,63 @@ static struct fs_fte *create_fte(struct mlx5_flow_group 
*fg,
return fte;
 }
 
-/* Assuming parent fg(flow table) is locked */
+static struct mlx5_flow_group *create_autogroup(struct mlx5_flow_table *ft,
+

[PATCH net-next V1 03/12] net/mlx5_core: Managing root flow table

2016-01-06 Thread Saeed Mahameed
From: Maor Gottlieb 

The root Flow Table for each Flow Table Type is defined,
by default, as the Flow Table with level 0.

In order not to use an empty flow tables and introduce new hops,
but still preserve space for flow-tables that have a priority
greater(lower number) than the current flow table, we introduce this
new set root flow table command.
This command tells the HW to start matching packets from the
assigned root flow table.
This command is used when we create new flow table with level lower than the
current lowest flow table or it is the first flow table.

Signed-off-by: Maor Gottlieb 
Signed-off-by: Moni Shoua 
Signed-off-by: Matan Barak 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c  |   18 
 drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.h  |2 +
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c |   97 +++--
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.h |6 ++
 include/linux/mlx5/mlx5_ifc.h |   31 +++-
 5 files changed, 144 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c 
b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
index 5096f4f..d8b1195 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
@@ -38,6 +38,24 @@
 #include "fs_cmd.h"
 #include "mlx5_core.h"
 
+int mlx5_cmd_update_root_ft(struct mlx5_core_dev *dev,
+   struct mlx5_flow_table *ft)
+{
+   u32 in[MLX5_ST_SZ_DW(set_flow_table_root_in)];
+   u32 out[MLX5_ST_SZ_DW(set_flow_table_root_out)];
+
+   memset(in, 0, sizeof(in));
+
+   MLX5_SET(set_flow_table_root_in, in, opcode,
+MLX5_CMD_OP_SET_FLOW_TABLE_ROOT);
+   MLX5_SET(set_flow_table_root_in, in, table_type, ft->type);
+   MLX5_SET(set_flow_table_root_in, in, table_id, ft->id);
+
+   memset(out, 0, sizeof(out));
+   return mlx5_cmd_exec_check_status(dev, in, sizeof(in), out,
+ sizeof(out));
+}
+
 int mlx5_cmd_create_flow_table(struct mlx5_core_dev *dev,
   enum fs_flow_table_type type, unsigned int level,
   unsigned int log_size, unsigned int *table_id)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.h 
b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.h
index f39304e..70d18ec 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.h
@@ -62,4 +62,6 @@ int mlx5_cmd_delete_fte(struct mlx5_core_dev *dev,
struct mlx5_flow_table *ft,
unsigned int index);
 
+int mlx5_cmd_update_root_ft(struct mlx5_core_dev *dev,
+   struct mlx5_flow_table *ft);
 #endif
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c 
b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index c5a96e6..64bdb54 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -510,6 +510,29 @@ static struct mlx5_flow_table *find_prev_chained_ft(struct 
fs_prio *prio)
return find_closest_ft(prio, true);
 }
 
+static int update_root_ft_create(struct mlx5_flow_table *ft, struct fs_prio
+*prio)
+{
+   struct mlx5_flow_root_namespace *root = find_root(&prio->node);
+   int min_level = INT_MAX;
+   int err;
+
+   if (root->root_ft)
+   min_level = root->root_ft->level;
+
+   if (ft->level >= min_level)
+   return 0;
+
+   err = mlx5_cmd_update_root_ft(root->dev, ft);
+   if (err)
+   mlx5_core_warn(root->dev, "Update root flow table of id=%u 
failed\n",
+  ft->id);
+   else
+   root->root_ft = ft;
+
+   return err;
+}
+
 struct mlx5_flow_table *mlx5_create_flow_table(struct mlx5_flow_namespace *ns,
   int prio,
   int max_fte)
@@ -526,14 +549,15 @@ struct mlx5_flow_table *mlx5_create_flow_table(struct 
mlx5_flow_namespace *ns,
return ERR_PTR(-ENODEV);
}
 
+   mutex_lock(&root->chain_lock);
fs_prio = find_prio(ns, prio);
-   if (!fs_prio)
-   return ERR_PTR(-EINVAL);
-
-   lock_ref_node(&fs_prio->node);
+   if (!fs_prio) {
+   err = -EINVAL;
+   goto unlock_root;
+   }
if (fs_prio->num_ft == fs_prio->max_ft) {
err = -ENOSPC;
-   goto unlock_prio;
+   goto unlock_root;
}
 
ft = alloc_flow_table(find_next_free_level(fs_prio),
@@ -541,7 +565,7 @@ struct mlx5_flow_table *mlx5_create_flow_table(struct 
mlx5_flow_namespace *ns,
  root->table_type);
if (!ft) {
err = -ENOMEM;
-   goto unlock_prio;
+   

[PATCH net-next V1 07/12] net/mlx5_core: Initialize namespaces only when supported by device

2016-01-06 Thread Saeed Mahameed
From: Maor Gottlieb 

Before we create the sub tree of a steering namespaces(kernel, bypass,
leftovers) we check that the device has the required capabilities
in order to create this subtree.

Signed-off-by: Maor Gottlieb 
Signed-off-by: Moni Shoua 
Signed-off-by: Matan Barak 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c |   70 ++--
 1 files changed, 49 insertions(+), 21 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c 
b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index e1282e8..96e287a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -40,20 +40,17 @@
 #define INIT_TREE_NODE_ARRAY_SIZE(...) (sizeof((struct 
init_tree_node[]){__VA_ARGS__}) /\
 sizeof(struct init_tree_node))
 
-#define INIT_PRIO(min_level_val, max_ft_val,\
- ...) {.type = FS_TYPE_PRIO,\
+#define ADD_PRIO(min_level_val, max_ft_val, caps_val,\
+...) {.type = FS_TYPE_PRIO,\
.min_ft_level = min_level_val,\
.max_ft = max_ft_val,\
+   .caps = caps_val,\
.children = (struct init_tree_node[]) {__VA_ARGS__},\
.ar_size = INIT_TREE_NODE_ARRAY_SIZE(__VA_ARGS__) \
 }
 
-#define ADD_PRIO(min_level_val, max_ft_val, ...)\
-   INIT_PRIO(min_level_val, max_ft_val,\
- __VA_ARGS__)\
-
 #define ADD_FT_PRIO(max_ft_val, ...)\
-   INIT_PRIO(0, max_ft_val,\
+   ADD_PRIO(0, max_ft_val, {},\
  __VA_ARGS__)\
 
 #define ADD_NS(...) {.type = FS_TYPE_NAMESPACE,\
@@ -61,12 +58,26 @@
.ar_size = INIT_TREE_NODE_ARRAY_SIZE(__VA_ARGS__) \
 }
 
+#define INIT_CAPS_ARRAY_SIZE(...) (sizeof((long[]){__VA_ARGS__}) /\
+  sizeof(long))
+
+#define FS_CAP(cap) (__mlx5_bit_off(flow_table_nic_cap, cap))
+
+#define FS_REQUIRED_CAPS(...) {.arr_sz = INIT_CAPS_ARRAY_SIZE(__VA_ARGS__), \
+  .caps = (long[]) {__VA_ARGS__} }
+
 #define KERNEL_MAX_FT 2
 #define KENREL_MIN_LEVEL 2
+
+struct node_caps {
+   size_t  arr_sz;
+   long*caps;
+};
 static struct init_tree_node {
enum fs_node_type   type;
struct init_tree_node *children;
int ar_size;
+   struct node_caps caps;
int min_ft_level;
int prio;
int max_ft;
@@ -74,7 +85,7 @@ static struct init_tree_node {
.type = FS_TYPE_NAMESPACE,
.ar_size = 1,
.children = (struct init_tree_node[]) {
-   ADD_PRIO(KENREL_MIN_LEVEL, 0,
+   ADD_PRIO(KENREL_MIN_LEVEL, 0, {},
 ADD_NS(ADD_FT_PRIO(KERNEL_MAX_FT))),
}
 };
@@ -1153,11 +1164,31 @@ static struct mlx5_flow_namespace 
*fs_create_namespace(struct fs_prio *prio)
return ns;
 }
 
-static int init_root_tree_recursive(int max_ft_level, struct init_tree_node 
*init_node,
+#define FLOW_TABLE_BIT_SZ 1
+#define GET_FLOW_TABLE_CAP(dev, offset) \
+   ((be32_to_cpu(*((__be32 *)(dev->hca_caps_cur[MLX5_CAP_FLOW_TABLE]) +
\
+   offset / 32)) >>
\
+ (32 - FLOW_TABLE_BIT_SZ - (offset & 0x1f))) & FLOW_TABLE_BIT_SZ)
+static bool has_required_caps(struct mlx5_core_dev *dev, struct node_caps 
*caps)
+{
+   int i;
+
+   for (i = 0; i < caps->arr_sz; i++) {
+   if (!GET_FLOW_TABLE_CAP(dev, caps->caps[i]))
+   return false;
+   }
+   return true;
+}
+
+static int init_root_tree_recursive(struct mlx5_core_dev *dev,
+   struct init_tree_node *init_node,
struct fs_node *fs_parent_node,
struct init_tree_node *init_parent_node,
int index)
 {
+   int max_ft_level = MLX5_CAP_FLOWTABLE(dev,
+ flow_table_properties_nic_receive.
+ max_ft_level);
struct mlx5_flow_namespace *fs_ns;
struct fs_prio *fs_prio;
struct fs_node *base;
@@ -1165,8 +1196,9 @@ static int init_root_tree_recursive(int max_ft_level, 
struct init_tree_node *ini
int err;
 
if (init_node->type == FS_TYPE_PRIO) {
-   if (init_node->min_ft_level > max_ft_level)
-   return -ENOTSUPP;
+   if ((init_node->min_ft_level > max_ft_level) ||
+   !has_required_caps(dev, &init_node->caps))
+   return 0;
 
fs_get_obj(fs_ns, fs_parent_node);
fs_prio = fs_create_prio(fs_ns, index, init_node->max_ft);
@@ -1183,9 +1215,8 @@ static int init_root_tree_recursive(int max_ft_level, 
struct init_tree_node *ini
return -EINVAL;
}
for (i = 0; i < init_node->ar_size; i++) {
-   err = init_root_tree_recursive(max_ft_level,
-  

Re: [PATCH v2] net: Add eth_platform_get_mac_address() helper.

2016-01-06 Thread Sowmini Varadhan
On (01/06/16 16:33), David Miller wrote:
> 
> Signed-off-by: David S. Miller 

Acked-by: Sowmini Varadhan 

> 
> As promised back in November, I'm commiting this into net-next
> now.  I'd work on the conversions of existing drivers, but I'd
> rather someone with access to the hardware to so.

I'll look into this.

--Sowmini
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2] net: Add eth_platform_get_mac_address() helper.

2016-01-06 Thread David Miller

A repeating pattern in drivers has become to use OF node information
and, if not found, platform specific host information to extract the
ethernet address for a given device.

Currently this is done with a call to of_get_mac_address() and then
some ifdef'd stuff for SPARC.

Consolidate this into a portable routine, and provide the
arch_get_platform_mac_address() weak function hook for all
architectures to implement if they want.

Signed-off-by: David S. Miller 
---

As promised back in November, I'm commiting this into net-next
now.  I'd work on the conversions of existing drivers, but I'd
rather someone with access to the hardware to so.

 arch/sparc/kernel/idprom.c  |  7 +++
 include/linux/etherdevice.h |  3 +++
 net/ethernet/eth.c  | 31 +++
 3 files changed, 41 insertions(+)

diff --git a/arch/sparc/kernel/idprom.c b/arch/sparc/kernel/idprom.c
index 6bd7501..f95dd11 100644
--- a/arch/sparc/kernel/idprom.c
+++ b/arch/sparc/kernel/idprom.c
@@ -9,6 +9,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -60,6 +61,12 @@ static void __init display_system_type(unsigned char 
machtype)
 {
 }
 #endif
+
+unsigned char *arch_get_platform_mac_address(void)
+{
+   return idprom->id_ethaddr;
+}
+
 /* Calculate the IDPROM checksum (xor of the data bytes). */
 static unsigned char __init calc_idprom_cksum(struct idprom *idprom)
 {
diff --git a/include/linux/etherdevice.h b/include/linux/etherdevice.h
index eb049c6..37ff4a6 100644
--- a/include/linux/etherdevice.h
+++ b/include/linux/etherdevice.h
@@ -29,6 +29,9 @@
 #include 
 
 #ifdef __KERNEL__
+struct device;
+int eth_platform_get_mac_address(struct device *dev, u8 *mac_addr);
+unsigned char *arch_get_platform_get_mac_address(void);
 u32 eth_get_headlen(void *data, unsigned int max_len);
 __be16 eth_type_trans(struct sk_buff *skb, struct net_device *dev);
 extern const struct header_ops eth_header_ops;
diff --git a/net/ethernet/eth.c b/net/ethernet/eth.c
index 9e63f25..1038717 100644
--- a/net/ethernet/eth.c
+++ b/net/ethernet/eth.c
@@ -52,6 +52,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
 #include 
 #include 
@@ -485,3 +487,32 @@ static int __init eth_offload_init(void)
 }
 
 fs_initcall(eth_offload_init);
+
+unsigned char * __weak arch_get_platform_mac_address(void)
+{
+   return NULL;
+}
+
+int eth_platform_get_mac_address(struct device *dev, u8 *mac_addr)
+{
+   const unsigned char *addr;
+   struct device_node *dp;
+
+   if (dev_is_pci(dev))
+   dp = pci_device_to_OF_node(to_pci_dev(dev));
+   else
+   dp = dev->of_node;
+
+   addr = NULL;
+   if (dp)
+   addr = of_get_mac_address(dp);
+   if (!addr)
+   addr = arch_get_platform_mac_address();
+
+   if (!addr)
+   return -ENODEV;
+
+   ether_addr_copy(mac_addr, addr);
+   return 0;
+}
+EXPORT_SYMBOL(eth_platform_get_mac_address);
-- 
2.4.1

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH net-next] bpf: cleanup bpf_prog_run_{save,clear}_cb helpers

2016-01-06 Thread Daniel Borkmann
Move the details behind the cb[] access into a small helper to decouple
and make them generic for bpf_prog_run_save_cb()/bpf_prog_run_clear_cb()
that was introduced via commit ff936a04e5f2 ("bpf: fix cb access in socket
filter programs"). Also add a comment to better clarify what is done in
bpf_skb_cb().

Signed-off-by: Daniel Borkmann 
---
 include/linux/filter.h | 39 +--
 1 file changed, 29 insertions(+), 10 deletions(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index 294c3cd..30fa395 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -350,25 +350,43 @@ struct sk_filter {
 
 #define BPF_PROG_RUN(filter, ctx)  (*filter->bpf_func)(ctx, filter->insnsi)
 
+#define BPF_SKB_CB_LEN QDISC_CB_PRIV_LEN
+
+static inline u8 *bpf_skb_cb(struct sk_buff *skb)
+{
+   /* eBPF programs may read/write skb->cb[] area to transfer meta
+* data between tail calls. Since this also needs to work with
+* tc, that scratch memory is mapped to qdisc_skb_cb's data area.
+*
+* In some socket filter cases, the cb unfortunately needs to be
+* saved/restored so that protocol specific skb->cb[] data won't
+* be lost. In any case, due to unpriviledged eBPF programs
+* attached to sockets, we need to clear the bpf_skb_cb() area
+* to not leak previous contents to user space.
+*/
+   BUILD_BUG_ON(FIELD_SIZEOF(struct __sk_buff, cb) != BPF_SKB_CB_LEN);
+   BUILD_BUG_ON(FIELD_SIZEOF(struct __sk_buff, cb) !=
+FIELD_SIZEOF(struct qdisc_skb_cb, data));
+
+   return qdisc_skb_cb(skb)->data;
+}
+
 static inline u32 bpf_prog_run_save_cb(const struct bpf_prog *prog,
   struct sk_buff *skb)
 {
-   u8 *cb_data = qdisc_skb_cb(skb)->data;
-   u8 saved_cb[QDISC_CB_PRIV_LEN];
+   u8 *cb_data = bpf_skb_cb(skb);
+   u8 cb_saved[BPF_SKB_CB_LEN];
u32 res;
 
-   BUILD_BUG_ON(FIELD_SIZEOF(struct __sk_buff, cb) !=
-QDISC_CB_PRIV_LEN);
-
if (unlikely(prog->cb_access)) {
-   memcpy(saved_cb, cb_data, sizeof(saved_cb));
-   memset(cb_data, 0, sizeof(saved_cb));
+   memcpy(cb_saved, cb_data, sizeof(cb_saved));
+   memset(cb_data, 0, sizeof(cb_saved));
}
 
res = BPF_PROG_RUN(prog, skb);
 
if (unlikely(prog->cb_access))
-   memcpy(cb_data, saved_cb, sizeof(saved_cb));
+   memcpy(cb_data, cb_saved, sizeof(cb_saved));
 
return res;
 }
@@ -376,10 +394,11 @@ static inline u32 bpf_prog_run_save_cb(const struct 
bpf_prog *prog,
 static inline u32 bpf_prog_run_clear_cb(const struct bpf_prog *prog,
struct sk_buff *skb)
 {
-   u8 *cb_data = qdisc_skb_cb(skb)->data;
+   u8 *cb_data = bpf_skb_cb(skb);
 
if (unlikely(prog->cb_access))
-   memset(cb_data, 0, QDISC_CB_PRIV_LEN);
+   memset(cb_data, 0, BPF_SKB_CB_LEN);
+
return BPF_PROG_RUN(prog, skb);
 }
 
-- 
1.9.3

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net] Driver: Vmxnet3: Fix regression caused by 5738a09

2016-01-06 Thread David Miller
From: Shrikrishna Khare 
Date: Wed,  6 Jan 2016 10:44:27 -0800

> Reported-by: Bingkuo Liu 
> Signed-off-by: Shrikrishna Khare 

Applied, thanks.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next 2/2] geneve: break dependency to network drivers

2016-01-06 Thread Hannes Frederic Sowa

On 06.01.2016 22:01, Jesse Gross wrote:

On Wed, Jan 6, 2016 at 12:25 PM, Hannes Frederic Sowa
 wrote:

The refreshes from each module are completely synchronous and don't get
interleaved, so as long as the driver is correctly handling the locking
internally rtnl lock shouldn't be needed. But as I don't know too much about
driver developing I can revisit this.

As a advantage I see that the driver developers don't need to worry about
the rtnl lock at all when adding new events. Is this realistic?


I don't think that there is much savings to be had by avoiding RTNL
since the majority of interactions that the driver has with the stack
involve holding it anyways.

In order to do this safely without RTNL we need to have a lock in each
driver. I don't think that this is safely handled in all cases today
and is likely to get worse in the future. I also noticed that Geneve
actually doesn't hold any special lock while calling into drivers from
geneve_get_rx_port() so it is de-facto relying on RTNL. All other
operations in the Geneve driver are protected by RTNL currently, so we
would need to introduce a new lock to handle this as well. In effect,
it seems like people are implicitly assuming that these operations are
covered by RTNL since most similar things are.


Okay, on top of the v1 version I will check all drivers and add 
necessary rtnl_locks. Hopefully it works out and I don't have to defer 
calls into working queues in the drivers first.


Thanks,
Hannes

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG] skb corruption and kernel panic at forwarding with fragmentation

2016-01-06 Thread Thadeu Lima de Souza Cascardo
On Wed, Jan 06, 2016 at 11:11:41PM +0300, Konstantin Khlebnikov wrote:
> On Wed, Jan 6, 2016 at 10:59 PM, Cong Wang  wrote:
> > On Wed, Jan 6, 2016 at 11:15 AM, Konstantin Khlebnikov  
> > wrote:
> >> Looks like this happens because ip_options_fragment() relies on
> >> correct ip options length in ip control block in skb. But in
> >> ip_finish_output_gso() control block in segments is reused by
> >> skb_gso_segment(). following ip_fragment() sees some garbage.
> >>
> >> In my case there was no ip options but length becomes non-zero and
> >> ip_options_fragment() picked some bytes from payload and decides to
> >> fill huge range with IPOPT_NOOP (1). One of that ones flipped nr_frags
> >> in skb_shared_info at the end of data =)
> >>
> >
> > Hmm, it looks like SKB_GSO_CB should be cleared after skb_gso_segment()
> > since all the gso information should be saved in shared_info after it 
> > finishes.
> >
> > Does a memset(0) on SKB_GSO_CB after skb_gso_segment() work as well?
> 
> This will break present logic around ip_options_fragment() - it clears
> options from
> second and following fragments. With zeroed cb it will do nothing.
> 
> ip_options_fragment() can get required information directly from ip header but
> it also resets fields in IPCB -- probably it should stay valid here
> and somebody else will use it later.
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

I have hit this as well, this fixes it for me on an older kernel. Can you try it
on latest kernel?

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index d8a1745..f44bc91 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -216,6 +216,7 @@ static int ip_finish_output_gso(struct sk_buff *skb)
netdev_features_t features;
struct sk_buff *segs;
int ret = 0;
+   struct inet_skb_parm ipcb;
 
if (skb_gso_network_seglen(skb) <= ip_skb_dst_mtu(skb))
return ip_finish_output2(skb);
@@ -227,6 +228,10 @@ static int ip_finish_output_gso(struct sk_buff *skb)
 * 2) skb arrived via virtio-net, we thus get TSO/GSO skbs directly
 * from host network stack.
 */
+   /* We need to save IPCB here because skb_gso_segment will use
+* SKB_GSO_CB.
+*/
+   ipcb = *IPCB(skb);
features = netif_skb_features(skb);
segs = skb_gso_segment(skb, features & ~NETIF_F_GSO_MASK);
if (IS_ERR_OR_NULL(segs)) {
@@ -241,6 +246,7 @@ static int ip_finish_output_gso(struct sk_buff *skb)
int err;
 
segs->next = NULL;
+   *IPCB(segs) = ipcb;
err = ip_fragment(segs, ip_finish_output2);
 
if (err && ret == 0)
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next 2/2] geneve: break dependency to network drivers

2016-01-06 Thread Jesse Gross
On Wed, Jan 6, 2016 at 12:25 PM, Hannes Frederic Sowa
 wrote:
> On 06.01.2016 20:52, Jesse Gross wrote:
>> On Wed, Jan 6, 2016 at 10:48 AM, Hannes Frederic Sowa
>>  wrote:
>>> On 06.01.2016 19:00, Jesse Gross wrote:
 Unfortunately, I don't think that we can assume that RTNL is held
 here. It actually is for the drivers that implement Geneve at this
 point but not in all cases for VXLAN. For example, ixgbe refreshes the
 offloads in a service task in addition to when it is opened. There's
 only a couple instances of things like this, so I guess it's probably
 not too hard to through and make sure that we hold RTNL in those
 cases.
>>>
>>>
>>>
>>> Hmm, I am tempted to switch over to the netevent atomic notifier chain
>>> and
>>> install those events there. It does not need rtnl lock at all, so we can
>>> preserve the current semantics. What do you think?
>>
>>
>> I think that holding RTNL while we do these updates is actually the
>> right thing to do. The current situation of having calls from
>> different protocols protected by different locks is not really a great
>> model given that at the driver level these are usually shared data
>> structures. RTNL is already held in the majority of cases already, so
>> I think it is better to just convert the rest.
>
>
> The refreshes from each module are completely synchronous and don't get
> interleaved, so as long as the driver is correctly handling the locking
> internally rtnl lock shouldn't be needed. But as I don't know too much about
> driver developing I can revisit this.
>
> As a advantage I see that the driver developers don't need to worry about
> the rtnl lock at all when adding new events. Is this realistic?

I don't think that there is much savings to be had by avoiding RTNL
since the majority of interactions that the driver has with the stack
involve holding it anyways.

In order to do this safely without RTNL we need to have a lock in each
driver. I don't think that this is safely handled in all cases today
and is likely to get worse in the future. I also noticed that Geneve
actually doesn't hold any special lock while calling into drivers from
geneve_get_rx_port() so it is de-facto relying on RTNL. All other
operations in the Geneve driver are protected by RTNL currently, so we
would need to introduce a new lock to handle this as well. In effect,
it seems like people are implicitly assuming that these operations are
covered by RTNL since most similar things are.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next] net: move ndo_features_check() close to ndo_start_xmit()

2016-01-06 Thread David Miller
From: Eric Dumazet 
Date: Wed, 06 Jan 2016 06:53:50 -0800

> From: Eric Dumazet 
> 
> TX fast path uses ndo_start_xmit(), ndo_features_check() and
> ndo_select_queue().
> 
> Move ndo_features_check() close to ndo_start_xmit() to increase
> data locality.
> 
> All "struct net_device_ops" should now be using C99 initializers.
> 
> Signed-off-by: Eric Dumazet 

Looks good, applied, thanks Eric.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net 0/2] vxlan: Set a large MTU on ovs-created vxlan devices

2016-01-06 Thread David Miller
From: David Wragg 
Date: Wed,  6 Jan 2016 13:33:04 +

> Prior to 4.3, openvswitch vxlan vports could transmit vxlan packets of
> any size, constrained only by the ability to transmit the resulting
> UDP packets.  4.3 introduced vxlan netdevs corresponding to vxlan
> vports.  These netdevs have an MTU, which limits the size of a packet
> that can be successfully vxlan-encapsulated.  The default value for
> this MTU is 1500, which is awkwardly small, and leads to a conspicuous
> change in behaviour for userspace.
> 
> These two patches set the MTU on openvswitch-crated vxlan devices to
> be 65465 (the maximum IP packet size minus the vxlan-on-IPv6
> overhead), effectively restoring the behaviour prior to 4.3.  In order
> to accomplish this, the first patch removes the MTU constraint of 1500
> for vxlan netdevs without an underlying device.

Is this really the right thing to do?  Won't we get a lot of fragmentation
by using such a large MTU, especially since you're making it the default
for OVS setups?

Things like path MTU discovery hinge strongly upon accurate MTU settings.
Otherwise they won't function properly.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] net: qmi_wwan: Add WeTelecom-WPD600N

2016-01-06 Thread David Miller
From: Kristian Evensen 
Date: Wed,  6 Jan 2016 14:15:50 +0100

> From: Kristian Evensen 
> 
> The WeTelecom-WPD600N is an LTE module that, in addition to supporting most
> "normal" bands, also supports LTE over 450MHz. Manual testing showed that
> only interface number three replies to QMI messages.
> 
> Cc: Bjørn Mork 
> Signed-off-by: Kristian Evensen 

Applied, thanks.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next] ixgbe: Fix for RAR0 not being set to default MAC addr

2016-01-06 Thread Sowmini Varadhan
On (01/06/16 12:05), Tushar Dave wrote:

> commit c9f53e63c208 ("ixgbe: Refactor MAC address configuration code")
> introduced code that doesn't set HW register RAR0 to default mac address
> but FF:FF:FF:FF:FF:FF. Due to this, ixgbe HW discards all incoming packets
> that doesn't have destination mac address equals to FF:FF:FF:FF:FF:FF.
> 
> This commit sets RAR0 correctly to default HW mac address.
> 
> Signed-off-by: Tushar Dave 

Tested-by: Sowmini Varadhan 

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH net] tcp: fix zero cwnd in tcp_cwnd_reduction

2016-01-06 Thread Yuchung Cheng
Patch 3759824da87b ("tcp: PRR uses CRB mode by default and SS mode
conditionally") introduced a bug that cwnd may become 0 when both
inflight and sndcnt are 0 (cwnd = inflight + sndcnt). This may lead
to a div-by-zero if the connection starts another cwnd reduction
phase by setting tp->prior_cwnd to the current cwnd (0) in
tcp_init_cwnd_reduction().

To prevent this we skip PRR operation when nothing is acked or
sacked. Then cwnd must be positive in all cases as long as ssthresh
is positive:

1) The proportional reduction mode
   inflight > ssthresh > 0

2) The reduction bound mode
  a) inflight == ssthresh > 0

  b) inflight < ssthresh
 sndcnt > 0 since newly_acked_sacked > 0 and inflight < ssthresh

Therefore in all cases inflight and sndcnt can not both be 0.
We check invalid tp->prior_cwnd to avoid potential div0 bugs.

In reality this bug is triggered only with a sequence of less common
events.  For example, the connection is terminating an ECN-triggered
cwnd reduction with an inflight 0, then it receives reordered/old
ACKs or DSACKs from prior transmission (which acks nothing). Or the
connection is in fast recovery stage that marks everything lost,
but fails to retransmit due to local issues, then receives data
packets from other end which acks nothing.

Fixes: 3759824da87b ("tcp: PRR uses CRB mode by default and SS mode 
conditionally")
Reported-by: Oleksandr Natalenko 
Signed-off-by: Yuchung Cheng 
Signed-off-by: Neal Cardwell 
Signed-off-by: Eric Dumazet 
---
 net/ipv4/tcp_input.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 2d656ee..d4c5115 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2478,6 +2478,9 @@ static void tcp_cwnd_reduction(struct sock *sk, const int 
prior_unsacked,
int newly_acked_sacked = prior_unsacked -
 (tp->packets_out - tp->sacked_out);
 
+   if (newly_acked_sacked <= 0 || WARN_ON_ONCE(!tp->prior_cwnd))
+   return;
+
tp->prr_delivered += newly_acked_sacked;
if (delta < 0) {
u64 dividend = (u64)tp->snd_ssthresh * tp->prr_delivered +
-- 
2.6.0.rc2.230.g3dd15c0

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch -next] fsl/fman: use the ALIGN() macro

2016-01-06 Thread Dan Carpenter
On Wed, Jan 06, 2016 at 03:27:38PM -0500, David Miller wrote:
> From: Dan Carpenter 
> Date: Wed, 6 Jan 2016 13:03:08 +0300
> 
> > diff --git a/drivers/net/ethernet/freescale/fman/fman_sp.c 
> > b/drivers/net/ethernet/freescale/fman/fman_sp.c
> > index f9e7aa3..b527da1 100644
> > --- a/drivers/net/ethernet/freescale/fman/fman_sp.c
> > +++ b/drivers/net/ethernet/freescale/fman/fman_sp.c
> > @@ -92,11 +92,8 @@ int fman_sp_build_buffer_struct(struct 
> > fman_sp_int_context_data_copy *
> > u32 tmp;
> >  
> > /* Align start of internal context data to 16 byte */
> > -   int_context_data_copy->ext_buf_offset = (u16)
> > -   ((buffer_prefix_content->priv_data_size & (OFFSET_UNITS - 1)) ?
> > -   ((buffer_prefix_content->priv_data_size + OFFSET_UNITS) &
> > -   ~(u16)(OFFSET_UNITS - 1)) :
> > -   buffer_prefix_content->priv_data_size);
> > +   int_context_data_copy->ext_buf_offset =
> > +   (u16)ALIGN(buffer_prefix_content->priv_data_size, 16);
> 
> Why are you using '16' instead of OFFSET_UNITS?

I made a brain fart.  :/  I'll resend.  Thanks.

regards,
dan carpenter

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] arm64: net: bpf: don't BUG() on large shifts

2016-01-06 Thread Rabin Vincent
On Tue, Jan 05, 2016 at 09:55:58AM -0800, Alexei Starovoitov wrote:
> this one is better to be addressed in verifier instead of eBPF JITs.
> Please reject it in check_alu_op() instead.

AFAICS the eBPF verifier is not called on the eBPF filters generated by
the BPF->eBPF conversion in net/core/filter.c, so performing this check
only in check_alu_op() will be insufficient.  So I think we'd need to
add this check to bpf_check_classic() too.  Or am I missing something?
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] mkiss: fix scribble on freed memory

2016-01-06 Thread Sergei Shtylyov

Hello.

On 1/6/2016 5:55 PM, Alan wrote:


commit d79f16c046086f4fe0d42184a458e187464eb83e fixed a user triggerable


   scripts/checkpatch.pl now enforces certain format of the commit citing.


scribble on free memory but added a new one which allows the user to
scribble even more and user controlled data into freed space.

As with 6pack we need to halt the queue before we free the buffers, because
the transmit logic is not protected by the semaphore.

Signed-off-by: Alan Cox 

[...]

MBR, Sergei


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch -next] fsl/fman: fix the pause_time test

2016-01-06 Thread David Miller
From: Dan Carpenter 
Date: Wed, 6 Jan 2016 12:58:09 +0300

> pause_time is unsigned so it can't be less than zero.  The bug means
> that we allow invalid pause-times.
> 
> Fixes: 57ba4c9b56d8 ('fsl/fman: Add FMan MAC support')
> Signed-off-by: Dan Carpenter 

Applied.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch -next] fsl/fman: double free on probe failure

2016-01-06 Thread David Miller
From: Dan Carpenter 
Date: Wed, 6 Jan 2016 12:59:10 +0300

> "priv" is allocated with devm_kzalloc() so freeing it here with kfree()
> will lead to a double free.
> 
> Fixes: 3933961682a3 ('fsl/fman: Add FMan MAC driver')
> Signed-off-by: Dan Carpenter 

Applied.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch -next] fsl/fman: use the ALIGN() macro

2016-01-06 Thread David Miller
From: Dan Carpenter 
Date: Wed, 6 Jan 2016 13:03:08 +0300

> diff --git a/drivers/net/ethernet/freescale/fman/fman_sp.c 
> b/drivers/net/ethernet/freescale/fman/fman_sp.c
> index f9e7aa3..b527da1 100644
> --- a/drivers/net/ethernet/freescale/fman/fman_sp.c
> +++ b/drivers/net/ethernet/freescale/fman/fman_sp.c
> @@ -92,11 +92,8 @@ int fman_sp_build_buffer_struct(struct 
> fman_sp_int_context_data_copy *
>   u32 tmp;
>  
>   /* Align start of internal context data to 16 byte */
> - int_context_data_copy->ext_buf_offset = (u16)
> - ((buffer_prefix_content->priv_data_size & (OFFSET_UNITS - 1)) ?
> - ((buffer_prefix_content->priv_data_size + OFFSET_UNITS) &
> - ~(u16)(OFFSET_UNITS - 1)) :
> - buffer_prefix_content->priv_data_size);
> + int_context_data_copy->ext_buf_offset =
> + (u16)ALIGN(buffer_prefix_content->priv_data_size, 16);

Why are you using '16' instead of OFFSET_UNITS?

Unless you can come up with a good reason, please use OFFSET_UNITS instead
of a magic constant.

Thanks!
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next 2/2] geneve: break dependency to network drivers

2016-01-06 Thread Hannes Frederic Sowa

Hi Jesse,

hmpf, I saw your mail too late and send out another series just now.

On 06.01.2016 20:52, Jesse Gross wrote:

On Wed, Jan 6, 2016 at 10:48 AM, Hannes Frederic Sowa
 wrote:

On 06.01.2016 19:00, Jesse Gross wrote:


On Wed, Jan 6, 2016 at 7:41 AM, Hannes Frederic Sowa
 wrote:


diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index 24b077a32c1c9c..548925d1571cb1 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
+static int geneve_notifier(struct notifier_block *unused,
+  unsigned long event, void *ptr)
+{
+   struct net_device *dev = netdev_notifier_info_to_dev(ptr);
+
+   switch (event) {
+   case NETDEV_REFRESH_OFFLOAD_VXLAN:
+   geneve_notify_refresh_netdev(dev);



Presumably this should be NETDEV_REFRESH_OFFLOAD_GENEVE, not VXLAN.
However, rather than having a notifier for each protocol, it seems
like it might be cleaner to just have a single one that triggers for
all protocols and drivers that don't have the corresponding NDO
wouldn't get called, similar to what happens when the port gets added
in the first place.



Ah, thanks for noticing the typo.

The reason why I went with several types is that I didn't want to change the
behavior and wasn't sure if driver tested with reoccurring offload refreshes
to the driver. What you described was my first patch but because I couldn't
see if that works for all drivers I went this way.


Hmm, I see what you mean but I think it should be safe. All drivers
that have both Geneve and VXLAN offloads make calls to refresh them
back to back, which is a pattern that I would expect to continue. In
that case, having a single notifier that triggers multiple protocols
would have the same effect and is simpler.


My new series adds a new netdev_notifier which is atomic and can deal 
with non locked rtnl events. I thought maybe it would be useful for 
future callbacks, too. I am currently looking into ptp and reduce the 
dependency there, if possible.



diff --git a/include/net/geneve.h b/include/net/geneve.h
index e6c23dc765f7ec..36245115143652 100644
--- a/include/net/geneve.h
+++ b/include/net/geneve.h


[...]


   static inline void geneve_get_rx_port(struct net_device *netdev)
   {
+   call_netdevice_notifiers(NETDEV_REFRESH_OFFLOAD_GENEVE, netdev);
   }



Unfortunately, I don't think that we can assume that RTNL is held
here. It actually is for the drivers that implement Geneve at this
point but not in all cases for VXLAN. For example, ixgbe refreshes the
offloads in a service task in addition to when it is opened. There's
only a couple instances of things like this, so I guess it's probably
not too hard to through and make sure that we hold RTNL in those
cases.



Hmm, I am tempted to switch over to the netevent atomic notifier chain and
install those events there. It does not need rtnl lock at all, so we can
preserve the current semantics. What do you think?


I think that holding RTNL while we do these updates is actually the
right thing to do. The current situation of having calls from
different protocols protected by different locks is not really a great
model given that at the driver level these are usually shared data
structures. RTNL is already held in the majority of cases already, so
I think it is better to just convert the rest.


The refreshes from each module are completely synchronous and don't get 
interleaved, so as long as the driver is correctly handling the locking 
internally rtnl lock shouldn't be needed. But as I don't know too much 
about driver developing I can revisit this.


As a advantage I see that the driver developers don't need to worry 
about the rtnl lock at all when adding new events. Is this realistic?


Thanks,
Hannes


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH net-next v2 3/3] geneve: use netdev_atomic notifier chain to remove dependency from drivers

2016-01-06 Thread Hannes Frederic Sowa
Signed-off-by: Hannes Frederic Sowa 
---
 drivers/net/geneve.c  | 30 +++---
 include/linux/netdevice.h |  1 +
 include/net/geneve.h  |  6 ++
 3 files changed, 30 insertions(+), 7 deletions(-)

diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index 24b077a32c1c9c..e04a7678438999 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -1110,7 +1110,7 @@ static struct device_type geneve_type = {
  * supply the listening GENEVE udp ports. Callers are expected
  * to implement the ndo_add_geneve_port.
  */
-void geneve_get_rx_port(struct net_device *dev)
+static void geneve_netdev_refresh_offload(struct net_device *dev)
 {
struct net *net = dev_net(dev);
struct geneve_net *gn = net_generic(net, geneve_net_id);
@@ -1128,7 +1128,6 @@ void geneve_get_rx_port(struct net_device *dev)
}
rcu_read_unlock();
 }
-EXPORT_SYMBOL_GPL(geneve_get_rx_port);
 
 /* Initialize the device structure. */
 static void geneve_setup(struct net_device *dev)
@@ -1427,6 +1426,24 @@ static struct rtnl_link_ops geneve_link_ops 
__read_mostly = {
.fill_info  = geneve_fill_info,
 };
 
+static int geneve_atomic_event(struct notifier_block *unused,
+  unsigned long event, void *ptr)
+{
+   struct net_device *dev = ptr;
+
+   switch (event) {
+   case NETDEV_OFFLOAD_REFRESH_GENEVE:
+   geneve_netdev_refresh_offload(dev);
+   break;
+   }
+
+   return NOTIFY_DONE;
+}
+
+static struct notifier_block geneve_atomic_notifier_block __read_mostly = {
+   .notifier_call = geneve_atomic_event,
+};
+
 struct net_device *geneve_dev_create_fb(struct net *net, const char *name,
u8 name_assign_type, u16 dst_port)
 {
@@ -1502,11 +1519,17 @@ static int __init geneve_init_module(void)
if (rc)
goto out1;
 
-   rc = rtnl_link_register(&geneve_link_ops);
+   rc = register_netdev_atomic_notifier(&geneve_atomic_notifier_block);
if (rc)
goto out2;
 
+   rc = rtnl_link_register(&geneve_link_ops);
+   if (rc)
+   goto out3;
+
return 0;
+out3:
+   unregister_netdev_atomic_notifier(&geneve_atomic_notifier_block);
 out2:
unregister_pernet_subsys(&geneve_net_ops);
 out1:
@@ -1517,6 +1540,7 @@ late_initcall(geneve_init_module);
 static void __exit geneve_cleanup_module(void)
 {
rtnl_link_unregister(&geneve_link_ops);
+   unregister_netdev_atomic_notifier(&geneve_atomic_notifier_block);
unregister_pernet_subsys(&geneve_net_ops);
 }
 module_exit(geneve_cleanup_module);
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index b2b943b1896c57..21cdb1afc5cd20 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2225,6 +2225,7 @@ int call_netdevice_notifiers(unsigned long val, struct 
net_device *dev);
 
 enum netdev_atomic_callback_type {
NETDEV_OFFLOAD_REFRESH_VXLAN = 0x1UL,
+   NETDEV_OFFLOAD_REFRESH_GENEVE = 0x2UL,
 };
 
 int register_netdev_atomic_notifier(struct notifier_block *nb);
diff --git a/include/net/geneve.h b/include/net/geneve.h
index e6c23dc765f7ec..e09476ed8feb20 100644
--- a/include/net/geneve.h
+++ b/include/net/geneve.h
@@ -62,13 +62,11 @@ struct genevehdr {
struct geneve_opt options[];
 };
 
-#if IS_ENABLED(CONFIG_GENEVE)
-void geneve_get_rx_port(struct net_device *netdev);
-#else
 static inline void geneve_get_rx_port(struct net_device *netdev)
 {
+   call_netdev_atomic_notifiers(NETDEV_OFFLOAD_REFRESH_GENEVE,
+netdev);
 }
-#endif
 
 #ifdef CONFIG_INET
 struct net_device *geneve_dev_create_fb(struct net *net, const char *name,
-- 
2.5.0

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH net-next v2 2/3] vxlan: use netdev_atomic notifier chain to remove dependency from drivers

2016-01-06 Thread Hannes Frederic Sowa
Signed-off-by: Hannes Frederic Sowa 
---
 drivers/net/vxlan.c   | 35 ++-
 include/linux/netdevice.h |  2 +-
 include/net/vxlan.h   |  6 ++
 3 files changed, 33 insertions(+), 10 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index fecf7b6c732e96..47da2045ca1b19 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -2468,7 +2468,7 @@ static struct device_type vxlan_type = {
  * supply the listening VXLAN udp ports. Callers are expected
  * to implement the ndo_add_vxlan_port.
  */
-void vxlan_get_rx_port(struct net_device *dev)
+static void vxlan_netdev_refresh_offload(struct net_device *dev)
 {
struct vxlan_sock *vs;
struct net *net = dev_net(dev);
@@ -2488,7 +2488,6 @@ void vxlan_get_rx_port(struct net_device *dev)
}
spin_unlock(&vn->sock_lock);
 }
-EXPORT_SYMBOL_GPL(vxlan_get_rx_port);
 
 /* Initialize the device structure. */
 static void vxlan_setup(struct net_device *dev)
@@ -3180,6 +3179,24 @@ static struct notifier_block vxlan_notifier_block 
__read_mostly = {
.notifier_call = vxlan_lowerdev_event,
 };
 
+static int vxlan_atomic_event(struct notifier_block *unused,
+ unsigned long event, void *ptr)
+{
+   struct net_device *dev = ptr;
+
+   switch (event) {
+   case NETDEV_OFFLOAD_REFRESH_VXLAN:
+   vxlan_netdev_refresh_offload(dev);
+   break;
+   }
+
+   return NOTIFY_DONE;
+}
+
+static struct notifier_block vxlan_atomic_notifier_block __read_mostly = {
+   .notifier_call = vxlan_atomic_event,
+};
+
 static __net_init int vxlan_init_net(struct net *net)
 {
struct vxlan_net *vn = net_generic(net, vxlan_net_id);
@@ -3241,17 +3258,24 @@ static int __init vxlan_init_module(void)
if (rc)
goto out1;
 
-   rc = register_netdevice_notifier(&vxlan_notifier_block);
+   rc = register_netdev_atomic_notifier(&vxlan_atomic_notifier_block);
if (rc)
goto out2;
 
-   rc = rtnl_link_register(&vxlan_link_ops);
+   rc = register_netdevice_notifier(&vxlan_notifier_block);
if (rc)
goto out3;
 
+   rc = rtnl_link_register(&vxlan_link_ops);
+   if (rc)
+   goto out4;
+
return 0;
-out3:
+
+out4:
unregister_netdevice_notifier(&vxlan_notifier_block);
+out3:
+   unregister_netdev_atomic_notifier(&vxlan_atomic_notifier_block);
 out2:
unregister_pernet_subsys(&vxlan_net_ops);
 out1:
@@ -3263,6 +3287,7 @@ late_initcall(vxlan_init_module);
 static void __exit vxlan_cleanup_module(void)
 {
rtnl_link_unregister(&vxlan_link_ops);
+   unregister_netdev_atomic_notifier(&vxlan_atomic_notifier_block);
unregister_netdevice_notifier(&vxlan_notifier_block);
destroy_workqueue(vxlan_wq);
unregister_pernet_subsys(&vxlan_net_ops);
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index fb6064c88fc259..b2b943b1896c57 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2224,7 +2224,7 @@ netdev_notifier_info_to_dev(const struct 
netdev_notifier_info *info)
 int call_netdevice_notifiers(unsigned long val, struct net_device *dev);
 
 enum netdev_atomic_callback_type {
-   PLACEHOLDER = 0x1UL,
+   NETDEV_OFFLOAD_REFRESH_VXLAN = 0x1UL,
 };
 
 int register_netdev_atomic_notifier(struct notifier_block *nb);
diff --git a/include/net/vxlan.h b/include/net/vxlan.h
index 0fb86442544b26..cebe08569b5129 100644
--- a/include/net/vxlan.h
+++ b/include/net/vxlan.h
@@ -242,13 +242,11 @@ static inline netdev_features_t 
vxlan_features_check(struct sk_buff *skb,
 /* IPv6 header + UDP + VXLAN + Ethernet header */
 #define VXLAN6_HEADROOM (40 + 8 + 8 + 14)
 
-#if IS_ENABLED(CONFIG_VXLAN)
-void vxlan_get_rx_port(struct net_device *netdev);
-#else
 static inline void vxlan_get_rx_port(struct net_device *netdev)
 {
+   call_netdev_atomic_notifiers(NETDEV_OFFLOAD_REFRESH_VXLAN,
+netdev);
 }
-#endif
 
 static inline unsigned short vxlan_get_sk_family(struct vxlan_sock *vs)
 {
-- 
2.5.0

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH net-next v2 0/3] netdev: add netdev atomic callchain to avoid driver depending on vxlan/geneve

2016-01-06 Thread Hannes Frederic Sowa
This small series adds a netdev_atomic notifier chain so that drivers can
callback to the core network stack requesting to reprogram the offloads
without depending on the geneve or vxlan modules.

* Result:
$ cd drivers/net/ethernet/
$ find . -name '*.ko' | xargs modinfo | egrep  '^depends:.*(vxlan|geneve)'  | 
wc -l
0

Hannes Frederic Sowa (3):
  netdev: add atomic netdev callback chain
  vxlan: use netdev_atomic notifier chain to remove dependency from
drivers
  geneve: use netdev_atomic notifier chain to remove dependency from
drivers

 drivers/net/geneve.c  | 30 +++---
 drivers/net/vxlan.c   | 35 ++-
 include/linux/netdevice.h | 10 ++
 include/net/geneve.h  |  6 ++
 include/net/vxlan.h   |  6 ++
 net/core/dev.c| 24 
 6 files changed, 95 insertions(+), 16 deletions(-)

-- 
2.5.0

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH net-next v2 1/3] netdev: add atomic netdev callback chain

2016-01-06 Thread Hannes Frederic Sowa
This atomic notifier is later used by drivers to deliver callbacks to
the core network stack without requireing rtnl lock to be held.

Signed-off-by: Hannes Frederic Sowa 
---
 include/linux/netdevice.h |  9 +
 net/core/dev.c| 24 
 2 files changed, 33 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index c20b814e46a072..fb6064c88fc259 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2223,6 +2223,15 @@ netdev_notifier_info_to_dev(const struct 
netdev_notifier_info *info)
 
 int call_netdevice_notifiers(unsigned long val, struct net_device *dev);
 
+enum netdev_atomic_callback_type {
+   PLACEHOLDER = 0x1UL,
+};
+
+int register_netdev_atomic_notifier(struct notifier_block *nb);
+int unregister_netdev_atomic_notifier(struct notifier_block *nb);
+int call_netdev_atomic_notifiers(enum netdev_atomic_callback_type type,
+struct net_device *dev);
+
 
 extern rwlock_tdev_base_lock;  /* 
Device list lock */
 
diff --git a/net/core/dev.c b/net/core/dev.c
index 914b4a24c65436..e96248d955f1aa 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -7840,6 +7840,30 @@ define_netdev_printk_level(netdev_warn, KERN_WARNING);
 define_netdev_printk_level(netdev_notice, KERN_NOTICE);
 define_netdev_printk_level(netdev_info, KERN_INFO);
 
+static ATOMIC_NOTIFIER_HEAD(netdev_atomic_notifier_chain);
+
+int register_netdev_atomic_notifier(struct notifier_block *nb)
+{
+   return atomic_notifier_chain_register(&netdev_atomic_notifier_chain,
+ nb);
+}
+EXPORT_SYMBOL_GPL(register_netdev_atomic_notifier);
+
+int unregister_netdev_atomic_notifier(struct notifier_block *nb)
+{
+   return atomic_notifier_chain_unregister(&netdev_atomic_notifier_chain,
+   nb);
+}
+EXPORT_SYMBOL_GPL(unregister_netdev_atomic_notifier);
+
+int call_netdev_atomic_notifiers(enum netdev_atomic_callback_type type,
+struct net_device *dev)
+{
+   return atomic_notifier_call_chain(&netdev_atomic_notifier_chain,
+ type, dev);
+}
+EXPORT_SYMBOL_GPL(call_netdev_atomic_notifiers);
+
 static void __net_exit netdev_exit(struct net *net)
 {
kfree(net->dev_name_head);
-- 
2.5.0

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG] skb corruption and kernel panic at forwarding with fragmentation

2016-01-06 Thread Konstantin Khlebnikov
On Wed, Jan 6, 2016 at 10:59 PM, Cong Wang  wrote:
> On Wed, Jan 6, 2016 at 11:15 AM, Konstantin Khlebnikov  
> wrote:
>> Looks like this happens because ip_options_fragment() relies on
>> correct ip options length in ip control block in skb. But in
>> ip_finish_output_gso() control block in segments is reused by
>> skb_gso_segment(). following ip_fragment() sees some garbage.
>>
>> In my case there was no ip options but length becomes non-zero and
>> ip_options_fragment() picked some bytes from payload and decides to
>> fill huge range with IPOPT_NOOP (1). One of that ones flipped nr_frags
>> in skb_shared_info at the end of data =)
>>
>
> Hmm, it looks like SKB_GSO_CB should be cleared after skb_gso_segment()
> since all the gso information should be saved in shared_info after it 
> finishes.
>
> Does a memset(0) on SKB_GSO_CB after skb_gso_segment() work as well?

This will break present logic around ip_options_fragment() - it clears
options from
second and following fragments. With zeroed cb it will do nothing.

ip_options_fragment() can get required information directly from ip header but
it also resets fields in IPCB -- probably it should stay valid here
and somebody else will use it later.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch -next] mlxsw: core: remove an unnecessary condition

2016-01-06 Thread David Miller
From: Dan Carpenter 
Date: Wed, 6 Jan 2016 12:56:30 +0300

> We checked "err" on the lines before so we know it's zero here.
> 
> These cause a static checker warning because checking known things can
> indicate a bug.  Maybe there is a missing assignment or we are checking
> the wrong variable.
> 
> Signed-off-by: Dan Carpenter 

Applied, thanks Dan.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] mkiss: fix scribble on freed memory

2016-01-06 Thread David Miller
From: Alan 
Date: Wed, 06 Jan 2016 14:55:02 +

> commit d79f16c046086f4fe0d42184a458e187464eb83e fixed a user triggerable
> scribble on free memory but added a new one which allows the user to
> scribble even more and user controlled data into freed space.
> 
> As with 6pack we need to halt the queue before we free the buffers, because
> the transmit logic is not protected by the semaphore.
> 
> Signed-off-by: Alan Cox 

Applied and queued up for -stable, thanks Alan.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH net-next] ixgbe: Fix for RAR0 not being set to default MAC addr

2016-01-06 Thread Tushar Dave
commit c9f53e63c208 ("ixgbe: Refactor MAC address configuration code")
introduced code that doesn't set HW register RAR0 to default mac address
but FF:FF:FF:FF:FF:FF. Due to this, ixgbe HW discards all incoming packets
that doesn't have destination mac address equals to FF:FF:FF:FF:FF:FF.

This commit sets RAR0 correctly to default HW mac address.

Signed-off-by: Tushar Dave 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c 
b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index ea9537d..7e6ce13 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -9084,6 +9084,8 @@ skip_sriov:
goto err_sw_init;
}
 
+   /* Set hw->addr to permanent mac address*/
+   ether_addr_copy(hw->mac.addr, hw->mac.perm_addr);
ixgbe_mac_set_default_filter(adapter);
 
setup_timer(&adapter->service_timer, &ixgbe_service_timer,
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 net-next] ethernet/atheros/alx: sanitize buffer sizing and padding

2016-01-06 Thread David Miller
From: Jarod Wilson 
Date: Wed,  6 Jan 2016 09:36:37 -0500

> This is based on the work done by Przemek Rudy in bug 70761 at
> bugzilla.kernel.org, but with some work done to disentagle and clarify
> things a bit.
> 
> Similar to Przemek's work and other drivers, we're adding a padding of 16
> here, but we're also disentangling mtu size calculations from max buffer
> size calculations a bit, and adding ETH_HLEN to the value written into
> ALX_MTU. Hopefully, with a bit more consistency and clarity, things behave
> better here. Sadly, I can only test in my alx-driven E2200, which worked
> just fine before this patch.
> 
> In comment #58 of bug 70761, Eugene A. Shatokhin reports that this patch
> does help considerably for a ROSA Linux user of his with an AR8162 network
> adapter when patched into a 4.1.x-based kernel, with several days of
> normal operation where wired network previously wasn't usable without
> setting MTU to 9000 as a work-around.
> 
> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=70761
> CC: "Eugene A. Shatokhin" 
> CC: Przemek Rudy 
> CC: Jay Cliburn 
> CC: Chris Snook 
> CC: netdev@vger.kernel.org
> Signed-off-by: Jarod Wilson 
> ---
> v2: remove superfluous parens around raw_mtu, pointed out by davem

Looks great, applied, thanks Jarod.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 net-next] net: Implement fast csum_partial for x86_64

2016-01-06 Thread Andi Kleen
Tom Herbert  writes:

> Also, we don't do anything special for alignment, unaligned
> accesses on x86 do not appear to be a performance issue.

This is not true on Atom CPUs.

Also on most CPUs there is still a larger penalty when crossing
cache lines.

> Verified correctness by testing arbitrary length buffer filled with
> random data. For each buffer I compared the computed checksum
> using the original algorithm for each possible alignment (0-7 bytes).
>
> Checksum performance:
>
> Isolating old and new implementation for some common cases:

You forgot to state the CPU. The results likely depend heavily
on the micro architecture.

The original C code was optimized for K8 FWIW.

Overall your assembler looks similar to the C code, except for the jump
table. Jump table has the disadvantage that it is much harder to branch
predict, with a large penalty if it's mispredicted.

I would expect it to be slower for cases where the length
changes frequently. Did you benchmark that case?

-Andi
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/1] net: possible use after free in dst_release

2016-01-06 Thread David Miller
From: Eric Dumazet 
Date: Wed, 06 Jan 2016 05:57:24 -0800

> On Wed, 2016-01-06 at 00:18 -0800, Francesco Ruggeri wrote:
>> dst_release should not access dst->flags after decrementing
>> __refcnt to 0. The dst_entry may be in dst_busy_list and
>> dst_gc_task may dst_destroy it before dst_release gets a chance
>> to access dst->flags.
>> 
>> Signed-off-by: Francesco Ruggeri 
>> ---
>>  net/core/dst.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>> 
>> diff --git a/net/core/dst.c b/net/core/dst.c
>> index e6dc772..a1656e3 100644
>> --- a/net/core/dst.c
>> +++ b/net/core/dst.c
>> @@ -301,12 +301,13 @@ void dst_release(struct dst_entry *dst)
>>  {
>>  if (dst) {
>>  int newrefcnt;
>> +unsigned short nocache = dst->flags & DST_NOCACHE;
>>  
>>  newrefcnt = atomic_dec_return(&dst->__refcnt);
>>  if (unlikely(newrefcnt < 0))
>>  net_warn_ratelimited("%s: dst:%p refcnt:%d\n",
>>   __func__, dst, newrefcnt);
>> -if (!newrefcnt && unlikely(dst->flags & DST_NOCACHE))
>> +if (!newrefcnt && unlikely(nocache))
>>  call_rcu(&dst->rcu_head, dst_destroy_rcu);
>>  }
>>  }
> 
> You're right.
> 
> My previous fix (commit d69bbf88c8d0) was okay only for DST_NOCACHE dst.
> 
> Fixes: d69bbf88c8d0 ("net: fix a race in dst_release()")
> Fixes: 27b75c95f10d ("net: avoid RCU for NOCACHE dst")
> Acked-by: Eric Dumazet 

Applied and queued up for -stable, thanks everyone!
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG] skb corruption and kernel panic at forwarding with fragmentation

2016-01-06 Thread Cong Wang
On Wed, Jan 6, 2016 at 11:15 AM, Konstantin Khlebnikov  wrote:
> Looks like this happens because ip_options_fragment() relies on
> correct ip options length in ip control block in skb. But in
> ip_finish_output_gso() control block in segments is reused by
> skb_gso_segment(). following ip_fragment() sees some garbage.
>
> In my case there was no ip options but length becomes non-zero and
> ip_options_fragment() picked some bytes from payload and decides to
> fill huge range with IPOPT_NOOP (1). One of that ones flipped nr_frags
> in skb_shared_info at the end of data =)
>

Hmm, it looks like SKB_GSO_CB should be cleared after skb_gso_segment()
since all the gso information should be saved in shared_info after it finishes.

Does a memset(0) on SKB_GSO_CB after skb_gso_segment() work as well?
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next 2/2] geneve: break dependency to network drivers

2016-01-06 Thread Jesse Gross
On Wed, Jan 6, 2016 at 10:48 AM, Hannes Frederic Sowa
 wrote:
> On 06.01.2016 19:00, Jesse Gross wrote:
>>
>> On Wed, Jan 6, 2016 at 7:41 AM, Hannes Frederic Sowa
>>  wrote:
>>>
>>> diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
>>> index 24b077a32c1c9c..548925d1571cb1 100644
>>> --- a/drivers/net/geneve.c
>>> +++ b/drivers/net/geneve.c
>>> +static int geneve_notifier(struct notifier_block *unused,
>>> +  unsigned long event, void *ptr)
>>> +{
>>> +   struct net_device *dev = netdev_notifier_info_to_dev(ptr);
>>> +
>>> +   switch (event) {
>>> +   case NETDEV_REFRESH_OFFLOAD_VXLAN:
>>> +   geneve_notify_refresh_netdev(dev);
>>
>>
>> Presumably this should be NETDEV_REFRESH_OFFLOAD_GENEVE, not VXLAN.
>> However, rather than having a notifier for each protocol, it seems
>> like it might be cleaner to just have a single one that triggers for
>> all protocols and drivers that don't have the corresponding NDO
>> wouldn't get called, similar to what happens when the port gets added
>> in the first place.
>
>
> Ah, thanks for noticing the typo.
>
> The reason why I went with several types is that I didn't want to change the
> behavior and wasn't sure if driver tested with reoccurring offload refreshes
> to the driver. What you described was my first patch but because I couldn't
> see if that works for all drivers I went this way.

Hmm, I see what you mean but I think it should be safe. All drivers
that have both Geneve and VXLAN offloads make calls to refresh them
back to back, which is a pattern that I would expect to continue. In
that case, having a single notifier that triggers multiple protocols
would have the same effect and is simpler.

>>> diff --git a/include/net/geneve.h b/include/net/geneve.h
>>> index e6c23dc765f7ec..36245115143652 100644
>>> --- a/include/net/geneve.h
>>> +++ b/include/net/geneve.h
>>
>> [...]
>>>
>>>   static inline void geneve_get_rx_port(struct net_device *netdev)
>>>   {
>>> +   call_netdevice_notifiers(NETDEV_REFRESH_OFFLOAD_GENEVE, netdev);
>>>   }
>>
>>
>> Unfortunately, I don't think that we can assume that RTNL is held
>> here. It actually is for the drivers that implement Geneve at this
>> point but not in all cases for VXLAN. For example, ixgbe refreshes the
>> offloads in a service task in addition to when it is opened. There's
>> only a couple instances of things like this, so I guess it's probably
>> not too hard to through and make sure that we hold RTNL in those
>> cases.
>
>
> Hmm, I am tempted to switch over to the netevent atomic notifier chain and
> install those events there. It does not need rtnl lock at all, so we can
> preserve the current semantics. What do you think?

I think that holding RTNL while we do these updates is actually the
right thing to do. The current situation of having calls from
different protocols protected by different locks is not really a great
model given that at the driver level these are usually shared data
structures. RTNL is already held in the majority of cases already, so
I think it is better to just convert the rest.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


  1   2   3   >