Re: [PATCH] xfrm: Only add l3mdev oif to dst lookups

2016-08-22 Thread Steffen Klassert
On Wed, Aug 17, 2016 at 07:31:14PM -0400, David Miller wrote:
> From: David Ahern 
> Date: Sun, 14 Aug 2016 19:52:56 -0700
> 
> > Subash reported that commit 42a7b32b73d6 ("xfrm: Add oif to dst lookups")
> > broke a wifi use case that uses fib rules and xfrms. The intent of
> > 42a7b32b73d6 was driven by VRFs with IPsec. As a compromise relax the
> > use of oif in xfrm lookups to L3 master devices only (ie., oif is either
> > an L3 master device or is enslaved to a master device).
> > 
> > Fixes: 42a7b32b73d6 ("xfrm: Add oif to dst lookups")
> > Reported-by: Subash Abhinov Kasiviswanathan 
> > Signed-off-by: David Ahern 
> 
> Steffen, please pick this up.

Now applied to the ipsec tree, thanks a lot!
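
The relaxation described in the commit message can be pictured with a small userspace model. This is only a sketch under stated assumptions: `struct toy_dev` and `toy_l3mdev_oif()` are hypothetical names invented for illustration, not kernel API, and the model only captures the rule from the patch description (use an oif for the lookup only when the device is an L3 master or is enslaved to one).

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of a network device; every name here is hypothetical. */
struct toy_dev {
	int ifindex;           /* index of this device */
	int is_l3_master;      /* nonzero if the device is an L3 master (e.g. a VRF) */
	int l3_master_ifindex; /* index of the L3 master it is enslaved to, 0 if none */
};

/*
 * Return the oif to use for the dst lookup:
 *  - the device's own index if it is an L3 master,
 *  - the master's index if it is enslaved to one,
 *  - 0 (no oif constraint) otherwise.
 */
static int toy_l3mdev_oif(const struct toy_dev *dev)
{
	if (dev == NULL)
		return 0;
	if (dev->is_l3_master)
		return dev->ifindex;
	return dev->l3_master_ifindex;
}
```

In this model a plain wifi interface yields 0, so fib rules alone decide the route, while VRF-related devices still constrain the lookup.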


[PATCH iproute2] tipc: add peer remove functionality

2016-08-22 Thread Richard Alpe
This enables a user to remove an offline peer from the kernel data
structures. This could for example be useful when deliberately scaling
in peer nodes in a cloud environment.

This functionality was first merged in:
f9dec657e4 (Richard Alpe tipc: add peer remove functionality)

And later backed out (as the kernel counterpart was held up) in:
385caeb13b (Stephen Hemminger Revert "tipc: add peer remove functionality")

Signed-off-by: Richard Alpe 
Reviewed-by: Jon Maloy 
Reviewed-by: Ying Xue 
---
 include/linux/tipc_netlink.h |  1 +
 man/man8/tipc-bearer.8   |  1 +
 man/man8/tipc-link.8 |  1 +
 man/man8/tipc-media.8|  1 +
 man/man8/tipc-nametable.8|  1 +
 man/man8/tipc-node.8 |  1 +
 man/man8/tipc-peer.8 | 52 +
 man/man8/tipc.8  |  1 +
 tipc/Makefile|  2 +-
 tipc/peer.c  | 93 
 tipc/peer.h  | 21 ++
 tipc/tipc.c  |  3 ++
 12 files changed, 177 insertions(+), 1 deletion(-)
 create mode 100644 man/man8/tipc-peer.8
 create mode 100644 tipc/peer.c
 create mode 100644 tipc/peer.h

diff --git a/include/linux/tipc_netlink.h b/include/linux/tipc_netlink.h
index 5f3f6d0..bcb65ef 100644
--- a/include/linux/tipc_netlink.h
+++ b/include/linux/tipc_netlink.h
@@ -59,6 +59,7 @@ enum {
TIPC_NL_MON_SET,
TIPC_NL_MON_GET,
TIPC_NL_MON_PEER_GET,
+   TIPC_NL_PEER_REMOVE,
 
__TIPC_NL_CMD_MAX,
TIPC_NL_CMD_MAX = __TIPC_NL_CMD_MAX - 1
diff --git a/man/man8/tipc-bearer.8 b/man/man8/tipc-bearer.8
index 50a1ed2..565ee01 100644
--- a/man/man8/tipc-bearer.8
+++ b/man/man8/tipc-bearer.8
@@ -218,6 +218,7 @@ Exit status is 0 if command was successful or a positive integer upon failure.
 .BR tipc-media (8),
 .BR tipc-nametable (8),
 .BR tipc-node (8),
+.BR tipc-peer (8),
 .BR tipc-socket (8)
 .br
 .SH REPORTING BUGS
diff --git a/man/man8/tipc-link.8 b/man/man8/tipc-link.8
index 3be8c9a..2ee03a0 100644
--- a/man/man8/tipc-link.8
+++ b/man/man8/tipc-link.8
@@ -213,6 +213,7 @@ Exit status is 0 if command was successful or a positive integer upon failure.
 .BR tipc-bearer (8),
 .BR tipc-nametable (8),
 .BR tipc-node (8),
+.BR tipc-peer (8),
 .BR tipc-socket (8)
 .br
 .SH REPORTING BUGS
diff --git a/man/man8/tipc-media.8 b/man/man8/tipc-media.8
index 6c6e2b1..4689cb3 100644
--- a/man/man8/tipc-media.8
+++ b/man/man8/tipc-media.8
@@ -74,6 +74,7 @@ Exit status is 0 if command was successful or a positive integer upon failure.
 .BR tipc-link (8),
 .BR tipc-nametable (8),
 .BR tipc-node (8),
+.BR tipc-peer (8),
 .BR tipc-socket (8)
 .br
 .SH REPORTING BUGS
diff --git a/man/man8/tipc-nametable.8 b/man/man8/tipc-nametable.8
index d3397f9..4bcefe4 100644
--- a/man/man8/tipc-nametable.8
+++ b/man/man8/tipc-nametable.8
@@ -87,6 +87,7 @@ Exit status is 0 if command was successful or a positive integer upon failure.
 .BR tipc-link (8),
 .BR tipc-media (8),
 .BR tipc-node (8),
+.BR tipc-peer (8),
 .BR tipc-socket (8)
 .br
 .SH REPORTING BUGS
diff --git a/man/man8/tipc-node.8 b/man/man8/tipc-node.8
index ef32ec7..a72a409 100644
--- a/man/man8/tipc-node.8
+++ b/man/man8/tipc-node.8
@@ -59,6 +59,7 @@ Exit status is 0 if command was successful or a positive integer upon failure.
 .BR tipc-link (8),
 .BR tipc-media (8),
 .BR tipc-nametable (8),
+.BR tipc-peer (8),
 .BR tipc-socket (8)
 .br
 .SH REPORTING BUGS
diff --git a/man/man8/tipc-peer.8 b/man/man8/tipc-peer.8
new file mode 100644
index 000..430651f
--- /dev/null
+++ b/man/man8/tipc-peer.8
@@ -0,0 +1,52 @@
+.TH TIPC-PEER 8 "04 Dec 2015" "iproute2" "Linux"
+
+.\" For consistency, please keep padding right aligned.
+.\" For example '.B "foo " bar' and not '.B foo " bar"'
+
+.SH NAME
+tipc-peer \- modify peer information
+
+.SH SYNOPSIS
+.ad l
+.in +8
+
+.ti -8
+.B tipc peer remove address
+.IR ADDRESS
+
+.SH OPTIONS
+Options (flags) that can be passed anywhere in the command chain.
+.TP
+.BR "\-h" , " --help"
+Show help about last valid command. For example
+.B tipc peer --help
+will show peer help and
+.B tipc --help
+will show general help. The position of the option in the string is irrelevant.
+.SH DESCRIPTION
+
+.SS Peer remove
+Remove an offline peer node from the local data structures. The peer is
+identified by its
+.B address
+
+.SH EXIT STATUS
+Exit status is 0 if command was successful or a positive integer upon failure.
+
+.SH SEE ALSO
+.BR tipc (8),
+.BR tipc-bearer (8),
+.BR tipc-link (8),
+.BR tipc-media (8),
+.BR tipc-nametable (8),
+.BR tipc-node (8),
+.BR tipc-socket (8)
+.br
+.SH REPORTING BUGS
+Report any bugs to the Network Developers mailing list
+.B 
+where the development and maintenance is primarily done.
+You do not have to be subscribed to the list to send a message there.
+
+.SH AUTHOR
+Richard Alpe 
diff --git a/man/man8/tipc.8 b/man/man8/tipc.8
index c116552..32943fa 100644
--- a/man/man8/tipc.8
+++ b/man/man8/tipc.8
@@ -87,6 +87,7 @@ Exit 

Re: Patch to netfilter conntrack for secondary connection logging

2016-08-22 Thread Florian Westphal
Thomas Winter  wrote:
> Hello,
> 
> We are using netfilter to implement a firewall for a router and we had the 
> problem that the ftp data connections were not being logged.
> I did some investigating and found that it is conntrack that is allowing the 
> secondary connection by the ftp helper module.
> I created a patch to enable such logging for any conntrack helper.
> Is this a good change? Or did I miss something really obvious?

It should be possible to log the data connections via

-p tcp -m conntrack --ctstate RELATED -m helper --helper ftp -j (NF)LOG



[PATCH net] qed: FLR of active VFs might lead to FW assert

2016-08-22 Thread Yuval Mintz
Driver never bothered marking the VF's vport with the VF's sw_fid.
As a result, FLR flows are not going to clean those vports.

If the vport was active when FLRed, re-activating it would lead
to a FW assertion.

Fixes: dacd88d6f6851 ("qed: IOV l2 functionality")
Signed-off-by: Yuval Mintz 
---
Hi Dave,

Please consider applying this to 'net'.

Thanks,
Yuval
---
 drivers/net/ethernet/qlogic/qed/qed.h | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/qlogic/qed/qed.h b/drivers/net/ethernet/qlogic/qed/qed.h
index 35e5377..45ab746 100644
--- a/drivers/net/ethernet/qlogic/qed/qed.h
+++ b/drivers/net/ethernet/qlogic/qed/qed.h
@@ -561,9 +561,18 @@ struct qed_dev {
 static inline u8 qed_concrete_to_sw_fid(struct qed_dev *cdev,
u32 concrete_fid)
 {
+   u8 vfid = GET_FIELD(concrete_fid, PXP_CONCRETE_FID_VFID);
u8 pfid = GET_FIELD(concrete_fid, PXP_CONCRETE_FID_PFID);
+   u8 vf_valid = GET_FIELD(concrete_fid,
+   PXP_CONCRETE_FID_VFVALID);
+   u8 sw_fid;
 
-   return pfid;
+   if (vf_valid)
+   sw_fid = vfid + MAX_NUM_PFS;
+   else
+   sw_fid = pfid;
+
+   return sw_fid;
 }
 
 #define PURE_LB_TC 8
-- 
1.9.3
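
The mapping introduced by this patch can be tried out in userspace with a minimal sketch. The bit positions, the masks and `TOY_MAX_NUM_PFS` below are assumptions chosen for illustration; the real `PXP_CONCRETE_FID_*` layout and `MAX_NUM_PFS` value are not shown in this thread and should be taken from the driver headers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical field layout of the concrete FID (assumed, not from the thread). */
#define TOY_FID_PFID_MASK     0xFu
#define TOY_FID_PFID_SHIFT    0
#define TOY_FID_VFVALID_MASK  0x1u
#define TOY_FID_VFVALID_SHIFT 7
#define TOY_FID_VFID_MASK     0xFFu
#define TOY_FID_VFID_SHIFT    8
#define TOY_MAX_NUM_PFS       16

/* Extract a named field using its _SHIFT and _MASK definitions. */
#define TOY_GET_FIELD(value, name) \
	(((value) >> name##_SHIFT) & name##_MASK)

/*
 * Same decision as the patched qed_concrete_to_sw_fid(): a valid VF gets a
 * software FID placed after the PF range, otherwise the PF id is used.
 */
static uint8_t toy_concrete_to_sw_fid(uint32_t concrete_fid)
{
	uint8_t pfid = TOY_GET_FIELD(concrete_fid, TOY_FID_PFID);
	uint8_t vfid = TOY_GET_FIELD(concrete_fid, TOY_FID_VFID);
	uint8_t vf_valid = TOY_GET_FIELD(concrete_fid, TOY_FID_VFVALID);

	return vf_valid ? (uint8_t)(vfid + TOY_MAX_NUM_PFS) : pfid;
}
```

Before the patch the function always returned the PF id, so a vport owned by a VF was never tagged with a VF-specific software FID, which is why the FLR flow skipped it.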



Callback at XFRM ESP tunnel creation

2016-08-22 Thread Begun, Dennis
My goal is to run some code before actual packets begin running in the IPSec 
tunnel. For this, I am thinking of running a callback at the time of an XFRM 
ESP tunnel creation, where tunnel IPs and the SPI will be known. Is there a 
standard way of achieving this? If I'm not mistaken, registering for any event 
that has xfrm_state available should be enough, for instance, xfrm_add_sa, 
esp_init_state. 




Re: [PATCH 1/1] netfilter: gre: Use the consistent GRE and PPTP struct instead of the structures defined in netfilter

2016-08-22 Thread Pablo Neira Ayuso
On Fri, Aug 19, 2016 at 11:01:34PM +0800, f...@ikuai8.com wrote:
> From: Gao Feng 
> 
> There are two structures which define the GRE header and PPTP
> header. So it is unnecessary to define duplicated structures in
> netfilter again.

Please split this change into smaller patches. I'd suggest one to
replace the GRE_* definitions and another to use the generic GRE struct
definitions, which makes it easier to review.

> @@ -212,8 +212,8 @@ static bool gre_pkt_to_tuple(const struct sk_buff *skb, 
> unsigned int dataoff,
>   if (!pgrehdr)
>   return true;
>  
> - if (ntohs(grehdr->protocol) != GRE_PROTOCOL_PPTP) {
> - pr_debug("GRE_VERSION_PPTP but unknown proto\n");
> + if (grehdr->protocol != GRE_PROTO_PPP) {
> + pr_debug("Unknown GRE proto(0x%x)\n", ntohs(grehdr->protocol));

Something is fishy here: the check on grehdr->protocol used to go
through ntohs(); the pr_debug() still does, while the branch check
does not.
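
Whether dropping ntohs() from the branch is correct depends on the byte order in which GRE_PROTO_PPP is defined, which the quoted hunk does not show. As a minimal userspace sketch, assuming a host-order constant named `TOY_GRE_PROTO_PPP` (a hypothetical name), these are the two consistent ways to compare a field that arrives in network byte order:

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>

/* PPP-in-GRE protocol number, kept in host byte order for this sketch. */
#define TOY_GRE_PROTO_PPP 0x880b

/* Option 1: convert the wire value to host order, then compare. */
static int toy_is_ppp_convert_field(uint16_t wire_proto)
{
	return ntohs(wire_proto) == TOY_GRE_PROTO_PPP;
}

/* Option 2: convert the constant to network order, then compare. */
static int toy_is_ppp_convert_const(uint16_t wire_proto)
{
	return wire_proto == htons(TOY_GRE_PROTO_PPP);
}
```

Both helpers always agree. Comparing the raw wire value against a host-order constant without either conversion only works on big-endian machines, so the new check is only right if the macro is already a big-endian value.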


RE: [PATCH v1 1/1] net: phy: Add edge-rate, mac-if, read, write func to Microsemi PHYs.

2016-08-22 Thread Nagaraju Lakkaraju

Hello,

Can you please review this code?

Thanks,
Raju.

-Original Message-
From: Nagaraju Lakkaraju [mailto:raju.lakkar...@microsemi.com] 
Sent: Monday, August 08, 2016 7:13 PM
To: netdev@vger.kernel.org
Cc: f.faine...@gmail.com; Allan Nielsen
Subject: [PATCH v1 1/1] net: phy: Add edge-rate, mac-if, read, write func to 
Microsemi PHYs.

Hello,

This is the second patch. It adds edge rate control, MAC interface, and
read and write driver functions for Microsemi PHYs.

Please review and send your comments.

Thanks,
Raju.

>From 6303576768b5c5dcc0e35fb46c525337a3845557 Mon Sep 17 00:00:00 2001
From: Nagaraju Lakkaraju 
Date: Mon, 8 Aug 2016 18:51:36 +0530
Subject: [PATCH v1 1/1]  net: phy: Add edge-rate, mac-if, read, write func to  
Microsemi PHYs.

Signed-off-by: Nagaraju Lakkaraju 
---
 drivers/net/phy/mscc.c | 234 +++--
 drivers/net/phy/mscc_reg.h | 135 ++
 include/linux/mscc.h   |  45 +
 include/linux/phy.h|   2 +
 4 files changed, 387 insertions(+), 29 deletions(-)
 mode change 100644 => 100755 drivers/net/phy/mscc.c
 create mode 100644 drivers/net/phy/mscc_reg.h
 create mode 100644 include/linux/mscc.h
 mode change 100644 => 100755 include/linux/phy.h

diff --git a/drivers/net/phy/mscc.c b/drivers/net/phy/mscc.c
old mode 100644
new mode 100755
index 49c7506..af7a441
--- a/drivers/net/phy/mscc.c
+++ b/drivers/net/phy/mscc.c
@@ -11,34 +11,9 @@
 #include 
 #include 
 #include 
+#include 

-enum rgmii_rx_clock_delay {
-   RGMII_RX_CLK_DELAY_0_2_NS = 0,
-   RGMII_RX_CLK_DELAY_0_8_NS = 1,
-   RGMII_RX_CLK_DELAY_1_1_NS = 2,
-   RGMII_RX_CLK_DELAY_1_7_NS = 3,
-   RGMII_RX_CLK_DELAY_2_0_NS = 4,
-   RGMII_RX_CLK_DELAY_2_3_NS = 5,
-   RGMII_RX_CLK_DELAY_2_6_NS = 6,
-   RGMII_RX_CLK_DELAY_3_4_NS = 7
-};
-
-#define MII_VSC85XX_INT_MASK  25
-#define MII_VSC85XX_INT_MASK_MASK 0xa000
-#define MII_VSC85XX_INT_STATUS26
-
-#define MSCC_EXT_PAGE_ACCESS  31
-#define MSCC_PHY_PAGE_STANDARD0x /* Standard registers */
-#define MSCC_PHY_PAGE_EXTENDED_2  0x0002 /* Extended reg - page 2 */
-
-/* Extended Page 2 Registers */
-#define MSCC_PHY_RGMII_CNTL   20
-#define RGMII_RX_CLK_DELAY_MASK   0x0070
-#define RGMII_RX_CLK_DELAY_POS4
-
-/* Microsemi PHY ID's */
-#define PHY_ID_VSC85310x00070570
-#define PHY_ID_VSC85410x00070770
+#include "mscc_reg.h"

 static int vsc85xx_phy_page_set(struct phy_device *phydev, u8 page)
 {
@@ -84,7 +59,7 @@ static int vsc85xx_config_init(struct phy_device *phydev)

 static int vsc85xx_ack_interrupt(struct phy_device *phydev)
 {
-   int rc;
+   int rc = 0;

if (phydev->interrupts == PHY_INTERRUPT_ENABLED)
rc = phy_read(phydev, MII_VSC85XX_INT_STATUS);
@@ -98,7 +73,7 @@ static int vsc85xx_config_intr(struct phy_device *phydev)

if (phydev->interrupts == PHY_INTERRUPT_ENABLED) {
rc = phy_write(phydev, MII_VSC85XX_INT_MASK,
-  MII_VSC85XX_INT_MASK_MASK);
+  MII_VSC85XX_INT_MASK_MASK);
} else {
rc = phy_write(phydev, MII_VSC85XX_INT_MASK, 0);
if (rc < 0)
@@ -109,6 +84,203 @@ static int vsc85xx_config_intr(struct phy_device *phydev)
return rc;
 }

+static int vsc85xx_soft_reset(struct phy_device *phydev)
+{
+   int rc;
+   u16 reg_val;
+
+   reg_val = phy_read(phydev, MII_BMCR);
+   reg_val |= BMCR_RESET;
+   rc = phy_write(phydev, MII_BMCR, reg_val);
+
+   return rc;
+}
+
+static int vsc85xx_edge_rate_cntl_set(struct phy_device *phydev,
+ u8 *rate)
+{
+   int rc;
+   u16 reg_val;
+   u8  edge_rate = *rate;
+
+   mutex_lock(&phydev->lock);
+   rc = vsc85xx_phy_page_set(phydev, MSCC_PHY_PAGE_EXTENDED_2);
+   if (rc != 0)
+   goto out_unlock;
+   reg_val = phy_read(phydev, MSCC_PHY_WOL_MAC_CONTROL);
+   reg_val &= ~(EDGE_RATE_CNTL_MASK);
+   reg_val |= (edge_rate << EDGE_RATE_CNTL_POS);
+   phy_write(phydev, MSCC_PHY_WOL_MAC_CONTROL, reg_val);
+   rc = vsc85xx_phy_page_set(phydev, MSCC_PHY_PAGE_STANDARD);
+
+out_unlock:
+   mutex_unlock(&phydev->lock);
+
+   return rc;
+}
+
+static int vsc85xx_edge_rate_cntl_get(struct phy_device *phydev,
+ u8 *rate)
+{
+   int rc;
+   u16 reg_val;
+
+   mutex_lock(&phydev->lock);
+   rc = vsc85xx_phy_page_set(phydev, MSCC_PHY_PAGE_EXTENDED_2);
+   if (rc != 0)
+   goto out_unlock;
+   reg_val = phy_read(phydev, MSCC_PHY_WOL_MAC_CONTROL);
+   reg_val &= EDGE_RATE_CNTL_MASK;
+   *rate = reg_val >> EDGE_RATE_CNTL_POS;
+   rc = vsc85xx_phy_page_set(phydev, MSCC_PHY_PAGE_STANDARD);
+
+out_unlock:
+   mutex_

Re: [PATCH] netfilter: fix spelling mistake: "delimitter" -> "delimiter"

2016-08-22 Thread Pablo Neira Ayuso
On Thu, Aug 18, 2016 at 04:47:57PM +0100, Colin King wrote:
> From: Colin Ian King 
> 
> trivial fix to spelling mistake in pr_debug message

Applied.


Re: [PATCH -next] netfilter: nft_hash: fix non static symbol warning

2016-08-22 Thread Pablo Neira Ayuso
On Sun, Aug 21, 2016 at 03:21:10PM +, Wei Yongjun wrote:
> Fixes the following sparse warning:
> 
> net/netfilter/nft_hash.c:40:25: warning:
>  symbol 'nft_hash_policy' was not declared. Should it be static?

Applied, thanks.


Re: [PATCH v3 net-next] l2tp: Refactor the codes with existing macros instead of literal number

2016-08-22 Thread Guillaume Nault
On Sun, Aug 21, 2016 at 04:36:52PM -0600, Philp Prindeville wrote:
> Inline
> 
> 
> On 08/20/2016 09:52 AM, f...@48lvckh6395k16k5.yundunddos.com wrote:
> > From: Gao Feng 
> > 
> > Use PPP_ALLSTATIONS, PPP_UI, and SEND_SHUTDOWN instead of 0xff,
> > 0x03, and 2 separately.
> > 
> > Signed-off-by: Gao Feng 
> > ---
> >   v3: Modify the subject;
> >   v2: Only replace the literal number with macros according to Guillaume's 
> > advice
> >   v1: Inital patch
> > 
> >   net/l2tp/l2tp_ppp.c | 8 
> >   1 file changed, 4 insertions(+), 4 deletions(-)
> > 
> > diff --git a/net/l2tp/l2tp_ppp.c b/net/l2tp/l2tp_ppp.c
> > index d9560aa..65e2fd6 100644
> > --- a/net/l2tp/l2tp_ppp.c
> > +++ b/net/l2tp/l2tp_ppp.c
> > @@ -177,7 +177,7 @@ static int pppol2tp_recv_payload_hook(struct sk_buff 
> > *skb)
> > if (!pskb_may_pull(skb, 2))
> > return 1;
> > -   if ((skb->data[0] == 0xff) && (skb->data[1] == 0x03))
> > +   if ((skb->data[0] == PPP_ALLSTATIONS) && (skb->data[1] == PPP_UI))
> 
> This should have used PPP_ADDRESS() and PPP_CONTROL() here.
>
Then please justify how would that make the code more readable.
We're not trying to interpret a known valid PPP header here.

> > skb_pull(skb, 2);
> 
> This magic number should go away.
>
Again, this is *not* a magic number. We've explicitly accessed the
first _two_ header bytes and want to skip them.
pskb_may_pull(2), ->data[0], ->data[1] and skb_pull(2) all go together.

There's even a nice comment telling you what is done and why:
/* Skip PPP header, if present.  In testing, Microsoft L2TP clients
 * don't send the PPP header (PPP header compression enabled), but
 * other clients can include the header. So we cope with both cases
 * here. The PPP header is always FF03 when using L2TP.
 *
 * Note that skb->data[] isn't dereferenced from a u16 ptr here since
 * the field may be unaligned.
 */
Apart from the imprecise "PPP header" term, which should be read as
"address and control fields", things should be quite clear.

> > @@ -282,7 +282,7 @@ static void pppol2tp_session_sock_put(struct 
> > l2tp_session *session)
> >   static int pppol2tp_sendmsg(struct socket *sock, struct msghdr *m,
> > size_t total_len)
> >   {
> > -   static const unsigned char ppph[2] = { 0xff, 0x03 };
> > +   static const unsigned char ppph[2] = {PPP_ALLSTATIONS, PPP_UI};
> 
> PPP has a 4-byte header.  Where's the protocol value?
>
No, PPP header (whatever you include in it) is of variable length. And
the protocol has already been set by the PPP layer anyway.
We're in L2TP here.
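
The receive-path logic defended above (check that two bytes are available, compare them with FF 03, then skip exactly those two bytes) can be sketched in userspace. This is an illustration only: `toy_ppp_hdr_skip()` and the `TOY_` constants are hypothetical names, and unlike the kernel hook, which returns an error when two bytes cannot be pulled, this helper simply reports that nothing should be skipped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PPP_ALLSTATIONS 0xff /* PPP address field: all stations */
#define TOY_PPP_UI          0x03 /* PPP control field: unnumbered information */

/*
 * Return how many leading bytes to skip: 2 when the buffer starts with the
 * uncompressed address and control fields (FF 03), 0 otherwise.
 * The bytes are read one at a time, so alignment of the buffer is irrelevant.
 */
static size_t toy_ppp_hdr_skip(const uint8_t *data, size_t len)
{
	if (len < 2)
		return 0;
	if (data[0] == TOY_PPP_ALLSTATIONS && data[1] == TOY_PPP_UI)
		return 2;
	return 0;
}
```

The literal 2 appears three times for the same reason it does in the kernel code: the length check, the two byte accesses and the skip all describe the same pair of bytes.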


Re: [PATCH] net: ip_finish_output_gso: Allow fragmenting segments of tunneled skbs if their DF is unset

2016-08-22 Thread Hannes Frederic Sowa
Hi,

On Sun, Aug 21, 2016, at 10:22, Shmulik Ladkani wrote:
> In b8247f095e,
> 
>"net: ip_finish_output_gso: If skb_gso_network_seglen exceeds MTU,
>allow segmentation for local udp tunneled skbs"
> 
> gso skbs arriving from an ingress interface that go through UDP
> tunneling, are allowed to be fragmented if the resulting encapsulated
> segments exceed the dst mtu of the egress interface.
> 
> This aligned the behavior of gso skbs to non-gso skbs going through udp
> encapsulation path.
> 
> However the non-gso vs gso anomaly is present also in the following
> cases of a GRE tunnel:
>  - ip_gre in collect_md mode, where TUNNEL_DONT_FRAGMENT is not set
>(e.g. OvS vport-gre with df_default=false)
>  - ip_gre in nopmtudisc mode, where IFLA_GRE_IGNORE_DF is set
> 
> In both of the above cases, the non-gso skbs get fragmented, whereas the
> gso skbs (having skb_gso_network_seglen that exceeds dst mtu) get
> dropped,
> as they don't go through the segment+fragment code path.
> 
> Fix: Setting IPSKB_FRAG_SEGS if the tunnel specified IP_DF bit is NOT
> set.
> 
> Tunnels that do set IP_DF, will not go to fragmentation of segments.
> This preserves behavior of ip_gre in (the default) pmtudisc mode.
> 
> Fixes: b8247f095e ("net: ip_finish_output_gso: If skb_gso_network_seglen
> exceeds MTU, allow segmentation for local udp tunneled skbs")
> Reported-by: wenxu 
> Cc: Hannes Frederic Sowa 
> Signed-off-by: Shmulik Ladkani 

Acked-by: Hannes Frederic Sowa 

Your dissection of the current state of fragmentation handling also
looked fine. I wonder if it would now make sense to add a sysctl that
brings back the dropping of packets when they can't be fragmented by a
bridge, since we made sure that the defaults don't break for anyone.

Thanks,
Hannes



Re: [PATCH v3 net-next] l2tp: Refactor the codes with existing macros instead of literal number

2016-08-22 Thread Guillaume Nault
On Mon, Aug 22, 2016 at 08:13:48AM +0800, Feng Gao wrote:
> inline
> 
> On Mon, Aug 22, 2016 at 6:36 AM, Philp Prindeville
>  wrote:
> > Inline
> >
> >
> > On 08/20/2016 09:52 AM, f...@48lvckh6395k16k5.yundunddos.com wrote:
> >>
> >> From: Gao Feng 
> >>
> >> Use PPP_ALLSTATIONS, PPP_UI, and SEND_SHUTDOWN instead of 0xff,
> >> 0x03, and 2 separately.
> >>
> >> Signed-off-by: Gao Feng 
> >> ---
> >>   v3: Modify the subject;
> >>   v2: Only replace the literal number with macros according to Guillaume's
> >> advice
> >>   v1: Inital patch
> >>
> >>   net/l2tp/l2tp_ppp.c | 8 
> >>   1 file changed, 4 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/net/l2tp/l2tp_ppp.c b/net/l2tp/l2tp_ppp.c
> >> index d9560aa..65e2fd6 100644
> >> --- a/net/l2tp/l2tp_ppp.c
> >> +++ b/net/l2tp/l2tp_ppp.c
> >> @@ -177,7 +177,7 @@ static int pppol2tp_recv_payload_hook(struct sk_buff
> >> *skb)
> >> if (!pskb_may_pull(skb, 2))
> >> return 1;
> >>   - if ((skb->data[0] == 0xff) && (skb->data[1] == 0x03))
> >> +   if ((skb->data[0] == PPP_ALLSTATIONS) && (skb->data[1] == PPP_UI))
> >
> >
> > This should have used PPP_ADDRESS() and PPP_CONTROL() here.
> 
> In my initial patch, I replace them with PPP_ADDRESS() and PPP_CONTROL.
> But Guillaume thought it was not clear as before.
> So I revert it.
> 
> >
> >> skb_pull(skb, 2);
> >
> >
> > This magic number should go away.
> 
> Same as above.
> 
> >
> >> return 0;
> >> @@ -282,7 +282,7 @@ static void pppol2tp_session_sock_put(struct
> >> l2tp_session *session)
> >>   static int pppol2tp_sendmsg(struct socket *sock, struct msghdr *m,
> >> size_t total_len)
> >>   {
> >> -   static const unsigned char ppph[2] = { 0xff, 0x03 };
> >> +   static const unsigned char ppph[2] = {PPP_ALLSTATIONS, PPP_UI};
> >
> >
> > PPP has a 4-byte header.  Where's the protocol value?
> 
> In the original code, I fail to find the code which is used to fill
> the protocol value.
> So I keep the two bytes header. And I thought the protocol value may be filled
> by the upper layer.
> 
And you were right. This was a macro replacement patch anyway, so you
didn't have to bring functional changes with it.
And if the protocol field was really missing, the L2TP module would
have never worked.


Re: [PATCH v3 net-next] l2tp: Refactor the codes with existing macros instead of literal number

2016-08-22 Thread Guillaume Nault
On Sat, Aug 20, 2016 at 11:52:27PM +0800, f...@ikuai8.com wrote:
> From: Gao Feng 
> 
> Use PPP_ALLSTATIONS, PPP_UI, and SEND_SHUTDOWN instead of 0xff,
> 0x03, and 2 separately.
> 
> Signed-off-by: Gao Feng 
> ---
>  v3: Modify the subject;
>  v2: Only replace the literal number with macros according to Guillaume's 
> advice
>  v1: Inital patch
> 
>  net/l2tp/l2tp_ppp.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/net/l2tp/l2tp_ppp.c b/net/l2tp/l2tp_ppp.c
> index d9560aa..65e2fd6 100644
> --- a/net/l2tp/l2tp_ppp.c
> +++ b/net/l2tp/l2tp_ppp.c
> @@ -177,7 +177,7 @@ static int pppol2tp_recv_payload_hook(struct sk_buff *skb)
>   if (!pskb_may_pull(skb, 2))
>   return 1;
>  
> - if ((skb->data[0] == 0xff) && (skb->data[1] == 0x03))
> + if ((skb->data[0] == PPP_ALLSTATIONS) && (skb->data[1] == PPP_UI))
>   skb_pull(skb, 2);
>  
>   return 0;
> @@ -282,7 +282,7 @@ static void pppol2tp_session_sock_put(struct l2tp_session 
> *session)
>  static int pppol2tp_sendmsg(struct socket *sock, struct msghdr *m,
>   size_t total_len)
>  {
> - static const unsigned char ppph[2] = { 0xff, 0x03 };
> + static const unsigned char ppph[2] = {PPP_ALLSTATIONS, PPP_UI};
> 
Minor nit: I'd prefer to keep the space after '{' and before '}'.
I didn't want to bother you with this, but since it seems you'll have
to repost...

>   struct sock *sk = sock->sk;
>   struct sk_buff *skb;
>   int error;
> @@ -369,7 +369,7 @@ error:
>   */
>  static int pppol2tp_xmit(struct ppp_channel *chan, struct sk_buff *skb)
>  {
> - static const u8 ppph[2] = { 0xff, 0x03 };
> + static const u8 ppph[2] = {PPP_ALLSTATIONS, PPP_UI};
> 
Same here.

BTW, I thought you also wanted to remove the static ppph variable
from pppol2tp_xmit() / pppol2tp_sendmsg(), to directly assign
skb->data[0/1] with PPP_ALLSTATIONS/PPP_UI.


Re: [PATCH 2/3] vsockmon: Add vsockmon device.

2016-08-22 Thread Gerard Garcia

On 08/15/2016 07:23 PM, David Miller wrote:

From: David Laight 
Date: Mon, 15 Aug 2016 16:38:36 +


From: Stefan Hajnoczi

Sent: 10 August 2016 12:52
On Mon, Aug 08, 2016 at 06:14:41PM +0200, ggar...@abra.uab.cat wrote:

diff --git a/include/uapi/linux/vsockmon.h b/include/uapi/linux/vsockmon.h
new file mode 100644
index 000..739b4bf
--- /dev/null
+++ b/include/uapi/linux/vsockmon.h
@@ -0,0 +1,35 @@
+#ifndef _UAPI_VSOCKMON_H
+#define _UAPI_VSOCKMON_H
+
+#include 
+
+/* Structure of packets received through the vsockmon device. */
+
+struct af_vsockmon_hdr {
+   __le64 src_cid;
+   __le64 dst_cid;
+   __le32 src_port;
+   __le32 dst_port;
+   __le16 op;  /* enum af_vsockmon_op */
+   __le16 t;   /* enum af_vsockmon_t */
+   __le16 len; /* sizeof(t_hdr) */
+   union {
+   struct virtio_vsock_hdr virtio_hdr;
+   } t_hdr;
+} __attribute__((packed));

...

Gah, another 'packed' structure.
Have you looked at the amount of code the sparc64 compiler generates
to access the structure members??

You really want to add another 16bit field and enforce 64bit alignment
on the header and all data blocks.


Indeed, avoid the packed attribute at all costs.



I understand. I'll add another 16b field so it is aligned and avoid the 
packed attribute.


Gerard
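
The layout change agreed on here can be sketched as follows. This is a hedged illustration, not the posted follow-up: `struct toy_vsockmon_hdr` and its `pad` field are hypothetical, plain fixed-width types stand in for the `__le64`/`__le32`/`__le16` types, and the transport-specific union is left out. The point is that one extra 16-bit field brings the fixed part to 32 bytes, so no packed attribute is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fixed part of the header, naturally aligned without a packed attribute. */
struct toy_vsockmon_hdr {
	uint64_t src_cid;
	uint64_t dst_cid;
	uint32_t src_port;
	uint32_t dst_port;
	uint16_t op;
	uint16_t t;
	uint16_t len;
	uint16_t pad; /* explicit padding: 30 bytes of fields become 32 */
};

/* Compile-time checks that the compiler inserted no hidden padding. */
_Static_assert(sizeof(struct toy_vsockmon_hdr) == 32,
	       "header must be a multiple of 8 bytes");
_Static_assert(offsetof(struct toy_vsockmon_hdr, src_port) == 16,
	       "unexpected padding before src_port");
_Static_assert(offsetof(struct toy_vsockmon_hdr, pad) == 30,
	       "unexpected padding before pad");
```

With every member at its natural offset, the compiler can use ordinary aligned loads instead of the byte-by-byte accesses that a packed structure forces on strict-alignment architectures.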


[PATCH iproute2 0/2] tc: flower, m_vlan: Introduce vlan tag support

2016-08-22 Thread y
From: Hadar Hen Zion 

This patchset introduces vlan tag support to the tc flower classifier and
adds vlan priority to the vlan push action.

- The first patch adds classification according to vlan id and vlan priority
  to the flower classifier.
- The second patch adds support for vlan priority to the current vlan push
  action.

Hadar Hen Zion (2):
  tc: flower: Introduce vlan support
  tc: m_vlan: Add priority option to push vlan action

 include/linux/pkt_cls.h|  5 +++
 include/linux/tc_act/tc_vlan.h |  4 +++
 man/man8/tc-flower.8   | 25 -
 man/man8/tc-vlan.8 |  5 +++
 tc/f_flower.c  | 80 --
 tc/m_vlan.c| 22 +++-
 6 files changed, 136 insertions(+), 5 deletions(-)

-- 
1.8.3.1



[PATCH iproute2 1/2] tc: flower: Introduce vlan support

2016-08-22 Thread y
From: Hadar Hen Zion 

Classification according to vlan id and vlan priority.

Example script that adds vlan filter:

 # add ingress qdisc
 tc qdisc add dev ens4f0 ingress

 # add a flower filter with vlan id and priority classification
 tc filter add dev ens4f0 protocol 802.1Q parent : \
flower \
indev ens4f0 \
vlan_ethtype ipv4 \
vlan_id 100 \
vlan_prio 3 \
action vlan pop

Signed-off-by: Hadar Hen Zion 
---
 include/linux/pkt_cls.h|  5 +++
 include/linux/tc_act/tc_vlan.h |  3 ++
 man/man8/tc-flower.8   | 25 -
 tc/f_flower.c  | 80 --
 4 files changed, 109 insertions(+), 4 deletions(-)

diff --git a/include/linux/pkt_cls.h b/include/linux/pkt_cls.h
index 5e6c61e..25a8fae 100644
--- a/include/linux/pkt_cls.h
+++ b/include/linux/pkt_cls.h
@@ -374,6 +374,11 @@ enum {
TCA_FLOWER_KEY_UDP_DST, /* be16 */
 
TCA_FLOWER_FLAGS,
+
+   TCA_FLOWER_KEY_VLAN_ID,
+   TCA_FLOWER_KEY_VLAN_PRIO,
+   TCA_FLOWER_KEY_VLAN_ETH_TYPE,
+
__TCA_FLOWER_MAX,
 };
 
diff --git a/include/linux/tc_act/tc_vlan.h b/include/linux/tc_act/tc_vlan.h
index 31151ff..26ae695 100644
--- a/include/linux/tc_act/tc_vlan.h
+++ b/include/linux/tc_act/tc_vlan.h
@@ -16,6 +16,9 @@
 
 #define TCA_VLAN_ACT_POP   1
 #define TCA_VLAN_ACT_PUSH  2
+#define VLAN_PRIO_MASK 0x7
+#define VLAN_VID_MASK  0x0fff
+
 
 struct tc_vlan {
tc_gen;
diff --git a/man/man8/tc-flower.8 b/man/man8/tc-flower.8
index 9ae10e6..74f7664 100644
--- a/man/man8/tc-flower.8
+++ b/man/man8/tc-flower.8
@@ -23,7 +23,13 @@ flower \- flow based traffic control filter
 .R " | { "
 .BR dst_mac " | " src_mac " } "
 .IR mac_address " | "
-.BR eth_type " { " ipv4 " | " ipv6 " | "
+.BR eth_type " { " ipv4 " | " ipv6 " | " 802.1Q " | "
+.IR ETH_TYPE " } | "
+.B vlan_id
+.IR VID " | "
+.B vlan_prio
+.IR PRIORITY " | "
+.BR vlan_eth_type " { " ipv4 " | " ipv6 " | "
 .IR ETH_TYPE " } | "
 .BR ip_proto " { " tcp " | " udp " | "
 .IR IP_PROTO " } | { "
@@ -70,6 +76,23 @@ Do not process filter by hardware.
 Match on source or destination MAC address.
 .TP
 .BI eth_type " ETH_TYPE"
+Match on the next protocol.
+.I ETH_TYPE
+may be either
+.BR ipv4 , ipv6 , 802.1Q ,
+or an unsigned 16bit value in hexadecimal format.
+.TP
+.BI vlan_id " VID"
+Match on vlan tag id.
+.I VID
+is an unsigned 12bit value in decimal format.
+.TP
+.BI vlan_prio " priority"
+Match on vlan tag priority.
+.I PRIORITY
+is an unsigned 3bit value in decimal format.
+.TP
+.BI vlan_eth_type " VLAN_ETH_TYPE"
 Match on layer three protocol.
 .I ETH_TYPE
 may be either
diff --git a/tc/f_flower.c b/tc/f_flower.c
index 791ade7..2ab2de1 100644
--- a/tc/f_flower.c
+++ b/tc/f_flower.c
@@ -17,6 +17,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "utils.h"
 #include "tc_util.h"
@@ -30,6 +31,9 @@ static void explain(void)
fprintf(stderr, "\n");
fprintf(stderr, "Where: MATCH-LIST := [ MATCH-LIST ] MATCH\n");
fprintf(stderr, "   MATCH  := { indev DEV-NAME |\n");
+   fprintf(stderr, "   vlan_id VID |\n");
+   fprintf(stderr, "   vlan_prio PRIORITY |\n");
+   fprintf(stderr, "   vlan_ethtype [ ipv4 | ipv6 | ETH-TYPE ] |\n");
fprintf(stderr, "   dst_mac MAC-ADDR |\n");
fprintf(stderr, "   src_mac MAC-ADDR |\n");
fprintf(stderr, "   [ipv4 | ipv6 ] |\n");
@@ -61,6 +65,24 @@ static int flower_parse_eth_addr(char *str, int addr_type, int mask_type,
return 0;
 }
 
+static int flower_parse_vlan_eth_type(char *str, __be16 eth_type, int type,
+ __be16 *p_vlan_eth_type,
+ struct nlmsghdr *n)
+{
+   __be16 vlan_eth_type;
+
+   if (eth_type != htons(ETH_P_8021Q)) {
+   fprintf(stderr, "Can't set \"vlan_ethtype\" if ethertype isn't 802.1Q\n");
+   return -1;
+   }
+
+   if (ll_proto_a2n(&vlan_eth_type, str))
+   invarg("invalid vlan_ethtype", str);
+   addattr16(n, MAX_MSG, type, vlan_eth_type);
+   *p_vlan_eth_type = vlan_eth_type;
+   return 0;
+}
+
 static int flower_parse_ip_proto(char *str, __be16 eth_type, int type,
 __u8 *p_ip_proto, struct nlmsghdr *n)
 {
@@ -167,6 +189,7 @@ static int flower_parse_opt(struct filter_util *qu, char *handle,
struct tcmsg *t = NLMSG_DATA(n);
struct rtattr *tail;
__be16 eth_type = TC_H_MIN(t->tcm_info);
+   __be16 vlan_ethtype = 0;
__u8 ip_proto = 0xff;
__u32 flags = 0;
 
@@ -208,6 +231,42 @@ static int flower_parse_opt(struct filter_util *qu, char *handle,
NEXT_ARG();
strncpy(ifname, *argv, sizeof(ifname) - 1);
  

[PATCH iproute2 2/2] tc: m_vlan: Add priority option to push vlan action

2016-08-22 Thread y
From: Hadar Hen Zion 

The current vlan push action supports only vid and protocol options.
Add priority option.

Example script that adds vlan push action with vid and priority:

tc filter add dev veth0 protocol ip parent : \
flower \
indev veth0 \
action vlan push id 100 priority 5

Signed-off-by: Hadar Hen Zion 
---
 include/linux/tc_act/tc_vlan.h |  1 +
 man/man8/tc-vlan.8 |  5 +
 tc/m_vlan.c| 22 +-
 3 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/include/linux/tc_act/tc_vlan.h b/include/linux/tc_act/tc_vlan.h
index 26ae695..29e2113 100644
--- a/include/linux/tc_act/tc_vlan.h
+++ b/include/linux/tc_act/tc_vlan.h
@@ -32,6 +32,7 @@ enum {
TCA_VLAN_PUSH_VLAN_ID,
TCA_VLAN_PUSH_VLAN_PROTOCOL,
TCA_VLAN_PAD,
+   TCA_VLAN_PUSH_VLAN_PRIORITY,
__TCA_VLAN_MAX,
 };
 #define TCA_VLAN_MAX (__TCA_VLAN_MAX - 1)
diff --git a/man/man8/tc-vlan.8 b/man/man8/tc-vlan.8
index 4bfd72b..4d0c5c8 100644
--- a/man/man8/tc-vlan.8
+++ b/man/man8/tc-vlan.8
@@ -12,6 +12,8 @@ vlan - vlan manipulation module
 .IR PUSH " := "
 .BR push " [ " protocol
 .IR VLANPROTO " ]"
+.BR " [ " priority
+.IR VLANPRIO " ] "
 .BI id " VLANID"
 
 .ti -8
@@ -55,6 +57,9 @@ for hexadecimal interpretation, etc.).
 Choose the VLAN protocol to use. At the time of writing, the kernel accepts only
 .BR 802.1Q " or " 802.1ad .
 .TP
+.BI priority " VLANPRIO"
+Choose the VLAN priority to use. Decimal number in range of 0-7.
+.TP
 .I CONTROL
 How to continue after executing this action.
 .RS
diff --git a/tc/m_vlan.c b/tc/m_vlan.c
index ac63d9e..be2ffd2 100644
--- a/tc/m_vlan.c
+++ b/tc/m_vlan.c
@@ -22,7 +22,7 @@
 static void explain(void)
 {
fprintf(stderr, "Usage: vlan pop\n");
-   fprintf(stderr, "   vlan push [ protocol VLANPROTO ] id VLANID [CONTROL]\n");
+   fprintf(stderr, "   vlan push [ protocol VLANPROTO ] id VLANID [ priority VLANPRIO ] [CONTROL]\n");
fprintf(stderr, "   VLANPROTO is one of 802.1Q or 802.1AD\n");
fprintf(stderr, "with default: 802.1Q\n");
fprintf(stderr, "   CONTROL := reclassify | pipe | drop | continue | pass\n");
@@ -45,6 +45,8 @@ static int parse_vlan(struct action_util *a, int *argc_p, char ***argv_p,
int id_set = 0;
__u16 proto;
int proto_set = 0;
+   __u8 prio;
+   int prio_set = 0;
struct tc_vlan parm = { 0 };
 
if (matches(*argv, "vlan") != 0)
@@ -91,6 +93,17 @@ static int parse_vlan(struct action_util *a, int *argc_p, 
char ***argv_p,
if (ll_proto_a2n(&proto, *argv))
invarg("protocol is invalid", *argv);
proto_set = 1;
+   } else if (matches(*argv, "priority") == 0) {
+   if (action != TCA_VLAN_ACT_PUSH) {
+   fprintf(stderr, "\"%s\" is only valid for push\n",
+   *argv);
+   explain();
+   return -1;
+   }
+   NEXT_ARG();
+   if (get_u8(&prio, *argv, 0) || (prio & ~VLAN_PRIO_MASK))
+   invarg("prio is invalid", *argv);
+   prio_set = 1;
} else if (matches(*argv, "help") == 0) {
usage();
} else {
@@ -138,6 +151,9 @@ static int parse_vlan(struct action_util *a, int *argc_p, 
char ***argv_p,
 
addattr_l(n, MAX_MSG, TCA_VLAN_PUSH_VLAN_PROTOCOL, &proto, 2);
}
+   if (prio_set)
+   addattr8(n, MAX_MSG, TCA_VLAN_PUSH_VLAN_PRIORITY, prio);
+
tail->rta_len = (char *)NLMSG_TAIL(n) - (char *)tail;
 
*argc_p = argc;
@@ -180,6 +196,10 @@ static int print_vlan(struct action_util *au, FILE *f, 
struct rtattr *arg)

ll_proto_n2a(rta_getattr_u16(tb[TCA_VLAN_PUSH_VLAN_PROTOCOL]),
 b1, sizeof(b1)));
}
+   if (tb[TCA_VLAN_PUSH_VLAN_PRIORITY]) {
+   val = rta_getattr_u8(tb[TCA_VLAN_PUSH_VLAN_PRIORITY]);
+   fprintf(f, " priority %u", val);
+   }
break;
}
fprintf(f, " %s", action_n2a(parm->action));
-- 
1.8.3.1



Re: [PATCH v3 net-next] l2tp: Refactor the codes with existing macros instead of literal number

2016-08-22 Thread Feng Gao
inline

On Mon, Aug 22, 2016 at 6:07 PM, Guillaume Nault  wrote:
> On Sat, Aug 20, 2016 at 11:52:27PM +0800, f...@ikuai8.com wrote:
>> From: Gao Feng 
>>
>> Use PPP_ALLSTATIONS, PPP_UI, and SEND_SHUTDOWN instead of 0xff,
>> 0x03, and 2 separately.
>>
>> Signed-off-by: Gao Feng 
>> ---
>>  v3: Modify the subject;
>>  v2: Only replace the literal number with macros according to Guillaume's 
>> advice
>>  v1: Initial patch
>>
>>  net/l2tp/l2tp_ppp.c | 8 
>>  1 file changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/net/l2tp/l2tp_ppp.c b/net/l2tp/l2tp_ppp.c
>> index d9560aa..65e2fd6 100644
>> --- a/net/l2tp/l2tp_ppp.c
>> +++ b/net/l2tp/l2tp_ppp.c
>> @@ -177,7 +177,7 @@ static int pppol2tp_recv_payload_hook(struct sk_buff 
>> *skb)
>>   if (!pskb_may_pull(skb, 2))
>>   return 1;
>>
>> - if ((skb->data[0] == 0xff) && (skb->data[1] == 0x03))
>> + if ((skb->data[0] == PPP_ALLSTATIONS) && (skb->data[1] == PPP_UI))
>>   skb_pull(skb, 2);
>>
>>   return 0;
>> @@ -282,7 +282,7 @@ static void pppol2tp_session_sock_put(struct 
>> l2tp_session *session)
>>  static int pppol2tp_sendmsg(struct socket *sock, struct msghdr *m,
>>   size_t total_len)
>>  {
>> - static const unsigned char ppph[2] = { 0xff, 0x03 };
>> + static const unsigned char ppph[2] = {PPP_ALLSTATIONS, PPP_UI};
>>
> Minor nit: I'd prefer to keep the space after '{' and before '}'.
> I didn't want to bother you with this, but since it seems you'll have
> to repost...

I don't know whether that is the Linux kernel coding style.

>
>>   struct sock *sk = sock->sk;
>>   struct sk_buff *skb;
>>   int error;
>> @@ -369,7 +369,7 @@ error:
>>   */
>>  static int pppol2tp_xmit(struct ppp_channel *chan, struct sk_buff *skb)
>>  {
>> - static const u8 ppph[2] = { 0xff, 0x03 };
>> + static const u8 ppph[2] = {PPP_ALLSTATIONS, PPP_UI};
>>
> Same here.
>
> BTW, I thought you also wanted to remove the static ppph variable
> from pppol2tp_xmit() / pppol2tp_sendmsg(), to directly assign
> skb->data[0/1] with PPP_ALLSTATIONS/PPP_UI.

If the static ppph is removed, some code will have to use the literal "2"
instead of sizeof(ppph).
Is that OK?

Regards
Feng


[PATCH net-next 0/4] qed*: IOV patch series

2016-08-22 Thread Yuval Mintz
Recent FW [8.10.10.0] enabled us to support sriov interaction
with legacy VF/PF. This patch series adds the necessary driver changes
to utilize this additional compatibility.
In addition, utilize the new FW ability to prevent pause floods by VFs,
and fix a bug that is [mostly] exposed by the added legacy support.

Dave,

Please consider applying this series to 'net-next'.

Thanks,
Yuval

Yuval Mintz (4):
  qed: Add support for legacy VFs
  qed: Prevent VFs from pause flooding
  qed*: Add support for VFs over legacy PFs
  qed: Change locking scheme for VF channel

 drivers/net/ethernet/qlogic/qed/qed_l2.c |  20 ++-
 drivers/net/ethernet/qlogic/qed/qed_l2.h |   5 +-
 drivers/net/ethernet/qlogic/qed/qed_sriov.c  | 110 +++--
 drivers/net/ethernet/qlogic/qed/qed_vf.c | 231 ---
 drivers/net/ethernet/qlogic/qed/qed_vf.h |   7 +-
 drivers/net/ethernet/qlogic/qede/qede.h  |   2 +
 drivers/net/ethernet/qlogic/qede/qede_main.c |  10 ++
 include/linux/qed/qed_eth_if.h   |   3 +
 8 files changed, 309 insertions(+), 79 deletions(-)

-- 
1.9.3



[PATCH iproute2 1/2] tc: flower: Introduce vlan support

2016-08-22 Thread Hadar Hen Zion
Classification according to vlan id and vlan priority.

Example script that adds vlan filter:

 # add ingress qdisc
 tc qdisc add dev ens4f0 ingress

 # add a flower filter with vlan id and priority classification
 tc filter add dev ens4f0 protocol 802.1Q parent ffff: \
flower \
indev ens4f0 \
vlan_ethtype ipv4 \
vlan_id 100 \
vlan_prio 3 \
action vlan pop

Signed-off-by: Hadar Hen Zion 
---
 include/linux/pkt_cls.h|  5 +++
 include/linux/tc_act/tc_vlan.h |  3 ++
 man/man8/tc-flower.8   | 25 -
 tc/f_flower.c  | 80 --
 4 files changed, 109 insertions(+), 4 deletions(-)

diff --git a/include/linux/pkt_cls.h b/include/linux/pkt_cls.h
index 5e6c61e..25a8fae 100644
--- a/include/linux/pkt_cls.h
+++ b/include/linux/pkt_cls.h
@@ -374,6 +374,11 @@ enum {
TCA_FLOWER_KEY_UDP_DST, /* be16 */
 
TCA_FLOWER_FLAGS,
+
+   TCA_FLOWER_KEY_VLAN_ID,
+   TCA_FLOWER_KEY_VLAN_PRIO,
+   TCA_FLOWER_KEY_VLAN_ETH_TYPE,
+
__TCA_FLOWER_MAX,
 };
 
diff --git a/include/linux/tc_act/tc_vlan.h b/include/linux/tc_act/tc_vlan.h
index 31151ff..26ae695 100644
--- a/include/linux/tc_act/tc_vlan.h
+++ b/include/linux/tc_act/tc_vlan.h
@@ -16,6 +16,9 @@
 
 #define TCA_VLAN_ACT_POP   1
 #define TCA_VLAN_ACT_PUSH  2
+#define VLAN_PRIO_MASK 0x7
+#define VLAN_VID_MASK  0x0fff
+
 
 struct tc_vlan {
tc_gen;
diff --git a/man/man8/tc-flower.8 b/man/man8/tc-flower.8
index 9ae10e6..74f7664 100644
--- a/man/man8/tc-flower.8
+++ b/man/man8/tc-flower.8
@@ -23,7 +23,13 @@ flower \- flow based traffic control filter
 .R " | { "
 .BR dst_mac " | " src_mac " } "
 .IR mac_address " | "
-.BR eth_type " { " ipv4 " | " ipv6 " | "
+.BR eth_type " { " ipv4 " | " ipv6 " | " 802.1Q " | "
+.IR ETH_TYPE " } | "
+.B vlan_id
+.IR VID " | "
+.B vlan_prio
+.IR PRIORITY " | "
+.BR vlan_eth_type " { " ipv4 " | " ipv6 " | "
 .IR ETH_TYPE " } | "
 .BR ip_proto " { " tcp " | " udp " | "
 .IR IP_PROTO " } | { "
@@ -70,6 +76,23 @@ Do not process filter by hardware.
 Match on source or destination MAC address.
 .TP
 .BI eth_type " ETH_TYPE"
+Match on the next protocol.
+.I ETH_TYPE
+may be either
+.BR ipv4 , ipv6 , 802.1Q ,
+or an unsigned 16bit value in hexadecimal format.
+.TP
+.BI vlan_id " VID"
+Match on vlan tag id.
+.I VID
+is an unsigned 12bit value in decimal format.
+.TP
+.BI vlan_prio " PRIORITY"
+Match on vlan tag priority.
+.I PRIORITY
+is an unsigned 3bit value in decimal format.
+.TP
+.BI vlan_eth_type " VLAN_ETH_TYPE"
 Match on layer three protocol.
 .I ETH_TYPE
 may be either
diff --git a/tc/f_flower.c b/tc/f_flower.c
index 791ade7..2ab2de1 100644
--- a/tc/f_flower.c
+++ b/tc/f_flower.c
@@ -17,6 +17,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "utils.h"
 #include "tc_util.h"
@@ -30,6 +31,9 @@ static void explain(void)
fprintf(stderr, "\n");
fprintf(stderr, "Where: MATCH-LIST := [ MATCH-LIST ] MATCH\n");
fprintf(stderr, "   MATCH  := { indev DEV-NAME |\n");
+   fprintf(stderr, "   vlan_id VID |\n");
+   fprintf(stderr, "   vlan_prio PRIORITY |\n");
+   fprintf(stderr, "   vlan_ethtype [ ipv4 | ipv6 | ETH-TYPE ] |\n");
fprintf(stderr, "   dst_mac MAC-ADDR |\n");
fprintf(stderr, "   src_mac MAC-ADDR |\n");
fprintf(stderr, "   [ipv4 | ipv6 ] |\n");
@@ -61,6 +65,24 @@ static int flower_parse_eth_addr(char *str, int addr_type, 
int mask_type,
return 0;
 }
 
+static int flower_parse_vlan_eth_type(char *str, __be16 eth_type, int type,
+ __be16 *p_vlan_eth_type,
+ struct nlmsghdr *n)
+{
+   __be16 vlan_eth_type;
+
+   if (eth_type != htons(ETH_P_8021Q)) {
+   fprintf(stderr, "Can't set \"vlan_ethtype\" if ethertype isn't 802.1Q\n");
+   return -1;
+   }
+
+   if (ll_proto_a2n(&vlan_eth_type, str))
+   invarg("invalid vlan_ethtype", str);
+   addattr16(n, MAX_MSG, type, vlan_eth_type);
+   *p_vlan_eth_type = vlan_eth_type;
+   return 0;
+}
+
 static int flower_parse_ip_proto(char *str, __be16 eth_type, int type,
 __u8 *p_ip_proto, struct nlmsghdr *n)
 {
@@ -167,6 +189,7 @@ static int flower_parse_opt(struct filter_util *qu, char 
*handle,
struct tcmsg *t = NLMSG_DATA(n);
struct rtattr *tail;
__be16 eth_type = TC_H_MIN(t->tcm_info);
+   __be16 vlan_ethtype = 0;
__u8 ip_proto = 0xff;
__u32 flags = 0;
 
@@ -208,6 +231,42 @@ static int flower_parse_opt(struct filter_util *qu, char 
*handle,
NEXT_ARG();
strncpy(ifname, *argv, sizeof(ifname) - 1);
addattrstrz(n

[PATCH iproute2 0/2] tc: flower, m_vlan: Introduce vlan tag support

2016-08-22 Thread Hadar Hen Zion
Hi,
Re-sending because the previous posting was sent from the wrong source e-mail address.

This patchset introduces vlan tag support to the tc flower classifier
and adds a vlan priority option to the vlan push action.

- The first patch adds classification according to vlan id and vlan priority
  to the flower classifier.
- The second patch adds support for vlan priority to the current vlan push
  action.

Hadar Hen Zion (2):
  tc: flower: Introduce vlan support
  tc: m_vlan: Add priority option to push vlan action

 include/linux/pkt_cls.h|  5 +++
 include/linux/tc_act/tc_vlan.h |  4 +++
 man/man8/tc-flower.8   | 25 -
 man/man8/tc-vlan.8 |  5 +++
 tc/f_flower.c  | 80 --
 tc/m_vlan.c| 22 +++-
 6 files changed, 136 insertions(+), 5 deletions(-)

-- 
1.8.3.1



[PATCH net-next 1/4] qed: Add support for legacy VFs

2016-08-22 Thread Yuval Mintz
The 8.10.x FW added support for forward compatibility as well as
'future' backward compatibility, but only to those VFs that were
using HSI which was 8.10.x based or newer.

The latest firmware now supports backward compatibility for the
older VFs based on 8.7.x and 8.8.x firmware as well.

Signed-off-by: Yuval Mintz 
---
 drivers/net/ethernet/qlogic/qed/qed_l2.c|  15 ++--
 drivers/net/ethernet/qlogic/qed/qed_l2.h|   3 +-
 drivers/net/ethernet/qlogic/qed/qed_sriov.c | 109 +++-
 drivers/net/ethernet/qlogic/qed/qed_vf.h|   2 +-
 4 files changed, 104 insertions(+), 25 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qed/qed_l2.c 
b/drivers/net/ethernet/qlogic/qed/qed_l2.c
index c823c46..c04162d 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_l2.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_l2.c
@@ -514,7 +514,8 @@ int qed_sp_eth_rxq_start_ramrod(struct qed_hwfn *p_hwfn,
u8 stats_id,
u16 bd_max_bytes,
dma_addr_t bd_chain_phys_addr,
-   dma_addr_t cqe_pbl_addr, u16 cqe_pbl_size)
+   dma_addr_t cqe_pbl_addr,
+   u16 cqe_pbl_size, bool b_use_zone_a_prod)
 {
struct rx_queue_start_ramrod_data *p_ramrod = NULL;
struct qed_spq_entry *p_ent = NULL;
@@ -571,11 +572,14 @@ int qed_sp_eth_rxq_start_ramrod(struct qed_hwfn *p_hwfn,
p_ramrod->num_of_pbl_pages = cpu_to_le16(cqe_pbl_size);
DMA_REGPAIR_LE(p_ramrod->cqe_pbl_addr, cqe_pbl_addr);
 
-   p_ramrod->vf_rx_prod_index = p_params->vf_qid;
-   if (p_params->vf_qid)
+   if (p_params->vf_qid || b_use_zone_a_prod) {
+   p_ramrod->vf_rx_prod_index = p_params->vf_qid;
DP_VERBOSE(p_hwfn, QED_MSG_SP,
-  "Queue is meant for VF rxq[%04x]\n",
+  "Queue%s is meant for VF rxq[%02x]\n",
+  b_use_zone_a_prod ? " [legacy]" : "",
   p_params->vf_qid);
+   p_ramrod->vf_rx_prod_use_zone_a = b_use_zone_a_prod;
+   }
 
return qed_spq_post(p_hwfn, p_ent, NULL);
 }
@@ -637,8 +641,7 @@ qed_sp_eth_rx_queue_start(struct qed_hwfn *p_hwfn,
 abs_stats_id,
 bd_max_bytes,
 bd_chain_phys_addr,
-cqe_pbl_addr,
-cqe_pbl_size);
+cqe_pbl_addr, cqe_pbl_size, false);
 
if (rc)
qed_sp_release_queue_cid(p_hwfn, p_rx_cid);
diff --git a/drivers/net/ethernet/qlogic/qed/qed_l2.h 
b/drivers/net/ethernet/qlogic/qed/qed_l2.h
index ff3a198..ea93519 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_l2.h
+++ b/drivers/net/ethernet/qlogic/qed/qed_l2.h
@@ -225,7 +225,8 @@ int qed_sp_eth_rxq_start_ramrod(struct qed_hwfn *p_hwfn,
u8 stats_id,
u16 bd_max_bytes,
dma_addr_t bd_chain_phys_addr,
-   dma_addr_t cqe_pbl_addr, u16 cqe_pbl_size);
+   dma_addr_t cqe_pbl_addr,
+   u16 cqe_pbl_size, bool b_use_zone_a_prod);
 
 int qed_sp_eth_txq_start_ramrod(struct qed_hwfn  *p_hwfn,
u16  opaque_fid,
diff --git a/drivers/net/ethernet/qlogic/qed/qed_sriov.c 
b/drivers/net/ethernet/qlogic/qed/qed_sriov.c
index 1579f33..f1fae77 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_sriov.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_sriov.c
@@ -60,7 +60,8 @@ static int qed_sp_vf_start(struct qed_hwfn *p_hwfn, struct 
qed_vf_info *p_vf)
}
 
fp_minor = p_vf->acquire.vfdev_info.eth_fp_hsi_minor;
-   if (fp_minor > ETH_HSI_VER_MINOR) {
+   if (fp_minor > ETH_HSI_VER_MINOR &&
+   fp_minor != ETH_HSI_VER_NO_PKT_LEN_TUNN) {
DP_VERBOSE(p_hwfn,
   QED_MSG_IOV,
   "VF [%d] - Requested fp hsi %02x.%02x which is 
slightly newer than PF's %02x.%02x; Configuring PFs version\n",
@@ -1241,6 +1242,16 @@ static u8 qed_iov_vf_mbx_acquire_resc(struct qed_hwfn 
*p_hwfn,
   p_req->num_vlan_filters,
   p_resp->num_vlan_filters,
   p_req->num_mc_filters, p_resp->num_mc_filters);
+
+   /* Some legacy OSes are incapable of correctly handling this
+* failure.
+*/
+   if ((p_vf->acquire.vfdev_info.eth_fp_hsi_minor ==
+ETH_HSI_VER_NO_PKT_LEN_TUNN) &&
+   (p_vf->acquire.vfdev_info.os_type ==
+VFPF_ACQUIRE_OS_WINDOWS))
+   return PFVF_STATUS_SUCCESS;
+
return PFVF_STATUS_NO_RESOURCE;
}
 
@

[PATCH iproute2 2/2] tc: m_vlan: Add priority option to push vlan action

2016-08-22 Thread Hadar Hen Zion
The current vlan push action supports only vid and protocol options.
Add priority option.

Example script that adds vlan push action with vid and priority:

tc filter add dev veth0 protocol ip parent ffff: \
flower \
indev veth0 \
action vlan push id 100 priority 5

Signed-off-by: Hadar Hen Zion 
---
 include/linux/tc_act/tc_vlan.h |  1 +
 man/man8/tc-vlan.8 |  5 +
 tc/m_vlan.c| 22 +-
 3 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/include/linux/tc_act/tc_vlan.h b/include/linux/tc_act/tc_vlan.h
index 26ae695..29e2113 100644
--- a/include/linux/tc_act/tc_vlan.h
+++ b/include/linux/tc_act/tc_vlan.h
@@ -32,6 +32,7 @@ enum {
TCA_VLAN_PUSH_VLAN_ID,
TCA_VLAN_PUSH_VLAN_PROTOCOL,
TCA_VLAN_PAD,
+   TCA_VLAN_PUSH_VLAN_PRIORITY,
__TCA_VLAN_MAX,
 };
 #define TCA_VLAN_MAX (__TCA_VLAN_MAX - 1)
diff --git a/man/man8/tc-vlan.8 b/man/man8/tc-vlan.8
index 4bfd72b..4d0c5c8 100644
--- a/man/man8/tc-vlan.8
+++ b/man/man8/tc-vlan.8
@@ -12,6 +12,8 @@ vlan - vlan manipulation module
 .IR PUSH " := "
 .BR push " [ " protocol
 .IR VLANPROTO " ]"
+.BR " [ " priority
+.IR VLANPRIO " ] "
 .BI id " VLANID"
 
 .ti -8
@@ -55,6 +57,9 @@ for hexadecimal interpretation, etc.).
 Choose the VLAN protocol to use. At the time of writing, the kernel accepts 
only
 .BR 802.1Q " or " 802.1ad .
 .TP
+.BI priority " VLANPRIO"
+Choose the VLAN priority to use. Decimal number in range of 0-7.
+.TP
 .I CONTROL
 How to continue after executing this action.
 .RS
diff --git a/tc/m_vlan.c b/tc/m_vlan.c
index ac63d9e..be2ffd2 100644
--- a/tc/m_vlan.c
+++ b/tc/m_vlan.c
@@ -22,7 +22,7 @@
 static void explain(void)
 {
fprintf(stderr, "Usage: vlan pop\n");
-   fprintf(stderr, "   vlan push [ protocol VLANPROTO ] id VLANID [CONTROL]\n");
+   fprintf(stderr, "   vlan push [ protocol VLANPROTO ] id VLANID [ priority VLANPRIO ] [CONTROL]\n");
fprintf(stderr, "   VLANPROTO is one of 802.1Q or 802.1AD\n");
fprintf(stderr, "with default: 802.1Q\n");
fprintf(stderr, "   CONTROL := reclassify | pipe | drop | continue | pass\n");
@@ -45,6 +45,8 @@ static int parse_vlan(struct action_util *a, int *argc_p, 
char ***argv_p,
int id_set = 0;
__u16 proto;
int proto_set = 0;
+   __u8 prio;
+   int prio_set = 0;
struct tc_vlan parm = { 0 };
 
if (matches(*argv, "vlan") != 0)
@@ -91,6 +93,17 @@ static int parse_vlan(struct action_util *a, int *argc_p, 
char ***argv_p,
if (ll_proto_a2n(&proto, *argv))
invarg("protocol is invalid", *argv);
proto_set = 1;
+   } else if (matches(*argv, "priority") == 0) {
+   if (action != TCA_VLAN_ACT_PUSH) {
+   fprintf(stderr, "\"%s\" is only valid for push\n",
+   *argv);
+   explain();
+   return -1;
+   }
+   NEXT_ARG();
+   if (get_u8(&prio, *argv, 0) || (prio & ~VLAN_PRIO_MASK))
+   invarg("prio is invalid", *argv);
+   prio_set = 1;
} else if (matches(*argv, "help") == 0) {
usage();
} else {
@@ -138,6 +151,9 @@ static int parse_vlan(struct action_util *a, int *argc_p, 
char ***argv_p,
 
addattr_l(n, MAX_MSG, TCA_VLAN_PUSH_VLAN_PROTOCOL, &proto, 2);
}
+   if (prio_set)
+   addattr8(n, MAX_MSG, TCA_VLAN_PUSH_VLAN_PRIORITY, prio);
+
tail->rta_len = (char *)NLMSG_TAIL(n) - (char *)tail;
 
*argc_p = argc;
@@ -180,6 +196,10 @@ static int print_vlan(struct action_util *au, FILE *f, 
struct rtattr *arg)

ll_proto_n2a(rta_getattr_u16(tb[TCA_VLAN_PUSH_VLAN_PROTOCOL]),
 b1, sizeof(b1)));
}
+   if (tb[TCA_VLAN_PUSH_VLAN_PRIORITY]) {
+   val = rta_getattr_u8(tb[TCA_VLAN_PUSH_VLAN_PRIORITY]);
+   fprintf(f, " priority %u", val);
+   }
break;
}
fprintf(f, " %s", action_n2a(parm->action));
-- 
1.8.3.1
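
For readers unfamiliar with why the priority is checked against
VLAN_PRIO_MASK: in an 802.1Q tag the 3-bit priority and the 12-bit VLAN id
share one 16-bit TCI field. The following is only a minimal userspace sketch
of that range check and packing; vlan_build_tci() and VLAN_PRIO_SHIFT are
hypothetical names used for illustration and are not part of this patch or of
iproute2, and the mask values are copied from the first patch of the series.

```c
#include <stdint.h>

#define VLAN_PRIO_MASK  0x7     /* 3-bit priority, value as in the series */
#define VLAN_VID_MASK   0x0fff  /* 12-bit VLAN id, value as in the series */
#define VLAN_PRIO_SHIFT 13      /* assumption: priority sits in the top 3 TCI bits */

/*
 * Hypothetical helper: validate vid and prio the same way parse_vlan()
 * validates prio, then pack them into a host-order TCI value.
 * Returns 0 on success, -1 if either field is out of range.
 */
static int vlan_build_tci(unsigned int vid, unsigned int prio, uint16_t *tci)
{
	if ((vid & ~VLAN_VID_MASK) || (prio & ~VLAN_PRIO_MASK))
		return -1;

	*tci = (uint16_t)((prio << VLAN_PRIO_SHIFT) | vid);
	return 0;
}
```

With the values from the example command (id 100, priority 5) the helper
accepts the input; a priority of 8 or a vid of 4096 is rejected.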



[PATCH net-next 4/4] qed: Change locking scheme for VF channel

2016-08-22 Thread Yuval Mintz
Each VF employs a lock that's supposed to serialize its usage of the
HW channel for communication with its PF, but the critical section is
ill-defined:

  - VFs currently release the lock whenever the PF response arrives,
prior to actually processing the reply buffer [which was also supposed
to have been protected by same lock].

  - The lock would be released on first response, ignoring the possibility
the sw flow isn't over [as might be the case of the acquisition flow].
As a result, the flow would run unprotected and would cause a double
mutex release [as the additional message completion would release it
while it's actually already free].

Change the flow to have a dedicated function to be called at end of each
flow and release the lock.
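
The pattern described above can be sketched in userspace roughly as follows.
This is an illustration only, using a pthread mutex instead of the kernel
mutex; struct vf_channel and the chan_*() helpers are hypothetical names and
do not correspond to the qed structures. The point it tries to show is that
the send step never unlocks, and every exit path of a flow goes through a
single end-of-request helper.

```c
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for the VF channel state; not the qed structures. */
struct vf_channel {
	pthread_mutex_t lock;
	int reply_status;	/* stand-in for the PF reply buffer */
	int held;		/* 1 while a request flow owns the channel */
};

/* Start of a flow: take the lock and clear any stale reply. */
static void chan_req_prep(struct vf_channel *ch)
{
	pthread_mutex_lock(&ch->lock);
	ch->held = 1;
	ch->reply_status = 0;
}

/* Deliver the request and record the reply; deliberately no unlock here. */
static int chan_send(struct vf_channel *ch, int pf_status)
{
	ch->reply_status = pf_status;
	return 0;
}

/* End of a flow: the only place where the lock is released. */
static void chan_req_end(struct vf_channel *ch)
{
	ch->held = 0;
	pthread_mutex_unlock(&ch->lock);
}

/* One complete request flow; pf_status simulates what the PF answers. */
static int chan_request(struct vf_channel *ch, int pf_status)
{
	int rc;

	chan_req_prep(ch);
	rc = chan_send(ch, pf_status);
	if (rc)
		goto exit;

	/* The reply is still examined while the lock is held. */
	if (ch->reply_status != 0)
		rc = -EINVAL;
exit:
	chan_req_end(ch);
	return rc;
}
```

A flow that needs several messages (such as the acquisition flow) would call
the send step repeatedly between the prep and end helpers, so the lock is
released exactly once.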

Signed-off-by: Yuval Mintz 
---
Notice this is basically a bug fix, but pushing it to net would create
several merge conflicts.
Furthermore, while the first issue is a theoretical race, the second
would be constantly hit if a modern VF would be used on top of a legacy
PF [i.e., patch #3 in this series].
Hence the motivation of adding it here.

Still, if preferred I can provide a version of this for `net'.
---
 drivers/net/ethernet/qlogic/qed/qed_vf.c | 124 ++-
 1 file changed, 90 insertions(+), 34 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qed/qed_vf.c 
b/drivers/net/ethernet/qlogic/qed/qed_vf.c
index f9f68da..3c9071d 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_vf.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_vf.c
@@ -46,6 +46,17 @@ static void *qed_vf_pf_prep(struct qed_hwfn *p_hwfn, u16 
type, u16 length)
return p_tlv;
 }
 
+static void qed_vf_pf_req_end(struct qed_hwfn *p_hwfn, int req_status)
+{
+   union pfvf_tlvs *resp = p_hwfn->vf_iov_info->pf2vf_reply;
+
+   DP_VERBOSE(p_hwfn, QED_MSG_IOV,
+  "VF request status = 0x%x, PF reply status = 0x%x\n",
+  req_status, resp->default_resp.hdr.status);
+
+   mutex_unlock(&(p_hwfn->vf_iov_info->mutex));
+}
+
 static int qed_send_msg2pf(struct qed_hwfn *p_hwfn, u8 *done, u32 resp_size)
 {
union vfpf_tlvs *p_req = p_hwfn->vf_iov_info->vf2pf_request;
@@ -103,16 +114,12 @@ static int qed_send_msg2pf(struct qed_hwfn *p_hwfn, u8 
*done, u32 resp_size)
   "VF <-- PF Timeout [Type %d]\n",
   p_req->first_tlv.tl.type);
rc = -EBUSY;
-   goto exit;
} else {
DP_VERBOSE(p_hwfn, QED_MSG_IOV,
   "PF response: %d [Type %d]\n",
   *done, p_req->first_tlv.tl.type);
}
 
-exit:
-   mutex_unlock(&(p_hwfn->vf_iov_info->mutex));
-
return rc;
 }
 
@@ -296,6 +303,8 @@ static int qed_vf_pf_acquire(struct qed_hwfn *p_hwfn)
}
 
 exit:
+   qed_vf_pf_req_end(p_hwfn, rc);
+
return rc;
 }
 
@@ -435,10 +444,12 @@ int qed_vf_pf_rxq_start(struct qed_hwfn *p_hwfn,
resp = &p_iov->pf2vf_reply->queue_start;
rc = qed_send_msg2pf(p_hwfn, &resp->hdr.status, sizeof(*resp));
if (rc)
-   return rc;
+   goto exit;
 
-   if (resp->hdr.status != PFVF_STATUS_SUCCESS)
-   return -EINVAL;
+   if (resp->hdr.status != PFVF_STATUS_SUCCESS) {
+   rc = -EINVAL;
+   goto exit;
+   }
 
/* Learn the address of the producer from the response */
if (pp_prod && !p_iov->b_pre_fp_hsi) {
@@ -453,6 +464,8 @@ int qed_vf_pf_rxq_start(struct qed_hwfn *p_hwfn,
__internal_ram_wr(p_hwfn, *pp_prod, sizeof(u32),
  (u32 *)&init_prod_val);
}
+exit:
+   qed_vf_pf_req_end(p_hwfn, rc);
 
return rc;
 }
@@ -478,10 +491,15 @@ int qed_vf_pf_rxq_stop(struct qed_hwfn *p_hwfn, u16 
rx_qid, bool cqe_completion)
resp = &p_iov->pf2vf_reply->default_resp;
rc = qed_send_msg2pf(p_hwfn, &resp->hdr.status, sizeof(*resp));
if (rc)
-   return rc;
+   goto exit;
+
+   if (resp->hdr.status != PFVF_STATUS_SUCCESS) {
+   rc = -EINVAL;
+   goto exit;
+   }
 
-   if (resp->hdr.status != PFVF_STATUS_SUCCESS)
-   return -EINVAL;
+exit:
+   qed_vf_pf_req_end(p_hwfn, rc);
 
return rc;
 }
@@ -544,6 +562,7 @@ int qed_vf_pf_txq_start(struct qed_hwfn *p_hwfn,
   tx_queue_id, *pp_doorbell, resp->offset);
}
 exit:
+   qed_vf_pf_req_end(p_hwfn, rc);
 
return rc;
 }
@@ -568,10 +587,15 @@ int qed_vf_pf_txq_stop(struct qed_hwfn *p_hwfn, u16 
tx_qid)
resp = &p_iov->pf2vf_reply->default_resp;
rc = qed_send_msg2pf(p_hwfn, &resp->hdr.status, sizeof(*resp));
if (rc)
-   return rc;
+   goto exit;
 
-   if (resp->hdr.status != PFVF_STATUS_SUCCESS)
-   return -EINVAL;
+   if (resp->hdr.status != PFVF_STATUS_SUCCESS) {
+   rc = -EIN

[PATCH net-next 3/4] qed*: Add support for VFs over legacy PFs

2016-08-22 Thread Yuval Mintz
Modern VFs can't run on old non-compatible PFs as the fastpath HSI is
slightly changed - but as the HSI is actually very close [basically,
a single bit whose meaning flipped] this can be supported with small
modifications.

The major differences would be in:
  - Recognizing that VF is running on top of a legacy PF.
  - Returning some slowpath configurations that are no longer needed
on top of modern PFs, but would be required when working over
the legacy ones.

Signed-off-by: Yuval Mintz 
---
 drivers/net/ethernet/qlogic/qed/qed_l2.c |   2 +
 drivers/net/ethernet/qlogic/qed/qed_vf.c | 107 ++-
 drivers/net/ethernet/qlogic/qed/qed_vf.h |   5 ++
 drivers/net/ethernet/qlogic/qede/qede.h  |   2 +
 drivers/net/ethernet/qlogic/qede/qede_main.c |  10 +++
 include/linux/qed/qed_eth_if.h   |   3 +
 6 files changed, 109 insertions(+), 20 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qed/qed_l2.c 
b/drivers/net/ethernet/qlogic/qed/qed_l2.c
index bf43301..4409ea3 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_l2.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_l2.c
@@ -1685,6 +1685,8 @@ static int qed_fill_eth_dev_info(struct qed_dev *cdev,
qed_vf_get_num_vlan_filters(&cdev->hwfns[0],
&info->num_vlan_filters);
qed_vf_get_port_mac(&cdev->hwfns[0], info->port_mac);
+
+   info->is_legacy = !!cdev->hwfns[0].vf_iov_info->b_pre_fp_hsi;
}
 
qed_fill_dev_info(cdev, &info->common);
diff --git a/drivers/net/ethernet/qlogic/qed/qed_vf.c 
b/drivers/net/ethernet/qlogic/qed/qed_vf.c
index 9b780b3..f9f68da 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_vf.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_vf.c
@@ -191,6 +191,9 @@ static int qed_vf_pf_acquire(struct qed_hwfn *p_hwfn)
DP_VERBOSE(p_hwfn,
   QED_MSG_IOV, "attempting to acquire resources\n");
 
+   /* Clear response buffer, as this might be a re-send */
+   memset(p_iov->pf2vf_reply, 0, sizeof(union pfvf_tlvs));
+
/* send acquire request */
rc = qed_send_msg2pf(p_hwfn, &resp->hdr.status, sizeof(*resp));
if (rc)
@@ -205,9 +208,12 @@ static int qed_vf_pf_acquire(struct qed_hwfn *p_hwfn)
/* PF agrees to allocate our resources */
if (!(resp->pfdev_info.capabilities &
  PFVF_ACQUIRE_CAP_POST_FW_OVERRIDE)) {
-   DP_INFO(p_hwfn,
-   "PF is using old incompatible driver; 
Either downgrade driver or request provider to update hypervisor version\n");
-   return -EINVAL;
+   /* It's possible legacy PF mistakenly accepted;
+* but we don't care - simply mark it as
+* legacy and continue.
+*/
+   req->vfdev_info.capabilities |=
+   VFPF_ACQUIRE_CAP_PRE_FP_HSI;
}
DP_VERBOSE(p_hwfn, QED_MSG_IOV, "resources acquired\n");
resources_acquired = true;
@@ -215,27 +221,55 @@ static int qed_vf_pf_acquire(struct qed_hwfn *p_hwfn)
   attempts < VF_ACQUIRE_THRESH) {
qed_vf_pf_acquire_reduce_resc(p_hwfn, p_resc,
  &resp->resc);
+   } else if (resp->hdr.status == PFVF_STATUS_NOT_SUPPORTED) {
+   if (pfdev_info->major_fp_hsi &&
+   (pfdev_info->major_fp_hsi != ETH_HSI_VER_MAJOR)) {
+   DP_NOTICE(p_hwfn,
+ "PF uses an incompatible fastpath HSI 
%02x.%02x [VF requires %02x.%02x]. Please change to a VF driver using 
%02x.xx.\n",
+ pfdev_info->major_fp_hsi,
+ pfdev_info->minor_fp_hsi,
+ ETH_HSI_VER_MAJOR,
+ ETH_HSI_VER_MINOR,
+ pfdev_info->major_fp_hsi);
+   rc = -EINVAL;
+   goto exit;
+   }
 
-   /* Clear response buffer */
-   memset(p_iov->pf2vf_reply, 0, sizeof(union pfvf_tlvs));
-   } else if ((resp->hdr.status == PFVF_STATUS_NOT_SUPPORTED) &&
-  pfdev_info->major_fp_hsi &&
-  (pfdev_info->major_fp_hsi != ETH_HSI_VER_MAJOR)) {
-   DP_NOTICE(p_hwfn,
- "PF uses an incompatible fastpath HSI 
%02x.%02x [VF requires %02x.%02x]. Please change to a VF driver using 
%02x.xx.\n",
-

[PATCH net-next 2/4] qed: Prevent VFs from pause flooding

2016-08-22 Thread Yuval Mintz
Firmware would silently drop any control frame sent by VF to prevent
a malicious VF from generating pause flood in the network.

Signed-off-by: Yuval Mintz 
---
 drivers/net/ethernet/qlogic/qed/qed_l2.c| 3 +++
 drivers/net/ethernet/qlogic/qed/qed_l2.h| 2 ++
 drivers/net/ethernet/qlogic/qed/qed_sriov.c | 1 +
 3 files changed, 6 insertions(+)

diff --git a/drivers/net/ethernet/qlogic/qed/qed_l2.c 
b/drivers/net/ethernet/qlogic/qed/qed_l2.c
index c04162d..bf43301 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_l2.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_l2.c
@@ -101,6 +101,9 @@ int qed_sp_eth_vport_start(struct qed_hwfn *p_hwfn,
 
p_ramrod->tx_switching_en = p_params->tx_switching;
 
+   p_ramrod->ctl_frame_mac_check_en = !!p_params->check_mac;
+   p_ramrod->ctl_frame_ethtype_check_en = !!p_params->check_ethtype;
+
/* Software Function ID in hwfn (PFs are 0 - 15, VFs are 16 - 135) */
p_ramrod->sw_fid = qed_concrete_to_sw_fid(p_hwfn->cdev,
  p_params->concrete_fid);
diff --git a/drivers/net/ethernet/qlogic/qed/qed_l2.h 
b/drivers/net/ethernet/qlogic/qed/qed_l2.h
index ea93519..e495d62 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_l2.h
+++ b/drivers/net/ethernet/qlogic/qed/qed_l2.h
@@ -102,6 +102,8 @@ struct qed_sp_vport_start_params {
u16 opaque_fid;
u8 vport_id;
u16 mtu;
+   bool check_mac;
+   bool check_ethtype;
 };
 
 int qed_sp_eth_vport_start(struct qed_hwfn *p_hwfn,
diff --git a/drivers/net/ethernet/qlogic/qed/qed_sriov.c 
b/drivers/net/ethernet/qlogic/qed/qed_sriov.c
index f1fae77..cb68674 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_sriov.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_sriov.c
@@ -1680,6 +1680,7 @@ static void qed_iov_vf_mbx_start_vport(struct qed_hwfn 
*p_hwfn,
params.vport_id = vf->vport_id;
params.max_buffers_per_cqe = start->max_buffers_per_cqe;
params.mtu = vf->mtu;
+   params.check_mac = true;
 
rc = qed_sp_eth_vport_start(p_hwfn, ¶ms);
if (rc) {
-- 
1.9.3



Re: [PATCH v3 net-next] l2tp: Refactor the codes with existing macros instead of literal number

2016-08-22 Thread Feng Gao
inline

On Mon, Aug 22, 2016 at 5:48 PM, Guillaume Nault  wrote:
> On Sun, Aug 21, 2016 at 04:36:52PM -0600, Philp Prindeville wrote:
>> Inline
>>
>>
>> On 08/20/2016 09:52 AM, f...@48lvckh6395k16k5.yundunddos.com wrote:
>> > From: Gao Feng 
>> >
>> > Use PPP_ALLSTATIONS, PPP_UI, and SEND_SHUTDOWN instead of 0xff,
>> > 0x03, and 2 separately.
>> >
>> > Signed-off-by: Gao Feng 
>> > ---
>> >   v3: Modify the subject;
>> >   v2: Only replace the literal number with macros according to Guillaume's 
>> > advice
>> >   v1: Initial patch
>> >
>> >   net/l2tp/l2tp_ppp.c | 8 
>> >   1 file changed, 4 insertions(+), 4 deletions(-)
>> >
>> > diff --git a/net/l2tp/l2tp_ppp.c b/net/l2tp/l2tp_ppp.c
>> > index d9560aa..65e2fd6 100644
>> > --- a/net/l2tp/l2tp_ppp.c
>> > +++ b/net/l2tp/l2tp_ppp.c
>> > @@ -177,7 +177,7 @@ static int pppol2tp_recv_payload_hook(struct sk_buff 
>> > *skb)
>> > if (!pskb_may_pull(skb, 2))
>> > return 1;
>> > -   if ((skb->data[0] == 0xff) && (skb->data[1] == 0x03))
>> > +   if ((skb->data[0] == PPP_ALLSTATIONS) && (skb->data[1] == PPP_UI))
>>
>> This should have used PPP_ADDRESS() and PPP_CONTROL() here.
>>
> Then please justify how would that make the code more readable.
> We're not trying to interpret a known valid PPP header here.
>
>> > skb_pull(skb, 2);
>>
>> This magic number should go away.
>>
> Again, this is *not* a magic number. We've explicitly accessed the
> first _two_ header bytes and want to skip them.
> pskb_may_pull(2), ->data[0], ->data[1] and skb_pull(2) all go together.
>
> There's even a nice comment telling you what is done and why:
> /* Skip PPP header, if present.  In testing, Microsoft L2TP clients
>  * don't send the PPP header (PPP header compression enabled), but
>  * other clients can include the header. So we cope with both cases
>  * here. The PPP header is always FF03 when using L2TP.
>  *
>  * Note that skb->data[] isn't dereferenced from a u16 ptr here since
>  * the field may be unaligned.
>  */
> Apart from the imprecise "PPP header" term, which should be read as
> "address and control fields", things should be quite clear.

Removing the static ppph might make this clearer, because the variable
makes the reader think of a full PPP header.

Regards
Feng

>
>> > @@ -282,7 +282,7 @@ static void pppol2tp_session_sock_put(struct 
>> > l2tp_session *session)
>> >   static int pppol2tp_sendmsg(struct socket *sock, struct msghdr *m,
>> > size_t total_len)
>> >   {
>> > -   static const unsigned char ppph[2] = { 0xff, 0x03 };
>> > +   static const unsigned char ppph[2] = {PPP_ALLSTATIONS, PPP_UI};
>>
>> PPP has a 4-byte header.  Where's the protocol value?
>>
> No, PPP header (whatever you include in it) is of variable length. And
> the protocol has already been set by the PPP layer anyway.
> We're in L2TP here.
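
To make the receive-side logic under discussion concrete, here is a small
userspace sketch of the "skip address and control fields if present" check.
It works on a plain byte buffer instead of an sk_buff, and
ppp_addr_ctrl_len() is a hypothetical helper invented for this illustration,
not a kernel function. The macro values are the ones the patch replaces
(0xff and 0x03).

```c
#include <stddef.h>
#include <stdint.h>

#define PPP_ALLSTATIONS 0xff	/* PPP address field value */
#define PPP_UI          0x03	/* PPP control field value */

/*
 * Hypothetical helper: report how many leading bytes hold the PPP
 * address/control fields. Returns 2 if they are present, 0 if the peer
 * compressed them away, and -1 if fewer than two bytes are available
 * (the case pskb_may_pull(skb, 2) guards against in the kernel code).
 */
static int ppp_addr_ctrl_len(const uint8_t *data, size_t len)
{
	if (len < 2)
		return -1;

	/* Bytes are compared one at a time, so alignment does not matter. */
	if (data[0] == PPP_ALLSTATIONS && data[1] == PPP_UI)
		return 2;

	return 0;
}
```

The literal 2 appears in the length check, the two indexed accesses and the
return value, which mirrors the point made above that those uses belong
together.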


Re: [PATCH v2 0/3] VSOCK: vsockmon virtual device to monitor AF_VSOCK sockets.

2016-08-22 Thread Gerard Garcia

On 08/15/2016 05:13 PM, Stefan Hajnoczi wrote:

On Mon, Aug 15, 2016 at 02:15:38AM +0300, Michael S. Tsirkin wrote:

On Sat, Aug 13, 2016 at 12:21:51PM +0200, ggar...@abra.uab.cat wrote:

From: Gerard Garcia 

This patch applies over the mst vhost git repository:
http://git.kernel.org/cgit/linux/kernel/git/mst/vhost.git


So I do like where this is going, but it gives me pause
that there's a global list of taps, where all sockets
seem to multicast to them all.

In particular, this won't play well with things
like containers.


vsock currently has no network namespace support.  I agree that the tap
instances should be per-namespace when we add namespace support.


As each socket is bound to a physical device, how about binding
the monitor there as well?


Sockets aren't bound to physical devices, they are bound globally in the
af_vsock.ko module.  The module currently doesn't allow multiple
instances (you cannot have multiple VMCI or virtio transports).


Only sockets from this device
would do the forwarding, and only one monitor per
device would be supported.

In a sense this will make it more like macvtap than tap.


Restricting the number of monitors could make userspace cumbersome.
Imagine two scripts that want to capture packets.  The two scripts have
no knowledge of each other and create their own vsockmon interfaces.  If
we restrict vsockmon to just 1 interface then users need to agree on
sharing just 1 vsockmon interface.  I don't think this is beneficial.

So I think this global list is acceptable until we introduce network
namespace support.  At that point it will become per-namespace.



Sorry, I was out last week.

I don't have much to add to what Stefan said. I agree that when vsock 
introduces namespace support it will be necessary to have monitors 
divided per-namespace. Right now, if only one af_vsock instance is 
allowed, I think it makes sense to have a global list of taps.


Gerard


[PATCH net-next 0/3] TX max rate limiting for Chelsio T4/T5 adapters

2016-08-22 Thread Rahul Lakkireddy
This series of patches implements tx max rate limiting per queue on
Chelsio T4/T5 hardware.  This is achieved by first creating a tx
scheduling class with the specified max rate.  The queue is then
bound to the newly created class.  If a scheduling class with similar
max rate already exists, then the queue is bound to the matching class.

Patch 1 adds support for setting tx scheduling classes.
Patch 2 adds support to bind/unbind queues to/from the scheduling classes.
Patch 3 implements the set_tx_maxrate NDO.

Rahul Lakkireddy (3):
  cxgb4: add support for tx traffic scheduling classes
  cxgb4: add support for per queue tx scheduling
  cxgb4: add support for tx max rate limiting

 drivers/net/ethernet/chelsio/cxgb4/Makefile |   2 +-
 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h  |  56 ++-
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 102 -
 drivers/net/ethernet/chelsio/cxgb4/sched.c  | 556 
 drivers/net/ethernet/chelsio/cxgb4/sched.h  | 110 +
 drivers/net/ethernet/chelsio/cxgb4/t4_hw.c  |  31 +-
 drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h   |  38 +-
 7 files changed, 890 insertions(+), 5 deletions(-)
 create mode 100644 drivers/net/ethernet/chelsio/cxgb4/sched.c
 create mode 100644 drivers/net/ethernet/chelsio/cxgb4/sched.h

-- 
2.5.3



Re: [PATCH net 1/1] net sched ife action: fix encoding to use real length

2016-08-22 Thread Jamal Hadi Salim

On 16-08-19 09:22 AM, Sergei Shtylyov wrote:

Hello.

On 8/19/2016 1:17 PM, Jamal Hadi Salim wrote:


From: Jamal Hadi Salim 

Encoding of the metadata was using the padded length as opposed to
the real length of the data which is a bug per specification.
This has not been an issue todate because all metadatum specified


   To date? Metadata perhaps?



Todate as in "up to today".


so far has been 32 bit where aligned and data length are the same width.
This also includes a bug fix for validating the length of a u16 field.
But since there is no metadata of size u16 yes we are fine to include it
here.

While at it get rid of magic numbers

Fixes: ef6980b6


   This tag has a standardized format, including 12-digit SHA1 and the
commit summary enclosed in ("").



Ok, let me fix that and resend.

cheers,
jamal


[PATCH net-next 1/3] cxgb4: add support for tx traffic scheduling classes

2016-08-22 Thread Rahul Lakkireddy
Add support to create tx traffic scheduling classes with specified
scheduling parameters.  Return an existing class if a match is found
with same scheduling parameters.

Signed-off-by: Rahul Lakkireddy 
Signed-off-by: Hariprasad Shenai 
---
 drivers/net/ethernet/chelsio/cxgb4/Makefile |   2 +-
 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h  |  28 ++-
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c |  20 +-
 drivers/net/ethernet/chelsio/cxgb4/sched.c  | 235 
 drivers/net/ethernet/chelsio/cxgb4/sched.h  |  89 +
 drivers/net/ethernet/chelsio/cxgb4/t4_hw.c  |  31 +++-
 drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h   |  38 +++-
 7 files changed, 438 insertions(+), 5 deletions(-)
 create mode 100644 drivers/net/ethernet/chelsio/cxgb4/sched.c
 create mode 100644 drivers/net/ethernet/chelsio/cxgb4/sched.h

diff --git a/drivers/net/ethernet/chelsio/cxgb4/Makefile 
b/drivers/net/ethernet/chelsio/cxgb4/Makefile
index fac2157..2461296 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/Makefile
+++ b/drivers/net/ethernet/chelsio/cxgb4/Makefile
@@ -4,7 +4,7 @@
 
 obj-$(CONFIG_CHELSIO_T4) += cxgb4.o
 
-cxgb4-objs := cxgb4_main.o l2t.o t4_hw.o sge.o clip_tbl.o cxgb4_ethtool.o 
cxgb4_uld.o
+cxgb4-objs := cxgb4_main.o l2t.o t4_hw.o sge.o clip_tbl.o cxgb4_ethtool.o 
cxgb4_uld.o sched.o
 cxgb4-$(CONFIG_CHELSIO_T4_DCB) +=  cxgb4_dcb.o
 cxgb4-$(CONFIG_CHELSIO_T4_FCOE) +=  cxgb4_fcoe.o
 cxgb4-$(CONFIG_DEBUG_FS) += cxgb4_debugfs.o
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h 
b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
index 6b05289..17a6dd0 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
@@ -1,7 +1,7 @@
 /*
  * This file is part of the Chelsio T4 Ethernet driver for Linux.
  *
- * Copyright (c) 2003-2014 Chelsio Communications, Inc. All rights reserved.
+ * Copyright (c) 2003-2016 Chelsio Communications, Inc. All rights reserved.
  *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
@@ -347,6 +347,7 @@ struct adapter_params {
unsigned int ofldq_wr_cred;
bool ulptx_memwrite_dsgl;  /* use of T5 DSGL allowed */
 
+   unsigned int nsched_cls;  /* number of traffic classes */
unsigned int max_ordird_qp;   /* Max read depth per RDMA QP */
unsigned int max_ird_adapter; /* Max read depth per adapter */
 };
@@ -495,6 +496,7 @@ struct port_info {
 #endif /* CONFIG_CHELSIO_T4_FCOE */
bool rxtstamp;  /* Enable TS */
struct hwtstamp_config tstamp_config;
+   struct sched_table *sched_tbl;
 };
 
 struct dentry;
@@ -858,6 +860,27 @@ struct adapter {
spinlock_t win0_lock cacheline_aligned_in_smp;
 };
 
+/* Support for "sched-class" command to allow a TX Scheduling Class to be
+ * programmed with various parameters.
+ */
+struct ch_sched_params {
+   s8   type; /* packet or flow */
+   union {
+   struct {
+   s8   level;/* scheduler hierarchy level */
+   s8   mode; /* per-class or per-flow */
+   s8   rateunit; /* bit or packet rate */
+   s8   ratemode; /* %port relative or kbps absolute */
+   s8   channel;  /* scheduler channel [0..N] */
+   s8   class;/* scheduler class [0..N] */
+   s32  minrate;  /* minimum rate */
+   s32  maxrate;  /* maximum rate */
+   s16  weight;   /* percent weight */
+   s16  pktsize;  /* average packet size */
+   } params;
+   } u;
+};
+
 /* Defined bit width of user definable filter tuples
  */
 #define ETHTYPE_BITWIDTH 16
@@ -1563,6 +1586,9 @@ void t4_get_trace_filter(struct adapter *adapter, struct 
trace_params *tp,
 int filter_index, int *enabled);
 int t4_fwaddrspace_write(struct adapter *adap, unsigned int mbox,
 u32 addr, u32 val);
+int t4_sched_params(struct adapter *adapter, int type, int level, int mode,
+   int rateunit, int ratemode, int channel, int class,
+   int minrate, int maxrate, int weight, int pktsize);
 void t4_sge_decode_idma_state(struct adapter *adapter, int state);
 void t4_free_mem(void *addr);
 void t4_idma_monitor_init(struct adapter *adapter,
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c 
b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index 0099a0c..2e341bf 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -1,7 +1,7 @@
 /*
  * This file is part of the Chelsio T4 Ethernet driver for Linux.
  *
- * Copyright (c) 2003-2014 Chelsio Communications, Inc. All rights reserved.
+ * Copyright (c) 2003-2016 Chelsio Communications, Inc. All rights reserved.
  *
  * This softw

[PATCH net-next 3/3] cxgb4: add support for tx max rate limiting

2016-08-22 Thread Rahul Lakkireddy
Implement set_tx_maxrate NDO to perform per queue tx rate limiting.

Signed-off-by: Rahul Lakkireddy 
Signed-off-by: Hariprasad Shenai 
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h  | 20 ++
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 82 +
 drivers/net/ethernet/chelsio/cxgb4/sched.h  |  3 +
 3 files changed, 105 insertions(+)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h 
b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
index eb30612..f988c60 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
@@ -881,6 +881,26 @@ struct ch_sched_params {
} u;
 };
 
+enum {
+   SCHED_CLASS_TYPE_PACKET = 0,/* class type */
+};
+
+enum {
+   SCHED_CLASS_LEVEL_CL_RL = 0,/* class rate limiter */
+};
+
+enum {
+   SCHED_CLASS_MODE_CLASS = 0, /* per-class scheduling */
+};
+
+enum {
+   SCHED_CLASS_RATEUNIT_BITS = 0,  /* bit rate scheduling */
+};
+
+enum {
+   SCHED_CLASS_RATEMODE_ABS = 1,   /* Kb/s */
+};
+
 /* Support for "sched_queue" command to allow one or more NIC TX Queues
  * to be bound to a TX Scheduling Class.
  */
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c 
b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index 2e341bf..be5c942 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -3140,6 +3140,87 @@ static void cxgb_netpoll(struct net_device *dev)
 }
 #endif
 
+static int cxgb_set_tx_maxrate(struct net_device *dev, int index, u32 rate)
+{
+   struct port_info *pi = netdev_priv(dev);
+   struct adapter *adap = pi->adapter;
+   struct sched_class *e;
+   struct ch_sched_params p;
+   struct ch_sched_queue qe;
+   u32 req_rate;
+   int err = 0;
+
+   if (!can_sched(dev))
+   return -ENOTSUPP;
+
+   if (index < 0 || index > pi->nqsets - 1)
+   return -EINVAL;
+
+   if (!(adap->flags & FULL_INIT_DONE)) {
+   dev_err(adap->pdev_dev,
+   "Failed to rate limit on queue %d. Link Down?\n",
+   index);
+   return -EINVAL;
+   }
+
+   /* Convert from Mbps to Kbps */
+   req_rate = rate << 10;
+
+   /* Max rate is 10 Gbps */
+   if (req_rate >= SCHED_MAX_RATE_KBPS) {
+   dev_err(adap->pdev_dev,
+   "Invalid rate %u Mbps, Max rate is %u Gbps\n",
+   rate, SCHED_MAX_RATE_KBPS);
+   return -ERANGE;
+   }
+
+   /* First unbind the queue from any existing class */
+   memset(&qe, 0, sizeof(qe));
+   qe.queue = index;
+   qe.class = SCHED_CLS_NONE;
+
+   err = cxgb4_sched_class_unbind(dev, (void *)(&qe), SCHED_QUEUE);
+   if (err) {
+   dev_err(adap->pdev_dev,
+   "Unbinding Queue %d on port %d fail. Err: %d\n",
+   index, pi->port_id, err);
+   return err;
+   }
+
+   /* Queue already unbound */
+   if (!req_rate)
+   return 0;
+
+   /* Fetch any available unused or matching scheduling class */
+   memset(&p, 0, sizeof(p));
+   p.type = SCHED_CLASS_TYPE_PACKET;
+   p.u.params.level= SCHED_CLASS_LEVEL_CL_RL;
+   p.u.params.mode = SCHED_CLASS_MODE_CLASS;
+   p.u.params.rateunit = SCHED_CLASS_RATEUNIT_BITS;
+   p.u.params.ratemode = SCHED_CLASS_RATEMODE_ABS;
+   p.u.params.channel  = pi->tx_chan;
+   p.u.params.class= SCHED_CLS_NONE;
+   p.u.params.minrate  = 0;
+   p.u.params.maxrate  = req_rate;
+   p.u.params.weight   = 0;
+   p.u.params.pktsize  = dev->mtu;
+
+   e = cxgb4_sched_class_alloc(dev, &p);
+   if (!e)
+   return -ENOMEM;
+
+   /* Bind the queue to a scheduling class */
+   memset(&qe, 0, sizeof(qe));
+   qe.queue = index;
+   qe.class = e->idx;
+
+   err = cxgb4_sched_class_bind(dev, (void *)(&qe), SCHED_QUEUE);
+   if (err)
+   dev_err(adap->pdev_dev,
+   "Queue rate limiting failed. Err: %d\n", err);
+   return err;
+}
+
 static const struct net_device_ops cxgb4_netdev_ops = {
.ndo_open = cxgb_open,
.ndo_stop = cxgb_close,
@@ -3162,6 +3243,7 @@ static const struct net_device_ops cxgb4_netdev_ops = {
 #ifdef CONFIG_NET_RX_BUSY_POLL
.ndo_busy_poll= cxgb_busy_poll,
 #endif
+   .ndo_set_tx_maxrate   = cxgb_set_tx_maxrate,
 };
 
 static const struct net_device_ops cxgb4_mgmt_netdev_ops = {
diff --git a/drivers/net/ethernet/chelsio/cxgb4/sched.h 
b/drivers/net/ethernet/chelsio/cxgb4/sched.h
index ac415eb..77b2b3f 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/sched.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/sched.h
@@ -42,6 +42,9 @@
 
 #define FW_SCHED_CLS_NONE 0x
 
+/* Max rate that can be set to a scheduling class is 10 Gbps */
+#define SCHED_MAX_RATE_KBPS 1000U
+
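
A side note on the unit conversion in cxgb_set_tx_maxrate(): shifting left by 10 multiplies by 1024, whereas a decimal Mbps to Kbps conversion multiplies by 1000. The sketch below only illustrates that difference. The helper names are hypothetical, the cap is derived from the "Max rate is 10 Gbps" comment rather than copied from the patch, and treating the cap as inclusive is an assumption.

```c
#include <stdint.h>

/* Hypothetical cap for illustration: 10 Gbps expressed in Kbps. */
#define SKETCH_MAX_RATE_KBPS 10000000U

/* Decimal conversion: 1 Mbps = 1000 Kbps. */
static uint32_t sketch_mbps_to_kbps(uint32_t mbps)
{
	return mbps * 1000U;
}

/* Binary scaling: rate << 10 is rate * 1024. */
static uint32_t sketch_mbps_shift10(uint32_t mbps)
{
	return mbps << 10;
}

/* Returns 0 and stores the Kbps value, or -1 if the rate exceeds the cap.
 * The multiplication is done in 64 bits so a large input cannot wrap.
 */
static int sketch_check_rate(uint32_t mbps, uint32_t *kbps)
{
	uint64_t wide = (uint64_t)mbps * 1000U;

	if (wide > SKETCH_MAX_RATE_KBPS)
		return -1;
	*kbps = (uint32_t)wide;
	return 0;
}
```

With these helpers, 100 Mbps becomes 100000 Kbps with the decimal conversion and 102400 Kbps with the shift, a difference of 2.4%.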

[PATCH net-next 2/3] cxgb4: add support for per queue tx scheduling

2016-08-22 Thread Rahul Lakkireddy
Add support to bind/unbind specified tx queues to/from scheduling
classes.  If a queue is already bound to a scheduling class, it is
unbound first and then bound to a new specified class.

Signed-off-by: Rahul Lakkireddy 
Signed-off-by: Hariprasad Shenai 
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h |   8 +
 drivers/net/ethernet/chelsio/cxgb4/sched.c | 321 +
 drivers/net/ethernet/chelsio/cxgb4/sched.h |  18 ++
 3 files changed, 347 insertions(+)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h 
b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
index 17a6dd0..eb30612 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
@@ -881,6 +881,14 @@ struct ch_sched_params {
} u;
 };
 
+/* Support for "sched_queue" command to allow one or more NIC TX Queues
+ * to be bound to a TX Scheduling Class.
+ */
+struct ch_sched_queue {
+   s8   queue;/* queue index */
+   s8   class;/* class index */
+};
+
 /* Defined bit width of user definable filter tuples
  */
 #define ETHTYPE_BITWIDTH 16
diff --git a/drivers/net/ethernet/chelsio/cxgb4/sched.c 
b/drivers/net/ethernet/chelsio/cxgb4/sched.c
index 6158daf..539de76 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/sched.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/sched.c
@@ -67,6 +67,312 @@ static int t4_sched_class_fw_cmd(struct port_info *pi,
return err;
 }
 
+/* Spinlock must be held by caller */
+static int t4_sched_bind_unbind_op(struct port_info *pi, void *arg,
+  enum sched_bind_type type, bool bind)
+{
+   struct adapter *adap = pi->adapter;
+   u32 fw_mnem, fw_class, fw_param;
+   unsigned int pf = adap->pf;
+   unsigned int vf = 0;
+   int err = 0;
+
+   switch (type) {
+   case SCHED_QUEUE: {
+   struct sched_queue_entry *qe;
+
+   qe = (struct sched_queue_entry *)arg;
+
+   /* Create a template for the FW_PARAMS_CMD mnemonic and
+* value (TX Scheduling Class in this case).
+*/
+   fw_mnem = (FW_PARAMS_MNEM_V(FW_PARAMS_MNEM_DMAQ) |
+  FW_PARAMS_PARAM_X_V(
+  FW_PARAMS_PARAM_DMAQ_EQ_SCHEDCLASS_ETH));
+   fw_class = bind ? qe->param.class : FW_SCHED_CLS_NONE;
+   fw_param = (fw_mnem | FW_PARAMS_PARAM_YZ_V(qe->cntxt_id));
+
+   pf = adap->pf;
+   vf = 0;
+   break;
+   }
+   default:
+   err = -ENOTSUPP;
+   goto out;
+   }
+
+   err = t4_set_params(adap, adap->mbox, pf, vf, 1, &fw_param, &fw_class);
+
+out:
+   return err;
+}
+
+static struct sched_class *t4_sched_queue_lookup(struct port_info *pi,
+const unsigned int qid,
+int *index)
+{
+   struct sched_table *s = pi->sched_tbl;
+   struct sched_class *e, *end;
+   struct sched_class *found = NULL;
+   int i;
+
+   /* Look for a class with matching bound queue parameters */
+   end = &s->tab[s->sched_size];
+   for (e = &s->tab[0]; e != end; ++e) {
+   struct sched_queue_entry *qe;
+
+   i = 0;
+   if (e->state == SCHED_STATE_UNUSED)
+   continue;
+
+   list_for_each_entry(qe, &e->queue_list, list) {
+   if (qe->cntxt_id == qid) {
+   found = e;
+   if (index)
+   *index = i;
+   break;
+   }
+   i++;
+   }
+
+   if (found)
+   break;
+   }
+
+   return found;
+}
+
+static int t4_sched_queue_unbind(struct port_info *pi, struct ch_sched_queue 
*p)
+{
+   struct adapter *adap = pi->adapter;
+   struct sched_class *e;
+   struct sched_queue_entry *qe = NULL;
+   struct sge_eth_txq *txq;
+   unsigned int qid;
+   int index = -1;
+   int err = 0;
+
+   if (p->queue < 0 || p->queue >= pi->nqsets)
+   return -ERANGE;
+
+   txq = &adap->sge.ethtxq[pi->first_qset + p->queue];
+   qid = txq->q.cntxt_id;
+
+   /* Find the existing class that the queue is bound to */
+   e = t4_sched_queue_lookup(pi, qid, &index);
+   if (e && index >= 0) {
+   int i = 0;
+
+   spin_lock(&e->lock);
+   list_for_each_entry(qe, &e->queue_list, list) {
+   if (i == index)
+   break;
+   i++;
+   }
+   err = t4_sched_bind_unbind_op(pi, (void *)qe, SCHED_QUEUE,
+ false);
+   if (err) {
+   spin_unlock(&e->lock);
+   goto out;
+   }

[PATCH v4 net-next] l2tp: Refactor the codes with existing macros instead of literal number

2016-08-22 Thread fgao
From: Gao Feng 

Use PPP_ALLSTATIONS, PPP_UI, and SEND_SHUTDOWN instead of 0xff,
0x03, and 2 respectively.

Signed-off-by: Gao Feng 
---
 v4: Remove two static ppph variables;
 v3: Modify the subject;
 v2: Only replace the literal number with macros according to Guillaume's advice
 v1: Initial patch

 net/l2tp/l2tp_ppp.c | 20 +---
 1 file changed, 9 insertions(+), 11 deletions(-)

diff --git a/net/l2tp/l2tp_ppp.c b/net/l2tp/l2tp_ppp.c
index d9560aa..0c071c4 100644
--- a/net/l2tp/l2tp_ppp.c
+++ b/net/l2tp/l2tp_ppp.c
@@ -177,7 +177,7 @@ static int pppol2tp_recv_payload_hook(struct sk_buff *skb)
if (!pskb_may_pull(skb, 2))
return 1;
 
-   if ((skb->data[0] == 0xff) && (skb->data[1] == 0x03))
+   if ((skb->data[0] == PPP_ALLSTATIONS) && (skb->data[1] == PPP_UI))
skb_pull(skb, 2);
 
return 0;
@@ -282,7 +282,6 @@ static void pppol2tp_session_sock_put(struct l2tp_session 
*session)
 static int pppol2tp_sendmsg(struct socket *sock, struct msghdr *m,
size_t total_len)
 {
-   static const unsigned char ppph[2] = { 0xff, 0x03 };
struct sock *sk = sock->sk;
struct sk_buff *skb;
int error;
@@ -312,7 +311,7 @@ static int pppol2tp_sendmsg(struct socket *sock, struct 
msghdr *m,
error = -ENOMEM;
skb = sock_wmalloc(sk, NET_SKB_PAD + sizeof(struct iphdr) +
   uhlen + session->hdr_len +
-  sizeof(ppph) + total_len,
+  2 + total_len, /* 2 bytes for PPP_ALLSTATIONS & 
PPP_UI */
   0, GFP_KERNEL);
if (!skb)
goto error_put_sess_tun;
@@ -325,8 +324,8 @@ static int pppol2tp_sendmsg(struct socket *sock, struct 
msghdr *m,
skb_reserve(skb, uhlen);
 
/* Add PPP header */
-   skb->data[0] = ppph[0];
-   skb->data[1] = ppph[1];
+   skb->data[0] = PPP_ALLSTATIONS; 
+   skb->data[1] = PPP_UI;
skb_put(skb, 2);
 
/* Copy user data into skb */
@@ -369,7 +368,6 @@ error:
  */
 static int pppol2tp_xmit(struct ppp_channel *chan, struct sk_buff *skb)
 {
-   static const u8 ppph[2] = { 0xff, 0x03 };
struct sock *sk = (struct sock *) chan->private;
struct sock *sk_tun;
struct l2tp_session *session;
@@ -398,14 +396,14 @@ static int pppol2tp_xmit(struct ppp_channel *chan, struct 
sk_buff *skb)
   sizeof(struct iphdr) + /* IP header */
   uhlen +  /* UDP header (if L2TP_ENCAPTYPE_UDP) */
   session->hdr_len +   /* L2TP header */
-  sizeof(ppph);/* PPP header */
+  2;   /* 2 bytes for PPP_ALLSTATIONS & PPP_UI 
*/
if (skb_cow_head(skb, headroom))
goto abort_put_sess_tun;
 
/* Setup PPP header */
-   __skb_push(skb, sizeof(ppph));
-   skb->data[0] = ppph[0];
-   skb->data[1] = ppph[1];
+   __skb_push(skb, 2); 
+   skb->data[0] = PPP_ALLSTATIONS;
+   skb->data[1] = PPP_UI;
 
local_bh_disable();
l2tp_xmit_skb(session, skb, session->hdr_len);
@@ -440,7 +438,7 @@ static void pppol2tp_session_close(struct l2tp_session 
*session)
BUG_ON(session->magic != L2TP_SESSION_MAGIC);
 
if (sock) {
-   inet_shutdown(sock, 2);
+   inet_shutdown(sock, SEND_SHUTDOWN);
/* Don't let the session go away before our socket does */
l2tp_session_inc_refcount(session);
}
-- 
1.9.1
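
For readers following the thread, the two bytes in question are the PPP address and control fields (0xff, 0x03). A minimal userspace sketch of writing and detecting them follows. The SKETCH_ names are hypothetical and only mirror the values the patch replaces; the named length constant is one possible alternative to a literal 2, not something taken from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Address and control field values, mirroring the literals in the patch. */
#define SKETCH_PPP_ALLSTATIONS 0xff
#define SKETCH_PPP_UI          0x03
#define SKETCH_PPP_ACF_LEN     2	/* address + control bytes */

/* Write the address/control bytes at the front of buf; returns the
 * number of bytes written.  The caller guarantees at least 2 bytes.
 */
static size_t sketch_put_acf(uint8_t *buf)
{
	buf[0] = SKETCH_PPP_ALLSTATIONS;
	buf[1] = SKETCH_PPP_UI;
	return SKETCH_PPP_ACF_LEN;
}

/* Return how many leading bytes to skip on receive: 2 if the
 * address/control fields are present, 0 otherwise.
 */
static size_t sketch_acf_len(const uint8_t *data, size_t len)
{
	if (len >= SKETCH_PPP_ACF_LEN &&
	    data[0] == SKETCH_PPP_ALLSTATIONS && data[1] == SKETCH_PPP_UI)
		return SKETCH_PPP_ACF_LEN;
	return 0;
}
```

The receive helper corresponds to the check in pppol2tp_recv_payload_hook(), which pulls the two bytes only when both match.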



Re: [ovs-dev] [PATCH net-next v11 5/6] openvswitch: add layer 3 flow/port support

2016-08-22 Thread Simon Horman
On Wed, Aug 10, 2016 at 10:17:30AM -0700, Joe Stringer wrote:
> On 10 August 2016 at 03:20, Simon Horman  wrote:
> > On Tue, Aug 09, 2016 at 08:47:40AM -0700, pravin shelar wrote:
> >> On Mon, Aug 8, 2016 at 8:17 AM, Simon Horman  
> >> wrote:
> >> > Light testing seems to indicate that it works for GSO skbs
> >> > received over both L3 and L2 GRE tunnels by OvS with both
> >> > IP-in-MPLS and IP (without MPLS) payloads.
> >> >
> >>
> >> Thanks for testing it. Can you also add those tests to OVS kmod test suite?
> >> ..
> >
> > Sure, I will look into doing that.
> > Am I correct in thinking Joe Stringer is the best person to contact if
> > I run into trouble there?
> 
> Sure. The basics of running the tests is documented here:
> https://github.com/openvswitch/ovs/blob/master/INSTALL.md#datapath-testing
> 
> You should be able to get a good feel for how to add tests by perusing
> the commits to tests/system-{traffic,kmod-macros}.at in the OVS source
> tree.

Thanks Joe,

it took me a while but I think that I have something working
against the head branch of the OVS tree. I'd value opinions
on the direction I have taken.

Subject: [PATCH] system-traffic: Exercise GSO

Exercise GSO for: unencapsulated; MPLS; GRE; and MPLS in GRE.

There is scope to extend this testing to other encapsulation formats
if desired.

This is motivated by a desire to test GRE and MPLS encapsulation in
the context of L3/VPN (MPLS over non-TEB GRE work). That is not
tested here but tests for those cases would ideally be based on those in
this patch.

---
 tests/system-common-macros.at |  36 +--
 tests/system-kmod-macros.at   |  22 +
 tests/system-traffic.at   | 225 +-
 3 files changed, 274 insertions(+), 9 deletions(-)

diff --git a/tests/system-common-macros.at b/tests/system-common-macros.at
index 4ffc3822a4d3..a201cf8ce100 100644
--- a/tests/system-common-macros.at
+++ b/tests/system-common-macros.at
@@ -56,7 +56,7 @@ m4_define([ADD_INT],
 ]
 )
 
-# ADD_VETH([port], [namespace], [ovs-br], [ip_addr] [mac_addr [gateway]])
+# ADD_VETH([port], [namespace], [ovs-br], [ip_addr] [mac_addr [gateway 
[ofport]]])
 #
 # Add a pair of veth ports. 'port' will be added to name space 'namespace',
 # and "ovs-'port'" will be added to ovs bridge 'ovs-br'.
@@ -64,8 +64,8 @@ m4_define([ADD_INT],
 # The 'port' in 'namespace' will be brought up with static IP address
 # with 'ip_addr' in CIDR notation.
 #
-# Optionally, one can specify the 'mac_addr' for 'port' and the default
-# 'gateway'.
+# Optionally, one can specify the 'mac_addr' for 'port', the default
+# 'gateway' and the 'ofport' number.
 #
 # The existing 'port' or 'ovs-port' will be removed before new ones are added.
 #
@@ -74,8 +74,14 @@ m4_define([ADD_VETH],
   CONFIGURE_VETH_OFFLOADS([$1])
   AT_CHECK([ip link set $1 netns $2])
   AT_CHECK([ip link set dev ovs-$1 up])
-  AT_CHECK([ovs-vsctl add-port $3 ovs-$1 -- \
-set interface ovs-$1 external-ids:iface-id="$1"])
+  if test -n "$7"; then
+AT_CHECK([ovs-vsctl add-port $3 ovs-$1 -- \
+  set interface ovs-$1 external-ids:iface-id="$1" \
+  ofport_request=$7])
+  else
+AT_CHECK([ovs-vsctl add-port $3 ovs-$1 -- \
+  set interface ovs-$1 external-ids:iface-id="$1"])
+  fi
   NS_CHECK_EXEC([$2], [ip addr add $4 dev $1])
   NS_CHECK_EXEC([$2], [ip link set dev $1 up])
   if test -n "$5"; then
@@ -99,7 +105,7 @@ m4_define([ADD_VLAN],
 ]
 )
 
-# ADD_OVS_TUNNEL([type], [bridge], [port], [remote-addr], [overlay-addr])
+# ADD_OVS_TUNNEL([type], [bridge], [port], [remote-addr], [overlay-addr 
[ofport]])
 #
 # Add an ovs-based tunnel device in the root namespace, with name 'port' and
 # type 'type'. The tunnel device will be configured as point-to-point with the
@@ -107,9 +113,17 @@ m4_define([ADD_VLAN],
 #
 # 'port will be configured with the address 'overlay-addr'.
 #
+# Optionally one can specify the 'ofport' number
+#
 m4_define([ADD_OVS_TUNNEL],
-   [AT_CHECK([ovs-vsctl add-port $2 $3 -- \
-  set int $3 type=$1 options:remote_ip=$4])
+   [if test -n "$6"; then
+  AT_CHECK([ovs-vsctl add-port $2 $3 -- \
+set int $3 type=$1 options:remote_ip=$4 \
+   ofport_request=$6])
+else
+  AT_CHECK([ovs-vsctl add-port $2 $3 -- \
+set int $3 type=$1 options:remote_ip=$4])
+fi
 AT_CHECK([ip addr add dev $2 $5])
 AT_CHECK([ip link set dev $2 up])
 AT_CHECK([ip link set dev $2 mtu 1450])
@@ -143,6 +157,12 @@ m4_define([ADD_NATIVE_TUNNEL],
 #
 m4_define([FORMAT_PING], [grep "transmitted" | sed 's/time.*ms$/time 0ms/'])
 
+# FORMAT_DD([])
+#
+# Strip variant pieces from dd output so the output can be reliably compared.
+#
+m4_define([FORMAT_DD], [sed 's/copied,.*$/copied, .../'])
+
 # FORMAT_CT([ip-addr])
 #
 # Strip content from the piped input which would differ from test to test
diff --git a/tests/system

Re: [PATCH v2 2/2] ipv6: fixup RTF_* flags when restoring RTPROT_RA route from rtnetlink

2016-08-22 Thread Andrew Yourtchenko
Hello, 

Thanks for the review and feedback, one small clarification below.

On Fri, 19 Aug 2016, Sergei Shtylyov wrote:

> Hello.
> 
> On 08/19/2016 07:41 PM, Andrew Yourtchenko wrote:
> 
> > Fix the flags for RA-derived routes that were saved
> > via "ip -6 route save" and and subsequently restored via
> > "ip -6 route restore", allowing the incoming router advertisements
> > to update them, rather than complain about inability to do so.
> > 
> > Upon the restore of RA-derived saved routes, set the RTF_ADDRCONF
> > to indicate that the source of the route was originally
> > a router advertisement, and set the RTF_DEFAULT or RTF_ROUTEINFO
> > flag depending on prefix length. This can be considered a
> > sister change of f0396f60d7c165018c9b203fb9b89fb224835578, in
> 
>It's enough to specify 12 digits but you also need to specify the commit
> summary enclosed in ("").

Is just the first line of "ipv6: fix RTPROT_RA markup of RA routes 
w/nexthops" enough, or do I need to include the additional two 
paragraphs that follow ? (and can I keep the full hash rather than 
truncate down to 12 digits?)

> 
> > the other direction.
> > 
> > Signed-off-by: Andrew Yourtchenko 
> > ---
> > Changes since v1 [1]:
> >  * fixed the indentation of the basic blocks to be always a full TAB
> >as per David Miller's review
> > 
> > [1] v1: http://marc.info/?l=linux-netdev&m=147135599322285&w=2
> > 
> >  net/ipv6/route.c | 10 ++
> >  1 file changed, 10 insertions(+)
> > 
> > diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> > index f5b987d..60d95cd 100644
> > --- a/net/ipv6/route.c
> > +++ b/net/ipv6/route.c
> > @@ -2769,6 +2769,16 @@ static int rtm_to_fib6_config(struct sk_buff *skb,
> > struct nlmsghdr *nlh,
> > cfg->fc_protocol = rtm->rtm_protocol;
> > cfg->fc_type = rtm->rtm_type;
> > 
> > +   if (rtm->rtm_protocol == RTPROT_RA) {
> > +   /* RA-derived route: set flags accordingly. */
> > +   cfg->fc_flags |= RTF_ADDRCONF;
> > +   if (rtm->rtm_dst_len == 0) {
> > +   cfg->fc_flags |= RTF_DEFAULT;
> > +   } else {
> > +   cfg->fc_flags |= RTF_ROUTEINFO;
> > +   }
> 
>{} not needed here and above.

Will fix in v3, thanks!

--a


> 
> [...]
> 
> MBR, Sergei
> 
> 


Re: [PATCH iproute2 2/2] tc: m_vlan: Add priority option to push vlan action

2016-08-22 Thread Jiri Pirko
Mon, Aug 22, 2016 at 12:24:54PM CEST, had...@mellanox.com wrote:
>The current vlan push action supports only vid and protocol options.
>Add priority option.
>
>Example script that adds vlan push action with vid and priority:
>
>tc filter add dev veth0 protocol ip parent : \
>   flower \
>   indev veth0 \
>   action vlan push id 100 priority 5
>
>Signed-off-by: Hadar Hen Zion 

Acked-by: Jiri Pirko 


Re: [PATCH v2 2/2] ipv6: fixup RTF_* flags when restoring RTPROT_RA route from rtnetlink

2016-08-22 Thread Sergei Shtylyov

Hello.

On 8/22/2016 2:04 PM, Andrew Yourtchenko wrote:


Fix the flags for RA-derived routes that were saved
via "ip -6 route save" and subsequently restored via
"ip -6 route restore", allowing the incoming router advertisements
to update them, rather than complain about inability to do so.

Upon the restore of RA-derived saved routes, set the RTF_ADDRCONF
to indicate that the source of the route was originally
a router advertisement, and set the RTF_DEFAULT or RTF_ROUTEINFO
flag depending on prefix length. This can be considered a
sister change of f0396f60d7c165018c9b203fb9b89fb224835578, in


   It's enough to specify 12 digits but you also need to specify the commit
summary enclosed in ("").


Is just the first line of "ipv6: fix RTPROT_RA markup of RA routes
w/nexthops" enough, or do I need to include the additional two


   Summary means the patch subject (and the first line of the commit log).


paragraphs that follow ? (and can I keep the full hash rather than
truncate down to 12 digits?)


   Yes.


the other direction.

Signed-off-by: Andrew Yourtchenko 

[...]

MBR, Sergei
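
The flag fixup under review, with the braces dropped as requested, could look roughly like the standalone sketch below. The SKETCH_ constants and sketch_ra_flags() are stand-ins chosen for illustration; the kernel's own definitions live in its uapi headers and are not reproduced from the patch.

```c
#include <stdint.h>

/* Stand-in constants for illustration only. */
#define SKETCH_RTPROT_RA     9u
#define SKETCH_RTF_DEFAULT   0x00010000u
#define SKETCH_RTF_ADDRCONF  0x00040000u
#define SKETCH_RTF_ROUTEINFO 0x00800000u

/* Return the flags to use for a restored route: RA-derived routes get
 * ADDRCONF plus DEFAULT (prefix length 0) or ROUTEINFO (any other
 * prefix length); routes from other protocols are left untouched.
 */
static uint32_t sketch_ra_flags(uint8_t protocol, uint8_t dst_len,
				uint32_t flags)
{
	if (protocol != SKETCH_RTPROT_RA)
		return flags;

	flags |= SKETCH_RTF_ADDRCONF;
	if (dst_len == 0)
		flags |= SKETCH_RTF_DEFAULT;
	else
		flags |= SKETCH_RTF_ROUTEINFO;
	return flags;
}
```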



[PATCH v2 net 1/1] net sched: fix encoding to use real length

2016-08-22 Thread Jamal Hadi Salim
From: Jamal Hadi Salim 

Encoding of the metadata was using the padded length as opposed to
the real length of the data which is a bug per specification.
This has not been an issue to date because all metadata specified
so far have been 32 bits wide, where the aligned and data lengths are the same.
This also includes a bug fix for validating the length of a u16 field.
But since there is no metadata of size u16 yet, we are fine to include it
here.

While at it get rid of magic numbers.

Fixes: ef6980b6becb ("net sched: introduce IFE action")
Signed-off-by: Jamal Hadi Salim 
---
 net/sched/act_ife.c | 18 ++
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/net/sched/act_ife.c b/net/sched/act_ife.c
index 141a06e..e87cd81 100644
--- a/net/sched/act_ife.c
+++ b/net/sched/act_ife.c
@@ -53,7 +53,7 @@ int ife_tlv_meta_encode(void *skbdata, u16 attrtype, u16 
dlen, const void *dval)
u32 *tlv = (u32 *)(skbdata);
u16 totlen = nla_total_size(dlen);  /*alignment + hdr */
char *dptr = (char *)tlv + NLA_HDRLEN;
-   u32 htlv = attrtype << 16 | totlen;
+   u32 htlv = attrtype << 16 | dlen;
 
*tlv = htonl(htlv);
memset(dptr, 0, totlen - NLA_HDRLEN);
@@ -135,7 +135,7 @@ EXPORT_SYMBOL_GPL(ife_release_meta_gen);
 
 int ife_validate_meta_u32(void *val, int len)
 {
-   if (len == 4)
+   if (len == sizeof(u32))
return 0;
 
return -EINVAL;
@@ -144,8 +144,8 @@ EXPORT_SYMBOL_GPL(ife_validate_meta_u32);
 
 int ife_validate_meta_u16(void *val, int len)
 {
-   /* length will include padding */
-   if (len == NLA_ALIGN(2))
+   /* length will not include padding */
+   if (len == sizeof(u16))
return 0;
 
return -EINVAL;
@@ -652,12 +652,14 @@ static int tcf_ife_decode(struct sk_buff *skb, const 
struct tc_action *a,
u8 *tlvdata = (u8 *)tlv;
u16 mtype = tlv->type;
u16 mlen = tlv->len;
+   u16 alen;
 
mtype = ntohs(mtype);
mlen = ntohs(mlen);
+   alen = NLA_ALIGN(mlen);
 
-   if (find_decode_metaid(skb, ife, mtype, (mlen - 4),
-  (void *)(tlvdata + 4))) {
+   if (find_decode_metaid(skb, ife, mtype, (mlen - NLA_HDRLEN),
+  (void *)(tlvdata + NLA_HDRLEN))) {
/* abuse overlimits to count when we receive metadata
 * but dont have an ops for it
 */
@@ -666,8 +668,8 @@ static int tcf_ife_decode(struct sk_buff *skb, const struct 
tc_action *a,
ife->tcf_qstats.overlimits++;
}
 
-   tlvdata += mlen;
-   ifehdrln -= mlen;
+   tlvdata += alen;
+   ifehdrln -= alen;
tlv = (struct meta_tlvhdr *)tlvdata;
}
 
-- 
1.9.1
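
To illustrate the distinction between real and padded length, here is a minimal userspace sketch of a TLV encoder and decoder. It assumes that the length field counts the 4-byte header plus the unpadded payload (which is what the "mlen - NLA_HDRLEN" on the decode side appears to expect) and that the walk advances by the 4-byte-aligned length. All sketch_ names and SKETCH_ macros are hypothetical, not the act_ife API.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_HDRLEN 4u	/* 16-bit type + 16-bit length */
#define SKETCH_ALIGN(n) (((size_t)(n) + 3) & ~(size_t)3)

/* Encode one TLV at buf; returns bytes consumed including padding.
 * The caller guarantees room for SKETCH_ALIGN(4 + dlen) bytes and
 * dlen <= 65531 so the 16-bit length cannot overflow.
 */
static size_t sketch_tlv_encode(uint8_t *buf, uint16_t type,
				const void *val, uint16_t dlen)
{
	uint16_t len = (uint16_t)(SKETCH_HDRLEN + dlen); /* real length */
	size_t total = SKETCH_ALIGN(len);

	memset(buf, 0, total);		/* zero the padding bytes too */
	buf[0] = (uint8_t)(type >> 8);	/* big-endian on the wire */
	buf[1] = (uint8_t)type;
	buf[2] = (uint8_t)(len >> 8);
	buf[3] = (uint8_t)len;
	memcpy(buf + SKETCH_HDRLEN, val, dlen);
	return total;
}

/* Decode the TLV at buf: report type and payload length, and return the
 * aligned offset of the next TLV, or 0 if the TLV is malformed.
 */
static size_t sketch_tlv_decode(const uint8_t *buf, size_t avail,
				uint16_t *type, uint16_t *dlen)
{
	uint16_t len;

	if (avail < SKETCH_HDRLEN)
		return 0;
	len = (uint16_t)((buf[2] << 8) | buf[3]);
	if (len < SKETCH_HDRLEN || SKETCH_ALIGN(len) > avail)
		return 0;
	*type = (uint16_t)((buf[0] << 8) | buf[1]);
	*dlen = (uint16_t)(len - SKETCH_HDRLEN);
	return SKETCH_ALIGN(len);
}
```

For a 2-byte value the encoded length field is 6 while the TLV occupies 8 bytes, so a decoder that advanced by the length field alone would land on the padding rather than on the next TLV.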



Re: [PATCH iproute2 1/1] tc classifiers: Modernize tcindex classifier

2016-08-22 Thread Jamal Hadi Salim

On 16-08-18 05:57 PM, Stephen Hemminger wrote:


This is ok for the parsing of config, but you are still using print_police
for display.


Thanks for catching that. I will send a v2.

cheers,
jamal




Re: [PATCH iproute2 1/2] tc: flower: Introduce vlan support

2016-08-22 Thread Jiri Pirko
Mon, Aug 22, 2016 at 12:24:53PM CEST, had...@mellanox.com wrote:
>Classification according to vlan id and vlan priority.
>
>Example script that adds vlan filter:
>
> # add ingress qdisc
> tc qdisc add dev ens4f0 ingress
>
> # add a flower filter with vlan id and priority classification
> tc filter add dev ens4f0 protocol 802.1Q parent : \
>   flower \
>   indev ens4f0 \
>   vlan_ethtype ipv4 \
>   vlan_id 100 \
>   vlan_prio 3 \
>   action vlan pop
>
>Signed-off-by: Hadar Hen Zion 

Acked-by: Jiri Pirko 

Would be nice to see vlan proto support in near future.


[PATCH v2 iproute2 1/1] tc classifiers: Modernize tcindex classifier

2016-08-22 Thread Jamal Hadi Salim
From: Jamal Hadi Salim 

Signed-off-by: Jamal Hadi Salim 
---
 tc/f_tcindex.c | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/tc/f_tcindex.c b/tc/f_tcindex.c
index 280d1d4..32bccb0 100644
--- a/tc/f_tcindex.c
+++ b/tc/f_tcindex.c
@@ -23,7 +23,7 @@ static void explain(void)
 }
 
 static int tcindex_parse_opt(struct filter_util *qu, char *handle, int argc,
-char **argv, struct nlmsghdr *n)
+char **argv, struct nlmsghdr *n)
 {
struct tcmsg *t = NLMSG_DATA(n);
struct rtattr *tail;
@@ -49,7 +49,8 @@ static int tcindex_parse_opt(struct filter_util *qu, char *handle, int argc,
explain();
return -1;
}
-   addattr_l(n, 4096, TCA_TCINDEX_HASH, &hash, sizeof(hash));
+   addattr_l(n, 4096, TCA_TCINDEX_HASH, &hash,
+ sizeof(hash));
} else if (!strcmp(*argv,"mask")) {
__u16 mask;
 
@@ -59,7 +60,8 @@ static int tcindex_parse_opt(struct filter_util *qu, char *handle, int argc,
explain();
return -1;
}
-   addattr_l(n, 4096, TCA_TCINDEX_MASK, &mask, sizeof(mask));
+   addattr_l(n, 4096, TCA_TCINDEX_MASK, &mask,
+ sizeof(mask));
} else if (!strcmp(*argv,"shift")) {
int shift;
 
@@ -99,7 +101,7 @@ static int tcindex_parse_opt(struct filter_util *qu, char *handle, int argc,
continue;
} else if (!strcmp(*argv,"action")) {
NEXT_ARG();
-   if (parse_police(&argc, &argv, TCA_TCINDEX_ACT, n)) {
+   if (parse_action(&argc, &argv, TCA_TCINDEX_ACT, n)) {
fprintf(stderr, "Illegal \"action\"\n");
return -1;
}
@@ -117,7 +119,7 @@ static int tcindex_parse_opt(struct filter_util *qu, char *handle, int argc,
 
 
 static int tcindex_print_opt(struct filter_util *qu, FILE *f,
- struct rtattr *opt, __u32 handle)
+struct rtattr *opt, __u32 handle)
 {
struct rtattr *tb[TCA_TCINDEX_MAX+1];
 
@@ -171,7 +173,7 @@ static int tcindex_print_opt(struct filter_util *qu, FILE *f,
}
if (tb[TCA_TCINDEX_ACT]) {
fprintf(f, "\n");
-   tc_print_police(f, tb[TCA_TCINDEX_ACT]);
+   tc_print_action(f, tb[TCA_TCINDEX_ACT]);
}
return 0;
 }
-- 
1.9.1



Re: [PATCH v2 1/2] ipv6: save route expiry in RTA_EXPIRES if RTF_EXPIRES set

2016-08-22 Thread Andrew Yourtchenko
On Fri, 19 Aug 2016, Eric Dumazet wrote:

> On Fri, 2016-08-19 at 18:41 +0200, Andrew Yourtchenko wrote:
> > This allows "ip -6 route save" to save the expiry for the routes
> > that have it, such that it can be correctly restored later by
> > "ip -6 route restore".
> > 
> > If a route has RTF_EXPIRES set, generate RTA_EXPIRES value which
> > will be used to restore the flag and expiry value by already
> > existing code in rtm_to_fib6_config.
> > 
> > The expiry was already being saved as part of RTA_CACHEINFO
> > in rtnl_put_cacheinfo(), but adding code to generate RTF_EXPIRES upon save
> > looked more appropriate than redundant cherrypicking from
> > RTA_CACHEINFO upon restore.
> > 
> > Signed-off-by: Andrew Yourtchenko 
> > ---
> > Changes since v1 [1]: 
> >  * fixed the indentation in a multiline function call
> >as per David Miller's review
> > 
> > [1] v1: http://marc.info/?l=linux-netdev&m=147135597422280&w=2
> > 
> >  net/ipv6/route.c | 8 
> >  1 file changed, 8 insertions(+)
> > 
> > diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> > index 4981755..f5b987d 100644
> > --- a/net/ipv6/route.c
> > +++ b/net/ipv6/route.c
> > @@ -3244,6 +3244,14 @@ static int rt6_fill_node(struct net *net,
> > if (rtnl_put_cacheinfo(skb, &rt->dst, 0, expires, rt->dst.error) < 0)
> > goto nla_put_failure;
> >  
> > +   /* Can't rely on expires == 0. It is zero if no expires flag,
> > +* or if the timing is precisely at expiry. So, recheck the flag.
> > +*/
> > +   if (rt->rt6i_flags & RTF_EXPIRES)
> > +   if (nla_put_u32(skb, RTA_EXPIRES,
> > +   expires > 0 ? expires / HZ : 0))
> > +   goto nla_put_failure;
> > +
> 
> Why sending (kernel -> user) the value in second units, while the other
> direction (user -> kernel) expects hz ?

No, the userland does expect seconds, see below:

> 
> iproute2$ git grep -n RTA_EXPIRES
> include/linux/rtnetlink.h:389:  RTA_EXPIRES,
> ip/iproute.c:930:   addattr32(&req.n, sizeof(req), RTA_EXPIRES, expires*hz);
> 

Fixed by commit eecc006952d6f3992b632974d0f04f995d2a176e
Author: Andrew Vagin 
Date:   Wed Jun 29 02:27:14 2016 +0300

ip route: timeout for routes has to be set in seconds

Currently a timeout is multiplied by HZ in user-space and
then it is multiplied by HZ again in kernel-space.


> kernel patch : (32bc201e1974976b7d3fea9a9b17bb7392ca6394)
> 
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> 
> @@ -2809,6 +2810,15 @@ static int rtm_to_fib6_config(struct sk_buff *skb, struct nlmsghdr *nlh,
> if (tb[RTA_ENCAP_TYPE])
> cfg->fc_encap_type = nla_get_u16(tb[RTA_ENCAP_TYPE]);
>  
> +   if (tb[RTA_EXPIRES]) {
> +   unsigned long timeout = addrconf_timeout_fixup(nla_get_u32(tb[RTA_EXPIRES]), HZ);
> +
> +   if (addrconf_finite_timeout(timeout)) {
> +   cfg->fc_expires = jiffies_to_clock_t(timeout * HZ);
> +   cfg->fc_flags |= RTF_EXPIRES;
> +   }
> +   }
> +
> 

This is exactly the bug that this snippet would trigger together with the
older iproute2 code shown in your git grep output: the value from user
space assigned to fc_expires would have been:

jiffies_to_clock_t(timeout * HZ * hz).

So, userland needs the value in seconds. I suppose a good idea might be to
change the comment in my patch into something like:

/* Can't rely on expires == 0. It is zero if no expires flag,
 * or if the timing is precisely at expiry. So, recheck the flag.
 * The division by HZ is to obtain the value in seconds, which
 * is needed by the userland, see commit
 * eecc006952d6f3992b632974d0f04f995d2a176e
 * "ip route: timeout for routes has to be set in seconds"
 */

What do you think ?

--a
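
To make the unit argument above concrete, here is a small userspace sketch of the two directions. It is a simplification: SKETCH_HZ is an illustrative tick rate, the jiffies_to_clock_t() step is left out, and all function names are hypothetical.

```c
#include <assert.h>

/* Illustrative tick rate; the real HZ is a kernel build-time constant. */
#define SKETCH_HZ 250UL

/* User -> kernel: the kernel scales the attribute from seconds to ticks. */
static unsigned long kernel_expires_ticks(unsigned long rta_expires)
{
	return rta_expires * SKETCH_HZ;
}

/* Old userland behaviour: the value was pre-multiplied by hz. */
static unsigned long old_userland_attr(unsigned long secs, unsigned long hz)
{
	assert(hz > 0);
	return secs * hz;
}

/* Userland after the fix: plain seconds go on the wire. */
static unsigned long fixed_userland_attr(unsigned long secs)
{
	return secs;
}

/* Kernel -> user, as in the proposed hunk: ticks back to seconds,
 * clamped to 0 once the route is at or past expiry.
 */
static unsigned long kernel_reported_secs(long expires_ticks)
{
	return expires_ticks > 0 ? (unsigned long)expires_ticks / SKETCH_HZ : 0;
}
```

With the old userland, a 30 second timeout ends up hz times too long in the kernel; with seconds in both directions a save/restore round trip keeps the value.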


Re: [PATCH v3 net-next] l2tp: Refactor the codes with existing macros instead of literal number

2016-08-22 Thread Guillaume Nault
On Mon, Aug 22, 2016 at 06:22:42PM +0800, Feng Gao wrote:
> inline
> 
> On Mon, Aug 22, 2016 at 6:07 PM, Guillaume Nault  wrote:
> > On Sat, Aug 20, 2016 at 11:52:27PM +0800, f...@ikuai8.com wrote:
> >> From: Gao Feng 
> >> @@ -282,7 +282,7 @@ static void pppol2tp_session_sock_put(struct l2tp_session *session)
> >>  static int pppol2tp_sendmsg(struct socket *sock, struct msghdr *m,
> >>   size_t total_len)
> >>  {
> >> - static const unsigned char ppph[2] = { 0xff, 0x03 };
> >> + static const unsigned char ppph[2] = {PPP_ALLSTATIONS, PPP_UI};
> >>
> > Minor nit: I'd prefer to keep the space after '{' and before '}'.
> > I didn't want to bother you with this, but since it seems you'll have
> > to repost...
> 
> I don't know if it is the coding style of Linux kernel.
>
Both forms are used currently and I can't recall any explicit
preference statement. So unless David has an opinion, you can just use
the form you like the best.

> > BTW, I thought you also wanted to remove the static ppph variable
> > from pppol2tp_xmit() / pppol2tp_sendmsg(), to directly assign
> > skb->data[0/1] with PPP_ALLSTATIONS/PPP_UI.
> 
> If the static ppph is removed, some code will have to use a literal "2"
> instead of sizeof ppph.
> Is it ok?
> 
The literal "2" would be used in the sock_wmalloc() call only (or for
assigning the headroom variable in the case of pppol2tp_xmit()). Given
the number of terms summed there, I agree that having a plain "2" in the
middle could look odd. You can either add a comment for each term
(like in pppol2tp_xmit()), something like:
sock_wmalloc(sk, NET_SKB_PAD +
 sizeof(struct iphdr) + /* IP header */
 ...
 2 +/* PPP Address and control field */
 ...);

Or use a simple macro like:
/* Size of the PPP address and control fields */
#define PPP_ACF_LEN 2

Or even use both the macro and the comment. That's up to you.
You can even drop this change entirely if you prefer, I don't mind.
I just raised this point because you said you'd remove ppph.
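
For illustration, a minimal sketch of the "assign directly" variant together with the macro suggested above. The two byte values are the ones from the quoted hunk, defined locally so the sketch builds outside the kernel tree; put_ppp_acf() is a hypothetical helper, not existing l2tp code.

```c
#include <assert.h>
#include <stddef.h>

/* Values as in the quoted hunk (0xff, 0x03), defined locally so the
 * sketch does not depend on the kernel's PPP headers.
 */
#define PPP_ALLSTATIONS 0xff
#define PPP_UI          0x03
/* Size of the PPP address and control fields */
#define PPP_ACF_LEN     2

/* Hypothetical helper: write the two header bytes directly instead of
 * copying them from a static ppph[] array. Returns the bytes written,
 * or 0 if there is not enough room.
 */
static size_t put_ppp_acf(unsigned char *data, size_t room)
{
	assert(data != NULL);
	if (room < PPP_ACF_LEN)
		return 0;
	data[0] = PPP_ALLSTATIONS;
	data[1] = PPP_UI;
	return PPP_ACF_LEN;
}
```

The same PPP_ACF_LEN can then replace sizeof(ppph) in the length sums, so no bare "2" is needed.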


[patch iproute2 0/2] Add matchall support to tc

2016-08-22 Thread Jiri Pirko
From: Jiri Pirko 

Yotam says:
Add matchall classifier support to tc and add the corresponding man pages.

Yotam Gigi (2):
  tc: Add support for the matchall traffic classifier.
  tc: man: Add man entry for the matchall classifier.

 man/man8/Makefile  |   2 +-
 man/man8/tc-matchall.8 |  76 ++
 man/man8/tc.8  |   5 ++
 tc/Makefile|   1 +
 tc/f_matchall.c| 144 +
 5 files changed, 227 insertions(+), 1 deletion(-)
 create mode 100644 man/man8/tc-matchall.8
 create mode 100644 tc/f_matchall.c

-- 
2.5.5



[patch iproute2 1/2] tc: Add support for the matchall traffic classifier.

2016-08-22 Thread Jiri Pirko
From: Yotam Gigi 

The matchall classifier matches every packet and allows the user to apply
actions on it. In addition, it supports the skip_sw and skip_hw flags (as
found on the u32 and flower filters) that direct the kernel to skip the
software/hardware processing of the actions.

This filter is very useful in use cases where every packet should be
matched. For example, packet mirroring (SPAN) can be set up very easily
using this filter.

Signed-off-by: Yotam Gigi 
Signed-off-by: Jiri Pirko 
---
 tc/Makefile |   1 +
 tc/f_matchall.c | 144 
 2 files changed, 145 insertions(+)
 create mode 100644 tc/f_matchall.c

diff --git a/tc/Makefile b/tc/Makefile
index 42747c5..8917eaf 100644
--- a/tc/Makefile
+++ b/tc/Makefile
@@ -67,6 +67,7 @@ TCMODULES += q_pie.o
 TCMODULES += q_hhf.o
 TCMODULES += q_clsact.o
 TCMODULES += e_bpf.o
+TCMODULES += f_matchall.o
 
 ifeq ($(TC_CONFIG_IPSET), y)
   ifeq ($(TC_CONFIG_XT), y)
diff --git a/tc/f_matchall.c b/tc/f_matchall.c
new file mode 100644
index 000..c985276
--- /dev/null
+++ b/tc/f_matchall.c
@@ -0,0 +1,144 @@
+/*
+ * f_matchall.cMatch-all Classifier
+ *
+ * This program is free software; you can distribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ * Authors:Jiri Pirko , Yotam Gigi 
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "utils.h"
+#include "tc_util.h"
+
+static void explain(void)
+{
+   fprintf(stderr, "Usage: ... matchall [skip_sw | skip_hw] \n");
+   fprintf(stderr, " [ action ACTION_SPEC ] [ classid CLASSID ]\n");
+   fprintf(stderr, "\n");
+   fprintf(stderr, "Where: SELECTOR := SAMPLE SAMPLE ...\n");
+   fprintf(stderr, "   FILTERID := X:Y:Z\n");
+   fprintf(stderr, "   ACTION_SPEC := ... look at individual actions\n");
+   fprintf(stderr, "\nNOTE: CLASSID is parsed as hexadecimal input.\n");
+}
+
+static int matchall_parse_opt(struct filter_util *qu, char *handle,
+  int argc, char **argv, struct nlmsghdr *n)
+{
+   struct tcmsg *t = NLMSG_DATA(n);
+   struct rtattr *tail;
+   __u32 flags = 0;
+   long h = 0;
+
+   if (handle) {
+   h = strtol(handle, NULL, 0);
+   if (h == LONG_MIN || h == LONG_MAX) {
+   fprintf(stderr, "Illegal handle \"%s\", must be numeric.\n",
+   handle);
+   return -1;
+   }
+   }
+   t->tcm_handle = h;
+
+   if (argc == 0)
+   return 0;
+
+   tail = (struct rtattr *)(((void *)n)+NLMSG_ALIGN(n->nlmsg_len));
+   addattr_l(n, MAX_MSG, TCA_OPTIONS, NULL, 0);
+
+   while (argc > 0) {
+   if (matches(*argv, "classid") == 0 ||
+  strcmp(*argv, "flowid") == 0) {
+   unsigned int handle;
+
+   NEXT_ARG();
+   if (get_tc_classid(&handle, *argv)) {
+   fprintf(stderr, "Illegal \"classid\"\n");
+   return -1;
+   }
+   addattr_l(n, MAX_MSG, TCA_MATCHALL_CLASSID, &handle, 4);
+   } else if (matches(*argv, "action") == 0) {
+   NEXT_ARG();
+   if (parse_action(&argc, &argv, TCA_MATCHALL_ACT, n)) {
+   fprintf(stderr, "Illegal \"action\"\n");
+   return -1;
+   }
+   continue;
+
+   } else if (strcmp(*argv, "skip_hw") == 0) {
+   NEXT_ARG();
+   flags |= TCA_CLS_FLAGS_SKIP_HW;
+   continue;
+   } else if (strcmp(*argv, "skip_sw") == 0) {
+   NEXT_ARG();
+   flags |= TCA_CLS_FLAGS_SKIP_SW;
+   continue;
+   } else if (strcmp(*argv, "help") == 0) {
+   explain();
+   return -1;
+   } else {
+   fprintf(stderr, "What is \"%s\"?\n", *argv);
+   explain();
+   return -1;
+   }
+   argc--; argv++;
+   }
+
+   if (flags) {
+   if (!(flags ^ (TCA_CLS_FLAGS_SKIP_HW |
+  TCA_CLS_FLAGS_SKIP_SW))) {
+   fprintf(stderr,
+   "skip_hw and skip_sw are mutually exclusive\n");
+   return -1;
+   }
+   addattr_l(n, MAX_MSG, TCA_MATCHALL_FLAGS, &flags, 4);
+   }
+
+   tail->rta_len = (((void *)n)+n-
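
A note on the skip_hw/skip_sw check in the patch above: the XOR form is zero only when flags equals exactly both bits, which is fine as long as no other bit is ever set in flags. The sketch below compares it with a mask-based test. The SKETCH_* values are local stand-ins for the TCA_CLS_FLAGS_* bits and the function names are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for the TCA_CLS_FLAGS_SKIP_* bits. */
#define SKETCH_SKIP_HW (1u << 0)
#define SKETCH_SKIP_SW (1u << 1)

/* Form used in the patch: true only if flags is exactly HW|SW. */
static int skip_conflict_xor(uint32_t flags)
{
	return !(flags ^ (SKETCH_SKIP_HW | SKETCH_SKIP_SW));
}

/* Mask form: true whenever both bits are set, whatever else is set. */
static int skip_conflict_mask(uint32_t flags)
{
	const uint32_t both = SKETCH_SKIP_HW | SKETCH_SKIP_SW;

	assert(both == 3u);
	return (flags & both) == both;
}
```

Both forms agree for the two-flag case handled by the parser; they would only diverge if a third flag bit were added later.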

[patch iproute2 2/2] tc: man: Add man entry for the matchall classifier.

2016-08-22 Thread Jiri Pirko
From: Yotam Gigi 

In addition to providing information about the matchall filter and its
configurations, the man entry contains examples for creating port
mirroring entries.

Signed-off-by: Yotam Gigi 
Signed-off-by: Jiri Pirko 
---
 man/man8/Makefile  |  2 +-
 man/man8/tc-matchall.8 | 76 ++
 man/man8/tc.8  |  5 
 3 files changed, 82 insertions(+), 1 deletion(-)
 create mode 100644 man/man8/tc-matchall.8

diff --git a/man/man8/Makefile b/man/man8/Makefile
index 9badbed..9213769 100644
--- a/man/man8/Makefile
+++ b/man/man8/Makefile
@@ -14,7 +14,7 @@ MAN8PAGES = $(TARGETS) ip.8 arpd.8 lnstat.8 routel.8 rtacct.8 
rtmon.8 rtpr.8 ss.
tipc.8 tipc-bearer.8 tipc-link.8 tipc-media.8 tipc-nametable.8 \
tipc-node.8 tipc-socket.8 \
tc-basic.8 tc-cgroup.8 tc-flow.8 tc-flower.8 tc-fw.8 tc-route.8 \
-   tc-tcindex.8 tc-u32.8 \
+   tc-tcindex.8 tc-u32.8 tc-matchall.8 \
tc-connmark.8 tc-csum.8 tc-mirred.8 tc-nat.8 tc-pedit.8 tc-police.8 \
tc-simple.8 tc-skbedit.8 tc-vlan.8 tc-xt.8 \
devlink.8 devlink-dev.8 devlink-monitor.8 devlink-port.8 devlink-sb.8
diff --git a/man/man8/tc-matchall.8 b/man/man8/tc-matchall.8
new file mode 100644
index 000..b927784
--- /dev/null
+++ b/man/man8/tc-matchall.8
@@ -0,0 +1,76 @@
+.TH "Match-all classifier in tc" 8 "21 Oct 2015" "iproute2" "Linux"
+
+.SH NAME
+matchall \- traffic control filter that matches every packet
+.SH SYNOPSIS
+.in +8
+.ti -8
+.BR tc " " filter " ... " matchall " [ "
+.BR skip_sw " | " skip_hw
+.R " ] [ "
+.B action
+.IR ACTION_SPEC " ] [ "
+.B classid
+.IR CLASSID " ]"
+.SH DESCRIPTION
+The
+.B matchall
+filter classifies every packet that flows on the port and runs an
+action on it.
+.SH OPTIONS
+.TP
+.BI action " ACTION_SPEC"
+Apply an action from the generic actions framework on matching packets.
+.TP
+.BI classid " CLASSID"
+Push matching packets into the class identified by
+.IR CLASSID .
+.TP
+.BI skip_sw
+Do not process filter by software. If hardware has no offload support for this
+filter, or TC offload is not enabled for the interface, operation will fail.
+.TP
+.BI skip_hw
+Do not process filter by hardware.
+.SH EXAMPLES
+To create ingress mirroring from port eth1 to port eth2:
+.RS
+.EX
+
+tc qdisc  add dev eth1 handle : ingress
+tc filter add dev eth1 parent :   \\
+matchall skip_sw  \\
+action mirred egress mirror   \\
+dev eth2
+.EE
+.RE
+
+The first command creates an ingress qdisc with handle
+.BR :
+on device
+.BR eth1
+and the second command attaches a matchall filter to it that mirrors the
+packets to device eth2.
+
+To create egress mirroring from port eth1 to port eth2:
+.EX
+
+tc qdisc add dev eth1 handle 1: root prio
+tc filter add dev eth1 parent 1:   \\
+matchall skip_sw   \\
+action mirred egress mirror\\
+dev eth2
+.EE
+.RE
+
+The first command creates an egress qdisc with handle
+.BR 1:
+that replaces the root qdisc on device
+.BR eth1
+and the second command attaches a matchall filter to it that mirrors the
+packets to device eth2.
+
+
+.EE
+.SH SEE ALSO
+.BR tc (8),
diff --git a/man/man8/tc.8 b/man/man8/tc.8
index 4e99dca..7ee1c9c 100644
--- a/man/man8/tc.8
+++ b/man/man8/tc.8
@@ -187,6 +187,11 @@ u32
 Generic filtering on arbitrary packet data, assisted by syntax to abstract common operations. See
 .BR tc-u32 (8)
 for details.
+.TP
+matchall
+Traffic control filter that matches every packet. See
+.BR tc-matchall (8)
+for details.
 
 .SH CLASSLESS QDISCS
 The classless qdiscs are:
-- 
2.5.5



Re: [PATCH v2 1/4] net: dt-bindings: Document the new Meson8b and GXBB DWMAC bindings

2016-08-22 Thread Arnd Bergmann
On Saturday, August 20, 2016 11:35:35 AM CEST Martin Blumenstingl wrote:
> +- reg: The first register range should be the one of the DWMAC
> +   controller. The second range is for the Amlogic specific
> +   configuration (for example the PRG_ETHERNET register range
> +   on Meson8b and newer)
>  
...

> +Example for GXBB:
> +   ethmac: ethernet@c941 {
> +   compatible = "amlogic,meson-gxbb-dwmac", "snps,dwmac";
> +   reg = <0x0 0xc941 0x0 0x1>,
> +   <0x0 0xc8834540 0x0 0x8>;
> 

The address "0xc8834540" suggests that this is part of a larger register
range that is used for various things, i.e. a "syscon" type of device.

How about making this a syscon reference rather than a "reg" address?

Arnd


Re: [PATCH v2 1/4] net: dt-bindings: Document the new Meson8b and GXBB DWMAC bindings

2016-08-22 Thread Martin Blumenstingl
On Mon, Aug 22, 2016 at 1:55 PM, Arnd Bergmann  wrote:
> On Saturday, August 20, 2016 11:35:35 AM CEST Martin Blumenstingl wrote:
>> +- reg: The first register range should be the one of the DWMAC
>> +   controller. The second range is for the Amlogic specific
>> +   configuration (for example the PRG_ETHERNET register range
>> +   on Meson8b and newer)
>>
> ...
>
>> +Example for GXBB:
>> +   ethmac: ethernet@c941 {
>> +   compatible = "amlogic,meson-gxbb-dwmac", "snps,dwmac";
>> +   reg = <0x0 0xc941 0x0 0x1>,
>> +   <0x0 0xc8834540 0x0 0x8>;
>>
>
> The address "0xc8834540" suggests that this is part of a larger register
> range that is used for various things, i.e. a "syscon" type of device.
You are right, these are part of the cbus range (which is already
defined in meson-gxbb.dtsi)

> How about making this a syscon reference rather than a "reg" address?
The first version of my patch ([0]) used
syscon_regmap_lookup_by_phandle. Maybe I did it wrong (and I should
have passed the cbus syscon-node instead of defining a new one just
for the 2x32bit PRG_ETHERNET registers).
I am perfectly fine with either way - however it seems that some other
dwmac glue implementations are also using a second set of resources
(that doesn't automatically make it "correct" though).


Martin


Re: [Patch net-next] net_sched: properly handle failure case of tcf_exts_init()

2016-08-22 Thread Jamal Hadi Salim


Just small comment below:

On 16-08-19 03:36 PM, Cong Wang wrote:

diff --git a/net/sched/cls_tcindex.c b/net/sched/cls_tcindex.c
index 944c8ff..d950070 100644
--- a/net/sched/cls_tcindex.c
+++ b/net/sched/cls_tcindex.c
@@ -219,10 +219,10 @@ static const struct nla_policy tcindex_policy[TCA_TCINDEX_MAX + 1] = {
[TCA_TCINDEX_CLASSID]   = { .type = NLA_U32 },
 };

-static void tcindex_filter_result_init(struct tcindex_filter_result *r)
+static int tcindex_filter_result_init(struct tcindex_filter_result *r)
 {
memset(r, 0, sizeof(*r));
-   tcf_exts_init(&r->exts, TCA_TCINDEX_ACT, TCA_TCINDEX_POLICE);
+   return tcf_exts_init(&r->exts, TCA_TCINDEX_ACT, TCA_TCINDEX_POLICE);
 }

 static void __tcindex_partial_destroy(struct rcu_head *head)
@@ -233,23 +233,57 @@ static void __tcindex_partial_destroy(struct rcu_head *head)
kfree(p);
 }

+static void tcindex_free_perfect_hash(struct tcindex_data *cp)
+{
+   int i;
+
+   for (i = 0; i < cp->hash; i++)
+   tcf_exts_destroy(&cp->perfect[i].exts);
+   kfree(cp->perfect);
+}
+
+static int tcindex_alloc_perfect_hash(struct tcindex_data *cp)
+{
+   int i, err = 0;
+
+   cp->perfect = kcalloc(cp->hash, sizeof(struct tcindex_filter_result),
+ GFP_KERNEL);
+   if (!cp->perfect)
+   return -ENOMEM;
+
+   for (i = 0; i < cp->hash; i++) {
+   err = tcf_exts_init(&cp->perfect[i].exts,
+   TCA_TCINDEX_ACT, TCA_TCINDEX_POLICE);
+   if (err < 0)
+   goto errout;
+   }
+
+   return 0;
+
+errout:
+   tcindex_free_perfect_hash(cp);
+   return err;
+}
+



If tcindex_alloc_perfect_hash() fails partway through, does freeing the
actions in tcindex_free_perfect_hash() via tcf_exts_destroy() need a
check on exts->actions first?

Otherwise, looks good.

Acked-by: Jamal Hadi Salim 

cheers,
jamal
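
The unwind question above can be tried out in userspace. The sketch below mirrors the shape of the alloc/free pair with made-up types (sketch_exts, sketch_table) and an artificial fail_at parameter to force a failure partway through the loop. It relies on two facts: calloc() zeroes the array, and free(NULL) is a no-op, so destroying never-initialised entries is harmless.

```c
#include <assert.h>
#include <stdlib.h>

/* Made-up stand-ins for tcf_exts and tcindex_data. */
struct sketch_exts { int *actions; };
struct sketch_table { struct sketch_exts *perfect; int hash; };

static int sketch_exts_init(struct sketch_exts *e, int fail)
{
	if (fail)
		return -1;	/* simulated allocation failure */
	e->actions = calloc(4, sizeof(*e->actions));
	return e->actions ? 0 : -1;
}

static void sketch_exts_destroy(struct sketch_exts *e)
{
	free(e->actions);	/* fine when actions is still NULL */
	e->actions = NULL;
}

static void sketch_free_table(struct sketch_table *t)
{
	for (int i = 0; i < t->hash; i++)
		sketch_exts_destroy(&t->perfect[i]);
	free(t->perfect);
	t->perfect = NULL;
}

/* fail_at is a test hook: index whose init should fail, or -1 for none. */
static int sketch_alloc_table(struct sketch_table *t, int fail_at)
{
	assert(t->hash > 0);
	t->perfect = calloc((size_t)t->hash, sizeof(*t->perfect));
	if (!t->perfect)
		return -1;

	for (int i = 0; i < t->hash; i++) {
		if (sketch_exts_init(&t->perfect[i], i == fail_at) < 0) {
			sketch_free_table(t);	/* walks all entries, zeroed ones too */
			return -1;
		}
	}
	return 0;
}
```

Whether the kernel's tcf_exts_destroy() behaves the same way on a zeroed entry depends on its implementation, which is not shown in this thread.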


[RFC PATCH] net: ip_finish_output_gso: Attempt gso_size clamping if segments exceed mtu

2016-08-22 Thread Shmulik Ladkani
There are cases where gso skbs (which originate from an ingress
interface) have a gso_size value that exceeds the output dst mtu:

 - ipv4 forwarding middlebox having in/out interfaces with different mtus
   addressed by fe6cc55f3a 'net: ip, ipv6: handle gso skbs in forwarding path'
 - bridge having a tunnel member interface stacked over a device with small mtu
   addressed by b8247f095e 'net: ip_finish_output_gso: If 
skb_gso_network_seglen exceeds MTU, allow segmentation for local udp tunneled 
skbs'

In both cases, such skbs are identified, then go through early software
segmentation+fragmentation as part of ip_finish_output_gso.

Another approach is to shrink gso_size to a value such that the
resulting segments fit within the dst mtu, as suggested by Eric
Dumazet (as part of [1]) and Florian Westphal (as part of [2]).

This avoids the need for software segmentation/fragmentation at
ip_finish_output_gso, thus significantly improving throughput and
lowering cpu load.

This RFC patch attempts to implement this gso_size clamping.

[1] https://patchwork.ozlabs.org/patch/314327/
[2] https://patchwork.ozlabs.org/patch/644724/

Cc: Hannes Frederic Sowa 
Cc: Eric Dumazet 
Cc: Florian Westphal 

Signed-off-by: Shmulik Ladkani 
---

 Comments welcome.

 Few questions embedded in the patch.

 Florian, in fe6cc55f you described a BUG due to gso_size decrease.
 I've tested both bridged and routed cases, but in my setups failed to
 hit the issue; Appreciate if you can provide some hints.

 net/ipv4/ip_output.c | 44 
 1 file changed, 44 insertions(+)

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index dde37fb..b911b43 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -216,6 +216,40 @@ static int ip_finish_output2(struct net *net, struct sock *sk, struct sk_buff *s
return -EINVAL;
 }
 
+static inline int skb_gso_clamp(struct sk_buff *skb, unsigned int network_len)
+{
+   struct skb_shared_info *shinfo = skb_shinfo(skb);
+   unsigned short gso_size;
+   unsigned int seglen;
+
+   if (shinfo->gso_size == GSO_BY_FRAGS)
+   return -EINVAL;
+
+   seglen = skb_gso_network_seglen(skb);
+
+   /* Decrease gso_size to fit specified network length */
+   gso_size = shinfo->gso_size - (seglen - network_len);
+   if (shinfo->gso_type & SKB_GSO_UDP)
+   gso_size &= ~7;
+
+   if (!(shinfo->gso_type & SKB_GSO_DODGY)) {
+   /* Recalculate gso_segs for skbs of trusted sources.
+* Untrusted skbs will have their gso_segs calculated by
+* skb_gso_segment.
+*/
+   unsigned int hdr_len, payload_len;
+
+   hdr_len = seglen - shinfo->gso_size;
+   payload_len = skb->len - hdr_len;
+   shinfo->gso_segs = DIV_ROUND_UP(payload_len, gso_size);
+
+   // Q: need to verify gso_segs <= dev->gso_max_segs?
+   //seems to be protected by netif_skb_features
+   }
+   shinfo->gso_size = gso_size;
+   return 0;
+}
+
 static int ip_finish_output_gso(struct net *net, struct sock *sk,
struct sk_buff *skb, unsigned int mtu)
 {
@@ -237,6 +271,16 @@ static int ip_finish_output_gso(struct net *net, struct sock *sk,
 * 2) skb arrived via virtio-net, we thus get TSO/GSO skbs directly
 * from host network stack.
 */
+
+   /* Attempt to clamp gso_size to avoid segmenting and fragmenting.
+*
+* Q: policy needed? per device?
+*/
+   if (sysctl_ip_output_gso_clamp) {
+   if (!skb_gso_clamp(skb, mtu))
+   return ip_finish_output2(net, sk, skb);
+   }
+
features = netif_skb_features(skb);
BUILD_BUG_ON(sizeof(*IPCB(skb)) > SKB_SGO_CB_OFFSET);
segs = skb_gso_segment(skb, features & ~NETIF_F_GSO_MASK);
-- 
1.9.1
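
To check the clamping arithmetic by hand, here is a userspace sketch of the same computation on plain integers. struct sketch_gso and sketch_gso_clamp() are made up for the example, hdr_len stands for the network plus transport header length (seglen minus gso_size in the patch), and the early-return guards are additions of the sketch, not part of the RFC.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Made-up stand-in for the skb fields the patch looks at. */
struct sketch_gso {
	unsigned int len;	/* total length: headers + all payload */
	unsigned int hdr_len;	/* network + transport headers */
	unsigned int gso_size;	/* payload bytes per segment */
	unsigned int gso_segs;
	int is_udp;
};

/* Shrink gso_size so that hdr_len + gso_size fits in network_len,
 * then recompute gso_segs. Returns 0 on success, -1 if it cannot fit.
 */
static int sketch_gso_clamp(struct sketch_gso *g, unsigned int network_len)
{
	unsigned int seglen, gso_size;

	assert(g != NULL);
	seglen = g->hdr_len + g->gso_size;
	if (seglen <= network_len)
		return 0;			/* already fits (sketch-only guard) */
	if (network_len <= g->hdr_len)
		return -1;			/* no room for payload (sketch-only guard) */

	gso_size = g->gso_size - (seglen - network_len);
	if (g->is_udp)
		gso_size &= ~7u;		/* keep fragments 8-byte aligned */
	if (gso_size == 0)
		return -1;

	g->gso_segs = SKETCH_DIV_ROUND_UP(g->len - g->hdr_len, gso_size);
	g->gso_size = gso_size;
	return 0;
}
```

For example, a TCP skb with 52 bytes of headers, gso_size 1448 and 14480 bytes of payload, clamped to a 1400 byte limit, ends up with gso_size 1348 and 11 segments.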



Re: [Patch net-next] net_sched: properly handle failure case of tcf_exts_init()

2016-08-22 Thread Jamal Hadi Salim

On 16-08-19 03:36 PM, Cong Wang wrote:

After commit 22dc13c837c3 ("net_sched: convert tcf_exts from list to pointer array")
we do dynamic allocation in tcf_exts_init(), therefore we need
to handle the ENOMEM case properly.

Cc: Jamal Hadi Salim 
Signed-off-by: Cong Wang 


Answered my own question by looking at the implementation of kfree();
it ignores NULL (as does free()).

Acked-by: Jamal Hadi Salim 

cheers,
jamal


Re: [PATCH net-next 2/3] net: mpls: Fixups for GSO

2016-08-22 Thread Simon Horman
On Fri, Aug 19, 2016 at 10:09:01AM -0700, David Ahern wrote:
> As reported by Lennert the MPLS GSO code is failing to properly segment
> large packets. There are a couple of problems:
> 
> 1. the inner protocol is not set so the gso segment functions for inner
>protocol layers are not getting run, and
> 
> 2. MPLS labels for packets that use the "native" (non-OVS) MPLS code
>are not properly accounted for in mpls_gso_segment.
> 
> The MPLS GSO code was added for OVS. It is re-using skb_mac_gso_segment
> to call the gso segment functions for the higher layer protocols. That
> means skb_mac_gso_segment is called twice -- once with the network
> protocol set to MPLS and again with the network protocol set to the
> inner protocol.
> 
> This patch sets the inner skb protocol addressing item 1 above and sets
> the network_header and inner_network_header to mark where the MPLS labels
> start and end. The MPLS code in OVS is also updated to set the two
> network markers.
> 
> From there the MPLS GSO code uses the difference between the network
> header and the inner network header to know the size of the MPLS header
> that was pushed. It then pulls the MPLS header, resets the mac_len and
> protocol for the inner protocol and then calls skb_mac_gso_segment
> to segment the skb.
> 
> After the inner protocol segmentation is done the skb protocol
> is set to mpls for each segment and the network and mac headers
> restored.
> 
> Reported-by: Lennert Buytenhek 
> Signed-off-by: David Ahern 
> ---
>  net/mpls/mpls_gso.c   | 38 +++---
>  net/mpls/mpls_iptunnel.c  |  4 
>  net/openvswitch/actions.c |  6 ++
>  3 files changed, 37 insertions(+), 11 deletions(-)
> 
> diff --git a/net/mpls/mpls_gso.c b/net/mpls/mpls_gso.c
> index 2055e57ed1c3..2aa4beaa0e4f 100644

...

> diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
> index 1ecbd7715f6d..6d78f162a88b 100644
> --- a/net/openvswitch/actions.c
> +++ b/net/openvswitch/actions.c
> @@ -167,6 +167,12 @@ static int push_mpls(struct sk_buff *skb, struct sw_flow_key *key,
>   skb->mac_len);
>   skb_reset_mac_header(skb);
>  
> + /* for GSO: set MPLS as network header and encapsulated protocol
> +  * header as inner network header
> +  */
> + skb_set_network_header(skb, skb->mac_len);
> + skb_set_inner_network_header(skb, skb->mac_len + MPLS_HLEN);
> +
>   new_mpls_lse = (__be32 *)skb_mpls_header(skb);
>   *new_mpls_lse = mpls->mpls_lse;

Is the above calculation correct if push_mpls() is called multiple times?
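
To spell out the concern, here is a toy model of the two offsets. It assumes mac_len stays constant across pushes and that each push simply re-runs the two assignments from the hunk; struct sketch_skb and the helpers are invented for the example and ignore everything else push_mpls() does.

```c
#include <assert.h>

#define SKETCH_MPLS_HLEN 4u

/* Invented stand-in that tracks only offsets and the label count. */
struct sketch_skb {
	unsigned int mac_len;
	unsigned int labels;
	unsigned int network_header;
	unsigned int inner_network_header;
};

/* Models one push: a label is added and both markers are set as in the hunk. */
static void sketch_push_mpls(struct sketch_skb *skb)
{
	skb->labels++;
	skb->network_header = skb->mac_len;
	skb->inner_network_header = skb->mac_len + SKETCH_MPLS_HLEN;
}

/* MPLS header size as the GSO code would derive it from the markers. */
static unsigned int sketch_recorded_mpls_len(const struct sketch_skb *skb)
{
	assert(skb->inner_network_header >= skb->network_header);
	return skb->inner_network_header - skb->network_header;
}

/* MPLS header size actually present in the packet. */
static unsigned int sketch_actual_mpls_len(const struct sketch_skb *skb)
{
	return skb->labels * SKETCH_MPLS_HLEN;
}
```

Under these assumptions the two lengths agree after one push but differ after two (4 versus 8 bytes), which is the case the question is about.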


Re: [PATCH v4 net-next] ppp: Fix one deadlock issue of PPP when reentrant

2016-08-22 Thread Guillaume Nault
On Mon, Aug 22, 2016 at 09:20:14AM +0800, f...@ikuai8.com wrote:
> From: Gao Feng 
> 
> The PPP channel holds a spinlock before sending a frame. But the skb may
> be routed back to the same PPP channel by a wrong route policy. As a
> result, the skb reaches the same channel path and tries to take the same
> spinlock, which is already held. Bang, the deadlock comes out.
>
Thanks for following up on this case.
On my side, I've thought a bit more about it over the weekend and cooked
up this patch.
It's experimental and requires cleanup and further testing, but it
should fix all issues I could think of (at least for PPP over L2TP).

The main idea is to use a per-cpu variable to detect recursion in
selected points of PPP and L2TP xmit path.

---
 drivers/net/ppp/ppp_generic.c | 49 ---
 net/l2tp/l2tp_core.c  | 28 +
 2 files changed, 61 insertions(+), 16 deletions(-)

diff --git a/drivers/net/ppp/ppp_generic.c b/drivers/net/ppp/ppp_generic.c
index f226db4..c33036bf 100644
--- a/drivers/net/ppp/ppp_generic.c
+++ b/drivers/net/ppp/ppp_generic.c
@@ -1354,6 +1354,8 @@ static void ppp_setup(struct net_device *dev)
dev->netdev_ops = &ppp_netdev_ops;
SET_NETDEV_DEVTYPE(dev, &ppp_type);
 
+   dev->features |= NETIF_F_LLTX;
+
dev->hard_header_len = PPP_HDRLEN;
dev->mtu = PPP_MRU;
dev->addr_len = 0;
@@ -1367,12 +1369,7 @@ static void ppp_setup(struct net_device *dev)
  * Transmit-side routines.
  */
 
-/*
- * Called to do any work queued up on the transmit side
- * that can now be done.
- */
-static void
-ppp_xmit_process(struct ppp *ppp)
+static void __ppp_xmit_process(struct ppp *ppp)
 {
struct sk_buff *skb;
 
@@ -1392,6 +1389,23 @@ ppp_xmit_process(struct ppp *ppp)
ppp_xmit_unlock(ppp);
 }
 
+static DEFINE_PER_CPU(int, ppp_xmit_recursion);
+
+/* Called to do any work queued up on the transmit side that can now be done */
+static void ppp_xmit_process(struct ppp *ppp)
+{
+   if (unlikely(__this_cpu_read(ppp_xmit_recursion))) {
+   WARN(1, "recursion detected\n");
+   return;
+   }
+
+   __this_cpu_inc(ppp_xmit_recursion);
+   local_bh_disable();
+   __ppp_xmit_process(ppp);
+   local_bh_enable();
+   __this_cpu_dec(ppp_xmit_recursion);
+}
+
 static inline struct sk_buff *
 pad_compress_skb(struct ppp *ppp, struct sk_buff *skb)
 {
@@ -1847,11 +1861,7 @@ static int ppp_mp_explode(struct ppp *ppp, struct sk_buff *skb)
 }
 #endif /* CONFIG_PPP_MULTILINK */
 
-/*
- * Try to send data out on a channel.
- */
-static void
-ppp_channel_push(struct channel *pch)
+static void __ppp_channel_push(struct channel *pch)
 {
struct sk_buff *skb;
struct ppp *ppp;
@@ -1876,11 +1886,26 @@ ppp_channel_push(struct channel *pch)
read_lock_bh(&pch->upl);
ppp = pch->ppp;
if (ppp)
-   ppp_xmit_process(ppp);
+   __ppp_xmit_process(ppp);
read_unlock_bh(&pch->upl);
}
 }
 
+/* Try to send data out on a channel */
+static void ppp_channel_push(struct channel *pch)
+{
+   if (unlikely(__this_cpu_read(ppp_xmit_recursion))) {
+   WARN(1, "recursion detected\n");
+   return;
+   }
+
+   __this_cpu_inc(ppp_xmit_recursion);
+   local_bh_disable();
+   __ppp_channel_push(pch);
+   local_bh_enable();
+   __this_cpu_dec(ppp_xmit_recursion);
+}
+
 /*
  * Receive-side routines.
  */
diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c
index 1e40dac..bdfb1be 100644
--- a/net/l2tp/l2tp_core.c
+++ b/net/l2tp/l2tp_core.c
@@ -1096,10 +1096,8 @@ static int l2tp_xmit_core(struct l2tp_session *session, struct sk_buff *skb,
return 0;
 }
 
-/* If caller requires the skb to have a ppp header, the header must be
- * inserted in the skb data before calling this function.
- */
-int l2tp_xmit_skb(struct l2tp_session *session, struct sk_buff *skb, int hdr_len)
+static int __l2tp_xmit_skb(struct l2tp_session *session, struct sk_buff *skb,
+  int hdr_len)
 {
int data_len = skb->len;
struct l2tp_tunnel *tunnel = session->tunnel;
@@ -1178,6 +1176,28 @@ out_unlock:
 
return ret;
 }
+
+static DEFINE_PER_CPU(int, l2tp_xmit_recursion);
+
+/* If caller requires the skb to have a ppp header, the header must be
+ * inserted in the skb data before calling this function.
+ */
+int l2tp_xmit_skb(struct l2tp_session *session, struct sk_buff *skb,
+ int hdr_len)
+{
+   int ret;
+
+   if (unlikely(__this_cpu_read(l2tp_xmit_recursion))) {
+   WARN(1, "recursion detected\n");
+   return NET_XMIT_DROP;
+   }
+
+   __this_cpu_inc(l2tp_xmit_recursion);
+   ret = __l2tp_xmit_skb(session, skb, hdr_len);
+   __this_cpu_dec(l2tp_xmit_recursion);
+
+   return ret;
+}
 EXPORT_SYMBOL_GPL(l2tp_xmit_skb);
 
 /
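
The guard pattern used in the patch above can be tried in userspace with a thread-local counter standing in for the per-cpu variable. This is only an analogy: a thread is not a CPU, there is no bottom-half disabling here, and sketch_xmit() / sketch_xmit_inner() are made-up names. The reenter argument simulates a route that loops the packet back into the same transmit path.

```c
#include <assert.h>

/* Thread-local stand-in for the per-cpu recursion counter. */
static _Thread_local int sketch_xmit_recursion;

static int sketch_xmit(int reenter);

/* The real work; it may end up calling back into the guarded entry point. */
static int sketch_xmit_inner(int reenter)
{
	if (reenter > 0)
		return sketch_xmit(reenter - 1);
	return 0;
}

/* Guarded entry point: refuse to run if we are already inside it. */
static int sketch_xmit(int reenter)
{
	int ret;

	if (sketch_xmit_recursion)
		return -1;		/* recursion detected: drop */

	sketch_xmit_recursion++;
	ret = sketch_xmit_inner(reenter);
	sketch_xmit_recursion--;
	assert(sketch_xmit_recursion == 0);

	return ret;
}
```

A looping call is cut off at the second entry instead of trying to take a lock that is already held, and the counter is back to zero afterwards so later transmissions work.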

[PATCH] net: ipconfig: Fix NULL pointer dereference on RARP/BOOTP/DHCP timeout

2016-08-22 Thread Geert Uytterhoeven
If no RARP, BOOTP, or DHCP response is received, ic_dev is never set,
causing a NULL pointer dereference in ic_close_devs():

Sending DHCP requests .. timed out!
Unable to handle kernel NULL pointer dereference at virtual address 0004

To fix this, add a check to avoid dereferencing ic_dev if it is still
NULL.

Signed-off-by: Geert Uytterhoeven 
Fixes: 2647cffb2bc6fbed ("net: ipconfig: Support using "delayed" DHCP replies")
---
 net/ipv4/ipconfig.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/ipconfig.c b/net/ipv4/ipconfig.c
index ba9cbeafbb2e0817..071a785c65eb7ab5 100644
--- a/net/ipv4/ipconfig.c
+++ b/net/ipv4/ipconfig.c
@@ -306,7 +306,7 @@ static void __init ic_close_devs(void)
while ((d = next)) {
next = d->next;
dev = d->dev;
-   if (dev != ic_dev->dev && !netdev_uses_dsa(dev)) {
+   if ((!ic_dev || dev != ic_dev->dev) && !netdev_uses_dsa(dev)) {
pr_debug("IP-Config: Downing %s\n", dev->name);
dev_change_flags(dev, d->flags);
}
-- 
1.9.1
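
The fix relies on short-circuit evaluation: when ic_dev is NULL the right-hand side of || is never evaluated, so ic_dev->dev is not touched. A minimal standalone sketch of the same condition follows, with made-up struct names and with the netdev_uses_dsa() part left out.

```c
#include <assert.h>
#include <stddef.h>

/* Made-up stand-ins for struct net_device and struct ic_device. */
struct sketch_netdev { const char *name; };
struct sketch_ic_device { struct sketch_netdev *dev; };

/* Returns 1 if dev should be brought down. chosen == NULL means no
 * reply was received, so no device is exempt.
 */
static int sketch_should_down(const struct sketch_ic_device *chosen,
			      const struct sketch_netdev *dev)
{
	assert(dev != NULL);
	return !chosen || dev != chosen->dev;
}
```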



Re: [PATCH 1/1] brcmfmac: fix pmksa->bssid usage

2016-08-22 Thread Nicolas Iooss
Hello,

After I sent the following patch a few weeks ago, I have not received
any feedback. Could you please review it and tell me what I may have
done wrong?

Thanks,
Nicolas

On 05/08/16 22:34, Nicolas Iooss wrote:
> The struct cfg80211_pmksa defines its bssid field as:
> 
> const u8 *bssid;
> 
> contrary to struct brcmf_pmksa, which uses:
> 
> u8 bssid[ETH_ALEN];
> 
> Therefore in brcmf_cfg80211_del_pmksa(), &pmksa->bssid takes the address
> of this field (of type u8**), not the one of its content (which would be
> u8*).  Remove the & operator to make brcmf_dbg("%pM") and memcmp()
> behave as expected.
> 
> This bug has been found using a custom static checker (which checks the
> usage of %p... attributes at build time).  It has been introduced in
> commit 6c404f34f2bd ("brcmfmac: Cleanup pmksa cache handling code"),
> which replaced pmksa->bssid by &pmksa->bssid while refactoring the code,
> without modifying struct cfg80211_pmksa definition.
> 
> Fixes: 6c404f34f2bd ("brcmfmac: Cleanup pmksa cache handling code")
> Cc: sta...@ger.kernel.org
> Signed-off-by: Nicolas Iooss 
> ---
> 
> scripts/checkpatch.pl reports a warning: "Prefer ether_addr_equal() or
> ether_addr_equal_unaligned() over memcmp()".  Because some files in
> drivers/net/wireless/broadcom/brcm80211/brcmfmac/ still use memcmp()
> to compare addresses and because I do not know whether pmksa->bssid is
> always aligned, I did not follow this warning.
> 
>  drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c
> index 2628d5e12c64..aceab77cd95a 100644
> --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c
> +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c
> @@ -3884,11 +3884,11 @@ brcmf_cfg80211_del_pmksa(struct wiphy *wiphy, struct net_device *ndev,
>   if (!check_vif_up(ifp->vif))
>   return -EIO;
>  
> - brcmf_dbg(CONN, "del_pmksa - PMK bssid = %pM\n", &pmksa->bssid);
> + brcmf_dbg(CONN, "del_pmksa - PMK bssid = %pM\n", pmksa->bssid);
>  
>   npmk = le32_to_cpu(cfg->pmk_list.npmk);
>   for (i = 0; i < npmk; i++)
> - if (!memcmp(&pmksa->bssid, &pmk[i].bssid, ETH_ALEN))
> + if (!memcmp(pmksa->bssid, &pmk[i].bssid, ETH_ALEN))
>   break;
>  
>   if ((npmk > 0) && (i < npmk)) {
> 
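
The difference between the two bssid declarations can be reproduced in user space. This is only a sketch: ptr_pmksa and arr_pmksa are hypothetical reduced structs that mirror the two field types quoted above, and the helper names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6

/* Hypothetical reduced versions of the two structs discussed above. */
struct ptr_pmksa { const uint8_t *bssid; };     /* pointer member */
struct arr_pmksa { uint8_t bssid[ETH_ALEN]; };  /* array member */

/* For an array member, &x->bssid and x->bssid are the same address. */
static int array_addr_equals_data(const struct arr_pmksa *b)
{
	return (const void *)&b->bssid == (const void *)b->bssid;
}

/* For a pointer member, &x->bssid is where the pointer itself is stored,
 * which is not where the MAC address bytes live.
 */
static int pointer_addr_equals_data(const struct ptr_pmksa *a)
{
	return (const void *)&a->bssid == (const void *)a->bssid;
}

/* Correct comparison: pass the pointer value, not the pointer's address. */
static int bssid_matches(const struct ptr_pmksa *a, const struct arr_pmksa *b)
{
	return memcmp(a->bssid, b->bssid, ETH_ALEN) == 0;
}
```

So dropping the & is needed only on the side whose field is a pointer; on the array side both spellings name the same bytes.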



Re: [net-next PATCH] ethtool: Add support for SCTP verification tag in Rx NFC

2016-08-22 Thread Ben Hutchings
On Sat, 2016-08-20 at 18:56 -0700, Alexander Duyck wrote:
> > On Sat, Aug 20, 2016 at 5:21 PM, Ben Hutchings  wrote:
> > 
> > On Fri, 2016-08-19 at 14:32 -0700, Alexander Duyck wrote:
> > > 
> > > The i40e hardware has support for SCTP filtering via Rx NFC however the
> > > default configuration expects us to include the verification tag as a part
> > > > of the filter.  In order to support that I need to be able to transfer
> > > > that data through the ethtool interface via a new structure.
> > > 
> > > This patch adds a new structure to allow us to pass the verification tag
> > > for IPv4 or IPv6 SCTP traffic.
> > [...]
> > 
> > This looks like an incompatible ABI change.  I suppose it could be OK
> > if no drivers implemented flow steering for SCTP using the previously
> > specified structure, but have you checked that that is the case?
> > 
> > Ben.
> 
> Well the structure itself matches the TCP flow spec for the TCP flow
> sized portion.  All I am doing with this patch is adding an extension
> to that which is still confined to the 52 byte limit of the flow
> union.

But that extension will be ignored by any drivers that implemented the
API as previously defined.  (If there aren't any, as I said, this
doesn't really matter.)

With previous extensions (everything in struct ethtool_flow_ext) we've
introduced new type flags to ensure that they won't be silently
ignored.  You could add a new extended-SCTP type value for the same
reason.

[...]
> One thing I could do if you would like would be to spin up another
> patch to force the kernel to return -EINVAL if we are masking in
> fields that are out of bounds for the flow specification.  That way we
> can handle this a bit more concisely in the future should we end up
> having to extend any other flow specifications.

It's too late to do that now.

Ben.

-- 
Ben Hutchings
Quantity is no substitute for quality, but it's the only one we've got.



Re: [RFC PATCH] net: ip_finish_output_gso: Attempt gso_size clamping if segments exceed mtu

2016-08-22 Thread Shmulik Ladkani
Hi,

On Mon, 22 Aug 2016 14:58:42 +0200, f...@strlen.de wrote:
> > 
> >  Florian, in fe6cc55f you described a BUG due to gso_size decrease.
> >  I've tested both bridged and routed cases, but in my setups failed to
> >  hit the issue; Appreciate if you can provide some hints.
> 
> Still get the BUG, I applied this patch on top of net-next.
> 
> On hypervisor:
> 10.0.0.2 via 192.168.7.10 dev tap0 mtu lock 1500
> ssh root@10.0.0.2 'cat > /dev/null' < /dev/zero
> 
> On vm1 (which dies instantly, see below):
> eth0 mtu 1500 (192.168.7.10)
> eth1 mtu 1280 (10.0.0.1)
> 
> On vm2
> eth0 mtu 1280 (10.0.0.2)
> 
> Normal ipv4 routing via vm1, no iptables etc. present, so
> 
> we have  hypervisor 1500 -> 1500 VM1 1280 -> 1280 VM2
> 
> Turning off gro avoids this problem.

Thanks Florian, will dive into this.


Re: mlx5 VST and VGT mode at the same time

2016-08-22 Thread Mohamad Haj Yahia
On Thu, Aug 18, 2016 at 12:41 PM, domingo montoya
 wrote:
> Hi All,
>
> Is there any way we can support both VST and VGT modes at the same time in 
> mlx5?
>
> For e.g,
>
> If I send untagged packets from the VF, they should be tagged with the
> VST vlan and the vlan be stripped for received packets.
>
> If I send tagged packets from the VF, they should be sent as is, with no
> tag inserted, and the vlan tag should not be stripped for received
> packets.
>
> Any way we can achieve this?
>
>
> I understand that in the latest code these features are mutually exclusive.
>
> But if we have a requirement like this, any ideas on how to go about
> implementing the same.
>
> Few observations:
>
> After going through the code, I figured out that for VST mode, we run
> MODIFY_ESW_VPORT_CONTEXT and as part of this set the flag to strip the
> vlan from the received packets. In case of VGT mode, because of this
> command, the tags set by the VF driver also get stripped.
>
>
>
> Thanks a lot!
>
>
> Best Regards,
> Domingo

Hi Domingo,

Unfortunately there is a HW limitation that prevents VGT from working
alongside VST on the same VF.
Since the stripping feature is a global attribute for all the VF
incoming vlans, if we enable both modes you will see that the VGT
traffic vlan is also stripped and thus arrives at the VF
untagged.
Because of this limitation we blocked the outgoing vlan tagged traffic
from a VF that is in VST mode and also dropped incoming vlan tagged
packets targeting that VF with a vlan different from the VF vlan-id.
The mutually exclusive VGT/VST enforcement is done by the VF ACL ingress
and egress flow tables.

Thanks,
Mohamad


Re: [PATCH 1/1] brcmfmac: fix pmksa->bssid usage

2016-08-22 Thread Rafał Miłecki
On 22 August 2016 at 15:03, Nicolas Iooss  wrote:
> After I sent the following patch a few weeks ago, I have not received
> any feedback. Could you please review it and tell me what I may have
> done wrong?

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches#checking_state_of_patches_from_patchwork
https://patchwork.kernel.org/patch/9265733/


Re: [PATCH net-next 2/3] net: mpls: Fixups for GSO

2016-08-22 Thread David Ahern
On 8/22/16 6:21 AM, Simon Horman wrote:
>> diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
>> index 1ecbd7715f6d..6d78f162a88b 100644
>> --- a/net/openvswitch/actions.c
>> +++ b/net/openvswitch/actions.c
>> @@ -167,6 +167,12 @@ static int push_mpls(struct sk_buff *skb, struct sw_flow_key *key,
>>  skb->mac_len);
>>  skb_reset_mac_header(skb);
>>  
>> +/* for GSO: set MPLS as network header and encapsulated protocol
>> + * header as inner network header
>> + */
>> +skb_set_network_header(skb, skb->mac_len);
>> +skb_set_inner_network_header(skb, skb->mac_len + MPLS_HLEN);
>> +
>>  new_mpls_lse = (__be32 *)skb_mpls_header(skb);
>>  *new_mpls_lse = mpls->mpls_lse;
> 
> Is the above calculation correct if push_mpls() is called multiple times?
> 

No. Does OVS support more than 1? I really need someone who is familiar with 
the OVS code to make sure it works for all use cases. e.g., set 
skb_set_inner_network_header() before pushing a series of MPLS labels.


Re: [PATCH v4 net-next] ppp: Fix one deadlock issue of PPP when reentrant

2016-08-22 Thread Feng Gao
It seems a better solution, simple and clear.
I accept any solution that makes the kernel work well :))

Best Regards
Feng

On Mon, Aug 22, 2016 at 8:35 PM, Guillaume Nault  wrote:
> On Mon, Aug 22, 2016 at 09:20:14AM +0800, f...@ikuai8.com wrote:
>> From: Gao Feng 
>>
>> The PPP channel holds a spinlock before sending a frame. But the skb may
>> select the same PPP channel because of a wrong route policy. As a result,
>> the skb re-enters the same channel path and tries to take the
>> spinlock that is already held, which causes a deadlock.
>>
> Thanks for following up on this case.
> On my side, I've thought a bit more about it in the weekend and cooked
> this patch.
> It's experimental and requires cleanup and further testing, but it
> should fix all issues I could think of (at least for PPP over L2TP).
>
> The main idea is to use a per-cpu variable to detect recursion in
> selected points of PPP and L2TP xmit path.
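
The recursion-detection idea can be tried out in user space with a per-thread counter in place of the per-CPU one. This is a sketch under that assumption only: guarded_xmit(), do_xmit() and the XMIT_* values are hypothetical names, and the real patch additionally disables bottom halves around the inner call.

```c
#include <assert.h>

#define XMIT_OK   0
#define XMIT_DROP 1

/* User-space stand-in for the per-CPU counter: one counter per thread. */
static _Thread_local int xmit_recursion;

/* Result seen by the re-entrant call, or -1 if none happened. */
static int nested_result = -1;

static int guarded_xmit(int reenter);

/* Hypothetical inner transmit path. A non-zero reenter models a bad route
 * that sends the packet back into the same transmit function.
 */
static int do_xmit(int reenter)
{
	if (reenter)
		nested_result = guarded_xmit(0);
	return XMIT_OK;
}

static int guarded_xmit(int reenter)
{
	int ret;

	if (xmit_recursion)	/* already inside the xmit path on this thread */
		return XMIT_DROP;

	xmit_recursion++;
	ret = do_xmit(reenter);
	xmit_recursion--;

	return ret;
}
```

The nested call is refused instead of trying to take a lock that the outer call still holds, and the counter is back at zero once the outer call returns.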
>
> ---
>  drivers/net/ppp/ppp_generic.c | 49 
> ---
>  net/l2tp/l2tp_core.c  | 28 +
>  2 files changed, 61 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/net/ppp/ppp_generic.c b/drivers/net/ppp/ppp_generic.c
> index f226db4..c33036bf 100644
> --- a/drivers/net/ppp/ppp_generic.c
> +++ b/drivers/net/ppp/ppp_generic.c
> @@ -1354,6 +1354,8 @@ static void ppp_setup(struct net_device *dev)
> dev->netdev_ops = &ppp_netdev_ops;
> SET_NETDEV_DEVTYPE(dev, &ppp_type);
>
> +   dev->features |= NETIF_F_LLTX;
> +
> dev->hard_header_len = PPP_HDRLEN;
> dev->mtu = PPP_MRU;
> dev->addr_len = 0;
> @@ -1367,12 +1369,7 @@ static void ppp_setup(struct net_device *dev)
>   * Transmit-side routines.
>   */
>
> -/*
> - * Called to do any work queued up on the transmit side
> - * that can now be done.
> - */
> -static void
> -ppp_xmit_process(struct ppp *ppp)
> +static void __ppp_xmit_process(struct ppp *ppp)
>  {
> struct sk_buff *skb;
>
> @@ -1392,6 +1389,23 @@ ppp_xmit_process(struct ppp *ppp)
> ppp_xmit_unlock(ppp);
>  }
>
> +static DEFINE_PER_CPU(int, ppp_xmit_recursion);
> +
> +/* Called to do any work queued up on the transmit side that can now be done */
> +static void ppp_xmit_process(struct ppp *ppp)
> +{
> +   if (unlikely(__this_cpu_read(ppp_xmit_recursion))) {
> +   WARN(1, "recursion detected\n");
> +   return;
> +   }
> +
> +   __this_cpu_inc(ppp_xmit_recursion);
> +   local_bh_disable();
> +   __ppp_xmit_process(ppp);
> +   local_bh_enable();
> +   __this_cpu_dec(ppp_xmit_recursion);
> +}
> +
>  static inline struct sk_buff *
>  pad_compress_skb(struct ppp *ppp, struct sk_buff *skb)
>  {
> @@ -1847,11 +1861,7 @@ static int ppp_mp_explode(struct ppp *ppp, struct sk_buff *skb)
>  }
>  #endif /* CONFIG_PPP_MULTILINK */
>
> -/*
> - * Try to send data out on a channel.
> - */
> -static void
> -ppp_channel_push(struct channel *pch)
> +static void __ppp_channel_push(struct channel *pch)
>  {
> struct sk_buff *skb;
> struct ppp *ppp;
> @@ -1876,11 +1886,26 @@ ppp_channel_push(struct channel *pch)
> read_lock_bh(&pch->upl);
> ppp = pch->ppp;
> if (ppp)
> -   ppp_xmit_process(ppp);
> +   __ppp_xmit_process(ppp);
> read_unlock_bh(&pch->upl);
> }
>  }
>
> +/* Try to send data out on a channel */
> +static void ppp_channel_push(struct channel *pch)
> +{
> +   if (unlikely(__this_cpu_read(ppp_xmit_recursion))) {
> +   WARN(1, "recursion detected\n");
> +   return;
> +   }
> +
> +   __this_cpu_inc(ppp_xmit_recursion);
> +   local_bh_disable();
> +   __ppp_channel_push(pch);
> +   local_bh_enable();
> +   __this_cpu_dec(ppp_xmit_recursion);
> +}
> +
>  /*
>   * Receive-side routines.
>   */
> diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c
> index 1e40dac..bdfb1be 100644
> --- a/net/l2tp/l2tp_core.c
> +++ b/net/l2tp/l2tp_core.c
> @@ -1096,10 +1096,8 @@ static int l2tp_xmit_core(struct l2tp_session *session, struct sk_buff *skb,
> return 0;
>  }
>
> -/* If caller requires the skb to have a ppp header, the header must be
> - * inserted in the skb data before calling this function.
> - */
> -int l2tp_xmit_skb(struct l2tp_session *session, struct sk_buff *skb, int hdr_len)
> +static int __l2tp_xmit_skb(struct l2tp_session *session, struct sk_buff *skb,
> +  int hdr_len)
>  {
> int data_len = skb->len;
> struct l2tp_tunnel *tunnel = session->tunnel;
> @@ -1178,6 +1176,28 @@ out_unlock:
>
> return ret;
>  }
> +
> +static DEFINE_PER_CPU(int, l2tp_xmit_recursion);
> +
> +/* If caller requires the skb to have a ppp header, the header must be
> + * inserted in the skb data before calling this function.
> + */
> +int l2tp_xmit_skb(struct l2tp_session *session, struct sk_buff *skb,
>

Re: [PATCH] gianfar: fix size of fragmented frames

2016-08-22 Thread Zefir Kurtisi
Hi Claudiu,


On 08/19/2016 11:12 PM, Claudiu Manoil wrote:
> Hi Zefir,
> 
> [sorry if the message indentation is wrong ... webmail]
> 
> 
> From: Zefir Kurtisi 
> Sent: Friday, August 19, 2016 8:11 PM
> To: Claudiu Manoil
> Subject: Re: [PATCH] gianfar: fix size of fragmented frames
> 
> Hi Claudiu,
> 
> On 08/19/2016 06:46 PM, Claudiu Manoil wrote:
>>> -Original Message-
>>> From: Zefir Kurtisi [mailto:zefir.kurt...@neratec.com]
>>> Sent: Friday, August 19, 2016 12:16 PM
>>> To: netdev@vger.kernel.org
>>> Cc: claudiu.man...@freescale.com
>>> Subject: [PATCH] gianfar: fix size of fragmented frames
>>>
>>> The eTSEC RxBD 'Data Length' field is context dependent:
>>> for the last fragment it contains the full frame size,
>>> while fragments contain the fragment size, which equals
>>> the value written to register MRBLR.
>>>
>>
>> According to RM the last fragment has the whole packet length indeed,
>> and this should apply to fragmented frames too:
>>
>> " Data length, written by the eTSEC.
>> Data length is the number of octets written by the eTSEC into this BD's data
>> buffer if L is cleared (the value is equal to MRBLR), or, if L is set, the
>> length of the frame including CRC, FCB (if RCTRL[PRSDEP] > 00), preamble (if
>> MACCFG2[PreAmRxEn]=1), time stamp (if RCTRL[TS] = 1) and any padding
>> (RCTRL[PAL]). "
>>
>>> This differentiation is missing in the gianfar driver,
>>> which causes data corruption as soon as the hardware
>>> starts to fragment receiving frames. As a result, the
>>> size of fragmented frames is increased by
>>> (nr_frags - 1) * MRBLR
>>>
>>> We first noticed this issue working with DSA, where a
>>> 1540 octet frame is fragmented by the hardware and
>>> reconstructed by the driver as 3076 octet frame.
>>>
>>> This patch fixes the problem by adjusting the size of
>>> the last fragment.
>>>
>>> Signed-off-by: Zefir Kurtisi 
>>> ---
>>> drivers/net/ethernet/freescale/gianfar.c | 19 +--
>>> 1 file changed, 13 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/drivers/net/ethernet/freescale/gianfar.c
>>> b/drivers/net/ethernet/freescale/gianfar.c
>>> index d20935d..4187280 100644
>>> --- a/drivers/net/ethernet/freescale/gianfar.c
>>> +++ b/drivers/net/ethernet/freescale/gianfar.c
>>> @@ -2922,17 +2922,24 @@ static bool gfar_add_rx_frag(struct gfar_rx_buff *rxb, u32 lstatus,
>>> {
>>>unsigned int size = lstatus & BD_LENGTH_MASK;
>>>struct page *page = rxb->page;
>>> +bool last = !!(lstatus & BD_LFLAG(RXBD_LAST));
>>>
>>>/* Remove the FCS from the packet length */
>>> -if (likely(lstatus & BD_LFLAG(RXBD_LAST)))
>>> +if (last)
>>>size -= ETH_FCS_LEN;
>>>
>>> -if (likely(first))
>>> +if (likely(first)) {
>>>skb_put(skb, size);
>>> -else
>>> -skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, page,
>>> -rxb->page_offset + RXBUF_ALIGNMENT,
>>> -size, GFAR_RXB_TRUESIZE);
>>> +} else {
>>> +/* the last fragments' length contains the full frame length */
>>> +if (last)
>>> +size -= skb->len;
>>
>> While I agree with your finding, I don't think this works for packets having
>> more than 2 buffers (head + 1 fragment).
>> How's this supposed to work for a skb with 2 fragments, for instance?
>> I think this needs more thoughtful consideration.
>>
> In fact, this works and I tested it by setting GFAR_RXB_SIZE to 256.
> Receiving a 1000 byte frame then results in 4 fragments, 3*256 plus the last
> one sized 1000. The driver then combines 256+256+256+232 (=1000-768) back to
> a 1000 bytes frame.
> 
> I don't see why it should not, because skb->len is exactly the size of the
> fragments added before the last one, so subtracting them from the total
> frame size should give the size of the last fragment, no?
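
The arithmetic in this exchange can be checked with a small user-space sketch. It assumes, as described above, that every non-last descriptor reports its own buffer length and the last one reports the full frame length; FCS stripping is left out, and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Length as rebuilt with the fix: the last descriptor carries the whole
 * frame length, so the bytes collected so far are subtracted from it.
 */
static unsigned int rebuilt_len_fixed(const unsigned int *bd_len, size_t n)
{
	unsigned int len = 0;

	for (size_t i = 0; i < n; i++) {
		unsigned int size = bd_len[i];

		if (i > 0 && i == n - 1)	/* last, non-first buffer */
			size -= len;
		len += size;
	}
	return len;
}

/* Length as rebuilt without the fix: every reported value is added as-is. */
static unsigned int rebuilt_len_buggy(const unsigned int *bd_len, size_t n)
{
	unsigned int len = 0;

	for (size_t i = 0; i < n; i++)
		len += bd_len[i];
	return len;
}
```

For the 256-byte test, descriptors {256, 256, 256, 1000} rebuild to 1000 with the subtraction. For the DSA case from the commit message, {1536, 1540} gives 3076 without it, which is the frame size plus (nr_frags - 1) * MRBLR.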
> 
> [Claudiu]
> Thanks for the details. I didn't have much time to look into it (I still
> don't), and you can never be too careful with this kind of changes. But I
> agree with your assessment.
> I'm surprised this didn't come out earlier, like a warning from the stack,
> since the truesize can be easily exceeded in this case.
> 
In fact, I was observing warnings sending max-sized pings to the device:
# ping -c 1 -s 1472  => $ br0: dropped over-mtu packet: 3036 > 1500

As a result, ICMP requests go unanswered when the eTSEC is fragmenting (or
scatter-gathering) frames.

>>> +
>>> +if (size > 0 && size <= GFAR_RXB_SIZE)
>>
>> Why do you need this check?
>> The h/w ensures that the buffers won't exceed GFAR_RXB_SIZE
>> (which is MRBL), size 0 is also not possible.
>>
> Do you question the first part? In cases where the last fragment consists of
> only the FCS, adding a zero size fragment would falsify skb->truesize.
> 
> The second expression is fail-safe only - it should never happen given the
> hardware specs - but if it did,

Re: [RFC PATCH] net: ip_finish_output_gso: Attempt gso_size clamping if segments exceed mtu

2016-08-22 Thread Florian Westphal
Shmulik Ladkani  wrote:
> There are cases where gso skbs (which originate from an ingress
> interface) have a gso_size value that exceeds the output dst mtu:
> 
>  - ipv4 forwarding middlebox having in/out interfaces with different mtus
>addressed by fe6cc55f3a 'net: ip, ipv6: handle gso skbs in forwarding path'
>  - bridge having a tunnel member interface stacked over a device with small mtu
>addressed by b8247f095e 'net: ip_finish_output_gso: If skb_gso_network_seglen exceeds MTU, allow segmentation for local udp tunneled skbs'
> 
> In both cases, such skbs are identified, then go through early software
> segmentation+fragmentation as part of ip_finish_output_gso.
> 
> Another approach is to shrink the gso_size to a suitable value so that the
> resulting segments are smaller than dst mtu, as suggested by Eric
> Dumazet (as part of [1]) and Florian Westphal (as part of [2]).
> 
> This will avoid the need for software segmentation/fragmentation at
> ip_finish_output_gso, thus significantly improving throughput and lowering
> cpu load.
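
The clamping arithmetic itself is simple and can be sketched as below. This is not the RFC patch's code, which is not shown in this message; clamp_gso_size() and its parameters are hypothetical, with hdr_len standing for the combined network and transport header length of each resulting segment.

```c
#include <assert.h>

/* Returns a gso_size for which hdr_len + gso_size <= mtu.
 * An already fitting gso_size is returned unchanged; 0 means the mtu
 * cannot hold the headers plus at least one payload byte.
 */
static unsigned int clamp_gso_size(unsigned int gso_size,
				   unsigned int hdr_len,
				   unsigned int mtu)
{
	if (mtu <= hdr_len)
		return 0;
	if (hdr_len + gso_size <= mtu)
		return gso_size;
	return mtu - hdr_len;
}
```

The sketch covers only the size calculation. It says nothing about whether an skb that was already aggregated with a larger gso_size can be resegmented safely afterwards.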
> 
> This RFC patch attempts to implement this gso_size clamping.
> 
> [1] https://patchwork.ozlabs.org/patch/314327/
> [2] https://patchwork.ozlabs.org/patch/644724/
> 
> Cc: Hannes Frederic Sowa 
> Cc: Eric Dumazet 
> Cc: Florian Westphal 
> 
> Signed-off-by: Shmulik Ladkani 
> ---
> 
>  Comments welcome.
> 
>  Few questions embedded in the patch.
> 
>  Florian, in fe6cc55f you described a BUG due to gso_size decrease.
>  I've tested both bridged and routed cases, but in my setups failed to
>  hit the issue; Appreciate if you can provide some hints.

Still get the BUG, I applied this patch on top of net-next.

On hypervisor:
10.0.0.2 via 192.168.7.10 dev tap0 mtu lock 1500
ssh root@10.0.0.2 'cat > /dev/null' < /dev/zero

On vm1 (which dies instantly, see below):
eth0 mtu 1500 (192.168.7.10)
eth1 mtu 1280 (10.0.0.1)

On vm2
eth0 mtu 1280 (10.0.0.2)

Normal ipv4 routing via vm1, no iptables etc. present, so

we have  hypervisor 1500 -> 1500 VM1 1280 -> 1280 VM2

Turning off gro avoids this problem.

[ cut here ]
kernel BUG at net-next/net/core/skbuff.c:3210!
invalid opcode:  [#1] SMP
CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.8.0-rc2+ #1842
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014
task: 88013b10 task.stack: 88013b0fc000
RIP: 0010:[]  [] skb_segment+0x964/0xb20
RSP: 0018:88013fd838d0  EFLAGS: 00010212
RAX: 05a8 RBX: 88013a9f9900 RCX: 88013b1cf500
RDX: 6612 RSI: 0494 RDI: 0114
RBP: 88013fd839a8 R08: 69ca R09: 88013b1cf400
R10: 0011 R11: 6612 R12: 64fe
R13: 8801394c7300 R14: 88013937ad80 R15: 0011
FS:  () GS:88013fd8() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 7f059fc3b2b0 CR3: 01806000 CR4: 06a0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
Stack:
 003b ffbe fff4 88013b1cf400
  0042 0040 0001
 0042 88013b1cf600  880104cc
Call Trace:
  
 [] ? swiotlb_map_page+0x5f/0x120
 [] tcp_gso_segment+0x100/0x480
 [] tcp4_gso_segment+0x33/0x90
 [] inet_gso_segment+0x12a/0x3b0
 [] ? dev_hard_start_xmit+0x20/0x110
 [] skb_mac_gso_segment+0x90/0xf0
 [] __skb_gso_segment+0xb1/0x140
 [] validate_xmit_skb+0x14f/0x2b0
 [] validate_xmit_skb_list+0x3e/0x60
 [] sch_direct_xmit+0x10a/0x1a0
 [] __dev_queue_xmit+0x369/0x5d0
 [] dev_queue_xmit+0xb/0x10
 [] ip_finish_output2+0x247/0x310
 [] ip_finish_output+0x1c0/0x250
 [] ip_output+0x3a/0x40
 [] ip_forward+0x36c/0x410
 [] ip_rcv+0x2e6/0x630
 [] __netif_receive_skb_core+0x2cf/0x940
 [] ? e1000_alloc_rx_buffers+0x1bd/0x490
 [] __netif_receive_skb+0x18/0x60
 [] netif_receive_skb_internal+0x28/0x90
 [] ? tcp4_gro_complete+0x80/0x90
 [] napi_gro_complete+0x7a/0xa0
 [] napi_gro_flush+0x55/0x70
 [] napi_complete_done+0x66/0xb0
 [] e1000_clean+0x380/0x900
 [] ? dev_hard_start_xmit+0x85/0x110
 [] net_rx_action+0x1a3/0x2b0
 [] __do_softirq+0xe2/0x1d0
 [] irq_exit+0x89/0x90
 [] do_IRQ+0x4f/0xd0
 [] common_interrupt+0x82/0x82
  
 [] ? native_safe_halt+0x6/0x10
 [] default_idle+0x9/0x10
 [] arch_cpu_idle+0xa/0x10
 [] default_idle_call+0x2e/0x30
 [] cpu_startup_entry+0x16f/0x220
 [] start_secondary+0x105/0x130
Code: 00 08 02 48 89 df 44 89 44 24 18 83 e6 c0 e8 04 c7 ff ff 85 c0 0f 85 02 
01 00 00 8b 83 b8 00 00 00 44 8b 44 24 18 e9 cc fe ff ff <0f> 0b 0f 0b 0f 0b 8b 
4b 74 85 c9 0f 85 ce 00 00 00 48 8b 83 c0 
RIP  [] skb_segment+0x964/0xb20
 RSP 
---[ end trace 924612451efe8dce ]---
Kernel panic - not syncing: Fatal exception in interrupt
Kernel Offset: disabled
---[ end Kernel panic - not syncing: Fatal exception in i

Re: [PATCH v1 1/1] net: phy: Add edge-rate, mac-if, read, write func to Microsemi PHYs.

2016-08-22 Thread Andrew Lunn
On Mon, Aug 08, 2016 at 07:13:28PM +0530, Nagaraju Lakkaraju wrote:
> 
> Hello,
> 
> As part of the 2nd patch, add edge rate control, MAC interface, and read and
> write driver functions for Microsemi PHYs.

Please add these different features as separate patches.  You should
be aiming for a number of small patches, each of which are obviously
correct, not one big patch containing multiple things which are hard
to review.

   Andrew


Re: [PATCH] gianfar: prevent fragmentation in DSA environments

2016-08-22 Thread Zefir Kurtisi
On 08/19/2016 11:45 PM, Andrew Lunn wrote:
>> Nice improvement.
> 
> Thanks
> 
>> But what's so special about 8?
> 
> When talking to Marvell Switch chips, there are two different headers
> which can be added. The DSA header is 4 bytes, and the EDSA header is
> 8 bytes. However, if there is already a VLAN header, when using EDSA,
> it will replace the VLAN header, so only 4 additional bytes are needed.
> There are some very old Marvell switches which add a 4 byte
> trailer. If you are talking to a Broadcom switch chip, it also has a 4
> byte header.
> 
>> I thought only 4 bytes were missing :)
> 
> So the requirement is probably not currently for the newer Marvell
> switch chips? But we should be looking forward and expect at some
> point somebody wants to use the newer chips. I've got a Freescale
> Vybrid board using the modern Marvell chips, but the FEC driver does
> not have a such a hard limit as this driver.
> 
> However, it does not seem as simple as that. A standard Ethernet frame
> should have a maximum size of 1522 when including a VLAN header. Yet
> the driver appears to be using 1536, which is this rounded up to
> multiples of 64. So there are already 14 spare bytes in there. So there
> must be something else going on here.
> 
>> At least 1536 is the default size of the MRBLR register, as specified
>> in the h/w ref manual.  Is there some recommended standard size
>> to accommodate most (if not all) headers, to refer to?
> 
> Not that i know of. These switch headers are proprietary.
> 
> Andrew
> 
Hi,

it is a combination of
* (E)DSA header (+4/8 bytes)
* GMAC_FCB_LEN (8)
* FSL_GIANFAR_DEV_HAS_TIMER (causing priv->padding=8)
which sums up to a maximum frame size of 1538 and activates scatter-gather.

A v2 patch with better info is on its way.


Thanks,
Zefir


[PATCH 17/20] batman-adv: Place kref_get for tvlv_container near use

2016-08-22 Thread Simon Wunderlich
From: Sven Eckelmann 

It is hard to understand why the refcnt is increased when it isn't done
near the actual place the new reference is used. So using kref_get right
before the place which requires the reference and in the same function
helps to avoid accidental problems caused by incorrect reference counting.

Signed-off-by: Sven Eckelmann 
Signed-off-by: Marek Lindner 
Signed-off-by: Simon Wunderlich 
---
 net/batman-adv/tvlv.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/net/batman-adv/tvlv.c b/net/batman-adv/tvlv.c
index 3d1cf0f..3533867 100644
--- a/net/batman-adv/tvlv.c
+++ b/net/batman-adv/tvlv.c
@@ -257,8 +257,13 @@ void batadv_tvlv_container_register(struct batadv_priv *bat_priv,
spin_lock_bh(&bat_priv->tvlv.container_list_lock);
tvlv_old = batadv_tvlv_container_get(bat_priv, type, version);
batadv_tvlv_container_remove(bat_priv, tvlv_old);
+
+   kref_get(&tvlv_new->refcount);
hlist_add_head(&tvlv_new->list, &bat_priv->tvlv.container_list);
spin_unlock_bh(&bat_priv->tvlv.container_list_lock);
+
+   /* don't return reference to new tvlv_container */
+   batadv_tvlv_container_put(tvlv_new);
 }
 
 /**
-- 
2.9.3
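
The "take the reference where it is stored" pattern from the commit message can be modelled in user space. This is only a sketch: struct obj, struct holder and all helpers are hypothetical and use a plain integer counter, not the kernel's struct kref.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical minimal refcounted object; not the kernel's struct kref. */
struct obj { int refcount; int released; };
struct holder { struct obj *slot; };

static void obj_init(struct obj *o) { o->refcount = 1; o->released = 0; }
static void obj_get(struct obj *o) { o->refcount++; }

static void obj_put(struct obj *o)
{
	if (--o->refcount == 0)
		o->released = 1;	/* a real implementation would free here */
}

/* The reference owned by the holder is taken right next to the store. */
static void holder_register(struct holder *h, struct obj *o)
{
	obj_get(o);
	h->slot = o;
}

static void holder_unregister(struct holder *h)
{
	struct obj *o = h->slot;

	h->slot = NULL;
	obj_put(o);
}

static void create_and_register(struct holder *h, struct obj *o)
{
	obj_init(o);		/* creator's reference */
	holder_register(h, o);
	obj_put(o);		/* o is not returned, so drop the creator's reference */
}
```

Each get sits next to the pointer copy it protects, and the creator's own reference is dropped explicitly in the function that took it, so the count can be checked by reading one function at a time.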



[PATCH 16/20] batman-adv: Place kref_get for nc_path near use

2016-08-22 Thread Simon Wunderlich
From: Sven Eckelmann 

It is hard to understand why the refcnt is increased when it isn't done
near the actual place the new reference is used. So using kref_get right
before the place which requires the reference and in the same function
helps to avoid accidental problems caused by incorrect reference counting.

Signed-off-by: Sven Eckelmann 
Signed-off-by: Marek Lindner 
Signed-off-by: Simon Wunderlich 
---
 net/batman-adv/network-coding.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/batman-adv/network-coding.c b/net/batman-adv/network-coding.c
index 4f4cfe5..165cd27 100644
--- a/net/batman-adv/network-coding.c
+++ b/net/batman-adv/network-coding.c
@@ -978,7 +978,6 @@ static struct batadv_nc_path *batadv_nc_get_path(struct batadv_priv *bat_priv,
INIT_LIST_HEAD(&nc_path->packet_list);
spin_lock_init(&nc_path->packet_list_lock);
kref_init(&nc_path->refcount);
-   kref_get(&nc_path->refcount);
nc_path->last_valid = jiffies;
ether_addr_copy(nc_path->next_hop, dst);
ether_addr_copy(nc_path->prev_hop, src);
@@ -988,6 +987,7 @@ static struct batadv_nc_path *batadv_nc_get_path(struct batadv_priv *bat_priv,
   nc_path->next_hop);
 
/* Add nc_path to hash table */
+   kref_get(&nc_path->refcount);
hash_added = batadv_hash_add(hash, batadv_nc_hash_compare,
 batadv_nc_hash_choose, &nc_path_key,
 &nc_path->hash_entry);
-- 
2.9.3



[PATCH 14/20] batman-adv: Place kref_get for softif_vlan near use

2016-08-22 Thread Simon Wunderlich
From: Sven Eckelmann 

It is hard to understand why the refcnt is increased when it isn't done
near the actual place the new reference is used. So using kref_get right
before the place which requires the reference and in the same function
helps to avoid accidental problems caused by incorrect reference counting.

Signed-off-by: Sven Eckelmann 
Signed-off-by: Marek Lindner 
Signed-off-by: Simon Wunderlich 
---
 net/batman-adv/soft-interface.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/net/batman-adv/soft-interface.c b/net/batman-adv/soft-interface.c
index e508bf5..49e16b6 100644
--- a/net/batman-adv/soft-interface.c
+++ b/net/batman-adv/soft-interface.c
@@ -594,6 +594,7 @@ int batadv_softif_create_vlan(struct batadv_priv *bat_priv, unsigned short vid)
}
 
spin_lock_bh(&bat_priv->softif_vlan_list_lock);
+   kref_get(&vlan->refcount);
hlist_add_head_rcu(&vlan->list, &bat_priv->softif_vlan_list);
spin_unlock_bh(&bat_priv->softif_vlan_list_lock);
 
@@ -604,6 +605,9 @@ int batadv_softif_create_vlan(struct batadv_priv *bat_priv, unsigned short vid)
bat_priv->soft_iface->dev_addr, vid,
BATADV_NULL_IFINDEX, BATADV_NO_MARK);
 
+   /* don't return reference to new softif_vlan */
+   batadv_softif_vlan_put(vlan);
+
return 0;
 }
 
-- 
2.9.3



[PATCH 11/20] batman-adv: Place kref_get for dat_entry near use

2016-08-22 Thread Simon Wunderlich
From: Sven Eckelmann 

It is hard to understand why the refcnt is increased when it isn't done
near the actual place the new reference is used. So using kref_get right
before the place which requires the reference and in the same function
helps to avoid accidental problems caused by incorrect reference counting.

Signed-off-by: Sven Eckelmann 
Signed-off-by: Marek Lindner 
Signed-off-by: Simon Wunderlich 
---
 net/batman-adv/distributed-arp-table.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/batman-adv/distributed-arp-table.c b/net/batman-adv/distributed-arp-table.c
index b1cc8bf..059bc23 100644
--- a/net/batman-adv/distributed-arp-table.c
+++ b/net/batman-adv/distributed-arp-table.c
@@ -343,8 +343,8 @@ static void batadv_dat_entry_add(struct batadv_priv *bat_priv, __be32 ip,
ether_addr_copy(dat_entry->mac_addr, mac_addr);
dat_entry->last_update = jiffies;
kref_init(&dat_entry->refcount);
-   kref_get(&dat_entry->refcount);
 
+   kref_get(&dat_entry->refcount);
hash_added = batadv_hash_add(bat_priv->dat.hash, batadv_compare_dat,
 batadv_hash_dat, dat_entry,
 &dat_entry->hash_entry);
-- 
2.9.3



[PATCH 09/20] batman-adv: Place kref_get for bla_claim near use

2016-08-22 Thread Simon Wunderlich
From: Sven Eckelmann 

It is hard to understand why the refcnt is increased when it isn't done
near the actual place the new reference is used. So using kref_get right
before the place which requires the reference and in the same function
helps to avoid accidental problems caused by incorrect reference counting.

Signed-off-by: Sven Eckelmann 
Signed-off-by: Marek Lindner 
Signed-off-by: Simon Wunderlich 
---
 net/batman-adv/bridge_loop_avoidance.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/net/batman-adv/bridge_loop_avoidance.c b/net/batman-adv/bridge_loop_avoidance.c
index 35ed1d3..b0517a0 100644
--- a/net/batman-adv/bridge_loop_avoidance.c
+++ b/net/batman-adv/bridge_loop_avoidance.c
@@ -718,12 +718,13 @@ static void batadv_bla_add_claim(struct batadv_priv *bat_priv,
claim->lasttime = jiffies;
kref_get(&backbone_gw->refcount);
claim->backbone_gw = backbone_gw;
-
kref_init(&claim->refcount);
-   kref_get(&claim->refcount);
+
batadv_dbg(BATADV_DBG_BLA, bat_priv,
   "bla_add_claim(): adding new entry %pM, vid %d to 
hash ...\n",
   mac, BATADV_PRINT_VID(vid));
+
+   kref_get(&claim->refcount);
hash_added = batadv_hash_add(bat_priv->bla.claim_hash,
 batadv_compare_claim,
 batadv_choose_claim, claim,
-- 
2.9.3



[PATCH 10/20] batman-adv: Place kref_get for bla_backbone_gw near use

2016-08-22 Thread Simon Wunderlich
From: Sven Eckelmann 

It is hard to understand why the refcnt is increased when it isn't done
near the actual place the new reference is used. So using kref_get right
before the place which requires the reference and in the same function
helps to avoid accidental problems caused by incorrect reference counting.

Signed-off-by: Sven Eckelmann 
Signed-off-by: Marek Lindner 
Signed-off-by: Simon Wunderlich 
---
 net/batman-adv/bridge_loop_avoidance.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/net/batman-adv/bridge_loop_avoidance.c 
b/net/batman-adv/bridge_loop_avoidance.c
index b0517a0..1db3c12 100644
--- a/net/batman-adv/bridge_loop_avoidance.c
+++ b/net/batman-adv/bridge_loop_avoidance.c
@@ -526,11 +526,9 @@ batadv_bla_get_backbone_gw(struct batadv_priv *bat_priv, 
u8 *orig,
atomic_set(&entry->wait_periods, 0);
ether_addr_copy(entry->orig, orig);
INIT_WORK(&entry->report_work, batadv_bla_loopdetect_report);
-
-   /* one for the hash, one for returning */
kref_init(&entry->refcount);
-   kref_get(&entry->refcount);
 
+   kref_get(&entry->refcount);
hash_added = batadv_hash_add(bat_priv->bla.backbone_hash,
 batadv_compare_backbone_gw,
 batadv_choose_backbone_gw, entry,
-- 
2.9.3



[PATCH 07/20] batman-adv: Place kref_get for tt_local_entry near use

2016-08-22 Thread Simon Wunderlich
From: Sven Eckelmann 

It is hard to understand why the refcnt is increased when it isn't done
near the actual place the new reference is used. So using kref_get right
before the place which requires the reference and in the same function
helps to avoid accidental problems caused by incorrect reference counting.

Signed-off-by: Sven Eckelmann 
Signed-off-by: Marek Lindner 
Signed-off-by: Simon Wunderlich 
---
 net/batman-adv/translation-table.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/batman-adv/translation-table.c 
b/net/batman-adv/translation-table.c
index 5cc500f..094da1a 100644
--- a/net/batman-adv/translation-table.c
+++ b/net/batman-adv/translation-table.c
@@ -734,7 +734,6 @@ bool batadv_tt_local_add(struct net_device *soft_iface, 
const u8 *addr,
if (batadv_is_wifi_netdev(in_dev))
tt_local->common.flags |= BATADV_TT_CLIENT_WIFI;
kref_init(&tt_local->common.refcount);
-   kref_get(&tt_local->common.refcount);
tt_local->last_seen = jiffies;
tt_local->common.added_at = tt_local->last_seen;
tt_local->vlan = vlan;
@@ -746,6 +745,7 @@ bool batadv_tt_local_add(struct net_device *soft_iface, 
const u8 *addr,
is_multicast_ether_addr(addr))
tt_local->common.flags |= BATADV_TT_CLIENT_NOPURGE;
 
+   kref_get(&tt_local->common.refcount);
hash_added = batadv_hash_add(bat_priv->tt.local_hash, batadv_compare_tt,
 batadv_choose_tt, &tt_local->common,
 &tt_local->common.hash_entry);
-- 
2.9.3



[PATCH 02/20] batman-adv: Place kref_get for orig_ifinfo near use

2016-08-22 Thread Simon Wunderlich
From: Sven Eckelmann 

It is hard to understand why the refcnt is increased when it isn't done
near the actual place the new reference is used. So using kref_get right
before the place which requires the reference and in the same function
helps to avoid accidental problems caused by incorrect reference counting.

Signed-off-by: Sven Eckelmann 
Signed-off-by: Marek Lindner 
Signed-off-by: Simon Wunderlich 
---
 net/batman-adv/originator.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/batman-adv/originator.c b/net/batman-adv/originator.c
index 5108af1..8828964 100644
--- a/net/batman-adv/originator.c
+++ b/net/batman-adv/originator.c
@@ -386,6 +386,7 @@ batadv_orig_ifinfo_new(struct batadv_orig_node *orig_node,
orig_ifinfo->if_outgoing = if_outgoing;
INIT_HLIST_NODE(&orig_ifinfo->list);
kref_init(&orig_ifinfo->refcount);
+
kref_get(&orig_ifinfo->refcount);
hlist_add_head_rcu(&orig_ifinfo->list,
   &orig_node->ifinfo_list);
-- 
2.9.3



[PATCH 12/20] batman-adv: Place kref_get for gw_node near use

2016-08-22 Thread Simon Wunderlich
From: Sven Eckelmann 

It is hard to understand why the refcnt is increased when it isn't done
near the actual place the new reference is used. So using kref_get right
before the place which requires the reference and in the same function
helps to avoid accidental problems caused by incorrect reference counting.

Signed-off-by: Sven Eckelmann 
Signed-off-by: Marek Lindner 
Signed-off-by: Simon Wunderlich 
---
 net/batman-adv/gateway_client.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/net/batman-adv/gateway_client.c b/net/batman-adv/gateway_client.c
index b889e1f..4b51b1c 100644
--- a/net/batman-adv/gateway_client.c
+++ b/net/batman-adv/gateway_client.c
@@ -339,14 +339,15 @@ static void batadv_gw_node_add(struct batadv_priv 
*bat_priv,
if (!gw_node)
return;
 
+   kref_init(&gw_node->refcount);
INIT_HLIST_NODE(&gw_node->list);
kref_get(&orig_node->refcount);
gw_node->orig_node = orig_node;
gw_node->bandwidth_down = ntohl(gateway->bandwidth_down);
gw_node->bandwidth_up = ntohl(gateway->bandwidth_up);
-   kref_init(&gw_node->refcount);
 
spin_lock_bh(&bat_priv->gw.list_lock);
+   kref_get(&gw_node->refcount);
hlist_add_head_rcu(&gw_node->list, &bat_priv->gw.list);
spin_unlock_bh(&bat_priv->gw.list_lock);
 
@@ -357,6 +358,9 @@ static void batadv_gw_node_add(struct batadv_priv *bat_priv,
   ntohl(gateway->bandwidth_down) % 10,
   ntohl(gateway->bandwidth_up) / 10,
   ntohl(gateway->bandwidth_up) % 10);
+
+   /* don't return reference to new gw_node */
+   batadv_gw_node_put(gw_node);
 }
 
 /**
-- 
2.9.3
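
Note on the patch above: the pattern it applies (initialise the refcount, take the extra reference right at the list insert, and drop the creator's reference when the object is not returned) can be sketched in plain userspace C. This is only an illustration under stated assumptions: struct gw_node, gw_list, gw_node_get(), gw_node_put() and gw_node_add() below are hypothetical stand-ins, not the batman-adv or kref implementations, and there is no locking or RCU.

```c
#include <stdlib.h>

/* Hypothetical userspace stand-ins for kref and the gateway list. */
struct gw_node {
	int refcount;
	struct gw_node *next;
};

static struct gw_node *gw_list;	/* list head; holds one reference per entry */

static void gw_node_get(struct gw_node *n)
{
	n->refcount++;
}

/* Drop one reference; free the object when the last one is gone. */
static void gw_node_put(struct gw_node *n)
{
	if (--n->refcount == 0)
		free(n);
}

/* Create a node and hand it to the list without returning it. */
static int gw_node_add(void)
{
	struct gw_node *n = calloc(1, sizeof(*n));

	if (!n)
		return -1;

	n->refcount = 1;	/* like kref_init(): the creator's reference */

	gw_node_get(n);		/* reference for the list, taken at the insert */
	n->next = gw_list;
	gw_list = n;

	gw_node_put(n);		/* node is not returned, so drop the creator's ref */
	return 0;
}
```

After a successful gw_node_add() the only remaining reference is the one held by the list, so reading the function top to bottom is enough to account for every get and put.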



[PATCH 08/20] batman-adv: Place kref_get for tt_common near use

2016-08-22 Thread Simon Wunderlich
From: Sven Eckelmann 

It is hard to understand why the refcnt is increased when it isn't done
near the actual place the new reference is used. So using kref_get right
before the place which requires the reference and in the same function
helps to avoid accidental problems caused by incorrect reference counting.

Signed-off-by: Sven Eckelmann 
Signed-off-by: Marek Lindner 
Signed-off-by: Simon Wunderlich 
---
 net/batman-adv/translation-table.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/batman-adv/translation-table.c 
b/net/batman-adv/translation-table.c
index 094da1a..d94e298 100644
--- a/net/batman-adv/translation-table.c
+++ b/net/batman-adv/translation-table.c
@@ -1645,13 +1645,13 @@ static bool batadv_tt_global_add(struct batadv_priv 
*bat_priv,
if (flags & BATADV_TT_CLIENT_ROAM)
tt_global_entry->roam_at = jiffies;
kref_init(&common->refcount);
-   kref_get(&common->refcount);
common->added_at = jiffies;
 
INIT_HLIST_HEAD(&tt_global_entry->orig_list);
atomic_set(&tt_global_entry->orig_list_count, 0);
spin_lock_init(&tt_global_entry->list_lock);
 
+   kref_get(&common->refcount);
hash_added = batadv_hash_add(bat_priv->tt.global_hash,
 batadv_compare_tt,
 batadv_choose_tt, common,
-- 
2.9.3



[PATCH 19/20] batman-adv: Keep batadv netdev when hardif disappears

2016-08-22 Thread Simon Wunderlich
From: Sven Eckelmann 

Switch-like virtual interfaces such as bridge or openvswitch don't destroy
themselves when all their attached netdevices disappear. Instead they only
remove the link to the unregistered device and keep working until they get
removed manually.

This has the benefit that all configurations for these interfaces are kept
and daemons reacting to rtnl events can just add new slave interfaces
without going through the complete configuration of the switch-like
netdevice.

Handling unregister events of client devices similarly in batman-adv allows
users to drop their current workaround of dummy netdevices attached to
batman-adv soft-interfaces.

Signed-off-by: Sven Eckelmann 
Signed-off-by: Marek Lindner 
Signed-off-by: Simon Wunderlich 
---
 net/batman-adv/hard-interface.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/batman-adv/hard-interface.c b/net/batman-adv/hard-interface.c
index 9284c73..08ce361 100644
--- a/net/batman-adv/hard-interface.c
+++ b/net/batman-adv/hard-interface.c
@@ -725,7 +725,7 @@ static void batadv_hardif_remove_interface(struct 
batadv_hard_iface *hard_iface)
/* first deactivate interface */
if (hard_iface->if_status != BATADV_IF_NOT_IN_USE)
batadv_hardif_disable_interface(hard_iface,
-   BATADV_IF_CLEANUP_AUTO);
+   BATADV_IF_CLEANUP_KEEP);
 
if (hard_iface->if_status != BATADV_IF_NOT_IN_USE)
return;
-- 
2.9.3



[PATCH v2] gianfar: fix size of scatter-gathered frames

2016-08-22 Thread Zefir Kurtisi
The current scatter-gather logic in gianfar is flawed, since
it does not consider that the eTSEC's RxBD 'Data Length' field
is context dependent: for the last fragment it contains the full
frame size, while for all other fragments it contains the
fragment size, which equals the value written to register MRBLR.

This causes data corruption as soon as the hardware starts
to fragment received frames. As a result, the size of
fragmented frames is increased by
(nr_frags - 1) * MRBLR

We first noticed this issue working with DSA, where an ICMP
request sized 1472 bytes causes the scatter-gather logic to
kick in. The full Ethernet frame (1518) gets increased by
DSA (4), GMAC_FCB_LEN (8), and FSL_GIANFAR_DEV_HAS_TIMER
(priv->padding=8) to a total of 1538 octets, which is
fragmented by the hardware and reconstructed by the driver
to a 3074 octet frame.

This patch fixes the problem by adjusting the size of
the last fragment.

It was tested by setting MRBLR to different multiples of
64, proving correct scatter-gather operation on frames
with up to 9000 octets in size.

Signed-off-by: Zefir Kurtisi 
---
 drivers/net/ethernet/freescale/gianfar.c | 20 ++--
 1 file changed, 14 insertions(+), 6 deletions(-)

Changes to v1:
* removed check for total length exceeding sum of fragments
  (as suggested by Claudiu Manoil)
* updated commit log for clarification

diff --git a/drivers/net/ethernet/freescale/gianfar.c 
b/drivers/net/ethernet/freescale/gianfar.c
index d20935d..4b4f5bc 100644
--- a/drivers/net/ethernet/freescale/gianfar.c
+++ b/drivers/net/ethernet/freescale/gianfar.c
@@ -2922,17 +2922,25 @@ static bool gfar_add_rx_frag(struct gfar_rx_buff *rxb, 
u32 lstatus,
 {
unsigned int size = lstatus & BD_LENGTH_MASK;
struct page *page = rxb->page;
+   bool last = !!(lstatus & BD_LFLAG(RXBD_LAST));
 
/* Remove the FCS from the packet length */
-   if (likely(lstatus & BD_LFLAG(RXBD_LAST)))
+   if (last)
size -= ETH_FCS_LEN;
 
-   if (likely(first))
+   if (likely(first)) {
skb_put(skb, size);
-   else
-   skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, page,
-   rxb->page_offset + RXBUF_ALIGNMENT,
-   size, GFAR_RXB_TRUESIZE);
+   } else {
+   /* the last fragments' length contains the full frame length */
+   if (last)
+   size -= skb->len;
+
+   /* in case the last fragment consisted only of the FCS */
+   if (size > 0)
+   skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, page,
+   rxb->page_offset + RXBUF_ALIGNMENT,
+   size, GFAR_RXB_TRUESIZE);
+   }
 
/* try reuse page */
if (unlikely(page_count(page) != 1))
-- 
2.7.4
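
Note on the patch above: the length accounting described in its commit message can be modelled in a small standalone sketch. Everything named here (struct rxbd, frame_len_naive(), frame_len_fixed()) is hypothetical and not part of the gianfar driver; only the rule that the last descriptor reports the full frame length is taken from the patch. Using signed arithmetic for the last fragment is a choice of this sketch.

```c
#define ETH_FCS_LEN 4

/* Hypothetical model of one RX buffer descriptor (not the driver's type). */
struct rxbd {
	unsigned int data_len;	/* 'Data Length' field as reported by the HW */
	int last;		/* non-zero for the last fragment of a frame */
};

/* Sum the descriptor lengths as-is: the accounting the patch replaces. */
static unsigned int frame_len_naive(const struct rxbd *bd, int n)
{
	unsigned int total = 0;

	for (int i = 0; i < n; i++) {
		unsigned int size = bd[i].data_len;

		if (bd[i].last)
			size -= ETH_FCS_LEN;
		total += size;
	}
	return total;
}

/* Treat the last descriptor's length as the length of the whole frame. */
static unsigned int frame_len_fixed(const struct rxbd *bd, int n)
{
	unsigned int total = 0;

	for (int i = 0; i < n; i++) {
		long long size = bd[i].data_len;

		if (bd[i].last)
			size -= ETH_FCS_LEN;
		/* a non-first last fragment: subtract what was already added */
		if (i > 0 && bd[i].last)
			size -= (long long)total;
		/* skip a last fragment that carried nothing but FCS bytes */
		if (size > 0)
			total += (unsigned int)size;
	}
	return total;
}
```

With MRBLR at 1536 and a 3000 octet frame, the two descriptors report 1536 and 3000. frame_len_naive() then yields 4532 while frame_len_fixed() yields 2996; the difference is (2 - 1) * 1536, as stated in the commit message.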



[PATCH 00/20] pull request for net-next: batman-adv 2016-08-16*

2016-08-22 Thread Simon Wunderlich
Hi David,

this is our third (and final, for now) pull request for batman-adv in this
round, with mostly maintainability stuff.

Please pull or let me know of any problem!

Thank you,
  Simon

The following changes since commit 4c09a08b47ffac9aa3bc91870aa54c9ae39d9674:

  batman-adv: Indicate netlink socket can be used with netns. (2016-08-09 
07:54:43 +0200)

are available in the git repository at:

  git://git.open-mesh.org/linux-merge.git tags/batadv-next-for-davem-20160822

for you to fetch changes up to dc1cbd145eecf21209d0322874e1766bcbce3e3f:

  batman-adv: Allow to disable debugfs support (2016-08-09 07:54:54 +0200)


This feature patchset includes the following changes:

 - place kref_get near usage of referenced objects, separate patches
   for various used objects to improve readability and maintainability
   by Sven Eckelmann (18 patches)

 - Keep batadv net device when all hard interfaces disappear, to
   improve situations where tools currently use workarounds, by
   Sven Eckelmann

 - Add an option to disable debugfs support to minimize footprint when
   userspace uses netlink only, by Sven Eckelmann


Sven Eckelmann (20):
  batman-adv: Place kref_get for orig_node_vlan near use
  batman-adv: Place kref_get for orig_ifinfo near use
  batman-adv: Place kref_get for tt_orig_list_entry near use
  batman-adv: Place kref_get for neigh_ifinfo near use
  batman-adv: Place kref_get for neigh_node near use
  batman-adv: Place kref_get for orig_node near use
  batman-adv: Place kref_get for tt_local_entry near use
  batman-adv: Place kref_get for tt_common near use
  batman-adv: Place kref_get for bla_claim near use
  batman-adv: Place kref_get for bla_backbone_gw near use
  batman-adv: Place kref_get for dat_entry near use
  batman-adv: Place kref_get for gw_node near use
  batman-adv: Place kref_get for hard_iface near use
  batman-adv: Place kref_get for softif_vlan near use
  batman-adv: Place kref_get for nc_node near use
  batman-adv: Place kref_get for nc_path near use
  batman-adv: Place kref_get for tvlv_container near use
  batman-adv: Place kref_get for tvlv_handler near use
  batman-adv: Keep batadv netdev when hardif disappears
  batman-adv: Allow to disable debugfs support

 net/batman-adv/Kconfig | 15 +--
 net/batman-adv/Makefile|  4 ++--
 net/batman-adv/bat_algo.c  |  2 ++
 net/batman-adv/bat_iv_ogm.c| 19 ---
 net/batman-adv/bat_v.c | 12 
 net/batman-adv/bat_v_ogm.c |  5 ++---
 net/batman-adv/bridge_loop_avoidance.c | 13 -
 net/batman-adv/debugfs.h   |  2 +-
 net/batman-adv/distributed-arp-table.c |  4 +++-
 net/batman-adv/gateway_client.c| 10 --
 net/batman-adv/hard-interface.c|  8 +++-
 net/batman-adv/icmp_socket.h   | 18 +-
 net/batman-adv/main.c  |  2 ++
 net/batman-adv/multicast.c |  2 ++
 net/batman-adv/network-coding.c| 11 ++-
 net/batman-adv/originator.c| 12 
 net/batman-adv/soft-interface.c|  4 
 net/batman-adv/translation-table.c | 10 +++---
 net/batman-adv/tvlv.c  |  9 +
 net/batman-adv/types.h |  6 ++
 20 files changed, 131 insertions(+), 37 deletions(-)


[PATCH 04/20] batman-adv: Place kref_get for neigh_ifinfo near use

2016-08-22 Thread Simon Wunderlich
From: Sven Eckelmann 

It is hard to understand why the refcnt is increased when it isn't done
near the actual place the new reference is used. So using kref_get right
before the place which requires the reference and in the same function
helps to avoid accidental problems caused by incorrect reference counting.

Signed-off-by: Sven Eckelmann 
Signed-off-by: Marek Lindner 
Signed-off-by: Simon Wunderlich 
---
 net/batman-adv/originator.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/batman-adv/originator.c b/net/batman-adv/originator.c
index 8828964..5e99a6e 100644
--- a/net/batman-adv/originator.c
+++ b/net/batman-adv/originator.c
@@ -460,9 +460,9 @@ batadv_neigh_ifinfo_new(struct batadv_neigh_node *neigh,
 
INIT_HLIST_NODE(&neigh_ifinfo->list);
kref_init(&neigh_ifinfo->refcount);
-   kref_get(&neigh_ifinfo->refcount);
neigh_ifinfo->if_outgoing = if_outgoing;
 
+   kref_get(&neigh_ifinfo->refcount);
hlist_add_head_rcu(&neigh_ifinfo->list, &neigh->ifinfo_list);
 
 out:
-- 
2.9.3



[PATCH 01/20] batman-adv: Place kref_get for orig_node_vlan near use

2016-08-22 Thread Simon Wunderlich
From: Sven Eckelmann 

It is hard to understand why the refcnt is increased when it isn't done
near the actual place the new reference is used. So using kref_get right
before the place which requires the reference and in the same function
helps to avoid accidental problems caused by incorrect reference counting.

Signed-off-by: Sven Eckelmann 
Signed-off-by: Marek Lindner 
Signed-off-by: Simon Wunderlich 
---
 net/batman-adv/originator.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/batman-adv/originator.c b/net/batman-adv/originator.c
index 95c8555..5108af1 100644
--- a/net/batman-adv/originator.c
+++ b/net/batman-adv/originator.c
@@ -133,9 +133,9 @@ batadv_orig_node_vlan_new(struct batadv_orig_node 
*orig_node,
goto out;
 
kref_init(&vlan->refcount);
-   kref_get(&vlan->refcount);
vlan->vid = vid;
 
+   kref_get(&vlan->refcount);
hlist_add_head_rcu(&vlan->list, &orig_node->vlan_list);
 
 out:
-- 
2.9.3



[PATCH 18/20] batman-adv: Place kref_get for tvlv_handler near use

2016-08-22 Thread Simon Wunderlich
From: Sven Eckelmann 

It is hard to understand why the refcnt is increased when it isn't done
near the actual place the new reference is used. So using kref_get right
before the place which requires the reference and in the same function
helps to avoid accidental problems caused by incorrect reference counting.

Signed-off-by: Sven Eckelmann 
Signed-off-by: Marek Lindner 
Signed-off-by: Simon Wunderlich 
---
 net/batman-adv/tvlv.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/net/batman-adv/tvlv.c b/net/batman-adv/tvlv.c
index 3533867..77654f0 100644
--- a/net/batman-adv/tvlv.c
+++ b/net/batman-adv/tvlv.c
@@ -547,8 +547,12 @@ void batadv_tvlv_handler_register(struct batadv_priv 
*bat_priv,
INIT_HLIST_NODE(&tvlv_handler->list);
 
spin_lock_bh(&bat_priv->tvlv.handler_list_lock);
+   kref_get(&tvlv_handler->refcount);
hlist_add_head_rcu(&tvlv_handler->list, &bat_priv->tvlv.handler_list);
spin_unlock_bh(&bat_priv->tvlv.handler_list_lock);
+
+   /* don't return reference to new tvlv_handler */
+   batadv_tvlv_handler_put(tvlv_handler);
 }
 
 /**
-- 
2.9.3



[PATCH 05/20] batman-adv: Place kref_get for neigh_node near use

2016-08-22 Thread Simon Wunderlich
From: Sven Eckelmann 

It is hard to understand why the refcnt is increased when it isn't done
near the actual place the new reference is used. So using kref_get right
before the place which requires the reference and in the same function
helps to avoid accidental problems caused by incorrect reference counting.

Signed-off-by: Sven Eckelmann 
Signed-off-by: Marek Lindner 
Signed-off-by: Simon Wunderlich 
---
 net/batman-adv/originator.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/batman-adv/originator.c b/net/batman-adv/originator.c
index 5e99a6e..0792de8 100644
--- a/net/batman-adv/originator.c
+++ b/net/batman-adv/originator.c
@@ -654,8 +654,8 @@ batadv_neigh_node_create(struct batadv_orig_node *orig_node,
 
/* extra reference for return */
kref_init(&neigh_node->refcount);
-   kref_get(&neigh_node->refcount);
 
+   kref_get(&neigh_node->refcount);
hlist_add_head_rcu(&neigh_node->list, &orig_node->neigh_list);
 
batadv_dbg(BATADV_DBG_BATMAN, orig_node->bat_priv,
-- 
2.9.3



[PATCH 06/20] batman-adv: Place kref_get for orig_node near use

2016-08-22 Thread Simon Wunderlich
From: Sven Eckelmann 

It is hard to understand why the refcnt is increased when it isn't done
near the actual place the new reference is used. So using kref_get right
before the place which requires the reference and in the same function
helps to avoid accidental problems caused by incorrect reference counting.

Signed-off-by: Sven Eckelmann 
Signed-off-by: Marek Lindner 
Signed-off-by: Simon Wunderlich 
---
 net/batman-adv/bat_iv_ogm.c | 7 ---
 net/batman-adv/bat_v_ogm.c  | 5 ++---
 net/batman-adv/gateway_client.c | 2 +-
 net/batman-adv/network-coding.c | 7 +++
 net/batman-adv/originator.c | 1 -
 5 files changed, 10 insertions(+), 12 deletions(-)

diff --git a/net/batman-adv/bat_iv_ogm.c b/net/batman-adv/bat_iv_ogm.c
index 9ed4f1f..3c7900d 100644
--- a/net/batman-adv/bat_iv_ogm.c
+++ b/net/batman-adv/bat_iv_ogm.c
@@ -324,17 +324,18 @@ batadv_iv_ogm_orig_get(struct batadv_priv *bat_priv, 
const u8 *addr)
if (!orig_node->bat_iv.bcast_own_sum)
goto free_orig_node;
 
+   kref_get(&orig_node->refcount);
hash_added = batadv_hash_add(bat_priv->orig_hash, batadv_compare_orig,
 batadv_choose_orig, orig_node,
 &orig_node->hash_entry);
if (hash_added != 0)
-   goto free_orig_node;
+   goto free_orig_node_hash;
 
return orig_node;
 
-free_orig_node:
-   /* free twice, as batadv_orig_node_new sets refcount to 2 */
+free_orig_node_hash:
batadv_orig_node_put(orig_node);
+free_orig_node:
batadv_orig_node_put(orig_node);
 
return NULL;
diff --git a/net/batman-adv/bat_v_ogm.c b/net/batman-adv/bat_v_ogm.c
index 6fbba4e..1aeeadc 100644
--- a/net/batman-adv/bat_v_ogm.c
+++ b/net/batman-adv/bat_v_ogm.c
@@ -73,13 +73,12 @@ struct batadv_orig_node *batadv_v_ogm_orig_get(struct 
batadv_priv *bat_priv,
if (!orig_node)
return NULL;
 
+   kref_get(&orig_node->refcount);
hash_added = batadv_hash_add(bat_priv->orig_hash, batadv_compare_orig,
 batadv_choose_orig, orig_node,
 &orig_node->hash_entry);
if (hash_added != 0) {
-   /* orig_node->refcounter is initialised to 2 by
-* batadv_orig_node_new()
-*/
+   /* remove refcnt for newly created orig_node and hash entry */
batadv_orig_node_put(orig_node);
batadv_orig_node_put(orig_node);
orig_node = NULL;
diff --git a/net/batman-adv/gateway_client.c b/net/batman-adv/gateway_client.c
index c2928c2..b889e1f 100644
--- a/net/batman-adv/gateway_client.c
+++ b/net/batman-adv/gateway_client.c
@@ -339,8 +339,8 @@ static void batadv_gw_node_add(struct batadv_priv *bat_priv,
if (!gw_node)
return;
 
-   kref_get(&orig_node->refcount);
INIT_HLIST_NODE(&gw_node->list);
+   kref_get(&orig_node->refcount);
gw_node->orig_node = orig_node;
gw_node->bandwidth_down = ntohl(gateway->bandwidth_down);
gw_node->bandwidth_up = ntohl(gateway->bandwidth_up);
diff --git a/net/batman-adv/network-coding.c b/net/batman-adv/network-coding.c
index 293ef4f..3814cfb 100644
--- a/net/batman-adv/network-coding.c
+++ b/net/batman-adv/network-coding.c
@@ -856,14 +856,13 @@ batadv_nc_get_nc_node(struct batadv_priv *bat_priv,
if (!nc_node)
return NULL;
 
-   kref_get(&orig_neigh_node->refcount);
-
/* Initialize nc_node */
INIT_LIST_HEAD(&nc_node->list);
-   ether_addr_copy(nc_node->addr, orig_node->orig);
-   nc_node->orig_node = orig_neigh_node;
kref_init(&nc_node->refcount);
kref_get(&nc_node->refcount);
+   ether_addr_copy(nc_node->addr, orig_node->orig);
+   kref_get(&orig_neigh_node->refcount);
+   nc_node->orig_node = orig_neigh_node;
 
/* Select ingoing or outgoing coding node */
if (in_coding) {
diff --git a/net/batman-adv/originator.c b/net/batman-adv/originator.c
index 0792de8..0b7d57a 100644
--- a/net/batman-adv/originator.c
+++ b/net/batman-adv/originator.c
@@ -989,7 +989,6 @@ struct batadv_orig_node *batadv_orig_node_new(struct 
batadv_priv *bat_priv,
 
/* extra reference for return */
kref_init(&orig_node->refcount);
-   kref_get(&orig_node->refcount);
 
orig_node->bat_priv = bat_priv;
ether_addr_copy(orig_node->orig, addr);
-- 
2.9.3
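
Note on the patch above: the two-label error path it introduces in batadv_iv_ogm_orig_get() (drop one reference if the failure happens before the hash insert, two if it happens after) can be sketched as follows. The types and functions are hypothetical userspace stand-ins, and fail_setup / fail_hash are made-up switches that only exist to force each error path in this illustration.

```c
#include <stdlib.h>

struct node {
	int refcount;
};

static int live_nodes;	/* allocated-but-not-freed nodes, for checking only */

/* Allocate a node holding a single reference, the one of the creator. */
static struct node *node_new(void)
{
	struct node *n = malloc(sizeof(*n));

	if (!n)
		return NULL;
	n->refcount = 1;
	live_nodes++;
	return n;
}

static void node_put(struct node *n)
{
	if (--n->refcount == 0) {
		live_nodes--;
		free(n);
	}
}

static struct node *node_get_or_create(int fail_setup, int fail_hash)
{
	struct node *n = node_new();

	if (!n)
		return NULL;

	if (fail_setup)
		goto free_node;		/* only the creator's reference exists */

	n->refcount++;			/* reference for the hash table */
	if (fail_hash)
		goto free_node_hash;	/* both references must be dropped */

	return n;

free_node_hash:
	node_put(n);
free_node:
	node_put(n);
	return NULL;
}
```

Each label undoes exactly one reference, so the number of puts on an error path matches the number of references taken up to the failing step.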



[PATCH v2] gianfar: prevent fragmentation in DSA environments

2016-08-22 Thread Zefir Kurtisi
The eTSEC register MRBLR defines the maximum space in
the RX buffers and is set to 1536 by gianfar. This
reasonably covers the common use case where the MTU
is kept at default 1500. In that case, the largest
Ethernet frame size of 1518 plus an optional
GMAC_FCB_LEN of 8, and an additional padding of 8
to handle FSL_GIANFAR_DEV_HAS_TIMER totals 1534
and fits nicely within the chosen MRBLR.

Alas, if the eTSEC is attached to a DSA enabled switch,
the (E)DSA header extension (4 or 8 bytes) causes every
maximum sized frame to be fragmented by the hardware.

This patch increases the maximum RX buffer size by 8
and rounds up to the next multiple of 64, which the
hardware defines as the RX buffer granularity.

Signed-off-by: Zefir Kurtisi 
---
 drivers/net/ethernet/freescale/gianfar.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Changes to v1:
* used roundup() for better readability of size increase
  (as suggested by Andrew Lunn)
* updated commit log explaining what causes the S/G
  to kick in

diff --git a/drivers/net/ethernet/freescale/gianfar.h 
b/drivers/net/ethernet/freescale/gianfar.h
index 373fd09..6e8a9c8 100644
--- a/drivers/net/ethernet/freescale/gianfar.h
+++ b/drivers/net/ethernet/freescale/gianfar.h
@@ -100,7 +100,8 @@ extern const char gfar_driver_version[];
 #define DEFAULT_RX_LFC_THR  16
 #define DEFAULT_LFC_PTVVAL  4
 
-#define GFAR_RXB_SIZE 1536
+/* prevent fragmenation by HW in DSA environments */
+#define GFAR_RXB_SIZE roundup(1536 + 8, 64)
 #define GFAR_SKBFRAG_SIZE (RXBUF_ALIGNMENT + GFAR_RXB_SIZE \
  + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)))
 #define GFAR_RXB_TRUESIZE 2048
-- 
2.7.4
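
Note on the patch above: the buffer size arithmetic from its commit message can be checked with a short sketch. ROUNDUP() is a hypothetical local helper written for this example (the patch itself uses the kernel's roundup()), and the constant and function names below are assumptions made for illustration; only the numbers come from the commit message.

```c
/* Hypothetical local round-up helper; y must be non-zero. */
#define ROUNDUP(x, y)	((((x) + (y) - 1) / (y)) * (y))

#define ETH_FRAME_MAX	1518	/* largest Ethernet frame at MTU 1500 */
#define FCB_LEN		8	/* GMAC_FCB_LEN from the commit message */
#define TIMER_PAD	8	/* padding for FSL_GIANFAR_DEV_HAS_TIMER */

#define RXB_SIZE_OLD	1536
#define RXB_SIZE_NEW	ROUNDUP(1536 + 8, 64)

/* Worst-case RX buffer space for one frame carrying a tag of tag_len bytes. */
static unsigned int rx_space_needed(unsigned int tag_len)
{
	return ETH_FRAME_MAX + FCB_LEN + TIMER_PAD + tag_len;
}

/* Non-zero if a maximum sized frame no longer fits into one RX buffer. */
static int hw_fragments(unsigned int tag_len, unsigned int rxb_size)
{
	return rx_space_needed(tag_len) > rxb_size;
}
```

Without a tag the requirement is 1534 and fits into 1536. A 4 byte DSA or 8 byte EDSA header raises it to 1538 or 1542, which exceeds the old size, while ROUNDUP(1536 + 8, 64) evaluates to 1600 and covers both.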



[PATCH 03/20] batman-adv: Place kref_get for tt_orig_list_entry near use

2016-08-22 Thread Simon Wunderlich
From: Sven Eckelmann 

It is hard to understand why the refcnt is increased when it isn't done
near the actual place the new reference is used. So using kref_get right
before the place which requires the reference and in the same function
helps to avoid accidental problems caused by incorrect reference counting.

Signed-off-by: Sven Eckelmann 
Signed-off-by: Marek Lindner 
Signed-off-by: Simon Wunderlich 
---
 net/batman-adv/translation-table.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/batman-adv/translation-table.c 
b/net/batman-adv/translation-table.c
index 2080407..5cc500f 100644
--- a/net/batman-adv/translation-table.c
+++ b/net/batman-adv/translation-table.c
@@ -1567,9 +1567,9 @@ batadv_tt_global_orig_entry_add(struct 
batadv_tt_global_entry *tt_global,
orig_entry->orig_node = orig_node;
orig_entry->ttvn = ttvn;
kref_init(&orig_entry->refcount);
-   kref_get(&orig_entry->refcount);
 
spin_lock_bh(&tt_global->list_lock);
+   kref_get(&orig_entry->refcount);
hlist_add_head_rcu(&orig_entry->list,
   &tt_global->orig_list);
spin_unlock_bh(&tt_global->list_lock);
-- 
2.9.3



[PATCH 13/20] batman-adv: Place kref_get for hard_iface near use

2016-08-22 Thread Simon Wunderlich
From: Sven Eckelmann 

It is hard to understand why the refcnt is increased when it isn't done
near the actual place the new reference is used. So using kref_get right
before the place which requires the reference and in the same function
helps to avoid accidental problems caused by incorrect reference counting.

Signed-off-by: Sven Eckelmann 
Signed-off-by: Marek Lindner 
Signed-off-by: Simon Wunderlich 
---
 net/batman-adv/hard-interface.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/net/batman-adv/hard-interface.c b/net/batman-adv/hard-interface.c
index 43c9a3e..9284c73 100644
--- a/net/batman-adv/hard-interface.c
+++ b/net/batman-adv/hard-interface.c
@@ -694,6 +694,7 @@ batadv_hardif_add_interface(struct net_device *net_dev)
INIT_HLIST_HEAD(&hard_iface->neigh_list);
 
spin_lock_init(&hard_iface->neigh_list_lock);
+   kref_init(&hard_iface->refcount);
 
hard_iface->num_bcasts = BATADV_NUM_BCASTS_DEFAULT;
if (batadv_is_wifi_netdev(net_dev))
@@ -701,11 +702,8 @@ batadv_hardif_add_interface(struct net_device *net_dev)
 
batadv_v_hardif_init(hard_iface);
 
-   /* extra reference for return */
-   kref_init(&hard_iface->refcount);
-   kref_get(&hard_iface->refcount);
-
batadv_check_known_mac_addr(hard_iface->net_dev);
+   kref_get(&hard_iface->refcount);
list_add_tail_rcu(&hard_iface->list, &batadv_hardif_list);
 
return hard_iface;
-- 
2.9.3



[PATCH 15/20] batman-adv: Place kref_get for nc_node near use

2016-08-22 Thread Simon Wunderlich
From: Sven Eckelmann 

It is hard to understand why the refcnt is increased when it isn't done
near the actual place the new reference is used. So using kref_get right
before the place which requires the reference and in the same function
helps to avoid accidental problems caused by incorrect reference counting.

Signed-off-by: Sven Eckelmann 
Signed-off-by: Marek Lindner 
Signed-off-by: Simon Wunderlich 
---
 net/batman-adv/network-coding.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/batman-adv/network-coding.c b/net/batman-adv/network-coding.c
index 3814cfb..4f4cfe5 100644
--- a/net/batman-adv/network-coding.c
+++ b/net/batman-adv/network-coding.c
@@ -859,7 +859,6 @@ batadv_nc_get_nc_node(struct batadv_priv *bat_priv,
/* Initialize nc_node */
INIT_LIST_HEAD(&nc_node->list);
kref_init(&nc_node->refcount);
-   kref_get(&nc_node->refcount);
ether_addr_copy(nc_node->addr, orig_node->orig);
kref_get(&orig_neigh_node->refcount);
nc_node->orig_node = orig_neigh_node;
@@ -878,6 +877,7 @@ batadv_nc_get_nc_node(struct batadv_priv *bat_priv,
 
/* Add nc_node to orig_node */
spin_lock_bh(lock);
+   kref_get(&nc_node->refcount);
list_add_tail_rcu(&nc_node->list, list);
spin_unlock_bh(lock);
 
-- 
2.9.3



[PATCH 20/20] batman-adv: Allow to disable debugfs support

2016-08-22 Thread Simon Wunderlich
From: Sven Eckelmann 

The files provided by batman-adv via debugfs are currently being converted
to netlink. Tools which are not yet converted to use the netlink interface
may still rely on the old debugfs files. But systems which have already
upgraded their tools can save some space by disabling this feature. The
default configuration of batman-adv on amd64 can reduce the size of the
module by around 11% when this feature is disabled.

$ size net/batman-adv/batman-adv.ko*
   text    data     bss     dec     hex filename
 150507   10395    4160  165062   284c6 net/batman-adv/batman-adv.ko.y
 137106    7099    2112  146317   23b8d net/batman-adv/batman-adv.ko.n

Signed-off-by: Sven Eckelmann 
Signed-off-by: Marek Lindner 
Signed-off-by: Simon Wunderlich 
---
 net/batman-adv/Kconfig | 15 +--
 net/batman-adv/Makefile|  4 ++--
 net/batman-adv/bat_algo.c  |  2 ++
 net/batman-adv/bat_iv_ogm.c| 12 
 net/batman-adv/bat_v.c | 12 
 net/batman-adv/bridge_loop_avoidance.c |  4 
 net/batman-adv/debugfs.h   |  2 +-
 net/batman-adv/distributed-arp-table.c |  2 ++
 net/batman-adv/gateway_client.c|  2 ++
 net/batman-adv/icmp_socket.h   | 18 +-
 net/batman-adv/main.c  |  2 ++
 net/batman-adv/multicast.c |  2 ++
 net/batman-adv/network-coding.c|  2 ++
 net/batman-adv/originator.c|  4 
 net/batman-adv/translation-table.c |  4 
 net/batman-adv/types.h |  6 ++
 16 files changed, 87 insertions(+), 6 deletions(-)

diff --git a/net/batman-adv/Kconfig b/net/batman-adv/Kconfig
index 833bb14..f20742c 100644
--- a/net/batman-adv/Kconfig
+++ b/net/batman-adv/Kconfig
@@ -73,10 +73,21 @@ config BATMAN_ADV_MCAST
  reduce the air overhead while improving the reliability of
  multicast messages.
 
-config BATMAN_ADV_DEBUG
-   bool "B.A.T.M.A.N. debugging"
+config BATMAN_ADV_DEBUGFS
+   bool "batman-adv debugfs entries"
depends on BATMAN_ADV
depends on DEBUG_FS
+   default y
+   help
+ Enable this to export routing related debug tables via debugfs.
+ The information for each soft-interface and used hard-interface can be
+ found under batman_adv/
+
+ If unsure, say Y.
+
+config BATMAN_ADV_DEBUG
+   bool "B.A.T.M.A.N. debugging"
+   depends on BATMAN_ADV_DEBUGFS
help
  This is an option for use by developers; most people should
  say N here. This enables compilation of support for
diff --git a/net/batman-adv/Makefile b/net/batman-adv/Makefile
index a83fc6c..f724d3c 100644
--- a/net/batman-adv/Makefile
+++ b/net/batman-adv/Makefile
@@ -24,14 +24,14 @@ batman-adv-$(CONFIG_BATMAN_ADV_BATMAN_V) += bat_v_elp.o
 batman-adv-$(CONFIG_BATMAN_ADV_BATMAN_V) += bat_v_ogm.o
 batman-adv-y += bitarray.o
 batman-adv-$(CONFIG_BATMAN_ADV_BLA) += bridge_loop_avoidance.o
-batman-adv-$(CONFIG_DEBUG_FS) += debugfs.o
+batman-adv-$(CONFIG_BATMAN_ADV_DEBUGFS) += debugfs.o
 batman-adv-$(CONFIG_BATMAN_ADV_DAT) += distributed-arp-table.o
 batman-adv-y += fragmentation.o
 batman-adv-y += gateway_client.o
 batman-adv-y += gateway_common.o
 batman-adv-y += hard-interface.o
 batman-adv-y += hash.o
-batman-adv-y += icmp_socket.o
+batman-adv-$(CONFIG_BATMAN_ADV_DEBUGFS) += icmp_socket.o
 batman-adv-$(CONFIG_BATMAN_ADV_DEBUG) += log.o
 batman-adv-y += main.o
 batman-adv-$(CONFIG_BATMAN_ADV_MCAST) += multicast.o
diff --git a/net/batman-adv/bat_algo.c b/net/batman-adv/bat_algo.c
index f2cc50d3..623d043 100644
--- a/net/batman-adv/bat_algo.c
+++ b/net/batman-adv/bat_algo.c
@@ -101,6 +101,7 @@ int batadv_algo_select(struct batadv_priv *bat_priv, char 
*name)
return 0;
 }
 
+#ifdef CONFIG_BATMAN_ADV_DEBUGFS
 int batadv_algo_seq_print_text(struct seq_file *seq, void *offset)
 {
struct batadv_algo_ops *bat_algo_ops;
@@ -113,6 +114,7 @@ int batadv_algo_seq_print_text(struct seq_file *seq, void 
*offset)
 
return 0;
 }
+#endif
 
 static int batadv_param_set_ra(const char *val, const struct kernel_param *kp)
 {
diff --git a/net/batman-adv/bat_iv_ogm.c b/net/batman-adv/bat_iv_ogm.c
index 3c7900d..e2d18d0 100644
--- a/net/batman-adv/bat_iv_ogm.c
+++ b/net/batman-adv/bat_iv_ogm.c
@@ -1855,6 +1855,7 @@ static int batadv_iv_ogm_receive(struct sk_buff *skb,
return NET_RX_SUCCESS;
 }
 
+#ifdef CONFIG_BATMAN_ADV_DEBUGFS
 /**
  * batadv_iv_ogm_orig_print_neigh - print neighbors for the originator table
  * @orig_node: the orig_node for which the neighbors are printed
@@ -1952,6 +1953,7 @@ next:
if (batman_count == 0)
seq_puts(seq, "No batman nodes in range ...\n");
 }
+#endif
 
 /**
  * batadv_iv_ogm_neigh_get_tq_avg - Get the TQ average for a neighbour on a
@@ -2182,6 +2184,7 @@ batadv_iv_ogm_orig_dump(struct sk_buff *msg, struct 
netlink_callback *cb,
cb->args[2] = sub;
 }
 
+#ifdef C

[PATCH 3/4] dsa: mv88e6xxx: Delete ppu timer when removing module

2016-08-22 Thread Andrew Lunn
The PPU method of accessing PHYs makes use of a timer. Make sure this
timer is deleted before unloading the driver.

Reported-by: Jamie Lentin 
Signed-off-by: Andrew Lunn 
---
 drivers/net/dsa/mv88e6xxx/chip.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index b315769aa5be..1d5f9576e62a 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -486,6 +486,11 @@ static void mv88e6xxx_ppu_state_init(struct mv88e6xxx_chip 
*chip)
chip->ppu_timer.function = mv88e6xxx_ppu_reenable_timer;
 }
 
+static void mv88e6xxx_ppu_state_destroy(struct mv88e6xxx_chip *chip)
+{
+   del_timer_sync(&chip->ppu_timer);
+}
+
 static int mv88e6xxx_phy_ppu_read(struct mv88e6xxx_chip *chip, int addr,
  int reg, u16 *val)
 {
@@ -3892,6 +3897,13 @@ static void mv88e6xxx_phy_init(struct mv88e6xxx_chip 
*chip)
}
 }
 
+static void mv88e6xxx_phy_destroy(struct mv88e6xxx_chip *chip)
+{
+   if (mv88e6xxx_has(chip, MV88E6XXX_FLAG_PPU)) {
+   mv88e6xxx_ppu_state_destroy(chip);
+   }
+}
+
 static int mv88e6xxx_smi_init(struct mv88e6xxx_chip *chip,
  struct mii_bus *bus, int sw_addr)
 {
@@ -4080,6 +4092,7 @@ static void mv88e6xxx_remove(struct mdio_device *mdiodev)
struct dsa_switch *ds = dev_get_drvdata(&mdiodev->dev);
struct mv88e6xxx_chip *chip = ds_to_priv(ds);
 
+   mv88e6xxx_phy_destroy(chip);
mv88e6xxx_unregister_switch(chip);
mv88e6xxx_mdio_unregister(chip);
 }
-- 
2.8.1



[PATCH 4/4] net: mv88e6xxx: Enable PORT_CONTROL_FORWARD_UNKNOWN for DSA-tagged CPU ports

2016-08-22 Thread Andrew Lunn
From: Jamie Lentin 

Without it, a mv88e6131 switch will not forward incoming unicast
packets to the CPU port.

Signed-off-by: Jamie Lentin 
---
 drivers/net/dsa/mv88e6xxx/chip.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 1d5f9576e62a..82d45165803c 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -2490,11 +2490,11 @@ static int mv88e6xxx_setup_port(struct mv88e6xxx_chip 
*chip, int port)
if (dsa_is_cpu_port(ds, port)) {
if (mv88e6xxx_has(chip, MV88E6XXX_FLAG_EDSA))
reg |= PORT_CONTROL_FRAME_ETHER_TYPE_DSA |
-   PORT_CONTROL_FORWARD_UNKNOWN |
PORT_CONTROL_FORWARD_UNKNOWN_MC;
else
reg |= PORT_CONTROL_DSA_TAG;
-   reg |= PORT_CONTROL_EGRESS_ADD_TAG;
+   reg |= PORT_CONTROL_EGRESS_ADD_TAG |
+   PORT_CONTROL_FORWARD_UNKNOWN;
}
if (dsa_is_dsa_port(ds, port)) {
if (mv88e6xxx_6095_family(chip) ||
-- 
2.8.1



Re: [PATCH -next v2] net: phy: Add missing of_node_put() in xgmiitorgmii_probe()

2016-08-22 Thread Kedari Appana
Hi,


On Mon, Aug 22, 2016 at 4:16 AM, Wei Yongjun  wrote:
> This node pointer is returned by of_parse_phandle() with its
> refcount incremented in this function. Call of_node_put() on it
> before exiting this function.
>
> This is detected by Coccinelle semantic patch.
>
> Signed-off-by: Wei Yongjun 

Reviewed-by: Kedareswara rao Appana 

Thanks,
Kedar.

> ---
> v1 -> v2: release it unconditionally, as suggested by Andrew and David
> ---
>  drivers/net/phy/xilinx_gmii2rgmii.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/net/phy/xilinx_gmii2rgmii.c b/drivers/net/phy/xilinx_gmii2rgmii.c
> index cad6e19..73b50f3 100644
> --- a/drivers/net/phy/xilinx_gmii2rgmii.c
> +++ b/drivers/net/phy/xilinx_gmii2rgmii.c
> @@ -73,6 +73,7 @@ int xgmiitorgmii_probe(struct mdio_device *mdiodev)
> }
>
> priv->phy_dev = of_phy_find_device(phy_node);
> +   of_node_put(phy_node);
> if (!priv->phy_dev) {
> dev_info(dev, "Couldn't find phydev\n");
> return -EPROBE_DEFER;
>


[PATCH 2/4] net: dsa: mv88e6xxx: Fix support for DSA tagging for older switches.

2016-08-22 Thread Andrew Lunn
Older chips only support DSA tagging on the CPU port. New devices
support both DSA and EDSA. The driver needs to tell the core the tag
protocol to use, and configure the switch for what is available.

Signed-off-by: Andrew Lunn 
---
 drivers/net/dsa/mv88e6xxx/Kconfig |  1 +
 drivers/net/dsa/mv88e6xxx/chip.c  | 41 +++
 drivers/net/dsa/mv88e6xxx/mv88e6xxx.h | 16 +++---
 3 files changed, 31 insertions(+), 27 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx/Kconfig b/drivers/net/dsa/mv88e6xxx/Kconfig
index 490bc06f993e..ac77737bbd87 100644
--- a/drivers/net/dsa/mv88e6xxx/Kconfig
+++ b/drivers/net/dsa/mv88e6xxx/Kconfig
@@ -2,6 +2,7 @@ config NET_DSA_MV88E6XXX
tristate "Marvell 88E6xxx Ethernet switch fabric support"
depends on NET_DSA
select NET_DSA_TAG_EDSA
+   select NET_DSA_TAG_DSA
help
  This driver adds support for most of the Marvell 88E6xxx models of
  Ethernet switch chips, except 88E6060.
diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 63cad6c00bc7..b315769aa5be 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -2483,28 +2483,13 @@ static int mv88e6xxx_setup_port(struct mv88e6xxx_chip 
*chip, int port)
PORT_CONTROL_USE_TAG | PORT_CONTROL_USE_IP |
PORT_CONTROL_STATE_FORWARDING;
if (dsa_is_cpu_port(ds, port)) {
-   if (mv88e6xxx_6095_family(chip) || mv88e6xxx_6185_family(chip))
-   reg |= PORT_CONTROL_DSA_TAG;
-   if (mv88e6xxx_6352_family(chip) ||
-   mv88e6xxx_6351_family(chip) ||
-   mv88e6xxx_6165_family(chip) ||
-   mv88e6xxx_6097_family(chip) ||
-   mv88e6xxx_6320_family(chip)) {
+   if (mv88e6xxx_has(chip, MV88E6XXX_FLAG_EDSA))
reg |= PORT_CONTROL_FRAME_ETHER_TYPE_DSA |
PORT_CONTROL_FORWARD_UNKNOWN |
PORT_CONTROL_FORWARD_UNKNOWN_MC;
-   }
-
-   if (mv88e6xxx_6352_family(chip) ||
-   mv88e6xxx_6351_family(chip) ||
-   mv88e6xxx_6165_family(chip) ||
-   mv88e6xxx_6097_family(chip) ||
-   mv88e6xxx_6095_family(chip) ||
-   mv88e6xxx_6065_family(chip) ||
-   mv88e6xxx_6185_family(chip) ||
-   mv88e6xxx_6320_family(chip)) {
-   reg |= PORT_CONTROL_EGRESS_ADD_TAG;
-   }
+   else
+   reg |= PORT_CONTROL_DSA_TAG;
+   reg |= PORT_CONTROL_EGRESS_ADD_TAG;
}
if (dsa_is_dsa_port(ds, port)) {
if (mv88e6xxx_6095_family(chip) ||
@@ -2632,10 +2617,13 @@ static int mv88e6xxx_setup_port(struct mv88e6xxx_chip 
*chip, int port)
/* Port Ethertype: use the Ethertype DSA Ethertype
 * value.
 */
-   ret = _mv88e6xxx_reg_write(chip, REG_PORT(port),
-  PORT_ETH_TYPE, ETH_P_EDSA);
-   if (ret)
-   return ret;
+   if (mv88e6xxx_has(chip, MV88E6XXX_FLAG_EDSA)) {
+   ret = _mv88e6xxx_reg_write(chip, REG_PORT(port),
+  PORT_ETH_TYPE, ETH_P_EDSA);
+   if (ret)
+   return ret;
+   }
+
/* Tag Remap: use an identity 802.1p prio -> switch
 * prio mapping.
 */
@@ -3926,7 +3914,12 @@ static int mv88e6xxx_smi_init(struct mv88e6xxx_chip 
*chip,
 
 static enum dsa_tag_protocol mv88e6xxx_get_tag_protocol(struct dsa_switch *ds)
 {
-   return DSA_TAG_PROTO_EDSA;
+   struct mv88e6xxx_chip *chip = ds_to_priv(ds);
+
+   if (mv88e6xxx_has(chip, MV88E6XXX_FLAG_EDSA))
+   return DSA_TAG_PROTO_EDSA;
+
+   return DSA_TAG_PROTO_DSA;
 }
 
 static const char *mv88e6xxx_drv_probe(struct device *dsa_dev,
diff --git a/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h b/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
index 1f9bab5891b1..e157d4f69864 100644
--- a/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
+++ b/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
@@ -386,6 +386,12 @@ enum mv88e6xxx_family {
 };
 
 enum mv88e6xxx_cap {
+   /* Two different tag protocols can be used by the driver. All
+* switches support DSA, but only later generations support
+* EDSA.
+*/
+   MV88E6XXX_CAP_EDSA,
+
/* Energy Efficient Ethernet.
 */
MV88E6XXX_CAP_EEE,
@@ -447,6 +453,7 @@ enum mv88e6xxx_cap {
 };
 
 /* Bitmask of capabilities */
+#define MV88E6XXX_FLAG_EDSA BIT(MV88E6XXX_CAP_EDSA)
 #define MV88E6XXX_FLAG_EEE BIT(MV88E6XXX_CAP_EEE)
 
 #define MV88E6XXX_FLAG_SMI_CMD BIT(MV88E6XXX_CAP_SMI_CMD)
@@ -547,7 +554,8 @@ enu

[PATCH 1/4] net: dsa: Allow the DSA driver to indicate the tag protocol

2016-08-22 Thread Andrew Lunn
DSA drivers may drive different families of switches which need
different tag protocols. Rather than hard code the tag protocol in the
driver structure, have a callback for the DSA core to call.

Signed-off-by: Andrew Lunn 
---
 drivers/net/dsa/b53/b53_common.c | 7 ++-
 drivers/net/dsa/bcm_sf2.c| 7 ++-
 drivers/net/dsa/mv88e6060.c  | 7 ++-
 drivers/net/dsa/mv88e6xxx/chip.c | 7 ++-
 include/net/dsa.h| 5 +++--
 net/dsa/dsa.c| 5 -
 net/dsa/dsa2.c   | 4 +++-
 7 files changed, 34 insertions(+), 8 deletions(-)

diff --git a/drivers/net/dsa/b53/b53_common.c b/drivers/net/dsa/b53/b53_common.c
index 38ee10de7884..65ecb51f99e5 100644
--- a/drivers/net/dsa/b53/b53_common.c
+++ b/drivers/net/dsa/b53/b53_common.c
@@ -1373,8 +1373,13 @@ static void b53_br_set_stp_state(struct dsa_switch *ds, 
int port,
b53_write8(dev, B53_CTRL_PAGE, B53_PORT_CTRL(port), reg);
 }
 
+static enum dsa_tag_protocol b53_get_tag_protocol(struct dsa_switch *ds)
+{
+   return DSA_TAG_PROTO_NONE;
+}
+
 static struct dsa_switch_driver b53_switch_ops = {
-   .tag_protocol   = DSA_TAG_PROTO_NONE,
+   .get_tag_protocol   = b53_get_tag_protocol,
.setup  = b53_setup,
.set_addr   = b53_set_addr,
.get_strings= b53_get_strings,
diff --git a/drivers/net/dsa/bcm_sf2.c b/drivers/net/dsa/bcm_sf2.c
index 8e6fe13dbec3..b47a74b37a42 100644
--- a/drivers/net/dsa/bcm_sf2.c
+++ b/drivers/net/dsa/bcm_sf2.c
@@ -136,6 +136,11 @@ static int bcm_sf2_sw_get_sset_count(struct dsa_switch *ds)
return BCM_SF2_STATS_SIZE;
 }
 
+static enum dsa_tag_protocol bcm_sf2_sw_get_tag_protocol(struct dsa_switch *ds)
+{
+   return DSA_TAG_PROTO_BRCM;
+}
+
 static void bcm_sf2_imp_vlan_setup(struct dsa_switch *ds, int cpu_port)
 {
struct bcm_sf2_priv *priv = ds_to_priv(ds);
@@ -1577,8 +1582,8 @@ static int bcm_sf2_sw_setup(struct dsa_switch *ds)
 }
 
 static struct dsa_switch_driver bcm_sf2_switch_driver = {
-   .tag_protocol   = DSA_TAG_PROTO_BRCM,
.setup  = bcm_sf2_sw_setup,
+   .get_tag_protocol   = bcm_sf2_sw_get_tag_protocol,
.set_addr   = bcm_sf2_sw_set_addr,
.get_phy_flags  = bcm_sf2_sw_get_phy_flags,
.get_strings= bcm_sf2_sw_get_strings,
diff --git a/drivers/net/dsa/mv88e6060.c b/drivers/net/dsa/mv88e6060.c
index e36b40886bd8..1fdfbf3a50bc 100644
--- a/drivers/net/dsa/mv88e6060.c
+++ b/drivers/net/dsa/mv88e6060.c
@@ -69,6 +69,11 @@ static const char *mv88e6060_get_name(struct mii_bus *bus, 
int sw_addr)
return NULL;
 }
 
+static enum dsa_tag_protocol mv88e6060_get_tag_protocol(struct dsa_switch *ds)
+{
+   return DSA_TAG_PROTO_TRAILER;
+}
+
 static const char *mv88e6060_drv_probe(struct device *dsa_dev,
   struct device *host_dev, int sw_addr,
   void **_priv)
@@ -248,7 +253,7 @@ mv88e6060_phy_write(struct dsa_switch *ds, int port, int 
regnum, u16 val)
 }
 
 static struct dsa_switch_driver mv88e6060_switch_driver = {
-   .tag_protocol   = DSA_TAG_PROTO_TRAILER,
+   .get_tag_protocol = mv88e6060_get_tag_protocol,
.probe  = mv88e6060_drv_probe,
.setup  = mv88e6060_setup,
.set_addr   = mv88e6060_set_addr,
diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 014b52bd72f1..63cad6c00bc7 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -3924,6 +3924,11 @@ static int mv88e6xxx_smi_init(struct mv88e6xxx_chip 
*chip,
return 0;
 }
 
+static enum dsa_tag_protocol mv88e6xxx_get_tag_protocol(struct dsa_switch *ds)
+{
+   return DSA_TAG_PROTO_EDSA;
+}
+
 static const char *mv88e6xxx_drv_probe(struct device *dsa_dev,
   struct device *host_dev, int sw_addr,
   void **priv)
@@ -3967,8 +3972,8 @@ free:
 }
 
 static struct dsa_switch_driver mv88e6xxx_switch_driver = {
-   .tag_protocol   = DSA_TAG_PROTO_EDSA,
.probe  = mv88e6xxx_drv_probe,
+   .get_tag_protocol   = mv88e6xxx_get_tag_protocol,
.setup  = mv88e6xxx_setup,
.set_addr   = mv88e6xxx_set_addr,
.adjust_link= mv88e6xxx_adjust_link,
diff --git a/include/net/dsa.h b/include/net/dsa.h
index d00c392bc9f8..8ca2684c5358 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -239,14 +239,15 @@ struct switchdev_obj_port_vlan;
 struct dsa_switch_driver {
struct list_head list;
 
-   enum dsa_tag_protocol   tag_protocol;
-
/*
 * Probing and setup.
 */
const char  *(*probe)(struct device *dsa_dev,
  struct device *host_dev, int sw_addr,
  void *

[PATCH 0/4] Fix MV88E6131 tagging

2016-08-22 Thread Andrew Lunn
Marvell has two different tagging protocols for frames passed to a
switch. There is the older DSA and the newer EDSA. Somewhere along the
way, we broke support for switches which only support DSA, by trying
to configure them to use EDSA. These patches add back support for
switches which only support DSA, by allowing the drivers to
dynamically indicate the tagging protocol they support to the DSA
core. This needs to be dynamic since the mv88e6xxx has to support two
protocols.
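
As a rough illustration of the idea (a user-space sketch, not the code in
these patches): the driver exposes a callback and answers it from a per-chip
capability bitmask, instead of storing one fixed protocol in the ops
structure. All names below (struct chip, CHIP_FLAG_EDSA, struct switch_ops,
chip_get_tag_protocol) are made up for the example.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the kernel types; names are illustrative only. */
enum tag_proto { TAG_PROTO_DSA, TAG_PROTO_EDSA };

#define CHIP_FLAG_EDSA (1u << 0)	/* chip can use the newer EDSA tagging */

struct chip {
	uint32_t flags;			/* capability bitmask for this switch model */
};

struct switch_ops {
	/* Queried at setup time rather than read from a fixed field. */
	enum tag_proto (*get_tag_protocol)(const struct chip *chip);
};

static enum tag_proto chip_get_tag_protocol(const struct chip *chip)
{
	if (chip->flags & CHIP_FLAG_EDSA)
		return TAG_PROTO_EDSA;

	return TAG_PROTO_DSA;		/* older chips only understand DSA */
}

static const struct switch_ops chip_ops = {
	.get_tag_protocol = chip_get_tag_protocol,
};
```

With this shape, one driver can serve both generations: a chip without the
EDSA capability bit reports DSA, and one with it reports EDSA.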

Thanks go to Jamie Lentin for reporting the problem, helping debug it,
providing some of the fix, and testing.

Andrew Lunn (3):
  net: dsa: Allow the DSA driver to indicate the tag protocol
  net: dsa: mv88e6xxx: Fix support for DSA tagging for older switches.
  dsa: mv88e6xxx: Delete ppu timer when removing module

Jamie Lentin (1):
  net: mv88e6xxx: Enable PORT_CONTROL_FORWARD_UNKNOWN for DSA-tagged CPU
ports

 drivers/net/dsa/b53/b53_common.c  |  7 +++-
 drivers/net/dsa/bcm_sf2.c |  7 +++-
 drivers/net/dsa/mv88e6060.c   |  7 +++-
 drivers/net/dsa/mv88e6xxx/Kconfig |  1 +
 drivers/net/dsa/mv88e6xxx/chip.c  | 61 +--
 drivers/net/dsa/mv88e6xxx/mv88e6xxx.h | 16 +++--
 include/net/dsa.h |  5 +--
 net/dsa/dsa.c |  5 ++-
 net/dsa/dsa2.c|  4 ++-
 9 files changed, 78 insertions(+), 35 deletions(-)

-- 
2.8.1



Re: [PATCH net 2/2] sctp: not copying duplicate addrs to the assoc's bind address list

2016-08-22 Thread Neil Horman
On Sat, Aug 20, 2016 at 02:41:01PM +0800, Xin Long wrote:
> > Ah, I see what you're doing.  Ok, this makes some sense, at least on the
> > receive side, when you get a cookie unpacked and modify the remote peers
> > address list, it makes sense to check for duplicates.  On the local side
> > however, I would, instead of checking it when the list gets copied, I'd
> > check it when the master list gets updated (in the NETDEV_UP event notifier
> > for the local address list,
> 
> I was thinking about checking it in NETDEV_UP; yes, that can ensure the
> master list has no duplicate addresses.  But what if two events for the
> same address come from different NICs (though I can't point out a valid
> use case) and we filter there?
> 
That I think would be a bug in the protocol code.  For the ipv4 case, all
addresses are owned by the system and the same addresses added to multiple
interfaces should not be allowed.  The same is true of the ipv6 case.  The only
exception there is a link local address and that should still be unique within
the context of an address/dev tuple.

> Later, sctp may receive a NETDEV_DOWN event and remove that addr from
> the master list, but it shouldn't be removed, as another local NIC
> still has that addr.
> 
> That's why I have to leave the master list alone and only check when the
> addresses are actually being bound to the asoc addr list.
> 
> > and the sctp_add_bind_addr function for the endpoint address list).  That way
> 
> As to the endpoint address list, sctp handles binding the address 'ANY'
> differently from the assoc address list (note that this issue only
> happens when binding the address 'ANY'). Instead of copying the master
> address list to the endpoint, it only adds the address 'ANY' to the EP
> address list. Only when starting a connection and creating the assoc does
> it copy the master address list to the ASOC.
> 
> So there is no need to do it in sctp_add_bind_addr for the endpoint address
> list. Besides, sctp_add_bind_addr is supposed to be called after checking
> for duplicate addresses (I got that from sctp_do_bind()). :-)
> 
> > you can keep that nested for loop out of the send path on the local system.
> >
> >
> 


Re: [PATCH 1/4] net: dsa: Allow the DSA driver to indicate the tag protocol

2016-08-22 Thread Vivien Didelot
Hi Andrew,

Andrew Lunn  writes:

> DSA drivers may drive different families of switches which need
> different tag protocols. Rather than hard code the tag protocol in the
> driver structure, have a callback for the DSA core to call.
>
> Signed-off-by: Andrew Lunn 

Great, this DSA structure finally becomes operation-only.

Reviewed-by: Vivien Didelot 

Thanks,

Vivien


[PATCH net-next 1/3] net/ip_tunnels: Introduce tunnel_id_to_key32() and key32_to_tunnel_id()

2016-08-22 Thread Amir Vadai
Add utility functions to convert a 32-bit key into a 64-bit tunnel ID and
vice versa.
These functions will be used instead of duplicating code in GRE and VXLAN,
and in tc act_iptunnel, which will be introduced in a following patch in
this patchset.
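
To make the layout concrete, here is a small user-space sketch of the same
conversion (an illustration only, not the kernel helpers): the 32-bit key
occupies the least-significant 32 bits of the big-endian 64-bit tunnel ID,
so in network byte order it is the last four bytes. The types be32_t and
be64_t and the function names are hypothetical; working on raw bytes avoids
the endianness #ifdef needed by the in-kernel versions.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical types: both values are held as raw bytes in network order. */
typedef struct { uint8_t b[4]; } be32_t;
typedef struct { uint8_t b[8]; } be64_t;

/* Widen a 32-bit key to a 64-bit tunnel ID. */
static be64_t key32_to_id64(be32_t key)
{
	be64_t id;

	memset(id.b, 0, 4);		/* upper 32 bits are zero */
	memcpy(id.b + 4, key.b, 4);	/* key fills the least-significant 32 bits */
	return id;
}

/* Return the least-significant 32 bits of a 64-bit tunnel ID. */
static be32_t id64_to_key32(be64_t id)
{
	be32_t key;

	memcpy(key.b, id.b + 4, 4);	/* the upper 32 bits are dropped */
	return key;
}
```

Converting a key to a tunnel ID and back returns the original key; going the
other way loses whatever was in the upper 32 bits of the ID.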

Signed-off-by: Amir Vadai 
---
 drivers/net/vxlan.c  |  4 ++--
 include/net/ip_tunnels.h | 19 +++
 include/net/vxlan.h  | 18 --
 net/ipv4/ip_gre.c| 23 ++-
 4 files changed, 23 insertions(+), 41 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index c0dda6fc0921..b1ddf8f756d4 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1294,7 +1294,7 @@ static int vxlan_rcv(struct sock *sk, struct sk_buff *skb)
struct metadata_dst *tun_dst;
 
tun_dst = udp_tun_rx_dst(skb, vxlan_get_sk_family(vs), 
TUNNEL_KEY,
-vxlan_vni_to_tun_id(vni), sizeof(*md));
+key32_to_tunnel_id(vni), sizeof(*md));
 
if (!tun_dst)
goto drop;
@@ -1948,7 +1948,7 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct 
net_device *dev,
goto drop;
}
dst_port = info->key.tp_dst ? : vxlan->cfg.dst_port;
-   vni = vxlan_tun_id_to_vni(info->key.tun_id);
+   vni = tunnel_id_to_key32(info->key.tun_id);
remote_ip.sa.sa_family = ip_tunnel_info_af(info);
if (remote_ip.sa.sa_family == AF_INET) {
remote_ip.sin.sin_addr.s_addr = info->key.u.ipv4.dst;
diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
index a5e7035fb93f..d8afe4400373 100644
--- a/include/net/ip_tunnels.h
+++ b/include/net/ip_tunnels.h
@@ -222,6 +222,25 @@ static inline unsigned short ip_tunnel_info_af(const 
struct ip_tunnel_info
return tun_info->mode & IP_TUNNEL_INFO_IPV6 ? AF_INET6 : AF_INET;
 }
 
+static inline __be64 key32_to_tunnel_id(__be32 key)
+{
+#ifdef __BIG_ENDIAN
+   return (__force __be64)((__force u32)key);
+#else
+   return (__force __be64)((__force u64)key << 32);
+#endif
+}
+
+/* Returns the least-significant 32 bits of a __be64. */
+static inline __be32 tunnel_id_to_key32(__be64 x)
+{
+#ifdef __BIG_ENDIAN
+   return (__force __be32)x;
+#else
+   return (__force __be32)((__force u64)x >> 32);
+#endif
+}
+
 #ifdef CONFIG_INET
 
 int ip_tunnel_init(struct net_device *dev);
diff --git a/include/net/vxlan.h b/include/net/vxlan.h
index b96d0360c095..0255613a54a4 100644
--- a/include/net/vxlan.h
+++ b/include/net/vxlan.h
@@ -350,24 +350,6 @@ static inline __be32 vxlan_vni_field(__be32 vni)
 #endif
 }
 
-static inline __be32 vxlan_tun_id_to_vni(__be64 tun_id)
-{
-#if defined(__BIG_ENDIAN)
-   return (__force __be32)tun_id;
-#else
-   return (__force __be32)((__force u64)tun_id >> 32);
-#endif
-}
-
-static inline __be64 vxlan_vni_to_tun_id(__be32 vni)
-{
-#if defined(__BIG_ENDIAN)
-   return (__force __be64)vni;
-#else
-   return (__force __be64)((u64)(__force u32)vni << 32);
-#endif
-}
-
 static inline size_t vxlan_rco_start(__be32 vni_field)
 {
return be32_to_cpu(vni_field & VXLAN_RCO_MASK) << VXLAN_RCO_SHIFT;
diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index 113cc43df789..576f705d8180 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -246,25 +246,6 @@ static void gre_err(struct sk_buff *skb, u32 info)
ipgre_err(skb, info, &tpi);
 }
 
-static __be64 key_to_tunnel_id(__be32 key)
-{
-#ifdef __BIG_ENDIAN
-   return (__force __be64)((__force u32)key);
-#else
-   return (__force __be64)((__force u64)key << 32);
-#endif
-}
-
-/* Returns the least-significant 32 bits of a __be64. */
-static __be32 tunnel_id_to_key(__be64 x)
-{
-#ifdef __BIG_ENDIAN
-   return (__force __be32)x;
-#else
-   return (__force __be32)((__force u64)x >> 32);
-#endif
-}
-
 static int __ipgre_rcv(struct sk_buff *skb, const struct tnl_ptk_info *tpi,
   struct ip_tunnel_net *itn, int hdr_len, bool raw_proto)
 {
@@ -290,7 +271,7 @@ static int __ipgre_rcv(struct sk_buff *skb, const struct 
tnl_ptk_info *tpi,
__be64 tun_id;
 
flags = tpi->flags & (TUNNEL_CSUM | TUNNEL_KEY);
-   tun_id = key_to_tunnel_id(tpi->key);
+   tun_id = key32_to_tunnel_id(tpi->key);
tun_dst = ip_tun_rx_dst(skb, flags, tun_id, 0);
if (!tun_dst)
return PACKET_REJECT;
@@ -446,7 +427,7 @@ static void gre_fb_xmit(struct sk_buff *skb, struct 
net_device *dev,
 
flags = tun_info->key.tun_flags & (TUNNEL_CSUM | TUNNEL_KEY);
gre_build_header(skb, tunnel_hlen, flags, proto,
-tunnel_id_to_key(tun_info->key.tun_id), 0);
+tunnel_id_to_key32(tun_info->key.tun_id), 0);
 
df = key->tun_f

[PATCH net-next 2/3] net/sched: cls_flower: Classify packet in ip tunnels

2016-08-22 Thread Amir Vadai
Introduce classifying by metadata extracted by the tunnel device.
Outer header fields - source/dest ip and tunnel id, are extracted from
the metadata when classifying.

For example, the following adds a filter on the ingress Qdisc of a shared
vxlan device named 'vxlan0'. It matches packets with outer src ip
11.11.0.2, dst ip 11.11.0.1 and tunnel id 11, and forwards them to the
tap device 'vnet0' (after the metadata is released):

$ tc filter add dev vxlan0 protocol ip parent : \
flower \
  enc_src_ip 11.11.0.2 \
  enc_dst_ip 11.11.0.1 \
  enc_key_id 11 \
  dst_ip 11.11.11.1 \
action iptunnel decap \
action mirred egress redirect dev vnet0

The action iptunnel will be introduced in the next patch in this
series.
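
Conceptually, the match on the outer headers is a masked comparison of the
fields taken from the tunnel metadata against the values given in the
filter. The sketch below shows only that comparison, in user space and with
hypothetical names (struct enc_fields, enc_fields_match); the classifier
itself masks the whole key and looks it up in a hash table, which is not
reproduced here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of the outer-header fields taken from the
 * tunnel metadata; this is not the kernel's struct fl_flow_key.
 */
struct enc_fields {
	uint32_t src_ip;	/* outer source IPv4 address */
	uint32_t dst_ip;	/* outer destination IPv4 address */
	uint32_t key_id;	/* tunnel id, e.g. the VXLAN VNI */
};

/* True when pkt equals want in every bit selected by mask. */
static bool enc_fields_match(const struct enc_fields *pkt,
			     const struct enc_fields *want,
			     const struct enc_fields *mask)
{
	return ((pkt->src_ip ^ want->src_ip) & mask->src_ip) == 0 &&
	       ((pkt->dst_ip ^ want->dst_ip) & mask->dst_ip) == 0 &&
	       ((pkt->key_id ^ want->key_id) & mask->key_id) == 0;
}
```

A field whose mask is all zeros is ignored, so a filter that gives only
enc_key_id would match any outer addresses.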

Signed-off-by: Amir Vadai 
---
 include/uapi/linux/pkt_cls.h | 11 +
 net/sched/cls_flower.c   | 59 ++--
 2 files changed, 68 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index 51b5b247fb5a..f9c287c67eae 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -431,6 +431,17 @@ enum {
TCA_FLOWER_KEY_VLAN_ID,
TCA_FLOWER_KEY_VLAN_PRIO,
TCA_FLOWER_KEY_VLAN_ETH_TYPE,
+
+   TCA_FLOWER_KEY_ENC_KEY_ID,  /* be32 */
+   TCA_FLOWER_KEY_ENC_IPV4_SRC,/* be32 */
+   TCA_FLOWER_KEY_ENC_IPV4_SRC_MASK,/* be32 */
+   TCA_FLOWER_KEY_ENC_IPV4_DST,/* be32 */
+   TCA_FLOWER_KEY_ENC_IPV4_DST_MASK,/* be32 */
+   TCA_FLOWER_KEY_ENC_IPV6_SRC,/* struct in6_addr */
+   TCA_FLOWER_KEY_ENC_IPV6_SRC_MASK,/* struct in6_addr */
+   TCA_FLOWER_KEY_ENC_IPV6_DST,/* struct in6_addr */
+   TCA_FLOWER_KEY_ENC_IPV6_DST_MASK,/* struct in6_addr */
+
__TCA_FLOWER_MAX,
 };
 
diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index 1e11e57e6947..75f719944fa8 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -23,6 +23,9 @@
 #include 
 #include 
 
+#include 
+#include 
+
 struct fl_flow_key {
int indev_ifindex;
struct flow_dissector_key_control control;
@@ -35,6 +38,8 @@ struct fl_flow_key {
struct flow_dissector_key_ipv6_addrs ipv6;
};
struct flow_dissector_key_ports tp;
+   struct flow_dissector_key_ipv4_addrs enc_ipv4;
+   struct flow_dissector_key_keyid enc_key_id;
 } __aligned(BITS_PER_LONG / 8); /* Ensure that we can do comparisons as longs. 
*/
 
 struct fl_flow_mask_range {
@@ -124,11 +129,22 @@ static int fl_classify(struct sk_buff *skb, const struct 
tcf_proto *tp,
struct cls_fl_filter *f;
struct fl_flow_key skb_key;
struct fl_flow_key skb_mkey;
+   struct ip_tunnel_info *info;
 
if (!atomic_read(&head->ht.nelems))
return -1;
 
fl_clear_masked_range(&skb_key, &head->mask);
+
+   info = skb_tunnel_info(skb);
+   if (info) {
+   struct ip_tunnel_key *key = &info->key;
+
+   skb_key.enc_ipv4.src = key->u.ipv4.src;
+   skb_key.enc_ipv4.dst = key->u.ipv4.dst;
+   skb_key.enc_key_id.keyid = tunnel_id_to_key32(key->tun_id);
+   }
+
skb_key.indev_ifindex = skb->skb_iif;
/* skb_flow_dissect() does not set n_proto in case an unknown protocol,
 * so do it rather here.
@@ -297,7 +313,11 @@ static const struct nla_policy fl_policy[TCA_FLOWER_MAX + 
1] = {
[TCA_FLOWER_KEY_VLAN_ID]= { .type = NLA_U16 },
[TCA_FLOWER_KEY_VLAN_PRIO]  = { .type = NLA_U8 },
[TCA_FLOWER_KEY_VLAN_ETH_TYPE]  = { .type = NLA_U16 },
-
+   [TCA_FLOWER_KEY_ENC_KEY_ID] = { .type = NLA_U32 },
+   [TCA_FLOWER_KEY_ENC_IPV4_SRC]   = { .type = NLA_U32 },
+   [TCA_FLOWER_KEY_ENC_IPV4_SRC_MASK] = { .type = NLA_U32 },
+   [TCA_FLOWER_KEY_ENC_IPV4_DST]   = { .type = NLA_U32 },
+   [TCA_FLOWER_KEY_ENC_IPV4_DST_MASK] = { .type = NLA_U32 },
 };
 
 static void fl_set_key_val(struct nlattr **tb,
@@ -345,7 +365,6 @@ static int fl_set_key(struct net *net, struct nlattr **tb,
mask->indev_ifindex = 0x;
}
 #endif
-
fl_set_key_val(tb, key->eth.dst, TCA_FLOWER_KEY_ETH_DST,
   mask->eth.dst, TCA_FLOWER_KEY_ETH_DST_MASK,
   sizeof(key->eth.dst));
@@ -408,6 +427,29 @@ static int fl_set_key(struct net *net, struct nlattr **tb,
   sizeof(key->tp.dst));
}
 
+   if (tb[TCA_FLOWER_KEY_ENC_IPV4_SRC] ||
+   tb[TCA_FLOWER_KEY_ENC_IPV4_DST] ||
+   tb[TCA_FLOWER_KEY_ENC_KEY_ID]) {
+   fl_set_key_val(tb, &key->enc_ipv4.src,
+  TCA_FLOWER_KEY_ENC_IPV4_SRC,
+  &mask->enc_ipv4.src,
+  TCA_FLOWER_KEY_ENC_IPV4_SRC_MASK,
+  sizeof(key->enc_ipv4.src));
+   fl_set_key_val(tb, &key->enc_ipv4.dst,
+  

[PATCH net-next 0/3] net/sched: iptunnel encap/decap/classify using TC

2016-08-22 Thread Amir Vadai
Hi,

This patchset introduces iptunnel support using the TC subsystem.

In the decap flow, it enables the user to redirect packets from a shared tunnel
device and classify by outer and inner headers. The outer headers are extracted
from the metadata and used by the flower filter. A new action, act_iptunnel,
releases the metadata.

In the encap flow, act_iptunnel creates a metadata object to be used by the
shared tunnel device. The actual redirection to the tunnel device is done using
act_mirred.

For example:
$ tc qdisc add dev vnet0 ingress
$ tc filter add dev vnet0 protocol ip parent : \
flower \
  ip_proto 1 \
action iptunnel encap \
  src_ip 11.11.0.1 \
dst_ip 11.11.0.2 \
id 11 \
action mirred egress redirect dev vxlan0
  
$ tc qdisc add dev vxlan0 ingress
$ tc filter add dev vxlan0 protocol ip parent : \
flower \
  enc_src_ip 11.11.0.2 \
enc_dst_ip 11.11.0.1 \
enc_key_id 11 \
action iptunnel decap \
  action mirred egress redirect dev vnet0

Note: the current implementation supports ipv4 only, but it should be easy
  to add ipv6 later on.

Amir

Changes from RFC:
- Add a new action instead of making mirred too complex
- No need to specify UDP port in action - it is already in the tunnel device
configuration
- Added a decap operation to drop tunnel metadata

Amir Vadai (3):
  net/ip_tunnels: Introduce tunnel_id_to_key32() and
key32_to_tunnel_id()
  net/sched: cls_flower: Classify packet in ip tunnels
  net/sched: Introduce act_iptunnel

 drivers/net/vxlan.c |   4 +-
 include/net/ip_tunnels.h|  19 +++
 include/net/tc_act/tc_iptunnel.h|  24 +++
 include/net/vxlan.h |  18 --
 include/uapi/linux/pkt_cls.h|  11 ++
 include/uapi/linux/tc_act/tc_iptunnel.h |  40 +
 net/ipv4/ip_gre.c   |  23 +--
 net/sched/Kconfig   |  11 ++
 net/sched/Makefile  |   1 +
 net/sched/act_iptunnel.c| 292 
 net/sched/cls_flower.c  |  59 ++-
 11 files changed, 459 insertions(+), 43 deletions(-)
 create mode 100644 include/net/tc_act/tc_iptunnel.h
 create mode 100644 include/uapi/linux/tc_act/tc_iptunnel.h
 create mode 100644 net/sched/act_iptunnel.c

-- 
2.9.0


