date:20180517

Re: [RFC PATCH bpf-next 05/12] xdp: add MEM_TYPE_ZERO_COPY

2018-05-17 Thread Björn Töpel

2018-05-17 7:57 GMT+02:00 Jesper Dangaard Brouer :
> On Tue, 15 May 2018 21:06:08 +0200
> Björn Töpel  wrote:
>
>> @@ -82,6 +88,10 @@ struct xdp_frame *convert_to_xdp_frame(struct xdp_buff 
>> *xdp)
>>   int metasize;
>>   int headroom;
>>
>> + // XXX implement clone, copy, use "native" MEM_TYPE
>> + if (xdp->rxq->mem.type == MEM_TYPE_ZERO_COPY)
>> + return NULL;
>> +
>
> There are going to be significant tradeoffs between AF_XDP zero-copy and
> the copy-variant.  The copy-variant still has very attractive
> RX-performance, and other benefits like not exposing unrelated packets
> to userspace (but limiting these to the XDP filter).
>
> Thus, as a user I would like to choose between AF_XDP zero-copy and
> copy-variant. Even if my NIC supports zero-copy, I can be interested in
> only enabling the copy-variant. This patchset doesn't let me choose.
>
> How do we expose this to userspace?
> (Maybe as simple as an sockaddr_xdp->sxdp_flags flag?)
>

We planned to add these flags later, but I think you're right that
it's better to do that right away.

If we try to follow the behavior of the XDP netlink interface: pick
"the best mode" when there are no flags. A user who wants to "force" a
mode selects, say, copy, and gets an error if that's not supported.
Four new flags?

diff --git a/include/uapi/linux/if_xdp.h b/include/uapi/linux/if_xdp.h
index 77b88c4efe98..ce1f710847b7 100644
--- a/include/uapi/linux/if_xdp.h
+++ b/include/uapi/linux/if_xdp.h
@@ -22,7 +22,11 @@
 #include 

 /* Options for the sxdp_flags field */
-#define XDP_SHARED_UMEM 1
+#define XDP_SHARED_UMEM (1U << 0)
+#define XDP_COPY_TX_UMEM (1U << 1)
+#define XDP_ZEROCOPY_TX_UMEM (1U << 2)
+#define XDP_COPY_RX_UMEM (1U << 3)
+#define XDP_ZEROCOPY_RX_UMEM (1U << 4)

 struct sockaddr_xdp {
 __u16 sxdp_family;

A better way?
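
One possible reading of the semantics described above, as a minimal
sketch and not the actual bind() implementation: with no mode flag set,
zero-copy is picked when the driver supports it and copy otherwise; a
forced mode either succeeds or returns an error. The helper name
xsk_pick_rx_mode(), the enum xsk_mode, the drv_has_zc parameter and the
choice of -EINVAL/-EOPNOTSUPP are assumptions made for illustration.

```c
#include <errno.h>
#include <stdbool.h>

/* Flag values as proposed in the diff above. */
#define XDP_SHARED_UMEM		(1U << 0)
#define XDP_COPY_TX_UMEM	(1U << 1)
#define XDP_ZEROCOPY_TX_UMEM	(1U << 2)
#define XDP_COPY_RX_UMEM	(1U << 3)
#define XDP_ZEROCOPY_RX_UMEM	(1U << 4)

/* Hypothetical result type, not part of the patch. */
enum xsk_mode { XSK_MODE_COPY, XSK_MODE_ZEROCOPY };

/*
 * Hypothetical helper: resolve the Rx mode from sxdp_flags.
 * Returns 0 and fills *mode on success, a negative errno otherwise.
 */
static int xsk_pick_rx_mode(unsigned int flags, bool drv_has_zc,
			    enum xsk_mode *mode)
{
	bool copy = flags & XDP_COPY_RX_UMEM;
	bool zc = flags & XDP_ZEROCOPY_RX_UMEM;

	if (copy && zc)
		return -EINVAL;		/* contradictory request */
	if (zc && !drv_has_zc)
		return -EOPNOTSUPP;	/* forced mode not available */

	if (copy)
		*mode = XSK_MODE_COPY;		/* forced copy */
	else if (zc || drv_has_zc)
		*mode = XSK_MODE_ZEROCOPY;	/* forced, or "best mode" */
	else
		*mode = XSK_MODE_COPY;		/* best mode without ZC */
	return 0;
}
```

With flags == 0 the helper never fails; an error is only returned when
the caller explicitly asks for something the driver cannot provide. The
Tx flags could be handled the same way.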




> --
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Principal Kernel Engineer at Red Hat
>   LinkedIn: http://www.linkedin.com/in/brouer

Re: [RFC PATCH bpf-next 05/12] xdp: add MEM_TYPE_ZERO_COPY

2018-05-17 Thread Björn Töpel

2018-05-17 9:08 GMT+02:00 Björn Töpel :
> 2018-05-17 7:57 GMT+02:00 Jesper Dangaard Brouer :
>> On Tue, 15 May 2018 21:06:08 +0200
>> Björn Töpel  wrote:
>>
>>> @@ -82,6 +88,10 @@ struct xdp_frame *convert_to_xdp_frame(struct xdp_buff 
>>> *xdp)
>>>   int metasize;
>>>   int headroom;
>>>
>>> + // XXX implement clone, copy, use "native" MEM_TYPE
>>> + if (xdp->rxq->mem.type == MEM_TYPE_ZERO_COPY)
>>> + return NULL;
>>> +
>>
>> There are going to be significant tradeoffs between AF_XDP zero-copy and
>> the copy-variant.  The copy-variant still has very attractive
>> RX-performance, and other benefits like not exposing unrelated packets
>> to userspace (but limiting these to the XDP filter).
>>
>> Thus, as a user I would like to choose between AF_XDP zero-copy and
>> copy-variant. Even if my NIC supports zero-copy, I can be interested in
>> only enabling the copy-variant. This patchset doesn't let me choose.
>>
>> How do we expose this to userspace?
>> (Maybe as simple as an sockaddr_xdp->sxdp_flags flag?)
>>
>
> We planned to add these flags later, but I think you're right that
> it's better to do that right away.
>
> If we try to follow the behavior of the XDP netlink interface: pick
> "the best mode" when there are no flags. A user who wants to "force" a
> mode selects, say, copy, and gets an error if that's not supported.
> Four new flags?
>
> diff --git a/include/uapi/linux/if_xdp.h b/include/uapi/linux/if_xdp.h
> index 77b88c4efe98..ce1f710847b7 100644
> --- a/include/uapi/linux/if_xdp.h
> +++ b/include/uapi/linux/if_xdp.h
> @@ -22,7 +22,11 @@
>  #include 
>
>  /* Options for the sxdp_flags field */
> -#define XDP_SHARED_UMEM 1
> +#define XDP_SHARED_UMEM (1U << 0)
> +#define XDP_COPY_TX_UMEM (1U << 1)
> +#define XDP_ZEROCOPY_TX_UMEM (1U << 2)
> +#define XDP_COPY_RX_UMEM (1U << 3)
> +#define XDP_ZEROCOPY_RX_UMEM (1U << 4)
>
>  struct sockaddr_xdp {
>  __u16 sxdp_family;
>
> A better way?
>

...but without the _UMEM suffix obviously.

>
>
>
>> --
>> Best regards,
>>   Jesper Dangaard Brouer
>>   MSc.CS, Principal Kernel Engineer at Red Hat
>>   LinkedIn: http://www.linkedin.com/in/brouer

Re: [PATCH 2/3] sh_eth: add EDMR.NBST support

2018-05-17 Thread Simon Horman

On Wed, May 16, 2018 at 10:58:26PM +0300, Sergei Shtylyov wrote:
> The R-Car V3H (AKA R8A77980) GEther controller adds the DMA burst mode bit
> (NBST) in EDMR and the manual says to always set it before doing any DMA.
> 
> Based on the original (and large) patch by Vladimir Barinov.
> 
> Signed-off-by: Vladimir Barinov 
> Signed-off-by: Sergei Shtylyov 

Reviewed-by: Simon Horman 

> 
> ---
>  drivers/net/ethernet/renesas/sh_eth.c |    4 ++++
>  drivers/net/ethernet/renesas/sh_eth.h |    2 ++
>  2 files changed, 6 insertions(+)
> 
> Index: net-next/drivers/net/ethernet/renesas/sh_eth.c
> ===
> --- net-next.orig/drivers/net/ethernet/renesas/sh_eth.c
> +++ net-next/drivers/net/ethernet/renesas/sh_eth.c
> @@ -1434,6 +1434,10 @@ static int sh_eth_dev_init(struct net_de
>  
>   sh_eth_write(ndev, mdp->cd->trscer_err_mask, TRSCER);
>  
> + /* DMA transfer burst mode */
> + if (mdp->cd->nbst)
> + sh_eth_modify(ndev, EDMR, EDMR_NBST, EDMR_NBST);
> +
>   if (mdp->cd->bculr)
>   sh_eth_write(ndev, 0x800, BCULR);   /* Burst sycle set */

Not related to this patch, but: s/sycle/cycle/

>  
> Index: net-next/drivers/net/ethernet/renesas/sh_eth.h
> ===
> --- net-next.orig/drivers/net/ethernet/renesas/sh_eth.h
> +++ net-next/drivers/net/ethernet/renesas/sh_eth.h
> @@ -184,6 +184,7 @@ enum GECMR_BIT {
>  
>  /* EDMR */
>  enum DMAC_M_BIT {
> + EDMR_NBST = 0x80,

It would be nice to start using BIT() in this file.
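
For illustration, a sketch of how the single-bit EDMR values quoted in
this hunk could look with BIT(). The BIT() definition is reproduced
locally only so the snippet builds outside the kernel tree, and the
bit-position comments are derived from the hex values in the patch, not
from the hardware manual.

```c
/* Local equivalent of the kernel's BIT() helper. */
#define BIT(nr) (1UL << (nr))

/* EDMR, single-bit fields only */
enum DMAC_M_BIT {
	EDMR_NBST = BIT(7),	/* 0x80, DMA transfer burst mode */
	EDMR_EL   = BIT(6),	/* 0x40, little endian */
	EDMR_DL1  = BIT(5),	/* 0x20 */
	EDMR_DL0  = BIT(4),	/* 0x10 */
};
```

Multi-bit values such as EDMR_SRST_GETHER = 0x03 would stay as plain
masks.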

>   EDMR_EL = 0x40, /* Litte endian */
>   EDMR_DL1 = 0x20, EDMR_DL0 = 0x10,
>   EDMR_SRST_GETHER = 0x03,
> @@ -505,6 +506,7 @@ struct sh_eth_cpu_data {
>   unsigned bculr:1;   /* EtherC have BCULR */
>   unsigned tsu:1; /* EtherC have TSU */
>   unsigned hw_swap:1; /* E-DMAC have DE bit in EDMR */
> + unsigned nbst:1;/* E-DMAC has NBST bit in EDMR */
>   unsigned rpadir:1;  /* E-DMAC have RPADIR */
>   unsigned no_trimd:1;/* E-DMAC DO NOT have TRIMD */
>   unsigned no_ade:1;  /* E-DMAC DO NOT have ADE bit in EESR */
>

Re: [PATCH 1/3] sh_eth: add RGMII support

2018-05-17 Thread Sergei Shtylyov


On 5/16/2018 11:37 PM, Andrew Lunn wrote:

>>> What about
>>> PHY_INTERFACE_MODE_RGMII_ID,
>>> PHY_INTERFACE_MODE_RGMII_RXID,
>>> PHY_INTERFACE_MODE_RGMII_TXID,
>>
>> Oops, totally forgot about those... :-/
>
> Everybody does. I keep intending to write an email template for
> this, and phy_interface_mode_is_rgmii() :-)

   The latter doesn't fit for *switch*, anyway.

> Andrew

MBR, Sergei
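
To make the point about *switch* concrete, a small sketch comparing a
range check in the spirit of phy_interface_mode_is_rgmii() with the
switch form, where every RGMII variant needs its own case label. The
enum below is a trimmed stand-in for the kernel's phy_interface_t, and
is_rgmii() / is_rgmii_switch() are hypothetical names.

```c
#include <stdbool.h>

/* Stand-in for phy_interface_t; the four RGMII variants are contiguous. */
typedef enum {
	PHY_INTERFACE_MODE_MII,
	PHY_INTERFACE_MODE_GMII,
	PHY_INTERFACE_MODE_RMII,
	PHY_INTERFACE_MODE_RGMII,
	PHY_INTERFACE_MODE_RGMII_ID,
	PHY_INTERFACE_MODE_RGMII_RXID,
	PHY_INTERFACE_MODE_RGMII_TXID,
} phy_interface_t;

/* Range check: covers all four variants in one expression. */
static bool is_rgmii(phy_interface_t mode)
{
	return mode >= PHY_INTERFACE_MODE_RGMII &&
	       mode <= PHY_INTERFACE_MODE_RGMII_TXID;
}

/* switch form: a helper call cannot be a case label, so list them all. */
static bool is_rgmii_switch(phy_interface_t mode)
{
	switch (mode) {
	case PHY_INTERFACE_MODE_RGMII:
	case PHY_INTERFACE_MODE_RGMII_ID:
	case PHY_INTERFACE_MODE_RGMII_RXID:
	case PHY_INTERFACE_MODE_RGMII_TXID:
		return true;
	default:
		return false;
	}
}
```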

[PATCH] net: qcom/emac: Allocate buffers from local node

2018-05-17 Thread Hemanth Puranik

Currently we use non-NUMA-aware allocation for the TPD and RRD buffers.
This patch switches to NUMA-friendly allocation from the device's node.

Signed-off-by: Hemanth Puranik 
---
 drivers/net/ethernet/qualcomm/emac/emac-mac.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/qualcomm/emac/emac-mac.c 
b/drivers/net/ethernet/qualcomm/emac/emac-mac.c
index 092718a..c3df86a 100644
--- a/drivers/net/ethernet/qualcomm/emac/emac-mac.c
+++ b/drivers/net/ethernet/qualcomm/emac/emac-mac.c
@@ -684,9 +684,10 @@ static int emac_tx_q_desc_alloc(struct emac_adapter *adpt,
 {
struct emac_ring_header *ring_header = &adpt->ring_header;
size_t size;
+   int node = dev_to_node(adpt->netdev->dev.parent);
 
size = sizeof(struct emac_buffer) * tx_q->tpd.count;
-   tx_q->tpd.tpbuff = kzalloc(size, GFP_KERNEL);
+   tx_q->tpd.tpbuff = kzalloc_node(size, GFP_KERNEL, node);
if (!tx_q->tpd.tpbuff)
return -ENOMEM;
 
@@ -725,9 +726,10 @@ static int emac_rx_descs_alloc(struct emac_adapter *adpt)
struct emac_ring_header *ring_header = &adpt->ring_header;
struct emac_rx_queue *rx_q = &adpt->rx_q;
size_t size;
+   int node = dev_to_node(adpt->netdev->dev.parent);
 
size = sizeof(struct emac_buffer) * rx_q->rfd.count;
-   rx_q->rfd.rfbuff = kzalloc(size, GFP_KERNEL);
+   rx_q->rfd.rfbuff = kzalloc_node(size, GFP_KERNEL, node);
if (!rx_q->rfd.rfbuff)
return -ENOMEM;
 
-- 
Qualcomm Datacenter Technologies as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.

[PATCH net-next v3 06/10] net: mvpp2: 2500baseX support

2018-05-17 Thread Antoine Tenart

This patch adds the 2500Base-X PHY mode support in the Marvell PPv2
driver. 2500Base-X is quite close to 1000Base-X and SGMII modes and uses
nearly the same code path.

Signed-off-by: Antoine Tenart 
---
 drivers/net/ethernet/marvell/mvpp2.c | 51 +---
 1 file changed, 39 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvpp2.c 
b/drivers/net/ethernet/marvell/mvpp2.c
index ece61f1727e4..5e580482769e 100644
--- a/drivers/net/ethernet/marvell/mvpp2.c
+++ b/drivers/net/ethernet/marvell/mvpp2.c
@@ -4871,6 +4871,7 @@ static int mvpp22_gop_init(struct mvpp2_port *port)
break;
case PHY_INTERFACE_MODE_SGMII:
case PHY_INTERFACE_MODE_1000BASEX:
+   case PHY_INTERFACE_MODE_2500BASEX:
mvpp22_gop_init_sgmii(port);
break;
case PHY_INTERFACE_MODE_10GKR:
@@ -4909,7 +4910,8 @@ static void mvpp22_gop_unmask_irq(struct mvpp2_port *port)
 
if (phy_interface_mode_is_rgmii(port->phy_interface) ||
port->phy_interface == PHY_INTERFACE_MODE_SGMII ||
-   port->phy_interface == PHY_INTERFACE_MODE_1000BASEX) {
+   port->phy_interface == PHY_INTERFACE_MODE_1000BASEX ||
+   port->phy_interface == PHY_INTERFACE_MODE_2500BASEX) {
/* Enable the GMAC link status irq for this port */
val = readl(port->base + MVPP22_GMAC_INT_SUM_MASK);
val |= MVPP22_GMAC_INT_SUM_MASK_LINK_STAT;
@@ -4940,7 +4942,8 @@ static void mvpp22_gop_mask_irq(struct mvpp2_port *port)
 
if (phy_interface_mode_is_rgmii(port->phy_interface) ||
port->phy_interface == PHY_INTERFACE_MODE_SGMII ||
-   port->phy_interface == PHY_INTERFACE_MODE_1000BASEX) {
+   port->phy_interface == PHY_INTERFACE_MODE_1000BASEX ||
+   port->phy_interface == PHY_INTERFACE_MODE_2500BASEX) {
val = readl(port->base + MVPP22_GMAC_INT_SUM_MASK);
val &= ~MVPP22_GMAC_INT_SUM_MASK_LINK_STAT;
writel(val, port->base + MVPP22_GMAC_INT_SUM_MASK);
@@ -4953,7 +4956,8 @@ static void mvpp22_gop_setup_irq(struct mvpp2_port *port)
 
if (phy_interface_mode_is_rgmii(port->phy_interface) ||
port->phy_interface == PHY_INTERFACE_MODE_SGMII ||
-   port->phy_interface == PHY_INTERFACE_MODE_1000BASEX) {
+   port->phy_interface == PHY_INTERFACE_MODE_1000BASEX ||
+   port->phy_interface == PHY_INTERFACE_MODE_2500BASEX) {
val = readl(port->base + MVPP22_GMAC_INT_MASK);
val |= MVPP22_GMAC_INT_MASK_LINK_STAT;
writel(val, port->base + MVPP22_GMAC_INT_MASK);
@@ -4968,6 +4972,16 @@ static void mvpp22_gop_setup_irq(struct mvpp2_port *port)
mvpp22_gop_unmask_irq(port);
 }
 
+/* Sets the PHY mode of the COMPHY (which configures the serdes lanes).
+ *
+ * The PHY mode used by the PPv2 driver comes from the network subsystem, while
+ * the one given to the COMPHY comes from the generic PHY subsystem. Hence they
+ * differ.
+ *
+ * The COMPHY configures the serdes lanes regardless of the actual use of the
+ * lanes by the physical layer. This is why configurations like
+ * "PPv2 (2500BaseX) - COMPHY (2500SGMII)" are valid.
+ */
 static int mvpp22_comphy_init(struct mvpp2_port *port)
 {
enum phy_mode mode;
@@ -4981,6 +4995,9 @@ static int mvpp22_comphy_init(struct mvpp2_port *port)
case PHY_INTERFACE_MODE_1000BASEX:
mode = PHY_MODE_SGMII;
break;
+   case PHY_INTERFACE_MODE_2500BASEX:
+   mode = PHY_MODE_2500SGMII;
+   break;
case PHY_INTERFACE_MODE_10GKR:
mode = PHY_MODE_10GKR;
break;
@@ -5062,7 +5079,8 @@ static void mvpp2_port_loopback_set(struct mvpp2_port 
*port,
val &= ~MVPP2_GMAC_GMII_LB_EN_MASK;
 
if (port->phy_interface == PHY_INTERFACE_MODE_SGMII ||
-   port->phy_interface == PHY_INTERFACE_MODE_1000BASEX)
+   port->phy_interface == PHY_INTERFACE_MODE_1000BASEX ||
+   port->phy_interface == PHY_INTERFACE_MODE_2500BASEX)
val |= MVPP2_GMAC_PCS_LB_EN_MASK;
else
val &= ~MVPP2_GMAC_PCS_LB_EN_MASK;
@@ -6273,7 +6291,8 @@ static irqreturn_t mvpp2_link_status_isr(int irq, void 
*dev_id)
}
} else if (phy_interface_mode_is_rgmii(port->phy_interface) ||
   port->phy_interface == PHY_INTERFACE_MODE_SGMII ||
-  port->phy_interface == PHY_INTERFACE_MODE_1000BASEX) {
+  port->phy_interface == PHY_INTERFACE_MODE_1000BASEX ||
+  port->phy_interface == PHY_INTERFACE_MODE_2500BASEX) {
val = readl(port->base + MVPP22_GMAC_INT_STAT);
if (val & MVPP22_GMAC_INT_STAT_LINK) {
event = true;
@@ -8056,8 +8075,10 @@ static void mvpp2_phylink_validate(struct net_device 
*dev,
phylink_set(mask, 1ba
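
The mapping described in the mvpp22_comphy_init() comment above can be
sketched in isolation. This is only an illustration: the two enums are
trimmed stand-ins for the kernel's phy_interface_t and enum phy_mode,
comphy_mode_from_interface() is a hypothetical name, and returning
-EINVAL for unsupported modes is an assumption.

```c
#include <errno.h>

/* Stand-ins for the network (phy_interface_t) and generic PHY enums. */
typedef enum {
	PHY_INTERFACE_MODE_RGMII,
	PHY_INTERFACE_MODE_SGMII,
	PHY_INTERFACE_MODE_1000BASEX,
	PHY_INTERFACE_MODE_2500BASEX,
	PHY_INTERFACE_MODE_10GKR,
} phy_interface_t;

enum phy_mode {
	PHY_MODE_SGMII,
	PHY_MODE_2500SGMII,
	PHY_MODE_10GKR,
};

/* Translate the PPv2 interface mode into the serdes mode for the COMPHY. */
static int comphy_mode_from_interface(phy_interface_t iface,
				      enum phy_mode *mode)
{
	switch (iface) {
	case PHY_INTERFACE_MODE_SGMII:
	case PHY_INTERFACE_MODE_1000BASEX:
		*mode = PHY_MODE_SGMII;		/* same serdes setup */
		return 0;
	case PHY_INTERFACE_MODE_2500BASEX:
		*mode = PHY_MODE_2500SGMII;
		return 0;
	case PHY_INTERFACE_MODE_10GKR:
		*mode = PHY_MODE_10GKR;
		return 0;
	default:
		return -EINVAL;			/* no serdes lane involved */
	}
}
```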

[PATCH net-next v3 10/10] arm64: dts: marvell: 7040-db: describe the 10G interface as fixed-link

2018-05-17 Thread Antoine Tenart

This patch adds a fixed-link node to the 10G interface of the 7040-db
board. This is required as the mvpp2 driver now uses phylink. The best
solution would have been to describe the SFP cage but it is not
wired correctly, and thus unusable, so we chose to use fixed-link
instead.

Signed-off-by: Antoine Tenart 
---
 arch/arm64/boot/dts/marvell/armada-7040-db.dts | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/arm64/boot/dts/marvell/armada-7040-db.dts 
b/arch/arm64/boot/dts/marvell/armada-7040-db.dts
index d6bec058a30a..412efdb46e7c 100644
--- a/arch/arm64/boot/dts/marvell/armada-7040-db.dts
+++ b/arch/arm64/boot/dts/marvell/armada-7040-db.dts
@@ -242,6 +242,11 @@
phy-mode = "10gbase-kr";
/* Generic PHY, providing serdes lanes */
phys = <&cp0_comphy2 0>;
+
+   fixed-link {
+   speed = <10000>;
+   full-duplex;
+   };
 };
 
 &cp0_eth1 {
-- 
2.17.0

[PATCH net-next v3 09/10] arm64: dts: marvell: 8040-db: describe the 10G interfaces as fixed-link

2018-05-17 Thread Antoine Tenart

This patch adds a fixed-link node to both 10G interfaces of the 8040-db
board. This is required as the mvpp2 driver now uses phylink. The best
solution would have been to describe the SFP cages but they are not
wired correctly, and thus unusable, so we chose to use fixed-link
instead.

Signed-off-by: Antoine Tenart 
---
 arch/arm64/boot/dts/marvell/armada-8040-db.dts | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/arch/arm64/boot/dts/marvell/armada-8040-db.dts 
b/arch/arm64/boot/dts/marvell/armada-8040-db.dts
index 5689fb23bbab..1bac437369a1 100644
--- a/arch/arm64/boot/dts/marvell/armada-8040-db.dts
+++ b/arch/arm64/boot/dts/marvell/armada-8040-db.dts
@@ -177,6 +177,11 @@
 &cp0_eth0 {
status = "okay";
phy-mode = "10gbase-kr";
+
+   fixed-link {
+   speed = <10000>;
+   full-duplex;
+   };
 };
 
 &cp0_eth2 {
@@ -303,6 +308,11 @@
 &cp1_eth0 {
status = "okay";
phy-mode = "10gbase-kr";
+
+   fixed-link {
+   speed = <10000>;
+   full-duplex;
+   };
 };
 
 &cp1_eth1 {
-- 
2.17.0

[PATCH net-next v3 01/10] net: mvpp2: align the ethtool ops definition

2018-05-17 Thread Antoine Tenart

Cosmetic patch to align the ethtool functions in the ops definition. This
patch does not change the driver's behaviour in any way.

Signed-off-by: Antoine Tenart 
---
 drivers/net/ethernet/marvell/mvpp2.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvpp2.c 
b/drivers/net/ethernet/marvell/mvpp2.c
index 6f410235987c..77dd91e3d962 100644
--- a/drivers/net/ethernet/marvell/mvpp2.c
+++ b/drivers/net/ethernet/marvell/mvpp2.c
@@ -7859,18 +7859,18 @@ static const struct net_device_ops mvpp2_netdev_ops = {
 };
 
 static const struct ethtool_ops mvpp2_eth_tool_ops = {
-   .nway_reset = phy_ethtool_nway_reset,
-   .get_link   = ethtool_op_get_link,
-   .set_coalesce   = mvpp2_ethtool_set_coalesce,
-   .get_coalesce   = mvpp2_ethtool_get_coalesce,
-   .get_drvinfo= mvpp2_ethtool_get_drvinfo,
-   .get_ringparam  = mvpp2_ethtool_get_ringparam,
-   .set_ringparam  = mvpp2_ethtool_set_ringparam,
-   .get_strings= mvpp2_ethtool_get_strings,
-   .get_ethtool_stats = mvpp2_ethtool_get_stats,
-   .get_sset_count = mvpp2_ethtool_get_sset_count,
-   .get_link_ksettings = phy_ethtool_get_link_ksettings,
-   .set_link_ksettings = phy_ethtool_set_link_ksettings,
+   .nway_reset         = phy_ethtool_nway_reset,
+   .get_link           = ethtool_op_get_link,
+   .set_coalesce       = mvpp2_ethtool_set_coalesce,
+   .get_coalesce       = mvpp2_ethtool_get_coalesce,
+   .get_drvinfo        = mvpp2_ethtool_get_drvinfo,
+   .get_ringparam      = mvpp2_ethtool_get_ringparam,
+   .set_ringparam      = mvpp2_ethtool_set_ringparam,
+   .get_strings        = mvpp2_ethtool_get_strings,
+   .get_ethtool_stats  = mvpp2_ethtool_get_stats,
+   .get_sset_count     = mvpp2_ethtool_get_sset_count,
+   .get_link_ksettings = phy_ethtool_get_link_ksettings,
+   .set_link_ksettings = phy_ethtool_set_link_ksettings,
 };
 
 /* Used for PPv2.1, or PPv2.2 with the old Device Tree binding that
-- 
2.17.0

[PATCH net-next v3 00/10] net: mvpp2: phylink conversion

2018-05-17 Thread Antoine Tenart

Hi Dave, Russell,

This series converts the Marvell PPv2 driver to phylink (which models
the MAC to PHY link).

One important point is the PPv2 driver supports two probe modes: device
tree and ACPI. This series only brings phylink support for the device
tree mode, as the ACPI one will need further work. Still, the driver
should be working as before when using ACPI. This split should be
temporary, and was discussed with Marcin (in Cc.) who added ACPI support
to the driver.

Also, the SFP cages on both DB boards can be considered as non-wired.
We thus chose not to describe those SFP cages and we use fixed-link.

The rest of the series uses phylink to add support for 1000BaseX and
2500BaseX modes in the PPv2 driver. To do this, two patches are needed
in the common PHY framework (patches 3 and 4). The last 4 patches modify
the device tree to use the new PPv2 functionalities.

The series has been tested for the device tree mode on the 7040-db,
8040-db and 8040-mcbin boards, to ensure all the interfaces were working
as expected.

@Dave: patches 7 to 10 should go through the mvebu tree (Gregory in
Cc.) to avoid any conflict with the other mvebu dt patches taken during
this cycle.

The series is based on today's net-next.

Thanks!
Antoine

Since v2:
  - Removed the SFP description from the DB boards, as their SFP cages
    aren't wired properly. We now use fixed-link.
  - Because of this rework, split the series in two, so that the SFP
part is reviewed separately.
  - Small fixes in the phylink patch.
  - Rebased on the latest net-next branch.

Since v1:
  - Chose a different approach to the SFP changes, as the previous ones
    weren't valid and reworked both DB boards device trees.
  - Misc fixes.
  - Added Kishon's acked-by on one patch.
  - Rebased on latest net-next branch.

Antoine Tenart (9):
  net: mvpp2: align the ethtool ops definition
  net: mvpp2: phylink support
  phy: add 2.5G SGMII mode to the phy_mode enum
  phy: cp110-comphy: 2.5G SGMII mode
  net: mvpp2: 1000baseX support
  net: mvpp2: 2500baseX support
  arm64: dts: marvell: mcbin: enable the fourth network interface
  arm64: dts: marvell: 8040-db: describe the 10G interfaces as
fixed-link
  arm64: dts: marvell: 7040-db: describe the 10G interface as fixed-link

Russell King (1):
  arm64: dts: marvell: mcbin: add 10G SFP support

 .../arm64/boot/dts/marvell/armada-7040-db.dts |   5 +
 .../arm64/boot/dts/marvell/armada-8040-db.dts |  10 +
 .../boot/dts/marvell/armada-8040-mcbin.dts|  70 ++
 drivers/net/ethernet/marvell/Kconfig  |   1 +
 drivers/net/ethernet/marvell/mvpp2.c  | 931 +++---
 drivers/phy/marvell/phy-mvebu-cp110-comphy.c  |  17 +-
 include/linux/phy/phy.h   |   1 +
 7 files changed, 680 insertions(+), 355 deletions(-)

-- 
2.17.0

[PATCH net-next v3 04/10] phy: cp110-comphy: 2.5G SGMII mode

2018-05-17 Thread Antoine Tenart

This patch allows the CP110 comphy to configure some lanes in the
2.5G SGMII mode. This mode is quite close to SGMII and uses nearly the
same code path.

Signed-off-by: Antoine Tenart 
---
 drivers/phy/marvell/phy-mvebu-cp110-comphy.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/drivers/phy/marvell/phy-mvebu-cp110-comphy.c 
b/drivers/phy/marvell/phy-mvebu-cp110-comphy.c
index a0d522154cdf..4ef429250d7b 100644
--- a/drivers/phy/marvell/phy-mvebu-cp110-comphy.c
+++ b/drivers/phy/marvell/phy-mvebu-cp110-comphy.c
@@ -135,19 +135,25 @@ struct mvebu_comhy_conf {
 static const struct mvebu_comhy_conf mvebu_comphy_cp110_modes[] = {
/* lane 0 */
MVEBU_COMPHY_CONF(0, 1, PHY_MODE_SGMII, 0x1),
+   MVEBU_COMPHY_CONF(0, 1, PHY_MODE_2500SGMII, 0x1),
/* lane 1 */
MVEBU_COMPHY_CONF(1, 2, PHY_MODE_SGMII, 0x1),
+   MVEBU_COMPHY_CONF(1, 2, PHY_MODE_2500SGMII, 0x1),
/* lane 2 */
MVEBU_COMPHY_CONF(2, 0, PHY_MODE_SGMII, 0x1),
+   MVEBU_COMPHY_CONF(2, 0, PHY_MODE_2500SGMII, 0x1),
MVEBU_COMPHY_CONF(2, 0, PHY_MODE_10GKR, 0x1),
/* lane 3 */
MVEBU_COMPHY_CONF(3, 1, PHY_MODE_SGMII, 0x2),
+   MVEBU_COMPHY_CONF(3, 1, PHY_MODE_2500SGMII, 0x2),
/* lane 4 */
MVEBU_COMPHY_CONF(4, 0, PHY_MODE_SGMII, 0x2),
+   MVEBU_COMPHY_CONF(4, 0, PHY_MODE_2500SGMII, 0x2),
MVEBU_COMPHY_CONF(4, 0, PHY_MODE_10GKR, 0x2),
MVEBU_COMPHY_CONF(4, 1, PHY_MODE_SGMII, 0x1),
/* lane 5 */
MVEBU_COMPHY_CONF(5, 2, PHY_MODE_SGMII, 0x1),
+   MVEBU_COMPHY_CONF(5, 2, PHY_MODE_2500SGMII, 0x1),
 };
 
 struct mvebu_comphy_priv {
@@ -206,6 +212,10 @@ static void mvebu_comphy_ethernet_init_reset(struct 
mvebu_comphy_lane *lane,
if (mode == PHY_MODE_10GKR)
val |= MVEBU_COMPHY_SERDES_CFG0_GEN_RX(0xe) |
   MVEBU_COMPHY_SERDES_CFG0_GEN_TX(0xe);
+   else if (mode == PHY_MODE_2500SGMII)
+   val |= MVEBU_COMPHY_SERDES_CFG0_GEN_RX(0x8) |
+  MVEBU_COMPHY_SERDES_CFG0_GEN_TX(0x8) |
+  MVEBU_COMPHY_SERDES_CFG0_HALF_BUS;
else if (mode == PHY_MODE_SGMII)
val |= MVEBU_COMPHY_SERDES_CFG0_GEN_RX(0x6) |
   MVEBU_COMPHY_SERDES_CFG0_GEN_TX(0x6) |
@@ -296,13 +306,13 @@ static int mvebu_comphy_init_plls(struct 
mvebu_comphy_lane *lane,
return 0;
 }
 
-static int mvebu_comphy_set_mode_sgmii(struct phy *phy)
+static int mvebu_comphy_set_mode_sgmii(struct phy *phy, enum phy_mode mode)
 {
struct mvebu_comphy_lane *lane = phy_get_drvdata(phy);
struct mvebu_comphy_priv *priv = lane->priv;
u32 val;
 
-   mvebu_comphy_ethernet_init_reset(lane, PHY_MODE_SGMII);
+   mvebu_comphy_ethernet_init_reset(lane, mode);
 
val = readl(priv->base + MVEBU_COMPHY_RX_CTRL1(lane->id));
val &= ~MVEBU_COMPHY_RX_CTRL1_CLK8T_EN;
@@ -487,7 +497,8 @@ static int mvebu_comphy_power_on(struct phy *phy)
 
switch (lane->mode) {
case PHY_MODE_SGMII:
-   ret = mvebu_comphy_set_mode_sgmii(phy);
+   case PHY_MODE_2500SGMII:
+   ret = mvebu_comphy_set_mode_sgmii(phy, lane->mode);
break;
case PHY_MODE_10GKR:
ret = mvebu_comphy_set_mode_10gkr(phy);
-- 
2.17.0
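
The new table entries only matter through a (lane, port, mode) lookup,
which can be sketched outside the driver as follows. This is a hedged
illustration: struct comphy_conf, COMPHY_CONF() and comphy_get_mux() are
simplified stand-ins for the driver's own structure, macro and lookup
code, and only the lane 2 and lane 3 entries from the patch are copied.

```c
#include <errno.h>
#include <stddef.h>

enum phy_mode { PHY_MODE_SGMII, PHY_MODE_2500SGMII, PHY_MODE_10GKR };

/* Simplified stand-in for the per-lane configuration entry. */
struct comphy_conf {
	unsigned int lane;
	unsigned int port;
	enum phy_mode mode;
	unsigned int mux;
};

#define COMPHY_CONF(_l, _p, _m, _x) \
	{ .lane = (_l), .port = (_p), .mode = (_m), .mux = (_x) }

static const struct comphy_conf comphy_modes[] = {
	/* lane 2 */
	COMPHY_CONF(2, 0, PHY_MODE_SGMII, 0x1),
	COMPHY_CONF(2, 0, PHY_MODE_2500SGMII, 0x1),
	COMPHY_CONF(2, 0, PHY_MODE_10GKR, 0x1),
	/* lane 3 */
	COMPHY_CONF(3, 1, PHY_MODE_SGMII, 0x2),
	COMPHY_CONF(3, 1, PHY_MODE_2500SGMII, 0x2),
};

/* Return the mux value for a valid combination, -EINVAL otherwise. */
static int comphy_get_mux(unsigned int lane, unsigned int port,
			  enum phy_mode mode)
{
	size_t i, n = sizeof(comphy_modes) / sizeof(comphy_modes[0]);

	for (i = 0; i < n; i++) {
		if (comphy_modes[i].lane == lane &&
		    comphy_modes[i].port == port &&
		    comphy_modes[i].mode == mode)
			return (int)comphy_modes[i].mux;
	}
	return -EINVAL;
}
```

In this model a lane without a PHY_MODE_2500SGMII entry rejects the mode,
which is why the patch adds one entry per supported lane.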

[PATCH net-next v3 02/10] net: mvpp2: phylink support

2018-05-17 Thread Antoine Tenart

Convert the PPv2 driver to implement phylink helpers, and use phylink in
DT mode. The other mode supported is ACPI, which will need further work
in order to be entirely compatible with phylink.

The MAC and GoP configuration functions were completely moved to fit
into the phylink helpers. When a PHY is always present between the MAC
and the physical port, only phylink is used, but when this is not the
case (the MAC is directly connected to the physical port) the link IRQ
is used to detect changes in the link state and call phylink_mac_change.

The ACPI mode does not use phylink as of now, and the changes shouldn't
impact its use.

Signed-off-by: Antoine Tenart 
---
 drivers/net/ethernet/marvell/Kconfig |   1 +
 drivers/net/ethernet/marvell/mvpp2.c | 846 ---
 2 files changed, 509 insertions(+), 338 deletions(-)

diff --git a/drivers/net/ethernet/marvell/Kconfig 
b/drivers/net/ethernet/marvell/Kconfig
index ebe5c9148935..cc2f7701e71e 100644
--- a/drivers/net/ethernet/marvell/Kconfig
+++ b/drivers/net/ethernet/marvell/Kconfig
@@ -86,6 +86,7 @@ config MVPP2
depends on ARCH_MVEBU || COMPILE_TEST
depends on HAS_DMA
select MVMDIO
+   select PHYLINK
---help---
  This driver supports the network interface units in the
  Marvell ARMADA 375, 7K and 8K SoCs.
diff --git a/drivers/net/ethernet/marvell/mvpp2.c 
b/drivers/net/ethernet/marvell/mvpp2.c
index 77dd91e3d962..60093f1e6297 100644
--- a/drivers/net/ethernet/marvell/mvpp2.c
+++ b/drivers/net/ethernet/marvell/mvpp2.c
@@ -29,6 +29,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -359,15 +360,23 @@
 #define MVPP2_GMAC_FORCE_LINK_PASS BIT(1)
 #define MVPP2_GMAC_IN_BAND_AUTONEG BIT(2)
 #define MVPP2_GMAC_IN_BAND_AUTONEG_BYPASS  BIT(3)
+#define MVPP2_GMAC_IN_BAND_RESTART_AN  BIT(4)
 #define MVPP2_GMAC_CONFIG_MII_SPEED    BIT(5)
 #define MVPP2_GMAC_CONFIG_GMII_SPEED   BIT(6)
 #define MVPP2_GMAC_AN_SPEED_EN BIT(7)
 #define MVPP2_GMAC_FC_ADV_EN   BIT(9)
+#define MVPP2_GMAC_FC_ADV_ASM_EN   BIT(10)
 #define MVPP2_GMAC_FLOW_CTRL_AUTONEG   BIT(11)
 #define MVPP2_GMAC_CONFIG_FULL_DUPLEX  BIT(12)
 #define MVPP2_GMAC_AN_DUPLEX_EN    BIT(13)
 #define MVPP2_GMAC_STATUS0 0x10
 #define MVPP2_GMAC_STATUS0_LINK_UP BIT(0)
+#define MVPP2_GMAC_STATUS0_GMII_SPEED  BIT(1)
+#define MVPP2_GMAC_STATUS0_MII_SPEED   BIT(2)
+#define MVPP2_GMAC_STATUS0_FULL_DUPLEX BIT(3)
+#define MVPP2_GMAC_STATUS0_RX_PAUSE    BIT(6)
+#define MVPP2_GMAC_STATUS0_TX_PAUSE    BIT(7)
+#define MVPP2_GMAC_STATUS0_AN_COMPLETE BIT(11)
 #define MVPP2_GMAC_PORT_FIFO_CFG_1_REG 0x1c
 #define MVPP2_GMAC_TX_FIFO_MIN_TH_OFFS 6
 #define MVPP2_GMAC_TX_FIFO_MIN_TH_ALL_MASK 0x1fc0
@@ -379,6 +388,8 @@
 #define MVPP22_GMAC_INT_MASK_LINK_STAT BIT(1)
 #define MVPP22_GMAC_CTRL_4_REG 0x90
 #define MVPP22_CTRL4_EXT_PIN_GMII_SEL  BIT(0)
+#define MVPP22_CTRL4_RX_FC_EN  BIT(3)
+#define MVPP22_CTRL4_TX_FC_EN  BIT(4)
 #define MVPP22_CTRL4_DP_CLK_SEL    BIT(5)
 #define MVPP22_CTRL4_SYNC_BYPASS_DIS   BIT(6)
 #define MVPP22_CTRL4_QSGMII_BYPASS_ACTIVE  BIT(7)
@@ -392,6 +403,7 @@
 #define MVPP22_XLG_CTRL0_PORT_EN   BIT(0)
 #define MVPP22_XLG_CTRL0_MAC_RESET_DIS BIT(1)
 #define MVPP22_XLG_CTRL0_RX_FLOW_CTRL_EN   BIT(7)
+#define MVPP22_XLG_CTRL0_TX_FLOW_CTRL_EN   BIT(8)
 #define MVPP22_XLG_CTRL0_MIB_CNT_DIS   BIT(14)
 #define MVPP22_XLG_CTRL1_REG   0x104
 #define MVPP22_XLG_CTRL1_FRAMESIZELIMIT_OFFS   0
@@ -413,6 +425,7 @@
 #define MVPP22_XLG_CTRL4_FWD_FC    BIT(5)
 #define MVPP22_XLG_CTRL4_FWD_PFC   BIT(6)
 #define MVPP22_XLG_CTRL4_MACMODSELECT_GMAC BIT(12)
+#define MVPP22_XLG_CTRL4_EN_IDLE_CHECK BIT(14)
 
 /* SMI registers. PPv2.2 only, relative to priv->iface_base. */
 #define MVPP22_SMI_MISC_CFG_REG    0x1204
@@ -1017,6 +1030,9 @@ struct mvpp2_port {
/* Firmware node associated to the port */
struct fwnode_handle *fwnode;
 
+   /* Is a PHY always connected to the port */
+   bool has_phy;
+
/* Per-port registers' base address */
void __iomem *base;
void __iomem *stats_base;
@@ -1044,12 +1060,11 @@ struct mvpp2_port {
struct mutex gather_stats_lock;
struct delayed_work stats_work;
 
+   struct device_node *of_node;
+
phy_interface_t phy_interface;
-   struct device_node *phy_node;
+   struct phylink *phylink;
struct phy *comphy;
-   unsigned int link;
-   unsigned int duplex;
-   unsigned int speed;
 
struct mvpp2_bm_pool *pool_long;
struct mvpp2_bm_pool *pool_short;
@@ -1338,6 +1353,12
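
The new MVPP2_GMAC_STATUS0_* bits are what a link-state callback would
read when there is no PHY to report link parameters. A minimal sketch of
such a decode follows; the bit positions come from the defines in this
patch, while struct gmac_link_state, gmac_decode_status0() and the
speed interpretation (GMII_SPEED meaning 1000 Mb/s, or 2500 Mb/s in
2500Base-X; MII_SPEED meaning 100 Mb/s; neither meaning 10 Mb/s) are
assumptions, since the corresponding code is cut off above.

```c
#include <stdbool.h>
#include <stdint.h>

#define BIT(nr) (1UL << (nr))

/* Bit positions as defined in the patch. */
#define MVPP2_GMAC_STATUS0_LINK_UP	BIT(0)
#define MVPP2_GMAC_STATUS0_GMII_SPEED	BIT(1)
#define MVPP2_GMAC_STATUS0_MII_SPEED	BIT(2)
#define MVPP2_GMAC_STATUS0_FULL_DUPLEX	BIT(3)
#define MVPP2_GMAC_STATUS0_AN_COMPLETE	BIT(11)

/* Hypothetical container for the decoded state. */
struct gmac_link_state {
	bool link;
	bool an_complete;
	bool full_duplex;
	int speed;		/* in Mb/s */
};

/* Hypothetical helper: decode a raw STATUS0 register value. */
static struct gmac_link_state gmac_decode_status0(uint32_t val,
						  bool is_2500basex)
{
	struct gmac_link_state st = {
		.link = val & MVPP2_GMAC_STATUS0_LINK_UP,
		.an_complete = val & MVPP2_GMAC_STATUS0_AN_COMPLETE,
		.full_duplex = val & MVPP2_GMAC_STATUS0_FULL_DUPLEX,
	};

	if (val & MVPP2_GMAC_STATUS0_GMII_SPEED)
		st.speed = is_2500basex ? 2500 : 1000;
	else if (val & MVPP2_GMAC_STATUS0_MII_SPEED)
		st.speed = 100;
	else
		st.speed = 10;

	return st;
}
```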

[PATCH net-next v3 08/10] arm64: dts: marvell: mcbin: enable the fourth network interface

2018-05-17 Thread Antoine Tenart

This patch enables the fourth network interface on the Marvell
Macchiatobin. It is configured in the 2500Base-X PHY mode. The SFP cage
is also described.

Signed-off-by: Antoine Tenart 
---
 .../boot/dts/marvell/armada-8040-mcbin.dts| 32 +++
 1 file changed, 32 insertions(+)

diff --git a/arch/arm64/boot/dts/marvell/armada-8040-mcbin.dts 
b/arch/arm64/boot/dts/marvell/armada-8040-mcbin.dts
index eaa67de8c2bb..a66958ff4de6 100644
--- a/arch/arm64/boot/dts/marvell/armada-8040-mcbin.dts
+++ b/arch/arm64/boot/dts/marvell/armada-8040-mcbin.dts
@@ -27,6 +27,7 @@
ethernet0 = &cp0_eth0;
ethernet1 = &cp1_eth0;
ethernet2 = &cp1_eth1;
+   ethernet3 = &cp1_eth2;
};
 
/* Regulator labels correspond with schematics */
@@ -88,6 +89,18 @@
pinctrl-names = "default";
pinctrl-0 = <&cp1_sfpp1_pins &cp0_sfpp1_pins>;
};
+
+   sfp_eth3: sfp-eth3 {
+   /* CON3,4 - CPS lane 5 */
+   compatible = "sff,sfp";
+   i2c-bus = <&sfp_1g_i2c>;
+   los-gpio = <&cp0_gpio2 22 GPIO_ACTIVE_HIGH>;
+   mod-def0-gpio = <&cp0_gpio2 21 GPIO_ACTIVE_LOW>;
+   tx-disable-gpio = <&cp1_gpio1 24 GPIO_ACTIVE_HIGH>;
+   tx-fault-gpio = <&cp0_gpio2 19 GPIO_ACTIVE_HIGH>;
+   pinctrl-names = "default";
+   pinctrl-0 = <&cp0_sfp_1g_pins &cp1_sfp_1g_pins>;
+   };
 };
 
 &uart0 {
@@ -195,6 +208,10 @@
marvell,pins = "mpp47";
marvell,function = "gpio";
};
+   cp0_sfp_1g_pins: sfp-1g-pins {
+   marvell,pins = "mpp51", "mpp53", "mpp54";
+   marvell,function = "gpio";
+   };
cp0_pcie_pins: pcie-pins {
marvell,pins = "mpp52";
marvell,function = "gpio";
@@ -287,6 +304,17 @@
phys = <&cp1_comphy0 1>;
 };
 
+&cp1_eth2 {
+   /* CPS Lane 5 */
+   status = "okay";
+   /* Network PHY */
+   phy-mode = "2500base-x";
+   managed = "in-band-status";
+   /* Generic PHY, providing serdes lanes */
+   phys = <&cp1_comphy5 2>;
+   sfp = <&sfp_eth3>;
+};
+
 &cp1_pinctrl {
cp1_sfpp1_pins: sfpp1-pins {
marvell,pins = "mpp8", "mpp10", "mpp11";
@@ -300,6 +328,10 @@
marvell,pins = "mpp6", "mpp7";
marvell,function = "uart0";
};
+   cp1_sfp_1g_pins: sfp-1g-pins {
+   marvell,pins = "mpp24";
+   marvell,function = "gpio";
+   };
cp1_sfpp0_pins: sfpp0-pins {
marvell,pins = "mpp26", "mpp27", "mpp28", "mpp29";
marvell,function = "gpio";
-- 
2.17.0

[PATCH net-next 2/2] net: phy: sfp: warn the user when no tx_disable pin is available

2018-05-17 Thread Antoine Tenart

In case no Tx disable pin is available the SFP modules will always be
emitting. This could be an issue when using modules with a laser as their
light source as we would have no way to disable it when the fiber is
removed. This patch adds a warning when registering an SFP cage which does
not have its tx_disable pin wired or available.

Signed-off-by: Antoine Tenart 
Acked-by: Russell King 
---
 drivers/net/phy/sfp.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/drivers/net/phy/sfp.c b/drivers/net/phy/sfp.c
index 0fd2a92a6f7b..4e62769b3e00 100644
--- a/drivers/net/phy/sfp.c
+++ b/drivers/net/phy/sfp.c
@@ -1077,6 +1077,15 @@ static int sfp_probe(struct platform_device *pdev)
if (poll)
mod_delayed_work(system_wq, &sfp->poll, poll_jiffies);
 
+   /* We could have an issue in cases no Tx disable pin is available or
+* wired as modules using a laser as their light source will continue to
+* be active when the fiber is removed. This could be a safety issue and
+* we should at least warn the user about that.
+*/
+   if (!sfp->gpio[GPIO_TX_DISABLE])
+   dev_warn(sfp->dev,
+            "No tx_disable pin: SFP modules will always be emitting.\n");
+
return 0;
 }
 
-- 
2.17.0

[PATCH net-next v3 03/10] phy: add 2.5G SGMII mode to the phy_mode enum

2018-05-17 Thread Antoine Tenart

This patch adds one more generic PHY mode to the phy_mode enum, to allow
configuring generic PHYs to the 2.5G SGMII mode by using the set_mode
callback.

Signed-off-by: Antoine Tenart 
Acked-by: Kishon Vijay Abraham I 
---
 include/linux/phy/phy.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/phy/phy.h b/include/linux/phy/phy.h
index c9d147f5..9713aebdd348 100644
--- a/include/linux/phy/phy.h
+++ b/include/linux/phy/phy.h
@@ -36,6 +36,7 @@ enum phy_mode {
PHY_MODE_USB_DEVICE_SS,
PHY_MODE_USB_OTG,
PHY_MODE_SGMII,
+   PHY_MODE_2500SGMII,
PHY_MODE_10GKR,
PHY_MODE_UFS_HS_A,
PHY_MODE_UFS_HS_B,
-- 
2.17.0

[PATCH net-next 0/2] net: sfp: small improvements

2018-05-17 Thread Antoine Tenart

Hi Russell,

This series was part of the mvpp2 phylink one but as we reworked it to
use fixed-link on the DB boards, the SFP commits weren't needed
anymore for our use case. I believe two of the three patches are still
needed (I ditched the one about non-wired SFP cages), so they are sent
here in a separate series.

Thanks!
Antoine

Since last time:
  - s/-EOPNOTSUPP/-ENODEV/ in patch 1/2.
  - I added the acked-by tag in patch 2/2.

Antoine Tenart (2):
  net: phy: sfp: make the i2c-bus property really optional
  net: phy: sfp: warn the user when no tx_disable pin is available

 drivers/net/phy/sfp.c | 21 +
 1 file changed, 21 insertions(+)

-- 
2.17.0

[PATCH net-next v3 07/10] arm64: dts: marvell: mcbin: add 10G SFP support

2018-05-17 Thread Antoine Tenart

From: Russell King 

This patch adds the SFP cage description in the Marvell Armada 8040
mcbin, for both 10G interfaces.

Signed-off-by: Russell King 
[Antoine: small reworks, commit message]
Signed-off-by: Antoine Tenart 
---
 .../boot/dts/marvell/armada-8040-mcbin.dts| 38 +++
 1 file changed, 38 insertions(+)

diff --git a/arch/arm64/boot/dts/marvell/armada-8040-mcbin.dts b/arch/arm64/boot/dts/marvell/armada-8040-mcbin.dts
index 81de03ef860d..eaa67de8c2bb 100644
--- a/arch/arm64/boot/dts/marvell/armada-8040-mcbin.dts
+++ b/arch/arm64/boot/dts/marvell/armada-8040-mcbin.dts
@@ -64,6 +64,30 @@
compatible = "usb-nop-xceiv";
vcc-supply = <&v_5v0_usb3_hst_vbus>;
};
+
+   sfp_eth0: sfp-eth0 {
+   /* CON15,16 - CPM lane 4 */
+   compatible = "sff,sfp";
+   i2c-bus = <&sfpp0_i2c>;
+   los-gpio = <&cp1_gpio1 28 GPIO_ACTIVE_HIGH>;
+   mod-def0-gpio = <&cp1_gpio1 27 GPIO_ACTIVE_LOW>;
+   tx-disable-gpio = <&cp1_gpio1 29 GPIO_ACTIVE_HIGH>;
+   tx-fault-gpio  = <&cp1_gpio1 26 GPIO_ACTIVE_HIGH>;
+   pinctrl-names = "default";
+   pinctrl-0 = <&cp1_sfpp0_pins>;
+   };
+
+   sfp_eth1: sfp-eth1 {
+   /* CON17,18 - CPS lane 4 */
+   compatible = "sff,sfp";
+   i2c-bus = <&sfpp1_i2c>;
+   los-gpio = <&cp1_gpio1 8 GPIO_ACTIVE_HIGH>;
+   mod-def0-gpio = <&cp1_gpio1 11 GPIO_ACTIVE_LOW>;
+   tx-disable-gpio = <&cp1_gpio1 10 GPIO_ACTIVE_HIGH>;
+   tx-fault-gpio = <&cp0_gpio2 30 GPIO_ACTIVE_HIGH>;
+   pinctrl-names = "default";
+   pinctrl-0 = <&cp1_sfpp1_pins &cp0_sfpp1_pins>;
+   };
 };
 
 &uart0 {
@@ -180,6 +204,10 @@
   "mpp60", "mpp61";
marvell,function = "sdio";
};
+   cp0_sfpp1_pins: sfpp1-pins {
+   marvell,pins = "mpp62";
+   marvell,function = "gpio";
+   };
 };
 
 &cp0_xmdio {
@@ -188,11 +216,13 @@
phy0: ethernet-phy@0 {
compatible = "ethernet-phy-ieee802.3-c45";
reg = <0>;
+   sfp = <&sfp_eth0>;
};
 
phy8: ethernet-phy@8 {
compatible = "ethernet-phy-ieee802.3-c45";
reg = <8>;
+   sfp = <&sfp_eth1>;
};
 };
 
@@ -258,6 +288,10 @@
 };
 
 &cp1_pinctrl {
+   cp1_sfpp1_pins: sfpp1-pins {
+   marvell,pins = "mpp8", "mpp10", "mpp11";
+   marvell,function = "gpio";
+   };
cp1_spi1_pins: spi1-pins {
marvell,pins = "mpp12", "mpp13", "mpp14", "mpp15", "mpp16";
marvell,function = "spi1";
@@ -266,6 +300,10 @@
marvell,pins = "mpp6", "mpp7";
marvell,function = "uart0";
};
+   cp1_sfpp0_pins: sfpp0-pins {
+   marvell,pins = "mpp26", "mpp27", "mpp28", "mpp29";
+   marvell,function = "gpio";
+   };
 };
 
 /* J27 UART header */
-- 
2.17.0

[PATCH net-next 2/3] net: mvpp2: set mac address does not require the stop/start sequence

2018-05-17 Thread Antoine Tenart

From: Yan Markman 

Remove special stop/start handling from the set_mac_address callback.
This special care is not needed. Removing it also simplifies the
up/down status in the driver and helps avoid possible link status
mismatch issues.

Signed-off-by: Yan Markman 
[Antoine: commit message]
Signed-off-by: Antoine Tenart 
---
 drivers/net/ethernet/marvell/mvpp2.c | 38 +---
 1 file changed, 7 insertions(+), 31 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvpp2.c b/drivers/net/ethernet/marvell/mvpp2.c
index 73b2f2d331c5..a9483da18e00 100644
--- a/drivers/net/ethernet/marvell/mvpp2.c
+++ b/drivers/net/ethernet/marvell/mvpp2.c
@@ -7358,42 +7358,18 @@ static void mvpp2_set_rx_mode(struct net_device *dev)
 
 static int mvpp2_set_mac_address(struct net_device *dev, void *p)
 {
-   struct mvpp2_port *port = netdev_priv(dev);
const struct sockaddr *addr = p;
int err;
 
-   if (!is_valid_ether_addr(addr->sa_data)) {
-   err = -EADDRNOTAVAIL;
-   goto log_error;
-   }
-
-   if (!netif_running(dev)) {
-   err = mvpp2_prs_update_mac_da(dev, addr->sa_data);
-   if (!err)
-   return 0;
-   /* Reconfigure parser to accept the original MAC address */
-   err = mvpp2_prs_update_mac_da(dev, dev->dev_addr);
-   if (err)
-   goto log_error;
-   }
-
-   mvpp2_stop_dev(port);
+   if (!is_valid_ether_addr(addr->sa_data))
+   return -EADDRNOTAVAIL;
 
err = mvpp2_prs_update_mac_da(dev, addr->sa_data);
-   if (!err)
-   goto out_start;
-
-   /* Reconfigure parser accept the original MAC address */
-   err = mvpp2_prs_update_mac_da(dev, dev->dev_addr);
-   if (err)
-   goto log_error;
-out_start:
-   mvpp2_start_dev(port);
-   mvpp2_egress_enable(port);
-   mvpp2_ingress_enable(port);
-   return 0;
-log_error:
-   netdev_err(dev, "failed to change MAC address\n");
+   if (err) {
+   /* Reconfigure parser accept the original MAC address */
+   mvpp2_prs_update_mac_da(dev, dev->dev_addr);
+   netdev_err(dev, "failed to change MAC address\n");
+   }
return err;
 }
 
-- 
2.17.0
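
The simplified error path in the patch above (validate, try the new address, put the old one back on failure) can be sketched in userspace as follows. This is only an illustration: struct demo_port, demo_update_filter() and the fail_next_update hook are hypothetical stand-ins, not mvpp2 code.

```c
#include <errno.h>
#include <string.h>

#define DEMO_ETH_ALEN 6

/* Hypothetical hardware filter state (stands in for the parser entry). */
struct demo_port {
	unsigned char filter_addr[DEMO_ETH_ALEN];
	int fail_next_update;	/* test hook: make the next update fail */
};

/* Program the filter; optionally simulate a failed, half-applied update. */
static int demo_update_filter(struct demo_port *p, const unsigned char *addr)
{
	if (p->fail_next_update) {
		p->fail_next_update = 0;
		memset(p->filter_addr, 0xff, DEMO_ETH_ALEN);
		return -EINVAL;
	}
	memcpy(p->filter_addr, addr, DEMO_ETH_ALEN);
	return 0;
}

/* Unicast and not all-zero, in the spirit of is_valid_ether_addr(). */
static int demo_is_valid_addr(const unsigned char *a)
{
	static const unsigned char zero[DEMO_ETH_ALEN];

	return !(a[0] & 0x01) && memcmp(a, zero, DEMO_ETH_ALEN) != 0;
}

/* Change the address; on failure, restore the previous filter entry. */
static int demo_set_mac(struct demo_port *p, unsigned char *cur,
			const unsigned char *new_addr)
{
	int err;

	if (!demo_is_valid_addr(new_addr))
		return -EADDRNOTAVAIL;

	err = demo_update_filter(p, new_addr);
	if (err) {
		/* Best effort: put the previous address back in the filter. */
		demo_update_filter(p, cur);
		return err;
	}

	memcpy(cur, new_addr, DEMO_ETH_ALEN);
	return 0;
}
```

Note that, as in the patch, the result of the rollback call is not checked; the original error is what gets reported.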

[PATCH net-next 1/3] net: mvpp2: avoid checking for free aggregated descriptors twice

2018-05-17 Thread Antoine Tenart

From: Yan Markman 

Avoid repeating the check for free aggregated descriptors when it
already failed at the beginning of the function.

Signed-off-by: Yan Markman 
[Antoine: commit message]
Signed-off-by: Antoine Tenart 
---
 drivers/net/ethernet/marvell/mvpp2.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvpp2.c b/drivers/net/ethernet/marvell/mvpp2.c
index df59f0e0d33c..73b2f2d331c5 100644
--- a/drivers/net/ethernet/marvell/mvpp2.c
+++ b/drivers/net/ethernet/marvell/mvpp2.c
@@ -5488,11 +5488,10 @@ static int mvpp2_aggr_desc_num_check(struct mvpp2 *priv,
 MVPP2_AGGR_TXQ_STATUS_REG(cpu));
 
aggr_txq->count = val & MVPP2_AGGR_TXQ_PENDING_MASK;
-   }
-
-   if ((aggr_txq->count + num) > MVPP2_AGGR_TXQ_SIZE)
-   return -ENOMEM;
 
+   if ((aggr_txq->count + num) > MVPP2_AGGR_TXQ_SIZE)
+   return -ENOMEM;
+   }
return 0;
 }
 
-- 
2.17.0
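
The control flow after the patch above can be sketched in isolation: the second comparison only makes sense once the cached count has been refreshed, so it moves inside the first branch. The names below (struct demo_txq, hw_pending, DEMO_QUEUE_SIZE) are hypothetical, and the register read is replaced by a plain field.

```c
#include <errno.h>

#define DEMO_QUEUE_SIZE 256

struct demo_txq {
	unsigned int count;		/* cached number of in-flight descriptors */
	unsigned int hw_pending;	/* stand-in for the value read from hardware */
};

/* Return 0 if `num` descriptors fit, -ENOMEM otherwise. */
static int demo_desc_num_check(struct demo_txq *q, unsigned int num)
{
	if (q->count + num > DEMO_QUEUE_SIZE) {
		/* Cached count says "full": refresh it from hardware. */
		q->count = q->hw_pending;

		/* Recheck only here, where the count may have changed. */
		if (q->count + num > DEMO_QUEUE_SIZE)
			return -ENOMEM;
	}
	return 0;
}
```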

[PATCH net-next 3/3] net: mvpp2: print rx error with rate-limit

2018-05-17 Thread Antoine Tenart

From: Yan Markman 

Prevent flood of RX error prints during heavy traffic with weak signal
in link by checking net_ratelimit() before using netdev_err().

Signed-off-by: Yan Markman 
[Antoine: small rework, commit message]
Signed-off-by: Antoine Tenart 
---
 drivers/net/ethernet/marvell/mvpp2.c | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvpp2.c b/drivers/net/ethernet/marvell/mvpp2.c
index a9483da18e00..f8ed983bc767 100644
--- a/drivers/net/ethernet/marvell/mvpp2.c
+++ b/drivers/net/ethernet/marvell/mvpp2.c
@@ -6382,21 +6382,23 @@ static void mvpp2_rx_error(struct mvpp2_port *port,
 {
u32 status = mvpp2_rxdesc_status_get(port, rx_desc);
size_t sz = mvpp2_rxdesc_size_get(port, rx_desc);
+   char *err_str = NULL;
 
switch (status & MVPP2_RXD_ERR_CODE_MASK) {
case MVPP2_RXD_ERR_CRC:
-   netdev_err(port->dev, "bad rx status %08x (crc error), size=%zu\n",
-  status, sz);
+   err_str = "crc";
break;
case MVPP2_RXD_ERR_OVERRUN:
-   netdev_err(port->dev, "bad rx status %08x (overrun error), size=%zu\n",
-  status, sz);
+   err_str = "overrun";
break;
case MVPP2_RXD_ERR_RESOURCE:
-   netdev_err(port->dev, "bad rx status %08x (resource error), size=%zu\n",
-  status, sz);
+   err_str = "resource";
break;
}
+   if (err_str && net_ratelimit())
+   netdev_err(port->dev,
+  "bad rx status %08x (%s error), size=%zu\n",
+  status, err_str, sz);
 }
 
 /* Handle RX checksum offload */
-- 
2.17.0
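
For readers unfamiliar with the idea behind the net_ratelimit() guard in the patch above, here is a minimal userspace sketch of a burst-per-window limiter. It is not the kernel's implementation: struct demo_ratelimit and demo_ratelimit_ok() are hypothetical, and time is passed in explicitly as an abstract tick count.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limiter state: at most `burst` messages per `interval` ticks. */
struct demo_ratelimit {
	uint64_t interval;	/* window length, in caller-defined ticks */
	unsigned int burst;	/* messages allowed per window */
	uint64_t window_start;	/* tick at which the current window began */
	unsigned int printed;	/* messages emitted in the current window */
};

/* Return true if the caller may log at time `now`, false to suppress. */
static bool demo_ratelimit_ok(struct demo_ratelimit *rl, uint64_t now)
{
	if (now - rl->window_start >= rl->interval) {
		/* Window expired: start a new one. */
		rl->window_start = now;
		rl->printed = 0;
	}

	if (rl->printed >= rl->burst)
		return false;

	rl->printed++;
	return true;
}
```

A caller would wrap its error print in `if (demo_ratelimit_ok(&rl, now))`, much as the patch wraps netdev_err() in the net_ratelimit() check.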

[PATCH net-next v3 05/10] net: mvpp2: 1000baseX support

2018-05-17 Thread Antoine Tenart

This patch adds the 1000Base-X PHY mode support in the Marvell PPv2
driver. 1000Base-X is quite close to SGMII and uses nearly the same
code path.

Signed-off-by: Antoine Tenart 
---
 drivers/net/ethernet/marvell/mvpp2.c | 72 
 1 file changed, 51 insertions(+), 21 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvpp2.c b/drivers/net/ethernet/marvell/mvpp2.c
index 60093f1e6297..ece61f1727e4 100644
--- a/drivers/net/ethernet/marvell/mvpp2.c
+++ b/drivers/net/ethernet/marvell/mvpp2.c
@@ -4870,6 +4870,7 @@ static int mvpp22_gop_init(struct mvpp2_port *port)
mvpp22_gop_init_rgmii(port);
break;
case PHY_INTERFACE_MODE_SGMII:
+   case PHY_INTERFACE_MODE_1000BASEX:
mvpp22_gop_init_sgmii(port);
break;
case PHY_INTERFACE_MODE_10GKR:
@@ -4907,7 +4908,8 @@ static void mvpp22_gop_unmask_irq(struct mvpp2_port *port)
u32 val;
 
if (phy_interface_mode_is_rgmii(port->phy_interface) ||
-   port->phy_interface == PHY_INTERFACE_MODE_SGMII) {
+   port->phy_interface == PHY_INTERFACE_MODE_SGMII ||
+   port->phy_interface == PHY_INTERFACE_MODE_1000BASEX) {
/* Enable the GMAC link status irq for this port */
val = readl(port->base + MVPP22_GMAC_INT_SUM_MASK);
val |= MVPP22_GMAC_INT_SUM_MASK_LINK_STAT;
@@ -4937,7 +4939,8 @@ static void mvpp22_gop_mask_irq(struct mvpp2_port *port)
}
 
if (phy_interface_mode_is_rgmii(port->phy_interface) ||
-   port->phy_interface == PHY_INTERFACE_MODE_SGMII) {
+   port->phy_interface == PHY_INTERFACE_MODE_SGMII ||
+   port->phy_interface == PHY_INTERFACE_MODE_1000BASEX) {
val = readl(port->base + MVPP22_GMAC_INT_SUM_MASK);
val &= ~MVPP22_GMAC_INT_SUM_MASK_LINK_STAT;
writel(val, port->base + MVPP22_GMAC_INT_SUM_MASK);
@@ -4949,7 +4952,8 @@ static void mvpp22_gop_setup_irq(struct mvpp2_port *port)
u32 val;
 
if (phy_interface_mode_is_rgmii(port->phy_interface) ||
-   port->phy_interface == PHY_INTERFACE_MODE_SGMII) {
+   port->phy_interface == PHY_INTERFACE_MODE_SGMII ||
+   port->phy_interface == PHY_INTERFACE_MODE_1000BASEX) {
val = readl(port->base + MVPP22_GMAC_INT_MASK);
val |= MVPP22_GMAC_INT_MASK_LINK_STAT;
writel(val, port->base + MVPP22_GMAC_INT_MASK);
@@ -4974,6 +4978,7 @@ static int mvpp22_comphy_init(struct mvpp2_port *port)
 
switch (port->phy_interface) {
case PHY_INTERFACE_MODE_SGMII:
+   case PHY_INTERFACE_MODE_1000BASEX:
mode = PHY_MODE_SGMII;
break;
case PHY_INTERFACE_MODE_10GKR:
@@ -5056,7 +5061,8 @@ static void mvpp2_port_loopback_set(struct mvpp2_port *port,
else
val &= ~MVPP2_GMAC_GMII_LB_EN_MASK;
 
-   if (port->phy_interface == PHY_INTERFACE_MODE_SGMII)
+   if (port->phy_interface == PHY_INTERFACE_MODE_SGMII ||
+   port->phy_interface == PHY_INTERFACE_MODE_1000BASEX)
val |= MVPP2_GMAC_PCS_LB_EN_MASK;
else
val &= ~MVPP2_GMAC_PCS_LB_EN_MASK;
@@ -6266,7 +6272,8 @@ static irqreturn_t mvpp2_link_status_isr(int irq, void *dev_id)
link = true;
}
} else if (phy_interface_mode_is_rgmii(port->phy_interface) ||
-  port->phy_interface == PHY_INTERFACE_MODE_SGMII) {
+  port->phy_interface == PHY_INTERFACE_MODE_SGMII ||
+  port->phy_interface == PHY_INTERFACE_MODE_1000BASEX) {
val = readl(port->base + MVPP22_GMAC_INT_STAT);
if (val & MVPP22_GMAC_INT_STAT_LINK) {
event = true;
@@ -8032,20 +8039,25 @@ static void mvpp2_phylink_validate(struct net_device *dev,
phylink_set(mask, Pause);
phylink_set(mask, Asym_Pause);
 
-   phylink_set(mask, 10baseT_Half);
-   phylink_set(mask, 10baseT_Full);
-   phylink_set(mask, 100baseT_Half);
-   phylink_set(mask, 100baseT_Full);
-   phylink_set(mask, 1000baseT_Full);
-   phylink_set(mask, 10000baseT_Full);
-
-   if (state->interface == PHY_INTERFACE_MODE_10GKR) {
+   switch (state->interface) {
+   case PHY_INTERFACE_MODE_10GKR:
phylink_set(mask, 10000baseCR_Full);
phylink_set(mask, 10000baseSR_Full);
phylink_set(mask, 10000baseLR_Full);
phylink_set(mask, 10000baseLRM_Full);
phylink_set(mask, 10000baseER_Full);
phylink_set(mask, 10000baseKR_Full);
+   /* Fall-through */
+   default:
+   phylink_set(mask, 10baseT_Half);
+   phylink_set(mask, 10baseT_Full);
+   phylink_set(mask, 100baseT_Half);
+   phylink_set(mask, 100baseT_Full);
+   phylink_set(mask, 10000baseT

[PATCH net-next 0/3] net: mvpp2: small improvements

2018-05-17 Thread Antoine Tenart

Hi all,

Those 3 patches are small improvements to the Marvell PPv2 driver. The
series does not conflict with the one sent about phylink and
1000/2500baseX support, so the two series can live in parallel.

Thanks!
Antoine

Yan Markman (3):
  net: mvpp2: avoid checking for free aggregated descriptors twice
  net: mvpp2: set mac address does not require the stop/start sequence
  net: mvpp2: print rx error with rate-limit

 drivers/net/ethernet/marvell/mvpp2.c | 59 +---
 1 file changed, 18 insertions(+), 41 deletions(-)

-- 
2.17.0

[PATCH net-next 1/2] net: phy: sfp: make the i2c-bus property really optional

2018-05-17 Thread Antoine Tenart

The SFF,SFP documentation is clear about making all the DT properties,
with the exception of the compatible, optional. In practice this is not
the case and without an i2c-bus property provided the SFP code will
throw NULL pointer exceptions.

This patch is an attempt to fix this.

Signed-off-by: Antoine Tenart 
---
 drivers/net/phy/sfp.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/drivers/net/phy/sfp.c b/drivers/net/phy/sfp.c
index 4ab6e9a50bbe..0fd2a92a6f7b 100644
--- a/drivers/net/phy/sfp.c
+++ b/drivers/net/phy/sfp.c
@@ -298,11 +298,17 @@ static void sfp_set_state(struct sfp *sfp, unsigned int state)
 
 static int sfp_read(struct sfp *sfp, bool a2, u8 addr, void *buf, size_t len)
 {
+   if (!sfp->read)
+   return -ENODEV;
+
return sfp->read(sfp, a2, addr, buf, len);
 }
 
 static int sfp_write(struct sfp *sfp, bool a2, u8 addr, void *buf, size_t len)
 {
+   if (!sfp->write)
+   return -ENODEV;
+
return sfp->write(sfp, a2, addr, buf, len);
 }
 
@@ -533,6 +539,8 @@ static int sfp_sm_mod_hpower(struct sfp *sfp)
return 0;
 
err = sfp_read(sfp, true, SFP_EXT_STATUS, &val, sizeof(val));
+   if (err == -ENODEV)
+   goto err;
if (err != sizeof(val)) {
dev_err(sfp->dev, "Failed to read EEPROM: %d\n", err);
err = -EAGAIN;
@@ -542,6 +550,8 @@ static int sfp_sm_mod_hpower(struct sfp *sfp)
val |= BIT(0);
 
err = sfp_write(sfp, true, SFP_EXT_STATUS, &val, sizeof(val));
+   if (err == -ENODEV)
+   goto err;
if (err != sizeof(val)) {
dev_err(sfp->dev, "Failed to write EEPROM: %d\n", err);
err = -EAGAIN;
@@ -565,6 +575,8 @@ static int sfp_sm_mod_probe(struct sfp *sfp)
int ret;
 
ret = sfp_read(sfp, false, 0, &id, sizeof(id));
+   if (ret == -ENODEV)
+   return ret;
if (ret < 0) {
dev_err(sfp->dev, "failed to read EEPROM: %d\n", ret);
return -EAGAIN;
-- 
2.17.0
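
The guard this patch adds to sfp_read() and sfp_write() follows a common pattern for optional callbacks. A minimal userspace sketch, assuming a hypothetical struct demo_dev and backend (neither is taken from sfp.c):

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical device with an optional read callback (not the real struct sfp). */
struct demo_dev {
	int (*read)(struct demo_dev *dev, void *buf, size_t len);
};

/* Return -ENODEV instead of dereferencing a NULL callback. */
static int demo_read(struct demo_dev *dev, void *buf, size_t len)
{
	if (!dev->read)
		return -ENODEV;

	return dev->read(dev, buf, len);
}

/* Example backend: fills the buffer with zeros and reports the byte count. */
static int demo_zero_read(struct demo_dev *dev, void *buf, size_t len)
{
	(void)dev;
	memset(buf, 0, len);
	return (int)len;
}
```

Callers can then tell "no bus at all" (-ENODEV) apart from a transient I/O failure, which is how the patch lets sfp_sm_mod_probe() return early instead of retrying with -EAGAIN.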

[PATCH] netfilter: properly initialize xt_table_info structure

2018-05-17 Thread Greg Kroah-Hartman

When allocating a xt_table_info structure, we should be clearing out the
full amount of memory that was allocated, not just the "header" of the
structure.  Otherwise odd values could be passed to userspace, which is
not a good thing.

Cc: stable 
Signed-off-by: Greg Kroah-Hartman 
---
 net/netfilter/x_tables.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
index cb7cb300c3bc..a300e8252bb6 100644
--- a/net/netfilter/x_tables.c
+++ b/net/netfilter/x_tables.c
@@ -1187,7 +1187,7 @@ struct xt_table_info *xt_alloc_table_info(unsigned int size)
if (!info)
return NULL;
 
-   memset(info, 0, sizeof(*info));
+   memset(info, 0, sz);
info->size = size;
return info;
 }
-- 
2.17.0

pull-request: wireless-drivers-next 2018-05-17

2018-05-17 Thread Kalle Valo

Hi Dave,

here's a pull request to net-next for 4.18. What I forgot to mention in the
signed tag is that one ID is added to include/linux/mmc/sdio_ids.h, but
that was acked by Ulf.

I suspect that because of my merge of wireless-drivers into
wireless-drivers-next the diffstat from request-pull was wrong again. I
manually replaced that with the diffstat from my test pull to net-next.

Please let me know if you have any problems.

Kalle

The following changes since commit af8a41cccf8f469165c6debc8fe07c5fd2ca501a:

  rtlwifi: cleanup 8723be ant_sel definition (2018-04-24 13:15:08 +0300)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next.git tags/wireless-drivers-next-for-davem-2018-05-17

for you to fetch changes up to 763ece85f45a6b93268e25a0abf02922f911dab4:

  brcmfmac: fix initialization of struct cfg80211_inform_bss variable (2018-05-15 18:03:35 +0300)


wireless-drivers-next patches for 4.18

The first pull request for 4.18. As usual new features and bug fixes
but nothing really special.

I also merged wireless-drivers due to an iwlwifi patch dependency.

Major changes:

iwlwifi

* implement Traffic Condition Monitor and use it for scan, BT coex and
  to detect when the AP doesn't support UAPSD properly

* some more work for the 22000 family of devices

* introduce AMSDU rate control offload

qtnfmac

* DFS offload support

rsi

* roaming enhancements

* increase max supported aggregation subframes

* don't advertise 5 GHz support if the device doesn't support it

brcmfmac

* add support for BCM4366E chipset

* add support for bcm43364 wireless chipset

ath10k

* enable temperature reads for QCA6174 and QCA9377

* add firmware memory dump support for QCA9984

* continue adding WCN3990 support via SNOC bus


Amitkumar Karwar (7):
  rsi: disable fw watchdog timer during reset
  rsi: device bootup parameter configuration
  rsi: use appropriate interface for power save configuration
  rsi: increase max supported aggregation subframes
  rsi: parse TID from data frame correctly
  rsi: enable power save by default for coex
  rsi: advertise 5GHz support based on device capability

Arend Van Spriel (2):
  brcmfmac: check p2pdev mac address uniqueness
  brcmfmac: constify firmware mapping tables

Arnd Bergmann (1):
  ath10k: avoid possible string overflow

Carl Huang (2):
  ath10k: add WMI_SERVICE_AVAILABLE_EVENT support
  ath10k: support MAC address randomization in scan

Colin Ian King (9):
  wil6210: fix potential null dereference of ndev before null check
  ath10k: fix spelling mistake: "tiggers" -> "triggers"
  ath6kl: fix spelling mistake: "chache" -> "cache"
  cw1200: fix spelling mistake: "Mailformed" -> "Malformed"
  rt2x00: fix spelling mistake in various macros, UKNOWN -> UNKNOWN
  ipw2100: fix spelling mistake: "decsribed" -> "described"
  rtlwifi: fix spelling mistake: "dismatch" -> "mismatch"
  ipw2200: fix spelling mistake: "functionalitis" -> "functionalities"
  rsi: fix spelling mistake: "thead" -> "thread"

Dan Carpenter (2):
  rsi: remove unecessary PTR_ALIGN()s
  mwifiex: pcie: tighten a check in mwifiex_pcie_process_event_ready()

Dan Haab (1):
  brcmfmac: add support for BCM4366E chipset

Daniel Mack (11):
  wcn36xx: check for DMA mapping errors in wcn36xx_dxe_tx_frame()
  wcn36xx: don't keep reference to skb if transmission failed
  wcn36xx: don't delete invalid bss indices
  wcn36xx: allocate skbs with GFP_KERNEL during init
  wcn36xx: use READ_ONCE() to access desc->ctrl
  wcn36xx: pass correct BSS index when deleting BSS keys
  wcn36xx: abort scan request when 'dequeued' indicator is sent
  wcn36xx: cancel pending scan request when interface goes down
  wcn36xx: handle scan cancellation when firmware support is missing
  wcn36xx: send bss_type in scan requests
  wcn36xx: pass information elements in scan requests

Dmitry Lebed (1):
  qtnfmac: add DFS offload support

Eliad Peller (2):
  iwlwifi: pcie: allow sending pre-built A-MSDUs
  iwlwifi: mvm: set wakeup filters for wowlan "any" configuration

Emmanuel Grumbach (3):
  iwlwifi: mvm: BT Coex - make the primary / secondary pick traffic aware
  iwlwifi: pcie: implement the overlow queue for Gen2 devices
  iwlwifi: mvm: set the MFP flag for keys that are used by MFP stations

Erik Stromdahl (2):
  ath10k: add inlined wrappers for htt tx ops
  ath10k: add inlined wrappers for htt rx ops

Eyal Reizer (1):
  wlcore: sdio: allow pm to handle sdio power

Felix Fietkau (11):
  mt76: stop tx queues from the driver callback instead of common code
  mt76: add missing VHT maximum A-MPDU length capability
  mt76: toggle driver station powersave bit before notifying mac80211

Re: [patch net-next RFC 00/12] devlink: introduce port flavours and common phys_port_name generation

2018-05-17 Thread Jiri Pirko

Thu, Mar 22, 2018 at 08:25:46PM CET, andrew.gospoda...@broadcom.com wrote:
>On Thu, Mar 22, 2018 at 01:10:38PM -0600, David Ahern wrote:
>> On 3/22/18 11:49 AM, Jiri Pirko wrote:
>> > Thu, Mar 22, 2018 at 04:34:07PM CET, dsah...@gmail.com wrote:
>> >> On 3/22/18 4:55 AM, Jiri Pirko wrote:
>> >>> From: Jiri Pirko 
>> >>>
>> >>> This patchset resolves 2 issues we have right now:
>> >>> 1) There are many netdevices / ports in the system, for port, pf, vf
>> >>>represenatation but the user has no way to see which is which
>> >>> 2) The ndo_get_phys_port_name is implemented in each driver separatelly,
>> >>>which may lead to inconsistent names between drivers.
>> >>
>> >> Similar to ndo_get_phys_port_{name,id}, devlink requires drivers to opt
>> >> in with an implementation right, so you can't really force a solution to
>> >> the consistent naming.
>> > 
>> > Yeah, drivers would still have free choice to implemen the ndo
>> > themselves. But most of them, like all sriov switch drivers should use
>> > the devlink helper to have consistent naming. In other words, devlink
>> > helper should be the standard way, in weird cases (like rocker), driver
>> > implements it himself.
>> 
>> That's an assumption that somehow the devlink API will be better
>> supported than ndo_get_phys_port_{name,id}. Don't get me wrong -- an API
>> to show the kind of device is needed, but I do not think this enforces
>> any kind of consistency in naming.
>> 
>> > 
>> > 
>> >>
>> >>>
>> >>> This patchset introduces port flavours which should address the first
>> >>> problem. I'm testing this with Netronome nfp hardware. When the user
>> >>> has 2 physical ports, 1 pf, and 4 vfs, he should see something like this:
>> >>> # devlink port
>> >>> pci/:05:00.0/0: type eth netdev enp5s0np0 flavour physical number 0
>> >>> pci/:05:00.0/268435456: type eth netdev eth0 flavour physical number >> >>> 0
>> >>> pci/:05:00.0/268435460: type eth netdev enp5s0np1 flavour physical 
>> >>> number 1
>> >>> pci/:05:00.0/536875008: type eth netdev eth2 flavour pf_rep number 
>> >>> 536875008
>> >>> pci/:05:00.0/536870912: type eth netdev eth1 flavour vf_rep number 0
>> >>> pci/:05:00.0/536870976: type eth netdev eth3 flavour vf_rep number 1
>> >>> pci/:05:00.0/536871040: type eth netdev eth4 flavour vf_rep number 2
>> >>> pci/:05:00.0/536871104: type eth netdev eth5 flavour vf_rep number 3
>> >>
>> >> How about 'kind' instead of flavo{u}r?
>> > 
>> > Yeah, kind is often used in kernel already with different meaning
>> > git grep kind net/core
>> > I wanted to avoid confusions
>> 
>> Roopa's amendment works as well; I just think flavor / flavour is the
>> wrong word. Make me thinks of food ... ice cream vs netdevices.
>
>Naming it a 'subtype' could also work.  It's a bit longer than 'kind'
>(no longer than 'flavour') and accurately describes the characteristic
>of this port.  It also avoids the namespace collision of 'kind' that
>Jiri points out.
>
>It also fits with the names used in the PCI world with vendor:device and
>subsystem vendor:subsystem device naming used there for further
>granularity.

Problem with "subtype" is that it indicates some relation with type.
We have type:
enum devlink_port_type {
DEVLINK_PORT_TYPE_NOTSET,
DEVLINK_PORT_TYPE_AUTO,
DEVLINK_PORT_TYPE_ETH,
DEVLINK_PORT_TYPE_IB,
};

It does not feel correct to have subtypes VF/PF/VFREP/etc., which really have
no relation to ETH/IB.

What about "role"? That also does not sound good to me, as "role"
indicates that the port can "act" like something.

For me "flavour/flavor" is still the favourite. It tells how the port tastes,
in a semi-fun way :)
Also, there is a precedent for this in particle physics:
https://en.wikipedia.org/wiki/Color%E2%80%93flavor_locking
I guess they also had trouble finding the right name :)

Re: [BUG] net: stmmac: dwmac-sun8i broken in linux-next

2018-05-17 Thread Jose Abreu

Hi Corentin,

On 16-05-2018 19:32, Corentin Labbe wrote:
> Hello
>
> The dwmac-sun8i driver is broken in next-20180515, symptom are no RX and TX 
> errors as shown by ifconfig:
> eth0: flags=4163  mtu 1500
> inet 192.168.1.204  netmask 255.255.255.0  broadcast 192.168.1.255
> ether 96:75:ff:0d:f6:d8  txqueuelen 1000  (Ethernet)
> RX packets 0  bytes 0 (0.0 B)
> RX errors 0  dropped 0  overruns 0  frame 0
> TX packets 0  bytes 4956 (4.8 KiB)
> TX errors 118  dropped 0 overruns 0  carrier 0  collisions 0
>
> Reverting the following commit made the driver working:
> 4dbbe8dde8485b89bce8bbbe7564337fd7eed69f ("net: stmmac: Add support for U32 
> TC filter using Flexible RX Parser")
> 5f0456b43140af9413397cc11d03d18b9f2fc2fc ("net: stmmac: Implement logic to 
> automatically select HW Interface")
>
> Note that reverting only 4dbbe8dde8485b89bce8bbbe7564337fd7eed69f lead to 
> crash:
> [   31.385110] Backtrace: 
> [   31.387576] [] (stmmac_open) from [] 
> (__dev_open+0xe4/0x180)
> [   31.394972]  r10:ed447d04 r9:edc5d010 r8:ef02002c r7:c08670a4 r6: 
> r5:c0c08488
> [   31.402793]  r4:ef02
> [   31.405335] [] (__dev_open) from [] 
> (__dev_change_flags+0x190/0x1e8)
> [   31.413421]  r8:1002 r7:c0c08488 r6:1003 r5:0001 r4:ef02
> [   31.420122] [] (__dev_change_flags) from [] 
> (dev_change_flags+0x20/0x50)
> [   31.428555]  r9:edc5d010 r8:ed447c18 r7:ef020134 r6: r5:1002 
> r4:ef02
> [   31.436300] [] (dev_change_flags) from [] 
> (do_setlink+0x28c/0xbdc)
> [   31.444213]  r9:edc5d010 r8:ed447c18 r7: r6:c0c08488 r5:ed447b50 
> r4:ef02
> [   31.451955] [] (do_setlink) from [] 
> (rtnl_newlink+0x54c/0x7a8)
> [   31.459522]  r10:ed447d04 r9: r8: r7: r6: 
> r5:
> [   31.467343]  r4:ef02
> [   31.469885] [] (rtnl_newlink) from [] 
> (rtnetlink_rcv_msg+0x38c/0x544)
> [   31.478058]  r10:ed447d04 r9: r8:ee242840 r7: r6:edc5d000 
> r5:c0c08488
> [   31.485879]  r4:
> [   31.488422] [] (rtnetlink_rcv_msg) from [] 
> (netlink_rcv_skb+0xc0/0x118)
> [   31.496768]  r10:c0c08488 r9: r8:0020 r7:edc5d000 r6:c064466c 
> r5:c0c08488
> [   31.504589]  r4:ee242840
> [   31.507129] [] (netlink_rcv_skb) from [] 
> (rtnetlink_rcv+0x18/0x1c)
> [   31.515042]  r8:ed447d60 r7:ee242840 r6:0020 r5:ee37d800 r4:ee5fac00
> [   31.521742] [] (rtnetlink_rcv) from [] 
> (netlink_unicast+0x190/0x1fc)
> [   31.529829] [] (netlink_unicast) from [] 
> (netlink_sendmsg+0x3cc/0x410)
> [   31.538089]  r10: r9:0020 r8:014000c0 r7:ee242840 r6:ee37d800 
> r5:c0c08488
> [   31.545910]  r4:ed447f44
> [   31.548452] [] (netlink_sendmsg) from [] 
> (sock_sendmsg+0x1c/0x2c)
> [   31.556279]  r10: r9:ed447edc r8: r7:eefce640 r6: 
> r5:c0c08488
> [   31.564100]  r4:ed447f44
> [   31.566640] [] (sock_sendmsg) from [] 
> (___sys_sendmsg+0x250/0x264)
> [   31.574555] [] (___sys_sendmsg) from [] 
> (__sys_sendmsg+0x58/0x94)
> [   31.582382]  r10: r9:ed446000 r8:c01011c4 r7:eefce640 r6: 
> r5:bec25150
> [   31.590203]  r4:c0c08488
> [   31.592743] [] (__sys_sendmsg) from [] 
> (sys_sendmsg+0x14/0x18)
> [   31.600307]  r7:0128 r6:bec2d17c r5:bec25144 r4:00093ee0
> [   31.605969] [] (sys_sendmsg) from [] 
> (ret_fast_syscall+0x0/0x28)
> [   31.613704] Exception stack(0xed447fa8 to 0xed447ff0)
> [   31.618756] 7fa0:   00093ee0 bec25144 0003 bec25150 
>  85ce
> [   31.626929] 7fc0: 00093ee0 bec25144 bec2d17c 0128 000942a8 5afc783a 
> 00094000 bec25150
> [   31.635099] 7fe0:  bec250f0 012c b6e10b5c
> [   31.640152] Code: e59a261c e59a013c e50b306c e592300c (e593300c) 
> [   31.646632] ---[ end trace 407964b7deb937bf ]---
>
> For the moment, I stil didnt find the issue.
> What to we do now ? do you want that I send revert patchs ?

No need for revert. I checked the patch and dwmac-sun8i
implementation and I think I found the problem. I will send you a
patch to test.

Thanks and Best Regards,
Jose Miguel Abreu

>
> Regards

Re: net: ieee802154: 6lowpan: fix frag reassembly

2018-05-17 Thread Greg KH

On Mon, May 14, 2018 at 05:22:18PM +0200, Stefan Schmidt wrote:
> Hello.
> 
> 
> Please apply f18fa5de5ba7f1d6650951502bb96a6e4715a948
> 
> (net: ieee802154: 6lowpan: fix frag reassembly) to the 4.16.x stable tree.
> 
> 
> Earlier trees are not needed as the problem was introduced in 4.16.

Really?  Commit f18fa5de5ba7 ("net: ieee802154: 6lowpan: fix frag
reassembly") says it fixes commit 648700f76b03 ("inet: frags: use
rhashtables for reassembly units") which did not show up until 4.17-rc1:
$ git describe --contains 648700f76b03
v4.17-rc1~148^2~20^2~11

Also, it did not get backported to 4.16.y, so I don't see how it is
needed in 4.16-stable.

To verify this, I tried applying the patch, and it totally fails to
apply to the 4.16.y tree.

So are you _sure_ you want/need this in 4.16?  If so, can you provide a
working backport that you have verified works?

thanks,

greg k-h

Re: [PATCH] netfilter: properly initialize xt_table_info structure

2018-05-17 Thread Michal Kubecek

On Thu, May 17, 2018 at 10:44:42AM +0200, Greg Kroah-Hartman wrote:
> When allocating a xt_table_info structure, we should be clearing out the
> full amount of memory that was allocated, not just the "header" of the
> structure.  Otherwise odd values could be passed to userspace, which is
> not a good thing.
> 
> Cc: stable 
> Signed-off-by: Greg Kroah-Hartman 
> ---
>  net/netfilter/x_tables.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
> index cb7cb300c3bc..a300e8252bb6 100644
> --- a/net/netfilter/x_tables.c
> +++ b/net/netfilter/x_tables.c
> @@ -1187,7 +1187,7 @@ struct xt_table_info *xt_alloc_table_info(unsigned int 
> size)
>   if (!info)
>   return NULL;
>  
> - memset(info, 0, sizeof(*info));
> + memset(info, 0, sz);
>   info->size = size;
>   return info;
>  }
> -- 
> 2.17.0
> 

Or we can replace kvmalloc() by kvzalloc() and remove the memset().
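
As a userspace illustration of that suggestion (a sketch only: calloc() stands in for kvzalloc(), and struct demo_table is hypothetical, not the real xt_table_info), allocating zeroed memory up front avoids having to get the memset() length right:

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical header followed by a variable-sized payload. */
struct demo_table {
	unsigned int size;
	unsigned char entries[];	/* flexible array member */
};

/*
 * Allocate the header plus `size` payload bytes, all zeroed.
 * A memset(t, 0, sizeof(*t)) after malloc() would clear only the
 * header and leave `entries` uninitialized.
 */
static struct demo_table *demo_table_alloc(unsigned int size)
{
	struct demo_table *t = calloc(1, sizeof(*t) + size);

	if (!t)
		return NULL;

	t->size = size;
	return t;
}
```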

Michal Kubecek

Re: [PATCH 3/3] sh_eth: add R8A77980 support

2018-05-17 Thread Simon Horman

On Wed, May 16, 2018 at 11:00:29PM +0300, Sergei Shtylyov wrote:
> Finally, add support for the DT probing of the R-Car V3H (AKA R8A77980) --
> it's the only R-Car gen3 SoC having the GEther controller -- others have
> only EtherAVB...
> 
> Based on the original (and large) patch by Vladimir Barinov.
> 
> Signed-off-by: Vladimir Barinov 
> Signed-off-by: Sergei Shtylyov 

Reviewed-by: Simon Horman

Re: [PATCH net-next v3 00/10] net: mvpp2: phylink conversion

2018-05-17 Thread Russell King - ARM Linux

On Thu, May 17, 2018 at 10:29:29AM +0200, Antoine Tenart wrote:
> Since v2:
>   - Removed the SFP description from the DB boards, as their SFP cages
> are wired properly. We now use fixed-link.

I think you mean "improperly" here.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up

Re: xdp and fragments with virtio

2018-05-17 Thread Jason Wang




On 2018-05-17 10:55, David Ahern wrote:

On 5/16/18 1:24 AM, Jason Wang wrote:


On 2018-05-16 11:51, David Ahern wrote:

Hi Jason:

I am trying to test MTU changes to the BPF fib_lookup helper and seeing
something odd. Hoping you can help.

I have a VM with multiple virtio based NICs and tap backends. I install
the xdp program on eth1 and eth2 to do forwarding. In the host I send a
large packet to eth1:

$ ping -s 1500 9.9.9.9


The tap device in the host sees 2 packets:

$ sudo tcpdump -nv -i vm02-eth1
20:44:33.943160 IP (tos 0x0, ttl 64, id 58746, offset 0, flags [+],
proto ICMP (1), length 1500)
  10.100.1.254 > 9.9.9.9: ICMP echo request, id 17917, seq 1,
length 1480
20:44:33.943172 IP (tos 0x0, ttl 64, id 58746, offset 1480, flags
[none], proto ICMP (1), length 48)
  10.100.1.254 > 9.9.9.9: ip-proto-1


In the VM, the XDP program only sees the first packet, not the fragment.
I added a printk to the program (see diff below):

$ cat trace_pipe
    -0 [003] ..s2   254.436467: 0: packet length 1514


Anything come to mind in the virtio xdp implementation that affects
fragment packets? I see this with both IPv4 and v6.

Not yet. We do turn off tap GSO when virtio has XDP set, but that
shouldn't matter in this case.

Will try to see what's wrong.


I added this to the command line for the NICs and it works:

"mrg_rxbuf=off,guest_tso4=off,guest_tso6=off,guest_ecn=off,guest_ufo=off"

XDP program sees the full size packet and the fragment.

Fun fact: only adding mrg_rxbuf=off so that mergeable_rx_bufs is false
but big_packets is true generates a panic when it receives large packets.


It looks like we wrongly drop packets after linearizing the packets 
during XDP_REDIRECT.


Please try the patch (but I do spot some other issues, will post a series):

Thanks

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index f34794a..59702f9 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -800,7 +800,7 @@ static struct sk_buff *receive_mergeable(struct 
net_device *dev,

    }
    *xdp_xmit = true;
    if (unlikely(xdp_page != page))
-   goto err_xdp;
+   put_page(page);
    rcu_read_unlock();
    goto xdp_xmit;
    default:
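
The packet sizes reported earlier in this message follow from plain IPv4 fragmentation arithmetic. The sketch below is a userspace illustration, not kernel code: the helper name frag_total_len() is hypothetical, and it assumes a 1500-byte MTU, a 20-byte IPv4 header without options, an 8-byte ICMP header and a 14-byte untagged Ethernet header.

```c
#include <assert.h>

#define IP_HDR_LEN   20u  /* IPv4 header without options (assumption) */
#define ICMP_HDR_LEN  8u  /* ICMP echo header */
#define ETH_HDR_LEN  14u  /* Ethernet header, no VLAN tag (assumption) */

/*
 * Total IPv4 length of fragment number idx (0-based) for an ICMP echo
 * request carrying icmp_payload bytes, or 0 if no such fragment exists.
 * Hypothetical helper, for illustration only.
 */
static unsigned int frag_total_len(unsigned int icmp_payload,
                                   unsigned int mtu, unsigned int idx)
{
    unsigned int l4_len, per_frag, offset, chunk;

    assert(mtu >= IP_HDR_LEN + 8u);

    l4_len = ICMP_HDR_LEN + icmp_payload;     /* bytes IP has to carry */
    /* Every fragment except the last carries a multiple of 8 bytes. */
    per_frag = ((mtu - IP_HDR_LEN) / 8u) * 8u;
    offset = idx * per_frag;

    if (offset >= l4_len)
        return 0;

    chunk = l4_len - offset;
    if (chunk > per_frag)
        chunk = per_frag;

    return IP_HDR_LEN + chunk;
}
```

With "ping -s 1500" this gives 1500 bytes for the first fragment and 48 bytes for the second, matching the tcpdump output, and 1500 + 14 = 1514 matches the length printed by the XDP program for the first fragment.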

Re: [PATCH net-next v3 00/10] net: mvpp2: phylink conversion

2018-05-17 Thread Antoine Tenart

Hi Russell,

On Thu, May 17, 2018 at 10:18:56AM +0100, Russell King - ARM Linux wrote:
> On Thu, May 17, 2018 at 10:29:29AM +0200, Antoine Tenart wrote:
> > Since v2:
> >   - Removed the SFP description from the DB boards, as their SFP cages
> > are wired properly. We now use fixed-link.
> 
> I think you mean "improperly" here.

Right :)

Thanks,
Antoine

-- 
Antoine Ténart, Bootlin (formerly Free Electrons)
Embedded Linux and Kernel engineering
https://bootlin.com

Re: [PATCH] netfilter: properly initialize xt_table_info structure

2018-05-17 Thread Greg Kroah-Hartman

On Thu, May 17, 2018 at 10:59:51AM +0200, Michal Kubecek wrote:
> On Thu, May 17, 2018 at 10:44:42AM +0200, Greg Kroah-Hartman wrote:
> > When allocating a xt_table_info structure, we should be clearing out the
> > full amount of memory that was allocated, not just the "header" of the
> > structure.  Otherwise odd values could be passed to userspace, which is
> > not a good thing.
> > 
> > Cc: stable 
> > Signed-off-by: Greg Kroah-Hartman 
> > ---
> >  net/netfilter/x_tables.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
> > index cb7cb300c3bc..a300e8252bb6 100644
> > --- a/net/netfilter/x_tables.c
> > +++ b/net/netfilter/x_tables.c
> > @@ -1187,7 +1187,7 @@ struct xt_table_info *xt_alloc_table_info(unsigned 
> > int size)
> > if (!info)
> > return NULL;
> >  
> > -   memset(info, 0, sizeof(*info));
> > +   memset(info, 0, sz);
> > info->size = size;
> > return info;
> >  }
> > -- 
> > 2.17.0
> > 
> 
> Or we can replace kvmalloc() by kvzalloc() and remove the memset().

That works for me too, either is sufficient to solve the problem.

Let me go respin this, less lines of code is always better :)

thanks,

greg k-h
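
The difference between the two variants discussed in this thread can be modelled in userspace. This is only a sketch: struct table_info and the helper names are hypothetical stand-ins, with malloc()/calloc() playing the roles of kvmalloc()/kvzalloc(). It assumes the structure is a fixed header followed by a variable-size blob.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in: fixed header followed by a variable-size blob. */
struct table_info {
    unsigned int size;          /* number of payload bytes that follow */
    unsigned char entries[];    /* flexible array member */
};

/*
 * Original behaviour: only the header is cleared, so the payload keeps
 * whatever the allocator handed back.
 */
static struct table_info *alloc_header_zeroed(unsigned int size)
{
    size_t sz = sizeof(struct table_info) + size;
    struct table_info *info = malloc(sz);

    if (!info)
        return NULL;

    memset(info, 0, sizeof(*info));   /* header only */
    info->size = size;
    return info;
}

/* v2 behaviour: a zeroing allocator clears header and payload alike. */
static struct table_info *alloc_fully_zeroed(unsigned int size)
{
    size_t sz = sizeof(struct table_info) + size;
    struct table_info *info = calloc(1, sz);

    if (!info)
        return NULL;

    info->size = size;
    return info;
}

/* Returns 1 if every payload byte is zero. */
static int payload_is_zero(const struct table_info *info)
{
    unsigned int i;

    for (i = 0; i < info->size; i++)
        if (info->entries[i] != 0)
            return 0;
    return 1;
}
```

Only the second variant guarantees that payload_is_zero() holds right after allocation; with the first, any payload byte that is never overwritten later still holds stale allocator contents.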

[PATCH v2] netfilter: properly initialize xt_table_info structure

2018-05-17 Thread Greg Kroah-Hartman

When allocating a xt_table_info structure, we should be clearing out the
full amount of memory that was allocated, not just the "header" of the
structure.  Otherwise odd values could be passed to userspace, which is
not a good thing.

Cc: stable 
Signed-off-by: Greg Kroah-Hartman 
---
v2: use kvzalloc instead of kvmalloc/memset pair, as suggested by Michal Kubecek

 net/netfilter/x_tables.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
index cb7cb300c3bc..cd22bb9b66f3 100644
--- a/net/netfilter/x_tables.c
+++ b/net/netfilter/x_tables.c
@@ -1183,11 +1183,10 @@ struct xt_table_info *xt_alloc_table_info(unsigned int 
size)
 * than shoot all processes down before realizing there is nothing
 * more to reclaim.
 */
-   info = kvmalloc(sz, GFP_KERNEL | __GFP_NORETRY);
+   info = kvzalloc(sz, GFP_KERNEL | __GFP_NORETRY);
if (!info)
return NULL;
 
-   memset(info, 0, sizeof(*info));
info->size = size;
return info;
 }
-- 
2.17.0

Re: [PATCH v2] netfilter: properly initialize xt_table_info structure

2018-05-17 Thread Eric Dumazet



On 05/17/2018 02:34 AM, Greg Kroah-Hartman wrote:
> When allocating a xt_table_info structure, we should be clearing out the
> full amount of memory that was allocated, not just the "header" of the
> structure.  Otherwise odd values could be passed to userspace, which is
> not a good thing.
> 
> Cc: stable 
> Signed-off-by: Greg Kroah-Hartman 
> ---
> v2: use kvzalloc instead of kvmalloc/memset pair, as suggested by Michal 
> Kubecek
> 
>  net/netfilter/x_tables.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
> index cb7cb300c3bc..cd22bb9b66f3 100644
> --- a/net/netfilter/x_tables.c
> +++ b/net/netfilter/x_tables.c
> @@ -1183,11 +1183,10 @@ struct xt_table_info *xt_alloc_table_info(unsigned 
> int size)
>* than shoot all processes down before realizing there is nothing
>* more to reclaim.
>*/
> - info = kvmalloc(sz, GFP_KERNEL | __GFP_NORETRY);
> + info = kvzalloc(sz, GFP_KERNEL | __GFP_NORETRY);
>   if (!info)
>   return NULL;
>  
> - memset(info, 0, sizeof(*info));
>   info->size = size;
>   return info;
>  }
> 

I am curious, what particular path does not later overwrite the whole zone ?

Do not get me wrong, this is not fast path, but these blobs can be huge.

[PATCH net-next] net: stmmac: Populate missing callbacks in HWIF initialization

2018-05-17 Thread Jose Abreu

Some HW specific setups, like sun8i, do not populate all the necessary
callbacks, which is what HWIF helpers were expecting.

Fix this by always trying to get the generic helpers and populate them
if they were not previously populated by HW specific setup.

Signed-off-by: Jose Abreu 
Fixes: 5f0456b43140 ("net: stmmac: Implement logic to automatically
select HW Interface")
Reported-by: Corentin Labbe 
Cc: Corentin Labbe 
Cc: David S. Miller 
Cc: Joao Pinto 
Cc: Giuseppe Cavallaro 
Cc: Alexandre Torgue 
---
Hi Corentin,

Please check if this patch makes sun8i work again.

Thanks and Best Regards,
Jose Miguel Abreu
---
 drivers/net/ethernet/stmicro/stmmac/hwif.c |   38 ---
 1 files changed, 22 insertions(+), 16 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/hwif.c 
b/drivers/net/ethernet/stmicro/stmmac/hwif.c
index 9acc8d2..bf87571 100644
--- a/drivers/net/ethernet/stmicro/stmmac/hwif.c
+++ b/drivers/net/ethernet/stmicro/stmmac/hwif.c
@@ -163,13 +163,16 @@ int stmmac_hwif_init(struct stmmac_priv *priv)
bool needs_gmac = priv->plat->has_gmac;
const struct stmmac_hwif_entry *entry;
struct mac_device_info *mac;
+   bool needs_setup = true;
int i, ret;
u32 id;
 
if (needs_gmac) {
id = stmmac_get_id(priv, GMAC_VERSION);
-   } else {
+   } else if (needs_gmac4) {
id = stmmac_get_id(priv, GMAC4_VERSION);
+   } else {
+   id = 0;
}
 
/* Save ID for later use */
@@ -177,13 +180,12 @@ int stmmac_hwif_init(struct stmmac_priv *priv)
 
/* Check for HW specific setup first */
if (priv->plat->setup) {
-   priv->hw = priv->plat->setup(priv);
-   if (!priv->hw)
-   return -ENOMEM;
-   return 0;
+   mac = priv->plat->setup(priv);
+   needs_setup = false;
+   } else {
+   mac = devm_kzalloc(priv->device, sizeof(*mac), GFP_KERNEL);
}
 
-   mac = devm_kzalloc(priv->device, sizeof(*mac), GFP_KERNEL);
if (!mac)
return -ENOMEM;
 
@@ -195,22 +197,26 @@ int stmmac_hwif_init(struct stmmac_priv *priv)
continue;
if (needs_gmac4 ^ entry->gmac4)
continue;
-   if (id < entry->min_id)
+   /* Use synopsys_id var because some setups can override this */
+   if (priv->synopsys_id < entry->min_id)
continue;
 
-   mac->desc = entry->desc;
-   mac->dma = entry->dma;
-   mac->mac = entry->mac;
-   mac->ptp = entry->hwtimestamp;
-   mac->mode = entry->mode;
-   mac->tc = entry->tc;
+   /* Only use generic HW helpers if needed */
+   mac->desc = mac->desc ? : entry->desc;
+   mac->dma = mac->dma ? : entry->dma;
+   mac->mac = mac->mac ? : entry->mac;
+   mac->ptp = mac->ptp ? : entry->hwtimestamp;
+   mac->mode = mac->mode ? : entry->mode;
+   mac->tc = mac->tc ? : entry->tc;
 
priv->hw = mac;
 
/* Entry found */
-   ret = entry->setup(priv);
-   if (ret)
-   return ret;
+   if (needs_setup) {
+   ret = entry->setup(priv);
+   if (ret)
+   return ret;
+   }
 
/* Run quirks, if needed */
if (entry->quirks) {
-- 
1.7.1
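
The "only use generic HW helpers if needed" step in the patch above can be sketched in isolation. The patch uses the GNU C shorthand "a ? : b"; the sketch below spells it as the portable "a ? a : b". The struct and function names are hypothetical and much simpler than the real stmmac types.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified ops table and per-device helper set. */
struct ops {
    int id;
};

struct mac_info {
    const struct ops *desc;
    const struct ops *dma;
    const struct ops *mac;
};

/*
 * Keep every helper the platform-specific setup already provided and
 * fall back to the generic one only where the pointer is still NULL.
 */
static void fill_missing(struct mac_info *m, const struct mac_info *generic)
{
    m->desc = m->desc ? m->desc : generic->desc;
    m->dma  = m->dma  ? m->dma  : generic->dma;
    m->mac  = m->mac  ? m->mac  : generic->mac;
}
```

A platform setup that fills in only "dma" keeps its own dma helper and receives the generic desc and mac helpers, which is the behaviour the commit message describes for setups such as sun8i.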

[patch net-next] nfp: flower: set sysfs link to device for representors

2018-05-17 Thread Jiri Pirko

From: Jiri Pirko 

Do this so the sysfs has "device" link correctly set.

Signed-off-by: Jiri Pirko 
---
 drivers/net/ethernet/netronome/nfp/flower/main.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/netronome/nfp/flower/main.c 
b/drivers/net/ethernet/netronome/nfp/flower/main.c
index 4e67c0cbf9f0..976ed112387d 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/main.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/main.c
@@ -267,6 +267,7 @@ nfp_flower_spawn_vnic_reprs(struct nfp_app *app,
app->pf->vf_cfg_mem + i * NFP_NET_CFG_BAR_SZ;
}
 
+   SET_NETDEV_DEV(repr, &priv->nn->pdev->dev);
eth_hw_addr_random(repr);
 
port_id = nfp_flower_cmsg_pcie_port(nfp_pcie, vnic_type,
-- 
2.14.3

[patch net-next] nfp: flower: fix error path during representor creation

2018-05-17 Thread Jiri Pirko

From: Jiri Pirko 

Don't store repr pointer to reprs array until the representor is
successfully created. This avoids message about "representor
destruction" even when it was never created. It also cleans up the flow.
Also, check return value after port alloc.

Signed-off-by: Jiri Pirko 
---
 drivers/net/ethernet/netronome/nfp/flower/main.c  | 13 +++--
 drivers/net/ethernet/netronome/nfp/nfp_net_repr.c |  9 +++--
 drivers/net/ethernet/netronome/nfp/nfp_net_repr.h |  1 +
 3 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/flower/main.c 
b/drivers/net/ethernet/netronome/nfp/flower/main.c
index 84e3b9f5abb1..4e67c0cbf9f0 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/main.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/main.c
@@ -247,12 +247,16 @@ nfp_flower_spawn_vnic_reprs(struct nfp_app *app,
err = -ENOMEM;
goto err_reprs_clean;
}
-   RCU_INIT_POINTER(reprs->reprs[i], repr);
 
/* For now we only support 1 PF */
WARN_ON(repr_type == NFP_REPR_TYPE_PF && i);
 
port = nfp_port_alloc(app, port_type, repr);
+   if (IS_ERR(port)) {
+   err = PTR_ERR(port);
+   nfp_repr_free(repr);
+   goto err_reprs_clean;
+   }
if (repr_type == NFP_REPR_TYPE_PF) {
port->pf_id = i;
port->vnic = priv->nn->dp.ctrl_bar;
@@ -271,9 +275,11 @@ nfp_flower_spawn_vnic_reprs(struct nfp_app *app,
port_id, port, priv->nn->dp.netdev);
if (err) {
nfp_port_free(port);
+   nfp_repr_free(repr);
goto err_reprs_clean;
}
 
+   RCU_INIT_POINTER(reprs->reprs[i], repr);
nfp_info(app->cpp, "%s%d Representor(%s) created\n",
 repr_type == NFP_REPR_TYPE_PF ? "PF" : "VF", i,
 repr->name);
@@ -344,16 +350,17 @@ nfp_flower_spawn_phy_reprs(struct nfp_app *app, struct 
nfp_flower_priv *priv)
err = -ENOMEM;
goto err_reprs_clean;
}
-   RCU_INIT_POINTER(reprs->reprs[phys_port], repr);
 
port = nfp_port_alloc(app, NFP_PORT_PHYS_PORT, repr);
if (IS_ERR(port)) {
err = PTR_ERR(port);
+   nfp_repr_free(repr);
goto err_reprs_clean;
}
err = nfp_port_init_phy_port(app->pf, app, port, i);
if (err) {
nfp_port_free(port);
+   nfp_repr_free(repr);
goto err_reprs_clean;
}
 
@@ -365,6 +372,7 @@ nfp_flower_spawn_phy_reprs(struct nfp_app *app, struct 
nfp_flower_priv *priv)
cmsg_port_id, port, priv->nn->dp.netdev);
if (err) {
nfp_port_free(port);
+   nfp_repr_free(repr);
goto err_reprs_clean;
}
 
@@ -373,6 +381,7 @@ nfp_flower_spawn_phy_reprs(struct nfp_app *app, struct 
nfp_flower_priv *priv)
 eth_tbl->ports[i].base,
 phys_port);
 
+   RCU_INIT_POINTER(reprs->reprs[phys_port], repr);
nfp_info(app->cpp, "Phys Port %d Representor(%s) created\n",
 phys_port, repr->name);
}
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c 
b/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c
index 0cd077addb26..6e79da91e475 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c
@@ -348,12 +348,17 @@ int nfp_repr_init(struct nfp_app *app, struct net_device 
*netdev,
return err;
 }
 
-static void nfp_repr_free(struct nfp_repr *repr)
+static void __nfp_repr_free(struct nfp_repr *repr)
 {
free_percpu(repr->stats);
free_netdev(repr->netdev);
 }
 
+void nfp_repr_free(struct net_device *netdev)
+{
+   __nfp_repr_free(netdev_priv(netdev));
+}
+
 struct net_device *nfp_repr_alloc(struct nfp_app *app)
 {
struct net_device *netdev;
@@ -385,7 +390,7 @@ static void nfp_repr_clean_and_free(struct nfp_repr *repr)
nfp_info(repr->app->cpp, "Destroying Representor(%s)\n",
 repr->netdev->name);
nfp_repr_clean(repr);
-   nfp_repr_free(repr);
+   __nfp_repr_free(repr);
 }
 
 void nfp_reprs_clean_and_free(struct nfp_app *app, struct nfp_reprs *reprs)
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_repr.h 
b/drivers/net/ethernet/netronome/nfp/nfp_net_repr.h
index a621e8ff528e..cd756a15445f 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp
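
The ordering change in the patch above (store the pointer in the array only once creation has fully succeeded, and free the object directly on every earlier failure) can be sketched in userspace as follows. All names are hypothetical; a plain pointer store stands in for RCU_INIT_POINTER() and free() for nfp_repr_free().

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_REPRS 4

/* Hypothetical, simplified representor and lookup table. */
struct repr {
    int port_id;
};

struct repr_table {
    struct repr *slots[MAX_REPRS];
};

/* Hypothetical init step that can fail: negative port ids are rejected. */
static int repr_init(struct repr *r, int port_id)
{
    if (port_id < 0)
        return -1;
    r->port_id = port_id;
    return 0;
}

/*
 * Create a representor and publish it in the table only after every
 * step has succeeded. On failure the object is freed here and the
 * slot stays NULL, so a later table-wide cleanup never sees it.
 */
static int spawn_repr(struct repr_table *t, unsigned int idx, int port_id)
{
    struct repr *r;

    if (idx >= MAX_REPRS)
        return -1;

    r = calloc(1, sizeof(*r));
    if (!r)
        return -1;

    if (repr_init(r, port_id)) {
        free(r);            /* never published, free directly */
        return -1;
    }

    t->slots[idx] = r;      /* publish last */
    return 0;
}
```

Because a failed spawn leaves its slot empty, a cleanup pass over the table only ever destroys representors that were really created, which is how the misleading "representor destruction" message is avoided.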

Re: [PATCH v2] netfilter: properly initialize xt_table_info structure

2018-05-17 Thread Greg Kroah-Hartman

On Thu, May 17, 2018 at 02:55:42AM -0700, Eric Dumazet wrote:
> 
> 
> On 05/17/2018 02:34 AM, Greg Kroah-Hartman wrote:
> > When allocating a xt_table_info structure, we should be clearing out the
> > full amount of memory that was allocated, not just the "header" of the
> > structure.  Otherwise odd values could be passed to userspace, which is
> > not a good thing.
> > 
> > Cc: stable 
> > Signed-off-by: Greg Kroah-Hartman 
> > ---
> > v2: use kvzalloc instead of kvmalloc/memset pair, as suggested by Michal 
> > Kubecek
> > 
> >  net/netfilter/x_tables.c | 3 +--
> >  1 file changed, 1 insertion(+), 2 deletions(-)
> > 
> > diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
> > index cb7cb300c3bc..cd22bb9b66f3 100644
> > --- a/net/netfilter/x_tables.c
> > +++ b/net/netfilter/x_tables.c
> > @@ -1183,11 +1183,10 @@ struct xt_table_info *xt_alloc_table_info(unsigned 
> > int size)
> >  * than shoot all processes down before realizing there is nothing
> >  * more to reclaim.
> >  */
> > -   info = kvmalloc(sz, GFP_KERNEL | __GFP_NORETRY);
> > +   info = kvzalloc(sz, GFP_KERNEL | __GFP_NORETRY);
> > if (!info)
> > return NULL;
> >  
> > -   memset(info, 0, sizeof(*info));
> > info->size = size;
> > return info;
> >  }
> > 
> 
> I am curious, what particular path does not later overwrite the whole zone ?

The path back was long, adding Greg Hackman who helped to debug this to
the To: to confirm that I got this correct...

In do_ipt_get_ctl, the IPT_SO_GET_ENTRIES: option uses a len value that
can be larger than the size of the structure itself.

Then the data is copied to userspace in copy_entries_to_user() for ipv4
and v6, and that's where the "bad data" was noticed (a researcher was
using a kernel patch to determine what the data was)

Greg, that's the correct path here, right?

> Do not get me wrong, this is not fast path, but these blobs can be huge.

Yeah, I bet, but for "normal" cases the size should be small and all
should be fine.

thanks,

greg k-h

Re: [PATCH v2] netfilter: properly initialize xt_table_info structure

2018-05-17 Thread Jan Engelhardt

On Thursday 2018-05-17 12:09, Greg Kroah-Hartman wrote:
>> > --- a/net/netfilter/x_tables.c
>> > +++ b/net/netfilter/x_tables.c
>> > @@ -1183,11 +1183,10 @@ struct xt_table_info *xt_alloc_table_info(unsigned 
>> > int size)
>> > * than shoot all processes down before realizing there is nothing
>> > * more to reclaim.
>> > */
>> > -  info = kvmalloc(sz, GFP_KERNEL | __GFP_NORETRY);
>> > +  info = kvzalloc(sz, GFP_KERNEL | __GFP_NORETRY);
>> >if (!info)
>> >return NULL;
>>
>> I am curious, what particular path does not later overwrite the whole zone ?
>
>In do_ipt_get_ctl, the IPT_SO_GET_ENTRIES: option uses a len value that
>can be larger than the size of the structure itself.
>
>Then the data is copied to userspace in copy_entries_to_user() for ipv4
>and v6, and that's where the "bad data"

If the kernel incorrectly copies more bytes than it should, isn't that
a sign that it may be going past the end of the info buffer?
(And thus, zeroing won't truly fix the issue)

And if the kernel copies too few (because it just does not have more
data than userspace is requesting), what remains in the user buffer
is the garbage that originally was there.
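
Both cases raised here can be illustrated with a small userspace model. This is a sketch under assumptions, not the netfilter code path: bounded_copy() is a hypothetical helper, and memcpy() stands in for the copy to userspace.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy from a kernel-side blob of src_len bytes into a caller buffer of
 * user_len bytes. The length is clamped to the smaller of the two, so
 * the copy can neither read past the source nor write past the
 * destination. Returns the number of bytes actually copied.
 */
static size_t bounded_copy(unsigned char *user_buf, size_t user_len,
                           const unsigned char *src, size_t src_len)
{
    size_t n = user_len < src_len ? user_len : src_len;

    memcpy(user_buf, src, n);
    return n;
}
```

If the source is shorter than the caller's buffer, the tail of the buffer is left untouched and still holds whatever the caller had there before, which is the second case described above. Zeroing the source allocation does not change that; the caller has to look at the returned length.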

[PATCH net 1/1] net/smc: initialize tx_work before llc initial handshake

2018-05-17 Thread Ursula Braun

From: Karsten Graul 

When the llc handshake fails in an early state, the general cleanup
routines may try to cancel an uninitialized worker. Avoid this by
initializing the worker before the llc initial handshake starts.

Signed-off-by: Karsten Graul 
Signed-off-by: Ursula Braun 
---
 net/smc/af_smc.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index 544bab42f925..e96f324dc69f 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -494,6 +494,7 @@ static int smc_connect_rdma(struct smc_sock *smc)
rc = smc_clc_send_confirm(smc);
if (rc)
goto out_err_unlock;
+   smc_tx_init(smc);
 
if (local_contact == SMC_FIRST_CONTACT) {
/* QP confirmation over RoCE fabric */
@@ -505,9 +506,7 @@ static int smc_connect_rdma(struct smc_sock *smc)
if (reason_code > 0)
goto decline_rdma_unlock;
}
-
mutex_unlock(&smc_create_lgr_pending);
-   smc_tx_init(smc);
 
 out_connected:
smc_copy_sock_settings_to_clc(smc);
@@ -885,6 +884,7 @@ static void smc_listen_work(struct work_struct *work)
reason_code = SMC_CLC_DECL_INTERR;
goto decline_rdma_unlock;
}
+   smc_tx_init(new_smc);
 
if (local_contact == SMC_FIRST_CONTACT) {
rc = smc_ib_ready_link(link);
@@ -900,8 +900,6 @@ static void smc_listen_work(struct work_struct *work)
if (reason_code > 0)
goto decline_rdma_unlock;
}
-
-   smc_tx_init(new_smc);
mutex_unlock(&smc_create_lgr_pending);
 
 out_connected:
-- 
2.16.3

[PATCH net-next] net/smc: init conn.tx_work & conn.send_lock sooner

2018-05-17 Thread Eric Dumazet

syzkaller found that the following program crashes the host:

{
  int fd = socket(AF_SMC, SOCK_STREAM, 0);
  int val = 1;

  listen(fd, 0);
  shutdown(fd, SHUT_RDWR);
  setsockopt(fd, 6, TCP_NODELAY, &val, 4);
}

Simply initialize conn.tx_work & conn.send_lock at socket creation,
rather than deeper in the stack.

ODEBUG: assert_init not available (active state 0) object type: timer_list hint:   (null)
WARNING: CPU: 1 PID: 13988 at lib/debugobjects.c:329 debug_print_object+0x16a/0x210 lib/debugobjects.c:326
Kernel panic - not syncing: panic_on_warn set ...

CPU: 1 PID: 13988 Comm: syz-executor0 Not tainted 4.17.0-rc4+ #46
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x1b9/0x294 lib/dump_stack.c:113
 panic+0x22f/0x4de kernel/panic.c:184
 __warn.cold.8+0x163/0x1b3 kernel/panic.c:536
 report_bug+0x252/0x2d0 lib/bug.c:186
 fixup_bug arch/x86/kernel/traps.c:178 [inline]
 do_error_trap+0x1de/0x490 arch/x86/kernel/traps.c:296
 do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315
 invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:992
RIP: 0010:debug_print_object+0x16a/0x210 lib/debugobjects.c:326
RSP: 0018:880197a37880 EFLAGS: 00010086
RAX: 0061 RBX: 0005 RCX: c90001ed
RDX: 4aaf RSI: 8160f6f1 RDI: 0001
RBP: 880197a378c0 R08: 8801aa7a0080 R09: ed003b5e3eb2
R10: ed003b5e3eb2 R11: 8801daf1f597 R12: 0001
R13: 88d96980 R14: 87fa19a0 R15: 81666ec0
 debug_object_assert_init+0x309/0x500 lib/debugobjects.c:692
 debug_timer_assert_init kernel/time/timer.c:724 [inline]
 debug_assert_init kernel/time/timer.c:776 [inline]
 del_timer+0x74/0x140 kernel/time/timer.c:1198
 try_to_grab_pending+0x439/0x9a0 kernel/workqueue.c:1223
 mod_delayed_work_on+0x91/0x250 kernel/workqueue.c:1592
 mod_delayed_work include/linux/workqueue.h:541 [inline]
 smc_setsockopt+0x387/0x6d0 net/smc/af_smc.c:1367
 __sys_setsockopt+0x1bd/0x390 net/socket.c:1903
 __do_sys_setsockopt net/socket.c:1914 [inline]
 __se_sys_setsockopt net/socket.c:1911 [inline]
 __x64_sys_setsockopt+0xbe/0x150 net/socket.c:1911
 do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixes: 01d2f7e2cdd3 ("net/smc: sockopts TCP_NODELAY and TCP_CORK")
Signed-off-by: Eric Dumazet 
Cc: Ursula Braun 
Cc: linux-s...@vger.kernel.org
Reported-by: syzbot 
---
 net/smc/af_smc.c | 2 ++
 net/smc/smc_tx.c | 4 +---
 net/smc/smc_tx.h | 1 +
 3 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index d15762b057c0d8d4167feca9a4c41f0408604c37..6ad4f6c771c3fa63d3ae714dbe65879910963a21 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -193,8 +193,10 @@ static struct sock *smc_sock_alloc(struct net *net, struct 
socket *sock,
sk->sk_protocol = protocol;
smc = smc_sk(sk);
INIT_WORK(&smc->tcp_listen_work, smc_tcp_listen_work);
+   INIT_DELAYED_WORK(&smc->conn.tx_work, smc_tx_work);
INIT_LIST_HEAD(&smc->accept_q);
spin_lock_init(&smc->accept_q_lock);
+   spin_lock_init(&smc->conn.send_lock);
sk->sk_prot->hash(sk);
sk_refcnt_debug_inc(sk);
 
diff --git a/net/smc/smc_tx.c b/net/smc/smc_tx.c
index 58dfe0bd9d6075b5d3db97c65b364b97da73..08a7de98bb031b5d2256e3238236108749e7ae39 100644
--- a/net/smc/smc_tx.c
+++ b/net/smc/smc_tx.c
@@ -450,7 +450,7 @@ int smc_tx_sndbuf_nonempty(struct smc_connection *conn)
 /* Wakeup sndbuf consumers from process context
  * since there is more data to transmit
  */
-static void smc_tx_work(struct work_struct *work)
+void smc_tx_work(struct work_struct *work)
 {
struct smc_connection *conn = container_of(to_delayed_work(work),
   struct smc_connection,
@@ -512,6 +512,4 @@ void smc_tx_consumer_update(struct smc_connection *conn)
 void smc_tx_init(struct smc_sock *smc)
 {
smc->sk.sk_write_space = smc_tx_write_space;
-   INIT_DELAYED_WORK(&smc->conn.tx_work, smc_tx_work);
-   spin_lock_init(&smc->conn.send_lock);
 }
diff --git a/net/smc/smc_tx.h b/net/smc/smc_tx.h
index 78255964fa4dc1c69f96548e035e74a167999a62..8f64b12bf03c1b52c599b1239784a404efe65dae 100644
--- a/net/smc/smc_tx.h
+++ b/net/smc/smc_tx.h
@@ -27,6 +27,7 @@ static inline int smc_tx_prepared_sends(struct smc_connection 
*conn)
return smc_curs_diff(conn->sndbuf_size, &sent, &prep);
 }
 
+void smc_tx_work(struct work_struct *work);
 void smc_tx_init(struct smc_sock *smc);
 int smc_tx_sendmsg(struct smc_sock *smc, struct msghdr *msg, size_t len);
 int smc_tx_sndbuf_nonempty(struct smc_connection *conn);
-- 
2.17.0.441.gb46fe60e1d-goog
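
The idea behind the patch above, initializing every member at object creation so that any later entry point finds it in a valid state, can be modelled in userspace. This is a sketch with hypothetical names: the magic field plays the role of the debugobjects check that fired in the trace, and sock_set_nodelay() stands in for the setsockopt() path that touches tx_work.

```c
#include <assert.h>
#include <stdlib.h>

#define WORK_MAGIC 0x5743u

/* Hypothetical model of a work item that must be initialized before use. */
struct work_model {
    unsigned int magic;
    int pending;
};

struct sock_model {
    struct work_model tx_work;
};

static void work_init(struct work_model *w)
{
    w->magic = WORK_MAGIC;
    w->pending = 0;
}

/* Returns 0 on success, -1 if the work item was never initialized. */
static int work_mod(struct work_model *w)
{
    if (w->magic != WORK_MAGIC)
        return -1;
    w->pending = 1;
    return 0;
}

/* Old layout: tx_work is left for a later connect/accept step. */
static struct sock_model *sock_alloc_late_init(void)
{
    return calloc(1, sizeof(struct sock_model));
}

/* New layout: tx_work is initialized as part of socket creation. */
static struct sock_model *sock_alloc_early_init(void)
{
    struct sock_model *s = calloc(1, sizeof(*s));

    if (s)
        work_init(&s->tx_work);
    return s;
}

/* Stand-in for a socket option handler that schedules tx_work. */
static int sock_set_nodelay(struct sock_model *s)
{
    return work_mod(&s->tx_work);
}
```

In this model a socket from sock_alloc_late_init() fails sock_set_nodelay() if no connection step ran first, while one from sock_alloc_early_init() succeeds regardless of call order.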

Re: [PATCH net-next v12 3/7] sch_cake: Add optional ACK filter

2018-05-17 Thread Eric Dumazet

On 05/16/2018 01:29 PM, Toke Høiland-Jørgensen wrote:
> The ACK filter is an optional feature of CAKE which is designed to improve
> performance on links with very asymmetrical rate limits. On such links
> (which are unfortunately quite prevalent, especially for DSL and cable
> subscribers), the downstream throughput can be limited by the number of
> ACKs capable of being transmitted in the *upstream* direction.
> 

...

> 
> Signed-off-by: Toke Høiland-Jørgensen 
> ---
>  net/sched/sch_cake.c |  260 
> ++
>  1 file changed, 258 insertions(+), 2 deletions(-)
> 
>

I have decided to implement ACK compression in TCP stack itself.

First step is to take care of SACK, which are the main source of the bloat,
since we send one SACK for every incoming out-of-order packet.

These SACK are not only causing pain on the network, they also cause
the sender to send one MSS at a time (TSO auto defer is not engaged in
this case), thus starting to fill its RTX queue with pathological skbs
(1-MSS each), increasing processing time.

I see that your ACK filter does not take care of this common case :)

Doing the filtering in TCP has the immense advantage of knowing the
RTT and thus be able to use heuristics causing less damage.

Re: [PATCH net-next v12 3/7] sch_cake: Add optional ACK filter

2018-05-17 Thread Toke Høiland-Jørgensen

Eric Dumazet  writes:

> On 05/16/2018 01:29 PM, Toke Høiland-Jørgensen wrote:
>> The ACK filter is an optional feature of CAKE which is designed to improve
>> performance on links with very asymmetrical rate limits. On such links
>> (which are unfortunately quite prevalent, especially for DSL and cable
>> subscribers), the downstream throughput can be limited by the number of
>> ACKs capable of being transmitted in the *upstream* direction.
>> 
>
> ...
>
>> 
>> Signed-off-by: Toke Høiland-Jørgensen 
>> ---
>>  net/sched/sch_cake.c |  260 
>> ++
>>  1 file changed, 258 insertions(+), 2 deletions(-)
>> 
>>
>
> I have decided to implement ACK compression in TCP stack itself.

Awesome! Will look forward to seeing that!

> First step is to take care of SACK, which are the main source of the
> bloat, since we send one SACK for every incoming out-of-order packet.
>
> These SACK are not only causing pain on the network, they also cause
> the sender to send one MSS at a time (TSO auto defer is not engaged in
> this case), thus starting to fill its RTX queue with pathological skbs
> (1-MSS each), increasing processing time.
>
> I see that your ACK filter does not take care of this common case :)

We don't do full parsing of SACKs, no; we were trying to keep things
simple... We do detect the presence of SACK options, though, and the
presence of SACK options on an ACK will make previous ACKs be considered
redundant.

> Doing the filtering in TCP has the immense advantage of knowing the
> RTT and thus be able to use heuristics causing less damage.

Quite so. I'll be quite happy if the CAKE ACK filter can be delegated to
something only relevant for the poor sods stuck on proprietary operating
systems :)


Are you satisfied that the current version of the filter doesn't mangle
the skbs or crash the kernel?

-Toke

[PATCH] wcn36xx: Add support for Factory Test Mode (FTM)

2018-05-17 Thread Ramon Fried

From: Eyal Ilsar 

Introduce infrastructure for supporting Factory Test Mode (FTM) of the
wireless LAN subsystem. In order for the user space to access the
firmware in test mode the relevant netlink channel needs to be exposed
from the kernel driver.

The above is achieved as follows:
1) Register wcn36xx driver to testmode callback from netlink
2) Add testmode callback implementation to handle incoming FTM commands
3) Add FTM command packet structure
4) Add handling for GET_BUILD_RELEASE_NUMBER (msgid=0x32A2)
5) Add generic handling for all PTT_MSG packets

Signed-off-by: Eyal Ilsar 
Signed-off-by: Ramon Fried 
---
 drivers/net/wireless/ath/wcn36xx/Makefile |   2 +
 drivers/net/wireless/ath/wcn36xx/hal.h|  16 ++
 drivers/net/wireless/ath/wcn36xx/main.c   |   3 +
 drivers/net/wireless/ath/wcn36xx/smd.c|  74 +
 drivers/net/wireless/ath/wcn36xx/smd.h|   4 +
 drivers/net/wireless/ath/wcn36xx/testmode.c   | 151 ++
 drivers/net/wireless/ath/wcn36xx/testmode.h   |  46 ++
 drivers/net/wireless/ath/wcn36xx/testmode_i.h |  29 
 drivers/net/wireless/ath/wcn36xx/wcn36xx.h|   2 +
 9 files changed, 327 insertions(+)
 create mode 100644 drivers/net/wireless/ath/wcn36xx/testmode.c
 create mode 100644 drivers/net/wireless/ath/wcn36xx/testmode.h
 create mode 100644 drivers/net/wireless/ath/wcn36xx/testmode_i.h

diff --git a/drivers/net/wireless/ath/wcn36xx/Makefile 
b/drivers/net/wireless/ath/wcn36xx/Makefile
index 3b09435104eb..582049f65735 100644
--- a/drivers/net/wireless/ath/wcn36xx/Makefile
+++ b/drivers/net/wireless/ath/wcn36xx/Makefile
@@ -6,3 +6,5 @@ wcn36xx-y +=   main.o \
smd.o \
pmc.o \
debug.o
+
+wcn36xx-$(CONFIG_NL80211_TESTMODE) += testmode.o
diff --git a/drivers/net/wireless/ath/wcn36xx/hal.h 
b/drivers/net/wireless/ath/wcn36xx/hal.h
index 182963522941..8491b3cb3206 100644
--- a/drivers/net/wireless/ath/wcn36xx/hal.h
+++ b/drivers/net/wireless/ath/wcn36xx/hal.h
@@ -2230,6 +2230,22 @@ struct wcn36xx_hal_switch_channel_rsp_msg {
 
 } __packed;
 
+struct wcn36xx_hal_process_ptt_msg_req_msg {
+   struct wcn36xx_hal_msg_header header;
+
+   /* Actual FTM Command body */
+   u8 ptt_msg[0];
+} __packed;
+
+struct wcn36xx_hal_process_ptt_msg_rsp_msg {
+   struct wcn36xx_hal_msg_header header;
+
+   /* FTM Command response status */
+   u32 ptt_msg_resp_status;
+   /* Actual FTM Command body */
+   u8 ptt_msg[0];
+} __packed;
+
 struct update_edca_params_req_msg {
struct wcn36xx_hal_msg_header header;
 
diff --git a/drivers/net/wireless/ath/wcn36xx/main.c 
b/drivers/net/wireless/ath/wcn36xx/main.c
index 69d6be59d97f..ea14f87d11ff 100644
--- a/drivers/net/wireless/ath/wcn36xx/main.c
+++ b/drivers/net/wireless/ath/wcn36xx/main.c
@@ -26,6 +26,7 @@
 #include 
 #include 
 #include "wcn36xx.h"
+#include "testmode.h"
 
 unsigned int wcn36xx_dbg_mask;
 module_param_named(debug_mask, wcn36xx_dbg_mask, uint, 0644);
@@ -1116,6 +1117,8 @@ static const struct ieee80211_ops wcn36xx_ops = {
.sta_add= wcn36xx_sta_add,
.sta_remove = wcn36xx_sta_remove,
.ampdu_action   = wcn36xx_ampdu_action,
+
+   CFG80211_TESTMODE_CMD(wcn36xx_tm_cmd)
 };
 
 static int wcn36xx_init_ieee80211(struct wcn36xx *wcn)
diff --git a/drivers/net/wireless/ath/wcn36xx/smd.c 
b/drivers/net/wireless/ath/wcn36xx/smd.c
index 8932af5e4d8d..8eaf192f8bdb 100644
--- a/drivers/net/wireless/ath/wcn36xx/smd.c
+++ b/drivers/net/wireless/ath/wcn36xx/smd.c
@@ -292,12 +292,26 @@ static void init_hal_msg(struct wcn36xx_hal_msg_header 
*hdr,
msg_body.header.len = sizeof(msg_body); \
} while (0) \
 
+#define INIT_HAL_PTT_MSG(p_msg_body, ppt_msg_len) \
+   do { \
+   memset(p_msg_body, 0, sizeof(*p_msg_body) + ppt_msg_len); \
+   p_msg_body->header.msg_type = WCN36XX_HAL_PROCESS_PTT_REQ; \
+   p_msg_body->header.msg_version = WCN36XX_HAL_MSG_VERSION0; \
+   p_msg_body->header.len = sizeof(*p_msg_body) + ppt_msg_len; \
+   } while (0)
+
 #define PREPARE_HAL_BUF(send_buf, msg_body) \
do {\
memset(send_buf, 0, msg_body.header.len);   \
memcpy(send_buf, &msg_body, sizeof(msg_body));  \
} while (0) \
 
+#define PREPARE_HAL_PTT_MSG_BUF(send_buf, p_msg_body) \
+   do {\
+   memset(send_buf, 0, p_msg_body->header.len); \
+   memcpy(send_buf, p_msg_body, p_msg_body->header.len); \
+   } while (0)
+
 static int wcn36xx_smd_rsp_status_check(void *buf, size_t len)
 {
struct wcn36xx_fw_msg_status_rsp *rsp;
@@ -741,6 +755,64 @@ int wcn36xx_smd_switch_channel(struct wcn36xx *wcn,
return ret;
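
The variable-length request built by INIT_HAL_PTT_MSG and PREPARE_HAL_PTT_MSG_BUF above follows a common pattern: a fixed header whose len field covers the header plus a trailing command body. Below is a userspace sketch of that pattern. The struct layout, the constant PTT_REQ_TYPE and the function name are hypothetical, and a C99 flexible array member is used where the patch uses a zero-length array.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define PTT_REQ_TYPE 1u   /* hypothetical message type value */

/* Hypothetical, simplified message header. */
struct msg_header {
    uint16_t msg_type;
    uint16_t msg_version;
    uint32_t len;          /* header plus body, in bytes */
};

struct ptt_req {
    struct msg_header header;
    uint8_t ptt_msg[];     /* command body, variable length */
};

/*
 * Allocate a zeroed request large enough for header plus body, record
 * the total length in the header and copy the body in. The caller
 * owns the result and releases it with free().
 */
static struct ptt_req *ptt_req_alloc(const void *body, size_t body_len)
{
    size_t total = sizeof(struct ptt_req) + body_len;
    struct ptt_req *req = calloc(1, total);

    if (!req)
        return NULL;

    req->header.msg_type = PTT_REQ_TYPE;
    req->header.msg_version = 0;
    req->header.len = (uint32_t)total;
    memcpy(req->ptt_msg, body, body_len);
    return req;
}
```

Since header.len already includes the body, a send path can copy header.len bytes in one step, which is what PREPARE_HAL_PTT_MSG_BUF does with its memcpy().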

Re: [PATCH] net: qcom/emac: Allocate buffers from local node

2018-05-17 Thread Timur Tabi


On 5/17/18 3:28 AM, Hemanth Puranik wrote:

Currently we use non-NUMA-aware allocation for TPD and RRD buffers;
this patch switches them to NUMA-friendly allocation.

Signed-off-by: Hemanth Puranik


Acked-by: Timur Tabi 

--
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc.  Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.

Re: [PATCH net-next v12 3/7] sch_cake: Add optional ACK filter

2018-05-17 Thread Eric Dumazet

On 05/17/2018 04:23 AM, Toke Høiland-Jørgensen wrote:

> 
> We don't do full parsing of SACKs, no; we were trying to keep things
> simple... We do detect the presence of SACK options, though, and the
> presence of SACK options on an ACK will make previous ACKs be considered
> redundant.
> 

But they are not redundant in some cases, particularly when reorders happen in 
the network.
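
To make the disagreement concrete, here is a minimal sketch of a conservative redundancy test. It is neither the CAKE filter nor the planned TCP-side change; the struct and the rule are assumptions made for illustration. An older ACK is treated as droppable only if it carries no SACK option and a newer cumulative ACK covers it, because SACK blocks describe out-of-order data that a cumulative ACK number alone does not convey.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, minimal view of a queued ACK. */
struct ack_model {
    uint32_t ack_seq;   /* cumulative acknowledgment number */
    bool has_sack;      /* segment carries a SACK option */
};

/* True if a is at or after b in 32-bit sequence space (wrap-safe). */
static bool seq_geq(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) >= 0;
}

/*
 * Conservative rule (assumption): an older ACK may be dropped only if
 * it has no SACK option and the newer ACK acknowledges at least as much.
 */
static bool old_ack_is_redundant(const struct ack_model *old,
                                 const struct ack_model *newer)
{
    if (old->has_sack)
        return false;
    return seq_geq(newer->ack_seq, old->ack_seq);
}
```

Under this rule a pure ACK for 1000 is redundant once an ACK for 2000 is queued, but an ACK for 1000 that also carries SACK blocks is kept, which is the reordering case raised above.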

Re: [RFC v4 3/5] virtio_ring: add packed ring support

2018-05-17 Thread Jason Wang




On 2018-05-16 22:33, Tiwei Bie wrote:

On Wed, May 16, 2018 at 10:05:44PM +0800, Jason Wang wrote:

On 2018-05-16 21:45, Tiwei Bie wrote:

On Wed, May 16, 2018 at 08:51:43PM +0800, Jason Wang wrote:

On 2018-05-16 20:39, Tiwei Bie wrote:

On Wed, May 16, 2018 at 07:50:16PM +0800, Jason Wang wrote:

On 2018-05-16 16:37, Tiwei Bie wrote:

[...]

+static void detach_buf_packed(struct vring_virtqueue *vq, unsigned int head,
+ unsigned int id, void **ctx)
+{
+   struct vring_packed_desc *desc;
+   unsigned int i, j;
+
+   /* Clear data ptr. */
+   vq->desc_state[id].data = NULL;
+
+   i = head;
+
+   for (j = 0; j < vq->desc_state[id].num; j++) {
+   desc = &vq->vring_packed.desc[i];
+   vring_unmap_one_packed(vq, desc);

As mentioned in previous discussion, this probably won't work for the case
of out of order completion since it depends on the information in the
descriptor ring. We probably need to extend ctx to record such information.

Above code doesn't depend on the information in the descriptor
ring. The vq->desc_state[] is the extended ctx.

Best regards,
Tiwei Bie

Yes, but desc is a pointer to descriptor ring I think so
vring_unmap_one_packed() still depends on the content of descriptor ring?


I got your point now. I think it makes sense to reserve
the bits of the addr field. Driver shouldn't try to get
addrs from the descriptors when cleaning up the descriptors
no matter whether we support out-of-order or not.

Maybe I was wrong, but I remember spec mentioned something like this.

You're right. Spec mentioned this. I was just repeating
the spec to emphasize that it does make sense. :)


But combining it with the out-of-order support, it will
mean that the driver still needs to maintain a desc/ctx
list that is very similar to the desc ring in the split
ring. I'm not quite sure whether it's something we want.
If it is true, I'll do it. So do you think we also want
to maintain such a desc/ctx list for packed ring?

To make it work for OOO backends I think we need something like this
(hardware NIC drivers usually have something like this).

Which hardware NIC drivers have this?


It's quite common I think: drivers track e.g. the DMA address and page frag
somewhere, e.g. ring->rx_info in the mlx4 driver.
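
The bookkeeping discussed in this thread can be sketched in plain userspace C.
This is only an illustration, assuming a hypothetical shadow array indexed by
ring slot; the names (shadow_desc, shadow_record, shadow_unmap) are made up for
the example and come neither from the patch nor from the mlx4 driver. What it
tries to show is that cleanup reads the DMA address and length from
driver-private state, never from the descriptor ring, which the device may
have overwritten.

```c
#include <assert.h>
#include <stdint.h>

#define RING_SIZE 8

/* Descriptor as seen by the device; the device may overwrite it. */
struct ring_desc {
	uint64_t addr;
	uint32_t len;
};

/* Driver-private copy of what was mapped (hypothetical name). */
struct shadow_desc {
	uint64_t addr;
	uint32_t len;
	int in_use;
};

static struct ring_desc ring[RING_SIZE];
static struct shadow_desc shadow[RING_SIZE];

/* Publish a buffer to the ring and remember the mapping privately. */
static void shadow_record(unsigned int slot, uint64_t addr, uint32_t len)
{
	assert(slot < RING_SIZE);
	ring[slot].addr = addr;
	ring[slot].len = len;
	shadow[slot].addr = addr;
	shadow[slot].len = len;
	shadow[slot].in_use = 1;
}

/* Return the address to unmap, taken from the shadow state only. */
static uint64_t shadow_unmap(unsigned int slot, uint32_t *len)
{
	assert(slot < RING_SIZE && shadow[slot].in_use);
	*len = shadow[slot].len;
	shadow[slot].in_use = 0;
	return shadow[slot].addr;
}
```

With such a shadow copy, completions can arrive in any order: the unmap path
only needs the slot (or buffer id) to find the address and length it recorded
at submission time.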


Thanks




Not for the patch, but it looks like having an OUT_OF_ORDER feature bit is
much simpler to start with.

+1

Best regards,
Tiwei Bie

Re: [PATCH v3 1/2] media: rc: introduce BPF_PROG_RAWIR_EVENT

2018-05-17 Thread Quentin Monnet

2018-05-16 22:04 UTC+0100 ~ Sean Young 
> Add support for BPF_PROG_RAWIR_EVENT. This type of BPF program can call
> rc_keydown() to report decoded IR scancodes, or rc_repeat() to report
> that the last key should be repeated.
> 
> The BPF program can be attached using the bpf(BPF_PROG_ATTACH) syscall;
> the target_fd must be the /dev/lircN device.
> 
> Signed-off-by: Sean Young 
> ---
>  drivers/media/rc/Kconfig   |  13 ++
>  drivers/media/rc/Makefile  |   1 +
>  drivers/media/rc/bpf-rawir-event.c | 363 +
>  drivers/media/rc/lirc_dev.c|  24 ++
>  drivers/media/rc/rc-core-priv.h|  24 ++
>  drivers/media/rc/rc-ir-raw.c   |  14 +-
>  include/linux/bpf_rcdev.h  |  30 +++
>  include/linux/bpf_types.h  |   3 +
>  include/uapi/linux/bpf.h   |  55 -
>  kernel/bpf/syscall.c   |   7 +
>  10 files changed, 531 insertions(+), 3 deletions(-)
>  create mode 100644 drivers/media/rc/bpf-rawir-event.c
>  create mode 100644 include/linux/bpf_rcdev.h
> 

[...]

Hi Sean,

Please find below some nitpicks on the documentation for the two helpers.

> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index d94d333a8225..243e141e8a5b 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h

[...]

> @@ -1902,6 +1904,35 @@ union bpf_attr {
>   *   egress otherwise). This is the only flag supported for now.
>   *   Return
>   *   **SK_PASS** on success, or **SK_DROP** on error.
> + *
> + * int bpf_rc_keydown(void *ctx, u32 protocol, u32 scancode, u32 toggle)
> + *   Description
> + *   Report decoded scancode with toggle value. For use in
> + *   BPF_PROG_TYPE_RAWIR_EVENT, to report a successfully

Could you please use bold RST markup for constants and function names?
Typically for BPF_PROG_TYPE_RAWIR_EVENT here and the enum below.

> + *   decoded scancode. This is will generate a keydown event,

s/This is will/This will/?

> + *   and a keyup event once the scancode is no longer repeated.
> + *
> + *   *ctx* pointer to bpf_rawir_event, *protocol* is decoded
> + *   protocol (see RC_PROTO_* enum).

This documentation is intended to be compiled as a man page. Could you
please use a complete sentence here?
Also, this could do with additional markup as well: **struct
bpf_rawir_event**.

> + *
> + *   Some protocols include a toggle bit, in case the button
> + *   was released and pressed again between consecutive scancodes,
> + *   copy this bit into *toggle* if it exists, else set to 0.
> + *
> + * Return

The "Return" lines here and in the second helper use space indentation
instead of tabs (as all other lines do). Would you mind fixing it for
consistency?

> + *   Always return 0 (for now)

Other helpers use just "0" in that case, but I do not really mind.
Out of curiosity, do you have anything specific in mind for changing the
return value here in the future?

> + *
> + * int bpf_rc_repeat(void *ctx)
> + *   Description
> + *   Repeat the last decoded scancode; some IR protocols like
> + *   NEC have a special IR message for repeat last button,

s/repeat/repeating/?

> + *   in case user is holding a button down; the scancode is
> + *   not repeated.
> + *
> + *   *ctx* pointer to bpf_rawir_event.

Please use a complete sentence here as well, if you do not mind.

> + *
> + * Return
> + *   Always return 0 (for now)
>   */
Thanks,
Quentin

[PATCH net-next 3/4] tcp: add SACK compression

2018-05-17 Thread Eric Dumazet

When TCP receives an out-of-order packet, it immediately sends
a SACK packet, generating network load but also forcing the
receiver to send 1-MSS pathological packets, increasing its
RTX queue length/depth, and thus processing time.

Wifi networks suffer from this aggressive behavior, but generally
speaking, all these SACK packets add fuel to the fire when networks
are under congestion.

This patch adds a high resolution timer and tp->compressed_ack counter.

Instead of sending a SACK, we program this timer with a small delay,
based on SRTT and capped to 2.5 ms: delay = min(5% of SRTT, 2.5 ms).

If subsequent SACKs need to be sent while the timer has not yet expired,
we simply increment tp->compressed_ack.

When the timer expires, a SACK is sent with the latest information.

Note that tcp_sack_new_ofo_skb() is able to force a SACK to be sent
if the sack blocks need to be shuffled, even if the timer has not
expired.
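
As a rough illustration of the delay rule described above, here is a
standalone userspace sketch. It is not the code from the patch: the function
name, the macro names and the choice of microseconds for the SRTT input are
assumptions made for the example.

```c
#include <stdint.h>

#define NSEC_PER_USEC 1000ULL
/* Upper bound on the compression delay: 2.5 ms, expressed in ns. */
#define SACK_COMPRESS_CAP_NS (2500ULL * NSEC_PER_USEC)

/*
 * delay = min(5% of SRTT, 2.5 ms).
 * srtt_us is the smoothed RTT in microseconds (assumed unit);
 * the result is in nanoseconds, as a high resolution timer would expect.
 */
static uint64_t sack_compress_delay_ns(uint64_t srtt_us)
{
	uint64_t delay_ns = srtt_us * NSEC_PER_USEC / 20; /* 5% == 1/20 */

	return delay_ns < SACK_COMPRESS_CAP_NS ? delay_ns : SACK_COMPRESS_CAP_NS;
}
```

With these assumptions, a 1 ms SRTT gives a 50 us delay, while any SRTT of
50 ms or more hits the 2.5 ms cap.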

A new SNMP counter is added in the following patch.

Signed-off-by: Eric Dumazet 
---
 include/linux/tcp.h   |  2 ++
 include/net/tcp.h |  3 +++
 net/ipv4/tcp.c|  1 +
 net/ipv4/tcp_input.c  | 31 +--
 net/ipv4/tcp_output.c |  7 +++
 net/ipv4/tcp_timer.c  | 25 +
 6 files changed, 63 insertions(+), 6 deletions(-)

diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 807776928cb8610fe97121fbc3c600b08d5d2991..72705eaf4b84060a45bf04d5170f389a18010eac 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -218,6 +218,7 @@ struct tcp_sock {
   reord:1;  /* reordering detected */
} rack;
u16 advmss; /* Advertised MSS   */
+   u8  compressed_ack;
u32 chrono_start;   /* Start time in jiffies of a TCP chrono */
u32 chrono_stat[3]; /* Time in jiffies for chrono_stat stats */
u8  chrono_type:2,  /* current chronograph type */
@@ -297,6 +298,7 @@ struct tcp_sock {
u32 sacked_out; /* SACK'd packets   */
 
struct hrtimer  pacing_timer;
+   struct hrtimer  compressed_ack_timer;
 
/* from STCP, retrans queue hinting */
struct sk_buff* lost_skb_hint;
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 6ffc8bd894876ad23407f5ec4994350139af85e7..c8c65ae62955eb12a9a6489fa8e008fd89f89f16 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -560,6 +560,9 @@ static inline void tcp_clear_xmit_timers(struct sock *sk)
if (hrtimer_try_to_cancel(&tcp_sk(sk)->pacing_timer) == 1)
__sock_put(sk);
 
+   if (hrtimer_try_to_cancel(&tcp_sk(sk)->compressed_ack_timer) == 1)
+   __sock_put(sk);
+
inet_csk_clear_xmit_timers(sk);
 }
 
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 62b776f9003798eaf06992a4eb0914d17646aa61..0a2ea0bbf867271db05aedd7d48b193677664321 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2595,6 +2595,7 @@ int tcp_disconnect(struct sock *sk, int flags)
dst_release(sk->sk_rx_dst);
sk->sk_rx_dst = NULL;
tcp_saved_syn_free(tp);
+   tp->compressed_ack = 0;
 
/* Clean up fastopen related fields */
tcp_free_fastopen_req(tp);
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 99fcab7e6570c8b8758ea4b15cdd26df29fb4fd6..58feea67b6bb147fa9e75b8b514a9a41576b512b 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4242,6 +4242,8 @@ static void tcp_sack_new_ofo_skb(struct sock *sk, u32 seq, u32 end_seq)
 * If the sack array is full, forget about the last one.
 */
if (this_sack >= TCP_NUM_SACKS) {
+   if (tp->compressed_ack)
+   tcp_send_ack(sk);
this_sack--;
tp->rx_opt.num_sacks--;
sp--;
@@ -5074,6 +5076,7 @@ static inline void tcp_data_snd_check(struct sock *sk)
 static void __tcp_ack_snd_check(struct sock *sk, int ofo_possible)
 {
struct tcp_sock *tp = tcp_sk(sk);
+   unsigned long delay;
 
/* More than one full frame received... */
if (((tp->rcv_nxt - tp->rcv_wup) > inet_csk(sk)->icsk_ack.rcv_mss &&
@@ -5085,15 +5088,31 @@ static void __tcp_ack_snd_check(struct sock *sk, int ofo_possible)
(tp->rcv_nxt - tp->copied_seq < sk->sk_rcvlowat ||
 __tcp_select_window(sk) >= tp->rcv_wnd)) ||
/* We ACK each frame or... */
-   tcp_in_quickack_mode(sk) ||
-   /* We have out of order data. */
-   (ofo_possible && !RB_EMPTY_ROOT(&tp->out_of_order_queue))) {
-   /* Then ack it now */
+   tcp_in_quickack_mode(sk)) {
+send_now:
tcp_send_ack(sk);
-   } else {
-   /* Else, send delayed ack. */
+   return;
+   }
+
+   if (!ofo_possible || RB_EMPTY_ROOT(&tp->out_of_order_queue)) {
tcp_send_delayed_ack(sk);
+   return;
}
+
+   if (!tcp_is_sack(tp) || t

[PATCH net-next 4/4] tcp: add TCPAckCompressed SNMP counter

2018-05-17 Thread Eric Dumazet

This counter tracks the number of ACK packets that the host has not sent,
thanks to ACK compression.

Sample output :

$ nstat -n;sleep 1;nstat|egrep "IpInReceives|IpOutRequests|TcpInSegs|TcpOutSegs|TcpExtTCPAckCompressed"
IpInReceives                    123250             0.0
IpOutRequests                   3684               0.0
TcpInSegs                       123251             0.0
TcpOutSegs                      3684               0.0
TcpExtTCPAckCompressed          119252             0.0

Signed-off-by: Eric Dumazet 
---
 include/uapi/linux/snmp.h | 1 +
 net/ipv4/proc.c   | 1 +
 net/ipv4/tcp_output.c | 2 ++
 3 files changed, 4 insertions(+)

diff --git a/include/uapi/linux/snmp.h b/include/uapi/linux/snmp.h
index d02e859301ff499dd72a1c0e1b56bed10a9397a6..750d89120335eb489f698191edb6c5110969fa8c 100644
--- a/include/uapi/linux/snmp.h
+++ b/include/uapi/linux/snmp.h
@@ -278,6 +278,7 @@ enum
LINUX_MIB_TCPMTUPSUCCESS,   /* TCPMTUPSuccess */
LINUX_MIB_TCPDELIVERED, /* TCPDelivered */
LINUX_MIB_TCPDELIVEREDCE,   /* TCPDeliveredCE */
+   LINUX_MIB_TCPACKCOMPRESSED, /* TCPAckCompressed */
__LINUX_MIB_MAX
 };
 
diff --git a/net/ipv4/proc.c b/net/ipv4/proc.c
index 261b71d0ccc5c17c6032bf67eb8f842006766e64..6c1ff89a60fa0a3485dcc71fafc799e798d5dc11 100644
--- a/net/ipv4/proc.c
+++ b/net/ipv4/proc.c
@@ -298,6 +298,7 @@ static const struct snmp_mib snmp4_net_list[] = {
SNMP_MIB_ITEM("TCPMTUPSuccess", LINUX_MIB_TCPMTUPSUCCESS),
SNMP_MIB_ITEM("TCPDelivered", LINUX_MIB_TCPDELIVERED),
SNMP_MIB_ITEM("TCPDeliveredCE", LINUX_MIB_TCPDELIVEREDCE),
+   SNMP_MIB_ITEM("TCPAckCompressed", LINUX_MIB_TCPACKCOMPRESSED),
SNMP_MIB_SENTINEL
 };
 
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 7ee98aad82b758674ca7f3e90bd3fc165e8fcd45..437bb7ceba7fd388abac1c12f2920b02be77bad9 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -165,6 +165,8 @@ static inline void tcp_event_ack_sent(struct sock *sk, unsigned int pkts)
struct tcp_sock *tp = tcp_sk(sk);
 
if (unlikely(tp->compressed_ack)) {
+   NET_ADD_STATS(sock_net(sk), LINUX_MIB_TCPACKCOMPRESSED,
+ tp->compressed_ack);
tp->compressed_ack = 0;
if (hrtimer_try_to_cancel(&tp->compressed_ack_timer) == 1)
__sock_put(sk);
-- 
2.17.0.441.gb46fe60e1d-goog

[PATCH net-next 1/4] tcp: use __sock_put() instead of sock_put() in tcp_clear_xmit_timers()

2018-05-17 Thread Eric Dumazet

Socket can not disappear under us.

Signed-off-by: Eric Dumazet 
---
 include/net/tcp.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index a08eab58ef7001b3e141e3722fd8a3875e5c5d7d..6ffc8bd894876ad23407f5ec4994350139af85e7 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -558,7 +558,7 @@ void tcp_init_xmit_timers(struct sock *);
 static inline void tcp_clear_xmit_timers(struct sock *sk)
 {
if (hrtimer_try_to_cancel(&tcp_sk(sk)->pacing_timer) == 1)
-   sock_put(sk);
+   __sock_put(sk);
 
inet_csk_clear_xmit_timers(sk);
 }
-- 
2.17.0.441.gb46fe60e1d-goog

[PATCH net-next 0/4] tcp: implement SACK compression

2018-05-17 Thread Eric Dumazet

When TCP receives an out-of-order packet, it immediately sends
a SACK packet, generating network load but also forcing the
receiver to send 1-MSS pathological packets, increasing its
RTX queue length/depth, and thus processing time.

Wifi networks suffer from this aggressive behavior, but generally
speaking, all these SACK packets add fuel to the fire when networks
are under congestion.

This patch series adds SACK compression, but the infrastructure
could be leveraged to also compress ACK in the future.

Eric Dumazet (4):
  tcp: use __sock_put() instead of sock_put() in tcp_clear_xmit_timers()
  tcp: do not force quickack when receiving out-of-order packets
  tcp: add SACK compression
  tcp: add TCPAckCompressed SNMP counter

 include/linux/tcp.h   |  2 ++
 include/net/tcp.h |  5 -
 include/uapi/linux/snmp.h |  1 +
 net/ipv4/proc.c   |  1 +
 net/ipv4/tcp.c|  1 +
 net/ipv4/tcp_input.c  | 33 +
 net/ipv4/tcp_output.c |  9 +
 net/ipv4/tcp_timer.c  | 25 +
 8 files changed, 68 insertions(+), 9 deletions(-)

-- 
2.17.0.441.gb46fe60e1d-goog

[PATCH net-next 2/4] tcp: do not force quickack when receiving out-of-order packets

2018-05-17 Thread Eric Dumazet

As explained in commit 9f9843a751d0 ("tcp: properly handle stretch
acks in slow start"), TCP stacks have to consider how many packets
are acknowledged in one single ACK, because of GRO, but also
because of ACK compression or losses.

We plan to add SACK compression in the following patch; we
must therefore not call tcp_enter_quickack_mode().

Signed-off-by: Eric Dumazet 
---
 net/ipv4/tcp_input.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index b188e0d75edd9e5e1c9f0355818caa932fef7416..99fcab7e6570c8b8758ea4b15cdd26df29fb4fd6 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4708,8 +4708,6 @@ static void tcp_data_queue(struct sock *sk, struct sk_buff *skb)
if (!before(TCP_SKB_CB(skb)->seq, tp->rcv_nxt + tcp_receive_window(tp)))
goto out_of_window;
 
-   tcp_enter_quickack_mode(sk);
-
if (before(TCP_SKB_CB(skb)->seq, tp->rcv_nxt)) {
/* Partial packet, seq < rcv_next < end_seq */
SOCK_DEBUG(sk, "partial packet: rcv_next %X seq %X - %X\n",
-- 
2.17.0.441.gb46fe60e1d-goog

Re: [PATCH net-next] net/smc: init conn.tx_work & conn.send_lock sooner

2018-05-17 Thread Ursula Braun



On 05/17/2018 12:54 PM, Eric Dumazet wrote:
> syzkaller found that following program crashes the host :
> 
> {
>   int fd = socket(AF_SMC, SOCK_STREAM, 0);
>   int val = 1;
> 
>   listen(fd, 0);
>   shutdown(fd, SHUT_RDWR);
>   setsockopt(fd, 6, TCP_NODELAY, &val, 4);
> }
> 
> Simply initialize conn.tx_work & conn.send_lock at socket creation,
> rather than deeper in the stack.
> 
> ODEBUG: assert_init not available (active state 0) object type: timer_list hint:   (null)
> WARNING: CPU: 1 PID: 13988 at lib/debugobjects.c:329 debug_print_object+0x16a/0x210 lib/debugobjects.c:326
> Kernel panic - not syncing: panic_on_warn set ...
> 
> CPU: 1 PID: 13988 Comm: syz-executor0 Not tainted 4.17.0-rc4+ #46
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:77 [inline]
>  dump_stack+0x1b9/0x294 lib/dump_stack.c:113
>  panic+0x22f/0x4de kernel/panic.c:184
>  __warn.cold.8+0x163/0x1b3 kernel/panic.c:536
>  report_bug+0x252/0x2d0 lib/bug.c:186
>  fixup_bug arch/x86/kernel/traps.c:178 [inline]
>  do_error_trap+0x1de/0x490 arch/x86/kernel/traps.c:296
>  do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315
>  invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:992
> RIP: 0010:debug_print_object+0x16a/0x210 lib/debugobjects.c:326
> RSP: 0018:880197a37880 EFLAGS: 00010086
> RAX: 0061 RBX: 0005 RCX: c90001ed
> RDX: 4aaf RSI: 8160f6f1 RDI: 0001
> RBP: 880197a378c0 R08: 8801aa7a0080 R09: ed003b5e3eb2
> R10: ed003b5e3eb2 R11: 8801daf1f597 R12: 0001
> R13: 88d96980 R14: 87fa19a0 R15: 81666ec0
>  debug_object_assert_init+0x309/0x500 lib/debugobjects.c:692
>  debug_timer_assert_init kernel/time/timer.c:724 [inline]
>  debug_assert_init kernel/time/timer.c:776 [inline]
>  del_timer+0x74/0x140 kernel/time/timer.c:1198
>  try_to_grab_pending+0x439/0x9a0 kernel/workqueue.c:1223
>  mod_delayed_work_on+0x91/0x250 kernel/workqueue.c:1592
>  mod_delayed_work include/linux/workqueue.h:541 [inline]
>  smc_setsockopt+0x387/0x6d0 net/smc/af_smc.c:1367
>  __sys_setsockopt+0x1bd/0x390 net/socket.c:1903
>  __do_sys_setsockopt net/socket.c:1914 [inline]
>  __se_sys_setsockopt net/socket.c:1911 [inline]
>  __x64_sys_setsockopt+0xbe/0x150 net/socket.c:1911
>  do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
>  entry_SYSCALL_64_after_hwframe+0x49/0xbe
>

This problem should no longer show up with yesterday's net-next commit
569bc6436568 ("net/smc: no tx work trigger for fallback sockets").
 
> Fixes: 01d2f7e2cdd3 ("net/smc: sockopts TCP_NODELAY and TCP_CORK")
> Signed-off-by: Eric Dumazet 
> Cc: Ursula Braun 
> Cc: linux-s...@vger.kernel.org
> Reported-by: syzbot 
> ---
>  net/smc/af_smc.c | 2 ++
>  net/smc/smc_tx.c | 4 +---
>  net/smc/smc_tx.h | 1 +
>  3 files changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
> index d15762b057c0d8d4167feca9a4c41f0408604c37..6ad4f6c771c3fa63d3ae714dbe65879910963a21 100644
> --- a/net/smc/af_smc.c
> +++ b/net/smc/af_smc.c
> @@ -193,8 +193,10 @@ static struct sock *smc_sock_alloc(struct net *net, struct socket *sock,
>   sk->sk_protocol = protocol;
>   smc = smc_sk(sk);
>   INIT_WORK(&smc->tcp_listen_work, smc_tcp_listen_work);
> + INIT_DELAYED_WORK(&smc->conn.tx_work, smc_tx_work);
>   INIT_LIST_HEAD(&smc->accept_q);
>   spin_lock_init(&smc->accept_q_lock);
> + spin_lock_init(&smc->conn.send_lock);
>   sk->sk_prot->hash(sk);
>   sk_refcnt_debug_inc(sk);
>  
> diff --git a/net/smc/smc_tx.c b/net/smc/smc_tx.c
> index 58dfe0bd9d6075b5d3db97c65b364b97da73..08a7de98bb031b5d2256e3238236108749e7ae39 100644
> --- a/net/smc/smc_tx.c
> +++ b/net/smc/smc_tx.c
> @@ -450,7 +450,7 @@ int smc_tx_sndbuf_nonempty(struct smc_connection *conn)
>  /* Wakeup sndbuf consumers from process context
>   * since there is more data to transmit
>   */
> -static void smc_tx_work(struct work_struct *work)
> +void smc_tx_work(struct work_struct *work)
>  {
>   struct smc_connection *conn = container_of(to_delayed_work(work),
>  struct smc_connection,
> @@ -512,6 +512,4 @@ void smc_tx_consumer_update(struct smc_connection *conn)
>  void smc_tx_init(struct smc_sock *smc)
>  {
>   smc->sk.sk_write_space = smc_tx_write_space;
> - INIT_DELAYED_WORK(&smc->conn.tx_work, smc_tx_work);
> - spin_lock_init(&smc->conn.send_lock);
>  }
> diff --git a/net/smc/smc_tx.h b/net/smc/smc_tx.h
> index 78255964fa4dc1c69f96548e035e74a167999a62..8f64b12bf03c1b52c599b1239784a404efe65dae 100644
> --- a/net/smc/smc_tx.h
> +++ b/net/smc/smc_tx.h
> @@ -27,6 +27,7 @@ static inline int smc_tx_prepared_sends(struct smc_connection *conn)
>   return smc_curs_diff(conn->sndbuf_size, &sent, &prep);
>  }
>  
> +void smc_tx_work(struct work_struct *work);
>  void smc_tx_init(struct s

Re: [PATCH net-next] net/smc: init conn.tx_work & conn.send_lock sooner

2018-05-17 Thread Eric Dumazet

On Thu, May 17, 2018 at 5:13 AM Ursula Braun  wrote:

> This problem should no longer show up with yesterday's net-next commit
> 569bc6436568 ("net/smc: no tx work trigger for fallback sockets").

It definitely triggers on latest net-next, which includes 569bc6436568

Thanks.

Re: [PATCH net-next v3 00/10] net: mvpp2: phylink conversion

2018-05-17 Thread Gregory CLEMENT

Hi Antoine,
 
On Thu, May 17 2018, Antoine Tenart  wrote:

> Hi Dave, Russell,
>
> This series converts the Marvell PPv2 driver to phylink (which models the MAC
> to PHY link).
>
> One important point is the PPv2 driver supports two probe modes: device
> tree and ACPI. This series only brings phylink support for the device
> tree mode, as the ACPI one will need further work. Still, the driver
> should be working as before when using ACPI. This split should be
> temporary, and was discussed with Marcin (in Cc.) who added ACPI support
> to the driver.
>
> Also, the SFP cages on both DB boards can be considered as non-wired.
> We thus chose not to describe those SFP cages, and we use fixed-link instead.
>
> The rest of the series uses phylink to add support for 1000BaseX and
> 2500BaseX modes in the PPv2 driver. To do this, two patches are needed
> in the common PHY framework (patches 3 and 4). The last 4 patches modify
> the device tree to use the new PPv2 functionalities.
>
> The series has been tested for the device tree mode on the 7040-db,
> 8040-db and 8040-mcbin boards, to ensure all the interfaces were working
> as expected.
>
> @Dave: patches 7 to 10 should go through the mvebu tree (Gregory in
> Cc.) to avoid any conflict with the other mvebu dt patches taken during
> this cycle.

Patches 7 to 10 have been applied on mvebu/dt64.

Thanks,

Gregory

>
> The series is based on today's net-next.
>
> Thanks!
> Antoine
>
> Since v2:
>   - Removed the SFP description from the DB boards, as their SFP cages
> are not wired properly. We now use fixed-link.
>   - Because of this rework, split the series in two, so that the SFP
> part is reviewed separately.
>   - Small fixes in the phylink patch.
>   - Rebased on the latest net-next branch.
>
> Since v1:
>   - Chose a different approach to the SFP changes, as the previous ones
> weren't valid and reworked both DB boards' device trees.
>   - Misc fixes.
>   - Added Kishon's acked-by on one patch.
>   - Rebased on latest net-next branch.
>
> Antoine Tenart (9):
>   net: mvpp2: align the ethtool ops definition
>   net: mvpp2: phylink support
>   phy: add 2.5G SGMII mode to the phy_mode enum
>   phy: cp110-comphy: 2.5G SGMII mode
>   net: mvpp2: 1000baseX support
>   net: mvpp2: 2500baseX support
>   arm64: dts: marvell: mcbin: enable the fourth network interface
>   arm64: dts: marvell: 8040-db: describe the 10G interfaces as
> fixed-link
>   arm64: dts: marvell: 7040-db: describe the 10G interface as fixed-link
>
> Russell King (1):
>   arm64: dts: marvell: mcbin: add 10G SFP support
>
>  .../arm64/boot/dts/marvell/armada-7040-db.dts |   5 +
>  .../arm64/boot/dts/marvell/armada-8040-db.dts |  10 +
>  .../boot/dts/marvell/armada-8040-mcbin.dts|  70 ++
>  drivers/net/ethernet/marvell/Kconfig  |   1 +
>  drivers/net/ethernet/marvell/mvpp2.c  | 931 +++---
>  drivers/phy/marvell/phy-mvebu-cp110-comphy.c  |  17 +-
>  include/linux/phy/phy.h   |   1 +
>  7 files changed, 680 insertions(+), 355 deletions(-)
>
> -- 
> 2.17.0
>

-- 
Gregory Clement, Bootlin (formerly Free Electrons)
Embedded Linux and Kernel engineering
http://bootlin.com

Re: [patch net-next] nfp: flower: set sysfs link to device for representors

2018-05-17 Thread Or Gerlitz

On Thu, May 17, 2018 at 1:05 PM, Jiri Pirko  wrote:
> From: Jiri Pirko 
>
> Do this so the sysfs has "device" link correctly set.

please no

This is likely to create a bunch of issues with respect to how libvirt
deals with the representors.

We were discussing it off-list between nfp and mlnx driver people. We
need to put the OpenStack folks and kernel developers into the same
thread. I can make a post on that next week.

[PATCH bpf-next v6 0/6] ipv6: sr: introduce seg6local End.BPF action

2018-05-17 Thread Mathieu Xhonneux

As of Linux 4.14, it is possible to define advanced local processing for
IPv6 packets with a Segment Routing Header through the seg6local LWT
infrastructure. This LWT implements the network programming principles
defined in the IETF “SRv6 Network Programming” draft.

The implemented operations are generic, and it would be very interesting to
be able to implement user-specific seg6local actions, without having to
modify the kernel directly. To do so, this patchset adds an End.BPF action
to seg6local, powered by some specific Segment Routing-related helpers,
which provide SR functionalities that can be applied on the packet. This
BPF hook would then allow implementing specific actions at native kernel
speed such as OAM features, advanced SR SDN policies, SRv6 actions like
Segment Routing Header (SRH) encapsulation depending on the content of
the packet, etc.

This patchset is divided into 6 patches, whose main features are:

- A new seg6local action End.BPF with the corresponding new BPF program
  type BPF_PROG_TYPE_LWT_SEG6LOCAL. Such attached BPF program can be
  passed to the LWT seg6local through netlink, the same way as the LWT
  BPF hook operates.
- 3 new BPF helpers for the seg6local BPF hook, which edit, grow or
  shrink a SRH and apply some of the generic SRv6 actions to a packet.
- 1 new BPF helper for the LWT BPF IN hook, which adds a SRH through
  encapsulation (via IPv6 encapsulation, or inlining if the packet already
  contains an IPv6 header).

As this patchset adds a new LWT BPF hook, I took into account the result of
the discussions when the LWT BPF infrastructure got merged. Hence, the
seg6local BPF hook doesn’t allow write access to skb->data directly, only
the SRH can be modified through specific helpers, which ensures that the
integrity of the packet is maintained.
More details are available in the related patches messages.

The performance of this BPF hook has been assessed with the BPF JIT
enabled on an Intel Xeon X3440 processor with 4 cores and 8 threads
clocked at 2.53 GHz. No throughput loss is observed with the seg6local
BPF hook when the BPF program does nothing (440kpps). Adding an 8-byte
TLV (1 call each to bpf_lwt_seg6_adjust_srh and bpf_lwt_seg6_store_bytes)
drops the throughput to 410kpps, and inlining a SRH via
bpf_lwt_seg6_action drops the throughput to 420kpps.
All throughputs are stable.

---
v2: move the SRH integrity state from skb->cb to a per-cpu buffer
v3: - document helpers in man-page style
- fix kbuild bugs
- un-break BPF LWT out hook
- bpf_push_seg6_encap is now static
- preempt_enable is now called when the packet is dropped in
  input_action_end_bpf
v4: fix kbuild bugs when CONFIG_IPV6=m
v5: fix kbuild sparse warnings when CONFIG_IPV6=m
v6: fix skb pointers-related bugs in helpers

Thanks.


Mathieu Xhonneux (6):
  ipv6: sr: make seg6.h includable without IPv6
  ipv6: sr: export function lookup_nexthop
  bpf: Add IPv6 Segment Routing helpers
  bpf: Split lwt inout verifier structures
  ipv6: sr: Add seg6local action End.BPF
  selftests/bpf: test for seg6local End.BPF action

 include/linux/bpf_types.h |   5 +-
 include/net/seg6.h|   7 +-
 include/net/seg6_local.h  |  32 ++
 include/uapi/linux/bpf.h  |  97 -
 include/uapi/linux/seg6_local.h   |   3 +
 kernel/bpf/verifier.c |   1 +
 net/core/filter.c | 393 ---
 net/ipv6/Kconfig  |   5 +
 net/ipv6/seg6_local.c | 180 -
 tools/include/uapi/linux/bpf.h|  97 -
 tools/lib/bpf/libbpf.c|   1 +
 tools/testing/selftests/bpf/Makefile  |   6 +-
 tools/testing/selftests/bpf/bpf_helpers.h |  12 +
 tools/testing/selftests/bpf/test_lwt_seg6local.c  | 438 ++
 tools/testing/selftests/bpf/test_lwt_seg6local.sh | 140 +++
 15 files changed, 1344 insertions(+), 73 deletions(-)
 create mode 100644 include/net/seg6_local.h
 create mode 100644 tools/testing/selftests/bpf/test_lwt_seg6local.c
 create mode 100755 tools/testing/selftests/bpf/test_lwt_seg6local.sh

-- 
2.16.1

[PATCH bpf-next v6 5/6] ipv6: sr: Add seg6local action End.BPF

2018-05-17 Thread Mathieu Xhonneux

This patch adds the End.BPF action to the LWT seg6local infrastructure.
This action works like any other seg6local End action, meaning that an IPv6
header with SRH is needed, whose DA has to be equal to the SID of the
action. It will also advance the SRH to the next segment, so the BPF program
does not have to take care of this.

Since the BPF program must not be a source of instability in the kernel, it
is important to ensure that the integrity of the packet is maintained
before yielding it back to the IPv6 layer. The hook hence keeps track of
whether the SRH has been altered through the helpers, and re-validates its
content if needed with seg6_validate_srh. The state kept for validation is
stored in a per-CPU buffer. The BPF program is not allowed to directly
write into the packet, and only some fields of the SRH can be altered
through the helper bpf_lwt_seg6_store_bytes.

Performance profiling has shown that the SRH re-validation does not induce
a significant overhead. If the altered SRH is deemed invalid, the packet
is dropped.

This validation is also done before executing any action through
bpf_lwt_seg6_action, and will not be performed again if the SRH is not
modified after calling the action.

The BPF program may return 3 types of return codes:
- BPF_OK: the End.BPF action will look up the next destination through
 seg6_lookup_nexthop.
- BPF_REDIRECT: if an action has been executed through the
  bpf_lwt_seg6_action helper, the BPF program should return this
  value, as the skb's destination is already set and the default
  lookup should not be performed.
- BPF_DROP : the packet will be dropped.
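
The handling of the three return codes listed above can be modelled with a
small userspace sketch. It is not the kernel implementation: the enum names
and values below are stand-ins chosen for the example (they are not the uapi
constants), and treating an unknown return code as a drop is an assumption.

```c
/* Stand-ins for BPF_OK / BPF_REDIRECT / BPF_DROP; values are illustrative. */
enum ex_bpf_ret { EX_BPF_OK, EX_BPF_REDIRECT, EX_BPF_DROP };

/* What the hook does with the packet next (hypothetical names). */
enum ex_next_step { EX_LOOKUP_NEXTHOP, EX_USE_EXISTING_DST, EX_DROP_PACKET };

static enum ex_next_step end_bpf_next_step(int prog_ret, int srh_still_valid)
{
	/* An altered SRH that fails re-validation always leads to a drop. */
	if (!srh_still_valid)
		return EX_DROP_PACKET;

	switch (prog_ret) {
	case EX_BPF_OK:
		/* Default path: look up the next destination. */
		return EX_LOOKUP_NEXTHOP;
	case EX_BPF_REDIRECT:
		/* The action helper already set the skb's destination. */
		return EX_USE_EXISTING_DST;
	case EX_BPF_DROP:
	default:
		/* Explicit drop; unknown codes are dropped too (assumption). */
		return EX_DROP_PACKET;
	}
}
```

The design point this tries to capture is that the default lookup only
happens when the program has not already chosen a destination through an
action helper.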

Signed-off-by: Mathieu Xhonneux 
Acked-by: David Lebrun 
---
 include/linux/bpf_types.h   |   1 +
 include/uapi/linux/bpf.h|   1 +
 include/uapi/linux/seg6_local.h |   3 +
 kernel/bpf/verifier.c   |   1 +
 net/core/filter.c   |  25 +++
 net/ipv6/seg6_local.c   | 158 +++-
 tools/lib/bpf/libbpf.c  |   1 +
 7 files changed, 187 insertions(+), 3 deletions(-)

diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index aa5c8b878474..b161e506dcfc 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -12,6 +12,7 @@ BPF_PROG_TYPE(BPF_PROG_TYPE_CGROUP_SOCK_ADDR, cg_sock_addr)
 BPF_PROG_TYPE(BPF_PROG_TYPE_LWT_IN, lwt_in)
 BPF_PROG_TYPE(BPF_PROG_TYPE_LWT_OUT, lwt_out)
 BPF_PROG_TYPE(BPF_PROG_TYPE_LWT_XMIT, lwt_xmit)
+BPF_PROG_TYPE(BPF_PROG_TYPE_LWT_SEG6LOCAL, lwt_seg6local)
 BPF_PROG_TYPE(BPF_PROG_TYPE_SOCK_OPS, sock_ops)
 BPF_PROG_TYPE(BPF_PROG_TYPE_SK_SKB, sk_skb)
 BPF_PROG_TYPE(BPF_PROG_TYPE_SK_MSG, sk_msg)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 37f098ca822b..e8efb12d0a7d 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -141,6 +141,7 @@ enum bpf_prog_type {
BPF_PROG_TYPE_SK_MSG,
BPF_PROG_TYPE_RAW_TRACEPOINT,
BPF_PROG_TYPE_CGROUP_SOCK_ADDR,
+   BPF_PROG_TYPE_LWT_SEG6LOCAL,
 };
 
 enum bpf_attach_type {
diff --git a/include/uapi/linux/seg6_local.h b/include/uapi/linux/seg6_local.h
index ef2d8c3e76c1..aadcc11fb918 100644
--- a/include/uapi/linux/seg6_local.h
+++ b/include/uapi/linux/seg6_local.h
@@ -25,6 +25,7 @@ enum {
SEG6_LOCAL_NH6,
SEG6_LOCAL_IIF,
SEG6_LOCAL_OIF,
+   SEG6_LOCAL_BPF,
__SEG6_LOCAL_MAX,
 };
 #define SEG6_LOCAL_MAX (__SEG6_LOCAL_MAX - 1)
@@ -59,6 +60,8 @@ enum {
SEG6_LOCAL_ACTION_END_AS= 13,
/* forward to SR-unaware VNF with masquerading */
SEG6_LOCAL_ACTION_END_AM= 14,
+   /* custom BPF action */
+   SEG6_LOCAL_ACTION_END_BPF   = 15,
 
__SEG6_LOCAL_ACTION_MAX,
 };
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index a9e4b1372da6..390142d62ba1 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1262,6 +1262,7 @@ static bool may_access_direct_pkt_data(struct bpf_verifier_env *env,
switch (env->prog->type) {
case BPF_PROG_TYPE_LWT_IN:
case BPF_PROG_TYPE_LWT_OUT:
+   case BPF_PROG_TYPE_LWT_SEG6LOCAL:
/* dst_input() and dst_output() can't write for now */
if (t == BPF_WRITE)
return false;
diff --git a/net/core/filter.c b/net/core/filter.c
index 39641ea567b4..8cf0065107a3 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -4893,6 +4893,21 @@ lwt_xmit_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
}
 }
 
+static const struct bpf_func_proto *
+lwt_seg6local_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
+{
+   switch (func_id) {
+   case BPF_FUNC_lwt_seg6_store_bytes:
+   return &bpf_lwt_seg6_store_bytes_proto;
+   case BPF_FUNC_lwt_seg6_action:
+   return &bpf_lwt_seg6_action_proto;
+   case BPF_FUNC_lwt_seg6_adjust_srh:
+   return &bpf_lwt_seg6_adjust_srh_pro

[PATCH bpf-next v6 2/6] ipv6: sr: export function lookup_nexthop

2018-05-17 Thread Mathieu Xhonneux

The function lookup_nexthop is essential to implement most of the seg6local
actions. As we want to provide a BPF helper that applies some of these
actions to the packet being processed, the helper must be able to call
this function, hence the need to make it public.

Moreover, if an argument is incorrect or if the next hop cannot be found,
the BPF helper should return an error so that the BPF program can adapt
its processing of the packet (return an error, properly force the drop,
...). This patch hence makes this function return dst->error to indicate a
possible error.

Signed-off-by: Mathieu Xhonneux 
Acked-by: David Lebrun 
---
 include/net/seg6.h   |  3 ++-
 include/net/seg6_local.h | 24 
 net/ipv6/seg6_local.c| 20 +++-
 3 files changed, 37 insertions(+), 10 deletions(-)
 create mode 100644 include/net/seg6_local.h

diff --git a/include/net/seg6.h b/include/net/seg6.h
index 70b4cfac52d7..e029e301faa5 100644
--- a/include/net/seg6.h
+++ b/include/net/seg6.h
@@ -67,5 +67,6 @@ extern bool seg6_validate_srh(struct ipv6_sr_hdr *srh, int len);
 extern int seg6_do_srh_encap(struct sk_buff *skb, struct ipv6_sr_hdr *osrh,
 int proto);
 extern int seg6_do_srh_inline(struct sk_buff *skb, struct ipv6_sr_hdr *osrh);
-
+extern int seg6_lookup_nexthop(struct sk_buff *skb, struct in6_addr *nhaddr,
+  u32 tbl_id);
 #endif
diff --git a/include/net/seg6_local.h b/include/net/seg6_local.h
new file mode 100644
index ..57498b23085d
--- /dev/null
+++ b/include/net/seg6_local.h
@@ -0,0 +1,24 @@
+/*
+ *  SR-IPv6 implementation
+ *
+ *  Authors:
+ *  David Lebrun 
+ *  eBPF support: Mathieu Xhonneux 
+ *
+ *
+ *  This program is free software; you can redistribute it and/or
+ *  modify it under the terms of the GNU General Public License
+ *  as published by the Free Software Foundation; either version
+ *  2 of the License, or (at your option) any later version.
+ */
+
+#ifndef _NET_SEG6_LOCAL_H
+#define _NET_SEG6_LOCAL_H
+
+#include 
+#include 
+
+extern int seg6_lookup_nexthop(struct sk_buff *skb, struct in6_addr *nhaddr,
+  u32 tbl_id);
+
+#endif
diff --git a/net/ipv6/seg6_local.c b/net/ipv6/seg6_local.c
index 45722327375a..e9b23fb924ad 100644
--- a/net/ipv6/seg6_local.c
+++ b/net/ipv6/seg6_local.c
@@ -30,6 +30,7 @@
 #ifdef CONFIG_IPV6_SEG6_HMAC
 #include 
 #endif
+#include 
 #include 
 
 struct seg6_local_lwt;
@@ -140,8 +141,8 @@ static void advance_nextseg(struct ipv6_sr_hdr *srh, struct in6_addr *daddr)
*daddr = *addr;
 }
 
-static void lookup_nexthop(struct sk_buff *skb, struct in6_addr *nhaddr,
-  u32 tbl_id)
+int seg6_lookup_nexthop(struct sk_buff *skb, struct in6_addr *nhaddr,
+   u32 tbl_id)
 {
struct net *net = dev_net(skb->dev);
struct ipv6hdr *hdr = ipv6_hdr(skb);
@@ -187,6 +188,7 @@ static void lookup_nexthop(struct sk_buff *skb, struct in6_addr *nhaddr,
 
skb_dst_drop(skb);
skb_dst_set(skb, dst);
+   return dst->error;
 }
 
 /* regular endpoint function */
@@ -200,7 +202,7 @@ static int input_action_end(struct sk_buff *skb, struct seg6_local_lwt *slwt)
 
advance_nextseg(srh, &ipv6_hdr(skb)->daddr);
 
-   lookup_nexthop(skb, NULL, 0);
+   seg6_lookup_nexthop(skb, NULL, 0);
 
return dst_input(skb);
 
@@ -220,7 +222,7 @@ static int input_action_end_x(struct sk_buff *skb, struct seg6_local_lwt *slwt)
 
advance_nextseg(srh, &ipv6_hdr(skb)->daddr);
 
-   lookup_nexthop(skb, &slwt->nh6, 0);
+   seg6_lookup_nexthop(skb, &slwt->nh6, 0);
 
return dst_input(skb);
 
@@ -239,7 +241,7 @@ static int input_action_end_t(struct sk_buff *skb, struct seg6_local_lwt *slwt)
 
advance_nextseg(srh, &ipv6_hdr(skb)->daddr);
 
-   lookup_nexthop(skb, NULL, slwt->table);
+   seg6_lookup_nexthop(skb, NULL, slwt->table);
 
return dst_input(skb);
 
@@ -331,7 +333,7 @@ static int input_action_end_dx6(struct sk_buff *skb,
if (!ipv6_addr_any(&slwt->nh6))
nhaddr = &slwt->nh6;
 
-   lookup_nexthop(skb, nhaddr, 0);
+   seg6_lookup_nexthop(skb, nhaddr, 0);
 
return dst_input(skb);
 drop:
@@ -380,7 +382,7 @@ static int input_action_end_dt6(struct sk_buff *skb,
if (!pskb_may_pull(skb, sizeof(struct ipv6hdr)))
goto drop;
 
-   lookup_nexthop(skb, NULL, slwt->table);
+   seg6_lookup_nexthop(skb, NULL, slwt->table);
 
return dst_input(skb);
 
@@ -406,7 +408,7 @@ static int input_action_end_b6(struct sk_buff *skb, struct seg6_local_lwt *slwt)
ipv6_hdr(skb)->payload_len = htons(skb->len - sizeof(struct ipv6hdr));
skb_set_transport_header(skb, sizeof(struct ipv6hdr));
 
-   lookup_nexthop(skb, NULL, 0);
+   seg6_lookup_nexthop(skb, NULL, 0);
 
return dst_input(skb);
 
@@ -438,7 +440,7 @@ static int input_action_end_b6

[PATCH bpf-next v6 4/6] bpf: Split lwt inout verifier structures

2018-05-17 Thread Mathieu Xhonneux

The new bpf_lwt_push_encap helper should only be accessible within the
LWT BPF IN hook, and not the OUT one, as calling it from the OUT hook may
lead to an skb_under_panic.

At the moment, both LWT BPF IN and OUT share the same list of helpers,
whose calls are authorized by the verifier. This patch separates the
verifier ops for the IN and OUT hooks, and allows the IN hook to call the
bpf_lwt_push_encap helper.
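
The lookup chain described above can be modelled in a few lines. The
following is only a toy sketch, not kernel code: the helper names come
from the diff, while the Python names (LWT_OUT_HELPERS, lwt_in_allows,
...) are hypothetical. It also leaves out the bpf_base_func_proto()
fallback that the real OUT lookup ends with.

```python
# Toy model of the helper lookup chain; this is not kernel code.
# Helper names are taken from the diff, the Python names are hypothetical.

# A subset of the helpers that lwt_out_func_proto() accepts in the diff.
LWT_OUT_HELPERS = frozenset({
    "skb_load_bytes",
    "skb_pull_data",
    "csum_diff",
    "get_route_realm",
})

# Helpers that only the IN hook may call after this patch.
LWT_IN_ONLY_HELPERS = frozenset({"lwt_push_encap"})


def lwt_out_allows(helper: str) -> bool:
    """Model of lwt_out_func_proto(): accept only the shared helpers."""
    return helper in LWT_OUT_HELPERS


def lwt_in_allows(helper: str) -> bool:
    """Model of lwt_in_func_proto(): check the IN-only helpers first,
    then fall back to the OUT list, as the default case does in the diff."""
    if helper in LWT_IN_ONLY_HELPERS:
        return True
    return lwt_out_allows(helper)
```

With this layout the IN hook is a strict superset of the OUT hook, so a
helper added to the OUT list later becomes available to IN automatically.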

This patch also takes the opportunity to group all lwt_*_func_proto
functions together for clarity. At the moment, sock_ops_func_proto sits
in the middle of lwt_inout_func_proto and lwt_xmit_func_proto.

Signed-off-by: Mathieu Xhonneux 
Acked-by: David Lebrun 
---
 include/linux/bpf_types.h |  4 +--
 net/core/filter.c | 83 +--
 2 files changed, 54 insertions(+), 33 deletions(-)

diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index b67f8793de0d..aa5c8b878474 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -9,8 +9,8 @@ BPF_PROG_TYPE(BPF_PROG_TYPE_XDP, xdp)
 BPF_PROG_TYPE(BPF_PROG_TYPE_CGROUP_SKB, cg_skb)
 BPF_PROG_TYPE(BPF_PROG_TYPE_CGROUP_SOCK, cg_sock)
 BPF_PROG_TYPE(BPF_PROG_TYPE_CGROUP_SOCK_ADDR, cg_sock_addr)
-BPF_PROG_TYPE(BPF_PROG_TYPE_LWT_IN, lwt_inout)
-BPF_PROG_TYPE(BPF_PROG_TYPE_LWT_OUT, lwt_inout)
+BPF_PROG_TYPE(BPF_PROG_TYPE_LWT_IN, lwt_in)
+BPF_PROG_TYPE(BPF_PROG_TYPE_LWT_OUT, lwt_out)
 BPF_PROG_TYPE(BPF_PROG_TYPE_LWT_XMIT, lwt_xmit)
 BPF_PROG_TYPE(BPF_PROG_TYPE_SOCK_OPS, sock_ops)
 BPF_PROG_TYPE(BPF_PROG_TYPE_SK_SKB, sk_skb)
diff --git a/net/core/filter.c b/net/core/filter.c
index 2f43d0a6ac5d..39641ea567b4 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -4755,33 +4755,6 @@ xdp_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
}
 }
 
-static const struct bpf_func_proto *
-lwt_inout_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
-{
-   switch (func_id) {
-   case BPF_FUNC_skb_load_bytes:
-   return &bpf_skb_load_bytes_proto;
-   case BPF_FUNC_skb_pull_data:
-   return &bpf_skb_pull_data_proto;
-   case BPF_FUNC_csum_diff:
-   return &bpf_csum_diff_proto;
-   case BPF_FUNC_get_cgroup_classid:
-   return &bpf_get_cgroup_classid_proto;
-   case BPF_FUNC_get_route_realm:
-   return &bpf_get_route_realm_proto;
-   case BPF_FUNC_get_hash_recalc:
-   return &bpf_get_hash_recalc_proto;
-   case BPF_FUNC_perf_event_output:
-   return &bpf_skb_event_output_proto;
-   case BPF_FUNC_get_smp_processor_id:
-   return &bpf_get_smp_processor_id_proto;
-   case BPF_FUNC_skb_under_cgroup:
-   return &bpf_skb_under_cgroup_proto;
-   default:
-   return bpf_base_func_proto(func_id);
-   }
-}
-
 static const struct bpf_func_proto *
 sock_ops_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
@@ -4847,6 +4820,44 @@ sk_skb_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
}
 }
 
+static const struct bpf_func_proto *
+lwt_out_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
+{
+   switch (func_id) {
+   case BPF_FUNC_skb_load_bytes:
+   return &bpf_skb_load_bytes_proto;
+   case BPF_FUNC_skb_pull_data:
+   return &bpf_skb_pull_data_proto;
+   case BPF_FUNC_csum_diff:
+   return &bpf_csum_diff_proto;
+   case BPF_FUNC_get_cgroup_classid:
+   return &bpf_get_cgroup_classid_proto;
+   case BPF_FUNC_get_route_realm:
+   return &bpf_get_route_realm_proto;
+   case BPF_FUNC_get_hash_recalc:
+   return &bpf_get_hash_recalc_proto;
+   case BPF_FUNC_perf_event_output:
+   return &bpf_skb_event_output_proto;
+   case BPF_FUNC_get_smp_processor_id:
+   return &bpf_get_smp_processor_id_proto;
+   case BPF_FUNC_skb_under_cgroup:
+   return &bpf_skb_under_cgroup_proto;
+   default:
+   return bpf_base_func_proto(func_id);
+   }
+}
+
+static const struct bpf_func_proto *
+lwt_in_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
+{
+   switch (func_id) {
+   case BPF_FUNC_lwt_push_encap:
+   return &bpf_lwt_push_encap_proto;
+   default:
+   return lwt_out_func_proto(func_id, prog);
+   }
+}
+
 static const struct bpf_func_proto *
 lwt_xmit_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
@@ -4878,7 +4889,7 @@ lwt_xmit_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
case BPF_FUNC_set_hash_invalid:
return &bpf_set_hash_invalid_proto;
default:
-   return lwt_inout_func_proto(func_id, prog);
+   return lwt_out_func_proto(func_id, prog);
}
 }
 
@@ -6451,13 +6462,23 @@ const struct bpf_prog_ops cg_skb_prog_ops = {
.test_run   = bpf_prog_test_run_skb,

[PATCH bpf-next v6 1/6] ipv6: sr: make seg6.h includable without IPv6

2018-05-17 Thread Mathieu Xhonneux

include/net/seg6.h cannot be included in a source file if CONFIG_IPV6 is
not enabled:
   include/net/seg6.h: In function 'seg6_pernet':
>> include/net/seg6.h:52:14: error: 'struct net' has no member named 'ipv6'; did you mean 'ipv4'?
     return net->ipv6.seg6_data;
                 ^~~~
                 ipv4

This commit makes seg6_pernet return NULL if IPv6 is not compiled, hence
allowing seg6.h to be included regardless of the configuration.

Signed-off-by: Mathieu Xhonneux 
---
 include/net/seg6.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/include/net/seg6.h b/include/net/seg6.h
index 099bad59dc90..70b4cfac52d7 100644
--- a/include/net/seg6.h
+++ b/include/net/seg6.h
@@ -49,7 +49,11 @@ struct seg6_pernet_data {
 
 static inline struct seg6_pernet_data *seg6_pernet(struct net *net)
 {
+#if IS_ENABLED(CONFIG_IPV6)
return net->ipv6.seg6_data;
+#else
+   return NULL;
+#endif
 }
 
 extern int seg6_init(void);
-- 
2.16.1

[PATCH bpf-next v6 6/6] selftests/bpf: test for seg6local End.BPF action

2018-05-17 Thread Mathieu Xhonneux

Add a new test for the seg6local End.BPF action. The following helpers
are also tested:

- bpf_lwt_push_encap within the LWT BPF IN hook
- bpf_lwt_seg6_action
- bpf_lwt_seg6_adjust_srh
- bpf_lwt_seg6_store_bytes

A chain of End.BPF actions is built. The SRH is injected through a LWT
BPF IN hook before the chain. Each End.BPF action validates the previous
one, otherwise the packet is dropped.
The test succeeds if the last node in the chain receives the packet and
the UDP datagram contained can be retrieved from userspace.

Signed-off-by: Mathieu Xhonneux 
---
 tools/include/uapi/linux/bpf.h|  97 -
 tools/testing/selftests/bpf/Makefile  |   6 +-
 tools/testing/selftests/bpf/bpf_helpers.h |  12 +
 tools/testing/selftests/bpf/test_lwt_seg6local.c  | 438 ++
 tools/testing/selftests/bpf/test_lwt_seg6local.sh | 140 +++
 5 files changed, 690 insertions(+), 3 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/test_lwt_seg6local.c
 create mode 100755 tools/testing/selftests/bpf/test_lwt_seg6local.sh

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index d94d333a8225..e8efb12d0a7d 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -141,6 +141,7 @@ enum bpf_prog_type {
BPF_PROG_TYPE_SK_MSG,
BPF_PROG_TYPE_RAW_TRACEPOINT,
BPF_PROG_TYPE_CGROUP_SOCK_ADDR,
+   BPF_PROG_TYPE_LWT_SEG6LOCAL,
 };
 
 enum bpf_attach_type {
@@ -1902,6 +1903,90 @@ union bpf_attr {
  * egress otherwise). This is the only flag supported for now.
  * Return
  * **SK_PASS** on success, or **SK_DROP** on error.
+ *
+ * int bpf_lwt_push_encap(struct sk_buff *skb, u32 type, void *hdr, u32 len)
+ * Description
+ * Encapsulate the packet associated to *skb* within a Layer 3
+ * protocol header. This header is provided in the buffer at
+ * address *hdr*, with *len* its size in bytes. *type* indicates
+ * the protocol of the header and can be one of:
+ *
+ * **BPF_LWT_ENCAP_SEG6**
+ * IPv6 encapsulation with Segment Routing Header
+ * (**struct ipv6_sr_hdr**). *hdr* only contains the SRH,
+ * the IPv6 header is computed by the kernel.
+ * **BPF_LWT_ENCAP_SEG6_INLINE**
+ * Only works if *skb* contains an IPv6 packet. Insert a
+ * Segment Routing Header (**struct ipv6_sr_hdr**) inside
+ * the IPv6 header.
+ *
+ * A call to this helper is susceptible to change the underlaying
+ * packet buffer. Therefore, at load time, all checks on pointers
+ * previously done by the verifier are invalidated and must be
+ * performed again, if the helper is used in combination with
+ * direct packet access.
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
+ * int bpf_lwt_seg6_store_bytes(struct sk_buff *skb, u32 offset, const void *from, u32 len)
+ * Description
+ * Store *len* bytes from address *from* into the packet
+ * associated to *skb*, at *offset*. Only the flags, tag and TLVs
+ * inside the outermost IPv6 Segment Routing Header can be
+ * modified through this helper.
+ *
+ * A call to this helper is susceptible to change the underlaying
+ * packet buffer. Therefore, at load time, all checks on pointers
+ * previously done by the verifier are invalidated and must be
+ * performed again, if the helper is used in combination with
+ * direct packet access.
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
+ * int bpf_lwt_seg6_adjust_srh(struct sk_buff *skb, u32 offset, s32 delta)
+ * Description
+ * Adjust the size allocated to TLVs in the outermost IPv6
+ * Segment Routing Header contained in the packet associated to
+ * *skb*, at position *offset* by *delta* bytes. Only offsets
+ * after the segments are accepted. *delta* can be as well
+ * positive (growing) as negative (shrinking).
+ *
+ * A call to this helper is susceptible to change the underlaying
+ * packet buffer. Therefore, at load time, all checks on pointers
+ * previously done by the verifier are invalidated and must be
+ * performed again, if the helper is used in combination with
+ * direct packet access.
+ * Return
+ * 0 on success, or a negative error in case of failure.
+ *
+ * int bpf_lwt_seg6_action(struct sk_buff *skb, u32 action, void *param, u32 param_len)
+ * Description
+ * Apply an IPv6 Segment Routing action of type *action* to the
+ * packet associated to *skb*. Each action takes a parameter
+

[PATCH bpf-next v6 3/6] bpf: Add IPv6 Segment Routing helpers

2018-05-17 Thread Mathieu Xhonneux

The BPF seg6local hook should be powerful enough to enable users to
implement most of the use cases one could think of. After some thinking,
we figured out that the following actions should be possible on an SRv6
packet, requiring 3 specific helpers:
- bpf_lwt_seg6_store_bytes: Modify non-sensitive fields of the SRH
- bpf_lwt_seg6_adjust_srh: Grow or shrink an SRH (to add/delete TLVs)
- bpf_lwt_seg6_action: Apply some SRv6 network programming actions
  (specifically End.X, End.T, End.B6 and End.B6.Encap)

The specifications of these helpers are provided in the patch (see
include/uapi/linux/bpf.h).

The non-sensitive fields of the SRH are the following: flags, tag and
TLVs. The other fields cannot be modified, to maintain the SRH
integrity. Flags, tag and TLVs can easily be modified as their validity
can be checked afterwards via seg6_validate_srh. It is not allowed to
modify the segments directly. To add segments on the path, one should
stack a new SRH using the End.B6 action via bpf_lwt_seg6_action.

Growing, shrinking or editing TLVs via the helpers will flag the SRH as
invalid, and it will have to be re-validated before re-entering the IPv6
layer. This flag is stored in a per-CPU buffer, along with the current
header length in bytes.

Storing the SRH length in bytes in this per-CPU state is mandatory when
using bpf_lwt_seg6_adjust_srh. The Header Ext. Length field expresses the
SRH length in 8-byte units (a padding TLV can be inserted to reach the
8-byte boundary). When adding/deleting TLVs within the BPF program, the
SRH may temporarily be in an invalid state where its length is not a
multiple of 8 bytes, hence the need to store the length in bytes
separately. The caller of the BPF program can then use this value to
ensure that the SRH's final length is valid. Again, a final SRH modified
by a BPF program that does not respect the 8-byte boundary will be
discarded as invalid.
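
To make the length arithmetic concrete, here is a small sketch. It relies
on the general IPv6 extension header rule that Hdr Ext Len counts 8-octet
units and excludes the first 8 octets; the function names are
hypothetical and nothing here is taken from the patch itself.

```python
def hdr_ext_len(srh_len_bytes: int) -> int:
    """Return the Hdr Ext Len value for an SRH of srh_len_bytes bytes.

    The field counts 8-octet units and does not include the first
    8 octets, so only lengths that are multiples of 8 can be encoded.
    """
    if srh_len_bytes < 8 or srh_len_bytes % 8 != 0:
        raise ValueError("SRH length must be a positive multiple of 8")
    return srh_len_bytes // 8 - 1


def padding_needed(srh_len_bytes: int) -> int:
    """Return how many padding bytes bring the SRH to an 8-byte boundary."""
    return (-srh_len_bytes) % 8


# An SRH with two segments: 8 fixed bytes plus 2 * 16 bytes of addresses.
two_segments = 8 + 2 * 16
# The same SRH after growing it by a hypothetical 6-byte TLV.
with_tlv = two_segments + 6
```

Here two_segments is 40 bytes and encodes cleanly, while with_tlv is 46
bytes and needs 2 bytes of padding before the header length field can
describe it. This is the intermediate state that the byte counter tracks.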

Finally, a fourth helper is provided, bpf_lwt_push_encap, which is
available from the LWT BPF IN hook, but not from the seg6local BPF one.
This helper encapsulates a Segment Routing Header (either with a new
outer IPv6 header, or by inlining it directly in the existing IPv6
header) into a non-SRv6 packet. It is required to dynamically
encapsulate an SRH for non-SRv6 packets, as the BPF seg6local hook only
works on traffic already containing an SRH. This is the BPF equivalent
of the seg6 LWT infrastructure, which achieves the same purpose but with
a static SRH per route.

These helpers require CONFIG_IPV6=y (and not =m).

Signed-off-by: Mathieu Xhonneux 
Acked-by: David Lebrun 
---
 include/net/seg6_local.h |   8 ++
 include/uapi/linux/bpf.h |  96 +++-
 net/core/filter.c| 285 +++
 net/ipv6/Kconfig |   5 +
 net/ipv6/seg6_local.c|   2 +
 5 files changed, 372 insertions(+), 24 deletions(-)

diff --git a/include/net/seg6_local.h b/include/net/seg6_local.h
index 57498b23085d..661fd5b4d3e0 100644
--- a/include/net/seg6_local.h
+++ b/include/net/seg6_local.h
@@ -15,10 +15,18 @@
 #ifndef _NET_SEG6_LOCAL_H
 #define _NET_SEG6_LOCAL_H
 
+#include 
 #include 
 #include 
 
 extern int seg6_lookup_nexthop(struct sk_buff *skb, struct in6_addr *nhaddr,
   u32 tbl_id);
 
+struct seg6_bpf_srh_state {
+   bool valid;
+   u16 hdrlen;
+};
+
+DECLARE_PER_CPU(struct seg6_bpf_srh_state, seg6_bpf_srh_states);
+
 #endif
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index d94d333a8225..37f098ca822b 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1902,6 +1902,90 @@ union bpf_attr {
  * egress otherwise). This is the only flag supported for now.
  * Return
  * **SK_PASS** on success, or **SK_DROP** on error.
+ *
+ * int bpf_lwt_push_encap(struct sk_buff *skb, u32 type, void *hdr, u32 len)
+ * Description
+ * Encapsulate the packet associated to *skb* within a Layer 3
+ * protocol header. This header is provided in the buffer at
+ * address *hdr*, with *len* its size in bytes. *type* indicates
+ * the protocol of the header and can be one of:
+ *
+ * **BPF_LWT_ENCAP_SEG6**
+ * IPv6 encapsulation with Segment Routing Header
+ * (**struct ipv6_sr_hdr**). *hdr* only contains the SRH,
+ * the IPv6 header is computed by the kernel.
+ * **BPF_LWT_ENCAP_SEG6_INLINE**
+ * Only works if *skb* contains an IPv6 packet. Insert a
+ * Segment Routing Header (**struct ipv6_sr_hdr**) inside
+ * the IPv6 header.
+ *
+ * A call to this helper is susceptible to change the underlaying
+ *

[PATCH] net/ncsi: prevent a couple array underflows

2018-05-17 Thread Dan Carpenter

We recently refactored this code and introduced a static checker
warning.  Smatch complains that if cmd->index is zero then we would
underflow the arrays.  That's obviously true.

The question is whether we prevent cmd->index from being zero at a
different level.  I've looked at the code and I don't immediately see
a check for that.

Fixes: 062b3e1b6d4f ("net/ncsi: Refactor MAC, VLAN filters")
Signed-off-by: Dan Carpenter 

diff --git a/net/ncsi/ncsi-rsp.c b/net/ncsi/ncsi-rsp.c
index ce9497966ebe..a6b7c7d5c829 100644
--- a/net/ncsi/ncsi-rsp.c
+++ b/net/ncsi/ncsi-rsp.c
@@ -347,7 +347,7 @@ static int ncsi_rsp_handler_svf(struct ncsi_request *nr)
 
cmd = (struct ncsi_cmd_svf_pkt *)skb_network_header(nr->cmd);
ncf = &nc->vlan_filter;
-   if (cmd->index > ncf->n_vids)
+   if (cmd->index == 0 || cmd->index > ncf->n_vids)
return -ERANGE;
 
/* Add or remove the VLAN filter. Remember HW indexes from 1 */
@@ -445,7 +445,8 @@ static int ncsi_rsp_handler_sma(struct ncsi_request *nr)
ncf = &nc->mac_filter;
bitmap = &ncf->bitmap;
 
-   if (cmd->index > ncf->n_uc + ncf->n_mc + ncf->n_mixed)
+   if (cmd->index == 0 ||
+   cmd->index > ncf->n_uc + ncf->n_mc + ncf->n_mixed)
return -ERANGE;
 
index = (cmd->index - 1) * ETH_ALEN;
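
The bounds check added above can be illustrated outside the kernel. This
is only a sketch under assumptions: filter_slot is a hypothetical name,
and IndexError stands in for the -ERANGE return. In C an index of zero
would make (cmd->index - 1) point before the array; in Python the same
mistake would silently select the last element, so the guard matters in
both cases.

```python
def filter_slot(index: int, n_slots: int) -> int:
    """Map a 1-based hardware filter index to a 0-based array slot.

    Rejects 0 as well as anything above n_slots, mirroring the
    'index == 0 || index > n' test in the patch.
    """
    if index == 0 or index > n_slots:
        raise IndexError("filter index out of range")
    return index - 1
```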

Re: [PATCH V2 8/8] dt-bindings: stm32: add compatible for syscon

2018-05-17 Thread Rob Herring

On Tue, May 15, 2018 at 11:19 AM, Christophe ROULLIER
 wrote:
> Hi Rob,

Please don't top post to lists.
>
> I do not understand, so let me explain our status:
>
> We have syscfg IP Harware in our SOC.

Add a compatible string that uniquely identifies what the block is. So
something like "st,stm32f746-syscfg".

> But we do not have SoC specific driver to manage syscfg, we are using a 
> generic driver "syscon".

That does not matter. We're talking about the binding. Design
decisions in the OS should not define the binding. It doesn't matter
that the OS currently doesn't use the compatible string.

> So can you tell me what you wish to describe this part in our SOC bindings ?
>
> Thanks for your help.
>
> Christophe.
>
> -Original Message-
> From: Rob Herring [mailto:r...@kernel.org]
> Sent: lundi 7 mai 2018 18:36
> To: Christophe ROULLIER 
> Cc: mark.rutl...@arm.com; mcoquelin.st...@gmail.com; Alexandre TORGUE 
> ; Peppe CAVALLARO ; 
> devicet...@vger.kernel.org; and...@lunn.ch; 
> linux-arm-ker...@lists.infradead.org; netdev@vger.kernel.org
> Subject: Re: [PATCH V2 8/8] dt-bindings: stm32: add compatible for syscon
>
> On Wed, May 02, 2018 at 04:18:43PM +0200, Christophe Roullier wrote:
>> This patch describes syscon DT bindings.
>>
>> Signed-off-by: Christophe Roullier 
>> ---
>>  Documentation/devicetree/bindings/arm/stm32.txt | 4 
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/Documentation/devicetree/bindings/arm/stm32.txt
>> b/Documentation/devicetree/bindings/arm/stm32.txt
>> index 6808ed9..06e3834 100644
>> --- a/Documentation/devicetree/bindings/arm/stm32.txt
>> +++ b/Documentation/devicetree/bindings/arm/stm32.txt
>> @@ -8,3 +8,7 @@ using one of the following compatible strings:
>>st,stm32f746
>>st,stm32h743
>>st,stm32mp157
>> +
>> +Required nodes:
>> +- syscon: the soc bus node must have a system controller node pointing to the
>> +  global control registers, with the compatible string "syscon";
>
> You misunderstood my prior comment. 'syscon' alone is not valid. You need SoC 
> specific compatible string for it and 'stm32' is not SoC specific. IOW, the 
> compatible property for a syscon should imply every single register field in 
> the block.
>
> Rob

Re: [PATCH net-next 1/2] net: phy: sfp: make the i2c-bus property really optional

2018-05-17 Thread Andrew Lunn

On Thu, May 17, 2018 at 10:29:06AM +0200, Antoine Tenart wrote:
> The SFF,SFP documentation is clear about making all the DT properties,
> with the exception of the compatible, optional. In practice this is not
> the case and without an i2c-bus property provided the SFP code will
> throw NULL pointer exceptions.
> 
> This patch is an attempt to fix this.

Hi Antoine, Russell

How usable is an SFF/SFP module without access to the i2c EEPROM? I
guess this comes down to link speed. Can it be manually configured?

I'm just wondering if we want to make this mandatory? Fail the probe
if it is not listed?

Andrew

Re: [patch net-next] nfp: flower: set sysfs link to device for representors

2018-05-17 Thread Jiri Pirko

Thu, May 17, 2018 at 02:25:14PM CEST, gerlitz...@gmail.com wrote:
>On Thu, May 17, 2018 at 1:05 PM, Jiri Pirko  wrote:
>> From: Jiri Pirko 
>>
>> Do this so the sysfs has "device" link correctly set.
>
>please no
>
>This is likely to create bunch of issues with respect to how libvirt
>deals with the representors.

Once netdev is there because of some probed pci device, this link should
be set. That is one of the basics.

What "bunch of issues" are you talking about? Please be specific.


>
>We were discussing it off list between nfp and mlnx driver people. We
>need to put the open stack folks and kernel developers into the same
>thread. I can make a post on that next week

[PATCH bpf-next] bpf: change eBPF helper doc parsing script to allow for smaller indent

2018-05-17 Thread Quentin Monnet

Documentation for eBPF helpers can be parsed from bpf.h and eventually
turned into a man page. Commit 6f96674dbd8c ("bpf: relax constraints on
formatting for eBPF helper documentation") changed the script used to
parse it, in order to allow for different indent styles and to ease the
work of writing documentation for future helpers.

The script currently considers that the first tab can be replaced by 6
to 8 spaces. But the documentation for bpf_fib_lookup() uses a mix of
tabs (for the "Description" part) and of spaces ("Return" part), and
only has a 5-space indent for the latter.

We probably do not want to change the values accepted by the script each
time a new helper gets a new indent style. However, it is worth noting
that with those 5 spaces, the "Description" and "Return" parts *look*
aligned in the generated patch and in `git show`, so it is likely other
helper authors will use the same length. Therefore, allow helper
documentation to use only 5 spaces for the first indent level.

Signed-off-by: Quentin Monnet 
---
 scripts/bpf_helpers_doc.py | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/scripts/bpf_helpers_doc.py b/scripts/bpf_helpers_doc.py
index 8f59897fbda1..5010a4d5bfba 100755
--- a/scripts/bpf_helpers_doc.py
+++ b/scripts/bpf_helpers_doc.py
@@ -95,7 +95,7 @@ class HeaderParser(object):
 return capture.group(1)
 
 def parse_desc(self):
-p = re.compile(' \* ?(?:\t| {6,8})Description$')
+p = re.compile(' \* ?(?:\t| {5,8})Description$')
 capture = p.match(self.line)
 if not capture:
 # Helper can have empty description and we might be parsing another
@@ -109,7 +109,7 @@ class HeaderParser(object):
 if self.line == ' *\n':
 desc += '\n'
 else:
-p = re.compile(' \* ?(?:\t| {6,8})(?:\t| {8})(.*)')
+p = re.compile(' \* ?(?:\t| {5,8})(?:\t| {8})(.*)')
 capture = p.match(self.line)
 if capture:
 desc += capture.group(1) + '\n'
@@ -118,7 +118,7 @@ class HeaderParser(object):
 return desc
 
 def parse_ret(self):
-p = re.compile(' \* ?(?:\t| {6,8})Return$')
+p = re.compile(' \* ?(?:\t| {5,8})Return$')
 capture = p.match(self.line)
 if not capture:
 # Helper can have empty retval and we might be parsing another
@@ -132,7 +132,7 @@ class HeaderParser(object):
 if self.line == ' *\n':
 ret += '\n'
 else:
-p = re.compile(' \* ?(?:\t| {6,8})(?:\t| {8})(.*)')
+p = re.compile(' \* ?(?:\t| {5,8})(?:\t| {8})(.*)')
 capture = p.match(self.line)
 if capture:
 ret += capture.group(1) + '\n'
-- 
2.14.1
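
The effect of the change can be checked with the two "Return" patterns
from the diff. This is a minimal sketch: the patterns are copied from the
patch, while the sample lines and the names OLD_RET, NEW_RET and
accepts() are made up for illustration.

```python
import re

# Patterns copied from parse_ret() before and after the patch.
OLD_RET = re.compile(r' \* ?(?:\t| {6,8})Return$')
NEW_RET = re.compile(r' \* ?(?:\t| {5,8})Return$')


def accepts(pattern: "re.Pattern[str]", line: str) -> bool:
    """Return True if the parser pattern matches at the start of line."""
    return pattern.match(line) is not None


# Hypothetical header lines: tab indent, six spaces, and five spaces
# between the asterisk and the keyword.
tab_line = ' *\tReturn\n'
six_line = ' *' + ' ' * 6 + 'Return\n'
five_line = ' *' + ' ' * 5 + 'Return\n'
```

Both patterns accept tab_line and six_line; only NEW_RET accepts
five_line, which is the case the commit message describes.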

Re: [PATCH net-next 1/2] net: phy: sfp: make the i2c-bus property really optional

2018-05-17 Thread Antoine Tenart

Hi Andrew,

On Thu, May 17, 2018 at 02:41:28PM +0200, Andrew Lunn wrote:
> On Thu, May 17, 2018 at 10:29:06AM +0200, Antoine Tenart wrote:
> > The SFF,SFP documentation is clear about making all the DT properties,
> > with the exception of the compatible, optional. In practice this is not
> > the case and without an i2c-bus property provided the SFP code will
> > throw NULL pointer exceptions.
> > 
> > This patch is an attempt to fix this.
> 
> How usable is an SFF/SFP module without access to the i2c EEPROM? I
> guess this comes down to link speed. Can it be manually configured?
>
> I'm just wondering if we want to make this mandatory? Fail the probe
> if it is not listed?

Yes, the other option would be to fail when probing a cage missing the
i2c description. I'd say a passive module can work without the i2c
EEPROM accessible as it does not need to be configured. I don't know
what would happen with active ones.

So the question is, do we want to enable partially working SFP cages
(ie. probably working with only a subset of SFP modules)?

Thanks!
Antoine

-- 
Antoine Ténart, Bootlin (formerly Free Electrons)
Embedded Linux and Kernel engineering
https://bootlin.com

Re: [PATCH net-next 1/2] net: phy: sfp: make the i2c-bus property really optional

2018-05-17 Thread Andrew Lunn

On Thu, May 17, 2018 at 02:56:48PM +0200, Antoine Tenart wrote:
> Hi Andrew,
> 
> On Thu, May 17, 2018 at 02:41:28PM +0200, Andrew Lunn wrote:
> > On Thu, May 17, 2018 at 10:29:06AM +0200, Antoine Tenart wrote:
> > > The SFF,SFP documentation is clear about making all the DT properties,
> > > with the exception of the compatible, optional. In practice this is not
> > > the case and without an i2c-bus property provided the SFP code will
> > > throw NULL pointer exceptions.
> > > 
> > > This patch is an attempt to fix this.
> > 
> > How usable is an SFF/SFP module without access to the i2c EEPROM? I
> > guess this comes down to link speed. Can it be manually configured?
> >
> > I'm just wondering if we want to make this mandatory? Fail the probe
> > if it is not listed?
> 
> Yes, the other option would be to fail when probing a cage missing the
> i2c description. I'd say a passive module can work without the i2c
> EEPROM accessible as it does not need to be configured. I don't know
> what would happen with active ones.

Hi Antoine

I was thinking about how it reads the bit rate from the EEPROM. From
that it determines what mode the MAC could use, 1000-Base-X,
2500-Base-X, etc. Can you still configure this correctly via ethtool,
if you don't have the bitrate information?

   Andrew

[PATCH 1/4] arcnet: com20020: Add com20020 io mapped version

2018-05-17 Thread Andrea Greco

From: Andrea Greco 

Add support for com20022I/com20020, io mapped.

Signed-off-by: Andrea Greco 
---
 drivers/net/arcnet/Kconfig   |   9 +-
 drivers/net/arcnet/Makefile  |   1 +
 drivers/net/arcnet/arcdevice.h   |  14 ++
 drivers/net/arcnet/com20020-io.c | 287 +++
 drivers/net/arcnet/com20020.c|   5 +-
 5 files changed, 313 insertions(+), 3 deletions(-)
 create mode 100644 drivers/net/arcnet/com20020-io.c

diff --git a/drivers/net/arcnet/Kconfig b/drivers/net/arcnet/Kconfig
index 39bd16f3f86d..85e60ed29fa8 100644
--- a/drivers/net/arcnet/Kconfig
+++ b/drivers/net/arcnet/Kconfig
@@ -3,7 +3,7 @@
 #
 
 menuconfig ARCNET
-   depends on NETDEVICES && (ISA || PCI || PCMCIA)
+   depends on NETDEVICES
tristate "ARCnet support"
---help---
  If you have a network card of this type, say Y and check out the
@@ -129,5 +129,12 @@ config ARCNET_COM20020_CS
 
  To compile this driver as a module, choose M here: the module will be
  called com20020_cs.  If unsure, say N.
+config ARCNET_COM20020_IO
+   bool "Support for COM20020 (IO mapped)"
+   depends on ARCNET_COM20020 && !(ARCNET_COM20020_PCI || ARCNET_COM20020_ISA || ARCNET_COM20020_CS)
+   help
+ Say Y here if your custom board mount com20020 chipset or friends.
+ Supported Chipset: com20020, com20022, com20022I-3v3
+ If unsure, say N.
 
 endif # ARCNET
diff --git a/drivers/net/arcnet/Makefile b/drivers/net/arcnet/Makefile
index 53525e8ea130..18da4341f404 100644
--- a/drivers/net/arcnet/Makefile
+++ b/drivers/net/arcnet/Makefile
@@ -14,3 +14,4 @@ obj-$(CONFIG_ARCNET_COM20020) += com20020.o
 obj-$(CONFIG_ARCNET_COM20020_ISA) += com20020-isa.o
 obj-$(CONFIG_ARCNET_COM20020_PCI) += com20020-pci.o
 obj-$(CONFIG_ARCNET_COM20020_CS) += com20020_cs.o
+obj-$(CONFIG_ARCNET_COM20020_IO) += com20020-io.o
diff --git a/drivers/net/arcnet/arcdevice.h b/drivers/net/arcnet/arcdevice.h
index d09b2b46ab63..86c36d9b666b 100644
--- a/drivers/net/arcnet/arcdevice.h
+++ b/drivers/net/arcnet/arcdevice.h
@@ -371,6 +371,19 @@ void arcnet_timeout(struct net_device *dev);
 #define BUS_ALIGN  1
 #endif
 
+#ifdef CONFIG_ARCNET_COM20020_IO
+#define arcnet_inb(addr, offset)   \
+   ioread8((void __iomem *)(addr) + BUS_ALIGN * offset)
+
+#define arcnet_outb(value, addr, offset)   \
+   iowrite8(value, (void __iomem *)addr + BUS_ALIGN * offset)
+
+#define arcnet_insb(addr, offset, buffer, count)   \
+   ioread8_rep((void __iomem *)addr + BUS_ALIGN * offset, buffer, count)
+
+#define arcnet_outsb(addr, offset, buffer, count)  \
+   iowrite8_rep((void __iomem *)addr + BUS_ALIGN * offset, buffer, count)
+#else
 /* addr and offset allow register like names to define the actual IO  address.
  * A configuration option multiplies the offset for alignment.
  */
@@ -388,6 +401,7 @@ void arcnet_timeout(struct net_device *dev);
readb((addr) + (offset))
 #define arcnet_writeb(value, addr, offset) \
writeb(value, (addr) + (offset))
+#endif
 
 #endif /* __KERNEL__ */
 #endif /* _LINUX_ARCDEVICE_H */
diff --git a/drivers/net/arcnet/com20020-io.c b/drivers/net/arcnet/com20020-io.c
new file mode 100644
index ..fce458242193
--- /dev/null
+++ b/drivers/net/arcnet/com20020-io.c
@@ -0,0 +1,287 @@
+// SPDX-License-Identifier: (GPL-2.0 OR MIT)
+/* Linux ARCnet driver for com 20020.
+ *
+ * datasheet:
+ * http://ww1.microchip.com/downloads/en/DeviceDoc/200223vrevc.pdf
+ * http://ww1.microchip.com/downloads/en/DeviceDoc/20020.pdf
+ *
+ * Supported chip version:
+ * - com20020
+ * - com20022
+ * - com20022I-3v3
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "arcdevice.h"
+#include "com20020.h"
+
+/* Reset (5 * xTalFreq), minimal com20020 xTal is 10Mhz */
+#define RESET_DELAY 500
+
+enum com20020_xtal_freq {
+   freq_10Mhz = 10,
+   freq_20Mhz = 20,
+};
+
+enum com20020_arcnet_speed {
+   arc_speed_10M_bps = 10000000,
+   arc_speed_5M_bps = 5000000,
+   arc_speed_2M50_bps = 2500000,
+   arc_speed_1M25_bps = 1250000,
+   arc_speed_625K_bps = 625000,
+   arc_speed_312K5_bps = 312500,
+   arc_speed_156K25_bps = 156250,
+};
+
+enum com20020_timeout {
+   arc_timeout_328us =   328000,
+   arc_timeout_164us = 164000,
+   arc_timeout_82us =  82000,
+   arc_timeout_20u5s =  20500,
+};
+
+static int setup_clock(int *clockp, int *clockm, int xtal, int arcnet_speed)
+{
+   int pll_factor, req_clock_frq = 20;
+
+   switch (arcnet_speed) {
+   case arc_speed_10M_bps:
+   req_clock_frq = 80;
+   *clockp = 0;
+   break;
+   case arc_speed_5M_bps:
+   req_clock_frq = 4

[PATCH 2/4] arcnet: com20020: bindings for smsc com20020

2018-05-17 Thread Andrea Greco

From: Andrea Greco 

Add devicetree bindings for smsc com20020

Signed-off-by: Andrea Greco 
---
 .../devicetree/bindings/net/smsc-com20020.txt   | 21 +
 1 file changed, 21 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/net/smsc-com20020.txt

diff --git a/Documentation/devicetree/bindings/net/smsc-com20020.txt 
b/Documentation/devicetree/bindings/net/smsc-com20020.txt
new file mode 100644
index ..92360b054873
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/smsc-com20020.txt
@@ -0,0 +1,21 @@
+SMSC com20020 Arcnet network controller
+
+Required properties:
+- timeout-ns: Arcnet bus timeout, Idle Time (328000 - 20500)
+- bus-speed-bps: Arcnet bus speed (10000000 - 156250)
+- smsc,xtal-mhz: External oscillator frequency
+- smsc,backplane-enabled: Controller use backplane mode
+- reset-gpios: Chip reset pin
+- interrupts: Should contain controller interrupt
+
+arcnet@2800 {
+compatible = "smsc,com20020";
+
+   timeout-ns = <20500>;
+   bus-speed-bps = <10000000>;
+   smsc,xtal-mhz = <20>;
+   smsc,backplane-enabled;
+
+   reset-gpios = <&gpio3 21 GPIO_ACTIVE_LOW>;
+   interrupts = <&gpio2 10 GPIO_ACTIVE_LOW>;
+};
-- 
2.14.3

[PATCH 3/4] arcnet: com20020: Fixup missing SLOWARB bit

2018-05-17 Thread Andrea Greco

From: Andrea Greco 

If the com20020 clock is greater than 40 MHz, the SLOWARB bit is required.

Signed-off-by: Andrea Greco 
---
 drivers/net/arcnet/com20020.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/arcnet/com20020.c b/drivers/net/arcnet/com20020.c
index 2fd00d2dd6bf..f1de02f05305 100644
--- a/drivers/net/arcnet/com20020.c
+++ b/drivers/net/arcnet/com20020.c
@@ -102,6 +102,10 @@ int com20020_check(struct net_device *dev)
lp->setup = lp->clockm ? 0 : (lp->clockp << 1);
lp->setup2 = (lp->clockm << 4) | 8;
 
+   /* If clock is greater than 40 MHz, SLOWARB bit must be set */
+   if (lp->clockm > 1)
+   lp->setup2 |= SLOWARB;
+
/* CHECK: should we do this for SOHARD cards ? */
/* Enable P1Mode for backplane mode */
lp->setup = lp->setup | P1MODE;
-- 
2.14.3

[PATCH 4/4] arcnet: com20020: Add ethtool support

2018-05-17 Thread Andrea Greco

From: Andrea Greco 

Set up ethtool ops to export the com20020 diag register

Signed-off-by: Andrea Greco 
---
 drivers/net/arcnet/com20020-io.c  |  1 +
 drivers/net/arcnet/com20020-isa.c |  1 +
 drivers/net/arcnet/com20020.c | 24 
 drivers/net/arcnet/com20020.h |  1 +
 drivers/net/arcnet/com20020_cs.c  |  1 +
 include/uapi/linux/if_arcnet.h|  6 ++
 6 files changed, 34 insertions(+)

diff --git a/drivers/net/arcnet/com20020-io.c b/drivers/net/arcnet/com20020-io.c
index fce458242193..0d4355bcd873 100644
--- a/drivers/net/arcnet/com20020-io.c
+++ b/drivers/net/arcnet/com20020-io.c
@@ -183,6 +183,7 @@ static int com20020_probe(struct platform_device *pdev)
 
dev = alloc_arcdev(NULL);
dev->netdev_ops = &com20020_netdev_ops;
+   dev->ethtool_ops = &com20020_ethtool_ops;
lp = netdev_priv(dev);
 
lp->card_flags = ARC_CAN_10MBIT;
diff --git a/drivers/net/arcnet/com20020-isa.c 
b/drivers/net/arcnet/com20020-isa.c
index 38fa60ddaf2e..44ab6dcccb58 100644
--- a/drivers/net/arcnet/com20020-isa.c
+++ b/drivers/net/arcnet/com20020-isa.c
@@ -154,6 +154,7 @@ static int __init com20020_init(void)
dev->dev_addr[0] = node;
 
dev->netdev_ops = &com20020_netdev_ops;
+   dev->ethtool_ops = &com20020_ethtool_ops;
 
lp = netdev_priv(dev);
lp->backplane = backplane;
diff --git a/drivers/net/arcnet/com20020.c b/drivers/net/arcnet/com20020.c
index f1de02f05305..02dd93a18e53 100644
--- a/drivers/net/arcnet/com20020.c
+++ b/drivers/net/arcnet/com20020.c
@@ -201,6 +201,29 @@ const struct net_device_ops com20020_netdev_ops = {
.ndo_set_rx_mode = com20020_set_mc_list,
 };
 
+static int com20020_ethtool_regs_len(struct net_device *netdev)
+{
+   return sizeof(struct com20020_ethtool_regs);
+}
+
+static void com20020_ethtool_regs_read(struct net_device *dev,
+  struct ethtool_regs *regs, void *p)
+{
+   struct arcnet_local *lp = netdev_priv(dev);
+   struct com20020_ethtool_regs *com_reg = p;
+
+   memset(p, 0, sizeof(struct com20020_ethtool_regs));
+
+   com_reg->status = lp->hw.status(dev) & 0xFF;
+   com_reg->diag_register = (lp->hw.status(dev) >> 8) & 0xFF;
+   com_reg->reconf_count = lp->num_recons;
+}
+
+const struct ethtool_ops com20020_ethtool_ops = {
+   .get_regs = com20020_ethtool_regs_read,
+   .get_regs_len  = com20020_ethtool_regs_len,
+};
+
 /* Set up the struct net_device associated with this card.  Called after
  * probing succeeds.
  */
@@ -402,6 +425,7 @@ static void com20020_set_mc_list(struct net_device *dev)
 EXPORT_SYMBOL(com20020_check);
 EXPORT_SYMBOL(com20020_found);
 EXPORT_SYMBOL(com20020_netdev_ops);
+EXPORT_SYMBOL(com20020_ethtool_ops);
 #endif
 
 MODULE_LICENSE("GPL");
diff --git a/drivers/net/arcnet/com20020.h b/drivers/net/arcnet/com20020.h
index 0bcc5d0a6903..a1024c8f8a1f 100644
--- a/drivers/net/arcnet/com20020.h
+++ b/drivers/net/arcnet/com20020.h
@@ -31,6 +31,7 @@
 int com20020_check(struct net_device *dev);
 int com20020_found(struct net_device *dev, int shared);
 extern const struct net_device_ops com20020_netdev_ops;
+extern const struct ethtool_ops com20020_ethtool_ops;
 
 /* The number of low I/O ports used by the card. */
 #define ARCNET_TOTAL_SIZE 8
diff --git a/drivers/net/arcnet/com20020_cs.c b/drivers/net/arcnet/com20020_cs.c
index cf607ffcf358..ae64f436fd54 100644
--- a/drivers/net/arcnet/com20020_cs.c
+++ b/drivers/net/arcnet/com20020_cs.c
@@ -233,6 +233,7 @@ static int com20020_config(struct pcmcia_device *link)
}
 
dev->irq = link->irq;
+   dev->ethtool_ops = &com20020_ethtool_ops;
 
ret = pcmcia_enable_device(link);
if (ret)
diff --git a/include/uapi/linux/if_arcnet.h b/include/uapi/linux/if_arcnet.h
index 683878036d76..790c0fa7386d 100644
--- a/include/uapi/linux/if_arcnet.h
+++ b/include/uapi/linux/if_arcnet.h
@@ -127,4 +127,10 @@ struct archdr {
} soft;
 };
 
+struct com20020_ethtool_regs {
+   __u8 status;
+   __u8 diag_register;
+   __u32 reconf_count;
+};
+
 #endif /* _LINUX_IF_ARCNET_H */
-- 
2.14.3

Re: [PATCH net-next v12 3/7] sch_cake: Add optional ACK filter

2018-05-17 Thread Toke Høiland-Jørgensen

Eric Dumazet  writes:

> On 05/17/2018 04:23 AM, Toke Høiland-Jørgensen wrote:
>
>> 
>> We don't do full parsing of SACKs, no; we were trying to keep things
>> simple... We do detect the presence of SACK options, though, and the
>> presence of SACK options on an ACK will make previous ACKs be considered
>> redundant.
>> 
>
> But they are not redundant in some cases, particularly when reorders
> happen in the network.

Huh. I was under the impression that SACKs were basically cumulative
until cleared.

I.e., in packet sequence ABCDE where B and D are lost, C would have
SACK(B) and E would have SACK(B,D). Are you saying that E would only
have SACK(D)?

-Toke

[PATCH net-next 1/1] tc-testing: fixed copy-pasting error in ife tests

2018-05-17 Thread Roman Mashak

Reported-by: Vlad Buslov 
Reported-by: Davide Caratti 
Signed-off-by: Roman Mashak 
---
 .../selftests/tc-testing/tc-tests/actions/ife.json | 28 +++---
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/tools/testing/selftests/tc-testing/tc-tests/actions/ife.json 
b/tools/testing/selftests/tc-testing/tc-tests/actions/ife.json
index 0330ef29..de97e4ff705c 100644
--- a/tools/testing/selftests/tc-testing/tc-tests/actions/ife.json
+++ b/tools/testing/selftests/tc-testing/tc-tests/actions/ife.json
@@ -20,7 +20,7 @@
 "matchPattern": "action order [0-9]*: ife encode action pass.*type 
0xED3E.*allow mark.*index 2",
 "matchCount": "1",
 "teardown": [
-"$TC actions flush action skbedit"
+"$TC actions flush action ife"
 ]
 },
 {
@@ -44,7 +44,7 @@
 "matchPattern": "action order [0-9]*: ife encode action pipe.*type 
0xED3E.*use mark.*index 2",
 "matchCount": "1",
 "teardown": [
-"$TC actions flush action skbedit"
+"$TC actions flush action ife"
 ]
 },
 {
@@ -68,7 +68,7 @@
 "matchPattern": "action order [0-9]*: ife encode action continue.*type 
0xED3E.*allow mark.*index 2",
 "matchCount": "1",
 "teardown": [
-"$TC actions flush action skbedit"
+"$TC actions flush action ife"
 ]
 },
 {
@@ -92,7 +92,7 @@
 "matchPattern": "action order [0-9]*: ife encode action drop.*type 
0xED3E.*use mark 789.*index 2",
 "matchCount": "1",
 "teardown": [
-"$TC actions flush action skbedit"
+"$TC actions flush action ife"
 ]
 },
 {
@@ -116,7 +116,7 @@
 "matchPattern": "action order [0-9]*: ife encode action 
reclassify.*type 0xED3E.*use mark 656768.*index 2",
 "matchCount": "1",
 "teardown": [
-"$TC actions flush action skbedit"
+"$TC actions flush action ife"
 ]
 },
 {
@@ -140,7 +140,7 @@
 "matchPattern": "action order [0-9]*: ife encode action jump 1.*type 
0xED3E.*use mark 65.*index 2",
 "matchCount": "1",
 "teardown": [
-"$TC actions flush action skbedit"
+"$TC actions flush action ife"
 ]
 },
 {
@@ -164,7 +164,7 @@
 "matchPattern": "action order [0-9]*: ife encode action 
reclassify.*type 0xED3E.*use mark 4294967295.*index 90",
 "matchCount": "1",
 "teardown": [
-"$TC actions flush action skbedit"
+"$TC actions flush action ife"
 ]
 },
 {
@@ -210,7 +210,7 @@
 "matchPattern": "action order [0-9]*: ife encode action pass.*type 
0xED3E.*allow prio.*index 9",
 "matchCount": "1",
 "teardown": [
-"$TC actions flush action skbedit"
+"$TC actions flush action ife"
 ]
 },
 {
@@ -234,7 +234,7 @@
 "matchPattern": "action order [0-9]*: ife encode action pipe.*type 
0xED3E.*use prio 7.*index 9",
 "matchCount": "1",
 "teardown": [
-"$TC actions flush action skbedit"
+"$TC actions flush action ife"
 ]
 },
 {
@@ -258,7 +258,7 @@
 "matchPattern": "action order [0-9]*: ife encode action continue.*type 
0xED3E.*use prio 3.*index 9",
 "matchCount": "1",
 "teardown": [
-"$TC actions flush action skbedit"
+"$TC actions flush action ife"
 ]
 },
 {
@@ -282,7 +282,7 @@
 "matchPattern": "action order [0-9]*: ife encode action drop.*type 
0xED3E.*allow prio.*index 9",
 "matchCount": "1",
 "teardown": [
-"$TC actions flush action skbedit"
+"$TC actions flush action ife"
 ]
 },
 {
@@ -306,7 +306,7 @@
 "matchPattern": "action order [0-9]*: ife encode action 
reclassify.*type 0xED3E.*use prio 998877.*index 9",
 "matchCount": "1",
 "teardown": [
-"$TC actions flush action skbedit"
+"$TC actions flush action ife"
 ]
 },
 {
@@ -330,7 +330,7 @@
 "matchPattern": "action order [0-9]*: ife encode action jump 10.*type 
0xED3E.*use prio 998877.*index 9",
 "matchCount": "1",
 "teardown": [
-"$TC actions flush action skbedit"
+"$TC actions flush action ife"
 ]
 },
 {
@@ -354,7 +354,7 @@
 "matchPattern": "action order [0-9]*: ife encode action 
reclassify.*type 0xED3E.*use prio 4294967295.*index 99",
 "matchCount": "1",
 "teardown": [
-"$TC actions flush action skbedit"
+"$TC actions flush action ife"
 ]
 },
 {
-- 
2.7.4

Re: [PATCH net-next 1/2] net: phy: sfp: make the i2c-bus property really optional

2018-05-17 Thread Antoine Tenart

On Thu, May 17, 2018 at 03:04:06PM +0200, Andrew Lunn wrote:
> 
> I was thinking about how it reads the bit rate from the EEPROM. From
> that it determines what mode the MAC could use, 1000-Base-X,
> 2500-Base-X, etc. Can you still configure this correctly via ethtool,
> if you don't have the bitrate information?

I see. That's a very good question. When testing this, I used SFP cages
which were not wired *at all*. So it worked because the SFP module
insertion was never seen by the kernel, which was then not calling
phylink_sfp_module_insert() and thus not calling sfp_parse_support().

But in cases where the module insertion can be detected, as you pointed
out, I'm not so sure it can work then. I'll wait for other answers, but
we may want to fail when probing such modules as you suggested.

Thanks!
Antoine

-- 
Antoine Ténart, Bootlin (formerly Free Electrons)
Embedded Linux and Kernel engineering
https://bootlin.com

Re: [PATCH v2] netfilter: properly initialize xt_table_info structure

2018-05-17 Thread Greg Kroah-Hartman

On Thu, May 17, 2018 at 12:42:00PM +0200, Jan Engelhardt wrote:
> 
> On Thursday 2018-05-17 12:09, Greg Kroah-Hartman wrote:
> >> > --- a/net/netfilter/x_tables.c
> >> > +++ b/net/netfilter/x_tables.c
> >> > @@ -1183,11 +1183,10 @@ struct xt_table_info *xt_alloc_table_info(unsigned int size)
> >> >   * than shoot all processes down before realizing there is nothing
> >> >   * more to reclaim.
> >> >   */
> >> > -info = kvmalloc(sz, GFP_KERNEL | __GFP_NORETRY);
> >> > +info = kvzalloc(sz, GFP_KERNEL | __GFP_NORETRY);
> >> >  if (!info)
> >> >  return NULL;
> >>
> >> I am curious, what particular path does not later overwrite the whole zone?
> >
> >In do_ipt_get_ctl, the IPT_SO_GET_ENTRIES: option uses a len value that
> >can be larger than the size of the structure itself.
> >
> >Then the data is copied to userspace in copy_entries_to_user() for ipv4
> >and v6, and that's where the "bad data"
> 
> If the kernel incorrectly copies more bytes than it should, isn't that
> a sign that it may be going past the end of the info buffer?
> (And thus, zeroing won't truly fix the issue)

No, the buffer size is correct, we just aren't filling up the whole
buffer as the data requested is smaller than the buffer size.
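
To make that concrete in userspace terms, here is a minimal sketch, not the x_tables code: calloc() stands in for kvzalloc(), and the helper names alloc_and_fill() and stale_tail_bytes() are hypothetical, made up for this illustration. The buffer is larger than the data written into it, so only a zeroing allocation guarantees that the unwritten tail holds nothing stale if the whole buffer is later copied out.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the table buffer: allocate `size` zeroed bytes
 * (calloc plays the role of kvzalloc), then fill only the first `used`
 * bytes, as happens when the requested data is smaller than the buffer.
 */
static unsigned char *alloc_and_fill(size_t size, size_t used)
{
	unsigned char *buf;

	assert(used <= size);
	buf = calloc(1, size);
	if (!buf)
		return NULL;
	memset(buf, 0xAB, used);	/* the part that really gets written */
	return buf;
}

/* Count non-zero bytes in the unwritten tail, i.e. bytes that would
 * expose old heap contents if the whole buffer were copied out.
 */
static size_t stale_tail_bytes(const unsigned char *buf, size_t size,
			       size_t used)
{
	size_t i, n = 0;

	for (i = used; i < size; i++)
		if (buf[i] != 0)
			n++;
	return n;
}
```

With a non-zeroing allocator the tail count is not guaranteed to be zero, which is the situation the patch addresses.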

thanks,

greg k-h

Re: [PATCH v2 net-next 00/12] net: stmmac: Clean-up and tune-up

2018-05-17 Thread Jose Abreu

Hi David, Florian,

Results of slowing down the CPU follow below.

On 16-05-2018 20:01, Florian Fainelli wrote:
> On 05/16/2018 11:56 AM, David Miller wrote:
>> From: Jose Abreu 
>> Date: Wed, 16 May 2018 13:50:42 +0100
>>
>>> David raised some rightful constraints about the use of indirect callbacks in
>>> the code. I did iperf tests with and without patches 3-12 and the performance
>>> remained equal. I guess for 1Gb/s and because my setup has a powerful
>>> processor these patches don't affect the performance.
>> Does your cpu need Spectre v1 and v2 workarounds which cause indirect calls to
>> be extremely expensive?
> Given how widespread stmmac is within the ARM CPU's ecosystem, the
> answer is more than likely yes.
>
> To get a better feeling of whether your indirect branches introduce a
> difference, either don't run the CPU at full speed (e.g: use cpufreq to
> slow it down), and/or profile the number of cycles and instruction cache
> hits/miss ratio for the functions called in hot-path.

It turns out my CPU has every single vulnerability detected so far :D

---
# cat /sys/devices/system/cpu/vulnerabilities/meltdown
Mitigation: PTI
# cat /sys/devices/system/cpu/vulnerabilities/spectre_v1
Mitigation: __user pointer sanitization
# cat /sys/devices/system/cpu/vulnerabilities/spectre_v2
Vulnerable: Minimal generic ASM retpoline
---

I'm not sure if the workaround is active for spectre_v2 though,
because it just says "vulnerable" ...

Now, I'm using an 8 core Intel running @ 3.4 GHz:

---
# cat /proc/cpuinfo | grep -i mhz
cpu MHz : 3988.358
cpu MHz : 3991.775
cpu MHz : 3995.003
cpu MHz : 3996.003
cpu MHz : 3995.113
cpu MHz : 3996.512
cpu MHz : 3954.454
cpu MHz : 3937.402
---

So, following Florian's advice, I turned off 7 cores and changed the CPU
freq to the minimum allowed (800MHz):

---
# cat /sys/bus/cpu/devices/cpu0/cpufreq/scaling_min_freq
800000
---

---
# for file in /sys/bus/cpu/devices/cpu*/cpufreq/scaling_governor;
do echo userspace > $file; done
# for file in /sys/bus/cpu/devices/cpu*/cpufreq/scaling_setspeed;
do echo 800000 > $file; done
# echo 0 > /sys/devices/system/cpu/cpu1/online
# echo 0 > /sys/devices/system/cpu/cpu2/online
# echo 0 > /sys/devices/system/cpu/cpu3/online
# echo 0 > /sys/devices/system/cpu/cpu4/online
# echo 0 > /sys/devices/system/cpu/cpu5/online
# echo 0 > /sys/devices/system/cpu/cpu6/online
# echo 0 > /sys/devices/system/cpu/cpu7/online
---

---
# cat /proc/cpuinfo | grep -i mhz
cpu MHz : 900.076
---

And these are the iperf results:

---
*With* patches 3-12, 8xCPU @ 3.4GHz: iperf = 0.0-60.0 sec  6.62 GBytes   948 Mbits/sec   0.045 ms   37/4838564 (0.00076%)
*With* patches 3-12, 1xCPU @ 800MHz: iperf = 0.0-60.0 sec  6.62 GBytes   947 Mbits/sec   0.000 ms   18/4833009 (0%)
*Without* patches 3-12, 8xCPU @ 3.4GHz: iperf = 0.0-60.0 sec  6.60 GBytes   945 Mbits/sec   0.049 ms   31/4819455 (0.00064%)
*Without* patches 3-12, 1xCPU @ 800MHz: iperf = 0.0-60.0 sec  6.62 GBytes   948 Mbits/sec   0.000 ms    0/4837257 (0%)
---

Given that the difference between best and worst is < 1%, I think
we can conclude patches 3-12 don't affect the overall
performance. I didn't profile the cache hit/miss ratio though ...
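
For reference, the "< 1%" figure can be checked with a few lines of C. This is only a sketch of the arithmetic on the four throughput numbers above (948, 947, 945 and 948 Mbits/sec); the helper name spread_percent() is made up for this illustration.

```c
#include <assert.h>

/* Relative spread between the best and worst throughput, in percent of
 * the best value. `mbps` holds n measurements in Mbits/sec.
 */
static double spread_percent(const double *mbps, int n)
{
	double best, worst;
	int i;

	assert(n > 0);
	best = worst = mbps[0];
	for (i = 1; i < n; i++) {
		if (mbps[i] > best)
			best = mbps[i];
		if (mbps[i] < worst)
			worst = mbps[i];
	}
	return (best - worst) / best * 100.0;
}
```

For the four runs the spread is (948 - 945) / 948, i.e. roughly 0.3%.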

Any comments? Unfortunately I don't have access to an ARM board
to test this yet ...

Thanks and Best Regards,
Jose Miguel Abreu

[PATCH iproute2-next 1/1] tc: add missing space symbol in ife output

2018-05-17 Thread Roman Mashak

In order to make TDC tests match the output patterns, the missing space
character must be added in the mode output string.

Signed-off-by: Roman Mashak 
---
 tc/m_ife.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tc/m_ife.c b/tc/m_ife.c
index 5320e94dbd48..20e9c73d9a0e 100644
--- a/tc/m_ife.c
+++ b/tc/m_ife.c
@@ -240,7 +240,7 @@ static int print_ife(struct action_util *au, FILE *f, 
struct rtattr *arg)
p = RTA_DATA(tb[TCA_IFE_PARMS]);
 
print_string(PRINT_ANY, "kind", "%s ", "ife");
-   print_string(PRINT_ANY, "mode", "%s",
+   print_string(PRINT_ANY, "mode", "%s ",
 p->flags & IFE_ENCODE ? "encode" : "decode");
print_action_control(f, "action ", p->action, " ");
 
-- 
2.7.4

Re: [PATCH 00/14] Modify action API for implementing lockless actions

2018-05-17 Thread Vlad Buslov

On Wed 16 May 2018 at 21:51, Jiri Pirko  wrote:
> Wed, May 16, 2018 at 11:23:41PM CEST, vla...@mellanox.com wrote:
>>
>>On Wed 16 May 2018 at 17:36, Roman Mashak  wrote:
>>> Vlad Buslov  writes:
>>>
 On Wed 16 May 2018 at 14:38, Roman Mashak  wrote:
> On Wed, May 16, 2018 at 2:43 AM, Vlad Buslov  wrote:
>> I'm trying to run tdc, but keep getting following error even on clean
>> branch without my patches:
>
> Vlad, not sure if you saw my email:
> Apply Roman's patch and try again
>
> https://marc.info/?l=linux-netdev&m=152639369112020&w=2
>
> cheers,
> jamal

 With patch applied I get following error:

 Test 7d50: Add skbmod action to set destination mac
 exit: 255 0
 dst MAC address <11:22:33:44:55:66>
 RTNETLINK answers: No such file or directory
 We have an error talking to the kernel

>>>
>>> You may actually have broken something with your patches in this case.
>>
>> Results is for net-next without my patches.
>
> Do you have skbmod compiled in kernel or as a module?

 Thanks, already figured out that default config has some actions
 disabled.
 Have more errors now. Everything related to ife:

 Test 7682: Create valid ife encode action with mark and pass control
 exit: 255 0
 IFE type 0xED3E
 RTNETLINK answers: No such file or directory
 We have an error talking to the kernel

 Test ef47: Create valid ife encode action with mark and pipe control
 exit: 255 0
 IFE type 0xED3E
 RTNETLINK answers: No space left on device
 We have an error talking to the kernel

 Test df43: Create valid ife encode action with mark and continue control
 exit: 255 0
 IFE type 0xED3E
 RTNETLINK answers: No space left on device
 We have an error talking to the kernel

 Test e4cf: Create valid ife encode action with mark and drop control
 exit: 255 0
 IFE type 0xED3E
 RTNETLINK answers: No space left on device
 We have an error talking to the kernel

 Test ccba: Create valid ife encode action with mark and reclassify control
 exit: 255 0
 IFE type 0xED3E
 RTNETLINK answers: No space left on device
 We have an error talking to the kernel

 Test a1cf: Create valid ife encode action with mark and jump control
 exit: 255 0
 IFE type 0xED3E
 RTNETLINK answers: No space left on device
 We have an error talking to the kernel

 ...

>>>
>>> Please make sure you have these in your kernel config:
>>>
>>> CONFIG_NET_ACT_IFE=y
>>> CONFIG_NET_IFE_SKBMARK=m
>>> CONFIG_NET_IFE_SKBPRIO=m
>>> CONFIG_NET_IFE_SKBTCINDEX=m
>
> Roman, could you please add this to some file? Something similar to:
> tools/testing/selftests/net/forwarding/config
>
> Thanks!
>
>>>
>>> For tdc to run all the tests, it is assumed that all the supported tc
>>> actions/filters are enabled and compiled.
>>
>>Enabling these options allowed all ife tests to pass. Thanks!
>>
>>Error in u32 test still appears however:
>>
>>Test e9a3: Add u32 with source match
>>
>>-> prepare stage *** Could not execute: "$TC qdisc add dev $DEV1 ingress"
>>
>>-> prepare stage *** Error message: "Cannot find device "v0p1"

I investigated and was able to fix u32 problems.

First of all, the u32 test requires veth interfaces, which the test
infrastructure does not create by default. The following command fixes the
issue:

sudo ip link add v0p0 type veth peer name v0p1

After executing this command test passes, however looking at test
definition itself it seems meaningless. It creates filter with match
source IP 127.0.0.1, then tests if filter with source IP 127.0.0.2
exists, but passes successfully because it actually expects to match
zero filters with such IP :)

I fixed it and it passed properly matching single filter with source IP
127.0.0.2.
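
To show why the original definition passed, here is a small C sketch of the check as I understand it. It assumes a test passes when the number of regex matches in the command output equals the expected count; this is not tdc's own code, and count_matches(), test_passes() and the sample strings used with them are hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Count non-overlapping matches of a POSIX extended regex in `text`.
 * Returns -1 if the pattern does not compile.
 */
static int count_matches(const char *pattern, const char *text)
{
	regex_t re;
	regmatch_t m;
	int count = 0;

	assert(pattern != NULL && text != NULL);
	if (regcomp(&re, pattern, REG_EXTENDED) != 0)
		return -1;
	while (regexec(&re, text, 1, &m, 0) == 0) {
		count++;
		if (m.rm_eo == 0) {	/* empty match: step forward by hand */
			if (*text == '\0')
				break;
			text++;
		} else {
			text += m.rm_eo;
		}
	}
	regfree(&re);
	return count;
}

/* The test "passes" when the match count equals the expected count. */
static int test_passes(const char *pattern, const char *output, int expected)
{
	return count_matches(pattern, output) == expected;
}
```

Looking for 127.0.0.2 in output that only mentions 127.0.0.1 yields zero matches, so an expected count of 0 passes without verifying anything about the filter that was actually created.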

After this, the flower test failed. The flower test expects the user to
explicitly provide the "-d" option with the interface to use. With -d it failed
again, this time because it expects the action to have 1m references, but
the actual value was 101. I investigated and found that the test
passes if executed without running other tests first. So it seemed that
some other test was leaking a reference to the gact action. It turned out that
the culprit was mirred test 6fb4, which created a pipe action but didn't flush
it afterward.

With all tests passing on that particular version of net-next, I will
now rebase my changes on top of it and run them again.

Regards,
Vlad

[QUESTION] ehea memory notifier

2018-05-17 Thread David Hildenbrand

Hi,

looking at the ehea_mem_notifier() and called functions, I wonder if
it can tolerate addresses and sizes that are not aligned to EHEA_SECTSIZE.

Looks like for MEM_ONLINE/MEM_GOING_OFFLINE ehea_update_busmap() will do
nothing in case we don't span at least one EHEA_SECTSIZE.

This implies that for onlined/offlined memory with unaligned
address/size, we won't mark the busmap entry valid.

start_section = (pfn * PAGE_SIZE) / EHEA_SECTSIZE;
end_section = start_section + ((nr_pages * PAGE_SIZE) / EHEA_SECTSIZE);
...
for (i = start_section; i < end_section; i++) {
...
}

The other way around, if we onlined e.g. 16GB and marked the entry
valid, we won't mark it invalid if e.g. offlining 8GB of that.
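
A standalone sketch of the integer division in question, to make the rounding explicit. The helper sections_spanned() is hypothetical, and the page and section sizes are plain parameters; any concrete values plugged in (e.g. 4 KiB pages and a 16 MiB section) are illustrative assumptions, not values taken from the driver.

```c
#include <assert.h>

/* Number of whole sections covered by nr_pages, mirroring
 * (nr_pages * PAGE_SIZE) / EHEA_SECTSIZE from the snippet above.
 * A range smaller than one section yields 0, so a loop from
 * start_section to end_section would not run at all.
 */
static unsigned long sections_spanned(unsigned long nr_pages,
				      unsigned long page_size,
				      unsigned long sect_size)
{
	assert(sect_size != 0);
	return (nr_pages * page_size) / sect_size;
}
```

With those assumed sizes, 2048 pages (8 MiB) span zero sections, 4096 pages span one, and 6144 pages (24 MiB) still span only one, so the trailing partial section is silently ignored.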

Is this the right thing to do? Especially
- is "valid of partially online sections" bad?
- is "invalid of partially online sections" bad?

(working on paravirtualized memory devices that will be able to
online/offline things that would not be possible on real HW and checking
all memory notifiers)

Thanks!

-- 

Thanks,

David / dhildenb

Re: [PATCH v3 1/2] media: rc: introduce BPF_PROG_RAWIR_EVENT

2018-05-17 Thread Sean Young

Hi Quentin,

On Thu, May 17, 2018 at 01:10:56PM +0100, Quentin Monnet wrote:
> 2018-05-16 22:04 UTC+0100 ~ Sean Young 
> > Add support for BPF_PROG_RAWIR_EVENT. This type of BPF program can call
> > rc_keydown() to report decoded IR scancodes, or rc_repeat() to report
> > that the last key should be repeated.
> > 
> > The bpf program can be attached using the bpf(BPF_PROG_ATTACH) syscall;
> > the target_fd must be the /dev/lircN device.
> > 
> > Signed-off-by: Sean Young 
> > ---
> >  drivers/media/rc/Kconfig   |  13 ++
> >  drivers/media/rc/Makefile  |   1 +
> >  drivers/media/rc/bpf-rawir-event.c | 363 +
> >  drivers/media/rc/lirc_dev.c|  24 ++
> >  drivers/media/rc/rc-core-priv.h|  24 ++
> >  drivers/media/rc/rc-ir-raw.c   |  14 +-
> >  include/linux/bpf_rcdev.h  |  30 +++
> >  include/linux/bpf_types.h  |   3 +
> >  include/uapi/linux/bpf.h   |  55 -
> >  kernel/bpf/syscall.c   |   7 +
> >  10 files changed, 531 insertions(+), 3 deletions(-)
> >  create mode 100644 drivers/media/rc/bpf-rawir-event.c
> >  create mode 100644 include/linux/bpf_rcdev.h
> > 
> 
> [...]
> 
> Hi Sean,
> 
> Please find below some nitpicks on the documentation for the two helpers.

I agree with all your points. I will reword and fix this for v4.

Thanks,

Sean
> 
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index d94d333a8225..243e141e8a5b 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> 
> [...]
> 
> > @@ -1902,6 +1904,35 @@ union bpf_attr {
> >   * egress otherwise). This is the only flag supported for now.
> >   * Return
> >   * **SK_PASS** on success, or **SK_DROP** on error.
> > + *
> > + * int bpf_rc_keydown(void *ctx, u32 protocol, u32 scancode, u32 toggle)
> > + * Description
> > + * Report decoded scancode with toggle value. For use in
> > + * BPF_PROG_TYPE_RAWIR_EVENT, to report a successfully
> 
> Could you please use bold RST markup for constants and function names?
> Typically for BPF_PROG_TYPE_RAWIR_EVENT here and the enum below.
> 
> > + * decoded scancode. This is will generate a keydown event,
> 
> s/This is will/This will/?
> 
> > + * and a keyup event once the scancode is no longer repeated.
> > + *
> > + * *ctx* pointer to bpf_rawir_event, *protocol* is decoded
> > + * protocol (see RC_PROTO_* enum).
> 
> This documentation is intended to be compiled as a man page. Could you
> please use a complete sentence here?
> Also, this could do with additional markup as well: **struct
> bpf_rawir_event**.
> 
> > + *
> > + * Some protocols include a toggle bit, in case the button
> > + * was released and pressed again between consecutive scancodes,
> > + * copy this bit into *toggle* if it exists, else set to 0.
> > + *
> > + * Return
> 
> The "Return" lines here and in the second helper use space indent
> instead as tabs (as all other lines do). Would you mind fixing it for
> consistency?
> 
> > + * Always return 0 (for now)
> 
> Other helpers use just "0" in that case, but I do not really mind.
> Out of curiosity, do you have anything specific in mind for changing the
> return value here in the future?

I don't expect this to change, so I should just use "0".

> 
> > + *
> > + * int bpf_rc_repeat(void *ctx)
> > + * Description
> > + * Repeat the last decoded scancode; some IR protocols like
> > + * NEC have a special IR message for repeat last button,
> 
> s/repeat/repeating/?
> 
> > + * in case user is holding a button down; the scancode is
> > + * not repeated.
> > + *
> > + * *ctx* pointer to bpf_rawir_event.
> 
> Please use a complete sentence here as well, if you do not mind.
> 
> > + *
> > + * Return
> > + * Always return 0 (for now)
> >   */
> Thanks,
> Quentin

KASAN: use-after-free Read in vhost_chr_write_iter

2018-05-17 Thread DaeRyong Jeong

We report the crash: KASAN: use-after-free Read in vhost_chr_write_iter

This crash has been found in v4.17-rc1 using RaceFuzzer (a modified
version of Syzkaller), which we describe more at the end of this
report. Our analysis shows that the race occurs when invoking two
syscalls concurrently, write$vnet and ioctl$VHOST_RESET_OWNER.


Analysis:
We think the concurrent execution of vhost_process_iotlb_msg() and
vhost_dev_cleanup() causes the crash.
Both functions can run concurrently (please see the call sequence below),
and there appears to be a race on dev->iotlb.
If the switch occurs right after vhost_dev_cleanup() frees
dev->iotlb, vhost_process_iotlb_msg() still sees the non-null value and
keeps executing without returning -EFAULT. Consequently, a use-after-free
occurs.
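
As a rough userspace illustration of the pattern (not the vhost code, and not necessarily the right fix), the sketch below serializes the NULL check and the free with one lock. struct fake_dev, struct fake_iotlb, process_msg() and cleanup() are hypothetical stand-ins, and a pthread mutex is assumed in place of whatever locking the driver would use.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the vhost structures. */
struct fake_iotlb {
	int entries;
};

struct fake_dev {
	pthread_mutex_t lock;
	struct fake_iotlb *iotlb;
};

/* Message path: check and use dev->iotlb under the lock, so cleanup
 * cannot free it between the NULL check and the access.
 */
static int process_msg(struct fake_dev *dev)
{
	int ret = 0;

	assert(dev != NULL);
	pthread_mutex_lock(&dev->lock);
	if (!dev->iotlb)
		ret = -EFAULT;
	else
		dev->iotlb->entries++;
	pthread_mutex_unlock(&dev->lock);
	return ret;
}

/* Cleanup path: free and clear the pointer under the same lock. */
static void cleanup(struct fake_dev *dev)
{
	assert(dev != NULL);
	pthread_mutex_lock(&dev->lock);
	free(dev->iotlb);
	dev->iotlb = NULL;
	pthread_mutex_unlock(&dev->lock);
}
```

Once both paths take the same lock, the message path either runs entirely before the free or sees NULL and returns -EFAULT.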


Thread interleaving:
CPU0 (vhost_process_iotlb_msg)              CPU1 (vhost_dev_cleanup)
(In the case of both VHOST_IOTLB_UPDATE
and VHOST_IOTLB_INVALIDATE)
=====                                       =====
                                            vhost_umem_clean(dev->iotlb);
if (!dev->iotlb) {
        ret = -EFAULT;
        break;
}
                                            dev->iotlb = NULL;


Call Sequence:
CPU0
=
vhost_net_chr_write_iter
vhost_chr_write_iter
vhost_process_iotlb_msg

CPU1
=
vhost_net_ioctl
vhost_net_reset_owner
vhost_dev_reset_owner
vhost_dev_cleanup


==
BUG: KASAN: use-after-free in vhost_umem_interval_tree_iter_first drivers/vhost/vhost.c:52 [inline]
BUG: KASAN: use-after-free in vhost_del_umem_range drivers/vhost/vhost.c:936 [inline]
BUG: KASAN: use-after-free in vhost_process_iotlb_msg drivers/vhost/vhost.c:1010 [inline]
BUG: KASAN: use-after-free in vhost_chr_write_iter+0x44e/0xcd0 drivers/vhost/vhost.c:1037
Read of size 8 at addr 8801d9d7bc00 by task syz-executor0/4997

CPU: 0 PID: 4997 Comm: syz-executor0 Not tainted 4.17.0-rc1 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x166/0x21c lib/dump_stack.c:113
 print_address_description+0x73/0x250 mm/kasan/report.c:256
 kasan_report_error mm/kasan/report.c:354 [inline]
 kasan_report+0x23f/0x360 mm/kasan/report.c:412
 check_memory_region_inline mm/kasan/kasan.c:260 [inline]
 __asan_load8+0x54/0x90 mm/kasan/kasan.c:699
 vhost_umem_interval_tree_iter_first drivers/vhost/vhost.c:52 [inline]
 vhost_del_umem_range drivers/vhost/vhost.c:936 [inline]
 vhost_process_iotlb_msg drivers/vhost/vhost.c:1010 [inline]
 vhost_chr_write_iter+0x44e/0xcd0 drivers/vhost/vhost.c:1037
 vhost_net_chr_write_iter+0x38/0x40 drivers/vhost/net.c:1380
 call_write_iter include/linux/fs.h:1784 [inline]
 new_sync_write fs/read_write.c:474 [inline]
 __vfs_write+0x355/0x480 fs/read_write.c:487
 vfs_write+0x12d/0x2d0 fs/read_write.c:549
 ksys_write+0xca/0x190 fs/read_write.c:598
 __do_sys_write fs/read_write.c:610 [inline]
 __se_sys_write fs/read_write.c:607 [inline]
 __x64_sys_write+0x43/0x50 fs/read_write.c:607
 do_syscall_64+0x15f/0x4a0 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4563f9
RSP: 002b:7f4da7ce9b28 EFLAGS: 0246 ORIG_RAX: 0001
RAX: ffda RBX: 0072bee0 RCX: 004563f9
RDX: 0068 RSI: 26c0 RDI: 0015
RBP: 0729 R08:  R09: 
R10:  R11: 0246 R12: 7f4da7cea6d4
R13:  R14: 006ffc78 R15: 

Allocated by task 4997:
 save_stack+0x43/0xd0 mm/kasan/kasan.c:448
 set_track mm/kasan/kasan.c:460 [inline]
 kasan_kmalloc+0xae/0xe0 mm/kasan/kasan.c:553
 __do_kmalloc_node mm/slab.c:3682 [inline]
 __kmalloc_node+0x47/0x70 mm/slab.c:3689
 kmalloc_node include/linux/slab.h:554 [inline]
 kvmalloc_node+0x99/0xd0 mm/util.c:421
 kvmalloc include/linux/mm.h:550 [inline]
 kvzalloc include/linux/mm.h:558 [inline]
 vhost_umem_alloc+0x72/0x120 drivers/vhost/vhost.c:1260
 vhost_init_device_iotlb+0x1e/0x160 drivers/vhost/vhost.c:1548
 vhost_net_set_features drivers/vhost/net.c:1273 [inline]
 vhost_net_ioctl+0x849/0x1040 drivers/vhost/net.c:1338
 vfs_ioctl fs/ioctl.c:46 [inline]
 do_vfs_ioctl+0x145/0xd00 fs/ioctl.c:686
 ksys_ioctl+0x94/0xb0 fs/ioctl.c:701
 __do_sys_ioctl fs/ioctl.c:708 [inline]
 __se_sys_ioctl fs/ioctl.c:706 [inline]
 __x64_sys_ioctl+0x43/0x50 fs/ioctl.c:706
 do_syscall_64+0x15f/0x4a0 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 5000:
 save_stack+0x43/0xd0 mm/kasan/kasan.c:448
 set_track mm/kasan/kasan.c:460 [inline]
 __kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:521
 kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
 __cache_free mm/slab.c:3498 [inline]
 kfree+0xd9/0x260 mm/slab.

Re: [PATCH net-next] net/smc: init conn.tx_work & conn.send_lock sooner

2018-05-17 Thread Ursula Braun



On 05/17/2018 02:20 PM, Eric Dumazet wrote:
> On Thu, May 17, 2018 at 5:13 AM Ursula Braun  wrote:
> 
>> This problem should no longer show up with yesterday's net-next commit
>> 569bc6436568 ("net/smc: no tx work trigger for fallback sockets").
> 
> It definitely triggers on latest net-next, which includes 569bc6436568
> 
> Thanks.
> 

Sorry, my fault. 

Your proposed patch solves the problem. On the other hand the purpose of
smc_tx_init() has been to cover tx-related socket initializations needed for
connection sockets only. tx_work is something that should be scheduled only
for active connection sockets in non-fallback mode.
Thus I prefer this alternate patch to solve the problem:

---
 net/smc/af_smc.c |8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -1362,14 +1362,18 @@ static int smc_setsockopt(struct socket
}
break;
case TCP_NODELAY:
-   if (sk->sk_state != SMC_INIT && sk->sk_state != SMC_LISTEN) {
+   if (sk->sk_state != SMC_INIT &&
+   sk->sk_state != SMC_LISTEN &&
+   sk->sk_state != SMC_CLOSED) {
if (val && !smc->use_fallback)
mod_delayed_work(system_wq, &smc->conn.tx_work,
 0);
}
break;
case TCP_CORK:
-   if (sk->sk_state != SMC_INIT && sk->sk_state != SMC_LISTEN) {
+   if (sk->sk_state != SMC_INIT &&
+   sk->sk_state != SMC_LISTEN &&
+   sk->sk_state != SMC_CLOSED) {
if (!val && !smc->use_fallback)
mod_delayed_work(system_wq, &smc->conn.tx_work,
 0);

What do you think?

Re: [PATCH net-next v3 1/3] ipv4: support sport, dport and ip_proto in RTM_GETROUTE

2018-05-17 Thread Roopa Prabhu

On Wed, May 16, 2018 at 7:36 PM, David Miller  wrote:
> From: Roopa Prabhu 
> Date: Wed, 16 May 2018 13:30:28 -0700
>
>> yes, but we hold rcu read lock before calling the reply function for
>> fib result.  I did consider allocating the skb before the read
>> lock..but then the refactoring (into a separate netlink reply func)
>> would seem unnecessary.
>>
>> I am fine with pre-allocating and undoing the refactoring if that works 
>> better.
>
> Hmmm... I also notice that with this change we end up doing the
> rtnl_unicast() under the RCU lock which is unnecessary too.

that was unintentional; it seemed like the previous code did that too,
but you are right, it did not.

>
> So yes, please pull the "out_skb" allocation before the
> rcu_read_lock(), and push the rtnl_unicast() after the
> rcu_read_unlock().

agreed, will do.

>
> It really is a shame that sharing the ETH_P_IP skb between the route
> lookup and the netlink response doesn't work properly.

I did try a few things before giving up on the same skb...since it
also seemed like
keeping the netlink code separate would be a good thing for the future.

>
> I was using RTM_GETROUTE at one point for route/fib lookup performance
> measurements.  It never was great at that, but now that there is going
> to be two SKB allocations instead of one it is going to be even less
> useful for that kind of usage.

oh...did not realize this use of it. It certainly seems like we should
try to retain the
single skb in that case. let me see what i can do.

thanks.

[iproute2-next v3 1/1] tipc: fixed node and name table listings

2018-05-17 Thread Jon Maloy

We make it easier for users to correlate between 128-bit node
identities and 32-bit node hash number by extending the 'node list'
command to also show the hash number.

We also improve the 'nametable show' command to show the node identity
instead of the node hash number. Since the former potentially is much
longer than the latter, we make room for it by eliminating the (to the
user) irrelevant publication key. We also reorder some of the columns so
that the node id comes last, since this looks nicer and is more logical.

---
v2: Fixed compiler warning as per comment from David Ahern
v3: Fixed leaking socket as per comment from David Ahern

Signed-off-by: Jon Maloy 
---
 tipc/misc.c  | 20 
 tipc/misc.h  |  1 +
 tipc/nametable.c | 18 ++
 tipc/node.c  | 19 ---
 tipc/peer.c  |  4 
 5 files changed, 43 insertions(+), 19 deletions(-)

diff --git a/tipc/misc.c b/tipc/misc.c
index 16849f1..e4b1cd0 100644
--- a/tipc/misc.c
+++ b/tipc/misc.c
@@ -13,6 +13,10 @@
 #include 
 #include 
 #include 
+#include 
+#include 
+#include 
+#include 
 #include "misc.h"
 
 #define IN_RANGE(val, low, high) ((val) <= (high) && (val) >= (low))
@@ -109,3 +113,19 @@ void nodeid2str(uint8_t *id, char *str)
for (i = 31; str[i] == '0'; i--)
str[i] = 0;
 }
+
+void hash2nodestr(uint32_t hash, char *str)
+{
+   struct tipc_sioc_nodeid_req nr = {};
+   int sd;
+
+   sd = socket(AF_TIPC, SOCK_RDM, 0);
+   if (sd < 0) {
+   fprintf(stderr, "opening TIPC socket: %s\n", strerror(errno));
+   return;
+   }
+   nr.peer = hash;
+   if (!ioctl(sd, SIOCGETNODEID, &nr))
+   nodeid2str((uint8_t *)nr.node_id, str);
+   close(sd);
+}
diff --git a/tipc/misc.h b/tipc/misc.h
index 6e8afdd..ff2f31f 100644
--- a/tipc/misc.h
+++ b/tipc/misc.h
@@ -17,5 +17,6 @@
 uint32_t str2addr(char *str);
 int str2nodeid(char *str, uint8_t *id);
 void nodeid2str(uint8_t *id, char *str);
+void hash2nodestr(uint32_t hash, char *str);
 
 #endif
diff --git a/tipc/nametable.c b/tipc/nametable.c
index 2578940..ae73dfa 100644
--- a/tipc/nametable.c
+++ b/tipc/nametable.c
@@ -20,6 +20,7 @@
 #include "cmdl.h"
 #include "msg.h"
 #include "nametable.h"
+#include "misc.h"
 
 #define PORTID_STR_LEN 45 /* Four u32 and five delimiter chars */
 
@@ -31,6 +32,7 @@ static int nametable_show_cb(const struct nlmsghdr *nlh, void 
*data)
struct nlattr *attrs[TIPC_NLA_NAME_TABLE_MAX + 1] = {};
struct nlattr *publ[TIPC_NLA_PUBL_MAX + 1] = {};
const char *scope[] = { "", "zone", "cluster", "node" };
+   char str[33] = {0,};
 
mnl_attr_parse(nlh, sizeof(*genl), parse_attrs, info);
if (!info[TIPC_NLA_NAME_TABLE])
@@ -45,20 +47,20 @@ static int nametable_show_cb(const struct nlmsghdr *nlh, 
void *data)
return MNL_CB_ERROR;
 
if (!*iteration)
-   printf("%-10s %-10s %-10s %-10s %-10s %-10s\n",
-  "Type", "Lower", "Upper", "Node", "Port",
-  "Publication Scope");
+   printf("%-10s %-10s %-10s %-8s %-10s %-33s\n",
+  "Type", "Lower", "Upper", "Scope", "Port",
+  "Node");
(*iteration)++;
 
-   printf("%-10u %-10u %-10u %-10x %-10u %-12u",
+   hash2nodestr(mnl_attr_get_u32(publ[TIPC_NLA_PUBL_NODE]), str);
+
+   printf("%-10u %-10u %-10u %-8s %-10u %s\n",
   mnl_attr_get_u32(publ[TIPC_NLA_PUBL_TYPE]),
   mnl_attr_get_u32(publ[TIPC_NLA_PUBL_LOWER]),
   mnl_attr_get_u32(publ[TIPC_NLA_PUBL_UPPER]),
-  mnl_attr_get_u32(publ[TIPC_NLA_PUBL_NODE]),
+  scope[mnl_attr_get_u32(publ[TIPC_NLA_PUBL_SCOPE])],
   mnl_attr_get_u32(publ[TIPC_NLA_PUBL_REF]),
-  mnl_attr_get_u32(publ[TIPC_NLA_PUBL_KEY]));
-
-   printf("%s\n", scope[mnl_attr_get_u32(publ[TIPC_NLA_PUBL_SCOPE])]);
+  str);
 
return MNL_CB_OK;
 }
diff --git a/tipc/node.c b/tipc/node.c
index b73b644..0fa1064 100644
--- a/tipc/node.c
+++ b/tipc/node.c
@@ -26,10 +26,11 @@
 
 static int node_list_cb(const struct nlmsghdr *nlh, void *data)
 {
-   uint32_t addr;
struct genlmsghdr *genl = mnl_nlmsg_get_payload(nlh);
struct nlattr *info[TIPC_NLA_MAX + 1] = {};
struct nlattr *attrs[TIPC_NLA_NODE_MAX + 1] = {};
+   char str[33] = {};
+   uint32_t addr;
 
mnl_attr_parse(nlh, sizeof(*genl), parse_attrs, info);
if (!info[TIPC_NLA_NODE])
@@ -40,13 +41,12 @@ static int node_list_cb(const struct nlmsghdr *nlh, void 
*data)
return MNL_CB_ERROR;
 
addr = mnl_attr_get_u32(attrs[TIPC_NLA_NODE_ADDR]);
-   printf("%x: ", addr);
-
+   hash2nodestr(addr, str);
+   printf("%-32s %08x ", str, addr);
if (attrs[TIPC_NLA_NODE_UP])
printf("up\n");
else
printf("down\n");
-
return MNL_CB_OK;
 }
 
@@

Re: [patch net-next RFC 04/12] dsa: set devlink port attrs for dsa ports

2018-05-17 Thread Jiri Pirko

Fri, Mar 23, 2018 at 06:09:29PM CET, f.faine...@gmail.com wrote:
>On 03/23/2018 07:49 AM, Jiri Pirko wrote:
>> Fri, Mar 23, 2018 at 02:30:02PM CET, and...@lunn.ch wrote:
>>> On Thu, Mar 22, 2018 at 11:55:14AM +0100, Jiri Pirko wrote:
>>>> From: Jiri Pirko
>>>>
>>>> Set the attrs and allow to expose port flavour to user via devlink.
>>>>
>>>> Signed-off-by: Jiri Pirko
>>>> ---
>>>>  net/dsa/dsa2.c | 23 +++
>>>>  1 file changed, 23 insertions(+)
>>>>
>>>> diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c
>>>> index adf50fbc4c13..49453690696d 100644
>>>> --- a/net/dsa/dsa2.c
>>>> +++ b/net/dsa/dsa2.c
>>>> @@ -270,7 +270,27 @@ static int dsa_port_setup(struct dsa_port *dp)
>>>>    case DSA_PORT_TYPE_UNUSED:
>>>>    break;
>>>>    case DSA_PORT_TYPE_CPU:
>>>> +  /* dp->index is used now as port_number. However
>>>> +   * CPU ports should have separate numbering
>>>> +   * independent from front panel port numbers.
>>>> +   */
>>>> +  devlink_port_attrs_set(&dp->devlink_port,
>>>> + DEVLINK_PORT_FLAVOUR_CPU,
>>>> + dp->index, false, 0);
>>>> +  err = dsa_port_link_register_of(dp);
>>>> +  if (err) {
>>>> +  dev_err(ds->dev, "failed to setup link for port
>>>> %d.%d\n",
>>>> +  ds->index, dp->index);
>>>> +  return err;
>>>> +  }
>>>
>>> Ah, i get it. These used to be two case statements with one code
>>> block. But you split them apart, so needed to duplicate the
>>> dsa_port_link_register.
>>>
>>> Unfortunately, you forgot to add a 'break;', so it still falls
>>> through, and overwrites the port flavour to DSA.
>> 
>> ah, crap. Don't have hw to test this :/
>> Will fix. Thanks!
>
>You don't need hardware, there is drivers/net/dsa/dsa_loop.c which will
>emulate a DSA switch. It won't create interconnect ports, since only one

Hmm, trying to use dsa_loop. Doing:
modprobe dsa_loop
modprobe fixed_phy

I don't see the netdevs. Any idea what am I doing wrong? Thanks!


>switch can be created with the method chosen, but this would have helped
>you catch the missing break since the "CPU" port would have been
>displayed as "DSA" anyway.
>
>If you need hardware, I am sure this can be somehow arranged. By that, I
>mean something on which you can run upstream Linux on without out of
>tree patches.
>-- 
>Florian
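
The missing 'break;' discussed above is easy to reproduce outside the
kernel. The sketch below uses made-up enums (demo_port_type, demo_flavour)
rather than the real DSA and devlink definitions; it only shows how the
CPU case falls through and ends up with the DSA flavour, and how a
'break;' keeps the two cases apart.

```c
#include <assert.h>

/* Hypothetical stand-ins for the DSA port types and devlink flavours. */
enum demo_port_type { DEMO_PORT_CPU, DEMO_PORT_DSA, DEMO_PORT_USER };
enum demo_flavour {
	DEMO_FLAVOUR_NONE,
	DEMO_FLAVOUR_PHYSICAL,
	DEMO_FLAVOUR_CPU,
	DEMO_FLAVOUR_DSA,
};

static enum demo_flavour demo_flavour_buggy(enum demo_port_type type)
{
	enum demo_flavour flavour = DEMO_FLAVOUR_NONE;

	switch (type) {
	case DEMO_PORT_CPU:
		flavour = DEMO_FLAVOUR_CPU;
		/* falls through - the 'break;' is missing here */
	case DEMO_PORT_DSA:
		flavour = DEMO_FLAVOUR_DSA;	/* overwrites the CPU flavour */
		break;
	case DEMO_PORT_USER:
		flavour = DEMO_FLAVOUR_PHYSICAL;
		break;
	}
	return flavour;
}

static enum demo_flavour demo_flavour_fixed(enum demo_port_type type)
{
	enum demo_flavour flavour = DEMO_FLAVOUR_NONE;

	switch (type) {
	case DEMO_PORT_CPU:
		flavour = DEMO_FLAVOUR_CPU;
		break;
	case DEMO_PORT_DSA:
		flavour = DEMO_FLAVOUR_DSA;
		break;
	case DEMO_PORT_USER:
		flavour = DEMO_FLAVOUR_PHYSICAL;
		break;
	}
	return flavour;
}

/* Small self-check showing the difference between the two variants. */
static void demo_selfcheck(void)
{
	assert(demo_flavour_buggy(DEMO_PORT_CPU) == DEMO_FLAVOUR_DSA);
	assert(demo_flavour_fixed(DEMO_PORT_CPU) == DEMO_FLAVOUR_CPU);
	assert(demo_flavour_fixed(DEMO_PORT_DSA) == DEMO_FLAVOUR_DSA);
}
```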

[PATCH 0/2] bpf: sockmap, fix uninitialized variable and double-free

2018-05-17 Thread Gustavo A. R. Silva

This patchset aims to fix an uninitialized variable issue and
a double-free issue in __sock_map_ctx_update_elem.

Both issues were reported by Coverity.

Thanks.

Gustavo A. R. Silva (2):
  bpf: sockmap, fix uninitialized variable
  bpf: sockmap, fix double-free

 kernel/bpf/sockmap.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

-- 
2.7.4

Re: [patch net-next RFC 04/12] dsa: set devlink port attrs for dsa ports

2018-05-17 Thread Florian Fainelli



On 05/17/2018 07:02 AM, Jiri Pirko wrote:
> Fri, Mar 23, 2018 at 06:09:29PM CET, f.faine...@gmail.com wrote:
>> On 03/23/2018 07:49 AM, Jiri Pirko wrote:
>>> Fri, Mar 23, 2018 at 02:30:02PM CET, and...@lunn.ch wrote:
>>>> On Thu, Mar 22, 2018 at 11:55:14AM +0100, Jiri Pirko wrote:
>>>>> From: Jiri Pirko
>>>>>
>>>>> Set the attrs and allow to expose port flavour to user via devlink.
>>>>>
>>>>> Signed-off-by: Jiri Pirko
>>>>> ---
>>>>>  net/dsa/dsa2.c | 23 +++
>>>>>  1 file changed, 23 insertions(+)
>>>>>
>>>>> diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c
>>>>> index adf50fbc4c13..49453690696d 100644
>>>>> --- a/net/dsa/dsa2.c
>>>>> +++ b/net/dsa/dsa2.c
>>>>> @@ -270,7 +270,27 @@ static int dsa_port_setup(struct dsa_port *dp)
>>>>>   case DSA_PORT_TYPE_UNUSED:
>>>>>   break;
>>>>>   case DSA_PORT_TYPE_CPU:
>>>>> + /* dp->index is used now as port_number. However
>>>>> +  * CPU ports should have separate numbering
>>>>> +  * independent from front panel port numbers.
>>>>> +  */
>>>>> + devlink_port_attrs_set(&dp->devlink_port,
>>>>> +DEVLINK_PORT_FLAVOUR_CPU,
>>>>> +dp->index, false, 0);
>>>>> + err = dsa_port_link_register_of(dp);
>>>>> + if (err) {
>>>>> + dev_err(ds->dev, "failed to setup link for port
>>>>> %d.%d\n",
>>>>> + ds->index, dp->index);
>>>>> + return err;
>>>>> + }
>>>>
>>>> Ah, i get it. These used to be two case statements with one code
>>>> block. But you split them apart, so needed to duplicate the
>>>> dsa_port_link_register.
>>>>
>>>> Unfortunately, you forgot to add a 'break;', so it still falls
>>>> through, and overwrites the port flavour to DSA.
>>>
>>> ah, crap. Don't have hw to test this :/
>>> Will fix. Thanks!
>>
>> You don't need hardware, there is drivers/net/dsa/dsa_loop.c which will
>> emulate a DSA switch. It won't create interconnect ports, since only one
> 
> Hmm, trying to use dsa_loop. Doing:
> modprobe dsa_loop
> modprobe fixed_phy
> 
> I don't see the netdevs. Any idea what am I doing wrong? Thanks!

Yes, modprobe dsa-loop-bdinfo first, which will create the
mdio_board_info and then modprobe dsa-loop.

> 
> 
>> switch can be created with the method chosen, but this would have helped
>> you catch the missing break since the "CPU" port would have been
>> displayed as "DSA" anyway.
>>
>> If you need hardware, I am sure this can be somehow arranged. By that, I
>> mean something on which you can run upstream Linux on without out of
>> tree patches.
>> -- 
>> Florian

-- 
Florian

[PATCH 1/2] bpf: sockmap, fix uninitialized variable

2018-05-17 Thread Gustavo A. R. Silva

There is a potential execution path in which variable err is
returned without being properly initialized previously.

Fix this by initializing variable err to 0.

Addresses-Coverity-ID: 1468964 ("Uninitialized scalar variable")
Fixes: e5cd3abcb31a ("bpf: sockmap, refactor sockmap routines to work
with hashmap")
Signed-off-by: Gustavo A. R. Silva 
---
 kernel/bpf/sockmap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c
index c6de139..41b41fc 100644
--- a/kernel/bpf/sockmap.c
+++ b/kernel/bpf/sockmap.c
@@ -1713,7 +1713,7 @@ static int __sock_map_ctx_update_elem(struct bpf_map *map,
struct smap_psock_map_entry *e = NULL;
struct smap_psock *psock;
bool new = false;
-   int err;
+   int err = 0;
 
/* 1. If sock map has BPF programs those will be inherited by the
 * sock being added. If the sock is already attached to BPF programs
-- 
2.7.4
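
The class of bug fixed above can be shown with a small standalone
function. This is a sketch only, with hypothetical names (demo_attach(),
demo_update_elem()); it does not reproduce __sock_map_ctx_update_elem().
It assumes a function with one path that never assigns err before the
common return.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical helper: returns 0 on success or a negative errno. */
static int demo_attach(bool fail)
{
	return fail ? -ENOMEM : 0;
}

static int demo_update_elem(bool need_attach, bool attach_fails)
{
	/*
	 * Without the "= 0", the path that skips the attach step would
	 * return whatever happened to be on the stack.
	 */
	int err = 0;

	if (need_attach) {
		err = demo_attach(attach_fails);
		if (err)
			goto out;
	}

	/* ... remaining work that never assigns err ... */
out:
	assert(err <= 0);	/* 0 or a negative errno, never garbage */
	return err;
}
```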

Re: STMMAC driver with TSO enabled issue

2018-05-17 Thread Jose Abreu

Hi Bhadram,

On 15-05-2018 07:44, Bhadram Varka wrote:
> Hi Jose,
>
> On 5/10/2018 9:15 PM, Jose Abreu wrote:
>>
>>
>> On 10-05-2018 16:08, Bhadram Varka wrote:
>>> Hi Jose,
>>>
>>> On 5/10/2018 7:59 PM, Jose Abreu wrote:
>>>> Hi Bhadram,
>>>>
>>>> On 10-05-2018 09:55, Jose Abreu wrote:
>>>>> ++net-dev
>>>>>
>>>>> Hi Bhadram,
>>>>>
>>>>> On 09-05-2018 12:03, Bhadram Varka wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Thanks for responding.
>>>>>>
>>>>>> Tried below suggested way. Still observing the issue -
>>>>> It seems stmmac has a bug in the RX side when using TSO
>>>>> which is
>>>>> causing all the RX descriptors to be consumed. The stmmac_rx()
>>>>> function will need to be refactored. I will send a fix ASAP.
>>>>
>>>> Are you using this patch [1] ? Because there is a problem with
>>>> the patch. By adding the previously removed call to
>>>> stmmac_init_rx_desc() TSO works okay in my setup.
>>>>
>>>
>>> No. I don't have this change in my code base. I am using
>>> net-next tree.
>>>
>>> Can you please post the change for which TSO works ? I can help
>>> you with the testing.
>>
>> It should work with net-next because patch was not merged yet ...
>> Please send me the output of "dmesg | grep -i stmmac", since boot
>> and your full register values (from 0x0 to 0x12E4).
>>
>
> [root@alarm ~]# dmesg | grep -i dwc
> [6.925005] dwc-eth-dwmac 249.ethernet: Cannot get CSR
> clock
> [6.933657] dwc-eth-dwmac 249.ethernet: no reset control
> found
> [6.955325] dwc-eth-dwmac 249.ethernet: User ID: 0x10,
> Synopsys ID: 0x41
> [6.962379] dwc-eth-dwmac 249.ethernet:  DWMAC4/5
> [6.967434] dwc-eth-dwmac 249.ethernet: DMA HW
> capability register supported
> [6.974827] dwc-eth-dwmac 249.ethernet: RX Checksum
> Offload Engine supported
> [6.982915] dwc-eth-dwmac 249.ethernet: TX Checksum
> insertion supported
> [6.991235] dwc-eth-dwmac 249.ethernet: Wake-Up On Lan
> supported
> [6.998974] dwc-eth-dwmac 249.ethernet: TSO supported
> [7.006422] dwc-eth-dwmac 249.ethernet: TSO feature enabled
> [7.012581] dwc-eth-dwmac 249.ethernet: Enable RX
> Mitigation via HW Watchdog Timer
> [7.236391] dwc-eth-dwmac 249.ethernet eth0: device MAC
> address 4a:d1:e3:58:cb:7a
> [7.333414] dwc-eth-dwmac 249.ethernet eth0: IEEE
> 1588-2008 Advanced Timestamp supported
> [7.342441] dwc-eth-dwmac 249.ethernet eth0: registered
> PTP clock
> [   10.157066] dwc-eth-dwmac 249.ethernet eth0: Link is Up
> - 1Gbps/Full - flow control off
> [root@alarm ~]# dmesg | grep -i stmma
> [7.020567] libphy: stmmac: probed
> [7.316295] Broadcom BCM89610 stmmac-0:00: attached PHY
> driver [Broadcom BCM89610] (mii_bus:phy_addr=stmmac-0:00, irq=64)
>
> I will get the register details -
>
> FYI - TSO works fine with single channel. I see the issue only
> if multi channel enabled (supports 4 Tx/Rx channels).
>

And normal data transfer works okay with multi channel, right? I
will need the register details to proceed ... You could also try
git bisect ...

Thanks and Best Regards,
Jose Miguel Abreu

Re: net: ieee802154: 6lowpan: fix frag reassembly

2018-05-17 Thread Stefan Schmidt

Hello Greg.

On 17.05.2018 10:59, Greg KH wrote:
> On Mon, May 14, 2018 at 05:22:18PM +0200, Stefan Schmidt wrote:
>> Hello.
>>
>>
>> Please apply f18fa5de5ba7f1d6650951502bb96a6e4715a948
>>
>> (net: ieee802154: 6lowpan: fix frag reassembly) to the 4.16.x stable tree.
>>
>>
>> Earlier trees are not needed as the problem was introduced in 4.16.
> 
> Really?  Commit f18fa5de5ba7 ("net: ieee802154: 6lowpan: fix frag
> reassembly") says it fixes commit 648700f76b03 ("inet: frags: use
> rhashtables for reassembly units") which did not show up until 4.17-rc1:
>   $ git describe --contains 648700f76b03
>   v4.17-rc1~148^2~20^2~11
> 
> Also, it did not get backported to 4.16.y, so I don't see how it is
> needed in 4.16-stable.

I guess it's time to blush on my side. During the bisection for the
commit that introduced the problem I came to the point where it was
clear to me that it was already in 4.16. This was a while back and I
honestly have no idea how I made this mistake.

I tested again now with plain 4.16 and it works fine.
The fix is also in 4.17-rcX where it actually is needed. In the end I am
glad that the problem was not introduced in an earlier release without my
noticing.

> To verify this, I tried applying the patch, and it totally fails to
> apply to the 4.16.y tree.
> 
> So are you _sure_ you want/need this in 4.16?  If so, can you provide a
> working backport that you have verified works?

No backport needed. I simply screwed up when verifying this for 4.16.
I put on the hat of shame for today and will try harder the next time.

Sorry to have wasted your time on this. :/

regards
Stefan Schmidt

Re: [patch net-next RFC 04/12] dsa: set devlink port attrs for dsa ports

2018-05-17 Thread Jiri Pirko

Thu, May 17, 2018 at 04:08:10PM CEST, f.faine...@gmail.com wrote:
>
>
>On 05/17/2018 07:02 AM, Jiri Pirko wrote:
>> Fri, Mar 23, 2018 at 06:09:29PM CET, f.faine...@gmail.com wrote:
>>> On 03/23/2018 07:49 AM, Jiri Pirko wrote:
>>>> Fri, Mar 23, 2018 at 02:30:02PM CET, and...@lunn.ch wrote:
>>>>> On Thu, Mar 22, 2018 at 11:55:14AM +0100, Jiri Pirko wrote:
>>>>>> From: Jiri Pirko
>>>>>>
>>>>>> Set the attrs and allow to expose port flavour to user via devlink.
>>>>>>
>>>>>> Signed-off-by: Jiri Pirko
>>>>>> ---
>>>>>>  net/dsa/dsa2.c | 23 +++
>>>>>>  1 file changed, 23 insertions(+)
>>>>>>
>>>>>> diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c
>>>>>> index adf50fbc4c13..49453690696d 100644
>>>>>> --- a/net/dsa/dsa2.c
>>>>>> +++ b/net/dsa/dsa2.c
>>>>>> @@ -270,7 +270,27 @@ static int dsa_port_setup(struct dsa_port *dp)
>>>>>>  case DSA_PORT_TYPE_UNUSED:
>>>>>>  break;
>>>>>>  case DSA_PORT_TYPE_CPU:
>>>>>> +/* dp->index is used now as port_number. However
>>>>>> + * CPU ports should have separate numbering
>>>>>> + * independent from front panel port numbers.
>>>>>> + */
>>>>>> +devlink_port_attrs_set(&dp->devlink_port,
>>>>>> +   DEVLINK_PORT_FLAVOUR_CPU,
>>>>>> +   dp->index, false, 0);
>>>>>> +err = dsa_port_link_register_of(dp);
>>>>>> +if (err) {
>>>>>> +dev_err(ds->dev, "failed to setup link for port
>>>>>> %d.%d\n",
>>>>>> +ds->index, dp->index);
>>>>>> +return err;
>>>>>> +}
>>>>>
>>>>> Ah, i get it. These used to be two case statements with one code
>>>>> block. But you split them apart, so needed to duplicate the
>>>>> dsa_port_link_register.
>>>>>
>>>>> Unfortunately, you forgot to add a 'break;', so it still falls
>>>>> through, and overwrites the port flavour to DSA.
>>>>
>>>> ah, crap. Don't have hw to test this :/
>>>> Will fix. Thanks!
>>>
>>> You don't need hardware, there is drivers/net/dsa/dsa_loop.c which will
>>> emulate a DSA switch. It won't create interconnect ports, since only one
>> 
>> Hmm, trying to use dsa_loop. Doing:
>> modprobe dsa_loop
>> modprobe fixed_phy
>> 
>> I don't see the netdevs. Any idea what am I doing wrong? Thanks!
>
>Yes, modprobe dsa-loop-bdinfo first, which will create the

That is compiled inside "fixed_phy", isn't it?
In my case, "Module fixed_phy is builtin"
So it should be enough just to "modprobe dsa_loop", right? That does not
work :/


>mdio_board_info and then modprobe dsa-loop.
>
>> 
>> 
>>> switch can be created with the method chosen, but this would have helped
>>> you catch the missing break since the "CPU" port would have been
>>> displayed as "DSA" anyway.
>>>
>>> If you need hardware, I am sure this can be somehow arranged. By that, I
>>> mean something on which you can run upstream Linux on without out of
>>> tree patches.
>>> -- 
>>> Florian
>
>-- 
>Florian

[PATCH net 1/7] net: ip6_gre: Request headroom in __gre6_xmit()

2018-05-17 Thread Petr Machata

__gre6_xmit() pushes GRE headers before handing over to ip6_tnl_xmit()
for generic IP-in-IP processing. However it doesn't make sure that there
is enough headroom to push the header to. That can lead to the panic
cited below. (Reproducer below that).

Fix by requesting either needed_headroom if already primed, or just the
bare minimum needed for the header otherwise.

[  158.576725] kernel BUG at net/core/skbuff.c:104!
[  158.581510] invalid opcode:  [#1] PREEMPT SMP KASAN PTI
[  158.587174] Modules linked in: act_mirred cls_matchall ip6_gre ip6_tunnel 
tunnel6 gre sch_ingress vrf veth x86_pkg_temp_thermal mlx_platform nfsd e1000e 
leds_mlxcpld
[  158.602268] CPU: 1 PID: 16 Comm: ksoftirqd/1 Not tainted 
4.17.0-rc4-net_master-custom-139 #10
[  158.610938] Hardware name: Mellanox Technologies Ltd. 
"MSN2410-CB2F"/"SA000874", BIOS 4.6.5 03/08/2016
[  158.620426] RIP: 0010:skb_panic+0xc3/0x100
[  158.624586] RSP: 0018:8801d3f27110 EFLAGS: 00010286
[  158.629882] RAX: 0082 RBX: 8801c02cc040 RCX: 
[  158.637127] RDX: 0082 RSI: dc00 RDI: ed003a7e4e18
[  158.644366] RBP: 8801bfec8020 R08: ed003aabce19 R09: ed003aabce19
[  158.651574] R10: 000b R11: ed003aabce18 R12: 8801c364de66
[  158.658786] R13: 002c R14: 00c0 R15: 8801c364de68
[  158.666007] FS:  () GS:8801d540() 
knlGS:
[  158.674212] CS:  0010 DS:  ES:  CR0: 80050033
[  158.680036] CR2: 7f4b3702dcd0 CR3: 03228002 CR4: 001606e0
[  158.687228] Call Trace:
[  158.689752]  ? __gre6_xmit+0x246/0xd80 [ip6_gre]
[  158.694475]  ? __gre6_xmit+0x246/0xd80 [ip6_gre]
[  158.699141]  skb_push+0x78/0x90
[  158.702344]  __gre6_xmit+0x246/0xd80 [ip6_gre]
[  158.706872]  ip6gre_tunnel_xmit+0x3bc/0x610 [ip6_gre]
[  158.711992]  ? __gre6_xmit+0xd80/0xd80 [ip6_gre]
[  158.716668]  ? debug_check_no_locks_freed+0x210/0x210
[  158.721761]  ? print_irqtrace_events+0x120/0x120
[  158.726461]  ? sched_clock_cpu+0x18/0x210
[  158.730572]  ? sched_clock_cpu+0x18/0x210
[  158.734692]  ? cyc2ns_read_end+0x10/0x10
[  158.738705]  ? skb_network_protocol+0x76/0x200
[  158.743216]  ? netif_skb_features+0x1b2/0x550
[  158.747648]  dev_hard_start_xmit+0x137/0x770
[  158.752010]  sch_direct_xmit+0x2ef/0x5d0
[  158.755992]  ? pfifo_fast_dequeue+0x3fa/0x670
[  158.760460]  ? pfifo_fast_change_tx_queue_len+0x810/0x810
[  158.765975]  ? __lock_is_held+0xa0/0x160
[  158.770002]  __qdisc_run+0x39e/0xfc0
[  158.773673]  ? _raw_spin_unlock+0x29/0x40
[  158.81]  ? pfifo_fast_enqueue+0x24b/0x3e0
[  158.782191]  ? sch_direct_xmit+0x5d0/0x5d0
[  158.786372]  ? pfifo_fast_dequeue+0x670/0x670
[  158.790818]  ? __dev_queue_xmit+0x172/0x1770
[  158.795195]  ? preempt_count_sub+0xf/0xd0
[  158.799313]  __dev_queue_xmit+0x410/0x1770
[  158.803512]  ? ___slab_alloc+0x605/0x930
[  158.807525]  ? ___slab_alloc+0x605/0x930
[  158.811540]  ? memcpy+0x34/0x50
[  158.814768]  ? netdev_pick_tx+0x1c0/0x1c0
[  158.818895]  ? __skb_clone+0x2fd/0x3d0
[  158.822712]  ? __copy_skb_header+0x270/0x270
[  158.827079]  ? rcu_read_lock_sched_held+0x93/0xa0
[  158.831903]  ? kmem_cache_alloc+0x344/0x4d0
[  158.836199]  ? skb_clone+0x123/0x230
[  158.839869]  ? skb_split+0x820/0x820
[  158.843521]  ? tcf_mirred+0x554/0x930 [act_mirred]
[  158.848407]  tcf_mirred+0x554/0x930 [act_mirred]
[  158.853104]  ? tcf_mirred_act_wants_ingress.part.2+0x10/0x10 [act_mirred]
[  158.860005]  ? __lock_acquire+0x706/0x26e0
[  158.864162]  ? mark_lock+0x13d/0xb40
[  158.867832]  tcf_action_exec+0xcf/0x2a0
[  158.871736]  tcf_classify+0xfa/0x340
[  158.875402]  __netif_receive_skb_core+0x8e1/0x1c60
[  158.880334]  ? nf_ingress+0x500/0x500
[  158.884059]  ? process_backlog+0x347/0x4b0
[  158.888241]  ? lock_acquire+0xd8/0x320
[  158.892050]  ? process_backlog+0x1b6/0x4b0
[  158.896228]  ? process_backlog+0xc2/0x4b0
[  158.900291]  process_backlog+0xc2/0x4b0
[  158.904210]  net_rx_action+0x5cc/0x980
[  158.908047]  ? napi_complete_done+0x2c0/0x2c0
[  158.912525]  ? rcu_read_unlock+0x80/0x80
[  158.916534]  ? __lock_is_held+0x34/0x160
[  158.920541]  __do_softirq+0x1d4/0x9d2
[  158.924308]  ? trace_event_raw_event_irq_handler_exit+0x140/0x140
[  158.930515]  run_ksoftirqd+0x1d/0x40
[  158.934152]  smpboot_thread_fn+0x32b/0x690
[  158.938299]  ? sort_range+0x20/0x20
[  158.941842]  ? preempt_count_sub+0xf/0xd0
[  158.945940]  ? schedule+0x5b/0x140
[  158.949412]  kthread+0x206/0x300
[  158.952689]  ? sort_range+0x20/0x20
[  158.956249]  ? kthread_stop+0x570/0x570
[  158.960164]  ret_from_fork+0x3a/0x50
[  158.963823] Code: 14 3e ff 8b 4b 78 55 4d 89 f9 41 56 41 55 48 c7 c7 a0 cf 
db 82 41 54 44 8b 44 24 2c 48 8b 54 24 30 48 8b 74 24 20 e8 16 94 13 ff <0f> 0b 
48 c7 c7 60 8e 1f 85 48 83 c4 20 e8 55 ef a6 ff 89 74 24
[  158.983235] RIP: skb_panic+0xc3/0x100 RSP: 8801d3f27110
[  158.988935] ---[ end trace 5af56ee845aa6cc8 ]---
[  158.993641

[PATCH net 0/7] net: ip6_gre: Fixes in headroom handling

2018-05-17 Thread Petr Machata

This series mends some problems in headroom management in ip6_gre
module. The current code base has the following three closely-related
problems:

- ip6gretap tunnels neglect to ensure there's enough writable headroom
  before pushing GRE headers.

- ip6erspan does this, but assumes that dev->needed_headroom is primed.
  But that doesn't happen until ip6_tnl_xmit() is called later. Thus for
  the first packet, ip6erspan actually behaves like ip6gretap above.

- ip6erspan shares some of the code with ip6gretap, including
  calculations of needed header length. While there is custom
  ERSPAN-specific code for calculating the headroom, the computed
  values are overwritten by the ip6gretap code.

The first two issues lead to a kernel panic in situations where a packet
is mirrored from a veth device to the device in question. They are
fixed, respectively, in patches #1 and #2, which include the full panic
trace and a reproducer.

The rest of the patchset deals with the last issue. In patches #3 to #6,
several functions are split up into reusable parts. Finally in patch #7
these blocks are used to compose ERSPAN-specific callbacks where
necessary to fix the hlen calculation.

Petr Machata (7):
  net: ip6_gre: Request headroom in __gre6_xmit()
  net: ip6_gre: Fix headroom request in ip6erspan_tunnel_xmit()
  net: ip6_gre: Split up ip6gre_tnl_link_config()
  net: ip6_gre: Split up ip6gre_tnl_change()
  net: ip6_gre: Split up ip6gre_newlink()
  net: ip6_gre: Split up ip6gre_changelink()
  net: ip6_gre: Fix ip6erspan hlen calculation

 net/ipv6/ip6_gre.c | 184 +
 1 file changed, 145 insertions(+), 39 deletions(-)

-- 
2.4.11
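
The headroom rule described for the first two problems (request
needed_headroom if it is already primed, otherwise just enough for the
header) can be modelled without any kernel types. The sketch below is
such a model, not the ip6_gre code: struct demo_skb and the demo_*
helpers are hypothetical, and growing the headroom is reduced to bumping
a counter where a real skb would be reallocated.

```c
#include <assert.h>

/* Hypothetical, simplified stand-in for the part of an skb used here. */
struct demo_skb {
	unsigned int headroom;	/* bytes available in front of the data */
};

/* Headroom to request: needed_headroom once primed, else just the header. */
static unsigned int demo_headroom_request(unsigned int needed_headroom,
					  unsigned int hlen)
{
	return needed_headroom ? needed_headroom : hlen;
}

/* Grow the headroom if it is too small. */
static void demo_ensure_headroom(struct demo_skb *skb, unsigned int want)
{
	if (skb->headroom < want)
		skb->headroom = want;
}

/* Push a header; pushing past the start of the buffer is the panic case. */
static void demo_push(struct demo_skb *skb, unsigned int hlen)
{
	assert(hlen <= skb->headroom);
	skb->headroom -= hlen;
}
```

For the first packet, where the device value is still zero, the request
falls back to the header length, so the subsequent push cannot underrun
the buffer.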

[PATCH net 3/7] net: ip6_gre: Split up ip6gre_tnl_link_config()

2018-05-17 Thread Petr Machata

The function ip6gre_tnl_link_config() is used for setting up
configuration of both ip6gretap and ip6erspan tunnels. Split the
function into the common part and the route-lookup part. The latter then
takes the calculated header length as an argument. This split will allow
the patches down the line to sneak in a custom header length computation
for the ERSPAN tunnel.

Signed-off-by: Petr Machata 
---
 net/ipv6/ip6_gre.c | 38 ++
 1 file changed, 26 insertions(+), 12 deletions(-)

diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index 53b1531..78ba6b9 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -1022,12 +1022,11 @@ static netdev_tx_t ip6erspan_tunnel_xmit(struct sk_buff 
*skb,
return NETDEV_TX_OK;
 }
 
-static void ip6gre_tnl_link_config(struct ip6_tnl *t, int set_mtu)
+static void ip6gre_tnl_link_config_common(struct ip6_tnl *t)
 {
struct net_device *dev = t->dev;
struct __ip6_tnl_parm *p = &t->parms;
struct flowi6 *fl6 = &t->fl.u.ip6;
-   int t_hlen;
 
if (dev->type != ARPHRD_ETHER) {
memcpy(dev->dev_addr, &p->laddr, sizeof(struct in6_addr));
@@ -1054,12 +1053,13 @@ static void ip6gre_tnl_link_config(struct ip6_tnl *t, 
int set_mtu)
dev->flags |= IFF_POINTOPOINT;
else
dev->flags &= ~IFF_POINTOPOINT;
+}
 
-   t->tun_hlen = gre_calc_hlen(t->parms.o_flags);
-
-   t->hlen = t->encap_hlen + t->tun_hlen;
-
-   t_hlen = t->hlen + sizeof(struct ipv6hdr);
+static void ip6gre_tnl_link_config_route(struct ip6_tnl *t, int set_mtu,
+int t_hlen)
+{
+   const struct __ip6_tnl_parm *p = &t->parms;
+   struct net_device *dev = t->dev;
 
if (p->flags & IP6_TNL_F_CAP_XMIT) {
int strict = (ipv6_addr_type(&p->raddr) &
@@ -1091,6 +1091,24 @@ static void ip6gre_tnl_link_config(struct ip6_tnl *t, 
int set_mtu)
}
 }
 
+static int ip6gre_calc_hlen(struct ip6_tnl *tunnel)
+{
+   int t_hlen;
+
+   tunnel->tun_hlen = gre_calc_hlen(tunnel->parms.o_flags);
+   tunnel->hlen = tunnel->tun_hlen + tunnel->encap_hlen;
+
+   t_hlen = tunnel->hlen + sizeof(struct ipv6hdr);
+   tunnel->dev->hard_header_len = LL_MAX_HEADER + t_hlen;
+   return t_hlen;
+}
+
+static void ip6gre_tnl_link_config(struct ip6_tnl *t, int set_mtu)
+{
+   ip6gre_tnl_link_config_common(t);
+   ip6gre_tnl_link_config_route(t, set_mtu, ip6gre_calc_hlen(t));
+}
+
 static int ip6gre_tnl_change(struct ip6_tnl *t,
const struct __ip6_tnl_parm *p, int set_mtu)
 {
@@ -1384,11 +1402,7 @@ static int ip6gre_tunnel_init_common(struct net_device 
*dev)
return ret;
}
 
-   tunnel->tun_hlen = gre_calc_hlen(tunnel->parms.o_flags);
-   tunnel->hlen = tunnel->tun_hlen + tunnel->encap_hlen;
-   t_hlen = tunnel->hlen + sizeof(struct ipv6hdr);
-
-   dev->hard_header_len = LL_MAX_HEADER + t_hlen;
+   t_hlen = ip6gre_calc_hlen(tunnel);
dev->mtu = ETH_DATA_LEN - t_hlen;
if (dev->type == ARPHRD_ETHER)
dev->mtu -= ETH_HLEN;
-- 
2.4.11
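
As a worked example of the arithmetic that ip6gre_calc_hlen() and
ip6gre_tunnel_init_common() perform in the diff above, the sketch below
redoes the same sums in plain C. The demo_* names are hypothetical, the
constants mirror sizeof(struct ipv6hdr) (40), ETH_DATA_LEN (1500) and
ETH_HLEN (14), and the tun_hlen of 8 used in the note after the code is an
arbitrary example value, not something taken from the patch.

```c
#include <assert.h>

#define DEMO_IPV6_HDR_LEN	40	/* sizeof(struct ipv6hdr) */
#define DEMO_ETH_DATA_LEN	1500	/* ETH_DATA_LEN */
#define DEMO_ETH_HLEN		14	/* ETH_HLEN */

/* Hypothetical stand-in for the header-length fields of struct ip6_tnl. */
struct demo_tnl {
	int tun_hlen;	/* GRE header length */
	int encap_hlen;	/* extra encapsulation, 0 if none */
	int hlen;	/* tun_hlen + encap_hlen */
};

/* Mirrors the hlen / t_hlen sums in the patch. */
static int demo_calc_hlen(struct demo_tnl *t)
{
	t->hlen = t->tun_hlen + t->encap_hlen;
	return t->hlen + DEMO_IPV6_HDR_LEN;
}

/* Mirrors the MTU computation that follows the hlen calculation. */
static int demo_calc_mtu(int t_hlen, int is_ether)
{
	int mtu;

	assert(t_hlen >= 0 && t_hlen < DEMO_ETH_DATA_LEN);
	mtu = DEMO_ETH_DATA_LEN - t_hlen;
	if (is_ether)
		mtu -= DEMO_ETH_HLEN;
	return mtu;
}
```

With tun_hlen = 8 and encap_hlen = 0, t_hlen is 8 + 0 + 40 = 48, so the
MTU is 1500 - 48 = 1452, or 1452 - 14 = 1438 for an Ethernet-type device.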
