RE: [RFC PATCH v3 1/2] Documentation: DT: net: Add Xilinx gmiitorgmii converter device tree binding documentation

2016-08-04 Thread Appana Durga Kedareswara Rao
Hi Rob,

Thanks for the review...



> > +XILINX GMIITORGMII Converter Driver Device Tree Bindings
> > +
> > +
> > +The Gigabit Media Independent Interface (GMII) to Reduced Gigabit
> > +Media Independent Interface (RGMII) core provides the RGMII between
> > +RGMII-compliant Ethernet physical media devices (PHY) and the Gigabit
> > +Ethernet controller.
> > +This core can be used in all three modes of operation (10/100/1000 Mb/s).
> > +The Management Data Input/Output (MDIO) interface is used to
> > +configure the speed of operation. This core can switch dynamically
> > +between the three different speed modes by configuring the converter
> > +register through an MDIO write.
> > +
> > +The MDIO is a bus to which the PHY devices are connected.  For each
> > +device that exists on this bus, a child node should be created.  See
> > +the definition of the PHY node in booting-without-of.txt for an
> > +example of how to define a PHY.
> > +
> > +Required properties:
> > +  - compatible : Should be "xilinx,gmiitorgmii"
> 
> Perhaps xilinx,gmii-to-rgmii.

Sure...

> 
> This needs some sort of version information in the compatible string.

Sure, will fix in the next version...

> 
> > +  - reg : The ID number for the phy, usually a small integer
> > +
> > +Example:
> > +   mdio {
> > +#address-cells = <1>;
> > +#size-cells = <0>;
> > +   ethernet-phy@0 {
> > +   ..
> > +   };
> > +gmii_to_rgmii: gmii_to_rgmii@8 {
> 
> Don't use underscores in node names.

OK, will fix in the next version...
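Pulling Rob's two comments together, a revised binding example (hypothetical — the exact version suffix in the compatible string is still to be decided for the next revision) might look like:

```dts
mdio {
	#address-cells = <1>;
	#size-cells = <0>;

	ethernet-phy@0 {
		reg = <0>;
	};

	/* hyphenated node name; underscores remain legal in the label */
	gmii_to_rgmii: gmii-to-rgmii@8 {
		compatible = "xilinx,gmii-to-rgmii-1.0";
		reg = <8>;
	};
};
```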

Regards,
Kedar.


[PATCH v2] net: phy: micrel: Add specific suspend

2016-08-04 Thread Wenyou Yang
Disable all interrupts during suspend; they will be re-enabled
on resume. Otherwise, the suspend/resume process will occasionally
block.

Signed-off-by: Wenyou Yang 
Acked-by: Nicolas Ferre 
---

Changes in v2:
 - Use fairly generic phydrv->config_intr() with
   PHY_INTERRUPT_DISABLED, then call genphy_suspend().
 - Modify kszphy_resume() with PHY_INTERRUPT_ENABLED accordingly.
 - Add static attribute for kszphy_suspend().

 drivers/net/phy/micrel.c | 27 +++
 1 file changed, 19 insertions(+), 8 deletions(-)

diff --git a/drivers/net/phy/micrel.c b/drivers/net/phy/micrel.c
index 5a8fefc..b598d46 100644
--- a/drivers/net/phy/micrel.c
+++ b/drivers/net/phy/micrel.c
@@ -647,17 +647,28 @@ static void kszphy_get_stats(struct phy_device *phydev,
data[i] = kszphy_get_stat(phydev, i);
 }
 
-static int kszphy_resume(struct phy_device *phydev)
+static int kszphy_suspend(struct phy_device *phydev)
 {
-   int value;
+   /* Disable PHY Interrupts */
+   if (phy_interrupt_is_valid(phydev)) {
+   phydev->interrupts = PHY_INTERRUPT_DISABLED;
+   if (phydev->drv->config_intr)
+   phydev->drv->config_intr(phydev);
+   }
 
-   mutex_lock(&phydev->lock);
+   return genphy_suspend(phydev);
+}
 
-   value = phy_read(phydev, MII_BMCR);
-   phy_write(phydev, MII_BMCR, value & ~BMCR_PDOWN);
+static int kszphy_resume(struct phy_device *phydev)
+{
+   genphy_resume(phydev);
 
-   kszphy_config_intr(phydev);
-   mutex_unlock(&phydev->lock);
+   /* Enable PHY Interrupts */
+   if (phy_interrupt_is_valid(phydev)) {
+   phydev->interrupts = PHY_INTERRUPT_ENABLED;
+   if (phydev->drv->config_intr)
+   phydev->drv->config_intr(phydev);
+   }
 
return 0;
 }
@@ -870,7 +881,7 @@ static struct phy_driver ksphy_driver[] = {
.get_sset_count = kszphy_get_sset_count,
.get_strings= kszphy_get_strings,
.get_stats  = kszphy_get_stats,
-   .suspend= genphy_suspend,
+   .suspend= kszphy_suspend,
.resume = kszphy_resume,
 }, {
.phy_id = PHY_ID_KSZ8061,
-- 
2.7.4



RE: [RFC PATCH v3 2/2] net: phy: Add gmiitorgmii converter support

2016-08-04 Thread Appana Durga Kedareswara Rao
Hi zhuyj,

Thanks for the review...

> 
> +   switch (phydev->speed) {
> +   case SPEED_1000:
> +   val |= BMCR_SPEED1000;
> +   case SPEED_100:
> +   val |= BMCR_SPEED100;
> +   }
> 
> Are there only 2 kinds of speed?

The converter supports three different speeds; will fix in the next version...
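For reference, a corrected switch covering all three speeds could look like the sketch below (a hypothetical helper, not the final v4 code; it assumes, as the posted code does, that the converter register mirrors the BMCR speed-bit layout):

```c
#include <assert.h>
#include <stdint.h>

/* BMCR speed bits as defined in <uapi/linux/mii.h> */
#define BMCR_SPEED1000 0x0040
#define BMCR_SPEED100  0x2000

/* Link speed values as in <uapi/linux/ethtool.h> */
enum { SPEED_10 = 10, SPEED_100 = 100, SPEED_1000 = 1000 };

/*
 * Clear both speed bits first, then set exactly one of them (or
 * neither, which selects 10 Mb/s), with explicit breaks so the
 * SPEED_1000 case no longer falls through into SPEED_100.
 */
static uint16_t set_speed_bits(uint16_t val, int speed)
{
	val &= ~(BMCR_SPEED1000 | BMCR_SPEED100);

	switch (speed) {
	case SPEED_1000:
		val |= BMCR_SPEED1000;
		break;
	case SPEED_100:
		val |= BMCR_SPEED100;
		break;
	case SPEED_10:
		/* both speed bits clear selects 10 Mb/s */
		break;
	}
	return val;
}
```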

Thanks,
Kedar.

> 
> 
> On Thu, Aug 4, 2016 at 8:13 PM, Kedareswara rao Appana
>  wrote:
> > This patch adds support for the gmiitorgmii converter.
> >
> > The GMII to RGMII IP core provides the Reduced Gigabit Media
> > Independent Interface (RGMII) between Ethernet physical media devices
> > and the Gigabit Ethernet controller. This core can switch dynamically
> > between the three different speed modes of operation by configuring
> > the converter register through an MDIO write.
> >
> > The MDIO interface is used to set the operating speed of the Ethernet MAC.
> >
> > Signed-off-by: Kedareswara rao Appana 
> > ---
> > Thanks a lot Andrew for your inputs.
> > Changes for v3:
> > --> Updated the driver as suggested by Andrew.
> > Changes for v2:
> > --> Passed struct xphy pointer directly to the fix_mac_speed
> > API as suggested by the Florian.
> > --> Added checks for the phy-node fail case as suggested
> > by the Florian
> >
> >  drivers/net/phy/Kconfig |   8 +++
> >  drivers/net/phy/Makefile|   1 +
> >  drivers/net/phy/xilinx_gmii2rgmii.c | 121
> > 
> >  3 files changed, 130 insertions(+)
> >  create mode 100644 drivers/net/phy/xilinx_gmii2rgmii.c
> >
> > diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig index
> > 6dad9a9..eec31a3 100644
> > --- a/drivers/net/phy/Kconfig
> > +++ b/drivers/net/phy/Kconfig
> > @@ -271,6 +271,14 @@ config MDIO_BCM_IPROC
> >   This module provides a driver for the MDIO busses found in the
> >   Broadcom iProc SoC's.
> >
> > +config XILINX_GMII2RGMII
> > +   tristate "Xilinx GMII2RGMII converter driver"
> > +   default y
> > +   ---help---
> > + This driver supports the Xilinx GMII to RGMII IP core. It provides
> > + the Reduced Gigabit Media Independent Interface (RGMII) between
> > + Ethernet physical media devices and the Gigabit Ethernet
> > + controller.
> > +
> >  endif # PHYLIB
> >
> >  config MICREL_KS8995MA
> > diff --git a/drivers/net/phy/Makefile b/drivers/net/phy/Makefile index
> > fcdbb92..8650067 100644
> > --- a/drivers/net/phy/Makefile
> > +++ b/drivers/net/phy/Makefile
> > @@ -44,3 +44,4 @@ obj-$(CONFIG_MDIO_MOXART) += mdio-moxart.o
> >  obj-$(CONFIG_MDIO_BCM_UNIMAC)  += mdio-bcm-unimac.o
> >  obj-$(CONFIG_MICROCHIP_PHY)+= microchip.o
> >  obj-$(CONFIG_MDIO_BCM_IPROC)   += mdio-bcm-iproc.o
> > +obj-$(CONFIG_XILINX_GMII2RGMII) += xilinx_gmii2rgmii.o
> > diff --git a/drivers/net/phy/xilinx_gmii2rgmii.c
> > b/drivers/net/phy/xilinx_gmii2rgmii.c
> > new file mode 100644
> > index 000..56a655a9
> > --- /dev/null
> > +++ b/drivers/net/phy/xilinx_gmii2rgmii.c
> > @@ -0,0 +1,121 @@
> > +/* Xilinx GMII2RGMII Converter driver
> > + *
> > + * Copyright (C) 2016 Xilinx, Inc.
> > + *
> > + * Author: Kedareswara rao Appana 
> > + *
> > + * Description:
> > + * This driver is developed for Xilinx GMII2RGMII Converter
> > + *
> > + * This program is free software: you can redistribute it and/or
> > +modify
> > + * it under the terms of the GNU General Public License as published
> > +by
> > + * the Free Software Foundation, either version 2 of the License, or
> > + * (at your option) any later version.
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > + * GNU General Public License for more details.
> > + */
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +
> > +#define XILINX_GMII2RGMII_REG  0x10
> > +
> > +struct gmii2rgmii {
> > +   struct phy_device *phy_dev;
> > +   struct phy_driver *phy_drv;
> > +   struct phy_driver conv_phy_drv;
> > +   int addr;
> > +};
> > +
> > +static int xgmiitorgmii_read_status(struct phy_device *phydev) {
> > +   struct gmii2rgmii *priv = (struct gmii2rgmii *)phydev->priv;
> > +   u16 val = 0;
> > +
> > +   priv->phy_drv->read_status(phydev);
> > +
> > +   val = mdiobus_read(phydev->mdio.bus, priv->addr,
> > + XILINX_GMII2RGMII_REG);
> > +
> > +   switch (phydev->speed) {
> > +   case SPEED_1000:
> > +   val |= BMCR_SPEED1000;
> > +   case SPEED_100:
> > +   val |= BMCR_SPEED100;
> > +   }
> > +
> > +   mdiobus_write(phydev->mdio.bus, priv->addr,
> > + XILINX_GMII2RGMII_REG, val);
> > +
> > +   return 0;
> > +}
> > +
> > +int xgmiitorgmii_probe(struct mdio_device *mdiodev) {
> > +   struct device *dev = &mdiodev->dev;
> > +   struct device_node *np = dev->of_node, *phy_node;
> > +   struct gm

Re: [RFC PATCH v3 2/2] net: phy: Add gmiitorgmii converter support

2016-08-04 Thread zhuyj
+   switch (phydev->speed) {
+   case SPEED_1000:
+   val |= BMCR_SPEED1000;
+   case SPEED_100:
+   val |= BMCR_SPEED100;
+   }

Are there only 2 kinds of speed?


On Thu, Aug 4, 2016 at 8:13 PM, Kedareswara rao Appana
 wrote:
> This patch adds support for the gmiitorgmii converter.
>
> The GMII to RGMII IP core provides the Reduced Gigabit Media
> Independent Interface (RGMII) between Ethernet physical media
> devices and the Gigabit Ethernet controller. This core can
> switch dynamically between the three different speed modes of
> operation by configuring the converter register through an MDIO write.
>
> The MDIO interface is used to set the operating speed of the Ethernet MAC.
>
> Signed-off-by: Kedareswara rao Appana 
> ---
> Thanks a lot Andrew for your inputs.
> Changes for v3:
> --> Updated the driver as suggested by Andrew.
> Changes for v2:
> --> Passed struct xphy pointer directly to the fix_mac_speed
> API as suggested by the Florian.
> --> Added checks for the phy-node fail case as suggested
> by the Florian
>
>  drivers/net/phy/Kconfig |   8 +++
>  drivers/net/phy/Makefile|   1 +
>  drivers/net/phy/xilinx_gmii2rgmii.c | 121 
> 
>  3 files changed, 130 insertions(+)
>  create mode 100644 drivers/net/phy/xilinx_gmii2rgmii.c
>
> diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig
> index 6dad9a9..eec31a3 100644
> --- a/drivers/net/phy/Kconfig
> +++ b/drivers/net/phy/Kconfig
> @@ -271,6 +271,14 @@ config MDIO_BCM_IPROC
>   This module provides a driver for the MDIO busses found in the
>   Broadcom iProc SoC's.
>
> +config XILINX_GMII2RGMII
> +   tristate "Xilinx GMII2RGMII converter driver"
> +   default y
> +   ---help---
> + This driver supports the Xilinx GMII to RGMII IP core. It provides
> + the Reduced Gigabit Media Independent Interface (RGMII) between
> + Ethernet physical media devices and the Gigabit Ethernet controller.
> +
>  endif # PHYLIB
>
>  config MICREL_KS8995MA
> diff --git a/drivers/net/phy/Makefile b/drivers/net/phy/Makefile
> index fcdbb92..8650067 100644
> --- a/drivers/net/phy/Makefile
> +++ b/drivers/net/phy/Makefile
> @@ -44,3 +44,4 @@ obj-$(CONFIG_MDIO_MOXART) += mdio-moxart.o
>  obj-$(CONFIG_MDIO_BCM_UNIMAC)  += mdio-bcm-unimac.o
>  obj-$(CONFIG_MICROCHIP_PHY)+= microchip.o
>  obj-$(CONFIG_MDIO_BCM_IPROC)   += mdio-bcm-iproc.o
> +obj-$(CONFIG_XILINX_GMII2RGMII) += xilinx_gmii2rgmii.o
> diff --git a/drivers/net/phy/xilinx_gmii2rgmii.c 
> b/drivers/net/phy/xilinx_gmii2rgmii.c
> new file mode 100644
> index 000..56a655a9
> --- /dev/null
> +++ b/drivers/net/phy/xilinx_gmii2rgmii.c
> @@ -0,0 +1,121 @@
> +/* Xilinx GMII2RGMII Converter driver
> + *
> + * Copyright (C) 2016 Xilinx, Inc.
> + *
> + * Author: Kedareswara rao Appana 
> + *
> + * Description:
> + * This driver is developed for Xilinx GMII2RGMII Converter
> + *
> + * This program is free software: you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation, either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + */
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#define XILINX_GMII2RGMII_REG  0x10
> +
> +struct gmii2rgmii {
> +   struct phy_device *phy_dev;
> +   struct phy_driver *phy_drv;
> +   struct phy_driver conv_phy_drv;
> +   int addr;
> +};
> +
> +static int xgmiitorgmii_read_status(struct phy_device *phydev)
> +{
> +   struct gmii2rgmii *priv = (struct gmii2rgmii *)phydev->priv;
> +   u16 val = 0;
> +
> +   priv->phy_drv->read_status(phydev);
> +
> +   val = mdiobus_read(phydev->mdio.bus, priv->addr, 
> XILINX_GMII2RGMII_REG);
> +
> +   switch (phydev->speed) {
> +   case SPEED_1000:
> +   val |= BMCR_SPEED1000;
> +   case SPEED_100:
> +   val |= BMCR_SPEED100;
> +   }
> +
> +   mdiobus_write(phydev->mdio.bus, priv->addr, XILINX_GMII2RGMII_REG, 
> val);
> +
> +   return 0;
> +}
> +
> +int xgmiitorgmii_probe(struct mdio_device *mdiodev)
> +{
> +   struct device *dev = &mdiodev->dev;
> +   struct device_node *np = dev->of_node, *phy_node;
> +   struct gmii2rgmii *priv;
> +   int ret;
> +
> +   priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL);
> +   if (!priv)
> +   return -ENOMEM;
> +
> +   phy_node = of_parse_phandle(np, "phy-handle", 0);
> +   if (IS_ERR(phy_node)) {
> +   dev_err(dev, "Couldn't parse phy-handle\n");
> +   ret = -ENODEV;
> +   goto out;
> +   }
> +
> +   priv->phy_de

RE: [PATCH v1] net: phy: micrel: Add specific suspend

2016-08-04 Thread Wenyou.Yang
Hi Florian,

> -Original Message-
> From: Florian Fainelli [mailto:f.faine...@gmail.com]
> Sent: August 4, 2016 11:33
> To: Wenyou Yang 
> Cc: netdev@vger.kernel.org; linux-ker...@vger.kernel.org;
> linux-arm-ker...@lists.infradead.org; Alexandre Belloni;
> Nicolas Ferre; Andrew Lunn
> Subject: Re: [PATCH v1] net: phy: micrel: Add specific suspend
> 
> On 03/08/2016 17:21, Wenyou Yang wrote:
> > Disable all interrupts during suspend; they will be re-enabled on resume.
> > Otherwise, the suspend/resume process will occasionally block.
> 
> This seems like something fairly generic actually, we could imagine having the
> core library do something like this:
> 
> - if interrupts are valid and enabled for the PHY, call
> phydrv->config_intr() with PHY_INTERRUPT_DISABLED
> - call genphy_suspend

Accepted. I will send a new version.

> 
> Of course, if none of this fits the generic model, the PHY driver can
> still provide a suspend callback. It might be worth auditing other
> drivers for that pattern and looking at those that need specific
> handling, like keeping specific interrupt sources active for, e.g.,
> wake-on-LAN.
> 
> >
> > Signed-off-by: Wenyou Yang 
> > Acked-by: Nicolas Ferre 
> > ---
> >
> >  drivers/net/phy/micrel.c | 19 ++-
> >  1 file changed, 18 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/net/phy/micrel.c b/drivers/net/phy/micrel.c index
> > 5a8fefc..8cb778a 100644
> > --- a/drivers/net/phy/micrel.c
> > +++ b/drivers/net/phy/micrel.c
> > @@ -647,6 +647,23 @@ static void kszphy_get_stats(struct phy_device
> *phydev,
> > data[i] = kszphy_get_stat(phydev, i);  }
> >
> > +int kszphy_suspend(struct phy_device *phydev) {
> > +   int value;
> > +
> > +   mutex_lock(&phydev->lock);
> > +
> > +   /* disable interrupts */
> > +   phy_write(phydev, MII_KSZPHY_INTCS, 0x0);
> > +
> > +   value = phy_read(phydev, MII_BMCR);
> > +   phy_write(phydev, MII_BMCR, value | BMCR_PDOWN);
> > +
> > +   mutex_unlock(&phydev->lock);
> > +
> > +   return 0;
> > +}
> > +
> >  static int kszphy_resume(struct phy_device *phydev)  {
> > int value;
> > @@ -870,7 +887,7 @@ static struct phy_driver ksphy_driver[] = {
> > .get_sset_count = kszphy_get_sset_count,
> > .get_strings= kszphy_get_strings,
> > .get_stats  = kszphy_get_stats,
> > -   .suspend= genphy_suspend,
> > +   .suspend= kszphy_suspend,
> > .resume = kszphy_resume,
> >  }, {
> > .phy_id = PHY_ID_KSZ8061,
> >


Best Regards,
Wenyou Yang


Re: Buggy rhashtable walking

2016-08-04 Thread Johannes Berg

> So I'm going to fix this by consolidating identical objects into
> a single rhashtable entry which also lets us get rid of the
> insecure_elasticity setting.

Hm. Would you rather allocate a separate head entry for the hashtable,
or chain the entries?

(Luckily) the colliding key case practically never happens, and some
drivers don't even allow it, so that's perhaps something to keep in
mind for this. Perhaps we should just generally disallow it - but a few
people (hi Ben) would be really unhappy about that I guess.

I think this might affect more than one use of rhashtable in mac80211
now, since the mesh paths also use it.

johannes


Re: [PATCH net 1/2] tg3: Fix for diasllow rx coalescing time to be 0

2016-08-04 Thread Siva Reddy Kallam
On Thu, Aug 4, 2016 at 3:45 AM, Michael Chan  wrote:
> On Wed, Aug 3, 2016 at 9:04 AM, Rick Jones  wrote:
>>
>> Should anything then happen with:
>>
>> /* No rx interrupts will be generated if both are zero */
>> if ((ec->rx_coalesce_usecs == 0) &&
>> (ec->rx_max_coalesced_frames == 0))
>> return -EINVAL;
>>
>>
>> which is the next block of code?  The logic there seems to suggest that it
>> was intended to be able to have an rx_coalesce_usecs of 0 and rely on packet
>> arrival to trigger an interrupt.  Presumably setting rx_max_coalesced_frames
>> to 1 to disable interrupt coalescing.
>>
>
> I remember writing this block of code over 10 years ago for early
> generations of the chip.  Newer chips seem to behave differently and
> rx_coalesce_usecs can never be zero.  So this block can be removed now
> that the condition can never be true.  We should probably leave a
> comment there for future reference.
Thanks, Rick, for identifying this.
Thanks, Michael, for your inputs. I will submit a patch removing this
block of code and adding a comment for future reference.
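The check under discussion can be expressed as a predicate (a hypothetical standalone helper, not the actual tg3 code, which lives in the driver's set_coalesce handler): no RX interrupt is ever generated when both thresholds are zero, hence the `-EINVAL`. Per Michael's comment, newer chips never allow `rx_coalesce_usecs` to be zero, so the predicate is always true there and the check can be dropped, leaving only a comment.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal stand-in for the relevant struct ethtool_coalesce fields */
struct rx_coalesce_params {
	unsigned int rx_coalesce_usecs;
	unsigned int rx_max_coalesced_frames;
};

/*
 * True when the parameters can still generate RX interrupts: either
 * the time threshold or the frame threshold must be non-zero.
 */
static bool rx_params_can_interrupt(const struct rx_coalesce_params *ec)
{
	return ec->rx_coalesce_usecs != 0 || ec->rx_max_coalesced_frames != 0;
}
```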


[PATCH] net: macb: Correct CAPS mask

2016-08-04 Thread Harini Katakam
The USRIO and JUMBO CAPS flags share the same mask value.
Fix the collision.

Fixes: ce721a702197 ("net: ethernet: cadence-macb: Add disabled usrio caps")
Cc: sta...@vger.kernel.org # v4.5+
Signed-off-by: Harini Katakam 
Acked-by: Nicolas Ferre 
---
 drivers/net/ethernet/cadence/macb.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/cadence/macb.h 
b/drivers/net/ethernet/cadence/macb.h
index 36893d8..b6fcf10 100644
--- a/drivers/net/ethernet/cadence/macb.h
+++ b/drivers/net/ethernet/cadence/macb.h
@@ -403,11 +403,11 @@
 #define MACB_CAPS_USRIO_DEFAULT_IS_MII_GMII0x0004
 #define MACB_CAPS_NO_GIGABIT_HALF  0x0008
 #define MACB_CAPS_USRIO_DISABLED   0x0010
+#define MACB_CAPS_JUMBO0x0020
 #define MACB_CAPS_FIFO_MODE0x1000
 #define MACB_CAPS_GIGABIT_MODE_AVAILABLE   0x2000
 #define MACB_CAPS_SG_DISABLED  0x4000
 #define MACB_CAPS_MACB_IS_GEM  0x8000
-#define MACB_CAPS_JUMBO0x0010
 
 /* Bit manipulation macros */
 #define MACB_BIT(name) \
-- 
2.7.4
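The collision the patch fixes is easy to demonstrate: with the old value, any device advertising MACB_CAPS_USRIO_DISABLED also appeared jumbo-capable, since both names referred to the same bit (the `_OLD`/`_NEW` suffixes below are illustrative only).

```c
#include <assert.h>

/* CAPS mask values from macb.h; _OLD reproduces the bug, where
 * MACB_CAPS_JUMBO reused the MACB_CAPS_USRIO_DISABLED bit. */
#define MACB_CAPS_USRIO_DISABLED	0x00000010
#define MACB_CAPS_JUMBO_OLD		0x00000010	/* buggy: same bit */
#define MACB_CAPS_JUMBO_NEW		0x00000020	/* fixed: unique bit */

/* A configuration that disables USRIO but has no jumbo support */
static const unsigned int caps = MACB_CAPS_USRIO_DISABLED;
```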



Re: [RFC PATCH 1/2] net: macb: Correct CAPS mask

2016-08-04 Thread Harini Katakam
On Thu, Aug 4, 2016 at 7:37 PM, Nicolas Ferre  wrote:
> Le 01/08/2016 à 09:20, Harini Katakam a écrit :
>> USRIO and JUMBO CAPS have the same mask.
>> Fix the same.
>>
>> Signed-off-by: Harini Katakam 
>
> Hi,
> Indeed there's a bug...
>
>
>> ---
>>  drivers/net/ethernet/cadence/macb.h | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/ethernet/cadence/macb.h 
>> b/drivers/net/ethernet/cadence/macb.h
>> index 36893d8..b6fcf10 100644
>> --- a/drivers/net/ethernet/cadence/macb.h
>> +++ b/drivers/net/ethernet/cadence/macb.h
>> @@ -403,11 +403,11 @@
>>  #define MACB_CAPS_USRIO_DEFAULT_IS_MII_GMII  0x0004
>>  #define MACB_CAPS_NO_GIGABIT_HALF0x0008
>>  #define MACB_CAPS_USRIO_DISABLED 0x0010
>> +#define MACB_CAPS_JUMBO  0x0020
>>  #define MACB_CAPS_FIFO_MODE  0x1000
>>  #define MACB_CAPS_GIGABIT_MODE_AVAILABLE 0x2000
>>  #define MACB_CAPS_SG_DISABLED0x4000
>>  #define MACB_CAPS_MACB_IS_GEM0x8000
>> -#define MACB_CAPS_JUMBO  0x0010
>
> Acked-by: Nicolas Ferre 
>
> Can you please send it independently without the RFC tag in the subject
> and with the following tags in the body as well:
>
> Fixes: ce721a702197 ("net: ethernet: cadence-macb: Add disabled usrio caps")
> Cc: sta...@vger.kernel.org # v4.5+
>

Thanks Nicolas. I'll do that.

Regards,
Harini

>>  /* Bit manipulation macros */
>>  #define MACB_BIT(name)   \
>>
>
> Thanks, bye,
> --
> Nicolas Ferre


Re: order-0 vs order-N driver allocation. Was: [PATCH v10 07/12] net/mlx4_en: add page recycle to prepare rx ring for tx support

2016-08-04 Thread Alexei Starovoitov
On Thu, Aug 04, 2016 at 05:30:56PM -0700, Alexander Duyck wrote:
> On Thu, Aug 4, 2016 at 9:19 AM, Jesper Dangaard Brouer
>  wrote:
> >
> > On Wed, 3 Aug 2016 10:45:13 -0700 Alexei Starovoitov 
> >  wrote:
> >
> >> On Mon, Jul 25, 2016 at 09:35:20AM +0200, Eric Dumazet wrote:
> >> > On Tue, 2016-07-19 at 12:16 -0700, Brenden Blanco wrote:
> >> > > The mlx4 driver by default allocates order-3 pages for the ring to
> >> > > consume in multiple fragments. When the device has an xdp program, this
> >> > > behavior will prevent tx actions since the page must be re-mapped in
> >> > > TODEVICE mode, which cannot be done if the page is still shared.
> >> > >
> >> > > Start by making the allocator configurable based on whether xdp is
> >> > > running, such that order-0 pages are always used and never shared.
> >> > >
> >> > > Since this will stress the page allocator, add a simple page cache to
> >> > > each rx ring. Pages in the cache are left dma-mapped, and in drop-only
> >> > > stress tests the page allocator is eliminated from the perf report.
> >> > >
> >> > > Note that setting an xdp program will now require the rings to be
> >> > > reconfigured.
> >> >
> >> > Again, this has nothing to do with XDP ?
> >> >
> >> > Please submit a separate patch, switching this driver to order-0
> >> > allocations.
> >> >
> >> > I mentioned this order-3 vs order-0 issue earlier [1], and proposed to
> >> > send a generic patch, but had been traveling lately, and currently in
> >> > vacation.
> >> >
> >> > order-3 pages are problematic when dealing with hostile traffic anyway,
> >> > so we should exclusively use order-0 pages, and page recycling like
> >> > Intel drivers.
> >> >
> >> > http://lists.openwall.net/netdev/2016/04/11/88
> >>
> >> Completely agree. These multi-page tricks work only for benchmarks and
> >> not for production.
> >> Eric, if you can submit that patch for mlx4 that would be awesome.
> >>
> >> I think we should default to order-0 for both mlx4 and mlx5.
> >> Alternatively we're thinking to do a netlink or ethtool switch to
> >> preserve old behavior, but frankly I don't see who needs this order-N
> >> allocation schemes.
> >
> > I actually agree, that we should switch to order-0 allocations.
> >
> > *BUT* this will cause performance regressions on platforms with
> > expensive DMA operations (as they no longer amortize the cost of
> > mapping a larger page).

order-0 is mainly about correctness under memory pressure.
As Eric pointed out order-N is a serious issue for hostile traffic,
but even for normal traffic it's a problem. Sooner or later
only order-0 pages will be available.
Performance considerations come second.

> The trick is to use page reuse like we do for the Intel NICs.  If you
> can get away with just reusing the page you don't have to keep making
> the expensive map/unmap calls.

You mean the two-packets-per-page trick?
I think it's trading off performance vs memory.
It's useful. I wish there were a knob to turn it on/off instead
of relying on the MTU size threshold.

> > I've started coding on the page-pool last week, which address both the
> > DMA mapping and recycling (with less atomic ops). (p.s. still on
> > vacation this week).
> >
> > http://people.netfilter.org/hawk/presentations/MM-summit2016/generic_page_pool_mm_summit2016.pdf
> 
> I really wonder if we couldn't get away with creating some sort of 2
> tiered allocator for this.  So instead of allocating a page pool we
> just reserved blocks of memory like we do with huge pages.  Then you
> have essentially a huge page that is mapped to a given device for DMA
> and reserved for it to use as a memory resource to allocate the order
> 0 pages out of.  Doing it that way would likely have multiple
> advantages when working with things like IOMMU since the pages would
> all belong to one linear block so it would likely consume less
> resources on those devices, and it wouldn't be that far off from how
> DPDK is making use of huge pages in order to improve it's memory
> access times and such.

Interesting idea. Like dma_map a 1GB region and then allocate
pages only from it? But then the rest of the kernel won't be able
to use them, so only some smaller region? Or would it be a
boot-time flag to reserve this pseudo-huge page?
I don't think any of that is needed for XDP. As demonstrated by current
mlx4 it's very fast already. No bottlenecks in page allocators.
Tiny page recycle array does the magic because most of the traffic
is not going to the stack.
This order-0 vs order-N discussion is for the main stack.
Not related to XDP.
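The "tiny page recycle array" mentioned above can be sketched as a per-ring cache of still-mapped pages (a simplified model, not the mlx4 code: the real driver parks DMA-mapped pages, while the page allocator is modelled here by malloc):

```c
#include <assert.h>
#include <stdlib.h>

#define PAGE_CACHE_SIZE 128	/* small, per-ring */

struct page_cache {
	void *pages[PAGE_CACHE_SIZE];
	int index;		/* number of cached pages */
};

/* Refill path: prefer a recycled, still-mapped page, and fall back
 * to the page allocator only when the cache is empty. */
static void *cache_get(struct page_cache *c)
{
	if (c->index > 0)
		return c->pages[--c->index];
	return malloc(4096);	/* fallback: fresh page */
}

/* Completion path: try to park the page (and, in a real driver, its
 * DMA mapping) for reuse. Returns 0 when the cache is full and the
 * caller must unmap and free instead. */
static int cache_put(struct page_cache *c, void *page)
{
	if (c->index >= PAGE_CACHE_SIZE)
		return 0;
	c->pages[c->index++] = page;
	return 1;
}
```

Because most XDP traffic never reaches the stack, nearly every completion immediately feeds the next refill, which is why such a small array suffices.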



Re: [RFC V2 PATCH 17/25] net/netpolicy: introduce netpolicy_pick_queue

2016-08-04 Thread Tom Herbert
On Thu, Aug 4, 2016 at 12:36 PM,   wrote:
> From: Kan Liang 
>
> To achieve better network performance, the key step is to distribute the
> packets to dedicated queues according to policy and system run time
> status.
>
> This patch provides an interface which can return the proper dedicated
> queue for socket/task. Then the packets of the socket/task will be
> redirect to the dedicated queue for better network performance.
>
> For selecting the proper queue, it currently uses a round-robin algorithm
> to find an available object from the given policy object list. The
> algorithm is good enough for now, but it could be improved by an
> adaptive algorithm later.
>
Seriously? You want to add all of this code so we revert to TX queue
selection by round robin?

> The selected object will be stored in hashtable. So it does not need to
> go through the whole object list every time.
>
> Signed-off-by: Kan Liang 
> ---
>  include/linux/netpolicy.h |   5 ++
>  net/core/netpolicy.c  | 136 
> ++
>  2 files changed, 141 insertions(+)
>
> diff --git a/include/linux/netpolicy.h b/include/linux/netpolicy.h
> index 5900252..a522015 100644
> --- a/include/linux/netpolicy.h
> +++ b/include/linux/netpolicy.h
> @@ -97,6 +97,7 @@ extern void update_netpolicy_sys_map(void);
>  extern int netpolicy_register(struct netpolicy_instance *instance,
>   enum netpolicy_name policy);
>  extern void netpolicy_unregister(struct netpolicy_instance *instance);
> +extern int netpolicy_pick_queue(struct netpolicy_instance *instance, bool 
> is_rx);
>  #else
>  static inline void update_netpolicy_sys_map(void)
>  {
> @@ -111,6 +112,10 @@ static inline void netpolicy_unregister(struct 
> netpolicy_instance *instance)
>  {
>  }
>
> +static inline int netpolicy_pick_queue(struct netpolicy_instance *instance, 
> bool is_rx)
> +{
> +   return 0;
> +}
>  #endif
>
>  #endif /*__LINUX_NETPOLICY_H*/
> diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
> index 3605761..98ca430 100644
> --- a/net/core/netpolicy.c
> +++ b/net/core/netpolicy.c
> @@ -290,6 +290,142 @@ static void netpolicy_record_clear_dev_node(struct 
> net_device *dev)
> spin_unlock_bh(&np_hashtable_lock);
>  }
>
> +static struct netpolicy_object *get_avail_object(struct net_device *dev,
> +enum netpolicy_name policy,
> +bool is_rx)
> +{
> +   int dir = is_rx ? NETPOLICY_RX : NETPOLICY_TX;
> +   struct netpolicy_object *tmp, *obj = NULL;
> +   int val = -1;
> +
> +   /* Check if net policy is supported */
> +   if (!dev || !dev->netpolicy)
> +   return NULL;
> +
> +   /* The system should have queues which support the request policy. */
> +   if ((policy != dev->netpolicy->cur_policy) &&
> +   (dev->netpolicy->cur_policy != NET_POLICY_MIX))
> +   return NULL;
> +
> +   spin_lock_bh(&dev->np_ob_list_lock);
> +   list_for_each_entry(tmp, &dev->netpolicy->obj_list[dir][policy], 
> list) {
> +   if ((val > atomic_read(&tmp->refcnt)) ||
> +   (val == -1)) {
> +   val = atomic_read(&tmp->refcnt);
> +   obj = tmp;
> +   }
> +   }
> +
> +   if (WARN_ON(!obj)) {
> +   spin_unlock_bh(&dev->np_ob_list_lock);
> +   return NULL;
> +   }
> +   atomic_inc(&obj->refcnt);
> +   spin_unlock_bh(&dev->np_ob_list_lock);
> +
> +   return obj;
> +}
> +
> +static int get_avail_queue(struct netpolicy_instance *instance, bool is_rx)
> +{
> +   struct netpolicy_record *old_record, *new_record;
> +   struct net_device *dev = instance->dev;
> +   unsigned long ptr_id = (uintptr_t)instance->ptr;
> +   int queue = -1;
> +
> +   spin_lock_bh(&np_hashtable_lock);
> +   old_record = netpolicy_record_search(ptr_id);
> +   if (!old_record) {
> +   pr_warn("NETPOLICY: doesn't registered. Remove net policy 
> settings!\n");
> +   instance->policy = NET_POLICY_INVALID;
> +   goto err;
> +   }
> +
> +   if (is_rx && old_record->rx_obj) {
> +   queue = old_record->rx_obj->queue;
> +   } else if (!is_rx && old_record->tx_obj) {
> +   queue = old_record->tx_obj->queue;
> +   } else {
> +   new_record = kzalloc(sizeof(*new_record), GFP_KERNEL);
> +   if (!new_record)
> +   goto err;
> +   memcpy(new_record, old_record, sizeof(*new_record));
> +
> +   if (is_rx) {
> +   new_record->rx_obj = get_avail_object(dev, 
> new_record->policy, is_rx);
> +   if (!new_record->dev)
> +   new_record->dev = dev;
> +   if (!new_record->rx_obj) {
> +   kfree(new_record);
> +

[PATCH] proc: make proc entries inherit ownership from parent

2016-08-04 Thread Dmitry Torokhov
There are certain parameters that belong to net namespace and that are
exported in /proc. They should be controllable by the container's owner,
but are currently owned by global root and thus not available.

Let's change the proc code to inherit ownership from the parent entry,
and when creating the per-ns "net" proc entry, set it up as owned by the
container's owner.

Signed-off-by: Dmitry Torokhov 
---
 fs/proc/generic.c  |  2 ++
 fs/proc/proc_net.c | 13 +
 2 files changed, 15 insertions(+)

diff --git a/fs/proc/generic.c b/fs/proc/generic.c
index c633476..bca66d8 100644
--- a/fs/proc/generic.c
+++ b/fs/proc/generic.c
@@ -390,6 +390,8 @@ static struct proc_dir_entry *__proc_create(struct 
proc_dir_entry **parent,
atomic_set(&ent->count, 1);
spin_lock_init(&ent->pde_unload_lock);
INIT_LIST_HEAD(&ent->pde_openers);
+   proc_set_user(ent, (*parent)->uid, (*parent)->gid);
+
 out:
return ent;
 }
diff --git a/fs/proc/proc_net.c b/fs/proc/proc_net.c
index c8bbc68..d701738 100644
--- a/fs/proc/proc_net.c
+++ b/fs/proc/proc_net.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -185,6 +186,8 @@ const struct file_operations proc_net_operations = {
 static __net_init int proc_net_ns_init(struct net *net)
 {
struct proc_dir_entry *netd, *net_statd;
+   kuid_t uid;
+   kgid_t gid;
int err;
 
err = -ENOMEM;
@@ -199,6 +202,16 @@ static __net_init int proc_net_ns_init(struct net *net)
netd->parent = &proc_root;
memcpy(netd->name, "net", 4);
 
+   uid = make_kuid(net->user_ns, 0);
+   if (!uid_valid(uid))
+   uid = GLOBAL_ROOT_UID;
+
+   gid = make_kgid(net->user_ns, 0);
+   if (!gid_valid(gid))
+   gid = GLOBAL_ROOT_GID;
+
+   proc_set_user(netd, uid, gid);
+
err = -EEXIST;
net_statd = proc_net_mkdir(net, "stat", netd);
if (!net_statd)
-- 
2.8.0.rc3.226.g39d4020


-- 
Dmitry


Re: [RFC V2 PATCH 00/25] Kernel NET policy

2016-08-04 Thread Alexei Starovoitov
On Wed, Dec 31, 2014 at 08:38:49PM -0500, kan.li...@intel.com wrote:
> 
> Changes since V1:
>  - Using work queue to set Rx network flow classification rules and search
>available NET policy object asynchronously.
>  - Using RCU lock to replace read-write lock
>  - Redo performance test and update performance results.
>  - Some minor modification for codes and documents.
>  - Remove i40e related patches which will be submitted in separate thread.

Most of the issues brought up in the prior submission were not addressed,
so one more NACK from me as well.
My objection with this approach is the same as others:
such policy doesn't belong in the kernel.

>  1. Why can't a userspace tool do the same thing?
> A: The kernel is more suitable for NET policy.
>- User space code would be far more complicated to get right and
>  perform well. It always needs to work with out-of-date state compared
>  to the latest, because it cannot do any locking with the kernel state.
>- User space code is less efficient than kernel code, because of the
>  additional context switches needed.
>- The kernel is in the right position to coordinate requests from
>  multiple users.

And the above excuses are the reason to hack flow director rules into the kernel?
You can do the same in user space. It's not a kernel job.



Re: order-0 vs order-N driver allocation. Was: [PATCH v10 07/12] net/mlx4_en: add page recycle to prepare rx ring for tx support

2016-08-04 Thread Alexander Duyck
On Thu, Aug 4, 2016 at 9:19 AM, Jesper Dangaard Brouer
 wrote:
>
> On Wed, 3 Aug 2016 10:45:13 -0700 Alexei Starovoitov 
>  wrote:
>
>> On Mon, Jul 25, 2016 at 09:35:20AM +0200, Eric Dumazet wrote:
>> > On Tue, 2016-07-19 at 12:16 -0700, Brenden Blanco wrote:
>> > > The mlx4 driver by default allocates order-3 pages for the ring to
>> > > consume in multiple fragments. When the device has an xdp program, this
>> > > behavior will prevent tx actions since the page must be re-mapped in
>> > > TODEVICE mode, which cannot be done if the page is still shared.
>> > >
>> > > Start by making the allocator configurable based on whether xdp is
>> > > running, such that order-0 pages are always used and never shared.
>> > >
>> > > Since this will stress the page allocator, add a simple page cache to
>> > > each rx ring. Pages in the cache are left dma-mapped, and in drop-only
>> > > stress tests the page allocator is eliminated from the perf report.
>> > >
>> > > Note that setting an xdp program will now require the rings to be
>> > > reconfigured.
>> >
>> > Again, this has nothing to do with XDP ?
>> >
>> > Please submit a separate patch, switching this driver to order-0
>> > allocations.
>> >
>> > I mentioned this order-3 vs order-0 issue earlier [1], and proposed to
>> > send a generic patch, but had been traveling lately, and am currently
>> > on vacation.
>> >
>> > order-3 pages are problematic when dealing with hostile traffic anyway,
>> > so we should exclusively use order-0 pages, and page recycling like
>> > Intel drivers.
>> >
>> > http://lists.openwall.net/netdev/2016/04/11/88
>>
>> Completely agree. These multi-page tricks work only for benchmarks and
>> not for production.
>> Eric, if you can submit that patch for mlx4 that would be awesome.
>>
>> I think we should default to order-0 for both mlx4 and mlx5.
>> Alternatively we're thinking to do a netlink or ethtool switch to
>> preserve old behavior, but frankly I don't see who needs this order-N
>> allocation schemes.
>
> I actually agree, that we should switch to order-0 allocations.
>
> *BUT* this will cause performance regressions on platforms with
> expensive DMA operations (as they no longer amortize the cost of
> mapping a larger page).

The trick is to use page reuse like we do for the Intel NICs.  If you
can get away with just reusing the page you don't have to keep making
the expensive map/unmap calls.

> Plus, the base cost of order-0 page is 246 cycles (see [1] slide#9),
> and the 10G wirespeed target is approx 201 cycles.  Thus, for these
> speeds some page recycling tricks are needed.  I described how the Intel
> drivers do a cool trick in [1] slide#14, but it does not address the
> DMA part and costs some extra atomic ops.

I'm not sure what you mean about it not addressing the DMA part.  Last
I knew we should be just as fast using the page reuse in the Intel
drivers as the Mellanox driver using the 32K page.  The only real
difference in cost is the spot where we are atomically incrementing
the page count since that is the atomic I assume you are referring to.

I had thought about it and amortizing the atomic operation would
probably be pretty straight forward.  All we would have to do is the
same trick we use in the page frag allocator.  We could add a separate
page_count type variable to the Rx buffer info structure and decrement
that instead.  If I am not mistaken that would allow us to drop it
down to only one atomic update of the page count every 64K or so uses
of the page.

> I've started coding on the page-pool last week, which address both the
> DMA mapping and recycling (with less atomic ops). (p.s. still on
> vacation this week).
>
> http://people.netfilter.org/hawk/presentations/MM-summit2016/generic_page_pool_mm_summit2016.pdf

I really wonder if we couldn't get away with creating some sort of 2
tiered allocator for this.  So instead of allocating a page pool we
just reserved blocks of memory like we do with huge pages.  Then you
have essentially a huge page that is mapped to a given device for DMA
and reserved for it to use as a memory resource to allocate the order
0 pages out of.  Doing it that way would likely have multiple
advantages when working with things like IOMMU since the pages would
all belong to one linear block so it would likely consume less
resources on those devices, and it wouldn't be that far off from how
DPDK is making use of huge pages in order to improve its memory
access times and such.

- Alex


Re: [RFC V2 PATCH 17/25] net/netpolicy: introduce netpolicy_pick_queue

2016-08-04 Thread Daniel Borkmann

On 08/05/2016 12:54 AM, Andi Kleen wrote:

+1, I tried to bring this up here [1] in the last spin. I think only very
few changes would be needed, f.e. on eBPF side to add a queue setting
helper function which is probably straight forward ~10loc patch; and with
regards to actually picking it up after clsact egress, we'd need to adapt
__netdev_pick_tx() slightly when CONFIG_XPS so it doesn't override it.


You're proposing to rewrite the whole net policy manager as EBPF and run
it in a crappy JITer? Is that a serious proposal? It just sounds crazy
to me.

Especially since we already have a perfectly good compiler and
programming language to write system code in.

EBPF is ok for temporal instrumentation (if you somehow can accept
its security challenges), but using it to replace core
kernel functionality (which network policy IMHO is) with some bizarre
JITed setup and multiple languages doesn't really make any sense.

Especially it doesn't make sense for anything with shared state,
which is the core part of network policy: it negotiates with multiple
users.

After all we're writing Linux here and not some research toy.


From what I read I guess you didn't really bother to look any deeper into
this bizarre "research toy" to double check some of your claims. One of the
things it's often deployed for by the way is defining policy. And the
suggestion here was merely to explore existing infrastructure around things
like tc and whether it already resolves at least a part of your net policy
manager's requirements (like queue selection) or whether existing infrastructure
can be extended with less complexity this way (as was mentioned with a new
cls module as one option).


Re: [PATCH] net: ethernet: ti: cpsw: split common driver data and slaves data

2016-08-04 Thread kbuild test robot
Hi Ivan,

[auto build test ERROR on net/master]
[also build test ERROR on next-20160804]
[cannot apply to net-next/master v4.7]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:
https://github.com/0day-ci/linux/commits/Ivan-Khoronzhuk/net-ethernet-ti-cpsw-split-common-driver-data-and-slaves-data/20160805-052837
config: arm-multi_v7_defconfig (attached as .config)
compiler: arm-linux-gnueabi-gcc (Debian 5.4.0-6) 5.4.0 20160609
reproduce:
wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=arm 

All error/warnings (new ones prefixed by >>):

>> drivers/net/ethernet/ti/cpsw.c:403:8: error: expected ';', identifier or '(' 
>> before 'struct'
static struct cpsw_common *cpsw;
   ^
   drivers/net/ethernet/ti/cpsw.c: In function 'cpsw_tx_poll':
>> drivers/net/ethernet/ti/cpsw.c:788:27: error: implicit declaration of 
>> function 'napi_to_priv' [-Werror=implicit-function-declaration]
 struct cpsw_priv *priv = napi_to_priv(napi_tx);
  ^
>> drivers/net/ethernet/ti/cpsw.c:788:27: warning: initialization makes pointer 
>> from integer without a cast [-Wint-conversion]
   drivers/net/ethernet/ti/cpsw.c: In function 'cpsw_rx_poll':
   drivers/net/ethernet/ti/cpsw.c:809:27: warning: initialization makes pointer 
from integer without a cast [-Wint-conversion]
 struct cpsw_priv *priv = napi_to_priv(napi_rx);
  ^
   drivers/net/ethernet/ti/cpsw.c: In function 'cpsw_ndo_open':
>> drivers/net/ethernet/ti/cpsw.c:1259:12: error: 'struct cpsw_priv' has no 
>> member named 'version'
 reg = priv->version;
   ^
   drivers/net/ethernet/ti/cpsw.c: In function 'cpsw_probe_dual_emac':
>> drivers/net/ethernet/ti/cpsw.c:2148:15: warning: unused variable 'i' 
>> [-Wunused-variable]
 int ret = 0, i;
  ^
   drivers/net/ethernet/ti/cpsw.c: In function 'cpsw_probe':
>> drivers/net/ethernet/ti/cpsw.c:2434:6: error: 'struct cpsw_common' has no 
>> member named 'irq'
 cpsw->irq = platform_get_irq(pdev, 1);
 ^
   drivers/net/ethernet/ti/cpsw.c:2435:10: error: 'struct cpsw_common' has no 
member named 'irq'
 if (cpsw->irq < 0) {
 ^
   drivers/net/ethernet/ti/cpsw.c:2437:13: error: 'struct cpsw_common' has no 
member named 'irq'
  ret = cpsw->irq;
^
   cc1: some warnings being treated as errors

vim +403 drivers/net/ethernet/ti/cpsw.c

   397  /* snapshot of IRQ numbers */
   398  u32 irqs_table[4];
   399  u32 num_irqs;
   400  struct cpts *cpts;
   401  }
   402  
 > 403  static struct cpsw_common *cpsw;
   404  
   405  struct cpsw_stats {
   406  char stat_string[ETH_GSTRING_LEN];

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: Binary data


Re: [RFC PATCH] sunrpc: do not allow process to freeze within RPC state machine

2016-08-04 Thread Cyrill Gorcunov
On Wed, Aug 03, 2016 at 08:54:50PM +0400, Stanislav Kinsburskiy wrote:
> Otherwise freezer cgroup state might never become "FROZEN".
> 
> Here is a deadlock scheme for 2 processes in one freezer cgroup, which is
> freezing:
> 
> CPU 0:
> do_last
> inode_lock(dir->d_inode)
> vfs_create
> nfs_create
> ...
> __rpc_execute
> rpc_wait_bit_killable
> __refrigerator
> 
> CPU 1:
> do_last
> inode_lock(dir->d_inode)
> 
> So, the problem is that one process takes directory inode mutex, executes
> creation request and goes to refrigerator.
> Another one waits till directory lock is released, remains "thawed" and thus
> freezer cgroup state never becomes "FROZEN".
> 
> Notes:
> 1) Interestingly, this is not a pure deadlock: one can thaw the cgroup and
> then freeze it again.
> 2) The issue was introduced by commit 
> d310310cbff18ec385c6ab4d58f33b100192a96a.
> 3) This patch is not aimed at fixing the issue, but at showing its root
> cause. It looks like this problem might be applicable to other hunks from
> the commit mentioned above.
> 
> Signed-off-by: Stanislav Kinsburskiy 

I think it's worth adding a backtrace as well:
---

=== pid: 708987 === (file_read)

[] __refrigerator+0x5b/0x190
[] rpc_wait_bit_killable+0x66/0x80 [sunrpc]
[] __rpc_execute+0x154/0x420 [sunrpc]
[] rpc_execute+0x5e/0xa0 [sunrpc]
[] rpc_run_task+0x70/0x90 [sunrpc]
[] rpc_call_sync+0x50/0xc0 [sunrpc]
[] nfs3_rpc_wrapper.constprop.10+0x6b/0xb0 [nfsv3]
[] nfs3_proc_setattr+0xbf/0x140 [nfsv3]
[] nfs3_proc_create+0x1a3/0x220 [nfsv3]
[] nfs_create+0x83/0x150 [nfs]
[] vfs_create+0x8c/0x110
[] do_last+0xc0d/0x11d0
[] path_openat+0xc2/0x460
[] do_filp_open+0x4b/0xb0
[] do_sys_open+0xf3/0x1f0
[] SyS_open+0x1e/0x20
[] system_call_fastpath+0x16/0x1b
[] 0x

=== pid: 708988 === (file_read)

[] do_last+0x283/0x11d0
[] path_openat+0xc2/0x460
[] do_filp_open+0x4b/0xb0
[] do_sys_open+0xf3/0x1f0
[] SyS_open+0x1e/0x20
[] system_call_fastpath+0x16/0x1b
[] 0x


Re: [RFC V2 PATCH 00/25] Kernel NET policy

2016-08-04 Thread Stephen Hemminger
On Wed, 31 Dec 2014 20:38:49 -0500
kan.li...@intel.com wrote:

>  5. Why disable IRQ balance?
> A: Disabling IRQ balance is a common way (the recommended way for some
> devices) to tune network performance.

I appreciate that network tuning is hard, most people get it wrong, and nobody
agrees on the right answer.

So rather than fixing existing tools or writing new userspace tools to do
network tuning, you want to hard-code one policy manager in the kernel with
a /proc interface. Why not make a good userspace tool (like powertop)?
There are also several IRQ-balancing programs, but since irqbalance was
championed by one vendor, others seem to follow like sheep.

I agree that this is a real concern, but the implementation leaves much to
be desired and discussed. Why can't this be done outside of the kernel?


Re: [PATCH] [v7] net: emac: emac gigabit ethernet controller driver

2016-08-04 Thread Lino Sanfilippo
Hi Timur,

On 03.08.2016 22:12, Timur Tabi wrote:


> +/* Fill up transmit descriptors */
> +static void emac_tx_fill_tpd(struct emac_adapter *adpt,
> +  struct emac_tx_queue *tx_q, struct sk_buff *skb,
> +  struct emac_tpd *tpd)
> +{
> + u16 nr_frags = skb_shinfo(skb)->nr_frags;
> + unsigned int len = skb_headlen(skb);
> + struct emac_buffer *tpbuf = NULL;
> + unsigned int mapped_len = 0;
> + unsigned int i;
> + int ret;
> +
> + /* if Large Segment Offload is (in TCP Segmentation Offload struct) */
> + if (TPD_LSO(tpd)) {
> + mapped_len = skb_transport_offset(skb) + tcp_hdrlen(skb);
> +
> + tpbuf = GET_TPD_BUFFER(tx_q, tx_q->tpd.produce_idx);
> + tpbuf->length = mapped_len;
> + tpbuf->dma_addr = dma_map_single(adpt->netdev->dev.parent,
> +  skb->data, mapped_len,
> +  DMA_TO_DEVICE);
> + ret = dma_mapping_error(adpt->netdev->dev.parent,
> + tpbuf->dma_addr);
> + if (ret) {
> + dev_kfree_skb(skb);
> + return;
> + }
> +
> + TPD_BUFFER_ADDR_L_SET(tpd, lower_32_bits(tpbuf->dma_addr));
> + TPD_BUFFER_ADDR_H_SET(tpd, upper_32_bits(tpbuf->dma_addr));

You should also take big-endian systems into account. This means that if
the multi-byte values in the descriptors require little-endian, you have
to convert from host byte order to little-endian and vice versa. You can
use cpu_to_le32() and friends for this.


> + TPD_BUF_LEN_SET(tpd, tpbuf->length);
> + emac_tx_tpd_create(adpt, tx_q, tpd);
> + }
> +
> + if (mapped_len < len) {
> + tpbuf = GET_TPD_BUFFER(tx_q, tx_q->tpd.produce_idx);
> + tpbuf->length = len - mapped_len;
> + tpbuf->dma_addr = dma_map_single(adpt->netdev->dev.parent,
> +  skb->data + mapped_len,
> +  tpbuf->length, DMA_TO_DEVICE);
> + ret = dma_mapping_error(adpt->netdev->dev.parent,
> + tpbuf->dma_addr);
> + if (ret) {
> + dev_kfree_skb(skb);
> + return;
> + }
> +
> + TPD_BUFFER_ADDR_L_SET(tpd, lower_32_bits(tpbuf->dma_addr));
> + TPD_BUFFER_ADDR_H_SET(tpd, upper_32_bits(tpbuf->dma_addr));
> + TPD_BUF_LEN_SET(tpd, tpbuf->length);
> + emac_tx_tpd_create(adpt, tx_q, tpd);
> + }
> +
> + for (i = 0; i < nr_frags; i++) {
> + struct skb_frag_struct *frag;
> +
> + frag = &skb_shinfo(skb)->frags[i];
> +
> + tpbuf = GET_TPD_BUFFER(tx_q, tx_q->tpd.produce_idx);
> + tpbuf->length = frag->size;
> + tpbuf->dma_addr = dma_map_page(adpt->netdev->dev.parent,
> +frag->page.p, frag->page_offset,
> +tpbuf->length, DMA_TO_DEVICE);
> + ret = dma_mapping_error(adpt->netdev->dev.parent,
> + tpbuf->dma_addr);
> + if (ret) {
> + dev_kfree_skb(skb);
> + return;
> + }

In case of error you need to undo all mappings that you have done so far.

> +
> + TPD_BUFFER_ADDR_L_SET(tpd, lower_32_bits(tpbuf->dma_addr));
> + TPD_BUFFER_ADDR_H_SET(tpd, upper_32_bits(tpbuf->dma_addr));
> + TPD_BUF_LEN_SET(tpd, tpbuf->length);
> + emac_tx_tpd_create(adpt, tx_q, tpd);
> + }
> +
> + /* The last tpd */
> + emac_tx_tpd_mark_last(adpt, tx_q);

Use a wmb() here to make sure that all writes to the descriptors in dma memory
are completed before you update the producer register (see memory-barriers.txt
for the reason why this is needed)

> + /* The last buffer info contain the skb address,
> +  * so it will be freed after unmap
> +  */
> + tpbuf->skb = skb;
> +}
> +
> +/* Transmit the packet using specified transmit queue */
> +int emac_mac_tx_buf_send(struct emac_adapter *adpt, struct emac_tx_queue 
> *tx_q,
> +  struct sk_buff *skb)
> +{
> + struct emac_tpd tpd;
> + u32 prod_idx;
> +
> + memset(&tpd, 0, sizeof(tpd));
> +
> + if (emac_tso_csum(adpt, tx_q, skb, &tpd) != 0) {
> + dev_kfree_skb_any(skb);
> + return NETDEV_TX_OK;
> + }
> +
> + if (skb_vlan_tag_present(skb)) {
> + u16 tag;
> +
> + EMAC_VLAN_TO_TAG(skb_vlan_tag_get(skb), tag);
> + TPD_CVLAN_TAG_SET(&tpd, tag);
> + TPD_INSTC_SET(&tpd, 1);
> + }
> +
> + if (skb_network_offset(skb) != ETH_HLEN)
> + TPD_TYP_SET(&tpd, 1);
> +
> + emac_tx_fill_tpd(adpt, tx_q, skb, &tpd);

RE: [E1000-devel] igb: question regarding auto-negotiation

2016-08-04 Thread Pieper, Jeffrey E


-Original Message-
From: Dominic Curran [mailto:dominic.cur...@citrix.com] 
Sent: Thursday, August 04, 2016 4:02 PM
To: Alexander Duyck ; 
e1000-de...@lists.sourceforge.net
Cc: Netdev 
Subject: Re: [E1000-devel] igb: question regarding auto-negotiation



On 07/29/2016 09:35 PM, Alexander Duyck wrote:
> On Fri, Jul 29, 2016 at 4:37 PM, Dominic Curran
>  wrote:
>> Hi
>>
>> This question refers to igb codebase.
>> I have a question regarding the setting of hw->mac.autoneg.
>>
>> Is it correct to say for igb driver:
>> "if speed=1000 and duplex=FULL and media_type=COPPER  then  only
>> auto-negotiate enable is supported"
>>
>> i.e.
>> with these settings (speed/duplex/media_type) then auto-negotiate can
>> _not_ be disabled.  Correct ?
>>
>> I say this for two reasons:
>> 1) The code in igb_set_spd_dplx() seems to indicate it:
>>
>> case SPEED_1000 + DUPLEX_FULL:
>>  mac->autoneg = 1;
>>  adapter->hw.phy.autoneg_advertised = ADVERTISE_1000_FULL;
>>  break;
>>
>> 2) Instrumenting the driver, I always see the autoneg code in
>> e1000_check_for_copper_link_generic()  get called after an igb_reset().
>>
>>
>> Have i understood correctly ?
>>
>> thanks in advance
>> dom
> If you are using copper then you are likely referring to 1000Base-T
> correct?  If so then autonegotiation is a requirement.
>
> Here is the wikipedia URL that refers to this:
> https://en.wikipedia.org/wiki/Gigabit_Ethernet#1000BASE-T
>
> Hope this helps to clear it up.
>
> Thanks.
>
> - Alex

Thanks for reply.  I read the wiki link and a bunch of other links besides.

So I have a follow-up question:

You're right, I am using copper and 1000Base-T, and as you say auto-neg
is then a requirement.

So why is it possible with the igb driver to set auto-neg to OFF ?

e.g.
# ethtool -a eth6
Pause parameters for eth6:
Autonegotiate:  on
RX: on
TX: on

Now turn it OFF:

# ethtool -A eth6 autoneg off

# ethtool -a eth6
Pause parameters for eth6:
Autonegotiate:  off
RX: on
TX: on

But having added debug to the driver, I _know_ that it is still
auto-negotiating, and printing the current settings indicates
'Auto-negotiate' is on...

# ethtool  eth6
Settings for eth6:
 Supported ports: [ TP ]
 Supported link modes:   10baseT/Half 10baseT/Full
 100baseT/Half 100baseT/Full
 1000baseT/Full
 Supported pause frame use: No
 Supports auto-negotiation: Yes
 Advertised link modes:  10baseT/Half 10baseT/Full
 100baseT/Half 100baseT/Full
 1000baseT/Full
 Advertised pause frame use: No
 Advertised auto-negotiation: Yes
 Speed: 1000Mb/s
 Duplex: Full
 Port: Twisted Pair
 PHYAD: 1
 Transceiver: internal
 Auto-negotiation: on 
 MDI-X: on
 Supports Wake-on: pumbg
 Wake-on: d
 Current message level: 0x0007 (7)
drv probe link
 Link detected: yes

So I don't understand the difference between these two values.
Can anyone help, please?

Thanks
dom

Dom,

ethtool -A is used to set flow-control parameters, not speed/duplex/autoneg.
Speed/duplex/autoneg parameters are set with ethtool -s; "man ethtool" should
help explain the difference.

I hope this helps,
Jeff
--
___
E1000-devel mailing list
e1000-de...@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel® Ethernet, visit 
http://communities.intel.com/community/wired


Re: [RFC V2 PATCH 17/25] net/netpolicy: introduce netpolicy_pick_queue

2016-08-04 Thread Daniel Borkmann

On 08/04/2016 10:21 PM, John Fastabend wrote:

On 16-08-04 12:36 PM, kan.li...@intel.com wrote:

From: Kan Liang 

To achieve better network performance, the key step is to distribute the
packets to dedicated queues according to policy and system run time
status.

This patch provides an interface which can return the proper dedicated
queue for socket/task. Then the packets of the socket/task will be
redirected to the dedicated queue for better network performance.

For selecting the proper queue, currently it uses round-robin algorithm
to find the available object from the given policy object list. The
algorithm is good enough for now. But it could be improved by some
adaptive algorithm later.

The selected object will be stored in hashtable. So it does not need to
go through the whole object list every time.

Signed-off-by: Kan Liang 
---
  include/linux/netpolicy.h |   5 ++
  net/core/netpolicy.c  | 136 ++
  2 files changed, 141 insertions(+)


There is a hook in the tx path now (recently added)

# ifdef CONFIG_NET_EGRESS
 if (static_key_false(&egress_needed)) {
 skb = sch_handle_egress(skb, &rc, dev);
 if (!skb)
 goto out;
 }
# endif

that allows pushing any policy you like for picking tx queues. It would
be better to use this mechanism. The hook runs 'tc' classifiers so
either write a new ./net/sch/cls_*.c for this or just use ebpf to stick
your policy in at runtime.

I'm out of the office for a few days but when I get back I can test that
it actually picks the selected queue in all cases. I know there was an
issue with some of the drivers using select_queue a while back.


+1, I tried to bring this up here [1] in the last spin. I think only very
few changes would be needed, f.e. on eBPF side to add a queue setting
helper function which is probably straight forward ~10loc patch; and with
regards to actually picking it up after clsact egress, we'd need to adapt
__netdev_pick_tx() slightly when CONFIG_XPS so it doesn't override it.

  [1] http://www.spinics.net/lists/netdev/msg386953.html


[PATCH net 2/3] mlxsw: spectrum: Do not override PAUSE settings

2016-08-04 Thread Ido Schimmel
The PFCC register is used to configure both PAUSE and PFC frames.
Therefore, when PFC frames are disabled we must make sure we don't
mistakenly also disable PAUSE frames (which might be enabled).

Fix this by packing the PFCC register with the current PAUSE settings.

Note that this register is also accessed via ethtool ops, but there we
are guaranteed to have PFC disabled.

Fixes: d81a6bdb87ce ("mlxsw: spectrum: Add IEEE 802.1Qbb PFC support")
Signed-off-by: Ido Schimmel 
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum_dcb.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_dcb.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_dcb.c
index 3c4a178..b6ed7f7 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_dcb.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_dcb.c
@@ -341,6 +341,8 @@ static int mlxsw_sp_port_pfc_set(struct mlxsw_sp_port *mlxsw_sp_port,
char pfcc_pl[MLXSW_REG_PFCC_LEN];
 
mlxsw_reg_pfcc_pack(pfcc_pl, mlxsw_sp_port->local_port);
+   mlxsw_reg_pfcc_pprx_set(pfcc_pl, mlxsw_sp_port->link.rx_pause);
+   mlxsw_reg_pfcc_pptx_set(pfcc_pl, mlxsw_sp_port->link.tx_pause);
mlxsw_reg_pfcc_prio_pack(pfcc_pl, pfc->pfc_en);
 
return mlxsw_reg_write(mlxsw_sp_port->mlxsw_sp->core, MLXSW_REG(pfcc),
-- 
2.8.2



[PATCH net v2 1/3] bpf: also call skb_postpush_rcsum on xmit occasions

2016-08-04 Thread Daniel Borkmann
Follow-up to commit f8ffad69c9f8 ("bpf: add skb_postpush_rcsum and fix
dev_forward_skb occasions") to fix an issue for dev_queue_xmit() redirect
locations which need CHECKSUM_COMPLETE fixups on ingress.

For the same reasons as described in f8ffad69c9f8 already, we of course
also need this here, since dev_queue_xmit() on a veth device will let us
end up in the dev_forward_skb() helper again to cross namespaces.

Latter then calls into skb_postpull_rcsum() to pull out L2 header, so
that netif_rx_internal() sees CHECKSUM_COMPLETE as it is expected. That
is, CHECKSUM_COMPLETE on ingress covering L2 _payload_, not L2 headers.

Also here we have to address bpf_redirect() and bpf_clone_redirect().

Fixes: 3896d655f4d4 ("bpf: introduce bpf_clone_redirect() helper")
Fixes: 27b29f63058d ("bpf: add bpf_redirect() helper")
Signed-off-by: Daniel Borkmann 
Acked-by: Alexei Starovoitov 
---
 net/core/filter.c | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index 5708999..c46244f 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -1365,6 +1365,12 @@ static inline int bpf_try_make_writable(struct sk_buff *skb,
return err;
 }
 
+static inline void bpf_push_mac_rcsum(struct sk_buff *skb)
+{
+   if (skb_at_tc_ingress(skb))
+   skb_postpush_rcsum(skb, skb_mac_header(skb), skb->mac_len);
+}
+
 static u64 bpf_skb_store_bytes(u64 r1, u64 r2, u64 r3, u64 r4, u64 flags)
 {
struct bpf_scratchpad *sp = this_cpu_ptr(&bpf_sp);
@@ -1607,9 +1613,6 @@ static const struct bpf_func_proto bpf_csum_diff_proto = {
 
 static inline int __bpf_rx_skb(struct net_device *dev, struct sk_buff *skb)
 {
-   if (skb_at_tc_ingress(skb))
-   skb_postpush_rcsum(skb, skb_mac_header(skb), skb->mac_len);
-
return dev_forward_skb(dev, skb);
 }
 
@@ -1648,6 +1651,8 @@ static u64 bpf_clone_redirect(u64 r1, u64 ifindex, u64 flags, u64 r4, u64 r5)
if (unlikely(!skb))
return -ENOMEM;
 
+   bpf_push_mac_rcsum(skb);
+
return flags & BPF_F_INGRESS ?
   __bpf_rx_skb(dev, skb) : __bpf_tx_skb(dev, skb);
 }
@@ -1693,6 +1698,8 @@ int skb_do_redirect(struct sk_buff *skb)
return -EINVAL;
}
 
+   bpf_push_mac_rcsum(skb);
+
return ri->flags & BPF_F_INGRESS ?
   __bpf_rx_skb(dev, skb) : __bpf_tx_skb(dev, skb);
 }
-- 
1.9.3



[PATCH net v2 3/3] bpf: fix checksum for vlan push/pop helper

2016-08-04 Thread Daniel Borkmann
When handling skbs on ingress with CHECKSUM_COMPLETE, tc BPF programs do
not push the rcsum of the mac header back in before the program runs and
pull it back out afterwards, as some other subsystems do (ovs, for example).

For cases like q-in-q, meaning when a vlan tag for offloading is already
present and we're about to push another one, then skb_vlan_push() pushes the
inner one into the skb, increasing mac header and skb_postpush_rcsum()'ing
the 4 bytes vlan header diff. Likewise, for the reverse operation in
skb_vlan_pop() for the case where vlan header needs to be pulled out of the
skb, we're decreasing the mac header and skb_postpull_rcsum()'ing the 4 bytes
rcsum of the vlan header that was removed.

However mangling the rcsum here will lead to hw csum failure for BPF case,
since we're pulling or pushing data that was not part of the current rcsum.
Changing tc BPF programs in general to push/pull rcsum around BPF_PROG_RUN()
is also not really an option since current behaviour is ABI by now, but apart
from that would also mean to do quite a bit of useless work in the sense that
usually 12 bytes need to be rcsum pushed/pulled also when we don't need to
touch this vlan related corner case. One way to fix it would be to push the
necessary rcsum fixup down into vlan helpers that are (mostly) slow-path
anyway.

Fixes: 4e10df9a60d9 ("bpf: introduce bpf_skb_vlan_push/pop() helpers")
Signed-off-by: Daniel Borkmann 
Acked-by: Alexei Starovoitov 
---
 net/core/filter.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/net/core/filter.c b/net/core/filter.c
index 5ecd5c9..b5add4e 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -1371,6 +1371,12 @@ static inline void bpf_push_mac_rcsum(struct sk_buff *skb)
skb_postpush_rcsum(skb, skb_mac_header(skb), skb->mac_len);
 }
 
+static inline void bpf_pull_mac_rcsum(struct sk_buff *skb)
+{
+   if (skb_at_tc_ingress(skb))
+   skb_postpull_rcsum(skb, skb_mac_header(skb), skb->mac_len);
+}
+
 static u64 bpf_skb_store_bytes(u64 r1, u64 r2, u64 r3, u64 r4, u64 flags)
 {
struct bpf_scratchpad *sp = this_cpu_ptr(&bpf_sp);
@@ -1763,7 +1769,10 @@ static u64 bpf_skb_vlan_push(u64 r1, u64 r2, u64 vlan_tci, u64 r4, u64 r5)
 vlan_proto != htons(ETH_P_8021AD)))
vlan_proto = htons(ETH_P_8021Q);
 
+   bpf_push_mac_rcsum(skb);
ret = skb_vlan_push(skb, vlan_proto, vlan_tci);
+   bpf_pull_mac_rcsum(skb);
+
bpf_compute_data_end(skb);
return ret;
 }
@@ -1783,7 +1792,10 @@ static u64 bpf_skb_vlan_pop(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5)
struct sk_buff *skb = (struct sk_buff *) (long) r1;
int ret;
 
+   bpf_push_mac_rcsum(skb);
ret = skb_vlan_pop(skb);
+   bpf_pull_mac_rcsum(skb);
+
bpf_compute_data_end(skb);
return ret;
 }
-- 
1.9.3



[PATCH net v2 2/3] bpf: fix checksum fixups on bpf_skb_store_bytes

2016-08-04 Thread Daniel Borkmann
bpf_skb_store_bytes() invocations above L2 header need BPF_F_RECOMPUTE_CSUM
flag for updates, so that CHECKSUM_COMPLETE will be fixed up along the way.
Where we ran into an issue with bpf_skb_store_bytes() is when we did a
single-byte update on the IPv6 hoplimit despite using BPF_F_RECOMPUTE_CSUM
flag; simple ping via ICMPv6 triggered a hw csum failure as a result. The
underlying issue has been tracked down to a buffer alignment issue.

Meaning, that csum_partial() computations via skb_postpull_rcsum() and
skb_postpush_rcsum() pair invoked had a wrong result since they operated on
an odd address for the hoplimit, while other computations were done on an
even address. This mix doesn't work as-is with skb_postpull_rcsum(),
skb_postpush_rcsum() pair as it always expects at least half-word alignment
of input buffers, which is normally the case. Thus, instead of these helpers
using csum_sub() and (implicitly) csum_add(), we need to use csum_block_sub(),
csum_block_add(), respectively. For unaligned offsets, they rotate the sum
to align it to a half-word boundary again, otherwise they work the same as
csum_sub() and csum_add().

Adding __skb_postpull_rcsum(), __skb_postpush_rcsum() variants that take the
offset as an input and adapting bpf_skb_store_bytes() to them fixes the hw
csum failures again. The skb_postpull_rcsum(), skb_postpush_rcsum() helpers
use a 0 constant for offset so that the compiler optimizes the offset & 1
test away and generates the same code as with csum_sub()/_add().

Fixes: 608cd71a9c7c ("tc: bpf: generalize pedit action")
Signed-off-by: Daniel Borkmann 
Acked-by: Alexei Starovoitov 
---
 include/linux/skbuff.h | 52 --
 net/core/filter.c  |  4 ++--
 2 files changed, 35 insertions(+), 21 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 6f0b3e0..0f665cb 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2847,6 +2847,18 @@ static inline int skb_linearize_cow(struct sk_buff *skb)
   __skb_linearize(skb) : 0;
 }
 
+static __always_inline void
+__skb_postpull_rcsum(struct sk_buff *skb, const void *start, unsigned int len,
+unsigned int off)
+{
+   if (skb->ip_summed == CHECKSUM_COMPLETE)
+   skb->csum = csum_block_sub(skb->csum,
+  csum_partial(start, len, 0), off);
+   else if (skb->ip_summed == CHECKSUM_PARTIAL &&
+skb_checksum_start_offset(skb) < 0)
+   skb->ip_summed = CHECKSUM_NONE;
+}
+
 /**
  * skb_postpull_rcsum - update checksum for received skb after pull
  * @skb: buffer to update
@@ -2857,36 +2869,38 @@ static inline int skb_linearize_cow(struct sk_buff *skb)
  * update the CHECKSUM_COMPLETE checksum, or set ip_summed to
  * CHECKSUM_NONE so that it can be recomputed from scratch.
  */
-
 static inline void skb_postpull_rcsum(struct sk_buff *skb,
  const void *start, unsigned int len)
 {
-   if (skb->ip_summed == CHECKSUM_COMPLETE)
-   skb->csum = csum_sub(skb->csum, csum_partial(start, len, 0));
-   else if (skb->ip_summed == CHECKSUM_PARTIAL &&
-skb_checksum_start_offset(skb) < 0)
-   skb->ip_summed = CHECKSUM_NONE;
+   __skb_postpull_rcsum(skb, start, len, 0);
 }
 
-unsigned char *skb_pull_rcsum(struct sk_buff *skb, unsigned int len);
+static __always_inline void
+__skb_postpush_rcsum(struct sk_buff *skb, const void *start, unsigned int len,
+unsigned int off)
+{
+   if (skb->ip_summed == CHECKSUM_COMPLETE)
+   skb->csum = csum_block_add(skb->csum,
+  csum_partial(start, len, 0), off);
+}
 
+/**
+ * skb_postpush_rcsum - update checksum for received skb after push
+ * @skb: buffer to update
+ * @start: start of data after push
+ * @len: length of data pushed
+ *
+ * After doing a push on a received packet, you need to call this to
+ * update the CHECKSUM_COMPLETE checksum.
+ */
 static inline void skb_postpush_rcsum(struct sk_buff *skb,
  const void *start, unsigned int len)
 {
-   /* For performing the reverse operation to skb_postpull_rcsum(),
-* we can instead of ...
-*
-*   skb->csum = csum_add(skb->csum, csum_partial(start, len, 0));
-*
-* ... just use this equivalent version here to save a few
-* instructions. Feeding csum of 0 in csum_partial() and later
-* on adding skb->csum is equivalent to feed skb->csum in the
-* first place.
-*/
-   if (skb->ip_summed == CHECKSUM_COMPLETE)
-   skb->csum = csum_partial(start, len, skb->csum);
+   __skb_postpush_rcsum(skb, start, len, 0);
 }
 
+unsigned char *skb_pull_rcsum(struct sk_buff *skb, unsigned int len);
+
 /**
  * skb_push_rcsum - push skb and update receive checksum
  * @

[PATCH net v2 0/3] Few BPF helper related checksum fixes

2016-08-04 Thread Daniel Borkmann
The set contains three fixes with regards to CHECKSUM_COMPLETE
and BPF helper functions. For details please see individual
patches.

Thanks!

v1 -> v2:
  - Fixed make htmldocs issue reported by kbuild bot.
  - Rest as is.

Daniel Borkmann (3):
  bpf: also call skb_postpush_rcsum on xmit occasions
  bpf: fix checksum fixups on bpf_skb_store_bytes
  bpf: fix checksum for vlan push/pop helper

 include/linux/skbuff.h | 52 --
 net/core/filter.c  | 29 +++-
 2 files changed, 57 insertions(+), 24 deletions(-)

-- 
1.9.3



Re: [PATCH 2/2] net: core: ethtool: add ringparam perqueue command

2016-08-04 Thread Ivan Khoronzhuk

Please, ignore it
It was sent by mistake

On 05.08.16 00:11, Ivan Khoronzhuk wrote:

It is a useful feature to be able to configure the number of buffers for
every queue.

Signed-off-by: Ivan Khoronzhuk 
---
 include/linux/ethtool.h |   4 ++
 net/core/ethtool.c  | 104 
 2 files changed, 108 insertions(+)

diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
index 7e64c17..7109736 100644
--- a/include/linux/ethtool.h
+++ b/include/linux/ethtool.h
@@ -372,6 +372,10 @@ struct ethtool_ops {
  struct ethtool_coalesce *);
int (*get_per_queue_bandwidth)(struct net_device *, u32, int *);
int (*set_per_queue_bandwidth)(struct net_device *, u32, int);
+   int (*get_per_queue_ringparam)(struct net_device *, u32,
+  struct ethtool_ringparam *);
+   int (*set_per_queue_ringparam)(struct net_device *, u32,
+  struct ethtool_ringparam *);
int (*get_link_ksettings)(struct net_device *,
  struct ethtool_link_ksettings *);
int (*set_link_ksettings)(struct net_device *,
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index f31d539..42a7cb3 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -2347,6 +2347,104 @@ static int ethtool_get_per_queue_coalesce(struct net_device *dev,
 }

 static int
+ethtool_get_per_queue_ringparam(struct net_device *dev,
+   void __user *useraddr,
+   struct ethtool_per_queue_op *per_queue_opt)
+{
+   u32 bit;
+   int ret;
+   DECLARE_BITMAP(queue_mask, MAX_NUM_QUEUE);
+
+   if (!dev->ethtool_ops->get_per_queue_ringparam)
+   return -EOPNOTSUPP;
+
+   useraddr += sizeof(*per_queue_opt);
+
+   bitmap_from_u32array(queue_mask,
+MAX_NUM_QUEUE,
+per_queue_opt->queue_mask,
+DIV_ROUND_UP(MAX_NUM_QUEUE, 32));
+
+   for_each_set_bit(bit, queue_mask, MAX_NUM_QUEUE) {
+   struct ethtool_ringparam
+   ringparam = { .cmd = ETHTOOL_GRINGPARAM };
+
+   ret = dev->ethtool_ops->get_per_queue_ringparam(dev, bit,
+   &ringparam);
+   if (ret != 0)
+   return ret;
+   if (copy_to_user(useraddr, &ringparam, sizeof(ringparam)))
+   return -EFAULT;
+   useraddr += sizeof(ringparam);
+   }
+
+   return 0;
+}
+
+static int
+ethtool_set_per_queue_ringparam(struct net_device *dev,
+   void __user *useraddr,
+   struct ethtool_per_queue_op *per_queue_opt)
+{
+   struct ethtool_ringparam *backup = NULL, *tmp = NULL;
+   DECLARE_BITMAP(queue_mask, MAX_NUM_QUEUE);
+   int i, ret = 0;
+   int n_queue;
+   u32 bit;
+
+   if ((!dev->ethtool_ops->set_per_queue_ringparam) ||
+   (!dev->ethtool_ops->get_per_queue_ringparam))
+   return -EOPNOTSUPP;
+
+   useraddr += sizeof(*per_queue_opt);
+
+   bitmap_from_u32array(queue_mask,
+MAX_NUM_QUEUE,
+per_queue_opt->queue_mask,
+DIV_ROUND_UP(MAX_NUM_QUEUE, 32));
+   n_queue = bitmap_weight(queue_mask, MAX_NUM_QUEUE);
+   tmp = kmalloc_array(n_queue, sizeof(*backup), GFP_KERNEL);
+   if (!tmp)
+   return -ENOMEM;
+   backup = tmp;
+
+   for_each_set_bit(bit, queue_mask, MAX_NUM_QUEUE) {
+   struct ethtool_ringparam
+   ringparam = { .cmd = ETHTOOL_SRINGPARAM };
+
+   ret = dev->ethtool_ops->get_per_queue_ringparam(dev, bit, tmp);
+   if (ret != 0)
+   goto roll_back;
+
+   tmp++;
+
+   if (copy_from_user(&ringparam, useraddr, sizeof(ringparam))) {
+   ret = -EFAULT;
+   goto roll_back;
+   }
+
+   ret = dev->ethtool_ops->set_per_queue_ringparam(dev, bit,
+   &ringparam);
+   if (ret != 0)
+   goto roll_back;
+
+   useraddr += sizeof(ringparam);
+   }
+
+roll_back:
+   if (ret != 0) {
+   tmp = backup;
+   for_each_set_bit(i, queue_mask, bit) {
+   dev->ethtool_ops->set_per_queue_ringparam(dev, i, tmp);
+   tmp++;
+   }
+   }
+   kfree(backup);
+
+   return ret;
+}
+
+static int
 ethtool_get_per_queue_bandwidth(struct net_device *dev,
void __user *useraddr,
struct ethtool_per_queue_op *per_queue_opt)
@@ -2509,

Re: [PATCH 1/2] net: core: ethtool: add per queue bandwidth command

2016-08-04 Thread Ivan Khoronzhuk

Please, ignore it
It was sent by mistake

On 05.08.16 00:11, Ivan Khoronzhuk wrote:

Signed-off-by: Ivan Khoronzhuk 
---
 include/linux/ethtool.h  |   4 ++
 include/uapi/linux/ethtool.h |   2 +
 net/core/ethtool.c   | 102 +++
 3 files changed, 108 insertions(+)

diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
index 9ded8c6..7e64c17 100644
--- a/include/linux/ethtool.h
+++ b/include/linux/ethtool.h
@@ -273,6 +273,8 @@ bool ethtool_convert_link_mode_to_legacy_u32(u32 
*legacy_u32,
  * a TX queue has this number, return -EINVAL. If only a RX queue or a TX
  * queue has this number, ignore the inapplicable fields.
  * Returns a negative error code or zero.
+ * @get_per_queue_bandwidth: get per-queue bandwidth
+ * @set_per_queue_bandwidth: set per-queue bandwidth
  * @get_link_ksettings: When defined, takes precedence over the
  * %get_settings method. Get various device settings
  * including Ethernet link settings. The %cmd and
@@ -368,6 +370,8 @@ struct ethtool_ops {
  struct ethtool_coalesce *);
int (*set_per_queue_coalesce)(struct net_device *, u32,
  struct ethtool_coalesce *);
+   int (*get_per_queue_bandwidth)(struct net_device *, u32, int *);
+   int (*set_per_queue_bandwidth)(struct net_device *, u32, int);
int (*get_link_ksettings)(struct net_device *,
  struct ethtool_link_ksettings *);
int (*set_link_ksettings)(struct net_device *,
diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h
index b8f38e8..0fcfe9e 100644
--- a/include/uapi/linux/ethtool.h
+++ b/include/uapi/linux/ethtool.h
@@ -1314,6 +1314,8 @@ struct ethtool_per_queue_op {

 #define ETHTOOL_GLINKSETTINGS  0x004c /* Get ethtool_link_settings */
 #define ETHTOOL_SLINKSETTINGS  0x004d /* Set ethtool_link_settings */
+#define ETHTOOL_GBANDWIDTH 0x004e /* Get ethtool per queue bandwidth */
+#define ETHTOOL_SBANDWIDTH 0x004f /* Set ethtool per queue bandwidth */


 /* compatibility with older code */
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index 9774898..f31d539 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -2346,6 +2346,102 @@ static int ethtool_get_per_queue_coalesce(struct net_device *dev,
return 0;
 }

+static int
+ethtool_get_per_queue_bandwidth(struct net_device *dev,
+   void __user *useraddr,
+   struct ethtool_per_queue_op *per_queue_opt)
+{
+   u32 bit;
+   int ret;
+   DECLARE_BITMAP(queue_mask, MAX_NUM_QUEUE);
+
+   if (!dev->ethtool_ops->get_per_queue_bandwidth)
+   return -EOPNOTSUPP;
+
+   useraddr += sizeof(*per_queue_opt);
+
+   bitmap_from_u32array(queue_mask,
+MAX_NUM_QUEUE,
+per_queue_opt->queue_mask,
+DIV_ROUND_UP(MAX_NUM_QUEUE, 32));
+
+   for_each_set_bit(bit, queue_mask, MAX_NUM_QUEUE) {
+   int bandwidth;
+
+   ret = dev->ethtool_ops->get_per_queue_bandwidth(dev, bit,
+   &bandwidth);
+   if (ret != 0)
+   return ret;
+   if (copy_to_user(useraddr, &bandwidth, sizeof(bandwidth)))
+   return -EFAULT;
+   useraddr += sizeof(bandwidth);
+   }
+
+   return 0;
+}
+
+static int
+ethtool_set_per_queue_bandwidth(struct net_device *dev,
+   void __user *useraddr,
+   struct ethtool_per_queue_op *per_queue_opt)
+{
+   u32 bit;
+   int n_queue;
+   int i, ret = 0;
+   int *backup = NULL, *tmp = NULL;
+   DECLARE_BITMAP(queue_mask, MAX_NUM_QUEUE);
+
+   if ((!dev->ethtool_ops->set_per_queue_bandwidth) ||
+   (!dev->ethtool_ops->get_per_queue_bandwidth))
+   return -EOPNOTSUPP;
+
+   useraddr += sizeof(*per_queue_opt);
+
+   bitmap_from_u32array(queue_mask,
+MAX_NUM_QUEUE,
+per_queue_opt->queue_mask,
+DIV_ROUND_UP(MAX_NUM_QUEUE, 32));
+   n_queue = bitmap_weight(queue_mask, MAX_NUM_QUEUE);
+   tmp = kmalloc_array(n_queue, sizeof(*backup), GFP_KERNEL);
+   if (!tmp)
+   return -ENOMEM;
+   backup = tmp;
+
+   for_each_set_bit(bit, queue_mask, MAX_NUM_QUEUE) {
+   int bandwidth;
+
+   ret = dev->ethtool_ops->get_per_queue_bandwidth(dev, bit, tmp);
+   if (ret != 0)
+   goto roll_back;
+
+   tmp++;
+
+   if (copy_from_user(&bandwidth, useraddr, sizeof(bandwidth))) {
+   ret = -EFAULT;
+   goto roll_back;
+   }
+
+  

Re: [PATCH] priority improvement

2016-08-04 Thread Ivan Khoronzhuk


Please, ignore it
It was sent by mistake

On 05.08.16 00:11, Ivan Khoronzhuk wrote:

Signed-off-by: Ivan Khoronzhuk 
---
 drivers/net/ethernet/ti/cpsw.c | 45 +-
 1 file changed, 18 insertions(+), 27 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index 9ddaccc..cd12f52 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -788,22 +788,16 @@ static int cpsw_tx_poll(struct napi_struct *napi_tx, int budget)
 {
struct cpsw_priv*priv = napi_to_priv(napi_tx);
int num_tx, ch;
-   u32 ch_map;
+   unsigned long   ch_map;

/* process every unprocessed channel */
-   ch_map = cpdma_ctrl_txchs_state(priv->dma);
-   for (ch = 0, num_tx = 0; num_tx < budget; ch_map >>= 1, ch++) {
-   if (!ch_map) {
-   ch_map = cpdma_ctrl_txchs_state(priv->dma);
-   if (!ch_map)
-   break;
-
-   ch = 0;
-   }
-
-   if (!(ch_map & 0x01))
-   continue;
+   for (num_tx = 0; num_tx < budget;) {
+   ch_map = cpdma_ctrl_txchs_state(priv->dma);
+   if (!ch_map)
+   break;

+   /* process beginning from higher priority queue */
+   ch = __fls(ch_map);
num_tx += cpdma_chan_process(priv->txch[ch], budget - num_tx);
}

@@ -829,19 +823,13 @@ static int cpsw_rx_poll(struct napi_struct *napi_rx, int budget)
u32 ch_map;

/* process every unprocessed channel */
-   ch_map = cpdma_ctrl_rxchs_state(priv->dma);
-   for (ch = 0, num_rx = 0; num_rx < budget; ch_map >>= 1, ch++) {
-   if (!ch_map) {
-   ch_map = cpdma_ctrl_rxchs_state(priv->dma);
-   if (!ch_map)
-   break;
-
-   ch = 0;
-   }
-
-   if (!(ch_map & 0x01))
-   continue;
+   for (num_rx = 0; num_rx < budget;) {
+   ch_map = cpdma_ctrl_rxchs_state(priv->dma);
+   if (!ch_map)
+   break;

+   /* process beginning from higher priority queue */
+   ch = __fls(ch_map);
num_rx += cpdma_chan_process(priv->rxch[ch], budget - num_rx);
}

@@ -1130,8 +1118,11 @@ cpsw_tx_queue_mapping(struct cpsw_priv *priv, struct sk_buff *skb)
 {
unsigned int q_idx = skb_get_queue_mapping(skb);

-   if (q_idx >= priv->tx_ch_num)
-   q_idx = q_idx % priv->tx_ch_num;
+   /* cpsw h/w has backward order queue priority, 7 - highest */
+   if (likely(q_idx < priv->tx_ch_num))
+   q_idx = priv->tx_ch_num - q_idx - 1;
+   else
+   q_idx = 0;

return priv->txch[q_idx];
 }



--
Regards,
Ivan Khoronzhuk


Re: [PATCH 0/2] Add ability to configure ethernet h/w shaper

2016-08-04 Thread Ivan Khoronzhuk

Please, ignore it
It was sent by mistake

On 05.08.16 00:11, Ivan Khoronzhuk wrote:

These two patches can be used to set per queue bandwidth with ethtool.
I've created them as a logical continuation of the patchset from Intel
that introduced the per-queue setting command for the ethtool interface
a month ago
(http://kernel.opensuse.org/cgit/kernel-source/commit/?h=rpm-4.4.9-36&
id=feaab26abfffe381fb4c8c10d2762a753d481c6c). I've not tested this
interface yet and plan to send it in parallel with
"net: ethernet: ti: cpsw: add multi-queue support"
(https://lkml.org/lkml/2016/6/30/603), as it contains only changes to
ethtool interface.

The first patch can be used to set per-channel bandwidth, the second to
tune the number of per-channel descriptors. It can solve the issues
described by Schuyler. If the per-channel bandwidth is set to the maximum
for every channel, the driver could be switched to priority mode.

Ivan Khoronzhuk (2):
  net: core: ethtool: add per queue bandwidth command
  net: core: ethtool: add ringparam perqueue command

 include/linux/ethtool.h  |   8 ++
 include/uapi/linux/ethtool.h |   2 +
 net/core/ethtool.c   | 206 +++
 3 files changed, 216 insertions(+)



--
Regards,
Ivan Khoronzhuk


[PATCH 3/3] net: ethernet: ti: cpsw: split common driver data and private net data

2016-08-04 Thread Ivan Khoronzhuk
Simplify the driver by splitting common driver data and net device
private data. In dual_emac mode, 2 network devices are created, each
of which contains its own private data. But the 2 net devices share a
bunch of h/w resources that shouldn't be duplicated.
This patch leads to the following:
- no functional changes
- reduce code size
- reduce memory usage
- reduce number of conversion to priv function
- reduce number of arguments for some functions
- increase code readability
- create prerequisites to add multichannel support,
  when channels are shared between net devices

Signed-off-by: Ivan Khoronzhuk 
---
 drivers/net/ethernet/ti/cpsw.c | 775 +++--
 1 file changed, 364 insertions(+), 411 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index 85ee9f5..7a84515 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -141,8 +141,8 @@ do {
\
 #define CPSW_CMINTMIN_INTVL((1000 / CPSW_CMINTMAX_CNT) + 1)
 
 #define cpsw_slave_index(priv) \
-   ((priv->data.dual_emac) ? priv->emac_port : \
-   priv->data.active_slave)
+   ((cpsw->data.dual_emac) ? priv->emac_port : \
+   cpsw->data.active_slave)
 
 static int debug_level;
 module_param(debug_level, int, 0);
@@ -364,29 +364,34 @@ static inline void slave_write(struct cpsw_slave *slave, u32 val, u32 offset)
 }
 
 struct cpsw_priv {
-   struct platform_device  *pdev;
struct net_device   *ndev;
-   struct napi_struct  napi_rx;
-   struct napi_struct  napi_tx;
struct device   *dev;
+   u8  mac_addr[ETH_ALEN];
+   boolrx_pause;
+   booltx_pause;
+   u32 msg_enable;
+   u32 emac_port;
+};
+
+struct cpsw_common {
+   struct net_device   *ndev; /* holds base ndev */
+   struct platform_device  *pdev;
struct cpsw_platform_data   data;
+   struct napi_struct  napi_rx;
+   struct napi_struct  napi_tx;
+   struct cpdma_chan   *txch, *rxch;
+   struct cpsw_slave   *slaves;
struct cpsw_ss_regs __iomem *regs;
struct cpsw_wr_regs __iomem *wr_regs;
u8 __iomem  *hw_stats;
struct cpsw_host_regs __iomem   *host_port_regs;
-   u32 msg_enable;
-   u32 version;
-   u32 coal_intvl;
-   u32 bus_freq_mhz;
-   int rx_packet_max;
struct clk  *clk;
-   u8  mac_addr[ETH_ALEN];
-   struct cpsw_slave   *slaves;
struct cpdma_ctlr   *dma;
-   struct cpdma_chan   *txch, *rxch;
struct cpsw_ale *ale;
-   boolrx_pause;
-   booltx_pause;
+   int rx_packet_max;
+   u32 bus_freq_mhz;
+   u32 version;
+   u32 coal_intvl;
boolquirk_irq;
boolrx_irq_disabled;
booltx_irq_disabled;
@@ -394,9 +399,10 @@ struct cpsw_priv {
u32 irqs_table[4];
u32 num_irqs;
struct cpts *cpts;
-   u32 emac_port;
 };
 
+static struct cpsw_common *cpsw;
+
 struct cpsw_stats {
char stat_string[ETH_GSTRING_LEN];
int type;
@@ -485,78 +491,79 @@ static const struct cpsw_stats cpsw_gstrings_stats[] = {
 
 #define CPSW_STATS_LEN ARRAY_SIZE(cpsw_gstrings_stats)
 
-#define napi_to_priv(napi) container_of(napi, struct cpsw_priv, napi)
 #define for_each_slave(priv, func, arg...) \
do {\
struct cpsw_slave *slave;   \
int n;  \
-   if (priv->data.dual_emac)   \
-   (func)((priv)->slaves + priv->emac_port, ##arg);\
+   if (cpsw->data.dual_emac)   \
+   (func)(cpsw->slaves + priv->emac_port, ##arg);\
else\
-   for (n = (priv)->data.slaves,   \
-   slave = (priv)->slaves; \
+   fo

[PATCH] net: ethernet: ti: cpsw: split common driver data and slaves data

2016-08-04 Thread Ivan Khoronzhuk
Signed-off-by: Ivan Khoronzhuk 
---
 drivers/net/ethernet/ti/cpsw.c | 761 +++--
 1 file changed, 359 insertions(+), 402 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index c51f346..38b04bf 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -141,8 +141,8 @@ do {
\
 #define CPSW_CMINTMIN_INTVL((1000 / CPSW_CMINTMAX_CNT) + 1)
 
 #define cpsw_slave_index(priv) \
-   ((priv->data.dual_emac) ? priv->emac_port : \
-   priv->data.active_slave)
+   ((cpsw->data.dual_emac) ? priv->emac_port : \
+   cpsw->data.active_slave)
 
 static int debug_level;
 module_param(debug_level, int, 0);
@@ -364,29 +364,33 @@ static inline void slave_write(struct cpsw_slave *slave, u32 val, u32 offset)
 }
 
 struct cpsw_priv {
-   struct platform_device  *pdev;
struct net_device   *ndev;
-   struct napi_struct  napi_rx;
-   struct napi_struct  napi_tx;
struct device   *dev;
+   u8  mac_addr[ETH_ALEN];
+   boolrx_pause;
+   booltx_pause;
+   u32 msg_enable;
+   u32 emac_port;
+};
+
+struct cpsw_common {
+   struct platform_device  *pdev;
struct cpsw_platform_data   data;
+   struct napi_struct  napi_rx;
+   struct napi_struct  napi_tx;
+   struct cpdma_chan   *txch, *rxch;
+   struct cpsw_slave   *slaves;
struct cpsw_ss_regs __iomem *regs;
struct cpsw_wr_regs __iomem *wr_regs;
u8 __iomem  *hw_stats;
struct cpsw_host_regs __iomem   *host_port_regs;
-   u32 msg_enable;
-   u32 version;
-   u32 coal_intvl;
-   u32 bus_freq_mhz;
-   int rx_packet_max;
struct clk  *clk;
-   u8  mac_addr[ETH_ALEN];
-   struct cpsw_slave   *slaves;
struct cpdma_ctlr   *dma;
-   struct cpdma_chan   *txch, *rxch;
struct cpsw_ale *ale;
-   boolrx_pause;
-   booltx_pause;
+   int rx_packet_max;
+   u32 bus_freq_mhz;
+   u32 version;
+   u32 coal_intvl;
boolquirk_irq;
boolrx_irq_disabled;
booltx_irq_disabled;
@@ -394,8 +398,9 @@ struct cpsw_priv {
u32 irqs_table[4];
u32 num_irqs;
struct cpts *cpts;
-   u32 emac_port;
-};
+};
+
+static struct cpsw_common *cpsw;
 
 struct cpsw_stats {
char stat_string[ETH_GSTRING_LEN];
@@ -485,78 +490,79 @@ static const struct cpsw_stats cpsw_gstrings_stats[] = {
 
 #define CPSW_STATS_LEN ARRAY_SIZE(cpsw_gstrings_stats)
 
-#define napi_to_priv(napi) container_of(napi, struct cpsw_priv, napi)
 #define for_each_slave(priv, func, arg...) \
do {\
struct cpsw_slave *slave;   \
int n;  \
-   if (priv->data.dual_emac)   \
-   (func)((priv)->slaves + priv->emac_port, ##arg);\
+   if (cpsw->data.dual_emac)   \
+   (func)(cpsw->slaves + priv->emac_port, ##arg);\
else\
-   for (n = (priv)->data.slaves,   \
-   slave = (priv)->slaves; \
+   for (n = cpsw->data.slaves, \
+   slave = cpsw->slaves;   \
n; n--) \
(func)(slave++, ##arg); \
} while (0)
-#define cpsw_get_slave_ndev(priv, __slave_no__)
\
-   ((__slave_no__ < priv->data.slaves) ?   \
-   priv->slaves[__slave_no__].ndev : NULL)
-#define cpsw_get_slave_priv(priv, __slave_no__)
\
-   (((__slave_no__ < priv->data.slaves) && \
-   (priv->slaves[__

[PATCH 2/3] net: ethernet: ti: cpsw: remove redundant check in napi poll

2016-08-04 Thread Ivan Khoronzhuk
No need to check the number of handled packets, as in most cases (> 99%)
it's not 0. It can be 0 only in rare cases, and even then it's not bad
to print just 0.

Signed-off-by: Ivan Khoronzhuk 
---
 drivers/net/ethernet/ti/cpsw.c | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index 8972bf6..85ee9f5 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -793,9 +793,7 @@ static int cpsw_tx_poll(struct napi_struct *napi_tx, int budget)
}
}
 
-   if (num_tx)
-   cpsw_dbg(priv, intr, "poll %d tx pkts\n", num_tx);
-
+   cpsw_dbg(priv, intr, "poll %d tx pkts\n", num_tx);
return num_tx;
 }
 
@@ -814,9 +812,7 @@ static int cpsw_rx_poll(struct napi_struct *napi_rx, int budget)
}
}
 
-   if (num_rx)
-   cpsw_dbg(priv, intr, "poll %d rx pkts\n", num_rx);
-
+   cpsw_dbg(priv, intr, "poll %d rx pkts\n", num_rx);
return num_rx;
 }
 
-- 
1.9.1



[PATCH 1/3] net: ethernet: ti: cpsw: simplify submit routine

2016-08-04 Thread Ivan Khoronzhuk
As the second net device is created only in dual_emac mode, the port
number can be figured out in a simpler way. There is also no need to
pass the redundant ndev struct.

Signed-off-by: Ivan Khoronzhuk 
---
 drivers/net/ethernet/ti/cpsw.c | 18 +-
 1 file changed, 5 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index c51f346..8972bf6 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -1065,19 +1065,11 @@ static int cpsw_common_res_usage_state(struct cpsw_priv *priv)
return usage_count;
 }
 
-static inline int cpsw_tx_packet_submit(struct net_device *ndev,
-   struct cpsw_priv *priv, struct sk_buff *skb)
+static inline int cpsw_tx_packet_submit(struct cpsw_priv *priv,
+   struct sk_buff *skb)
 {
-   if (!priv->data.dual_emac)
-   return cpdma_chan_submit(priv->txch, skb, skb->data,
- skb->len, 0);
-
-   if (ndev == cpsw_get_slave_ndev(priv, 0))
-   return cpdma_chan_submit(priv->txch, skb, skb->data,
- skb->len, 1);
-   else
-   return cpdma_chan_submit(priv->txch, skb, skb->data,
- skb->len, 2);
+   return cpdma_chan_submit(priv->txch, skb, skb->data, skb->len,
+priv->emac_port + priv->data.dual_emac);
 }
 
 static inline void cpsw_add_dual_emac_def_ale_entries(
@@ -1406,7 +1398,7 @@ static netdev_tx_t cpsw_ndo_start_xmit(struct sk_buff *skb,
 
skb_tx_timestamp(skb);
 
-   ret = cpsw_tx_packet_submit(ndev, priv, skb);
+   ret = cpsw_tx_packet_submit(priv, skb);
if (unlikely(ret != 0)) {
cpsw_err(priv, tx_err, "desc submit failed\n");
goto fail;
-- 
1.9.1



Re: [PATCH] net: ethernet: ti: cpsw: split common driver data and slaves data

2016-08-04 Thread Ivan Khoronzhuk

Please, ignore it.
It was sent by mistake.

On 05.08.16 00:11, Ivan Khoronzhuk wrote:

Signed-off-by: Ivan Khoronzhuk 
---
 drivers/net/ethernet/ti/cpsw.c | 761 +++--
 1 file changed, 359 insertions(+), 402 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index c51f346..38b04bf 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -141,8 +141,8 @@ do {
\
 #define CPSW_CMINTMIN_INTVL((1000 / CPSW_CMINTMAX_CNT) + 1)

 #define cpsw_slave_index(priv) \
-   ((priv->data.dual_emac) ? priv->emac_port :   \
-   priv->data.active_slave)
+   ((cpsw->data.dual_emac) ? priv->emac_port :   \
+   cpsw->data.active_slave)

 static int debug_level;
 module_param(debug_level, int, 0);
@@ -364,29 +364,33 @@ static inline void slave_write(struct cpsw_slave *slave, u32 val, u32 offset)
 }

 struct cpsw_priv {
-   struct platform_device  *pdev;
struct net_device   *ndev;
-   struct napi_struct  napi_rx;
-   struct napi_struct  napi_tx;
struct device   *dev;
+   u8  mac_addr[ETH_ALEN];
+   boolrx_pause;
+   booltx_pause;
+   u32 msg_enable;
+   u32 emac_port;
+};
+
+struct cpsw_common {
+   struct platform_device  *pdev;
struct cpsw_platform_data   data;
+   struct napi_struct  napi_rx;
+   struct napi_struct  napi_tx;
+   struct cpdma_chan   *txch, *rxch;
+   struct cpsw_slave   *slaves;
struct cpsw_ss_regs __iomem *regs;
struct cpsw_wr_regs __iomem *wr_regs;
u8 __iomem  *hw_stats;
struct cpsw_host_regs __iomem   *host_port_regs;
-   u32 msg_enable;
-   u32 version;
-   u32 coal_intvl;
-   u32 bus_freq_mhz;
-   int rx_packet_max;
struct clk  *clk;
-   u8  mac_addr[ETH_ALEN];
-   struct cpsw_slave   *slaves;
struct cpdma_ctlr   *dma;
-   struct cpdma_chan   *txch, *rxch;
struct cpsw_ale *ale;
-   boolrx_pause;
-   booltx_pause;
+   int rx_packet_max;
+   u32 bus_freq_mhz;
+   u32 version;
+   u32 coal_intvl;
boolquirk_irq;
boolrx_irq_disabled;
booltx_irq_disabled;
@@ -394,8 +398,9 @@ struct cpsw_priv {
u32 irqs_table[4];
u32 num_irqs;
struct cpts *cpts;
-   u32 emac_port;
-};
+};
+
+static struct cpsw_common *cpsw;

 struct cpsw_stats {
char stat_string[ETH_GSTRING_LEN];
@@ -485,78 +490,79 @@ static const struct cpsw_stats cpsw_gstrings_stats[] = {

 #define CPSW_STATS_LEN ARRAY_SIZE(cpsw_gstrings_stats)

-#define napi_to_priv(napi) container_of(napi, struct cpsw_priv, napi)
 #define for_each_slave(priv, func, arg...) \
do {\
struct cpsw_slave *slave;   \
int n;  \
-   if (priv->data.dual_emac)\
-   (func)((priv)->slaves + priv->emac_port, ##arg);\
+   if (cpsw->data.dual_emac)\
+   (func)(cpsw->slaves + priv->emac_port, ##arg);\
else\
-   for (n = (priv)->data.slaves,\
-   slave = (priv)->slaves;  \
+   for (n = cpsw->data.slaves,  \
+   slave = cpsw->slaves;\
n; n--) \
(func)(slave++, ##arg); \
} while (0)
-#define cpsw_get_slave_ndev(priv, __slave_no__)
\
-   ((__slave_no__ < priv->data.slaves) ? \
-   priv->slaves[__slave_no__].ndev : NULL)
-#define cpsw_get_slave_priv(priv, __slave_no__)
\
-   (((_

[PATCH 0/2] Add ability to configure ethernet h/w shaper

2016-08-04 Thread Ivan Khoronzhuk
These two patches can be used to set per queue bandwidth with ethtool.
I've created them as a logical continuation of the patchset from Intel
that introduced the per-queue setting command for the ethtool interface
a month ago
(http://kernel.opensuse.org/cgit/kernel-source/commit/?h=rpm-4.4.9-36&
id=feaab26abfffe381fb4c8c10d2762a753d481c6c). I've not tested this
interface yet and plan to send it in parallel with
"net: ethernet: ti: cpsw: add multi-queue support"
(https://lkml.org/lkml/2016/6/30/603), as it contains only changes to
ethtool interface.

The first patch can be used to set per-channel bandwidth, the second to
tune the number of per-channel descriptors. It can solve the issues
described by Schuyler. If the per-channel bandwidth is set to the maximum
for every channel, the driver could be switched to priority mode.

Ivan Khoronzhuk (2):
  net: core: ethtool: add per queue bandwidth command
  net: core: ethtool: add ringparam perqueue command

 include/linux/ethtool.h  |   8 ++
 include/uapi/linux/ethtool.h |   2 +
 net/core/ethtool.c   | 206 +++
 3 files changed, 216 insertions(+)

-- 
1.9.1



[PATCH 0/3] net: ethernet: ti: cpsw: split driver data and per ndev data

2016-08-04 Thread Ivan Khoronzhuk
In dual_emac mode the driver can handle 2 network devices. Each of them can use
its own private data and common data/resources. This patchset splits the common
driver data/resources and the private per net device data.

It doesn't have a bad impact on performance.

Based on net-next/master

Ivan Khoronzhuk (3):
  net: ethernet: ti: cpsw: simplify submit routine
  net: ethernet: ti: cpsw: remove redundant check in napi poll
  net: ethernet: ti: cpsw: split common driver data and private net data

 drivers/net/ethernet/ti/cpsw.c | 797 +++--
 1 file changed, 369 insertions(+), 428 deletions(-)

-- 
1.9.1



[PATCH] priority improvement

2016-08-04 Thread Ivan Khoronzhuk
Signed-off-by: Ivan Khoronzhuk 
---
 drivers/net/ethernet/ti/cpsw.c | 45 +-
 1 file changed, 18 insertions(+), 27 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index 9ddaccc..cd12f52 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -788,22 +788,16 @@ static int cpsw_tx_poll(struct napi_struct *napi_tx, int budget)
 {
struct cpsw_priv*priv = napi_to_priv(napi_tx);
int num_tx, ch;
-   u32 ch_map;
+   unsigned long   ch_map;
 
/* process every unprocessed channel */
-   ch_map = cpdma_ctrl_txchs_state(priv->dma);
-   for (ch = 0, num_tx = 0; num_tx < budget; ch_map >>= 1, ch++) {
-   if (!ch_map) {
-   ch_map = cpdma_ctrl_txchs_state(priv->dma);
-   if (!ch_map)
-   break;
-
-   ch = 0;
-   }
-
-   if (!(ch_map & 0x01))
-   continue;
+   for (num_tx = 0; num_tx < budget;) {
+   ch_map = cpdma_ctrl_txchs_state(priv->dma);
+   if (!ch_map)
+   break;
 
+   /* process beginning from higher priority queue */
+   ch = __fls(ch_map);
num_tx += cpdma_chan_process(priv->txch[ch], budget - num_tx);
}
 
@@ -829,19 +823,13 @@ static int cpsw_rx_poll(struct napi_struct *napi_rx, int budget)
u32 ch_map;
 
/* process every unprocessed channel */
-   ch_map = cpdma_ctrl_rxchs_state(priv->dma);
-   for (ch = 0, num_rx = 0; num_rx < budget; ch_map >>= 1, ch++) {
-   if (!ch_map) {
-   ch_map = cpdma_ctrl_rxchs_state(priv->dma);
-   if (!ch_map)
-   break;
-
-   ch = 0;
-   }
-
-   if (!(ch_map & 0x01))
-   continue;
+   for (num_rx = 0; num_rx < budget;) {
+   ch_map = cpdma_ctrl_rxchs_state(priv->dma);
+   if (!ch_map)
+   break;
 
+   /* process beginning from higher priority queue */
+   ch = __fls(ch_map);
num_rx += cpdma_chan_process(priv->rxch[ch], budget - num_rx);
}
 
@@ -1130,8 +1118,11 @@ cpsw_tx_queue_mapping(struct cpsw_priv *priv, struct sk_buff *skb)
 {
unsigned int q_idx = skb_get_queue_mapping(skb);
 
-   if (q_idx >= priv->tx_ch_num)
-   q_idx = q_idx % priv->tx_ch_num;
+   /* cpsw h/w has backward order queue priority, 7 - highest */
+   if (likely(q_idx < priv->tx_ch_num))
+   q_idx = priv->tx_ch_num - q_idx - 1;
+   else
+   q_idx = 0;
 
return priv->txch[q_idx];
 }
-- 
1.9.1



[PATCH 2/2] net: core: ethtool: add ringparam perqueue command

2016-08-04 Thread Ivan Khoronzhuk
It is useful to be able to configure the number of buffers for
every queue.

Signed-off-by: Ivan Khoronzhuk 
---
 include/linux/ethtool.h |   4 ++
 net/core/ethtool.c  | 104 
 2 files changed, 108 insertions(+)

diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
index 7e64c17..7109736 100644
--- a/include/linux/ethtool.h
+++ b/include/linux/ethtool.h
@@ -372,6 +372,10 @@ struct ethtool_ops {
  struct ethtool_coalesce *);
int (*get_per_queue_bandwidth)(struct net_device *, u32, int *);
int (*set_per_queue_bandwidth)(struct net_device *, u32, int);
+   int (*get_per_queue_ringparam)(struct net_device *, u32,
+  struct ethtool_ringparam *);
+   int (*set_per_queue_ringparam)(struct net_device *, u32,
+  struct ethtool_ringparam *);
int (*get_link_ksettings)(struct net_device *,
  struct ethtool_link_ksettings *);
int (*set_link_ksettings)(struct net_device *,
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index f31d539..42a7cb3 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -2347,6 +2347,104 @@ static int ethtool_get_per_queue_coalesce(struct net_device *dev,
 }
 
 static int
+ethtool_get_per_queue_ringparam(struct net_device *dev,
+   void __user *useraddr,
+   struct ethtool_per_queue_op *per_queue_opt)
+{
+   u32 bit;
+   int ret;
+   DECLARE_BITMAP(queue_mask, MAX_NUM_QUEUE);
+
+   if (!dev->ethtool_ops->get_per_queue_ringparam)
+   return -EOPNOTSUPP;
+
+   useraddr += sizeof(*per_queue_opt);
+
+   bitmap_from_u32array(queue_mask,
+MAX_NUM_QUEUE,
+per_queue_opt->queue_mask,
+DIV_ROUND_UP(MAX_NUM_QUEUE, 32));
+
+   for_each_set_bit(bit, queue_mask, MAX_NUM_QUEUE) {
+   struct ethtool_ringparam
+   ringparam = { .cmd = ETHTOOL_GRINGPARAM };
+
+   ret = dev->ethtool_ops->get_per_queue_ringparam(dev, bit,
+   &ringparam);
+   if (ret != 0)
+   return ret;
+   if (copy_to_user(useraddr, &ringparam, sizeof(ringparam)))
+   return -EFAULT;
+   useraddr += sizeof(ringparam);
+   }
+
+   return 0;
+}
+
+static int
+ethtool_set_per_queue_ringparam(struct net_device *dev,
+   void __user *useraddr,
+   struct ethtool_per_queue_op *per_queue_opt)
+{
+   struct ethtool_ringparam *backup = NULL, *tmp = NULL;
+   DECLARE_BITMAP(queue_mask, MAX_NUM_QUEUE);
+   int i, ret = 0;
+   int n_queue;
+   u32 bit;
+
+   if ((!dev->ethtool_ops->set_per_queue_ringparam) ||
+   (!dev->ethtool_ops->get_per_queue_ringparam))
+   return -EOPNOTSUPP;
+
+   useraddr += sizeof(*per_queue_opt);
+
+   bitmap_from_u32array(queue_mask,
+MAX_NUM_QUEUE,
+per_queue_opt->queue_mask,
+DIV_ROUND_UP(MAX_NUM_QUEUE, 32));
+   n_queue = bitmap_weight(queue_mask, MAX_NUM_QUEUE);
+   tmp = kmalloc_array(n_queue, sizeof(*backup), GFP_KERNEL);
+   if (!tmp)
+   return -ENOMEM;
+   backup = tmp;
+
+   for_each_set_bit(bit, queue_mask, MAX_NUM_QUEUE) {
+   struct ethtool_ringparam
+   ringparam = { .cmd = ETHTOOL_SRINGPARAM };
+
+   ret = dev->ethtool_ops->get_per_queue_ringparam(dev, bit, tmp);
+   if (ret != 0)
+   goto roll_back;
+
+   tmp++;
+
+   if (copy_from_user(&ringparam, useraddr, sizeof(ringparam))) {
+   ret = -EFAULT;
+   goto roll_back;
+   }
+
+   ret = dev->ethtool_ops->set_per_queue_ringparam(dev, bit,
+   &ringparam);
+   if (ret != 0)
+   goto roll_back;
+
+   useraddr += sizeof(ringparam);
+   }
+
+roll_back:
+   if (ret != 0) {
+   tmp = backup;
+   for_each_set_bit(i, queue_mask, bit) {
+   dev->ethtool_ops->set_per_queue_ringparam(dev, i, tmp);
+   tmp++;
+   }
+   }
+   kfree(backup);
+
+   return ret;
+}
+
+static int
 ethtool_get_per_queue_bandwidth(struct net_device *dev,
void __user *useraddr,
struct ethtool_per_queue_op *per_queue_opt)
@@ -2509,6 +2607,12 @@ static int ethtool_set_per_queue(struct net_device *dev, void __user *

[PATCH 1/2] net: core: ethtool: add per queue bandwidth command

2016-08-04 Thread Ivan Khoronzhuk
Signed-off-by: Ivan Khoronzhuk 
---
 include/linux/ethtool.h  |   4 ++
 include/uapi/linux/ethtool.h |   2 +
 net/core/ethtool.c   | 102 +++
 3 files changed, 108 insertions(+)

diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
index 9ded8c6..7e64c17 100644
--- a/include/linux/ethtool.h
+++ b/include/linux/ethtool.h
@@ -273,6 +273,8 @@ bool ethtool_convert_link_mode_to_legacy_u32(u32 *legacy_u32,
  * a TX queue has this number, return -EINVAL. If only a RX queue or a TX
  * queue has this number, ignore the inapplicable fields.
  * Returns a negative error code or zero.
+ * @get_per_queue_bandwidth: get per-queue bandwidth
+ * @set_per_queue_bandwidth: set per-queue bandwidth
  * @get_link_ksettings: When defined, takes precedence over the
  * %get_settings method. Get various device settings
  * including Ethernet link settings. The %cmd and
@@ -368,6 +370,8 @@ struct ethtool_ops {
  struct ethtool_coalesce *);
int (*set_per_queue_coalesce)(struct net_device *, u32,
  struct ethtool_coalesce *);
+   int (*get_per_queue_bandwidth)(struct net_device *, u32, int *);
+   int (*set_per_queue_bandwidth)(struct net_device *, u32, int);
int (*get_link_ksettings)(struct net_device *,
  struct ethtool_link_ksettings *);
int (*set_link_ksettings)(struct net_device *,
diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h
index b8f38e8..0fcfe9e 100644
--- a/include/uapi/linux/ethtool.h
+++ b/include/uapi/linux/ethtool.h
@@ -1314,6 +1314,8 @@ struct ethtool_per_queue_op {
 
 #define ETHTOOL_GLINKSETTINGS  0x004c /* Get ethtool_link_settings */
 #define ETHTOOL_SLINKSETTINGS  0x004d /* Set ethtool_link_settings */
+#define ETHTOOL_GBANDWIDTH 0x004e /* Get ethtool per queue bandwidth */
+#define ETHTOOL_SBANDWIDTH 0x004f /* Set ethtool per queue bandwidth */
 
 
 /* compatibility with older code */
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index 9774898..f31d539 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -2346,6 +2346,102 @@ static int ethtool_get_per_queue_coalesce(struct net_device *dev,
return 0;
 }
 
+static int
+ethtool_get_per_queue_bandwidth(struct net_device *dev,
+   void __user *useraddr,
+   struct ethtool_per_queue_op *per_queue_opt)
+{
+   u32 bit;
+   int ret;
+   DECLARE_BITMAP(queue_mask, MAX_NUM_QUEUE);
+
+   if (!dev->ethtool_ops->get_per_queue_bandwidth)
+   return -EOPNOTSUPP;
+
+   useraddr += sizeof(*per_queue_opt);
+
+   bitmap_from_u32array(queue_mask,
+MAX_NUM_QUEUE,
+per_queue_opt->queue_mask,
+DIV_ROUND_UP(MAX_NUM_QUEUE, 32));
+
+   for_each_set_bit(bit, queue_mask, MAX_NUM_QUEUE) {
+   int bandwidth;
+
+   ret = dev->ethtool_ops->get_per_queue_bandwidth(dev, bit,
+   &bandwidth);
+   if (ret != 0)
+   return ret;
+   if (copy_to_user(useraddr, &bandwidth, sizeof(bandwidth)))
+   return -EFAULT;
+   useraddr += sizeof(bandwidth);
+   }
+
+   return 0;
+}
+
+static int
+ethtool_set_per_queue_bandwidth(struct net_device *dev,
+   void __user *useraddr,
+   struct ethtool_per_queue_op *per_queue_opt)
+{
+   u32 bit;
+   int n_queue;
+   int i, ret = 0;
+   int *backup = NULL, *tmp = NULL;
+   DECLARE_BITMAP(queue_mask, MAX_NUM_QUEUE);
+
+   if ((!dev->ethtool_ops->set_per_queue_bandwidth) ||
+   (!dev->ethtool_ops->get_per_queue_bandwidth))
+   return -EOPNOTSUPP;
+
+   useraddr += sizeof(*per_queue_opt);
+
+   bitmap_from_u32array(queue_mask,
+MAX_NUM_QUEUE,
+per_queue_opt->queue_mask,
+DIV_ROUND_UP(MAX_NUM_QUEUE, 32));
+   n_queue = bitmap_weight(queue_mask, MAX_NUM_QUEUE);
+   tmp = kmalloc_array(n_queue, sizeof(*backup), GFP_KERNEL);
+   if (!tmp)
+   return -ENOMEM;
+   backup = tmp;
+
+   for_each_set_bit(bit, queue_mask, MAX_NUM_QUEUE) {
+   int bandwidth;
+
+   ret = dev->ethtool_ops->get_per_queue_bandwidth(dev, bit, tmp);
+   if (ret != 0)
+   goto roll_back;
+
+   tmp++;
+
+   if (copy_from_user(&bandwidth, useraddr, sizeof(bandwidth))) {
+   ret = -EFAULT;
+   goto roll_back;
+   }
+
+   ret = dev->ethtool_ops->set_per_queue_bandwidth(dev, bit,
+  

Re: [RFC PATCH] sunrpc: do not allow process to freeze within RPC state machine

2016-08-04 Thread Stanislav Kinsburskiy



03.08.2016 19:36, Jeff Layton пишет:

On Wed, 2016-08-03 at 20:54 +0400, Stanislav Kinsburskiy wrote:

Otherwise freezer cgroup state might never become "FROZEN".

Here is a deadlock scheme for 2 processes in one freezer cgroup,
which is
freezing:

CPU 0   CPU 1

do_last
inode_lock(dir->d_inode)
vfs_create
nfs_create
...
__rpc_execute
rpc_wait_bit_killable
__refrigerator
 do_last
 inode_lock(dir->d_inode)

So, the problem is that one process takes the directory inode mutex, executes
the creation request and goes to the refrigerator.
Another one waits till the directory lock is released, remains "thawed", and
thus the freezer cgroup state never becomes "FROZEN".

Notes:
1) Interestingly, this is not a pure deadlock: one can thaw the cgroup and then
freeze it again.
2) The issue was introduced by commit
d310310cbff18ec385c6ab4d58f33b100192a96a.
3) This patch is not aimed at fixing the issue, but at showing the root of the
problem. It looks like this problem might be applicable to other hunks from
the commit mentioned above.


Signed-off-by: Stanislav Kinsburskiy 
---
  net/sunrpc/sched.c |1 -
  1 file changed, 1 deletion(-)

diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
index 9ae5885..ec7ccc1 100644
--- a/net/sunrpc/sched.c
+++ b/net/sunrpc/sched.c
@@ -253,7 +253,6 @@ EXPORT_SYMBOL_GPL(rpc_destroy_wait_queue);
  
  static int rpc_wait_bit_killable(struct wait_bit_key *key, int mode)

  {
-   freezable_schedule_unsafe();
if (signal_pending_state(mode, current))
return -ERESTARTSYS;
return 0;


Ummm...so what actually does the schedule() with this patch?


schedule() replaces freezable_schedule_unsafe(), of course; sorry for this.


There was a bit of discussion on this recently -- see the thread with
this subject line in linux-nfs:

 Re: Hang due to nfs letting tasks freeze with locked inodes


Thanks, had a look.


Basically it comes down to this:

All of the proposals so far to fix this problem just switch out the
freezable_schedule_unsafe (and similar) calls for those that don't
allow the process to freeze.

The problem there is that we originally added that stuff in response to
bug reports about machines failing to suspend. What often happens is
that the network interfaces come down, and then the freezer runs over
all of the processes, which never return because they're blocked
waiting on the server to reply.


I probably don't understand something, but this sounds somewhat wrong to
me: freezing processes _after_ the network is down.





...shrug...

Maybe we should just go ahead and do it (and to CIFS as well). Just be
prepared for the inevitable complaints about laptops failing to suspend
once you do.


The worst part in all of this, from my point of view, is that the current
behavior makes NFS non-freezable in the generic case, even when freezing a
container which has its own net namespace and NFS mount.
So, I would say that returning to the previous logic would make the
world better.


Part of the fix, I think is to add a return code (similar to
ERESTARTSYS) that gets interpreted near the kernel-userland boundary
as: "allow the process to be frozen, and then retry the call once it's
resumed".

With that, filesystems could return the error code when they want to
redrive the entire syscall from that level. That won't work for non-
idempotent requests though. We'd need to do something more elaborate
there.



It might be that breaking an RPC request is something that should be avoided
entirely.
With all these locks being held, almost all (any?) of the requests to a
remote server should be considered an atomic operation from the freezer's
point of view.
The process can always be frozen on signal handling.

IOW, it might be worth considering a scenario where NFS is not freezable at
all, and any problems with suspend on laptops/whatever have to be solved in
the suspend code.





Re: [v5.1] ucc_fast: Fix to avoid IS_ERR_VALUE abuses and dead code on 64bit systems.

2016-08-04 Thread Arnd Bergmann
On Thursday, August 4, 2016 10:22:43 PM CEST Arvind Yadav wrote:
> index df8ea79..ada9070 100644
> --- a/include/soc/fsl/qe/ucc_fast.h
> +++ b/include/soc/fsl/qe/ucc_fast.h
> @@ -165,10 +165,12 @@ struct ucc_fast_private {
> int stopped_tx; /* Whether channel has been stopped for Tx
>(STOP_TX, etc.) */
> int stopped_rx; /* Whether channel has been stopped for Rx */
> -   u32 ucc_fast_tx_virtual_fifo_base_offset;/* pointer to base of Tx
> -   virtual fifo */
> -   u32 ucc_fast_rx_virtual_fifo_base_offset;/* pointer to base of Rx
> -   virtual fifo */
> +   unsigned long ucc_fast_tx_virtual_fifo_base_offset;/* pointer to base of
> +   * Tx virtual fifo
> +   */
> +   unsigned long ucc_fast_rx_virtual_fifo_base_offset;/* pointer to base of
> +   * Rx virtual fifo
> +   */
>  #ifdef STATISTICS
> u32 tx_frames;  /* Transmitted frames counter. */
> u32 rx_frames;  /* Received frames counter (only frames
> 

This change seems ok, but what about the other u32 variables in ucc_geth.c
that get checked for IS_ERR_VALUE?

Arnd




[PATCH iproute v2 3/5] ip6tnl: Support for fou encapsulation

2016-08-04 Thread Tom Herbert
Signed-off-by: Tom Herbert 
---
 ip/link_ip6tnl.c | 92 +++-
 1 file changed, 91 insertions(+), 1 deletion(-)

diff --git a/ip/link_ip6tnl.c b/ip/link_ip6tnl.c
index 89861c6..59162a3 100644
--- a/ip/link_ip6tnl.c
+++ b/ip/link_ip6tnl.c
@@ -37,6 +37,9 @@ static void print_usage(FILE *f)
fprintf(f, "  [ dev PHYS_DEV ] [ encaplimit ELIM ]\n");
	fprintf(f, "  [ hoplimit HLIM ] [ tclass TCLASS ] [ flowlabel FLOWLABEL ]\n");
fprintf(f, "  [ dscp inherit ] [ fwmark inherit ]\n");
+   fprintf(f, "  [ noencap ] [ encap { fou | gue | none } ]\n");
+   fprintf(f, "  [ encap-sport PORT ] [ encap-dport PORT ]\n");
+   fprintf(f, "  [ [no]encap-csum ] [ [no]encap-csum6 ] [ [no]encap-remcsum ]\n");
fprintf(f, "\n");
fprintf(f, "Where: NAME  := STRING\n");
fprintf(f, "   ADDR  := IPV6_ADDRESS\n");
@@ -82,6 +85,10 @@ static int ip6tunnel_parse_opt(struct link_util *lu, int argc, char **argv,
__u32 flags = 0;
__u32 link = 0;
__u8 proto = 0;
+   __u16 encaptype = 0;
+   __u16 encapflags = TUNNEL_ENCAP_FLAG_CSUM6;
+   __u16 encapsport = 0;
+   __u16 encapdport = 0;
 
if (!(n->nlmsg_flags & NLM_F_CREATE)) {
if (rtnl_talk(&rth, &req.n, &req.n, sizeof(req)) < 0) {
@@ -182,7 +189,7 @@ get_failed:
if (get_u8(&uval, *argv, 0))
invarg("invalid HLIM", *argv);
hop_limit = uval;
-   } else if (matches(*argv, "encaplimit") == 0) {
+   } else if (strcmp(*argv, "encaplimit") == 0) {
NEXT_ARG();
if (strcmp(*argv, "none") == 0) {
flags |= IP6_TNL_F_IGN_ENCAP_LIMIT;
@@ -236,6 +243,40 @@ get_failed:
if (strcmp(*argv, "inherit") != 0)
invarg("not inherit", *argv);
flags |= IP6_TNL_F_USE_ORIG_FWMARK;
+   } else if (strcmp(*argv, "noencap") == 0) {
+   encaptype = TUNNEL_ENCAP_NONE;
+   } else if (strcmp(*argv, "encap") == 0) {
+   NEXT_ARG();
+   if (strcmp(*argv, "fou") == 0)
+   encaptype = TUNNEL_ENCAP_FOU;
+   else if (strcmp(*argv, "gue") == 0)
+   encaptype = TUNNEL_ENCAP_GUE;
+   else if (strcmp(*argv, "none") == 0)
+   encaptype = TUNNEL_ENCAP_NONE;
+   else
+   invarg("Invalid encap type.", *argv);
+   } else if (strcmp(*argv, "encap-sport") == 0) {
+   NEXT_ARG();
+   if (strcmp(*argv, "auto") == 0)
+   encapsport = 0;
+   else if (get_u16(&encapsport, *argv, 0))
+   invarg("Invalid source port.", *argv);
+   } else if (strcmp(*argv, "encap-dport") == 0) {
+   NEXT_ARG();
+   if (get_u16(&encapdport, *argv, 0))
+   invarg("Invalid destination port.", *argv);
+   } else if (strcmp(*argv, "encap-csum") == 0) {
+   encapflags |= TUNNEL_ENCAP_FLAG_CSUM;
+   } else if (strcmp(*argv, "noencap-csum") == 0) {
+   encapflags &= ~TUNNEL_ENCAP_FLAG_CSUM;
+   } else if (strcmp(*argv, "encap-udp6-csum") == 0) {
+   encapflags |= TUNNEL_ENCAP_FLAG_CSUM6;
+   } else if (strcmp(*argv, "noencap-udp6-csum") == 0) {
+   encapflags &= ~TUNNEL_ENCAP_FLAG_CSUM6;
+   } else if (strcmp(*argv, "encap-remcsum") == 0) {
+   encapflags |= TUNNEL_ENCAP_FLAG_REMCSUM;
+   } else if (strcmp(*argv, "noencap-remcsum") == 0) {
+   encapflags &= ~TUNNEL_ENCAP_FLAG_REMCSUM;
} else
usage();
argc--, argv++;
@@ -250,6 +291,11 @@ get_failed:
addattr32(n, 1024, IFLA_IPTUN_FLAGS, flags);
addattr32(n, 1024, IFLA_IPTUN_LINK, link);
 
+   addattr16(n, 1024, IFLA_IPTUN_ENCAP_TYPE, encaptype);
+   addattr16(n, 1024, IFLA_IPTUN_ENCAP_FLAGS, encapflags);
+   addattr16(n, 1024, IFLA_IPTUN_ENCAP_SPORT, htons(encapsport));
+   addattr16(n, 1024, IFLA_IPTUN_ENCAP_DPORT, htons(encapdport));
+
return 0;
 }
 
@@ -334,6 +380,50 @@ static void ip6tunnel_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb
 
if (flags & IP6_TNL_F_USE_ORIG_FWMARK)
fprintf(f, "fwmark inherit ");
+
+   if (tb[IFLA_IPTUN_ENCAP_TYPE] &&
+   rta_getattr_u16(tb[IFLA_IPTUN_ENCAP_TYPE]) !=
+   TUNNEL_ENCAP_NONE) {
+   __u16 type

[PATCH iproute v2 4/5] gre6: Support for fou encapsulation

2016-08-04 Thread Tom Herbert
Signed-off-by: Tom Herbert 
---
 ip/link_gre.c  |   2 +-
 ip/link_gre6.c | 101 +
 2 files changed, 102 insertions(+), 1 deletion(-)

diff --git a/ip/link_gre.c b/ip/link_gre.c
index 5dc4067..3b99e56 100644
--- a/ip/link_gre.c
+++ b/ip/link_gre.c
@@ -429,7 +429,7 @@ static void gre_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[])
fputs("external ", f);
 
if (tb[IFLA_GRE_ENCAP_TYPE] &&
-   *(__u16 *)RTA_DATA(tb[IFLA_GRE_ENCAP_TYPE]) != TUNNEL_ENCAP_NONE) {
+   rta_getattr_u16(tb[IFLA_GRE_ENCAP_TYPE]) != TUNNEL_ENCAP_NONE) {
__u16 type = rta_getattr_u16(tb[IFLA_GRE_ENCAP_TYPE]);
__u16 flags = rta_getattr_u16(tb[IFLA_GRE_ENCAP_FLAGS]);
__u16 sport = rta_getattr_u16(tb[IFLA_GRE_ENCAP_SPORT]);
diff --git a/ip/link_gre6.c b/ip/link_gre6.c
index 6767ef6..d00db1f 100644
--- a/ip/link_gre6.c
+++ b/ip/link_gre6.c
@@ -38,6 +38,9 @@ static void print_usage(FILE *f)
fprintf(f, "  [ hoplimit TTL ] [ encaplimit ELIM ]\n");
fprintf(f, "  [ tclass TCLASS ] [ flowlabel FLOWLABEL ]\n");
fprintf(f, "  [ dscp inherit ] [ dev PHYS_DEV ]\n");
+   fprintf(f, "  [ noencap ] [ encap { fou | gue | none } ]\n");
+   fprintf(f, "  [ encap-sport PORT ] [ encap-dport PORT ]\n");
+   fprintf(f, "  [ [no]encap-csum ] [ [no]encap-csum6 ] [ [no]encap-remcsum ]\n");
fprintf(f, "\n");
fprintf(f, "Where: NAME  := STRING\n");
fprintf(f, "   ADDR  := IPV6_ADDRESS\n");
@@ -86,6 +89,10 @@ static int gre_parse_opt(struct link_util *lu, int argc, char **argv,
unsigned int flags = 0;
__u8 hop_limit = DEFAULT_TNL_HOP_LIMIT;
__u8 encap_limit = IPV6_DEFAULT_TNL_ENCAP_LIMIT;
+   __u16 encaptype = 0;
+   __u16 encapflags = TUNNEL_ENCAP_FLAG_CSUM6;
+   __u16 encapsport = 0;
+   __u16 encapdport = 0;
int len;
 
if (!(n->nlmsg_flags & NLM_F_CREATE)) {
@@ -146,6 +153,18 @@ get_failed:
 
if (greinfo[IFLA_GRE_FLAGS])
flags = rta_getattr_u32(greinfo[IFLA_GRE_FLAGS]);
+
+   if (greinfo[IFLA_GRE_ENCAP_TYPE])
+   encaptype = rta_getattr_u16(greinfo[IFLA_GRE_ENCAP_TYPE]);
+
+   if (greinfo[IFLA_GRE_ENCAP_FLAGS])
+   encapflags = rta_getattr_u16(greinfo[IFLA_GRE_ENCAP_FLAGS]);
+
+   if (greinfo[IFLA_GRE_ENCAP_SPORT])
+   encapsport = rta_getattr_u16(greinfo[IFLA_GRE_ENCAP_SPORT]);
+
+   if (greinfo[IFLA_GRE_ENCAP_DPORT])
+   encapdport = rta_getattr_u16(greinfo[IFLA_GRE_ENCAP_DPORT]);
}
 
while (argc > 0) {
@@ -277,6 +296,40 @@ get_failed:
if (strcmp(*argv, "inherit") != 0)
invarg("not inherit", *argv);
flags |= IP6_TNL_F_RCV_DSCP_COPY;
+   } else if (strcmp(*argv, "noencap") == 0) {
+   encaptype = TUNNEL_ENCAP_NONE;
+   } else if (strcmp(*argv, "encap") == 0) {
+   NEXT_ARG();
+   if (strcmp(*argv, "fou") == 0)
+   encaptype = TUNNEL_ENCAP_FOU;
+   else if (strcmp(*argv, "gue") == 0)
+   encaptype = TUNNEL_ENCAP_GUE;
+   else if (strcmp(*argv, "none") == 0)
+   encaptype = TUNNEL_ENCAP_NONE;
+   else
+   invarg("Invalid encap type.", *argv);
+   } else if (strcmp(*argv, "encap-sport") == 0) {
+   NEXT_ARG();
+   if (strcmp(*argv, "auto") == 0)
+   encapsport = 0;
+   else if (get_u16(&encapsport, *argv, 0))
+   invarg("Invalid source port.", *argv);
+   } else if (strcmp(*argv, "encap-dport") == 0) {
+   NEXT_ARG();
+   if (get_u16(&encapdport, *argv, 0))
+   invarg("Invalid destination port.", *argv);
+   } else if (strcmp(*argv, "encap-csum") == 0) {
+   encapflags |= TUNNEL_ENCAP_FLAG_CSUM;
+   } else if (strcmp(*argv, "noencap-csum") == 0) {
+   encapflags &= ~TUNNEL_ENCAP_FLAG_CSUM;
+   } else if (strcmp(*argv, "encap-udp6-csum") == 0) {
+   encapflags |= TUNNEL_ENCAP_FLAG_CSUM6;
+   } else if (strcmp(*argv, "noencap-udp6-csum") == 0) {
+   encapflags &= ~TUNNEL_ENCAP_FLAG_CSUM6;
+   } else if (strcmp(*argv, "encap-remcsum") == 0) {
+   encapflags |= TUNNEL_ENCAP_FLAG_REMCSUM;
+   } else if (strcmp(*argv, "noencap-remcsum") == 0) {
+

[PATCH iproute v2 2/5] ila: Support for configuring ila to use netfilter hook

2016-08-04 Thread Tom Herbert
Signed-off-by: Tom Herbert 
---
 ip/Makefile|   2 +-
 ip/ip.c|   3 +-
 ip/ip_common.h |   1 +
 ip/ipila.c | 267 +
 4 files changed, 271 insertions(+), 2 deletions(-)
 create mode 100644 ip/ipila.c

diff --git a/ip/Makefile b/ip/Makefile
index 33e9286..86c8cdc 100644
--- a/ip/Makefile
+++ b/ip/Makefile
@@ -7,7 +7,7 @@ IPOBJ=ip.o ipaddress.o ipaddrlabel.o iproute.o iprule.o ipnetns.o \
 iplink_vxlan.o tcp_metrics.o iplink_ipoib.o ipnetconf.o link_ip6tnl.o \
 link_iptnl.o link_gre6.o iplink_bond.o iplink_bond_slave.o iplink_hsr.o \
 iplink_bridge.o iplink_bridge_slave.o ipfou.o iplink_ipvlan.o \
-iplink_geneve.o iplink_vrf.o iproute_lwtunnel.o ipmacsec.o
+iplink_geneve.o iplink_vrf.o iproute_lwtunnel.o ipmacsec.o ipila.o
 
 RTMONOBJ=rtmon.o
 
diff --git a/ip/ip.c b/ip/ip.c
index 166ef17..cb3adcb 100644
--- a/ip/ip.c
+++ b/ip/ip.c
@@ -51,7 +51,7 @@ static void usage(void)
 "   ip [ -force ] -batch filename\n"
"where  OBJECT := { link | address | addrlabel | route | rule | neigh | ntable |\n"
"   tunnel | tuntap | maddress | mroute | mrule | monitor | xfrm |\n"
-"   netns | l2tp | fou | macsec | tcp_metrics | token | netconf }\n"
+"   netns | l2tp | fou | macsec | tcp_metrics | token | netconf | ila }\n"
 "   OPTIONS := { -V[ersion] | -s[tatistics] | -d[etails] | -r[esolve] |\n"
 "-h[uman-readable] | -iec |\n"
"-f[amily] { inet | inet6 | ipx | dnet | mpls | bridge | link } |\n"
@@ -84,6 +84,7 @@ static const struct cmd {
{ "link",   do_iplink },
{ "l2tp",   do_ipl2tp },
{ "fou",do_ipfou },
+   { "ila",do_ipila },
{ "macsec", do_ipmacsec },
{ "tunnel", do_iptunnel },
{ "tunl",   do_iptunnel },
diff --git a/ip/ip_common.h b/ip/ip_common.h
index c818812..93ff5bc 100644
--- a/ip/ip_common.h
+++ b/ip/ip_common.h
@@ -52,6 +52,7 @@ int do_netns(int argc, char **argv);
 int do_xfrm(int argc, char **argv);
 int do_ipl2tp(int argc, char **argv);
 int do_ipfou(int argc, char **argv);
+extern int do_ipila(int argc, char **argv);
 int do_tcp_metrics(int argc, char **argv);
 int do_ipnetconf(int argc, char **argv);
 int do_iptoken(int argc, char **argv);
diff --git a/ip/ipila.c b/ip/ipila.c
new file mode 100644
index 000..9f24b5d
--- /dev/null
+++ b/ip/ipila.c
@@ -0,0 +1,267 @@
+/*
+ * ipila.c ILA (Identifier Locator Addressing) support
+ *
+ *  This program is free software; you can redistribute it and/or
+ *  modify it under the terms of the GNU General Public License
+ *  as published by the Free Software Foundation; either version
+ *  2 of the License, or (at your option) any later version.
+ *
+ * Authors:Tom Herbert 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "libgenl.h"
+#include "utils.h"
+#include "ip_common.h"
+
+static void usage(void)
+{
+   fprintf(stderr, "Usage: ip ila add loc_match LOCATOR_MATCH "
+   "loc LOCATOR [ dev DEV ]\n");
+   fprintf(stderr, "   ip ila del loc_match LOCATOR_MATCH "
+   "[ loc LOCATOR ] [ dev DEV ]\n");
+   fprintf(stderr, "   ip ila list\n");
+   fprintf(stderr, "\n");
+
+   exit(-1);
+}
+
+/* netlink socket */
+static struct rtnl_handle genl_rth = { .fd = -1 };
+static int genl_family = -1;
+
+#define ILA_REQUEST(_req, _bufsiz, _cmd, _flags)   \
+   GENL_REQUEST(_req, _bufsiz, genl_family, 0, \
+ILA_GENL_VERSION, _cmd, _flags)
+
+#define ILA_RTA(g) ((struct rtattr *)(((char *)(g)) +  \
+   NLMSG_ALIGN(sizeof(struct genlmsghdr
+
+#define ADDR_BUF_SIZE sizeof(":::")
+
+static int print_addr64(__u64 addr, char *buff, size_t len)
+{
+   __u16 *words = (__u16 *)&addr;
+   __u16 v;
+   int i, ret;
+   size_t written = 0;
+   char *sep = ":";
+
+   for (i = 0; i < 4; i++) {
+   v = ntohs(words[i]);
+
+   if (i == 3)
+   sep = "";
+
+   ret = snprintf(&buff[written], len - written, "%x%s", v, sep);
+   if (ret < 0)
+   return ret;
+
+   written += ret;
+   }
+
+   return written;
+}
+
+static void print_ila_locid(FILE *fp, int attr, struct rtattr *tb[], int space)
+{
+   char abuf[256];
+   size_t blen;
+   int i;
+
+   if (tb[attr]) {
+   blen = print_addr64(rta_getattr_u64(tb[attr]),
+   abuf, sizeof(abuf));
+   fprintf(fp, "%s", abuf);
+   } else {
+   fprintf(fp, "-");
+   blen = 1;
+   }
+
+   for (i = 0; i < space - blen; i++)
+   fprintf(fp, " ");
+}
+
+static int print_ila_mapping(const struct sockaddr_nl *who,
+

[PATCH iproute v2 5/5] fou: Allowing configuring IPv6 listener

2016-08-04 Thread Tom Herbert
Signed-off-by: Tom Herbert 
---
 ip/ipfou.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/ip/ipfou.c b/ip/ipfou.c
index 2a6ae17..0673d11 100644
--- a/ip/ipfou.c
+++ b/ip/ipfou.c
@@ -25,8 +25,9 @@
 
 static void usage(void)
 {
-   fprintf(stderr, "Usage: ip fou add port PORT { ipproto PROTO  | gue }\n");
-   fprintf(stderr, "   ip fou del port PORT\n");
+   fprintf(stderr, "Usage: ip fou add port PORT "
+   "{ ipproto PROTO  | gue } [ -6 ]\n");
+   fprintf(stderr, "   ip fou del port PORT [ -6 ]\n");
fprintf(stderr, "\n");
fprintf(stderr, "Where: PROTO { ipproto-name | 1..255 }\n");
fprintf(stderr, "   PORT { 1..65535 }\n");
@@ -50,6 +51,7 @@ static int fou_parse_opt(int argc, char **argv, struct nlmsghdr *n,
__u8 ipproto, type;
bool gue_set = false;
int ipproto_set = 0;
+   unsigned short family = AF_INET;
 
while (argc > 0) {
if (!matches(*argv, "port")) {
@@ -71,6 +73,8 @@ static int fou_parse_opt(int argc, char **argv, struct nlmsghdr *n,
ipproto_set = 1;
} else if (!matches(*argv, "gue")) {
gue_set = true;
+   } else if (!matches(*argv, "-6")) {
+   family = AF_INET6;
} else {
fprintf(stderr, "fou: unknown command \"%s\"?\n", *argv);
usage();
@@ -98,6 +102,7 @@ static int fou_parse_opt(int argc, char **argv, struct nlmsghdr *n,
 
addattr16(n, 1024, FOU_ATTR_PORT, port);
addattr8(n, 1024, FOU_ATTR_TYPE, type);
+   addattr16(n, 1024, FOU_ATTR_AF, family);
 
if (ipproto_set)
addattr8(n, 1024, FOU_ATTR_IPPROTO, ipproto);
-- 
2.8.0.rc2



[PATCH iproute v2 0/5] iproute: ila and fou addition

2016-08-04 Thread Tom Herbert
Patch set includes:

- Allow configuring the checksum mode for ila LWT (e.g. checksum-neutral
  mapping)
- Configuration for performing ila translations using netfilter hook
- fou encapsulation for ip6tnl and gre6
- fou listener for IPv6

v2:
  - Style fixes
  - Replace occurrences of RTA_DATA with rta_getattr_u*

Tom Herbert (5):
  ila: Support for checksum neutral translation
  ila: Support for configuring ila to use netfilter hook
  ip6tnl: Support for fou encapsulation
  gre6: Support for fou encapsulation
  fou: Allowing configuring IPv6 listener

 ip/Makefile   |   2 +-
 ip/ip.c   |   3 +-
 ip/ip_common.h|   1 +
 ip/ipfou.c|   9 +-
 ip/ipila.c| 267 ++
 ip/iproute_lwtunnel.c |  58 ++-
 ip/link_gre.c |   2 +-
 ip/link_gre6.c| 101 +++
 ip/link_ip6tnl.c  |  92 -
 9 files changed, 527 insertions(+), 8 deletions(-)
 create mode 100644 ip/ipila.c

-- 
2.8.0.rc2



[PATCH iproute v2 1/5] ila: Support for checksum neutral translation

2016-08-04 Thread Tom Herbert
Add configuration of ila LWT tunnels for checksum mode including
checksum neutral translation.

Signed-off-by: Tom Herbert 
---
 ip/iproute_lwtunnel.c | 58 +--
 1 file changed, 56 insertions(+), 2 deletions(-)

diff --git a/ip/iproute_lwtunnel.c b/ip/iproute_lwtunnel.c
index bdbb15d..b656143 100644
--- a/ip/iproute_lwtunnel.c
+++ b/ip/iproute_lwtunnel.c
@@ -90,6 +90,32 @@ static void print_encap_ip(FILE *fp, struct rtattr *encap)
fprintf(fp, "tos %d ", rta_getattr_u8(tb[LWTUNNEL_IP_TOS]));
 }
 
+static char *ila_csum_mode2name(__u8 csum_mode)
+{
+   switch (csum_mode) {
+   case ILA_CSUM_ADJUST_TRANSPORT:
+   return "adj-transport";
+   case ILA_CSUM_NEUTRAL_MAP:
+   return "neutral-map";
+   case ILA_CSUM_NO_ACTION:
+   return "no-action";
+   default:
+   return "unknown";
+   }
+}
+
+static __u8 ila_csum_name2mode(char *name)
+{
+   if (strcmp(name, "adj-transport") == 0)
+   return ILA_CSUM_ADJUST_TRANSPORT;
+   else if (strcmp(name, "neutral-map") == 0)
+   return ILA_CSUM_NEUTRAL_MAP;
+   else if (strcmp(name, "no-action") == 0)
+   return ILA_CSUM_NO_ACTION;
+   else
+   return -1;
+}
+
 static void print_encap_ila(FILE *fp, struct rtattr *encap)
 {
struct rtattr *tb[ILA_ATTR_MAX+1];
@@ -103,6 +129,10 @@ static void print_encap_ila(FILE *fp, struct rtattr *encap)
   abuf, sizeof(abuf));
fprintf(fp, " %s ", abuf);
}
+
+   if (tb[ILA_ATTR_CSUM_MODE])
+   fprintf(fp, " csum-mode %s ",
+   ila_csum_mode2name(rta_getattr_u8(tb[ILA_ATTR_CSUM_MODE])));
 }
 
 static void print_encap_ip6(FILE *fp, struct rtattr *encap)
@@ -246,10 +276,34 @@ static int parse_encap_ila(struct rtattr *rta, size_t len,
exit(1);
}
 
+   argc--; argv++;
+
rta_addattr64(rta, 1024, ILA_ATTR_LOCATOR, locator);
 
-   *argcp = argc;
-   *argvp = argv;
+   while (argc > 0) {
+   if (strcmp(*argv, "csum-mode") == 0) {
+   __u8 csum_mode;
+
+   NEXT_ARG();
+
+   csum_mode = ila_csum_name2mode(*argv);
+   if (csum_mode == (__u8)-1)
+   invarg("\"csum-mode\" value is invalid\n", *argv);
+
+   rta_addattr8(rta, 1024, ILA_ATTR_CSUM_MODE, csum_mode);
+
+   argc--; argv++;
+   } else {
+   break;
+   }
+   }
+
+   /* argv is currently the first unparsed argument,
+* but the lwt_parse_encap() caller will move to the next,
+* so step back
+*/
+   *argcp = argc + 1;
+   *argvp = argv - 1;
 
return 0;
 }
-- 
2.8.0.rc2



Re: [RFC V2 PATCH 17/25] net/netpolicy: introduce netpolicy_pick_queue

2016-08-04 Thread John Fastabend
On 16-08-04 12:36 PM, kan.li...@intel.com wrote:
> From: Kan Liang 
> 
> To achieve better network performance, the key step is to distribute the
> packets to dedicated queues according to policy and system run time
> status.
> 
> This patch provides an interface which can return the proper dedicated
> queue for socket/task. Then the packets of the socket/task will be
> redirect to the dedicated queue for better network performance.
> 
> For selecting the proper queue, currently it uses round-robin algorithm
> to find the available object from the given policy object list. The
> algorithm is good enough for now. But it could be improved by some
> adaptive algorithm later.
> 
> The selected object will be stored in hashtable. So it does not need to
> go through the whole object list every time.
> 
> Signed-off-by: Kan Liang 
> ---
>  include/linux/netpolicy.h |   5 ++
>  net/core/netpolicy.c  | 136 ++
>  2 files changed, 141 insertions(+)
> 

There is a hook in the tx path now (recently added)

# ifdef CONFIG_NET_EGRESS
if (static_key_false(&egress_needed)) {
skb = sch_handle_egress(skb, &rc, dev);
if (!skb)
goto out;
}
# endif

that allows pushing any policy you like for picking tx queues. It would
be better to use this mechanism. The hook runs 'tc' classifiers so
either write a new ./net/sch/cls_*.c for this or just use ebpf to stick
your policy in at runtime.

I'm out of the office for a few days, but when I get back I can test that
it actually picks the selected queue in all cases. I know there was an
issue with some of the drivers using select_queue a while back.

.John



Re: [RFC V2 PATCH 01/25] net: introduce NET policy

2016-08-04 Thread Randy Dunlap
On 08/04/16 12:36, kan.li...@intel.com wrote:
> From: Kan Liang 
> 
> This patch introduce NET policy subsystem. If proc is supported in the
> system, it creates netpolicy node in proc system.
> 
> Signed-off-by: Kan Liang 
> ---
>  include/linux/netdevice.h   |   7 +++
>  include/net/net_namespace.h |   3 ++
>  net/Kconfig |   7 +++
>  net/core/Makefile   |   1 +
>  net/core/netpolicy.c| 128 
> 
>  5 files changed, 146 insertions(+)
>  create mode 100644 net/core/netpolicy.c

> diff --git a/net/Kconfig b/net/Kconfig
> index c2cdbce..00552ba 100644
> --- a/net/Kconfig
> +++ b/net/Kconfig
> @@ -205,6 +205,13 @@ source "net/bridge/netfilter/Kconfig"
>  
>  endif
>  
> +config NETPOLICY
> + depends on NET
> + bool "Net policy support"
> + default y
> + ---help---
> + Net policy support
> +

New kconfig options shouldn't default to y.


-- 
~Randy


[PATCH -net] net/ethernet: tundra: fix dump_eth_one warning in tsi108_eth

2016-08-04 Thread Paul Gortmaker
The call site for this function appears as:

  #ifdef DEBUG
data->msg_enable = DEBUG;
dump_eth_one(dev);
  #endif

...leading to the following warning for !DEBUG builds:

drivers/net/ethernet/tundra/tsi108_eth.c:169:13: warning: 'dump_eth_one' defined but not used [-Wunused-function]
 static void dump_eth_one(struct net_device *dev)
 ^

...when using the arch/powerpc/configs/mpc7448_hpc2_defconfig

Put the function definition under the same #ifdef as the call site
to avoid the warning.

Cc: "David S. Miller" 
Cc: netdev@vger.kernel.org
Cc: linuxppc-...@lists.ozlabs.org
Signed-off-by: Paul Gortmaker 
---
 drivers/net/ethernet/tundra/tsi108_eth.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/tundra/tsi108_eth.c b/drivers/net/ethernet/tundra/tsi108_eth.c
index 01a77145a0fa..8fd131207ee1 100644
--- a/drivers/net/ethernet/tundra/tsi108_eth.c
+++ b/drivers/net/ethernet/tundra/tsi108_eth.c
@@ -166,6 +166,7 @@ static struct platform_driver tsi_eth_driver = {
 
 static void tsi108_timed_checker(unsigned long dev_ptr);
 
+#ifdef DEBUG
 static void dump_eth_one(struct net_device *dev)
 {
struct tsi108_prv_data *data = netdev_priv(dev);
@@ -190,6 +191,7 @@ static void dump_eth_one(struct net_device *dev)
   TSI_READ(TSI108_EC_RXESTAT),
   TSI_READ(TSI108_EC_RXERR), data->rxpending);
 }
+#endif
 
 /* Synchronization is needed between the thread and up/down events.
  * Note that the PHY is accessed through the same registers for both
-- 
2.8.4



[RFC V2 PATCH 06/25] net/netpolicy: set and remove IRQ affinity

2016-08-04 Thread kan . liang
From: Kan Liang 

This patch introduces functions to set and remove IRQ affinity
according to the CPU and queue mapping.

The functions do not record the previous affinity status. After a
set/remove cycle, the affinity is restored to all online CPUs with IRQ
balancing re-enabled.

Signed-off-by: Kan Liang 
---
 net/core/netpolicy.c | 33 +
 1 file changed, 33 insertions(+)

diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index ff7fc04..c44818d 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -29,6 +29,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -128,6 +129,38 @@ err:
return -ENOMEM;
 }
 
+static void netpolicy_clear_affinity(struct net_device *dev)
+{
+   struct netpolicy_sys_info *s_info = &dev->netpolicy->sys_info;
+   u32 i;
+
+   for (i = 0; i < s_info->avail_rx_num; i++) {
+   irq_clear_status_flags(s_info->rx[i].irq, IRQ_NO_BALANCING);
+   irq_set_affinity_hint(s_info->rx[i].irq, cpu_online_mask);
+   }
+
+   for (i = 0; i < s_info->avail_tx_num; i++) {
+   irq_clear_status_flags(s_info->tx[i].irq, IRQ_NO_BALANCING);
+   irq_set_affinity_hint(s_info->tx[i].irq, cpu_online_mask);
+   }
+}
+
+static void netpolicy_set_affinity(struct net_device *dev)
+{
+   struct netpolicy_sys_info *s_info = &dev->netpolicy->sys_info;
+   u32 i;
+
+   for (i = 0; i < s_info->avail_rx_num; i++) {
+   irq_set_status_flags(s_info->rx[i].irq, IRQ_NO_BALANCING);
+   irq_set_affinity_hint(s_info->rx[i].irq, cpumask_of(s_info->rx[i].cpu));
+   }
+
+   for (i = 0; i < s_info->avail_tx_num; i++) {
+   irq_set_status_flags(s_info->tx[i].irq, IRQ_NO_BALANCING);
+   irq_set_affinity_hint(s_info->tx[i].irq, cpumask_of(s_info->tx[i].cpu));
+   }
+}
+
 const char *policy_name[NET_POLICY_MAX] = {
"NONE"
 };
-- 
2.5.5



[RFC V2 PATCH 01/25] net: introduce NET policy

2016-08-04 Thread kan . liang
From: Kan Liang 

This patch introduces the NET policy subsystem. If procfs is supported in
the system, it creates a netpolicy node in the proc filesystem.

Signed-off-by: Kan Liang 
---
 include/linux/netdevice.h   |   7 +++
 include/net/net_namespace.h |   3 ++
 net/Kconfig |   7 +++
 net/core/Makefile   |   1 +
 net/core/netpolicy.c| 128 
 5 files changed, 146 insertions(+)
 create mode 100644 net/core/netpolicy.c

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 076df53..19638d6 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1619,6 +1619,8 @@ enum netdev_priv_flags {
  * switch driver and used to set the phys state of the
  * switch port.
  *
+ * @proc_dev:  device node in proc to configure device net policy
+ *
  * FIXME: cleanup struct net_device such that network protocol info
  * moves out.
  */
@@ -1886,6 +1888,11 @@ struct net_device {
struct lock_class_key   *qdisc_tx_busylock;
struct lock_class_key   *qdisc_running_key;
boolproto_down;
+#ifdef CONFIG_NETPOLICY
+#ifdef CONFIG_PROC_FS
+   struct proc_dir_entry   *proc_dev;
+#endif /* CONFIG_PROC_FS */
+#endif /* CONFIG_NETPOLICY */
 };
 #define to_net_dev(d) container_of(d, struct net_device, dev)
 
diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 4089abc..d2ff6c4 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -142,6 +142,9 @@ struct net {
 #endif
struct sock *diag_nlsk;
atomic_tfnhe_genid;
+#ifdef CONFIG_NETPOLICY
+   struct proc_dir_entry   *proc_netpolicy;
+#endif /* CONFIG_NETPOLICY */
 };
 
 #include 
diff --git a/net/Kconfig b/net/Kconfig
index c2cdbce..00552ba 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -205,6 +205,13 @@ source "net/bridge/netfilter/Kconfig"
 
 endif
 
+config NETPOLICY
+   depends on NET
+   bool "Net policy support"
+   default y
+   ---help---
+   Net policy support
+
 source "net/dccp/Kconfig"
 source "net/sctp/Kconfig"
 source "net/rds/Kconfig"
diff --git a/net/core/Makefile b/net/core/Makefile
index d6508c2..0be7092 100644
--- a/net/core/Makefile
+++ b/net/core/Makefile
@@ -27,3 +27,4 @@ obj-$(CONFIG_LWTUNNEL) += lwtunnel.o
 obj-$(CONFIG_DST_CACHE) += dst_cache.o
 obj-$(CONFIG_HWBM) += hwbm.o
 obj-$(CONFIG_NET_DEVLINK) += devlink.o
+obj-$(CONFIG_NETPOLICY) += netpolicy.o
diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
new file mode 100644
index 000..faabfe7
--- /dev/null
+++ b/net/core/netpolicy.c
@@ -0,0 +1,128 @@
+/*
+ * netpolicy.c: Net policy support
+ * Copyright (c) 2016, Intel Corporation.
+ * Author: Kan Liang (kan.li...@intel.com)
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * NET policy intends to simplify the network configuration and get a good
+ * network performance according to the hints (policies) applied by the user.
+ *
+ * Motivation
+ * - The network performance is not good with default system settings.
+ * - It is too difficult to do automatic tuning for all possible
+ *   workloads, since workloads have different requirements. Some
+ *   workloads may want high throughput. Some may need low latency.
+ * - There are lots of manual configurations. Fine grained configuration
+ *   is too difficult for users.
+ * So, it is a big challenge to get good network performance.
+ *
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#ifdef CONFIG_PROC_FS
+
+static int net_policy_proc_show(struct seq_file *m, void *v)
+{
+   struct net_device *dev = (struct net_device *)m->private;
+
+   seq_printf(m, "%s doesn't support net policy manager\n", dev->name);
+
+   return 0;
+}
+
+static int net_policy_proc_open(struct inode *inode, struct file *file)
+{
+   return single_open(file, net_policy_proc_show, PDE_DATA(inode));
+}
+
+static const struct file_operations proc_net_policy_operations = {
+   .open   = net_policy_proc_open,
+   .read   = seq_read,
+   .llseek = seq_lseek,
+   .release= seq_release,
+   .owner  = THIS_MODULE,
+};
+
+static int netpolicy_proc_dev_init(struct net *net, struct net_device *dev)
+{
+   dev->proc_dev = proc_net_mkdir(net, dev->name, net->proc_netpolicy);
+   if (!dev->proc_dev)
+   return -ENOMEM;
+
+   if (!proc_create_data("policy", 

[RFC V2 PATCH 04/25] net/netpolicy: get CPU information

2016-08-04 Thread kan . liang
From: Kan Liang 

NET policy also needs to know CPU information. Currently, the number of
online CPUs is enough.

Signed-off-by: Kan Liang 
---
 net/core/netpolicy.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index 7c34c8a..075aaca 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -49,6 +49,11 @@ static void netpolicy_free_dev_info(struct netpolicy_dev_info *d_info)
kfree(d_info->tx_irq);
 }
 
+static u32 netpolicy_get_cpu_information(void)
+{
+   return num_online_cpus();
+}
+
 const char *policy_name[NET_POLICY_MAX] = {
"NONE"
 };
-- 
2.5.5



[RFC V2 PATCH 02/25] net/netpolicy: init NET policy

2016-08-04 Thread kan . liang
From: Kan Liang 

This patch tries to initialize NET policy for all the devices in the
system. However, not all device drivers have NET policy support. For
drivers that do not have NET policy support, no node will be shown in
/proc/net/netpolicy/.
A device driver that has NET policy support must implement the
interface ndo_netpolicy_init, which is used to do the necessary
initialization and collect information (e.g. supported policies) from
the driver.

The user can check /proc/net/netpolicy/ and
/proc/net/netpolicy/$DEV/policy to know the available device and its
supported policy.

np_lock is also introduced to protect the state of NET policy.

Device hotplug will be handled later in this series.

Signed-off-by: Kan Liang 
---
 include/linux/netdevice.h | 12 +++
 include/linux/netpolicy.h | 31 +
 net/core/netpolicy.c  | 86 +--
 3 files changed, 118 insertions(+), 11 deletions(-)
 create mode 100644 include/linux/netpolicy.h

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 19638d6..2e0a7e7 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -52,6 +52,7 @@
 #include 
 #include 
 #include 
+#include 
 
 struct netpoll_info;
 struct device;
@@ -1120,6 +1121,9 @@ struct netdev_xdp {
  * int (*ndo_xdp)(struct net_device *dev, struct netdev_xdp *xdp);
  * This function is used to set or query state related to XDP on the
  * netdevice. See definition of enum xdp_netdev_command for details.
+ * int(*ndo_netpolicy_init)(struct net_device *dev,
+ * struct netpolicy_info *info);
+ * This function is used to init and get supported policy.
  *
  */
 struct net_device_ops {
@@ -1306,6 +1310,10 @@ struct net_device_ops {
   int needed_headroom);
int (*ndo_xdp)(struct net_device *dev,
   struct netdev_xdp *xdp);
+#ifdef CONFIG_NETPOLICY
+   int (*ndo_netpolicy_init)(struct net_device *dev,
+ struct netpolicy_info *info);
+#endif /* CONFIG_NETPOLICY */
 };
 
 /**
@@ -1620,6 +1628,8 @@ enum netdev_priv_flags {
  * switch port.
  *
  * @proc_dev:  device node in proc to configure device net policy
+ * @netpolicy: NET policy related information of net device
+ * @np_lock:   protect the state of NET policy
  *
  * FIXME: cleanup struct net_device such that network protocol info
  * moves out.
@@ -1892,6 +1902,8 @@ struct net_device {
 #ifdef CONFIG_PROC_FS
struct proc_dir_entry   *proc_dev;
 #endif /* CONFIG_PROC_FS */
+   struct netpolicy_info   *netpolicy;
+   spinlock_t  np_lock;
 #endif /* CONFIG_NETPOLICY */
 };
 #define to_net_dev(d) container_of(d, struct net_device, dev)
diff --git a/include/linux/netpolicy.h b/include/linux/netpolicy.h
new file mode 100644
index 000..ca1f131
--- /dev/null
+++ b/include/linux/netpolicy.h
@@ -0,0 +1,31 @@
+/*
+ * netpolicy.h: Net policy support
+ * Copyright (c) 2016, Intel Corporation.
+ * Author: Kan Liang (kan.li...@intel.com)
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ */
+#ifndef __LINUX_NETPOLICY_H
+#define __LINUX_NETPOLICY_H
+
+enum netpolicy_name {
+   NET_POLICY_NONE = 0,
+   NET_POLICY_MAX,
+};
+
+extern const char *policy_name[];
+
+struct netpolicy_info {
+   enum netpolicy_name cur_policy;
+   unsigned long avail_policy[BITS_TO_LONGS(NET_POLICY_MAX)];
+};
+
+#endif /*__LINUX_NETPOLICY_H*/
diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index faabfe7..5f304d5 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -35,13 +35,29 @@
 #include 
 #include 
 
+const char *policy_name[NET_POLICY_MAX] = {
+   "NONE"
+};
 #ifdef CONFIG_PROC_FS
 
 static int net_policy_proc_show(struct seq_file *m, void *v)
 {
struct net_device *dev = (struct net_device *)m->private;
-
-   seq_printf(m, "%s doesn't support net policy manager\n", dev->name);
+   int i;
+
+   if (WARN_ON(!dev->netpolicy))
+   return -EINVAL;
+
+   if (dev->netpolicy->cur_policy == NET_POLICY_NONE) {
+   seq_printf(m, "%s: There is no policy applied\n", dev->name);
+   seq_printf(m, "%s: The available policies include:", dev->name);
+   for_each_set_bit(i, dev->netpolicy->avail_policy, NET_POLICY_MAX)
+   seq_printf(m, " %s", policy_name[

[RFC V2 PATCH 00/25] Kernel NET policy

2016-08-04 Thread kan . liang
From: Kan Liang 

(Re-sent to correct a system time issue. Sorry for any inconvenience.)
It is a big challenge to get good network performance. First, the network
performance is not good with default system settings. Second, it is too
difficult to do automatic tuning for all possible workloads, since workloads
have different requirements. Some workloads may want high throughput. Some may
need low latency. Last but not least, there are lots of manual configurations.
Fine grained configuration is too difficult for users.

NET policy intends to simplify the network configuration and get a good network
performance according to the hints (policies) applied by the user. It
provides some typical "policies" for the user, which can be set per-socket,
per-task or per-device. The kernel automatically figures out how to merge
different requests to get good network performance.

NET policy is designed for multiqueue network devices. This implementation is
only for Intel NICs using i40e driver. But the concepts and generic code should
apply to other multiqueue NICs too.

NET policy is also a combination of generic policy manager code and some
ethtool callbacks (per queue coalesce setting, flow classification rules) to
configure the driver.

This series also supports CPU hotplug and device hotplug.

Here are some common questions about NET policy.
 1. Why userspace tool cannot do the same thing?
A: Kernel is more suitable for NET policy.
   - User space code would be far more complicated to get right and perform
 well. It always needs to work with out-of-date state compared to the
 latest, because it cannot do any locking with the kernel state.
   - User space code is less efficient than kernel code, because of the
 additional context switches needed.
   - Kernel is in the right position to coordinate requests from multiple
 users.

 2. Is NET policy looking for optimal settings?
A: No. The NET policy intends to get a good network performance according
   to user's specific request. Our target for good performance is ~90% of
   the optimal settings.

 3. How's the configuration impact the connection rates?
A: There are two places to acquire rtnl mutex to configure the device.
   - One is to do device policy setting. It happens at the initialization stage,
 hotplug or queue number changes. The device policy will be set to
 NET_POLICY_NONE. If so, it "falls back" to the system default way to
 direct the packets. It doesn't block the connection.
   - The other is to set Rx network flow classification options or rules.
  It uses a work queue to apply the settings asynchronously, which
  avoids hurting the connection rates.

 4. Why not using existing mechanism for NET policy?
For example, cgroup tc or existing SOCKET options.
 A: The NET policy already uses existing mechanisms as much as it can.
    For example, it uses the existing ethtool interface to configure the device.
    However, the NET policy still needs to introduce new interfaces to meet
    its specific requirements.
   For resource usage, current cgroup tc is not suitable for per-socket
   setting. Also, current tc can only set rate limit. The NET policy wants
   to change interrupt moderation per device queue. So in this series, it
   will not use cgroup tc. But in some places, cgroup and NET policy are
   similar. For example, both of them isolate resource usage. Both of
   them do traffic control. So it is on the NET policy TODO list to
   work well with cgroup.
   For socket options, SO_MARK or may be SO_PRIORITY is close to NET 
policy's
   requirement. But they cannot be reused for NET policy. SO_MARK can be
   used for routing and packet filtering. But the NET policy doesn't intend 
to
   change the routing. It only redirects the packet to the specific device
   queue. Also, the target queue is assigned by NET policy subsystem at run
   time. It should not be set in advance. SO_PRIORITY can set 
protocol-defined
   priority for all packets on the socket. But the policies don't have 
priority.

 5. Why disable IRQ balance?
 A: Disabling IRQ balance is a common way (the recommended way for some
    devices) to tune network performance.


Here are some key Interfaces/APIs for NET policy.

Interfaces which export to user space

   /proc/net/netpolicy/$DEV/policy
   User can set/get per device policy from /proc

   /proc/$PID/net_policy
   User can set/get per task policy from /proc
   prctl(PR_SET_NETPOLICY, POLICY_NAME, NULL, NULL, NULL)
   An alternative way to set/get per task policy is from prctl.

   setsockopt(sockfd,SOL_SOCKET,SO_NETPOLICY,&policy,sizeof(int))
   User can set/get per socket policy by setsockopt

New ndo opt

   int (*ndo_netpolicy_init)(struct net_device *dev,
 struct netpolicy_info *info);
   Initialize device driver for NET policy

   int (*n

[RFC V2 PATCH 05/25] net/netpolicy: create CPU and queue mapping

2016-08-04 Thread kan . liang
From: Kan Liang 

The current implementation forces a 1:1 CPU/queue mapping. This patch
introduces the function netpolicy_update_sys_map to create this mapping.
The result is stored in netpolicy_sys_info.

If the CPU count and queue count are different, the remaining
CPUs/queues are not used for now.

CPU hotplug, device hotplug or ethtool may change the CPU count or
queue count. For these cases, this function can also be called to
reconstruct the mapping. These cases will be handled later in this
series.

Signed-off-by: Kan Liang 
---
 include/linux/netpolicy.h | 18 
 net/core/netpolicy.c  | 74 +++
 2 files changed, 92 insertions(+)

diff --git a/include/linux/netpolicy.h b/include/linux/netpolicy.h
index fc87d9b..a946b75c 100644
--- a/include/linux/netpolicy.h
+++ b/include/linux/netpolicy.h
@@ -30,9 +30,27 @@ struct netpolicy_dev_info {
u32 *tx_irq;
 };
 
+struct netpolicy_sys_map {
+   u32 cpu;
+   u32 queue;
+   u32 irq;
+};
+
+struct netpolicy_sys_info {
+   /*
+* Record the cpu and queue 1:1 mapping
+*/
+   u32 avail_rx_num;
+   struct netpolicy_sys_map*rx;
+   u32 avail_tx_num;
+   struct netpolicy_sys_map*tx;
+};
+
 struct netpolicy_info {
enum netpolicy_name cur_policy;
unsigned long avail_policy[BITS_TO_LONGS(NET_POLICY_MAX)];
+   /* cpu and queue mapping information */
+   struct netpolicy_sys_info   sys_info;
 };
 
 #endif /*__LINUX_NETPOLICY_H*/
diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index 075aaca..ff7fc04 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -54,6 +54,80 @@ static u32 netpolicy_get_cpu_information(void)
return num_online_cpus();
 }
 
+static void netpolicy_free_sys_map(struct net_device *dev)
+{
+   struct netpolicy_sys_info *s_info = &dev->netpolicy->sys_info;
+
+   kfree(s_info->rx);
+   s_info->rx = NULL;
+   s_info->avail_rx_num = 0;
+   kfree(s_info->tx);
+   s_info->tx = NULL;
+   s_info->avail_tx_num = 0;
+}
+
+static int netpolicy_update_sys_map(struct net_device *dev,
+   struct netpolicy_dev_info *d_info,
+   u32 cpu)
+{
+   struct netpolicy_sys_info *s_info = &dev->netpolicy->sys_info;
+   u32 num, i, online_cpu;
+   cpumask_var_t cpumask;
+
+   if (!alloc_cpumask_var(&cpumask, GFP_ATOMIC))
+   return -ENOMEM;
+
+   /* update rx cpu map */
+   if (cpu > d_info->rx_num)
+   num = d_info->rx_num;
+   else
+   num = cpu;
+
+   s_info->avail_rx_num = num;
+   s_info->rx = kcalloc(num, sizeof(*s_info->rx), GFP_ATOMIC);
+   if (!s_info->rx)
+   goto err;
+   cpumask_copy(cpumask, cpu_online_mask);
+
+   i = 0;
+   for_each_cpu(online_cpu, cpumask) {
+   if (i == num)
+   break;
+   s_info->rx[i].cpu = online_cpu;
+   s_info->rx[i].queue = i;
+   s_info->rx[i].irq = d_info->rx_irq[i];
+   i++;
+   }
+
+   /* update tx cpu map */
+   if (cpu >= d_info->tx_num)
+   num = d_info->tx_num;
+   else
+   num = cpu;
+
+   s_info->avail_tx_num = num;
+   s_info->tx = kcalloc(num, sizeof(*s_info->tx), GFP_ATOMIC);
+   if (!s_info->tx)
+   goto err;
+
+   i = 0;
+   for_each_cpu(online_cpu, cpumask) {
+   if (i == num)
+   break;
+   s_info->tx[i].cpu = online_cpu;
+   s_info->tx[i].queue = i;
+   s_info->tx[i].irq = d_info->tx_irq[i];
+   i++;
+   }
+
+   free_cpumask_var(cpumask);
+   return 0;
+err:
+   netpolicy_free_sys_map(dev);
+   free_cpumask_var(cpumask);
+   return -ENOMEM;
+}
+
 const char *policy_name[NET_POLICY_MAX] = {
"NONE"
 };
-- 
2.5.5



[RFC V2 PATCH 12/25] net/netpolicy: NET device hotplug

2016-08-04 Thread kan . liang
From: Kan Liang 

Support NET device up/down/namechange in the NET policy code.

Signed-off-by: Kan Liang 
---
 net/core/netpolicy.c | 66 +---
 1 file changed, 58 insertions(+), 8 deletions(-)

diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index 8336106..2a04fcf 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -684,6 +684,9 @@ static const struct file_operations proc_net_policy_operations = {
 
 static int netpolicy_proc_dev_init(struct net *net, struct net_device *dev)
 {
+   if (dev->proc_dev)
+   proc_remove(dev->proc_dev);
+
dev->proc_dev = proc_net_mkdir(net, dev->name, net->proc_netpolicy);
if (!dev->proc_dev)
return -ENOMEM;
@@ -750,6 +753,19 @@ void uninit_netpolicy(struct net_device *dev)
spin_unlock(&dev->np_lock);
 }
 
+static void netpolicy_dev_init(struct net *net,
+  struct net_device *dev)
+{
+   if (!init_netpolicy(dev)) {
+#ifdef CONFIG_PROC_FS
+   if (netpolicy_proc_dev_init(net, dev))
+   uninit_netpolicy(dev);
+   else
+#endif /* CONFIG_PROC_FS */
+   pr_info("NETPOLICY: Init net policy for %s\n", dev->name);
+   }
+}
+
 static int __net_init netpolicy_net_init(struct net *net)
 {
struct net_device *dev, *aux;
@@ -762,14 +778,7 @@ static int __net_init netpolicy_net_init(struct net *net)
 #endif /* CONFIG_PROC_FS */
 
for_each_netdev_safe(net, dev, aux) {
-   if (!init_netpolicy(dev)) {
-#ifdef CONFIG_PROC_FS
-   if (netpolicy_proc_dev_init(net, dev))
-   uninit_netpolicy(dev);
-   else
-#endif /* CONFIG_PROC_FS */
-   pr_info("NETPOLICY: Init net policy for %s\n", dev->name);
-   }
+   netpolicy_dev_init(net, dev);
}
 
return 0;
@@ -791,17 +800,58 @@ static struct pernet_operations netpolicy_net_ops = {
.exit = netpolicy_net_exit,
 };
 
+static int netpolicy_notify(struct notifier_block *this,
+   unsigned long event,
+   void *ptr)
+{
+   struct net_device *dev = netdev_notifier_info_to_dev(ptr);
+
+   switch (event) {
+   case NETDEV_CHANGENAME:
+#ifdef CONFIG_PROC_FS
+   if (dev->proc_dev) {
+   proc_remove(dev->proc_dev);
+   if ((netpolicy_proc_dev_init(dev_net(dev), dev) < 0) &&
+   dev->proc_dev) {
+   proc_remove(dev->proc_dev);
+   dev->proc_dev = NULL;
+   }
+   }
+#endif
+   break;
+   case NETDEV_UP:
+   netpolicy_dev_init(dev_net(dev), dev);
+   break;
+   case NETDEV_GOING_DOWN:
+   uninit_netpolicy(dev);
+#ifdef CONFIG_PROC_FS
+   proc_remove(dev->proc_dev);
+   dev->proc_dev = NULL;
+#endif
+   break;
+   }
+
+   return NOTIFY_OK;
+}
+
+static struct notifier_block netpolicy_dev_notf = {
+   .notifier_call = netpolicy_notify,
+};
+
 static int __init netpolicy_init(void)
 {
int ret;
 
ret = register_pernet_subsys(&netpolicy_net_ops);
+   if (!ret)
+   register_netdevice_notifier(&netpolicy_dev_notf);
 
return ret;
 }
 
 static void __exit netpolicy_exit(void)
 {
+   unregister_netdevice_notifier(&netpolicy_dev_notf);
unregister_pernet_subsys(&netpolicy_net_ops);
 }
 
-- 
2.5.5



[RFC V2 PATCH 09/25] net/netpolicy: set NET policy by policy name

2016-08-04 Thread kan . liang
From: Kan Liang 

Users can write a policy name to /proc/net/netpolicy/$DEV/policy to enable
net policy for a specific device.

When the policy is enabled, the subsystem automatically disables IRQ
balance and sets IRQ affinity. The object list is also generated
accordingly.

It is the device driver's responsibility to set the driver-specific
configuration for the given policy.

np_lock will be used to protect the state.

Signed-off-by: Kan Liang 
---
 include/linux/netdevice.h |  5 +++
 include/linux/netpolicy.h |  1 +
 net/core/netpolicy.c  | 95 +++
 3 files changed, 101 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 1eda870..aa3ef38 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1127,6 +1127,9 @@ struct netdev_xdp {
  * int (*ndo_get_irq_info)(struct net_device *dev,
  *struct netpolicy_dev_info *info);
  * This function is used to get irq information of rx and tx queues
+ * int (*ndo_set_net_policy)(struct net_device *dev,
+ *  enum netpolicy_name name);
+ * This function is used to set per device net policy by name
  *
  */
 struct net_device_ops {
@@ -1318,6 +1321,8 @@ struct net_device_ops {
  struct netpolicy_info *info);
int (*ndo_get_irq_info)(struct net_device *dev,
struct netpolicy_dev_info *info);
+   int (*ndo_set_net_policy)(struct net_device *dev,
+ enum netpolicy_name name);
 #endif /* CONFIG_NETPOLICY */
 };
 
diff --git a/include/linux/netpolicy.h b/include/linux/netpolicy.h
index 73a5fa6..b1d9277 100644
--- a/include/linux/netpolicy.h
+++ b/include/linux/netpolicy.h
@@ -27,6 +27,7 @@ enum netpolicy_traffic {
NETPOLICY_RXTX,
 };
 
+#define POLICY_NAME_LEN_MAX64
 extern const char *policy_name[];
 
 struct netpolicy_dev_info {
diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index 0f8ff16..8112839 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -36,6 +36,7 @@
 #include 
 #include 
 #include 
+#include 
 
 static int netpolicy_get_dev_info(struct net_device *dev,
  struct netpolicy_dev_info *d_info)
@@ -430,6 +431,69 @@ err:
return ret;
 }
 
+static int net_policy_set_by_name(char *name, struct net_device *dev)
+{
+   int i, ret;
+
+   spin_lock(&dev->np_lock);
+   ret = 0;
+
+   if (!dev->netpolicy ||
+   !dev->netdev_ops->ndo_set_net_policy) {
+   ret = -ENOTSUPP;
+   goto unlock;
+   }
+
+   for (i = 0; i < NET_POLICY_MAX; i++) {
+   if (!strncmp(name, policy_name[i], strlen(policy_name[i])))
+   break;
+   }
+
+   if (!test_bit(i, dev->netpolicy->avail_policy)) {
+   ret = -ENOTSUPP;
+   goto unlock;
+   }
+
+   if (i == dev->netpolicy->cur_policy)
+   goto unlock;
+
+   /* If there is no policy applied yet, need to do enable first . */
+   if (dev->netpolicy->cur_policy == NET_POLICY_NONE) {
+   ret = netpolicy_enable(dev);
+   if (ret)
+   goto unlock;
+   }
+
+   netpolicy_free_obj_list(dev);
+
+   /* Generate object list according to policy name */
+   ret = netpolicy_gen_obj_list(dev, i);
+   if (ret)
+   goto err;
+
+   /* set policy */
+   ret = dev->netdev_ops->ndo_set_net_policy(dev, i);
+   if (ret)
+   goto err;
+
+   /* If removing policy, need to do disable. */
+   if (i == NET_POLICY_NONE)
+   netpolicy_disable(dev);
+
+   dev->netpolicy->cur_policy = i;
+
+   spin_unlock(&dev->np_lock);
+   return 0;
+
+err:
+   netpolicy_free_obj_list(dev);
+   if (dev->netpolicy->cur_policy == NET_POLICY_NONE)
+   netpolicy_disable(dev);
+unlock:
+   spin_unlock(&dev->np_lock);
+   return ret;
+}
+
 #ifdef CONFIG_PROC_FS
 
 static int net_policy_proc_show(struct seq_file *m, void *v)
@@ -459,11 +523,40 @@ static int net_policy_proc_open(struct inode *inode, struct file *file)
return single_open(file, net_policy_proc_show, PDE_DATA(inode));
 }
 
+static ssize_t net_policy_proc_write(struct file *file, const char __user *buf,
+size_t count, loff_t *pos)
+{
+   struct seq_file *m = file->private_data;
+   struct net_device *dev = (struct net_device *)m->private;
+   char name[POLICY_NAME_LEN_MAX];
+   int i, ret;
+
+   if (!dev->netpolicy)
+   return -ENOTSUPP;
+
+   if (count > POLICY_NAME_LEN_MAX)
+   return -EINVAL;
+
+   if (copy_from_user(name, buf, count))
+   return -EINVAL;
+
+   for (i = 0; i < count - 1; i++)
+   name[i] = tou

[RFC V2 PATCH 08/25] net/netpolicy: introduce NET policy object

2016-08-04 Thread kan . liang
From: Kan Liang 

This patch introduces the concept of NET policy object and policy object
list.

A NET policy object is an instance of a CPU/queue mapping. The object
can be shared between different tasks/sockets. So besides CPU and queue
information, the object also maintains a reference counter.

Each policy has a dedicated object list. When a policy is set as the
device policy, all objects are inserted into that policy's object list.
Users later search the list and pick up an available object.

Objects can perform differently depending on the queue and CPU
topology, so device location, hyper-threading and CPU topology all have
to be considered when generating the list. The highest-performing
objects are placed at the front of the list.

The object lists will be regenerated if sys mapping changes or device
net policy changes.

Lock np_ob_list_lock is used to protect the object list.

Signed-off-by: Kan Liang 
---
 include/linux/netdevice.h |   2 +
 include/linux/netpolicy.h |  15 +++
 net/core/netpolicy.c  | 237 +-
 3 files changed, 253 insertions(+), 1 deletion(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 0e55ccd..1eda870 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1635,6 +1635,7 @@ enum netdev_priv_flags {
  * @proc_dev:  device node in proc to configure device net policy
  * @netpolicy: NET policy related information of net device
  * @np_lock:   protect the state of NET policy
+ * @np_ob_list_lock:   protect the net policy object list
  *
  * FIXME: cleanup struct net_device such that network protocol info
  * moves out.
@@ -1909,6 +1910,7 @@ struct net_device {
 #endif /* CONFIG_PROC_FS */
struct netpolicy_info   *netpolicy;
spinlock_t  np_lock;
+   spinlock_t  np_ob_list_lock;
 #endif /* CONFIG_NETPOLICY */
 };
 #define to_net_dev(d) container_of(d, struct net_device, dev)
diff --git a/include/linux/netpolicy.h b/include/linux/netpolicy.h
index a946b75c..73a5fa6 100644
--- a/include/linux/netpolicy.h
+++ b/include/linux/netpolicy.h
@@ -21,6 +21,12 @@ enum netpolicy_name {
NET_POLICY_MAX,
 };
 
+enum netpolicy_traffic {
+   NETPOLICY_RX= 0,
+   NETPOLICY_TX,
+   NETPOLICY_RXTX,
+};
+
 extern const char *policy_name[];
 
 struct netpolicy_dev_info {
@@ -46,11 +52,20 @@ struct netpolicy_sys_info {
struct netpolicy_sys_map*tx;
 };
 
+struct netpolicy_object {
+   struct list_headlist;
+   u32 cpu;
+   u32 queue;
+   atomic_trefcnt;
+};
+
 struct netpolicy_info {
enum netpolicy_name cur_policy;
unsigned long avail_policy[BITS_TO_LONGS(NET_POLICY_MAX)];
/* cpu and queue mapping information */
struct netpolicy_sys_info   sys_info;
+   /* List of policy objects 0 rx 1 tx */
+   struct list_head obj_list[NETPOLICY_RXTX][NET_POLICY_MAX];
 };
 
 #endif /*__LINUX_NETPOLICY_H*/
diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index 7d4a49d..0f8ff16 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -35,6 +35,7 @@
 #include 
 #include 
 #include 
+#include 
 
 static int netpolicy_get_dev_info(struct net_device *dev,
  struct netpolicy_dev_info *d_info)
@@ -161,10 +162,30 @@ static void netpolicy_set_affinity(struct net_device *dev)
}
 }
 
+static void netpolicy_free_obj_list(struct net_device *dev)
+{
+   int i, j;
+   struct netpolicy_object *obj, *tmp;
+
+   spin_lock(&dev->np_ob_list_lock);
+   for (i = 0; i < NETPOLICY_RXTX; i++) {
+   for (j = NET_POLICY_NONE; j < NET_POLICY_MAX; j++) {
+   if (list_empty(&dev->netpolicy->obj_list[i][j]))
+   continue;
+   list_for_each_entry_safe(obj, tmp, &dev->netpolicy->obj_list[i][j], list) {
+   list_del(&obj->list);
+   kfree(obj);
+   }
+   }
+   }
+   spin_unlock(&dev->np_ob_list_lock);
+}
+
 static int netpolicy_disable(struct net_device *dev)
 {
netpolicy_clear_affinity(dev);
netpolicy_free_sys_map(dev);
+   netpolicy_free_obj_list(dev);
 
return 0;
 }
@@ -203,6 +224,212 @@ static int netpolicy_enable(struct net_device *dev)
 const char *policy_name[NET_POLICY_MAX] = {
"NONE"
 };
+
+static u32 cpu_to_queue(struct net_device *dev,
+   u32 cpu, bool is_rx)
+{
+   struct netpolicy_sys_info *s_info = &dev->netpolicy->sys_info;
+   int i;
+
+   if (is_rx) {
+   for (i = 0; i < s_info->avail_rx_num; i++) {
+   if (s_info->rx[i].cpu == cpu)
+   return s_info->rx[i].queue;
+   }

[RFC V2 PATCH 10/25] net/netpolicy: add three new NET policies

2016-08-04 Thread kan . liang
From: Kan Liang 

Introduce three NET policies:
CPU policy: configured for higher throughput and lower CPU% (power
saving).
BULK policy: configured for highest throughput.
LATENCY policy: configured for lowest latency.

Signed-off-by: Kan Liang 
---
 include/linux/netpolicy.h | 3 +++
 net/core/netpolicy.c  | 5 -
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/include/linux/netpolicy.h b/include/linux/netpolicy.h
index b1d9277..3d348a7 100644
--- a/include/linux/netpolicy.h
+++ b/include/linux/netpolicy.h
@@ -18,6 +18,9 @@
 
 enum netpolicy_name {
NET_POLICY_NONE = 0,
+   NET_POLICY_CPU,
+   NET_POLICY_BULK,
+   NET_POLICY_LATENCY,
NET_POLICY_MAX,
 };
 
diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index 8112839..71e9163 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -223,7 +223,10 @@ static int netpolicy_enable(struct net_device *dev)
 }
 
 const char *policy_name[NET_POLICY_MAX] = {
-   "NONE"
+   "NONE",
+   "CPU",
+   "BULK",
+   "LATENCY"
 };
 
 static u32 cpu_to_queue(struct net_device *dev,
-- 
2.5.5



[RFC V2 PATCH 13/25] net/netpolicy: support CPU hotplug

2016-08-04 Thread kan . liang
From: Kan Liang 

For CPU hotplug, the NET policy subsystem will rebuild the sys map and
object list.

Signed-off-by: Kan Liang 
---
 net/core/netpolicy.c | 76 
 1 file changed, 76 insertions(+)

diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index 2a04fcf..3b523fc 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -37,6 +37,7 @@
 #include 
 #include 
 #include 
+#include 
 
 static int netpolicy_get_dev_info(struct net_device *dev,
  struct netpolicy_dev_info *d_info)
@@ -838,6 +839,73 @@ static struct notifier_block netpolicy_dev_notf = {
.notifier_call = netpolicy_notify,
 };
 
+/**
+ * update_netpolicy_sys_map() - rebuild the sys map and object list
+ *
+ * This function goes through every device that supports net policy
+ * and rebuilds its sys map and object list.
+ *
+ */
+void update_netpolicy_sys_map(void)
+{
+   struct net *net;
+   struct net_device *dev, *aux;
+   enum netpolicy_name cur_policy;
+
+   for_each_net(net) {
+   for_each_netdev_safe(net, dev, aux) {
+   spin_lock(&dev->np_lock);
+   if (!dev->netpolicy)
+   goto unlock;
+   cur_policy = dev->netpolicy->cur_policy;
+   if (cur_policy == NET_POLICY_NONE)
+   goto unlock;
+
+   dev->netpolicy->cur_policy = NET_POLICY_NONE;
+
+   /* rebuild everything */
+   netpolicy_disable(dev);
+   netpolicy_enable(dev);
+   if (netpolicy_gen_obj_list(dev, cur_policy)) {
+   pr_warn("NETPOLICY: Failed to generate netpolicy object list for dev %s\n",
+   dev->name);
+   netpolicy_disable(dev);
+   goto unlock;
+   }
+   if (dev->netdev_ops->ndo_set_net_policy(dev, cur_policy)) {
+   pr_warn("NETPOLICY: Failed to set netpolicy for dev %s\n",
+   dev->name);
+   netpolicy_disable(dev);
+   goto unlock;
+   }
+
+   dev->netpolicy->cur_policy = cur_policy;
+unlock:
+   spin_unlock(&dev->np_lock);
+   }
+   }
+}
+
+static int netpolicy_cpu_callback(struct notifier_block *nfb,
+ unsigned long action, void *hcpu)
+{
+   switch (action & ~CPU_TASKS_FROZEN) {
+   case CPU_ONLINE:
+   update_netpolicy_sys_map();
+   break;
+   case CPU_DYING:
+   update_netpolicy_sys_map();
+   break;
+   }
+   return NOTIFY_OK;
+}
+
+static struct notifier_block netpolicy_cpu_notifier = {
+   &netpolicy_cpu_callback,
+   NULL,
+   0
+};
+
 static int __init netpolicy_init(void)
 {
int ret;
@@ -846,6 +914,10 @@ static int __init netpolicy_init(void)
if (!ret)
register_netdevice_notifier(&netpolicy_dev_notf);
 
+   cpu_notifier_register_begin();
+   __register_cpu_notifier(&netpolicy_cpu_notifier);
+   cpu_notifier_register_done();
+
return ret;
 }
 
@@ -853,6 +925,10 @@ static void __exit netpolicy_exit(void)
 {
unregister_netdevice_notifier(&netpolicy_dev_notf);
unregister_pernet_subsys(&netpolicy_net_ops);
+
+   cpu_notifier_register_begin();
+   __unregister_cpu_notifier(&netpolicy_cpu_notifier);
+   cpu_notifier_register_done();
 }
 
 subsys_initcall(netpolicy_init);
-- 
2.5.5



[RFC V2 PATCH 07/25] net/netpolicy: enable and disable NET policy

2016-08-04 Thread kan . liang
From: Kan Liang 

This patch introduces functions to enable and disable NET policy.

For enabling, it collects device and CPU information, sets up the
CPU/queue mapping, and sets IRQ affinity accordingly.

For disabling, it removes the IRQ affinity and mapping information.

np_lock should protect the enable and disable state. It will be done
later in this series.

Signed-off-by: Kan Liang 
---
 net/core/netpolicy.c | 39 +++
 1 file changed, 39 insertions(+)

diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index c44818d..7d4a49d 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -161,6 +161,45 @@ static void netpolicy_set_affinity(struct net_device *dev)
}
 }
 
+static int netpolicy_disable(struct net_device *dev)
+{
+   netpolicy_clear_affinity(dev);
+   netpolicy_free_sys_map(dev);
+
+   return 0;
+}
+
+static int netpolicy_enable(struct net_device *dev)
+{
+   int ret;
+   struct netpolicy_dev_info d_info;
+   u32 cpu;
+
+   if (WARN_ON(!dev->netpolicy))
+   return -EINVAL;
+
+   /* get driver information */
+   ret = netpolicy_get_dev_info(dev, &d_info);
+   if (ret)
+   return ret;
+
+   /* get cpu information */
+   cpu = netpolicy_get_cpu_information();
+
+   /* create sys map */
+   ret = netpolicy_update_sys_map(dev, &d_info, cpu);
+   if (ret) {
+   netpolicy_free_dev_info(&d_info);
+   return ret;
+   }
+
+   /* set irq affinity */
+   netpolicy_set_affinity(dev);
+
+   netpolicy_free_dev_info(&d_info);
+   return 0;
+}
+
 const char *policy_name[NET_POLICY_MAX] = {
"NONE"
 };
-- 
2.5.5



[RFC V2 PATCH 15/25] net/netpolicy: implement netpolicy register

2016-08-04 Thread kan . liang
From: Kan Liang 

A socket/task only benefits when it registers itself with a specific
policy. On first registration, a record is created and inserted into an
RCU hash table. The record includes ptr, policy and object information.
ptr is the socket/task's pointer, used as the key to search for the
record in the hash table. The object is assigned later.

This patch also introduces a new value, NET_POLICY_INVALID, which
indicates that the task/socket is not registered.

np_hashtable_lock is introduced to protect the hash table.

Signed-off-by: Kan Liang 
---
 include/linux/netpolicy.h |  26 
 net/core/netpolicy.c  | 153 ++
 2 files changed, 179 insertions(+)

diff --git a/include/linux/netpolicy.h b/include/linux/netpolicy.h
index cc75e3c..5900252 100644
--- a/include/linux/netpolicy.h
+++ b/include/linux/netpolicy.h
@@ -17,6 +17,7 @@
 #define __LINUX_NETPOLICY_H
 
 enum netpolicy_name {
+   NET_POLICY_INVALID  = -1,
NET_POLICY_NONE = 0,
NET_POLICY_CPU,
NET_POLICY_BULK,
@@ -79,12 +80,37 @@ struct netpolicy_info {
	struct list_head obj_list[NETPOLICY_RXTX][NET_POLICY_MAX];
 };
 
+struct netpolicy_instance {
+   struct net_device   *dev;
+   enum netpolicy_name policy; /* required policy */
+   void*ptr;   /* pointers */
+};
+
+/* check if policy is valid */
+static inline int is_net_policy_valid(enum netpolicy_name policy)
+{
+   return ((policy < NET_POLICY_MAX) && (policy > NET_POLICY_INVALID));
+}
+
 #ifdef CONFIG_NETPOLICY
 extern void update_netpolicy_sys_map(void);
+extern int netpolicy_register(struct netpolicy_instance *instance,
+ enum netpolicy_name policy);
+extern void netpolicy_unregister(struct netpolicy_instance *instance);
 #else
 static inline void update_netpolicy_sys_map(void)
 {
 }
+
+static inline int netpolicy_register(struct netpolicy_instance *instance,
+enum netpolicy_name policy)
+{
+   return 0;
+}
+
+static inline void netpolicy_unregister(struct netpolicy_instance *instance)
+{
+}
+
 #endif
 
 #endif /*__LINUX_NETPOLICY_H*/
diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index 7579685..3605761 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -38,6 +38,19 @@
 #include 
 #include 
 #include 
+#include 
+
+struct netpolicy_record {
+   struct hlist_node   hash_node;
+   unsigned long   ptr_id;
+   enum netpolicy_name policy;
+   struct net_device   *dev;
+   struct netpolicy_object *rx_obj;
+   struct netpolicy_object *tx_obj;
+};
+
+static DEFINE_HASHTABLE(np_record_hash, 10);
+static DEFINE_SPINLOCK(np_hashtable_lock);
 
 static int netpolicy_get_dev_info(struct net_device *dev,
  struct netpolicy_dev_info *d_info)
@@ -223,6 +236,143 @@ static int netpolicy_enable(struct net_device *dev)
return 0;
 }
 
+static struct netpolicy_record *netpolicy_record_search(unsigned long ptr_id)
+{
+   struct netpolicy_record *rec = NULL;
+
+   hash_for_each_possible_rcu(np_record_hash, rec, hash_node, ptr_id) {
+   if (rec->ptr_id == ptr_id)
+   break;
+   }
+
+   return rec;
+}
+
+static void put_queue(struct net_device *dev,
+ struct netpolicy_object *rx_obj,
+ struct netpolicy_object *tx_obj)
+{
+   if (!dev || !dev->netpolicy)
+   return;
+
+   if (rx_obj)
+   atomic_dec(&rx_obj->refcnt);
+   if (tx_obj)
+   atomic_dec(&tx_obj->refcnt);
+}
+
+static void netpolicy_record_clear_obj(void)
+{
+   struct netpolicy_record *rec;
+   int i;
+
+   spin_lock_bh(&np_hashtable_lock);
+   hash_for_each_rcu(np_record_hash, i, rec, hash_node) {
+   put_queue(rec->dev, rec->rx_obj, rec->tx_obj);
+   rec->rx_obj = NULL;
+   rec->tx_obj = NULL;
+   }
+   spin_unlock_bh(&np_hashtable_lock);
+}
+
+static void netpolicy_record_clear_dev_node(struct net_device *dev)
+{
+   struct netpolicy_record *rec;
+   int i;
+
+   spin_lock_bh(&np_hashtable_lock);
+   hash_for_each_rcu(np_record_hash, i, rec, hash_node) {
+   if (rec->dev == dev) {
+   hash_del_rcu(&rec->hash_node);
+   kfree(rec);
+   }
+   }
+   spin_unlock_bh(&np_hashtable_lock);
+}
+
+/**
+ * netpolicy_register() - Register per socket/task policy request
+ * @instance:  NET policy per socket/task instance info
+ * @policy:request NET policy
+ *
+ * This function registers a per socket/task policy request. If it is
+ * the first registration, a record is created and inserted into the
+ * RCU hash table.
+ *
+ * The record includes ptr, policy and object info. ptr of the socket/task
+ * is the key used to search for the record in the hash table.

[RFC V2 PATCH 11/25] net/netpolicy: add MIX policy

2016-08-04 Thread kan . liang
From: Kan Liang 

The MIX policy is a combination of other policies. It allows different
queues to have different policies. When the MIX policy is applied,
/proc/net/netpolicy/$DEV/policy shows the per-queue policy.

Workloads usually require either high throughput or low latency, so in
the current implementation the MIX policy is a combination of the
LATENCY and BULK policies.

Workloads that require high throughput usually consume more CPU
resources than workloads that require low latency. This means that,
given equal interest in latency and throughput, it is better to
reserve more BULK queues than LATENCY queues. In this patch, the MIX
policy is fixed at 1/3 LATENCY policy queues and 2/3 BULK policy
queues.

Signed-off-by: Kan Liang 
---
 include/linux/netpolicy.h |   7 +++
 net/core/netpolicy.c  | 139 ++
 2 files changed, 136 insertions(+), 10 deletions(-)

diff --git a/include/linux/netpolicy.h b/include/linux/netpolicy.h
index 3d348a7..579ff98 100644
--- a/include/linux/netpolicy.h
+++ b/include/linux/netpolicy.h
@@ -22,6 +22,12 @@ enum netpolicy_name {
NET_POLICY_BULK,
NET_POLICY_LATENCY,
NET_POLICY_MAX,
+
+   /*
+* Mixture of the above policy
+* Can only be set as global policy.
+*/
+   NET_POLICY_MIX,
 };
 
 enum netpolicy_traffic {
@@ -66,6 +72,7 @@ struct netpolicy_object {
 struct netpolicy_info {
enum netpolicy_name cur_policy;
unsigned long avail_policy[BITS_TO_LONGS(NET_POLICY_MAX)];
+   boolhas_mix_policy;
/* cpu and queue mapping information */
struct netpolicy_sys_info   sys_info;
/* List of policy objects 0 rx 1 tx */
diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index 71e9163..8336106 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -280,6 +280,9 @@ static inline int node_distance_cmp(const void *a, const void *b)
return _a->distance - _b->distance;
 }
 
+#define mix_latency_num(num)   ((num) / 3)
+#define mix_throughput_num(num)((num) - mix_latency_num(num))
+
 static int _netpolicy_gen_obj_list(struct net_device *dev, bool is_rx,
   enum netpolicy_name policy,
   struct sort_node *nodes, int num_node,
@@ -287,7 +290,9 @@ static int _netpolicy_gen_obj_list(struct net_device *dev, 
bool is_rx,
 {
cpumask_var_t node_tmp_cpumask, sibling_tmp_cpumask;
struct cpumask *node_assigned_cpumask;
+   int *l_num = NULL, *b_num = NULL;
int i, ret = -ENOMEM;
+   int num_node_cpu;
u32 cpu;
 
if (!alloc_cpumask_var(&node_tmp_cpumask, GFP_ATOMIC))
@@ -299,6 +304,23 @@ static int _netpolicy_gen_obj_list(struct net_device *dev, 
bool is_rx,
if (!node_assigned_cpumask)
goto alloc_fail2;
 
+   if (policy == NET_POLICY_MIX) {
+   l_num = kcalloc(num_node, sizeof(int), GFP_ATOMIC);
+   if (!l_num)
+   goto alloc_fail3;
+   b_num = kcalloc(num_node, sizeof(int), GFP_ATOMIC);
+   if (!b_num) {
+   kfree(l_num);
+   goto alloc_fail3;
+   }
+
+   for (i = 0; i < num_node; i++) {
+   num_node_cpu = cpumask_weight(&node_avail_cpumask[nodes[i].node]);
+   l_num[i] = mix_latency_num(num_node_cpu);
+   b_num[i] = mix_throughput_num(num_node_cpu);
+   }
+   }
+
/* Don't share physical core */
for (i = 0; i < num_node; i++) {
if (cpumask_weight(&node_avail_cpumask[nodes[i].node]) == 0)
@@ -309,7 +331,13 @@ static int _netpolicy_gen_obj_list(struct net_device *dev, 
bool is_rx,
cpu = cpumask_first(node_tmp_cpumask);
 
/* push to obj list */
-   ret = netpolicy_add_obj(dev, cpu, is_rx, policy);
+   if (policy == NET_POLICY_MIX) {
+   if (l_num[i]-- > 0)
+   ret = netpolicy_add_obj(dev, cpu, is_rx, NET_POLICY_LATENCY);
+   else if (b_num[i]-- > 0)
+   ret = netpolicy_add_obj(dev, cpu, is_rx, NET_POLICY_BULK);
+   } else
+   ret = netpolicy_add_obj(dev, cpu, is_rx, policy);
if (ret) {
spin_unlock(&dev->np_ob_list_lock);
goto err;
@@ -322,6 +350,41 @@ static int _netpolicy_gen_obj_list(struct net_device *dev, 
bool is_rx,
spin_unlock(&dev->np_ob_list_lock);
}
 
+   if (policy == NET_POLICY_MIX) {
+   struct netpolicy_object *obj;
+   int dir = is_rx ? 0 : 1;
+   u32 sibling;
+
+   /* if have to share core, choose

[RFC V2 PATCH 18/25] net/netpolicy: set Tx queues according to policy

2016-08-04 Thread kan . liang
From: Kan Liang 

When the device tries to transmit a packet, netdev_pick_tx is called to
find an available Tx queue. If a net policy is applied, it picks up the
assigned Tx queue from the net policy subsystem and redirects the
traffic to that queue.

Signed-off-by: Kan Liang 
---
 include/net/sock.h |  9 +
 net/core/dev.c | 20 ++--
 2 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index fd4132f..6219434 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -2273,4 +2273,13 @@ extern int sysctl_optmem_max;
 extern __u32 sysctl_wmem_default;
 extern __u32 sysctl_rmem_default;
 
+/* Return netpolicy instance information from socket. */
+static inline struct netpolicy_instance *netpolicy_find_instance(struct sock 
*sk)
+{
+#ifdef CONFIG_NETPOLICY
+   if (is_net_policy_valid(sk->sk_netpolicy.policy))
+   return &sk->sk_netpolicy;
+#endif
+   return NULL;
+}
 #endif /* _SOCK_H */
diff --git a/net/core/dev.c b/net/core/dev.c
index 2a9c39f..08db6eb 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3266,6 +3266,7 @@ struct netdev_queue *netdev_pick_tx(struct net_device 
*dev,
struct sk_buff *skb,
void *accel_priv)
 {
+   struct sock *sk = skb->sk;
int queue_index = 0;
 
 #ifdef CONFIG_XPS
@@ -3280,8 +3281,23 @@ struct netdev_queue *netdev_pick_tx(struct net_device 
*dev,
if (ops->ndo_select_queue)
queue_index = ops->ndo_select_queue(dev, skb, 
accel_priv,
__netdev_pick_tx);
-   else
-   queue_index = __netdev_pick_tx(dev, skb);
+   else {
+#ifdef CONFIG_NETPOLICY
+   struct netpolicy_instance *instance;
+
+   queue_index = -1;
+   if (dev->netpolicy && sk) {
+   instance = netpolicy_find_instance(sk);
+   if (instance) {
+   if (!instance->dev)
+   instance->dev = dev;
+   queue_index = netpolicy_pick_queue(instance, false);
+   }
+   }
+   if (queue_index < 0)
+#endif
+   queue_index = __netdev_pick_tx(dev, skb);
+   }
 
if (!accel_priv)
queue_index = netdev_cap_txqueue(dev, queue_index);
-- 
2.5.5



[RFC V2 PATCH 16/25] net/netpolicy: introduce per socket netpolicy

2016-08-04 Thread kan . liang
From: Kan Liang 

The network socket is the most basic unit that controls network
traffic. This patch introduces a new socket option, SO_NETPOLICY, to
set/get the net policy of a socket, so that an application can set its
own policy on a socket to improve network performance.
A per-socket net policy can also be inherited by a new socket.

The SO_NETPOLICY socket option is used as below.
setsockopt(sockfd, SOL_SOCKET, SO_NETPOLICY, &policy, sizeof(int))
getsockopt(sockfd, SOL_SOCKET, SO_NETPOLICY, &policy, sizeof(int))
The policy set via SO_NETPOLICY must be valid and compatible with the
current device policy; otherwise it errors out and the socket policy is
set to NET_POLICY_INVALID.

Signed-off-by: Kan Liang 
---
 arch/alpha/include/uapi/asm/socket.h   |  2 ++
 arch/avr32/include/uapi/asm/socket.h   |  2 ++
 arch/frv/include/uapi/asm/socket.h |  2 ++
 arch/ia64/include/uapi/asm/socket.h|  2 ++
 arch/m32r/include/uapi/asm/socket.h|  2 ++
 arch/mips/include/uapi/asm/socket.h|  2 ++
 arch/mn10300/include/uapi/asm/socket.h |  2 ++
 arch/parisc/include/uapi/asm/socket.h  |  2 ++
 arch/powerpc/include/uapi/asm/socket.h |  2 ++
 arch/s390/include/uapi/asm/socket.h|  2 ++
 arch/sparc/include/uapi/asm/socket.h   |  2 ++
 arch/xtensa/include/uapi/asm/socket.h  |  2 ++
 include/net/request_sock.h |  4 +++-
 include/net/sock.h |  9 +
 include/uapi/asm-generic/socket.h  |  2 ++
 net/core/sock.c| 28 
 16 files changed, 66 insertions(+), 1 deletion(-)

diff --git a/arch/alpha/include/uapi/asm/socket.h 
b/arch/alpha/include/uapi/asm/socket.h
index 9e46d6e..06b2ef9 100644
--- a/arch/alpha/include/uapi/asm/socket.h
+++ b/arch/alpha/include/uapi/asm/socket.h
@@ -97,4 +97,6 @@
 
 #define SO_CNX_ADVICE  53
 
+#define SO_NETPOLICY   54
+
 #endif /* _UAPI_ASM_SOCKET_H */
diff --git a/arch/avr32/include/uapi/asm/socket.h 
b/arch/avr32/include/uapi/asm/socket.h
index 1fd147f..24f85f0 100644
--- a/arch/avr32/include/uapi/asm/socket.h
+++ b/arch/avr32/include/uapi/asm/socket.h
@@ -90,4 +90,6 @@
 
 #define SO_CNX_ADVICE  53
 
+#define SO_NETPOLICY   54
+
 #endif /* _UAPI__ASM_AVR32_SOCKET_H */
diff --git a/arch/frv/include/uapi/asm/socket.h 
b/arch/frv/include/uapi/asm/socket.h
index afbc98f0..82c8d44 100644
--- a/arch/frv/include/uapi/asm/socket.h
+++ b/arch/frv/include/uapi/asm/socket.h
@@ -90,5 +90,7 @@
 
 #define SO_CNX_ADVICE  53
 
+#define SO_NETPOLICY   54
+
 #endif /* _ASM_SOCKET_H */
 
diff --git a/arch/ia64/include/uapi/asm/socket.h 
b/arch/ia64/include/uapi/asm/socket.h
index 0018fad..b99c1df 100644
--- a/arch/ia64/include/uapi/asm/socket.h
+++ b/arch/ia64/include/uapi/asm/socket.h
@@ -99,4 +99,6 @@
 
 #define SO_CNX_ADVICE  53
 
+#define SO_NETPOLICY   54
+
 #endif /* _ASM_IA64_SOCKET_H */
diff --git a/arch/m32r/include/uapi/asm/socket.h 
b/arch/m32r/include/uapi/asm/socket.h
index 5fe42fc..71a43ed 100644
--- a/arch/m32r/include/uapi/asm/socket.h
+++ b/arch/m32r/include/uapi/asm/socket.h
@@ -90,4 +90,6 @@
 
 #define SO_CNX_ADVICE  53
 
+#define SO_NETPOLICY   54
+
 #endif /* _ASM_M32R_SOCKET_H */
diff --git a/arch/mips/include/uapi/asm/socket.h 
b/arch/mips/include/uapi/asm/socket.h
index 2027240a..ce8b9ba 100644
--- a/arch/mips/include/uapi/asm/socket.h
+++ b/arch/mips/include/uapi/asm/socket.h
@@ -108,4 +108,6 @@
 
 #define SO_CNX_ADVICE  53
 
+#define SO_NETPOLICY   54
+
 #endif /* _UAPI_ASM_SOCKET_H */
diff --git a/arch/mn10300/include/uapi/asm/socket.h 
b/arch/mn10300/include/uapi/asm/socket.h
index 5129f23..c041265 100644
--- a/arch/mn10300/include/uapi/asm/socket.h
+++ b/arch/mn10300/include/uapi/asm/socket.h
@@ -90,4 +90,6 @@
 
 #define SO_CNX_ADVICE  53
 
+#define SO_NETPOLICY   54
+
 #endif /* _ASM_SOCKET_H */
diff --git a/arch/parisc/include/uapi/asm/socket.h 
b/arch/parisc/include/uapi/asm/socket.h
index 9c935d7..2639dcd 100644
--- a/arch/parisc/include/uapi/asm/socket.h
+++ b/arch/parisc/include/uapi/asm/socket.h
@@ -89,4 +89,6 @@
 
 #define SO_CNX_ADVICE  0x402E
 
+#define SO_NETPOLICY   0x402F
+
 #endif /* _UAPI_ASM_SOCKET_H */
diff --git a/arch/powerpc/include/uapi/asm/socket.h 
b/arch/powerpc/include/uapi/asm/socket.h
index 1672e33..e04e3b6 100644
--- a/arch/powerpc/include/uapi/asm/socket.h
+++ b/arch/powerpc/include/uapi/asm/socket.h
@@ -97,4 +97,6 @@
 
 #define SO_CNX_ADVICE  53
 
+#define SO_NETPOLICY   54
+
 #endif /* _ASM_POWERPC_SOCKET_H */
diff --git a/arch/s390/include/uapi/asm/socket.h 
b/arch/s390/include/uapi/asm/socket.h
index 41b51c2..d43b854 100644
--- a/arch/s390/include/uapi/asm/socket.h
+++ b/arch/s390/include/uapi/asm/socket.h
@@ -96,4 +96,6 @@
 
 #define SO_CNX_ADVICE  53
 
+#define SO_NETPOLICY   54
+
 #endif /* _ASM_SOCKET_H */
diff --git a/arch/sparc/include/uapi/asm/socket.h 
b/arch/sparc/include/uapi/

[RFC V2 PATCH 21/25] net/netpolicy: set per task policy by proc

2016-08-04 Thread kan . liang
From: Kan Liang 

Users may not want to change the source code to add per-task net policy
support, or they may want to change a running task's net policy. prctl
does not work for either case.

This patch adds an interface in /proc that can be used to set and
retrieve the policy of already-running tasks. Users can write a policy
name into /proc/$PID/net_policy to set the per-task net policy.

Signed-off-by: Kan Liang 
---
 fs/proc/base.c | 64 ++
 1 file changed, 64 insertions(+)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index a11eb71..7679785 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -91,6 +91,8 @@
 #include 
 #endif
 #include 
+#include 
+#include 
 #include "internal.h"
 #include "fd.h"
 
@@ -2807,6 +2809,65 @@ static int proc_pid_personality(struct seq_file *m, 
struct pid_namespace *ns,
return err;
 }
 
+#ifdef CONFIG_NETPOLICY
+static int proc_net_policy_show(struct seq_file *m, void *v)
+{
+   struct inode *inode = m->private;
+   struct task_struct *task = get_proc_task(inode);
+
+   if (is_net_policy_valid(task->task_netpolicy.policy))
+   seq_printf(m, "%s\n", policy_name[task->task_netpolicy.policy]);
+
+   return 0;
+}
+
+static int proc_net_policy_open(struct inode *inode, struct file *file)
+{
+   return single_open(file, proc_net_policy_show, inode);
+}
+
+static ssize_t proc_net_policy_write(struct file *file, const char __user *buf,
+size_t count, loff_t *ppos)
+{
+   struct inode *inode = file_inode(file);
+   struct task_struct *task = get_proc_task(inode);
+   char name[POLICY_NAME_LEN_MAX];
+   int i, ret;
+
+   if (count >= POLICY_NAME_LEN_MAX)
+   return -EINVAL;
+
+   if (copy_from_user(name, buf, count))
+   return -EINVAL;
+
+   for (i = 0; i < count - 1; i++)
+   name[i] = toupper(name[i]);
+   name[POLICY_NAME_LEN_MAX - 1] = 0;
+
+   for (i = 0; i < NET_POLICY_MAX; i++) {
+   if (!strncmp(name, policy_name[i], strlen(policy_name[i]))) {
+   ret = netpolicy_register(&task->task_netpolicy, i);
+   if (ret)
+   return ret;
+   break;
+   }
+   }
+
+   if (i == NET_POLICY_MAX)
+   return -EINVAL;
+
+   return count;
+}
+
+static const struct file_operations proc_net_policy_operations = {
+   .open   = proc_net_policy_open,
+   .write  = proc_net_policy_write,
+   .read   = seq_read,
+   .llseek = seq_lseek,
+   .release= seq_release,
+};
+#endif /* CONFIG_NETPOLICY */
+
 /*
  * Thread groups
  */
@@ -2906,6 +2967,9 @@ static const struct pid_entry tgid_base_stuff[] = {
REG("timers", S_IRUGO, proc_timers_operations),
 #endif
REG("timerslack_ns", S_IRUGO|S_IWUGO, 
proc_pid_set_timerslack_ns_operations),
+#if IS_ENABLED(CONFIG_NETPOLICY)
+   REG("net_policy", S_IRUSR|S_IWUSR, proc_net_policy_operations),
+#endif
 };
 
 static int proc_tgid_base_readdir(struct file *file, struct dir_context *ctx)
-- 
2.5.5



[RFC V2 PATCH 14/25] net/netpolicy: handle channel changes

2016-08-04 Thread kan . liang
From: Kan Liang 

Users can use ethtool to set the channel count. This patch handles
channel changes by rebuilding the object list.

Signed-off-by: Kan Liang 
---
 include/linux/netpolicy.h | 8 
 net/core/ethtool.c| 8 +++-
 net/core/netpolicy.c  | 1 +
 3 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/include/linux/netpolicy.h b/include/linux/netpolicy.h
index 579ff98..cc75e3c 100644
--- a/include/linux/netpolicy.h
+++ b/include/linux/netpolicy.h
@@ -79,4 +79,12 @@ struct netpolicy_info {
+   struct list_head obj_list[NETPOLICY_RXTX][NET_POLICY_MAX];
 };
 
+#ifdef CONFIG_NETPOLICY
+extern void update_netpolicy_sys_map(void);
+#else
+static inline void update_netpolicy_sys_map(void)
+{
+}
+#endif
+
 #endif /*__LINUX_NETPOLICY_H*/
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index 9774898..e1f8bd0 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -1703,6 +1703,7 @@ static noinline_for_stack int ethtool_set_channels(struct 
net_device *dev,
 {
struct ethtool_channels channels, max;
u32 max_rx_in_use = 0;
+   int ret;
 
if (!dev->ethtool_ops->set_channels || !dev->ethtool_ops->get_channels)
return -EOPNOTSUPP;
@@ -1726,7 +1727,12 @@ static noinline_for_stack int 
ethtool_set_channels(struct net_device *dev,
(channels.combined_count + channels.rx_count) <= max_rx_in_use)
return -EINVAL;
 
-   return dev->ethtool_ops->set_channels(dev, &channels);
+   ret = dev->ethtool_ops->set_channels(dev, &channels);
+#ifdef CONFIG_NETPOLICY
+   if (!ret)
+   update_netpolicy_sys_map();
+#endif
+   return ret;
 }
 
 static int ethtool_get_pauseparam(struct net_device *dev, void __user 
*useraddr)
diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index 3b523fc..7579685 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -885,6 +885,7 @@ unlock:
}
}
 }
+EXPORT_SYMBOL(update_netpolicy_sys_map);
 
 static int netpolicy_cpu_callback(struct notifier_block *nfb,
  unsigned long action, void *hcpu)
-- 
2.5.5



[RFC V2 PATCH 19/25] net/netpolicy: set Rx queues according to policy

2016-08-04 Thread kan . liang
From: Kan Liang 

For setting Rx queues, this patch configures Rx network flow
classification rules to redirect packets to the assigned queue.

Since we may not have all the information required for the rule until
the first packet arrives, the rule is added after recvmsg. Also, to
avoid hurting connection rates, the configuration is done
asynchronously via a work queue, so the first several packets may not
use the assigned queue.

The dev information is discarded in udp_queue_rcv_skb, so we record it
in the netpolicy struct in advance.

This patch only supports INET tcp4 and udp4. It can be extended to
other socket types and IPv6 later.

Each sk supports only one rule. If the port/address changes, the
previous rule is replaced.

Signed-off-by: Kan Liang 
---
 include/linux/netpolicy.h |  33 +++-
 net/core/netpolicy.c  | 131 +-
 net/ipv4/af_inet.c|  71 +
 net/ipv4/udp.c|   4 ++
 4 files changed, 236 insertions(+), 3 deletions(-)

diff --git a/include/linux/netpolicy.h b/include/linux/netpolicy.h
index a522015..df962de 100644
--- a/include/linux/netpolicy.h
+++ b/include/linux/netpolicy.h
@@ -37,6 +37,8 @@ enum netpolicy_traffic {
NETPOLICY_RXTX,
 };
 
+#define NETPOLICY_INVALID_QUEUE-1
+#define NETPOLICY_INVALID_LOC  NETPOLICY_INVALID_QUEUE
 #define POLICY_NAME_LEN_MAX64
 extern const char *policy_name[];
 
@@ -80,10 +82,32 @@ struct netpolicy_info {
	struct list_head obj_list[NETPOLICY_RXTX][NET_POLICY_MAX];
 };
 
+struct netpolicy_tcpudpip4_spec {
+   /* source and Destination host and port */
+   __be32  ip4src;
+   __be32  ip4dst;
+   __be16  psrc;
+   __be16  pdst;
+};
+
+union netpolicy_flow_union {
+   struct netpolicy_tcpudpip4_spec tcp_udp_ip4_spec;
+};
+
+struct netpolicy_flow_spec {
+   __u32   flow_type;
+   union netpolicy_flow_union  spec;
+};
+
 struct netpolicy_instance {
struct net_device   *dev;
-   enum netpolicy_name policy; /* required policy */
-   void*ptr;   /* pointers */
+   enum netpolicy_name policy; /* required policy */
+   void*ptr;   /* pointers */
+   int location;   /* rule location */
+   atomic_trule_queue; /* queue set by rule */
+   struct work_struct  fc_wk;  /* flow classification work */
+   atomic_tfc_wk_cnt;  /* flow classification work 
number */
+   struct netpolicy_flow_spec flow;/* flow information */
 };
 
 /* check if policy is valid */
@@ -98,6 +122,7 @@ extern int netpolicy_register(struct netpolicy_instance 
*instance,
  enum netpolicy_name policy);
 extern void netpolicy_unregister(struct netpolicy_instance *instance);
 extern int netpolicy_pick_queue(struct netpolicy_instance *instance, bool 
is_rx);
+extern void netpolicy_set_rules(struct netpolicy_instance *instance);
 #else
 static inline void update_netpolicy_sys_map(void)
 {
@@ -116,6 +141,10 @@ static inline int netpolicy_pick_queue(struct 
netpolicy_instance *instance, bool
 {
return 0;
 }
+
+static inline void netpolicy_set_rules(struct netpolicy_instance *instance)
+{
+}
 #endif
 
 #endif /*__LINUX_NETPOLICY_H*/
diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index 98ca430..89c65d9 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -39,6 +39,7 @@
 #include 
 #include 
 #include 
+#include 
 
 struct netpolicy_record {
struct hlist_node   hash_node;
@@ -52,6 +53,8 @@ struct netpolicy_record {
 static DEFINE_HASHTABLE(np_record_hash, 10);
 static DEFINE_SPINLOCK(np_hashtable_lock);
 
+struct workqueue_struct *np_fc_wq;
+
 static int netpolicy_get_dev_info(struct net_device *dev,
  struct netpolicy_dev_info *d_info)
 {
@@ -426,6 +429,90 @@ int netpolicy_pick_queue(struct netpolicy_instance 
*instance, bool is_rx)
 }
 EXPORT_SYMBOL(netpolicy_pick_queue);
 
+void np_flow_rule_set(struct work_struct *wk)
+{
+   struct netpolicy_instance *instance;
+   struct netpolicy_flow_spec *flow;
+   struct ethtool_rxnfc cmd;
+   struct net_device *dev;
+   int queue, ret;
+
+   instance = container_of(wk, struct netpolicy_instance,
+   fc_wk);
+   if (!instance)
+   return;
+
+   flow = &instance->flow;
+   if (WARN_ON(!flow))
+   goto done;
+   dev = instance->dev;
+   if (WARN_ON(!dev))
+   goto done;
+
+   /* Check if ntuple is supported */
+   if (!dev->ethtool_ops->set_rxnfc)
+   goto done;
+
+   /* Only support TCP/UDP V4 by now */
+   if ((flow->flow_type != TCP_V4_FLOW) &&
+   (flow->flow_type != UDP_V4_FLOW))
+   goto done;
+
+   queue = get_avail

[RFC V2 PATCH 17/25] net/netpolicy: introduce netpolicy_pick_queue

2016-08-04 Thread kan . liang
From: Kan Liang 

To achieve better network performance, the key step is to distribute the
packets to dedicated queues according to policy and system runtime
status.

This patch provides an interface which returns the proper dedicated
queue for a socket/task. The packets of the socket/task are then
redirected to the dedicated queue for better network performance.

For selecting the proper queue, a round-robin algorithm is currently
used to find the available object from the given policy object list. The
algorithm is good enough for now, but it could be improved with an
adaptive algorithm later.

The selected object is stored in a hash table, so there is no need to
go through the whole object list every time.

Signed-off-by: Kan Liang 
---
 include/linux/netpolicy.h |   5 ++
 net/core/netpolicy.c  | 136 ++
 2 files changed, 141 insertions(+)

diff --git a/include/linux/netpolicy.h b/include/linux/netpolicy.h
index 5900252..a522015 100644
--- a/include/linux/netpolicy.h
+++ b/include/linux/netpolicy.h
@@ -97,6 +97,7 @@ extern void update_netpolicy_sys_map(void);
 extern int netpolicy_register(struct netpolicy_instance *instance,
  enum netpolicy_name policy);
 extern void netpolicy_unregister(struct netpolicy_instance *instance);
+extern int netpolicy_pick_queue(struct netpolicy_instance *instance, bool 
is_rx);
 #else
 static inline void update_netpolicy_sys_map(void)
 {
@@ -111,6 +112,10 @@ static inline void netpolicy_unregister(struct 
netpolicy_instance *instance)
 {
 }
 
+static inline int netpolicy_pick_queue(struct netpolicy_instance *instance, 
bool is_rx)
+{
+   return 0;
+}
 #endif
 
 #endif /*__LINUX_NETPOLICY_H*/
diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index 3605761..98ca430 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -290,6 +290,142 @@ static void netpolicy_record_clear_dev_node(struct 
net_device *dev)
spin_unlock_bh(&np_hashtable_lock);
 }
 
+static struct netpolicy_object *get_avail_object(struct net_device *dev,
+enum netpolicy_name policy,
+bool is_rx)
+{
+   int dir = is_rx ? NETPOLICY_RX : NETPOLICY_TX;
+   struct netpolicy_object *tmp, *obj = NULL;
+   int val = -1;
+
+   /* Check if net policy is supported */
+   if (!dev || !dev->netpolicy)
+   return NULL;
+
+   /* The system should have queues which support the request policy. */
+   if ((policy != dev->netpolicy->cur_policy) &&
+   (dev->netpolicy->cur_policy != NET_POLICY_MIX))
+   return NULL;
+
+   spin_lock_bh(&dev->np_ob_list_lock);
+   list_for_each_entry(tmp, &dev->netpolicy->obj_list[dir][policy], list) {
+   if ((val > atomic_read(&tmp->refcnt)) ||
+   (val == -1)) {
+   val = atomic_read(&tmp->refcnt);
+   obj = tmp;
+   }
+   }
+
+   if (WARN_ON(!obj)) {
+   spin_unlock_bh(&dev->np_ob_list_lock);
+   return NULL;
+   }
+   atomic_inc(&obj->refcnt);
+   spin_unlock_bh(&dev->np_ob_list_lock);
+
+   return obj;
+}
+
+static int get_avail_queue(struct netpolicy_instance *instance, bool is_rx)
+{
+   struct netpolicy_record *old_record, *new_record;
+   struct net_device *dev = instance->dev;
+   unsigned long ptr_id = (uintptr_t)instance->ptr;
+   int queue = -1;
+
+   spin_lock_bh(&np_hashtable_lock);
+   old_record = netpolicy_record_search(ptr_id);
+   if (!old_record) {
+   pr_warn("NETPOLICY: doesn't registered. Remove net policy 
settings!\n");
+   instance->policy = NET_POLICY_INVALID;
+   goto err;
+   }
+
+   if (is_rx && old_record->rx_obj) {
+   queue = old_record->rx_obj->queue;
+   } else if (!is_rx && old_record->tx_obj) {
+   queue = old_record->tx_obj->queue;
+   } else {
+   new_record = kzalloc(sizeof(*new_record), GFP_KERNEL);
+   if (!new_record)
+   goto err;
+   memcpy(new_record, old_record, sizeof(*new_record));
+
+   if (is_rx) {
+   new_record->rx_obj = get_avail_object(dev, 
new_record->policy, is_rx);
+   if (!new_record->dev)
+   new_record->dev = dev;
+   if (!new_record->rx_obj) {
+   kfree(new_record);
+   goto err;
+   }
+   queue = new_record->rx_obj->queue;
+   } else {
+   new_record->tx_obj = get_avail_object(dev, 
new_record->policy, is_rx);
+   if (!new_record->dev)
+   new_record->dev = dev;
+   if (!new_record->tx_obj) {
+  

[RFC V2 PATCH 25/25] Documentation/networking: Document NET policy

2016-08-04 Thread kan . liang
From: Kan Liang 

Signed-off-by: Kan Liang 
---
 Documentation/networking/netpolicy.txt | 157 +
 1 file changed, 157 insertions(+)
 create mode 100644 Documentation/networking/netpolicy.txt

diff --git a/Documentation/networking/netpolicy.txt 
b/Documentation/networking/netpolicy.txt
new file mode 100644
index 000..b8e3d4c
--- /dev/null
+++ b/Documentation/networking/netpolicy.txt
@@ -0,0 +1,157 @@
+What is Linux Net Policy?
+
+It is a big challenge to get good network performance. First, the network
+performance is not good with default system settings. Second, it is too
+difficult to do automatic tuning for all possible workloads, since workloads
+have different requirements. Some workloads may want high throughput. Some may
+need low latency. Last but not least, there are lots of manual configurations.
+Fine grained configuration is too difficult for users.
+
+"NET policy" intends to simplify network configuration and get good
+network performance according to the hints (policies) applied by the
+user. It provides some typical "policies" for users, which can be set
+per-socket, per-task or per-device. The kernel automatically figures out
+how to merge different requests to get good network performance.
+
+"Net policy" is designed for multiqueue network devices. This document
+describes the concepts and APIs of "net policy" support.
+
+NET POLICY CONCEPTS
+
+Scope of Net Policies
+
+Device net policy: this policy applies to the whole device. Once the
+device net policy is set, it automatically configures the system
+according to the applied policy. The configuration usually includes IRQ
+affinity, IRQ balance disable, interrupt moderation, and so on. But the
+device net policy does not change the packet direction.
+
+Task net policy: this is a per-task policy. When it is applied to specific
+task, all packet transmissions of the task will be redirected to the
+assigned queues accordingly. If a task does not define a task policy,
+it "falls back" to the system default way to direct the packets. The
+per-task policy must be compatible with device net policy.
+
+Socket net policy: this is a per-socket policy. When it is applied to
+specific socket, all packet transmissions of the socket will be redirected
+to the assigned queues accordingly. If a socket does not define a socket
+policy, it "falls back" to the system default way to direct the packets.
+The per-socket policy must be compatible with both device net policy and
+per-task policy.
+
+Components of Net Policies
+
+Net policy object: it is a combination of CPU and queue. The queue IRQ has
+to set affinity with the CPU. It can be shared between sockets and tasks.
+A reference counter is used to track the sharing number.
+
+Net policy object list: each device policy has an object list. Once the
+device policy is determined, the net policy object will be inserted into
+the net policy object list. The net policy object list does not change
+unless the CPU/queue number is changed, the netpolicy is disabled or
+the device policy is changed.
+The network performance for objects could be different because of the
+CPU/queue topology and dev location. The objects which can bring high
+performance are in the front of the list.
+
+RCU hash table: an RCU hash table to maintain the relationship between
+the task/socket and the assigned object. The task/socket can get the
+assigned object by searching the table.
+If it is the first time, there is no assigned object in the table. It will
+go through the object list to find the available object based on position
+and reference number.
+If the net policy object list changes, all the assigned objects will become
+invalid.
+
+NET POLICY APIs
+
+Interfaces between net policy and device driver
+
+int (*ndo_netpolicy_init)(struct net_device *dev,
+  struct netpolicy_info *info);
+
+A device driver with NET policy support must implement this interface.
+In this interface, the device driver does the necessary initialization
+and fills in the info for the net policy module. The information could
+include the supported policies, MIX policy support, queue pair support
+and so on.
+
+int (*ndo_get_irq_info)(struct net_device *dev,
+struct netpolicy_dev_info *info);
+
+This interface is used to get more accurate device IRQ information.
+
+int (*ndo_set_net_policy)(struct net_device *dev,
+  enum netpolicy_name name);
+
+This interface is used to set the device net policy by name. It is the
+device driver's responsibility to set driver-specific configuration for
+the given policy.
+
+Interfaces between net policy and kernel
+
+int netpolicy_register(struct netpolicy_instance *instance);
+void netpolicy_unregister(struct netpolicy_instance *i

[RFC V2 PATCH 24/25] net/netpolicy: limit the total record number

2016-08-04 Thread kan . liang
From: Kan Liang 

NET policy cannot fulfill user requests without limit, because of
security considerations and device limitations. For security, an
attacker may fake millions of per-task/socket requests to crash the
system. As for device limitations, the number of flow director rules is
limited in the i40e driver. NET policy should not run out of rules,
otherwise it cannot guarantee good performance.

This patch limits the total record number in the RCU hash table to
handle the cases above. The maximum total record number may vary between
devices. For the i40e driver, the record number is limited according to
the number of flow director rules. If the limit is exceeded,
registration and new object requests are denied.

Since the dev may not be known at registration time, cur_rec_num may not
be updated in time, so the actual number of registered records may
exceed max_rec_num. But this does not cause problems, because the patch
also checks the limit on object requests, which guarantees that the
device resources will not run out.

Signed-off-by: Kan Liang 
---
 include/linux/netpolicy.h |  4 
 net/core/netpolicy.c  | 22 --
 2 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/include/linux/netpolicy.h b/include/linux/netpolicy.h
index 0eba512..9bc2ee0 100644
--- a/include/linux/netpolicy.h
+++ b/include/linux/netpolicy.h
@@ -40,6 +40,7 @@ enum netpolicy_traffic {
 #define NETPOLICY_INVALID_QUEUE-1
 #define NETPOLICY_INVALID_LOC  NETPOLICY_INVALID_QUEUE
 #define POLICY_NAME_LEN_MAX64
+#define NETPOLICY_MAX_RECORD_NUM   7000
 extern const char *policy_name[];
 
 struct netpolicy_dev_info {
@@ -81,6 +82,9 @@ struct netpolicy_info {
struct netpolicy_sys_info   sys_info;
/* List of policy objects 0 rx 1 tx */
struct list_head
obj_list[NETPOLICY_RXTX][NET_POLICY_MAX];
+   /* for record number limitation */
+   int max_rec_num;
+   atomic_tcur_rec_num;
 };
 
 struct netpolicy_tcpudpip4_spec {
diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index 735405c..e9f3800 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -368,6 +368,9 @@ static int get_avail_queue(struct netpolicy_instance 
*instance, bool is_rx)
unsigned long ptr_id = (uintptr_t)instance->ptr;
int queue = -1;
 
+   if (atomic_read(&dev->netpolicy->cur_rec_num) > 
dev->netpolicy->max_rec_num)
+   return queue;
+
spin_lock_bh(&np_hashtable_lock);
old_record = netpolicy_record_search(ptr_id);
if (!old_record) {
@@ -388,8 +391,10 @@ static int get_avail_queue(struct netpolicy_instance 
*instance, bool is_rx)
 
if (is_rx) {
new_record->rx_obj = get_avail_object(dev, 
new_record->policy, is_rx);
-   if (!new_record->dev)
+   if (!new_record->dev) {
new_record->dev = dev;
+   atomic_inc(&dev->netpolicy->cur_rec_num);
+   }
if (!new_record->rx_obj) {
kfree(new_record);
goto err;
@@ -397,8 +402,10 @@ static int get_avail_queue(struct netpolicy_instance 
*instance, bool is_rx)
queue = new_record->rx_obj->queue;
} else {
new_record->tx_obj = get_avail_object(dev, 
new_record->policy, is_rx);
-   if (!new_record->dev)
+   if (!new_record->dev) {
new_record->dev = dev;
+   atomic_inc(&dev->netpolicy->cur_rec_num);
+   }
if (!new_record->tx_obj) {
kfree(new_record);
goto err;
@@ -638,6 +645,7 @@ int netpolicy_register(struct netpolicy_instance *instance,
   enum netpolicy_name policy)
 {
unsigned long ptr_id = (uintptr_t)instance->ptr;
+   struct net_device *dev = instance->dev;
struct netpolicy_record *new, *old;
 
if (!is_net_policy_valid(policy)) {
@@ -645,6 +653,10 @@ int netpolicy_register(struct netpolicy_instance *instance,
return -EINVAL;
}
 
+   if (dev && dev->netpolicy &&
+   (atomic_read(&dev->netpolicy->cur_rec_num) > 
dev->netpolicy->max_rec_num))
+   return -ENOSPC;
+
new = kzalloc(sizeof(*new), GFP_KERNEL);
if (!new) {
instance->policy = NET_POLICY_INVALID;
@@ -668,6 +680,8 @@ int netpolicy_register(struct netpolicy_instance *instance,
new->dev = instance->dev;
new->policy = policy;
hash_add_rcu(np_record_hash, &new->hash_node, ptr_id);
+   if (dev && dev->netpolicy)
+   atomic_inc(&dev->netpolicy->

[RFC V2 PATCH 23/25] net/netpolicy: optimize for queue pair

2016-08-04 Thread kan . liang
From: Kan Liang 

Some drivers, such as i40e, do not support separate Tx and Rx queues
as channels. Use the Rx queue to stand for the channel if queue_pair is
set by the driver.

Signed-off-by: Kan Liang 
---
 include/linux/netpolicy.h | 1 +
 net/core/netpolicy.c  | 3 +++
 2 files changed, 4 insertions(+)

diff --git a/include/linux/netpolicy.h b/include/linux/netpolicy.h
index 00600f8..0eba512 100644
--- a/include/linux/netpolicy.h
+++ b/include/linux/netpolicy.h
@@ -76,6 +76,7 @@ struct netpolicy_info {
enum netpolicy_name cur_policy;
unsigned long avail_policy[BITS_TO_LONGS(NET_POLICY_MAX)];
boolhas_mix_policy;
+   boolqueue_pair;
/* cpu and queue mapping information */
struct netpolicy_sys_info   sys_info;
/* List of policy objects 0 rx 1 tx */
diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index dc1edfc..735405c 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -495,6 +495,9 @@ int netpolicy_pick_queue(struct netpolicy_instance 
*instance, bool is_rx)
if (!policy_validate(instance))
return -EINVAL;
 
+   if (dev->netpolicy->queue_pair)
+   is_rx = true;
+
/* fast path */
rcu_read_lock();
version = (int *)rcu_dereference(netpolicy_sys_map_version);
-- 
2.5.5



[RFC V2 PATCH 22/25] net/netpolicy: fast path for finding the queues

2016-08-04 Thread kan . liang
From: Kan Liang 

The current implementation searches the hash table to get the assigned
object for each transmitted/received packet. This is not necessary,
because the assigned object usually remains unchanged. This patch
stores the assigned queue to speed up the search.

But under certain situations the assigned objects have to be changed,
especially when the system CPU and queue mapping changes, such as on
CPU hotplug, device hotplug, queue number changes and so on. In this
patch, netpolicy_sys_map_version is used to track the system CPU and
queue mapping changes. If netpolicy_sys_map_version does not match the
instance's version, the stored queue is dropped.
netpolicy_sys_map_version is protected by RCU.

Also, to reduce the overhead, this patch finds the available object
asynchronously via a work queue. So the first several packets may not
benefit.

Signed-off-by: Kan Liang 
---
 include/linux/netpolicy.h |   8 
 net/core/netpolicy.c  | 103 +-
 net/ipv4/af_inet.c|   7 +---
 3 files changed, 112 insertions(+), 6 deletions(-)

diff --git a/include/linux/netpolicy.h b/include/linux/netpolicy.h
index df962de..00600f8 100644
--- a/include/linux/netpolicy.h
+++ b/include/linux/netpolicy.h
@@ -108,6 +108,14 @@ struct netpolicy_instance {
struct work_struct  fc_wk;  /* flow classification work */
atomic_tfc_wk_cnt;  /* flow classification work 
number */
struct netpolicy_flow_spec flow;/* flow information */
+   /* For fast path */
+   atomic_trx_queue;
+   atomic_ttx_queue;
+   struct work_struct  get_rx_wk;
+   atomic_tget_rx_wk_cnt;
+   struct work_struct  get_tx_wk;
+   atomic_tget_tx_wk_cnt;
+   int sys_map_version;
 };
 
 /* check if policy is valid */
diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index 4b844d8..dc1edfc 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -79,10 +79,13 @@ struct netpolicy_record {
struct netpolicy_object *tx_obj;
 };
 
+static void __rcu *netpolicy_sys_map_version;
+
 static DEFINE_HASHTABLE(np_record_hash, 10);
 static DEFINE_SPINLOCK(np_hashtable_lock);
 
 struct workqueue_struct *np_fc_wq;
+struct workqueue_struct *np_fast_path_wq;
 
 static int netpolicy_get_dev_info(struct net_device *dev,
  struct netpolicy_dev_info *d_info)
@@ -411,6 +414,37 @@ err:
return queue;
 }
 
+static void np_find_rx_queue(struct work_struct *wk)
+{
+   struct netpolicy_instance *instance;
+   int queue;
+
+   instance = container_of(wk, struct netpolicy_instance,
+   get_rx_wk);
+
+   if (instance) {
+   queue = get_avail_queue(instance, true);
+   if (queue >= 0)
+   atomic_set(&instance->rx_queue, queue);
+   }
+   atomic_set(&instance->get_rx_wk_cnt, 0);
+}
+
+static void np_find_tx_queue(struct work_struct *wk)
+{
+   struct netpolicy_instance *instance;
+   int queue;
+
+   instance = container_of(wk, struct netpolicy_instance,
+   get_tx_wk);
+   if (instance) {
+   queue = get_avail_queue(instance, false);
+   if (queue >= 0)
+   atomic_set(&instance->tx_queue, queue);
+   }
+   atomic_set(&instance->get_tx_wk_cnt, 0);
+}
+
 static inline bool policy_validate(struct netpolicy_instance *instance)
 {
struct net_device *dev = instance->dev;
@@ -453,6 +487,7 @@ static inline bool policy_validate(struct 
netpolicy_instance *instance)
 int netpolicy_pick_queue(struct netpolicy_instance *instance, bool is_rx)
 {
struct net_device *dev = instance->dev;
+   int *version;
 
if (!dev || !dev->netpolicy)
return -EINVAL;
@@ -460,7 +495,31 @@ int netpolicy_pick_queue(struct netpolicy_instance 
*instance, bool is_rx)
if (!policy_validate(instance))
return -EINVAL;
 
-   return get_avail_queue(instance, is_rx);
+   /* fast path */
+   rcu_read_lock();
+   version = (int *)rcu_dereference(netpolicy_sys_map_version);
+   if (*version == instance->sys_map_version) {
+   if (is_rx && (atomic_read(&instance->rx_queue) != 
NETPOLICY_INVALID_QUEUE)) {
+   rcu_read_unlock();
+   return atomic_read(&instance->rx_queue);
+   }
+   if (!is_rx && (atomic_read(&instance->tx_queue) != 
NETPOLICY_INVALID_QUEUE)) {
+   rcu_read_unlock();
+   return atomic_read(&instance->tx_queue);
+   }
+   } else {
+   atomic_set(&instance->rx_queue, NETPOLICY_INVALID_QUEUE);
+   atomic_set(&instance->tx_queue, NETPOLICY_INVALID_QUEUE);
+   instance->sys_map_ver

[RFC V2 PATCH 20/25] net/netpolicy: introduce per task net policy

2016-08-04 Thread kan . liang
From: Kan Liang 

Usually, an application as a whole has specific requirements. Applying
the net policy to all sockets in the application one by one is too
complex. This patch introduces a per-task net policy to address this
case. Once the per-task net policy is applied, all the sockets in the
application apply the same net policy. Also, the per-task net policy
is inherited by all children.

The usage of the PR_SET_NETPOLICY option is as below.
prctl(PR_SET_NETPOLICY, POLICY_NAME, NULL, NULL, NULL).
It applies the per-task policy. The policy name must be valid and
compatible with the current device policy; otherwise, it will error
out. The task policy will be set to NET_POLICY_INVALID.

Signed-off-by: Kan Liang 
---
 include/linux/init_task.h  |  9 +
 include/linux/sched.h  |  5 +
 include/net/sock.h | 12 +++-
 include/uapi/linux/prctl.h |  4 
 kernel/exit.c  |  4 
 kernel/fork.c  |  6 ++
 kernel/sys.c   | 31 +++
 net/core/netpolicy.c   | 35 +++
 net/core/sock.c| 10 +-
 net/ipv4/af_inet.c |  7 +--
 10 files changed, 119 insertions(+), 4 deletions(-)

diff --git a/include/linux/init_task.h b/include/linux/init_task.h
index f8834f8..133d1cb 100644
--- a/include/linux/init_task.h
+++ b/include/linux/init_task.h
@@ -183,6 +183,14 @@ extern struct task_group root_task_group;
 # define INIT_KASAN(tsk)
 #endif
 
+#ifdef CONFIG_NETPOLICY
+#define INIT_NETPOLICY(tsk)\
+   .task_netpolicy.policy = NET_POLICY_INVALID,\
+   .task_netpolicy.dev = NULL, \
+   .task_netpolicy.ptr = (void *)&tsk,
+#else
+#define INIT_NETPOLICY(tsk)
+#endif
 /*
  *  INIT_TASK is used to set up the first task table, touch at
  * your own risk!. Base=0, limit=0x1f (=2MB)
@@ -260,6 +268,7 @@ extern struct task_group root_task_group;
INIT_VTIME(tsk) \
INIT_NUMA_BALANCING(tsk)\
INIT_KASAN(tsk) \
+   INIT_NETPOLICY(tsk) \
 }
 
 
diff --git a/include/linux/sched.h b/include/linux/sched.h
index d99218a..2cfcdbd 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -62,6 +62,8 @@ struct sched_param {
 
 #include 
 
+#include 
+
 #define SCHED_ATTR_SIZE_VER0   48  /* sizeof first published struct */
 
 /*
@@ -1919,6 +1921,9 @@ struct task_struct {
 #ifdef CONFIG_MMU
struct task_struct *oom_reaper_list;
 #endif
+#ifdef CONFIG_NETPOLICY
+   struct netpolicy_instance task_netpolicy;
+#endif
 /* CPU-specific state of this task */
struct thread_struct thread;
 /*
diff --git a/include/net/sock.h b/include/net/sock.h
index 6219434..e4f023c 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1477,6 +1477,7 @@ void sock_edemux(struct sk_buff *skb);
 #define sock_edemux(skb) sock_efree(skb)
 #endif
 
+void sock_setnetpolicy(struct socket *sock);
 int sock_setsockopt(struct socket *sock, int level, int op,
char __user *optval, unsigned int optlen);
 
@@ -2273,10 +2274,19 @@ extern int sysctl_optmem_max;
 extern __u32 sysctl_wmem_default;
 extern __u32 sysctl_rmem_default;
 
-/* Return netpolicy instance information from socket. */
+/* Return netpolicy instance information from either task or socket.
+ * If both task and socket have netpolicy instance information,
+ * using task's and unregistering socket's. Because task policy is
+ * dominant policy
+ */
 static inline struct netpolicy_instance *netpolicy_find_instance(struct sock 
*sk)
 {
 #ifdef CONFIG_NETPOLICY
+   if (is_net_policy_valid(current->task_netpolicy.policy)) {
+   if (is_net_policy_valid(sk->sk_netpolicy.policy))
+   netpolicy_unregister(&sk->sk_netpolicy);
+   return ¤t->task_netpolicy;
+   }
if (is_net_policy_valid(sk->sk_netpolicy.policy))
return &sk->sk_netpolicy;
 #endif
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index a8d0759..bc182d2 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -197,4 +197,8 @@ struct prctl_mm_map {
 # define PR_CAP_AMBIENT_LOWER  3
 # define PR_CAP_AMBIENT_CLEAR_ALL  4
 
+/* Control net policy */
+#define PR_SET_NETPOLICY   48
+#define PR_GET_NETPOLICY   49
+
 #endif /* _LINUX_PRCTL_H */
diff --git a/kernel/exit.c b/kernel/exit.c
index 84ae830..4abd921 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -858,6 +858,10 @@ void do_exit(long code)
if (unlikely(current->pi_state_cache))
kfree(current->pi_state_cache);
 #endif
+#ifdef CONFIG_NETPOLICY
+   if (is_net_policy_valid(current->task_netpolicy.policy))
+   netpolicy_unreg

[RFC V2 PATCH 03/25] net/netpolicy: get device queue irq information

2016-08-04 Thread kan . liang
From: Kan Liang 

Net policy needs to know device information. Currently, it is enough to
only get the IRQ information of the Rx and Tx queues.

This patch introduces an ndo op to do so rather than an ethtool op,
because there are already several ways to get IRQ information in user
space; it is not necessary to extend ethtool.

Signed-off-by: Kan Liang 
---
 include/linux/netdevice.h |  5 +
 include/linux/netpolicy.h |  7 +++
 net/core/netpolicy.c  | 14 ++
 3 files changed, 26 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 2e0a7e7..0e55ccd 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1124,6 +1124,9 @@ struct netdev_xdp {
  * int(*ndo_netpolicy_init)(struct net_device *dev,
  * struct netpolicy_info *info);
  * This function is used to init and get supported policy.
+ * int (*ndo_get_irq_info)(struct net_device *dev,
+ *struct netpolicy_dev_info *info);
+ * This function is used to get irq information of rx and tx queues
  *
  */
 struct net_device_ops {
@@ -1313,6 +1316,8 @@ struct net_device_ops {
 #ifdef CONFIG_NETPOLICY
int (*ndo_netpolicy_init)(struct net_device *dev,
  struct netpolicy_info 
*info);
+   int (*ndo_get_irq_info)(struct net_device *dev,
+   struct netpolicy_dev_info 
*info);
 #endif /* CONFIG_NETPOLICY */
 };
 
diff --git a/include/linux/netpolicy.h b/include/linux/netpolicy.h
index ca1f131..fc87d9b 100644
--- a/include/linux/netpolicy.h
+++ b/include/linux/netpolicy.h
@@ -23,6 +23,13 @@ enum netpolicy_name {
 
 extern const char *policy_name[];
 
+struct netpolicy_dev_info {
+   u32 rx_num;
+   u32 tx_num;
+   u32 *rx_irq;
+   u32 *tx_irq;
+};
+
 struct netpolicy_info {
enum netpolicy_name cur_policy;
unsigned long avail_policy[BITS_TO_LONGS(NET_POLICY_MAX)];
diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index 5f304d5..7c34c8a 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -35,6 +35,20 @@
 #include 
 #include 
 
+static int netpolicy_get_dev_info(struct net_device *dev,
+ struct netpolicy_dev_info *d_info)
+{
+   if (!dev->netdev_ops->ndo_get_irq_info)
+   return -ENOTSUPP;
+   return dev->netdev_ops->ndo_get_irq_info(dev, d_info);
+}
+
+static void netpolicy_free_dev_info(struct netpolicy_dev_info *d_info)
+{
+   kfree(d_info->rx_irq);
+   kfree(d_info->tx_irq);
+}
+
 const char *policy_name[NET_POLICY_MAX] = {
"NONE"
 };
-- 
2.5.5



RE: [Intel-wired-lan] [PATCH net-next v3 1/2] e1000e: factor out systim sanitization

2016-08-04 Thread Brown, Aaron F
> From: Intel-wired-lan [mailto:intel-wired-lan-boun...@lists.osuosl.org] On
> Behalf Of Woodford, Timothy W.
> Sent: Friday, July 29, 2016 7:41 AM
> To: Woodford, Timothy W. ; Avargil,
> Raanan ; Jarod Wilson ;
> linux-ker...@vger.kernel.org; Hall, Christopher S
> 
> Cc: netdev@vger.kernel.org; intel-wired-...@lists.osuosl.org
> Subject: Re: [Intel-wired-lan] [PATCH net-next v3 1/2] e1000e: factor out
> systim sanitization
> 
> >>> This is preparatory work for an expanding list of adapter families that have
> occasional ~10 hour clock jumps when being used for PTP. Factor out the
> sanitization function and convert to using a feature (bug) flag, per 
> suggestion
> from Jesse Brandeburg.
> >>>
> >>> Littering functional code with device-specific checks is much messier than
> simply checking a flag, and having device-specific init set flags as needed.
> >>> There are probably a number of other cases in the e1000e code that
> could/should be converted similarly.
> >>
> >> Looks ok to me.
> >> Adding Chris who asked what happens if we reach the max retry counter
> (E1000_MAX_82574_SYSTIM_REREAD)?
> >> This counter is set to 50.
> >> Can you, for testing purposes, decreased this value (or even set it to 0)
> and see what happens?
> >  I'll set the max retry counter to 1 and run an overnight test to see what
> happens.
> 
> After running with this configuration for about 36 hours, I haven't seen any
> timing jumps.  Either this configuration eliminates the error, or it makes it
> significantly less likely to occur.
> 
> Tim Woodford

Feel free to throw a Tested-by: on it if you like.  Not a big deal either way, 
I managed to get enough cycles in on it I'm pretty happy with it as well.

> ___
> Intel-wired-lan mailing list
> intel-wired-...@lists.osuosl.org
> http://lists.osuosl.org/mailman/listinfo/intel-wired-lan


RE: [Intel-wired-lan] [PATCH net-next v3 2/2] e1000e: fix PTP on e1000_pch_lpt variants

2016-08-04 Thread Brown, Aaron F
> From: Intel-wired-lan [mailto:intel-wired-lan-boun...@lists.osuosl.org] On
> Behalf Of Jarod Wilson
> Sent: Tuesday, July 26, 2016 11:26 AM
> To: linux-ker...@vger.kernel.org
> Cc: netdev@vger.kernel.org; intel-wired-...@lists.osuosl.org
> Subject: [Intel-wired-lan] [PATCH net-next v3 2/2] e1000e: fix PTP on
> e1000_pch_lpt variants
> 
> I've got reports that the Intel I-218V NIC in Intel NUC5i5RYH systems used
> as a PTP slave experiences random ~10 hour clock jumps, which are resolved
> if the same workaround for the 82574 and 82583 is employed, so set the
> appropriate flag2 in e1000_pch_lpt_info too.
> 
> Reported-by: Rupesh Patel 
> CC: Jesse Brandeburg 
> CC: Jeff Kirsher 
> CC: intel-wired-...@lists.osuosl.org
> CC: netdev@vger.kernel.org
> Signed-off-by: Jarod Wilson 
> ---
>  drivers/net/ethernet/intel/e1000e/ich8lan.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)

Tested-by: Aaron Brown 


RE: [Intel-wired-lan] [PATCH net-next v3 1/2] e1000e: factor out systim sanitization

2016-08-04 Thread Brown, Aaron F
> From: Intel-wired-lan [mailto:intel-wired-lan-boun...@lists.osuosl.org] On
> Behalf Of Jarod Wilson
> Sent: Tuesday, July 26, 2016 11:26 AM
> To: linux-ker...@vger.kernel.org
> Cc: netdev@vger.kernel.org; intel-wired-...@lists.osuosl.org
> Subject: [Intel-wired-lan] [PATCH net-next v3 1/2] e1000e: factor out systim
> sanitization
> 
> This is preparatory work for an expanding list of adapter families that have
> occasional ~10 hour clock jumps when being used for PTP. Factor out the
> sanitization function and convert to using a feature (bug) flag, per
> suggestion from Jesse Brandeburg.
> 
> Littering functional code with device-specific checks is much messier than
> simply checking a flag, and having device-specific init set flags as needed.
> There are probably a number of other cases in the e1000e code that
> could/should be converted similarly.
> 
> Suggested-by: Jesse Brandeburg 
> CC: Jesse Brandeburg 
> CC: Jeff Kirsher 
> CC: intel-wired-...@lists.osuosl.org
> CC: netdev@vger.kernel.org
> Signed-off-by: Jarod Wilson 
> ---
>  drivers/net/ethernet/intel/e1000e/82571.c  |  6 ++-
>  drivers/net/ethernet/intel/e1000e/e1000.h  |  1 +
>  drivers/net/ethernet/intel/e1000e/netdev.c | 66 ++---
> -
>  3 files changed, 44 insertions(+), 29 deletions(-)

Tested-by: Aaron Brown 


[RFC V2 PATCH 04/25] net/netpolicy: get CPU information

2016-08-04 Thread kan . liang
From: Kan Liang 

Net policy also needs to know CPU information. Currently, the online
CPU count is enough.

Signed-off-by: Kan Liang 
---
 net/core/netpolicy.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index 7c34c8a..075aaca 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -49,6 +49,11 @@ static void netpolicy_free_dev_info(struct netpolicy_dev_info *d_info)
kfree(d_info->tx_irq);
 }
 
+static u32 netpolicy_get_cpu_information(void)
+{
+   return num_online_cpus();
+}
+
 const char *policy_name[NET_POLICY_MAX] = {
"NONE"
 };
-- 
2.5.5



[RFC V2 PATCH 01/25] net: introduce NET policy

2016-08-04 Thread kan . liang
From: Kan Liang 

This patch introduces the NET policy subsystem. If procfs is supported
on the system, it creates a netpolicy node in the proc filesystem.

Signed-off-by: Kan Liang 
---
 include/linux/netdevice.h   |   7 +++
 include/net/net_namespace.h |   3 ++
 net/Kconfig |   7 +++
 net/core/Makefile   |   1 +
 net/core/netpolicy.c| 128 
 5 files changed, 146 insertions(+)
 create mode 100644 net/core/netpolicy.c

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 076df53..19638d6 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1619,6 +1619,8 @@ enum netdev_priv_flags {
  * switch driver and used to set the phys state of the
  * switch port.
  *
+ * @proc_dev:  device node in proc to configure device net policy
+ *
  * FIXME: cleanup struct net_device such that network protocol info
  * moves out.
  */
@@ -1886,6 +1888,11 @@ struct net_device {
struct lock_class_key   *qdisc_tx_busylock;
struct lock_class_key   *qdisc_running_key;
boolproto_down;
+#ifdef CONFIG_NETPOLICY
+#ifdef CONFIG_PROC_FS
+   struct proc_dir_entry   *proc_dev;
+#endif /* CONFIG_PROC_FS */
+#endif /* CONFIG_NETPOLICY */
 };
 #define to_net_dev(d) container_of(d, struct net_device, dev)
 
diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 4089abc..d2ff6c4 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -142,6 +142,9 @@ struct net {
 #endif
struct sock *diag_nlsk;
atomic_tfnhe_genid;
+#ifdef CONFIG_NETPOLICY
+   struct proc_dir_entry   *proc_netpolicy;
+#endif /* CONFIG_NETPOLICY */
 };
 
 #include 
diff --git a/net/Kconfig b/net/Kconfig
index c2cdbce..00552ba 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -205,6 +205,13 @@ source "net/bridge/netfilter/Kconfig"
 
 endif
 
+config NETPOLICY
+   depends on NET
+   bool "Net policy support"
+   default y
+   ---help---
+   Net policy support
+
 source "net/dccp/Kconfig"
 source "net/sctp/Kconfig"
 source "net/rds/Kconfig"
diff --git a/net/core/Makefile b/net/core/Makefile
index d6508c2..0be7092 100644
--- a/net/core/Makefile
+++ b/net/core/Makefile
@@ -27,3 +27,4 @@ obj-$(CONFIG_LWTUNNEL) += lwtunnel.o
 obj-$(CONFIG_DST_CACHE) += dst_cache.o
 obj-$(CONFIG_HWBM) += hwbm.o
 obj-$(CONFIG_NET_DEVLINK) += devlink.o
+obj-$(CONFIG_NETPOLICY) += netpolicy.o
diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
new file mode 100644
index 000..faabfe7
--- /dev/null
+++ b/net/core/netpolicy.c
@@ -0,0 +1,128 @@
+/*
+ * netpolicy.c: Net policy support
+ * Copyright (c) 2016, Intel Corporation.
+ * Author: Kan Liang (kan.li...@intel.com)
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * NET policy intends to simplify the network configuration and get a good
+ * network performance according to the hints(policy) which is applied by user.
+ *
+ * Motivation
+ * - The network performance is not good with default system settings.
+ * - It is too difficult to do automatic tuning for all possible
+ *   workloads, since workloads have different requirements. Some
+ *   workloads may want high throughput. Some may need low latency.
+ * - There are lots of manual configurations. Fine grained configuration
+ *   is too difficult for users.
+ * So, it is a big challenge to get good network performance.
+ *
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#ifdef CONFIG_PROC_FS
+
+static int net_policy_proc_show(struct seq_file *m, void *v)
+{
+   struct net_device *dev = (struct net_device *)m->private;
+
+   seq_printf(m, "%s doesn't support net policy manager\n", dev->name);
+
+   return 0;
+}
+
+static int net_policy_proc_open(struct inode *inode, struct file *file)
+{
+   return single_open(file, net_policy_proc_show, PDE_DATA(inode));
+}
+
+static const struct file_operations proc_net_policy_operations = {
+   .open   = net_policy_proc_open,
+   .read   = seq_read,
+   .llseek = seq_lseek,
+   .release= seq_release,
+   .owner  = THIS_MODULE,
+};
+
+static int netpolicy_proc_dev_init(struct net *net, struct net_device *dev)
+{
+   dev->proc_dev = proc_net_mkdir(net, dev->name, net->proc_netpolicy);
+   if (!dev->proc_dev)
+   return -ENOMEM;
+
+   if (!proc_create_data("policy", 

[RFC V2 PATCH 11/25] net/netpolicy: add MIX policy

2016-08-04 Thread kan . liang
From: Kan Liang 

The MIX policy is a combination of the other policies. It allows
different queues to have different policies. If the MIX policy is
applied, /proc/net/netpolicy/$DEV/policy shows the per-queue policy.

Usually, a workload requires either high throughput or low latency.
So in the current implementation, the MIX policy is a combination of
the LATENCY policy and the BULK policy.

Workloads which require high throughput usually consume more CPU
resources than workloads which require low latency. This means that if
there is an equal interest in latency and throughput performance, it is
better to reserve more BULK queues than LATENCY queues. In this patch,
the MIX policy is forced to include 1/3 LATENCY policy queues and 2/3
BULK policy queues.

Signed-off-by: Kan Liang 
---
 include/linux/netpolicy.h |   7 +++
 net/core/netpolicy.c  | 139 ++
 2 files changed, 136 insertions(+), 10 deletions(-)

diff --git a/include/linux/netpolicy.h b/include/linux/netpolicy.h
index 3d348a7..579ff98 100644
--- a/include/linux/netpolicy.h
+++ b/include/linux/netpolicy.h
@@ -22,6 +22,12 @@ enum netpolicy_name {
NET_POLICY_BULK,
NET_POLICY_LATENCY,
NET_POLICY_MAX,
+
+   /*
+* Mixture of the above policy
+* Can only be set as global policy.
+*/
+   NET_POLICY_MIX,
 };
 
 enum netpolicy_traffic {
@@ -66,6 +72,7 @@ struct netpolicy_object {
 struct netpolicy_info {
enum netpolicy_name cur_policy;
unsigned long avail_policy[BITS_TO_LONGS(NET_POLICY_MAX)];
+   boolhas_mix_policy;
/* cpu and queue mapping information */
struct netpolicy_sys_info   sys_info;
/* List of policy objects 0 rx 1 tx */
diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index 71e9163..8336106 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -280,6 +280,9 @@ static inline int node_distance_cmp(const void *a, const void *b)
return _a->distance - _b->distance;
 }
 
+#define mix_latency_num(num)   ((num) / 3)
+#define mix_throughput_num(num)((num) - mix_latency_num(num))
+
 static int _netpolicy_gen_obj_list(struct net_device *dev, bool is_rx,
   enum netpolicy_name policy,
   struct sort_node *nodes, int num_node,
@@ -287,7 +290,9 @@ static int _netpolicy_gen_obj_list(struct net_device *dev, bool is_rx,
 {
cpumask_var_t node_tmp_cpumask, sibling_tmp_cpumask;
struct cpumask *node_assigned_cpumask;
+   int *l_num = NULL, *b_num = NULL;
int i, ret = -ENOMEM;
+   int num_node_cpu;
u32 cpu;
 
if (!alloc_cpumask_var(&node_tmp_cpumask, GFP_ATOMIC))
@@ -299,6 +304,23 @@ static int _netpolicy_gen_obj_list(struct net_device *dev, bool is_rx,
if (!node_assigned_cpumask)
goto alloc_fail2;
 
+   if (policy == NET_POLICY_MIX) {
+   l_num = kcalloc(num_node, sizeof(int), GFP_ATOMIC);
+   if (!l_num)
+   goto alloc_fail3;
+   b_num = kcalloc(num_node, sizeof(int), GFP_ATOMIC);
+   if (!b_num) {
+   kfree(l_num);
+   goto alloc_fail3;
+   }
+
+   for (i = 0; i < num_node; i++) {
+   num_node_cpu = cpumask_weight(&node_avail_cpumask[nodes[i].node]);
+   l_num[i] = mix_latency_num(num_node_cpu);
+   b_num[i] = mix_throughput_num(num_node_cpu);
+   }
+   }
+
/* Don't share physical core */
for (i = 0; i < num_node; i++) {
if (cpumask_weight(&node_avail_cpumask[nodes[i].node]) == 0)
@@ -309,7 +331,13 @@ static int _netpolicy_gen_obj_list(struct net_device *dev, bool is_rx,
cpu = cpumask_first(node_tmp_cpumask);
 
/* push to obj list */
-   ret = netpolicy_add_obj(dev, cpu, is_rx, policy);
+   if (policy == NET_POLICY_MIX) {
+   if (l_num[i]-- > 0)
+   ret = netpolicy_add_obj(dev, cpu, is_rx, NET_POLICY_LATENCY);
+   else if (b_num[i]-- > 0)
+   ret = netpolicy_add_obj(dev, cpu, is_rx, NET_POLICY_BULK);
+   } else
+   ret = netpolicy_add_obj(dev, cpu, is_rx, policy);
if (ret) {
spin_unlock(&dev->np_ob_list_lock);
goto err;
@@ -322,6 +350,41 @@ static int _netpolicy_gen_obj_list(struct net_device *dev, bool is_rx,
spin_unlock(&dev->np_ob_list_lock);
}
 
+   if (policy == NET_POLICY_MIX) {
+   struct netpolicy_object *obj;
+   int dir = is_rx ? 0 : 1;
+   u32 sibling;
+
+   /* if have to share core, choose

[RFC V2 PATCH 03/25] net/netpolicy: get device queue irq information

2016-08-04 Thread kan . liang
From: Kan Liang 

Net policy needs to know device information. Currently, it is enough
to get only the IRQ information of the rx and tx queues.

This patch introduces an ndo op to do so, rather than an ethtool op,
because there are already several ways to get IRQ information in
userspace; it is not necessary to extend ethtool.

Signed-off-by: Kan Liang 
---
 include/linux/netdevice.h |  5 +
 include/linux/netpolicy.h |  7 +++
 net/core/netpolicy.c  | 14 ++
 3 files changed, 26 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 2e0a7e7..0e55ccd 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1124,6 +1124,9 @@ struct netdev_xdp {
  * int(*ndo_netpolicy_init)(struct net_device *dev,
  * struct netpolicy_info *info);
  * This function is used to init and get supported policy.
+ * int (*ndo_get_irq_info)(struct net_device *dev,
+ *struct netpolicy_dev_info *info);
+ * This function is used to get irq information of rx and tx queues
  *
  */
 struct net_device_ops {
@@ -1313,6 +1316,8 @@ struct net_device_ops {
 #ifdef CONFIG_NETPOLICY
int (*ndo_netpolicy_init)(struct net_device *dev,
  struct netpolicy_info *info);
+   int (*ndo_get_irq_info)(struct net_device *dev,
+   struct netpolicy_dev_info *info);
 #endif /* CONFIG_NETPOLICY */
 };
 
diff --git a/include/linux/netpolicy.h b/include/linux/netpolicy.h
index ca1f131..fc87d9b 100644
--- a/include/linux/netpolicy.h
+++ b/include/linux/netpolicy.h
@@ -23,6 +23,13 @@ enum netpolicy_name {
 
 extern const char *policy_name[];
 
+struct netpolicy_dev_info {
+   u32 rx_num;
+   u32 tx_num;
+   u32 *rx_irq;
+   u32 *tx_irq;
+};
+
 struct netpolicy_info {
enum netpolicy_name cur_policy;
unsigned long avail_policy[BITS_TO_LONGS(NET_POLICY_MAX)];
diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index 5f304d5..7c34c8a 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -35,6 +35,20 @@
 #include 
 #include 
 
+static int netpolicy_get_dev_info(struct net_device *dev,
+ struct netpolicy_dev_info *d_info)
+{
+   if (!dev->netdev_ops->ndo_get_irq_info)
+   return -ENOTSUPP;
+   return dev->netdev_ops->ndo_get_irq_info(dev, d_info);
+}
+
+static void netpolicy_free_dev_info(struct netpolicy_dev_info *d_info)
+{
+   kfree(d_info->rx_irq);
+   kfree(d_info->tx_irq);
+}
+
 const char *policy_name[NET_POLICY_MAX] = {
"NONE"
 };
-- 
2.5.5



[RFC V2 PATCH 12/25] net/netpolicy: NET device hotplug

2016-08-04 Thread kan . liang
From: Kan Liang 

Support NET device up/down/name-change events in the NET policy code.

Signed-off-by: Kan Liang 
---
 net/core/netpolicy.c | 66 +---
 1 file changed, 58 insertions(+), 8 deletions(-)

diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index 8336106..2a04fcf 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -684,6 +684,9 @@ static const struct file_operations proc_net_policy_operations = {
 
 static int netpolicy_proc_dev_init(struct net *net, struct net_device *dev)
 {
+   if (dev->proc_dev)
+   proc_remove(dev->proc_dev);
+
dev->proc_dev = proc_net_mkdir(net, dev->name, net->proc_netpolicy);
if (!dev->proc_dev)
return -ENOMEM;
@@ -750,6 +753,19 @@ void uninit_netpolicy(struct net_device *dev)
spin_unlock(&dev->np_lock);
 }
 
+static void netpolicy_dev_init(struct net *net,
+  struct net_device *dev)
+{
+   if (!init_netpolicy(dev)) {
+#ifdef CONFIG_PROC_FS
+   if (netpolicy_proc_dev_init(net, dev))
+   uninit_netpolicy(dev);
+   else
+#endif /* CONFIG_PROC_FS */
+   pr_info("NETPOLICY: Init net policy for %s\n", dev->name);
+   }
+}
+
 static int __net_init netpolicy_net_init(struct net *net)
 {
struct net_device *dev, *aux;
@@ -762,14 +778,7 @@ static int __net_init netpolicy_net_init(struct net *net)
 #endif /* CONFIG_PROC_FS */
 
for_each_netdev_safe(net, dev, aux) {
-   if (!init_netpolicy(dev)) {
-#ifdef CONFIG_PROC_FS
-   if (netpolicy_proc_dev_init(net, dev))
-   uninit_netpolicy(dev);
-   else
-#endif /* CONFIG_PROC_FS */
-   pr_info("NETPOLICY: Init net policy for %s\n", dev->name);
-   }
+   netpolicy_dev_init(net, dev);
}
 
return 0;
@@ -791,17 +800,58 @@ static struct pernet_operations netpolicy_net_ops = {
.exit = netpolicy_net_exit,
 };
 
+static int netpolicy_notify(struct notifier_block *this,
+   unsigned long event,
+   void *ptr)
+{
+   struct net_device *dev = netdev_notifier_info_to_dev(ptr);
+
+   switch (event) {
+   case NETDEV_CHANGENAME:
+#ifdef CONFIG_PROC_FS
+   if (dev->proc_dev) {
+   proc_remove(dev->proc_dev);
+   if ((netpolicy_proc_dev_init(dev_net(dev), dev) < 0) &&
+   dev->proc_dev) {
+   proc_remove(dev->proc_dev);
+   dev->proc_dev = NULL;
+   }
+   }
+#endif
+   break;
+   case NETDEV_UP:
+   netpolicy_dev_init(dev_net(dev), dev);
+   break;
+   case NETDEV_GOING_DOWN:
+   uninit_netpolicy(dev);
+#ifdef CONFIG_PROC_FS
+   proc_remove(dev->proc_dev);
+   dev->proc_dev = NULL;
+#endif
+   break;
+   }
+
+   return NOTIFY_OK;
+}
+
+static struct notifier_block netpolicy_dev_notf = {
+   .notifier_call = netpolicy_notify,
+};
+
 static int __init netpolicy_init(void)
 {
int ret;
 
ret = register_pernet_subsys(&netpolicy_net_ops);
+   if (!ret)
+   register_netdevice_notifier(&netpolicy_dev_notf);
 
return ret;
 }
 
 static void __exit netpolicy_exit(void)
 {
+   unregister_netdevice_notifier(&netpolicy_dev_notf);
unregister_pernet_subsys(&netpolicy_net_ops);
 }
 
-- 
2.5.5



[RFC V2 PATCH 02/25] net/netpolicy: init NET policy

2016-08-04 Thread kan . liang
From: Kan Liang 

This patch tries to initialize NET policy for all the devices in the
system. However, not all device drivers have NET policy support. For
drivers that do not have NET policy support, the node will not be
shown in /proc/net/netpolicy/.
A device driver that has NET policy support must implement the
interface ndo_netpolicy_init, which is used to do the necessary
initialization and to collect information (e.g. supported policies)
from the driver.

The user can check /proc/net/netpolicy/ and
/proc/net/netpolicy/$DEV/policy to learn the available devices and
their supported policies.

np_lock is also introduced to protect the state of NET policy.

Device hotplug will be handled later in this series.

Signed-off-by: Kan Liang 
---
 include/linux/netdevice.h | 12 +++
 include/linux/netpolicy.h | 31 +
 net/core/netpolicy.c  | 86 +--
 3 files changed, 118 insertions(+), 11 deletions(-)
 create mode 100644 include/linux/netpolicy.h

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 19638d6..2e0a7e7 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -52,6 +52,7 @@
 #include 
 #include 
 #include 
+#include 
 
 struct netpoll_info;
 struct device;
@@ -1120,6 +1121,9 @@ struct netdev_xdp {
  * int (*ndo_xdp)(struct net_device *dev, struct netdev_xdp *xdp);
  * This function is used to set or query state related to XDP on the
  * netdevice. See definition of enum xdp_netdev_command for details.
+ * int(*ndo_netpolicy_init)(struct net_device *dev,
+ * struct netpolicy_info *info);
+ * This function is used to init and get supported policy.
  *
  */
 struct net_device_ops {
@@ -1306,6 +1310,10 @@ struct net_device_ops {
   int needed_headroom);
int (*ndo_xdp)(struct net_device *dev,
   struct netdev_xdp *xdp);
+#ifdef CONFIG_NETPOLICY
+   int (*ndo_netpolicy_init)(struct net_device *dev,
+ struct netpolicy_info *info);
+#endif /* CONFIG_NETPOLICY */
 };
 
 /**
@@ -1620,6 +1628,8 @@ enum netdev_priv_flags {
  * switch port.
  *
  * @proc_dev:  device node in proc to configure device net policy
+ * @netpolicy: NET policy related information of net device
+ * @np_lock:   protect the state of NET policy
  *
  * FIXME: cleanup struct net_device such that network protocol info
  * moves out.
@@ -1892,6 +1902,8 @@ struct net_device {
 #ifdef CONFIG_PROC_FS
struct proc_dir_entry   *proc_dev;
 #endif /* CONFIG_PROC_FS */
+   struct netpolicy_info   *netpolicy;
+   spinlock_t  np_lock;
 #endif /* CONFIG_NETPOLICY */
 };
 #define to_net_dev(d) container_of(d, struct net_device, dev)
diff --git a/include/linux/netpolicy.h b/include/linux/netpolicy.h
new file mode 100644
index 000..ca1f131
--- /dev/null
+++ b/include/linux/netpolicy.h
@@ -0,0 +1,31 @@
+/*
+ * netpolicy.h: Net policy support
+ * Copyright (c) 2016, Intel Corporation.
+ * Author: Kan Liang (kan.li...@intel.com)
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ */
+#ifndef __LINUX_NETPOLICY_H
+#define __LINUX_NETPOLICY_H
+
+enum netpolicy_name {
+   NET_POLICY_NONE = 0,
+   NET_POLICY_MAX,
+};
+
+extern const char *policy_name[];
+
+struct netpolicy_info {
+   enum netpolicy_name cur_policy;
+   unsigned long avail_policy[BITS_TO_LONGS(NET_POLICY_MAX)];
+};
+
+#endif /*__LINUX_NETPOLICY_H*/
diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index faabfe7..5f304d5 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -35,13 +35,29 @@
 #include 
 #include 
 
+const char *policy_name[NET_POLICY_MAX] = {
+   "NONE"
+};
 #ifdef CONFIG_PROC_FS
 
 static int net_policy_proc_show(struct seq_file *m, void *v)
 {
struct net_device *dev = (struct net_device *)m->private;
-
-   seq_printf(m, "%s doesn't support net policy manager\n", dev->name);
+   int i;
+
+   if (WARN_ON(!dev->netpolicy))
+   return -EINVAL;
+
+   if (dev->netpolicy->cur_policy == NET_POLICY_NONE) {
+   seq_printf(m, "%s: There is no policy applied\n", dev->name);
+   seq_printf(m, "%s: The available policy include:", dev->name);
+   for_each_set_bit(i, dev->netpolicy->avail_policy, NET_POLICY_MAX)
+   seq_printf(m, " %s", policy_name[

[RFC V2 PATCH 13/25] net/netpolicy: support CPU hotplug

2016-08-04 Thread kan . liang
From: Kan Liang 

For CPU hotplug, the NET policy subsystem will rebuild the sys map and
object list.

Signed-off-by: Kan Liang 
---
 net/core/netpolicy.c | 76 
 1 file changed, 76 insertions(+)

diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index 2a04fcf..3b523fc 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -37,6 +37,7 @@
 #include 
 #include 
 #include 
+#include 
 
 static int netpolicy_get_dev_info(struct net_device *dev,
  struct netpolicy_dev_info *d_info)
@@ -838,6 +839,73 @@ static struct notifier_block netpolicy_dev_notf = {
.notifier_call = netpolicy_notify,
 };
 
+/**
+ * update_netpolicy_sys_map() - rebuild the sys map and object list
+ *
+ * This function go through all the available net policy supported device,
+ * and rebuild sys map and object list.
+ *
+ */
+void update_netpolicy_sys_map(void)
+{
+   struct net *net;
+   struct net_device *dev, *aux;
+   enum netpolicy_name cur_policy;
+
+   for_each_net(net) {
+   for_each_netdev_safe(net, dev, aux) {
+   spin_lock(&dev->np_lock);
+   if (!dev->netpolicy)
+   goto unlock;
+   cur_policy = dev->netpolicy->cur_policy;
+   if (cur_policy == NET_POLICY_NONE)
+   goto unlock;
+
+   dev->netpolicy->cur_policy = NET_POLICY_NONE;
+
+   /* rebuild everything */
+   netpolicy_disable(dev);
+   netpolicy_enable(dev);
+   if (netpolicy_gen_obj_list(dev, cur_policy)) {
+   pr_warn("NETPOLICY: Failed to generate netpolicy object list for dev %s\n",
+   dev->name);
+   netpolicy_disable(dev);
+   goto unlock;
+   }
+   if (dev->netdev_ops->ndo_set_net_policy(dev, cur_policy)) {
+   pr_warn("NETPOLICY: Failed to set netpolicy for dev %s\n",
+   dev->name);
+   netpolicy_disable(dev);
+   goto unlock;
+   }
+
+   dev->netpolicy->cur_policy = cur_policy;
+unlock:
+   spin_unlock(&dev->np_lock);
+   }
+   }
+}
+
+static int netpolicy_cpu_callback(struct notifier_block *nfb,
+ unsigned long action, void *hcpu)
+{
+   switch (action & ~CPU_TASKS_FROZEN) {
+   case CPU_ONLINE:
+   update_netpolicy_sys_map();
+   break;
+   case CPU_DYING:
+   update_netpolicy_sys_map();
+   break;
+   }
+   return NOTIFY_OK;
+}
+
+static struct notifier_block netpolicy_cpu_notifier = {
+   &netpolicy_cpu_callback,
+   NULL,
+   0
+};
+
 static int __init netpolicy_init(void)
 {
int ret;
@@ -846,6 +914,10 @@ static int __init netpolicy_init(void)
if (!ret)
register_netdevice_notifier(&netpolicy_dev_notf);
 
+   cpu_notifier_register_begin();
+   __register_cpu_notifier(&netpolicy_cpu_notifier);
+   cpu_notifier_register_done();
+
return ret;
 }
 
@@ -853,6 +925,10 @@ static void __exit netpolicy_exit(void)
 {
unregister_netdevice_notifier(&netpolicy_dev_notf);
unregister_pernet_subsys(&netpolicy_net_ops);
+
+   cpu_notifier_register_begin();
+   __unregister_cpu_notifier(&netpolicy_cpu_notifier);
+   cpu_notifier_register_done();
 }
 
 subsys_initcall(netpolicy_init);
-- 
2.5.5



[RFC V2 PATCH 05/25] net/netpolicy: create CPU and queue mapping

2016-08-04 Thread kan . liang
From: Kan Liang 

The current implementation forces a 1:1 CPU/queue mapping. This patch
introduces the function netpolicy_update_sys_map to create this mapping.
The result is stored in netpolicy_sys_info.

If the CPU count and queue count are different, the remaining
CPUs/queues are not used for now.

CPU hotplug, device hotplug or ethtool may change the CPU count or
queue count. For these cases, this function can also be called to
reconstruct the mapping. These cases will be handled later in this
series.

Signed-off-by: Kan Liang 
---
 include/linux/netpolicy.h | 18 
 net/core/netpolicy.c  | 74 +++
 2 files changed, 92 insertions(+)

diff --git a/include/linux/netpolicy.h b/include/linux/netpolicy.h
index fc87d9b..a946b75c 100644
--- a/include/linux/netpolicy.h
+++ b/include/linux/netpolicy.h
@@ -30,9 +30,27 @@ struct netpolicy_dev_info {
u32 *tx_irq;
 };
 
+struct netpolicy_sys_map {
+   u32 cpu;
+   u32 queue;
+   u32 irq;
+};
+
+struct netpolicy_sys_info {
+   /*
+* Record the cpu and queue 1:1 mapping
+*/
+   u32 avail_rx_num;
+   struct netpolicy_sys_map*rx;
+   u32 avail_tx_num;
+   struct netpolicy_sys_map*tx;
+};
+
 struct netpolicy_info {
enum netpolicy_name cur_policy;
unsigned long avail_policy[BITS_TO_LONGS(NET_POLICY_MAX)];
+   /* cpu and queue mapping information */
+   struct netpolicy_sys_info   sys_info;
 };
 
 #endif /*__LINUX_NETPOLICY_H*/
diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index 075aaca..ff7fc04 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -54,6 +54,80 @@ static u32 netpolicy_get_cpu_information(void)
return num_online_cpus();
 }
 
+static void netpolicy_free_sys_map(struct net_device *dev)
+{
+   struct netpolicy_sys_info *s_info = &dev->netpolicy->sys_info;
+
+   kfree(s_info->rx);
+   s_info->rx = NULL;
+   s_info->avail_rx_num = 0;
+   kfree(s_info->tx);
+   s_info->tx = NULL;
+   s_info->avail_tx_num = 0;
+}
+
+static int netpolicy_update_sys_map(struct net_device *dev,
+   struct netpolicy_dev_info *d_info,
+   u32 cpu)
+{
+   struct netpolicy_sys_info *s_info = &dev->netpolicy->sys_info;
+   u32 num, i, online_cpu;
+   cpumask_var_t cpumask;
+
+   if (!alloc_cpumask_var(&cpumask, GFP_ATOMIC))
+   return -ENOMEM;
+
+   /* update rx cpu map */
+   if (cpu > d_info->rx_num)
+   num = d_info->rx_num;
+   else
+   num = cpu;
+
+   s_info->avail_rx_num = num;
+   s_info->rx = kcalloc(num, sizeof(*s_info->rx), GFP_ATOMIC);
+   if (!s_info->rx)
+   goto err;
+   cpumask_copy(cpumask, cpu_online_mask);
+
+   i = 0;
+   for_each_cpu(online_cpu, cpumask) {
+   if (i == num)
+   break;
+   s_info->rx[i].cpu = online_cpu;
+   s_info->rx[i].queue = i;
+   s_info->rx[i].irq = d_info->rx_irq[i];
+   i++;
+   }
+
+   /* update tx cpu map */
+   if (cpu >= d_info->tx_num)
+   num = d_info->tx_num;
+   else
+   num = cpu;
+
+   s_info->avail_tx_num = num;
+   s_info->tx = kcalloc(num, sizeof(*s_info->tx), GFP_ATOMIC);
+   if (!s_info->tx)
+   goto err;
+
+   i = 0;
+   for_each_cpu(online_cpu, cpumask) {
+   if (i == num)
+   break;
+   s_info->tx[i].cpu = online_cpu;
+   s_info->tx[i].queue = i;
+   s_info->tx[i].irq = d_info->tx_irq[i];
+   i++;
+   }
+
+   free_cpumask_var(cpumask);
+   return 0;
+err:
+   netpolicy_free_sys_map(dev);
+   free_cpumask_var(cpumask);
+   return -ENOMEM;
+}
+
 const char *policy_name[NET_POLICY_MAX] = {
"NONE"
 };
-- 
2.5.5



[RFC V2 PATCH 00/25] Kernel NET policy

2016-08-04 Thread kan . liang
From: Kan Liang 

It is a big challenge to get good network performance. First, the network
performance is not good with default system settings. Second, it is too
difficult to do automatic tuning for all possible workloads, since workloads
have different requirements. Some workloads may want high throughput. Some may
need low latency. Last but not least, there are lots of manual configurations.
Fine grained configuration is too difficult for users.

NET policy intends to simplify the network configuration and achieve good
network performance according to the hints (policies) applied by the user. It
provides some typical "policies" for the user, which can be set per-socket,
per-task or per-device. The kernel automatically figures out how to merge the
different requests to get good network performance.

NET policy is designed for multiqueue network devices. This implementation is
only for Intel NICs using the i40e driver, but the concepts and generic code
should apply to other multiqueue NICs too.

NET policy is also a combination of generic policy manager code and some
ethtool callbacks (per queue coalesce setting, flow classification rules) to
configure the driver.

This series also supports CPU hotplug and device hotplug.

Here are some common questions about NET policy.
 1. Why can't a userspace tool do the same thing?
 A: Kernel is more suitable for NET policy.
    - User space code would be far more complicated to get right and perform
  well. It always needs to work with out-of-date state compared to the
  latest, because it cannot do any locking with the kernel state.
   - User space code is less efficient than kernel code, because of the
 additional context switches needed.
   - Kernel is in the right position to coordinate requests from multiple
 users.

 2. Is NET policy looking for optimal settings?
 A: No. The NET policy intends to get good network performance according
    to the user's specific request. Our target for good performance is ~90%
    of the optimal settings.

 3. How does the configuration impact the connection rates?
 A: There are two places that acquire the rtnl mutex to configure the device.
    - One is device policy setting. It happens at the initialization stage,
  on hotplug, or when the queue number changes. The device policy will be
  set to NET_POLICY_NONE. If so, it "falls back" to the system default way
  to direct the packets. It doesn't block the connection.
    - The other is setting Rx network flow classification options or rules.
  It uses a work queue to apply the settings asynchronously, which avoids
  hurting the connection rates.

 4. Why not use existing mechanisms for NET policy?
 For example, cgroup tc or existing socket options.
 A: The NET policy already reuses existing mechanisms as much as it can.
    For example, it uses the existing ethtool interface to configure the
    device. However, the NET policy still needs to introduce new interfaces
    to meet its special requirements.
    For resource usage, the current cgroup tc is not suitable for per-socket
    settings. Also, current tc can only set rate limits, while the NET policy
    wants to change interrupt moderation per device queue. So this series
    will not use cgroup tc. But in some places, cgroup and NET policy are
    similar. For example, both of them isolate resource usage, and both of
    them do traffic control. So it is on the NET policy TODO list to
    work well with cgroup.
    For socket options, SO_MARK or maybe SO_PRIORITY is close to NET policy's
    requirements, but they cannot be reused for NET policy. SO_MARK can be
    used for routing and packet filtering, but the NET policy doesn't intend
    to change the routing; it only redirects the packet to a specific device
    queue. Also, the target queue is assigned by the NET policy subsystem at
    run time; it should not be set in advance. SO_PRIORITY can set a
    protocol-defined priority for all packets on the socket, but the policies
    don't have priorities.

 5. Why disable IRQ balance?
 A: Disabling IRQ balance is a common way (a recommended way for some devices)
    to tune network performance.


Here are some key Interfaces/APIs for NET policy.

Interfaces which export to user space

   /proc/net/netpolicy/$DEV/policy
   User can set/get per device policy from /proc

   /proc/$PID/net_policy
   User can set/get per task policy from /proc
   prctl(PR_SET_NETPOLICY, POLICY_NAME, NULL, NULL, NULL)
   An alternative way to set/get per task policy is from prctl.

   setsockopt(sockfd,SOL_SOCKET,SO_NETPOLICY,&policy,sizeof(int))
   User can set/get per socket policy by setsockopt

New ndo opt

   int (*ndo_netpolicy_init)(struct net_device *dev,
 struct netpolicy_info *info);
   Initialize device driver for NET policy

   int (*ndo_get_irq_info)(struct net_device *dev,
   s

[PATCH net 1/3] mlxsw: spectrum: Do not assume PAUSE frames are disabled

2016-08-04 Thread Ido Schimmel
When ieee_setpfc() gets called, PAUSE frames are not necessarily
disabled on the port.

Check if PAUSE frames are disabled or enabled and configure the port's
headroom buffer accordingly.

Fixes: d81a6bdb87ce ("mlxsw: spectrum: Add IEEE 802.1Qbb PFC support")
Signed-off-by: Ido Schimmel 
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum_dcb.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_dcb.c 
b/drivers/net/ethernet/mellanox/mlxsw/spectrum_dcb.c
index 01cfb75..3c4a178 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_dcb.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_dcb.c
@@ -351,17 +351,17 @@ static int mlxsw_sp_dcbnl_ieee_setpfc(struct net_device 
*dev,
  struct ieee_pfc *pfc)
 {
struct mlxsw_sp_port *mlxsw_sp_port = netdev_priv(dev);
+   bool pause_en = mlxsw_sp_port_is_pause_en(mlxsw_sp_port);
int err;
 
-   if ((mlxsw_sp_port->link.tx_pause || mlxsw_sp_port->link.rx_pause) &&
-   pfc->pfc_en) {
+   if (pause_en && pfc->pfc_en) {
netdev_err(dev, "PAUSE frames already enabled on port\n");
return -EINVAL;
}
 
err = __mlxsw_sp_port_headroom_set(mlxsw_sp_port, dev->mtu,
   mlxsw_sp_port->dcb.ets->prio_tc,
-  false, pfc);
+  pause_en, pfc);
if (err) {
netdev_err(dev, "Failed to configure port's headroom for 
PFC\n");
return err;
@@ -380,7 +380,7 @@ static int mlxsw_sp_dcbnl_ieee_setpfc(struct net_device 
*dev,
 
 err_port_pfc_set:
__mlxsw_sp_port_headroom_set(mlxsw_sp_port, dev->mtu,
-mlxsw_sp_port->dcb.ets->prio_tc, false,
+mlxsw_sp_port->dcb.ets->prio_tc, pause_en,
 mlxsw_sp_port->dcb.pfc);
return err;
 }
-- 
2.8.2



[RFC V2 PATCH 15/25] net/netpolicy: implement netpolicy register

2016-08-04 Thread kan . liang
From: Kan Liang 

A socket/task is only benefited when it registers itself with a
specific policy. If it is the first time to register, a record will be
created and inserted into an RCU hash table. The record includes ptr,
policy and object information. ptr is the socket/task's pointer, which
is used as the key to search the record in the hash table. The object
will be assigned later.

This patch also introduces a new type, NET_POLICY_INVALID, which
indicates that the task/socket is not registered.

np_hashtable_lock is introduced to protect the hash table.

Signed-off-by: Kan Liang 
---
 include/linux/netpolicy.h |  26 
 net/core/netpolicy.c  | 153 ++
 2 files changed, 179 insertions(+)

diff --git a/include/linux/netpolicy.h b/include/linux/netpolicy.h
index cc75e3c..5900252 100644
--- a/include/linux/netpolicy.h
+++ b/include/linux/netpolicy.h
@@ -17,6 +17,7 @@
 #define __LINUX_NETPOLICY_H
 
 enum netpolicy_name {
+   NET_POLICY_INVALID  = -1,
NET_POLICY_NONE = 0,
NET_POLICY_CPU,
NET_POLICY_BULK,
@@ -79,12 +80,37 @@ struct netpolicy_info {
struct list_head
obj_list[NETPOLICY_RXTX][NET_POLICY_MAX];
 };
 
+struct netpolicy_instance {
+   struct net_device   *dev;
+   enum netpolicy_name policy; /* required policy */
+   void*ptr;   /* pointers */
+};
+
+/* check if policy is valid */
+static inline int is_net_policy_valid(enum netpolicy_name policy)
+{
+   return ((policy < NET_POLICY_MAX) && (policy > NET_POLICY_INVALID));
+}
+
 #ifdef CONFIG_NETPOLICY
 extern void update_netpolicy_sys_map(void);
+extern int netpolicy_register(struct netpolicy_instance *instance,
+ enum netpolicy_name policy);
+extern void netpolicy_unregister(struct netpolicy_instance *instance);
 #else
 static inline void update_netpolicy_sys_map(void)
 {
 }
+
+static inline int netpolicy_register(struct netpolicy_instance *instance,
+enum netpolicy_name policy)
+{  return 0;
+}
+
+static inline void netpolicy_unregister(struct netpolicy_instance *instance)
+{
+}
+
 #endif
 
 #endif /*__LINUX_NETPOLICY_H*/
diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index 7579685..3605761 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -38,6 +38,19 @@
 #include 
 #include 
 #include 
+#include 
+
+struct netpolicy_record {
+   struct hlist_node   hash_node;
+   unsigned long   ptr_id;
+   enum netpolicy_name policy;
+   struct net_device   *dev;
+   struct netpolicy_object *rx_obj;
+   struct netpolicy_object *tx_obj;
+};
+
+static DEFINE_HASHTABLE(np_record_hash, 10);
+static DEFINE_SPINLOCK(np_hashtable_lock);
 
 static int netpolicy_get_dev_info(struct net_device *dev,
  struct netpolicy_dev_info *d_info)
@@ -223,6 +236,143 @@ static int netpolicy_enable(struct net_device *dev)
return 0;
 }
 
+static struct netpolicy_record *netpolicy_record_search(unsigned long ptr_id)
+{
+   struct netpolicy_record *rec = NULL;
+
+   hash_for_each_possible_rcu(np_record_hash, rec, hash_node, ptr_id) {
+   if (rec->ptr_id == ptr_id)
+   break;
+   }
+
+   return rec;
+}
+
+static void put_queue(struct net_device *dev,
+ struct netpolicy_object *rx_obj,
+ struct netpolicy_object *tx_obj)
+{
+   if (!dev || !dev->netpolicy)
+   return;
+
+   if (rx_obj)
+   atomic_dec(&rx_obj->refcnt);
+   if (tx_obj)
+   atomic_dec(&tx_obj->refcnt);
+}
+
+static void netpolicy_record_clear_obj(void)
+{
+   struct netpolicy_record *rec;
+   int i;
+
+   spin_lock_bh(&np_hashtable_lock);
+   hash_for_each_rcu(np_record_hash, i, rec, hash_node) {
+   put_queue(rec->dev, rec->rx_obj, rec->tx_obj);
+   rec->rx_obj = NULL;
+   rec->tx_obj = NULL;
+   }
+   spin_unlock_bh(&np_hashtable_lock);
+}
+
+static void netpolicy_record_clear_dev_node(struct net_device *dev)
+{
+   struct netpolicy_record *rec;
+   int i;
+
+   spin_lock_bh(&np_hashtable_lock);
+   hash_for_each_rcu(np_record_hash, i, rec, hash_node) {
+   if (rec->dev == dev) {
+   hash_del_rcu(&rec->hash_node);
+   kfree(rec);
+   }
+   }
+   spin_unlock_bh(&np_hashtable_lock);
+}
+
+/**
+ * netpolicy_register() - Register per socket/task policy request
+ * @instance:  NET policy per socket/task instance info
+ * @policy:request NET policy
+ *
+ * This function intends to register per socket/task policy request.
+ * If it's the first time to register, a record will be created and
+ * inserted into RCU hash table.
+ *
+ * The record includes ptr, policy and object info. ptr of the socket/task
+ * is the key to search the re

[RFC V2 PATCH 18/25] net/netpolicy: set Tx queues according to policy

2016-08-04 Thread kan . liang
From: Kan Liang 

When the device tries to transmit a packet, netdev_pick_tx is called to
find an available Tx queue. If a net policy is applied, it picks up
the assigned Tx queue from the net policy subsystem, and redirects the
traffic to the assigned queue.

Signed-off-by: Kan Liang 
---
 include/net/sock.h |  9 +
 net/core/dev.c | 20 ++--
 2 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index fd4132f..6219434 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -2273,4 +2273,13 @@ extern int sysctl_optmem_max;
 extern __u32 sysctl_wmem_default;
 extern __u32 sysctl_rmem_default;
 
+/* Return netpolicy instance information from socket. */
+static inline struct netpolicy_instance *netpolicy_find_instance(struct sock 
*sk)
+{
+#ifdef CONFIG_NETPOLICY
+   if (is_net_policy_valid(sk->sk_netpolicy.policy))
+   return &sk->sk_netpolicy;
+#endif
+   return NULL;
+}
 #endif /* _SOCK_H */
diff --git a/net/core/dev.c b/net/core/dev.c
index 2a9c39f..08db6eb 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3266,6 +3266,7 @@ struct netdev_queue *netdev_pick_tx(struct net_device 
*dev,
struct sk_buff *skb,
void *accel_priv)
 {
+   struct sock *sk = skb->sk;
int queue_index = 0;
 
 #ifdef CONFIG_XPS
@@ -3280,8 +3281,23 @@ struct netdev_queue *netdev_pick_tx(struct net_device 
*dev,
if (ops->ndo_select_queue)
queue_index = ops->ndo_select_queue(dev, skb, 
accel_priv,
__netdev_pick_tx);
-   else
-   queue_index = __netdev_pick_tx(dev, skb);
+   else {
+#ifdef CONFIG_NETPOLICY
+   struct netpolicy_instance *instance;
+
+   queue_index = -1;
+   if (dev->netpolicy && sk) {
+   instance = netpolicy_find_instance(sk);
+   if (instance) {
+   if (!instance->dev)
+   instance->dev = dev;
+   queue_index = 
netpolicy_pick_queue(instance, false);
+   }
+   }
+   if (queue_index < 0)
+#endif
+   queue_index = __netdev_pick_tx(dev, skb);
+   }
 
if (!accel_priv)
queue_index = netdev_cap_txqueue(dev, queue_index);
-- 
2.5.5



[RFC V2 PATCH 09/25] net/netpolicy: set NET policy by policy name

2016-08-04 Thread kan . liang
From: Kan Liang 

User can write a policy name to /proc/net/netpolicy/$DEV/policy to enable
net policy for a specific device.

When the policy is enabled, the subsystem automatically disables IRQ
balance and sets IRQ affinity. The object list is also generated
accordingly.

It is the device driver's responsibility to set driver-specific
configuration for the given policy.

np_lock will be used to protect the state.

Signed-off-by: Kan Liang 
---
 include/linux/netdevice.h |  5 +++
 include/linux/netpolicy.h |  1 +
 net/core/netpolicy.c  | 95 +++
 3 files changed, 101 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 1eda870..aa3ef38 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1127,6 +1127,9 @@ struct netdev_xdp {
  * int (*ndo_get_irq_info)(struct net_device *dev,
  *struct netpolicy_dev_info *info);
  * This function is used to get irq information of rx and tx queues
+ * int (*ndo_set_net_policy)(struct net_device *dev,
+ *  enum netpolicy_name name);
+ * This function is used to set per device net policy by name
  *
  */
 struct net_device_ops {
@@ -1318,6 +1321,8 @@ struct net_device_ops {
  struct netpolicy_info 
*info);
int (*ndo_get_irq_info)(struct net_device *dev,
struct netpolicy_dev_info 
*info);
+   int (*ndo_set_net_policy)(struct net_device *dev,
+ enum netpolicy_name name);
 #endif /* CONFIG_NETPOLICY */
 };
 
diff --git a/include/linux/netpolicy.h b/include/linux/netpolicy.h
index 73a5fa6..b1d9277 100644
--- a/include/linux/netpolicy.h
+++ b/include/linux/netpolicy.h
@@ -27,6 +27,7 @@ enum netpolicy_traffic {
NETPOLICY_RXTX,
 };
 
+#define POLICY_NAME_LEN_MAX64
 extern const char *policy_name[];
 
 struct netpolicy_dev_info {
diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index 0f8ff16..8112839 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -36,6 +36,7 @@
 #include 
 #include 
 #include 
+#include 
 
 static int netpolicy_get_dev_info(struct net_device *dev,
  struct netpolicy_dev_info *d_info)
@@ -430,6 +431,69 @@ err:
return ret;
 }
 
+static int net_policy_set_by_name(char *name, struct net_device *dev)
+{
+   int i, ret;
+
+   spin_lock(&dev->np_lock);
+   ret = 0;
+
+   if (!dev->netpolicy ||
+   !dev->netdev_ops->ndo_set_net_policy) {
+   ret = -ENOTSUPP;
+   goto unlock;
+   }
+
+   for (i = 0; i < NET_POLICY_MAX; i++) {
+   if (!strncmp(name, policy_name[i], strlen(policy_name[i])))
+   break;
+   }
+
+   if (!test_bit(i, dev->netpolicy->avail_policy)) {
+   ret = -ENOTSUPP;
+   goto unlock;
+   }
+
+   if (i == dev->netpolicy->cur_policy)
+   goto unlock;
+
+   /* If there is no policy applied yet, need to do enable first . */
+   if (dev->netpolicy->cur_policy == NET_POLICY_NONE) {
+   ret = netpolicy_enable(dev);
+   if (ret)
+   goto unlock;
+   }
+
+   netpolicy_free_obj_list(dev);
+
+   /* Generate object list according to policy name */
+   ret = netpolicy_gen_obj_list(dev, i);
+   if (ret)
+   goto err;
+
+   /* set policy */
+   ret = dev->netdev_ops->ndo_set_net_policy(dev, i);
+   if (ret)
+   goto err;
+
+   /* If removing policy, need to do disable. */
+   if (i == NET_POLICY_NONE)
+   netpolicy_disable(dev);
+
+   dev->netpolicy->cur_policy = i;
+
+   spin_unlock(&dev->np_lock);
+   return 0;
+
+err:
+   netpolicy_free_obj_list(dev);
+   if (dev->netpolicy->cur_policy == NET_POLICY_NONE)
+   netpolicy_disable(dev);
+unlock:
+   spin_unlock(&dev->np_lock);
+   return ret;
+}
+
 #ifdef CONFIG_PROC_FS
 
 static int net_policy_proc_show(struct seq_file *m, void *v)
@@ -459,11 +523,40 @@ static int net_policy_proc_open(struct inode *inode, 
struct file *file)
return single_open(file, net_policy_proc_show, PDE_DATA(inode));
 }
 
+static ssize_t net_policy_proc_write(struct file *file, const char __user *buf,
+size_t count, loff_t *pos)
+{
+   struct seq_file *m = file->private_data;
+   struct net_device *dev = (struct net_device *)m->private;
+   char name[POLICY_NAME_LEN_MAX];
+   int i, ret;
+
+   if (!dev->netpolicy)
+   return -ENOTSUPP;
+
+   if (count > POLICY_NAME_LEN_MAX)
+   return -EINVAL;
+
+   if (copy_from_user(name, buf, count))
+   return -EINVAL;
+
+   for (i = 0; i < count - 1; i++)
+   name[i] = tou

[RFC V2 PATCH 19/25] net/netpolicy: set Rx queues according to policy

2016-08-04 Thread kan . liang
From: Kan Liang 

For setting Rx queues, this patch configures Rx network flow
classification rules to redirect the packets to the assigned queue.

Since we may not get all the information required for the rule until the
first packet arrives, the rule is added after recvmsg. Also, to
avoid hurting the connection rates, the configuration is done
asynchronously by a work queue. So the first several packets may not use
the assigned queue.

The dev information will be discarded in udp_queue_rcv_skb, so we record
it in the netpolicy struct in advance.

This patch only supports INET tcp4 and udp4. It can be extended to other
socket types and IPv6 shortly.

For each sk, only one rule is supported. If the port/address changes, the
previous rule will be replaced.
Signed-off-by: Kan Liang 
---
 include/linux/netpolicy.h |  33 +++-
 net/core/netpolicy.c  | 131 +-
 net/ipv4/af_inet.c|  71 +
 net/ipv4/udp.c|   4 ++
 4 files changed, 236 insertions(+), 3 deletions(-)

diff --git a/include/linux/netpolicy.h b/include/linux/netpolicy.h
index a522015..df962de 100644
--- a/include/linux/netpolicy.h
+++ b/include/linux/netpolicy.h
@@ -37,6 +37,8 @@ enum netpolicy_traffic {
NETPOLICY_RXTX,
 };
 
+#define NETPOLICY_INVALID_QUEUE-1
+#define NETPOLICY_INVALID_LOC  NETPOLICY_INVALID_QUEUE
 #define POLICY_NAME_LEN_MAX64
 extern const char *policy_name[];
 
@@ -80,10 +82,32 @@ struct netpolicy_info {
struct list_head
obj_list[NETPOLICY_RXTX][NET_POLICY_MAX];
 };
 
+struct netpolicy_tcpudpip4_spec {
+   /* source and Destination host and port */
+   __be32  ip4src;
+   __be32  ip4dst;
+   __be16  psrc;
+   __be16  pdst;
+};
+
+union netpolicy_flow_union {
+   struct netpolicy_tcpudpip4_spec tcp_udp_ip4_spec;
+};
+
+struct netpolicy_flow_spec {
+   __u32   flow_type;
+   union netpolicy_flow_union  spec;
+};
+
 struct netpolicy_instance {
struct net_device   *dev;
-   enum netpolicy_name policy; /* required policy */
-   void*ptr;   /* pointers */
+   enum netpolicy_name policy; /* required policy */
+   void*ptr;   /* pointers */
+   int location;   /* rule location */
+   atomic_trule_queue; /* queue set by rule */
+   struct work_struct  fc_wk;  /* flow classification work */
+   atomic_tfc_wk_cnt;  /* flow classification work 
number */
+   struct netpolicy_flow_spec flow;/* flow information */
 };
 
 /* check if policy is valid */
@@ -98,6 +122,7 @@ extern int netpolicy_register(struct netpolicy_instance 
*instance,
  enum netpolicy_name policy);
 extern void netpolicy_unregister(struct netpolicy_instance *instance);
 extern int netpolicy_pick_queue(struct netpolicy_instance *instance, bool 
is_rx);
+extern void netpolicy_set_rules(struct netpolicy_instance *instance);
 #else
 static inline void update_netpolicy_sys_map(void)
 {
@@ -116,6 +141,10 @@ static inline int netpolicy_pick_queue(struct 
netpolicy_instance *instance, bool
 {
return 0;
 }
+
+static inline void netpolicy_set_rules(struct netpolicy_instance *instance)
+{
+}
 #endif
 
 #endif /*__LINUX_NETPOLICY_H*/
diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index 98ca430..89c65d9 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -39,6 +39,7 @@
 #include 
 #include 
 #include 
+#include 
 
 struct netpolicy_record {
struct hlist_node   hash_node;
@@ -52,6 +53,8 @@ struct netpolicy_record {
 static DEFINE_HASHTABLE(np_record_hash, 10);
 static DEFINE_SPINLOCK(np_hashtable_lock);
 
+struct workqueue_struct *np_fc_wq;
+
 static int netpolicy_get_dev_info(struct net_device *dev,
  struct netpolicy_dev_info *d_info)
 {
@@ -426,6 +429,90 @@ int netpolicy_pick_queue(struct netpolicy_instance 
*instance, bool is_rx)
 }
 EXPORT_SYMBOL(netpolicy_pick_queue);
 
+void np_flow_rule_set(struct work_struct *wk)
+{
+   struct netpolicy_instance *instance;
+   struct netpolicy_flow_spec *flow;
+   struct ethtool_rxnfc cmd;
+   struct net_device *dev;
+   int queue, ret;
+
+   instance = container_of(wk, struct netpolicy_instance,
+   fc_wk);
+   if (!instance)
+   return;
+
+   flow = &instance->flow;
+   if (WARN_ON(!flow))
+   goto done;
+   dev = instance->dev;
+   if (WARN_ON(!dev))
+   goto done;
+
+   /* Check if ntuple is supported */
+   if (!dev->ethtool_ops->set_rxnfc)
+   goto done;
+
+   /* Only support TCP/UDP V4 by now */
+   if ((flow->flow_type != TCP_V4_FLOW) &&
+   (flow->flow_type != UDP_V4_FLOW))
+   goto done;
+
+   queue = get_avail

[PATCH net 3/3] mlxsw: spectrum: Add missing DCB rollback in error path

2016-08-04 Thread Ido Schimmel
We correctly execute mlxsw_sp_port_dcb_fini() when a port is removed, but
I missed its rollback in the error path of port creation, so add it.

Fixes: f00817df2b42 ("mlxsw: spectrum: Introduce support for Data Center 
Bridging (DCB)")
Signed-off-by: Ido Schimmel 
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c 
b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
index c3e6150..e1b8f62 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
@@ -2220,6 +2220,7 @@ err_port_vlan_init:
 err_core_port_init:
unregister_netdev(dev);
 err_register_netdev:
+   mlxsw_sp_port_dcb_fini(mlxsw_sp_port);
 err_port_dcb_init:
 err_port_ets_init:
 err_port_buffers_init:
-- 
2.8.2



[RFC V2 PATCH 16/25] net/netpolicy: introduce per socket netpolicy

2016-08-04 Thread kan . liang
From: Kan Liang 

The network socket is the most basic unit which controls the network
traffic. This patch introduces a new socket option, SO_NETPOLICY, to
set/get the net policy for a socket, so that an application can set its
own policy on a socket to improve its network performance.
A per socket net policy can also be inherited by a new socket.
The usage of the SO_NETPOLICY socket option is as below.
setsockopt(sockfd,SOL_SOCKET,SO_NETPOLICY,&policy,sizeof(int))
getsockopt(sockfd,SOL_SOCKET,SO_NETPOLICY,&policy,sizeof(int))
The policy set by the SO_NETPOLICY socket option must be valid and
compatible with the current device policy. Otherwise, it will error out,
and the socket policy will be set to NET_POLICY_INVALID.

Signed-off-by: Kan Liang 
---
 arch/alpha/include/uapi/asm/socket.h   |  2 ++
 arch/avr32/include/uapi/asm/socket.h   |  2 ++
 arch/frv/include/uapi/asm/socket.h |  2 ++
 arch/ia64/include/uapi/asm/socket.h|  2 ++
 arch/m32r/include/uapi/asm/socket.h|  2 ++
 arch/mips/include/uapi/asm/socket.h|  2 ++
 arch/mn10300/include/uapi/asm/socket.h |  2 ++
 arch/parisc/include/uapi/asm/socket.h  |  2 ++
 arch/powerpc/include/uapi/asm/socket.h |  2 ++
 arch/s390/include/uapi/asm/socket.h|  2 ++
 arch/sparc/include/uapi/asm/socket.h   |  2 ++
 arch/xtensa/include/uapi/asm/socket.h  |  2 ++
 include/net/request_sock.h |  4 +++-
 include/net/sock.h |  9 +
 include/uapi/asm-generic/socket.h  |  2 ++
 net/core/sock.c| 28 
 16 files changed, 66 insertions(+), 1 deletion(-)

diff --git a/arch/alpha/include/uapi/asm/socket.h 
b/arch/alpha/include/uapi/asm/socket.h
index 9e46d6e..06b2ef9 100644
--- a/arch/alpha/include/uapi/asm/socket.h
+++ b/arch/alpha/include/uapi/asm/socket.h
@@ -97,4 +97,6 @@
 
 #define SO_CNX_ADVICE  53
 
+#define SO_NETPOLICY   54
+
 #endif /* _UAPI_ASM_SOCKET_H */
diff --git a/arch/avr32/include/uapi/asm/socket.h 
b/arch/avr32/include/uapi/asm/socket.h
index 1fd147f..24f85f0 100644
--- a/arch/avr32/include/uapi/asm/socket.h
+++ b/arch/avr32/include/uapi/asm/socket.h
@@ -90,4 +90,6 @@
 
 #define SO_CNX_ADVICE  53
 
+#define SO_NETPOLICY   54
+
 #endif /* _UAPI__ASM_AVR32_SOCKET_H */
diff --git a/arch/frv/include/uapi/asm/socket.h 
b/arch/frv/include/uapi/asm/socket.h
index afbc98f0..82c8d44 100644
--- a/arch/frv/include/uapi/asm/socket.h
+++ b/arch/frv/include/uapi/asm/socket.h
@@ -90,5 +90,7 @@
 
 #define SO_CNX_ADVICE  53
 
+#define SO_NETPOLICY   54
+
 #endif /* _ASM_SOCKET_H */
 
diff --git a/arch/ia64/include/uapi/asm/socket.h 
b/arch/ia64/include/uapi/asm/socket.h
index 0018fad..b99c1df 100644
--- a/arch/ia64/include/uapi/asm/socket.h
+++ b/arch/ia64/include/uapi/asm/socket.h
@@ -99,4 +99,6 @@
 
 #define SO_CNX_ADVICE  53
 
+#define SO_NETPOLICY   54
+
 #endif /* _ASM_IA64_SOCKET_H */
diff --git a/arch/m32r/include/uapi/asm/socket.h 
b/arch/m32r/include/uapi/asm/socket.h
index 5fe42fc..71a43ed 100644
--- a/arch/m32r/include/uapi/asm/socket.h
+++ b/arch/m32r/include/uapi/asm/socket.h
@@ -90,4 +90,6 @@
 
 #define SO_CNX_ADVICE  53
 
+#define SO_NETPOLICY   54
+
 #endif /* _ASM_M32R_SOCKET_H */
diff --git a/arch/mips/include/uapi/asm/socket.h 
b/arch/mips/include/uapi/asm/socket.h
index 2027240a..ce8b9ba 100644
--- a/arch/mips/include/uapi/asm/socket.h
+++ b/arch/mips/include/uapi/asm/socket.h
@@ -108,4 +108,6 @@
 
 #define SO_CNX_ADVICE  53
 
+#define SO_NETPOLICY   54
+
 #endif /* _UAPI_ASM_SOCKET_H */
diff --git a/arch/mn10300/include/uapi/asm/socket.h 
b/arch/mn10300/include/uapi/asm/socket.h
index 5129f23..c041265 100644
--- a/arch/mn10300/include/uapi/asm/socket.h
+++ b/arch/mn10300/include/uapi/asm/socket.h
@@ -90,4 +90,6 @@
 
 #define SO_CNX_ADVICE  53
 
+#define SO_NETPOLICY   54
+
 #endif /* _ASM_SOCKET_H */
diff --git a/arch/parisc/include/uapi/asm/socket.h 
b/arch/parisc/include/uapi/asm/socket.h
index 9c935d7..2639dcd 100644
--- a/arch/parisc/include/uapi/asm/socket.h
+++ b/arch/parisc/include/uapi/asm/socket.h
@@ -89,4 +89,6 @@
 
 #define SO_CNX_ADVICE  0x402E
 
+#define SO_NETPOLICY   0x402F
+
 #endif /* _UAPI_ASM_SOCKET_H */
diff --git a/arch/powerpc/include/uapi/asm/socket.h 
b/arch/powerpc/include/uapi/asm/socket.h
index 1672e33..e04e3b6 100644
--- a/arch/powerpc/include/uapi/asm/socket.h
+++ b/arch/powerpc/include/uapi/asm/socket.h
@@ -97,4 +97,6 @@
 
 #define SO_CNX_ADVICE  53
 
+#define SO_NETPOLICY   54
+
 #endif /* _ASM_POWERPC_SOCKET_H */
diff --git a/arch/s390/include/uapi/asm/socket.h 
b/arch/s390/include/uapi/asm/socket.h
index 41b51c2..d43b854 100644
--- a/arch/s390/include/uapi/asm/socket.h
+++ b/arch/s390/include/uapi/asm/socket.h
@@ -96,4 +96,6 @@
 
 #define SO_CNX_ADVICE  53
 
+#define SO_NETPOLICY   54
+
 #endif /* _ASM_SOCKET_H */
diff --git a/arch/sparc/include/uapi/asm/socket.h 
b/arch/sparc/include/uapi/

[RFC V2 PATCH 17/25] net/netpolicy: introduce netpolicy_pick_queue

2016-08-04 Thread kan . liang
From: Kan Liang 

To achieve better network performance, the key step is to distribute the
packets to dedicated queues according to policy and system run time
status.

This patch provides an interface which can return the proper dedicated
queue for a socket/task. The packets of the socket/task will then be
redirected to the dedicated queue for better network performance.

For selecting the proper queue, a round-robin algorithm is currently used
to find an available object from the given policy's object list. The
algorithm is good enough for now, but it could be replaced by an adaptive
algorithm later.

The selected object is stored in a hash table, so there is no need to
go through the whole object list every time.

Signed-off-by: Kan Liang 
---
 include/linux/netpolicy.h |   5 ++
 net/core/netpolicy.c  | 136 ++
 2 files changed, 141 insertions(+)

diff --git a/include/linux/netpolicy.h b/include/linux/netpolicy.h
index 5900252..a522015 100644
--- a/include/linux/netpolicy.h
+++ b/include/linux/netpolicy.h
@@ -97,6 +97,7 @@ extern void update_netpolicy_sys_map(void);
 extern int netpolicy_register(struct netpolicy_instance *instance,
  enum netpolicy_name policy);
 extern void netpolicy_unregister(struct netpolicy_instance *instance);
+extern int netpolicy_pick_queue(struct netpolicy_instance *instance, bool 
is_rx);
 #else
 static inline void update_netpolicy_sys_map(void)
 {
@@ -111,6 +112,10 @@ static inline void netpolicy_unregister(struct 
netpolicy_instance *instance)
 {
 }
 
+static inline int netpolicy_pick_queue(struct netpolicy_instance *instance, 
bool is_rx)
+{
+   return 0;
+}
 #endif
 
 #endif /*__LINUX_NETPOLICY_H*/
diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index 3605761..98ca430 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -290,6 +290,142 @@ static void netpolicy_record_clear_dev_node(struct 
net_device *dev)
spin_unlock_bh(&np_hashtable_lock);
 }
 
+static struct netpolicy_object *get_avail_object(struct net_device *dev,
+enum netpolicy_name policy,
+bool is_rx)
+{
+   int dir = is_rx ? NETPOLICY_RX : NETPOLICY_TX;
+   struct netpolicy_object *tmp, *obj = NULL;
+   int val = -1;
+
+   /* Check if net policy is supported */
+   if (!dev || !dev->netpolicy)
+   return NULL;
+
+   /* The system should have queues which support the request policy. */
+   if ((policy != dev->netpolicy->cur_policy) &&
+   (dev->netpolicy->cur_policy != NET_POLICY_MIX))
+   return NULL;
+
+   spin_lock_bh(&dev->np_ob_list_lock);
+   list_for_each_entry(tmp, &dev->netpolicy->obj_list[dir][policy], list) {
+   if ((val > atomic_read(&tmp->refcnt)) ||
+   (val == -1)) {
+   val = atomic_read(&tmp->refcnt);
+   obj = tmp;
+   }
+   }
+
+   if (WARN_ON(!obj)) {
+   spin_unlock_bh(&dev->np_ob_list_lock);
+   return NULL;
+   }
+   atomic_inc(&obj->refcnt);
+   spin_unlock_bh(&dev->np_ob_list_lock);
+
+   return obj;
+}
+
+static int get_avail_queue(struct netpolicy_instance *instance, bool is_rx)
+{
+   struct netpolicy_record *old_record, *new_record;
+   struct net_device *dev = instance->dev;
+   unsigned long ptr_id = (uintptr_t)instance->ptr;
+   int queue = -1;
+
+   spin_lock_bh(&np_hashtable_lock);
+   old_record = netpolicy_record_search(ptr_id);
+   if (!old_record) {
+   pr_warn("NETPOLICY: doesn't registered. Remove net policy 
settings!\n");
+   instance->policy = NET_POLICY_INVALID;
+   goto err;
+   }
+
+   if (is_rx && old_record->rx_obj) {
+   queue = old_record->rx_obj->queue;
+   } else if (!is_rx && old_record->tx_obj) {
+   queue = old_record->tx_obj->queue;
+   } else {
+   new_record = kzalloc(sizeof(*new_record), GFP_KERNEL);
+   if (!new_record)
+   goto err;
+   memcpy(new_record, old_record, sizeof(*new_record));
+
+   if (is_rx) {
+   new_record->rx_obj = get_avail_object(dev, 
new_record->policy, is_rx);
+   if (!new_record->dev)
+   new_record->dev = dev;
+   if (!new_record->rx_obj) {
+   kfree(new_record);
+   goto err;
+   }
+   queue = new_record->rx_obj->queue;
+   } else {
+   new_record->tx_obj = get_avail_object(dev, 
new_record->policy, is_rx);
+   if (!new_record->dev)
+   new_record->dev = dev;
+   if (!new_record->tx_obj) {
+  

[RFC V2 PATCH 14/25] net/netpolicy: handle channel changes

2016-08-04 Thread kan . liang
From: Kan Liang 

User can use ethtool to set the channel number. This patch handles
channel changes by rebuilding the object list.

Signed-off-by: Kan Liang 
---
 include/linux/netpolicy.h | 8 
 net/core/ethtool.c| 8 +++-
 net/core/netpolicy.c  | 1 +
 3 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/include/linux/netpolicy.h b/include/linux/netpolicy.h
index 579ff98..cc75e3c 100644
--- a/include/linux/netpolicy.h
+++ b/include/linux/netpolicy.h
@@ -79,4 +79,12 @@ struct netpolicy_info {
struct list_head
obj_list[NETPOLICY_RXTX][NET_POLICY_MAX];
 };
 
+#ifdef CONFIG_NETPOLICY
+extern void update_netpolicy_sys_map(void);
+#else
+static inline void update_netpolicy_sys_map(void)
+{
+}
+#endif
+
 #endif /*__LINUX_NETPOLICY_H*/
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index 9774898..e1f8bd0 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -1703,6 +1703,7 @@ static noinline_for_stack int ethtool_set_channels(struct 
net_device *dev,
 {
struct ethtool_channels channels, max;
u32 max_rx_in_use = 0;
+   int ret;
 
if (!dev->ethtool_ops->set_channels || !dev->ethtool_ops->get_channels)
return -EOPNOTSUPP;
@@ -1726,7 +1727,12 @@ static noinline_for_stack int 
ethtool_set_channels(struct net_device *dev,
(channels.combined_count + channels.rx_count) <= max_rx_in_use)
return -EINVAL;
 
-   return dev->ethtool_ops->set_channels(dev, &channels);
+   ret = dev->ethtool_ops->set_channels(dev, &channels);
+#ifdef CONFIG_NETPOLICY
+   if (!ret)
+   update_netpolicy_sys_map();
+#endif
+   return ret;
 }
 
 static int ethtool_get_pauseparam(struct net_device *dev, void __user 
*useraddr)
diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index 3b523fc..7579685 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -885,6 +885,7 @@ unlock:
}
}
 }
+EXPORT_SYMBOL(update_netpolicy_sys_map);
 
 static int netpolicy_cpu_callback(struct notifier_block *nfb,
  unsigned long action, void *hcpu)
-- 
2.5.5


