Re: [PATCH v2] dt-bindings: net: ravb : Add support for r8a7743 SoC

2017-07-11 Thread Simon Horman
On Tue, Jul 11, 2017 at 03:51:40PM +0300, Sergei Shtylyov wrote:
> On 07/11/2017 03:21 PM, Simon Horman wrote:
> 
> >>>Add a new compatible string for the RZ/G1M (R8A7743) SoC.
> >>>
> >>>Signed-off-by: Biju Das 
> >>>---
> >>>v1->v2
> >>>* Changed the subject
> >>>* re-formatted the required properties
> >>>
> >>>.../devicetree/bindings/net/renesas,ravb.txt   | 29 
> >>>+-
> >>>1 file changed, 17 insertions(+), 12 deletions(-)
> >>>
> >>>diff --git a/Documentation/devicetree/bindings/net/renesas,ravb.txt 
> >>>b/Documentation/devicetree/bindings/net/renesas,ravb.txt
> >>>index b519503..4717bc2 100644
> >>>--- a/Documentation/devicetree/bindings/net/renesas,ravb.txt
> >>>+++ b/Documentation/devicetree/bindings/net/renesas,ravb.txt
> >>>@@ -4,19 +4,24 @@ This file provides information on what the device node 
> >>>for the Ethernet AVB
> >>>interface contains.
> >>>
> >>>Required properties:
> >>>-- compatible: "renesas,etheravb-r8a7790" if the device is a part of 
> >>>R8A7790 SoC.
> >>>-"renesas,etheravb-r8a7791" if the device is a part of R8A7791 SoC.
> >>>-"renesas,etheravb-r8a7792" if the device is a part of R8A7792 SoC.
> >>>-"renesas,etheravb-r8a7793" if the device is a part of R8A7793 SoC.
> >>>-"renesas,etheravb-r8a7794" if the device is a part of R8A7794 SoC.
> >>>-"renesas,etheravb-r8a7795" if the device is a part of R8A7795 SoC.
> >>>-"renesas,etheravb-r8a7796" if the device is a part of R8A7796 SoC.
> >>>-"renesas,etheravb-rcar-gen2" for generic R-Car Gen 2 compatible 
> >>>interface.
> >>>-"renesas,etheravb-rcar-gen3" for generic R-Car Gen 3 compatible 
> >>>interface.
> >>>+- compatible: Must contain one or more of the following:
> >>
> >>   No, it can't contain more than one SoC specific value.
> >
> >The text above does not say it can unless one is under the delusion
> >that, f.e. an r8a7790 SoC is also an r8a7791 SoC.
> 
>Hm, you're right. Sorry about my misunderstanding...
> 
>The only problem I'm still seeing with the current wording is that gen2/3
> fallback
> values are declared optional, while you can't get the driver to bind without
> them on
> the newer SoCs. But this is probably not a new problem...
> 
> >If you would prefer an alternate wording please provide a suggestion.
> 
>Perhaps we should emphasize that the gen2/3 fallback values are mandatory?

It's only mandatory for SoCs covered by those values - admittedly all SoCs
supported by the driver at this time. But that is an artifact of the
implementation rather than a property of the binding.

In any case I think is already described sufficiently in:

When compatible with the generic version, nodes must list the
SoC-specific version corresponding to the platform first followed by
the generic version.

> >>>+  - "renesas,etheravb-r8a7743" for the R8A7743 SoC.
> >>>+  - "renesas,etheravb-r8a7790" for the R8A7790 SoC.
> >>>+  - "renesas,etheravb-r8a7791" for the R8A7791 SoC.
> >>>+  - "renesas,etheravb-r8a7792" for the R8A7792 SoC.
> >>>+  - "renesas,etheravb-r8a7793" for the R8A7793 SoC.
> >>>+  - "renesas,etheravb-r8a7794" for the R8A7794 SoC.
> >>>+  - "renesas,etheravb-rcar-gen2" as a fallback for the above
> >>>+  R-Car Gen2 and RZ/G1 devices.
> [...]
> 
> WBR, Sergei
> 


[PATCH] iwlwifi: mvm: Fix a memory leak in an error handling path in 'iwl_mvm_sar_get_wgds_table()'

2017-07-11 Thread Christophe JAILLET
We should free 'wgds.pointer' here as done a few lines above in another
error handling path.
It was allocated within 'acpi_evaluate_object()'.

Signed-off-by: Christophe JAILLET 
---
A comment in '/drivers/acpi/acpica/utalloc.c' states that:
/* [...] Note: The caller should use acpi_os_free to free this
 * buffer created via ACPI_ALLOCATE_BUFFER.
 */

So, at the end of this function:
out_free:
kfree(wgds.pointer);
should maybe be:
out_free:
acpi_os_free(wgds.pointer);

If correct, several places should be fixed accordingly.
---
 drivers/net/wireless/intel/iwlwifi/mvm/fw.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/fw.c 
b/drivers/net/wireless/intel/iwlwifi/mvm/fw.c
index 24cc406d87ef..56d535ab696f 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/fw.c
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/fw.c
@@ -1287,8 +1287,10 @@ static int iwl_mvm_sar_get_wgds_table(struct iwl_mvm 
*mvm,
 
entry = _pkg->package.elements[i + 1];
if ((entry->type != ACPI_TYPE_INTEGER) ||
-   (entry->integer.value > U8_MAX))
-   return -EINVAL;
+   (entry->integer.value > U8_MAX)) {
+   ret = -EINVAL;
+   goto out_free;
+   }
 
geo_table->values[i] = entry->integer.value;
}
-- 
2.11.0



Re: [PATCH net-next 2/4] net-next: mediatek: add platform data to adapt into various hardware

2017-07-11 Thread Florian Fainelli


On 07/11/2017 08:37 PM, sean.w...@mediatek.com wrote:
> From: Sean Wang 
> 
> This patch is the preparation patch in order to adapt into various
> hardware through adding platform data which holds specific characteristics
> among MediaTek SoCs and introducing the unified clock handler for those
> distinct clock requirements depending on different features such as
> TRGMII and SGMII getting support on the target SoC. And finally, add
> enhancement with given the generic description for Kconfig and remove the
> unnecessary machine type dependency in Makefile.
> 
> Signed-off-by: Sean Wang 
> ---

>   if (dev->phydev->link)
> @@ -1837,6 +1838,39 @@ static void ethsys_reset(struct mtk_eth *eth, u32 
> reset_bits)
>   mdelay(10);
>  }
>  
> +static void mtk_clk_disable(struct mtk_eth *eth)
> +{
> + int clk;
> +
> + for (clk = MTK_CLK_MAX - 1; clk >= 0; clk--) {
> + if (eth->clks[clk])
> + clk_disable_unprepare(eth->clks[clk]);
> + }

The clock framework works just fine with NULL clk references, no need to
check that.

> +}

There are now (or will be soon in clk-next) bulk accessors that you may
consider using for this.

> +
> +static int mtk_clk_enable(struct mtk_eth *eth)
> +{
> + int clk, ret;
> +
> + for (clk = 0; clk < MTK_CLK_MAX ; clk++) {
> + if (eth->clks[clk]) {
> + ret = clk_prepare_enable(eth->clks[clk]);
> + if (ret)
> + goto err_disable_clks;
> + }
> + }

Same here.
-- 
Florian


Re: Lots of new warnings with gcc-7.1.1

2017-07-11 Thread Jakub Kicinski
On Tue, 11 Jul 2017 15:35:15 -0700, Linus Torvalds wrote:
> I do suspect I'll make "-Wformat-truncation" (as opposed to
> "-Wformat-overflow") be a "V=1" kind of warning.  But let's see how
> many of these we can fix, ok?

Somehow related - what's the stand on -Wimplicit-fallthrough?  I run
into the jump tables in jhash.h generating lots of warnings.  Is it OK
to do this?

--->8--

diff --git a/include/linux/jhash.h b/include/linux/jhash.h
index 348c6f47e4cc..f6d6513a4c03 100644
--- a/include/linux/jhash.h
+++ b/include/linux/jhash.h
@@ -85,20 +85,19 @@ static inline u32 jhash(const void *key, u32 length, u32 
initval)
k += 12;
}
/* Last block: affect all 32 bits of (c) */
-   /* All the case statements fall through */
switch (length) {
-   case 12: c += (u32)k[11]<<24;
-   case 11: c += (u32)k[10]<<16;
-   case 10: c += (u32)k[9]<<8;
-   case 9:  c += k[8];
-   case 8:  b += (u32)k[7]<<24;
-   case 7:  b += (u32)k[6]<<16;
-   case 6:  b += (u32)k[5]<<8;
-   case 5:  b += k[4];
-   case 4:  a += (u32)k[3]<<24;
-   case 3:  a += (u32)k[2]<<16;
-   case 2:  a += (u32)k[1]<<8;
-   case 1:  a += k[0];
+   case 12: c += (u32)k[11]<<24;   /* fall through */
+   case 11: c += (u32)k[10]<<16;   /* fall through */
+   case 10: c += (u32)k[9]<<8; /* fall through */
+   case 9:  c += k[8]; /* fall through */
+   case 8:  b += (u32)k[7]<<24;/* fall through */
+   case 7:  b += (u32)k[6]<<16;/* fall through */
+   case 6:  b += (u32)k[5]<<8; /* fall through */
+   case 5:  b += k[4]; /* fall through */
+   case 4:  a += (u32)k[3]<<24;/* fall through */
+   case 3:  a += (u32)k[2]<<16;/* fall through */
+   case 2:  a += (u32)k[1]<<8; /* fall through */
+   case 1:  a += k[0]; /* fall through */
 __jhash_final(a, b, c);
case 0: /* Nothing left to add */
break;
@@ -131,11 +130,11 @@ static inline u32 jhash2(const u32 *k, u32 length, u32 
initval)
k += 3;
}
 
-   /* Handle the last 3 u32's: all the case statements fall through */
+   /* Handle the last 3 u32's */
switch (length) {
-   case 3: c += k[2];
-   case 2: b += k[1];
-   case 1: a += k[0];
+   case 3: c += k[2];  /* fall through */
+   case 2: b += k[1];  /* fall through */
+   case 1: a += k[0];  /* fall through */
__jhash_final(a, b, c);
case 0: /* Nothing left to add */
break;


AW: Wir vergeben Kredite mit einem Zinssatz von jährlich 2%

2017-07-11 Thread Bernhard Stöckl

Wir vergeben Kredite mit einem Zinssatz von jährlich 2%.

Die Bearbeitung des Antrags erfolgt rasch, wir verlangen keine Gebühren, was 
sie beantragen werden wir annehmen. Wir bewilligen Kredite von bis zu 40 
Millionen Euro und von mindestens 15.000 Euro. Sie können einen geschäftlichen 
oder privaten Kredit beantrage, wobei Sie bei Geschäftskrediten eine 
Zahlungsfrist von einem Jahr erhalten.

Anfragen bitte per Mail an:   sunrisefundin...@hotmail.com  


Re: [PATCH net-next 0/4] net-next: mediatek: add support for ethernet on MT7622 SoC

2017-07-11 Thread David Miller
From: 
Date: Wed, 12 Jul 2017 11:37:41 +0800

> From: Sean Wang 
> 
> The series adds the driver for ethernet controller found on MT7622 SoC.
> There are additions against with previous MT7623 SoC such as shared SGMII
> given for the dual GMACs and built-in 5-ports 10/100 embedded switch support
> (ESW). Thus more clocks consumers and SGMII hardware setup for the extra
> features are all introduced here and as for the support for ESW that would be
> planned to add in the separate patch integrating with DSA infrastructure
> in the future.
> 
> Currently testing successfully is done with those patches for the conditions
> such as GMAC2 with IP1001 PHY via RGMII and GMAC1/2 with RTL8211F PHY via 
> SGMII.

net-next is closed, please resubmit this when net-next is open again:

http://vger.kernel.org/~davem/net-next.html

Thanks.


Re: [PATCH net v2] samples/bpf: fix a build issue

2017-07-11 Thread David Miller
From: Yonghong Song 
Date: Mon, 10 Jul 2017 14:04:28 -0700

> With latest net-next:
> 
> clang  -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/6.3.1/include 
> -I./arch/x86/include -I./arch/x86/include/generated/uapi 
> -I./arch/x86/include/generated  -I./include -I./arch/x86/include/uapi 
> -I./include/uapi -I./include/generated/uapi -include 
> ./include/linux/kconfig.h  -Isamples/bpf \
> -D__KERNEL__ -D__ASM_SYSREG_H -Wno-unused-value -Wno-pointer-sign \
> -Wno-compare-distinct-pointer-types \
> -Wno-gnu-variable-sized-type-not-at-end \
> -Wno-address-of-packed-member -Wno-tautological-compare \
> -Wno-unknown-warning-option \
> -O2 -emit-llvm -c samples/bpf/tcp_synrto_kern.c -o -| llc -march=bpf 
> -filetype=obj -o samples/bpf/tcp_synrto_kern.o
> samples/bpf/tcp_synrto_kern.c:20:10: fatal error: 'bpf_endian.h' file not 
> found
>   ^~
> 1 error generated.
> 
> 
> net has the same issue.
> 
> Add support for ntohl and htonl in tools/testing/selftests/bpf/bpf_endian.h.
> Also move bpf_helpers.h from samples/bpf to selftests/bpf and change
> compiler include logic so that programs in samples/bpf can access the headers
> in selftests/bpf, but not the other way around.
> 
> Signed-off-by: Yonghong Song 

Applied, thanks.


Re: Lots of new warnings with gcc-7.1.1

2017-07-11 Thread Linus Torvalds
On Tue, Jul 11, 2017 at 8:17 PM, Linus Torvalds
 wrote:
>
> If that's the case, I'd prefer just turning off the format-truncation
> (but not overflow) warning with '-Wno-format-trunction".

Doing

 KBUILD_CFLAGS  += $(call cc-disable-warning, format-truncation)

in the main Makefile certainly cuts down on the warnings.

We still have some overflow warnings, including the crazy one where
gcc doesn't see that the number of max7315 boards is very limited.

But those could easily be converted to just snprintf() instead, and
then the truncation warning disabling takes care of it. Maybe that's
the right answer.

We also have about a bazillion

warning: ‘*’ in boolean context, suggest ‘&&’ instead

warnings in drivers/ata/libata-core.c, all due to a single macro that
uses a pattern that gcc-7.1.1 doesn't like. The warning looks a bit
debatable, but I suspect the macro could easily be changed too.

Tejun, would you hate just moving the "multiply by 1000" part _into_
that EZ() macro? Something like the attached (UNTESTED!) patch?

  Linus
 drivers/ata/libata-core.c | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 8453f9a4682f..4c7d5a138495 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -3231,19 +3231,19 @@ static const struct ata_timing ata_timing[] = {
 };
 
 #define ENOUGH(v, unit)(((v)-1)/(unit)+1)
-#define EZ(v, unit)((v)?ENOUGH(v, unit):0)
+#define EZ(v, unit)((v)?ENOUGH((v)*1000, unit):0)
 
 static void ata_timing_quantize(const struct ata_timing *t, struct ata_timing 
*q, int T, int UT)
 {
-   q->setup= EZ(t->setup  * 1000,  T);
-   q->act8b= EZ(t->act8b  * 1000,  T);
-   q->rec8b= EZ(t->rec8b  * 1000,  T);
-   q->cyc8b= EZ(t->cyc8b  * 1000,  T);
-   q->active   = EZ(t->active * 1000,  T);
-   q->recover  = EZ(t->recover* 1000,  T);
-   q->dmack_hold   = EZ(t->dmack_hold * 1000,  T);
-   q->cycle= EZ(t->cycle  * 1000,  T);
-   q->udma = EZ(t->udma   * 1000, UT);
+   q->setup= EZ(t->setup,  T);
+   q->act8b= EZ(t->act8b,  T);
+   q->rec8b= EZ(t->rec8b,  T);
+   q->cyc8b= EZ(t->cyc8b,  T);
+   q->active   = EZ(t->active, T);
+   q->recover  = EZ(t->recover,T);
+   q->dmack_hold   = EZ(t->dmack_hold, T);
+   q->cycle= EZ(t->cycle,  T);
+   q->udma = EZ(t->udma,   UT);
 }
 
 void ata_timing_merge(const struct ata_timing *a, const struct ata_timing *b,


[PATCH net-next 2/4] net-next: mediatek: add platform data to adapt into various hardware

2017-07-11 Thread sean.wang
From: Sean Wang 

This patch is the preparation patch in order to adapt into various
hardware through adding platform data which holds specific characteristics
among MediaTek SoCs and introducing the unified clock handler for those
distinct clock requirements depending on different features such as
TRGMII and SGMII getting support on the target SoC. And finally, add
enhancement with given the generic description for Kconfig and remove the
unnecessary machine type dependency in Makefile.

Signed-off-by: Sean Wang 
---
 drivers/net/ethernet/mediatek/Kconfig   |  6 +--
 drivers/net/ethernet/mediatek/mtk_eth_soc.c | 71 -
 drivers/net/ethernet/mediatek/mtk_eth_soc.h | 23 +-
 3 files changed, 85 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ethernet/mediatek/Kconfig 
b/drivers/net/ethernet/mediatek/Kconfig
index 698bb89..f9149d2 100644
--- a/drivers/net/ethernet/mediatek/Kconfig
+++ b/drivers/net/ethernet/mediatek/Kconfig
@@ -7,11 +7,11 @@ config NET_VENDOR_MEDIATEK
 if NET_VENDOR_MEDIATEK
 
 config NET_MEDIATEK_SOC
-   tristate "MediaTek MT7623 Gigabit ethernet support"
-   depends on NET_VENDOR_MEDIATEK && (MACH_MT7623 || MACH_MT2701)
+   tristate "MediaTek SoC Gigabit Ethernet support"
+   depends on NET_VENDOR_MEDIATEK
select PHYLIB
---help---
  This driver supports the gigabit ethernet MACs in the
- MediaTek MT2701/MT7623 chipset family.
+ MediaTek SoC family.
 
 endif #NET_VENDOR_MEDIATEK
diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c 
b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index b8068ce..51ca79f 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -184,7 +184,8 @@ static void mtk_phy_link_adjust(struct net_device *dev)
break;
};
 
-   if (mac->id == 0 && !mac->trgmii)
+   if (MTK_HAS_CAPS(mac->hw->soc->caps, MTK_GMAC1_TRGMII) &&
+   !mac->id && !mac->trgmii)
mtk_gmac0_rgmii_adjust(mac->hw, dev->phydev->speed);
 
if (dev->phydev->link)
@@ -1837,6 +1838,39 @@ static void ethsys_reset(struct mtk_eth *eth, u32 
reset_bits)
mdelay(10);
 }
 
+static void mtk_clk_disable(struct mtk_eth *eth)
+{
+   int clk;
+
+   for (clk = MTK_CLK_MAX - 1; clk >= 0; clk--) {
+   if (eth->clks[clk])
+   clk_disable_unprepare(eth->clks[clk]);
+   }
+}
+
+static int mtk_clk_enable(struct mtk_eth *eth)
+{
+   int clk, ret;
+
+   for (clk = 0; clk < MTK_CLK_MAX ; clk++) {
+   if (eth->clks[clk]) {
+   ret = clk_prepare_enable(eth->clks[clk]);
+   if (ret)
+   goto err_disable_clks;
+   }
+   }
+
+   return 0;
+
+err_disable_clks:
+   while (--clk >= 0) {
+   if (eth->clks[clk])
+   clk_disable_unprepare(eth->clks[clk]);
+   }
+
+   return ret;
+}
+
 static int mtk_hw_init(struct mtk_eth *eth)
 {
int i, val;
@@ -1847,10 +1881,8 @@ static int mtk_hw_init(struct mtk_eth *eth)
pm_runtime_enable(eth->dev);
pm_runtime_get_sync(eth->dev);
 
-   clk_prepare_enable(eth->clks[MTK_CLK_ETHIF]);
-   clk_prepare_enable(eth->clks[MTK_CLK_ESW]);
-   clk_prepare_enable(eth->clks[MTK_CLK_GP1]);
-   clk_prepare_enable(eth->clks[MTK_CLK_GP2]);
+   mtk_clk_enable(eth);
+
ethsys_reset(eth, RSTCTRL_FE);
ethsys_reset(eth, RSTCTRL_PPE);
 
@@ -1925,10 +1957,7 @@ static int mtk_hw_deinit(struct mtk_eth *eth)
if (!test_and_clear_bit(MTK_HW_INIT, >state))
return 0;
 
-   clk_disable_unprepare(eth->clks[MTK_CLK_GP2]);
-   clk_disable_unprepare(eth->clks[MTK_CLK_GP1]);
-   clk_disable_unprepare(eth->clks[MTK_CLK_ESW]);
-   clk_disable_unprepare(eth->clks[MTK_CLK_ETHIF]);
+   mtk_clk_disable(eth);
 
pm_runtime_put_sync(eth->dev);
pm_runtime_disable(eth->dev);
@@ -2406,6 +2435,7 @@ static int mtk_probe(struct platform_device *pdev)
 {
struct resource *res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
struct device_node *mac_np;
+   const struct of_device_id *match;
struct mtk_eth *eth;
int err;
int i;
@@ -2414,6 +2444,9 @@ static int mtk_probe(struct platform_device *pdev)
if (!eth)
return -ENOMEM;
 
+   match = of_match_device(of_mtk_match, >dev);
+   eth->soc = (struct mtk_soc_data *)match->data;
+
eth->dev = >dev;
eth->base = devm_ioremap_resource(>dev, res);
if (IS_ERR(eth->base))
@@ -2450,7 +2483,12 @@ static int mtk_probe(struct platform_device *pdev)
if (IS_ERR(eth->clks[i])) {
if (PTR_ERR(eth->clks[i]) == -EPROBE_DEFER)
return -EPROBE_DEFER;
-   return 

[PATCH net-next 4/4] MAINTAINERS: add Sean/Nelson as MediaTek ethernet maintainers

2017-07-11 Thread sean.wang
From: Sean Wang 

Sean and Nelson work for MediaTek on maintaining the MediaTek ethernet
driver for the existing SoCs and adding support for the following SoCs.
In the past, Sean has been active at making most of the qualifications
, stress test and submitting a lot of patches for the driver while
Nelson was looking into the aspects more on hardware additions and details
such as introducing PDMA with Hardware LRO to the driver.

Cc: John Crispin 
Signed-off-by: Sean Wang 
Signed-off-by: Nelson Chang 
---
 MAINTAINERS | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 4f4057c..3b2dd4b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8385,6 +8385,8 @@ F:include/uapi/linux/uvcvideo.h
 MEDIATEK ETHERNET DRIVER
 M: Felix Fietkau 
 M: John Crispin 
+M: Sean Wang 
+M: Nelson Chang 
 L: netdev@vger.kernel.org
 S: Maintained
 F: drivers/net/ethernet/mediatek/
-- 
2.7.4



[PATCH net-next 1/4] dt-bindings: net: mediatek: add support for MediaTek MT7623 and MT7622 SoC

2017-07-11 Thread sean.wang
From: Sean Wang 

The patch adds the supplements in the dt-binding document for MediaTek
MT7622 SoC with extra SGMII system controller and relevant clock consumers
listed as the requirements for those SoCs equipped with the SGMII circuit.
Also, add the missing binding information for MT7623 SoC here which relies
on the fallback binding of MT2701.

Signed-off-by: Sean Wang 
---
 Documentation/devicetree/bindings/net/mediatek-net.txt | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/mediatek-net.txt 
b/Documentation/devicetree/bindings/net/mediatek-net.txt
index c7194e8..1d1168b 100644
--- a/Documentation/devicetree/bindings/net/mediatek-net.txt
+++ b/Documentation/devicetree/bindings/net/mediatek-net.txt
@@ -7,24 +7,30 @@ have dual GMAC each represented by a child node..
 * Ethernet controller node
 
 Required properties:
-- compatible: Should be "mediatek,mt2701-eth"
+- compatible: Should be
+   "mediatek,mt2701-eth": for MT2701 SoC
+   "mediatek,mt7623-eth", "mediatek,mt2701-eth": for MT7623 SoC
+   "mediatek,mt7622-eth": for MT7622 SoC
 - reg: Address and length of the register set for the device
 - interrupts: Should contain the three frame engines interrupts in numeric
order. These are fe_int0, fe_int1 and fe_int2.
 - clocks: the clock used by the core
 - clock-names: the names of the clock listed in the clocks property. These are
-   "ethif", "esw", "gp2", "gp1"
+   "ethif", "esw", "gp2", "gp1" : For MT2701 and MT7623 SoC
+"ethif", "esw", "gp0", "gp1", "gp2", "sgmii_tx250m", "sgmii_rx250m",
+   "sgmii_cdr_ref", "sgmii_cdr_fb", "sgmii_ck", "eth2pll" : For MT7622 SoC
 - power-domains: phandle to the power domain that the ethernet is part of
 - resets: Should contain a phandle to the ethsys reset signal
 - reset-names: Should contain the reset signal name "eth"
 - mediatek,ethsys: phandle to the syscon node that handles the port setup
+- mediatek,sgmiisys: phandle to the syscon node that handles the SGMII setup
+   which is required for those SoCs equipped with SGMII such as MT7622 SoC.
 - mediatek,pctl: phandle to the syscon node that handles the ports slew rate
and driver current
 
 Optional properties:
 - interrupt-parent: Should be the phandle for the interrupt controller
   that services interrupts for this device
-
 * Ethernet MAC node
 
 Required properties:
-- 
2.7.4



[PATCH net-next 3/4] net-next: mediatek: add support for MediaTek MT7622 SoC

2017-07-11 Thread sean.wang
From: Sean Wang 

This patch adds the driver for ethernet controller on MT7622 SoC. It has
the similar handling logic as the previously MT7623 does, but there are
additions against with MT7623 SoC, the shared SGMII given for the dual
GMACs and including 5-ports 10/100 embedded switch support (ESW) as the
GMAC1 option, thus more clocks consumers for the extra feature are
introduced here. So for ease portability and maintenance, those
differences all are being kept inside the platform data as other drivers
usually do. Currently testing successfully is done with those patches for
the conditions such as GMAC2 with IP1001 PHY via RGMII and GMAC1/2 with
RTL8211F PHY via SGMII.

Signed-off-by: Sean Wang 
---
 drivers/net/ethernet/mediatek/mtk_eth_soc.c | 72 -
 drivers/net/ethernet/mediatek/mtk_eth_soc.h | 54 +-
 2 files changed, 122 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c 
b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index 51ca79f..e3dbea1 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -52,7 +52,8 @@ static const struct mtk_ethtool_stats {
 };
 
 static const char * const mtk_clks_source_name[] = {
-   "ethif", "esw", "gp1", "gp2", "trgpll"
+   "ethif", "esw", "gp0", "gp1", "gp2", "trgpll", "sgmii_tx250m",
+   "sgmii_rx250m", "sgmii_cdr_ref", "sgmii_cdr_fb", "sgmii_ck", "eth2pll"
 };
 
 void mtk_w32(struct mtk_eth *eth, u32 val, unsigned reg)
@@ -162,6 +163,47 @@ static void mtk_gmac0_rgmii_adjust(struct mtk_eth *eth, 
int speed)
mtk_w32(eth, val, TRGMII_TCK_CTRL);
 }
 
+static void mtk_gmac_sgmii_hw_setup(struct mtk_eth *eth, int mac_id)
+{
+   u32 val;
+
+   /* Setup the link timer and QPHY power up inside SGMIISYS */
+   regmap_write(eth->sgmiisys, SGMSYS_PCS_LINK_TIMER,
+SGMII_LINK_TIMER_DEFAULT);
+
+   regmap_read(eth->sgmiisys, SGMSYS_SGMII_MODE, );
+   val |= SGMII_REMOTE_FAULT_DIS;
+   regmap_write(eth->sgmiisys, SGMSYS_SGMII_MODE, val);
+
+   regmap_read(eth->sgmiisys, SGMSYS_PCS_CONTROL_1, );
+   val |= SGMII_AN_RESTART;
+   regmap_write(eth->sgmiisys, SGMSYS_PCS_CONTROL_1, val);
+
+   regmap_read(eth->sgmiisys, SGMSYS_QPHY_PWR_STATE_CTRL, );
+   val &= ~SGMII_PHYA_PWD;
+   regmap_write(eth->sgmiisys, SGMSYS_QPHY_PWR_STATE_CTRL, val);
+
+   /* Determine MUX for which GMAC uses the SGMII interface */
+   if (MTK_HAS_CAPS(eth->soc->caps, MTK_DUAL_GMAC_SHARED_SGMII)) {
+   regmap_read(eth->ethsys, ETHSYS_SYSCFG0, );
+   val &= ~SYSCFG0_SGMII_MASK;
+   val |= !mac_id ? SYSCFG0_SGMII_GMAC1 : SYSCFG0_SGMII_GMAC2;
+   regmap_write(eth->ethsys, ETHSYS_SYSCFG0, val);
+
+   dev_info(eth->dev, "setup shared sgmii for gmac=%d\n",
+mac_id);
+   }
+
+   /* Setup the GMAC1 going through SGMII path when SoC also support
+* ESW on GMAC1
+*/
+   if (MTK_HAS_CAPS(eth->soc->caps, MTK_GMAC1_ESW | MTK_GMAC1_SGMII) &&
+   !mac_id) {
+   mtk_w32(eth, 0, MTK_MAC_MISC);
+   dev_info(eth->dev, "setup gmac1 going through sgmii");
+   }
+}
+
 static void mtk_phy_link_adjust(struct net_device *dev)
 {
struct mtk_mac *mac = netdev_priv(dev);
@@ -269,6 +311,7 @@ static int mtk_phy_connect(struct net_device *dev)
if (!np)
return -ENODEV;
 
+   mac->ge_mode = 0;
switch (of_get_phy_mode(np)) {
case PHY_INTERFACE_MODE_TRGMII:
mac->trgmii = true;
@@ -276,7 +319,15 @@ static int mtk_phy_connect(struct net_device *dev)
case PHY_INTERFACE_MODE_RGMII_RXID:
case PHY_INTERFACE_MODE_RGMII_ID:
case PHY_INTERFACE_MODE_RGMII:
-   mac->ge_mode = 0;
+   break;
+   case PHY_INTERFACE_MODE_SGMII:
+   if (MTK_HAS_CAPS(eth->soc->caps, MTK_SGMII))
+   mtk_gmac_sgmii_hw_setup(eth, mac->id);
+   break;
+   case PHY_INTERFACE_MODE_INTERNAL:
+   if (MTK_HAS_CAPS(eth->soc->caps, MTK_GMAC1_ESW) && !mac->id)
+   /* Setup the path through ESW internal switch */
+   mtk_w32(eth, MTK_MUX_TO_ESW, MTK_MAC_MISC);
break;
case PHY_INTERFACE_MODE_MII:
mac->ge_mode = 1;
@@ -2424,6 +2475,7 @@ static int mtk_get_chip_id(struct mtk_eth *eth, u32 
*chip_id)
 static bool mtk_is_hwlro_supported(struct mtk_eth *eth)
 {
switch (eth->chip_id) {
+   case MT7622_ETH:
case MT7623_ETH:
return true;
}
@@ -2463,6 +2515,16 @@ static int mtk_probe(struct platform_device *pdev)
return PTR_ERR(eth->ethsys);
}
 
+   if (MTK_HAS_CAPS(eth->soc->caps, MTK_SGMII)) {
+   eth->sgmiisys =
+   

[PATCH net-next 0/4] net-next: mediatek: add support for ethernet on MT7622 SoC

2017-07-11 Thread sean.wang
From: Sean Wang 

The series adds the driver for ethernet controller found on MT7622 SoC.
There are additions against with previous MT7623 SoC such as shared SGMII
given for the dual GMACs and built-in 5-ports 10/100 embedded switch support
(ESW). Thus more clocks consumers and SGMII hardware setup for the extra
features are all introduced here and as for the support for ESW that would be
planned to add in the separate patch integrating with DSA infrastructure
in the future.

Currently testing successfully is done with those patches for the conditions
such as GMAC2 with IP1001 PHY via RGMII and GMAC1/2 with RTL8211F PHY via SGMII.

Sean Wang (4):
  dt-bindings: net: mediatek: add support for MediaTek MT7623 and MT7622
SoC
  net-next: mediatek: add platform data to adapt into various hardware
  net-next: mediatek: add support for MediaTek MT7622 SoC
  MAINTAINERS: add Sean/Nelson as MediaTek ethernet maintainers

 .../devicetree/bindings/net/mediatek-net.txt   |  12 +-
 MAINTAINERS|   2 +
 drivers/net/ethernet/mediatek/Kconfig  |   6 +-
 drivers/net/ethernet/mediatek/mtk_eth_soc.c| 143 +++--
 drivers/net/ethernet/mediatek/mtk_eth_soc.h|  73 ++-
 5 files changed, 216 insertions(+), 20 deletions(-)

-- 
2.7.4



Re: [RFC] get_compat_msghdr(): get rid of field-by-field copyin

2017-07-11 Thread David Miller
From: Al Viro 
Date: Sat, 8 Jul 2017 19:21:00 +0100

> There are 3 commits in vfs.git#misc.compat I hadn't pushed to Linus yet;
> they touch net/* and I'd like to see at least "no objections" from networking
> folks before asking to pull that; all of those are about getting rid of
> field-by-field copyin.  Please, review and comment.
> 
> Signed-off-by: Al Viro 

That weird sparc64 regression concerns me.

The commit referenced in that discussion:

d9e968cb9f849770288f5fde3d8d3a5f7e339052 ("getrlimit()/setrlimit(): move compat 
to native")

looks harmless, or if there is a bug in there I can't see it.

But whatever it is, that same problem could be hiding in some of these
other transformations as well.

I think the bug might be that we are corrupting the user's stack
somehow.  But the two user copies in that commit look perfectly fine
to my eyes.

There shouldn't be any padding in that compat_rlimit structure, so
it's not like we're copying extra bytes.  Well, we'd be exposing
kernel stack memory if that were the case.

Color me stumped, but cautious about ACK'ing these networking
compat changes.



Re: Lots of new warnings with gcc-7.1.1

2017-07-11 Thread Linus Torvalds
On Tue, Jul 11, 2017 at 8:10 PM, Guenter Roeck  wrote:
>
> The hwmon warnings are all about supporting no more than 9,999 sensors
> (applesmc) to 999,999,999 sensors (scpi) of a given type.

Yeah, I think that's enough.

> Easy "fix" would be to replace snprintf() with scnprintf(), presumably
> because gcc doesn't know about scnprintf().

If that's the case, I'd prefer just turning off the format-truncation
(but not overflow) warning with '-Wno-format-trunction".

But maybe we can at least start it on a subsystem-by-subsystem basis
after people have verified their own subsusystem?

  Linus


Re: Lots of new warnings with gcc-7.1.1

2017-07-11 Thread Guenter Roeck

On 07/11/2017 03:35 PM, Linus Torvalds wrote:

[ Very random list of maintainers and mailing lists, at least
partially by number of warnings generated by gcc-7.1.1 that is then
correlated with the get_maintainers script ]

So I upgraded one of my boxes to F26, which upgraded the compiler to gcc-7.1.1

Which in turn means that my nice clean allmodconfig compile is not an
unholy mess of annoying new warnings.

Normally I hate the stupid new warnings, but this time around they are
actually exactly the kinds of warnings you'd want to see and that are
hard for humans to pick out errors: lots of format errors wrt limited
buffer sizes.

At the same time, many of them *are* annoying. We have various limited
buffers that are limited for a good reason, and some of the format
truncation warnings are about numbers in the range {0-MAX_INT], where
we definitely know that we don't need to worry about the really big
ones.

After all, we're using "snprintf()" for a reason - we *want* to
truncate if the buffer is too small.



The hwmon warnings are all about supporting no more than 9,999 sensors
(applesmc) to 999,999,999 sensors (scpi) of a given type. Easy "fix" would
be to replace snprintf() with scnprintf(), presumably because gcc doesn't
know about scnprintf(). We could also increase the name buffer size.
But is that really worth it just to silence gcc ?

Guenter


Re: [PATCH 1/1] bridge: mdb: fix leak on complete_info ptr on fail path

2017-07-11 Thread David Miller
From: Eduardo Valentin 
Date: Tue, 11 Jul 2017 14:55:12 -0700

> We currently get the following kmemleak report:
> unreferenced object 0x8800039d9820 (size 32):
>   comm "softirq", pid 0, jiffies 4295212383 (age 792.416s)
>   hex dump (first 32 bytes):
> 00 0c e0 03 00 88 ff ff ff 02 00 00 00 00 00 00  
> 00 00 00 01 ff 11 00 02 86 dd 00 00 ff ff ff ff  
>   backtrace:
> [] kmemleak_alloc+0x4a/0xa0
> [] kmem_cache_alloc_trace+0xb8/0x1c0
> [] __br_mdb_notify+0x2a3/0x300 [bridge]
> [] br_mdb_notify+0x6e/0x70 [bridge]
> [] br_multicast_add_group+0x109/0x150 [bridge]
> [] br_ip6_multicast_add_group+0x58/0x60 [bridge]
> [] br_multicast_rcv+0x1d5/0xdb0 [bridge]
> [] br_handle_frame_finish+0xcf/0x510 [bridge]
> [] br_nf_hook_thresh.part.27+0xb/0x10 [br_netfilter]
> [] br_nf_hook_thresh+0x48/0xb0 [br_netfilter]
> [] br_nf_pre_routing_finish_ipv6+0x109/0x1d0 
> [br_netfilter]
> [] br_nf_pre_routing_ipv6+0xd0/0x14c [br_netfilter]
> [] br_nf_pre_routing+0x197/0x3d0 [br_netfilter]
> [] nf_iterate+0x52/0x60
> [] nf_hook_slow+0x5c/0xb0
> [] br_handle_frame+0x1a4/0x2c0 [bridge]
> 
> This happens when switchdev_port_obj_add() fails. This patch
> frees complete_info object in the fail path.

Applied, thanks.

I'm so glad I pushed back on your original patch :-)

> Cc: stable  # v4.9+

Please do not add stable tags to networking patches, I queue up and
submit networking -stable changes myself upon request which I am doing
in this case as well.

Thanks.


Re: [GIT] Networking

2017-07-11 Thread Herbert Xu
On Tue, Jul 11, 2017 at 01:31:14PM -0700, David Miller wrote:
>
> Acked-by: David S. Miller 
> 
> Looks good, is this going via my tree or your's?

I'll push it along.  Thanks.
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: WARN_ON_ONCE(work > weight) in napi_poll()

2017-07-11 Thread Igor Mitsyanko

On 07/11/2017 10:28 AM, Andrey Ryabinin wrote:


It gave me this:

[118648.825347] #1 quota too big 72 64 16
[118648.825351] #2 quota too big 72 64 16
[118648.825471] [ cut here ]
[118648.825484] WARNING: CPU: 0 PID: 0 at ../net/core/dev.c:5274 
net_rx_action+0x258/0x360

So this means that we didn't met the condition bellow, i.e. skb_queue_empty() 
returned true.

 ath10k_htt_txrx_compl_task():

 if ((quota > ATH10K_NAPI_QUOTA_LIMIT) &&
 !skb_queue_empty(>rx_in_ord_compl_q)) {
 resched_napi = true;
 goto exit;
 }


Also WLAN.RM.2.0-00180-QCARMSWPZ-1 firmware is a bit old, could you also update 
firmware to give it a try?
https://github.com/kvalo/ath10k-firmware/tree/master/QCA6174/hw3.0/4.4



Will try.



Maybe ath10k_htt_rx_in_ord_ind() has to accept "budget_left" parameter 
and use it to limit number of processed MSDUs in queued AMSDU and saving 
rest for later (NAPI has to be rescheduled in this case).
It seems natural that this problem happens with current logic, in case 
AMSDU in Rx queue has more elements then left in budget.


Re: [PATCH 0/6] Configuring traffic classes via new hardware offload mechanism in tc/mqprio

2017-07-11 Thread David Miller
From: Jeff Kirsher 
Date: Tue, 11 Jul 2017 17:18:17 -0700

> On Tue, 2017-07-11 at 07:26 -0700, David Miller wrote:
>> From: Amritha Nambiar 
>> Date: Tue, 11 Jul 2017 03:18:30 -0700
>> 
>> > The following series introduces a new hardware offload mode in
>> > tc/mqprio where the TCs, the queue configurations and
>> > bandwidth rate limits are offloaded to the hardware.
>> 
>> net-next is closed, please resubmit this when net-next opens
>> again:
>> 
>>  http://vger.kernel.org/~davem/net-next.html
>> 
>> Thank you.
> 
> Dave, this is gonna come through my tree.  Amritha is looking for
> community feedback, while we have this under validation.  Sorry for the
> confusion.

One way to deal with this is to say something like:

[PATCH net-next RFC]

in the subject lines instead of just:

[PATCH net-next]


Re: [PATCH 0/6] Configuring traffic classes via new hardware offload mechanism in tc/mqprio

2017-07-11 Thread Jeff Kirsher
On Tue, 2017-07-11 at 07:26 -0700, David Miller wrote:
> From: Amritha Nambiar 
> Date: Tue, 11 Jul 2017 03:18:30 -0700
> 
> > The following series introduces a new hardware offload mode in
> > tc/mqprio where the TCs, the queue configurations and
> > bandwidth rate limits are offloaded to the hardware.
> 
> net-next is closed, please resubmit this when net-next opens
> again:
> 
>   http://vger.kernel.org/~davem/net-next.html
> 
> Thank you.

Dave, this is gonna come through my tree.  Amritha is looking for
community feedback, while we have this under validation.  Sorry for the
confusion.

signature.asc
Description: This is a digitally signed message part


Re: Lots of new warnings with gcc-7.1.1

2017-07-11 Thread Marcel Holtmann
Hi Linus,

> At the same time, others aren't quite as insane, and in many cases the
> warnings might be easy to just fix.
> 
> And some actually look valid, although they might still require odd input:
> 
>  net/bluetooth/smp.c: In function ‘le_max_key_size_read’:
>  net/bluetooth/smp.c:3372:29: warning: ‘snprintf’ output may be
> truncated before the last format character [-Wformat-truncation=]
>snprintf(buf, sizeof(buf), "%2u\n", SMP_DEV(hdev)->max_key_size);
>   ^~~
>  net/bluetooth/smp.c:3372:2: note: ‘snprintf’ output between 4 and 5
> bytes into a destination of size 4
> 
> yeah, "max_key_size" is unsigned char, but if it's larger than 99 it
> really does need 5 bytes for "%2u\n" with the terminating NUL
> character.
> 
> Of course, the "%2d" implies that people expect it to be < 100, but at
> the same time it doesn't sound like a bad idea to just make the buffer
> be one byte bigger. So..

the Bluetooth specification defines that the Maximum Encryption Key Size shall 
be in the range 7 to 16 octets. Which is also reflected in these defines:

#define SMP_MIN_ENC_KEY_SIZE7
#define SMP_MAX_ENC_KEY_SIZE16

So it is buf[4] since we know it never gets larger than 16. So even in this 
case the warning is bogus.

I have no problem in increasing it to buf[5] to shut up the compiler, but that 
is what I would be doing here. I am not fixing an actual bug.

> Anyway, it would be lovely if some of the more affected developers
> would take a look at gcc-7.1.1 warnings. Right now I get about three
> *thousand* lines of warnings from a "make allmodconfig" build, which
> makes them a bit overwhelming.
> 
> I do suspect I'll make "-Wformat-truncation" (as opposed to
> "-Wformat-overflow") be a "V=1" kind of warning.  But let's see how
> many of these we can fix, ok?

I had to use the -Wno-format-trunction in a few projects since gcc was 
completely lost. And since we were using snprintf, I saw no point in trying to 
please gcc with a larger buffer.

Regards

Marcel



Lots of new warnings with gcc-7.1.1

2017-07-11 Thread Linus Torvalds
[ Very random list of maintainers and mailing lists, at least
partially by number of warnings generated by gcc-7.1.1 that is then
correlated with the get_maintainers script ]

So I upgraded one of my boxes to F26, which upgraded the compiler to gcc-7.1.1

Which in turn means that my nice clean allmodconfig compile is not an
unholy mess of annoying new warnings.

Normally I hate the stupid new warnings, but this time around they are
actually exactly the kinds of warnings you'd want to see and that are
hard for humans to pick out errors: lots of format errors wrt limited
buffer sizes.

At the same time, many of them *are* annoying. We have various limited
buffers that are limited for a good reason, and some of the format
truncation warnings are about numbers in the range {0-MAX_INT], where
we definitely know that we don't need to worry about the really big
ones.

After all, we're using "snprintf()" for a reason - we *want* to
truncate if the buffer is too small.

But a lot of the warnings look reasonable, and at least the warnings
are nice in how they actually explain why the warning is happening.
Example:

  arch/x86/platform/intel-mid/device_libs/platform_max7315.c: In
function ‘max7315_platform_data’:
  arch/x86/platform/intel-mid/device_libs/platform_max7315.c:41:35:
warning: ‘%d’ directive writing between 1 and 11 bytes into a region
of size 9 [-Wformat-overflow=]
 sprintf(base_pin_name, "max7315_%d_base", nr);
 ^~
  arch/x86/platform/intel-mid/device_libs/platform_max7315.c:41:26:
note: directive argument in the range [-2147483647, 2147483647]

Yeah, the compiler is technically correct, but we already made sure we
have at most MAX7315_NUM of those adapters, so no, "nr" is really not
going to be a 10-digit number.

So the warning is kind of bogus.

At the same time, others aren't quite as insane, and in many cases the
warnings might be easy to just fix.

And some actually look valid, although they might still require odd input:

  net/bluetooth/smp.c: In function ‘le_max_key_size_read’:
  net/bluetooth/smp.c:3372:29: warning: ‘snprintf’ output may be
truncated before the last format character [-Wformat-truncation=]
snprintf(buf, sizeof(buf), "%2u\n", SMP_DEV(hdev)->max_key_size);
   ^~~
  net/bluetooth/smp.c:3372:2: note: ‘snprintf’ output between 4 and 5
bytes into a destination of size 4

yeah, "max_key_size" is unsigned char, but if it's larger than 99 it
really does need 5 bytes for "%2u\n" with the terminating NUL
character.

Of course, the "%2d" implies that people expect it to be < 100, but at
the same time it doesn't sound like a bad idea to just make the buffer
be one byte bigger. So..

Anyway, it would be lovely if some of the more affected developers
would take a look at gcc-7.1.1 warnings. Right now I get about three
*thousand* lines of warnings from a "make allmodconfig" build, which
makes them a bit overwhelming.

I do suspect I'll make "-Wformat-truncation" (as opposed to
"-Wformat-overflow") be a "V=1" kind of warning.  But let's see how
many of these we can fix, ok?

  Linus


Re: [Intel-wired-lan] [PATCH 5/6] [next-queue]net: i40e: Refactor VF BW rate limiting function to be reused on the PF as well

2017-07-11 Thread kbuild test robot
Hi Amritha,

[auto build test ERROR on jkirsher-next-queue/dev-queue]
[cannot apply to v4.12 next-20170711]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Amritha-Nambiar/Configuring-traffic-classes-via-new-hardware-offload-mechanism-in-tc-mqprio/20170711-215943
base:   https://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue.git 
dev-queue
config: i386-allmodconfig (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
# save the attached .config to linux build tree
make ARCH=i386 

All errors (new ones prefixed by >>):

   ERROR: "__udivdi3" [drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.ko] 
undefined!
>> ERROR: "__udivdi3" [drivers/net/ethernet/intel/i40e/i40e.ko] undefined!

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip


[PATCH 1/1] bridge: mdb: fix leak on complete_info ptr on fail path

2017-07-11 Thread Eduardo Valentin
We currently get the following kmemleak report:
unreferenced object 0x8800039d9820 (size 32):
  comm "softirq", pid 0, jiffies 4295212383 (age 792.416s)
  hex dump (first 32 bytes):
00 0c e0 03 00 88 ff ff ff 02 00 00 00 00 00 00  
00 00 00 01 ff 11 00 02 86 dd 00 00 ff ff ff ff  
  backtrace:
[] kmemleak_alloc+0x4a/0xa0
[] kmem_cache_alloc_trace+0xb8/0x1c0
[] __br_mdb_notify+0x2a3/0x300 [bridge]
[] br_mdb_notify+0x6e/0x70 [bridge]
[] br_multicast_add_group+0x109/0x150 [bridge]
[] br_ip6_multicast_add_group+0x58/0x60 [bridge]
[] br_multicast_rcv+0x1d5/0xdb0 [bridge]
[] br_handle_frame_finish+0xcf/0x510 [bridge]
[] br_nf_hook_thresh.part.27+0xb/0x10 [br_netfilter]
[] br_nf_hook_thresh+0x48/0xb0 [br_netfilter]
[] br_nf_pre_routing_finish_ipv6+0x109/0x1d0 
[br_netfilter]
[] br_nf_pre_routing_ipv6+0xd0/0x14c [br_netfilter]
[] br_nf_pre_routing+0x197/0x3d0 [br_netfilter]
[] nf_iterate+0x52/0x60
[] nf_hook_slow+0x5c/0xb0
[] br_handle_frame+0x1a4/0x2c0 [bridge]

This happens when switchdev_port_obj_add() fails. This patch
frees complete_info object in the fail path.

Cc: stable  # v4.9+
Reviewed-by: Vallish Vaidyeshwara 
Signed-off-by: Eduardo Valentin 
---
 net/bridge/br_mdb.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/bridge/br_mdb.c b/net/bridge/br_mdb.c
index b084548..c1030f8 100644
--- a/net/bridge/br_mdb.c
+++ b/net/bridge/br_mdb.c
@@ -323,7 +323,8 @@ static void __br_mdb_notify(struct net_device *dev, struct 
net_bridge_port *p,
__mdb_entry_to_br_ip(entry, _info->ip);
mdb.obj.complete_priv = complete_info;
mdb.obj.complete = br_mdb_complete;
-   switchdev_port_obj_add(port_dev, );
+   if (switchdev_port_obj_add(port_dev, ))
+   kfree(complete_info);
}
} else if (port_dev && type == RTM_DELMDB) {
switchdev_port_obj_del(port_dev, );
-- 
2.7.4



Re: [PATCH 1/1] bridge: mdb: report complete_info ptr as not a kmemleak

2017-07-11 Thread Eduardo Valentin
On Wed, Jul 05, 2017 at 09:05:14AM -0700, Eduardo Valentin wrote:
> On Tue, Jul 04, 2017 at 01:48:42AM -0700, David Miller wrote:
> > From: Eduardo Valentin 
> > Date: Mon, 3 Jul 2017 10:06:34 -0700
> > 
> > > We currently get the following kmemleak report:
> >  ...
> > > This patch flags the complete_info ptr object as not a leak as it will
> > > get freed when .complete_priv() is called,
> > 
> > We don't call .complete_priv().  We call .complete().
> 
> True, I can fix this wording in the commit next version.
> 
> > 
> >  for the br mdb case, it
> > > will be freed at br_mdb_complete().
> > > 
> > > Cc: stable  # v4.9+
> > > Reviewed-by: Vallish Vaidyeshwara 
> > > Signed-off-by: Eduardo Valentin 
> > 
> > I don't understand why kmemleak cannot see this.
> > 
> > We store the pointer globally when we do the switchdev_port_obv_add()
> > call and it should be able to see that.
> 
> I see and agree. But even then I get these reports consistently on RTM_NEWMDB 
> type.
> This is why I am proposing marking as a non-kmemleak.
> 
> To me, this is only a leak if .complete() never gets called.

Ok. So, in fact, I believe the problem I am seeing in more when 
switchdev_port_obj_add() fails.

I will send a patch for that.

> 
> 
> -- 
> All the best,
> Eduardo Valentin

-- 
All the best,
Eduardo Valentin


Re: [PATCH V3] Set NTB format again after altsetting switch for Huawei devices

2017-07-11 Thread Bjørn Mork
Oliver Neukum  writes:
> Am Dienstag, den 11.07.2017, 17:21 +0200 schrieb Enrico Mioso:
>> Some firmwares in Huawei E3372H devices have been observed to switch back
>> to NTB 32-bit format after altsetting switch.
>> This patch implements a driver flag to check for the device settings and
>> set NTB format to 16-bit again if needed.
>> The flag has been activated for devices controlled by the huawei_cdc_ncm.c
>> driver.
>
> Hi,
>
> does this cover reset_resume()?

AFAIK reset_resume doesn't really work for 3G/LTE modems.  Too much
state to restore both in firmware and network, with no well defined
protocol to restore it.

The only reason for having reset_resume in the modem drivers is that
quite a few modems have an SD reader. We need to support persist for the
usb-storage function, even if the modem functions fail.  There is no
reason to mess up the file system too.



Bjørn


Re: [PATCH V3] Set NTB format again after altsetting switch for Huawei devices

2017-07-11 Thread Oliver Neukum
Am Dienstag, den 11.07.2017, 17:21 +0200 schrieb Enrico Mioso:
> 
> Some firmwares in Huawei E3372H devices have been observed to switch back
> to NTB 32-bit format after altsetting switch.
> This patch implements a driver flag to check for the device settings and
> set NTB format to 16-bit again if needed.
> The flag has been activated for devices controlled by the huawei_cdc_ncm.c
> driver.

Hi,

does this cover reset_resume()?

Regards
Oliver


Re: [RESEND PATCH net] tap: convert a mutex to a spinlock

2017-07-11 Thread David Miller
From: Cong Wang 
Date: Mon, 10 Jul 2017 10:05:50 -0700

> We are not allowed to block on the RCU reader side, so can't
> just hold the mutex as before. As a quick fix, convert it to
> a spinlock.
> 
> Fixes: d9f1f61c0801 ("tap: Extending tap device create/destroy APIs")
> Reported-by: Christian Borntraeger 
> Tested-by: Christian Borntraeger 
> Cc: Sainath Grandhi 
> Signed-off-by: Cong Wang 

Applied and queued up for -stable, thanks.


Re: [PATCH net] cxgb4: fix BUG() on interrupt deallocating path of ULD

2017-07-11 Thread David Miller
From: "Guilherme G. Piccoli" 
Date: Mon, 10 Jul 2017 10:55:46 -0300

> Since the introduction of ULD (Upper-Layer Drivers), the MSI-X
> deallocating path changed in cxgb4: the driver frees the interrupts
> of ULD when unregistering it or on shutdown PCI handler.
> 
> Problem is that if a MSI-X is not freed before deallocated in the PCI
> layer, it will trigger a BUG() due to still "alive" interrupt being
> tentatively quiesced.
> 
> The below trace was observed when doing a simple unbind of Chelsio's
> adapter PCI function, like:
>   "echo 001e:80:00.4 > /sys/bus/pci/drivers/cxgb4/unbind"
> 
> Trace:
> 
>   kernel BUG at drivers/pci/msi.c:352!
>   Oops: Exception in kernel mode, sig: 5 [#1]
>   ...
>   NIP [c05a5e60] free_msi_irqs+0xa0/0x250
>   LR [c05a5e50] free_msi_irqs+0x90/0x250
>   Call Trace:
>   [c05a5e50] free_msi_irqs+0x90/0x250 (unreliable)
>   [c05a72c4] pci_disable_msix+0x124/0x180
>   [d00011e06708] disable_msi+0x88/0xb0 [cxgb4]
>   [d00011e06948] free_some_resources+0xa8/0x160 [cxgb4]
>   [d00011e06d60] remove_one+0x170/0x3c0 [cxgb4]
>   [c058a910] pci_device_remove+0x70/0x110
>   [c064ef04] device_release_driver_internal+0x1f4/0x2c0
>   ...
> 
> This patch fixes the issue by refactoring the shutdown path of ULD on
> cxgb4 driver, by properly freeing and disabling interrupts on PCI
> remove handler too.
> 
> Fixes: 0fbc81b3ad51 ("Allocate resources dynamically for all cxgb4 ULD's")
> Reported-by: Harsha Thyagaraja 
> Signed-off-by: Guilherme G. Piccoli 

Applied and queued up for -stable, thanks.


Re: [PATCH 1/2] openvswitch: Optimize updating for OvS flow_stats.

2017-07-11 Thread David Miller

net-next is closed, please submit these two patches when it is open
again:

http://vger.kernel.org/~davem/net-next.html

Thank you.


Re: [PATCH net] qed: Fix printk option passed when printing ipv6 addresses

2017-07-11 Thread David Miller
From: Michal Kalderon 
Date: Sun, 9 Jul 2017 13:00:16 +0300

> The option "h" (host order ) exists for ipv4 only.
> Remove the h when printing ipv6 addresses.
> 
> Lead to the following smatch warning:
> 
> drivers/net/ethernet/qlogic/qed/qed_iwarp.c:585 qed_iwarp_print_tcp_ramrod()
> warn: '%pI6' can only be followed by c
> drivers/net/ethernet/qlogic/qed/qed_iwarp.c:1521 qed_iwarp_print_cm_info()
> warn: '%pI6' can only be followed by c
> 
> Fixes commit 456a584947d5 ("qed: iWARP CM add passive side connect")
> 
> Reported-by: Dan Carpenter 
> Signed-off-by: Michal Kalderon 
> Signed-off-by: Yuval Mintz 

Applied.


Re: [PATCH] net-next: Fix minor code bug in timestamping.txt

2017-07-11 Thread David Miller
From: Ahmad Fatoum 
Date: Sat,  8 Jul 2017 21:28:44 +0200

> Passing (void*)val instead of  would make a pointer out of an integer
> and cause sock_setsockopt to -EFAULT.
> 
> See tools/testing/selftests/networking/timestamping/timestamping.c
> for a working example.
> 
> Cc: David S. Miller 
> Cc: netdev@vger.kernel.org
> Signed-off-by: Ahmad Fatoum 

Applied.


Re: [PATCH 0/3] net: stmmac: Fixes and cleanups in 'alloc_dma_[rt]x_desc_resources()'

2017-07-11 Thread David Miller
From: Christophe JAILLET 
Date: Sat,  8 Jul 2017 09:46:12 +0200

> These patchs are all related to 'alloc_dma_[rt]x_desc_resources()' functions.
> 
> The 2 first fix an error path where some resources are leaking. I've
> separated them into 2 patches because the issues have been introduced by
> 2 deferent commits.
> 
> The 3rd patch is just a clean-up.

Series applied, thanks.


Re: [PATCH v6 0/3] Add new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag

2017-07-11 Thread Alexander Duyck
On Mon, Jul 10, 2017 at 5:01 PM, Casey Leedom  wrote:
>
> Hey Alexander,
>
>   Okay, I understand your point regarding the "most likely scenario" being
> TLPs directed upstream to the Root Complex.  But I'd still like to make sure
> that we have an agreed upon API/methodology for doing Peer-to-Peer with
> Relaxed Ordering and no Relaxed Ordering to the Root Complex.  I don't see
> how the proposed APIs can be used in that fashion.
>
>   Right now the proposed change for cxgb4 is for it to test its own PCIe
> Capability Device Control[Relaxed Ordering Enable] in order to use that
> information to program the Chelsio Hardware to emit/not emit upstream TLPs
> with the Relaxed Ordering Attribute set.  But if we're going to have the
> mixed mode situation I describe, the PCIe Capability Device Control[Relaxed
> Ordering Enable] will have to be set which means that we'll be programming
> the Chelsio Hardware to send upstream TLPs with Relaxed Ordering Enable to
> the Root Complex which is what we were trying to avoid in the first place ...
>
>   [[ And, as I noted on Friday evening, the currect cxgb4 Driver hardwires
>  the Relaxed Ordering Enable on early dureing device probe, so that
>  would minimally need to be addressed even if we decide that we don't
>  ever want to support mixed mode Relaxed Ordering. ]]

So when you say "hardwires the Relaxed Ordering Enable" do you mean it
cannot be changed by software, or are you saying the firmware is just
setting the bit? I just want to make sure the change even works.

Really what I was trying to get at is the Relaxed Ordering Enable bit
doesn't have to stay as 0 if it is cleared due to the root complex not
supporting it. If you have a device that is smart enough to be able to
distinguish between different destinations and can change the TLPs
depending on where they are routed then the driver should feel free to
re-enable the relaxed ordering bit itself and do whatever it wants.
The idea though is that in most cases it will probably be enabled by
default since most firmwares default to that if I am not mistaken. If
we disable it, then it is up to the drivers to figure out if they need
to come up with a workaround.

>   We need some method of telling the Chelsio Driver that it should/shouldn't
> use Relaxed Ordering with TLPs directed at the Root Complex.  And the same
> is true for a Peer PCIe Device.

Well my thought for now is to take this in baby steps. Looking over
the code I might suggest that Ding just drops patch 3 and just focuses
on the first two patches for now. Patch 3 doesn't do anything from the
sounds of it since the Relaxed Ordering Enable being cleared will
cause the TLPs to already not set the relaxed ordering bit, do I have
that right?

The main thing Ding cared about is making the ixgbe driver behave more
like the cxgb4 driver in that he wanted to have us enable relaxed
ordering by default which right now we were only doing for the SPARC
platforms. As such we may want to look at changing patch 3 to instead
strip the code in ixgbe so that it is enabled to use Relaxed Ordering
by default and then let the configuration space determine if we use it
or not.

>   It may be that we should approach this from the completely opposite
> direction and instead of having quirks which identify problematic devices,
> have quirks which identify devices which would benefit from the use of
> Relaxed Ordering (if the sending device supports that).  That is, assume the
> using Relaxed Ordering shouldn't be done unless the target device says "I
> love Relaxed Ordering TLPs" ...  In such a world, an NVMe or a Graphics
> device might declare love of Relaxed Ordering and the same for a SPARC Root
> Complex (I think that was your example).

Really I think we are probably better off enabling it by default and
leaving it up to the configuration space of the end points to sort it
out. The advantage is that it will let us catch issues with platforms
that need these kind of quirks sooner since up until now we have been
avoiding enabling relaxed ordering for most devices on x86 and as we
get faster and faster buses we are going to need to start fully
supporting this sooner or later anyway.

>   By the way, the sole example of Data Corruption with Relaxed Ordering is
> the AMD A1100 ARM SoC and AMD appears to have given up on that almost as
> soon as it was released.  So what we're left with currently is a performance
> problem on modern Intel CPUs ...  (And hopefully we'll get a Technical
> Publication on that issue fairly soon.)
>
> Casey

That's also assuming there aren't any other platforms with issues
lurking out there. If something like this had been in place before
some of the modern Intel CPUs were developed perhaps they would have
caught the relaxed ordering issue soon enough to have resolved it in
the silicon.

Odds are this sets things up for how we will deal with future issues
more than current ones. I'm more a fan of just letting the 

Re: [GIT] Networking

2017-07-11 Thread David Miller
From: Herbert Xu 
Date: Mon, 10 Jul 2017 22:00:48 +0800

> crypto: af_alg - Avoid sock_graft call warning
> 
> The newly added sock_graft warning triggers in af_alg_accept.
> It's harmless as we're essentially doing sock->sk = sock->sk.
> 
> The sock_graft call is actually redundant because all the work
> it does is subsumed by sock_init_data.  However, it was added
> to placate SELinux as it uses it to initialise its internal state.
> 
> This patch avoisd the warning by making the SELinux call directly.
> 
> Reported-by: Linus Torvalds 
> Signed-off-by: Herbert Xu 

Acked-by: David S. Miller 

Looks good, is this going via my tree or your's?


Re: [PATCH v3 net-next 3/4] tls: kernel TLS support

2017-07-11 Thread Eric Biggers
On Tue, Jul 11, 2017 at 11:53:11AM -0700, Dave Watson wrote:
> On 07/11/17 08:29 AM, Steffen Klassert wrote:
> > Sorry for replying to old mail...
> > > +int tls_set_sw_offload(struct sock *sk, struct tls_context *ctx)
> > > +{
> > 
> > ...
> > 
> > > +
> > > + if (!sw_ctx->aead_send) {
> > > + sw_ctx->aead_send = crypto_alloc_aead("gcm(aes)", 0, 0);
> > > + if (IS_ERR(sw_ctx->aead_send)) {
> > > + rc = PTR_ERR(sw_ctx->aead_send);
> > > + sw_ctx->aead_send = NULL;
> > > + goto free_rec_seq;
> > > + }
> > > + }
> > > +
> > 
> > When I look on how you allocate the aead transformation, it seems
> > that you should either register an asynchronous callback with
> > aead_request_set_callback(), or request for a synchronous algorithm.
> > 
> > Otherwise you will crash on an asynchronous crypto return, no?
> 
> The intention is for it to be synchronous, and gather directly from
> userspace buffers.  It looks like calling
> crypto_alloc_aead("gcm(aes)", 0, CRYPTO_ALG_ASYNC) is the correct way
> to request synchronous algorithms only?
> 

Yes, that means the CRYPTO_ALG_ASYNC bit is required to be 0, i.e. the algorithm
must be synchronous.  Currently it's requesting either a synchronous or
asynchronous algorithm, and it will crash if it gets an async one.

Also I think even with a synchronous algorithm, tls_do_encryption() still needs
to call aead_request_set_callback(), passing NULL for the callback and data, so
that the request flags are initialized.

Eric


Re: [PATCH net] cxgb4: ptp_clock_register() returns error pointers

2017-07-11 Thread Richard Cochran
On Tue, Jul 11, 2017 at 07:54:38PM +0530, Ganesh Goudar wrote:
> diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_ptp.c 
> b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_ptp.c
> index 50517cf..06c0de4 100644
> --- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_ptp.c
> +++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_ptp.c
> @@ -441,7 +441,7 @@ void cxgb4_ptp_init(struct adapter *adapter)
>  
>   adapter->ptp_clock = ptp_clock_register(>ptp_clock_info,
>   >pdev->dev);
> - if (!adapter->ptp_clock) {
> + if (IS_ERR_OR_NULL(adapter->ptp_clock)) {

But here you also need

adapter->ptp_clock = NULL;

>   dev_err(adapter->pdev_dev,
>   "PTP %s Clock registration has failed\n", __func__);
>   return;

because later on you have:

void cxgb4_ptp_stop(struct adapter *adapter)
{
...

if (adapter->ptp_clock) {
ptp_clock_unregister(adapter->ptp_clock);
adapter->ptp_clock = NULL;
}
}

Thanks,
Richard


Re: [PATCH net v2] samples/bpf: fix a build issue

2017-07-11 Thread Daniel Borkmann

On 07/11/2017 09:51 PM, Jesper Dangaard Brouer wrote:

On Mon, 10 Jul 2017 14:04:28 -0700 Yonghong Song  wrote:


With latest net-next:

clang  -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/6.3.1/include 
-I./arch/x86/include -I./arch/x86/include/generated/uapi 
-I./arch/x86/include/generated  -I./include -I./arch/x86/include/uapi 
-I./include/uapi -I./include/generated/uapi -include ./include/linux/kconfig.h  
-Isamples/bpf \
 -D__KERNEL__ -D__ASM_SYSREG_H -Wno-unused-value -Wno-pointer-sign \
 -Wno-compare-distinct-pointer-types \
 -Wno-gnu-variable-sized-type-not-at-end \
 -Wno-address-of-packed-member -Wno-tautological-compare \
 -Wno-unknown-warning-option \
 -O2 -emit-llvm -c samples/bpf/tcp_synrto_kern.c -o -| llc -march=bpf 
-filetype=obj -o samples/bpf/tcp_synrto_kern.o
samples/bpf/tcp_synrto_kern.c:20:10: fatal error: 'bpf_endian.h' file not found
   ^~
1 error generated.


net has the same issue.

Add support for ntohl and htonl in tools/testing/selftests/bpf/bpf_endian.h.


I think this patch should have been split up in two patches. One where
you fix the compile issue, and one where you add support for ntohl and
htonl.  And you are also moving the file... I'm getting confused
reading this patch.


Could have been done, sure. Patch is still straight forward and small as-is,
so I don't really mind having this user space code squashed together.


Re: [Patch] mqueue: fix netlink sock refcnt and skb refcnt

2017-07-11 Thread Cong Wang
On Mon, Jul 10, 2017 at 11:09 AM, Linus Torvalds
 wrote:
> This thing is definitely not cc'd to the right people:
>
> On Sun, Jul 9, 2017 at 10:08 PM, Cong Wang  wrote:
>>
>> Cc: Linus Torvalds 
>> Cc: Andrew Morton 
>> Cc: Manfred Spraul 
>> Signed-off-by: Cong Wang 
>
> Unlike your previous patch, this seems to be more of a generic netlink
> interface issue, so you should primarily talk to the networking
> people, I'm not sure it makes much sense to cc me/Andrew/Manfred.
>

OK, actually this patch is not needed, because we only send notification
once, although the code itself does need some cleanup.

Sorry for the noise.


Re: [PATCH net v2] samples/bpf: fix a build issue

2017-07-11 Thread Jesper Dangaard Brouer

On Mon, 10 Jul 2017 14:04:28 -0700 Yonghong Song  wrote:

> With latest net-next:
> 
> clang  -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/6.3.1/include 
> -I./arch/x86/include -I./arch/x86/include/generated/uapi 
> -I./arch/x86/include/generated  -I./include -I./arch/x86/include/uapi 
> -I./include/uapi -I./include/generated/uapi -include 
> ./include/linux/kconfig.h  -Isamples/bpf \
> -D__KERNEL__ -D__ASM_SYSREG_H -Wno-unused-value -Wno-pointer-sign \
> -Wno-compare-distinct-pointer-types \
> -Wno-gnu-variable-sized-type-not-at-end \
> -Wno-address-of-packed-member -Wno-tautological-compare \
> -Wno-unknown-warning-option \
> -O2 -emit-llvm -c samples/bpf/tcp_synrto_kern.c -o -| llc -march=bpf 
> -filetype=obj -o samples/bpf/tcp_synrto_kern.o
> samples/bpf/tcp_synrto_kern.c:20:10: fatal error: 'bpf_endian.h' file not 
> found
>   ^~
> 1 error generated.
> 
> 
> net has the same issue.
> 
> Add support for ntohl and htonl in tools/testing/selftests/bpf/bpf_endian.h.

I think this patch should have been split up in two patches. One where
you fix the compile issue, and one where you add support for ntohl and
htonl.  And you are also moving the file... I'm getting confused
reading this patch.

> Also move bpf_helpers.h from samples/bpf to selftests/bpf and change
> compiler include logic so that programs in samples/bpf can access the headers
> in selftests/bpf, but not the other way around.
> 
> Signed-off-by: Yonghong Song 
> ---
>  samples/bpf/Makefile   |  1 +
>  tools/testing/selftests/bpf/Makefile   |  1 -
>  tools/testing/selftests/bpf/bpf_endian.h   | 14 ++
>  {samples => tools/testing/selftests}/bpf/bpf_helpers.h |  0
>  4 files changed, 15 insertions(+), 1 deletion(-)
>  rename {samples => tools/testing/selftests}/bpf/bpf_helpers.h (100%)
> 
> diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
> index 9c65058..87246be 100644
> --- a/samples/bpf/Makefile
> +++ b/samples/bpf/Makefile
> @@ -207,6 +207,7 @@ $(obj)/tracex5_kern.o: $(obj)/syscall_nrs.h
>  # useless for BPF samples.
>  $(obj)/%.o: $(src)/%.c
>   $(CLANG) $(NOSTDINC_FLAGS) $(LINUXINCLUDE) $(EXTRA_CFLAGS) -I$(obj) \
> + -I$(srctree)/tools/testing/selftests/bpf/ \
>   -D__KERNEL__ -D__ASM_SYSREG_H -Wno-unused-value 
> -Wno-pointer-sign \
>   -Wno-compare-distinct-pointer-types \
>   -Wno-gnu-variable-sized-type-not-at-end \
> diff --git a/tools/testing/selftests/bpf/Makefile 
> b/tools/testing/selftests/bpf/Makefile
> index 2ca51a8..153c3a1 100644
> --- a/tools/testing/selftests/bpf/Makefile
> +++ b/tools/testing/selftests/bpf/Makefile
> @@ -37,6 +37,5 @@ CLANG ?= clang
>  
>  %.o: %.c
>   $(CLANG) -I. -I./include/uapi -I../../../include/uapi \
> - -I../../../../samples/bpf/ \
>   -Wno-compare-distinct-pointer-types \
>   -O2 -target bpf -c $< -o $@
> diff --git a/tools/testing/selftests/bpf/bpf_endian.h 
> b/tools/testing/selftests/bpf/bpf_endian.h
> index 487cbfb..74af266 100644
> --- a/tools/testing/selftests/bpf/bpf_endian.h
> +++ b/tools/testing/selftests/bpf/bpf_endian.h
> @@ -23,11 +23,19 @@
>  # define __bpf_htons(x)  __builtin_bswap16(x)
>  # define __bpf_constant_ntohs(x) ___constant_swab16(x)
>  # define __bpf_constant_htons(x) ___constant_swab16(x)
> +# define __bpf_ntohl(x)  __builtin_bswap32(x)
> +# define __bpf_htonl(x)  __builtin_bswap32(x)
> +# define __bpf_constant_ntohl(x) ___constant_swab32(x)
> +# define __bpf_constant_htonl(x) ___constant_swab32(x)
>  #elif __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
>  # define __bpf_ntohs(x)  (x)
>  # define __bpf_htons(x)  (x)
>  # define __bpf_constant_ntohs(x) (x)
>  # define __bpf_constant_htons(x) (x)
> +# define __bpf_ntohl(x)  (x)
> +# define __bpf_htonl(x)  (x)
> +# define __bpf_constant_ntohl(x) (x)
> +# define __bpf_constant_htonl(x) (x)
>  #else
>  # error "Fix your compiler's __BYTE_ORDER__?!"
>  #endif
> @@ -38,5 +46,11 @@
>  #define bpf_ntohs(x) \
>   (__builtin_constant_p(x) ?  \
>__bpf_constant_ntohs(x) : __bpf_ntohs(x))
> +#define bpf_htonl(x) \
> + (__builtin_constant_p(x) ?  \
> +  __bpf_constant_htonl(x) : __bpf_htonl(x))
> +#define bpf_ntohl(x) \
> + (__builtin_constant_p(x) ?  \
> +  __bpf_constant_ntohl(x) : __bpf_ntohl(x))
>  
>  #endif /* __BPF_ENDIAN__ */
> diff --git a/samples/bpf/bpf_helpers.h 
> b/tools/testing/selftests/bpf/bpf_helpers.h
> similarity index 100%
> rename from samples/bpf/bpf_helpers.h
> rename to tools/testing/selftests/bpf/bpf_helpers.h



-- 
Best regards,
  Jesper Dangaard Brouer
  

Re: [RFC PATCH 03/12] xdp: add bpf_redirect helper function

2017-07-11 Thread Jesper Dangaard Brouer
On Tue, 11 Jul 2017 11:38:33 -0700
John Fastabend  wrote:

> On 07/11/2017 07:09 AM, Andy Gospodarek wrote:
> > On Mon, Jul 10, 2017 at 1:23 PM, John Fastabend
> >  wrote:  
> >> On 07/09/2017 06:37 AM, Saeed Mahameed wrote:  
> >>>
> >>>
> >>> On 7/7/2017 8:35 PM, John Fastabend wrote:  
>  This adds support for a bpf_redirect helper function to the XDP
>  infrastructure. For now this only supports redirecting to the egress
>  path of a port.
> 
>  In order to support drivers handling a xdp_buff natively this patches
>  uses a new ndo operation ndo_xdp_xmit() that takes pushes a xdp_buff
>  to the specified device.
> 
>  If the program specifies either (a) an unknown device or (b) a device
>  that does not support the operation a BPF warning is thrown and the
>  XDP_ABORTED error code is returned.
> 
>  Signed-off-by: John Fastabend 
>  Acked-by: Daniel Borkmann 
>  ---  
> >>
> >> [...]
> >>  
> 
>  +static int __bpf_tx_xdp(struct net_device *dev, struct xdp_buff *xdp)
>  +{
>  +if (dev->netdev_ops->ndo_xdp_xmit) {
>  +dev->netdev_ops->ndo_xdp_xmit(dev, xdp);  
> >>>
> >>> Hi John,
> >>>
> >>> I have some concern here regarding synchronizing between the
> >>> redirecting device and the target device:
> >>>
> >>> if the target device's NAPI is also doing XDP_TX on the same XDP TX
> >>> ring which this NDO might be redirecting xdp packets into the same
> >>> ring, there would be a race accessing this ring resources (buffers
> >>> and descriptors). Maybe you addressed this issue in the device driver
> >>> implementation of this ndo or with some NAPI tricks/assumptions, I
> >>> guess we have the same issue for if you run the same program to
> >>> redirect traffic from multiple netdevices into one netdevice, how do
> >>> you synchronize accessing this TX ring ?  
> >>
> >> The implementation uses a per cpu TX ring to resolve these races. And
> >> the pair of driver interface API calls, xdp_do_redirect() and 
> >> xdp_do_flush_map()
> >> must be completed in a single poll() handler.
> >>
> >> This comment was included in the header file to document this,
> >>
> >> /* The pair of xdp_do_redirect and xdp_do_flush_map MUST be called in the
> >>  * same cpu context. Further for best results no more than a single map
> >>  * for the do_redirect/do_flush pair should be used. This limitation is
> >>  * because we only track one map and force a flush when the map changes.
> >>  * This does not appear to be a real limitation for existing software.
> >>  */
> >>
> >> In general some documentation about implementing XDP would probably be
> >> useful to add in Documentation/networking but this IMO goes beyond just
> >> this patch series.
> >>  
> >>>
> >>> Maybe we need some clear guidelines in this ndo documentation stating
> >>> how to implement this ndo and what are the assumptions on those XDP
> >>> TX redirect rings or from which context this ndo can run.
> >>>
> >>> can you please elaborate.  
> >>
> >> I think the best implementation is to use a per cpu TX ring as I did in
> >> this series. If your device is limited by the number of queues for some
> >> reason some other scheme would need to be devised. Unfortunately, the only
> >> thing I've come up for this case (using only this series) would both impact
> >> performance and make the code complex.
> >>
> >> A nice solution might be to constrain networking "tasks" to only a subset
> >> of cores. For 64+ core systems this might be a good idea. It would allow
> >> avoiding locking using per_cpu logic but also avoid networking consuming
> >> slices of every core in the system. As core count goes up I think we will
> >> eventually need to address this.I believe Eric was thinking along these
> >> lines with his netconf talk iirc. Obviously this work is way outside the
> >> scope of this series though.  
> > 
> > I agree that it is outside the scope of this series, but I think it is
> > important to consider the impact of the output queue selection in both
> > a heterogenous and homogenous driver setup and how tx could be
> > optimized or even considered to be more reliable and I think that was
> > part of Saeed's point.
> > 
> > I got base redirect support for bnxt_en working yesterday, but for it
> > and other drivers that do not necessarily create a ring/queue per core
> > like ixgbe there is probably a bit more to work in each driver to
> > properly track output tx rings/queues than what you have done with
> > ixgbe.
> >   
> 
> The problem, in my mind at least, is if you do not have a ring per core
> how does the locking work? I don't see any good way to do this outside
> of locking which I was trying to avoid.

My solution would be to queue the XDP packets in the devmap, and then
bulk xdp_xmit them to the device on flush.  Talking a lock per bulk
amortize cost to basically 

Re: [RFC PATCH 00/12] Implement XDP bpf_redirect vairants

2017-07-11 Thread John Fastabend
On 07/11/2017 12:19 PM, Jesper Dangaard Brouer wrote:
> On Tue, 11 Jul 2017 11:56:21 -0700
> John Fastabend  wrote:
> 
>> On 07/11/2017 11:44 AM, Jesper Dangaard Brouer wrote:
>>> On Tue, 11 Jul 2017 20:01:36 +0200
>>> Jesper Dangaard Brouer  wrote:
>>>   
 On Tue, 11 Jul 2017 10:48:29 -0700
 John Fastabend  wrote:
  
> On 07/11/2017 08:36 AM, Jesper Dangaard Brouer wrote:
>> On Sat, 8 Jul 2017 21:06:17 +0200
>> Jesper Dangaard Brouer  wrote:
>>   
>>> My plan is to test this latest patchset again, Monday and Tuesday.
>>> I'll try to assess stability and provide some performance numbers.  
>>
>> Performance numbers:
>>
>>  14378479 pkt/s = XDP_DROP without touching memory
>>   9222401 pkt/s = xdp1: XDP_DROP with reading packet data
>>   6344472 pkt/s = xdp2: XDP_TX   with swap mac (writes into pkt)
>>   4595574 pkt/s = xdp_redirect: XDP_REDIRECT with swap mac (simulate 
>> XDP_TX)
>>   5066243 pkt/s = xdp_redirect_map: XDP_REDIRECT with swap mac + devmap
>>
>> The performance drop between xdp2 and xdp_redirect, was expected due
>> to the HW-tailptr flush per packet, which is costly.
>>
>>  (1/6344472-1/4595574)*10^9 = -59.98 ns
>>
>> The performance drop between xdp2 and xdp_redirect_map, is higher than
>> I expected, which is not good!  The avoidance of the tailptr flush per
>> packet was expected to give a higher boost.  The cost increased with
>> 40 ns, which is too high compared to the code added (on a 4GHz machine
>> approx 160 cycles).
>>
>>  (1/6344472-1/5066243)*10^9 = -39.77 ns
>>
>> This system doesn't have DDIO, thus we are stalling on cache-misses,
>> but I was actually expecting that the added code could "hide" behind
>> these cache-misses.
>>
>> I'm somewhat surprised to see this large a performance drop.
>>   
>
> Yep, although there is room for optimizations in the code path for sure. 
> And
> 5mpps is not horrible my preference is to get this series in plus any
> small optimization we come up with while the merge window is closed. Then
> follow up patches can do optimizations.

 IMHO 5Mpps is a very bad number for XDP.
  
> One easy optimization is to get rid of the atomic bitops. They are not 
> needed
> here we have a per cpu unsigned long. Another easy one would be to move
> some of the checks out of the hotpath. For example checking for 
> ndo_xdp_xmit
> and flush ops on the net device in the hotpath really should be done in 
> the
> slow path.

 I'm already running with a similar patch as below, but it
 (surprisingly) only gave my 3 ns improvement.  I also tried a
 prefetchw() on xdp.data that gave me 10 ns (which is quite good).

 I'm booting up another system with a CPU E5-1650 v4 @ 3.60GHz, which
 have DDIO ... I have high hopes for this, as the major bottleneck on
 this CPU i7-4790K CPU @ 4.00GHz is clearly cache-misses.

 Something is definitely wrong on this CPU, as perf stats shows, a very
 bad utilization of the CPU pipeline with 0.89 insn per cycle.  
>>>
>>> Wow, getting DDIO working and avoiding the cache-miss, was really
>>> _the_ issue.  On this CPU E5-1650 v4 @ 3.60GHz things look really
>>> really good for XDP_REDIRECT with maps. (p.s. with __set_bit()
>>> optimization)
>>>   
>>
>> Very nice :) this was with the prefecthw() removed right?
> 
> Yes, prefetchw removed.

Great, I wanted to avoid playing prefetch games for as long as possible.
At least in this initial submission.

> 
>>> 13,939,674 pkt/s = XDP_DROP without touching memory
>>> 14,290,650 pkt/s = xdp1: XDP_DROP with reading packet data
>>> 13,221,812 pkt/s = xdp2: XDP_TX   with swap mac (writes into pkt)
>>>  7,596,576 pkt/s = xdp_redirect:XDP_REDIRECT with swap mac (like XDP_TX)
>>> 13,058,435 pkt/s = xdp_redirect_map:XDP_REDIRECT with swap mac + devmap
>>>
>>> Surprisingly, on this DDIO capable CPU it is slightly slower NOT to
>>> read packet memory.
>>>
>>> The large performance gap to xdp_redirect is due to the tailptr flush,
>>> which really show up on this system.  The CPU efficiency is 1.36 insn
>>> per cycle, which for map variant is 2.15 insn per cycle.
>>>
>>>  Gap (1/13221812-1/7596576)*10^9 = -56 ns
>>>
>>> The xdp_redirect_map performance is really really good, almost 10G
>>> wirespeed on a single CPU!!!  This is amazing, and we know that this
>>> code is not even optimal yet.  The performance difference to xdp2 is
>>> only around 1 ns.
>>>   
>>
>> Great, yeah there are some more likely()/unlikely() hints we could add and
>> also remove some of the checks in the hotpath, etc.
> 
> Yes, plus inlining some function call.
>  

Yep.

>> Thanks for doing this!
> 
> I have a really strange observation... if I change the 

Re: [RFC PATCH 00/12] Implement XDP bpf_redirect vairants

2017-07-11 Thread Jesper Dangaard Brouer
On Tue, 11 Jul 2017 11:56:21 -0700
John Fastabend  wrote:

> On 07/11/2017 11:44 AM, Jesper Dangaard Brouer wrote:
> > On Tue, 11 Jul 2017 20:01:36 +0200
> > Jesper Dangaard Brouer  wrote:
> >   
> >> On Tue, 11 Jul 2017 10:48:29 -0700
> >> John Fastabend  wrote:
> >>  
> >>> On 07/11/2017 08:36 AM, Jesper Dangaard Brouer wrote:
>  On Sat, 8 Jul 2017 21:06:17 +0200
>  Jesper Dangaard Brouer  wrote:
>    
> > My plan is to test this latest patchset again, Monday and Tuesday.
> > I'll try to assess stability and provide some performance numbers.  
> 
>  Performance numbers:
> 
>   14378479 pkt/s = XDP_DROP without touching memory
>    9222401 pkt/s = xdp1: XDP_DROP with reading packet data
>    6344472 pkt/s = xdp2: XDP_TX   with swap mac (writes into pkt)
>    4595574 pkt/s = xdp_redirect: XDP_REDIRECT with swap mac (simulate 
>  XDP_TX)
>    5066243 pkt/s = xdp_redirect_map: XDP_REDIRECT with swap mac + devmap
> 
>  The performance drop between xdp2 and xdp_redirect, was expected due
>  to the HW-tailptr flush per packet, which is costly.
> 
>   (1/6344472-1/4595574)*10^9 = -59.98 ns
> 
>  The performance drop between xdp2 and xdp_redirect_map, is higher than
>  I expected, which is not good!  The avoidance of the tailptr flush per
>  packet was expected to give a higher boost.  The cost increased with
>  40 ns, which is too high compared to the code added (on a 4GHz machine
>  approx 160 cycles).
> 
>   (1/6344472-1/5066243)*10^9 = -39.77 ns
> 
>  This system doesn't have DDIO, thus we are stalling on cache-misses,
>  but I was actually expecting that the added code could "hide" behind
>  these cache-misses.
> 
>  I'm somewhat surprised to see this large a performance drop.
>    
> >>>
> >>> Yep, although there is room for optimizations in the code path for sure. 
> >>> And
> >>> 5mpps is not horrible my preference is to get this series in plus any
> >>> small optimization we come up with while the merge window is closed. Then
> >>> follow up patches can do optimizations.
> >>
> >> IMHO 5Mpps is a very bad number for XDP.
> >>  
> >>> One easy optimization is to get rid of the atomic bitops. They are not 
> >>> needed
> >>> here we have a per cpu unsigned long. Another easy one would be to move
> >>> some of the checks out of the hotpath. For example checking for 
> >>> ndo_xdp_xmit
> >>> and flush ops on the net device in the hotpath really should be done in 
> >>> the
> >>> slow path.
> >>
> >> I'm already running with a similar patch as below, but it
> >> (surprisingly) only gave my 3 ns improvement.  I also tried a
> >> prefetchw() on xdp.data that gave me 10 ns (which is quite good).
> >>
> >> I'm booting up another system with a CPU E5-1650 v4 @ 3.60GHz, which
> >> have DDIO ... I have high hopes for this, as the major bottleneck on
> >> this CPU i7-4790K CPU @ 4.00GHz is clearly cache-misses.
> >>
> >> Something is definitely wrong on this CPU, as perf stats shows, a very
> >> bad utilization of the CPU pipeline with 0.89 insn per cycle.  
> > 
> > Wow, getting DDIO working and avoiding the cache-miss, was really
> > _the_ issue.  On this CPU E5-1650 v4 @ 3.60GHz things look really
> > really good for XDP_REDIRECT with maps. (p.s. with __set_bit()
> > optimization)
> >   
> 
> Very nice :) this was with the prefecthw() removed right?

Yes, prefetchw removed.

> > 13,939,674 pkt/s = XDP_DROP without touching memory
> > 14,290,650 pkt/s = xdp1: XDP_DROP with reading packet data
> > 13,221,812 pkt/s = xdp2: XDP_TX   with swap mac (writes into pkt)
> >  7,596,576 pkt/s = xdp_redirect:XDP_REDIRECT with swap mac (like XDP_TX)
> > 13,058,435 pkt/s = xdp_redirect_map:XDP_REDIRECT with swap mac + devmap
> > 
> > Surprisingly, on this DDIO capable CPU it is slightly slower NOT to
> > read packet memory.
> > 
> > The large performance gap to xdp_redirect is due to the tailptr flush,
> > which really show up on this system.  The CPU efficiency is 1.36 insn
> > per cycle, which for map variant is 2.15 insn per cycle.
> > 
> >  Gap (1/13221812-1/7596576)*10^9 = -56 ns
> > 
> > The xdp_redirect_map performance is really really good, almost 10G
> > wirespeed on a single CPU!!!  This is amazing, and we know that this
> > code is not even optimal yet.  The performance difference to xdp2 is
> > only around 1 ns.
> >   
> 
> Great, yeah there are some more likely()/unlikely() hints we could add and
> also remove some of the checks in the hotpath, etc.

Yes, plus inlining some function call.
 
> Thanks for doing this!

I have a really strange observation... if I change the CPU powersave
settings, then the xdp_redirect_map performance drops in half!  Above
was with "tuned-adm profile powersave" (because, this is a really noisy
server, and I'm sitting next to it).  I 

Re: net-next STATUS page

2017-07-11 Thread Stephen Hemminger
On Tue, 11 Jul 2017 07:24:16 -0700 (PDT)
David Miller  wrote:

> It has gotten to the point that even casually walking around
> Faro, Portugal last week, random German tourists would stop
> me in the street and ask if net-next was open or not.
> 
> Therefore, in order to avoid any and all confusion I have created this
> web site:
> 
>   http://vger.kernel.org/~davem/net-next.html
> 
> Please refer to it if you are ever in doubt.
> 
> Thanks!

Where is the app?


Re: [RFC PATCH 00/12] Implement XDP bpf_redirect vairants

2017-07-11 Thread John Fastabend
On 07/11/2017 11:44 AM, Jesper Dangaard Brouer wrote:
> On Tue, 11 Jul 2017 20:01:36 +0200
> Jesper Dangaard Brouer  wrote:
> 
>> On Tue, 11 Jul 2017 10:48:29 -0700
>> John Fastabend  wrote:
>>
>>> On 07/11/2017 08:36 AM, Jesper Dangaard Brouer wrote:  
 On Sat, 8 Jul 2017 21:06:17 +0200
 Jesper Dangaard Brouer  wrote:
 
> My plan is to test this latest patchset again, Monday and Tuesday.
> I'll try to assess stability and provide some performance numbers.

 Performance numbers:

  14378479 pkt/s = XDP_DROP without touching memory
   9222401 pkt/s = xdp1: XDP_DROP with reading packet data
   6344472 pkt/s = xdp2: XDP_TX   with swap mac (writes into pkt)
   4595574 pkt/s = xdp_redirect: XDP_REDIRECT with swap mac (simulate 
 XDP_TX)
   5066243 pkt/s = xdp_redirect_map: XDP_REDIRECT with swap mac + devmap

 The performance drop between xdp2 and xdp_redirect, was expected due
 to the HW-tailptr flush per packet, which is costly.

  (1/6344472-1/4595574)*10^9 = -59.98 ns

 The performance drop between xdp2 and xdp_redirect_map, is higher than
 I expected, which is not good!  The avoidance of the tailptr flush per
 packet was expected to give a higher boost.  The cost increased with
 40 ns, which is too high compared to the code added (on a 4GHz machine
 approx 160 cycles).

  (1/6344472-1/5066243)*10^9 = -39.77 ns

 This system doesn't have DDIO, thus we are stalling on cache-misses,
 but I was actually expecting that the added code could "hide" behind
 these cache-misses.

 I'm somewhat surprised to see this large a performance drop.
 
>>>
>>> Yep, although there is room for optimizations in the code path for sure. And
>>> 5mpps is not horrible my preference is to get this series in plus any
>>> small optimization we come up with while the merge window is closed. Then
>>> follow up patches can do optimizations.  
>>
>> IMHO 5Mpps is a very bad number for XDP.
>>
>>> One easy optimization is to get rid of the atomic bitops. They are not 
>>> needed
>>> here we have a per cpu unsigned long. Another easy one would be to move
>>> some of the checks out of the hotpath. For example checking for ndo_xdp_xmit
>>> and flush ops on the net device in the hotpath really should be done in the
>>> slow path.  
>>
>> I'm already running with a similar patch as below, but it
>> (surprisingly) only gave my 3 ns improvement.  I also tried a
>> prefetchw() on xdp.data that gave me 10 ns (which is quite good).
>>
>> I'm booting up another system with a CPU E5-1650 v4 @ 3.60GHz, which
>> have DDIO ... I have high hopes for this, as the major bottleneck on
>> this CPU i7-4790K CPU @ 4.00GHz is clearly cache-misses.
>>
>> Something is definitely wrong on this CPU, as perf stats shows, a very
>> bad utilization of the CPU pipeline with 0.89 insn per cycle.
> 
> Wow, getting DDIO working and avoiding the cache-miss, was really
> _the_ issue.  On this CPU E5-1650 v4 @ 3.60GHz things look really
> really good for XDP_REDIRECT with maps. (p.s. with __set_bit()
> optimization)
> 

Very nice :) this was with the prefecthw() removed right?

> 13,939,674 pkt/s = XDP_DROP without touching memory
> 14,290,650 pkt/s = xdp1: XDP_DROP with reading packet data
> 13,221,812 pkt/s = xdp2: XDP_TX   with swap mac (writes into pkt)
>  7,596,576 pkt/s = xdp_redirect:XDP_REDIRECT with swap mac (like XDP_TX)
> 13,058,435 pkt/s = xdp_redirect_map:XDP_REDIRECT with swap mac + devmap
> 
> Surprisingly, on this DDIO capable CPU it is slightly slower NOT to
> read packet memory.
> 
> The large performance gap to xdp_redirect is due to the tailptr flush,
> which really show up on this system.  The CPU efficiency is 1.36 insn
> per cycle, which for map variant is 2.15 insn per cycle.
> 
>  Gap (1/13221812-1/7596576)*10^9 = -56 ns
> 
> The xdp_redirect_map performance is really really good, almost 10G
> wirespeed on a single CPU!!!  This is amazing, and we know that this
> code is not even optimal yet.  The performance difference to xdp2 is
> only around 1 ns.
> 

Great, yeah there are some more likely()/unlikely() hints we could add and
also remove some of the checks in the hotpath, etc.

Thanks for doing this!



Re: [PATCH v3 net-next 3/4] tls: kernel TLS support

2017-07-11 Thread Dave Watson
On 07/11/17 08:29 AM, Steffen Klassert wrote:
> Sorry for replying to old mail...
> > +int tls_set_sw_offload(struct sock *sk, struct tls_context *ctx)
> > +{
> 
> ...
> 
> > +
> > +   if (!sw_ctx->aead_send) {
> > +   sw_ctx->aead_send = crypto_alloc_aead("gcm(aes)", 0, 0);
> > +   if (IS_ERR(sw_ctx->aead_send)) {
> > +   rc = PTR_ERR(sw_ctx->aead_send);
> > +   sw_ctx->aead_send = NULL;
> > +   goto free_rec_seq;
> > +   }
> > +   }
> > +
> 
> When I look on how you allocate the aead transformation, it seems
> that you should either register an asynchronous callback with
> aead_request_set_callback(), or request for a synchronous algorithm.
> 
> Otherwise you will crash on an asynchronous crypto return, no?

The intention is for it to be synchronous, and gather directly from
userspace buffers.  It looks like calling
crypto_alloc_aead("gcm(aes)", 0, CRYPTO_ALG_ASYNC) is the correct way
to request synchronous algorithms only?

> Also, it seems that you have your scatterlists on a per crypto
> transformation base istead of per crypto request. Is this intentional?

We hold the socket lock and only one crypto op can happen at a time,
so we reuse the scatterlists.


Re: [RFC PATCH 00/12] Implement XDP bpf_redirect vairants

2017-07-11 Thread Jesper Dangaard Brouer
On Tue, 11 Jul 2017 20:01:36 +0200
Jesper Dangaard Brouer  wrote:

> On Tue, 11 Jul 2017 10:48:29 -0700
> John Fastabend  wrote:
> 
> > On 07/11/2017 08:36 AM, Jesper Dangaard Brouer wrote:  
> > > On Sat, 8 Jul 2017 21:06:17 +0200
> > > Jesper Dangaard Brouer  wrote:
> > > 
> > >> My plan is to test this latest patchset again, Monday and Tuesday.
> > >> I'll try to assess stability and provide some performance numbers.
> > > 
> > > Performance numbers:
> > > 
> > >  14378479 pkt/s = XDP_DROP without touching memory
> > >   9222401 pkt/s = xdp1: XDP_DROP with reading packet data
> > >   6344472 pkt/s = xdp2: XDP_TX   with swap mac (writes into pkt)
> > >   4595574 pkt/s = xdp_redirect: XDP_REDIRECT with swap mac (simulate 
> > > XDP_TX)
> > >   5066243 pkt/s = xdp_redirect_map: XDP_REDIRECT with swap mac + devmap
> > > 
> > > The performance drop between xdp2 and xdp_redirect, was expected due
> > > to the HW-tailptr flush per packet, which is costly.
> > > 
> > >  (1/6344472-1/4595574)*10^9 = -59.98 ns
> > > 
> > > The performance drop between xdp2 and xdp_redirect_map, is higher than
> > > I expected, which is not good!  The avoidance of the tailptr flush per
> > > packet was expected to give a higher boost.  The cost increased with
> > > 40 ns, which is too high compared to the code added (on a 4GHz machine
> > > approx 160 cycles).
> > > 
> > >  (1/6344472-1/5066243)*10^9 = -39.77 ns
> > > 
> > > This system doesn't have DDIO, thus we are stalling on cache-misses,
> > > but I was actually expecting that the added code could "hide" behind
> > > these cache-misses.
> > > 
> > > I'm somewhat surprised to see this large a performance drop.
> > > 
> > 
> > Yep, although there is room for optimizations in the code path for sure. And
> > 5mpps is not horrible my preference is to get this series in plus any
> > small optimization we come up with while the merge window is closed. Then
> > follow up patches can do optimizations.  
> 
> IMHO 5Mpps is a very bad number for XDP.
> 
> > One easy optimization is to get rid of the atomic bitops. They are not 
> > needed
> > here we have a per cpu unsigned long. Another easy one would be to move
> > some of the checks out of the hotpath. For example checking for ndo_xdp_xmit
> > and flush ops on the net device in the hotpath really should be done in the
> > slow path.  
> 
> I'm already running with a similar patch as below, but it
> (surprisingly) only gave my 3 ns improvement.  I also tried a
> prefetchw() on xdp.data that gave me 10 ns (which is quite good).
> 
> I'm booting up another system with a CPU E5-1650 v4 @ 3.60GHz, which
> have DDIO ... I have high hopes for this, as the major bottleneck on
> this CPU i7-4790K CPU @ 4.00GHz is clearly cache-misses.
> 
> Something is definitely wrong on this CPU, as perf stats shows, a very
> bad utilization of the CPU pipeline with 0.89 insn per cycle.

Wow, getting DDIO working and avoiding the cache-miss, was really
_the_ issue.  On this CPU E5-1650 v4 @ 3.60GHz things look really
really good for XDP_REDIRECT with maps. (p.s. with __set_bit()
optimization)

13,939,674 pkt/s = XDP_DROP without touching memory
14,290,650 pkt/s = xdp1: XDP_DROP with reading packet data
13,221,812 pkt/s = xdp2: XDP_TX   with swap mac (writes into pkt)
 7,596,576 pkt/s = xdp_redirect:XDP_REDIRECT with swap mac (like XDP_TX)
13,058,435 pkt/s = xdp_redirect_map:XDP_REDIRECT with swap mac + devmap

Surprisingly, on this DDIO capable CPU it is slightly slower NOT to
read packet memory.

The large performance gap to xdp_redirect is due to the tailptr flush,
which really show up on this system.  The CPU efficiency is 1.36 insn
per cycle, which for map variant is 2.15 insn per cycle.

 Gap (1/13221812-1/7596576)*10^9 = -56 ns

The xdp_redirect_map performance is really really good, almost 10G
wirespeed on a single CPU!!!  This is amazing, and we know that this
code is not even optimal yet.  The performance difference to xdp2 is
only around 1 ns.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


 [jbrouer@E5-1650v4 bpf]$ sudo ./xdp1 6
 proto 17:   11919302 pkt/s
 proto 17:   14281659 pkt/s
 proto 17:   14290650 pkt/s
 proto 17:   14283120 pkt/s
 proto 17:   14303023 pkt/s
 proto 17:   14299496 pkt/s

 [jbrouer@E5-1650v4 bpf]$ sudo ./xdp2 6
 proto 0:  1 pkt/s
 proto 17:3225455 pkt/s
 proto 17:   13266772 pkt/s
 proto 17:   13221812 pkt/s
 proto 17:   1300 pkt/s
 proto 17:   13225508 pkt/s
 proto 17:   13226274 pkt/s

 [jbrouer@E5-1650v4 bpf]$ sudo ./xdp_redirect 6 6
 ifindex 6:  66040 pkt/s
 ifindex 6:7029143 pkt/s
 ifindex 6:7596576 pkt/s
 ifindex 6:7598499 pkt/s
 ifindex 6:7597025 pkt/s
 ifindex 6:7598462 pkt/s

 [jbrouer@E5-1650v4 bpf]$ sudo ./xdp_redirect_map 6 6
 map[0] (vports) = 4, map[1] (map) = 5, map[2] (count) = 0
 ifindex 6:  

Re: [RFC PATCH 03/12] xdp: add bpf_redirect helper function

2017-07-11 Thread John Fastabend
On 07/11/2017 07:09 AM, Andy Gospodarek wrote:
> On Mon, Jul 10, 2017 at 1:23 PM, John Fastabend
>  wrote:
>> On 07/09/2017 06:37 AM, Saeed Mahameed wrote:
>>>
>>>
>>> On 7/7/2017 8:35 PM, John Fastabend wrote:
 This adds support for a bpf_redirect helper function to the XDP
 infrastructure. For now this only supports redirecting to the egress
 path of a port.

 In order to support drivers handling a xdp_buff natively this patches
 uses a new ndo operation ndo_xdp_xmit() that takes pushes a xdp_buff
 to the specified device.

 If the program specifies either (a) an unknown device or (b) a device
 that does not support the operation a BPF warning is thrown and the
 XDP_ABORTED error code is returned.

 Signed-off-by: John Fastabend 
 Acked-by: Daniel Borkmann 
 ---
>>
>> [...]
>>

 +static int __bpf_tx_xdp(struct net_device *dev, struct xdp_buff *xdp)
 +{
 +if (dev->netdev_ops->ndo_xdp_xmit) {
 +dev->netdev_ops->ndo_xdp_xmit(dev, xdp);
>>>
>>> Hi John,
>>>
>>> I have some concern here regarding synchronizing between the
>>> redirecting device and the target device:
>>>
>>> if the target device's NAPI is also doing XDP_TX on the same XDP TX
>>> ring which this NDO might be redirecting xdp packets into the same
>>> ring, there would be a race accessing this ring resources (buffers
>>> and descriptors). Maybe you addressed this issue in the device driver
>>> implementation of this ndo or with some NAPI tricks/assumptions, I
>>> guess we have the same issue for if you run the same program to
>>> redirect traffic from multiple netdevices into one netdevice, how do
>>> you synchronize accessing this TX ring ?
>>
>> The implementation uses a per cpu TX ring to resolve these races. And
>> the pair of driver interface API calls, xdp_do_redirect() and 
>> xdp_do_flush_map()
>> must be completed in a single poll() handler.
>>
>> This comment was included in the header file to document this,
>>
>> /* The pair of xdp_do_redirect and xdp_do_flush_map MUST be called in the
>>  * same cpu context. Further for best results no more than a single map
>>  * for the do_redirect/do_flush pair should be used. This limitation is
>>  * because we only track one map and force a flush when the map changes.
>>  * This does not appear to be a real limitation for existing software.
>>  */
>>
>> In general some documentation about implementing XDP would probably be
>> useful to add in Documentation/networking but this IMO goes beyond just
>> this patch series.
>>
>>>
>>> Maybe we need some clear guidelines in this ndo documentation stating
>>> how to implement this ndo and what are the assumptions on those XDP
>>> TX redirect rings or from which context this ndo can run.
>>>
>>> can you please elaborate.
>>
>> I think the best implementation is to use a per cpu TX ring as I did in
>> this series. If your device is limited by the number of queues for some
>> reason some other scheme would need to be devised. Unfortunately, the only
>> thing I've come up for this case (using only this series) would both impact
>> performance and make the code complex.
>>
>> A nice solution might be to constrain networking "tasks" to only a subset
>> of cores. For 64+ core systems this might be a good idea. It would allow
>> avoiding locking using per_cpu logic but also avoid networking consuming
>> slices of every core in the system. As core count goes up I think we will
>> eventually need to address this.I believe Eric was thinking along these
>> lines with his netconf talk iirc. Obviously this work is way outside the
>> scope of this series though.
> 
> I agree that it is outside the scope of this series, but I think it is
> important to consider the impact of the output queue selection in both
> a heterogenous and homogenous driver setup and how tx could be
> optimized or even considered to be more reliable and I think that was
> part of Saeed's point.
> 
> I got base redirect support for bnxt_en working yesterday, but for it
> and other drivers that do not necessarily create a ring/queue per core
> like ixgbe there is probably a bit more to work in each driver to
> properly track output tx rings/queues than what you have done with
> ixgbe.
> 

The problem, in my mind at least, is if you do not have a ring per core
how does the locking work? I don't see any good way to do this outside
of locking which I was trying to avoid.

.John

>>
>>
>>> Thanks,
>>> Saeed.
>>>



Re: [RFC PATCH 00/12] Implement XDP bpf_redirect vairants

2017-07-11 Thread John Fastabend
On 07/11/2017 11:01 AM, Jesper Dangaard Brouer wrote:
> On Tue, 11 Jul 2017 10:48:29 -0700
> John Fastabend  wrote:
> 
>> On 07/11/2017 08:36 AM, Jesper Dangaard Brouer wrote:
>>> On Sat, 8 Jul 2017 21:06:17 +0200
>>> Jesper Dangaard Brouer  wrote:
>>>   
 My plan is to test this latest patchset again, Monday and Tuesday.
 I'll try to assess stability and provide some performance numbers.  
>>>
>>> Performance numbers:
>>>
>>>  14378479 pkt/s = XDP_DROP without touching memory
>>>   9222401 pkt/s = xdp1: XDP_DROP with reading packet data
>>>   6344472 pkt/s = xdp2: XDP_TX   with swap mac (writes into pkt)
>>>   4595574 pkt/s = xdp_redirect: XDP_REDIRECT with swap mac (simulate 
>>> XDP_TX)
>>>   5066243 pkt/s = xdp_redirect_map: XDP_REDIRECT with swap mac + devmap
>>>
>>> The performance drop between xdp2 and xdp_redirect, was expected due
>>> to the HW-tailptr flush per packet, which is costly.
>>>
>>>  (1/6344472-1/4595574)*10^9 = -59.98 ns
>>>
>>> The performance drop between xdp2 and xdp_redirect_map, is higher than
>>> I expected, which is not good!  The avoidance of the tailptr flush per
>>> packet was expected to give a higher boost.  The cost increased with
>>> 40 ns, which is too high compared to the code added (on a 4GHz machine
>>> approx 160 cycles).
>>>
>>>  (1/6344472-1/5066243)*10^9 = -39.77 ns
>>>
>>> This system doesn't have DDIO, thus we are stalling on cache-misses,
>>> but I was actually expecting that the added code could "hide" behind
>>> these cache-misses.
>>>
>>> I'm somewhat surprised to see this large a performance drop.
>>>   
>>
>> Yep, although there is room for optimizations in the code path for sure. And
>> 5mpps is not horrible my preference is to get this series in plus any
>> small optimization we come up with while the merge window is closed. Then
>> follow up patches can do optimizations.
> 
> IMHO 5Mpps is a very bad number for XDP.
> 
>> One easy optimization is to get rid of the atomic bitops. They are not needed
>> here we have a per cpu unsigned long. Another easy one would be to move
>> some of the checks out of the hotpath. For example checking for ndo_xdp_xmit
>> and flush ops on the net device in the hotpath really should be done in the
>> slow path.
> 
> I'm already running with a similar patch as below, but it
> (surprisingly) only gave my 3 ns improvement.  I also tried a
> prefetchw() on xdp.data that gave me 10 ns (which is quite good).
> 

Ah OK good, do the above numbers use the both the bitops changes and the
prefechw?

> I'm booting up another system with a CPU E5-1650 v4 @ 3.60GHz, which
> have DDIO ... I have high hopes for this, as the major bottleneck on
> this CPU i7-4790K CPU @ 4.00GHz is clearly cache-misses.
> 
> Something is definitely wrong on this CPU, as perf stats shows, a very
> bad utilization of the CPU pipeline with 0.89 insn per cycle.
> 

Interesting, the E5-1650 numbers will be good to know. If you have the
perf trace to posting might help track down some hot spots.

.John


Re: [RFC PATCH 00/12] Implement XDP bpf_redirect vairants

2017-07-11 Thread John Fastabend
On 07/11/2017 07:23 AM, Jesper Dangaard Brouer wrote:
> On Mon, 10 Jul 2017 17:59:17 -0700
> John Fastabend  wrote:
> 
>> On 07/10/2017 11:30 AM, Jesper Dangaard Brouer wrote:
>>> On Sat, 8 Jul 2017 21:06:17 +0200
>>> Jesper Dangaard Brouer  wrote:
>>>   
 On Sat, 08 Jul 2017 10:46:18 +0100 (WEST)
 David Miller  wrote:
  
> From: John Fastabend 
> Date: Fri, 07 Jul 2017 10:48:36 -0700
> 
>> On 07/07/2017 10:34 AM, John Fastabend wrote:  
>>> This series adds two new XDP helper routines bpf_redirect() and
>>> bpf_redirect_map(). The first variant bpf_redirect() is meant
>>> to be used the same way it is currently being used by the cls_bpf
>>> classifier. An xdp packet will be redirected immediately when this
>>> is called.  
>>
>> Also other than the typo in the title there ;) I'm going to CC
>> the driver maintainers working on XDP (makes for a long CC list but)
>> because we would want to try and get support in as many as possible in
>> the next merge window.
>>
>> For this rev I just implemented on ixgbe because I wrote the
>> original XDP support there. I'll volunteer to do virtio as well.  
>
> I went over this series a few times and it looks great to me.
> You didn't even give me some coding style issues to pick on :-)

 We (Daniel, Andy and I) have been reviewing and improving on this
 patchset the last couple of weeks ;-).  We had some stability issues,
 which is why it wasn't published earlier. My plan is to test this
 latest patchset again, Monday and Tuesday. I'll try to assess stability
 and provide some performance numbers.  
>>>
>>>
>>> Damn, I though it was stable, I have been running a lot of performance
>>> tests, and then this just happened :-(  
>>
>> Thanks, I'll take a look through the code and see if I can come up with
>> why this might happen. I haven't hit it on my tests yet though.
> 
> I've figured out why this happens, and I have a fix, see patch below
> with some comments with questions.
> 

Awesome!

> The problem is that we can leak map_to_flush in an error path, the fix:
> 
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 2ccd6ff09493..7f1f48668dcf 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -2497,11 +2497,14 @@ int xdp_do_redirect_map(struct net_device *dev, 
> struct xdp_buff *xdp,
> ri->map = NULL;
>  
> trace_xdp_redirect(dev, fwd, xdp_prog, XDP_REDIRECT);
> -
> +   // Q: Should we also trace "goto out" (failed lookup)?
> +   //like bpf_warn_invalid_xdp_redirect();

Maybe another trace event? trace_xdp_redirect_failed()

> return __bpf_tx_xdp(fwd, map, xdp, index);
>  out:
> ri->ifindex = 0;
> -   ri->map = NULL;
> +   // XXX: here we could leak ri->map_to_flush, which could be
> +   //  picked up later by xdp_do_flush_map()
> +   xdp_do_flush_map(); /* Clears ri->map_to_flush + ri->map */

+1 

ah map lookup failed and we need to do the flush nice catch.

> return -EINVAL;
> 
> 
> While debugging this, I noticed that we can have packets in-flight,
> while the XDP RX rings are being reconfigured.  I wonder if this is a
> ixgbe driver XDP-bug?  I think it would be best to add some
> RCU-barrier, after ixgbe_setup_tc().
> 

Actually I think a synchronize_sched() is needed, after the IXGBE_DOWN bit
is set but before the xdp_tx queues are cleaned up. In practice the 
ixgbe_down/up
sequence has so many msleep() operations for napi cleanup and hardware sync
I would be really surprised we ever hit this. But for correctness we should
likely add it.  

> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c 
> b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> index ed97aa81a850..4872fbb54ecd 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> @@ -9801,7 +9804,18 @@ static int ixgbe_xdp_setup(struct net_device *dev, 
> struct bpf_prog *prog)
>  
> /* If transitioning XDP modes reconfigure rings */
> if (!!prog != !!old_prog) {
> -   int err = ixgbe_setup_tc(dev, netdev_get_num_tc(dev));
> +   // XXX: Warn pkts can be in-flight in old_prog
> +   //  while ixgbe_setup_tc() calls ixgbe_close(dev).
> +   //
> +   // Should we avoid these in-flight packets?
> +   // Would it be enough to add an synchronize_rcu()
> +   // or rcu_barrier()?
> +   // or do we need an napi_synchronize() call here?
> +   //
> +   int err;
> +   netdev_info(dev,
> +   "Calling ixgbe_setup_tc() to reconfig XDP 
> rings\n");
> +   err = ixgbe_setup_tc(dev, netdev_get_num_tc(dev));
>  
> if (err) {
>   

Re: [RFC PATCH 00/12] Implement XDP bpf_redirect vairants

2017-07-11 Thread Jesper Dangaard Brouer
On Tue, 11 Jul 2017 10:48:29 -0700
John Fastabend  wrote:

> On 07/11/2017 08:36 AM, Jesper Dangaard Brouer wrote:
> > On Sat, 8 Jul 2017 21:06:17 +0200
> > Jesper Dangaard Brouer  wrote:
> >   
> >> My plan is to test this latest patchset again, Monday and Tuesday.
> >> I'll try to assess stability and provide some performance numbers.  
> > 
> > Performance numbers:
> > 
> >  14378479 pkt/s = XDP_DROP without touching memory
> >   9222401 pkt/s = xdp1: XDP_DROP with reading packet data
> >   6344472 pkt/s = xdp2: XDP_TX   with swap mac (writes into pkt)
> >   4595574 pkt/s = xdp_redirect: XDP_REDIRECT with swap mac (simulate 
> > XDP_TX)
> >   5066243 pkt/s = xdp_redirect_map: XDP_REDIRECT with swap mac + devmap
> > 
> > The performance drop between xdp2 and xdp_redirect, was expected due
> > to the HW-tailptr flush per packet, which is costly.
> > 
> >  (1/6344472-1/4595574)*10^9 = -59.98 ns
> > 
> > The performance drop between xdp2 and xdp_redirect_map, is higher than
> > I expected, which is not good!  The avoidance of the tailptr flush per
> > packet was expected to give a higher boost.  The cost increased with
> > 40 ns, which is too high compared to the code added (on a 4GHz machine
> > approx 160 cycles).
> > 
> >  (1/6344472-1/5066243)*10^9 = -39.77 ns
> > 
> > This system doesn't have DDIO, thus we are stalling on cache-misses,
> > but I was actually expecting that the added code could "hide" behind
> > these cache-misses.
> > 
> > I'm somewhat surprised to see this large a performance drop.
> >   
> 
> Yep, although there is room for optimizations in the code path for sure. And
> 5mpps is not horrible my preference is to get this series in plus any
> small optimization we come up with while the merge window is closed. Then
> follow up patches can do optimizations.

IMHO 5Mpps is a very bad number for XDP.

> One easy optimization is to get rid of the atomic bitops. They are not needed
> here we have a per cpu unsigned long. Another easy one would be to move
> some of the checks out of the hotpath. For example checking for ndo_xdp_xmit
> and flush ops on the net device in the hotpath really should be done in the
> slow path.

I'm already running with a similar patch as below, but it
(surprisingly) only gave my 3 ns improvement.  I also tried a
prefetchw() on xdp.data that gave me 10 ns (which is quite good).

I'm booting up another system with a CPU E5-1650 v4 @ 3.60GHz, which
have DDIO ... I have high hopes for this, as the major bottleneck on
this CPU i7-4790K CPU @ 4.00GHz is clearly cache-misses.

Something is definitely wrong on this CPU, as perf stats shows, a very
bad utilization of the CPU pipeline with 0.89 insn per cycle.


> diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
> index 1191060..899364d 100644
> --- a/kernel/bpf/devmap.c
> +++ b/kernel/bpf/devmap.c
> @@ -213,7 +213,7 @@ void __dev_map_insert_ctx(struct bpf_map *map, u32 key)
> struct bpf_dtab *dtab = container_of(map, struct bpf_dtab, map);
> unsigned long *bitmap = this_cpu_ptr(dtab->flush_needed);
>  
> -   set_bit(key, bitmap);
> +   __set_bit(key, bitmap);
>  }
>  
>  struct net_device  *__dev_map_lookup_elem(struct bpf_map *map, u32 key)
> @@ -253,7 +253,7 @@ void __dev_map_flush(struct bpf_map *map)
>  
> netdev = dev->dev;
>  
> -   clear_bit(bit, bitmap);
> +   __clear_bit(bit, bitmap);
> if (unlikely(!netdev || !netdev->netdev_ops->ndo_xdp_flush))
> continue;
>  
> @@ -287,7 +287,7 @@ static void dev_map_flush_old(struct bpf_dtab_netdev 
> *old_dev)
>  
> for_each_online_cpu(cpu) {
> bitmap = per_cpu_ptr(old_dev->dtab->flush_needed, 
> cpu);
> -   clear_bit(old_dev->key, bitmap);
> +   __clear_bit(old_dev->key, bitmap);
>  
> fl->netdev_ops->ndo_xdp_flush(old_dev->dev);
> }
> 



-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


Re: [PATCH] cisco: enic: Fic an error handling path in 'vnic_dev_init_devcmd2()'

2017-07-11 Thread David Miller
From: Christophe JAILLET 
Date: Sat,  8 Jul 2017 06:51:35 +0200

> if 'ioread32()' returns 0xFFF, we have to go through the error
> handling path as done everywhere else in this function.
> 
> Move the 'err_free_wq' label to better match its name and its location
> and add a new label 'err_disable_wq'.
> Update the code accordingly.
> 
> Fixes: 373fb0873d43 ("enic: add devcmd2")
> Signed-off-by: Christophe JAILLET 

Looks good, applied, thanks.


Re: [RFC PATCH 00/12] Implement XDP bpf_redirect vairants

2017-07-11 Thread John Fastabend
On 07/11/2017 08:36 AM, Jesper Dangaard Brouer wrote:
> On Sat, 8 Jul 2017 21:06:17 +0200
> Jesper Dangaard Brouer  wrote:
> 
>> My plan is to test this latest patchset again, Monday and Tuesday.
>> I'll try to assess stability and provide some performance numbers.
> 
> Performance numbers:
> 
>  14378479 pkt/s = XDP_DROP without touching memory
>   9222401 pkt/s = xdp1: XDP_DROP with reading packet data
>   6344472 pkt/s = xdp2: XDP_TX   with swap mac (writes into pkt)
>   4595574 pkt/s = xdp_redirect: XDP_REDIRECT with swap mac (simulate 
> XDP_TX)
>   5066243 pkt/s = xdp_redirect_map: XDP_REDIRECT with swap mac + devmap
> 
> The performance drop between xdp2 and xdp_redirect, was expected due
> to the HW-tailptr flush per packet, which is costly.
> 
>  (1/6344472-1/4595574)*10^9 = -59.98 ns
> 
> The performance drop between xdp2 and xdp_redirect_map, is higher than
> I expected, which is not good!  The avoidance of the tailptr flush per
> packet was expected to give a higher boost.  The cost increased with
> 40 ns, which is too high compared to the code added (on a 4GHz machine
> approx 160 cycles).
> 
>  (1/6344472-1/5066243)*10^9 = -39.77 ns
> 
> This system doesn't have DDIO, thus we are stalling on cache-misses,
> but I was actually expecting that the added code could "hide" behind
> these cache-misses.
> 
> I'm somewhat surprised to see this large a performance drop.
> 

Yep, although there is room for optimizations in the code path for sure. And
5mpps is not horrible my preference is to get this series in plus any
small optimization we come up with while the merge window is closed. Then
follow up patches can do optimizations.

One easy optimization is to get rid of the atomic bitops. They are not needed
here we have a per cpu unsigned long. Another easy one would be to move
some of the checks out of the hotpath. For example checking for ndo_xdp_xmit
and flush ops on the net device in the hotpath really should be done in the
slow path.

diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index 1191060..899364d 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -213,7 +213,7 @@ void __dev_map_insert_ctx(struct bpf_map *map, u32 key)
struct bpf_dtab *dtab = container_of(map, struct bpf_dtab, map);
unsigned long *bitmap = this_cpu_ptr(dtab->flush_needed);
 
-   set_bit(key, bitmap);
+   __set_bit(key, bitmap);
 }
 
 struct net_device  *__dev_map_lookup_elem(struct bpf_map *map, u32 key)
@@ -253,7 +253,7 @@ void __dev_map_flush(struct bpf_map *map)
 
netdev = dev->dev;
 
-   clear_bit(bit, bitmap);
+   __clear_bit(bit, bitmap);
if (unlikely(!netdev || !netdev->netdev_ops->ndo_xdp_flush))
continue;
 
@@ -287,7 +287,7 @@ static void dev_map_flush_old(struct bpf_dtab_netdev 
*old_dev)
 
for_each_online_cpu(cpu) {
bitmap = per_cpu_ptr(old_dev->dtab->flush_needed, cpu);
-   clear_bit(old_dev->key, bitmap);
+   __clear_bit(old_dev->key, bitmap);
 
fl->netdev_ops->ndo_xdp_flush(old_dev->dev);
}



Re: [PATCH v2 00/10] Constify attribute_group structures

2017-07-11 Thread David Miller
From: Arvind Yadav 
Date: Tue, 11 Jul 2017 12:52:41 +0530

> attribute_groups are not supposed to change at runtime. So mark the
> non-const structs as const.

Please resubmit this when net-next is open again:

http://vger.kernel.org/~davem/net-next.html

Thank you.


Re: [PATCH net 0/3] bnxt_en: Bug fixes.

2017-07-11 Thread David Miller
From: Michael Chan 
Date: Tue, 11 Jul 2017 13:05:33 -0400

> 3 bug fixes in this series.  Fix a crash in bnxt_get_stats64() that can
> happen if the device is closing and freeing the statistics block at the
> same time.  The 2nd one fixes ethtool -L failing when changing from
> combined to non-combined mode or vice versa.  The last one fixes SRIOV
> failure on big-endian systems because we were setting a bitmap wrong in
> a firmware message.

Series applied, thanks Michael.

We really should look into generically taking care of the ->ndo_close()
vs. statistics pulling synchronization so that every driver doesn't
need to have code like this.


Re: WARN_ON_ONCE(work > weight) in napi_poll()

2017-07-11 Thread Andrey Ryabinin
On 07/11/2017 12:24 AM, Ryan Hsu wrote:
> On 07/04/2017 08:59 AM, Andrey Ryabinin wrote:
> 
>> On 07/04/2017 04:49 PM, Kalle Valo wrote:
>>> Andrey Ryabinin  writes:
>>>
 I occasionally hit WARN_ON_ONCE(work > weight); in napi_poll() on a
 laptop with ath10k card.


 [37207.593370] [ cut here ]
 [37207.593380] WARNING: CPU: 0 PID: 7 at ../net/core/dev.c:5274
 net_rx_action+0x258/0x360
 [37207.593381] Modules linked in: snd_hda_codec_realtek snd_soc_skl
 snd_hda_codec_generic snd_soc_skl_ipc snd_soc_sst_ipc snd_soc_sst_dsp
 snd_hda_ext_core snd_soc_sst_match snd_soc_core rtsx_pci_sdmmc
 mmc_core snd_pcm_dmaengine x86_pkg_temp_thermal snd_hda_intel uvcvideo
 kvm_intel videobuf2_vmalloc kvm snd_hda_codec snd_hwdep btusb
 videobuf2_memops btintel snd_hda_core videobuf2_v4l2 bluetooth
 irqbypass snd_pcm videobuf2_core crc32c_intel videodev mei_me mei
 rtsx_pci intel_lpss_pci intel_lpss_acpi intel_vbtn intel_lpss mfd_core
 tpm_tis intel_hid tpm_tis_core tpm efivarfs
 [37207.593430] CPU: 0 PID: 7 Comm: ksoftirqd/0 Not tainted 4.11.7 #28
 [37207.593432] Hardware name: Dell Inc. XPS 13 9360/0839Y6, BIOS 1.3.5 
 05/08/2017
>>> What firmware and hardware versions exactly? The dmesg output when
>>> ath10k loads is preferred. As you are using XPS 13 I'm guessing it's one
>>> of QCA6174 family.
>>>
>> Yes it's qca6174.
>>
>> $ dmesg |grep ath10
>> [0.624828] ath10k_pci :3a:00.0: enabling device ( -> 0002)
>> [0.626370] ath10k_pci :3a:00.0: pci irq msi oper_irq_mode 2 irq_mode 
>> 0 reset_mode 0
>> [0.837862] ath10k_pci :3a:00.0: qca6174 hw3.2 target 0x0503 
>> chip_id 0x00340aff sub 1a56:1535
>> [0.837863] ath10k_pci :3a:00.0: kconfig debug 0 debugfs 1 tracing 0 
>> dfs 0 testmode 0
>> [0.838388] ath10k_pci :3a:00.0: firmware ver 
>> WLAN.RM.2.0-00180-QCARMSWPZ-1 api 4 features wowlan,ignore-otp,no-4addr-pad 
>> crc32 75dee6c5
>> [0.900606] ath10k_pci :3a:00.0: board_file api 2 bmi_id N/A crc32 
>> 19644295
>> [3.020681] ath10k_pci :3a:00.0: htt-ver 3.26 wmi-op 4 htt-op 3 cal 
>> otp max-sta 32 raw 0 hwcrypto 1
>> [9.574087] ath10k_pci :3a:00.0 wlp58s0: renamed from wlan0
>>
> 
> I can't reproduce of this past few days, not sure if this is due to the amsdu 
> packets received from AP, would you mind share what Ap you're using?
> And if there any specific steps you're doing?
> 

The bug is quite rare, and I don't know how to reproduce it on purpose. It just 
happens once in several days
and it happened with different APs. 

Perhaps you could think of some debug patch that could help to understand 
what's going on? I could carry it and will
report if warning hits again. 

FYI I applied small debug patch bellow, just to understand where exactly we 
bail out with too big quota.

---
 drivers/net/wireless/ath/ath10k/htt_rx.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/wireless/ath/ath10k/htt_rx.c 
b/drivers/net/wireless/ath/ath10k/htt_rx.c
index 02a3fc81fbe3..02b0cc6ec671 100644
--- a/drivers/net/wireless/ath/ath10k/htt_rx.c
+++ b/drivers/net/wireless/ath/ath10k/htt_rx.c
@@ -2548,6 +2548,8 @@ int ath10k_htt_txrx_compl_task(struct ath10k *ar, int 
budget)
goto exit;
}
}
+   if (quota > budget)
+   pr_err("#1 quota too big %d %d %d\n", quota, budget, 
num_rx_msdus);
 
while (quota < budget) {
/* no more data to receive */
@@ -2568,6 +2570,8 @@ int ath10k_htt_txrx_compl_task(struct ath10k *ar, int 
budget)
goto exit;
}
}
+   if (quota > budget)
+   pr_err("#2 quota too big %d %d %d\n", quota, budget, 
num_rx_msdus);
 
/* From NAPI documentation:
 *  The napi poll() function may also process TX completions, in which
--


It gave me this:

[118648.825347] #1 quota too big 72 64 16
[118648.825351] #2 quota too big 72 64 16
[118648.825471] [ cut here ]
[118648.825484] WARNING: CPU: 0 PID: 0 at ../net/core/dev.c:5274 
net_rx_action+0x258/0x360

So this means that we didn't met the condition bellow, i.e. skb_queue_empty() 
returned true.

ath10k_htt_txrx_compl_task():

if ((quota > ATH10K_NAPI_QUOTA_LIMIT) &&
!skb_queue_empty(>rx_in_ord_compl_q)) {
resched_napi = true;
goto exit;
}

> Also WLAN.RM.2.0-00180-QCARMSWPZ-1 firmware is a bit old, could you also 
> update firmware to give it a try?
> https://github.com/kvalo/ath10k-firmware/tree/master/QCA6174/hw3.0/4.4
> 

Will try.


[PATCH net 1/3] bnxt_en: Fix race conditions in .ndo_get_stats64().

2017-07-11 Thread Michael Chan
.ndo_get_stats64() may not be protected by RTNL and can race with
.ndo_stop() or other ethtool operations that can free the statistics
memory.  Fix it by setting a new flag BNXT_STATE_READ_STATS and then
proceeding to read statistics memory only if the state is OPEN.  The
close path that frees the memory clears the OPEN state and then waits
for the BNXT_STATE_READ_STATS to clear before proceeding to free the
statistics memory.

Fixes: c0c050c58d84 ("bnxt_en: New Broadcom ethernet driver.")
Signed-off-by: Michael Chan 
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 18 --
 drivers/net/ethernet/broadcom/bnxt/bnxt.h |  1 +
 2 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c 
b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index a19f68f..415694d 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -6279,6 +6279,12 @@ static int bnxt_open(struct net_device *dev)
return __bnxt_open_nic(bp, true, true);
 }
 
+static bool bnxt_drv_busy(struct bnxt *bp)
+{
+   return (test_bit(BNXT_STATE_IN_SP_TASK, >state) ||
+   test_bit(BNXT_STATE_READ_STATS, >state));
+}
+
 int bnxt_close_nic(struct bnxt *bp, bool irq_re_init, bool link_re_init)
 {
int rc = 0;
@@ -6297,7 +6303,7 @@ int bnxt_close_nic(struct bnxt *bp, bool irq_re_init, 
bool link_re_init)
 
clear_bit(BNXT_STATE_OPEN, >state);
smp_mb__after_atomic();
-   while (test_bit(BNXT_STATE_IN_SP_TASK, >state))
+   while (bnxt_drv_busy(bp))
msleep(20);
 
/* Flush rings and and disable interrupts */
@@ -6358,8 +6364,15 @@ static int bnxt_ioctl(struct net_device *dev, struct 
ifreq *ifr, int cmd)
u32 i;
struct bnxt *bp = netdev_priv(dev);
 
-   if (!bp->bnapi)
+   set_bit(BNXT_STATE_READ_STATS, >state);
+   /* Make sure bnxt_close_nic() sees that we are reading stats before
+* we check the BNXT_STATE_OPEN flag.
+*/
+   smp_mb__after_atomic();
+   if (!test_bit(BNXT_STATE_OPEN, >state)) {
+   clear_bit(BNXT_STATE_READ_STATS, >state);
return;
+   }
 
/* TODO check if we need to synchronize with bnxt_close path */
for (i = 0; i < bp->cp_nr_rings; i++) {
@@ -6406,6 +6419,7 @@ static int bnxt_ioctl(struct net_device *dev, struct 
ifreq *ifr, int cmd)
stats->tx_fifo_errors = le64_to_cpu(tx->tx_fifo_underruns);
stats->tx_errors = le64_to_cpu(tx->tx_err);
}
+   clear_bit(BNXT_STATE_READ_STATS, >state);
 }
 
 static bool bnxt_mc_list_updated(struct bnxt *bp, u32 *rx_mask)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h 
b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index f872a7d..3c9d484 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -1117,6 +1117,7 @@ struct bnxt {
unsigned long   state;
 #define BNXT_STATE_OPEN0
 #define BNXT_STATE_IN_SP_TASK  1
+#define BNXT_STATE_READ_STATS  2
 
struct bnxt_irq *irq_tbl;
int total_irqs;
-- 
1.8.3.1



[PATCH net 2/3] bnxt_en: Fix bug in ethtool -L.

2017-07-11 Thread Michael Chan
When changing channels from combined to rx/tx or vice versa, the code
uses the wrong "sh" parameter to determine if we are reserving rings
for shared or non-shared mode.  It should be using the ethtool requested
"sh" parameter instead of the current "sh" parameter.

Fix it by passing the "sh" parameter to bnxt_reserve_rings().  For
ethtool, we will pass in the requested "sh" parameter.

Fixes: 391be5c27364 ("bnxt_en: Implement new scheme to reserve tx rings.")
Signed-off-by: Michael Chan 
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 9 +++--
 drivers/net/ethernet/broadcom/bnxt/bnxt.h | 3 ++-
 drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 3 ++-
 drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c | 2 +-
 4 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c 
b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 415694d..d9830d0 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -6918,16 +6918,13 @@ static void bnxt_sp_task(struct work_struct *work)
 }
 
 /* Under rtnl_lock */
-int bnxt_reserve_rings(struct bnxt *bp, int tx, int rx, int tcs, int tx_xdp)
+int bnxt_reserve_rings(struct bnxt *bp, int tx, int rx, bool sh, int tcs,
+  int tx_xdp)
 {
int max_rx, max_tx, tx_sets = 1;
int tx_rings_needed;
-   bool sh = true;
int rc;
 
-   if (!(bp->flags & BNXT_FLAG_SHARED_RINGS))
-   sh = false;
-
if (tcs)
tx_sets = tcs;
 
@@ -7135,7 +7132,7 @@ int bnxt_setup_mq_tc(struct net_device *dev, u8 tc)
sh = true;
 
rc = bnxt_reserve_rings(bp, bp->tx_nr_rings_per_tc, bp->rx_nr_rings,
-   tc, bp->tx_nr_rings_xdp);
+   sh, tc, bp->tx_nr_rings_xdp);
if (rc)
return rc;
 
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h 
b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index 3c9d484..f34691f 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -1301,7 +1301,8 @@ int bnxt_hwrm_func_rgtr_async_events(struct bnxt *bp, 
unsigned long *bmap,
 int bnxt_half_open_nic(struct bnxt *bp);
 void bnxt_half_close_nic(struct bnxt *bp);
 int bnxt_close_nic(struct bnxt *, bool, bool);
-int bnxt_reserve_rings(struct bnxt *bp, int tx, int rx, int tcs, int tx_xdp);
+int bnxt_reserve_rings(struct bnxt *bp, int tx, int rx, bool sh, int tcs,
+  int tx_xdp);
 int bnxt_setup_mq_tc(struct net_device *dev, u8 tc);
 int bnxt_get_max_rings(struct bnxt *, int *, int *, bool);
 void bnxt_restore_pf_fw_resources(struct bnxt *bp);
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c 
b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
index fd11815..be6acad 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
@@ -432,7 +432,8 @@ static int bnxt_set_channels(struct net_device *dev,
}
tx_xdp = req_rx_rings;
}
-   rc = bnxt_reserve_rings(bp, req_tx_rings, req_rx_rings, tcs, tx_xdp);
+   rc = bnxt_reserve_rings(bp, req_tx_rings, req_rx_rings, sh, tcs,
+   tx_xdp);
if (rc) {
netdev_warn(dev, "Unable to allocate the requested rings\n");
return rc;
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c 
b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
index 7d67552..3961a68 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
@@ -170,7 +170,7 @@ static int bnxt_xdp_set(struct bnxt *bp, struct bpf_prog 
*prog)
if (!tc)
tc = 1;
rc = bnxt_reserve_rings(bp, bp->tx_nr_rings_per_tc, bp->rx_nr_rings,
-   tc, tx_xdp);
+   true, tc, tx_xdp);
if (rc) {
netdev_warn(dev, "Unable to reserve enough TX rings to support 
XDP.\n");
return rc;
-- 
1.8.3.1



[PATCH net 3/3] bnxt_en: Fix SRIOV on big-endian architecture.

2017-07-11 Thread Michael Chan
The PF driver sets up a list of firmware commands from the VF driver that
needs to be forwarded to the PF for approval.  This list is a 256-bit
bitmap.  The code that sets up the bitmap falls apart on big-endian
architecture.  __set_bit() does not work because it operates on long types
whereas the firmware interface is defined in u32 types, causing bits in
the wrong 32-bit word to be set.

Fix it by setting the proper bits on an array of u32.

Fixes: de68f5de5651 ("bnxt_en: Fix bitmap declaration to work on 32-bit 
arches.")
Reported-by: Shannon Nelson 
Signed-off-by: Michael Chan 
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c 
b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index d9830d0..e7c8539 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -3458,13 +3458,18 @@ static int bnxt_hwrm_func_drv_rgtr(struct bnxt *bp)
req.ver_upd = DRV_VER_UPD;
 
if (BNXT_PF(bp)) {
-   DECLARE_BITMAP(vf_req_snif_bmap, 256);
-   u32 *data = (u32 *)vf_req_snif_bmap;
+   u32 data[8];
int i;
 
-   memset(vf_req_snif_bmap, 0, sizeof(vf_req_snif_bmap));
-   for (i = 0; i < ARRAY_SIZE(bnxt_vf_req_snif); i++)
-   __set_bit(bnxt_vf_req_snif[i], vf_req_snif_bmap);
+   memset(data, 0, sizeof(data));
+   for (i = 0; i < ARRAY_SIZE(bnxt_vf_req_snif); i++) {
+   u16 cmd = bnxt_vf_req_snif[i];
+   unsigned int bit, idx;
+
+   idx = cmd / 32;
+   bit = cmd % 32;
+   data[idx] |= 1 << bit;
+   }
 
for (i = 0; i < 8; i++)
req.vf_req_fwd[i] = cpu_to_le32(data[i]);
-- 
1.8.3.1



[PATCH net 0/3] bnxt_en: Bug fixes.

2017-07-11 Thread Michael Chan
3 bug fixes in this series.  Fix a crash in bnxt_get_stats64() that can
happen if the device is closing and freeing the statistics block at the
same time.  The 2nd one fixes ethtool -L failing when changing from
combined to non-combined mode or vice versa.  The last one fixes SRIOV
failure on big-endian systems because we were setting a bitmap wrong in
a firmware message.

Michael Chan (3):
  bnxt_en: Fix race conditions in .ndo_get_stats64().
  bnxt_en: Fix bug in ethtool -L.
  bnxt_en: Fix SRIOV on big-endian architecture.

 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 42 ---
 drivers/net/ethernet/broadcom/bnxt/bnxt.h |  4 ++-
 drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c |  3 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c |  2 +-
 4 files changed, 35 insertions(+), 16 deletions(-)

-- 
1.8.3.1



Re: net-next STATUS page

2017-07-11 Thread Cong Wang
On Tue, Jul 11, 2017 at 7:24 AM, David Miller  wrote:
>
> It has gotten to the point that even casually walking around
> Faro, Portugal last week, random German tourists would stop
> me in the street and ask if net-next was open or not.
>
> Therefore, in order to avoid any and all confusion I have created this
> web site:
>
> http://vger.kernel.org/~davem/net-next.html
>
> Please refer to it if you are ever in doubt.

Can we update netdev-FAQ too? ;)


Re: [RFC PATCH 00/12] Implement XDP bpf_redirect vairants

2017-07-11 Thread Jesper Dangaard Brouer
On Sat, 8 Jul 2017 21:06:17 +0200
Jesper Dangaard Brouer  wrote:

> My plan is to test this latest patchset again, Monday and Tuesday.
> I'll try to assess stability and provide some performance numbers.

Performance numbers:

 14378479 pkt/s = XDP_DROP without touching memory
  9222401 pkt/s = xdp1: XDP_DROP with reading packet data
  6344472 pkt/s = xdp2: XDP_TX   with swap mac (writes into pkt)
  4595574 pkt/s = xdp_redirect: XDP_REDIRECT with swap mac (simulate XDP_TX)
  5066243 pkt/s = xdp_redirect_map: XDP_REDIRECT with swap mac + devmap

The performance drop between xdp2 and xdp_redirect, was expected due
to the HW-tailptr flush per packet, which is costly.

 (1/6344472-1/4595574)*10^9 = -59.98 ns

The performance drop between xdp2 and xdp_redirect_map, is higher than
I expected, which is not good!  The avoidance of the tailptr flush per
packet was expected to give a higher boost.  The cost increased with
40 ns, which is too high compared to the code added (on a 4GHz machine
approx 160 cycles).

 (1/6344472-1/5066243)*10^9 = -39.77 ns

This system doesn't have DDIO, thus we are stalling on cache-misses,
but I was actually expecting that the added code could "hide" behind
these cache-misses.

I'm somewhat surprised to see this large a performance drop.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

Results::

 # XDP_DROP with reading packet data
 [jbrouer@canyon bpf]$ sudo ./xdp1 3
 proto 17:6449727 pkt/s
 proto 17:9222639 pkt/s
 proto 17:9222401 pkt/s
 proto 17:9223083 pkt/s
 proto 17:9223515 pkt/s
 proto 17:9222477 pkt/s
 ^C

 # XDP_TX with swap mac
 [jbrouer@canyon bpf]$ sudo ./xdp2 3
 proto 17: 934682 pkt/s
 proto 17:6344845 pkt/s
 proto 17:6344472 pkt/s
 proto 17:6345265 pkt/s
 proto 17:6345238 pkt/s
 proto 17:6345338 pkt/s
 ^C

 # XDP_REDIRECT with swap mac (simulate XDP_TX via same ifindex)
 [jbrouer@canyon bpf]$ sudo ./xdp_redirect 3 3
 ifindex 3: 749567 pkt/s
 ifindex 3:4595025 pkt/s
 ifindex 3:4595574 pkt/s
 ifindex 3:4595429 pkt/s
 ifindex 3:4595340 pkt/s
 ifindex 3:4595352 pkt/s
 ifindex 3:4595364 pkt/s
 ^C

  # XDP_REDIRECT with swap mac + devmap (still simulate XDP_TX)
 [jbrouer@canyon bpf]$ sudo ./xdp_redirect_map 3 3
 map[0] (vports) = 4, map[1] (map) = 5, map[2] (count) = 0
 ifindex 3:3076506 pkt/s
 ifindex 3:5066282 pkt/s
 ifindex 3:5066243 pkt/s
 ifindex 3:5067376 pkt/s
 ifindex 3:5067226 pkt/s
 ifindex 3:5067622 pkt/s

My own tools::

 [jbrouer@canyon prototype-kernel]$
   sudo ./xdp_bench01_mem_access_cost --dev ixgbe1 --sec 2 \
--action XDP_DROP
 XDP_action   ppspps-human-readable mem  
 XDP_DROP 0  0  no_touch 
 XDP_DROP 98944019,894,401  no_touch 
 XDP_DROP 14377459   14,377,459 no_touch 
 XDP_DROP 14378228   14,378,228 no_touch 
 XDP_DROP 14378400   14,378,400 no_touch 
 XDP_DROP 14378319   14,378,319 no_touch 
 XDP_DROP 14378479   14,378,479 no_touch 
 XDP_DROP 14377332   14,377,332 no_touch 
 XDP_DROP 14378411   14,378,411 no_touch 
 XDP_DROP 14378095   14,378,095 no_touch 
 ^CInterrupted: Removing XDP program on ifindex:3 device:ixgbe1

 [jbrouer@canyon prototype-kernel]$
  sudo ./xdp_bench01_mem_access_cost --dev ixgbe1 --sec 2 \
   --action XDP_DROP --read
 XDP_action   ppspps-human-readable mem  
 XDP_DROP 0  0  read 
 XDP_DROP 69941146,994,114  read 
 XDP_DROP 89794148,979,414  read 
 XDP_DROP 89796368,979,636  read 
 XDP_DROP 89800878,980,087  read 
 XDP_DROP 89790978,979,097  read 
 XDP_DROP 89789708,978,970  read 
 ^CInterrupted: Removing XDP program on ifindex:3 device:ixgbe1

 [jbrouer@canyon prototype-kernel]$
  sudo ./xdp_bench01_mem_access_cost --dev ixgbe1 --sec 2 \
  --action XDP_TX --swap --read
 XDP_action   ppspps-human-readable mem  
 XDP_TX   0  0  swap_mac 
 XDP_TX   21415562,141,556  swap_mac 
 XDP_TX   61719846,171,984  swap_mac 
 XDP_TX   61719556,171,955  swap_mac 
 XDP_TX   61717676,171,767  swap_mac 
 XDP_TX   61716806,171,680  swap_mac 
 XDP_TX   61722016,172,201  swap_mac 
 ^CInterrupted: Removing XDP program on ifindex:3 device:ixgbe1


Setting tuned-adm network-latency ::

 $ sudo tuned-adm list
 [...]
 Current active profile: network-latency



[PATCH V3] Set NTB format again after altsetting switch for Huawei devices

2017-07-11 Thread Enrico Mioso
Some firmwares in Huawei E3372H devices have been observed to switch back
to NTB 32-bit format after altsetting switch.
This patch implements a driver flag to check for the device settings and
set NTB format to 16-bit again if needed.
The flag has been activated for devices controlled by the huawei_cdc_ncm.c
driver.

V1->V2:
- fixed broken error checks
- some corrections to the commit message
V2->V3:
- variable name changes, to clarify what's happening
- check (and possibly set) the NTB format later in the common bind code path

Signed-off-by: Enrico Mioso 
Reported-and-tested-by: Christian Panton 
Reviewed-by: Bjørn Mork 
CC: Bjørn Mork 
CC: Christian Panton 
CC: linux-...@vger.kernel.org
CC: netdev@vger.kernel.org
CC: Oliver Neukum 
---
 drivers/net/usb/cdc_ncm.c| 28 
 drivers/net/usb/huawei_cdc_ncm.c |  6 ++
 include/linux/usb/cdc_ncm.h  |  1 +
 3 files changed, 35 insertions(+)

diff --git a/drivers/net/usb/cdc_ncm.c b/drivers/net/usb/cdc_ncm.c
index d103a1d4fb36..8f572b9f3625 100644
--- a/drivers/net/usb/cdc_ncm.c
+++ b/drivers/net/usb/cdc_ncm.c
@@ -768,8 +768,10 @@ int cdc_ncm_bind_common(struct usbnet *dev, struct 
usb_interface *intf, u8 data_
u8 *buf;
int len;
int temp;
+   int err;
u8 iface_no;
struct usb_cdc_parsed_header hdr;
+   u16 curr_ntb_format;
 
ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
if (!ctx)
@@ -874,6 +876,32 @@ int cdc_ncm_bind_common(struct usbnet *dev, struct 
usb_interface *intf, u8 data_
goto error2;
}
 
+   /*
+* Some Huawei devices have been observed to come out of reset in NDP32 
mode.
+* Let's check if this is the case, and set the device to NDP16 mode 
again if
+* needed.
+   */
+   if (ctx->drvflags & CDC_NCM_FLAG_RESET_NTB16) {
+   err = usbnet_read_cmd(dev, USB_CDC_GET_NTB_FORMAT,
+ USB_TYPE_CLASS | USB_DIR_IN | 
USB_RECIP_INTERFACE,
+ 0, iface_no, _ntb_format, 2);
+   if (err < 0) {
+   goto error2;
+   }
+
+   if (curr_ntb_format == USB_CDC_NCM_NTB32_FORMAT) {
+   dev_info(>dev, "resetting NTB format to 16-bit");
+   err = usbnet_write_cmd(dev, USB_CDC_SET_NTB_FORMAT,
+  USB_TYPE_CLASS | USB_DIR_OUT
+  | USB_RECIP_INTERFACE,
+  USB_CDC_NCM_NTB16_FORMAT,
+  iface_no, NULL, 0);
+
+   if (err < 0)
+   goto error2;
+   }
+   }
+
cdc_ncm_find_endpoints(dev, ctx->data);
cdc_ncm_find_endpoints(dev, ctx->control);
if (!dev->in || !dev->out || !dev->status) {
diff --git a/drivers/net/usb/huawei_cdc_ncm.c b/drivers/net/usb/huawei_cdc_ncm.c
index 2680a65cd5e4..63f28908afda 100644
--- a/drivers/net/usb/huawei_cdc_ncm.c
+++ b/drivers/net/usb/huawei_cdc_ncm.c
@@ -80,6 +80,12 @@ static int huawei_cdc_ncm_bind(struct usbnet *usbnet_dev,
 * be at the end of the frame.
 */
drvflags |= CDC_NCM_FLAG_NDP_TO_END;
+
+   /* Additionally, it has been reported that some Huawei E3372H devices, 
with
+* firmware version 21.318.01.00.541, come out of reset in NTB32 format 
mode, hence
+* needing to be set to the NTB16 one again.
+*/
+   drvflags |= CDC_NCM_FLAG_RESET_NTB16;
ret = cdc_ncm_bind_common(usbnet_dev, intf, 1, drvflags);
if (ret)
goto err;
diff --git a/include/linux/usb/cdc_ncm.h b/include/linux/usb/cdc_ncm.h
index 021f7a88f52c..1a59699cf82a 100644
--- a/include/linux/usb/cdc_ncm.h
+++ b/include/linux/usb/cdc_ncm.h
@@ -83,6 +83,7 @@
 /* Driver flags */
 #define CDC_NCM_FLAG_NDP_TO_END0x02/* NDP is 
placed at end of frame */
 #define CDC_MBIM_FLAG_AVOID_ALTSETTING_TOGGLE  0x04/* Avoid altsetting 
toggle during init */
+#define CDC_NCM_FLAG_RESET_NTB16 0x08  /* set NDP16 one more time after 
altsetting switch */
 
 #define cdc_ncm_comm_intf_is_mbim(x)  ((x)->desc.bInterfaceSubClass == 
USB_CDC_SUBCLASS_MBIM && \
   (x)->desc.bInterfaceProtocol == 
USB_CDC_PROTO_NONE)
-- 
2.13.2



Re: net-next STATUS page

2017-07-11 Thread Phil Sutter
On Tue, Jul 11, 2017 at 05:05:06PM +0200, Jiri Pirko wrote:
> Tue, Jul 11, 2017 at 04:24:16PM CEST, da...@davemloft.net wrote:
> >
> >It has gotten to the point that even casually walking around
> >Faro, Portugal last week, random German tourists would stop
> >me in the street and ask if net-next was open or not.
> >
> >Therefore, in order to avoid any and all confusion I have created this
> >web site:
> >
> > http://vger.kernel.org/~davem/net-next.html
> 
> Hallelujah!
> 
> You remember I suggested this couple of years ago? :)

Never underestimate the power of German tourists. I even heard stories
about them causing bad dreams!

Cheers, Phil


Re: [PATCH net-next RFC 05/12] net: dsa: Add support for learning FDB through notification

2017-07-11 Thread Nikolay Aleksandrov
On 11/07/17 13:26, Arkadi Sharshevsky wrote:
> 
> 
> On 07/10/2017 11:59 PM, Vivien Didelot wrote:
>> Hi Arkadi,
>>
>> Arkadi Sharshevsky  writes:
>>
> + err = dsa_port_fdb_add(p->dp, fdb_info->addr, fdb_info->vid);
> + if (err) {
> + netdev_dbg(dev, "fdb add failed err=%d\n", err);
> + break;
> + }
> + call_switchdev_notifiers(SWITCHDEV_FDB_OFFLOADED, dev,
> +  _info->info);
> + break;
> +
> + case SWITCHDEV_FDB_DEL_TO_DEVICE:
> + fdb_info = _work->fdb_info;
> + err = dsa_port_fdb_del(p->dp, fdb_info->addr, fdb_info->vid);
> + if (err)
> + netdev_dbg(dev, "fdb del failed err=%d\n", err);

 OK I must have missed from the off-list discussion why we are not
 calling the switchdev notifier here?
>>>
>>> We do not agree on it actually, that is why it was moved to the list.
>>> I think that delete should succeed, you should retry until succession.
>>>
>>> The deletion is done under spinlock in the bridge so you cannot block,
>>> thus delete cannot fail due to hardware failure. Calling it here doesn't
>>> make sense because the bridge probably already deleted this FDB.
>>
>> So as we discussed, the problem here is that if dsa_port_fdb_del fails
>> for some probable reasons (MDIO timeout, weak GPIO lines, etc.), Linux
>> bridge will delete the entry in software, dumping bridge fdb will show
>> nothing, but the entry would still be programmed in hardware and the
>> network can thus be inconsistent, unsupposedly switching frames.
>>
>> IMHO the correct way for bridge to use the notification chain is to make
>> SWITCHDEV_FDB_DEL_TO_DEVICE symmetrical to SWITCHDEV_FDB_ADD_TO_DEVICE:
>> if an entry has been marked as offloaded, bridge must mark the entry as
>> to-be-deleted and do not delete the software entry until the driver
>> notifies back the successful deletion.
>>
>> If that is hardly feasible due to some bridge limitations, we must
>> explain this in a comment and use something more explosive than a simple
>> netdev_dbg to warn the user about the broken network setup...
>>
>>
>> Thanks,
>>
>> Vivien
>>
> 
> Hi Nikolay,
> 
> Vivien raised inconsistency issue with the current switchdev
> notification chain in case of FDB del. In case of static FDB delete,
> notification will be sent to the driver, followed by deletion of the
> software entry without waiting for the hardware delete. In case of
> hardware deletion failure the consecutive FDB dump will not show the
> deleted entry, yet, the entry will stay in hardware.
> 
> The deletion is done under lock thus the hardware deletion is deferred,
> and cannot fail due to hardware removal failure. Thus the above proposed
> solution by Vivien can lead to confusing situation:
> 
> 1. User deletes the entry
> 2. Deletion succeed
> 3. User dumps FDB and still sees this entry due to hardware failure,
>what should he do? retry to delete until the FDB dump will not show
>the entry?
> 
> Would like to hear you opinion about this solution.
> 
> IMHO in this case the driver should retry to delete, in case of
> several retries the driver should maybe:
> 1. Trap the traffic to CPU (dint think it possible in case of DSA).
> 2. Disable the port (its more explosive then netdev_dbg).
> 
> Thanks,
> Arkadi
> 
> 

Hi,
Looking at the code - it would seem that retrying is the only current option
with the way these switchdev notifications are handled. They cannot fail, 
meaning
from the bridge POV these ops must always succeed and errors are ignored, so the
driver should do everything possible to stay in sync, and in case all fails
then disabling the port seems like the best option to me, to show that 
something is
clearly wrong and avoid further issues, but DSA maintainers can comment more
on how to handle failure.

That being said:
This sounds a lot like the switchdev notifications vs callbacks discussions 
that we've
had in the past. Also what happened with the prepare+commit and all that ? If 
the hash_lock
is the main problem let's work towards improving that and making the fdb code 
handle
switchdev similar to the vlan code.

Cheers,
 Nik











Re: net-next STATUS page

2017-07-11 Thread Jiri Pirko
Tue, Jul 11, 2017 at 04:24:16PM CEST, da...@davemloft.net wrote:
>
>It has gotten to the point that even casually walking around
>Faro, Portugal last week, random German tourists would stop
>me in the street and ask if net-next was open or not.
>
>Therefore, in order to avoid any and all confusion I have created this
>web site:
>
>   http://vger.kernel.org/~davem/net-next.html

Hallelujah!

You remember I suggested this couple of years ago? :)


Re: [PATCH] brcmfmac: added LED triggers for transmit/receive

2017-07-11 Thread Russell Joyce
Thanks for your comments.

> What I think Rafał is saying is that it would be better to have this
> code in cfg80211 so other drivers including mac80211 could use it.


While I agree that moving all wireless LED triggers to cfg80211 would be an
ideal situation, it seems a bit out of scope for what I was trying to achieve.
This would probably also require removing the mac80211 LED triggers (and any
other similar triggers that might be created by specific wireless drivers not
using mac80211), in order to consolidate them in one place.

Besides this, I'm not sure where exactly in cfg80211 this functionality would
go (I assume it was originally put in mac80211 instead for a reason?),
although I'm certainly no expert in this area of the kernel.

> Indeed. However, the LED subsystem could/should(?) take care of mapping
> "rx" and "tx" triggers to the same LED.

In terms of the LED triggers, the only alternative I can see is to create a
single complex trigger that exposes "rx" and "tx" parameters that can be
individually enabled or disabled. This would reduce the number of triggers from
three to one, but also makes things slightly more awkward for the user, and
deviates from the convention set by mac80211.


Re: [PATCH 0/6] Configuring traffic classes via new hardware offload mechanism in tc/mqprio

2017-07-11 Thread David Miller
From: Amritha Nambiar 
Date: Tue, 11 Jul 2017 03:18:30 -0700

> The following series introduces a new hardware offload mode in
> tc/mqprio where the TCs, the queue configurations and
> bandwidth rate limits are offloaded to the hardware.

net-next is closed, please resubmit this when net-next opens
again:

http://vger.kernel.org/~davem/net-next.html

Thank you.


[PATCH net] cxgb4: ptp_clock_register() returns error pointers

2017-07-11 Thread Ganesh Goudar
Check ptp_clock_register() return not only for NULL but
also for error pointers.

Fixes: 9c33e4208bce ("cxgb4: Add PTP Hardware Clock (PHC) support")
Reported-by: Dan Carpenter 
Cc: Richard Cochran 
Signed-off-by: Ganesh Goudar 
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_ptp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_ptp.c 
b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_ptp.c
index 50517cf..06c0de4 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_ptp.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_ptp.c
@@ -441,7 +441,7 @@ void cxgb4_ptp_init(struct adapter *adapter)
 
adapter->ptp_clock = ptp_clock_register(>ptp_clock_info,
>pdev->dev);
-   if (!adapter->ptp_clock) {
+   if (IS_ERR_OR_NULL(adapter->ptp_clock)) {
dev_err(adapter->pdev_dev,
"PTP %s Clock registration has failed\n", __func__);
return;
-- 
2.1.0



net-next STATUS page

2017-07-11 Thread David Miller

It has gotten to the point that even casually walking around
Faro, Portugal last week, random German tourists would stop
me in the street and ask if net-next was open or not.

Therefore, in order to avoid any and all confusion I have created this
web site:

http://vger.kernel.org/~davem/net-next.html

Please refer to it if you are ever in doubt.

Thanks!


Re: [RFC PATCH 00/12] Implement XDP bpf_redirect vairants

2017-07-11 Thread Jesper Dangaard Brouer
On Mon, 10 Jul 2017 17:59:17 -0700
John Fastabend  wrote:

> On 07/10/2017 11:30 AM, Jesper Dangaard Brouer wrote:
> > On Sat, 8 Jul 2017 21:06:17 +0200
> > Jesper Dangaard Brouer  wrote:
> >   
> >> On Sat, 08 Jul 2017 10:46:18 +0100 (WEST)
> >> David Miller  wrote:
> >>  
> >>> From: John Fastabend 
> >>> Date: Fri, 07 Jul 2017 10:48:36 -0700
> >>> 
>  On 07/07/2017 10:34 AM, John Fastabend wrote:  
> > This series adds two new XDP helper routines bpf_redirect() and
> > bpf_redirect_map(). The first variant bpf_redirect() is meant
> > to be used the same way it is currently being used by the cls_bpf
> > classifier. An xdp packet will be redirected immediately when this
> > is called.  
> 
>  Also other than the typo in the title there ;) I'm going to CC
>  the driver maintainers working on XDP (makes for a long CC list but)
>  because we would want to try and get support in as many as possible in
>  the next merge window.
> 
>  For this rev I just implemented on ixgbe because I wrote the
>  original XDP support there. I'll volunteer to do virtio as well.  
> >>>
> >>> I went over this series a few times and it looks great to me.
> >>> You didn't even give me some coding style issues to pick on :-)
> >>
> >> We (Daniel, Andy and I) have been reviewing and improving on this
> >> patchset the last couple of weeks ;-).  We had some stability issues,
> >> which is why it wasn't published earlier. My plan is to test this
> >> latest patchset again, Monday and Tuesday. I'll try to assess stability
> >> and provide some performance numbers.  
> > 
> > 
> > Damn, I though it was stable, I have been running a lot of performance
> > tests, and then this just happened :-(  
> 
> Thanks, I'll take a look through the code and see if I can come up with
> why this might happen. I haven't hit it on my tests yet though.

I've figured out why this happens, and I have a fix, see patch below
with some comments with questions.

The problem is that we can leak map_to_flush in an error path, the fix:

diff --git a/net/core/filter.c b/net/core/filter.c
index 2ccd6ff09493..7f1f48668dcf 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2497,11 +2497,14 @@ int xdp_do_redirect_map(struct net_device *dev, struct 
xdp_buff *xdp,
ri->map = NULL;
 
trace_xdp_redirect(dev, fwd, xdp_prog, XDP_REDIRECT);
-
+   // Q: Should we also trace "goto out" (failed lookup)?
+   //like bpf_warn_invalid_xdp_redirect();
return __bpf_tx_xdp(fwd, map, xdp, index);
 out:
ri->ifindex = 0;
-   ri->map = NULL;
+   // XXX: here we could leak ri->map_to_flush, which could be
+   //  picked up later by xdp_do_flush_map()
+   xdp_do_flush_map(); /* Clears ri->map_to_flush + ri->map */
return -EINVAL;


While debugging this, I noticed that we can have packets in-flight,
while the XDP RX rings are being reconfigured.  I wonder if this is a
ixgbe driver XDP-bug?  I think it would be best to add some
RCU-barrier, after ixgbe_setup_tc().

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c 
b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index ed97aa81a850..4872fbb54ecd 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -9801,7 +9804,18 @@ static int ixgbe_xdp_setup(struct net_device *dev, 
struct bpf_prog *prog)
 
/* If transitioning XDP modes reconfigure rings */
if (!!prog != !!old_prog) {
-   int err = ixgbe_setup_tc(dev, netdev_get_num_tc(dev));
+   // XXX: Warn pkts can be in-flight in old_prog
+   //  while ixgbe_setup_tc() calls ixgbe_close(dev).
+   //
+   // Should we avoid these in-flight packets?
+   // Would it be enough to add an synchronize_rcu()
+   // or rcu_barrier()?
+   // or do we need an napi_synchronize() call here?
+   //
+   int err;
+   netdev_info(dev,
+   "Calling ixgbe_setup_tc() to reconfig XDP rings\n");
+   err = ixgbe_setup_tc(dev, netdev_get_num_tc(dev));
 
if (err) {
rcu_assign_pointer(adapter->xdp_prog, old_prog);

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


RE: [PATCH] dpaa_eth: use correct device for DMA mapping API

2017-07-11 Thread Madalin-cristian Bucur
> -Original Message-
> From: arndbergm...@gmail.com [mailto:arndbergm...@gmail.com] On Behalf Of
> Arnd Bergmann
> Subject: Re: [PATCH] dpaa_eth: use correct device for DMA mapping API
> 
> On Tue, Jul 11, 2017 at 10:50 AM, Madalin-cristian Bucur
>  wrote:
> 
> > Hi Arnd,
> >
> > Thanks for looking into this, I've tested your fix, it seems to need
> more work:
> >
> > [0.894968]  platform: DMA map failed
> > [0.898627]  platform: DMA map failed
> > [0.902288]  platform: DMA map failed
> > [0.905947]  platform: DMA map failed
> > [0.909606]  platform: DMA map failed
> > [0.913265]  platform: DMA map failed
> 
> I see: the assignment ended up after the first use, so ->dev was still
> NULL here.
> 
> This should fix the problem you saw here:
> 
> diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
> b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
> index f7b0b928cd53..988c0212ce7e 100644
> --- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
> +++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
> @@ -2650,6 +2650,8 @@ static int dpaa_eth_probe(struct platform_device
> *pdev)
>   for (i = 0; i < DPAA_BPS_NUM; i++) {
>   int err;
> 
> + /* DMA operations are done on the platform-provided device */
> + dpaa_bps[i]->dev = dev->parent;
>   dpaa_bps[i] = dpaa_bp_alloc(dev);

Your new change de-references dpaa_bps[i] before it is set, this won't work 
either.

>   if (IS_ERR(dpaa_bps[i]))
>   return PTR_ERR(dpaa_bps[i]);
> @@ -2657,8 +2659,6 @@ static int dpaa_eth_probe(struct platform_device
> *pdev)
>   dpaa_bps[i]->raw_size = bpool_buffer_raw_size(i, DPAA_BPS_NUM);
>   /* avoid runtime computations by keeping the usable size here */
>   dpaa_bps[i]->size = dpaa_bp_size(dpaa_bps[i]->raw_size);
> - /* DMA operations are done on the platform-provided device */
> - dpaa_bps[i]->dev = dev->parent;
> 
>   err = dpaa_bp_alloc_pool(dpaa_bps[i]);
>   if (err < 0) {
> 
> > @@ -2806,7 +2799,6 @@ static int dpaa_eth_probe(struct platform_device
> *pdev)
> > dpaa_bps_free(priv);
> >  bp_create_failed:
> >  fq_probe_failed:
> > -dev_mask_failed:
> >  mac_probe_failed:
> > dev_set_drvdata(dev, NULL);
> > free_netdev(net_dev);
> >
> > I'll try to address your concern about performing the DMA mapping on a
> different
> > device than the one set up by the platform code.
> 
> Thanks!
> 
>  Arnd


Re: [RFC PATCH 03/12] xdp: add bpf_redirect helper function

2017-07-11 Thread Andy Gospodarek
On Mon, Jul 10, 2017 at 1:23 PM, John Fastabend
 wrote:
> On 07/09/2017 06:37 AM, Saeed Mahameed wrote:
>>
>>
>> On 7/7/2017 8:35 PM, John Fastabend wrote:
>>> This adds support for a bpf_redirect helper function to the XDP
>>> infrastructure. For now this only supports redirecting to the egress
>>> path of a port.
>>>
>>> In order to support drivers handling a xdp_buff natively this patches
>>> uses a new ndo operation ndo_xdp_xmit() that takes pushes a xdp_buff
>>> to the specified device.
>>>
>>> If the program specifies either (a) an unknown device or (b) a device
>>> that does not support the operation a BPF warning is thrown and the
>>> XDP_ABORTED error code is returned.
>>>
>>> Signed-off-by: John Fastabend 
>>> Acked-by: Daniel Borkmann 
>>> ---
>
> [...]
>
>>>
>>> +static int __bpf_tx_xdp(struct net_device *dev, struct xdp_buff *xdp)
>>> +{
>>> +if (dev->netdev_ops->ndo_xdp_xmit) {
>>> +dev->netdev_ops->ndo_xdp_xmit(dev, xdp);
>>
>> Hi John,
>>
>> I have some concern here regarding synchronizing between the
>> redirecting device and the target device:
>>
>> if the target device's NAPI is also doing XDP_TX on the same XDP TX
>> ring which this NDO might be redirecting xdp packets into the same
>> ring, there would be a race accessing this ring resources (buffers
>> and descriptors). Maybe you addressed this issue in the device driver
>> implementation of this ndo or with some NAPI tricks/assumptions, I
>> guess we have the same issue for if you run the same program to
>> redirect traffic from multiple netdevices into one netdevice, how do
>> you synchronize accessing this TX ring ?
>
> The implementation uses a per cpu TX ring to resolve these races. And
> the pair of driver interface API calls, xdp_do_redirect() and 
> xdp_do_flush_map()
> must be completed in a single poll() handler.
>
> This comment was included in the header file to document this,
>
> /* The pair of xdp_do_redirect and xdp_do_flush_map MUST be called in the
>  * same cpu context. Further for best results no more than a single map
>  * for the do_redirect/do_flush pair should be used. This limitation is
>  * because we only track one map and force a flush when the map changes.
>  * This does not appear to be a real limitation for existing software.
>  */
>
> In general some documentation about implementing XDP would probably be
> useful to add in Documentation/networking but this IMO goes beyond just
> this patch series.
>
>>
>> Maybe we need some clear guidelines in this ndo documentation stating
>> how to implement this ndo and what are the assumptions on those XDP
>> TX redirect rings or from which context this ndo can run.
>>
>> can you please elaborate.
>
> I think the best implementation is to use a per cpu TX ring as I did in
> this series. If your device is limited by the number of queues for some
> reason some other scheme would need to be devised. Unfortunately, the only
> thing I've come up for this case (using only this series) would both impact
> performance and make the code complex.
>
> A nice solution might be to constrain networking "tasks" to only a subset
> of cores. For 64+ core systems this might be a good idea. It would allow
> avoiding locking using per_cpu logic but also avoid networking consuming
> slices of every core in the system. As core count goes up I think we will
> eventually need to address this.I believe Eric was thinking along these
> lines with his netconf talk iirc. Obviously this work is way outside the
> scope of this series though.

I agree that it is outside the scope of this series, but I think it is
important to consider the impact of the output queue selection in both
a heterogenous and homogenous driver setup and how tx could be
optimized or even considered to be more reliable and I think that was
part of Saeed's point.

I got base redirect support for bnxt_en working yesterday, but for it
and other drivers that do not necessarily create a ring/queue per core
like ixgbe there is probably a bit more to work in each driver to
properly track output tx rings/queues than what you have done with
ixgbe.

>
>
>> Thanks,
>> Saeed.
>>


Re: [PATCH/RFC] dma-mapping: Provide dummy set_dma_ops() for NO_DMA=y

2017-07-11 Thread Christoph Hellwig
On Mon, Jul 10, 2017 at 04:31:54PM +0100, Robin Murphy wrote:
> On 10/07/17 15:56, Christoph Hellwig wrote:
> > This looks reasonable to me, I'd be happy to pick it up.  Can you send
> > it as a series with the reverts?
> 
> The fact remains that the FSL driver is still doing the wrong thing
> though - set_dma_ops(dev1, get_dma_ops(dev2)) is just a hack which
> happens to work on platforms which don't keep other arch-internal DMA
> info as well. I did start writing a patch somewhere to fix this thing to
> actually do proper DMA configuration (originally in the context of not
> abusing arch_setup_dma_ops()), but Arnd's fix is probably simpler.
> 
> I don't think it makes an awful lot of sense for code without a DMA API
> dependency to be calling set_dma_ops() - AFAIU the only reason it's
> available to drivers at all is for particular RDMA cases which know that
> their "DMA" is actually done by CPU threads, and want to optimise for that.

Even that code should require the DMA API.  So yes, maybe we should
fix the code to propagate the settings by another means.


Re: [PATCH v2] dt-bindings: net: ravb : Add support for r8a7743 SoC

2017-07-11 Thread Sergei Shtylyov

On 07/11/2017 03:20 PM, Chris Paterson wrote:


Add a new compatible string for the RZ/G1M (R8A7743) SoC.

Signed-off-by: Biju Das 
---
v1->v2
* Changed the subject
* re-formatted the required properties

 .../devicetree/bindings/net/renesas,ravb.txt   | 29 +--

---

 1 file changed, 17 insertions(+), 12 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/renesas,ravb.txt
b/Documentation/devicetree/bindings/net/renesas,ravb.txt
index b519503..4717bc2 100644
--- a/Documentation/devicetree/bindings/net/renesas,ravb.txt
+++ b/Documentation/devicetree/bindings/net/renesas,ravb.txt
@@ -4,19 +4,24 @@ This file provides information on what the device
node for the Ethernet AVB  interface contains.

 Required properties:
-- compatible: "renesas,etheravb-r8a7790" if the device is a part of R8A7790

SoC.

- "renesas,etheravb-r8a7791" if the device is a part of R8A7791 SoC.
- "renesas,etheravb-r8a7792" if the device is a part of R8A7792 SoC.
- "renesas,etheravb-r8a7793" if the device is a part of R8A7793 SoC.
- "renesas,etheravb-r8a7794" if the device is a part of R8A7794 SoC.
- "renesas,etheravb-r8a7795" if the device is a part of R8A7795 SoC.
- "renesas,etheravb-r8a7796" if the device is a part of R8A7796 SoC.
- "renesas,etheravb-rcar-gen2" for generic R-Car Gen 2 compatible

interface.

- "renesas,etheravb-rcar-gen3" for generic R-Car Gen 3 compatible

interface.

+- compatible: Must contain one or more of the following:


No, it can't contain more than one SoC specific value.


Surely it can contain both "renesas,etheravb-r8a7743" and 
"renesas,etheravb-rcar-gen2" for example?

As indeed dictated by the instruction further down:


   Yes. Sorry, I should have been more attentive.


+   When compatible with the generic version, nodes must list the
+   SoC-specific version corresponding to the platform first followed by
+   the generic version.



Kind regards, Chris


MBR, Sergei



Re: [PATCH v2] dt-bindings: net: ravb : Add support for r8a7743 SoC

2017-07-11 Thread Sergei Shtylyov

On 07/11/2017 03:21 PM, Simon Horman wrote:


Add a new compatible string for the RZ/G1M (R8A7743) SoC.

Signed-off-by: Biju Das 
---
v1->v2
* Changed the subject
* re-formatted the required properties

.../devicetree/bindings/net/renesas,ravb.txt   | 29 +-
1 file changed, 17 insertions(+), 12 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/renesas,ravb.txt 
b/Documentation/devicetree/bindings/net/renesas,ravb.txt
index b519503..4717bc2 100644
--- a/Documentation/devicetree/bindings/net/renesas,ravb.txt
+++ b/Documentation/devicetree/bindings/net/renesas,ravb.txt
@@ -4,19 +4,24 @@ This file provides information on what the device node for 
the Ethernet AVB
interface contains.

Required properties:
-- compatible: "renesas,etheravb-r8a7790" if the device is a part of R8A7790 
SoC.
- "renesas,etheravb-r8a7791" if the device is a part of R8A7791 SoC.
- "renesas,etheravb-r8a7792" if the device is a part of R8A7792 SoC.
- "renesas,etheravb-r8a7793" if the device is a part of R8A7793 SoC.
- "renesas,etheravb-r8a7794" if the device is a part of R8A7794 SoC.
- "renesas,etheravb-r8a7795" if the device is a part of R8A7795 SoC.
- "renesas,etheravb-r8a7796" if the device is a part of R8A7796 SoC.
- "renesas,etheravb-rcar-gen2" for generic R-Car Gen 2 compatible 
interface.
- "renesas,etheravb-rcar-gen3" for generic R-Car Gen 3 compatible 
interface.
+- compatible: Must contain one or more of the following:


   No, it can't contain more than one SoC specific value.


The text above does not say it can unless one is under the delusion
that, f.e. an r8a7790 SoC is also an r8a7791 SoC.


   Hm, you're right. Sorry about my misunderstanding...

   The only problem I'm still seeing with the current wording is that gen2/3 
fallback
values are declared optional, while you can't get the driver to bind without 
them on

the newer SoCs. But this is probably not a new problem...


If you would prefer an alternate wording please provide a suggestion.


   Perhaps we should emphasize that the gen2/3 fallback values are mandatory?


+  - "renesas,etheravb-r8a7743" for the R8A7743 SoC.
+  - "renesas,etheravb-r8a7790" for the R8A7790 SoC.
+  - "renesas,etheravb-r8a7791" for the R8A7791 SoC.
+  - "renesas,etheravb-r8a7792" for the R8A7792 SoC.
+  - "renesas,etheravb-r8a7793" for the R8A7793 SoC.
+  - "renesas,etheravb-r8a7794" for the R8A7794 SoC.
+  - "renesas,etheravb-rcar-gen2" as a fallback for the above
+   R-Car Gen2 and RZ/G1 devices.

[...]

WBR, Sergei



Re: [patch net-next RFC 0/4] net: sched: allow qdiscs to share filter block instances

2017-07-11 Thread Jiri Pirko
Tue, Jul 11, 2017 at 02:15:27PM CEST, j...@mojatatu.com wrote:
>Hi Jiri,
>
>Commenting on generalities - will comment on code later:
>
>On 17-07-10 02:51 PM, Jiri Pirko wrote:
>> From: Jiri Pirko 
>> 
>> Currently the filters added to qdiscs are independent. So for example if you
>> have 2 netdevices and you create ingress qdisc on both and you want to add
>> identical filter rules both, you need to add them twice. This patchset
>> makes this easier and mainly saves resources allowing to share all filters
>> within a qdisc - I call it a "filter block". Also this helps to save
>> resources when we do offload to hw for example to expensive TCAM.
>> 
>> So back to the example. First, we create 2 qdiscs. Both will share
>> block number 22. "22" is just an identification. If we don't pass any
>> block number, a new one will be generated by kernel:
>> 
>> $ tc qdisc add dev ens7 ingress block 22
>>  
>> $ tc qdisc add dev ens8 ingress block 22
>> 
>
>Above makes intuitive sense.
>
>
>  
>> 
>> Now if we list the qdiscs, we will see the block index in the output:
>> qdisc fq_codel 0: dev ens7 root refcnt 2 limit 10240p flows 1024 quantum 
>> 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
>>   Sent 9014 bytes 99 pkt (dropped 0, overlimits 0 requeues 0)
>>   backlog 0b 0p requeues 0
>>maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>>new_flows_len 0 old_flows_len 0
>> qdisc ingress : dev ens7 parent :fff1 block 22
>>
>>   Sent 4592 bytes 58 pkt (dropped 0, overlimits 0 requeues 0)
>>   backlog 0b 0p requeues 0
>> qdisc fq_codel 0: dev ens8 root refcnt 2 limit 10240p flows 1024 quantum 
>> 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
>>   Sent 17022 bytes 307 pkt (dropped 0, overlimits 0 requeues 0)
>>   backlog 0b 0p requeues 0
>>maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>>new_flows_len 0 old_flows_len 0
>> qdisc ingress : dev ens8 parent :fff1 block 22
>>
>>   Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>>   backlog 0b 0p requeues 0
>> 
>
>So does this.
>
>> 
>> Now we can add filter to any of qdiscs sharing the same block:
>> 
>> $ tc filter add dev ens7 parent : protocol ip pref 25 flower dst_ip 
>> 192.168.0.0/16 action drop
>> 
>
>So for backward compat - this also makes sense. But:
>it does make sense to create new syntax for adding
>filters and actions:
>
>tc filter add block 22 protocol ip pref 25 flower \
>  dst_ip 192.168.0.0/16 action drop

Was thinking about that. Decided to pass on this now. This should be
addressed by follow-up anyway.


>
>Coordinates of the filter block before were:
>
>, , [handle]
>
>You should be able to abuse struct tcmsg ifindex to represent block #
>as long as you set parent to be something meaningful that is
>identified "block coordinate" via TC_H_XXX (pick something safe not
>in use by ingress or egress; look at: uapi/linux/pkt_sched.h)

Not sure about this. I have take closer look. In general, I don't like
to abuse anything :)


>
>> 
>> We will see the same output if we list filters for ens7 and ens8, including 
>> stats:
>> 
>> $ tc -s filter show dev ens7 root
>> filter parent : protocol ip pref 25 flower
>> filter parent : protocol ip pref 25 flower handle 0x1
>>eth_type ipv4
>>dst_ip 192.168.1.0/24
>>  action order 1: gact action drop
>>   random type none pass val 0
>>   index 3 ref 1 bind 1 installed 10201 sec used 10150 sec
>>  Action statistics:
>>  Sent 4200 bytes 50 pkt (dropped 50, overlimits 0 requeues 0)
>>  backlog 0b 0p requeues 0
>> 
>> $ tc -s filter show dev ens8 root
>> filter dev ens7 parent : protocol ip pref 25 flower
>> filter dev ens7 parent : protocol ip pref 25 flower handle 0x1
>>eth_type ipv4
>>dst_ip 192.168.1.0/24
>>  action order 1: gact action drop
>>   random type none pass val 0
>>   index 3 ref 1 bind 1 installed 10202 sec used 10152 sec
>>  Action statistics:
>>  Sent 4200 bytes 50 pkt (dropped 50, overlimits 0 requeues 0)
>>  backlog 0b 0p requeues 0
>> 
>> 
>> Issues:
>> - tp->q is set by the device used to add the filter. That has to be resolved.
>>Impacts the dump (as you can see above)
>> 
>
>I think you have more problems if the dump above is reality;->
>You added to ingress and this is showing egress.

Howcome? I only don't see "dev x" on ens7. That is the only difference,


>
>To complete the thought, dump is:
>
> tc -s filter show block 22

Understood. Again, this should be addressed in follow-up.


>
>cheers,
>jamal
>


Re: [PATCH v2] dt-bindings: net: ravb : Add support for r8a7743 SoC

2017-07-11 Thread Simon Horman
On Tue, Jul 11, 2017 at 03:12:14PM +0300, Sergei Shtylyov wrote:
> Hello!
> 
> On 07/10/2017 06:32 PM, Biju Das wrote:
> 
> >Add a new compatible string for the RZ/G1M (R8A7743) SoC.
> >
> >Signed-off-by: Biju Das 
> >---
> >v1->v2
> >* Changed the subject
> >* re-formatted the required properties
> >
> > .../devicetree/bindings/net/renesas,ravb.txt   | 29 
> > +-
> > 1 file changed, 17 insertions(+), 12 deletions(-)
> >
> >diff --git a/Documentation/devicetree/bindings/net/renesas,ravb.txt 
> >b/Documentation/devicetree/bindings/net/renesas,ravb.txt
> >index b519503..4717bc2 100644
> >--- a/Documentation/devicetree/bindings/net/renesas,ravb.txt
> >+++ b/Documentation/devicetree/bindings/net/renesas,ravb.txt
> >@@ -4,19 +4,24 @@ This file provides information on what the device node for 
> >the Ethernet AVB
> > interface contains.
> >
> > Required properties:
> >-- compatible: "renesas,etheravb-r8a7790" if the device is a part of R8A7790 
> >SoC.
> >-  "renesas,etheravb-r8a7791" if the device is a part of R8A7791 SoC.
> >-  "renesas,etheravb-r8a7792" if the device is a part of R8A7792 SoC.
> >-  "renesas,etheravb-r8a7793" if the device is a part of R8A7793 SoC.
> >-  "renesas,etheravb-r8a7794" if the device is a part of R8A7794 SoC.
> >-  "renesas,etheravb-r8a7795" if the device is a part of R8A7795 SoC.
> >-  "renesas,etheravb-r8a7796" if the device is a part of R8A7796 SoC.
> >-  "renesas,etheravb-rcar-gen2" for generic R-Car Gen 2 compatible 
> >interface.
> >-  "renesas,etheravb-rcar-gen3" for generic R-Car Gen 3 compatible 
> >interface.
> >+- compatible: Must contain one or more of the following:
> 
>No, it can't contain more than one SoC specific value.

The text above does not say it can unless one is under the delusion
that, f.e. an r8a7790 SoC is also an r8a7791 SoC.

If you would prefer an alternate wording please provide a suggestion.

> >+  - "renesas,etheravb-r8a7743" for the R8A7743 SoC.
> >+  - "renesas,etheravb-r8a7790" for the R8A7790 SoC.
> >+  - "renesas,etheravb-r8a7791" for the R8A7791 SoC.
> >+  - "renesas,etheravb-r8a7792" for the R8A7792 SoC.
> >+  - "renesas,etheravb-r8a7793" for the R8A7793 SoC.
> >+  - "renesas,etheravb-r8a7794" for the R8A7794 SoC.
> >+  - "renesas,etheravb-rcar-gen2" as a fallback for the above
> >+R-Car Gen2 and RZ/G1 devices.
> >
> >-  When compatible with the generic version, nodes must list the
> >-  SoC-specific version corresponding to the platform first
> >-  followed by the generic version.
> >+  - "renesas,etheravb-r8a7795" for the R8A7795 SoC.
> >+  - "renesas,etheravb-r8a7796" for the R8A7796 SoC.
> 
>Neither here.
> 
> >+  - "renesas,etheravb-rcar-gen3" as a fallback for the above
> >+R-Car Gen3 devices.
> >+
> >+When compatible with the generic version, nodes must list the
> >+SoC-specific version corresponding to the platform first followed by
> >+the generic version.
> >
> > - reg: offset and length of (1) the register block and (2) the stream 
> > buffer.
> > - interrupts: A list of interrupt-specifiers, one for each entry in
> 
> WBR, Sergei
> 


RE: [PATCH v2] dt-bindings: net: ravb : Add support for r8a7743 SoC

2017-07-11 Thread Chris Paterson
Hello Sergei,

> From: Sergei Shtylyov [mailto:sergei.shtyl...@cogentembedded.com]
> Sent: 11 July 2017 13:12
> 
> Hello!
> 
> On 07/10/2017 06:32 PM, Biju Das wrote:
> 
> > Add a new compatible string for the RZ/G1M (R8A7743) SoC.
> >
> > Signed-off-by: Biju Das 
> > ---
> > v1->v2
> > * Changed the subject
> > * re-formatted the required properties
> >
> >  .../devicetree/bindings/net/renesas,ravb.txt   | 29 +--
> ---
> >  1 file changed, 17 insertions(+), 12 deletions(-)
> >
> > diff --git a/Documentation/devicetree/bindings/net/renesas,ravb.txt
> > b/Documentation/devicetree/bindings/net/renesas,ravb.txt
> > index b519503..4717bc2 100644
> > --- a/Documentation/devicetree/bindings/net/renesas,ravb.txt
> > +++ b/Documentation/devicetree/bindings/net/renesas,ravb.txt
> > @@ -4,19 +4,24 @@ This file provides information on what the device
> > node for the Ethernet AVB  interface contains.
> >
> >  Required properties:
> > -- compatible: "renesas,etheravb-r8a7790" if the device is a part of R8A7790
> SoC.
> > - "renesas,etheravb-r8a7791" if the device is a part of R8A7791 SoC.
> > - "renesas,etheravb-r8a7792" if the device is a part of R8A7792 SoC.
> > - "renesas,etheravb-r8a7793" if the device is a part of R8A7793 SoC.
> > - "renesas,etheravb-r8a7794" if the device is a part of R8A7794 SoC.
> > - "renesas,etheravb-r8a7795" if the device is a part of R8A7795 SoC.
> > - "renesas,etheravb-r8a7796" if the device is a part of R8A7796 SoC.
> > - "renesas,etheravb-rcar-gen2" for generic R-Car Gen 2 compatible
> interface.
> > - "renesas,etheravb-rcar-gen3" for generic R-Car Gen 3 compatible
> interface.
> > +- compatible: Must contain one or more of the following:
> 
> No, it can't contain more than one SoC specific value.

Surely it can contain both "renesas,etheravb-r8a7743" and 
"renesas,etheravb-rcar-gen2" for example?

As indeed dictated by the instruction further down:

> > +   When compatible with the generic version, nodes must list the
> > +   SoC-specific version corresponding to the platform first followed by
> > +   the generic version.

Kind regards, Chris


Re: [patch net-next RFC 0/4] net: sched: allow qdiscs to share filter block instances

2017-07-11 Thread Jamal Hadi Salim

Hi Jiri,

Commenting on generalities - will comment on code later:

On 17-07-10 02:51 PM, Jiri Pirko wrote:

From: Jiri Pirko 

Currently the filters added to qdiscs are independent. So for example if you
have 2 netdevices and you create ingress qdisc on both and you want to add
identical filter rules both, you need to add them twice. This patchset
makes this easier and mainly saves resources allowing to share all filters
within a qdisc - I call it a "filter block". Also this helps to save
resources when we do offload to hw for example to expensive TCAM.

So back to the example. First, we create 2 qdiscs. Both will share
block number 22. "22" is just an identification. If we don't pass any
block number, a new one will be generated by kernel:

$ tc qdisc add dev ens7 ingress block 22
 
$ tc qdisc add dev ens8 ingress block 22



Above makes intuitive sense.


  


Now if we list the qdiscs, we will see the block index in the output:
qdisc fq_codel 0: dev ens7 root refcnt 2 limit 10240p flows 1024 quantum 1514 
target 5.0ms interval 100.0ms memory_limit 32Mb ecn
  Sent 9014 bytes 99 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
   maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
   new_flows_len 0 old_flows_len 0
qdisc ingress : dev ens7 parent :fff1 block 22
   
  Sent 4592 bytes 58 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
qdisc fq_codel 0: dev ens8 root refcnt 2 limit 10240p flows 1024 quantum 1514 
target 5.0ms interval 100.0ms memory_limit 32Mb ecn
  Sent 17022 bytes 307 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0
   maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
   new_flows_len 0 old_flows_len 0
qdisc ingress : dev ens8 parent :fff1 block 22
   
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0



So does this.



Now we can add filter to any of qdiscs sharing the same block:

$ tc filter add dev ens7 parent : protocol ip pref 25 flower dst_ip 
192.168.0.0/16 action drop



So for backward compat - this also makes sense. But:
it does make sense to create new syntax for adding
filters and actions:

tc filter add block 22 protocol ip pref 25 flower \
  dst_ip 192.168.0.0/16 action drop

Coordinates of the filter block before were:

, , [handle]

You should be able to abuse struct tcmsg ifindex to represent block #
as long as you set parent to be something meaningful that is
identified "block coordinate" via TC_H_XXX (pick something safe not
in use by ingress or egress; look at: uapi/linux/pkt_sched.h)



We will see the same output if we list filters for ens7 and ens8, including 
stats:

$ tc -s filter show dev ens7 root
filter parent : protocol ip pref 25 flower
filter parent : protocol ip pref 25 flower handle 0x1
   eth_type ipv4
   dst_ip 192.168.1.0/24
 action order 1: gact action drop
  random type none pass val 0
  index 3 ref 1 bind 1 installed 10201 sec used 10150 sec
 Action statistics:
 Sent 4200 bytes 50 pkt (dropped 50, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0

$ tc -s filter show dev ens8 root
filter dev ens7 parent : protocol ip pref 25 flower
filter dev ens7 parent : protocol ip pref 25 flower handle 0x1
   eth_type ipv4
   dst_ip 192.168.1.0/24
 action order 1: gact action drop
  random type none pass val 0
  index 3 ref 1 bind 1 installed 10202 sec used 10152 sec
 Action statistics:
 Sent 4200 bytes 50 pkt (dropped 50, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0


Issues:
- tp->q is set by the device used to add the filter. That has to be resolved.
   Impacts the dump (as you can see above)



I think you have more problems if the dump above is reality;->
You added to ingress and this is showing egress.

To complete the thought, dump is:

 tc -s filter show block 22

cheers,
jamal



Re: [PATCH v2] dt-bindings: net: ravb : Add support for r8a7743 SoC

2017-07-11 Thread Sergei Shtylyov

Hello!

On 07/10/2017 06:32 PM, Biju Das wrote:


Add a new compatible string for the RZ/G1M (R8A7743) SoC.

Signed-off-by: Biju Das 
---
v1->v2
* Changed the subject
* re-formatted the required properties

 .../devicetree/bindings/net/renesas,ravb.txt   | 29 +-
 1 file changed, 17 insertions(+), 12 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/renesas,ravb.txt 
b/Documentation/devicetree/bindings/net/renesas,ravb.txt
index b519503..4717bc2 100644
--- a/Documentation/devicetree/bindings/net/renesas,ravb.txt
+++ b/Documentation/devicetree/bindings/net/renesas,ravb.txt
@@ -4,19 +4,24 @@ This file provides information on what the device node for 
the Ethernet AVB
 interface contains.

 Required properties:
-- compatible: "renesas,etheravb-r8a7790" if the device is a part of R8A7790 
SoC.
- "renesas,etheravb-r8a7791" if the device is a part of R8A7791 SoC.
- "renesas,etheravb-r8a7792" if the device is a part of R8A7792 SoC.
- "renesas,etheravb-r8a7793" if the device is a part of R8A7793 SoC.
- "renesas,etheravb-r8a7794" if the device is a part of R8A7794 SoC.
- "renesas,etheravb-r8a7795" if the device is a part of R8A7795 SoC.
- "renesas,etheravb-r8a7796" if the device is a part of R8A7796 SoC.
- "renesas,etheravb-rcar-gen2" for generic R-Car Gen 2 compatible 
interface.
- "renesas,etheravb-rcar-gen3" for generic R-Car Gen 3 compatible 
interface.
+- compatible: Must contain one or more of the following:


   No, it can't contain more than one SoC specific value.


+  - "renesas,etheravb-r8a7743" for the R8A7743 SoC.
+  - "renesas,etheravb-r8a7790" for the R8A7790 SoC.
+  - "renesas,etheravb-r8a7791" for the R8A7791 SoC.
+  - "renesas,etheravb-r8a7792" for the R8A7792 SoC.
+  - "renesas,etheravb-r8a7793" for the R8A7793 SoC.
+  - "renesas,etheravb-r8a7794" for the R8A7794 SoC.
+  - "renesas,etheravb-rcar-gen2" as a fallback for the above
+   R-Car Gen2 and RZ/G1 devices.

- When compatible with the generic version, nodes must list the
- SoC-specific version corresponding to the platform first
- followed by the generic version.
+  - "renesas,etheravb-r8a7795" for the R8A7795 SoC.
+  - "renesas,etheravb-r8a7796" for the R8A7796 SoC.


   Neither here.


+  - "renesas,etheravb-rcar-gen3" as a fallback for the above
+   R-Car Gen3 devices.
+
+   When compatible with the generic version, nodes must list the
+   SoC-specific version corresponding to the platform first followed by
+   the generic version.

 - reg: offset and length of (1) the register block and (2) the stream buffer.
 - interrupts: A list of interrupt-specifiers, one for each entry in


WBR, Sergei



[PATCH] rt2x00: make const array glrt_table static

2017-07-11 Thread Colin King
From: Colin Ian King 

Don't populate array glrt_table on the stack but make it static.
Makes the object code a smaller by over 670 bytes:

Before:
   textdata bss dec hex filename
 1317724733   0  136505   21539 rt2800lib.o

After:
   textdata bss dec hex filename
 1310434789   0  135832   21298 rt2800lib.o

Signed-off-by: Colin Ian King 
---
 drivers/net/wireless/ralink/rt2x00/rt2800lib.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/ralink/rt2x00/rt2800lib.c 
b/drivers/net/wireless/ralink/rt2x00/rt2800lib.c
index 6e2e760d98b1..0b75def39c6c 100644
--- a/drivers/net/wireless/ralink/rt2x00/rt2800lib.c
+++ b/drivers/net/wireless/ralink/rt2x00/rt2800lib.c
@@ -5704,7 +5704,7 @@ static void rt2800_init_freq_calibration(struct 
rt2x00_dev *rt2x00dev)
 
 static void rt2800_init_bbp_5592_glrt(struct rt2x00_dev *rt2x00dev)
 {
-   const u8 glrt_table[] = {
+   static const u8 glrt_table[] = {
0xE0, 0x1F, 0X38, 0x32, 0x08, 0x28, 0x19, 0x0A, 0xFF, 0x00, /* 
128 ~ 137 */
0x16, 0x10, 0x10, 0x0B, 0x36, 0x2C, 0x26, 0x24, 0x42, 0x36, /* 
138 ~ 147 */
0x30, 0x2D, 0x4C, 0x46, 0x3D, 0x40, 0x3E, 0x42, 0x3D, 0x40, /* 
148 ~ 157 */
-- 
2.11.0



Re: Regarding xfrm state search with destination address as wildcard mask

2017-07-11 Thread Balaji Foss
Hi

Any help on this query is greatly appreciated.

Thanks,
  - Balaji

On Thu, Jul 6, 2017 at 12:21 PM, Balaji Foss  wrote:
> Hi All,
>
> Im trying to implement IPSec for ospfv3 as per RFC4552 on Linux kernel
> version 3.16.39.
> Requirement is to support IPsec encryption/authentication for ospfv3 traffic.
> As of now, this can be achieved by following set of SA and SP rules.
>
> ip xfrm state add src :: dst ff02::5 proto ah spi 0x401 mode transport
> auth "hmac(sha1)" 0x12345678123456781234567812345678
> ip xfrm state add src :: dst ff02::6 proto ah spi 0x401 mode transport
> auth "hmac(sha1)" 0x12345678123456781234567812345678
> ip xfrm state add src  dst  proto ah spi 0x401 mode
> transport auth "hmac(sha1)" 0x12345678123456781234567812345678
> ip xfrm state add src  dst  proto ah spi 0x401 mode
> transport auth "hmac(sha1)" 0x12345678123456781234567812345678
>
> ip xfrm policy add dir out src  dst 0::0/0 dev e101-049-0 proto
> ospf priority 2147483648 tmpl  proto ah spi 0x401 mode transport level
> use
> ip xfrm policy add dir in src 0::0/0 dst 0::0/0 dev e101-049-0 proto
> ospf priority 2147483648 tmpl proto ah spi 0x401 mode transport level
> use
>
>
> One can notice that it needs four SA rules to achieve IPsec for single
> OSPF interface.
> Instead of these four rules, can we have a single rule with DIP as
> wild card mask and the xfrm state search as based on SPI ,family and
> proto alone?
>
> As of now, the API "__xfrm_state_lookup"  search based on
> SPI,family,proto and dest_addr.  Is there any way I can achieve the SA
> lookup without dest_addr and only with SPI,family and proto alone?
>
> Any help or pointers is greatly appreciated.
>
> Regards
> Bala


[PATCH] net: stmmac: make const array route_possibilities static

2017-07-11 Thread Colin King
From: Colin Ian King 

Don't populate array route_possibilities on the stack but make it
static const.  Makes the object code a little smaller by 85 bytes:

Before:
   textdata bss dec hex filename
   99012448   0   12349303d dwmac4_core.o

After:
   textdata bss dec hex filename
   97602504   0   122642fe8 dwmac4_core.o

Signed-off-by: Colin Ian King 
---
 drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c 
b/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
index f233bf8b4ebb..c4407e8e39a3 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
@@ -117,7 +117,7 @@ static void dwmac4_tx_queue_routing(struct mac_device_info 
*hw,
void __iomem *ioaddr = hw->pcsr;
u32 value;
 
-   const struct stmmac_rx_routing route_possibilities[] = {
+   static const struct stmmac_rx_routing route_possibilities[] = {
{ GMAC_RXQCTRL_AVCPQ_MASK, GMAC_RXQCTRL_AVCPQ_SHIFT },
{ GMAC_RXQCTRL_PTPQ_MASK, GMAC_RXQCTRL_PTPQ_SHIFT },
{ GMAC_RXQCTRL_DCBCPQ_MASK, GMAC_RXQCTRL_DCBCPQ_SHIFT },
-- 
2.11.0



[PATCH] net: broadcom: bnx2x: make a couple of const arrays static

2017-07-11 Thread Colin King
From: Colin Ian King 

Don't populate various tables on the stack but make them static const.
Makes the object code smaller by nearly 200 bytes:

Before:
   textdata bss dec hex filename
 113468   11200   0  124668   1e6fc bnx2x_ethtool.o

After:
   textdata bss dec hex filename
 113129   11344   0  124473   1e639 bnx2x_ethtool.o

Signed-off-by: Colin Ian King 
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c 
b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c
index 43423744fdfa..21bc4bed6b26 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c
@@ -2886,7 +2886,7 @@ static int bnx2x_test_nvram_tbl(struct bnx2x *bp,
 
 static int bnx2x_test_nvram(struct bnx2x *bp)
 {
-   const struct crc_pair nvram_tbl[] = {
+   static const struct crc_pair nvram_tbl[] = {
{ 0,  0x14 }, /* bootstrap */
{  0x14,  0xec }, /* dir */
{ 0x100, 0x350 }, /* manuf_info */
@@ -2895,7 +2895,7 @@ static int bnx2x_test_nvram(struct bnx2x *bp)
{ 0x708,  0x70 }, /* manuf_key_info */
{ 0, 0 }
};
-   const struct crc_pair nvram_tbl2[] = {
+   static const struct crc_pair nvram_tbl2[] = {
{ 0x7e8, 0x350 }, /* manuf_info2 */
{ 0xb38,  0xf0 }, /* feature_info */
{ 0, 0 }
-- 
2.11.0



Re: [PATCH net-next RFC 05/12] net: dsa: Add support for learning FDB through notification

2017-07-11 Thread Arkadi Sharshevsky


On 07/10/2017 11:59 PM, Vivien Didelot wrote:
> Hi Arkadi,
> 
> Arkadi Sharshevsky  writes:
> 
 +  err = dsa_port_fdb_add(p->dp, fdb_info->addr, fdb_info->vid);
 +  if (err) {
 +  netdev_dbg(dev, "fdb add failed err=%d\n", err);
 +  break;
 +  }
 +  call_switchdev_notifiers(SWITCHDEV_FDB_OFFLOADED, dev,
 +   _info->info);
 +  break;
 +
 +  case SWITCHDEV_FDB_DEL_TO_DEVICE:
 +  fdb_info = _work->fdb_info;
 +  err = dsa_port_fdb_del(p->dp, fdb_info->addr, fdb_info->vid);
 +  if (err)
 +  netdev_dbg(dev, "fdb del failed err=%d\n", err);
>>>
>>> OK I must have missed from the off-list discussion why we are not
>>> calling the switchdev notifier here?
>>
>> We do not agree on it actually, that is why it was moved to the list.
>> I think that delete should succeed, you should retry until succession.
>>
>> The deletion is done under spinlock in the bridge so you cannot block,
>> thus delete cannot fail due to hardware failure. Calling it here doesn't
>> make sense because the bridge probably already deleted this FDB.
> 
> So as we discussed, the problem here is that if dsa_port_fdb_del fails
> for some probable reasons (MDIO timeout, weak GPIO lines, etc.), Linux
> bridge will delete the entry in software, dumping bridge fdb will show
> nothing, but the entry would still be programmed in hardware and the
> network can thus be inconsistent, unsupposedly switching frames.
> 
> IMHO the correct way for bridge to use the notification chain is to make
> SWITCHDEV_FDB_DEL_TO_DEVICE symmetrical to SWITCHDEV_FDB_ADD_TO_DEVICE:
> if an entry has been marked as offloaded, bridge must mark the entry as
> to-be-deleted and do not delete the software entry until the driver
> notifies back the successful deletion.
> 
> If that is hardly feasible due to some bridge limitations, we must
> explain this in a comment and use something more explosive than a simple
> netdev_dbg to warn the user about the broken network setup...
> 
> 
> Thanks,
> 
> Vivien
> 

Hi Nikolay,

Vivien raised inconsistency issue with the current switchdev
notification chain in case of FDB del. In case of static FDB delete,
notification will be sent to the driver, followed by deletion of the
software entry without waiting for the hardware delete. In case of
hardware deletion failure the consecutive FDB dump will not show the
deleted entry, yet, the entry will stay in hardware.

The deletion is done under lock thus the hardware deletion is deferred,
and cannot fail due to hardware removal failure. Thus the above proposed
solution by Vivien can lead to confusing situation:

1. User deletes the entry
2. Deletion succeed
3. User dumps FDB and still sees this entry due to hardware failure,
   what should he do? retry to delete until the FDB dump will not show
   the entry?

Would like to hear you opinion about this solution.

IMHO in this case the driver should retry to delete, in case of
several retries the driver should maybe:
1. Trap the traffic to CPU (dint think it possible in case of DSA).
2. Disable the port (its more explosive then netdev_dbg).

Thanks,
Arkadi




[PATCH 4/6] [next-queue]net: i40e: Enable mqprio full offload mode in the i40e driver for configuring TCs and queue mapping

2017-07-11 Thread Amritha Nambiar
The i40e driver is modified to enable the new mqprio hardware
offload mode and factor the TCs and queue configuration by
creating channel VSIs. In this mode, the priority to traffic
class mapping and the user specified queue ranges are used
to configure the traffic classes when the 'hw' option is set
to 2.

Example:
# tc qdisc add dev eth0 root mqprio num_tc 4\
  map 0 0 0 0 1 2 2 3 queues 2@0 2@2 1@4 1@5 hw 2

# tc qdisc show dev eth0

qdisc mqprio 8038: root  tc 4 map 0 0 0 0 1 2 2 3 0 0 0 0 0 0 0 0
 queues:(0:1) (2:3) (4:4) (5:5)

The HW channels created are removed and all the queue configuration
is set to default when the qdisc is detached from the root of the
device.

# tc qdisc del dev eth0 root

This patch also disables setting up channels via ethtool (ethtool -L)
when the TCs are configured using mqprio scheduler.

The patch also limits setting ethtool Rx flow hash indirection
(ethtool -X eth0 equal N) to max queues configured via mqprio.
The Rx flow hash indirection input through ethtool should be
validated so that it is within in the queue range configured via
tc/mqprio. The bound checking is achieved by reporting the current
rss size to the kernel when queues are configured via mqprio.

Example:
# tc qdisc add dev eth0 root mqprio num_tc 4\
  map 0 0 0 1 0 2 3 0 queues 2@0 4@2 8@6 11@14 hw 2

# ethtool -X eth0 equal 12
Cannot set RX flow hash configuration: Invalid argument

Signed-off-by: Amritha Nambiar 
---
 drivers/net/ethernet/intel/i40e/i40e.h |3 
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c |8 
 drivers/net/ethernet/intel/i40e/i40e_main.c|  442 ++--
 3 files changed, 342 insertions(+), 111 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h 
b/drivers/net/ethernet/intel/i40e/i40e.h
index 93bb527..501d62e 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -54,6 +54,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "i40e_type.h"
 #include "i40e_prototype.h"
 #include "i40e_client.h"
@@ -697,6 +698,7 @@ struct i40e_vsi {
enum i40e_vsi_type type;  /* VSI type, e.g., LAN, FCoE, etc */
s16 vf_id;  /* Virtual function ID for SRIOV VSIs */
 
+   struct tc_mqprio_qopt_offload mqprio_qopt; /* queue parameters */
struct i40e_tc_configuration tc_config;
struct i40e_aqc_vsi_properties_data info;
 
@@ -722,6 +724,7 @@ struct i40e_vsi {
u16 cnt_q_avail; /* num of queues available for channel usage */
u16 orig_rss_size;
u16 current_rss_size;
+   bool reconfig_rss;
 
/* keeps track of next_base_queue to be used for channel setup */
atomic_t next_base_queue;
diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c 
b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
index a868c8d..d61af22 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
@@ -2626,7 +2626,7 @@ static int i40e_get_rxnfc(struct net_device *netdev, 
struct ethtool_rxnfc *cmd,
 
switch (cmd->cmd) {
case ETHTOOL_GRXRINGS:
-   cmd->data = vsi->num_queue_pairs;
+   cmd->data = vsi->rss_size;
ret = 0;
break;
case ETHTOOL_GRXFH:
@@ -3871,6 +3871,12 @@ static int i40e_set_channels(struct net_device *dev,
if (vsi->type != I40E_VSI_MAIN)
return -EINVAL;
 
+   /* We do not support setting channels via ethtool when TCs are
+* configured through mqprio
+*/
+   if (pf->flags & I40E_FLAG_TC_MQPRIO)
+   return -EINVAL;
+
/* verify they are not requesting separate vectors */
if (!count || ch->rx_count || ch->tx_count)
return -EINVAL;
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c 
b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 54a6275..40084b7 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -1578,6 +1578,169 @@ static int i40e_set_mac(struct net_device *netdev, void 
*p)
 }
 
 /**
+ * i40e_config_rss_aq - Prepare for RSS using AQ commands
+ * @vsi: vsi structure
+ * @seed: RSS hash seed
+ **/
+static int i40e_config_rss_aq(struct i40e_vsi *vsi, const u8 *seed,
+ u8 *lut, u16 lut_size)
+{
+   struct i40e_pf *pf = vsi->back;
+   struct i40e_hw *hw = >hw;
+   int ret = 0;
+
+   if (seed) {
+   struct i40e_aqc_get_set_rss_key_data *seed_dw =
+   (struct i40e_aqc_get_set_rss_key_data *)seed;
+   ret = i40e_aq_set_rss_key(hw, vsi->id, seed_dw);
+   if (ret) {
+   dev_info(>pdev->dev,
+"Cannot set RSS key, err %s aq_err %s\n",
+i40e_stat_str(hw, ret),
+i40e_aq_str(hw, hw->aq.asq_last_status));
+   

[PATCH 5/6] [next-queue]net: i40e: Refactor VF BW rate limiting function to be reused on the PF as well

2017-07-11 Thread Amritha Nambiar
This patch refactors the BW rate limiting for Tx traffic
on the VF to be reused in the next patch for rate limiting Tx
traffic for the VSIs on the PF as well.

Signed-off-by: Amritha Nambiar 
---
 drivers/net/ethernet/intel/i40e/i40e.h |5 ++
 drivers/net/ethernet/intel/i40e/i40e_main.c|   63 
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c |   45 +-
 3 files changed, 70 insertions(+), 43 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h 
b/drivers/net/ethernet/intel/i40e/i40e.h
index 501d62e..c545be6 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -127,6 +127,10 @@
 /* default to trying for four seconds */
 #define I40E_TRY_LINK_TIMEOUT  (4 * HZ)
 
+/* BW rate limiting */
+#define I40E_BW_CREDIT_DIVISOR 50 /* 50Mbps per BW credit */
+#define I40E_MAX_BW_INACTIVE_ACCUM 4  /* accumulate 4 credits max */
+
 /* driver state flags */
 enum i40e_state_t {
__I40E_TESTING,
@@ -1040,4 +1044,5 @@ static inline bool i40e_enabled_xdp_vsi(struct i40e_vsi 
*vsi)
 }
 
 int i40e_create_queue_channel(struct i40e_vsi *vsi, struct i40e_channel *ch);
+int i40e_set_bw_limit(struct i40e_vsi *vsi, u16 seid, u64 max_tx_rate);
 #endif /* _I40E_H_ */
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c 
b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 40084b7..409ef09 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -5397,6 +5397,69 @@ static int i40e_vsi_config_tc(struct i40e_vsi *vsi, u8 
enabled_tc)
 }
 
 /**
+ * i40e_get_link_speed - Returns link speed for the interface
+ * @vsi: VSI to be configured
+ *
+ **/
+int i40e_get_link_speed(struct i40e_vsi *vsi)
+{
+   struct i40e_pf *pf = vsi->back;
+
+   switch (pf->hw.phy.link_info.link_speed) {
+   case I40E_LINK_SPEED_40GB:
+   return 4;
+   case I40E_LINK_SPEED_25GB:
+   return 25000;
+   case I40E_LINK_SPEED_20GB:
+   return 2;
+   case I40E_LINK_SPEED_10GB:
+   return 1;
+   case I40E_LINK_SPEED_1GB:
+   return 1000;
+   default:
+   return -EINVAL;
+   }
+}
+
+/**
+ * i40e_set_bw_limit - setup BW limit for Tx traffic based on max_tx_rate
+ * @vsi: VSI to be configured
+ * @seid: seid of the channel/VSI
+ * @max_tx_rate: max TX rate to be configured as BW limit
+ *
+ * Helper function to set BW limit for a given VSI
+ **/
+int i40e_set_bw_limit(struct i40e_vsi *vsi, u16 seid, u64 max_tx_rate)
+{
+   struct i40e_pf *pf = vsi->back;
+   int speed = 0;
+   int ret = 0;
+
+   speed = i40e_get_link_speed(vsi);
+   if (max_tx_rate > speed) {
+   dev_err(>pdev->dev,
+   "Invalid max tx rate %llu specified for VSI seid %d.",
+   max_tx_rate, seid);
+   return -EINVAL;
+   }
+   if ((max_tx_rate < 50) && (max_tx_rate > 0)) {
+   dev_warn(>pdev->dev,
+"Setting max tx rate to minimum usable value of 
50Mbps.\n");
+   max_tx_rate = 50;
+   }
+
+   /* Tx rate credits are in values of 50Mbps, 0 is disabled*/
+   ret = i40e_aq_config_vsi_bw_limit(>hw, seid,
+ max_tx_rate / I40E_BW_CREDIT_DIVISOR,
+ I40E_MAX_BW_INACTIVE_ACCUM, NULL);
+   if (ret)
+   dev_err(>pdev->dev,
+   "Failed set tx rate (%llu Mbps) for vsi->seid %u, error 
code %d.\n",
+   max_tx_rate, seid, ret);
+   return ret;
+}
+
+/**
  * i40e_remove_queue_channels - Remove queue channels for the TCs
  * @vsi: VSI to be configured
  *
diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c 
b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
index be172f8..7a0f953 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
@@ -2926,8 +2926,6 @@ int i40e_ndo_set_vf_port_vlan(struct net_device *netdev, 
int vf_id,
return ret;
 }
 
-#define I40E_BW_CREDIT_DIVISOR 50 /* 50Mbps per BW credit */
-#define I40E_MAX_BW_INACTIVE_ACCUM 4  /* device can accumulate 4 credits max */
 /**
  * i40e_ndo_set_vf_bw
  * @netdev: network interface device structure
@@ -2943,7 +2941,6 @@ int i40e_ndo_set_vf_bw(struct net_device *netdev, int 
vf_id, int min_tx_rate,
struct i40e_pf *pf = np->vsi->back;
struct i40e_vsi *vsi;
struct i40e_vf *vf;
-   int speed = 0;
int ret = 0;
 
/* validate the request */
@@ -2968,48 +2965,10 @@ int i40e_ndo_set_vf_bw(struct net_device *netdev, int 
vf_id, int min_tx_rate,
goto error;
}
 
-   switch (pf->hw.phy.link_info.link_speed) {
-   case I40E_LINK_SPEED_40GB:
-   speed = 4;
-   break;
-  

[PATCH 3/6] [next-queue]net: i40e: Add infrastructure for queue channel support with the TCs and queue configurations offloaded via mqprio scheduler

2017-07-11 Thread Amritha Nambiar
This patch sets up the infrastructure for offloading TCs and
queue configurations to the hardware by creating HW channels(VSI).
A new channel is created for each of the traffic class
configuration offloaded via mqprio framework except for the first TC
(TC0). TC0 for the main VSI is also reconfigured as per user provided
queue parameters. Queue counts that are not power-of-2 are handled by
reconfiguring RSS by reprogramming LUTs using the queue count value.
This patch also handles configuring the TX rings for the channels,
setting up the RX queue map for channel.

Also, the channels so created are removed and all the queue
configuration is set to default when the qdisc is detached from the
root of the device.

Signed-off-by: Amritha Nambiar 
Signed-off-by: Kiran Patil 
---
 drivers/net/ethernet/intel/i40e/i40e.h  |   33 +
 drivers/net/ethernet/intel/i40e/i40e_main.c |  747 +++
 drivers/net/ethernet/intel/i40e/i40e_txrx.h |2 
 3 files changed, 773 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h 
b/drivers/net/ethernet/intel/i40e/i40e.h
index bd9bf46..93bb527 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -86,6 +86,7 @@
 #define I40E_AQ_LEN256
 #define I40E_AQ_WORK_LIMIT 66 /* max number of VFs + a little */
 #define I40E_MAX_USER_PRIORITY 8
+#define I40E_MAX_QUEUES_PER_CH 64
 #define I40E_DEFAULT_TRAFFIC_CLASS BIT(0)
 #define I40E_DEFAULT_MSG_ENABLE4
 #define I40E_QUEUE_WAIT_RETRY_LIMIT10
@@ -338,6 +339,23 @@ struct i40e_flex_pit {
u8 pit_index;
 };
 
+struct i40e_channel {
+   struct list_head list;
+   bool initialized;
+   u8 type;
+   u16 vsi_number; /* Assigned VSI number from AQ 'Add VSI' response */
+   u16 stat_counter_idx;
+   u16 base_queue;
+   u16 num_queue_pairs; /* Requested by user */
+   u16 seid;
+
+   u8 enabled_tc;
+   struct i40e_aqc_vsi_properties_data info;
+
+   /* track this channel belongs to which VSI */
+   struct i40e_vsi *parent_vsi;
+};
+
 /* struct that defines the Ethernet device */
 struct i40e_pf {
struct pci_dev *pdev;
@@ -453,6 +471,7 @@ struct i40e_pf {
 #define I40E_FLAG_CLIENT_L2_CHANGE BIT_ULL(25)
 #define I40E_FLAG_CLIENT_RESET BIT_ULL(26)
 #define I40E_FLAG_LINK_DOWN_ON_CLOSE_ENABLED   BIT_ULL(27)
+#define I40E_FLAG_TC_MQPRIOBIT_ULL(28)
 
struct i40e_client_instance *cinst;
bool stat_offsets_loaded;
@@ -533,6 +552,8 @@ struct i40e_pf {
u32 ioremap_len;
u32 fd_inv;
u16 phy_led_val;
+
+   u16 override_q_count;
 };
 
 /**
@@ -697,6 +718,16 @@ struct i40e_vsi {
bool current_isup;  /* Sync 'link up' logging */
enum i40e_aq_link_speed current_speed;  /* Sync link speed logging */
 
+   /* channel specific fields */
+   u16 cnt_q_avail; /* num of queues available for channel usage */
+   u16 orig_rss_size;
+   u16 current_rss_size;
+
+   /* keeps track of next_base_queue to be used for channel setup */
+   atomic_t next_base_queue;
+
+   struct list_head ch_list;
+
void *priv; /* client driver data reference. */
 
/* VSI specific handlers */
@@ -1004,4 +1035,6 @@ static inline bool i40e_enabled_xdp_vsi(struct i40e_vsi 
*vsi)
 {
return !!vsi->xdp_prog;
 }
+
+int i40e_create_queue_channel(struct i40e_vsi *vsi, struct i40e_channel *ch);
 #endif /* _I40E_H_ */
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c 
b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 03412a8..54a6275 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -2877,7 +2877,7 @@ static void i40e_config_xps_tx_ring(struct i40e_ring 
*ring)
struct i40e_vsi *vsi = ring->vsi;
cpumask_var_t mask;
 
-   if (!ring->q_vector || !ring->netdev)
+   if (!ring->q_vector || !ring->netdev || ring->ch)
return;
 
/* Single TC mode enable XPS */
@@ -2950,7 +2950,14 @@ static int i40e_configure_tx_ring(struct i40e_ring *ring)
 * initialization. This has to be done regardless of
 * DCB as by default everything is mapped to TC0.
 */
-   tx_ctx.rdylist = le16_to_cpu(vsi->info.qs_handle[ring->dcb_tc]);
+
+   if (ring->ch)
+   tx_ctx.rdylist =
+   le16_to_cpu(ring->ch->info.qs_handle[ring->dcb_tc]);
+
+   else
+   tx_ctx.rdylist = le16_to_cpu(vsi->info.qs_handle[ring->dcb_tc]);
+
tx_ctx.rdylist_act = 0;
 
/* clear the context in the HMC */
@@ -2972,12 +2979,23 @@ static int i40e_configure_tx_ring(struct i40e_ring 
*ring)
}
 
/* Now associate this queue with this PCI function */
-   if (vsi->type == I40E_VSI_VMDQ2) {
-   qtx_ctl = 

[PATCH 1/6] [next-queue]net: mqprio: Introduce new hardware offload mode in mqprio for offloading full TC configurations

2017-07-11 Thread Amritha Nambiar
This patch introduces a new hardware offload mode in mqprio
which makes full use of the mqprio options, the TCs, the
queue configurations and the bandwidth rates for the TCs.
This is achieved by setting the value 2 for the "hw" option.
This new offload mode supports new attributes for traffic
class such as minimum and maximum values for bandwidth rate limits.

Introduces a new datastructure 'tc_mqprio_qopt_offload' for offloading
mqprio queue options and use this to be shared between the kernel and
device driver. This contains a copy of the exisiting datastructure
for mqprio queue options. This new datastructure can be extended when
adding new attributes for traffic class such as bandwidth rate limits. The
existing datastructure for mqprio queue options will be shared between the
kernel and userspace.

This patch enables configuring additional attributes associated
with a traffic class such as minimum and maximum bandwidth
rates and can be offloaded to the hardware in the new offload mode.
The min and max limits for bandwidth rates are provided
by the user along with the the TCs and the queue configurations
when creating the mqprio qdisc.

Example:
# tc qdisc add dev eth0 root mqprio num_tc 2 map 0 0 0 0 1 1 1 1\
  queues 4@0 4@4 min_rate 0Mbit 0Mbit max_rate 55Mbit 60Mbit hw 2

To dump the bandwidth rates:

# tc qdisc show dev eth0

qdisc mqprio 804a: root  tc 2 map 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0
 queues:(0:3) (4:7)
 min rates:0bit 0bit
 max rates:55Mbit 60Mbit

Signed-off-by: Amritha Nambiar 
---
 include/linux/netdevice.h  |2 
 include/net/pkt_cls.h  |7 ++
 include/uapi/linux/pkt_sched.h |   13 +++
 net/sched/sch_mqprio.c |  170 +---
 4 files changed, 181 insertions(+), 11 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index e48ee2e..12c6c3f 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -779,6 +779,7 @@ enum {
TC_SETUP_CLSFLOWER,
TC_SETUP_MATCHALL,
TC_SETUP_CLSBPF,
+   TC_SETUP_MQPRIO_EXT,
 };
 
 struct tc_cls_u32_offload;
@@ -791,6 +792,7 @@ struct tc_to_netdev {
struct tc_cls_matchall_offload *cls_mall;
struct tc_cls_bpf_offload *cls_bpf;
struct tc_mqprio_qopt *mqprio;
+   struct tc_mqprio_qopt_offload *mqprio_qopt;
};
bool egress_dev;
 };
diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 537d0a0..9facda2 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -569,6 +569,13 @@ struct tc_cls_bpf_offload {
u32 gen_flags;
 };
 
+struct tc_mqprio_qopt_offload {
+   /* struct tc_mqprio_qopt must always be the first element */
+   struct tc_mqprio_qopt qopt;
+   u32 flags;
+   u64 min_rate[TC_QOPT_MAX_QUEUE];
+   u64 max_rate[TC_QOPT_MAX_QUEUE];
+};
 
 /* This structure holds cookie structure that is passed from user
  * to the kernel for actions and classifiers
diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
index 099bf55..cf2a146 100644
--- a/include/uapi/linux/pkt_sched.h
+++ b/include/uapi/linux/pkt_sched.h
@@ -620,6 +620,7 @@ struct tc_drr_stats {
 enum {
TC_MQPRIO_HW_OFFLOAD_NONE,  /* no offload requested */
TC_MQPRIO_HW_OFFLOAD_TCS,   /* offload TCs, no queue counts */
+   TC_MQPRIO_HW_OFFLOAD,   /* fully supported offload */
__TC_MQPRIO_HW_OFFLOAD_MAX
 };
 
@@ -633,6 +634,18 @@ struct tc_mqprio_qopt {
__u16   offset[TC_QOPT_MAX_QUEUE];
 };
 
+#define TC_MQPRIO_F_MIN_RATE  0x1
+#define TC_MQPRIO_F_MAX_RATE  0x2
+
+enum {
+   TCA_MQPRIO_UNSPEC,
+   TCA_MQPRIO_MIN_RATE64,
+   TCA_MQPRIO_MAX_RATE64,
+   __TCA_MQPRIO_MAX,
+};
+
+#define TCA_MQPRIO_MAX (__TCA_MQPRIO_MAX - 1)
+
 /* SFB */
 
 enum {
diff --git a/net/sched/sch_mqprio.c b/net/sched/sch_mqprio.c
index e0c0272..6524d12 100644
--- a/net/sched/sch_mqprio.c
+++ b/net/sched/sch_mqprio.c
@@ -18,10 +18,13 @@
 #include 
 #include 
 #include 
+#include 
 
 struct mqprio_sched {
struct Qdisc**qdiscs;
int hw_offload;
+   u32 flags;
+   u64 min_rate[TC_QOPT_MAX_QUEUE], max_rate[TC_QOPT_MAX_QUEUE];
 };
 
 static void mqprio_destroy(struct Qdisc *sch)
@@ -39,9 +42,21 @@ static void mqprio_destroy(struct Qdisc *sch)
}
 
if (priv->hw_offload && dev->netdev_ops->ndo_setup_tc) {
-   struct tc_mqprio_qopt offload = { 0 };
-   struct tc_to_netdev tc = { .type = TC_SETUP_MQPRIO,
-  { .mqprio =  } };
+   struct tc_mqprio_qopt_offload offload = { { 0 } };
+   struct tc_to_netdev tc = { 0 };
+
+   switch (priv->hw_offload) {
+   case TC_MQPRIO_HW_OFFLOAD_TCS:
+   tc.type = TC_SETUP_MQPRIO;
+   tc.mqprio = 
+   

[PATCH 2/6] [next-queue]net: i40e: Add macro for PF reset bit

2017-07-11 Thread Amritha Nambiar
Introduce a macro for the bit setting the PF reset flag and
update its usages. This makes it easier to use this flag
in functions to be introduced in future without encountering
checkpatch issues related to alignment and line over 80
characters.

Signed-off-by: Amritha Nambiar 
---
 drivers/net/ethernet/intel/i40e/i40e.h |2 ++
 drivers/net/ethernet/intel/i40e/i40e_debugfs.c |3 +--
 drivers/net/ethernet/intel/i40e/i40e_main.c|9 -
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c |5 ++---
 4 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h 
b/drivers/net/ethernet/intel/i40e/i40e.h
index 5f75e67..bd9bf46 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -155,6 +155,8 @@ enum i40e_state_t {
__I40E_STATE_SIZE__,
 };
 
+#define I40E_PF_RESET_FLAG BIT_ULL(__I40E_PF_RESET_REQUESTED)
+
 /* VSI state flags */
 enum i40e_vsi_state_t {
__I40E_VSI_DOWN,
diff --git a/drivers/net/ethernet/intel/i40e/i40e_debugfs.c 
b/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
index 8f326f8..b46117e 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
@@ -798,8 +798,7 @@ static ssize_t i40e_dbg_command_write(struct file *filp,
 */
if (!(pf->flags & I40E_FLAG_VEB_MODE_ENABLED)) {
pf->flags |= I40E_FLAG_VEB_MODE_ENABLED;
-   i40e_do_reset_safe(pf,
-  BIT_ULL(__I40E_PF_RESET_REQUESTED));
+   i40e_do_reset_safe(pf, I40E_PF_RESET_FLAG);
}
 
vsi = i40e_vsi_setup(pf, I40E_VSI_VMDQ2, vsi_seid, 0);
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c 
b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 998ad96..03412a8 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -5738,7 +5738,7 @@ int i40e_vsi_open(struct i40e_vsi *vsi)
 err_setup_tx:
i40e_vsi_free_tx_resources(vsi);
if (vsi == pf->vsi[pf->lan_vsi])
-   i40e_do_reset(pf, BIT_ULL(__I40E_PF_RESET_REQUESTED), true);
+   i40e_do_reset(pf, I40E_PF_RESET_FLAG, true);
 
return err;
 }
@@ -5866,7 +5866,7 @@ void i40e_do_reset(struct i40e_pf *pf, u32 reset_flags, 
bool lock_acquired)
wr32(>hw, I40E_GLGEN_RTRIG, val);
i40e_flush(>hw);
 
-   } else if (reset_flags & BIT_ULL(__I40E_PF_RESET_REQUESTED)) {
+   } else if (reset_flags & I40E_PF_RESET_FLAG) {
 
/* Request a PF Reset
 *
@@ -9145,7 +9145,7 @@ static int i40e_set_features(struct net_device *netdev,
need_reset = i40e_set_ntuple(pf, features);
 
if (need_reset)
-   i40e_do_reset(pf, BIT_ULL(__I40E_PF_RESET_REQUESTED), true);
+   i40e_do_reset(pf, I40E_PF_RESET_FLAG, true);
 
return 0;
 }
@@ -9397,8 +9397,7 @@ static int i40e_ndo_bridge_setlink(struct net_device *dev,
pf->flags |= I40E_FLAG_VEB_MODE_ENABLED;
else
pf->flags &= ~I40E_FLAG_VEB_MODE_ENABLED;
-   i40e_do_reset(pf, BIT_ULL(__I40E_PF_RESET_REQUESTED),
- true);
+   i40e_do_reset(pf, I40E_PF_RESET_FLAG, true);
break;
}
}
diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c 
b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
index 3ef67dc..be172f8 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
@@ -1377,8 +1377,7 @@ int i40e_pci_sriov_configure(struct pci_dev *pdev, int 
num_vfs)
if (num_vfs) {
if (!(pf->flags & I40E_FLAG_VEB_MODE_ENABLED)) {
pf->flags |= I40E_FLAG_VEB_MODE_ENABLED;
-   i40e_do_reset_safe(pf,
-  BIT_ULL(__I40E_PF_RESET_REQUESTED));
+   i40e_do_reset_safe(pf, I40E_PF_RESET_FLAG);
}
return i40e_pci_sriov_enable(pdev, num_vfs);
}
@@ -1386,7 +1385,7 @@ int i40e_pci_sriov_configure(struct pci_dev *pdev, int 
num_vfs)
if (!pci_vfs_assigned(pf->pdev)) {
i40e_free_vfs(pf);
pf->flags &= ~I40E_FLAG_VEB_MODE_ENABLED;
-   i40e_do_reset_safe(pf, BIT_ULL(__I40E_PF_RESET_REQUESTED));
+   i40e_do_reset_safe(pf, I40E_PF_RESET_FLAG);
} else {
dev_warn(>dev, "Unable to free VFs because some are 
assigned to VMs.\n");
return -EINVAL;



[PATCH 6/6] [next-queue]net: i40e: Add support to set max bandwidth rates for TCs offloaded via tc/mqprio

2017-07-11 Thread Amritha Nambiar
This patch enables setting up maximum Tx rates for the traffic
classes in i40e. The maximum rate offloaded to the hardware through
the mqprio framework is configured for the VSI. Configuring
minimum Tx rate limit is not supported in the device. The minimum
usable value for Tx rate is 50Mbps.

Example:
# tc qdisc add dev eth0 root mqprio num_tc 2  map 0 0 0 0 1 1 1 1\
  queues 4@0 4@4 min_rate 0Mbit 0Mbit max_rate 55Mbit 60Mbit hw 2

To dump the bandwidth rates:
# tc qdisc show dev eth0

qdisc mqprio 804a: root  tc 2 map 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0
 queues:(0:3) (4:7)
 min rates:0bit 0bit
 max rates:55Mbit 60Mbit

Signed-off-by: Amritha Nambiar 
---
 drivers/net/ethernet/intel/i40e/i40e.h  |2 +
 drivers/net/ethernet/intel/i40e/i40e_main.c |   99 +--
 2 files changed, 92 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h 
b/drivers/net/ethernet/intel/i40e/i40e.h
index c545be6..7a4a338 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -357,6 +357,8 @@ struct i40e_channel {
u8 enabled_tc;
struct i40e_aqc_vsi_properties_data info;
 
+   u64 max_tx_rate;
+
/* track this channel belongs to which VSI */
struct i40e_vsi *parent_vsi;
 };
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c 
b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 409ef09..c5a625c 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -5193,9 +5193,16 @@ static int i40e_vsi_configure_bw_alloc(struct i40e_vsi 
*vsi, u8 enabled_tc,
i40e_status ret;
int i;
 
-   if ((vsi->back->flags & I40E_FLAG_TC_MQPRIO) ||
-   !vsi->mqprio_qopt.qopt.hw)
+   if (vsi->back->flags & I40E_FLAG_TC_MQPRIO)
return 0;
+   if (!vsi->mqprio_qopt.qopt.hw) {
+   ret = i40e_set_bw_limit(vsi, vsi->seid, 0);
+   if (ret)
+   dev_info(>back->pdev->dev,
+"Failed to reset tx rate for vsi->seid %u\n",
+vsi->seid);
+   return ret;
+   }
bw_data.tc_valid_bits = enabled_tc;
for (i = 0; i < I40E_MAX_TRAFFIC_CLASS; i++)
bw_data.tc_bw_credits[i] = bw_share[i];
@@ -5501,6 +5508,13 @@ static void i40e_remove_queue_channels(struct i40e_vsi 
*vsi)
rx_ring->ch = NULL;
}
 
+   /* Reset BW configured for this VSI via mqprio */
+   ret = i40e_set_bw_limit(vsi, ch->seid, 0);
+   if (ret)
+   dev_info(>back->pdev->dev,
+"Failed to reset tx rate for ch->seid %u\n",
+ch->seid);
+
/* delete VSI from FW */
ret = i40e_aq_delete_element(>back->hw, ch->seid,
 NULL);
@@ -6073,6 +6087,17 @@ int i40e_create_queue_channel(struct i40e_vsi *vsi,
 "Setup channel (id:%u) utilizing num_queues %d\n",
 ch->seid, ch->num_queue_pairs);
 
+   /* configure VSI for BW limit */
+   if (ch->max_tx_rate) {
+   if (i40e_set_bw_limit(vsi, ch->seid, ch->max_tx_rate))
+   return -EINVAL;
+
+   dev_dbg(>pdev->dev,
+   "Set tx rate of %llu Mbps (count of 50Mbps %llu) for 
vsi->seid %u\n",
+   ch->max_tx_rate,
+   ch->max_tx_rate / I40E_BW_CREDIT_DIVISOR, ch->seid);
+   }
+
/* in case of VF, this will be main SRIOV VSI */
ch->parent_vsi = vsi;
 
@@ -6108,6 +6133,12 @@ static int i40e_configure_queue_channels(struct i40e_vsi 
*vsi)
ch->base_queue =
vsi->tc_config.tc_info[i].qoffset;
 
+   /* Bandwidth limit through tc interface is in bytes/s,
+* change to Mbit/s
+*/
+   ch->max_tx_rate =
+   (vsi->mqprio_qopt.max_rate[i] * 8) / 100;
+
list_add_tail(>list, >ch_list);
 
ret = i40e_create_queue_channel(vsi, ch);
@@ -6527,6 +6558,7 @@ void i40e_down(struct i40e_vsi *vsi)
 static int i40e_validate_mqprio_qopt(struct i40e_vsi *vsi,
 struct tc_mqprio_qopt_offload *mqprio_qopt)
 {
+   u64 sum_max_rate = 0;
int i;
 
if ((mqprio_qopt->qopt.offset[0] != 0) ||
@@ -6536,8 +6568,13 @@ static int i40e_validate_mqprio_qopt(struct i40e_vsi 
*vsi,
for (i = 0; ; i++) {
if (!mqprio_qopt->qopt.count[i])
return -EINVAL;
-   if (mqprio_qopt->min_rate[i] || mqprio_qopt->max_rate[i])
+   if (mqprio_qopt->min_rate[i]) {
+   

[PATCH 0/6] Configuring traffic classes via new hardware offload mechanism in tc/mqprio

2017-07-11 Thread Amritha Nambiar
The following series introduces a new hardware offload mode in
tc/mqprio where the TCs, the queue configurations and
bandwidth rate limits are offloaded to the hardware. The existing
mqprio framework is extended to configure the queue counts and
layout and also added support for rate limiting. This is achieved
by setting the value 2 for the "hw" mode. Legacy devices can fall
back to the existing setup supporting hw mode 1 where only the
TCs are offloaded, however if hw mode 2 is specified, then this
type of offload fails to initialize if the requested mode is not
supported. The i40e driver enables the new mqprio hardware offload
mechanism factoring the TCs, queue configuration and bandwidth
rates by creating HW channel VSIs.

In this new mode, the priority to traffic class mapping and the
user specified queue ranges are used to configure the traffic
class when the 'hw' option is set to 2. This is achieved by
creating HW channels(VSI). A new channel is created for each
of the traffic class configuration offloaded via mqprio
framework except for the first TC (TC0) which is for the main
VSI. TC0 for the main VSI is also reconfigured as per user
provided queue parameters. Finally, bandwidth rate limits are
set on these traffic classes through the mqprio offload
framework by sending these rates in addition to the number of
TCs and the queue configurations.

Example:
# tc qdisc add dev eth0 root mqprio num_tc 2  map 0 0 0 0 1 1 1 1\
  queues 4@0 4@4 min_rate 0Mbit 0Mbit max_rate 55Mbit 60Mbit hw 2

To dump the bandwidth rates:

# tc qdisc show dev eth0
  qdisc mqprio 804a: root  tc 2 map 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0
   queues:(0:3) (4:7)
   min rates:0bit 0bit
   max rates:55Mbit 60Mbit
---

Amritha Nambiar (6):
  [next-queue]net: mqprio: Introduce new hardware offload mode in mqprio 
for offloading full TC configurations
  [next-queue]net: i40e: Add macro for PF reset bit
  [next-queue]net: i40e: Add infrastructure for queue channel support with 
the TCs and queue configurations offloaded via mqprio scheduler
  [next-queue]net: i40e: Enable mqprio full offload mode in the i40e driver 
for configuring TCs and queue mapping
  [next-queue]net: i40e: Refactor VF BW rate limiting function to be reused 
on the PF as well
  [next-queue]net: i40e: Add support to set max bandwidth rates for TCs 
offloaded via tc/mqprio


 drivers/net/ethernet/intel/i40e/i40e.h |   45 +
 drivers/net/ethernet/intel/i40e/i40e_debugfs.c |3 
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c |8 
 drivers/net/ethernet/intel/i40e/i40e_main.c| 1478 +---
 drivers/net/ethernet/intel/i40e/i40e_txrx.h|2 
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c |   50 -
 include/linux/netdevice.h  |2 
 include/net/pkt_cls.h  |7 
 include/uapi/linux/pkt_sched.h |   13 
 net/sched/sch_mqprio.c |  170 ++
 10 files changed, 1526 insertions(+), 252 deletions(-)

--


Backport - xen-netback: correctly schedule rate-limited queues

2017-07-11 Thread Jean-Louis Dupond

Hi All,

Is there a chance the following patch 
(https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit/?id=dfa523ae9f2542bee4cddaea37b3be3e157f6e6b) 
can get backported to the current LTS kernels (including 4.9.x)?


The commit message might not be completely showing the impact of this 
issue (which is fixed with the patch).


Without the patch, ksoftirqd goes nuts if you upload on a domU with 
rate-limiting enabled.
This not only causing high cpu load, but sometimes also a complete hang 
of the hypervisor.


As the patch is quite important and trivial, I think it would be good to 
have it backported :)


Thanks
Jean-Louis


Re: [PATCH] dpaa_eth: use correct device for DMA mapping API

2017-07-11 Thread Arnd Bergmann
On Tue, Jul 11, 2017 at 10:50 AM, Madalin-cristian Bucur
 wrote:

> Hi Arnd,
>
> Thanks for looking into this, I've tested your fix, it seems to need more 
> work:
>
> [0.894968]  platform: DMA map failed
> [0.898627]  platform: DMA map failed
> [0.902288]  platform: DMA map failed
> [0.905947]  platform: DMA map failed
> [0.909606]  platform: DMA map failed
> [0.913265]  platform: DMA map failed

I see: the assignment ended up after the first use, so ->dev was still
NULL here.

This should fix the problem you saw here:

diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
index f7b0b928cd53..988c0212ce7e 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
@@ -2650,6 +2650,8 @@ static int dpaa_eth_probe(struct platform_device *pdev)
  for (i = 0; i < DPAA_BPS_NUM; i++) {
  int err;

+ /* DMA operations are done on the platform-provided device */
+ dpaa_bps[i]->dev = dev->parent;
  dpaa_bps[i] = dpaa_bp_alloc(dev);
  if (IS_ERR(dpaa_bps[i]))
  return PTR_ERR(dpaa_bps[i]);
@@ -2657,8 +2659,6 @@ static int dpaa_eth_probe(struct platform_device *pdev)
  dpaa_bps[i]->raw_size = bpool_buffer_raw_size(i, DPAA_BPS_NUM);
  /* avoid runtime computations by keeping the usable size here */
  dpaa_bps[i]->size = dpaa_bp_size(dpaa_bps[i]->raw_size);
- /* DMA operations are done on the platform-provided device */
- dpaa_bps[i]->dev = dev->parent;

  err = dpaa_bp_alloc_pool(dpaa_bps[i]);
  if (err < 0) {

> @@ -2806,7 +2799,6 @@ static int dpaa_eth_probe(struct platform_device *pdev)
> dpaa_bps_free(priv);
>  bp_create_failed:
>  fq_probe_failed:
> -dev_mask_failed:
>  mac_probe_failed:
> dev_set_drvdata(dev, NULL);
> free_netdev(net_dev);
>
> I'll try to address your concern about performing the DMA mapping on a 
> different
> device than the one set up by the platform code.

Thanks!

 Arnd


Re: [PATCH v2] dt-bindings: net: ravb : Add support for r8a7743 SoC

2017-07-11 Thread Simon Horman
On Mon, Jul 10, 2017 at 05:51:09PM +0200, Geert Uytterhoeven wrote:
> On Mon, Jul 10, 2017 at 5:32 PM, Biju Das  wrote:
> > Add a new compatible string for the RZ/G1M (R8A7743) SoC.
> >
> > Signed-off-by: Biju Das 
> 
> Reviewed-by: Geert Uytterhoeven 

Reviewed-by: Simon Horman 

I believe that as net-next is currently closed you should repost it once
it re-opens (likely in about a week) with the following subject prefix
and Geert's tag.

[PATCH net-next repost v2]





Re: [PATCH] brcmfmac: added LED triggers for transmit/receive

2017-07-11 Thread Arend van Spriel
On 10-07-17 19:02, Russell Joyce wrote:
>> 1) I think most of it should be some cfg80211 shareable code.
> 
> I’m not sure exactly what you mean by this, could you please clarify?

What I think Rafał is saying is that it would be better to have this
code in cfg80211 so other drivers including mac80211 could use it.

>> 2) This "rxtx" while surely present in other places sounds like a
>> workaround for LED subsystem limitation. Maybe it's time to finally
>> rework LED triggers.
> 
> I agree that it’s not an ideal way to do things, but I couldn’t think of a
> better alternative. I think that having a combined trigger is useful though, 
> for
> situations like using the single LED on a Raspberry Pi to show Wi-Fi activity.

Indeed. However, the LED subsystem could/should(?) take care of mapping
"rx" and "tx" triggers to the same LED.

I am happy to comment on your patch, but maybe you can first take a look
at the suggestion Rafał made above.

Regards,
Arend


RE: [PATCH] dpaa_eth: use correct device for DMA mapping API

2017-07-11 Thread Madalin-cristian Bucur
> -Original Message-
> From: Arnd Bergmann [mailto:a...@arndb.de]
> Sent: Monday, July 10, 2017 6:14 PM
> Subject: [PATCH] dpaa_eth: use correct device for DMA mapping API
> 
> Geert Uytterhoeven ran into a build error without CONFIG_HAS_DMA,
> as a result of the driver calling set_dma_ops(). While we can
> fix the build error in the dma-mapping implementation, there is
> another problem in this driver:
> 
> The configuration for the DMA is done by the platform code,
> looking up information about the system from the device tree.
> This copies the information only in an incomplete way, setting
> the dma_map_ops and forcing a specific mask, but ignoring all
> settings regarding IOMMU, coherence etc.
> 
> A better way to avoid the problem is to only ever pass a device
> into the dma_mapping implementation that has been setup by the
> platform code. In this case, that is the parent device, so we
> can get that pointer at probe time. Fortunately, we already have
> a pointer in the device specific structure for that, so we only
> need to modify that.
> 
> Fixes: fb52728a9294 ("dpaa_eth: reuse the dma_ops provided by the FMan MAC
> device")
> Signed-off-by: Arnd Bergmann 
> ---
> Not tested, please see if this works before applying!
> ---
>  drivers/net/ethernet/freescale/dpaa/dpaa_eth.c | 11 ++-
>  1 file changed, 2 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
> b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
> index 757b873735a5..f7b0b928cd53 100644
> --- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
> +++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
> @@ -2646,14 +2646,6 @@ static int dpaa_eth_probe(struct platform_device
> *pdev)
>   priv->buf_layout[RX].priv_data_size = DPAA_RX_PRIV_DATA_SIZE; /* Rx
> */
>   priv->buf_layout[TX].priv_data_size = DPAA_TX_PRIV_DATA_SIZE; /* Tx
> */
> 
> - /* device used for DMA mapping */
> - set_dma_ops(dev, get_dma_ops(>dev));
> - err = dma_coerce_mask_and_coherent(dev, DMA_BIT_MASK(40));
> - if (err) {
> - dev_err(dev, "dma_coerce_mask_and_coherent() failed\n");
> - goto dev_mask_failed;
> - }
> -
>   /* bp init */
>   for (i = 0; i < DPAA_BPS_NUM; i++) {
>   int err;
> @@ -2665,7 +2657,8 @@ static int dpaa_eth_probe(struct platform_device
> *pdev)
>   dpaa_bps[i]->raw_size = bpool_buffer_raw_size(i,
> DPAA_BPS_NUM);
>   /* avoid runtime computations by keeping the usable size here
> */
>   dpaa_bps[i]->size = dpaa_bp_size(dpaa_bps[i]->raw_size);
> - dpaa_bps[i]->dev = dev;
> + /* DMA operations are done on the platform-provided device */
> + dpaa_bps[i]->dev = dev->parent;
> 
>   err = dpaa_bp_alloc_pool(dpaa_bps[i]);
>   if (err < 0) {
> --
> 2.9.0

Hi Arnd,

Thanks for looking into this, I've tested your fix, it seems to need more work:

[0.894968]  platform: DMA map failed
[0.898627]  platform: DMA map failed
[0.902288]  platform: DMA map failed
[0.905947]  platform: DMA map failed
[0.909606]  platform: DMA map failed
[0.913265]  platform: DMA map failed

You also missed this related change:

@@ -2806,7 +2799,6 @@ static int dpaa_eth_probe(struct platform_device *pdev)
dpaa_bps_free(priv);
 bp_create_failed:
 fq_probe_failed:
-dev_mask_failed:
 mac_probe_failed:
dev_set_drvdata(dev, NULL);
free_netdev(net_dev);

I'll try to address your concern about performing the DMA mapping on a different
device than the one set up by the platform code.

Regards,
Madalin


Re: [PATCH net-next RFC 10/12] net: dsa: Move FDB dump implementation inside DSA

2017-07-11 Thread Arkadi Sharshevsky


On 07/10/2017 10:36 PM, Vivien Didelot wrote:
> Hi Arkadi,
> 
> Arkadi Sharshevsky  writes:
> 
>> +struct dsa_slave_dump_ctx {
>> +struct net_device *dev;
>> +struct sk_buff *skb;
>> +struct netlink_callback *cb;
>> +int idx;
>> +};
>> +
>>  struct dsa_switch_ops {
>>  /*
>>   * Legacy probing.
>> @@ -392,9 +399,7 @@ struct dsa_switch_ops {
>>  int (*port_fdb_del)(struct dsa_switch *ds, int port,
>>  const unsigned char *addr, u16 vid);
>>  int (*port_fdb_dump)(struct dsa_switch *ds, int port,
>> - struct switchdev_obj_port_fdb *fdb,
>> -  switchdev_obj_dump_cb_t *cb);
>> -
>> + struct dsa_slave_dump_ctx *dump);
> 
> I would prefer to pass a callback to the driver, so that we can call
> port_fdb_dump for other interfaces like debugfs, and for non exposed
> switch ports (CPU and DSA links) as well. Something like:
> 
> typedef int dsa_fdb_dump_cb_t(const unsigned char *addr, u16 vid,
>   bool is_static, void *data);
> 
> int (*port_fdb_dump)(struct dsa_switch *ds, int port,
>  dsa_fdb_dump_cb_t *cb, void *data);
> 
> Then dsa_slave_dump_ctx and dsa_slave_port_fdb_do_dump can be static to
> net/dsa/slave.c.
> 

Originally thought about it. But it makes sense to pass a callback
when different functions can be supplied.

Here is only one option which is the fdb_do_dump(). So I thought
exporting it makes more sense..

>> +int dsa_slave_port_fdb_dump(struct sk_buff *skb, struct netlink_callback 
>> *cb,
>> +struct net_device *dev,
>> +struct net_device *filter_dev, int *idx)
>> +{
>> +struct dsa_slave_dump_ctx dump = {
>> +.dev = dev,
>> +.skb = skb,
>> +.cb = cb,
>> +.idx = *idx,
>> +};
>> +struct dsa_slave_priv *p = netdev_priv(dev);
>> +struct dsa_port *dp = p->dp;
>> +struct dsa_switch *ds = dp->ds;
>> +int err;
>> +
>> +if (!ds->ops->port_fdb_dump) {
>> +err = -EOPNOTSUPP;
>> +goto out;
>> +}
>> +
>> +err = ds->ops->port_fdb_dump(ds, dp->index, );
>> +out:
>> +*idx = dump.idx;
>> +return err;
> 
> There is no need to reassign *idx in case of error:
> 
> if (!ds->ops->port_fdb_dump)
> return -EOPNOTSUPP;
> 
> err = ds->ops->port_fdb_dump(ds, dp->index, );
> *idx = dump.idx;
> 
> return err;
>
Will fix, thanks.

>> +.ndo_fdb_dump   = dsa_slave_port_fdb_dump,
> 
> And s/dsa_slave_port_fdb_dump/dsa_slave_fdb_dump/ here will be even
> better ;-)
> 
> 
> Thanks,
> 
> Vivien
> 
Will fix, thanks.


  1   2   >