Re: [PATCH net-next 2/4] mlx4: xdp: Allow raising MTU up to one page minus eth and vlan hdrs
On Fri, 2016-12-02 at 15:23 -0800, Martin KaFai Lau wrote:
> When XDP prog is attached, it is currently limiting
> MTU to be FRAG_SZ0 - ETH_HLEN - (2 * VLAN_HLEN) which is 1514
> in x86.
>
> AFAICT, since mlx4 is doing one page per packet for XDP,
> we can at least raise the MTU limitation up to
> PAGE_SIZE - ETH_HLEN - (2 * VLAN_HLEN) which this patch is
> doing. It will be useful in the next patch which allows
> XDP program to extend the packet by adding new header(s).
>
> Signed-off-by: Martin KaFai Lau
> ---

Have you tested your patch on a host with PAGE_SIZE = 64 KB ?

Looks XDP really kills arches with bigger pages :(

Thanks.
Re: [PATCH net-next 2/4] mlx4: xdp: Allow raising MTU up to one page minus eth and vlan hdrs
On 12/2/16 4:38 PM, Eric Dumazet wrote:
> On Fri, 2016-12-02 at 15:23 -0800, Martin KaFai Lau wrote:
>> When XDP prog is attached, it is currently limiting
>> MTU to be FRAG_SZ0 - ETH_HLEN - (2 * VLAN_HLEN) which is 1514
>> in x86.
>>
>> AFAICT, since mlx4 is doing one page per packet for XDP,
>> we can at least raise the MTU limitation up to
>> PAGE_SIZE - ETH_HLEN - (2 * VLAN_HLEN) which this patch is
>> doing. It will be useful in the next patch which allows
>> XDP program to extend the packet by adding new header(s).
>>
>> Signed-off-by: Martin KaFai Lau
>> ---
>
> Have you tested your patch on a host with PAGE_SIZE = 64 KB ?
>
> Looks XDP really kills arches with bigger pages :(

I'm afraid xdp mlx[45] support was not tested on arches
with 64k pages at all. Not just this patch.
I think people who care about such archs should test?
Note page per packet is not a hard requirement for all drivers
and all archs. For mlx[45] it was the easiest and the most
convenient way to achieve desired performance.
If there are ways to do the same performance differently,
I'm all ears :)
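
A minimal userspace sketch of the MTU arithmetic discussed in this thread, to make the page-size concern concrete. The helper name xdp_max_mtu() is hypothetical and not taken from mlx4; the 1536-byte buffer is only what the quoted 1514 figure implies for FRAG_SZ0, and the sketch assumes one packet per buffer as described above.

```c
#include <assert.h>
#include <stddef.h>

/* Header sizes as defined in the kernel's if_ether.h / if_vlan.h. */
#define ETH_HLEN   14
#define VLAN_HLEN  4

/*
 * Hypothetical helper: the largest MTU that still fits one packet,
 * including an Ethernet header and two VLAN tags, in a buffer of
 * buf_size bytes.
 */
size_t xdp_max_mtu(size_t buf_size)
{
	return buf_size - ETH_HLEN - (2 * VLAN_HLEN);
}

/* Worked values for the buffer sizes mentioned in the thread. */
void xdp_mtu_examples(void)
{
	assert(xdp_max_mtu(1536) == 1514);       /* limit quoted for x86 today */
	assert(xdp_max_mtu(4096) == 4074);       /* one 4 KB page */
	assert(xdp_max_mtu(64 * 1024) == 65514); /* one 64 KB page */
}
```

With 64 KB pages the same formula permits an MTU of 65514, and page-per-packet then means 64 KB of memory for every received frame, which seems to be the cost the question above is pointing at.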
[PATCH] net: ping: check minimum size on ICMP header length
Prior to commit c0371da6047a ("put iov_iter into msghdr") in v3.19, there
was no check that the iovec contained enough bytes for an ICMP header,
and the read loop would walk across neighboring stack contents. Since the
iov_iter conversion, bad arguments are noticed, but the returned error is
EFAULT. Returning EMSGSIZE is a clearer fix and solves the problem prior
to v3.19.

This was found using trinity with KASAN on v3.18:

BUG: KASAN: stack-out-of-bounds in memcpy_fromiovec+0x60/0x114 at addr ffc071077da0
Read of size 8 by task trinity-c2/9623
page:ffbe034b9a08 count:0 mapcount:0 mapping: (null) index:0x0
flags: 0x0()
page dumped because: kasan: bad access detected
CPU: 0 PID: 9623 Comm: trinity-c2 Tainted: GBU 3.18.0-dirty #15
Hardware name: Google Tegra210 Smaug Rev 1,3+ (DT)
Call trace:
[] dump_backtrace+0x0/0x1ac arch/arm64/kernel/traps.c:90
[] show_stack+0x10/0x1c arch/arm64/kernel/traps.c:171
[< inline >] __dump_stack lib/dump_stack.c:15
[] dump_stack+0x7c/0xd0 lib/dump_stack.c:50
[< inline >] print_address_description mm/kasan/report.c:147
[< inline >] kasan_report_error mm/kasan/report.c:236
[] kasan_report+0x380/0x4b8 mm/kasan/report.c:259
[< inline >] check_memory_region mm/kasan/kasan.c:264
[] __asan_load8+0x20/0x70 mm/kasan/kasan.c:507
[] memcpy_fromiovec+0x5c/0x114 lib/iovec.c:15
[< inline >] memcpy_from_msg include/linux/skbuff.h:2667
[] ping_common_sendmsg+0x50/0x108 net/ipv4/ping.c:674
[] ping_v4_sendmsg+0xd8/0x698 net/ipv4/ping.c:714
[] inet_sendmsg+0xe0/0x12c net/ipv4/af_inet.c:749
[< inline >] __sock_sendmsg_nosec net/socket.c:624
[< inline >] __sock_sendmsg net/socket.c:632
[] sock_sendmsg+0x124/0x164 net/socket.c:643
[< inline >] SYSC_sendto net/socket.c:1797
[] SyS_sendto+0x178/0x1d8 net/socket.c:1761

CVE-2016-8399

Reported-by: Qidan He
Fixes: c319b4d76b9e ("net: ipv4: add IPPROTO_ICMP socket kind")
Cc: sta...@vger.kernel.org
Signed-off-by: Kees Cook
---
 net/ipv4/ping.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/ping.c b/net/ipv4/ping.c
index 205e2000d395..8257be3f032c 100644
--- a/net/ipv4/ping.c
+++ b/net/ipv4/ping.c
@@ -654,7 +654,7 @@ int ping_common_sendmsg(int family, struct msghdr *msg, size_t len,
 			void *user_icmph, size_t icmph_len)
 {
 	u8 type, code;
 
-	if (len > 0x)
+	if (len > 0x || len < icmph_len)
 		return -EMSGSIZE;
 
 	/*
-- 
2.7.4

-- 
Kees Cook
Nexus Security
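
A minimal userspace sketch of the check this patch adds, for readers who want to see the idea outside the kernel. copy_icmp_header() is a hypothetical stand-in, not the kernel function, and the 0xFFFF upper bound is an assumption of the sketch, since the constant is cut off in the diff above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for the header copy in ping_common_sendmsg():
 * refuse the request when the caller supplied fewer bytes than the
 * header we are about to read, instead of reading past the input.
 */
int copy_icmp_header(void *hdr, size_t hdr_len,
		     const void *payload, size_t len)
{
	/* Upper bound is an assumed value for this sketch. */
	if (len > 0xFFFF || len < hdr_len)
		return -EMSGSIZE;

	memcpy(hdr, payload, hdr_len);
	return 0;
}
```

A 4-byte payload with an 8-byte header is rejected with -EMSGSIZE before any copy happens; without the lower-bound test the copy would read 4 bytes beyond the payload.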
Re: [PATCH net-next] liquidio: 'imply' ptp instead of 'select'
Hi Arnd,

[auto build test ERROR on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Arnd-Bergmann/liquidio-imply-ptp-instead-of-select/20161203-084019
config: x86_64-allmodconfig
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
        make ARCH=x86_64 allmodconfig
        make ARCH=x86_64

All errors (new ones prefixed by >>):

>> drivers/net/ethernet/cavium/Kconfig:81: syntax error
>> drivers/net/ethernet/cavium/Kconfig:80: unknown option "imply"
   make[2]: *** [allmodconfig] Error 1
   make[1]: *** [allmodconfig] Error 2
   make: *** [sub-make] Error 2
--
>> drivers/net/ethernet/cavium/Kconfig:81: syntax error
>> drivers/net/ethernet/cavium/Kconfig:80: unknown option "imply"
   make[2]: *** [oldconfig] Error 1
   make[1]: *** [oldconfig] Error 2
   make: *** [sub-make] Error 2
--
>> drivers/net/ethernet/cavium/Kconfig:81: syntax error
>> drivers/net/ethernet/cavium/Kconfig:80: unknown option "imply"
   make[2]: *** [olddefconfig] Error 1
   make[2]: Target 'oldnoconfig' not remade because of errors.
   make[1]: *** [oldnoconfig] Error 2
   make: *** [sub-make] Error 2

vim +81 drivers/net/ethernet/cavium/Kconfig

d07a147f David Daney     2016-03-14  74  	  port on Cavium Networks' Octeon CN57XX, CN56XX, CN55XX,
d07a147f David Daney     2016-03-14  75  	  CN54XX, CN52XX, and CN6XXX chips.
d07a147f David Daney     2016-03-14  76  
111fc64a Raghu Vatsavayi 2016-11-28  77  config LIQUIDIO_VF
111fc64a Raghu Vatsavayi 2016-11-28  78  	tristate "Cavium LiquidIO VF support"
111fc64a Raghu Vatsavayi 2016-11-28  79  	depends on 64BIT && PCI_MSI
2d6e65ca Arnd Bergmann   2016-12-03 @80  	imply PTP_1588_CLOCK
111fc64a Raghu Vatsavayi 2016-11-28 @81  	---help---
111fc64a Raghu Vatsavayi 2016-11-28  82  	  This driver supports Cavium LiquidIO Intelligent Server Adapter
111fc64a Raghu Vatsavayi 2016-11-28  83  	  based on CN23XX chips.
111fc64a Raghu Vatsavayi 2016-11-28  84  

:: The code at line 81 was first introduced by commit
:: 111fc64a237f231bc2d3187bdf8358eb7966e6a9 liquidio CN23XX: VF registration

:: TO: Raghu Vatsavayi
:: CC: David S. Miller

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
[PATCH] net: ethernet: ti: cpdma: use desc_read in chan_process instead of raw read
There is a desc_read() macro to read desc fields, so there is no need
to use __raw_readl() directly.

Signed-off-by: Ivan Khoronzhuk
---
Based on net-next/master

 drivers/net/ethernet/ti/davinci_cpdma.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/ti/davinci_cpdma.c b/drivers/net/ethernet/ti/davinci_cpdma.c
index c776e45..d96dca5 100644
--- a/drivers/net/ethernet/ti/davinci_cpdma.c
+++ b/drivers/net/ethernet/ti/davinci_cpdma.c
@@ -1132,7 +1132,7 @@ static int __cpdma_chan_process(struct cpdma_chan *chan)
 	}
 	desc_dma = desc_phys(pool, desc);
 
-	status	= __raw_readl(&desc->hw_mode);
+	status	= desc_read(desc, hw_mode);
 	outlen	= status & 0x7ff;
 	if (status & CPDMA_DESC_OWNER) {
 		chan->stats.busy_dequeue++;
-- 
2.7.4
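
A rough userspace sketch of the accessor pattern the patch switches to, assuming desc_read() is a thin wrapper around a raw 32-bit read of the named field. Everything prefixed demo_ is hypothetical; only the hw_mode field name and the 0x7ff length mask come from the diff above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, shortened descriptor; the real layout is in davinci_cpdma.c. */
struct demo_desc {
	uint32_t hw_next;
	uint32_t hw_mode;
};

/* Stand-in for a raw MMIO read: a plain volatile load in userspace. */
static inline uint32_t demo_raw_read32(const volatile uint32_t *addr)
{
	return *addr;
}

/* Field accessor: names the field once, hides the address arithmetic. */
#define demo_desc_read(desc, fld)	demo_raw_read32(&(desc)->fld)

/* Extract the packet length from the status word, as the hunk above does. */
uint32_t demo_outlen(const struct demo_desc *desc)
{
	uint32_t status = demo_desc_read(desc, hw_mode);

	return status & 0x7ff;
}
```

Going through one accessor keeps every descriptor access in the same form, so a later change to how descriptors are read only has to touch the macro.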
Re: [PATCH net-next 2/4] mlx4: xdp: Allow raising MTU up to one page minus eth and vlan hdrs
On Fri, 2016-12-02 at 16:53 -0800, Alexei Starovoitov wrote:
> On 12/2/16 4:38 PM, Eric Dumazet wrote:
> > On Fri, 2016-12-02 at 15:23 -0800, Martin KaFai Lau wrote:
> >> When XDP prog is attached, it is currently limiting
> >> MTU to be FRAG_SZ0 - ETH_HLEN - (2 * VLAN_HLEN) which is 1514
> >> in x86.
> >>
> >> AFAICT, since mlx4 is doing one page per packet for XDP,
> >> we can at least raise the MTU limitation up to
> >> PAGE_SIZE - ETH_HLEN - (2 * VLAN_HLEN) which this patch is
> >> doing. It will be useful in the next patch which allows
> >> XDP program to extend the packet by adding new header(s).
> >>
> >> Signed-off-by: Martin KaFai Lau
> >> ---
> >
> > Have you tested your patch on a host with PAGE_SIZE = 64 KB ?
> >
> > Looks XDP really kills arches with bigger pages :(
>
> I'm afraid xdp mlx[45] support was not tested on arches
> with 64k pages at all. Not just this patch.
> I think people who care about such archs should test?
> Note page per packet is not a hard requirement for all drivers
> and all archs. For mlx[45] it was the easiest and the most
> convenient way to achieve desired performance.
> If there are ways to do the same performance differently,
> I'm all ears :)
>

My question was more like:

Can we double check that all these patches won't break the mlx4 driver
(non-XDP path) on arches with PAGE_SIZE=64KB?

I have no plans to use XDP for a while, but I certainly know some
customers are using mlx4 on powerpc.
[PATCH net-next v2 4/4] bnxt_en: Add PFC statistics.
Report PFC statistics to ethtool -S and DCBNL.

Signed-off-by: Michael Chan
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.h         | 7 +++
 drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c     | 14 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 23 ---
 3 files changed, 36 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index 2a714cf..b4abc1b 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -1124,6 +1124,13 @@ struct bnxt {
 	u32			lpi_tmr_hi;
 };
 
+#define BNXT_RX_STATS_OFFSET(counter)			\
+	(offsetof(struct rx_port_stats, counter) / 8)
+
+#define BNXT_TX_STATS_OFFSET(counter)			\
+	((offsetof(struct tx_port_stats, counter) +	\
+	  sizeof(struct rx_port_stats) + 512) / 8)
+
 #ifdef CONFIG_NET_RX_BUSY_POLL
 static inline void bnxt_enable_poll(struct bnxt_napi *bnapi)
 {
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c
index f391b47..fdf2d8c 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c
@@ -347,8 +347,10 @@ static int bnxt_dcbnl_ieee_setets(struct net_device *dev, struct ieee_ets *ets)
 static int bnxt_dcbnl_ieee_getpfc(struct net_device *dev, struct ieee_pfc *pfc)
 {
 	struct bnxt *bp = netdev_priv(dev);
+	__le64 *stats = (__le64 *)bp->hw_rx_port_stats;
 	struct ieee_pfc *my_pfc = bp->ieee_pfc;
-	int rc;
+	long rx_off, tx_off;
+	int i, rc;
 
 	pfc->pfc_cap = bp->max_lltc;
 
@@ -369,6 +371,16 @@ static int bnxt_dcbnl_ieee_getpfc(struct net_device *dev, struct ieee_pfc *pfc)
 	pfc->mbc = my_pfc->mbc;
 	pfc->delay = my_pfc->delay;
 
+	if (!stats)
+		return 0;
+
+	rx_off = BNXT_RX_STATS_OFFSET(rx_pfc_ena_frames_pri0);
+	tx_off = BNXT_TX_STATS_OFFSET(tx_pfc_ena_frames_pri0);
+	for (i = 0; i < IEEE_8021QAZ_MAX_TCS; i++, rx_off++, tx_off++) {
+		pfc->requests[i] = le64_to_cpu(*(stats + tx_off));
+		pfc->indications[i] = le64_to_cpu(*(stats + rx_off));
+	}
+
 	return 0;
 }
 
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
index fa6125e..784aa77 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
@@ -107,16 +107,9 @@ static int bnxt_set_coalesce(struct net_device *dev,
 
 #define BNXT_NUM_STATS	21
 
-#define BNXT_RX_STATS_OFFSET(counter)	\
-	(offsetof(struct rx_port_stats, counter) / 8)
-
 #define BNXT_RX_STATS_ENTRY(counter)	\
 	{ BNXT_RX_STATS_OFFSET(counter), __stringify(counter) }
 
-#define BNXT_TX_STATS_OFFSET(counter)			\
-	((offsetof(struct tx_port_stats, counter) +	\
-	  sizeof(struct rx_port_stats) + 512) / 8)
-
 #define BNXT_TX_STATS_ENTRY(counter)	\
 	{ BNXT_TX_STATS_OFFSET(counter), __stringify(counter) }
 
@@ -150,6 +143,14 @@ static int bnxt_set_coalesce(struct net_device *dev,
 	BNXT_RX_STATS_ENTRY(rx_tagged_frames),
 	BNXT_RX_STATS_ENTRY(rx_double_tagged_frames),
 	BNXT_RX_STATS_ENTRY(rx_good_frames),
+	BNXT_RX_STATS_ENTRY(rx_pfc_ena_frames_pri0),
+	BNXT_RX_STATS_ENTRY(rx_pfc_ena_frames_pri1),
+	BNXT_RX_STATS_ENTRY(rx_pfc_ena_frames_pri2),
+	BNXT_RX_STATS_ENTRY(rx_pfc_ena_frames_pri3),
+	BNXT_RX_STATS_ENTRY(rx_pfc_ena_frames_pri4),
+	BNXT_RX_STATS_ENTRY(rx_pfc_ena_frames_pri5),
+	BNXT_RX_STATS_ENTRY(rx_pfc_ena_frames_pri6),
+	BNXT_RX_STATS_ENTRY(rx_pfc_ena_frames_pri7),
 	BNXT_RX_STATS_ENTRY(rx_undrsz_frames),
 	BNXT_RX_STATS_ENTRY(rx_eee_lpi_events),
 	BNXT_RX_STATS_ENTRY(rx_eee_lpi_duration),
@@ -179,6 +180,14 @@ static int bnxt_set_coalesce(struct net_device *dev,
 	BNXT_TX_STATS_ENTRY(tx_fcs_err_frames),
 	BNXT_TX_STATS_ENTRY(tx_err),
 	BNXT_TX_STATS_ENTRY(tx_fifo_underruns),
+	BNXT_TX_STATS_ENTRY(tx_pfc_ena_frames_pri0),
+	BNXT_TX_STATS_ENTRY(tx_pfc_ena_frames_pri1),
+	BNXT_TX_STATS_ENTRY(tx_pfc_ena_frames_pri2),
+	BNXT_TX_STATS_ENTRY(tx_pfc_ena_frames_pri3),
+	BNXT_TX_STATS_ENTRY(tx_pfc_ena_frames_pri4),
+	BNXT_TX_STATS_ENTRY(tx_pfc_ena_frames_pri5),
+	BNXT_TX_STATS_ENTRY(tx_pfc_ena_frames_pri6),
+	BNXT_TX_STATS_ENTRY(tx_pfc_ena_frames_pri7),
 	BNXT_TX_STATS_ENTRY(tx_eee_lpi_events),
 	BNXT_TX_STATS_ENTRY(tx_eee_lpi_duration),
 	BNXT_TX_STATS_ENTRY(tx_total_collisions),
-- 
1.8.3.1
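
A small userspace sketch of how the two offset macros index one flat array of 64-bit counters, assuming the layout the macros imply: the RX block first, then a 512-byte gap, then the TX block. The demo_ structs are hypothetical and far shorter than the real ones in bnxt_hsi.h, the loop covers two priorities instead of IEEE_8021QAZ_MAX_TCS, and the le64_to_cpu() conversion is left out because the sketch reads host-endian test data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, shortened counter blocks (every counter is 8 bytes). */
struct demo_rx_port_stats {
	uint64_t rx_good_frames;
	uint64_t rx_pfc_pri0;
	uint64_t rx_pfc_pri1;
};

struct demo_tx_port_stats {
	uint64_t tx_good_frames;
	uint64_t tx_pfc_pri0;
	uint64_t tx_pfc_pri1;
};

/* Index of an RX counter, counted in 64-bit words from the buffer start. */
#define DEMO_RX_STATS_OFFSET(counter)				\
	(offsetof(struct demo_rx_port_stats, counter) / 8)

/* TX counters sit after the RX block plus a 512-byte gap. */
#define DEMO_TX_STATS_OFFSET(counter)				\
	((offsetof(struct demo_tx_port_stats, counter) +	\
	  sizeof(struct demo_rx_port_stats) + 512) / 8)

/* Walk consecutive per-priority counters, as the getpfc loop above does. */
void demo_get_pfc(const uint64_t *stats, uint64_t requests[2],
		  uint64_t indications[2])
{
	size_t rx_off = DEMO_RX_STATS_OFFSET(rx_pfc_pri0);
	size_t tx_off = DEMO_TX_STATS_OFFSET(tx_pfc_pri0);
	int i;

	for (i = 0; i < 2; i++, rx_off++, tx_off++) {
		requests[i] = stats[tx_off];	/* PFC frames sent */
		indications[i] = stats[rx_off];	/* PFC frames received */
	}
}
```

Incrementing the offsets inside the loop only works because the per-priority counters are adjacent fields of equal size; moving the macros into bnxt.h is what lets bnxt_dcb.c share that indexing with bnxt_ethtool.c.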
[PATCH net-next v2 3/4] bnxt_en: Implement DCBNL to support host-based DCBX.
Support only IEEE DCBX initially. Add IEEE DCBNL ops and functions to
get and set the hardware DCBX parameters. The DCB code is conditional on
Kconfig CONFIG_BNXT_DCB.

Signed-off-by: Michael Chan
---
 drivers/net/ethernet/broadcom/Kconfig         | 10 +
 drivers/net/ethernet/broadcom/bnxt/Makefile   | 2 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.c     | 8 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.h     | 9 +
 drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c | 490 ++
 drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.h | 41 +++
 6 files changed, 557 insertions(+), 3 deletions(-)
 create mode 100644 drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c
 create mode 100644 drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.h

diff --git a/drivers/net/ethernet/broadcom/Kconfig b/drivers/net/ethernet/broadcom/Kconfig
index bd8c80c..404c020 100644
--- a/drivers/net/ethernet/broadcom/Kconfig
+++ b/drivers/net/ethernet/broadcom/Kconfig
@@ -203,4 +203,14 @@ config BNXT_SRIOV
 	  Virtualization support in the NetXtreme-C/E products. This
 	  allows for virtual function acceleration in virtual environments.
 
+config BNXT_DCB
+	bool "Data Center Bridging (DCB) Support"
+	default n
+	depends on BNXT && DCB
+	---help---
+	  Say Y here if you want to use Data Center Bridging (DCB) in the
+	  driver.
+
+	  If unsure, say N.
+
 endif # NET_VENDOR_BROADCOM
diff --git a/drivers/net/ethernet/broadcom/bnxt/Makefile b/drivers/net/ethernet/broadcom/bnxt/Makefile
index 97e78e2..b233a86 100644
--- a/drivers/net/ethernet/broadcom/bnxt/Makefile
+++ b/drivers/net/ethernet/broadcom/bnxt/Makefile
@@ -1,3 +1,3 @@
 obj-$(CONFIG_BNXT) += bnxt_en.o
 
-bnxt_en-y := bnxt.o bnxt_sriov.o bnxt_ethtool.o
+bnxt_en-y := bnxt.o bnxt_sriov.o bnxt_ethtool.o bnxt_dcb.o
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 7ba5a99..e8ab5fd 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -54,6 +54,7 @@
 #include "bnxt.h"
 #include "bnxt_sriov.h"
 #include "bnxt_ethtool.h"
+#include "bnxt_dcb.h"
 
 #define BNXT_TX_TIMEOUT		(5 * HZ)
 
@@ -4997,7 +4998,7 @@ static void bnxt_enable_napi(struct bnxt *bp)
 	}
 }
 
-static void bnxt_tx_disable(struct bnxt *bp)
+void bnxt_tx_disable(struct bnxt *bp)
 {
 	int i;
 	struct bnxt_tx_ring_info *txr;
@@ -5015,7 +5016,7 @@ static void bnxt_tx_disable(struct bnxt *bp)
 	netif_carrier_off(bp->dev);
 }
 
-static void bnxt_tx_enable(struct bnxt *bp)
+void bnxt_tx_enable(struct bnxt *bp)
 {
 	int i;
 	struct bnxt_tx_ring_info *txr;
@@ -6686,6 +6687,7 @@ static void bnxt_remove_one(struct pci_dev *pdev)
 
 	bnxt_hwrm_func_drv_unrgtr(bp);
 	bnxt_free_hwrm_resources(bp);
+	bnxt_dcb_free(bp);
 	pci_iounmap(pdev, bp->bar2);
 	pci_iounmap(pdev, bp->bar1);
 	pci_iounmap(pdev, bp->bar0);
@@ -6913,6 +6915,8 @@ static int bnxt_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 	dev->min_mtu = ETH_ZLEN;
 	dev->max_mtu = 9500;
 
+	bnxt_dcb_init(bp);
+
 #ifdef CONFIG_BNXT_SRIOV
 	init_waitqueue_head(&bp->sriov_cfg_wait);
 #endif
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index 1f3d852..2a714cf 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -1026,6 +1026,13 @@ struct bnxt {
 	struct bnxt_irq	*irq_tbl;
 
 	u8			mac_addr[ETH_ALEN];
 
+#ifdef CONFIG_BNXT_DCB
+	struct ieee_pfc		*ieee_pfc;
+	struct ieee_ets		*ieee_ets;
+	u8			dcbx_cap;
+	u8			default_pri;
+#endif /* CONFIG_BNXT_DCB */
+
 	u32			msg_enable;
 
 	u32			hwrm_spec_code;
@@ -1221,6 +1228,8 @@ static inline void bnxt_disable_poll(struct bnxt_napi *bnapi)
 int hwrm_send_message_silent(struct bnxt *, void *, u32, int);
 int bnxt_hwrm_set_coal(struct bnxt *);
 int bnxt_hwrm_func_qcaps(struct bnxt *);
+void bnxt_tx_disable(struct bnxt *bp);
+void bnxt_tx_enable(struct bnxt *bp);
 int bnxt_hwrm_set_pause(struct bnxt *);
 int bnxt_hwrm_set_link_setting(struct bnxt *, bool, bool);
 int bnxt_hwrm_fw_set_time(struct bnxt *);
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c
new file mode 100644
index 000..f391b47
--- /dev/null
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c
@@ -0,0 +1,490 @@
+/* Broadcom NetXtreme-C/E network driver.
+ *
+ * Copyright (c) 2014-2016 Broadcom Corporation
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation.
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include "b
[PATCH net-next v2 0/4] bnxt_en: Add DCBNL support.
This series adds DCBNL operations to support host-based IEEE DCBX.

v2: Updated to the latest firmware interface spec.

David, please consider this series for net-next.

Michael Chan (4):
  bnxt_en: Re-factor bnxt_setup_tc().
  bnxt_en: Update firmware header file to latest 1.6.0.
  bnxt_en: Implement DCBNL to support host-based DCBX.
  bnxt_en: Add PFC statistics.

 drivers/net/ethernet/broadcom/Kconfig             | 10 +
 drivers/net/ethernet/broadcom/bnxt/Makefile       | 2 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.c         | 54 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.h         | 22 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c     | 502 ++
 drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.h     | 41 +
 drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 23 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt_hsi.h     | 1725 +
 drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c   | 8 +-
 9 files changed, 1672 insertions(+), 715 deletions(-)
 create mode 100644 drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c
 create mode 100644 drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.h

-- 
1.8.3.1
[PATCH net-next v2 1/4] bnxt_en: Re-factor bnxt_setup_tc().
Add a new function bnxt_setup_mq_tc() to handle MQPRIO. This new function
will be called during ETS setup when we add DCBNL in the next patch.

Signed-off-by: Michael Chan
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 18 ++
 drivers/net/ethernet/broadcom/bnxt/bnxt.h | 1 +
 2 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 0e4f168..7664281 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -6337,17 +6337,10 @@ static int bnxt_change_mtu(struct net_device *dev, int new_mtu)
 	return 0;
 }
 
-static int bnxt_setup_tc(struct net_device *dev, u32 handle, __be16 proto,
-			 struct tc_to_netdev *ntc)
+int bnxt_setup_mq_tc(struct net_device *dev, u8 tc)
 {
 	struct bnxt *bp = netdev_priv(dev);
 	bool sh = false;
-	u8 tc;
-
-	if (ntc->type != TC_SETUP_MQPRIO)
-		return -EINVAL;
-
-	tc = ntc->tc;
 
 	if (tc > bp->max_tc) {
 		netdev_err(dev, "too many traffic classes requested: %d Max supported is %d\n",
@@ -6390,6 +6383,15 @@ static int bnxt_setup_tc(struct net_device *dev, u32 handle, __be16 proto,
 	return 0;
 }
 
+static int bnxt_setup_tc(struct net_device *dev, u32 handle, __be16 proto,
+			 struct tc_to_netdev *ntc)
+{
+	if (ntc->type != TC_SETUP_MQPRIO)
+		return -EINVAL;
+
+	return bnxt_setup_mq_tc(dev, ntc->tc);
+}
+
 #ifdef CONFIG_RFS_ACCEL
 static bool bnxt_fltr_match(struct bnxt_ntuple_filter *f1,
 			    struct bnxt_ntuple_filter *f2)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index 47be789..fcd07ee 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -1225,5 +1225,6 @@ static inline void bnxt_disable_poll(struct bnxt_napi *bnapi)
 int bnxt_hwrm_fw_set_time(struct bnxt *);
 int bnxt_open_nic(struct bnxt *, bool, bool);
 int bnxt_close_nic(struct bnxt *, bool, bool);
+int bnxt_setup_mq_tc(struct net_device *dev, u8 tc);
 int bnxt_get_max_rings(struct bnxt *, int *, int *, bool);
 #endif
-- 
1.8.3.1
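
The shape of this refactor, reduced to a userspace sketch: the request-type check stays in a thin wrapper, and the work moves into a function that takes the traffic-class count directly so a second caller can reuse it. All demo_ names are hypothetical, and returning -EINVAL for an oversized count is an assumption, since that part of the function is not visible in the hunk above.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical request and device types standing in for the kernel ones. */
enum demo_tc_type { DEMO_TC_SETUP_MQPRIO, DEMO_TC_SETUP_OTHER };

struct demo_tc_request {
	enum demo_tc_type type;
	uint8_t tc;
};

struct demo_dev {
	uint8_t max_tc;
	uint8_t num_tc;
};

/* Core logic: usable by any caller that already knows the class count. */
int demo_setup_mq_tc(struct demo_dev *dev, uint8_t tc)
{
	if (tc > dev->max_tc)
		return -EINVAL;	/* assumed error code for this sketch */

	dev->num_tc = tc;
	return 0;
}

/* Thin wrapper: validates the request type, then delegates. */
int demo_setup_tc(struct demo_dev *dev, const struct demo_tc_request *req)
{
	if (req->type != DEMO_TC_SETUP_MQPRIO)
		return -EINVAL;

	return demo_setup_mq_tc(dev, req->tc);
}
```

A DCB-style caller can invoke demo_setup_mq_tc() directly without building a request structure, which is the reuse the commit message describes for ETS setup.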
[PATCH net-next v2 2/4] bnxt_en: Update firmware header file to latest 1.6.0.
Latest interface has the latest DCB command structs. Get and store the
max number of lossless TCs the hardware can support.

Signed-off-by: Michael Chan
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c       | 28 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.h       | 5 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt_hsi.h   | 1725 ++-
 drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c | 8 +-
 4 files changed, 1069 insertions(+), 697 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 7664281..7ba5a99 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -186,11 +186,11 @@ enum board_idx {
 };
 
 static const u16 bnxt_async_events_arr[] = {
-	HWRM_ASYNC_EVENT_CMPL_EVENT_ID_LINK_STATUS_CHANGE,
-	HWRM_ASYNC_EVENT_CMPL_EVENT_ID_PF_DRVR_UNLOAD,
-	HWRM_ASYNC_EVENT_CMPL_EVENT_ID_PORT_CONN_NOT_ALLOWED,
-	HWRM_ASYNC_EVENT_CMPL_EVENT_ID_VF_CFG_CHANGE,
-	HWRM_ASYNC_EVENT_CMPL_EVENT_ID_LINK_SPEED_CFG_CHANGE,
+	ASYNC_EVENT_CMPL_EVENT_ID_LINK_STATUS_CHANGE,
+	ASYNC_EVENT_CMPL_EVENT_ID_PF_DRVR_UNLOAD,
+	ASYNC_EVENT_CMPL_EVENT_ID_PORT_CONN_NOT_ALLOWED,
+	ASYNC_EVENT_CMPL_EVENT_ID_VF_CFG_CHANGE,
+	ASYNC_EVENT_CMPL_EVENT_ID_LINK_SPEED_CFG_CHANGE,
 };
 
 static bool bnxt_vf_pciid(enum board_idx idx)
@@ -1476,8 +1476,8 @@ static int bnxt_rx_pkt(struct bnxt *bp, struct bnxt_napi *bnapi, u32 *raw_cons,
 }
 
 #define BNXT_GET_EVENT_PORT(data)	\
-	((data) &			\
-	 HWRM_ASYNC_EVENT_CMPL_PORT_CONN_NOT_ALLOWED_EVENT_DATA1_PORT_ID_MASK)
+	((data) &			\
+	 ASYNC_EVENT_CMPL_PORT_CONN_NOT_ALLOWED_EVENT_DATA1_PORT_ID_MASK)
 
 static int bnxt_async_event_process(struct bnxt *bp,
 				    struct hwrm_async_event_cmpl *cmpl)
@@ -1486,7 +1486,7 @@ static int bnxt_async_event_process(struct bnxt *bp,
 
 	/* TODO CHIMP_FW: Define event id's for link change, error etc */
 	switch (event_id) {
-	case HWRM_ASYNC_EVENT_CMPL_EVENT_ID_LINK_SPEED_CFG_CHANGE: {
+	case ASYNC_EVENT_CMPL_EVENT_ID_LINK_SPEED_CFG_CHANGE: {
 		u32 data1 = le32_to_cpu(cmpl->event_data1);
 		struct bnxt_link_info *link_info = &bp->link_info;
 
@@ -1502,13 +1502,13 @@ static int bnxt_async_event_process(struct bnxt *bp,
 		set_bit(BNXT_LINK_SPEED_CHNG_SP_EVENT, &bp->sp_event);
 		/* fall thru */
 	}
-	case HWRM_ASYNC_EVENT_CMPL_EVENT_ID_LINK_STATUS_CHANGE:
+	case ASYNC_EVENT_CMPL_EVENT_ID_LINK_STATUS_CHANGE:
 		set_bit(BNXT_LINK_CHNG_SP_EVENT, &bp->sp_event);
 		break;
-	case HWRM_ASYNC_EVENT_CMPL_EVENT_ID_PF_DRVR_UNLOAD:
+	case ASYNC_EVENT_CMPL_EVENT_ID_PF_DRVR_UNLOAD:
 		set_bit(BNXT_HWRM_PF_UNLOAD_SP_EVENT, &bp->sp_event);
 		break;
-	case HWRM_ASYNC_EVENT_CMPL_EVENT_ID_PORT_CONN_NOT_ALLOWED: {
+	case ASYNC_EVENT_CMPL_EVENT_ID_PORT_CONN_NOT_ALLOWED: {
 		u32 data1 = le32_to_cpu(cmpl->event_data1);
 		u16 port_id = BNXT_GET_EVENT_PORT(data1);
 
@@ -1521,7 +1521,7 @@ static int bnxt_async_event_process(struct bnxt *bp,
 		set_bit(BNXT_HWRM_PORT_MODULE_SP_EVENT, &bp->sp_event);
 		break;
 	}
-	case HWRM_ASYNC_EVENT_CMPL_EVENT_ID_VF_CFG_CHANGE:
+	case ASYNC_EVENT_CMPL_EVENT_ID_VF_CFG_CHANGE:
 		if (BNXT_PF(bp))
 			goto async_event_process_exit;
 		set_bit(BNXT_RESET_TASK_SILENT_SP_EVENT, &bp->sp_event);
@@ -4261,12 +4261,16 @@ static int bnxt_hwrm_queue_qportcfg(struct bnxt *bp)
 		goto qportcfg_exit;
 	}
 	bp->max_tc = resp->max_configurable_queues;
+	bp->max_lltc = resp->max_configurable_lossless_queues;
 	if (bp->max_tc > BNXT_MAX_QUEUE)
 		bp->max_tc = BNXT_MAX_QUEUE;
 
 	if (resp->queue_cfg_info & QUEUE_QPORTCFG_RESP_QUEUE_CFG_INFO_ASYM_CFG)
 		bp->max_tc = 1;
 
+	if (bp->max_lltc > bp->max_tc)
+		bp->max_lltc = bp->max_tc;
+
 	qptr = &resp->queue_id0;
 	for (i = 0; i < bp->max_tc; i++) {
 		bp->q_info[i].queue_id = *qptr++;
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index fcd07ee..1f3d852 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -11,10 +11,10 @@
 #define BNXT_H
 
 #define DRV_MODULE_NAME		"bnxt_en"
-#define DRV_MODULE_VERSION	"1.5.0"
+#define DRV_MODULE_VERSION	"1.6.0"
 
 #define DRV_VER_MAJ	1
-#define DRV_VER_MIN	5
+#define DRV_VER_MIN	6
 #define DRV_VER_UPD	0
 
 struct tx_bd {
@@ -1010,6 +1010,7 @@ struct bnxt {
 	u32			rss_hash_cfg;
 
 	u8			max_tc;
+	u8
Re: [PATCH] net: wireless: realtek: constify rate_control_ops structures
On Sat, Dec 3, 2016 at 2:09 AM, Larry Finger wrote:
> On 12/02/2016 03:50 AM, Bhumika Goyal wrote:
>>
>> The structures rate_control_ops are only passed as an argument to the
>> functions ieee80211_rate_control_{register/unregister}. This argument is
>> of type const, so rate_control_ops having this property can also be
>> declared as const.
>> Done using Coccinelle:
>>
>> @r1 disable optional_qualifier @
>> identifier i;
>> position p;
>> @@
>> static struct rate_control_ops i@p = {...};
>>
>> @ok1@
>> identifier r1.i;
>> position p;
>> @@
>> ieee80211_rate_control_register(&i@p)
>>
>> @ok2@
>> identifier r1.i;
>> position p;
>> @@
>> ieee80211_rate_control_unregister(&i@p)
>>
>> @bad@
>> position p!={r1.p,ok1.p,ok2.p};
>> identifier r1.i;
>> @@
>> i@p
>>
>> @depends on !bad disable optional_qualifier@
>> identifier r1.i;
>> @@
>> static
>> +const
>> struct rate_control_ops i={...};
>>
>> @depends on !bad disable optional_qualifier@
>> identifier r1.i;
>> @@
>> +const
>> struct rate_control_ops i;
>>
>> File size before:
>>    text    data     bss     dec     hex filename
>>    1991     104       0    2095     82f wireless/realtek/rtlwifi/rc.o
>>
>> File size after:
>>    text    data     bss     dec     hex filename
>>    2095       0       0    2095     82f wireless/realtek/rtlwifi/rc.o
>>
>> Signed-off-by: Bhumika Goyal
>> ---
>>  drivers/net/wireless/realtek/rtlwifi/rc.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/wireless/realtek/rtlwifi/rc.c
>> b/drivers/net/wireless/realtek/rtlwifi/rc.c
>> index ce8621a..107c13c 100644
>> --- a/drivers/net/wireless/realtek/rtlwifi/rc.c
>> +++ b/drivers/net/wireless/realtek/rtlwifi/rc.c
>> @@ -284,7 +284,7 @@ static void rtl_rate_free_sta(void *rtlpriv,
>>         kfree(rate_priv);
>>  }
>>
>> -static struct rate_control_ops rtl_rate_ops = {
>> +static const struct rate_control_ops rtl_rate_ops = {
>>         .name = "rtl_rc",
>>         .alloc = rtl_rate_alloc,
>>         .free = rtl_rate_free,
>>
>
> The content of your patch is OK; however, your subject is not. By
> convention, "net: wireless: realtek:" is assumed. We do, however, include
> "rtlwifi:" to indicate which part of drivers/net/wireless/realtek/ is
> referenced.
>

Ok, I will send a v2 with the correct subject. Thanks for the input.

Thanks,
Bhumika

> NACK
>
> Larry
>
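
For readers unfamiliar with the constification pattern, a minimal userspace sketch follows. The demo_ types and functions are hypothetical and only mirror the shape of rate_control_ops and its register/unregister calls: because both take a pointer to const, the ops table itself can be declared const and placed in read-only memory.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ops table: a name plus one callback. */
struct demo_rate_ops {
	const char *name;
	int (*get_rate)(int idx);
};

static int demo_get_rate(int idx)
{
	return idx * 2;
}

/* Never written after initialization, so it can be const. */
static const struct demo_rate_ops demo_ops = {
	.name = "demo_rc",
	.get_rate = demo_get_rate,
};

static const struct demo_rate_ops *registered;

/* Both entry points accept a pointer to const, like the mac80211 ones. */
int demo_register(const struct demo_rate_ops *ops)
{
	if (registered)
		return -1;
	registered = ops;
	return 0;
}

void demo_unregister(const struct demo_rate_ops *ops)
{
	if (registered == ops)
		registered = NULL;
}

const struct demo_rate_ops *demo_default_ops(void)
{
	return &demo_ops;
}
```

This matches the size output quoted above: the 104 bytes leave the data column and appear in text (1991 + 104 = 2095), while the total stays the same.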
Re: [PATCH 2/3] uapi: export tc_skbmod.h
Hi Stephen,

[auto build test ERROR on linus/master]
[also build test ERROR on v4.9-rc7]
[cannot apply to next-20161202]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Stephen-Hemminger/UAPI-export-missing-headers/20161203-104831
config: i386-tinyconfig (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386

All errors (new ones prefixed by >>):

>> scripts/Makefile.headersinst:55: *** Missing UAPI file
>> include/uapi/linux/tc_act/tc_sbkmod.h. Stop.
--
>> scripts/Makefile.headersinst:55: *** Missing UAPI file
>> include/uapi/linux/tc_act/tc_sbkmod.h. Stop.
   make[3]: *** [tc_act] Error 2
   make[3]: Target '__headersinst' not remade because of errors.
   make[2]: *** [linux] Error 2
   make[2]: Target '__headersinst' not remade because of errors.
   make[1]: *** [headers_install] Error 2
   make: *** [sub-make] Error 2

vim +55 scripts/Makefile.headersinst

d8ecc5cd Sam Ravnborg    2011-04-27  39  
10b63956 David Howells   2012-10-02  40  srcdir := $(srctree)/$(obj)
10b63956 David Howells   2012-10-02  41  gendir := $(objtree)/$(gen)
10b63956 David Howells   2012-10-02  42  
10b63956 David Howells   2012-10-02  43  oldsrcdir := $(srctree)/$(subst /uapi,,$(obj))
10b63956 David Howells   2012-10-02  44  
7712401a Sam Ravnborg    2008-06-15  45  # all headers files for this dir
d8ecc5cd Sam Ravnborg    2011-04-27  46  header-y := $(filter-out $(generic-y), $(header-y))
40f1d4c2 David Howells   2012-10-02  47  all-files := $(header-y) $(genhdr-y) $(wrapper-files)
10b63956 David Howells   2012-10-02  48  output-files := $(addprefix $(installdir)/, $(all-files))
10b63956 David Howells   2012-10-02  49  
c0ff68f1 Nicolas Dichtel 2013-04-29  50  input-files1 := $(foreach hdr, $(header-y), \
c4619bc6 Sam Ravnborg    2013-03-04  51  	$(if $(wildcard $(srcdir)/$(hdr)), \
c0ff68f1 Nicolas Dichtel 2013-04-29  52  		$(wildcard $(srcdir)/$(hdr))) \
c0ff68f1 Nicolas Dichtel 2013-04-29  53  	)
c0ff68f1 Nicolas Dichtel 2013-04-29  54  input-files1-name := $(notdir $(input-files1))
c0ff68f1 Nicolas Dichtel 2013-04-29 @55  input-files2 := $(foreach hdr, $(header-y), \
c0ff68f1 Nicolas Dichtel 2013-04-29  56  	$(if $(wildcard $(srcdir)/$(hdr)),, \
c4619bc6 Sam Ravnborg    2013-03-04  57  		$(if $(wildcard $(oldsrcdir)/$(hdr)), \
10b63956 David Howells   2012-10-02  58  			$(wildcard $(oldsrcdir)/$(hdr)), \
c4619bc6 Sam Ravnborg    2013-03-04  59  			$(error Missing UAPI file $(srcdir)/$(hdr))) \
c0ff68f1 Nicolas Dichtel 2013-04-29  60  	))
c0ff68f1 Nicolas Dichtel 2013-04-29  61  input-files2-name := $(notdir $(input-files2))
c0ff68f1 Nicolas Dichtel 2013-04-29  62  input-files3 := $(foreach hdr, $(genhdr-y), \
c4619bc6 Sam Ravnborg    2013-03-04  63  	$(if $(wildcard $(gendir)/$(hdr)), \

:: The code at line 55 was first introduced by commit
:: c0ff68f1611d6855a06d672989ad5cfea160a4eb kbuild: fix make headers_install when path is too long

:: TO: Nicolas Dichtel
:: CC: Michal Marek

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

.config.gz
Description: application/gzip
Re: [net-next PATCH v4 5/6] virtio_net: add XDP_TX support
On 16-12-02 12:51 PM, John Fastabend wrote:
> This adds support for the XDP_TX action to virtio_net. When an XDP
> program is run and returns the XDP_TX action the virtio_net XDP
> implementation will transmit the packet on a TX queue that aligns
> with the current CPU that the XDP packet was processed on.
>
> Before sending the packet the header is zeroed. Also XDP is expected
> to handle checksum correctly so no checksum offload support is
> provided.
>
> Signed-off-by: John Fastabend
> ---
>  drivers/net/virtio_net.c | 63
> --
>  1 file changed, 60 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index b67203e..137caba 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -330,12 +330,43 @@ static struct sk_buff *page_to_skb(struct virtnet_info
> *vi,
>  	return skb;
>  }
>
> +static void virtnet_xdp_xmit(struct virtnet_info *vi,
> +			     unsigned int qnum, struct xdp_buff *xdp)
> +{
> +	struct send_queue *sq = &vi->sq[qnum];
> +	struct virtio_net_hdr_mrg_rxbuf *hdr;
> +	unsigned int num_sg, len;
> +	void *xdp_sent;
> +	int err;
> +
> +	/* Free up any pending old buffers before queueing new ones. */
> +	while ((xdp_sent = virtqueue_get_buf(sq->vq, &len)) != NULL) {
> +		struct page *page = virt_to_head_page(xdp_sent);
> +
> +		put_page(page);
> +	}
> +
> +	/* Zero header and leave csum up to XDP layers */
> +	hdr = xdp->data;
> +	memset(hdr, 0, vi->hdr_len);
> +
> +	num_sg = 1;
> +	sg_init_one(sq->sg, xdp->data, xdp->data_end - xdp->data);
> +	err = virtqueue_add_outbuf(sq->vq, sq->sg, num_sg,
> +				   xdp->data, GFP_ATOMIC);
> +	if (unlikely(err))
> +		put_page(virt_to_head_page(xdp->data));
> +	else
> +		virtqueue_kick(sq->vq);
> +}
> +

Hi Michael,

Any idea why the above pattern

> +	err = virtqueue_add_outbuf(sq->vq, sq->sg, num_sg,
> +				   xdp->data, GFP_ATOMIC);
> +	if (unlikely(err))
> +		put_page(virt_to_head_page(xdp->data));
> +	else
> +		virtqueue_kick(sq->vq);
> +}

would cause a hang but if I call the virtqueue_kick as below even in
the error case everything seems to be fine.

	err = virtqueue_add_outbuf(sq->vq, sq->sg, num_sg,
				   xdp->data, GFP_ATOMIC);
	if (unlikely(err))
		put_page(virt_to_head_page(xdp->data));
	virtqueue_kick(sq->vq);

I'll take a look through the virtio code but thought I might ask in
case you know off-hand or it could be something else entirely. I noticed
virtio_input.c uses the second pattern and virtio_net.c uses the above
pattern but I'm guessing it never gets exercised due to stack backoff.

Thanks,
John
[PATCH v3 net-next 3/4] net: dsa: mv88e6xxx: Move the tagging protocol into info
Older chips support a single tagging protocol, DSA. New chips support both DSA and EDSA, an enhanced version. Having both as an option changes the register layouts. Up until now, it has been assumed that if EDSA is supported, it will be used. Hence the register layout has been determined by which protocol should be used. However, mv88e6390 has a different implementation of EDSA, which requires us to use DSA tagging. Hence separate the selection of the protocol from the register layout. Signed-off-by: Andrew Lunn Reviewed-by: Vivien Didelot --- drivers/net/dsa/mv88e6xxx/chip.c | 33 +++-- drivers/net/dsa/mv88e6xxx/mv88e6xxx.h | 17 - 2 files changed, 31 insertions(+), 19 deletions(-) diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c index 6e981bedd028..80efee6f5e16 100644 --- a/drivers/net/dsa/mv88e6xxx/chip.c +++ b/drivers/net/dsa/mv88e6xxx/chip.c @@ -2482,7 +2482,7 @@ static int mv88e6xxx_setup_port(struct mv88e6xxx_chip *chip, int port) PORT_CONTROL_USE_TAG | PORT_CONTROL_USE_IP | PORT_CONTROL_STATE_FORWARDING; if (dsa_is_cpu_port(ds, port)) { - if (mv88e6xxx_has(chip, MV88E6XXX_FLAG_EDSA)) + if (chip->info->tag_protocol == DSA_TAG_PROTO_EDSA) reg |= PORT_CONTROL_FRAME_ETHER_TYPE_DSA | PORT_CONTROL_FORWARD_UNKNOWN_MC; else @@ -2611,7 +2611,7 @@ static int mv88e6xxx_setup_port(struct mv88e6xxx_chip *chip, int port) /* Port Ethertype: use the Ethertype DSA Ethertype * value.
*/ - if (mv88e6xxx_has(chip, MV88E6XXX_FLAG_EDSA)) { + if (chip->info->tag_protocol == DSA_TAG_PROTO_EDSA) { err = mv88e6xxx_port_write(chip, port, PORT_ETH_TYPE, ETH_P_EDSA); if (err) @@ -3637,6 +3637,7 @@ static const struct mv88e6xxx_info mv88e6xxx_table[] = { .global1_addr = 0x1b, .age_time_coeff = 15000, .g1_irqs = 8, + .tag_protocol = DSA_TAG_PROTO_DSA, .flags = MV88E6XXX_FLAGS_FAMILY_6097, .ops = &mv88e6085_ops, }, @@ -3651,6 +3652,7 @@ static const struct mv88e6xxx_info mv88e6xxx_table[] = { .global1_addr = 0x1b, .age_time_coeff = 15000, .g1_irqs = 8, + .tag_protocol = DSA_TAG_PROTO_DSA, .flags = MV88E6XXX_FLAGS_FAMILY_6095, .ops = &mv88e6095_ops, }, @@ -3679,6 +3681,7 @@ static const struct mv88e6xxx_info mv88e6xxx_table[] = { .global1_addr = 0x1b, .age_time_coeff = 15000, .g1_irqs = 9, + .tag_protocol = DSA_TAG_PROTO_DSA, .flags = MV88E6XXX_FLAGS_FAMILY_6165, .ops = &mv88e6123_ops, }, @@ -3693,6 +3696,7 @@ static const struct mv88e6xxx_info mv88e6xxx_table[] = { .global1_addr = 0x1b, .age_time_coeff = 15000, .g1_irqs = 9, + .tag_protocol = DSA_TAG_PROTO_DSA, .flags = MV88E6XXX_FLAGS_FAMILY_6185, .ops = &mv88e6131_ops, }, @@ -3707,6 +3711,7 @@ static const struct mv88e6xxx_info mv88e6xxx_table[] = { .global1_addr = 0x1b, .age_time_coeff = 15000, .g1_irqs = 9, + .tag_protocol = DSA_TAG_PROTO_DSA, .flags = MV88E6XXX_FLAGS_FAMILY_6165, .ops = &mv88e6161_ops, }, @@ -3721,6 +3726,7 @@ static const struct mv88e6xxx_info mv88e6xxx_table[] = { .global1_addr = 0x1b, .age_time_coeff = 15000, .g1_irqs = 9, + .tag_protocol = DSA_TAG_PROTO_DSA, .flags = MV88E6XXX_FLAGS_FAMILY_6165, .ops = &mv88e6165_ops, }, @@ -3735,6 +3741,7 @@ static const struct mv88e6xxx_info mv88e6xxx_table[] = { .global1_addr = 0x1b, .age_time_coeff = 15000, .g1_irqs = 9, + .tag_protocol = DSA_TAG_PROTO_EDSA, .flags = MV88E6XXX_FLAGS_FAMILY_6351, .ops = &mv88e6171_ops, }, @@ -3749,6 +3756,7 @@ static const struct mv88e6xxx_info mv88e6xxx_table[] = { .global1_addr = 0x1b, .age_time_coeff = 15000, 
.g1_irqs = 9, + .tag_protocol = DSA_TAG_PROTO_EDSA, .flags = MV88E6XXX_FLAGS_FAMILY_6352, .ops = &mv88e6172_ops, }, @@ -3763,6 +3771,7 @@ static const struct mv88e6xxx_info mv88e6xxx_table[] = { .global1_addr = 0x1b, .age_time_coeff = 15000, .g1_irqs = 9, + .tag_protocol = DSA_TAG_PROTO_EDSA, .
[PATCH v3 net-next 4/4] net: dsa: mv88e6xxx: Refactor CPU and DSA port setup
Older chips only support DSA tagging. Newer chips have both DSA and EDSA tagging. Refactor the code by adding port functions for setting the frame mode, egress mode, and whether to forward unknown frames. This results in the helper mv88e6xxx_6065_family() becoming unused, so remove it. Signed-off-by: Andrew Lunn v3: Verify mandatory ops for port setup Don't set ether type for DSA port. --- drivers/net/dsa/mv88e6xxx/chip.c | 217 ++ drivers/net/dsa/mv88e6xxx/mv88e6xxx.h | 20 drivers/net/dsa/mv88e6xxx/port.c | 118 ++ drivers/net/dsa/mv88e6xxx/port.h | 13 ++ 4 files changed, 319 insertions(+), 49 deletions(-) diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c index 80efee6f5e16..9c14aaad5103 100644 --- a/drivers/net/dsa/mv88e6xxx/chip.c +++ b/drivers/net/dsa/mv88e6xxx/chip.c @@ -677,11 +677,6 @@ static int mv88e6xxx_phy_ppu_write(struct mv88e6xxx_chip *chip, int addr, return err; } -static bool mv88e6xxx_6065_family(struct mv88e6xxx_chip *chip) -{ - return chip->info->family == MV88E6XXX_FAMILY_6065; -} - static bool mv88e6xxx_6095_family(struct mv88e6xxx_chip *chip) { return chip->info->family == MV88E6XXX_FAMILY_6095; @@ -2438,6 +2433,72 @@ static int mv88e6xxx_serdes_power_on(struct mv88e6xxx_chip *chip) return err; } +static int mv88e6xxx_setup_port_dsa(struct mv88e6xxx_chip *chip, int port, + int upstream_port) +{ + int err; + + err = chip->info->ops->port_set_frame_mode( + chip, port, MV88E6XXX_FRAME_MODE_DSA); + if (err) + return err; + + return chip->info->ops->port_set_egress_unknowns( + chip, port, port == upstream_port); +} + +static int mv88e6xxx_setup_port_cpu(struct mv88e6xxx_chip *chip, int port) +{ + int err; + + switch (chip->info->tag_protocol) { + case DSA_TAG_PROTO_EDSA: + err = chip->info->ops->port_set_frame_mode( + chip, port, MV88E6XXX_FRAME_MODE_ETHERTYPE); + if (err) + return err; + + err = mv88e6xxx_port_set_egress_mode( + chip, port, PORT_CONTROL_EGRESS_ADD_TAG); + if (err) + return err; + + if
(chip->info->ops->port_set_ether_type) + err = chip->info->ops->port_set_ether_type( + chip, port, ETH_P_EDSA); + break; + + case DSA_TAG_PROTO_DSA: + err = chip->info->ops->port_set_frame_mode( + chip, port, MV88E6XXX_FRAME_MODE_DSA); + if (err) + return err; + + err = mv88e6xxx_port_set_egress_mode( + chip, port, PORT_CONTROL_EGRESS_UNMODIFIED); + break; + default: + err = -EINVAL; + } + + if (err) + return err; + + return chip->info->ops->port_set_egress_unknowns(chip, port, true); +} + +static int mv88e6xxx_setup_port_normal(struct mv88e6xxx_chip *chip, int port) +{ + int err; + + err = chip->info->ops->port_set_frame_mode( + chip, port, MV88E6XXX_FRAME_MODE_NORMAL); + if (err) + return err; + + return chip->info->ops->port_set_egress_unknowns(chip, port, false); +} + static int mv88e6xxx_setup_port(struct mv88e6xxx_chip *chip, int port) { struct dsa_switch *ds = chip->ds; @@ -2473,44 +2534,23 @@ static int mv88e6xxx_setup_port(struct mv88e6xxx_chip *chip, int port) * If this is the upstream port for this switch, enable * forwarding of unknown unicasts and multicasts. */ - reg = 0; - if (mv88e6xxx_6352_family(chip) || mv88e6xxx_6351_family(chip) || - mv88e6xxx_6165_family(chip) || mv88e6xxx_6097_family(chip) || - mv88e6xxx_6095_family(chip) || mv88e6xxx_6065_family(chip) || - mv88e6xxx_6185_family(chip) || mv88e6xxx_6320_family(chip)) - reg = PORT_CONTROL_IGMP_MLD_SNOOP | + reg = PORT_CONTROL_IGMP_MLD_SNOOP | PORT_CONTROL_USE_TAG | PORT_CONTROL_USE_IP | PORT_CONTROL_STATE_FORWARDING; - if (dsa_is_cpu_port(ds, port)) { - if (chip->info->tag_protocol == DSA_TAG_PROTO_EDSA) - reg |= PORT_CONTROL_FRAME_ETHER_TYPE_DSA | - PORT_CONTROL_FORWARD_UNKNOWN_MC; - else - reg |= PORT_CONTROL_DSA_TAG; - reg |= PORT_CONTROL_EGRESS_ADD_TAG | - PORT_CONTROL_FORWARD_UNKNOWN; - } - if (dsa_is_dsa_port(ds, port)) { - if (mv88e6xxx_6095_family(chip) || - mv88e6xxx_6185_family(chip)) - reg |= PORT_CONTROL_DSA_TAG; - if (mv88e6xxx_6352_family(chip) || -
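The CPU-port selection in mv88e6xxx_setup_port_cpu() above can be read as a small lookup from tagging protocol to frame mode and egress mode. The following is only a userspace sketch of that mapping, not the driver code: every sketch_* name is a hypothetical stand-in for the kernel enums and helpers quoted in the patch, and no register is actually written.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for DSA_TAG_PROTO_*, MV88E6XXX_FRAME_MODE_* and
 * PORT_CONTROL_EGRESS_* from the patch; the values are arbitrary. */
enum sketch_tag_proto { SKETCH_TAG_PROTO_DSA, SKETCH_TAG_PROTO_EDSA,
			SKETCH_TAG_PROTO_NONE };
enum sketch_frame_mode { SKETCH_FRAME_MODE_NORMAL, SKETCH_FRAME_MODE_DSA,
			 SKETCH_FRAME_MODE_ETHERTYPE };
enum sketch_egress_mode { SKETCH_EGRESS_UNMODIFIED, SKETCH_EGRESS_ADD_TAG };

struct sketch_cpu_port_cfg {
	enum sketch_frame_mode frame_mode;
	enum sketch_egress_mode egress_mode;
	int wants_ether_type;	/* EDSA also programs ETH_P_EDSA when the op exists */
};

/* Mirror the switch statement of the patch: EDSA uses the ethertype frame
 * mode with tag insertion, DSA uses the DSA frame mode and leaves egress
 * frames unmodified, anything else is rejected with -EINVAL. */
static int sketch_pick_cpu_port_cfg(enum sketch_tag_proto proto,
				    struct sketch_cpu_port_cfg *cfg)
{
	assert(cfg != NULL);

	switch (proto) {
	case SKETCH_TAG_PROTO_EDSA:
		cfg->frame_mode = SKETCH_FRAME_MODE_ETHERTYPE;
		cfg->egress_mode = SKETCH_EGRESS_ADD_TAG;
		cfg->wants_ether_type = 1;
		return 0;
	case SKETCH_TAG_PROTO_DSA:
		cfg->frame_mode = SKETCH_FRAME_MODE_DSA;
		cfg->egress_mode = SKETCH_EGRESS_UNMODIFIED;
		cfg->wants_ether_type = 0;
		return 0;
	default:
		return -EINVAL;
	}
}
```

In the patch itself both branches then fall through to port_set_egress_unknowns(chip, port, true); the sketch leaves that step out because it does not depend on the protocol.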
[PATCH v3 net-next 1/4] net: dsa: mv88e6xxx: Implement mv88e6390 tag remap
The mv88e6390 does not have the two registers to set the frame priority map. Instead it has an indirection register for setting a number of different priority maps. Refactor the old code into a function, implement the mv88e6390 version, and use an op to call the right one. Signed-off-by: Andrew Lunn Reviewed-by: Vivien Didelot --- v2: Add port prefix Add helper function for 6390 Add _IEEE_ into #defines --- drivers/net/dsa/mv88e6xxx/chip.c | 37 drivers/net/dsa/mv88e6xxx/mv88e6xxx.h | 12 +++ drivers/net/dsa/mv88e6xxx/port.c | 63 +++ drivers/net/dsa/mv88e6xxx/port.h | 2 ++ 4 files changed, 101 insertions(+), 13 deletions(-) diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c index ce2f7ff8066e..ff4bd2f74357 100644 --- a/drivers/net/dsa/mv88e6xxx/chip.c +++ b/drivers/net/dsa/mv88e6xxx/chip.c @@ -2617,20 +2617,10 @@ static int mv88e6xxx_setup_port(struct mv88e6xxx_chip *chip, int port) if (err) return err; } + } - /* Tag Remap: use an identity 802.1p prio -> switch -* prio mapping. -*/ - err = mv88e6xxx_port_write(chip, port, PORT_TAG_REGMAP_0123, - 0x3210); - if (err) - return err; - - /* Tag Remap 2: use an identity 802.1p prio -> switch -* prio mapping.
-*/ - err = mv88e6xxx_port_write(chip, port, PORT_TAG_REGMAP_4567, - 0x7654); + if (chip->info->ops->port_tag_remap) { + err = chip->info->ops->port_tag_remap(chip, port); if (err) return err; } @@ -3189,6 +3179,7 @@ static const struct mv88e6xxx_ops mv88e6085_ops = { .port_set_link = mv88e6xxx_port_set_link, .port_set_duplex = mv88e6xxx_port_set_duplex, .port_set_speed = mv88e6185_port_set_speed, + .port_tag_remap = mv88e6095_port_tag_remap, .stats_snapshot = mv88e6xxx_g1_stats_snapshot, .stats_get_sset_count = mv88e6095_stats_get_sset_count, .stats_get_strings = mv88e6095_stats_get_strings, @@ -3217,6 +3208,7 @@ static const struct mv88e6xxx_ops mv88e6097_ops = { .port_set_link = mv88e6xxx_port_set_link, .port_set_duplex = mv88e6xxx_port_set_duplex, .port_set_speed = mv88e6185_port_set_speed, + .port_tag_remap = mv88e6095_port_tag_remap, .stats_snapshot = mv88e6xxx_g1_stats_snapshot, .stats_get_sset_count = mv88e6095_stats_get_sset_count, .stats_get_strings = mv88e6095_stats_get_strings, @@ -3245,6 +3237,7 @@ static const struct mv88e6xxx_ops mv88e6131_ops = { .port_set_link = mv88e6xxx_port_set_link, .port_set_duplex = mv88e6xxx_port_set_duplex, .port_set_speed = mv88e6185_port_set_speed, + .port_tag_remap = mv88e6095_port_tag_remap, .stats_snapshot = mv88e6xxx_g1_stats_snapshot, .stats_get_sset_count = mv88e6095_stats_get_sset_count, .stats_get_strings = mv88e6095_stats_get_strings, @@ -3259,6 +3252,7 @@ static const struct mv88e6xxx_ops mv88e6161_ops = { .port_set_link = mv88e6xxx_port_set_link, .port_set_duplex = mv88e6xxx_port_set_duplex, .port_set_speed = mv88e6185_port_set_speed, + .port_tag_remap = mv88e6095_port_tag_remap, .stats_snapshot = mv88e6xxx_g1_stats_snapshot, .stats_get_sset_count = mv88e6095_stats_get_sset_count, .stats_get_strings = mv88e6095_stats_get_strings, @@ -3288,6 +3282,7 @@ static const struct mv88e6xxx_ops mv88e6171_ops = { .port_set_duplex = mv88e6xxx_port_set_duplex, .port_set_rgmii_delay = mv88e6352_port_set_rgmii_delay, 
.port_set_speed = mv88e6185_port_set_speed, + .port_tag_remap = mv88e6095_port_tag_remap, .stats_snapshot = mv88e6320_g1_stats_snapshot, .stats_get_sset_count = mv88e6095_stats_get_sset_count, .stats_get_strings = mv88e6095_stats_get_strings, @@ -3305,6 +3300,7 @@ static const struct mv88e6xxx_ops mv88e6172_ops = { .port_set_duplex = mv88e6xxx_port_set_duplex, .port_set_rgmii_delay = mv88e6352_port_set_rgmii_delay, .port_set_speed = mv88e6352_port_set_speed, + .port_tag_remap = mv88e6095_port_tag_remap, .stats_snapshot = mv88e6320_g1_stats_snapshot, .stats_get_sset_count = mv88e6095_stats_get_sset_count, .stats_get_strings = mv88e6095_stats_get_strings, @@ -3320,6 +3316,7 @@ static const struct mv88e6xxx_ops mv88e6175_ops = { .port_set_duplex = mv88e6xxx_port_set_duplex, .port_set_rgmii_delay = mv88e6352_port_set_rgmii_delay, .port_set_speed = mv88e6185_port_set_speed, + .port_tag_remap = mv88e6095_port_tag_remap, .stats_snapshot = mv88e6320_g1_stats_snapshot,
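The two constants removed by this patch, 0x3210 for PORT_TAG_REGMAP_0123 and 0x7654 for PORT_TAG_REGMAP_4567, look like an identity priority map packed one entry per 4-bit field. That layout is an assumption read off the constants, not something the patch states; under that assumption the values can be reproduced as follows (sketch_identity_tag_remap is a hypothetical helper, not a driver function).

```c
#include <assert.h>
#include <stdint.h>

/* Pack an identity map for four consecutive 802.1p priorities, starting at
 * first_prio, into one 16-bit value. Assumed layout: priority first_prio + i
 * occupies bits [4*i + 3 : 4*i]. */
static uint16_t sketch_identity_tag_remap(unsigned int first_prio)
{
	uint16_t reg = 0;
	unsigned int i;

	assert(first_prio <= 4);	/* priorities run from 0 to 7 */

	for (i = 0; i < 4; i++)
		reg |= (uint16_t)((first_prio + i) << (4 * i));

	return reg;
}
```

Calling it with 0 yields the value written to the first register and calling it with 4 yields the value written to the second one.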
[PATCH v3 net-next 2/4] net: dsa: mv88e6xxx: Monitor and Management tables
The mv88e6390 changes the monitor control register into the Monitor and Management control, which is an indirection register to various registers. Add ops to set the CPU port and the ingress/egress port for both register layouts, to global1 Signed-off-by: Andrew Lunn --- drivers/net/dsa/mv88e6xxx/chip.c | 68 +- drivers/net/dsa/mv88e6xxx/global1.c | 69 +++ drivers/net/dsa/mv88e6xxx/global1.h | 4 ++ drivers/net/dsa/mv88e6xxx/mv88e6xxx.h | 13 +++ 4 files changed, 145 insertions(+), 9 deletions(-) diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c index ff4bd2f74357..6e981bedd028 100644 --- a/drivers/net/dsa/mv88e6xxx/chip.c +++ b/drivers/net/dsa/mv88e6xxx/chip.c @@ -2747,15 +2747,17 @@ static int mv88e6xxx_g1_setup(struct mv88e6xxx_chip *chip) if (err) return err; - /* Configure the upstream port, and configure it as the port to which -* ingress and egress and ARP monitor frames are to be sent. -*/ - reg = upstream_port << GLOBAL_MONITOR_CONTROL_INGRESS_SHIFT | - upstream_port << GLOBAL_MONITOR_CONTROL_EGRESS_SHIFT | - upstream_port << GLOBAL_MONITOR_CONTROL_ARP_SHIFT; - err = mv88e6xxx_g1_write(chip, GLOBAL_MONITOR_CONTROL, reg); - if (err) - return err; + if (chip->info->ops->g1_set_cpu_port) { + err = chip->info->ops->g1_set_cpu_port(chip, upstream_port); + if (err) + return err; + } + + if (chip->info->ops->g1_set_egress_port) { + err = chip->info->ops->g1_set_egress_port(chip, upstream_port); + if (err) + return err; + } /* Disable remote management, and set the switch's DSA device number. 
*/ err = mv88e6xxx_g1_write(chip, GLOBAL_CONTROL_2, @@ -3184,6 +3186,8 @@ static const struct mv88e6xxx_ops mv88e6085_ops = { .stats_get_sset_count = mv88e6095_stats_get_sset_count, .stats_get_strings = mv88e6095_stats_get_strings, .stats_get_stats = mv88e6095_stats_get_stats, + .g1_set_cpu_port = mv88e6095_g1_set_cpu_port, + .g1_set_egress_port = mv88e6095_g1_set_egress_port, }; static const struct mv88e6xxx_ops mv88e6095_ops = { @@ -3213,6 +3217,8 @@ static const struct mv88e6xxx_ops mv88e6097_ops = { .stats_get_sset_count = mv88e6095_stats_get_sset_count, .stats_get_strings = mv88e6095_stats_get_strings, .stats_get_stats = mv88e6095_stats_get_stats, + .g1_set_cpu_port = mv88e6095_g1_set_cpu_port, + .g1_set_egress_port = mv88e6095_g1_set_egress_port, }; static const struct mv88e6xxx_ops mv88e6123_ops = { @@ -3227,6 +3233,8 @@ static const struct mv88e6xxx_ops mv88e6123_ops = { .stats_get_sset_count = mv88e6095_stats_get_sset_count, .stats_get_strings = mv88e6095_stats_get_strings, .stats_get_stats = mv88e6095_stats_get_stats, + .g1_set_cpu_port = mv88e6095_g1_set_cpu_port, + .g1_set_egress_port = mv88e6095_g1_set_egress_port, }; static const struct mv88e6xxx_ops mv88e6131_ops = { @@ -3242,6 +3250,8 @@ static const struct mv88e6xxx_ops mv88e6131_ops = { .stats_get_sset_count = mv88e6095_stats_get_sset_count, .stats_get_strings = mv88e6095_stats_get_strings, .stats_get_stats = mv88e6095_stats_get_stats, + .g1_set_cpu_port = mv88e6095_g1_set_cpu_port, + .g1_set_egress_port = mv88e6095_g1_set_egress_port, }; static const struct mv88e6xxx_ops mv88e6161_ops = { @@ -3257,6 +3267,8 @@ static const struct mv88e6xxx_ops mv88e6161_ops = { .stats_get_sset_count = mv88e6095_stats_get_sset_count, .stats_get_strings = mv88e6095_stats_get_strings, .stats_get_stats = mv88e6095_stats_get_stats, + .g1_set_cpu_port = mv88e6095_g1_set_cpu_port, + .g1_set_egress_port = mv88e6095_g1_set_egress_port, }; static const struct mv88e6xxx_ops mv88e6165_ops = { @@ -3271,6 +3283,8 @@ static 
const struct mv88e6xxx_ops mv88e6165_ops = { .stats_get_sset_count = mv88e6095_stats_get_sset_count, .stats_get_strings = mv88e6095_stats_get_strings, .stats_get_stats = mv88e6095_stats_get_stats, + .g1_set_cpu_port = mv88e6095_g1_set_cpu_port, + .g1_set_egress_port = mv88e6095_g1_set_egress_port, }; static const struct mv88e6xxx_ops mv88e6171_ops = { @@ -3287,6 +3301,8 @@ static const struct mv88e6xxx_ops mv88e6171_ops = { .stats_get_sset_count = mv88e6095_stats_get_sset_count, .stats_get_strings = mv88e6095_stats_get_strings, .stats_get_stats = mv88e6095_stats_get_stats, + .g1_set_cpu_port = mv88e6095_g1_set_cpu_port, + .g1_set_egress_port = mv88e6095_g1_set_egress_port, }; static const struct mv88e6xxx_ops mv88e6172_ops = { @@ -3305,6 +3321,8 @@ static const struct mv88e6xxx_ops mv88e6172_ops = { .stats_get_sset_cou
Re: [PATCH net-next 2/4] mlx4: xdp: Allow raising MTU up to one page minus eth and vlan hdrs
On Fri, Dec 02, 2016 at 06:15:26PM -0800, Eric Dumazet wrote:
> On Fri, 2016-12-02 at 16:53 -0800, Alexei Starovoitov wrote:
> > On 12/2/16 4:38 PM, Eric Dumazet wrote:
> > > On Fri, 2016-12-02 at 15:23 -0800, Martin KaFai Lau wrote:
> > >> When XDP prog is attached, it is currently limiting
> > >> MTU to be FRAG_SZ0 - ETH_HLEN - (2 * VLAN_HLEN) which is 1514
> > >> in x86.
> > >>
> > >> AFAICT, since mlx4 is doing one page per packet for XDP,
> > >> we can at least raise the MTU limitation up to
> > >> PAGE_SIZE - ETH_HLEN - (2 * VLAN_HLEN) which this patch is
> > >> doing. It will be useful in the next patch which allows
> > >> XDP program to extend the packet by adding new header(s).
> > >>
> > >> Signed-off-by: Martin KaFai Lau
> > >> ---
> > >
> > > Have you tested your patch on a host with PAGE_SIZE = 64 KB ?
> > >
> > > Looks XDP really kills arches with bigger pages :(
> >
> > I'm afraid xdp mlx[45] support was not tested on arches
> > with 64k pages at all. Not just this patch.
> > I think people who care about such archs should test?
> > Note page per packet is not a hard requirement for all drivers
> > and all archs. For mlx[45] it was the easiest and the most
> > convenient way to achieve desired performance.
> > If there are ways to do the same performance differently,
> > I'm all ears :)
> >
>
> My question was more like :
>
> Can we double check all these patches wont break mlx4 driver (non XDP
> path) on arches with PAGE_SIZE=64KB.

The page/pkt requirement is not added by this patch. The earlier XDP patch series has already ensured that this page/pkt requirement takes effect only when an XDP prog is attached. In the earlier XDP patches, the MTU is limited to 1514 when XDP is active. This patch allows full use of the page for a packet (and it also only matters when XDP is active).
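To make the numbers in this sub-thread concrete, here is a minimal sketch of the proposed bound, PAGE_SIZE - ETH_HLEN - (2 * VLAN_HLEN), evaluated in userspace. ETH_HLEN is 14 and VLAN_HLEN is 4 in the kernel headers; the function name and the idea of passing the page size as a parameter are illustrative assumptions, not mlx4 code.

```c
#include <assert.h>

#define SKETCH_ETH_HLEN  14	/* Ethernet header length, as ETH_HLEN */
#define SKETCH_VLAN_HLEN  4	/* one VLAN tag, as VLAN_HLEN */

/* Largest MTU that still fits one packet in one page once the Ethernet
 * header and two VLAN tags are accounted for. */
static long sketch_xdp_max_mtu(long page_size)
{
	const long hdr = SKETCH_ETH_HLEN + 2 * SKETCH_VLAN_HLEN;

	assert(page_size > hdr);
	return page_size - hdr;
}
```

With 4 KB pages this gives 4074, and with the 64 KB pages Eric asks about it gives 65514, which is why the page-per-packet scheme becomes expensive on such arches.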
Re: [PATCH net-next 1/4] bpf: xdp: Allow head adjustment in XDP prog
On Sat, Dec 03, 2016 at 01:22:05AM +0100, Daniel Borkmann wrote:
> On 12/03/2016 12:23 AM, Martin KaFai Lau wrote:
> >This patch allows XDP prog to extend/remove the packet
> >data at the head (like adding or removing header). It is
> >done by adding a new XDP helper bpf_xdp_adjust_head().
> >
> >It also renames bpf_helper_changes_skb_data() to
> >bpf_helper_changes_pkt_data() to better reflect
> >that XDP prog does not work on skb.
> >
> >Signed-off-by: Martin KaFai Lau
> [...]
> >diff --git a/net/core/filter.c b/net/core/filter.c
> >index 56b43587d200..6902e2f73e38 100644
> >--- a/net/core/filter.c
> >+++ b/net/core/filter.c
> >@@ -2234,7 +2234,34 @@ static const struct bpf_func_proto
> >bpf_skb_change_head_proto = {
> > .arg3_type = ARG_ANYTHING,
> > };
> >
> >-bool bpf_helper_changes_skb_data(void *func)
> >+BPF_CALL_2(bpf_xdp_adjust_head, struct xdp_buff *, xdp, int, offset)
> >+{
> >+/* Both mlx4 and mlx5 driver align each packet to PAGE_SIZE when
> >+ * XDP prog is set.
> >+ * If the above is not true for the other drivers to support
> >+ * bpf_xdp_adjust_head, struct xdp_buff can be extended.
> >+ */
> >+void *head = (void *)((unsigned long)xdp->data & PAGE_MASK);
> >+void *new_data = xdp->data + offset;
> >+
> >+if (new_data < head || new_data >= xdp->data_end)
> >+/* The packet length must be >=1 */
>
> Patch looks generally good to me. Should the min pkt len here be
> limited to ETH_HLEN instead of 1?

Makes sense. Will make the change.

>
> >+return -EINVAL;
> >+
> >+xdp->data = new_data;
> >+
> >+return 0;
> >+}
> >+
> >+static const struct bpf_func_proto bpf_xdp_adjust_head_proto = {
> >+.func = bpf_xdp_adjust_head,
> >+.gpl_only = false,
> >+.ret_type = RET_INTEGER,
> >+.arg1_type = ARG_PTR_TO_CTX,
> >+.arg2_type = ARG_ANYTHING,
> >+};
> >+
> >+bool bpf_helper_changes_pkt_data(void *func)
> > {
> > if (func == bpf_skb_vlan_push ||
> > func == bpf_skb_vlan_pop ||
> [...]
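One way the agreed change could look, with the minimum packet length raised from 1 byte to ETH_HLEN, is sketched below. This is a userspace model under stated assumptions and not the revised patch: addresses are plain integers rather than kernel pointers, the page size is fixed at 4096, and all sketch_* names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u
#define SKETCH_ETH_HLEN  14

/* Simulated xdp_buff: data and data_end are small fake addresses, so the
 * signed arithmetic below cannot overflow. */
struct sketch_xdp_buff {
	uintptr_t data;
	uintptr_t data_end;
};

static int sketch_xdp_adjust_head(struct sketch_xdp_buff *xdp, int offset)
{
	/* Start of the page holding the packet, as in the quoted patch. */
	uintptr_t head = xdp->data & ~(uintptr_t)(SKETCH_PAGE_SIZE - 1);
	int64_t new_data = (int64_t)xdp->data + offset;

	assert(xdp->data <= xdp->data_end);

	/* Reject a new head before the page start, or one that would leave
	 * fewer than ETH_HLEN bytes of packet data. */
	if (new_data < (int64_t)head ||
	    new_data + SKETCH_ETH_HLEN > (int64_t)xdp->data_end)
		return -EINVAL;

	xdp->data = (uintptr_t)new_data;
	return 0;
}
```

A negative offset grows the packet at the head (room for a new header), and a positive offset shrinks it; a rejected call leaves data untouched.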
[PATCH v1 net-next 5/5] net: dsa: mv88e6xxx: Implement mv88e6390 pause control
The mv88e6390 has a number of flow control registers accessed via the Flow Control register. Use these to set the pause control. Signed-off-by: Andrew Lunn --- drivers/net/dsa/mv88e6xxx/chip.c | 7 +++ drivers/net/dsa/mv88e6xxx/mv88e6xxx.h | 2 ++ drivers/net/dsa/mv88e6xxx/port.c | 13 + drivers/net/dsa/mv88e6xxx/port.h | 1 + 4 files changed, 23 insertions(+) diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c index 3ddb1f79e709..ca453f3243cd 100644 --- a/drivers/net/dsa/mv88e6xxx/chip.c +++ b/drivers/net/dsa/mv88e6xxx/chip.c @@ -3490,6 +3490,7 @@ static const struct mv88e6xxx_ops mv88e6190_ops = { .port_set_frame_mode = mv88e6351_port_set_frame_mode, .port_set_egress_unknowns = mv88e6351_port_set_egress_unknowns, .port_set_ether_type = mv88e6351_port_set_ether_type, + .port_pause_config = mv88e6390_port_pause_config, .stats_snapshot = mv88e6390_g1_stats_snapshot, .stats_set_histogram = mv88e6390_g1_stats_set_histogram, .stats_get_sset_count = mv88e6320_stats_get_sset_count, @@ -3513,6 +3514,7 @@ static const struct mv88e6xxx_ops mv88e6190x_ops = { .port_set_frame_mode = mv88e6351_port_set_frame_mode, .port_set_egress_unknowns = mv88e6351_port_set_egress_unknowns, .port_set_ether_type = mv88e6351_port_set_ether_type, + .port_pause_config = mv88e6390_port_pause_config, .stats_snapshot = mv88e6390_g1_stats_snapshot, .stats_set_histogram = mv88e6390_g1_stats_set_histogram, .stats_get_sset_count = mv88e6320_stats_get_sset_count, @@ -3536,6 +3538,7 @@ static const struct mv88e6xxx_ops mv88e6191_ops = { .port_set_frame_mode = mv88e6351_port_set_frame_mode, .port_set_egress_unknowns = mv88e6351_port_set_egress_unknowns, .port_set_ether_type = mv88e6351_port_set_ether_type, + .port_pause_config = mv88e6390_port_pause_config, .stats_snapshot = mv88e6390_g1_stats_snapshot, .stats_set_histogram = mv88e6390_g1_stats_set_histogram, .stats_get_sset_count = mv88e6320_stats_get_sset_count, @@ -3586,6 +3589,7 @@ static const struct mv88e6xxx_ops mv88e6290_ops
= { .port_set_frame_mode = mv88e6351_port_set_frame_mode, .port_set_egress_unknowns = mv88e6351_port_set_egress_unknowns, .port_set_ether_type = mv88e6351_port_set_ether_type, + .port_pause_config = mv88e6390_port_pause_config, .stats_snapshot = mv88e6390_g1_stats_snapshot, .stats_set_histogram = mv88e6390_g1_stats_set_histogram, .stats_get_sset_count = mv88e6320_stats_get_sset_count, @@ -3739,6 +3743,7 @@ static const struct mv88e6xxx_ops mv88e6390_ops = { .port_set_ether_type = mv88e6351_port_set_ether_type, .port_jumbo_config = mv88e6165_port_jumbo_config, .port_egress_rate_limiting = mv88e6097_port_egress_rate_limiting, + .port_pause_config = mv88e6390_port_pause_config, .stats_snapshot = mv88e6390_g1_stats_snapshot, .stats_set_histogram = mv88e6390_g1_stats_set_histogram, .stats_get_sset_count = mv88e6320_stats_get_sset_count, @@ -3764,6 +3769,7 @@ static const struct mv88e6xxx_ops mv88e6390x_ops = { .port_set_ether_type = mv88e6351_port_set_ether_type, .port_jumbo_config = mv88e6165_port_jumbo_config, .port_egress_rate_limiting = mv88e6097_port_egress_rate_limiting, + .port_pause_config = mv88e6390_port_pause_config, .stats_snapshot = mv88e6390_g1_stats_snapshot, .stats_set_histogram = mv88e6390_g1_stats_set_histogram, .stats_get_sset_count = mv88e6320_stats_get_sset_count, @@ -3787,6 +3793,7 @@ static const struct mv88e6xxx_ops mv88e6391_ops = { .port_set_frame_mode = mv88e6351_port_set_frame_mode, .port_set_egress_unknowns = mv88e6351_port_set_egress_unknowns, .port_set_ether_type = mv88e6351_port_set_ether_type, + .port_pause_config = mv88e6390_port_pause_config, .stats_snapshot = mv88e6390_g1_stats_snapshot, .stats_set_histogram = mv88e6390_g1_stats_set_histogram, .stats_get_sset_count = mv88e6320_stats_get_sset_count, diff --git a/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h b/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h index 3b1f3ab490b9..13c7cc443454 100644 --- a/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h +++ b/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h @@ -78,6 +78,8 @@ 
#define PORT_PCS_CTRL_SPEED_1 (0x03) /* 6390X */ #define PORT_PCS_CTRL_SPEED_UNFORCED (0x03) #define PORT_PAUSE_CTRL0x02 +#define PORT_FLOW_CTRL_LIMIT_IN((0x00 << 8) | BIT(15)) +#define PORT_FLOW_CTRL_LIMIT_OUT ((0x01 << 8) | BIT(15)) #define PORT_SWITCH_ID 0x03 #define PORT_SWITCH_ID_PROD_NUM_6085 0x04a #define PORT_SWITCH_ID_PROD_NUM_6095 0x095 diff --git a/drivers/net/dsa/mv88e6xxx/port.c b/drivers/net/dsa/mv88e6xxx/port.c index 8d14833b2e49..0db7fa0373ae 100644 --- a/drivers/net/dsa/mv88e6xxx/port.c +++ b/
[PATCH v1 net-next 3/5] net: dsa: mv88e6xxx: Refactor egress rate limiting
There are two different rate limiting configurations, depending on the switch generation. Refactor this into ops. Signed-off-by: Andrew Lunn --- drivers/net/dsa/mv88e6xxx/chip.c | 31 +++ drivers/net/dsa/mv88e6xxx/mv88e6xxx.h | 2 ++ drivers/net/dsa/mv88e6xxx/port.c | 12 drivers/net/dsa/mv88e6xxx/port.h | 2 ++ 4 files changed, 35 insertions(+), 12 deletions(-) diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c index db1542e05e62..1b0917e44809 100644 --- a/drivers/net/dsa/mv88e6xxx/chip.c +++ b/drivers/net/dsa/mv88e6xxx/chip.c @@ -2657,18 +2657,8 @@ static int mv88e6xxx_setup_port(struct mv88e6xxx_chip *chip, int port) return err; } - /* Rate Control: disable ingress rate limiting. */ - if (mv88e6xxx_6352_family(chip) || mv88e6xxx_6351_family(chip) || - mv88e6xxx_6165_family(chip) || mv88e6xxx_6097_family(chip) || - mv88e6xxx_6320_family(chip)) { - err = mv88e6xxx_port_write(chip, port, PORT_RATE_CONTROL, - 0x0001); - if (err) - return err; - - } else if (mv88e6xxx_6185_family(chip) || mv88e6xxx_6095_family(chip)) { - err = mv88e6xxx_port_write(chip, port, PORT_RATE_CONTROL, - 0x); + if (chip->info->ops->port_egress_rate_limiting) { + err = chip->info->ops->port_egress_rate_limiting(chip, port); if (err) return err; } @@ -3229,6 +3219,7 @@ static const struct mv88e6xxx_ops mv88e6085_ops = { .port_set_frame_mode = mv88e6351_port_set_frame_mode, .port_set_egress_unknowns = mv88e6351_port_set_egress_unknowns, .port_set_ether_type = mv88e6351_port_set_ether_type, + .port_egress_rate_limiting = mv88e6097_port_egress_rate_limiting, .stats_snapshot = mv88e6xxx_g1_stats_snapshot, .stats_get_sset_count = mv88e6095_stats_get_sset_count, .stats_get_strings = mv88e6095_stats_get_strings, @@ -3268,6 +3259,7 @@ static const struct mv88e6xxx_ops mv88e6097_ops = { .port_set_egress_unknowns = mv88e6351_port_set_egress_unknowns, .port_set_ether_type = mv88e6351_port_set_ether_type, .port_jumbo_config = mv88e6165_port_jumbo_config, + 
.port_egress_rate_limiting = mv88e6095_port_egress_rate_limiting, .stats_snapshot = mv88e6xxx_g1_stats_snapshot, .stats_get_sset_count = mv88e6095_stats_get_sset_count, .stats_get_strings = mv88e6095_stats_get_strings, @@ -3309,6 +3301,7 @@ static const struct mv88e6xxx_ops mv88e6131_ops = { .port_set_egress_unknowns = mv88e6351_port_set_egress_unknowns, .port_set_ether_type = mv88e6351_port_set_ether_type, .port_jumbo_config = mv88e6165_port_jumbo_config, + .port_egress_rate_limiting = mv88e6097_port_egress_rate_limiting, .stats_snapshot = mv88e6xxx_g1_stats_snapshot, .stats_get_sset_count = mv88e6095_stats_get_sset_count, .stats_get_strings = mv88e6095_stats_get_strings, @@ -3331,6 +3324,7 @@ static const struct mv88e6xxx_ops mv88e6161_ops = { .port_set_egress_unknowns = mv88e6351_port_set_egress_unknowns, .port_set_ether_type = mv88e6351_port_set_ether_type, .port_jumbo_config = mv88e6165_port_jumbo_config, + .port_egress_rate_limiting = mv88e6097_port_egress_rate_limiting, .stats_snapshot = mv88e6xxx_g1_stats_snapshot, .stats_get_sset_count = mv88e6095_stats_get_sset_count, .stats_get_strings = mv88e6095_stats_get_strings, @@ -3371,6 +3365,7 @@ static const struct mv88e6xxx_ops mv88e6171_ops = { .port_set_egress_unknowns = mv88e6351_port_set_egress_unknowns, .port_set_ether_type = mv88e6351_port_set_ether_type, .port_jumbo_config = mv88e6165_port_jumbo_config, + .port_egress_rate_limiting = mv88e6097_port_egress_rate_limiting, .stats_snapshot = mv88e6320_g1_stats_snapshot, .stats_get_sset_count = mv88e6095_stats_get_sset_count, .stats_get_strings = mv88e6095_stats_get_strings, @@ -3396,6 +3391,7 @@ static const struct mv88e6xxx_ops mv88e6172_ops = { .port_set_egress_unknowns = mv88e6351_port_set_egress_unknowns, .port_set_ether_type = mv88e6351_port_set_ether_type, .port_jumbo_config = mv88e6165_port_jumbo_config, + .port_egress_rate_limiting = mv88e6097_port_egress_rate_limiting, .stats_snapshot = mv88e6320_g1_stats_snapshot, .stats_get_sset_count = 
mv88e6095_stats_get_sset_count, .stats_get_strings = mv88e6095_stats_get_strings, @@ -3419,6 +3415,7 @@ static const struct mv88e6xxx_ops mv88e6175_ops = { .port_set_egress_unknowns = mv88e6351_port_set_egress_unknowns, .port_set_ether_type = mv88e6351_port_set_ether_type, .port_jumbo_config = mv88e6165_port_jumbo_config, +
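The refactoring in this series keeps repeating one pattern: the family checks in mv88e6xxx_setup_port() are replaced by an optional function pointer in the per-chip ops table, which is called only when it is set. A minimal, self-contained sketch of that pattern follows; the sketch_* types and the stub op are hypothetical and stand in for struct mv88e6xxx_chip, struct mv88e6xxx_ops and the real register writes.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_chip;

/* Per-chip operations; a NULL pointer means the chip has no such feature. */
struct sketch_ops {
	int (*port_egress_rate_limiting)(struct sketch_chip *chip, int port);
};

struct sketch_chip {
	const struct sketch_ops *ops;
	int calls;	/* counts op invocations, for illustration only */
};

/* Stub op: a real implementation would write the rate control register. */
static int sketch_rate_limiting_stub(struct sketch_chip *chip, int port)
{
	(void)port;
	chip->calls++;
	return 0;
}

/* Same shape as the patched setup code: call the op if present and
 * propagate its error, otherwise skip the step silently. */
static int sketch_setup_port(struct sketch_chip *chip, int port)
{
	if (chip->ops->port_egress_rate_limiting) {
		int err = chip->ops->port_egress_rate_limiting(chip, port);

		if (err)
			return err;
	}

	return 0;
}
```

The design choice is that each chip entry states what it supports in its ops table, so adding a new chip such as the mv88e6390 means adding a table entry instead of extending a chain of family tests.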
[PATCH v1 net-next 4/5] net: dsa: mv88e6xxx: Refactor pause configuration
The mv88e6390 has a different mechanism for configuring pause. Refactor the code into an ops function, and for the moment, don't add any mv88e6390 code yet. Signed-off-by: Andrew Lunn --- drivers/net/dsa/mv88e6xxx/chip.c | 28 drivers/net/dsa/mv88e6xxx/mv88e6xxx.h | 1 + drivers/net/dsa/mv88e6xxx/port.c | 11 +++ drivers/net/dsa/mv88e6xxx/port.h | 1 + 4 files changed, 33 insertions(+), 8 deletions(-) diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c index 1b0917e44809..3ddb1f79e709 100644 --- a/drivers/net/dsa/mv88e6xxx/chip.c +++ b/drivers/net/dsa/mv88e6xxx/chip.c @@ -2625,17 +2625,15 @@ static int mv88e6xxx_setup_port(struct mv88e6xxx_chip *chip, int port) if (err) return err; - if (mv88e6xxx_6352_family(chip) || mv88e6xxx_6351_family(chip) || - mv88e6xxx_6165_family(chip) || mv88e6xxx_6097_family(chip) || - mv88e6xxx_6320_family(chip)) { - /* Do not limit the period of time that this port can -* be paused for by the remote end or the period of -* time that this port can pause the remote end. -*/ - err = mv88e6xxx_port_write(chip, port, PORT_PAUSE_CTRL, 0x); + if (chip->info->ops->port_pause_config) { + err = chip->info->ops->port_pause_config(chip, port); if (err) return err; + } + if (mv88e6xxx_6352_family(chip) || mv88e6xxx_6351_family(chip) || + mv88e6xxx_6165_family(chip) || mv88e6xxx_6097_family(chip) || + mv88e6xxx_6320_family(chip)) { /* Port ATU control: disable limiting the number of * address database entries that this port is allowed * to use. 
@@ -3220,6 +3218,7 @@ static const struct mv88e6xxx_ops mv88e6085_ops = { .port_set_egress_unknowns = mv88e6351_port_set_egress_unknowns, .port_set_ether_type = mv88e6351_port_set_ether_type, .port_egress_rate_limiting = mv88e6097_port_egress_rate_limiting, + .port_pause_config = mv88e6097_port_pause_config, .stats_snapshot = mv88e6xxx_g1_stats_snapshot, .stats_get_sset_count = mv88e6095_stats_get_sset_count, .stats_get_strings = mv88e6095_stats_get_strings, @@ -3260,6 +3259,7 @@ static const struct mv88e6xxx_ops mv88e6097_ops = { .port_set_ether_type = mv88e6351_port_set_ether_type, .port_jumbo_config = mv88e6165_port_jumbo_config, .port_egress_rate_limiting = mv88e6095_port_egress_rate_limiting, + .port_pause_config = mv88e6097_port_pause_config, .stats_snapshot = mv88e6xxx_g1_stats_snapshot, .stats_get_sset_count = mv88e6095_stats_get_sset_count, .stats_get_strings = mv88e6095_stats_get_strings, @@ -3302,6 +3302,7 @@ static const struct mv88e6xxx_ops mv88e6131_ops = { .port_set_ether_type = mv88e6351_port_set_ether_type, .port_jumbo_config = mv88e6165_port_jumbo_config, .port_egress_rate_limiting = mv88e6097_port_egress_rate_limiting, + .port_pause_config = mv88e6097_port_pause_config, .stats_snapshot = mv88e6xxx_g1_stats_snapshot, .stats_get_sset_count = mv88e6095_stats_get_sset_count, .stats_get_strings = mv88e6095_stats_get_strings, @@ -3325,6 +3326,7 @@ static const struct mv88e6xxx_ops mv88e6161_ops = { .port_set_ether_type = mv88e6351_port_set_ether_type, .port_jumbo_config = mv88e6165_port_jumbo_config, .port_egress_rate_limiting = mv88e6097_port_egress_rate_limiting, + .port_pause_config = mv88e6097_port_pause_config, .stats_snapshot = mv88e6xxx_g1_stats_snapshot, .stats_get_sset_count = mv88e6095_stats_get_sset_count, .stats_get_strings = mv88e6095_stats_get_strings, @@ -3366,6 +3368,7 @@ static const struct mv88e6xxx_ops mv88e6171_ops = { .port_set_ether_type = mv88e6351_port_set_ether_type, .port_jumbo_config = mv88e6165_port_jumbo_config, 
.port_egress_rate_limiting = mv88e6097_port_egress_rate_limiting, + .port_pause_config = mv88e6097_port_pause_config, .stats_snapshot = mv88e6320_g1_stats_snapshot, .stats_get_sset_count = mv88e6095_stats_get_sset_count, .stats_get_strings = mv88e6095_stats_get_strings, @@ -3392,6 +3395,7 @@ static const struct mv88e6xxx_ops mv88e6172_ops = { .port_set_ether_type = mv88e6351_port_set_ether_type, .port_jumbo_config = mv88e6165_port_jumbo_config, .port_egress_rate_limiting = mv88e6097_port_egress_rate_limiting, + .port_pause_config = mv88e6097_port_pause_config, .stats_snapshot = mv88e6320_g1_stats_snapshot, .stats_get_sset_count = mv88e6095_stats_get_sset_count, .stats_get_strings = mv88e6095_stats_get_strings, @@ -3416,6 +3420,7 @@ static const struct mv88e6xxx_ops mv88e6175_ops = { .port_set_ether_type = mv88e6351_por
[PATCH v1 net-next 2/5] net: dsa: mv88e6xxx: Refactor setting of jumbo frames
Some switches support jumbo frames. Refactor this code into operations in the ops structure. Signed-off-by: Andrew Lunn --- drivers/net/dsa/mv88e6xxx/chip.c | 26 ++ drivers/net/dsa/mv88e6xxx/mv88e6xxx.h | 1 + drivers/net/dsa/mv88e6xxx/port.c | 14 ++ drivers/net/dsa/mv88e6xxx/port.h | 2 +- 4 files changed, 38 insertions(+), 5 deletions(-) diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c index b2b6fe3ef4bf..db1542e05e62 100644 --- a/drivers/net/dsa/mv88e6xxx/chip.c +++ b/drivers/net/dsa/mv88e6xxx/chip.c @@ -2582,10 +2582,6 @@ static int mv88e6xxx_setup_port(struct mv88e6xxx_chip *chip, int port) mv88e6xxx_6185_family(chip)) reg = PORT_CONTROL_2_MAP_DA; - if (mv88e6xxx_6352_family(chip) || mv88e6xxx_6351_family(chip) || - mv88e6xxx_6165_family(chip) || mv88e6xxx_6320_family(chip)) - reg |= PORT_CONTROL_2_JUMBO_10240; - if (mv88e6xxx_6095_family(chip) || mv88e6xxx_6185_family(chip)) { /* Set the upstream port this port should use */ reg |= dsa_upstream_port(ds); @@ -2604,6 +2600,12 @@ static int mv88e6xxx_setup_port(struct mv88e6xxx_chip *chip, int port) return err; } + if (chip->info->ops->port_jumbo_config) { + err = chip->info->ops->port_jumbo_config(chip, port); + if (err) + return err; + } + /* Port Association Vector: when learning source addresses * of packets, add the address to the address database using * a port bitmap that has only the bit for this port set and @@ -2663,6 +2665,7 @@ static int mv88e6xxx_setup_port(struct mv88e6xxx_chip *chip, int port) 0x0001); if (err) return err; + } else if (mv88e6xxx_6185_family(chip) || mv88e6xxx_6095_family(chip)) { err = mv88e6xxx_port_write(chip, port, PORT_RATE_CONTROL, 0x); @@ -3264,6 +3267,7 @@ static const struct mv88e6xxx_ops mv88e6097_ops = { .port_set_frame_mode = mv88e6351_port_set_frame_mode, .port_set_egress_unknowns = mv88e6351_port_set_egress_unknowns, .port_set_ether_type = mv88e6351_port_set_ether_type, + .port_jumbo_config = mv88e6165_port_jumbo_config, .stats_snapshot = 
mv88e6xxx_g1_stats_snapshot, .stats_get_sset_count = mv88e6095_stats_get_sset_count, .stats_get_strings = mv88e6095_stats_get_strings, @@ -3304,6 +3308,7 @@ static const struct mv88e6xxx_ops mv88e6131_ops = { .port_set_frame_mode = mv88e6351_port_set_frame_mode, .port_set_egress_unknowns = mv88e6351_port_set_egress_unknowns, .port_set_ether_type = mv88e6351_port_set_ether_type, + .port_jumbo_config = mv88e6165_port_jumbo_config, .stats_snapshot = mv88e6xxx_g1_stats_snapshot, .stats_get_sset_count = mv88e6095_stats_get_sset_count, .stats_get_strings = mv88e6095_stats_get_strings, @@ -3325,6 +3330,7 @@ static const struct mv88e6xxx_ops mv88e6161_ops = { .port_set_frame_mode = mv88e6351_port_set_frame_mode, .port_set_egress_unknowns = mv88e6351_port_set_egress_unknowns, .port_set_ether_type = mv88e6351_port_set_ether_type, + .port_jumbo_config = mv88e6165_port_jumbo_config, .stats_snapshot = mv88e6xxx_g1_stats_snapshot, .stats_get_sset_count = mv88e6095_stats_get_sset_count, .stats_get_strings = mv88e6095_stats_get_strings, @@ -3364,6 +3370,7 @@ static const struct mv88e6xxx_ops mv88e6171_ops = { .port_set_frame_mode = mv88e6351_port_set_frame_mode, .port_set_egress_unknowns = mv88e6351_port_set_egress_unknowns, .port_set_ether_type = mv88e6351_port_set_ether_type, + .port_jumbo_config = mv88e6165_port_jumbo_config, .stats_snapshot = mv88e6320_g1_stats_snapshot, .stats_get_sset_count = mv88e6095_stats_get_sset_count, .stats_get_strings = mv88e6095_stats_get_strings, @@ -3388,6 +3395,7 @@ static const struct mv88e6xxx_ops mv88e6172_ops = { .port_set_frame_mode = mv88e6351_port_set_frame_mode, .port_set_egress_unknowns = mv88e6351_port_set_egress_unknowns, .port_set_ether_type = mv88e6351_port_set_ether_type, + .port_jumbo_config = mv88e6165_port_jumbo_config, .stats_snapshot = mv88e6320_g1_stats_snapshot, .stats_get_sset_count = mv88e6095_stats_get_sset_count, .stats_get_strings = mv88e6095_stats_get_strings, @@ -3410,6 +3418,7 @@ static const struct mv88e6xxx_ops 
mv88e6175_ops = { .port_set_frame_mode = mv88e6351_port_set_frame_mode, .port_set_egress_unknowns = mv88e6351_port_set_egress_unknowns, .port_set_ether_type = mv88e6351_port_set_ether_type, + .port_jumbo_config = mv88e6165_port_jumbo_config, .stats_snapsho
[PATCH v1 net-next 0/5] mv88e6390 batch 3
More patches to support the MV88e6390. This is mostly refactoring existing code and adding implementations for the mv88e6390. This patchset sets which reserved frames are sent to the CPU and the size of jumbo frames that will be accepted, turns off egress rate limiting, and configures pause frames. Andrew Lunn (5): net: dsa: mv88e6xxx: Reserved Management frames to CPU net: dsa: mv88e6xxx: Refactor setting of jumbo frames net: dsa: mv88e6xxx: Refactor egress rate limiting net: dsa: mv88e6xxx: Refactor pause configuration net: dsa: mv88e6xxx: Implement mv88e6390 pause control drivers/net/dsa/mv88e6xxx/chip.c | 125 +++--- drivers/net/dsa/mv88e6xxx/global1.c | 27 drivers/net/dsa/mv88e6xxx/global1.h | 1 + drivers/net/dsa/mv88e6xxx/global2.c | 43 +++- drivers/net/dsa/mv88e6xxx/global2.h | 6 ++ drivers/net/dsa/mv88e6xxx/mv88e6xxx.h | 9 +++ drivers/net/dsa/mv88e6xxx/port.c | 50 ++ drivers/net/dsa/mv88e6xxx/port.h | 6 +- 8 files changed, 225 insertions(+), 42 deletions(-) -- 2.10.2
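The refactoring pattern used throughout this series (replace a chain of family checks with an optional function pointer in the ops structure, and call it only when it is set) can be sketched in plain C roughly as follows. This is a minimal userspace illustration, not the driver code: `struct chip`, `struct chip_ops`, `demo_jumbo_config()`, `demo_jumbo_only_ops`, `setup_port()` and the `jumbo_ports` counter are all hypothetical stand-ins.

```c
#include <stddef.h>

struct chip;

/* Hypothetical ops table; a NULL hook means the family lacks the feature. */
struct chip_ops {
	int (*port_jumbo_config)(struct chip *chip, int port);
	int (*port_pause_config)(struct chip *chip, int port);
};

/* Hypothetical chip state, reduced to what the sketch needs. */
struct chip {
	const struct chip_ops *ops;
	int jumbo_ports; /* illustration only: counts jumbo hook calls */
};

/* Stand-in for a per-family implementation of the jumbo hook. */
static int demo_jumbo_config(struct chip *chip, int port)
{
	(void)port;
	chip->jumbo_ports++;
	return 0;
}

/* A family that supports jumbo frames but has no pause hook. */
static const struct chip_ops demo_jumbo_only_ops = {
	.port_jumbo_config = demo_jumbo_config,
	/* .port_pause_config is left NULL on purpose */
};

/* Generic setup path: each optional hook runs only if the family set it. */
static int setup_port(struct chip *chip, int port)
{
	int err;

	if (chip->ops->port_jumbo_config) {
		err = chip->ops->port_jumbo_config(chip, port);
		if (err)
			return err;
	}

	if (chip->ops->port_pause_config) {
		err = chip->ops->port_pause_config(chip, port);
		if (err)
			return err;
	}

	return 0;
}
```

With this shape, supporting a new family such as the mv88e6390 should only require a new ops table and, where the hardware differs, a new hook implementation; the generic setup path stays untouched.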
[PATCH v1 net-next 1/5] net: dsa: mv88e6xxx: Reserved Management frames to CPU
Older devices have a couple of registers in global2. The mv88e6390 family has a single register in global1 behind which similar configuration is hidden. Implement an op for this. Signed-off-by: Andrew Lunn --- drivers/net/dsa/mv88e6xxx/chip.c | 35 drivers/net/dsa/mv88e6xxx/global1.c | 27 ++ drivers/net/dsa/mv88e6xxx/global1.h | 1 + drivers/net/dsa/mv88e6xxx/global2.c | 43 --- drivers/net/dsa/mv88e6xxx/global2.h | 6 + drivers/net/dsa/mv88e6xxx/mv88e6xxx.h | 3 +++ 6 files changed, 97 insertions(+), 18 deletions(-) diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c index 9c14aaad5103..b2b6fe3ef4bf 100644 --- a/drivers/net/dsa/mv88e6xxx/chip.c +++ b/drivers/net/dsa/mv88e6xxx/chip.c @@ -2899,6 +2899,17 @@ static int mv88e6xxx_setup(struct dsa_switch *ds) goto unlock; } + /* Some generations have the configuration of sending reserved +* management frames to the CPU in global2, others in +* global1. Hence it does not fit the two setup functions +* above. +*/ + if (chip->info->ops->mgmt_rsvd2cpu) { + err = chip->info->ops->mgmt_rsvd2cpu(chip); + if (err) + goto unlock; + } + unlock: mutex_unlock(&chip->reg_lock); @@ -3221,6 +3232,7 @@ static const struct mv88e6xxx_ops mv88e6085_ops = { .stats_get_stats = mv88e6095_stats_get_stats, .g1_set_cpu_port = mv88e6095_g1_set_cpu_port, .g1_set_egress_port = mv88e6095_g1_set_egress_port, + .mgmt_rsvd2cpu = mv88e6095_g2_mgmt_rsvd2cpu, }; static const struct mv88e6xxx_ops mv88e6095_ops = { @@ -3237,6 +3249,7 @@ static const struct mv88e6xxx_ops mv88e6095_ops = { .stats_get_sset_count = mv88e6095_stats_get_sset_count, .stats_get_strings = mv88e6095_stats_get_strings, .stats_get_stats = mv88e6095_stats_get_stats, + .mgmt_rsvd2cpu = mv88e6095_g2_mgmt_rsvd2cpu, }; static const struct mv88e6xxx_ops mv88e6097_ops = { @@ -3257,6 +3270,7 @@ static const struct mv88e6xxx_ops mv88e6097_ops = { .stats_get_stats = mv88e6095_stats_get_stats, .g1_set_cpu_port = mv88e6095_g1_set_cpu_port, .g1_set_egress_port = 
mv88e6095_g1_set_egress_port, + .mgmt_rsvd2cpu = mv88e6095_g2_mgmt_rsvd2cpu, }; static const struct mv88e6xxx_ops mv88e6123_ops = { @@ -3275,6 +3289,7 @@ static const struct mv88e6xxx_ops mv88e6123_ops = { .stats_get_stats = mv88e6095_stats_get_stats, .g1_set_cpu_port = mv88e6095_g1_set_cpu_port, .g1_set_egress_port = mv88e6095_g1_set_egress_port, + .mgmt_rsvd2cpu = mv88e6095_g2_mgmt_rsvd2cpu, }; static const struct mv88e6xxx_ops mv88e6131_ops = { @@ -3295,6 +3310,7 @@ static const struct mv88e6xxx_ops mv88e6131_ops = { .stats_get_stats = mv88e6095_stats_get_stats, .g1_set_cpu_port = mv88e6095_g1_set_cpu_port, .g1_set_egress_port = mv88e6095_g1_set_egress_port, + .mgmt_rsvd2cpu = mv88e6095_g2_mgmt_rsvd2cpu, }; static const struct mv88e6xxx_ops mv88e6161_ops = { @@ -3315,6 +3331,7 @@ static const struct mv88e6xxx_ops mv88e6161_ops = { .stats_get_stats = mv88e6095_stats_get_stats, .g1_set_cpu_port = mv88e6095_g1_set_cpu_port, .g1_set_egress_port = mv88e6095_g1_set_egress_port, + .mgmt_rsvd2cpu = mv88e6095_g2_mgmt_rsvd2cpu, }; static const struct mv88e6xxx_ops mv88e6165_ops = { @@ -3331,6 +3348,7 @@ static const struct mv88e6xxx_ops mv88e6165_ops = { .stats_get_stats = mv88e6095_stats_get_stats, .g1_set_cpu_port = mv88e6095_g1_set_cpu_port, .g1_set_egress_port = mv88e6095_g1_set_egress_port, + .mgmt_rsvd2cpu = mv88e6095_g2_mgmt_rsvd2cpu, }; static const struct mv88e6xxx_ops mv88e6171_ops = { @@ -3352,6 +3370,7 @@ static const struct mv88e6xxx_ops mv88e6171_ops = { .stats_get_stats = mv88e6095_stats_get_stats, .g1_set_cpu_port = mv88e6095_g1_set_cpu_port, .g1_set_egress_port = mv88e6095_g1_set_egress_port, + .mgmt_rsvd2cpu = mv88e6095_g2_mgmt_rsvd2cpu, }; static const struct mv88e6xxx_ops mv88e6172_ops = { @@ -3375,6 +3394,7 @@ static const struct mv88e6xxx_ops mv88e6172_ops = { .stats_get_stats = mv88e6095_stats_get_stats, .g1_set_cpu_port = mv88e6095_g1_set_cpu_port, .g1_set_egress_port = mv88e6095_g1_set_egress_port, + .mgmt_rsvd2cpu = mv88e6095_g2_mgmt_rsvd2cpu, }; 
static const struct mv88e6xxx_ops mv88e6175_ops = { @@ -3396,6 +3416,7 @@ static const struct mv88e6xxx_ops mv88e6175_ops = { .stats_get_stats = mv88e6095_stats_get_stats, .g1_set_cpu_port = mv88e6095_g1_set_cpu_port, .g1_set_egress_port = mv88e6095_g1_set_egress_port, + .mgmt_rsvd2cpu = mv88e6095_g2_mgmt_rsvd2cpu, }; static const struct mv88e6xxx_ops mv88e6176_ops = { @@ -3419,6 +3440,7 @@ static const struct mv8
Re: [PATCH net-next 2/4] mlx4: xdp: Allow raising MTU up to one page minus eth and vlan hdrs
On Fri, 2016-12-02 at 19:42 -0800, Martin KaFai Lau wrote: > On Fri, Dec 02, 2016 at 06:15:26PM -0800, Eric Dumazet wrote: > > My question was more like : > > > > Can we double check all these patches won't break mlx4 driver (non XDP > > path) on arches with PAGE_SIZE=64KB. > The page/pkt requirement is not added by this patch. The earlier > XDP patch series has already ensured this page/pkt requirement > is effective only when XDP prog is attached. > > In the earlier XDP patches, MTU is limited to 1514 when > XDP is active. This patch is to allow full use of the > page for a packet (and also only matters when XDP is active). OK, thanks for the clarification.
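For reference, the limit proposed in the patch description, PAGE_SIZE - ETH_HLEN - (2 * VLAN_HLEN), works out as follows for the two page sizes mentioned in this thread. This is a small userspace sketch; `xdp_max_mtu()` is a hypothetical helper name, while the `ETH_HLEN` and `VLAN_HLEN` values are the usual 14 and 4 bytes.

```c
/* Ethernet header length and 802.1Q tag length, in bytes. */
#define ETH_HLEN  14
#define VLAN_HLEN 4

/*
 * Hypothetical helper: largest MTU that still leaves room for the Ethernet
 * header and two VLAN tags when one packet occupies one page.
 */
static unsigned long xdp_max_mtu(unsigned long page_size)
{
	return page_size - ETH_HLEN - 2 * VLAN_HLEN;
}
```

With 4 KB pages this gives 4096 - 22 = 4074 bytes; with 64 KB pages it gives 65536 - 22 = 65514 bytes. Since a full page is used per packet while XDP is active, the 64 KB case is presumably where the page-per-packet scheme becomes expensive.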
[PATCHv2 net-next 0/4] MV88E6390 batch two
This is the second batch of patches adding support for the MV88e6390. They are not sufficient to make it work properly. The mv88e6390 has a much expanded set of priority maps. Refactor the existing code, and implement basic support for the new device. Similarly, the monitor control register has been reworked. The mv88e6390 has something odd in its EDSA tagging implementation, which means it is not possible to use it. So we need to use DSA tagging. This is the first device with EDSA support where we need to use DSA, and the code does not support this. So two patches refactor the existing code. The two different register definitions are separated out, and using DSA on an EDSA capable device is added. v2: Add port prefix Add helper function for 6390 Add _IEEE_ into #defines Split monitor_ctrl into a number of separate ops. Remove 6390 code which is management, used in a later patch s/EGREES/EGRESS/. Broke up setup_port_dsa() and set_port_dsa() into a number of ops v3: Verify mandatory ops for port setup Don't set ether type for DSA port. Andrew Lunn (4): net: dsa: mv88e6xxx: Implement mv88e6390 tag remap net: dsa: mv88e6xxx: Monitor and Management tables net: dsa: mv88e6xxx: Move the tagging protocol into info net: dsa: mv88e6xxx: Refactor CPU and DSA port setup drivers/net/dsa/mv88e6xxx/chip.c | 339 ++ drivers/net/dsa/mv88e6xxx/global1.c | 69 +++ drivers/net/dsa/mv88e6xxx/global1.h | 4 + drivers/net/dsa/mv88e6xxx/mv88e6xxx.h | 62 +-- drivers/net/dsa/mv88e6xxx/port.c | 181 ++ drivers/net/dsa/mv88e6xxx/port.h | 15 ++ 6 files changed, 583 insertions(+), 87 deletions(-) -- 2.10.2
Re: stmmac ethernet in kernel 4.9-rc6: coalescing related pauses.
Hello On 11/24/2016 10:25 PM, Pavel Machek wrote: Hi! I'm debugging strange delays during transmit in stmmac driver. They seem to be present in 4.4 kernel (and older kernels, too). Workload is burst of udp packets being sent, pause, burst of udp packets, ... ... 4.9-rc6 still has the delays. With the #define STMMAC_COAL_TX_TIMER 1000 #define STMMAC_TX_MAX_FRAMES 2 settings, delays go away, and driver still works. (It fails fairly fast in 4.4). Good news. But the question still is: what is going on there? 256 packets looks way too large for being a trigger for aborting the TX coalescing timer. Looking more deeply into this, the driver is using non-highres timers to implement the TX coalescing. This simply cannot work. 1 HZ, which is the lowest granularity of non-highres timers in the kernel, is variable as well as already too large of a delay for effective TX coalescing. I seriously think that the TX coalescing support should be ripped out or disabled entirely until it is implemented properly in this driver. Ok, I'd disable coalescing, but have not figured out how so far. What is the generic way to do that? It seems the only thing stmmac_tx_timer() does is calling stmmac_tx_clean(), which reclaims tx_skbuff[] entries. It should be possible to do that explicitly, without delay, but it stops working completely if I attempt to do that. On a side note, stmmac_poll() does stmmac_enable_dma_irq() while stmmac_dma_interrupt() disables interrupts. But I don't see any protection between the two, so IMO it could race and we'd end up without polling or interrupts... The idea behind the TX mitigation is to mix the interrupt and timer and this approach gave us real benefit in terms of performance and CPU usage (especially on SH4-200/SH4-300 based platforms). In the ring, some descriptors can raise the irq (according to a threshold) and set the IC bit. In this path, the NAPI poll will be scheduled. 
But there is a timer that can run (and our experiments showed that no high resolution is needed) to clear the tx resources. Concerning the lock protection, we reviewed it a long time ago and, IIRC, no race condition should be present. Open to reviewing it again! So, any other scheme and testing on the supported platforms are welcome. Hoping this summary can help. Peppe Thanks and best regards, Pavel
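The mitigation scheme described in this reply (a frame-count threshold decides which descriptors request an interrupt, and a timer cleans up after the rest) can be modelled very roughly as below. This is a simplified sketch under stated assumptions, not the stmmac implementation: `struct tx_coalesce`, its fields and `tx_frame_wants_irq()` are hypothetical names, and the `timer_armed` flag merely stands in for (re)arming a kernel timer.

```c
#include <stdbool.h>

/* Hypothetical, simplified coalescing state. */
struct tx_coalesce {
	unsigned int max_frames;     /* frames allowed between TX interrupts */
	unsigned int pending_frames; /* frames queued since the last interrupt */
	bool timer_armed;            /* stands in for a (re)armed cleanup timer */
};

/*
 * Called once per queued frame. Returns true when this frame's descriptor
 * should request a completion interrupt; otherwise cleanup of the frame is
 * left to the timer.
 */
static bool tx_frame_wants_irq(struct tx_coalesce *c)
{
	c->pending_frames++;
	if (c->pending_frames < c->max_frames) {
		c->timer_armed = true;
		return false;
	}

	c->pending_frames = 0;
	return true;
}
```

In this model, a threshold of 2 (the `STMMAC_TX_MAX_FRAMES 2` setting tried above) makes every second frame request an interrupt, so at most one frame ever waits for the timer. With a large threshold, a short burst depends entirely on how soon the timer fires, which may be the source of the delays discussed here.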
Re: [PATCH] stmmac: simplify flag assignment
+ Alex On 11/30/2016 12:44 PM, Pavel Machek wrote: Simplify flag assignment. Signed-off-by: Pavel Machek diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c index ed20668..0b706a7 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c @@ -2771,12 +2771,8 @@ static netdev_features_t stmmac_fix_features(struct net_device *dev, features &= ~NETIF_F_CSUM_MASK; /* Disable tso if asked by ethtool */ - if ((priv->plat->tso_en) && (priv->dma_cap.tsoen)) { - if (features & NETIF_F_TSO) - priv->tso = true; - else - priv->tso = false; - } + if ((priv->plat->tso_en) && (priv->dma_cap.tsoen)) + priv->tso = !!(features & NETIF_F_TSO); return features; }
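The simplification in this patch relies on the C idiom that `!!x` maps any non-zero value to 1 and zero to 0. A small userspace sketch of the equivalence, where `DEMO_F_TSO` is a made-up feature bit (not the kernel's `NETIF_F_TSO` value) and both function names are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bit, chosen only for illustration. */
#define DEMO_F_TSO (UINT64_C(1) << 16)

/* The original form: an explicit if/else assigning true or false. */
static bool tso_verbose(uint64_t features)
{
	if (features & DEMO_F_TSO)
		return true;
	else
		return false;
}

/* The simplified form: double negation normalizes the masked bit to 0 or 1. */
static bool tso_compact(uint64_t features)
{
	return !!(features & DEMO_F_TSO);
}
```

If the destination is a C99 `bool`, assigning the masked value directly would also normalize it; the `!!` form stays correct even when the field is a plain integer type.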
[PATCH 2/2] can: peak: Add support for PCAN-USB X6 USB interface
From: Stephane Grosjean This adds support for PEAK-System PCAN-USB X6 USB to CAN interface. The CAN FD adapter PCAN-USB X6 allows the connection of up to 6 CAN FD or CAN networks to a computer via USB. The interface is installed in an aluminum profile casing and is shipped in versions with D-Sub connectors or M12 circular connectors. The PCAN-USB X6 registers in the USB sub-system as if 3x PCAN-USB-Pro FD adapters were plugged. So, this patch: - updates the PEAK_USB entry of the corresponding Kconfig file - defines and adds the device id. of the PCAN-USB X6 (0x0014) into the table of supported device ids - defines and adds the new software structure implementing the PCAN-USB X6, which is obviously a clone of the software structure implementing the PCAN-USB Pro FD. Signed-off-by: Stephane Grosjean Tested-by: Oliver Hartkopp Signed-off-by: Marc Kleine-Budde --- drivers/net/can/usb/peak_usb/pcan_usb_core.c | 2 + drivers/net/can/usb/peak_usb/pcan_usb_core.h | 2 + drivers/net/can/usb/peak_usb/pcan_usb_fd.c | 72 3 files changed, 76 insertions(+) diff --git a/drivers/net/can/usb/peak_usb/pcan_usb_core.c b/drivers/net/can/usb/peak_usb/pcan_usb_core.c index c06382cdfdfe..f3141ca56bc3 100644 --- a/drivers/net/can/usb/peak_usb/pcan_usb_core.c +++ b/drivers/net/can/usb/peak_usb/pcan_usb_core.c @@ -39,6 +39,7 @@ static struct usb_device_id peak_usb_table[] = { {USB_DEVICE(PCAN_USB_VENDOR_ID, PCAN_USBPRO_PRODUCT_ID)}, {USB_DEVICE(PCAN_USB_VENDOR_ID, PCAN_USBFD_PRODUCT_ID)}, {USB_DEVICE(PCAN_USB_VENDOR_ID, PCAN_USBPROFD_PRODUCT_ID)}, + {USB_DEVICE(PCAN_USB_VENDOR_ID, PCAN_USBX6_PRODUCT_ID)}, {} /* Terminating entry */ }; @@ -50,6 +51,7 @@ static const struct peak_usb_adapter *const peak_usb_adapters_list[] = { &pcan_usb_pro, &pcan_usb_fd, &pcan_usb_pro_fd, + &pcan_usb_x6, }; /* diff --git a/drivers/net/can/usb/peak_usb/pcan_usb_core.h b/drivers/net/can/usb/peak_usb/pcan_usb_core.h index 506fe506c9d3..3cbfb069893d 100644 --- a/drivers/net/can/usb/peak_usb/pcan_usb_core.h +++ 
b/drivers/net/can/usb/peak_usb/pcan_usb_core.h @@ -27,6 +27,7 @@ #define PCAN_USBPRO_PRODUCT_ID 0x000d #define PCAN_USBPROFD_PRODUCT_ID 0x0011 #define PCAN_USBFD_PRODUCT_ID 0x0012 +#define PCAN_USBX6_PRODUCT_ID 0x0014 #define PCAN_USB_DRIVER_NAME "peak_usb" @@ -90,6 +91,7 @@ extern const struct peak_usb_adapter pcan_usb; extern const struct peak_usb_adapter pcan_usb_pro; extern const struct peak_usb_adapter pcan_usb_fd; extern const struct peak_usb_adapter pcan_usb_pro_fd; +extern const struct peak_usb_adapter pcan_usb_x6; struct peak_time_ref { struct timeval tv_host_0, tv_host; diff --git a/drivers/net/can/usb/peak_usb/pcan_usb_fd.c b/drivers/net/can/usb/peak_usb/pcan_usb_fd.c index 8a316a194cf7..304732550f0a 100644 --- a/drivers/net/can/usb/peak_usb/pcan_usb_fd.c +++ b/drivers/net/can/usb/peak_usb/pcan_usb_fd.c @@ -1132,3 +1132,75 @@ const struct peak_usb_adapter pcan_usb_pro_fd = { .do_get_berr_counter = pcan_usb_fd_get_berr_counter, }; + +/* describes the PCAN-USB X6 adapter */ +static const struct can_bittiming_const pcan_usb_x6_const = { + .name = "pcan_usb_x6", + .tseg1_min = 1, + .tseg1_max = (1 << PUCAN_TSLOW_TSGEG1_BITS), + .tseg2_min = 1, + .tseg2_max = (1 << PUCAN_TSLOW_TSGEG2_BITS), + .sjw_max = (1 << PUCAN_TSLOW_SJW_BITS), + .brp_min = 1, + .brp_max = (1 << PUCAN_TSLOW_BRP_BITS), + .brp_inc = 1, +}; + +static const struct can_bittiming_const pcan_usb_x6_data_const = { + .name = "pcan_usb_x6", + .tseg1_min = 1, + .tseg1_max = (1 << PUCAN_TFAST_TSGEG1_BITS), + .tseg2_min = 1, + .tseg2_max = (1 << PUCAN_TFAST_TSGEG2_BITS), + .sjw_max = (1 << PUCAN_TFAST_SJW_BITS), + .brp_min = 1, + .brp_max = (1 << PUCAN_TFAST_BRP_BITS), + .brp_inc = 1, +}; + +const struct peak_usb_adapter pcan_usb_x6 = { + .name = "PCAN-USB X6", + .device_id = PCAN_USBX6_PRODUCT_ID, + .ctrl_count = PCAN_USBPROFD_CHANNEL_COUNT, + .ctrlmode_supported = CAN_CTRLMODE_FD | + CAN_CTRLMODE_3_SAMPLES | CAN_CTRLMODE_LISTENONLY, + .clock = { + .freq = PCAN_UFD_CRYSTAL_HZ, + }, + .bittiming_const 
= &pcan_usb_x6_const, + .data_bittiming_const = &pcan_usb_x6_data_const, + + /* size of device private data */ + .sizeof_dev_private = sizeof(struct pcan_usb_fd_device), + + /* timestamps usage */ + .ts_used_bits = 32, + .ts_period = 100, /* calibration period in ts. */ + .us_per_ts_scale = 1, /* us = (ts * scale) >> shift */ + .us_per_ts_shift = 0, + + /* give here messages in/out endpoints */ + .ep_msg_in = PCAN_USBPRO_EP_MSGIN, + .ep_msg_out = {PCAN_USBPRO_EP_MSGOUT_0, PCAN_USBPRO_EP_MSGOUT_1}, + + /* size of rx/tx usb buffers
[PATCH 1/2] can: peak: Fix bittiming fields size in bits
From: Stephane Grosjean This fixes the ranges of the bittiming fields supported by all the CAN-FD USB interfaces of the PEAK-System CAN-FD adapters. The very first development versions of the IP core API defined smaller TSGEx and SJW fields for both nominal and data bittiming records than the production versions. This patch fixes them by enlarging their sizes to the actual values:
field:            old size:   fixed size:
nominal TSGEG1    6           8
nominal TSGEG2    4           7
nominal SJW       4           7
data TSGEG1       4           5
data TSGEG2       3           4
data SJW          2           4
Note that this has no other consequence than offering a larger choice for bitrate encoding. Signed-off-by: Stephane Grosjean Signed-off-by: Marc Kleine-Budde --- drivers/net/can/usb/peak_usb/pcan_ucan.h | 37 +++--- drivers/net/can/usb/peak_usb/pcan_usb_fd.c | 32 +- 2 files changed, 45 insertions(+), 24 deletions(-) diff --git a/drivers/net/can/usb/peak_usb/pcan_ucan.h b/drivers/net/can/usb/peak_usb/pcan_ucan.h index e8fc4952c6b0..2147678f0225 100644 --- a/drivers/net/can/usb/peak_usb/pcan_ucan.h +++ b/drivers/net/can/usb/peak_usb/pcan_ucan.h @@ -43,11 +43,22 @@ struct __packed pucan_command { u16 args[3]; }; +#define PUCAN_TSLOW_BRP_BITS 10 +#define PUCAN_TSLOW_TSGEG1_BITS 8 +#define PUCAN_TSLOW_TSGEG2_BITS 7 +#define PUCAN_TSLOW_SJW_BITS 7 + +#define PUCAN_TSLOW_BRP_MASK ((1 << PUCAN_TSLOW_BRP_BITS) - 1) +#define PUCAN_TSLOW_TSEG1_MASK ((1 << PUCAN_TSLOW_TSGEG1_BITS) - 1) +#define PUCAN_TSLOW_TSEG2_MASK ((1 << PUCAN_TSLOW_TSGEG2_BITS) - 1) +#define PUCAN_TSLOW_SJW_MASK ((1 << PUCAN_TSLOW_SJW_BITS) - 1) + /* uCAN TIMING_SLOW command fields */ -#define PUCAN_TSLOW_SJW_T(s, t) (((s) & 0xf) | ((!!(t)) << 7)) -#define PUCAN_TSLOW_TSEG2(t) ((t) & 0xf) -#define PUCAN_TSLOW_TSEG1(t) ((t) & 0x3f) -#define PUCAN_TSLOW_BRP(b) ((b) & 0x3ff) +#define PUCAN_TSLOW_SJW_T(s, t) (((s) & PUCAN_TSLOW_SJW_MASK) | \ + ((!!(t)) << 7)) +#define PUCAN_TSLOW_TSEG2(t) ((t) & PUCAN_TSLOW_TSEG2_MASK) +#define PUCAN_TSLOW_TSEG1(t) ((t) & PUCAN_TSLOW_TSEG1_MASK) +#define PUCAN_TSLOW_BRP(b) ((b) & 
PUCAN_TSLOW_BRP_MASK) struct __packed pucan_timing_slow { __le16 opcode_channel; @@ -60,11 +71,21 @@ struct __packed pucan_timing_slow { __le16 brp; /* BaudRate Prescaler */ }; +#define PUCAN_TFAST_BRP_BITS 10 +#define PUCAN_TFAST_TSGEG1_BITS 5 +#define PUCAN_TFAST_TSGEG2_BITS 4 +#define PUCAN_TFAST_SJW_BITS 4 + +#define PUCAN_TFAST_BRP_MASK ((1 << PUCAN_TFAST_BRP_BITS) - 1) +#define PUCAN_TFAST_TSEG1_MASK ((1 << PUCAN_TFAST_TSGEG1_BITS) - 1) +#define PUCAN_TFAST_TSEG2_MASK ((1 << PUCAN_TFAST_TSGEG2_BITS) - 1) +#define PUCAN_TFAST_SJW_MASK ((1 << PUCAN_TFAST_SJW_BITS) - 1) + /* uCAN TIMING_FAST command fields */ -#define PUCAN_TFAST_SJW(s) ((s) & 0x3) -#define PUCAN_TFAST_TSEG2(t) ((t) & 0x7) -#define PUCAN_TFAST_TSEG1(t) ((t) & 0xf) -#define PUCAN_TFAST_BRP(b) ((b) & 0x3ff) +#define PUCAN_TFAST_SJW(s) ((s) & PUCAN_TFAST_SJW_MASK) +#define PUCAN_TFAST_TSEG2(t) ((t) & PUCAN_TFAST_TSEG2_MASK) +#define PUCAN_TFAST_TSEG1(t) ((t) & PUCAN_TFAST_TSEG1_MASK) +#define PUCAN_TFAST_BRP(b) ((b) & PUCAN_TFAST_BRP_MASK) struct __packed pucan_timing_fast { __le16 opcode_channel; diff --git a/drivers/net/can/usb/peak_usb/pcan_usb_fd.c b/drivers/net/can/usb/peak_usb/pcan_usb_fd.c index ce44a033f63b..8a316a194cf7 100644 --- a/drivers/net/can/usb/peak_usb/pcan_usb_fd.c +++ b/drivers/net/can/usb/peak_usb/pcan_usb_fd.c @@ -993,24 +993,24 @@ static void pcan_usb_fd_free(struct peak_usb_device *dev) static const struct can_bittiming_const pcan_usb_fd_const = { .name = "pcan_usb_fd", .tseg1_min = 1, - .tseg1_max = 64, + .tseg1_max = (1 << PUCAN_TSLOW_TSGEG1_BITS), .tseg2_min = 1, - .tseg2_max = 16, - .sjw_max = 16, + .tseg2_max = (1 << PUCAN_TSLOW_TSGEG2_BITS), + .sjw_max = (1 << PUCAN_TSLOW_SJW_BITS), .brp_min = 1, - .brp_max = 1024, + .brp_max = (1 << PUCAN_TFAST_BRP_BITS), .brp_inc = 1, }; static const struct can_bittiming_const pcan_usb_fd_data_const = { .name = "pcan_usb_fd", .tseg1_min = 1, - .tseg1_max = (1 << PUCAN_TFAST_TSGEG1_BITS), + .tseg1_max = 16, + .tseg1_max = (1 << PUCAN_TFAST_TSGEG1_BITS), .tseg2_min = 1, - .tseg2_max = 
8, - .sjw_max = 4, + .tseg2_max = (1 << PUCAN_TFAST_TSGEG2_BITS), + .sjw_max = (1 << PUCAN_TFAST_SJW_BITS), .brp_min = 1, - .brp_max = 1024, + .brp_max = (1 << PUCAN_TFAST_BRP_BITS
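The relation between a field width in bits, its mask, and the `*_max` limits in the patch above can be checked with a few lines of C. This is only a sketch: `field_mask()` and `field_max()` are hypothetical helpers, and the remark about storing "value minus one" is an assumption about why the limit is `1 << bits` rather than `(1 << bits) - 1`.

```c
/* Mask selecting the low `bits` bits of a command field. */
static unsigned int field_mask(unsigned int bits)
{
	return (1u << bits) - 1;
}

/*
 * Largest segment length that fits the field, assuming the hardware stores
 * the length minus one (so a stored value of 0 means a length of 1).
 */
static unsigned int field_max(unsigned int bits)
{
	return 1u << bits;
}
```

For the old 6-bit nominal TSEG1 field this gives a mask of 0x3f and a limit of 64, matching the removed `& 0x3f` and `.tseg1_max = 64` lines; for the 8-bit field it gives 0xff and 256.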
pull-request: can 2016-12-02
Hello David, this is a pull request for net/master. There are two patches by Stephane Grosjean, who adds support for the new PCAN-USB X6 USB interface to the pcan_usb driver. regards, Marc --- The following changes since commit aa196eed3d80d4b003b04a270712b978a012a939: macvtap: handle ubuf refcount correctly when meet errors (2016-11-30 15:06:02 -0500) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can.git tags/linux-can-fixes-for-4.9-20161201 for you to fetch changes up to f00b534ded60bd0a23c2fa8dec4ece52aa7d235f: can: peak: Add support for PCAN-USB X6 USB interface (2016-12-01 14:12:20 +0100) linux-can-fixes-for-4.9-20161201 Stephane Grosjean (2): can: peak: Fix bittiming fields size in bits can: peak: Add support for PCAN-USB X6 USB interface drivers/net/can/usb/peak_usb/pcan_ucan.h | 37 +++--- drivers/net/can/usb/peak_usb/pcan_usb_core.c | 2 + drivers/net/can/usb/peak_usb/pcan_usb_core.h | 2 + drivers/net/can/usb/peak_usb/pcan_usb_fd.c | 104 ++- 4 files changed, 121 insertions(+), 24 deletions(-)
Re: [RFC PATCH net-next v2] ipv6: implement consistent hashing for equal-cost multipath routing
On 12/01/2016 06:55 PM, Roopa Prabhu wrote: > I think it's best for it to be a global setting, and that's why sysctl > seems like the best way (unless there are other ways to set this > globally via rtnetlink). If it helps, most hw switch vendors > supporting this feature also provide a globally tunable knob and it is > not on by default. What I had in mind was to keep the global sysctl but also provide a per-route attribute, which would default to the global sysctl. David
[PATCH net v3] tipc: check minimum bearer MTU
Qian Zhang (张谦) reported a potential socket buffer overflow in tipc_msg_build() which is also known as CVE-2016-8632: due to insufficient checks, a buffer overflow can occur if MTU is too short for even tipc headers. As anyone can set device MTU in a user/net namespace, this issue can be abused by a regular user. As agreed in the discussion on Ben Hutchings' original patch, we should check the MTU at the moment a bearer is attached rather than for each processed packet. We also need to repeat the check when bearer MTU is adjusted to new device MTU. UDP case also needs a check to avoid overflow when calculating bearer MTU. Fixes: b97bf3fd8f6a ("[TIPC] Initial merge") Signed-off-by: Michal Kubecek Reported-by: Qian Zhang (张谦) --- changes v2 to v3: - rename tipc_check_mtu() helper to tipc_mtu_bad() and make the comment about the function less confusing changes v1 to v2: - add missing "static" to tipc_check_mtu() helper declaration - rather than blocking device MTU change to too low value, disable tipc bearer if that happens (suggested by Ben Hutchings) --- net/tipc/bearer.c| 11 +-- net/tipc/bearer.h| 13 + net/tipc/udp_media.c | 5 + 3 files changed, 27 insertions(+), 2 deletions(-) diff --git a/net/tipc/bearer.c b/net/tipc/bearer.c index 975dbeb60ab0..52d74760fb68 100644 --- a/net/tipc/bearer.c +++ b/net/tipc/bearer.c @@ -421,6 +421,10 @@ int tipc_enable_l2_media(struct net *net, struct tipc_bearer *b, dev = dev_get_by_name(net, driver_name); if (!dev) return -ENODEV; + if (tipc_mtu_bad(dev, 0)) { + dev_put(dev); + return -EINVAL; + } /* Associate TIPC bearer with L2 bearer */ rcu_assign_pointer(b->media_ptr, dev); @@ -610,8 +614,6 @@ static int tipc_l2_device_event(struct notifier_block *nb, unsigned long evt, if (!b) return NOTIFY_DONE; - b->mtu = dev->mtu; - switch (evt) { case NETDEV_CHANGE: if (netif_carrier_ok(dev)) @@ -624,6 +626,11 @@ static int tipc_l2_device_event(struct notifier_block *nb, unsigned long evt, tipc_reset_bearer(net, b); break; case 
NETDEV_CHANGEMTU: + if (tipc_mtu_bad(dev, 0)) { + bearer_disable(net, b); + break; + } + b->mtu = dev->mtu; tipc_reset_bearer(net, b); break; case NETDEV_CHANGEADDR: diff --git a/net/tipc/bearer.h b/net/tipc/bearer.h index 78892e2f53e3..278ff7f616f9 100644 --- a/net/tipc/bearer.h +++ b/net/tipc/bearer.h @@ -39,6 +39,7 @@ #include "netlink.h" #include "core.h" +#include "msg.h" #include #define MAX_MEDIA 3 @@ -59,6 +60,9 @@ #define TIPC_MEDIA_TYPE_IB 2 #define TIPC_MEDIA_TYPE_UDP 3 +/* minimum bearer MTU */ +#define TIPC_MIN_BEARER_MTU (MAX_H_SIZE + INT_H_SIZE) + /** * struct tipc_media_addr - destination address used by TIPC bearers * @value: address info (format defined by media) @@ -215,4 +219,13 @@ void tipc_bearer_xmit(struct net *net, u32 bearer_id, void tipc_bearer_bc_xmit(struct net *net, u32 bearer_id, struct sk_buff_head *xmitq); +/* check if device MTU is too low for tipc headers */ +static inline bool tipc_mtu_bad(struct net_device *dev, unsigned int reserve) +{ + if (dev->mtu >= TIPC_MIN_BEARER_MTU + reserve) + return false; + netdev_warn(dev, "MTU too low for tipc bearer\n"); + return true; +} + #endif /* _TIPC_BEARER_H */ diff --git a/net/tipc/udp_media.c b/net/tipc/udp_media.c index 78cab9c5a445..b58dc95f3d35 100644 --- a/net/tipc/udp_media.c +++ b/net/tipc/udp_media.c @@ -697,6 +697,11 @@ static int tipc_udp_enable(struct net *net, struct tipc_bearer *b, udp_conf.local_ip.s_addr = htonl(INADDR_ANY); udp_conf.use_udp_checksums = false; ub->ifindex = dev->ifindex; + if (tipc_mtu_bad(dev, sizeof(struct iphdr) + + sizeof(struct udphdr))) { + err = -EINVAL; + goto err; + } b->mtu = dev->mtu - sizeof(struct iphdr) - sizeof(struct udphdr); #if IS_ENABLED(CONFIG_IPV6) -- 2.10.2
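The check added by this patch can be exercised outside the kernel with a small sketch. Everything prefixed `DEMO_` or `demo_` below is hypothetical; in particular the header sizes (60 and 40 bytes for the TIPC headers, 20 and 8 bytes for option-less IPv4 and UDP headers) are assumptions made for illustration, and the real values come from the kernel headers.

```c
#include <stdbool.h>

/* Assumed TIPC header sizes; the real ones are defined in net/tipc/msg.h. */
#define DEMO_MAX_H_SIZE 60
#define DEMO_INT_H_SIZE 40
#define DEMO_MIN_BEARER_MTU (DEMO_MAX_H_SIZE + DEMO_INT_H_SIZE)

/* Assumed sizes of an option-less IPv4 header and a UDP header. */
#define DEMO_IPV4_HLEN 20
#define DEMO_UDP_HLEN 8

/*
 * Mirrors the shape of tipc_mtu_bad(): the device MTU must cover the minimum
 * bearer MTU plus whatever the encapsulation reserves.
 */
static bool demo_mtu_bad(unsigned int dev_mtu, unsigned int reserve)
{
	return dev_mtu < DEMO_MIN_BEARER_MTU + reserve;
}
```

Under these assumptions an L2 bearer needs a device MTU of at least 100 bytes and a UDP bearer at least 128. The reserve argument matters for the UDP case because the bearer MTU is computed by subtracting the IP and UDP header sizes from the device MTU, which would wrap around in unsigned arithmetic for a very small device MTU.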
Re: [PATCH iproute2 net-next 2/4] tc: flower: document SCTP ip_proto
On Thu, Dec 01, 2016 at 10:50:10AM -0800, Stephen Hemminger wrote: > On Tue, 29 Nov 2016 16:51:31 +0100 > Simon Horman wrote: > > > Add SCTP ip_proto to help text and man page. > > > > Signed-off-by: Simon Horman > > Sorry doesn't apply to current net-next branch in iproute2 git. > Probably some of the other changes modified formatting. Sorry about that, this file seems a bit busy these days. I will rebase and repost.
Re: stmmac: turn coalescing / NAPI off in stmmac
On 12/1/2016 11:48 PM, Pavel Machek wrote: @@ -2771,12 +2771,8 @@ static netdev_features_t stmmac_fix_features(struct net_device *dev, features &= ~NETIF_F_CSUM_MASK; /* Disable tso if asked by ethtool */ - if ((priv->plat->tso_en) && (priv->dma_cap.tsoen)) { - if (features & NETIF_F_TSO) - priv->tso = true; - else - priv->tso = false; - } + if ((priv->plat->tso_en) && (priv->dma_cap.tsoen)) + priv->tso = !!(features & NETIF_F_TSO); Pavel, this really seems arbitrary. Whilst I really appreciate you're looking into this driver a bit because of some issues you are trying to resolve, I'd like to ask that you not start bombarding me with nit-pick cleanups here and there and instead concentrate on the real bug or issue. Well, fixing clean code is easier than fixing strange code... Plus I was hoping to get the maintainers to talk. The driver is listed as supported after all. Absolutely, I am available to support you as best I can. So no problem to clarify strange or complex parts of the driver and find/try new solutions to enhance it. Anyway... since you asked. I believe I have a way to disable NAPI / tx coalescing in the driver. Unfortunately, locking is missing on the rx path, and needs to be extended to _irqsave variant on tx path. I have just replied to a previous thread about that... To be honest, I have in the box just a patch to fix lock on lpi as I had discussed in this mailing list some weeks ago. I will provide it asap. So patch currently looks like this (hand edited, can't be applied, got it working few hours ago). Does it look acceptable? I'd prefer this to go after the patch that pulls common code to single place, so that single place needs to be patched. Plus I guess I should add ifdefs, so that more advanced NAPI / tx coalescing code can be reactivated when it is fixed. Trivial fixes can go on top. Does that sound like a plan? Hmm, what I find strange is that this very code has been running for a long time on several platforms and chip versions. 
No raise condition have been found or lock protection problems (also proving look mechanisms). I'd like to avoid to break old compatibilities and having the same performances but if there are some bugs I can support to review and test. Indeed, this year we have added the 4.x but some parts of the code (for TSO) should be self-contained. So I cannot image regressions on common part of the code... I let Alex to do a double check. Pavel, I ask you sorry if I missed some problems so, if you can (as D. Miller asked) to send us a cover letter + all patches I will try to reply soon. I can do also some tests if you ask me that! I could run on 3.x and 4.x but I cannot promise you benchmarks. Which tree do you want patches against? https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/ ? I think that bug fixing should be on top of net.git but I let Miller to decide. Best Regards Peppe Best regards, Pavel diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c index 0b706a7..c0016c8 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c @@ -1395,9 +1397,10 @@ static void __stmmac_tx_clean(struct stmmac_priv *priv) static void stmmac_tx_clean(struct stmmac_priv *priv) { - spin_lock(&priv->tx_lock); + unsigned long flags; + spin_lock_irqsave(&priv->tx_lock, flags); __stmmac_tx_clean(priv); - spin_unlock(&priv->tx_lock); + spin_unlock_irqrestore(&priv->tx_lock, flags); } static inline void stmmac_enable_dma_irq(struct stmmac_priv *priv) @@ -1441,6 +1444,8 @@ static void stmmac_tx_err(struct stmmac_priv *priv) netif_wake_queue(priv->dev); } +static int stmmac_rx(struct stmmac_priv *priv, int limit); + /** * stmmac_dma_interrupt - DMA ISR * @priv: driver private structure @@ -1452,10 +1457,17 @@ static void stmmac_dma_interrupt(struct stmmac_priv *priv) { int status; int rxfifosz = priv->plat->rx_fifo_size; + unsigned long flags; status = 
priv->hw->dma->dma_interrupt(priv->ioaddr, &priv->xstats); if (likely((status & handle_rx)) || (status & handle_tx)) { + int r; + spin_lock_irqsave(&priv->tx_lock, flags); + r = stmmac_rx(priv, 999); + spin_unlock_irqrestore(&priv->tx_lock, flags); +#if 0 if (likely(napi_schedule_prep(&priv->napi))) { //pr_err("napi: schedule\n"); stmmac_disable_dma_irq(priv); @@ -1463,7 +1475,8 @@ static void stmmac_dma_interrupt(struct stmmac_priv *priv) } else pr_err("napi: schedule failed\n"); #endif + stmmac_tx
Re: stmmac ethernet in kernel 4.9-rc6: coalescing related pauses.
+ Lino On 12/2/2016 9:24 AM, Giuseppe CAVALLARO wrote: Hello On 11/24/2016 10:25 PM, Pavel Machek wrote: Hi! I'm debugging strange delays during transmit in stmmac driver. They seem to be present in 4.4 kernel (and older kernels, too). Workload is burst of udp packets being sent, pause, burst of udp packets, ... ... 4.9-rc6 still has the delays. With the #define STMMAC_COAL_TX_TIMER 1000 #define STMMAC_TX_MAX_FRAMES 2 settings, delays go away, and driver still works. (It fails fairly fast in 4.4). Good news. But the question still is: what is going on there? 256 packets looks way too large for being a trigger for aborting the TX coalescing timer. Looking more deeply into this, the driver is using non-highres timers to implement the TX coalescing. This simply cannot work. 1 HZ, which is the lowest granularity of non-highres timers in the kernel, is variable as well as already too large of a delay for effective TX coalescing. I seriously think that the TX coalescing support should be ripped out or disabled entirely until it is implemented properly in this driver. Ok, I'd disable coalescing, but could not figure it out till. What is generic way to do that? It seems only thing stmmac_tx_timer() does is calling stmmac_tx_clean(), which reclaims tx_skbuff[] entries. It should be possible to do that explicitely, without delay, but it stops working completely if I attempt to do that. On a side note, stmmac_poll() does stmmac_enable_dma_irq() while stmmac_dma_interrupt() disables interrupts. But I don't see any protection between the two, so IMO it could race and we'd end up without polling or interrupts... the idea behind the TX mitigation is to mix the interrupt and timer and this approach gave us real benefit in terms of performances and CPU usage (especially on SH4-200/SH4-300 platforms based). In the ring, some descriptors can raise the irq (according to a threshold) and set the IC bit. In this path, the NAPI poll will be scheduled. 
But there is a timer that can run (and our experiments showed that no high resolution is needed) to clear the tx resources. Concerning the lock protection, we reviewed it a long time ago and, IIRC, no race condition should be present. Open to reviewing it again! So any other scheme, and testing on the supported platforms, is welcome. Hoping this summary can help. Peppe Thanks and best regards, Pavel
Re: [PATCH 1/2] net: stmmac: avoid Camelcase naming
Hello Corentin patches look ok, I just wonder if you tested it in case of the stmmac is connected to a transceiver. Let me consider it a critical part of the driver to properly work. Regards Peppe On 12/1/2016 4:19 PM, Corentin Labbe wrote: This patch simply rename regValue to value, like it was named in other mdio functions. Signed-off-by: Corentin Labbe --- drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c index e3216e5..6796c28 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c @@ -83,14 +83,14 @@ static int stmmac_mdio_read(struct mii_bus *bus, int phyaddr, int phyreg) unsigned int mii_data = priv->hw->mii.data; int data; - u16 regValue = (((phyaddr << 11) & (0xF800)) | + u16 value = (((phyaddr << 11) & (0xF800)) | ((phyreg << 6) & (0x07C0))); - regValue |= MII_BUSY | ((priv->clk_csr & 0xF) << 2); + value |= MII_BUSY | ((priv->clk_csr & 0xF) << 2); if (stmmac_mdio_busy_wait(priv->ioaddr, mii_address)) return -EBUSY; - writel(regValue, priv->ioaddr + mii_address); + writel(value, priv->ioaddr + mii_address); if (stmmac_mdio_busy_wait(priv->ioaddr, mii_address)) return -EBUSY;
Re: stmmac ethernet in kernel 4.9-rc6: coalescing related pauses.
Hi! > >>1 HZ, which is the lowest granularity of non-highres timers in the > >>kernel, is variable as well as already too large of a delay for > >>effective TX coalescing. > >> > >>I seriously think that the TX coalescing support should be ripped out > >>or disabled entirely until it is implemented properly in this > >>driver. > > > >Ok, I'd disable coalescing, but could not figure it out till. What is > >generic way to do that? > > > >It seems only thing stmmac_tx_timer() does is calling > >stmmac_tx_clean(), which reclaims tx_skbuff[] entries. It should be > >possible to do that explicitely, without delay, but it stops working > >completely if I attempt to do that. > > > >On a side note, stmmac_poll() does stmmac_enable_dma_irq() while > >stmmac_dma_interrupt() disables interrupts. But I don't see any > >protection between the two, so IMO it could race and we'd end up > >without polling or interrupts... > > > the idea behind the TX mitigation is to mix the interrupt and > timer and this approach gave us real benefit in terms > of performances and CPU usage (especially on SH4-200/SH4-300 platforms > based). Well, if you have a workload that sends and receive packets, it tends to work ok, as you do tx_clean() in stmmac_poll(). My workload is not like that -- it is "sending packets at 3MB/sec, receiving none". So the stmmac_tx_timer() is rescheduled and rescheduled and rescheduled, and then we run out of transmit descriptors, and then 40msec passes, and then we clean them. Bad. And that's why low-res timers do not cut it. > In the ring, some descriptors can raise the irq (according to a > threshold) and set the IC bit. In this path, the NAPI poll will be > scheduled. Not NAPI poll but stmmac_tx_timer(), right? > But there is a timer that can run (and we experimented that no high > resolution is needed) to clear the tx resources. > Concerning the lock protection, we had reviewed long time ago and > IIRC, no raise condition should be present. 
Open to review it, > again! Well, I certainly like the fact that we are talking :-). And yes, I have some questions. There's nothing that protects stmmac_poll() from running concurrently with stmmac_dma_interrupt(), right? Best regards, Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
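The tx-only stall described in this message can be illustrated with a toy model. This is only a sketch under stated assumptions, not the stmmac code: first_stalled_packet() and its parameters are hypothetical names, the cleanup timer is re-armed on every transmit, a descriptor carrying the IC bit is assumed to trigger an immediate clean, and there is no rx traffic that would clean the ring from the poll path.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Toy model of timer-based TX cleanup (hypothetical, not driver code).
 * Packets are sent every send_interval_us. Each send either re-arms the
 * cleanup timer (pushing its deadline out again) or, on every
 * coal_frames-th frame, requests an interrupt that cleans the ring.
 * Returns the index of the first packet that finds the ring full,
 * or -1 if the ring never fills.
 */
int first_stalled_packet(unsigned ring_size, unsigned coal_frames,
                         uint64_t timer_us, uint64_t send_interval_us,
                         unsigned packets)
{
    unsigned outstanding = 0;   /* descriptors not yet reclaimed */
    unsigned since_irq = 0;     /* frames since the last IC request */
    bool armed = false;
    uint64_t deadline = 0;

    for (unsigned i = 0; i < packets; i++) {
        uint64_t now = (uint64_t)i * send_interval_us;

        /* The timer fires only if nobody re-armed it in the meantime. */
        if (armed && deadline <= now) {
            outstanding = 0;
            armed = false;
        }

        if (outstanding == ring_size)
            return (int)i;      /* no free descriptor: sender stalls */

        outstanding++;
        since_irq++;

        if (since_irq >= coal_frames) {
            /* Assumed: IC bit set, interrupt cleans immediately. */
            since_irq = 0;
            outstanding = 0;
        } else {
            /* Re-arming moves the deadline further into the future. */
            deadline = now + timer_us;
            armed = true;
        }
    }
    return -1;
}
```

In this model a 16-entry ring with a threshold of 64 frames and a 40000 us timer stalls at packet 16 when one packet is sent every 100 us, because every send postpones the timer. Lowering the threshold to 2 frames, or sending slower than the timer period, avoids the stall.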
[[PATCH iproute2/net-next v2] 2/4] tc: flower: document SCTP ip_proto
Add SCTP ip_proto to help text and man page. Signed-off-by: Simon Horman --- man/man8/tc-flower.8 | 14 +++--- tc/f_flower.c| 2 +- 2 files changed, 8 insertions(+), 8 deletions(-) diff --git a/man/man8/tc-flower.8 b/man/man8/tc-flower.8 index 56db42f983c1..a401293fed50 100644 --- a/man/man8/tc-flower.8 +++ b/man/man8/tc-flower.8 @@ -29,7 +29,7 @@ flower \- flow based traffic control filter .IR PRIORITY " | " .BR vlan_eth_type " { " ipv4 " | " ipv6 " | " .IR ETH_TYPE " } | " -.BR ip_proto " { " tcp " | " udp " | " +.BR ip_proto " { " tcp " | " udp " | " sctp " | " .IR IP_PROTO " } | { " .BR dst_ip " | " src_ip " } { " .IR ipv4_address " | " ipv6_address " } | { " @@ -93,8 +93,8 @@ or an unsigned 16bit value in hexadecimal format. .BI ip_proto " IP_PROTO" Match on layer four protocol. .I IP_PROTO -may be either -.BR tcp , udp +may be +.BR tcp ", " udp ", " sctp or an unsigned 8bit value in hexadecimal format. .TP .BI dst_ip " ADDRESS" @@ -110,8 +110,8 @@ option of tc filter. .TQ .BI src_port " NUMBER" Match on layer 4 protocol source or destination port number. Only available for -.BR ip_proto " values " udp " and " tcp , -which has to be specified in beforehand. +.BR ip_proto " values " udp ", " tcp " and " sctp +which have to be specified in beforehand. .SH NOTES As stated above where applicable, matches of a certain layer implicitly depend on the matches of the next lower layer. Precisely, layer one and two matches @@ -125,8 +125,8 @@ and finally layer four matches (\fBdst_port\fR and \fBsrc_port\fR) depend on .B ip_proto -being set to either -.BR tcp " or " udp . +being set to +.BR tcp ", " udp " or " sctp. .P There can be only used one mask per one prio. If user needs to specify different mask, he has to use different prio. 
diff --git a/tc/f_flower.c b/tc/f_flower.c index 1555764b9996..dacf24faf00e 100644 --- a/tc/f_flower.c +++ b/tc/f_flower.c @@ -37,7 +37,7 @@ static void explain(void) " vlan_ethtype [ ipv4 | ipv6 | ETH-TYPE ] |\n" " dst_mac MAC-ADDR |\n" " src_mac MAC-ADDR |\n" - " ip_proto [tcp | udp | IP-PROTO ] |\n" + " ip_proto [tcp | udp | sctp | IP-PROTO ] |\n" " dst_ip [ IPV4-ADDR | IPV6-ADDR ] |\n" " src_ip [ IPV4-ADDR | IPV6-ADDR ] |\n" " dst_port PORT-NUMBER |\n" -- 2.7.0.rc3.207.g0ac5344
[[PATCH iproute2/net-next v2] 3/4] tc: flower: correct name of ip_proto parameter to flower_parse_port()
This corrects a typo. Fixes: a1fb0d484237 ("tc: flower: Support matching on SCTP ports") Signed-off-by: Simon Horman --- tc/f_flower.c | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/tc/f_flower.c b/tc/f_flower.c index dacf24faf00e..d86ccdc3d3f0 100644 --- a/tc/f_flower.c +++ b/tc/f_flower.c @@ -160,15 +160,15 @@ static int flower_parse_ip_addr(char *str, __be16 eth_type, return 0; } -static int flower_port_attr_type(__u8 ip_port, bool is_src) +static int flower_port_attr_type(__u8 ip_proto, bool is_src) { - if (ip_port == IPPROTO_TCP) { + if (ip_proto == IPPROTO_TCP) { return is_src ? TCA_FLOWER_KEY_TCP_SRC : TCA_FLOWER_KEY_TCP_DST; - } else if (ip_port == IPPROTO_UDP) { + } else if (ip_proto == IPPROTO_UDP) { return is_src ? TCA_FLOWER_KEY_UDP_SRC : TCA_FLOWER_KEY_UDP_DST; - } else if (ip_port == IPPROTO_SCTP) { + } else if (ip_proto == IPPROTO_SCTP) { return is_src ? TCA_FLOWER_KEY_SCTP_SRC : TCA_FLOWER_KEY_SCTP_DST; } else { @@ -177,14 +177,14 @@ static int flower_port_attr_type(__u8 ip_port, bool is_src) } } -static int flower_parse_port(char *str, __u8 ip_port, bool is_src, +static int flower_parse_port(char *str, __u8 ip_proto, bool is_src, struct nlmsghdr *n) { int ret; int type; __be16 port; - type = flower_port_attr_type(ip_port, is_src); + type = flower_port_attr_type(ip_proto, is_src); if (type < 0) return -1; -- 2.7.0.rc3.207.g0ac5344
[[PATCH iproute2/net-next v2] 4/4] tc: flower: make use of flower_port_attr_type() safe and silent
Make use of flower_port_attr_type() safe: * flower_port_attr_type() may return a valid index into tb[] or -1. Only access tb[] in the case of the former. * Do not access null entries in tb[] Also make usage silent - it is valid for ip_proto to be invalid, for example if it is not specified as part of the filter. Fixes: a1fb0d484237 ("tc: flower: Support matching on SCTP ports") Signed-off-by: Simon Horman --- tc/f_flower.c | 24 +--- 1 file changed, 13 insertions(+), 11 deletions(-) diff --git a/tc/f_flower.c b/tc/f_flower.c index d86ccdc3d3f0..615e8f27bed2 100644 --- a/tc/f_flower.c +++ b/tc/f_flower.c @@ -162,19 +162,17 @@ static int flower_parse_ip_addr(char *str, __be16 eth_type, static int flower_port_attr_type(__u8 ip_proto, bool is_src) { - if (ip_proto == IPPROTO_TCP) { + if (ip_proto == IPPROTO_TCP) return is_src ? TCA_FLOWER_KEY_TCP_SRC : TCA_FLOWER_KEY_TCP_DST; - } else if (ip_proto == IPPROTO_UDP) { + else if (ip_proto == IPPROTO_UDP) return is_src ? TCA_FLOWER_KEY_UDP_SRC : TCA_FLOWER_KEY_UDP_DST; - } else if (ip_proto == IPPROTO_SCTP) { + else if (ip_proto == IPPROTO_SCTP) return is_src ? 
TCA_FLOWER_KEY_SCTP_SRC : TCA_FLOWER_KEY_SCTP_DST; - } else { - fprintf(stderr, "Illegal \"ip_proto\" for port\n"); + else return -1; - } } static int flower_parse_port(char *str, __u8 ip_proto, bool is_src, @@ -511,7 +509,8 @@ static void flower_print_ip_addr(FILE *f, char *name, __be16 eth_type, static void flower_print_port(FILE *f, char *name, struct rtattr *attr) { - fprintf(f, "\n %s %d", name, ntohs(rta_getattr_u16(attr))); + if (attr) + fprintf(f, "\n %s %d", name, ntohs(rta_getattr_u16(attr))); } static int flower_print_opt(struct filter_util *qu, FILE *f, @@ -520,6 +519,7 @@ static int flower_print_opt(struct filter_util *qu, FILE *f, struct rtattr *tb[TCA_FLOWER_MAX + 1]; __be16 eth_type = 0; __u8 ip_proto = 0xff; + int nl_type; if (!opt) return 0; @@ -574,10 +574,12 @@ static int flower_print_opt(struct filter_util *qu, FILE *f, tb[TCA_FLOWER_KEY_IPV6_SRC], tb[TCA_FLOWER_KEY_IPV6_SRC_MASK]); - flower_print_port(f, "dst_port", - tb[flower_port_attr_type(ip_proto, false)]); - flower_print_port(f, "src_port", - tb[flower_port_attr_type(ip_proto, true)]); + nl_type = flower_port_attr_type(ip_proto, false); + if (nl_type >= 0) + flower_print_port(f, "dst_port", tb[nl_type]); + nl_type = flower_port_attr_type(ip_proto, true); + if (nl_type >= 0) + flower_print_port(f, "src_port", tb[nl_type]); if (tb[TCA_FLOWER_FLAGS]) { __u32 flags = rta_getattr_u32(tb[TCA_FLOWER_FLAGS]); -- 2.7.0.rc3.207.g0ac5344
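The guarded lookup this patch introduces can be sketched outside of iproute2 as follows. This is an illustration only: the sketch_* names and the SK_ATTR_* enum are hypothetical stand-ins for flower_port_attr_type() and the TCA_FLOWER_KEY_* attributes, and the attribute table holds plain pointers instead of struct rtattr.

```c
#include <netinet/in.h>   /* IPPROTO_TCP, IPPROTO_UDP, IPPROTO_SCTP */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical attribute indices, standing in for TCA_FLOWER_KEY_*. */
enum sketch_attr {
    SK_ATTR_UNSPEC,
    SK_ATTR_TCP_SRC, SK_ATTR_TCP_DST,
    SK_ATTR_UDP_SRC, SK_ATTR_UDP_DST,
    SK_ATTR_SCTP_SRC, SK_ATTR_SCTP_DST,
    SK_ATTR_MAX = SK_ATTR_SCTP_DST
};

/* Returns a valid table index, or -1 silently for other protocols. */
int sketch_port_attr_type(uint8_t ip_proto, bool is_src)
{
    switch (ip_proto) {
    case IPPROTO_TCP:
        return is_src ? SK_ATTR_TCP_SRC : SK_ATTR_TCP_DST;
    case IPPROTO_UDP:
        return is_src ? SK_ATTR_UDP_SRC : SK_ATTR_UDP_DST;
    case IPPROTO_SCTP:
        return is_src ? SK_ATTR_SCTP_SRC : SK_ATTR_SCTP_DST;
    default:
        return -1;
    }
}

/*
 * Only index the table when the type is valid. The result may still be
 * NULL when the attribute is absent, so callers must check it too.
 */
const uint16_t *sketch_port_lookup(const uint16_t *tb[], uint8_t ip_proto,
                                   bool is_src)
{
    int type = sketch_port_attr_type(ip_proto, is_src);

    if (type < 0)
        return NULL;
    return tb[type];
}
```

Both failure modes named in the commit message map to a NULL result here: an ip_proto without ports never reaches the table, and a missing entry is returned as NULL rather than dereferenced.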
[[PATCH iproute2/net-next v2] 0/4] tc: flower: SCTP and other port fixes
Hi Stephen, this short series: * Makes some improvements to the documentation of flower; A follow-up to recent work by Paul Blakey and myself. * Corrects some errors introduced when SCTP port matching support was recently added. Changes since v2: * Rebase Simon Horman (4): tc: flower: remove references to eth_type in manpage tc: flower: document SCTP ip_proto tc: flower: correct name of ip_proto parameter to flower_parse_port() tc: flower: make use of flower_port_attr_type() safe and silent man/man8/tc-flower.8 | 37 ++--- tc/f_flower.c| 32 +--- 2 files changed, 35 insertions(+), 34 deletions(-) -- 2.7.0.rc3.207.g0ac5344
[[PATCH iproute2/net-next v2] 1/4] tc: flower: remove references to eth_type in manpage
Remove references to eth_type and ether_type (spelling error) in the tc flower manpage. Also correct formatting of boldface text with whitespace. Cc: Paul Blakey Signed-off-by: Simon Horman --- man/man8/tc-flower.8 | 23 +++ 1 file changed, 11 insertions(+), 12 deletions(-) diff --git a/man/man8/tc-flower.8 b/man/man8/tc-flower.8 index 16ef261797ab..56db42f983c1 100644 --- a/man/man8/tc-flower.8 +++ b/man/man8/tc-flower.8 @@ -103,8 +103,8 @@ or an unsigned 8bit value in hexadecimal format. Match on source or destination IP address. .I ADDRESS must be a valid IPv4 or IPv6 address, depending on -.BR ether_type , -which has to be specified in beforehand. +.BR protocol +option of tc filter. .TP .BI dst_port " NUMBER" .TQ @@ -114,16 +114,15 @@ Match on layer 4 protocol source or destination port number. Only available for which has to be specified in beforehand. .SH NOTES As stated above where applicable, matches of a certain layer implicitly depend -on the matches of the next lower layer. Precisely, layer one and two matches ( -.BR indev , dst_mac , src_mac " and " eth_type ) -have no dependency, layer three matches ( -.BR ip_proto , dst_ip " and " src_ip ) -require -.B eth_type -being set to either -.BR ipv4 " or " ipv6 , -and finally layer four matches ( -.BR dst_port " and " src_port ) +on the matches of the next lower layer. Precisely, layer one and two matches +(\fBindev\fR, \fBdst_mac\fR and \fBsrc_mac\fR) +have no dependency, layer three matches +(\fBip_proto\fR, \fBdst_ip\fR and \fBsrc_ip\fR) +depend on the +.B protocol +option of tc filter +and finally layer four matches +(\fBdst_port\fR and \fBsrc_port\fR) depend on .B ip_proto being set to either -- 2.7.0.rc3.207.g0ac5344
Re: [PATCH net 0/7] net: stmmac: fix probe error handling and phydev leaks
On 11/30/2016 3:29 PM, Johan Hovold wrote: This series fixes a number of issues with the stmmac-driver probe error handling, which for example left clocks enabled after probe failures. The final patch fixes a failure to deregister and free any fixed-link PHYs that were registered during probe on probe errors and on driver unbind. It also fixes a related of-node leak on late probe errors. This series depends on the of_phy_deregister_fixed_link() helper that was just merged to net. As mentioned earlier, one staging driver also suffers from a similar leak and can be fixed up once the above mentioned helper hits mainline. Note that these patches have only been compile tested. For common and STi part: Acked-by: Giuseppe Cavallaro thx peppe Johan Johan Hovold (7): net: ethernet: stmmac: dwmac-socfpga: fix use-after-free on probe errors net: ethernet: stmmac: dwmac-sti: fix probe error path net: ethernet: stmmac: dwmac-rk: fix probe error path net: ethernet: stmmac: dwmac-generic: fix probe error path net: ethernet: stmmac: dwmac-meson8b: fix probe error path net: ethernet: stmmac: platform: fix outdated function header net: ethernet: stmmac: fix of-node and fixed-link-phydev leaks .../net/ethernet/stmicro/stmmac/dwmac-generic.c| 17 -- .../net/ethernet/stmicro/stmmac/dwmac-ipq806x.c| 25 ++ .../net/ethernet/stmicro/stmmac/dwmac-lpc18xx.c| 17 -- drivers/net/ethernet/stmicro/stmmac/dwmac-meson.c | 23 ++--- .../net/ethernet/stmicro/stmmac/dwmac-meson8b.c| 32 +- drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c | 21 +--- .../net/ethernet/stmicro/stmmac/dwmac-socfpga.c| 39 ++ drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c| 23 ++--- drivers/net/ethernet/stmicro/stmmac/dwmac-stm32.c | 19 --- drivers/net/ethernet/stmicro/stmmac/dwmac-sunxi.c | 26 +++ drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 1 - .../net/ethernet/stmicro/stmmac/stmmac_platform.c | 33 +++--- .../net/ethernet/stmicro/stmmac/stmmac_platform.h | 2 ++ 13 files changed, 215 insertions(+), 63 deletions(-)
Re: [PATCH 1/4] bindings: net: stmmac: correct note about TSO
On 11/23/2016 3:24 PM, Niklas Cassel wrote: From: Niklas Cassel snps,tso was previously placed under AXI BUS Mode parameters, suggesting that the property should be in the stmmac-axi-config node. TSO (TCP Segmentation Offloading) has nothing to do with AXI BUS Mode parameters, and the parser actually expects it to be in the root node, not in the stmmac-axi-config. Also added a note about snps,tso only being available on GMAC4 and newer. Signed-off-by: Niklas Cassel Acked-by: Giuseppe Cavallaro --- Documentation/devicetree/bindings/net/stmmac.txt | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/Documentation/devicetree/bindings/net/stmmac.txt b/Documentation/devicetree/bindings/net/stmmac.txt index 41b49e6075f5..b95ff998ba73 100644 --- a/Documentation/devicetree/bindings/net/stmmac.txt +++ b/Documentation/devicetree/bindings/net/stmmac.txt @@ -1,7 +1,7 @@ * STMicroelectronics 10/100/1000 Ethernet driver (GMAC) Required properties: -- compatible: Should be "snps,dwmac-" "snps,dwmac" +- compatible: Should be "snps,dwmac-", "snps,dwmac" For backwards compatibility: "st,spear600-gmac" is also supported. - reg: Address and length of the register set for the device - interrupt-parent: Should be the phandle for the interrupt controller @@ -50,6 +50,8 @@ Optional properties: - snps,ps-speed: port selection speed that can be passed to the core when PCS is supported. For example, this is used in case of SGMII and MAC2MAC connection. +- snps,tso: this enables the TSO feature otherwise it will be managed by +MAC HW capability register. Only for GMAC4 and newer. - AXI BUS Mode parameters: below the list of all the parameters to program the AXI register inside the DMA module: - snps,lpi_en: enable Low Power Interface @@ -62,8 +64,6 @@ Optional properties: - snps,fb: fixed-burst - snps,mb: mixed-burst - snps,rb: rebuild INCRx Burst - - snps,tso: this enables the TSO feature otherwise it will be managed by - MAC HW capability register. 
- mdio: with compatible = "snps,dwmac-mdio", create and register mdio bus. Examples:
Re: pull request (net-next): ipsec-next 2016-12-01
On Thu, Dec 01, 2016 at 11:45:16AM -0500, David Miller wrote: > From: Steffen Klassert > Date: Thu, 1 Dec 2016 12:48:04 +0100 > > > git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next.git > > master > > > > for you to fetch changes up to 2258d927a691ddd2ab585adb17ea9f96e89d0638: > > > > xfrm: remove unused helper (2016-09-30 08:20:56 +0200) > > Hmmm, when I try to pull I don't get anything: > > [davem@dhcp-10-15-49-210 net-next]$ git pull --no-ff > git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next.git master > From git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next > * branch master -> FETCH_HEAD > Already up-to-date. Oh yes, my bad. You've got it already with my last pull request. Sorry.
Re: [PATCH 1/2] net: stmmac: avoid Camelcase naming
On Fri, Dec 02, 2016 at 09:44:48AM +0100, Giuseppe CAVALLARO wrote: > Hello Corentin > > patches look ok, I just wonder if you tested it in case of > the stmmac is connected to a transceiver. Let me consider it > a critical part of the driver to properly work. > > Regards > Peppe > I tested it on a Cubieboard 2 (dwmac-sunxi). What do you mean by "connected to a transceiver" ? an external PHY ? Regards
Re: [PATCH v2] cpsw: ethtool: add support for getting/setting EEE registers
Hi Florian sorry for my delay. On 11/24/2016 7:23 PM, Florian Fainelli wrote: +Peppe, On 24/11/2016 at 07:38, Andrew Lunn wrote: As for enabling advertising and correct working of cpsw do you mean it would be better to disable EEE in any PHY on cpsw initialization as long as cpsw doesn't provide support for EEE? We observe some strange behavior with our gigabit PHYs and a link partner in an EEE-capable unmanaged NetGear switch. Disabling advertising seems to help. Though we're still investigating the issue. Hi Florian Am I right in saying, a PHY should not advertise EEE until the MAC driver calls phy_init_eee(), indicating the MAC supports EEE? You would think so, but I don't see how this could possibly work if that was not the case already, see below. If so, it looks like we need to change a few of the PHY drivers, in particular, the bcm-*.c. The first part that bcm-phy-lib.c does is make sure that EEE is enabled such that this gets reflected in MDIO_PCS_EEE_ABLE, without this, we won't be able to pass the first test in phy_init_eee(). The second part is to advertise EEE such that this gets reflected in MDIO_AN_EEE_ADV, also to make sure that we can pass the second check in phy_init_eee(). Now, looking at phy_init_eee(), and what stmmac does (and bcmgenet, copied after stmmac), we need to somehow, have EEE advertised for phy_init_eee() to succeed, prepare the MAC to support EEE, and finally conclude with a call to phy_ethtool_set_eee(), which writes to the MDIO_AN_EEE_ADV register, and concludes the EEE auto-negotiated process. Since we already have EEE advertised, we are essentially just checking that the EEE advertised settings and the LP advertised settings actually do match, so it sounds like the final call to phy_ethtool_set_eee() is potentially useless if the resolved advertised and link partner advertised settings already match... 
So it sounds like at least, the first time you try to initialize EEE, we should start with EEE not advertised, and then, if we have EEE enabled at some point, and we re-negotiate the link parameters, somehow phy_init_eee() does a right job for that. Peppe, any thoughts on this? I share what you say. In sum, the EEE management inside the stmmac is: - the driver looks at own HW cap register if EEE is supported (indeed the user could keep disable EEE if bugged on some HW + Alex, Fabrice: we had some patches for this to propose where we called the phy_ethtool_set_eee to disable feature at phy level - then the stmmac asks PHY layer to understand if transceiver and partners are EEE capable. - If all matches the EEE is actually initialized. the logic above should be respected when use ethtool, hmm, I will check the stmmac_ethtool_op_set_eee asap. Hoping this is useful Regards Peppe
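The two checks discussed in this thread (PHY capability first, then a match between the local and link-partner advertisements) can be sketched as a pure function. This is a simplified model and not phy_init_eee() itself: sketch_eee_usable() and the SKETCH_EEE_* bit values are assumptions made for the example, and the duplex and interface-mode checks of the real helper are left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-mode bits, one per link mode that supports EEE. */
#define SKETCH_EEE_100TX 0x0002u
#define SKETCH_EEE_1000T 0x0004u

/*
 * phy_able : modes the PHY reports as EEE capable
 * adv      : modes advertised locally
 * lp_adv   : modes advertised by the link partner
 * link_mode: the single mode the link resolved to
 */
bool sketch_eee_usable(uint16_t phy_able, uint16_t adv, uint16_t lp_adv,
                       uint16_t link_mode)
{
    /* First check: the PHY must be capable at all. */
    if (!(phy_able & link_mode))
        return false;

    /* Second check: both ends must have advertised the mode. */
    return (adv & lp_adv & link_mode) != 0;
}
```

The model makes the ordering problem visible: if nothing is advertised before the check runs (adv == 0), the second test can never pass, which is why the advertisement has to be in place before the MAC-side initialization.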
Re: [PATCH v3 net-next 2/3] openvswitch: Use is_skb_forwardable() for length check.
On Thu, 1 Dec 2016 11:50:00 -0800, Pravin Shelar wrote: > This is not changing any behavior compared to current OVS vlan checks. > Single vlan header is not considered for MTU check. It is changing it. Consider the case when there's an interface with MTU 1500 forwarding to an interface with MTU 1496. Obviously, full-sized vlan frames ingressing on the first interface are not forwardable to the second one. Yet, if the vlan tag is accelerated (and thus not counted in skb->len), is_skb_forwardable happily returns true because of the check len = dev->mtu + dev->hard_header_len + VLAN_HLEN; if (skb->len <= len) Jiri
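The arithmetic behind this example can be checked with a small userspace sketch. The function names and SKETCH_* constants below are made up for illustration; only the comparison itself is taken from the check quoted above, and skb_len is assumed to include the Ethernet header but not the accelerated vlan tag.

```c
#include <stdbool.h>

#define SKETCH_ETH_HLEN  14u   /* Ethernet header length */
#define SKETCH_VLAN_HLEN 4u    /* one vlan tag */

/* The length test as quoted: skb->len <= mtu + hard_header_len + VLAN_HLEN. */
bool sketch_length_check(unsigned skb_len, unsigned mtu,
                         unsigned hard_header_len)
{
    return skb_len <= mtu + hard_header_len + SKETCH_VLAN_HLEN;
}

/*
 * Layer 3 payload carried by the frame. With an accelerated tag the tag
 * is not part of skb_len, so only the Ethernet header is subtracted.
 */
unsigned sketch_l3_payload(unsigned skb_len, unsigned hard_header_len)
{
    return skb_len - hard_header_len;
}
```

For a full-sized frame received on the MTU 1500 interface, skb_len is 1514 with the tag accelerated. Against an egress MTU of 1496 the limit is 1496 + 14 + 4 = 1514, so the check passes, although the payload is 1500 bytes and does not fit into 1496.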
Re: [PATCH 1/2] net: stmmac: avoid Camelcase naming
On 12/2/2016 9:58 AM, Corentin Labbe wrote: On Fri, Dec 02, 2016 at 09:44:48AM +0100, Giuseppe CAVALLARO wrote: Hello Corentin patches look ok, I just wonder if you tested it in case of the stmmac is connected to a transceiver. Let me consider it a critical part of the driver to properly work. Regards Peppe I tested it on a Cubieboard 2 (dwmac-sunxi). What do you mean by "connected to a transceiver" ? an external PHY ? yes an external PHY. AFAIK, users have, sometime, a switch with fixed link enabled. Thx for your support and patches Acked-by: Giuseppe Cavallaro Regards
Re: [RFC PATCH net-next v2] ipv6: implement consistent hashing for equal-cost multipath routing
On 11/30/2016 05:04 AM, Tom Herbert wrote: > This is a lot of code to make ECMP work better. Can you be more > specific as to what the "issues" are? Assuming this is just the > transient packet reorder that happens in one link flap I am wondering > if this complexity is justified. Non-consistent hashing is an issue when the load balancer is in front of stateful backends, keeping per-flow state. Also, if neighbors are constantly being added and removed, flows are constantly changing nexthops. David
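The property being asked for (flows keep their nexthop when an unrelated nexthop disappears) can be shown with rendezvous hashing, also known as highest-random-weight hashing. This is a swapped-in technique chosen because it is short, not the algorithm of the RFC patch, and hrw_select(), mix32() and the nexthop ids are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Simple 32-bit integer mixer, used only to derive per-pair scores. */
static uint32_t mix32(uint32_t x)
{
    x ^= x >> 16;
    x *= 0x85ebca6bu;
    x ^= x >> 13;
    x *= 0xc2b2ae35u;
    x ^= x >> 16;
    return x;
}

/*
 * Rendezvous hashing: score every (flow, nexthop) pair and pick the
 * nexthop with the highest score. Ties are broken by the smaller id so
 * the result does not depend on array order.
 * Returns an index into nexthop_ids, or -1 if the array is empty.
 */
int hrw_select(uint32_t flow_hash, const uint32_t *nexthop_ids, size_t n)
{
    int best = -1;
    uint32_t best_score = 0;

    for (size_t i = 0; i < n; i++) {
        uint32_t score = mix32(flow_hash ^ mix32(nexthop_ids[i]));

        if (best < 0 || score > best_score ||
            (score == best_score && nexthop_ids[i] < nexthop_ids[best])) {
            best = (int)i;
            best_score = score;
        }
    }
    return best;
}
```

Removing one nexthop only moves the flows that were mapped to it; every other flow still sees the same winner. A plain hash modulo the number of nexthops does not have this property, since changing the divisor can remap flows that never used the removed nexthop.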
Re: [patch net-next v3 11/12] mlxsw: spectrum_router: Request a dump of FIB tables during init
On Fri, Dec 02, 2016 at 12:27:25AM +0100, Hannes Frederic Sowa wrote: > I really like that. Would you mind adding this? Yes. I'll send another version to Jiri today after testing and hopefully we can submit today / tomorrow. I think Linus is still undecided about -rc8 and I would like to get this in 4.10. > >> Quick follow-up question: How can I quickly find out the hw limitations > >> via the kernel api? > > > > That's a good question. Currently, you can't. However, we already have a > > mechanism in place to read device's capabilities from the firmware and > > we can (and should) expose some of them to the user. The best API for > > that would be devlink, as it can represent the entire device as opposed > > to only a port netdev like other tools. > > > > We're also working on making the pipeline more visible to the user, so > > that it would be easier for users to understand and debug their > > networks. I believe a colleague of mine (Matty) presented this during > > the last netdev conference. > > Thanks, I will look it up! Found it: https://www.youtube.com/watch?v=gwzaKXWIelc&feature=youtu.be&list=PLrninrcyMo3IkTvpvM2LK6gn4NdbFhI0G&t=6892
Re: [PATCH v3 net-next 3/3] openvswitch: Fix skb->protocol for vlan frames.
On Thu, 1 Dec 2016 12:31:09 -0800, Pravin Shelar wrote: > On Wed, Nov 30, 2016 at 6:30 AM, Jiri Benc wrote: > > I'm not opposed to changing this but I'm afraid it needs much deeper > > review. Because with this in place, no core kernel functions that > > depend on skb->protocol may be called from within openvswitch. > > > Can you give specific example where it does not work? I can't, I haven't reviewed the usage. I'm just saying that the stack does not expect skb->protocol being ETH_P_8021Q for e.g. IPv4 packets. It may not be relevant for the calls used by openvswitch but we should be sure about that. Especially defragmentation and conntrack is worth looking at. Again, I'm not saying this is wrong nor that there is an actual problem. I'm just pointing out that openvswitch has different expectations about skb wrt. vlans than the rest of the kernel and we should be reasonably sure the behavior is correct when passing between the two. > skb-protocol value is set by the caller, so it should not be > arbitrary. is it missing in any case? It's not set exactly by the caller, because that's what this patch is removing. It is set by whoever handed over the packet to openvswitch. The point is we don't know *what* it is set to. It may as well be ETH_P_8021Q, breaking the conditions here. It should not happen in practice but still, it seems weird to depend on the fact that the packet coming to ovs has never skb->protocol equal to ETH_P_8021Q nor ETH_P_8021AD. Jiri
Re: stmmac ethernet in kernel 4.9-rc6: coalescing related pauses.
Hi Pavel

On 12/2/2016 9:45 AM, Pavel Machek wrote:

Hi!

1 HZ, which is the lowest granularity of non-highres timers in the kernel, is variable as well as already too large of a delay for effective TX coalescing. I seriously think that the TX coalescing support should be ripped out or disabled entirely until it is implemented properly in this driver.

Ok, I'd disable coalescing, but I have not been able to figure out how yet. What is the generic way to do that?

It seems the only thing stmmac_tx_timer() does is call stmmac_tx_clean(), which reclaims tx_skbuff[] entries. It should be possible to do that explicitly, without delay, but it stops working completely if I attempt to do that.

On a side note, stmmac_poll() does stmmac_enable_dma_irq() while stmmac_dma_interrupt() disables interrupts. But I don't see any protection between the two, so IMO it could race and we'd end up without polling or interrupts...

the idea behind the TX mitigation is to mix the interrupt and timer and this approach gave us real benefit in terms of performance and CPU usage (especially on SH4-200/SH4-300 based platforms).

Well, if you have a workload that sends and receives packets, it tends to work ok, as you do tx_clean() in stmmac_poll(). My workload is not like that -- it is "sending packets at 3MB/sec, receiving none". So the stmmac_tx_timer() is rescheduled and rescheduled and rescheduled, and then we run out of transmit descriptors, and then 40msec passes, and then we clean them. Bad. And that's why low-res timers do not cut it.

in that case, I expect that the tuning of the driver could help you. I mean, by using ethtool, it could be enough to set the IC bit on all the descriptors. You should touch the tx_coal_frames. Then you can use ethtool -S to monitor the status. We had experimented with this tuning on STB IP where only datagrams had to be sent externally. To be honest, although we had seen better results w/o any timer, we kept this approach enabled because the timer was fast enough to cover our tests on SH4 boxes.

FYI, stmmac doesn't implement adaptive algo. In the ring, some descriptors can raise the irq (according to a threshold) and set the IC bit. In this path, the NAPI poll will be scheduled.

Not NAPI poll but stmmac_tx_timer(), right?

in the xmit path, according to the threshold, the timer is started or the interrupt is set inside the descriptor. Then stmmac_tx_clean will always be called and, if you see the flow, no irqlock protection is needed!

But there is a timer that can run (and our experiments showed that no high resolution is needed) to clear the tx resources. Concerning the lock protection, we had reviewed it a long time ago and IIRC, no race condition should be present. Open to review it, again!

Well, I certainly like the fact that we are talking :-). And yes, I have some questions. There's nothing that protects stmmac_poll() from running concurrently with stmmac_dma_interrupt(), right?

This is not necessary.

Best Regards
peppe

Best regards, Pavel
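The frame-count threshold described in this exchange can be sketched in user-space C. This is only a model under stated assumptions: `tx_coal_state` and `tx_coal_on_xmit` are hypothetical names, not stmmac symbols, and the boolean `timer_armed` merely stands in for the cleanup timer. It shows why a threshold of 1 marks every descriptor for an interrupt, which seems to be the tuning suggested above.

```c
#include <stdbool.h>

/* Hypothetical model of frame-count based TX interrupt mitigation. */
struct tx_coal_state {
	unsigned int coal_frames; /* threshold, in the spirit of tx_coal_frames */
	unsigned int pending;     /* frames queued since the last marked descriptor */
	bool timer_armed;         /* stands in for the deferred cleanup timer */
};

/*
 * Called once per transmitted frame. Returns true when the descriptor
 * for this frame should request a completion interrupt; otherwise the
 * cleanup is left to the (modelled) timer.
 */
static bool tx_coal_on_xmit(struct tx_coal_state *s)
{
	s->pending++;
	if (s->pending >= s->coal_frames) {
		s->pending = 0;
		s->timer_armed = false;
		return true;
	}
	s->timer_armed = true;
	return false;
}
```

With `coal_frames` set to 1 the model never relies on the timer, so cleanup latency no longer depends on timer granularity.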
RE: [PATCH] net:phy fix driver reference count error when attach and detach phy device
From: Mao Wenan > Sent: 30 November 2016 10:23 > The nic in my board use the phy dev from marvell, and the system will > load the marvell phy driver automatically, but when I remove the phy > drivers, the system immediately panic: > Call trace: > [ 2582.834493] [] phy_state_machine+0x3c/0x438 [ > 2582.851754] [] process_one_work+0x150/0x428 [ > 2582.868188] [] worker_thread+0x144/0x4b0 [ > 2582.883882] [] kthread+0xfc/0x110 > > there should be proper reference counting in place to avoid that. > I found that phy_attach_direct() forgets to add phy device driver > reference count, and phy_detach() forgets to subtract reference count. > This patch is to fix this bug, after that panic is disappeared when remove > marvell.ko > > Signed-off-by: Mao Wenan > --- > drivers/net/phy/phy_device.c | 7 +++ > 1 file changed, 7 insertions(+) > > diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c > index 1a4bf8a..a7ec7c2 100644 > --- a/drivers/net/phy/phy_device.c > +++ b/drivers/net/phy/phy_device.c > @@ -866,6 +866,11 @@ int phy_attach_direct(struct net_device *dev, struct > phy_device *phydev, > return -EIO; > } > > + if (!try_module_get(d->driver->owner)) { > + dev_err(&dev->dev, "failed to get the device driver module\n"); > + return -EIO; > + } If this is the phy code, what stops the phy driver being unloaded before the try_module_get() obtains a reference. If it isn't the phy driver then there ought to be a reference count obtained when the phy driver is located (by whatever decides which phy driver to use). Even if that code later releases its reference (it probably shouldn't on success) then you can't fail to get an extra reference here. > + > get_device(d); > > /* Assume that if there is no driver, that it doesn't > @@ -921,6 +926,7 @@ int phy_attach_direct(struct net_device *dev, struct > phy_device *phydev, > > error: > put_device(d); > + module_put(d->driver->owner); Are those two in the wrong order ? 
> module_put(bus->owner); > return err; > } > @@ -998,6 +1004,7 @@ void phy_detach(struct phy_device *phydev) > bus = phydev->mdio.bus; > > put_device(&phydev->mdio.dev); > + module_put(phydev->mdio.dev.driver->owner); > module_put(bus->owner); Where is this code called from? You can't call it from the phy driver because the driver can be unloaded as soon as the last reference is removed. At that point the code memory is freed. > } > EXPORT_SYMBOL(phy_detach); > -- > 2.7.0 >
Re: [patch] net: renesas: ravb: unintialized return value
On Thu, Dec 01, 2016 at 11:57:44PM +0300, Dan Carpenter wrote:
> We want to set the other "err" variable here so that we can return it
> later. My version of GCC misses this issue but I caught it with a
> static checker.
>
> Fixes: 9f70eb339f52 ("net: ethernet: renesas: ravb: fix fixed-link phydev leaks")
> Signed-off-by: Dan Carpenter

Thanks for catching this.

Reviewed-by: Johan Hovold

Johan
[PATCH v2] netfilter: avoid warn and OOM killer on vmalloc call
Andrey Konovalov reported that this vmalloc call is based on a userspace request and that it's spewing traces, which may flood the logs and cause DoS if abused.

Florian Westphal also mentioned that this call should not trigger the OOM killer.

This patch brings the vmalloc call in sync with kmalloc: it disables the warn trace on allocation failure and also disables OOM killer invocation. Note, however, that under such stress situations, other places may still trigger OOM killer invocation.

Reported-by: Andrey Konovalov
Cc: Florian Westphal
Signed-off-by: Marcelo Ricardo Leitner
---
 net/netfilter/x_tables.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
index fc4977456c30e098197b4f987b758072c9cf60d9..dece525bf83a0098dad607fce665cd0bde228362 100644
--- a/net/netfilter/x_tables.c
+++ b/net/netfilter/x_tables.c
@@ -958,7 +958,9 @@ struct xt_table_info *xt_alloc_table_info(unsigned int size)
 	if (sz <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER))
 		info = kmalloc(sz, GFP_KERNEL | __GFP_NOWARN | __GFP_NORETRY);
 	if (!info) {
-		info = vmalloc(sz);
+		info = __vmalloc(sz, GFP_KERNEL | __GFP_NOWARN |
+				     __GFP_NORETRY | __GFP_HIGHMEM,
+				 PAGE_KERNEL);
 		if (!info)
 			return NULL;
 	}
-- 
2.9.3
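As a worked example of the size check in the hunk above, the following user-space sketch computes which requests would try kmalloc first. It is an illustration only: the `MODEL_` constants are assumptions (4 KiB pages and a costly order of 3), and `tries_kmalloc_first` is a hypothetical helper, not a kernel function.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE               4096UL /* assumption: 4 KiB pages */
#define MODEL_PAGE_ALLOC_COSTLY_ORDER 3      /* assumption: value of the kernel constant */

/*
 * Mirrors the "sz <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)" test:
 * requests up to 8 pages (32768 bytes under the assumptions above)
 * go to kmalloc first, larger ones go straight to the vmalloc path.
 */
static bool tries_kmalloc_first(size_t sz)
{
	return sz <= (MODEL_PAGE_SIZE << MODEL_PAGE_ALLOC_COSTLY_ORDER);
}
```

On an architecture with 64 KiB pages the same expression would give a much larger threshold, so the split between the two allocators is page-size dependent.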
Re: [PATCH v3 net-next 3/3] openvswitch: Fix skb->protocol for vlan frames.
On Fri, 2 Dec 2016 10:42:02 +0100, Jiri Benc wrote:
> On Thu, 1 Dec 2016 12:31:09 -0800, Pravin Shelar wrote:
> It's not set exactly by the caller, because that's what this patch is
> removing. It is set by whoever handed over the packet to openvswitch.
> The point is we don't know *what* it is set to. It may as well be
> ETH_P_8021Q, breaking the conditions here. It should not happen in
> practice but still, it seems weird to depend on the fact that the
> packet coming to ovs has never skb->protocol equal to ETH_P_8021Q nor
> ETH_P_8021AD.

I'm wondering whether we should not revive the patchset that makes the first vlan tag always accelerated. It makes handling of various packet formats and the checks for forwardability so much simpler...

 Jiri
[PATCH] net: wireless: realtek: constify rate_control_ops structures
The structures rate_control_ops are only passed as an argument to the functions ieee80211_rate_control_{register/unregister}. This argument is of type const, so rate_control_ops having this property can also be declared as const.

Done using Coccinelle:

@r1 disable optional_qualifier @
identifier i;
position p;
@@
static struct rate_control_ops i@p = {...};

@ok1@
identifier r1.i;
position p;
@@
ieee80211_rate_control_register(&i@p)

@ok2@
identifier r1.i;
position p;
@@
ieee80211_rate_control_unregister(&i@p)

@bad@
position p!={r1.p,ok1.p,ok2.p};
identifier r1.i;
@@
i@p

@depends on !bad disable optional_qualifier@
identifier r1.i;
@@
static
+const
struct rate_control_ops i={...};

@depends on !bad disable optional_qualifier@
identifier r1.i;
@@
+const
struct rate_control_ops i;

File size before:
   text    data     bss     dec     hex filename
   1991     104       0    2095     82f wireless/realtek/rtlwifi/rc.o

File size after:
   text    data     bss     dec     hex filename
   2095       0       0    2095         wireless/realtek/rtlwifi/rc.o

Signed-off-by: Bhumika Goyal
---
 drivers/net/wireless/realtek/rtlwifi/rc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/realtek/rtlwifi/rc.c b/drivers/net/wireless/realtek/rtlwifi/rc.c
index ce8621a..107c13c 100644
--- a/drivers/net/wireless/realtek/rtlwifi/rc.c
+++ b/drivers/net/wireless/realtek/rtlwifi/rc.c
@@ -284,7 +284,7 @@ static void rtl_rate_free_sta(void *rtlpriv,
 	kfree(rate_priv);
 }

-static struct rate_control_ops rtl_rate_ops = {
+static const struct rate_control_ops rtl_rate_ops = {
 	.name = "rtl_rc",
 	.alloc = rtl_rate_alloc,
 	.free = rtl_rate_free,
-- 
1.9.1
[PATCH/RFC net-next 0/2] net/sched: cls_flower: Support matching on ICMP
Hi,

this series adds support for matching on ICMP type and code to cls_flower. This is modeled on existing support for matching on L4 ports. The updates to the dissector are intended to allow for code and storage re-use.

Simon Horman (2):
  flow dissector: ICMP support
  net/sched: cls_flower: Support matching on ICMP type and code

 drivers/net/bonding/bond_main.c | 6 +++--
 include/linux/skbuff.h | 5 +
 include/net/flow_dissector.h| 50 ++---
 include/uapi/linux/pkt_cls.h| 10 +
 net/core/flow_dissector.c | 34 +---
 net/sched/cls_flow.c| 4 ++--
 net/sched/cls_flower.c | 42 ++
 7 files changed, 141 insertions(+), 10 deletions(-)

-- 
2.7.0.rc3.207.g0ac5344
[PATCH/RFC net-next 1/2] flow dissector: ICMP support
Allow dissection of ICMP(V6) type and code. This re-uses transport layer port dissection code as although ICMP is not a transport protocol and their type and code are not ports this allows sharing of both code and storage. Signed-off-by: Simon Horman --- drivers/net/bonding/bond_main.c | 6 -- include/linux/skbuff.h | 5 + include/net/flow_dissector.h| 30 +++--- net/core/flow_dissector.c | 34 +++--- net/sched/cls_flow.c| 4 ++-- 5 files changed, 69 insertions(+), 10 deletions(-) diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index 8029dd4912b6..a6f75cfb2bf7 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -3181,7 +3181,8 @@ static bool bond_flow_dissect(struct bonding *bond, struct sk_buff *skb, } else { return false; } - if (bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER34 && proto >= 0) + if (bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER34 && + proto >= 0 && !skb_flow_is_icmp_any(skb, proto)) fk->ports.ports = skb_flow_get_ports(skb, noff, proto); return true; @@ -3209,7 +3210,8 @@ u32 bond_xmit_hash(struct bonding *bond, struct sk_buff *skb) return bond_eth_hash(skb); if (bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER23 || - bond->params.xmit_policy == BOND_XMIT_POLICY_ENCAP23) + bond->params.xmit_policy == BOND_XMIT_POLICY_ENCAP23 || + flow_keys_are_icmp_any(&flow)) hash = bond_eth_hash(skb); else hash = (__force u32)flow.ports.ports; diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 9c535fbccf2c..44a8f69a9198 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -1094,6 +1094,11 @@ u32 __skb_get_poff(const struct sk_buff *skb, void *data, __be32 __skb_flow_get_ports(const struct sk_buff *skb, int thoff, u8 ip_proto, void *data, int hlen_proto); +static inline bool skb_flow_is_icmp_any(const struct sk_buff *skb, u8 ip_proto) +{ + return flow_protos_are_icmp_any(skb->protocol, ip_proto); +} + static inline __be32 skb_flow_get_ports(const struct sk_buff 
*skb, int thoff, u8 ip_proto) { diff --git a/include/net/flow_dissector.h b/include/net/flow_dissector.h index c4f31666afd2..8880025914e3 100644 --- a/include/net/flow_dissector.h +++ b/include/net/flow_dissector.h @@ -2,6 +2,7 @@ #define _NET_FLOW_DISSECTOR_H #include +#include #include #include @@ -89,10 +90,15 @@ struct flow_dissector_key_addrs { }; /** - * flow_dissector_key_tp_ports: - * @ports: port numbers of Transport header + * flow_dissector_key_ports: + * @ports: port numbers of Transport header or + * type and code of ICMP header + * ports: source (high) and destination (low) port numbers * src: source port number * dst: destination port number + * icmp: ICMP type (high) and code (low) + * type: ICMP type + * code: ICMP code */ struct flow_dissector_key_ports { union { @@ -101,6 +107,11 @@ struct flow_dissector_key_ports { __be16 src; __be16 dst; }; + __be16 icmp; + struct { + u8 type; + u8 code; + }; }; }; @@ -188,9 +199,22 @@ struct flow_keys_digest { void make_flow_keys_digest(struct flow_keys_digest *digest, const struct flow_keys *flow); +static inline bool flow_protos_are_icmp_any(__be16 n_proto, u8 ip_proto) +{ + return (n_proto == htons(ETH_P_IP) && ip_proto == IPPROTO_ICMP) || + (n_proto == htons(ETH_P_IPV6) && ip_proto == IPPROTO_ICMPV6); +} + +static inline bool flow_keys_are_icmp_any(const struct flow_keys *keys) +{ + return flow_protos_are_icmp_any(keys->basic.n_proto, + keys->basic.ip_proto); +} + static inline bool flow_keys_have_l4(const struct flow_keys *keys) { - return (keys->ports.ports || keys->tags.flow_label); + return (!flow_keys_are_icmp_any(keys) && keys->ports.ports) || + keys->tags.flow_label; } u32 flow_hash_from_keys(struct flow_keys *keys); diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c index 1eb6f949e5b2..0584b4bb4390 100644 --- a/net/core/flow_dissector.c +++ b/net/core/flow_dissector.c @@ -58,6 +58,28 @@ void skb_flow_dissector_init(struct flow_dissector *flow_dissector,
EXPORT_SYMBOL(skb_flow_dissector_init); /** + * skb_flow_get_be16 - extract be16 entity + * @skb: sk_buff to extract from + * @poff: offset to extract at + * @data: raw buffer pointer to the packet + * @hlen: packet header length + * + * The function will
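The storage sharing proposed in this patch can be illustrated with a small user-space model. It is a sketch, not the kernel code: `key_ports_model`, `protos_are_icmp_any`, `keys_have_l4` and the `MODEL_` constants are hypothetical stand-ins, fixed-width types replace the kernel's `__be16`/`u8`, and the flow-label term of `flow_keys_have_l4()` is left out. It shows that writing the ICMP type and code makes the overlapping `ports` word non-zero, which is presumably why the patch adds an explicit ICMP check before treating `ports` as L4 ports.

```c
#include <stdbool.h>
#include <stdint.h>
#include <arpa/inet.h> /* htons() */

/* Local copies of well-known protocol numbers, to keep the sketch portable. */
#define MODEL_ETH_P_IP       0x0800
#define MODEL_ETH_P_IPV6     0x86DD
#define MODEL_IPPROTO_ICMP   1
#define MODEL_IPPROTO_ICMPV6 58

/* Ports and ICMP type/code share the same 4 bytes of storage. */
struct key_ports_model {
	union {
		uint32_t ports;
		struct { uint16_t src; uint16_t dst; };
		uint16_t icmp;
		struct { uint8_t type; uint8_t code; };
	};
};

/* True for ICMP over IPv4 or ICMPv6 over IPv6; n_proto is in network order. */
static bool protos_are_icmp_any(uint16_t n_proto, uint8_t ip_proto)
{
	return (n_proto == htons(MODEL_ETH_P_IP) &&
		ip_proto == MODEL_IPPROTO_ICMP) ||
	       (n_proto == htons(MODEL_ETH_P_IPV6) &&
		ip_proto == MODEL_IPPROTO_ICMPV6);
}

/* The shared word only counts as L4 ports when the flow is not ICMP. */
static bool keys_have_l4(uint16_t n_proto, uint8_t ip_proto,
			 const struct key_ports_model *k)
{
	return !protos_are_icmp_any(n_proto, ip_proto) && k->ports != 0;
}
```

The anonymous struct and union members need C11 or a compiler that accepts them as an extension.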
[PATCH/RFC net-next 2/2] net/sched: cls_flower: Support matching on ICMP type and code
Support matching on ICMP type and code. Example usage: tc qdisc add dev eth0 ingress tc filter add dev eth0 protocol ip parent : flower \ indev eth0 ip_proto icmp type 8 code 0 action drop tc filter add dev eth0 protocol ipv6 parent : flower \ indev eth0 ip_proto icmpv6 type 128 code 0 action drop Signed-off-by: Simon Horman --- include/net/flow_dissector.h | 24 ++-- include/uapi/linux/pkt_cls.h | 10 ++ net/sched/cls_flower.c | 42 ++ 3 files changed, 74 insertions(+), 2 deletions(-) diff --git a/include/net/flow_dissector.h b/include/net/flow_dissector.h index 8880025914e3..5540dfa18872 100644 --- a/include/net/flow_dissector.h +++ b/include/net/flow_dissector.h @@ -199,10 +199,30 @@ struct flow_keys_digest { void make_flow_keys_digest(struct flow_keys_digest *digest, const struct flow_keys *flow); +static inline bool flow_protos_are_icmpv4(__be16 n_proto, u8 ip_proto) +{ + return n_proto == htons(ETH_P_IP) && ip_proto == IPPROTO_ICMP; +} + +static inline bool flow_protos_are_icmpv6(__be16 n_proto, u8 ip_proto) +{ + return n_proto == htons(ETH_P_IPV6) && ip_proto == IPPROTO_ICMPV6; +} + static inline bool flow_protos_are_icmp_any(__be16 n_proto, u8 ip_proto) { - return (n_proto == htons(ETH_P_IP) && ip_proto == IPPROTO_ICMP) || - (n_proto == htons(ETH_P_IPV6) && ip_proto == IPPROTO_ICMPV6); + return flow_protos_are_icmpv4(n_proto, ip_proto) || + flow_protos_are_icmpv6(n_proto, ip_proto); +} + +static inline bool flow_basic_key_is_icmpv4(const struct flow_dissector_key_basic *basic) +{ + return flow_protos_are_icmpv4(basic->n_proto, basic->ip_proto); +} + +static inline bool flow_basic_key_is_icmpv6(const struct flow_dissector_key_basic *basic) +{ + return flow_protos_are_icmpv6(basic->n_proto, basic->ip_proto); } static inline bool flow_keys_are_icmp_any(const struct flow_keys *keys) diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h index 86786d45ee66..58160fe80b80 100644 --- a/include/uapi/linux/pkt_cls.h +++ b/include/uapi/linux/pkt_cls.h 
@@ -457,6 +457,16 @@ enum { TCA_FLOWER_KEY_ENC_UDP_SRC_PORT_MASK, /* be16 */ TCA_FLOWER_KEY_ENC_UDP_DST_PORT,/* be16 */ TCA_FLOWER_KEY_ENC_UDP_DST_PORT_MASK, /* be16 */ + + TCA_FLOWER_KEY_ICMPV4_CODE, /* u8 */ + TCA_FLOWER_KEY_ICMPV4_CODE_MASK,/* u8 */ + TCA_FLOWER_KEY_ICMPV4_TYPE, /* u8 */ + TCA_FLOWER_KEY_ICMPV4_TYPE_MASK,/* u8 */ + TCA_FLOWER_KEY_ICMPV6_CODE, /* u8 */ + TCA_FLOWER_KEY_ICMPV6_CODE_MASK,/* u8 */ + TCA_FLOWER_KEY_ICMPV6_TYPE, /* u8 */ + TCA_FLOWER_KEY_ICMPV6_TYPE_MASK,/* u8 */ + __TCA_FLOWER_MAX, }; diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c index e8dd09af0d0c..412efa7de226 100644 --- a/net/sched/cls_flower.c +++ b/net/sched/cls_flower.c @@ -355,6 +355,14 @@ static const struct nla_policy fl_policy[TCA_FLOWER_MAX + 1] = { [TCA_FLOWER_KEY_ENC_UDP_SRC_PORT_MASK] = { .type = NLA_U16 }, [TCA_FLOWER_KEY_ENC_UDP_DST_PORT] = { .type = NLA_U16 }, [TCA_FLOWER_KEY_ENC_UDP_DST_PORT_MASK] = { .type = NLA_U16 }, + [TCA_FLOWER_KEY_ICMPV4_TYPE]= { .type = NLA_U8 }, + [TCA_FLOWER_KEY_ICMPV4_TYPE_MASK] = { .type = NLA_U8 }, + [TCA_FLOWER_KEY_ICMPV4_CODE]= { .type = NLA_U8 }, + [TCA_FLOWER_KEY_ICMPV4_CODE_MASK] = { .type = NLA_U8 }, + [TCA_FLOWER_KEY_ICMPV6_TYPE]= { .type = NLA_U8 }, + [TCA_FLOWER_KEY_ICMPV6_TYPE_MASK] = { .type = NLA_U8 }, + [TCA_FLOWER_KEY_ICMPV6_CODE]= { .type = NLA_U8 }, + [TCA_FLOWER_KEY_ICMPV6_CODE_MASK] = { .type = NLA_U8 }, }; static void fl_set_key_val(struct nlattr **tb, @@ -471,6 +479,20 @@ static int fl_set_key(struct net *net, struct nlattr **tb, fl_set_key_val(tb, &key->tp.dst, TCA_FLOWER_KEY_SCTP_DST, &mask->tp.dst, TCA_FLOWER_KEY_SCTP_DST_MASK, sizeof(key->tp.dst)); + } else if (flow_basic_key_is_icmpv4(&key->basic)) { + fl_set_key_val(tb, &key->tp.type, TCA_FLOWER_KEY_ICMPV4_TYPE, + &mask->tp.type, TCA_FLOWER_KEY_ICMPV4_TYPE_MASK, + sizeof(key->tp.type)); + fl_set_key_val(tb, &key->tp.code, TCA_FLOWER_KEY_ICMPV4_CODE, + &mask->tp.code, TCA_FLOWER_KEY_ICMPV4_CODE_MASK, + sizeof(key->tp.code)); + } else if 
(flow_basic_key_is_icmpv6(&key->basic)) { + fl_set_key_val(tb, &key->tp.type, TCA_FLOWER_KEY_ICMPV6_TYPE, + &mask->tp.type, TCA_FLOWER_KEY_ICMPV6_TYPE_MASK, + sizeof(key->tp.type)); + fl_set_key_val(tb, &key->tp.code, TCA_FLOWER_KEY_ICMPV6_CODE, +
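The effect of the type/code keys and their masks can be sketched in user-space C. This is a model under assumptions, not cls_flower code: `icmp_match_model`, `masked_eq` and `icmp_match` are hypothetical names, and the sketch assumes that a key supplied without a mask is treated as an exact match (mask 0xff).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical holder for an ICMP type/code key and its masks. */
struct icmp_match_model {
	uint8_t type, type_mask;
	uint8_t code, code_mask;
};

/* A field matches when packet and key agree on every bit set in the mask. */
static bool masked_eq(uint8_t pkt, uint8_t key, uint8_t mask)
{
	return (pkt & mask) == (key & mask);
}

/* Both fields must match; a zero mask turns a field into a wildcard. */
static bool icmp_match(const struct icmp_match_model *m,
		       uint8_t pkt_type, uint8_t pkt_code)
{
	return masked_eq(pkt_type, m->type, m->type_mask) &&
	       masked_eq(pkt_code, m->code, m->code_mask);
}
```

Under these assumptions the example filter `type 8 code 0` corresponds to `{ 8, 0xff, 0, 0xff }` and matches only echo requests with code 0.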
Re: [PATCH 4/6] net: ethernet: ti: cpts: add ptp pps support
On Wed, Nov 30, 2016 at 11:17:38PM +0100, Richard Cochran wrote:
> On Wed, Nov 30, 2016 at 02:43:57PM -0600, Grygorii Strashko wrote:
> > Sry, but this is questionable - code for pps comes from TI internal
> > branches (SDK releases) where it survived for a pretty long time.

Actually, there is a way to get an accurate PPS from the am335x. See this recent thread:

 https://www.mail-archive.com/linuxptp-devel@lists.sourceforge.net/msg01726.html

That is the way to go, and so, please drop this present patch.

Thanks,
Richard
[PATCH/RFC iproute2/net-next 2/3] tc: flower: introduce enum flower_endpoint
Introduce enum flower_endpoint and use it instead of a bool as the type for parameterising source and destination. This is intended to improve readability and provide some type checking of endpoint parameters. Signed-off-by: Simon Horman --- tc/f_flower.c | 22 ++ 1 file changed, 14 insertions(+), 8 deletions(-) diff --git a/tc/f_flower.c b/tc/f_flower.c index 615e8f27bed2..42253067b43d 100644 --- a/tc/f_flower.c +++ b/tc/f_flower.c @@ -23,6 +23,11 @@ #include "tc_util.h" #include "rt_names.h" +enum flower_endpoint { + flower_src, + flower_dst +}; + static void explain(void) { fprintf(stderr, @@ -160,29 +165,30 @@ static int flower_parse_ip_addr(char *str, __be16 eth_type, return 0; } -static int flower_port_attr_type(__u8 ip_proto, bool is_src) +static int flower_port_attr_type(__u8 ip_proto, enum flower_endpoint endpoint) { if (ip_proto == IPPROTO_TCP) - return is_src ? TCA_FLOWER_KEY_TCP_SRC : + return endpoint == flower_src ? TCA_FLOWER_KEY_TCP_SRC : TCA_FLOWER_KEY_TCP_DST; else if (ip_proto == IPPROTO_UDP) - return is_src ? TCA_FLOWER_KEY_UDP_SRC : + return endpoint == flower_src ? TCA_FLOWER_KEY_UDP_SRC : TCA_FLOWER_KEY_UDP_DST; else if (ip_proto == IPPROTO_SCTP) - return is_src ? TCA_FLOWER_KEY_SCTP_SRC : + return endpoint == flower_src ? 
TCA_FLOWER_KEY_SCTP_SRC : TCA_FLOWER_KEY_SCTP_DST; else return -1; } -static int flower_parse_port(char *str, __u8 ip_proto, bool is_src, +static int flower_parse_port(char *str, __u8 ip_proto, +enum flower_endpoint endpoint, struct nlmsghdr *n) { int ret; int type; __be16 port; - type = flower_port_attr_type(ip_proto, is_src); + type = flower_port_attr_type(ip_proto, endpoint); if (type < 0) return -1; @@ -340,14 +346,14 @@ static int flower_parse_opt(struct filter_util *qu, char *handle, } } else if (matches(*argv, "dst_port") == 0) { NEXT_ARG(); - ret = flower_parse_port(*argv, ip_proto, false, n); + ret = flower_parse_port(*argv, ip_proto, flower_dst, n); if (ret < 0) { fprintf(stderr, "Illegal \"dst_port\"\n"); return -1; } } else if (matches(*argv, "src_port") == 0) { NEXT_ARG(); - ret = flower_parse_port(*argv, ip_proto, true, n); + ret = flower_parse_port(*argv, ip_proto, flower_src, n); if (ret < 0) { fprintf(stderr, "Illegal \"src_port\"\n"); return -1; -- 2.7.0.rc3.207.g0ac5344
[PATCH/RFC iproute2/net-next 3/3] tc: flower: support matching on ICMP type and code
Support matching on ICMP type and code. Example usage: tc qdisc add dev eth0 ingress tc filter add dev eth0 protocol ip parent : flower \ indev eth0 ip_proto icmp type 8 code 0 action drop tc filter add dev eth0 protocol ipv6 parent : flower \ indev eth0 ip_proto icmpv6 type 128 code 0 action drop Signed-off-by: Simon Horman --- man/man8/tc-flower.8 | 20 --- tc/f_flower.c| 96 2 files changed, 105 insertions(+), 11 deletions(-) diff --git a/man/man8/tc-flower.8 b/man/man8/tc-flower.8 index a401293fed50..c01ace6249dd 100644 --- a/man/man8/tc-flower.8 +++ b/man/man8/tc-flower.8 @@ -29,7 +29,7 @@ flower \- flow based traffic control filter .IR PRIORITY " | " .BR vlan_eth_type " { " ipv4 " | " ipv6 " | " .IR ETH_TYPE " } | " -.BR ip_proto " { " tcp " | " udp " | " sctp " | " +.BR ip_proto " { " tcp " | " udp " | " sctp " | " icmp " | " icmpv6 " | " .IR IP_PROTO " } | { " .BR dst_ip " | " src_ip " } { " .IR ipv4_address " | " ipv6_address " } | { " @@ -94,7 +94,7 @@ or an unsigned 16bit value in hexadecimal format. Match on layer four protocol. .I IP_PROTO may be -.BR tcp ", " udp ", " sctp +.BR tcp ", " udp ", " sctp ", " icmp ", " icmpv6 or an unsigned 8bit value in hexadecimal format. .TP .BI dst_ip " ADDRESS" @@ -112,6 +112,13 @@ option of tc filter. Match on layer 4 protocol source or destination port number. Only available for .BR ip_proto " values " udp ", " tcp " and " sctp which have to be specified in beforehand. +.TP +.BI type " NUMBER" +.TQ +.BI code " NUMBER" +Match on ICMP type or code. Only available for +.BR ip_proto " values " icmp " and " icmpv6 +which have to be specified in beforehand. .SH NOTES As stated above where applicable, matches of a certain layer implicitly depend on the matches of the next lower layer. 
Precisely, layer one and two matches @@ -120,13 +127,16 @@ have no dependency, layer three matches (\fBip_proto\fR, \fBdst_ip\fR and \fBsrc_ip\fR) depend on the .B protocol -option of tc filter -and finally layer four matches +option of tc filter, layer four port matches (\fBdst_port\fR and \fBsrc_port\fR) depend on .B ip_proto being set to -.BR tcp ", " udp " or " sctp. +.BR tcp ", " udp " or " sctp, +and finally ICMP matches (\fBcode\fR and \fBtype\fR) depend on +.B ip_proto +being set to +.BR icmp " or " icmpv6. .P There can be only used one mask per one prio. If user needs to specify different mask, he has to use different prio. diff --git a/tc/f_flower.c b/tc/f_flower.c index 42253067b43d..59f6f1ea26e6 100644 --- a/tc/f_flower.c +++ b/tc/f_flower.c @@ -28,6 +28,11 @@ enum flower_endpoint { flower_dst }; +enum flower_icmp_field { + flower_icmp_type, + flower_icmp_code +}; + static void explain(void) { fprintf(stderr, @@ -42,11 +47,13 @@ static void explain(void) " vlan_ethtype [ ipv4 | ipv6 | ETH-TYPE ] |\n" " dst_mac MAC-ADDR |\n" " src_mac MAC-ADDR |\n" - " ip_proto [tcp | udp | sctp | IP-PROTO ] |\n" + " ip_proto [tcp | udp | sctp | icmp | icmpv6 | IP-PROTO ] |\n" " dst_ip [ IPV4-ADDR | IPV6-ADDR ] |\n" " src_ip [ IPV4-ADDR | IPV6-ADDR ] |\n" " dst_port PORT-NUMBER |\n" - " src_port PORT-NUMBER }\n" + " src_port PORT-NUMBER |\n" + " type ICMP-TYPE |\n" + " code ICMP-CODE }\n" " FILTERID := X:Y:Z\n" " ACTION-SPEC := ... 
look at individual actions\n" "\n" @@ -95,16 +102,23 @@ static int flower_parse_ip_proto(char *str, __be16 eth_type, int type, int ret; __u8 ip_proto; - if (eth_type != htons(ETH_P_IP) && eth_type != htons(ETH_P_IPV6)) { - fprintf(stderr, "Illegal \"eth_type\" for ip proto\n"); - return -1; - } + if (eth_type != htons(ETH_P_IP) && eth_type != htons(ETH_P_IPV6)) + goto err; + if (matches(str, "tcp") == 0) { ip_proto = IPPROTO_TCP; } else if (matches(str, "udp") == 0) { ip_proto = IPPROTO_UDP; } else if (matches(str, "sctp") == 0) { ip_proto = IPPROTO_SCTP; + } else if (matches(str, "icmp") == 0) { + if (eth_type != htons(ETH_P_IP)) + goto err; + ip_proto = IPPROTO_ICMP; + } else if (matches(str, "icmpv6") == 0) { + if (eth_type != htons(ETH_P_IPV6)) + goto err; + ip_proto = IPPROTO_ICMPV6; } else { ret = get_u8(&ip_proto,
[PATCH/RFC iproute2/net-next 0/3] tc: flower: Support matching on ICMP
Add support for matching on ICMP type and code to flower. This is modeled on existing support for matching on L4 ports.

The second patch provides a minor cleanup which is in keeping with the style used in the last patch.

This is marked as an RFC to match the same designation given to the corresponding kernel patches.

Based on iproute2/net-next with the following applied:
* [[PATCH iproute2/net-next v2] 0/4] tc: flower: SCTP and other port fixes

Simon Horman (3):
  tc: flower: update headers for TCA_FLOWER_KEY_ICMP*
  tc: flower: introduce enum flower_endpoint
  tc: flower: support matching on ICMP type and code

 include/linux/pkt_cls.h | 10
 man/man8/tc-flower.8| 20 ++--
 tc/f_flower.c | 118 ++--
 3 files changed, 129 insertions(+), 19 deletions(-)

-- 
2.7.0.rc3.207.g0ac5344
[PATCH/RFC iproute2/net-next 1/3] tc: flower: update headers for TCA_FLOWER_KEY_ICMP*
These are proposed changes for net-next. Signed-off-by: Simon Horman --- include/linux/pkt_cls.h | 10 ++ 1 file changed, 10 insertions(+) diff --git a/include/linux/pkt_cls.h b/include/linux/pkt_cls.h index a3d8a4f17d8e..fa435ea8ad21 100644 --- a/include/linux/pkt_cls.h +++ b/include/linux/pkt_cls.h @@ -403,6 +403,16 @@ enum { TCA_FLOWER_KEY_ENC_UDP_SRC_PORT_MASK, /* be16 */ TCA_FLOWER_KEY_ENC_UDP_DST_PORT,/* be16 */ TCA_FLOWER_KEY_ENC_UDP_DST_PORT_MASK, /* be16 */ + + TCA_FLOWER_KEY_ICMPV4_CODE, /* u8 */ + TCA_FLOWER_KEY_ICMPV4_CODE_MASK,/* u8 */ + TCA_FLOWER_KEY_ICMPV4_TYPE, /* u8 */ + TCA_FLOWER_KEY_ICMPV4_TYPE_MASK,/* u8 */ + TCA_FLOWER_KEY_ICMPV6_CODE, /* u8 */ + TCA_FLOWER_KEY_ICMPV6_CODE_MASK,/* u8 */ + TCA_FLOWER_KEY_ICMPV6_TYPE, /* u8 */ + TCA_FLOWER_KEY_ICMPV6_TYPE_MASK,/* u8 */ + __TCA_FLOWER_MAX, }; -- 2.7.0.rc3.207.g0ac5344
Re: [PATCH net] tcp: warn on bogus MSS and try to amend it
On Thu, Dec 01, 2016 at 03:29:49PM -0500, David Miller wrote: > From: Marcelo Ricardo Leitner > Date: Wed, 30 Nov 2016 11:14:32 -0200 > > > There have been some reports lately about TCP connection stalls caused > > by NIC drivers that aren't setting gso_size on aggregated packets on rx > > path. This causes TCP to assume that the MSS is actually the size of the > > aggregated packet, which is invalid. > > > > Although the proper fix is to be done at each driver, it's often hard > > and cumbersome for one to debug, come to such root cause and report/fix > > it. > > > > This patch amends this situation in two ways. First, it adds a warning > > on when this situation occurs, so it gives a hint to those trying to > > debug this. It also limit the maximum probed MSS to the adverised MSS, > > as it should never be any higher than that. > > > > The result is that the connection may not have the best performance ever > > but it shouldn't stall, and the admin will have a hint on what to look > > for. > > > > Tested with virtio by forcing gso_size to 0. > > > > Cc: Jonathan Maxwell > > Signed-off-by: Marcelo Ricardo Leitner > > I totally agree with this change, however I think the warning message can > be improved in two ways: > > > len = skb_shinfo(skb)->gso_size ? : skb->len; > > if (len >= icsk->icsk_ack.rcv_mss) { > > - icsk->icsk_ack.rcv_mss = len; > > + icsk->icsk_ack.rcv_mss = min_t(unsigned int, len, > > + tcp_sk(sk)->advmss); > > + if (icsk->icsk_ack.rcv_mss != len) > > + pr_warn_once("Seems your NIC driver is doing bad RX > > acceleration. TCP performance may be compromised.\n"); > > We know it's a bad GRO implementation that causes this so let's be specific > in the > message, perhaps something like: > > Driver has suspect GRO implementation, TCP performance may be > compromised. > > Also, we have skb->dev available here most likely, so prefixing the message > with > skb->dev->name would make analyzing this situation even easier for someone > hitting > this. 
It's not available anymore. It's NULLified before we get there:

tcp_v4_rcv() (same for v6)
{
	...
	skb->dev = NULL;
	...
	if (!sock_owned_by_user(sk)) {
		if (!tcp_prequeue(sk, skb))
			ret = tcp_v4_do_rcv(sk, skb);
	} else if (tcp_add_backlog(sk, skb)) {
	...
}

I'll update the msg as above and post v2.

Thanks,
Marcelo
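The clamping logic under discussion can be modelled in a few lines of user-space C. This is a sketch only: `measure_rcv_mss_model` is a hypothetical function that takes plain integers instead of the socket and skb, and the `suspect` flag stands in for the one-time warning.

```c
#include <stdbool.h>

/*
 * Model of the proposed change: the probed receive MSS is never allowed
 * to exceed the advertised MSS, and a clamped value is flagged so that a
 * warning could be emitted.
 */
static unsigned int measure_rcv_mss_model(unsigned int gso_size,
					  unsigned int skb_len,
					  unsigned int rcv_mss,
					  unsigned int advmss,
					  bool *suspect)
{
	/* Without gso_size the whole aggregated length is taken as one segment. */
	unsigned int len = gso_size ? gso_size : skb_len;

	*suspect = false;
	if (len >= rcv_mss) {
		rcv_mss = len < advmss ? len : advmss;
		*suspect = (rcv_mss != len);
	}
	return rcv_mss;
}
```

For example, an aggregated 65000-byte packet with `gso_size` 0 and an advertised MSS of 1460 yields 1460 with the flag set, instead of an MSS of 65000.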
Re: [flamebait] xdp, well meaning but pointless
On Thu, 1 Dec 2016 13:51:32 -0800 Tom Herbert wrote: > >> The technical plenary at last IETF on Seoul a couple of weeks ago was > >> exclusively focussed on DDOS in light of the recent attack against > >> Dyn. There were speakers form Cloudflare and Dyn. The Cloudflare > >> presentation by Nick Sullivan > >> (https://www.ietf.org/proceedings/97/slides/slides-97-ietf-sessb-how-to-stay-online-harsh-realities-of-operating-in-a-hostile-network-nick-sullivan-01.pdf) > >> alluded to some implementation of DDOS mitigation. In particular, on > >> slide 6 Nick gave some numbers for drop rates in DDOS. The "kernel" slide 14 > >> numbers he gave we're based in iptables+BPF and that was a whole > >> 1.2Mpps-- somehow that seems ridiculously to me (I said so at the mic > >> and that's also when I introduced XDP to whole IETF :-) ). If that's > >> the best we can do the Internet is in a world hurt. DDOS mitigation > >> alone is probably a sufficient motivation to look at XDP. We need > >> something that drops bad packets as quickly as possible when under > >> attack, we need this to be integrated into the stack, we need it to be > >> programmable to deal with the increasing savvy of attackers, and we > >> don't want to be forced to be dependent on HW solutions. This is why > >> we created XDP! The 1.2Mpps number is a bit low, but we are unfortunately in that ballpark. > > I totally understand that. But in my reply to David in this thread I > > mentioned DNS apex processing as being problematic which is actually > > being referred in your linked slide deck on page 9 ("What do floods look > > like") and the problematic of parsing DNS packets in XDP due to string > > processing and looping inside eBPF. That is a weak argument. You do realize CloudFlare actually use eBPF to do this exact filtering, and (so-far) eBPF for parsing DNS have been sufficient for them. > I agree that eBPF is not going to be sufficient from everything we'll > want to do. 
> Undoubtedly, we'll continue to see the addition of more
> helpers to assist in processing, but at some point we will want to
> load a kernel module that handles more complex processing and insert
> it at the XDP callout. Nothing in the design of XDP precludes doing
> that and I have already posted the patches to generalize the XDP
> callout for that. Taking either of these routes has tradeoffs, but
> regardless of whether this is BPF or module code, the principles of
> XDP and its value to help solve some class of problems remains.

As I've said before, I do support Tom's patches for a more generic XDP hook that the kernel itself can use. The first thing I would implement with this is a fast-path for Linux L2 bridging (this does depend on multiport TX support). It would be so easy to speed up bridging: XDP would only need to forward packets already in the bridge-FIB table, the rest is XDP_PASS to the normal stack and bridge code (timers etc).

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer
[iproute PATCH v2 09/18] ss: Make tmr_name local to tcp_timer_print()
It's used only there, so no need to have it globally defined.

Signed-off-by: Phil Sutter
---
 misc/ss.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/misc/ss.c b/misc/ss.c
index 71040a82ca6b1..97fcfd4a85548 100644
--- a/misc/ss.c
+++ b/misc/ss.c
@@ -882,15 +882,6 @@ static void sock_addr_print(const char *addr, char *delim, const char *port,
 	sock_addr_print_width(addr_width, addr, delim, serv_width, port, ifname);
 }

-static const char *tmr_name[] = {
-	"off",
-	"on",
-	"keepalive",
-	"timewait",
-	"persist",
-	"unknown"
-};
-
 static const char *print_ms_timer(int timeout)
 {
 	static char buf[64];
@@ -1983,6 +1974,15 @@ static void tcp_stats_print(struct tcpstat *s)

 static void tcp_timer_print(struct tcpstat *s)
 {
+	static const char * const tmr_name[] = {
+		"off",
+		"on",
+		"keepalive",
+		"timewait",
+		"persist",
+		"unknown"
+	};
+
 	if (s->timer) {
 		if (s->timer > 4)
 			s->timer = 5;
-- 
2.10.0
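The pattern used by this patch, a function-local constant table with out-of-range values clamped to a final "unknown" entry, can be shown in isolation. This is a stand-alone sketch: `timer_name` is a hypothetical helper, not a function in ss.c, and the extra check for negative values is an addition of the sketch.

```c
/*
 * Look up a timer name in a table that is private to this function.
 * Anything outside 0..4 maps to the last entry, "unknown".
 */
static const char *timer_name(int timer)
{
	static const char * const tmr_name[] = {
		"off",
		"on",
		"keepalive",
		"timewait",
		"persist",
		"unknown"
	};

	if (timer < 0 || timer > 4)
		timer = 5;
	return tmr_name[timer];
}
```

Keeping the table `static const` inside the function limits its scope without giving up static storage, so nothing is rebuilt on each call.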
[iproute PATCH v2 03/18] ss: Add missing tab when printing UNIX details
When dumping UNIX sockets and show_details is active but not show_mem
(ss -xne), the socket details are printed without being prefixed by tab.
Fix this by printing the tab character when either one of '-e' or '-m'
has been specified.

Signed-off-by: Phil Sutter
---
 misc/ss.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/misc/ss.c b/misc/ss.c
index 3871a6f61f8ea..f1053b1db4132 100644
--- a/misc/ss.c
+++ b/misc/ss.c
@@ -3096,10 +3096,10 @@ static int unix_show_sock(const struct sockaddr_nl *addr, struct nlmsghdr *nlh,
 
 	unix_stats_print(&stat, f);
 
-	if (show_mem) {
+	if (show_mem || show_details)
 		printf("\t");
+	if (show_mem)
 		print_skmeminfo(tb, UNIX_DIAG_MEMINFO);
-	}
 	if (show_details) {
 		if (tb[UNIX_DIAG_SHUTDOWN]) {
 			unsigned char mask;
-- 
2.10.0
[iproute PATCH v2 04/18] ss: Use sockstat->type in all socket types
Unix sockets used that field already to hold info about the socket type. By replicating this approach in all other socket types, we can get rid of protocol parameter in inet_stats_print() and have sock_state_print() figure things out by itself. Signed-off-by: Phil Sutter --- misc/ss.c | 132 +++--- 1 file changed, 74 insertions(+), 58 deletions(-) diff --git a/misc/ss.c b/misc/ss.c index f1053b1db4132..a953d4b022aed 100644 --- a/misc/ss.c +++ b/misc/ss.c @@ -837,8 +837,59 @@ static bool is_sctp_assoc(struct sockstat *s, const char *sock_name) return true; } -static void sock_state_print(struct sockstat *s, const char *sock_name) +static const char *unix_netid_name(int type) +{ + switch (type) { + case SOCK_STREAM: + return "u_str"; + case SOCK_SEQPACKET: + return "u_seq"; + case SOCK_DGRAM: + default: + return "u_dgr"; + } +} + +static const char *proto_name(int protocol) +{ + switch (protocol) { + case 0: + return "raw"; + case IPPROTO_UDP: + return "udp"; + case IPPROTO_TCP: + return "tcp"; + case IPPROTO_SCTP: + return "sctp"; + case IPPROTO_DCCP: + return "dccp"; + } + + return "???"; +} + +static void sock_state_print(struct sockstat *s) { + const char *sock_name; + + switch (s->local.family) { + case AF_UNIX: + sock_name = unix_netid_name(s->type); + break; + case AF_INET: + case AF_INET6: + sock_name = proto_name(s->type); + break; + case AF_PACKET: + sock_name = s->type == SOCK_RAW ? "p_raw" : "p_dgr"; + break; + case AF_NETLINK: + sock_name = "nl"; + break; + default: + sock_name = "unknown"; + } + if (netid_width) printf("%-*s ", netid_width, is_sctp_assoc(s, sock_name) ? 
"" : sock_name); @@ -1722,29 +1773,11 @@ void *parse_markmask(const char *markmask) return res; } -static char *proto_name(int protocol) -{ - switch (protocol) { - case 0: - return "raw"; - case IPPROTO_UDP: - return "udp"; - case IPPROTO_TCP: - return "tcp"; - case IPPROTO_SCTP: - return "sctp"; - case IPPROTO_DCCP: - return "dccp"; - } - - return "???"; -} - -static void inet_stats_print(struct sockstat *s, int protocol) +static void inet_stats_print(struct sockstat *s) { char *buf = NULL; - sock_state_print(s, proto_name(protocol)); + sock_state_print(s); inet_addr_print(&s->local, s->lport, s->iface); inet_addr_print(&s->remote, s->rport, 0); @@ -2059,8 +2092,9 @@ static int tcp_show_line(char *line, const struct filter *f, int family) s.rto = (double)rto; s.ssthresh = s.ssthresh == -1 ? 0 : s.ssthresh; s.rto = s.rto != 3 * hz ? s.rto / hz : 0; + s.ss.type = IPPROTO_TCP; - inet_stats_print(&s.ss, IPPROTO_TCP); + inet_stats_print(&s.ss); if (show_options) tcp_timer_print(&s); @@ -2370,8 +2404,7 @@ static void parse_diag_msg(struct nlmsghdr *nlh, struct sockstat *s) } static int inet_show_sock(struct nlmsghdr *nlh, - struct sockstat *s, - int protocol) + struct sockstat *s) { struct rtattr *tb[INET_DIAG_MAX+1]; struct inet_diag_msg *r = NLMSG_DATA(nlh); @@ -2380,9 +2413,9 @@ static int inet_show_sock(struct nlmsghdr *nlh, nlh->nlmsg_len - NLMSG_LENGTH(sizeof(*r))); if (tb[INET_DIAG_PROTOCOL]) - protocol = *(__u8 *)RTA_DATA(tb[INET_DIAG_PROTOCOL]); + s->type = *(__u8 *)RTA_DATA(tb[INET_DIAG_PROTOCOL]); - inet_stats_print(s, protocol); + inet_stats_print(s); if (show_options) { struct tcpstat t = {}; @@ -2390,7 +2423,7 @@ static int inet_show_sock(struct nlmsghdr *nlh, t.timer = r->idiag_timer; t.timeout = r->idiag_expires; t.retrans = r->idiag_retrans; - if (protocol == IPPROTO_SCTP) + if (s->type == IPPROTO_SCTP) sctp_timer_print(&t); else tcp_timer_print(&t); @@ -2412,9 +2445,9 @@ static int inet_show_sock(struct nlmsghdr *nlh, } } - if (show_mem || 
(show_tcpinfo && protocol != IPPROTO_UDP)) { + if (show_mem || (show_tcpinfo && s->type != IPPROTO_UDP)) { printf("\n\t"); - if (protocol == IPPROTO_SCTP) + if (s->type == IPPROTO_SCTP) sctp_show_info(nlh, r, tb
[iproute PATCH v2 07/18] ss: Eliminate unix_use_proc()
This function is now used in only a single place, so replace the call to
it by its content, which makes that specific part of unix_show()
consistent with e.g. tcp_show().

Signed-off-by: Phil Sutter
---
 misc/ss.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/misc/ss.c b/misc/ss.c
index 0de336200142f..ad38eb97b0055 100644
--- a/misc/ss.c
+++ b/misc/ss.c
@@ -2995,11 +2995,6 @@ static bool unix_type_skip(struct sockstat *s, struct filter *f)
 	return false;
 }
 
-static bool unix_use_proc(void)
-{
-	return getenv("PROC_NET_UNIX") || getenv("PROC_ROOT");
-}
-
 static void unix_stats_print(struct sockstat *s, struct filter *f)
 {
 	char port_name[30] = {};
@@ -3123,7 +3118,8 @@ static int unix_show(struct filter *f)
 	if (!filter_af_get(f, AF_UNIX))
 		return 0;
 
-	if (!unix_use_proc() && unix_show_netlink(f) == 0)
+	if (!getenv("PROC_NET_UNIX") && !getenv("PROC_ROOT")
+	    && unix_show_netlink(f) == 0)
 		return 0;
 
 	if ((fp = net_unix_open()) == NULL)
-- 
2.10.0
[iproute PATCH v2 12/18] ss: Make slabstat_ids local to get_slabstat()
Signed-off-by: Phil Sutter
---
 misc/ss.c | 16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/misc/ss.c b/misc/ss.c
index 3662f5f4861c7..c498478421190 100644
--- a/misc/ss.c
+++ b/misc/ss.c
@@ -601,21 +601,19 @@ struct slabstat {
 
 static struct slabstat slabstat;
 
-static const char *slabstat_ids[] = {
-
-	"sock",
-	"tcp_bind_bucket",
-	"tcp_tw_bucket",
-	"tcp_open_request",
-	"skbuff_head_cache",
-};
-
 static int get_slabstat(struct slabstat *s)
 {
 	char buf[256];
 	FILE *fp;
 	int cnt;
 	static int slabstat_valid;
+	static const char * const slabstat_ids[] = {
+		"sock",
+		"tcp_bind_bucket",
+		"tcp_tw_bucket",
+		"tcp_open_request",
+		"skbuff_head_cache",
+	};
 
 	if (slabstat_valid)
 		return 0;
-- 
2.10.0
[iproute PATCH v2 15/18] ss: Make unix_state_map local to unix_show()
Also make it const, since there won't be any write access happening.

Signed-off-by: Phil Sutter
---
 misc/ss.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/misc/ss.c b/misc/ss.c
index c7818eadf9e75..e82c416b5fa72 100644
--- a/misc/ss.c
+++ b/misc/ss.c
@@ -2914,9 +2914,6 @@ outerr:
 	} while (0);
 }
 
-int unix_state_map[] = { SS_CLOSE, SS_SYN_SENT,
-			 SS_ESTABLISHED, SS_CLOSING };
-
 #define MAX_UNIX_REMEMBER (1024*1024/sizeof(struct sockstat))
 
 static void unix_list_drop_first(struct sockstat **list)
@@ -3058,6 +3055,8 @@ static int unix_show(struct filter *f)
 	int newformat = 0;
 	int cnt;
 	struct sockstat *list = NULL;
+	const int unix_state_map[] = { SS_CLOSE, SS_SYN_SENT,
+				       SS_ESTABLISHED, SS_CLOSING };
 
 	if (!filter_af_get(f, AF_UNIX))
 		return 0;
-- 
2.10.0
[iproute PATCH v2 08/18] ss: Turn generic_proc_open() wrappers into macros
Signed-off-by: Phil Sutter --- misc/ss.c | 89 ++- 1 file changed, 19 insertions(+), 70 deletions(-) diff --git a/misc/ss.c b/misc/ss.c index ad38eb97b0055..71040a82ca6b1 100644 --- a/misc/ss.c +++ b/misc/ss.c @@ -327,76 +327,25 @@ static FILE *generic_proc_open(const char *env, const char *name) return fopen(p, "r"); } - -static FILE *net_tcp_open(void) -{ - return generic_proc_open("PROC_NET_TCP", "net/tcp"); -} - -static FILE *net_tcp6_open(void) -{ - return generic_proc_open("PROC_NET_TCP6", "net/tcp6"); -} - -static FILE *net_udp_open(void) -{ - return generic_proc_open("PROC_NET_UDP", "net/udp"); -} - -static FILE *net_udp6_open(void) -{ - return generic_proc_open("PROC_NET_UDP6", "net/udp6"); -} - -static FILE *net_raw_open(void) -{ - return generic_proc_open("PROC_NET_RAW", "net/raw"); -} - -static FILE *net_raw6_open(void) -{ - return generic_proc_open("PROC_NET_RAW6", "net/raw6"); -} - -static FILE *net_unix_open(void) -{ - return generic_proc_open("PROC_NET_UNIX", "net/unix"); -} - -static FILE *net_packet_open(void) -{ - return generic_proc_open("PROC_NET_PACKET", "net/packet"); -} - -static FILE *net_netlink_open(void) -{ - return generic_proc_open("PROC_NET_NETLINK", "net/netlink"); -} - -static FILE *slabinfo_open(void) -{ - return generic_proc_open("PROC_SLABINFO", "slabinfo"); -} - -static FILE *net_sockstat_open(void) -{ - return generic_proc_open("PROC_NET_SOCKSTAT", "net/sockstat"); -} - -static FILE *net_sockstat6_open(void) -{ - return generic_proc_open("PROC_NET_SOCKSTAT6", "net/sockstat6"); -} - -static FILE *net_snmp_open(void) -{ - return generic_proc_open("PROC_NET_SNMP", "net/snmp"); -} - -static FILE *ephemeral_ports_open(void) -{ - return generic_proc_open("PROC_IP_LOCAL_PORT_RANGE", "sys/net/ipv4/ip_local_port_range"); -} +#define net_tcp_open() generic_proc_open("PROC_NET_TCP", "net/tcp") +#define net_tcp6_open()generic_proc_open("PROC_NET_TCP6", "net/tcp6") +#define net_udp_open() generic_proc_open("PROC_NET_UDP", "net/udp") +#define 
net_udp6_open()generic_proc_open("PROC_NET_UDP6", "net/udp6") +#define net_raw_open() generic_proc_open("PROC_NET_RAW", "net/raw") +#define net_raw6_open()generic_proc_open("PROC_NET_RAW6", "net/raw6") +#define net_unix_open()generic_proc_open("PROC_NET_UNIX", "net/unix") +#define net_packet_open() generic_proc_open("PROC_NET_PACKET", \ + "net/packet") +#define net_netlink_open() generic_proc_open("PROC_NET_NETLINK", \ + "net/netlink") +#define slabinfo_open()generic_proc_open("PROC_SLABINFO", "slabinfo") +#define net_sockstat_open()generic_proc_open("PROC_NET_SOCKSTAT", \ + "net/sockstat") +#define net_sockstat6_open() generic_proc_open("PROC_NET_SOCKSTAT6", \ + "net/sockstat6") +#define net_snmp_open()generic_proc_open("PROC_NET_SNMP", "net/snmp") +#define ephemeral_ports_open() generic_proc_open("PROC_IP_LOCAL_PORT_RANGE", \ + "sys/net/ipv4/ip_local_port_range") struct user_ent { struct user_ent *next; -- 2.10.0
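As a rough illustration of what the macro wrappers expand to, here is a standalone sketch. Only the generic_proc_open(env, name) signature, the final fopen(p, "r") and the macro names come from the patch; proc_path() is a hypothetical helper, and the lookup order it implements (per-file override first, then a PROC_ROOT prefix, then plain /proc) is an assumption for this note, since the quoted hunk does not show that part of the function.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper. Assumed lookup order (not visible in the quoted
 * hunk): a per-file override wins, then a root prefix, then /proc.
 */
static const char *proc_path(const char *override, const char *root,
			     const char *name, char *buf, size_t len)
{
	if (override)
		return override;
	snprintf(buf, len, "%s/%s", root ? root : "/proc", name);
	return buf;
}

static FILE *generic_proc_open(const char *env, const char *name)
{
	char store[128];
	const char *p = proc_path(getenv(env), getenv("PROC_ROOT"),
				  name, store, sizeof(store));

	return fopen(p, "r");
}

/* Function-like macros: each call site expands to the generic call. */
#define net_tcp_open()	generic_proc_open("PROC_NET_TCP", "net/tcp")
#define net_unix_open()	generic_proc_open("PROC_NET_UNIX", "net/unix")
```

Callers keep writing net_tcp_open() as before; the difference is that the preprocessor, not a one-line function, supplies the two string arguments.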
[iproute PATCH v2 06/18] ss: Drop list traversal from unix_stats_print()
Although this complicates the dedicated procfs-based code path in unix_show() a bit, it's the only sane way to get rid of unix_show_sock() output diverging from other socket types in that it prints all socket details in a new line. As a side effect, it allows to eliminate all procfs specific code in the same function. Signed-off-by: Phil Sutter --- misc/ss.c | 137 +- 1 file changed, 64 insertions(+), 73 deletions(-) diff --git a/misc/ss.c b/misc/ss.c index fcbaecbe25a2f..0de336200142f 100644 --- a/misc/ss.c +++ b/misc/ss.c @@ -2975,15 +2975,13 @@ int unix_state_map[] = { SS_CLOSE, SS_SYN_SENT, #define MAX_UNIX_REMEMBER (1024*1024/sizeof(struct sockstat)) -static void unix_list_free(struct sockstat *list) +static void unix_list_drop_first(struct sockstat **list) { - while (list) { - struct sockstat *s = list; + struct sockstat *s = *list; - list = list->next; - free(s->name); - free(s); - } + (*list) = (*list)->next; + free(s->name); + free(s); } static bool unix_type_skip(struct sockstat *s, struct filter *f) @@ -3002,61 +3000,18 @@ static bool unix_use_proc(void) return getenv("PROC_NET_UNIX") || getenv("PROC_ROOT"); } -static void unix_stats_print(struct sockstat *list, struct filter *f) +static void unix_stats_print(struct sockstat *s, struct filter *f) { - struct sockstat *s; - char *peer; - bool use_proc = unix_use_proc(); char port_name[30] = {}; - for (s = list; s; s = s->next) { - if (!(f->states & (1 << s->state))) - continue; - if (unix_type_skip(s, f)) - continue; - - peer = "*"; - if (s->peer_name) - peer = s->peer_name; - - if (s->rport && use_proc) { - struct sockstat *p; - - for (p = list; p; p = p->next) { - if (s->rport == p->lport) - break; - } - - if (!p) { - peer = "?"; - } else { - peer = p->name ? 
: "*"; - } - } - - if (use_proc && f->f) { - struct sockstat st = { - .local.family = AF_UNIX, - .remote.family = AF_UNIX, - }; - - memcpy(st.local.data, &s->name, sizeof(s->name)); - if (strcmp(peer, "*")) - memcpy(st.remote.data, &peer, sizeof(peer)); - if (run_ssfilter(f->f, &st) == 0) - continue; - } - - sock_state_print(s); + sock_state_print(s); - sock_addr_print(s->name ?: "*", " ", - int_to_str(s->lport, port_name), NULL); - sock_addr_print(peer, " ", int_to_str(s->rport, port_name), - NULL); + sock_addr_print(s->name ?: "*", " ", + int_to_str(s->lport, port_name), NULL); + sock_addr_print(s->peer_name ?: "*", " ", + int_to_str(s->rport, port_name), NULL); - proc_ctx_print(s); - printf("\n"); - } + proc_ctx_print(s); } static int unix_show_sock(const struct sockaddr_nl *addr, struct nlmsghdr *nlh, @@ -3105,8 +3060,6 @@ static int unix_show_sock(const struct sockaddr_nl *addr, struct nlmsghdr *nlh, unix_stats_print(&stat, f); - if (show_mem || show_details) - printf("\t"); if (show_mem) print_skmeminfo(tb, UNIX_DIAG_MEMINFO); if (show_details) { @@ -3117,8 +3070,7 @@ static int unix_show_sock(const struct sockaddr_nl *addr, struct nlmsghdr *nlh, printf(" %c-%c", mask & 1 ? '-' : '<', mask & 2 ? '-' : '>'); } } - if (show_mem || show_details) - printf("\n"); + printf("\n"); return 0; } @@ -3209,6 +3161,11 @@ static int unix_show(struct filter *f) if (u->type == SOCK_DGRAM && u->state == SS_CLOSE && u->rport) u->state = SS_ESTABLISHED; } + if (unix_type_skip(u, f) || + !(f->states & (1 << u->state))) { + free(u); + continue; + } if (!newformat) { u->rport = 0; @@ -3216,6 +3173,42 @@ static int unix_show(struct filter *f) u->wq = 0; } + if (name[0]) { + u->name = strdup(name); +
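The drop-first helper introduced by this patch can be tried out in isolation with the sketch below. struct node is a reduced stand-in for struct sockstat, and list_push() and list_drain() are hypothetical helpers added only to make the example self-contained; whether unix_show() drains its list in exactly this way is not visible in the quoted part of the diff.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Reduced stand-in for struct sockstat: only the fields used here. */
struct node {
	struct node *next;
	char *name;
};

/* Same shape as unix_list_drop_first(): unlink and free the head. */
static void list_drop_first(struct node **list)
{
	struct node *n = *list;

	assert(n != NULL);	/* caller must not pass an empty list */
	*list = n->next;
	free(n->name);		/* free(NULL) is a no-op */
	free(n);
}

/* Hypothetical helper: push a copy of name at the head; 0 on success. */
static int list_push(struct node **list, const char *name)
{
	struct node *n = calloc(1, sizeof(*n));

	if (!n)
		return -1;
	if (name) {
		size_t len = strlen(name) + 1;

		n->name = malloc(len);
		if (!n->name) {
			free(n);
			return -1;
		}
		memcpy(n->name, name, len);
	}
	n->next = *list;
	*list = n;
	return 0;
}

/* Hypothetical helper: free everything by dropping the head repeatedly. */
static int list_drain(struct node **list)
{
	int freed = 0;

	while (*list) {
		list_drop_first(list);
		freed++;
	}
	return freed;
}
```

Taking a pointer to the list head lets the helper update the caller's variable, so a loop like the one in list_drain() covers what the removed unix_list_free() used to do.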
[iproute PATCH v2 01/18] ss: Mark fall through in arg parsing switch()
As there is a certain chance of overlooking this, better add a comment
to draw readers' attention.

Signed-off-by: Phil Sutter
---
 misc/ss.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/misc/ss.c b/misc/ss.c
index 07dcd8c209c04..469721fd9aee3 100644
--- a/misc/ss.c
+++ b/misc/ss.c
@@ -4223,6 +4223,7 @@ int main(int argc, char *argv[])
 			exit(0);
 		case 'z':
 			show_sock_ctx++;
+			/* fall through */
 		case 'Z':
 			if (is_selinux_enabled() <= 0) {
 				fprintf(stderr, "ss: SELinux is not enabled.\n");
-- 
2.10.0
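A reduced sketch of the pattern, for illustration only. struct opts and parse_opt() are hypothetical; what the 'Z' case does beyond the SELinux check is not visible in the quoted hunk, so the counter bumped there is an assumption.

```c
/* Hypothetical flags standing in for the option globals in ss.c. */
struct opts {
	int show_sock_ctx;
	int show_proc_ctx;
};

/* Returns 0 if the option was recognised, -1 otherwise. */
static int parse_opt(struct opts *o, int ch)
{
	switch (ch) {
	case 'z':
		o->show_sock_ctx++;
		/* fall through */
	case 'Z':
		/* Reached for both 'z' and 'Z'. */
		o->show_proc_ctx++;
		return 0;
	default:
		return -1;
	}
}
```

Besides helping human readers, GCC 7 and later recognise comments of this form when -Wimplicit-fallthrough is enabled, so the annotation also keeps that warning quiet.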
Re: [PATCH net-next 2/3] net/act_pedit: Support using offset relative to the conventional network headers
On Thu, Dec 01, 2016 at 02:41:14PM -0500, David Miller wrote:
> From: Amir Vadai
> Date: Wed, 30 Nov 2016 11:09:27 +0200
>
> > @@ -119,18 +119,45 @@ static bool offset_valid(struct sk_buff *skb, int offset)
> >  	return true;
> >  }
> >  
> > +static int pedit_skb_hdr_offset(struct sk_buff *skb,
> > +				 enum pedit_header_type htype, int *hoffset)
> > +{
> > +	int ret = -1;
> > +
> > +	switch (htype) {
> > +	case PEDIT_HDR_TYPE_ETH:
> > +		if (skb_mac_header_was_set(skb)) {
> > +			*hoffset = skb_mac_offset(skb);
> > +			ret = 0;
> > +		}
> > +		break;
> > +	case PEDIT_HDR_TYPE_RAW:
> > +	case PEDIT_HDR_TYPE_IP4:
> > +	case PEDIT_HDR_TYPE_IP6:
> > +		*hoffset = skb_network_offset(skb);
> > +		ret = 0;
> > +		break;
> > +	case PEDIT_HDR_TYPE_TCP:
> > +	case PEDIT_HDR_TYPE_UDP:
> > +		if (skb_transport_header_was_set(skb)) {
> > +			*hoffset = skb_transport_offset(skb);
> > +			ret = 0;
> > +		}
> > +		break;
> > +	};
> > +
> > +	return ret;
> > +}
> > +
>
> The only distinction between the cases is "L2", "L3", and "L4".
>
> Therefore I don't see any reason to break it down into IP4 vs. IP6 vs.
> RAW, for example. They all map to the same thing.
>
> So why not just have PEDIT_HDR_TYPE_L2, PEDIT_HDR_TYPE_L3, and
> PEDIT_HDR_TYPE_L4? It definitely seems more straightforward
> and cleaner that way.

Yeah, it isn't by mistake. The next step will be to implement hardware
offloading of the action, and for that we would like to keep the
information about the specific header type.

>
> Thanks.
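To make the two positions concrete: the specific header type can be kept (as wanted for the planned hardware offload) while the software path derives the coarse layer from it. The sketch below is purely illustrative and is not kernel code. The PEDIT_HDR_TYPE_* names mirror the quoted patch, but their numeric values here are placeholders, and enum hdr_layer and pedit_hdr_layer() are hypothetical.

```c
/* Names mirror the quoted patch; the numeric values are placeholders. */
enum pedit_header_type {
	PEDIT_HDR_TYPE_RAW,
	PEDIT_HDR_TYPE_ETH,
	PEDIT_HDR_TYPE_IP4,
	PEDIT_HDR_TYPE_IP6,
	PEDIT_HDR_TYPE_TCP,
	PEDIT_HDR_TYPE_UDP,
};

/* Hypothetical: the coarse layer the software path cares about. */
enum hdr_layer {
	HDR_LAYER_INVALID = -1,
	HDR_LAYER_L2,
	HDR_LAYER_L3,
	HDR_LAYER_L4,
};

/* Hypothetical mapping from the specific type to its layer. */
static enum hdr_layer pedit_hdr_layer(enum pedit_header_type htype)
{
	switch (htype) {
	case PEDIT_HDR_TYPE_ETH:
		return HDR_LAYER_L2;
	case PEDIT_HDR_TYPE_RAW:
	case PEDIT_HDR_TYPE_IP4:
	case PEDIT_HDR_TYPE_IP6:
		return HDR_LAYER_L3;
	case PEDIT_HDR_TYPE_TCP:
	case PEDIT_HDR_TYPE_UDP:
		return HDR_LAYER_L4;
	}
	return HDR_LAYER_INVALID;
}
```

With such a mapping the offset lookup only needs three cases, while the original type stays available to whatever consumes it later.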
[iproute PATCH v2 02/18] ss: Drop empty lines in UDP output
When dumping UDP sockets and show_tcpinfo (-i) is active but not
show_mem (-m), print_tcpinfo() does not output anything leading to an
empty line being printed after every socket. Fix this by skipping the
call to print_tcpinfo() and the previous newline printing in that case.

Signed-off-by: Phil Sutter
---
 misc/ss.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/misc/ss.c b/misc/ss.c
index 469721fd9aee3..3871a6f61f8ea 100644
--- a/misc/ss.c
+++ b/misc/ss.c
@@ -2412,7 +2412,7 @@ static int inet_show_sock(struct nlmsghdr *nlh,
 		}
 	}
 
-	if (show_mem || show_tcpinfo) {
+	if (show_mem || (show_tcpinfo && protocol != IPPROTO_UDP)) {
 		printf("\n\t");
 		if (protocol == IPPROTO_SCTP)
 			sctp_show_info(nlh, r, tb);
-- 
2.10.0
[iproute PATCH v2 18/18] ss: unix_show: No need to initialize members of calloc'ed structs
Signed-off-by: Phil Sutter
---
 misc/ss.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/misc/ss.c b/misc/ss.c
index c72aba7e65ad3..f23aa6be33174 100644
--- a/misc/ss.c
+++ b/misc/ss.c
@@ -3066,8 +3066,6 @@ static int unix_show(struct filter *f)
 
 		if (!(u = calloc(1, sizeof(*u))))
 			break;
-		u->name = NULL;
-		u->peer_name = NULL;
 
 		if (sscanf(buf, "%x: %x %x %x %x %x %d %s",
 			   &u->rport, &u->rq, &u->wq, &flags, &u->type,
-- 
2.10.0
[iproute PATCH v2 13/18] ss: Get rid of useless goto in handle_follow_request()
Signed-off-by: Phil Sutter
---
 misc/ss.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/misc/ss.c b/misc/ss.c
index c498478421190..ec71c21ce6a4a 100644
--- a/misc/ss.c
+++ b/misc/ss.c
@@ -3632,7 +3632,7 @@ static int generic_show_sock(const struct sockaddr_nl *addr,
 
 static int handle_follow_request(struct filter *f)
 {
-	int ret = -1;
+	int ret = 0;
 	int groups = 0;
 	struct rtnl_handle rth;
 
@@ -3655,10 +3655,8 @@ static int handle_follow_request(struct filter *f)
 	rth.local.nl_pid = 0;
 
 	if (rtnl_dump_filter(&rth, generic_show_sock, f))
-		goto Exit;
+		ret = -1;
 
-	ret = 0;
-Exit:
 	rtnl_close(&rth);
 	return ret;
 }
-- 
2.10.0
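The resulting control flow can be sketched without any netlink dependency. Everything in the block below is hypothetical: struct handle stands in for struct rtnl_handle, do_work() for rtnl_dump_filter(), and follow() for handle_follow_request().

```c
/* Hypothetical resource standing in for struct rtnl_handle. */
struct handle {
	int open;
};

static int handle_open(struct handle *h)
{
	h->open = 1;
	return 0;
}

static void handle_close(struct handle *h)
{
	h->open = 0;
}

/* Stand-in for rtnl_dump_filter(): non-zero means failure. */
static int do_work(const struct handle *h, int fail)
{
	return (h->open && !fail) ? 0 : 1;
}

static int follow(struct handle *h, int fail)
{
	int ret = 0;

	if (handle_open(h))
		return -1;

	/* Only the failure path touches ret; the cleanup is shared. */
	if (do_work(h, fail))
		ret = -1;

	handle_close(h);
	return ret;
}
```

Starting from ret = 0 and overriding it on failure means the cleanup runs on both paths without a label, which is all the goto was providing.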
[iproute PATCH v2 11/18] ss: Make some variables function-local
addrp_width and screen_width are used in main() only, so no need to have
them globally available.

Signed-off-by: Phil Sutter
---
 misc/ss.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/misc/ss.c b/misc/ss.c
index 44386c82c7578..3662f5f4861c7 100644
--- a/misc/ss.c
+++ b/misc/ss.c
@@ -105,10 +105,8 @@ int sctp_ino;
 
 int netid_width;
 int state_width;
-int addrp_width;
 int addr_width;
 int serv_width;
-int screen_width;
 
 static const char *TCP_PROTO = "tcp";
 static const char *SCTP_PROTO = "sctp";
@@ -3975,6 +3973,7 @@ int main(int argc, char *argv[])
 	FILE *filter_fp = NULL;
 	int ch;
 	int state_filter = 0;
+	int addrp_width, screen_width = 80;
 
 	while ((ch = getopt_long(argc, argv,
 				 "dhaletuwxnro460spbEf:miA:D:F:vVzZN:KHS",
@@ -4264,7 +4263,6 @@ int main(int argc, char *argv[])
 	if (current_filter.states&(current_filter.states-1))
 		state_width = 10;
 
-	screen_width = 80;
 	if (isatty(STDOUT_FILENO)) {
 		struct winsize w;
 
-- 
2.10.0
[iproute PATCH v2 05/18] ss: introduce proc_ctx_print()
This consolidates identical code in three places. While the function name is not quite perfect as there is different proc_ctx printing code in netlink_show_one() as well, I sadly didn't find a more suitable one. Signed-off-by: Phil Sutter --- misc/ss.c | 49 ++--- 1 file changed, 14 insertions(+), 35 deletions(-) diff --git a/misc/ss.c b/misc/ss.c index a953d4b022aed..fcbaecbe25a2f 100644 --- a/misc/ss.c +++ b/misc/ss.c @@ -1773,14 +1773,9 @@ void *parse_markmask(const char *markmask) return res; } -static void inet_stats_print(struct sockstat *s) +static void proc_ctx_print(struct sockstat *s) { - char *buf = NULL; - - sock_state_print(s); - - inet_addr_print(&s->local, s->lport, s->iface); - inet_addr_print(&s->remote, s->rport, 0); + char *buf; if (show_proc_ctx || show_sock_ctx) { if (find_entry(s->ino, &buf, @@ -1797,6 +1792,16 @@ static void inet_stats_print(struct sockstat *s) } } +static void inet_stats_print(struct sockstat *s) +{ + sock_state_print(s); + + inet_addr_print(&s->local, s->lport, s->iface); + inet_addr_print(&s->remote, s->rport, 0); + + proc_ctx_print(s); +} + static int proc_parse_inet_addr(char *loc, char *rem, int family, struct sockstat * s) { @@ -3001,7 +3006,6 @@ static void unix_stats_print(struct sockstat *list, struct filter *f) { struct sockstat *s; char *peer; - char *ctx_buf = NULL; bool use_proc = unix_use_proc(); char port_name[30] = {}; @@ -3050,19 +3054,7 @@ static void unix_stats_print(struct sockstat *list, struct filter *f) sock_addr_print(peer, " ", int_to_str(s->rport, port_name), NULL); - if (show_proc_ctx || show_sock_ctx) { - if (find_entry(s->ino, &ctx_buf, - (show_proc_ctx & show_sock_ctx) ? 
- PROC_SOCK_CTX : PROC_CTX) > 0) { - printf(" users:(%s)", ctx_buf); - free(ctx_buf); - } - } else if (show_users) { - if (find_entry(s->ino, &ctx_buf, USERS) > 0) { - printf(" users:(%s)", ctx_buf); - free(ctx_buf); - } - } + proc_ctx_print(s); printf("\n"); } } @@ -3260,7 +3252,6 @@ static int unix_show(struct filter *f) static int packet_stats_print(struct sockstat *s, const struct filter *f) { - char *buf = NULL; const char *addr, *port; char ll_name[16]; @@ -3287,19 +3278,7 @@ static int packet_stats_print(struct sockstat *s, const struct filter *f) sock_addr_print(addr, ":", port, NULL); sock_addr_print("", "*", "", NULL); - if (show_proc_ctx || show_sock_ctx) { - if (find_entry(s->ino, &buf, - (show_proc_ctx & show_sock_ctx) ? - PROC_SOCK_CTX : PROC_CTX) > 0) { - printf(" users:(%s)", buf); - free(buf); - } - } else if (show_users) { - if (find_entry(s->ino, &buf, USERS) > 0) { - printf(" users:(%s)", buf); - free(buf); - } - } + proc_ctx_print(s); if (show_details) sock_details_print(s); -- 2.10.0
[iproute PATCH v2 00/18] ss: Minor code review
This is a series of misc changes to ss code which happened as fall-out
when working on a unified output formatter (still unfinished).

Changes since v1:
- Rebased onto current upstream, resolved conflicts in patch 4 generated
  by previously added SCTP socket support.

Phil Sutter (18):
  ss: Mark fall through in arg parsing switch()
  ss: Drop empty lines in UDP output
  ss: Add missing tab when printing UNIX details
  ss: Use sockstat->type in all socket types
  ss: introduce proc_ctx_print()
  ss: Drop list traversal from unix_stats_print()
  ss: Eliminate unix_use_proc()
  ss: Turn generic_proc_open() wrappers into macros
  ss: Make tmr_name local to tcp_timer_print()
  ss: Make user_ent_hash_build_init local to user_ent_hash_build()
  ss: Make some variables function-local
  ss: Make slabstat_ids local to get_slabstat()
  ss: Get rid of useless goto in handle_follow_request()
  ss: Get rid of single-fielded struct snmpstat
  ss: Make unix_state_map local to unix_show()
  ss: Make sstate_name local to sock_state_print()
  ss: Make sstate_namel local to scan_state()
  ss: unix_show: No need to initialize members of calloc'ed structs

 misc/ss.c | 532 ++
 1 file changed, 224 insertions(+), 308 deletions(-)

-- 
2.10.0
[iproute PATCH v2 14/18] ss: Get rid of single-fielded struct snmpstat
A struct with only a single field does not make much sense. Besides
that, it was used by print_summary() only.

Signed-off-by: Phil Sutter
---
 misc/ss.c | 10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/misc/ss.c b/misc/ss.c
index ec71c21ce6a4a..c7818eadf9e75 100644
--- a/misc/ss.c
+++ b/misc/ss.c
@@ -3661,10 +3661,6 @@ static int handle_follow_request(struct filter *f)
 	return ret;
 }
 
-struct snmpstat {
-	int tcp_estab;
-};
-
 static int get_snmp_int(char *proto, char *key, int *result)
 {
 	char buf[1024];
@@ -3785,11 +3781,11 @@ static int get_sockstat(struct ssummary *s)
 static int print_summary(void)
 {
 	struct ssummary s;
-	struct snmpstat sn;
+	int tcp_estab;
 
 	if (get_sockstat(&s) < 0)
 		perror("ss: get_sockstat");
-	if (get_snmp_int("Tcp:", "CurrEstab", &sn.tcp_estab) < 0)
+	if (get_snmp_int("Tcp:", "CurrEstab", &tcp_estab) < 0)
 		perror("ss: get_snmpstat");
 
 	get_slabstat(&slabstat);
@@ -3798,7 +3794,7 @@ static int print_summary(void)
 
 	printf("TCP: %d (estab %d, closed %d, orphaned %d, synrecv %d, timewait %d/%d), ports %d\n",
 	       s.tcp_total + slabstat.tcp_syns + s.tcp_tws,
-	       sn.tcp_estab,
+	       tcp_estab,
 	       s.tcp_total - (s.tcp4_hashed+s.tcp6_hashed-s.tcp_tws),
 	       s.tcp_orphans,
 	       slabstat.tcp_syns,
-- 
2.10.0
[PATCH net-next 0/2] samples, bpf: Refactor; Add automated tests for cgroups
These two patches refactor some old, reusable code out of the existing
test_current_task_under_cgroup_user test and add a new, automated test.

There is some generic cgroupsv2 setup & cleanup code, given that most
environments still don't have it set up by default. With this code, we're
able to pretty easily add an automated test for future cgroupsv2
functionality.

Sargun Dhillon (2):
  samples, bpf: Refactor test_current_task_under_cgroup - separate out helpers
  samples, bpf: Add automated test for cgroup filter attachments

 samples/bpf/Makefile | 4 +-
 samples/bpf/cgroup_helpers.c | 177 ++
 samples/bpf/cgroup_helpers.h | 16 ++
 samples/bpf/test_cgrp2_attach2.c | 132
 samples/bpf/test_current_task_under_cgroup_user.c | 108 +++--
 5 files changed, 352 insertions(+), 85 deletions(-)
 create mode 100644 samples/bpf/cgroup_helpers.c
 create mode 100644 samples/bpf/cgroup_helpers.h
 create mode 100644 samples/bpf/test_cgrp2_attach2.c

-- 
2.7.4
[iproute PATCH v2 17/18] ss: Make sstate_namel local to scan_state()
Signed-off-by: Phil Sutter
---
 misc/ss.c | 29 ++++++++++++++---------------
 1 file changed, 14 insertions(+), 15 deletions(-)

diff --git a/misc/ss.c b/misc/ss.c
index 8439f473d7f7b..c72aba7e65ad3 100644
--- a/misc/ss.c
+++ b/misc/ss.c
@@ -666,21 +666,6 @@ static const char *sctp_sstate_name[] = {
 	[SCTP_STATE_SHUTDOWN_ACK_SENT] = "ACK_SENT",
 };
 
-static const char *sstate_namel[] = {
-	"UNKNOWN",
-	[SS_ESTABLISHED] = "established",
-	[SS_SYN_SENT] = "syn-sent",
-	[SS_SYN_RECV] = "syn-recv",
-	[SS_FIN_WAIT1] = "fin-wait-1",
-	[SS_FIN_WAIT2] = "fin-wait-2",
-	[SS_TIME_WAIT] = "time-wait",
-	[SS_CLOSE] = "unconnected",
-	[SS_CLOSE_WAIT] = "close-wait",
-	[SS_LAST_ACK] = "last-ack",
-	[SS_LISTEN] = "listening",
-	[SS_CLOSING] = "closing",
-};
-
 struct sockstat {
 	struct sockstat	*next;
 	unsigned int	type;
@@ -3888,6 +3873,20 @@ static void usage(void)
 
 static int scan_state(const char *state)
 {
+	static const char * const sstate_namel[] = {
+		"UNKNOWN",
+		[SS_ESTABLISHED] = "established",
+		[SS_SYN_SENT] = "syn-sent",
+		[SS_SYN_RECV] = "syn-recv",
+		[SS_FIN_WAIT1] = "fin-wait-1",
+		[SS_FIN_WAIT2] = "fin-wait-2",
+		[SS_TIME_WAIT] = "time-wait",
+		[SS_CLOSE] = "unconnected",
+		[SS_CLOSE_WAIT] = "close-wait",
+		[SS_LAST_ACK] = "last-ack",
+		[SS_LISTEN] = "listening",
+		[SS_CLOSING] = "closing",
+	};
 	int i;
 
 	if (strcasecmp(state, "close") == 0 ||
-- 
2.10.0
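A small sketch of how a function-local table with designated initializers can back a case-insensitive name lookup. The ST_* enum and state_from_name() are hypothetical and reduced for this note. The sketch returns an index, which is not necessarily what scan_state() returns, since the rest of that function is not part of the quoted hunk.

```c
#include <stddef.h>
#include <strings.h>	/* strcasecmp() (POSIX) */

/* Reduced, hypothetical state enum; ss.c has more states. */
enum {
	ST_UNKNOWN,
	ST_ESTABLISHED,
	ST_SYN_SENT,
	ST_SYN_RECV,
	ST_CLOSE,
	ST_LISTEN,
	ST_MAX
};

/* Returns the state index for a name, or -1 if nothing matches. */
static int state_from_name(const char *name)
{
	static const char * const names[ST_MAX] = {
		"UNKNOWN",
		[ST_ESTABLISHED] = "established",
		[ST_SYN_SENT] = "syn-sent",
		/* ST_SYN_RECV deliberately omitted: that slot is NULL */
		[ST_CLOSE] = "unconnected",
		[ST_LISTEN] = "listening",
	};
	size_t i;

	for (i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
		/* Slots without an initializer are NULL; skip them. */
		if (names[i] && strcasecmp(name, names[i]) == 0)
			return (int)i;
	}
	return -1;
}
```

Designated initializers tie each string to its enum value, so reordering the enum cannot silently shift the table.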
Re: stmmac: turn coalescing / NAPI off in stmmac
Hi!

> >Anyway... since you asked. I believe I have a way to disable NAPI / tx
> >coalescing in the driver. Unfortunately, locking is missing on the rx
> >path, and needs to be extended to _irqsave variant on tx path.
>
> I have just replied to a previous thread about that...

Yeah, please reply to David's mail where he describes why it can't
work.

> >So patch currently looks like this (hand edited, can't be
> >applied, got it working few hours ago). Does it look acceptable?
> >
> >I'd prefer this to go after the patch that pulls common code to single
> >place, so that single place needs to be patched. Plus I guess I should
> >add ifdefs, so that more advanced NAPI / tx coalescing code can be
> >reactivated when it is fixed. Trivial fixes can go on top. Does that
> >sound like a plan?
>
> Hmm, what I find strange is that, just this code is running since a
> long time on several platforms and Chip versions. No race condition
> has been found or lock protection problems (also proving lock
> mechanisms).

Well, it works better for me when I disable CONFIG_SMP. It is normal
that locking problems are hard to reproduce :-(.

> Pavel, I ask you sorry if I missed some problems so, if you can
> (as D. Miller asked) to send us a cover letter + all patches
> I will try to reply soon. I can do also some tests if you ask
> me that! I could run on 3.x and 4.x but I cannot promise you
> benchmarks.

Actually... I have questions here. David normally pulls from you (can
I have the address of your git tree?).

Could you apply these to your git?

[PATCH] stmmac ethernet: unify locking
[PATCH] stmmac: simplify flag assignment
[PATCH] stmmac: cleanup documenation, make it match reality

They are rather trivial and independent, I'm not sure what a cover
letter would say, besides "simple fixes".

Then I can re-do the reset on top of that...

> >Which tree do you want patches against?
> >
> >https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/ ?
>
> I think that bug fixing should be on top of net.git but I let Miller
> to decide.

Hmm. It is "only" a performance problem (40msec delays).. I guess
-next is better target.

Best regards,
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
[iproute PATCH v2 10/18] ss: Make user_ent_hash_build_init local to user_ent_hash_build()
By having it statically defined, there is no need for it to be global.

Signed-off-by: Phil Sutter
---
 misc/ss.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/misc/ss.c b/misc/ss.c
index 97fcfd4a85548..44386c82c7578 100644
--- a/misc/ss.c
+++ b/misc/ss.c
@@ -100,8 +100,6 @@ int show_bpf;
 int show_proc_ctx;
 int show_sock_ctx;
 int show_header = 1;
-/* If show_users & show_proc_ctx only do user_ent_hash_build() once */
-int user_ent_hash_build_init;
 int follow_events;
 int sctp_ino;
 
@@ -421,6 +419,7 @@ static void user_ent_hash_build(void)
 	char *pid_context;
 	char *sock_context;
 	const char *no_ctx = "unavailable";
+	static int user_ent_hash_build_init;
 
 	/* If show_users & show_proc_ctx set only do this once */
 	if (user_ent_hash_build_init != 0)
-- 
2.10.0
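The run-once guard this patch relies on can be shown in a few lines. hash_build() and build_count below are hypothetical stand-ins, with the counter existing only so that the effect can be observed.

```c
/* Hypothetical counter so the effect is observable from outside. */
static int build_count;

static void hash_build(void)
{
	/* Zero-initialised once; keeps its value between calls. */
	static int initialized;

	if (initialized)
		return;
	initialized = 1;

	build_count++;	/* stands in for the actual one-time work */
}
```

Note that a plain static flag like this is not thread-safe; that is fine for a single-threaded tool, but it would need a lock or a once-primitive elsewhere.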
[iproute PATCH v2 16/18] ss: Make sstate_name local to sock_state_print()
Signed-off-by: Phil Sutter
---
 misc/ss.c | 29 ++++++++++++++---------------
 1 file changed, 14 insertions(+), 15 deletions(-)

diff --git a/misc/ss.c b/misc/ss.c
index e82c416b5fa72..8439f473d7f7b 100644
--- a/misc/ss.c
+++ b/misc/ss.c
@@ -655,21 +655,6 @@ static unsigned long long cookie_sk_get(const uint32_t *cookie)
 	return (((unsigned long long)cookie[1] << 31) << 1) | cookie[0];
 }
 
-static const char *sstate_name[] = {
-	"UNKNOWN",
-	[SS_ESTABLISHED] = "ESTAB",
-	[SS_SYN_SENT] = "SYN-SENT",
-	[SS_SYN_RECV] = "SYN-RECV",
-	[SS_FIN_WAIT1] = "FIN-WAIT-1",
-	[SS_FIN_WAIT2] = "FIN-WAIT-2",
-	[SS_TIME_WAIT] = "TIME-WAIT",
-	[SS_CLOSE] = "UNCONN",
-	[SS_CLOSE_WAIT] = "CLOSE-WAIT",
-	[SS_LAST_ACK] = "LAST-ACK",
-	[SS_LISTEN] = "LISTEN",
-	[SS_CLOSING] = "CLOSING",
-};
-
 static const char *sctp_sstate_name[] = {
 	[SCTP_STATE_CLOSED] = "CLOSED",
 	[SCTP_STATE_COOKIE_WAIT] = "COOKIE_WAIT",
@@ -815,6 +800,20 @@ static const char *proto_name(int protocol)
 static void sock_state_print(struct sockstat *s)
 {
 	const char *sock_name;
+	static const char * const sstate_name[] = {
+		"UNKNOWN",
+		[SS_ESTABLISHED] = "ESTAB",
+		[SS_SYN_SENT] = "SYN-SENT",
+		[SS_SYN_RECV] = "SYN-RECV",
+		[SS_FIN_WAIT1] = "FIN-WAIT-1",
+		[SS_FIN_WAIT2] = "FIN-WAIT-2",
+		[SS_TIME_WAIT] = "TIME-WAIT",
+		[SS_CLOSE] = "UNCONN",
+		[SS_CLOSE_WAIT] = "CLOSE-WAIT",
+		[SS_LAST_ACK] = "LAST-ACK",
+		[SS_LISTEN] = "LISTEN",
+		[SS_CLOSING] = "CLOSING",
+	};
 
 	switch (s->local.family) {
 	case AF_UNIX:
-- 
2.10.0
[PATCH net-next 1/2] samples, bpf: Refactor test_current_task_under_cgroup - separate out helpers
This patch modifies test_current_task_under_cgroup_user. The test has several helpers around creating a temporary environment for cgroup testing, and moving the current task around cgroups. This set of helpers can then be used in other tests. Signed-off-by: Sargun Dhillon --- samples/bpf/Makefile | 2 +- samples/bpf/cgroup_helpers.c | 177 ++ samples/bpf/cgroup_helpers.h | 16 ++ samples/bpf/test_current_task_under_cgroup_user.c | 108 +++-- 4 files changed, 218 insertions(+), 85 deletions(-) create mode 100644 samples/bpf/cgroup_helpers.c create mode 100644 samples/bpf/cgroup_helpers.h diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile index 22b6407e..3c805af 100644 --- a/samples/bpf/Makefile +++ b/samples/bpf/Makefile @@ -54,7 +54,7 @@ test_cgrp2_attach-objs := libbpf.o test_cgrp2_attach.o xdp1-objs := bpf_load.o libbpf.o xdp1_user.o # reuse xdp1 source intentionally xdp2-objs := bpf_load.o libbpf.o xdp1_user.o -test_current_task_under_cgroup-objs := bpf_load.o libbpf.o \ +test_current_task_under_cgroup-objs := bpf_load.o libbpf.o cgroup_helpers.o \ test_current_task_under_cgroup_user.o trace_event-objs := bpf_load.o libbpf.o trace_event_user.o sampleip-objs := bpf_load.o libbpf.o sampleip_user.o diff --git a/samples/bpf/cgroup_helpers.c b/samples/bpf/cgroup_helpers.c new file mode 100644 index 000..9d1be94 --- /dev/null +++ b/samples/bpf/cgroup_helpers.c @@ -0,0 +1,177 @@ +#define _GNU_SOURCE +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + + +#include "cgroup_helpers.h" + +/* + * To avoid relying on the system setup, when setup_cgroup_env is called + * we create a new mount namespace, and cgroup namespace. The cgroup2 + * root is mounted at CGROUP_MOUNT_PATH + * + * Unfortunately, most people don't have cgroupv2 enabled at this point in time. + * It's easier to create our own mount namespace and manage it ourselves. + * + * We assume /mnt exists. 
+ */ + +#define WALK_FD_LIMIT 16 +#define CGROUP_MOUNT_PATH "/mnt" +#define CGROUP_WORK_DIR"/cgroup-test-work-dir" +#define format_cgroup_path(buf, path) \ + snprintf(buf, sizeof(buf), "%s%s%s", CGROUP_MOUNT_PATH, \ +CGROUP_WORK_DIR, path) + +/** + * setup_cgroup_environment() - Setup the cgroup environment + * + * After calling this function, cleanup_cgroup_environment should be called + * once testing is complete. + * + * This function will print an error to stderr and return 1 if it is unable + * to setup the cgroup environment. If setup is successful, 0 is returned. + */ +int setup_cgroup_environment(void) +{ + char cgroup_workdir[PATH_MAX + 1]; + + format_cgroup_path(cgroup_workdir, ""); + + if (unshare(CLONE_NEWNS)) { + log_err("unshare"); + return 1; + } + + if (mount("none", "/", NULL, MS_REC | MS_PRIVATE, NULL)) { + log_err("mount fakeroot"); + return 1; + } + + if (mount("none", CGROUP_MOUNT_PATH, "cgroup2", 0, NULL)) { + log_err("mount cgroup2"); + return 1; + } + + /* Cleanup existing failed runs, now that the environment is setup */ + cleanup_cgroup_environment(); + + if (mkdir(cgroup_workdir, 0777) && errno != EEXIST) { + log_err("mkdir cgroup work dir"); + return 1; + } + + return 0; +} + +static int nftwfunc(const char *filename, const struct stat *statptr, + int fileflags, struct FTW *pfwt) +{ + if ((fileflags & FTW_D) && rmdir(filename)) + log_err("Removing cgroup: %s", filename); + return 0; +} + + +static int join_cgroup_from_top(char *cgroup_path) +{ + char cgroup_procs_path[PATH_MAX + 1]; + pid_t pid = getpid(); + int fd, rc = 0; + + snprintf(cgroup_procs_path, sizeof(cgroup_procs_path), +"%s/cgroup.procs", cgroup_path); + + fd = open(cgroup_procs_path, O_WRONLY); + if (fd < 0) { + log_err("Opening Cgroup Procs: %s", cgroup_procs_path); + return 1; + } + + if (dprintf(fd, "%d\n", pid) < 0) { + log_err("Joining Cgroup"); + rc = 1; + } + + close(fd); + return rc; +} + +/** + * join_cgroup() - Join a cgroup + * @path: The cgroup path, relative to 
the workdir, to join + * + * This function expects a cgroup to already be created, relative to the cgroup + * work dir, and it joins it. For example, passing "/my-cgroup" as the path + * would actually put the calling process into the cgroup + * "/cgroup-test-work-dir/my-cgroup" + * + * On success, it returns 0, otherwise on failure it returns 1. + */ +int join_cgroup(char *path) +{ + char cgroup_path[PATH_MAX + 1]; + + format
[PATCH net-next 2/2] samples, bpf: Add automated test for cgroup filter attachments
This patch adds the sample program test_cgrp2_attach2. This program is similar to test_cgrp2_attach, but it performs automated testing of the cgroupv2 BPF attached filters. It runs the following checks: * Simple filter attachment * Application of filters to child cgroups * Overriding filters on child cgroups * Checking that this still works when the parent filter is removed The filters that are used here are simply allow all / deny all filters, so it isn't checking the actual functionality of the filters, but rather the behaviour around detachment / attachment. If net_cls is enabled, this test will fail. Signed-off-by: Sargun Dhillon --- samples/bpf/Makefile | 2 + samples/bpf/test_cgrp2_attach2.c | 132 +++ 2 files changed, 134 insertions(+) create mode 100644 samples/bpf/test_cgrp2_attach2.c diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile index 3c805af..8892d7c 100644 --- a/samples/bpf/Makefile +++ b/samples/bpf/Makefile @@ -23,6 +23,7 @@ hostprogs-y += map_perf_test hostprogs-y += test_overhead hostprogs-y += test_cgrp2_array_pin hostprogs-y += test_cgrp2_attach +hostprogs-y += test_cgrp2_attach2 hostprogs-y += xdp1 hostprogs-y += xdp2 hostprogs-y += test_current_task_under_cgroup @@ -51,6 +52,7 @@ map_perf_test-objs := bpf_load.o libbpf.o map_perf_test_user.o test_overhead-objs := bpf_load.o libbpf.o test_overhead_user.o test_cgrp2_array_pin-objs := libbpf.o test_cgrp2_array_pin.o test_cgrp2_attach-objs := libbpf.o test_cgrp2_attach.o +test_cgrp2_attach2-objs := libbpf.o test_cgrp2_attach2.o cgroup_helpers.o xdp1-objs := bpf_load.o libbpf.o xdp1_user.o # reuse xdp1 source intentionally xdp2-objs := bpf_load.o libbpf.o xdp1_user.o diff --git a/samples/bpf/test_cgrp2_attach2.c b/samples/bpf/test_cgrp2_attach2.c new file mode 100644 index 000..ddfac42 --- /dev/null +++ b/samples/bpf/test_cgrp2_attach2.c @@ -0,0 +1,132 @@ +/* eBPF example program: + * + * - Creates arraymap in kernel with 4 bytes keys and 8 byte values + * + * - Loads eBPF program + * + * 
The eBPF program accesses the map passed in to store two pieces of + * information. The number of invocations of the program, which maps + * to the number of packets received, is stored to key 0. Key 1 is + * incremented on each iteration by the number of bytes stored in + * the skb. + * + * - Attaches the new program to a cgroup using BPF_PROG_ATTACH + * + * - Every second, reads map[0] and map[1] to see how many bytes and + * packets were seen on any socket of tasks in the given cgroup. + */ + +#define _GNU_SOURCE + +#include +#include +#include +#include + +#include + +#include "libbpf.h" +#include "cgroup_helpers.h" + +#define FOO"/foo" +#define BAR"/foo/bar/" +#define PING_CMD "ping -c1 -w1 127.0.0.1" + +static int prog_load(int verdict) +{ + int ret; + struct bpf_insn prog[] = { + BPF_MOV64_IMM(BPF_REG_0, verdict), /* r0 = verdict */ + BPF_EXIT_INSN(), + }; + + ret = bpf_prog_load(BPF_PROG_TYPE_CGROUP_SKB, +prog, sizeof(prog), "GPL", 0); + + if (ret < 0) { + log_err("Loading program"); + printf("Output from verifier:\n%s\n---\n", bpf_log_buf); + return 0; + } + return ret; +} + + +int main(int argc, char **argv) +{ + int drop_prog, allow_prog, foo = 0, bar = 0, rc = 0; + + allow_prog = prog_load(1); + if (!allow_prog) + goto err; + + drop_prog = prog_load(0); + if (!drop_prog) + goto err; + + if (setup_cgroup_environment()) + goto err; + + /* Create cgroup /foo, get fd, and join it */ + foo = create_and_get_cgroup(FOO); + if (!foo) + goto err; + + if (join_cgroup(FOO)) + goto err; + + if (bpf_prog_attach(drop_prog, foo, BPF_CGROUP_INET_EGRESS)) { + log_err("Attaching prog to /foo"); + goto err; + } + + assert(system(PING_CMD) != 0); + + /* Create cgroup /foo/bar, get fd, and join it */ + bar = create_and_get_cgroup(BAR); + if (!bar) + goto err; + + if (join_cgroup(BAR)) + goto err; + + assert(system(PING_CMD) != 0); + + if (bpf_prog_attach(allow_prog, bar, BPF_CGROUP_INET_EGRESS)) { + log_err("Attaching prog to /foo/bar"); + goto err; + } + + 
assert(system(PING_CMD) == 0); + + + if (bpf_prog_detach(bar, BPF_CGROUP_INET_EGRESS)) { + log_err("Detaching program from /foo/bar"); + goto err; + } + + assert(system(PING_CMD) != 0); + + if (bpf_prog_attach(allow_prog, bar, BPF_CGROUP_INET_EGRESS)) { + log_err("Attaching prog to /foo/bar"); + goto err; + } + + if (bpf_prog_detach(foo, BPF_CGROUP_INET_EGRESS)) { +
[PATCH net v2] tcp: warn on bogus MSS and try to amend it
There have been some reports lately about TCP connection stalls caused
by NIC drivers that aren't setting gso_size on aggregated packets on rx
path. This causes TCP to assume that the MSS is actually the size of the
aggregated packet, which is invalid.

Although the proper fix is to be done at each driver, it's often hard
and cumbersome for one to debug, come to such root cause and report/fix
it.

This patch amends this situation in two ways. First, it adds a warning
when this situation occurs, so it gives a hint to those trying to
debug this. It also limits the maximum probed MSS to the advertised MSS,
as it should never be any higher than that.

The result is that the connection may not have the best performance ever
but it shouldn't stall, and the admin will have a hint on what to look
for.

Tested with virtio by forcing gso_size to 0.

Cc: Jonathan Maxwell
Signed-off-by: Marcelo Ricardo Leitner
---
v2: Updated msg as suggested by David.

 net/ipv4/tcp_input.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index a27b9c0e27c08b4e4aeaff3d0bfdf3ae561ba4d8..fd619eb93749b6de56a41669248b337c051d9fe2 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -144,7 +144,10 @@ static void tcp_measure_rcv_mss(struct sock *sk, const struct sk_buff *skb)
 	 */
 	len = skb_shinfo(skb)->gso_size ? : skb->len;
 	if (len >= icsk->icsk_ack.rcv_mss) {
-		icsk->icsk_ack.rcv_mss = len;
+		icsk->icsk_ack.rcv_mss = min_t(unsigned int, len,
+					       tcp_sk(sk)->advmss);
+		if (icsk->icsk_ack.rcv_mss != len)
+			pr_warn_once("Driver has suspect GRO implementation, TCP performance may be compromised.\n");
 	} else {
 		/* Otherwise, we make more careful check taking into account,
 		 * that SACKs block is variable.
--
2.9.3
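The clamping described in this patch can be sketched outside the kernel as follows. This is only an illustration under stated assumptions: struct rcv_mss_state, measure_rcv_mss() and min_uint() are hypothetical names, not kernel symbols, and the else branch of tcp_measure_rcv_mss() is not modelled.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the per-socket state; not the kernel's structs. */
struct rcv_mss_state {
	unsigned int rcv_mss;	/* current receive MSS estimate */
	unsigned int advmss;	/* MSS advertised to the peer */
	int warned;		/* emulates the "once" of pr_warn_once() */
};

static unsigned int min_uint(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/*
 * Update the MSS estimate from one received packet.
 * Returns 1 if the candidate length had to be clamped to advmss, else 0.
 */
static int measure_rcv_mss(struct rcv_mss_state *st, unsigned int gso_size,
			   unsigned int skb_len)
{
	unsigned int len;

	assert(st != NULL);

	/* Fall back to the packet length when the driver left gso_size at 0. */
	len = gso_size ? gso_size : skb_len;
	if (len < st->rcv_mss)
		return 0;	/* the patch's else branch is not modelled here */

	/* Never let the estimate exceed what we advertised. */
	st->rcv_mss = min_uint(len, st->advmss);
	if (st->rcv_mss == len)
		return 0;

	if (!st->warned) {
		fprintf(stderr, "suspect GRO: len %u exceeds advmss %u\n",
			len, st->advmss);
		st->warned = 1;
	}
	return 1;
}
```

With advmss at 1460, a packet reporting gso_size 1448 leaves the estimate at 1448, while an aggregated 65000-byte packet with gso_size 0 is clamped to 1460 and triggers the one-time warning.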
Re: [PATCH 2/7] net: ethernet: ti: cpdma: fix desc re-queuing
On Thu, Dec 01, 2016 at 05:34:27PM -0600, Grygorii Strashko wrote:
> The currently processing cpdma descriptor with EOQ flag set may
> contain two values in Next Descriptor Pointer field:
> - valid pointer: means CPDMA missed addition of new desc in queue;

It shouldn't happen in normal circumstances, right?
So why does it happen only for egress channels? And does that mean
there is some resynchronization between the submit and process
functions, or is this a h/w issue?

> - null: no more descriptors in queue.
> In the later case, it's not required to write to HDP register, but now
> CPDMA does it.
>
> Hence, add additional check for Next Descriptor Pointer != null in
> cpdma_chan_process() function before writing in HDP register.
>
> Signed-off-by: Grygorii Strashko
> ---
>  drivers/net/ethernet/ti/davinci_cpdma.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/ti/davinci_cpdma.c
> b/drivers/net/ethernet/ti/davinci_cpdma.c
> index 0924014..379314f 100644
> --- a/drivers/net/ethernet/ti/davinci_cpdma.c
> +++ b/drivers/net/ethernet/ti/davinci_cpdma.c
> @@ -1152,7 +1152,7 @@ static int __cpdma_chan_process(struct cpdma_chan *chan)
>  	chan->count--;
>  	chan->stats.good_dequeue++;
>
> -	if (status & CPDMA_DESC_EOQ) {
> +	if ((status & CPDMA_DESC_EOQ) && chan->head) {
>  		chan->stats.requeue++;
>  		chan_write(chan, hdp, desc_phys(pool, chan->head));
>  	}
> --
> 2.10.1
>
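For readers following the thread, the check proposed in the quoted patch can be modelled with a small sketch. Everything here is hypothetical (the struct layouts, the DESC_EOQ bit value, and the hdp_writes counter standing in for chan_write() to the HDP register); it only illustrates skipping the HDP write when no descriptor remains queued, and says nothing about why the race occurs.

```c
#include <assert.h>
#include <stddef.h>

#define DESC_EOQ 0x1u	/* hypothetical bit value, not the hardware's */

struct desc {
	struct desc *next;	/* models the Next Descriptor Pointer */
	unsigned int status;
};

struct chan {
	struct desc *head;	/* next descriptor awaiting completion */
	struct desc *hdp;	/* models the Head Descriptor Pointer register */
	unsigned int hdp_writes;/* how many times "HDP" was written */
	unsigned int requeue;
};

/* Complete the descriptor at the head; returns it, or NULL if none queued. */
static struct desc *chan_process_one(struct chan *c)
{
	struct desc *done;

	assert(c != NULL);
	done = c->head;
	if (!done)
		return NULL;

	c->head = done->next;

	/*
	 * End-of-queue was flagged. Restart the queue only if another
	 * descriptor was chained in the meantime; with a null next
	 * pointer there is nothing to hand to the hardware.
	 */
	if ((done->status & DESC_EOQ) && c->head) {
		c->requeue++;
		c->hdp = c->head;
		c->hdp_writes++;
	}
	return done;
}
```

Processing a two-entry queue where both descriptors carry the EOQ flag produces exactly one HDP write: the first completion finds a successor and requeues, the second finds a null next pointer and leaves the register alone.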
[PATCH iproute2 V5 1/3] libnetlink: Introduce rta_getattr_be*()
Add the utility functions rta_getattr_be16() and rta_getattr_be32(), and
change existing code to use them.

Signed-off-by: Amir Vadai
---
 bridge/fdb.c         | 4 ++--
 include/libnetlink.h | 9 +
 ip/iplink_geneve.c   | 2 +-
 ip/iplink_vxlan.c    | 2 +-
 tc/f_flower.c        | 2 +-
 5 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/bridge/fdb.c b/bridge/fdb.c
index 90f4b154c5dc..a91521776e99 100644
--- a/bridge/fdb.c
+++ b/bridge/fdb.c
@@ -168,10 +168,10 @@ int print_fdb(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
 	if (tb[NDA_PORT]) {
 		if (jw_global)
 			jsonw_uint_field(jw_global, "port",
-					 ntohs(rta_getattr_u16(tb[NDA_PORT])));
+					 rta_getattr_be16(tb[NDA_PORT]));
 		else
 			fprintf(fp, "port %d ",
-				ntohs(rta_getattr_u16(tb[NDA_PORT])));
+				rta_getattr_be16(tb[NDA_PORT]));
 	}

 	if (tb[NDA_VNI]) {
diff --git a/include/libnetlink.h b/include/libnetlink.h
index 483509ca9635..751ebf186dd4 100644
--- a/include/libnetlink.h
+++ b/include/libnetlink.h
@@ -10,6 +10,7 @@
 #include
 #include
 #include
+#include

 struct rtnl_handle {
 	int fd;
@@ -140,10 +141,18 @@ static inline __u16 rta_getattr_u16(const struct rtattr *rta)
 {
 	return *(__u16 *)RTA_DATA(rta);
 }
+static inline __be16 rta_getattr_be16(const struct rtattr *rta)
+{
+	return ntohs(rta_getattr_u16(rta));
+}
 static inline __u32 rta_getattr_u32(const struct rtattr *rta)
 {
 	return *(__u32 *)RTA_DATA(rta);
 }
+static inline __be32 rta_getattr_be32(const struct rtattr *rta)
+{
+	return ntohl(rta_getattr_u32(rta));
+}
 static inline __u64 rta_getattr_u64(const struct rtattr *rta)
 {
 	__u64 tmp;
diff --git a/ip/iplink_geneve.c b/ip/iplink_geneve.c
index 3bfba91c644c..1e6669d07d60 100644
--- a/ip/iplink_geneve.c
+++ b/ip/iplink_geneve.c
@@ -234,7 +234,7 @@ static void geneve_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[])

 	if (tb[IFLA_GENEVE_PORT])
 		fprintf(f, "dstport %u ",
-			ntohs(rta_getattr_u16(tb[IFLA_GENEVE_PORT])));
+			rta_getattr_be16(tb[IFLA_GENEVE_PORT]));

 	if (tb[IFLA_GENEVE_COLLECT_METADATA])
 		fputs("external ", f);
diff --git a/ip/iplink_vxlan.c b/ip/iplink_vxlan.c
index 93af979a1e97..6d02bb47b2f0 100644
--- a/ip/iplink_vxlan.c
+++ b/ip/iplink_vxlan.c
@@ -413,7 +413,7 @@ static void vxlan_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[])

 	if (tb[IFLA_VXLAN_PORT])
 		fprintf(f, "dstport %u ",
-			ntohs(rta_getattr_u16(tb[IFLA_VXLAN_PORT])));
+			rta_getattr_be16(tb[IFLA_VXLAN_PORT]));

 	if (tb[IFLA_VXLAN_LEARNING] &&
 	    !rta_getattr_u8(tb[IFLA_VXLAN_LEARNING]))
diff --git a/tc/f_flower.c b/tc/f_flower.c
index 1555764b9996..e132974e0d1d 100644
--- a/tc/f_flower.c
+++ b/tc/f_flower.c
@@ -511,7 +511,7 @@ static void flower_print_ip_addr(FILE *f, char *name, __be16 eth_type,

 static void flower_print_port(FILE *f, char *name, struct rtattr *attr)
 {
-	fprintf(f, "\n %s %d", name, ntohs(rta_getattr_u16(attr)));
+	fprintf(f, "\n %s %d", name, rta_getattr_be16(attr));
 }

 static int flower_print_opt(struct filter_util *qu, FILE *f,
--
2.10.2
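What the new accessors do can be illustrated without libnetlink. The sketch below is an assumption-laden stand-in: payload_get_be16() and payload_get_be32() are hypothetical names that read from a raw byte buffer rather than from a struct rtattr, and they use memcpy() instead of a pointer cast so the read does not depend on alignment.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helpers: read a network-byte-order (big-endian) value
 * from an attribute payload and return it in host byte order.
 */
static uint16_t payload_get_be16(const void *data)
{
	uint16_t raw;

	assert(data != NULL);
	memcpy(&raw, data, sizeof(raw));	/* alignment-safe load */
	return ntohs(raw);
}

static uint32_t payload_get_be32(const void *data)
{
	uint32_t raw;

	assert(data != NULL);
	memcpy(&raw, data, sizeof(raw));	/* alignment-safe load */
	return ntohl(raw);
}
```

For a payload holding the bytes 0x12 0xb5, payload_get_be16() yields 4789 on both little-endian and big-endian hosts, which is the property callers such as the dstport printing code rely on.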
[PATCH iproute2 V5 3/3] tc/act_tunnel: Introduce ip tunnel action
This action could be used before redirecting packets to a shared tunnel device, or when redirecting packets arriving from a such a device. The 'unset' action is optional. It is used to explicitly unset the metadata created by the tunnel device during decap. If not used, the metadata will be released automatically by the kernel. The 'set' operation, will set the metadata with the specified values for the encap. For example, the following flower filter will forward all ICMP packets destined to 11.11.11.2 through the shared vxlan device 'vxlan0'. Before redirecting, a metadata for the vxlan tunnel is created using the tunnel_key action and it's arguments: $ tc filter add dev net0 protocol ip parent : \ flower \ ip_proto 1 \ dst_ip 11.11.11.2 \ action tunnel_key set \ src_ip 11.11.0.1 \ dst_ip 11.11.0.2 \ id 11 \ action mirred egress redirect dev vxlan0 Signed-off-by: Amir Vadai --- include/linux/tc_act/tc_tunnel_key.h | 42 ++ man/man8/tc-tunnel_key.8 | 112 +++ tc/Makefile | 1 + tc/m_tunnel_key.c| 258 +++ 4 files changed, 413 insertions(+) create mode 100644 include/linux/tc_act/tc_tunnel_key.h create mode 100644 man/man8/tc-tunnel_key.8 create mode 100644 tc/m_tunnel_key.c diff --git a/include/linux/tc_act/tc_tunnel_key.h b/include/linux/tc_act/tc_tunnel_key.h new file mode 100644 index ..f9ddf5369a45 --- /dev/null +++ b/include/linux/tc_act/tc_tunnel_key.h @@ -0,0 +1,42 @@ +/* + * Copyright (c) 2016, Amir Vadai + * Copyright (c) 2016, Mellanox Technologies. All rights reserved. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. 
+ */ + +#ifndef __LINUX_TC_TUNNEL_KEY_H +#define __LINUX_TC_TUNNEL_KEY_H + +#include + +#define TCA_ACT_TUNNEL_KEY 17 + +#define TCA_TUNNEL_KEY_ACT_SET 1 +#define TCA_TUNNEL_KEY_ACT_RELEASE 2 + +struct tc_tunnel_key { + tc_gen; + int t_action; +}; + +enum { + TCA_TUNNEL_KEY_UNSPEC, + TCA_TUNNEL_KEY_TM, + TCA_TUNNEL_KEY_PARMS, + TCA_TUNNEL_KEY_ENC_IPV4_SRC,/* be32 */ + TCA_TUNNEL_KEY_ENC_IPV4_DST,/* be32 */ + TCA_TUNNEL_KEY_ENC_IPV6_SRC,/* struct in6_addr */ + TCA_TUNNEL_KEY_ENC_IPV6_DST,/* struct in6_addr */ + TCA_TUNNEL_KEY_ENC_KEY_ID, /* be64 */ + TCA_TUNNEL_KEY_PAD, + __TCA_TUNNEL_KEY_MAX, +}; + +#define TCA_TUNNEL_KEY_MAX (__TCA_TUNNEL_KEY_MAX - 1) + +#endif + diff --git a/man/man8/tc-tunnel_key.8 b/man/man8/tc-tunnel_key.8 new file mode 100644 index ..17b15b9b34b9 --- /dev/null +++ b/man/man8/tc-tunnel_key.8 @@ -0,0 +1,112 @@ +.TH "Tunnel metadata manipulation action in tc" 8 "10 Nov 2016" "iproute2" "Linux" + +.SH NAME +tunnel_key - Tunnel metadata manipulation +.SH SYNOPSIS +.in +8 +.ti -8 +.BR tc " ... " "action tunnel_key" " { " unset " | " +.IR SET " }" + +.ti -8 +.IR SET " := " +.BR set " " src_ip +.IR ADDRESS +.BR dst_ip +.IR ADDRESS +.BI id " KEY_ID" + +.SH DESCRIPTION +The +.B tunnel_key +action combined with a shared IP tunnel device, allows to perform IP tunnel en- +or decapsulation on a packet, reflected by +the operation modes +.IR UNSET " and " SET . +The +.I UNSET +mode is optional - even without using it, the metadata information will be +released automatically when packet processing will be finished. +.IR UNSET +function could be used in cases when traffic is forwarded between two tunnels, +where the metadata from the first tunnel will be used for encapsulation done by +the second tunnel. +.IR SET +mode requires the source and destination ip +.I ADDRESS +and the tunnel key id +.I KEY_ID +which will be used by the ip tunnel shared device to create the tunnel header. 
The +.B tunnel_key +action is useful only in combination with a +.B mirred redirect +action to a shared IP tunnel device which will use the metadata (for +.I SET +) and unset the metadata created by it (for +.I UNSET +). + +.SH OPTIONS +.TP +.B unset +Unset the tunnel metadata created by the IP tunnel device. This function is +not mandatory and might be used only in some specific use cases (as explained +above). +.TP +.B set +Set tunnel metadata to be used by the IP tunnel device. Requires +.B id +, +.B src_ip +and +.B dst_ip +options. +.RS +.TP +.B id +Tunnel ID (for example VNI in VXLAN tunnel) +.TP +.B src_ip +Outer header source IP address (IPv4 or IPv6) +.TP +.B dst_ip +Outer header destination IP address (IPv4 or IPv6) +.RE +.SH EXAMPLES +The following example encapsulates incoming ICMP packets on eth0 into a vxlan +tunnel, by setting metadata to VNI 11, source IP 11.11.0.1 and destination IP +11.11.0.2, and by redirecting the packet with the metadata to device vxlan0, +which will do the actual encaps
[PATCH iproute2 V5 2/3] tc/cls_flower: Classify packet in ip tunnels
Introduce classifying by metadata extracted by the tunnel device.
Outer header fields (source/dest ip and tunnel id) are extracted from
the metadata when classifying.

For example, the following will add a filter on the ingress Qdisc of the
shared vxlan device named 'vxlan0', to forward packets with outer src ip
11.11.0.2, dst ip 11.11.0.1 and tunnel id 11. The packets will be
forwarded to tap device 'vnet0':

$ tc filter add dev vxlan0 protocol ip parent : \
    flower \
      enc_src_ip 11.11.0.2 \
      enc_dst_ip 11.11.0.1 \
      enc_key_id 11 \
      dst_ip 11.11.11.1 \
    action mirred egress redirect dev vnet0

Signed-off-by: Amir Vadai
---
 man/man8/tc-flower.8 | 17 ++-
 tc/f_flower.c        | 82 ++--
 2 files changed, 95 insertions(+), 4 deletions(-)

diff --git a/man/man8/tc-flower.8 b/man/man8/tc-flower.8
index 16ef261797ab..dd3564917dcc 100644
--- a/man/man8/tc-flower.8
+++ b/man/man8/tc-flower.8
@@ -34,7 +34,11 @@ flower \- flow based traffic control filter
 .BR dst_ip " | " src_ip " } { "
 .IR ipv4_address " | " ipv6_address " } | { "
 .BR dst_port " | " src_port " } "
-.IR port_number " }"
+.IR port_number " } | "
+.B enc_key_id
+.IR KEY-ID " | {"
+.BR enc_dst_ip " | " enc_src_ip " } { "
+.IR ipv4_address " | " ipv6_address " } | "
 .SH DESCRIPTION
 The
 .B flower
@@ -112,6 +116,17 @@ which has to be specified in beforehand.
 Match on layer 4 protocol source or destination port number. Only available for
 .BR ip_proto " values " udp " and " tcp ,
 which has to be specified in beforehand.
+.TP
+.BI enc_key_id " NUMBER"
+.TQ
+.BI enc_dst_ip " ADDRESS"
+.TQ
+.BI enc_src_ip " ADDRESS"
+Match on IP tunnel metadata. Key id
+.I NUMBER
+is a 32 bit tunnel key id (e.g. VNI for VXLAN tunnel).
+.I ADDRESS
+must be a valid IPv4 or IPv6 address.
 .SH NOTES
 As stated above where applicable, matches of a certain layer implicitly depend
 on the matches of the next lower layer.
Precisely, layer one and two matches ( diff --git a/tc/f_flower.c b/tc/f_flower.c index e132974e0d1d..7e7f4c92a947 100644 --- a/tc/f_flower.c +++ b/tc/f_flower.c @@ -41,7 +41,10 @@ static void explain(void) " dst_ip [ IPV4-ADDR | IPV6-ADDR ] |\n" " src_ip [ IPV4-ADDR | IPV6-ADDR ] |\n" " dst_port PORT-NUMBER |\n" - " src_port PORT-NUMBER }\n" + " src_port PORT-NUMBER |\n" + " enc_dst_ip [ IPV4-ADDR | IPV6-ADDR ] |\n" + " enc_src_ip [ IPV4-ADDR | IPV6-ADDR ] |\n" + " enc_key_id [ KEY-ID ] }\n" " FILTERID := X:Y:Z\n" " ACTION-SPEC := ... look at individual actions\n" "\n" @@ -125,8 +128,9 @@ static int flower_parse_ip_addr(char *str, __be16 eth_type, family = AF_INET; } else if (eth_type == htons(ETH_P_IPV6)) { family = AF_INET6; + } else if (!eth_type) { + family = AF_UNSPEC; } else { - fprintf(stderr, "Illegal \"eth_type\" for ip address\n"); return -1; } @@ -134,8 +138,10 @@ static int flower_parse_ip_addr(char *str, __be16 eth_type, if (ret) return -1; - if (addr.family != family) + if (family && (addr.family != family)) { + fprintf(stderr, "Illegal \"eth_type\" for ip address\n"); return -1; + } addattr_l(n, MAX_MSG, addr.family == AF_INET ? 
addr4_type : addr6_type, addr.data, addr.bytelen); @@ -197,6 +203,18 @@ static int flower_parse_port(char *str, __u8 ip_port, bool is_src, return 0; } +static int flower_parse_key_id(const char *str, int type, struct nlmsghdr *n) +{ + int ret; + __be32 key_id; + + ret = get_be32(&key_id, str, 10); + if (!ret) + addattr32(n, MAX_MSG, type, key_id); + + return ret; +} + static int flower_parse_opt(struct filter_util *qu, char *handle, int argc, char **argv, struct nlmsghdr *n) { @@ -354,6 +372,38 @@ static int flower_parse_opt(struct filter_util *qu, char *handle, fprintf(stderr, "Illegal \"src_port\"\n"); return -1; } + } else if (matches(*argv, "enc_dst_ip") == 0) { + NEXT_ARG(); + ret = flower_parse_ip_addr(*argv, 0, + TCA_FLOWER_KEY_ENC_IPV4_DST, + TCA_FLOWER_KEY_ENC_IPV4_DST_MASK, + TCA_FLOWER_KEY_ENC_IPV6_DST, + TCA_FLOWER_KEY_ENC_IPV6_DST_MASK, +