[PATCH net v3.16] r8169: Increase no descriptors on max.
This patch increases the rx/tx rings to the maximum allowed 1024
4-double-word descriptors.

Signed-off-by: Corcodel Marian
---
 drivers/net/ethernet/realtek/r8169.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index e215812..5fd3fca 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -84,8 +84,8 @@ static const int multicast_filter_limit = 32;
 #define R8169_REGS_SIZE         256
 #define R8169_NAPI_WEIGHT       64
-#define NUM_TX_DESC     64      /* Number of Tx descriptor registers */
-#define NUM_RX_DESC     256U    /* Number of Rx descriptor registers */
+#define NUM_TX_DESC     1024    /* Number of Tx descriptor registers */
+#define NUM_RX_DESC     1024U   /* Number of Rx descriptor registers */
 #define R8169_TX_RING_BYTES     (NUM_TX_DESC * sizeof(struct TxDesc))
 #define R8169_RX_RING_BYTES     (NUM_RX_DESC * sizeof(struct RxDesc))
-- 
2.1.4
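For a rough feel of what the change costs in ring memory, here is a minimal user-space sketch. It assumes a 16-byte descriptor (four 32-bit words, following the "4-double-word" wording above); struct desc_model and ring_bytes() are illustrative names, not driver code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed descriptor layout: four 32-bit words, 16 bytes in total. */
struct desc_model {
	uint32_t opts1;
	uint32_t opts2;
	uint64_t addr;
};

/* Bytes of ring memory needed for num_desc descriptors. */
static size_t ring_bytes(size_t num_desc)
{
	assert(num_desc > 0);
	return num_desc * sizeof(struct desc_model);
}
```

Under that assumption the Tx ring grows from 1 KiB (64 descriptors) to 16 KiB, and the Rx ring from 4 KiB (256 descriptors) to 16 KiB.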
[net-next] arp: correct return value of arp_rcv
Currently, arp_rcv() always returns zero on a packet delivery upcall.

To make its behavior more compliant with the way this API should be
used, this patch changes it to return NET_RX_SUCCESS when the packet
is properly handled, and NET_RX_DROP otherwise.

Signed-off-by: Zhang Shengju
---
 net/ipv4/arp.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
index c102eb5..ae235a1 100644
--- a/net/ipv4/arp.c
+++ b/net/ipv4/arp.c
@@ -880,7 +880,7 @@ out:
         consume_skb(skb);
 out_free_dst:
         dst_release(reply_dst);
-        return 0;
+        return NET_RX_SUCCESS;
 }

 static void parp_redo(struct sk_buff *skb)
@@ -924,11 +924,11 @@ static int arp_rcv(struct sk_buff *skb, struct net_device *dev,

 consumeskb:
         consume_skb(skb);
-        return 0;
+        return NET_RX_SUCCESS;
 freeskb:
         kfree_skb(skb);
 out_of_mem:
-        return 0;
+        return NET_RX_DROP;
 }

 /*
-- 
1.8.3.1
pull request: bluetooth-next 2016-03-01
Hi Dave,

Here's our main set of Bluetooth & 802.15.4 patches for the 4.6 kernel.

 - New Bluetooth HCI driver for Intel/AG6xx controllers
 - New Broadcom ACPI IDs
 - LED trigger support for indicating Bluetooth powered state
 - Various fixes in mac802154, 6lowpan and related drivers
 - New USB IDs for AR3012 Bluetooth controllers

Please let me know if there are any issues pulling. Thanks.

Johan

---
The following changes since commit a30a9ea6e21b495372aff549f3dfd63198bd1f45:

  rocker: fix rocker_world_port_obj_vlan_add() (2016-02-23 13:12:31 -0500)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next.git for-upstream

for you to fetch changes up to 34bf1912bfc06bd9200893916078eb0f16480a95:

  Bluetooth: hci_uart: Add diag and address support for Intel/AG6xx (2016-02-29 19:25:22 +0200)

Alexander Aring (9):
      MAINTAINERS: update 802.15.4 entries
      mac802154: fix mac header length check
      at86rf230: fix race on error handling
      at86rf230: fix state change handling on error
      mrf24j40: add writeable missing reg
      6lowpan: iphc: add support for stateful compression
      ieee802154: 6lowpan: fix return of netdev notifier
      6lowpan: iphc: fix stateful multicast compression
      6lowpan: iphc: fix invalid case handling

Andrzej Hajda (1):
      6lowpan: fix error checking code

Anton Protopopov (1):
      Bluetooth: hci_intel: Fix a wrong comparison

Bhumika Goyal (1):
      Bluetooth: ath3k: Fixed a blank line after declaration issue

Dmitry Tunin (3):
      Bluetooth: btusb: Add new AR3012 ID 13d3:3395
      Bluetooth: Add new AR3012 ID 0489:e095
      Bluetooth: btusb: Add a new AR3012 ID 04ca:3014

Heiner Kallweit (2):
      Bluetooth: add LED trigger for indicating HCI is powered up
      Bluetooth: Use managed version of led_trigger_register in LED trigger

J.J. Meijer (1):
      Bluetooth: hci_bcm: Add new ACPI ID for bcm43241

Koen Zandberg (1):
      mac802154: Fixes kernel oops when unloading a radio driver

Loic Poulain (1):
      Bluetooth: hci_uart: Add Intel/AG6xx support

Marcel Holtmann (1):
      Bluetooth: hci_uart: Add diag and address support for Intel/AG6xx

Mika Westerberg (1):
      Bluetooth: hci_bcm: Add BCM2E7C ACPI ID

Petri Gynther (1):
      Bluetooth: btbcm: Fix handling of firmware not found

Wei-Ning Huang (1):
      Bluetooth: hci_core: cancel power off delayed work properly

 MAINTAINERS                        |   9 +-
 drivers/bluetooth/Kconfig          |  11 +
 drivers/bluetooth/Makefile         |   1 +
 drivers/bluetooth/ath3k.c          |   7 +
 drivers/bluetooth/btbcm.c          |   3 +-
 drivers/bluetooth/btusb.c          |   3 +
 drivers/bluetooth/hci_ag6xx.c      | 337 +
 drivers/bluetooth/hci_bcm.c        |   2 +
 drivers/bluetooth/hci_intel.c      |   4 +-
 drivers/bluetooth/hci_ldisc.c      |   6 +
 drivers/bluetooth/hci_uart.h       |   8 +-
 drivers/net/ieee802154/at86rf230.c |  25 ++-
 drivers/net/ieee802154/mrf24j40.c  |   1 +
 include/net/6lowpan.h              |  32 +++
 include/net/bluetooth/hci_core.h   |   3 +
 include/net/mac802154.h            |   5 +-
 net/6lowpan/core.c                 |  39 +++-
 net/6lowpan/debugfs.c              | 247 +
 net/6lowpan/iphc.c                 | 413 +++-
 net/bluetooth/Kconfig              |   9 +
 net/bluetooth/Makefile             |   1 +
 net/bluetooth/hci_core.c           |   7 +
 net/bluetooth/leds.c               |  74 +++
 net/bluetooth/leds.h               |  16 ++
 net/ieee802154/6lowpan/core.c      |   7 +-
 net/mac802154/main.c               |   2 +-
 26 files changed, 1192 insertions(+), 80 deletions(-)
 create mode 100644 drivers/bluetooth/hci_ag6xx.c
 create mode 100644 net/bluetooth/leds.c
 create mode 100644 net/bluetooth/leds.h
Re: [Intel-wired-lan] [next] igb: allow setting MAC address on i211 using a device tree blob V5
On Mar 1, 2016, at 03:52, Brown, Aaron F wrote:

> This throws a few checkpatch warnings, but I won't withhold my tested by for
> these:
>
> total: 0 errors, 2 warnings, 0 checks, 21 lines checked
>
> Your patch has style problems, please review.
>
> NOTE: If any of the errors are false positives, please report
> them to the maintainer, see CHECKPATCH in MAINTAINERS.
> u1463:[0]/usr/src/kernels/next-queue>

Thanks for testing...

Do you require me to reformat the patch text? And won't that break the link?

John
Re: [PATCH] [BACKPORT] [3.14.56] bnx2x: Don't notify about scratchpad parities
On Thu, Dec 10, 2015 at 02:37:34PM +0100, Patrick Schaaf wrote:
> On Friday 06 November 2015 09:32:46 Greg KH wrote:
> > On Thu, Nov 05, 2015 at 11:18:37AM +0100, Patrick Schaaf wrote:
> > > bnx2x: Don't notify about scratchpad parities
> > >
> > > This is a (trivial) "backport" of ad6afbe9578d1fa26680faf78c846bd8c00d1d6e
> > > to stable kernel 3.14.56.
> >
> > This patch isn't in 4.1 either, do you want it there as well?
>
> Hi Greg,
>
> I didn't see the patch in 3.14.57 or 3.14.58 - could you please consider it
> again (for all stable kernels that don't have it)?
>
> My three machines with bnx2x interfaces have been running fine with patch
> 3.14.56, for the last 35 days. The original problematic event (spewing a
> million messages which are suppressed by that patch), did not reoccur so far
> (neither did any other issue, dmesg is completely empty since boot).
>
> best regards
>   Patrick
>
> Related earlier posts / reports, for reference:
>
> http://marc.info/?l=linux-netdev&m=144663711626469
> http://lists.openwall.net/netdev/2015/11/05/48

Sorry for the long delay, now queued up.

greg k-h
Re: linux-next: manual merge of the target-merge tree with the net-next tree
Hi Nicholas,

On Mon, 29 Feb 2016 21:39:33 -0800 "Nicholas A. Bellinger" wrote:
>
> I'll include a note to Linus in target-pending/for-next-merge PULL
> request, and will plan to wait until after DaveM's net-next is merged
> for v4.6-rc0.

The order doesn't really matter and Linus is cleverer than I am :-)

-- 
Cheers,
Stephen Rothwell
Re: linux-next: manual merge of the target-merge tree with the net-next tree
On Mon, 2016-02-29 at 17:39 +1100, Stephen Rothwell wrote:
> Hi Nicholas,
>
> Today's linux-next merge of the target-merge tree got a conflict in:
>
>   drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h
>
> between commit:
>
>   ba9cee6aa67d ("cxgb4/iw_cxgb4: TOS support")
>
> from the net-next tree and commit:
>
>   c973e2a3ff1b ("cxgb4: add definitions for iSCSI target ULD")
>
> from the target-merge tree.
>
> I fixed it up (the latter was a superset of the former) and can carry
> the fix as necessary (no action is required).
>

Thanks Stephen.

I'll include a note to Linus in target-pending/for-next-merge PULL
request, and will plan to wait until after DaveM's net-next is merged
for v4.6-rc0.
Re: [PATCH] fsl/fman: remove dTSEC-A003 Errata workaround
On 02/29/2016 09:17 AM, igal.liber...@freescale.com wrote:
> From: Igal Liberman
>
> Errata dTSEC-A003 was fixed in P4080 rev 3.0.
> Prior revisions are not supported, so the workaround can be removed.
>
> Signed-off-by: Igal Liberman

Since when do we not support p4080 rev 2?

-Scott
RE: [Intel-wired-lan] [next] igb: allow setting MAC address on i211 using a device tree blob V5
> From: netdev-ow...@vger.kernel.org [mailto:netdev-
> ow...@vger.kernel.org] On Behalf Of John Holland
> Sent: Thursday, February 18, 2016 3:11 AM
> To: intel-wired-...@lists.osuosl.org; netdev@vger.kernel.org
> Subject: [Intel-wired-lan] [next] igb: allow setting MAC address on i211 using
> a device tree blob V5
>
> Hello,
>
> The Intel i211 LOM PCIe Ethernet controllers' iNVM operates as an OTP and
> has no external EEPROM interface [1]. The following allows the driver to
> pickup the MAC address from a device tree blob when CONFIG_OF has been
> enabled.
>
> [1]
> http://www.intel.com/content/www/us/en/embedded/products/networkin
> g/i211-ethernet-controller-datasheet.html
>
> Changes V2
> - Restrict searching for compatible devices to current pci device.
>
> Changes V3
> - Add device tree binding documentation.
>
> Changes V4
> - Rebase patch.
>
> Changes V5
> - Use eth_platform_get_mac_address() to resolve MAC specified in a dtb.
> - Remove now invalid device tree binding documentation specified in V3
>   and V4.
>
> Signed-off-by: John Holland
> ---
>  drivers/net/ethernet/intel/igb/igb_main.c | 9 ++---
>  1 file changed, 6 insertions(+), 3 deletions(-)

This throws a few checkpatch warnings, but I won't withhold my tested by for
these:

-
u1463:[0]/usr/src/kernels/next-queue> git format-patch $item -1 --stdout|./scripts/checkpatch.pl -
WARNING: Possible unwrapped commit description (prefer a maximum 75 chars per line)
#14:
http://www.intel.com/content/www/us/en/embedded/products/networking/i211-ethernet-controller-datasheet.html

WARNING: email address 'John Holland' might be better as 'John Holland '
#30:
Signed-off-by: John Holland

total: 0 errors, 2 warnings, 0 checks, 21 lines checked

Your patch has style problems, please review.

NOTE: If any of the errors are false positives, please report
      them to the maintainer, see CHECKPATCH in MAINTAINERS.
u1463:[0]/usr/src/kernels/next-queue>
-

I do not seem to have hardware that uses device tree, so my testing is
relegated to regression tests with my existing set of chipsets.

Tested-by: Aaron Brown
RE: [Intel-wired-lan] [PATCH] igb: Garbled output for "ethtool -m"
> From: Intel-wired-lan [intel-wired-lan-boun...@lists.osuosl.org] on behalf of
> Doron Shikmoni [doron.shikm...@gmail.com]
> Sent: Tuesday, February 16, 2016 11:34 PM
> To: intel-wired-...@lists.osuosl.org
> Cc: netdev@vger.kernel.org
> Subject: [Intel-wired-lan] [PATCH] igb: Garbled output for "ethtool -m"
>
> Hello,
>
> Garbled output for "ethtool -m ethX", in igb-driven NICs with module /
> plugin EEPROM (i.e. SFP information). Each output data byte appears
> duplicated.
>
> In igb_ethtool.c, igb_get_module_eeprom() is reading the EEPROM via i2c;
> the eeprom offset for each word that's read via igb_read_phy_reg_i2c()
> was passed in #words, whereas it needs to be a byte offset.
> This patch fixes the bug.
>
> Signed-off-by: Doron Shikmoni
> ---
>  drivers/net/ethernet/intel/igb/igb_ethtool.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Checkpatch complains that you pushed the line over 80 characters:

u1463:[0]/usr/src/kernels/next-queue> git format-patch $item -1 --stdout|./scripts/checkpatch.pl -
WARNING: line over 80 characters
#29: FILE: drivers/net/ethernet/intel/igb/igb_ethtool.c:3010:
+               status = igb_read_phy_reg_i2c(hw, (first_word + i) * 2, &dataword[i]);

total: 0 errors, 1 warnings, 0 checks, 8 lines checked

Your patch has style problems, please review.

NOTE: If any of the errors are false positives, please report
      them to the maintainer, see CHECKPATCH in MAINTAINERS.
u1463:[0]/usr/src/kernels/next-queue>

But functionally seems good. I'll let Jeff choose whether to be a stickler
for the warning or not, so...

Tested-by: Aaron Brown
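The word-versus-byte offset mix-up described in the patch can be reproduced in a small user-space model. This is only a sketch: read_word_at() is a hypothetical stand-in for the 16-bit I2C read (it is not igb_read_phy_reg_i2c()), and the big-endian packing is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a 16-bit read at a given byte offset. */
static uint16_t read_word_at(const uint8_t *eeprom, size_t len, size_t byte_off)
{
	assert(byte_off + 1 < len);
	return (uint16_t)((eeprom[byte_off] << 8) | eeprom[byte_off + 1]);
}

/*
 * Read nwords 16-bit words starting at word index first_word.
 * scale == 1 models the bug (word index used as a byte offset),
 * scale == 2 models the fix (word index converted to a byte offset).
 */
static void read_words(const uint8_t *eeprom, size_t len, size_t first_word,
		       size_t nwords, size_t scale, uint16_t *out)
{
	for (size_t i = 0; i < nwords; i++)
		out[i] = read_word_at(eeprom, len, (first_word + i) * scale);
}
```

With scale == 1, consecutive reads overlap by one byte, so every byte except the first shows up twice in the output, which matches the "each output data byte appears duplicated" symptom in the report.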
Re: [PATCH] asm-generic: remove old nonatomic-io wrapper files
On 2016/2/26 22:29, Arnd Bergmann wrote:
> The two header files got moved to include/linux, and most
> users were already converted, this changes the remaining drivers
> and removes the files.
>
> Signed-off-by: Arnd Bergmann
> ---
>  drivers/dma/idma64.h                                | 2 +-
>  drivers/net/ethernet/hisilicon/hns/hns_dsaf_xgmac.c | 2 +-

The HNS portion:

Acked-by: Yisen Zhuang

>  drivers/net/ethernet/netronome/nfp/nfp_net.h        | 2 +-
>  include/asm-generic/io-64-nonatomic-hi-lo.h         | 2 --
>  include/asm-generic/io-64-nonatomic-lo-hi.h         | 2 --
>  5 files changed, 3 insertions(+), 7 deletions(-)
>  delete mode 100644 include/asm-generic/io-64-nonatomic-hi-lo.h
>  delete mode 100644 include/asm-generic/io-64-nonatomic-lo-hi.h
>
> diff --git a/drivers/dma/idma64.h b/drivers/dma/idma64.h
> index 8423f13ed0da..a52ad6bcf86a 100644
> --- a/drivers/dma/idma64.h
> +++ b/drivers/dma/idma64.h
> @@ -16,7 +16,7 @@
>  #include
>  #include
>
> -#include
> +#include
>
>  #include "virt-dma.h"
>
> diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_xgmac.c
> b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_xgmac.c
> index 802d55457f19..fd90f3737963 100644
> --- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_xgmac.c
> +++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_xgmac.c
> @@ -7,7 +7,7 @@
>   * (at your option) any later version.
>   */
>
> -#include
> +#include
>  #include
>  #include "hns_dsaf_main.h"
>  #include "hns_dsaf_mac.h"
> diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net.h
> b/drivers/net/ethernet/netronome/nfp/nfp_net.h
> index ab264e1bccd0..75683fb26734 100644
> --- a/drivers/net/ethernet/netronome/nfp/nfp_net.h
> +++ b/drivers/net/ethernet/netronome/nfp/nfp_net.h
> @@ -45,7 +45,7 @@
>  #include
>  #include
>  #include
> -#include
> +#include
>
>  #include "nfp_net_ctrl.h"
>
> diff --git a/include/asm-generic/io-64-nonatomic-hi-lo.h
> b/include/asm-generic/io-64-nonatomic-hi-lo.h
> deleted file mode 100644
> index 32b73abce1b0..
> --- a/include/asm-generic/io-64-nonatomic-hi-lo.h
> +++ /dev/null
> @@ -1,2 +0,0 @@
> -/* XXX: delete asm-generic/io-64-nonatomic-hi-lo.h after converting new
> users */
> -#include
> diff --git a/include/asm-generic/io-64-nonatomic-lo-hi.h
> b/include/asm-generic/io-64-nonatomic-lo-hi.h
> deleted file mode 100644
> index 55a627c37721..
> --- a/include/asm-generic/io-64-nonatomic-lo-hi.h
> +++ /dev/null
> @@ -1,2 +0,0 @@
> -/* XXX: delete asm-generic/io-64-nonatomic-lo-hi.h after converting new
> users */
> -#include
>
Re: [PATCH] asm-generic: remove old nonatomic-io wrapper files
On Fri, Feb 26, 2016 at 03:29:05PM +0100, Arnd Bergmann wrote:
> The two header files got moved to include/linux, and most
> users were already converted, this changes the remaining drivers
> and removes the files.
>
> Signed-off-by: Arnd Bergmann
> ---
>  drivers/dma/idma64.h                                | 2 +-
>  drivers/net/ethernet/hisilicon/hns/hns_dsaf_xgmac.c | 2 +-
>  drivers/net/ethernet/netronome/nfp/nfp_net.h        | 2 +-

The NFP portion:

Acked-by: Simon Horman

>  include/asm-generic/io-64-nonatomic-hi-lo.h         | 2 --
>  include/asm-generic/io-64-nonatomic-lo-hi.h         | 2 --
>  5 files changed, 3 insertions(+), 7 deletions(-)
>  delete mode 100644 include/asm-generic/io-64-nonatomic-hi-lo.h
>  delete mode 100644 include/asm-generic/io-64-nonatomic-lo-hi.h
Re: [net-next PATCH v3 1/3] net: sched: consolidate offload decision in cls_u32
On 16-02-29 01:25 PM, Cong Wang wrote:
> On Mon, Feb 29, 2016 at 10:58 AM, Jiri Pirko wrote:
>> Mon, Feb 29, 2016 at 07:40:53PM CET, john.fastab...@gmail.com wrote:
>>> On 16-02-27 08:28 PM, Cong Wang wrote:
>>>> On Fri, Feb 26, 2016 at 8:24 PM, John Fastabend wrote:
>>>>> On 16-02-26 09:39 AM, Cong Wang wrote:
>>>>>> On Fri, Feb 26, 2016 at 7:53 AM, John Fastabend
>>>>>> wrote:
>>>>>>> diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
>>>>>>> index 2121df5..e64d20b 100644
>>>>>>> --- a/include/net/pkt_cls.h
>>>>>>> +++ b/include/net/pkt_cls.h
>>>>>>> @@ -392,4 +392,9 @@ struct tc_cls_u32_offload {
>>>>>>>         };
>>>>>>>  };
>>>>>>>
>>>>>>> +static inline bool tc_should_offload(struct net_device *dev)
>>>>>>> +{
>>>>>>> +       return dev->netdev_ops->ndo_setup_tc;
>>>>>>> +}
>>>>>>> +
>>>>>>
>>>>>> These should be protected by CONFIG_NET_CLS_U32, no?
>>>>>>
>>>>>
>>>>> It's not necessary; it is a completely general function and I only
>>>>> lifted it out of cls_u32 so that the cls_flower classifier could
>>>>> also use it.
>>>>>
>>>>> I don't see the need off-hand to have it wrapped in an ORd ifdef
>>>>> statement where it's (CONFIG_NET_CLS_U32 | CONFIG_NET_CLS_X ...).
>>>>> Any particular reason you were thinking it should be wrapped in ifdefs?
>>>>>
>>>>
>>>> Not a big deal. I just feel these don't need to compile when
>>>> I have CONFIG_NET_CLS_U32=n.
>>>>
>>>> Thanks.
>>>
>>> Well because this is 'static inline' gcc should just remove it
>>> if it is not used. Assuming non-ancient gcc and normal compile
>>> flags, e.g. you are not including -fkeep-inline-functions or
>>> something.
>>>
>>> So just to keep it readable I would prefer to just leave it
>>> as is.
>>
>> Definitely. cls_flower will use it in very near future. Making it
>> dependent on CONFIG_NET_CLS_U32 makes 0 sense to me.
>
> Oh, why then do you have u32 in the struct name tc_cls_u32_offload?
>
> (Note that in the above I said "these" not "this", so I never only refer
> to tc_should_offload)
>

hmm yeah that likely won't be needed by flower although it could
be used. I still think it's best to leave this as is; there doesn't
seem to be a very strong precedent to wrap any of the other
structs/fields/etc in pkt_cls.h into their respective ifdef/endif
blocks. And I think it starts to get a bit much if we do.

I'm trusting gcc here can do the right thing when these are included
but never used.

Thanks,
John
RE: [PATCH net-next 0/4] lan78xx: driver update
> > This patch series add new ethtool functions of set_pauseparam &
> > get_pauseparam and MAINTAINERS entry.
>
> Series applied, thanks.

Thanks.

> Please fix your configuration such that your proper name appears in the
> "From: " field of your outgoing emails. That is what ends up in the
> Author field of every GIT commit. And right now only your email address
> appears there.

I'll contact our IT department to find out whether there is a way to do
that. Just in case, can I send from another email account such as gmail,
with the "Signed-off-by" still using my company email address?
Re: Softirq priority inversion from "softirq: reduce latencies"
On 02/29/2016 11:14 AM, Thomas Gleixner wrote:
> On Mon, 29 Feb 2016, Peter Hurley wrote:
>> On 02/29/2016 10:24 AM, Eric Dumazet wrote:
>>>> Just to be clear
>>>>
>>>>         if (time_before(jiffies, end) && !need_resched() &&
>>>>             --max_restart)
>>>>                 goto restart;
>>>>
>>>> aborts softirq *even if 0ns have elapsed*, if NET_RX has woken a process.
>>>
>>> Sure, now remove the 1st and 2nd condition.
>>
>> Well just removing the 2nd condition has everything working fine,
>> because that fixes the priority inversion.
>
> No. It does not fix anything. It hides the shortcomings of the driver.
>
>> However, when system resources are _not_ contended, it makes no
>> sense to be forced to revert to ksoftirqd resolution, which is strictly
>> intended as fallback.
>
> No. You claim it is simply because your driver does not handle that situation
> properly.
>
>> Or flipping your argument on its head, why not just _always_ execute
>> softirq in ksoftirqd?
>
> Which is what that change effectively does. And that makes a lot of sense,
> because you get the softirq load under scheduler control and do not let the
> softirq run as a context stealing entity which is completely uncontrollable by
> the scheduler.

Ok, fair enough.

However, charging [in the scheduler sense] very lightweight DMA completion
for one subsystem collectively with very heavyweight NET_RX (doing garbage
collection in softirq!) is hardly ideal.

The alternative being threaded interrupt handlers (which are essentially
treated as 0.00 scheduler cost).

I just want to make sure that's the conscious choice being made, when
the patches for converting from tasklet to threaded irq start hitting
subsystem maintainers.

Regards,
Peter Hurley
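The restart condition quoted in this thread can be modelled in user space to see the behavior being debated. This is only a sketch: before() and should_restart() are illustrative names, and jiffies and need_resched() are replaced by plain parameters.

```c
#include <assert.h>
#include <stdbool.h>

/* Wrap-safe "a is before b", the same idea as the kernel's time_before(). */
static bool before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/*
 * Model of the quoted check: keep processing softirqs in place only if
 * the time budget is not used up, no task needs the CPU, and restarts
 * remain. Otherwise the remaining work is left to ksoftirqd.
 */
static bool should_restart(unsigned long now, unsigned long end,
			   bool need_resched, int *max_restart)
{
	assert(*max_restart > 0);
	return before(now, end) && !need_resched && --(*max_restart) != 0;
}
```

Note that the need_resched term alone makes the function return false, regardless of how much of the time budget is left. That is the "even if 0ns have elapsed" case raised above.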
[PATCH net v2] mld, igmp: Fix reserved tailroom calculation
The current reserved_tailroom calculation fails to take hlen and tlen into
account.

skb:
[__hlen__|__data|__tlen___|__extra__]
^                                   ^
head                                skb_end_offset

In this representation, hlen + data + tlen is the size passed to alloc_skb.
"extra" is the extra space made available in __alloc_skb because of
rounding up by kmalloc. We can reorder the representation like so:

[__hlen__|__data|__extra__|__tlen___]
^                                   ^
head                                skb_end_offset

The maximum space available for ip headers and payload without
fragmentation is min(mtu, data + extra). Therefore,

reserved_tailroom
= data + extra + tlen - min(mtu, data + extra)
= skb_end_offset - hlen - min(mtu, skb_end_offset - hlen - tlen)
= skb_tailroom - min(mtu, skb_tailroom - tlen) ; after skb_reserve(hlen)

Compare the second line to the current expression:

reserved_tailroom = skb_end_offset - min(mtu, skb_end_offset)

and we can see that hlen and tlen are not taken into account.

The min() in the third line can be expanded into:

if mtu < skb_tailroom - tlen:
        reserved_tailroom = skb_tailroom - mtu
else:
        reserved_tailroom = tlen

Depending on hlen, tlen, mtu and the number of multicast address records,
the current code may output skbs that have less tailroom than
dev->needed_tailroom or it may output more skbs than needed because not all
space available is used.

Fixes: 4c672e4b ("ipv6: mld: fix add_grhead skb_over_panic for devs with large MTUs")
Signed-off-by: Benjamin Poirier
---

Notes:
    Changes v1->v2
    As suggested by Hannes, move the code to an inline helper and express it
    using "if" rather than "min".

 include/linux/skbuff.h | 24 ++++++++++++++++++++++++
 net/ipv4/igmp.c        |  3 +--
 net/ipv6/mcast.c       |  3 +--
 3 files changed, 26 insertions(+), 4 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 4ce9ff7..d3fcd45 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1985,6 +1985,30 @@ static inline void skb_reserve(struct sk_buff *skb, int len)
         skb->tail += len;
 }

+/**
+ *      skb_tailroom_reserve - adjust reserved_tailroom
+ *      @skb: buffer to alter
+ *      @mtu: maximum amount of headlen permitted
+ *      @needed_tailroom: minimum amount of reserved_tailroom
+ *
+ *      Set reserved_tailroom so that headlen can be as large as possible but
+ *      not larger than mtu and tailroom cannot be smaller than
+ *      needed_tailroom.
+ *      The required headroom should already have been reserved before using
+ *      this function.
+ */
+static inline void skb_tailroom_reserve(struct sk_buff *skb, unsigned int mtu,
+                                        unsigned int needed_tailroom)
+{
+        SKB_LINEAR_ASSERT(skb);
+        if (mtu < skb_tailroom(skb) - needed_tailroom)
+                /* use at most mtu */
+                skb->reserved_tailroom = skb_tailroom(skb) - mtu;
+        else
+                /* use up to all available space */
+                skb->reserved_tailroom = needed_tailroom;
+}
+
 #define ENCAP_TYPE_ETHER        0
 #define ENCAP_TYPE_IPPROTO      1

diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c
index 05e4cba..b3086cf 100644
--- a/net/ipv4/igmp.c
+++ b/net/ipv4/igmp.c
@@ -356,9 +356,8 @@ static struct sk_buff *igmpv3_newpack(struct net_device *dev, unsigned int mtu)
         skb_dst_set(skb, &rt->dst);
         skb->dev = dev;

-        skb->reserved_tailroom = skb_end_offset(skb) -
-                                 min(mtu, skb_end_offset(skb));
         skb_reserve(skb, hlen);
+        skb_tailroom_reserve(skb, mtu, tlen);

         skb_reset_network_header(skb);
         pip = ip_hdr(skb);
diff --git a/net/ipv6/mcast.c b/net/ipv6/mcast.c
index 5ee56d0..d64ee7e 100644
--- a/net/ipv6/mcast.c
+++ b/net/ipv6/mcast.c
@@ -1574,9 +1574,8 @@ static struct sk_buff *mld_newpack(struct inet6_dev *idev, unsigned int mtu)
                 return NULL;

         skb->priority = TC_PRIO_CONTROL;
-        skb->reserved_tailroom = skb_end_offset(skb) -
-                                 min(mtu, skb_end_offset(skb));
         skb_reserve(skb, hlen);
+        skb_tailroom_reserve(skb, mtu, tlen);

         if (__ipv6_get_lladdr(idev, &addr_buf, IFA_F_TENTATIVE)) {
                 /* :
-- 
2.7.0
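To make the two failure modes in the commit message concrete, here is a minimal user-space sketch of the old and new expressions. It works on plain integers rather than a struct sk_buff; reserved_old() and reserved_new() are illustrative names, and the sizes used below are made-up example values.

```c
#include <assert.h>

/* Old expression: computed from skb_end_offset only, so hlen and tlen
 * never enter the calculation. */
static unsigned int reserved_old(unsigned int end_offset, unsigned int mtu)
{
	unsigned int m = mtu < end_offset ? mtu : end_offset;

	return end_offset - m;
}

/* New expression, as in the patch's helper: tailroom is the space left
 * after the headroom (hlen) has already been reserved. */
static unsigned int reserved_new(unsigned int tailroom, unsigned int mtu,
				 unsigned int needed_tailroom)
{
	assert(tailroom >= needed_tailroom);
	if (mtu < tailroom - needed_tailroom)
		return tailroom - mtu;	/* use at most mtu */
	return needed_tailroom;		/* use up to all available space */
}
```

Taking hlen = 16, tlen = 8 and mtu = 1500 as example values: with skb_end_offset = 512 the old expression reserves 0 bytes, which is less than tlen, while the new one reserves 8. With skb_end_offset = 2048 the old expression reserves 548 bytes, leaving 2032 - 548 = 1484 bytes for data, while the new one reserves 532 and leaves the full 1500.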
Re: [PATCH net-next] hv_netvsc: add ethtool support for set and get of settings
From: Ben Hutchings
Date: Mon, 29 Feb 2016 22:34:38 +

> On Mon, 2016-02-29 at 17:09 -0500, David Miller wrote:
>> From: Simon Xiao
>> Date: Thu, 25 Feb 2016 15:24:08 -0800
>>
>> > This patch allows the user to set and retrieve speed and duplex of the
>> > hv_netvsc device via ethtool.
>> >
>> > Example:
>> > $ ethtool eth0
>> > Settings for eth0:
>> > ...
>> >     Speed: Unknown!
>> >     Duplex: Unknown! (255)
>> > ...
>> > $ ethtool -s eth0 speed 1000 duplex full
>> > $ ethtool eth0
>> > Settings for eth0:
>> > ...
>> >     Speed: 1000Mb/s
>> >     Duplex: Full
>> > ...
>> >
>> > This is based on patches by Roopa Prabhu and Nikolay Aleksandrov.
>> >
>> > Signed-off-by: Simon Xiao
>>
>> Applied, thanks.
>
> I missed this due to flu, but now I look at it - I don't see the point.
> Link speed isn't meaningful for a memory-based transport, so "unknown"
> is correct. The link is effectively full duplex though.
>
> If the issue is that ethtool is a bit shouty about unknowns, let's
> consider changing that in ethtool, not teaching drivers to lie.

The issue is that certain bonding modes do not work properly without
a speed being reported by a device.

We're doing this for other "virtual" devices already thanks to changes
that went in last week, so there is precedent.
Re: [PATCHv2 08/10] rfkill: Use switch to demux userspace operations
On Mon, Feb 29, 2016 at 05:30:20PM -0500, João Paulo Rechi Vita wrote:
> I agree there is a difference in the logic here, thanks for taking the
> time to point it out so clearly, and sorry for missing this. But AFAIU
> userspace should not call RFKILL_OP_CHANGE with ev.type ==
> RFKILL_TYPE_ALL, as RFKILL_OP_CHANGE is intended to be used to
> block/unblock one RFKill switch, and it is not possible to create a
> RFKill switch with type == RFKILL_TYPE_ALL (rfkill_alloc() would
> return NULL).

Interesting. Maybe Johannes can comment on that part since I think he
wrote the code that interacts with kernel for the rfkill test cases.

> I tried to look into the source code of the test suite you pointed,
> but couldn't easily figure out how it ends up with that combination.
> Could you please explain (or point me in the code) how is that a valid
> operation? If I'm not missing anything, we should probably return
> EINVAL in this case.

These specific failures were shown for the test cases in this file:
http://w1.fi/cgit/hostap/tree/tests/hwsim/test_rfkill.py

The interaction with kernel is done using this code:
http://w1.fi/cgit/hostap/tree/tests/hwsim/rfkill.py

It does indeed look like TYPE_ALL is used here (the block() and
unblock() implementations). If this is incorrect, we can certainly
change the script since I'd assume this is not used for anything else
than the hwsim test cases (or well who knows, it is available out there,
so if someone needs python code to do rfkill operations..).

-- 
Jouni Malinen                                            PGP id EFC895FA
Re: [PATCH net-next] hv_netvsc: add ethtool support for set and get of settings
On Mon, 2016-02-29 at 17:09 -0500, David Miller wrote:
> From: Simon Xiao
> Date: Thu, 25 Feb 2016 15:24:08 -0800
>
> > This patch allows the user to set and retrieve speed and duplex of the
> > hv_netvsc device via ethtool.
> >
> > Example:
> > $ ethtool eth0
> > Settings for eth0:
> > ...
> >     Speed: Unknown!
> >     Duplex: Unknown! (255)
> > ...
> > $ ethtool -s eth0 speed 1000 duplex full
> > $ ethtool eth0
> > Settings for eth0:
> > ...
> >     Speed: 1000Mb/s
> >     Duplex: Full
> > ...
> >
> > This is based on patches by Roopa Prabhu and Nikolay Aleksandrov.
> >
> > Signed-off-by: Simon Xiao
>
> Applied, thanks.

I missed this due to flu, but now I look at it - I don't see the point.
Link speed isn't meaningful for a memory-based transport, so "unknown"
is correct. The link is effectively full duplex though.

If the issue is that ethtool is a bit shouty about unknowns, let's
consider changing that in ethtool, not teaching drivers to lie.

Ben.

-- 
Ben Hutchings
If God had intended Man to program, we'd have been born with serial I/O ports.
Re: [PATCHv2 08/10] rfkill: Use switch to demux userspace operations
Hello Jouni,

On 26 February 2016 at 12:59, Jouni Malinen wrote:
> On Mon, Feb 22, 2016 at 11:36:39AM -0500, João Paulo Rechi Vita wrote:
>> Using a switch to handle different ev.op values in rfkill_fop_write()
>> makes the code easier to extend, as out-of-range values can always be
>> handled by the default case.
>
> This breaks rfkill.. There are automated test scripts for testing this
> area (and most of Wi-Fi for that matter). It would be nice if these were
> used for changes before they get contributed upstream..
>
> http://buildbot.w1.fi/hwsim/
>

Thanks for pointing that out, I haven't heard of this tool before. I'll
give it a try before my next submission.

> This specific commit broke all the rfkill_* test cases because of
> following:
>
>> diff --git a/net/rfkill/core.c b/net/rfkill/core.c
>> @@ -1199,29 +1200,32 @@ static ssize_t rfkill_fop_write(struct file *file, const char __user *buf,
>> -       list_for_each_entry(rfkill, &rfkill_list, node) {
>> -               if (rfkill->idx != ev.idx && ev.op != RFKILL_OP_CHANGE_ALL)
>> -                       continue;
>> -
>> -               if (rfkill->type != ev.type && ev.type != RFKILL_TYPE_ALL)
>> -                       continue;
>
> Note that RFKILL_TYPE_ALL here..
>
>> +               list_for_each_entry(rfkill, &rfkill_list, node)
>> +                       if (rfkill->type == ev.type ||
>> +                           ev.type == RFKILL_TYPE_ALL)
>> +                               rfkill_set_block(rfkill, ev.soft);
>
> It was included for RFKILL_OP_CHANGE_ALL.
>
>> +       case RFKILL_OP_CHANGE:
>> +               list_for_each_entry(rfkill, &rfkill_list, node)
>> +                       if (rfkill->idx == ev.idx && rfkill->type == ev.type)
>> +                               rfkill_set_block(rfkill, ev.soft);
>
> but not for RFKILL_OP_CHANGE..
>
> This needs following to work:
>
>
> diff --git a/net/rfkill/core.c b/net/rfkill/core.c
> index 59ff92d..c4bbd19 100644
> --- a/net/rfkill/core.c
> +++ b/net/rfkill/core.c
> @@ -1239,7 +1239,9 @@ static ssize_t rfkill_fop_write(struct file *file, const char __user *buf,
>                 break;
>         case RFKILL_OP_CHANGE:
>                 list_for_each_entry(rfkill, &rfkill_list, node)
> -                       if (rfkill->idx == ev.idx && rfkill->type == ev.type)
> +                       if (rfkill->idx == ev.idx &&
> +                           (rfkill->type == ev.type ||
> +                            ev.type == RFKILL_TYPE_ALL))
>                                 rfkill_set_block(rfkill, ev.soft);
>                 ret = 0;
>                 break;
>

I agree there is a difference in the logic here, thanks for taking the
time to point it out so clearly, and sorry for missing this. But AFAIU
userspace should not call RFKILL_OP_CHANGE with ev.type ==
RFKILL_TYPE_ALL, as RFKILL_OP_CHANGE is intended to be used to
block/unblock one RFKill switch, and it is not possible to create a
RFKill switch with type == RFKILL_TYPE_ALL (rfkill_alloc() would
return NULL).

I tried to look into the source code of the test suite you pointed,
but couldn't easily figure out how it ends up with that combination.
Could you please explain (or point me in the code) how is that a valid
operation? If I'm not missing anything, we should probably return
EINVAL in this case.

Regards,

--
João Paulo Rechi Vita
http://about.me/jprvita
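The behavioral difference under discussion comes down to two match predicates for the single-switch change operation. The sketch below is a simplified user-space model, not kernel code: the enum, struct and function names are made up for illustration, and only the comparison logic is taken from the quoted diffs.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins for the rfkill type values. */
enum sw_type { TYPE_ALL = 0, TYPE_WLAN, TYPE_BLUETOOTH };

struct sw { unsigned int idx; enum sw_type type; };	/* a registered switch */
struct ev { unsigned int idx; enum sw_type type; };	/* a userspace event */

/* Match rule of the pre-patch loop (and of the proposed fix):
 * the index must match, and TYPE_ALL in the event acts as a wildcard. */
static bool change_matches_old(const struct sw *s, const struct ev *e)
{
	assert(s->type != TYPE_ALL);	/* a real switch never has type ALL */
	return s->idx == e->idx &&
	       (s->type == e->type || e->type == TYPE_ALL);
}

/* Match rule after the switch-based rewrite: no wildcard. */
static bool change_matches_patched(const struct sw *s, const struct ev *e)
{
	return s->idx == e->idx && s->type == e->type;
}
```

An event carrying the right index and TYPE_ALL matches under the first rule but not under the second, which is consistent with the test scripts (which appear to send TYPE_ALL) starting to fail after the rewrite.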
Re: Question on switchdev
On Mon, Feb 29, 2016 at 04:43:16PM -0500, Murali Karicheri wrote: Hi Murali Please can you get your email client to wrap lines at ~ 75 characters. > TI Keystone netcp h/w has a switch. It has n slave ports and 1 host > port. Currently the netcp driver disables the switch functionality > which makes them appear as n nic ports. However we have requirement > to add switch support in the driver. I have reviewed the > experimental driver documentation > Documentation/networking/switchdev.txt and would like to understand > it better so that I can add this support to keystone netcp driver. > NetCP h/w has a 1 (host port) x n (slave port) switch. It can do > layer 2 forwarding between ports. In the switch mode, host driver > provides the frame to the switch and switch uses the filter data > base (AKA ALE table, Address Learning Engine table) to forward the > packet. There is a piece of information available per frame (meta > data) to decide if frame to be forwarded to a particular port or use > the fdb for forward decisions. This makes it sound like a good fit for DSA; see Documentation/networking/dsa/dsa.txt. You probably need to implement a new tagging protocol in net/dsa/tag_*.c and a driver in drivers/net/dsa/. > 1. How does port netdev differ from regular netdev that carries data > when registering netdev? Any example you can point to? They don't differ at all. You consider each port of the switch to be a normal Linux interface. > 2. I assume port netdev will appear as an interface in ifconfig -a > command and it is not assigned an IP address. Correct? The user can assign an address, if they want. It is a normal Linux interface. They can also create a bridge, and add the interface to the bridge. An advanced DSA driver will keep track of which interfaces are in which bridge, and if possible, offload the bridge to the hardware. > 3. with 1xn switch, so we have n + 1 netdev registered with net > core? I assume, only 1 netdev is for data plane and the rest are > control plane. 
Is this correct? No. You only have netdev devices for the external ports of the switch. The other port is known as the cpu port, and does not have a netdev. > 4. We have a bunch of port specific configuration that we would like > to control or configure from user space using standard tools. For > example, switch port state, flow control etc. Is that possible to > add using this framework? ethtool update needed for this? The whole idea here is that the switch ports are normal Linux interfaces. You use normal Linux APIs to configure them. You probably don't need to add any new features. One key thing to get your head around: the switch is a hardware accelerator for the Linux stack. You have to think about how you can make your switch accelerate the Linux stack. It takes people a while to get this. Andrew
Re: [PATCH net-next 0/4] lan78xx: driver update
From: Date: Thu, 25 Feb 2016 23:33:05 + > This patch series add new ethtool functions of set_pauseparam & > get_pauseparam > and MAINTAINERS entry. Series applied, thanks. Please fix your configuration such that your proper name appears in the "From: " field of your outgoing emails. That is what ends up in the Author field of every GIT commit. And right now only your email address appears there. Thanks.
Re: [PATCH net-next] hv_netvsc: add ethtool support for set and get of settings
From: Simon Xiao Date: Thu, 25 Feb 2016 15:24:08 -0800 > This patch allows the user to set and retrieve speed and duplex of the > hv_netvsc device via ethtool. > > Example: > $ ethtool eth0 > Settings for eth0: > ... > Speed: Unknown! > Duplex: Unknown! (255) > ... > $ ethtool -s eth0 speed 1000 duplex full > $ ethtool eth0 > Settings for eth0: > ... > Speed: 1000Mb/s > Duplex: Full > ... > > This is based on patches by Roopa Prabhu and Nikolay Aleksandrov. > > Signed-off-by: Simon Xiao Applied, thanks.
[PATCH next 2/3] ipvlan: Implement L3-symmetric mode.
From: Mahesh Bandewar Current packet processing from the IPtables perspective is asymmetric for IPvlan L3 mode. On the egress path, packets hit LOCAL_OUT and POST_ROUTING hooks in slave-ns as well as master's ns; however, on the ingress path, LOCAL_IN and PRE_ROUTING hooks are hit only in slave's ns. L3 mode is restrictive and uses master's L3 for packet processing, so it does not make sense to skip these hooks in the ingress path in master's ns. The changes in this patch nominate master-dev to be the device for L3 ingress processing when the skb device is the IPvlan slave. Since the master device is used for L3 processing, the IPT hooks are hit in master's ns, making the packet processing symmetric. The other minor change in this patch is to add a force parameter to set_port_mode() to ensure correct settings are applied during the device initialization phase. Signed-off-by: Mahesh Bandewar CC: Eric Dumazet CC: Tim Hockin CC: Alex Pollitt CC: Matthew Dupre --- drivers/net/ipvlan/ipvlan_main.c | 16 +--- 1 file changed, 9 insertions(+), 7 deletions(-) diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c index 5802b9025765..734c25e52c60 100644 --- a/drivers/net/ipvlan/ipvlan_main.c +++ b/drivers/net/ipvlan/ipvlan_main.c @@ -14,16 +14,19 @@ static void ipvlan_adjust_mtu(struct ipvl_dev *ipvlan, struct net_device *dev) ipvlan->dev->mtu = dev->mtu - ipvlan->mtu_adj; } -static void ipvlan_set_port_mode(struct ipvl_port *port, u16 nval) +static void ipvlan_set_port_mode(struct ipvl_port *port, u16 nval, bool force) { struct ipvl_dev *ipvlan; - if (port->mode != nval) { + if (port->mode != nval || force) { list_for_each_entry(ipvlan, &port->ipvlans, pnode) { - if (nval == IPVLAN_MODE_L3) + if (nval == IPVLAN_MODE_L3) { ipvlan->dev->flags |= IFF_NOARP; - else + ipvlan->dev->l3_dev = port->dev; + } else { ipvlan->dev->flags &= ~IFF_NOARP; + ipvlan->dev->l3_dev = ipvlan->dev; + } } port->mode = nval; } @@ -392,7 +395,7 @@ static int ipvlan_nl_changelink(struct net_device 
*dev, if (data && data[IFLA_IPVLAN_MODE]) { u16 nmode = nla_get_u16(data[IFLA_IPVLAN_MODE]); - ipvlan_set_port_mode(port, nmode); + ipvlan_set_port_mode(port, nmode, false); } return 0; } @@ -479,7 +482,6 @@ static int ipvlan_link_new(struct net *src_net, struct net_device *dev, memcpy(dev->dev_addr, phy_dev->dev_addr, ETH_ALEN); dev->priv_flags |= IFF_IPVLAN_SLAVE; - port->count += 1; err = register_netdevice(dev); if (err < 0) @@ -490,7 +492,7 @@ static int ipvlan_link_new(struct net *src_net, struct net_device *dev, goto ipvlan_destroy_port; list_add_tail_rcu(&ipvlan->pnode, &port->ipvlans); - ipvlan_set_port_mode(port, mode); + ipvlan_set_port_mode(port, mode, true); netif_stacked_transfer_operstate(phy_dev, dev); return 0; -- 2.7.0.rc3.207.g0ac5344
[PATCH next 3/3] net: Use l3_dev instead of skb->dev for L3 processing
From: Mahesh Bandewar netif_receive_skb_core() dispatcher uses skb->dev device to send it to the packet-handlers (e.g. ip_rcv, ipv6_rcv etc). These packet handlers in turn use the device passed to determine the net-ns to further process these packets. Now with the nomination logic, the dispatcher will call netif_get_l3_dev() helper to select the device to be used for this processing. Since l3_dev is initialized to self, normal packet processing should not change. Signed-off-by: Mahesh Bandewar CC: Eric Dumazet CC: Tim Hockin CC: Alex Pollitt CC: Matthew Dupre --- net/core/dev.c | 9 ++--- net/ipv4/ip_input.c | 5 +++-- net/ipv6/ip6_input.c | 5 +++-- 3 files changed, 12 insertions(+), 7 deletions(-) diff --git a/net/core/dev.c b/net/core/dev.c index c4023a68cdc1..9252436ef11a 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -1811,7 +1811,8 @@ static inline int deliver_skb(struct sk_buff *skb, if (unlikely(skb_orphan_frags(skb, GFP_ATOMIC))) return -ENOMEM; atomic_inc(&skb->users); - return pt_prev->func(skb, skb->dev, pt_prev, orig_dev); + return pt_prev->func(skb, netif_get_l3_dev(skb->dev), pt_prev, +orig_dev); } static inline void deliver_ptype_list_skb(struct sk_buff *skb, @@ -1904,7 +1905,8 @@ again: } out_unlock: if (pt_prev) - pt_prev->func(skb2, skb->dev, pt_prev, skb->dev); + pt_prev->func(skb2, netif_get_l3_dev(skb->dev), pt_prev, + skb->dev); rcu_read_unlock(); } @@ -4157,7 +4159,8 @@ ncls: if (unlikely(skb_orphan_frags(skb, GFP_ATOMIC))) goto drop; else - ret = pt_prev->func(skb, skb->dev, pt_prev, orig_dev); + ret = pt_prev->func(skb, netif_get_l3_dev(skb->dev), + pt_prev, orig_dev); } else { drop: if (!deliver_exact) diff --git a/net/ipv4/ip_input.c b/net/ipv4/ip_input.c index e3d782746d9d..b47164e3e1c6 100644 --- a/net/ipv4/ip_input.c +++ b/net/ipv4/ip_input.c @@ -247,7 +247,8 @@ int ip_local_deliver(struct sk_buff *skb) /* * Reassemble IP fragments. 
*/ - struct net *net = dev_net(skb->dev); + struct net_device *dev = netif_get_l3_dev(skb->dev); + struct net *net = dev_net(dev); if (ip_is_fragment(ip_hdr(skb))) { if (ip_defrag(net, skb, IP_DEFRAG_LOCAL_DELIVER)) @@ -255,7 +256,7 @@ int ip_local_deliver(struct sk_buff *skb) } return NF_HOOK(NFPROTO_IPV4, NF_INET_LOCAL_IN, - net, NULL, skb, skb->dev, NULL, + net, NULL, skb, dev, NULL, ip_local_deliver_finish); } diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c index c05c425c2389..88443ac06402 100644 --- a/net/ipv6/ip6_input.c +++ b/net/ipv6/ip6_input.c @@ -287,9 +287,10 @@ discard: int ip6_input(struct sk_buff *skb) { + struct net_device *dev = netif_get_l3_dev(skb->dev); + return NF_HOOK(NFPROTO_IPV6, NF_INET_LOCAL_IN, - dev_net(skb->dev), NULL, skb, skb->dev, NULL, - ip6_input_finish); + dev_net(dev), NULL, skb, dev, NULL, ip6_input_finish); } int ip6_mc_input(struct sk_buff *skb) -- 2.7.0.rc3.207.g0ac5344
[PATCH next 1/3] dev: Add netif_get_l3_dev() helper
From: Mahesh Bandewar This patch adds a l3_dev pointer and a helper function to retrieve that. During ingress L3 packet processing, this device will be used instead of skb->dev. Since l3_dev is initialized to self; l3_dev should be pointing to skb->dev so the normal packet processing is neither altered nor should incur any additional cost (as it resides in the RX cache line). Signed-off-by: Mahesh Bandewar CC: Eric Dumazet CC: Tim Hockin CC: Alex Pollitt CC: Matthew Dupre --- include/linux/netdevice.h | 6 ++ net/core/dev.c| 1 + 2 files changed, 7 insertions(+) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index e52077ffe5ed..1cf7e8d61043 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -1738,6 +1738,7 @@ struct net_device { unsigned long gro_flush_timeout; rx_handler_func_t __rcu *rx_handler; void __rcu *rx_handler_data; + struct net_device *l3_dev; #ifdef CONFIG_NET_CLS_ACT struct tcf_proto __rcu *ingress_cl_list; @@ -4085,6 +4086,11 @@ static inline void netif_keep_dst(struct net_device *dev) dev->priv_flags &= ~(IFF_XMIT_DST_RELEASE | IFF_XMIT_DST_RELEASE_PERM); } +static inline struct net_device *netif_get_l3_dev(struct net_device *dev) +{ + return dev->l3_dev; +} + extern struct pernet_operations __net_initdata loopback_net_ops; /* Logging, debugging and troubleshooting/diagnostic helpers. */ diff --git a/net/core/dev.c b/net/core/dev.c index edb7179bc051..c4023a68cdc1 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -7463,6 +7463,7 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name, if (!dev->ethtool_ops) dev->ethtool_ops = &default_ethtool_ops; + dev->l3_dev = dev; nf_hook_ingress_init(dev); return dev; -- 2.7.0.rc3.207.g0ac5344
[PATCH next 0/3] IPvlan L3 symmetric mode
From: Mahesh Bandewar One of the major enhancement requests that I have received from various users of IPvlan in L3 mode concerns its inability to handle IPtables. In a typical IPvlan L3 setup, the master is in default-ns and each slave is in a different (slave) ns. In this setup, egress packet processing for traffic originating from slave-ns will hit all NF_HOOKs in slave-ns as well as default-ns. However, the same is not true for ingress processing: all these NF_HOOKs are hit only in the slave-ns, skipping them in the default-ns. IPvlan in L3 mode is restrictive and it's preferred to hit these hooks in master's ns rather than in slave's ns (L2 mode is where these hooks will be hit only in slave's ns). This can be achieved by adding a device pointer in net_device struct. The stack will use this device reference and its associated ns for all ingress L3 processing. By default this is initialized to self so skb->dev would be the same as skb->dev->l3_dev and hence the normal path will stay unchanged. Also, since l3_dev is in the same RX cache line, there should not be any additional cost. IPvlan slaves OTOH can assign (nominate) their master to l3_dev so that L3 processing happens in master's ns. Please check individual patches for the details. Mahesh Bandewar (3): dev: Add netif_get_l3_dev() helper ipvlan: Use netif_get_l3_dev() to implement L3-symmetric mode. net: update L3 path with device selection logic drivers/net/ipvlan/ipvlan_main.c | 16 +--- include/linux/netdevice.h| 6 ++ net/core/dev.c | 10 +++--- net/ipv4/ip_input.c | 5 +++-- net/ipv6/ip6_input.c | 5 +++-- 5 files changed, 28 insertions(+), 14 deletions(-) -- 2.7.0.rc3.207.g0ac5344
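The nomination idea in this series can be sketched outside the kernel roughly as follows. This is a simplified model under stated assumptions: struct net_sketch, struct netdev_sketch and every function below are hypothetical stand-ins, with only the l3_dev field and the "initialized to self" rule taken from the patches.

```c
/* Hypothetical, stripped-down stand-ins for struct net and struct net_device. */
struct net_sketch {
	int id;
};

struct netdev_sketch {
	struct net_sketch *ns;        /* namespace the device lives in */
	struct netdev_sketch *l3_dev; /* device nominated for L3 ingress */
};

/* Mirrors "dev->l3_dev = dev" at allocation time: default is self. */
void netdev_sketch_init(struct netdev_sketch *dev, struct net_sketch *ns)
{
	dev->ns = ns;
	dev->l3_dev = dev;
}

/* Counterpart of the netif_get_l3_dev() helper from patch 1/3. */
struct netdev_sketch *get_l3_dev(struct netdev_sketch *dev)
{
	return dev->l3_dev;
}

/* In L3 mode the slave nominates its master; otherwise it points at itself. */
void nominate_l3_dev(struct netdev_sketch *slave, struct netdev_sketch *master,
		     int l3_mode)
{
	slave->l3_dev = l3_mode ? master : slave;
}

/* Namespace whose hooks would run for a packet received on skb_dev. */
struct net_sketch *l3_ingress_ns(struct netdev_sketch *skb_dev)
{
	return get_l3_dev(skb_dev)->ns;
}
```

With a master in the default namespace and a slave in its own, l3_ingress_ns(&slave) yields the slave's namespace until nominate_l3_dev(&slave, &master, 1) is called, after which it yields the master's.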
Re: [PATCH] 3c59x: mask LAST_FRAG bit from length field in ring
From: Neil Horman Date: Thu, 25 Feb 2016 13:02:50 -0500 > Recently, I fixed a bug in 3c59x: > > commit 6e144419e4da11a9a4977c8d899d7247d94ca338 > Author: Neil Horman > Date: Wed Jan 13 12:43:54 2016 -0500 > > 3c59x: fix another page map/single unmap imbalance > > Which correctly rebalanced dma mapping and unmapping types. Unfortunately it > introduced a new bug which causes oopses on older systems. > > When mapping dma regions, the last entry for a packet in the 3c59x tx ring > encodes a LAST_FRAG bit, which is encoded as the high order bit of the buffer's > length field. When it is unmapped the LAST_FRAG bit is cleared prior to being > passed to the unmap function. Unfortunately the commit above fails to do that > masking. It was missed in testing because the system on which I tested it had > an intel iommu, the driver for which ignores the size field, using only the DMA > address as the token to identify the mapping to be released. However, on older > systems that rely on swiotlb (or other dma drivers that key off that length > field), not masking off that LAST_FRAG high order bit results in passing a huge > size to be released, leading to all sorts of odd corruptions and the like. > > Fix is easy, just mask the length with 0xFFF. It should really be > &(LAST_FRAG-1), but 0xFFF is the style of the file, and I'd like to make this > fix minimal and correct before making it prettier. > > Applies to the net tree cleanly. All testing on both iommu and swiotlb based > systems produce good results > > Signed-off-by: Neil Horman Applied and queued up for -stable, thanks.
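To make the masking concrete, here is a small sketch of the two variants mentioned in the changelog. It assumes LAST_FRAG is the top bit of a 32-bit length word, as the description says; the function names and the LAST_FRAG_SKETCH constant are invented for illustration and are not taken from 3c59x.c.

```c
#include <stdint.h>

/* Assumed layout: high-order bit of the 32-bit length word marks the last fragment. */
#define LAST_FRAG_SKETCH 0x80000000u

/* Build the ring length word, tagging the final fragment of a packet. */
uint32_t ring_len_encode(uint32_t len, int last)
{
	return last ? (len | LAST_FRAG_SKETCH) : len;
}

/* Minimal fix from the changelog: keep only the low 12 bits. */
uint32_t unmap_len_minimal(uint32_t ring_len)
{
	return ring_len & 0xFFFu;
}

/* The "prettier" form: clear only the flag bit, keep every length bit. */
uint32_t unmap_len_general(uint32_t ring_len)
{
	return ring_len & (LAST_FRAG_SKETCH - 1u);
}
```

For a 1500-byte last fragment the ring word is 0x800005DC; passing that unmasked to an unmap routine that honours the size would describe a mapping of more than 2 GiB, while both masked forms recover 1500. The 0xFFF variant additionally assumes fragment lengths stay below 4096 bytes.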
Re: [Patch net-next v3 0/4] net_sched: update backlog for hierarchical qdisc's
From: Cong Wang Date: Thu, 25 Feb 2016 14:54:59 -0800 > For hierarchical qdisc like HTB, we currently only update its qlen > but leave its backlog as zero: > > qdisc htb 1: dev eth0 root refcnt 2 r2q 10 default 1 direct_packets_stat > 0 ver 3.17 > Sent 172680457356 bytes 222469449 pkt (dropped 0, overlimits 123575834 > requeues 0) > backlog 0b 72p requeues 0 > > This patchset makes backlog as accurate as qlen. > > --- > v3: rebase and fix the n==0 case for qdisc_tree_reduce_backlog() > v2: rebase and update changelog, not code change Series applied, thanks.
Question on switchdev
Hi Jiri, Scott, or other switchdev experts, TI Keystone netcp h/w has a switch. It has n slave ports and 1 host port. Currently the netcp driver disables the switch functionality which makes them appear as n nic ports. However we have requirement to add switch support in the driver. I have reviewed the experimental driver documentation Documentation/networking/switchdev.txt and would like to understand it better so that I can add this support to keystone netcp driver. NetCP h/w has a 1 (host port) x n (slave port) switch. It can do layer 2 forwarding between ports. In the switch mode, host driver provides the frame to the switch and switch uses the filter data base (AKA ALE table, Address Learning Engine table) to forward the packet. There is a piece of information available per frame (meta data) to decide if frame to be forwarded to a particular port or use the fdb for forward decisions. I see following description in the above documentation. ===From Documentation/networking/switchdev.txt== On switchdev driver initialization, the driver will allocate and register a struct net_device (using register_netdev()) for each enumerated physical switch port, called the port netdev. A port netdev is the software representation of the physical port and provides a conduit for control traffic to/from the controller (the kernel) and the network, as well as an anchor point for higher level constructs such as bridges, bonds, VLANs, tunnels, and L3 routers. Using standard netdev tools (iproute2, ethtool, etc), the port netdev can also provide to the user access to the physical properties of the switch port such as PHY link state and I/O statistics. = 1. How does port netdev differ from regular netdev that carries data when registering netdev? Any example you can point to? 2. I assume port netdev will appear as an interface in ifconfig -a command and it is not assigned an IP address. Correct? 3. with 1xn switch, so we have n + 1 netdev registered with net core? 
I assume, only 1 netdev is for data plane and the rest are control plane. Is this correct? 4. We have a bunch of port-specific configuration that we would like to control or configure from user space using standard tools. For example, switch port state, flow control etc. Is that possible to add using this framework? Is an ethtool update needed for this? 5. This feature is marked as experimental. Hope having more drivers added to this switchdev framework can eventually get this out of experimental to regular status. Right? I have more questions that I will defer for now. It would be great if I can work with you to implement this in the netcp driver. Hope you can respond with your comments. Thanks. -- Murali Karicheri Linux Kernel, Keystone
Re: [net-next PATCH v3 1/3] net: sched: consolidate offload decision in cls_u32
On Mon, Feb 29, 2016 at 10:58 AM, Jiri Pirko wrote: > Mon, Feb 29, 2016 at 07:40:53PM CET, john.fastab...@gmail.com wrote: >>On 16-02-27 08:28 PM, Cong Wang wrote: >>> On Fri, Feb 26, 2016 at 8:24 PM, John Fastabend >>> wrote: On 16-02-26 09:39 AM, Cong Wang wrote: > On Fri, Feb 26, 2016 at 7:53 AM, John Fastabend > wrote: >> diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h >> index 2121df5..e64d20b 100644 >> --- a/include/net/pkt_cls.h >> +++ b/include/net/pkt_cls.h >> @@ -392,4 +392,9 @@ struct tc_cls_u32_offload { >> }; >> }; >> >> +static inline bool tc_should_offload(struct net_device *dev) >> +{ >> + return dev->netdev_ops->ndo_setup_tc; >> +} >> + > > These should be protected by CONFIG_NET_CLS_U32, no? > Its not necessary it is a completely general function and I only lifted it out of cls_u32 so that the cls_flower classifier could also use it. I don't see the need off-hand to have it wrapped in an ORd ifdef statement where its (CONFIG_NET_CLS_U32 | CONFIG_NET_CLS_X ...). Any particular reason you were thnking it should be wrapped in ifdefs? >>> >>> Not a big deal. >>> >>> I just feel these don't need to compile when I have CONFIG_NET_CLS_U32=n. >>> >>> Thanks. >>> >> >>Well because this is 'static inline' gcc should just remove it >>if it is not used. Assuming non-ancient gcc and normal compile >>flags, e.g. you are not including -fkeep-inline-functions or >>something. >> >>So just to keep it readable I would prefer to just leave it >>as is. > > Definitelly. cls_flower will use it in very near future. Making it > dependent on CONFIG_NET_CLS_U32 makes 0 sense to me. Oh, why then do you have u32 in the struct name tc_cls_u32_offload? (Note that in the above I said "these" not "this", so I never only refer to tc_should_offload)
Re: [PATCH/RFC v5 net-next] ravb: Add dma queue interrupt support
On 02/28/2016 05:13 PM, Yoshihiro Kaneko wrote: From: Kazuya Mizuguchi This patch supports the following interrupts. - One interrupt for multiple (error, gPTP) - One interrupt for emac - Four interrupts for dma queue (best effort rx/tx, network control rx/tx) This patch improve efficiency of the interrupt handler by adding the interrupt handler corresponding to each interrupt source described above. Additionally, it reduces the number of times of the access to EthernetAVB IF. Also this patch prevent this driver depends on the whim of a boot loader. [ykaneko0...@gmail.com: define bit names of registers] [ykaneko0...@gmail.com: add comment for gen3 only registers] [ykaneko0...@gmail.com: fix coding style] [ykaneko0...@gmail.com: update changelog] [ykaneko0...@gmail.com: gen3: fix initialization of interrupts] [ykaneko0...@gmail.com: gen3: fix clearing interrupts] [ykaneko0...@gmail.com: gen3: add helper function for request_irq()] [ykaneko0...@gmail.com: revert ravb_close() and ravb_ptp_stop()] [ykaneko0...@gmail.com: avoid calling free_irq() to non-hooked interrupts] [ykaneko0...@gmail.com: make NC/BE interrupt handler a function] Signed-off-by: Kazuya Mizuguchi Signed-off-by: Yoshihiro Kaneko [...] diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c index c936682..1bec71e 100644 --- a/drivers/net/ethernet/renesas/ravb_main.c +++ b/drivers/net/ethernet/renesas/ravb_main.c [...] @@ -697,6 +726,39 @@ static void ravb_error_interrupt(struct net_device *ndev) } } +static int ravb_nc_be_interrupt(struct net_device *ndev, int ravb_queue, I'd call this function e.g. ravb_queue_interrupt(). And make it return 'bool' or even 'irqreturn_t' directly. And I'd suggest a shorter name for the 'ravb_queue' parameter, like 'queue' or even 'q'... Agreed. + u32 ris0, u32 *ric0, u32 tis, u32 *tic) You don't seem to need 'ric0' and 'tic' past the call sites, so no real need to pass them by reference. 
When Rx/Tx interrupt for NC and BE is issued at the same time, this function is called twice (for NC, BE) from ravb_interrupt. The interrupt mask of NC set in the first call will be reset in the next call for BE. So it is necessary to keep the modified value of "ric0" and "tic". OK, but we still can simplify this by reading these registers right in ravb_queue_interrupt()... [...] @@ -725,31 +787,15 @@ static irqreturn_t ravb_interrupt(int irq, void *dev_id) /* Network control and best effort queue RX/TX */ for (q = RAVB_NC; q >= RAVB_BE; q--) { - if (((ris0 & ric0) & BIT(q)) || - ((tis & tic) & BIT(q))) { - if (napi_schedule_prep(&priv->napi[q])) { - /* Mask RX and TX interrupts */ - ric0 &= ~BIT(q); - tic &= ~BIT(q); - ravb_write(ndev, ric0, RIC0); - ravb_write(ndev, tic, TIC); - __napi_schedule(&priv->napi[q]); - } else { - netdev_warn(ndev, - "ignoring interrupt, rx status 0x%08x, rx mask 0x%08x,\n", - ris0, ric0); - netdev_warn(ndev, - " tx status 0x%08x, tx mask 0x%08x.\n", - tis, tic); - } + if (ravb_nc_be_interrupt(ndev, q, ris0, &ric0, tis, +&tic)) result = IRQ_HANDLED; - } } Unroll this *for* loop please... OK. It was a bad idea actually, sorry... [...] @@ -767,6 +813,73 @@ static irqreturn_t ravb_interrupt(int irq, void [...] +static irqreturn_t ravb_dmaq_interrupt(int irq, void *dev_id, int ravb_queue) Perhaps, ravb_rx_tx_interrupt()? Agreed. And we still have ravb_dma_interrupt() unused, right? [...] Thanks, kaneko MBR, Sergei
Re: [PATCH/RFC v6 net-next] ravb: Add dma queue interrupt support
Hello. On 02/28/2016 06:41 PM, Yoshihiro Kaneko wrote: From: Kazuya Mizuguchi This patch supports the following interrupts. - One interrupt for multiple (timestamp, error, gPTP) - One interrupt for emac - Four interrupts for dma queue (best effort rx/tx, network control rx/tx) This patch improve efficiency of the interrupt handler by adding the interrupt handler corresponding to each interrupt source described above. Additionally, it reduces the number of times of the access to EthernetAVB IF. Also this patch prevent this driver depends on the whim of a boot loader. [ykaneko0...@gmail.com: define bit names of registers] [ykaneko0...@gmail.com: add comment for gen3 only registers] [ykaneko0...@gmail.com: fix coding style] [ykaneko0...@gmail.com: update changelog] [ykaneko0...@gmail.com: gen3: fix initialization of interrupts] [ykaneko0...@gmail.com: gen3: fix clearing interrupts] [ykaneko0...@gmail.com: gen3: add helper function for request_irq()] [ykaneko0...@gmail.com: gen3: remove IRQF_SHARED flag for request_irq()] [ykaneko0...@gmail.com: revert ravb_close() and ravb_ptp_stop()] [ykaneko0...@gmail.com: avoid calling free_irq() to non-hooked interrupts] [ykaneko0...@gmail.com: make NC/BE interrupt handler a function] [ykaneko0...@gmail.com: make timestamp interrupt handler a function] [ykaneko0...@gmail.com: timestamp interrupt is handled in multiple interrupt handler instead of dma queue interrupt handler] Signed-off-by: Kazuya Mizuguchi Signed-off-by: Yoshihiro Kaneko OK, you are very close now! Just a few comments... [...] diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c index c936682..22ef65d 100644 --- a/drivers/net/ethernet/renesas/ravb_main.c +++ b/drivers/net/ethernet/renesas/ravb_main.c [...] 
@@ -697,6 +726,47 @@ static void ravb_error_interrupt(struct net_device *ndev) } } +static bool ravb_queue_interrupt(struct net_device *ndev, int q, +u32 ris0, u32 *ric0, u32 tis, u32 *tic) +{ + struct ravb_private *priv = netdev_priv(ndev); + Perhaps it makes sense to read the RI[CS]0/TI[CS] here instead of passing them (by reference)? [...] @@ -714,42 +784,21 @@ static irqreturn_t ravb_interrupt(int irq, void *dev_id) u32 ric0 = ravb_read(ndev, RIC0); u32 tis = ravb_read(ndev, TIS); u32 tic = ravb_read(ndev, TIC); - int q; /* Timestamp updated */ - if (tis & TIS_TFUF) { - ravb_write(ndev, ~TIS_TFUF, TIS); - ravb_get_tx_tstamp(ndev); + if (ravb_timestamp_interrupt(ndev, tis)) result = IRQ_HANDLED; - } /* Network control and best effort queue RX/TX */ - for (q = RAVB_NC; q >= RAVB_BE; q--) { - if (((ris0 & ric0) & BIT(q)) || - ((tis & tic) & BIT(q))) { - if (napi_schedule_prep(&priv->napi[q])) { - /* Mask RX and TX interrupts */ - ric0 &= ~BIT(q); - tic &= ~BIT(q); - ravb_write(ndev, ric0, RIC0); - ravb_write(ndev, tic, TIC); - __napi_schedule(&priv->napi[q]); - } else { - netdev_warn(ndev, - "ignoring interrupt, rx status 0x%08x, rx mask 0x%08x,\n", - ris0, ric0); - netdev_warn(ndev, - "tx status 0x%08x, tx mask 0x%08x.\n", - tis, tic); - } - result = IRQ_HANDLED; - } - } + if (ravb_queue_interrupt(ndev, RAVB_NC, ris0, &ric0, tis, &tic)) + result = IRQ_HANDLED; + if (ravb_queue_interrupt(ndev, RAVB_BE, ris0, &ric0, tis, &tic)) + result = IRQ_HANDLED; Hmm, perhaps unrolling wasn't such a great idea... we can't use || here as it would be short-circuited. :-( [...] +static irqreturn_t ravb_rx_tx_interrupt(int irq, void *dev_id, int ravb_queue) Please, please shorten this 'ravb_queue'... Also, would make sense to rename it to ravb_dma_interrupt()... [...] Unfortunately, I still can't do a full gen2 regression testing as both Alt and Porter boards don't work with the recent kernel due to AVB_MDIO stuck at 1... But perhaps such testing isn't even necessary. MBR, Sergei
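The short-circuit concern raised above can be illustrated with a small stand-alone sketch. Everything here (struct queue_sketch, handle_queue() and the two wrappers) is hypothetical and only models the control flow; it does not touch the real ravb registers or NAPI.

```c
#include <stdbool.h>

/* Hypothetical per-queue state standing in for the NC and BE queues. */
struct queue_sketch {
	bool pending;   /* an interrupt is waiting for this queue */
	bool scheduled; /* polling has been scheduled for this queue */
};

/* Service one queue; returns true if there was something to handle. */
bool handle_queue(struct queue_sketch *q)
{
	if (!q->pending)
		return false;
	q->pending = false;
	q->scheduled = true;
	return true;
}

/* Pitfall: if the NC call returns true, the BE call is never made. */
bool handle_both_short_circuit(struct queue_sketch *nc, struct queue_sketch *be)
{
	return handle_queue(nc) || handle_queue(be);
}

/* Both queues are always serviced; the results are combined afterwards. */
bool handle_both(struct queue_sketch *nc, struct queue_sketch *be)
{
	bool handled = handle_queue(nc);

	handled |= handle_queue(be);
	return handled;
}
```

This is why the unrolled version in the patch keeps two separate if statements: each call has a side effect that must happen regardless of the other call's result.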
Re: Softirq priority inversion from "softirq: reduce latencies"
From: Thomas Gleixner Date: Mon, 29 Feb 2016 20:14:36 +0100 (CET) > On Mon, 29 Feb 2016, Peter Hurley wrote: >> Or flipping your argument on its head, why not just _always_ execute >> softirq in ksoftirqd? > > Which is what that change effectivley does. And that makes a lot of sense, > because you get the softirq load under scheduler control and do not let the > softirq run as a context stealing entity which is completely uncontrollable by > the scheduler. +1
[PATCH 4/4] net: can: ifi: Add obscure bit swap for EFF frame IDs
In case of CAN2.0 EFF frame, the controller handles frame IDs in a rather bizarre way. The ID is split into an extended part, IDX[28:11] and standard part, ID[10:0]. In the TX path, the core first sends the top 11 bits of the IDX, followed by ID and finally the rest of IDX. In the RX path, the core stores the ID in the LSbit part of the IDX field, followed by the LSbit parts of real IDX. The MSbit parts of IDX are stored in the ID field of the register. This patch implements the necessary bit shuffling to mitigate this obscure behavior. In case two of these controllers are connected together, the RX and TX bit swapping cancel each other out and the issue does not manifest. The issue only manifests when talking to a different CAN controller. Signed-off-by: Marek Vasut Cc: Marc Kleine-Budde Cc: Mark Rutland Cc: Oliver Hartkopp Cc: Wolfgang Grandegger --- drivers/net/can/ifi_canfd/ifi_canfd.c | 31 +-- 1 file changed, 29 insertions(+), 2 deletions(-) diff --git a/drivers/net/can/ifi_canfd/ifi_canfd.c b/drivers/net/can/ifi_canfd/ifi_canfd.c index 6704098..254861b 100644 --- a/drivers/net/can/ifi_canfd/ifi_canfd.c +++ b/drivers/net/can/ifi_canfd/ifi_canfd.c @@ -136,7 +136,11 @@ #define IFI_CANFD_RXFIFO_ID0x6c #define IFI_CANFD_RXFIFO_ID_ID_OFFSET 0 #define IFI_CANFD_RXFIFO_ID_ID_STD_MASK0x7ff +#define IFI_CANFD_RXFIFO_ID_ID_STD_OFFSET 0 +#define IFI_CANFD_RXFIFO_ID_ID_STD_WIDTH 10 #define IFI_CANFD_RXFIFO_ID_ID_XTD_MASK0x1fff +#define IFI_CANFD_RXFIFO_ID_ID_XTD_OFFSET 11 +#define IFI_CANFD_RXFIFO_ID_ID_XTD_WIDTH 18 #define IFI_CANFD_RXFIFO_ID_IDEBIT(29) #define IFI_CANFD_RXFIFO_DATA 0x70/* 0x70..0xac */ @@ -157,7 +161,11 @@ #define IFI_CANFD_TXFIFO_ID0xbc #define IFI_CANFD_TXFIFO_ID_ID_OFFSET 0 #define IFI_CANFD_TXFIFO_ID_ID_STD_MASK0x7ff +#define IFI_CANFD_TXFIFO_ID_ID_STD_OFFSET 0 +#define IFI_CANFD_TXFIFO_ID_ID_STD_WIDTH 10 #define IFI_CANFD_TXFIFO_ID_ID_XTD_MASK0x1fff +#define IFI_CANFD_TXFIFO_ID_ID_XTD_OFFSET 11 +#define IFI_CANFD_TXFIFO_ID_ID_XTD_WIDTH 18 #define 
IFI_CANFD_TXFIFO_ID_IDEBIT(29) #define IFI_CANFD_TXFIFO_DATA 0xc0/* 0xb0..0xfc */ @@ -229,10 +237,20 @@ static void ifi_canfd_read_fifo(struct net_device *ndev) rxid = readl(priv->base + IFI_CANFD_RXFIFO_ID); id = (rxid >> IFI_CANFD_RXFIFO_ID_ID_OFFSET); - if (id & IFI_CANFD_RXFIFO_ID_IDE) + if (id & IFI_CANFD_RXFIFO_ID_IDE) { id &= IFI_CANFD_RXFIFO_ID_ID_XTD_MASK; - else + /* +* In case the Extended ID frame is received, the standard +* and extended part of the ID are swapped in the register, +* so swap them back to obtain the correct ID. +*/ + id = (id >> IFI_CANFD_RXFIFO_ID_ID_XTD_OFFSET) | +((id & IFI_CANFD_RXFIFO_ID_ID_STD_MASK) << + IFI_CANFD_RXFIFO_ID_ID_XTD_WIDTH); + id |= CAN_EFF_FLAG; + } else { id &= IFI_CANFD_RXFIFO_ID_ID_STD_MASK; + } cf->can_id = id; if (rxdlc & IFI_CANFD_RXFIFO_DLC_ESI) { @@ -767,6 +785,15 @@ static netdev_tx_t ifi_canfd_start_xmit(struct sk_buff *skb, if (cf->can_id & CAN_EFF_FLAG) { txid = cf->can_id & CAN_EFF_MASK; + /* +* In case the Extended ID frame is transmitted, the +* standard and extended part of the ID are swapped +* in the register, so swap them back to send the +* correct ID. +*/ + txid = (txid >> IFI_CANFD_TXFIFO_ID_ID_XTD_WIDTH) | + ((txid & IFI_CANFD_TXFIFO_ID_ID_XTD_MASK) << +IFI_CANFD_TXFIFO_ID_ID_XTD_OFFSET); txid |= IFI_CANFD_TXFIFO_ID_IDE; } else { txid = cf->can_id & CAN_SFF_MASK; -- 2.7.0
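The swap described in this patch can be sketched as a pair of pure functions. This is an illustration under assumptions, not the driver code: it takes the register layout to be an 18-bit extended field in bits 28..11 and an 11-bit standard field in bits 10..0, uses the same shift directions as the diff, and masks each field explicitly so the result stays within 29 bits. All names below are hypothetical.

```c
#include <stdint.h>

#define STD_WIDTH_SKETCH 11u /* assumed width of ID[10:0] */
#define XTD_WIDTH_SKETCH 18u /* assumed width of IDX[28:11] */
#define STD_MASK_SKETCH ((1u << STD_WIDTH_SKETCH) - 1u) /* 0x7ff */
#define XTD_MASK_SKETCH ((1u << XTD_WIDTH_SKETCH) - 1u) /* 0x3ffff */

/* RX direction: register word (without flag bits) to a 29-bit CAN ID. */
uint32_t reg_to_can_id(uint32_t reg)
{
	/* Extended field moves down to bits 17..0, standard field up to 28..18. */
	return ((reg >> STD_WIDTH_SKETCH) & XTD_MASK_SKETCH) |
	       ((reg & STD_MASK_SKETCH) << XTD_WIDTH_SKETCH);
}

/* TX direction: 29-bit CAN ID to the register word (without flag bits). */
uint32_t can_id_to_reg(uint32_t can_id)
{
	/* Top 11 bits land in bits 10..0, low 18 bits in bits 28..11. */
	return ((can_id >> XTD_WIDTH_SKETCH) & STD_MASK_SKETCH) |
	       ((can_id & XTD_MASK_SKETCH) << STD_WIDTH_SKETCH);
}
```

The two functions are inverses of each other on 29-bit values, which matches the observation in the changelog that two identical controllers talking to each other never expose the problem.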
[PATCH 1/4] net: can: ifi: Fix clock generator configuration
The clock generation does not match reality when using the CAN IP core outside of the FPGA design. This patch fixes the computation of the values which are programmed into the clock generator registers.

First, there are some off-by-one errors which manifest themselves only when communicating with a different controller, so those are fixed.

Second, the bits in the clock generator registers have a different meaning depending on whether the core is in ISO CANFD mode or any of the other modes (BOSCH CANFD or CAN2.0). Detect the ISO CANFD mode and fix the handling of this special case of clock configuration.

Finally, the CAN clock speed is in the CANCLOCK register, not the SYSCLOCK register, so fix this as well.

Signed-off-by: Marek Vasut
Cc: Marc Kleine-Budde
Cc: Mark Rutland
Cc: Oliver Hartkopp
Cc: Wolfgang Grandegger
---
 drivers/net/can/ifi_canfd/ifi_canfd.c | 43 ++-
 1 file changed, 22 insertions(+), 21 deletions(-)

diff --git a/drivers/net/can/ifi_canfd/ifi_canfd.c b/drivers/net/can/ifi_canfd/ifi_canfd.c
index 639868b..72f5205 100644
--- a/drivers/net/can/ifi_canfd/ifi_canfd.c
+++ b/drivers/net/can/ifi_canfd/ifi_canfd.c
@@ -514,25 +514,25 @@ static irqreturn_t ifi_canfd_isr(int irq, void *dev_id)
 
 static const struct can_bittiming_const ifi_canfd_bittiming_const = {
 	.name		= KBUILD_MODNAME,
-	.tseg1_min	= 2,	/* Time segment 1 = prop_seg + phase_seg1 */
+	.tseg1_min	= 1,	/* Time segment 1 = prop_seg + phase_seg1 */
 	.tseg1_max	= 64,
-	.tseg2_min	= 1,	/* Time segment 2 = phase_seg2 */
-	.tseg2_max	= 16,
+	.tseg2_min	= 2,	/* Time segment 2 = phase_seg2 */
+	.tseg2_max	= 64,
 	.sjw_max	= 16,
-	.brp_min	= 1,
-	.brp_max	= 1024,
+	.brp_min	= 2,
+	.brp_max	= 256,
 	.brp_inc	= 1,
 };
 
 static const struct can_bittiming_const ifi_canfd_data_bittiming_const = {
 	.name		= KBUILD_MODNAME,
-	.tseg1_min	= 2,	/* Time segment 1 = prop_seg + phase_seg1 */
-	.tseg1_max	= 16,
-	.tseg2_min	= 1,	/* Time segment 2 = phase_seg2 */
-	.tseg2_max	= 8,
-	.sjw_max	= 4,
-	.brp_min	= 1,
-	.brp_max	= 32,
+	.tseg1_min	= 1,	/* Time segment 1 = prop_seg + phase_seg1 */
+	.tseg1_max	= 64,
+	.tseg2_min	= 2,	/* Time segment 2 = phase_seg2 */
+	.tseg2_max	= 64,
+	.sjw_max	= 16,
+	.brp_min	= 2,
+	.brp_max	= 256,
 	.brp_inc	= 1,
 };
 
@@ -545,32 +545,33 @@ static void ifi_canfd_set_bittiming(struct net_device *ndev)
 	u32 noniso_arg = 0;
 	u32 time_off;
 
-	if (priv->can.ctrlmode & CAN_CTRLMODE_FD_NON_ISO) {
+	if (priv->can.ctrlmode & CAN_CTRLMODE_FD) {
+		time_off = IFI_CANFD_TIME_SJW_OFF_ISO;
+	} else {
 		noniso_arg = IFI_CANFD_TIME_SET_TIMEB_BOSCH |
 			     IFI_CANFD_TIME_SET_TIMEA_BOSCH |
 			     IFI_CANFD_TIME_SET_PRESC_BOSCH |
 			     IFI_CANFD_TIME_SET_SJW_BOSCH;
 		time_off = IFI_CANFD_TIME_SJW_OFF_BOSCH;
-	} else {
-		time_off = IFI_CANFD_TIME_SJW_OFF_ISO;
 	}
 
 	/* Configure bit timing */
-	brp = bt->brp - 1;
+	brp = bt->brp - 2;
 	sjw = bt->sjw - 1;
 	tseg1 = bt->prop_seg + bt->phase_seg1 - 1;
-	tseg2 = bt->phase_seg2 - 1;
+	tseg2 = bt->phase_seg2 - 2;
 	writel((tseg2 << IFI_CANFD_TIME_TIMEB_OFF) |
 	       (tseg1 << IFI_CANFD_TIME_TIMEA_OFF) |
 	       (brp << IFI_CANFD_TIME_PRESCALE_OFF) |
-	       (sjw << time_off),
+	       (sjw << time_off) |
+	       noniso_arg,
 	       priv->base + IFI_CANFD_TIME);
 
 	/* Configure data bit timing */
-	brp = dbt->brp - 1;
+	brp = dbt->brp - 2;
 	sjw = dbt->sjw - 1;
 	tseg1 = dbt->prop_seg + dbt->phase_seg1 - 1;
-	tseg2 = dbt->phase_seg2 - 1;
+	tseg2 = dbt->phase_seg2 - 2;
 	writel((tseg2 << IFI_CANFD_TIME_TIMEB_OFF) |
 	       (tseg1 << IFI_CANFD_TIME_TIMEA_OFF) |
 	       (brp << IFI_CANFD_TIME_PRESCALE_OFF) |
@@ -847,7 +848,7 @@ static int ifi_canfd_plat_probe(struct platform_device *pdev)
 
 	priv->can.state = CAN_STATE_STOPPED;
 
-	priv->can.clock.freq = readl(addr + IFI_CANFD_SYSCLOCK);
+	priv->can.clock.freq = readl(addr + IFI_CANFD_CANCLOCK);
 
 	priv->can.bittiming_const	= &ifi_canfd_bittiming_const;
 	priv->can.data_bittiming_const	= &ifi_canfd_data_bittiming_const;
-- 
2.7.0
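For readers who want to see the off-by-one changes in isolation, the following userspace sketch encodes bit-timing values the way the patched ifi_canfd_set_bittiming() appears to: brp and phase_seg2 are stored minus 2, sjw and tseg1 minus 1. The struct, the function name and all EX_* field positions are made up for illustration; the real IFI_CANFD_TIME_*_OFF values are not shown in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical field positions, for illustration only. The real
 * IFI_CANFD_TIME_*_OFF constants are not part of the quoted patch. */
#define EX_TIMEB_OFF    0
#define EX_TIMEA_OFF    8
#define EX_PRESCALE_OFF 16
#define EX_SJW_OFF      25

/* Hypothetical stand-in for the fields of struct can_bittiming used here. */
struct ex_bittiming {
	uint32_t brp;
	uint32_t sjw;
	uint32_t prop_seg;
	uint32_t phase_seg1;
	uint32_t phase_seg2;
};

/* Build a register word using the offsets from the patch:
 * brp - 2, sjw - 1, tseg1 - 1, tseg2 - 2. */
static uint32_t ex_encode_time(const struct ex_bittiming *bt, uint32_t mode_bits)
{
	/* These lower bounds mirror brp_min = 2, tseg2_min = 2, tseg1_min = 1. */
	assert(bt->brp >= 2 && bt->phase_seg2 >= 2);
	assert(bt->sjw >= 1 && bt->prop_seg + bt->phase_seg1 >= 1);

	uint32_t brp = bt->brp - 2;
	uint32_t sjw = bt->sjw - 1;
	uint32_t tseg1 = bt->prop_seg + bt->phase_seg1 - 1;
	uint32_t tseg2 = bt->phase_seg2 - 2;

	return (tseg2 << EX_TIMEB_OFF) |
	       (tseg1 << EX_TIMEA_OFF) |
	       (brp << EX_PRESCALE_OFF) |
	       (sjw << EX_SJW_OFF) |
	       mode_bits;
}
```

With the minimum values allowed by the new constants (brp = 2, phase_seg2 = 2), the prescaler and TIMEB fields encode to zero, which is presumably why the minimums were raised together with the offsets.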
[PATCH 3/4] net: can: ifi: Fix RX and TX ID mask
The RX and TX ID mask for CAN2.0 is 11 bits wide. This patch fixes the incorrect mask, which caused the CAN IDs to miss the MSBit both on receive and transmit.

Signed-off-by: Marek Vasut
Cc: Marc Kleine-Budde
Cc: Mark Rutland
Cc: Oliver Hartkopp
Cc: Wolfgang Grandegger
---
 drivers/net/can/ifi_canfd/ifi_canfd.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/can/ifi_canfd/ifi_canfd.c b/drivers/net/can/ifi_canfd/ifi_canfd.c
index 82a33bd..6704098 100644
--- a/drivers/net/can/ifi_canfd/ifi_canfd.c
+++ b/drivers/net/can/ifi_canfd/ifi_canfd.c
@@ -135,7 +135,7 @@
 
 #define IFI_CANFD_RXFIFO_ID			0x6c
 #define IFI_CANFD_RXFIFO_ID_ID_OFFSET		0
-#define IFI_CANFD_RXFIFO_ID_ID_STD_MASK		0x3ff
+#define IFI_CANFD_RXFIFO_ID_ID_STD_MASK		0x7ff
 #define IFI_CANFD_RXFIFO_ID_ID_XTD_MASK		0x1fff
 #define IFI_CANFD_RXFIFO_ID_IDE			BIT(29)
 
@@ -156,7 +156,7 @@
 
 #define IFI_CANFD_TXFIFO_ID			0xbc
 #define IFI_CANFD_TXFIFO_ID_ID_OFFSET		0
-#define IFI_CANFD_TXFIFO_ID_ID_STD_MASK		0x3ff
+#define IFI_CANFD_TXFIFO_ID_ID_STD_MASK		0x7ff
 #define IFI_CANFD_TXFIFO_ID_ID_XTD_MASK		0x1fff
 #define IFI_CANFD_TXFIFO_ID_IDE			BIT(29)
 
-- 
2.7.0
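To make the effect of the mask change concrete, here is a small userspace sketch that applies the old 10-bit mask and the new 11-bit mask to a standard CAN ID. The EX_* names and the helper are hypothetical; only the two mask values come from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define EX_STD_MASK_OLD 0x3ffu /* 10 bits: the value before the fix */
#define EX_STD_MASK_NEW 0x7ffu /* 11 bits: full CAN2.0 standard ID */

/* Hypothetical helper: keep only the ID bits selected by the mask. */
static uint32_t ex_mask_std_id(uint32_t can_id, uint32_t mask)
{
	/* The mask is expected to be a contiguous run of low bits. */
	assert((mask & (mask + 1)) == 0);
	return can_id & mask;
}
```

With the old mask, ID 0x400 collapses to 0 and ID 0x7ff to 0x3ff, which is the lost most significant bit the commit message describes.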
[PATCH 0/4] Synchronise IFI CANFD driver with real world
Thus far, this driver was only tested on hardware synthesised in the warm and safe insides of an FPGA, and only against another IFI CANFD core. The real hardware has now arrived, and I tested the IFI CANFD driver against a different, harsh, real-world CAN controller. This uncovered a few bugs, so here are the fixes for those.

Marek Vasut (4):
  net: can: ifi: Fix clock generator configuration
  net: can: ifi: Fix TX DLC configuration
  net: can: ifi: Fix RX and TX ID mask
  net: can: ifi: Add obscure bit swap for EFF frame IDs

 drivers/net/can/ifi_canfd/ifi_canfd.c | 83 ---
 1 file changed, 58 insertions(+), 25 deletions(-)

Cc: Marc Kleine-Budde
Cc: Mark Rutland
Cc: Oliver Hartkopp
Cc: Wolfgang Grandegger
-- 
2.7.0
[PATCH 2/4] net: can: ifi: Fix TX DLC configuration
The TX DLC, the transmission length information, was not written into the transmit configuration register. When using the CAN core with a different CAN controller, the receiving CAN controller will receive only the ID part of the CAN frame, but no data at all. This patch adds the TX DLC into the register to fix this issue.

Signed-off-by: Marek Vasut
Cc: Marc Kleine-Budde
Cc: Mark Rutland
Cc: Oliver Hartkopp
Cc: Wolfgang Grandegger
---
 drivers/net/can/ifi_canfd/ifi_canfd.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/net/can/ifi_canfd/ifi_canfd.c b/drivers/net/can/ifi_canfd/ifi_canfd.c
index 72f5205..82a33bd 100644
--- a/drivers/net/can/ifi_canfd/ifi_canfd.c
+++ b/drivers/net/can/ifi_canfd/ifi_canfd.c
@@ -774,10 +774,15 @@ static netdev_tx_t ifi_canfd_start_xmit(struct sk_buff *skb,
 
 	if (priv->can.ctrlmode & (CAN_CTRLMODE_FD | CAN_CTRLMODE_FD_NON_ISO)) {
 		if (can_is_canfd_skb(skb)) {
+			txdlc |= can_len2dlc(cf->len);
 			txdlc |= IFI_CANFD_TXFIFO_DLC_EDL;
 			if (cf->flags & CANFD_BRS)
 				txdlc |= IFI_CANFD_TXFIFO_DLC_BRS;
+		} else {
+			txdlc |= cf->len;
 		}
+	} else {
+		txdlc |= cf->len;
 	}
 
 	if (cf->can_id & CAN_RTR_FLAG)
-- 
2.7.0
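The branch structure added by this patch can be sketched in userspace roughly as follows. ex_len2dlc() is a stand-in for the kernel's can_len2dlc(), written here from the standard CAN FD length-to-DLC table rather than copied from the kernel; the EX_TXDLC_EDL bit position and both function names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag position; the real IFI_CANFD_TXFIFO_DLC_EDL value
 * is not shown in the quoted patch. */
#define EX_TXDLC_EDL (1u << 6)

/* Map a payload length (0..64 bytes) to a 4-bit DLC, rounding up to the
 * next length that CAN FD can express. */
static uint8_t ex_len2dlc(uint8_t len)
{
	assert(len <= 64);
	if (len <= 8)
		return len;
	if (len <= 12)
		return 9;
	if (len <= 16)
		return 10;
	if (len <= 20)
		return 11;
	if (len <= 24)
		return 12;
	if (len <= 32)
		return 13;
	if (len <= 48)
		return 14;
	return 15;
}

/* Mirror the patched logic: FD frames get the converted DLC plus the EDL
 * flag, everything else gets the raw length (0..8 for classic CAN). */
static uint32_t ex_build_txdlc(uint8_t len, int fd_mode, int is_fd_frame)
{
	uint32_t txdlc = 0;

	if (fd_mode && is_fd_frame)
		txdlc |= ex_len2dlc(len) | EX_TXDLC_EDL;
	else
		txdlc |= len;

	return txdlc;
}
```

Before the fix, neither branch put a length into the word, so the DLC field stayed zero; that matches the reported symptom of a receiver seeing the ID but no data.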
[ANNOUNCE] NetDev 1.1 slides now available
Hi,

Today we're releasing the NetDev 1.1 slides; you can find them at:

http://www.netdevconf.org/1.1/proceedings/

Regarding videos, we're still uploading (~40 hours), so it may take a little while until we make them public. We will send a short notice once they are available.

And a short reminder to talk presenters: don't forget that your paper submission deadline is set for *10th March 2016*.

Thanks.
Re: [PATCH] mrf24j40: fix security-enabled processing on inbound frames
On 02/18/2016 01:34 PM, zopieux wrote:
> Fix the MRF24J40 handling of security-enabled frames so it does not
> block upon receiving such frames.
>
> Signed-off-by: Alexander Aring
> Reported-by: Alexandre Macabies
> Tested-by: Alexandre Macabies
> ---
> When receiving a security-enabled IEEE 802.15.4 frame, the MRF24J40
> triggers a SECIF interrupt that needs to be handled for RX processing to
> keep functioning properly.
>
> This patch enables the SECIF interrupt and makes the MRF ignores all
> hardware processing of security-enabled frames, that is handled by the
> ieee802154 stack instead.
> ---

The "From" field of the email needs to have your real name in it. This will be where the "Author" field in git comes from.

It looks like there are a few separate things happening in this patch. Maybe they should be broken out into separate patches. I see:

1. The ieee802154.h part,
2. The TX part,
3. The RX part.

The patch description only really describes the RX part.

Other than that, the actual code seems OK to me.

Alan.
Re: Softirq priority inversion from "softirq: reduce latencies"
On Mon, 2016-02-29 at 11:13 -0800, Peter Hurley wrote:
> On 02/29/2016 07:27 AM, Eric Dumazet wrote:
> > On Mon, 2016-02-29 at 07:03 -0800, Peter Hurley wrote:
> >
> >> The reason why Eric's change is so effective for Eric's workload is
> >> that it fixes the problem where NET_RX keeps getting new network packets
> >> so it keeps looping, servicing more NET_RX softirq.
> >
> > You have very little idea of what is happening in networking land.
>
> While that is true, I can read a trace:
>
> ** already in NET_RX softirq **
>
> -0 0..s2 15us : kmem_cache_alloc: call_site=c08378e4
> ptr=de55d7c0 bytes_req=192 bytes_alloc=192 gfp_flags=GFP_ATOMIC
> -0 0..s2 23us : netif_receive_skb_entry: dev=eth0 napi_id=0x0
> queue_mapping=0 skbaddr=dca04400 vlan_tagged=0 vlan_proto=0x
> vlan_tci=0x000
> 0 protocol=0x0800 ip_summed=0 hash=0x l4_hash=0 len=88 data_len=0
> truesize=1984 mac_header_valid=1 mac_header=-14 nr_frags=0 gso_size=0
> gso_type=0x0
> -0 0..s2 30us+: netif_receive_skb: dev=eth0 skbaddr=dca04400
> len=88
> -0 0d.s5 98us : sched_waking: comm=sshd pid=750 prio=120
> target_cpu=000
> -0 0d.s6 105us : sched_stat_sleep: comm=sshd pid=750
> delay=3125230447 [ns]
> -0 0dns6 110us+: sched_wakeup: comm=sshd pid=750 prio=120
> target_cpu=000
> -0 0dns4 123us+: timer_start: timer=dc940e9c
> function=tcp_delack_timer expires=9746 [timeout=10] flags=0x
> -0 0dnH3 150us : irq_handler_entry: irq=176
> name=4a10.ethernet
> -0 0dnH3 153us : softirq_raise: vec=3 [action=NET_RX]
> -0 0dnH3 155us : irq_handler_exit: irq=176 ret=handled
> -0 0dnH3 160us : irq_handler_entry: irq=20
> name=4900.edma_ccint
> -0 0dnH3 163us : irq_handler_exit: irq=20 ret=handled
> -0 0.ns2 169us : napi_poll: napi poll on napi struct de465c30
> for device eth0
> -0 0.ns2 171us : softirq_exit: vec=3 [action=NET_RX]
>
>
> As you can see, NET_RX softirq is re-raised while in NET_RX softirq,
> as a result of receiving new packets. So NET_RX will keep looping,
> which is what I wrote.
Well, NET_RX can not be re-raised, it is a single bit flip. It is 'raised' on this trace because the driver already rearmed the IRQ so that hard irq handler could fire. Anyway, it seems you know much better than me, so I will stop answering your mails on this topic.
Re: [PATCH net-next V1 09/10] net/mlx5: Fix global UAR mapping
>
> Well anyone can see that from the code.
>
> You have to explain why.

In simple words, as partially explained in the commit message, we want to have both mappings (NC and WC) available so the upper layer can decide which one to choose. For example, for SQs/QPs in some cases (small packets), and only when WC is supported, we would like to write TX descriptors (WQEs) using the ConnectX BlueFlame feature via the WC mapping. If WC is not supported, the TX descriptors would be posted in the usual way (doorbell) via the NC mapping. This would give a latency boost for small packets.

The problem is that when BlueFlame buffers are posted while the mapping is not WC, i.e. via the NC mapping, the latency gets worse than when writing in the usual way (doorbell). So this is why we use ARCH_HAS_IOREMAP_WC to give a hint to the upper layer whether to use BlueFlame writes (WC) or doorbell writes (NC).

>
> And BTW, ARCH_HAS_IOREMAP_WC doesn't even tell you if the platform
> will actually give you a write-combining mapping.

We did some research after your comment and we are considering removing ARCH_HAS_IOREMAP_WC from the code; we will update the patches soon.

>
> So if it's the driver operates properly if a non-WC mapping is used
> for uar->bf_map, then get rid of this CPP test altogether PLEASE!
>
> Otherwise your driver is buggy, because ARCH_HAS_IOREMAP_WC only says
> whether the default implementation of ioremap_wc() needs to be
> provided by include/asm-generic/iomap.h It does not guarantee that a
> write-combining mapping will be provided.
>
> I really can't think of any reason why you absolutely require a
> WC mapping, and the CPP test just makes your driver look more
> ugly than it needs to me.

The WC mapping is required in order to know whether BlueFlame writes would give better latency or not.

>
> So can you please explain what the hell is happening here and why you
> are doing things this way rather than just reading the code to me?

I hope the above explains what we are trying to do here. I know it is not perfect, but as you know the kernel IO mapping API doesn't tell whether the WC mapping was successful or not, so we used the CPP test. After your comment we understood that it is not perfect, and we are looking into it.

Thanks
[net-next PATCH] net: relax setup_tc ndo op handle restriction
I added this check in setup_tc to multiple drivers, if (handle != TC_H_ROOT || tc->type != TC_SETUP_MQPRIO) Unfortunately restricting to TC_H_ROOT like this breaks the old instantiation of mqprio to setup a hardware qdisc. This patch relaxes the test to only check the type to make it equivalent to the check before I broke it. With this the old instantiation continues to work. A good smoke test is to setup mqprio with, # tc qdisc add dev eth4 root mqprio num_tc 8 \ map 0 1 2 3 4 5 6 7 \ queues 0@0 1@1 2@2 3@3 4@4 5@5 6@6 7@7 Fixes: e4c6734eaab9 ("net: rework ndo tc op to consume additional qdisc handle paramete") Reported-by: Singh Krishneil Reported-by: Jake Keller CC: Murali Karicheri CC: Shradha Shah CC: Or Gerlitz CC: Ariel Elior CC: Jeff Kirsher CC: Bruce Allan CC: Jesse Brandeburg CC: Don Skidmore Signed-off-by: John Fastabend --- drivers/net/ethernet/amd/xgbe/xgbe-drv.c|2 +- drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c |2 +- drivers/net/ethernet/broadcom/bnxt/bnxt.c |2 +- drivers/net/ethernet/intel/fm10k/fm10k_netdev.c |2 +- drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |2 +- drivers/net/ethernet/mellanox/mlx4/en_netdev.c |2 +- drivers/net/ethernet/sfc/tx.c |2 +- drivers/net/ethernet/ti/netcp_core.c|2 +- 8 files changed, 8 insertions(+), 8 deletions(-) diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-drv.c b/drivers/net/ethernet/amd/xgbe/xgbe-drv.c index 3360684..ebf9224 100644 --- a/drivers/net/ethernet/amd/xgbe/xgbe-drv.c +++ b/drivers/net/ethernet/amd/xgbe/xgbe-drv.c @@ -1632,7 +1632,7 @@ static int xgbe_setup_tc(struct net_device *netdev, u32 handle, __be16 proto, struct xgbe_prv_data *pdata = netdev_priv(netdev); u8 tc; - if (handle != TC_H_ROOT || tc_to_netdev->type != TC_SETUP_MQPRIO) + if (tc_to_netdev->type != TC_SETUP_MQPRIO) return -EINVAL; tc = tc_to_netdev->tc; diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c index 45843d1..a949783 100644 --- 
a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c @@ -4275,7 +4275,7 @@ int bnx2x_setup_tc(struct net_device *dev, u8 num_tc) int __bnx2x_setup_tc(struct net_device *dev, u32 handle, __be16 proto, struct tc_to_netdev *tc) { - if (handle != TC_H_ROOT || tc->type != TC_SETUP_MQPRIO) + if (tc->type != TC_SETUP_MQPRIO) return -EINVAL; return bnx2x_setup_tc(dev, tc->tc); } diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c index ff1507f..f1a0a73 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c @@ -5383,7 +5383,7 @@ static int bnxt_setup_tc(struct net_device *dev, u32 handle, __be16 proto, struct bnxt *bp = netdev_priv(dev); u8 tc; - if (handle != TC_H_ROOT || ntc->type != TC_SETUP_MQPRIO) + if (ntc->type != TC_SETUP_MQPRIO) return -EINVAL; tc = ntc->tc; diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c b/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c index dc1a821..d09a8dd 100644 --- a/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c +++ b/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c @@ -1207,7 +1207,7 @@ err_queueing_scheme: static int __fm10k_setup_tc(struct net_device *dev, u32 handle, __be16 proto, struct tc_to_netdev *tc) { - if (handle != TC_H_ROOT || tc->type != TC_SETUP_MQPRIO) + if (tc->type != TC_SETUP_MQPRIO) return -EINVAL; return fm10k_setup_tc(dev, tc->tc); diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c index cf4b729..02139f3 100644 --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c @@ -8422,7 +8422,7 @@ int __ixgbe_setup_tc(struct net_device *dev, u32 handle, __be16 proto, } } - if (handle != TC_H_ROOT || tc->type != TC_SETUP_MQPRIO) + if (tc->type != TC_SETUP_MQPRIO) return -EINVAL; return ixgbe_setup_tc(dev, tc->tc); diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c 
b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c index 96d95cb..a2d560a 100644 --- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c +++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c @@ -72,7 +72,7 @@ int mlx4_en_setup_tc(struct net_device *dev, u8 up) static int __mlx4_en_setup_tc(struct net_device *dev, u32 handle, __be16 proto, struct tc_to_netdev *tc) { - if (handle != TC_H_ROOT || tc->type != TC_SETUP_MQPRIO) + if (tc->type != TC_SETUP_MQPRIO) return -EINVAL; return mlx4_en_setup_tc(dev, tc->tc); diff --git a/drivers/net/ethernet/sfc/
Re: [Patch net-next] net: remove skb_sender_cpu_clear()
On Mon, 2016-02-29 at 10:55 -0800, Cong Wang wrote:
> On Mon, Feb 29, 2016 at 10:50 AM, Daniel Borkmann
> wrote:
> > On 02/28/2016 05:19 AM, Cong Wang wrote:
> >>
> >> After commit 52bd2d62ce67 ("net: better skb->sender_cpu and skb->napi_id
> >> cohabitation")
> >> skb_sender_cpu_clear() becomes empty and can be removed.
> >>
> >> Cc: Eric Dumazet
> >> Signed-off-by: Cong Wang
> >
> >
> > Wasn't the intention to keep this helper as a marker when packet
> > crosses domains from RX to TX, see discussion here:
> >
> > https://patchwork.ozlabs.org/patch/527167/
> >
> > Maybe better to rename it and add a comment into the helper to
> > make the intention more clear?
>
> Since when we need an empty function to mark some call path?
> Isn't this supposed to be done by comments or documents?
>
> BTW, I myself even don't think we need any comment, people
> who touches it should understand it.

I have no objections to this patch.

If we keep the helper, a better name would be needed anyway.
[PATCH net V1 4/7] net/mlx5e: Fix ethtool RX hash func configuration change
From: Tariq Toukan We should modify TIRs explicitly to apply the new RSS configuration. The light ndo close/open calls do not "refresh" them. Fixes: 2d75b2bc8a8c ('net/mlx5e: Add ethtool RSS configuration options') Signed-off-by: Tariq Toukan Signed-off-by: Saeed Mahameed --- drivers/net/ethernet/mellanox/mlx5/core/en.h |3 ++ .../net/ethernet/mellanox/mlx5/core/en_ethtool.c | 34 ++-- drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 27 +-- include/linux/mlx5/mlx5_ifc.h |4 ++- 4 files changed, 46 insertions(+), 22 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h index 614a602..976bddb 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h @@ -447,6 +447,8 @@ enum mlx5e_traffic_types { MLX5E_NUM_TT, }; +#define IS_HASHING_TT(tt) (tt != MLX5E_TT_ANY) + enum mlx5e_rqt_ix { MLX5E_INDIRECTION_RQT, MLX5E_SINGLE_RQ_RQT, @@ -613,6 +615,7 @@ void mlx5e_enable_vlan_filter(struct mlx5e_priv *priv); void mlx5e_disable_vlan_filter(struct mlx5e_priv *priv); int mlx5e_redirect_rqt(struct mlx5e_priv *priv, enum mlx5e_rqt_ix rqt_ix); +void mlx5e_build_tir_ctx_hash(void *tirc, struct mlx5e_priv *priv); int mlx5e_open_locked(struct net_device *netdev); int mlx5e_close_locked(struct net_device *netdev); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c index 65624ac..64af1b0 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c @@ -703,18 +703,36 @@ static int mlx5e_get_rxfh(struct net_device *netdev, u32 *indir, u8 *key, return 0; } +static void mlx5e_modify_tirs_hash(struct mlx5e_priv *priv, void *in, int inlen) +{ + struct mlx5_core_dev *mdev = priv->mdev; + void *tirc = MLX5_ADDR_OF(modify_tir_in, in, ctx); + int i; + + MLX5_SET(modify_tir_in, in, bitmask.hash, 1); + mlx5e_build_tir_ctx_hash(tirc, priv); + + for (i = 0; 
i < MLX5E_NUM_TT; i++) + if (IS_HASHING_TT(i)) + mlx5_core_modify_tir(mdev, priv->tirn[i], in, inlen); +} + static int mlx5e_set_rxfh(struct net_device *dev, const u32 *indir, const u8 *key, const u8 hfunc) { struct mlx5e_priv *priv = netdev_priv(dev); - bool close_open; - int err = 0; + int inlen = MLX5_ST_SZ_BYTES(modify_tir_in); + void *in; if ((hfunc != ETH_RSS_HASH_NO_CHANGE) && (hfunc != ETH_RSS_HASH_XOR) && (hfunc != ETH_RSS_HASH_TOP)) return -EINVAL; + in = mlx5_vzalloc(inlen); + if (!in) + return -ENOMEM; + mutex_lock(&priv->state_lock); if (indir) { @@ -723,11 +741,6 @@ static int mlx5e_set_rxfh(struct net_device *dev, const u32 *indir, mlx5e_redirect_rqt(priv, MLX5E_INDIRECTION_RQT); } - close_open = (key || (hfunc != ETH_RSS_HASH_NO_CHANGE)) && -test_bit(MLX5E_STATE_OPENED, &priv->state); - if (close_open) - mlx5e_close_locked(dev); - if (key) memcpy(priv->params.toeplitz_hash_key, key, sizeof(priv->params.toeplitz_hash_key)); @@ -735,12 +748,13 @@ static int mlx5e_set_rxfh(struct net_device *dev, const u32 *indir, if (hfunc != ETH_RSS_HASH_NO_CHANGE) priv->params.rss_hfunc = hfunc; - if (close_open) - err = mlx5e_open_locked(priv->netdev); + mlx5e_modify_tirs_hash(priv, in, inlen); mutex_unlock(&priv->state_lock); - return err; + kvfree(in); + + return 0; } static int mlx5e_get_rxnfc(struct net_device *netdev, diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c index 137b05e..34b1049 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c @@ -1317,6 +1317,21 @@ static void mlx5e_build_tir_ctx_lro(void *tirc, struct mlx5e_priv *priv) lro_timer_supported_periods[2])); } +void mlx5e_build_tir_ctx_hash(void *tirc, struct mlx5e_priv *priv) +{ + MLX5_SET(tirc, tirc, rx_hash_fn, +mlx5e_rx_hash_fn(priv->params.rss_hfunc)); + if (priv->params.rss_hfunc == ETH_RSS_HASH_TOP) { + void *rss_key = MLX5_ADDR_OF(tirc, tirc, 
+rx_hash_toeplitz_key); + size_t len = MLX5_FLD_SZ_BYTES(tirc, + rx_hash_toeplitz_key); + + MLX5_SET(tirc, tirc, rx_hash_symmetric, 1); + memcpy(rss_key, priv->params.toeplitz_hash_key, len); + } +} + static int mlx5e_modify_tirs_lro(stru
[PATCH net V1 7/7] net/mlx5e: Provide correct packet/bytes statistics
From: Gal Pressman Using the HW VPort counters for traffic (rx/tx packets/bytes) statistics is wrong. This is because frames dropped due to steering or out of buffer will be counted as received. To fix that, we move to use the packet/bytes accounting done by the driver for what the netdev reports out. Fixes: f62b8bb8f2d3 ('net/mlx5: Extend mlx5_core to support [...]') Signed-off-by: Gal Pressman Signed-off-by: Saeed Mahameed --- drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 25 ++-- 1 files changed, 8 insertions(+), 17 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c index 02689ca..402994b 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c @@ -141,6 +141,10 @@ void mlx5e_update_stats(struct mlx5e_priv *priv) return; /* Collect firts the SW counters and then HW for consistency */ + s->rx_packets = 0; + s->rx_bytes = 0; + s->tx_packets = 0; + s->tx_bytes = 0; s->tso_packets = 0; s->tso_bytes= 0; s->tx_queue_stopped = 0; @@ -155,6 +159,8 @@ void mlx5e_update_stats(struct mlx5e_priv *priv) for (i = 0; i < priv->params.num_channels; i++) { rq_stats = &priv->channel[i]->rq.stats; + s->rx_packets += rq_stats->packets; + s->rx_bytes += rq_stats->bytes; s->lro_packets += rq_stats->lro_packets; s->lro_bytes+= rq_stats->lro_bytes; s->rx_csum_none += rq_stats->csum_none; @@ -164,6 +170,8 @@ void mlx5e_update_stats(struct mlx5e_priv *priv) for (j = 0; j < priv->params.num_tc; j++) { sq_stats = &priv->channel[i]->sq[j].stats; + s->tx_packets += sq_stats->packets; + s->tx_bytes += sq_stats->bytes; s->tso_packets += sq_stats->tso_packets; s->tso_bytes+= sq_stats->tso_bytes; s->tx_queue_stopped += sq_stats->stopped; @@ -225,23 +233,6 @@ void mlx5e_update_stats(struct mlx5e_priv *priv) s->tx_broadcast_bytes = MLX5_GET_CTR(out, transmitted_eth_broadcast.octets); - s->rx_packets = - s->rx_unicast_packets + - s->rx_multicast_packets + - 
s->rx_broadcast_packets; - s->rx_bytes = - s->rx_unicast_bytes + - s->rx_multicast_bytes + - s->rx_broadcast_bytes; - s->tx_packets = - s->tx_unicast_packets + - s->tx_multicast_packets + - s->tx_broadcast_packets; - s->tx_bytes = - s->tx_unicast_bytes + - s->tx_multicast_bytes + - s->tx_broadcast_bytes; - /* Update calculated offload counters */ s->tx_csum_offload = s->tx_packets - tx_offload_none; s->rx_csum_good= s->rx_packets - s->rx_csum_none - -- 1.7.1
[PATCH net V1 0/7] Mellanox 100G mlx5 driver fixes
Hi Dave,

This series has a few bug fixes for the mlx5 Ethernet driver.

Eran fixed a locking issue with time-stamping that could cause a soft lockup when time-stamping is enabled.

Gal fixed the rx/tx packets/bytes counters returned by the driver so that they reflect the traffic that actually went through the network stack.

Tariq removed a poll CQ optimization which could lead the driver to stop getting interrupts for some of the rings, and also fixed HW LRO, which is currently broken. He also provided RSS and RX hash fixes for the case where changing the number of rx rings leaves the RX hash/RSS configuration out of sync.

The time-stamping fix from Eran is not for -stable, as the feature was only introduced in 4.5, but all of the others are.

Changes from V0:
 - Eran addressed the irqsave/restore comments from "Dave" and fixed them.

This series is generated against net commit 4c0b6eaf373a 'net: thunderx: Fix for Qset error due to CQ full'

Saeed.

Eran Ben Elisha (1):
  net/mlx5e: Fix soft lockup when HW Timestamping is enabled

Gal Pressman (2):
  net/mlx5e: Add rx/tx bytes software counters
  net/mlx5e: Provide correct packet/bytes statistics

Tariq Toukan (4):
  net/mlx5e: Remove wrong poll CQ optimization
  net/mlx5e: Fix LRO modify
  net/mlx5e: Fix ethtool RX hash func configuration change
  net/mlx5e: Correctly handle RSS indirection table when changing number of channels

 drivers/net/ethernet/mellanox/mlx5/core/en.h | 18 +++--
 drivers/net/ethernet/mellanox/mlx5/core/en_clock.c | 25 ---
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c | 36 ++---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 82 +++-
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c |8 +--
 drivers/net/ethernet/mellanox/mlx5/core/en_tx.c | 19 ++---
 drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c |1 -
 include/linux/mlx5/mlx5_ifc.h |4 +-
 8 files changed, 109 insertions(+), 84 deletions(-)
[PATCH net V1 5/7] net/mlx5e: Correctly handle RSS indirection table when changing number of channels
From: Tariq Toukan Upon changing num_channels, reset the RSS indirection table to match the new value. Fixes: 2d75b2bc8a8c ('net/mlx5e: Add ethtool RSS configuration options') Signed-off-by: Tariq Toukan Signed-off-by: Saeed Mahameed --- drivers/net/ethernet/mellanox/mlx5/core/en.h |2 ++ .../net/ethernet/mellanox/mlx5/core/en_ethtool.c |2 ++ drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 15 +++ 3 files changed, 15 insertions(+), 4 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h index 976bddb..d0a57d5 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h @@ -619,6 +619,8 @@ void mlx5e_build_tir_ctx_hash(void *tirc, struct mlx5e_priv *priv); int mlx5e_open_locked(struct net_device *netdev); int mlx5e_close_locked(struct net_device *netdev); +void mlx5e_build_default_indir_rqt(u32 *indirection_rqt, int len, + int num_channels); static inline void mlx5e_tx_notify_hw(struct mlx5e_sq *sq, struct mlx5e_tx_wqe *wqe, int bf_sz) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c index 64af1b0..5abeb00 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c @@ -385,6 +385,8 @@ static int mlx5e_set_channels(struct net_device *dev, mlx5e_close_locked(dev); priv->params.num_channels = count; + mlx5e_build_default_indir_rqt(priv->params.indirection_rqt, + MLX5E_INDIR_RQT_SIZE, count); if (was_opened) err = mlx5e_open_locked(dev); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c index 34b1049..02689ca 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c @@ -1199,7 +1199,6 @@ static void mlx5e_fill_indir_rqt_rqns(struct mlx5e_priv *priv, void *rqtc) ix = mlx5e_bits_invert(i, 
MLX5E_LOG_INDIR_RQT_SIZE); ix = priv->params.indirection_rqt[ix]; - ix = ix % priv->params.num_channels; MLX5_SET(rqtc, rqtc, rq_num[i], test_bit(MLX5E_STATE_OPENED, &priv->state) ? priv->channel[ix]->rq.rqn : @@ -2101,12 +2100,20 @@ u16 mlx5e_get_max_inline_cap(struct mlx5_core_dev *mdev) 2 /*sizeof(mlx5e_tx_wqe.inline_hdr_start)*/; } +void mlx5e_build_default_indir_rqt(u32 *indirection_rqt, int len, + int num_channels) +{ + int i; + + for (i = 0; i < len; i++) + indirection_rqt[i] = i % num_channels; +} + static void mlx5e_build_netdev_priv(struct mlx5_core_dev *mdev, struct net_device *netdev, int num_channels) { struct mlx5e_priv *priv = netdev_priv(netdev); - int i; priv->params.log_sq_size = MLX5E_PARAMS_DEFAULT_LOG_SQ_SIZE; @@ -2130,8 +2137,8 @@ static void mlx5e_build_netdev_priv(struct mlx5_core_dev *mdev, netdev_rss_key_fill(priv->params.toeplitz_hash_key, sizeof(priv->params.toeplitz_hash_key)); - for (i = 0; i < MLX5E_INDIR_RQT_SIZE; i++) - priv->params.indirection_rqt[i] = i % num_channels; + mlx5e_build_default_indir_rqt(priv->params.indirection_rqt, + MLX5E_INDIR_RQT_SIZE, num_channels); priv->params.lro_wqe_sz= MLX5E_PARAMS_DEFAULT_LRO_WQE_SZ; -- 1.7.1
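The helper introduced by this patch is small enough to try in userspace. The sketch below follows the same `i % num_channels` fill as mlx5e_build_default_indir_rqt(); the ex_ name, the fixed-width types and the added assert are illustrative choices, not part of the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Fill an RSS indirection table so that entries cycle over the available
 * channels: 0, 1, ..., num_channels - 1, 0, 1, ... */
static void ex_build_default_indir(uint32_t *table, int len, int num_channels)
{
	assert(num_channels > 0);

	for (int i = 0; i < len; i++)
		table[i] = (uint32_t)(i % num_channels);
}
```

Rebuilding the table whenever the channel count changes keeps every entry below the new count, which seems to be what allows the patch to drop the `ix % priv->params.num_channels` fixup at lookup time.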
[PATCH net V1 6/7] net/mlx5e: Add rx/tx bytes software counters
From: Gal Pressman Sum up rx/tx bytes in software as we do for rx/tx packets, to be reported in upcoming statistics fix. Signed-off-by: Gal Pressman Signed-off-by: Saeed Mahameed --- drivers/net/ethernet/mellanox/mlx5/core/en.h|8 ++-- drivers/net/ethernet/mellanox/mlx5/core/en_rx.c |1 + drivers/net/ethernet/mellanox/mlx5/core/en_tx.c |9 ++--- 3 files changed, 13 insertions(+), 5 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h index d0a57d5..5b17532 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h @@ -223,6 +223,7 @@ struct mlx5e_pport_stats { static const char rq_stats_strings[][ETH_GSTRING_LEN] = { "packets", + "bytes", "csum_none", "csum_sw", "lro_packets", @@ -232,16 +233,18 @@ static const char rq_stats_strings[][ETH_GSTRING_LEN] = { struct mlx5e_rq_stats { u64 packets; + u64 bytes; u64 csum_none; u64 csum_sw; u64 lro_packets; u64 lro_bytes; u64 wqe_err; -#define NUM_RQ_STATS 6 +#define NUM_RQ_STATS 7 }; static const char sq_stats_strings[][ETH_GSTRING_LEN] = { "packets", + "bytes", "tso_packets", "tso_bytes", "csum_offload_none", @@ -253,6 +256,7 @@ static const char sq_stats_strings[][ETH_GSTRING_LEN] = { struct mlx5e_sq_stats { u64 packets; + u64 bytes; u64 tso_packets; u64 tso_bytes; u64 csum_offload_none; @@ -260,7 +264,7 @@ struct mlx5e_sq_stats { u64 wake; u64 dropped; u64 nop; -#define NUM_SQ_STATS 8 +#define NUM_SQ_STATS 9 }; struct mlx5e_stats { diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c index 3fd6a58..59658b9 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c @@ -263,6 +263,7 @@ int mlx5e_poll_rx_cq(struct mlx5e_cq *cq, int budget) mlx5e_build_rx_skb(cqe, rq, skb); rq->stats.packets++; + rq->stats.bytes += be32_to_cpu(cqe->byte_cnt); napi_gro_receive(cq->napi, skb); wq_ll_pop: diff --git 
a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c index 2beea8c..bb4eeeb 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c @@ -179,6 +179,7 @@ static netdev_tx_t mlx5e_sq_xmit(struct mlx5e_sq *sq, struct sk_buff *skb) unsigned int skb_len = skb->len; u8 opcode = MLX5_OPCODE_SEND; dma_addr_t dma_addr = 0; + unsigned int num_bytes; bool bf = false; u16 headlen; u16 ds_cnt; @@ -204,8 +205,7 @@ static netdev_tx_t mlx5e_sq_xmit(struct mlx5e_sq *sq, struct sk_buff *skb) opcode = MLX5_OPCODE_LSO; ihs = skb_transport_offset(skb) + tcp_hdrlen(skb); payload_len = skb->len - ihs; - wi->num_bytes = skb->len + - (skb_shinfo(skb)->gso_segs - 1) * ihs; + num_bytes = skb->len + (skb_shinfo(skb)->gso_segs - 1) * ihs; sq->stats.tso_packets++; sq->stats.tso_bytes += payload_len; } else { @@ -213,9 +213,11 @@ static netdev_tx_t mlx5e_sq_xmit(struct mlx5e_sq *sq, struct sk_buff *skb) !skb->xmit_more && !skb_shinfo(skb)->nr_frags; ihs = mlx5e_get_inline_hdr_size(sq, skb, bf); - wi->num_bytes = max_t(unsigned int, skb->len, ETH_ZLEN); + num_bytes = max_t(unsigned int, skb->len, ETH_ZLEN); } + wi->num_bytes = num_bytes; + if (skb_vlan_tag_present(skb)) { mlx5e_insert_vlan(eseg->inline_hdr_start, skb, ihs, &skb_data, &skb_len); @@ -307,6 +309,7 @@ static netdev_tx_t mlx5e_sq_xmit(struct mlx5e_sq *sq, struct sk_buff *skb) sq->bf_budget = bf ? sq->bf_budget - 1 : 0; sq->stats.packets++; + sq->stats.bytes += num_bytes; return NETDEV_TX_OK; dma_unmap_wqe_err: -- 1.7.1
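The byte accounting in mlx5e_sq_xmit() can be illustrated with a small userspace sketch. It reuses the two formulas visible in the diff: for LSO, the headers are counted once per segment; otherwise the length is padded up to the minimum Ethernet frame size. The function name, the is_lso flag and EX_ETH_ZLEN are stand-ins chosen for this example (ETH_ZLEN is 60 in the kernel headers).

```c
#include <assert.h>

#define EX_ETH_ZLEN 60u /* minimum Ethernet frame length without FCS */

/* Bytes that go on the wire for one transmitted skb.
 * skb_len:  total skb length (headers once + all payload)
 * gso_segs: number of segments the hardware will emit (LSO only)
 * ihs:      inline header size, i.e. headers replicated per segment */
static unsigned int ex_tx_num_bytes(unsigned int skb_len, unsigned int gso_segs,
				    unsigned int ihs, int is_lso)
{
	if (is_lso) {
		assert(gso_segs >= 1);
		/* The skb already contains one copy of the headers. */
		return skb_len + (gso_segs - 1) * ihs;
	}

	/* Short frames are padded to the Ethernet minimum. */
	return skb_len < EX_ETH_ZLEN ? EX_ETH_ZLEN : skb_len;
}
```

For example, an LSO skb with 66 bytes of headers and two 1448-byte segments has skb_len = 2962 and accounts for 3028 bytes, i.e. two full 1514-byte frames.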
[PATCH net V1 3/7] net/mlx5e: Fix soft lockup when HW Timestamping is enabled
From: Eran Ben Elisha

Readers/Writers lock for SW timecounter was acquired without disabling
interrupts on local CPU.

The problematic scenario:
* HW timestamping is enabled
* Timestamp overflow periodic service task is running on local CPU and
  holding write_lock for SW timecounter
* Completion arrives, triggers interrupt for local CPU.
  Interrupt routine calls napi_schedule(), which triggers rx/tx
  skb process.
  An attempt to read SW timecounter using read_lock is done, which is
  already locked by a writer on the same CPU and cause soft lockup.

Add irqsave/irqrestore for when using the readers/writers lock for
writing.

Fixes: ef9814deafd0 ('net/mlx5e: Add HW timestamping (TS) support')
Signed-off-by: Eran Ben Elisha
Signed-off-by: Saeed Mahameed
---
 drivers/net/ethernet/mellanox/mlx5/core/en_clock.c |   25
 1 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_clock.c b/drivers/net/ethernet/mellanox/mlx5/core/en_clock.c
index be65435..2018eeb 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_clock.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_clock.c
@@ -62,10 +62,11 @@ static void mlx5e_timestamp_overflow(struct work_struct *work)
 	struct delayed_work *dwork = to_delayed_work(work);
 	struct mlx5e_tstamp *tstamp = container_of(dwork, struct mlx5e_tstamp,
 						   overflow_work);
+	unsigned long flags;

-	write_lock(&tstamp->lock);
+	write_lock_irqsave(&tstamp->lock, flags);
 	timecounter_read(&tstamp->clock);
-	write_unlock(&tstamp->lock);
+	write_unlock_irqrestore(&tstamp->lock, flags);
 	schedule_delayed_work(&tstamp->overflow_work, tstamp->overflow_period);
 }

@@ -136,10 +137,11 @@ static int mlx5e_ptp_settime(struct ptp_clock_info *ptp,
 	struct mlx5e_tstamp *tstamp = container_of(ptp, struct mlx5e_tstamp,
 						   ptp_info);
 	u64 ns = timespec64_to_ns(ts);
+	unsigned long flags;

-	write_lock(&tstamp->lock);
+	write_lock_irqsave(&tstamp->lock, flags);
 	timecounter_init(&tstamp->clock, &tstamp->cycles, ns);
-	write_unlock(&tstamp->lock);
+	write_unlock_irqrestore(&tstamp->lock, flags);

 	return 0;
 }
@@ -150,10 +152,11 @@ static int mlx5e_ptp_gettime(struct ptp_clock_info *ptp,
 	struct mlx5e_tstamp *tstamp = container_of(ptp, struct mlx5e_tstamp,
 						   ptp_info);
 	u64 ns;
+	unsigned long flags;

-	write_lock(&tstamp->lock);
+	write_lock_irqsave(&tstamp->lock, flags);
 	ns = timecounter_read(&tstamp->clock);
-	write_unlock(&tstamp->lock);
+	write_unlock_irqrestore(&tstamp->lock, flags);

 	*ts = ns_to_timespec64(ns);

@@ -164,10 +167,11 @@ static int mlx5e_ptp_adjtime(struct ptp_clock_info *ptp, s64 delta)
 {
 	struct mlx5e_tstamp *tstamp = container_of(ptp, struct mlx5e_tstamp,
 						   ptp_info);
+	unsigned long flags;

-	write_lock(&tstamp->lock);
+	write_lock_irqsave(&tstamp->lock, flags);
 	timecounter_adjtime(&tstamp->clock, delta);
-	write_unlock(&tstamp->lock);
+	write_unlock_irqrestore(&tstamp->lock, flags);

 	return 0;
 }
@@ -176,6 +180,7 @@ static int mlx5e_ptp_adjfreq(struct ptp_clock_info *ptp, s32 delta)
 {
 	u64 adj;
 	u32 diff;
+	unsigned long flags;
 	int neg_adj = 0;
 	struct mlx5e_tstamp *tstamp = container_of(ptp, struct mlx5e_tstamp,
 						   ptp_info);
@@ -189,11 +194,11 @@ static int mlx5e_ptp_adjfreq(struct ptp_clock_info *ptp, s32 delta)
 	adj *= delta;
 	diff = div_u64(adj, 10ULL);

-	write_lock(&tstamp->lock);
+	write_lock_irqsave(&tstamp->lock, flags);
 	timecounter_read(&tstamp->clock);
 	tstamp->cycles.mult = neg_adj ? tstamp->nominal_c_mult - diff :
 					tstamp->nominal_c_mult + diff;
-	write_unlock(&tstamp->lock);
+	write_unlock_irqrestore(&tstamp->lock, flags);

 	return 0;
 }
--
1.7.1
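The scenario in the commit message above can be replayed with a toy single-CPU model. This is only an illustrative userspace sketch: `struct cpu_model`, `irq_handler()`, `maybe_deliver_irq()` and `run_writer()` are hypothetical names, no real locking primitive is involved, and "deadlock" simply means the interrupt handler wanted the lock while the interrupted code on the same CPU was holding it for writing.

```c
#include <stdbool.h>

/* Toy single-CPU model: one rwlock and one pending completion interrupt. */
struct cpu_model {
	bool write_locked;	/* a writer on this CPU holds the lock */
	bool irqs_enabled;	/* local interrupts are enabled */
	bool irq_pending;	/* a completion interrupt is waiting */
	bool deadlocked;	/* reader spun on a lock held by this same CPU */
};

/* Interrupt path: NAPI processing wants read_lock() on the timecounter. */
static void irq_handler(struct cpu_model *c)
{
	if (c->write_locked)
		c->deadlocked = true;	/* the lock holder can never run again */
}

/* Deliver the pending interrupt only if the CPU currently accepts it. */
static void maybe_deliver_irq(struct cpu_model *c)
{
	if (c->irq_pending && c->irqs_enabled) {
		c->irq_pending = false;
		irq_handler(c);
	}
}

/* Writer critical section; irqsave selects the fixed or the buggy variant. */
static bool run_writer(bool irqsave)
{
	struct cpu_model c = { .irqs_enabled = true };
	bool saved = c.irqs_enabled;

	if (irqsave)
		c.irqs_enabled = false;		/* models write_lock_irqsave() */
	c.write_locked = true;

	c.irq_pending = true;			/* completion arrives right now */
	maybe_deliver_irq(&c);

	c.write_locked = false;
	if (irqsave)
		c.irqs_enabled = saved;		/* models write_unlock_irqrestore() */
	maybe_deliver_irq(&c);			/* deferred interrupt runs safely */

	return c.deadlocked;
}
```

In this model `run_writer(false)` reports the lockup, while `run_writer(true)` defers the interrupt until the lock has been dropped.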
[PATCH net V1 1/7] net/mlx5e: Remove wrong poll CQ optimization
From: Tariq Toukan

With the MLX5E_CQ_HAS_CQES optimization flag, the following buggy
flow might occur:
- Suppose RX is always busy, TX has a single packet every second.
- We poll a single TX cqe and clear its flag.
- We never arm it again as RX is always busy.
- TX CQ flag is never changed, and new TX cqes are not polled.

We revert this optimization.

Fixes: e586b3b0baee ('net/mlx5: Ethernet Datapath files')
Signed-off-by: Tariq Toukan
Signed-off-by: Saeed Mahameed
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h      |    5 -
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c   |    7 ---
 drivers/net/ethernet/mellanox/mlx5/core/en_tx.c   |   10 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c |    1 -
 4 files changed, 1 insertions(+), 22 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index aac071a..614a602 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -304,14 +304,9 @@ enum {
 	MLX5E_RQ_STATE_POST_WQES_ENABLE,
 };

-enum cq_flags {
-	MLX5E_CQ_HAS_CQES = 1,
-};
-
 struct mlx5e_cq {
 	/* data path - accessed per cqe */
 	struct mlx5_cqwq wq;
-	unsigned long flags;

 	/* data path - accessed per napi poll */
 	struct napi_struct *napi;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index dd959d9..3fd6a58 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -230,10 +230,6 @@ int mlx5e_poll_rx_cq(struct mlx5e_cq *cq, int budget)
 	struct mlx5e_rq *rq = container_of(cq, struct mlx5e_rq, cq);
 	int work_done;

-	/* avoid accessing cq (dma coherent memory) if not needed */
-	if (!test_and_clear_bit(MLX5E_CQ_HAS_CQES, &cq->flags))
-		return 0;
-
 	for (work_done = 0; work_done < budget; work_done++) {
 		struct mlx5e_rx_wqe *wqe;
 		struct mlx5_cqe64 *cqe;
@@ -279,8 +275,5 @@ wq_ll_pop:
 	/* ensure cq space is freed before enabling more cqes */
 	wmb();

-	if (work_done == budget)
-		set_bit(MLX5E_CQ_HAS_CQES, &cq->flags);
-
 	return work_done;
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
index 2c3fba0..2beea8c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
@@ -335,10 +335,6 @@ bool mlx5e_poll_tx_cq(struct mlx5e_cq *cq)
 	u16 sqcc;
 	int i;

-	/* avoid accessing cq (dma coherent memory) if not needed */
-	if (!test_and_clear_bit(MLX5E_CQ_HAS_CQES, &cq->flags))
-		return false;
-
 	sq = container_of(cq, struct mlx5e_sq, cq);

 	npkts = 0;
@@ -422,10 +418,6 @@ bool mlx5e_poll_tx_cq(struct mlx5e_cq *cq)
 		netif_tx_wake_queue(sq->txq);
 		sq->stats.wake++;
 	}
-	if (i == MLX5E_TX_CQ_POLL_BUDGET) {
-		set_bit(MLX5E_CQ_HAS_CQES, &cq->flags);
-		return true;
-	}

-	return false;
+	return (i == MLX5E_TX_CQ_POLL_BUDGET);
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
index 4ac8d71..66d51a7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
@@ -88,7 +88,6 @@ void mlx5e_completion_event(struct mlx5_core_cq *mcq)
 {
 	struct mlx5e_cq *cq = container_of(mcq, struct mlx5e_cq, mcq);

-	set_bit(MLX5E_CQ_HAS_CQES, &cq->flags);
 	set_bit(MLX5E_CHANNEL_NAPI_SCHED, &cq->channel->flags);
 	barrier();
 	napi_schedule(cq->napi);
--
1.7.1
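The buggy flow listed in the commit message above can be replayed with a small userspace model. This is a sketch under stated assumptions, not the driver: `struct toy_cq`, `hw_complete()`, `poll_tx()` and `stale_tx_cqes()` are hypothetical names, the `armed` field stands in for CQ arming, and "RX always busy" is modelled by never re-arming the TX CQ between polls.

```c
#include <stdbool.h>

#define TX_BUDGET 128

struct toy_cq {
	bool has_cqes;	/* models the MLX5E_CQ_HAS_CQES bit */
	bool armed;	/* an event is raised only while the CQ is armed */
	int pending;	/* completions waiting in the queue */
};

/* Hardware posts one completion; an event fires only if the CQ is armed. */
static void hw_complete(struct toy_cq *cq)
{
	cq->pending++;
	if (cq->armed) {
		cq->armed = false;
		cq->has_cqes = true;	/* what the completion event used to do */
	}
}

/* TX poll; use_flag selects the removed optimization. Returns "still busy". */
static bool poll_tx(struct toy_cq *cq, bool use_flag)
{
	int done = 0;

	if (use_flag) {
		if (!cq->has_cqes)
			return false;	/* skip touching the CQ at all */
		cq->has_cqes = false;
	}
	while (cq->pending > 0 && done < TX_BUDGET) {
		cq->pending--;
		done++;
	}
	if (use_flag && done == TX_BUDGET)
		cq->has_cqes = true;

	return done == TX_BUDGET;
}

/* Two NAPI rounds with RX permanently busy, so no CQ is ever re-armed. */
static int stale_tx_cqes(bool use_flag)
{
	struct toy_cq tx = { .armed = true };

	hw_complete(&tx);		/* first packet: the event sets the flag */
	(void)poll_tx(&tx, use_flag);	/* cqe polled, flag cleared, no re-arm */
	hw_complete(&tx);		/* second packet: CQ not armed, no event */
	(void)poll_tx(&tx, use_flag);

	return tx.pending;		/* completions nobody looked at */
}
```

With the flag, the second completion stays in the queue; without it, every poll looks at the CQ and drains it.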
[PATCH net V1 2/7] net/mlx5e: Fix LRO modify
From: Tariq Toukan

Ethtool LRO enable/disable is broken, as of today we only modify TCP
TIRs in order to apply the requested configuration.

Hardware requires that all TIRs pointing to the same RQ should share the
same LRO configuration. For that all other TIRs' LRO fields must be
modified as well.

Fixes: 5c50368f3831 ('net/mlx5e: Light-weight netdev open/stop')
Signed-off-by: Tariq Toukan
Signed-off-by: Saeed Mahameed
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c |   15 +++
 1 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index d4e1c30..137b05e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -1317,7 +1317,7 @@ static void mlx5e_build_tir_ctx_lro(void *tirc, struct mlx5e_priv *priv)
 			  lro_timer_supported_periods[2]));
 }

-static int mlx5e_modify_tir_lro(struct mlx5e_priv *priv, int tt)
+static int mlx5e_modify_tirs_lro(struct mlx5e_priv *priv)
 {
 	struct mlx5_core_dev *mdev = priv->mdev;

@@ -1325,6 +1325,7 @@ static int mlx5e_modify_tir_lro(struct mlx5e_priv *priv, int tt)
 	void *tirc;
 	int inlen;
 	int err;
+	int tt;

 	inlen = MLX5_ST_SZ_BYTES(modify_tir_in);
 	in = mlx5_vzalloc(inlen);
@@ -1336,7 +1337,11 @@ static int mlx5e_modify_tir_lro(struct mlx5e_priv *priv, int tt)

 	mlx5e_build_tir_ctx_lro(tirc, priv);

-	err = mlx5_core_modify_tir(mdev, priv->tirn[tt], in, inlen);
+	for (tt = 0; tt < MLX5E_NUM_TT; tt++) {
+		err = mlx5_core_modify_tir(mdev, priv->tirn[tt], in, inlen);
+		if (err)
+			break;
+	}

 	kvfree(in);

@@ -1885,8 +1890,10 @@ static int mlx5e_set_features(struct net_device *netdev,
 			mlx5e_close_locked(priv->netdev);

 		priv->params.lro_en = !!(features & NETIF_F_LRO);
-		mlx5e_modify_tir_lro(priv, MLX5E_TT_IPV4_TCP);
-		mlx5e_modify_tir_lro(priv, MLX5E_TT_IPV6_TCP);
+		err = mlx5e_modify_tirs_lro(priv);
+		if (err)
+			mlx5_core_warn(priv->mdev, "lro modify failed, %d\n",
+				       err);

 		if (was_opened)
 			err = mlx5e_open_locked(priv->netdev);
--
1.7.1
Re: Softirq priority inversion from "softirq: reduce latencies"
On Mon, 29 Feb 2016, Peter Hurley wrote:
> On 02/29/2016 10:24 AM, Eric Dumazet wrote:
> >> Just to be clear
> >>
> >>	if (time_before(jiffies, end) && !need_resched() &&
> >>	    --max_restart)
> >>		goto restart;
> >>
> >> aborts softirq *even if 0ns have elapsed*, if NET_RX has woken a process.
> >
> > Sure, now remove the 1st and 2nd condition.
>
> Well just removing the 2nd condition has everything working fine,
> because that fixes the priority inversion.

No. It does not fix anything. It hides the shortcomings of the driver.

> However, when system resources are _not_ contended, it makes no
> sense to be forced to revert to ksoftirqd resolution, which is strictly
> intended as fallback.

No. You claim it is simply because your driver does not handle that
situation properly.

> Or flipping your argument on its head, why not just _always_ execute
> softirq in ksoftirqd?

Which is what that change effectively does. And that makes a lot of sense,
because you get the softirq load under scheduler control and do not let the
softirq run as a context stealing entity which is completely uncontrollable
by the scheduler.

Running the softirq on return from interrupt can cause real priority
inversions.

Thanks,

	tglx
Re: Softirq priority inversion from "softirq: reduce latencies"
On 02/29/2016 07:27 AM, Eric Dumazet wrote:
> On lun., 2016-02-29 at 07:03 -0800, Peter Hurley wrote:
>
>> The reason why Eric's change is so effective for Eric's workload is
>> that it fixes the problem where NET_RX keeps getting new network packets
>> so it keeps looping, servicing more NET_RX softirq.
>
> You have very little idea of what is happening in networking land.

While that is true, I can read a trace:

** already in NET_RX softirq **
-0 0..s2 15us : kmem_cache_alloc: call_site=c08378e4 ptr=de55d7c0 bytes_req=192 bytes_alloc=192 gfp_flags=GFP_ATOMIC
-0 0..s2 23us : netif_receive_skb_entry: dev=eth0 napi_id=0x0 queue_mapping=0 skbaddr=dca04400 vlan_tagged=0 vlan_proto=0x vlan_tci=0x000 0 protocol=0x0800 ip_summed=0 hash=0x l4_hash=0 len=88 data_len=0 truesize=1984 mac_header_valid=1 mac_header=-14 nr_frags=0 gso_size=0 gso_type=0x0
-0 0..s2 30us+: netif_receive_skb: dev=eth0 skbaddr=dca04400 len=88
-0 0d.s5 98us : sched_waking: comm=sshd pid=750 prio=120 target_cpu=000
-0 0d.s6 105us : sched_stat_sleep: comm=sshd pid=750 delay=3125230447 [ns]
-0 0dns6 110us+: sched_wakeup: comm=sshd pid=750 prio=120 target_cpu=000
-0 0dns4 123us+: timer_start: timer=dc940e9c function=tcp_delack_timer expires=9746 [timeout=10] flags=0x
-0 0dnH3 150us : irq_handler_entry: irq=176 name=4a10.ethernet
-0 0dnH3 153us : softirq_raise: vec=3 [action=NET_RX]
-0 0dnH3 155us : irq_handler_exit: irq=176 ret=handled
-0 0dnH3 160us : irq_handler_entry: irq=20 name=4900.edma_ccint
-0 0dnH3 163us : irq_handler_exit: irq=20 ret=handled
-0 0.ns2 169us : napi_poll: napi poll on napi struct de465c30 for device eth0
-0 0.ns2 171us : softirq_exit: vec=3 [action=NET_RX]

As you can see, NET_RX softirq is re-raised while in NET_RX softirq, as a
result of receiving new packets. So NET_RX will keep looping, which is what
I wrote.

> Once hard irq for RX has triggered, we arm a NAPI (NET_RX softirq), and
> no more irq will come unless the napi handler ran. Then when NAPI is
> complete, we re-allow interrupt to be delivered when a new packet is
> coming.
>
> Yes, ksoftirqd runs under load, and this is _wanted_.
>
> Sure, it might add a latency if some high prio task is wanting the same
> cpu, but this is exactly the purpose of having multi tasking.
>
>
Re: [PATCH net-next v3 0/5] bridge/ovs: avoid skb head copy on frame forwarding
On Fri, Feb 26, 2016 at 1:45 AM, Paolo Abeni wrote:
> Currently, while when an OVS or Linux bridge is used to forward frames towards
> some tunnel device, a skb_head_copy() may occur if the ingress device do not
> provide enough headroom for the tx encapsulation.
>
> This patch series tries to address the issue implementing a new ndo operation
> to
> allow the master device to control the headroom used when allocating the skb
> on
> frame reception.
>
> Said operation is used by the Linux bridge to notify the bridged ports of
> needed_headroom changes, and similar bookkeeping and behaviour is also added
> to
> openvswitch, on a per datapath basis.
>
> Finally, the operation is implemented for veth and tun device, which give
> performance improvement in the 6-12% range when forwarding frames from said
> devices towards a vxlan tunnel.
>
> v2:
> - fix netdev_get_fwd_headroom() behaviour
> - remove some code duplication with the netdev_set_rx_headroom() and
>   netdev_reset_rx_headroom() helpers
> - handle headroom reset on [v]port removal/deletion
> - initialize tun align to the old default value
>
> v3:
> - fix a comment typo
>

Patch series looks good to me.

Acked-by: Pravin B Shelar
Re: [net-next PATCH v3 1/3] net: sched: consolidate offload decision in cls_u32
Mon, Feb 29, 2016 at 07:40:53PM CET, john.fastab...@gmail.com wrote:
>On 16-02-27 08:28 PM, Cong Wang wrote:
>> On Fri, Feb 26, 2016 at 8:24 PM, John Fastabend
>> wrote:
>>> On 16-02-26 09:39 AM, Cong Wang wrote:
>>>> On Fri, Feb 26, 2016 at 7:53 AM, John Fastabend wrote:
>>>>> diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
>>>>> index 2121df5..e64d20b 100644
>>>>> --- a/include/net/pkt_cls.h
>>>>> +++ b/include/net/pkt_cls.h
>>>>> @@ -392,4 +392,9 @@ struct tc_cls_u32_offload {
>>>>>         };
>>>>>  };
>>>>>
>>>>> +static inline bool tc_should_offload(struct net_device *dev)
>>>>> +{
>>>>> +       return dev->netdev_ops->ndo_setup_tc;
>>>>> +}
>>>>> +
>>>>
>>>> These should be protected by CONFIG_NET_CLS_U32, no?
>>>>
>>>
>>> Its not necessary it is a completely general function and I only
>>> lifted it out of cls_u32 so that the cls_flower classifier could
>>> also use it.
>>>
>>> I don't see the need off-hand to have it wrapped in an ORd ifdef
>>> statement where its (CONFIG_NET_CLS_U32 | CONFIG_NET_CLS_X ...).
>>> Any particular reason you were thnking it should be wrapped in ifdefs?
>>>
>>
>> Not a big deal.
>>
>> I just feel these don't need to compile when I have CONFIG_NET_CLS_U32=n.
>>
>> Thanks.
>>
>
>Well because this is 'static inline' gcc should just remove it
>if it is not used. Assuming non-ancient gcc and normal compile
>flags, e.g. you are not including -fkeep-inline-functions or
>something.
>
>So just to keep it readable I would prefer to just leave it
>as is.

Definitely. cls_flower will use it in the very near future. Making it
dependent on CONFIG_NET_CLS_U32 makes 0 sense to me.
Re: [Patch net-next] net: remove skb_sender_cpu_clear()
On Mon, Feb 29, 2016 at 10:50 AM, Daniel Borkmann wrote:
> On 02/28/2016 05:19 AM, Cong Wang wrote:
>>
>> After commit 52bd2d62ce67 ("net: better skb->sender_cpu and skb->napi_id
>> cohabitation")
>> skb_sender_cpu_clear() becomes empty and can be removed.
>>
>> Cc: Eric Dumazet
>> Signed-off-by: Cong Wang
>
>
> Wasn't the intention to keep this helper as a marker when packet
> crosses domains from RX to TX, see discussion here:
>
> https://patchwork.ozlabs.org/patch/527167/
>
> Maybe better to rename it and add a comment into the helper to
> make the intention more clear?

Since when do we need an empty function to mark some call path? Isn't
this supposed to be done by comments or documents?

BTW, I don't even think we need any comment; people who touch it
should understand it.
Re: Softirq priority inversion from "softirq: reduce latencies"
On 02/29/2016 10:24 AM, Eric Dumazet wrote:
> On lun., 2016-02-29 at 10:05 -0800, Peter Hurley wrote:
>
>> While I appreciate the attempt, that's not the problem.
>>
>> Just to be clear
>>
>>	if (time_before(jiffies, end) && !need_resched() &&
>>	    --max_restart)
>>		goto restart;
>>
>> aborts softirq *even if 0ns have elapsed*, if NET_RX has woken a process.
>
>
> Sure, now remove the 1st and 2nd condition.

Well just removing the 2nd condition has everything working fine,
because that fixes the priority inversion.

> You would still 'abort' (ie wakeup ksoftirqd really) when --max_restart
> becomes 0

Sure. Which would mean there's contended heavy i/o load so the driver has to
fallback to non-DMA. That's an acceptable outcome.

> So, instead of some subtle load dependent bug, you know have a reliable
> trigger.

There's no "subtle load dependent bug" here.

The driver has a fallback mode of operation that it relies on without DMA.
Of course, as I already wrote, this has consequences.

If system resources are _actually contended_, then naturally, fighting for
cpu and i/o time is fine, and I'm happy to do that in ksoftirqd.

However, when system resources are _not_ contended, it makes no
sense to be forced to revert to ksoftirqd resolution, which is strictly
intended as fallback.

Or flipping your argument on its head, why not just _always_ execute
softirq in ksoftirqd?
Re: [Patch net-next] net: remove skb_sender_cpu_clear()
On 02/28/2016 05:19 AM, Cong Wang wrote:
> After commit 52bd2d62ce67 ("net: better skb->sender_cpu and skb->napi_id
> cohabitation")
> skb_sender_cpu_clear() becomes empty and can be removed.
>
> Cc: Eric Dumazet
> Signed-off-by: Cong Wang

Wasn't the intention to keep this helper as a marker when packet
crosses domains from RX to TX, see discussion here:

https://patchwork.ozlabs.org/patch/527167/

Maybe better to rename it and add a comment into the helper to
make the intention more clear?
Re: [PATCH] mrf24j40: fix security-enabled processing on inbound frames
On 02/23/2016 04:29 AM, Alexander Aring wrote:
> Alan, do you have some comments about that?
>
> Currently the mrf24j40 goes into a deadlock if a frame with security
> enable bit is set. As you see, I helped myself to create this patch and
> solve this stupid default behaviour of mrf24j40. :-)

Hi Alex, I'll look at this today.

Alan.
Re: [net-next PATCH v3 1/3] net: sched: consolidate offload decision in cls_u32
On 16-02-27 08:28 PM, Cong Wang wrote:
> On Fri, Feb 26, 2016 at 8:24 PM, John Fastabend
> wrote:
>> On 16-02-26 09:39 AM, Cong Wang wrote:
>>> On Fri, Feb 26, 2016 at 7:53 AM, John Fastabend
>>> wrote:
>>>> diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
>>>> index 2121df5..e64d20b 100644
>>>> --- a/include/net/pkt_cls.h
>>>> +++ b/include/net/pkt_cls.h
>>>> @@ -392,4 +392,9 @@ struct tc_cls_u32_offload {
>>>>         };
>>>>  };
>>>>
>>>> +static inline bool tc_should_offload(struct net_device *dev)
>>>> +{
>>>> +       return dev->netdev_ops->ndo_setup_tc;
>>>> +}
>>>> +
>>>
>>> These should be protected by CONFIG_NET_CLS_U32, no?
>>>
>>
>> Its not necessary it is a completely general function and I only
>> lifted it out of cls_u32 so that the cls_flower classifier could
>> also use it.
>>
>> I don't see the need off-hand to have it wrapped in an ORd ifdef
>> statement where its (CONFIG_NET_CLS_U32 | CONFIG_NET_CLS_X ...).
>> Any particular reason you were thnking it should be wrapped in ifdefs?
>>
>
> Not a big deal.
>
> I just feel these don't need to compile when I have CONFIG_NET_CLS_U32=n.
>
> Thanks.
>

Well because this is 'static inline' gcc should just remove it
if it is not used. Assuming non-ancient gcc and normal compile
flags, e.g. you are not including -fkeep-inline-functions or
something.

So just to keep it readable I would prefer to just leave it
as is.

Thanks,
John
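For readers who want to poke at the helper under discussion outside the kernel tree, here is a minimal userspace sketch. The two structures are stripped-down stand-ins for the kernel's `struct net_device` and `struct net_device_ops`, the `ndo_setup_tc` signature is simplified, and `demo_setup_tc`, `ops_with_tc` and `ops_without_tc` are hypothetical names used only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the kernel structures; only the field used here is modelled. */
struct net_device;

struct net_device_ops {
	int (*ndo_setup_tc)(struct net_device *dev);	/* simplified signature */
};

struct net_device {
	const struct net_device_ops *netdev_ops;
};

/* Same test as the helper in the patch, with an explicit NULL comparison. */
static inline bool tc_should_offload(const struct net_device *dev)
{
	return dev->netdev_ops->ndo_setup_tc != NULL;
}

/* Hypothetical driver callback, present only to populate an ops table. */
static int demo_setup_tc(struct net_device *dev)
{
	(void)dev;
	return 0;
}

static const struct net_device_ops ops_with_tc = { .ndo_setup_tc = demo_setup_tc };
static const struct net_device_ops ops_without_tc = { .ndo_setup_tc = NULL };
```

Because the helper is `static inline`, a translation unit that includes it without calling it normally ends up with no code emitted for it, which is the point made in the reply above.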
Re: [PATCH] mld, igmp: Fix reserved tailroom calculation
On 29.02.2016 19:08, Benjamin Poirier wrote:
> If you think we should write the expression with "if" instead of "min",
> instead of the current
> +	skb->reserved_tailroom = skb_tailroom(skb) -
> +		min_t(int, mtu, skb_tailroom(skb) - tlen);
> it should be:
> +	if (mtu < skb_tailroom(skb) - tlen)
> +		skb->reserved_tailroom = skb_tailroom(skb) - mtu;
> +	else
> +		skb->reserved_tailroom = tlen;
>
> The second alternative does not look more readable to me but I have been
> looking at that expression for a while. If you think that it is more
> readable, I will resend the patch expressed that way. Please let me
> know.

I would still find it more readable actually, but no strong opinion, I
would leave it up to you.

Could it make sense to put this code into a static inline helper and
reuse it for both, igmp and mld?

Thanks,
Hannes
Re: Softirq priority inversion from "softirq: reduce latencies"
On lun., 2016-02-29 at 10:05 -0800, Peter Hurley wrote:

> While I appreciate the attempt, that's not the problem.
>
> Just to be clear
>
>	if (time_before(jiffies, end) && !need_resched() &&
>	    --max_restart)
>		goto restart;
>
> aborts softirq *even if 0ns have elapsed*, if NET_RX has woken a process.

Sure, now remove the 1st and 2nd condition.

You would still 'abort' (ie wakeup ksoftirqd really) when --max_restart
becomes 0

So, instead of some subtle load dependent bug, you now have a reliable
trigger.

The fact it took 3 years for someone to complain about this change
should tell us something really.

The only way for your bug to hide would be to remove all the 'break
infinite loop' logic. And this is not going to happen.
Re: [PATCH] mld, igmp: Fix reserved tailroom calculation
On 2016/02/29 16:43, Hannes Frederic Sowa wrote:
> On 29.02.2016 16:19, Benjamin Poirier wrote:
> >On 2016/02/29 15:57, Daniel Borkmann wrote:
> >[...]
> >>
> >>[ cutting the IPv4 part off as diff is the same ]
> >>
> >>>diff --git a/net/ipv6/mcast.c b/net/ipv6/mcast.c
> >>>index 5ee56d0..c157edc 100644
> >>>--- a/net/ipv6/mcast.c
> >>>+++ b/net/ipv6/mcast.c
> >>>@@ -1574,9 +1574,9 @@ static struct sk_buff *mld_newpack(struct inet6_dev
> >>>*idev, unsigned int mtu)
> >>>		return NULL;
> >>>
> >>>	skb->priority = TC_PRIO_CONTROL;
> >>>-	skb->reserved_tailroom = skb_end_offset(skb) -
> >>>-				 min(mtu, skb_end_offset(skb));
> >>>	skb_reserve(skb, hlen);
> >>>+	skb->reserved_tailroom = skb_tailroom(skb) -
> >>>+		min_t(int, mtu, skb_tailroom(skb) - tlen);
> >>
> >>Are you sure this is correct? Wouldn't that mean (assuming we allocated
> >>enough space), that I could now fill a larger than MTU frame?
> >
> >Quoting back a part of the log:
> >
> >>>The maximum space available for ip headers and payload without
> >>>fragmentation is min(mtu, data + extra). Therefore,
> >>>reserved_tailroom
> >>>= data + extra + tlen - min(mtu, data + extra)
> >>>= skb_end_offset - hlen - min(mtu, skb_end_offset - hlen - tlen)
> >>>= skb_tailroom - min(mtu, skb_tailroom - tlen) ; after skb_reserve(hlen)
> >
> >The min() takes care of the situation you describe, ie. if the allocated
> >space is large, reserved_tailroom will be large enough that we do not
> >use more space than the mtu.
> >
> >I tested the mld and igmp code with different driver parameters, mtu
> >values, number of multicast address records and even allocation
> >failures. If you think the formula is wrong, please provide a
> >counter-example with hlen, tlen, mtu and size values.
>
> I think the code is fine albeit I think we should remove the min macro and
> just do something:
>
> if (skb_tailroom(skb) > mtu)
>	skb->reserved_tailroom = skb_tailroom(skb) - mtu;
>
> Does that make sense? I think it is much more readable.

That is not equivalent. It fails to take tlen into account.

For igmp, consider this case:
with hlen = 16, mtu = 9000, tlen = 8, additionally, suppose that the
first iteration of the allocation loop (alloc_skb(9000 + 16 + 8, ...)
which requires 4 pages) fails and the second iteration
(alloc_skb((9000 >> 1) + 16 + 8, ...) which requires 2 pages) succeeds:
size = (9000 >> 1) + 16 + 8 = 4524
skb_end_offset = 8192 - 320 = 7872
tailroom = 7872 - 16 = 7856
data = 9000 >> 1 = 4500
extra = 7872 - 4524 = 3348
reserved tailroom (patch version) = 4500 + 3348 + 8 - min(9000, 4500 + 3348) = 8
reserved tailroom (your version) = 0
Headers are ipv4 + igmpv3 = 24 + 8 = 32, records are 8 bytes
With 978 igmpv3 records, with your version, we would output an skb that has
less tailroom (0) than dev->needed_tailroom (8).

For mld, consider this case:
with hlen = 16, mtu = 9000, tlen = 8:
size = 3776 (SKB_MAX_ORDER case)
skb_end_offset = 3776
tailroom = 3776 - 16 = 3760
data = 3776 - 16 - 8 = 3752
extra = 0
reserved tailroom (patch version) = 3752 + 0 + 8 - min(9000, 3752 + 0) = 8
reserved tailroom (your version) = 0
Headers are ipv6 + icmpv6 = 48 + 8 = 56, records are 20 bytes
With 185 mld records, with your formula, we would output an skb that has
less tailroom (4) than dev->needed_tailroom (8).

If you think we should write the expression with "if" instead of "min",
instead of the current
+	skb->reserved_tailroom = skb_tailroom(skb) -
+		min_t(int, mtu, skb_tailroom(skb) - tlen);
it should be:
+	if (mtu < skb_tailroom(skb) - tlen)
+		skb->reserved_tailroom = skb_tailroom(skb) - mtu;
+	else
+		skb->reserved_tailroom = tlen;

The second alternative does not look more readable to me but I have been
looking at that expression for a while. If you think that it is more
readable, I will resend the patch expressed that way. Please let me
know.
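The two worked cases above are easy to re-check mechanically. The sketch below is plain userspace C and every function name in it is hypothetical; only the formulas and the numbers come from the message. It assumes, as the "your version = 0" lines do, that `reserved_tailroom` stays 0 when the simplified `if` does not fire.

```c
/* All helper names are hypothetical; only the formulas come from the thread. */

static int min_int(int a, int b)
{
	return a < b ? a : b;
}

/* Patch version, evaluated after skb_reserve(hlen). */
static int reserved_patch(int tailroom, int mtu, int tlen)
{
	return tailroom - min_int(mtu, tailroom - tlen);
}

/* The if/else spelling offered at the end of the message. */
static int reserved_if_else(int tailroom, int mtu, int tlen)
{
	if (mtu < tailroom - tlen)
		return tailroom - mtu;
	return tlen;
}

/* The suggested simplification, which never looks at tlen. */
static int reserved_simplified(int tailroom, int mtu)
{
	return tailroom > mtu ? tailroom - mtu : 0;
}

/* Tailroom left once the headers and n records have been written. */
static int tailroom_left(int tailroom, int headers, int rec_size, int n)
{
	return tailroom - headers - rec_size * n;
}
```

For the igmp case, `reserved_patch(7856, 9000, 8)` gives 8 while `reserved_simplified(7856, 9000)` gives 0, and 978 records leave 0 bytes of tailroom; for the mld case, tailroom 3760 with 185 records leaves 4 bytes, below the 8 that the device needs.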
Re: Softirq priority inversion from "softirq: reduce latencies"
On 02/29/2016 08:21 AM, Eric Dumazet wrote:
> On lun., 2016-02-29 at 07:54 -0800, Peter Hurley wrote:
>
>> The current kernel is HZ=250 but this would occur on HZ=1000 as well.
>
> Right. But the problem with HZ=100 and HZ=250 is that the detection can
> happens because jiffy granularity is too coarse, since
>
> msecs_to_jiffies(2) -> 1
>
> Following patch might reduce the probability, but wont really fix your
> problem.
>
> Fact that ksoftirqd prio is not what you want is completely orthogonal.
>
> diff --git a/kernel/softirq.c b/kernel/softirq.c
> index 479e443..f7cc594 100644
> --- a/kernel/softirq.c
> +++ b/kernel/softirq.c
> @@ -180,7 +180,7 @@ EXPORT_SYMBOL(__local_bh_enable_ip);
>
>  /*
>   * We restart softirq processing for at most MAX_SOFTIRQ_RESTART times,
> - * but break the loop if need_resched() is set or after 2 ms.
> + * but break the loop if need_resched() is set or after 2 ms/ticks.
>   * The MAX_SOFTIRQ_TIME provides a nice upper bound in most cases, but in
>   * certain cases, such as stop_machine(), jiffies may cease to
>   * increment and so we need the MAX_SOFTIRQ_RESTART limit as
> @@ -191,7 +191,7 @@ EXPORT_SYMBOL(__local_bh_enable_ip);
>   * we want to handle softirqs as soon as possible, but they
>   * should not be able to lock up the box.
>   */
> -#define MAX_SOFTIRQ_TIME  msecs_to_jiffies(2)
> +#define MAX_SOFTIRQ_TIME  (1 + msecs_to_jiffies(2))
>  #define MAX_SOFTIRQ_RESTART 10
>
>  #ifdef CONFIG_TRACE_IRQFLAGS

While I appreciate the attempt, that's not the problem.

Just to be clear

	if (time_before(jiffies, end) && !need_resched() &&
	    --max_restart)
		goto restart;

aborts softirq *even if 0ns have elapsed*, if NET_RX has woken a process.
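Both points in this exchange, the jiffy granularity and the restart test, can be tried out in a small userspace model. This is a sketch under assumptions: the three functions are hypothetical simplifications, the jiffies conversion only covers HZ values that divide 1000, and `need_resched` is passed in as a plain flag instead of being read from the scheduler.

```c
#include <stdbool.h>

/* Round-up conversion for HZ values that divide 1000 (simplified model). */
static unsigned long msecs_to_jiffies_model(unsigned int ms, unsigned int hz)
{
	unsigned int ms_per_tick = 1000 / hz;

	return (ms + ms_per_tick - 1) / ms_per_tick;
}

/* Wrap-safe "a is before b", in the spirit of the kernel's time_before(). */
static bool time_before_model(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/* The restart test quoted in the thread, with its inputs made explicit. */
static bool softirq_should_restart(unsigned long now, unsigned long end,
				   bool need_resched, int *max_restart)
{
	return time_before_model(now, end) && !need_resched &&
	       --(*max_restart);
}
```

At HZ=100 and HZ=250 the 2 ms budget rounds to a single tick, and with `need_resched` set the test fails before `max_restart` is even decremented, no matter how little time has passed.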
[PATCH v2] socket.7: Document some BPF-related socket options
From: Craig Gallek Document the behavior and the first kernel version for each of the following socket options: SO_ATTACH_FILTER SO_ATTACH_BPF SO_ATTACH_REUSEPORT_CBPF SO_ATTACH_REUSEPORT_EBPF SO_DETACH_FILTER SO_DETACH_BPF SO_LOCK_FILTER Signed-off-by: Craig Gallek --- v2 changes: - Content suggestions from Michael Kerrisk : * Clarify socket filter return value semantics * Clarify wording of minimal kernel versions * Explain behavior of multiple calls using SO_ATTACH_[BPF|FILTER] * Define 'reuseport groups' in SO_ATTACH_REUSEPORT_* - Include SO_LOCK_FILTER documentation mostly based off of the wording in the commit message by Vincent Bernat d59577b6ffd3 ("sk-filter: Add ability to lock a socket filter program") --- man7/socket.7 | 136 +- 1 file changed, 115 insertions(+), 21 deletions(-) diff --git a/man7/socket.7 b/man7/socket.7 index db7cb8324dde..d22107cc47d7 100644 --- a/man7/socket.7 +++ b/man7/socket.7 @@ -41,9 +41,6 @@ .\"SO_GET_FILTER (3.8) .\"commit a8fc92778080c845eaadc369a0ecf5699a03bef0 .\"Author: Pavel Emelyanov -.\"SO_LOCK_FILTER (3.9) -.\"commit d59577b6ffd313d0ab3be39cb1ab47e29bdc9182 -.\"Author: Vincent Bernat .\"SO_SELECT_ERR_QUEUE (3.10) .\" commit 7d4c04fc170087119727119074e72445f2bb192b .\"Author: Keller, Jacob E @@ -53,13 +50,6 @@ .\" SO_BPF_EXTENSIONS (3.14) .\" commit ea02f9411d9faa3553ed09ce0ec9f00ceae9885e .\"Author: Michal Sekletar -.\" SO_ATTACH_BPF (3.19) -.\" and SO_DETACH_BPF as synonym for SO_DETACH_FILTER -.\" commit 89aa075832b0da4402acebd698d0411dcc82d03e -.\"Author: Alexei Starovoitov -.\"SO_ATTACH_REUSEPORT_CBPF, SO_ATTACH_REUSEPORT_EBPF (4.5) -.\"commit 538950a1b7527a0a52ccd9337e3fcd304f027f13 -.\"Author: Craig Gallek .\" .TH SOCKET 7 2015-05-07 Linux "Linux Programmer's Manual" .SH NAME @@ -311,6 +301,90 @@ The value 0 indicates that this is not a listening socket, the value 1 indicates that this is a listening socket. This socket option is read-only. 
.TP +.BR SO_ATTACH_FILTER " and " SO_ATTACH_BPF +Attach a classic or extended BPF program (respectively) to the socket +for use as a filter of incoming packets. A packet will be dropped if +the filter program returns zero. If the filter program returns a +non-zero value which is less than the packet's data length, the packet +will be truncated to the length returned. If the value returned by +the filter is greater than or equal to the packet's data length, the +packet is allowed to proceed unmodified. + +The argument for +.BR SO_ATTACH_FILTER +is a +.I sock_fprog +structure in +.B . +.sp +.in +4n +.nf +struct sock_fprog { +unsigned short len; +struct sock_filter *filter; +}; +.fi +.in +.IP +The argument for +.BR SO_ATTACH_BPF +is a file descriptor returned by the +.BR bpf (2) +system call and must refer to a program of type +.BR BPF_PROG_TYPE_SOCKET_FILTER. +These options may be set multiple times for a given socket, each time +replacing the previous filter program. The classic and extended +versions may be called on the same socket, but the previous filter +will always be replaced such that a socket never has more than one +filter defined. + +.BR SO_ATTACH_FILTER +is available since Linux 2.2. +.BR SO_ATTACH_BPF +is available since Linux 3.19. Both classic and extended BPF are +explained in the kernel source file +.I Documentation/networking/filter.txt +.TP +.BR SO_ATTACH_REUSEPORT_CBPF " and " SO_ATTACH_REUSEPORT_EBPF " (since Linux 4.5)" +For use with the +.BR SO_REUSEPORT +option, these options allow the user to set a classic or extended +BPF program (respectively) which defines how packets are assigned to +the sockets in the reuseport group (that is, all sockets which have +.BR SO_REUSEPORT +set and are using the same local address to receive packets). The BPF +program must return an index between 0 and N-1 representing the socket +which should receive the packet (where N is the number of sockets in +the group). 
If the BPF program returns an invalid index, socket +selection will fall back to the plain +.BR SO_REUSEPORT +mechanism. + +Sockets are numbered in the order in which they are added to the group +(that is, the order of +.BR bind (2) +calls for UDP sockets or the order of +.BR listen (2) +calls for TCP sockets). New sockets added to a reuseport group will +inherit the BPF program. When a socket is removed from a reuseport +group (via +.BR close (2)) +the last socket in the group will be moved into the closed socket's +position. + +These options may be set repeatedly at any time on any single socket +in the group to replace the current BPF program used by all sockets in +the group. +.BR SO_ATTACH_REUSEPORT_CBPF +takes the same socket argument type as +.BR SO_ATTACH_FILTER +and +.BR SO_ATTACH_REUSEPORT_EBPF +takes th
Re: Softirq priority inversion from "softirq: reduce latencies"
From: Peter Hurley Date: Mon, 29 Feb 2016 07:03:11 -0800 > However, I'm pointing out that Eric's sledgehammer approach to fixing > the NET_RX softirq bug is having significant side-effects in other > subsystems. Either your hardware can handle arbitrary latencies and thus can use softirqs for event completion successfully, or it can't. You, my friend, are the one using the sledgehammer.
Re: [PATCH net-next 1/5] vxlan: implement GPE in L2 mode
On Mon, Feb 29, 2016 at 2:23 AM, Jiri Benc wrote: > On Sat, 27 Feb 2016 12:54:52 -0800, Tom Herbert wrote: >> Yes, but RCO has not been specified for VXLAN-GPE either > > As far as I can see, RCO will just work with VXLAN-GPE. But I have no > problem disallowing them to be set together, if you prefer that. > >> so the patch >> does not correctly refuse setting those two together. Inevitably >> though, those and other extensions will defined for VXLAN-GPE and new >> ones for VXLAN. Again, the protocols are fundamentally incompatible, >> so instead of trying to enforce each valid combination at >> configuration > > We need to do the checking in either case. If we accepted unsupported > combinations and then just silently ignored them, we'd be in troubles > later when such combination becomes defined/supported. There would be > no way for the userspace tools to detect whether a particular kernel > supports the combination or not. > > So, we need to check for supported combination of options during > configuration anyway. > > And when we have that, I don't really see the reason for doing that > kind of code duplication that you suggest. > >> or performing multiple checks for flavor each time we >> look at a packet, it seems easier to split the parsing with at most >> one check for the protocol variant. For instance in >> vxlan_udp_encap_recv just do: >> >> if (vs->flags & VXLAN_F_GPE) >>if (!vxlan_parse_gpe_hdr(&unparsed, skb, vs->flags)) >>goto drop; >> else >>if (!vxlan_parse_gpe(&unparsed, skb, vs->flags)) >>goto drop; > > Most of the code of these two functions will be identical. To > consolidate that as much as possible, you'll end up with what I have or > something very similar. > >> And then move REMCSUM and GPB and other protocol specific checks to >> the right function. > > And when RCO is defined for GPE, we copy the code? Doesn't make sense, > sorry. 
>
> If you look at the code in the current net-next (and the code after
> this patchset), the extension handling has been made generic and each
> extension gets its own handler function, leading to clean separation in
> the code. There's no reason to split the vxlan_rcv into two functions
> doing the same things but with slightly different calls to extensions.
>

They may or may not be "slightly different"; if they are the same (like RCO for VXLAN-GPE uses the low order bits in VNI) then a common backend function can be called. As defined now, GBP can't be used with VXLAN-GPE at all, but when I read your patch it looks very much like GBP is being checked and allowed in the VXLAN-GPE path. The fact that "if (vs->flags & VXLAN_F_GBP)" always fails for VXLAN-GPE packets because of configuration constraints is not at all obvious, and really this just results in an unnecessary conditional that gives the same answer for every single VXLAN-GPE packet, which we've already checked for just a few lines above. At least the check for GBP could be moved to an else block of "if (vs->flags & VXLAN_F_GPE)"; this alone improves clarity and eliminates an unnecessary conditional in the VXLAN-GPE path.

> Jiri
Re: [net] net: fix double free issue of skbuff
From: 张胜举
Date: Mon, 29 Feb 2016 22:16:37 +0800

>> On Mon, 2016-02-29 at 12:22 +, Zhang Shengju wrote:
>> > If skb_reorder_vlan_header() failed, skb is freed and NULL is returned.
>> > Then at skb_vlan_untag(), it will free skbuff again which causes a double
>> > free.
>>
>> On skb_reorder_vlan_header() failure, skb_vlan_untag() will call
>> kfree_skb() using the return value of skb_reorder_vlan_header(), that is
>> NULL. kfree_skb() is a noop when the argument is NULL.
>>
>> The current code seems safe.
>>
>> Paolo
> Hi Paolo, even if the current code is safe, this is still a potential problem. We
> should assume that the inner function doesn't free the skb, and let the outer function
> take care of this.

No, the current code is intentional and perfectly fine.

Fix real bugs, not imaginary ones.

Thanks.
Re: [net] net: fix double free issue of skbuff
From: Zhang Shengju Date: Mon, 29 Feb 2016 12:22:53 + > If skb_reorder_vlan_header() failed, skb is freed and NULL is returned. > Then at skb_vlan_untag(), it will free skbuff again which cause double > free. The 'skb' local variable in this case will be set to "NULL", calling kfree_skb() on NULL doesn't do anything. > This patch removes kfree_skb() call in function skb_reorder_vlan_header(). > > Signed-off-by: Zhang Shengju Please analyze the complete control path of the caller of this function, and you'll find that everything is fine.
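The control flow being discussed can be modelled in userspace. The sketch below is only an analogy under stated assumptions: struct buf, buf_free(), reorder() and untag() are hypothetical stand-ins for the skb, kfree_skb(), skb_reorder_vlan_header() and skb_vlan_untag(), and they reproduce only the behaviour described in this thread (the inner function frees on failure and returns NULL; the caller then frees whatever was returned).

```c
#include <stdlib.h>

struct buf { int len; };

static int frees;	/* number of buffers actually released */

/* Like kfree_skb(): a NULL argument is a no-op. */
static void buf_free(struct buf *b)
{
	if (!b)
		return;
	frees++;
	free(b);
}

/* Stand-in for the inner function: frees on failure, returns NULL. */
static struct buf *reorder(struct buf *b, int fail)
{
	if (fail) {
		buf_free(b);
		return NULL;
	}
	return b;
}

/* Stand-in for the caller: overwrites its local with the result. */
static struct buf *untag(struct buf *b, int fail)
{
	b = reorder(b, fail);
	if (!b)
		goto err_free;
	return b;

err_free:
	buf_free(b);	/* b is NULL here, so nothing is freed twice */
	return NULL;
}
```

In this model the failing path releases the buffer exactly once, because the caller's local pointer has already been replaced by NULL before the error label is reached.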
Re: [PATCH v3 00/17] stmmac: enhance driver performances and update the version
Gents,

on top of these patches, there is a new train to enhance the stmmac to support the DWMAC_4.x chips. They will be proposed very soon, on top of this update (as soon as it is reviewed and merged).

In our context, it has been very useful working with the same driver that runs fine on several (x86, arm, sh4) boxes with different SYNP MAC/GMAC IPs (starting from MAC10/100 Database Release 1.5 to databook 3.70a and 4.00a and 4.10a). We got the benefit of having all the features already supported by stmmac plus the good performance available with TSO on gmac4.

I can imagine no big issues in enhancing the stmmac to support the new 4.00a and 4.10a, although there is some other work made by Rabin and Larper.

I guess stmmac users will be happy to continue to have the same device driver working on their platforms with the new gmac. But it also makes sense to avoid having two drivers that aim to do the same job, or to get more synergy on the same code, as done in the past with Rayagond for PTP and EEE.

If you have some concern or advice, please do not hesitate to ask. We will try to send the patches soon to show the code in case people are interested.

Kind Regards
Peppe

On 2/29/2016 2:27 PM, Alexandre TORGUE wrote:

According to Giuseppe, I send the v3 series.

This is a subset of patches to rework the driver in order to improve its performance and make it more robust under stress conditions. All patches have been ported on the STi mainstream kernel branch and tested on ARM STiH4xx platforms and newer ones.

This series also updates the driver version and prepares it to include further development to support new chips.

In detail, these patches are:

o to rework and improve the internal DMA bus settings
  Fine tuning is mandatory on some platforms for both performance and stability issues.

o to rework and optimize the descriptor management.
  This will help a lot on the performance side and in preparing the inclusion of the GMAC4.x.

o to add a set of optimizations for both xmit and rx functions.
  These will help a lot on the performance side and in making the driver more robust in case of low memory conditions and under some stress tests, performed for example on IP-STB.

Below are some throughput figures obtained on some boxes before and after the patches.

           nuttcp (Mbps)             iperf (Mbps)
         tcp        udp          tcp        udp
       tx    rx   tx    rx     tx    rx   tx    rx
old   680   800  480   506    760   800  600   700
new   830   880  540   630    840   880  700   800

V2:
- rx_copybreak is now managed by using ethtool.

V3:
- improve comments on PCIe detailing that there are no regressions
- rework some APIs to properly define some params as bool as expected
- rework the formula to get the element inside the ring.
  Compared with V2, patches 4 and 13 have been merged because the same formula has been used. After this rework, no evident benefit has been noticed in terms of performance, so the table above is still valid.
  Disassembling the code for SH4 and ARM, with the new formula just one instruction is saved (depending on compiler flags), and this does not give a relevant gain, for example, on SH4 where some instructions are executed in the same pipeline stage.

Ring sizes are now fixed and maybe they can be reworked to be tuned w/o using the stmmaceth= cmdline option. Indeed, nobody changes these sizes and the numbers selected by default respect the budget and avoid passing an invalid setup. These are the best driver default sizes for ring and chain.

Fabrice Gasnier (3):
  stmmac: merge get_rx_owner into rx_status routine.
  stmmac: optimize tx clean function
  stmmac: fix phy init when attached to a phy

Giuseppe Cavallaro (14):
  stmmac: share reset function between dwmac100 and dwmac1000
  stmmac: rework DMA bus setting and introduce new platform AXI structure
  stmmac: change descriptor layout
  stmmac: review RX/TX ring management
  stmmac: add length field to dma data
  stmmac: add last_segment field to dma data
  stmmac: add is_jumbo field to dma data
  stmmac: optimize tx desc management
  stmmac: set dirty index out of the loop
  stmmac: first frame prep at the end of xmit routine
  stmmac: do not poll phy handler when attach a switch
  stmmac: do not perform zero-copy for rx frames
  stmmac: tune rx copy via threshold.
  stmmac: update version to Oct_2015

 Documentation/devicetree/bindings/net/stmmac.txt | 54 ++-
 drivers/net/ethernet/stmicro/stmmac/chain_mode.c | 37 +-
 drivers/net/ethernet/stmicro/stmmac
[PATCH] ethernet/atl1c: remove left over dead code
Left over from c24588afc536a35c924d014f13b669b20ccf8553 ("atl1c: using fixed TXQ configuration for l2cb and l1c") Signed-off-by: Eric Engestrom --- drivers/net/ethernet/atheros/atl1c/atl1c_main.c | 4 1 file changed, 4 deletions(-) diff --git a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c index 8b5988e..d0084d4 100644 --- a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c +++ b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c @@ -65,10 +65,6 @@ static void atl1c_reset_dma_ring(struct atl1c_adapter *adapter); static int atl1c_configure(struct atl1c_adapter *adapter); static int atl1c_alloc_rx_buffer(struct atl1c_adapter *adapter); -static const u16 atl1c_pay_load_size[] = { - 128, 256, 512, 1024, 2048, 4096, -}; - static const u32 atl1c_default_msg = NETIF_MSG_DRV | NETIF_MSG_PROBE | NETIF_MSG_LINK | NETIF_MSG_TIMER | NETIF_MSG_IFDOWN | NETIF_MSG_IFUP; -- 2.7.1
[PATCH] net/ipv4: remove left over dead code
8cc785f6f429c2a3fb81745dc142cbd72a462c4a ("net: ipv4: make the ping /proc code AF-independent") removed the code using it, but renamed this variable instead of removing it. Signed-off-by: Eric Engestrom --- net/ipv4/ping.c | 7 --- 1 file changed, 7 deletions(-) diff --git a/net/ipv4/ping.c b/net/ipv4/ping.c index d3a2716..35179fc 100644 --- a/net/ipv4/ping.c +++ b/net/ipv4/ping.c @@ -1140,13 +1140,6 @@ static int ping_v4_seq_show(struct seq_file *seq, void *v) return 0; } -static const struct seq_operations ping_v4_seq_ops = { - .show = ping_v4_seq_show, - .start = ping_v4_seq_start, - .next = ping_seq_next, - .stop = ping_seq_stop, -}; - static int ping_seq_open(struct inode *inode, struct file *file) { struct ping_seq_afinfo *afinfo = PDE_DATA(inode); -- 2.7.1
[PATCH] net/rtnetlink: remove dead code
3b766cd832328fcb87db3507e7b98cf42f21689d ("net/core: Add reading VF statistics through the PF netdevice") added that variable but it's never been used. Signed-off-by: Eric Engestrom --- net/core/rtnetlink.c | 9 - 1 file changed, 9 deletions(-) diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index d735e85..35abefc 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -1389,15 +1389,6 @@ static const struct nla_policy ifla_vf_policy[IFLA_VF_MAX+1] = { [IFLA_VF_TRUST] = { .len = sizeof(struct ifla_vf_trust) }, }; -static const struct nla_policy ifla_vf_stats_policy[IFLA_VF_STATS_MAX + 1] = { - [IFLA_VF_STATS_RX_PACKETS] = { .type = NLA_U64 }, - [IFLA_VF_STATS_TX_PACKETS] = { .type = NLA_U64 }, - [IFLA_VF_STATS_RX_BYTES]= { .type = NLA_U64 }, - [IFLA_VF_STATS_TX_BYTES]= { .type = NLA_U64 }, - [IFLA_VF_STATS_BROADCAST] = { .type = NLA_U64 }, - [IFLA_VF_STATS_MULTICAST] = { .type = NLA_U64 }, -}; - static const struct nla_policy ifla_port_policy[IFLA_PORT_MAX+1] = { [IFLA_PORT_VF] = { .type = NLA_U32 }, [IFLA_PORT_PROFILE] = { .type = NLA_STRING, -- 2.7.1
[PATCH 0/3] Enable Ethernet on STM32F429 EVAL board
This series adds Ethernet support on the STM32F429 SoC and enables it on the Eval board:

-Add Ethernet node in SOC file:
  -Define MII mode pinctrl
  -Use mixed burst and PBL 8
  -Add system config node for glue.
-Enable Ethernet for Eval board:
  -MII mode
  -Connected to a PHY through MDIO.

Note: this series follows the series which adds the glue and updates the stmmac driver:
https://lkml.org/lkml/2016/2/26/329

Best regards.
Alex

Alexandre TORGUE (3):
  ARM: dts: stm32f429: Add system config bank node
  ARM: dts: stm32f429: Add Ethernet support
  ARM: dts: stm32f429: Enable Ethernet on Eval board

 arch/arm/boot/dts/stm32429i-eval.dts | 15 ++
 arch/arm/boot/dts/stm32f429.dtsi     | 40 
 2 files changed, 55 insertions(+)

-- 
1.9.1
[PATCH 2/3] ARM: dts: stm32f429: Add Ethernet support
Add Ethernet support (Synopsys MAC IP 3.50a) on stm32f429 SOC. Signed-off-by: Alexandre TORGUE diff --git a/arch/arm/boot/dts/stm32f429.dtsi b/arch/arm/boot/dts/stm32f429.dtsi index bb7a736..af0367c 100644 --- a/arch/arm/boot/dts/stm32f429.dtsi +++ b/arch/arm/boot/dts/stm32f429.dtsi @@ -283,6 +283,26 @@ bias-disable; }; }; + + ethernet0_mii: mii@0 { + mii { + slew-rate = <2>; + pinmux = , + , + , + , + , + , +, +, + , + , + , + , + , + ; + }; + }; }; rcc: rcc@40023810 { @@ -323,6 +343,21 @@ st,mem2mem; }; + ethernet0: dwmac@40028000 { + compatible = "st,stm32-dwmac", "snps,dwmac-3.50a"; + status = "disabled"; + reg = <0x40028000 0x8000>; + reg-names = "stmmaceth"; + interrupts = <0 61 0>, <0 62 0>; + interrupt-names = "macirq", "eth_wake_irq"; + clock-names = "stmmaceth", "tx-clk", "rx-clk"; + clocks = <&rcc 0 25>, <&rcc 0 26>, <&rcc 0 27>; + st,syscon = <&syscfg 0x4>; + snps,pbl = <8>; + snps,mixed-burst; + dma-ranges; + }; + rng: rng@50060800 { compatible = "st,stm32-rng"; reg = <0x50060800 0x400>; -- 1.9.1
[PATCH] net-sysfs: remove left over dead code
This format hasn't been used since 04ed3e741d0f133e02bed7fa5c98edba128f90e7 ("net: change netdev->features to u32") Signed-off-by: Eric Engestrom --- net/core/net-sysfs.c | 1 - 1 file changed, 1 deletion(-) diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c index b6c8a66..e326707 100644 --- a/net/core/net-sysfs.c +++ b/net/core/net-sysfs.c @@ -29,7 +29,6 @@ #ifdef CONFIG_SYSFS static const char fmt_hex[] = "%#x\n"; -static const char fmt_long_hex[] = "%#lx\n"; static const char fmt_dec[] = "%d\n"; static const char fmt_ulong[] = "%lu\n"; static const char fmt_u64[] = "%llu\n"; -- 2.7.1
[PATCH 3/3] ARM: dts: stm32f429: Enable Ethernet on Eval board
MAC is connected to a PHY in MII mode. Signed-off-by: Alexandre TORGUE diff --git a/arch/arm/boot/dts/stm32429i-eval.dts b/arch/arm/boot/dts/stm32429i-eval.dts index 1ae57fa..e345459 100644 --- a/arch/arm/boot/dts/stm32429i-eval.dts +++ b/arch/arm/boot/dts/stm32429i-eval.dts @@ -87,6 +87,21 @@ clock-frequency = <2500>; }; +&ethernet0 { + status = "okay"; + pinctrl-0 = <&ethernet0_mii>; + pinctrl-names = "default"; + phy-mode= "mii-id"; + mdio0 { + #address-cells = <1>; + #size-cells = <0>; + compatible = "snps,dwmac-mdio"; + phy1: ethernet-phy@1 { + reg = <1>; + }; + }; +}; + &usart1 { pinctrl-0 = <&usart1_pins_a>; pinctrl-names = "default"; -- 1.9.1
Re: [PATCH v2 1/3] net: ipv4: Convert IP network timestamps to be y2038 safe
On Saturday 27 February 2016 00:32:15 Deepa Dinamani wrote: > ICMP timestamp messages and IP source route options require > timestamps to be in milliseconds modulo 24 hours from > midnight UT format. > > Add inet_current_timestamp() function to support this. The function > returns the required timestamp in network byte order. > > Timestamp calculation is also changed to call ktime_get_real_ts64() > which uses struct timespec64. struct timespec64 is y2038 safe. > Previously it called getnstimeofday() which uses struct timespec. > struct timespec is not y2038 safe. > > Signed-off-by: Deepa Dinamani > Cc: "David S. Miller" > Cc: Alexey Kuznetsov > Cc: Hideaki YOSHIFUJI > Cc: James Morris > Cc: Patrick McHardy > Acked-by: Arnd Bergmann
[PATCH 1/3] ARM: dts: stm32f429: Add system config bank node
Signed-off-by: Alexandre TORGUE diff --git a/arch/arm/boot/dts/stm32f429.dtsi b/arch/arm/boot/dts/stm32f429.dtsi index 598362e..bb7a736 100644 --- a/arch/arm/boot/dts/stm32f429.dtsi +++ b/arch/arm/boot/dts/stm32f429.dtsi @@ -171,6 +171,11 @@ status = "disabled"; }; + syscfg: system-config@40013800 { + compatible = "syscon"; + reg = <0x40013800 0x400>; + }; + pin-controller { #address-cells = <1>; #size-cells = <1>; -- 1.9.1
Re: Softirq priority inversion from "softirq: reduce latencies"
On lun., 2016-02-29 at 07:58 -0800, Peter Hurley wrote: > All that's happened is the first loop of NET_RX softirq has woken a > process; that is sufficient to abort softirq and defer it for ksoftirqd. > > That's why I'm saying this is a priority inversion, and one that > will happen a lot. Sure. This will happen every time ksoftirqd is launched. Get rid of ksoftirqd or renice it so that you can easily be killed by softirq storm.
Re: Softirq priority inversion from "softirq: reduce latencies"
On Mon, 2016-02-29 at 07:54 -0800, Peter Hurley wrote:

> The current kernel is HZ=250 but this would occur on HZ=1000 as well.

Right. But the problem with HZ=100 and HZ=250 is that the detection can happen because jiffy granularity is too coarse, since msecs_to_jiffies(2) -> 1

The following patch might reduce the probability, but won't really fix your problem. The fact that ksoftirqd prio is not what you want is completely orthogonal.

diff --git a/kernel/softirq.c b/kernel/softirq.c index 479e443..f7cc594 100644 --- a/kernel/softirq.c +++ b/kernel/softirq.c @@ -180,7 +180,7 @@ EXPORT_SYMBOL(__local_bh_enable_ip); /* * We restart softirq processing for at most MAX_SOFTIRQ_RESTART times, - * but break the loop if need_resched() is set or after 2 ms. + * but break the loop if need_resched() is set or after 2 ms/ticks. * The MAX_SOFTIRQ_TIME provides a nice upper bound in most cases, but in * certain cases, such as stop_machine(), jiffies may cease to * increment and so we need the MAX_SOFTIRQ_RESTART limit as @@ -191,7 +191,7 @@ EXPORT_SYMBOL(__local_bh_enable_ip); * we want to handle softirqs as soon as possible, but they * should not be able to lock up the box. */ -#define MAX_SOFTIRQ_TIME msecs_to_jiffies(2) +#define MAX_SOFTIRQ_TIME (1 + msecs_to_jiffies(2)) #define MAX_SOFTIRQ_RESTART 10 #ifdef CONFIG_TRACE_IRQFLAGS
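The granularity point above (msecs_to_jiffies(2) -> 1 at HZ=100 and HZ=250) can be checked with a little arithmetic. The sketch below is a simplified model, not the kernel's implementation: ms_to_jiffies_model() just rounds up to whole ticks and assumes HZ divides 1000 evenly, and min_budget_us() is a hypothetical helper expressing that a timeout of n jiffies can expire after as little as n-1 full tick periods, because the first tick may arrive right after the deadline is computed.

```c
/*
 * Simplified model of msecs_to_jiffies(): round up to whole ticks.
 * Assumption: hz divides 1000 evenly (e.g. 100, 250, 1000).
 */
static unsigned long ms_to_jiffies_model(unsigned int ms, unsigned int hz)
{
	return ((unsigned long)ms * hz + 999) / 1000;
}

/*
 * Shortest real time, in microseconds, before a deadline of
 * "now + n jiffies" can be reached: the first tick may be imminent,
 * so only n - 1 full tick periods are guaranteed.
 */
static unsigned long min_budget_us(unsigned long n, unsigned int hz)
{
	return n ? (n - 1) * (1000000UL / hz) : 0;
}
```

Under this model, 2 ms becomes 1 jiffy at HZ=100 and HZ=250, so the guaranteed budget is 0 us; at HZ=1000 it becomes 2 jiffies with at least 1000 us. With the extra jiffy from the patch, HZ=250 would get at least 4000 us.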
[PATCH] stmmac: Fix 'eth0: No PHY found' regression
This patch manages the case when you have an Ethernet MAC with a "fixed link", and not connected to a normal MDIO-managed PHY device. The test of phy_bus_name was not helpful because it was never affected and replaced by the mdio test node. Signed-off-by: Gabriel Fernandez --- drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c | 11 +-- drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c | 9 - include/linux/stmmac.h| 1 + 3 files changed, 10 insertions(+), 11 deletions(-) diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c index 0faf163..efb54f3 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c @@ -199,21 +199,12 @@ int stmmac_mdio_register(struct net_device *ndev) struct stmmac_priv *priv = netdev_priv(ndev); struct stmmac_mdio_bus_data *mdio_bus_data = priv->plat->mdio_bus_data; int addr, found; - struct device_node *mdio_node = NULL; - struct device_node *child_node = NULL; + struct device_node *mdio_node = priv->plat->mdio_node; if (!mdio_bus_data) return 0; if (IS_ENABLED(CONFIG_OF)) { - for_each_child_of_node(priv->device->of_node, child_node) { - if (of_device_is_compatible(child_node, - "snps,dwmac-mdio")) { - mdio_node = child_node; - break; - } - } - if (mdio_node) { netdev_dbg(ndev, "FOUND MDIO subnode\n"); } else { diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c index 6a52fa1..4514ba7 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c @@ -110,6 +110,7 @@ stmmac_probe_config_dt(struct platform_device *pdev, const char **mac) struct device_node *np = pdev->dev.of_node; struct plat_stmmacenet_data *plat; struct stmmac_dma_cfg *dma_cfg; + struct device_node *child_node = NULL; plat = devm_kzalloc(&pdev->dev, sizeof(*plat), GFP_KERNEL); if (!plat) @@ -140,13 +141,19 @@ 
stmmac_probe_config_dt(struct platform_device *pdev, const char **mac) plat->phy_node = of_node_get(np); } + for_each_child_of_node(np, child_node) + if (of_device_is_compatible(child_node, "snps,dwmac-mdio")) { + plat->mdio_node = child_node; + break; + } + /* "snps,phy-addr" is not a standard property. Mark it as deprecated * and warn of its use. Remove this when phy node support is added. */ if (of_property_read_u32(np, "snps,phy-addr", &plat->phy_addr) == 0) dev_warn(&pdev->dev, "snps,phy-addr property is deprecated\n"); - if ((plat->phy_node && !of_phy_is_fixed_link(np)) || plat->phy_bus_name) + if ((plat->phy_node && !of_phy_is_fixed_link(np)) || !plat->mdio_node) plat->mdio_bus_data = NULL; else plat->mdio_bus_data = diff --git a/include/linux/stmmac.h b/include/linux/stmmac.h index eead8ab..881a79d 100644 --- a/include/linux/stmmac.h +++ b/include/linux/stmmac.h @@ -100,6 +100,7 @@ struct plat_stmmacenet_data { int interface; struct stmmac_mdio_bus_data *mdio_bus_data; struct device_node *phy_node; + struct device_node *mdio_node; struct stmmac_dma_cfg *dma_cfg; int clk_csr; int has_gmac; -- 1.9.1
Re: Softirq priority inversion from "softirq: reduce latencies"
On 02/29/2016 07:19 AM, Eric Dumazet wrote: > On lun., 2016-02-29 at 07:03 -0800, Peter Hurley wrote: > >> Not the case. The softirq is raised from interrupt. >> >> Before Eric's change, when an interrupt raises a new softirq >> while processing another softirq, the new softirq is immediately >> processed *after the existing softirq completes*. >> >> After Eric's change, when an interrupt raises a new softirq >> while processing another softirq and _that softirq wakes a process_, >> the new softirq is *deferred to normal process priority*. > > For the last time, this is not true. > > My patch changed the probability for this to happen. There is a huge difference between 1. heavy i/o load forcing ksoftirqd to battle out i/o with regular sched processes *as a fallback to avoid 100% softirq* and 2. always deferring new softirq just because a process was woken > It will happen even if you revert it. I think there is a happy medium where finer constraints on softirq looping will get us both what we want. For example, an accumulating mask of softirq already run would keep one softirq level from looping over-and-over. Or a per-softirq limiting counter. Or relying on the hard limit that was added later of a fixed number of softirq loops. Or a combination of those. > linux never claimed that softirq could steal all cpu time. That's not the problem observed here. In fact, what your patch triggers is exactly the opposite: although cpu load is initially very light because DMA is used to perform device i/o, once DMA is not being serviced in a timely manner, the driver fallbacks to purely interrupt-driven i/o which dramatically increases the real cpu load at those line rates. > Are by any chance still running a HZ=100 kernel ? The current kernel is HZ=250 but this would occur on HZ=1000 as well. Regards, Peter Hurley
Re: Softirq priority inversion from "softirq: reduce latencies"
On 02/29/2016 07:40 AM, Mike Galbraith wrote:
> On Mon, 2016-02-29 at 07:03 -0800, Peter Hurley wrote:
>
>>> If I'm listening properly, the root cause is that there is a timing
>>> constraint involved, which is being exposed because one softirq raises
>>> another (ew).
>>
>> Not the case. The softirq is raised from interrupt.
>
> Yeah, saw that on re-read.
>
>> Before Eric's change, when an interrupt raises a new softirq
>> while processing another softirq, the new softirq is immediately
>> processed *after the existing softirq completes*.
>
> Not necessarily, Eric only changed it from an arbitrary count to an
> arbitrary time, so your irq could just as well land when there's no
> count left and be up the same creek.

You're misreading the softirq abort logic: neither 2ms nor a fixed number of loops has elapsed.

All that's happened is the first loop of NET_RX softirq has woken a process; that is sufficient to abort softirq and defer it for ksoftirqd.

That's why I'm saying this is a priority inversion, and one that will happen a lot.

> I was more infatuated by the constraint that's left dangling in the
> breeze any time processing is deferred to ksoftirqd.
>
> -Mike
>
Re: [PATCH] mld, igmp: Fix reserved tailroom calculation
On 02/29/2016 04:19 PM, Benjamin Poirier wrote: On 2016/02/29 15:57, Daniel Borkmann wrote: [...] [ cutting the IPv4 part off as diff is the same ] diff --git a/net/ipv6/mcast.c b/net/ipv6/mcast.c index 5ee56d0..c157edc 100644 --- a/net/ipv6/mcast.c +++ b/net/ipv6/mcast.c @@ -1574,9 +1574,9 @@ static struct sk_buff *mld_newpack(struct inet6_dev *idev, unsigned int mtu) return NULL; skb->priority = TC_PRIO_CONTROL; - skb->reserved_tailroom = skb_end_offset(skb) - -min(mtu, skb_end_offset(skb)); skb_reserve(skb, hlen); + skb->reserved_tailroom = skb_tailroom(skb) - + min_t(int, mtu, skb_tailroom(skb) - tlen); Are you sure this is correct? Wouldn't that mean (assuming we allocated enough space), that I could now fill a larger than MTU frame? Quoting back a part of the log: The maximum space available for ip headers and payload without fragmentation is min(mtu, data + extra). Therefore, reserved_tailroom = data + extra + tlen - min(mtu, data + extra) = skb_end_offset - hlen - min(mtu, skb_end_offset - hlen - tlen) = skb_tailroom - min(mtu, skb_tailroom - tlen) ; after skb_reserve(hlen) The min() takes care of the situation you describe, ie. if the allocated space is large, reserved_tailroom will be large enough that we do not use more space than the mtu. Hmm, sorry, you are right, I had a bug in my thought process wrt the skb_reserve() that is now done first. Code is fine, patch would be against -net tree: Acked-by: Daniel Borkmann Thanks, Benjamin!
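The formula quoted from the commit log can be tried with concrete numbers. The sketch below is plain arithmetic on integers, not kernel code: the function names are made up, end_offset stands for skb_end_offset() of a freshly allocated buffer, and tailroom is taken as end_offset - hlen, which is the relation the log uses after skb_reserve(hlen).

```c
static int min_int(int a, int b)
{
	return a < b ? a : b;
}

/* Old expression from the removed lines, computed before skb_reserve(). */
static int reserved_tailroom_old(int end_offset, int mtu)
{
	return end_offset - min_int(mtu, end_offset);
}

/* New expression from the added lines, computed after skb_reserve(hlen). */
static int reserved_tailroom_new(int end_offset, int hlen, int tlen, int mtu)
{
	int tailroom = end_offset - hlen;

	return tailroom - min_int(mtu, tailroom - tlen);
}
```

With end_offset = 3016, hlen = tlen = 16 and mtu = 1500, the new expression reserves 1500 of the 3000 bytes of tailroom, leaving exactly mtu bytes usable. With a small buffer (end_offset = 1016) it reserves 16 bytes, that is tlen, whereas the old expression reserves 0 in that case and 1516 in the large case.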
Re: [PATCH] mld, igmp: Fix reserved tailroom calculation
On 29.02.2016 16:19, Benjamin Poirier wrote: On 2016/02/29 15:57, Daniel Borkmann wrote: [...] [ cutting the IPv4 part off as diff is the same ] diff --git a/net/ipv6/mcast.c b/net/ipv6/mcast.c index 5ee56d0..c157edc 100644 --- a/net/ipv6/mcast.c +++ b/net/ipv6/mcast.c @@ -1574,9 +1574,9 @@ static struct sk_buff *mld_newpack(struct inet6_dev *idev, unsigned int mtu) return NULL; skb->priority = TC_PRIO_CONTROL; - skb->reserved_tailroom = skb_end_offset(skb) - -min(mtu, skb_end_offset(skb)); skb_reserve(skb, hlen); + skb->reserved_tailroom = skb_tailroom(skb) - + min_t(int, mtu, skb_tailroom(skb) - tlen); Are you sure this is correct? Wouldn't that mean (assuming we allocated enough space), that I could now fill a larger than MTU frame? Quoting back a part of the log: The maximum space available for ip headers and payload without fragmentation is min(mtu, data + extra). Therefore, reserved_tailroom = data + extra + tlen - min(mtu, data + extra) = skb_end_offset - hlen - min(mtu, skb_end_offset - hlen - tlen) = skb_tailroom - min(mtu, skb_tailroom - tlen) ; after skb_reserve(hlen) The min() takes care of the situation you describe, ie. if the allocated space is large, reserved_tailroom will be large enough that we do not use more space than the mtu. I tested the mld and igmp code with different driver parameters, mtu values, number of multicast address records and even allocation failures. If you think the formula is wrong, please provide a counter-example with hlen, tlen, mtu and size values. I think the code is fine albeit I think we should remove the min macro and just do something: if (skb_tailroom(skb) > mtu) skb->reserved_tailroom = skb_tailroom(skb) - mtu; Does that make sense? I think it is much more readable. Thanks, Hannes
Re: Softirq priority inversion from "softirq: reduce latencies"
On Mon, 2016-02-29 at 07:03 -0800, Peter Hurley wrote: > > If I'm listening properly, the root cause is that there is a timing > > constraint involved, which is being exposed because one softirq raises > > another (ew). > > Not the case. The softirq is raised from interrupt. Yeah, saw that on re-read. > Before Eric's change, when an interrupt raises a new softirq > while processing another softirq, the new softirq is immediately > processed *after the existing softirq completes*. Not necessarily, Eric only changed it from an arbitrary count to an arbitrary time, so your irq could just as well land when there's no count left and be up the same creek. I was more infatuated by the constraint that's left dangling in the breeze any time processing is deferred to ksoftirqd. -Mike
[PATCH] fsl/fman: remove dTSEC-A003 Errata workaround
From: Igal Liberman Errata dTSEC-A003 was fixed in P4080 rev 3.0. Prior revisions are not supported, so the workaround can be removed. Signed-off-by: Igal Liberman --- drivers/net/ethernet/freescale/fman/fman_dtsec.c |8 1 file changed, 8 deletions(-) diff --git a/drivers/net/ethernet/freescale/fman/fman_dtsec.c b/drivers/net/ethernet/freescale/fman/fman_dtsec.c index 7c92eb8..09dd46d 100644 --- a/drivers/net/ethernet/freescale/fman/fman_dtsec.c +++ b/drivers/net/ethernet/freescale/fman/fman_dtsec.c @@ -932,14 +932,6 @@ int dtsec_set_tx_pause_frames(struct fman_mac *dtsec, if (!is_init_done(dtsec->dtsec_drv_param)) return -EINVAL; - /* FM_BAD_TX_TS_IN_B_2_B_ERRATA_DTSEC_A003 Errata workaround */ - if (dtsec->fm_rev_info.major == 2) - if (pause_time <= 320) { - pr_warn("pause-time: %d illegal.Should be > 320\n", - pause_time); - return -EINVAL; - } - if (pause_time) { ptv = ioread32be(&regs->ptv); ptv &= PTV_PTE_MASK; -- 1.7.9.5
[PATCH net 4/5] dwc_eth_qos: use DWCEQOS_MSG_DEFAULT
From: Rabin Vincent Since debug is hardcoded to 3, the defaults in the DWCEQOS_MSG_DEFAULT macro are never used, which does not seem to be the intended behaviour here. Set debug to -1 like other drivers so that DWCEQOS_MSG_DEFAULT is actually used by default. Signed-off-by: Rabin Vincent Signed-off-by: Lars Persson --- drivers/net/ethernet/synopsys/dwc_eth_qos.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/synopsys/dwc_eth_qos.c b/drivers/net/ethernet/synopsys/dwc_eth_qos.c index 3ca2d5c..6897c1d 100644 --- a/drivers/net/ethernet/synopsys/dwc_eth_qos.c +++ b/drivers/net/ethernet/synopsys/dwc_eth_qos.c @@ -426,7 +426,7 @@ #define DWC_MMC_RXOCTETCOUNT_GB 0x0784 #define DWC_MMC_RXPACKETCOUNT_GB 0x0780 -static int debug = 3; +static int debug = -1; module_param(debug, int, 0); MODULE_PARM_DESC(debug, "DWC_eth_qos debug level (0=none,...,16=all)"); -- 2.1.4
[PATCH net 1/5] dwc_eth_qos: fix race condition in dwceqos_start_xmit
From: Rabin Vincent The xmit handler and the tx_reclaim tasklet had a race on the tx_free variable which could lead to a tx timeout if tx_free was updated after the tx complete interrupt. Signed-off-by: Rabin Vincent Signed-off-by: Lars Persson --- drivers/net/ethernet/synopsys/dwc_eth_qos.c | 6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/drivers/net/ethernet/synopsys/dwc_eth_qos.c b/drivers/net/ethernet/synopsys/dwc_eth_qos.c index fc8bbff..926db2d 100644 --- a/drivers/net/ethernet/synopsys/dwc_eth_qos.c +++ b/drivers/net/ethernet/synopsys/dwc_eth_qos.c @@ -2178,12 +2178,10 @@ static int dwceqos_start_xmit(struct sk_buff *skb, struct net_device *ndev) ((trans.initial_descriptor + trans.nr_descriptors) % DWCEQOS_TX_DCNT)); - dwceqos_tx_finalize(skb, lp, &trans); - - netdev_sent_queue(ndev, skb->len); - spin_lock_bh(&lp->tx_lock); lp->tx_free -= trans.nr_descriptors; + dwceqos_tx_finalize(skb, lp, &trans); + netdev_sent_queue(ndev, skb->len); spin_unlock_bh(&lp->tx_lock); ndev->trans_start = jiffies; -- 2.1.4
[PATCH net 2/5] dwc_eth_qos: release descriptors outside netif_tx_lock
To prepare for using the CMA, we cannot be in atomic context when
de-allocating DMA buffers. The tx lock was needed only to protect the hw
reset against the xmit handler. Now we briefly grab the tx lock while
stopping the queue to make sure no thread is inside or will enter the
xmit handler.

Signed-off-by: Lars Persson
---
 drivers/net/ethernet/synopsys/dwc_eth_qos.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/synopsys/dwc_eth_qos.c b/drivers/net/ethernet/synopsys/dwc_eth_qos.c
index 926db2d..53d48c0 100644
--- a/drivers/net/ethernet/synopsys/dwc_eth_qos.c
+++ b/drivers/net/ethernet/synopsys/dwc_eth_qos.c
@@ -1918,15 +1918,17 @@ static int dwceqos_stop(struct net_device *ndev)
 	phy_stop(lp->phy_dev);

 	tasklet_disable(&lp->tx_bdreclaim_tasklet);
-	netif_stop_queue(ndev);
 	napi_disable(&lp->napi);

-	dwceqos_drain_dma(lp);
+	/* Stop all tx before we drain the tx dma. */
+	netif_tx_lock_bh(lp->ndev);
+	netif_stop_queue(ndev);
+	netif_tx_unlock_bh(lp->ndev);

-	netif_tx_lock(lp->ndev);
+	dwceqos_drain_dma(lp);
 	dwceqos_reset_hw(lp);
+
 	dwceqos_descriptor_free(lp);
-	netif_tx_unlock(lp->ndev);

 	return 0;
 }
--
2.1.4
[PATCH net 5/5] dwc_eth_qos: do phy_start before resetting hardware
This reverts the changed init order from commit 3647bc35bd42
("dwc_eth_qos: Reset hardware before PHY start") and makes another fix
for the race.

It turned out that the reset state machine of the dwceqos hardware
requires PHY clocks to be present in order to complete the reset cycle.

To plug the race with the phy state machine we defer link speed setting
until the hardware init has finished.

Signed-off-by: Lars Persson
---
 drivers/net/ethernet/synopsys/dwc_eth_qos.c | 23 ---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/synopsys/dwc_eth_qos.c b/drivers/net/ethernet/synopsys/dwc_eth_qos.c
index 6897c1d..af11ed1 100644
--- a/drivers/net/ethernet/synopsys/dwc_eth_qos.c
+++ b/drivers/net/ethernet/synopsys/dwc_eth_qos.c
@@ -650,6 +650,11 @@ struct net_local {
 	u32 mmc_tx_counters_mask;

 	struct dwceqos_flowcontrol flowcontrol;
+
+	/* Tracks the intermediate state of phy started but hardware
+	 * init not finished yet.
+	 */
+	bool phy_defer;
 };

 static void dwceqos_read_mmc_counters(struct net_local *lp, u32 rx_mask,
@@ -901,6 +906,9 @@ static void dwceqos_adjust_link(struct net_device *ndev)
 	struct phy_device *phydev = lp->phy_dev;
 	int status_change = 0;

+	if (lp->phy_defer)
+		return;
+
 	if (phydev->link) {
 		if ((lp->speed != phydev->speed) ||
 		    (lp->duplex != phydev->duplex)) {
@@ -1635,6 +1643,12 @@ static void dwceqos_init_hw(struct net_local *lp)
 	regval = dwceqos_read(lp, REG_DWCEQOS_MAC_CFG);
 	dwceqos_write(lp, REG_DWCEQOS_MAC_CFG,
 		      regval | DWCEQOS_MAC_CFG_TE | DWCEQOS_MAC_CFG_RE);
+
+	lp->phy_defer = false;
+	mutex_lock(&lp->phy_dev->lock);
+	phy_read_status(lp->phy_dev);
+	dwceqos_adjust_link(lp->ndev);
+	mutex_unlock(&lp->phy_dev->lock);
 }

 static void dwceqos_tx_reclaim(unsigned long data)
@@ -1880,9 +1894,13 @@ static int dwceqos_open(struct net_device *ndev)
 	}
 	netdev_reset_queue(ndev);

+	/* The dwceqos reset state machine requires all phy clocks to complete,
+	 * hence the unusual init order with phy_start first.
+	 */
+	lp->phy_defer = true;
+	phy_start(lp->phy_dev);
 	dwceqos_init_hw(lp);
 	napi_enable(&lp->napi);
-	phy_start(lp->phy_dev);
 	netif_start_queue(ndev);
 	tasklet_enable(&lp->tx_bdreclaim_tasklet);

@@ -1915,8 +1933,6 @@ static int dwceqos_stop(struct net_device *ndev)
 {
 	struct net_local *lp = netdev_priv(ndev);

-	phy_stop(lp->phy_dev);
-
 	tasklet_disable(&lp->tx_bdreclaim_tasklet);
 	napi_disable(&lp->napi);

@@ -1927,6 +1943,7 @@ static int dwceqos_stop(struct net_device *ndev)

 	dwceqos_drain_dma(lp);
 	dwceqos_reset_hw(lp);
+	phy_stop(lp->phy_dev);

 	dwceqos_descriptor_free(lp);

--
2.1.4
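The deferral scheme in this patch can be sketched as a tiny userspace state model. It is not the driver code: the struct and function names below (model_mac, model_adjust_link, model_phy_event, model_init_hw, model_open) are hypothetical, and the assumption that the hardware reset clears the MAC speed setting is made only to show why a link event that arrives too early would otherwise be lost.

```c
#include <stdbool.h>

struct model_mac {
	bool phy_defer;	/* true between phy start and end of hw init */
	int phy_speed;	/* speed last reported by the PHY */
	int mac_speed;	/* speed programmed into the MAC, 0 = none */
};

/* Link-change callback: ignored while hardware init is still pending. */
void model_adjust_link(struct model_mac *m)
{
	if (m->phy_defer)
		return;
	m->mac_speed = m->phy_speed;
}

/* The PHY state machine may report a link any time after phy start. */
void model_phy_event(struct model_mac *m, int speed)
{
	m->phy_speed = speed;
	model_adjust_link(m);
}

/* Hardware init: reset (assumed to wipe the MAC speed), then replay. */
void model_init_hw(struct model_mac *m)
{
	m->mac_speed = 0;
	m->phy_defer = false;
	model_adjust_link(m);	/* apply the latest PHY status */
}

/* Open sequence in the patched order: phy start first, then hw init. */
void model_open(struct model_mac *m, int early_link_speed)
{
	m->phy_defer = true;
	model_phy_event(m, early_link_speed);	/* arrives before init is done */
	model_init_hw(m);
}
```

The early link event only records the PHY speed. The MAC is programmed once, after init has finished, which corresponds to the phy_read_status() and dwceqos_adjust_link() calls added at the end of dwceqos_init_hw().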
[PATCH net 3/5] dwc_eth_qos: use GFP_KERNEL in dma_alloc_coherent()
From: Rabin Vincent

Since we are in non-atomic context here we can pass GFP_KERNEL to
dma_alloc_coherent(). This enables use of the CMA.

Signed-off-by: Rabin Vincent
Signed-off-by: Lars Persson
---
 drivers/net/ethernet/synopsys/dwc_eth_qos.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/synopsys/dwc_eth_qos.c b/drivers/net/ethernet/synopsys/dwc_eth_qos.c
index 53d48c0..3ca2d5c 100644
--- a/drivers/net/ethernet/synopsys/dwc_eth_qos.c
+++ b/drivers/net/ethernet/synopsys/dwc_eth_qos.c
@@ -1113,7 +1113,7 @@ static int dwceqos_descriptor_init(struct net_local *lp)
 	/* Allocate DMA descriptors */
 	size = DWCEQOS_RX_DCNT * sizeof(struct dwceqos_dma_desc);
 	lp->rx_descs = dma_alloc_coherent(lp->ndev->dev.parent, size,
-			&lp->rx_descs_addr, 0);
+			&lp->rx_descs_addr, GFP_KERNEL);
 	if (!lp->rx_descs)
 		goto err_out;
 	lp->rx_descs_tail_addr = lp->rx_descs_addr +
@@ -1121,7 +1121,7 @@ static int dwceqos_descriptor_init(struct net_local *lp)

 	size = DWCEQOS_TX_DCNT * sizeof(struct dwceqos_dma_desc);
 	lp->tx_descs = dma_alloc_coherent(lp->ndev->dev.parent, size,
-			&lp->tx_descs_addr, 0);
+			&lp->tx_descs_addr, GFP_KERNEL);
 	if (!lp->tx_descs)
 		goto err_out;
 	lp->tx_descs_tail_addr = lp->tx_descs_addr +
--
2.1.4
Re: Softirq priority inversion from "softirq: reduce latencies"
On Mon, 2016-02-29 at 07:03 -0800, Peter Hurley wrote:

> The reason why Eric's change is so effective for Eric's workload is
> that it fixes the problem where NET_RX keeps getting new network packets
> so it keeps looping, servicing more NET_RX softirq.

You have very little idea of what is happening in networking land.

Once the hard irq for RX has triggered, we arm a NAPI (NET_RX softirq),
and no more irqs will come until the napi handler has run.

Then when NAPI is complete, we re-allow interrupts to be delivered when
a new packet arrives.

Yes, ksoftirqd runs under load, and this is _wanted_.

Sure, it might add latency if some high prio task wants the same cpu,
but this is exactly the purpose of having multitasking.
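The interrupt/poll handshake described in the reply can be sketched as a small userspace state model. This is an illustration only, not kernel code: model_napi, model_packet_arrives and model_poll are hypothetical names, and the "done < budget means polling is complete" rule follows the usual driver convention rather than anything stated in this thread.

```c
#include <assert.h>
#include <stdbool.h>

struct model_napi {
	bool irq_enabled;	/* device RX interrupt unmasked */
	bool scheduled;		/* poll handler armed (NET_RX softirq) */
	int rx_pending;		/* packets waiting in the RX ring */
	int hard_irqs;		/* hard interrupts taken so far */
};

/* A packet arrives; a hard irq fires only if the interrupt is unmasked. */
void model_packet_arrives(struct model_napi *n)
{
	n->rx_pending++;
	if (n->irq_enabled) {
		n->hard_irqs++;
		n->irq_enabled = false;	/* no more irqs until polling completes */
		n->scheduled = true;	/* arm NAPI */
	}
}

/* Poll up to "budget" packets; returns how many were processed. */
int model_poll(struct model_napi *n, int budget)
{
	int done = 0;

	assert(budget > 0);
	if (!n->scheduled)
		return 0;
	while (done < budget && n->rx_pending > 0) {
		n->rx_pending--;
		done++;
	}
	if (done < budget) {		/* ring drained: NAPI is complete */
		n->scheduled = false;
		n->irq_enabled = true;	/* re-allow interrupts */
	}
	return done;
}
```

In this model, packets that arrive while polling is armed add to the ring without raising another hard interrupt. A new interrupt is taken only after a poll pass has drained the ring and unmasked the device.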
[PATCH] fsl/fman: Initialize fman->dev earlier
From: Igal Liberman

Currently, in a case of error, dev_err is using fman->dev before its
initialization and "(NULL device *)" is printed. This patch fixes this
issue.

Signed-off-by: Igal Liberman
---
 drivers/net/ethernet/freescale/fman/fman.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fman/fman.c b/drivers/net/ethernet/freescale/fman/fman.c
index 623aa1c..79a210a 100644
--- a/drivers/net/ethernet/freescale/fman/fman.c
+++ b/drivers/net/ethernet/freescale/fman/fman.c
@@ -2791,6 +2791,8 @@ static struct fman *read_dts_node(struct platform_device *of_dev)
 		goto fman_free;
 	}

+	fman->dev = &of_dev->dev;
+
 	return fman;

 fman_node_put:
@@ -2845,8 +2847,6 @@ static int fman_probe(struct platform_device *of_dev)

 	dev_set_drvdata(dev, fman);

-	fman->dev = dev;
-
 	dev_dbg(dev, "FMan%d probed\n", fman->dts_params.id);

 	return 0;
--
1.7.9.5