[PATCH net v3.16] r8169: Increase no descriptors on max.

2016-02-29 Thread Corcodel Marian
  This patch increases the number of rx/tx descriptors to the maximum
  allowed: 1024 descriptors of 4 double words each.

Signed-off-by: Corcodel Marian 
---
 drivers/net/ethernet/realtek/r8169.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index e215812..5fd3fca 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -84,8 +84,8 @@ static const int multicast_filter_limit = 32;
 
 #define R8169_REGS_SIZE		256
 #define R8169_NAPI_WEIGHT	64
-#define NUM_TX_DESC	64	/* Number of Tx descriptor registers */
-#define NUM_RX_DESC	256U	/* Number of Rx descriptor registers */
+#define NUM_TX_DESC	1024	/* Number of Tx descriptor registers */
+#define NUM_RX_DESC	1024U	/* Number of Rx descriptor registers */
 #define R8169_TX_RING_BYTES	(NUM_TX_DESC * sizeof(struct TxDesc))
 #define R8169_RX_RING_BYTES	(NUM_RX_DESC * sizeof(struct RxDesc))
 
-- 
2.1.4
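
For reference, a minimal userspace sketch of the ring-size arithmetic behind the two *_RING_BYTES macros. It assumes a descriptor is four 32-bit words (16 bytes), as the commit message states; struct demo_desc and demo_ring_bytes() are hypothetical stand-ins, not the driver's own definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a hardware descriptor: four 32-bit words. */
struct demo_desc {
	uint32_t opts1;
	uint32_t opts2;
	uint32_t addr_lo;
	uint32_t addr_hi;
};

/* Bytes of ring memory needed for num_desc descriptors. */
static size_t demo_ring_bytes(size_t num_desc)
{
	assert(num_desc > 0);
	return num_desc * sizeof(struct demo_desc);
}
```

Under that assumption the Tx ring grows from 64 * 16 = 1024 bytes to 1024 * 16 = 16384 bytes, and the Rx ring from 256 * 16 = 4096 bytes to 16384 bytes.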



[net-next] arp: correct return value of arp_rcv

2016-02-29 Thread Zhang Shengju
Currently, arp_rcv() always returns zero on a packet delivery upcall.

To make its behavior more compliant with the way this API should be
used, this patch changes this to let it return NET_RX_SUCCESS when the
packet is properly handled, and NET_RX_DROP otherwise.

Signed-off-by: Zhang Shengju 
---
 net/ipv4/arp.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
index c102eb5..ae235a1 100644
--- a/net/ipv4/arp.c
+++ b/net/ipv4/arp.c
@@ -880,7 +880,7 @@ out:
 	consume_skb(skb);
 out_free_dst:
 	dst_release(reply_dst);
-	return 0;
+	return NET_RX_SUCCESS;
 }
 
 static void parp_redo(struct sk_buff *skb)
@@ -924,11 +924,11 @@ static int arp_rcv(struct sk_buff *skb, struct net_device *dev,
 
 consumeskb:
 	consume_skb(skb);
-	return 0;
+	return NET_RX_SUCCESS;
 freeskb:
 	kfree_skb(skb);
 out_of_mem:
-	return 0;
+	return NET_RX_DROP;
 }
 
 /*
-- 
1.8.3.1
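
A minimal userspace sketch of the return convention this patch adopts. The two macro values match include/linux/netdevice.h; struct demo_pkt and demo_rcv() are hypothetical and only illustrate the pattern.

```c
#include <assert.h>
#include <stddef.h>

/* Values as defined in include/linux/netdevice.h. */
#define NET_RX_SUCCESS	0
#define NET_RX_DROP	1

/* Hypothetical, simplified packet: only a validity flag. */
struct demo_pkt {
	int valid;
};

/* Report NET_RX_DROP for packets we discard, NET_RX_SUCCESS otherwise. */
static int demo_rcv(const struct demo_pkt *pkt)
{
	if (pkt == NULL || !pkt->valid)
		return NET_RX_DROP;
	return NET_RX_SUCCESS;
}
```

Since NET_RX_SUCCESS is 0, the success paths return the same number as before; only the drop paths change numerically.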





pull request: bluetooth-next 2016-03-01

2016-02-29 Thread Johan Hedberg
Hi Dave,

Here's our main set of Bluetooth & 802.15.4 patches for the 4.6 kernel.

 - New Bluetooth HCI driver for Intel/AG6xx controllers
 - New Broadcom ACPI IDs
 - LED trigger support for indicating Bluetooth powered state
 - Various fixes in mac802154, 6lowpan and related drivers
 - New USB IDs for AR3012 Bluetooth controllers

Please let me know if there are any issues pulling. Thanks.

Johan

---
The following changes since commit a30a9ea6e21b495372aff549f3dfd63198bd1f45:

  rocker: fix rocker_world_port_obj_vlan_add() (2016-02-23 13:12:31 -0500)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next.git for-upstream

for you to fetch changes up to 34bf1912bfc06bd9200893916078eb0f16480a95:

  Bluetooth: hci_uart: Add diag and address support for Intel/AG6xx (2016-02-29 19:25:22 +0200)


Alexander Aring (9):
  MAINTAINERS: update 802.15.4 entries
  mac802154: fix mac header length check
  at86rf230: fix race on error handling
  at86rf230: fix state change handling on error
  mrf24j40: add writeable missing reg
  6lowpan: iphc: add support for stateful compression
  ieee802154: 6lowpan: fix return of netdev notifier
  6lowpan: iphc: fix stateful multicast compression
  6lowpan: iphc: fix invalid case handling

Andrzej Hajda (1):
  6lowpan: fix error checking code

Anton Protopopov (1):
  Bluetooth: hci_intel: Fix a wrong comparison

Bhumika Goyal (1):
  Bluetooth: ath3k: Fixed a blank line after declaration issue

Dmitry Tunin (3):
  Bluetooth: btusb: Add new AR3012 ID 13d3:3395
  Bluetooth: Add new AR3012 ID 0489:e095
  Bluetooth: btusb: Add a new AR3012 ID 04ca:3014

Heiner Kallweit (2):
  Bluetooth: add LED trigger for indicating HCI is powered up
  Bluetooth: Use managed version of led_trigger_register in LED trigger

J.J. Meijer (1):
  Bluetooth: hci_bcm: Add new ACPI ID for bcm43241

Koen Zandberg (1):
  mac802154: Fixes kernel oops when unloading a radio driver

Loic Poulain (1):
  Bluetooth: hci_uart: Add Intel/AG6xx support

Marcel Holtmann (1):
  Bluetooth: hci_uart: Add diag and address support for Intel/AG6xx

Mika Westerberg (1):
  Bluetooth: hci_bcm: Add BCM2E7C ACPI ID

Petri Gynther (1):
  Bluetooth: btbcm: Fix handling of firmware not found

Wei-Ning Huang (1):
  Bluetooth: hci_core: cancel power off delayed work properly

 MAINTAINERS                        |   9 +-
 drivers/bluetooth/Kconfig          |  11 +
 drivers/bluetooth/Makefile         |   1 +
 drivers/bluetooth/ath3k.c          |   7 +
 drivers/bluetooth/btbcm.c          |   3 +-
 drivers/bluetooth/btusb.c          |   3 +
 drivers/bluetooth/hci_ag6xx.c      | 337 +
 drivers/bluetooth/hci_bcm.c        |   2 +
 drivers/bluetooth/hci_intel.c      |   4 +-
 drivers/bluetooth/hci_ldisc.c      |   6 +
 drivers/bluetooth/hci_uart.h       |   8 +-
 drivers/net/ieee802154/at86rf230.c |  25 ++-
 drivers/net/ieee802154/mrf24j40.c  |   1 +
 include/net/6lowpan.h              |  32 +++
 include/net/bluetooth/hci_core.h   |   3 +
 include/net/mac802154.h            |   5 +-
 net/6lowpan/core.c                 |  39 +++-
 net/6lowpan/debugfs.c              | 247 +
 net/6lowpan/iphc.c                 | 413 +++-
 net/bluetooth/Kconfig              |   9 +
 net/bluetooth/Makefile             |   1 +
 net/bluetooth/hci_core.c           |   7 +
 net/bluetooth/leds.c               |  74 +++
 net/bluetooth/leds.h               |  16 ++
 net/ieee802154/6lowpan/core.c      |   7 +-
 net/mac802154/main.c               |   2 +-
 26 files changed, 1192 insertions(+), 80 deletions(-)
 create mode 100644 drivers/bluetooth/hci_ag6xx.c
 create mode 100644 net/bluetooth/leds.c
 create mode 100644 net/bluetooth/leds.h




Re: [Intel-wired-lan] [next] igb: allow setting MAC address on i211 using a device tree blob V5

2016-02-29 Thread John Holland
On Mar 1, 2016, at 03:52, Brown, Aaron F  wrote:

> This throws a few checkpatch warnings, but I won't withhold my tested by for 
> these:
> 
> total: 0 errors, 2 warnings, 0 checks, 21 lines checked
> 
> Your patch has style problems, please review.
> 
> NOTE: If any of the errors are false positives, please report
>  them to the maintainer, see CHECKPATCH in MAINTAINERS.
> u1463:[0]/usr/src/kernels/next-queue>

Thanks for testing...

Do you require me to reformat the patch text? And won't that break the link?

John


Re: [PATCH] [BACKPORT] [3.14.56] bnx2x: Don't notify about scratchpad parities

2016-02-29 Thread Greg KH
On Thu, Dec 10, 2015 at 02:37:34PM +0100, Patrick Schaaf wrote:
> On Friday 06 November 2015 09:32:46 Greg KH wrote:
> > On Thu, Nov 05, 2015 at 11:18:37AM +0100, Patrick Schaaf wrote:
> > > bnx2x: Don't notify about scratchpad parities
> > > 
> > > This is a (trivial) "backport" of ad6afbe9578d1fa26680faf78c846bd8c00d1d6e
> > > to stable kernel 3.14.56.
> > 
> > This patch isn't in 4.1 either, do you want it there as well?
> 
> Hi Greg,
> 
> I didn't see the patch in 3.14.57 or 3.14.58 - could you please consider it 
> again (for all stable kernels that don't have it)?
> 
> My three machines with bnx2x interfaces have been running fine with patch 
> 3.14.56, for the last 35 days. The original problematic event (spewing a 
> million messages which are suppressed by that patch), did not reoccur so far 
> (neither did any other issue, dmesg is completely empty since boot).
> 
> best regards
>   Patrick
> 
> Related earlier posts / reports, for reference:
> 
> http://marc.info/?l=linux-netdev&m=144663711626469
> http://lists.openwall.net/netdev/2015/11/05/48

Sorry for the long delay, now queued up.

greg k-h


Re: linux-next: manual merge of the target-merge tree with the net-next tree

2016-02-29 Thread Stephen Rothwell
Hi Nicholas,

On Mon, 29 Feb 2016 21:39:33 -0800 "Nicholas A. Bellinger" wrote:
>
> I'll include a note to Linus in target-pending/for-next-merge PULL
> request, and will plan to wait until after DaveM's net-next is merged
> for v4.6-rc0.

The order doesn't really matter and Linus is cleverer than I am :-)

-- 
Cheers,
Stephen Rothwell


Re: linux-next: manual merge of the target-merge tree with the net-next tree

2016-02-29 Thread Nicholas A. Bellinger
On Mon, 2016-02-29 at 17:39 +1100, Stephen Rothwell wrote:
> Hi Nicholas,
> 
> Today's linux-next merge of the target-merge tree got a conflict in:
> 
>   drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h
> 
> between commit:
> 
>   ba9cee6aa67d ("cxgb4/iw_cxgb4: TOS support")
> 
> from the net-next tree and commit:
> 
>   c973e2a3ff1b ("cxgb4: add definitions for iSCSI target ULD")
> 
> from the target-merge tree.
> 
> I fixed it up (the latter was a superset of the former) and can carry
> the fix as necessary (no action is required).
> 

Thanks Stephen.

I'll include a note to Linus in target-pending/for-next-merge PULL
request, and will plan to wait until after DaveM's net-next is merged
for v4.6-rc0.



Re: [PATCH] fsl/fman: remove dTSEC-A003 Errata workaround

2016-02-29 Thread Scott Wood
On 02/29/2016 09:17 AM, igal.liber...@freescale.com wrote:
> From: Igal Liberman 
> 
> Errata dTSEC-A003 was fixed in P4080 rev 3.0.
> Prior revisions are not supported, so the workaround can be removed.
> 
> Signed-off-by: Igal Liberman 

Since when do we not support p4080 rev 2?

-Scott



RE: [Intel-wired-lan] [next] igb: allow setting MAC address on i211 using a device tree blob V5

2016-02-29 Thread Brown, Aaron F
> From: netdev-ow...@vger.kernel.org [mailto:netdev-ow...@vger.kernel.org]
> On Behalf Of John Holland
> Sent: Thursday, February 18, 2016 3:11 AM
> To: intel-wired-...@lists.osuosl.org; netdev@vger.kernel.org
> Subject: [Intel-wired-lan] [next] igb: allow setting MAC address on i211 using
> a device tree blob V5
> 
> Hello,
> 
> The Intel i211 LOM PCIe Ethernet controllers' iNVM operates as an OTP and
> has no external EEPROM interface [1]. The following allows the driver to
> pickup the MAC address from a device tree blob when CONFIG_OF has been
> enabled.
> 
> [1]
> http://www.intel.com/content/www/us/en/embedded/products/networking/i211-ethernet-controller-datasheet.html
> 
> Changes V2
> - Restrict searching for compatible devices to current pci device.
> 
> Changes V3
> - Add device tree binding documentation.
> 
> Changes V4
> - Rebase patch.
> 
> Changes V5
> - Use eth_platform_get_mac_address() to resolve MAC specified in a dtb.
> - Remove now invalid device tree binding documentation specified in V3
>   and V4.
> 
> Signed-off-by: John Holland
> ---
>   drivers/net/ethernet/intel/igb/igb_main.c | 9 ++---
>   1 file changed, 6 insertions(+), 3 deletions(-)

This throws a few checkpatch warnings, but I won't withhold my tested by for 
these:
-
u1463:[0]/usr/src/kernels/next-queue> git format-patch $item -1 
--stdout|./scripts/checkpatch.pl -
WARNING: Possible unwrapped commit description (prefer a maximum 75 chars per 
line)
#14:
http://www.intel.com/content/www/us/en/embedded/products/networking/i211-ethernet-controller-datasheet.html

WARNING: email address 'John Holland' might be better as 
'John Holland '
#30:
Signed-off-by: John Holland

total: 0 errors, 2 warnings, 0 checks, 21 lines checked

Your patch has style problems, please review.

NOTE: If any of the errors are false positives, please report
  them to the maintainer, see CHECKPATCH in MAINTAINERS.
u1463:[0]/usr/src/kernels/next-queue>
-

I do not seem to have hardware that uses device tree, so my testing is 
relegated to regression tests with my existing set of chipsets.

Tested-by: Aaron Brown 


RE: [Intel-wired-lan] [PATCH] igb: Garbled output for "ethtool -m"

2016-02-29 Thread Brown, Aaron F
> From: Intel-wired-lan [intel-wired-lan-boun...@lists.osuosl.org] on behalf of 
> Doron Shikmoni [doron.shikm...@gmail.com]
> Sent: Tuesday, February 16, 2016 11:34 PM
> To: intel-wired-...@lists.osuosl.org
> Cc: netdev@vger.kernel.org
> Subject: [Intel-wired-lan] [PATCH] igb: Garbled output for "ethtool -m"
> 
> Hello,
> 
> Garbled output for "ethtool -m ethX", in igb-driven NICs with module /
> plugin EEPROM (i.e. SFP information). Each output data byte appears
> duplicated.
> 
> In igb_ethtool.c, igb_get_module_eeprom() is reading the EEPROM via i2c;
> the eeprom offset for each word that's read via igb_read_phy_reg_i2c()
> was passed in #words, whereas it needs to be a byte offset.
> This patch fixes the bug.
>
> Signed-off-by: Doron Shikmoni 
> ---
>  drivers/net/ethernet/intel/igb/igb_ethtool.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Checkpatch complains that you pushed the line over 80 characters:

u1463:[0]/usr/src/kernels/next-queue> git format-patch $item -1 
--stdout|./scripts/checkpatch.pl -
WARNING: line over 80 characters
#29: FILE: drivers/net/ethernet/intel/igb/igb_ethtool.c:3010:
+   status = igb_read_phy_reg_i2c(hw, (first_word + i) * 2, 
&dataword[i]);

total: 0 errors, 1 warnings, 0 checks, 8 lines checked

Your patch has style problems, please review.

NOTE: If any of the errors are false positives, please report
  them to the maintainer, see CHECKPATCH in MAINTAINERS.
u1463:[0]/usr/src/kernels/next-queue> 


But functionally seems good.  I'll let Jeff choose whether to be a stickler for 
the warning or not, so...

Tested-by: Aaron Brown 
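
A minimal userspace sketch of why passing a word index where a byte offset is expected makes each byte appear twice. The EEPROM is modelled as a plain byte array, and demo_read_word() is a hypothetical 16-bit read that takes a byte offset and returns the high byte first (the byte order is an assumption made only for this illustration).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 16-bit read at a byte offset, high byte first. */
static uint16_t demo_read_word(const uint8_t *eeprom, size_t byte_off)
{
	assert(eeprom != NULL);
	return (uint16_t)((eeprom[byte_off] << 8) | eeprom[byte_off + 1]);
}

/*
 * Read nwords words starting at word index first_word.
 * scale == 1 reproduces the bug (word index used as byte offset),
 * scale == 2 is the fix (word index converted to a byte offset).
 */
static void demo_read_words(const uint8_t *eeprom, size_t first_word,
			    size_t nwords, size_t scale, uint16_t *out)
{
	for (size_t i = 0; i < nwords; i++)
		out[i] = demo_read_word(eeprom, (first_word + i) * scale);
}
```

With EEPROM bytes 10 11 12 13 14 15 (hex), the buggy scale yields the words 1011 1112 1213, so every byte after the first shows up twice in the dump; with the offset doubled the words are 1011 1213 1415.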


Re: [PATCH] asm-generic: remove old nonatomic-io wrapper files

2016-02-29 Thread Yisen Zhuang
On 2016/2/26 22:29, Arnd Bergmann wrote:
> The two header files got moved to include/linux, and most
> users were already converted, this changes the remaining drivers
> and removes the files.
> 
> Signed-off-by: Arnd Bergmann 
> ---
>  drivers/dma/idma64.h| 2 +-
>  drivers/net/ethernet/hisilicon/hns/hns_dsaf_xgmac.c | 2 +-

The HNS portion:

Acked-by: Yisen Zhuang 

>  drivers/net/ethernet/netronome/nfp/nfp_net.h| 2 +-
>  include/asm-generic/io-64-nonatomic-hi-lo.h | 2 --
>  include/asm-generic/io-64-nonatomic-lo-hi.h | 2 --
>  5 files changed, 3 insertions(+), 7 deletions(-)
>  delete mode 100644 include/asm-generic/io-64-nonatomic-hi-lo.h
>  delete mode 100644 include/asm-generic/io-64-nonatomic-lo-hi.h
> 
> diff --git a/drivers/dma/idma64.h b/drivers/dma/idma64.h
> index 8423f13ed0da..a52ad6bcf86a 100644
> --- a/drivers/dma/idma64.h
> +++ b/drivers/dma/idma64.h
> @@ -16,7 +16,7 @@
>  #include 
>  #include 
>  
> -#include 
> +#include 
>  
>  #include "virt-dma.h"
>  
> diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_xgmac.c 
> b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_xgmac.c
> index 802d55457f19..fd90f3737963 100644
> --- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_xgmac.c
> +++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_xgmac.c
> @@ -7,7 +7,7 @@
>   * (at your option) any later version.
>   */
>  
> -#include 
> +#include 
>  #include 
>  #include "hns_dsaf_main.h"
>  #include "hns_dsaf_mac.h"
> diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net.h 
> b/drivers/net/ethernet/netronome/nfp/nfp_net.h
> index ab264e1bccd0..75683fb26734 100644
> --- a/drivers/net/ethernet/netronome/nfp/nfp_net.h
> +++ b/drivers/net/ethernet/netronome/nfp/nfp_net.h
> @@ -45,7 +45,7 @@
>  #include 
>  #include 
>  #include 
> -#include 
> +#include 
>  
>  #include "nfp_net_ctrl.h"
>  
> diff --git a/include/asm-generic/io-64-nonatomic-hi-lo.h 
> b/include/asm-generic/io-64-nonatomic-hi-lo.h
> deleted file mode 100644
> index 32b73abce1b0..
> --- a/include/asm-generic/io-64-nonatomic-hi-lo.h
> +++ /dev/null
> @@ -1,2 +0,0 @@
> -/* XXX: delete asm-generic/io-64-nonatomic-hi-lo.h after converting new 
> users */
> -#include 
> diff --git a/include/asm-generic/io-64-nonatomic-lo-hi.h 
> b/include/asm-generic/io-64-nonatomic-lo-hi.h
> deleted file mode 100644
> index 55a627c37721..
> --- a/include/asm-generic/io-64-nonatomic-lo-hi.h
> +++ /dev/null
> @@ -1,2 +0,0 @@
> -/* XXX: delete asm-generic/io-64-nonatomic-lo-hi.h after converting new 
> users */
> -#include 
> 



Re:LDPE/HDPE/TPE gloves etc.

2016-02-29 Thread Vivian
Dear Manager,

This is Vivian from Ju County Mingbo Industry & Trade Co.,Ltd in China.

Glad to hear that you're on the market for disposable PE gloves and aprons. We 
mainly produced LDPE/HDPE/TPE gloves,Two fingers gloves, PE aprons for more 
than eight years. Our products have been exported to many countries. So please 
be assured of the quality. Hope to be a partner of your company!

Any interest, please freely contact me! Looking forward to hearing from you 
soon.

Best regards!
Vivian
Ju County Mingbo Industry & Trade Co.,Ltd
ADD: Liu Guanzhuang industrial Park, Ju County, Rizhao City, Shandong Province
Tel: 18661694858
Fax: 0633-6178378



Re: [PATCH] asm-generic: remove old nonatomic-io wrapper files

2016-02-29 Thread Simon Horman
On Fri, Feb 26, 2016 at 03:29:05PM +0100, Arnd Bergmann wrote:
> The two header files got moved to include/linux, and most
> users were already converted, this changes the remaining drivers
> and removes the files.
> 
> Signed-off-by: Arnd Bergmann 
> ---
>  drivers/dma/idma64.h| 2 +-
>  drivers/net/ethernet/hisilicon/hns/hns_dsaf_xgmac.c | 2 +-
>  drivers/net/ethernet/netronome/nfp/nfp_net.h| 2 +-

The NFP portion:

Acked-by: Simon Horman 

>  include/asm-generic/io-64-nonatomic-hi-lo.h | 2 --
>  include/asm-generic/io-64-nonatomic-lo-hi.h | 2 --
>  5 files changed, 3 insertions(+), 7 deletions(-)
>  delete mode 100644 include/asm-generic/io-64-nonatomic-hi-lo.h
>  delete mode 100644 include/asm-generic/io-64-nonatomic-lo-hi.h


Re: [net-next PATCH v3 1/3] net: sched: consolidate offload decision in cls_u32

2016-02-29 Thread John Fastabend
On 16-02-29 01:25 PM, Cong Wang wrote:
> On Mon, Feb 29, 2016 at 10:58 AM, Jiri Pirko  wrote:
>> Mon, Feb 29, 2016 at 07:40:53PM CET, john.fastab...@gmail.com wrote:
>>> On 16-02-27 08:28 PM, Cong Wang wrote:
>>>> On Fri, Feb 26, 2016 at 8:24 PM, John Fastabend
>>>>  wrote:
>>>>> On 16-02-26 09:39 AM, Cong Wang wrote:
>>>>>> On Fri, Feb 26, 2016 at 7:53 AM, John Fastabend
>>>>>>  wrote:
>>>>>>> diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
>>>>>>> index 2121df5..e64d20b 100644
>>>>>>> --- a/include/net/pkt_cls.h
>>>>>>> +++ b/include/net/pkt_cls.h
>>>>>>> @@ -392,4 +392,9 @@ struct tc_cls_u32_offload {
>>>>>>> };
>>>>>>>  };
>>>>>>>
>>>>>>> +static inline bool tc_should_offload(struct net_device *dev)
>>>>>>> +{
>>>>>>> +   return dev->netdev_ops->ndo_setup_tc;
>>>>>>> +}
>>>>>>> +
>>>>>>
>>>>>> These should be protected by CONFIG_NET_CLS_U32, no?
>>>>>>
>>>>>
>>>>> Its not necessary it is a completely general function and I only
>>>>> lifted it out of cls_u32 so that the cls_flower classifier could
>>>>> also use it.
>>>>>
>>>>> I don't see the need off-hand to have it wrapped in an ORd ifdef
>>>>> statement where its (CONFIG_NET_CLS_U32 | CONFIG_NET_CLS_X ...).
>>>>> Any particular reason you were thinking it should be wrapped in ifdefs?
>>>>>
>>>>
>>>> Not a big deal.
>>>>
>>>> I just feel these don't need to compile when I have CONFIG_NET_CLS_U32=n.
>>>>
>>>> Thanks.
>>>>
>>>
>>> Well because this is 'static inline' gcc should just remove it
>>> if it is not used. Assuming non-ancient gcc and normal compile
>>> flags, e.g. you are not including -fkeep-inline-functions or
>>> something.
>>>
>>> So just to keep it readable I would prefer to just leave it
>>> as is.
>>
>> Definitely. cls_flower will use it in very near future. Making it
>> dependent on CONFIG_NET_CLS_U32 makes 0 sense to me.
> 
> Oh, why then do you have u32 in the struct name tc_cls_u32_offload?
> 
> (Note that in the above I said "these" not "this", so I never only refer
> to tc_should_offload)
> 

Hmm, yeah, that likely won't be needed by flower, although it could be used.
I still think it's best to leave this as is; there doesn't seem to be a
very strong precedent to wrap any of the other structs/fields/etc in
pkt_cls.h into their respective ifdef/endif blocks. And I think it
starts to get a bit much if we do. I'm trusting gcc here can do the
right thing when these are included but never used.

Thanks,
John
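
A minimal userspace sketch of the helper under discussion: it only tests whether the driver supplied a callback. struct demo_ops, struct demo_dev and demo_setup_tc() are hypothetical stand-ins for the kernel's net_device_ops, net_device and a driver's ndo_setup_tc implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for net_device_ops and net_device. */
struct demo_ops {
	int (*ndo_setup_tc)(void *dev);
};

struct demo_dev {
	const struct demo_ops *netdev_ops;
};

/* Offload is possible only if the driver provides the callback. */
static inline bool demo_should_offload(const struct demo_dev *dev)
{
	assert(dev != NULL && dev->netdev_ops != NULL);
	return dev->netdev_ops->ndo_setup_tc != NULL;
}

/* Hypothetical driver callback, present only so a pointer can be set. */
static int demo_setup_tc(void *dev)
{
	(void)dev;
	return 0;
}
```

Because the helper is static inline, a translation unit that includes the header without calling it does not need a definition emitted for it, which is the point made in the thread about leaving it outside any ifdef.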


RE: [PATCH net-next 0/4] lan78xx: driver update

2016-02-29 Thread Woojung.Huh
> > This patch series add new ethtool functions of set_pauseparam &
> > get_pauseparam and MAINTAINERS entry.
> 
> Series applied, thanks.
Thanks.
 
> Please fix your configuration such that your proper name appears in the
> "From: " field of your outgoing emails.  That is what ends up in the
> Author field of every GIT commit.  And right now only your email address
> appears there.
I'll contact our IT department to find out whether there is a way to do that.
Just in case, can I send from another email account such as Gmail, while the
"Signed by" line keeps my company email address?


Re: Softirq priority inversion from "softirq: reduce latencies"

2016-02-29 Thread Peter Hurley
On 02/29/2016 11:14 AM, Thomas Gleixner wrote:
> On Mon, 29 Feb 2016, Peter Hurley wrote:
>> On 02/29/2016 10:24 AM, Eric Dumazet wrote:
>>>> Just to be clear
>>>>
>>>> 	if (time_before(jiffies, end) && !need_resched() &&
>>>> 	    --max_restart)
>>>> 		goto restart;
>>>>
>>>> aborts softirq *even if 0ns have elapsed*, if NET_RX has woken a process.
>>>
>>> Sure, now remove the 1st and 2nd condition.
>>
>> Well just removing the 2nd condition has everything working fine,
>> because that fixes the priority inversion.
> 
> No. It does not fix anything. It hides the shortcomings of the driver.
>  
>> However, when system resources are _not_ contended, it makes no
>> sense to be forced to revert to ksoftirqd resolution, which is strictly
>> intended as fallback.
> 
> No. You claim it is simply because your driver does not handle that situation
> properly.
>  
>> Or flipping your argument on its head, why not just _always_ execute
>> softirq in ksoftirqd?
> 
> Which is what that change effectively does. And that makes a lot of sense,
> because you get the softirq load under scheduler control and do not let the
> softirq run as a context stealing entity which is completely uncontrollable by
> the scheduler.

Ok, fair enough.

However, charging [in the scheduler sense] very lightweight DMA completion for
one subsystem collectively with very heavyweight NET_RX (doing garbage
collection in softirq!) is hardly ideal.

The alternative being threaded interrupt handlers (which are essentially treated
as 0.00 scheduler cost).

I just want to make sure that's the conscious choice being made, when the
patches for converting from tasklet to threaded irq start hitting subsystem
maintainers.

Regards,
Peter Hurley
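
A minimal userspace sketch of the restart condition quoted in this thread, to make the "even if 0ns have elapsed" case concrete. struct demo_state and demo_should_restart() are hypothetical; the time comparison follows the usual wrap-safe form of time_before().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of the state tested by the quoted condition. */
struct demo_state {
	unsigned long now;	/* stands in for jiffies */
	unsigned long end;	/* deadline for inline processing */
	bool need_resched;	/* a task is waiting to be scheduled */
	int max_restart;	/* remaining restart budget */
};

/* true: keep processing inline; false: hand off to ksoftirqd. */
static bool demo_should_restart(struct demo_state *s)
{
	assert(s != NULL);
	/* wrap-safe "now is before end", then the same short-circuit chain */
	return (long)(s->now - s->end) < 0 && !s->need_resched &&
	       --s->max_restart;
}
```

With now = 0, end = 2 and need_resched set, the function returns false without touching the restart budget: the time limit has not been reached, yet the remaining work is deferred because a process was woken.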




[PATCH net v2] mld, igmp: Fix reserved tailroom calculation

2016-02-29 Thread Benjamin Poirier
The current reserved_tailroom calculation fails to take hlen and tlen into
account.

skb:
[__hlen__|__data|__tlen___|__extra__]
^                                   ^
head                                skb_end_offset

In this representation, hlen + data + tlen is the size passed to alloc_skb.
"extra" is the extra space made available in __alloc_skb because of
rounding up by kmalloc. We can reorder the representation like so:

[__hlen__|__data|__extra__|__tlen___]
^                                   ^
head                                skb_end_offset

The maximum space available for ip headers and payload without
fragmentation is min(mtu, data + extra). Therefore,
reserved_tailroom
= data + extra + tlen - min(mtu, data + extra)
= skb_end_offset - hlen - min(mtu, skb_end_offset - hlen - tlen)
= skb_tailroom - min(mtu, skb_tailroom - tlen) ; after skb_reserve(hlen)

Compare the second line to the current expression:
reserved_tailroom = skb_end_offset - min(mtu, skb_end_offset)
and we can see that hlen and tlen are not taken into account.

The min() in the third line can be expanded into:
if mtu < skb_tailroom - tlen:
	reserved_tailroom = skb_tailroom - mtu
else:
	reserved_tailroom = tlen

Depending on hlen, tlen, mtu and the number of multicast address records,
the current code may output skbs that have less tailroom than
dev->needed_tailroom or it may output more skbs than needed because not all
space available is used.

Fixes: 4c672e4b ("ipv6: mld: fix add_grhead skb_over_panic for devs with large MTUs")
Signed-off-by: Benjamin Poirier 
---

Notes:
Changes v1->v2
As suggested by Hannes, move the code to an inline helper and express it
using "if" rather than "min".

 include/linux/skbuff.h | 24 
 net/ipv4/igmp.c|  3 +--
 net/ipv6/mcast.c   |  3 +--
 3 files changed, 26 insertions(+), 4 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 4ce9ff7..d3fcd45 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1985,6 +1985,30 @@ static inline void skb_reserve(struct sk_buff *skb, int 
len)
skb->tail += len;
 }
 
+/**
+ * skb_tailroom_reserve - adjust reserved_tailroom
+ * @skb: buffer to alter
+ * @mtu: maximum amount of headlen permitted
+ * @needed_tailroom: minimum amount of reserved_tailroom
+ *
+ * Set reserved_tailroom so that headlen can be as large as possible but
+ * not larger than mtu and tailroom cannot be smaller than
+ * needed_tailroom.
+ * The required headroom should already have been reserved before using
+ * this function.
+ */
+static inline void skb_tailroom_reserve(struct sk_buff *skb, unsigned int mtu,
+					unsigned int needed_tailroom)
+{
+	SKB_LINEAR_ASSERT(skb);
+	if (mtu < skb_tailroom(skb) - needed_tailroom)
+		/* use at most mtu */
+		skb->reserved_tailroom = skb_tailroom(skb) - mtu;
+	else
+		/* use up to all available space */
+		skb->reserved_tailroom = needed_tailroom;
+}
+
 #define ENCAP_TYPE_ETHER   0
 #define ENCAP_TYPE_IPPROTO 1
 
diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c
index 05e4cba..b3086cf 100644
--- a/net/ipv4/igmp.c
+++ b/net/ipv4/igmp.c
@@ -356,9 +356,8 @@ static struct sk_buff *igmpv3_newpack(struct net_device 
*dev, unsigned int mtu)
skb_dst_set(skb, &rt->dst);
skb->dev = dev;
 
-   skb->reserved_tailroom = skb_end_offset(skb) -
-min(mtu, skb_end_offset(skb));
skb_reserve(skb, hlen);
+   skb_tailroom_reserve(skb, mtu, tlen);
 
skb_reset_network_header(skb);
pip = ip_hdr(skb);
diff --git a/net/ipv6/mcast.c b/net/ipv6/mcast.c
index 5ee56d0..d64ee7e 100644
--- a/net/ipv6/mcast.c
+++ b/net/ipv6/mcast.c
@@ -1574,9 +1574,8 @@ static struct sk_buff *mld_newpack(struct inet6_dev 
*idev, unsigned int mtu)
return NULL;
 
skb->priority = TC_PRIO_CONTROL;
-   skb->reserved_tailroom = skb_end_offset(skb) -
-min(mtu, skb_end_offset(skb));
skb_reserve(skb, hlen);
+   skb_tailroom_reserve(skb, mtu, tlen);
 
if (__ipv6_get_lladdr(idev, &addr_buf, IFA_F_TENTATIVE)) {
/* :
-- 
2.7.0
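
A minimal userspace sketch of the old and new calculations on plain integers, to check the two failure modes described in the commit message. The function names are hypothetical and the numbers in the note are made up for illustration; "tailroom" means skb_tailroom after skb_reserve(hlen).

```c
#include <assert.h>

/* New rule: cap usable space at mtu, but never reserve less than needed. */
static unsigned int demo_reserved_tailroom(unsigned int tailroom,
					   unsigned int mtu,
					   unsigned int needed_tailroom)
{
	assert(tailroom >= needed_tailroom);
	if (mtu < tailroom - needed_tailroom)
		return tailroom - mtu;		/* use at most mtu */
	return needed_tailroom;			/* use all available space */
}

/* Old rule: computed from skb_end_offset, ignoring hlen and tlen. */
static unsigned int demo_old_reserved_tailroom(unsigned int end_offset,
					       unsigned int mtu)
{
	return end_offset - (mtu < end_offset ? mtu : end_offset);
}
```

Take hlen = 16, tlen = 8, mtu = 1500. With skb_end_offset = 2048 the tailroom after the reserve is 2032: the new rule reserves 532 and leaves exactly 1500 usable bytes, while the old rule reserves 548 and leaves only 1484. With skb_end_offset = 1024 the tailroom is 1008: the new rule reserves 8 (tlen), while the old rule reserves 0, which is less than the device needs.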



Re: [PATCH net-next] hv_netvsc: add ethtool support for set and get of settings

2016-02-29 Thread David Miller
From: Ben Hutchings 
Date: Mon, 29 Feb 2016 22:34:38 +

> On Mon, 2016-02-29 at 17:09 -0500, David Miller wrote:
>> From: Simon Xiao 
>> Date: Thu, 25 Feb 2016 15:24:08 -0800
>> 
>> > This patch allows the user to set and retrieve speed and duplex of the
>> > hv_netvsc device via ethtool.
>> > 
>> > Example:
>> > $ ethtool eth0
>> > Settings for eth0:
>> > ...
>> > Speed: Unknown!
>> > Duplex: Unknown! (255)
>> > ...
>> > $ ethtool -s eth0 speed 1000 duplex full
>> > $ ethtool eth0
>> > Settings for eth0:
>> > ...
>> > Speed: 1000Mb/s
>> > Duplex: Full
>> > ...
>> > 
>> > This is based on patches by Roopa Prabhu and Nikolay Aleksandrov.
>> > 
>> > Signed-off-by: Simon Xiao 
>> 
>> Applied, thanks.
> 
> I missed this due to flu, but now I look at it - I don't see the point.
> Link speed isn't meaningful for a memory-based transport, so "unknown"
> is correct.  The link is effectively full duplex though.
> 
> If the issue is that ethtool is a bit shouty about unknowns, let's
> consider changing that in ethtool, not teaching drivers to lie.

The issue is that certain bonding modes do not work properly without
a speed being reported by a device.

We're doing this for other "virtual" devices already thanks to changes
that went in last week, so there is precedent.


Re: [PATCHv2 08/10] rfkill: Use switch to demux userspace operations

2016-02-29 Thread Jouni Malinen
On Mon, Feb 29, 2016 at 05:30:20PM -0500, João Paulo Rechi Vita wrote:

> I agree there is a difference in the logic here, thanks for taking the
> time to point it out so clearly, and sorry for missing this. But AFAIU
> userspace should not call RFKILL_OP_CHANGE with ev.type ==
> RFKILL_TYPE_ALL, as RFKILL_OP_CHANGE is intended to be used to
> block/unblock one RFKill switch, and it is not possible to create a
> RFKill switch with type == RFKILL_TYPE_ALL (rfkill_alloc() would
> return NULL).

Interesting. Maybe Johannes can comment on that part since I think he
wrote the code that interacts with kernel for the rfkill test cases.

> I tried to look into the source code of the test suite you pointed,
> but couldn't easily figure out how it ends up with that combination.
> Could you please explain (or point me in the code) how is that a valid
> operation? If I'm not missing anything, we should probably return
> EINVAL in this case.

These specific failures were shown for the test cases in this file:
http://w1.fi/cgit/hostap/tree/tests/hwsim/test_rfkill.py

The interaction with kernel is done using this code:
http://w1.fi/cgit/hostap/tree/tests/hwsim/rfkill.py

It does indeed look like TYPE_ALL is used here (the block() and
unblock() implementations). If this is incorrect, we can certainly
change the script since I'd assume this is not used for anything else
than the hwsim test cases (or well who knows, it is available out there,
so if someone needs python code to do rfkill operations..).
 
-- 
Jouni Malinen    PGP id EFC895FA


Re: [PATCH net-next] hv_netvsc: add ethtool support for set and get of settings

2016-02-29 Thread Ben Hutchings
On Mon, 2016-02-29 at 17:09 -0500, David Miller wrote:
> From: Simon Xiao 
> Date: Thu, 25 Feb 2016 15:24:08 -0800
> 
> > This patch allows the user to set and retrieve speed and duplex of the
> > hv_netvsc device via ethtool.
> > 
> > Example:
> > $ ethtool eth0
> > Settings for eth0:
> > ...
> > Speed: Unknown!
> > Duplex: Unknown! (255)
> > ...
> > $ ethtool -s eth0 speed 1000 duplex full
> > $ ethtool eth0
> > Settings for eth0:
> > ...
> > Speed: 1000Mb/s
> > Duplex: Full
> > ...
> > 
> > This is based on patches by Roopa Prabhu and Nikolay Aleksandrov.
> > 
> > Signed-off-by: Simon Xiao 
> 
> Applied, thanks.

I missed this due to flu, but now I look at it - I don't see the point.
Link speed isn't meaningful for a memory-based transport, so "unknown"
is correct.  The link is effectively full duplex though.

If the issue is that ethtool is a bit shouty about unknowns, let's
consider changing that in ethtool, not teaching drivers to lie.

Ben.

-- 
Ben Hutchings
If God had intended Man to program,
we'd have been born with serial I/O ports.



Re: [PATCHv2 08/10] rfkill: Use switch to demux userspace operations

2016-02-29 Thread João Paulo Rechi Vita
Hello Jouni,

On 26 February 2016 at 12:59, Jouni Malinen  wrote:
> On Mon, Feb 22, 2016 at 11:36:39AM -0500, João Paulo Rechi Vita wrote:
>> Using a switch to handle different ev.op values in rfkill_fop_write()
>> makes the code easier to extend, as out-of-range values can always be
>> handled by the default case.
>
> This breaks rfkill. There are automated test scripts for testing this
> area (and most of Wi-Fi, for that matter). It would be nice if these were
> used for changes before they get contributed upstream..
>
> http://buildbot.w1.fi/hwsim/
>

Thanks for pointing that out, I haven't heard of this tool before.
I'll give it a try before my next submission.

> This specific commit broke all the rfkill_* test cases because of
> following:
>
>> diff --git a/net/rfkill/core.c b/net/rfkill/core.c
>> @@ -1199,29 +1200,32 @@ static ssize_t rfkill_fop_write(struct file *file, 
>> const char __user *buf,
>> - list_for_each_entry(rfkill, &rfkill_list, node) {
>> - if (rfkill->idx != ev.idx && ev.op != RFKILL_OP_CHANGE_ALL)
>> - continue;
>> -
>> - if (rfkill->type != ev.type && ev.type != RFKILL_TYPE_ALL)
>> - continue;
>
> Note that RFKILL_TYPE_ALL here..
>
>> + list_for_each_entry(rfkill, &rfkill_list, node)
>> + if (rfkill->type == ev.type ||
>> + ev.type == RFKILL_TYPE_ALL)
>> + rfkill_set_block(rfkill, ev.soft);
>
> It was included for RFKILL_OP_CHANGE_ALL.
>
>> + case RFKILL_OP_CHANGE:
>> + list_for_each_entry(rfkill, &rfkill_list, node)
>> + if (rfkill->idx == ev.idx && rfkill->type == ev.type)
>> + rfkill_set_block(rfkill, ev.soft);
>
> but not for RFKILL_OP_CHANGE..
>
> This needs following to work:
>
>
> diff --git a/net/rfkill/core.c b/net/rfkill/core.c
> index 59ff92d..c4bbd19 100644
> --- a/net/rfkill/core.c
> +++ b/net/rfkill/core.c
> @@ -1239,7 +1239,9 @@ static ssize_t rfkill_fop_write(struct file *file, 
> const char __user *buf,
> break;
> case RFKILL_OP_CHANGE:
> list_for_each_entry(rfkill, &rfkill_list, node)
> -   if (rfkill->idx == ev.idx && rfkill->type == ev.type)
> +   if (rfkill->idx == ev.idx &&
> +   (rfkill->type == ev.type ||
> +ev.type == RFKILL_TYPE_ALL))
> rfkill_set_block(rfkill, ev.soft);
> ret = 0;
> break;
>

I agree there is a difference in the logic here, thanks for taking the
time to point it out so clearly, and sorry for missing this. But AFAIU
userspace should not call RFKILL_OP_CHANGE with ev.type ==
RFKILL_TYPE_ALL, as RFKILL_OP_CHANGE is intended to be used to
block/unblock one RFKill switch, and it is not possible to create a
RFKill switch with type == RFKILL_TYPE_ALL (rfkill_alloc() would
return NULL).

I tried to look into the source code of the test suite you pointed,
but couldn't easily figure out how it ends up with that combination.
Could you please explain (or point me to the code) how that is a valid
operation? If I'm not missing anything, we should probably return
EINVAL in this case.
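
To make the difference concrete, here is a minimal userspace sketch of the
two RFKILL_OP_CHANGE match conditions discussed above. The enum values,
struct layouts and function names are simplified stand-ins invented for
illustration; they are not the kernel's definitions.

```c
#include <stdbool.h>

/* Simplified stand-ins for the kernel types; values are illustrative only. */
enum sketch_type { SKETCH_TYPE_ALL = 0, SKETCH_TYPE_WLAN, SKETCH_TYPE_BLUETOOTH };

struct sketch_switch { unsigned int idx; enum sketch_type type; };
struct sketch_event  { unsigned int idx; enum sketch_type type; };

/* Condition in the patch under review: index and type must both match. */
static bool sketch_change_matches_patch(const struct sketch_switch *sw,
					const struct sketch_event *ev)
{
	return sw->idx == ev->idx && sw->type == ev->type;
}

/* Condition before the patch (and with Jouni's fix): TYPE_ALL is a wildcard. */
static bool sketch_change_matches_orig(const struct sketch_switch *sw,
				       const struct sketch_event *ev)
{
	return sw->idx == ev->idx &&
	       (sw->type == ev->type || ev->type == SKETCH_TYPE_ALL);
}
```

The two predicates only disagree when the event carries the wildcard type
together with a matching index, which is exactly the combination in
question.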

Regards,

--
João Paulo Rechi Vita
http://about.me/jprvita


Re: Question on switchdev

2016-02-29 Thread Andrew Lunn
On Mon, Feb 29, 2016 at 04:43:16PM -0500, Murali Karicheri wrote:

Hi Murali

Please can you get your email client to wrap lines at ~ 75 characters.

> TI Keystone netcp h/w has a switch. It has n slave ports and 1 host
> port. Currently the netcp driver disables the switch functionality
> which makes them appear as n nic ports. However we have requirement
> to add switch support in the driver. I have reviewed the
> experimental driver documentation
> Documentation/networking/switchdev.txt and would like to understand
> it better so that I can add this support to keystone netcp driver.
 
> NetCP h/w has a 1 (host port) x n (slave port) switch. It can do
> layer 2 forwarding between ports. In the switch mode, host driver
> provides the frame to the switch and switch uses the filter data
> base (AKA ALE table, Address Learning Engine table) to forward the
> packet. There is a piece of information available per frame (meta
> data) to decide if frame to be forwarded to a particular port or use
> the fdb for forward decisions.

This makes it sound like a good fit for DSA.

Documentation/networking/dsa/dsa.txt.

You probably need to implement a new tagging protocol in
net/dsa/tag_*.c and a driver in drivers/net/dsa/

> 1. How does port netdev differ from regular netdev that carries data
>when registering netdev? Any example you can point to?

They don't differ at all. You consider each port of the switch to be a
normal Linux interface.

> 2. I assume port netdev will appear as an interface in ifconfig -a
>command and it is not assigned an IP address. Correct?

The user can assign an address, if they want. It is a normal Linux
interface. They can also create a bridge, and add the interface to the
bridge. An advanced DSA driver will keep track of which interfaces are
in which bridge, and if possible, offload the bridge to the hardware.

> 3. with 1xn switch, so we have n + 1 netdev registered with net
>core? I assume, only 1 netdev is for data plane and the rest are
>control plane. Is this correct?

No. You only have netdev devices for the external ports of the
switch. The other port is known as the cpu port, and does not have a
netdev.

> 4. We have bunch of port specific configuration that we would like
> to control or configure from user space using standard tools. For
> example, switch port state, flow control etc. Is that possible to
> add using this framework? ethtool update needed for this?

The whole idea here is that the switch ports are normal Linux
interfaces. You use normal Linux APIs to configure them. You probably
don't need to add any new features.

One key thing to get your head around. The switch is a hardware
accelerator for the Linux stack. You have to think how you can make
your switch accelerate the Linux stack. It takes people a while to get
this.

  Andrew


Re: [PATCH net-next 0/4] lan78xx: driver update

2016-02-29 Thread David Miller
From: 
Date: Thu, 25 Feb 2016 23:33:05 +

> This patch series adds the new ethtool functions set_pauseparam and
> get_pauseparam, and a MAINTAINERS entry.

Series applied, thanks.

Please fix your configuration such that your proper name appears in the
"From: " field of your outgoing emails.  That is what ends up in the
Author field of every GIT commit.  And right now only your email address
appears there.

Thanks.


Re: [PATCH net-next] hv_netvsc: add ethtool support for set and get of settings

2016-02-29 Thread David Miller
From: Simon Xiao 
Date: Thu, 25 Feb 2016 15:24:08 -0800

> This patch allows the user to set and retrieve speed and duplex of the
> hv_netvsc device via ethtool.
> 
> Example:
> $ ethtool eth0
> Settings for eth0:
> ...
> Speed: Unknown!
> Duplex: Unknown! (255)
> ...
> $ ethtool -s eth0 speed 1000 duplex full
> $ ethtool eth0
> Settings for eth0:
> ...
> Speed: 1000Mb/s
> Duplex: Full
> ...
> 
> This is based on patches by Roopa Prabhu and Nikolay Aleksandrov.
> 
> Signed-off-by: Simon Xiao 

Applied, thanks.


[PATCH next 2/3] ipvlan: Implement L3-symmetric mode.

2016-02-29 Thread Mahesh Bandewar
From: Mahesh Bandewar 

Current packet processing from the IPtables perspective is asymmetric
for IPvlan L3 mode. On the egress path, packets hit the LOCAL_OUT and
POST_ROUTING hooks in the slave's ns as well as the master's ns; on
the ingress path, however, the LOCAL_IN and PRE_ROUTING hooks are hit
only in the slave's ns. L3 mode is restrictive and uses the master's
L3 for packet processing, so it does not make sense to skip these
hooks on the ingress path in the master's ns.

The changes in this patch nominate master-dev to be the device
for L3 ingress processing when skb device is the IPvlan slave.
Since master device is used for L3 processing, the IPT hooks are
hit in master's ns making the packet processing symmetric.

The other minor change in this patch is to add a force parameter
for set_port_mode() to ensure correct settings are set during the
device initialization phase.
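
A minimal sketch of why a force flag can matter, under the assumption
(not stated in this changelog) that a newly linked slave may request the
mode the port already has, so the "mode changed" test alone would skip
the per-slave setup. All names below are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical reduced port: current mode plus a count of applied updates. */
struct sketch_port {
	int mode;
	int applied;	/* how many times per-slave settings were (re)applied */
};

/* Settings are applied when the mode changes or when the caller forces it. */
static void sketch_set_port_mode(struct sketch_port *port, int nval, bool force)
{
	if (port->mode != nval || force) {
		port->applied++;	/* stands in for the per-slave loop */
		port->mode = nval;
	}
}
```

With force set to false and an unchanged mode, nothing is applied; with
force set to true the settings are applied regardless.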

Signed-off-by: Mahesh Bandewar 
CC: Eric Dumazet 
CC: Tim Hockin 
CC: Alex Pollitt 
CC: Matthew Dupre 
---
 drivers/net/ipvlan/ipvlan_main.c | 16 +---
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c
index 5802b9025765..734c25e52c60 100644
--- a/drivers/net/ipvlan/ipvlan_main.c
+++ b/drivers/net/ipvlan/ipvlan_main.c
@@ -14,16 +14,19 @@ static void ipvlan_adjust_mtu(struct ipvl_dev *ipvlan, 
struct net_device *dev)
ipvlan->dev->mtu = dev->mtu - ipvlan->mtu_adj;
 }
 
-static void ipvlan_set_port_mode(struct ipvl_port *port, u16 nval)
+static void ipvlan_set_port_mode(struct ipvl_port *port, u16 nval, bool force)
 {
struct ipvl_dev *ipvlan;
 
-   if (port->mode != nval) {
+   if (port->mode != nval || force) {
list_for_each_entry(ipvlan, &port->ipvlans, pnode) {
-   if (nval == IPVLAN_MODE_L3)
+   if (nval == IPVLAN_MODE_L3) {
ipvlan->dev->flags |= IFF_NOARP;
-   else
+   ipvlan->dev->l3_dev = port->dev;
+   } else {
ipvlan->dev->flags &= ~IFF_NOARP;
+   ipvlan->dev->l3_dev = ipvlan->dev;
+   }
}
port->mode = nval;
}
@@ -392,7 +395,7 @@ static int ipvlan_nl_changelink(struct net_device *dev,
if (data && data[IFLA_IPVLAN_MODE]) {
u16 nmode = nla_get_u16(data[IFLA_IPVLAN_MODE]);
 
-   ipvlan_set_port_mode(port, nmode);
+   ipvlan_set_port_mode(port, nmode, false);
}
return 0;
 }
@@ -479,7 +482,6 @@ static int ipvlan_link_new(struct net *src_net, struct 
net_device *dev,
memcpy(dev->dev_addr, phy_dev->dev_addr, ETH_ALEN);
 
dev->priv_flags |= IFF_IPVLAN_SLAVE;
-
port->count += 1;
err = register_netdevice(dev);
if (err < 0)
@@ -490,7 +492,7 @@ static int ipvlan_link_new(struct net *src_net, struct 
net_device *dev,
goto ipvlan_destroy_port;
 
list_add_tail_rcu(&ipvlan->pnode, &port->ipvlans);
-   ipvlan_set_port_mode(port, mode);
+   ipvlan_set_port_mode(port, mode, true);
 
netif_stacked_transfer_operstate(phy_dev, dev);
return 0;
-- 
2.7.0.rc3.207.g0ac5344



[PATCH next 3/3] net: Use l3_dev instead of skb->dev for L3 processing

2016-02-29 Thread Mahesh Bandewar
From: Mahesh Bandewar 

netif_receive_skb_core() dispatcher uses skb->dev device to send it
to the packet-handlers (e.g. ip_rcv, ipv6_rcv etc). These packet
handlers in turn use the device passed to determine the net-ns to
further process these packets.  Now with the nomination logic, the
dispatcher will call netif_get_l3_dev() helper to select the device
to be used for this processing. Since l3_dev is initialized to self,
normal packet processing should not change.

Signed-off-by: Mahesh Bandewar 
CC: Eric Dumazet 
CC: Tim Hockin 
CC: Alex Pollitt 
CC: Matthew Dupre 
---
 net/core/dev.c   | 9 ++---
 net/ipv4/ip_input.c  | 5 +++--
 net/ipv6/ip6_input.c | 5 +++--
 3 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index c4023a68cdc1..9252436ef11a 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1811,7 +1811,8 @@ static inline int deliver_skb(struct sk_buff *skb,
if (unlikely(skb_orphan_frags(skb, GFP_ATOMIC)))
return -ENOMEM;
atomic_inc(&skb->users);
-   return pt_prev->func(skb, skb->dev, pt_prev, orig_dev);
+   return pt_prev->func(skb, netif_get_l3_dev(skb->dev), pt_prev,
+orig_dev);
 }
 
 static inline void deliver_ptype_list_skb(struct sk_buff *skb,
@@ -1904,7 +1905,8 @@ again:
}
 out_unlock:
if (pt_prev)
-   pt_prev->func(skb2, skb->dev, pt_prev, skb->dev);
+   pt_prev->func(skb2, netif_get_l3_dev(skb->dev), pt_prev,
+ skb->dev);
rcu_read_unlock();
 }
 
@@ -4157,7 +4159,8 @@ ncls:
if (unlikely(skb_orphan_frags(skb, GFP_ATOMIC)))
goto drop;
else
-   ret = pt_prev->func(skb, skb->dev, pt_prev, orig_dev);
+   ret = pt_prev->func(skb, netif_get_l3_dev(skb->dev),
+   pt_prev, orig_dev);
} else {
 drop:
if (!deliver_exact)
diff --git a/net/ipv4/ip_input.c b/net/ipv4/ip_input.c
index e3d782746d9d..b47164e3e1c6 100644
--- a/net/ipv4/ip_input.c
+++ b/net/ipv4/ip_input.c
@@ -247,7 +247,8 @@ int ip_local_deliver(struct sk_buff *skb)
/*
 *  Reassemble IP fragments.
 */
-   struct net *net = dev_net(skb->dev);
+   struct net_device *dev = netif_get_l3_dev(skb->dev);
+   struct net *net = dev_net(dev);
 
if (ip_is_fragment(ip_hdr(skb))) {
if (ip_defrag(net, skb, IP_DEFRAG_LOCAL_DELIVER))
@@ -255,7 +256,7 @@ int ip_local_deliver(struct sk_buff *skb)
}
 
return NF_HOOK(NFPROTO_IPV4, NF_INET_LOCAL_IN,
-  net, NULL, skb, skb->dev, NULL,
+  net, NULL, skb, dev, NULL,
   ip_local_deliver_finish);
 }
 
diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c
index c05c425c2389..88443ac06402 100644
--- a/net/ipv6/ip6_input.c
+++ b/net/ipv6/ip6_input.c
@@ -287,9 +287,10 @@ discard:
 
 int ip6_input(struct sk_buff *skb)
 {
+   struct net_device *dev = netif_get_l3_dev(skb->dev);
+
return NF_HOOK(NFPROTO_IPV6, NF_INET_LOCAL_IN,
-  dev_net(skb->dev), NULL, skb, skb->dev, NULL,
-  ip6_input_finish);
+  dev_net(dev), NULL, skb, dev, NULL, ip6_input_finish);
 }
 
 int ip6_mc_input(struct sk_buff *skb)
-- 
2.7.0.rc3.207.g0ac5344



[PATCH next 1/3] dev: Add netif_get_l3_dev() helper

2016-02-29 Thread Mahesh Bandewar
From: Mahesh Bandewar 

This patch adds an l3_dev pointer and a helper function to retrieve
it. During ingress L3 packet processing, this device will be used
instead of skb->dev. Since l3_dev is initialized to self, it points
to skb->dev by default, so normal packet processing is neither
altered nor should it incur any additional cost (as it resides
in the RX cache line).

Signed-off-by: Mahesh Bandewar 
CC: Eric Dumazet 
CC: Tim Hockin 
CC: Alex Pollitt 
CC: Matthew Dupre 
---
 include/linux/netdevice.h | 6 ++
 net/core/dev.c| 1 +
 2 files changed, 7 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index e52077ffe5ed..1cf7e8d61043 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1738,6 +1738,7 @@ struct net_device {
unsigned long   gro_flush_timeout;
rx_handler_func_t __rcu *rx_handler;
void __rcu  *rx_handler_data;
+   struct net_device   *l3_dev;
 
 #ifdef CONFIG_NET_CLS_ACT
struct tcf_proto __rcu  *ingress_cl_list;
@@ -4085,6 +4086,11 @@ static inline void netif_keep_dst(struct net_device *dev)
dev->priv_flags &= ~(IFF_XMIT_DST_RELEASE | IFF_XMIT_DST_RELEASE_PERM);
 }
 
+static inline struct net_device *netif_get_l3_dev(struct net_device *dev)
+{
+   return dev->l3_dev;
+}
+
 extern struct pernet_operations __net_initdata loopback_net_ops;
 
 /* Logging, debugging and troubleshooting/diagnostic helpers. */
diff --git a/net/core/dev.c b/net/core/dev.c
index edb7179bc051..c4023a68cdc1 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -7463,6 +7463,7 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, 
const char *name,
if (!dev->ethtool_ops)
dev->ethtool_ops = &default_ethtool_ops;
 
+   dev->l3_dev = dev;
nf_hook_ingress_init(dev);
 
return dev;
-- 
2.7.0.rc3.207.g0ac5344



[PATCH next 0/3] IPvlan L3 symmetric mode

2016-02-29 Thread Mahesh Bandewar
From: Mahesh Bandewar 

One of the major requests (for enhancement) that I have received
from various users of IPvlan in L3 mode concerns its inability to
handle IPtables.

In a typical IPvlan L3 setup, the master is in the default-ns and
each slave is in a different (slave) ns. In this setup, egress
packet processing for traffic originating from a slave-ns will
hit all NF_HOOKs in the slave-ns as well as the default-ns. However,
the same is not true for ingress processing: all these NF_HOOKs are
hit only in the slave-ns, skipping them in the default-ns.
IPvlan in L3 mode is restrictive and it's preferred to hit these
hooks in master's ns than in slave's ns (L2 mode is where these
hooks will be hit only in slave's ns).

This can be achieved by adding a device pointer in net_device
struct. Stack will use this device reference and associated ns
for all ingress L3 processing. By default this is initialized to
self so skb->dev would be same as skb->dev->l3_dev and hence the
normal path will stay unchanged. Also since l3_dev is in the
same RX cache line, there should not be any additional cost.

An IPvlan slave, OTOH, can assign (nominate) its master as its l3_dev
so that L3 processing happens in master's ns.
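
The nomination idea can be sketched in plain C as follows. This is a
reduced model, not kernel code: struct sketch_netdev, the ns_id field
and the sketch_* functions are hypothetical stand-ins for net_device,
its namespace and the helpers in this series.

```c
/* Hypothetical, heavily reduced model of a network device. */
struct sketch_netdev {
	int ns_id;			/* stands in for the device's net namespace */
	struct sketch_netdev *l3_dev;	/* device nominated for L3 processing */
};

/* Mirrors the idea of initializing l3_dev to self at allocation time. */
static void sketch_netdev_init(struct sketch_netdev *dev, int ns_id)
{
	dev->ns_id = ns_id;
	dev->l3_dev = dev;
}

/* Same shape as the netif_get_l3_dev() helper from patch 1/3. */
static struct sketch_netdev *sketch_get_l3_dev(struct sketch_netdev *dev)
{
	return dev->l3_dev;
}

/* A slave in L3 mode nominates its master; otherwise it points at itself. */
static void sketch_set_l3_mode(struct sketch_netdev *slave,
			       struct sketch_netdev *master, int l3_mode)
{
	slave->l3_dev = l3_mode ? master : slave;
}
```

For an ordinary device the lookup returns the device itself, so the
namespace used for L3 processing is unchanged; only a slave that has
nominated its master resolves to the master's namespace.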

Please check individual patches for the details.

Mahesh Bandewar (3):
  dev: Add netif_get_l3_dev() helper
  ipvlan: Use netif_get_l3_dev() to implement L3-symmetric mode.
  net: update L3 path with device selection logic

 drivers/net/ipvlan/ipvlan_main.c | 16 +---
 include/linux/netdevice.h|  6 ++
 net/core/dev.c   | 10 +++---
 net/ipv4/ip_input.c  |  5 +++--
 net/ipv6/ip6_input.c |  5 +++--
 5 files changed, 28 insertions(+), 14 deletions(-)

-- 
2.7.0.rc3.207.g0ac5344



Re: [PATCH] 3c59x: mask LAST_FRAG bit from length field in ring

2016-02-29 Thread David Miller
From: Neil Horman 
Date: Thu, 25 Feb 2016 13:02:50 -0500

> Recently, I fixed a bug in 3c59x:
> 
> commit 6e144419e4da11a9a4977c8d899d7247d94ca338
> Author: Neil Horman 
> Date:   Wed Jan 13 12:43:54 2016 -0500
> 
> 3c59x: fix another page map/single unmap imbalance
> 
> Which correctly rebalanced dma mapping and unmapping types.  Unfortunately it
> introduced a new bug which causes oopses on older systems.
> 
> When mapping dma regions, the last entry for a packet in the 3c59x tx ring
> encodes a LAST_FRAG bit, which is encoded as the high order bit of the buffers
> length field.  When it is unmapped the LAST_FRAG bit is cleared prior to being
> passed to the unmap function.  Unfortunately the commit above fails to do that
> masking.  It was missed in testing because the system on which I tested it had
> an intel iommu, the driver for which ignores the size field, using only the DMA
> address as the token to identify the mapping to be released.  However, on older
> systems that rely on swiotlb (or other dma drivers that key off that length
> field), not masking off that LAST_FRAG high order bit results in passing a huge
> size to be released, leading to all sorts of odd corruptions and the like.
> 
> Fix is easy, just mask the length with 0xFFF.  It should really be
> &(LAST_FRAG-1), but 0xFFF is the style of the file, and I'd like to make this
> fix minimal and correct before making it prettier.
> 
> Applies to the net tree cleanly.  All testing on both iommu and swiotlb based
> systems produces good results.
> 
> Signed-off-by: Neil Horman 

Applied and queued up for -stable, thanks.
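
The masking described in the quoted changelog can be sketched as below.
This is an illustration only: SKETCH_LAST_FRAG is assumed to be the top
bit of a 32-bit length word (the changelog says only "high order bit"),
and the function names are hypothetical.

```c
#include <stdint.h>

/* Assumption: the flag is the top bit of a 32-bit length word. */
#define SKETCH_LAST_FRAG 0x80000000u

/* Store a length in a ring entry, tagging the final fragment. */
static uint32_t sketch_encode_len(uint32_t len, int last)
{
	return last ? (len | SKETCH_LAST_FRAG) : len;
}

/* Mask in the style the changelog says the file already uses. */
static uint32_t sketch_len_mask_fff(uint32_t field)
{
	return field & 0xFFFu;
}

/* Mask the changelog calls the "really" correct form: &(LAST_FRAG-1). */
static uint32_t sketch_len_mask_full(uint32_t field)
{
	return field & (SKETCH_LAST_FRAG - 1u);
}
```

Both masks strip the flag. They return the same value only for lengths
below 4096, since 0xFFF keeps just the low 12 bits.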


Re: [Patch net-next v3 0/4] net_sched: update backlog for hierarchical qdisc's

2016-02-29 Thread David Miller
From: Cong Wang 
Date: Thu, 25 Feb 2016 14:54:59 -0800

> For hierarchical qdisc like HTB, we currently only update its qlen
> but leave its backlog as zero:
> 
> qdisc htb 1: dev eth0 root refcnt 2 r2q 10 default 1 direct_packets_stat 0 ver 3.17
>  Sent 172680457356 bytes 222469449 pkt (dropped 0, overlimits 123575834 requeues 0)
>  backlog 0b 72p requeues 0
> 
> This patchset makes backlog as accurate as qlen.
> 
> ---
> v3: rebase and fix the n==0 case for qdisc_tree_reduce_backlog()
> v2: rebase and update changelog, no code change

Series applied, thanks.


Question on switchdev

2016-02-29 Thread Murali Karicheri
Hi Jiri, Scott, or other switchdev experts,

TI Keystone netcp h/w has a switch. It has n slave ports and 1 host port. 
Currently the netcp driver disables the switch functionality which makes them 
appear as n nic ports. However we have requirement to add switch support in the 
driver. I have reviewed the experimental driver documentation 
Documentation/networking/switchdev.txt and would like to understand it better 
so that I can add this support to keystone netcp driver.

NetCP h/w has a 1 (host port) x n (slave port) switch. It can do layer 2 
forwarding between ports. In the switch mode, host driver provides the frame to 
the switch and switch uses the filter data base (AKA ALE table, Address 
Learning Engine table) to forward the packet. There is a piece of information 
available per frame (meta data) to decide if frame to be forwarded to a 
particular port or use the fdb for forward decisions. I see following 
description in the above documentation.

===From Documentation/networking/switchdev.txt==
On switchdev driver initialization, the driver will allocate and register a 
struct net_device (using register_netdev()) for each enumerated physical switch 
port, called the port netdev.  A port netdev is the software representation of 
the physical port and provides a conduit for control traffic to/from the 
controller (the kernel) and the network, as well as an anchor point for higher 
level constructs such as bridges, bonds, VLANs, tunnels, and L3 routers.  Using 
standard netdev tools (iproute2, ethtool, etc), the port netdev can also 
provide to the user access to the physical properties of the switch port such 
as PHY link state and I/O statistics.
=

1. How does port netdev differ from regular netdev that carries data when 
registering netdev? Any example you can point to? 
2. I assume port netdev will appear as an interface in ifconfig -a command and 
it is not assigned an IP address. Correct?
3. with 1xn switch, so we have n + 1 netdev registered with net core? I assume, 
only 1 netdev is for data plane and the rest are control plane. Is this correct?
4. We have bunch of port specific configuration that we would like to control 
or configure from user space using standard tools. For example, switch port 
state, flow control etc. Is that possible to add using this framework? ethtool 
update needed for this?
5. This feature is marked as experimental. Hope having more drivers added to 
this switch dev framework can eventually get this out of experimental to 
regular status. Right?

I have more questions that I will defer for now. It would be great if I can 
work with you to implement this in netcp driver. Hope you can respond with your 
comment.

Thanks.
-- 
Murali Karicheri
Linux Kernel, Keystone


Re: [net-next PATCH v3 1/3] net: sched: consolidate offload decision in cls_u32

2016-02-29 Thread Cong Wang
On Mon, Feb 29, 2016 at 10:58 AM, Jiri Pirko  wrote:
> Mon, Feb 29, 2016 at 07:40:53PM CET, john.fastab...@gmail.com wrote:
>>On 16-02-27 08:28 PM, Cong Wang wrote:
>>> On Fri, Feb 26, 2016 at 8:24 PM, John Fastabend
>>>  wrote:
>>>> On 16-02-26 09:39 AM, Cong Wang wrote:
>>>>> On Fri, Feb 26, 2016 at 7:53 AM, John Fastabend
>>>>>  wrote:
>>>>>> diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
>>>>>> index 2121df5..e64d20b 100644
>>>>>> --- a/include/net/pkt_cls.h
>>>>>> +++ b/include/net/pkt_cls.h
>>>>>> @@ -392,4 +392,9 @@ struct tc_cls_u32_offload {
>>>>>> };
>>>>>>  };
>>>>>>
>>>>>> +static inline bool tc_should_offload(struct net_device *dev)
>>>>>> +{
>>>>>> +   return dev->netdev_ops->ndo_setup_tc;
>>>>>> +}
>>>>>> +
>>>>>
>>>>> These should be protected by CONFIG_NET_CLS_U32, no?
>>>>>
>>>>
>>>> It's not necessary; it is a completely general function and I only
>>>> lifted it out of cls_u32 so that the cls_flower classifier could
>>>> also use it.
>>>>
>>>> I don't see the need off-hand to have it wrapped in an ORd ifdef
>>>> statement where it's (CONFIG_NET_CLS_U32 | CONFIG_NET_CLS_X ...).
>>>> Any particular reason you were thinking it should be wrapped in ifdefs?
>>>>
>>>
>>> Not a big deal.
>>>
>>> I just feel these don't need to compile when I have CONFIG_NET_CLS_U32=n.
>>>
>>> Thanks.
>>>
>>
>>Well because this is 'static inline' gcc should just remove it
>>if it is not used. Assuming non-ancient gcc and normal compile
>>flags, e.g. you are not including -fkeep-inline-functions or
>>something.
>>
>>So just to keep it readable I would prefer to just leave it
>>as is.
>
> Definitely. cls_flower will use it in very near future. Making it
> dependent on CONFIG_NET_CLS_U32 makes 0 sense to me.

Oh, why then do you have u32 in the struct name tc_cls_u32_offload?

(Note that in the above I said "these" not "this", so I was never
referring only to tc_should_offload.)


Re: [PATCH/RFC v5 net-next] ravb: Add dma queue interrupt support

2016-02-29 Thread Sergei Shtylyov

On 02/28/2016 05:13 PM, Yoshihiro Kaneko wrote:


From: Kazuya Mizuguchi 

This patch supports the following interrupts.

- One interrupt for multiple (error, gPTP)
- One interrupt for emac
- Four interrupts for dma queue (best effort rx/tx, network control rx/tx)

This patch improve efficiency of the interrupt handler by adding the
interrupt handler corresponding to each interrupt source described
above. Additionally, it reduces the number of times of the access to
EthernetAVB IF.
Also this patch prevent this driver depends on the whim of a boot loader.

[ykaneko0...@gmail.com: define bit names of registers]
[ykaneko0...@gmail.com: add comment for gen3 only registers]
[ykaneko0...@gmail.com: fix coding style]
[ykaneko0...@gmail.com: update changelog]
[ykaneko0...@gmail.com: gen3: fix initialization of interrupts]
[ykaneko0...@gmail.com: gen3: fix clearing interrupts]
[ykaneko0...@gmail.com: gen3: add helper function for request_irq()]
[ykaneko0...@gmail.com: revert ravb_close() and ravb_ptp_stop()]
[ykaneko0...@gmail.com: avoid calling free_irq() to non-hooked interrupts]
[ykaneko0...@gmail.com: make NC/BE interrupt handler a function]
Signed-off-by: Kazuya Mizuguchi 
Signed-off-by: Yoshihiro Kaneko 



[...]


diff --git a/drivers/net/ethernet/renesas/ravb_main.c
b/drivers/net/ethernet/renesas/ravb_main.c
index c936682..1bec71e 100644
--- a/drivers/net/ethernet/renesas/ravb_main.c
+++ b/drivers/net/ethernet/renesas/ravb_main.c


[...]


@@ -697,6 +726,39 @@ static void ravb_error_interrupt(struct net_device
*ndev)
 }
   }

+static int ravb_nc_be_interrupt(struct net_device *ndev, int ravb_queue,



I'd call this function e.g. ravb_queue_interrupt(). And make it return
'bool' or even 'irqreturn_t' directly. And I'd suggest a shorter name for
the 'ravb_queue' parameter, like 'queue' or even 'q'...


Agreed.




+   u32 ris0, u32 *ric0, u32 tis, u32 *tic)



You don't seem to need 'ric0' and 'tic' past the call sites, so no real
need to pass them by reference.


When Rx/Tx interrupt for NC and BE is issued at the same time,
this function is called twice (for NC, BE) from ravb_interrupt.
The interrupt mask of NC set in the first call will be reset in the next
call for BE. So it is necessary to keep the modified value of "ric0" and
"tic".


   OK, but we still can simplify this by reading these registers right in 
ravb_queue_interrupt()...


[...]

@@ -725,31 +787,15 @@ static irqreturn_t ravb_interrupt(int irq, void
*dev_id)

 /* Network control and best effort queue RX/TX */
 for (q = RAVB_NC; q >= RAVB_BE; q--) {
-   if (((ris0 & ric0) & BIT(q)) ||
-   ((tis  & tic)  & BIT(q))) {
-   if (napi_schedule_prep(&priv->napi[q])) {
-   /* Mask RX and TX interrupts */
-   ric0 &= ~BIT(q);
-   tic &= ~BIT(q);
-   ravb_write(ndev, ric0, RIC0);
-   ravb_write(ndev, tic, TIC);
-   __napi_schedule(&priv->napi[q]);
-   } else {
-   netdev_warn(ndev,
-   "ignoring interrupt,
rx status 0x%08x, rx mask 0x%08x,\n",
-   ris0, ric0);
-   netdev_warn(ndev,
-   "
tx status 0x%08x, tx mask 0x%08x.\n",
-   tis, tic);
-   }
+   if (ravb_nc_be_interrupt(ndev, q, ris0, &ric0,
tis,
+&tic))
 result = IRQ_HANDLED;
-   }
 }



Unroll this *for* loop please...



OK.


   It was a bad idea actually, sorry...

[...]

@@ -767,6 +813,73 @@ static irqreturn_t ravb_interrupt(int irq, void

[...]

+static irqreturn_t ravb_dmaq_interrupt(int irq, void *dev_id, int
ravb_queue)



Perhaps, ravb_rx_tx_interrupt()?


Agreed.


   And we still have ravb_dma_interrupt() unused, right?

[...]


Thanks,
kaneko


MBR, Sergei



Re: [PATCH/RFC v6 net-next] ravb: Add dma queue interrupt support

2016-02-29 Thread Sergei Shtylyov

Hello.

On 02/28/2016 06:41 PM, Yoshihiro Kaneko wrote:


From: Kazuya Mizuguchi 

This patch supports the following interrupts.

- One interrupt for multiple (timestamp, error, gPTP)
- One interrupt for emac
- Four interrupts for dma queue (best effort rx/tx, network control rx/tx)

This patch improve efficiency of the interrupt handler by adding the
interrupt handler corresponding to each interrupt source described
above. Additionally, it reduces the number of times of the access to
EthernetAVB IF.
Also this patch prevent this driver depends on the whim of a boot loader.

[ykaneko0...@gmail.com: define bit names of registers]
[ykaneko0...@gmail.com: add comment for gen3 only registers]
[ykaneko0...@gmail.com: fix coding style]
[ykaneko0...@gmail.com: update changelog]
[ykaneko0...@gmail.com: gen3: fix initialization of interrupts]
[ykaneko0...@gmail.com: gen3: fix clearing interrupts]
[ykaneko0...@gmail.com: gen3: add helper function for request_irq()]
[ykaneko0...@gmail.com: gen3: remove IRQF_SHARED flag for request_irq()]
[ykaneko0...@gmail.com: revert ravb_close() and ravb_ptp_stop()]
[ykaneko0...@gmail.com: avoid calling free_irq() to non-hooked interrupts]
[ykaneko0...@gmail.com: make NC/BE interrupt handler a function]
[ykaneko0...@gmail.com: make timestamp interrupt handler a function]
[ykaneko0...@gmail.com: timestamp interrupt is handled in multiple
  interrupt handler instead of dma queue interrupt handler]
Signed-off-by: Kazuya Mizuguchi 
Signed-off-by: Yoshihiro Kaneko 


   OK, you are very close now! Just a few comments...

[...]

diff --git a/drivers/net/ethernet/renesas/ravb_main.c 
b/drivers/net/ethernet/renesas/ravb_main.c
index c936682..22ef65d 100644
--- a/drivers/net/ethernet/renesas/ravb_main.c
+++ b/drivers/net/ethernet/renesas/ravb_main.c

[...]

@@ -697,6 +726,47 @@ static void ravb_error_interrupt(struct net_device *ndev)
}
  }

+static bool ravb_queue_interrupt(struct net_device *ndev, int q,
+u32 ris0, u32 *ric0, u32 tis, u32 *tic)
+{
+   struct ravb_private *priv = netdev_priv(ndev);
+


   Perhaps it makes sense to read the RI[CS]0/TI[CS] here instead of passing 
them (by reference)?


[...]

@@ -714,42 +784,21 @@ static irqreturn_t ravb_interrupt(int irq, void *dev_id)
u32 ric0 = ravb_read(ndev, RIC0);
u32 tis  = ravb_read(ndev, TIS);
u32 tic  = ravb_read(ndev, TIC);
-   int q;

/* Timestamp updated */
-   if (tis & TIS_TFUF) {
-   ravb_write(ndev, ~TIS_TFUF, TIS);
-   ravb_get_tx_tstamp(ndev);
+   if (ravb_timestamp_interrupt(ndev, tis))
result = IRQ_HANDLED;
-   }

/* Network control and best effort queue RX/TX */
-   for (q = RAVB_NC; q >= RAVB_BE; q--) {
-   if (((ris0 & ric0) & BIT(q)) ||
-   ((tis  & tic)  & BIT(q))) {
-   if (napi_schedule_prep(&priv->napi[q])) {
-   /* Mask RX and TX interrupts */
-   ric0 &= ~BIT(q);
-   tic &= ~BIT(q);
-   ravb_write(ndev, ric0, RIC0);
-   ravb_write(ndev, tic, TIC);
-   __napi_schedule(&priv->napi[q]);
-   } else {
-   netdev_warn(ndev,
-   "ignoring interrupt, rx status 
0x%08x, rx mask 0x%08x,\n",
-   ris0, ric0);
-   netdev_warn(ndev,
-   "tx status 
0x%08x, tx mask 0x%08x.\n",
-   tis, tic);
-   }
-   result = IRQ_HANDLED;
-   }
-   }
+   if (ravb_queue_interrupt(ndev, RAVB_NC, ris0, &ric0, tis, &tic))
+   result = IRQ_HANDLED;
+   if (ravb_queue_interrupt(ndev, RAVB_BE, ris0, &ric0, tis, &tic))
+   result = IRQ_HANDLED;


   Hmm, perhaps unrolling wasn't such a great idea... we can't use || here as 
it would be short-circuited. :-(
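
The short-circuit concern can be shown with a small standalone sketch.
The handler below is a hypothetical stand-in that only counts its calls;
it is not the driver's ravb_queue_interrupt().

```c
#include <stdbool.h>

static int sketch_calls;

/* Hypothetical per-queue handler: counts calls, reports whether it had work. */
static bool sketch_queue_interrupt(bool pending)
{
	sketch_calls++;
	return pending;
}

/* With ||, the second handler is skipped whenever the first returns true. */
static bool sketch_handle_short_circuit(bool nc_pending, bool be_pending)
{
	return sketch_queue_interrupt(nc_pending) ||
	       sketch_queue_interrupt(be_pending);
}

/* Calling each handler unconditionally and OR-ing the results runs both. */
static bool sketch_handle_both(bool nc_pending, bool be_pending)
{
	bool handled = false;

	handled |= sketch_queue_interrupt(nc_pending);
	handled |= sketch_queue_interrupt(be_pending);
	return handled;
}
```

When both queues have work pending, the || form services only the first
one, while the second form services both, as the two separate if
statements in the patch do.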


[...]

+static irqreturn_t ravb_rx_tx_interrupt(int irq, void *dev_id, int ravb_queue)


   Please, please shorten this 'ravb_queue'...
   Also, would make sense to rename it to ravb_dma_interrupt()...

[...]

   Unfortunately, I still can't do full gen2 regression testing as both Alt 
and Porter boards don't work with the recent kernel due to AVB_MDIO stuck at 
1... But perhaps such testing isn't even necessary.


MBR, Sergei



Re: Softirq priority inversion from "softirq: reduce latencies"

2016-02-29 Thread David Miller
From: Thomas Gleixner 
Date: Mon, 29 Feb 2016 20:14:36 +0100 (CET)

> On Mon, 29 Feb 2016, Peter Hurley wrote:
>> Or flipping your argument on its head, why not just _always_ execute
>> softirq in ksoftirqd?
> 
> Which is what that change effectively does. And that makes a lot of sense,
> because you get the softirq load under scheduler control and do not let the
> softirq run as a context stealing entity which is completely uncontrollable by
> the scheduler.

+1


[PATCH 4/4] net: can: ifi: Add obscure bit swap for EFF frame IDs

2016-02-29 Thread Marek Vasut
In case of a CAN2.0 EFF frame, the controller handles frame IDs in a
rather bizarre way. The ID is split into an extended part, IDX[28:11]
and standard part, ID[10:0]. In the TX path, the core first sends the
top 11 bits of the IDX, followed by ID and finally the rest of IDX.
In the RX path, the core stores the ID in the LSbit part of IDX field,
followed by the LSbit parts of real IDX. The MSbit parts of IDX are
stored in ID field of the register.

This patch implements the necessary bit shuffling to mitigate this
obscure behavior. In case two of these controllers are connected
together, the RX and TX bit swapping nullifies itself and the issue
does not manifest. The issue only manifests when talking to a
different CAN controller.

Signed-off-by: Marek Vasut 
Cc: Marc Kleine-Budde 
Cc: Mark Rutland 
Cc: Oliver Hartkopp 
Cc: Wolfgang Grandegger 
---
 drivers/net/can/ifi_canfd/ifi_canfd.c | 31 +--
 1 file changed, 29 insertions(+), 2 deletions(-)

diff --git a/drivers/net/can/ifi_canfd/ifi_canfd.c 
b/drivers/net/can/ifi_canfd/ifi_canfd.c
index 6704098..254861b 100644
--- a/drivers/net/can/ifi_canfd/ifi_canfd.c
+++ b/drivers/net/can/ifi_canfd/ifi_canfd.c
@@ -136,7 +136,11 @@
 #define IFI_CANFD_RXFIFO_ID0x6c
 #define IFI_CANFD_RXFIFO_ID_ID_OFFSET  0
 #define IFI_CANFD_RXFIFO_ID_ID_STD_MASK0x7ff
+#define IFI_CANFD_RXFIFO_ID_ID_STD_OFFSET  0
+#define IFI_CANFD_RXFIFO_ID_ID_STD_WIDTH   10
 #define IFI_CANFD_RXFIFO_ID_ID_XTD_MASK0x1fff
+#define IFI_CANFD_RXFIFO_ID_ID_XTD_OFFSET  11
+#define IFI_CANFD_RXFIFO_ID_ID_XTD_WIDTH   18
 #define IFI_CANFD_RXFIFO_ID_IDEBIT(29)
 
 #define IFI_CANFD_RXFIFO_DATA  0x70/* 0x70..0xac */
@@ -157,7 +161,11 @@
 #define IFI_CANFD_TXFIFO_ID0xbc
 #define IFI_CANFD_TXFIFO_ID_ID_OFFSET  0
 #define IFI_CANFD_TXFIFO_ID_ID_STD_MASK0x7ff
+#define IFI_CANFD_TXFIFO_ID_ID_STD_OFFSET  0
+#define IFI_CANFD_TXFIFO_ID_ID_STD_WIDTH   10
 #define IFI_CANFD_TXFIFO_ID_ID_XTD_MASK0x1fff
+#define IFI_CANFD_TXFIFO_ID_ID_XTD_OFFSET  11
+#define IFI_CANFD_TXFIFO_ID_ID_XTD_WIDTH   18
 #define IFI_CANFD_TXFIFO_ID_IDEBIT(29)
 
 #define IFI_CANFD_TXFIFO_DATA  0xc0/* 0xb0..0xfc */
@@ -229,10 +237,20 @@ static void ifi_canfd_read_fifo(struct net_device *ndev)
 
rxid = readl(priv->base + IFI_CANFD_RXFIFO_ID);
id = (rxid >> IFI_CANFD_RXFIFO_ID_ID_OFFSET);
-   if (id & IFI_CANFD_RXFIFO_ID_IDE)
+   if (id & IFI_CANFD_RXFIFO_ID_IDE) {
id &= IFI_CANFD_RXFIFO_ID_ID_XTD_MASK;
-   else
+   /*
+* In case the Extended ID frame is received, the standard
+* and extended part of the ID are swapped in the register,
+* so swap them back to obtain the correct ID.
+*/
+   id = (id >> IFI_CANFD_RXFIFO_ID_ID_XTD_OFFSET) |
+((id & IFI_CANFD_RXFIFO_ID_ID_STD_MASK) <<
+  IFI_CANFD_RXFIFO_ID_ID_XTD_WIDTH);
+   id |= CAN_EFF_FLAG;
+   } else {
id &= IFI_CANFD_RXFIFO_ID_ID_STD_MASK;
+   }
cf->can_id = id;
 
if (rxdlc & IFI_CANFD_RXFIFO_DLC_ESI) {
@@ -767,6 +785,15 @@ static netdev_tx_t ifi_canfd_start_xmit(struct sk_buff 
*skb,
 
if (cf->can_id & CAN_EFF_FLAG) {
txid = cf->can_id & CAN_EFF_MASK;
+   /*
+* In case the Extended ID frame is transmitted, the
+* standard and extended part of the ID are swapped
+* in the register, so swap them back to send the
+* correct ID.
+*/
+   txid = (txid >> IFI_CANFD_TXFIFO_ID_ID_XTD_WIDTH) |
+  ((txid & IFI_CANFD_TXFIFO_ID_ID_XTD_MASK) <<
+IFI_CANFD_TXFIFO_ID_ID_XTD_OFFSET);
txid |= IFI_CANFD_TXFIFO_ID_IDE;
} else {
txid = cf->can_id & CAN_SFF_MASK;
-- 
2.7.0
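
As a reading aid for the swap above, here is a minimal standalone C sketch of the two rotations on a 29-bit identifier. The SKETCH_* macros and function names are illustrative and are not the driver's; masking the extended part to exactly 18 bits is an assumption made for clarity, based on the 11-bit/18-bit split described in the commit message.

```c
#include <stdint.h>

/* Field geometry assumed from the commit message: an 11-bit standard
 * part and an 18-bit extended part of a 29-bit EFF identifier. */
#define SKETCH_STD_BITS 11
#define SKETCH_XTD_BITS 18
#define SKETCH_STD_MASK ((1u << SKETCH_STD_BITS) - 1)	/* 0x7ff */
#define SKETCH_XTD_MASK ((1u << SKETCH_XTD_BITS) - 1)	/* 0x3ffff */

/* CAN ID -> register layout (TX direction): the top 11 bits move to
 * the bottom of the field, the low 18 bits move above them. */
static uint32_t sketch_eff_id_to_reg(uint32_t can_id)
{
	return (can_id >> SKETCH_XTD_BITS) |
	       ((can_id & SKETCH_XTD_MASK) << SKETCH_STD_BITS);
}

/* Register layout -> CAN ID (RX direction): the inverse rotation. */
static uint32_t sketch_eff_reg_to_id(uint32_t reg)
{
	return (reg >> SKETCH_STD_BITS) |
	       ((reg & SKETCH_STD_MASK) << SKETCH_XTD_BITS);
}
```

Because the two functions are inverses for any 29-bit value, two such cores talking to each other would not expose the issue, which matches the observation in the commit message.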



[PATCH 1/4] net: can: ifi: Fix clock generator configuration

2016-02-29 Thread Marek Vasut
The clock generation does not match reality when using the CAN IP
core outside of the FPGA design. This patch fixes the computation
of values which are programmed into the clock generator registers.

First, there are some off-by-one errors which manifest themselves
only when communicating with a different controller, so those are
fixed.

Second, the bits in the clock generator registers have different
meaning depending on whether the core is in ISO CANFD mode or any
of the other modes (BOSCH CANFD or CAN2.0). Detect the ISO CANFD
mode and fix handling of this special case of clock configuration.

Finally, the CAN clock speed is in CANCLOCK register, not SYSCLOCK
register, so fix this as well.

Signed-off-by: Marek Vasut 
Cc: Marc Kleine-Budde 
Cc: Mark Rutland 
Cc: Oliver Hartkopp 
Cc: Wolfgang Grandegger 
---
 drivers/net/can/ifi_canfd/ifi_canfd.c | 43 ++-
 1 file changed, 22 insertions(+), 21 deletions(-)

diff --git a/drivers/net/can/ifi_canfd/ifi_canfd.c 
b/drivers/net/can/ifi_canfd/ifi_canfd.c
index 639868b..72f5205 100644
--- a/drivers/net/can/ifi_canfd/ifi_canfd.c
+++ b/drivers/net/can/ifi_canfd/ifi_canfd.c
@@ -514,25 +514,25 @@ static irqreturn_t ifi_canfd_isr(int irq, void *dev_id)
 
 static const struct can_bittiming_const ifi_canfd_bittiming_const = {
.name   = KBUILD_MODNAME,
-   .tseg1_min  = 2,/* Time segment 1 = prop_seg + phase_seg1 */
+   .tseg1_min  = 1,/* Time segment 1 = prop_seg + phase_seg1 */
.tseg1_max  = 64,
-   .tseg2_min  = 1,/* Time segment 2 = phase_seg2 */
-   .tseg2_max  = 16,
+   .tseg2_min  = 2,/* Time segment 2 = phase_seg2 */
+   .tseg2_max  = 64,
.sjw_max= 16,
-   .brp_min= 1,
-   .brp_max= 1024,
+   .brp_min= 2,
+   .brp_max= 256,
.brp_inc= 1,
 };
 
 static const struct can_bittiming_const ifi_canfd_data_bittiming_const = {
.name   = KBUILD_MODNAME,
-   .tseg1_min  = 2,/* Time segment 1 = prop_seg + phase_seg1 */
-   .tseg1_max  = 16,
-   .tseg2_min  = 1,/* Time segment 2 = phase_seg2 */
-   .tseg2_max  = 8,
-   .sjw_max= 4,
-   .brp_min= 1,
-   .brp_max= 32,
+   .tseg1_min  = 1,/* Time segment 1 = prop_seg + phase_seg1 */
+   .tseg1_max  = 64,
+   .tseg2_min  = 2,/* Time segment 2 = phase_seg2 */
+   .tseg2_max  = 64,
+   .sjw_max= 16,
+   .brp_min= 2,
+   .brp_max= 256,
.brp_inc= 1,
 };
 
@@ -545,32 +545,33 @@ static void ifi_canfd_set_bittiming(struct net_device 
*ndev)
u32 noniso_arg = 0;
u32 time_off;
 
-   if (priv->can.ctrlmode & CAN_CTRLMODE_FD_NON_ISO) {
+   if (priv->can.ctrlmode & CAN_CTRLMODE_FD) {
+   time_off = IFI_CANFD_TIME_SJW_OFF_ISO;
+   } else {
noniso_arg = IFI_CANFD_TIME_SET_TIMEB_BOSCH |
 IFI_CANFD_TIME_SET_TIMEA_BOSCH |
 IFI_CANFD_TIME_SET_PRESC_BOSCH |
 IFI_CANFD_TIME_SET_SJW_BOSCH;
time_off = IFI_CANFD_TIME_SJW_OFF_BOSCH;
-   } else {
-   time_off = IFI_CANFD_TIME_SJW_OFF_ISO;
}
 
/* Configure bit timing */
-   brp = bt->brp - 1;
+   brp = bt->brp - 2;
sjw = bt->sjw - 1;
tseg1 = bt->prop_seg + bt->phase_seg1 - 1;
-   tseg2 = bt->phase_seg2 - 1;
+   tseg2 = bt->phase_seg2 - 2;
writel((tseg2 << IFI_CANFD_TIME_TIMEB_OFF) |
   (tseg1 << IFI_CANFD_TIME_TIMEA_OFF) |
   (brp << IFI_CANFD_TIME_PRESCALE_OFF) |
-  (sjw << time_off),
+  (sjw << time_off) |
+  noniso_arg,
   priv->base + IFI_CANFD_TIME);
 
/* Configure data bit timing */
-   brp = dbt->brp - 1;
+   brp = dbt->brp - 2;
sjw = dbt->sjw - 1;
tseg1 = dbt->prop_seg + dbt->phase_seg1 - 1;
-   tseg2 = dbt->phase_seg2 - 1;
+   tseg2 = dbt->phase_seg2 - 2;
writel((tseg2 << IFI_CANFD_TIME_TIMEB_OFF) |
   (tseg1 << IFI_CANFD_TIME_TIMEA_OFF) |
   (brp << IFI_CANFD_TIME_PRESCALE_OFF) |
@@ -847,7 +848,7 @@ static int ifi_canfd_plat_probe(struct platform_device 
*pdev)
 
priv->can.state = CAN_STATE_STOPPED;
 
-   priv->can.clock.freq = readl(addr + IFI_CANFD_SYSCLOCK);
+   priv->can.clock.freq = readl(addr + IFI_CANFD_CANCLOCK);
 
priv->can.bittiming_const   = &ifi_canfd_bittiming_const;
priv->can.data_bittiming_const  = &ifi_canfd_data_bittiming_const;
-- 
2.7.0
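
The field biases after this patch can be summarised in a small C sketch. The struct and function names are hypothetical, and the sketch deliberately leaves out the packing into the IFI_CANFD_TIME register, since the shift offsets are not shown in this mail; it only computes the raw field values.

```c
#include <stdint.h>

/* Hypothetical container for the raw field values; the driver shifts
 * these into one register using offsets that are not shown here. */
struct sketch_time_fields {
	uint32_t brp;
	uint32_t sjw;
	uint32_t tseg1;
	uint32_t tseg2;
};

/* Apply the per-field biases used after the patch: brp and tseg2 are
 * stored minus 2, sjw and tseg1 are stored minus 1. */
static struct sketch_time_fields
sketch_encode_bittiming(uint32_t brp, uint32_t sjw, uint32_t prop_seg,
			uint32_t phase_seg1, uint32_t phase_seg2)
{
	struct sketch_time_fields f;

	f.brp = brp - 2;
	f.sjw = sjw - 1;
	f.tseg1 = prop_seg + phase_seg1 - 1;
	f.tseg2 = phase_seg2 - 2;
	return f;
}
```

With these biases, the new minimums in the bittiming constants (brp_min = 2, tseg1_min = 1, tseg2_min = 2) each map to a field value of zero.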



[PATCH 3/4] net: can: ifi: Fix RX and TX ID mask

2016-02-29 Thread Marek Vasut
The RX and TX ID mask for CAN2.0 is 11 bits wide. This patch fixes
the incorrect mask, which caused the CAN IDs to miss the MSBit both
on receive and transmit.

Signed-off-by: Marek Vasut 
Cc: Marc Kleine-Budde 
Cc: Mark Rutland 
Cc: Oliver Hartkopp 
Cc: Wolfgang Grandegger 
---
 drivers/net/can/ifi_canfd/ifi_canfd.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/can/ifi_canfd/ifi_canfd.c 
b/drivers/net/can/ifi_canfd/ifi_canfd.c
index 82a33bd..6704098 100644
--- a/drivers/net/can/ifi_canfd/ifi_canfd.c
+++ b/drivers/net/can/ifi_canfd/ifi_canfd.c
@@ -135,7 +135,7 @@
 
 #define IFI_CANFD_RXFIFO_ID0x6c
 #define IFI_CANFD_RXFIFO_ID_ID_OFFSET  0
-#define IFI_CANFD_RXFIFO_ID_ID_STD_MASK0x3ff
+#define IFI_CANFD_RXFIFO_ID_ID_STD_MASK0x7ff
 #define IFI_CANFD_RXFIFO_ID_ID_XTD_MASK0x1fff
 #define IFI_CANFD_RXFIFO_ID_IDEBIT(29)
 
@@ -156,7 +156,7 @@
 
 #define IFI_CANFD_TXFIFO_ID0xbc
 #define IFI_CANFD_TXFIFO_ID_ID_OFFSET  0
-#define IFI_CANFD_TXFIFO_ID_ID_STD_MASK0x3ff
+#define IFI_CANFD_TXFIFO_ID_ID_STD_MASK0x7ff
 #define IFI_CANFD_TXFIFO_ID_ID_XTD_MASK0x1fff
 #define IFI_CANFD_TXFIFO_ID_IDEBIT(29)
 
-- 
2.7.0
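
A short worked example of the mask change, as a hedged C sketch. The SKETCH_* names and the helper are made up for illustration; only the two mask values come from the diff above.

```c
#include <stdint.h>

#define SKETCH_OLD_STD_MASK 0x3ffu	/* 10 bits: value before the patch */
#define SKETCH_NEW_STD_MASK 0x7ffu	/* 11 bits: value after the patch */

/* Apply a standard-ID mask to a raw identifier value. */
static uint32_t sketch_mask_std_id(uint32_t id, uint32_t mask)
{
	return id & mask;
}
```

With the old mask, identifier 0x400 (only bit 10 set) collapses to 0 and 0x7ff becomes 0x3ff; with the new mask both values pass through unchanged.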



[PATCH 0/4] Synchronise IFI CANFD driver with real world

2016-02-29 Thread Marek Vasut
Thus far, this driver was only tested on hardware synthesised in
the warm and safe insides of an FPGA, only against another IFI CANFD
core. The real hardware arrived now and I tested the IFI CANFD driver
against a different, harsh, real-world CAN controller.

This uncovered a few bugs, so here are the fixes for those.

Marek Vasut (4):
  net: can: ifi: Fix clock generator configuration
  net: can: ifi: Fix TX DLC configuration
  net: can: ifi: Fix RX and TX ID mask
  net: can: ifi: Add obscure bit swap for EFF frame IDs

 drivers/net/can/ifi_canfd/ifi_canfd.c | 83 ---
 1 file changed, 58 insertions(+), 25 deletions(-)

Cc: Marc Kleine-Budde 
Cc: Mark Rutland 
Cc: Oliver Hartkopp 
Cc: Wolfgang Grandegger 

-- 
2.7.0



[PATCH 2/4] net: can: ifi: Fix TX DLC configuration

2016-02-29 Thread Marek Vasut
The TX DLC, the transmission length information, was not written
into the transmit configuration register. When using the CAN core
with a different CAN controller, the receiving CAN controller will
receive only the ID part of the CAN frame, but no data at all.

This patch adds the TX DLC into the register to fix this issue.

Signed-off-by: Marek Vasut 
Cc: Marc Kleine-Budde 
Cc: Mark Rutland 
Cc: Oliver Hartkopp 
Cc: Wolfgang Grandegger 
---
 drivers/net/can/ifi_canfd/ifi_canfd.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/net/can/ifi_canfd/ifi_canfd.c 
b/drivers/net/can/ifi_canfd/ifi_canfd.c
index 72f5205..82a33bd 100644
--- a/drivers/net/can/ifi_canfd/ifi_canfd.c
+++ b/drivers/net/can/ifi_canfd/ifi_canfd.c
@@ -774,10 +774,15 @@ static netdev_tx_t ifi_canfd_start_xmit(struct sk_buff 
*skb,
 
if (priv->can.ctrlmode & (CAN_CTRLMODE_FD | CAN_CTRLMODE_FD_NON_ISO)) {
if (can_is_canfd_skb(skb)) {
+   txdlc |= can_len2dlc(cf->len);
txdlc |= IFI_CANFD_TXFIFO_DLC_EDL;
if (cf->flags & CANFD_BRS)
txdlc |= IFI_CANFD_TXFIFO_DLC_BRS;
+   } else {
+   txdlc |= cf->len;
}
+   } else {
+   txdlc |= cf->len;
}
 
if (cf->can_id & CAN_RTR_FLAG)
-- 
2.7.0
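
For readers unfamiliar with the length-to-DLC conversion used in the CAN FD branch, here is a minimal C sketch of the standard CAN FD mapping. The function name sketch_len2dlc() is hypothetical and this is not the kernel's can_len2dlc() implementation; it only shows the mapping that such a helper is expected to produce.

```c
#include <stdint.h>

/* Map a CAN FD payload length (0..64 bytes) to its 4-bit DLC code.
 * Lengths that fall between two valid frame sizes round up to the
 * next valid size. */
static uint8_t sketch_len2dlc(uint8_t len)
{
	if (len <= 8)
		return len;	/* classic range: DLC equals length */
	if (len <= 12)
		return 9;
	if (len <= 16)
		return 10;
	if (len <= 20)
		return 11;
	if (len <= 24)
		return 12;
	if (len <= 32)
		return 13;
	if (len <= 48)
		return 14;
	return 15;		/* 49..64 bytes; larger inputs are clamped */
}
```

For classic CAN frames the payload never exceeds 8 bytes, so the length can be written into the DLC field directly, as the two "else" branches in the patch do.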



[ANNOUNCE] NetDev 1.1 slides now available

2016-02-29 Thread Pablo Neira Ayuso
Hi,

Today we're releasing the NetDev 1.1 slides, you can find them at:

http://www.netdevconf.org/1.1/proceedings/

Regarding videos, we're still uploading (~40 hours), so it may take a
little while until we make them public. Will send a short notice once
they are available.

And a short reminder to talk presenters: Don't forget that your paper
submission deadline is set on *10th March 2016*.

Thanks.


Re: [PATCH] mrf24j40: fix security-enabled processing on inbound frames

2016-02-29 Thread Alan Ott

On 02/18/2016 01:34 PM, zopieux wrote:

Fix the MRF24J40 handling of security-enabled frames so it does not
block upon receiving such frames.

Signed-off-by: Alexander Aring 
Reported-by: Alexandre Macabies 
Tested-by: Alexandre Macabies 
---
When receiving a security-enabled IEEE 802.15.4 frame, the MRF24J40
triggers a SECIF interrupt that needs to be handled for RX processing
to keep functioning properly.

This patch enables the SECIF interrupt and makes the MRF ignore all
hardware processing of security-enabled frames, which is handled by the
ieee802154 stack instead.
---


The "From" field of the email needs to have your real name in it. This 
will be where the "Author" field in git comes from.


It looks like there are a few separate things happening in this patch. 
Maybe they should be broken out into separate patches. I see:


1. The ieee802154.h part,
2. The TX part,
3. The RX part.

The patch description only really describes the RX part.

Other than that, the actual code seems OK to me.

Alan.



Re: Softirq priority inversion from "softirq: reduce latencies"

2016-02-29 Thread Eric Dumazet
On lun., 2016-02-29 at 11:13 -0800, Peter Hurley wrote:
> On 02/29/2016 07:27 AM, Eric Dumazet wrote:
> > On lun., 2016-02-29 at 07:03 -0800, Peter Hurley wrote:
> > 
> >> The reason why Eric's change is so effective for Eric's workload is
> >> that it fixes the problem where NET_RX keeps getting new network packets
> >> so it keeps looping, servicing more NET_RX softirq.
> > 
> > You have very little idea of what is happening in networking land.
> 
> While that is true, I can read a trace:
> 
>   ** already in NET_RX softirq **
> 
>   -0   0..s2   15us : kmem_cache_alloc: call_site=c08378e4 
> ptr=de55d7c0 bytes_req=192 bytes_alloc=192 gfp_flags=GFP_ATOMIC
>   -0   0..s2   23us : netif_receive_skb_entry: dev=eth0 napi_id=0x0 
> queue_mapping=0 skbaddr=dca04400 vlan_tagged=0 vlan_proto=0x 
> vlan_tci=0x000
> 0 protocol=0x0800 ip_summed=0 hash=0x l4_hash=0 len=88 data_len=0 
> truesize=1984 mac_header_valid=1 mac_header=-14 nr_frags=0 gso_size=0 
> gso_type=0x0
>   -0   0..s2   30us+: netif_receive_skb: dev=eth0 skbaddr=dca04400 
> len=88
>   -0   0d.s5   98us : sched_waking: comm=sshd pid=750 prio=120 
> target_cpu=000
>   -0   0d.s6  105us : sched_stat_sleep: comm=sshd pid=750 
> delay=3125230447 [ns]
>   -0   0dns6  110us+: sched_wakeup: comm=sshd pid=750 prio=120 
> target_cpu=000
>   -0   0dns4  123us+: timer_start: timer=dc940e9c 
> function=tcp_delack_timer expires=9746 [timeout=10] flags=0x
>   -0   0dnH3  150us : irq_handler_entry: irq=176 
> name=4a10.ethernet
>   -0   0dnH3  153us : softirq_raise: vec=3 [action=NET_RX]
>   -0   0dnH3  155us : irq_handler_exit: irq=176 ret=handled
>   -0   0dnH3  160us : irq_handler_entry: irq=20 
> name=4900.edma_ccint
>   -0   0dnH3  163us : irq_handler_exit: irq=20 ret=handled
>   -0   0.ns2  169us : napi_poll: napi poll on napi struct de465c30 
> for device eth0
>   -0   0.ns2  171us : softirq_exit: vec=3 [action=NET_RX]
> 
> 
> As you can see, NET_RX softirq is re-raised while in NET_RX softirq,
> as a result of receiving new packets. So NET_RX will keep looping,
> which is what I wrote.

Well, NET_RX can not be re-raised, it is a single bit flip.

It is 'raised' on this trace because the driver already rearmed the IRQ
so that hard irq handler could fire.

Anyway, it seems you know much better than me, so I will stop answering
your mails on this topic.





Re: [PATCH net-next V1 09/10] net/mlx5: Fix global UAR mapping

2016-02-29 Thread Saeed Mahameed
>
> Well anyone can see that from the code.
>
> You have to explain why.

In simple words, as partially explained in the commit message, we want
to have both mappings (NC and WC) available so the upper layer can decide
which to choose. E.g. for SQs/QPs in some cases (small packets), and
only when WC is supported, we would like to write TX descriptors (WQEs)
using the ConnectX BlueFlame feature via the WC mapping; if WC is not
supported, the TX descriptors would be posted in the usual way
(doorbell) via the NC mapping.
This would give a latency boost for small packets.

The problem is that when posting BlueFlame buffers while the mapping is
not WC, i.e. via the NC mapping, the latency will get worse than writing
the usual way (doorbell).

So this is why we use ARCH_HAS_IOREMAP_WC to give a hint to the upper
layer whether to use BlueFlame writes (WC) or doorbell writes (NC).

>
> And BTW, ARCH_HAS_IOREMAP_WC doesn't even tell you if the platform
> will actually give you a write-combining mapping.

We did some research after your comment and we are considering
removing ARCH_HAS_IOREMAP_WC from the code, we will update the patches
soon.

>
> So if the driver operates properly if a non-WC mapping is used
> for uar->bf_map, then get rid of this CPP test altogether PLEASE!
>
> Otherwise your driver is buggy, because ARCH_HAS_IOREMAP_WC only says
> whether the default implementation of ioremap_wc() needs to be
> provided by include/asm-generic/iomap.h It does not guarantee that a
> write-combining mapping will be provided.
>
> I really can't think of any reason why you absolutely require a
> WC mapping, and the CPP test just makes your driver look more
> ugly than it needs to be.

WC mapping is required in order to know if BlueFlame writes would give
a better latency or not.

>
> So can you please explain what the hell is happening here and why you
> are doing things this way rather than just reading the code to me?

I hope the above explains what we are trying to do here. I know it is
not perfect, but as you know the kernel IO mapping API doesn't tell if
the WC mapping was successful or not, so we used the CPP test.
After your comment we understood it is not perfect, and we are
looking into it.

Thanks


[net-next PATCH] net: relax setup_tc ndo op handle restriction

2016-02-29 Thread John Fastabend
I added this check in setup_tc to multiple drivers,

 if (handle != TC_H_ROOT || tc->type != TC_SETUP_MQPRIO)

Unfortunately restricting to TC_H_ROOT like this breaks the old
instantiation of mqprio to set up a hardware qdisc. This patch
relaxes the test to only check the type to make it equivalent
to the check before I broke it. With this the old instantiation
continues to work.

A good smoke test is to set up mqprio with,

# tc qdisc add dev eth4 root mqprio num_tc 8 \
  map 0 1 2 3 4 5 6 7 \
  queues 0@0 1@1 2@2 3@3 4@4 5@5 6@6 7@7

Fixes: e4c6734eaab9 ("net: rework ndo tc op to consume additional qdisc handle 
paramete")
Reported-by: Singh Krishneil 
Reported-by: Jake Keller 
CC: Murali Karicheri 
CC: Shradha Shah 
CC: Or Gerlitz 
CC: Ariel Elior 
CC: Jeff Kirsher 
CC: Bruce Allan 
CC: Jesse Brandeburg 
CC: Don Skidmore 
Signed-off-by: John Fastabend 
---
 drivers/net/ethernet/amd/xgbe/xgbe-drv.c|2 +-
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c |2 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.c   |2 +-
 drivers/net/ethernet/intel/fm10k/fm10k_netdev.c |2 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c   |2 +-
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c  |2 +-
 drivers/net/ethernet/sfc/tx.c   |2 +-
 drivers/net/ethernet/ti/netcp_core.c|2 +-
 8 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-drv.c 
b/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
index 3360684..ebf9224 100644
--- a/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
+++ b/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
@@ -1632,7 +1632,7 @@ static int xgbe_setup_tc(struct net_device *netdev, u32 
handle, __be16 proto,
struct xgbe_prv_data *pdata = netdev_priv(netdev);
u8 tc;
 
-   if (handle != TC_H_ROOT || tc_to_netdev->type != TC_SETUP_MQPRIO)
+   if (tc_to_netdev->type != TC_SETUP_MQPRIO)
return -EINVAL;
 
tc = tc_to_netdev->tc;
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c 
b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
index 45843d1..a949783 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
@@ -4275,7 +4275,7 @@ int bnx2x_setup_tc(struct net_device *dev, u8 num_tc)
 int __bnx2x_setup_tc(struct net_device *dev, u32 handle, __be16 proto,
 struct tc_to_netdev *tc)
 {
-   if (handle != TC_H_ROOT || tc->type != TC_SETUP_MQPRIO)
+   if (tc->type != TC_SETUP_MQPRIO)
return -EINVAL;
return bnx2x_setup_tc(dev, tc->tc);
 }
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c 
b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index ff1507f..f1a0a73 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -5383,7 +5383,7 @@ static int bnxt_setup_tc(struct net_device *dev, u32 
handle, __be16 proto,
struct bnxt *bp = netdev_priv(dev);
u8 tc;
 
-   if (handle != TC_H_ROOT || ntc->type != TC_SETUP_MQPRIO)
+   if (ntc->type != TC_SETUP_MQPRIO)
return -EINVAL;
 
tc = ntc->tc;
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c 
b/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
index dc1a821..d09a8dd 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
@@ -1207,7 +1207,7 @@ err_queueing_scheme:
 static int __fm10k_setup_tc(struct net_device *dev, u32 handle, __be16 proto,
struct tc_to_netdev *tc)
 {
-   if (handle != TC_H_ROOT || tc->type != TC_SETUP_MQPRIO)
+   if (tc->type != TC_SETUP_MQPRIO)
return -EINVAL;
 
return fm10k_setup_tc(dev, tc->tc);
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c 
b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index cf4b729..02139f3 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -8422,7 +8422,7 @@ int __ixgbe_setup_tc(struct net_device *dev, u32 handle, 
__be16 proto,
}
}
 
-   if (handle != TC_H_ROOT || tc->type != TC_SETUP_MQPRIO)
+   if (tc->type != TC_SETUP_MQPRIO)
return -EINVAL;
 
return ixgbe_setup_tc(dev, tc->tc);
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c 
b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index 96d95cb..a2d560a 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -72,7 +72,7 @@ int mlx4_en_setup_tc(struct net_device *dev, u8 up)
 static int __mlx4_en_setup_tc(struct net_device *dev, u32 handle, __be16 proto,
  struct tc_to_netdev *tc)
 {
-   if (handle != TC_H_ROOT || tc->type != TC_SETUP_MQPRIO)
+   if (tc->type != TC_SETUP_MQPRIO)
return -EINVAL;
 
return mlx4_en_setup_tc(dev, tc->tc);
diff --git a/drivers/net/ethernet/sfc/

Re: [Patch net-next] net: remove skb_sender_cpu_clear()

2016-02-29 Thread Eric Dumazet
On lun., 2016-02-29 at 10:55 -0800, Cong Wang wrote:
> On Mon, Feb 29, 2016 at 10:50 AM, Daniel Borkmann  
> wrote:
> > On 02/28/2016 05:19 AM, Cong Wang wrote:
> >>
> >> After commit 52bd2d62ce67 ("net: better skb->sender_cpu and skb->napi_id
> >> cohabitation")
> >> skb_sender_cpu_clear() becomes empty and can be removed.
> >>
> >> Cc: Eric Dumazet 
> >> Signed-off-by: Cong Wang 
> >
> >
> > Wasn't the intention to keep this helper as a marker when packet
> > crosses domains from RX to TX, see discussion here:
> >
> >   https://patchwork.ozlabs.org/patch/527167/
> >
> > Maybe better to rename it and add a comment into the helper to
> > make the intention more clear?
> 
> Since when do we need an empty function to mark some call path?
> Isn't this supposed to be done by comments or documents?
> 
> BTW, I myself even don't think we need any comment, people
> who touch it should understand it.

I have no objections to this patch.

If we keep the helper, a better name would be needed anyway.





[PATCH net V1 4/7] net/mlx5e: Fix ethtool RX hash func configuration change

2016-02-29 Thread Saeed Mahameed
From: Tariq Toukan 

We should modify TIRs explicitly to apply the new RSS configuration.
The light ndo close/open calls do not "refresh" them.

Fixes: 2d75b2bc8a8c ('net/mlx5e: Add ethtool RSS configuration options')
Signed-off-by: Tariq Toukan 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |3 ++
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   |   34 ++--
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |   27 +--
 include/linux/mlx5/mlx5_ifc.h  |4 ++-
 4 files changed, 46 insertions(+), 22 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 614a602..976bddb 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -447,6 +447,8 @@ enum mlx5e_traffic_types {
MLX5E_NUM_TT,
 };
 
+#define IS_HASHING_TT(tt) (tt != MLX5E_TT_ANY)
+
 enum mlx5e_rqt_ix {
MLX5E_INDIRECTION_RQT,
MLX5E_SINGLE_RQ_RQT,
@@ -613,6 +615,7 @@ void mlx5e_enable_vlan_filter(struct mlx5e_priv *priv);
 void mlx5e_disable_vlan_filter(struct mlx5e_priv *priv);
 
 int mlx5e_redirect_rqt(struct mlx5e_priv *priv, enum mlx5e_rqt_ix rqt_ix);
+void mlx5e_build_tir_ctx_hash(void *tirc, struct mlx5e_priv *priv);
 
 int mlx5e_open_locked(struct net_device *netdev);
 int mlx5e_close_locked(struct net_device *netdev);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index 65624ac..64af1b0 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -703,18 +703,36 @@ static int mlx5e_get_rxfh(struct net_device *netdev, u32 
*indir, u8 *key,
return 0;
 }
 
+static void mlx5e_modify_tirs_hash(struct mlx5e_priv *priv, void *in, int 
inlen)
+{
+   struct mlx5_core_dev *mdev = priv->mdev;
+   void *tirc = MLX5_ADDR_OF(modify_tir_in, in, ctx);
+   int i;
+
+   MLX5_SET(modify_tir_in, in, bitmask.hash, 1);
+   mlx5e_build_tir_ctx_hash(tirc, priv);
+
+   for (i = 0; i < MLX5E_NUM_TT; i++)
+   if (IS_HASHING_TT(i))
+   mlx5_core_modify_tir(mdev, priv->tirn[i], in, inlen);
+}
+
 static int mlx5e_set_rxfh(struct net_device *dev, const u32 *indir,
  const u8 *key, const u8 hfunc)
 {
struct mlx5e_priv *priv = netdev_priv(dev);
-   bool close_open;
-   int err = 0;
+   int inlen = MLX5_ST_SZ_BYTES(modify_tir_in);
+   void *in;
 
if ((hfunc != ETH_RSS_HASH_NO_CHANGE) &&
(hfunc != ETH_RSS_HASH_XOR) &&
(hfunc != ETH_RSS_HASH_TOP))
return -EINVAL;
 
+   in = mlx5_vzalloc(inlen);
+   if (!in)
+   return -ENOMEM;
+
mutex_lock(&priv->state_lock);
 
if (indir) {
@@ -723,11 +741,6 @@ static int mlx5e_set_rxfh(struct net_device *dev, const 
u32 *indir,
mlx5e_redirect_rqt(priv, MLX5E_INDIRECTION_RQT);
}
 
-   close_open = (key || (hfunc != ETH_RSS_HASH_NO_CHANGE)) &&
-test_bit(MLX5E_STATE_OPENED, &priv->state);
-   if (close_open)
-   mlx5e_close_locked(dev);
-
if (key)
memcpy(priv->params.toeplitz_hash_key, key,
   sizeof(priv->params.toeplitz_hash_key));
@@ -735,12 +748,13 @@ static int mlx5e_set_rxfh(struct net_device *dev, const 
u32 *indir,
if (hfunc != ETH_RSS_HASH_NO_CHANGE)
priv->params.rss_hfunc = hfunc;
 
-   if (close_open)
-   err = mlx5e_open_locked(priv->netdev);
+   mlx5e_modify_tirs_hash(priv, in, inlen);
 
mutex_unlock(&priv->state_lock);
 
-   return err;
+   kvfree(in);
+
+   return 0;
 }
 
 static int mlx5e_get_rxnfc(struct net_device *netdev,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 137b05e..34b1049 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -1317,6 +1317,21 @@ static void mlx5e_build_tir_ctx_lro(void *tirc, struct 
mlx5e_priv *priv)
  lro_timer_supported_periods[2]));
 }
 
+void mlx5e_build_tir_ctx_hash(void *tirc, struct mlx5e_priv *priv)
+{
+   MLX5_SET(tirc, tirc, rx_hash_fn,
+mlx5e_rx_hash_fn(priv->params.rss_hfunc));
+   if (priv->params.rss_hfunc == ETH_RSS_HASH_TOP) {
+   void *rss_key = MLX5_ADDR_OF(tirc, tirc,
+rx_hash_toeplitz_key);
+   size_t len = MLX5_FLD_SZ_BYTES(tirc,
+  rx_hash_toeplitz_key);
+
+   MLX5_SET(tirc, tirc, rx_hash_symmetric, 1);
+   memcpy(rss_key, priv->params.toeplitz_hash_key, len);
+   }
+}
+
 static int mlx5e_modify_tirs_lro(stru

[PATCH net V1 7/7] net/mlx5e: Provide correct packet/bytes statistics

2016-02-29 Thread Saeed Mahameed
From: Gal Pressman 

Using the HW VPort counters for traffic (rx/tx packets/bytes)
statistics is wrong. This is because frames dropped due to steering or
out of buffer will be counted as received. To fix that, we move to use
the packet/bytes accounting done by the driver for what the netdev
reports out.

Fixes: f62b8bb8f2d3 ('net/mlx5: Extend mlx5_core to support [...]')
Signed-off-by: Gal Pressman 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c |   25 ++--
 1 files changed, 8 insertions(+), 17 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 02689ca..402994b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -141,6 +141,10 @@ void mlx5e_update_stats(struct mlx5e_priv *priv)
return;
 
/* Collect firts the SW counters and then HW for consistency */
+   s->rx_packets   = 0;
+   s->rx_bytes = 0;
+   s->tx_packets   = 0;
+   s->tx_bytes = 0;
s->tso_packets  = 0;
s->tso_bytes= 0;
s->tx_queue_stopped = 0;
@@ -155,6 +159,8 @@ void mlx5e_update_stats(struct mlx5e_priv *priv)
for (i = 0; i < priv->params.num_channels; i++) {
rq_stats = &priv->channel[i]->rq.stats;
 
+   s->rx_packets   += rq_stats->packets;
+   s->rx_bytes += rq_stats->bytes;
s->lro_packets  += rq_stats->lro_packets;
s->lro_bytes+= rq_stats->lro_bytes;
s->rx_csum_none += rq_stats->csum_none;
@@ -164,6 +170,8 @@ void mlx5e_update_stats(struct mlx5e_priv *priv)
for (j = 0; j < priv->params.num_tc; j++) {
sq_stats = &priv->channel[i]->sq[j].stats;
 
+   s->tx_packets   += sq_stats->packets;
+   s->tx_bytes += sq_stats->bytes;
s->tso_packets  += sq_stats->tso_packets;
s->tso_bytes+= sq_stats->tso_bytes;
s->tx_queue_stopped += sq_stats->stopped;
@@ -225,23 +233,6 @@ void mlx5e_update_stats(struct mlx5e_priv *priv)
s->tx_broadcast_bytes   =
MLX5_GET_CTR(out, transmitted_eth_broadcast.octets);
 
-   s->rx_packets =
-   s->rx_unicast_packets +
-   s->rx_multicast_packets +
-   s->rx_broadcast_packets;
-   s->rx_bytes =
-   s->rx_unicast_bytes +
-   s->rx_multicast_bytes +
-   s->rx_broadcast_bytes;
-   s->tx_packets =
-   s->tx_unicast_packets +
-   s->tx_multicast_packets +
-   s->tx_broadcast_packets;
-   s->tx_bytes =
-   s->tx_unicast_bytes +
-   s->tx_multicast_bytes +
-   s->tx_broadcast_bytes;
-
/* Update calculated offload counters */
s->tx_csum_offload = s->tx_packets - tx_offload_none;
s->rx_csum_good= s->rx_packets - s->rx_csum_none -
-- 
1.7.1
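
The accounting approach in this patch, summing per-ring software counters instead of deriving totals from HW VPort counters, can be sketched in standalone C as follows. The struct and function names are hypothetical and much simpler than the driver's stats structures; the sketch only shows the reset-then-accumulate pattern.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-ring software counters, updated by the driver only
 * for frames it actually handed to or took from the stack. */
struct sketch_ring_stats {
	uint64_t packets;
	uint64_t bytes;
};

/* Reset the totals, then accumulate every ring's counters. Frames that
 * the hardware dropped never reach a ring, so they are not counted. */
static struct sketch_ring_stats
sketch_sum_rings(const struct sketch_ring_stats *rings, size_t n)
{
	struct sketch_ring_stats total = { 0, 0 };
	size_t i;

	for (i = 0; i < n; i++) {
		total.packets += rings[i].packets;
		total.bytes += rings[i].bytes;
	}
	return total;
}
```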



[PATCH net V1 0/7] Mellanox 100G mlx5 driver fixes

2016-02-29 Thread Saeed Mahameed
Hi Dave,

This series has a few bug fixes for the mlx5 Ethernet driver.

Eran fixed a locking issue with time-stamping that could cause a soft-lockup 
when time-stamping is enabled.

Gal fixed the rx/tx packets/bytes counters returned by the driver to reflect 
the traffic that actually went through the network stack.

Tariq removed a poll CQ optimization which could lead the driver to stop 
getting interrupts for some of the rings, and also fixed HW LRO, which is 
currently broken.

He also provided RSS and RX hash fixes for the case where, after changing the 
number of rx rings, the RX hash/RSS configuration would be out of sync.

The time stamping fix from Eran is not for -stable as the feature was only 
introduced in 4.5 but all of the others are.

Changes from V0:
- Eran addressed the irqsave/restore comments from "Dave" and fixed 
them.

This series is generated against net commit 4c0b6eaf373a 'net: thunderx: Fix 
for Qset error due to CQ full'

Saeed.

Eran Ben Elisha (1):
  net/mlx5e: Fix soft lockup when HW Timestamping is enabled

Gal Pressman (2):
  net/mlx5e: Add rx/tx bytes software counters
  net/mlx5e: Provide correct packet/bytes statistics

Tariq Toukan (4):
  net/mlx5e: Remove wrong poll CQ optimization
  net/mlx5e: Fix LRO modify
  net/mlx5e: Fix ethtool RX hash func configuration change
  net/mlx5e: Correctly handle RSS indirection table when changing
number of channels

 drivers/net/ethernet/mellanox/mlx5/core/en.h   |   18 +++--
 drivers/net/ethernet/mellanox/mlx5/core/en_clock.c |   25 ---
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   |   36 ++---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |   82 +++-
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c|8 +--
 drivers/net/ethernet/mellanox/mlx5/core/en_tx.c|   19 ++---
 drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c  |1 -
 include/linux/mlx5/mlx5_ifc.h  |4 +-
 8 files changed, 109 insertions(+), 84 deletions(-)



[PATCH net V1 5/7] net/mlx5e: Correctly handle RSS indirection table when changing number of channels

2016-02-29 Thread Saeed Mahameed
From: Tariq Toukan 

Upon changing num_channels, reset the RSS indirection table to
match the new value.

Fixes: 2d75b2bc8a8c ('net/mlx5e: Add ethtool RSS configuration options')
Signed-off-by: Tariq Toukan 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |2 ++
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   |2 ++
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |   15 +++
 3 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 976bddb..d0a57d5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -619,6 +619,8 @@ void mlx5e_build_tir_ctx_hash(void *tirc, struct mlx5e_priv 
*priv);
 
 int mlx5e_open_locked(struct net_device *netdev);
 int mlx5e_close_locked(struct net_device *netdev);
+void mlx5e_build_default_indir_rqt(u32 *indirection_rqt, int len,
+  int num_channels);
 
 static inline void mlx5e_tx_notify_hw(struct mlx5e_sq *sq,
  struct mlx5e_tx_wqe *wqe, int bf_sz)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index 64af1b0..5abeb00 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -385,6 +385,8 @@ static int mlx5e_set_channels(struct net_device *dev,
mlx5e_close_locked(dev);
 
priv->params.num_channels = count;
+   mlx5e_build_default_indir_rqt(priv->params.indirection_rqt,
+ MLX5E_INDIR_RQT_SIZE, count);
 
if (was_opened)
err = mlx5e_open_locked(dev);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 34b1049..02689ca 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -1199,7 +1199,6 @@ static void mlx5e_fill_indir_rqt_rqns(struct mlx5e_priv 
*priv, void *rqtc)
ix = mlx5e_bits_invert(i, MLX5E_LOG_INDIR_RQT_SIZE);
 
ix = priv->params.indirection_rqt[ix];
-   ix = ix % priv->params.num_channels;
MLX5_SET(rqtc, rqtc, rq_num[i],
 test_bit(MLX5E_STATE_OPENED, &priv->state) ?
 priv->channel[ix]->rq.rqn :
@@ -2101,12 +2100,20 @@ u16 mlx5e_get_max_inline_cap(struct mlx5_core_dev *mdev)
   2 /*sizeof(mlx5e_tx_wqe.inline_hdr_start)*/;
 }
 
+void mlx5e_build_default_indir_rqt(u32 *indirection_rqt, int len,
+  int num_channels)
+{
+   int i;
+
+   for (i = 0; i < len; i++)
+   indirection_rqt[i] = i % num_channels;
+}
+
 static void mlx5e_build_netdev_priv(struct mlx5_core_dev *mdev,
struct net_device *netdev,
int num_channels)
 {
struct mlx5e_priv *priv = netdev_priv(netdev);
-   int i;
 
priv->params.log_sq_size   =
MLX5E_PARAMS_DEFAULT_LOG_SQ_SIZE;
@@ -2130,8 +2137,8 @@ static void mlx5e_build_netdev_priv(struct mlx5_core_dev 
*mdev,
netdev_rss_key_fill(priv->params.toeplitz_hash_key,
sizeof(priv->params.toeplitz_hash_key));
 
-   for (i = 0; i < MLX5E_INDIR_RQT_SIZE; i++)
-   priv->params.indirection_rqt[i] = i % num_channels;
+   mlx5e_build_default_indir_rqt(priv->params.indirection_rqt,
+ MLX5E_INDIR_RQT_SIZE, num_channels);
 
priv->params.lro_wqe_sz=
MLX5E_PARAMS_DEFAULT_LRO_WQE_SZ;
-- 
1.7.1
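
The effect of the default table can be sketched in userspace as follows.
This is only an illustration under stated assumptions, not the driver code:
build_default_indir(), max_entry() and INDIR_SIZE are hypothetical names
chosen here, and the real table size is MLX5E_INDIR_RQT_SIZE.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical table size for this sketch; the driver uses MLX5E_INDIR_RQT_SIZE. */
#define INDIR_SIZE 128

/* Same rule as mlx5e_build_default_indir_rqt(): spread entries round-robin. */
static void build_default_indir(uint32_t *tbl, int len, int num_channels)
{
	int i;

	assert(num_channels > 0);	/* avoid a modulo by zero */
	for (i = 0; i < len; i++)
		tbl[i] = (uint32_t)(i % num_channels);
}

/* Largest channel index referenced by the table. */
static uint32_t max_entry(const uint32_t *tbl, int len)
{
	uint32_t max = 0;
	int i;

	for (i = 0; i < len; i++)
		if (tbl[i] > max)
			max = tbl[i];
	return max;
}
```

A table built for 8 channels references indices up to 7. If the channel
count then drops to 4 and the table is not rebuilt, entries 4..7 no longer
name an existing channel, which appears to be why the patch rebuilds the
table in mlx5e_set_channels() at the same time as it drops the
"ix % num_channels" clamp.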



[PATCH net V1 6/7] net/mlx5e: Add rx/tx bytes software counters

2016-02-29 Thread Saeed Mahameed
From: Gal Pressman 

Sum up rx/tx bytes in software as we do for rx/tx packets, to be reported
in upcoming statistics fix.

Signed-off-by: Gal Pressman 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h|8 ++--
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c |1 +
 drivers/net/ethernet/mellanox/mlx5/core/en_tx.c |9 ++---
 3 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index d0a57d5..5b17532 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -223,6 +223,7 @@ struct mlx5e_pport_stats {
 
 static const char rq_stats_strings[][ETH_GSTRING_LEN] = {
"packets",
+   "bytes",
"csum_none",
"csum_sw",
"lro_packets",
@@ -232,16 +233,18 @@ static const char rq_stats_strings[][ETH_GSTRING_LEN] = {
 
 struct mlx5e_rq_stats {
u64 packets;
+   u64 bytes;
u64 csum_none;
u64 csum_sw;
u64 lro_packets;
u64 lro_bytes;
u64 wqe_err;
-#define NUM_RQ_STATS 6
+#define NUM_RQ_STATS 7
 };
 
 static const char sq_stats_strings[][ETH_GSTRING_LEN] = {
"packets",
+   "bytes",
"tso_packets",
"tso_bytes",
"csum_offload_none",
@@ -253,6 +256,7 @@ static const char sq_stats_strings[][ETH_GSTRING_LEN] = {
 
 struct mlx5e_sq_stats {
u64 packets;
+   u64 bytes;
u64 tso_packets;
u64 tso_bytes;
u64 csum_offload_none;
@@ -260,7 +264,7 @@ struct mlx5e_sq_stats {
u64 wake;
u64 dropped;
u64 nop;
-#define NUM_SQ_STATS 8
+#define NUM_SQ_STATS 9
 };
 
 struct mlx5e_stats {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 3fd6a58..59658b9 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -263,6 +263,7 @@ int mlx5e_poll_rx_cq(struct mlx5e_cq *cq, int budget)
 
mlx5e_build_rx_skb(cqe, rq, skb);
rq->stats.packets++;
+   rq->stats.bytes += be32_to_cpu(cqe->byte_cnt);
napi_gro_receive(cq->napi, skb);
 
 wq_ll_pop:
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
index 2beea8c..bb4eeeb 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
@@ -179,6 +179,7 @@ static netdev_tx_t mlx5e_sq_xmit(struct mlx5e_sq *sq, 
struct sk_buff *skb)
unsigned int skb_len = skb->len;
u8  opcode = MLX5_OPCODE_SEND;
dma_addr_t dma_addr = 0;
+   unsigned int num_bytes;
bool bf = false;
u16 headlen;
u16 ds_cnt;
@@ -204,8 +205,7 @@ static netdev_tx_t mlx5e_sq_xmit(struct mlx5e_sq *sq, 
struct sk_buff *skb)
opcode   = MLX5_OPCODE_LSO;
ihs  = skb_transport_offset(skb) + tcp_hdrlen(skb);
payload_len  = skb->len - ihs;
-   wi->num_bytes = skb->len +
-   (skb_shinfo(skb)->gso_segs - 1) * ihs;
+   num_bytes = skb->len + (skb_shinfo(skb)->gso_segs - 1) * ihs;
sq->stats.tso_packets++;
sq->stats.tso_bytes += payload_len;
} else {
@@ -213,9 +213,11 @@ static netdev_tx_t mlx5e_sq_xmit(struct mlx5e_sq *sq, 
struct sk_buff *skb)
 !skb->xmit_more &&
 !skb_shinfo(skb)->nr_frags;
ihs = mlx5e_get_inline_hdr_size(sq, skb, bf);
-   wi->num_bytes = max_t(unsigned int, skb->len, ETH_ZLEN);
+   num_bytes = max_t(unsigned int, skb->len, ETH_ZLEN);
}
 
+   wi->num_bytes = num_bytes;
+
if (skb_vlan_tag_present(skb)) {
mlx5e_insert_vlan(eseg->inline_hdr_start, skb, ihs, &skb_data,
  &skb_len);
@@ -307,6 +309,7 @@ static netdev_tx_t mlx5e_sq_xmit(struct mlx5e_sq *sq, 
struct sk_buff *skb)
sq->bf_budget = bf ? sq->bf_budget - 1 : 0;
 
sq->stats.packets++;
+   sq->stats.bytes += num_bytes;
return NETDEV_TX_OK;
 
 dma_unmap_wqe_err:
-- 
1.7.1
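
The byte accounting in mlx5e_sq_xmit() can be illustrated with a small
userspace sketch. tx_num_bytes() and SKETCH_ETH_ZLEN are hypothetical names
used only here; the arithmetic is taken from the hunk above, and the
is_gso parameter stands in for the driver's TSO check.

```c
#include <assert.h>

/* Minimum Ethernet frame length without FCS, as in the kernel's ETH_ZLEN. */
#define SKETCH_ETH_ZLEN 60u

/*
 * Bytes accounted for one transmitted skb. A TSO skb carries its headers
 * once, but every segment on the wire repeats them, so (gso_segs - 1)
 * extra copies of the inline header size are added. Non-TSO frames are
 * counted at no less than the Ethernet minimum.
 */
static unsigned int tx_num_bytes(unsigned int skb_len, unsigned int gso_segs,
				 unsigned int ihs, int is_gso)
{
	if (is_gso) {
		assert(gso_segs >= 1);
		return skb_len + (gso_segs - 1) * ihs;
	}
	return skb_len > SKETCH_ETH_ZLEN ? skb_len : SKETCH_ETH_ZLEN;
}
```

For example, a 2950-byte TSO skb with 54 bytes of headers split into two
segments is counted as 2950 + 1 * 54 = 3004 bytes, while a 42-byte non-TSO
frame is counted as 60 bytes.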



[PATCH net V1 3/7] net/mlx5e: Fix soft lockup when HW Timestamping is enabled

2016-02-29 Thread Saeed Mahameed
From: Eran Ben Elisha 

Readers/Writers lock for SW timecounter was acquired without disabling
interrupts on local CPU.

The problematic scenario:
* HW timestamping is enabled
* Timestamp overflow periodic service task is running on local CPU and
  holding write_lock for SW timecounter
* Completion arrives, triggers interrupt for local CPU.
  Interrupt routine calls napi_schedule(), which triggers rx/tx
  skb process.
  An attempt is then made to read the SW timecounter using read_lock, but
  the lock is already held by a writer on the same CPU, which causes a
  soft lockup.

Use the irqsave/irqrestore variants when taking the readers/writers lock
for writing.

Fixes: ef9814deafd0 ('net/mlx5e: Add HW timestamping (TS) support')
Signed-off-by: Eran Ben Elisha 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_clock.c |   25 
 1 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_clock.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_clock.c
index be65435..2018eeb 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_clock.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_clock.c
@@ -62,10 +62,11 @@ static void mlx5e_timestamp_overflow(struct work_struct 
*work)
struct delayed_work *dwork = to_delayed_work(work);
struct mlx5e_tstamp *tstamp = container_of(dwork, struct mlx5e_tstamp,
   overflow_work);
+   unsigned long flags;
 
-   write_lock(&tstamp->lock);
+   write_lock_irqsave(&tstamp->lock, flags);
timecounter_read(&tstamp->clock);
-   write_unlock(&tstamp->lock);
+   write_unlock_irqrestore(&tstamp->lock, flags);
schedule_delayed_work(&tstamp->overflow_work, tstamp->overflow_period);
 }
 
@@ -136,10 +137,11 @@ static int mlx5e_ptp_settime(struct ptp_clock_info *ptp,
struct mlx5e_tstamp *tstamp = container_of(ptp, struct mlx5e_tstamp,
   ptp_info);
u64 ns = timespec64_to_ns(ts);
+   unsigned long flags;
 
-   write_lock(&tstamp->lock);
+   write_lock_irqsave(&tstamp->lock, flags);
timecounter_init(&tstamp->clock, &tstamp->cycles, ns);
-   write_unlock(&tstamp->lock);
+   write_unlock_irqrestore(&tstamp->lock, flags);
 
return 0;
 }
@@ -150,10 +152,11 @@ static int mlx5e_ptp_gettime(struct ptp_clock_info *ptp,
struct mlx5e_tstamp *tstamp = container_of(ptp, struct mlx5e_tstamp,
   ptp_info);
u64 ns;
+   unsigned long flags;
 
-   write_lock(&tstamp->lock);
+   write_lock_irqsave(&tstamp->lock, flags);
ns = timecounter_read(&tstamp->clock);
-   write_unlock(&tstamp->lock);
+   write_unlock_irqrestore(&tstamp->lock, flags);
 
*ts = ns_to_timespec64(ns);
 
@@ -164,10 +167,11 @@ static int mlx5e_ptp_adjtime(struct ptp_clock_info *ptp, 
s64 delta)
 {
struct mlx5e_tstamp *tstamp = container_of(ptp, struct mlx5e_tstamp,
   ptp_info);
+   unsigned long flags;
 
-   write_lock(&tstamp->lock);
+   write_lock_irqsave(&tstamp->lock, flags);
timecounter_adjtime(&tstamp->clock, delta);
-   write_unlock(&tstamp->lock);
+   write_unlock_irqrestore(&tstamp->lock, flags);
 
return 0;
 }
@@ -176,6 +180,7 @@ static int mlx5e_ptp_adjfreq(struct ptp_clock_info *ptp, 
s32 delta)
 {
u64 adj;
u32 diff;
+   unsigned long flags;
int neg_adj = 0;
struct mlx5e_tstamp *tstamp = container_of(ptp, struct mlx5e_tstamp,
  ptp_info);
@@ -189,11 +194,11 @@ static int mlx5e_ptp_adjfreq(struct ptp_clock_info *ptp, 
s32 delta)
adj *= delta;
diff = div_u64(adj, 10ULL);
 
-   write_lock(&tstamp->lock);
+   write_lock_irqsave(&tstamp->lock, flags);
timecounter_read(&tstamp->clock);
tstamp->cycles.mult = neg_adj ? tstamp->nominal_c_mult - diff :
tstamp->nominal_c_mult + diff;
-   write_unlock(&tstamp->lock);
+   write_unlock_irqrestore(&tstamp->lock, flags);
 
return 0;
 }
-- 
1.7.1
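
The scenario from the commit message can be captured in a toy model of one
CPU and one lock. This is a deliberately simplified sketch, not kernel
code: struct cpu_model, model_write_lock() and model_deliver_irq() are
hypothetical names, and the softirq step between the interrupt and the
read_lock is collapsed into a single "interrupt path".

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one CPU and one rwlock; all names here are hypothetical. */
struct cpu_model {
	bool irqs_enabled;	/* local interrupt state */
	bool writer_held;	/* write side of the lock is held on this CPU */
};

enum model_irq_result {
	MODEL_IRQ_DEFERRED,	/* masked, runs after the unlock */
	MODEL_IRQ_HANDLED,	/* read side acquired normally */
	MODEL_IRQ_LOCKUP	/* reader spins on a writer it interrupted */
};

/* write_lock() leaves interrupts alone; write_lock_irqsave() masks them. */
static void model_write_lock(struct cpu_model *cpu, bool irqsave)
{
	assert(!cpu->writer_held);	/* the lock is not recursive */
	if (irqsave)
		cpu->irqs_enabled = false;
	cpu->writer_held = true;
}

/* A completion interrupt arrives on the same CPU and wants the read side. */
static enum model_irq_result model_deliver_irq(const struct cpu_model *cpu)
{
	if (!cpu->irqs_enabled)
		return MODEL_IRQ_DEFERRED;
	if (cpu->writer_held)
		return MODEL_IRQ_LOCKUP;
	return MODEL_IRQ_HANDLED;
}
```

In this model the plain write lock leads to MODEL_IRQ_LOCKUP, because the
interrupted writer can never run again to release the lock, while the
irqsave variant only defers the interrupt until after the unlock.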



[PATCH net V1 1/7] net/mlx5e: Remove wrong poll CQ optimization

2016-02-29 Thread Saeed Mahameed
From: Tariq Toukan 

With the MLX5E_CQ_HAS_CQES optimization flag, the following buggy
flow might occur:
- Suppose RX is always busy, TX has a single packet every second.
- We poll a single TX cqe and clear its flag.
- We never arm it again as RX is always busy.
- TX CQ flag is never changed, and new TX cqes are not polled.

We revert this optimization.

Fixes: e586b3b0baee ('net/mlx5: Ethernet Datapath files')
Signed-off-by: Tariq Toukan 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |5 -
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c   |7 ---
 drivers/net/ethernet/mellanox/mlx5/core/en_tx.c   |   10 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c |1 -
 4 files changed, 1 insertions(+), 22 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index aac071a..614a602 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -304,14 +304,9 @@ enum {
MLX5E_RQ_STATE_POST_WQES_ENABLE,
 };
 
-enum cq_flags {
-   MLX5E_CQ_HAS_CQES = 1,
-};
-
 struct mlx5e_cq {
/* data path - accessed per cqe */
struct mlx5_cqwq   wq;
-   unsigned long  flags;
 
/* data path - accessed per napi poll */
struct napi_struct*napi;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index dd959d9..3fd6a58 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -230,10 +230,6 @@ int mlx5e_poll_rx_cq(struct mlx5e_cq *cq, int budget)
struct mlx5e_rq *rq = container_of(cq, struct mlx5e_rq, cq);
int work_done;
 
-   /* avoid accessing cq (dma coherent memory) if not needed */
-   if (!test_and_clear_bit(MLX5E_CQ_HAS_CQES, &cq->flags))
-   return 0;
-
for (work_done = 0; work_done < budget; work_done++) {
struct mlx5e_rx_wqe *wqe;
struct mlx5_cqe64 *cqe;
@@ -279,8 +275,5 @@ wq_ll_pop:
/* ensure cq space is freed before enabling more cqes */
wmb();
 
-   if (work_done == budget)
-   set_bit(MLX5E_CQ_HAS_CQES, &cq->flags);
-
return work_done;
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
index 2c3fba0..2beea8c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
@@ -335,10 +335,6 @@ bool mlx5e_poll_tx_cq(struct mlx5e_cq *cq)
u16 sqcc;
int i;
 
-   /* avoid accessing cq (dma coherent memory) if not needed */
-   if (!test_and_clear_bit(MLX5E_CQ_HAS_CQES, &cq->flags))
-   return false;
-
sq = container_of(cq, struct mlx5e_sq, cq);
 
npkts = 0;
@@ -422,10 +418,6 @@ bool mlx5e_poll_tx_cq(struct mlx5e_cq *cq)
netif_tx_wake_queue(sq->txq);
sq->stats.wake++;
}
-   if (i == MLX5E_TX_CQ_POLL_BUDGET) {
-   set_bit(MLX5E_CQ_HAS_CQES, &cq->flags);
-   return true;
-   }
 
-   return false;
+   return (i == MLX5E_TX_CQ_POLL_BUDGET);
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
index 4ac8d71..66d51a7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c
@@ -88,7 +88,6 @@ void mlx5e_completion_event(struct mlx5_core_cq *mcq)
 {
struct mlx5e_cq *cq = container_of(mcq, struct mlx5e_cq, mcq);
 
-   set_bit(MLX5E_CQ_HAS_CQES, &cq->flags);
set_bit(MLX5E_CHANNEL_NAPI_SCHED, &cq->channel->flags);
barrier();
napi_schedule(cq->napi);
-- 
1.7.1



[PATCH net V1 2/7] net/mlx5e: Fix LRO modify

2016-02-29 Thread Saeed Mahameed
From: Tariq Toukan 

Ethtool LRO enable/disable is broken: as of today we only modify the TCP
TIRs in order to apply the requested configuration.

Hardware requires that all TIRs pointing to the same RQ should share the
same LRO configuration. For that all other TIRs' LRO fields must be
modified as well.

Fixes: 5c50368f3831 ('net/mlx5e: Light-weight netdev open/stop')
Signed-off-by: Tariq Toukan 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c |   15 +++
 1 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index d4e1c30..137b05e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -1317,7 +1317,7 @@ static void mlx5e_build_tir_ctx_lro(void *tirc, struct 
mlx5e_priv *priv)
  lro_timer_supported_periods[2]));
 }
 
-static int mlx5e_modify_tir_lro(struct mlx5e_priv *priv, int tt)
+static int mlx5e_modify_tirs_lro(struct mlx5e_priv *priv)
 {
struct mlx5_core_dev *mdev = priv->mdev;
 
@@ -1325,6 +1325,7 @@ static int mlx5e_modify_tir_lro(struct mlx5e_priv *priv, 
int tt)
void *tirc;
int inlen;
int err;
+   int tt;
 
inlen = MLX5_ST_SZ_BYTES(modify_tir_in);
in = mlx5_vzalloc(inlen);
@@ -1336,7 +1337,11 @@ static int mlx5e_modify_tir_lro(struct mlx5e_priv *priv, 
int tt)
 
mlx5e_build_tir_ctx_lro(tirc, priv);
 
-   err = mlx5_core_modify_tir(mdev, priv->tirn[tt], in, inlen);
+   for (tt = 0; tt < MLX5E_NUM_TT; tt++) {
+   err = mlx5_core_modify_tir(mdev, priv->tirn[tt], in, inlen);
+   if (err)
+   break;
+   }
 
kvfree(in);
 
@@ -1885,8 +1890,10 @@ static int mlx5e_set_features(struct net_device *netdev,
mlx5e_close_locked(priv->netdev);
 
priv->params.lro_en = !!(features & NETIF_F_LRO);
-   mlx5e_modify_tir_lro(priv, MLX5E_TT_IPV4_TCP);
-   mlx5e_modify_tir_lro(priv, MLX5E_TT_IPV6_TCP);
+   err = mlx5e_modify_tirs_lro(priv);
+   if (err)
+   mlx5_core_warn(priv->mdev, "lro modify failed, %d\n",
+  err);
 
if (was_opened)
err = mlx5e_open_locked(priv->netdev);
-- 
1.7.1



Re: Softirq priority inversion from "softirq: reduce latencies"

2016-02-29 Thread Thomas Gleixner
On Mon, 29 Feb 2016, Peter Hurley wrote:
> On 02/29/2016 10:24 AM, Eric Dumazet wrote:
> >> Just to be clear
> >>
> >>if (time_before(jiffies, end) && !need_resched() &&
> >>--max_restart)
> >>goto restart;
> >>
> >> aborts softirq *even if 0ns have elapsed*, if NET_RX has woken a process.
> > 
> > Sure, now remove the 1st and 2nd condition.
> 
> Well just removing the 2nd condition has everything working fine,
> because that fixes the priority inversion.

No. It does not fix anything. It hides the shortcomings of the driver.
 
> However, when system resources are _not_ contended, it makes no
> sense to be forced to revert to ksoftirqd resolution, which is strictly
> intended as fallback.

No. You claim it is simply because your driver does not handle that situation
properly.
 
> Or flipping your argument on its head, why not just _always_ execute
> softirq in ksoftirqd?

Which is what that change effectively does. And that makes a lot of sense,
because you get the softirq load under scheduler control and do not let the
softirq run as a context stealing entity which is completely uncontrollable by
the scheduler.

Running the softirq on return from interrupt can cause real priority
inversions.

Thanks,

tglx


Re: Softirq priority inversion from "softirq: reduce latencies"

2016-02-29 Thread Peter Hurley
On 02/29/2016 07:27 AM, Eric Dumazet wrote:
> On lun., 2016-02-29 at 07:03 -0800, Peter Hurley wrote:
> 
>> The reason why Eric's change is so effective for Eric's workload is
>> that it fixes the problem where NET_RX keeps getting new network packets
>> so it keeps looping, servicing more NET_RX softirq.
> 
> You have very little idea of what is happening in networking land.

While that is true, I can read a trace:

  ** already in NET_RX softirq **

  -0   0..s2   15us : kmem_cache_alloc: call_site=c08378e4 ptr=de55d7c0 bytes_req=192 bytes_alloc=192 gfp_flags=GFP_ATOMIC
  -0   0..s2   23us : netif_receive_skb_entry: dev=eth0 napi_id=0x0 queue_mapping=0 skbaddr=dca04400 vlan_tagged=0 vlan_proto=0x vlan_tci=0x0000 protocol=0x0800 ip_summed=0 hash=0x l4_hash=0 len=88 data_len=0 truesize=1984 mac_header_valid=1 mac_header=-14 nr_frags=0 gso_size=0 gso_type=0x0
  -0   0..s2   30us+: netif_receive_skb: dev=eth0 skbaddr=dca04400 len=88
  -0   0d.s5   98us : sched_waking: comm=sshd pid=750 prio=120 target_cpu=000
  -0   0d.s6  105us : sched_stat_sleep: comm=sshd pid=750 delay=3125230447 [ns]
  -0   0dns6  110us+: sched_wakeup: comm=sshd pid=750 prio=120 target_cpu=000
  -0   0dns4  123us+: timer_start: timer=dc940e9c function=tcp_delack_timer expires=9746 [timeout=10] flags=0x
  -0   0dnH3  150us : irq_handler_entry: irq=176 name=4a10.ethernet
  -0   0dnH3  153us : softirq_raise: vec=3 [action=NET_RX]
  -0   0dnH3  155us : irq_handler_exit: irq=176 ret=handled
  -0   0dnH3  160us : irq_handler_entry: irq=20 name=4900.edma_ccint
  -0   0dnH3  163us : irq_handler_exit: irq=20 ret=handled
  -0   0.ns2  169us : napi_poll: napi poll on napi struct de465c30 for device eth0
  -0   0.ns2  171us : softirq_exit: vec=3 [action=NET_RX]


As you can see, NET_RX softirq is re-raised while in NET_RX softirq,
as a result of receiving new packets. So NET_RX will keep looping,
which is what I wrote.


> Once hard irq for RX has triggered, we arm a NAPI (NET_RX softirq), and
> no more irq will come unless the napi handler ran. Then when NAPI is
> complete, we re-allow interrupt to be delivered when a new packet is
> coming.
> 
> Yes, ksoftirqd runs under load, and this is _wanted_.
> 
> Sure, it might add a latency if some high prio task is wanting the same
> cpu, but this is exactly the purpose of having multi tasking.
> 
> 



Re: [PATCH net-next v3 0/5] bridge/ovs: avoid skb head copy on frame forwarding

2016-02-29 Thread pravin shelar
On Fri, Feb 26, 2016 at 1:45 AM, Paolo Abeni  wrote:
> Currently, when an OVS or Linux bridge is used to forward frames towards
> some tunnel device, a skb_head_copy() may occur if the ingress device does
> not provide enough headroom for the tx encapsulation.
>
> This patch series tries to address the issue by implementing a new ndo
> operation to allow the master device to control the headroom used when
> allocating the skb on frame reception.
>
> Said operation is used by the Linux bridge to notify the bridged ports of
> needed_headroom changes, and similar bookkeeping and behaviour is also added
> to openvswitch, on a per datapath basis.
>
> Finally, the operation is implemented for veth and tun devices, which gives
> a performance improvement in the 6-12% range when forwarding frames from said
> devices towards a vxlan tunnel.
>
> v2:
> - fix netdev_get_fwd_headroom() behaviour
> - remove some code duplication with the netdev_set_rx_headroom() and
>netdev_reset_rx_headroom() helpers
> - handle headroom reset on [v]port removal/deletion
> - initialize tun align to the old default value
>
> v3:
> - fix a comment typo
>
Patch series looks good to me.

Acked-by: Pravin B Shelar 


Re: [net-next PATCH v3 1/3] net: sched: consolidate offload decision in cls_u32

2016-02-29 Thread Jiri Pirko
Mon, Feb 29, 2016 at 07:40:53PM CET, john.fastab...@gmail.com wrote:
>On 16-02-27 08:28 PM, Cong Wang wrote:
>> On Fri, Feb 26, 2016 at 8:24 PM, John Fastabend
>>  wrote:
>>> On 16-02-26 09:39 AM, Cong Wang wrote:
>>>> On Fri, Feb 26, 2016 at 7:53 AM, John Fastabend
>>>>  wrote:
>>>>> diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
>>>>> index 2121df5..e64d20b 100644
>>>>> --- a/include/net/pkt_cls.h
>>>>> +++ b/include/net/pkt_cls.h
>>>>> @@ -392,4 +392,9 @@ struct tc_cls_u32_offload {
>>>>> };
>>>>>  };
>>>>>
>>>>> +static inline bool tc_should_offload(struct net_device *dev)
>>>>> +{
>>>>> +   return dev->netdev_ops->ndo_setup_tc;
>>>>> +}
>>>>> +
>>>>
>>>> These should be protected by CONFIG_NET_CLS_U32, no?
>>>>
>>>
>>> It's not necessary; it is a completely general function and I only
>>> lifted it out of cls_u32 so that the cls_flower classifier could
>>> also use it.
>>>
>>> I don't see the need off-hand to have it wrapped in an ORd ifdef
>>> statement where it's (CONFIG_NET_CLS_U32 | CONFIG_NET_CLS_X ...).
>>> Any particular reason you were thinking it should be wrapped in ifdefs?
>>>
>> 
>> Not a big deal.
>> 
>> I just feel these don't need to compile when I have CONFIG_NET_CLS_U32=n.
>> 
>> Thanks.
>> 
>
>Well because this is 'static inline' gcc should just remove it
>if it is not used. Assuming non-ancient gcc and normal compile
>flags, e.g. you are not including -fkeep-inline-functions or
>something.
>
>So just to keep it readable I would prefer to just leave it
>as is.

Definitely. cls_flower will use it in the very near future. Making it
dependent on CONFIG_NET_CLS_U32 makes 0 sense to me.


Re: [Patch net-next] net: remove skb_sender_cpu_clear()

2016-02-29 Thread Cong Wang
On Mon, Feb 29, 2016 at 10:50 AM, Daniel Borkmann  wrote:
> On 02/28/2016 05:19 AM, Cong Wang wrote:
>>
>> After commit 52bd2d62ce67 ("net: better skb->sender_cpu and skb->napi_id
>> cohabitation")
>> skb_sender_cpu_clear() becomes empty and can be removed.
>>
>> Cc: Eric Dumazet 
>> Signed-off-by: Cong Wang 
>
>
> Wasn't the intention to keep this helper as a marker when packet
> crosses domains from RX to TX, see discussion here:
>
>   https://patchwork.ozlabs.org/patch/527167/
>
> Maybe better to rename it and add a comment into the helper to
> make the intention more clear?

Since when do we need an empty function to mark some call path?
Isn't this supposed to be done by comments or documents?

BTW, I don't even think we need any comment; people
who touch it should understand it.


Re: Softirq priority inversion from "softirq: reduce latencies"

2016-02-29 Thread Peter Hurley
On 02/29/2016 10:24 AM, Eric Dumazet wrote:
> On lun., 2016-02-29 at 10:05 -0800, Peter Hurley wrote:
> 
>> While I appreciate the attempt, that's not the problem.
>>
>> Just to be clear
>>
>>  if (time_before(jiffies, end) && !need_resched() &&
>>  --max_restart)
>>  goto restart;
>>
>> aborts softirq *even if 0ns have elapsed*, if NET_RX has woken a process.
> 
> 
> Sure, now remove the 1st and 2nd condition.

Well just removing the 2nd condition has everything working fine,
because that fixes the priority inversion.


> You would still 'abort' (ie wakeup ksoftirqd really) when --max_restart
> becomes 0

Sure. Which would mean there's contended heavy i/o load so the driver
has to fall back to non-DMA. That's an acceptable outcome.


> So, instead of some subtle load dependent bug, you now have a reliable
> trigger.

There's no "subtle load dependent bug" here.

The driver has a fallback mode of operation that it relies on without
DMA. Of course, as I already wrote, this has consequences.

If system resources are _actually contended_, then naturally, fighting
for cpu and i/o time is fine, and I'm happy to do that in ksoftirqd.

However, when system resources are _not_ contended, it makes no
sense to be forced to revert to ksoftirqd resolution, which is strictly
intended as fallback.

Or flipping your argument on its head, why not just _always_ execute
softirq in ksoftirqd?



Re: [Patch net-next] net: remove skb_sender_cpu_clear()

2016-02-29 Thread Daniel Borkmann

On 02/28/2016 05:19 AM, Cong Wang wrote:

> After commit 52bd2d62ce67 ("net: better skb->sender_cpu and skb->napi_id
> cohabitation")
> skb_sender_cpu_clear() becomes empty and can be removed.
>
> Cc: Eric Dumazet 
> Signed-off-by: Cong Wang 


Wasn't the intention to keep this helper as a marker when packet
crosses domains from RX to TX, see discussion here:

  https://patchwork.ozlabs.org/patch/527167/

Maybe better to rename it and add a comment into the helper to
make the intention more clear?


Re: [PATCH] mrf24j40: fix security-enabled processing on inbound frames

2016-02-29 Thread Alan Ott

On 02/23/2016 04:29 AM, Alexander Aring wrote:

> Alan, do you have some comments about that?
>
> Currently the mrf24j40 goes into a deadlock if a frame with the security
> enable bit set is received. As you see, I helped myself to create this patch
> and solve this stupid default behaviour of mrf24j40. :-)



Hi Alex, I'll look at this today.

Alan.



Re: [net-next PATCH v3 1/3] net: sched: consolidate offload decision in cls_u32

2016-02-29 Thread John Fastabend
On 16-02-27 08:28 PM, Cong Wang wrote:
> On Fri, Feb 26, 2016 at 8:24 PM, John Fastabend
>  wrote:
>> On 16-02-26 09:39 AM, Cong Wang wrote:
>>> On Fri, Feb 26, 2016 at 7:53 AM, John Fastabend
>>>  wrote:
>>>> diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
>>>> index 2121df5..e64d20b 100644
>>>> --- a/include/net/pkt_cls.h
>>>> +++ b/include/net/pkt_cls.h
>>>> @@ -392,4 +392,9 @@ struct tc_cls_u32_offload {
>>>> };
>>>>  };
>>>>
>>>> +static inline bool tc_should_offload(struct net_device *dev)
>>>> +{
>>>> +   return dev->netdev_ops->ndo_setup_tc;
>>>> +}
>>>> +
>>>
>>> These should be protected by CONFIG_NET_CLS_U32, no?
>>>
>>
>> It's not necessary; it is a completely general function and I only
>> lifted it out of cls_u32 so that the cls_flower classifier could
>> also use it.
>>
>> I don't see the need off-hand to have it wrapped in an ORd ifdef
>> statement where it's (CONFIG_NET_CLS_U32 | CONFIG_NET_CLS_X ...).
>> Any particular reason you were thinking it should be wrapped in ifdefs?
>>
> 
> Not a big deal.
> 
> I just feel these don't need to compile when I have CONFIG_NET_CLS_U32=n.
> 
> Thanks.
> 

Well because this is 'static inline' gcc should just remove it
if it is not used. Assuming non-ancient gcc and normal compile
flags, e.g. you are not including -fkeep-inline-functions or
something.

So just to keep it readable I would prefer to just leave it
as is.

Thanks,
John


Re: [PATCH] mld, igmp: Fix reserved tailroom calculation

2016-02-29 Thread Hannes Frederic Sowa

On 29.02.2016 19:08, Benjamin Poirier wrote:

> If you think we should write the expression with "if" instead of "min",
> instead of the current
>
> +   skb->reserved_tailroom = skb_tailroom(skb) -
> +   min_t(int, mtu, skb_tailroom(skb) - tlen);
>
> it should be:
>
> +   if (mtu < skb_tailroom(skb) - tlen)
> +   skb->reserved_tailroom = skb_tailroom(skb) - mtu;
> +   else
> +   skb->reserved_tailroom = tlen;
>
> The second alternative does not look more readable to me but I have been
> looking at that expression for a while. If you think that it is more
> readable, I will resend the patch expressed that way. Please let me
> know.


I would still find it more readable actually, but no strong opinion, I 
would leave it up to you.


Could it make sense to put this code into a static inline helper and 
reuse it for both, igmp and mld?


Thanks,
Hannes



Re: Softirq priority inversion from "softirq: reduce latencies"

2016-02-29 Thread Eric Dumazet
On lun., 2016-02-29 at 10:05 -0800, Peter Hurley wrote:

> While I appreciate the attempt, that's not the problem.
> 
> Just to be clear
> 
>   if (time_before(jiffies, end) && !need_resched() &&
>   --max_restart)
>   goto restart;
> 
> aborts softirq *even if 0ns have elapsed*, if NET_RX has woken a process.


Sure, now remove the 1st and 2nd condition.

You would still 'abort' (ie wakeup ksoftirqd really) when --max_restart
becomes 0

So, instead of some subtle load dependent bug, you now have a reliable
trigger.

The fact it took 3 years for someone to complain about this change
should tell us something really.

The only way for your bug to hide would be to remove all the 'break
infinite loop' logic.

And this is not going to happen.




Re: [PATCH] mld, igmp: Fix reserved tailroom calculation

2016-02-29 Thread Benjamin Poirier
On 2016/02/29 16:43, Hannes Frederic Sowa wrote:
> On 29.02.2016 16:19, Benjamin Poirier wrote:
> >On 2016/02/29 15:57, Daniel Borkmann wrote:
> >[...]
> >>
> >>[ cutting the IPv4 part off as diff is the same ]
> >>
> >>>diff --git a/net/ipv6/mcast.c b/net/ipv6/mcast.c
> >>>index 5ee56d0..c157edc 100644
> >>>--- a/net/ipv6/mcast.c
> >>>+++ b/net/ipv6/mcast.c
> >>>@@ -1574,9 +1574,9 @@ static struct sk_buff *mld_newpack(struct inet6_dev 
> >>>*idev, unsigned int mtu)
> >>>   return NULL;
> >>>
> >>>   skb->priority = TC_PRIO_CONTROL;
> >>>-  skb->reserved_tailroom = skb_end_offset(skb) -
> >>>-   min(mtu, skb_end_offset(skb));
> >>>   skb_reserve(skb, hlen);
> >>>+  skb->reserved_tailroom = skb_tailroom(skb) -
> >>>+  min_t(int, mtu, skb_tailroom(skb) - tlen);
> >>
> >>Are you sure this is correct? Wouldn't that mean (assuming we allocated
> >>enough space), that I could now fill a larger than MTU frame?
> >
> >Quoting back a part of the log:
> >
> >>>The maximum space available for ip headers and payload without
> >>>fragmentation is min(mtu, data + extra). Therefore,
> >>>reserved_tailroom
> >>>= data + extra + tlen - min(mtu, data + extra)
> >>>= skb_end_offset - hlen - min(mtu, skb_end_offset - hlen - tlen)
> >>>= skb_tailroom - min(mtu, skb_tailroom - tlen) ; after skb_reserve(hlen)
> >
> >The min() takes care of the situation you describe, ie. if the allocated
> >space is large, reserved_tailroom will be large enough that we do not
> >use more space than the mtu.
> >
> >I tested the mld and igmp code with different driver parameters, mtu
> >values, number of multicast address records and even allocation
> >failures. If you think the formula is wrong, please provide a
> >counter-example with hlen, tlen, mtu and size values.
> 
> I think the code is fine albeit I think we should remove the min macro and
> just do something:
> 
> if (skb_tailroom(skb) > mtu)
>   skb->reserved_tailroom = skb_tailroom(skb) - mtu;
> 
> Does that make sense? I think it is much more readable.

That is not equivalent. It fails to take tlen into account.

For igmp, consider this case:
with hlen = 16, mtu = 9000, tlen = 8,
additionally, suppose that the first iteration of the allocation loop
(alloc_skb(9000 + 16 + 8, ...) which requires 4 pages) fails and the
second iteration (alloc_skb((9000 >> 1) + 16 + 8, ...) which requires 2
pages) succeeds:
size = (9000 >> 1) + 16 + 8 = 4524
skb_end_offset = 8192 - 320 = 7872
tailroom = 7872 - 16 = 7856

data = 9000 >> 1 = 4500
extra = 7872 - 4524 = 3348

reserved tailroom (patch version)
= 4500 + 3348 + 8 - min(9000, 4500 + 3348)
= 8
reserved tailroom (your version)
= 0

Headers are ipv4 + igmpv3 = 24 + 8 = 32, records are 8 bytes
With 978 igmpv3 records, with your version, we would output an
skb that has less tailroom (0) than dev->needed_tailroom (8).

For mld, consider this case:
with hlen = 16, mtu = 9000, tlen = 8:
size = 3776 (SKB_MAX_ORDER case)
skb_end_offset = 3776
tailroom = 3776 - 16 = 3760

data = 3776 - 16 - 8 = 3752
extra = 0

reserved tailroom (patch version)
= 3752 + 0 + 8 - min(9000, 3752 + 0)
= 8
reserved tailroom (your version)
= 0

Headers are ipv6 + icmpv6 = 48 + 8 = 56, records are 20 bytes
With 185 mld records, with your formula, we would output an skb that
has less tailroom (4) than dev->needed_tailroom (8).

If you think we should write the expression with "if" instead of "min",
instead of the current

+   skb->reserved_tailroom = skb_tailroom(skb) -
+   min_t(int, mtu, skb_tailroom(skb) - tlen);

it should be:

+   if (mtu < skb_tailroom(skb) - tlen)
+   skb->reserved_tailroom = skb_tailroom(skb) - mtu;
+   else
+   skb->reserved_tailroom = tlen;

The second alternative does not look more readable to me but I have been
looking at that expression for a while. If you think that it is more
readable, I will resend the patch expressed that way. Please let me
know.
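
To make the comparison above easy to re-check, here is a minimal userspace sketch of the three expressions discussed in this thread. The function names are made up for illustration and are not kernel API; the inputs are the tailroom (after skb_reserve(hlen)), mtu and tlen values from the two cases above.

```c
/* Patch version: tailroom - min(mtu, tailroom - tlen). */
int sketch_reserved_tailroom_min(int tailroom, int mtu, int tlen)
{
	int usable = tailroom - tlen;

	return tailroom - (mtu < usable ? mtu : usable);
}

/* Suggested simplification: it ignores tlen when mtu is not the limit. */
int sketch_reserved_tailroom_simple(int tailroom, int mtu)
{
	return tailroom > mtu ? tailroom - mtu : 0;
}

/* The if/else spelling proposed above; it should match the min() form. */
int sketch_reserved_tailroom_if(int tailroom, int mtu, int tlen)
{
	if (mtu < tailroom - tlen)
		return tailroom - mtu;
	return tlen;
}
```

With the igmp numbers (tailroom 7856, mtu 9000, tlen 8) and the mld numbers (tailroom 3760, mtu 9000, tlen 8), the first and third functions reserve 8 bytes while the second reserves 0.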


Re: Softirq priority inversion from "softirq: reduce latencies"

2016-02-29 Thread Peter Hurley
On 02/29/2016 08:21 AM, Eric Dumazet wrote:
> On lun., 2016-02-29 at 07:54 -0800, Peter Hurley wrote:
> 
>>  The current kernel is HZ=250 but this would occur on HZ=1000 as well.
> 
> Right. But the problem with HZ=100 and HZ=250 is that the detection can
> happens because jiffy granularity is too coarse, since 
> 
> msecs_to_jiffies(2) -> 1
> 
> Following patch might reduce the probability, but wont really fix your
> problem.
> 
> Fact that ksoftirqd prio is not what you want is completely orthogonal.
> 
> diff --git a/kernel/softirq.c b/kernel/softirq.c
> index 479e443..f7cc594 100644
> --- a/kernel/softirq.c
> +++ b/kernel/softirq.c
> @@ -180,7 +180,7 @@ EXPORT_SYMBOL(__local_bh_enable_ip);
>  
>  /*
>   * We restart softirq processing for at most MAX_SOFTIRQ_RESTART times,
> - * but break the loop if need_resched() is set or after 2 ms.
> + * but break the loop if need_resched() is set or after 2 ms/ticks.
>   * The MAX_SOFTIRQ_TIME provides a nice upper bound in most cases, but in
>   * certain cases, such as stop_machine(), jiffies may cease to
>   * increment and so we need the MAX_SOFTIRQ_RESTART limit as
> @@ -191,7 +191,7 @@ EXPORT_SYMBOL(__local_bh_enable_ip);
>   * we want to handle softirqs as soon as possible, but they
>   * should not be able to lock up the box.
>   */
> -#define MAX_SOFTIRQ_TIME  msecs_to_jiffies(2)
> +#define MAX_SOFTIRQ_TIME  (1 + msecs_to_jiffies(2))
>  #define MAX_SOFTIRQ_RESTART 10
>  
>  #ifdef CONFIG_TRACE_IRQFLAGS

While I appreciate the attempt, that's not the problem.

Just to be clear

if (time_before(jiffies, end) && !need_resched() &&
--max_restart)
goto restart;

aborts softirq *even if 0ns have elapsed*, if NET_RX has woken a process.
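
A small userspace model of that condition may help; it is only a sketch, the sketch_* names are hypothetical, and the need_resched state is passed in as a plain flag instead of being read from the scheduler.

```c
#include <stdbool.h>

/* Wraparound-safe "a is before b", in the spirit of the kernel's time_before(). */
bool sketch_time_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/*
 * Model of the restart test: all three terms must hold to loop again.
 * Because of short-circuit evaluation, the restart budget is only
 * decremented when the first two terms are true.
 */
bool sketch_should_restart(unsigned long now, unsigned long end,
			   bool need_resched, int *max_restart)
{
	return sketch_time_before(now, end) && !need_resched &&
	       --(*max_restart);
}
```

With now = 100 and end = 101 (no tick has elapsed yet), the function still returns false as soon as need_resched is true, which is the case described above.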






[PATCH v2] socket.7: Document some BPF-related socket options

2016-02-29 Thread Craig Gallek
From: Craig Gallek 

Document the behavior and the first kernel version for each of the
following socket options:
SO_ATTACH_FILTER
SO_ATTACH_BPF
SO_ATTACH_REUSEPORT_CBPF
SO_ATTACH_REUSEPORT_EBPF
SO_DETACH_FILTER
SO_DETACH_BPF
SO_LOCK_FILTER

Signed-off-by: Craig Gallek 
---
v2 changes:
- Content suggestions from Michael Kerrisk :
  * Clarify socket filter return value semantics
  * Clarify wording of minimal kernel versions
  * Explain behavior of multiple calls using SO_ATTACH_[BPF|FILTER]
  * Define 'reuseport groups' in SO_ATTACH_REUSEPORT_*
- Include SO_LOCK_FILTER documentation mostly based off of the wording
  in the commit message by Vincent Bernat 
  d59577b6ffd3 ("sk-filter: Add ability to lock a socket filter program")

---
 man7/socket.7 | 136 +-
 1 file changed, 115 insertions(+), 21 deletions(-)

diff --git a/man7/socket.7 b/man7/socket.7
index db7cb8324dde..d22107cc47d7 100644
--- a/man7/socket.7
+++ b/man7/socket.7
@@ -41,9 +41,6 @@
 .\"SO_GET_FILTER (3.8)
 .\"commit a8fc92778080c845eaadc369a0ecf5699a03bef0
 .\"Author: Pavel Emelyanov 
-.\"SO_LOCK_FILTER (3.9)
-.\"commit d59577b6ffd313d0ab3be39cb1ab47e29bdc9182
-.\"Author: Vincent Bernat 
 .\"SO_SELECT_ERR_QUEUE (3.10)
 .\" commit 7d4c04fc170087119727119074e72445f2bb192b
 .\"Author: Keller, Jacob E 
@@ -53,13 +50,6 @@
 .\" SO_BPF_EXTENSIONS (3.14)
 .\" commit ea02f9411d9faa3553ed09ce0ec9f00ceae9885e
 .\"Author: Michal Sekletar 
-.\" SO_ATTACH_BPF (3.19)
-.\" and SO_DETACH_BPF as synonym for SO_DETACH_FILTER
-.\" commit 89aa075832b0da4402acebd698d0411dcc82d03e
-.\"Author: Alexei Starovoitov 
-.\"SO_ATTACH_REUSEPORT_CBPF, SO_ATTACH_REUSEPORT_EBPF (4.5)
-.\"commit 538950a1b7527a0a52ccd9337e3fcd304f027f13
-.\"Author: Craig Gallek 
 .\"
 .TH SOCKET 7 2015-05-07 Linux "Linux Programmer's Manual"
 .SH NAME
@@ -311,6 +301,90 @@ The value 0 indicates that this is not a listening socket,
 the value 1 indicates that this is a listening socket.
 This socket option is read-only.
 .TP
+.BR SO_ATTACH_FILTER " and " SO_ATTACH_BPF
+Attach a classic or extended BPF program (respectively) to the socket
+for use as a filter of incoming packets. A packet will be dropped if
+the filter program returns zero.  If the filter program returns a
+non-zero value which is less than the packet's data length, the packet
+will be truncated to the length returned.  If the value returned by
+the filter is greater than or equal to the packet's data length, the
+packet is allowed to proceed unmodified.
+
+The argument for
+.BR SO_ATTACH_FILTER
+is a
+.I sock_fprog
+structure in
+.B .
+.sp
+.in +4n
+.nf
+struct sock_fprog {
+unsigned short  len;
+struct sock_filter *filter;
+};
+.fi
+.in
+.IP
+The argument for
+.BR SO_ATTACH_BPF
+is a file descriptor returned by the
+.BR bpf (2)
+system call and must refer to a program of type
+.BR BPF_PROG_TYPE_SOCKET_FILTER.
+These options may be set multiple times for a given socket, each time
+replacing the previous filter program.  The classic and extended
+versions may be called on the same socket, but the previous filter
+will always be replaced such that a socket never has more than one
+filter defined.
+
+.BR SO_ATTACH_FILTER
+is available since Linux 2.2.
+.BR SO_ATTACH_BPF
+is available since Linux 3.19.  Both classic and extended BPF are
+explained in the kernel source file
+.I Documentation/networking/filter.txt
+.TP
+.BR SO_ATTACH_REUSEPORT_CBPF " and " SO_ATTACH_REUSEPORT_EBPF " (since Linux 4.5)"
+For use with the
+.BR SO_REUSEPORT
+option, these options allow the user to set a classic or extended
+BPF program (respectively) which defines how packets are assigned to
+the sockets in the reuseport group (that is, all sockets which have
+.BR SO_REUSEPORT
+set and are using the same local address to receive packets).  The BPF
+program must return an index between 0 and N-1 representing the socket
+which should receive the packet (where N is the number of sockets in
+the group). If the BPF program returns an invalid index, socket
+selection will fall back to the plain
+.BR SO_REUSEPORT
+mechanism.
+
+Sockets are numbered in the order in which they are added to the group
+(that is, the order of
+.BR bind (2)
+calls for UDP sockets or the order of
+.BR listen (2)
+calls for TCP sockets).  New sockets added to a reuseport group will
+inherit the BPF program.  When a socket is removed from a reuseport
+group (via
+.BR close (2))
+the last socket in the group will be moved into the closed socket's
+position.
+
+These options may be set repeatedly at any time on any single socket
+in the group to replace the current BPF program used by all sockets in
+the group.
+.BR SO_ATTACH_REUSEPORT_CBPF
+takes the same socket argument type as
+.BR SO_ATTACH_FILTER
+and
+.BR SO_ATTACH_REUSEPORT_EBPF
+takes th
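
The return-value semantics that the patch text gives for SO_ATTACH_FILTER and SO_ATTACH_BPF can be summarized in a few lines of C. This is only an illustrative model of the documented behavior, not kernel code, and the function name is hypothetical.

```c
#include <stddef.h>

/*
 * Model of the documented filter verdict:
 *   0                  -> packet dropped (0 bytes delivered)
 *   0 < verdict < len  -> packet truncated to verdict bytes
 *   verdict >= len     -> packet delivered unmodified
 */
size_t sketch_apply_filter_verdict(unsigned int verdict, size_t pkt_len)
{
	if (verdict == 0)
		return 0;
	if (verdict < pkt_len)
		return verdict;
	return pkt_len;
}
```

A filter that wants to accept everything typically returns a very large value, which this model maps to the full packet length.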

Re: Softirq priority inversion from "softirq: reduce latencies"

2016-02-29 Thread David Miller
From: Peter Hurley 
Date: Mon, 29 Feb 2016 07:03:11 -0800

> However, I'm pointing out that Eric's sledgehammer approach to fixing
> the NET_RX softirq bug is having significant side-effects in other
> subsystems.

Either your hardware can handle arbitrary latencies and thus can use
softirqs for event completion successfully, or it can't.

You, my friend, are the one using the sledgehammer.


Re: [PATCH net-next 1/5] vxlan: implement GPE in L2 mode

2016-02-29 Thread Tom Herbert
On Mon, Feb 29, 2016 at 2:23 AM, Jiri Benc  wrote:
> On Sat, 27 Feb 2016 12:54:52 -0800, Tom Herbert wrote:
>> Yes, but RCO has not been specified for VXLAN-GPE either
>
> As far as I can see, RCO will just work with VXLAN-GPE. But I have no
> problem disallowing them to be set together, if you prefer that.
>
>> so the patch
>> does not correctly refuse setting those two together. Inevitably
>> though, those and other extensions will defined for VXLAN-GPE and new
>> ones for VXLAN. Again, the protocols are fundamentally incompatible,
>> so instead of trying to enforce each valid combination at
>> configuration
>
> We need to do the checking in either case. If we accepted unsupported
> combinations and then just silently ignored them, we'd be in troubles
> later when such combination becomes defined/supported. There would be
> no way for the userspace tools to detect whether a particular kernel
> supports the combination or not.
>
> So, we need to check for supported combination of options during
> configuration anyway.
>
> And when we have that, I don't really see the reason for doing that
> kind of code duplication that you suggest.
>
>> or performing multiple checks for flavor each time we
>> look at a packet, it seems easier to split the parsing with at most
>> one check for the protocol variant. For instance in
>> vxlan_udp_encap_recv just do:
>>
>> if (vs->flags & VXLAN_F_GPE)
>>if (!vxlan_parse_gpe_hdr(&unparsed, skb, vs->flags))
>>goto drop;
>> else
>>if (!vxlan_parse_gpe(&unparsed, skb, vs->flags))
>>goto drop;
>
> Most of the code of these two functions will be identical. To
> consolidate that as much as possible, you'll end up with what I have or
> something very similar.
>
>> And then move REMCSUM and GPB and other protocol specific checks to
>> the right function.
>
> And when RCO is defined for GPE, we copy the code? Doesn't make sense,
> sorry.
>
> If you look at the code in the current net-next (and the code after
> this patchset), the extension handling has been made generic and each
> extension gets its own handler function, leading to clean separation in
> the code. There's no reason to split the vxlan_rcv into two functions
> doing the same things but with slightly different calls to extensions.
>
They may or may not be "slightly different"; if they are the same
(like RCO for VXLAN-GPE uses the low order bits in VNI) then a common
backend function can be called.

As defined now, GBP can't be used with VXLAN-GPE at all, but when I
read your patch it looks very much like GBP is being checked and
allowed in the VXLAN-GPE path. The fact that "if (vs->flags &
VXLAN_F_GBP)" always fails for VXLAN-GPE packets because of
configuration constraints is not at all obvious, and really this just
results in an unnecessary conditional that gives the same answer for
every single VXLAN-GPE packet which we've already checked for just a
few lines above. At least the check for GBP could be moved to an else
block of "if (vs->flags & VXLAN_F_GPE)"; this alone improves clarity
and eliminates an unnecessary conditional in the VXLAN-GPE path.
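
As a rough sketch of that suggestion (flag values and names below are placeholders chosen for illustration, not the kernel's definitions), the GBP test only runs when the GPE test has already failed:

```c
#include <stdint.h>

/* Placeholder flag bits; the real VXLAN_F_* values are not assumed here. */
#define SKETCH_F_GPE 0x1u
#define SKETCH_F_GBP 0x2u

enum sketch_path {
	SKETCH_PATH_GPE,
	SKETCH_PATH_GBP,
	SKETCH_PATH_PLAIN,
};

/* GBP is only examined in the else branch, so GPE packets never test it. */
enum sketch_path sketch_select_path(uint32_t flags)
{
	if (flags & SKETCH_F_GPE)
		return SKETCH_PATH_GPE;
	else if (flags & SKETCH_F_GBP)
		return SKETCH_PATH_GBP;
	return SKETCH_PATH_PLAIN;
}
```

Even if both bits were somehow set, the GPE branch wins and the GBP handling is skipped, which makes the configuration constraint visible in the code.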

>  Jiri


Re: [net] net: fix double free issue of skbuff

2016-02-29 Thread David Miller
From: 张胜举 
Date: Mon, 29 Feb 2016 22:16:37 +0800

>> On Mon, 2016-02-29 at 12:22 +, Zhang Shengju wrote:
>> > If skb_reorder_vlan_header() failed, skb is freed and NULL is returned.
>> > Then at skb_vlan_untag(), it will free skbuff again which cause double
>> > free.
>> 
>> On skb_reorder_vlan_header() failure, skb_vlan_untag() will call
>> kfree_skb() using the return value of skb_reorder_vlan_header(), that is
>> NULL. kfree_skb() is a noop when the argument is NULL.
>> 
>> The current code seams safe.
>> 
>> Paolo
> Hi Paolo, even current code is safe, this's still a potential problem. We 
> should make an
> assumption that inner function doesn't free skb, and let outside function 
> take care of this.

No, the current code is intentional and perfectly fine.

Fix real bugs, not imaginary ones.

Thanks.


Re: [net] net: fix double free issue of skbuff

2016-02-29 Thread David Miller
From: Zhang Shengju 
Date: Mon, 29 Feb 2016 12:22:53 +

> If skb_reorder_vlan_header() failed, skb is freed and NULL is returned.
> Then at skb_vlan_untag(), it will free skbuff again which cause double
> free.

The 'skb' local variable in this case will be set to "NULL", calling
kfree_skb() on NULL doesn't do anything.

> This patch removes kfree_skb() call in function skb_reorder_vlan_header().
> 
> Signed-off-by: Zhang Shengju 

Please analyze the complete control path of the caller of this
function, and you'll find that everything is fine.
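
The control path described above can be modeled in userspace as follows. This is a simplified sketch with hypothetical names (sketch_buf stands in for the skb, sketch_free_buf for kfree_skb); it only shows why the second free call is a no-op.

```c
#include <stdlib.h>

struct sketch_buf {
	int payload;
};

int sketch_free_calls;	/* counts frees that actually released memory */

/* Like kfree_skb(): a NULL argument is silently ignored. */
void sketch_free_buf(struct sketch_buf *buf)
{
	if (!buf)
		return;
	sketch_free_calls++;
	free(buf);
}

/* Inner helper: on failure it frees the buffer and returns NULL. */
struct sketch_buf *sketch_reorder(struct sketch_buf *buf, int fail)
{
	if (fail) {
		sketch_free_buf(buf);
		return NULL;
	}
	return buf;
}

/* Caller: the local pointer is overwritten with the helper's result. */
struct sketch_buf *sketch_untag(struct sketch_buf *buf, int fail)
{
	buf = sketch_reorder(buf, fail);
	if (!buf)
		goto err_free;
	return buf;

err_free:
	sketch_free_buf(buf);	/* buf is NULL here, so nothing is freed twice */
	return NULL;
}
```

On the failure path the buffer is released exactly once, inside the helper; the call under err_free sees NULL.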


Re: [PATCH v3 00/17] stmmac: enhance driver performances and update the version

2016-02-29 Thread Giuseppe CAVALLARO

Gents

on top of these patches, there is a new train to enhance the stmmac to
support the DWMAC_4.x chips. They will be proposed very soon and on
top of this update (as soon as reviewed and merged).

In our context, it has been very useful working with the same driver
that runs fine on several (x86, arm, sh4) boxes with different SYNP
MAC/GMAC IPs (starting from MAC10/100 Database Release 1.5 to databook
3.70a and  4.00a and 4.10a). We got the benefit to have all the
features already supported by stmmac plus the good performances
available with TSO on gmac4.

I can imagine no big issues in enhancing stmmac to support the
new 4.00a and 4.10a, although there is some other work made by
Rabin and Larper.

I guess stmmac users will be happy to keep having the same
device driver working on their platforms with the new gmac.
But it also makes sense to avoid having two drivers that aim
to do the same job, or to get more synergy on the same code, as
done in the past with Rayagond for PTP and EEE.

If you have some concern or advice, please do not hesitate to ask.
We will try to send the patches soon to show the code in case
people are interested.

Kind Regards
Peppe

On 2/29/2016 2:27 PM, Alexandre TORGUE wrote:

As agreed with Giuseppe, I am sending the v3 series.

This is a subset of patches to rework the driver in order to improve its
performance and make it more robust under stress conditions.

All patches have been ported on STi mainstream kernel branch and
tested on ARM STiH4xx platforms and newer ones.

This series also updates the driver version and prepares it
to include further development to support new chips.

In detail, these patches are:

o to rework and improve the internal DMA bus settings

   Fine tuning is mandatory on some platforms for both
   performance and stability issues.

o to rework and optimize the descriptor management.

   This will help a lot on performance side and preparing
   the inclusion on the GMAC4.x.

o to add a set of optimizations for both xmit and rx functions.

   These will help a lot on performance side and making the driver
   more robust in case of low memory conditions and under some
   stress test, performed for example on IP-STB.

Below some throughput figures obtained on some boxes before and after
the patches.

            nuttcp (Mbps)            iperf (Mbps)
          tcp        udp           tcp        udp
        tx   rx    tx   rx       tx   rx    tx   rx
----------------------------------------------------
old    680  800   480  506      760  800   600  700
new    830  880   540  630      840  880   700  800

==

V2: - rx_copybreak is now managed by using ethtool.
V3: - improve comments on PCIe, detailing that there are no regressions
    - rework some APIs to properly define some params as bool, as expected
    - rework the formula to get the element inside the ring. Compared to V2,
      patches 4 and 13 have been merged because they use the same formula.
      After this rework, no evident benefit has been noticed in terms of
      performance, so the table above is still valid. Disassembling the
      code for SH4 and ARM shows that the new formula saves just one
      instruction (depending on compiler flags), which gives no significant
      gain, for example, on SH4 where some instructions are executed in the
      same pipeline stage.
      Ring sizes are now fixed and maybe they can be reworked to be tuned
      without using the stmmaceth= cmdline option. Indeed, nobody changes
      these sizes, and the numbers selected by default respect the budget
      and avoid passing an invalid setup. These are the best driver default
      sizes for ring and chain.

==
Fabrice Gasnier (3):
   stmmac: merge get_rx_owner into rx_status routine.
   stmmac: optimize tx clean function
   stmmac: fix phy init when attached to a phy

Giuseppe Cavallaro (14):
   stmmac: share reset function between dwmac100 and dwmac1000
   stmmac: rework DMA bus setting and introduce new platform AXI
 structure
   stmmac: change descriptor layout
   stmmac: review RX/TX ring management
   stmmac: add length field to dma data
   stmmac: add last_segment field to dma data
   stmmac: add is_jumbo field to dma data
   stmmac: optimize tx desc management
   stmmac: set dirty index out of the loop
   stmmac: first frame prep at the end of xmit routine
   stmmac: do not poll phy handler when attach a switch
   stmmac: do not perform zero-copy for rx frames
   stmmac: tune rx copy via threshold.
   stmmac: update version to Oct_2015

  Documentation/devicetree/bindings/net/stmmac.txt   |  54 ++-
  drivers/net/ethernet/stmicro/stmmac/chain_mode.c   |  37 +-
  drivers/net/ethernet/stmicro/stmmac

[PATCH] ethernet/atl1c: remove left over dead code

2016-02-29 Thread Eric Engestrom
Left over from c24588afc536a35c924d014f13b669b20ccf8553
("atl1c: using fixed TXQ configuration for l2cb and l1c")

Signed-off-by: Eric Engestrom 
---
 drivers/net/ethernet/atheros/atl1c/atl1c_main.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
index 8b5988e..d0084d4 100644
--- a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
+++ b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
@@ -65,10 +65,6 @@ static void atl1c_reset_dma_ring(struct atl1c_adapter *adapter);
 static int atl1c_configure(struct atl1c_adapter *adapter);
 static int atl1c_alloc_rx_buffer(struct atl1c_adapter *adapter);
 
-static const u16 atl1c_pay_load_size[] = {
-   128, 256, 512, 1024, 2048, 4096,
-};
-
 
 static const u32 atl1c_default_msg = NETIF_MSG_DRV | NETIF_MSG_PROBE |
NETIF_MSG_LINK | NETIF_MSG_TIMER | NETIF_MSG_IFDOWN | NETIF_MSG_IFUP;
-- 
2.7.1



[PATCH] net/ipv4: remove left over dead code

2016-02-29 Thread Eric Engestrom
8cc785f6f429c2a3fb81745dc142cbd72a462c4a ("net: ipv4: make the ping
/proc code AF-independent") removed the code using it, but renamed this
variable instead of removing it.

Signed-off-by: Eric Engestrom 
---
 net/ipv4/ping.c | 7 ---
 1 file changed, 7 deletions(-)

diff --git a/net/ipv4/ping.c b/net/ipv4/ping.c
index d3a2716..35179fc 100644
--- a/net/ipv4/ping.c
+++ b/net/ipv4/ping.c
@@ -1140,13 +1140,6 @@ static int ping_v4_seq_show(struct seq_file *seq, void *v)
return 0;
 }
 
-static const struct seq_operations ping_v4_seq_ops = {
-   .show   = ping_v4_seq_show,
-   .start  = ping_v4_seq_start,
-   .next   = ping_seq_next,
-   .stop   = ping_seq_stop,
-};
-
 static int ping_seq_open(struct inode *inode, struct file *file)
 {
struct ping_seq_afinfo *afinfo = PDE_DATA(inode);
-- 
2.7.1



[PATCH] net/rtnetlink: remove dead code

2016-02-29 Thread Eric Engestrom
3b766cd832328fcb87db3507e7b98cf42f21689d ("net/core: Add reading VF
statistics through the PF netdevice") added that variable but it's never
been used.

Signed-off-by: Eric Engestrom 
---
 net/core/rtnetlink.c | 9 -
 1 file changed, 9 deletions(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index d735e85..35abefc 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1389,15 +1389,6 @@ static const struct nla_policy ifla_vf_policy[IFLA_VF_MAX+1] = {
[IFLA_VF_TRUST] = { .len = sizeof(struct ifla_vf_trust) },
 };
 
-static const struct nla_policy ifla_vf_stats_policy[IFLA_VF_STATS_MAX + 1] = {
-   [IFLA_VF_STATS_RX_PACKETS]  = { .type = NLA_U64 },
-   [IFLA_VF_STATS_TX_PACKETS]  = { .type = NLA_U64 },
-   [IFLA_VF_STATS_RX_BYTES]= { .type = NLA_U64 },
-   [IFLA_VF_STATS_TX_BYTES]= { .type = NLA_U64 },
-   [IFLA_VF_STATS_BROADCAST]   = { .type = NLA_U64 },
-   [IFLA_VF_STATS_MULTICAST]   = { .type = NLA_U64 },
-};
-
 static const struct nla_policy ifla_port_policy[IFLA_PORT_MAX+1] = {
[IFLA_PORT_VF]  = { .type = NLA_U32 },
[IFLA_PORT_PROFILE] = { .type = NLA_STRING,
-- 
2.7.1



[PATCH 0/3] Enable Ethernet on STM32F429 EVAL board

2016-02-29 Thread Alexandre TORGUE
This series adds Ethernet support on the STM32F429 SoC and enables it on the
Eval board:
 -Add Ethernet node in SOC file:
  -Define MII mode pinctrl
  -use Mixed burst and PBL 8
 -Add system config node for glue.
 -Enable Ethernet for Eval board:
  -mii mode
  -connected to a PHY through MDIO.

Note: this series follows the series which adds the glue and updates the stmmac driver:

https://lkml.org/lkml/2016/2/26/329

Best regards.

Alex

Alexandre TORGUE (3):
  ARM: dts: stm32f429: Add system config bank node
  ARM: dts: stm32f429: Add Ethernet support
  ARM: dts: stm32f429: Enable Ethernet on Eval board

 arch/arm/boot/dts/stm32429i-eval.dts | 15 ++
 arch/arm/boot/dts/stm32f429.dtsi | 40 
 2 files changed, 55 insertions(+)

-- 
1.9.1



[PATCH 2/3] ARM: dts: stm32f429: Add Ethernet support

2016-02-29 Thread Alexandre TORGUE
Add Ethernet support (Synopsys MAC IP 3.50a) on stm32f429 SOC.

Signed-off-by: Alexandre TORGUE 

diff --git a/arch/arm/boot/dts/stm32f429.dtsi b/arch/arm/boot/dts/stm32f429.dtsi
index bb7a736..af0367c 100644
--- a/arch/arm/boot/dts/stm32f429.dtsi
+++ b/arch/arm/boot/dts/stm32f429.dtsi
@@ -283,6 +283,26 @@
bias-disable;
};
};
+
+   ethernet0_mii: mii@0 {
+   mii {
+   slew-rate = <2>;
+   pinmux = 
,
+
,
+
,
+
,
+
,
+
,
+,
+,
+
,
+
,
+
,
+
,
+
,
+
;
+   };
+   };
};
 
rcc: rcc@40023810 {
@@ -323,6 +343,21 @@
st,mem2mem;
};
 
+   ethernet0: dwmac@40028000 {
+   compatible = "st,stm32-dwmac", "snps,dwmac-3.50a";
+   status = "disabled";
+   reg = <0x40028000 0x8000>;
+   reg-names = "stmmaceth";
+   interrupts = <0 61 0>, <0 62 0>;
+   interrupt-names = "macirq", "eth_wake_irq";
+   clock-names = "stmmaceth", "tx-clk", "rx-clk";
+   clocks = <&rcc 0 25>, <&rcc 0 26>, <&rcc 0 27>;
+   st,syscon = <&syscfg 0x4>;
+   snps,pbl = <8>;
+   snps,mixed-burst;
+   dma-ranges;
+   };
+
rng: rng@50060800 {
compatible = "st,stm32-rng";
reg = <0x50060800 0x400>;
-- 
1.9.1



[PATCH] net-sysfs: remove left over dead code

2016-02-29 Thread Eric Engestrom
This format hasn't been used since 04ed3e741d0f133e02bed7fa5c98edba128f90e7
("net: change netdev->features to u32")

Signed-off-by: Eric Engestrom 
---
 net/core/net-sysfs.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index b6c8a66..e326707 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -29,7 +29,6 @@
 
 #ifdef CONFIG_SYSFS
 static const char fmt_hex[] = "%#x\n";
-static const char fmt_long_hex[] = "%#lx\n";
 static const char fmt_dec[] = "%d\n";
 static const char fmt_ulong[] = "%lu\n";
 static const char fmt_u64[] = "%llu\n";
-- 
2.7.1



[PATCH 3/3] ARM: dts: stm32f429: Enable Ethernet on Eval board

2016-02-29 Thread Alexandre TORGUE
MAC is connected to a PHY in MII mode.

Signed-off-by: Alexandre TORGUE 

diff --git a/arch/arm/boot/dts/stm32429i-eval.dts b/arch/arm/boot/dts/stm32429i-eval.dts
index 1ae57fa..e345459 100644
--- a/arch/arm/boot/dts/stm32429i-eval.dts
+++ b/arch/arm/boot/dts/stm32429i-eval.dts
@@ -87,6 +87,21 @@
clock-frequency = <2500>;
 };
 
+&ethernet0 {
+   status = "okay";
+   pinctrl-0   = <&ethernet0_mii>;
+   pinctrl-names   = "default";
+   phy-mode= "mii-id";
+   mdio0 {
+   #address-cells = <1>;
+   #size-cells = <0>;
+   compatible = "snps,dwmac-mdio";
+   phy1: ethernet-phy@1 {
+   reg = <1>;
+   };
+   };
+};
+
 &usart1 {
pinctrl-0 = <&usart1_pins_a>;
pinctrl-names = "default";
-- 
1.9.1



Re: [PATCH v2 1/3] net: ipv4: Convert IP network timestamps to be y2038 safe

2016-02-29 Thread Arnd Bergmann
On Saturday 27 February 2016 00:32:15 Deepa Dinamani wrote:
> ICMP timestamp messages and IP source route options require
> timestamps to be in milliseconds modulo 24 hours from
> midnight UT format.
> 
> Add inet_current_timestamp() function to support this. The function
> returns the required timestamp in network byte order.
> 
> Timestamp calculation is also changed to call ktime_get_real_ts64()
> which uses struct timespec64. struct timespec64 is y2038 safe.
> Previously it called getnstimeofday() which uses struct timespec.
> struct timespec is not y2038 safe.
> 
> Signed-off-by: Deepa Dinamani 
> Cc: "David S. Miller" 
> Cc: Alexey Kuznetsov 
> Cc: Hideaki YOSHIFUJI 
> Cc: James Morris 
> Cc: Patrick McHardy 
> 

Acked-by: Arnd Bergmann 
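
For readers who want to see the timestamp format in isolation, here is a userspace sketch of "milliseconds modulo 24 hours from midnight UT, in network byte order". It is not the kernel implementation: the sketch_* names are made up, clock_gettime() stands in for ktime_get_real_ts64(), and it assumes a non-negative seconds value (and a 64-bit time_t if it is to remain valid past 2038).

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>
#include <arpa/inet.h>

#define SKETCH_SECONDS_PER_DAY	86400
#define SKETCH_MSEC_PER_SEC	1000u
#define SKETCH_NSEC_PER_MSEC	1000000L

/* Milliseconds since midnight UT; sec is taken as a 64-bit value. */
uint32_t sketch_msecs_since_midnight(int64_t sec, long nsec)
{
	uint32_t secs_today = (uint32_t)(sec % SKETCH_SECONDS_PER_DAY);

	return secs_today * SKETCH_MSEC_PER_SEC +
	       (uint32_t)(nsec / SKETCH_NSEC_PER_MSEC);
}

/* Current timestamp in network byte order, read from the wall clock. */
uint32_t sketch_inet_current_timestamp(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_REALTIME, &ts);
	return htonl(sketch_msecs_since_midnight((int64_t)ts.tv_sec,
						 ts.tv_nsec));
}
```

The largest possible value is 86399999, so the result always fits in 32 bits regardless of how large the seconds counter grows.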


[PATCH 1/3] ARM: dts: stm32f429: Add system config bank node

2016-02-29 Thread Alexandre TORGUE
Signed-off-by: Alexandre TORGUE 

diff --git a/arch/arm/boot/dts/stm32f429.dtsi b/arch/arm/boot/dts/stm32f429.dtsi
index 598362e..bb7a736 100644
--- a/arch/arm/boot/dts/stm32f429.dtsi
+++ b/arch/arm/boot/dts/stm32f429.dtsi
@@ -171,6 +171,11 @@
status = "disabled";
};
 
+   syscfg: system-config@40013800 {
+   compatible = "syscon";
+   reg = <0x40013800 0x400>;
+   };
+
pin-controller {
#address-cells = <1>;
#size-cells = <1>;
-- 
1.9.1



Re: Softirq priority inversion from "softirq: reduce latencies"

2016-02-29 Thread Eric Dumazet
On lun., 2016-02-29 at 07:58 -0800, Peter Hurley wrote:

> All that's happened is the first loop of NET_RX softirq has woken a
> process; that is sufficient to abort softirq and defer it for ksoftirqd.
> 
> That's why I'm saying this is a priority inversion, and one that
> will happen a lot.

Sure. This will happen every time ksoftirqd is launched.

Get rid of ksoftirqd or renice it so that you can easily be killed by
softirq storm.






Re: Softirq priority inversion from "softirq: reduce latencies"

2016-02-29 Thread Eric Dumazet
On lun., 2016-02-29 at 07:54 -0800, Peter Hurley wrote:

>  The current kernel is HZ=250 but this would occur on HZ=1000 as well.

Right. But the problem with HZ=100 and HZ=250 is that the detection can
happen because jiffy granularity is too coarse, since

msecs_to_jiffies(2) -> 1

The following patch might reduce the probability, but won't really fix your
problem.

The fact that ksoftirqd prio is not what you want is completely orthogonal.

diff --git a/kernel/softirq.c b/kernel/softirq.c
index 479e443..f7cc594 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -180,7 +180,7 @@ EXPORT_SYMBOL(__local_bh_enable_ip);
 
 /*
  * We restart softirq processing for at most MAX_SOFTIRQ_RESTART times,
- * but break the loop if need_resched() is set or after 2 ms.
+ * but break the loop if need_resched() is set or after 2 ms/ticks.
  * The MAX_SOFTIRQ_TIME provides a nice upper bound in most cases, but in
  * certain cases, such as stop_machine(), jiffies may cease to
  * increment and so we need the MAX_SOFTIRQ_RESTART limit as
@@ -191,7 +191,7 @@ EXPORT_SYMBOL(__local_bh_enable_ip);
  * we want to handle softirqs as soon as possible, but they
  * should not be able to lock up the box.
  */
-#define MAX_SOFTIRQ_TIME  msecs_to_jiffies(2)
+#define MAX_SOFTIRQ_TIME  (1 + msecs_to_jiffies(2))
 #define MAX_SOFTIRQ_RESTART 10
 
 #ifdef CONFIG_TRACE_IRQFLAGS
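
The rounding behind "msecs_to_jiffies(2) -> 1" can be reproduced with a simplified model. The helper below is a hypothetical userspace stand-in that rounds up to whole ticks; it is not the kernel's msecs_to_jiffies() and ignores its overflow handling.

```c
/* Round a millisecond interval up to whole ticks for a given HZ. */
unsigned long sketch_msecs_to_jiffies(unsigned int ms, unsigned int hz)
{
	return (unsigned long)(((unsigned long long)ms * hz + 999ULL) / 1000ULL);
}

/* Budget in ticks with the "+ 1" from the patch above. */
unsigned long sketch_max_softirq_time(unsigned int hz)
{
	return 1 + sketch_msecs_to_jiffies(2, hz);
}
```

At HZ=100 and HZ=250 the 2 ms interval rounds to a single tick, so a deadline of "jiffies + 1" can expire almost immediately if the next tick is about to fire; at HZ=1000 it is 2 ticks.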





[PATCH] stmmac: Fix 'eth0: No PHY found' regression

2016-02-29 Thread Gabriel Fernandez
This patch manages the case when you have an Ethernet MAC with
a "fixed link", and not connected to a normal MDIO-managed PHY device.

The test of phy_bus_name was not helpful because that field was never
assigned; it is replaced by a test on the MDIO node.

Signed-off-by: Gabriel Fernandez 
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c | 11 +--
 drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c |  9 -
 include/linux/stmmac.h|  1 +
 3 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
index 0faf163..efb54f3 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
@@ -199,21 +199,12 @@ int stmmac_mdio_register(struct net_device *ndev)
struct stmmac_priv *priv = netdev_priv(ndev);
struct stmmac_mdio_bus_data *mdio_bus_data = priv->plat->mdio_bus_data;
int addr, found;
-   struct device_node *mdio_node = NULL;
-   struct device_node *child_node = NULL;
+   struct device_node *mdio_node = priv->plat->mdio_node;
 
if (!mdio_bus_data)
return 0;
 
if (IS_ENABLED(CONFIG_OF)) {
-   for_each_child_of_node(priv->device->of_node, child_node) {
-   if (of_device_is_compatible(child_node,
-   "snps,dwmac-mdio")) {
-   mdio_node = child_node;
-   break;
-   }
-   }
-
if (mdio_node) {
netdev_dbg(ndev, "FOUND MDIO subnode\n");
} else {
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
index 6a52fa1..4514ba7 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
@@ -110,6 +110,7 @@ stmmac_probe_config_dt(struct platform_device *pdev, const char **mac)
struct device_node *np = pdev->dev.of_node;
struct plat_stmmacenet_data *plat;
struct stmmac_dma_cfg *dma_cfg;
+   struct device_node *child_node = NULL;
 
plat = devm_kzalloc(&pdev->dev, sizeof(*plat), GFP_KERNEL);
if (!plat)
@@ -140,13 +141,19 @@ stmmac_probe_config_dt(struct platform_device *pdev, const char **mac)
plat->phy_node = of_node_get(np);
}
 
+   for_each_child_of_node(np, child_node)
+   if (of_device_is_compatible(child_node, "snps,dwmac-mdio")) {
+   plat->mdio_node = child_node;
+   break;
+   }
+
/* "snps,phy-addr" is not a standard property. Mark it as deprecated
 * and warn of its use. Remove this when phy node support is added.
 */
if (of_property_read_u32(np, "snps,phy-addr", &plat->phy_addr) == 0)
dev_warn(&pdev->dev, "snps,phy-addr property is deprecated\n");
 
-   if ((plat->phy_node && !of_phy_is_fixed_link(np)) || plat->phy_bus_name)
+   if ((plat->phy_node && !of_phy_is_fixed_link(np)) || !plat->mdio_node)
plat->mdio_bus_data = NULL;
else
plat->mdio_bus_data =
diff --git a/include/linux/stmmac.h b/include/linux/stmmac.h
index eead8ab..881a79d 100644
--- a/include/linux/stmmac.h
+++ b/include/linux/stmmac.h
@@ -100,6 +100,7 @@ struct plat_stmmacenet_data {
int interface;
struct stmmac_mdio_bus_data *mdio_bus_data;
struct device_node *phy_node;
+   struct device_node *mdio_node;
struct stmmac_dma_cfg *dma_cfg;
int clk_csr;
int has_gmac;
-- 
1.9.1
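
The new condition in stmmac_probe_config_dt() can be read as a small truth table. The helper below is a hypothetical userspace sketch of that decision only; the three booleans stand in for plat->phy_node, of_phy_is_fixed_link(np) and plat->mdio_node.

```c
#include <stdbool.h>

/*
 * Returns true when mdio_bus_data would be allocated, false when it
 * would be set to NULL (so no MDIO bus gets registered).
 */
bool sketch_wants_mdio_bus(bool has_phy_node, bool is_fixed_link,
			   bool has_mdio_node)
{
	if ((has_phy_node && !is_fixed_link) || !has_mdio_node)
		return false;
	return true;
}
```

In this model a fixed-link MAC without an MDIO subnode gets no MDIO bus, which is the case named in the subject line.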



Re: Softirq priority inversion from "softirq: reduce latencies"

2016-02-29 Thread Peter Hurley
On 02/29/2016 07:19 AM, Eric Dumazet wrote:
> On lun., 2016-02-29 at 07:03 -0800, Peter Hurley wrote:
> 
>> Not the case. The softirq is raised from interrupt.
>>
>> Before Eric's change, when an interrupt raises a new softirq
>> while processing another softirq, the new softirq is immediately
>> processed *after the existing softirq completes*.
>>
>> After Eric's change, when an interrupt raises a new softirq
>> while processing another softirq and _that softirq wakes a process_,
>> the new softirq is *deferred to normal process priority*.
> 
> For the last time, this is not true.
> 
> My patch changed the probability for this to happen.

There is a huge difference between
1. heavy i/o load forcing ksoftirqd to battle out i/o with regular
   sched processes *as a fallback to avoid 100% softirq* and
2. always deferring new softirq just because a process was woken


> It will happen even if you revert it.

I think there is a happy medium where finer constraints on
softirq looping will get us both what we want.

For example, an accumulating mask of softirq already run would
keep one softirq level from looping over-and-over. Or a per-softirq
limiting counter. Or relying on the hard limit that was added later
of a fixed number of softirq loops. Or a combination of those.
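
A minimal user-space sketch of the first idea, the accumulating mask, may make it concrete. This is not kernel code: the struct, the function names, the softirq bit numbers and the restart limit are all hypothetical, chosen only for illustration.

```c
#include <stdint.h>

/* Illustrative bit positions only; not taken from any kernel header. */
#define EX_NET_RX   (1u << 0)
#define EX_TASKLET  (1u << 1)

/* Hypothetical cap on how many passes one invocation may make. */
#define EX_MAX_RESTART 10

struct ex_softirq_loop {
	uint32_t already_run;   /* accumulating mask of softirqs run so far */
	int restarts_left;      /* hard limit on the number of passes */
};

static void ex_loop_init(struct ex_softirq_loop *l)
{
	l->already_run = 0;
	l->restarts_left = EX_MAX_RESTART;
}

/*
 * Return the softirqs that may run in this pass. A softirq that already
 * ran in this invocation is filtered out, so one level cannot loop
 * over and over, but a newly raised one still runs right away.
 * Whatever is filtered out would be left for ksoftirqd.
 */
static uint32_t ex_pick_runnable(struct ex_softirq_loop *l, uint32_t pending)
{
	uint32_t fresh = pending & ~l->already_run;

	if (l->restarts_left <= 0)
		return 0;
	l->restarts_left--;
	l->already_run |= fresh;
	return fresh;
}
```

With this model, a softirq raised from interrupt while NET_RX is being processed is picked up on the next pass, while NET_RX itself is deferred if it becomes pending again.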


> linux never claimed that softirq could steal all cpu time.

That's not the problem observed here.

In fact, what your patch triggers is exactly the opposite:
although cpu load is initially very light because DMA is used to perform
device i/o, once DMA is not being serviced in a timely manner, the
driver falls back to purely interrupt-driven i/o, which dramatically
increases the real cpu load at those line rates.

> Are you by any chance still running a HZ=100 kernel?

The current kernel is HZ=250 but this would occur on HZ=1000 as well.

Regards,
Peter Hurley



Re: Softirq priority inversion from "softirq: reduce latencies"

2016-02-29 Thread Peter Hurley
On 02/29/2016 07:40 AM, Mike Galbraith wrote:
> On Mon, 2016-02-29 at 07:03 -0800, Peter Hurley wrote:
> 
>>> If I'm listening properly, the root cause is that there is a timing
>>> constraint involved, which is being exposed because one softirq raises
>>> another (ew).
>>
>> Not the case. The softirq is raised from interrupt.
> 
> Yeah, saw that on re-read.
> 
>> Before Eric's change, when an interrupt raises a new softirq
>> while processing another softirq, the new softirq is immediately
>> processed *after the existing softirq completes*.
> 
> Not necessarily, Eric only changed it from an arbitrary count to an
> arbitrary time, so your irq could just as well land when there's no
> count left and be up the same creek.

You're misreading the softirq abort logic:
neither 2ms nor a fixed number of loops has elapsed.

All that's happened is the first loop of NET_RX softirq has woken a
process; that is sufficient to abort softirq and defer it for ksoftirqd.

That's why I'm saying this is a priority inversion, and one that
will happen a lot.


> I was more infatuated by the constraint that's left dangling in the
> breeze any time processing is deferred to ksoftirqd.
> 
>   -Mike
> 



Re: [PATCH] mld, igmp: Fix reserved tailroom calculation

2016-02-29 Thread Daniel Borkmann

On 02/29/2016 04:19 PM, Benjamin Poirier wrote:

On 2016/02/29 15:57, Daniel Borkmann wrote:
[...]


[ cutting the IPv4 part off as diff is the same ]


diff --git a/net/ipv6/mcast.c b/net/ipv6/mcast.c
index 5ee56d0..c157edc 100644
--- a/net/ipv6/mcast.c
+++ b/net/ipv6/mcast.c
@@ -1574,9 +1574,9 @@ static struct sk_buff *mld_newpack(struct inet6_dev 
*idev, unsigned int mtu)
return NULL;

skb->priority = TC_PRIO_CONTROL;
-   skb->reserved_tailroom = skb_end_offset(skb) -
-min(mtu, skb_end_offset(skb));
skb_reserve(skb, hlen);
+   skb->reserved_tailroom = skb_tailroom(skb) -
+   min_t(int, mtu, skb_tailroom(skb) - tlen);


Are you sure this is correct? Wouldn't that mean (assuming we allocated
enough space), that I could now fill a larger than MTU frame?


Quoting back a part of the log:


The maximum space available for ip headers and payload without
fragmentation is min(mtu, data + extra). Therefore,
reserved_tailroom
= data + extra + tlen - min(mtu, data + extra)
= skb_end_offset - hlen - min(mtu, skb_end_offset - hlen - tlen)
= skb_tailroom - min(mtu, skb_tailroom - tlen) ; after skb_reserve(hlen)


The min() takes care of the situation you describe, ie. if the allocated
space is large, reserved_tailroom will be large enough that we do not
use more space than the mtu.
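
The derivation quoted above can be checked with plain integers. The sketch below is a user-space model, not the kernel code: the helper names are hypothetical, and it assumes an empty linear skb, so that skb_tailroom equals skb_end_offset - hlen once skb_reserve(hlen) has run.

```c
/* Smaller of two ints, standing in for the kernel's min_t(int, a, b). */
static int ex_min_int(int a, int b)
{
	return a < b ? a : b;
}

/* Old formula: computed before skb_reserve(), ignores hlen and tlen. */
static int ex_reserved_old(int end_offset, int mtu)
{
	return end_offset - ex_min_int(mtu, end_offset);
}

/* New formula: computed after skb_reserve(hlen). */
static int ex_reserved_new(int end_offset, int hlen, int tlen, int mtu)
{
	int tailroom = end_offset - hlen;

	return tailroom - ex_min_int(mtu, tailroom - tlen);
}

/* Bytes left for IP headers and payload once the reservation is honoured. */
static int ex_usable(int end_offset, int hlen, int reserved)
{
	return (end_offset - hlen) - reserved;
}
```

With end_offset = 2048, hlen = 16, tlen = 4 and mtu = 1500, the new formula reserves 532 bytes and leaves exactly 1500 usable, so the frame cannot exceed the mtu. With a small allocation (end_offset = 512, same hlen, tlen and mtu), it reserves 4 bytes, i.e. tlen, where the old formula reserved nothing.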


Hmm, sorry, you are right, I had a bug in my thought process wrt the
skb_reserve() that is now done first.

Code is fine, patch would be against -net tree:

Acked-by: Daniel Borkmann 

Thanks, Benjamin!


Re: [PATCH] mld, igmp: Fix reserved tailroom calculation

2016-02-29 Thread Hannes Frederic Sowa

On 29.02.2016 16:19, Benjamin Poirier wrote:

On 2016/02/29 15:57, Daniel Borkmann wrote:
[...]


[ cutting the IPv4 part off as diff is the same ]


diff --git a/net/ipv6/mcast.c b/net/ipv6/mcast.c
index 5ee56d0..c157edc 100644
--- a/net/ipv6/mcast.c
+++ b/net/ipv6/mcast.c
@@ -1574,9 +1574,9 @@ static struct sk_buff *mld_newpack(struct inet6_dev 
*idev, unsigned int mtu)
return NULL;

skb->priority = TC_PRIO_CONTROL;
-   skb->reserved_tailroom = skb_end_offset(skb) -
-min(mtu, skb_end_offset(skb));
skb_reserve(skb, hlen);
+   skb->reserved_tailroom = skb_tailroom(skb) -
+   min_t(int, mtu, skb_tailroom(skb) - tlen);


Are you sure this is correct? Wouldn't that mean (assuming we allocated
enough space), that I could now fill a larger than MTU frame?


Quoting back a part of the log:


The maximum space available for ip headers and payload without
fragmentation is min(mtu, data + extra). Therefore,
reserved_tailroom
= data + extra + tlen - min(mtu, data + extra)
= skb_end_offset - hlen - min(mtu, skb_end_offset - hlen - tlen)
= skb_tailroom - min(mtu, skb_tailroom - tlen) ; after skb_reserve(hlen)


The min() takes care of the situation you describe, ie. if the allocated
space is large, reserved_tailroom will be large enough that we do not
use more space than the mtu.

I tested the mld and igmp code with different driver parameters, mtu
values, number of multicast address records and even allocation
failures. If you think the formula is wrong, please provide a
counter-example with hlen, tlen, mtu and size values.


I think the code is fine albeit I think we should remove the min macro 
and just do something like:


if (skb_tailroom(skb) > mtu)
skb->reserved_tailroom = skb_tailroom(skb) - mtu;

Does that make sense? I think it is much more readable.

Thanks,
Hannes



Re: Softirq priority inversion from "softirq: reduce latencies"

2016-02-29 Thread Mike Galbraith
On Mon, 2016-02-29 at 07:03 -0800, Peter Hurley wrote:

> > If I'm listening properly, the root cause is that there is a timing
> > constraint involved, which is being exposed because one softirq raises
> > another (ew).
> 
> Not the case. The softirq is raised from interrupt.

Yeah, saw that on re-read.

> Before Eric's change, when an interrupt raises a new softirq
> while processing another softirq, the new softirq is immediately
> processed *after the existing softirq completes*.

Not necessarily, Eric only changed it from an arbitrary count to an
arbitrary time, so your irq could just as well land when there's no
count left and be up the same creek.

I was more infatuated by the constraint that's left dangling in the
breeze any time processing is deferred to ksoftirqd.

-Mike


[PATCH] fsl/fman: remove dTSEC-A003 Errata workaround

2016-02-29 Thread igal.liberman
From: Igal Liberman 

Errata dTSEC-A003 was fixed in P4080 rev 3.0.
Prior revisions are not supported, so the workaround can be removed.

Signed-off-by: Igal Liberman 
---
 drivers/net/ethernet/freescale/fman/fman_dtsec.c |8 
 1 file changed, 8 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fman/fman_dtsec.c 
b/drivers/net/ethernet/freescale/fman/fman_dtsec.c
index 7c92eb8..09dd46d 100644
--- a/drivers/net/ethernet/freescale/fman/fman_dtsec.c
+++ b/drivers/net/ethernet/freescale/fman/fman_dtsec.c
@@ -932,14 +932,6 @@ int dtsec_set_tx_pause_frames(struct fman_mac *dtsec,
if (!is_init_done(dtsec->dtsec_drv_param))
return -EINVAL;
 
-   /* FM_BAD_TX_TS_IN_B_2_B_ERRATA_DTSEC_A003 Errata workaround */
-   if (dtsec->fm_rev_info.major == 2)
-   if (pause_time <= 320) {
-   pr_warn("pause-time: %d illegal.Should be > 320\n",
-   pause_time);
-   return -EINVAL;
-   }
-
if (pause_time) {
ptv = ioread32be(&regs->ptv);
ptv &= PTV_PTE_MASK;
-- 
1.7.9.5



[PATCH net 4/5] dwc_eth_qos: use DWCEQOS_MSG_DEFAULT

2016-02-29 Thread Lars Persson
From: Rabin Vincent 

Since debug is hardcoded to 3, the defaults in the DWCEQOS_MSG_DEFAULT
macro are never used, which does not seem to be the intended behaviour
here.  Set debug to -1 like other drivers so that DWCEQOS_MSG_DEFAULT is
actually used by default.

Signed-off-by: Rabin Vincent 
Signed-off-by: Lars Persson 
---
 drivers/net/ethernet/synopsys/dwc_eth_qos.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/synopsys/dwc_eth_qos.c 
b/drivers/net/ethernet/synopsys/dwc_eth_qos.c
index 3ca2d5c..6897c1d 100644
--- a/drivers/net/ethernet/synopsys/dwc_eth_qos.c
+++ b/drivers/net/ethernet/synopsys/dwc_eth_qos.c
@@ -426,7 +426,7 @@
 #define DWC_MMC_RXOCTETCOUNT_GB  0x0784
 #define DWC_MMC_RXPACKETCOUNT_GB 0x0780
 
-static int debug = 3;
+static int debug = -1;
 module_param(debug, int, 0);
 MODULE_PARM_DESC(debug, "DWC_eth_qos debug level (0=none,...,16=all)");
 
-- 
2.1.4
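
For readers unfamiliar with the debug = -1 convention in patch 4/5: drivers commonly pass the module parameter and their default mask to netif_msg_init(). The sketch below is a user-space re-implementation of that helper's logic as I understand it, assuming this driver uses it too. EX_MSG_DEFAULT is a placeholder value, not the real DWCEQOS_MSG_DEFAULT.

```c
#include <stdint.h>

/* Placeholder default mask for illustration; not the driver's real value. */
#define EX_MSG_DEFAULT 0x0021u

/*
 * Model of the usual debug-level handling:
 *   negative or out of range -> use the driver's default mask
 *   0                        -> no messages
 *   N                        -> enable the low N message bits
 */
static uint32_t ex_msg_init(int debug_value, uint32_t default_bits)
{
	if (debug_value < 0 || debug_value >= (int)(sizeof(uint32_t) * 8))
		return default_bits;
	if (debug_value == 0)
		return 0;
	return (1u << debug_value) - 1;
}
```

Under this model a hardcoded debug = 3 always yields the mask 0x7 and the default is never consulted, whereas debug = -1 selects the default mask.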



[PATCH net 1/5] dwc_eth_qos: fix race condition in dwceqos_start_xmit

2016-02-29 Thread Lars Persson
From: Rabin Vincent 

The xmit handler and the tx_reclaim tasklet had a race on the tx_free
variable which could lead to a tx timeout if tx_free was updated after
the tx complete interrupt.

Signed-off-by: Rabin Vincent 
Signed-off-by: Lars Persson 
---
 drivers/net/ethernet/synopsys/dwc_eth_qos.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/synopsys/dwc_eth_qos.c 
b/drivers/net/ethernet/synopsys/dwc_eth_qos.c
index fc8bbff..926db2d 100644
--- a/drivers/net/ethernet/synopsys/dwc_eth_qos.c
+++ b/drivers/net/ethernet/synopsys/dwc_eth_qos.c
@@ -2178,12 +2178,10 @@ static int dwceqos_start_xmit(struct sk_buff *skb, 
struct net_device *ndev)
((trans.initial_descriptor + trans.nr_descriptors) %
 DWCEQOS_TX_DCNT));
 
-   dwceqos_tx_finalize(skb, lp, &trans);
-
-   netdev_sent_queue(ndev, skb->len);
-
spin_lock_bh(&lp->tx_lock);
lp->tx_free -= trans.nr_descriptors;
+   dwceqos_tx_finalize(skb, lp, &trans);
+   netdev_sent_queue(ndev, skb->len);
spin_unlock_bh(&lp->tx_lock);
 
ndev->trans_start = jiffies;
-- 
2.1.4
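
One way the race fixed in patch 1/5 could produce a tx timeout, modelled in single-threaded user-space C. This rests on an assumption not stated in the patch: that the reclaim tasklet only walks descriptors while tx_free is below the ring size. All names and the ring size are hypothetical.

```c
#define EX_TX_DCNT 8   /* hypothetical ring size */

struct ex_ring {
	int tx_free;     /* descriptors the driver believes are free */
	int completed;   /* descriptors finished by hardware, not yet reclaimed */
};

/* Hardware finishes n descriptors and raises the tx complete interrupt. */
static void ex_hw_complete(struct ex_ring *r, int n)
{
	r->completed += n;
}

/* Assumed reclaim loop: does nothing while the ring looks completely free. */
static void ex_reclaim(struct ex_ring *r)
{
	while (r->tx_free < EX_TX_DCNT && r->completed > 0) {
		r->completed--;
		r->tx_free++;
	}
}

/* Old order: hardware is kicked before tx_free is updated. */
static int ex_xmit_old_order(struct ex_ring *r, int n)
{
	ex_hw_complete(r, n);   /* finalize, completion arrives at once */
	ex_reclaim(r);          /* sees tx_free == EX_TX_DCNT, reclaims nothing */
	r->tx_free -= n;        /* too late: no further interrupt will come */
	return r->completed;    /* descriptors left stuck */
}

/* New order: tx_free is updated first, as inside the lock in the patch. */
static int ex_xmit_new_order(struct ex_ring *r, int n)
{
	r->tx_free -= n;
	ex_hw_complete(r, n);
	ex_reclaim(r);
	return r->completed;
}
```

In the old order the completed descriptors are never reclaimed unless another transmit happens, which matches the timeout described in the commit message. In the new order the ring returns to fully free.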



[PATCH net 2/5] dwc_eth_qos: release descriptors outside netif_tx_lock

2016-02-29 Thread Lars Persson
To prepare for using the CMA, we can not be in atomic context when
de-allocating DMA buffers.

The tx lock was needed only to protect the hw reset against the xmit
handler. Now we briefly grab the tx lock while stopping the queue to
make sure no thread is inside or will enter the xmit handler.

Signed-off-by: Lars Persson 
---
 drivers/net/ethernet/synopsys/dwc_eth_qos.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/synopsys/dwc_eth_qos.c 
b/drivers/net/ethernet/synopsys/dwc_eth_qos.c
index 926db2d..53d48c0 100644
--- a/drivers/net/ethernet/synopsys/dwc_eth_qos.c
+++ b/drivers/net/ethernet/synopsys/dwc_eth_qos.c
@@ -1918,15 +1918,17 @@ static int dwceqos_stop(struct net_device *ndev)
phy_stop(lp->phy_dev);
 
tasklet_disable(&lp->tx_bdreclaim_tasklet);
-   netif_stop_queue(ndev);
napi_disable(&lp->napi);
 
-   dwceqos_drain_dma(lp);
+   /* Stop all tx before we drain the tx dma. */
+   netif_tx_lock_bh(lp->ndev);
+   netif_stop_queue(ndev);
+   netif_tx_unlock_bh(lp->ndev);
 
-   netif_tx_lock(lp->ndev);
+   dwceqos_drain_dma(lp);
dwceqos_reset_hw(lp);
+
dwceqos_descriptor_free(lp);
-   netif_tx_unlock(lp->ndev);
 
return 0;
 }
-- 
2.1.4



[PATCH net 5/5] dwc_eth_qos: do phy_start before resetting hardware

2016-02-29 Thread Lars Persson
This reverts the changed init order from commit 3647bc35bd42
("dwc_eth_qos: Reset hardware before PHY start") and makes another fix
for the race.

It turned out that the reset state machine of the dwceqos hardware
requires PHY clocks to be present in order to complete the reset
cycle.

To plug the race with the phy state machine we defer link speed
setting until the hardware init has finished.

Signed-off-by: Lars Persson 
---
 drivers/net/ethernet/synopsys/dwc_eth_qos.c | 23 ---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/synopsys/dwc_eth_qos.c 
b/drivers/net/ethernet/synopsys/dwc_eth_qos.c
index 6897c1d..af11ed1 100644
--- a/drivers/net/ethernet/synopsys/dwc_eth_qos.c
+++ b/drivers/net/ethernet/synopsys/dwc_eth_qos.c
@@ -650,6 +650,11 @@ struct net_local {
u32 mmc_tx_counters_mask;
 
struct dwceqos_flowcontrol flowcontrol;
+
+   /* Tracks the intermediate state of phy started but hardware
+* init not finished yet.
+*/
+   bool phy_defer;
 };
 
 static void dwceqos_read_mmc_counters(struct net_local *lp, u32 rx_mask,
@@ -901,6 +906,9 @@ static void dwceqos_adjust_link(struct net_device *ndev)
struct phy_device *phydev = lp->phy_dev;
int status_change = 0;
 
+   if (lp->phy_defer)
+   return;
+
if (phydev->link) {
if ((lp->speed != phydev->speed) ||
(lp->duplex != phydev->duplex)) {
@@ -1635,6 +1643,12 @@ static void dwceqos_init_hw(struct net_local *lp)
regval = dwceqos_read(lp, REG_DWCEQOS_MAC_CFG);
dwceqos_write(lp, REG_DWCEQOS_MAC_CFG,
  regval | DWCEQOS_MAC_CFG_TE | DWCEQOS_MAC_CFG_RE);
+
+   lp->phy_defer = false;
+   mutex_lock(&lp->phy_dev->lock);
+   phy_read_status(lp->phy_dev);
+   dwceqos_adjust_link(lp->ndev);
+   mutex_unlock(&lp->phy_dev->lock);
 }
 
 static void dwceqos_tx_reclaim(unsigned long data)
@@ -1880,9 +1894,13 @@ static int dwceqos_open(struct net_device *ndev)
}
netdev_reset_queue(ndev);
 
+   /* The dwceqos reset state machine requires all phy clocks to complete,
+* hence the unusual init order with phy_start first.
+*/
+   lp->phy_defer = true;
+   phy_start(lp->phy_dev);
dwceqos_init_hw(lp);
napi_enable(&lp->napi);
-   phy_start(lp->phy_dev);
 
netif_start_queue(ndev);
tasklet_enable(&lp->tx_bdreclaim_tasklet);
@@ -1915,8 +1933,6 @@ static int dwceqos_stop(struct net_device *ndev)
 {
struct net_local *lp = netdev_priv(ndev);
 
-   phy_stop(lp->phy_dev);
-
tasklet_disable(&lp->tx_bdreclaim_tasklet);
napi_disable(&lp->napi);
 
@@ -1927,6 +1943,7 @@ static int dwceqos_stop(struct net_device *ndev)
 
dwceqos_drain_dma(lp);
dwceqos_reset_hw(lp);
+   phy_stop(lp->phy_dev);
 
dwceqos_descriptor_free(lp);
 
-- 
2.1.4



[PATCH net 3/5] dwc_eth_qos: use GFP_KERNEL in dma_alloc_coherent()

2016-02-29 Thread Lars Persson
From: Rabin Vincent 

Since we are in non-atomic context here we can pass GFP_KERNEL to
dma_alloc_coherent(). This enables use of the CMA.

Signed-off-by: Rabin Vincent 
Signed-off-by: Lars Persson 
---
 drivers/net/ethernet/synopsys/dwc_eth_qos.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/synopsys/dwc_eth_qos.c 
b/drivers/net/ethernet/synopsys/dwc_eth_qos.c
index 53d48c0..3ca2d5c 100644
--- a/drivers/net/ethernet/synopsys/dwc_eth_qos.c
+++ b/drivers/net/ethernet/synopsys/dwc_eth_qos.c
@@ -1113,7 +1113,7 @@ static int dwceqos_descriptor_init(struct net_local *lp)
/* Allocate DMA descriptors */
size = DWCEQOS_RX_DCNT * sizeof(struct dwceqos_dma_desc);
lp->rx_descs = dma_alloc_coherent(lp->ndev->dev.parent, size,
-   &lp->rx_descs_addr, 0);
+   &lp->rx_descs_addr, GFP_KERNEL);
if (!lp->rx_descs)
goto err_out;
lp->rx_descs_tail_addr = lp->rx_descs_addr +
@@ -1121,7 +1121,7 @@ static int dwceqos_descriptor_init(struct net_local *lp)
 
size = DWCEQOS_TX_DCNT * sizeof(struct dwceqos_dma_desc);
lp->tx_descs = dma_alloc_coherent(lp->ndev->dev.parent, size,
-   &lp->tx_descs_addr, 0);
+   &lp->tx_descs_addr, GFP_KERNEL);
if (!lp->tx_descs)
goto err_out;
lp->tx_descs_tail_addr = lp->tx_descs_addr +
-- 
2.1.4



Re: Softirq priority inversion from "softirq: reduce latencies"

2016-02-29 Thread Eric Dumazet
On lun., 2016-02-29 at 07:03 -0800, Peter Hurley wrote:

> The reason why Eric's change is so effective for Eric's workload is
> that it fixes the problem where NET_RX keeps getting new network packets
> so it keeps looping, servicing more NET_RX softirq.

You have very little idea of what is happening in networking land.

Once hard irq for RX has triggered, we arm a NAPI (NET_RX softirq), and
no more irq will come unless the napi handler ran. Then when NAPI is
complete, we re-allow interrupt to be delivered when a new packet is
coming.
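
The irq/NAPI hand-off described above can be modelled in a few lines of user-space C. This is only a sketch of the flow, with hypothetical names and a simplified device. It does not use the real NAPI API.

```c
struct ex_nic {
	int irq_enabled;     /* device may raise an rx interrupt */
	int napi_scheduled;  /* poll handler is armed */
	int rx_queue;        /* packets waiting in the rx ring */
	int irq_count;       /* hard irqs taken so far */
};

/* Hard irq: mask further rx interrupts and arm the poll handler. */
static void ex_hard_irq(struct ex_nic *n)
{
	n->irq_count++;
	n->irq_enabled = 0;
	n->napi_scheduled = 1;
}

/* A packet lands in the ring; an irq fires only if interrupts are enabled. */
static void ex_packet_arrives(struct ex_nic *n)
{
	n->rx_queue++;
	if (n->irq_enabled)
		ex_hard_irq(n);
}

/*
 * Poll handler: process up to budget packets. If the ring was drained
 * before the budget ran out, polling is complete and interrupts are
 * re-enabled. Otherwise the handler stays scheduled.
 */
static int ex_napi_poll(struct ex_nic *n, int budget)
{
	int done = 0;

	while (done < budget && n->rx_queue > 0) {
		n->rx_queue--;
		done++;
	}
	if (done < budget) {
		n->napi_scheduled = 0;
		n->irq_enabled = 1;
	}
	return done;
}
```

Three packets arriving back to back cost one hard irq in this model. The next irq is taken only after a poll drains the ring and re-enables interrupts.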

Yes, ksoftirqd runs under load, and this is _wanted_.

Sure, it might add a latency if some high prio task is wanting the same
cpu, but this is exactly the purpose of having multi tasking.




[PATCH] fsl/fman: Initialize fman->dev earlier

2016-02-29 Thread igal.liberman
From: Igal Liberman 

Currently, in a case of error, dev_err is using fman->dev
before its initialization and "(NULL device *)" is printed.
This patch fixes this issue.

Signed-off-by: Igal Liberman 
---
 drivers/net/ethernet/freescale/fman/fman.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fman/fman.c 
b/drivers/net/ethernet/freescale/fman/fman.c
index 623aa1c..79a210a 100644
--- a/drivers/net/ethernet/freescale/fman/fman.c
+++ b/drivers/net/ethernet/freescale/fman/fman.c
@@ -2791,6 +2791,8 @@ static struct fman *read_dts_node(struct platform_device 
*of_dev)
goto fman_free;
}
 
+   fman->dev = &of_dev->dev;
+
return fman;
 
 fman_node_put:
@@ -2845,8 +2847,6 @@ static int fman_probe(struct platform_device *of_dev)
 
dev_set_drvdata(dev, fman);
 
-   fman->dev = dev;
-
dev_dbg(dev, "FMan%d probed\n", fman->dts_params.id);
 
return 0;
-- 
1.7.9.5


