date:20170105

Re: [lkp-developer] [page_pool] 50a8fe7622: kernel_BUG_at_mm/slub.c

2017-01-05 Thread Jesper Dangaard Brouer


On Fri, 6 Jan 2017 13:08:27 +0800 kernel test robot wrote:

> FYI, we noticed the following commit:
> 
> commit: 50a8fe7622e6c45af778d91f83c11491f0afaaf3 ("page_pool: basic implementation of page_pool")
> url: https://github.com/0day-ci/linux/commits/Jesper-Dangaard-Brouer/page_pool-proof-of-concept-early-code/20161221-014200
> base: git://git.cmpxchg.org/linux-mmotm.git master
> 
> in testcase: trinity
> with following parameters:
> 
>   runtime: 300s
> 
> test-description: Trinity is a linux system call fuzz tester.
> test-url: http://codemonkey.org.uk/projects/trinity/
> 
> 
> on test machine: qemu-system-i386 -enable-kvm -smp 2 -m 320M

This is because this RFC patch does not support 32-bit, as I'm using a
page flag that is only available on 64-bit, see [1].

I thought this kind of page-flags violation would be caught at compile time?

[1] 
https://github.com/0day-ci/linux/commit/50a8fe7622e6c45af778d91f83c11491f0afaaf3#diff-c684e72d6c55b89ae592b66e9ce818ee
 
> caused below changes:
> 
> 
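> +------------------------------------------+------------+------------+
> |                                          | 03fc8354e2 | 50a8fe7622 |
> +------------------------------------------+------------+------------+
> | boot_successes                           | 6          | 0          |
> | boot_failures                            | 0          | 4          |
> | kernel_BUG_at_mm/slub.c                  | 0          | 4          |
> | invalid_opcode:#[##]SMP_DEBUG_PAGEALLOC  | 0          | 4          |
> | Kernel_panic-not_syncing:Fatal_exception | 0          | 4          |
> +------------------------------------------+------------+------------+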
> 
> 
> 
> [0.00]   .text : 0xc100 - 0xc188d0b7   (8756 kB)
> [0.00] Checking if this processor honours the WP bit even in supervisor mode...Ok.
> [0.00] [ cut here ]
> [0.00] kernel BUG at mm/slub.c:349!
> [0.00] invalid opcode:  [#1] SMP DEBUG_PAGEALLOC
> [0.00] Modules linked in:
> [0.00] CPU: 0 PID: 0 Comm: swapper Not tainted 4.9.0-mm1-00096-g50a8fe7 #1
> [0.00] task: c1d4ea80 task.stack: c1d46000
> [0.00] EIP: get_partial_node+0x148/0x330
> [0.00] EFLAGS: 00210046 CPU: 0
> [0.00] EAX: 00200082 EBX: d2d38000 ECX:  EDX: d2400010
> [0.00] ESI: c1e3ef80 EDI: d240 EBP: c1d47e50 ESP: c1d47dc0
> [0.00]  DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068
> [0.00] CR0: 80050033 CR2: ffbff000 CR3: 01e76000 CR4: 06b0
> [0.00] Call Trace:
> [0.00]  ? add_lock_to_list+0x7e/0xa7
> [0.00]  ? __lock_acquire+0x103a/0x1326
> [0.00]  ___slab_alloc+0x238/0x378
> 
> 
> To reproduce:
> 
> git clone git://git.kernel.org/pub/scm/linux/kernel/git/wfg/lkp-tests.git
> cd lkp-tests
> bin/lkp qemu -k  job-script  # job-script is attached in this email

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

Re: [for-next 07/10] IB/mlx5: Use blue flame register allocator in mlx5_ib

2017-01-05 Thread Leon Romanovsky

On Thu, Jan 05, 2017 at 03:07:31PM -0500, David Miller wrote:
> From: Eli Cohen 
> Date: Thu, 5 Jan 2017 14:03:18 -0600
>
> > If necessary I can make sure it builds on 32 bits as well.
>
> Please do.

Dave,

I'm failing to understand the benefits of building mlx5 on 32-bit, and
see only disadvantages:
 * It is effectively dead code without test coverage.
 * It misleads reviewers/customers, who see code for 32-bit.
 * It adds compilation time on 32-bit platforms and "punishes" them
   for a driver that is not relevant to them.

Why do you call removing all that a "regression"?

 Thanks.


[PATCH v4] rfkill: Add rfkill-any LED trigger

2017-01-05 Thread Michał Kępień

Add a new "global" (i.e. not per-rfkill device) LED trigger, rfkill-any,
which may be useful on laptops with a single "radio LED" and multiple
radio transmitters.  The trigger is meant to turn a LED on whenever
there is at least one radio transmitter active and turn it off
otherwise.

Signed-off-by: Michał Kępień 
---
Changes from v3:

  - Revert introducing a new bitfield and instead defer LED event firing
to a work queue to prevent conditional locking and ensure the
trigger can really be used from any context.  This also removes the
need to take rfkill_global_mutex before calling rfkill_set_block()
in rfkill_resume().

Changes from v2:

  - Handle the global mutex properly when rfkill_set_{hw,sw}_state() or
rfkill_set_states() is called from within an rfkill callback.  v2
always tried to lock the global mutex in such a case, which led to a
deadlock when an rfkill driver called one of the above functions
from its query or set_block callback.  This is solved by defining a
new bitfield, RFKILL_BLOCK_SW_HASLOCK, which is set before the above
callbacks are invoked and cleared afterwards; the functions listed
above use this bitfield to tell rfkill_any_led_trigger_event()
whether the global mutex is currently held or not.
RFKILL_BLOCK_SW_SETCALL cannot be reused for this purpose as setting
it before invoking the query callback would cause any calls to
rfkill_set_sw_state() made from within that callback to work on
RFKILL_BLOCK_SW_PREV instead of RFKILL_BLOCK_SW and thus change the
way rfkill_set_block() behaves.

  - As rfkill_any_led_trigger_event() now takes a boolean argument which
tells it whether the global mutex was already taken by the caller,
all calls to __rfkill_any_led_trigger_event() outside
rfkill_any_led_trigger_event() have been replaced with calls to
rfkill_any_led_trigger_event(true).

Changes from v1:

  - take rfkill_global_mutex before calling rfkill_set_block() in
rfkill_resume(); the need for doing this was previously obviated by
908209c ("rfkill: don't impose global states on resume"), but given
that __rfkill_any_led_trigger_event() is called from
rfkill_set_block() unconditionally, each caller of the latter needs
to take care of locking rfkill_global_mutex,

  - declare __rfkill_any_led_trigger_event() even when
CONFIG_RFKILL_LEDS=n to prevent implicit declaration errors,

  - remove the #ifdef surrounding rfkill_any_led_trigger_{,un}register()
calls to prevent compilation warnings about functions and a label
being defined but not used,

  - move the rfkill_any_led_trigger_register() call in rfkill_init()
before the rfkill_handler_init() call to avoid the need to call
rfkill_handler_exit() from rfkill_init() and thus prevent a section
mismatch.

 net/rfkill/core.c | 72 ++-
 1 file changed, 71 insertions(+), 1 deletion(-)

diff --git a/net/rfkill/core.c b/net/rfkill/core.c
index afa4f71b4c7b..2064c3a35ef8 100644
--- a/net/rfkill/core.c
+++ b/net/rfkill/core.c
@@ -176,6 +176,50 @@ static void rfkill_led_trigger_unregister(struct rfkill *rfkill)
 {
led_trigger_unregister(&rfkill->led_trigger);
 }
+
+static struct led_trigger rfkill_any_led_trigger;
+static struct work_struct rfkill_any_work;
+
+static void rfkill_any_led_trigger_worker(struct work_struct *work)
+{
+   enum led_brightness brightness = LED_OFF;
+   struct rfkill *rfkill;
+
+   mutex_lock(&rfkill_global_mutex);
+   list_for_each_entry(rfkill, &rfkill_list, node) {
+   if (!(rfkill->state & RFKILL_BLOCK_ANY)) {
+   brightness = LED_FULL;
+   break;
+   }
+   }
+   mutex_unlock(&rfkill_global_mutex);
+
+   led_trigger_event(&rfkill_any_led_trigger, brightness);
+}
+
+static void rfkill_any_led_trigger_event(void)
+{
+   schedule_work(&rfkill_any_work);
+}
+
+static void rfkill_any_led_trigger_activate(struct led_classdev *led_cdev)
+{
+   rfkill_any_led_trigger_event();
+}
+
+static int rfkill_any_led_trigger_register(void)
+{
+   INIT_WORK(&rfkill_any_work, rfkill_any_led_trigger_worker);
+   rfkill_any_led_trigger.name = "rfkill-any";
+   rfkill_any_led_trigger.activate = rfkill_any_led_trigger_activate;
+   return led_trigger_register(&rfkill_any_led_trigger);
+}
+
+static void rfkill_any_led_trigger_unregister(void)
+{
+   led_trigger_unregister(&rfkill_any_led_trigger);
+   cancel_work_sync(&rfkill_any_work);
+}
 #else
 static void rfkill_led_trigger_event(struct rfkill *rfkill)
 {
@@ -189,6 +233,19 @@ static inline int rfkill_led_trigger_register(struct rfkill *rfkill)
 static inline void rfkill_led_trigger_unregister(struct rfkill *rfkill)
 {
 }
+
+static void rfkill_any_led_trigger_event(void)
+{
+}
+
+static int rfkill_any_led_trigger_register(void)
+{
+   return 0;
+}
+
+static void rfkill_any_led_trigger_unregister(void)
+{
+}
 #endif /* CONFIG_RFKILL_LEDS

Re: [PATCH net-next v4 0/4] Fix OdroidC2 Gigabit Tx link issue

2017-01-05 Thread Yegor Yefremov

Hi Russell,

On Fri, Jan 6, 2017 at 12:25 AM, Russell King - ARM Linux wrote:
> On Mon, Nov 28, 2016 at 09:54:28AM -0800, Florian Fainelli wrote:
>> If we start supporting generic "enable", "disable" type of properties
>> with values that map directly to register definitions of the HW, we
>> leave too much room for these properties to be utilized to implement a
>> specific policy, and this is not acceptable.
>
> Another concern with this patch is that the existing phylib "set_eee"
> code is horribly buggy - it just translates the modes from userspace
> into the register value and writes them directly to the register with
> no validation.  So it's possible to set modes in the register that the
> hardware doesn't support, and have them advertised to the link partner.
>
> I have a patch which fixes that, restricting (as we do elsewhere) the
> advert according to the EEE supported capabilities retrieved from the
> PCS - maybe the problem here is that the PCS doesn't support
> EEE in 1000baseT mode?
>
> Out of interest, which PHY is used on this platform?
>
> On the SolidRun boards, they're using AR8035, and have suffered this
> occasional link drop problem.  What has been found is that it seems to
> be to do with the timing parameters, and it seemed to only be 1000bT
> that was affected.  I don't remember off hand exactly which or what
> the change was they made to stabilise it though, but I can probably
> find out tomorrow.

I have different boards with am335x and AR8035, and we had occasional
link drops at both 100 and 1000 speeds.

Yegor

[lkp-developer] [page_pool] 50a8fe7622: kernel_BUG_at_mm/slub.c

2017-01-05 Thread kernel test robot


FYI, we noticed the following commit:

commit: 50a8fe7622e6c45af778d91f83c11491f0afaaf3 ("page_pool: basic implementation of page_pool")
url: https://github.com/0day-ci/linux/commits/Jesper-Dangaard-Brouer/page_pool-proof-of-concept-early-code/20161221-014200
base: git://git.cmpxchg.org/linux-mmotm.git master

in testcase: trinity
with following parameters:

runtime: 300s

test-description: Trinity is a linux system call fuzz tester.
test-url: http://codemonkey.org.uk/projects/trinity/


on test machine: qemu-system-i386 -enable-kvm -smp 2 -m 320M

caused below changes:


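+------------------------------------------+------------+------------+
|                                          | 03fc8354e2 | 50a8fe7622 |
+------------------------------------------+------------+------------+
| boot_successes                           | 6          | 0          |
| boot_failures                            | 0          | 4          |
| kernel_BUG_at_mm/slub.c                  | 0          | 4          |
| invalid_opcode:#[##]SMP_DEBUG_PAGEALLOC  | 0          | 4          |
| Kernel_panic-not_syncing:Fatal_exception | 0          | 4          |
+------------------------------------------+------------+------------+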



[0.00]   .text : 0xc100 - 0xc188d0b7   (8756 kB)
[0.00] Checking if this processor honours the WP bit even in supervisor mode...Ok.
[0.00] [ cut here ]
[0.00] kernel BUG at mm/slub.c:349!
[0.00] invalid opcode:  [#1] SMP DEBUG_PAGEALLOC
[0.00] Modules linked in:
[0.00] CPU: 0 PID: 0 Comm: swapper Not tainted 4.9.0-mm1-00096-g50a8fe7 #1
[0.00] task: c1d4ea80 task.stack: c1d46000
[0.00] EIP: get_partial_node+0x148/0x330
[0.00] EFLAGS: 00210046 CPU: 0
[0.00] EAX: 00200082 EBX: d2d38000 ECX:  EDX: d2400010
[0.00] ESI: c1e3ef80 EDI: d240 EBP: c1d47e50 ESP: c1d47dc0
[0.00]  DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068
[0.00] CR0: 80050033 CR2: ffbff000 CR3: 01e76000 CR4: 06b0
[0.00] Call Trace:
[0.00]  ? add_lock_to_list+0x7e/0xa7
[0.00]  ? __lock_acquire+0x103a/0x1326
[0.00]  ___slab_alloc+0x238/0x378


To reproduce:

git clone git://git.kernel.org/pub/scm/linux/kernel/git/wfg/lkp-tests.git
cd lkp-tests
bin/lkp qemu -k  job-script  # job-script is attached in this email



Thanks,
Xiaolong
#
# Automatically generated file; DO NOT EDIT.
# Linux/i386 4.9.0-mm1 Kernel Configuration
#
# CONFIG_64BIT is not set
CONFIG_X86_32=y
CONFIG_X86=y
CONFIG_INSTRUCTION_DECODER=y
CONFIG_OUTPUT_FORMAT="elf32-i386"
CONFIG_ARCH_DEFCONFIG="arch/x86/configs/i386_defconfig"
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_MMU=y
CONFIG_ARCH_MMAP_RND_BITS_MIN=8
CONFIG_ARCH_MMAP_RND_BITS_MAX=16
CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MIN=8
CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MAX=16
CONFIG_NEED_DMA_MAP_STATE=y
CONFIG_NEED_SG_DMA_LENGTH=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_ARCH_HAS_CPU_RELAX=y
CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y
CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
CONFIG_ARCH_WANT_HUGE_PMD_SHARE=y
CONFIG_ARCH_WANT_GENERAL_HUGETLB=y
CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_X86_32_SMP=y
CONFIG_X86_32_LAZY_GS=y
CONFIG_ARCH_SUPPORTS_UPROBES=y
CONFIG_FIX_EARLYCON_MEM=y
CONFIG_DEBUG_RODATA=y
CONFIG_PGTABLE_LEVELS=3
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
CONFIG_CONSTRUCTORS=y
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_EXTABLE_SORT=y
CONFIG_THREAD_INFO_IN_TASK=y

#
# General setup
#
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_CROSS_COMPILE=""
# CONFIG_COMPILE_TEST is not set
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_BZIP2=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_HAVE_KERNEL_XZ=y
CONFIG_HAVE_KERNEL_LZO=y
CONFIG_HAVE_KERNEL_LZ4=y
CONFIG_KERNEL_GZIP=y
# CONFIG_KERNEL_BZIP2 is not set
# CONFIG_KERNEL_LZMA is not set
# CONFIG_KERNEL_XZ is not set
# CONFIG_KERNEL_LZO is not set
# CONFIG_KERNEL_LZ4 is not set
CONFIG_DEFAULT_HOSTNAME="(none)"
# CONFIG_SWAP is not set
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
# CONFIG_POSIX_MQUEUE is not set
CONFIG_CROSS_MEMORY_ATTACH=y
CONFIG_FHANDLE=y
# CONFIG_USELIB is not set
CONFIG_AUDIT=y
CONFIG_HAVE_ARCH_AUDITSYSCALL=y
CONFIG_AUDITSYSCALL=y
CONFIG_AUDIT_WATCH=y
CONFIG_AUDIT_TREE=y

#
# IRQ subsystem
#
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_GENERIC_PENDING_IRQ=y
CONFIG_IRQ_DOMAIN=y
CONFIG_IRQ_DOMAIN_HIERARCHY=y
CONFIG_GENERIC_MSI_IRQ=y
CONFIG_GENERIC_MSI_IRQ_DOMAIN=y
CONFIG_IRQ_DOMAIN_DEBUG=y
CONFIG_IRQ_FORCED_THREADING=y
CONFIG_SPARSE_IRQ=y
CONFIG_CLOCKSOURCE_WATCHDOG=y

Re: [PATCH] net:phy fix driver reference count error when attach and detach phy device

2017-01-05 Thread Florian Fainelli

On 01/05/17 at 19:39, maowenan wrote:
> 
> 
> On 2017/1/6 11:21, Florian Fainelli wrote:
>> +Andrew,
>>
>> On 01/05/17 at 18:29, maowenan wrote:
> @Florian Fainelli, what are your comments about this patch?

 I am trying to reproduce what you are seeing, but at first glance it looks like an
 appropriate solution to me. Do you mind giving me a couple more days?

 Thanks!
 --
 Florian
>>>
>>> Hi Florian, 
>>>   Do you have any update about this patch?
>>
>> Your patch is not complete; there are now MDIO devices (which PHY devices
>> are a superset of) that would also need a similar fix.
>>
> OK, is there any patch to fix MDIO yet? If not, should I verify it and send a
> fix patch?
> 

No, there is not a patch yet. Your approach looks okay, but it needs to be
made generic and cover MDIO devices as well.

Thank you!
-- 
Florian

[PATCHv3 4/5] arm: mvebu: Add device tree for 98DX3236 SoCs

2017-01-05 Thread Chris Packham

The Marvell 98DX3236, 98DX3336, 98DX4521 and variants are switch ASICs
with integrated CPUs. They are similar to the Armada XP SoCs but have
different I/O interfaces.

Signed-off-by: Chris Packham 
---
Changes in v2:
- Update devicetree binding documentation to reflect that 98DX3336 and
  98DX4251 are supersets of 98DX3236.
- disable crypto block
- disable sdio for 98DX3236, enable for 98DX4251
Changes in v3:
- fix typo 4521 -> 4251
- document prestera bindings
- rework corediv-clock binding
- add label to packet processor node
- add new compatible string for DFX server

 .../devicetree/bindings/arm/marvell/98dx3236.txt   |  23 ++
 .../devicetree/bindings/net/marvell,prestera.txt   |  50 
 arch/arm/boot/dts/armada-xp-98dx3236.dtsi  | 254 +
 arch/arm/boot/dts/armada-xp-98dx3336.dtsi  |  76 ++
 arch/arm/boot/dts/armada-xp-98dx4251.dtsi  |  90 
 5 files changed, 493 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/arm/marvell/98dx3236.txt
 create mode 100644 Documentation/devicetree/bindings/net/marvell,prestera.txt
 create mode 100644 arch/arm/boot/dts/armada-xp-98dx3236.dtsi
 create mode 100644 arch/arm/boot/dts/armada-xp-98dx3336.dtsi
 create mode 100644 arch/arm/boot/dts/armada-xp-98dx4251.dtsi

diff --git a/Documentation/devicetree/bindings/arm/marvell/98dx3236.txt b/Documentation/devicetree/bindings/arm/marvell/98dx3236.txt
new file mode 100644
index ..64e8c73fc5ab
--- /dev/null
+++ b/Documentation/devicetree/bindings/arm/marvell/98dx3236.txt
@@ -0,0 +1,23 @@
+Marvell 98DX3236, 98DX3336 and 98DX4251 Platforms Device Tree Bindings
+----------------------------------------------------------------------
+
+Boards with a SoC of the Marvell 98DX3236, 98DX3336 and 98DX4251 families
+shall have the following property:
+
+Required root node property:
+
+compatible: must contain "marvell,armadaxp-98dx3236"
+
+In addition, boards using the Marvell 98DX3336 SoC shall have the
+following property:
+
+Required root node property:
+
+compatible: must contain "marvell,armadaxp-98dx3336"
+
+In addition, boards using the Marvell 98DX4251 SoC shall have the
+following property:
+
+Required root node property:
+
+compatible: must contain "marvell,armadaxp-98dx4251"
diff --git a/Documentation/devicetree/bindings/net/marvell,prestera.txt b/Documentation/devicetree/bindings/net/marvell,prestera.txt
new file mode 100644
index ..5fbab29718e8
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/marvell,prestera.txt
@@ -0,0 +1,50 @@
+Marvell Prestera Switch Chip bindings
+-------------------------------------
+
+Required properties:
+- compatible: one of the following
+   "marvell,prestera-98dx3236",
+   "marvell,prestera-98dx3336",
+   "marvell,prestera-98dx4251",
+- reg: address and length of the register set for the device.
+- interrupts: interrupt for the device
+
+Optional properties:
+- dfx: phandle reference to the "DFX Server" node
+
+Example:
+
+switch {
+   compatible = "simple-bus";
+   #address-cells = <1>;
+   #size-cells = <1>;
+   ranges = <0 MBUS_ID(0x03, 0x00) 0 0x10>;
+
+   packet-processor@0 {
+   compatible = "marvell,prestera-98dx3236";
+   reg = <0 0x400>;
+   interrupts = <33>, <34>, <35>;
+   dfx = <&dfx>;
+   };
+};
+
+DFX Server bindings
+-------------------
+
+Required properties:
+- compatible: must be "marvell,dfx-server"
+- reg: address and length of the register set for the device.
+
+Example:
+
+dfx-registers {
+   compatible = "simple-bus";
+   #address-cells = <1>;
+   #size-cells = <1>;
+   ranges = <0 MBUS_ID(0x08, 0x00) 0 0x10>;
+
+   dfx: dfx@0 {
+   compatible = "marvell,dfx-server";
+   reg = <0 0x10>;
+   };
+};
diff --git a/arch/arm/boot/dts/armada-xp-98dx3236.dtsi b/arch/arm/boot/dts/armada-xp-98dx3236.dtsi
new file mode 100644
index ..4b7b2fe3b682
--- /dev/null
+++ b/arch/arm/boot/dts/armada-xp-98dx3236.dtsi
@@ -0,0 +1,254 @@
+/*
+ * Device Tree Include file for Marvell 98dx3236 family SoC
+ *
+ * Copyright (C) 2016 Allied Telesis Labs
+ *
+ * This file is dual-licensed: you can use it either under the terms
+ * of the GPL or the X11 license, at your option. Note that this dual
+ * licensing only applies to this file, and not this project as a
+ * whole.
+ *
+ *  a) This file is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of the
+ * License, or (at your option) any later version.
+ *
+ * This file is distributed in the hope that it will be useful
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU

[PATCH net] net: Fix inconsistent rtnl_lock usage on dev_get_stats().

2017-01-05 Thread Michael Chan

Some callers take rtnl_lock() before calling dev_get_stats() and some
don't.  Most network drivers expect the ndo_get_stats64() to be called
under rtnl_lock() to avoid race conditions with device close or ethtool
reconfigurations.  Fix it so that all callers take rtnl_lock().

Rename the original dev_get_stats() as __dev_get_stats() and add a new
dev_get_stats() that takes rtnl_lock() before calling __dev_get_stats().
Modify all callers that already take rtnl_lock() to call __dev_get_stats().

Signed-off-by: Michael Chan 
---
 drivers/net/bonding/bond_main.c  |  4 ++--
 drivers/net/ethernet/hisilicon/hns/hns_ethtool.c |  2 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c |  2 +-
 drivers/net/ethernet/intel/ixgbevf/ethtool.c |  2 +-
 drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c |  2 +-
 include/linux/netdevice.h|  2 ++
 net/core/dev.c   | 19 ---
 net/core/rtnetlink.c |  4 ++--
 8 files changed, 26 insertions(+), 11 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 8029dd4..9a2fbea 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1509,7 +1509,7 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
 
slave_dev->priv_flags |= IFF_BONDING;
/* initialize slave stats */
-   dev_get_stats(new_slave->dev, &new_slave->slave_stats);
+   __dev_get_stats(new_slave->dev, &new_slave->slave_stats);
 
if (bond_is_lb(bond)) {
/* bond_alb_init_slave() must be called before all other stages since
@@ -3351,7 +3351,7 @@ static struct rtnl_link_stats64 *bond_get_stats(struct net_device *bond_dev,
rcu_read_lock();
bond_for_each_slave_rcu(bond, slave, iter) {
const struct rtnl_link_stats64 *new =
-   dev_get_stats(slave->dev, &temp);
+   __dev_get_stats(slave->dev, &temp);
 
bond_fold_stats(stats, new, &slave->slave_stats);
 
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_ethtool.c b/drivers/net/ethernet/hisilicon/hns/hns_ethtool.c
index 3ac2183..8396336 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_ethtool.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_ethtool.c
@@ -865,7 +865,7 @@ void hns_get_ethtool_stats(struct net_device *netdev,
 
h->dev->ops->update_stats(h, &netdev->stats);
 
-   net_stats = dev_get_stats(netdev, &temp);
+   net_stats = __dev_get_stats(netdev, &temp);
 
/* get netdev statistics */
p[0] = net_stats->rx_packets;
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
index fd192bf..f8097c4 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
@@ -1145,7 +1145,7 @@ static void ixgbe_get_ethtool_stats(struct net_device *netdev,
char *p = NULL;
 
ixgbe_update_stats(adapter);
-   net_stats = dev_get_stats(netdev, &temp);
+   net_stats = __dev_get_stats(netdev, &temp);
for (i = 0; i < IXGBE_GLOBAL_STATS_LEN; i++) {
switch (ixgbe_gstrings_stats[i].type) {
case NETDEV_STATS:
diff --git a/drivers/net/ethernet/intel/ixgbevf/ethtool.c b/drivers/net/ethernet/intel/ixgbevf/ethtool.c
index 508e72c..622ccad 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ethtool.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ethtool.c
@@ -406,7 +406,7 @@ static void ixgbevf_get_ethtool_stats(struct net_device *netdev,
char *p;
 
ixgbevf_update_stats(adapter);
-   net_stats = dev_get_stats(netdev, &temp);
+   net_stats = __dev_get_stats(netdev, &temp);
for (i = 0; i < IXGBEVF_GLOBAL_STATS_LEN; i++) {
switch (ixgbevf_gstrings_stats[i].type) {
case NETDEV_STATS:
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c b/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c
index 1b26e96..ea77de0 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c
@@ -270,7 +270,7 @@ static void nfp_net_get_stats(struct net_device *netdev,
int i, j, k;
u8 *p;
 
-   netdev_stats = dev_get_stats(netdev, &temp);
+   netdev_stats = __dev_get_stats(netdev, &temp);
 
for (i = 0; i < NN_ET_GLOBAL_STATS_LEN; i++) {
switch (nfp_net_et_stats[i].type) {
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 994f742..76bc92f 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3787,6 +3787,8 @@ static inline void __dev_mc_unsync(struct net_device *dev,
 void netdev_features_change(struct net_device *dev);
 /* Load a device via the kmod */
 void dev_load(struct net *net, const char *name);
+struct rtnl_link_stats64 *__dev_get_stats(struct net_device *dev,
+

[PATCHv2 0/5] Support for Marvell switches with integrated CPUs

2017-01-05 Thread Chris Packham

The 98DX3236, 98DX3336 and 98DX4251 are a set of switch ASICs with
integrated CPUs. The CPU block is common within these product lines and
(as far as I can tell/have been told) is based on the Armada XP. There
are a few differences due to the fact they have to squeeze the CPU into
the same package as the switch.

Chris Packham (4):
  clk: mvebu: support for 98DX3236 SoC
Changes in v2:
- Update devicetree binding documentation for new compatible string
Changes in v3:
- Add 98dx3236 support to mvebu/clk-corediv.c rather than creating a
  new driver.
- Document mv98dx3236-corediv-clock binding
  arm: mvebu: support for SMP on 98DX3336 SoC
Changes in v2:
- Document new enable-method value
- Correct some references from 98DX4521 to 98DX3236
Changes in v3:
- Simplify mv98dx3236_resume_init by using of_io_request_and_map()
  arm: mvebu: Add device tree for 98DX3236 SoCs
Changes in v2:
- Update devicetree binding documentation to reflect that 98DX3336 and
  98DX4251 are supersets of 98DX3236.
- disable crypto block
- disable sdio for 98DX3236, enable for 98DX4251
Changes in v3:
- fix typo 4521 -> 4251
- document prestera bindings
- rework corediv-clock binding
- add label to packet processor node
- add new compatible string for DFX server
  arm: mvebu: Add device tree for db-dxbc2 and db-xc3-24g4xg boards
Changes in v2/v3:
- none

Kalyan Kinthada (1):
  pinctrl: mvebu: pinctrl driver for 98DX3236 SoC
Changes in v2:
- include sdio support for the 98DX4251
Changes in v3:
- None


 Documentation/devicetree/bindings/arm/cpus.txt |   1 +
 .../bindings/arm/marvell/98dx3236-resume-ctrl.txt  |  18 ++
 .../devicetree/bindings/arm/marvell/98dx3236.txt   |  23 ++
 .../bindings/clock/mvebu-corediv-clock.txt |   1 +
 .../devicetree/bindings/clock/mvebu-cpu-clock.txt  |   1 +
 .../devicetree/bindings/net/marvell,prestera.txt   |  50 
 .../pinctrl/marvell,armada-98dx3236-pinctrl.txt|  46 
 arch/arm/boot/dts/armada-xp-98dx3236.dtsi  | 254 +
 arch/arm/boot/dts/armada-xp-98dx3336.dtsi  |  76 ++
 arch/arm/boot/dts/armada-xp-98dx4251.dtsi  |  90 
 arch/arm/boot/dts/db-dxbc2.dts | 159 +
 arch/arm/boot/dts/db-xc3-24g4xg.dts| 155 +
 arch/arm/mach-mvebu/Makefile   |   1 +
 arch/arm/mach-mvebu/common.h   |   1 +
 arch/arm/mach-mvebu/platsmp.c  |  43 
 arch/arm/mach-mvebu/pmsu-98dx3236.c|  52 +
 drivers/clk/mvebu/armada-xp.c  |  42 
 drivers/clk/mvebu/clk-corediv.c|  23 ++
 drivers/clk/mvebu/clk-cpu.c|  31 ++-
 drivers/pinctrl/mvebu/pinctrl-armada-xp.c  | 155 +
 20 files changed, 1220 insertions(+), 2 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/arm/marvell/98dx3236-resume-ctrl.txt
 create mode 100644 Documentation/devicetree/bindings/arm/marvell/98dx3236.txt
 create mode 100644 Documentation/devicetree/bindings/net/marvell,prestera.txt
 create mode 100644 Documentation/devicetree/bindings/pinctrl/marvell,armada-98dx3236-pinctrl.txt
 create mode 100644 arch/arm/boot/dts/armada-xp-98dx3236.dtsi
 create mode 100644 arch/arm/boot/dts/armada-xp-98dx3336.dtsi
 create mode 100644 arch/arm/boot/dts/armada-xp-98dx4251.dtsi
 create mode 100644 arch/arm/boot/dts/db-dxbc2.dts
 create mode 100644 arch/arm/boot/dts/db-xc3-24g4xg.dts
 create mode 100644 arch/arm/mach-mvebu/pmsu-98dx3236.c

Interdiff to v2:

diff --git a/Documentation/devicetree/bindings/clock/mvebu-corediv-clock.txt b/Documentation/devicetree/bindings/clock/mvebu-corediv-clock.txt
index 520562a7dc2a..c7b4e3a6b2c6 100644
--- a/Documentation/devicetree/bindings/clock/mvebu-corediv-clock.txt
+++ b/Documentation/devicetree/bindings/clock/mvebu-corediv-clock.txt
@@ -7,6 +7,7 @@ Required properties:
 - compatible : must be "marvell,armada-370-corediv-clock",
   "marvell,armada-375-corediv-clock",
   "marvell,armada-380-corediv-clock",
+   "marvell,mv98dx3236-corediv-clock",
 
 - reg : must be the register address of Core Divider control register
 - #clock-cells : from common clock binding; shall be set to 1
diff --git a/Documentation/devicetree/bindings/net/marvell,prestera.txt b/Documentation/devicetree/bindings/net/marvell,prestera.txt
new file mode 100644
index ..5fbab29718e8
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/marvell,prestera.txt
@@ -0,0 +1,50 @@
+Marvell Prestera Switch Chip bindings
+-------------------------------------
+
+Required properties:
+- compatible: one of the following
+   "marvell,prestera-98dx3236",
+   "marvell,prestera-98dx3336",
+   "marvell,prestera-98dx4251",
+- reg: address and length of the register set for the device.
+-

[PATCHv2 net-next] cxgb4: Synchronize access to mailbox

2017-01-05 Thread Hariprasad Shenai

The issue arises when multiple threads attempt to use the mailbox
facility at the same time. When DCB operations and interface up/down
are run in a loop every 0.1 sec, we observed mailbox collisions, and
with the present code one of the two commands would fail, since we
don't queue the second command.

To overcome this issue, add a queue for mailbox access. Whenever a
mailbox command is issued, add it to the queue. If it is at the head,
issue the mailbox command; otherwise, wait for the existing command to
complete. A command usually takes less than a millisecond to complete.

Also time out of the loop if the command under execution takes too
long to run.

In practice, mailbox access collisions should be very rare, since no
one normally runs such an abusive script.

Signed-off-by: Hariprasad Shenai 
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h  |  8 
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c |  3 ++
 drivers/net/ethernet/chelsio/cxgb4/t4_hw.c  | 59 -
 3 files changed, 69 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
index 0bce1bf9ca0f..78a852c72f5d 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
@@ -782,6 +782,10 @@ struct vf_info {
bool pf_set_mac;
 };
 
+struct mbox_list {
+   struct list_head list;
+};
+
 struct adapter {
void __iomem *regs;
void __iomem *bar2;
@@ -844,6 +848,10 @@ struct adapter {
struct work_struct db_drop_task;
bool tid_release_task_busy;
 
+   /* lock for mailbox cmd list */
+   spinlock_t mbox_lock;
+   struct mbox_list mlist;
+
/* support for mailbox command/reply logging */
 #define T4_OS_LOG_MBOX_CMDS 256
struct mbox_cmd_log *mbox_log;
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index 6f951877430b..9d2fe5140b88 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -4707,6 +4707,9 @@ static int init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
spin_lock_init(&adapter->stats_lock);
spin_lock_init(&adapter->tid_release_lock);
spin_lock_init(&adapter->win0_lock);
+   spin_lock_init(&adapter->mbox_lock);
+
+   INIT_LIST_HEAD(&adapter->mlist.list);
 
INIT_WORK(&adapter->tid_release_task, process_tid_release_list);
INIT_WORK(&adapter->db_full_task, process_db_full);
diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
index e8139514d32c..7ac6ea531b0f 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
@@ -284,6 +284,7 @@ int t4_wr_mbox_meat_timeout(struct adapter *adap, int mbox, const void *cmd,
1, 1, 3, 5, 10, 10, 20, 50, 100, 200
};
 
+   struct mbox_list entry;
u16 access = 0;
u16 execute = 0;
u32 v;
@@ -311,11 +312,61 @@ int t4_wr_mbox_meat_timeout(struct adapter *adap, int mbox, const void *cmd,
timeout = -timeout;
}
 
+   /* Queue ourselves onto the mailbox access list.  When our entry is at
+* the front of the list, we have rights to access the mailbox.  So we
+* wait [for a while] till we're at the front [or bail out with an
+* EBUSY] ...
+*/
+   spin_lock(&adap->mbox_lock);
+   list_add_tail(&entry.list, &adap->mlist.list);
+   spin_unlock(&adap->mbox_lock);
+
+   delay_idx = 0;
+   ms = delay[0];
+
+   for (i = 0; ; i += ms) {
+   /* If we've waited too long, return a busy indication.  This
+* really ought to be based on our initial position in the
+* mailbox access list but this is a start.  We very rearely
+* contend on access to the mailbox ...
+*/
+   if (i > FW_CMD_MAX_TIMEOUT) {
+   spin_lock(&adap->mbox_lock);
+   list_del(&entry.list);
+   spin_unlock(&adap->mbox_lock);
+   ret = -EBUSY;
+   t4_record_mbox(adap, cmd, size, access, ret);
+   return ret;
+   }
+
+   /* If we're at the head, break out and start the mailbox
+* protocol.
+*/
+   if (list_first_entry(&adap->mlist.list, struct mbox_list,
+list) == &entry)
+   break;
+
+   /* Delay for a bit before checking again ... */
+   if (sleep_ok) {
+   ms = delay[delay_idx];  /* last element may repeat */
+   if (delay_idx < ARRAY_SIZE(delay) - 1)
+   delay_idx++;
+   msleep(ms);
+   } else {
+   mdelay(ms);
+   }
+   }
+
+
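
The queueing scheme in the hunk above (append to a FIFO access list, poll until our entry is at the head, back off using the delay[] table, give up with -EBUSY once the accumulated wait exceeds the limit) can be modelled in userspace as a rough sketch. This is not driver code: the hand-rolled singly linked list stands in for struct mbox_list/struct list_head, locking is omitted because the model is single-threaded, and the timeout is passed in as a parameter rather than taken from FW_CMD_MAX_TIMEOUT, whose value is not shown in this excerpt. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mbox_list / struct list_head. */
struct mbox_waiter {
	struct mbox_waiter *next;
};

struct mbox_queue {
	struct mbox_waiter *head;
	struct mbox_waiter *tail;
};

/* Like list_add_tail(): join the end of the access list. */
static void mbox_enqueue(struct mbox_queue *q, struct mbox_waiter *w)
{
	w->next = NULL;
	if (q->tail)
		q->tail->next = w;
	else
		q->head = w;
	q->tail = w;
}

/* Only the entry at the front may run the mailbox protocol. */
static int mbox_is_head(const struct mbox_queue *q,
			const struct mbox_waiter *w)
{
	return q->head == w;
}

/* Like list_del(): leave the list on completion or on timeout. */
static void mbox_remove(struct mbox_queue *q, struct mbox_waiter *w)
{
	struct mbox_waiter *prev = NULL, *cur = q->head;

	while (cur && cur != w) {
		prev = cur;
		cur = cur->next;
	}
	assert(cur != NULL);		/* w must be on the list */
	if (prev)
		prev->next = w->next;
	else
		q->head = w->next;
	if (q->tail == w)
		q->tail = prev;
	w->next = NULL;
}

/* Backoff table from the patch; the last element repeats. */
static const int mbox_delay[] = { 1, 1, 3, 5, 10, 10, 20, 50, 100, 200 };

/* Replays the sleep_ok branch of the wait loop for a waiter that never
 * reaches the head; returns the accumulated wait i at which the
 * i > max_timeout_ms check fires (where the patch returns -EBUSY). */
static int wait_until_busy(int max_timeout_ms)
{
	const int n = (int)(sizeof(mbox_delay) / sizeof(mbox_delay[0]));
	int i, ms = mbox_delay[0], delay_idx = 0;

	for (i = 0; ; i += ms) {
		if (i > max_timeout_ms)
			return i;
		ms = mbox_delay[delay_idx];
		if (delay_idx < n - 1)
			delay_idx++;
	}
}
```

With an assumed 10000 ms limit, a waiter that never reaches the head gives up at 10200 ms of accumulated delay, because the 200 ms step keeps repeating once the table is exhausted. The FIFO list gives first-come, first-served access, which a plain spinlock around the whole mailbox exchange would not guarantee.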

RE: [PATCH 1/1] igb: Fix hw_dbg logging in igb_update_flash_i210

2017-01-05 Thread Brown, Aaron F

> From: netdev-ow...@vger.kernel.org [mailto:netdev-
> ow...@vger.kernel.org] On Behalf Of Peter Senna Tschudin
> Sent: Monday, January 2, 2017 9:26 AM
> Cc: Hannu Lounento ; Peter Senna Tschudin
> ; Kirsher, Jeffrey T
> ; moderated list:INTEL ETHERNET DRIVERS
> ; open list:NETWORKING DRIVERS
> ; open list 
> Subject: [PATCH 1/1] igb: Fix hw_dbg logging in igb_update_flash_i210
> 
> From: Hannu Lounento 
> 
> Fix an if statement with hw_dbg lines where the logic was inverted with
> regard to the corresponding return value used in the if statement.
> 
> Signed-off-by: Hannu Lounento 
> Signed-off-by: Peter Senna Tschudin 
> ---
>  drivers/net/ethernet/intel/igb/e1000_i210.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Tested-by: Aaron Brown

Re: [PATCH] net:phy fix driver reference count error when attach and detach phy device

2017-01-05 Thread maowenan



On 2017/1/6 11:21, Florian Fainelli wrote:
> +Andrew,
> 
> On 01/05/17 at 18:29, maowenan wrote:
 @Florian Fainelli, what are your comments on this patch?
>>>
>>> I am trying to reproduce what you are seeing, but at first glance it looks like an
>>> appropriate solution to me. Do you mind giving me a couple more days?
>>>
>>> Thanks!
>>> --
>>> Florian
>>
>> Hi Florian, 
>>   Do you have any update about this patch?
> 
> Your patch is not complete: there are now MDIO devices (which PHY devices
> are a superset of) that would also need a similar fix.
> 
OK. Is there already a patch to fix MDIO? If not, I will verify the issue and send a
fix patch.

Re: [PATCH 1/2] net: ipv4: Simplify rt_fill_info

2017-01-05 Thread David Ahern

On 1/5/17 8:32 PM, David Ahern wrote:
> rt_fill_info has only 1 caller and both of the last 2 args -- nowait
> and flags -- are hardcoded to 0. Given that, remove them as input arguments
> and simplify rt_fill_info accordingly.
> 
> Signed-off-by: David Ahern 
> ---
>  net/ipv4/route.c | 20 +++-
>  1 file changed, 7 insertions(+), 13 deletions(-)

Forgot to update the subject line; this is a standalone patch for net-next. Sorry.

[PATCH net-next] net: ipv4: make fib_select_default static

2017-01-05 Thread David Ahern

fib_select_default has a single caller within the same file.
Make it static.

Signed-off-by: David Ahern 
---
 include/net/ip_fib.h | 1 -
 net/ipv4/fib_semantics.c | 2 +-
 2 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index 5f376af377c7..57c2a863d0b2 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -344,7 +344,6 @@ __be32 fib_compute_spec_dst(struct sk_buff *skb);
 int fib_validate_source(struct sk_buff *skb, __be32 src, __be32 dst,
u8 tos, int oif, struct net_device *dev,
struct in_device *idev, u32 *itag);
-void fib_select_default(const struct flowi4 *flp, struct fib_result *res);
 #ifdef CONFIG_IP_ROUTE_CLASSID
 static inline int fib_num_tclassid_users(struct net *net)
 {
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index 7a5b4c7d9a87..05c911d21782 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -1434,7 +1434,7 @@ int fib_sync_down_dev(struct net_device *dev, unsigned long event, bool force)
 }
 
 /* Must be invoked inside of an RCU protected region.  */
-void fib_select_default(const struct flowi4 *flp, struct fib_result *res)
+static void fib_select_default(const struct flowi4 *flp, struct fib_result *res)
 {
struct fib_info *fi = NULL, *last_resort = NULL;
struct hlist_head *fa_head = res->fa_head;
-- 
2.1.4

[PATCH 1/2] net: ipv4: Simplify rt_fill_info

2017-01-05 Thread David Ahern

rt_fill_info has only 1 caller and both of the last 2 args -- nowait
and flags -- are hardcoded to 0. Given that, remove them as input arguments
and simplify rt_fill_info accordingly.

Signed-off-by: David Ahern 
---
 net/ipv4/route.c | 20 +++-
 1 file changed, 7 insertions(+), 13 deletions(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 0fcac8e7a2b2..7b52ac20145b 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2454,7 +2454,7 @@ EXPORT_SYMBOL_GPL(ip_route_output_flow);
 
 static int rt_fill_info(struct net *net,  __be32 dst, __be32 src, u32 table_id,
struct flowi4 *fl4, struct sk_buff *skb, u32 portid,
-   u32 seq, int event, int nowait, unsigned int flags)
+   u32 seq, int event)
 {
struct rtable *rt = skb_rtable(skb);
struct rtmsg *r;
@@ -2463,7 +2463,7 @@ static int rt_fill_info(struct net *net,  __be32 dst, __be32 src, u32 table_id,
u32 error;
u32 metrics[RTAX_MAX];
 
-   nlh = nlmsg_put(skb, portid, seq, event, sizeof(*r), flags);
+   nlh = nlmsg_put(skb, portid, seq, event, sizeof(*r), 0);
if (!nlh)
return -EMSGSIZE;
 
@@ -2541,18 +2541,12 @@ static int rt_fill_info(struct net *net,  __be32 dst, __be32 src, u32 table_id,
IPV4_DEVCONF_ALL(net, MC_FORWARDING)) {
int err = ipmr_get_route(net, skb,
 fl4->saddr, fl4->daddr,
-r, nowait, portid);
+r, 0, portid);
 
if (err <= 0) {
-   if (!nowait) {
-   if (err == 0)
-   return 0;
-   goto nla_put_failure;
-   } else {
-   if (err == -EMSGSIZE)
-   goto nla_put_failure;
-   error = err;
-   }
+   if (err == 0)
+   return 0;
+   goto nla_put_failure;
}
} else
 #endif
@@ -2665,7 +2659,7 @@ static int inet_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh)
 
err = rt_fill_info(net, dst, src, table_id, &fl4, skb,
   NETLINK_CB(in_skb).portid, nlh->nlmsg_seq,
-  RTM_NEWROUTE, 0, 0);
+  RTM_NEWROUTE);
if (err < 0)
goto errout_free;
 
-- 
2.1.4

Re: [RFC PATCH] virtio_net: XDP support for adjust_head

2017-01-05 Thread John Fastabend

On 17-01-05 04:39 PM, Michael S. Tsirkin wrote:
> On Thu, Jan 05, 2017 at 02:57:23PM -0800, John Fastabend wrote:
>> On 17-01-03 02:16 PM, Michael S. Tsirkin wrote:
>>> On Tue, Jan 03, 2017 at 02:01:27PM +0800, Jason Wang wrote:


 On 2017-01-03 03:44, John Fastabend wrote:
> Add support for XDP adjust head by allocating a 256B header region
> that XDP programs can grow into. This is only enabled when a XDP
> program is loaded.
>
> In order to ensure that we do not have to unwind queue headroom push
> queue setup below bpf_prog_add. It reads better to do a prog ref
> unwind vs another queue setup call.
>
> : There is a problem with this patch as is. When xdp prog is loaded
>the old buffers without the 256B headers need to be flushed so that
>the bpf prog has the necessary headroom. This patch does this by
>calling the virtqueue_detach_unused_buf() and followed by the
>virtnet_set_queues() call to reinitialize the buffers. However I
>don't believe this is safe per comment in virtio_ring this API
>is not valid on an active queue and the only thing we have done
>here is napi_disable/napi_enable wrappers which doesn't do anything
>to the emulation layer.
>
>So the RFC is really to find the best solution to this problem.
>A couple things come to mind, (a) always allocate the necessary
>headroom but this is a bit of a waste (b) add some bit somewhere
>to check if the buffer has headroom but this would mean XDP programs
>would be broken for a cycle through the ring, (c) figure out how
>to deactivate a queue, free the buffers and finally reallocate.
>I think (c) is the best choice for now but I'm not seeing the
>API to do this so virtio/qemu experts anyone know off-hand
>how to make this work? I started looking into the PCI callbacks
>reset() and virtio_device_ready() or possibly hitting the right
>set of bits with vp_set_status() but my first attempt just hung
>the device.

 Hi John:

 AFAIK, disabling a specific queue was supported only by virtio 1.0 through
 queue_enable field in pci common cfg.
>>>
>>> In fact 1.0 only allows enabling queues selectively.
>>> We can add disabling by a spec enhancement but
>>> for now reset is the only way.
>>>
>>>
 But unfortunately, qemu does not
 emulate this at all and legacy device does not even support this. So the
 safe way is probably to reset the device and redo the initialization here.
>>>
>>> You will also have to re-apply rx filtering if you do this.
>>> Probably sending notification uplink.
>>>
>>
>> The following seems to hang the device on the next virtnet_send_command()
>> I expected this to meet the reset requirements from the spec because I
>> believe it's the same flow coming out of restore(). For a real patch we
>> don't actually need to kfree all the structs and reallocate them but
>> I was expecting the below to work. Any ideas/hints?
> 
> Restore assumes device was previously reset.
> You want to combine freeze+restore.

Yep, the below is actually freeze+restore; I just omitted the freeze portion
from the description.

> 
>> static int virtnet_xdp_reset(struct virtnet_info *vi)
>> {
>> int i, ret;
>>

// Inserting flush_work here doesn't seem to help with the hang.

>> netif_device_detach(vi->dev);
>> cancel_delayed_work_sync(&vi->refill);
>> if (netif_running(vi->dev)) {
>> for (i = 0; i < vi->max_queue_pairs; i++)
>> napi_disable(&vi->rq[i].napi);
>> }
>>
>> remove_vq_common(vi, false);

// everything above is freeze sans flush_work and virtnet_cpu_notif_remove
// the rest below is restore where I call virtnet_set_queues later outside
// the reset function.

>> ret = init_vqs(vi);
>> if (ret)
>> return ret;
>> virtio_device_ready(vi->vdev);
>>
>> if (netif_running(vi->dev)) {
>> for (i = 0; i < vi->curr_queue_pairs; i++)
>> if (!try_fill_recv(vi, &vi->rq[i], GFP_KERNEL))
>> schedule_delayed_work(&vi->refill, 0);
>>
>> for (i = 0; i < vi->max_queue_pairs; i++)
>> virtnet_napi_enable(&vi->rq[i]);
>> }
>> netif_device_attach(vi->dev);
>> return 0;
>> }

Could be a locking problem I'm missing, so I'll look at it a bit more, but
it's good to know we expect freeze/restore to reset the device.

.John

Re: [PATCH net-next] cxgb4: Synchronize access to mailbox

2017-01-05 Thread Hariprasad S




> On 05-Jan-2017, at 9:35 PM, David Miller  wrote:
> 
> From: Hariprasad Shenai 
> Date: Thu,  5 Jan 2017 11:23:10 +0530
> 
>> @@ -844,6 +848,10 @@ struct adapter {
>>struct work_struct db_drop_task;
>>bool tid_release_task_busy;
>> 
>> +/* lock for mailbox cmd list */
>> +spinlock_t mbox_lock;
>> +struct mbox_list mlist;
>> +
> ...
>> @@ -4707,6 +4707,9 @@ static int init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
>>spin_lock_init(&adapter->stats_lock);
>>spin_lock_init(&adapter->tid_release_lock);
>>spin_lock_init(&adapter->win0_lock);
>> +spin_lock_init(&adapter->mbox_lock);
>> +
>> +INIT_LIST_HEAD(&adapter->mbox_list.list);
> 
> It is absolutely impossible that you even compiled this code.

My bad. It looks like I sent the wrong patch. I will send a V2.

Re: [PATCH] net:phy fix driver reference count error when attach and detach phy device

2017-01-05 Thread Florian Fainelli

+Andrew,

On 01/05/17 at 18:29, maowenan wrote:
>>> @Florian Fainelli, what are your comments on this patch?
>>
>> I am trying to reproduce what you are seeing, but at first glance it looks like an
>> appropriate solution to me. Do you mind giving me a couple more days?
>>
>> Thanks!
>> --
>> Florian
> 
> Hi Florian, 
>   Do you have any update about this patch?

Your patch is not complete: there are now MDIO devices (which PHY devices
are a superset of) that would also need a similar fix.
-- 
Florian

Re: [PATCH v4 0/4] vsock: cancel connect packets when failing to connect

2017-01-05 Thread Peng Tao

On Tue, Dec 13, 2016 at 5:50 PM, Stefan Hajnoczi  wrote:
> On Mon, Dec 12, 2016 at 08:21:05PM +0800, Peng Tao wrote:
>> Currently, if a connect call fails on a signal or timeout (e.g., the guest is
>> still in the process of starting up), we just return to the caller and leave
>> the connect packet queued, and it is sent even though the connection is
>> considered a failure, which can confuse applications with an unwanted false
>> connect attempt.
>>
>> The patchset enables vsock (both host and guest) to cancel queued packets
>> when a connect attempt is considered to have failed.
>>
>> v4 changelog:
>>   - drop two unnecessary void * casts
>>   - update new callback comment
>> v3 changelog:
>>   - define cancel_pkt callback in struct vsock_transport rather than struct virtio_transport
>>   - rename virtio_vsock_pkt->vsk to virtio_vsock_pkt->cancel_token
>> v2 changelog:
>>   - fix queued_replies counting and resume tx/rx when necessary
>>
>> Cheers,
>> Tao
>>
>> Peng Tao (4):
>>   vsock: track pkt owner vsock
>>   vhost-vsock: add pkt cancel capability
>>   vsock: add pkt cancel capability
>>   vsock: cancel packets when failing to connect
>>
>>  drivers/vhost/vsock.c                   | 41
>>  include/linux/virtio_vsock.h            |  2 ++
>>  include/net/af_vsock.h                  |  3 +++
>>  net/vmw_vsock/af_vsock.c                | 14 +++
>>  net/vmw_vsock/virtio_transport.c        | 42 +
>>  net/vmw_vsock/virtio_transport_common.c |  7 ++
>>  6 files changed, 109 insertions(+)
>>
>> --
>> 2.7.4
>>
>> ___
>> Virtualization mailing list
>> virtualizat...@lists.linux-foundation.org
>> https://lists.linuxfoundation.org/mailman/listinfo/virtualization
>
> Reviewed-by: Stefan Hajnoczi 

Ping.

It looks like the patchset has been reviewed but not yet merged. Is anything blocking it?

Cheers,
Tao
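
The idea of the series, tagging each queued packet with an owner token and dropping every still-queued packet that carries the token of a socket whose connect failed, can be sketched as a list filter. The struct and function below are hypothetical; the series itself uses virtio_vsock_pkt->cancel_token and a cancel_pkt callback in struct vsock_transport, and also has to fix up queued_replies, which this sketch leaves out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical queued packet; cancel_token identifies the owner socket. */
struct pkt_model {
	const void *cancel_token;
	struct pkt_model *next;
};

/* Unlink every queued packet owned by token; returns how many were
 * removed. Freeing them is left to the caller in this sketch. */
static int cancel_pkts(struct pkt_model **queue, const void *token)
{
	struct pkt_model **pp;
	int cancelled = 0;

	assert(queue != NULL);
	for (pp = queue; *pp; ) {
		if ((*pp)->cancel_token == token) {
			*pp = (*pp)->next;	/* splice the packet out */
			cancelled++;
		} else {
			pp = &(*pp)->next;
		}
	}
	return cancelled;
}
```

Because the token is compared by pointer only, it never has to be dereferenced after the socket gives up, which is presumably why an opaque token is enough.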

RE: [PATCH] net:phy fix driver reference count error when attach and detach phy device

2017-01-05 Thread maowenan



> -Original Message-
> From: Florian Fainelli [mailto:f.faine...@gmail.com]
> Sent: Tuesday, December 13, 2016 12:33 AM
> To: maowenan; David Laight; netdev@vger.kernel.org; Dingtianhong;
> weiyongjun (A)
> Subject: Re: [PATCH] net:phy fix driver reference count error when attach and
> detach phy device
> 
> On 12/12/2016 12:49 AM, maowenan wrote:
> >
> >
> > On 2016/12/5 16:47, maowenan wrote:
> >>
> >>
> >> On 2016/12/2 17:45, David Laight wrote:
> >>> From: Mao Wenan
>  Sent: 30 November 2016 10:23
>  The NIC on my board uses a Marvell PHY, and the system
>  loads the Marvell PHY driver automatically, but when I remove
>  the PHY driver, the system immediately panics:
>  Call trace:
>  [ 2582.834493] [] phy_state_machine+0x3c/0x438
>  [ 2582.851754] [] process_one_work+0x150/0x428
>  [ 2582.868188] [] worker_thread+0x144/0x4b0
>  [ 2582.883882] [] kthread+0xfc/0x110
> 
>  There should be proper reference counting in place to avoid that.
>  I found that phy_attach_direct() does not take a reference on the PHY
>  device driver module, and phy_detach() does not drop one.
>  This patch fixes the bug; with it applied, the panic no longer occurs
>  when removing marvell.ko.
> 
>  Signed-off-by: Mao Wenan 
>  ---
>   drivers/net/phy/phy_device.c | 7 +++
>   1 file changed, 7 insertions(+)
> 
>  diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
>  index 1a4bf8a..a7ec7c2 100644
>  --- a/drivers/net/phy/phy_device.c
>  +++ b/drivers/net/phy/phy_device.c
>  @@ -866,6 +866,11 @@ int phy_attach_direct(struct net_device *dev, struct phy_device *phydev,
>   return -EIO;
>   }
> 
>  +if (!try_module_get(d->driver->owner)) {
>  +dev_err(&dev->dev, "failed to get the device driver module\n");
>  +return -EIO;
>  +}
> >>>
> >>> If this is the phy code, what stops the phy driver being unloaded
> >>> before the try_module_get() obtains a reference?
> >>> If it isn't the phy driver then there ought to be a reference count
> >>> obtained when the phy driver is located (by whatever decides which phy
> >>> driver to use).
> >>> Even if that code later releases its reference (it probably
> >>> shouldn't on success) then you can't fail to get an extra reference here.
> >>
> >> [Mao Wenan] Yes, this is PHY code, in phy_attach_direct() in
> >> drivers/net/phy/phy_device.c.
> >> When a NIC driver probes, it attaches a matching PHY driver.
> >> phy_attach_direct() obtains a reference on the PHY driver and binds it.
> >> If try_module_get() succeeds, the reference count is incremented; if it
> >> fails, the driver cannot be attached to this NIC and no reference is
> >> taken. So until try_module_get() obtains a reference, the PHY driver
> >> cannot be bound to this NIC.
> >> Once the PHY driver is attached to the NIC, the reference count has been
> >> incremented, so an attempt to remove the PHY driver directly fails
> >> because the reference count is not 0.
> >>
> >> An example call trace of a NIC driver attaching a PHY driver:
> >> hns_nic_dev_probe->hns_nic_try_get_ae->hns_nic_init_phy->of_phy_connect->phy_connect_direct->phy_attach_direct
> >>
> >> Consider the steps of adding and removing the PHY driver (marvell.ko)
> >> and the NIC driver (hns_enet_drv.ko):
> >> 1) insmod marvell        ref=0
> >> 2) insmod hns_enet_drv   ref=1
> >> 3) rmmod marvell         (should fail, ref=1)
> >> 4) rmmod hns_enet_drv    ref=0
> >> 5) rmmod marvell         (should succeed, because ref=0)
> >>
> >> If we don't take the reference in phy_attach_direct() (so ref=0 after
> >> step 2), then rmmod marvell in step 3 panics, because the marvell
> >> driver still has a user and phy_state_machine() uses the NULL drv
> >> pointer.
> >>
> >>>
>  +
>   get_device(d);
> 
>   /* Assume that if there is no driver, that it doesn't
>  @@ -921,6 +926,7 @@ int phy_attach_direct(struct net_device *dev, struct phy_device *phydev,
> 
>   error:
>   put_device(d);
>  +module_put(d->driver->owner);
> >>>
> >>> Are those two in the wrong order ?
> >>>
>   module_put(bus->owner);
>   return err;
>   }
>  @@ -998,6 +1004,7 @@ void phy_detach(struct phy_device *phydev)
>   bus = phydev->mdio.bus;
> 
>   put_device(&phydev->mdio.dev);
>  +module_put(phydev->mdio.dev.driver->owner);
>   module_put(bus->owner);
> >>>
> >>> Where is this code called from?
> >>> You can't call it from the phy driver because the driver can be
> >>> unloaded as soon as the last reference is removed.
> >>> At that point the code memory is freed.
> >>
> >> [Mao Wenan] it is called by NIC when it
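
The insmod/rmmod sequence in steps 1) to 5) quoted above is essentially a reference count that blocks unloading. A minimal userspace model of it follows, assuming nothing about the real module loader: model_try_get() and model_put() are hypothetical stand-ins for try_module_get() and module_put(), and the -EBUSY returned by model_rmmod() is the model's own choice of error code, not necessarily what delete_module() reports.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a loadable module and its refcount. */
struct module_model {
	bool loaded;
	int refcnt;
};

/* Like try_module_get(): fails once the module is gone. */
static bool model_try_get(struct module_model *m)
{
	if (!m->loaded)
		return false;
	m->refcnt++;
	return true;
}

/* Like module_put(): drop a reference taken above. */
static void model_put(struct module_model *m)
{
	assert(m->refcnt > 0);
	m->refcnt--;
}

/* Unloading is refused while references are held (-EBUSY is the
 * model's choice of error code). */
static int model_rmmod(struct module_model *m)
{
	if (m->refcnt > 0)
		return -EBUSY;
	m->loaded = false;
	return 0;
}
```

Without the model_try_get() call in step 2), step 3) would succeed and leave the attached PHY pointing at an unloaded driver, which is the panic described in the thread.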

[PATCH V4 net-next 3/3] tun: rx batching

2017-01-05 Thread Jason Wang

We can only process one packet at a time during sendmsg(). This often
leads to bad cache utilization under heavy load. So this patch tries to
do some batching during rx before submitting the packets to the host
network stack. This is done by accepting MSG_MORE as a hint from the
sendmsg() caller: if it is set, the packet is batched temporarily in a
linked list, and all batched packets are submitted once MSG_MORE is
cleared.

Tests were done by pktgen (burst=128) in guest over mlx4(noqueue) on host:

                             Mpps  -+%
rx-frames = 0                0.91  +0%
rx-frames = 4                1.00  +9.8%
rx-frames = 8                1.00  +9.8%
rx-frames = 16               1.01  +10.9%
rx-frames = 32               1.07  +17.5%
rx-frames = 48               1.07  +17.5%
rx-frames = 64               1.08  +18.6%
rx-frames = 64 (no MSG_MORE) 0.91  +0%

Users are allowed to change the number of batched packets per device
through ethtool -C rx-frames. NAPI_POLL_WEIGHT is used as the upper limit
to prevent bh from being disabled for too long.

Signed-off-by: Jason Wang 
---
 drivers/net/tun.c | 76 ++-
 1 file changed, 70 insertions(+), 6 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index cd8e02c..6c93926 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -218,6 +218,7 @@ struct tun_struct {
struct list_head disabled;
void *security;
u32 flow_count;
+   u32 rx_batched;
struct tun_pcpu_stats __percpu *pcpu_stats;
 };
 
@@ -522,6 +523,7 @@ static void tun_queue_purge(struct tun_file *tfile)
while ((skb = skb_array_consume(&tfile->tx_array)) != NULL)
kfree_skb(skb);
 
+   skb_queue_purge(&tfile->sk.sk_write_queue);
skb_queue_purge(&tfile->sk.sk_error_queue);
 }
 
@@ -1140,10 +1142,45 @@ static struct sk_buff *tun_alloc_skb(struct tun_file *tfile,
return skb;
 }
 
+static void tun_rx_batched(struct tun_struct *tun, struct tun_file *tfile,
+  struct sk_buff *skb, int more)
+{
+   struct sk_buff_head *queue = &tfile->sk.sk_write_queue;
+   struct sk_buff_head process_queue;
+   u32 rx_batched = tun->rx_batched;
+   bool rcv = false;
+
+   if (!rx_batched || (!more && skb_queue_empty(queue))) {
+   local_bh_disable();
+   netif_receive_skb(skb);
+   local_bh_enable();
+   return;
+   }
+
+   spin_lock(&queue->lock);
+   if (!more || skb_queue_len(queue) == rx_batched) {
+   __skb_queue_head_init(&process_queue);
+   skb_queue_splice_tail_init(queue, &process_queue);
+   rcv = true;
+   } else {
+   __skb_queue_tail(queue, skb);
+   }
+   spin_unlock(&queue->lock);
+
+   if (rcv) {
+   struct sk_buff *nskb;
+   local_bh_disable();
+   while ((nskb = __skb_dequeue(&process_queue)))
+   netif_receive_skb(nskb);
+   netif_receive_skb(skb);
+   local_bh_enable();
+   }
+}
+
 /* Get packet from user space buffer */
 static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
void *msg_control, struct iov_iter *from,
-   int noblock)
+   int noblock, bool more)
 {
struct tun_pi pi = { 0, cpu_to_be16(ETH_P_IP) };
struct sk_buff *skb;
@@ -1283,10 +1320,9 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
skb_probe_transport_header(skb, 0);
 
rxhash = skb_get_hash(skb);
+
 #ifndef CONFIG_4KSTACKS
-   local_bh_disable();
-   netif_receive_skb(skb);
-   local_bh_enable();
+   tun_rx_batched(tun, tfile, skb, more);
 #else
netif_rx_ni(skb);
 #endif
@@ -1312,7 +1348,8 @@ static ssize_t tun_chr_write_iter(struct kiocb *iocb, struct iov_iter *from)
if (!tun)
return -EBADFD;
 
-   result = tun_get_user(tun, tfile, NULL, from, file->f_flags & O_NONBLOCK);
+   result = tun_get_user(tun, tfile, NULL, from,
+ file->f_flags & O_NONBLOCK, false);
 
tun_put(tun);
return result;
@@ -1570,7 +1607,8 @@ static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
return -EBADFD;
 
ret = tun_get_user(tun, tfile, m->msg_control, &m->msg_iter,
-  m->msg_flags & MSG_DONTWAIT);
+  m->msg_flags & MSG_DONTWAIT,
+  m->msg_flags & MSG_MORE);
tun_put(tun);
return ret;
 }
@@ -1771,6 +1809,7 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
tun->align = NET_SKB_PAD;
tun->filter_attached = false;
tun->sndbuf = tfile->socket.sk->sk_sndbuf;
+   tun->rx_batched = 0;
 
tun->pcpu_stats = netdev_alloc_pcpu_stats(struct
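
The decision logic in tun_rx_batched() can be traced with a single-threaded userspace model: a packet is delivered immediately when batching is disabled, or when MSG_MORE is clear and nothing is queued; otherwise it is queued until MSG_MORE is cleared or rx_batched packets are already queued, at which point the queue is flushed, followed by the current packet. The struct and integer packet ids are hypothetical, locking and bh handling are omitted, and the 64-entry bound assumes rx_batched never exceeds NAPI_POLL_WEIGHT.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_MAX_BATCH 64	/* assumes rx_batched <= NAPI_POLL_WEIGHT (64) */
#define MODEL_MAX_OUT 256

/* Hypothetical model of one tun queue; packets are plain integer ids. */
struct rx_model {
	unsigned int rx_batched;		/* 0 disables batching */
	int queue[MODEL_MAX_BATCH];		/* held-back packets */
	unsigned int qlen;
	int delivered[MODEL_MAX_OUT];		/* netif_receive_skb() order */
	unsigned int ndelivered;
};

static void model_deliver(struct rx_model *m, int id)
{
	assert(m->ndelivered < MODEL_MAX_OUT);
	m->delivered[m->ndelivered++] = id;
}

/* Same branches as tun_rx_batched(); locking and bh are left out. */
static void model_rx_batched(struct rx_model *m, int id, bool more)
{
	unsigned int i;

	assert(m->rx_batched <= MODEL_MAX_BATCH);
	if (!m->rx_batched || (!more && m->qlen == 0)) {
		model_deliver(m, id);
		return;
	}
	if (!more || m->qlen == m->rx_batched) {
		for (i = 0; i < m->qlen; i++)
			model_deliver(m, m->queue[i]);	/* queued packets first */
		m->qlen = 0;
		model_deliver(m, id);			/* then the current one */
	} else {
		m->queue[m->qlen++] = id;
	}
}
```

Note that with rx_batched = 4 the flush happens on the fifth packet sent with MSG_MORE: the four queued ones plus the packet that found the queue full.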

[PATCH V4 net-next 1/3] vhost: better detection of available buffers

2017-01-05 Thread Jason Wang

This patch tries to do several tweaks on vhost_vq_avail_empty() for
better performance:

- check the cached avail index first, which can avoid a userspace memory access.
- use unlikely() for the failure of the userspace access.
- compare against vq->last_avail_idx instead of the cached avail index as
  the last step.

This patch is needed for batching support, which needs to peek whether
or not there are still available buffers in the ring.

Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Jason Wang 
---
 drivers/vhost/vhost.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index d643260..9f11838 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -2241,11 +2241,15 @@ bool vhost_vq_avail_empty(struct vhost_dev *dev, struct vhost_virtqueue *vq)
__virtio16 avail_idx;
int r;
 
+   if (vq->avail_idx != vq->last_avail_idx)
+   return false;
+
r = vhost_get_user(vq, avail_idx, &vq->avail->idx);
-   if (r)
+   if (unlikely(r))
return false;
+   vq->avail_idx = vhost16_to_cpu(vq, avail_idx);
 
-   return vhost16_to_cpu(vq, avail_idx) == vq->avail_idx;
+   return vq->avail_idx == vq->last_avail_idx;
 }
 EXPORT_SYMBOL_GPL(vhost_vq_avail_empty);
 
-- 
2.7.4

[PATCH V4 net-next 0/3] vhost_net tx batching

2017-01-05 Thread Jason Wang

Hi:

This series tries to implement tx batching support for vhost. This is
done by using MSG_MORE as a hint for the underlying socket. The backend
(e.g. tap) can then batch the packets temporarily in a list and
submit them all once the number of batched packets exceeds a limit.

Tests show an obvious improvement with guest pktgen over
mlx4 (noqueue) on the host:

                             Mpps  -+%
rx-frames = 0                0.91  +0%
rx-frames = 4                1.00  +9.8%
rx-frames = 8                1.00  +9.8%
rx-frames = 16               1.01  +10.9%
rx-frames = 32               1.07  +17.5%
rx-frames = 48               1.07  +17.5%
rx-frames = 64               1.08  +18.6%
rx-frames = 64 (no MSG_MORE) 0.91  +0%

Changes from V3:
- use ethtool instead of a module parameter to control the maximum
  number of batched packets
- avoid overhead when MSG_MORE is not set and no packet is queued

Changes from V2:
- remove a useless queue limitation check (and we don't drop any packet now)

Changes from V1:
- drop the NAPI handler since we don't use NAPI now
- fix the issue that may exceed the max pending count of zerocopy
- more improvement on available buffer detection
- move the limit on batched packets from vhost to tuntap

Please review.

Thanks

Jason Wang (3):
  vhost: better detection of available buffers
  vhost_net: tx batching
  tun: rx batching

 drivers/net/tun.c | 76 +++
 drivers/vhost/net.c   | 23 ++--
 drivers/vhost/vhost.c |  8 --
 3 files changed, 96 insertions(+), 11 deletions(-)

-- 
2.7.4

[PATCH V4 net-next 2/3] vhost_net: tx batching

2017-01-05 Thread Jason Wang

This patch tries to utilize tuntap rx batching by peeking at the tx
virtqueue during transmission: if there are more available buffers in
the virtqueue, set the MSG_MORE flag as a hint for the backend (e.g.
tuntap) to batch the packets.

Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Jason Wang 
---
 drivers/vhost/net.c | 23 ---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 5dc3465..c42e9c3 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -351,6 +351,15 @@ static int vhost_net_tx_get_vq_desc(struct vhost_net *net,
return r;
 }
 
+static bool vhost_exceeds_maxpend(struct vhost_net *net)
+{
+   struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX];
+   struct vhost_virtqueue *vq = &nvq->vq;
+
+   return (nvq->upend_idx + vq->num - VHOST_MAX_PEND) % UIO_MAXIOV
+   == nvq->done_idx;
+}
+
 /* Expects to be always run from workqueue - which acts as
  * read-size critical section for our kind of RCU. */
 static void handle_tx(struct vhost_net *net)
@@ -394,8 +403,7 @@ static void handle_tx(struct vhost_net *net)
/* If more outstanding DMAs, queue the work.
 * Handle upend_idx wrap around
 */
-   if (unlikely((nvq->upend_idx + vq->num - VHOST_MAX_PEND)
- % UIO_MAXIOV == nvq->done_idx))
+   if (unlikely(vhost_exceeds_maxpend(net)))
break;
 
head = vhost_net_tx_get_vq_desc(net, vq, vq->iov,
@@ -454,6 +462,16 @@ static void handle_tx(struct vhost_net *net)
msg.msg_control = NULL;
ubufs = NULL;
}
+
+   total_len += len;
+   if (total_len < VHOST_NET_WEIGHT &&
+   !vhost_vq_avail_empty(&net->dev, vq) &&
+   likely(!vhost_exceeds_maxpend(net))) {
+   msg.msg_flags |= MSG_MORE;
+   } else {
+   msg.msg_flags &= ~MSG_MORE;
+   }
+
/* TODO: Check specific error and bomb out unless ENOBUFS? */
err = sock->ops->sendmsg(sock, &msg, len);
if (unlikely(err < 0)) {
@@ -472,7 +490,6 @@ static void handle_tx(struct vhost_net *net)
vhost_add_used_and_signal(&net->dev, vq, head, 0);
else
vhost_zerocopy_signal_used(net, vq);
-   total_len += len;
vhost_net_tx_packet(net);
if (unlikely(total_len >= VHOST_NET_WEIGHT)) {
vhost_poll_queue(&vq->poll);
-- 
2.7.4
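
The MSG_MORE rule added to handle_tx() can be isolated into a small predicate. This is a sketch, not vhost code: the constants are assumed values for UIO_MAXIOV, VHOST_MAX_PEND and VHOST_NET_WEIGHT, and vq_num is kept unsigned on the assumption that vq->num is unsigned in the kernel, so the modular arithmetic wraps the same way as the expression in vhost_exceeds_maxpend().

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_UIO_MAXIOV 1024		/* assumed UIO_MAXIOV */
#define MODEL_MAX_PEND 128		/* assumed VHOST_MAX_PEND */
#define MODEL_NET_WEIGHT 0x80000L	/* assumed VHOST_NET_WEIGHT */

/* Same expression as vhost_exceeds_maxpend(); vq_num is unsigned, so the
 * sum wraps modulo 2^32 before the % (2^32 is a multiple of 1024). */
static bool model_exceeds_maxpend(int upend_idx, int done_idx,
				  unsigned int vq_num)
{
	assert(done_idx >= 0 && done_idx < MODEL_UIO_MAXIOV);
	return (upend_idx + vq_num - MODEL_MAX_PEND) % MODEL_UIO_MAXIOV ==
	       (unsigned int)done_idx;
}

/* True when handle_tx() would set MSG_MORE for the packet just built. */
static bool model_want_msg_more(long total_len, bool avail_empty,
				int upend_idx, int done_idx,
				unsigned int vq_num)
{
	return total_len < MODEL_NET_WEIGHT && !avail_empty &&
	       !model_exceeds_maxpend(upend_idx, done_idx, vq_num);
}
```

The hint is withheld exactly when the loop is about to stop anyway (weight reached, ring empty, or too many zerocopy DMAs pending), so the backend is never left holding a partial batch while vhost goes idle.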

[PATCH v2] net: stmmac: fix maxmtu assignment to be within valid range

2017-01-05 Thread Kweh, Hock Leong

From: "Kweh, Hock Leong" 

The maxmtu value obtained from the device tree is not checked for
validity. This patch adds a check to ensure that the assignment is made
only within a valid range, and warns about a non-zero value outside it.

Signed-off-by: Kweh, Hock Leong 
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c |8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 92ac006..4df555e 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -3345,8 +3345,14 @@ int stmmac_dvr_probe(struct device *device,
ndev->max_mtu = JUMBO_LEN;
else
ndev->max_mtu = SKB_MAX_HEAD(NET_SKB_PAD + NET_IP_ALIGN);
-   if (priv->plat->maxmtu < ndev->max_mtu)
+
+   if ((priv->plat->maxmtu < ndev->max_mtu) &&
+   (priv->plat->maxmtu >= ndev->min_mtu))
ndev->max_mtu = priv->plat->maxmtu;
+   else if (priv->plat->maxmtu != 0)
+   netdev_warn(priv->dev,
+   "%s: warning: maxmtu having invalid value (%d)\n",
+   __func__, priv->plat->maxmtu);
 
if (flow_ctrl)
priv->flow_ctrl = FLOW_AUTO;/* RX/TX pause on */
-- 
1.7.9.5
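
The v2 range check can be expressed as a small pure function. This is a hypothetical helper, not stmmac code: it returns the max_mtu the driver would end up with and prints a warning where the patch calls netdev_warn(). The example values below (9000 for JUMBO_LEN, 68 for min_mtu) are assumptions.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper mirroring the v2 check in stmmac_dvr_probe(). */
static int pick_max_mtu(int max_mtu, int min_mtu, int plat_maxmtu)
{
	assert(min_mtu <= max_mtu);
	if (plat_maxmtu < max_mtu && plat_maxmtu >= min_mtu)
		return plat_maxmtu;		/* valid device-tree value */
	if (plat_maxmtu != 0)			/* 0: no warning, as in the patch */
		fprintf(stderr, "warning: maxmtu having invalid value (%d)\n",
			plat_maxmtu);
	return max_mtu;				/* keep the driver default */
}
```

For example, pick_max_mtu(9000, 68, 1500) keeps the device-tree value of 1500, while pick_max_mtu(9000, 68, 20) warns and keeps 9000. As written, a device-tree value exactly equal to max_mtu also takes the warning branch.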

RE: [PATCH] net: stmmac: fix maxmtu assignment to be within valid range

2017-01-05 Thread Kweh, Hock Leong

> -Original Message-
> From: Andy Shevchenko [mailto:andy.shevche...@gmail.com]
> Sent: Friday, January 06, 2017 5:07 AM
> To: Kweh, Hock Leong 
> Cc: David S. Miller ; Joao Pinto
> ; Giuseppe CAVALLARO ;
> seraphin.bonna...@st.com; Jarod Wilson ; Alexandre
> TORGUE ; Joachim Eastwood
> ; Niklas Cassel ; Johan Hovold
> ; Pavel Machek ; lars.pers...@axis.com;
> netdev ; LKML 
> Subject: Re: [PATCH] net: stmmac: fix maxmtu assignment to be within valid
> range
> 
> On Thu, Jan 5, 2017 at 12:47 PM, Kweh, Hock Leong
>  wrote:
> > From: "Kweh, Hock Leong" 
> >
> > There is no checking valid value of maxmtu when getting it from devicetree.
> 
> 'Device Tree' or 'device tree' ?

Noted & Thanks. Submitting V2.

> 
> > This resolution added the checking condition to ensure the assignment
> > is made within a valid range.
> 
> > diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> > index 39eb7a6..683d59f 100644
> > --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> > +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> > @@ -3319,7 +3319,8 @@ int stmmac_dvr_probe(struct device *device,
> > ndev->max_mtu = JUMBO_LEN;
> > else
> > ndev->max_mtu = SKB_MAX_HEAD(NET_SKB_PAD + NET_IP_ALIGN);
> > -   if (priv->plat->maxmtu < ndev->max_mtu)
> > +   if ((priv->plat->maxmtu < ndev->max_mtu) &&
> > +   (priv->plat->maxmtu >= ndev->min_mtu))
> > ndev->max_mtu = priv->plat->maxmtu;
> 
> Perhaps add a warning message on else branch?

Noted & Thanks. Submitting V2.

> 
> --
> With Best Regards,
> Andy Shevchenko

Re: [PATCH 2/2] PCI: lock each enable/disable num_vfs operation in sysfs

2017-01-05 Thread Gavin Shan

On Fri, Jan 06, 2017 at 12:55:08AM +, Tantilov, Emil S wrote:
>>On Wed, Jan 04, 2017 at 04:00:20PM +, Tantilov, Emil S wrote:
On Tue, Jan 03, 2017 at 04:48:31PM -0800, Emil Tantilov wrote:
>Enabling/disabling SRIOV via sysfs by echo-ing multiple values
>simultaneously:
>
>echo 63 > /sys/class/net/ethX/device/sriov_numvfs&
>echo 63 > /sys/class/net/ethX/device/sriov_numvfs
>
>sleep 5
>
>echo 0 > /sys/class/net/ethX/device/sriov_numvfs&
>echo 0 > /sys/class/net/ethX/device/sriov_numvfs
>
>Results in the following bug:
>
>kernel BUG at drivers/pci/iov.c:495!
>invalid opcode:  [#1] SMP
>CPU: 1 PID: 8050 Comm: bash Tainted: G   W   4.9.0-rc7-net-next #2092
>RIP: 0010:[]
> [] pci_iov_release+0x57/0x60
>
>Call Trace:
> [] pci_release_dev+0x26/0x70
> [] device_release+0x3e/0xb0
> [] kobject_cleanup+0x67/0x180
> [] kobject_put+0x2d/0x60
> [] put_device+0x17/0x20
> [] pci_dev_put+0x1a/0x20
> [] pci_get_dev_by_id+0x5b/0x90
> [] pci_get_subsys+0x35/0x40
> [] pci_get_device+0x18/0x20
> [] pci_get_domain_bus_and_slot+0x2b/0x60
> [] pci_iov_remove_virtfn+0x57/0x180
> [] pci_disable_sriov+0x65/0x140
> [] ixgbe_disable_sriov+0xc7/0x1d0 [ixgbe]
> [] ixgbe_pci_sriov_configure+0x3d/0x170 [ixgbe]
> [] sriov_numvfs_store+0xdc/0x130
>...
>RIP  [] pci_iov_release+0x57/0x60
>
>Use the existing mutex lock to protect each enable/disable operation.
>
>CC: Alexander Duyck 
>Signed-off-by: Emil Tantilov 

Emil, It's going to change semantics of pci_enable_sriov() and
pci_disable_sriov(). They can be invoked when writing to the sysfs entry,
or loading PF's driver. With the change applied, the lock
(pf->sriov->lock) isn't acquired and released in the PF's driver loading
path.
>>>
>>>The enablement of SRIOV on driver load is done via deprecated module
>>>parameter. Perhaps we can just remove it, although there are probably
>>>still people that use it and may not be happy if we get rid of it.
>>>
>>
>>Yeah, some drivers are still using the interface. So we cannot affect it
>>until it can be dropped.
>>
I think the reasonable way would be adding a flag in "struct sriov", to
indicate someone is accessing the IOV capability through sysfs file. With
this, the code returns with "-EBUSY" immediately for contenders. With it,
nothing is going to be changed in PF's driver loading path.
>>>
>>>Flag is what I initially had in mind, but did not want to add extra locking
>>>if we can make use of the existing.
>>>
>>
>>The problem is sriov->lock wasn't introduced to protect the whole IOV
>>capability. Instead, it protects the allocation of virtual bus (if needed).
>>In your patch, it will be used to protect the whole IOV capability, ensure
>>accessing the IOV capability exclusively. So the usage of this lock is changed.
>>
>> code extracted from pci.h:
>>
>> struct pci_sriov {
>>:
>>struct mutex lock;  /* lock for VF bus */
>>:
>> }
>>
>>The lock was introduced by commit d1b054da8 ("PCI: initialize and release
>>SR-IOV capability"). If I'm correct enough, I don't think this lock is
>>needed when pci_enable_sriov() or pci_disable_sriov() are called in driver
>>because of module parameters. I don't see the usage case calling
>>pci_disable_sriov() while previous pci_enable_sriov() isn't finished yet.
>>Also, it's not needed in EEH subsystem.
>>So I think the lock can be dropped, then it can be used to protect sysfs path.
>
>That's pretty much what this patch does, except I kept the locking for EEH
>since it is the only driver that calls pci_iov_add/remove_virtfn() directly.
>
>I'll write it up and run some tests, although I have no way to test EEH.
> 

Yes, your patch is close to what I suggested. I'm pretty sure EEH doesn't need
this lock. I'll fix it up if EEH is broken because of it.

Also, there are some minor comments as below and I guess most of them won't
be applied if you take my suggestion eventually. However, I'm trying to make
the comments complete.
>>>
>>>Thanks a lot for reviewing!
>>>

>---
> drivers/pci/pci-sysfs.c |   24 +---
> 1 file changed, 17 insertions(+), 7 deletions(-)
>
>diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
>index 0666287..5b54cf5 100644
>--- a/drivers/pci/pci-sysfs.c
>+++ b/drivers/pci/pci-sysfs.c
>@@ -472,7 +472,9 @@ static ssize_t sriov_numvfs_store(struct device *dev,
> const char *buf, size_t count)
> {
>   struct pci_dev *pdev = to_pci_dev(dev);
>+  struct pci_sriov *iov = pdev->sriov;
>   int ret;
>+

Unnecessary change.

>   u16 num_vfs;
>
>   ret = kstrtou16(buf, 0, &num_vfs);
>@@ -482,38

Re: [PATCH v2] scsi: bfa: Increase requested firmware version to 3.2.5.1

2017-01-05 Thread Martin K. Petersen

> "Benjamin" == Benjamin Poirier  writes:

Benjamin> bna & bfa firmware version 3.2.5.1 was submitted to
Benjamin> linux-firmware on Feb 17 19:10:20 2015 -0500 in 0ab54ff1dc
Benjamin> ("linux-firmware: Add QLogic BR Series Adapter Firmware").

Benjamin> Increase bfa's requested firmware version. Also increase the
Benjamin> driver version.  I only tested that all of the devices probe
Benjamin> without error.

Applied to 4.10/scsi-fixes.

-- 
Martin K. Petersen  Oracle Linux Engineering

[PATCH net] udp: inuse checks can quit early for reuseport

2017-01-05 Thread Eric Garver

UDP lib inuse checks will walk the entire hash bucket to check if the
portaddr is in use. In the case of reuseport we can stop searching when
we find a matching reuseport.

On a 16-core VM a test program that spawns 16 threads that each bind to
1024 sockets (one per 10ms) takes 1m45s. With this change it takes 11s.

Also add a cond_resched() when the port is not specified.

Signed-off-by: Eric Garver 
---
 net/ipv4/udp.c | 29 +++--
 1 file changed, 19 insertions(+), 10 deletions(-)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 1307a7c2e544..4318d72e0248 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -153,13 +153,18 @@ static int udp_lib_lport_inuse(struct net *net, __u16 num,
(!sk2->sk_reuse || !sk->sk_reuse) &&
(!sk2->sk_bound_dev_if || !sk->sk_bound_dev_if ||
 sk2->sk_bound_dev_if == sk->sk_bound_dev_if) &&
-   (!sk2->sk_reuseport || !sk->sk_reuseport ||
-rcu_access_pointer(sk->sk_reuseport_cb) ||
-!uid_eq(uid, sock_i_uid(sk2))) &&
saddr_comp(sk, sk2, true)) {
-   if (!bitmap)
-   return 1;
-   __set_bit(udp_sk(sk2)->udp_port_hash >> log, bitmap);
+   if (sk2->sk_reuseport && sk->sk_reuseport &&
+   !rcu_access_pointer(sk->sk_reuseport_cb) &&
+   uid_eq(uid, sock_i_uid(sk2))) {
+   if (!bitmap)
+   return 0;
+   } else {
+   if (!bitmap)
+   return 1;
+   __set_bit(udp_sk(sk2)->udp_port_hash >> log,
+ bitmap);
+   }
}
}
return 0;
@@ -188,11 +193,14 @@ static int udp_lib_lport_inuse2(struct net *net, __u16 num,
(!sk2->sk_reuse || !sk->sk_reuse) &&
(!sk2->sk_bound_dev_if || !sk->sk_bound_dev_if ||
 sk2->sk_bound_dev_if == sk->sk_bound_dev_if) &&
-   (!sk2->sk_reuseport || !sk->sk_reuseport ||
-rcu_access_pointer(sk->sk_reuseport_cb) ||
-!uid_eq(uid, sock_i_uid(sk2))) &&
saddr_comp(sk, sk2, true)) {
-   res = 1;
+   if (sk2->sk_reuseport && sk->sk_reuseport &&
+   !rcu_access_pointer(sk->sk_reuseport_cb) &&
+   uid_eq(uid, sock_i_uid(sk2))) {
+   res = 0;
+   } else {
+   res = 1;
+   }
break;
}
}
@@ -285,6 +293,7 @@ int udp_lib_get_port(struct sock *sk, unsigned short snum,
snum += rand;
} while (snum != first);
spin_unlock_bh(&hslot->lock);
+   cond_resched();
} while (++first != last);
goto fail;
} else {
-- 
2.10.0
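The early-exit idea in this patch can be modelled outside the kernel. In the sketch below, struct fake_sock and lport_inuse() are hypothetical simplifications: only the port, the reuseport flag and the owning uid are compared, whereas the real udp_lib_lport_inuse() also checks sk_reuse, the bound device, sk_reuseport_cb and the address via saddr_comp. It covers only the single-port (no bitmap) case, where the patch returns as soon as it finds a compatible reuseport socket.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified socket model. */
struct fake_sock {
	unsigned short port;
	bool reuseport;
	unsigned int uid;
};

/*
 * Returns 1 if @sk's port conflicts with a socket in @bucket, 0 otherwise.
 * *visited reports how many bucket entries were examined.
 */
static int lport_inuse(const struct fake_sock *bucket, size_t n,
		       const struct fake_sock *sk, size_t *visited)
{
	size_t i;

	assert(visited != NULL);
	for (i = 0; i < n; i++) {
		const struct fake_sock *sk2 = &bucket[i];

		if (sk2->port != sk->port)
			continue;
		*visited = i + 1;
		if (sk2->reuseport && sk->reuseport && sk2->uid == sk->uid)
			return 0;	/* joinable reuseport group: stop early */
		return 1;		/* real conflict */
	}
	*visited = n;
	return 0;
}
```

When many members of one reuseport group share a hash bucket, walking the whole chain on every bind makes the total work grow roughly with the square of the group size; stopping at the first compatible member plausibly accounts for much of the speed-up reported in the commit message.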

RE: [PATCH 2/2] PCI: lock each enable/disable num_vfs operation in sysfs

2017-01-05 Thread Tantilov, Emil S

>-Original Message-
>From: Gavin Shan [mailto:gws...@linux.vnet.ibm.com]
>Sent: Wednesday, January 04, 2017 3:12 PM
>To: Tantilov, Emil S 
>Cc: Gavin Shan ; linux-...@vger.kernel.org;
>intel-wired-...@lists.osuosl.org; Duyck, Alexander H
>; netdev@vger.kernel.org; linux-
>ker...@vger.kernel.org
>Subject: Re: [PATCH 2/2] PCI: lock each enable/disable num_vfs operation in
>sysfs
>
>On Wed, Jan 04, 2017 at 04:00:20PM +, Tantilov, Emil S wrote:
>>>On Tue, Jan 03, 2017 at 04:48:31PM -0800, Emil Tantilov wrote:
Enabling/disabling SRIOV via sysfs by echo-ing multiple values
simultaneously:

echo 63 > /sys/class/net/ethX/device/sriov_numvfs&
echo 63 > /sys/class/net/ethX/device/sriov_numvfs

sleep 5

echo 0 > /sys/class/net/ethX/device/sriov_numvfs&
echo 0 > /sys/class/net/ethX/device/sriov_numvfs

Results in the following bug:

kernel BUG at drivers/pci/iov.c:495!
invalid opcode:  [#1] SMP
CPU: 1 PID: 8050 Comm: bash Tainted: G   W   4.9.0-rc7-net-next #2092
RIP: 0010:[]
  [] pci_iov_release+0x57/0x60

Call Trace:
 [] pci_release_dev+0x26/0x70
 [] device_release+0x3e/0xb0
 [] kobject_cleanup+0x67/0x180
 [] kobject_put+0x2d/0x60
 [] put_device+0x17/0x20
 [] pci_dev_put+0x1a/0x20
 [] pci_get_dev_by_id+0x5b/0x90
 [] pci_get_subsys+0x35/0x40
 [] pci_get_device+0x18/0x20
 [] pci_get_domain_bus_and_slot+0x2b/0x60
 [] pci_iov_remove_virtfn+0x57/0x180
 [] pci_disable_sriov+0x65/0x140
 [] ixgbe_disable_sriov+0xc7/0x1d0 [ixgbe]
 [] ixgbe_pci_sriov_configure+0x3d/0x170 [ixgbe]
 [] sriov_numvfs_store+0xdc/0x130
...
RIP  [] pci_iov_release+0x57/0x60

Use the existing mutex lock to protect each enable/disable operation.

CC: Alexander Duyck 
Signed-off-by: Emil Tantilov 
>>>
>>>Emil, It's going to change semantics of pci_enable_sriov() and
>>>pci_disable_sriov(). They can be invoked when writing to the sysfs entry,
>>>or loading PF's driver. With the change applied, the lock
>>>(pf->sriov->lock) isn't acquired and released in the PF's driver loading
>>>path.
>>
>>The enablement of SRIOV on driver load is done via deprecated module
>>parameter. Perhaps we can just remove it, although there are probably
>>still people that use it and may not be happy if we get rid of it.
>>
>
>Yeah, some drivers are still using the interface. So we cannot affect it
>until it can be dropped.
>
>>>I think the reasonable way would be adding a flag in "struct sriov", to
>>>indicate someone is accessing the IOV capability through sysfs file. With
>>>this, the code returns with "-EBUSY" immediately for contenders. With it,
>>>nothing is going to be changed in PF's driver loading path.
>>
>>Flag is what I initially had in mind, but did not want to add extra locking
>>if we can make use of the existing.
>>
>
>The problem is sriov->lock wasn't introduced to protect the whole IOV
>capability. Instead, it protects the allocation of virtual bus (if needed).
>In your patch, it will be used to protect the whole IOV capability, ensure
>accessing the IOV capability exclusively. So the usage of this lock is changed.
>
> code extracted from pci.h:
>
> struct pci_sriov {
>:
>struct mutex lock;  /* lock for VF bus */
>:
> }
>
>The lock was introduced by commit d1b054da8 ("PCI: initialize and release
>SR-IOV capability"). If I'm correct enough, I don't think this lock is
>needed when pci_enable_sriov() or pci_disable_sriov() are called in driver
>because of module parameters. I don't see the usage case calling
>pci_disable_sriov() while previous pci_enable_sriov() isn't finished yet.
>Also, it's not needed in EEH subsystem.
>So I think the lock can be dropped, then it can be used to protect sysfs path.

That's pretty much what this patch does, except I kept the locking for EEH
since it is the only driver that calls pci_iov_add/remove_virtfn() directly.

I'll write it up and run some tests, although I have no way to test EEH.
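For comparison, the approach in this patch (serialise each whole enable/disable operation on a mutex) can be modelled with a pthread mutex. struct fake_iov and fake_numvfs_store_locked() are hypothetical. The checks are meant to resemble, in simplified form, those in sriov_numvfs_store(): reject counts above the total, treat an unchanged count as a no-op, and refuse to change a non-zero count without disabling first.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical model of the state behind sriov_numvfs. */
struct fake_iov {
	pthread_mutex_t lock;		/* serialises whole enable/disable ops */
	unsigned int num_vfs;		/* currently enabled VFs */
	unsigned int total_vfs;		/* hardware limit */
};

static int fake_numvfs_store_locked(struct fake_iov *iov, unsigned int num_vfs)
{
	int ret = 0;

	assert(iov != NULL);
	if (num_vfs > iov->total_vfs)
		return -ERANGE;

	pthread_mutex_lock(&iov->lock);
	if (num_vfs == iov->num_vfs)
		goto out;		/* no change, e.g. a second "echo 63" */
	if (num_vfs != 0 && iov->num_vfs != 0) {
		ret = -EBUSY;		/* must disable before changing the count */
		goto out;
	}
	iov->num_vfs = num_vfs;		/* enable or disable would happen here */
out:
	pthread_mutex_unlock(&iov->lock);
	return ret;
}
```

With the check and the update under one lock, the second concurrent "echo 63" from the reproducer sees 63 VFs already enabled and returns without touching the device. Unlike the flag approach, a contender here waits for the first writer instead of failing.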
 
>>>Also, there are some minor comments as below and I guess most of them won't
>>>be applied if you take my suggestion eventually. However, I'm trying to make
>>>the comments complete.
>>
>>Thanks a lot for reviewing!
>>
>>>
---
 drivers/pci/pci-sysfs.c |   24 +---
 1 file changed, 17 insertions(+), 7 deletions(-)

diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index 0666287..5b54cf5 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -472,7 +472,9 @@ static ssize_t sriov_numvfs_store(struct device *dev,
  const char *buf, size_t count)
 {
struct pci_dev *pdev = to_pci_dev(dev);
+   struct pci_sriov *iov =

Re: [Intel-wired-lan] [net-next PATCH v2 2/6] i40e: Introduce VF Port Representator(VFPR) netdevs.

2017-01-05 Thread Samudrala, Sridhar




On 1/5/2017 1:46 PM, Jeff Kirsher wrote:

On Tue, 2017-01-03 at 10:07 -0800, Sridhar Samudrala wrote:

VF Port Representator netdevs are created for each VF if the switch mode
is set to 'switchdev'. These netdevs can be used to control and configure
VFs from the PF's namespace. They enable exposing VF statistics, configure and
monitor link state, mtu, filters, fdb/vlan entries etc. of VFs.
Broadcast filters are not enabled in switchdev mode.

Sample script to create VF port representors
# rmmod i40e; modprobe i40e
# devlink dev eswitch set pci/:05:00.0 mode switchdev
# echo 2 > /sys/class/net/enp5s0f0/device/sriov_numvfs
# ip l show
297: enp5s0f0:  mtu 1500 qdisc noop portid 6805ca2e7268 state DOWN mode DEFAULT group default qlen 1000
  link/ether 68:05:ca:2e:72:68 brd ff:ff:ff:ff:ff:ff
  vf 0 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off
  vf 1 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off
299: enp5s0f0-vf0:  mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
  link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
300: enp5s0f0-vf1:  mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
  link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff

Signed-off-by: Sridhar Samudrala 
---
  drivers/net/ethernet/intel/i40e/i40e_main.c|  21 ++-
  drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 154
-
  drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h |  14 ++
  3 files changed, 182 insertions(+), 7 deletions(-)

This does not apply cleanly because it is based on an older version of
i40e_virtchnl_pf.c file.  It appears that i40e has been updated to use
"i40e_add_filter()" yet your patch still uses "i40e_add_mac_filter()".

I am not using i40e_add_mac_filter() in my patches; I only use i40e_add_filter().
These patches are against davem's net-next kernel.



We need to clarify what the "right way" is to add filters and use the
correct function.

Dropping this series and will await v3, please address the other feedback
from Or Gerlitz and Jiri Pirko as well in your updated series.


Sure. I will be submitting a v3 soon addressing the review comments.

Re: [RFC PATCH] virtio_net: XDP support for adjust_head

2017-01-05 Thread Michael S. Tsirkin

On Thu, Jan 05, 2017 at 02:57:23PM -0800, John Fastabend wrote:
> On 17-01-03 02:16 PM, Michael S. Tsirkin wrote:
> > On Tue, Jan 03, 2017 at 02:01:27PM +0800, Jason Wang wrote:
> >>
> >>
> >> On 2017年01月03日 03:44, John Fastabend wrote:
> >>> Add support for XDP adjust head by allocating a 256B header region
> >>> that XDP programs can grow into. This is only enabled when an XDP
> >>> program is loaded.
> >>>
> >>> In order to ensure that we do not have to unwind queue headroom push
> >>> queue setup below bpf_prog_add. It reads better to do a prog ref
> >>> unwind vs another queue setup call.
> >>>
> >>> : There is a problem with this patch as is. When xdp prog is loaded
> >>>the old buffers without the 256B headers need to be flushed so that
> >>>the bpf prog has the necessary headroom. This patch does this by
> >>>calling the virtqueue_detach_unused_buf() and followed by the
> >>>virtnet_set_queues() call to reinitialize the buffers. However I
> >>>don't believe this is safe per comment in virtio_ring this API
> >>>is not valid on an active queue and the only thing we have done
> >>>here is napi_disable/napi_enable wrappers which doesn't do anything
> >>>to the emulation layer.
> >>>
> >>>So the RFC is really to find the best solution to this problem.
> >>>A couple things come to mind, (a) always allocate the necessary
> >>>headroom but this is a bit of a waste (b) add some bit somewhere
> >>>to check if the buffer has headroom but this would mean XDP programs
> >>>would be broken for a cycle through the ring, (c) figure out how
> >>>to deactivate a queue, free the buffers and finally reallocate.
> >>>I think (c) is the best choice for now but I'm not seeing the
> >>>API to do this so virtio/qemu experts anyone know off-hand
> >>>how to make this work? I started looking into the PCI callbacks
> >>>reset() and virtio_device_ready() or possibly hitting the right
> >>>set of bits with vp_set_status() but my first attempt just hung
> >>>the device.
> >>
> >> Hi John:
> >>
> >> AFAIK, disabling a specific queue was supported only by virtio 1.0 through
> >> queue_enable field in pci common cfg.
> > 
> > In fact 1.0 only allows enabling queues selectively.
> > We can add disabling by a spec enhancement but
> > for now reset is the only way.
> > 
> > 
> >> But unfortunately, qemu does not
> >> emulate this at all and legacy device does not even support this. So the
> >> safe way is probably to reset the device and redo the initialization here.
> > 
> > You will also have to re-apply rx filtering if you do this.
> > Probably sending notification uplink.
> > 
> 
> The following seems to hang the device on the next virtnet_send_command()
> I expected this to meet the reset requirements from the spec because I
> believe it's the same flow coming out of restore(). For a real patch we
> don't actually need to kfree all the structs and reallocate them but
> I was expecting the below to work. Any ideas/hints?

Restore assumes device was previously reset.
You want to combine freeze+restore.

> static int virtnet_xdp_reset(struct virtnet_info *vi)
> {
> int i, ret;
> 
> netif_device_detach(vi->dev);
> cancel_delayed_work_sync(&vi->refill);
> if (netif_running(vi->dev)) {
> for (i = 0; i < vi->max_queue_pairs; i++)
> napi_disable(&vi->rq[i].napi);
> }
> 
> remove_vq_common(vi, false);
> ret = init_vqs(vi);
> if (ret)
> return ret;
> virtio_device_ready(vi->vdev);
> 
> if (netif_running(vi->dev)) {
> for (i = 0; i < vi->curr_queue_pairs; i++)
> if (!try_fill_recv(vi, &vi->rq[i], GFP_KERNEL))
> schedule_delayed_work(>refill, 0);
> 
> for (i = 0; i < vi->max_queue_pairs; i++)
> virtnet_napi_enable(&vi->rq[i]);
> }
> netif_device_attach(vi->dev);
> return 0;
> }

[PATCH iproute2 0/3] ip vrf: minor error message cleanups

2017-01-05 Thread David Ahern

David Ahern (3):
  ip vrf: Fix error message when running exec as non-root user
  ip vrf: Improve error message for non-root user
  ip vrf: Clean up bpf related error messages

 ip/ipvrf.c |  6 +-
 lib/fs.c   | 16 
 2 files changed, 17 insertions(+), 5 deletions(-)

-- 
2.1.4

[PATCH iproute2 2/3] ip vrf: Improve cgroup2 error messages

2017-01-05 Thread David Ahern

Currently, if a non-root user attempts to run 'ip vrf exec', an unhelpful
error is returned:

$ ip vrf exec mgmt bash
Failed to mount cgroup2. Are CGROUPS enabled in your kernel?

Only show the CGROUPS kernel hint for the ENODEV error and for the
rest show the strerror for the errno. So now:

$ ip/ip vrf exec mgmt bash
Failed to mount cgroup2: Operation not permitted

Signed-off-by: David Ahern 
---
 lib/fs.c | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/lib/fs.c b/lib/fs.c
index 644bb486ae8e..12a4657a0bc9 100644
--- a/lib/fs.c
+++ b/lib/fs.c
@@ -80,13 +80,21 @@ char *find_cgroup2_mount(void)
 
if (mount("none", mnt, CGROUP2_FS_NAME, 0, NULL)) {
/* EBUSY means already mounted */
-   if (errno != EBUSY) {
+   if (errno == EBUSY)
+   goto out;
+
+   if (errno == ENODEV) {
fprintf(stderr,
"Failed to mount cgroup2. Are CGROUPS enabled in your kernel?\n");
-   free(mnt);
-   return NULL;
+   } else {
+   fprintf(stderr,
+   "Failed to mount cgroup2: %s\n",
+   strerror(errno));
}
+   free(mnt);
+   return NULL;
}
+out:
return mnt;
 }
 
-- 
2.1.4
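The errno handling this patch introduces can be isolated into a small decision function. cgroup2_mount_error() is a hypothetical helper, not part of iproute2. It returns NULL when mount(2) failed with EBUSY (treated as "already mounted", per the comment in the patch). For ENODEV it returns the kernel-config hint, and for any other error the strerror() text.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format the message for a failed cgroup2 mount.
 * Returns NULL if @err is not an error worth reporting.
 */
static const char *cgroup2_mount_error(int err, char *buf, size_t len)
{
	assert(buf != NULL && len > 0);

	if (err == EBUSY)		/* already mounted: not a failure */
		return NULL;
	if (err == ENODEV)		/* filesystem type unknown to kernel */
		snprintf(buf, len,
			 "Failed to mount cgroup2. Are CGROUPS enabled in your kernel?");
	else
		snprintf(buf, len, "Failed to mount cgroup2: %s", strerror(err));
	return buf;
}
```

The caller would pass errno captured immediately after mount() fails, so that no intervening library call can change it.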

[PATCH iproute2 3/3] ip vrf: Improve bpf error messages

2017-01-05 Thread David Ahern

Next up, a non-root user gets various bpf-related error messages:

$ ip vrf exec mgmt bash
Failed to load BPF prog: 'Operation not permitted'
Kernel compiled with CGROUP_BPF enabled?

Catch the EPERM error and do not show the kernel config option.

Signed-off-by: David Ahern 
---
 ip/ipvrf.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/ip/ipvrf.c b/ip/ipvrf.c
index dc8364a43a57..8bd99d6251f2 100644
--- a/ip/ipvrf.c
+++ b/ip/ipvrf.c
@@ -181,7 +181,11 @@ static int vrf_configure_cgroup(const char *path, int ifindex)
if (prog_fd < 0) {
fprintf(stderr, "Failed to load BPF prog: '%s'\n",
strerror(errno));
-   fprintf(stderr, "Kernel compiled with CGROUP_BPF enabled?\n");
+
+   if (errno != EPERM) {
+   fprintf(stderr,
+   "Kernel compiled with CGROUP_BPF enabled?\n");
+   }
goto out;
}
 
-- 
2.1.4
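One detail in this hunk deserves attention: errno is tested after the first fprintf(), and the C standard allows library calls such as fprintf() to change errno even when they succeed. The sketch below follows the same logic but takes errno as an argument, so the caller can save it first. bpf_load_error() is a hypothetical helper that writes the messages into a buffer so its behaviour can be checked.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build the BPF load failure message for @err.
 * The caller passes a copy of errno taken right after the failing call,
 * before any stdio call can overwrite it.
 */
static int bpf_load_error(int err, char *buf, size_t len)
{
	int n;

	assert(buf != NULL && len > 0);
	n = snprintf(buf, len, "Failed to load BPF prog: '%s'\n", strerror(err));
	if (n < 0 || (size_t)n >= len)
		return n;		/* error or truncated: stop here */
	if (err != EPERM)		/* permission problem, not kernel config */
		n += snprintf(buf + n, len - (size_t)n,
			      "Kernel compiled with CGROUP_BPF enabled?\n");
	return n;
}
```

In the calling code this would look like `int saved = errno;` immediately after the failed load, followed by `bpf_load_error(saved, buf, sizeof(buf));`.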

[PATCH iproute2 1/3] ip vrf: Fix run-on error message on mkdir failure

2017-01-05 Thread David Ahern

Andy reported a missing newline if a non-root user attempts to run
'ip vrf exec':

$ ./ip/ip vrf exec default /bin/echo asdf
mkdir failed for /var/run/cgroup2: Permission deniedFailed to setup vrf cgroup2 directory

Reported-by: Andy Lutomirski 
Signed-off-by: David Ahern 
---
 lib/fs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/fs.c b/lib/fs.c
index 39cc96dccca9..644bb486ae8e 100644
--- a/lib/fs.c
+++ b/lib/fs.c
@@ -121,7 +121,7 @@ int make_path(const char *path, mode_t mode)
 
if (mkdir(dir, mode) != 0) {
fprintf(stderr,
-   "mkdir failed for %s: %s",
+   "mkdir failed for %s: %s\n",
dir, strerror(errno));
goto out;
}
-- 
2.1.4

Re: [net-next PATCH v2 5/6] i40e: Add TX and RX support in switchdev mode.

2017-01-05 Thread Samudrala, Sridhar




On 1/5/2017 4:56 AM, Jiri Pirko wrote:

Tue, Jan 03, 2017 at 07:07:53PM CET, sridhar.samudr...@intel.com wrote:

In switchdev mode, broadcast filter is not enabled on VFs. The broadcasts and
unknown frames from VFs are received by the PF and passed to corresponding VF
port representator netdev.
A host based switching entity like a linux bridge or OVS redirects these frames
to the right VFs via VFPR netdevs. Any frames sent via VFPR netdevs are sent as
directed transmits to the corresponding VFs. To enable directed transmit, skb
metadata dst is used to pass the VF id and the frame is requeued to call the PF's
transmit routine.

Small script to demonstrate inter VF pings in switchdev mode.
PF: enp5s0f0, VFs: enp5s2,enp5s2f1 VFPRs:enp5s0f0-vf0, enp5s0f0-vf1

# rmmod i40e; modprobe i40e
# devlink dev eswitch set pci/:05:00.0 mode switchdev
# echo 2 > /sys/class/net/enp5s0f0/device/sriov_numvfs
# ip link set enp5s0f0 vf 0 mac 00:11:22:33:44:55
# ip link set enp5s0f0 vf 1 mac 00:11:22:33:44:56
# rmmod i40evf; modprobe i40evf

/* Create 2 namespaces and move the VFs to the corresponding ns. */
# ip netns add ns0
# ip link set enp5s2 netns ns0
# ip netns exec ns0 ip addr add 192.168.1.10/24 dev enp5s2
# ip netns exec ns0 ip link set enp5s2 up
# ip netns add ns1
# ip link set enp5s2f1 netns ns1
# ip netns exec ns1 ip addr add 192.168.1.11/24 dev enp5s2f1
# ip netns exec ns1 ip link set enp5s2f1 up

/* bring up pf and vfpr netdevs */
# ip link set enp5s0f0 up
# ip link set enp5s0f0-vf0 up
# ip link set enp5s0f0-vf1 up

/* Create a linux bridge and add vfpr netdevs to it. */
# ip link add vfpr-br type bridge
# ip link set enp5s0f0-vf0 master vfpr-br
# ip link set enp5s0f0-vf1 master vfpr-br
# ip addr add 192.168.1.1/24 dev vfpr-br
# ip link set vfpr-br up

# ip netns exec ns0 ping -c3 192.168.1.11
# ip netns exec ns1 ping -c3 192.168.1.10

Signed-off-by: Sridhar Samudrala 

[...]


diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 352cf7c..b46ddaa 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -1176,16 +1176,37 @@ static bool i40e_alloc_mapped_page(struct i40e_ring *rx_ring,
  * @rx_ring:  rx ring in play
  * @skb: packet to send up
  * @vlan_tag: vlan tag for packet
+ * @lpbk: is it a loopback frame?
  **/
static void i40e_receive_skb(struct i40e_ring *rx_ring,
-struct sk_buff *skb, u16 vlan_tag)
+struct sk_buff *skb, u16 vlan_tag, bool lpbk)
{
struct i40e_q_vector *q_vector = rx_ring->q_vector;
+   struct i40e_pf *pf = rx_ring->vsi->back;
+   struct i40e_vf *vf;
+   struct ethhdr *eth;
+   int vf_id;

if ((rx_ring->netdev->features & NETIF_F_HW_VLAN_CTAG_RX) &&
(vlan_tag & VLAN_VID_MASK))
__vlan_hwaccel_put_tag(skb, htons(ETH_P_8021Q), vlan_tag);

+   if ((pf->eswitch_mode == DEVLINK_ESWITCH_MODE_LEGACY) || !lpbk)
+   goto gro_receive;
+
+   /* If a loopback packet is received from a VF in switchdev mode, pass the
+* frame to the corresponding VFPR netdev based on the source MAC in the frame.
+*/
+   eth = (struct ethhdr *)skb_mac_header(skb);
+   for (vf_id = 0; vf_id < pf->num_alloc_vfs; vf_id++) {
+   vf = &pf->vf[vf_id];
+   if (ether_addr_equal(eth->h_source, vf->default_lan_addr.addr)) {
+   skb->dev = vf->vfpr_netdev;

This sucks :( Isn't there any identification coming from rx ring that
would tell you what vf this is? To match vrpr according to a single MAC
address seems a bit awkward. What if there is a macvlan configured
on the VF?

Unfortunately, with the current HW, the RX descriptor only indicates if it
is a loopback packet from a VF, but not the specific id of the VF.
Multiple MACs on a VF are not supported with the current patchset.
At this point we are not making switchdev the default mode because of
these limitations.

+   break;
+   }
+   }
+
+gro_receive:
napi_gro_receive(&q_vector->napi, skb);
}
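The receive-side lookup in this hunk, which finds the VF by comparing the frame's source MAC with each VF's default MAC, can be sketched in plain C. struct fake_vf and find_vf_by_src_mac() are hypothetical. The driver uses ether_addr_equal() on a struct ethhdr; here that is replaced by memcmp() over ETH_ALEN (6) bytes. The sample MACs are the ones from the demo script in the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ETH_ALEN 6

/* Hypothetical VF record: only the default MAC matters here. */
struct fake_vf {
	unsigned char default_mac[ETH_ALEN];
};

/* Return the index of the VF whose default MAC equals @src, or -1. */
static int find_vf_by_src_mac(const struct fake_vf *vfs, int num_vfs,
			      const unsigned char src[ETH_ALEN])
{
	int i;

	assert(vfs != NULL || num_vfs == 0);
	for (i = 0; i < num_vfs; i++)
		if (memcmp(vfs[i].default_mac, src, ETH_ALEN) == 0)
			return i;
	return -1;	/* e.g. a secondary MAC: no VFPR can be chosen */
}
```

The scan is linear in the number of VFs for every loopback frame. Any source address other than the VF's default MAC (a macvlan on the VF, for instance) falls through unmatched, which is the limitation raised in the review.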


[...]


@@ -2998,3 +3064,19 @@ netdev_tx_t i40e_lan_xmit_frame(struct sk_buff *skb, struct net_device *netdev)

return i40e_xmit_frame_ring(skb, tx_ring);
}
+
+netdev_tx_t i40e_vfpr_netdev_start_xmit(struct sk_buff *skb,
+   struct net_device *dev)
+{
+   struct i40e_vfpr_netdev_priv *priv = netdev_priv(dev);
+   struct i40e_vf *vf = priv->vf;
+   struct i40e_pf *pf = vf->pf;
+   struct i40e_vsi *vsi = pf->vsi[pf->lan_vsi];
+
+   skb_dst_drop(skb);
+   dst_hold(>vfpr_dst->dst);
+   skb_dst_set(skb, >vfpr_dst->dst);
+   skb->dev = vsi->netdev;

This dst dance seems a bit odd to me. Why don't you just call
i40e_xmit_frame_ring with an extra arg holding the needed metadata?


We don't have TX/RX queues associated with VFPR

Re: [PATCH net-next v4 0/4] Fix OdroidC2 Gigabit Tx link issue

2017-01-05 Thread Russell King - ARM Linux

On Mon, Nov 28, 2016 at 09:54:28AM -0800, Florian Fainelli wrote:
> If we start supporting generic "enable", "disable" type of properties
> with values that map directly to register definitions of the HW, we
> leave too much room for these properties to be utilized to implement a
> specific policy, and this is not acceptable.

Another concern with this patch is that the existing phylib "set_eee"
code is horribly buggy - it just translates the modes from userspace
into the register value and writes them directly to the register with
no validation.  So it's possible to set modes in the register that the
hardware doesn't support, and have them advertised to the link partner.

I have a patch which fixes that, restricting (as we do elsewhere) the
advert according to the EEE supported capabilities retrieved from the
PCS - maybe the problem here is that the PCS doesn't support
EEE in 1000baseT mode?

Out of interest, which PHY is used on this platform?

On the SolidRun boards, they're using AR8035, and have suffered this
occasional link drop problem.  What has been found is that it seems to
be to do with the timing parameters, and it seemed to only be 1000bT
that was affected.  I don't remember off hand exactly which or what
the change was they made to stabilise it though, but I can probably
find out tomorrow.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

Re: [RFC PATCH] virtio_net: XDP support for adjust_head

2017-01-05 Thread John Fastabend

On 17-01-03 02:16 PM, Michael S. Tsirkin wrote:
> On Tue, Jan 03, 2017 at 02:01:27PM +0800, Jason Wang wrote:
>>
>>
>> On 2017年01月03日 03:44, John Fastabend wrote:
>>> Add support for XDP adjust head by allocating a 256B header region
>>> that XDP programs can grow into. This is only enabled when an XDP
>>> program is loaded.
>>>
>>> In order to ensure that we do not have to unwind queue headroom push
>>> queue setup below bpf_prog_add. It reads better to do a prog ref
>>> unwind vs another queue setup call.
>>>
>>> : There is a problem with this patch as is. When xdp prog is loaded
>>>the old buffers without the 256B headers need to be flushed so that
>>>the bpf prog has the necessary headroom. This patch does this by
>>>calling the virtqueue_detach_unused_buf() and followed by the
>>>virtnet_set_queues() call to reinitialize the buffers. However I
>>>don't believe this is safe per comment in virtio_ring this API
>>>is not valid on an active queue and the only thing we have done
>>>here is napi_disable/napi_enable wrappers which doesn't do anything
>>>to the emulation layer.
>>>
>>>So the RFC is really to find the best solution to this problem.
>>>A couple things come to mind, (a) always allocate the necessary
>>>headroom but this is a bit of a waste (b) add some bit somewhere
>>>to check if the buffer has headroom but this would mean XDP programs
>>>would be broken for a cycle through the ring, (c) figure out how
>>>to deactivate a queue, free the buffers and finally reallocate.
>>>I think (c) is the best choice for now but I'm not seeing the
>>>API to do this so virtio/qemu experts anyone know off-hand
>>>how to make this work? I started looking into the PCI callbacks
>>>reset() and virtio_device_ready() or possibly hitting the right
>>>set of bits with vp_set_status() but my first attempt just hung
>>>the device.
>>
>> Hi John:
>>
>> AFAIK, disabling a specific queue was supported only by virtio 1.0 through
>> queue_enable field in pci common cfg.
> 
> In fact 1.0 only allows enabling queues selectively.
> We can add disabling by a spec enhancement but
> for now reset is the only way.
> 
> 
>> But unfortunately, qemu does not
>> emulate this at all and legacy device does not even support this. So the
>> safe way is probably to reset the device and redo the initialization here.
> 
> You will also have to re-apply rx filtering if you do this.
> Probably sending notification uplink.
> 

The following seems to hang the device on the next virtnet_send_command()
I expected this to meet the reset requirements from the spec because I
believe it's the same flow coming out of restore(). For a real patch we
don't actually need to kfree all the structs and reallocate them but
I was expecting the below to work. Any ideas/hints?

static int virtnet_xdp_reset(struct virtnet_info *vi)
{
int i, ret;

netif_device_detach(vi->dev);
cancel_delayed_work_sync(&vi->refill);
if (netif_running(vi->dev)) {
for (i = 0; i < vi->max_queue_pairs; i++)
napi_disable(&vi->rq[i].napi);
}

remove_vq_common(vi, false);
ret = init_vqs(vi);
if (ret)
return ret;
virtio_device_ready(vi->vdev);

if (netif_running(vi->dev)) {
for (i = 0; i < vi->curr_queue_pairs; i++)
if (!try_fill_recv(vi, &vi->rq[i], GFP_KERNEL))
schedule_delayed_work(>refill, 0);

for (i = 0; i < vi->max_queue_pairs; i++)
virtnet_napi_enable(&vi->rq[i]);
}
netif_device_attach(vi->dev);
return 0;
}
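The headroom scheme the RFC describes (reserve 256 bytes in front of each receive buffer so an XDP program can grow the header with adjust_head) can be modelled as offset arithmetic. This sketch rests on several assumptions and is not virtio_net code: struct fake_rx_buf and fake_adjust_head() are hypothetical, and the bounds check only loosely follows bpf_xdp_adjust_head(). Data may not move before the start of the buffer, and at least an Ethernet header (14 bytes) must remain.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define XDP_HEADROOM	256	/* per the RFC: region XDP may grow into */
#define FAKE_ETH_HLEN	14

/* Hypothetical receive buffer: packet data starts at mem + data_off. */
struct fake_rx_buf {
	unsigned char mem[XDP_HEADROOM + 1514];
	size_t data_off;
	size_t len;
};

static void fake_rx_buf_init(struct fake_rx_buf *b, size_t pkt_len)
{
	assert(pkt_len <= sizeof(b->mem) - XDP_HEADROOM);
	b->data_off = XDP_HEADROOM;	/* leave the headroom untouched */
	b->len = pkt_len;
}

/* Negative delta grows the header into the headroom, positive shrinks it. */
static int fake_adjust_head(struct fake_rx_buf *b, long delta)
{
	long new_off = (long)b->data_off + delta;
	long end = (long)(b->data_off + b->len);

	if (new_off < 0 || new_off > end - FAKE_ETH_HLEN)
		return -EINVAL;
	b->data_off = (size_t)new_off;
	b->len = (size_t)(end - new_off);
	return 0;
}
```

A buffer posted before the program was attached would effectively have data_off == 0, so every negative delta fails. That is why the RFC needs to flush and refill the ring when a program is loaded.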

Re: [PATCH net-next 01/10] net: netcp: ethss: add support of subsystem register region regmap

2017-01-05 Thread Murali Karicheri

On 01/05/2017 03:42 PM, Murali Karicheri wrote:
> Rob,
> 
> On 12/22/2016 04:24 PM, Rob Herring wrote:
>> On Tue, Dec 20, 2016 at 05:09:44PM -0500, Murali Karicheri wrote:
>>> From: WingMan Kwok 
>>>
>>> 10gbe phy driver needs to access the 10gbe subsystem control
>>> register during phy initialization. To facilitate the shared
>>> access of the subsystem register region between the 10gbe Ethernet
>>> driver and the phy driver, this patch adds support of the
>>> subsystem register region defined by a syscon node in the dts.
>>>
>>> Although there is no shared access to the gbe subsystem register
>>> region, using syscon for that is for the sake of consistency.
>>>
>>> This change is backward compatible with previously released gbe
>>> devicetree bindings.
>>>
>>> Signed-off-by: WingMan Kwok 
>>> Signed-off-by: Murali Karicheri 
>>> Signed-off-by: Sekhar Nori 
>>> ---
>>>  .../devicetree/bindings/net/keystone-netcp.txt |  16 ++-
>>>  drivers/net/ethernet/ti/netcp_ethss.c  | 140 
>>> +
>>>  2 files changed, 127 insertions(+), 29 deletions(-)
>>>
>>> diff --git a/Documentation/devicetree/bindings/net/keystone-netcp.txt b/Documentation/devicetree/bindings/net/keystone-netcp.txt
>>> index 04ba1dc..0854a73 100644
>>> --- a/Documentation/devicetree/bindings/net/keystone-netcp.txt
>>> +++ b/Documentation/devicetree/bindings/net/keystone-netcp.txt
>>> @@ -72,20 +72,24 @@ Required properties:
>>> "ti,netcp-gbe-2" for 1GbE N NetCP 1.5 (N=2)
>>> "ti,netcp-xgbe" for 10 GbE
>>>  
>>> +- syscon-subsys:   phandle to syscon node of the switch
>>> +   subsystem registers.
>>> +
>>>  - reg: register location and the size for the following register
>>> regions in the specified order.
>>> - switch subsystem registers
>>> +   - sgmii module registers
>>
>> This needs to go on the end of the list. Otherwise, it is not backwards 
>> compatible.
> 
> Thanks for your review! I assumed backward compatibility means a new kernel
> should work with an old DTB. The driver code is adjusted to work with both
> DTBs. Isn't that enough?

Rob,

I will pull out 1/10 and 2/10 from the series as they need more work and
re-submit the rest of the patches in my v1 spin. However, please reply to my
backward compatibility question above.

Thanks and Regards,

Murali
> 
> Murali
> 
>>
>>> - sgmii port3/4 module registers (only for NetCP 1.4)
>>> - switch module registers
>>> - serdes registers (only for 10G)
>>>  
>>> NetCP 1.4 ethss, here is the order
>>> -   index #0 - switch subsystem registers
>>> +   index #0 - sgmii module registers
>>> index #1 - sgmii port3/4 module registers
>>> index #2 - switch module registers
>>>  
>>> NetCP 1.5 ethss 9 port, 5 port and 2 port
>>> -   index #0 - switch subsystem registers
>>> +   index #0 - sgmii module registers
>>> index #1 - switch module registers
>>> index #2 - serdes registers
>>>  
>>> @@ -145,6 +149,11 @@ Optional properties:
>>>  
>>>  Example binding:
>>>  
>>> +gbe_subsys: subsys@209 {
>>> +   compatible = "syscon";
>>> +   reg = <0x0209 0x100>;
>>> +};
>>> +
>>>  netcp: netcp@200 {
>>> reg = <0x2620110 0x8>;
>>> reg-names = "efuse";
>>> @@ -163,7 +172,8 @@ netcp: netcp@200 {
>>> ranges;
>>> gbe@9 {
>>> label = "netcp-gbe";
>>> -   reg = <0x9 0x300>, <0x90400 0x400>, <0x90800 0x700>;
>>> +   syscon-subsys = <&gbe_subsys>;
>>> +   reg = <0x90100 0x200>, <0x90400 0x200>, <0x90800 0x700>;
>>> /* enable-ale; */
>>> tx-queue = <648>;
>>> tx-channel = <8>;
>>
> 
> 


-- 
Murali Karicheri
Linux Kernel, Keystone

Re: [net-next PATCH 5/6] i40e: Add TX and RX support in switchdev mode.

2017-01-05 Thread Samudrala, Sridhar




On 1/5/2017 3:50 AM, Or Gerlitz wrote:
> On Thu, Jan 5, 2017 at 12:46 AM, Samudrala, Sridhar wrote:
>> On 1/3/2017 3:03 PM, Or Gerlitz wrote:
>>> On Fri, Dec 30, 2016 at 7:04 PM, Samudrala, Sridhar wrote:
>>>> On 12/30/2016 7:31 AM, Or Gerlitz wrote:
>>>>> Are you exposing switchdev ops for the representors? didn't see that
>>>>> or maybe it's in the 4th patch which didn't make it to the list?
>>>>
>>>> Not at this time. In the future patches when we offload fdb/vlan
>>>> functionality, we could use switchdev ops.
>>>
>>> but wait, this is the switchdev mode... even before doing any
>>> offloading, you want (need) your representor netdevices to have the
>>> same HW ID marking they are all ports of the same ASIC, this you can
>>> do with the switchdev parent ID attribute.
>>
>> OK. I will add switchdev_port_attr_get() with PORT_PARENT_ID support in v3.
>
> Good, I made this comment, b/c we want to create a well defined user-experience
> to be taken into account also by upper virtualization layers.
>
> Another piece there to add is have your VF reps implement the
> get_phys_port_name ndo,

It looks like you are returning the VF port number as phys_port_name()
for a VF rep in en_rep.c.

Is this correct?

By default I am creating the VFPR netdev with name as _VF
For example, if enp5s0f0 is the PF name, the VFPR netdev for VF0 will be enp5s0f0_vf0.

If we want udev to follow this syntax, should I return '_vf0' as
get_phys_port_name() for VF rep 0?

> which, as we explain in commit cb67b832921cfa20ad79bafdc51f1745339d0557, is used
> as follows:
>
>  Port phys name (ndo_get_phys_port_name) is implemented to allow exporting
>  to user-space the VF vport number and along with the switchdev port parent
>  id (phys_switch_id) enable a udev base consistent naming scheme:
>
>  SUBSYSTEM=="net", ACTION=="add", ATTR{phys_switch_id}=="", \
>  ATTR{phys_port_name}!="", NAME="$PF_NIC$attr{phys_port_name}"
>
>  where phys_switch_id is exposed by the PF (and VF reps) and $PF_NIC is
>  the name of the PF netdevice.
>
> Or.
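The udev rule quoted above builds the final name by plain concatenation, NAME="$PF_NIC$attr{phys_port_name}", so whatever string the rep reports is appended to the PF name unchanged. The userspace C sketch below models that to make the naming question concrete: a rep reporting "_vf0" would end up as enp5s0f0_vf0, whereas a bare VF number "0" would give enp5s0f00. `rep_phys_port_name()` and `udev_rep_name()` are hypothetical helpers, not i40e or mlx5 code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a VF rep's ndo_get_phys_port_name():
 * writes the port-name suffix for VF 'vf_id' into buf.
 * Returns 0 on success, -1 if the result would be truncated. */
static int rep_phys_port_name(unsigned int vf_id, char *buf, size_t len)
{
	int n = snprintf(buf, len, "_vf%u", vf_id);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Models NAME="$PF_NIC$attr{phys_port_name}": PF name + port name. */
static int udev_rep_name(const char *pf_nic, unsigned int vf_id,
			 char *out, size_t len)
{
	char port[16];
	int n;

	if (rep_phys_port_name(vf_id, port, sizeof(port)) < 0)
		return -1;
	n = snprintf(out, len, "%s%s", pf_nic, port);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Calling `udev_rep_name("enp5s0f0", 0, name, sizeof(name))` with a 16-byte buffer yields "enp5s0f0_vf0". Interface names are limited to IFNAMSIZ (16 bytes including the terminating NUL), so a long PF name plus a "_vfN" suffix can still overflow; the sketch reports that as -1.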

Re: [PATCH] net: phy: dp83867: fix irq generation

2017-01-05 Thread Florian Fainelli

On 01/05/2017 12:48 PM, Grygorii Strashko wrote:
> For proper IRQ generation by DP83867 phy the INT/PWDN pin has to be
> programmed as an interrupt output instead of a Powerdown input in
> Configuration Register 3 (CFG3), Address 0x001E, bit 7 INT_OE = 1. The
> current driver doesn't do this and as a result IRQs will not be generated by
> DP83867 phy even if they are properly configured in DT.
> 
> Hence, fix IRQ generation by properly configuring CFG3.INT_OE bit and
> ensure that Link Status Change (LINK_STATUS_CHNG_INT) and Auto-Negotiation
> Complete (AUTONEG_COMP_INT) interrupts are enabled. After this, the DP83867
> driver will work properly in interrupt enabled mode.
> 
> Signed-off-by: Grygorii Strashko 
> ---
>  drivers/net/phy/dp83867.c | 10 ++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/drivers/net/phy/dp83867.c b/drivers/net/phy/dp83867.c
> index 1b63924..e84ae08 100644
> --- a/drivers/net/phy/dp83867.c
> +++ b/drivers/net/phy/dp83867.c
> @@ -29,6 +29,7 @@
>  #define MII_DP83867_MICR 0x12
>  #define MII_DP83867_ISR  0x13
>  #define DP83867_CTRL 0x1f
> +#define DP83867_CFG3 0x1e
>  
>  /* Extended Registers */
>  #define DP83867_RGMIICTL 0x0032
> @@ -98,6 +99,8 @@ static int dp83867_config_intr(struct phy_device *phydev)
>   micr_status |=
>   (MII_DP83867_MICR_AN_ERR_INT_EN |
>   MII_DP83867_MICR_SPEED_CHNG_INT_EN |
> + MII_DP83867_MICR_AUTONEG_COMP_INT_EN |
> + MII_DP83867_MICR_LINK_STS_CHNG_INT_EN |
>   MII_DP83867_MICR_DUP_MODE_CHNG_INT_EN |
>   MII_DP83867_MICR_SLEEP_MODE_CHNG_INT_EN);
>  
> @@ -214,6 +217,13 @@ static int dp83867_config_init(struct phy_device *phydev)
>   }
>   }
>  
> + /* Enable Interrupt output INT_OE in CFG3 register */
> + if (phy_interrupt_is_valid(phydev)) {
> + val = phy_read(phydev, DP83867_CFG3);
> + val |= BIT(7);
> + phy_write(phydev, DP83867_CFG3, val);
> + }

Don't you need to clear that bit in the case phy_interrupt_is_valid()
returns false?
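Florian's point amounts to making the INT_OE update a read-modify-write that either sets or clears bit 7 of CFG3 (address 0x1E). A minimal userspace sketch of that logic follows; the register is modelled as a plain value and `cfg3_update_int_oe()` is a hypothetical helper, not part of the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* INT_OE is bit 7 of CFG3 (0x1e), per the patch description. */
#define DP83867_CFG3_INT_OE (1u << 7)

/* Hypothetical helper: compute the new CFG3 value from the current one.
 * With a valid IRQ, set INT_OE so INT/PWDN acts as an interrupt output;
 * otherwise clear it so the pin keeps its power-down input role. */
static uint16_t cfg3_update_int_oe(uint16_t cfg3, bool irq_valid)
{
	if (irq_valid)
		cfg3 |= DP83867_CFG3_INT_OE;
	else
		cfg3 &= (uint16_t)~DP83867_CFG3_INT_OE;
	return cfg3;
}
```

In the driver, the value would still be read with phy_read() and written back with phy_write() on DP83867_CFG3 as in the patch; the difference is that the write would happen in both branches, not only when phy_interrupt_is_valid() is true.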

Other than that:

Reviewed-by: Florian Fainelli 
-- 
Florian

Re: [PATCH v4] net: ethernet: faraday: To support device tree usage.

2017-01-05 Thread Arnd Bergmann

On Thursday, January 5, 2017 6:23:53 PM CET Greentime Hu wrote:
> Signed-off-by: Greentime Hu 
> ---
> Changes in v4:
>   - Use the same binding document to describe the same faraday ethernet 
> controller and add faraday to vendor-prefixes.txt.
> Changes in v3:
>   - Nothing changed in this patch but I have committed andestech to 
> vendor-prefixes.txt.
> Changes in v2:
>   - Change atmac100_of_ids to ftmac100_of_ids
> 

The patch looks good to me now, but please add a proper commit log
before your Signed-off-by tag.

Arnd

Re: [PATCH] tg3: Avoid NULL pointer dereference in tg3_get_nstats()

2017-01-05 Thread Michael Chan

On Thu, Jan 5, 2017 at 12:17 PM, David Miller  wrote:
> From: Michael Chan 
> Date: Thu, 5 Jan 2017 12:04:13 -0800
>
>> But it looks like ndo_get_stats() can be called without rtnl lock from
>> net-procfs.c.  So it is possible that we'll read tp->hw_stats after it
>> has been freed.  For example, if we are reading /proc/net/dev and
>> closing tg3 at the same time.  David, is not taking rtnl_lock in
>> net-procfs.c by design?
>
> Probably not, that dev_get_stats() call probably should be surrounded
> by RTNL protection.
>
> Doing a quick grep on dev_get_stats() shows other call sites, most of
> which are using it to fetch slave device statistics from the get stats
> method of the parent.  Which should be ok.
>
> It appears that the vlan procfs code in net/8021q/vlanproc.c has a
> similar bug as net/core/net-procfs.c
>
> Maybe net/core/net-sysfs.c has the same issue as well, and perhaps also
> net/openvswitch/vport.c:ovs_vport_get_stats().
>

OK.  I will send a patch later today to add rtnl_lock to these callers.
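The race described above (a /proc/net/dev reader calling into the driver while close frees hw_stats) is the usual pattern of a reader and a teardown path that must serialise on the same lock. The userspace analogy below uses a pthread mutex in place of RTNL; `struct stats_dev`, `sdev_read_stats()` and `sdev_close()` are hypothetical names, and this is not the net-procfs.c change itself.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical device: hw_stats is freed on close, like tp->hw_stats. */
struct stats_dev {
	pthread_mutex_t lock;   /* stands in for rtnl_lock() */
	uint64_t *hw_stats;     /* NULL once the device is closed */
};

/* Reader path (think dev_get_stats() from procfs): takes the lock
 * before touching hw_stats, so it never sees a freed buffer. */
static int sdev_read_stats(struct stats_dev *d, uint64_t *out)
{
	int ret = -1;

	pthread_mutex_lock(&d->lock);
	if (d->hw_stats) {
		*out = d->hw_stats[0];
		ret = 0;
	}
	pthread_mutex_unlock(&d->lock);
	return ret;
}

/* Teardown path: frees and clears the pointer under the same lock. */
static void sdev_close(struct stats_dev *d)
{
	pthread_mutex_lock(&d->lock);
	free(d->hw_stats);
	d->hw_stats = NULL;
	pthread_mutex_unlock(&d->lock);
}
```

If the reader skipped the lock, it could load the hw_stats pointer just before sdev_close() frees it and then dereference freed memory, which is the use-after-free window described for tg3.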

[RFC PATCH] tcp: accept RST for rcv_nxt - 1 after receiving a FIN

2017-01-05 Thread Jason Baron

Using a Mac OSX box as a client connecting to a Linux server, we have found
that when certain applications (such as 'ab') are abruptly terminated
(via ^C), a FIN is sent followed by a RST packet on tcp connections. The
FIN is accepted by the Linux stack but the RST is sent with the same
sequence number as the FIN, and Linux responds with a challenge ACK per
RFC 5961. The OSX client then does not reply with any RST as would be
expected on a closed socket.

This results in sockets accumulating on the Linux server left mostly in
the CLOSE_WAIT state, although LAST_ACK and CLOSING are also possible.
This sequence of events can tie up a lot of resources on the Linux server
since there may be a lot of data in write buffers at the time of the RST.
Accepting a RST equal to rcv_nxt - 1, after we have already successfully
processed a FIN, has made a significant difference for us in practice, by
freeing up unneeded resources in a more expedient fashion.
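The acceptance rule can be modelled in a few lines of C. This is a simplified, hypothetical sketch: `classify_rst()` and the state enum are made-up names, and it ignores the SACK-block match and everything else tcp_validate_incoming() does. An in-window RST is accepted only on an exact rcv_nxt match and otherwise draws an RFC 5961 challenge ACK. The proposed rule additionally accepts a zero-length RST at rcv_nxt - 1 in CLOSE_WAIT, LAST_ACK or CLOSING. Using uint32_t means rcv_nxt - 1 wraps the same way TCP sequence numbers do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified model of the states involved; not the kernel's enum. */
enum sk_state_model { ST_ESTABLISHED, ST_CLOSE_WAIT, ST_LAST_ACK, ST_CLOSING };
enum rst_verdict { RST_RESET, RST_CHALLENGE_ACK, RST_DISCARD };

/* Mirrors the patch's tcp_reset_check(): a zero-length RST whose seq and
 * end_seq both equal rcv_nxt - 1, received after a FIN was processed. */
static bool reset_check(enum sk_state_model st, uint32_t seq,
			uint32_t end_seq, uint32_t rcv_nxt)
{
	return seq == rcv_nxt - 1 && end_seq == rcv_nxt - 1 &&
	       (st == ST_CLOSE_WAIT || st == ST_LAST_ACK || st == ST_CLOSING);
}

/* in_window: result of the receive-window check done before RST handling. */
static enum rst_verdict classify_rst(enum sk_state_model st, uint32_t seq,
				     uint32_t end_seq, uint32_t rcv_nxt,
				     bool in_window)
{
	if (reset_check(st, seq, end_seq, rcv_nxt))
		return RST_RESET;         /* new: post-FIN RST at rcv_nxt - 1 */
	if (!in_window)
		return RST_DISCARD;       /* out of window: dropped silently */
	if (seq == rcv_nxt)
		return RST_RESET;         /* exact match: accept */
	return RST_CHALLENGE_ACK;         /* in window, inexact: RFC 5961 */
}
```

The packetdrill test below corresponds to the first check: the FIN at sequence 1 advances rcv_nxt to 2 and moves the socket to CLOSE_WAIT, so the RST at sequence 1 now resets the connection.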

I also found a post reporting that the iOS client behaves in a similar manner
(it sends a FIN followed by a RST for rcv_nxt - 1):
https://www.snellman.net/blog/archive/2016-02-01-tcp-rst/

A packetdrill test demonstrating the behavior:

// testing mac osx rst behavior

// Establish a connection
0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
0.000 bind(3, ..., ...) = 0
0.000 listen(3, 1) = 0

0.100 < S 0:0(0) win 32768 
0.100 > S. 0:0(0) ack 1 
0.200 < . 1:1(0) ack 1 win 32768
0.200 accept(3, ..., ...) = 4

// Client closes the connection
0.300 < F. 1:1(0) ack 1 win 32768

// now send rst with same sequence
0.300 < R. 1:1(0) ack 1 win 32768

// make sure we are in TCP_CLOSE
0.400 %{
assert tcpi_state == 7
}%

Signed-off-by: Jason Baron 
---
 net/ipv4/tcp_input.c | 23 ++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index ec6d84363024..373bea05c93b 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -5249,6 +5249,24 @@ static int tcp_copy_to_iovec(struct sock *sk, struct sk_buff *skb, int hlen)
return err;
 }
 
+/* Accept RST for rcv_nxt - 1 after a FIN.
+ * When tcp connections are abruptly terminated from Mac OSX (via ^C), a
+ * FIN is sent followed by a RST packet. The RST is sent with the same
+ * sequence number as the FIN, and thus according to RFC 5961 a challenge
+ * ACK should be sent. However, Mac OSX does not reply to the challenge ACK
+ * with a RST on the closed socket, hence accept this class of RSTs.
+ */
+static bool tcp_reset_check(struct sock *sk, struct sk_buff *skb)
+{
+   struct tcp_sock *tp = tcp_sk(sk);
+
+   return unlikely((TCP_SKB_CB(skb)->seq == (tp->rcv_nxt - 1)) &&
+   (TCP_SKB_CB(skb)->end_seq == (tp->rcv_nxt - 1)) &&
+   (sk->sk_state == TCP_CLOSE_WAIT ||
+sk->sk_state == TCP_LAST_ACK ||
+sk->sk_state == TCP_CLOSING));
+}
+
 /* Does PAWS and seqno based validation of an incoming segment, flags will
  * play significant role here.
  */
@@ -5287,6 +5305,8 @@ static bool tcp_validate_incoming(struct sock *sk, struct sk_buff *skb,
  LINUX_MIB_TCPACKSKIPPEDSEQ,
  &tp->last_oow_ack_time))
tcp_send_dupack(sk, skb);
+   } else if (tcp_reset_check(sk, skb)) {
+   tcp_reset(sk);
}
goto discard;
}
@@ -5300,7 +5320,8 @@ static bool tcp_validate_incoming(struct sock *sk, struct sk_buff *skb,
 * else
 * Send a challenge ACK
 */
-   if (TCP_SKB_CB(skb)->seq == tp->rcv_nxt) {
+   if (TCP_SKB_CB(skb)->seq == tp->rcv_nxt ||
+   tcp_reset_check(sk, skb)) {
rst_seq_match = true;
} else if (tcp_is_sack(tp) && tp->rx_opt.num_sacks > 0) {
struct tcp_sack_block *sp = &tp->selective_acks[0];
-- 
2.6.1

Re: [Intel-wired-lan] [net-next PATCH v2 2/6] i40e: Introduce VF Port Representator(VFPR) netdevs.

2017-01-05 Thread Jeff Kirsher

On Tue, 2017-01-03 at 10:07 -0800, Sridhar Samudrala wrote:
> VF Port Representator netdevs are created for each VF if the switch mode
> is set to 'switchdev'. These netdevs can be used to control and configure
> VFs from the PF's namespace. They enable exposing VF statistics and configuring
> and monitoring the link state, MTU, filters, fdb/vlan entries, etc. of VFs.
> Broadcast filters are not enabled in switchdev mode.
> 
> Sample script to create VF port representors
> # rmmod i40e; modprobe i40e
> # devlink dev eswitch set pci/:05:00.0 mode switchdev
> # echo 2 > /sys/class/net/enp5s0f0/device/sriov_numvfs
> # ip l show
> 297: enp5s0f0:  mtu 1500 qdisc noop portid 6805ca2e7268 state DOWN mode DEFAULT group default qlen 1000
>  link/ether 68:05:ca:2e:72:68 brd ff:ff:ff:ff:ff:ff
>  vf 0 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off
>  vf 1 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off
> 299: enp5s0f0-vf0:  mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
>  link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
> 300: enp5s0f0-vf1:  mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
>  link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
> 
> Signed-off-by: Sridhar Samudrala 
> ---
>  drivers/net/ethernet/intel/i40e/i40e_main.c    |  21 ++-
>  drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 154
> -
>  drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h |  14 ++
>  3 files changed, 182 insertions(+), 7 deletions(-)

This does not apply cleanly because it is based on an older version of
i40e_virtchnl_pf.c file.  It appears that i40e has been updated to use
"i40e_add_filter()" yet your patch still uses "i40e_add_mac_filter()".

We need to clarify what the "right way" is to add filters and use the
correct function.

Dropping this series and will await v3, please address the other feedback
from Or Gerlitz and Jiri Pirko as well in your updated series.


TCP using IPv4-mapped IPv6 address as source

2017-01-05 Thread Jonathan T. Leighton

I've observed TCP using an IPv4-mapped IPv6 address as the source 
address, which I believe contradicts 
https://tools.ietf.org/html/rfc6890#page-14 (BCP 153). This occurs when 
an IPv6 TCP socket, bound to a local IPv4-mapped IPv6 address, attempts 
to connect to a remote IPv6 address. Presumably connect() should return
EAFNOSUPPORT in this case. Please advise me if this is not the
appropriate list to report this to.


$ uname -a
Linux ubuntu 4.4.0-57-generic #78-Ubuntu SMP Fri Dec 9 23:50:32 UTC 2016 
x86_64 x86_64 x86_64 GNU/Linux


Best regards,
Jon

Re: [PATCH 2/2] isdn: i4l: move active-isdn drivers to staging

2017-01-05 Thread Greg Kroah-Hartman

On Tue, Jan 03, 2017 at 10:19:29PM +0100, Arnd Bergmann wrote:
> On Tuesday, January 3, 2017 4:24:36 PM CET Greg Kroah-Hartman wrote:
> > On Wed, Mar 02, 2016 at 08:06:46PM +0100, Arnd Bergmann wrote:
> > > The icn, act2000 and pcbit drivers are all for very old hardware,
> > > and it is highly unlikely that anyone is actually still using them
> > > on modern kernels, if at all.
> > > 
> > > All three drivers apparently are for hardware that predates PCI
> > > being the common connector, as they are ISA-only and active
> > > PCI ISDN cards were widely available in the 1990s.
> > > 
> > > Looking through the git logs, I cannot find any indication of a
> > > patch to any of these drivers that has been tested on real hardware,
> > > only cleanups or global API changes.
> > > 
> > > Signed-off-by: Arnd Bergmann 
> > > Acked-by: Karsten Keil 
> > 
> > This patch got added in the 4.6 kernel release.  As I am now taking
> > patches for 4.11-rc1, I figure it is time to just delete the
> > drivers/staging/i4l/ directory now, given that no one has really done
> > anything with it.  If people show up that wish to maintain it, I'll be
> > glad to revert it, or if someone really screams in the next week.
> > Otherwise it's time to just move on 
> 
> Sounds good to me.

Ok, now deleted!

thanks,

greg k-h

Re: [PATCH] net: stmmac: fix maxmtu assignment to be within valid range

2017-01-05 Thread Andy Shevchenko

On Thu, Jan 5, 2017 at 12:47 PM, Kweh, Hock Leong
 wrote:
> From: "Kweh, Hock Leong" 
>
> There is no checking valid value of maxmtu when getting it from devicetree.

'Device Tree' or 'device tree' ?

> This resolution added the checking condition to ensure the assignment is
> made within a valid range.

> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> index 39eb7a6..683d59f 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> @@ -3319,7 +3319,8 @@ int stmmac_dvr_probe(struct device *device,
> ndev->max_mtu = JUMBO_LEN;
> else
> ndev->max_mtu = SKB_MAX_HEAD(NET_SKB_PAD + NET_IP_ALIGN);
> -   if (priv->plat->maxmtu < ndev->max_mtu)
> +   if ((priv->plat->maxmtu < ndev->max_mtu) &&
> +   (priv->plat->maxmtu >= ndev->min_mtu))
> ndev->max_mtu = priv->plat->maxmtu;

Perhaps add a warning message on else branch?
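One way to act on that suggestion is sketched below as a standalone function. It is hypothetical: `pick_max_mtu()` is not a stmmac function, fprintf() stands in for dev_warn(), and the numbers in the test are illustrative only. It warns only when the Device Tree value is below min_mtu. A value at or above the hardware limit is simply ignored, since the hardware limit already applies, so warning on the whole else branch could be noisy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Returns the max MTU to use. *warned is set when the DT value is
 * rejected as invalid (below min_mtu). */
static unsigned int pick_max_mtu(unsigned int hw_max, unsigned int min_mtu,
				 unsigned int dt_maxmtu, bool *warned)
{
	*warned = false;
	if (dt_maxmtu < hw_max && dt_maxmtu >= min_mtu)
		return dt_maxmtu;          /* valid, tighter DT limit */
	if (dt_maxmtu < min_mtu) {
		/* would be dev_warn() in the driver */
		fprintf(stderr, "maxmtu %u below min_mtu %u, ignoring\n",
			dt_maxmtu, min_mtu);
		*warned = true;
	}
	return hw_max;                     /* keep the hardware limit */
}
```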

-- 
With Best Regards,
Andy Shevchenko

[PATCH] net: phy: dp83867: fix irq generation

2017-01-05 Thread Grygorii Strashko

For proper IRQ generation by DP83867 phy the INT/PWDN pin has to be
programmed as an interrupt output instead of a Powerdown input in
Configuration Register 3 (CFG3), Address 0x001E, bit 7 INT_OE = 1. The
current driver doesn't do this and as a result IRQs will not be generated by
DP83867 phy even if they are properly configured in DT.

Hence, fix IRQ generation by properly configuring CFG3.INT_OE bit and
ensure that Link Status Change (LINK_STATUS_CHNG_INT) and Auto-Negotiation
Complete (AUTONEG_COMP_INT) interrupts are enabled. After this, the DP83867
driver will work properly in interrupt enabled mode.

Signed-off-by: Grygorii Strashko 
---
 drivers/net/phy/dp83867.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/drivers/net/phy/dp83867.c b/drivers/net/phy/dp83867.c
index 1b63924..e84ae08 100644
--- a/drivers/net/phy/dp83867.c
+++ b/drivers/net/phy/dp83867.c
@@ -29,6 +29,7 @@
 #define MII_DP83867_MICR   0x12
 #define MII_DP83867_ISR0x13
 #define DP83867_CTRL   0x1f
+#define DP83867_CFG3   0x1e
 
 /* Extended Registers */
 #define DP83867_RGMIICTL   0x0032
@@ -98,6 +99,8 @@ static int dp83867_config_intr(struct phy_device *phydev)
micr_status |=
(MII_DP83867_MICR_AN_ERR_INT_EN |
MII_DP83867_MICR_SPEED_CHNG_INT_EN |
+   MII_DP83867_MICR_AUTONEG_COMP_INT_EN |
+   MII_DP83867_MICR_LINK_STS_CHNG_INT_EN |
MII_DP83867_MICR_DUP_MODE_CHNG_INT_EN |
MII_DP83867_MICR_SLEEP_MODE_CHNG_INT_EN);
 
@@ -214,6 +217,13 @@ static int dp83867_config_init(struct phy_device *phydev)
}
}
 
+   /* Enable Interrupt output INT_OE in CFG3 register */
+   if (phy_interrupt_is_valid(phydev)) {
+   val = phy_read(phydev, DP83867_CFG3);
+   val |= BIT(7);
+   phy_write(phydev, DP83867_CFG3, val);
+   }
+
return 0;
 }
 
-- 
2.10.1.dirty

Re: [PATCH net-next 01/10] net: netcp: ethss: add support of subsystem register region regmap

2017-01-05 Thread Murali Karicheri

Rob,

On 12/22/2016 04:24 PM, Rob Herring wrote:
> On Tue, Dec 20, 2016 at 05:09:44PM -0500, Murali Karicheri wrote:
>> From: WingMan Kwok 
>>
>> 10gbe phy driver needs to access the 10gbe subsystem control
>> register during phy initialization. To facilitate the shared
>> access of the subsystem register region between the 10gbe Ethernet
>> driver and the phy driver, this patch adds support of the
>> subsystem register region defined by a syscon node in the dts.
>>
>> Although there is no shared access to the gbe subsystem register
>> region, using syscon for that is for the sake of consistency.
>>
>> This change is backward compatible with previously released gbe
>> devicetree bindings.
>>
>> Signed-off-by: WingMan Kwok 
>> Signed-off-by: Murali Karicheri 
>> Signed-off-by: Sekhar Nori 
>> ---
>>  .../devicetree/bindings/net/keystone-netcp.txt |  16 ++-
>>  drivers/net/ethernet/ti/netcp_ethss.c  | 140 +
>>  2 files changed, 127 insertions(+), 29 deletions(-)
>>
>> diff --git a/Documentation/devicetree/bindings/net/keystone-netcp.txt b/Documentation/devicetree/bindings/net/keystone-netcp.txt
>> index 04ba1dc..0854a73 100644
>> --- a/Documentation/devicetree/bindings/net/keystone-netcp.txt
>> +++ b/Documentation/devicetree/bindings/net/keystone-netcp.txt
>> @@ -72,20 +72,24 @@ Required properties:
>>  "ti,netcp-gbe-2" for 1GbE N NetCP 1.5 (N=2)
>>  "ti,netcp-xgbe" for 10 GbE
>>  
>> +- syscon-subsys:phandle to syscon node of the switch
>> +subsystem registers.
>> +
>>  - reg:  register location and the size for the following register
>>  regions in the specified order.
>>  - switch subsystem registers
>> +- sgmii module registers
> 
> This needs to go on the end of the list. Otherwise, it is not backwards 
> compatible.

Thanks for your review! I assumed backward compatibility means a new kernel
should work with an old DTB. The driver code is adjusted to work with both
DTBs. Isn't that enough?

Murali

> 
>>  - sgmii port3/4 module registers (only for NetCP 1.4)
>>  - switch module registers
>>  - serdes registers (only for 10G)
>>  
>>  NetCP 1.4 ethss, here is the order
>> -index #0 - switch subsystem registers
>> +index #0 - sgmii module registers
>>  index #1 - sgmii port3/4 module registers
>>  index #2 - switch module registers
>>  
>>  NetCP 1.5 ethss 9 port, 5 port and 2 port
>> -index #0 - switch subsystem registers
>> +index #0 - sgmii module registers
>>  index #1 - switch module registers
>>  index #2 - serdes registers
>>  
>> @@ -145,6 +149,11 @@ Optional properties:
>>  
>>  Example binding:
>>  
>> +gbe_subsys: subsys@209 {
>> +compatible = "syscon";
>> +reg = <0x0209 0x100>;
>> +};
>> +
>>  netcp: netcp@200 {
>>  reg = <0x2620110 0x8>;
>>  reg-names = "efuse";
>> @@ -163,7 +172,8 @@ netcp: netcp@200 {
>>  ranges;
>>  gbe@9 {
>>  label = "netcp-gbe";
>> -reg = <0x9 0x300>, <0x90400 0x400>, <0x90800 0x700>;
>> +syscon-subsys = <&gbe_subsys>;
>> +reg = <0x90100 0x200>, <0x90400 0x200>, <0x90800 0x700>;
>>  /* enable-ale; */
>>  tx-queue = <648>;
>>  tx-channel = <8>;
> 


-- 
Murali Karicheri
Linux Kernel, Keystone

RE: [PATCH net] hyper-v: Add myself as additional MAINTAINER

2017-01-05 Thread KY Srinivasan



> -Original Message-
> From: gre...@linuxfoundation.org [mailto:gre...@linuxfoundation.org]
> Sent: Thursday, January 5, 2017 12:09 PM
> To: KY Srinivasan 
> Cc: Stephen Hemminger ;
> da...@davemloft.net; netdev@vger.kernel.org; linux-
> ker...@vger.kernel.org; Stephen Hemminger 
> Subject: Re: [PATCH net] hyper-v: Add myself as additional MAINTAINER
> 
> On Thu, Jan 05, 2017 at 07:08:23PM +, KY Srinivasan wrote:
> >
> >
> > > -Original Message-
> > > From: gre...@linuxfoundation.org [mailto:gre...@linuxfoundation.org]
> > > Sent: Thursday, January 5, 2017 10:29 AM
> > > To: KY Srinivasan 
> > > Cc: Stephen Hemminger ;
> > > da...@davemloft.net; netdev@vger.kernel.org; linux-
> > > ker...@vger.kernel.org; Stephen Hemminger
> 
> > > Subject: Re: [PATCH net] hyper-v: Add myself as additional MAINTAINER
> > >
> > > On Thu, Jan 05, 2017 at 05:43:04PM +, KY Srinivasan wrote:
> > > >
> > > >
> > > > > -Original Message-
> > > > > From: Stephen Hemminger [mailto:step...@networkplumber.org]
> > > > > Sent: Thursday, January 5, 2017 9:36 AM
> > > > > To: da...@davemloft.net; KY Srinivasan 
> > > > > Cc: netdev@vger.kernel.org; linux-ker...@vger.kernel.org;
> > > > > gre...@linuxfoundation.org; Stephen Hemminger
> > > > > 
> > > > > Subject: [PATCH net] hyper-v: Add myself as additional MAINTAINER
> > > > >
> > > > > Update the Hyper-V MAINTAINERS to include myself.
> > > > >
> > > > > Signed-off-by: Stephen Hemminger 
> > > >
> > > > Acked-by: K. Y. Srinivasan 
> > >
> > > Thanks, will go queue this up now.
> >
> > Thanks Greg. On a different note, there are a bunch of Hyper-V specific
> > patches that have been submitted over the last month or so that have not
> > been committed. Should I resend them?
> 
> Nope, they are still in my mbox, I'm just going through stuff that has
> to be in 4.10-final at the moment, give me another week or so to catch
> up on all the new stuff for 4.11-rc1.

Thanks Greg.

K. Y
> 
> thanks,
> 
> greg k-h

Re: [PATCH v1 1/2] bpf: add a longest prefix match trie map implementation

2017-01-05 Thread Daniel Borkmann


Hi Daniel,

On 01/05/2017 09:04 PM, Daniel Mack wrote:
> On 01/05/2017 05:25 PM, Daniel Borkmann wrote:
>> On 12/29/2016 06:28 PM, Daniel Mack wrote:
>>
>>> diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c
>>> new file mode 100644
>>> index 000..8b6a61d
>>> --- /dev/null
>>> +++ b/kernel/bpf/lpm_trie.c
>
> [..]
>
>>> +static struct bpf_map *trie_alloc(union bpf_attr *attr)
>>> +{
>>> +   struct lpm_trie *trie;
>>> +
>>> +   /* check sanity of attributes */
>>> +   if (attr->max_entries == 0 || attr->map_flags ||
>>> +   attr->key_size < sizeof(struct bpf_lpm_trie_key) + 1   ||
>>> +   attr->key_size > sizeof(struct bpf_lpm_trie_key) + 256 ||
>>> +   attr->value_size != sizeof(u64))
>>> +   return ERR_PTR(-EINVAL);
>>
>> The correct attr->map_flags test here would need to be ...
>>
>>    attr->map_flags != BPF_F_NO_PREALLOC
>>
>> ... since in this case we don't have any prealloc pool, and
>> should that come one day that test could be relaxed again.
>>
>>> +   trie = kzalloc(sizeof(*trie), GFP_USER | __GFP_NOWARN);
>>> +   if (!trie)
>>> +   return NULL;
>>> +
>>> +   /* copy mandatory map attributes */
>>> +   trie->map.map_type = attr->map_type;
>>> +   trie->map.key_size = attr->key_size;
>>> +   trie->map.value_size = attr->value_size;
>>> +   trie->map.max_entries = attr->max_entries;
>>
>> You also need to fill in trie->map.pages as that is eventually
>> used to charge memory against in bpf_map_charge_memlock(), right
>> now that would remain as 0 meaning the map is not accounted for.
>
> Hmm, okay. The nodes are, however, allocated dynamically at runtime in
> this case. That means that we have trie->map.pages on each allocation,
> right?

The current scheme (see e.g. htab_map_alloc() for the details, although
they are probably not too obvious) charges the worst-case cost up front,
so trie_alloc() is where you would fill in map.pages, and map_create()
will later account for them.

Thanks,
Daniel
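Charging the worst case up front means trie_alloc() would compute the most memory the trie could ever use (the trie struct plus max_entries nodes, each carrying its prefix data and value) and round that up to pages for trie->map.pages. A userspace sketch, assuming a 4 KiB page and a hypothetical node layout (the real struct lpm_trie_node may differ); `trie_cost_pages()` is a made-up helper, and data_size would be key_size minus the key header.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE  4096ULL   /* assumed page size */
#define SKETCH_PAGE_SHIFT 12

/* Hypothetical fixed part of a trie node; the real struct may differ. */
struct node_hdr {
	void *rcu[2];        /* placeholder for an rcu_head */
	void *child[2];
	uint32_t prefixlen;
	uint32_t flags;
};

/* Worst-case cost in pages for max_entries nodes, each holding
 * data_size bytes of prefix data followed by value_size bytes. */
static uint64_t trie_cost_pages(uint64_t trie_size, uint32_t max_entries,
				uint32_t data_size, uint32_t value_size)
{
	uint64_t cost = trie_size;

	cost += (uint64_t)max_entries *
		(sizeof(struct node_hdr) + data_size + value_size);
	/* round up to whole pages, since map.pages is a page count */
	return (cost + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT;
}
```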

Re: [PATCH] tg3: Avoid NULL pointer dereference in tg3_get_nstats()

2017-01-05 Thread David Miller

From: Michael Chan 
Date: Thu, 5 Jan 2017 12:04:13 -0800

> But it looks like ndo_get_stats() can be called without rtnl lock from
> net-procfs.c.  So it is possible that we'll read tp->hw_stats after it
> has been freed.  For example, if we are reading /proc/net/dev and
> closing tg3 at the same time.  David, is not taking rtnl_lock in
> net-procfs.c by design?

Probably not, that dev_get_stats() call probably should be surrounded
by RTNL protection.

Doing a quick grep on dev_get_stats() shows other call sites, most of
which are using it to fetch slave device statistics from the get stats
method of the parent.  Which should be ok.

It appears that the vlan procfs code in net/8021q/vlanproc.c has a
similar bug as net/core/net-procfs.c

Maybe net/core/net-sysfs.c has the same issue as well, and perhaps also
net/openvswitch/vport.c:ovs_vport_get_stats().

Re: [PATCH v1 1/2] bpf: add a longest prefix match trie map implementation

2017-01-05 Thread Daniel Mack

Hi,

On 01/05/2017 09:01 PM, Daniel Borkmann wrote:
> On 01/05/2017 05:25 PM, Daniel Borkmann wrote:
>> On 12/29/2016 06:28 PM, Daniel Mack wrote:

> [...]
>>> +static struct bpf_map *trie_alloc(union bpf_attr *attr)
>>> +{
>>> +struct lpm_trie *trie;
>>> +
>>> +/* check sanity of attributes */
>>> +if (attr->max_entries == 0 || attr->map_flags ||
>>> +attr->key_size < sizeof(struct bpf_lpm_trie_key) + 1   ||
>>> +attr->key_size > sizeof(struct bpf_lpm_trie_key) + 256 ||
>>> +attr->value_size != sizeof(u64))
>>> +return ERR_PTR(-EINVAL);
> 
> One more question on this regarding value size as u64 (perhaps I
> missed it along the way): was this chosen for keeping stats? Why not
> let the user choose a size as in other maps, so that custom structs
> could also be stored there?

In my use case, the actual value of a node is in fact ignored, all that
matters is whether a node exists in a trie or not. The test code uses
u64 for its tests.

I can change it around so that the value size can be defined by
userspace, but ideally it would also support 0-byte lengths then. The
bpf map syscall handler should handle the latter just fine if I read the
code correctly?
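With a user-defined value size, and with the map_flags suggestion applied, the attribute sanity check in trie_alloc() could look roughly like the sketch below. It is a userspace model under stated assumptions: the key is assumed to be a u32 prefixlen header followed by the data bytes, `SKETCH_F_NO_PREALLOC` mirrors BPF_F_NO_PREALLOC (bit 0), `LPM_VAL_SIZE_MAX` is a made-up bound, and accepting value_size == 0 assumes the syscall side copes with empty values, which is exactly the open question above.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define SKETCH_F_NO_PREALLOC (1U << 0)  /* mirrors BPF_F_NO_PREALLOC */
#define LPM_DATA_SIZE_MAX    256        /* from the patch's key_size bound */
#define LPM_VAL_SIZE_MAX     4096       /* hypothetical upper bound */

/* Assumed key header: prefix length, then the data bytes. */
struct lpm_key_hdr { uint32_t prefixlen; };

/* Hypothetical subset of union bpf_attr used by the check. */
struct map_attr {
	uint32_t key_size, value_size, max_entries, map_flags;
};

static int trie_check_attr(const struct map_attr *a)
{
	if (a->max_entries == 0 ||
	    a->map_flags != SKETCH_F_NO_PREALLOC ||  /* no prealloc pool */
	    a->key_size < sizeof(struct lpm_key_hdr) + 1 ||
	    a->key_size > sizeof(struct lpm_key_hdr) + LPM_DATA_SIZE_MAX ||
	    a->value_size > LPM_VAL_SIZE_MAX)         /* 0 is allowed */
		return -EINVAL;
	return 0;
}
```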

Thanks,
Daniel

Re: [PATCH net] hyper-v: Add myself as additional MAINTAINER

2017-01-05 Thread gre...@linuxfoundation.org

On Thu, Jan 05, 2017 at 07:08:23PM +, KY Srinivasan wrote:
> 
> 
> > -Original Message-
> > From: gre...@linuxfoundation.org [mailto:gre...@linuxfoundation.org]
> > Sent: Thursday, January 5, 2017 10:29 AM
> > To: KY Srinivasan 
> > Cc: Stephen Hemminger ;
> > da...@davemloft.net; netdev@vger.kernel.org; linux-
> > ker...@vger.kernel.org; Stephen Hemminger 
> > Subject: Re: [PATCH net] hyper-v: Add myself as additional MAINTAINER
> > 
> > On Thu, Jan 05, 2017 at 05:43:04PM +, KY Srinivasan wrote:
> > >
> > >
> > > > -Original Message-
> > > > From: Stephen Hemminger [mailto:step...@networkplumber.org]
> > > > Sent: Thursday, January 5, 2017 9:36 AM
> > > > To: da...@davemloft.net; KY Srinivasan 
> > > > Cc: netdev@vger.kernel.org; linux-ker...@vger.kernel.org;
> > > > gre...@linuxfoundation.org; Stephen Hemminger
> > > > 
> > > > Subject: [PATCH net] hyper-v: Add myself as additional MAINTAINER
> > > >
> > > > Update the Hyper-V MAINTAINERS to include myself.
> > > >
> > > > Signed-off-by: Stephen Hemminger 
> > >
> > > Acked-by: K. Y. Srinivasan 
> > 
> > Thanks, will go queue this up now.
> 
> Thanks Greg. On a different note, there are a bunch of Hyper-V specific
> patches that have been submitted over the last month or so that have not
> been committed. Should I resend them?

Nope, they are still in my mbox, I'm just going through stuff that has
to be in 4.10-final at the moment, give me another week or so to catch
up on all the new stuff for 4.11-rc1.

thanks,

greg k-h

Re: [for-next 07/10] IB/mlx5: Use blue flame register allocator in mlx5_ib

2017-01-05 Thread David Miller

From: Eli Cohen 
Date: Thu, 5 Jan 2017 14:03:18 -0600

> If necessary I can make sure it builds on 32 bits as well.

Please do.

Re: [PATCH v1 1/2] bpf: add a longest prefix match trie map implementation

2017-01-05 Thread Daniel Mack

Hi Daniel,

Thanks for your feedback! I agree on all points. Two questions below.

On 01/05/2017 05:25 PM, Daniel Borkmann wrote:
> On 12/29/2016 06:28 PM, Daniel Mack wrote:

>> diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c
>> new file mode 100644
>> index 000..8b6a61d
>> --- /dev/null
>> +++ b/kernel/bpf/lpm_trie.c

[..]

>> +static struct bpf_map *trie_alloc(union bpf_attr *attr)
>> +{
>> +struct lpm_trie *trie;
>> +
>> +/* check sanity of attributes */
>> +if (attr->max_entries == 0 || attr->map_flags ||
>> +attr->key_size < sizeof(struct bpf_lpm_trie_key) + 1   ||
>> +attr->key_size > sizeof(struct bpf_lpm_trie_key) + 256 ||
>> +attr->value_size != sizeof(u64))
>> +return ERR_PTR(-EINVAL);
> 
> The correct attr->map_flags test here would need to be ...
> 
>attr->map_flags != BPF_F_NO_PREALLOC
> 
> ... since in this case we don't have any prealloc pool, and
> should that come one day that test could be relaxed again.
> 
>> +trie = kzalloc(sizeof(*trie), GFP_USER | __GFP_NOWARN);
>> +if (!trie)
>> +return NULL;
>> +
>> +/* copy mandatory map attributes */
>> +trie->map.map_type = attr->map_type;
>> +trie->map.key_size = attr->key_size;
>> +trie->map.value_size = attr->value_size;
>> +trie->map.max_entries = attr->max_entries;
> 
> You also need to fill in trie->map.pages as that is eventually
> used to charge memory against in bpf_map_charge_memlock(), right
> now that would remain as 0 meaning the map is not accounted for.

Hmm, okay. The nodes are, however, allocated dynamically at runtime in
this case. That means that we have trie->map.pages on each allocation,
right?

>> +static void trie_free(struct bpf_map *map)
>> +{
>> +struct lpm_trie_node __rcu **slot;
>> +struct lpm_trie_node *node;
>> +struct lpm_trie *trie =
>> +container_of(map, struct lpm_trie, map);
>> +
>> +spin_lock(&trie->lock);
>> +
>> +/*
>> + * Always start at the root and walk down to a node that has no
>> + * children. Then free that node, nullify its pointer in the parent,
>> + * then start over.
>> + */
>> +
>> +for (;;) {
>> +slot = >root;
>> +
>> +for (;;) {
>> +node = rcu_dereference_protected(*slot,
>> +lockdep_is_held(>lock));
>> +if (!node)
>> +goto out;
>> +
>> +if (node->child[0]) {
> 
> rcu_access_pointer(node->child[0]) (at least to keep sparse happy?)

Done, but sparse does not actually complain here.



Thanks,
Daniel
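The teardown strategy described in the trie_free() comment (descend to a childless node, free it, clear the parent's pointer, restart from the root) can be modelled in userspace roughly as below. This is a sketch only: `struct node` and `free_trie()` are hypothetical names, and the locking and RCU accessors of the kernel version are left out.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-in for struct lpm_trie_node: only the child links. */
struct node {
	struct node *child[2];
};

/*
 * Free a binary trie without recursion: walk from the root down to a node
 * with no children, free it, clear the pointer that referenced it, and
 * start over. Returns the number of nodes freed.
 */
static size_t free_trie(struct node **root)
{
	size_t freed = 0;

	for (;;) {
		struct node **slot = root;
		struct node *n;

		if (!*slot)
			return freed;	/* nothing left */

		/* Descend until we reach a leaf. */
		for (;;) {
			n = *slot;
			if (n->child[0])
				slot = &n->child[0];
			else if (n->child[1])
				slot = &n->child[1];
			else
				break;
		}

		free(n);
		*slot = NULL;	/* nullify the parent's (or root's) pointer */
		freed++;
	}
}
```

Restarting from the root costs O(n*h) for n nodes and height h, but it needs neither recursion nor an explicit stack; per the patch description the height can reach 2048, which would be deep recursion on a kernel stack.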

Re: [PATCH v4 net-next] tools: psock_tpacket: block Rx until socket filter has been added and socket has been bound to loopback.

2017-01-05 Thread David Miller

From: Sowmini Varadhan 
Date: Thu,  5 Jan 2017 11:06:22 -0800

> Packets from any/all interfaces may be queued up on the PF_PACKET socket
> before it is bound to the loopback interface by psock_tpacket, and
> when these are passed up by the kernel, they could interfere
> with the Rx tests.
> 
> Avoid interference from spurious packets by blocking Rx until the
> socket filter has been set up, and the socket has been bound to the
> desired (lo) interface. The effective sequence is
>   socket(PF_PACKET, SOCK_RAW, 0);
>   set up ring
>   Invoke SO_ATTACH_FILTER
>   bind to sll_protocol set to ETH_P_ALL, sll_ifindex for lo
> After this sequence, the only packets that will be passed up are
> those received on loopback that pass the attached filter.
> 
> Signed-off-by: Sowmini Varadhan 

Applied, thanks.

Re: [PATCH] tg3: Avoid NULL pointer dereference in tg3_get_nstats()

2017-01-05 Thread Michael Chan

On Thu, Jan 5, 2017 at 9:33 AM, David Miller  wrote:
> From: Wang Yufen 
> Date: Thu, 5 Jan 2017 22:13:21 +0800
>
>> From: Yufen Wang 
>>
>> A possible NULL pointer dereference in tg3_get_stats64 while doing
>> tg3_free_consistent.
>  ...
>> This patch avoids the NULL pointer dereference by using !tg3_flag(tp, INIT_COMPLETE)
>> instead of !tp->hw_stats.
>>
>> Signed-off-by: Yufen Wang 
>> ---
>>  drivers/net/ethernet/broadcom/tg3.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
>> index 185e9e0..012f18d 100644
>> --- a/drivers/net/ethernet/broadcom/tg3.c
>> +++ b/drivers/net/ethernet/broadcom/tg3.c
>> @@ -14148,7 +14148,7 @@ static struct rtnl_link_stats64 *tg3_get_stats64(struct net_device *dev,
>>   struct tg3 *tp = netdev_priv(dev);
>>
>>   spin_lock_bh(&tp->lock);
>> - if (!tp->hw_stats) {
>> + if (!tg3_flag(tp, INIT_COMPLETE)) {
>
> The real issue is the manner and order in which the driver performs
> initialization actions relative to netif_device_{attach,detach}().
>
> That is what needs to be fixed here, instead of adding more and more
> ad-hoc tests to the various methods which can be invoked once the
> netif_device_attach() occurs.

Normally, ndo_get_stats64() should be under rtnl lock in the netlink
code path and we should be safe. We only free tp->hw_stats under rtnl
lock in the close path or ethtool path.

But it looks like ndo_get_stats() can be called without rtnl lock from
net-procfs.c.  So it is possible that we'll read tp->hw_stats after it
has been freed.  For example, if we are reading /proc/net/dev and
closing tg3 at the same time.  David, is not taking rtnl_lock in
net-procfs.c by design?

Re: [PATCH v1 1/2] bpf: add a longest prefix match trie map implementation

2017-01-05 Thread Daniel Borkmann


On 01/05/2017 05:25 PM, Daniel Borkmann wrote:

On 12/29/2016 06:28 PM, Daniel Mack wrote:

This trie implements a longest prefix match algorithm that can be used
to match IP addresses to a stored set of ranges.

Internally, data is stored in an unbalanced trie of nodes that has a
maximum height of n, where n is the prefixlen the trie was created
with.

Tries may be created with prefix lengths that are multiples of 8, in
the range from 8 to 2048. The key used for lookup and update operations
is a struct bpf_lpm_trie_key, and the value is a uint64_t.

The code carries more information about the internal implementation.

Signed-off-by: Daniel Mack 
Reviewed-by: David Herrmann 


Thanks for working on it, and sorry for the late reply. In addition to
Alexei's earlier comments on the cover letter, a few comments inline:


[...]

+static struct bpf_map *trie_alloc(union bpf_attr *attr)
+{
+	struct lpm_trie *trie;
+
+	/* check sanity of attributes */
+	if (attr->max_entries == 0 || attr->map_flags ||
+	    attr->key_size < sizeof(struct bpf_lpm_trie_key) + 1   ||
+	    attr->key_size > sizeof(struct bpf_lpm_trie_key) + 256 ||
+	    attr->value_size != sizeof(u64))
+		return ERR_PTR(-EINVAL);


One more question on this regarding value size as u64 (perhaps I
missed it along the way): was this chosen for keeping stats? Why not
let the user choose a size as in other maps, so that custom structs
could also be stored there?

Thanks,
Daniel
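Combining the attribute checks from trie_alloc() with the map_flags test suggested earlier in this thread gives roughly the following. This userspace sketch assumes the key header is a single 32-bit prefixlen followed by the data bytes (so key_size must lie in 5..260); the `sketch_` names are stand-ins for the uapi definitions, not the uapi itself.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Mirrors BPF_F_NO_PREALLOC, which is (1U << 0) in <linux/bpf.h>. */
#define SKETCH_BPF_F_NO_PREALLOC (1U << 0)

/* Assumed key layout: 32-bit prefix length, then the prefix bytes. */
struct sketch_lpm_trie_key {
	uint32_t prefixlen;
	uint8_t data[];
};

struct sketch_attr {
	uint32_t key_size;
	uint32_t value_size;
	uint32_t max_entries;
	uint32_t map_flags;
};

static int check_trie_attr(const struct sketch_attr *attr)
{
	const uint32_t hdr = sizeof(struct sketch_lpm_trie_key);

	if (attr->max_entries == 0 ||
	    attr->map_flags != SKETCH_BPF_F_NO_PREALLOC ||	/* no prealloc pool */
	    attr->key_size < hdr + 1 ||			/* at least 1 data byte */
	    attr->key_size > hdr + 256 ||		/* at most 2048 bits */
	    attr->value_size != sizeof(uint64_t))
		return -EINVAL;
	return 0;
}
```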

Re: [PATCH net-next v2] tcp: provide timestamps for partial writes

2017-01-05 Thread David Miller

From: Soheil Hassas Yeganeh 
Date: Wed,  4 Jan 2017 11:19:34 -0500

> From: Soheil Hassas Yeganeh 
> 
> For TCP sockets, TX timestamps are only captured when the user data
> is successfully and fully written to the socket. In many cases,
> however, TCP writes can be partial for which no timestamp is
> collected.
> 
> Collect timestamps whenever any user data is (fully or partially)
> copied into the socket. Pass tcp_write_queue_tail to tcp_tx_timestamp
> instead of the local skb pointer since it can be set to NULL on
> the error path.
> 
> Note that tcp_write_queue_tail can be NULL, even if bytes have been
> copied to the socket. This is because acknowledgements are being
> processed in tcp_sendmsg(), and by the time tcp_tx_timestamp is
> called tcp_write_queue_tail can be NULL. For such cases, this patch
> does not collect any timestamps (i.e., it is best-effort).
> 
> This patch is written with suggestions from Willem de Bruijn and
> Eric Dumazet.
> 
> Change-log V1 -> V2:
>   - Use sockc.tsflags instead of sk->sk_tsflags.
>   - Use the same code path for normal writes and errors.
> 
> Signed-off-by: Soheil Hassas Yeganeh 
> Acked-by: Yuchung Cheng 

Applied, thanks.

Re: [for-next 07/10] IB/mlx5: Use blue flame register allocator in mlx5_ib

2017-01-05 Thread David Miller

From: Saeed Mahameed 
Date: Tue,  3 Jan 2017 23:55:25 +0200

> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
> index ddb4ca4..39505ac 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
> @@ -5,7 +5,7 @@
>  config MLX5_CORE
>   tristate "Mellanox Technologies ConnectX-4 and Connect-IB core driver"
>   depends on MAY_USE_DEVLINK
> - depends on PCI
> + depends on PCI && 64BIT
>   default n
>   ---help---
> Core driver for low level functionality of the ConnectX-4 and

This is a regression, I'm not applying this.

I don't care how hard it is, you have to keep the driver building properly
in non-64bit builds.

Re: [PATCH] phy state machine: failsafe leave invalid RUNNING state

2017-01-05 Thread Florian Fainelli

On 01/05/2017 01:23 AM, Zefir Kurtisi wrote:
> On 01/04/2017 10:44 PM, Florian Fainelli wrote:
>> On 01/04/2017 08:10 AM, Zefir Kurtisi wrote:
>>> On 01/04/2017 04:30 PM, Florian Fainelli wrote:


 On 01/04/2017 07:27 AM, Zefir Kurtisi wrote:
> On 01/04/2017 04:13 PM, Florian Fainelli wrote:
>>
>>
>> On 01/04/2017 07:04 AM, Zefir Kurtisi wrote:
>>> While in RUNNING state, phy_state_machine() checks for link changes by
>>> comparing phydev->link before and after calling phy_read_status().
>>> This works as long as it is guaranteed that phydev->link is never
>>> changed outside the phy_state_machine().
>>>
>>> If in some setups this happens, it causes the state machine to miss
>>> a link loss and remain RUNNING despite phydev->link being 0.
>>>
>>> This has been observed running a dsa setup with a process continuously
>>> polling the link states over ethtool each second (SNMPD RFC-1213
>>> agent). Disconnecting the link on a phy followed by an ETHTOOL_GSET
>>> causes dsa_slave_get_settings() / dsa_slave_get_link_ksettings() to
>>> call phy_read_status() and with that modify the link status - and
>>> with that bricking the phy state machine.
>>
>> That's the interesting part of the analysis, how does this brick the PHY
>> state machine? Is the PHY driver changing the link status in the
>> read_status callback that it implements?
>>
> phydev->read_status points to genphy_read_status(), where the first call goes to
> genphy_update_link() which updates the link status.
>
> Thereafter phy_state_machine():RUNNING won't be able to detect the link loss
> anymore unless the link state changes again.
>
>
> I was trying to figure out if there is a rule that forbids changing phydev->link
> from outside the state machine, but found several places where it happens (either
> directly, or over genphy_read_status() or over genphy_update_link()).
>
> Curious how this did not show up before, since within the dsa setup it is very
> easy to trigger:
> a) physically disconnect link
> b) within one second run ethtool ethX

 You need to be more specific here about what "the dsa setup" is, drivers
 involved, which ports of the switch you are seeing this with (user
 facing, CPU port, DSA port?) etc.

>>> I am working on top of LEDE and with that at kernel 4.4.21 - alas I checked the
>>> related source files and believe the effect should be reproducible with HEAD.
>>>
>>> The setup is as follows:
>>> mv88e6321:
>>> * ports 0+1 connected to fibre-optics transceivers at fixed 100 Mbps
>>> * port 4 is CPU port
>>> * custom phy driver (replacement for marvell.ko) only populated with
>>>   * .config_init to
>>> * set fixed speed for ports 0+1 (when in FO mode)
>>> * run genphy_config_init() for all other modes (here: CPU port)
>>>   * .config_aneg=genphy_config_aneg, .read_status=genphy_read_status
>>>
>>>
>>> To my understanding, the exact setup is irrelevant - to reproduce the issue it is
>>> enough to have a means of running genphy_update_link() (as done in e.g.
>>> mediatek/mtk_eth_soc.c, dsa/slave.c), or genphy_read_status() (as done in e.g.
>>> hisilicon/hns/hns_enet.c) or phy_read_status() (as done in e.g.
>>> ethernet/ti/netcp_ethss.c, ethernet/aeroflex/greth.c, etc.). In the observed
>>> drivers it is mostly implemented in the ETHTOOL_GSET execution path.
>>>
>>> Once you get the link state updated outside the phy state machine, it remains in
>>> invalid RUNNING. To prevent that invalid state, to my understanding upper layer
>>> drivers (Ethernet, dsa) must not modify link-states in any case (including calling
>>> the functions noted above), or we need the proposed fail-safe mechanism to prevent
>>> getting stuck.
>>
>> OK, I see the code path involved now, sorry -ENOCOFFEE when I initially
>> responded. Yes, clearly, we should not be mangling the PHY device's link
>> by calling genphy_read_status(). At first glance, none of the users
>> below should be doing what they are doing, but let's kick a separate
>> patch series to collect feedback from the driver writers.
>>
>> Thanks!
>>
> Ok, thanks for taking time.
> 
> The kbuild test robot error is due to 'struct device dev' having been removed from
> the phy_device struct since 4.4.21. Does it make sense to provide a v2 fixing that, or
> do you expect that this fail-safe mechanism is not needed once all Ethernet/dsa
> drivers are fixed?

I think there is value in identifying wrong behaving drivers while we
fix them one after the other.

> 
> I think it won't hurt to add the check simply to ensure that it got fixed and the
> issue is not popping up thereafter.

Agreed, can you resubmit against the latest net-next/master tree?

Thanks!
-- 
Florian
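The failure mode in this thread can be reproduced with a simplified userspace model: an out-of-band status read updates the cached link flag, so the RUNNING poll sees no change and never notices the link loss. Everything here (`struct sketch_phy`, the two states, `sketch_poll_running()`) is hypothetical and much smaller than the real phy_state_machine().

```c
#include <assert.h>
#include <stdbool.h>

enum sketch_phy_state { SKETCH_RUNNING, SKETCH_NOLINK };

struct sketch_phy {
	bool hw_link;			/* what the hardware reports right now */
	bool link;			/* cached copy, like phydev->link */
	enum sketch_phy_state state;
};

/* Stands in for genphy_read_status(): refreshes the cached link flag. */
static void sketch_read_status(struct sketch_phy *p)
{
	p->link = p->hw_link;
}

/* One poll while in the RUNNING state. */
static void sketch_poll_running(struct sketch_phy *p, bool failsafe)
{
	bool old_link = p->link;

	if (p->state != SKETCH_RUNNING)
		return;

	sketch_read_status(p);

	/*
	 * Original check: only react if the link changed during this poll.
	 * Fail-safe: also leave RUNNING whenever the link is down.
	 */
	if (!p->link && (old_link != p->link || failsafe))
		p->state = SKETCH_NOLINK;
}
```

Without the fail-safe, a cable pull followed by an ETHTOOL_GSET-style read before the next poll leaves the model in RUNNING with the link down; with it, the next poll moves to NOLINK.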

Re: [PATCH v1 1/2] bpf: add a longest prefix match trie map implementation

2017-01-05 Thread Daniel Borkmann


On 01/05/2017 05:25 PM, Daniel Borkmann wrote:

On 12/29/2016 06:28 PM, Daniel Mack wrote:

This trie implements a longest prefix match algorithm that can be used
to match IP addresses to a stored set of ranges.

Internally, data is stored in an unbalanced trie of nodes that has a
maximum height of n, where n is the prefixlen the trie was created
with.

Tries may be created with prefix lengths that are multiples of 8, in
the range from 8 to 2048. The key used for lookup and update operations
is a struct bpf_lpm_trie_key, and the value is a uint64_t.

The code carries more information about the internal implementation.

Signed-off-by: Daniel Mack 
Reviewed-by: David Herrmann 


Thanks for working on it, and sorry for the late reply. In addition to
Alexei's earlier comments on the cover letter, a few comments inline:

[...]

diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c
new file mode 100644
index 000..8b6a61d
--- /dev/null
+++ b/kernel/bpf/lpm_trie.c
@@ -0,0 +1,468 @@
+/*
+ * Longest prefix match list implementation
+ *
+ * Copyright (c) 2016 Daniel Mack
+ * Copyright (c) 2016 David Herrmann
+ *
+ * This file is subject to the terms and conditions of version 2 of the GNU
+ * General Public License.  See the file COPYING in the main directory of the
+ * Linux distribution for more details.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/* Intermediate node */
+#define LPM_TREE_NODE_FLAG_IM BIT(0)
+
+struct lpm_trie_node;
+
+struct lpm_trie_node {
+	struct rcu_head			rcu;
+	struct lpm_trie_node __rcu	*child[2];
+	u32				prefixlen;
+	u32				flags;
+	u64				value;
+	u8				data[0];
+};
+
+struct lpm_trie {
+	struct bpf_map			map;
+	struct lpm_trie_node __rcu	*root;
+	size_t				n_entries;
+	size_t				max_prefixlen;
+	size_t				data_size;
+	spinlock_t			lock;
+};
+

[...]

+
+static inline int extract_bit(const u8 *data, size_t index)
+{
+	return !!(data[index / 8] & (1 << (7 - (index % 8))));
+}
+

[...]

+
+static struct lpm_trie_node *lpm_trie_node_alloc(size_t data_size)
+{
+	return kmalloc(sizeof(struct lpm_trie_node) + data_size,
+		       GFP_ATOMIC | __GFP_NOWARN);
+}
+
+/* Called from syscall or from eBPF program */
+static int trie_update_elem(struct bpf_map *map,
+			    void *_key, void *value, u64 flags)
+{
+	struct lpm_trie *trie = container_of(map, struct lpm_trie, map);
+	struct lpm_trie_node *node, *im_node, *new_node = NULL;
+	struct lpm_trie_node __rcu **slot;
+	struct bpf_lpm_trie_key *key = _key;
+	unsigned int next_bit;
+	size_t matchlen = 0;
+	int ret = 0;


We should guard for future map flags here:

	if (unlikely(flags > BPF_EXIST))
		return -EINVAL;

And further below we'd need to check for BPF_{NO,}EXIST when replacing
or adding the node, respectively?
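A sketch of the flag handling being asked about, following the usual map-update semantics of BPF_ANY, BPF_NOEXIST and BPF_EXIST. The constants mirror the uapi values; `check_update_flags()` is a hypothetical helper, and `exists` stands for whether a node with exactly the key's prefix is already in the trie.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Same values as the update flags in <linux/bpf.h>. */
#define SKETCH_BPF_ANY		0	/* create new element or update existing */
#define SKETCH_BPF_NOEXIST	1	/* create new element only */
#define SKETCH_BPF_EXIST	2	/* update existing element only */

/* Decide whether an update may proceed; 0 means go ahead. */
static int check_update_flags(uint64_t flags, bool exists)
{
	if (flags > SKETCH_BPF_EXIST)
		return -EINVAL;		/* reject unknown (future) flags */
	if (flags == SKETCH_BPF_NOEXIST && exists)
		return -EEXIST;		/* would replace an existing node */
	if (flags == SKETCH_BPF_EXIST && !exists)
		return -ENOENT;		/* would add a new node */
	return 0;
}
```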


+	if (key->prefixlen > trie->max_prefixlen)
+		return -EINVAL;
+
+	spin_lock(&trie->lock);


That spin lock would need to be converted to a raw lock, see commit
ac00881f9221 ("bpf: convert hashtab lock to raw lock"). The comment
in htab also mentions that bpf_map_update_elem() can be called in
irq context (I assume as a map from tracing side?), so we'd need to
use the *_irqsave variants here as well.


+	/* Allocate and fill a new node */
+
+	if (trie->n_entries == trie->map.max_entries) {
+		ret = -ENOSPC;
+		goto out;
+	}
+
+	new_node = lpm_trie_node_alloc(trie->data_size);
+	if (!new_node) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	trie->n_entries++;
+	new_node->value = *(u64 *) value;
+	new_node->prefixlen = key->prefixlen;
+	new_node->flags = 0;
+	new_node->child[0] = NULL;
+	new_node->child[1] = NULL;


Should this be ...

RCU_INIT_POINTER(new_node->child[0], NULL);
RCU_INIT_POINTER(new_node->child[1], NULL);


+	memcpy(new_node->data, key->data, trie->data_size);
+
+	/*
+	 * Now find a slot to attach the new node. To do that, walk the tree
+	 * from the root match as many bits as possible for each node until we
+	 * either find an empty slot or a slot that needs to be replaced by an
+	 * intermediate node.
+	 */
+	slot = &trie->root;
+
+	while ((node = rcu_dereference_protected(*slot,
+					lockdep_is_held(&trie->lock)))) {
+		matchlen = longest_prefix_match(trie, node, key);
+
+		if (node->prefixlen != matchlen ||
+		    node->prefixlen == key->prefixlen ||
+		    node->prefixlen == trie->max_prefixlen)
+			break;
+
+		next_bit = extract_bit(key->data, node->prefixlen);
+		slot = &node->child[next_bit];
+	}
+
+	/*
+	 * If the slot is empty (a free child pointer or an empty root),
+	 * simply assign the @new_node to that slot and be done.
+	 */
+	if (!node) {
+		rcu_assign_pointer(*slot, new_node);
+		goto

Re: [PATCH net-next] packet: fix panic in __packet_set_timestamp on tpacket_v3 in tx mode

2017-01-05 Thread Daniel Borkmann


On 01/05/2017 07:27 PM, Eric Dumazet wrote:

On Thu, 2017-01-05 at 02:34 +0100, Daniel Borkmann wrote:

[...]

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 7e39087..ddbda25 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -481,6 +481,9 @@ static __u32 __packet_set_timestamp(struct packet_sock *po, void *frame,
 		h.h2->tp_nsec = ts.tv_nsec;
 		break;
 	case TPACKET_V3:
+		h.h3->tp_sec = ts.tv_sec;
+		h.h3->tp_nsec = ts.tv_nsec;
+		break;
 	default:
 		WARN(1, "TPACKET version not supported.\n");
 		BUG();


Gosh. Can we also replace this BUG() with something less aggressive?


There are currently 5 of these WARN() + BUG() constructs and 1 BUG()-only
for the 'default' TPACKET version spread all over af_packet, so it probably
makes sense to make all of them less aggressive.


diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index b9e1a13b4ba36a0bc7edf6a8c2c116c7d48c970c..0c0d268544787dcbef6601c5014e7d3836d16f96 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -476,9 +476,11 @@ static __u32 __packet_set_timestamp(struct packet_sock *po, void *frame,
 		h.h2->tp_nsec = ts.tv_nsec;
 		break;
 	case TPACKET_V3:
+		h.h3->tp_sec = ts.tv_sec;
+		h.h3->tp_nsec = ts.tv_nsec;
+		break;
 	default:
-		WARN(1, "TPACKET version not supported.\n");
-		BUG();
+		pr_err_once("TPACKET version %u not supported.\n", po->tp_version);
 	}

/* one flush is safe, as both fields always lie on the same cacheline */

[net-next PATCH v6 0/3] net: dummy: Introduce dummy virtual functions

2017-01-05 Thread Phil Sutter

This series adds VF support to dummy device driver after adding the
necessary infrastructure changes:

Patch 1 adds a netdevice callback for device-specific VF count
retrieval. Patch 2 then changes dev_num_vf() implementation to make use
of that new callback (if implemented), falling back to the old
behaviour. Patch 3 then implements VF support in dummy, without the fake
PCI parent device hack from v5.

Phil Sutter (3):
  net: net_device_ops: Introduce ndo_get_vf_count
  net: rtnetlink: Use a local dev_num_vf() implementation
  net: dummy: Introduce dummy virtual functions

 drivers/net/dummy.c   | 178 +-
 include/linux/netdevice.h |   5 ++
 include/linux/pci.h   |   2 -
 net/core/rtnetlink.c  |  37 ++
 4 files changed, 205 insertions(+), 17 deletions(-)

-- 
2.11.0
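The dispatch described for patches 1 and 2 (use the driver's callback when present, otherwise fall back to the PCI parent) can be modelled like this. It is a userspace sketch: `struct sketch_netdev`, its fields and `sketch_dev_num_vf()` are hypothetical stand-ins for struct net_device, ndo_get_vf_count() and pci_num_vf().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_netdev;

/* Pared-down stand-in for struct net_device_ops. */
struct sketch_netdev_ops {
	int (*get_vf_count)(const struct sketch_netdev *dev);	/* optional */
};

struct sketch_netdev {
	const struct sketch_netdev_ops *ops;
	bool has_pci_parent;	/* dev->dev.parent exists and is PCI */
	int pci_num_vf;		/* what pci_num_vf() would report */
	int vf_count;		/* driver-private count, e.g. dummy's num_vfs */
};

/* Prefer the driver callback; otherwise fall back to the PCI parent. */
static int sketch_dev_num_vf(const struct sketch_netdev *dev)
{
	if (dev->ops && dev->ops->get_vf_count)
		return dev->ops->get_vf_count(dev);
	if (dev->has_pci_parent)
		return dev->pci_num_vf;
	return 0;
}

/* A dummy-like driver that reports its own VF count. */
static int dummy_like_get_vf_count(const struct sketch_netdev *dev)
{
	return dev->vf_count;
}

static const struct sketch_netdev_ops dummy_like_ops = {
	.get_vf_count = dummy_like_get_vf_count,
};
```

Because the parent check moves into the fallback branch, callers no longer need their own dev->dev.parent tests, which matches what patch 2 does in rtnetlink.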

[net-next PATCH v6 2/3] net: rtnetlink: Use a local dev_num_vf() implementation

2017-01-05 Thread Phil Sutter

Promote dev_num_vf() to be no longer PCI device specific but use
ndo_get_vf_count() if implemented and only fall back to pci_num_vf()
like the old dev_num_vf() did.

Since this implementation no longer requires a parent device to be
present, don't pass the parent but the actual device to it and have it
check for parent existence only in the fallback case. This in turn
allows eliminating parent existence checks in callers.

Signed-off-by: Phil Sutter 
---
Changes since v5:
- Introduced this patch.
---
 include/linux/pci.h  |  2 --
 net/core/rtnetlink.c | 37 -
 2 files changed, 24 insertions(+), 15 deletions(-)

diff --git a/include/linux/pci.h b/include/linux/pci.h
index e2d1a124216a9..adbc859fe7c4c 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -885,7 +885,6 @@ void pcibios_setup_bridge(struct pci_bus *bus, unsigned long type);
 void pci_sort_breadthfirst(void);
 #define dev_is_pci(d) ((d)->bus == &pci_bus_type)
 #define dev_is_pf(d) ((dev_is_pci(d) ? to_pci_dev(d)->is_physfn : false))
-#define dev_num_vf(d) ((dev_is_pci(d) ? pci_num_vf(to_pci_dev(d)) : 0))
 
 /* Generic PCI functions exported to card drivers */
 
@@ -1630,7 +1629,6 @@ static inline int pci_get_new_domain_nr(void) { return -ENOSYS; }
 
 #define dev_is_pci(d) (false)
 #define dev_is_pf(d) (false)
-#define dev_num_vf(d) (0)
 #endif /* CONFIG_PCI */
 
 /* Include architecture-dependent settings and functions */
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 18b5aae99becf..84294593e0306 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -833,13 +833,24 @@ static void copy_rtnl_link_stats(struct rtnl_link_stats *a,
a->rx_nohandler = b->rx_nohandler;
 }
 
+static int dev_num_vf(const struct net_device *dev)
+{
+   if (dev->netdev_ops->ndo_get_vf_count)
+   return dev->netdev_ops->ndo_get_vf_count(dev);
+#ifdef CONFIG_PCI
+   if (dev->dev.parent && dev_is_pci(dev->dev.parent))
+   return pci_num_vf(to_pci_dev(dev->dev.parent));
+#endif
+   return 0;
+}
+
 /* All VF info */
 static inline int rtnl_vfinfo_size(const struct net_device *dev,
   u32 ext_filter_mask)
 {
-   if (dev->dev.parent && dev_is_pci(dev->dev.parent) &&
-   (ext_filter_mask & RTEXT_FILTER_VF)) {
-   int num_vfs = dev_num_vf(dev->dev.parent);
+   int num_vfs = dev_num_vf(dev);
+
+   if (num_vfs && (ext_filter_mask & RTEXT_FILTER_VF)) {
size_t size = nla_total_size(0);
size += num_vfs *
(nla_total_size(0) +
@@ -889,12 +900,12 @@ static size_t rtnl_port_size(const struct net_device *dev,
size_t port_self_size = nla_total_size(sizeof(struct nlattr))
+ port_size;
 
-   if (!dev->netdev_ops->ndo_get_vf_port || !dev->dev.parent ||
+   if (!dev->netdev_ops->ndo_get_vf_port ||
!(ext_filter_mask & RTEXT_FILTER_VF))
return 0;
-   if (dev_num_vf(dev->dev.parent))
+   if (dev_num_vf(dev))
return port_self_size + vf_ports_size +
-   vf_port_size * dev_num_vf(dev->dev.parent);
+   vf_port_size * dev_num_vf(dev);
else
return port_self_size;
 }
@@ -962,7 +973,7 @@ static int rtnl_vf_ports_fill(struct sk_buff *skb, struct net_device *dev)
if (!vf_ports)
return -EMSGSIZE;
 
-   for (vf = 0; vf < dev_num_vf(dev->dev.parent); vf++) {
+   for (vf = 0; vf < dev_num_vf(dev); vf++) {
vf_port = nla_nest_start(skb, IFLA_VF_PORT);
if (!vf_port)
goto nla_put_failure;
@@ -1012,7 +1023,7 @@ static int rtnl_port_fill(struct sk_buff *skb, struct net_device *dev,
 {
int err;
 
-   if (!dev->netdev_ops->ndo_get_vf_port || !dev->dev.parent ||
+   if (!dev->netdev_ops->ndo_get_vf_port ||
!(ext_filter_mask & RTEXT_FILTER_VF))
return 0;
 
@@ -1020,7 +1031,7 @@ static int rtnl_port_fill(struct sk_buff *skb, struct net_device *dev,
if (err)
return err;
 
-   if (dev_num_vf(dev->dev.parent)) {
+   if (dev_num_vf(dev)) {
err = rtnl_vf_ports_fill(skb, dev);
if (err)
return err;
@@ -1351,15 +1362,15 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
if (rtnl_fill_stats(skb, dev))
goto nla_put_failure;
 
-   if (dev->dev.parent && (ext_filter_mask & RTEXT_FILTER_VF) &&
-   nla_put_u32(skb, IFLA_NUM_VF, dev_num_vf(dev->dev.parent)))
+   if (ext_filter_mask & RTEXT_FILTER_VF &&
+   nla_put_u32(skb, IFLA_NUM_VF, dev_num_vf(dev)))
goto nla_put_failure;
 
-   if (dev->netdev_ops->ndo_get_vf_config && dev->dev.parent &&
+   if (dev->netdev_ops->ndo_get_vf_config &&
ext_filter_mask & RTEXT_FILTER_VF) {

[net-next PATCH v6 1/3] net: net_device_ops: Introduce ndo_get_vf_count

2017-01-05 Thread Phil Sutter

The idea is to allow drivers to implement this callback in order to
provide a custom way to return the number of virtual functions present
on the device.

Signed-off-by: Phil Sutter 
---
Changes since v5:
- Introduced this patch.
---
 include/linux/netdevice.h | 5 +
 1 file changed, 5 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index ecd78b3c9abad..a04a693f55065 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -964,6 +964,10 @@ struct netdev_xdp {
  *  with PF and querying it may introduce a theoretical security risk.
 * int (*ndo_set_vf_rss_query_en)(struct net_device *dev, int vf, bool setting);
  * int (*ndo_get_vf_port)(struct net_device *dev, int vf, struct sk_buff *skb);
+ * int (*ndo_get_vf_count)(const struct net_device *dev);
+ * Return the number of VFs present on this device instead of having
+ * rtnetlink use pci_num_vf() on the PCI parent device.
+ *
  * int (*ndo_setup_tc)(struct net_device *dev, u8 tc)
  * Called to setup 'tc' number of traffic classes in the net device. This
  * is always called from the stack with the rtnl lock held and netif tx
@@ -1218,6 +1222,7 @@ struct net_device_ops {
int (*ndo_set_vf_rss_query_en)(
   struct net_device *dev,
   int vf, bool setting);
+   int (*ndo_get_vf_count)(const struct net_device *dev);
int (*ndo_setup_tc)(struct net_device *dev,
u32 handle,
__be16 protocol,
-- 
2.11.0

[net-next PATCH v6 3/3] net: dummy: Introduce dummy virtual functions

2017-01-05 Thread Phil Sutter

The idea for this was born when testing VF support in iproute2 which was
impeded by hardware requirements. In fact, not every VF-capable hardware
driver implements all netdev ops, so testing the interface is still hard
to do even with a well-sorted hardware shelf.

To overcome this and allow for testing the user-kernel interface, this
patch allows turning dummy into a PF with a configurable number of VFs.

Signed-off-by: Phil Sutter 
---
Changes since v5:
- Got rid of fake PCI parent hack altogether, implement ndo_get_vf_count
  instead.

Changes since v4:
- Initialize pci_pdev.sriov at runtime - older gcc versions don't allow
  initializing fields of anonymous unions at declaration time.
- Rebased onto current net-next/master.

Changes since v3:
- Changed type of vf_mac field from unsigned char to u8.
- Column-aligned structs' field names.

Changes since v2:
- Fixed oops on reboot (need to initialize parent device mutex).
- Got rid of potential mem leak noticed by Eric Dumazet.
- Dropped stray newline insertion.

Changes since v1:
- Fixed issues reported by kbuild test robot:
  - pci_dev->sriov is only present if CONFIG_PCI_ATS is active.
  - pci_bus_type does not exist if CONFIG_PCI is not defined.
---
 drivers/net/dummy.c | 178 +++-
 1 file changed, 176 insertions(+), 2 deletions(-)

diff --git a/drivers/net/dummy.c b/drivers/net/dummy.c
index 6421835f11b7e..8da0a97ff7cee 100644
--- a/drivers/net/dummy.c
+++ b/drivers/net/dummy.c
@@ -42,6 +42,25 @@
 #define DRV_VERSION	"1.0"
 
 static int numdummies = 1;
+static int num_vfs;
+
+struct vf_data_storage {
+   u8  vf_mac[ETH_ALEN];
+   u16 pf_vlan; /* When set, guest VLAN config not allowed. */
+   u16 pf_qos;
+   __be16  vlan_proto;
+   u16 min_tx_rate;
+   u16 max_tx_rate;
+   u8  spoofchk_enabled;
+   bool    rss_query_enabled;
+   u8  trusted;
+   int link_state;
+};
+
+struct dummy_priv {
+   int num_vfs;
+   struct vf_data_storage  *vfinfo;
+};
 
 /* fake multicast ability */
 static void set_multicast_list(struct net_device *dev)
@@ -91,10 +110,25 @@ static netdev_tx_t dummy_xmit(struct sk_buff *skb, struct net_device *dev)
 
 static int dummy_dev_init(struct net_device *dev)
 {
+   struct dummy_priv *priv = netdev_priv(dev);
+
dev->dstats = netdev_alloc_pcpu_stats(struct pcpu_dstats);
if (!dev->dstats)
return -ENOMEM;
 
+   priv->num_vfs = num_vfs;
+   priv->vfinfo = NULL;
+
+   if (!num_vfs)
+   return 0;
+
+   priv->vfinfo = kcalloc(num_vfs, sizeof(struct vf_data_storage),
+  GFP_KERNEL);
+   if (!priv->vfinfo) {
+   free_percpu(dev->dstats);
+   return -ENOMEM;
+   }
+
return 0;
 }
 
@@ -112,6 +146,124 @@ static int dummy_change_carrier(struct net_device *dev, bool new_carrier)
return 0;
 }
 
+static int dummy_set_vf_mac(struct net_device *dev, int vf, u8 *mac)
+{
+   struct dummy_priv *priv = netdev_priv(dev);
+
+   if (!is_valid_ether_addr(mac) || (vf >= priv->num_vfs))
+   return -EINVAL;
+
+   memcpy(priv->vfinfo[vf].vf_mac, mac, ETH_ALEN);
+
+   return 0;
+}
+
+static int dummy_set_vf_vlan(struct net_device *dev, int vf,
+u16 vlan, u8 qos, __be16 vlan_proto)
+{
+   struct dummy_priv *priv = netdev_priv(dev);
+
+   if ((vf >= priv->num_vfs) || (vlan > 4095) || (qos > 7))
+   return -EINVAL;
+
+   priv->vfinfo[vf].pf_vlan = vlan;
+   priv->vfinfo[vf].pf_qos = qos;
+   priv->vfinfo[vf].vlan_proto = vlan_proto;
+
+   return 0;
+}
+
+static int dummy_set_vf_rate(struct net_device *dev, int vf, int min, int max)
+{
+   struct dummy_priv *priv = netdev_priv(dev);
+
+   if (vf >= priv->num_vfs)
+   return -EINVAL;
+
+   priv->vfinfo[vf].min_tx_rate = min;
+   priv->vfinfo[vf].max_tx_rate = max;
+
+   return 0;
+}
+
+static int dummy_set_vf_spoofchk(struct net_device *dev, int vf, bool val)
+{
+   struct dummy_priv *priv = netdev_priv(dev);
+
+   if (vf >= priv->num_vfs)
+   return -EINVAL;
+
+   priv->vfinfo[vf].spoofchk_enabled = val;
+
+   return 0;
+}
+
+static int dummy_set_vf_rss_query_en(struct net_device *dev, int vf, bool val)
+{
+   struct dummy_priv *priv = netdev_priv(dev);
+
+   if (vf >= priv->num_vfs)
+   return -EINVAL;
+
+   priv->vfinfo[vf].rss_query_enabled = val;
+
+   return 0;
+}
+
+static int dummy_set_vf_trust(struct net_device *dev, int vf, bool val)
+{
+   struct dummy_priv *priv = netdev_priv(dev);
+
+   if (vf >= priv->num_vfs)
+   return -EINVAL;
+
+   priv->vfinfo[vf].trusted = val;
+
+   return 0;
+}
+
+static int dummy_get_vf_config(struct net_device *dev,
+  int vf, struct ifla_vf_info

RE: [PATCH net] hyper-v: Add myself as additional MAINTAINER

2017-01-05 Thread KY Srinivasan



> -Original Message-
> From: gre...@linuxfoundation.org [mailto:gre...@linuxfoundation.org]
> Sent: Thursday, January 5, 2017 10:29 AM
> To: KY Srinivasan 
> Cc: Stephen Hemminger ;
> da...@davemloft.net; netdev@vger.kernel.org; linux-
> ker...@vger.kernel.org; Stephen Hemminger 
> Subject: Re: [PATCH net] hyper-v: Add myself as additional MAINTAINER
> 
> On Thu, Jan 05, 2017 at 05:43:04PM +, KY Srinivasan wrote:
> >
> >
> > > -Original Message-
> > > From: Stephen Hemminger [mailto:step...@networkplumber.org]
> > > Sent: Thursday, January 5, 2017 9:36 AM
> > > To: da...@davemloft.net; KY Srinivasan 
> > > Cc: netdev@vger.kernel.org; linux-ker...@vger.kernel.org;
> > > gre...@linuxfoundation.org; Stephen Hemminger
> > > 
> > > Subject: [PATCH net] hyper-v: Add myself as additional MAINTAINER
> > >
> > > Update the Hyper-V MAINTAINERS to include myself.
> > >
> > > Signed-off-by: Stephen Hemminger 
> >
> > Acked-by: K. Y. Srinivasan 
> 
> Thanks, will go queue this up now.

Thanks Greg. On a different note, there are a bunch of Hyper-V specific
patches that have been submitted over the last month or so that have not
been committed. Should I resend them?

Regards,

K. Y
> 
> greg k-h

[PATCH net-next v2] net: dsa: b53: Utilize common helpers for u64/MAC

2017-01-05 Thread Florian Fainelli

Utilize the two functions recently introduced: u64_to_ether_addr() and
ether_addr_to_u64() instead of our own versions.

Reviewed-by: Andrew Lunn 
Signed-off-by: Florian Fainelli 
---
Changes in v2:

- include etherdevice.h in b53_priv.h to fix Kbuild reported
  errors

 drivers/net/dsa/b53/b53_common.c |  2 +-
 drivers/net/dsa/b53/b53_priv.h   | 24 +++-
 2 files changed, 4 insertions(+), 22 deletions(-)

diff --git a/drivers/net/dsa/b53/b53_common.c b/drivers/net/dsa/b53/b53_common.c
index 947adda3397d..d5370c227043 100644
--- a/drivers/net/dsa/b53/b53_common.c
+++ b/drivers/net/dsa/b53/b53_common.c
@@ -1137,7 +1137,7 @@ static int b53_arl_op(struct b53_device *dev, int op, int port,
int ret;
 
/* Convert the array into a 64-bit MAC */
-   mac = b53_mac_to_u64(addr);
+   mac = ether_addr_to_u64(addr);
 
/* Perform a read for the given MAC and VID */
b53_write48(dev, B53_ARLIO_PAGE, B53_MAC_ADDR_IDX, mac);
diff --git a/drivers/net/dsa/b53/b53_priv.h b/drivers/net/dsa/b53/b53_priv.h
index f192a673caba..1f4b07b77de2 100644
--- a/drivers/net/dsa/b53/b53_priv.h
+++ b/drivers/net/dsa/b53/b53_priv.h
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include "b53_regs.h"
@@ -325,25 +326,6 @@ struct b53_arl_entry {
u8 is_static:1;
 };
 
-static inline void b53_mac_from_u64(u64 src, u8 *dst)
-{
-   unsigned int i;
-
-   for (i = 0; i < ETH_ALEN; i++)
-   dst[ETH_ALEN - 1 - i] = (src >> (8 * i)) & 0xff;
-}
-
-static inline u64 b53_mac_to_u64(const u8 *src)
-{
-   unsigned int i;
-   u64 dst = 0;
-
-   for (i = 0; i < ETH_ALEN; i++)
-   dst |= (u64)src[ETH_ALEN - 1 - i] << (8 * i);
-
-   return dst;
-}
-
 static inline void b53_arl_to_entry(struct b53_arl_entry *ent,
u64 mac_vid, u32 fwd_entry)
 {
@@ -352,14 +334,14 @@ static inline void b53_arl_to_entry(struct b53_arl_entry *ent,
ent->is_valid = !!(fwd_entry & ARLTBL_VALID);
ent->is_age = !!(fwd_entry & ARLTBL_AGE);
ent->is_static = !!(fwd_entry & ARLTBL_STATIC);
-   b53_mac_from_u64(mac_vid, ent->mac);
+   u64_to_ether_addr(mac_vid, ent->mac);
ent->vid = mac_vid >> ARLTBL_VID_S;
 }
 
 static inline void b53_arl_from_entry(u64 *mac_vid, u32 *fwd_entry,
  const struct b53_arl_entry *ent)
 {
-   *mac_vid = b53_mac_to_u64(ent->mac);
+   *mac_vid = ether_addr_to_u64(ent->mac);
*mac_vid |= (u64)(ent->vid & ARLTBL_VID_MASK) << ARLTBL_VID_S;
*fwd_entry = ent->port & ARLTBL_DATA_PORT_ID_MASK;
if (ent->is_valid)
-- 
2.9.3
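For reference, the byte-order convention shared by the removed b53 helpers and their common replacements (the first octet ends up as the most significant byte of the low 48 bits) can be sketched in plain userspace C. The names `mac_to_u64()` and `u64_to_mac()` are hypothetical stand-ins for the kernel's `ether_addr_to_u64()` and `u64_to_ether_addr()`; this is an illustration of the conversion, not the kernel implementation.

```c
#include <stdint.h>

#define MAC_LEN 6 /* same value as the kernel's ETH_ALEN */

/* Hypothetical userspace analogue of ether_addr_to_u64(): pack six
 * octets into the low 48 bits, first octet most significant. */
static uint64_t mac_to_u64(const uint8_t *addr)
{
	uint64_t u = 0;
	int i;

	for (i = 0; i < MAC_LEN; i++)
		u = (u << 8) | addr[i];
	return u;
}

/* Hypothetical userspace analogue of u64_to_ether_addr(): the inverse,
 * unpacking the low 48 bits back into six octets. */
static void u64_to_mac(uint64_t u, uint8_t *addr)
{
	int i;

	for (i = MAC_LEN - 1; i >= 0; i--) {
		addr[i] = u & 0xff;
		u >>= 8;
	}
}
```

For example, 00:10:18:aa:bb:cc packs to 0x001018aabbcc and unpacks back to the same six bytes, which matches what the removed `b53_mac_to_u64()` and `b53_mac_from_u64()` loops compute.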

[PATCH v4 net-next] tools: psock_tpacket: block Rx until socket filter has been added and socket has been bound to loopback.

2017-01-05 Thread Sowmini Varadhan

Packets from any/all interfaces may be queued up on the PF_PACKET socket
before it is bound to the loopback interface by psock_tpacket, and
when these are passed up by the kernel, they could interfere
with the Rx tests.

Avoid interference from spurious packets by blocking Rx until the
socket filter has been set up and the socket has been bound to the
desired (lo) interface. The effective sequence is:

    socket(PF_PACKET, SOCK_RAW, 0);
    set up ring
    invoke SO_ATTACH_FILTER
    bind with sll_protocol set to ETH_P_ALL, sll_ifindex for lo

After this sequence, the only packets that will be passed up are
those received on loopback that pass the attached filter.

Signed-off-by: Sowmini Varadhan 
---
v2: patch reworked based on comments from Willem de Bruijn
v4: dropped patch 1/2: leave it soft; 
Send patch 2/2 to the owner of tools/testing/selftests/net/ listed in
MAINTAINERS, instead of the list generated by get_maintainer.pl

 tools/testing/selftests/net/psock_tpacket.c |6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/net/psock_tpacket.c b/tools/testing/selftests/net/psock_tpacket.c
index 4a1bc64..7f6cd9f 100644
--- a/tools/testing/selftests/net/psock_tpacket.c
+++ b/tools/testing/selftests/net/psock_tpacket.c
@@ -110,7 +110,7 @@ struct block_desc {
 
 static int pfsocket(int ver)
 {
-   int ret, sock = socket(PF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
+   int ret, sock = socket(PF_PACKET, SOCK_RAW, 0);
if (sock == -1) {
perror("socket");
exit(1);
@@ -239,7 +239,6 @@ static void walk_v1_v2_rx(int sock, struct ring *ring)
bug_on(ring->type != PACKET_RX_RING);
 
pair_udp_open(udp_sock, PORT_BASE);
-   pair_udp_setfilter(sock);
 
memset(&pfd, 0, sizeof(pfd));
pfd.fd = sock;
@@ -601,7 +600,6 @@ static void walk_v3_rx(int sock, struct ring *ring)
bug_on(ring->type != PACKET_RX_RING);
 
pair_udp_open(udp_sock, PORT_BASE);
-   pair_udp_setfilter(sock);
 
memset(&pfd, 0, sizeof(pfd));
pfd.fd = sock;
@@ -741,6 +739,8 @@ static void bind_ring(int sock, struct ring *ring)
 {
int ret;
 
+   pair_udp_setfilter(sock);
+
ring->ll.sll_family = PF_PACKET;
ring->ll.sll_protocol = htons(ETH_P_ALL);
ring->ll.sll_ifindex = if_nametoindex("lo");
-- 
1.7.1

Re: [PATCH] MIPS: NI 169445 board support

2017-01-05 Thread Nathan Sullivan

On Thu, Jan 05, 2017 at 06:33:53PM +, Joao Pinto wrote:
> Hi,
> 
> At 6:28 PM on 1/5/2017, Niklas Cassel wrote:
> > On 01/04/2017 05:38 PM, Nathan Sullivan wrote:
> >> On Tue, Dec 20, 2016 at 05:34:34PM +0100, Ralf Baechle wrote:
> >>> On Fri, Dec 02, 2016 at 09:42:09AM -0600, Nathan Sullivan wrote:
>  Date:   Fri, 2 Dec 2016 09:42:09 -0600
>  From: Nathan Sullivan 
>  To: r...@linux-mips.org, mark.rutl...@arm.com, robh...@kernel.org
>  CC: linux-m...@linux-mips.org, devicet...@vger.kernel.org,
>   linux-ker...@vger.kernel.org, Nathan Sullivan 
>  Subject: [PATCH] MIPS: NI 169445 board support
>  Content-Type: text/plain
> 
>  Support the National Instruments 169445 board.
> >>> Nathan,
> >>>
> >>> I assume you're going to repost the changes Rob asked for in
> >>> https://patchwork.linux-mips.org/patch/14641/#26924 and resubmit?
> >>>
> >>> Thanks,
> >>>
> >>>   Ralf
> >> Hmm, I found the issue with the generic MIPS config and dwc_eth_qos.  The NIC
> >> driver attempts to cache align a descriptor ring using the ___cacheline_aligned
> >> attribute on the descriptor struct, in combination with a "skip" feature in
> >> hardware.  However, the skip feature only has a three bit field, and the generic
> >> MIPS config selects MIPS_L1_CACHE_SHIFT_7.  So, the line size is 128, and with a
> >> 64-bit bus, that means the NIC descriptor skip field would need to be set to
> >> 14 to align the 16-byte descriptors...
> >>
> >> I guess it makes sense for a generic MIPS kernel to align everything for 128 byte
> >> cache lines, and for me to fix the dwc_eth_qos driver to handle cases where the
> >> line size is too big for the hardware skip feature, right?
> > 
> > I don't know if you've been following the discussion regarding
> > dwc_eth_qos on netdev, but Joao Pinto from Synopsys is
> > planning on removing the driver (since the stmmac driver
> > now supports the same version of the IP, together with older
> > versions of the IP).
> > 
> > Since device tree bindings are treated as an ABI,
> > Joao has implemented a glue layer for stmmac that parses
> > the dwc_eth_qos binding, but uses stmmac under the hood.
> > 
> > You can use any of the bindings, but since the dwc_eth_qos
> > binding will be marked as deprecated, you might want to
> > consider moving to the stmmac binding.
> 
> A patch set to port dwc_eth_qos to stmmac is at this moment under review:
> 
> http://patchwork.ozlabs.org/patch/711428/
> http://patchwork.ozlabs.org/patch/711438/
> http://patchwork.ozlabs.org/patch/711439/
> 
> Niklas has tested it and it works well, so after the patches are upstreamed the
> dwc_eth_qos will be removed as agreed with Lars.
> 
> Thanks.
>

Thanks for the heads up, I'll wait, adjust my bindings and retest then.

   Nathan

> > 
> >>
> >> Thanks,
> >>
> >>Nathan
> >>
> >>
> > 
>

Re: [PATCH] MIPS: NI 169445 board support

2017-01-05 Thread Joao Pinto

At 6:44 PM on 1/5/2017, Nathan Sullivan wrote:
> On Thu, Jan 05, 2017 at 06:33:53PM +, Joao Pinto wrote:
>> Hi,
>>
>> At 6:28 PM on 1/5/2017, Niklas Cassel wrote:
>>> On 01/04/2017 05:38 PM, Nathan Sullivan wrote:
 On Tue, Dec 20, 2016 at 05:34:34PM +0100, Ralf Baechle wrote:
> On Fri, Dec 02, 2016 at 09:42:09AM -0600, Nathan Sullivan wrote:
>> Date:   Fri, 2 Dec 2016 09:42:09 -0600
>> From: Nathan Sullivan 
>> To: r...@linux-mips.org, mark.rutl...@arm.com, robh...@kernel.org
>> CC: linux-m...@linux-mips.org, devicet...@vger.kernel.org,
>>  linux-ker...@vger.kernel.org, Nathan Sullivan 
>> Subject: [PATCH] MIPS: NI 169445 board support
>> Content-Type: text/plain
>>
>> Support the National Instruments 169445 board.
> Nathan,
>
> I assume you're going to repost the changes Rob asked for in
> https://patchwork.linux-mips.org/patch/14641/#26924 and resubmit?
>
> Thanks,
>
>   Ralf
 Hmm, I found the issue with the generic MIPS config and dwc_eth_qos.  The NIC
 driver attempts to cache align a descriptor ring using the ___cacheline_aligned
 attribute on the descriptor struct, in combination with a "skip" feature in
 hardware.  However, the skip feature only has a three bit field, and the generic
 MIPS config selects MIPS_L1_CACHE_SHIFT_7.  So, the line size is 128, and with a
 64-bit bus, that means the NIC descriptor skip field would need to be set to
 14 to align the 16-byte descriptors...

 I guess it makes sense for a generic MIPS kernel to align everything for 128 byte
 cache lines, and for me to fix the dwc_eth_qos driver to handle cases where the
 line size is too big for the hardware skip feature, right?
>>>
>>> I don't know if you've been following the discussion regarding
>>> dwc_eth_qos on netdev, but Joao Pinto from Synopsys is
>>> planning on removing the driver (since the stmmac driver
>>> now supports the same version of the IP, together with older
>>> versions of the IP).
>>>
>>> Since device tree bindings are treated as an ABI,
>>> Joao has implemented a glue layer for stmmac that parses
>>> the dwc_eth_qos binding, but uses stmmac under the hood.
>>>
>>> You can use any of the bindings, but since the dwc_eth_qos
>>> binding will be marked as deprecated, you might want to
>>> consider moving to the stmmac binding.
>>
>> A patch set to port dwc_eth_qos to stmmac is at this moment under review:
>>
>> http://patchwork.ozlabs.org/patch/711428/
>> http://patchwork.ozlabs.org/patch/711438/
>> http://patchwork.ozlabs.org/patch/711439/
>>
>> Niklas has tested it and it works well, so after the patches are upstreamed the
>> dwc_eth_qos will be removed as agreed with Lars.
>>
>> Thanks.
>>
> 
> Thanks for the heads up, I'll wait, adjust my bindings and retest then.

Great! Thanks!

> 
>Nathan
> 
>>>

 Thanks,

Nathan


>>>
>>

[PATCH net-next] net: dsa: remove version string

2017-01-05 Thread Vivien Didelot

The dsa_driver_version string is irrelevant and has not been bumped
since its introduction about 9 years ago. Kill it.

Signed-off-by: Vivien Didelot 
---
 net/dsa/dsa.c  | 5 -
 net/dsa/dsa_priv.h | 1 -
 net/dsa/slave.c| 1 -
 3 files changed, 7 deletions(-)

diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index 89e66b623d73..3f85be0aae34 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -27,8 +27,6 @@
 #include 
 #include "dsa_priv.h"
 
-char dsa_driver_version[] = "0.1";
-
 static struct sk_buff *dsa_slave_notag_xmit(struct sk_buff *skb,
struct net_device *dev)
 {
@@ -926,9 +924,6 @@ static int dsa_probe(struct platform_device *pdev)
struct dsa_switch_tree *dst;
int ret;
 
-   pr_notice_once("Distributed Switch Architecture driver version %s\n",
-  dsa_driver_version);
-
if (pdev->dev.of_node) {
ret = dsa_of_probe(&pdev->dev);
if (ret)
diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
index 6cfd7388834e..63ae1484abae 100644
--- a/net/dsa/dsa_priv.h
+++ b/net/dsa/dsa_priv.h
@@ -49,7 +49,6 @@ struct dsa_slave_priv {
 };
 
 /* dsa.c */
-extern char dsa_driver_version[];
 int dsa_cpu_dsa_setup(struct dsa_switch *ds, struct device *dev,
  struct device_node *port_dn, int port);
 void dsa_cpu_dsa_destroy(struct device_node *port_dn);
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index ffd91969b830..5cd5b8137c08 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -673,7 +673,6 @@ static void dsa_slave_get_drvinfo(struct net_device *dev,
  struct ethtool_drvinfo *drvinfo)
 {
strlcpy(drvinfo->driver, "dsa", sizeof(drvinfo->driver));
-   strlcpy(drvinfo->version, dsa_driver_version, sizeof(drvinfo->version));
strlcpy(drvinfo->fw_version, "N/A", sizeof(drvinfo->fw_version));
strlcpy(drvinfo->bus_info, "platform", sizeof(drvinfo->bus_info));
 }
-- 
2.11.0

[PATCH net] hyper-v: Add myself as additional MAINTAINER

2017-01-05 Thread Stephen Hemminger

Update the Hyper-V MAINTAINERS to include myself.

Signed-off-by: Stephen Hemminger 
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index ea11bb03f550..7542341d8155 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5963,6 +5963,7 @@ F:drivers/media/platform/sti/hva
 Hyper-V CORE AND DRIVERS
 M: "K. Y. Srinivasan" 
 M: Haiyang Zhang 
+M: Stephen Hemminger 
 L: de...@linuxdriverproject.org
 S: Maintained
 F: arch/x86/include/asm/mshyperv.h
-- 
2.11.0

Re: [PATCH net-next] net: dsa: remove version string

2017-01-05 Thread Florian Fainelli

On 01/05/2017 09:28 AM, Vivien Didelot wrote:
> The dsa_driver_version string is irrelevant and has not been bumped
> since its introduction about 9 years ago. Kill it.
> 
> Signed-off-by: Vivien Didelot 

Reviewed-by: Florian Fainelli 

Good riddance ;)
-- 
Florian

Re: [PATCH v3 net-next 1/2] tools: psock_lib: tighten conditions checked in sock_setfilter

2017-01-05 Thread Shuah Khan

On 01/05/2017 08:54 AM, Sowmini Varadhan wrote:
> On (01/04/17 16:26), Shuah Khan wrote:
>>
>> Could you please split this patch into two. Hardening part in one and
>> the cleanup in a separate patch. This way I can get the hardening fix
>> into 4.10 in my next Kselftest update. Cleanup patch can go in later.
>>
>> thanks,
>> -- Shuah
> 
> I'm a little confused by the comments above.
> 
> Dan's suggestion was that I could have used some other
> tool to generate the code, rather than hand-crafting it as I did.
> In his last message, he suggests that it may be ok to leave
> the hand-crafted version as is (for now), as well.
> 
> To make it clear:
> the current v3 version *is* the "hardening" part. Dan's suggestion is
> that the hand-crafted version can be replaced by bpf_asm generated code
> later. That would be the "cleanup" part, which I was going to do in a
> later commit.
> 
> Does that help?
> 
> --Sowmini
> 

Let's try this again. I want to see a separate patch for the
filter cleanup. I don't want that included in the non-udp packet
check. Please address the readability review comments from me and
Daniel when you send your next version.

thanks,
-- Shuah

Re: [PATCH net-next] packet: fix panic in __packet_set_timestamp on tpacket_v3 in tx mode

2017-01-05 Thread Eric Dumazet

On Thu, 2017-01-05 at 02:34 +0100, Daniel Borkmann wrote:
> When TX timestamping is in use with TPACKET_V3's TX ring, then we'll
> hit the BUG() in __packet_set_timestamp() when ring buffer slot is
> returned to user space via tpacket_destruct_skb(). This is due to v3
> being assumed as unreachable here, but since 7f953ab2ba46 ("af_packet:
> TX_RING support for TPACKET_V3") it's not anymore. Fix it by filling
> the timestamp back into the ring slot.
> 
> Fixes: 7f953ab2ba46 ("af_packet: TX_RING support for TPACKET_V3")
> Signed-off-by: Daniel Borkmann 
> ---
>  net/packet/af_packet.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> index 7e39087..ddbda25 100644
> --- a/net/packet/af_packet.c
> +++ b/net/packet/af_packet.c
> @@ -481,6 +481,9 @@ static __u32 __packet_set_timestamp(struct packet_sock *po, void *frame,
>   h.h2->tp_nsec = ts.tv_nsec;
>   break;
>   case TPACKET_V3:
> + h.h3->tp_sec = ts.tv_sec;
> + h.h3->tp_nsec = ts.tv_nsec;
> + break;
>   default:
>   WARN(1, "TPACKET version not supported.\n");
>   BUG();

Gosh. Can we also replace this BUG() with something less aggressive?

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index b9e1a13b4ba36a0bc7edf6a8c2c116c7d48c970c..0c0d268544787dcbef6601c5014e7d3836d16f96 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -476,9 +476,11 @@ static __u32 __packet_set_timestamp(struct packet_sock *po, void *frame,
h.h2->tp_nsec = ts.tv_nsec;
break;
case TPACKET_V3:
+   h.h3->tp_sec = ts.tv_sec;
+   h.h3->tp_nsec = ts.tv_nsec;
+   break;
default:
-   WARN(1, "TPACKET version not supported.\n");
-   BUG();
+   pr_err_once("TPACKET version %u not supported.\n", po->tp_version);
}
 
/* one flush is safe, as both fields always lie on the same cacheline */
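The shape of the change Eric suggests (every reachable version writes the timestamp, and an unexpected version is reported once instead of taking the machine down with BUG()) can be sketched in userspace C. Everything here is hypothetical: the enum, `struct ts_hdr` and `set_timestamp()` are simplified stand-ins for the af_packet frame headers, and `fprintf()` guarded by a static flag plays the role of `pr_err_once()`.

```c
#include <stdint.h>
#include <stdio.h>
#include <time.h>

enum ring_version { RING_V1, RING_V2, RING_V3 }; /* hypothetical */

/* Hypothetical simplified frame header: V1 stores microseconds in frac,
 * V2 and V3 store nanoseconds (mirroring tp_usec vs. tp_nsec). */
struct ts_hdr {
	uint32_t sec;
	uint32_t frac;
};

/* Returns 1 if a timestamp was written, 0 for an unknown version. */
static int set_timestamp(int version, struct ts_hdr *h,
			 const struct timespec *ts)
{
	static int warned; /* emulates the "once" in pr_err_once() */

	switch (version) {
	case RING_V1:
		h->sec = (uint32_t)ts->tv_sec;
		h->frac = (uint32_t)(ts->tv_nsec / 1000);
		return 1;
	case RING_V2:
	case RING_V3: /* reachable now that V3 has a TX ring */
		h->sec = (uint32_t)ts->tv_sec;
		h->frac = (uint32_t)ts->tv_nsec;
		return 1;
	default:
		if (!warned) {
			warned = 1;
			fprintf(stderr, "ring version %d not supported\n",
				version);
		}
		return 0; /* degrade gracefully instead of crashing */
	}
}
```

The design choice is that a missing timestamp is a recoverable condition for the caller, whereas BUG() turns an unforeseen code path into a panic.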

Re: [PATCH] MIPS: NI 169445 board support

2017-01-05 Thread Joao Pinto

Hi,

At 6:28 PM on 1/5/2017, Niklas Cassel wrote:
> On 01/04/2017 05:38 PM, Nathan Sullivan wrote:
>> On Tue, Dec 20, 2016 at 05:34:34PM +0100, Ralf Baechle wrote:
>>> On Fri, Dec 02, 2016 at 09:42:09AM -0600, Nathan Sullivan wrote:
 Date:   Fri, 2 Dec 2016 09:42:09 -0600
 From: Nathan Sullivan 
 To: r...@linux-mips.org, mark.rutl...@arm.com, robh...@kernel.org
 CC: linux-m...@linux-mips.org, devicet...@vger.kernel.org,
  linux-ker...@vger.kernel.org, Nathan Sullivan 
 Subject: [PATCH] MIPS: NI 169445 board support
 Content-Type: text/plain

 Support the National Instruments 169445 board.
>>> Nathan,
>>>
>>> I assume you're going to repost the changes Rob asked for in
>>> https://patchwork.linux-mips.org/patch/14641/#26924 and resubmit?
>>>
>>> Thanks,
>>>
>>>   Ralf
>> Hmm, I found the issue with the generic MIPS config and dwc_eth_qos.  The NIC
>> driver attempts to cache align a descriptor ring using the ___cacheline_aligned
>> attribute on the descriptor struct, in combination with a "skip" feature in
>> hardware.  However, the skip feature only has a three bit field, and the generic
>> MIPS config selects MIPS_L1_CACHE_SHIFT_7.  So, the line size is 128, and with a
>> 64-bit bus, that means the NIC descriptor skip field would need to be set to
>> 14 to align the 16-byte descriptors...
>>
>> I guess it makes sense for a generic MIPS kernel to align everything for 128 byte
>> cache lines, and for me to fix the dwc_eth_qos driver to handle cases where the
>> line size is too big for the hardware skip feature, right?
> 
> I don't know if you've been following the discussion regarding
> dwc_eth_qos on netdev, but Joao Pinto from Synopsys is
> planning on removing the driver (since the stmmac driver
> now supports the same version of the IP, together with older
> versions of the IP).
> 
> Since device tree bindings are treated as an ABI,
> Joao has implemented a glue layer for stmmac that parses
> the dwc_eth_qos binding, but uses stmmac under the hood.
> 
> You can use any of the bindings, but since the dwc_eth_qos
> binding will be marked as deprecated, you might want to
> consider moving to the stmmac binding.

A patch set to port dwc_eth_qos to stmmac is at this moment under review:

http://patchwork.ozlabs.org/patch/711428/
http://patchwork.ozlabs.org/patch/711438/
http://patchwork.ozlabs.org/patch/711439/

Niklas has tested it and it works well, so after the patches are upstreamed the
dwc_eth_qos will be removed as agreed with Lars.

Thanks.

> 
>>
>> Thanks,
>>
>>Nathan
>>
>>
>

Re: [PATCH net] hyper-v: Add myself as additional MAINTAINER

2017-01-05 Thread gre...@linuxfoundation.org

On Thu, Jan 05, 2017 at 05:43:04PM +, KY Srinivasan wrote:
> 
> 
> > -Original Message-
> > From: Stephen Hemminger [mailto:step...@networkplumber.org]
> > Sent: Thursday, January 5, 2017 9:36 AM
> > To: da...@davemloft.net; KY Srinivasan 
> > Cc: netdev@vger.kernel.org; linux-ker...@vger.kernel.org;
> > gre...@linuxfoundation.org; Stephen Hemminger
> > 
> > Subject: [PATCH net] hyper-v: Add myself as additional MAINTAINER
> > 
> > Update the Hyper-V MAINTAINERS to include myself.
> > 
> > Signed-off-by: Stephen Hemminger 
> 
> Acked-by: K. Y. Srinivasan 

Thanks, will go queue this up now.

greg k-h

Re: Re: [PATCH] MIPS: NI 169445 board support

2017-01-05 Thread Niklas Cassel

On 01/04/2017 05:38 PM, Nathan Sullivan wrote:
> On Tue, Dec 20, 2016 at 05:34:34PM +0100, Ralf Baechle wrote:
>> On Fri, Dec 02, 2016 at 09:42:09AM -0600, Nathan Sullivan wrote:
>>> Date:   Fri, 2 Dec 2016 09:42:09 -0600
>>> From: Nathan Sullivan 
>>> To: r...@linux-mips.org, mark.rutl...@arm.com, robh...@kernel.org
>>> CC: linux-m...@linux-mips.org, devicet...@vger.kernel.org,
>>>  linux-ker...@vger.kernel.org, Nathan Sullivan 
>>> Subject: [PATCH] MIPS: NI 169445 board support
>>> Content-Type: text/plain
>>>
>>> Support the National Instruments 169445 board.
>> Nathan,
>>
>> I assume you're going to repost the changes Rob asked for in
>> https://patchwork.linux-mips.org/patch/14641/#26924 and resubmit?
>>
>> Thanks,
>>
>>   Ralf
> Hmm, I found the issue with the generic MIPS config and dwc_eth_qos.  The NIC
> driver attempts to cache align a descriptor ring using the ___cacheline_aligned
> attribute on the descriptor struct, in combination with a "skip" feature in
> hardware.  However, the skip feature only has a three bit field, and the generic
> MIPS config selects MIPS_L1_CACHE_SHIFT_7.  So, the line size is 128, and with a
> 64-bit bus, that means the NIC descriptor skip field would need to be set to
> 14 to align the 16-byte descriptors...
>
> I guess it makes sense for a generic MIPS kernel to align everything for 128 byte
> cache lines, and for me to fix the dwc_eth_qos driver to handle cases where the
> line size is too big for the hardware skip feature, right?

I don't know if you've been following the discussion regarding
dwc_eth_qos on netdev, but Joao Pinto from Synopsys is
planning on removing the driver (since the stmmac driver
now supports the same version of the IP, together with older
versions of the IP).

Since device tree bindings are treated as an ABI,
Joao has implemented a glue layer for stmmac that parses
the dwc_eth_qos binding, but uses stmmac under the hood.

You can use any of the bindings, but since the dwc_eth_qos
binding will be marked as deprecated, you might want to
consider moving to the stmmac binding.

>
> Thanks,
>
>Nathan
>
>
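Nathan's arithmetic can be checked in a few lines of C. This sketch assumes, as his numbers imply, that the skip field counts bus-width words inserted between consecutive descriptors; `desc_skip_words()` and `skip_fits()` are hypothetical helpers, not dwc_eth_qos code.

```c
#include <stdbool.h>

/* Words to skip after each descriptor so the next one starts on a new
 * cache line (assumes line_bytes >= desc_bytes and an exact fit). */
static unsigned int desc_skip_words(unsigned int line_bytes,
				    unsigned int desc_bytes,
				    unsigned int bus_bytes)
{
	return (line_bytes - desc_bytes) / bus_bytes;
}

/* Can the skip value be programmed into a field_bits-wide field? */
static bool skip_fits(unsigned int skip, unsigned int field_bits)
{
	return skip <= (1u << field_bits) - 1;
}
```

With MIPS_L1_CACHE_SHIFT_7 (128-byte lines), 16-byte descriptors and an 8-byte (64-bit) bus, `desc_skip_words(128, 16, 8)` is 14, which does not fit in a three-bit field whose maximum is 7; with 64-byte lines the same calculation gives 6, which does.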

Re: [PATCH net-next] net: make ndo_get_stats64 a void function

2017-01-05 Thread David Miller

From: Stephen Hemminger 
Date: Thu,  5 Jan 2017 09:31:36 -0800

> The network device operation for reading statistics is only called
> in one place, and it ignores the return value. Having a structure
> return value is potentially confusing because some future driver could
> incorrectly assume that the return value was used.
> 
> Fix all drivers with ndo_get_stats64 to have a void function.
> 
> Signed-off-by: Stephen Hemminger 

You missed at least one new warning, please do a fresh allmodconfig build
and watch the logs.

drivers/net/ethernet/broadcom/bnx2.c: In function ‘bnx2_get_stats64’:
drivers/net/ethernet/broadcom/bnx2.c:6830:10: warning: ‘return’ with a value, in function returning void

Thanks.
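The signature change, and the warning quoted above, can be illustrated with a userspace analogue. The types and names below (`struct stats64`, `get_stats64_fn`, `demo_get_stats64()`, `read_stats()`) are hypothetical simplifications of `struct rtnl_link_stats64` and the `ndo_get_stats64` hook, not the kernel definitions. The caller owns the storage, so the callback has nothing useful to return, and a converted driver that still ends with `return stats;` is exactly what produces "'return' with a value, in function returning void".

```c
#include <stdint.h>
#include <string.h>

struct stats64 { /* hypothetical stand-in */
	uint64_t rx_packets;
	uint64_t tx_packets;
};

/* After the change: fill caller-provided storage, return nothing. */
typedef void (*get_stats64_fn)(void *priv, struct stats64 *storage);

struct demo_priv {
	uint64_t rx, tx;
};

static void demo_get_stats64(void *priv, struct stats64 *storage)
{
	const struct demo_priv *p = priv;

	storage->rx_packets = p->rx;
	storage->tx_packets = p->tx;
	/* No "return storage;" here: in a void function that is the
	 * "'return' with a value" warning reported for bnx2. */
}

/* The single caller: zero the storage, let the driver fill it, and use
 * only the storage afterwards. */
static void read_stats(get_stats64_fn fn, void *priv, struct stats64 *out)
{
	memset(out, 0, sizeof(*out));
	fn(priv, out);
}
```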

Re: [PATCH net-next 0/3] rxrpc: Update tracing and proc interfaces

2017-01-05 Thread David Miller

From: David Howells <dhowe...@redhat.com>
Date: Thu, 05 Jan 2017 14:31:50 +

> This set of patches fixes and extends tracing:
> 
>  (1) Fix the handling of enum-to-string translations so that external
>  tracing tools can make use of it by using TRACE_DEFINE_ENUM.
> 
>  (2) Extend a couple of tracepoints to export some extra available
>  information and add three new tracepoints to allow monitoring of
>  received DATA packets, call disconnection and improper/implicit call
>  termination.
> 
> and adds a bit more procfs-exported information:
> 
>  (3) Show a call's hard-ACK cursors in /proc/net/rxrpc_calls.
> 
> The patches can be found here also:
> 
>   
> http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=rxrpc-rewrite
> 
> Tagged thusly:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git
>   rxrpc-rewrite-20170105

Pulled, thanks.

Re: [PATCH v3 3/3] stmmac: adding new glue driver dwmac-dwc-qos-eth

2017-01-05 Thread Joao Pinto

Hi Alex,

At 5:19 PM on 1/5/2017, Alexandre Torgue wrote:
> Hi Joao,
> 
> On 01/04/2017 05:22 PM, Joao Pinto wrote:
>> This patch adds a new glue driver called dwmac-dwc-qos-eth which
>> is based on dwc_eth_qos as is. To assure backward compatibility, a slight
>> tweak was also added to stmmac_platform.
> 
> Sorry for coming late to the review. I have a basic question: why do you create a
> glue driver for that?
> dwmac glues are currently vendor-specific, so why create one for an IP? Why not
> continue to use stmmac_platform.c?
> (It is very basic; I assume I am missing something.)
> 

If you check the kernel tree, there is a Synopsys QoS driver under
net/ethernet/synopsys/*.qos.c. Synopsys currently has the goal of supporting
QoS in the mainline kernel, and so a discussion took place a month ago about
what would be the best solution. At the time we (mailing-list folks) decided to
port the net/ethernet/synopsys/*.qos.c driver to stmmac and remove it. This way
we can have stmmac as a single Synopsys Ethernet software package.
For us to achieve this we agreed that stmmac would have

Lars, the current synopsys/*.qos.c maintainer, requested that stmmac be compatible
with the devicetree bindings that Axis' customers were using in the driver. So
if you check the new glue driver, you will see it parses the legacy driver's DT
bindings and initiates stmmac. So you can see it as a legacy-compatible glue
for stmmac.

Thanks,
Joao

> thanks
> Alex
> 
> 
> 
>>
>> Signed-off-by: Joao Pinto 
>> ---
>> changes v2 -> v3:
>> - Nothing changed, just to keep up patch set version
>> changes v1 -> v2:
>> - WOL was not declared in the new glue driver
>> - clocks were switched and now fixed (apb_pclk and phy_ref_clk)
>>
>>  .../bindings/net/snps,dwc-qos-ethernet.txt |   3 +
>>  drivers/net/ethernet/stmicro/stmmac/Kconfig|   9 +
>>  drivers/net/ethernet/stmicro/stmmac/Makefile   |   1 +
>>  .../ethernet/stmicro/stmmac/dwmac-dwc-qos-eth.c| 200 +
>>  .../net/ethernet/stmicro/stmmac/stmmac_platform.c  |  15 +-
>>  5 files changed, 225 insertions(+), 3 deletions(-)
>>  create mode 100644 drivers/net/ethernet/stmicro/stmmac/dwmac-dwc-qos-eth.c
>>
>> diff --git a/Documentation/devicetree/bindings/net/snps,dwc-qos-ethernet.txt b/Documentation/devicetree/bindings/net/snps,dwc-qos-ethernet.txt
>> index d93f71c..21d27aa 100644
>> --- a/Documentation/devicetree/bindings/net/snps,dwc-qos-ethernet.txt
>> +++ b/Documentation/devicetree/bindings/net/snps,dwc-qos-ethernet.txt
>> @@ -1,5 +1,8 @@
>>  * Synopsys DWC Ethernet QoS IP version 4.10 driver (GMAC)
>>
>> +This binding is deprecated but continues to be supported; new
>> +features should preferably be added to the stmmac binding document.
>> +
>>  This binding supports the Synopsys Designware Ethernet QoS (Quality Of Service)
>>  IP block. The IP supports multiple options for bus type, clocking and reset
>>  structure, and feature list. Consequently, a number of properties and list
>> diff --git a/drivers/net/ethernet/stmicro/stmmac/Kconfig b/drivers/net/ethernet/stmicro/stmmac/Kconfig
>> index ab66248..99594e3 100644
>> --- a/drivers/net/ethernet/stmicro/stmmac/Kconfig
>> +++ b/drivers/net/ethernet/stmicro/stmmac/Kconfig
>> @@ -29,6 +29,15 @@ config STMMAC_PLATFORM
>>
>>  if STMMAC_PLATFORM
>>
>> +config DWMAC_DWC_QOS_ETH
>> +tristate "Support for snps,dwc-qos-ethernet.txt DT binding."
>> +select PHYLIB
>> +select CRC32
>> +select MII
>> +depends on OF && HAS_DMA
>> +help
>> +  Support for chips using the snps,dwc-qos-ethernet.txt DT binding.
>> +
>>  config DWMAC_GENERIC
>>  tristate "Generic driver for DWMAC"
>>  default STMMAC_PLATFORM
>> diff --git a/drivers/net/ethernet/stmicro/stmmac/Makefile b/drivers/net/ethernet/stmicro/stmmac/Makefile
>> index 8f83a86..700c603 100644
>> --- a/drivers/net/ethernet/stmicro/stmmac/Makefile
>> +++ b/drivers/net/ethernet/stmicro/stmmac/Makefile
>> @@ -16,6 +16,7 @@ obj-$(CONFIG_DWMAC_SOCFPGA)+= dwmac-altr-socfpga.o
>>  obj-$(CONFIG_DWMAC_STI)+= dwmac-sti.o
>>  obj-$(CONFIG_DWMAC_STM32)+= dwmac-stm32.o
>>  obj-$(CONFIG_DWMAC_SUNXI)+= dwmac-sunxi.o
>> +obj-$(CONFIG_DWMAC_DWC_QOS_ETH)+= dwmac-dwc-qos-eth.o
>>  obj-$(CONFIG_DWMAC_GENERIC)+= dwmac-generic.o
>>  stmmac-platform-objs:= stmmac_platform.o
>>  dwmac-altr-socfpga-objs := altr_tse_pcs.o dwmac-socfpga.o
>> diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-dwc-qos-eth.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-dwc-qos-eth.c
>> new file mode 100644
>> index 000..4532a7c
>> --- /dev/null
>> +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-dwc-qos-eth.c
>> @@ -0,0 +1,200 @@
>> +/*
>> + * Synopsys DWC Ethernet Quality-of-Service v4.10a linux driver
>> + *
>> + * Copyright (C) 2016 Joao Pinto 
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU

RE: [PATCH net] hyper-v: Add myself as additional MAINTAINER

2017-01-05 Thread KY Srinivasan



> -Original Message-
> From: Stephen Hemminger [mailto:step...@networkplumber.org]
> Sent: Thursday, January 5, 2017 9:36 AM
> To: da...@davemloft.net; KY Srinivasan 
> Cc: netdev@vger.kernel.org; linux-ker...@vger.kernel.org;
> gre...@linuxfoundation.org; Stephen Hemminger
> 
> Subject: [PATCH net] hyper-v: Add myself as additional MAINTAINER
> 
> Update the Hyper-V MAINTAINERS to include myself.
> 
> Signed-off-by: Stephen Hemminger 

Acked-by: K. Y. Srinivasan 

> ---
>  MAINTAINERS | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index ea11bb03f550..7542341d8155 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -5963,6 +5963,7 @@ F:  drivers/media/platform/sti/hva
>  Hyper-V CORE AND DRIVERS
>  M:   "K. Y. Srinivasan" 
>  M:   Haiyang Zhang 
> +M:   Stephen Hemminger 
>  L:   de...@linuxdriverproject.org
>  S:   Maintained
>  F:   arch/x86/include/asm/mshyperv.h
> --
> 2.11.0

Re: [PATCH net] hyper-v: Add myself as additional MAINTAINER

2017-01-05 Thread Greg KH

On Thu, Jan 05, 2017 at 09:36:26AM -0800, Stephen Hemminger wrote:
> Update the Hyper-V MAINTAINERS to include myself.
> 
> Signed-off-by: Stephen Hemminger 
> ---
>  MAINTAINERS | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index ea11bb03f550..7542341d8155 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -5963,6 +5963,7 @@ F:  drivers/media/platform/sti/hva
>  Hyper-V CORE AND DRIVERS
>  M:   "K. Y. Srinivasan" 
>  M:   Haiyang Zhang 
> +M:   Stephen Hemminger 

No objection from me, they need all the help they can get :)

But I would like an ack from the current maintainers that this is ok
first.

thanks,

greg k-h

Re: [PATCH] tg3: Avoid NULL pointer dereference in tg3_get_nstats()

2017-01-05 Thread David Miller

From: Wang Yufen 
Date: Thu, 5 Jan 2017 22:13:21 +0800

> From: Yufen Wang 
> 
> A possible NULL pointer dereference in tg3_get_stats64 while doing
> tg3_free_consistent.
 ...
> This patch avoids the NULL pointer dereference by using !tg3_flag(tp, INIT_COMPLETE)
> instead of !tp->hw_stats.
> 
> Signed-off-by: Yufen Wang 
> ---
>  drivers/net/ethernet/broadcom/tg3.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
> index 185e9e0..012f18d 100644
> --- a/drivers/net/ethernet/broadcom/tg3.c
> +++ b/drivers/net/ethernet/broadcom/tg3.c
> @@ -14148,7 +14148,7 @@ static struct rtnl_link_stats64 *tg3_get_stats64(struct net_device *dev,
>   struct tg3 *tp = netdev_priv(dev);
>  
>   spin_lock_bh(&tp->lock);
> - if (!tp->hw_stats) {
> + if (!tg3_flag(tp, INIT_COMPLETE)) {

The real issue is the manner and order in which the driver performs
initialization actions relative to netif_device_{attach,detach}().

That is what needs to be fixed here, instead of adding more and more
ad-hoc tests to the various methods which can be invoked once the
netif_device_attach() occurs.

[PATCH net-next] net: make ndo_get_stats64 a void function

2017-01-05 Thread Stephen Hemminger

The network device operation for reading statistics is only called
in one place, and it ignores the return value. Having a structure
return value is potentially confusing because some future driver could
incorrectly assume that the return value was used.

Fix all drivers with ndo_get_stats64 to have a void function.

Signed-off-by: Stephen Hemminger 
---
 drivers/net/bonding/bond_main.c  | 10 --
 drivers/net/dummy.c  |  5 ++---
 drivers/net/ethernet/alacritech/slicoss.c|  6 ++
 drivers/net/ethernet/amazon/ena/ena_netdev.c | 10 --
 drivers/net/ethernet/amd/xgbe/xgbe-drv.c |  6 ++
 drivers/net/ethernet/apm/xgene/xgene_enet_main.c |  4 +---
 drivers/net/ethernet/atheros/alx/main.c  |  6 ++
 drivers/net/ethernet/broadcom/b44.c  |  5 ++---
 drivers/net/ethernet/broadcom/bnx2.c |  3 +--
 drivers/net/ethernet/broadcom/bnxt/bnxt.c|  6 ++
 drivers/net/ethernet/broadcom/tg3.c  |  8 +++-
 drivers/net/ethernet/brocade/bna/bnad.c  |  6 ++
 drivers/net/ethernet/calxeda/xgmac.c |  5 ++---
 drivers/net/ethernet/cavium/thunder/nicvf_main.c |  5 ++---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c  |  7 +++
 drivers/net/ethernet/cisco/enic/enic_main.c  |  8 +++-
 drivers/net/ethernet/ec_bhf.c|  4 +---
 drivers/net/ethernet/emulex/benet/be_main.c  |  5 ++---
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.c   |  6 ++
 drivers/net/ethernet/hisilicon/hns/hns_enet.c|  6 ++
 drivers/net/ethernet/ibm/ehea/ehea_main.c|  5 ++---
 drivers/net/ethernet/intel/e1000e/e1000.h|  4 ++--
 drivers/net/ethernet/intel/e1000e/netdev.c   |  5 ++---
 drivers/net/ethernet/intel/fm10k/fm10k_netdev.c  |  6 ++
 drivers/net/ethernet/intel/i40e/i40e.h   |  5 ++---
 drivers/net/ethernet/intel/i40e/i40e_main.c  | 18 ++
 drivers/net/ethernet/intel/igb/igb_main.c| 10 --
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c|  7 ---
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c|  6 ++
 drivers/net/ethernet/marvell/mvneta.c|  4 +---
 drivers/net/ethernet/marvell/mvpp2.c |  4 +---
 drivers/net/ethernet/marvell/sky2.c  |  6 ++
 drivers/net/ethernet/mediatek/mtk_eth_soc.c  |  6 ++
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c   |  4 +---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c|  3 +--
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c |  3 +--
 drivers/net/ethernet/mellanox/mlxsw/spectrum.c   |  4 +---
 drivers/net/ethernet/mellanox/mlxsw/switchx2.c   |  3 +--
 drivers/net/ethernet/myricom/myri10ge/myri10ge.c |  9 -
 drivers/net/ethernet/neterion/vxge/vxge-main.c   |  4 +---
 drivers/net/ethernet/netronome/nfp/nfp_net_common.c  |  6 ++
 drivers/net/ethernet/nvidia/forcedeth.c  |  4 +---
 drivers/net/ethernet/qlogic/netxen/netxen_nic_main.c | 10 --
 drivers/net/ethernet/qlogic/qede/qede_main.c |  7 ++-
 drivers/net/ethernet/qualcomm/emac/emac.c|  6 ++
 drivers/net/ethernet/realtek/8139too.c   |  9 +++--
 drivers/net/ethernet/realtek/r8169.c |  4 +---
 drivers/net/ethernet/samsung/sxgbe/sxgbe_main.c  |  8 ++--
 drivers/net/ethernet/sfc/efx.c   |  6 ++
 drivers/net/ethernet/sfc/falcon/efx.c|  6 ++
 drivers/net/ethernet/sun/niu.c   |  6 ++
 drivers/net/ethernet/synopsys/dwc_eth_qos.c  |  4 +---
 drivers/net/ethernet/tile/tilepro.c  |  4 ++--
 drivers/net/ethernet/via/via-rhine.c |  8 +++-
 drivers/net/fjes/fjes_main.c |  7 ++-
 drivers/net/hyperv/netvsc_drv.c  |  6 ++
 drivers/net/ifb.c|  6 ++
 drivers/net/ipvlan/ipvlan_main.c |  5 ++---
 drivers/net/loopback.c   |  5 ++---
 drivers/net/macsec.c |  6 ++
 drivers/net/macvlan.c|  5 ++---
 drivers/net/nlmon.c  |  4 +---
 drivers/net/ppp/ppp_generic.c|  4 +---
 drivers/net/slip/slip.c  |  3 +--
 drivers/net/team/team.c  |  3 +--
 drivers/net/tun.c|  3 +--
 drivers/net/veth.c   |  6 ++
 drivers/net/virtio_net.c |  6 ++
 drivers/net/vmxnet3/vmxnet3_ethtool.c|  4 +---
 drivers/net/vmxnet3/vmxnet3_int.h|  4 ++--
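As a rough illustration of the calling convention this patch moves to, here is a userspace sketch, assuming simplified stand-in types: `struct stats64`, `struct fake_dev`, `struct fake_ops` and `fake_dev_get_stats()` are hypothetical, the last one modelled loosely on the core's dev_get_stats(). The single caller owns the storage, zeroes it, lets the driver fill it, and returns its own pointer, so a driver return value carries no information.

```c
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for struct rtnl_link_stats64 (the real one has
 * many more counters). */
struct stats64 {
	uint64_t rx_packets;
	uint64_t tx_packets;
};

struct fake_dev;

/* Simplified stand-in for the ops table: the stats callback fills
 * *storage and returns nothing. */
struct fake_ops {
	void (*get_stats64)(struct fake_dev *dev, struct stats64 *storage);
};

struct fake_dev {
	const struct fake_ops *ops;
	uint64_t rx, tx;               /* hypothetical driver counters */
};

/* Example driver callback in the void style. */
static void fake_get_stats64(struct fake_dev *dev, struct stats64 *storage)
{
	storage->rx_packets = dev->rx;
	storage->tx_packets = dev->tx;
}

/* Models the one caller: it zeroes caller-owned storage, lets the
 * driver fill it, and returns that same storage pointer. */
static struct stats64 *fake_dev_get_stats(struct fake_dev *dev,
					  struct stats64 *storage)
{
	memset(storage, 0, sizeof(*storage));
	if (dev->ops->get_stats64)
		dev->ops->get_stats64(dev, storage);
	return storage;
}

static const struct fake_ops fake_driver_ops = {
	.get_stats64 = fake_get_stats64,
};
```

With a void callback, a driver can no longer suggest (by returning some other pointer) that its return value matters.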

Re: [PATCH v3 3/3] stmmac: adding new glue driver dwmac-dwc-qos-eth

2017-01-05 Thread Alexandre Torgue


Hi Joao,

On 01/04/2017 05:22 PM, Joao Pinto wrote:

This patch adds a new glue driver called dwmac-dwc-qos-eth which
was based on dwc_eth_qos as is. To ensure backward compatibility, a slight
tweak was also added to stmmac_platform.


Sorry to come late to the review. I have a basic question: why do you
create a glue driver for that?
dwmac glues are currently vendor-specific, so why create one for an IP?
Why not continue to use stmmac_platform.c?

(It is very basic; I assume I am missing something.)

thanks
Alex





Signed-off-by: Joao Pinto 
---
changes v2 -> v3:
- Nothing changed, just to keep up patch set version
changes v1 -> v2:
- WOL was not declared in the new glue driver
- clocks were switched and now fixed (apb_pclk and phy_ref_clk)

 .../bindings/net/snps,dwc-qos-ethernet.txt |   3 +
 drivers/net/ethernet/stmicro/stmmac/Kconfig|   9 +
 drivers/net/ethernet/stmicro/stmmac/Makefile   |   1 +
 .../ethernet/stmicro/stmmac/dwmac-dwc-qos-eth.c| 200 +
 .../net/ethernet/stmicro/stmmac/stmmac_platform.c  |  15 +-
 5 files changed, 225 insertions(+), 3 deletions(-)
 create mode 100644 drivers/net/ethernet/stmicro/stmmac/dwmac-dwc-qos-eth.c

diff --git a/Documentation/devicetree/bindings/net/snps,dwc-qos-ethernet.txt b/Documentation/devicetree/bindings/net/snps,dwc-qos-ethernet.txt
index d93f71c..21d27aa 100644
--- a/Documentation/devicetree/bindings/net/snps,dwc-qos-ethernet.txt
+++ b/Documentation/devicetree/bindings/net/snps,dwc-qos-ethernet.txt
@@ -1,5 +1,8 @@
 * Synopsys DWC Ethernet QoS IP version 4.10 driver (GMAC)

+This binding is deprecated. It continues to be supported, but new features
+should preferably be added to the stmmac binding document.
+
 This binding supports the Synopsys Designware Ethernet QoS (Quality Of Service)
 IP block. The IP supports multiple options for bus type, clocking and reset
 structure, and feature list. Consequently, a number of properties and list
diff --git a/drivers/net/ethernet/stmicro/stmmac/Kconfig b/drivers/net/ethernet/stmicro/stmmac/Kconfig
index ab66248..99594e3 100644
--- a/drivers/net/ethernet/stmicro/stmmac/Kconfig
+++ b/drivers/net/ethernet/stmicro/stmmac/Kconfig
@@ -29,6 +29,15 @@ config STMMAC_PLATFORM

 if STMMAC_PLATFORM

+config DWMAC_DWC_QOS_ETH
+   tristate "Support for snps,dwc-qos-ethernet.txt DT binding."
+   select PHYLIB
+   select CRC32
+   select MII
+   depends on OF && HAS_DMA
+   help
+ Support for chips using the snps,dwc-qos-ethernet.txt DT binding.
+
 config DWMAC_GENERIC
tristate "Generic driver for DWMAC"
default STMMAC_PLATFORM
diff --git a/drivers/net/ethernet/stmicro/stmmac/Makefile b/drivers/net/ethernet/stmicro/stmmac/Makefile
index 8f83a86..700c603 100644
--- a/drivers/net/ethernet/stmicro/stmmac/Makefile
+++ b/drivers/net/ethernet/stmicro/stmmac/Makefile
@@ -16,6 +16,7 @@ obj-$(CONFIG_DWMAC_SOCFPGA)   += dwmac-altr-socfpga.o
 obj-$(CONFIG_DWMAC_STI)+= dwmac-sti.o
 obj-$(CONFIG_DWMAC_STM32)  += dwmac-stm32.o
 obj-$(CONFIG_DWMAC_SUNXI)  += dwmac-sunxi.o
+obj-$(CONFIG_DWMAC_DWC_QOS_ETH)+= dwmac-dwc-qos-eth.o
 obj-$(CONFIG_DWMAC_GENERIC)+= dwmac-generic.o
 stmmac-platform-objs:= stmmac_platform.o
 dwmac-altr-socfpga-objs := altr_tse_pcs.o dwmac-socfpga.o
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-dwc-qos-eth.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-dwc-qos-eth.c
new file mode 100644
index 000..4532a7c
--- /dev/null
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-dwc-qos-eth.c
@@ -0,0 +1,200 @@
+/*
+ * Synopsys DWC Ethernet Quality-of-Service v4.10a linux driver
+ *
+ * Copyright (C) 2016 Joao Pinto 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see .
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "stmmac_platform.h"
+
+static int dwc_eth_dwmac_config_dt(struct platform_device *pdev,
+  struct plat_stmmacenet_data *plat_dat)
+{
+   struct device_node *np = pdev->dev.of_node;
+   u32 burst_map = 0;
+   u32 bit_index = 0;
+   u32 a_index = 0;
+
+   if (!plat_dat->axi) {
+   plat_dat->axi = kzalloc(sizeof(struct stmmac_axi), GFP_KERNEL);
+
+   if (!plat_dat->axi)
+   return -ENOMEM;
+   }
+
+   plat_dat->axi->axi_lpi_en = of_property_read_bool(np, "snps,en-lpi");
+   if (of_property_read_u32(np, "snps,write-requests",
+    &plat_dat->axi->axi_wr_osr_lmt)) {
+   /**
+* Since the register has a reset value of

[PATCH net-next 0/2] net/sched: act_csum: add support for SCTP checksum

2017-01-05 Thread Davide Caratti

This series extends current act_csum functionality to allow computation of
SCTP checksums. Patch 1 ensures LIBCRC32C will be selected if NET_ACT_CSUM
is selected. Patch 2 extends act_csum to handle the IPPROTO_SCTP protocol in
the IPv4/IPv6 header and, when requested, compute the CRC32c value.
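For readers unfamiliar with the checksum itself, here is a minimal bitwise CRC32c (Castagnoli) sketch in userspace C. It is a slow reference for illustration only; in the kernel the optimized implementation comes from the crc32c library that LIBCRC32C provides, and for SCTP the CRC covers the whole packet with the header's checksum field set to zero while it is computed. The function name `crc32c_ref` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC32c: reflected polynomial 0x82F63B78, initial value
 * 0xFFFFFFFF, final XOR 0xFFFFFFFF. Processes one bit at a time. */
static uint32_t crc32c_ref(const uint8_t *buf, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;

	for (size_t i = 0; i < len; i++) {
		crc ^= buf[i];
		for (int bit = 0; bit < 8; bit++)
			/* shift right; XOR in the polynomial if the low bit was 1 */
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return crc ^ 0xFFFFFFFFu;
}
```

The standard check value for CRC32c over the ASCII string "123456789" is 0xE3069283, which makes a quick sanity test for any implementation.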

Davide Caratti (2):
  net/sched: Kconfig: select LIBCRC32C if NET_ACT_CSUM is selected
  net/sched: act_csum: compute crc32c on SCTP packets

 include/uapi/linux/tc_act/tc_csum.h |  3 ++-
 net/sched/Kconfig   |  1 +
 net/sched/act_csum.c| 32 
 3 files changed, 35 insertions(+), 1 deletion(-)

-- 
2.7.4

[PATCH net-next 1/2] net/sched: Kconfig: select LIBCRC32C if NET_ACT_CSUM is selected

2017-01-05 Thread Davide Caratti

LIBCRC32C is needed to compute crc32c on SCTP packets.

Signed-off-by: Davide Caratti 
---
 net/sched/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index 87956a7..a9aa38d 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -707,6 +707,7 @@ config NET_ACT_SKBEDIT
 config NET_ACT_CSUM
 tristate "Checksum Updating"
 depends on NET_CLS_ACT && INET
+select LIBCRC32C
 ---help---
  Say Y here to update some common checksum after some direct
  packet alterations.
-- 
2.7.4

[PATCH net-next 2/2] net/sched: act_csum: compute crc32c on SCTP packets

2017-01-05 Thread Davide Caratti

Modify act_csum to compute crc32c on IPv4/IPv6 packets having SCTP in
their payload, and extend UAPI definitions accordingly.

Signed-off-by: Davide Caratti 
---
 include/uapi/linux/tc_act/tc_csum.h |  3 ++-
 net/sched/act_csum.c| 32 
 2 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/tc_act/tc_csum.h b/include/uapi/linux/tc_act/tc_csum.h
index 8ac8041..58d457f 100644
--- a/include/uapi/linux/tc_act/tc_csum.h
+++ b/include/uapi/linux/tc_act/tc_csum.h
@@ -21,7 +21,8 @@ enum {
TCA_CSUM_UPDATE_FLAG_IGMP= 4,
TCA_CSUM_UPDATE_FLAG_TCP = 8,
TCA_CSUM_UPDATE_FLAG_UDP = 16,
-   TCA_CSUM_UPDATE_FLAG_UDPLITE = 32
+   TCA_CSUM_UPDATE_FLAG_UDPLITE = 32,
+   TCA_CSUM_UPDATE_FLAG_SCTP= 64
 };
 
 struct tc_csum {
diff --git a/net/sched/act_csum.c b/net/sched/act_csum.c
index a0edd80..620ac9b 100644
--- a/net/sched/act_csum.c
+++ b/net/sched/act_csum.c
@@ -30,6 +30,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -322,6 +323,25 @@ static int tcf_csum_ipv6_udp(struct sk_buff *skb, unsigned int ihl,
return 1;
 }
 
+static int tcf_csum_sctp(struct sk_buff *skb, unsigned int ihl,
+unsigned int ipl)
+{
+   struct sctphdr *sctph;
+
+   if (skb_is_gso(skb) && skb_shinfo(skb)->gso_type & SKB_GSO_SCTP)
+   return 1;
+
+   sctph = tcf_csum_skb_nextlayer(skb, ihl, ipl, sizeof(*sctph));
+   if (!sctph)
+   return 0;
+
+   sctph->checksum = sctp_compute_cksum(skb,
+skb_network_offset(skb) + ihl);
+   skb->ip_summed = CHECKSUM_NONE;
+
+   return 1;
+}
+
 static int tcf_csum_ipv4(struct sk_buff *skb, u32 update_flags)
 {
const struct iphdr *iph;
@@ -365,6 +385,12 @@ static int tcf_csum_ipv4(struct sk_buff *skb, u32 update_flags)
   ntohs(iph->tot_len), 1))
goto fail;
break;
+   case IPPROTO_SCTP:
+   if (update_flags & TCA_CSUM_UPDATE_FLAG_SCTP)
+   if (!tcf_csum_sctp(skb, iph->ihl * 4,
+  ntohs(iph->tot_len)))
+   goto fail;
+   break;
}
 
if (update_flags & TCA_CSUM_UPDATE_FLAG_IPV4HDR) {
@@ -481,6 +507,12 @@ static int tcf_csum_ipv6(struct sk_buff *skb, u32 update_flags)
   pl + sizeof(*ip6h), 1))
goto fail;
goto done;
+   case IPPROTO_SCTP:
+   if (update_flags & TCA_CSUM_UPDATE_FLAG_SCTP)
+   if (!tcf_csum_sctp(skb, hl,
+  pl + sizeof(*ip6h)))
+   goto fail;
+   goto done;
default:
goto ignore_skb;
}
-- 
2.7.4
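The per-protocol gating added to tcf_csum_ipv4() and tcf_csum_ipv6() follows one pattern: switch on the L4 protocol, and recompute only if the matching TCA_CSUM_UPDATE_FLAG_* bit is set. A simplified userspace sketch of that decision, assuming the flag values from the tc_csum.h hunk and standard IANA protocol numbers; the helper `needs_l4_csum()` and the local constant names are hypothetical, and the IPv4 header, ICMP and IGMP cases are left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag values as in the tc_csum.h hunk of this patch. */
enum {
	FLAG_TCP     = 8,
	FLAG_UDP     = 16,
	FLAG_UDPLITE = 32,
	FLAG_SCTP    = 64,
};

/* IANA protocol numbers (IPPROTO_* in the kernel). */
enum {
	PROTO_TCP     = 6,
	PROTO_UDP     = 17,
	PROTO_SCTP    = 132,
	PROTO_UDPLITE = 136,
};

/* Returns true when the action would recompute the L4 checksum for
 * this protocol under these flags; unknown protocols are left alone. */
static bool needs_l4_csum(uint32_t update_flags, int ipproto)
{
	switch (ipproto) {
	case PROTO_TCP:     return update_flags & FLAG_TCP;
	case PROTO_UDP:     return update_flags & FLAG_UDP;
	case PROTO_UDPLITE: return update_flags & FLAG_UDPLITE;
	case PROTO_SCTP:    return update_flags & FLAG_SCTP;
	default:            return false;
	}
}
```

Because each protocol has its own bit, adding SCTP is additive: existing users who never set bit 64 see no change in behavior.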

Re: [PATCH 0/6] Netfilter fixes for net

2017-01-05 Thread David Miller

From: Pablo Neira Ayuso 
Date: Thu,  5 Jan 2017 12:19:47 +0100

> The following patchset contains accumulated Netfilter fixes for your
> net tree:
> 
> 1) Ensure quota dump and reset happens iff we can deliver numbers to
>userspace.
> 
> 2) Silence splat on incorrect use of smp_processor_id() from nft_queue.
> 
> 3) Fix an out-of-bound access reported by KASAN in
>nf_tables_rule_destroy(), patch from Florian Westphal.
> 
> 4) Fix layer 4 checksum mangling in the nf_tables payload expression
>with IPv6.
> 
> 5) Fix a race in the CLUSTERIP target from control plane path when two
>threads run to add a new configuration object. Serialize invocations
>of clusterip_config_init() using spin_lock. From Xin Long.
> 
> 6) Call br_nf_pre_routing_finish_bridge_finish() once we are done with
>the br_nf_pre_routing_finish() hook. From Artur Molchanov.
> 
> You can pull these changes from:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git

Pulled, thanks Pablo.

And a happy new year to you too!
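Item 5 of the pull request fixes a race in which two threads adding the same CLUSTERIP configuration could both create it. A minimal userspace sketch of the general remedy, assuming a toy table rather than the actual netfilter code: the lookup and the insert happen inside one critical section, so a second caller always sees the first caller's entry. The C11 atomic_flag spinlock stands in for the kernel's spin_lock(); `config_get_or_create()`, `cfg_keys` and the other names are hypothetical.

```c
#include <stdatomic.h>

/* Minimal test-and-set spinlock standing in for spin_lock()/spin_unlock(). */
static atomic_flag cfg_lock = ATOMIC_FLAG_INIT;

static void cfg_lock_acquire(void)
{
	while (atomic_flag_test_and_set_explicit(&cfg_lock, memory_order_acquire))
		; /* spin until the holder releases */
}

static void cfg_lock_release(void)
{
	atomic_flag_clear_explicit(&cfg_lock, memory_order_release);
}

/* Hypothetical config table keyed by a cluster IP address. */
#define MAX_CFG 8
static unsigned int cfg_keys[MAX_CFG];
static int cfg_count;

/* Lookup-or-create under a single lock: two callers racing with the
 * same key cannot both miss the lookup and both insert a new entry.
 * Returns the entry index, or -1 if the table is full. */
static int config_get_or_create(unsigned int key)
{
	int idx = -1;

	cfg_lock_acquire();
	for (int i = 0; i < cfg_count; i++) {
		if (cfg_keys[i] == key) {
			idx = i;
			break;
		}
	}
	if (idx < 0 && cfg_count < MAX_CFG) {
		idx = cfg_count;
		cfg_keys[cfg_count++] = key;
	}
	cfg_lock_release();
	return idx;
}
```

If the lookup and insert were each locked separately, both threads could pass the lookup before either inserted, which is exactly the duplicate-creation window the fix closes.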

Re: [PATCH v2 net-next] net:dsa: check for EPROBE_DEFER from dsa_dst_parse()

2017-01-05 Thread David Miller

From: Volodymyr Bendiuga 
Date: Thu,  5 Jan 2017 11:10:13 +0100

> Since there can be multiple dsa switches stacked together but
> not all of the devicetree nodes available at the time of calling
> dsa_dst_parse(), EPROBE_DEFER can be returned by it. When this
> happens, only the last dsa switch has to be deleted by
> dsa_dst_del_ds(), but not the whole list, because next time linux
> comes back to this function it will try to add only the last dsa
> switch which returned EPROBE_DEFER.
> 
> Signed-off-by: Volodymyr Bendiuga 
> Reviewed-by: Andrew Lunn 

Applied.

Re: [PATCH v3 net-next] net:mv88e6xxx: use g2 interrupt for 6097 chip

2017-01-05 Thread David Miller

From: Volodymyr Bendiuga 
Date: Thu,  5 Jan 2017 10:44:18 +0100

> This chip needs MV88E6XXX_FLAG_G2_INT
> 
> Signed-off-by: Volodymyr Bendiuga 
> Reviewed-by: Andrew Lunn 

Applied, thanks.

Re: [net-next PATCH v2 5/6] i40e: Add TX and RX support in switchdev mode.

2017-01-05 Thread Jakub Kicinski

On Thu, Jan 5, 2017 at 12:08 PM, Or Gerlitz  wrote:
> On Tue, Jan 3, 2017 at 8:07 PM, Sridhar Samudrala
>  wrote:
>> A host based switching entity like a linux bridge or OVS redirects these frames
>> to the right VFs via VFPR netdevs. Any frames sent via VFPR netdevs are sent as
>> directed transmits to the corresponding VFs. To enable directed transmit, skb
>> metadata dst is used to pass the VF id and the frame is requeued to call the PFs
>> transmit routine.
>
> Jakub/John, patch #4 which didn't appear in the list had a long discussion [1]
> ending with "lets talk on it @ netdev", did we?

I spoke to a few people, but nobody had much to say about it back then :(

Noob question: can we somehow "debug" why a patch is not appearing on
a vger list?

Re: [PATCH] net: xilinx: emaclite: Remove xemaclite_remove_ndev()

2017-01-05 Thread David Miller

From: Tobias Klauser 
Date: Thu,  5 Jan 2017 10:41:36 +0100

> xemaclite_remove_ndev() is a simple wrapper around free_netdev()
> checking for NULL before the call. All possible paths calling
> it are guaranteed to pass a non-NULL argument, so rather call
> free_netdev() directly.
> 
> Signed-off-by: Tobias Klauser 

Applied.

Re: [PATCH] net: ethoc: Remove unused members from struct ethoc

2017-01-05 Thread David Miller

From: Tobias Klauser 
Date: Thu,  5 Jan 2017 09:16:27 +0100

> The io_region_size and dma_alloc members of struct ethoc are only
> written but never read, so they might as well be removed.
> 
> Signed-off-by: Tobias Klauser 

Applied.
