date:20170920

[PATCH] netfilter: nf_tables: Release memory obtained by kasprintf

2017-09-20 Thread Arvind Yadav

Free the memory obtained by kasprintf() when nf_tables_set_alloc_name() fails.

Signed-off-by: Arvind Yadav 
---
 net/netfilter/nf_tables_api.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 9299271..393e37e 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -2741,8 +2741,10 @@ static int nf_tables_set_alloc_name(struct nft_ctx *ctx, struct nft_set *set,
list_for_each_entry(i, &ctx->table->sets, list) {
if (!nft_is_active_next(ctx->net, i))
continue;
-   if (!strcmp(set->name, i->name))
+   if (!strcmp(set->name, i->name)) {
+   kfree(set->name);
return -ENFILE;
+   }
}
return 0;
 }
-- 
1.9.1
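
A userspace illustration of the error path this patch touches, assuming malloc()/free() in place of kasprintf()/kfree(). The names demo_set and demo_set_alloc_name are hypothetical and only mirror the shape of the kernel function; resetting the pointer to NULL is extra hygiene that the posted patch does not do.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct nft_set: only the name matters here. */
struct demo_set {
	char *name;
};

/*
 * Build "<prefix><id>" on the heap and reject it if it duplicates an
 * existing name. The buffer is released on the duplicate path, so the
 * caller never inherits a leaked allocation.
 */
static int demo_set_alloc_name(struct demo_set *set, const char *prefix, int id,
			       const char *const existing[], size_t n_existing)
{
	int len;

	assert(set && prefix);

	len = snprintf(NULL, 0, "%s%d", prefix, id);
	if (len < 0)
		return -EINVAL;

	set->name = malloc((size_t)len + 1);
	if (!set->name)
		return -ENOMEM;
	snprintf(set->name, (size_t)len + 1, "%s%d", prefix, id);

	for (size_t i = 0; i < n_existing; i++) {
		if (!strcmp(set->name, existing[i])) {
			free(set->name);	/* the step the patch adds */
			set->name = NULL;	/* not in the patch; avoids a dangling pointer */
			return -ENFILE;
		}
	}
	return 0;
}
```

A caller that gets -ENFILE back can simply propagate the error; it has nothing left to free.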

[net-next] macvlan: code refine to check data before using

2017-09-20 Thread Zhang Shengju

This patch checks data once, up front, and returns early if it is NULL.

Signed-off-by: Zhang Shengju 
---
 drivers/net/macvlan.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index d2aea96..1ffe77e 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -1231,11 +1231,14 @@ static int macvlan_validate(struct nlattr *tb[], struct nlattr *data[],
return -EADDRNOTAVAIL;
}
 
-   if (data && data[IFLA_MACVLAN_FLAGS] &&
+   if (!data)
+   return 0;
+
+   if (data[IFLA_MACVLAN_FLAGS] &&
nla_get_u16(data[IFLA_MACVLAN_FLAGS]) & ~MACVLAN_FLAG_NOPROMISC)
return -EINVAL;
 
-   if (data && data[IFLA_MACVLAN_MODE]) {
+   if (data[IFLA_MACVLAN_MODE]) {
switch (nla_get_u32(data[IFLA_MACVLAN_MODE])) {
case MACVLAN_MODE_PRIVATE:
case MACVLAN_MODE_VEPA:
@@ -1248,7 +1251,7 @@ static int macvlan_validate(struct nlattr *tb[], struct nlattr *data[],
}
}
 
-   if (data && data[IFLA_MACVLAN_MACADDR_MODE]) {
+   if (data[IFLA_MACVLAN_MACADDR_MODE]) {
switch (nla_get_u32(data[IFLA_MACVLAN_MACADDR_MODE])) {
case MACVLAN_MACADDR_ADD:
case MACVLAN_MACADDR_DEL:
@@ -1260,7 +1263,7 @@ static int macvlan_validate(struct nlattr *tb[], struct nlattr *data[],
}
}
 
-   if (data && data[IFLA_MACVLAN_MACADDR]) {
+   if (data[IFLA_MACVLAN_MACADDR]) {
if (nla_len(data[IFLA_MACVLAN_MACADDR]) != ETH_ALEN)
return -EINVAL;
 
@@ -1268,7 +1271,7 @@ static int macvlan_validate(struct nlattr *tb[], struct nlattr *data[],
return -EADDRNOTAVAIL;
}
 
-   if (data && data[IFLA_MACVLAN_MACADDR_COUNT])
+   if (data[IFLA_MACVLAN_MACADDR_COUNT])
return -EINVAL;
 
return 0;
-- 
1.8.3.1
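
The refactoring pattern in this patch can be sketched in plain C: test the optional array once, return early, and drop the repeated NULL checks from each branch. The attribute indices, flag mask and demo_validate below are hypothetical simplifications, not the macvlan netlink API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical attribute indices, standing in for IFLA_MACVLAN_*. */
enum { DEMO_ATTR_FLAGS, DEMO_ATTR_COUNT, DEMO_ATTR_MAX };

/* Hypothetical mask of the only flag a caller may set. */
#define DEMO_FLAG_NOPROMISC 0x1u

/*
 * data may be NULL when no attributes were supplied, and each entry
 * may be NULL when that attribute is absent.
 */
static int demo_validate(const uint16_t *const data[])
{
	if (!data)
		return 0;	/* nothing to validate */

	/* From here on, data itself no longer needs to be re-checked. */
	if (data[DEMO_ATTR_FLAGS] &&
	    (*data[DEMO_ATTR_FLAGS] & ~DEMO_FLAG_NOPROMISC))
		return -EINVAL;

	if (data[DEMO_ATTR_COUNT])
		return -EINVAL;	/* this attribute is rejected whenever present */

	return 0;
}
```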

Re: Latest net-next from GIT panic

2017-09-20 Thread Paweł Staszewski


Hi


Will try bisecting tonight



On 2017-09-20 at 05:24, Eric Dumazet wrote:

On Wed, 2017-09-20 at 02:06 +0200, Paweł Staszewski wrote:

Just checked kernel 4.13.2: same problem.

Just after all 6 BGP sessions start and the kernel begins to learn routes,
it panics.

https://bugzilla.kernel.org/attachment.cgi?id=258509



Unfortunately we have not enough information from these traces.

Can you get a full stack trace ?

Alternatively, can you bisect ?

Thanks.

Re: [RFC PATCH 2/3] usbnet: Avoid potential races in usbnet_deferred_kevent()

2017-09-20 Thread Oliver Neukum

On Tuesday, 19.09.2017 at 13:51 -0700, Guenter Roeck wrote:
> On Tue, Sep 19, 2017 at 1:37 PM, Oliver Neukum wrote:
> > 
> > On Tuesday, 19.09.2017 at 09:15 -0700, Douglas Anderson wrote:
> > > 
[..]
> > > NOTES:
> > > - No known bugs are fixed by this; it's just found by code inspection.
> > 
> > Hi,
> > 
> > unfortunately the patch is wrong. The flags must be cleared only
> > in case the handler is successful. That is not guaranteed.
> > 
> 
> Just out of curiosity, what is the retry mechanism ? Whenever a new,
> possibly unrelated, event is scheduled ?

Hi,

that actually depends on the flag.
Look at the case of fail_lowmem. There we reschedule.

HTH
Oliver

Re: [RFC PATCH 2/3] usbnet: Avoid potential races in usbnet_deferred_kevent()

2017-09-20 Thread Oliver Neukum

On Tuesday, 19.09.2017 at 13:53 -0700, Doug Anderson wrote:
> Hi,
> 
> On Tue, Sep 19, 2017 at 1:37 PM, Oliver Neukum wrote:
> > 
> > On Tuesday, 19.09.2017 at 09:15 -0700, Douglas Anderson wrote:
> > > 
> > > In general when you've got a flag communicating that "something needs
> > > to be done" you want to clear that flag _before_ doing the task.  If
> > > you clear the flag _after_ doing the task you end up with the risk
> > > that this will happen:
> > > 
> > > 1. Requester sets flag saying task A needs to be done.
> > > 2. Worker comes and starts doing task A.
> > > 3. Worker finishes task A but hasn't yet cleared the flag.
> > > 4. Requester wants to set flag saying task A needs to be done again.
> > > 5. Worker clears the flag without doing anything.
> > > 
> > > Let's make the usbnet codebase consistently clear the flag _before_ it
> > > does the requested work.  That way if there's another request to do
> > > the work while the work is already in progress it won't be lost.
> > > 
> > > NOTES:
> > > - No known bugs are fixed by this; it's just found by code inspection.
> > 
> > Hi,
> > 
> > unfortunately the patch is wrong. The flags must be cleared only
> > in case the handler is successful. That is not guaranteed.
> > 
> > Regards
> > Oliver
> > 
> > NACK
> 
> OK, thanks for reviewing!  I definitely wasn't super confident about
> the patch (hence the RFC).
> 
> Do you think that the races I identified are possible to hit?  In

As far as I can tell, we are safe, but you are right to say that the
driver is not quite clean at that point.

> other words: should I try to rework the patch somehow or just drop it?
>  Originally I had the patch setting the flags back to true in the
> failure cases, but then I convinced myself that wasn't needed.  I can
> certainly go back and try it that way...

Setting the flags again in the error case would certainly be an
improvement. I'd be happy with a patch doing that.

Regards
Oliver
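
A minimal userspace sketch of the scheme discussed above: clear the flag before running the handler, and set it again if the handler fails. It uses C11 atomics rather than the kernel's bit operations, and every name in it (demo_flags, demo_request, demo_handle, DEMO_EVENT_RX_MEMORY and the two toy handlers) is hypothetical, not taken from usbnet.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical event bit, standing in for one of the driver's event flags. */
#define DEMO_EVENT_RX_MEMORY (1u << 0)

static atomic_uint demo_flags;

/* Requester side: mark the task as pending. */
static void demo_request(unsigned int event)
{
	atomic_fetch_or(&demo_flags, event);
}

static bool demo_pending(unsigned int event)
{
	return (atomic_load(&demo_flags) & event) != 0;
}

/*
 * Worker side. The flag is cleared before the work starts, so a request
 * that arrives while the handler is running sets it again and is not
 * lost. If the handler fails, the flag is set again so the work stays
 * pending for a later retry.
 * Returns true only if the handler ran and succeeded.
 */
static bool demo_handle(unsigned int event, bool (*handler)(void))
{
	unsigned int old;

	assert(handler);

	old = atomic_fetch_and(&demo_flags, ~event);
	if (!(old & event))
		return false;	/* nothing was pending */

	if (!handler()) {
		atomic_fetch_or(&demo_flags, event);	/* keep it pending */
		return false;
	}
	return true;
}

/* Toy handlers used to exercise both outcomes. */
static bool demo_handler_ok(void)   { return true; }
static bool demo_handler_fail(void) { return false; }
```

Whether and when a still-pending flag is acted on again depends on the flag, as noted in the thread; this sketch only keeps the state, it does not reschedule anything.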

Re: Latest net-next from GIT panic

2017-09-20 Thread Paweł Staszewski


I tried capturing video from IPMI :)

The result:

https://bugzilla.kernel.org/attachment.cgi?id=258521

It caught two more lines from where the panic starts (on 4.13.2).


Now I will try to do some bisection.




Re: [lkp-robot] [test_rhashtable] c1bd3689a7: WARNING:at_lib/debugobjects.c:#__debug_object_init

2017-09-20 Thread Florian Westphal

kernel test robot  wrote:
> FYI, we noticed the following commit:
> 
> commit: c1bd3689a70d1ba1a2f7c6781770920087166018 ("test_rhashtable: add test case for rhl_table interface")
> url: https://github.com/0day-ci/linux/commits/Florian-Westphal/test_rhashtable-add-test-case-for-rhl-table/20170919-135550
> 
> 
> in testcase: boot
> 
> on test machine: qemu-system-x86_64 -enable-kvm -smp 2 -m 512M
> 
> caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
> 
> 
> [   15.235031] WARNING: CPU: 0 PID: 1 at lib/debugobjects.c:328 __debug_object_init+0x794/0x930
[..]

This is with v1 of the patch, where the rhltable struct was allocated on
the stack; v2 is fine.

Re: Latest net-next from GIT panic

2017-09-20 Thread Paweł Staszewski


Ok looks like ending bisection


The latest bisected kernel, 4.12.0+ (from next), does not panic; there is
only this warning:


[  309.030019] NETDEV WATCHDOG: enp4s0f0 (ixgbe): transmit queue 0 timed out
[  309.030034] [ cut here ]
[  309.030040] WARNING: CPU: 35 PID: 0 at dev_watchdog+0xcf/0x139
[  309.030041] Modules linked in: bonding ipmi_si x86_pkg_temp_thermal
[  309.030045] CPU: 35 PID: 0 Comm: swapper/35 Not tainted 4.12.0+ #5
[  309.030046] task: 88086d98a000 task.stack: c90003378000
[  309.030048] RIP: 0010:dev_watchdog+0xcf/0x139
[  309.030049] RSP: 0018:88087fbc3ea8 EFLAGS: 00010246
[  309.030050] RAX: 003d RBX: 88046b68 RCX:
[  309.030050] RDX: 88087fbd2f01 RSI:  RDI: 88087fbcda08
[  309.030051] RBP: 88087fbc3eb8 R08:  R09: 88087ff80a04
[  309.030051] R10:  R11: 88086d98a001 R12:
[  309.030052] R13: 88087fbc3ef8 R14: 88086d98a000 R15: 81c06008
[  309.030053] FS:  () GS:88087fbc() knlGS:
[  309.030054] CS:  0010 DS:  ES:  CR0: 80050033
[  309.030054] CR2: 7fba600f6098 CR3: 00086b955000 CR4: 001406e0

[  309.030055] Call Trace:
[  309.030057]  
[  309.030059]  ? netif_tx_lock+0x79/0x79
[  309.030062]  call_timer_fn.isra.24+0x17/0x77
[  309.030063]  run_timer_softirq+0x118/0x161
[  309.030065]  ? netif_tx_lock+0x79/0x79
[  309.030066]  ? ktime_get+0x2b/0x42
[  309.030070]  ? lapic_next_deadline+0x21/0x27
[  309.030073]  ? clockevents_program_event+0xa8/0xc5
[  309.030076]  __do_softirq+0xa8/0x19d
[  309.030078]  irq_exit+0x5d/0x6b
[  309.030079]  smp_apic_timer_interrupt+0x2a/0x36
[  309.030082]  apic_timer_interrupt+0x89/0x90
[  309.030085] RIP: 0010:mwait_idle+0x4e/0x6a
[  309.030086] RSP: 0018:c9000337be98 EFLAGS: 0246 ORIG_RAX: ff10
[  309.030087] RAX:  RBX:  RCX:
[  309.030087] RDX:  RSI:  RDI: 88086d98a000
[  309.030088] RBP: c9000337be98 R08: 88046f8279a0 R09: 88046f827040
[  309.030089] R10: 88086d98a000 R11: 88086d98a000 R12:
[  309.030089] R13: 88086d98a000 R14: 88086d98a000 R15: 88086d98a000

[  309.030090]  
[  309.030094]  arch_cpu_idle+0xa/0xc
[  309.030095]  default_idle_call+0x19/0x1b
[  309.030102]  do_idle+0xbc/0x196
[  309.030104]  cpu_startup_entry+0x1d/0x20
[  309.030105]  start_secondary+0xd8/0xdc
[  309.030108]  secondary_startup_64+0x9f/0x9f
[  309.030109] Code: cc 75 bd eb 35 48 89 df c6 05 c3 dc 74 00 01 e8 3a 62 fe ff 44 89 e1 48 89 de 48 89 c2 48 c7 c7 0f 65 a4 81 31 c0 e8 3d 4c b5 ff <0f> ff 48 8b 83 e0 01 00 00 48 89 df ff 50 78 48 8b 05 a0 bc 6a

[  309.030128] ---[ end trace 9102cb25703ae2d9 ]---


I just marked it as good, because the problem above is a different one, and
I am moving on:


git bisect good
Bisecting: 1787 revisions left to test after this (roughly 11 steps)





[PATCH v2 1/2] mac80211: Add rcu read side critical sections

2017-09-20 Thread Ville Syrjala

From: Ville Syrjälä 

I got the following lockdep warning about the rcu_dereference()s in
ieee80211_tx_h_select_key(). After tracing all callers of
ieee80211_tx_h_select_key() I discovered that ieee80211_get_buffered_bc()
and ieee80211_build_data_template() had the rcu_read_lock/unlock() but
three other places did not. So I just blindly added them and made the
read side critical section extend as far as the lifetime of 'tx' which
is where we seem to be stuffing the rcu protected pointers. No real clue
whether this is correct or not.

[  854.573700] ../net/mac80211/tx.c:594 suspicious rcu_dereference_check() 
usage!
[  854.573704]
   other info that might help us debug this:

[  854.573707]
   rcu_scheduler_active = 2, debug_locks = 1
[  854.573712] 6 locks held by kworker/u2:0/2877:
[  854.573715]  #0:  ("%s"wiphy_name(local->hw.wiphy)){.+}, at: 
[] process_one_work+0x127/0x580
[  854.573742]  #1:  ((&sdata->work)){+.+.+.}, at: [] 
process_one_work+0x127/0x580
[  854.573758]  #2:  (&wdev->mtx){+.+.+.}, at: [] 
ieee80211_sta_work+0x23/0x1c70 [mac80211]
[  854.573902]  #3:  (&local->sta_mtx){+.+.+.}, at: [] 
__sta_info_flush+0x60/0x160 [mac80211]
[  854.573947]  #4:  (&(&txq->axq_lock)->rlock){+.-...}, at: [] 
ath_tx_node_cleanup+0x5c/0x180 [ath9k]
[  854.573973]  #5:  (&(&fq->lock)->rlock){+.-...}, at: [] 
ieee80211_tx_dequeue+0x24/0xa80 [mac80211]
[  854.574023]
   stack backtrace:
[  854.574028] CPU: 0 PID: 2877 Comm: kworker/u2:0 Not tainted 4.13.0-mgm-ovl+ 
#52
[  854.574032] Hardware name: FUJITSU SIEMENS LIFEBOOK S6120/FJNB16C, BIOS 
Version 1.26  05/10/2004
[  854.574070] Workqueue: phy0 ieee80211_iface_work [mac80211]
[  854.574076] Call Trace:
[  854.574086]  dump_stack+0x16/0x19
[  854.574092]  lockdep_rcu_suspicious+0xcb/0xf0
[  854.574131]  ieee80211_tx_h_select_key+0x1b5/0x500 [mac80211]
[  854.574171]  ieee80211_tx_dequeue+0x283/0xa80 [mac80211]
[  854.574181]  ath_tid_dequeue+0x84/0xf0 [ath9k]
[  854.574189]  ath_tx_node_cleanup+0xb8/0x180 [ath9k]
[  854.574199]  ath9k_sta_state+0x48/0xf0 [ath9k]
[  854.574207]  ? ath9k_del_ps_key.isra.19+0x60/0x60 [ath9k]
[  854.574240]  drv_sta_state+0xaf/0x8c0 [mac80211]
[  854.574275]  __sta_info_destroy_part2+0x10b/0x140 [mac80211]
[  854.574309]  __sta_info_flush+0xd5/0x160 [mac80211]
[  854.574349]  ieee80211_set_disassoc+0xd3/0x570 [mac80211]
[  854.574390]  ieee80211_sta_connection_lost+0x30/0x60 [mac80211]
[  854.574431]  ieee80211_sta_work+0x1ff/0x1c70 [mac80211]
[  854.574436]  ? mark_held_locks+0x62/0x90
[  854.574443]  ? _raw_spin_unlock_irqrestore+0x55/0x70
[  854.574447]  ? trace_hardirqs_on_caller+0x11c/0x1a0
[  854.574452]  ? trace_hardirqs_on+0xb/0x10
[  854.574459]  ? dev_mc_net_exit+0xe/0x20
[  854.574467]  ? skb_dequeue+0x48/0x70
[  854.574504]  ieee80211_iface_work+0x2d8/0x320 [mac80211]
[  854.574509]  process_one_work+0x1d1/0x580
[  854.574513]  ? process_one_work+0x127/0x580
[  854.574519]  worker_thread+0x31/0x380
[  854.574525]  kthread+0xd9/0x110
[  854.574529]  ? process_one_work+0x580/0x580
[  854.574534]  ? kthread_create_on_node+0x30/0x30
[  854.574540]  ret_from_fork+0x19/0x24

[  854.574548] =
[  854.574551] WARNING: suspicious RCU usage
[  854.574555] 4.13.0-mgm-ovl+ #52 Not tainted
[  854.574558] -
[  854.574561] ../net/mac80211/tx.c:608 suspicious rcu_dereference_check() 
usage!
[  854.574564]
   other info that might help us debug this:

[  854.574568]
   rcu_scheduler_active = 2, debug_locks = 1
[  854.574572] 6 locks held by kworker/u2:0/2877:
[  854.574574]  #0:  ("%s"wiphy_name(local->hw.wiphy)){.+}, at: 
[] process_one_work+0x127/0x580
[  854.574590]  #1:  ((&sdata->work)){+.+.+.}, at: [] 
process_one_work+0x127/0x580
[  854.574606]  #2:  (&wdev->mtx){+.+.+.}, at: [] 
ieee80211_sta_work+0x23/0x1c70 [mac80211]
[  854.574657]  #3:  (&local->sta_mtx){+.+.+.}, at: [] 
__sta_info_flush+0x60/0x160 [mac80211]
[  854.574702]  #4:  (&(&txq->axq_lock)->rlock){+.-...}, at: [] 
ath_tx_node_cleanup+0x5c/0x180 [ath9k]
[  854.574721]  #5:  (&(&fq->lock)->rlock){+.-...}, at: [] 
ieee80211_tx_dequeue+0x24/0xa80 [mac80211]
[  854.574771]
   stack backtrace:
[  854.574775] CPU: 0 PID: 2877 Comm: kworker/u2:0 Not tainted 4.13.0-mgm-ovl+ 
#52
[  854.574779] Hardware name: FUJITSU SIEMENS LIFEBOOK S6120/FJNB16C, BIOS 
Version 1.26  05/10/2004
[  854.574814] Workqueue: phy0 ieee80211_iface_work [mac80211]
[  854.574821] Call Trace:
[  854.574825]  dump_stack+0x16/0x19
[  854.574830]  lockdep_rcu_suspicious+0xcb/0xf0
[  854.574869]  ieee80211_tx_h_select_key+0x44e/0x500 [mac80211]
[  854.574908]  ieee80211_tx_dequeue+0x283/0xa80 [mac80211]
[  854.574919]  ath_tid_dequeue+0x84/0xf0 [ath9k]
[  854.574927]  ath_tx_node_cleanup+0xb8/0x180 [ath9k]
[  854.574936]  ath9k_sta_state+0x48/0xf0 [ath9k]
[  854.574945]  ? ath9k_del_ps_key.isra.19+0x60/0x60 [ath9k]
[  854.574978]  drv_sta_state+0xaf/0x8c0 [mac80211]
[

Re: Latest net-next from GIT panic

2017-09-20 Thread Paweł Staszewski

OK, the kernel crashed with a different panic that I didn't catch while I was
bisecting, and now my bisection is broken :)


git bisect good
Bisecting: 1787 revisions left to test after this (roughly 11 steps)
error: Your local changes to the following files would be overwritten by 
checkout:

    Documentation/00-INDEX
    Documentation/ABI/stable/sysfs-class-udc
    Documentation/ABI/testing/configfs-usb-gadget-uac1
    Documentation/ABI/testing/ima_policy
    Documentation/ABI/testing/sysfs-bus-iio
    Documentation/ABI/testing/sysfs-bus-iio-meas-spec
    Documentation/ABI/testing/sysfs-bus-iio-timer-stm32
    Documentation/ABI/testing/sysfs-class-net
    Documentation/ABI/testing/sysfs-class-power-twl4030
    Documentation/ABI/testing/sysfs-class-typec
    Documentation/DMA-API.txt
    Documentation/IRQ-domain.txt
    Documentation/Makefile
    Documentation/PCI/MSI-HOWTO.txt
    Documentation/RCU/00-INDEX
    Documentation/RCU/Design/Requirements/Requirements.html
    Documentation/RCU/checklist.txt
    Documentation/admin-guide/README.rst
    Documentation/admin-guide/devices.txt
    Documentation/admin-guide/index.rst
    Documentation/admin-guide/kernel-parameters.txt
    Documentation/admin-guide/pm/cpufreq.rst
    Documentation/admin-guide/pm/intel_pstate.rst
    Documentation/admin-guide/ras.rst
    Documentation/arm/Atmel/README
    Documentation/block/biodoc.txt
    Documentation/conf.py
    Documentation/core-api/assoc_array.rst
    Documentation/core-api/atomic_ops.rst
    Documentation/core-api/index.rst
    Documentation/crypto/asymmetric-keys.txt
    Documentation/dev-tools/index.rst
    Documentation/dev-tools/sparse.rst
    Documentation/devicetree/bindings/arm/amlogic.txt
    Documentation/devicetree/bindings/arm/atmel-at91.txt
    Documentation/devicetree/bindings/arm/ccn.txt
    Documentation/devicetree/bindings/arm/cpus.txt
    Documentation/devicetree/bindings/arm/gemini.txt
Documentation/devicetree/bindings/arm/hisilicon/hisilicon.txt
Documentation/devicetree/bindings/arm/keystone/keystone.txt
    Documentation/devicetree/bindings/arm/mediatek.txt
    Documentation/devicetree/bindings/arm/rockchip.txt
    Documentation/devicetree/bindings/arm/shmobile.txt
    Documentation/devicetree/bindings/arm/tegra.txt
    Documentation/devicetree/bindings/ata/ahci-fsl-qoriq.txt
    Documentation/devicetree/bindings/bus/brcm,gisb-arb.txt
Documentation/devicetree/bindings/clock/brcm,iproc-clocks.txt
    Documentation/devicetree/bindings/cpufreq/ti-cpufreq.txt
    Documentation/devicetree/bindings/gpio/gpio_atmel.txt
Documentation/devicetree/bindings/iio/adc/amlogic,meson-saradc.txt
Documentation/devicetree/bindings/iio/adc/renesas,gyroadc.txt
    Documentation/devicetree/bindings/iio/adc/st,stm32-adc.txt
    Documentation/devicetree/bindings/iio/imu/st_lsm6dsx.txt
Documentation/devicetree/bindings/interrupt-controller/allwinner,sunxi-nmi.txt
Documentation/devicetree/bindings/interrupt-controller/aspeed,ast2400-vic.txt
Documentation/devicetree/bindings/interrupt-controller/mediatek,sysirq.txt
    Documentation/devicetree/bindings/leds/common.txt
    Documentation/devicetree/bindings/mfd/hi6421.txt
    Documentation/devicetree/bindings/mfd/tps65910.txt
    Documentation/devicetree/bindings/mmc/fsl-esdhc.txt
    Documentation/devicetree/bindings/mmc/k3-dw-mshc.txt
    Documentation/devicetree/bindings/mmc/rockchip-dw-mshc.txt
    Documentation/devicetree/bindings/mmc/ti-omap-hsmmc.txt
    Documentation/devicetree/bindings/mtd/atmel-nand.txt
    Documentation/devicetree/bindings/net/dsa/b53.txt
    Documentation/devicetree/bindings/net/ethernet.txt
    Documentation/devicetree/bindings/net/macb.txt
Documentation/devicetree/bindings/net/marvell-orion-mdio.txt
    Documentation/devicetree/bindings/net/ti,wilink-st.txt
Documentation/devicetree/bindings/net/wireless/ti,wlcore.txt
    Documentation/devicetree/bindings/nvmem/rockchip-efuse.txt
    Documentation/devicetree/bindings/opp/opp.txt
    Documentation/devicetree/bindings/phy/bcm-ns-usb3-phy.txt
    Documentation/devicetree/bindings/phy/brcm-sata-phy.txt
    Documentation/devicetree/bindings/phy/meson8b-usb2-phy.txt
Documentation/devicetree/bindings/phy/phy-rockchip-inno-usb2.txt
Documentation/devicetree/bindings/power/rockchip-io-domain.txt
    Documentation/devicetree/bindings/power/supply/bq27xxx.txt
    Documentation/devicetree/bindings/property-units.txt
    Documentation/devicetree/bindings/regulator/regulator.txt
    Documentation/devicetree/bindings/serial/8
error: The following untracked working tree files would be overwritten 
by checkout:

    Documentation/ABI/testing/sysfs-class-net-phydev
    Documentation/DocBook/.gitignore
    Documentation/DocBook/Makef

Re: Latest net-next from GIT panic

2017-09-20 Thread Paweł Staszewski


So far I have bisected and marked:

git bisect start
# bad: [07dd6cc1fff160143e82cf5df78c1db0b6e03355] Linux 4.13.2
git bisect bad 07dd6cc1fff160143e82cf5df78c1db0b6e03355
# good: [5d7d2e03e0f01a992e3521b180c3d3e67905f269] Linux 4.12.13
git bisect good 5d7d2e03e0f01a992e3521b180c3d3e67905f269
# good: [6f7da290413ba713f0cdd9ff1a2a9bb129ef4f6c] Linux 4.12
git bisect good 6f7da290413ba713f0cdd9ff1a2a9bb129ef4f6c
# bad: [ac7b75966c9c86426b55fe1c50ae148aa4571075] Merge tag 'pinctrl-v4.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
git bisect bad ac7b75966c9c86426b55fe1c50ae148aa4571075
# good: [e24dd9ee5399747b71c1d982a484fc7601795f31] Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
git bisect good e24dd9ee5399747b71c1d982a484fc7601795f31
# good: [e24dd9ee5399747b71c1d982a484fc7601795f31] Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
git bisect good e24dd9ee5399747b71c1d982a484fc7601795f31
# good: [e24dd9ee5399747b71c1d982a484fc7601795f31] Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
git bisect good e24dd9ee5399747b71c1d982a484fc7601795f31




Re: [PATCH] net: ethernet: aquantia: default to no in config

2017-09-20 Thread Sergei Shtylyov


Hello!

On 9/20/2017 1:43 AM, Vito Caputo wrote:


NET_VENDOR_AQUANTIA was "default y" for some reason, which seems
obviously inappropriate.
---
  drivers/net/ethernet/aquantia/Kconfig | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/aquantia/Kconfig b/drivers/net/ethernet/aquantia/Kconfig
index cdf78e069a39..6167b13cf349 100644
--- a/drivers/net/ethernet/aquantia/Kconfig
+++ b/drivers/net/ethernet/aquantia/Kconfig
@@ -4,7 +4,7 @@
  
  config NET_VENDOR_AQUANTIA

bool "aQuantia devices"
-   default y
+   default n


   Just remove it -- 'n' is the default default. :-)

[...]

MBR, Sergei

Re: [PATCH v2 1/2] mac80211: Add rcu read side critical sections

2017-09-20 Thread Johannes Berg

On Wed, 2017-09-20 at 13:11 +0300, Ville Syrjala wrote:

> --- a/net/mac80211/tx.c
> +++ b/net/mac80211/tx.c
> @@ -1770,15 +1770,21 @@ bool ieee80211_tx_prepare_skb(struct ieee80211_hw *hw,
>   struct ieee80211_tx_data tx;
>   struct sk_buff *skb2;
>  
> - if (ieee80211_tx_prepare(sdata, &tx, NULL, skb) == TX_DROP)
> + rcu_read_lock();

The documentation says:

/**
 * ieee80211_tx_prepare_skb - prepare an 802.11 skb for transmission
 * @hw: pointer as obtained from ieee80211_alloc_hw()
 * @vif: virtual interface
 * @skb: frame to be sent from within the driver
 * @band: the band to transmit on
 * @sta: optional pointer to get the station to send the frame to
 *
 * Note: must be called under RCU lock
 */

You can't even argue that it should be the function itself doing it,
because the (admittedly optional) sta pointer would otherwise not have
proper protection after you leave the function ... You can't pass out a
sta pointer that's RCU protected.

Side note: Perhaps some annotation should be there? not sure it's
possible - would have to be something like
struct ieee80211_sta * __rcu *sta;

I guess since the outer pointer isn't protected, only the inner ...

Therefore, this patch is wrong.

I actually think the same is true for ieee80211_tx_dequeue(), but I'm
less sure about it - the sta pointer there clearly is somehow safely
passed in (even if it's w/o RCU, the driver can potentially make that
safe), but the key pointer seems unsafe in this case (as well) if
there's no outer RCU protection.

johannes

[PATCH 2/2] netfilter: ipset: ipset list may return wrong member count for set with timeout

2017-09-20 Thread Pablo Neira Ayuso

From: Vishwanath Pai 

Simple testcase:

$ ipset create test hash:ip timeout 5
$ ipset add test 1.2.3.4
$ ipset add test 1.2.2.2
$ sleep 5

$ ipset l
Name: test
Type: hash:ip
Revision: 5
Header: family inet hashsize 1024 maxelem 65536 timeout 5
Size in memory: 296
References: 0
Number of entries: 2
Members:

We return "Number of entries: 2" but no members are listed. That is
because mtype_list runs "ip_set_timeout_expired" and does not list the
expired entries, but set->elements is never updated (until mtype_gc
cleans it up later).
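
A toy userspace model of the mismatch described above, and of the approach the patch takes (purge expired entries before reporting the count). All names here (demo_set, demo_list, demo_expire, demo_head) are hypothetical; the sketch has no locking or RCU and only shows the bookkeeping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

#define DEMO_SLOTS 8

struct demo_entry {
	bool used;
	time_t expires;		/* absolute expiry time */
};

struct demo_set {
	struct demo_entry slot[DEMO_SLOTS];
	size_t elements;	/* cached count, playing the role of set->elements */
};

static bool demo_expired(const struct demo_entry *e, time_t now)
{
	return e->expires <= now;
}

static bool demo_add(struct demo_set *s, time_t expires)
{
	assert(s);
	for (size_t i = 0; i < DEMO_SLOTS; i++) {
		if (!s->slot[i].used) {
			s->slot[i].used = true;
			s->slot[i].expires = expires;
			s->elements++;
			return true;
		}
	}
	return false;		/* set is full */
}

/* Listing skips expired entries but does not touch the cached count. */
static size_t demo_list(const struct demo_set *s, time_t now)
{
	size_t shown = 0;

	for (size_t i = 0; i < DEMO_SLOTS; i++)
		if (s->slot[i].used && !demo_expired(&s->slot[i], now))
			shown++;
	return shown;
}

/* Purge expired entries and bring the cached count back in sync. */
static void demo_expire(struct demo_set *s, time_t now)
{
	for (size_t i = 0; i < DEMO_SLOTS; i++) {
		if (s->slot[i].used && demo_expired(&s->slot[i], now)) {
			s->slot[i].used = false;
			s->elements--;
		}
	}
}

/* Header: purge first, so the reported count matches the listing. */
static size_t demo_head(struct demo_set *s, time_t now)
{
	demo_expire(s, now);
	return s->elements;
}
```

As in the real code, entries can still time out between the header and the end of a long listing, so the two numbers are only consistent at the moment of the purge.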

Reviewed-by: Joshua Hunt 
Signed-off-by: Vishwanath Pai 
Signed-off-by: Jozsef Kadlecsik 
Signed-off-by: Pablo Neira Ayuso 
---
 net/netfilter/ipset/ip_set_hash_gen.h | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/net/netfilter/ipset/ip_set_hash_gen.h b/net/netfilter/ipset/ip_set_hash_gen.h
index f236c0bc7b3f..51063d9ed0f7 100644
--- a/net/netfilter/ipset/ip_set_hash_gen.h
+++ b/net/netfilter/ipset/ip_set_hash_gen.h
@@ -1041,12 +1041,24 @@ mtype_test(struct ip_set *set, void *value, const struct ip_set_ext *ext,
 static int
 mtype_head(struct ip_set *set, struct sk_buff *skb)
 {
-   const struct htype *h = set->data;
+   struct htype *h = set->data;
const struct htable *t;
struct nlattr *nested;
size_t memsize;
u8 htable_bits;
 
+   /* If any members have expired, set->elements will be wrong
+* mytype_expire function will update it with the right count.
+* we do not hold set->lock here, so grab it first.
+* set->elements can still be incorrect in the case of a huge set,
+* because elements might time out during the listing.
+*/
+   if (SET_WITH_TIMEOUT(set)) {
+   spin_lock_bh(&set->lock);
+   mtype_expire(set, h);
+   spin_unlock_bh(&set->lock);
+   }
+
rcu_read_lock_bh();
t = rcu_dereference_bh_nfnl(h->table);
memsize = mtype_ahash_memsize(h, t) + set->ext_size;
-- 
2.1.4

[PATCH 0/2] Netfilter fixes for net

2017-09-20 Thread Pablo Neira Ayuso

Hi David,

The following patchset contains two Netfilter fixes for your net tree,
they are:

1) Fix NAT compilation with UP, from Geert Uytterhoeven.

2) Fix incorrect number of entries when dumping a set, from
   Vishwanath Pai.

You can pull these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git

Thanks!



The following changes since commit 2bd6bf03f4c1c59381d62c61d03f6cc3fe71f66e:

  Linux 4.14-rc1 (2017-09-16 15:47:51 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git HEAD

for you to fetch changes up to 7f4f7dd4417d9efd038b14d39c70170db2e0baa0:

  netfilter: ipset: ipset list may return wrong member count for set with timeout (2017-09-18 17:35:32 +0200)


Geert Uytterhoeven (1):
  netfilter: nat: Do not use ARRAY_SIZE() on spinlocks to fix zero div

Vishwanath Pai (1):
  netfilter: ipset: ipset list may return wrong member count for set with timeout

 net/netfilter/ipset/ip_set_hash_gen.h | 14 +++++++++++++-
 net/netfilter/nf_nat_core.c           | 12 ++++++------
 2 files changed, 19 insertions(+), 7 deletions(-)

[PATCH 1/2] netfilter: nat: Do not use ARRAY_SIZE() on spinlocks to fix zero div

2017-09-20 Thread Pablo Neira Ayuso

From: Geert Uytterhoeven 

If no spinlock debugging options (CONFIG_GENERIC_LOCKBREAK,
CONFIG_DEBUG_SPINLOCK, CONFIG_DEBUG_LOCK_ALLOC) are enabled on a UP
platform (e.g. m68k defconfig), arch_spinlock_t is an empty struct,
hence using ARRAY_SIZE(nf_nat_locks) causes a division by zero:

net/netfilter/nf_nat_core.c: In function ‘nf_nat_setup_info’:
net/netfilter/nf_nat_core.c:432: warning: division by zero
net/netfilter/nf_nat_core.c: In function ‘__nf_nat_cleanup_conntrack’:
net/netfilter/nf_nat_core.c:535: warning: division by zero
net/netfilter/nf_nat_core.c:537: warning: division by zero
net/netfilter/nf_nat_core.c: In function ‘nf_nat_init’:
net/netfilter/nf_nat_core.c:810: warning: division by zero
net/netfilter/nf_nat_core.c:811: warning: division by zero
net/netfilter/nf_nat_core.c:824: warning: division by zero

Fix this by using the CONNTRACK_LOCKS definition instead.

Suggested-by: Florian Westphal 
Fixes: 8073e960a03bf7b5 ("netfilter: nat: use keyed locks")
Signed-off-by: Geert Uytterhoeven 
Signed-off-by: Pablo Neira Ayuso 
---
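To make the failure mode concrete, here is a minimal stand-alone sketch, not kernel code. It assumes GNU C, where an empty struct has size zero. `fake_spinlock_t`, `NUM_LOCKS` and `lock_index()` are hypothetical stand-ins for the spinlock type, `CONNTRACK_LOCKS` and the indexing changed by the patch; the value 1024 is only an assumption for illustration.

```c
#include <stddef.h>

/* Hypothetical stand-in for a spinlock type that is empty on a UP build
 * without lock debugging. Empty structs are a GNU C extension and have
 * sizeof == 0 there.
 */
typedef struct { } fake_spinlock_t;

/* Hypothetical stand-in for CONNTRACK_LOCKS: an explicit element count. */
#define NUM_LOCKS 1024

static fake_spinlock_t fake_locks[NUM_LOCKS];

/* sizeof(fake_locks) / sizeof(fake_locks[0]) would be 0 / 0 here, so the
 * usual ARRAY_SIZE() idiom cannot be used. These helpers only report the
 * two sizes; they never divide.
 */
static size_t element_size(void)
{
	return sizeof(fake_locks[0]);
}

static size_t array_bytes(void)
{
	return sizeof(fake_locks);
}

/* Pick a lock slot from a hash using the explicit constant instead. */
static unsigned int lock_index(unsigned int hash)
{
	return hash % NUM_LOCKS;
}
```

Using a named constant for the modulus keeps the lock count independent of how large the lock type happens to be in a given configuration.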
 net/netfilter/nf_nat_core.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/net/netfilter/nf_nat_core.c b/net/netfilter/nf_nat_core.c
index f393a7086025..af8345fc4fbd 100644
--- a/net/netfilter/nf_nat_core.c
+++ b/net/netfilter/nf_nat_core.c
@@ -429,7 +429,7 @@ nf_nat_setup_info(struct nf_conn *ct,
 
srchash = hash_by_src(net,
  &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple);
-   lock = &nf_nat_locks[srchash % ARRAY_SIZE(nf_nat_locks)];
+   lock = &nf_nat_locks[srchash % CONNTRACK_LOCKS];
spin_lock_bh(lock);
hlist_add_head_rcu(&ct->nat_bysource,
   &nf_nat_bysource[srchash]);
@@ -532,9 +532,9 @@ static void __nf_nat_cleanup_conntrack(struct nf_conn *ct)
unsigned int h;
 
h = hash_by_src(nf_ct_net(ct), 
&ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple);
-   spin_lock_bh(&nf_nat_locks[h % ARRAY_SIZE(nf_nat_locks)]);
+   spin_lock_bh(&nf_nat_locks[h % CONNTRACK_LOCKS]);
hlist_del_rcu(&ct->nat_bysource);
-   spin_unlock_bh(&nf_nat_locks[h % ARRAY_SIZE(nf_nat_locks)]);
+   spin_unlock_bh(&nf_nat_locks[h % CONNTRACK_LOCKS]);
 }
 
 static int nf_nat_proto_clean(struct nf_conn *ct, void *data)
@@ -807,8 +807,8 @@ static int __init nf_nat_init(void)
 
/* Leave them the same for the moment. */
nf_nat_htable_size = nf_conntrack_htable_size;
-   if (nf_nat_htable_size < ARRAY_SIZE(nf_nat_locks))
-   nf_nat_htable_size = ARRAY_SIZE(nf_nat_locks);
+   if (nf_nat_htable_size < CONNTRACK_LOCKS)
+   nf_nat_htable_size = CONNTRACK_LOCKS;
 
nf_nat_bysource = nf_ct_alloc_hashtable(&nf_nat_htable_size, 0);
if (!nf_nat_bysource)
@@ -821,7 +821,7 @@ static int __init nf_nat_init(void)
return ret;
}
 
-   for (i = 0; i < ARRAY_SIZE(nf_nat_locks); i++)
+   for (i = 0; i < CONNTRACK_LOCKS; i++)
spin_lock_init(&nf_nat_locks[i]);
 
nf_ct_helper_expectfn_register(&follow_master_nat);
-- 
2.1.4

[PATCH net 4/9] net: hns3: Fix for not setting rx private buffer size to zero

2017-09-20 Thread Yunsheng Lin

When the rx private buffer is disabled, there are cases where its
size is not reset to zero, which may cause the buffer allocation
process to fail.
This patch fixes the problem by setting priv->enable and
priv->buf_size to zero when the rx private buffer is disabled.

Fixes: 46a3df9f9718 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility 
Layer Support")
Signed-off-by: Yunsheng Lin 
---
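The pattern used by the fix (clear every per-TC field first, then fill in only the enabled TCs) can be sketched outside the driver as follows. `struct priv_buf`, `MAX_TC`, `calc_priv_bufs()` and `total_priv_size()` are hypothetical, simplified names for this illustration and do not reproduce the driver's sizing rules.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_TC 8	/* hypothetical TC count for this sketch */

struct priv_buf {	/* hypothetical, simplified per-TC state */
	bool enable;
	uint32_t wl_low;
	uint32_t wl_high;
	uint32_t buf_size;
};

/* Reset each entry before deciding whether the TC is enabled, so that a
 * disabled TC never keeps a stale size from an earlier pass.
 */
static void calc_priv_bufs(struct priv_buf bufs[MAX_TC], uint8_t tc_map,
			   uint32_t mps)
{
	for (int i = 0; i < MAX_TC; i++) {
		struct priv_buf *p = &bufs[i];

		p->enable = false;
		p->wl_low = 0;
		p->wl_high = 0;
		p->buf_size = 0;

		if (!(tc_map & (1u << i)))
			continue;

		p->enable = true;
		p->wl_high = 2 * mps;
		p->buf_size = p->wl_high;
	}
}

/* Sum of all private buffer sizes; stale entries would inflate this. */
static uint32_t total_priv_size(const struct priv_buf bufs[MAX_TC])
{
	uint32_t sum = 0;

	for (int i = 0; i < MAX_TC; i++)
		sum += bufs[i].buf_size;
	return sum;
}
```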
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c | 16 ++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index 1876418..bf3179a 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -1504,6 +1504,11 @@ int hclge_rx_buffer_calc(struct hclge_dev *hdev, u32 
tx_size)
priv->wl.high = 2 * hdev->mps;
priv->buf_size = priv->wl.high;
}
+   } else {
+   priv->enable = 0;
+   priv->wl.low = 0;
+   priv->wl.high = 0;
+   priv->buf_size = 0;
}
}
 
@@ -1516,8 +1521,15 @@ int hclge_rx_buffer_calc(struct hclge_dev *hdev, u32 
tx_size)
for (i = 0; i < HCLGE_MAX_TC_NUM; i++) {
priv = &hdev->priv_buf[i];
 
-   if (hdev->hw_tc_map & BIT(i))
-   priv->enable = 1;
+   priv->enable = 0;
+   priv->wl.low = 0;
+   priv->wl.high = 0;
+   priv->buf_size = 0;
+
+   if (!(hdev->hw_tc_map & BIT(i)))
+   continue;
+
+   priv->enable = 1;
 
if (hdev->tm_info.hw_pfc_map & BIT(i)) {
priv->wl.low = 128;
-- 
1.9.1

[PATCH net 9/9] net: hns3: Fix for pri to tc mapping in TM

2017-09-20 Thread Yunsheng Lin

The current mapping between priority and TC is one to one,
so the user can't map multiple priorities to the same TC.
This patch changes the mapping to many to one.

Fixes: 848440544b41f ("net: hns3: Add support of TX Scheduler & Shaper to HNS3 
driver")
Signed-off-by: Yunsheng Lin 
---
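A many-to-one mapping is naturally a table indexed by priority, as the new `prio_tc[]` field suggests. The sketch below is a simplified illustration; `struct tm_info_sketch` and `prio_to_tc()` are hypothetical names, and the range check on `prio` is an addition of this sketch rather than something taken from the patch.

```c
#include <stdint.h>

#define MAX_USER_PRIO 8	/* mirrors HNAE3_MAX_USER_PRIO in the patch */

struct tm_info_sketch {	/* hypothetical, simplified */
	uint8_t num_tc;
	uint8_t prio_tc[MAX_USER_PRIO];	/* TC indexed by user priority */
};

/* Return the TC for a user priority, or -1 if the priority is out of
 * range or the stored TC is not one of the enabled TCs.
 */
static int prio_to_tc(const struct tm_info_sketch *tm, uint8_t prio)
{
	uint8_t tc;

	if (prio >= MAX_USER_PRIO)
		return -1;

	tc = tm->prio_tc[prio];
	if (tc >= tm->num_tc)
		return -1;

	return tc;
}
```

With a table, several priorities can share one TC simply by storing the same TC index in more than one slot, which the old per-TC `up` field could not express.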
 drivers/net/ethernet/hisilicon/hns3/hnae3.h |  3 ++-
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h |  2 +-
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c   | 16 +---
 3 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hnae3.h 
b/drivers/net/ethernet/hisilicon/hns3/hnae3.h
index ad685f5..1a01cad 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hnae3.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hnae3.h
@@ -376,12 +376,12 @@ struct hnae3_ae_algo {
 struct hnae3_tc_info {
u16 tqp_offset; /* TQP offset from base TQP */
u16 tqp_count;  /* Total TQPs */
-   u8  up; /* user priority */
u8  tc; /* TC index */
boolenable; /* If this TC is enable or not */
 };
 
 #define HNAE3_MAX_TC   8
+#define HNAE3_MAX_USER_PRIO8
 struct hnae3_knic_private_info {
struct net_device *netdev; /* Set by KNIC client when init instance */
u16 rss_size;  /* Allocated RSS queues */
@@ -389,6 +389,7 @@ struct hnae3_knic_private_info {
u16 num_desc;
 
u8 num_tc; /* Total number of enabled TCs */
+   u8 prio_tc[HNAE3_MAX_USER_PRIO];  /* TC indexed by prio */
struct hnae3_tc_info tc_info[HNAE3_MAX_TC]; /* Idx of array is HW TC */
 
u16 num_tqps; /* total number of TQPs in this handle */
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h 
b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h
index 7f8dd12..9fcfd93 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h
@@ -176,7 +176,6 @@ struct hclge_pg_info {
 struct hclge_tc_info {
u8 tc_id;
u8 tc_sch_mode; /* 0: sp; 1: dwrr */
-   u8 up;
u8 pgid;
u32 bw_limit;
 };
@@ -197,6 +196,7 @@ struct hclge_tm_info {
u8 num_tc;
u8 num_pg;  /* It must be 1 if vNET-Base schd */
u8 pg_dwrr[HCLGE_PG_NUM];
+   u8 prio_tc[HNAE3_MAX_USER_PRIO];
struct hclge_pg_info pg_info[HCLGE_PG_NUM];
struct hclge_tc_info tc_info[HNAE3_MAX_TC];
enum hclge_fc_mode fc_mode;
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c
index b7ba7aa..73a75d7 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c
@@ -128,9 +128,7 @@ static int hclge_fill_pri_array(struct hclge_dev *hdev, u8 
*pri, u8 pri_id)
 {
u8 tc;
 
-   for (tc = 0; tc < hdev->tm_info.num_tc; tc++)
-   if (hdev->tm_info.tc_info[tc].up == pri_id)
-   break;
+   tc = hdev->tm_info.prio_tc[pri_id];
 
if (tc >= hdev->tm_info.num_tc)
return -EINVAL;
@@ -158,7 +156,7 @@ static int hclge_up_to_tc_map(struct hclge_dev *hdev)
 
hclge_cmd_setup_basic_desc(&desc, HCLGE_OPC_PRI_TO_TC_MAPPING, false);
 
-   for (pri_id = 0; pri_id < hdev->tm_info.num_tc; pri_id++) {
+   for (pri_id = 0; pri_id < HNAE3_MAX_USER_PRIO; pri_id++) {
ret = hclge_fill_pri_array(hdev, pri, pri_id);
if (ret)
return ret;
@@ -405,16 +403,17 @@ static void hclge_tm_vport_tc_info_update(struct 
hclge_vport *vport)
kinfo->tc_info[i].tqp_offset = i * kinfo->rss_size;
kinfo->tc_info[i].tqp_count = kinfo->rss_size;
kinfo->tc_info[i].tc = i;
-   kinfo->tc_info[i].up = hdev->tm_info.tc_info[i].up;
} else {
/* Set to default queue if TC is disable */
kinfo->tc_info[i].enable = false;
kinfo->tc_info[i].tqp_offset = 0;
kinfo->tc_info[i].tqp_count = 1;
kinfo->tc_info[i].tc = 0;
-   kinfo->tc_info[i].up = 0;
}
}
+
+   memcpy(kinfo->prio_tc, hdev->tm_info.prio_tc,
+  FIELD_SIZEOF(struct hnae3_knic_private_info, prio_tc));
 }
 
 static void hclge_tm_vport_info_update(struct hclge_dev *hdev)
@@ -436,12 +435,15 @@ static void hclge_tm_tc_info_init(struct hclge_dev *hdev)
for (i = 0; i < hdev->tm_info.num_tc; i++) {
hdev->tm_info.tc_info[i].tc_id = i;
hdev->tm_info.tc_info[i].tc_sch_mode = HCLGE_SCH_MODE_DWRR;
-   hdev->tm_info.tc_info[i].up = i;
hdev->tm_info.tc_info[i].pgid = 0;
hdev->tm_info.tc_info[i].bw_limit =

[PATCH net 6/9] net: hns3: Fix for rx priv buf allocation when DCB is not supported

2017-09-20 Thread Yunsheng Lin

When hdev doesn't support DCB, the rx private buffer must not be
allocated; otherwise there is not enough space left for the rx shared
buffer, and the buffer allocation process fails.
This patch fixes it by checking the DCB capability in
hclge_rx_buffer_calc.

Fixes: 46a3df9f9718 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility 
Layer Support")
Signed-off-by: Yunsheng Lin 
---
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index c08c41f..1e15ce1 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -1489,6 +1489,16 @@ int hclge_rx_buffer_calc(struct hclge_dev *hdev, u32 
tx_size)
struct hclge_priv_buf *priv;
int i;
 
+   /* When DCB is not supported, rx private
+* buffer is not allocated.
+*/
+   if (!hnae3_dev_dcb_supported(hdev)) {
+   if (!hclge_is_rx_buf_ok(hdev, rx_all))
+   return -ENOMEM;
+
+   return 0;
+   }
+
/* step 1, try to alloc private buffer for all enabled tc */
for (i = 0; i < HCLGE_MAX_TC_NUM; i++) {
priv = &hdev->priv_buf[i];
-- 
1.9.1

[PATCH net 5/9] net: hns3: Fix for rx_priv_buf_alloc not setting rx shared buffer

2017-09-20 Thread Yunsheng Lin

rx_priv_buf_alloc tells the hardware how much buffer is used for
the rx direction, but right now only the private buffer is
assigned.
For an ae_dev that doesn't support DCB, the private rx buffer is set
to zero and only the shared rx buffer is used, so not setting the
shared rx buffer causes packets to be dropped in the SSU.

Fixes: 46a3df9f9718 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility 
Layer Support")
Signed-off-by: Yunsheng Lin 
---
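The new `shared_buf` field packs a size in buffer units together with an enable bit into 16 bits. The sketch below shows that encoding on the host side only. The shift and bit position are assumptions made for illustration (the patch uses the driver's `HCLGE_BUF_UNIT_S` and `HCLGE_TC0_PRI_BUF_EN_B`, whose values are not shown here), and the `cpu_to_le16()` byte-order conversion is left out.

```c
#include <stdint.h>

/* Assumed values for this sketch only. */
#define BUF_UNIT_SHIFT	7	/* assumption: sizes are in 128-byte units */
#define BUF_ENABLE_BIT	15	/* assumption: top bit marks "enabled" */

/* Encode a byte count as units plus the enable bit. */
static uint16_t encode_buf(uint32_t size_bytes)
{
	return (uint16_t)((size_bytes >> BUF_UNIT_SHIFT) |
			  (1u << BUF_ENABLE_BIT));
}

/* Recover the byte count from an encoded value. */
static uint32_t decode_buf_size(uint16_t val)
{
	return (uint32_t)(val & ~(1u << BUF_ENABLE_BIT)) << BUF_UNIT_SHIFT;
}
```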
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h  | 3 ++-
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c | 4 
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h 
b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h
index 30e2ad5..758cf39 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h
@@ -270,7 +270,8 @@ struct hclge_tx_buff_alloc {
 
 struct hclge_rx_priv_buff {
__le16 buf_num[HCLGE_TC_NUM];
-   u8 rsv[8];
+   __le16 shared_buf;
+   u8 rsv[6];
 };
 
 struct hclge_query_version {
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index bf3179a..c08c41f 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -1622,6 +1622,10 @@ static int hclge_rx_priv_buf_alloc(struct hclge_dev 
*hdev)
cpu_to_le16(true << HCLGE_TC0_PRI_BUF_EN_B);
}
 
+   req->shared_buf =
+   cpu_to_le16((hdev->s_buf.buf_size >> HCLGE_BUF_UNIT_S) |
+   (1 << HCLGE_TC0_PRI_BUF_EN_B));
+
ret = hclge_cmd_send(&hdev->hw, &desc, 1);
if (ret) {
dev_err(&hdev->pdev->dev,
-- 
1.9.1

[PATCH net 8/9] net: hns3: Fix for setting rss_size incorrectly

2017-09-20 Thread Yunsheng Lin

rss_size is 1, 2, 4, 8, 16, 32, 64, 128, but the actual tc queue
size can be any u16 less than 128. If the tc queue size is 5, we
set rss_size to 8, and the indirection table is used to limit
the actual queue size.
Received packets may be dropped in hardware if rss_size is not
set correctly.
For now, each TC has the same rss size.

Fixes: 46a3df9f9718 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility 
Layer Support")
Signed-off-by: Yunsheng Lin 
---
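The arithmetic in the description (queue size 5 becomes 8, which the hardware sees as the exponent 3) can be checked with a small user-space sketch. The helpers below are plain C stand-ins for the kernel's `roundup_pow_of_two()` and `ilog2()`; `tc_size_for()` and `indir_entry()` are hypothetical names for this illustration.

```c
#include <stdint.h>

/* Smallest power of two >= n, for 1 <= n <= 32768. */
static uint16_t roundup_pow2(uint16_t n)
{
	uint16_t p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/* Floor of log2(n), for n >= 1. */
static unsigned int log2_floor(uint16_t n)
{
	unsigned int l = 0;

	while (n >>= 1)
		l++;
	return l;
}

/* Exponent programmed for a TC, or -1 if rss_size is 0 or above max. */
static int tc_size_for(uint16_t rss_size, uint16_t max)
{
	if (rss_size == 0 || rss_size > max)
		return -1;
	return (int)log2_floor(roundup_pow2(rss_size));
}

/* Indirection table entry i only ever points at queues 0..rss_size-1. */
static uint16_t indir_entry(uint16_t i, uint16_t rss_size)
{
	return i % rss_size;
}
```

For a queue size of 5 this gives an exponent of 3 (a nominal size of 8), while the indirection table entries still stay within queues 0 to 4.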
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c| 76 ++
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.h|  1 +
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c  |  1 +
 3 files changed, 38 insertions(+), 40 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index 1e15ce1..d27618b 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -2606,6 +2606,7 @@ static int hclge_rss_init_hw(struct hclge_dev *hdev)
u16 tc_valid[HCLGE_MAX_TC_NUM];
u16 tc_size[HCLGE_MAX_TC_NUM];
u32 *rss_indir = NULL;
+   u16 rss_size = 0, roundup_size;
const u8 *key;
int i, ret, j;
 
@@ -2620,7 +2621,13 @@ static int hclge_rss_init_hw(struct hclge_dev *hdev)
for (j = 0; j < hdev->num_vmdq_vport + 1; j++) {
for (i = 0; i < HCLGE_RSS_IND_TBL_SIZE; i++) {
vport[j].rss_indirection_tbl[i] =
-   i % hdev->rss_size_max;
+   i % vport[j].alloc_rss_size;
+
+   /* vport 0 is for PF */
+   if (j != 0)
+   continue;
+
+   rss_size = vport[j].alloc_rss_size;
rss_indir[i] = vport[j].rss_indirection_tbl[i];
}
}
@@ -2637,42 +2644,31 @@ static int hclge_rss_init_hw(struct hclge_dev *hdev)
if (ret)
goto err;
 
+   /* Each TC have the same queue size, and tc_size set to hardware is
+* the log2 of roundup power of two of rss_size, the acutal queue
+* size is limited by indirection table.
+*/
+   if (rss_size > HCLGE_RSS_TC_SIZE_7 || rss_size == 0) {
+   dev_err(&hdev->pdev->dev,
+   "Configure rss tc size failed, invalid TC_SIZE = %d\n",
+   rss_size);
+   return -EINVAL;
+   }
+
+   roundup_size = roundup_pow_of_two(rss_size);
+   roundup_size = ilog2(roundup_size);
+
for (i = 0; i < HCLGE_MAX_TC_NUM; i++) {
-   if (hdev->hw_tc_map & BIT(i))
-   tc_valid[i] = 1;
-   else
-   tc_valid[i] = 0;
+   tc_valid[i] = 0;
 
-   switch (hdev->rss_size_max) {
-   case HCLGE_RSS_TC_SIZE_0:
-   tc_size[i] = 0;
-   break;
-   case HCLGE_RSS_TC_SIZE_1:
-   tc_size[i] = 1;
-   break;
-   case HCLGE_RSS_TC_SIZE_2:
-   tc_size[i] = 2;
-   break;
-   case HCLGE_RSS_TC_SIZE_3:
-   tc_size[i] = 3;
-   break;
-   case HCLGE_RSS_TC_SIZE_4:
-   tc_size[i] = 4;
-   break;
-   case HCLGE_RSS_TC_SIZE_5:
-   tc_size[i] = 5;
-   break;
-   case HCLGE_RSS_TC_SIZE_6:
-   tc_size[i] = 6;
-   break;
-   case HCLGE_RSS_TC_SIZE_7:
-   tc_size[i] = 7;
-   break;
-   default:
-   break;
-   }
-   tc_offset[i] = hdev->rss_size_max * i;
+   if (!(hdev->hw_tc_map & BIT(i)))
+   continue;
+
+   tc_valid[i] = 1;
+   tc_size[i] = roundup_size;
+   tc_offset[i] = rss_size * i;
}
+
ret = hclge_set_rss_tc_mode(hdev, tc_valid, tc_size, tc_offset);
 
 err:
@@ -4166,12 +4162,6 @@ static int hclge_init_ae_dev(struct hnae3_ae_dev *ae_dev)
return ret;
}
 
-   ret = hclge_rss_init_hw(hdev);
-   if (ret) {
-   dev_err(&pdev->dev, "Rss init fail, ret =%d\n", ret);
-   return  ret;
-   }
-
ret = hclge_init_vlan_config(hdev);
if (ret) {
dev_err(&pdev->dev, "VLAN init fail, ret =%d\n", ret);
@@ -4184,6 +4174,12 @@ static int hclge_init_ae_dev(struct hnae3_ae_dev *ae_dev)
return ret;
}
 
+   ret = hclge_rss_init_hw(hdev);
+   if (ret) {
+   dev_err(&pdev->dev, "Rss init fail, ret =%d\n", ret);
+   return ret;
+   }
+
setup_timer(&hdev->service_time

[PATCH net 3/9] net: hns3: Fix for DEFAULT_DV when dev doesn't support DCB

2017-09-20 Thread Yunsheng Lin

When ae_dev doesn't support DCB, DEFAULT_DV must be set to
a lower value, otherwise the buffer allocation process will
fail.
This patch fixes it by setting it to 30K bytes.

Fixes: 46a3df9f9718 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility 
Layer Support")
Signed-off-by: Yunsheng Lin 
---
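As a quick check of the numbers, the minimum shared buffer computed in the patch can be reproduced in a stand-alone sketch. The two constants are taken from the patch; `shared_buf_min_sketch()` is a hypothetical helper, and the MTU-like value in the usage note is only an example.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEFAULT_DV		0xA000u	/* 40K bytes, as in the patch */
#define DEFAULT_NON_DCB_DV	0x7800u	/* 30K bytes, as in the patch */

/* Minimum shared rx buffer: two max packets plus the DV headroom. */
static uint32_t shared_buf_min_sketch(uint32_t mps, bool dcb_supported)
{
	uint32_t dv = dcb_supported ? DEFAULT_DV : DEFAULT_NON_DCB_DV;

	return 2 * mps + dv;
}
```

With an example mps of 1500, the requirement drops from 43960 bytes to 33720 bytes on a device without DCB, a difference of 10K bytes.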
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h  | 1 +
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c | 6 +-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h 
b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h
index c2b613b..30e2ad5 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h
@@ -688,6 +688,7 @@ struct hclge_reset_tqp_queue {
 #define HCLGE_DEFAULT_TX_BUF   0x4000   /* 16k  bytes */
 #define HCLGE_TOTAL_PKT_BUF0x108000 /* 1.03125M bytes */
 #define HCLGE_DEFAULT_DV   0xA000   /* 40k byte */
+#define HCLGE_DEFAULT_NON_DCB_DV   0x7800  /* 30K byte */
 
 #define HCLGE_TYPE_CRQ 0
 #define HCLGE_TYPE_CSQ 1
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index c515b84..1876418 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -1444,7 +1444,11 @@ static bool  hclge_is_rx_buf_ok(struct hclge_dev *hdev, 
u32 rx_all)
tc_num = hclge_get_tc_num(hdev);
pfc_enable_num = hclge_get_pfc_enalbe_num(hdev);
 
-   shared_buf_min = 2 * hdev->mps + HCLGE_DEFAULT_DV;
+   if (hnae3_dev_dcb_supported(hdev))
+   shared_buf_min = 2 * hdev->mps + HCLGE_DEFAULT_DV;
+   else
+   shared_buf_min = 2 * hdev->mps + HCLGE_DEFAULT_NON_DCB_DV;
+
shared_buf_tc = pfc_enable_num * hdev->mps +
(tc_num - pfc_enable_num) * hdev->mps / 2 +
hdev->mps;
-- 
1.9.1

[PATCH net 7/9] net: hns3: Fix typo error for feild in hclge_tm

2017-09-20 Thread Yunsheng Lin

This patch fixes a typo: "feild" should be "field".

Fixes: 848440544b41f ("net: hns3: Add support of TX Scheduler & Shaper to HNS3 
driver")
Signed-off-by: Yunsheng Lin 
---
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c| 20 ++--
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.h|  4 ++--
 2 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c
index c91dbf1..fe659f7 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c
@@ -280,11 +280,11 @@ static int hclge_tm_pg_shapping_cfg(struct hclge_dev 
*hdev,
 
shap_cfg_cmd->pg_id = pg_id;
 
-   hclge_tm_set_feild(shap_cfg_cmd->pg_shapping_para, IR_B, ir_b);
-   hclge_tm_set_feild(shap_cfg_cmd->pg_shapping_para, IR_U, ir_u);
-   hclge_tm_set_feild(shap_cfg_cmd->pg_shapping_para, IR_S, ir_s);
-   hclge_tm_set_feild(shap_cfg_cmd->pg_shapping_para, BS_B, bs_b);
-   hclge_tm_set_feild(shap_cfg_cmd->pg_shapping_para, BS_S, bs_s);
+   hclge_tm_set_field(shap_cfg_cmd->pg_shapping_para, IR_B, ir_b);
+   hclge_tm_set_field(shap_cfg_cmd->pg_shapping_para, IR_U, ir_u);
+   hclge_tm_set_field(shap_cfg_cmd->pg_shapping_para, IR_S, ir_s);
+   hclge_tm_set_field(shap_cfg_cmd->pg_shapping_para, BS_B, bs_b);
+   hclge_tm_set_field(shap_cfg_cmd->pg_shapping_para, BS_S, bs_s);
 
return hclge_cmd_send(&hdev->hw, &desc, 1);
 }
@@ -307,11 +307,11 @@ static int hclge_tm_pri_shapping_cfg(struct hclge_dev 
*hdev,
 
shap_cfg_cmd->pri_id = pri_id;
 
-   hclge_tm_set_feild(shap_cfg_cmd->pri_shapping_para, IR_B, ir_b);
-   hclge_tm_set_feild(shap_cfg_cmd->pri_shapping_para, IR_U, ir_u);
-   hclge_tm_set_feild(shap_cfg_cmd->pri_shapping_para, IR_S, ir_s);
-   hclge_tm_set_feild(shap_cfg_cmd->pri_shapping_para, BS_B, bs_b);
-   hclge_tm_set_feild(shap_cfg_cmd->pri_shapping_para, BS_S, bs_s);
+   hclge_tm_set_field(shap_cfg_cmd->pri_shapping_para, IR_B, ir_b);
+   hclge_tm_set_field(shap_cfg_cmd->pri_shapping_para, IR_U, ir_u);
+   hclge_tm_set_field(shap_cfg_cmd->pri_shapping_para, IR_S, ir_s);
+   hclge_tm_set_field(shap_cfg_cmd->pri_shapping_para, BS_B, bs_b);
+   hclge_tm_set_field(shap_cfg_cmd->pri_shapping_para, BS_S, bs_s);
 
return hclge_cmd_send(&hdev->hw, &desc, 1);
 }
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.h 
b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.h
index 7e67337..85158b0 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.h
@@ -94,10 +94,10 @@ struct hclge_bp_to_qs_map_cmd {
u32 rsvd1;
 };
 
-#define hclge_tm_set_feild(dest, string, val) \
+#define hclge_tm_set_field(dest, string, val) \
hnae_set_field((dest), (HCLGE_TM_SHAP_##string##_MSK), \
   (HCLGE_TM_SHAP_##string##_LSH), val)
-#define hclge_tm_get_feild(src, string) \
+#define hclge_tm_get_field(src, string) \
hnae_get_field((src), (HCLGE_TM_SHAP_##string##_MSK), \
   (HCLGE_TM_SHAP_##string##_LSH))
 
-- 
1.9.1

[PATCH net 0/9] TM related bugfixes for the HNS3 Ethernet Driver

2017-09-20 Thread Yunsheng Lin

This patch set contains a few bugfixes related to hclge_tm module.

Yunsheng Lin (9):
  net: hns3: Cleanup for ROCE capability flag in ae_dev
  net: hns3: Fix initialization when cmd is not supported
  net: hns3: Fix for DEFAULT_DV when dev doesn't support DCB
  net: hns3: Fix for not setting rx private buffer size to zero
  net: hns3: Fix for rx_priv_buf_alloc not setting rx shared buffer
  net: hns3: Fix for rx priv buf allocation when DCB is not supported
  net: hns3: Fix typo error for feild in hclge_tm
  net: hns3: Fix for setting rss_size incorrectly
  net: hns3: Fix for pri to tc mapping in TM

 drivers/net/ethernet/hisilicon/hns3/hnae3.h|  15 +-
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h |   4 +-
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c| 163 +++--
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.h|   3 +-
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c  |  41 +++---
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.h  |   4 +-
 .../net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c |  16 +-
 7 files changed, 143 insertions(+), 103 deletions(-)

-- 
1.9.1

[PATCH net 2/9] net: hns3: Fix initialization when cmd is not supported

2017-09-20 Thread Yunsheng Lin

When ae_dev doesn't support DCB, rx_priv_wl_config,
common_thrd_config and tm_qs_bp_cfg must not be called; otherwise
the cmd fails, which causes the hclge module initialization
process to fail.
This patch fixes it by adding a DCB capability flag to check whether
the ae_dev supports DCB.

Fixes: 46a3df9f9718 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility 
Layer Support")
Signed-off-by: Yunsheng Lin 
---
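The capability check boils down to testing one bit in a flags word. The sketch below uses the bit positions from the patch; `BIT()` is redefined locally, and `dev_has()` and `dcb_only_steps()` are hypothetical helpers standing in for `hnae_get_bit()` and for the three DCB-only calls named in the description.

```c
#include <stdbool.h>

#define BIT(n)			(1UL << (n))
#define DEV_SUPPORT_ROCE_B	0x1	/* bit positions from the patch */
#define DEV_SUPPORT_DCB_B	0x2
#define DEV_SUPPORT_ROCE_DCB_BITS \
	(BIT(DEV_SUPPORT_DCB_B) | BIT(DEV_SUPPORT_ROCE_B))

/* Hypothetical stand-in for hnae_get_bit(): is this capability set? */
static bool dev_has(unsigned long flag, unsigned int bit)
{
	return (flag & BIT(bit)) != 0;
}

/* How many of the DCB-only setup steps (rx_priv_wl_config,
 * common_thrd_config, tm_qs_bp_cfg) would run for this device.
 */
static int dcb_only_steps(unsigned long flag)
{
	return dev_has(flag, DEV_SUPPORT_DCB_B) ? 3 : 0;
}
```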
 drivers/net/ethernet/hisilicon/hns3/hnae3.h|  7 ++
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c| 26 +-
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c  |  4 
 .../net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c | 10 -
 4 files changed, 31 insertions(+), 16 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hnae3.h 
b/drivers/net/ethernet/hisilicon/hns3/hnae3.h
index 0f7b61a..ad685f5 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hnae3.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hnae3.h
@@ -50,10 +50,17 @@
 
 #define HNAE3_DEV_INITED_B 0x0
 #define HNAE3_DEV_SUPPORT_ROCE_B   0x1
+#define HNAE3_DEV_SUPPORT_DCB_B0x2
+
+#define HNAE3_DEV_SUPPORT_ROCE_DCB_BITS (BIT(HNAE3_DEV_SUPPORT_DCB_B) |\
+   BIT(HNAE3_DEV_SUPPORT_ROCE_B))
 
 #define hnae3_dev_roce_supported(hdev) \
hnae_get_bit(hdev->ae_dev->flag, HNAE3_DEV_SUPPORT_ROCE_B)
 
+#define hnae3_dev_dcb_supported(hdev) \
+   hnae_get_bit(hdev->ae_dev->flag, HNAE3_DEV_SUPPORT_DCB_B)
+
 #define ring_ptr_move_fw(ring, p) \
((ring)->p = ((ring)->p + 1) % (ring)->desc_num)
 #define ring_ptr_move_bw(ring, p) \
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index eb78c23..c515b84 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -1772,18 +1772,22 @@ int hclge_buffer_alloc(struct hclge_dev *hdev)
return ret;
}
 
-   ret = hclge_rx_priv_wl_config(hdev);
-   if (ret) {
-   dev_err(&hdev->pdev->dev,
-   "could not configure rx private waterline %d\n", ret);
-   return ret;
-   }
+   if (hnae3_dev_dcb_supported(hdev)) {
+   ret = hclge_rx_priv_wl_config(hdev);
+   if (ret) {
+   dev_err(&hdev->pdev->dev,
+   "could not configure rx private waterline %d\n",
+   ret);
+   return ret;
+   }
 
-   ret = hclge_common_thrd_config(hdev);
-   if (ret) {
-   dev_err(&hdev->pdev->dev,
-   "could not configure common threshold %d\n", ret);
-   return ret;
+   ret = hclge_common_thrd_config(hdev);
+   if (ret) {
+   dev_err(&hdev->pdev->dev,
+   "could not configure common threshold %d\n",
+   ret);
+   return ret;
+   }
}
 
ret = hclge_common_wl_config(hdev);
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c
index 1c577d2..c91dbf1 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c
@@ -976,6 +976,10 @@ int hclge_pause_setup_hw(struct hclge_dev *hdev)
if (ret)
return ret;
 
+   /* Only DCB-supported dev supports qset back pressure setting */
+   if (!hnae3_dev_dcb_supported(hdev))
+   return 0;
+
for (i = 0; i < hdev->tm_info.num_tc; i++) {
ret = hclge_tm_qs_bp_cfg(hdev, i);
if (ret)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c
index 94d8bb5..35369e1 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c
@@ -42,15 +42,15 @@
{PCI_VDEVICE(HUAWEI, HNAE3_DEV_ID_GE), 0},
{PCI_VDEVICE(HUAWEI, HNAE3_DEV_ID_25GE), 0},
{PCI_VDEVICE(HUAWEI, HNAE3_DEV_ID_25GE_RDMA),
-BIT(HNAE3_DEV_SUPPORT_ROCE_B)},
+HNAE3_DEV_SUPPORT_ROCE_DCB_BITS},
{PCI_VDEVICE(HUAWEI, HNAE3_DEV_ID_25GE_RDMA_MACSEC),
-BIT(HNAE3_DEV_SUPPORT_ROCE_B)},
+HNAE3_DEV_SUPPORT_ROCE_DCB_BITS},
{PCI_VDEVICE(HUAWEI, HNAE3_DEV_ID_50GE_RDMA),
-BIT(HNAE3_DEV_SUPPORT_ROCE_B)},
+HNAE3_DEV_SUPPORT_ROCE_DCB_BITS},
{PCI_VDEVICE(HUAWEI, HNAE3_DEV_ID_50GE_RDMA_MACSEC),
-BIT(HNAE3_DEV_SUPPORT_ROCE_B)},
+HNAE3_DEV_SUPPORT_ROCE_DCB_BITS},
{PCI_VDEVICE(HUAWEI, HNAE3_DEV_ID_100G_RDMA_MACSEC),
-BIT(HNAE3_DEV_SUPPORT_ROCE_B)},
+HNAE3_DEV_SUPPORT_ROCE_DCB_BITS},
/* required last entry */
{0, }
 };
-- 
1.

[PATCH net 1/9] net: hns3: Cleanup for ROCE capability flag in ae_dev

2017-09-20 Thread Yunsheng Lin

This patch adds the ROCE supported flag to the driver_data
field of pci_device_id, deletes roce_pci_tbl and changes
HNAE_DEV_SUPPORT_ROCE_B to HNAE3_DEV_SUPPORT_ROCE_B.
This cleanup is done in order to support adding capabilities
in pci_device_id and to fix the initialization failure when
a cmd is not supported.

Signed-off-by: Yunsheng Lin 
---
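Carrying capabilities in the match table removes the need for a second table and a second lookup. The following is a simplified user-space sketch of that idea; `struct dev_id_sketch`, the device IDs and `lookup_flags()` are hypothetical and only mimic the shape of `pci_device_id` with its `driver_data` field.

```c
#include <stdint.h>

#define BIT(n)		(1UL << (n))
#define SUPPORT_ROCE_B	0x1	/* bit position from the patch */

struct dev_id_sketch {		/* hypothetical, simplified match entry */
	uint16_t device;
	unsigned long driver_data;	/* capability flags for this device */
};

static const struct dev_id_sketch dev_tbl[] = {
	{ 0x0001, 0 },				/* hypothetical plain device */
	{ 0x0002, BIT(SUPPORT_ROCE_B) },	/* hypothetical RDMA device */
	{ 0, 0 }				/* required last entry */
};

/* One lookup yields both the match and its capability flags. */
static unsigned long lookup_flags(uint16_t device)
{
	const struct dev_id_sketch *id;

	for (id = dev_tbl; id->device; id++)
		if (id->device == device)
			return id->driver_data;
	return 0;
}
```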
 drivers/net/ethernet/hisilicon/hns3/hnae3.h|  5 -
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c| 25 --
 .../net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c | 16 +-
 3 files changed, 19 insertions(+), 27 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hnae3.h 
b/drivers/net/ethernet/hisilicon/hns3/hnae3.h
index b2f28ae..0f7b61a 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hnae3.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hnae3.h
@@ -49,7 +49,10 @@
 #define HNAE3_CLASS_NAME_SIZE 16
 
 #define HNAE3_DEV_INITED_B 0x0
-#define HNAE_DEV_SUPPORT_ROCE_B0x1
+#define HNAE3_DEV_SUPPORT_ROCE_B   0x1
+
+#define hnae3_dev_roce_supported(hdev) \
+   hnae_get_bit(hdev->ae_dev->flag, HNAE3_DEV_SUPPORT_ROCE_B)
 
 #define ring_ptr_move_fw(ring, p) \
((ring)->p = ((ring)->p + 1) % (ring)->desc_num)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index 44c722a..eb78c23 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -46,17 +46,7 @@ static int hclge_set_mta_filter_mode(struct hclge_dev *hdev,
{PCI_VDEVICE(HUAWEI, HNAE3_DEV_ID_50GE_RDMA), 0},
{PCI_VDEVICE(HUAWEI, HNAE3_DEV_ID_50GE_RDMA_MACSEC), 0},
{PCI_VDEVICE(HUAWEI, HNAE3_DEV_ID_100G_RDMA_MACSEC), 0},
-   /* Required last entry */
-   {0, }
-};
-
-static const struct pci_device_id roce_pci_tbl[] = {
-   {PCI_VDEVICE(HUAWEI, HNAE3_DEV_ID_25GE_RDMA), 0},
-   {PCI_VDEVICE(HUAWEI, HNAE3_DEV_ID_25GE_RDMA_MACSEC), 0},
-   {PCI_VDEVICE(HUAWEI, HNAE3_DEV_ID_50GE_RDMA), 0},
-   {PCI_VDEVICE(HUAWEI, HNAE3_DEV_ID_50GE_RDMA_MACSEC), 0},
-   {PCI_VDEVICE(HUAWEI, HNAE3_DEV_ID_100G_RDMA_MACSEC), 0},
-   /* Required last entry */
+   /* required last entry */
{0, }
 };
 
@@ -894,7 +884,7 @@ static int hclge_query_pf_resource(struct hclge_dev *hdev)
hdev->num_tqps = __le16_to_cpu(req->tqp_num);
hdev->pkt_buf_size = __le16_to_cpu(req->buf_size) << HCLGE_BUF_UNIT_S;
 
-   if (hnae_get_bit(hdev->ae_dev->flag, HNAE_DEV_SUPPORT_ROCE_B)) {
+   if (hnae3_dev_roce_supported(hdev)) {
hdev->num_roce_msix =
hnae_get_field(__le16_to_cpu(req->pf_intr_vector_number),
   HCLGE_PF_VEC_NUM_M, HCLGE_PF_VEC_NUM_S);
@@ -3931,8 +3921,7 @@ static int hclge_init_client_instance(struct hnae3_client 
*client,
goto err;
 
if (hdev->roce_client &&
-   hnae_get_bit(hdev->ae_dev->flag,
-HNAE_DEV_SUPPORT_ROCE_B)) {
+   hnae3_dev_roce_supported(hdev)) {
struct hnae3_client *rc = hdev->roce_client;
 
ret = hclge_init_roce_base_info(vport);
@@ -3955,8 +3944,7 @@ static int hclge_init_client_instance(struct hnae3_client 
*client,
 
break;
case HNAE3_CLIENT_ROCE:
-   if (hnae_get_bit(hdev->ae_dev->flag,
-HNAE_DEV_SUPPORT_ROCE_B)) {
+   if (hnae3_dev_roce_supported(hdev)) {
hdev->roce_client = client;
vport->roce.client = client;
}
@@ -4068,7 +4056,6 @@ static void hclge_pci_uninit(struct hclge_dev *hdev)
 static int hclge_init_ae_dev(struct hnae3_ae_dev *ae_dev)
 {
struct pci_dev *pdev = ae_dev->pdev;
-   const struct pci_device_id *id;
struct hclge_dev *hdev;
int ret;
 
@@ -4083,10 +4070,6 @@ static int hclge_init_ae_dev(struct hnae3_ae_dev *ae_dev)
hdev->ae_dev = ae_dev;
ae_dev->priv = hdev;
 
-   id = pci_match_id(roce_pci_tbl, ae_dev->pdev);
-   if (id)
-   hnae_set_bit(ae_dev->flag, HNAE_DEV_SUPPORT_ROCE_B, 1);
-
ret = hclge_pci_init(hdev);
if (ret) {
dev_err(&pdev->dev, "PCI init failed\n");
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c
index 4d68d6e..94d8bb5 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c
@@ -41,11 +41,16 @@
 static const struct pci_device_id hns3_pci_tbl[] = {
{PCI_VDEVICE(HUAWEI, HNAE3_DEV_ID_GE), 0},
{PCI_VDEVICE(HUAWEI, HNAE3_DEV_ID_25GE), 0},
-

Re: Latest net-next from GIT panic

2017-09-20 Thread Paweł Staszewski


OK, resumed, and so far:

Panic:

# bad: [9cc9a5cb176ccb4f2cda5ac34da5a659926f125f] datapath: Avoid using 
stack larger than 1024.

git bisect bad 9cc9a5cb176ccb4f2cda5ac34da5a659926f125f

No panic:

# good: [073cf9e20c333ab29744717a23f9e43ec7512a20] Merge branch 
'udp-reduce-cache-pressure'

git bisect good 073cf9e20c333ab29744717a23f9e43ec7512a20


On 2017-09-20 at 12:22, Paweł Staszewski wrote:

So far bisected and marked:

git bisect start
# bad: [07dd6cc1fff160143e82cf5df78c1db0b6e03355] Linux 4.13.2
git bisect bad 07dd6cc1fff160143e82cf5df78c1db0b6e03355
# good: [5d7d2e03e0f01a992e3521b180c3d3e67905f269] Linux 4.12.13
git bisect good 5d7d2e03e0f01a992e3521b180c3d3e67905f269
# good: [6f7da290413ba713f0cdd9ff1a2a9bb129ef4f6c] Linux 4.12
git bisect good 6f7da290413ba713f0cdd9ff1a2a9bb129ef4f6c
# bad: [ac7b75966c9c86426b55fe1c50ae148aa4571075] Merge tag 
'pinctrl-v4.13-1' of 
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl

git bisect bad ac7b75966c9c86426b55fe1c50ae148aa4571075
# good: [e24dd9ee5399747b71c1d982a484fc7601795f31] Merge branch 'next' 
of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security

git bisect good e24dd9ee5399747b71c1d982a484fc7601795f31
# good: [e24dd9ee5399747b71c1d982a484fc7601795f31] Merge branch 'next' 
of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security

git bisect good e24dd9ee5399747b71c1d982a484fc7601795f31
# good: [e24dd9ee5399747b71c1d982a484fc7601795f31] Merge branch 'next' 
of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security

git bisect good e24dd9ee5399747b71c1d982a484fc7601795f31



On 2017-09-20 at 12:21, Paweł Staszewski wrote:
OK, the kernel crashed with a different panic that I didn't catch when
I was doing the bisect, and now my bisection is broken :)


git bisect good
Bisecting: 1787 revisions left to test after this (roughly 11 steps)
error: Your local changes to the following files would be overwritten 
by checkout:

    Documentation/00-INDEX
    Documentation/ABI/stable/sysfs-class-udc
    Documentation/ABI/testing/configfs-usb-gadget-uac1
    Documentation/ABI/testing/ima_policy
    Documentation/ABI/testing/sysfs-bus-iio
    Documentation/ABI/testing/sysfs-bus-iio-meas-spec
    Documentation/ABI/testing/sysfs-bus-iio-timer-stm32
    Documentation/ABI/testing/sysfs-class-net
    Documentation/ABI/testing/sysfs-class-power-twl4030
    Documentation/ABI/testing/sysfs-class-typec
    Documentation/DMA-API.txt
    Documentation/IRQ-domain.txt
    Documentation/Makefile
    Documentation/PCI/MSI-HOWTO.txt
    Documentation/RCU/00-INDEX
    Documentation/RCU/Design/Requirements/Requirements.html
    Documentation/RCU/checklist.txt
    Documentation/admin-guide/README.rst
    Documentation/admin-guide/devices.txt
    Documentation/admin-guide/index.rst
    Documentation/admin-guide/kernel-parameters.txt
    Documentation/admin-guide/pm/cpufreq.rst
    Documentation/admin-guide/pm/intel_pstate.rst
    Documentation/admin-guide/ras.rst
    Documentation/arm/Atmel/README
    Documentation/block/biodoc.txt
    Documentation/conf.py
    Documentation/core-api/assoc_array.rst
    Documentation/core-api/atomic_ops.rst
    Documentation/core-api/index.rst
    Documentation/crypto/asymmetric-keys.txt
    Documentation/dev-tools/index.rst
    Documentation/dev-tools/sparse.rst
    Documentation/devicetree/bindings/arm/amlogic.txt
    Documentation/devicetree/bindings/arm/atmel-at91.txt
    Documentation/devicetree/bindings/arm/ccn.txt
    Documentation/devicetree/bindings/arm/cpus.txt
    Documentation/devicetree/bindings/arm/gemini.txt
Documentation/devicetree/bindings/arm/hisilicon/hisilicon.txt
Documentation/devicetree/bindings/arm/keystone/keystone.txt
    Documentation/devicetree/bindings/arm/mediatek.txt
    Documentation/devicetree/bindings/arm/rockchip.txt
    Documentation/devicetree/bindings/arm/shmobile.txt
    Documentation/devicetree/bindings/arm/tegra.txt
    Documentation/devicetree/bindings/ata/ahci-fsl-qoriq.txt
    Documentation/devicetree/bindings/bus/brcm,gisb-arb.txt
Documentation/devicetree/bindings/clock/brcm,iproc-clocks.txt
    Documentation/devicetree/bindings/cpufreq/ti-cpufreq.txt
    Documentation/devicetree/bindings/gpio/gpio_atmel.txt
Documentation/devicetree/bindings/iio/adc/amlogic,meson-saradc.txt
Documentation/devicetree/bindings/iio/adc/renesas,gyroadc.txt
Documentation/devicetree/bindings/iio/adc/st,stm32-adc.txt
    Documentation/devicetree/bindings/iio/imu/st_lsm6dsx.txt
Documentation/devicetree/bindings/interrupt-controller/allwinner,sunxi-nmi.txt 

Documentation/devicetree/bindings/interrupt-controller/aspeed,ast2400-vic.txt 

Documentation/devicetree/bindings/interrupt-controller/mediatek,sysirq.txt 


    Documentation/devicetree/bindings/leds/common.txt
    Documentat

[PATCH net-next 08/10] net/smc: introduce a delay

2017-09-20 Thread Ursula Braun

The number of outstanding work requests is limited. If all work
requests are in use, tx processing is postponed to another scheduling
of the tx worker. Switch to a delayed worker to have a gap for tx
completion queue events before the next retry.

Signed-off-by: Ursula Braun 
---
 net/smc/smc.h   |  2 +-
 net/smc/smc_close.c | 12 +++-
 net/smc/smc_tx.c| 12 
 3 files changed, 16 insertions(+), 10 deletions(-)

diff --git a/net/smc/smc.h b/net/smc/smc.h
index 6e44313e4467..0ccd6fa387ad 100644
--- a/net/smc/smc.h
+++ b/net/smc/smc.h
@@ -149,7 +149,7 @@ struct smc_connection {
atomic_t sndbuf_space;   /* remaining space in sndbuf */
u16 tx_cdc_seq; /* sequence # for CDC send */
spinlock_t  send_lock;  /* protect wr_sends */
-   struct work_struct  tx_work;/* retry of smc_cdc_msg_send */
+   struct delayed_work tx_work;/* retry of smc_cdc_msg_send */
 
struct smc_host_cdc_msg local_rx_ctrl;  /* filled during event_handl.
 * .prod cf. TCP rcv_nxt
diff --git a/net/smc/smc_close.c b/net/smc/smc_close.c
index 3c2e166b5d22..5201bc103bd8 100644
--- a/net/smc/smc_close.c
+++ b/net/smc/smc_close.c
@@ -208,7 +208,7 @@ int smc_close_active(struct smc_sock *smc)
case SMC_ACTIVE:
smc_close_stream_wait(smc, timeout);
release_sock(sk);
-   cancel_work_sync(&conn->tx_work);
+   cancel_delayed_work_sync(&conn->tx_work);
lock_sock(sk);
if (sk->sk_state == SMC_ACTIVE) {
/* send close request */
@@ -234,7 +234,7 @@ int smc_close_active(struct smc_sock *smc)
if (!smc_cdc_rxed_any_close(conn))
smc_close_stream_wait(smc, timeout);
release_sock(sk);
-   cancel_work_sync(&conn->tx_work);
+   cancel_delayed_work_sync(&conn->tx_work);
lock_sock(sk);
if (sk->sk_err != ECONNABORTED) {
/* confirm close from peer */
@@ -263,7 +263,9 @@ int smc_close_active(struct smc_sock *smc)
/* peer sending PeerConnectionClosed will cause transition */
break;
case SMC_PROCESSABORT:
-   cancel_work_sync(&conn->tx_work);
+   release_sock(sk);
+   cancel_delayed_work_sync(&conn->tx_work);
+   lock_sock(sk);
smc_close_abort(conn);
sk->sk_state = SMC_CLOSED;
smc_close_wait_tx_pends(smc);
@@ -425,7 +427,7 @@ int smc_close_shutdown_write(struct smc_sock *smc)
case SMC_ACTIVE:
smc_close_stream_wait(smc, timeout);
release_sock(sk);
-   cancel_work_sync(&conn->tx_work);
+   cancel_delayed_work_sync(&conn->tx_work);
lock_sock(sk);
/* send close wr request */
rc = smc_close_wr(conn);
@@ -439,7 +441,7 @@ int smc_close_shutdown_write(struct smc_sock *smc)
if (!smc_cdc_rxed_any_close(conn))
smc_close_stream_wait(smc, timeout);
release_sock(sk);
-   cancel_work_sync(&conn->tx_work);
+   cancel_delayed_work_sync(&conn->tx_work);
lock_sock(sk);
/* confirm close from peer */
rc = smc_close_wr(conn);
diff --git a/net/smc/smc_tx.c b/net/smc/smc_tx.c
index 3c656beb8820..3866573288dd 100644
--- a/net/smc/smc_tx.c
+++ b/net/smc/smc_tx.c
@@ -24,6 +24,8 @@
 #include "smc_cdc.h"
 #include "smc_tx.h"
 
+#define SMC_TX_WORK_DELAY  HZ
+
 /* sndbuf producer ***/
 
 /* callback implementation for sk.sk_write_space()
@@ -406,7 +408,8 @@ int smc_tx_sndbuf_nonempty(struct smc_connection *conn)
goto out_unlock;
}
rc = 0;
-   schedule_work(&conn->tx_work);
+   schedule_delayed_work(&conn->tx_work,
+ SMC_TX_WORK_DELAY);
}
goto out_unlock;
}
@@ -430,7 +433,7 @@ int smc_tx_sndbuf_nonempty(struct smc_connection *conn)
  */
 static void smc_tx_work(struct work_struct *work)
 {
-   struct smc_connection *conn = container_of(work,
+   struct smc_connection *conn = container_of(to_delayed_work(work),
   struct smc_connection,
   tx_work);
struct smc_sock *smc = container_of(conn, struct smc_sock, conn);
@@ -468,7 +471,8 @@ void smc_tx_consumer_update(struct smc_connection *conn)
if (!rc)
rc = smc_cdc_msg_send(conn, wr_buf, pend);
if (rc < 0) {
-

[PATCH net-next 10/10] net/smc: no close wait in case of process shut down

2017-09-20 Thread Ursula Braun

Usually socket closing is delayed if there is still data available in
the send buffer to be transmitted. If a process is killed, the delay
should be avoided.

Signed-off-by: Ursula Braun 
---
 net/smc/smc_close.c | 15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/net/smc/smc_close.c b/net/smc/smc_close.c
index 5201bc103bd8..f0d16fb825f7 100644
--- a/net/smc/smc_close.c
+++ b/net/smc/smc_close.c
@@ -174,15 +174,15 @@ int smc_close_active(struct smc_sock *smc)
 {
struct smc_cdc_conn_state_flags *txflags =
&smc->conn.local_tx_ctrl.conn_state_flags;
-   long timeout = SMC_MAX_STREAM_WAIT_TIMEOUT;
struct smc_connection *conn = &smc->conn;
struct sock *sk = &smc->sk;
int old_state;
+   long timeout;
int rc = 0;
 
-   if (sock_flag(sk, SOCK_LINGER) &&
-   !(current->flags & PF_EXITING))
-   timeout = sk->sk_lingertime;
+   timeout = current->flags & PF_EXITING ?
+ 0 : sock_flag(sk, SOCK_LINGER) ?
+ sk->sk_lingertime : SMC_MAX_STREAM_WAIT_TIMEOUT;
 
 again:
old_state = sk->sk_state;
@@ -413,13 +413,14 @@ void smc_close_sock_put_work(struct work_struct *work)
 int smc_close_shutdown_write(struct smc_sock *smc)
 {
struct smc_connection *conn = &smc->conn;
-   long timeout = SMC_MAX_STREAM_WAIT_TIMEOUT;
struct sock *sk = &smc->sk;
int old_state;
+   long timeout;
int rc = 0;
 
-   if (sock_flag(sk, SOCK_LINGER))
-   timeout = sk->sk_lingertime;
+   timeout = current->flags & PF_EXITING ?
+ 0 : sock_flag(sk, SOCK_LINGER) ?
+ sk->sk_lingertime : SMC_MAX_STREAM_WAIT_TIMEOUT;
 
 again:
old_state = sk->sk_state;
-- 
2.13.5
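
The nested conditional introduced by this patch can be read as a small decision function. The following is a user-space sketch only: pick_close_timeout(), its boolean parameters, and the value of MAX_STREAM_WAIT_TIMEOUT are stand-ins invented for illustration, not the kernel's definitions.

```c
#include <stdbool.h>

/* Placeholder value; the real SMC_MAX_STREAM_WAIT_TIMEOUT is defined by the kernel. */
#define MAX_STREAM_WAIT_TIMEOUT 2000L

/*
 * Mirrors the precedence of the ternary chain in the patch:
 * an exiting process never waits, otherwise SO_LINGER decides,
 * otherwise the default maximum applies.
 */
static long pick_close_timeout(bool exiting, bool linger, long lingertime)
{
	if (exiting)
		return 0;
	if (linger)
		return lingertime;
	return MAX_STREAM_WAIT_TIMEOUT;
}
```

Judging from the removed lines, an exiting process in smc_close_active() previously kept the default maximum timeout, because only the linger override was skipped. With the new expression it gets a timeout of 0 in both functions.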

[PATCH net-next 07/10] net/smc: terminate link group if out-of-sync is received

2017-09-20 Thread Ursula Braun

An out-of-sync condition can only be detected by the client.
If the server receives a CLC DECLINE message indicating an out-of-sync
condition for the link groups, the server must clean up the out-of-sync
link group.
There is no need for an extra third parameter in smc_clc_send_decline().

Signed-off-by: Ursula Braun 
---
 net/smc/af_smc.c  |  6 ++
 net/smc/smc_clc.c | 10 +-
 net/smc/smc_clc.h |  3 +--
 3 files changed, 8 insertions(+), 11 deletions(-)

diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index 2e8d2dabac0c..745f145d4c4d 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -513,7 +513,7 @@ static int smc_connect_rdma(struct smc_sock *smc)
/* RDMA setup failed, switch back to TCP */
smc->use_fallback = true;
if (reason_code && (reason_code != SMC_CLC_DECL_REPLY)) {
-   rc = smc_clc_send_decline(smc, reason_code, 0);
+   rc = smc_clc_send_decline(smc, reason_code);
if (rc < sizeof(struct smc_clc_msg_decline))
goto out_err;
}
@@ -808,8 +808,6 @@ static void smc_listen_work(struct work_struct *work)
rc = local_contact;
if (rc == -ENOMEM)
reason_code = SMC_CLC_DECL_MEM;/* insufficient memory*/
-   else if (rc == -ENOLINK)
-   reason_code = SMC_CLC_DECL_SYNCERR; /* synchr. error */
goto decline_rdma;
}
link = &new_smc->conn.lgr->lnk[SMC_SINGLE_LINK];
@@ -903,7 +901,7 @@ static void smc_listen_work(struct work_struct *work)
smc_conn_free(&new_smc->conn);
new_smc->use_fallback = true;
if (reason_code && (reason_code != SMC_CLC_DECL_REPLY)) {
-   rc = smc_clc_send_decline(new_smc, reason_code, 0);
+   rc = smc_clc_send_decline(new_smc, reason_code);
if (rc < sizeof(struct smc_clc_msg_decline))
goto out_err;
}
diff --git a/net/smc/smc_clc.c b/net/smc/smc_clc.c
index 3934913ab835..b7dd2743fb5c 100644
--- a/net/smc/smc_clc.c
+++ b/net/smc/smc_clc.c
@@ -95,9 +95,10 @@ int smc_clc_wait_msg(struct smc_sock *smc, void *buf, int 
buflen,
}
if (clcm->type == SMC_CLC_DECLINE) {
reason_code = SMC_CLC_DECL_REPLY;
-   if (ntohl(((struct smc_clc_msg_decline *)buf)->peer_diagnosis)
-   == SMC_CLC_DECL_SYNCERR)
+   if (((struct smc_clc_msg_decline *)buf)->hdr.flag) {
smc->conn.lgr->sync_err = true;
+   smc_lgr_terminate(smc->conn.lgr);
+   }
}
 
 out:
@@ -105,8 +106,7 @@ int smc_clc_wait_msg(struct smc_sock *smc, void *buf, int 
buflen,
 }
 
 /* send CLC DECLINE message across internal TCP socket */
-int smc_clc_send_decline(struct smc_sock *smc, u32 peer_diag_info,
-u8 out_of_sync)
+int smc_clc_send_decline(struct smc_sock *smc, u32 peer_diag_info)
 {
struct smc_clc_msg_decline dclc;
struct msghdr msg;
@@ -118,7 +118,7 @@ int smc_clc_send_decline(struct smc_sock *smc, u32 
peer_diag_info,
dclc.hdr.type = SMC_CLC_DECLINE;
dclc.hdr.length = htons(sizeof(struct smc_clc_msg_decline));
dclc.hdr.version = SMC_CLC_V1;
-   dclc.hdr.flag = out_of_sync ? 1 : 0;
+   dclc.hdr.flag = (peer_diag_info == SMC_CLC_DECL_SYNCERR) ? 1 : 0;
memcpy(dclc.id_for_peer, local_systemid, sizeof(local_systemid));
dclc.peer_diagnosis = htonl(peer_diag_info);
memcpy(dclc.trl.eyecatcher, SMC_EYECATCHER, sizeof(SMC_EYECATCHER));
diff --git a/net/smc/smc_clc.h b/net/smc/smc_clc.h
index 13db8ce177c9..1c55414041d4 100644
--- a/net/smc/smc_clc.h
+++ b/net/smc/smc_clc.h
@@ -106,8 +106,7 @@ struct smc_ib_device;
 
 int smc_clc_wait_msg(struct smc_sock *smc, void *buf, int buflen,
 u8 expected_type);
-int smc_clc_send_decline(struct smc_sock *smc, u32 peer_diag_info,
-u8 out_of_sync);
+int smc_clc_send_decline(struct smc_sock *smc, u32 peer_diag_info);
 int smc_clc_send_proposal(struct smc_sock *smc, struct smc_ib_device *smcibdev,
  u8 ibport);
 int smc_clc_send_confirm(struct smc_sock *smc);
-- 
2.13.5
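
The patch above drops the separate out_of_sync argument and derives the header flag from the reason code instead. A minimal user-space sketch of that idea follows. The struct, the function names, and the numeric reason codes are placeholders chosen for illustration; the real definitions live in net/smc/smc_clc.h.

```c
#include <stdint.h>

/* Placeholder reason codes, assumed for this sketch only. */
#define DECL_MEM     0x01010000u
#define DECL_SYNCERR 0x04000000u

/* Toy stand-in for the CLC header; only the flag matters here. */
struct decline_hdr {
	uint8_t flag;
};

/* Sender side: the flag is a pure function of the reason code. */
static void fill_decline(struct decline_hdr *hdr, uint32_t reason)
{
	hdr->flag = (reason == DECL_SYNCERR) ? 1 : 0;
}

/* Receiver side: any non-zero flag is treated as out-of-sync. */
static int decline_is_sync_err(const struct decline_hdr *hdr)
{
	return hdr->flag != 0;
}
```

Deriving the flag in one place means a caller cannot pass a reason code and a flag that disagree with each other.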

[PATCH net-next 04/10] net/smc: adjust net_device refcount

2017-09-20 Thread Ursula Braun

smc_pnet_fill_entry() uses dev_get_by_name() adding a refcount to ndev.
The following smc_pnet_enter() has to reduce the refcount if the entry
to be added exists already in the pnet table.

Signed-off-by: Ursula Braun 
---
 net/smc/smc_pnet.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/smc/smc_pnet.c b/net/smc/smc_pnet.c
index 78f7af28ae4f..31f8453c25c5 100644
--- a/net/smc/smc_pnet.c
+++ b/net/smc/smc_pnet.c
@@ -181,8 +181,10 @@ static int smc_pnet_enter(struct smc_pnetentry 
*new_pnetelem)
 sizeof(new_pnetelem->ndev->name)) ||
smc_pnet_same_ibname(pnetelem,
 new_pnetelem->smcibdev->ibdev->name,
-new_pnetelem->ib_port))
+new_pnetelem->ib_port)) {
+   dev_put(pnetelem->ndev);
goto found;
+   }
}
list_add_tail(&new_pnetelem->list, &smc_pnettable.pnetlist);
rc = 0;
-- 
2.13.5

[PATCH net-next 09/10] net/smc: parameter cleanup in smc_cdc_get_free_slot()

2017-09-20 Thread Ursula Braun

Use the smc_connection as first parameter with smc_cdc_get_free_slot().
This is just a small code cleanup, no functional change.

Signed-off-by: Ursula Braun 
---
 net/smc/smc_cdc.c | 7 ---
 net/smc/smc_cdc.h | 3 ++-
 net/smc/smc_tx.c  | 6 ++
 3 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/net/smc/smc_cdc.c b/net/smc/smc_cdc.c
index a7294edbc221..5ef97e5a5f78 100644
--- a/net/smc/smc_cdc.c
+++ b/net/smc/smc_cdc.c
@@ -62,10 +62,12 @@ static void smc_cdc_tx_handler(struct smc_wr_tx_pend_priv 
*pnd_snd,
bh_unlock_sock(&smc->sk);
 }
 
-int smc_cdc_get_free_slot(struct smc_link *link,
+int smc_cdc_get_free_slot(struct smc_connection *conn,
  struct smc_wr_buf **wr_buf,
  struct smc_cdc_tx_pend **pend)
 {
+   struct smc_link *link = &conn->lgr->lnk[SMC_SINGLE_LINK];
+
return smc_wr_tx_get_free_slot(link, smc_cdc_tx_handler, wr_buf,
   (struct smc_wr_tx_pend_priv **)pend);
 }
@@ -118,8 +120,7 @@ int smc_cdc_get_slot_and_msg_send(struct smc_connection 
*conn)
struct smc_wr_buf *wr_buf;
int rc;
 
-   rc = smc_cdc_get_free_slot(&conn->lgr->lnk[SMC_SINGLE_LINK], &wr_buf,
-  &pend);
+   rc = smc_cdc_get_free_slot(conn, &wr_buf, &pend);
if (rc)
return rc;
 
diff --git a/net/smc/smc_cdc.h b/net/smc/smc_cdc.h
index 8e1d76f26007..56f883d1159c 100644
--- a/net/smc/smc_cdc.h
+++ b/net/smc/smc_cdc.h
@@ -206,7 +206,8 @@ static inline void smc_cdc_msg_to_host(struct 
smc_host_cdc_msg *local,
 
 struct smc_cdc_tx_pend;
 
-int smc_cdc_get_free_slot(struct smc_link *link, struct smc_wr_buf **wr_buf,
+int smc_cdc_get_free_slot(struct smc_connection *conn,
+ struct smc_wr_buf **wr_buf,
  struct smc_cdc_tx_pend **pend);
 void smc_cdc_tx_dismiss_slots(struct smc_connection *conn);
 int smc_cdc_msg_send(struct smc_connection *conn, struct smc_wr_buf *wr_buf,
diff --git a/net/smc/smc_tx.c b/net/smc/smc_tx.c
index 3866573288dd..ec49bc3c3283 100644
--- a/net/smc/smc_tx.c
+++ b/net/smc/smc_tx.c
@@ -396,8 +396,7 @@ int smc_tx_sndbuf_nonempty(struct smc_connection *conn)
int rc;
 
spin_lock_bh(&conn->send_lock);
-   rc = smc_cdc_get_free_slot(&conn->lgr->lnk[SMC_SINGLE_LINK], &wr_buf,
-  &pend);
+   rc = smc_cdc_get_free_slot(conn, &wr_buf, &pend);
if (rc < 0) {
if (rc == -EBUSY) {
struct smc_sock *smc =
@@ -466,8 +465,7 @@ void smc_tx_consumer_update(struct smc_connection *conn)
((to_confirm > conn->rmbe_update_limit) &&
 ((to_confirm > (conn->rmbe_size / 2)) ||
  conn->local_rx_ctrl.prod_flags.write_blocked))) {
-   rc = smc_cdc_get_free_slot(&conn->lgr->lnk[SMC_SINGLE_LINK],
-  &wr_buf, &pend);
+   rc = smc_cdc_get_free_slot(conn, &wr_buf, &pend);
if (!rc)
rc = smc_cdc_msg_send(conn, wr_buf, pend);
if (rc < 0) {
-- 
2.13.5

[PATCH net-next 01/10] net/smc: add missing dev_put

2017-09-20 Thread Ursula Braun

From: Hans Wippel 

In the infiniband part, SMC currently uses get_netdev which calls
dev_hold on the returned net device. However, the SMC code never calls
dev_put on that net device resulting in a wrong reference count.

This patch adds a dev_put after the usage of the net device to fix the
issue.

Signed-off-by: Hans Wippel 
Signed-off-by: Ursula Braun 
---
 net/smc/smc_ib.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/smc/smc_ib.c b/net/smc/smc_ib.c
index 547e0e113b17..0b5852299158 100644
--- a/net/smc/smc_ib.c
+++ b/net/smc/smc_ib.c
@@ -380,6 +380,7 @@ static int smc_ib_fill_gid_and_mac(struct smc_ib_device 
*smcibdev, u8 ibport)
ndev = smcibdev->ibdev->get_netdev(smcibdev->ibdev, ibport);
if (ndev) {
memcpy(&smcibdev->mac, ndev->dev_addr, ETH_ALEN);
+   dev_put(ndev);
} else if (!rc) {
memcpy(&smcibdev->mac[ibport - 1][0],
   &smcibdev->gid[ibport - 1].raw[8], 3);
-- 
2.13.5
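
The hold/put pairing that this patch restores can be illustrated with a toy reference counter. This is a user-space sketch, not kernel code: toy_dev, toy_get(), toy_put(), and copy_mac() are hypothetical names standing in for the net device, the lookup that takes a reference, dev_put(), and the caller.

```c
#include <string.h>

/* Toy device with a plain integer reference count. */
struct toy_dev {
	int refcnt;
	unsigned char addr[6];
};

/* Like a lookup that holds the device: returns it with one extra reference. */
static struct toy_dev *toy_get(struct toy_dev *d)
{
	d->refcnt++;
	return d;
}

/* Drops one reference. */
static void toy_put(struct toy_dev *d)
{
	d->refcnt--;
}

/* Copies the address and releases the reference taken by the lookup. */
static void copy_mac(struct toy_dev *d, unsigned char *mac)
{
	struct toy_dev *ndev = toy_get(d);

	memcpy(mac, ndev->addr, sizeof(ndev->addr));
	toy_put(ndev);	/* without this, refcnt grows by one per call */
}
```

After copy_mac() returns, the count is back at its starting value, which is the invariant the missing dev_put() violated.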

[PATCH net-next 06/10] net/smc: longer delay for client link group removal

2017-09-20 Thread Ursula Braun

Client link group creation always follows the server link group creation.
If peer creates a new server link group, client has to create a new
client link group. If peer reuses a server link group for a new
connection, client has to reuse its client link group as well. This
patch introduces a longer delay for client link group removal to make
sure this link group still exists, once the peer decides to reuse a
server link group. This avoids out-of-sync conditions for link groups.
If already scheduled, modify the delay.

Signed-off-by: Ursula Braun 
---
 net/smc/smc_core.c | 16 
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c
index 1a16d51e2330..20b66e79c5d6 100644
--- a/net/smc/smc_core.c
+++ b/net/smc/smc_core.c
@@ -25,8 +25,9 @@
 #include "smc_cdc.h"
 #include "smc_close.h"
 
-#define SMC_LGR_NUM_INCR   256
-#define SMC_LGR_FREE_DELAY (600 * HZ)
+#define SMC_LGR_NUM_INCR   256
+#define SMC_LGR_FREE_DELAY_SERV   (600 * HZ)
+#define SMC_LGR_FREE_DELAY_CLNT   (SMC_LGR_FREE_DELAY_SERV + 10)
 
 static u32 smc_lgr_num;   /* unique link group number */
 
@@ -107,8 +108,15 @@ static void smc_lgr_unregister_conn(struct smc_connection 
*conn)
__smc_lgr_unregister_conn(conn);
}
write_unlock_bh(&lgr->conns_lock);
-   if (reduced && !lgr->conns_num)
-   schedule_delayed_work(&lgr->free_work, SMC_LGR_FREE_DELAY);
+   if (!reduced || lgr->conns_num)
+   return;
+   /* client link group creation always follows the server link group
+* creation. For client use a somewhat higher removal delay time,
+* otherwise there is a risk of out-of-sync link groups.
+*/
+   mod_delayed_work(system_wq, &lgr->free_work,
+lgr->role == SMC_CLNT ? SMC_LGR_FREE_DELAY_CLNT :
+SMC_LGR_FREE_DELAY_SERV);
 }
 
 static void smc_lgr_free_work(struct work_struct *work)
-- 
2.13.5
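
To make the two delays concrete, here is a user-space sketch of the role-dependent selection. The HZ value of 250 is an assumption for illustration (the kernel's HZ is a build-time setting), and the enum and function names are hypothetical.

```c
/* Assumed tick rate for this sketch only. */
#define HZ 250

#define LGR_FREE_DELAY_SERV (600 * HZ)
#define LGR_FREE_DELAY_CLNT (LGR_FREE_DELAY_SERV + 10)

enum lgr_role { ROLE_CLNT, ROLE_SERV };

/* Returns the removal delay in ticks for the given role. */
static unsigned long lgr_free_delay(enum lgr_role role)
{
	return role == ROLE_CLNT ? LGR_FREE_DELAY_CLNT : LGR_FREE_DELAY_SERV;
}
```

As the macros are written, the client margin is 10 ticks rather than 10 seconds, since the constant 10 is not multiplied by HZ. With the assumed HZ of 250 that is 40 ms on top of the 600-second server delay.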

[PATCH net-next 05/10] net/smc: adapt send request completion notification

2017-09-20 Thread Ursula Braun

The solicited flag is meaningful for the receive completion queue.
Ask for next work completion of any type on the send queue.

Signed-off-by: Ursula Braun 
---
 net/smc/smc_wr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/smc/smc_wr.c b/net/smc/smc_wr.c
index ab56bda66783..525d91e0d57e 100644
--- a/net/smc/smc_wr.c
+++ b/net/smc/smc_wr.c
@@ -244,7 +244,7 @@ int smc_wr_tx_send(struct smc_link *link, struct 
smc_wr_tx_pend_priv *priv)
int rc;
 
ib_req_notify_cq(link->smcibdev->roce_cq_send,
-IB_CQ_SOLICITED_MASK | IB_CQ_REPORT_MISSED_EVENTS);
+IB_CQ_NEXT_COMP | IB_CQ_REPORT_MISSED_EVENTS);
pend = container_of(priv, struct smc_wr_tx_pend, priv);
rc = ib_post_send(link->roce_qp, &link->wr_tx_ibs[pend->idx],
  &failed_wr);
-- 
2.13.5

[PATCH net-next 00/10] net/smc: updates 2017-09-20

2017-09-20 Thread Ursula Braun

Hi Dave,

here is a collection of small smc-patches built for net-next improving
the smc code in different areas.

Thanks,
Ursula

Hans Wippel (2):
  net/smc: add missing dev_put
  net/smc: add receive timeout check

Ursula Braun (8):
  net/smc: take RCU read lock for routing cache lookup
  net/smc: adjust net_device refcount
  net/smc: adapt send request completion notification
  net/smc: longer delay for client link group removal
  net/smc: terminate link group if out-of-sync is received
  net/smc: introduce a delay
  net/smc: parameter cleanup in smc_cdc_get_free_slot()
  net/smc: no close wait in case of process shut down

 net/smc/af_smc.c| 16 +---
 net/smc/smc.h   |  2 +-
 net/smc/smc_cdc.c   |  7 ---
 net/smc/smc_cdc.h   |  3 ++-
 net/smc/smc_clc.c   | 10 +-
 net/smc/smc_clc.h   |  3 +--
 net/smc/smc_close.c | 27 +++
 net/smc/smc_core.c  | 16 
 net/smc/smc_ib.c|  1 +
 net/smc/smc_pnet.c  |  4 +++-
 net/smc/smc_rx.c|  2 ++
 net/smc/smc_tx.c| 18 ++
 net/smc/smc_wr.c|  2 +-
 13 files changed, 66 insertions(+), 45 deletions(-)

-- 
2.13.5

[PATCH net-next 03/10] net/smc: take RCU read lock for routing cache lookup

2017-09-20 Thread Ursula Braun

smc_netinfo_by_tcpsk() looks up the routing cache. Such a lookup requires
protection by an RCU read lock.

Signed-off-by: Ursula Braun 
---
 net/smc/af_smc.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index 8c6d24b2995d..2e8d2dabac0c 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -282,6 +282,7 @@ int smc_netinfo_by_tcpsk(struct socket *clcsock,
 __be32 *subnet, u8 *prefix_len)
 {
struct dst_entry *dst = sk_dst_get(clcsock->sk);
+   struct in_device *in_dev;
struct sockaddr_in addr;
int rc = -ENOENT;
int len;
@@ -298,14 +299,17 @@ int smc_netinfo_by_tcpsk(struct socket *clcsock,
/* get address to which the internal TCP socket is bound */
kernel_getsockname(clcsock, (struct sockaddr *)&addr, &len);
/* analyze IPv4 specific data of net_device belonging to TCP socket */
-   for_ifa(dst->dev->ip_ptr) {
-   if (ifa->ifa_address != addr.sin_addr.s_addr)
+   rcu_read_lock();
+   in_dev = __in_dev_get_rcu(dst->dev);
+   for_ifa(in_dev) {
+   if (!inet_ifa_match(addr.sin_addr.s_addr, ifa))
continue;
*prefix_len = inet_mask_len(ifa->ifa_mask);
*subnet = ifa->ifa_address & ifa->ifa_mask;
rc = 0;
break;
-   } endfor_ifa(dst->dev->ip_ptr);
+   } endfor_ifa(in_dev);
+   rcu_read_unlock();
 
 out_rel:
dst_release(dst);
-- 
2.13.5
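
Besides adding the RCU read lock, the patch replaces the exact address comparison with inet_ifa_match(). The sketch below assumes that helper performs a comparison under the interface netmask, and shows the subnet and prefix-length arithmetic in user space. It works on host-byte-order values for readability, whereas the kernel code uses __be32. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when addr lies in the subnet described by ifa_address/ifa_mask. */
static bool same_subnet(uint32_t addr, uint32_t ifa_address, uint32_t ifa_mask)
{
	return ((addr ^ ifa_address) & ifa_mask) == 0;
}

/* Counts the leading one bits of a contiguous netmask. */
static int mask_len(uint32_t mask)
{
	int len = 0;

	while (mask & 0x80000000u) {
		len++;
		mask <<= 1;
	}
	return len;
}
```

For example, 192.168.10.5 matches an interface configured as 192.168.10.1/24 even though the two addresses differ, which an equality test would reject.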

[PATCH net-next 02/10] net/smc: add receive timeout check

2017-09-20 Thread Ursula Braun

From: Hans Wippel 

The SMC receive function currently lacks a timeout check under the
condition that no data were received and no data are available. This
patch adds such a check.

Signed-off-by: Hans Wippel 
Signed-off-by: Ursula Braun 
---
 net/smc/smc_rx.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/smc/smc_rx.c b/net/smc/smc_rx.c
index b17a333e9bb0..3e631ae4b6b6 100644
--- a/net/smc/smc_rx.c
+++ b/net/smc/smc_rx.c
@@ -148,6 +148,8 @@ int smc_rx_recvmsg(struct smc_sock *smc, struct msghdr 
*msg, size_t len,
read_done = sock_intr_errno(timeo);
break;
}
+   if (!timeo)
+   return -EAGAIN;
}
 
if (!atomic_read(&conn->bytes_to_rcv)) {
-- 
2.13.5

Re: [PATCH v2 1/2] mac80211: Add rcu read side critical sections

2017-09-20 Thread Ville Syrjälä

On Wed, Sep 20, 2017 at 12:39:24PM +0200, Johannes Berg wrote:
> On Wed, 2017-09-20 at 13:11 +0300, Ville Syrjala wrote:
> 
> > --- a/net/mac80211/tx.c
> > +++ b/net/mac80211/tx.c
> > @@ -1770,15 +1770,21 @@ bool ieee80211_tx_prepare_skb(struct ieee80211_hw 
> > *hw,
> >     struct ieee80211_tx_data tx;
> >     struct sk_buff *skb2;
> >  
> > -   if (ieee80211_tx_prepare(sdata, &tx, NULL, skb) == TX_DROP)
> > +   rcu_read_lock();
> 
> The documentation says:
> 
> /**
>  * ieee80211_tx_prepare_skb - prepare an 802.11 skb for transmission
>  * @hw: pointer as obtained from ieee80211_alloc_hw()
>  * @vif: virtual interface
>  * @skb: frame to be sent from within the driver
>  * @band: the band to transmit on
>  * @sta: optional pointer to get the station to send the frame to
>  *
>  * Note: must be called under RCU lock
>  */
> 
> You can't even argue that it should be the function itself doing it,
> because the (admittedly optional) sta pointer would otherwise not have
> proper protection after you leave the function ... You can't pass out a
> sta pointer that's RCU protected.

Yeah, I suppose that would need rcu_handoff+some other mechanism to
make sure it stays around after that.

> 
> Side note: Perhaps some annotation should be there? not sure it's
> possible - would have to be something like
>   struct ieee80211_sta * __rcu *sta;
> 
> I guess since the outer pointer isn't protected, only the inner ...

I think just the fact that even the pointers in ieee80211_tx_data don't
have the __rcu annotation makes it rather hard to see what is really rcu
protected and what isn't. If every user of those pointers would have to
do the rcu_dereference() things would be rather more explicit.

> Therefore, this patch is wrong.

OK, so the problem is in ath9k then.

> I actually think the same is true for ieee80211_tx_dequeue(), but I'm
> less sure about it - the sta pointer there clearly is somehow safely
> passed in (even if it's w/o RCU, the driver can potentially make that
> safe), but the key pointer seems unsafe in this case (as well) if
> there's no outer RCU protection.

Well, I think this is as far as I want to dig into the matter. I can
respin the patch once more with just tx_dequeue() fix in there, if you
want (not sure you do if you think it's wrong as well). After that I'll
leave it to someone who actually knows what they're doing with mac80211 ;)

-- 
Ville Syrjälä
Intel OTC

Re: [PATCH v2 1/2] mac80211: Add rcu read side critical sections

2017-09-20 Thread Johannes Berg

On Wed, 2017-09-20 at 15:11 +0300, Ville Syrjälä wrote:
> 
> > I guess since the outer pointer isn't protected, only the inner ...
> 
> I think just the fact that even the pointers in ieee80211_tx_data
> don't have the __rcu annotation makes it rather hard to see what is
> really rcu protected and what isn't. If every user of those pointers
> would have to do the rcu_dereference() things would be rather more
> explicit.

It wouldn't make sense though, because those users don't need to
provide the protection, and they don't need to make sure to use the
pointer in an RCU safe manner (access once etc.) since they're in code
that can't really go wrong... mostly.

> > Therefore, this patch is wrong.
> 
> OK, so the problem is in ath9k then.

I agree.

> > I actually think the same is true for ieee80211_tx_dequeue(), but 
[...]
> Well, I think this is as far as I want to dig into the matter. I can
> respin the patch once more with just tx_dequeue() fix in there, if
> you want (not sure you do if you think it's wrong as well). After
> that I'll leave it to someone who actually knows what they're doing
> with mac80211 ;)

:-)
I think we should rather document that RCU is required for that
function, I think for some usages it may be OK without but with keys
I'm pretty sure you'll need it.

johannes

Re: Latest net-next from GIT panic

2017-09-20 Thread Paweł Staszewski


Almost there

Bisecting: 6 revisions left to test after this (roughly 3 steps)
[ad65a2f05695aced349e308193c6e2a6b1d87112] ipv6: call dst_hold_safe() 
properly





Re: rsi: fix a dereference on adapter before it has been null checked

2017-09-20 Thread Kalle Valo

Colin Ian King  wrote:

> From: Colin Ian King 
> 
> The assignment of dev is dereferencing adapter before adapter has
> been null checked, potentially leading to a null pointer dereference.
> Fix this by simply moving the assignment of dev to a later point
> after the sanity null check of adapter.
> 
> Detected by CoverityScan CID#1398383 ("Dereference before null check")
> 
> Fixes: dad0d04fa7ba ("rsi: Add RS9113 wireless driver")
> Signed-off-by: Colin Ian King 

Patch applied to wireless-drivers-next.git, thanks.

6508497cbdc7 rsi: fix a dereference on adapter before it has been null checked

-- 
https://patchwork.kernel.org/patch/9944509/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
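
For readers unfamiliar with this class of Coverity report, the fix described above amounts to moving the first use of a pointer below its NULL check. A minimal user-space sketch of that ordering follows; the struct and function names are hypothetical stand-ins, not the real rsi driver types.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins, not the real rsi driver structures. */
struct demo_dev {
	int id;
};

struct demo_adapter {
	struct demo_dev *dev;
};

/* Return the device id, or -1 if the adapter or its device is missing. */
int demo_get_dev_id(const struct demo_adapter *adapter)
{
	const struct demo_dev *dev;

	/* Do the sanity check first ... */
	if (adapter == NULL)
		return -1;

	/* ... and only then read through the pointer. */
	dev = adapter->dev;
	if (dev == NULL)
		return -1;

	return dev->id;
}
```

Assigning dev at its declaration (before the check) would dereference adapter unconditionally, which is the pattern the patch removes.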

Re: [1/2] b43: fix unitialized reads of ret by initializing the array to zero

2017-09-20 Thread Kalle Valo

Colin Ian King  wrote:

> From: Colin Ian King 
> 
> The u8 char array ret is not being initialized and elements outside
> the range start to end contain just garbage values from the stack.
> This results in a later scan of the array to read potentially
> uninitialized values.  Fix this by initializing the array to zero.
> This seems to have been an issue since the very first commit.
> 
> Detected by CoverityScan CID#139652 ("Uninitialized scalar variable")
> 
> Signed-off-by: Colin Ian King 
> Reviewed-by: Michael Buesch 

2 patches applied to wireless-drivers-next.git, thanks.

e31fbe1034d9 b43: fix unitialized reads of ret by initializing the array to zero
e3ae1c772046 b43legacy: fix unitialized reads of ret by initializing the array to zero

-- 
https://patchwork.kernel.org/patch/9939435/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
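
The problem described in the quoted commit message can be illustrated in user space: only part of a stack array is written, but all of it is scanned later. This is a minimal sketch under assumed names (DEMO_SLOTS and demo_count_marked are hypothetical, not b43 code); the zero initializer is what makes the later scan well defined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_SLOTS 8

/*
 * Mark slots start..end (inclusive, clamped to the array size), then
 * count the marked slots by scanning the WHOLE array.
 */
size_t demo_count_marked(size_t start, size_t end)
{
	/* Without "= { 0 }" the slots outside start..end hold stack garbage. */
	uint8_t ret[DEMO_SLOTS] = { 0 };
	size_t i, n = 0;

	for (i = start; i <= end && i < DEMO_SLOTS; i++)
		ret[i] = 1;

	for (i = 0; i < DEMO_SLOTS; i++)
		if (ret[i])
			n++;

	return n;
}
```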

[PATCH] net_sched: always reset qdisc backlog in qdisc_reset()

2017-09-20 Thread Konstantin Khlebnikov

The SKB stored in qdisc->gso_skb is also counted in the backlog.

Some qdiscs don't reset the backlog to zero in ->reset();
for example, sfq just dequeues and frees all queued skbs.

Signed-off-by: Konstantin Khlebnikov 
Fixes: 2f5fb43f ("net_sched: update hierarchical backlog too")
---
 net/sched/sch_generic.c |1 +
 1 file changed, 1 insertion(+)

diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 92237e75dbbc..bf8c81e07c70 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -685,6 +685,7 @@ void qdisc_reset(struct Qdisc *qdisc)
 		qdisc->gso_skb = NULL;
 	}
 	qdisc->q.qlen = 0;
+	qdisc->qstats.backlog = 0;
 }
 EXPORT_SYMBOL(qdisc_reset);
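
The invariant behind this one-line change is that the packet count and the byte backlog are two counters describing the same queue, so a reset has to clear both. A toy user-space model, with hypothetical names that are not the kernel's struct Qdisc:

```c
#include <assert.h>

/* Toy model of a queue with a packet count and a byte backlog. */
struct demo_queue {
	unsigned int qlen;	/* packets currently queued */
	unsigned int backlog;	/* bytes currently queued */
};

void demo_enqueue(struct demo_queue *q, unsigned int bytes)
{
	q->qlen++;
	q->backlog += bytes;
}

/* Reset must clear both counters, or they drift apart. */
void demo_reset(struct demo_queue *q)
{
	q->qlen = 0;
	q->backlog = 0;
}
```

If demo_reset() cleared only qlen, the queue would report zero packets but a non-zero byte backlog, which is the inconsistency the patch addresses for qdiscs whose ->reset() does not clear it.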

[PATCH] net_sched/hfsc: fix curve activation in hfsc_change_class()

2017-09-20 Thread Konstantin Khlebnikov

If real-time or fair-share curves are enabled in hfsc_change_class(),
the class isn't inserted into the rb-trees yet. Thus init_ed() and init_vf()
must be called in place of update_ed() and update_vf().

Removal isn't required because, for now, curves cannot be disabled.

Signed-off-by: Konstantin Khlebnikov 
---
 net/sched/sch_hfsc.c |   23 +++
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/net/sched/sch_hfsc.c b/net/sched/sch_hfsc.c
index daaf214e5201..3f88b75488b0 100644
--- a/net/sched/sch_hfsc.c
+++ b/net/sched/sch_hfsc.c
@@ -958,6 +958,8 @@ hfsc_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
 	}
 
 	if (cl != NULL) {
+		int old_flags;
+
 		if (parentid) {
 			if (cl->cl_parent &&
 			    cl->cl_parent->cl_common.classid != parentid)
@@ -978,6 +980,8 @@ hfsc_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
 		}
 
 		sch_tree_lock(sch);
+		old_flags = cl->cl_flags;
+
 		if (rsc != NULL)
 			hfsc_change_rsc(cl, rsc, cur_time);
 		if (fsc != NULL)
@@ -986,10 +990,21 @@ hfsc_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
 			hfsc_change_usc(cl, usc, cur_time);
 
 		if (cl->qdisc->q.qlen != 0) {
-			if (cl->cl_flags & HFSC_RSC)
-				update_ed(cl, qdisc_peek_len(cl->qdisc));
-			if (cl->cl_flags & HFSC_FSC)
-				update_vf(cl, 0, cur_time);
+			int len = qdisc_peek_len(cl->qdisc);
+
+			if (cl->cl_flags & HFSC_RSC) {
+				if (old_flags & HFSC_RSC)
+					update_ed(cl, len);
+				else
+					init_ed(cl, len);
+			}
+
+			if (cl->cl_flags & HFSC_FSC) {
+				if (old_flags & HFSC_FSC)
+					update_vf(cl, 0, cur_time);
+				else
+					init_vf(cl, len);
+			}
 		}
 		sch_tree_unlock(sch);
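
The decision the patch makes for each curve can be reduced to a comparison of the flags before and after the change. The following is only a sketch of that logic with made-up names (DEMO_RSC, DEMO_FSC, demo_pick_action); it does not touch any rb-tree and is not the kernel code.

```c
#include <assert.h>

#define DEMO_RSC 0x1	/* stand-in for HFSC_RSC */
#define DEMO_FSC 0x2	/* stand-in for HFSC_FSC */

enum demo_action {
	DEMO_NONE,	/* curve not enabled: nothing to do */
	DEMO_INIT,	/* curve newly enabled: class not in the tree yet */
	DEMO_UPDATE	/* curve was already enabled: class already in the tree */
};

/* Pick the action for one curve, given the flags before and after the change. */
enum demo_action demo_pick_action(int old_flags, int new_flags, int curve)
{
	if (!(new_flags & curve))
		return DEMO_NONE;

	return (old_flags & curve) ? DEMO_UPDATE : DEMO_INIT;
}
```

This mirrors why old_flags is sampled under sch_tree_lock() before the hfsc_change_*() calls: afterwards cl_flags no longer tells whether the curve was active before.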

Re: mwifiex: remove unnecessary call to memset

2017-09-20 Thread Kalle Valo

Himanshu Jha  wrote:

> A call to memset to assign 0 immediately after allocating
> memory with kzalloc is unnecessary, as kzalloc returns memory
> already filled with 0.
> 
> Semantic patch used to resolve this issue:
> 
> @@
> expression e,e2; constant c;
> statement S;
> @@
> 
>   e = kzalloc(e2, c);
>   if(e == NULL) S
> - memset(e, 0, e2);
> 
> Signed-off-by: Himanshu Jha 

Patch applied to wireless-drivers-next.git, thanks.

85dafc129196 mwifiex: remove unnecessary call to memset

-- 
https://patchwork.kernel.org/patch/9947331/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
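
A user-space analogue of the same cleanup, using calloc() in place of kzalloc() (an assumption made only so the sketch can run outside the kernel; the struct and helper names are hypothetical):

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical configuration blob. */
struct demo_cfg {
	int channel;
	unsigned char key[16];
};

struct demo_cfg *demo_cfg_alloc(void)
{
	/* calloc() returns zero-filled memory, so a following memset() is redundant. */
	return calloc(1, sizeof(struct demo_cfg));
}

/* Return 1 if every field is still zero, 0 otherwise. */
int demo_cfg_is_zeroed(const struct demo_cfg *cfg)
{
	size_t i;

	if (cfg->channel != 0)
		return 0;
	for (i = 0; i < sizeof(cfg->key); i++)
		if (cfg->key[i] != 0)
			return 0;
	return 1;
}
```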

Re: mwifiex: make const arrays static to shink object code size

2017-09-20 Thread Kalle Valo

Colin Ian King  wrote:

> From: Colin Ian King 
> 
> Don't populate const arrays on the stack, instead make them static
> Makes the object code smaller by nearly 300 bytes:
> 
> Before:
>    text    data     bss     dec     hex filename
>   69260   16149     576   85985   14fe1 cfg80211.o
> 
> After:
>    text    data     bss     dec     hex filename
>   68385   16725     576   85686   14eb6 cfg80211.o
> 
> Signed-off-by: Colin Ian King 

Patch applied to wireless-drivers-next.git, thanks.

d157bcfaf854 mwifiex: make const arrays static to shink object code size

-- 
https://patchwork.kernel.org/patch/9954375/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
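
The size numbers quoted above are consistent with the table moving from code that rebuilds it on the stack (text) into initialized data. A minimal sketch of the pattern, with a made-up lookup table that is not taken from mwifiex:

```c
#include <assert.h>
#include <stddef.h>

/* Look up a rate by index; returns -1 for an out-of-range index. */
int demo_rate_for_index(size_t idx)
{
	/*
	 * "static const": the table is emitted once into read-only data.
	 * With plain "const" the compiler may have to build the array on
	 * the stack on every call.
	 */
	static const int rates[] = { 10, 20, 55, 110 };

	if (idx >= sizeof(rates) / sizeof(rates[0]))
		return -1;

	return rates[idx];
}
```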

Re: Latest net-next from GIT panic

2017-09-20 Thread Paweł Staszewski


And the last one

git bisect good
Bisecting: 1 revision left to test after this (roughly 1 step)
[1cfb71eeb12047bcdbd3e6730ffed66e810a0855] ipv6: take dst->__refcnt for 
insertion into fib6 tree


With this one I get the same kernel panic as always

git bisect bad
Bisecting: 0 revisions left to test after this (roughly 0 steps)
[b838d5e1c5b6e57b10ec8af2268824041e3ea911] ipv4: mark DST_NOGC and 
remove the operation of dst_free()




On 2017-09-20 at 14:23, Paweł Staszewski wrote:

Almost there

Bisecting: 6 revisions left to test after this (roughly 3 steps)
[ad65a2f05695aced349e308193c6e2a6b1d87112] ipv6: call dst_hold_safe() 
properly




On 2017-09-20 at 13:02, Paweł Staszewski wrote:

OK, resumed, and so far:

Panic:

# bad: [9cc9a5cb176ccb4f2cda5ac34da5a659926f125f] datapath: Avoid 
using stack larger than 1024.

git bisect bad 9cc9a5cb176ccb4f2cda5ac34da5a659926f125f

No panic:

# good: [073cf9e20c333ab29744717a23f9e43ec7512a20] Merge branch 
'udp-reduce-cache-pressure'

git bisect good 073cf9e20c333ab29744717a23f9e43ec7512a20


On 2017-09-20 at 12:22, Paweł Staszewski wrote:

So far bisected and marked:

git bisect start
# bad: [07dd6cc1fff160143e82cf5df78c1db0b6e03355] Linux 4.13.2
git bisect bad 07dd6cc1fff160143e82cf5df78c1db0b6e03355
# good: [5d7d2e03e0f01a992e3521b180c3d3e67905f269] Linux 4.12.13
git bisect good 5d7d2e03e0f01a992e3521b180c3d3e67905f269
# good: [6f7da290413ba713f0cdd9ff1a2a9bb129ef4f6c] Linux 4.12
git bisect good 6f7da290413ba713f0cdd9ff1a2a9bb129ef4f6c
# bad: [ac7b75966c9c86426b55fe1c50ae148aa4571075] Merge tag 
'pinctrl-v4.13-1' of 
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl

git bisect bad ac7b75966c9c86426b55fe1c50ae148aa4571075
# good: [e24dd9ee5399747b71c1d982a484fc7601795f31] Merge branch 
'next' of 
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security

git bisect good e24dd9ee5399747b71c1d982a484fc7601795f31
# good: [e24dd9ee5399747b71c1d982a484fc7601795f31] Merge branch 
'next' of 
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security

git bisect good e24dd9ee5399747b71c1d982a484fc7601795f31
# good: [e24dd9ee5399747b71c1d982a484fc7601795f31] Merge branch 
'next' of 
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security

git bisect good e24dd9ee5399747b71c1d982a484fc7601795f31



On 2017-09-20 at 12:21, Paweł Staszewski wrote:
OK, the kernel crashed with a different panic that I didn't catch while I 
was doing the bisect, and now my bisection is broken :)


git bisect good
Bisecting: 1787 revisions left to test after this (roughly 11 steps)
error: Your local changes to the following files would be 
overwritten by checkout:

    Documentation/00-INDEX
    Documentation/ABI/stable/sysfs-class-udc
    Documentation/ABI/testing/configfs-usb-gadget-uac1
    Documentation/ABI/testing/ima_policy
    Documentation/ABI/testing/sysfs-bus-iio
    Documentation/ABI/testing/sysfs-bus-iio-meas-spec
    Documentation/ABI/testing/sysfs-bus-iio-timer-stm32
    Documentation/ABI/testing/sysfs-class-net
    Documentation/ABI/testing/sysfs-class-power-twl4030
    Documentation/ABI/testing/sysfs-class-typec
    Documentation/DMA-API.txt
    Documentation/IRQ-domain.txt
    Documentation/Makefile
    Documentation/PCI/MSI-HOWTO.txt
    Documentation/RCU/00-INDEX
Documentation/RCU/Design/Requirements/Requirements.html
    Documentation/RCU/checklist.txt
    Documentation/admin-guide/README.rst
    Documentation/admin-guide/devices.txt
    Documentation/admin-guide/index.rst
    Documentation/admin-guide/kernel-parameters.txt
    Documentation/admin-guide/pm/cpufreq.rst
    Documentation/admin-guide/pm/intel_pstate.rst
    Documentation/admin-guide/ras.rst
    Documentation/arm/Atmel/README
    Documentation/block/biodoc.txt
    Documentation/conf.py
    Documentation/core-api/assoc_array.rst
    Documentation/core-api/atomic_ops.rst
    Documentation/core-api/index.rst
    Documentation/crypto/asymmetric-keys.txt
    Documentation/dev-tools/index.rst
    Documentation/dev-tools/sparse.rst
    Documentation/devicetree/bindings/arm/amlogic.txt
    Documentation/devicetree/bindings/arm/atmel-at91.txt
    Documentation/devicetree/bindings/arm/ccn.txt
    Documentation/devicetree/bindings/arm/cpus.txt
    Documentation/devicetree/bindings/arm/gemini.txt
Documentation/devicetree/bindings/arm/hisilicon/hisilicon.txt
Documentation/devicetree/bindings/arm/keystone/keystone.txt
    Documentation/devicetree/bindings/arm/mediatek.txt
    Documentation/devicetree/bindings/arm/rockchip.txt
    Documentation/devicetree/bindings/arm/shmobile.txt
    Documentation/devicetree/bindings/arm/tegra.txt
Documentation/devicetree/bindings/ata/ahci-fsl-qoriq.txt
Documentation/devicetree/bindings/bus/brcm,gisb-arb.txt
Documentation/devicetree/bindings/clock/brcm,iproc-clocks.txt
Documentation/devicet

Re: Latest net-next from GIT panic

2017-09-20 Thread Paweł Staszewski


hmm

But after

b838d5e1c5b6e57b10ec8af2268824041e3ea911 is the first bad commit
commit b838d5e1c5b6e57b10ec8af2268824041e3ea911
Author: Wei Wang 
Date:   Sat Jun 17 10:42:32 2017 -0700

    ipv4: mark DST_NOGC and remove the operation of dst_free()

    With the previous preparation patches, we are ready to get rid of the
    dst gc operation in ipv4 code and release dst based on refcnt only.
    So this patch adds DST_NOGC flag for all IPv4 dst and remove the calls
    to dst_free().
    At this point, all dst created in ipv4 code do not use the dst gc
    anymore and will be destroyed at the point when refcnt drops to 0.

    Signed-off-by: Wei Wang 
    Acked-by: Martin KaFai Lau 
    Signed-off-by: David S. Miller 

:040000 040000 9b7e7fb641de6531fc7887473ca47ef7cb6a11da 831a73b71d3df1755f3e24c0d3c86d7a93fd55e2 M	net



Still panics, so I will go back the last 3 steps and try the bisect again 
to get a result without the panic.





Re: Latest net-next from GIT panic

2017-09-20 Thread Eric Dumazet

Sorry for top-posting, but this is to give context to Wei, since Pawel
used a top posting way to report his bisection.

Wei, can you take a look at Pawel's report?

The crash happens in dst_destroy() at the following:

if (dst->dev)
        dev_put(dst->dev); <>


dst->dev is not NULL, but netdev->pcpu_refcnt is NULL

65 ff 08        decl   %gs:(%rax)   // CRASH since rax = NULL



Pawel, could you please share your netdevices and routing setup?

Thanks !

On Wed, 2017-09-20 at 14:49 +0200, Paweł Staszewski wrote:
> And the last one
> 
> git bisect good
> Bisecting: 1 revision left to test after this (roughly 1 step)
> [1cfb71eeb12047bcdbd3e6730ffed66e810a0855] ipv6: take dst->__refcnt for 
> insertion into fib6 tree
> 
> With this have kernel panic same as always
> 
> git bisect bad
> Bisecting: 0 revisions left to test after this (roughly 0 steps)
> [b838d5e1c5b6e57b10ec8af2268824041e3ea911] ipv4: mark DST_NOGC and 
> remove the operation of dst_free()
> 
> 
> 
> On 2017-09-20 at 14:23, Paweł Staszewski wrote:
> > Almost there
> >
> > Bisecting: 6 revisions left to test after this (roughly 3 steps)
> > [ad65a2f05695aced349e308193c6e2a6b1d87112] ipv6: call dst_hold_safe() 
> > properly
> >
> >
> >
> > On 2017-09-20 at 13:02, Paweł Staszewski wrote:
> >> Ok resumed and soo far:
> >>
> >> Panic:
> >>
> >> # bad: [9cc9a5cb176ccb4f2cda5ac34da5a659926f125f] datapath: Avoid 
> >> using stack larger than 1024.
> >> git bisect bad 9cc9a5cb176ccb4f2cda5ac34da5a659926f125f
> >>
> >> No panic:
> >>
> >> # good: [073cf9e20c333ab29744717a23f9e43ec7512a20] Merge branch 
> >> 'udp-reduce-cache-pressure'
> >> git bisect good 073cf9e20c333ab29744717a23f9e43ec7512a20
> >>
> >>
> >> On 2017-09-20 at 12:22, Paweł Staszewski wrote:
> >>> Soo far bisected and marked:
> >>>
> >>> git bisect start
> >>> # bad: [07dd6cc1fff160143e82cf5df78c1db0b6e03355] Linux 4.13.2
> >>> git bisect bad 07dd6cc1fff160143e82cf5df78c1db0b6e03355
> >>> # good: [5d7d2e03e0f01a992e3521b180c3d3e67905f269] Linux 4.12.13
> >>> git bisect good 5d7d2e03e0f01a992e3521b180c3d3e67905f269
> >>> # good: [6f7da290413ba713f0cdd9ff1a2a9bb129ef4f6c] Linux 4.12
> >>> git bisect good 6f7da290413ba713f0cdd9ff1a2a9bb129ef4f6c
> >>> # bad: [ac7b75966c9c86426b55fe1c50ae148aa4571075] Merge tag 
> >>> 'pinctrl-v4.13-1' of 
> >>> git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
> >>> git bisect bad ac7b75966c9c86426b55fe1c50ae148aa4571075
> >>> # good: [e24dd9ee5399747b71c1d982a484fc7601795f31] Merge branch 
> >>> 'next' of 
> >>> git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
> >>> git bisect good e24dd9ee5399747b71c1d982a484fc7601795f31
> >>> # good: [e24dd9ee5399747b71c1d982a484fc7601795f31] Merge branch 
> >>> 'next' of 
> >>> git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
> >>> git bisect good e24dd9ee5399747b71c1d982a484fc7601795f31
> >>> # good: [e24dd9ee5399747b71c1d982a484fc7601795f31] Merge branch 
> >>> 'next' of 
> >>> git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
> >>> git bisect good e24dd9ee5399747b71c1d982a484fc7601795f31
> >>>
> >>>
> >>>
> >>> On 2017-09-20 at 12:21, Paweł Staszewski wrote:
> >>>> OK, the kernel crashed with a different panic that I didn't catch while I 
> >>>> was doing the bisect, and now my bisection is broken :)
> >>>>
> >>>> git bisect good
> >>>> Bisecting: 1787 revisions left to test after this (roughly 11 steps)
> >>>> error: Your local changes to the following files would be 
> >>>> overwritten by checkout:
> >>>> Documentation/00-INDEX
> >>>> Documentation/ABI/stable/sysfs-class-udc
> >>>> Documentation/ABI/testing/configfs-usb-gadget-uac1
> >>>> Documentation/ABI/testing/ima_policy
> >>>> Documentation/ABI/testing/sysfs-bus-iio
> >>>> Documentation/ABI/testing/sysfs-bus-iio-meas-spec
> >>>> Documentation/ABI/testing/sysfs-bus-iio-timer-stm32
> >>>> Documentation/ABI/testing/sysfs-class-net
> >>>> Documentation/ABI/testing/sysfs-class-power-twl4030
> >>>> Documentation/ABI/testing/sysfs-class-typec
> >>>> Documentation/DMA-API.txt
> >>>> Documentation/IRQ-domain.txt
> >>>> Documentation/Makefile
> >>>> Documentation/PCI/MSI-HOWTO.txt
> >>>> Documentation/RCU/00-INDEX
> >>>> Documentation/RCU/Design/Requirements/Requirements.html
> >>>> Documentation/RCU/checklist.txt
> >>>> Documentation/admin-guide/README.rst
> >>>> Documentation/admin-guide/devices.txt
> >>>> Documentation/admin-guide/index.rst
> >>>> Documentation/admin-guide/kernel-parameters.txt
> >>>> Documentation/admin-guide/pm/cpufreq.rst
> >>>> Documentation/admin-guide/pm/intel_pstate.rst
> >>>> Documentation/admin-guide/ras.rst
> >>>> Documentation/arm/Atmel/README
> >>>> Documentation/block/biodoc.txt
> >>>> Documentation/conf.py
> >>>> Documentation/core-api/assoc_array.rst
>

Re: Latest net-next from GIT panic

2017-09-20 Thread Paweł Staszewski


So far the bisect path was:

git bisect start
# bad: [ac7b75966c9c86426b55fe1c50ae148aa4571075] Merge tag 
'pinctrl-v4.13-1' of 
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl

git bisect bad ac7b75966c9c86426b55fe1c50ae148aa4571075
# good: [e24dd9ee5399747b71c1d982a484fc7601795f31] Merge branch 'next' 
of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security

git bisect good e24dd9ee5399747b71c1d982a484fc7601795f31
# bad: [9cc9a5cb176ccb4f2cda5ac34da5a659926f125f] datapath: Avoid using 
stack larger than 1024.

git bisect bad 9cc9a5cb176ccb4f2cda5ac34da5a659926f125f
# good: [073cf9e20c333ab29744717a23f9e43ec7512a20] Merge branch 
'udp-reduce-cache-pressure'

git bisect good 073cf9e20c333ab29744717a23f9e43ec7512a20
# bad: [8abd5599a520e9f188a750f1bde9dde5fb856230] Merge branch 
's390-net-updates-part-2'

git bisect bad 8abd5599a520e9f188a750f1bde9dde5fb856230
# good: [2fae5d0e647c6470d206e72b5fc24972bb900f70] Merge branch 
'bpf-ctx-narrow'

git bisect good 2fae5d0e647c6470d206e72b5fc24972bb900f70
# good: [41500c3e2a19ffcf40a7158fce1774de08e26ba2] rds: tcp: remove 
cp_outgoing

git bisect good 41500c3e2a19ffcf40a7158fce1774de08e26ba2
# bad: [8917a777be3ba566377be05117f71b93a5fd909d] tcp: md5: add 
TCP_MD5SIG_EXT socket option to set a key address prefix

git bisect bad 8917a777be3ba566377be05117f71b93a5fd909d
# good: [4a6ce2b6f2ecabbddcfe47e7cf61dd0f00b10e36] net: introduce a new 
function dst_dev_put()

git bisect good 4a6ce2b6f2ecabbddcfe47e7cf61dd0f00b10e36
# bad: [a4c2fd7f78915a0d7c5275e7612e7793157a01f2] net: remove 
DST_NOCACHE flag

git bisect bad a4c2fd7f78915a0d7c5275e7612e7793157a01f2
# bad: [ad65a2f05695aced349e308193c6e2a6b1d87112] ipv6: call 
dst_hold_safe() properly

git bisect bad ad65a2f05695aced349e308193c6e2a6b1d87112

No PANIC

# good: [9df16efadd2a8a82731dc76ff656c771e261827f] ipv4: call 
dst_hold_safe() properly

git bisect good 9df16efadd2a8a82731dc76ff656c771e261827f

PANIC

# bad: [1cfb71eeb12047bcdbd3e6730ffed66e810a0855] ipv6: take 
dst->__refcnt for insertion into fib6 tree

git bisect bad 1cfb71eeb12047bcdbd3e6730ffed66e810a0855

PANIC

# bad: [b838d5e1c5b6e57b10ec8af2268824041e3ea911] ipv4: mark DST_NOGC 
and remove the operation of dst_free()

git bisect bad b838d5e1c5b6e57b10ec8af2268824041e3ea911
# first bad commit: [b838d5e1c5b6e57b10ec8af2268824041e3ea911] ipv4: 
mark DST_NOGC and remove the operation of dst_free()






Re: Latest net-next from GIT panic

2017-09-20 Thread Paweł Staszewski


Yes sorry for top-posting also.

Configuration:

Ethernet devices:

lspci | grep Etherne
02:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network 
Connection (rev 01)
02:00.1 Ethernet controller: Intel Corporation I350 Gigabit Network 
Connection (rev 01)
04:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit 
SFI/SFP+ Network Connection (rev 01)
04:00.1 Ethernet controller: Intel Corporation 82599ES 10-Gigabit 
SFI/SFP+ Network Connection (rev 01)
07:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit 
SFI/SFP+ Network Connection (rev 01)
07:00.1 Ethernet controller: Intel Corporation 82599ES 10-Gigabit 
SFI/SFP+ Network Connection (rev 01)
81:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit 
SFI/SFP+ Network Connection (rev 01)
81:00.1 Ethernet controller: Intel Corporation 82599ES 10-Gigabit 
SFI/SFP+ Network Connection (rev 01)
83:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit 
SFI/SFP+ Network Connection (rev 01)
83:00.1 Ethernet controller: Intel Corporation 82599ES 10-Gigabit 
SFI/SFP+ Network Connection (rev 01)



ip l
1: lo:  mtu 65536 qdisc noqueue state UNKNOWN mode 
DEFAULT qlen 1000

    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: enp2s0f0:  mtu 1500 qdisc mq state 
DOWN mode DEFAULT qlen 8192

    link/ether 00:25:90:e4:97:9a brd ff:ff:ff:ff:ff:ff
3: enp2s0f1:  mtu 1500 qdisc mq state 
DOWN mode DEFAULT qlen 8192

    link/ether 00:25:90:e4:97:9b brd ff:ff:ff:ff:ff:ff
4: enp4s0f0:  mtu 1500 qdisc mq 
master bond1 state UP mode DEFAULT qlen 8192

    link/ether 0c:c4:7a:bc:b8:68 brd ff:ff:ff:ff:ff:ff
5: enp4s0f1:  mtu 1500 qdisc mq 
master bond0 state UP mode DEFAULT qlen 8192

    link/ether 0c:c4:7a:bc:b8:69 brd ff:ff:ff:ff:ff:ff
6: enp7s0f0:  mtu 1500 qdisc mq 
master bond1 state UP mode DEFAULT qlen 8192

    link/ether 0c:c4:7a:bc:b8:68 brd ff:ff:ff:ff:ff:ff
7: enp7s0f1:  mtu 1500 qdisc mq 
master bond0 state UP mode DEFAULT qlen 8192

    link/ether 0c:c4:7a:bc:b8:69 brd ff:ff:ff:ff:ff:ff
8: enp129s0f0:  mtu 1500 qdisc mq 
master bond1 state UP mode DEFAULT qlen 8192

    link/ether 0c:c4:7a:bc:b8:68 brd ff:ff:ff:ff:ff:ff
9: enp129s0f1:  mtu 1500 qdisc mq 
master bond0 state UP mode DEFAULT qlen 8192

    link/ether 0c:c4:7a:bc:b8:69 brd ff:ff:ff:ff:ff:ff
10: enp131s0f0:  mtu 1500 qdisc 
mq master bond1 state UP mode DEFAULT qlen 8192

    link/ether 0c:c4:7a:bc:b8:68 brd ff:ff:ff:ff:ff:ff
11: enp131s0f1:  mtu 1500 qdisc 
mq master bond0 state UP mode DEFAULT qlen 8192

    link/ether 0c:c4:7a:bc:b8:69 brd ff:ff:ff:ff:ff:ff
12: sit0@NONE:  mtu 1480 qdisc noop state DOWN mode DEFAULT qlen 1000
    link/sit 0.0.0.0 brd 0.0.0.0
13: bond0:  mtu 1500 qdisc 
noqueue state UP mode DEFAULT qlen 1000

    link/ether 0c:c4:7a:bc:b8:69 brd ff:ff:ff:ff:ff:ff
14: bond1:  mtu 1500 qdisc 
noqueue state UP mode DEFAULT qlen 1000

    link/ether 0c:c4:7a:bc:b8:68 brd ff:ff:ff:ff:ff:ff
15: vlan4091@bond0:  mtu 1500 qdisc 
noqueue state UP mode DEFAULT qlen 1000

    link/ether 0c:c4:7a:bc:b8:69 brd ff:ff:ff:ff:ff:ff
16: vlan4032@bond0:  mtu 1500 qdisc 
noqueue state UP mode DEFAULT qlen 1000

    link/ether 0c:c4:7a:bc:b8:69 brd ff:ff:ff:ff:ff:ff
17: vlan514@bond0:  mtu 1500 qdisc 
noqueue state UP mode DEFAULT qlen 1000

    link/ether 0c:c4:7a:bc:b8:69 brd ff:ff:ff:ff:ff:ff
18: vlan87@bond0:  mtu 1500 qdisc 
noqueue state UP mode DEFAULT qlen 1000

    link/ether 0c:c4:7a:bc:b8:69 brd ff:ff:ff:ff:ff:ff
19: vlan518@bond1:  mtu 1500 qdisc 
noqueue state UP mode DEFAULT qlen 1000

    link/ether 0c:c4:7a:bc:b8:68 brd ff:ff:ff:ff:ff:ff
20: vlan646@bond1:  mtu 1500 qdisc 
noqueue state UP mode DEFAULT qlen 1000

    link/ether 0c:c4:7a:bc:b8:68 brd ff:ff:ff:ff:ff:ff
21: vlan370@bond0:  mtu 1500 qdisc 
noqueue state UP mode DEFAULT qlen 1000

    link/ether 0c:c4:7a:bc:b8:69 brd ff:ff:ff:ff:ff:ff
22: vlan3212@bond0:  mtu 1500 qdisc 
noqueue state UP mode DEFAULT qlen 1000

    link/ether 0c:c4:7a:bc:b8:69 brd ff:ff:ff:ff:ff:ff
23: vlan746@bond0:  mtu 1500 qdisc 
noqueue state UP mode DEFAULT qlen 1000

    link/ether 0c:c4:7a:bc:b8:69 brd ff:ff:ff:ff:ff:ff


There are bonds:

cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: load balancing (round-robin)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: enp4s0f1
MII Status: up
Speed: 1 Mbps
Duplex: full
Link Failure Count: 1
Permanent HW addr: 0c:c4:7a:bc:b8:69
Slave queue ID: 0

Slave Interface: enp7s0f1
MII Status: up
Speed: 1 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:25:90:e3:dd:9d
Slave queue ID: 0

Slave Interface: enp129s0f1
MII Status: up
Speed: 1 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:25:90:e3:da:e1
Slave queue ID: 0

Slave Interface: enp131s0f1
MII Status: up
Speed: 1 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 0c:c4:7a:bc:b1:fd
Slave queue ID: 0

cat /proc/net/bonding/bond1
Ethernet Channel Bonding Driv

Re: [PATCH iproute2 json v2 05/27] ip: ipaddress.c: add support for json output

2017-09-20 Thread Sabrina Dubroca

Hi Julien,

2017-08-17, 10:35:52 -0700, Julien Fortin wrote:
> From: Julien Fortin 
> 
> This patch converts all output (mostly fprintfs) to the new ip_print api
> which handle both regular and json output.
> Initialize a json_writer and open an array object if -json was specified.
> Note that the JSON attribute naming follows the NETLINK_ATTRIBUTE naming.
[snip]

This patch (commit d0e720111aad) changed the output of "ip addr":

Before:
1: lo:  mtu 65536 qdisc noqueue state UNKNOWN group 
default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
   valid_lft forever preferred_lft forever
inet6 ::1/128 scope host 
   valid_lft forever preferred_lft forever

After:
1: lo:  mtu 65536 qdisc noqueue state UNKNOWN group 
default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8scope host lo
   valid_lft forever preferred_lft forever
inet6 ::1/128scope host 
   valid_lft forever preferred_lft forever

The space following the mask is missing.

Could you have a look?
Thanks.

-- 
Sabrina

Re: Latest net-next from GIT panic

2017-09-20 Thread Eric Dumazet

Could you try this debug patch ?

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index f535779d9dc1dfe36934c2abba4e43d053ac5d6f..1eaa3553a724dc8c048f67b556337072d5addc82 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3331,7 +3331,14 @@ void netdev_run_todo(void);
  */
 static inline void dev_put(struct net_device *dev)
 {
-	this_cpu_dec(*dev->pcpu_refcnt);
+	int __percpu *pref = READ_ONCE(dev->pcpu_refcnt);
+
+	if (!pref) {
+		pr_err("no pcpu_refcnt on dev %p(%s) state %d dismantle %d\n",
+		       dev, dev->name, dev->reg_state, dev->dismantle);
+		BUG();
+	}
+	this_cpu_dec(*pref);
 }
 
 /**

Re: Latest net-next from GIT panic

2017-09-20 Thread Eric Dumazet

On Wed, 2017-09-20 at 06:34 -0700, Eric Dumazet wrote:
> Could you try this debug patch ?
> 
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index f535779d9dc1dfe36934c2abba4e43d053ac5d6f..1eaa3553a724dc8c048f67b556337072d5addc82 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -3331,7 +3331,14 @@ void netdev_run_todo(void);
>   */
>  static inline void dev_put(struct net_device *dev)
>  {
> - this_cpu_dec(*dev->pcpu_refcnt);
> + int __percpu *pref = READ_ONCE(dev->pcpu_refcnt);
> +
> + if (!pref) {
> + pr_err("no pcpu_refcnt on dev %p(%s) state %d dismantle %d\n",
> +dev, dev->name, dev->reg_state, dev->dismantle);
> + BUG();
> + }
> + this_cpu_dec(*pref);
>  }
>  
>  /**
> 

And since the console will be filled by stack trace, maybe instead of
BUG() use some infinite loop ?

for (;;)
cpu_relax();
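
The guard in the debug patch can be tried outside the kernel with a small model. This is only a sketch under stated assumptions: `struct fake_dev` and `fake_dev_put` are hypothetical userspace stand-ins, a plain `int` replaces the per-CPU counter, and an error return replaces `BUG()` or the suggested spin loop.

```c
#include <stddef.h>
#include <stdio.h>

/* Userspace stand-in for the fields the debug patch reads; this is not
 * the kernel's struct net_device. */
struct fake_dev {
        const char *name;
        int *refcnt;    /* models dev->pcpu_refcnt, NULL once it is gone */
};

/* Returns 0 after dropping one reference, -1 if the counter is missing.
 * The kernel patch calls BUG() (or would spin) where this returns -1. */
static int fake_dev_put(struct fake_dev *dev)
{
        /* Read the pointer once and test that copy; the kernel patch
         * uses READ_ONCE() for this single read. */
        int *pref = dev->refcnt;

        if (!pref) {
                fprintf(stderr, "no refcnt on dev %p(%s)\n",
                        (void *)dev, dev->name);
                return -1;
        }
        (*pref)--;
        return 0;
}
```

The point of the pattern is that the diagnostic is printed before anything dereferences the missing pointer, so the log identifies the device instead of showing only an oops on a NULL access.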

Re: Latest net-next from GIT panic

2017-09-20 Thread Paweł Staszewski




On 2017-09-20 at 15:34, Eric Dumazet wrote:

Could you try this debug patch ?

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index f535779d9dc1dfe36934c2abba4e43d053ac5d6f..1eaa3553a724dc8c048f67b556337072d5addc82 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3331,7 +3331,14 @@ void netdev_run_todo(void);
   */
  static inline void dev_put(struct net_device *dev)
  {
-   this_cpu_dec(*dev->pcpu_refcnt);
+   int __percpu *pref = READ_ONCE(dev->pcpu_refcnt);
+
+   if (!pref) {
+   pr_err("no pcpu_refcnt on dev %p(%s) state %d dismantle %d\n",
+  dev, dev->name, dev->reg_state, dev->dismantle);
+   BUG();
+   }
+   this_cpu_dec(*pref);
  }
  
  /**






You want me to add this patch to what kernel version ?
Currently I'm at the state after git bisect reset, so mainline stable.

Re: Latest net-next from GIT panic

2017-09-20 Thread Eric Dumazet

On Wed, 2017-09-20 at 15:39 +0200, Paweł Staszewski wrote:
> 
> On 2017-09-20 at 15:34, Eric Dumazet wrote:
> > Could you try this debug patch ?
> >
> > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > index f535779d9dc1dfe36934c2abba4e43d053ac5d6f..1eaa3553a724dc8c048f67b556337072d5addc82 100644
> > --- a/include/linux/netdevice.h
> > +++ b/include/linux/netdevice.h
> > @@ -3331,7 +3331,14 @@ void netdev_run_todo(void);
> >*/
> >   static inline void dev_put(struct net_device *dev)
> >   {
> > -   this_cpu_dec(*dev->pcpu_refcnt);
> > +   int __percpu *pref = READ_ONCE(dev->pcpu_refcnt);
> > +
> > +   if (!pref) {
> > +   pr_err("no pcpu_refcnt on dev %p(%s) state %d dismantle %d\n",
> > +  dev, dev->name, dev->reg_state, dev->dismantle);
> > +   BUG();
> > +   }
> > +   this_cpu_dec(*pref);
> >   }
> >   
> >   /**
> >
> >
> >
> 
> You want me to add this patch to what kernel version ?
> currently im after git bisect reset - so mainline stable
> 

Simply use the latest net-next as mentioned in the thread title, thanks.

Re: [PATCH net-next 3/4] qed: Fix maximum number of CQs for iWARP

2017-09-20 Thread Leon Romanovsky

On Wed, Sep 20, 2017 at 05:46:35AM +, Kalderon, Michal wrote:
> From: Leon Romanovsky 
> Sent: Tuesday, September 19, 2017 8:46 PM
> On Tue, Sep 19, 2017 at 08:26:18PM +0300, Michal Kalderon wrote:
> >> The maximum number of CQs supported is bound to the number
> >> of connections supported, which differs between RoCE and iWARP.
> >>
> >> This fixes a crash that occurred in iWARP when running 1000 sessions
> >> using perftest.
> >>
> >> Signed-off-by: Michal Kalderon 
> >> Signed-off-by: Ariel Elior 
> >> ---
> >
> >It is worth to add Fixes line.
> >
> >Thanks
> The original code was there before we had iWARP support, so this doesn't
> exactly fix an older commit, but fixes iWARP code in general.

So add a Fixes line that points to the iWARP enablement patch.

Thanks




Re: [PATCH net-next 08/10] net/smc: introduce a delay

2017-09-20 Thread Leon Romanovsky

On Wed, Sep 20, 2017 at 01:58:11PM +0200, Ursula Braun wrote:
> The number of outstanding work requests is limited. If all work
> requests are in use, tx processing is postponed to another scheduling
> of the tx worker. Switch to a delayed worker to have a gap for tx
> completion queue events before the next retry.
>

How will the delay prevent, or protect against, resource exhaustion?

Thanks



Re: Latest net-next from GIT panic

2017-09-20 Thread Paweł Staszewski


Not much more after adding this patch

https://bugzilla.kernel.org/attachment.cgi?id=258529



On 2017-09-20 at 15:44, Eric Dumazet wrote:

On Wed, 2017-09-20 at 15:39 +0200, Paweł Staszewski wrote:

On 2017-09-20 at 15:34, Eric Dumazet wrote:

Could you try this debug patch ?

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index f535779d9dc1dfe36934c2abba4e43d053ac5d6f..1eaa3553a724dc8c048f67b556337072d5addc82 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3331,7 +3331,14 @@ void netdev_run_todo(void);
*/
   static inline void dev_put(struct net_device *dev)
   {
-   this_cpu_dec(*dev->pcpu_refcnt);
+   int __percpu *pref = READ_ONCE(dev->pcpu_refcnt);
+
+   if (!pref) {
+   pr_err("no pcpu_refcnt on dev %p(%s) state %d dismantle %d\n",
+  dev, dev->name, dev->reg_state, dev->dismantle);
+   BUG();
+   }
+   this_cpu_dec(*pref);
   }
   
   /**





You want me to add this patch to what kernel version ?
currently im after git bisect reset - so mainline stable


Simply use the latest net-next as mentioned in the thread title, thanks.

[patch net-next 00/16] mlxsw: Multicast flood update

2017-09-20 Thread Jiri Pirko

From: Jiri Pirko 

Nogah says:

Currently, there are four erroneous flows in MC flood:
1. When MC is disabled it affects only the flood table for unregistered
   MC packets, but packets that match an entry in the MDB are unaffected.
2. When MC is disabled, MC packets are being sent to all the ports in the
   bridge (like BC and link-local MC packets) regardless of the designated
   flag (BR_MCAST_FLOOD).
3. When a port is being deleted from a bridge it might remain in the MDB.
4. When MC is enabled packets are flooded to the mrouter ports only if
   they don't match any entry in the MDB, when they should always be
   flooded to them.

What these problems have in common is the discrepancy between how the
hardware handles MDB and mcast flood, and how the driver does it. Each
of these problems needs fixing either in the MDB code, or in mcast flood
code, and some in both.

Patches 1-6 change the way the MDB is handled in the driver to make the
following changes easier.
Patches 7-8 fix problem number 1 by removing the MDB from the HW when MC
is being disabled and restoring it when it is being enabled.
Patches 9-10 fix problem number 2 by offloading the flood table by the
appropriate flag.
Patch 11 fixes problem number 3 by adding MDB flush to the port removal.
Patches 12-14 fix problem number 4 by adding the mrouter ports to every
MDB entry in the HW to mimic the wanted behaviour.
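
The behaviour the series aims for can be summarised as one decision per egress port. The sketch below is an interpretation of the four problems listed above, not driver code: `struct port_state` and `mc_packet_egresses` are hypothetical names, and link-local and broadcast traffic are left out.

```c
#include <stdbool.h>

/* Hypothetical per-port state for this sketch. */
struct port_state {
        bool mrouter;       /* port is a multicast router port */
        bool mcast_flood;   /* BR_MCAST_FLOOD is set on the port */
        bool in_mdb_entry;  /* port is a member of the MDB entry that was hit */
};

/* Should a multicast packet leave through port p?
 * mdb_hit says whether the packet matched any MDB entry. */
static bool mc_packet_egresses(bool mc_enabled, bool mdb_hit,
                               const struct port_state *p)
{
        /* MC disabled: the MDB is ignored and only the flood flag
         * decides (problems 1 and 2). */
        if (!mc_enabled)
                return p->mcast_flood;

        /* MC enabled: members of the matching entry get the packet. */
        if (mdb_hit && p->in_mdb_entry)
                return true;

        /* mrouter ports get every MC packet, hit or miss (problem 4). */
        return p->mrouter;
}
```

Problem 3 is about entry lifetime rather than forwarding, so it has no counterpart in this function.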

Nogah Frankel (16):
  mlxsw: spectrum_switchdev: Change mc_router to mrouter
  mlxsw: spectrum_switchdev: Add a ports bitmap to the mid db
  mlxsw: spectrum_switchdev: Remove reference count from mid
  mlxsw: spectrum_switchdev: Save mids list per bridge device
  mlxsw: spectrum_switchdev: Break smid write function
  mlxsw: spectrum_switchdev: Attach mid id allocation to HW write
  mlxsw: spectrum_switchdev: Break mid deletion into two function
  mlxsw: spectrum_switchdev: Don't write mids to the HW when mc is
disabled
  mlxsw: spectrum_switchdev: Disable mdb when mc is disabled
  mlxsw: spectrum_switchdev: Use generic mc flood function
  mlxsw: spectrum_switchdev: Flood mc when mc is disabled by user flag
  mlxsw: spectrum_switchdev: Flush the mdb when a port is being removed
  mlxsw: spectrum_switchdev: Flood all mc packets to mrouter ports
  mlxsw: spectrum_switchdev: Update the mdb of mrouter port change
  mlxsw: spectrum_switchdev: Remove mrouter flood in mdb flush
  mlxsw: spectrum_switchdev: Consider mrouter status for mdb changes

 drivers/net/ethernet/mellanox/mlxsw/spectrum.h |   3 +-
 .../ethernet/mellanox/mlxsw/spectrum_switchdev.c   | 417 -
 2 files changed, 323 insertions(+), 97 deletions(-)

-- 
2.9.5

[patch net-next 14/16] mlxsw: spectrum_switchdev: Update the mdb of mrouter port change

2017-09-20 Thread Jiri Pirko

From: Nogah Frankel 

Whenever a port starts / stops being mrouter, update all the mdb entries
in the HW to flood / stop flooding mc packets there.
The change should happen only if the port is not in the mid. (If it is,
the mid should flood mc packets to this port anyway)

Signed-off-by: Nogah Frankel 
Signed-off-by: Jiri Pirko 
---
 .../ethernet/mellanox/mlxsw/spectrum_switchdev.c   | 23 ++
 1 file changed, 23 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
index 146beaa..bf1a175 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
@@ -130,6 +130,11 @@ mlxsw_sp_bridge_mdb_mc_enable_sync(struct mlxsw_sp_port *mlxsw_sp_port,
   struct mlxsw_sp_bridge_device
   *bridge_device);
 
+static void
+mlxsw_sp_port_mrouter_update_mdb(struct mlxsw_sp_port *mlxsw_sp_port,
+struct mlxsw_sp_bridge_port *bridge_port,
+bool add);
+
 static struct mlxsw_sp_bridge_device *
 mlxsw_sp_bridge_device_find(const struct mlxsw_sp_bridge *bridge,
const struct net_device *br_dev)
@@ -747,6 +752,8 @@ static int mlxsw_sp_port_attr_mrouter_set(struct mlxsw_sp_port *mlxsw_sp_port,
if (err)
return err;
 
+   mlxsw_sp_port_mrouter_update_mdb(mlxsw_sp_port, bridge_port,
+is_port_mrouter);
 out:
bridge_port->mrouter = is_port_mrouter;
return 0;
@@ -1517,6 +1524,22 @@ mlxsw_sp_bridge_mdb_mc_enable_sync(struct mlxsw_sp_port *mlxsw_sp_port,
}
 }
 
+static void
+mlxsw_sp_port_mrouter_update_mdb(struct mlxsw_sp_port *mlxsw_sp_port,
+struct mlxsw_sp_bridge_port *bridge_port,
+bool add)
+{
+   struct mlxsw_sp_bridge_device *bridge_device;
+   struct mlxsw_sp_mid *mid;
+
+   bridge_device = bridge_port->bridge_device;
+
+   list_for_each_entry(mid, &bridge_device->mids_list, list) {
+   if (!test_bit(mlxsw_sp_port->local_port, mid->ports_in_mid))
+   mlxsw_sp_port_smid_set(mlxsw_sp_port, mid->mid, add);
+   }
+}
+
 static int mlxsw_sp_port_obj_add(struct net_device *dev,
 const struct switchdev_obj *obj,
 struct switchdev_trans *trans)
-- 
2.9.5
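
The loop added by this patch can be modelled in userspace with one bit per port. This is a sketch only: `struct toy_mid` and `toy_mrouter_update` are hypothetical, ports are limited to 0..63, and a bitmask update stands in for the `mlxsw_sp_port_smid_set()` register write.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of a mid: members is what the bridge registered, hw_ports is
 * what the imaginary hardware entry floods to. One bit per local port. */
struct toy_mid {
        uint64_t members;
        uint64_t hw_ports;
};

/* Called when a port starts (add = true) or stops (add = false) being an
 * mrouter. port must be below 64 in this sketch. */
static void toy_mrouter_update(struct toy_mid *mids, size_t n,
                               unsigned int port, bool add)
{
        uint64_t bit = UINT64_C(1) << port;
        size_t i;

        for (i = 0; i < n; i++) {
                /* A member port already receives this group, so its
                 * hardware bit must not be touched. */
                if (mids[i].members & bit)
                        continue;
                if (add)
                        mids[i].hw_ports |= bit;
                else
                        mids[i].hw_ports &= ~bit;
        }
}
```

Skipping member ports is what keeps a port that stops being an mrouter from losing groups it joined explicitly.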

[patch net-next 15/16] mlxsw: spectrum_switchdev: Remove mrouter flood in mdb flush

2017-09-20 Thread Jiri Pirko

From: Nogah Frankel 

In mdb flush, the port is removed from all the mids it is registered
to. But if the port is an mrouter, all the mids flood to it.
This patch also removes an mrouter port from the mids it is not registered
to during mdb flush.

Signed-off-by: Nogah Frankel 
Signed-off-by: Jiri Pirko 
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
index bf1a175..459cedc 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
@@ -1673,6 +1673,9 @@ mlxsw_sp_bridge_port_mdb_flush(struct mlxsw_sp_port *mlxsw_sp_port,
if (test_bit(mlxsw_sp_port->local_port, mid->ports_in_mid)) {
__mlxsw_sp_port_mdb_del(mlxsw_sp_port, bridge_port,
mid);
+   } else if (bridge_device->multicast_enabled &&
+  bridge_port->mrouter) {
+   mlxsw_sp_port_smid_set(mlxsw_sp_port, mid->mid, false);
}
}
 }
-- 
2.9.5

[patch net-next 04/16] mlxsw: spectrum_switchdev: Save mids list per bridge device

2017-09-20 Thread Jiri Pirko

From: Nogah Frankel 

Instead of saving all the mids in the same list, save them per bridge
device. This change allows a more efficient mid lookup.
Also, the next patches add many loops over all the mids of a bridge
device: for multicast disable, mrouter change and mdb flush.

Signed-off-by: Nogah Frankel 
Signed-off-by: Jiri Pirko 
---
 .../ethernet/mellanox/mlxsw/spectrum_switchdev.c   | 49 +++---
 1 file changed, 24 insertions(+), 25 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
index cb2275ed..2ba8a44 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
@@ -67,7 +67,6 @@ struct mlxsw_sp_bridge {
u32 ageing_time;
bool vlan_enabled_exists;
struct list_head bridges_list;
-   struct list_head mids_list;
DECLARE_BITMAP(mids_bitmap, MLXSW_SP_MID_MAX);
const struct mlxsw_sp_bridge_ops *bridge_8021q_ops;
const struct mlxsw_sp_bridge_ops *bridge_8021d_ops;
@@ -77,6 +76,7 @@ struct mlxsw_sp_bridge_device {
struct net_device *dev;
struct list_head list;
struct list_head ports_list;
+   struct list_head mids_list;
u8 vlan_enabled:1,
   multicast_enabled:1;
const struct mlxsw_sp_bridge_ops *ops;
@@ -161,6 +161,7 @@ mlxsw_sp_bridge_device_create(struct mlxsw_sp_bridge *bridge,
} else {
bridge_device->ops = bridge->bridge_8021d_ops;
}
+   INIT_LIST_HEAD(&bridge_device->mids_list);
list_add(&bridge_device->list, &bridge->bridges_list);
 
return bridge_device;
@@ -170,10 +171,17 @@ static void
 mlxsw_sp_bridge_device_destroy(struct mlxsw_sp_bridge *bridge,
   struct mlxsw_sp_bridge_device *bridge_device)
 {
+   struct mlxsw_sp_mid *mid, *tmp;
+
list_del(&bridge_device->list);
if (bridge_device->vlan_enabled)
bridge->vlan_enabled_exists = false;
WARN_ON(!list_empty(&bridge_device->ports_list));
+   list_for_each_entry_safe(mid, tmp, &bridge_device->mids_list, list) {
+   list_del(&mid->list);
+   clear_bit(mid->mid, bridge->mids_bitmap);
+   kfree(mid);
+   }
kfree(bridge_device);
 }
 
@@ -1221,22 +1229,25 @@ static int mlxsw_sp_port_smid_set(struct mlxsw_sp_port *mlxsw_sp_port, u16 mid,
return err;
 }
 
-static struct mlxsw_sp_mid *__mlxsw_sp_mc_get(struct mlxsw_sp *mlxsw_sp,
- const unsigned char *addr,
- u16 fid)
+static struct
+mlxsw_sp_mid *__mlxsw_sp_mc_get(struct mlxsw_sp_bridge_device *bridge_device,
+   const unsigned char *addr,
+   u16 fid)
 {
struct mlxsw_sp_mid *mid;
 
-   list_for_each_entry(mid, &mlxsw_sp->bridge->mids_list, list) {
+   list_for_each_entry(mid, &bridge_device->mids_list, list) {
if (ether_addr_equal(mid->addr, addr) && mid->fid == fid)
return mid;
}
return NULL;
 }
 
-static struct mlxsw_sp_mid *__mlxsw_sp_mc_alloc(struct mlxsw_sp *mlxsw_sp,
-   const unsigned char *addr,
-   u16 fid)
+static struct
+mlxsw_sp_mid *__mlxsw_sp_mc_alloc(struct mlxsw_sp *mlxsw_sp,
+ struct mlxsw_sp_bridge_device *bridge_device,
+ const unsigned char *addr,
+ u16 fid)
 {
struct mlxsw_sp_mid *mid;
size_t alloc_size;
@@ -1263,7 +1274,7 @@ static struct mlxsw_sp_mid *__mlxsw_sp_mc_alloc(struct mlxsw_sp *mlxsw_sp,
ether_addr_copy(mid->addr, addr);
mid->fid = fid;
mid->mid = mid_idx;
-   list_add_tail(&mid->list, &mlxsw_sp->bridge->mids_list);
+   list_add_tail(&mid->list, &bridge_device->mids_list);
 
return mid;
 }
@@ -1316,9 +1327,10 @@ static int mlxsw_sp_port_mdb_add(struct mlxsw_sp_port *mlxsw_sp_port,
 
fid_index = mlxsw_sp_fid_index(mlxsw_sp_port_vlan->fid);
 
-   mid = __mlxsw_sp_mc_get(mlxsw_sp, mdb->addr, fid_index);
+   mid = __mlxsw_sp_mc_get(bridge_device, mdb->addr, fid_index);
if (!mid) {
-   mid = __mlxsw_sp_mc_alloc(mlxsw_sp, mdb->addr, fid_index);
+   mid = __mlxsw_sp_mc_alloc(mlxsw_sp, bridge_device, mdb->addr,
+ fid_index);
if (!mid) {
netdev_err(dev, "Unable to allocate MC group\n");
return -ENOMEM;
@@ -1440,7 +1452,7 @@ static int mlxsw_sp_port_mdb_del(struct mlxsw_sp_port *mlxsw_sp_port,
 
fid_index = mlxsw_sp_fid_index(mlxsw_sp_port_vlan->fid);
 
-   mid = __mlxsw_sp_mc_get(mlxsw_sp, mdb-

[patch net-next 09/16] mlxsw: spectrum_switchdev: Disable mdb when mc is disabled

2017-09-20 Thread Jiri Pirko

From: Nogah Frankel 

Remove all the mdb entries from the HW when mc is being disabled and
re-write them when it is being enabled.

Signed-off-by: Nogah Frankel 
Signed-off-by: Jiri Pirko 
---
 .../ethernet/mellanox/mlxsw/spectrum_switchdev.c   | 41 +++---
 1 file changed, 37 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
index cea257a..79806af 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
@@ -121,6 +121,11 @@ mlxsw_sp_bridge_port_fdb_flush(struct mlxsw_sp *mlxsw_sp,
   struct mlxsw_sp_bridge_port *bridge_port,
   u16 fid_index);
 
+static void
+mlxsw_sp_bridge_mdb_mc_enable_sync(struct mlxsw_sp_port *mlxsw_sp_port,
+  struct mlxsw_sp_bridge_device
+  *bridge_device);
+
 static struct mlxsw_sp_bridge_device *
 mlxsw_sp_bridge_device_find(const struct mlxsw_sp_bridge *bridge,
const struct net_device *br_dev)
@@ -757,6 +762,12 @@ static int mlxsw_sp_port_mc_disabled_set(struct mlxsw_sp_port *mlxsw_sp_port,
if (!bridge_device)
return 0;
 
+   if (bridge_device->multicast_enabled != !mc_disabled) {
+   bridge_device->multicast_enabled = !mc_disabled;
+   mlxsw_sp_bridge_mdb_mc_enable_sync(mlxsw_sp_port,
+  bridge_device);
+   }
+
list_for_each_entry(bridge_port, &bridge_device->ports_list, list) {
enum mlxsw_sp_flood_type packet_type = MLXSW_SP_FLOOD_TYPE_MC;
bool member = mc_disabled ? true : bridge_port->mrouter;
@@ -1207,9 +1218,8 @@ static int mlxsw_sp_port_mdb_op(struct mlxsw_sp *mlxsw_sp, const char *addr,
return err;
 }
 
-/* clean the an entry from the HW and write there a full new entry */
-static int mlxsw_sp_port_smid_full_entry(struct mlxsw_sp *mlxsw_sp,
-u16 mid_idx)
+static int mlxsw_sp_port_smid_full_entry(struct mlxsw_sp *mlxsw_sp, u16 mid_idx,
+long *ports_bitmap)
 {
char *smid_pl;
int err, i;
@@ -1224,6 +1234,9 @@ static int mlxsw_sp_port_smid_full_entry(struct mlxsw_sp *mlxsw_sp,
mlxsw_reg_smid_port_mask_set(smid_pl, i, 1);
}
 
+   for_each_set_bit(i, ports_bitmap, mlxsw_core_max_ports(mlxsw_sp->core))
+   mlxsw_reg_smid_port_set(smid_pl, i, 1);
+
err = mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(smid), smid_pl);
kfree(smid_pl);
return err;
@@ -1273,7 +1286,8 @@ mlxsw_sp_mc_write_mdb_entry(struct mlxsw_sp *mlxsw_sp,
return false;
 
mid->mid = mid_idx;
-   err = mlxsw_sp_port_smid_full_entry(mlxsw_sp, mid_idx);
+   err = mlxsw_sp_port_smid_full_entry(mlxsw_sp, mid_idx,
+   mid->ports_in_mid);
if (err)
return false;
 
@@ -1414,6 +1428,25 @@ static int mlxsw_sp_port_mdb_add(struct mlxsw_sp_port *mlxsw_sp_port,
return err;
 }
 
+static void
+mlxsw_sp_bridge_mdb_mc_enable_sync(struct mlxsw_sp_port *mlxsw_sp_port,
+  struct mlxsw_sp_bridge_device
+  *bridge_device)
+{
+   struct mlxsw_sp *mlxsw_sp = mlxsw_sp_port->mlxsw_sp;
+   struct mlxsw_sp_mid *mid;
+   bool mc_enabled;
+
+   mc_enabled = bridge_device->multicast_enabled;
+
+   list_for_each_entry(mid, &bridge_device->mids_list, list) {
+   if (mc_enabled)
+   mlxsw_sp_mc_write_mdb_entry(mlxsw_sp, mid);
+   else
+   mlxsw_sp_mc_remove_mdb_entry(mlxsw_sp, mid);
+   }
+}
+
 static int mlxsw_sp_port_obj_add(struct net_device *dev,
 const struct switchdev_obj *obj,
 struct switchdev_trans *trans)
-- 
2.9.5
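
The enable/disable sync above, together with the `in_hw` guard from patch 08, amounts to bringing each entry's hardware state in line with the bridge's multicast setting. A minimal userspace sketch follows, assuming a hypothetical `struct toy_entry` in which a boolean replaces the real register writes; skipping entries that are already in the requested state is an assumption of the sketch on the write side, while on the remove side it mirrors the `!mid->in_hw` early return.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a mid; in_hw mirrors mid->in_hw. */
struct toy_entry {
        bool in_hw;
};

/* Write every entry to the imaginary hardware when multicast is enabled,
 * remove every entry when it is disabled.
 * Returns the number of entries whose hardware state changed. */
static size_t toy_mc_enable_sync(struct toy_entry *e, size_t n,
                                 bool mc_enabled)
{
        size_t changed = 0;
        size_t i;

        for (i = 0; i < n; i++) {
                /* Nothing to do if the entry is already where it should be. */
                if (e[i].in_hw == mc_enabled)
                        continue;
                e[i].in_hw = mc_enabled;
                changed++;
        }
        return changed;
}
```

Because of the guard, calling the sync twice with the same setting changes nothing the second time.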

[patch net-next 11/16] mlxsw: spectrum_switchdev: Flood mc when mc is disabled by user flag

2017-09-20 Thread Jiri Pirko

From: Nogah Frankel 

When multicast is disabled, flood mc packets only to ports that are marked
BR_MCAST_FLOOD (instead of to all ports).

Signed-off-by: Nogah Frankel 
Signed-off-by: Jiri Pirko 
---
 .../ethernet/mellanox/mlxsw/spectrum_switchdev.c| 21 +
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
index 19ac206..50c4d7c 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
@@ -262,7 +262,8 @@ mlxsw_sp_bridge_port_create(struct mlxsw_sp_bridge_device *bridge_device,
bridge_port->dev = brport_dev;
bridge_port->bridge_device = bridge_device;
bridge_port->stp_state = BR_STATE_DISABLED;
-   bridge_port->flags = BR_LEARNING | BR_FLOOD | BR_LEARNING_SYNC;
+   bridge_port->flags = BR_LEARNING | BR_FLOOD | BR_LEARNING_SYNC |
+BR_MCAST_FLOOD;
INIT_LIST_HEAD(&bridge_port->vlans_list);
list_add(&bridge_port->list, &bridge_device->ports_list);
bridge_port->ref_count = 1;
@@ -468,7 +469,8 @@ static int mlxsw_sp_port_attr_get(struct net_device *dev,
   &attr->u.brport_flags);
break;
case SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS_SUPPORT:
-   attr->u.brport_flags_support = BR_LEARNING | BR_FLOOD;
+   attr->u.brport_flags_support = BR_LEARNING | BR_FLOOD |
+  BR_MCAST_FLOOD;
break;
default:
return -EOPNOTSUPP;
@@ -653,8 +655,18 @@ static int mlxsw_sp_port_attr_br_flags_set(struct mlxsw_sp_port *mlxsw_sp_port,
if (err)
return err;
 
-   memcpy(&bridge_port->flags, &brport_flags, sizeof(brport_flags));
+   if (bridge_port->bridge_device->multicast_enabled)
+   goto out;
 
+   err = mlxsw_sp_bridge_port_flood_table_set(mlxsw_sp_port, bridge_port,
+  MLXSW_SP_FLOOD_TYPE_MC,
+  brport_flags &
+  BR_MCAST_FLOOD);
+   if (err)
+   return err;
+
+out:
+   memcpy(&bridge_port->flags, &brport_flags, sizeof(brport_flags));
return 0;
 }
 
@@ -747,7 +759,8 @@ static bool mlxsw_sp_mc_flood(const struct mlxsw_sp_bridge_port *bridge_port)
const struct mlxsw_sp_bridge_device *bridge_device;
 
bridge_device = bridge_port->bridge_device;
-   return !bridge_device->multicast_enabled ? true : bridge_port->mrouter;
+   return bridge_device->multicast_enabled ? bridge_port->mrouter :
+   bridge_port->flags & BR_MCAST_FLOOD;
 }
 
 static int mlxsw_sp_port_mc_disabled_set(struct mlxsw_sp_port *mlxsw_sp_port,
-- 
2.9.5

[patch net-next 13/16] mlxsw: spectrum_switchdev: Flood all mc packets to mrouter ports

2017-09-20 Thread Jiri Pirko

From: Nogah Frankel 

When mc is enabled, whenever a mc packet doesn't hit any mdb entry it is
being flooded to the ports marked as mrouters. However, all mc packets should
be flooded to them even if they match an entry in the mdb.
This patch adds the mrouter ports to every mdb entry that is being written
to the HW.

Signed-off-by: Nogah Frankel 
Signed-off-by: Jiri Pirko 
---
 .../ethernet/mellanox/mlxsw/spectrum_switchdev.c   | 65 --
 1 file changed, 60 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
index bc07873..146beaa 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
@@ -1288,10 +1288,55 @@ mlxsw_sp_mid *__mlxsw_sp_mc_get(struct mlxsw_sp_bridge_device *bridge_device,
return NULL;
 }
 
+static void
+mlxsw_sp_bridge_port_get_ports_bitmap(struct mlxsw_sp *mlxsw_sp,
+ struct mlxsw_sp_bridge_port *bridge_port,
+ unsigned long *ports_bitmap)
+{
+   struct mlxsw_sp_port *mlxsw_sp_port;
+   u64 max_lag_members, i;
+   int lag_id;
+
+   if (!bridge_port->lagged) {
+   set_bit(bridge_port->system_port, ports_bitmap);
+   } else {
+   max_lag_members = MLXSW_CORE_RES_GET(mlxsw_sp->core,
+MAX_LAG_MEMBERS);
+   lag_id = bridge_port->lag_id;
+   for (i = 0; i < max_lag_members; i++) {
+   mlxsw_sp_port = mlxsw_sp_port_lagged_get(mlxsw_sp,
+lag_id, i);
+   if (mlxsw_sp_port)
+   set_bit(mlxsw_sp_port->local_port,
+   ports_bitmap);
+   }
+   }
+}
+
+static void
+mlxsw_sp_mc_get_mrouters_bitmap(unsigned long *flood_bitmap,
+   struct mlxsw_sp_bridge_device *bridge_device,
+   struct mlxsw_sp *mlxsw_sp)
+{
+   struct mlxsw_sp_bridge_port *bridge_port;
+
+   list_for_each_entry(bridge_port, &bridge_device->ports_list, list) {
+   if (bridge_port->mrouter) {
+   mlxsw_sp_bridge_port_get_ports_bitmap(mlxsw_sp,
+ bridge_port,
+ flood_bitmap);
+   }
+   }
+}
+
 static bool
 mlxsw_sp_mc_write_mdb_entry(struct mlxsw_sp *mlxsw_sp,
-   struct mlxsw_sp_mid *mid)
+   struct mlxsw_sp_mid *mid,
+   struct mlxsw_sp_bridge_device *bridge_device)
 {
+   long *flood_bitmap;
+   int num_of_ports;
+   int alloc_size;
u16 mid_idx;
int err;
 
@@ -1300,9 +1345,18 @@ mlxsw_sp_mc_write_mdb_entry(struct mlxsw_sp *mlxsw_sp,
if (mid_idx == MLXSW_SP_MID_MAX)
return false;
 
+   num_of_ports = mlxsw_core_max_ports(mlxsw_sp->core);
+   alloc_size = sizeof(long) * BITS_TO_LONGS(num_of_ports);
+   flood_bitmap = kzalloc(alloc_size, GFP_KERNEL);
+   if (!flood_bitmap)
+   return false;
+
+   bitmap_copy(flood_bitmap,  mid->ports_in_mid, num_of_ports);
+   mlxsw_sp_mc_get_mrouters_bitmap(flood_bitmap, bridge_device, mlxsw_sp);
+
mid->mid = mid_idx;
-   err = mlxsw_sp_port_smid_full_entry(mlxsw_sp, mid_idx,
-   mid->ports_in_mid);
+   err = mlxsw_sp_port_smid_full_entry(mlxsw_sp, mid_idx, flood_bitmap);
+   kfree(flood_bitmap);
if (err)
return false;
 
@@ -1355,7 +1409,7 @@ mlxsw_sp_mid *__mlxsw_sp_mc_alloc(struct mlxsw_sp *mlxsw_sp,
if (!bridge_device->multicast_enabled)
goto out;
 
-   if (!mlxsw_sp_mc_write_mdb_entry(mlxsw_sp, mid))
+   if (!mlxsw_sp_mc_write_mdb_entry(mlxsw_sp, mid, bridge_device))
goto err_write_mdb_entry;
 
 out:
@@ -1456,7 +1510,8 @@ mlxsw_sp_bridge_mdb_mc_enable_sync(struct mlxsw_sp_port *mlxsw_sp_port,
 
list_for_each_entry(mid, &bridge_device->mids_list, list) {
if (mc_enabled)
-   mlxsw_sp_mc_write_mdb_entry(mlxsw_sp, mid);
+   mlxsw_sp_mc_write_mdb_entry(mlxsw_sp, mid,
+   bridge_device);
else
mlxsw_sp_mc_remove_mdb_entry(mlxsw_sp, mid);
}
-- 
2.9.5
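
The bitmap built in `mlxsw_sp_mc_write_mdb_entry()` is the union of the entry's member ports and all mrouter ports of the bridge. A compact model of that union is sketched below, assuming at most 64 ports so that a `uint64_t` can replace the kernel bitmap; `struct toy_bridge_port` and `toy_flood_bitmap` are hypothetical, and LAG members are not modelled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bridge port for this sketch; local_port must be below 64. */
struct toy_bridge_port {
        unsigned int local_port;
        bool mrouter;
};

/* Start from the ports registered to the entry and add every mrouter
 * port, so the entry written to hardware floods to both sets. */
static uint64_t toy_flood_bitmap(uint64_t ports_in_mid,
                                 const struct toy_bridge_port *ports,
                                 size_t n)
{
        uint64_t flood = ports_in_mid;
        size_t i;

        for (i = 0; i < n; i++) {
                if (ports[i].mrouter)
                        flood |= UINT64_C(1) << ports[i].local_port;
        }
        return flood;
}
```

The member bitmap itself is left unchanged; only the copy handed to the hardware write carries the extra mrouter bits, which matches the patch copying `mid->ports_in_mid` into a temporary bitmap.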

[patch net-next 16/16] mlxsw: spectrum_switchdev: Consider mrouter status for mdb changes

2017-09-20 Thread Jiri Pirko

From: Nogah Frankel 

When an mrouter port is registered to or leaves a mid, don't update the HW,
since mrouter ports are already part of every mdb entry written to the HW.

Signed-off-by: Nogah Frankel 
Signed-off-by: Jiri Pirko 
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
index 459cedc..0f9eac5 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
@@ -1491,6 +1491,9 @@ static int mlxsw_sp_port_mdb_add(struct mlxsw_sp_port *mlxsw_sp_port,
if (!bridge_device->multicast_enabled)
return 0;
 
+   if (bridge_port->mrouter)
+   return 0;
+
err = mlxsw_sp_port_smid_set(mlxsw_sp_port, mid->mid, true);
if (err) {
netdev_err(dev, "Unable to set SMID\n");
@@ -1613,10 +1616,12 @@ __mlxsw_sp_port_mdb_del(struct mlxsw_sp_port *mlxsw_sp_port,
int err;
 
if (bridge_port->bridge_device->multicast_enabled) {
-   err = mlxsw_sp_port_smid_set(mlxsw_sp_port, mid->mid, false);
-
-   if (err)
-   netdev_err(dev, "Unable to remove port from SMID\n");
+   if (bridge_port->bridge_device->multicast_enabled) {
+   err = mlxsw_sp_port_smid_set(mlxsw_sp_port, mid->mid,
+false);
+   if (err)
+   netdev_err(dev, "Unable to remove port from SMID\n");
+   }
}
 
err = mlxsw_sp_port_remove_from_mid(mlxsw_sp_port, mid);
-- 
2.9.5

[patch net-next 12/16] mlxsw: spectrum_switchdev: Flush the mdb when a port is being removed

2017-09-20 Thread Jiri Pirko

From: Nogah Frankel 

When a port is being removed from a bridge, flush the bridge mdb to remove
the mids of that port.

Signed-off-by: Nogah Frankel 
Signed-off-by: Jiri Pirko 
---
 .../ethernet/mellanox/mlxsw/spectrum_switchdev.c   | 39 --
 1 file changed, 29 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
index 50c4d7c..bc07873 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
@@ -122,6 +122,10 @@ mlxsw_sp_bridge_port_fdb_flush(struct mlxsw_sp *mlxsw_sp,
   u16 fid_index);
 
 static void
+mlxsw_sp_bridge_port_mdb_flush(struct mlxsw_sp_port *mlxsw_sp_port,
+  struct mlxsw_sp_bridge_port *bridge_port);
+
+static void
 mlxsw_sp_bridge_mdb_mc_enable_sync(struct mlxsw_sp_port *mlxsw_sp_port,
   struct mlxsw_sp_bridge_device
   *bridge_device);
@@ -176,17 +180,11 @@ static void
 mlxsw_sp_bridge_device_destroy(struct mlxsw_sp_bridge *bridge,
   struct mlxsw_sp_bridge_device *bridge_device)
 {
-   struct mlxsw_sp_mid *mid, *tmp;
-
list_del(&bridge_device->list);
if (bridge_device->vlan_enabled)
bridge->vlan_enabled_exists = false;
WARN_ON(!list_empty(&bridge_device->ports_list));
-   list_for_each_entry_safe(mid, tmp, &bridge_device->mids_list, list) {
-   list_del(&mid->list);
-   clear_bit(mid->mid, bridge->mids_bitmap);
-   kfree(mid);
-   }
+   WARN_ON(!list_empty(&bridge_device->mids_list));
kfree(bridge_device);
 }
 
@@ -987,24 +985,28 @@ mlxsw_sp_port_vlan_bridge_leave(struct mlxsw_sp_port_vlan *mlxsw_sp_port_vlan)
struct mlxsw_sp_bridge_vlan *bridge_vlan;
struct mlxsw_sp_bridge_port *bridge_port;
u16 vid = mlxsw_sp_port_vlan->vid;
-   bool last;
+   bool last_port, last_vlan;
 
if (WARN_ON(mlxsw_sp_fid_type(fid) != MLXSW_SP_FID_TYPE_8021Q &&
mlxsw_sp_fid_type(fid) != MLXSW_SP_FID_TYPE_8021D))
return;
 
bridge_port = mlxsw_sp_port_vlan->bridge_port;
+   last_vlan = list_is_singular(&bridge_port->vlans_list);
bridge_vlan = mlxsw_sp_bridge_vlan_find(bridge_port, vid);
-   last = list_is_singular(&bridge_vlan->port_vlan_list);
+   last_port = list_is_singular(&bridge_vlan->port_vlan_list);
 
list_del(&mlxsw_sp_port_vlan->bridge_vlan_node);
mlxsw_sp_bridge_vlan_put(bridge_vlan);
mlxsw_sp_port_vid_stp_set(mlxsw_sp_port, vid, BR_STATE_DISABLED);
mlxsw_sp_port_vid_learning_set(mlxsw_sp_port, vid, false);
-   if (last)
+   if (last_port)
mlxsw_sp_bridge_port_fdb_flush(mlxsw_sp_port->mlxsw_sp,
   bridge_port,
   mlxsw_sp_fid_index(fid));
+   if (last_vlan)
+   mlxsw_sp_bridge_port_mdb_flush(mlxsw_sp_port, bridge_port);
+
mlxsw_sp_port_vlan_fid_leave(mlxsw_sp_port_vlan);
 
mlxsw_sp_bridge_port_put(mlxsw_sp_port->mlxsw_sp->bridge, bridge_port);
@@ -1580,6 +1582,23 @@ static int mlxsw_sp_port_mdb_del(struct mlxsw_sp_port *mlxsw_sp_port,
return __mlxsw_sp_port_mdb_del(mlxsw_sp_port, bridge_port, mid);
 }
 
+static void
+mlxsw_sp_bridge_port_mdb_flush(struct mlxsw_sp_port *mlxsw_sp_port,
+  struct mlxsw_sp_bridge_port *bridge_port)
+{
+   struct mlxsw_sp_bridge_device *bridge_device;
+   struct mlxsw_sp_mid *mid, *tmp;
+
+   bridge_device = bridge_port->bridge_device;
+
+   list_for_each_entry_safe(mid, tmp, &bridge_device->mids_list, list) {
+   if (test_bit(mlxsw_sp_port->local_port, mid->ports_in_mid)) {
+   __mlxsw_sp_port_mdb_del(mlxsw_sp_port, bridge_port,
+   mid);
+   }
+   }
+}
+
 static int mlxsw_sp_port_obj_del(struct net_device *dev,
 const struct switchdev_obj *obj)
 {
-- 
2.9.5
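
The flush above deletes entries while walking the list, which is why it needs the `_safe` iterator. The same idea in plain C is sketched below with a singly linked list and a pointer-to-pointer walk instead of the kernel list API; `struct toy_node` and `toy_mdb_flush` are hypothetical, and freeing a node once it has no members is an assumption about what the delete helper does with an empty mid.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical list node: one bit per member port (ports 0..63). */
struct toy_node {
        uint64_t members;
        struct toy_node *next;
};

/* Remove port from every node and free nodes left without members.
 * Returns the new list head. */
static struct toy_node *toy_mdb_flush(struct toy_node *head,
                                      unsigned int port)
{
        uint64_t bit = UINT64_C(1) << port;
        struct toy_node **link = &head;

        while (*link) {
                struct toy_node *node = *link;

                node->members &= ~bit;
                if (!node->members) {
                        /* Unlink first, then free, so the walk never
                         * reads a freed node's next pointer. */
                        *link = node->next;
                        free(node);
                } else {
                        link = &node->next;
                }
        }
        return head;
}
```

Saving the successor before the current node is released is exactly what `list_for_each_entry_safe()` does with its `tmp` cursor.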

[patch net-next 08/16] mlxsw: spectrum_switchdev: Don't write mids to the HW when mc is disabled

2017-09-20 Thread Jiri Pirko

From: Nogah Frankel 

Don't write multicast related data to the HW when mc is disabled.
Also, don't allocate a mid id for new mids (so the remove function can know
that they weren't written to the HW).

Signed-off-by: Nogah Frankel 
Signed-off-by: Jiri Pirko 
---
 .../ethernet/mellanox/mlxsw/spectrum_switchdev.c| 21 +
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
index 7f622de..cea257a 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
@@ -1290,6 +1290,9 @@ mlxsw_sp_mc_write_mdb_entry(struct mlxsw_sp *mlxsw_sp,
 static int mlxsw_sp_mc_remove_mdb_entry(struct mlxsw_sp *mlxsw_sp,
struct mlxsw_sp_mid *mid)
 {
+   if (!mid->in_hw)
+   return 0;
+
clear_bit(mid->mid, mlxsw_sp->bridge->mids_bitmap);
mid->in_hw = false;
return mlxsw_sp_port_mdb_op(mlxsw_sp, mid->addr, mid->fid, mid->mid,
@@ -1319,11 +1322,15 @@ mlxsw_sp_mid *__mlxsw_sp_mc_alloc(struct mlxsw_sp *mlxsw_sp,
ether_addr_copy(mid->addr, addr);
mid->fid = fid;
mid->in_hw = false;
+
+   if (!bridge_device->multicast_enabled)
+   goto out;
+
if (!mlxsw_sp_mc_write_mdb_entry(mlxsw_sp, mid))
goto err_write_mdb_entry;
 
+out:
list_add_tail(&mid->list, &bridge_device->mids_list);
-
return mid;
 
 err_write_mdb_entry:
@@ -1391,6 +1398,9 @@ static int mlxsw_sp_port_mdb_add(struct mlxsw_sp_port *mlxsw_sp_port,
}
set_bit(mlxsw_sp_port->local_port, mid->ports_in_mid);
 
+   if (!bridge_device->multicast_enabled)
+   return 0;
+
err = mlxsw_sp_port_smid_set(mlxsw_sp_port, mid->mid, true);
if (err) {
netdev_err(dev, "Unable to set SMID\n");
@@ -1476,9 +1486,12 @@ __mlxsw_sp_port_mdb_del(struct mlxsw_sp_port *mlxsw_sp_port,
struct net_device *dev = mlxsw_sp_port->dev;
int err;
 
-   err = mlxsw_sp_port_smid_set(mlxsw_sp_port, mid->mid, false);
-   if (err)
-   netdev_err(dev, "Unable to remove port from SMID\n");
+   if (bridge_port->bridge_device->multicast_enabled) {
+   err = mlxsw_sp_port_smid_set(mlxsw_sp_port, mid->mid, false);
+
+   if (err)
+   netdev_err(dev, "Unable to remove port from SMID\n");
+   }
 
err = mlxsw_sp_port_remove_from_mid(mlxsw_sp_port, mid);
if (err)
-- 
2.9.5
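
A minimal userspace sketch of the in_hw idea from this patch, assuming a
tiny made-up table. hw_table, mid_entry, MID_MAX and the writes counter are
hypothetical and not driver code. An entry created while multicast is
disabled never takes a table index, so removing it later touches nothing.

```c
#include <assert.h>
#include <stdbool.h>

#define MID_MAX 8	/* hypothetical table size, not MLXSW_SP_MID_MAX */

struct hw_table {
	bool used[MID_MAX];	/* stand-in for mids_bitmap */
	unsigned int writes;	/* counts simulated register writes */
};

struct mid_entry {
	unsigned int idx;	/* only meaningful while in_hw is true */
	bool in_hw;
};

/* Take a free index and "write" the entry; false if the table is full. */
bool mid_write(struct hw_table *hw, struct mid_entry *mid)
{
	unsigned int i;

	for (i = 0; i < MID_MAX; i++) {
		if (!hw->used[i]) {
			hw->used[i] = true;
			hw->writes++;
			mid->idx = i;
			mid->in_hw = true;
			return true;
		}
	}
	return false;
}

/* Release the index; a no-op for entries that never reached the table. */
void mid_remove(struct hw_table *hw, struct mid_entry *mid)
{
	if (!mid->in_hw)
		return;
	assert(mid->idx < MID_MAX);
	hw->used[mid->idx] = false;
	hw->writes++;
	mid->in_hw = false;
}

/* Create an entry, touching the table only when multicast is enabled. */
bool mid_create(struct hw_table *hw, struct mid_entry *mid, bool mc_enabled)
{
	mid->idx = 0;
	mid->in_hw = false;
	if (!mc_enabled)
		return true;
	return mid_write(hw, mid);
}
```

The early return in mid_remove() plays the role of the new
"if (!mid->in_hw) return 0;" check in mlxsw_sp_mc_remove_mdb_entry().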

[patch net-next 10/16] mlxsw: spectrum_switchdev: Use generic mc flood function

2017-09-20 Thread Jiri Pirko

From: Nogah Frankel 

Use the generic mc flood function to decide whether to flood mc to a port
when mc is being enabled / disabled.
Move this function up in the file to avoid a forward declaration.

Signed-off-by: Nogah Frankel 
Signed-off-by: Jiri Pirko 
---
 .../net/ethernet/mellanox/mlxsw/spectrum_switchdev.c   | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
index 79806af..19ac206 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
@@ -742,6 +742,14 @@ static int mlxsw_sp_port_attr_mrouter_set(struct mlxsw_sp_port *mlxsw_sp_port,
return 0;
 }
 
+static bool mlxsw_sp_mc_flood(const struct mlxsw_sp_bridge_port *bridge_port)
+{
+   const struct mlxsw_sp_bridge_device *bridge_device;
+
+   bridge_device = bridge_port->bridge_device;
+   return !bridge_device->multicast_enabled ? true : bridge_port->mrouter;
+}
+
 static int mlxsw_sp_port_mc_disabled_set(struct mlxsw_sp_port *mlxsw_sp_port,
 struct switchdev_trans *trans,
 struct net_device *orig_dev,
@@ -770,7 +778,7 @@ static int mlxsw_sp_port_mc_disabled_set(struct mlxsw_sp_port *mlxsw_sp_port,
 
list_for_each_entry(bridge_port, &bridge_device->ports_list, list) {
enum mlxsw_sp_flood_type packet_type = MLXSW_SP_FLOOD_TYPE_MC;
-   bool member = mc_disabled ? true : bridge_port->mrouter;
+   bool member = mlxsw_sp_mc_flood(bridge_port);
 
err = mlxsw_sp_bridge_port_flood_table_set(mlxsw_sp_port,
   bridge_port,
@@ -829,14 +837,6 @@ static int mlxsw_sp_port_attr_set(struct net_device *dev,
return err;
 }
 
-static bool mlxsw_sp_mc_flood(const struct mlxsw_sp_bridge_port *bridge_port)
-{
-   const struct mlxsw_sp_bridge_device *bridge_device;
-
-   bridge_device = bridge_port->bridge_device;
-   return !bridge_device->multicast_enabled ? true : bridge_port->mrouter;
-}
-
 static int
 mlxsw_sp_port_vlan_fid_join(struct mlxsw_sp_port_vlan *mlxsw_sp_port_vlan,
struct mlxsw_sp_bridge_port *bridge_port)
-- 
2.9.5
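
The conditional in mlxsw_sp_mc_flood() reduces to a logical OR: flood when
multicast is disabled, or when the port is an mrouter port. A small
userspace sketch of that decision follows, assuming cut-down stand-in
structs. bridge_device, bridge_port and mc_flood() here are illustrative
names that model only the two fields the helper reads.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the driver structs. */
struct bridge_device {
	bool multicast_enabled;
};

struct bridge_port {
	const struct bridge_device *bridge_device;
	bool mrouter;
};

/*
 * Same decision as "!multicast_enabled ? true : mrouter": flood multicast
 * to every port while multicast is disabled, otherwise only to mrouter
 * ports.
 */
bool mc_flood(const struct bridge_port *port)
{
	assert(port && port->bridge_device);
	return !port->bridge_device->multicast_enabled || port->mrouter;
}
```

Of the four input combinations, only "multicast enabled, not an mrouter
port" yields false.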

[patch net-next 01/16] mlxsw: spectrum_switchdev: Change mc_router to mrouter

2017-09-20 Thread Jiri Pirko

From: Nogah Frankel 

Change the naming of mc_router to mrouter to keep consistency.

Signed-off-by: Nogah Frankel 
Signed-off-by: Jiri Pirko 
---
 .../net/ethernet/mellanox/mlxsw/spectrum_switchdev.c   | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
index d39ffbf..22f8d74 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
@@ -699,10 +699,10 @@ static int mlxsw_sp_port_attr_br_vlan_set(struct mlxsw_sp_port *mlxsw_sp_port,
return -EINVAL;
 }
 
-static int mlxsw_sp_port_attr_mc_router_set(struct mlxsw_sp_port *mlxsw_sp_port,
-   struct switchdev_trans *trans,
-   struct net_device *orig_dev,
-   bool is_port_mc_router)
+static int mlxsw_sp_port_attr_mrouter_set(struct mlxsw_sp_port *mlxsw_sp_port,
+ struct switchdev_trans *trans,
+ struct net_device *orig_dev,
+ bool is_port_mrouter)
 {
struct mlxsw_sp_bridge_port *bridge_port;
int err;
@@ -720,12 +720,12 @@ static int mlxsw_sp_port_attr_mc_router_set(struct mlxsw_sp_port *mlxsw_sp_port,
 
err = mlxsw_sp_bridge_port_flood_table_set(mlxsw_sp_port, bridge_port,
   MLXSW_SP_FLOOD_TYPE_MC,
-  is_port_mc_router);
+  is_port_mrouter);
if (err)
return err;
 
 out:
-   bridge_port->mrouter = is_port_mc_router;
+   bridge_port->mrouter = is_port_mrouter;
return 0;
 }
 
@@ -793,9 +793,9 @@ static int mlxsw_sp_port_attr_set(struct net_device *dev,
 attr->u.vlan_filtering);
break;
case SWITCHDEV_ATTR_ID_PORT_MROUTER:
-   err = mlxsw_sp_port_attr_mc_router_set(mlxsw_sp_port, trans,
-  attr->orig_dev,
-  attr->u.mrouter);
+   err = mlxsw_sp_port_attr_mrouter_set(mlxsw_sp_port, trans,
+attr->orig_dev,
+attr->u.mrouter);
break;
case SWITCHDEV_ATTR_ID_BRIDGE_MC_DISABLED:
err = mlxsw_sp_port_mc_disabled_set(mlxsw_sp_port, trans,
-- 
2.9.5

[patch net-next 07/16] mlxsw: spectrum_switchdev: Break mid deletion into two function

2017-09-20 Thread Jiri Pirko

From: Nogah Frankel 

Break mid deletion into two functions, so it will be possible in the future
to delete a mid entry for reasons other than a switchdev command (like port
deletion).

Signed-off-by: Nogah Frankel 
Signed-off-by: Jiri Pirko 
---
 .../ethernet/mellanox/mlxsw/spectrum_switchdev.c   | 32 ++
 1 file changed, 20 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
index 9dd05d8..7f622de 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
@@ -1468,6 +1468,25 @@ static int mlxsw_sp_port_vlans_del(struct mlxsw_sp_port *mlxsw_sp_port,
return 0;
 }
 
+static int
+__mlxsw_sp_port_mdb_del(struct mlxsw_sp_port *mlxsw_sp_port,
+   struct mlxsw_sp_bridge_port *bridge_port,
+   struct mlxsw_sp_mid *mid)
+{
+   struct net_device *dev = mlxsw_sp_port->dev;
+   int err;
+
+   err = mlxsw_sp_port_smid_set(mlxsw_sp_port, mid->mid, false);
+   if (err)
+   netdev_err(dev, "Unable to remove port from SMID\n");
+
+   err = mlxsw_sp_port_remove_from_mid(mlxsw_sp_port, mid);
+   if (err)
+   netdev_err(dev, "Unable to remove MC SFD\n");
+
+   return err;
+}
+
 static int mlxsw_sp_port_mdb_del(struct mlxsw_sp_port *mlxsw_sp_port,
 const struct switchdev_obj_port_mdb *mdb)
 {
@@ -1479,8 +1498,6 @@ static int mlxsw_sp_port_mdb_del(struct mlxsw_sp_port *mlxsw_sp_port,
struct mlxsw_sp_bridge_port *bridge_port;
struct mlxsw_sp_mid *mid;
u16 fid_index;
-   u16 mid_idx;
-   int err = 0;
 
bridge_port = mlxsw_sp_bridge_port_find(mlxsw_sp->bridge, orig_dev);
if (!bridge_port)
@@ -1501,16 +1518,7 @@ static int mlxsw_sp_port_mdb_del(struct mlxsw_sp_port *mlxsw_sp_port,
return -EINVAL;
}
 
-   err = mlxsw_sp_port_smid_set(mlxsw_sp_port, mid->mid, false);
-   if (err)
-   netdev_err(dev, "Unable to remove port from SMID\n");
-
-   mid_idx = mid->mid;
-   err = mlxsw_sp_port_remove_from_mid(mlxsw_sp_port, mid);
-   if (err)
-   netdev_err(dev, "Unable to remove MC SFD\n");
-
-   return err;
+   return __mlxsw_sp_port_mdb_del(mlxsw_sp_port, bridge_port, mid);
 }
 
 static int mlxsw_sp_port_obj_del(struct net_device *dev,
-- 
2.9.5

[patch net-next 02/16] mlxsw: spectrum_switchdev: Add a ports bitmap to the mid db

2017-09-20 Thread Jiri Pirko

From: Nogah Frankel 

Add a bitmap of ports to the mid struct to hold the ports that are
registered to this mid.

Signed-off-by: Nogah Frankel 
Signed-off-by: Jiri Pirko 
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum.h   |  1 +
 .../net/ethernet/mellanox/mlxsw/spectrum_switchdev.c | 20 +---
 2 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
index 7180d8f..0424bee 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
@@ -95,6 +95,7 @@ struct mlxsw_sp_mid {
u16 fid;
u16 mid;
unsigned int ref_count;
+   unsigned long *ports_in_mid; /* bits array */
 };
 
 enum mlxsw_sp_span_type {
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
index 22f8d74..0fde16a 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
@@ -1239,6 +1239,7 @@ static struct mlxsw_sp_mid *__mlxsw_sp_mc_alloc(struct mlxsw_sp *mlxsw_sp,
u16 fid)
 {
struct mlxsw_sp_mid *mid;
+   size_t alloc_size;
u16 mid_idx;
 
mid_idx = find_first_zero_bit(mlxsw_sp->bridge->mids_bitmap,
@@ -1250,6 +1251,14 @@ static struct mlxsw_sp_mid *__mlxsw_sp_mc_alloc(struct mlxsw_sp *mlxsw_sp,
if (!mid)
return NULL;
 
+   alloc_size = sizeof(unsigned long) *
+BITS_TO_LONGS(mlxsw_core_max_ports(mlxsw_sp->core));
+   mid->ports_in_mid = kzalloc(alloc_size, GFP_KERNEL);
+   if (!mid->ports_in_mid) {
+   kfree(mid);
+   return NULL;
+   }
+
set_bit(mid_idx, mlxsw_sp->bridge->mids_bitmap);
ether_addr_copy(mid->addr, addr);
mid->fid = fid;
@@ -1260,12 +1269,16 @@ static struct mlxsw_sp_mid *__mlxsw_sp_mc_alloc(struct mlxsw_sp *mlxsw_sp,
return mid;
 }
 
-static int __mlxsw_sp_mc_dec_ref(struct mlxsw_sp *mlxsw_sp,
+static int __mlxsw_sp_mc_dec_ref(struct mlxsw_sp_port *mlxsw_sp_port,
 struct mlxsw_sp_mid *mid)
 {
+   struct mlxsw_sp *mlxsw_sp = mlxsw_sp_port->mlxsw_sp;
+
+   clear_bit(mlxsw_sp_port->local_port, mid->ports_in_mid);
if (--mid->ref_count == 0) {
list_del(&mid->list);
clear_bit(mid->mid, mlxsw_sp->bridge->mids_bitmap);
+   kfree(mid->ports_in_mid);
kfree(mid);
return 1;
}
@@ -1311,6 +1324,7 @@ static int mlxsw_sp_port_mdb_add(struct mlxsw_sp_port *mlxsw_sp_port,
}
}
mid->ref_count++;
+   set_bit(mlxsw_sp_port->local_port, mid->ports_in_mid);
 
err = mlxsw_sp_port_smid_set(mlxsw_sp_port, mid->mid, true,
 mid->ref_count == 1);
@@ -1331,7 +1345,7 @@ static int mlxsw_sp_port_mdb_add(struct mlxsw_sp_port *mlxsw_sp_port,
return 0;
 
 err_out:
-   __mlxsw_sp_mc_dec_ref(mlxsw_sp, mid);
+   __mlxsw_sp_mc_dec_ref(mlxsw_sp_port, mid);
return err;
 }
 
@@ -1437,7 +1451,7 @@ static int mlxsw_sp_port_mdb_del(struct mlxsw_sp_port *mlxsw_sp_port,
netdev_err(dev, "Unable to remove port from SMID\n");
 
mid_idx = mid->mid;
-   if (__mlxsw_sp_mc_dec_ref(mlxsw_sp, mid)) {
+   if (__mlxsw_sp_mc_dec_ref(mlxsw_sp_port, mid)) {
err = mlxsw_sp_port_mdb_op(mlxsw_sp, mdb->addr, fid_index,
   mid_idx, false);
if (err)
-- 
2.9.5

[patch net-next 05/16] mlxsw: spectrum_switchdev: Break smid write function

2017-09-20 Thread Jiri Pirko

From: Nogah Frankel 

Break the smid write function into two, one that cleans the ports that
might still be written there and one that changes an existing mid entry.

Signed-off-by: Nogah Frankel 
Signed-off-by: Jiri Pirko 
---
 .../ethernet/mellanox/mlxsw/spectrum_switchdev.c   | 42 +++---
 1 file changed, 30 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
index 2ba8a44..09ead97 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
@@ -1190,7 +1190,7 @@ mlxsw_sp_port_fdb_set(struct mlxsw_sp_port *mlxsw_sp_port,
 }
 
 static int mlxsw_sp_port_mdb_op(struct mlxsw_sp *mlxsw_sp, const char *addr,
-   u16 fid, u16 mid, bool adding)
+   u16 fid, u16 mid_idx, bool adding)
 {
char *sfd_pl;
int err;
@@ -1201,16 +1201,16 @@ static int mlxsw_sp_port_mdb_op(struct mlxsw_sp *mlxsw_sp, const char *addr,
 
mlxsw_reg_sfd_pack(sfd_pl, mlxsw_sp_sfd_op(adding), 0);
mlxsw_reg_sfd_mc_pack(sfd_pl, 0, addr, fid,
- MLXSW_REG_SFD_REC_ACTION_NOP, mid);
+ MLXSW_REG_SFD_REC_ACTION_NOP, mid_idx);
err = mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(sfd), sfd_pl);
kfree(sfd_pl);
return err;
 }
 
-static int mlxsw_sp_port_smid_set(struct mlxsw_sp_port *mlxsw_sp_port, u16 mid,
- bool add, bool clear_all_ports)
+/* clean an entry from the HW and write a full new entry there */
+static int mlxsw_sp_port_smid_full_entry(struct mlxsw_sp *mlxsw_sp,
+u16 mid_idx)
 {
-   struct mlxsw_sp *mlxsw_sp = mlxsw_sp_port->mlxsw_sp;
char *smid_pl;
int err, i;
 
@@ -1218,12 +1218,29 @@ static int mlxsw_sp_port_smid_set(struct mlxsw_sp_port *mlxsw_sp_port, u16 mid,
if (!smid_pl)
return -ENOMEM;
 
-   mlxsw_reg_smid_pack(smid_pl, mid, mlxsw_sp_port->local_port, add);
-   if (clear_all_ports) {
-   for (i = 1; i < mlxsw_core_max_ports(mlxsw_sp->core); i++)
-   if (mlxsw_sp->ports[i])
-   mlxsw_reg_smid_port_mask_set(smid_pl, i, 1);
+   mlxsw_reg_smid_pack(smid_pl, mid_idx, 0, false);
+   for (i = 1; i < mlxsw_core_max_ports(mlxsw_sp->core); i++) {
+   if (mlxsw_sp->ports[i])
+   mlxsw_reg_smid_port_mask_set(smid_pl, i, 1);
}
+
+   err = mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(smid), smid_pl);
+   kfree(smid_pl);
+   return err;
+}
+
+static int mlxsw_sp_port_smid_set(struct mlxsw_sp_port *mlxsw_sp_port,
+ u16 mid_idx, bool add)
+{
+   struct mlxsw_sp *mlxsw_sp = mlxsw_sp_port->mlxsw_sp;
+   char *smid_pl;
+   int err;
+
+   smid_pl = kmalloc(MLXSW_REG_SMID_LEN, GFP_KERNEL);
+   if (!smid_pl)
+   return -ENOMEM;
+
+   mlxsw_reg_smid_pack(smid_pl, mid_idx, mlxsw_sp_port->local_port, add);
err = mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(smid), smid_pl);
kfree(smid_pl);
return err;
@@ -1336,10 +1353,11 @@ static int mlxsw_sp_port_mdb_add(struct mlxsw_sp_port *mlxsw_sp_port,
return -ENOMEM;
}
is_new_mid = true;
+   mlxsw_sp_port_smid_full_entry(mlxsw_sp, mid->mid);
}
set_bit(mlxsw_sp_port->local_port, mid->ports_in_mid);
 
-   err = mlxsw_sp_port_smid_set(mlxsw_sp_port, mid->mid, true, is_new_mid);
+   err = mlxsw_sp_port_smid_set(mlxsw_sp_port, mid->mid, true);
if (err) {
netdev_err(dev, "Unable to set SMID\n");
goto err_out;
@@ -1458,7 +1476,7 @@ static int mlxsw_sp_port_mdb_del(struct mlxsw_sp_port *mlxsw_sp_port,
return -EINVAL;
}
 
-   err = mlxsw_sp_port_smid_set(mlxsw_sp_port, mid->mid, false, false);
+   err = mlxsw_sp_port_smid_set(mlxsw_sp_port, mid->mid, false);
if (err)
netdev_err(dev, "Unable to remove port from SMID\n");
 
-- 
2.9.5

[patch net-next 06/16] mlxsw: spectrum_switchdev: Attach mid id allocation to HW write

2017-09-20 Thread Jiri Pirko

From: Nogah Frankel 

Attach getting and releasing the mid id to the HW write / remove, and add
a flag to indicate whether the mid is in the HW. This is done because the
mid id is also the HW index of this mid.
This change allows the following patches to add the ability to have a
mid in the mdb cache but not in the HW. It will be useful for being able
to disable multicast.
It means that the mdb is written to / deleted from the HW in the mid
allocation / removal functions, not after them.

Signed-off-by: Nogah Frankel 
Signed-off-by: Jiri Pirko 
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum.h |  1 +
 .../ethernet/mellanox/mlxsw/spectrum_switchdev.c   | 88 ++
 2 files changed, 56 insertions(+), 33 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
index 6fd0afe..e907ec4 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
@@ -94,6 +94,7 @@ struct mlxsw_sp_mid {
unsigned char addr[ETH_ALEN];
u16 fid;
u16 mid;
+   bool in_hw;
unsigned long *ports_in_mid; /* bits array */
 };
 
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
index 09ead97..9dd05d8 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
@@ -1260,6 +1260,42 @@ mlxsw_sp_mid *__mlxsw_sp_mc_get(struct mlxsw_sp_bridge_device *bridge_device,
return NULL;
 }
 
+static bool
+mlxsw_sp_mc_write_mdb_entry(struct mlxsw_sp *mlxsw_sp,
+   struct mlxsw_sp_mid *mid)
+{
+   u16 mid_idx;
+   int err;
+
+   mid_idx = find_first_zero_bit(mlxsw_sp->bridge->mids_bitmap,
+ MLXSW_SP_MID_MAX);
+   if (mid_idx == MLXSW_SP_MID_MAX)
+   return false;
+
+   mid->mid = mid_idx;
+   err = mlxsw_sp_port_smid_full_entry(mlxsw_sp, mid_idx);
+   if (err)
+   return false;
+
+   err = mlxsw_sp_port_mdb_op(mlxsw_sp, mid->addr, mid->fid, mid_idx,
+  true);
+   if (err)
+   return false;
+
+   set_bit(mid_idx, mlxsw_sp->bridge->mids_bitmap);
+   mid->in_hw = true;
+   return true;
+}
+
+static int mlxsw_sp_mc_remove_mdb_entry(struct mlxsw_sp *mlxsw_sp,
+   struct mlxsw_sp_mid *mid)
+{
+   clear_bit(mid->mid, mlxsw_sp->bridge->mids_bitmap);
+   mid->in_hw = false;
+   return mlxsw_sp_port_mdb_op(mlxsw_sp, mid->addr, mid->fid, mid->mid,
+   false);
+}
+
 static struct
 mlxsw_sp_mid *__mlxsw_sp_mc_alloc(struct mlxsw_sp *mlxsw_sp,
  struct mlxsw_sp_bridge_device *bridge_device,
@@ -1268,12 +1304,6 @@ mlxsw_sp_mid *__mlxsw_sp_mc_alloc(struct mlxsw_sp *mlxsw_sp,
 {
struct mlxsw_sp_mid *mid;
size_t alloc_size;
-   u16 mid_idx;
-
-   mid_idx = find_first_zero_bit(mlxsw_sp->bridge->mids_bitmap,
- MLXSW_SP_MID_MAX);
-   if (mid_idx == MLXSW_SP_MID_MAX)
-   return NULL;
 
mid = kzalloc(sizeof(*mid), GFP_KERNEL);
if (!mid)
@@ -1281,36 +1311,43 @@ mlxsw_sp_mid *__mlxsw_sp_mc_alloc(struct mlxsw_sp *mlxsw_sp,
 
alloc_size = sizeof(unsigned long) *
 BITS_TO_LONGS(mlxsw_core_max_ports(mlxsw_sp->core));
+
mid->ports_in_mid = kzalloc(alloc_size, GFP_KERNEL);
-   if (!mid->ports_in_mid) {
-   kfree(mid);
-   return NULL;
-   }
+   if (!mid->ports_in_mid)
+   goto err_ports_in_mid_alloc;
 
-   set_bit(mid_idx, mlxsw_sp->bridge->mids_bitmap);
ether_addr_copy(mid->addr, addr);
mid->fid = fid;
-   mid->mid = mid_idx;
+   mid->in_hw = false;
+   if (!mlxsw_sp_mc_write_mdb_entry(mlxsw_sp, mid))
+   goto err_write_mdb_entry;
+
list_add_tail(&mid->list, &bridge_device->mids_list);
 
return mid;
+
+err_write_mdb_entry:
+   kfree(mid->ports_in_mid);
+err_ports_in_mid_alloc:
+   kfree(mid);
+   return NULL;
 }
 
 static int mlxsw_sp_port_remove_from_mid(struct mlxsw_sp_port *mlxsw_sp_port,
 struct mlxsw_sp_mid *mid)
 {
struct mlxsw_sp *mlxsw_sp = mlxsw_sp_port->mlxsw_sp;
+   int err = 0;
 
clear_bit(mlxsw_sp_port->local_port, mid->ports_in_mid);
if (bitmap_empty(mid->ports_in_mid,
 mlxsw_core_max_ports(mlxsw_sp->core))) {
+   err = mlxsw_sp_mc_remove_mdb_entry(mlxsw_sp, mid);
list_del(&mid->list);
-   clear_bit(mid->mid, mlxsw_sp->bridge->mids_bitmap);
kfree(mid->ports_in_mid);
kfree(mid);
-   return 1;
}
-   return 0;
+   return err;
 }
 
 st

[patch net-next 03/16] mlxsw: spectrum_switchdev: Remove reference count from mid

2017-09-20 Thread Jiri Pirko

From: Nogah Frankel 

Since there is a bitmap for the ports registered to each mid, there is no
need for a ref count, since it will always be the number of set bits in
this bitmap. Any check of the ref count was replaced with checking if the
bitmap is empty.

Signed-off-by: Nogah Frankel 
Signed-off-by: Jiri Pirko 
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum.h   |  1 -
 .../net/ethernet/mellanox/mlxsw/spectrum_switchdev.c | 20 ++--
 2 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
index 0424bee..6fd0afe 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
@@ -94,7 +94,6 @@ struct mlxsw_sp_mid {
unsigned char addr[ETH_ALEN];
u16 fid;
u16 mid;
-   unsigned int ref_count;
unsigned long *ports_in_mid; /* bits array */
 };
 
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
index 0fde16a..cb2275ed 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
@@ -1263,19 +1263,19 @@ static struct mlxsw_sp_mid *__mlxsw_sp_mc_alloc(struct mlxsw_sp *mlxsw_sp,
ether_addr_copy(mid->addr, addr);
mid->fid = fid;
mid->mid = mid_idx;
-   mid->ref_count = 0;
list_add_tail(&mid->list, &mlxsw_sp->bridge->mids_list);
 
return mid;
 }
 
-static int __mlxsw_sp_mc_dec_ref(struct mlxsw_sp_port *mlxsw_sp_port,
-struct mlxsw_sp_mid *mid)
+static int mlxsw_sp_port_remove_from_mid(struct mlxsw_sp_port *mlxsw_sp_port,
+struct mlxsw_sp_mid *mid)
 {
struct mlxsw_sp *mlxsw_sp = mlxsw_sp_port->mlxsw_sp;
 
clear_bit(mlxsw_sp_port->local_port, mid->ports_in_mid);
-   if (--mid->ref_count == 0) {
+   if (bitmap_empty(mid->ports_in_mid,
+mlxsw_core_max_ports(mlxsw_sp->core))) {
list_del(&mid->list);
clear_bit(mid->mid, mlxsw_sp->bridge->mids_bitmap);
kfree(mid->ports_in_mid);
@@ -1296,6 +1296,7 @@ static int mlxsw_sp_port_mdb_add(struct mlxsw_sp_port *mlxsw_sp_port,
struct mlxsw_sp_bridge_device *bridge_device;
struct mlxsw_sp_bridge_port *bridge_port;
struct mlxsw_sp_mid *mid;
+   bool is_new_mid = false;
u16 fid_index;
int err = 0;
 
@@ -1322,18 +1323,17 @@ static int mlxsw_sp_port_mdb_add(struct mlxsw_sp_port *mlxsw_sp_port,
netdev_err(dev, "Unable to allocate MC group\n");
return -ENOMEM;
}
+   is_new_mid = true;
}
-   mid->ref_count++;
set_bit(mlxsw_sp_port->local_port, mid->ports_in_mid);
 
-   err = mlxsw_sp_port_smid_set(mlxsw_sp_port, mid->mid, true,
-mid->ref_count == 1);
+   err = mlxsw_sp_port_smid_set(mlxsw_sp_port, mid->mid, true, is_new_mid);
if (err) {
netdev_err(dev, "Unable to set SMID\n");
goto err_out;
}
 
-   if (mid->ref_count == 1) {
+   if (is_new_mid) {
err = mlxsw_sp_port_mdb_op(mlxsw_sp, mdb->addr, fid_index,
   mid->mid, true);
if (err) {
@@ -1345,7 +1345,7 @@ static int mlxsw_sp_port_mdb_add(struct mlxsw_sp_port *mlxsw_sp_port,
return 0;
 
 err_out:
-   __mlxsw_sp_mc_dec_ref(mlxsw_sp_port, mid);
+   mlxsw_sp_port_remove_from_mid(mlxsw_sp_port, mid);
return err;
 }
 
@@ -1451,7 +1451,7 @@ static int mlxsw_sp_port_mdb_del(struct mlxsw_sp_port *mlxsw_sp_port,
netdev_err(dev, "Unable to remove port from SMID\n");
 
mid_idx = mid->mid;
-   if (__mlxsw_sp_mc_dec_ref(mlxsw_sp_port, mid)) {
+   if (mlxsw_sp_port_remove_from_mid(mlxsw_sp_port, mid)) {
err = mlxsw_sp_port_mdb_op(mlxsw_sp, mdb->addr, fid_index,
   mid_idx, false);
if (err)
-- 
2.9.5
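
The commit message above argues that a per-mid port bitmap makes a separate
reference count redundant. A minimal userspace sketch of that argument
follows, assuming a hand-rolled bitmap instead of the kernel bitmap helpers.
mid_model and its functions are hypothetical names, and BITS_PER_LONG /
BITS_TO_LONGS are redefined locally for the example.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define BITS_TO_LONGS(n) (((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

/* Hypothetical model of a mid: one bit per local port, no ref count. */
struct mid_model {
	unsigned long *ports;
	size_t max_ports;
};

/* Allocate a zeroed bitmap; returns 0 on success, -1 on failure. */
int mid_model_init(struct mid_model *mid, size_t max_ports)
{
	mid->ports = calloc(BITS_TO_LONGS(max_ports), sizeof(unsigned long));
	if (!mid->ports)
		return -1;
	mid->max_ports = max_ports;
	return 0;
}

void mid_model_add_port(struct mid_model *mid, size_t port)
{
	assert(port < mid->max_ports);
	mid->ports[port / BITS_PER_LONG] |= 1UL << (port % BITS_PER_LONG);
}

void mid_model_del_port(struct mid_model *mid, size_t port)
{
	assert(port < mid->max_ports);
	mid->ports[port / BITS_PER_LONG] &= ~(1UL << (port % BITS_PER_LONG));
}

/* What a ref count would have tracked: the number of member ports. */
size_t mid_model_port_count(const struct mid_model *mid)
{
	size_t i, count = 0;

	for (i = 0; i < mid->max_ports; i++)
		if (mid->ports[i / BITS_PER_LONG] &
		    (1UL << (i % BITS_PER_LONG)))
			count++;
	return count;
}

/* Replacement for "ref_count == 0": no bit set in the bitmap. */
bool mid_model_unused(const struct mid_model *mid)
{
	size_t i;

	for (i = 0; i < BITS_TO_LONGS(mid->max_ports); i++)
		if (mid->ports[i])
			return false;
	return true;
}

void mid_model_free(struct mid_model *mid)
{
	free(mid->ports);
	mid->ports = NULL;
}
```

In this sketch, adding the same port twice still counts once, because the
bitmap records membership rather than the number of add calls.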

Re: [RFC PATCH] net: Introduce a socket option to enable picking tx queue based on rx queue.

2017-09-20 Thread Tom Herbert

On Tue, Sep 19, 2017 at 10:13 PM, Eric Dumazet  wrote:
> On Tue, 2017-09-19 at 21:59 -0700, Samudrala, Sridhar wrote:
>> On 9/19/2017 5:48 PM, Tom Herbert wrote:
>> > On Tue, Sep 19, 2017 at 5:34 PM, Samudrala, Sridhar
>> >  wrote:
>> > > On 9/12/2017 3:53 PM, Tom Herbert wrote:
>> > > > On Tue, Sep 12, 2017 at 3:31 PM, Samudrala, Sridhar
>> > > >  wrote:
>> > > > >
>> > > > > On 9/12/2017 8:47 AM, Eric Dumazet wrote:
>> > > > > > On Mon, 2017-09-11 at 23:27 -0700, Samudrala, Sridhar wrote:
>> > > > > > > On 9/11/2017 8:53 PM, Eric Dumazet wrote:
>> > > > > > > > On Mon, 2017-09-11 at 20:12 -0700, Tom Herbert wrote:
>> > > > > > > >
>> > > > > > > > > Two ints in sock_common for this purpose is quite expensive and the
>> > > > > > > > > use case for this is limited-- even if a RX->TX queue mapping were
>> > > > > > > > > introduced to eliminate the queue pair assumption this still won't
>> > > > > > > > > help if the receive and transmit interfaces are different for the
>> > > > > > > > > connection. I think we really need to see some very compelling results
>> > > > > > > > > to be able to justify this.
>> > > > > > > Will try to collect and post some perf data with symmetric queue
>> > > > > > > configuration.
>> > >
>> > > Here is some performance data i collected with memcached workload over
>> > > ixgbe 10Gb NIC with mcblaster benchmark.
>> > > ixgbe is configured with 16 queues and rx-usecs is set to 1000 for a very low
>> > > interrupt rate.
>> > >   ethtool -L p1p1 combined 16
>> > >   ethtool -C p1p1 rx-usecs 1000
>> > > and busy poll is set to 1000usecs
>> > >   sysctl net.core.busy_poll = 1000
>> > >
>> > > 16 threads  800K requests/sec
>> > > =============================
>> > >                    rtt(min/avg/max)usecs  intr/sec  contextswitch/sec
>> > > ------------------------------------------------------------------
>> > > Default            2/182/10641            23391     61163
>> > > Symmetric Queues   2/50/6311              20457     32843
>> > >
>> > > 32 threads  800K requests/sec
>> > > =============================
>> > >                    rtt(min/avg/max)usecs  intr/sec  contextswitch/sec
>> > > ------------------------------------------------------------------
>> > > Default            2/162/6390             32168     69450
>> > > Symmetric Queues   2/50/3853              35044     35847
>> > >
>> > No idea what "Default" configuration is. Please report how xps_cpus is
>> > being set, how many RSS queues there are, and what the mapping is
>> > between RSS queues and CPUs and shared caches. Also, whether and
>> > threads are pinned.
>> Default is linux 4.13 with the settings i listed above.
>> ethtool -L p1p1 combined 16
>> ethtool -C p1p1 rx-usecs 1000
>> sysctl net.core.busy_poll = 1000
>>
>> # ethtool -x p1p1
>> RX flow hash indirection table for p1p1 with 16 RX ring(s):
>>     0:   0  1  2  3  4  5  6  7
>>     8:   8  9 10 11 12 13 14 15
>>    16:   0  1  2  3  4  5  6  7
>>    24:   8  9 10 11 12 13 14 15
>>    32:   0  1  2  3  4  5  6  7
>>    40:   8  9 10 11 12 13 14 15
>>    48:   0  1  2  3  4  5  6  7
>>    56:   8  9 10 11 12 13 14 15
>>    64:   0  1  2  3  4  5  6  7
>>    72:   8  9 10 11 12 13 14 15
>>    80:   0  1  2  3  4  5  6  7
>>    88:   8  9 10 11 12 13 14 15
>>    96:   0  1  2  3  4  5  6  7
>>   104:   8  9 10 11 12 13 14 15
>>   112:   0  1  2  3  4  5  6  7
>>   120:   8  9 10 11 12 13 14 15
>>
>> smp_affinity for the 16 queuepairs
>> 141 p1p1-TxRx-0 ,0001
>> 142 p1p1-TxRx-1 ,0002
>> 143 p1p1-TxRx-2 ,0004
>> 144 p1p1-TxRx-3 ,0008
>> 145 p1p1-TxRx-4 ,0010
>> 146 p1p1-TxRx-5 ,0020
>> 147 p1p1-TxRx-6 ,0040
>> 148 p1p1-TxRx-7 ,0080
>> 149 p1p1-TxRx-8 ,0100
>> 150 p1p1-TxRx-9 ,0200
>> 151 p1p1-TxRx-10 ,0400
>> 152 p1p1-TxRx-11 ,0800
>> 153 p1p1-TxRx-12 ,1000
>> 154 p1p1-TxRx-13 ,2000
>> 155 p1p1-TxRx-14 ,4000
>> 156 p1p1-TxRx-15 ,8000
>> xps_cpus for the 16 Tx queues
>> ,0001
>> ,0002
>> ,0004
>> ,0008
>> ,0010
>> ,0020
>> ,0040
>> ,0080
>> ,0100
>> ,0200
>> ,0400
>> ,0

Re: [PATCH net-next] net: dsa: Utilize dsa_slave_dev_check()

2017-09-20 Thread Vivien Didelot

Hi Florian,

Florian Fainelli  writes:

> Instead of open coding the check.
>
> Signed-off-by: Florian Fainelli 

If we do need to use it outside one day, we may think about renaming
netdev_uses_dsa() to netdev_is_dsa_master() and renaming
dsa_slave_dev_check() to netdev_is_dsa_slave().

In the meantime, looks good!

Reviewed-by: Vivien Didelot

Re: [PATCH net-next 08/10] net/smc: introduce a delay

2017-09-20 Thread Ursula Braun

On 09/20/2017 04:03 PM, Leon Romanovsky wrote:
> On Wed, Sep 20, 2017 at 01:58:11PM +0200, Ursula Braun wrote:
>> The number of outstanding work requests is limited. If all work
>> requests are in use, tx processing is postponed to another scheduling
>> of the tx worker. Switch to a delayed worker to have a gap for tx
>> completion queue events before the next retry.
>>
> 
> How will delay prevent and protect the resource exhausting?
> 
> Thanks
> 

SMC runs with a fixed number of in-flight work requests per QP (constant
SMC_WR_BUF_CNT) to prevent resource exhaustion. If all work requests are
currently in use, sending another work request has to wait until some
outstanding work request is confirmed via the send completion queue. If sending
is done in a context which is not allowed to wait, the tx_worker is
scheduled instead.
With this patch a small delay is added to avoid too many unsuccessful send
retries while the "all work requests in use" condition persists.
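
To illustrate the effect of the delay, here is a minimal userspace sketch,
assuming a discrete tick counter in place of the real workqueue. tx_model,
WR_SLOTS and the three functions are made-up names, and the pool size of 4
is arbitrary. It models the behaviour described above and is not the SMC
code.

```c
#include <assert.h>
#include <stdbool.h>

#define WR_SLOTS 4	/* hypothetical pool size, not SMC_WR_BUF_CNT */

struct tx_model {
	unsigned int in_flight;	/* posted but not yet completed */
	unsigned int attempts;	/* send attempts, successful or not */
	unsigned int retry_at;	/* tick of the scheduled retry */
	bool retry_pending;
};

/*
 * Try to post one work request at tick @now. When the pool is full,
 * schedule a retry @delay ticks later instead of retrying at once.
 */
bool tx_try_send(struct tx_model *tx, unsigned int now, unsigned int delay)
{
	assert(tx);
	tx->attempts++;
	if (tx->in_flight < WR_SLOTS) {
		tx->in_flight++;
		tx->retry_pending = false;
		return true;
	}
	tx->retry_pending = true;
	tx->retry_at = now + delay;
	return false;
}

/* A send completion frees one slot. */
void tx_complete(struct tx_model *tx)
{
	assert(tx && tx->in_flight > 0);
	tx->in_flight--;
}

/* The delayed worker: retries only once the scheduled tick is reached. */
bool tx_worker(struct tx_model *tx, unsigned int now, unsigned int delay)
{
	assert(tx);
	if (!tx->retry_pending || now < tx->retry_at)
		return false;
	return tx_try_send(tx, now, delay);
}
```

With a delay of 0 the worker would retry on every tick while the pool stays
full. A non-zero delay leaves a gap in which completions can free a slot.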

Re: Latest net-next from GIT panic

2017-09-20 Thread Eric Dumazet

On Wed, 2017-09-20 at 16:03 +0200, Paweł Staszewski wrote:
> Not much more after adding this patch
> 
> https://bugzilla.kernel.org/attachment.cgi?id=258529
> 

This is why I suggested replacing the BUG() in another mail.

So:

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index f535779d9dc1dfe36934c2abba4e43d053ac5d6f..220cd12456754876edf2d3ef13195e82d70d5c74 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3331,7 +3331,15 @@ void netdev_run_todo(void);
  */
 static inline void dev_put(struct net_device *dev)
 {
-   this_cpu_dec(*dev->pcpu_refcnt);
+   int __percpu *pref = READ_ONCE(dev->pcpu_refcnt);
+
+   if (!pref) {
+   pr_err("no pcpu_refcnt on dev %p(%s) state %d dismantle %d\n",
+  dev, dev->name, dev->reg_state, dev->dismantle);
+   for (;;)
+   cpu_relax();
+   }
+   this_cpu_dec(*pref);
 }
 
 /**

vhost_net: VM looses network when using vhost over time

2017-09-20 Thread Bernd Naumann

Hi all,

We are experiencing a bug which is more or less reproducible, but we do not
know how to trigger it reliably or how to debug the issue in the first place.


# Background

In our setup we have a Ganeti cluster (KVM) with currently ~60 nodes running
~500 VMs. We are using tap interfaces on L2 bridges, L3 routed tap interfaces,
and tap interfaces on a bridge with a VTEP attached to it. (For the VXLAN setup
we have a home-grown daemon to maintain the FDB.)


# The issue

On some VMs we lose network connectivity under unknown circumstances.
"Losing" means that the VM is not reachable and therefore cannot reach any
other host in the network.

However, with `tcpdump` on the host (physical NIC + bridge) we can see the
traffic going in; but with `tcpdump` on the VM we only see ARP coming in, and
nothing going out. Manually setting the ARP entry, or toggling ARP with
`ip link set $DEV arp off; ip link set $DEV arp on`, does not help at all, or
only for a moment. The only way we found to "fix" it is rebooting the VM, or
running `modprobe -r virtio_net; modprobe virtio_net`, but this also does not
seem to be the best workaround and can fail again within a short time. It is
also difficult to determine when the issue kicks in. Counting 'FAILED'
neighbors is an indicator, but nothing to rely on.
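
As a rough aid for the 'FAILED' neighbor count mentioned above, here is a
minimal sketch that counts such entries in neighbor-table text. It assumes
input shaped like the output of `ip neigh show`, with the state as the last
whitespace-separated field on each line. neigh_line_failed() and
count_failed() are made-up helper names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* True if the last whitespace-separated token on @line is "FAILED". */
bool neigh_line_failed(const char *line)
{
	size_t end, start;

	assert(line);
	end = strlen(line);
	/* Skip trailing whitespace, including the newline. */
	while (end > 0 && strchr(" \t\r\n", line[end - 1]))
		end--;
	/* Walk back to the start of the last token. */
	start = end;
	while (start > 0 && !strchr(" \t", line[start - 1]))
		start--;
	return end - start == 6 && memcmp(line + start, "FAILED", 6) == 0;
}

/* Count FAILED entries on a stream; lines over 511 bytes are split. */
size_t count_failed(FILE *in)
{
	char line[512];
	size_t failed = 0;

	assert(in);
	while (fgets(line, sizeof(line), in))
		if (neigh_line_failed(line))
			failed++;
	return failed;
}
```

Wrapped in a small main() that prints count_failed(stdin), this could be fed
from `ip neigh show` inside the guest at intervals to watch the trend.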

The frequency of the issue ranges from once in a few days to multiple times
per day, or even a few minutes after boot. We see the most impact on VMs with
higher network traffic, like our gateway VMs (multiple NICs in different
networks, IPsec, iptables, ...) and ha-proxy VMs (similar to our gateways), but
also (with reduced frequency) on /normal/ application VMs.

From what we have found so far, it looks similar to:
* https://bugs.launchpad.net/ubuntu/+source/qemu-kvm/+bug/997978 -- Bug #997978 
“KVM images lose connectivity with bridged network” : Bugs : qemu-kvm package : 
Ubuntu
* https://bugs.centos.org/view.php?id=5526 -- 0005526: KVM Guest with virtio 
network loses network connectivity - CentOS Bug Tracker

Via `rtmon` we can observe that it starts with some "FAILED" neighbor entries
and that they increase over time. Since this is only a consequence of ARP
replies not being sent to the requester, or of ARP requests going unanswered
(because the packet is not leaving the VM), the increasing count of 'FAILED'
neighbors is /normal/. BUT: this can start on any interface: bridged tap
interface for WAN, bridged tap in VXLAN, routed tap. It does not matter, and it
is not directly linked to the "kind" of interface.


# General overview of the setup

* Ganeti cluster with ~60 nodes
* each node has 2 x 50G (mlnx5 dual-port) connected to 2 x MLNX SN2700 switches
* each node runs `bird` with OSPF and ECMP (and OSPF with ECMP on SN2700 too)
* each VM has one or more vNICs in a bridged or routed network
* networks: bridged tap in WAN; bridged tap with attached VTEP; routed tap
* host OS: Ubuntu 16.04.3 with Ubuntu Kernel 4.12.13; first tested with 
qemu-kvm 1:2.5+dfsg-5ubuntu10.15, and later upgraded to qemu-kvm 
2.10~rc3+dfsg-0ubuntu1, same issue; guest OS Ubuntu 14.04, Ubuntu 16.04 and 
Ubuntu 16.04 with latest Ubuntu mainline kernel PPA


# So far we can "verify" it is 'vhost'

Without "vhost=on" for the kvm process we cannot observe this issue. While 
using "vhost=on", an affected VM can be "fixed" by `rmmod` and `insmod 
virtio_net`, but a reboot seems to provide a "fix" for a "longer" period. (But 
as you may know, virtio does not have the performance we expect.)


So we have some questions:

* How can we debug the main issue to provide a meaningful bug report? We could 
enable debug flags on the kernel, but where would we attach gdb? Sadly we are 
no kernel hackers :/, but we can compile our own kernel and qemu-kvm to test 
release candidates and/or put patches in place.
* Has anyone seen this too, and can provide a better workaround, a patch, or 
anything else?
* Where should we file/reopen this issue? qemu, netdev?
* Is qemu-kvm even the right place to look for answers?

We are happy to provide more information or collect debug information if 
someone wants to investigate.

Thanks for your time!
Best,
Bernd Naumann


Re: Latest net-next from GIT panic

2017-09-20 Thread Paweł Staszewski


On 2017-09-20 at 16:40, Eric Dumazet wrote:

On Wed, 2017-09-20 at 16:03 +0200, Paweł Staszewski wrote:

Not much more after adding this patch

https://bugzilla.kernel.org/attachment.cgi?id=258529


This is why I suggested to replace the BUG() in another mail

So :

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index f535779d9dc1dfe36934c2abba4e43d053ac5d6f..220cd12456754876edf2d3ef13195e82d70d5c74 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3331,7 +3331,15 @@ void netdev_run_todo(void);
   */
  static inline void dev_put(struct net_device *dev)
  {
-   this_cpu_dec(*dev->pcpu_refcnt);
+   int __percpu *pref = READ_ONCE(dev->pcpu_refcnt);
+
+   if (!pref) {
+   pr_err("no pcpu_refcnt on dev %p(%s) state %d dismantle %d\n",
+  dev, dev->name, dev->reg_state, dev->dismantle);
+   for (;;)
+   cpu_relax();
+   }
+   this_cpu_dec(*pref);
  }
  
  /**






Full panic

https://bugzilla.kernel.org/attachment.cgi?id=258531


I will change the patch and apply it, but later today, because right now I 
can't use the backup router as a testlab: it's Internet rush hour, and if 
something happens it would be bad for the second router to be running a buggy 
kernel :)

RFC iproute2 doc files

2017-09-20 Thread Stephen Hemminger

I noticed that the iproute man pages are up to date but the LaTeX documentation
is very out of date. It has rarely been updated since the Linux 2.2 days.

Either someone needs to do a massive editing job on them, or they should just
be dropped. My preference would be to just drop everything in the doc/
directory. The current versions are so old, they can't be helping.

Re: [PATCH net-next 5/5] tls: Add generic NIC offload infrastructure.

2017-09-20 Thread Hannes Frederic Sowa

Hello,

Boris Pismenny  writes:

> Hello,
>
> Hannes Frederic Sowa  writes:
>> Hello,
>> 
>> Ilya Lesokhin  writes:
>> 
>> > Hannes Frederic Sowa  writes:
>> >
>> >> The user should be aware that they can't migrate the socket to
>> >> another interface if they got hw offloaded. This is not the case for
>> >> software offload.
>> >> Thus I think the user has to opt in and it shouldn't be a heuristic
>> >> until we can switch back to sw offload path.
>> >>
>> >> Maybe change flowi_oif to sk_bound_dev_if and somehow lock it against
>> >> further changes if hw tls is in use?
>> >>
>> >
>> > I'm not sure I follow.
>> > We do set sk->sk_bound_dev_if to prevent further changes.
>> >
>> > Do you recommend we enable TLS offload only if SO_BINDTODEVICE
>> > was previously used on that socket?
>> > and prevent even users with CAP_NET_RAW from unbinding it?
>> >
>> > I would rather avoid requiring CAP_NET_RAW to use TLS offload.
>> > But admittedly I'm not sure setting sk->sk_bound_dev_if without
>> > CAP_NET_RAW like we do is legit either.
>> >
>> > Finally, the reason we made HW offload the default is that the user
>> > can use sudo ethtool -K enp0s4 tls-hw-tx-offload off to opt out of HW
>> > offload and we currently don't have anything equivalent for opting out of
>> > SW KTLS.
>> 
>> IMHO the decision if a TCP flow should be bound to hw and thus never
>> push traffic to another interface should be a decision the administrator and
>> the application opt in to. You might have your management application
>> which is accessible over multiple interfaces and your production application
>> which might want to use hw offloaded tls. Thus I don't think only a single
>> ethtool knob will do it.
>
> IMO the configuration knob should be at the kTLS level and not at the
> HW vs. SW level. The management application shouldn't be using kTLS.
> I'd like to view TLS offload similarly to LSO. The default is opt-in if
> possible, and the Kernel decides that based on device capabilities.
>
>> 
>> I agree that SO_BINDTODEVICE is bad for this use case. First, the
>> CAP_NET_RAW limitation seems annoying and we don't want to enforce TLS
>> apps to have this capability. Second, the user space application doesn't care
>> which interface it should talk to (maybe?) but leave the routing decision to
>> the kernel and just opt in to TLS. SO_BINDTODEVICE doesn't allow this.
>> 
>> sk_bound_dev_if can be rebound later with CAP_NET_RAW privileges, will
>> this be a problem?
>
> Yes it is a problem and we have some ideas for a software fallback that should
> catch this. 

Ok.

> Is the software fallback a prerequisite for kTLS offload in Kernel?

I don't know. I would assume yes because it will change how uAPI will
look like?

>> 
>> Have you thought how the user space will configure the various offloading
>> features (sw, hw, none)? Will it in e.g. OpenSSL be part of the Cipher Spec
>> or will there be new functions around SSL_CTX to do so?
>> 
>> Maybe an enhancement of the TLS_TX setsockopt with a boolean for hw
>> offload is a solution?
>
> Yes, we think that OpenSSL should first configure whether it complies with
> kTLS support. Next, we thought of using an environment variable to control
> kTLS globally in OpenSSL as follows:

0. no kernel tls at all but use e.g. OpenSSL crypto code.

> 1. only software kTLS
> 2. only hardware kTLS - no fallback to software.
> 3. Try to use hardware kTLS and if it isn't supported fallback to
> software kTLS.

Hmm, environment variable and global control contradict each other. ;)

In some form or another there is a need to have all options for
debugging. I also wonder if it makes sense to disable ktls based on
reordering and fast path vs. slow path hit ratio. But that is something
to think about later.
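
To make the four options listed above concrete, here is a small C sketch of how a library might map such a policy to an actual data path. All names (`ktls_policy`, `tls_path`, `select_tls_path`) are hypothetical and belong to neither OpenSSL nor the kernel; returning "unavailable" when a policy cannot be met is an assumption of this sketch, not something settled in the discussion.

```c
#include <stdbool.h>

/* Hypothetical policy values mirroring options 0-3 from the thread. */
enum ktls_policy {
    KTLS_POLICY_NONE,        /* 0: no kernel TLS, userspace crypto only */
    KTLS_POLICY_SW_ONLY,     /* 1: only software kTLS */
    KTLS_POLICY_HW_ONLY,     /* 2: only hardware kTLS, no software fallback */
    KTLS_POLICY_HW_THEN_SW   /* 3: try hardware, fall back to software kTLS */
};

/* Hypothetical result: which path the connection ends up using. */
enum tls_path {
    TLS_PATH_USERSPACE,
    TLS_PATH_KTLS_SW,
    TLS_PATH_KTLS_HW,
    TLS_PATH_UNAVAILABLE     /* the requested policy cannot be satisfied */
};

/*
 * sw_ok / hw_ok say whether software kTLS and hardware offload are
 * usable for this socket (how that is probed is out of scope here).
 */
enum tls_path select_tls_path(enum ktls_policy policy, bool sw_ok, bool hw_ok)
{
    switch (policy) {
    case KTLS_POLICY_NONE:
        return TLS_PATH_USERSPACE;
    case KTLS_POLICY_SW_ONLY:
        return sw_ok ? TLS_PATH_KTLS_SW : TLS_PATH_UNAVAILABLE;
    case KTLS_POLICY_HW_ONLY:
        return hw_ok ? TLS_PATH_KTLS_HW : TLS_PATH_UNAVAILABLE;
    case KTLS_POLICY_HW_THEN_SW:
        if (hw_ok)
            return TLS_PATH_KTLS_HW;
        return sw_ok ? TLS_PATH_KTLS_SW : TLS_PATH_UNAVAILABLE;
    }
    return TLS_PATH_UNAVAILABLE;
}
```

Whether the policy comes from an environment variable, a cipher string, or a socket option only changes where the `policy` argument is read from.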

> The above is something we plan for the future, assuming that kTLS
> wouldn't fit for all use-cases. What do you think?
>
> If you'd like to have more fine-grained control of kTLS, e.g. per socket,
> then the application would need to be modified to configure that,
> which is something we try to avoid.

That is why I proposed signaling over ciphers(1) for openssl. If you
e.g. look at apache/mod_ssl, they loop the cipher list from the
configuration file directly to OpenSSL. Same for a lot of other web
servers, nginx etc. Thus you just need to modify openssl and don't need
to touch the users of the library.

E.g. in Fedora/RHEL the crypto libs load a default cipher list from
/etc/crypto-policies/, which you can update centrally with
update-crypto-policies. Maybe the kTLS switches fit nicely in there?

For that to work, OpenSSL still needs to have more fine-grained control over
which kTLS sw/hw to use, right?

>> 
>> Another question:
>> 
>> How is the dependency management done between socket layer and driver
>> layer? It seems a bit cyclic but judging from this code you don't hold
>> references to the device (dev_hold) (which is good, you don't want to have
>> users creating refs to devices). OTOH you somehow n

Re: [PATCH net-next 00/10] net/smc: updates 2017-09-20

2017-09-20 Thread Bart Van Assche

On Wed, 2017-09-20 at 13:58 +0200, Ursula Braun wrote:
> here is a collection of small smc-patches built for net-next improving
> the smc code in different areas.

Hello Ursula,

Can you provide us an update for the timeline of the plan to transition from
PF_SMC to PF_INET/PF_INET6 + SOCK_STREAM? See also
https://www.mail-archive.com/netdev@vger.kernel.org/msg166744.html.

Thanks,

Bart.

Re: [PATCH] VSOCK: fix uapi/linux/vm_sockets.h incomplete types

2017-09-20 Thread Stefan Hajnoczi

On Tue, Sep 19, 2017 at 10:38:40AM -0700, David Miller wrote:
> From: Stefan Hajnoczi 
> Date: Mon, 18 Sep 2017 16:21:00 +0100
> 
> > On Fri, Sep 15, 2017 at 02:14:32PM -0700, David Miller wrote:
> >> > diff --git a/include/uapi/linux/vm_sockets.h b/include/uapi/linux/vm_sockets.h
> >> > index b4ed5d895699..4ae5c625ac56 100644
> >> > --- a/include/uapi/linux/vm_sockets.h
> >> > +++ b/include/uapi/linux/vm_sockets.h
> >> > @@ -18,6 +18,10 @@
> >> >  
> >> >  #include <linux/socket.h>
> >> >  
> >> > +#ifndef __KERNEL__
> >> > +#include <sys/socket.h> /* struct sockaddr */
> >> > +#endif
> >> > +
> >> 
> >> There is no precedent whatsoever for including sys/socket.h in _any_ UAPI
> >> header file provided by the kernel.
> > 
> > <linux/if.h> does it for the same reason:
> > 
> > include/uapi/linux/if.h:#include <sys/socket.h> /* for struct sockaddr. */
> 
> You don't need it for struct sockaddr, you need it for sa_family_t,
> the comment is very misleading.
> 
> Please do as I have instructed and it will fix this problem.

No, you really cannot rely on struct sockaddr from <linux/socket.h> in
uapi headers.  You can check this yourself:

  $ cd /tmp && gcc -o a.o -c /usr/include/linux/vm_sockets.h
  /usr/include/linux/vm_sockets.h:148:32: error: invalid application of ‘sizeof’ to incomplete type ‘struct sockaddr’
  unsigned char svm_zero[sizeof(struct sockaddr) -
^~

The weird situation is:

1. When compiling the kernel, <linux/socket.h> brings in struct sockaddr
   because the compiler finds include/linux/socket.h first before
   include/uapi/linux/socket.h.

2. When compiling a userspace application, <linux/socket.h> does not
   bring in struct sockaddr because include/uapi/linux/socket.h is
   found.

This is why I added the #include <sys/socket.h> when !__KERNEL__.  Sorry
that the commit description wasn't clear on this.

Am I misunderstanding something?

Stefan
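
For readers following along, the failure mode can be reproduced in miniature. The sketch below is an illustrative userspace copy of the layout (the struct name `demo_sockaddr_vm` is made up; the real type is `struct sockaddr_vm`), and it assumes a Linux/glibc system. The array bound uses `sizeof(struct sockaddr)`, so the type must be complete at that point; dropping the `<sys/socket.h>` include should give the "incomplete type" error quoted above.

```c
#include <sys/socket.h>   /* provides the complete struct sockaddr and sa_family_t */

/* Illustrative copy of the layout; not the real uapi definition. */
struct demo_sockaddr_vm {
    sa_family_t    svm_family;
    unsigned short svm_reserved1;
    unsigned int   svm_port;
    unsigned int   svm_cid;
    /* sizeof() requires struct sockaddr to be a complete type here. */
    unsigned char  svm_zero[sizeof(struct sockaddr) -
                            sizeof(sa_family_t) -
                            sizeof(unsigned short) -
                            sizeof(unsigned int) -
                            sizeof(unsigned int)];
};
```

The padding array exists so that the address structure ends up the same size as `struct sockaddr`, which is why the header cannot avoid needing the full definition.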

RE: [PATCH v4 net 2/3] lan78xx: Allow EEPROM write for less than MAX_EEPROM_SIZE

2017-09-20 Thread Nisar.Sayed

Thanks Sergei, I will update it and submit next version.

- Nisar

> Hello!
> 
> On 09/19/2017 01:02 AM, Nisar Sayed wrote:
> 
> > Allow EEPROM write for less than MAX_EEPROM_SIZE
> >
> > Fixes: 55d7de9de6c3 ("Microchip's LAN7800 family USB 2/3 to
> > 10/100/1000 Ethernet device driver")
> > Signed-off-by: Nisar Sayed 
> > ---
> >   drivers/net/usb/lan78xx.c | 9 -
> >   1 file changed, 4 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/net/usb/lan78xx.c b/drivers/net/usb/lan78xx.c
> > index fcf85ae37435..3292f56ffe02 100644
> > --- a/drivers/net/usb/lan78xx.c
> > +++ b/drivers/net/usb/lan78xx.c
> > @@ -1290,11 +1290,10 @@ static int lan78xx_ethtool_set_eeprom(struct
> net_device *netdev,
> > if (ret)
> > return ret;
> >
> > -   /* Allow entire eeprom update only */
> > -   if ((ee->magic == LAN78XX_EEPROM_MAGIC) &&
> > -   (ee->offset == 0) &&
> > -   (ee->len == 512) &&
> > -   (data[0] == EEPROM_INDICATOR))
> > +   /* Invalid EEPROM_INDICATOR at offset zero will result in fail to
> 
> s/fail/a failure/.
> 
> > +* load data from EEPROM
> > +*/
> > +   if (ee->magic == LAN78XX_EEPROM_MAGIC)
> > ret = lan78xx_write_raw_eeprom(dev, ee->offset, ee->len,
> data);
> > else if ((ee->magic == LAN78XX_OTP_MAGIC) &&
> >  (ee->offset == 0) &&
> >
> 
> MBR, Sergei

Re: [RFC PATCH] net: Introduce a socket option to enable picking tx queue based on rx queue.

2017-09-20 Thread Hannes Frederic Sowa

Sridhar Samudrala  writes:

> This patch introduces a new socket option SO_SYMMETRIC_QUEUES that can be used
> to enable symmetric tx and rx queues on a socket.
>
> This option is specifically useful for epoll based multi threaded workloads
> where each thread handles packets received on a single RX queue. In this
> model, we have noticed that it helps to send the packets on the same TX queue
> corresponding to the queue-pair associated with the RX queue specifically when
> busy poll is enabled with epoll().
>
> Two new fields are added to struct sock_common to cache the last rx ifindex
> and the rx queue in the receive path of an SKB. __netdev_pick_tx() returns the
> cached rx queue when this option is enabled and the TX is happening on the
> same device.

Would it help to make the rx and tx skb hashes symmetric
(skb_get_hash_symmetric) on request?

Re: [PATCH net-next 00/14] gtp: Additional feature support

2017-09-20 Thread Andreas Schultz


Hi Harald,

On 20/09/17 01:19, Harald Welte wrote:

Hi Tom,

On Tue, Sep 19, 2017 at 08:59:28AM -0700, Tom Herbert wrote:

On Tue, Sep 19, 2017 at 5:43 AM, Harald Welte 
wrote:

On Mon, Sep 18, 2017 at 05:38:50PM -0700, Tom Herbert wrote:

   - IPv6 support


see my detailed comments in other mails.  It's unfortunately only
support for the already "deprecated" IPv6-only PDP contexts, not the
more modern v4v6 type.  In order to interoperate with old and new
approach, all three cases (v4, v6 and v4v6) should be supported from
one code base.


It sounds like something that can be subsequently added.


Not entirely, at least on the netlink (and any other configuration
interface) you will have to reflect this from the very beginning.  You
have to have an explicit PDP type and cannot rely on the address type to
specify the type of PDP context.  Whatever interfaces are introduced
now will have to remain compatible to any future change.

My strategy to avoid any such possible 'road blocks' from being
introduced would be to simply add v4v6 and v6 support in one go.  The
differences are marginal (having both an IPv6 prefix and a v4 address in
parallel, rather than mutually exclusive only).


Do you have a reference to the spec?


See http://osmocom.org/issues/2418#note-7 which lists Section 11.2.1.3.2
of 3GPP TS 29.061 in combination with RFC3314, RFC7066, RFC6459 and
3GPP TS 23.060 9.2.1 as well as a summary of my understanding of it some
months ago.


   - Configurable networking interfaces so that GTP kernel can be
   used and tested without needing GSN network emulation (i.e. no
   user space daemon needed).


We have some pretty decent userspace utilities for configuring the
GTP interfaces and tunnels in the libgtpnl repository, but if it
helps people to have another way of configuration, I won't be
against it.


AFAIK those userspace utilities don't support IPv6.


Of course not [yet]. libgtpnl and the command line tools have been
implemented specifically for the in-kernel GTP driver, and you have to
make sure to add related support on both the kernel and the userspace
side (libgtpnl). So there's little point in adding features on either
side before the other side.  There would be no way to test...


Being able to configure GTP like any other encapsulation will
facilitate development of IPv6 and other features.


That may very well be the case, but adding "IPv6 support" to kernel GTP
in a way that is not in line with the existing userspace libraries and
control-plane implementations means that you're developing those
features in an artificial environment that doesn't resemble real 3GPP
interoperable networks out there.

As indicated, I'm not against adding additional interfaces, but we have
to make sure that we add IPv6 support (or any new feature support) to at
least libgtpnl, and to make sure we test interoperability with existing
3GPP network equipment such as real IPv6 capable phones and SGSNs.


I'm not sure if this is a useful feature.  GTP is used only in
operator-controlled networks and only on standard ports.  It's not
possible to negotiate any non-standard ports on the signaling plane
either.


Bear in mind that we're not required to do everything the GTP spec
says.


Yes, we are, at least as long as it affects interoperability with other
implementations out there.

GTP uses well-known port numbers on *both* sides of the tunnel, and you
cannot deviate from that.


Actually, the well-known port is only mandatory for the receiving side.
The sending side can choose any port it wishes as long as it is prepared
to receive possible error indication on the well-known port.

Of course, it makes the implementation simple to use only one port, but 
for scalability it might be a good idea to support per PDP context 
sending ports.


Regards
Andreas


There's no point in having all kinds of features in the GTP user plane
which are not interoperable with other implementations, and which are
completely outside of the information model / architecture of GTP.

In the real world, GTP-U is only used in combination with GTP-C.  And in
GTP-C you can only negotiate the IP address of both sides of GTP-U, and
not the port number information.  As a result, the port numbers are
static on both sides.


My impression is GTP designers probably didn't think in terms of
getting best performance. But we can ;-)


I think it's wasted efforts if it's about "random udp ports" as no
standards-compliant implementation out there with which you will have to
interoperate will be able to support it.

GTP is used between home and roaming operator.  If you want to introduce
changes to how it works, you will have to have control over both sides
of the implementation of both the GTP-C and the GTP-U plane, which is
very unlikely and rather the exception in the hundreds of operators you
interoperate with.  Also keep in mind that there often are various
"middleboxes" that will suddenly have to reflect your changes.  That
starts from packet filter

Re: [PATCH net-next 09/14] gtp: Allow configuring GTP interface as standalone

2017-09-20 Thread Andreas Schultz


On 19/09/17 02:38, Tom Herbert wrote:

Add new configuration of GTP interfaces that allow specifying a port to
listen on (as opposed to having to get sockets from a userspace control
plane). This allows GTP interfaces to be configured and the data path
tested without requiring a GTP-C daemon.


This would imply that you can have multiple independent GTP sockets on 
the same IP address. That is not permitted by the GTP specifications. 
3GPP TS 29.281, section 4.3 states clearly that there is "only" one GTP 
entity per IP address. A PDP context is defined by the destination IP and 
the TEID. The destination port is not part of the identity of a PDP context.


Even the source IP and source port are not part of the tunnel identity. 
This makes it possible to send traffic from a new SGSN/SGW during 
handover before the control protocol has announced the handover.


At this point the usual response is: THAT IS NOT SAFE. Yes, GTP has been 
designed for cooperative networks only and should not be used on 
hostile/unsecured networks.


On the sending side, using multiple ports is permitted as long as the 
default GTP port is always able to receive incoming messages.


Andreas

[...]
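
The identity rule described in this message can be sketched in C as follows. The types and the function are hypothetical and are not the structures used by the in-kernel GTP driver; they only show a lookup that matches on destination IP and TEID while ignoring source IP, source port and destination port.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of a received GTP-U packet (IPv4 only). */
struct gtp_packet_info {
    uint32_t src_ip;
    uint16_t src_port;
    uint32_t dst_ip;
    uint16_t dst_port;
    uint32_t teid;
};

/* Hypothetical PDP context key: destination (local) IP plus TEID. */
struct pdp_key {
    uint32_t local_ip;
    uint32_t teid;
};

/*
 * A packet belongs to the context if destination IP and TEID match.
 * Source IP, source port and destination port are deliberately not
 * compared, following the identity rule stated above.
 */
bool pdp_key_matches(const struct pdp_key *key,
                     const struct gtp_packet_info *pkt)
{
    return key->local_ip == pkt->dst_ip && key->teid == pkt->teid;
}
```

With such a key, traffic from a new peer address during handover still matches the existing context, which is the behaviour the message describes.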

Re: [PATCH net-next 03/14] gtp: Call common functions to get tunnel routes and add dst_cache

2017-09-20 Thread Andreas Schultz




On 19/09/17 14:09, Harald Welte wrote:

Hi Dave,

On Mon, Sep 18, 2017 at 09:17:51PM -0700, David Miller wrote:

This and the new dst caching code ignores any source address selection
done by ip_route_output_key() or the new tunnel route lookup helpers.

Either source address selection should be respected, or if saddr will
never be modified by a route lookup for some specific reason here,
that should be documented.


The IP source address is fixed by signaling on the GTP-C control plane
and nothing that the kernel can unilaterally decide to change.  Such a
change of address would have to be decided by and first be signaled on
GTP-C to the peer by the userspace daemon, which would then update the
PDP context in the kernel.


I think we had this discussion before. The sending IP and port are not 
part of the identity of the PDP context. So IMHO the sender is permitted 
to change the source IP at random.

Regards
Andreas



So I guess you're asking us to document that rationale as form of a
source code comment ?

Re: [PATCH,net-next,0/2] Improve code coverage of syzkaller

2017-09-20 Thread Willem de Bruijn

On Wed, Sep 20, 2017 at 2:08 AM, David Miller  wrote:
> From: Petar Penkov 
> Date: Tue, 19 Sep 2017 21:26:14 -0700
>
>> Furthermore, in a way testing already requires specific kernel
>> configuration.  In this particular example, syzkaller prefers
>> synchronous operation and therefore needs 4KSTACKS disabled. Other
>> features that require rebuilding are KASAN and dbx. From this point
>> of view, I still think that having the TUN_NAPI flag has value.
>
> Then I think this path could be enabled/disabled with a runtime flag
> just as easily, no?

I think that the compile time option was chosen because of the ns_capable
check, so that with user namespaces unprivileged processes can control this
path. Perhaps we can require capable() only to set IFF_NAPI_FRAGS.

Then we can convert the napi_gro_receive path to be conditional on a new
IFF_NAPI flag instead of this compile time option.

[PATCH v5 net 1/3] lan78xx: Fix for eeprom read/write when device auto suspend

2017-09-20 Thread Nisar Sayed

Fix for eeprom read/write when device auto suspend

Fixes: 55d7de9de6c3 ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet device driver")
Signed-off-by: Nisar Sayed 
---
 drivers/net/usb/lan78xx.c | 24 
 1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/drivers/net/usb/lan78xx.c b/drivers/net/usb/lan78xx.c
index b99a7fb09f8e..fcf85ae37435 100644
--- a/drivers/net/usb/lan78xx.c
+++ b/drivers/net/usb/lan78xx.c
@@ -1265,30 +1265,46 @@ static int lan78xx_ethtool_get_eeprom(struct net_device *netdev,
  struct ethtool_eeprom *ee, u8 *data)
 {
struct lan78xx_net *dev = netdev_priv(netdev);
+   int ret;
+
+   ret = usb_autopm_get_interface(dev->intf);
+   if (ret)
+   return ret;
 
ee->magic = LAN78XX_EEPROM_MAGIC;
 
-   return lan78xx_read_raw_eeprom(dev, ee->offset, ee->len, data);
+   ret = lan78xx_read_raw_eeprom(dev, ee->offset, ee->len, data);
+
+   usb_autopm_put_interface(dev->intf);
+
+   return ret;
 }
 
 static int lan78xx_ethtool_set_eeprom(struct net_device *netdev,
  struct ethtool_eeprom *ee, u8 *data)
 {
struct lan78xx_net *dev = netdev_priv(netdev);
+   int ret;
+
+   ret = usb_autopm_get_interface(dev->intf);
+   if (ret)
+   return ret;
 
/* Allow entire eeprom update only */
if ((ee->magic == LAN78XX_EEPROM_MAGIC) &&
(ee->offset == 0) &&
(ee->len == 512) &&
(data[0] == EEPROM_INDICATOR))
-   return lan78xx_write_raw_eeprom(dev, ee->offset, ee->len, data);
+   ret = lan78xx_write_raw_eeprom(dev, ee->offset, ee->len, data);
else if ((ee->magic == LAN78XX_OTP_MAGIC) &&
 (ee->offset == 0) &&
 (ee->len == 512) &&
 (data[0] == OTP_INDICATOR_1))
-   return lan78xx_write_raw_otp(dev, ee->offset, ee->len, data);
+   ret = lan78xx_write_raw_otp(dev, ee->offset, ee->len, data);
 
-   return -EINVAL;
+   usb_autopm_put_interface(dev->intf);
+
+   return ret;
 }
 
 static void lan78xx_get_strings(struct net_device *netdev, u32 stringset,
-- 
2.14.1
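
The pattern this patch applies (take a runtime-PM reference, do the work, release the reference on a single exit path) can be sketched outside the kernel as below. The `demo_*` names are stand-ins invented for this example, not USB core APIs. One observation on the diff as posted: `ret` is 0 after a successful `usb_autopm_get_interface()`, and the trailing `return -EINVAL` is removed, so a request matching neither branch appears to return 0. The sketch sets `-EINVAL` explicitly in that case, which is a choice made here and not part of the patch.

```c
#include <errno.h>

/* Stand-in for a USB interface with a runtime-PM usage counter. */
struct demo_intf {
    int pm_usage;       /* number of outstanding "get" references */
    int resume_error;   /* non-zero simulates a failed resume */
};

/* Mimics a get that takes no reference when resume fails. */
static int demo_autopm_get(struct demo_intf *intf)
{
    if (intf->resume_error)
        return intf->resume_error;
    intf->pm_usage++;
    return 0;
}

static void demo_autopm_put(struct demo_intf *intf)
{
    intf->pm_usage--;
}

/*
 * request_valid stands in for the magic/offset/len checks,
 * write_result for the return value of the raw EEPROM/OTP write.
 */
int demo_set_eeprom(struct demo_intf *intf, int request_valid, int write_result)
{
    int ret;

    ret = demo_autopm_get(intf);
    if (ret)
        return ret;             /* nothing to release yet */

    if (request_valid)
        ret = write_result;
    else
        ret = -EINVAL;          /* explicit in this sketch only */

    demo_autopm_put(intf);      /* single exit keeps get/put balanced */
    return ret;
}
```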

[PATCH v5 net 2/3] lan78xx: Allow EEPROM write for less than MAX_EEPROM_SIZE

2017-09-20 Thread Nisar Sayed

Allow EEPROM write for less than MAX_EEPROM_SIZE

Fixes: 55d7de9de6c3 ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet device driver")
Signed-off-by: Nisar Sayed 
---
 drivers/net/usb/lan78xx.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/drivers/net/usb/lan78xx.c b/drivers/net/usb/lan78xx.c
index fcf85ae37435..f8c63eec8353 100644
--- a/drivers/net/usb/lan78xx.c
+++ b/drivers/net/usb/lan78xx.c
@@ -1290,11 +1290,10 @@ static int lan78xx_ethtool_set_eeprom(struct net_device *netdev,
if (ret)
return ret;
 
-   /* Allow entire eeprom update only */
-   if ((ee->magic == LAN78XX_EEPROM_MAGIC) &&
-   (ee->offset == 0) &&
-   (ee->len == 512) &&
-   (data[0] == EEPROM_INDICATOR))
+   /* Invalid EEPROM_INDICATOR at offset zero will result in a failure
+* to load data from EEPROM
+*/
+   if (ee->magic == LAN78XX_EEPROM_MAGIC)
ret = lan78xx_write_raw_eeprom(dev, ee->offset, ee->len, data);
else if ((ee->magic == LAN78XX_OTP_MAGIC) &&
 (ee->offset == 0) &&
-- 
2.14.1

[PATCH v5 net 0/3] lan78xx: This series of patches are for lan78xx driver.

2017-09-20 Thread Nisar Sayed

This series of patches is for the lan78xx driver.

These patches fix potential issues associated with the lan78xx driver.

v5
- Updated changes as per comments

v4
- Updated changes to handle return values as per comments
- Updated EEPROM write handling as per comments

v3
- Updated changes as per comments

v2
- Added patch version information
- Added fixes tag
- Updated patch description
- Updated changes as per comments

v1
- Split patches as per comments
- Dropped "fixed_phy device support" and "Fix for system suspend" changes

Nisar Sayed (3):
  lan78xx: Fix for eeprom read/write when device auto suspend
  lan78xx: Allow EEPROM write for less than MAX_EEPROM_SIZE
  lan78xx: Use default values loaded from EEPROM/OTP after reset

 drivers/net/usb/lan78xx.c | 34 --
 1 file changed, 24 insertions(+), 10 deletions(-)

-- 
2.14.1

[PATCH v5 net 3/3] lan78xx: Use default values loaded from EEPROM/OTP after reset

2017-09-20 Thread Nisar Sayed

Use default value of auto duplex and auto speed values loaded
from EEPROM/OTP after reset. The LAN78xx allows platform
configurations to be loaded from EEPROM/OTP.
Ex: When external phy is connected, the MAC can be configured to
have correct auto speed, auto duplex, auto polarity configured
from the EEPROM/OTP.

Fixes: 55d7de9de6c3 ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet device driver")
Signed-off-by: Nisar Sayed 
---
 drivers/net/usb/lan78xx.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/usb/lan78xx.c b/drivers/net/usb/lan78xx.c
index f8c63eec8353..0161f77641fa 100644
--- a/drivers/net/usb/lan78xx.c
+++ b/drivers/net/usb/lan78xx.c
@@ -2449,7 +2449,6 @@ static int lan78xx_reset(struct lan78xx_net *dev)
/* LAN7801 only has RGMII mode */
if (dev->chipid == ID_REV_CHIP_ID_7801_)
buf &= ~MAC_CR_GMII_EN_;
-   buf |= MAC_CR_AUTO_DUPLEX_ | MAC_CR_AUTO_SPEED_;
ret = lan78xx_write_reg(dev, MAC_CR, buf);
 
ret = lan78xx_read_reg(dev, MAC_TX, &buf);
-- 
2.14.1
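
The effect of removing the `buf |= MAC_CR_AUTO_DUPLEX_ | MAC_CR_AUTO_SPEED_` line can be shown with a small read-modify-write sketch. The bit positions and names below are placeholders chosen for this example, not the LAN78xx register layout. The point is only that after the change the auto-speed and auto-duplex bits keep whatever value was loaded (for example from EEPROM/OTP) instead of being forced on.

```c
#include <stdint.h>

/* Placeholder bit positions; NOT the real LAN78xx MAC_CR layout. */
#define DEMO_MAC_CR_GMII_EN      (1u << 0)
#define DEMO_MAC_CR_AUTO_SPEED   (1u << 1)
#define DEMO_MAC_CR_AUTO_DUPLEX  (1u << 2)

/*
 * Compute the value written back to the control register after reset.
 * loaded:     value read from the register (defaults from EEPROM/OTP)
 * rgmii_only: non-zero for a chip that only supports RGMII
 */
uint32_t demo_mac_cr_after_reset(uint32_t loaded, int rgmii_only)
{
    uint32_t buf = loaded;

    if (rgmii_only)
        buf &= ~DEMO_MAC_CR_GMII_EN;

    /* No unconditional OR of the auto bits: loaded defaults are preserved. */
    return buf;
}
```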

Re: [PATCH net-next 09/14] gtp: Allow configuring GTP interface as standalone

2017-09-20 Thread Tom Herbert

On Wed, Sep 20, 2017 at 8:27 AM, Andreas Schultz  wrote:
> On 19/09/17 02:38, Tom Herbert wrote:
>>
>> Add new configuration of GTP interfaces that allow specifying a port to
>> listen on (as opposed to having to get sockets from a userspace control
>> plane). This allows GTP interfaces to be configured and the data path
>> tested without requiring a GTP-C daemon.
>
>
> This would imply that you can have multiple independent GTP sockets on the
> same IP address. That is not permitted by the GTP specifications. 3GPP TS
> 29.281, section 4.3 states clearly that there is "only" one GTP entity per
> IP address. A PDP context is defined by the destination IP and the TEID. The
> destination port is not part of the identity of a PDP context.
>
We are in no way trying to change GTP; if someone runs this in a real GTP
network then they need to abide by the specification. However, there
is nothing inconsistent and it breaks nothing if someone wishes to use
different port numbers in their own private network for testing or
development purposes. Every other UDP application that has an assigned
port number allows configurable ports; I don't see that GTP is so
special that it should be an exception.

Tom

[PATCH v4 0/4] Add cross-compilation support to eBPF samples

2017-09-20 Thread Joel Fernandes

These patches fix issues seen when cross-compiling eBPF samples on arm64.
Compared to [1], I dropped the controversial inline-asm patch and am exploring
other options to fix it. However, these patches are a step in the right
direction and I look forward to getting them into -next and the merge window.

Changes since v3:
- just a repost with acks

[1] https://lkml.org/lkml/2017/8/7/417

Joel Fernandes (4):
  samples/bpf: Use getppid instead of getpgrp for array map stress
  samples/bpf: Enable cross compiler support
  samples/bpf: Fix pt_regs issues when cross-compiling
  samples/bpf: Add documentation on cross compilation

 samples/bpf/Makefile  |  7 +++-
 samples/bpf/README.rst| 10 ++
 samples/bpf/map_perf_test_kern.c  |  2 +-
 samples/bpf/map_perf_test_user.c  |  2 +-
 tools/testing/selftests/bpf/bpf_helpers.h | 56 +++
 5 files changed, 67 insertions(+), 10 deletions(-)

-- 
2.14.1.821.g8fa685d3b7-goog
