Re: [net-next PATCH v2 1/5] net: virtio dynamically disable/enable LRO

2016-11-22 Thread John Fastabend
On 16-11-21 03:23 PM, Michael S. Tsirkin wrote:
> On Sat, Nov 19, 2016 at 06:49:34PM -0800, John Fastabend wrote:
>> This adds support for dynamically setting the LRO feature flag. The
>> message to control guest features in the backend uses the
>> CTRL_GUEST_OFFLOADS msg type.
>>
>> Signed-off-by: John Fastabend 
>> ---
>>  drivers/net/virtio_net.c |   45 
>> -
>>  1 file changed, 44 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index ca5239a..8189e5b 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
>> @@ -1419,6 +1419,41 @@ static void virtnet_init_settings(struct net_device 
>> *dev)
>>  .set_settings = virtnet_set_settings,
>>  };
>>  
>> +static int virtnet_set_features(struct net_device *netdev,
>> +netdev_features_t features)
>> +{
>> +struct virtnet_info *vi = netdev_priv(netdev);
>> +struct virtio_device *vdev = vi->vdev;
>> +struct scatterlist sg;
>> +u64 offloads = 0;
>> +
>> +if (features & NETIF_F_LRO)
>> +offloads |= (1 << VIRTIO_NET_F_GUEST_TSO4) |
>> +(1 << VIRTIO_NET_F_GUEST_TSO6);
>> +
>> +if (features & NETIF_F_RXCSUM)
>> +offloads |= (1 << VIRTIO_NET_F_GUEST_CSUM);
>> +
>> +if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS)) {
>> +sg_init_one(&sg, &offloads, sizeof(uint64_t));
>> +if (!virtnet_send_command(vi,
>> +  VIRTIO_NET_CTRL_GUEST_OFFLOADS,
>> +  VIRTIO_NET_CTRL_GUEST_OFFLOADS_SET,
>> +  &sg)) {
>> +dev_warn(&netdev->dev,
>> + "Failed to set guest offloads by virtnet 
>> command.\n");
>> +return -EINVAL;
>> +}
>> +} else if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS) &&
>> +   !virtio_has_feature(vdev, VIRTIO_F_VERSION_1)) {
>> +dev_warn(&netdev->dev,
>> + "No support for setting offloads pre version_1.\n");
>> +return -EINVAL;
>> +}
> 
> I don't get this last warning. VIRTIO_NET_F_CTRL_GUEST_OFFLOADS
> was exposed by legacy devices, I don't think it's related to
> VIRTIO_F_VERSION_1.
> 

OK looks like I can just drop the else if branch here.

Thanks,
John
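
For readers following the offload discussion, here is a minimal userspace sketch of how the guest offloads bitmask in the quoted patch is assembled. The VIRTIO_NET_F_* bit numbers come from the virtio-net specification. EX_F_LRO, EX_F_RXCSUM and build_guest_offloads() are hypothetical stand-ins for the kernel's NETIF_F_* flags and are not part of the patch. The sketch uses 1ULL shifts so the mask is computed in 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Feature bit numbers as defined by the virtio-net specification. */
#define VIRTIO_NET_F_GUEST_CSUM 1
#define VIRTIO_NET_F_GUEST_TSO4 7
#define VIRTIO_NET_F_GUEST_TSO6 8

/* Hypothetical stand-ins for the kernel's NETIF_F_* feature flags. */
#define EX_F_RXCSUM (1u << 0)
#define EX_F_LRO    (1u << 1)

static_assert(VIRTIO_NET_F_GUEST_TSO6 < 64, "offload bits must fit in 64 bits");

/* Translate requested netdev features into a guest offloads bitmask. */
static uint64_t build_guest_offloads(uint32_t features)
{
	uint64_t offloads = 0;

	/* LRO maps onto both guest TSO offloads. */
	if (features & EX_F_LRO)
		offloads |= (1ULL << VIRTIO_NET_F_GUEST_TSO4) |
			    (1ULL << VIRTIO_NET_F_GUEST_TSO6);

	/* RX checksum maps onto the guest checksum offload. */
	if (features & EX_F_RXCSUM)
		offloads |= 1ULL << VIRTIO_NET_F_GUEST_CSUM;

	return offloads;
}
```

In the driver the resulting value would be handed to the control queue; that part is left out here because it needs a real device.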


Re: [net-next PATCH v2 4/5] virtio_net: add dedicated XDP transmit queues

2016-11-22 Thread John Fastabend
On 16-11-21 03:13 PM, Michael S. Tsirkin wrote:
> On Sat, Nov 19, 2016 at 06:51:04PM -0800, John Fastabend wrote:
>> XDP requires using isolated transmit queues to avoid interference
>> with normal networking stack (BQL, NETDEV_TX_BUSY, etc). This patch
>> adds an XDP queue per cpu when an XDP program is loaded and does not
>> expose the queues to the OS via the normal API call to
>> netif_set_real_num_tx_queues(). This way the stack will never push
>> an skb to these queues.
>>
>> However virtio/vhost/qemu implementation only allows for creating
>> TX/RX queue pairs at this time so creating only TX queues was not
>> possible. And because the associated RX queues are being created I
>> went ahead and exposed these to the stack and let the backend use
>> them. This creates more RX queues visible to the network stack than
>> TX queues which is worth mentioning but does not cause any issues as
>> far as I can tell.
>>
>> Signed-off-by: John Fastabend 
> 
> FYI what's supposed to happen is packets from the same
> flow going in the reverse direction will go on the
> same queue.
> 
> This might come in handy when implementing RX XDP.
> 

Yeah, but if it's the first packet and not part of a flow then presumably it
can pick any queue. It's certainly worth keeping in mind, though.

.John
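
The queue layout described in the quoted commit message can be modelled as below. This is only a sketch under assumptions: struct ex_queues and the ex_* helpers are hypothetical names, and placing the per-CPU XDP queues directly after the stack-visible ones is an assumed layout, not something the message states.

```c
#include <assert.h>

/* Hypothetical model of the queue pairs, not the driver's structure. */
struct ex_queues {
	unsigned int stack_pairs; /* pairs whose TX side the stack may use */
	unsigned int nr_cpus;     /* one extra pair per CPU for XDP transmit */
};

/* Pairs to request from the device: TX and RX always come together. */
static unsigned int ex_total_pairs(const struct ex_queues *q)
{
	return q->stack_pairs + q->nr_cpus;
}

/* TX queue count reported to the stack; the XDP queues stay hidden. */
static unsigned int ex_real_num_tx(const struct ex_queues *q)
{
	return q->stack_pairs;
}

/* Dedicated XDP TX queue for a CPU (assumed to follow the stack queues). */
static unsigned int ex_xdp_txq(const struct ex_queues *q, unsigned int cpu)
{
	assert(cpu < q->nr_cpus);
	return q->stack_pairs + cpu;
}
```

With 4 stack pairs and 2 CPUs this gives 6 pairs in total, 4 TX queues visible to the stack, and XDP TX queues 4 and 5, which matches the remark that more RX queues than TX queues end up visible.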


Re: linux-next: build warnings after merge of the net-next tree

2016-11-22 Thread Thomas Petazzoni
Hello,

On Mon, 21 Nov 2016 17:28:39 -0800, Florian Fainelli wrote:
> +Thomas, Gregory,
> 
> On 11/21/2016 05:22 PM, Stephen Rothwell wrote:
> [snip]
> > 
> > Introduced by commit
> > 
> >   a0627f776a45 ("net: marvell: Allow drivers to be built with COMPILE_TEST")
> > 
> > "a few warnings" is a matter of perception.  :-(  
> 
> Thomas, based on our IRC conversation, do you already have patches for
> mvneta and mvpp2 to build without warning on 64-bit or should I prepare
> patches for these?

Yes, we already have patches for making mvneta and mvpp2 build without
warning for 64-bit (Grégory for mvneta, and myself for mvpp2). I
intended to send the mvpp2 ones together with patches adding support
for a new variant of the IP, but I guess I can send just the few ones
that make it 64-bit "buildable".

Best regards,

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com


Re: [net-next PATCH v2 3/5] virtio_net: Add XDP support

2016-11-22 Thread John Fastabend
On 16-11-21 03:20 PM, Michael S. Tsirkin wrote:
> On Sat, Nov 19, 2016 at 06:50:33PM -0800, John Fastabend wrote:
>> From: Shrijeet Mukherjee 
>>
>> This adds XDP support to virtio_net. Some requirements must be
>> met for XDP to be enabled depending on the mode. First it will
>> only be supported with LRO disabled so that data is not pushed
>> across multiple buffers. The MTU must be less than a page size
>> to avoid having to handle XDP across multiple pages.
>>
>> If mergeable receive is enabled this first series only supports
>> the case where header and data are in the same buf which we can
>> check when a packet is received by looking at num_buf. If
>> num_buf is greater than 1 and an XDP program is loaded the packet
>> is dropped and a warning is thrown. When any_header_sg is set this
>> does not happen and both header and data are put in a single buffer
>> as expected so we check this when XDP programs are loaded. Note I
>> have only tested this with Linux vhost backend.
>>
>> If big packets mode is enabled and MTU/LRO conditions above are
>> met then XDP is allowed.
>>
>> A follow on patch can be generated to solve the mergeable receive
>> case with num_bufs equal to 2. Buffers greater than two may not
>> be handled as easily.
> 
> 
> I would very much prefer support for other layouts without drops
> before merging this.
> header by itself can certainly be handled by skipping it.
> People wanted to use that e.g. for zero copy.

OK fair enough I'll do this now rather than push it out.

> 
> Anything else can be handled by copying the packet.

This though I'm not so sure about. The copy is going to be slow and
I wonder if someone could craft a packet that triggers it and use
that to slow down a system.

Also I can't see what would cause this to happen. With mergeable
buffers and LRO off the num_bufs is either 1 or 2 depending on where
the header is. Otherwise with LRO off it should be in a single page.
At least this is the Linux vhost implementation; I guess other
implementations might meet the spec but use num_buf > 2 or multiple
pages even in the non-LRO case.

I tend to think dropping the packet outright is better than copying
it around. At the very least, if we do this we need to put in warnings
so users can see something is misconfigured.

.John
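
The conditions discussed in this thread can be summarised in a small sketch. It only restates the rules from the quoted commit message (LRO off, MTU less than a page, drop when num_buf is greater than 1 with a program loaded). The ex_* names are hypothetical and this is not the driver code.

```c
#include <assert.h>
#include <stdbool.h>

enum ex_verdict { EX_ACCEPT, EX_DROP };

/* Load-time check: LRO must be off and the MTU must be below a page. */
static bool ex_xdp_can_attach(bool lro_enabled, unsigned int mtu,
			      unsigned int page_size)
{
	return !lro_enabled && mtu < page_size;
}

/* Receive-time check for mergeable buffers in this first series. */
static enum ex_verdict ex_xdp_rx_check(bool prog_loaded, unsigned int num_buf)
{
	assert(num_buf >= 1);

	/* Header and data split across buffers: dropped while XDP is loaded. */
	if (prog_loaded && num_buf > 1)
		return EX_DROP;

	return EX_ACCEPT;
}
```

Handling the num_buf == 2 case by skipping a header-only buffer, as requested in the review, would replace the drop branch for that case.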


net/can: use-after-free in bcm_rx_thr_flush

2016-11-22 Thread Andrey Konovalov
Hi,

I've got the following error report while fuzzing the kernel with syzkaller.

A reproducer is attached.
You may need to run it a few times.

On commit 9c763584b7c8911106bb77af7e648bef09af9d80 (4.9-rc6, Nov 20).

==
BUG: KASAN: use-after-free in bcm_rx_thr_flush+0x284/0x2b0
Read of size 1 at addr 88006c1faae5 by task a.out/3874

page:ea0001b07e80 count:1 mapcount:0 mapping:  (null) index:0x0
flags: 0x180(slab)
page dumped because: kasan: bad access detected

CPU: 1 PID: 3874 Comm: a.out Not tainted 4.9.0-rc6+ #427
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
 88006ab07900 81b472e4 88006ab07990 88006c1faae5
 00fa 00fb 88006ab07980 8150ad42
 88006323ce58 0246 880068ca8000 0282
Call Trace:
 [< inline >] __dump_stack lib/dump_stack.c:15
 [] dump_stack+0xb3/0x10f lib/dump_stack.c:51
 [< inline >] describe_address mm/kasan/report.c:259
 [] kasan_report_error+0x122/0x560 mm/kasan/report.c:365
 [< inline >] kasan_report mm/kasan/report.c:387
 [] __asan_report_load1_noabort+0x3e/0x40
mm/kasan/report.c:405
 [< inline >] bcm_rx_do_flush net/can/bcm.c:589
 [] bcm_rx_thr_flush+0x284/0x2b0 net/can/bcm.c:612
 [< inline >] bcm_rx_setup net/can/bcm.c:1199
 [] bcm_sendmsg+0xbb6/0x30e0 net/can/bcm.c:1351
 [< inline >] sock_sendmsg_nosec net/socket.c:621
 [] sock_sendmsg+0xcc/0x110 net/socket.c:631
 [] ___sys_sendmsg+0x771/0x8b0 net/socket.c:1954
 [] __sys_sendmsg+0xce/0x170 net/socket.c:1988
 [< inline >] SYSC_sendmsg net/socket.c:1999
 [] SyS_sendmsg+0x2d/0x50 net/socket.c:1995
 [] entry_SYSCALL_64_fastpath+0x1f/0xc2
arch/x86/entry/entry_64.S:209

The buggy address belongs to the object at 88006c1faae0
 which belongs to the cache kmalloc-32 of size 32
The buggy address 88006c1faae5 is located 5 bytes inside
 of 32-byte region [88006c1faae0, 88006c1fab00)

Freed by task 2013:
 [] save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:57
 [] save_stack+0x46/0xd0 mm/kasan/kasan.c:495
 [< inline >] set_track mm/kasan/kasan.c:507
 [] kasan_slab_free+0x73/0xc0 mm/kasan/kasan.c:571
 [< inline >] slab_free_hook mm/slub.c:1352
 [< inline >] slab_free_freelist_hook mm/slub.c:1374
 [< inline >] slab_free mm/slub.c:2951
 [] kfree+0xe8/0x2b0 mm/slub.c:3871
 [] selinux_cred_free+0x51/0x80 security/selinux/hooks.c:3725
 [] security_cred_free+0x48/0x80 security/security.c:907
 [] put_cred_rcu+0xed/0x390 kernel/cred.c:116
 [< inline >] __rcu_reclaim kernel/rcu/rcu.h:118
 [< inline >] rcu_do_batch kernel/rcu/tree.c:2776
 [< inline >] invoke_rcu_callbacks kernel/rcu/tree.c:3040
 [< inline >] __rcu_process_callbacks kernel/rcu/tree.c:3007
 [] rcu_process_callbacks+0xa40/0x1190 kernel/rcu/tree.c:3024
 [] __do_softirq+0x23f/0x8e5 kernel/softirq.c:284

Allocated by task 1826:
 [] save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:57
 [] save_stack+0x46/0xd0 mm/kasan/kasan.c:495
 [< inline >] set_track mm/kasan/kasan.c:507
 [] kasan_kmalloc+0xab/0xe0 mm/kasan/kasan.c:598
 [] kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:537
 [< inline >] slab_post_alloc_hook mm/slab.h:417
 [< inline >] slab_alloc_node mm/slub.c:2708
 [< inline >] slab_alloc mm/slub.c:2716
 [] __kmalloc_track_caller+0xcf/0x2a0 mm/slub.c:4240
 [] kmemdup+0x24/0x50 mm/util.c:113
 [] selinux_cred_prepare+0x49/0xb0
security/selinux/hooks.c:3739
 [] security_prepare_creds+0x7d/0xb0 security/security.c:912
 [] prepare_creds+0x243/0x340 kernel/cred.c:277
 [] copy_creds+0x7b/0x5c0 kernel/cred.c:343
 [] copy_process.part.45+0x86e/0x5b50 kernel/fork.c:1529
 [< inline >] copy_process kernel/fork.c:1479
 [] _do_fork+0x1ba/0xcc0 kernel/fork.c:1933
 [< inline >] SYSC_clone kernel/fork.c:2043
 [] SyS_clone+0x37/0x50 kernel/fork.c:2037
 [] do_syscall_64+0x195/0x490 arch/x86/entry/common.c:280
 [] return_from_SYSCALL_64+0x0/0x7a
arch/x86/entry/entry_64.S:251

Memory state around the buggy address:
 88006c1fa980: fc fc fb fb fb fb fc fc fb fb fb fb fc fc fb fb
 88006c1faa00: fb fb fc fc fb fb fb fb fc fc fb fb fb fb fc fc
>88006c1faa80: fb fb fb fb fc fc fb fb fb fb fc fc fb fb fb fb
   ^
 88006c1fab00: fc fc fb fb fb fb fc fc 00 00 00 00 fc fc 00 00
 88006c1fab80: 00 00 fc fc fb fb fb fb fc fc fb fb fb fb fc fc
==

Thanks!
// autogenerated by syzkaller (http://github.com/google/syzkaller)

#ifndef __NR_socket
#define __NR_socket 41
#endif
#ifndef __NR_syz_fuse_mount
#define __NR_syz_fuse_mount 104
#endif
#ifndef __NR_syz_fuseblk_mount
#define __NR_syz_fuseblk_mount 105
#endif
#ifndef __NR_syz_open_pts
#define __NR_syz_open_pts 103
#endif
#ifndef __NR_syz_te

Re: net/l2tp: use-after-free write in l2tp_ip6_close

2016-11-22 Thread Andrey Konovalov
Hi Guillaume,

Sorry, I was on vacation last week, couldn't reply.

As far as I can see, a fix was already sent upstream.

Thanks!

On Thu, Nov 10, 2016 at 6:44 PM, Guillaume Nault  wrote:
> On Mon, Nov 07, 2016 at 11:35:26PM +0100, Andrey Konovalov wrote:
>> Hi,
>>
>> I've got the following error report while running the syzkaller fuzzer:
>>
>> ==
>> BUG: KASAN: use-after-free in l2tp_ip6_close+0x239/0x2a0 at addr
>> 8800677276d8
>> Write of size 8 by task a.out/8668
>> CPU: 0 PID: 8668 Comm: a.out Not tainted 4.9.0-rc4+ #354
>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>>  8800694d7b00 81b46a64 88006adb5780 8800677276c0
>>  880067727c68 8800677276c0 8800694d7b28 8150a86c
>>  8800694d7bb8 88006adb5780 8800e77276d8 8800694d7ba8
>> Call Trace:
>>  [< inline >] __dump_stack lib/dump_stack.c:15
>>  [] dump_stack+0xb3/0x10f lib/dump_stack.c:51
>>  [] kasan_object_err+0x1c/0x70 mm/kasan/report.c:156
>>  [< inline >] print_address_description mm/kasan/report.c:194
>>  [] kasan_report_error+0x1f7/0x4d0 mm/kasan/report.c:283
>>  [< inline >] kasan_report mm/kasan/report.c:303
>>  [] __asan_report_store8_noabort+0x3e/0x40
>> mm/kasan/report.c:329
>>  [< inline >] __write_once_size ./include/linux/compiler.h:272
>>  [< inline >] __hlist_del ./include/linux/list.h:622
>>  [< inline >] hlist_del_init ./include/linux/list.h:637
>>  [] l2tp_ip6_close+0x239/0x2a0 net/l2tp/l2tp_ip6.c:239
>>  [] inet_release+0xef/0x1c0 net/ipv4/af_inet.c:415
>>  [] inet6_release+0x50/0x70 net/ipv6/af_inet6.c:422
>>  [] sock_release+0x8e/0x1d0 net/socket.c:570
>>  [] sock_close+0x16/0x20 net/socket.c:1017
>>  [] __fput+0x29d/0x720 fs/file_table.c:208
>>  [] fput+0x15/0x20 fs/file_table.c:244
>>  [] task_work_run+0xf8/0x170 kernel/task_work.c:116
>>  [< inline >] exit_task_work ./include/linux/task_work.h:21
>>  [] do_exit+0x883/0x2ac0 kernel/exit.c:828
>>  [] do_group_exit+0x10e/0x340 kernel/exit.c:931
>>  [< inline >] SYSC_exit_group kernel/exit.c:942
>>  [] SyS_exit_group+0x1d/0x20 kernel/exit.c:940
>>  [] entry_SYSCALL_64_fastpath+0x1f/0xc2
>> arch/x86/entry/entry_64.S:209
>> Object at 8800677276c0, in cache L2TP/IPv6 size: 1448
>> Allocated:
>> PID = 8692
>> [] save_stack_trace+0x16/0x20 
>> arch/x86/kernel/stacktrace.c:57
>> [] save_stack+0x46/0xd0 mm/kasan/kasan.c:495
>> [< inline >] set_track mm/kasan/kasan.c:507
>> [] kasan_kmalloc+0xab/0xe0 mm/kasan/kasan.c:598
>> [] kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:537
>> [< inline >] slab_post_alloc_hook mm/slab.h:417
>> [< inline >] slab_alloc_node mm/slub.c:2708
>> [< inline >] slab_alloc mm/slub.c:2716
>> [] kmem_cache_alloc+0xb4/0x270 mm/slub.c:2721
>> [] sk_prot_alloc+0x69/0x2b0 net/core/sock.c:1327
>> [] sk_alloc+0x38/0xaf0 net/core/sock.c:1389
>> [] inet6_create+0x2e5/0xf60 net/ipv6/af_inet6.c:182
>> [] __sock_create+0x37f/0x640 net/socket.c:1153
>> [< inline >] sock_create net/socket.c:1193
>> [< inline >] SYSC_socket net/socket.c:1223
>> [] SyS_socket+0xf0/0x1b0 net/socket.c:1203
>> [] entry_SYSCALL_64_fastpath+0x1f/0xc2
>> arch/x86/entry/entry_64.S:209
>> Freed:
>> PID = 8668
>> [] save_stack_trace+0x16/0x20 
>> arch/x86/kernel/stacktrace.c:57
>> [] save_stack+0x46/0xd0 mm/kasan/kasan.c:495
>> [< inline >] set_track mm/kasan/kasan.c:507
>> [] kasan_slab_free+0x73/0xc0 mm/kasan/kasan.c:571
>> [< inline >] slab_free_hook mm/slub.c:1352
>> [< inline >] slab_free_freelist_hook mm/slub.c:1374
>> [< inline >] slab_free mm/slub.c:2951
>> [] kmem_cache_free+0xb3/0x2c0 mm/slub.c:2973
>> [< inline >] sk_prot_free net/core/sock.c:1370
>> [] __sk_destruct+0x319/0x480 net/core/sock.c:1445
>> [] sk_destruct+0x44/0x80 net/core/sock.c:1453
>> [] __sk_free+0x54/0x230 net/core/sock.c:1461
>> [] sk_free+0x23/0x30 net/core/sock.c:1472
>> [< inline >] sock_put ./include/net/sock.h:1591
>> [] sk_common_release+0x294/0x3e0 net/core/sock.c:2745
>> [] l2tp_ip6_close+0x209/0x2a0 net/l2tp/l2tp_ip6.c:243
>> [] inet_release+0xef/0x1c0 net/ipv4/af_inet.c:415
>> [] inet6_release+0x50/0x70 net/ipv6/af_inet6.c:422
>> [] sock_release+0x8e/0x1d0 net/socket.c:570
>> [] sock_close+0x16/0x20 net/socket.c:1017
>> [] __fput+0x29d/0x720 fs/file_table.c:208
>> [] fput+0x15/0x20 fs/file_table.c:244
>> [] task_work_run+0xf8/0x170 kernel/task_work.c:116
>> [< inline >] exit_task_work ./include/linux/task_work.h:21
>> [] do_exit+0x883/0x2ac0 kernel/exit.c:828
>> [] do_group_exit+0x10e/0x340 kernel/exit.c:931
>> [< inline >] SYSC_exit_group kernel/exit.c:942
>> [] SyS_exit_group+0x1d/0x20 kernel/exit.c:940
>> [] entry_SYSCALL_64_fastpath+0x1f/0xc2
>> arch/x86/entry/entry_64.S:209
>> Memory state around the buggy address:
>>  880067727580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Re: [PATCH net 1/1] net sched filters: pass netlink message flags in event notification

2016-11-22 Thread Daniel Borkmann

On 11/22/2016 06:23 AM, Cong Wang wrote:

On Thu, Nov 17, 2016 at 1:02 PM, Cong Wang  wrote:

On Wed, Nov 16, 2016 at 2:16 PM, Roman Mashak  wrote:

Userland client should be able to read an event, and reflect it back to
the kernel, therefore it needs to extract complete set of netlink flags.

For example, this will allow "tc monitor" to distinguish Add and Replace
operations.

Signed-off-by: Roman Mashak 
Signed-off-by: Jamal Hadi Salim 
---
  net/sched/cls_api.c | 5 +++--
  1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 2b2a797..8e93d4a 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -112,7 +112,7 @@ static void tfilter_notify_chain(struct net *net, struct 
sk_buff *oskb,

 for (it_chain = chain; (tp = rtnl_dereference(*it_chain)) != NULL;
  it_chain = &tp->next)
-   tfilter_notify(net, oskb, n, tp, 0, event, false);
+   tfilter_notify(net, oskb, n, tp, n->nlmsg_flags, event, false);



I must be missing something, why does it make sense to pass n->nlmsg_flags
as 'fh' to tfilter_notify()??


Ping... Any response?

It still doesn't look correct to me. I will send a fix unless someone could
explain this.


Sigh, I missed that this was applied already to -net (it certainly doesn't look
like -net material, but rather -net-next stuff) ... This definitely looks buggy
to me, the 0 as it was before was correct here (as it means we delete the whole
chain in this case).

If you could send a patch, that would be great. Thanks Cong!
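
As background for the flags discussed above, here is a small sketch that decodes the netlink request flags a monitor might look at. The EX_NLM_F_* values mirror NLM_F_REPLACE, NLM_F_EXCL and NLM_F_CREATE from <linux/netlink.h> and are redefined only so the example is self-contained. ex_decode_flags() is a hypothetical helper. Which combination a given tool sends for "add" versus "replace" is not stated in this thread, so the sketch decodes bits and does not classify operations.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Values mirror NLM_F_* in <linux/netlink.h>. */
#define EX_NLM_F_REPLACE 0x100
#define EX_NLM_F_EXCL    0x200
#define EX_NLM_F_CREATE  0x400

/* Write a "A|B" style description of the set flags into buf. */
static void ex_decode_flags(unsigned int flags, char *buf, size_t len)
{
	static const struct {
		unsigned int bit;
		const char *name;
	} tbl[] = {
		{ EX_NLM_F_REPLACE, "REPLACE" },
		{ EX_NLM_F_EXCL,    "EXCL" },
		{ EX_NLM_F_CREATE,  "CREATE" },
	};
	size_t used = 0;
	size_t i;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';

	for (i = 0; i < sizeof(tbl) / sizeof(tbl[0]); i++) {
		int n;

		if (!(flags & tbl[i].bit))
			continue;

		n = snprintf(buf + used, len - used, "%s%s",
			     used ? "|" : "", tbl[i].name);
		if (n < 0 || (size_t)n >= len - used)
			break; /* output truncated, stop here */
		used += (size_t)n;
	}
}
```

Note that the objection in the thread is about the argument position: the value replaced in the hunk is the filter handle, so the flags would have to be carried some other way.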


mlx5 "syndrome" errors in kernel log

2016-11-22 Thread Jesper Dangaard Brouer

Hi Saeed,

I'm seeing the dmesg errors below after pulling net-next at commit
e796f49d826aad. I was not seeing these errors before, when my tree was
based on top of commit 319b0534b95.

mlx5_core :02:00.1: mlx5_cmd_check:698:(pid 8788): ACCESS_REG(0x805) 
op_mod(0x1) failed, status bad parameter(0x3), syndrome (0x6c4d48)
mlx5_core :02:00.1: mlx5_cmd_check:698:(pid 8788): ACCESS_REG(0x805) 
op_mod(0x1) failed, status bad parameter(0x3), syndrome (0x6c4d48)
mlx5_core :02:00.0: mlx5_cmd_check:698:(pid 8788): ACCESS_REG(0x805) 
op_mod(0x1) failed, status bad parameter(0x3), syndrome (0x6c4d48)
mlx5_core :02:00.0: mlx5_cmd_check:698:(pid 8788): ACCESS_REG(0x805) 
op_mod(0x1) failed, status bad parameter(0x3), syndrome (0x6c4d48)
mlx5_core :02:00.1: mlx5_cmd_check:698:(pid 8788): ACCESS_REG(0x805) 
op_mod(0x1) failed, status bad parameter(0x3), syndrome (0x6c4d48)
mlx5_core :02:00.1: mlx5_cmd_check:698:(pid 8788): ACCESS_REG(0x805) 
op_mod(0x1) failed, status bad parameter(0x3), syndrome (0x6c4d48)
mlx5_core :02:00.0: mlx5_cmd_check:698:(pid 8788): ACCESS_REG(0x805) 
op_mod(0x1) failed, status bad parameter(0x3), syndrome (0x6c4d48)
mlx5_core :02:00.0: mlx5_cmd_check:698:(pid 8788): ACCESS_REG(0x805) 
op_mod(0x1) failed, status bad parameter(0x3), syndrome (0x6c4d48)


Listing my firmware version:

 $ ethtool -i mlx5p2
 driver: mlx5_core
 version: 3.0-1 (January 2015)
 firmware-version: 12.12.1240
 bus-info: :02:00.1
 supports-statistics: yes
 supports-test: no
 supports-eeprom-access: no
 supports-register-dump: no
 supports-priv-flags: yes

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer


git diff --stat 319b0534b95..e796f49d826aad drivers/net/ethernet/mellanox/mlx5/
 drivers/net/ethernet/mellanox/mlx5/core/cmd.c| 145 
++--
 drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c |  40 
-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c|  61 
+
 drivers/net/ethernet/mellanox/mlx5/core/en_stats.h   |  49 
+-
 drivers/net/ethernet/mellanox/mlx5/core/eq.c |  12 
 drivers/net/ethernet/mellanox/mlx5/core/main.c   |  37 
+++
 drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h  |   1 +
 drivers/net/ethernet/mellanox/mlx5/core/port.c   |  57 
+++
 8 files changed, 312 insertions(+), 90 deletions(-)


$ git shortlog  319b0534b95..e796f49d826aad drivers/net/ethernet/mellanox/mlx5/
Daniel Borkmann (3):
  bpf, mlx5: fix mlx5e_create_rq taking reference on prog
  bpf, mlx5: fix various refcount issues in mlx5e_xdp_set
  bpf, mlx5: drop priv->xdp_prog reference on netdev cleanup

Eric Dumazet (1):
  net/mlx5e: remove napi_hash_del() calls

Gal Pressman (1):
  net/mlx5e: Expose PCIe statistics to ethtool

Huy Nguyen (3):
  net/mlx5: Add handling for port module event
  net/mlx5e: Add port module event counters to ethtool stats
  net/mlx5: Set driver version into firmware

Mohamad Haj Yahia (1):
  net/mlx5: Make the command interface cache more flexible



$ git log --pretty=oneline   319b0534b95..e796f49d826aad 
drivers/net/ethernet/mellanox/mlx5/
a055c19be98bc065a4478663ba7f6833693b8958 bpf, mlx5: drop priv->xdp_prog 
reference on netdev cleanup
c54c06290428554bc0e26d58f21a7865cbe995af bpf, mlx5: fix various refcount issues 
in mlx5e_xdp_set
97bc402db7821259f6a722cb38e060aa9b35b6e8 bpf, mlx5: fix mlx5e_create_rq taking 
reference on prog
9c7262399ba12825f3ca4b00a76d8d5e77c720f5 net/mlx5e: Expose PCIe statistics to 
ethtool
012e50e109fd27ff989492ad74c50ca7ab21e6a1 net/mlx5: Set driver version into 
firmware
bedb7c909c1911270fcb084230245df4a00bd881 net/mlx5e: Add port module event 
counters to ethtool stats
d4eb4cd78b0774c7061db56844ed2ea7790cc77c net/mlx5: Add handling for port module 
event
0ac3ea70897fb9f84b620aeda074ecccf481629d net/mlx5: Make the command interface 
cache more flexible
d30d9ccbfac7cf9a12a088d57aaf0891732e2bca net/mlx5e: remove napi_hash_del() calls


[PATCH v2] net/phy: add trace events for mdio accesses

2016-11-22 Thread Uwe Kleine-König
Make it possible to generate trace events for mdio read and write accesses.

Signed-off-by: Uwe Kleine-König 
---
Changes since (implicit) v1:

 - make use of TRACE_EVENT_CONDITION

Alternatively to this patch the condition could be

+   TP_CONDITION(err == 0),

but then we'd need in the read callbacks:

+   trace_mdio_access(bus, 1, addr, regnum, retval, retval < 0 ? retval : 0);

or at least

+   trace_mdio_access(bus, 1, addr, regnum, retval, retval < 0);

which both look uglier IMHO.

Best regards
Uwe

 drivers/net/phy/mdio_bus.c  | 11 +++
 include/trace/events/mdio.h | 42 ++
 2 files changed, 53 insertions(+)
 create mode 100644 include/trace/events/mdio.h

diff --git a/drivers/net/phy/mdio_bus.c b/drivers/net/phy/mdio_bus.c
index 09deef4bed09..653d076eafe5 100644
--- a/drivers/net/phy/mdio_bus.c
+++ b/drivers/net/phy/mdio_bus.c
@@ -38,6 +38,9 @@
 
 #include 
 
+#define CREATE_TRACE_POINTS
+#include <trace/events/mdio.h>
+
 int mdiobus_register_device(struct mdio_device *mdiodev)
 {
if (mdiodev->bus->mdio_map[mdiodev->addr])
@@ -461,6 +464,8 @@ int mdiobus_read_nested(struct mii_bus *bus, int addr, u32 
regnum)
retval = bus->read(bus, addr, regnum);
mutex_unlock(&bus->mdio_lock);
 
+   trace_mdio_access(bus, 1, addr, regnum, retval, retval);
+
return retval;
 }
 EXPORT_SYMBOL(mdiobus_read_nested);
@@ -485,6 +490,8 @@ int mdiobus_read(struct mii_bus *bus, int addr, u32 regnum)
retval = bus->read(bus, addr, regnum);
mutex_unlock(&bus->mdio_lock);
 
+   trace_mdio_access(bus, 1, addr, regnum, retval, retval);
+
return retval;
 }
 EXPORT_SYMBOL(mdiobus_read);
@@ -513,6 +520,8 @@ int mdiobus_write_nested(struct mii_bus *bus, int addr, u32 
regnum, u16 val)
err = bus->write(bus, addr, regnum, val);
mutex_unlock(&bus->mdio_lock);
 
+   trace_mdio_access(bus, 0, addr, regnum, val, err);
+
return err;
 }
 EXPORT_SYMBOL(mdiobus_write_nested);
@@ -538,6 +547,8 @@ int mdiobus_write(struct mii_bus *bus, int addr, u32 
regnum, u16 val)
err = bus->write(bus, addr, regnum, val);
mutex_unlock(&bus->mdio_lock);
 
+   trace_mdio_access(bus, 0, addr, regnum, val, err);
+
return err;
 }
 EXPORT_SYMBOL(mdiobus_write);
diff --git a/include/trace/events/mdio.h b/include/trace/events/mdio.h
new file mode 100644
index ..468e2d095d19
--- /dev/null
+++ b/include/trace/events/mdio.h
@@ -0,0 +1,42 @@
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM mdio
+
+#if !defined(_TRACE_MDIO_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_MDIO_H
+
+#include <linux/tracepoint.h>
+
+TRACE_EVENT_CONDITION(mdio_access,
+
+   TP_PROTO(struct mii_bus *bus, int read,
+unsigned addr, unsigned regnum, u16 val, int err),
+
+   TP_ARGS(bus, read, addr, regnum, val, err),
+
+   TP_CONDITION(err >= 0),
+
+   TP_STRUCT__entry(
+   __array(char, busid, MII_BUS_ID_SIZE)
+   __field(int, read)
+   __field(unsigned, addr)
+   __field(unsigned, regnum)
+   __field(u16, val)
+   ),
+
+   TP_fast_assign(
+   strncpy(__entry->busid, bus->id, MII_BUS_ID_SIZE);
+   __entry->read = read;
+   __entry->addr = addr;
+   __entry->regnum = regnum;
+   __entry->val = val;
+   ),
+
+   TP_printk("%s %-5s phy:0x%02x reg:0x%02x val:0x%04hx",
+ __entry->busid, __entry->read ? "read" : "write",
+ __entry->addr, __entry->regnum, __entry->val)
+);
+
+#endif /* if !defined(_TRACE_MDIO_H) || defined(TRACE_HEADER_MULTI_READ) */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
-- 
2.10.2
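
To illustrate what the conditional trace event in this patch produces, here is a userspace sketch that applies the same condition (err >= 0) and the same format string as TP_printk above. ex_mdio_trace() is a hypothetical helper and the bus id "fixed-0" used in the note below is made up for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format one MDIO access the way the trace event would print it.
 * Returns 1 if a line was produced, 0 if the condition filtered it out.
 */
static int ex_mdio_trace(char *out, size_t len, const char *busid, int read,
			 unsigned addr, unsigned regnum, unsigned short val,
			 int err)
{
	assert(out != NULL && len > 0);

	/* Mirrors TP_CONDITION(err >= 0): failed accesses are not traced. */
	if (err < 0)
		return 0;

	snprintf(out, len, "%s %-5s phy:0x%02x reg:0x%02x val:0x%04hx",
		 busid, read ? "read" : "write", addr, regnum, val);
	return 1;
}
```

For a read, the return value is passed as both val and err, as in the patch: a successful read of 0x1234 from phy 1, register 2 on bus "fixed-0" yields "fixed-0 read  phy:0x01 reg:0x02 val:0x1234", while a negative return value produces no event.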



Re: [RFC PATCH net v2 2/3] dt: bindings: add ethernet phy eee-disable-advert option documentation

2016-11-22 Thread Jerome Brunet
On Mon, 2016-11-21 at 21:35 -0800, Florian Fainelli wrote:
> Le 21/11/2016 à 08:47, Andrew Lunn a écrit :
> > 
> > > 
> > > What I did not realize when doing this patch for the realtek
> > > driver is that there are already 6 valid modes defined in the
> > > kernel:
> > > 
> > > #define MDIO_EEE_100TX   MDIO_AN_EEE_ADV_100TX  /* 100TX EEE cap */
> > > #define MDIO_EEE_1000T   MDIO_AN_EEE_ADV_1000T  /* 1000T EEE cap */
> > > #define MDIO_EEE_10GT    0x0008                 /* 10GT EEE cap */
> > > #define MDIO_EEE_1000KX  0x0010                 /* 1000KX EEE cap */
> > > #define MDIO_EEE_10GKX4  0x0020                 /* 10G KX4 EEE cap */
> > > #define MDIO_EEE_10GKR   0x0040                 /* 10G KR EEE cap */
> > > 
> > > I took care of only 2 in the case of realtek.c since it only
> > > supports MDIO_EEE_100TX and MDIO_EEE_1000T.
> > > 
> > > Defining a property for each is certainly doable but it does not
> > > look very nice either. If it extends in the future, it will get
> > > even messier, especially if you want to disable everything.
> > 
> > Yes, agreed.
> 
> One risk with defining a group of advertisement capabilities
> (in the form of a bitmask, for instance) to enable/disable is that
> we end up with the Device Tree containing some kind of configuration
> policy as opposed to just flagging particular hardware features as
> broken.

The code proposed only allows disabling EEE advertisement (not
enabling it), so we should not see it used as a configuration policy
in DT. To make this more explicit, I could replace the property
"eee-advert-disable" with "eee-broken"?

> 
> Fortunately, there does not seem to be a ton of PHYs out there which
> require EEE

It is quite difficult to have the real picture here because some PHYs
have EEE disabled by default and you have to explicitly enable it.
I have no idea of the ratio between the 2 phy policies.

> to be disabled to function properly so having individual
> properties vs. bitmasks/groups is kind of speculative here.

In the particular instance of the OdroidC2, disabling EEE for GbE only
is enough. However, if you have a PHY with broken EEE, I think it is
likely that you might want to disable all (supported) EEE modes. That's
the reason why I prefer a bitmask. I agree both are functionally
similar; this is kind of a cosmetic debate.

> 
> Another approach to solving this problem could be to register a PHY
> fixup which disables EEE at the PHY level, and which is only called
> for specific boards affected by this problem
> (of_machine_is_compatible()).
> This code can live in arch/*/* when that is possible,

That's something I was looking at, but we don't have these files anymore
on ARM64 (looking at your comment, you already know this).

> or it can just be
> somewhere where it is relevant, e.g. in the PHY driver for instance
> (similarly to how PCI fixups are done).

Do you prefer having board-specific code inside a generic driver over
having the setting live in DT? Peppe told me they also had a few
platforms with similar issues. The point is that this could be useful to
other people, so it could spread and grow a bit.

I would prefer having this in the DT, but I can definitely do it in the
PHY driver with of_machine_is_compatible() and a registered fixup if
this is what you prefer/want.

Cheers
Jerome
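
The bitmask variant argued for above boils down to clearing the broken modes from what the PHY would otherwise advertise. A minimal sketch, assuming the mask comes from some board description: the EX_EEE_* values mirror the MDIO_EEE_* definitions quoted earlier (100TX and 1000T via MDIO_AN_EEE_ADV_100TX and MDIO_AN_EEE_ADV_1000T in <linux/mdio.h>), and ex_eee_advert() is a hypothetical helper, not code from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Values mirror the MDIO_EEE_* capability bits in <linux/mdio.h>. */
#define EX_EEE_100TX  0x0002
#define EX_EEE_1000T  0x0004
#define EX_EEE_10GT   0x0008
#define EX_EEE_1000KX 0x0010
#define EX_EEE_10GKX4 0x0020
#define EX_EEE_10GKR  0x0040

static_assert((EX_EEE_100TX & EX_EEE_1000T) == 0, "mode bits must be distinct");

/* Advertise every supported EEE mode except those flagged as broken. */
static uint16_t ex_eee_advert(uint16_t supported, uint16_t broken)
{
	return (uint16_t)(supported & ~broken);
}
```

A PHY supporting 100TX and 1000T with only 1000T flagged as broken would then advertise 100TX alone, and a mask covering all six bits disables everything in one step, which is the case that is awkward with one property per mode.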




net/icmp: null-ptr-deref in icmp6_send

2016-11-22 Thread Andrey Konovalov
Hi,

I've got the following error report while fuzzing the kernel with syzkaller.

It seems that skb_dst(skb) may end up being NULL.

As far as I can see the bug was introduced in commit 5d41ce29e ("net:
icmp6_send should use dst dev to determine L3 domain").
ICMP v4 probably has a similar issue due to 9d1a6c4ea ("net:
icmp_route_lookup should use rt dev to determine L3 domain").

On commit 9c763584b7c8911106bb77af7e648bef09af9d80 (4.9-rc6, Nov 20).

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault:  [#1] SMP KASAN
Modules linked in:
CPU: 0 PID: 3859 Comm: a.out Not tainted 4.9.0-rc6+ #429
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: 8800666d4200 task.stack: 880067348000
RIP: 0010:[]  []
icmp6_send+0x5fc/0x1e30 net/ipv6/icmp.c:451
RSP: 0018:88006734f2c0  EFLAGS: 00010206
RAX: 8800666d4200 RBX:  RCX: 
RDX:  RSI: dc00 RDI: 0018
RBP: 88006734f630 R08: 880064138418 R09: 0003
R10: dc00 R11: 0005 R12: 
R13: 84e7e200 R14: 880064138484 R15: 8800641383c0
FS:  7fb3887a07c0() GS:88006cc0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 2000 CR3: 6b04 CR4: 06f0
Stack:
 8800666d4200 8800666d49f8 8800666d4200 84c02460
 8800666d4a1a 11000ccdaa2f 88006734f498 0046
 88006734f440 832f4269 880064ba7456 
Call Trace:
 [] icmpv6_param_prob+0x2c/0x40 net/ipv6/icmp.c:557
 [< inline >] ip6_tlvopt_unknown net/ipv6/exthdrs.c:88
 [] ip6_parse_tlv+0x555/0x670 net/ipv6/exthdrs.c:157
 [] ipv6_parse_hopopts+0x199/0x460 net/ipv6/exthdrs.c:663
 [] ipv6_rcv+0xfa3/0x1dc0 net/ipv6/ip6_input.c:191
 [] __netif_receive_skb_core+0x187b/0x2a10 net/core/dev.c:4208
 [] __netif_receive_skb+0x2a/0x170 net/core/dev.c:4246
 [] netif_receive_skb_internal+0x1b3/0x390 net/core/dev.c:4274
 [] netif_receive_skb+0x48/0x250 net/core/dev.c:4298
 [] tun_get_user+0xbde/0x2890 drivers/net/tun.c:1308
 [] tun_chr_write_iter+0xda/0x190 drivers/net/tun.c:1332
 [< inline >] new_sync_write fs/read_write.c:499
 [] __vfs_write+0x334/0x570 fs/read_write.c:512
 [] vfs_write+0x17b/0x500 fs/read_write.c:560
 [< inline >] SYSC_write fs/read_write.c:607
 [] SyS_write+0xd4/0x1a0 fs/read_write.c:599
 [] entry_SYSCALL_64_fastpath+0x1f/0xc2
arch/x86/entry/entry_64.S:209
Code: 67 58 41 f6 c4 01 0f 85 d4 07 00 00 49 83 e4 fe e8 ea 5e fc fd
49 8d 7c 24 18 49 ba 00 00 00 00 00 fc ff df 49 89 f9 49 c1 e9 03 <43>
80 3c 11 00 0f 85 c5 17 00 00 4d 8b 64 24 18 65 ff 05 cd 3c
RIP  [] icmp6_send+0x5fc/0x1e30 net/ipv6/icmp.c:451
 RSP 
---[ end trace 12dd736536064d71 ]---
Kernel panic - not syncing: Fatal exception in interrupt
Kernel Offset: disabled
---[ end Kernel panic - not syncing: Fatal exception in interrupt
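
The failure mode reported here, dereferencing a route entry that may be NULL in order to find the device, can be shown with a small model. Everything below is hypothetical (the ex_* structures only imitate the relationship between a packet, its route and a device), and falling back to the receiving device is one possible defensive pattern, not a statement of how the issue is fixed upstream.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature versions of the objects involved. */
struct ex_dev { int ifindex; };
struct ex_dst { struct ex_dev *dev; };
struct ex_skb {
	struct ex_dst *dst; /* may be NULL, as observed in the report */
	struct ex_dev *dev; /* device the packet arrived on */
};

/* Pick the device used to determine the L3 domain without a NULL deref. */
static const struct ex_dev *ex_l3_dev(const struct ex_skb *skb)
{
	assert(skb != NULL);

	/* Prefer the route's device when a route is attached. */
	if (skb->dst != NULL && skb->dst->dev != NULL)
		return skb->dst->dev;

	/* Assumed fallback: the device the packet was received on. */
	return skb->dev;
}
```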


[patch net-next 0/2] mlxsw: core: Implement thermal zone

2016-11-22 Thread Jiri Pirko
From: Jiri Pirko 

Implement thermal zone for mlxsw based HW.
The first patch is just a register dependency for the second patch.

Ivan Vecera (1):
  mlxsw: core: Implement thermal zone

Jiri Pirko (1):
  mlxsw: reg: Add Management Fan Speed Limit register

 drivers/net/ethernet/mellanox/mlxsw/Kconfig|   9 +
 drivers/net/ethernet/mellanox/mlxsw/Makefile   |   1 +
 drivers/net/ethernet/mellanox/mlxsw/core.c |   8 +
 drivers/net/ethernet/mellanox/mlxsw/core.h |  24 ++
 drivers/net/ethernet/mellanox/mlxsw/core_thermal.c | 442 +
 drivers/net/ethernet/mellanox/mlxsw/reg.h  |  49 +++
 6 files changed, 533 insertions(+)
 create mode 100644 drivers/net/ethernet/mellanox/mlxsw/core_thermal.c

-- 
2.7.4



[patch net-next 1/2] mlxsw: reg: Add Management Fan Speed Limit register

2016-11-22 Thread Jiri Pirko
From: Jiri Pirko 

The MFSL register is used to configure the fan speed event / interrupt
notification mechanism. Fan speed thresholds are defined for both
under-speed and over-speed.

Signed-off-by: Jiri Pirko 
Reviewed-by: Ido Schimmel 
---
 drivers/net/ethernet/mellanox/mlxsw/reg.h | 49 +++
 1 file changed, 49 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/reg.h 
b/drivers/net/ethernet/mellanox/mlxsw/reg.h
index edad7cb..2618e9c 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/reg.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/reg.h
@@ -4518,6 +4518,54 @@ static inline void mlxsw_reg_mfsm_pack(char *payload, u8 
tacho)
mlxsw_reg_mfsm_tacho_set(payload, tacho);
 }
 
+/* MFSL - Management Fan Speed Limit Register
+ * --
+ * The Fan Speed Limit register is used to configure the fan speed
+ * event / interrupt notification mechanism. Fan speed thresholds are
+ * defined for both under-speed and over-speed.
+ */
+#define MLXSW_REG_MFSL_ID 0x9004
+#define MLXSW_REG_MFSL_LEN 0x0C
+
+MLXSW_REG_DEFINE(mfsl, MLXSW_REG_MFSL_ID, MLXSW_REG_MFSL_LEN);
+
+/* reg_mfsl_tacho
+ * Fan tachometer index.
+ * Access: Index
+ */
+MLXSW_ITEM32(reg, mfsl, tacho, 0x00, 24, 4);
+
+/* reg_mfsl_tach_min
+ * Tachometer minimum value (minimum RPM).
+ * Access: RW
+ */
+MLXSW_ITEM32(reg, mfsl, tach_min, 0x04, 0, 16);
+
+/* reg_mfsl_tach_max
+ * Tachometer maximum value (maximum RPM).
+ * Access: RW
+ */
+MLXSW_ITEM32(reg, mfsl, tach_max, 0x08, 0, 16);
+
+static inline void mlxsw_reg_mfsl_pack(char *payload, u8 tacho,
+  u16 tach_min, u16 tach_max)
+{
+   MLXSW_REG_ZERO(mfsl, payload);
+   mlxsw_reg_mfsl_tacho_set(payload, tacho);
+   mlxsw_reg_mfsl_tach_min_set(payload, tach_min);
+   mlxsw_reg_mfsl_tach_max_set(payload, tach_max);
+}
+
+static inline void mlxsw_reg_mfsl_unpack(char *payload, u8 tacho,
+u16 *p_tach_min, u16 *p_tach_max)
+{
+   if (p_tach_min)
+   *p_tach_min = mlxsw_reg_mfsl_tach_min_get(payload);
+
+   if (p_tach_max)
+   *p_tach_max = mlxsw_reg_mfsl_tach_max_get(payload);
+}
+
 /* MTCAP - Management Temperature Capabilities
  * ---
  * This register exposes the capabilities of the device and
@@ -5228,6 +5276,7 @@ static const struct mlxsw_reg_info *mlxsw_reg_infos[] = {
MLXSW_REG(mfcr),
MLXSW_REG(mfsc),
MLXSW_REG(mfsm),
+   MLXSW_REG(mfsl),
MLXSW_REG(mtcap),
MLXSW_REG(mtmp),
MLXSW_REG(mpat),
-- 
2.7.4
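
The register layout described by the MLXSW_ITEM32() lines above can be illustrated outside the kernel. The following is only a sketch, not the driver's code: it assumes the payload is a sequence of big-endian 32-bit words and that MLXSW_ITEM32(reg, mfsl, name, offset, shift, size) means "a size-bit field at bit shift of the word at byte offset". The helpers put_be32(), get_be32(), item32_set(), item32_get() and mfsl_pack() are hypothetical names made up for this illustration.

```c
#include <stdint.h>
#include <string.h>

#define MFSL_LEN 0x0C /* payload length in bytes, as in the patch */

/* Store a 32-bit value in big-endian byte order (assumed wire layout). */
static void put_be32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
}

/* Load a 32-bit value stored in big-endian byte order. */
static uint32_t get_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Set a size-bit field at bit `shift` of the word at byte `offset`. */
static void item32_set(uint8_t *payload, unsigned int offset,
		       unsigned int shift, unsigned int size, uint32_t val)
{
	uint32_t field = size >= 32 ? 0xffffffffu : ((1u << size) - 1u);
	uint32_t word = get_be32(payload + offset);

	word &= ~(field << shift);
	word |= (val & field) << shift;
	put_be32(payload + offset, word);
}

/* Read back a size-bit field at bit `shift` of the word at byte `offset`. */
static uint32_t item32_get(const uint8_t *payload, unsigned int offset,
			   unsigned int shift, unsigned int size)
{
	uint32_t field = size >= 32 ? 0xffffffffu : ((1u << size) - 1u);

	return (get_be32(payload + offset) >> shift) & field;
}

/* Hypothetical stand-in for mlxsw_reg_mfsl_pack(): zero, then set fields. */
static void mfsl_pack(uint8_t *payload, uint8_t tacho,
		      uint16_t tach_min, uint16_t tach_max)
{
	memset(payload, 0, MFSL_LEN);
	item32_set(payload, 0x00, 24, 4, tacho);     /* tacho index */
	item32_set(payload, 0x04, 0, 16, tach_min);  /* minimum RPM */
	item32_set(payload, 0x08, 0, 16, tach_max);  /* maximum RPM */
}
```

Under these assumptions, packing tacho 3 with limits 0x1234 and 0xABCD puts 0x03 in the first payload byte and the two limits in the low halves of the second and third words.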



[patch net-next 2/2] mlxsw: core: Implement thermal zone

2016-11-22 Thread Jiri Pirko
From: Ivan Vecera 

Implement thermal zone for mlxsw based HW. It uses the temperature sensor
provided by the ASIC (the same one as the mlxsw hwmon interface) to report
the current temperature to the thermal core. The ASIC's PWM is then used to
control the speed of system fans registered as cooling devices.

Signed-off-by: Ivan Vecera 
Reviewed-by: Ido Schimmel 
Signed-off-by: Jiri Pirko 
---
 drivers/net/ethernet/mellanox/mlxsw/Kconfig|   9 +
 drivers/net/ethernet/mellanox/mlxsw/Makefile   |   1 +
 drivers/net/ethernet/mellanox/mlxsw/core.c |   8 +
 drivers/net/ethernet/mellanox/mlxsw/core.h |  24 ++
 drivers/net/ethernet/mellanox/mlxsw/core_thermal.c | 442 +
 5 files changed, 484 insertions(+)
 create mode 100644 drivers/net/ethernet/mellanox/mlxsw/core_thermal.c

diff --git a/drivers/net/ethernet/mellanox/mlxsw/Kconfig 
b/drivers/net/ethernet/mellanox/mlxsw/Kconfig
index c9822e6..95ae4c0 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/Kconfig
+++ b/drivers/net/ethernet/mellanox/mlxsw/Kconfig
@@ -19,6 +19,15 @@ config MLXSW_CORE_HWMON
---help---
  Say Y here if you want to expose HWMON interface on mlxsw devices.
 
+config MLXSW_CORE_THERMAL
+   bool "Thermal zone support for Mellanox Technologies Switch ASICs"
+   depends on MLXSW_CORE && THERMAL
+   depends on !(MLXSW_CORE=y && THERMAL=m)
+   default y
+   ---help---
+Say Y here if you want to automatically control fan speed according
+to the ambient temperature reported by the ASIC.
+
 config MLXSW_PCI
tristate "PCI bus implementation for Mellanox Technologies Switch ASICs"
depends on PCI && HAS_DMA && HAS_IOMEM && MLXSW_CORE
diff --git a/drivers/net/ethernet/mellanox/mlxsw/Makefile 
b/drivers/net/ethernet/mellanox/mlxsw/Makefile
index 2722942..fe8dadb 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/Makefile
+++ b/drivers/net/ethernet/mellanox/mlxsw/Makefile
@@ -1,6 +1,7 @@
 obj-$(CONFIG_MLXSW_CORE)   += mlxsw_core.o
 mlxsw_core-objs:= core.o
 mlxsw_core-$(CONFIG_MLXSW_CORE_HWMON) += core_hwmon.o
+mlxsw_core-$(CONFIG_MLXSW_CORE_THERMAL) += core_thermal.o
 obj-$(CONFIG_MLXSW_PCI)+= mlxsw_pci.o
 mlxsw_pci-objs := pci.o
 obj-$(CONFIG_MLXSW_I2C)+= mlxsw_i2c.o
diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.c 
b/drivers/net/ethernet/mellanox/mlxsw/core.c
index 763752f..bcd7251 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.c
@@ -131,6 +131,7 @@ struct mlxsw_core {
} lag;
struct mlxsw_res res;
struct mlxsw_hwmon *hwmon;
+   struct mlxsw_thermal *thermal;
struct mlxsw_core_port ports[MLXSW_PORT_MAX_PORTS];
unsigned long driver_priv[0];
/* driver_priv has to be always the last item */
@@ -1162,6 +1163,11 @@ int mlxsw_core_bus_device_register(const struct 
mlxsw_bus_info *mlxsw_bus_info,
if (err)
goto err_hwmon_init;
 
+   err = mlxsw_thermal_init(mlxsw_core, mlxsw_bus_info,
+&mlxsw_core->thermal);
+   if (err)
+   goto err_thermal_init;
+
if (mlxsw_driver->init) {
err = mlxsw_driver->init(mlxsw_core, mlxsw_bus_info);
if (err)
@@ -1178,6 +1184,7 @@ int mlxsw_core_bus_device_register(const struct 
mlxsw_bus_info *mlxsw_bus_info,
if (mlxsw_core->driver->fini)
mlxsw_core->driver->fini(mlxsw_core);
 err_driver_init:
+err_thermal_init:
 err_hwmon_init:
devlink_unregister(devlink);
 err_devlink_register:
@@ -1204,6 +1211,7 @@ void mlxsw_core_bus_device_unregister(struct mlxsw_core 
*mlxsw_core)
mlxsw_core_debugfs_fini(mlxsw_core);
if (mlxsw_core->driver->fini)
mlxsw_core->driver->fini(mlxsw_core);
+   mlxsw_thermal_fini(mlxsw_core->thermal);
devlink_unregister(devlink);
mlxsw_emad_fini(mlxsw_core);
mlxsw_core->bus->fini(mlxsw_core->bus_priv);
diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.h 
b/drivers/net/ethernet/mellanox/mlxsw/core.h
index f7a4d83..3de8955 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.h
@@ -321,4 +321,28 @@ static inline int mlxsw_hwmon_init(struct mlxsw_core 
*mlxsw_core,
 
 #endif
 
+struct mlxsw_thermal;
+
+#ifdef CONFIG_MLXSW_CORE_THERMAL
+
+int mlxsw_thermal_init(struct mlxsw_core *mlxsw_core,
+  const struct mlxsw_bus_info *mlxsw_bus_info,
+  struct mlxsw_thermal **p_thermal);
+void mlxsw_thermal_fini(struct mlxsw_thermal *thermal);
+
+#else
+
+static inline int mlxsw_thermal_init(struct mlxsw_core *mlxsw_core,
+const struct mlxsw_bus_info 
*mlxsw_bus_info,
+struct mlxsw_thermal **p_thermal)
+{
+   return 0;
+}
+
+static inline void mlxsw_thermal_fini(struct mlxsw_thermal *thermal)
+{
+}
+
+#endif
+
 #e

[PATCH] net: dsa: mv88e6xxx: add MV88E6097 switch

2016-11-22 Thread Stefan Eichenberger
Add support for the MV88E6097 switch. The change was tested on an Armada
based platform with a MV88E6097 switch.

Signed-off-by: Stefan Eichenberger 
---
 drivers/net/dsa/mv88e6xxx/chip.c  | 19 +++
 drivers/net/dsa/mv88e6xxx/mv88e6xxx.h |  2 ++
 2 files changed, 21 insertions(+)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 5a9729b..20d6fb5 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -3213,6 +3213,12 @@ static const struct mv88e6xxx_ops mv88e6095_ops = {
.phy_write = mv88e6xxx_phy_ppu_write,
 };
 
+static const struct mv88e6xxx_ops mv88e6097_ops = {
+   .set_switch_mac = mv88e6xxx_g2_set_switch_mac,
+   .phy_read = mv88e6xxx_g2_smi_phy_read,
+   .phy_write = mv88e6xxx_g2_smi_phy_write,
+};
+
 static const struct mv88e6xxx_ops mv88e6123_ops = {
.set_switch_mac = mv88e6xxx_g2_set_switch_mac,
.phy_read = mv88e6xxx_read,
@@ -3342,6 +3348,19 @@ static const struct mv88e6xxx_info mv88e6xxx_table[] = {
.ops = &mv88e6095_ops,
},
 
+   [MV88E6097] = {
+   .prod_num = PORT_SWITCH_ID_PROD_NUM_6097,
+   .family = MV88E6XXX_FAMILY_6097,
+   .name = "Marvell 88E6097/88E6097F",
+   .num_databases = 4096,
+   .num_ports = 11,
+   .port_base_addr = 0x10,
+   .global1_addr = 0x1b,
+   .age_time_coeff = 15000,
+   .flags = MV88E6XXX_FLAGS_FAMILY_6097,
+   .ops = &mv88e6097_ops,
+   },
+
[MV88E6123] = {
.prod_num = PORT_SWITCH_ID_PROD_NUM_6123,
.family = MV88E6XXX_FAMILY_6165,
diff --git a/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h 
b/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
index e572121..42e28f8 100644
--- a/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
+++ b/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
@@ -74,6 +74,7 @@
 #define PORT_SWITCH_ID 0x03
 #define PORT_SWITCH_ID_PROD_NUM_6085   0x04a
 #define PORT_SWITCH_ID_PROD_NUM_6095   0x095
+#define PORT_SWITCH_ID_PROD_NUM_6097   0x099
 #define PORT_SWITCH_ID_PROD_NUM_6131   0x106
 #define PORT_SWITCH_ID_PROD_NUM_6320   0x115
 #define PORT_SWITCH_ID_PROD_NUM_6123   0x121
@@ -353,6 +354,7 @@
 enum mv88e6xxx_model {
MV88E6085,
MV88E6095,
+   MV88E6097,
MV88E6123,
MV88E6131,
MV88E6161,
-- 
2.9.3



[PATCH] net: dsa: mv88e6xxx: egress all frames

2016-11-22 Thread Stefan Eichenberger
Egress multicast and egress unicast are only enabled for CPU/DSA ports,
but for switching operation it seems they should be enabled for all ports.
Am I missing something here?

I did the following test:
brctl addbr br0
brctl addif br0 lan0
brctl addif br0 lan1

In this scenario the unicast and multicast packets were not forwarded,
therefore ARP requests were not resolved, and no connection could be
established.

If no bridge is configured we do not forward unicast and multicast
packets because the VLAN mapping is active.

Signed-off-by: Stefan Eichenberger 
---
 drivers/net/dsa/mv88e6xxx/chip.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 883fd98..fe76372 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -2506,15 +2506,14 @@ static int mv88e6xxx_setup_port(struct mv88e6xxx_chip 
*chip, int port)
mv88e6xxx_6185_family(chip) || mv88e6xxx_6320_family(chip))
reg = PORT_CONTROL_IGMP_MLD_SNOOP |
PORT_CONTROL_USE_TAG | PORT_CONTROL_USE_IP |
-   PORT_CONTROL_STATE_FORWARDING;
+   PORT_CONTROL_STATE_FORWARDING |
+   PORT_CONTROL_FORWARD_UNKNOWN_MC | PORT_CONTROL_FORWARD_UNKNOWN;
if (dsa_is_cpu_port(ds, port)) {
if (mv88e6xxx_has(chip, MV88E6XXX_FLAG_EDSA))
-   reg |= PORT_CONTROL_FRAME_ETHER_TYPE_DSA |
-   PORT_CONTROL_FORWARD_UNKNOWN_MC;
+   reg |= PORT_CONTROL_FRAME_ETHER_TYPE_DSA;
else
reg |= PORT_CONTROL_DSA_TAG;
-   reg |= PORT_CONTROL_EGRESS_ADD_TAG |
-   PORT_CONTROL_FORWARD_UNKNOWN;
+   reg |= PORT_CONTROL_EGRESS_ADD_TAG;
}
if (dsa_is_dsa_port(ds, port)) {
if (mv88e6xxx_6095_family(chip) ||
-- 
2.9.3



[PATCH v8] mac80211: multicast to unicast conversion

2016-11-22 Thread Michael Braun
Add the ability for an AP (and associated VLANs) to perform
multicast-to-unicast conversion for ARP, IPv4 and IPv6 frames
(possibly within 802.1Q). If enabled, such frames are to be sent
to each station separately, with the DA replaced by their own
MAC address rather than the group address.

Note that this may break certain expectations of the receiver,
such as the ability to drop unicast IP packets received within
multicast L2 frames, or the ability to not send ICMP destination
unreachable messages for packets received in L2 multicast (which
is required, but the receiver can't tell the difference if this
new option is enabled.)

This also doesn't implement the 802.11 DMS (directed multicast
service).

Signed-off-by: Michael Braun 

--
v8:
  - remove superfluous check
  - change return type to bool
v7:
  - avoid recursion
  - style and description
v5:
  - rename bss->unicast to bss->multicast_to_unicast
  - access sdata->bss only after checking iftype
v4:
  - rename MULTICAST_TO_UNICAST to MULTICAST_TO_UNICAST
v3: fix compile error for trace.h
v2: add nl80211 toggle
rename tx_dnat to change_da
change int to bool unicast
---
 net/mac80211/cfg.c|  12 +
 net/mac80211/debugfs_netdev.c |   3 ++
 net/mac80211/ieee80211_i.h|   1 +
 net/mac80211/tx.c | 122 +-
 4 files changed, 137 insertions(+), 1 deletion(-)

diff --git a/net/mac80211/cfg.c b/net/mac80211/cfg.c
index 1edb017..7de342a 100644
--- a/net/mac80211/cfg.c
+++ b/net/mac80211/cfg.c
@@ -3345,6 +3345,17 @@ static int ieee80211_del_tx_ts(struct wiphy *wiphy, 
struct net_device *dev,
return -ENOENT;
 }
 
+static int ieee80211_set_multicast_to_unicast(struct wiphy *wiphy,
+ struct net_device *dev,
+ const bool enabled)
+{
+   struct ieee80211_sub_if_data *sdata = IEEE80211_DEV_TO_SUB_IF(dev);
+
+   sdata->u.ap.multicast_to_unicast = enabled;
+
+   return 0;
+}
+
 const struct cfg80211_ops mac80211_config_ops = {
.add_virtual_intf = ieee80211_add_iface,
.del_virtual_intf = ieee80211_del_iface,
@@ -3430,4 +3441,5 @@ const struct cfg80211_ops mac80211_config_ops = {
.set_ap_chanwidth = ieee80211_set_ap_chanwidth,
.add_tx_ts = ieee80211_add_tx_ts,
.del_tx_ts = ieee80211_del_tx_ts,
+   .set_multicast_to_unicast = ieee80211_set_multicast_to_unicast,
 };
diff --git a/net/mac80211/debugfs_netdev.c b/net/mac80211/debugfs_netdev.c
index ed7bff4..509c6c3 100644
--- a/net/mac80211/debugfs_netdev.c
+++ b/net/mac80211/debugfs_netdev.c
@@ -487,6 +487,8 @@ static ssize_t ieee80211_if_fmt_num_buffered_multicast(
 }
 IEEE80211_IF_FILE_R(num_buffered_multicast);
 
+IEEE80211_IF_FILE(multicast_to_unicast, u.ap.multicast_to_unicast, HEX);
+
 /* IBSS attributes */
 static ssize_t ieee80211_if_fmt_tsf(
const struct ieee80211_sub_if_data *sdata, char *buf, int buflen)
@@ -642,6 +644,7 @@ static void add_ap_files(struct ieee80211_sub_if_data 
*sdata)
DEBUGFS_ADD(dtim_count);
DEBUGFS_ADD(num_buffered_multicast);
DEBUGFS_ADD_MODE(tkip_mic_test, 0200);
+   DEBUGFS_ADD_MODE(multicast_to_unicast, 0600);
 }
 
 static void add_vlan_files(struct ieee80211_sub_if_data *sdata)
diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h
index 70c0963..84374ed 100644
--- a/net/mac80211/ieee80211_i.h
+++ b/net/mac80211/ieee80211_i.h
@@ -293,6 +293,7 @@ struct ieee80211_if_ap {
 driver_smps_mode; /* smps mode request */
 
struct work_struct request_smps_work;
+   bool multicast_to_unicast;
 };
 
 struct ieee80211_if_wds {
diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index c3ce86e..5ed 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -3418,6 +3419,115 @@ void __ieee80211_subif_start_xmit(struct sk_buff *skb,
rcu_read_unlock();
 }
 
+static int ieee80211_change_da(struct sk_buff *skb, struct sta_info *sta)
+{
+   struct ethhdr *eth;
+   int err;
+
+   err = skb_ensure_writable(skb, ETH_HLEN);
+   if (unlikely(err))
+   return err;
+
+   eth = (void *)skb->data;
+   ether_addr_copy(eth->h_dest, sta->sta.addr);
+
+   return 0;
+}
+
+static inline bool
+ieee80211_multicast_to_unicast(struct sk_buff *skb, struct net_device *dev)
+{
+   struct ieee80211_sub_if_data *sdata = IEEE80211_DEV_TO_SUB_IF(dev);
+   const struct ethhdr *eth = (void *)skb->data;
+   const struct vlan_ethhdr *ethvlan = (void *)skb->data;
+   u16 ethertype;
+
+   if (likely(!is_multicast_ether_addr(eth->h_dest)))
+   return 0;
+
+   switch (sdata->vif.type) {
+   case NL80211_IFTYPE_AP_VLAN:
+   if (sdata->u.vlan.sta)
+   return 0;
+   if (sdata->wdev.use_4addr)
+   
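
The conversion described in the commit message (one copy per station, with the group DA replaced by the station's MAC address) can be sketched in plain user-space C. This is only an illustration under stated assumptions, not the mac80211 implementation: frames are flat byte buffers starting with the destination address, and is_multicast_addr() and mc_to_uc() are hypothetical helpers invented here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6

/* A MAC address is a group address if the low bit of octet 0 is set. */
static bool is_multicast_addr(const uint8_t *addr)
{
	return (addr[0] & 0x01) != 0;
}

/*
 * Write one copy of `frame` per station into `out`, back to back, with the
 * destination address (first ETH_ALEN bytes) replaced by the station's MAC.
 * Returns the number of copies written; 0 if the frame is not multicast.
 */
static size_t mc_to_uc(const uint8_t *frame, size_t len,
		       const uint8_t (*stations)[ETH_ALEN], size_t n_sta,
		       uint8_t *out, size_t out_cap)
{
	size_t i;

	if (len < ETH_ALEN || !is_multicast_addr(frame))
		return 0;

	for (i = 0; i < n_sta; i++) {
		uint8_t *copy = out + i * len;

		if ((i + 1) * len > out_cap)
			break; /* no room for another copy */
		memcpy(copy, frame, len);
		memcpy(copy, stations[i], ETH_ALEN); /* rewrite the DA */
	}
	return i;
}
```

A frame addressed to 33:33:00:00:00:01 and two associated stations would yield two copies that differ from the original only in their first six bytes, while a unicast frame is left alone.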

[PATCH v5] iproute2: macvlan: add "source" mode

2016-11-22 Thread Michael Braun
Adjust the iproute2 utility to support the new macvlan link type mode
called "source".

Example of commands that can be applied:
  ip link add link eth0 name macvlan0 type macvlan mode source
  ip link set link dev macvlan0 type macvlan macaddr add 00:11:11:11:11:11
  ip link set link dev macvlan0 type macvlan macaddr del 00:11:11:11:11:11
  ip link set link dev macvlan0 type macvlan macaddr flush
  ip -details link show dev macvlan0

Based on previous work of Stefan Gula 

Signed-off-by: Michael Braun 

Cc: ste...@gmail.com

v5:
 - rebase and fix checkpatch

v4:
 - add MACADDR_SET support
 - skip FLAG_UNICAST / FLAG_UNICAST_ALL as this is not upstream
 - fix man page
---
 ip/iplink_macvlan.c   | 124 +++---
 man/man8/ip-link.8.in |  42 -
 2 files changed, 158 insertions(+), 8 deletions(-)

diff --git a/ip/iplink_macvlan.c b/ip/iplink_macvlan.c
index 83ff961..b9a146f 100644
--- a/ip/iplink_macvlan.c
+++ b/ip/iplink_macvlan.c
@@ -15,6 +15,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "rt_names.h"
 #include "utils.h"
@@ -29,7 +30,11 @@
 static void print_explain(struct link_util *lu, FILE *f)
 {
fprintf(f,
-   "Usage: ... %s mode { private | vepa | bridge | passthru 
[nopromisc] }\n",
+   "Usage: ... %s mode MODE [flag MODE_FLAG] MODE_OPTS\n"
+   "MODE: private | vepa | bridge | passthru | source\n"
+   "MODE_FLAG: null | nopromisc\n"
+   "MODE_OPTS: for mode \"source\":\n"
+   "\tmacaddr { { add | del }  | set [  [ 
  ... ] ] | flush }\n",
lu->id
);
 }
@@ -43,7 +48,15 @@ static void explain(struct link_util *lu)
 static int mode_arg(const char *arg)
 {
fprintf(stderr,
-   "Error: argument of \"mode\" must be \"private\", \"vepa\", 
\"bridge\" or \"passthru\", not \"%s\"\n",
+   "Error: argument of \"mode\" must be \"private\", \"vepa\", 
\"bridge\", \"passthru\" or \"source\", not \"%s\"\n",
+   arg);
+   return -1;
+}
+
+static int flag_arg(const char *arg)
+{
+   fprintf(stderr,
+   "Error: argument of \"flag\" must be \"nopromisc\" or \"null\", 
not \"%s\"\n",
arg);
return -1;
 }
@@ -53,6 +66,10 @@ static int macvlan_parse_opt(struct link_util *lu, int argc, 
char **argv,
 {
__u32 mode = 0;
__u16 flags = 0;
+   __u32 mac_mode = 0;
+   int has_flags = 0;
+   char mac[ETH_ALEN];
+   struct rtattr *nmac;
 
while (argc > 0) {
if (matches(*argv, "mode") == 0) {
@@ -66,10 +83,72 @@ static int macvlan_parse_opt(struct link_util *lu, int 
argc, char **argv,
mode = MACVLAN_MODE_BRIDGE;
else if (strcmp(*argv, "passthru") == 0)
mode = MACVLAN_MODE_PASSTHRU;
+   else if (strcmp(*argv, "source") == 0)
+   mode = MACVLAN_MODE_SOURCE;
else
return mode_arg(*argv);
+   } else if (matches(*argv, "flag") == 0) {
+   NEXT_ARG();
+
+   if (strcmp(*argv, "nopromisc") == 0)
+   flags |= MACVLAN_FLAG_NOPROMISC;
+   else if (strcmp(*argv, "null") == 0)
+   flags |= 0;
+   else
+   return flag_arg(*argv);
+
+   has_flags = 1;
+
+   } else if (matches(*argv, "macaddr") == 0) {
+   NEXT_ARG();
+
+   if (strcmp(*argv, "add") == 0) {
+   mac_mode = MACVLAN_MACADDR_ADD;
+   } else if (strcmp(*argv, "del") == 0) {
+   mac_mode = MACVLAN_MACADDR_DEL;
+   } else if (strcmp(*argv, "set") == 0) {
+   mac_mode = MACVLAN_MACADDR_SET;
+   } else if (strcmp(*argv, "flush") == 0) {
+   mac_mode = MACVLAN_MACADDR_FLUSH;
+   } else {
+   explain(lu);
+   return -1;
+   }
+
+   addattr32(n, 1024, IFLA_MACVLAN_MACADDR_MODE, mac_mode);
+
+   if (mac_mode == MACVLAN_MACADDR_ADD ||
+   mac_mode == MACVLAN_MACADDR_DEL) {
+   NEXT_ARG();
+
+   if (ll_addr_a2n(mac, sizeof(mac),
+   *argv) != ETH_ALEN)
+   return -1;
+
+   addattr_l(n, 1024, IFLA_MACVLAN_MACADDR, &mac,
+ ETH_ALEN);
+   }
+
+   if (mac_mode == MACVLAN_MACADDR_SET) {
+   

[PATCH] net: ipv6: avoid errors due to per-cpu atomic alloc

2016-11-22 Thread Mike Manning
Bursts of failures may occur when adding IPv6 routes via Netlink to the
kernel when testing under scale (e.g. 500 routes lost out of 1M). The
reason is that percpu.c:pcpu_balance_workfn() is not guaranteed to have
extended the area map in time for the atomic allocation using percpu.c:
pcpu_alloc() to succeed. This results in route additions failing with
an -ENOMEM error.

While the sender of the Netlink msg to add this route could check for
an ACK and retransmit in the case of an -ENOMEM error, the latter
should not occur in the first place if there is plenty of memory. The
solution is to use non-atomic alloc for rt6_info instead. While the
client may now be blocked for longer depending on the state of the
chunk being added to, this work has to be incurred at some point.

The alternative solution would be to provide configurable parameters
e.g. via sysctl in percpu.c for default map size, low/high empty pages
and map margins. For this solution, the map margin sizes need to be
stored per chunk, as large margins cannot be used if the dynamic early
slots map size is in use. This is not a preferred solution though, as
it requires tuning of these parameters to provide sufficient margins to
avoid -ENOMEM errors depending on system requirements.

Signed-off-by: Mike Manning 
---
 net/ipv6/route.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 1b57e11..0e9bb76 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -347,7 +347,7 @@ struct rt6_info *ip6_dst_alloc(struct net *net,
struct rt6_info *rt = __ip6_dst_alloc(net, dev, flags);
 
if (rt) {
-   rt->rt6i_pcpu = alloc_percpu_gfp(struct rt6_info *, GFP_ATOMIC);
+   rt->rt6i_pcpu = alloc_percpu_gfp(struct rt6_info *, GFP_KERNEL);
if (rt->rt6i_pcpu) {
int cpu;
 
-- 
1.7.10.4
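
The commit message mentions that the Netlink sender could check the ACK and retransmit on -ENOMEM. A minimal sketch of such a retry loop follows; it is an assumption-laden illustration, not code from any real client. The callback type route_add_fn, the function add_route_with_retry() and the fake_add() test double are all hypothetical; a real caller would send the route request and return the errno carried in the Netlink ACK.

```c
#include <errno.h>

/* Hypothetical callback: performs one add attempt, returns 0 or -errno. */
typedef int (*route_add_fn)(void *ctx);

/* Retry only while the attempt fails with -ENOMEM, up to max_tries times. */
static int add_route_with_retry(route_add_fn add, void *ctx, int max_tries)
{
	int err = -EINVAL; /* returned if max_tries <= 0 */
	int i;

	for (i = 0; i < max_tries; i++) {
		err = add(ctx);
		if (err != -ENOMEM)
			break; /* success, or an error retrying won't fix */
	}
	return err;
}

/* Test double: fails with -ENOMEM until the counter in ctx reaches zero. */
static int fake_add(void *ctx)
{
	int *remaining_failures = ctx;

	if (*remaining_failures > 0) {
		(*remaining_failures)--;
		return -ENOMEM;
	}
	return 0;
}
```

This only works around the symptom from user space; it does not address whether the per-cpu allocation should fail in the first place.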



[PATCH] fec: Always write MAC address to controller register

2016-11-22 Thread Daniel Krüger
On non-FEC_QUIRK_ENET_MAC types the MAC address needs to be written to the
FEC during initialisation if the bootloader has not already done so. This
matters especially for random MACs and for MAC addresses provided by a
kernel parameter.

Signed-off-by: Daniel Krueger 
---
 drivers/net/ethernet/freescale/fec_main.c |   14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fec_main.c 
b/drivers/net/ethernet/freescale/fec_main.c
index 2a03857..ea32fda 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -902,14 +902,14 @@ fec_restart(struct net_device *ndev)
/*
 * enet-mac reset will reset mac address registers too,
 * so need to reconfigure it.
+* On non-FEC_QUIRK_ENET_MAC types it won't be reset,
+* but it must be configured once at least (especially random MACs).
 */
-   if (fep->quirks & FEC_QUIRK_ENET_MAC) {
-   memcpy(&temp_mac, ndev->dev_addr, ETH_ALEN);
-   writel((__force u32)cpu_to_be32(temp_mac[0]),
-  fep->hwp + FEC_ADDR_LOW);
-   writel((__force u32)cpu_to_be32(temp_mac[1]),
-  fep->hwp + FEC_ADDR_HIGH);
-   }
+   memcpy(&temp_mac, ndev->dev_addr, ETH_ALEN);
+   writel((__force u32)cpu_to_be32(temp_mac[0]),
+  fep->hwp + FEC_ADDR_LOW);
+   writel((__force u32)cpu_to_be32(temp_mac[1]),
+  fep->hwp + FEC_ADDR_HIGH);
 
/* Clear any outstanding interrupt. */
writel(0x, fep->hwp + FEC_IEVENT);
-- 
1.7.9.5
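
For readers unfamiliar with the memcpy()/cpu_to_be32() sequence in the hunk above, the values handed to writel() can be reproduced with a small portable sketch. It assumes the first four address bytes form the FEC_ADDR_LOW word and the last two form the upper half of the FEC_ADDR_HIGH word; mac_to_fec_words() is a hypothetical helper, and unlike the driver it explicitly zeroes the lower 16 bits of the high word.

```c
#include <stdint.h>

#define ETH_ALEN 6

/*
 * Split a 6-byte MAC address into the two 32-bit values that the driver
 * passes to writel(): bytes 0..3 in the low word, bytes 4..5 in the top
 * half of the high word (lower half zeroed in this sketch).
 */
static void mac_to_fec_words(const uint8_t mac[ETH_ALEN],
			     uint32_t *low, uint32_t *high)
{
	*low = ((uint32_t)mac[0] << 24) | ((uint32_t)mac[1] << 16) |
	       ((uint32_t)mac[2] << 8) | (uint32_t)mac[3];
	*high = ((uint32_t)mac[4] << 24) | ((uint32_t)mac[5] << 16);
}
```

With the address 00:11:22:33:44:55 this gives 0x00112233 for the low word and 0x44550000 for the high word.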



net/udp: bug in skb_pull_rcsum

2016-11-22 Thread Andrey Konovalov
Hi,

I've got the following error report while fuzzing the kernel with syzkaller.

A reproducer is attached.

On commit 9c763584b7c8911106bb77af7e648bef09af9d80 (4.9-rc6, Nov 20).

------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:3029!
invalid opcode:  [#1] SMP KASAN
Modules linked in:
CPU: 1 PID: 3854 Comm: a.out Not tainted 4.9.0-rc6+ #431
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: 880068472c00 task.stack: 880063ec8000
RIP: 0010:[]  []
skb_pull_rcsum+0x255/0x350 net/core/skbuff.c:3029
RSP: 0018:880063ecf660  EFLAGS: 00010297
RAX: 880068472c00 RBX: 880065a2da00 RCX: 
RDX:  RSI: 000d RDI: ed000c7d9ec0
RBP: 880063ecf690 R08: 11000d08e67e R09: 11000cb45b50
R10: dc00 R11:  R12: 880065a2da80
R13: 0008 R14: 880065a2dad8 R15: 0001
FS:  7fbb006497c0() GS:88006cd0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 20032fe0 CR3: 636d9000 CR4: 06e0
Stack:
 88006bfbb948 880065a2da00 88006416 11000cb45b52
  11000d4d3933 880063ecf6f8 83354ced
 fe00 880065a2da90 880063ecf6c0 0001
Call Trace:
 [< inline >] udp_csum_pull_header ./include/net/udp.h:166
 [] udpv6_queue_rcv_skb+0x37d/0x17b0 net/ipv6/udp.c:625
 [< inline >] sk_backlog_rcv ./include/net/sock.h:874
 [] __release_sock+0x126/0x3a0 net/core/sock.c:2046
 [] release_sock+0x59/0x1c0 net/core/sock.c:2504
 [] udpv6_sendmsg+0x1310/0x24a0 net/ipv6/udp.c:1273
 [] inet_sendmsg+0x317/0x4e0 net/ipv4/af_inet.c:734
 [< inline >] sock_sendmsg_nosec net/socket.c:621
 [] sock_sendmsg+0xcc/0x110 net/socket.c:631
 [] sock_write_iter+0x221/0x3b0 net/socket.c:829
 [] do_iter_readv_writev+0x2bb/0x3f0 fs/read_write.c:695
 [] do_readv_writev+0x431/0x730 fs/read_write.c:872
 [] vfs_writev+0x8f/0xc0 fs/read_write.c:911
 [] do_writev+0xe1/0x240 fs/read_write.c:944
 [< inline >] SYSC_writev fs/read_write.c:1017
 [] SyS_writev+0x27/0x30 fs/read_write.c:1014
 [] entry_SYSCALL_64_fastpath+0x1f/0xc2
arch/x86/entry/entry_64.S:209
Code: 89 f8 49 c1 e8 03 47 0f b6 14 08 45 84 d2 74 0a 41 80 fa 03 0f
8e cf 00 00 00 80 a3 91 00 00 00 f9 e9 43 ff ff ff e8 3b 79 79 fe <0f>
0b e8 34 79 79 fe 0f 0b e8 2d 79 79 fe 48 8b 7d d0 31 d2 44
RIP  [] skb_pull_rcsum+0x255/0x350 net/core/skbuff.c:3029
 RSP 
---[ end trace a5d5d2cef6a25ecb ]---
==
// autogenerated by syzkaller (http://github.com/google/syzkaller)

#ifndef __NR_mmap
#define __NR_mmap 9
#endif
#ifndef __NR_bind
#define __NR_bind 49
#endif
#ifndef __NR_sendmsg
#define __NR_sendmsg 46
#endif
#ifndef __NR_writev
#define __NR_writev 20
#endif
#ifndef __NR_socket
#define __NR_socket 41
#endif
#ifndef __NR_syz_fuse_mount
#define __NR_syz_fuse_mount 104
#endif
#ifndef __NR_syz_fuseblk_mount
#define __NR_syz_fuseblk_mount 105
#endif
#ifndef __NR_syz_open_dev
#define __NR_syz_open_dev 102
#endif
#ifndef __NR_syz_open_pts
#define __NR_syz_open_pts 103
#endif
#ifndef __NR_syz_test
#define __NR_syz_test 101
#endif

#include 
#include 
#include 
#include 
#include 
#include 

#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 

__thread int skip_segv;
__thread jmp_buf segv_env;

static void segv_handler(int sig, siginfo_t* info, void* uctx)
{
  if (__atomic_load_n(&skip_segv, __ATOMIC_RELAXED))
    _longjmp(segv_env, 1);
  exit(sig);
}

static void install_segv_handler()
{
  struct sigaction sa;
  memset(&sa, 0, sizeof(sa));
  sa.sa_sigaction = segv_handler;
  sa.sa_flags = SA_NODEFER | SA_SIGINFO;
  sigaction(SIGSEGV, &sa, NULL);
  sigaction(SIGBUS, &sa, NULL);
}

#define NONFAILING(...)                                  \
  {                                                      \
    __atomic_fetch_add(&skip_segv, 1, __ATOMIC_SEQ_CST); \
    if (_setjmp(segv_env) == 0) {                        \
      __VA_ARGS__;                                       \
    }                                                    \
    __atomic_fetch_sub(&skip_segv, 1, __ATOMIC_SEQ_CST); \
  }

static uintptr_t syz_open_dev(uintptr_t a0, uintptr_t a1, uintptr_t a2)
{
  if (a0 == 0xc || a0 == 0xb) {

    char buf[128];
    sprintf(buf, "/dev/%s/%d:%d", a0 == 0xc ? "char" : "block",
            (uint8_t)a1, (uint8_t)a2);
    return open(buf, O_RDWR, 0);
  } else {

    char buf[1024];
    char* hash;
    strncpy(buf, (char*)a0, sizeof(buf));
    buf[sizeof(buf) - 1] = 0;
    while ((hash = strchr(buf, '#'))) {
      *hash = '0' + (char)(a1 % 10);
      a1 /= 10;
    }
    return open(buf, a2, 0);
  }
}

static uintptr_t syz_open_pts(uintptr_t a0, uin

Re: mlx5 "syndrome" errors in kernel log

2016-11-22 Thread Saeed Mahameed
On Tue, Nov 22, 2016 at 11:59 AM, Jesper Dangaard Brouer
 wrote:
>
> Hi Saeed,
>
> I'm seeing below dmesg errors, after pulling net-next at commit
> e796f49d826aad, before I was not seeing these errors, where my tree was
> based on top of commit 319b0534b95.
>
> mlx5_core :02:00.1: mlx5_cmd_check:698:(pid 8788): ACCESS_REG(0x805) 
> op_mod(0x1) failed, status bad parameter(0x3), syndrome (0x6c4d48)
> mlx5_core :02:00.1: mlx5_cmd_check:698:(pid 8788): ACCESS_REG(0x805) 
> op_mod(0x1) failed, status bad parameter(0x3), syndrome (0x6c4d48)
> mlx5_core :02:00.0: mlx5_cmd_check:698:(pid 8788): ACCESS_REG(0x805) 
> op_mod(0x1) failed, status bad parameter(0x3), syndrome (0x6c4d48)
> mlx5_core :02:00.0: mlx5_cmd_check:698:(pid 8788): ACCESS_REG(0x805) 
> op_mod(0x1) failed, status bad parameter(0x3), syndrome (0x6c4d48)
> mlx5_core :02:00.1: mlx5_cmd_check:698:(pid 8788): ACCESS_REG(0x805) 
> op_mod(0x1) failed, status bad parameter(0x3), syndrome (0x6c4d48)
> mlx5_core :02:00.1: mlx5_cmd_check:698:(pid 8788): ACCESS_REG(0x805) 
> op_mod(0x1) failed, status bad parameter(0x3), syndrome (0x6c4d48)
> mlx5_core :02:00.0: mlx5_cmd_check:698:(pid 8788): ACCESS_REG(0x805) 
> op_mod(0x1) failed, status bad parameter(0x3), syndrome (0x6c4d48)
> mlx5_core :02:00.0: mlx5_cmd_check:698:(pid 8788): ACCESS_REG(0x805) 
> op_mod(0x1) failed, status bad parameter(0x3), syndrome (0x6c4d48)
>
>
> Listing my firmware version:
>
>  $ ethtool -i mlx5p2
>  driver: mlx5_core
>  version: 3.0-1 (January 2015)
>  firmware-version: 12.12.1240

Hi Jesper,

Seems like this FW version doesn't support a new FW command introduced
by "net/mlx5e: Expose PCIe statistics to ethtool"

I suggest upgrading the FW, but if you don't know how to do that or are
in a hurry, please go ahead and revert
"net/mlx5e: Expose PCIe statistics to ethtool".

I will need to introduce a new capability bit as a permanent solution
and a fix for the above patch.

Thanks for the report,
We will handle this.


Re: [PATCH] net: ipv6: avoid errors due to per-cpu atomic alloc

2016-11-22 Thread Hannes Frederic Sowa
On 22.11.2016 11:34, Mike Manning wrote:
> Bursts of failures may occur when adding IPv6 routes via Netlink to the
> kernel when testing under scale (e.g. 500 routes lost out of 1M). The
> reason is that percpu.c:pcpu_balance_workfn() is not guaranteed to have
> extended the area map in time for the atomic allocation using percpu.c:
> pcpu_alloc() to succeed. This results in route additions failing with
> an -ENOMEM error.
> 
> While the sender of the Netlink msg to add this route could check for
> an ACK and retransmit in the case of an -ENOMEM error, the latter
> should not occur in the first place if there is plenty of memory. The
> solution is to use non-atomic alloc for rt6_info instead. While the
> client may now be blocked for longer depending on the state of the
> chunk being added to, this work has to be incurred at some point.
> 
> The alternative solution would be to provide configurable parameters
> e.g. via sysctl in percpu.c for default map size, low/high empty pages
> and map margins. For this solution, the map margin sizes need to be
> stored per chunk, as large margins cannot be used if the dynamic early
> slots map size is in use. This is not a preferred solution though, as
> it requires tuning of these parameters to provide sufficient margins to
> avoid -ENOMEM errors depending on system requirements.
> 
> Signed-off-by: Mike Manning 
> ---
>  net/ipv6/route.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> index 1b57e11..0e9bb76 100644
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -347,7 +347,7 @@ struct rt6_info *ip6_dst_alloc(struct net *net,
>   struct rt6_info *rt = __ip6_dst_alloc(net, dev, flags);
>  
>   if (rt) {
> - rt->rt6i_pcpu = alloc_percpu_gfp(struct rt6_info *, GFP_ATOMIC);
> + rt->rt6i_pcpu = alloc_percpu_gfp(struct rt6_info *, GFP_KERNEL);
>   if (rt->rt6i_pcpu) {
>   int cpu;

Nak, this doesn't work, as ip6_dst_alloc must be callable from
non-blocking code paths unfortunately.




Re: [PATCH] ipv6:ipv6_pinfo dereferenced after NULL check

2016-11-22 Thread Hannes Frederic Sowa
On 22.11.2016 07:27, Manjeet Pawar wrote:
> From: Rohit Thapliyal 
> 
> np checked for NULL and then dereferenced. It should be modified
> for NULL case.
> 
> Signed-off-by: Rohit Thapliyal 
> Signed-off-by: Manjeet Pawar 
> ---
>  net/ipv6/ip6_output.c | 9 +
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
> index 1dfc402..c2afa14 100644
> --- a/net/ipv6/ip6_output.c
> +++ b/net/ipv6/ip6_output.c
> @@ -205,14 +205,15 @@ int ip6_xmit(const struct sock *sk, struct sk_buff 
> *skb, struct flowi6 *fl6,
>   /*
>*  Fill in the IPv6 header
>*/
> - if (np)
> + if (np) {
>   hlimit = np->hop_limit;
> + ip6_flow_hdr(
> + hdr, tclass, ip6_make_flowlabel(
> + net, skb, fl6->flowlabel,
> + np->autoflowlabel, fl6));
> + }
>   if (hlimit < 0)
>   hlimit = ip6_dst_hoplimit(dst);
>  
> - ip6_flow_hdr(hdr, tclass, ip6_make_flowlabel(net, skb, fl6->flowlabel,
> - np->autoflowlabel, fl6));
> -
>   hdr->payload_len = htons(seg_len);
>   hdr->nexthdr = proto;
>   hdr->hop_limit = hlimit;
> 


We always should initialize hdr and not skip the ip6_flow_hdr call.

Did you see a bug or did you find this by code review? I wonder if np can
actually be NULL at this point. Maybe we can just eliminate the NULL check.

Thanks,
Hannes



Re: [RFC net-next 0/3] net: bridge: Allow CPU port configuration

2016-11-22 Thread Jiri Pirko
Mon, Nov 21, 2016 at 08:09:22PM CET, f.faine...@gmail.com wrote:
>Hi all,
>
>This patch series allows using the bridge master interface to configure
>an Ethernet switch port's CPU/management port with different VLAN attributes
>than those of the bridge downstream ports/members.
>
>Jiri, Ido, Andrew, Vivien, please review the impact on mlxsw and mv88e6xxx, I
>tested this with b53 and a mockup DSA driver.

Patchset looks fine to me.

>
>Open questions:
>
>- if we have more than one bridge on top of a physical switch, the driver
>  should keep track of that and verify that we are not going to change
>  the CPU port VLAN attributes in a way that results in incompatible settings
>  to be applied

Ack. In mlxsw this is tracked


>
>- if the default behavior is to have all VLANs associated with the CPU port
>  be ingressing/egressing tagged to the CPU, is this really useful?
>
>Florian Fainelli (3):
>  net: bridge: Allow bridge master device to configure switch CPU port
>  net: dsa: Propagate VLAN add/del to CPU port(s)
>  net: dsa: b53: Remove CPU port specific VLAN programming
>
> drivers/net/dsa/b53/b53_common.c | 22 ++--
> net/bridge/br_vlan.c | 28 ++---
> net/dsa/slave.c  | 45 +---
> 3 files changed, 64 insertions(+), 31 deletions(-)
>
>-- 
>2.9.3
>


Re: [PATCH net 1/2] r8152: fix the sw rx checksum is unavailable

2016-11-22 Thread Mark Lord

On 16-11-18 07:03 AM, Mark Lord wrote:

On 16-11-18 02:57 AM, Hayes Wang wrote:
..

Besides, the maximum data length which the RTL8152 would send to
the host is 16KB. That is, if the agg_buf_sz is 16KB, the host
wouldn't split it. However, you still see problems for it.


How does the RTL8152 know that the limit is 16KB,
rather than some other number?  Is this a hardwired number
in the hardware, or is it a parameter that the software
sends to the chip during initialization?

..

The first issue is that a packet sometimes begins in one URB,
and completes in the next URB, without an rx_desc at the start
of the second URB.  This I have already reported earlier.


Long run tests over the weekend, with the invalidate_dcache_range() call
before the inner loop of r8152_rx_bottom(), turned up a few instances
where packets were truncated inside a 16384-byte URB buffer, without
filling the URB.

[10.293228] r8152_rx_bottom: 4278 corrupted urb: head=9d21 urb_offset=2856/3376 pkt_len(1518) exceeds remainder(496)
[10.304523] r8152_dump_rx_desc: 044805ee 4008 006005dc 0602 rx_len=1518
..
[   16.660431] r8152_rx_bottom: 7802 corrupted urb: head=9d1f8000 urb_offset=1544/2064 pkt_len(1518) exceeds remainder(496)
[   16.671719] r8152_dump_rx_desc: 044805ee 4048 004005dc 46020006 rx_len=1518

The r8152.c driver attempted to build skb's for the entire packet size,
even though the 1518-byte packets had only 496-bytes of data in the URB.
It is not clear what the chip did with the rest of the packets in question,
but the next URBs in each case began with a new/real rx_desc and new packet.

There were also unconnected events during the test runs where the
test code noticed totally invalid rx_desc structs in the middle of URBs.
The stock driver would again have attempted to treat those as "valid" (ugh).

..
[   10.273906] r8152_check_rx_desc: rx_desc looks bad.
[   10.279012] r8152_rx_bottom: 4338 corrupted urb. head=9d21 urb_offset=2856/3376 len_used=2880
[   10.288196] r8152_dump_rx_desc: 312e3239 382e3836 0a20382e 3d435253 3034336d 202f3a30 rx_len=12857

..
[7.184565] r8152_check_rx_desc: rx_desc looks bad.
[7.189657] r8152_rx_bottom: 1678 corrupted urb. head=9d21 urb_offset=2856/3376 len_used=2880
[7.198852] r8152_dump_rx_desc: a1388402 803c9001 84380810 a67c5c4c a77c782b c64c782b rx_len=1026
..
[   10.351251] r8152_check_rx_desc: rx_desc looks bad.
[   10.356356] r8152_rx_bottom: 4397 corrupted urb. head=9d20c000 urb_offset=4400/7984 len_used=4424
[   10.365543] r8152_dump_rx_desc: 312e3239 382e3836 0a20382e 3d435253 3034336d 202f3a30 rx_len=12857
..
[   10.518119] r8152_check_rx_desc: rx_desc looks bad.
[   10.523204] r8152_rx_bottom: 4458 corrupted urb. head=9d21 urb_offset=4400/7984 len_used=4424
[   10.532416] r8152_dump_rx_desc: 54544120 6e3d5352 636f6c6f 65762c6b 343d7372 6464612c rx_len=16672
..


But the driver, as written, sometimes accesses bytes outside
of the 16KB URB buffer, because it trusts the non-existent
rx_desc in these cases, and also because it accesses bytes
from the rx_desc without first checking whether there is
sufficient remaining space in the URB to hold an rx_desc.

These incorrect accesses sometimes touch memory outside
of the URB buffer.  Since the driver allocates all of its
rx URB buffers at once, they are highly likely to be
physically (and therefore virtually) adjacent in memory.

So mistakenly accessing beyond the end of one buffer will
often result in a read from memory of the next URB buffer,
which causes a portion of it to be loaded into the D-cache.

When that URB is subsequently filled by DMA, there then exists
a data-consistency issue:  the D-cache contains stale information
from before the latest DMA cycle.

So this explains the strange memory behaviour observed earlier on.
When I add a call to invalidate_dcache_range() to the driver
just before it begins examining a new rx URB, the problems go away.
So this confirms the observations.

Using non-cacheable RAM also makes the problem go away.
But neither is a fix for the real buffer overrun accesses in the driver.

Fix the "packet spans URBs" bug, and fix the driver to ALWAYS
test lengths/ranges before accessing the actual buffer,
and everything should begin working reliably.
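
As a rough illustration of the "test lengths/ranges before accessing the
buffer" point, a bounds check could look like the sketch below. This is not
code from r8152.c: the function names are made up, and the 24-byte descriptor
size is an assumption inferred from the six 32-bit words in the dumps above
(it is also consistent with 3376 - 2856 - 24 = 496 in the first log line).

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumption: one rx_desc is six 32-bit words, as in the dumps above. */
#define RX_DESC_SIZE 24u

/*
 * Bytes available for packet data after the descriptor that starts at
 * `offset`, or 0 if the descriptor itself does not fit in the URB.
 */
size_t rx_payload_room(size_t urb_len, size_t offset)
{
	if (offset > urb_len || urb_len - offset < RX_DESC_SIZE)
		return 0;
	return urb_len - offset - RX_DESC_SIZE;
}

/*
 * True only if both the descriptor and `pkt_len` bytes of payload lie
 * entirely inside the URB. Nothing is dereferenced before this passes.
 */
bool rx_packet_fits(size_t urb_len, size_t offset, size_t pkt_len)
{
	if (offset > urb_len || urb_len - offset < RX_DESC_SIZE)
		return false;
	return pkt_len <= urb_len - offset - RX_DESC_SIZE;
}
```

With the values from the first log line, rx_packet_fits(3376, 2856, 1518)
is false, so a caller would skip the rest of the URB rather than build an
skb from memory it does not own.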


Re: [PATCH] net: ipv6: avoid errors due to per-cpu atomic alloc

2016-11-22 Thread Mike Manning
On 11/22/2016 12:18 PM, Hannes Frederic Sowa wrote:
> On 22.11.2016 11:34, Mike Manning wrote:
>> Bursts of failures may occur when adding IPv6 routes via Netlink to the
>> kernel when testing under scale (e.g. 500 routes lost out of 1M). The
>> reason is that percpu.c:pcpu_balance_workfn() is not guaranteed to have
>> extended the area map in time for the atomic allocation using percpu.c:
>> pcpu_alloc() to succeed. This results in route additions failing with
>> an -ENOMEM error.
>>
>> While the sender of the Netlink msg to add this route could check for
>> an ACK and retransmit in the case of an -ENOMEM error, the latter
>> should not occur in the first place if there is plenty of memory. The
>> solution is to use non-atomic alloc for rt6_info instead. While the
>> client may now be blocked for longer depending on the state of the
>> chunk being added to, this work has to be incurred at some point.
>>
>> The alternative solution would be to provide configurable parameters
>> e.g. via sysctl in percpu.c for default map size, low/high empty pages
>> and map margins. For this solution, the map margin sizes need to be
>> stored per chunk, as large margins cannot be used if the dynamic early
>> slots map size is in use. This is not a preferred solution though, as
>> it requires tuning of these parameters to provide sufficient margins to
>> avoid -ENOMEM errors depending on system requirements.
>>
>> Signed-off-by: Mike Manning 
>> ---
>>  net/ipv6/route.c |2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
>> index 1b57e11..0e9bb76 100644
>> --- a/net/ipv6/route.c
>> +++ b/net/ipv6/route.c
>> @@ -347,7 +347,7 @@ struct rt6_info *ip6_dst_alloc(struct net *net,
>>  struct rt6_info *rt = __ip6_dst_alloc(net, dev, flags);
>>  
>>  if (rt) {
>> -rt->rt6i_pcpu = alloc_percpu_gfp(struct rt6_info *, GFP_ATOMIC);
>> +rt->rt6i_pcpu = alloc_percpu_gfp(struct rt6_info *, GFP_KERNEL);
>>  if (rt->rt6i_pcpu) {
>>  int cpu;
> 
> Nak, this doesn't work, as ip6_dst_alloc must be callable from
> non-blocking code paths unfortunately.
> 
> 

Thanks for the prompt reply.

Do you consider the alternative of providing configurable parameters for per-cpu
alloc as viable, or is there a better way of dealing with this?

While I have tested such param changes under scale and found that they avoid
the -ENOMEM errors, it would be good to get confirmation that this approach is
acceptable prior to coding the sysctl handling for these.
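
For reference, the client-side workaround mentioned in the quoted patch
description (check for an ACK and retransmit on -ENOMEM) might be sketched
as below. Everything here is hypothetical: fake_add_route() stands in for
sending the Netlink request with NLM_F_ACK and reading back the error, and
the retry limit is an arbitrary choice.

```c
#include <errno.h>

/* Hypothetical stand-in for the kernel side: fails with -ENOMEM a few times. */
struct fake_kernel {
	int enomem_left;	/* how many more requests will fail */
	int calls;		/* how many requests were sent */
};

/* Hypothetical stand-in for "send route add request and wait for the ACK". */
int fake_add_route(struct fake_kernel *k)
{
	k->calls++;
	if (k->enomem_left > 0) {
		k->enomem_left--;
		return -ENOMEM;
	}
	return 0;
}

/*
 * Retry only on -ENOMEM, up to max_attempts requests in total.
 * Any other result (success or a different error) is returned at once.
 */
int add_route_with_retry(struct fake_kernel *k, int max_attempts)
{
	int err = -EINVAL;	/* returned if max_attempts <= 0 */
	int i;

	for (i = 0; i < max_attempts; i++) {
		err = fake_add_route(k);
		if (err != -ENOMEM)
			break;
	}
	return err;
}
```

This only papers over the transient failure in the sender; it does not
change how quickly the per-cpu area map is extended.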



[PATCH] iproute2: Nr. of packets and octets for macsec tx stats were swapped.

2016-11-22 Thread Daniel . Hopf
Signed-off-by: Daniel Hopf 
---
 ip/ipmacsec.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/ip/ipmacsec.c b/ip/ipmacsec.c
index c9252bb..aa89a00 100644
--- a/ip/ipmacsec.c
+++ b/ip/ipmacsec.c
@@ -634,10 +634,10 @@ static void print_one_stat(const char **names, 
struct rtattr **attr, int idx,
 }

 static const char *txsc_stats_names[NUM_MACSEC_TXSC_STATS_ATTR] = {
-   [MACSEC_TXSC_STATS_ATTR_OUT_PKTS_PROTECTED] = 
"OutOctetsProtected",
-   [MACSEC_TXSC_STATS_ATTR_OUT_PKTS_ENCRYPTED] = 
"OutOctetsEncrypted",
-   [MACSEC_TXSC_STATS_ATTR_OUT_OCTETS_PROTECTED] = 
"OutPktsProtected",
-   [MACSEC_TXSC_STATS_ATTR_OUT_OCTETS_ENCRYPTED] = 
"OutPktsEncrypted",
+   [MACSEC_TXSC_STATS_ATTR_OUT_PKTS_PROTECTED] = "OutPktsProtected",
+   [MACSEC_TXSC_STATS_ATTR_OUT_PKTS_ENCRYPTED] = "OutPktsEncrypted",
+   [MACSEC_TXSC_STATS_ATTR_OUT_OCTETS_PROTECTED] = 
"OutOctetsProtected",
+   [MACSEC_TXSC_STATS_ATTR_OUT_OCTETS_ENCRYPTED] = 
"OutOctetsEncrypted",
 };

 static void print_txsc_stats(const char *prefix, struct rtattr *attr)
--
2.9.3
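
For readers skimming the diff, the pattern being fixed is a name table built
with designated initializers, where each enum index must be paired with the
matching string. A standalone sketch, with shortened enum names that are
assumptions and not the real MACSEC_TXSC_STATS_ATTR_* constants:

```c
#include <string.h>

/* Hypothetical, shortened stand-ins for the MACSEC_TXSC_STATS_ATTR_* values. */
enum txsc_stat {
	TXSC_OUT_PKTS_PROTECTED,
	TXSC_OUT_PKTS_ENCRYPTED,
	TXSC_OUT_OCTETS_PROTECTED,
	TXSC_OUT_OCTETS_ENCRYPTED,
	NUM_TXSC_STATS
};

/* Each index is paired with the name of the counter it actually holds. */
static const char *const txsc_names[NUM_TXSC_STATS] = {
	[TXSC_OUT_PKTS_PROTECTED]   = "OutPktsProtected",
	[TXSC_OUT_PKTS_ENCRYPTED]   = "OutPktsEncrypted",
	[TXSC_OUT_OCTETS_PROTECTED] = "OutOctetsProtected",
	[TXSC_OUT_OCTETS_ENCRYPTED] = "OutOctetsEncrypted",
};

/* Name for a counter index, or NULL if the index is out of range. */
const char *txsc_stat_name(unsigned int idx)
{
	return idx < NUM_TXSC_STATS ? txsc_names[idx] : NULL;
}

/* Reverse lookup: counter index for a name, or -1 if unknown. */
int txsc_stat_index(const char *name)
{
	unsigned int i;

	for (i = 0; i < NUM_TXSC_STATS; i++)
		if (strcmp(txsc_names[i], name) == 0)
			return (int)i;
	return -1;
}
```

Because the initializers are keyed by enum value, the order of the lines
does not matter; only the pairing on each line does, which is what the
patch corrects.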


Re: [PATCH] iproute2: Nr. of packets and octets for macsec tx stats were swapped.

2016-11-22 Thread Sabrina Dubroca
Hi Daniel,

Thanks for fixing this. I noticed it some time ago but it seems
I forgot to send the patch :(

Acked-by: Sabrina Dubroca 


Your subject line should be:

Subject: [PATCH iproute2] macsec: Nr. of packets and octets for macsec tx stats
were swapped.

with "iproute2" between the brackets.

2016-11-22, 14:24:40 +0100, daniel.h...@continental-corporation.com wrote:
> Signed-off-by: Daniel Hopf 
> ---
>  ip/ipmacsec.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/ip/ipmacsec.c b/ip/ipmacsec.c
> index c9252bb..aa89a00 100644
> --- a/ip/ipmacsec.c
> +++ b/ip/ipmacsec.c
> @@ -634,10 +634,10 @@ static void print_one_stat(const char **names, 
> struct rtattr **attr, int idx,
>  }
> 
>  static const char *txsc_stats_names[NUM_MACSEC_TXSC_STATS_ATTR] = {
> -   [MACSEC_TXSC_STATS_ATTR_OUT_PKTS_PROTECTED] = 
> "OutOctetsProtected",
> -   [MACSEC_TXSC_STATS_ATTR_OUT_PKTS_ENCRYPTED] = 
> "OutOctetsEncrypted",
> -   [MACSEC_TXSC_STATS_ATTR_OUT_OCTETS_PROTECTED] = 
> "OutPktsProtected",
> -   [MACSEC_TXSC_STATS_ATTR_OUT_OCTETS_ENCRYPTED] = 
> "OutPktsEncrypted",
> +   [MACSEC_TXSC_STATS_ATTR_OUT_PKTS_PROTECTED] = "OutPktsProtected",
> +   [MACSEC_TXSC_STATS_ATTR_OUT_PKTS_ENCRYPTED] = "OutPktsEncrypted",
> +   [MACSEC_TXSC_STATS_ATTR_OUT_OCTETS_PROTECTED] = 
> "OutOctetsProtected",
> +   [MACSEC_TXSC_STATS_ATTR_OUT_OCTETS_ENCRYPTED] = 
> "OutOctetsEncrypted",
>  };

Your patch was corrupted, probably by your email client; you have
extra newlines everywhere.
Can you send a v2 of this patch? Thanks!

-- 
Sabrina


Re: Synopsys Ethernet QoS Driver

2016-11-22 Thread Joao Pinto

Hi Lars and Peppe,

On 21-11-2016 16:11, Joao Pinto wrote:
> On 21-11-2016 15:43, Lars Persson wrote:
>>
>>
>>> 21 nov. 2016 kl. 16:06 skrev Joao Pinto :
>>>
 On 21-11-2016 14:25, Giuseppe CAVALLARO wrote:
> On 11/21/2016 2:28 PM, Lars Persson wrote:
>
>
>> 21 nov. 2016 kl. 13:53 skrev Giuseppe CAVALLARO :
>>
>> Hello Joao
>>
>>> On 11/21/2016 1:32 PM, Joao Pinto wrote:
>>> Hello,
>>>
> On 21-11-2016 05:29, Rayagond Kokatanur wrote:
>> On Sat, Nov 19, 2016 at 7:26 PM, Rabin Vincent  wrote:
>> On Fri, Nov 18, 2016 at 02:20:27PM +, Joao Pinto wrote:
>> For now we are interested in improving the synopsys QoS driver under
>> /net/ethernet/synopsys. For now the driver structure consists of a
>> single file
>> called dwc_eth_qos.c, containing synopsys ethernet qos common ops and
> 

snip (...)

>>
>> Peppe
>>
>
> Hello Joao and others,
>
>>>
>>> Hi Lars,
>>>
> As the maintainer of dwc_eth_qos.c I prefer also that we put efforts on
> the most mature driver, the stmmac.
>
> I hope that the code can migrate into an ethernet/synopsys folder to keep
> the convention of naming the folder after the vendor. This makes it easy
> for others to find the driver.
>
> The dwc_eth_qos.c will eventually be removed and its DT binding interface
> can then be implemented in the stmmac driver.
>>>
>>> So your idea is to pick the ethernet/stmmac and rename it to
>>> ethernet/synopsys and try to improve the structure and add the missing
>>> QoS features to it?
>>
>> Indeed this is what I prefer.
> 
> Ok, it makes sense.
> Just out of curiosity, the target setup is the following:
> https://www.youtube.com/watch?v=8V-LB5y2Cos
> but instead of using internal drivers, we desire to use mainline drivers only.
> 
> Thanks!

Regarding this subject, I am thinking of making the following adaption:

a) delete ethernet/synopsys
b) rename ethernet/stmicro/stmmac to ethernet/synopsys

and send you a patch for you to evaluate. Do you both agree with the approach?
Having a new work base would be important, because I will add to the "new"
structure some missing QoS features like Multichannel support, CBS and later
TSN.

Thanks.

> 
>>
>>>

 Thanks Lars, I will be happy to support all you on this transition
 and I agree on renaming all.

 peppe


> - Lars
>
>>>
>>>

>
> (See http://lists.openwall.net/netdev/2016/02/29/127)
>
> The former only supports 4.x of the hardware.
>
> The latter supports 4.x and 3.x and already has a platform glue driver
> with support for several platforms, a PCI glue driver, and a core driver
> with several features not present in the former (for example: TX/RX
> interrupt coalescing, EEE, PTP).
>
> Have you evaluated both drivers?  Why have you decided to work on the
> former rather than the latter?


>>>
>>> Thanks.
>>>
>>>
>>>
>>>
>>
>

>>>
> 



[PATCH net] net/mlx4_en: Free netdev resources under state lock

2016-11-22 Thread Tariq Toukan
Make sure mlx4_en_free_resources is called under the netdev state lock.
This is needed since RCU dereference of XDP prog should be protected.

Fixes: 326fe02d1ed6 ("net/mlx4_en: protect ring->xdp_prog with rcu_read_lock")
Signed-off-by: Tariq Toukan 
Reported-by: Sagi Grimberg 
CC: Brenden Blanco 
---
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c 
b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index 3a47e83d3e07..a60f635da78b 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -129,6 +129,9 @@ static enum mlx4_net_trans_rule_id 
mlx4_ip_proto_to_trans_rule_id(u8 ip_proto)
}
 };
 
+/* Must not acquire state_lock, as its corresponding work_sync
+ * is done under it.
+ */
 static void mlx4_en_filter_work(struct work_struct *work)
 {
struct mlx4_en_filter *filter = container_of(work,
@@ -2189,13 +2192,13 @@ void mlx4_en_destroy_netdev(struct net_device *dev)
mutex_lock(&mdev->state_lock);
mdev->pndev[priv->port] = NULL;
mdev->upper[priv->port] = NULL;
-   mutex_unlock(&mdev->state_lock);
 
 #ifdef CONFIG_RFS_ACCEL
mlx4_en_cleanup_filters(priv);
 #endif
 
mlx4_en_free_resources(priv);
+   mutex_unlock(&mdev->state_lock);
 
kfree(priv->tx_ring);
kfree(priv->tx_cq);
-- 
1.8.3.1



[PATCH net-next] marvell: mark mvneta and mvpp2 32-bit only

2016-11-22 Thread Arnd Bergmann
Both of these drivers won't work on 64-bit architectures unless they
are redesigned, since they store a virtual address pointer in a 32-bit
field of the descriptors:

drivers/net/ethernet/marvell/mvneta_bm.c: In function 'mvneta_bm_construct':
drivers/net/ethernet/marvell/mvneta_bm.c:103:16: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast]
drivers/net/ethernet/marvell/mvpp2.c: In function 'mvpp2_prs_vlan_init':
drivers/net/ethernet/marvell/mvpp2.c:2563:32: error: large integer implicitly truncated to unsigned type [-Werror=overflow]

This limits the COMPILE_TEST option for the two drivers again to
only build them on 32-bit. This seems nicer than shutting up the
warnings, in case we ever actually want to use them on 64-bit,
as the warnings indicate which parts of the driver are currently
broken there.

Fixes: a0627f776a45 ("net: marvell: Allow drivers to be built with COMPILE_TEST")
Signed-off-by: Arnd Bergmann 
---
 drivers/net/ethernet/marvell/Kconfig | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/marvell/Kconfig 
b/drivers/net/ethernet/marvell/Kconfig
index d74d4e6f0b34..66fd9dbb2ca7 100644
--- a/drivers/net/ethernet/marvell/Kconfig
+++ b/drivers/net/ethernet/marvell/Kconfig
@@ -58,6 +58,7 @@ config MVNETA
tristate "Marvell Armada 370/38x/XP network interface support"
depends on PLAT_ORION || COMPILE_TEST
depends on HAS_DMA
+   depends on !64BIT
select MVMDIO
select FIXED_PHY
---help---
@@ -81,6 +82,7 @@ config MVPP2
tristate "Marvell Armada 375 network interface support"
depends on MACH_ARMADA_375 || COMPILE_TEST
depends on HAS_DMA
+   depends on !64BIT
select MVMDIO
---help---
  This driver supports the network interface units in the
-- 
2.9.0
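
To make the underlying problem concrete: an address stored in a 32-bit
descriptor field only round-trips if its upper 32 bits are zero. The sketch
below models addresses as plain 64-bit integers; the function name is
invented for illustration and nothing here is taken from the mvneta or
mvpp2 sources.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Model of "store a virtual address in a 32-bit descriptor field and read
 * it back". Returns true only if the address survives the round trip.
 */
bool survives_u32_cookie(uint64_t addr)
{
	uint32_t cookie = (uint32_t)addr;	/* upper 32 bits are dropped here */

	return (uint64_t)cookie == addr;
}
```

A typical 32-bit kernel address such as 0xc0001000 survives, while a 64-bit
kernel address such as 0xffff800000001000 does not, which is why the
compiler warnings quoted above point at real breakage and not just noise.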



[PATCH net-next] net/sched: cls_flower: verify root pointer before dereferncing it

2016-11-22 Thread Roi Dayan
tp->root is being allocated in init() time and kfreed in destroy()
however it is being dereferenced in classify() path.

We could be in classify() path after destroy() was called and thus 
tp->root is null. Verifying if tp->root is null in classify() path 
is enough because it's being freed with kfree_rcu() and classify() 
path is under rcu_read_lock().

Fixes: 1e052be69d04 ("net_sched: destroy proto tp when all filters are gone")
Signed-off-by: Roi Dayan 
Cc: Cong Wang 
---

Hi Cong, all

As stated above, the issue was introduced with commit 1e052be69d04
("net_sched: destroy proto tp when all filters are gone"). This patch
provides a fix only for cls_flower, where I succeeded in reproducing the
issue. Cong, if you can/want to come up with a fix that will be applicable
to all the other classifiers, I am fine with that.

Thanks,
Roi


 net/sched/cls_flower.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index e8dd09a..88a26c4 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -135,7 +135,7 @@ static int fl_classify(struct sk_buff *skb, const struct 
tcf_proto *tp,
struct fl_flow_key skb_mkey;
struct ip_tunnel_info *info;
 
-   if (!atomic_read(&head->ht.nelems))
+   if (!head || !atomic_read(&head->ht.nelems))
return -1;
 
fl_clear_masked_range(&skb_key, &head->mask);
-- 
2.7.4
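
The one-line change relies on short-circuit evaluation: the right-hand side
of || is only evaluated when head is non-NULL. A userspace sketch of the
same guard, with simplified stand-in types (the real code reads tp->root
under RCU and uses atomic_read(), both omitted here):

```c
#include <stddef.h>

/* Simplified stand-ins for cls_fl_head and tcf_proto; not the kernel types. */
struct fl_head_sketch {
	int nelems;			/* number of installed filters */
};

struct proto_sketch {
	struct fl_head_sketch *root;	/* NULL once destroy() has run */
};

/*
 * Returns -1 (no match) when the head is gone or empty, 0 otherwise.
 * head->nelems is never read when head is NULL, because || stops at the
 * first true operand.
 */
int classify_sketch(const struct proto_sketch *tp)
{
	const struct fl_head_sketch *head = tp->root;

	if (!head || !head->nelems)
		return -1;

	return 0;	/* the real classifier would do the lookup here */
}
```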



Re: [PATCH v2] net/phy: add trace events for mdio accesses

2016-11-22 Thread Steven Rostedt
On Tue, 22 Nov 2016 11:01:27 +0100
Uwe Kleine-König  wrote:

> diff --git a/include/trace/events/mdio.h b/include/trace/events/mdio.h
> new file mode 100644
> index ..468e2d095d19
> --- /dev/null
> +++ b/include/trace/events/mdio.h
> @@ -0,0 +1,42 @@
> +#undef TRACE_SYSTEM
> +#define TRACE_SYSTEM mdio
> +
> +#if !defined(_TRACE_MDIO_H) || defined(TRACE_HEADER_MULTI_READ)
> +#define _TRACE_MDIO_H
> +
> +#include 
> +
> +TRACE_EVENT_CONDITION(mdio_access,
> +
> + TP_PROTO(struct mii_bus *bus, int read,
> +  unsigned addr, unsigned regnum, u16 val, int err),
> +
> + TP_ARGS(bus, read, addr, regnum, val, err),
> +
> + TP_CONDITION(err >= 0),
> +
> + TP_STRUCT__entry(
> + __array(char, busid, MII_BUS_ID_SIZE)
> + __field(int, read)

read is just a 0 or 1. What about making it a char? That way we can
pack this better. If I'm not mistaken, MII_BUS_ID_SIZE is (20 - 3) or
17. If read is just one byte, then it can fit in one of those three
bytes, and you save 4 extra bytes (assuming addr will be 4 byte
aligned).

-- Steve


> + __field(unsigned, addr)
> + __field(unsigned, regnum)
> + __field(u16, val)
> + ),
> +
> + TP_fast_assign(
> + strncpy(__entry->busid, bus->id, MII_BUS_ID_SIZE);
> + __entry->read = read;
> + __entry->addr = addr;
> + __entry->regnum = regnum;
> + __entry->val = val;
> + ),
> +
> + TP_printk("%s %-5s phy:0x%02x reg:0x%02x val:0x%04hx",
> +   __entry->busid, __entry->read ? "read" : "write",
> +   __entry->addr, __entry->regnum, __entry->val)
> +);
> +
> +#endif /* if !defined(_TRACE_MDIO_H) || defined(TRACE_HEADER_MULTI_READ) */
> +
> +/* This part must be outside protection */
> +#include 
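
A quick userspace sketch of the packing argument above, assuming
MII_BUS_ID_SIZE is 17 and that int is 4 bytes with 4-byte alignment (true
on common Linux ABIs). The struct names are made up, and the common trace
entry header that precedes these fields in a real event is ignored.

```c
#include <stddef.h>
#include <stdint.h>

#define BUS_ID_SIZE 17	/* assumed value of MII_BUS_ID_SIZE, i.e. (20 - 3) */

/* Layout as posted: `read` is an int, so 3 padding bytes follow busid. */
struct entry_int_read {
	char busid[BUS_ID_SIZE];
	int read;
	unsigned addr;
	unsigned regnum;
	uint16_t val;
};

/* Suggested layout: `read` is a char and lands in the padding after busid. */
struct entry_char_read {
	char busid[BUS_ID_SIZE];
	char read;
	unsigned addr;
	unsigned regnum;
	uint16_t val;
};

/* Bytes saved per event by the char variant. */
size_t bytes_saved(void)
{
	return sizeof(struct entry_int_read) - sizeof(struct entry_char_read);
}
```

Under those assumptions `read` moves from offset 20 to offset 17, `addr`
moves from 24 to 20, and each entry shrinks by 4 bytes.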



Re: [v5,1/5] soc: qcom: smem_state: Fix include for ERR_PTR()

2016-11-22 Thread Valo, Kalle
Bjorn Andersson  writes:

> On Wed 16 Nov 10:49 PST 2016, Kalle Valo wrote:
>
>> Bjorn Andersson  wrote:
>> > The correct include file for getting errno constants and ERR_PTR() is
>> > linux/err.h, rather than linux/errno.h, so fix the include.
>> > 
>> > Fixes: e8b123e60084 ("soc: qcom: smem_state: Add stubs for disabled 
>> > smem_state")
>> > Acked-by: Andy Gross 
>> > Signed-off-by: Bjorn Andersson 
>> 
>> For some reason this fails to compile now. Can you take a look, please?
>> 
>> ERROR: "qcom_wcnss_open_channel" 
>> [drivers/net/wireless/ath/wcn36xx/wcn36xx.ko] undefined!
>> make[1]: *** [__modpost] Error 1
>> make: *** [modules] Error 2
>> 
>> 5 patches set to Changes Requested.
>> 
>> 9429045 [v5,1/5] soc: qcom: smem_state: Fix include for ERR_PTR()
>> 9429047 [v5,2/5] wcn36xx: Transition driver to SMD client
>
> This patch was updated with the necessary depends in Kconfig to catch
> this exact issue and when I pull in your .config (which has QCOM_SMD=n,
> QCOM_WCNSS_CTRL=n and WCN36XX=y) I can build this just fine.
>
> I've tested the various combinations and it seems to work fine. Do you
> have any other patches in your tree?

This was with the pending branch of my ath.git tree. There are other
wireless patches (ath10k etc) but I would guess they don't affect here.

> Any stale objects?

Not sure what you mean with this question, but I didn't run 'make clean'
if that's what you are asking.

> Would you mind retesting this, before I invest more time in trying to
> reproduce the issue you're seeing?

Sure, I'll take a look but that might take few days.

-- 
Kalle Valo

Re: [PATCHv2 net-next 00/11] Start adding support for mv88e6390

2016-11-22 Thread David Miller
From: Andrew Lunn 
Date: Mon, 21 Nov 2016 23:26:54 +0100

> This is the first patchset implementing support for the mv88e6390
> family.  This is a new generation of switch devices and has numerous
> incompatible changes to the registers. These patches allow the switch
> to be detected during probe, and make the statistics unit work.
> 
> These patches are insufficient to make the mv88e6390 functional. More
> patches will follow.
> 
> v2:
>   Move stats code into global1
>   Change DT compatible string to mv88e6190
>   Fixed mv88e6351 stats which v1 had broken

Series applied, thanks Andrew.


Re: [net-next PATCH v2 3/5] virtio_net: Add XDP support

2016-11-22 Thread Michael S. Tsirkin
On Tue, Nov 22, 2016 at 12:27:03AM -0800, John Fastabend wrote:
> On 16-11-21 03:20 PM, Michael S. Tsirkin wrote:
> > On Sat, Nov 19, 2016 at 06:50:33PM -0800, John Fastabend wrote:
> >> From: Shrijeet Mukherjee 
> >>
> >> This adds XDP support to virtio_net. Some requirements must be
> >> met for XDP to be enabled depending on the mode. First it will
> >> only be supported with LRO disabled so that data is not pushed
> >> across multiple buffers. The MTU must be less than a page size
> >> to avoid having to handle XDP across multiple pages.
> >>
> >> If mergeable receive is enabled this first series only supports
> >> the case where header and data are in the same buf which we can
> >> check when a packet is received by looking at num_buf. If the
> >> num_buf is greater than 1 and a XDP program is loaded the packet
> >> is dropped and a warning is thrown. When any_header_sg is set this
> >> does not happen and both header and data is put in a single buffer
> >> as expected so we check this when XDP programs are loaded. Note I
> >> have only tested this with Linux vhost backend.
> >>
> >> If big packets mode is enabled and MTU/LRO conditions above are
> >> met then XDP is allowed.
> >>
> >> A follow on patch can be generated to solve the mergeable receive
> >> case with num_bufs equal to 2. Buffers greater than two may not
> >> be handled as easily.
> > 
> > 
> > I would very much prefer support for other layouts without drops
> > before merging this.
> > header by itself can certainly be handled by skipping it.
> > People wanted to use that e.g. for zero copy.
> 
> OK fair enough I'll do this now rather than push it out.
> 
> > 
> > Anything else can be handled by copying the packet.
> 
> This though I'm not so sure about. The copy is going to be slow and
> I wonder if someone could craft a packet to cause this if it could
> be used to slow down a system.

Device can always linearize if it wants to. If device is malicious
it's hard for OS to defend itself.

> Also I can't see what would cause this to happen. With mergeable
> buffers and LRO off the num_bufs is either 1 or 2 depending on where
> the header is. Otherwise with LRO off it should be in a single page.
> At least this is the Linux vhost implementation, I guess other
> implementation might meet spec but use num_buf > 2 or multiple pages
> even in the non LRO case.

Me neither, but not long ago we always placed the header in a
separate entry, until we saw that the extra s/g has
measurable overhead.

A broken network is kind of a heavy-handed thing, making debugging
impossible for many people.

> I tend to think dropping the packet outright is better than copying
> it around. At the very least, if we do this we need to put in warnings so
> users can see something is mis-configured.
> 
> .John

Yes, I think that's a good idea.

-- 
MST


Re: [net-next PATCH v2 4/5] virtio_net: add dedicated XDP transmit queues

2016-11-22 Thread Michael S. Tsirkin
On Tue, Nov 22, 2016 at 12:17:40AM -0800, John Fastabend wrote:
> On 16-11-21 03:13 PM, Michael S. Tsirkin wrote:
> > On Sat, Nov 19, 2016 at 06:51:04PM -0800, John Fastabend wrote:
> >> XDP requires using isolated transmit queues to avoid interference
> >> with normal networking stack (BQL, NETDEV_TX_BUSY, etc). This patch
> >> adds a XDP queue per cpu when a XDP program is loaded and does not
> >> expose the queues to the OS via the normal API call to
> >> netif_set_real_num_tx_queues(). This way the stack will never push
> >> an skb to these queues.
> >>
> >> However virtio/vhost/qemu implementation only allows for creating
> >> TX/RX queue pairs at this time so creating only TX queues was not
> >> possible. And because the associated RX queues are being created I
> >> went ahead and exposed these to the stack and let the backend use
> >> them. This creates more RX queues visible to the network stack than
> >> TX queues which is worth mentioning but does not cause any issues as
> >> far as I can tell.
> >>
> >> Signed-off-by: John Fastabend 
> > 
> > FYI what's supposed to happen is packets from the same
> > flow going in the reverse direction will go on the
> > same queue.
> > 
> > This might come in handy when implementing RX XDP.
> > 
> 
> Yeah, but if it's the first packet not part of a flow then presumably it
> can pick any queue, but it's worth keeping in mind certainly.
> 
> .John

Oh I agree, absolutely. This was just an FYI in case it comes in useful
as an optimization down the road.

-- 
MST


Re: [PATCH] net: dsa: mv88e6xxx: egress all frames

2016-11-22 Thread Andrew Lunn
On Tue, Nov 22, 2016 at 11:39:44AM +0100, Stefan Eichenberger wrote:
> Egress multicast and egress unicast is only enabled for CPU/DSA ports
> but for switching operation it seems it should be enabled for all ports.
> Am I missing something here?
> 
> I did the following test:
> brctl addbr br0
> brctl addif br0 lan0
> brctl addif br0 lan1
> 
> In this scenario the unicast and multicast packets were not forwarded,
> therefore ARP requests were not resolved, and no connection could be
> established.

Hi Stefan

This is probably specific to the 6097 family. It works fine without
this on other devices. Creating a bridge like above and pinging across
it is one of my standard tests. But I only test modern devices like
the 6165, 6352, 6351, 6390 families.

In fact, you might need to review all the code and look where
mv88e6xxx_6095_family(chip) is used and consider if you need to add
mv88e6xxx_6097_family(chip). e.g.

if (mv88e6xxx_6095_family(chip) || mv88e6xxx_6185_family(chip)) {
/* Set the upstream port this port should use */
reg |= dsa_upstream_port(ds);
/* enable forwarding of unknown multicast addresses to
 * the upstream port
 */
if (port == dsa_upstream_port(ds))
reg |= PORT_CONTROL_2_FORWARD_UNKNOWN;
}

Maybe this is your problem?

Andrew


Re: [patch net-next 0/2] mlxsw: core: Implement thermal zone

2016-11-22 Thread David Miller
From: Jiri Pirko 
Date: Tue, 22 Nov 2016 11:24:11 +0100

> Implement thermal zone for mlxsw based HW.
> The first patch is just a register dependency for the second patch.

Looks good, series applied, thanks.


Re: [PATCH net-next] net/sched: cls_flower: verify root pointer before dereferncing it

2016-11-22 Thread Jiri Pirko
Tue, Nov 22, 2016 at 03:25:26PM CET, r...@mellanox.com wrote:
>tp->root is being allocated in init() time and kfreed in destroy()
>however it is being dereferenced in classify() path.
>
>We could be in classify() path after destroy() was called and thus 
>tp->root is null. Verifying if tp->root is null in classify() path 
>is enough because it's being freed with kfree_rcu() and classify() 
>path is under rcu_read_lock().
>
>Fixes: 1e052be69d04 ("net_sched: destroy proto tp when all filters are gone")
>Signed-off-by: Roi Dayan 
>Cc: Cong Wang 

This is correct

Reviewed-by: Jiri Pirko 

The other way to fix this would be to move tp->ops->destroy call to
call_rcu phase. That would require bigger changes though. net-next
perhaps?



>---
>
>Hi Cong, all
>
>As stated above, the issue was introduced with commit 1e052be69d04
>("net_sched: destroy proto tp when all filters are gone"). This patch
>provides a fix only for cls_flower, where I succeeded in reproducing the
>issue. Cong, if you can/want to come up with a fix that will be applicable
>to all the other classifiers, I am fine with that.
>
>Thanks,
>Roi
>
>
> net/sched/cls_flower.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
>diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
>index e8dd09a..88a26c4 100644
>--- a/net/sched/cls_flower.c
>+++ b/net/sched/cls_flower.c
>@@ -135,7 +135,7 @@ static int fl_classify(struct sk_buff *skb, const struct 
>tcf_proto *tp,
>   struct fl_flow_key skb_mkey;
>   struct ip_tunnel_info *info;
> 
>-  if (!atomic_read(&head->ht.nelems))
>+  if (!head || !atomic_read(&head->ht.nelems))
>   return -1;
> 
>   fl_clear_masked_range(&skb_key, &head->mask);
>-- 
>2.7.4
>


Re: [PATCH] ipv6:ipv6_pinfo dereferenced after NULL check

2016-11-22 Thread David Miller
From: Hannes Frederic Sowa 
Date: Tue, 22 Nov 2016 13:26:45 +0100

> On 22.11.2016 07:27, Manjeet Pawar wrote:
>> From: Rohit Thapliyal 
>> 
>> np checked for NULL and then dereferenced. It should be modified
>> for NULL case.
>> 
>> Signed-off-by: Rohit Thapliyal 
>> Signed-off-by: Manjeet Pawar 
>> ---
>>  net/ipv6/ip6_output.c | 9 +
>>  1 file changed, 5 insertions(+), 4 deletions(-)
>> 
>> diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
>> index 1dfc402..c2afa14 100644
>> --- a/net/ipv6/ip6_output.c
>> +++ b/net/ipv6/ip6_output.c
>> @@ -205,14 +205,15 @@ int ip6_xmit(const struct sock *sk, struct sk_buff *skb, struct flowi6 *fl6,
>>  /*
>>   *  Fill in the IPv6 header
>>   */
>> -if (np)
>> +if (np) {
>>  hlimit = np->hop_limit;
>> +ip6_flow_hdr(
>> +hdr, tclass, ip6_make_flowlabel(
>> +net, skb, fl6->flowlabel,
>> +np->autoflowlabel, fl6));
>> +}
>>  if (hlimit < 0)
>>  hlimit = ip6_dst_hoplimit(dst);
>>  
>> -ip6_flow_hdr(hdr, tclass, ip6_make_flowlabel(net, skb, fl6->flowlabel,
>> -np->autoflowlabel, fl6));
>> -
>>  hdr->payload_len = htons(seg_len);
>>  hdr->nexthdr = proto;
>>  hdr->hop_limit = hlimit;
>> 
> 
> 
> We should always initialize hdr and not skip the ip6_flow_hdr call.
> 
> Did you see a bug, or did you find this by code review? I wonder if np can
> actually be NULL at this point. Maybe we can just eliminate the NULL check.

Also the indentation is really off.


Re: [PATCH] net: dsa: mv88e6xxx: add MV88E6097 switch

2016-11-22 Thread Andrew Lunn
On Tue, Nov 22, 2016 at 11:28:36AM +0100, Stefan Eichenberger wrote:
> Add support for the MV88E6097 switch. The change was tested on an Armada
> based platform with a MV88E6097 switch.

Hi Stefan

Can you please base your patches on net-next? You will then find that the
ops structure has gained a few more entries.

Andrew


Re: [PATCH] fec: Always write MAC address to controller register

2016-11-22 Thread David Miller

This change is already in the tree via commit
b82d44d78480faff7456e9e0999acb9d38666057 made nearly
two months ago:

commit b82d44d78480faff7456e9e0999acb9d38666057
Author: Gavin Schenk 
Date:   Fri Sep 30 11:46:10 2016 +0200

net: fec: set mac address unconditionally

If the mac address origin is not dt, you can only safely assign a mac
address after "link up" of the device. If the link is off the clocks are
disabled and because of issues assigning registers when clocks are off the
new mac address cannot be written in .ndo_set_mac_address() on some soc's.
This fix sets the mac address unconditionally in fec_restart(...) and
ensures consistency between fec registers and the network layer.

Signed-off-by: Gavin Schenk 
Acked-by: Fugang Duan 
Acked-by: Uwe Kleine-König 
Fixes: 9638d19e4816 ("net: fec: add netif status check before set mac address")
Signed-off-by: David S. Miller 

diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index 1fa2d87..48a033e 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -913,13 +913,11 @@ fec_restart(struct net_device *ndev)
 * enet-mac reset will reset mac address registers too,
 * so need to reconfigure it.
 */
-   if (fep->quirks & FEC_QUIRK_ENET_MAC) {
-   memcpy(&temp_mac, ndev->dev_addr, ETH_ALEN);
-   writel((__force u32)cpu_to_be32(temp_mac[0]),
-  fep->hwp + FEC_ADDR_LOW);
-   writel((__force u32)cpu_to_be32(temp_mac[1]),
-  fep->hwp + FEC_ADDR_HIGH);
-   }
+   memcpy(&temp_mac, ndev->dev_addr, ETH_ALEN);
+   writel((__force u32)cpu_to_be32(temp_mac[0]),
+  fep->hwp + FEC_ADDR_LOW);
+   writel((__force u32)cpu_to_be32(temp_mac[1]),
+  fep->hwp + FEC_ADDR_HIGH);
 
/* Clear any outstanding interrupt. */
writel(0x, fep->hwp + FEC_IEVENT);
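
As an illustration of what the two address writes in the hunk above end up storing, here is a userspace sketch of the byte layout, assuming the first four MAC bytes go to the low register and the last two to the upper half of the high register (which is what a big-endian conversion of the copied words yields). `demo_pack_mac()` and `struct demo_fec_addr` are hypothetical names, and zeroing the unused low half of the high word is an assumption of this sketch only.

```c
#include <stdint.h>

#define DEMO_ETH_ALEN 6	/* length of an Ethernet MAC address in bytes */

/* Hypothetical container for the two register values. */
struct demo_fec_addr {
	uint32_t low;	/* value for FEC_ADDR_LOW: MAC bytes 0..3 */
	uint32_t high;	/* value for FEC_ADDR_HIGH: MAC bytes 4..5 in the top half */
};

/* Build both register values from a MAC address, independent of host endianness. */
struct demo_fec_addr demo_pack_mac(const uint8_t mac[DEMO_ETH_ALEN])
{
	struct demo_fec_addr r;

	r.low = ((uint32_t)mac[0] << 24) | ((uint32_t)mac[1] << 16) |
		((uint32_t)mac[2] << 8) | (uint32_t)mac[3];
	/* Assumption: the unused low 16 bits are zeroed in this sketch. */
	r.high = ((uint32_t)mac[4] << 24) | ((uint32_t)mac[5] << 16);
	return r;
}
```

For the address 00:11:22:33:44:55 this gives 0x00112233 for the low word and 0x44550000 for the high word.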



Re: [PATCH net-next] marvell: mark mvneta and mvpp2 32-bit only

2016-11-22 Thread Gregory CLEMENT
Hi Arnd,
 
 On Tue, Nov 22 2016, Arnd Bergmann wrote:

> Both of these drivers won't work on 64-bit architectures unless they
> are redesigned, since they store a virtual address pointer in a 32-bit
> field of the descriptors:
>
> drivers/net/ethernet/marvell/mvneta_bm.c: In function 'mvneta_bm_construct':
> drivers/net/ethernet/marvell/mvneta_bm.c:103:16: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast]
> drivers/net/ethernet/marvell/mvpp2.c: In function 'mvpp2_prs_vlan_init':
> drivers/net/ethernet/marvell/mvpp2.c:2563:32: error: large integer implicitly truncated to unsigned type [-Werror=overflow]
>
> This limits the COMPILE_TEST option for the two drivers again to
> only build them on 32-bit. This seems nicer than shutting up the
> warnings, in case we ever actually want to use them on 64-bit,
> as the warnings indicate which parts of the driver are currently

Actually, we are using these drivers on 64-bit platforms, so obviously they
are not 32-bit only!

For mvneta, we currently do not use BM on the 64-bit version. I agree
that there is a problem with mvneta_bm.c when building for 64-bit, but it
should not prevent us from using the mvneta driver.

Gregory

> broken there.
>
> Fixes: a0627f776a45 ("net: marvell: Allow drivers to be built with COMPILE_TEST")
> Signed-off-by: Arnd Bergmann 
> ---
>  drivers/net/ethernet/marvell/Kconfig | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/net/ethernet/marvell/Kconfig b/drivers/net/ethernet/marvell/Kconfig
> index d74d4e6f0b34..66fd9dbb2ca7 100644
> --- a/drivers/net/ethernet/marvell/Kconfig
> +++ b/drivers/net/ethernet/marvell/Kconfig
> @@ -58,6 +58,7 @@ config MVNETA
>   tristate "Marvell Armada 370/38x/XP network interface support"
>   depends on PLAT_ORION || COMPILE_TEST
>   depends on HAS_DMA
> + depends on !64BIT
>   select MVMDIO
>   select FIXED_PHY
>   ---help---
> @@ -81,6 +82,7 @@ config MVPP2
>   tristate "Marvell Armada 375 network interface support"
>   depends on MACH_ARMADA_375 || COMPILE_TEST
>   depends on HAS_DMA
> + depends on !64BIT
>   select MVMDIO
>   ---help---
> This driver supports the network interface units in the
> -- 
> 2.9.0
>

-- 
Gregory Clement, Free Electrons
Kernel, drivers, real-time and embedded Linux
development, consulting, training and support.
http://free-electrons.com


Re: [PATCH v2] net/phy: add trace events for mdio accesses

2016-11-22 Thread Andrew Lunn
On Tue, Nov 22, 2016 at 09:55:21AM -0500, Steven Rostedt wrote:
> On Tue, 22 Nov 2016 11:01:27 +0100
> Uwe Kleine-König  wrote:
> 
> > diff --git a/include/trace/events/mdio.h b/include/trace/events/mdio.h
> > new file mode 100644
> > index ..468e2d095d19
> > --- /dev/null
> > +++ b/include/trace/events/mdio.h
> > @@ -0,0 +1,42 @@
> > +#undef TRACE_SYSTEM
> > +#define TRACE_SYSTEM mdio
> > +
> > +#if !defined(_TRACE_MDIO_H) || defined(TRACE_HEADER_MULTI_READ)
> > +#define _TRACE_MDIO_H
> > +
> > +#include <linux/tracepoint.h>
> > +
> > +TRACE_EVENT_CONDITION(mdio_access,
> > +
> > +   TP_PROTO(struct mii_bus *bus, int read,
> > +unsigned addr, unsigned regnum, u16 val, int err),
> > +
> > +   TP_ARGS(bus, read, addr, regnum, val, err),
> > +
> > +   TP_CONDITION(err >= 0),
> > +
> > +   TP_STRUCT__entry(
> > +   __array(char, busid, MII_BUS_ID_SIZE)
> > +   __field(int, read)
> 
> read is just a 0 or 1. What about making it a char? That way we can
> pack this better. If I'm not mistaken, MII_BUS_ID_SIZE is (20 - 3) or
> 17. If read is just one byte, then it can fit in one of those three
> bytes, and you save 4 extra bytes (assuming addr will be 4 byte
> aligned).

addr could also be cast into a u8. There are a maximum of 32
addresses on an MDIO bus. Because of clause 45 MDIO, regnum needs to
remain a u32.
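
To make the packing argument concrete, here is a small userspace sketch comparing the two layouts. It assumes MII_BUS_ID_SIZE is 17 as stated above and models only the event-specific fields (the common trace entry header is left out). The struct and function names are hypothetical, and exact sizes depend on the ABI, so only the relative comparison is meant to hold.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_BUS_ID_SIZE 17	/* assumed value of MII_BUS_ID_SIZE (20 - 3) */

/* Layout resembling the v2 proposal: read and addr are full-width ints. */
struct demo_entry_wide {
	char busid[DEMO_BUS_ID_SIZE];
	int read;		/* padded up to the next int boundary */
	unsigned addr;
	unsigned regnum;
	uint16_t val;
};

/* Layout resembling the suggestion: one-byte read and addr reuse the padding. */
struct demo_entry_packed {
	char busid[DEMO_BUS_ID_SIZE];
	char read;		/* sits directly behind busid */
	uint8_t addr;		/* at most 32 addresses on an MDIO bus */
	uint16_t val;
	unsigned regnum;	/* stays wide because of clause 45 register numbers */
};

size_t demo_entry_wide_size(void)
{
	return sizeof(struct demo_entry_wide);
}

size_t demo_entry_packed_size(void)
{
	return sizeof(struct demo_entry_packed);
}
```

On a typical ABI with 4-byte int alignment, the packed variant should come out several bytes smaller per event.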

   Andrew


Re: wl1251 & mac address & calibration data

2016-11-22 Thread Michal Kazior
On 21 November 2016 at 16:51, Pali Rohár  wrote:
> On Friday 11 November 2016 18:20:50 Pali Rohár wrote:
>> Hi! I will open discussion about mac address and calibration data for
>> wl1251 wireless chip again...
>>
>> Problem: Mac address & calibration data for wl1251 chip on Nokia N900
>> are stored on second nand partition (mtd1) in special proprietary format
>> which is used only for Nokia N900 (probably on N8x0 and N9 too).
>> Wireless driver wl1251.ko cannot work without mac address and
>> calibration data.

Same problem applies to some ath9k/ath10k supported routers. Some even
carry mac address as implicit offset from ethernet mac address. As far
as I understand OpenWRT cooks cal blobs on first boot prior to loading
modules.


>> Without a mac address, the driver generates a random mac address at
>> every kernel boot, which causes a couple of problems (unstable identifier of
>> wireless device due to udev permanent storage rules; unpredictable
>> behaviour for dhcp mac address assignment, mac address filtering, ...).
>>
>> Currently there is no way to set (permanent) mac address for network
>> interface from userspace. And it does not make sense to implement in
>> linux kernel large parser for proprietary format of second nand
>> partition where is mac address stored only for one device -- Nokia N900.
>>
>> Driver wl1251.ko loads calibration data via request_firmware() for file
>> wl1251-nvs.bin. There is an "example" calibration file in the
>> linux-firmware repository, but it is not suitable for normal usage as real
>> calibration data are per-device specific.

You could hook up a script that cooks up the cal/mac file via
modprobe's install hook, no?


Michał


Re: [PATCH net-next] marvell: mark mvneta and mvpp2 32-bit only

2016-11-22 Thread David Miller
From: Arnd Bergmann 
Date: Tue, 22 Nov 2016 15:21:22 +0100

> Both of these drivers won't work on 64-bit architectures unless they
> are redesigned, since they store a virtual address pointer in a 32-bit
> field of the descriptors:
> 
> drivers/net/ethernet/marvell/mvneta_bm.c: In function 'mvneta_bm_construct':
> drivers/net/ethernet/marvell/mvneta_bm.c:103:16: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast]
> drivers/net/ethernet/marvell/mvpp2.c: In function 'mvpp2_prs_vlan_init':
> drivers/net/ethernet/marvell/mvpp2.c:2563:32: error: large integer implicitly truncated to unsigned type [-Werror=overflow]
> 
> This limits the COMPILE_TEST option for the two drivers again to
> only build them on 32-bit. This seems nicer than shutting up the
> warnings, in case we ever actually want to use them on 64-bit,
> as the warnings indicate which parts of the driver are currently
> broken there.
> 
> Fixes: a0627f776a45 ("net: marvell: Allow drivers to be built with COMPILE_TEST")
> Signed-off-by: Arnd Bergmann 

Ok, this is a reasonable thing to do for now until the 64-bit patches
are sorted out.

Applied, thanks Arnd.



Re: [RFC net-next 0/3] net: bridge: Allow CPU port configuration

2016-11-22 Thread Vivien Didelot
Hi Florian,

Florian Fainelli  writes:

> This patch series allows using the bridge master interface to configure
> an Ethernet switch port's CPU/management port with different VLAN attributes
> than those of the bridge downstream ports/members.
>
> Jiri, Ido, Andrew, Vivien, please review the impact on mlxsw and mv88e6xxx, I
> tested this with b53 and a mockup DSA driver.

Patchset looks fine to me overall. I'm cooking a patch similar to 3/3
for mv88e6xxx to put on top of this patchset.

Minor comments on individual patches will follow.

> Open questions:
>
> - if we have more than one bridge on top of a physical switch, the driver
>   should keep track of that and verify that we are not going to change
>   the CPU port VLAN attributes in a way that results in incompatible settings
>   to be applied

In mv88e6xxx, mv88e6xxx_port_check_hw_vlan() does that. It needs a small
adjustment though.

> - if the default behavior is to have all VLANs associated with the CPU port
>   be ingressing/egressing tagged to the CPU, is this really useful?

I have no strong opinion on this. Intuitively I'd expect the CPU port to
be excluded until I add it myself, but I didn't think much about it.

Thanks,

Vivien


Re: wl1251 & mac address & calibration data

2016-11-22 Thread Pali Rohár
On Tuesday 22 November 2016 16:22:57 Michal Kazior wrote:
> On 21 November 2016 at 16:51, Pali Rohár  wrote:
> > On Friday 11 November 2016 18:20:50 Pali Rohár wrote:
> >> Hi! I will open discussion about mac address and calibration data for
> >> wl1251 wireless chip again...
> >>
> >> Problem: Mac address & calibration data for wl1251 chip on Nokia N900
> >> are stored on second nand partition (mtd1) in special proprietary format
> >> which is used only for Nokia N900 (probably on N8x0 and N9 too).
> >> Wireless driver wl1251.ko cannot work without mac address and
> >> calibration data.
> 
> Same problem applies to some ath9k/ath10k supported routers. Some even
> carry mac address as implicit offset from ethernet mac address. As far
> as I understand OpenWRT cooks cal blobs on first boot prior to loading
> modules.

So... wl1251 on Nokia N900 is not alone and this problem is there for
more drivers and devices. Which means we should come up with some
generic solution.

> >> Without a mac address, the driver generates a random mac address at
> >> every kernel boot, which causes a couple of problems (unstable identifier of
> >> wireless device due to udev permanent storage rules; unpredictable
> >> behaviour for dhcp mac address assignment, mac address filtering, ...).
> >>
> >> Currently there is no way to set (permanent) mac address for network
> >> interface from userspace. And it does not make sense to implement in
> >> linux kernel large parser for proprietary format of second nand
> >> partition where is mac address stored only for one device -- Nokia N900.
> >>
> >> Driver wl1251.ko loads calibration data via request_firmware() for file
> >> wl1251-nvs.bin. There is an "example" calibration file in the
> >> linux-firmware repository, but it is not suitable for normal usage as real
> >> calibration data are per-device specific.
> 
> You could hook up a script that cooks up the cal/mac file via
> modprobe's install hook, no?

Via modprobe hook I can either pass custom module parameter or call any
other system (shell) commands.

As wl1251.ko does not accept mac_address as module parameter, such
modprobe hook does not help -- as there is absolutely no way from
userspace to set or change (permanent) mac address.

-- 
Pali Rohár
pali.ro...@gmail.com


Re: [PATCH net-next] net/sched: cls_flower: verify root pointer before dereferncing it

2016-11-22 Thread David Miller
From: Jiri Pirko 
Date: Tue, 22 Nov 2016 15:48:44 +0100

> Tue, Nov 22, 2016 at 03:25:26PM CET, r...@mellanox.com wrote:
>>tp->root is being allocated in init() time and kfreed in destroy()
>>however it is being dereferenced in classify() path.
>>
>>We could be in classify() path after destroy() was called and thus 
>>tp->root is null. Verifying if tp->root is null in classify() path 
>>is enough because it's being freed with kfree_rcu() and classify() 
>>path is under rcu_read_lock().
>>
>>Fixes: 1e052be69d04 ("net_sched: destroy proto tp when all filters are gone")
>>Signed-off-by: Roi Dayan 
>>Cc: Cong Wang 
> 
> This is correct
> 
> Reviewed-by: Jiri Pirko 
> 
> The other way to fix this would be to move tp->ops->destroy call to
> call_rcu phase. That would require bigger changes though. net-next
> perhaps?

This patch is targeted at net-next as per Subj.


Re: [PATCH v3 3/5] net: asix: Fix AX88772x resume failures

2016-11-22 Thread Jon Hunter
Hi Allan,

On 18/11/16 15:09, Jon Hunter wrote:
> Hi Allan,
> 
> On 14/11/16 09:45, ASIX_Allan [Office] wrote:
>> Hi Jon,
>>
>> Please help to double check if the USB host controller of your Tegra
>> platform had been powered OFF while running the ax88772_suspend() routine or
>> not? 
> 
> Sorry for the delay. Today I set up a local board to reproduce this on
> and was able to recreate the same problem. The Tegra xhci driver does
> not power off during suspend and simply calls xhci_suspend(). I also
> checked vbus to see if it was turning off but it is not. Furthermore I
> don't see a new USB device detected after the error and so I don't see
> any evidence that it ever disconnects.

In an attempt to isolate if this is a Tegra issue or not, I recompiled 
v4.9-rc6 for x86 and I was able to reproduce the problem on my desktop ...

[  256.030060] PM: Syncing filesystems ... done.
[  256.113925] PM: Preparing system for sleep (mem)
[  256.114119] Freezing user space processes ... (elapsed 0.002 seconds) done.
[  256.116701] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[  256.118041] PM: Suspending system (mem)
[  256.118058] Suspending console(s) (use no_console_suspend to debug)
[  256.118324] asix 1-1.2:1.0 eth2: Failed to read reg index 0x: -19
[  256.118327] asix 1-1.2:1.0 eth2: Error reading Medium Status register: ffed
[  256.118329] asix 1-1.2:1.0 eth2: Failed to write reg index 0x: -19
[  256.118332] asix 1-1.2:1.0 eth2: Failed to write Medium Mode mode to 0xfeed: ffed
[  256.118374] sd 0:0:0:0: [sda] Synchronizing SCSI cache
[  256.118471] sd 0:0:0:0: [sda] Stopping disk
[  256.152992] hpet1: lost 1 rtc interrupts
[  256.153893] serial 00:06: disabled
[  256.153899] serial 00:06: System wakeup disabled by ACPI
[  256.154068] e1000e: EEE TX LPI TIMER: 0011
[  256.628281] PM: suspend of devices complete after 509.782 msecs
[  256.628620] PM: late suspend of devices complete after 0.336 msecs
[  256.629366] ehci-pci :00:1d.0: System wakeup enabled by ACPI
[  256.629595] tg3 :03:00.0: System wakeup enabled by ACPI
[  256.629601] ehci-pci :00:1a.0: System wakeup enabled by ACPI
[  256.629652] e1000e :00:19.0: System wakeup enabled by ACPI
[  256.629812] xhci_hcd :00:14.0: System wakeup enabled by ACPI
[  256.648347] PM: noirq suspend of devices complete after 19.713 msecs
[  256.648685] ACPI: Preparing to enter system sleep state S3
[  256.668275] PM: Saving platform NVS memory
[  256.668283] Disabling non-boot CPUs ...

To reproduce this, I did the following:

1. Connected the asix device and noted the net interface (i.e. eth2)
2. Disabled the interface (i.e. sudo ifconfig eth2 down)
3. Ran a suspend-resume cycle using rtcwake (e.g. sudo rtcwake -d rtc0 -m mem -s 5)

Cheers
Jon

-- 
nvpublic


[PATCH] net: mvneta: Only disable mvneta_bm for 64-bits

2016-11-22 Thread Gregory CLEMENT
Actually only the mvneta_bm support is not 64-bits compatible.
The mvneta code itself can run on 64-bits architecture.

Signed-off-by: Gregory CLEMENT 
---
 drivers/net/ethernet/marvell/Kconfig | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/marvell/Kconfig b/drivers/net/ethernet/marvell/Kconfig
index 66fd9dbb2ca7..2ccea9dd9248 100644
--- a/drivers/net/ethernet/marvell/Kconfig
+++ b/drivers/net/ethernet/marvell/Kconfig
@@ -44,6 +44,7 @@ config MVMDIO
 config MVNETA_BM_ENABLE
tristate "Marvell Armada 38x/XP network interface BM support"
depends on MVNETA
+   depends on !64BIT
---help---
  This driver supports auxiliary block of the network
  interface units in the Marvell ARMADA XP and ARMADA 38x SoC
@@ -58,7 +59,6 @@ config MVNETA
tristate "Marvell Armada 370/38x/XP network interface support"
depends on PLAT_ORION || COMPILE_TEST
depends on HAS_DMA
-   depends on !64BIT
select MVMDIO
select FIXED_PHY
---help---
@@ -71,6 +71,7 @@ config MVNETA
 
 config MVNETA_BM
tristate
+   depends on !64BIT
default y if MVNETA=y && MVNETA_BM_ENABLE!=n
default MVNETA_BM_ENABLE
select HWBM
-- 
2.10.2



Re: [RFC net-next 1/3] net: bridge: Allow bridge master device to configure switch CPU port

2016-11-22 Thread Vivien Didelot
Hi Florian,

Florian Fainelli  writes:

> bridge vlan add vid 2 dev br0 self
>   -> CPU port gets programmed
> bridge vlan add vid 2 dev port0
>   -> port0 (switch port 0) gets programmed

Although this is not specific to this patch, I'd like to point out that
this seems not to be the behavior bridge expects.

The bridge manpage says:

bridge vlan add - add a new vlan filter entry
...

   self   the vlan is configured on the specified physical device.
  Required if the device is the bridge device.

   master the vlan is configured on the software bridge (default).

So if I'm not mistaken, the switch chip must be programmed only when the
bridge command is called with the "self" attribute. Without it, only
software configuration must be made, like what happens when the driver
returns -EOPNOTSUPP.

Currently, both commands below program the hardware:

# bridge vlan add vid 2 dev port0 [master]
# bridge vlan add vid 2 dev port0 [master] self

Jiri, what do you think? Is there a reason for switchdev not to be
consistent with the bridge doc, or should this be fixed?

Thanks,

Vivien


[PATCH v3] net/phy: add trace events for mdio accesses

2016-11-22 Thread Uwe Kleine-König
Make it possible to generate trace events for mdio read and write accesses.

Signed-off-by: Uwe Kleine-König 
---
 drivers/net/phy/mdio_bus.c  | 11 +++
 include/trace/events/mdio.h | 42 ++
 2 files changed, 53 insertions(+)
 create mode 100644 include/trace/events/mdio.h

diff --git a/drivers/net/phy/mdio_bus.c b/drivers/net/phy/mdio_bus.c
index 09deef4bed09..653d076eafe5 100644
--- a/drivers/net/phy/mdio_bus.c
+++ b/drivers/net/phy/mdio_bus.c
@@ -38,6 +38,9 @@
 
 #include 
 
+#define CREATE_TRACE_POINTS
+#include <trace/events/mdio.h>
+
 int mdiobus_register_device(struct mdio_device *mdiodev)
 {
if (mdiodev->bus->mdio_map[mdiodev->addr])
@@ -461,6 +464,8 @@ int mdiobus_read_nested(struct mii_bus *bus, int addr, u32 regnum)
retval = bus->read(bus, addr, regnum);
mutex_unlock(&bus->mdio_lock);
 
+   trace_mdio_access(bus, 1, addr, regnum, retval, retval);
+
return retval;
 }
 EXPORT_SYMBOL(mdiobus_read_nested);
@@ -485,6 +490,8 @@ int mdiobus_read(struct mii_bus *bus, int addr, u32 regnum)
retval = bus->read(bus, addr, regnum);
mutex_unlock(&bus->mdio_lock);
 
+   trace_mdio_access(bus, 1, addr, regnum, retval, retval);
+
return retval;
 }
 EXPORT_SYMBOL(mdiobus_read);
@@ -513,6 +520,8 @@ int mdiobus_write_nested(struct mii_bus *bus, int addr, u32 regnum, u16 val)
err = bus->write(bus, addr, regnum, val);
mutex_unlock(&bus->mdio_lock);
 
+   trace_mdio_access(bus, 0, addr, regnum, val, err);
+
return err;
 }
 EXPORT_SYMBOL(mdiobus_write_nested);
@@ -538,6 +547,8 @@ int mdiobus_write(struct mii_bus *bus, int addr, u32 regnum, u16 val)
err = bus->write(bus, addr, regnum, val);
mutex_unlock(&bus->mdio_lock);
 
+   trace_mdio_access(bus, 0, addr, regnum, val, err);
+
return err;
 }
 EXPORT_SYMBOL(mdiobus_write);
diff --git a/include/trace/events/mdio.h b/include/trace/events/mdio.h
new file mode 100644
index ..00d85f5f54e4
--- /dev/null
+++ b/include/trace/events/mdio.h
@@ -0,0 +1,42 @@
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM mdio
+
+#if !defined(_TRACE_MDIO_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_MDIO_H
+
+#include <linux/tracepoint.h>
+
+TRACE_EVENT_CONDITION(mdio_access,
+
+   TP_PROTO(struct mii_bus *bus, char read,
+u8 addr, unsigned regnum, u16 val, int err),
+
+   TP_ARGS(bus, read, addr, regnum, val, err),
+
+   TP_CONDITION(err >= 0),
+
+   TP_STRUCT__entry(
+   __array(char, busid, MII_BUS_ID_SIZE)
+   __field(char, read)
+   __field(u8, addr)
+   __field(u16, val)
+   __field(unsigned, regnum)
+   ),
+
+   TP_fast_assign(
+   strncpy(__entry->busid, bus->id, MII_BUS_ID_SIZE);
+   __entry->read = read;
+   __entry->addr = addr;
+   __entry->regnum = regnum;
+   __entry->val = val;
+   ),
+
+   TP_printk("%s %-5s phy:0x%02hhx reg:0x%02x val:0x%04hx",
+ __entry->busid, __entry->read ? "read" : "write",
+ __entry->addr, __entry->regnum, __entry->val)
+);
+
+#endif /* if !defined(_TRACE_MDIO_H) || defined(TRACE_HEADER_MULTI_READ) */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
-- 
2.10.2



Re: [PATCH] fec: Always write MAC address to controller register

2016-11-22 Thread Daniel Krüger
Sorry, I missed it.
But thanks for the fast answer.

cu,
  Daniel

On 22.11.2016 at 16:07, David Miller wrote:
> 
> This change is already in the tree via commit
> b82d44d78480faff7456e9e0999acb9d38666057 made nearly
> two months ago:
> [...]


Re: [PATCH v3] net/phy: add trace events for mdio accesses

2016-11-22 Thread Steven Rostedt
On Tue, 22 Nov 2016 16:47:11 +0100
Uwe Kleine-König  wrote:

> Make it possible to generate trace events for mdio read and write accesses.
> 
> Signed-off-by: Uwe Kleine-König 

For the tracing side.

Acked-by: Steven Rostedt 

-- Steve


[PATCH net] ipv6: bump genid when the IFA_F_TENTATIVE flag is clear

2016-11-22 Thread Paolo Abeni
When an ipv6 address has the tentative flag set, it can't be
used as source for egress traffic, while the associated route,
if any, can be looked up and even stored into some dst_cache.

In the latter scenario, the source ipv6 address selected and
stored in the cache is most probably wrong (e.g. with
link-local scope) and the entity using the dst_cache will
experience lack of ipv6 connectivity until said cache is
cleared or invalidated.

Overall this may cause lack of connectivity over most IPv6 tunnels
(comprising geneve and vxlan), if the first egress packet reaches
the tunnel before the DaD is completed for the used ipv6
address.

This patch bumps a new genid after that the IFA_F_TENTATIVE flag
is cleared, so that dst_cache will be invalidated on
next lookup and ipv6 connectivity restored.

Fixes: 0c1d70af924b ("net: use dst_cache for vxlan device")
Fixes: 468dfffcd762 ("geneve: add dst caching support")
Acked-by: Hannes Frederic Sowa 
Signed-off-by: Paolo Abeni 
---
 net/ipv6/addrconf.c | 18 --
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 060dd99..4bc5ba3 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -183,7 +183,7 @@ static struct rt6_info *addrconf_get_prefix_route(const struct in6_addr *pfx,
 
 static void addrconf_dad_start(struct inet6_ifaddr *ifp);
 static void addrconf_dad_work(struct work_struct *w);
-static void addrconf_dad_completed(struct inet6_ifaddr *ifp);
+static void addrconf_dad_completed(struct inet6_ifaddr *ifp, bool bump_id);
 static void addrconf_dad_run(struct inet6_dev *idev);
 static void addrconf_rs_timer(unsigned long data);
 static void __ipv6_ifa_notify(int event, struct inet6_ifaddr *ifa);
@@ -2898,6 +2898,7 @@ static void add_addr(struct inet6_dev *idev, const struct in6_addr *addr,
spin_lock_bh(&ifp->lock);
ifp->flags &= ~IFA_F_TENTATIVE;
spin_unlock_bh(&ifp->lock);
+   rt_genid_bump_ipv6(dev_net(idev->dev));
ipv6_ifa_notify(RTM_NEWADDR, ifp);
in6_ifa_put(ifp);
}
@@ -3740,7 +3741,7 @@ static void addrconf_dad_begin(struct inet6_ifaddr *ifp)
 {
struct inet6_dev *idev = ifp->idev;
struct net_device *dev = idev->dev;
-   bool notify = false;
+   bool bump_id, notify = false;
 
addrconf_join_solict(dev, &ifp->addr);
 
@@ -3755,11 +3756,12 @@ static void addrconf_dad_begin(struct inet6_ifaddr *ifp)
idev->cnf.accept_dad < 1 ||
!(ifp->flags&IFA_F_TENTATIVE) ||
ifp->flags & IFA_F_NODAD) {
+   bump_id = ifp->flags & IFA_F_TENTATIVE;
    ifp->flags &= ~(IFA_F_TENTATIVE|IFA_F_OPTIMISTIC|IFA_F_DADFAILED);
spin_unlock(&ifp->lock);
read_unlock_bh(&idev->lock);
 
-   addrconf_dad_completed(ifp);
+   addrconf_dad_completed(ifp, bump_id);
return;
}
 
@@ -3819,8 +3821,8 @@ static void addrconf_dad_work(struct work_struct *w)
struct inet6_ifaddr,
dad_work);
struct inet6_dev *idev = ifp->idev;
+   bool bump_id, disable_ipv6 = false;
struct in6_addr mcaddr;
-   bool disable_ipv6 = false;
 
enum {
DAD_PROCESS,
@@ -3890,11 +3892,12 @@ static void addrconf_dad_work(struct work_struct *w)
 * DAD was successful
 */
 
+   bump_id = ifp->flags & IFA_F_TENTATIVE;
    ifp->flags &= ~(IFA_F_TENTATIVE|IFA_F_OPTIMISTIC|IFA_F_DADFAILED);
spin_unlock(&ifp->lock);
write_unlock_bh(&idev->lock);
 
-   addrconf_dad_completed(ifp);
+   addrconf_dad_completed(ifp, bump_id);
 
goto out;
}
@@ -3931,7 +3934,7 @@ static bool ipv6_lonely_lladdr(struct inet6_ifaddr *ifp)
return true;
 }
 
-static void addrconf_dad_completed(struct inet6_ifaddr *ifp)
+static void addrconf_dad_completed(struct inet6_ifaddr *ifp, bool bump_id)
 {
struct net_device *dev = ifp->idev->dev;
struct in6_addr lladdr;
@@ -3983,6 +3986,9 @@ static void addrconf_dad_completed(struct inet6_ifaddr *ifp)
spin_unlock(&ifp->lock);
write_unlock_bh(&ifp->idev->lock);
}
+
+   if (bump_id)
+   rt_genid_bump_ipv6(dev_net(dev));
 }
 
 static void addrconf_dad_run(struct inet6_dev *idev)
-- 
1.8.3.1
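
The invalidation idea described in the commit message can be sketched in userspace as a generation counter: a cached entry remembers the generation it was stored under, and a lookup treats it as stale once the counter has moved on. All names below (`demo_net`, `demo_dst_cache`, and the three functions) are hypothetical, and this is not meant to mirror how the kernel's dst_cache or rt_genid_bump_ipv6() are implemented internally.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-namespace state holding the generation counter. */
struct demo_net {
	unsigned int genid;
};

/* Hypothetical cache entry: a source address plus the genid it was stored under. */
struct demo_dst_cache {
	const char *saddr;
	unsigned int genid;
	bool filled;
};

/* Called when an address leaves the tentative state: invalidates all entries. */
void demo_genid_bump(struct demo_net *net)
{
	net->genid++;
}

/* Remember a source address together with the current generation. */
void demo_cache_store(struct demo_dst_cache *c, const struct demo_net *net,
		      const char *saddr)
{
	c->saddr = saddr;
	c->genid = net->genid;
	c->filled = true;
}

/* Return the cached address, or NULL when the entry is empty or stale. */
const char *demo_cache_lookup(const struct demo_dst_cache *c,
			      const struct demo_net *net)
{
	if (!c->filled || c->genid != net->genid)
		return NULL;
	return c->saddr;
}
```

In this model, an entry stored while the address was still tentative is dropped on the first lookup after the bump, which forces a fresh source address selection.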



Re: [PATCH net 1/1] net sched filters: pass netlink message flags in event notification

2016-11-22 Thread Roman Mashak
Daniel Borkmann  writes:

> On 11/22/2016 06:23 AM, Cong Wang wrote:
>> On Thu, Nov 17, 2016 at 1:02 PM, Cong Wang  wrote:
>>> On Wed, Nov 16, 2016 at 2:16 PM, Roman Mashak  wrote:
>>>> Userland client should be able to read an event, and reflect it back to
>>>> the kernel, therefore it needs to extract complete set of netlink flags.
>>>>
>>>> For example, this will allow "tc monitor" to distinguish Add and Replace
>>>> operations.
>>>>
>>>> Signed-off-by: Roman Mashak 
>>>> Signed-off-by: Jamal Hadi Salim 
>>>> ---
>>>>   net/sched/cls_api.c | 5 +++--
>>>>   1 file changed, 3 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
>>>> index 2b2a797..8e93d4a 100644
>>>> --- a/net/sched/cls_api.c
>>>> +++ b/net/sched/cls_api.c
>>>> @@ -112,7 +112,7 @@ static void tfilter_notify_chain(struct net *net, struct sk_buff *oskb,
>>>>
>>>>   for (it_chain = chain; (tp = rtnl_dereference(*it_chain)) != NULL;
>>>>    it_chain = &tp->next)
>>>> -   tfilter_notify(net, oskb, n, tp, 0, event, false);
>>>> +   tfilter_notify(net, oskb, n, tp, n->nlmsg_flags, event, false);
>>>
>>>
>>> I must miss something, why does it make sense to pass n->nlmsg_flags
>>> as 'fh' to tfilter_notify()??
>>
>> Ping... Any response?
>>
>> It still doesn't look correct to me. I will send a fix unless someone could
>> explain this.
>
> Sigh, I missed that this was applied already to -net (it certainly doesn't
> look like -net material, but rather -net-next stuff) ... This definitely
> looks buggy to me, the 0 as it was before was correct here (as it means we
> delete the whole chain in this case).
>
> If you could send a patch would be great. Thanks Cong!

Cong/Daniel, sorry for late response, I was distracted.
I apologize, I will send a fix today.

-- 
Roman Mashak


Re: [PATCH net-next] net/sched: cls_flower: verify root pointer before dereferncing it

2016-11-22 Thread Daniel Borkmann

[ + John ]

On 11/22/2016 03:48 PM, Jiri Pirko wrote:

> Tue, Nov 22, 2016 at 03:25:26PM CET, r...@mellanox.com wrote:
>> tp->root is being allocated in init() time and kfreed in destroy()
>> however it is being dereferenced in classify() path.
>>
>> We could be in classify() path after destroy() was called and thus
>> tp->root is null. Verifying if tp->root is null in classify() path
>> is enough because it's being freed with kfree_rcu() and classify()
>> path is under rcu_read_lock().
>>
>> Fixes: 1e052be69d04 ("net_sched: destroy proto tp when all filters are gone")
>> Signed-off-by: Roi Dayan 
>> Cc: Cong Wang 
>
> This is correct
>
> Reviewed-by: Jiri Pirko 
>
> The other way to fix this would be to move tp->ops->destroy call to
> call_rcu phase. That would require bigger changes though. net-next
> perhaps?


Hmm, I don't think we want to have such an additional test in fast
path for each and every classifier. Can we think of ways to avoid that?

My question is, since we unlink individual instances from such tp-internal
lists through RCU and release the instance through call_rcu() as well as
the head (tp->root) via kfree_rcu() eventually, against what are we protecting
setting RCU_INIT_POINTER(tp->root, NULL) in ->destroy() callback? Something
not respecting grace period?

The only thing that actually checks if tp->root is NULL right now is the
get() callback. Is that the reason why tp->root is RCU'ified? John?

Thanks,
Daniel


Hi Cong, all

As stated above, the issue was introduced with commit 1e052be69d04 ("net_sched: 
destroy
proto tp when all filters are gone"). This patch provides a fix only for 
cls_flower where
I succeeded in reproducing the issue. Cong, if you can/want to come up with a 
fix that
will be applicable for all the others classifiners, I am fine with that.

Thanks,
Roi


net/sched/cls_flower.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index e8dd09a..88a26c4 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -135,7 +135,7 @@ static int fl_classify(struct sk_buff *skb, const struct tcf_proto *tp,
struct fl_flow_key skb_mkey;
struct ip_tunnel_info *info;

-   if (!atomic_read(&head->ht.nelems))
+   if (!head || !atomic_read(&head->ht.nelems))
return -1;

fl_clear_masked_range(&skb_key, &head->mask);
--
2.7.4
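
The guard added by the patch can be illustrated with a minimal userspace sketch. The struct and function names below are hypothetical stand-ins, not the kernel's definitions, and the sketch is single-threaded: it does not model RCU grace periods, only the shape of the "head may already be NULL" test in the classify path.

```c
#include <stddef.h>

/* Hypothetical stand-in for the classifier head; not the kernel layout. */
struct cls_head {
	int nelems;             /* number of installed filters */
};

/* Hypothetical stand-in for struct tcf_proto. */
struct tcf_proto_sketch {
	struct cls_head *root;  /* set to NULL once destroy() has run */
};

/* Mirrors the patched test: return -1 when the head is already gone
 * or when it holds no filters; otherwise pretend a match was found.
 */
static int classify_sketch(const struct tcf_proto_sketch *tp)
{
	/* In the kernel this load would go through an RCU dereference. */
	const struct cls_head *head = tp->root;

	if (!head || !head->nelems)
		return -1;

	return 0; /* a real classifier would look up the flow here */
}
```

Calling classify_sketch() on an instance whose root is NULL returns -1 instead of dereferencing a null pointer, which is the only behaviour the one-line patch changes.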





[PATCH net-next] net: mvneta: Only disable mvneta_bm for 64-bits

2016-11-22 Thread Gregory CLEMENT
Actually only the mvneta_bm support is not 64-bits compatible.
The mvneta code itself can run on 64-bits architecture.

Signed-off-by: Gregory CLEMENT 
---
 drivers/net/ethernet/marvell/Kconfig | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/marvell/Kconfig b/drivers/net/ethernet/marvell/Kconfig
index 66fd9dbb2ca7..2ccea9dd9248 100644
--- a/drivers/net/ethernet/marvell/Kconfig
+++ b/drivers/net/ethernet/marvell/Kconfig
@@ -44,6 +44,7 @@ config MVMDIO
 config MVNETA_BM_ENABLE
tristate "Marvell Armada 38x/XP network interface BM support"
depends on MVNETA
+   depends on !64BIT
---help---
  This driver supports auxiliary block of the network
  interface units in the Marvell ARMADA XP and ARMADA 38x SoC
@@ -58,7 +59,6 @@ config MVNETA
tristate "Marvell Armada 370/38x/XP network interface support"
depends on PLAT_ORION || COMPILE_TEST
depends on HAS_DMA
-   depends on !64BIT
select MVMDIO
select FIXED_PHY
---help---
@@ -71,6 +71,7 @@ config MVNETA
 
 config MVNETA_BM
tristate
+   depends on !64BIT
default y if MVNETA=y && MVNETA_BM_ENABLE!=n
default MVNETA_BM_ENABLE
select HWBM
-- 
2.10.2



Re: [PATCH] net: mvneta: Only disable mvneta_bm for 64-bits

2016-11-22 Thread Gregory CLEMENT
Hi,
 
On Tue, Nov 22 2016, Gregory CLEMENT wrote:

> Actually only the mvneta_bm support is not 64-bits compatible.
> The mvneta code itself can run on 64-bits architecture.

I have just realized that my subject prefix was wrong (net-next was
missing). I am sending a new email with the correct prefix.

Sorry for the noise.

Gregory

>
> Signed-off-by: Gregory CLEMENT 
> ---
>  drivers/net/ethernet/marvell/Kconfig | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/marvell/Kconfig b/drivers/net/ethernet/marvell/Kconfig
> index 66fd9dbb2ca7..2ccea9dd9248 100644
> --- a/drivers/net/ethernet/marvell/Kconfig
> +++ b/drivers/net/ethernet/marvell/Kconfig
> @@ -44,6 +44,7 @@ config MVMDIO
>  config MVNETA_BM_ENABLE
>   tristate "Marvell Armada 38x/XP network interface BM support"
>   depends on MVNETA
> + depends on !64BIT
>   ---help---
> This driver supports auxiliary block of the network
> interface units in the Marvell ARMADA XP and ARMADA 38x SoC
> @@ -58,7 +59,6 @@ config MVNETA
>   tristate "Marvell Armada 370/38x/XP network interface support"
>   depends on PLAT_ORION || COMPILE_TEST
>   depends on HAS_DMA
> - depends on !64BIT
>   select MVMDIO
>   select FIXED_PHY
>   ---help---
> @@ -71,6 +71,7 @@ config MVNETA
>  
>  config MVNETA_BM
>   tristate
> + depends on !64BIT
>   default y if MVNETA=y && MVNETA_BM_ENABLE!=n
>   default MVNETA_BM_ENABLE
>   select HWBM
> -- 
> 2.10.2
>

-- 
Gregory Clement, Free Electrons
Kernel, drivers, real-time and embedded Linux
development, consulting, training and support.
http://free-electrons.com


Re: [PATCH net-next] net/sched: cls_flower: verify root pointer before dereferncing it

2016-11-22 Thread Jiri Pirko
Tue, Nov 22, 2016 at 05:04:11PM CET, dan...@iogearbox.net wrote:
>[ + John ]
>
>On 11/22/2016 03:48 PM, Jiri Pirko wrote:
>> Tue, Nov 22, 2016 at 03:25:26PM CET, r...@mellanox.com wrote:
>> > tp->root is being allocated in init() time and kfreed in destroy()
>> > however it is being dereferenced in classify() path.
>> > 
>> > We could be in classify() path after destroy() was called and thus
>> > tp->root is null. Verifying if tp->root is null in classify() path
>> > is enough because it's being freed with kfree_rcu() and classify()
>> > path is under rcu_read_lock().
>> > 
>> > Fixes: 1e052be69d04 ("net_sched: destroy proto tp when all filters are gone")
>> > Signed-off-by: Roi Dayan 
>> > Cc: Cong Wang 
>> 
>> This is correct
>> 
>> Reviewed-by: Jiri Pirko 
>> 
>> The other way to fix this would be to move tp->ops->destroy call to
>> call_rcu phase. That would require bigger changes though. net-next
>> perhaps?
>
>Hmm, I don't think we want to have such an additional test in fast
>path for each and every classifier. Can we think of ways to avoid that?
>
>My question is, since we unlink individual instances from such tp-internal
>lists through RCU and release the instance through call_rcu() as well as
>the head (tp->root) via kfree_rcu() eventually, against what are we protecting
>setting RCU_INIT_POINTER(tp->root, NULL) in ->destroy() callback? Something
>not respecting grace period?

If you call tp->ops->destroy in call_rcu, you don't have to set tp->root
to null.


>
>The only thing that actually checks if tp->root is NULL right now is the
>get() callback. Is that the reason why tp->root is RCU'ified? John?
>
>Thanks,
>Daniel
>
>> > Hi Cong, all
>> > 
>> > As stated above, the issue was introduced with commit 1e052be69d04
>> > ("net_sched: destroy proto tp when all filters are gone"). This patch
>> > provides a fix only for cls_flower, where I succeeded in reproducing
>> > the issue. Cong, if you can/want to come up with a fix that will be
>> > applicable to all the other classifiers, I am fine with that.
>> > 
>> > Thanks,
>> > Roi
>> > 
>> > 
>> > net/sched/cls_flower.c | 2 +-
>> > 1 file changed, 1 insertion(+), 1 deletion(-)
>> > 
>> > diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
>> > index e8dd09a..88a26c4 100644
>> > --- a/net/sched/cls_flower.c
>> > +++ b/net/sched/cls_flower.c
>> > @@ -135,7 +135,7 @@ static int fl_classify(struct sk_buff *skb, const struct tcf_proto *tp,
>> >struct fl_flow_key skb_mkey;
>> >struct ip_tunnel_info *info;
>> > 
>> > -  if (!atomic_read(&head->ht.nelems))
>> > +  if (!head || !atomic_read(&head->ht.nelems))
>> >return -1;
>> > 
>> >fl_clear_masked_range(&skb_key, &head->mask);
>> > --
>> > 2.7.4
>> > 
>


Re: [PATCH net-next] net/sched: cls_flower: verify root pointer before dereferncing it

2016-11-22 Thread Jiri Pirko
Tue, Nov 22, 2016 at 04:37:42PM CET, da...@davemloft.net wrote:
>From: Jiri Pirko 
>Date: Tue, 22 Nov 2016 15:48:44 +0100
>
>> Tue, Nov 22, 2016 at 03:25:26PM CET, r...@mellanox.com wrote:
>>>tp->root is being allocated in init() time and kfreed in destroy()
>>>however it is being dereferenced in classify() path.
>>>
>>>We could be in classify() path after destroy() was called and thus 
>>>tp->root is null. Verifying if tp->root is null in classify() path 
>>>is enough because it's being freed with kfree_rcu() and classify() 
>>>path is under rcu_read_lock().
>>>
>>>Fixes: 1e052be69d04 ("net_sched: destroy proto tp when all filters are gone")
>>>Signed-off-by: Roi Dayan 
>>>Cc: Cong Wang 
>> 
>> This is correct
>> 
>> Reviewed-by: Jiri Pirko 
>> 
>> The other way to fix this would be to move tp->ops->destroy call to
>> call_rcu phase. That would require bigger changes though. net-next
>> perhaps?
>
>This patch is targeted at net-next as per Subj.

Oh, right, then it should be fixed so that tp->head can never be null


Re: wl1251 & mac address & calibration data

2016-11-22 Thread Michal Kazior
On 22 November 2016 at 16:31, Pali Rohár  wrote:
> On Tuesday 22 November 2016 16:22:57 Michal Kazior wrote:
>> On 21 November 2016 at 16:51, Pali Rohár  wrote:
>> > On Friday 11 November 2016 18:20:50 Pali Rohár wrote:
>> >> Hi! I will open discussion about mac address and calibration data for
>> >> wl1251 wireless chip again...
>> >>
>> >> Problem: Mac address & calibration data for wl1251 chip on Nokia N900
>> >> are stored on second nand partition (mtd1) in special proprietary format
>> >> which is used only for Nokia N900 (probably on N8x0 and N9 too).
>> >> Wireless driver wl1251.ko cannot work without mac address and
>> >> calibration data.
>>
>> Same problem applies to some ath9k/ath10k supported routers. Some even
>> carry mac address as implicit offset from ethernet mac address. As far
>> as I understand OpenWRT cooks cal blobs on first boot prior to loading
>> modules.
>
> So... wl1251 on Nokia N900 is not alone and this problem is there for
> more drivers and devices. Which means we should come up with some
> generic solution.

This isn't particularly a problem for ath9k/ath10k.

Let me give you more background on ath10k.

ath10k devices can come with caldata and macaddr stored in their
OTP/EEPROM. In that case a generic "template" board file is used.
Userspace doesn't need to do anything special.

Some vendors however decide to use flash partition to store caldata.
In that case ath10k expects userspace to prepare cal-$bus-$devname.bin
files, each for a different radio (you can have multiple radios on a
system).

Now translating this for wl1251 I would expect it should also use
something like wl1251-nvs-sdio-0x0001.bin for devices like N900 that
have caldata on flash partition (instead of the generic
wl1251-nvs.bin). I'm not sure if wl1251-nvs.bin is something
comparable to (the generic) board.bin ath10k has though. Maybe the
entire idea behind wl1251-nvs.bin is flawed as it's supposed to be
device specific and is oblivious to the possibility of having multiple
wl1251 radios on one system (probably a sane assumption from a practical
standpoint, but still).
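
As a rough illustration of the per-radio naming described above, the following C sketch builds a file name following the cal-$bus-$devname.bin pattern. The function name is hypothetical, and the bus and device strings used with it are made-up examples, not values taken from a real system.

```c
#include <stdio.h>

/* Build "cal-<bus>-<devname>.bin" into buf.
 * Returns 0 on success, -1 if the name does not fit in len bytes.
 * Hypothetical helper for illustration only.
 */
static int cal_file_name(char *buf, size_t len,
			 const char *bus, const char *devname)
{
	int n = snprintf(buf, len, "cal-%s-%s.bin", bus, devname);

	if (n < 0 || (size_t)n >= len)
		return -1;

	return 0;
}
```

With bus "pci" and device name "0000:01:00.0" this yields cal-pci-0000:01:00.0.bin. Because the name carries the bus and device, each radio gets its own file, which is what a single fixed name such as wl1251-nvs.bin cannot provide.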


>> >> The absence of a mac address causes the driver to generate a random mac
>> >> address at every kernel boot, which has a couple of problems (unstable
>> >> identifier of the wireless device due to udev permanent storage rules;
>> >> unpredictable behaviour for dhcp mac address assignment, mac address
>> >> filtering, ...).
>> >>
>> >> Currently there is no way to set a (permanent) mac address for a network
>> >> interface from userspace. And it does not make sense to implement in the
>> >> linux kernel a large parser for the proprietary format of the second nand
>> >> partition, where the mac address is stored, for only one device -- Nokia
>> >> N900.
>> >>
>> >> The wl1251.ko driver loads calibration data via request_firmware() from
>> >> the file wl1251-nvs.bin. There is an "example" calibration file in the
>> >> linux-firmware repository, but it is not suitable for normal usage as
>> >> real calibration data are per-device specific.
>>
>> You could hook up a script that cooks up the cal/mac file via
>> modprobe's install hook, no?
>
> Via modprobe hook I can either pass custom module parameter or call any
> other system (shell) commands.
>
> As wl1251.ko does not accept mac_address as module parameter, such
> modprobe hook does not help -- as there is absolutely no way from
> userspace to set or change (permanent) mac address.

Quoting modprobe.d manual:

>   install modulename command...
>   This command instructs modprobe to run your
>   command instead of inserting the module in the
>   kernel as normal. The command can be any shell
>   command: this allows you to do any kind of
>   complex processing you might wish. [...]

You can hook up a script that cooks up wl1251-nvs.bin (caldata,
macaddr) and then insmod the actual wl1251.ko module. Or you can just
cook up the nvs on first device boot and store it in /lib/firmware
(possibly overwriting the "generic" wl1251 from linux-firmware).


Michal


Re: [PATCH net-next] net: mvneta: Only disable mvneta_bm for 64-bits

2016-11-22 Thread David Miller
From: Gregory CLEMENT 
Date: Tue, 22 Nov 2016 17:00:37 +0100

> Actually only the mvneta_bm support is not 64-bits compatible.
> The mvneta code itself can run on 64-bits architecture.
> 
> Signed-off-by: Gregory CLEMENT 

No it cannot, it emits warnings because it casts pointers to and
from 32-bit integers.

I'm not applying this.

drivers/net/ethernet/marvell/mvneta.c: In function ‘mvneta_rx_refill’:
drivers/net/ethernet/marvell/mvneta.c:1802:42: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
  mvneta_rx_desc_fill(rx_desc, phys_addr, (u32)data);
  ^
drivers/net/ethernet/marvell/mvneta.c: In function ‘mvneta_rxq_drop_pkts’:
drivers/net/ethernet/marvell/mvneta.c:1864:16: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
   void *data = (void *)rx_desc->buf_cookie;
^
drivers/net/ethernet/marvell/mvneta.c: In function ‘mvneta_rx_swbm’:
drivers/net/ethernet/marvell/mvneta.c:1902:10: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
   data = (unsigned char *)rx_desc->buf_cookie;
  ^
drivers/net/ethernet/marvell/mvneta.c: In function ‘mvneta_rx_hwbm’:
drivers/net/ethernet/marvell/mvneta.c:2023:10: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
   data = (unsigned char *)rx_desc->buf_cookie;
  ^
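
To make the warnings concrete, here is a small standalone C sketch (not driver code; the function names are hypothetical) contrasting a round trip through a 32-bit integer with one through uintptr_t. On a 64-bit build the first silently drops the upper half of the address, which is what the compiler is warning about.

```c
#include <stdint.h>

/* Round-trip a pointer through a 32-bit integer, as a (u32) cast of a
 * pointer would. On a 64-bit build the upper 32 bits are discarded.
 */
static void *roundtrip_u32(void *p)
{
	uint32_t cookie = (uint32_t)(uintptr_t)p;

	return (void *)(uintptr_t)cookie;
}

/* Round-trip through uintptr_t, which is defined to be wide enough
 * to hold any object pointer.
 */
static void *roundtrip_uintptr(void *p)
{
	uintptr_t cookie = (uintptr_t)p;

	return (void *)cookie;
}
```

Only the second helper is guaranteed to return the pointer it was given. The first one returns the original pointer only when its upper 32 bits happen to be zero.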


Re: [PATCH] iproute2: Nr. of packets and octets for macsec tx stats were swapped.

2016-11-22 Thread Rami Rosen
Hi, Daniel
Acked-by: Rami Rosen 

Agreed with Sabrina's comments about adding iproute2 and about the newlines.

Regards,
R


Re: [PATCH net-next] net: mvneta: Only disable mvneta_bm for 64-bits

2016-11-22 Thread Gregory CLEMENT
Hi David,
 
On Tue, Nov 22 2016, David Miller wrote:

> From: Gregory CLEMENT 
> Date: Tue, 22 Nov 2016 17:00:37 +0100
>
>> Actually only the mvneta_bm support is not 64-bits compatible.
>> The mvneta code itself can run on 64-bits architecture.
>> 
>> Signed-off-by: Gregory CLEMENT 
>
> No it cannot, it emits warnings because it casts pointers to and
> from 32-bit integers.
>
> I'm not applying this.
>
> drivers/net/ethernet/marvell/mvneta.c: In function ‘mvneta_rx_refill’:
> drivers/net/ethernet/marvell/mvneta.c:1802:42: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
>   mvneta_rx_desc_fill(rx_desc, phys_addr, (u32)data);
>   ^
> drivers/net/ethernet/marvell/mvneta.c: In function ‘mvneta_rxq_drop_pkts’:
> drivers/net/ethernet/marvell/mvneta.c:1864:16: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
>void *data = (void *)rx_desc->buf_cookie;
> ^
> drivers/net/ethernet/marvell/mvneta.c: In function ‘mvneta_rx_swbm’:
> drivers/net/ethernet/marvell/mvneta.c:1902:10: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
>data = (unsigned char *)rx_desc->buf_cookie;
>   ^
> drivers/net/ethernet/marvell/mvneta.c: In function ‘mvneta_rx_hwbm’:
> drivers/net/ethernet/marvell/mvneta.c:2023:10: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
>data = (unsigned char *)rx_desc->buf_cookie;
>   ^

Indeed!

There was a missing patch for it that I had in my tree and had not
submitted yet. I am about to do it now.

Thanks,

Gregory


-- 
Gregory Clement, Free Electrons
Kernel, drivers, real-time and embedded Linux
development, consulting, training and support.
http://free-electrons.com


Re: net/udp: bug in skb_pull_rcsum

2016-11-22 Thread Eric Dumazet
On Tue, Nov 22, 2016 at 3:58 AM, Andrey Konovalov  wrote:
> Hi,
>
> I've got the following error report while fuzzing the kernel with syzkaller.
>
> A reproducer is attached.
>
> On commit 9c763584b7c8911106bb77af7e648bef09af9d80 (4.9-rc6, Nov 20).
>
> [ cut here ]
> kernel BUG at net/core/skbuff.c:3029!
> invalid opcode:  [#1] SMP KASAN
> Modules linked in:
> CPU: 1 PID: 3854 Comm: a.out Not tainted 4.9.0-rc6+ #431
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> task: 880068472c00 task.stack: 880063ec8000
> RIP: 0010:[]  []
> skb_pull_rcsum+0x255/0x350 net/core/skbuff.c:3029
> RSP: 0018:880063ecf660  EFLAGS: 00010297
> RAX: 880068472c00 RBX: 880065a2da00 RCX: 
> RDX:  RSI: 000d RDI: ed000c7d9ec0
> RBP: 880063ecf690 R08: 11000d08e67e R09: 11000cb45b50
> R10: dc00 R11:  R12: 880065a2da80
> R13: 0008 R14: 880065a2dad8 R15: 0001
> FS:  7fbb006497c0() GS:88006cd0() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 20032fe0 CR3: 636d9000 CR4: 06e0
> Stack:
>  88006bfbb948 880065a2da00 88006416 11000cb45b52
>   11000d4d3933 880063ecf6f8 83354ced
>  fe00 880065a2da90 880063ecf6c0 0001
> Call Trace:
>  [< inline >] udp_csum_pull_header ./include/net/udp.h:166
>  [] udpv6_queue_rcv_skb+0x37d/0x17b0 net/ipv6/udp.c:625
>  [< inline >] sk_backlog_rcv ./include/net/sock.h:874
>  [] __release_sock+0x126/0x3a0 net/core/sock.c:2046
>  [] release_sock+0x59/0x1c0 net/core/sock.c:2504
>  [] udpv6_sendmsg+0x1310/0x24a0 net/ipv6/udp.c:1273
>  [] inet_sendmsg+0x317/0x4e0 net/ipv4/af_inet.c:734
>  [< inline >] sock_sendmsg_nosec net/socket.c:621
>  [] sock_sendmsg+0xcc/0x110 net/socket.c:631
>  [] sock_write_iter+0x221/0x3b0 net/socket.c:829
>  [] do_iter_readv_writev+0x2bb/0x3f0 fs/read_write.c:695
>  [] do_readv_writev+0x431/0x730 fs/read_write.c:872
>  [] vfs_writev+0x8f/0xc0 fs/read_write.c:911
>  [] do_writev+0xe1/0x240 fs/read_write.c:944
>  [< inline >] SYSC_writev fs/read_write.c:1017
>  [] SyS_writev+0x27/0x30 fs/read_write.c:1014
>  [] entry_SYSCALL_64_fastpath+0x1f/0xc2
> arch/x86/entry/entry_64.S:209
> Code: 89 f8 49 c1 e8 03 47 0f b6 14 08 45 84 d2 74 0a 41 80 fa 03 0f
> 8e cf 00 00 00 80 a3 91 00 00 00 f9 e9 43 ff ff ff e8 3b 79 79 fe <0f>
> 0b e8 34 79 79 fe 0f 0b e8 2d 79 79 fe 48 8b 7d d0 31 d2 44
> RIP  [] skb_pull_rcsum+0x255/0x350 net/core/skbuff.c:3029
>  RSP 
> ---[ end trace a5d5d2cef6a25ecb ]---
> ==


Thanks for the report.

It seems the bug was added in commit f7ad74fef3af6c6e2ef7f01c5589d77fe7db3d7c

I will cook a fix (note that the bug is no longer present in net-next and
linux-4.10+ kernels)


[PATCH net-next 0/4] Extend mvneta to support Armada 3700 (ARM 64)

2016-11-22 Thread Gregory CLEMENT
Hi,

This series enables the use of the mvneta driver on the Armada 3700
SoCs. Armada 3700 is a new ARMv8 SoC from Marvell using the same network
controller as the older Armada 370/38x/XP.

Besides the changes needed for use on 64-bit architectures, done in
the 1st patch, there are also a few differences related to the Armada
3700 SoC. The main one is the use of a shared interrupt instead of
the private ones. It has been addressed in the 3rd patch.

Not all the features supported on the older SoCs have been ported yet
to this new SoC.

Gregory CLEMENT (2):
  net: mvneta: Only disable mvneta_bm for 64-bits
  ARM64: dts: marvell: Add network support for Armada 3700

Marcin Wojtas (2):
  net: mvneta: Convert to be 64 bits compatible
  net: mvneta: Add network support for Armada 3700 SoC

 .../bindings/net/marvell-armada-370-neta.txt   |   7 +-
 arch/arm64/boot/dts/marvell/armada-3720-db.dts |  23 ++
 arch/arm64/boot/dts/marvell/armada-37xx.dtsi   |  23 ++
 drivers/net/ethernet/marvell/Kconfig   |  10 +-
 drivers/net/ethernet/marvell/mvneta.c  | 364 -
 5 files changed, 333 insertions(+), 94 deletions(-)

-- 
2.10.2



[PATCH net-next 1/4] net: mvneta: Convert to be 64 bits compatible

2016-11-22 Thread Gregory CLEMENT
From: Marcin Wojtas 

Prepare the mvneta driver to be usable on 64-bit platforms
such as the Armada 3700.

[gregory.clem...@free-electrons.com]: this patch was extracted from a larger
one to ease review and maintenance.

Signed-off-by: Marcin Wojtas 
Signed-off-by: Gregory CLEMENT 
---
 drivers/net/ethernet/marvell/mvneta.c | 77 ---
 1 file changed, 71 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index 87274d4ab102..67f6465d96ba 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -296,6 +296,12 @@
 /* descriptor aligned size */
 #define MVNETA_DESC_ALIGNED_SIZE   32
 
+/* Number of bytes to be taken into account by HW when putting incoming data
+ * to the buffers. It is needed in case NET_SKB_PAD exceeds maximum packet
+ * offset supported in MVNETA_RXQ_CONFIG_REG(q) registers.
+ */
+#define MVNETA_RX_PKT_OFFSET_CORRECTION64
+
 #define MVNETA_RX_PKT_SIZE(mtu) \
ALIGN((mtu) + MVNETA_MH_SIZE + MVNETA_VLAN_TAG_LEN + \
  ETH_HLEN + ETH_FCS_LEN,\
@@ -416,8 +422,11 @@ struct mvneta_port {
u64 ethtool_stats[ARRAY_SIZE(mvneta_statistics)];
 
u32 indir[MVNETA_RSS_LU_TABLE_SIZE];
+#ifdef CONFIG_64BIT
+   u64 data_high;
+#endif
+   u16 rx_offset_correction;
 };
-
 /* The mvneta_tx_desc and mvneta_rx_desc structures describe the
  * layout of the transmit and reception DMA descriptors, and their
  * layout is therefore defined by the hardware design
@@ -1791,6 +1800,10 @@ static int mvneta_rx_refill(struct mvneta_port *pp,
if (!data)
return -ENOMEM;
 
+#ifdef CONFIG_64BIT
+   if (unlikely(pp->data_high != (u64)upper_32_bits((u64)data) << 32))
+   return -ENOMEM;
+#endif
phys_addr = dma_map_single(pp->dev->dev.parent, data,
   MVNETA_RX_BUF_SIZE(pp->pkt_size),
   DMA_FROM_DEVICE);
@@ -1799,7 +1812,8 @@ static int mvneta_rx_refill(struct mvneta_port *pp,
return -ENOMEM;
}
 
-   mvneta_rx_desc_fill(rx_desc, phys_addr, (u32)data);
+   phys_addr += pp->rx_offset_correction;
+   mvneta_rx_desc_fill(rx_desc, phys_addr, (uintptr_t)data);
return 0;
 }
 
@@ -1861,8 +1875,16 @@ static void mvneta_rxq_drop_pkts(struct mvneta_port *pp,
 
for (i = 0; i < rxq->size; i++) {
struct mvneta_rx_desc *rx_desc = rxq->descs + i;
-   void *data = (void *)rx_desc->buf_cookie;
-
+   void *data = (u8 *)(uintptr_t)rx_desc->buf_cookie;
+#ifdef CONFIG_64BIT
+   /* In Neta HW only 32 bits data is supported, so in
+* order to obtain whole 64 bits address from RX
+* descriptor, we store the upper 32 bits when
+* allocating buffer, and put it back when using
+* buffer cookie for accessing packet in memory.
+*/
+   data = (u8 *)(pp->data_high | (u64)data);
+#endif
dma_unmap_single(pp->dev->dev.parent, rx_desc->buf_phys_addr,
 MVNETA_RX_BUF_SIZE(pp->pkt_size), DMA_FROM_DEVICE);
mvneta_frag_free(pp->frag_size, data);
@@ -1899,7 +1921,17 @@ static int mvneta_rx_swbm(struct mvneta_port *pp, int rx_todo,
rx_done++;
rx_status = rx_desc->status;
rx_bytes = rx_desc->data_size - (ETH_FCS_LEN + MVNETA_MH_SIZE);
+#ifdef CONFIG_64BIT
+   /* In Neta HW only 32 bits data is supported, so in
+* order to obtain whole 64 bits address from RX
+* descriptor, we store the upper 32 bits when
+* allocating buffer, and put it back when using
+* buffer cookie for accessing packet in memory.
+*/
+   data = (u8 *)(pp->data_high | (u64)rx_desc->buf_cookie);
+#else
data = (unsigned char *)rx_desc->buf_cookie;
+#endif
phys_addr = rx_desc->buf_phys_addr;
 
if (!mvneta_rxq_desc_is_first_last(rx_status) ||
@@ -2020,7 +2052,17 @@ static int mvneta_rx_hwbm(struct mvneta_port *pp, int rx_todo,
rx_done++;
rx_status = rx_desc->status;
rx_bytes = rx_desc->data_size - (ETH_FCS_LEN + MVNETA_MH_SIZE);
-   data = (unsigned char *)rx_desc->buf_cookie;
+#ifdef CONFIG_64BIT
+   /* In Neta HW only 32 bits data is supported, so in
+* order to obtain whole 64 bits address from RX
+* descriptor, we store the upper 32 bits when
+* allocating buffer, and put it back when using
+* buffer cookie for accessing packet in memory.
+*/
+   data = (u8 *)(pp->data_high | (u64)rx_desc->buf_cookie);
+#else
+   data = (u8 *)rx_
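
The scheme described in the patch comments (keep only the low 32 bits in the descriptor cookie and restore the upper 32 bits from a per-port value) can be sketched in plain C as follows. The struct and function names are hypothetical, addresses are modelled as uint64_t so the sketch behaves the same on any host, and -1 stands in for the -ENOMEM the patch returns.

```c
#include <stdint.h>

/* Hypothetical per-port state: the upper 32 bits that all RX buffer
 * addresses of this port are assumed to share.
 */
struct port_sketch {
	uint64_t data_high;
};

/* Store the low 32 bits of addr as the descriptor cookie.
 * Fails (-1) when the upper half of addr differs from data_high,
 * because such an address could not be reconstructed later.
 */
static int cookie_store(const struct port_sketch *pp, uint64_t addr,
			uint32_t *cookie)
{
	if (pp->data_high != (addr & 0xffffffff00000000ULL))
		return -1;

	*cookie = (uint32_t)addr;
	return 0;
}

/* Rebuild the full 64-bit address from the cookie. */
static uint64_t cookie_load(const struct port_sketch *pp, uint32_t cookie)
{
	return pp->data_high | (uint64_t)cookie;
}
```

The check in cookie_store() corresponds to the comparison against data_high in the refill hunk: the approach only works if every buffer of a port lives in the same 4GB-aligned window, so a buffer outside that window has to be rejected.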

[PATCH net-next 2/4] net: mvneta: Only disable mvneta_bm for 64-bits

2016-11-22 Thread Gregory CLEMENT
Actually only the mvneta_bm support is not 64-bits compatible.
The mvneta code itself can run on 64-bits architecture.

Signed-off-by: Gregory CLEMENT 
---
 drivers/net/ethernet/marvell/Kconfig | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/marvell/Kconfig b/drivers/net/ethernet/marvell/Kconfig
index 66fd9dbb2ca7..2ccea9dd9248 100644
--- a/drivers/net/ethernet/marvell/Kconfig
+++ b/drivers/net/ethernet/marvell/Kconfig
@@ -44,6 +44,7 @@ config MVMDIO
 config MVNETA_BM_ENABLE
tristate "Marvell Armada 38x/XP network interface BM support"
depends on MVNETA
+   depends on !64BIT
---help---
  This driver supports auxiliary block of the network
  interface units in the Marvell ARMADA XP and ARMADA 38x SoC
@@ -58,7 +59,6 @@ config MVNETA
tristate "Marvell Armada 370/38x/XP network interface support"
depends on PLAT_ORION || COMPILE_TEST
depends on HAS_DMA
-   depends on !64BIT
select MVMDIO
select FIXED_PHY
---help---
@@ -71,6 +71,7 @@ config MVNETA
 
 config MVNETA_BM
tristate
+   depends on !64BIT
default y if MVNETA=y && MVNETA_BM_ENABLE!=n
default MVNETA_BM_ENABLE
select HWBM
-- 
2.10.2



[PATCH net-next 4/4] ARM64: dts: marvell: Add network support for Armada 3700

2016-11-22 Thread Gregory CLEMENT
Add neta nodes for network support both in device tree for the SoC and
the board.

Signed-off-by: Gregory CLEMENT 
---
 arch/arm64/boot/dts/marvell/armada-3720-db.dts | 23 +++
 arch/arm64/boot/dts/marvell/armada-37xx.dtsi   | 23 +++
 2 files changed, 46 insertions(+)

diff --git a/arch/arm64/boot/dts/marvell/armada-3720-db.dts b/arch/arm64/boot/dts/marvell/armada-3720-db.dts
index 1372e9a6aaa4..c8b82e4145de 100644
--- a/arch/arm64/boot/dts/marvell/armada-3720-db.dts
+++ b/arch/arm64/boot/dts/marvell/armada-3720-db.dts
@@ -81,3 +81,26 @@
 &pcie0 {
status = "okay";
 };
+
+&mdio {
+   status = "okay";
+   phy0: ethernet-phy@0 {
+   reg = <0>;
+   };
+
+   phy1: ethernet-phy@1 {
+   reg = <1>;
+   };
+};
+
+&eth0 {
+   phy-mode = "rgmii-id";
+   phy = <&phy0>;
+   status = "okay";
+};
+
+&eth1 {
+   phy-mode = "rgmii-id";
+   phy = <&phy1>;
+   status = "okay";
+};
diff --git a/arch/arm64/boot/dts/marvell/armada-37xx.dtsi b/arch/arm64/boot/dts/marvell/armada-37xx.dtsi
index c4762538ec01..a7278ce9e523 100644
--- a/arch/arm64/boot/dts/marvell/armada-37xx.dtsi
+++ b/arch/arm64/boot/dts/marvell/armada-37xx.dtsi
@@ -140,6 +140,29 @@
};
};
 
+   eth0: ethernet@3 {
+  compatible = "marvell,armada-3700-neta";
+  reg = <0x3 0x4000>;
+  interrupts = ;
+  clocks = <&sb_periph_clk 8>;
+  status = "disabled";
+   };
+
+   mdio: mdio@32004 {
+   #address-cells = <1>;
+   #size-cells = <0>;
+   compatible = "marvell,orion-mdio";
+   reg = <0x32004 0x4>;
+   };
+
+   eth1: ethernet@4 {
+   compatible = "marvell,armada-3700-neta";
+   reg = <0x4 0x4000>;
+   interrupts = ;
+   clocks = <&sb_periph_clk 7>;
+   status = "disabled";
+   };
+
usb3: usb@58000 {
compatible = "marvell,armada3700-xhci",
"generic-xhci";
-- 
2.10.2



[PATCH net-next 3/4] net: mvneta: Add network support for Armada 3700 SoC

2016-11-22 Thread Gregory CLEMENT
From: Marcin Wojtas 

Armada 3700 is a new ARMv8 SoC from Marvell using the same network controller
as the older Armada 370/38x/XP. There are however some differences that
need to be taken into account when adding support for it:

* open default MBUS window to 4GB of DRAM - Armada 3700 SoC's Mbus
  configuration for the network controller has to be done on two levels:
  global and per-port. The first one is inherited from the
  bootloader. The latter can be opened in a default way, leaving
  arbitration to the bus controller. Hence a filled mbus_dram_target_info
  structure is not needed.

* make per-CPU operation optional - Recent patches adding RSS and XPS
  support for Armada 38x/XP enabled per-CPU operation of the controller
  by default. Contrary to the older SoCs, the Armada 3700 SoC's network
  controller is not capable of per-CPU processing due to the interrupt
  lines' connectivity. This patch restores non-per-CPU operation, which
  is now optional and depends on the neta_armada3700 flag value in the
  mvneta_port structure. In order not to complicate the code, a separate
  interrupt subroutine is implemented.

For now, on the Armada 3700, RSS is disabled as the current
implementation depends on per-CPU interrupts.

[gregory.clem...@free-electrons.com: extract from a larger patch, replace
some ifdef and port to net-next for v4.10]

Signed-off-by: Marcin Wojtas 
Signed-off-by: Gregory CLEMENT 
---
 .../bindings/net/marvell-armada-370-neta.txt   |   7 +-
 drivers/net/ethernet/marvell/Kconfig   |   7 +-
 drivers/net/ethernet/marvell/mvneta.c  | 287 +++--
 3 files changed, 214 insertions(+), 87 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/marvell-armada-370-neta.txt b/Documentation/devicetree/bindings/net/marvell-armada-370-neta.txt
index 73be8970815e..7aa840c8768d 100644
--- a/Documentation/devicetree/bindings/net/marvell-armada-370-neta.txt
+++ b/Documentation/devicetree/bindings/net/marvell-armada-370-neta.txt
@@ -1,7 +1,10 @@
-* Marvell Armada 370 / Armada XP Ethernet Controller (NETA)
+* Marvell Armada 370 / Armada XP / Armada 3700 Ethernet Controller (NETA)
 
 Required properties:
-- compatible: "marvell,armada-370-neta" or "marvell,armada-xp-neta".
+- compatible: could be one of the followings
+   "marvell,armada-370-neta"
+   "marvell,armada-xp-neta"
+   "marvell,armada-3700-neta"
 - reg: address and length of the register set for the device.
 - interrupts: interrupt for the device
 - phy: See ethernet.txt file in the same directory.
diff --git a/drivers/net/ethernet/marvell/Kconfig b/drivers/net/ethernet/marvell/Kconfig
index 2ccea9dd9248..3b8f11fe5e13 100644
--- a/drivers/net/ethernet/marvell/Kconfig
+++ b/drivers/net/ethernet/marvell/Kconfig
@@ -56,14 +56,15 @@ config MVNETA_BM_ENABLE
  buffer management.
 
 config MVNETA
-   tristate "Marvell Armada 370/38x/XP network interface support"
-   depends on PLAT_ORION || COMPILE_TEST
+   tristate "Marvell Armada 370/38x/XP/37xx network interface support"
+   depends on ARCH_MVEBU || COMPILE_TEST
depends on HAS_DMA
select MVMDIO
select FIXED_PHY
---help---
  This driver supports the network interface units in the
- Marvell ARMADA XP, ARMADA 370 and ARMADA 38x SoC family.
+ Marvell ARMADA XP, ARMADA 370, ARMADA 38x and
+ ARMADA 37xx SoC family.
 
  Note that this driver is distinct from the mv643xx_eth
  driver, which should be used for the older Marvell SoCs
diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index 67f6465d96ba..7438ffd5639a 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -397,6 +397,9 @@ struct mvneta_port {
spinlock_t lock;
bool is_stopped;
 
+   u32 cause_rx_tx;
+   struct napi_struct napi;
+
/* Core clock */
struct clk *clk;
/* AXI clock */
@@ -422,6 +425,9 @@ struct mvneta_port {
u64 ethtool_stats[ARRAY_SIZE(mvneta_statistics)];
 
u32 indir[MVNETA_RSS_LU_TABLE_SIZE];
+
+   /* Flags for special SoC configurations */
+   bool neta_armada3700;
 #ifdef CONFIG_64BIT
u64 data_high;
 #endif
@@ -964,14 +970,9 @@ static int mvneta_mbus_io_win_set(struct mvneta_port *pp, u32 base, u32 wsize,
return 0;
 }
 
-/* Assign and initialize pools for port. In case of fail
- * buffer manager will remain disabled for current port.
- */
-static int mvneta_bm_port_init(struct platform_device *pdev,
-  struct mvneta_port *pp)
+static  int mvneta_bm_port_mbus_init(struct mvneta_port *pp)
 {
-   struct device_node *dn = pdev->dev.of_node;
-   u32 long_pool_id, short_pool_id, wsize;
+   u32 wsize;
u8 target, attr;
int err;
 
@@ -990,6 +991,25 @@ static int mvneta_bm_port_init(struct platform_device *pdev,
netdev_info(pp->dev, "fail to configure mbus window to BM\n")

[PATCH v2] net: dsa: mv88e6xxx: add MV88E6097 switch

2016-11-22 Thread Stefan Eichenberger
Add support for the MV88E6097 switch. The change was tested on an Armada
based platform with a MV88E6097 switch.

Signed-off-by: Stefan Eichenberger 
---
 drivers/net/dsa/mv88e6xxx/chip.c  | 26 ++
 drivers/net/dsa/mv88e6xxx/mv88e6xxx.h |  2 ++
 2 files changed, 28 insertions(+)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 48b58c7..2d5941c 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -3208,6 +3208,19 @@ static const struct mv88e6xxx_ops mv88e6095_ops = {
.stats_get_stats = mv88e6095_stats_get_stats,
 };
 
+static const struct mv88e6xxx_ops mv88e6097_ops = {
+   .set_switch_mac = mv88e6xxx_g2_set_switch_mac,
+   .phy_read = mv88e6xxx_g2_smi_phy_read,
+   .phy_write = mv88e6xxx_g2_smi_phy_write,
+   .port_set_link = mv88e6xxx_port_set_link,
+   .port_set_duplex = mv88e6xxx_port_set_duplex,
+   .port_set_speed = mv88e6185_port_set_speed,
+   .stats_snapshot = mv88e6xxx_g1_stats_snapshot,
+   .stats_get_sset_count = mv88e6095_stats_get_sset_count,
+   .stats_get_strings = mv88e6095_stats_get_strings,
+   .stats_get_stats = mv88e6095_stats_get_stats,
+};
+
 static const struct mv88e6xxx_ops mv88e6123_ops = {
/* MV88E6XXX_FAMILY_6165 */
.set_switch_mac = mv88e6xxx_g2_set_switch_mac,
@@ -3579,6 +3592,19 @@ static const struct mv88e6xxx_info mv88e6xxx_table[] = {
.ops = &mv88e6095_ops,
},
 
+   [MV88E6097] = {
+   .prod_num = PORT_SWITCH_ID_PROD_NUM_6097,
+   .family = MV88E6XXX_FAMILY_6097,
+   .name = "Marvell 88E6097/88E6097F",
+   .num_databases = 4096,
+   .num_ports = 11,
+   .port_base_addr = 0x10,
+   .global1_addr = 0x1b,
+   .age_time_coeff = 15000,
+   .flags = MV88E6XXX_FLAGS_FAMILY_6097,
+   .ops = &mv88e6097_ops,
+   },
+
[MV88E6123] = {
.prod_num = PORT_SWITCH_ID_PROD_NUM_6123,
.family = MV88E6XXX_FAMILY_6165,
diff --git a/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h 
b/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
index 9298faa..ab52c37 100644
--- a/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
+++ b/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
@@ -81,6 +81,7 @@
 #define PORT_SWITCH_ID 0x03
 #define PORT_SWITCH_ID_PROD_NUM_6085   0x04a
 #define PORT_SWITCH_ID_PROD_NUM_6095   0x095
+#define PORT_SWITCH_ID_PROD_NUM_6097   0x099
 #define PORT_SWITCH_ID_PROD_NUM_6131   0x106
 #define PORT_SWITCH_ID_PROD_NUM_6320   0x115
 #define PORT_SWITCH_ID_PROD_NUM_6123   0x121
@@ -378,6 +379,7 @@
 enum mv88e6xxx_model {
MV88E6085,
MV88E6095,
+   MV88E6097,
MV88E6123,
MV88E6131,
MV88E6161,
-- 
2.9.3



Re: [RFC 02/10] IB/hfi-vnic: Virtual Network Interface Controller (VNIC) Bus driver

2016-11-22 Thread Jason Gunthorpe
On Mon, Nov 21, 2016 at 05:53:04PM -0800, Vishwanathapura, Niranjana wrote:
> There are many example drivers in kernel which are using bus_register() in
> an initcall.

There really are not, certainly not in major subsystems.

> We could add a custom Interface between HFI1 driver and hfi_vnic drivers
> without involving a bus.

hfi is already registering on the infiniband class, just use that.

> But using the existing bus model gave a lot of in-built flexibility in
> decoupling devices from the drivers.

If you want to have your own bus then you need your own hfi
subsystem. drivers/infiniband is not a dumping ground..

Jason


Re: [RFC net-next 2/3] net: dsa: Propagate VLAN add/del to CPU port(s)

2016-11-22 Thread Vivien Didelot
Hi Florian,

Open question: will we need to do the same for FDB and MDB objects?

Florian Fainelli  writes:

> Now that the bridge layer can call into switchdev to signal programming
> requests targeting the bridge master device itself, allow the switch
> drivers to implement separate programming of downstream and
> upstream/management ports.
>
> Signed-off-by: Vivien Didelot 
> Signed-off-by: Florian Fainelli 
> ---
>  net/dsa/slave.c | 45 +
>  1 file changed, 33 insertions(+), 12 deletions(-)
>
> diff --git a/net/dsa/slave.c b/net/dsa/slave.c
> index d0c7bce88743..18288261b964 100644
> --- a/net/dsa/slave.c
> +++ b/net/dsa/slave.c
> @@ -223,35 +223,30 @@ static int dsa_slave_set_mac_address(struct net_device 
> *dev, void *a)
>   return 0;
>  }
>  
> -static int dsa_slave_port_vlan_add(struct net_device *dev,
> +static int dsa_slave_port_vlan_add(struct dsa_switch *ds, int port,
>  const struct switchdev_obj_port_vlan *vlan,
>  struct switchdev_trans *trans)
>  {
> - struct dsa_slave_priv *p = netdev_priv(dev);
> - struct dsa_switch *ds = p->parent;
>  

Extra newline ^.

>   if (switchdev_trans_ph_prepare(trans)) {
>   if (!ds->ops->port_vlan_prepare || !ds->ops->port_vlan_add)
>   return -EOPNOTSUPP;
>  
> - return ds->ops->port_vlan_prepare(ds, p->port, vlan, trans);
> + return ds->ops->port_vlan_prepare(ds, port, vlan, trans);
>   }
>  
> - ds->ops->port_vlan_add(ds, p->port, vlan, trans);
> + ds->ops->port_vlan_add(ds, port, vlan, trans);
>  
>   return 0;
>  }
>  
> -static int dsa_slave_port_vlan_del(struct net_device *dev,
> +static int dsa_slave_port_vlan_del(struct dsa_switch *ds, int port,
>  const struct switchdev_obj_port_vlan *vlan)
>  {
> - struct dsa_slave_priv *p = netdev_priv(dev);
> - struct dsa_switch *ds = p->parent;
> -
>   if (!ds->ops->port_vlan_del)
>   return -EOPNOTSUPP;
>  
> - return ds->ops->port_vlan_del(ds, p->port, vlan);
> + return ds->ops->port_vlan_del(ds, port, vlan);
>  }
>  
>  static int dsa_slave_port_vlan_dump(struct net_device *dev,
> @@ -465,8 +460,21 @@ static int dsa_slave_port_obj_add(struct net_device *dev,
> const struct switchdev_obj *obj,
> struct switchdev_trans *trans)
>  {
> + struct dsa_slave_priv *p = netdev_priv(dev);
> + struct dsa_switch *ds = p->parent;
> + int port = p->port;
>   int err;
>  
> + /* Here we may be called with an orig_dev which is different from dev,
> +  * on purpose, to receive request coming from e.g the bridge master
> +  * device. Although there are no network device associated with CPU/DSA
> +  * ports, we may still have programming operation for these ports.
> +  */
> + if (obj->orig_dev == p->bridge_dev) {
> + ds = ds->dst->ds[0];
> + port = ds->dst->cpu_port;
> + }
> +
>   /* For the prepare phase, ensure the full set of changes is feasable in
>* one go in order to signal a failure properly. If an operation is not
>* supported, return -EOPNOTSUPP.
> @@ -483,7 +491,7 @@ static int dsa_slave_port_obj_add(struct net_device *dev,
>trans);
>   break;
>   case SWITCHDEV_OBJ_ID_PORT_VLAN:
> - err = dsa_slave_port_vlan_add(dev,
> + err = dsa_slave_port_vlan_add(ds, port,
> SWITCHDEV_OBJ_PORT_VLAN(obj),
> trans);

Note that dsa_slave_port_vlan_add() will be called N times, N being the
number of bridge ports. This is not an issue for the moment though.
Programming it only once requires caching, so leave it for an eventual
future patch.
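
The caching mentioned above could look roughly like the sketch below. It is
standalone C rather than kernel code: struct cpu_vlan_cache,
cpu_vlan_needs_programming() and the 4096-entry bitmap are made-up names for
illustration, not anything in net/dsa. A real version would presumably also
need reference counting, so that a VLAN is only removed from the CPU port
once the last bridge port drops it.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_VLAN_N_VID     4096
#define TOY_BITS_PER_WORD  (sizeof(unsigned long) * CHAR_BIT)
#define TOY_WORDS          ((TOY_VLAN_N_VID + TOY_BITS_PER_WORD - 1) / TOY_BITS_PER_WORD)

/* Hypothetical per-switch cache: one bit per VID already set on the CPU port. */
struct cpu_vlan_cache {
	unsigned long programmed[TOY_WORDS];
};

/*
 * Returns true the first time a VID is seen (the caller should program the
 * CPU port) and false on every later call for the same VID, or for an
 * out-of-range VID.
 */
bool cpu_vlan_needs_programming(struct cpu_vlan_cache *cache, unsigned int vid)
{
	size_t word;
	unsigned long bit;

	if (vid >= TOY_VLAN_N_VID)
		return false;

	word = vid / TOY_BITS_PER_WORD;
	bit = 1UL << (vid % TOY_BITS_PER_WORD);

	if (cache->programmed[word] & bit)
		return false;

	cache->programmed[word] |= bit;
	return true;
}
```

With N bridge ports in VLAN 42, only the first of the N calls would then
touch the hardware.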

When issuing the following command (lan0 being a member of br0):

# bridge vlan add vid 42 dev lan0

the CPU port is also programmed as tagged in VLAN 42. Is that expected?

Thanks,

Vivien


[PATCH net] udplite: call proper backlog handlers

2016-11-22 Thread Eric Dumazet
From: Eric Dumazet 

In commits 93821778def10 ("udp: Fix rcv socket locking") and
f7ad74fef3af ("net/ipv6/udp: UDP encapsulation: break backlog_rcv into
__udpv6_queue_rcv_skb") UDP backlog handlers were renamed, but UDPlite
was forgotten.

This leads to crashes if UDPlite header is pulled twice, which happens
starting from commit e6afc8ace6dd ("udp: remove headers from UDP packets
before queueing")

Bug found by syzkaller team, thanks a lot guys !

Note that backlog use in UDP/UDPlite is scheduled to be removed starting
from linux-4.10, so this patch is only needed up to linux-4.9
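
To make the double pull easier to see, here is a toy userspace model. It is
not the kernel code: struct toy_skb and all toy_* helpers are invented for
this note, and it assumes the receive path pulls the 8-byte header before
parking the packet on the backlog, as described above.

```c
#include <stddef.h>

#define TOY_UDP_HDR_LEN 8

/* Toy stand-in for an skb: only tracks how far "data" has advanced. */
struct toy_skb {
	size_t data_off;	/* bytes already pulled */
	size_t len;		/* bytes remaining */
};

/* Advance past the transport header; fails if the packet is too short. */
int toy_pull_header(struct toy_skb *skb)
{
	if (skb->len < TOY_UDP_HDR_LEN)
		return -1;
	skb->data_off += TOY_UDP_HDR_LEN;
	skb->len -= TOY_UDP_HDR_LEN;
	return 0;
}

/* Inner handler: expects the header to be gone already. */
int toy_inner_queue_rcv(struct toy_skb *skb)
{
	(void)skb;
	return 0;
}

/* Outer handler: pulls the header itself, then calls the inner one. */
int toy_outer_queue_rcv(struct toy_skb *skb)
{
	if (toy_pull_header(skb))
		return -1;
	return toy_inner_queue_rcv(skb);
}

/*
 * Socket owned by user: the header is pulled, the packet sits on the
 * backlog, and is later handed to whatever backlog_rcv points at.
 */
int toy_receive_via_backlog(struct toy_skb *skb,
			    int (*backlog_rcv)(struct toy_skb *))
{
	if (toy_pull_header(skb))
		return -1;
	return backlog_rcv(skb);
}
```

With backlog_rcv set to the inner handler a 20-byte packet ends up with 8
bytes pulled; with the outer handler it ends up with 16 bytes pulled, which
is the situation this patch avoids for UDPlite.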

Fixes: 93821778def1 ("udp: Fix rcv socket locking")
Fixes: f7ad74fef3af ("net/ipv6/udp: UDP encapsulation: break backlog_rcv into 
__udpv6_queue_rcv_skb")
Fixes: e6afc8ace6dd ("udp: remove headers from UDP packets before queueing")
Signed-off-by: Eric Dumazet 
Reported-by: Andrey Konovalov 
Cc: Benjamin LaHaise 
Cc: Herbert Xu 
---
 net/ipv4/udp.c  |2 +-
 net/ipv4/udp_impl.h |2 +-
 net/ipv4/udplite.c  |2 +-
 net/ipv6/udp.c  |2 +-
 net/ipv6/udp_impl.h |2 +-
 net/ipv6/udplite.c  |2 +-
 6 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 0de9d5d2b9ae..5bab6c3f7a2f 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1455,7 +1455,7 @@ static void udp_v4_rehash(struct sock *sk)
udp_lib_rehash(sk, new_hash);
 }
 
-static int __udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
+int __udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
 {
int rc;
 
diff --git a/net/ipv4/udp_impl.h b/net/ipv4/udp_impl.h
index 7e0fe4bdd967..feb50a16398d 100644
--- a/net/ipv4/udp_impl.h
+++ b/net/ipv4/udp_impl.h
@@ -25,7 +25,7 @@ int udp_recvmsg(struct sock *sk, struct msghdr *msg, size_t 
len, int noblock,
int flags, int *addr_len);
 int udp_sendpage(struct sock *sk, struct page *page, int offset, size_t size,
 int flags);
-int udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb);
+int __udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb);
 void udp_destroy_sock(struct sock *sk);
 
 #ifdef CONFIG_PROC_FS
diff --git a/net/ipv4/udplite.c b/net/ipv4/udplite.c
index af817158d830..ff450c2aad9b 100644
--- a/net/ipv4/udplite.c
+++ b/net/ipv4/udplite.c
@@ -50,7 +50,7 @@ struct proto  udplite_prot = {
.sendmsg   = udp_sendmsg,
.recvmsg   = udp_recvmsg,
.sendpage  = udp_sendpage,
-   .backlog_rcv   = udp_queue_rcv_skb,
+   .backlog_rcv   = __udp_queue_rcv_skb,
.hash  = udp_lib_hash,
.unhash= udp_lib_unhash,
.get_port  = udp_v4_get_port,
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index e5056d4873d1..e4a8000d59ad 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -514,7 +514,7 @@ void __udp6_lib_err(struct sk_buff *skb, struct 
inet6_skb_parm *opt,
return;
 }
 
-static int __udpv6_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
+int __udpv6_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
 {
int rc;
 
diff --git a/net/ipv6/udp_impl.h b/net/ipv6/udp_impl.h
index f6eb1ab34f4b..e78bdc76dcc3 100644
--- a/net/ipv6/udp_impl.h
+++ b/net/ipv6/udp_impl.h
@@ -26,7 +26,7 @@ int compat_udpv6_getsockopt(struct sock *sk, int level, int 
optname,
 int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len);
 int udpv6_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int noblock,
  int flags, int *addr_len);
-int udpv6_queue_rcv_skb(struct sock *sk, struct sk_buff *skb);
+int __udpv6_queue_rcv_skb(struct sock *sk, struct sk_buff *skb);
 void udpv6_destroy_sock(struct sock *sk);
 
 #ifdef CONFIG_PROC_FS
diff --git a/net/ipv6/udplite.c b/net/ipv6/udplite.c
index 47d0d2b87106..2f5101a12283 100644
--- a/net/ipv6/udplite.c
+++ b/net/ipv6/udplite.c
@@ -45,7 +45,7 @@ struct proto udplitev6_prot = {
.getsockopt= udpv6_getsockopt,
.sendmsg   = udpv6_sendmsg,
.recvmsg   = udpv6_recvmsg,
-   .backlog_rcv   = udpv6_queue_rcv_skb,
+   .backlog_rcv   = __udpv6_queue_rcv_skb,
.hash  = udp_lib_hash,
.unhash= udp_lib_unhash,
.get_port  = udp_v6_get_port,




Re: [PATCHv2 net-next 00/11] Start adding support for mv88e6390

2016-11-22 Thread Vivien Didelot
Hi,

Andrew Lunn  writes:

> This is the first patchset implementing support for the mv88e6390
> family.  This is a new generation of switch devices and has numerous
> incompatible changes to the registers. These patches allow the switch
> to be detected during probe, and make the statistics unit work.
>
> These patches are insufficient to make the mv88e6390 functional. More
> patches will follow.
>
> v2:
>   Move stats code into global1
>   Change DT compatible string to mv88e6190
>   Fixed mv88e6351 stats which v1 had broken

Thanks Andrew!

For what it's worth:

Reviewed-by: Vivien Didelot 


Vivien


Re: [PATCH v9 0/8] thunderbolt: Introducing Thunderbolt(TM) Networking

2016-11-22 Thread Simon Guinot
On Fri, Nov 18, 2016 at 12:20:07PM +0100, Simon Guinot wrote:
> On Fri, Nov 18, 2016 at 08:48:36AM +, Levy, Amir (Jer) wrote:
> > On Tue, Nov 15 2016, 12:59 PM, Simon Guinot wrote:
> > > On Wed, Nov 09, 2016 at 03:42:53PM +, Levy, Amir (Jer) wrote:
> > > > On Wed, Nov 9 2016, 04:36 PM, Simon Guinot wrote:
> > > > > Hi Amir,
> > > > >
> > > > > I have an ASUS "All Series/Z87-DELUXE/QUAD" motherboard with a 
> > > > > Thunderbolt 2 "Falcon Ridge" chipset (device ID 156d).
> > > > >
> > > > > Is the thunderbolt-icm driver supposed to work with this chipset ?
> > > > >
> > > >
> > > > Yes, the thunderbolt-icm supports Falcon Ridge, device ID 156c.
> > > > 156d is the bridge -
> > > > http://lxr.free-electrons.com/source/include/linux/pci_ids.h#L2619
> > > >
> > > > > I have installed both a 4.8.6 Linux kernel (patched with your v9
> > > > > series) and the thunderbolt-software-daemon (27 october release) 
> > > > > inside a Debian system (Jessie).
> > > > >
> > > > > If I connect the ASUS motherboard with a MacBook Pro (Thunderbolt 
> > > > > 2, device ID 156c), I can see that the thunderbolt-icm driver is 
> > > > > loaded and that the thunderbolt-software-daemon is well started. 
> > > > > But the Ethernet interface is not created.
> > > > >
> > > > > I have attached to this email the syslog file. There is the logs 
> > > > > from both the kernel and the daemon inside. Note that the daemon 
> > > > > logs are everything but clear about what could be the issue. Maybe 
> > > > > I missed some kind of configuration ? But I failed to find any 
> > > > > valuable information about configuring the driver and/or the 
> > > > > daemon in
> > > the various documentation files.
> > > > >
> > > > > Please, can you provide some guidance ? I'd really like to test 
> > > > > your patch series.
> > > >
> > > > First, thank you very much for willing to test it.
> > > > Thunderbolt Networking support was added during Falcon Ridge, in the
> > > latest FR images.
> > > > Do you know which Thunderbolt image version you have on your system?
> > > > Currently I submitted only Thunderbolt Networking feature in Linux, 
> > > > and we plan to add more features like reading the image version and
> > > updating the image.
> > > > If you don't know the image version, the only thing I can suggest is 
> > > > to load windows, install thunderbolt SW and check in the Thunderbolt
> > > application the image version.
> > > > To know if image update is needed, you can check - 
> > > > https://thunderbolttechnology.net/updates
> > > 
> > > Hi Amir,
> > > 
> > > From the Windows Thunderbolt software, I can read 13.00 for the 
> > > firmware version. And from https://thunderbolttechnology.net/updates, 
> > > I can see that there is no update available for my ASUS motherboard.
> > > 
> > > Am I good to go ?
> > > 
> > 
> > Thunderbolt Networking is supported on both Thunderbolt(tm) 2 and 
> > Thunderbolt(tm) 3 systems.  
> > Thunderbolt 2 systems must have updated NVM (version 25 or later) in order 
> > for the functionality to work properly.  
> > If the system does not have the update, please contact the OEM directly for 
> > an updated NVM.  
> > For best functionality and support, Intel recommends using Thunderbolt 3 
> > systems for all validation and testing.
> 
> Maybe it is worth mentioning in the documentation and/or in the Kconfig
> help message that a minimal firmware version is needed for Thunderbolt 2
> controllers.
> 
> It would have saved some time for me :)
> 
> > 
> > > BTW, it is quite a shame that the Thunderbolt firmware version can't 
> > > be read from Linux.
> > > 
> > 
> > This is WIP, once this patch will be upstream, we will be able to focus more
> > on aligning Linux with the Thunderbolt features that we have for windows.
> 
> Well, I rather see the firmware identification and update as basic
> features on the top of which ones you can build a driver. For example in
> this case this would allow the ICM driver and/or the userland daemon to
> exit with a useful error message rather than just not working without any
> explanation.
> 
> Next week I'll try the driver with a Thunderbolt 3 controller.

Hi Amir,

I tested the thunderbolt-icm driver (v9 series) on a Gigabyte
motherboard (Z170X-UD5 TH-CF) with a Thunderbolt 3 controller (Alpine
Ridge 4C).

I can see that the network interface is created correctly when the
motherboard is connected to a MacBook Pro (Thunderbolt 2 or 3).

And here are the TCP bandwidths measured using the iperf3 benchmark:

- MacBook Pro Thunderbolt 2: 8.46Gbits/sec
- MacBook Pro Thunderbolt 3: 11.8Gbits/sec

Are these results consistent with your expectations ?

From the MacOS system interface on the MacBook Pro Thunderbolt 3,
I noticed that the interface appears as dual lane (2x 20Gb/sec). But
when two MacBook Pro are connected together, the interface appears as
single lane (1x 40Gb/sec). Is some lane bonding support missing in the
Linux implementation ?

Here are a couple of additional questio

Re: net/can: use-after-free in bcm_rx_thr_flush

2016-11-22 Thread Oliver Hartkopp

Hi Andrey,

thanks for the report.

Although I can't see the issue in the code ...

On 11/22/2016 10:22 AM, Andrey Konovalov wrote:


==
BUG: KASAN: use-after-free in bcm_rx_thr_flush+0x284/0x2b0
Read of size 1 at addr 88006c1faae5 by task a.out/3874

page:ea0001b07e80 count:1 mapcount:0 mapping:  (null) index:0x0
flags: 0x180(slab)
page dumped because: kasan: bad access detected


(..)



The buggy address belongs to the object at 88006c1faae0
 which belongs to the cache kmalloc-32 of size 32


???


The buggy address 88006c1faae5 is located 5 bytes inside
 of 32-byte region [88006c1faae0, 88006c1fab00)


(..)


Memory state around the buggy address:
 88006c1fa980: fc fc fb fb fb fb fc fc fb fb fb fb fc fc fb fb
 88006c1faa00: fb fb fc fc fb fb fb fb fc fc fb fb fb fb fc fc

88006c1faa80: fb fb fb fb fc fc fb fb fb fb fc fc fb fb fb fb

   ^
 88006c1fab00: fc fc fb fb fb fb fc fc 00 00 00 00 fc fc 00 00
 88006c1fab80: 00 00 fc fc fb fb fb fb fc fc fb fb fb fb fc fc
==


(should be some zero initialized memory here)

The relevant code of bcm_rx_do_flush() can be found here:

http://lxr.free-electrons.com/source/net/can/bcm.c#L589

static inline int bcm_rx_do_flush(struct bcm_op *op, int update,
  unsigned int index)
{
struct canfd_frame *lcf = op->last_frames + op->cfsiz * index;

if ((op->last_frames) && (lcf->flags & RX_THR)) {  <<<- !!!
if (update)
bcm_rx_changed(op, lcf);
return 1;
}
return 0;
}


lcf->flags points into an array of struct canfd_frame at offset 5 which 
is allocated here:


http://lxr.free-electrons.com/source/net/can/bcm.c#L1105

/* create and init array for received CAN frames */
op->last_frames = kzalloc(msg_head->nframes * op->cfsiz,
  GFP_KERNEL);

So why does KASAN complain about accessing some kind of 32 byte cache 
when it should point into a zero initialized allocated space?


I will write some other test cases with a similar setting of options to 
check if I can trigger the instability too.


Tnx & regards,
Oliver


RE: [PATCH v9 0/8] thunderbolt: Introducing Thunderbolt(TM) Networking

2016-11-22 Thread Mario.Limonciello
> Here are a couple of additional questions:
> 
> - When the network interface is created, there is no IP address
>   assigned (or negotiated ?) on the Linux side. But it is done on the
>   MacOS side. And in the Linux kernel logs I can also read the message:
>   "ready for ThunderboltIP negotiation". Is there something missing or
>   not working on the Linux side ? What is the correct way to configure
>   or negotiate the IP address. For my tests I did it manually...
> 
> - When the Linux machine is started with the Thunderbolt wire already
>   connected to a MacBook Pro, sometimes (but not every time) the
>   network interface is not created. The Thunderbolt wire needs to be
>   replugged.
> 
> FWIW you get my
> 
> Tested-by: Simon Guinot 
> 
> Simon

Simon,

Since I also performed testing on the previous patchset, I'll share what I did.

I configured Network Manager to use the TBT interface to share an internet
connection to another box.  This configures a static IP address on the local
Linux side and sets up routing.

Network Manager remembers this setup in a configuration database.
When the interface goes up it will then set up a DHCP server to hand
out an IP address to the other side.




Re: net/can: use-after-free in bcm_rx_thr_flush

2016-11-22 Thread Andrey Konovalov
On Tue, Nov 22, 2016 at 6:29 PM, Oliver Hartkopp  wrote:
> Hi Andrey,
>
> thanks for the report.
>
> Although I can't see the issue in the code ...
>
> On 11/22/2016 10:22 AM, Andrey Konovalov wrote:
>
>> ==
>> BUG: KASAN: use-after-free in bcm_rx_thr_flush+0x284/0x2b0
>> Read of size 1 at addr 88006c1faae5 by task a.out/3874
>>
>> page:ea0001b07e80 count:1 mapcount:0 mapping:  (null)
>> index:0x0
>> flags: 0x180(slab)
>> page dumped because: kasan: bad access detected
>
>
> (..)
>
>>
>> The buggy address belongs to the object at 88006c1faae0
>>  which belongs to the cache kmalloc-32 of size 32
>
>
> ???
>
>> The buggy address 88006c1faae5 is located 5 bytes inside
>>  of 32-byte region [88006c1faae0, 88006c1fab00)
>
>
> (..)
>
>> Memory state around the buggy address:
>>  88006c1fa980: fc fc fb fb fb fb fc fc fb fb fb fb fc fc fb fb
>>  88006c1faa00: fb fb fc fc fb fb fb fb fc fc fb fb fb fb fc fc
>>>
>>> 88006c1faa80: fb fb fb fb fc fc fb fb fb fb fc fc fb fb fb fb
>>
>>^
>>  88006c1fab00: fc fc fb fb fb fb fc fc 00 00 00 00 fc fc 00 00
>>  88006c1fab80: 00 00 fc fc fb fb fb fb fc fc fb fb fb fb fc fc
>> ==
>
>
> (should be some zero initialized memory here)
>
> The relevant code of bcm_rx_do_flush() can be found here:
>
> http://lxr.free-electrons.com/source/net/can/bcm.c#L589
>
> static inline int bcm_rx_do_flush(struct bcm_op *op, int update,
>   unsigned int index)
> {
> struct canfd_frame *lcf = op->last_frames + op->cfsiz * index;
>
> if ((op->last_frames) && (lcf->flags & RX_THR)) {  <<<- !!!
> if (update)
> bcm_rx_changed(op, lcf);
> return 1;
> }
> return 0;
> }
>
>
> lcf->flags points into an array of struct canfd_frame at offset 5 which is
> allocated here:
>
> http://lxr.free-electrons.com/source/net/can/bcm.c#L1105
>
> /* create and init array for received CAN frames */
> op->last_frames = kzalloc(msg_head->nframes * op->cfsiz,
>   GFP_KERNEL);
>
> So why does KASAN complain about accessing some kind of 32 byte cache when
> it should point into a zero initialized allocated space?

Hi Oliver,

My guess would be that this is an out-of-bounds access which doesn't
hit the redzone.
The free and alloc stack traces also look unrelated to the access.
Besides I have a bunch of related slab-out-of-bounds reports, see below.
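
One way such an out-of-bounds read could arise, offered purely as a
hypothesis: the snippet quoted above does not show how op->last_frames is
declared. If it were a typed frame pointer rather than a byte pointer, the
"+ op->cfsiz * index" would be scaled by the element size a second time. The
sketch below only does the arithmetic. struct toy_canfd_frame is a stand-in
with a 72-byte CAN FD style layout, and both helper names are made up.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in frame layout: 8 bytes of header plus 64 bytes of payload. */
struct toy_canfd_frame {
	uint32_t can_id;
	uint8_t  len;
	uint8_t  flags;
	uint8_t  res0;
	uint8_t  res1;
	uint8_t  data[64];
};

/* Byte offset reached when the base is a byte pointer. */
size_t offset_with_byte_pointer(size_t cfsiz, size_t index)
{
	return cfsiz * index;
}

/*
 * Byte offset reached when the base is a struct toy_canfd_frame pointer:
 * pointer arithmetic multiplies by sizeof(*base) on top of cfsiz.
 */
size_t offset_with_typed_pointer(size_t cfsiz, size_t index)
{
	return cfsiz * index * sizeof(struct toy_canfd_frame);
}
```

Assuming cfsiz is 16 (classic CAN frames) and nframes is 2, the buffer is 32
bytes, which would fit the kmalloc-32 cache in the report. Index 1 lands at
byte 16 with a byte pointer but at byte 1152 with a typed pointer, far
outside the allocation. The flags member sits at offset 5 in this layout,
which would also match the "5 bytes inside" in the first report.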

Thanks for looking at this!

==
BUG: KASAN: slab-out-of-bounds in bcm_send_to_user+0x330/0x480
Read of size 16 at addr 88006de17338 by task syz-executor/30679

page:ea0001b78580 count:1 mapcount:0 mapping:  (null)
index:0x88006de16760 compound_mapcount: 0
flags: 0x5004080(slab|head)
page dumped because: kasan: bad access detected

CPU: 2 PID: 30679 Comm: syz-executor Not tainted 4.9.0-rc6+ #429
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
 88003cd277b0 81b472e4 88003cd27840 88006de17338
 00fb 00fc 88003cd27830 8150ad42
  81509f65 88006aef9830 0282
Call Trace:
 [< inline >] __dump_stack lib/dump_stack.c:15
 [] dump_stack+0xb3/0x10f lib/dump_stack.c:51
 [< inline >] describe_address mm/kasan/report.c:259
 [] kasan_report_error+0x122/0x560 mm/kasan/report.c:365
 [] kasan_report+0x36/0x40 mm/kasan/report.c:387
 [< inline >] check_memory_region_inline mm/kasan/kasan.c:308
 [] check_memory_region+0x13e/0x1a0 mm/kasan/kasan.c:315
 [] memcpy+0x23/0x50 mm/kasan/kasan.c:350
 [] bcm_send_to_user+0x330/0x480 net/can/bcm.c:325
 [] bcm_rx_changed+0x22e/0x2a0 net/can/bcm.c:443
 [< inline >] bcm_rx_do_flush net/can/bcm.c:591
 [] bcm_rx_thr_flush+0x19e/0x2b0 net/can/bcm.c:612
 [< inline >] bcm_rx_setup net/can/bcm.c:1199
 [] bcm_sendmsg+0xbb6/0x30e0 net/can/bcm.c:1351
 [< inline >] sock_sendmsg_nosec net/socket.c:621
 [] sock_sendmsg+0xcc/0x110 net/socket.c:631
 [] ___sys_sendmsg+0x771/0x8b0 net/socket.c:1954
 [] __sys_sendmsg+0xce/0x170 net/socket.c:1988
 [< inline >] SYSC_sendmsg net/socket.c:1999
 [] SyS_sendmsg+0x2d/0x50 net/socket.c:1995
 [] entry_SYSCALL_64_fastpath+0x1f/0xc2

The buggy address belongs to the object at 88006de17320
 which belongs to the cache kmalloc-32 of size 32
The buggy address 88006de17338 is located 24 bytes inside
 of 32-byte region [88006de17320, 88006de17340)

Freed by task 0:
 [] save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:57
 [] save_stack+0x46/0xd0 mm/kasan/kasan.c:495
 [< inline >] set_track mm/kasan/kasan.c:507
 [] kasan_slab_free+0x73/0xc0 mm

Re: [RFC net-next 0/3] net: bridge: Allow CPU port configuration

2016-11-22 Thread Andrew Lunn
Hi Ido
 
> First of all, I want to be sure that when we say "CPU port", we're
> talking about the same thing. In mlxsw, the CPU port is a pipe between
> the device and the host, through which all packets trapped to the host
> pass. So, when a packet is trapped, the driver reads its Rx
> descriptor, checks through which port it ingressed, resolves its netdev,
> sets skb->dev accordingly and injects it to the Rx path via
> netif_receive_skb(). The CPU port itself isn't represented using a
> netdev.

With DSA, we have a real physical ethernet network interface for the
'cpu' port. It connects to one of the ports of the switch. Frames on
this interface have an extra header, indicating which switch port they
came from, and we do something similar: resolve it to a slave netdev,
strip off the header and inject the frame into the receive path via
netif_receive_skb().
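
As a rough illustration of that receive step, here is a standalone sketch
with an invented tag format: a 4-byte tag in front of the frame whose first
byte is the source port. Real DSA tag formats differ per vendor, and
toy_tag_rcv(), TOY_TAG_LEN and TOY_MAX_PORTS are hypothetical names, not
kernel API.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_TAG_LEN   4	/* invented tag size */
#define TOY_MAX_PORTS 8	/* invented port count */

/*
 * Parse the toy tag, report the source port and return a pointer to the
 * frame with the tag stripped. Returns NULL for short frames or for a port
 * number that cannot be resolved.
 */
const uint8_t *toy_tag_rcv(const uint8_t *buf, size_t len,
			   unsigned int *port, size_t *payload_len)
{
	if (!buf || len < TOY_TAG_LEN)
		return NULL;

	if (buf[0] >= TOY_MAX_PORTS)
		return NULL;

	*port = buf[0];			/* would select the slave netdev */
	*payload_len = len - TOY_TAG_LEN;
	return buf + TOY_TAG_LEN;	/* frame as the stack should see it */
}
```

In the real driver the port number would be used to look up the slave netdev
and set skb->dev before the frame is passed up.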

Andrew


Re: [RFC net-next 0/3] net: bridge: Allow CPU port configuration

2016-11-22 Thread Ido Schimmel
Hi Florian,

On Mon, Nov 21, 2016 at 11:09:22AM -0800, Florian Fainelli wrote:
> Hi all,
> 
> This patch series allows using the bridge master interface to configure
> an Ethernet switch port's CPU/management port with different VLAN attributes 
> than
> those of the bridge downstream ports/members.
> 
> Jiri, Ido, Andrew, Vivien, please review the impact on mlxsw and mv88e6xxx, I
> tested this with b53 and a mockup DSA driver.

We'll need to add a check in mlxsw and ignore any VLAN configuration for
the bridge device itself. Otherwise, any configuration done on br0 will
be propagated to all of its slaves, which is incorrect.

> 
> Open questions:
> 
> - if we have more than one bridge on top of a physical switch, the driver
>   should keep track of that and verify that we are not going to change
>   the CPU port VLAN attributes in a way that results in incompatible settings
>   to be applied
> 
> - if the default behavior is to have all VLANs associated with the CPU port
>   be ingressing/egressing tagged to the CPU, is this really useful?

First of all, I want to be sure that when we say "CPU port", we're
talking about the same thing. In mlxsw, the CPU port is a pipe between
the device and the host, through which all packets trapped to the host
pass. So, when a packet is trapped, the driver reads its Rx
descriptor, checks through which port it ingressed, resolves its netdev,
sets skb->dev accordingly and injects it to the Rx path via
netif_receive_skb(). The CPU port itself isn't represented using a
netdev.

Given the above, having VLAN filters (or STP) on the CPU port itself
isn't really helpful (we do have them for physical ports of course...).
So, mlxsw will not benefit from this patchset and if we've the same
concept of "CPU port", then I'm not sure why you don't just enable all
the VLANs on it?

Also, how are you going to set the VLAN filters for the CPU port when
you don't offload a bridge, but instead vlan devices between which you
route packets? You lose your abstraction of CPU port...

Thanks!


Re: wl1251 & mac address & calibration data

2016-11-22 Thread Pali Rohár
On Tuesday 22 November 2016 17:14:28 Michal Kazior wrote:
> On 22 November 2016 at 16:31, Pali Rohár  wrote:
> > On Tuesday 22 November 2016 16:22:57 Michal Kazior wrote:
> >> On 21 November 2016 at 16:51, Pali Rohár 
> >> wrote:
> >> > On Friday 11 November 2016 18:20:50 Pali Rohár wrote:
> >> >> Hi! I will open discussion about mac address and calibration
> >> >> data for wl1251 wireless chip again...
> >> >> 
> >> >> Problem: Mac address & calibration data for wl1251 chip on
> >> >> Nokia N900 are stored on second nand partition (mtd1) in
> >> >> special proprietary format which is used only for Nokia N900
> >> >> (probably on N8x0 and N9 too). Wireless driver wl1251.ko
> >> >> cannot work without mac address and calibration data.
> >> 
> >> Same problem applies to some ath9k/ath10k supported routers. Some
> >> even carry mac address as implicit offset from ethernet mac
> >> address. As far as I understand OpenWRT cooks cal blobs on first
> >> boot prior to loading modules.
> > 
> > So... wl1251 on Nokia N900 is not alone and this problem is there
> > for more drivers and devices. Which means we should come up with
> > some generic solution.
> 
> This isn't particularly a problem for ath9k/ath10k.
> 
> Let me give you more background on ath10k.
> 
> ath10k devices can come with caldata and macaddr stored in their
> OTP/EEPROM. In that case a generic "template" board file is used.
> Userspace doesn't need to do anything special.
> 
> Some vendors however decide to use flash partition to store caldata.
> In that case ath10k expects userspace to prepare
> cal-$bus-$devname.bin files, each for a different radio (you can
> have multiple radios on a system).
> 
> Now translating this for wl1251 I would expect it should also use
> something like wl1251-nvs-sdio-0x0001.bin for devices like N900 that
> have caldata on flash partition (instead of the generic
> wl1251-nvs.bin). I'm not sure if wl1251-nvs.bin is something
> comparable to (the generic) board.bin ath10k has though. Maybe the
> entire idea behind wl1251-nvs.bin is flawed as it's supposed to be
> device specific and is oblivious to possibility of having multiple
> wl1251 radios on one system (probably sane assumption from practical
> standpoint but still).

Basically, nvs data are device specific; in the ideal case they should be
generated in the factory by some calibration process (or so).
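
For reference, the per-device naming scheme described above
(cal-$bus-$devname.bin) boils down to something like the following
standalone sketch. build_cal_name() is a made-up helper, not a function from
ath10k or wl1251, and the bus and device strings in the usage note are only
examples.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Build "cal-<bus>-<devname>.bin" into out. Returns the length of the name,
 * or -1 if it does not fit into out_len bytes.
 */
int build_cal_name(char *out, size_t out_len,
		   const char *bus, const char *devname)
{
	int n = snprintf(out, out_len, "cal-%s-%s.bin", bus, devname);

	if (n < 0 || (size_t)n >= out_len)
		return -1;
	return n;
}
```

Called with "pci" and "0000:01:00.0" this yields cal-pci-0000:01:00.0.bin,
so two radios on one system get two distinct files. A similar scheme for
wl1251 would avoid a single shared wl1251-nvs.bin.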

> >> >> Without a mac address the driver generates a random mac
> >> >> address at every kernel boot, which causes a couple of problems
> >> >> (unstable identifier of wireless device due to udev permanent
> >> >> storage rules; unpredictable behaviour for dhcp mac address
> >> >> assignment, mac address filtering, ...).
> >> >> 
> >> >> Currently there is no way to set (permanent) mac address for
> >> >> network interface from userspace. And it does not make sense
> >> >> to implement in linux kernel large parser for proprietary
> >> >> format of second nand partition where is mac address stored
> >> >> only for one device -- Nokia N900.
> >> >> 
> >> >> Driver wl1251.ko loads calibration data via request_firmware()
> >> >> for file wl1251-nvs.bin. There is an "example" calibration
> >> >> file in the linux-firmware repository, but it is not suitable for
> >> >> normal usage as real calibration data are per-device specific.
> >> 
> >> You could hook up a script that cooks up the cal/mac file via
> >> modprobe's install hook, no?
> > 
> > Via modprobe hook I can either pass custom module parameter or call
> > any other system (shell) commands.
> > 
> > As wl1251.ko does not accept mac_address as module parameter, such
> > modprobe hook does not help -- as there is absolutely no way from
> > userspace to set or change (permanent) mac address.
> 
> Quoting modprobe.d manual:
> >   install modulename command...
> >   
> >   This command instructs modprobe to run your
> >   command instead of inserting the module in the
> >   kernel as normal. The command can be any shell
> >   command: this allows you to do any kind of
> >   complex processing you might wish. [...]

I know. But this does not allow me to send the mac address to the kernel -- as
the kernel does not support such a command yet (the reason for my first question).

> You can hook up a script that cooks up wl1251-nvs.bin (caldata,
> macaddr) and then insmod the actual wl1251.ko module. Or you can just
> cook up the nvs on first device boot and store it in /lib/firmware
> (possibly overwriting the "generic" wl1251 from linux-firmware).

This is what I would like to prevent -- overwriting (possibly read-only)
system files with device-specific ones. It is a really bad idea!

-- 
Pali Rohár
pali.ro...@gmail.com



Re: net/icmp: null-ptr-deref in icmp6_send

2016-11-22 Thread Cong Wang
On Tue, Nov 22, 2016 at 2:23 AM, Andrey Konovalov  wrote:
> Hi,
>
> I've got the following error report while fuzzing the kernel with syzkaller.
>
> It seems that skb_dst(skb) may end up being NULL.
>
> As far as I can see the bug was introduced in commit 5d41ce29e ("net:
> icmp6_send should use dst dev to determine L3 domain").
> ICMP v4 probably has a similar issue due to 9d1a6c4ea ("net:
> icmp_route_lookup should use rt dev to determine L3 domain").


ipv6_parse_hopopts() is called before NF_INET_PRE_ROUTING,
so the skb_dst could be NULL.

I have no idea what commit 5d41ce29e tried to fix, but we already
use skb->dev a few lines before l3mdev_master_ifindex(), so I don't
understand why skb->dev could be NULL, maybe just for vrf dev?
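
To make the concern concrete, here is a minimal user-space sketch, not the
actual kernel code and not a proposed fix. The struct definitions and the
helper name l3_domain_dev() are hypothetical stand-ins; the only point is that
the dst has to be checked before its dev is used, with skb->dev as the
fallback:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
struct net_device { int ifindex; };
struct dst_entry  { struct net_device *dev; };
struct sk_buff    { struct net_device *dev; struct dst_entry *dst; };

/* Pick the device used to determine the L3 domain: prefer the route's
 * device when a dst is attached, otherwise fall back to skb->dev.
 */
static struct net_device *l3_domain_dev(const struct sk_buff *skb)
{
	assert(skb != NULL);
	if (skb->dst && skb->dst->dev)
		return skb->dst->dev;
	return skb->dev;
}
```

Whether falling back to skb->dev gives the right L3 domain for a vrf device is
exactly the open question; the sketch only shows where the NULL check would sit.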


Re: [RFC net-next 0/3] net: bridge: Allow CPU port configuration

2016-11-22 Thread Florian Fainelli
On 11/22/2016 09:41 AM, Ido Schimmel wrote:
> Hi Florian,
> 
> On Mon, Nov 21, 2016 at 11:09:22AM -0800, Florian Fainelli wrote:
>> Hi all,
>>
>> This patch series allows using the bridge master interface to configure
>> an Ethernet switch port's CPU/management port with different VLAN attributes 
>> than
>> those of the bridge downstream ports/members.
>>
>> Jiri, Ido, Andrew, Vivien, please review the impact on mlxsw and mv88e6xxx, I
>> tested this with b53 and a mockup DSA driver.
> 
> We'll need to add a check in mlxsw and ignore any VLAN configuration for
> the bridge device itself. Otherwise, any configuration done on br0 will
> be propagated to all of its slaves, which is incorrect.
> 
>>
>> Open questions:
>>
>> - if we have more than one bridge on top of a physical switch, the driver
>>   should keep track of that and verify that we are not going to change
>>   the CPU port VLAN attributes in a way that results in incompatible settings
>>   to be applied
>>
>> - if the default behavior is to have all VLANs associated with the CPU port
>>   be ingressing/egressing tagged to the CPU, is this really useful?
> 
> First of all, I want to be sure that when we say "CPU port", we're
> talking about the same thing. In mlxsw, the CPU port is a pipe between
> the device and the host, through which all packets trapped to the host
> go through. So, when a packet is trapped, the driver reads its Rx
> descriptor, checks through which port it ingressed, resolves its netdev,
> sets skb->dev accordingly and injects it to the Rx path via
> netif_receive_skb(). The CPU port itself isn't represented using a
> netdev.

In the case of DSA, the CPU port is a normal Ethernet MAC driver, but in
principle, this driver plus the DSA tag protocol hook do exactly the same
things as you just described.

> 
> Given the above, having VLAN filters (or STP) on the CPU port itself
> isn't really helpful (we do have them for physical ports of course...).
> So, mlxsw will not benefit from this patchset and if we've the same
> concept of "CPU port", then I'm not sure why you don't just enable all
> the VLANs on it?

We do enable all VLANs on the CPU port (at least with b53, but I think
mv88e6xxx does it too), but compared to e.g: mlxsw, we trap all traffic
by default, and actually, quite often (always actually, until we add IP
routing offloads) the CPU is involved in the LAN/WAN routing, so it is
not infrequent to have the following packet flow:

LAN port -> VLAN 1 -> eth0.1 -> NAT/routing -> eth0.2 -> VLAN 2 -> WAN port

In that case, having the ability to define the per-port membership for
VLANs, including the CPU, kind of helps, especially if there are
private/guests VLAN on either the LAN or WAN segments that the CPU does
not necessarily need to play a role in.

NB: this scheme works because in most configurations that we support
today, the CPU port's speed is greater than or equal to the speed of the
downstream/front panel ports.

> 
> Also, how are you going to set the VLAN filters for the CPU port when
> you don't offload a bridge, but instead vlan devices between which you
> route packets? You lose your abstraction of CPU port...

As far as I can tell today, this is not particularly helpful with DSA,
where we start with all traffic going to the CPU (each DSA created
network device is segregated from the other) and only then we require
having bridge VLAN filtering enabled in the kernel, and configuring
bridge VLAN membership to have a proper VLAN-based scheme.

If you did configure VLAN membership with e.g: port0. we could
support that just fine, but that programming interface does not allow
configuring the default VLAN, and in our case, it matters a bit to
support the LAN/WAN routing scenario described. We could agree that all
untagged traffic should go to VLAN 0 or 1 for instance, but that could
then vary on a per-driver/HW basis.

Hope this clarifies things a bit!
-- 
Florian


[PATCH/RFC -next] net: phy: Fix double free in phy_detach()

2016-11-22 Thread Geert Uytterhoeven
During "poweroff" on sh73a0/kzm9g:

WARNING: CPU: 0 PID: 1271 at drivers/base/devres.c:889 phy_detach+0x44/0x60
Modules linked in:
CPU: 0 PID: 1271 Comm: halt Not tainted 4.9.0-rc6-kzm9g-05637-gb090128865050239 #823
Hardware name: Generic SH73A0 (Flattened Device Tree)
[] (unwind_backtrace) from [] (show_stack+0x10/0x14)
[] (show_stack) from [] (dump_stack+0xa4/0xdc)
[] (dump_stack) from [] (__warn+0xcc/0xfc)
[] (__warn) from [] (warn_slowpath_null+0x1c/0x24)
[] (warn_slowpath_null) from [] (phy_detach+0x44/0x60)
[] (phy_detach) from [] (smsc911x_stop+0xf4/0x10c)
[] (smsc911x_stop) from [] (__dev_close_many+0x94/0xb8)
[] (__dev_close_many) from [] (__dev_close+0x20/0x34)
[] (__dev_close) from [] (__dev_change_flags+0x8c/0x130)
[] (__dev_change_flags) from [] (dev_change_flags+0x18/0x48)
[] (dev_change_flags) from [] (devinet_ioctl+0x33c/0x708)
[] (devinet_ioctl) from [] (sock_ioctl+0x29c/0x2f8)
[] (sock_ioctl) from [] (vfs_ioctl+0x20/0x34)
[] (vfs_ioctl) from [] (do_vfs_ioctl+0x870/0x9c4)
[] (do_vfs_ioctl) from [] (SyS_ioctl+0x34/0x5c)
[] (SyS_ioctl) from [] (ret_fast_syscall+0x0/0x1c)
---[ end trace 4555b9be7369b463 ]---

If device_release_driver(&phydev->mdio.dev) was called, it has already
released all resources belonging to the PHY device. Hence the subsequent
call to phy_led_triggers_unregister() may cause a double free, leading
to the warning.

Move the call to phy_led_triggers_unregister() before the possible call
to device_release_driver() to fix this.
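
The ordering problem can be illustrated with a small user-space toy model.
This is only a sketch under the assumption that the trigger memory is a
device-managed resource; none of the names below exist in the kernel, and the
model tracks a single resource with simple counters:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, not kernel code: all names here are hypothetical. */
struct toy_phy {
	bool managed;	/* trigger memory still tracked as a managed resource */
	int frees;	/* successful frees of the trigger memory */
	int warnings;	/* frees attempted on memory that is no longer tracked */
};

/* Models device_release_driver(): every managed resource is released. */
static void toy_release_driver(struct toy_phy *phy)
{
	if (phy->managed) {
		phy->frees++;
		phy->managed = false;
	}
}

/* Models phy_led_triggers_unregister(): frees the trigger memory explicitly. */
static void toy_triggers_unregister(struct toy_phy *phy)
{
	if (!phy->managed) {
		phy->warnings++;	/* resource already gone: the warning case */
		return;
	}
	phy->frees++;
	phy->managed = false;
}

/* Runs both teardown steps in the chosen order; returns the warning count. */
static int toy_detach(bool unregister_first)
{
	struct toy_phy phy = { .managed = true, .frees = 0, .warnings = 0 };

	if (unregister_first) {
		toy_triggers_unregister(&phy);
		toy_release_driver(&phy);
	} else {
		toy_release_driver(&phy);
		toy_triggers_unregister(&phy);
	}
	assert(phy.frees == 1);	/* memory is released exactly once either way */
	return phy.warnings;
}
```

In this model toy_detach(false) corresponds to the current order and reports
one warning, while toy_detach(true) corresponds to the reordered calls and
reports none.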

Fixes: 2e0bc452f4721520 ("net: phy: leds: add support for led triggers on phy link state change")
Signed-off-by: Geert Uytterhoeven 
---
Is this the right fix?
---
 drivers/net/phy/phy_device.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 9e8f048891bd192f..b32457660db66de4 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -981,6 +981,8 @@ void phy_detach(struct phy_device *phydev)
 	phydev->attached_dev = NULL;
 	phy_suspend(phydev);
 
+	phy_led_triggers_unregister(phydev);
+
 	/* If the device had no specific driver before (i.e. - it
 	 * was using the generic driver), we unbind the device
 	 * from the generic driver so that there's a chance a
@@ -994,8 +996,6 @@ void phy_detach(struct phy_device *phydev)
 		}
 	}
 
-	phy_led_triggers_unregister(phydev);
-
 	/*
 	 * The phydev might go away on the put_device() below, so avoid
 	 * a use-after-free bug by reading the underlying bus first.
-- 
1.9.1



[PATCH net] bnxt: do not busy-poll when link is down

2016-11-22 Thread Andy Gospodarek
When busy polling while a link is down (during a link-flap test), TX
timeouts were observed as well as the following messages in the ring
buffer:

bnxt_en 0008:01:00.2 enP8p1s0f2d2: Resp cmpl intr err msg: 0x51
bnxt_en 0008:01:00.2 enP8p1s0f2d2: hwrm_ring_free tx failed. rc:-1
bnxt_en 0008:01:00.2 enP8p1s0f2d2: Resp cmpl intr err msg: 0x51
bnxt_en 0008:01:00.2 enP8p1s0f2d2: hwrm_ring_free rx failed. rc:-1

These were resolved by checking for link status and returning if link
was not up.

Signed-off-by: Andy Gospodarek 
Signed-off-by: Michael Chan 
Tested-by: Rob Miller 
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index e18635b..013e373 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -1811,6 +1811,9 @@ static int bnxt_busy_poll(struct napi_struct *napi)
 	if (atomic_read(&bp->intr_sem) != 0)
 		return LL_FLUSH_FAILED;
 
+	if (!bp->link_info.link_up)
+		return LL_FLUSH_FAILED;
+
 	if (!bnxt_lock_poll(bnapi))
 		return LL_FLUSH_BUSY;
 
-- 
2.1.0



Re: [PATCH net] bnxt: do not busy-poll when link is down

2016-11-22 Thread Eric Dumazet
On Tue, 2016-11-22 at 13:14 -0500, Andy Gospodarek wrote:
> When busy polling while a link is down (during a link-flap test), TX
> timeouts were observed as well as the following messages in the ring
> buffer:
> 
> bnxt_en 0008:01:00.2 enP8p1s0f2d2: Resp cmpl intr err msg: 0x51
> bnxt_en 0008:01:00.2 enP8p1s0f2d2: hwrm_ring_free tx failed. rc:-1
> bnxt_en 0008:01:00.2 enP8p1s0f2d2: Resp cmpl intr err msg: 0x51
> bnxt_en 0008:01:00.2 enP8p1s0f2d2: hwrm_ring_free rx failed. rc:-1
> 
> These were resolved by checking for link status and returning if link
> was not up.
> 
> Signed-off-by: Andy Gospodarek 
> Signed-off-by: Michael Chan 
> Tested-by: Rob Miller 
> ---
>  drivers/net/ethernet/broadcom/bnxt/bnxt.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> index e18635b..013e373 100644
> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> @@ -1811,6 +1811,9 @@ static int bnxt_busy_poll(struct napi_struct *napi)
>   if (atomic_read(&bp->intr_sem) != 0)
>   return LL_FLUSH_FAILED;
>  
> + if (!bp->link_info.link_up)
> + return LL_FLUSH_FAILED;
> +
>   if (!bnxt_lock_poll(bnapi))
>   return LL_FLUSH_BUSY;
>  


Any plans to remove this busy polling stuff, now that it is done in the core
networking stack?

This would remove the extra bnxt_lock_napi() overhead in the normal path (napi
poll).

I could do this but I do not have the hardware to do the tests.





Re: [PATCH] net: dsa: mv88e6xxx: egress all frames

2016-11-22 Thread Stefan Eichenberger
Hi Andrew

On Tue, Nov 22, 2016 at 04:03:30PM +0100, Andrew Lunn wrote:
> On Tue, Nov 22, 2016 at 11:39:44AM +0100, Stefan Eichenberger wrote:
> > Egress multicast and egress unicast is only enabled for CPU/DSA ports
> > but for switching operation it seems it should be enabled for all ports.
> > Do I miss something here?
> > 
> > I did the following test:
> > brctl addbr br0
> > brctl addif br0 lan0
> > brctl addif br0 lan1
> > 
> > In this scenario the unicast and multicast packets were not forwarded,
> > therefore ARP requests were not resolved, and no connection could be
> > established.
> 
> Hi Stefan
> 
> This is probably specific to the 6097 family. It works fine without
> this on other devices. Creating a bridge like above and pinging across
> it is one of my standard tests. But i only test modern devices like
> the 6165, 6352, 6351, 6390 families.

Okay perfect, I wasn't 100% sure if I would have to configure something
additionally.

> 
> In fact, you might need to review all the code and look where
> mv88e6xxx_6095_family(chip) is used and consider if you need to add
> mv88e6xxx_6097_family(chip). e.g.
> 
> if (mv88e6xxx_6095_family(chip) || mv88e6xxx_6185_family(chip)) {
> /* Set the upstream port this port should use */
> reg |= dsa_upstream_port(ds);
> /* enable forwarding of unknown multicast addresses to
>  * the upstream port
>  */
> if (port == dsa_upstream_port(ds))
> reg |= PORT_CONTROL_2_FORWARD_UNKNOWN;
> }
> 
> Maybe this is your problem?

I think I still don't understand exactly how the driver works.

My problem is that the multicast and broadcast frames are filtered and
the following counter is increasing in ethtool:
sw_in_filtered: 596

This makes sense because "Egress Floods" in the Port Control Register is
set to 0. What kind of mechanism should make sure that, for example, ARP
packets are sent through all ports anyway?

Unfortunately I don't have any boards with more modern switch chips
available, so I can't double check the registers.

Regards,
Stefan


Re: [PATCH net] bnxt: do not busy-poll when link is down

2016-11-22 Thread Michael Chan
On Tue, Nov 22, 2016 at 10:38 AM, Eric Dumazet  wrote:
> On Tue, 2016-11-22 at 13:14 -0500, Andy Gospodarek wrote:
>> When busy polling while a link is down (during a link-flap test), TX
>> timeouts were observed as well as the following messages in the ring
>> buffer:
>>
>> bnxt_en 0008:01:00.2 enP8p1s0f2d2: Resp cmpl intr err msg: 0x51
>> bnxt_en 0008:01:00.2 enP8p1s0f2d2: hwrm_ring_free tx failed. rc:-1
>> bnxt_en 0008:01:00.2 enP8p1s0f2d2: Resp cmpl intr err msg: 0x51
>> bnxt_en 0008:01:00.2 enP8p1s0f2d2: hwrm_ring_free rx failed. rc:-1
>>
>> These were resolved by checking for link status and returning if link
>> was not up.
>>
>> Signed-off-by: Andy Gospodarek 
>> Signed-off-by: Michael Chan 
>> Tested-by: Rob Miller 
>> ---
>>  drivers/net/ethernet/broadcom/bnxt/bnxt.c | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
>> index e18635b..013e373 100644
>> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
>> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
>> @@ -1811,6 +1811,9 @@ static int bnxt_busy_poll(struct napi_struct *napi)
>>   if (atomic_read(&bp->intr_sem) != 0)
>>   return LL_FLUSH_FAILED;
>>
>> + if (!bp->link_info.link_up)
>> + return LL_FLUSH_FAILED;
>> +
>>   if (!bnxt_lock_poll(bnapi))
>>   return LL_FLUSH_BUSY;
>>
>
>
> Any plans to remove this busy polling stuff, now that it is done in the core
> networking stack?
>
> This would remove the extra bnxt_lock_napi() overhead in the normal path (napi
> poll).
>
> I could do this but I do not have the hardware to do the tests.
>
It's on my list of many TODO things.  Probably in the next few weeks.


Re: [PATCH] net: dsa: mv88e6xxx: egress all frames

2016-11-22 Thread Andrew Lunn
On Tue, Nov 22, 2016 at 07:37:33PM +0100, Stefan Eichenberger wrote:
> Hi Andrew
> 
> On Tue, Nov 22, 2016 at 04:03:30PM +0100, Andrew Lunn wrote:
> > On Tue, Nov 22, 2016 at 11:39:44AM +0100, Stefan Eichenberger wrote:
> > > Egress multicast and egress unicast is only enabled for CPU/DSA ports
> > > but for switching operation it seems it should be enabled for all ports.
> > > Do I miss something here?
> > > 
> > > I did the following test:
> > > brctl addbr br0
> > > brctl addif br0 lan0
> > > brctl addif br0 lan1
> > > 
> > > In this scenario the unicast and multicast packets were not forwarded,
> > > therefore ARP requests were not resolved, and no connection could be
> > > established.
> > 
> > Hi Stefan
> > 
> > This is probably specific to the 6097 family. It works fine without
> > this on other devices. Creating a bridge like above and pinging across
> > it is one of my standard tests. But i only test modern devices like
> > the 6165, 6352, 6351, 6390 families.
> 
> Okay perfect, I wasn't 100% sure if I would have to configure something
> additionally.

No. The idea is you treat the interfaces as normal interfaces. You
should not need to do anything additional to what you would do with a
normal interface, when adding it to a bridge.
 
> > In fact, you might need to review all the code and look where
> > mv88e6xxx_6095_family(chip) is used and consider if you need to add
> > mv88e6xxx_6097_family(chip). e.g.
> > 
> > if (mv88e6xxx_6095_family(chip) || mv88e6xxx_6185_family(chip)) {
> > /* Set the upstream port this port should use */
> > reg |= dsa_upstream_port(ds);
> > /* enable forwarding of unknown multicast addresses to
> >  * the upstream port
> >  */
> > if (port == dsa_upstream_port(ds))
> > reg |= PORT_CONTROL_2_FORWARD_UNKNOWN;
> > }
> > 
> > Maybe this is your problem?
> 
> I think I still don't understand exactly how the driver works.
> 
> My problem is that the multicast and broadcast frames are filtered and
> the following counter is increasing in ethtool:
> sw_in_filtered: 596

This is not what is supposed to happen. Broadcast and multicast frames
should go to all ports in the bridge. There are two different ways
this can happen:

1) The mv88e6xxx driver started out with the host doing all bridge
operations. The switch forwards all frames to the software bridge, and
the software bridge then sends them out another port if needed.

2) We later added support for hardware bridging. That is, the switch
itself bridges frames between ports. It will only pass frames to the
software bridge if it does not know what to do with a frame itself.

Now, the different families are not 100% compatible with each
other. We never had access to a 6097, so it has not been tested
recently, and we have probably broken it... My guess would be,
anywhere mv88e6xxx_6095_family(chip) is used, there also needs to be
an mv88e6xxx_6097_family(chip). But I could be wrong.
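
As a rough illustration of that guess only (a self-contained sketch, not the
driver's code): the enum, the struct and the helper name below are
hypothetical stand-ins, and the real driver identifies families differently.
The idea is simply that the 6097 family would join the 6095 and 6185 families
wherever such a check is made:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real driver identifies families differently. */
enum toy_family { TOY_6095, TOY_6097, TOY_6185, TOY_6352 };
struct toy_chip { enum toy_family family; };

/* True when the port needs its upstream port set and unknown multicast
 * forwarded to the upstream port, as in the quoted snippet.
 */
static bool toy_needs_upstream_setup(const struct toy_chip *chip)
{
	assert(chip != NULL);
	switch (chip->family) {
	case TOY_6095:
	case TOY_6097:	/* the case this guess would add */
	case TOY_6185:
		return true;
	default:
		return false;
	}
}
```

Folding the family test into one helper would keep the individual call sites
from drifting apart, but each existing check still needs to be reviewed on its
own against the 6097 datasheet.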

What you might find useful is

https://github.com/vivien/linux.git 161b96bd7d16d21b0f046c935b70c3b2d277ccc2

although it might need some changes for recent commits.

With that, you can see deeper into the switch's registers.

 Andrew


Re: [PATCH net] bnxt: do not busy-poll when link is down

2016-11-22 Thread Eric Dumazet
On Tue, 2016-11-22 at 10:55 -0800, Michael Chan wrote:
> On Tue, Nov 22, 2016 at 10:38 AM, Eric Dumazet  wrote:

> >
> > Any plans to remove this busy polling stuff, now that it is done in the core
> > networking stack?
> >
> > This would remove the extra bnxt_lock_napi() overhead in the normal path (napi
> > poll).
> >
> > I could do this but I do not have the hardware to do the tests.
> >
> It's on my list of many TODO things.  Probably in the next few weeks.

Awesome, thanks !





Re: net/icmp: null-ptr-deref in icmp6_send

2016-11-22 Thread David Ahern


Sent from my iPhone

> On Nov 22, 2016, at 1:11 PM, Cong Wang  wrote:
> 
>> On Tue, Nov 22, 2016 at 2:23 AM, Andrey Konovalov  wrote:
>> Hi,
>> 
>> I've got the following error report while fuzzing the kernel with syzkaller.
>> 
>> It seems that skb_dst(skb) may end up being NULL.
>> 
>> As far as I can see the bug was introduced in commit 5d41ce29e ("net:
>> icmp6_send should use dst dev to determine L3 domain").
>> ICMP v4 probably has similar issue due to 9d1a6c4ea ("net:
>> icmp_route_lookup should use rt dev to determine L3 domain").
> 
> 
> ipv6_parse_hopopts() is called before NF_INET_PRE_ROUTING,
> so the skb_dst could be NULL.
> 
> I have no idea what commit 5d41ce29e tried to fix, but we already
> use skb->dev a few lines before l3mdev_master_ifindex(), so I don't
> understand why skb->dev could be NULL, maybe just for vrf dev?

On PTO this week and currently at the beach. Will take a look tonight. Thanks 
for the report. 

Re: [PATCH net-next] tcp: enhance tcp_collapse_retrans() with skb_shift()

2016-11-22 Thread Eric Dumazet
On Tue, 2016-11-15 at 12:51 -0800, Eric Dumazet wrote:
> From: Eric Dumazet 
> 
> In commit 2331ccc5b323 ("tcp: enhance tcp collapsing"),
> we made a first step allowing copying right skb to left skb head.
> 
> Since all skbs in socket write queue are headless (but possibly the very
> first one), this strategy often does not work.
> 
> This patch extends tcp_collapse_retrans() to perform frag shifting,
> thanks to skb_shift() helper.
> 
> This helper needs to not BUG on non headless skbs, as callers are ok
> with that.
> 
> Tested:
> 
> Following packetdrill test now passes :
> 
> 0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
>+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
>+0 bind(3, ..., ...) = 0
>+0 listen(3, 1) = 0
> 
>+0 < S 0:0(0) win 32792 
>+0 > S. 0:0(0) ack 1 
> +.100 < . 1:1(0) ack 1 win 257
>+0 accept(3, ..., ...) = 4
> 
>+0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0
>+0 write(4, ..., 200) = 200
>+0 > P. 1:201(200) ack 1
> +.001 write(4, ..., 200) = 200
>+0 > P. 201:401(200) ack 1
> +.001 write(4, ..., 200) = 200
>+0 > P. 401:601(200) ack 1
> +.001 write(4, ..., 200) = 200
>+0 > P. 601:801(200) ack 1
> +.001 write(4, ..., 200) = 200
>+0 > P. 801:1001(200) ack 1
> +.001 write(4, ..., 100) = 100
>+0 > P. 1001:1101(100) ack 1
> +.001 write(4, ..., 100) = 100
>+0 > P. 1101:1201(100) ack 1
> +.001 write(4, ..., 100) = 100
>+0 > P. 1201:1301(100) ack 1
> +.001 write(4, ..., 100) = 100
>+0 > P. 1301:1401(100) ack 1
> 
> +.099 < . 1:1(0) ack 201 win 257
> +.001 < . 1:1(0) ack 201 win 257 
>+0 > P. 201:1001(800) ack 1
> 
> Signed-off-by: Eric Dumazet 
> Cc: Neal Cardwell 
> Cc: Yuchung Cheng 
> ---
>  net/core/skbuff.c |4 +++-
>  net/ipv4/tcp_output.c |   22 +++---
>  2 files changed, 14 insertions(+), 12 deletions(-)
> 
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 0b2a6e94af2de73ed638634c47a0fb71e2cbc1cb..a9cb81a10c4ba895587727aa4cf098e9a38424ea 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -2656,7 +2656,9 @@ int skb_shift(struct sk_buff *tgt, struct sk_buff *skb, int shiftlen)
>   struct skb_frag_struct *fragfrom, *fragto;
>  
>   BUG_ON(shiftlen > skb->len);
> - BUG_ON(skb_headlen(skb));   /* Would corrupt stream */
> +
> + if (skb_headlen(skb))
> + return 0;
>  
>   todo = shiftlen;
>   from = 0;
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index f57b5aa51b59cf0a58975fe34a7dcdb886ea8c50..19105b46a30436ebb85fe97ee43089e77aa028bb 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -2514,7 +2514,7 @@ void tcp_skb_collapse_tstamp(struct sk_buff *skb,
>  }
>  
>  /* Collapses two adjacent SKB's during retransmission. */
> -static void tcp_collapse_retrans(struct sock *sk, struct sk_buff *skb)
> +static bool tcp_collapse_retrans(struct sock *sk, struct sk_buff *skb)
>  {
>   struct tcp_sock *tp = tcp_sk(sk);
>   struct sk_buff *next_skb = tcp_write_queue_next(sk, skb);
> @@ -2525,14 +2525,17 @@ static void tcp_collapse_retrans(struct sock *sk, struct sk_buff *skb)
>  
>   BUG_ON(tcp_skb_pcount(skb) != 1 || tcp_skb_pcount(next_skb) != 1);
>  
> + if (next_skb_size) {
> + if (next_skb_size <= skb_availroom(skb))
> + skb_copy_bits(next_skb, 0, skb_put(skb, next_skb_size),
> +   next_skb_size);
> + else if (!skb_shift(skb, next_skb, next_skb_size))
> + return false;
> + }
>   tcp_highest_sack_combine(sk, next_skb, skb);
>  
>   tcp_unlink_write_queue(next_skb, sk);
>  
> - if (next_skb_size)
> - skb_copy_bits(next_skb, 0, skb_put(skb, next_skb_size),
> -   next_skb_size);
> -
>   if (next_skb->ip_summed == CHECKSUM_PARTIAL)
>   skb->ip_summed = CHECKSUM_PARTIAL;
>  
> @@ -2561,6 +2564,7 @@ static void tcp_collapse_retrans(struct sock *sk, struct sk_buff *skb)
>   tcp_skb_collapse_tstamp(skb, next_skb);
>  
>   sk_wmem_free_skb(sk, next_skb);
> + return true;
>  }
>  
>  /* Check if coalescing SKBs is legal. */
> @@ -2610,16 +2614,12 @@ static void tcp_retrans_try_collapse(struct sock *sk, struct sk_buff *to,
>  
>   if (space < 0)
>   break;
> - /* Punt if not enough space exists in the first SKB for
> -  * the data in the second
> -  */
> - if (skb->len > skb_availroom(to))
> - break;
>  
>   if (after(TCP_SKB_CB(skb)->end_seq, tcp_wnd_end(tp)))
>   break;
>  
> - tcp_collapse_retrans(sk, to);
> + if (!tcp_collapse_retrans(sk, to))
> + break;
>   }
>  }
>  


David, patch is marked 'Superseded' in
https://patchwork.ozlabs.org/patch/695264/

Not sure what this means exactly.
Did I miss a mail/feedback?

[PATCH net] flow_dissect: call init_default_flow_dissectors() earlier

2016-11-22 Thread Eric Dumazet
From: Eric Dumazet 

Andre Noll reported panics after my recent fix (commit 34fad54c2537
"net: __skb_flow_dissect() must cap its return value")

After some more headaches, Alexander root-caused the problem to
init_default_flow_dissectors() being called too late when a network
driver like IGB is built in (not a module) and receives a DHCP message
very early.

Fix is to call init_default_flow_dissectors() much earlier,
as it is a core infrastructure and does not depend on another
kernel service.
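
The ordering argument can be sketched with a small user-space model. This is
an illustration only: the rank values and all names below are hypothetical,
and the model assumes just that lower initcall levels run first and that calls
on the same level run in link order:

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative ranks only: a lower rank runs earlier at boot. */
enum toy_rank { TOY_CORE = 1, TOY_DEVICE = 6, TOY_LATE_SYNC = 8 };

struct toy_initcall {
	enum toy_rank rank;
	const char *name;
};

/* Returns how many initcalls run before calls[idx]: everything on a lower
 * rank, plus earlier entries on the same rank (link order).
 */
static size_t toy_run_position(const struct toy_initcall *calls, size_t n,
			       size_t idx)
{
	size_t pos = 0;

	assert(idx < n);
	for (size_t i = 0; i < n; i++) {
		if (calls[i].rank < calls[idx].rank ||
		    (calls[i].rank == calls[idx].rank && i < idx))
			pos++;
	}
	return pos;
}
```

With a built-in driver at TOY_DEVICE and the dissector setup at TOY_LATE_SYNC,
the driver runs first; moving the dissector setup to TOY_CORE puts it ahead of
the driver, which is the effect this patch is after.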

Fixes: 06635a35d13d4 ("flow_dissect: use programable dissector in skb_flow_dissect and friends")
Signed-off-by: Eric Dumazet 
Reported-by: Andre Noll 
Diagnosed-by: Alexander Duyck 
---
 net/core/flow_dissector.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index 69e4463a4b1b..c6d8207ffa7e 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -1013,4 +1013,4 @@ static int __init init_default_flow_dissectors(void)
 	return 0;
 }
 
-late_initcall_sync(init_default_flow_dissectors);
+core_initcall(init_default_flow_dissectors);




Re: [PATCH net-next] net/sched: cls_flower: verify root pointer before dereferncing it

2016-11-22 Thread Cong Wang
On Tue, Nov 22, 2016 at 8:11 AM, Jiri Pirko  wrote:
> Tue, Nov 22, 2016 at 05:04:11PM CET, dan...@iogearbox.net wrote:
>>Hmm, I don't think we want to have such an additional test in fast
>>path for each and every classifier. Can we think of ways to avoid that?
>>
>>My question is, since we unlink individual instances from such tp-internal
>>lists through RCU and release the instance through call_rcu() as well as
>>the head (tp->root) via kfree_rcu() eventually, against what are we protecting
>>setting RCU_INIT_POINTER(tp->root, NULL) in ->destroy() callback? Something
>>not respecting grace period?
>
> If you call tp->ops->destroy in call_rcu, you don't have to set tp->root
> to null.

We do need to respect the grace period if we touch the globally visible
data structure tp in tcf_destroy(). Therefore Roi's patch is not fixing the
right place.

Also, I don't know why you blame my commit; this problem should already
exist prior to it, probably dating back to John's RCU patches.

I am working on a patch.

