date:20151201

Re: [RFC PATCH net] Revert "ipv6: ndisc: inherit metadata dst when creating ndisc requests"

2015-12-01 Thread David Miller

From: Nicolas Dichtel 
Date: Fri, 27 Nov 2015 18:17:05 +0100

> This reverts commit ab450605b35caa768ca33e86db9403229bf42be4.
> 
> In IPv6, we cannot inherit the dst of the original dst. ndisc packets
> are IPv6 packets and may take another route than the original packet.
> 
> This patch breaks the following scenario: a packet comes from eth0 and
> is forwarded through vxlan1. The encapsulated packet triggers an NS
> which cannot be sent because of the wrong route.
> 
> CC: Jiri Benc 
> CC: Thomas Graf 
> Signed-off-by: Nicolas Dichtel 
> ---
> 
> I know that this is not the right fix, it's why I've put RFC ;-)
> Should the right fix only do a copy of dst metadata in the new dst?
> Feedback is welcomed.

Ok I'll apply this revert while you guys try to come up with something
better.

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] Improve Atheros ethernet driver not to do order 4 GFP_ATOMIC allocation

2015-12-01 Thread David Miller

From: Eric Dumazet 
Date: Mon, 30 Nov 2015 09:58:23 -0800

> On Sat, 2015-11-28 at 15:51 +0100, Pavel Machek wrote:
>> atl1c driver is doing order-4 allocation with GFP_ATOMIC
>> priority. That often breaks  networking after resume. Switch to
>> GFP_KERNEL. Still not ideal, but should be significantly better.
>> 
>> Signed-off-by: Pavel Machek 
>> 
>> diff --git a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c 
>> b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
>> index 2795d6d..afb71e0 100644
>> --- a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
>> +++ b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
>> @@ -1016,10 +1016,10 @@ static int atl1c_setup_ring_resources(struct 
>> atl1c_adapter *adapter)
>>  sizeof(struct atl1c_recv_ret_status) * rx_desc_count +
>>  8 * 4;
>>  
>> -ring_header->desc = pci_alloc_consistent(pdev, ring_header->size,
>> -&ring_header->dma);
>> +ring_header->desc = dma_alloc_coherent(&pdev->dev, ring_header->size,
>> +   &ring_header->dma, GFP_KERNEL);
>>  if (unlikely(!ring_header->desc)) {
>> -dev_err(&pdev->dev, "pci_alloc_consistend failed\n");
>> +dev_err(&pdev->dev, "could not get memmory for DMA buffer\n");
>>  goto err_nomem;
>>  }
>>  memset(ring_header->desc, 0, ring_header->size);
>> 
> 
> It seems there is a missed opportunity to get rid of the memset() here,
> by adding __GFP_ZERO to the dma_alloc_coherent() GFP_KERNEL mask,
> or simply using dma_zalloc_coherent()
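
A userspace analogy of the suggestion above, as a minimal sketch: calloc() stands in for a zeroing DMA allocator such as dma_zalloc_coherent() (or dma_alloc_coherent() with __GFP_ZERO), and the ring_header struct here is a hypothetical stand-in, not the driver's real layout. The only point is that a zeroing allocator makes the separate memset() redundant.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the driver's ring header; not the real layout. */
struct ring_header {
	void *desc;
	size_t size;
};

/*
 * Allocate the descriptor area already zeroed. calloc() plays the role a
 * zeroing DMA allocator would play in the kernel, so no separate
 * memset(desc, 0, size) is needed afterwards.
 * Returns 0 on success, -1 if the allocation fails.
 */
static int ring_header_alloc(struct ring_header *rh, size_t size)
{
	assert(rh && size > 0);

	rh->desc = calloc(1, size);
	if (!rh->desc)
		return -1;
	rh->size = size;
	return 0;
}

static void ring_header_free(struct ring_header *rh)
{
	free(rh->desc);
	rh->desc = NULL;
	rh->size = 0;
}
```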

Also, the Subject line needs to be adjusted.  The proper format for
the Subject line is:

[PATCH $TREE] $subsystem: $description.

Where "$TREE" is either 'net' or 'net-next', $subsystem is the lowercase
name of the driver (here 'atl1c') and then a colon, and then a space, and
then the single-line description.

Thanks.

Re: [PATCH net-next v3 3/8] netfilter: Allow calling into nat helper without skb_dst.

2015-12-01 Thread Pablo Neira Ayuso

On Wed, Nov 25, 2015 at 04:08:16PM -0800, Jarno Rajahalme wrote:
> NAT checksum recalculation code assumes existence of skb_dst, which
> becomes a problem for a later patch in the series ("openvswitch:
> Interface with NAT.").  Simplify this by removing the check on
> skb_dst, as the checksum will be dealt with later in the stack.
> 
> Suggested-by: Pravin Shelar 
> Signed-off-by: Jarno Rajahalme 
> ---
>  net/ipv4/netfilter/nf_nat_l3proto_ipv4.c | 29 -
>  net/ipv6/netfilter/nf_nat_l3proto_ipv6.c | 29 -
>  2 files changed, 16 insertions(+), 42 deletions(-)
> 
> diff --git a/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c 
> b/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c
> index 5075b7e..f8aad03 100644
> --- a/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c
> +++ b/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c
> @@ -127,28 +127,15 @@ static void nf_nat_ipv4_csum_recalc(struct sk_buff *skb,
>   u8 proto, void *data, __sum16 *check,
>   int datalen, int oldlen)
>  {
> - const struct iphdr *iph = ip_hdr(skb);
> - struct rtable *rt = skb_rtable(skb);
> -
>   if (skb->ip_summed != CHECKSUM_PARTIAL) {
> - if (!(rt->rt_flags & RTCF_LOCAL) &&
> - (!skb->dev || skb->dev->features & NETIF_F_V4_CSUM)) {
> - skb->ip_summed = CHECKSUM_PARTIAL;
> - skb->csum_start = skb_headroom(skb) +
> -   skb_network_offset(skb) +
> -   ip_hdrlen(skb);
> - skb->csum_offset = (void *)check - data;
> - *check = ~csum_tcpudp_magic(iph->saddr, iph->daddr,
> - datalen, proto, 0);
> - } else {
> - *check = 0;
> - *check = csum_tcpudp_magic(iph->saddr, iph->daddr,
> -datalen, proto,
> -csum_partial(data, datalen,
> - 0));
> - if (proto == IPPROTO_UDP && !*check)
> - *check = CSUM_MANGLED_0;
> - }
> + const struct iphdr *iph = ip_hdr(skb);
> +
> + skb->ip_summed = CHECKSUM_PARTIAL;
> + skb->csum_start = skb_headroom(skb) + skb_network_offset(skb) +
> + ip_hdrlen(skb);
> + skb->csum_offset = (void *)check - data;
> + *check = ~csum_tcpudp_magic(iph->saddr, iph->daddr, datalen,
> + proto, 0);

Is this change going to work with traffic that is redirected to the
localhost?
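
For context, the arithmetic behind the CHECKSUM_PARTIAL path in the quoted hunk can be sketched in userspace. This is an illustration under stated assumptions, not kernel code: all helper names below are hypothetical, the pseudo-header is passed as a plain byte array, and the UDP special case (CSUM_MANGLED_0) is ignored. It shows that seeding the check field with the uncomplemented pseudo-header sum and then summing the segment gives the same result as a full software checksum. It does not answer the question above, which is whether anything completes that last step for locally delivered traffic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add the big-endian 16-bit words of buf to a running 32-bit sum. */
static uint32_t sum_words(const uint8_t *buf, size_t len, uint32_t sum)
{
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (len & 1)
		sum += (uint32_t)buf[len - 1] << 8;	/* pad odd tail with zero */
	return sum;
}

/* Fold a 32-bit sum into 16 bits with end-around carry. */
static uint16_t fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Full software checksum: zero the check field, sum pseudo-header + segment. */
static uint16_t csum_full(const uint8_t *pseudo, size_t plen,
			  uint8_t *seg, size_t len, size_t check_off)
{
	assert(check_off + 2 <= len);
	seg[check_off] = 0;
	seg[check_off + 1] = 0;
	return (uint16_t)~fold(sum_words(seg, len, sum_words(pseudo, plen, 0)));
}

/* "Partial" step 1: seed the check field with the uncomplemented
 * pseudo-header sum (the role of ~csum_tcpudp_magic() in the patch). */
static void csum_seed(const uint8_t *pseudo, size_t plen,
		      uint8_t *seg, size_t len, size_t check_off)
{
	uint16_t seed = fold(sum_words(pseudo, plen, 0));

	assert(check_off + 2 <= len);
	seg[check_off] = (uint8_t)(seed >> 8);
	seg[check_off + 1] = (uint8_t)(seed & 0xff);
}

/* "Partial" step 2: what the device (or a software fallback) does later:
 * sum the segment, seeded field included, and complement the result. */
static uint16_t csum_complete(const uint8_t *seg, size_t len)
{
	return (uint16_t)~fold(sum_words(seg, len, 0));
}
```

If step 2 never runs, the packet carries only the seed in its check field, which is why the local-delivery case matters.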

Re: [RFC] Stable interface index option

2015-12-01 Thread Sowmini Varadhan

On (12/01/15 15:57), David Miller wrote:

> >> > Also current versions of SNMP provide more useful information about
> >> > network interface slot information in ifDescription
> >> 
> >> Well if they do provide strings, then that is probably a better way
> >> forward than messing with the kernel.
> > 
> > It gives strings based on PCI information but nothing useful
> > on tunnels.
> 
> But at least in theory, that could be extended to do so right?

IIRC, even on the Cisco NOSes, the SNMP ifindex for virtual interfaces
(tunnels, vpc, loopback, etc.) would not carry any slot info, but would
carry other things specific to the virtual interface type (e.g., the
FEX interface index had something pertinent to the FEX).

But the bigger reason they had an immutable snmp-ifindex was that
userspace networking applications could build state based on that
immutable index and hang on to that number, regardless of any renumbering
that happened due to HA/failover.

And, since they did not (in general) have to deal with random third
party apps, they did not have to deal with questions like "what should
POSIX/glibc APIs send - the immutable or the mutable index?" so it
was ok for them to have the complexity of two interface indices.

--Sowmini


Re: [PATCH (net-next.git)] stmmac: support Reg_9 to get HW level information

2015-12-01 Thread David Miller

From: Giuseppe Cavallaro 
Date: Mon, 30 Nov 2015 11:33:10 +0100

> For GMAC newer than 3.40a there is a new register (Reg_9) that provides the
> status of all modules of the transmit and receive paths and FIFO status.
> These can be exposed via ethtool.
> 
> Signed-off-by: Giuseppe Cavallaro 

Applied.

Re: [PATCH net-next] hv_netvsc: rework link status change handling

2015-12-01 Thread David Miller

From: Vitaly Kuznetsov 
Date: Fri, 27 Nov 2015 11:39:55 +0100

> There are several issues in hv_netvsc driver with regards to link status
> change handling:
> - RNDIS_STATUS_NETWORK_CHANGE results in calling userspace helper doing
>   '/etc/init.d/network restart' and this is inappropriate and broken for
>   many reasons.
> - link_watch infrastructure only sends one notification per second and
>   in case of e.g. paired disconnect/connect events we get only one
>   notification with last status. This makes it impossible to handle such
>   situations in userspace.
> 
> Redo link status changes handling in the following way:
> - Create a list of reconfig events in network device context.
> - On a reconfig event add it to the list of events and schedule
>   netvsc_link_change().
> - In netvsc_link_change() ensure 2-second delay between link status
>   changes.
> - Handle RNDIS_STATUS_NETWORK_CHANGE as a paired disconnect/connect event.
> 
> Signed-off-by: Vitaly Kuznetsov 

Applied, thank you.
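
The reworked handling described in the changelog can be sketched roughly as follows. This is a userspace simulation under assumptions, not the driver's code: the struct, the fixed-size FIFO, the integer clock, and every name below are hypothetical. It only illustrates queuing reconfig events, expanding a network-change event into a disconnect/connect pair, and applying at most one change per 2-second window.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define LINKCHANGE_GAP 2	/* seconds between applied link changes */
#define MAX_EVENTS 8

enum link_event { EV_DISCONNECT, EV_CONNECT, EV_NETWORK_CHANGE };

struct link_ctx {
	bool events[MAX_EVENTS];	/* pending carrier states, FIFO */
	size_t head, count;
	bool carrier;			/* state last reported */
	long last_change;		/* time of the last applied change */
};

static bool queue_state(struct link_ctx *c, bool up)
{
	assert(c->count <= MAX_EVENTS);
	if (c->count == MAX_EVENTS)
		return false;
	c->events[(c->head + c->count) % MAX_EVENTS] = up;
	c->count++;
	return true;
}

/* Queue an event; a network change becomes disconnect followed by connect. */
static bool link_event_add(struct link_ctx *c, enum link_event ev)
{
	switch (ev) {
	case EV_DISCONNECT:
		return queue_state(c, false);
	case EV_CONNECT:
		return queue_state(c, true);
	case EV_NETWORK_CHANGE:
		if (c->count + 2 > MAX_EVENTS)
			return false;
		queue_state(c, false);
		return queue_state(c, true);
	}
	return false;
}

/*
 * Worker: apply at most one queued change, and only if the gap has
 * elapsed. Returns true if a change was applied; a real worker would
 * reschedule itself instead of returning false.
 */
static bool link_change_work(struct link_ctx *c, long now)
{
	if (c->count == 0 || now - c->last_change < LINKCHANGE_GAP)
		return false;
	c->carrier = c->events[c->head];
	c->head = (c->head + 1) % MAX_EVENTS;
	c->count--;
	c->last_change = now;
	return true;
}
```

Because each queued state is applied separately, userspace sees both halves of a paired disconnect/connect rather than only the last status.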

Re: [PATCH] drivers: net: xgene: fix Tx flow control

2015-12-01 Thread David Miller

From: Iyappan Subramanian 
Date: Fri, 27 Nov 2015 19:22:31 -0800

> Currently the Tx flow control is based on reading the hardware state,
> which is not accurate since it may not reflect the descriptors that
> are not yet reached the memory.
> 
> To accurately control the Tx flow, changing it to be software based.
> 
> Signed-off-by: Iyappan Subramanian 
> Tested-by: Khuong Dinh 

Adding a new atomic operation for every completion descriptor is
very excessive.

Especially when there is probably some other lock already held in all
of these paths that you can use for synchronization.
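
One way to read this suggestion, as a minimal sketch and not the xgene driver's code: keep plain (non-atomic) free-running counters for queued and reclaimed descriptors, and update them only while a lock that the paths already take is held. The struct and function names are hypothetical, and the existence of such a lock is an assumption; the sketch is single-threaded and only shows the accounting.

```c
#include <assert.h>
#include <stdbool.h>

struct tx_ring {
	unsigned int slots;	/* ring size in descriptors */
	unsigned int head;	/* descriptors queued so far (free-running) */
	unsigned int tail;	/* descriptors reclaimed so far (free-running) */
	bool stopped;		/* queue stopped because the ring was full */
};

static unsigned int tx_in_flight(const struct tx_ring *r)
{
	return r->head - r->tail;
}

/* Transmit path. Caller is assumed to hold the ring's existing lock. */
static bool tx_queue(struct tx_ring *r, unsigned int ndesc)
{
	if (tx_in_flight(r) + ndesc > r->slots) {
		r->stopped = true;
		return false;
	}
	r->head += ndesc;
	return true;
}

/*
 * Completion path, under the same lock: account for a whole batch with
 * one update, then wake the queue once half the ring is free again.
 */
static void tx_reclaim(struct tx_ring *r, unsigned int ndesc)
{
	assert(ndesc <= tx_in_flight(r));
	r->tail += ndesc;
	if (r->stopped && tx_in_flight(r) <= r->slots / 2)
		r->stopped = false;
}
```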

Re: [PATCH net-next V2 08/17] hv_netvsc: Don't ask for additional head room in the skb

2015-12-01 Thread David Miller

From: "K. Y. Srinivasan" 
Date: Sat, 28 Nov 2015 12:20:36 -0800

> +#elseif defined(CONFIG_WLAN) || IS_ENABLED(CONFIG_AX25)

The correct CPP directive is "#elif".
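
For reference, a small self-contained example of the directive in use. The CONFIG_EXAMPLE_* macros and the HEADROOM values are hypothetical and exist only to drive the example; they are not taken from the patch.

```c
#include <assert.h>

/* Hypothetical feature macro, defined here only to select a branch. */
#define CONFIG_EXAMPLE_B 1

#if defined(CONFIG_EXAMPLE_A)
#define HEADROOM 128
#elif defined(CONFIG_EXAMPLE_B) || defined(CONFIG_EXAMPLE_C)
#define HEADROOM 64
#else
#define HEADROOM 32
#endif

/* Compile-time check that the #elif branch was the one selected. */
static_assert(HEADROOM == 64, "the #elif branch should be selected");

static int headroom(void)
{
	return HEADROOM;
}
```

"#elseif" is not a C preprocessor directive. GCC rejects it as an invalid preprocessing directive when it is reached in an active branch.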

Re: [PATCH v2 net-next 0/2] Basic support for Solarflare 8000 series NICs

2015-12-01 Thread David Miller

From: Bert Kenward 
Date: Mon, 30 Nov 2015 09:05:33 +

> The upcoming Solarflare 8000 series 10G/40G network card supports a 
> similar interface to the current 7000 series cards. This patch series 
> provides basic support for these cards, making no use of any new 
> functionality.
> 
> v2: fix indenting in ef10.c in patch 1/2.

Series applied, thanks.

Re: [PATCH net 1/2] net: rename SOCK_ASYNC_NOSPACE and SOCK_ASYNC_WAITDATA

2015-12-01 Thread David Miller

From: Eric Dumazet 
Date: Sun, 29 Nov 2015 20:03:10 -0800

> This patch is a cleanup to make following patch easier to
> review.
> 
> Goal is to move SOCK_ASYNC_NOSPACE and SOCK_ASYNC_WAITDATA
> from (struct socket)->flags to a (struct socket_wq)->flags
> to benefit from RCU protection in sock_wake_async()
> 
> To ease backports, we rename both constants.
> 
> Two new helpers, sk_set_bit(int nr, struct sock *sk)
> and sk_clear_bit(int net, struct sock *sk) are added so that
> following patch can change their implementation.
> 
> Signed-off-by: Eric Dumazet 

Applied and queued up for -stable.
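
The value of the two helpers can be sketched with mock types. This is an illustration under assumptions, not the kernel's definitions: the structs are reduced to the fields needed here, the bit operations are plain rather than atomic, and sk_test_bit() is a hypothetical extra added only so the example can check its own result. Callers go through the helpers, so where the flags word lives can change without touching them.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Mock flag numbers and types; only the fields used below are modelled. */
enum { SOCKWQ_ASYNC_NOSPACE, SOCKWQ_ASYNC_WAITDATA };

struct socket_wq { unsigned long flags; };
struct socket { struct socket_wq *wq; };
struct sock { struct socket *sk_socket; };

/* Single place that knows where the flags word is stored. */
static unsigned long *sk_flags(struct sock *sk)
{
	return &sk->sk_socket->wq->flags;
}

static void sk_set_bit(int nr, struct sock *sk)
{
	assert(nr >= 0 && nr < (int)(sizeof(unsigned long) * CHAR_BIT));
	*sk_flags(sk) |= 1UL << nr;
}

static void sk_clear_bit(int nr, struct sock *sk)
{
	assert(nr >= 0 && nr < (int)(sizeof(unsigned long) * CHAR_BIT));
	*sk_flags(sk) &= ~(1UL << nr);
}

/* Hypothetical helper, not named in the changelog. */
static bool sk_test_bit(int nr, struct sock *sk)
{
	return (*sk_flags(sk) >> nr) & 1UL;
}
```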

Re: [PATCH net 2/2] net: fix sock_wake_async() rcu protection

2015-12-01 Thread David Miller

From: Eric Dumazet 
Date: Sun, 29 Nov 2015 20:03:11 -0800

> Dmitry provided a syzkaller (http://github.com/google/syzkaller)
> triggering a fault in sock_wake_async() when async IO is requested.
> 
> Said program stressed af_unix sockets, but the issue is generic
> and should be addressed in core networking stack.
> 
> The problem is that by the time sock_wake_async() is called,
> we should not access the @flags field of 'struct socket',
> as the inode containing this socket might be freed without
> further notice, and without RCU grace period.
> 
> We already maintain an RCU protected structure, "struct socket_wq"
> so moving SOCKWQ_ASYNC_NOSPACE & SOCKWQ_ASYNC_WAITDATA into it
> is the safe route.
> 
> It also reduces number of cache lines needing dirtying, so might
> provide a performance improvement anyway.
> 
> In followup patches, we might move remaining flags (SOCK_NOSPACE,
> SOCK_PASSCRED, SOCK_PASSSEC) to save 8 bytes and let 'struct socket'
> being mostly read and let it being shared between cpus.
> 
> Reported-by: Dmitry Vyukov 
> Signed-off-by: Eric Dumazet 

Applied and queued up for -stable.

Re: "hw csum failure" error on skge driver with 4.3 kernel upon receiving ICMPv6 multicast listener discovery packets

2015-12-01 Thread Stefan Ring

On Sun, Nov 15, 2015 at 7:44 PM, David Madore  wrote:
> The skge driver in the 4.3 kernel reports hardware checksum errors
> upon receiving (certain?) IPv6 multicast packets containing ICMPv6
> multicast listener discovery messages.  This is a regression since 4.1
> (I believe between 4.1 and 4.2).  The e1000e driver on a different
> Ethernet port of the same machine is not affected.  Disabling offload
> rx checksumming suppresses the errors.  Nor are all IPv6 multicast
> packets affected: for some reason, it seems only those containing
> ICMPv6 multicast listener discovery messages trigger the problem.
>
> In case it also matters, the skge interface in question (eth1 in what
> follows) is part of a bridge that contains another Ethernet interface
> and a Wifi card.
>
> Here is a frame, with its link-level headers, that caused an error
> when received by skge:
>
>    33 33 ff 62 30 d8 60 fb 42 f1 b1 36 86 dd 60 00  33.b0.`.B..6..`.
> 0010   00 00 00 20 00 01 fe 80 00 00 00 00 00 00 62 fb  ... ..b.
> 0020   42 ff fe f1 b1 36 ff 02 00 00 00 00 00 00 00 00  B6..
> 0030   00 01 ff 62 30 d8 3a 00 01 00 05 02 00 00 83 00  ...b0.:.
> 0040   c9 8a 00 00 00 00 ff 02 00 00 00 00 00 00 00 00  
> 0050   00 01 ff 62 30 d8...b0.
>
> (Network dumps performed on another network device suggest that the
> checksum is, indeed, correct.)
>
> And here is the syslog produced upon receiving the above packet:
>
> Nov 15 17:52:13 pleiades kernel: [  661.393163] eth1: hw csum failure
> Nov 15 17:52:13 pleiades kernel: [  661.394203] CPU: 0 PID: 0 Comm: swapper/0 
> Tainted: GW   4.3.0-pleiades #1
> Nov 15 17:52:13 pleiades kernel: [  661.395192] Hardware name: System 
> manufacturer System Product Name/P5WD2-Premium, BIOS 0709 03/31/2006
> Nov 15 17:52:13 pleiades kernel: [  661.395192]  88013a9d5d00 
> 88013fc03aa8 8129a186 88013afe
> Nov 15 17:52:13 pleiades kernel: [  661.395192]  88013fc03ac0 
> 81436425  88013fc03af0
> Nov 15 17:52:13 pleiades kernel: [  661.395192]  8142b87a 
> 1027316b3fc03b30 88013a9d5d00 0030
> Nov 15 17:52:13 pleiades kernel: [  661.395192] Call Trace:
> Nov 15 17:52:13 pleiades kernel: [  661.395192][] 
> dump_stack+0x44/0x5e
> Nov 15 17:52:13 pleiades kernel: [  661.395192]  [] 
> netdev_rx_csum_fault+0x35/0x40
> Nov 15 17:52:13 pleiades kernel: [  661.395192]  [] 
> __skb_checksum_complete+0xca/0xd0
> Nov 15 17:52:13 pleiades kernel: [  661.395192]  [] 
> ipv6_mc_validate_checksum+0xab/0x140
> Nov 15 17:52:13 pleiades kernel: [  661.395192]  [] 
> skb_checksum_trimmed+0x8f/0x180
> Nov 15 17:52:13 pleiades kernel: [  661.395192]  [] 
> ipv6_mc_check_mld+0x105/0x330
> Nov 15 17:52:13 pleiades kernel: [  661.395192]  [] 
> br_multicast_rcv+0x8c/0xce0 [bridge]
> Nov 15 17:52:13 pleiades kernel: [  661.395192]  [] ? 
> __netif_receive_skb+0x13/0x60
> Nov 15 17:52:13 pleiades kernel: [  661.395192]  [] ? 
> netif_receive_skb_internal+0x2e/0x90
> Nov 15 17:52:13 pleiades kernel: [  661.395192]  [] 
> br_handle_frame_finish+0x28c/0x5b0 [bridge]
> Nov 15 17:52:13 pleiades kernel: [  661.395192]  [] ? 
> usb_hcd_submit_urb+0xa4/0x960 [usbcore]
> Nov 15 17:52:13 pleiades kernel: [  661.395192]  [] 
> br_handle_frame+0x151/0x270 [bridge]
> Nov 15 17:52:13 pleiades kernel: [  661.395192]  [] ? 
> usb_submit_urb+0x2d2/0x510 [usbcore]
> Nov 15 17:52:13 pleiades kernel: [  661.395192]  [] 
> __netif_receive_skb_core+0x1c2/0x990
> Nov 15 17:52:13 pleiades kernel: [  661.395192]  [] ? 
> __usb_hcd_giveback_urb+0x82/0xe0 [usbcore]
> Nov 15 17:52:13 pleiades kernel: [  661.395192]  [] 
> __netif_receive_skb+0x13/0x60
> Nov 15 17:52:13 pleiades kernel: [  661.395192]  [] 
> netif_receive_skb_internal+0x2e/0x90
> Nov 15 17:52:13 pleiades kernel: [  661.395192]  [] 
> napi_gro_receive+0xa0/0xd0
> Nov 15 17:52:13 pleiades kernel: [  661.395192]  [] 
> skge_poll+0x380/0x7a0 [skge]
> Nov 15 17:52:13 pleiades kernel: [  661.395192]  [] ? 
> lapic_next_event+0x18/0x20
> Nov 15 17:52:13 pleiades kernel: [  661.395192]  [] 
> net_rx_action+0x13c/0x300
> Nov 15 17:52:13 pleiades kernel: [  661.395192]  [] 
> __do_softirq+0xc7/0x240
> Nov 15 17:52:13 pleiades kernel: [  661.395192]  [] 
> irq_exit+0x70/0x90
> Nov 15 17:52:13 pleiades kernel: [  661.395192]  [] 
> do_IRQ+0x51/0xd0
> Nov 15 17:52:13 pleiades kernel: [  661.395192]  [] 
> common_interrupt+0x7c/0x7c
> Nov 15 17:52:13 pleiades kernel: [  661.395192][] 
> ? mwait_idle+0x87/0x140
> Nov 15 17:52:13 pleiades kernel: [  661.395192]  [] 
> arch_cpu_idle+0xa/0x10
> Nov 15 17:52:13 pleiades kernel: [  661.395192]  [] 
> default_idle_call+0x25/0x30
> Nov 15 17:52:13 pleiades kernel: [  661.395192]  [] 
> cpu_startup_entry+0x29c/0x310
> Nov 15 17:52:13 pleiades kernel: [  661.395192]  [] 
> rest_init+0x72/0x80
> Nov 15 17:52:13 pleiades kernel: [  661.395192]  [] 
>

Re: [PATCH] isdn: remove spellcaster driver

2015-12-01 Thread David Miller

From: Arnd Bergmann 
Date: Mon, 30 Nov 2015 11:34:09 +0100

> The 'sc' ISDN driver relies on using readl() to access ISA I/O memory.
> This has been deprecated and produced warnings since linux-2.3.23,
> disabled by default since 2.4.10 and finally removed in 2.6.5.
> 
> I found this because the compiling the driver for ARM produces
> a warning:
> 
> In file included from ../drivers/isdn/sc/includes.h:8:0,
>  from ../drivers/isdn/sc/init.c:13:
> ../arch/arm/include/asm/io.h:115:21: note: expected 'const volatile void *' 
> but argument is of type 'long unsigned int'
> 
> It is pretty clear that this driver has not been used for a long time
> and there is no point fixing it now, so let's remove it.
> 
> Signed-off-by: Arnd Bergmann 
> ---
> There has been some discussion about removing all the ISA based ISDN
> drivers in the past. I'm not trying to restart that discussion at the
> moment, and I did not find the same bug in the other drivers, so
> let's just remove this one for now.

Applied to net-next, we can resurrect+fix this driver if someone actually
uses it.

Thanks Arnd.

Re: [RFC] Stable interface index option

2015-12-01 Thread Hannes Frederic Sowa


On Tue, Dec 1, 2015, at 22:44, Stephen Hemminger wrote:
> On Tue, 01 Dec 2015 22:14:59 +0100
> Hannes Frederic Sowa  wrote:
> > I had several snmp installations with net-snmp and munin, cacti and so
> > on and all had the interface name in ifDescription already some years
> > back.
> 
> In net-snmp 5.7 or later ifDescr is set to result of pci_lookup_name
> (by default).

Seems the data should simply be in ifName nowadays (unconfirmed):


Bye,
Hannes

Re: [PATCH] net: add support for netdev notifier error injection

2015-12-01 Thread David Miller

From: Nikolay Aleksandrov 
Date: Sat, 28 Nov 2015 13:45:28 +0100

> From: Nikolay Aleksandrov 
> 
> This module allows to insert errors in some of netdevice's notifier
> events. All network drivers use these notifiers to signal various events
> and to check if they are allowed, e.g. PRECHANGEMTU and CHANGEMTU
> afterwards. Until recently I had to run failure tests by injecting
> a custom module, but now this infrastructure makes it trivial to test
> these failure paths. Some of the recent bugs I fixed were found using
> this module.
> Here's an example:
>  $ cd /sys/kernel/debug/notifier-error-inject/netdev
>  $ echo -22 > actions/NETDEV_CHANGEMTU/error
>  $ ip link set eth0 mtu 1024
>  RTNETLINK answers: Invalid argument
> 
> CC: Akinobu Mita 
> CC: "David S. Miller" 
> CC: netdev 
> Signed-off-by: Nikolay Aleksandrov 

This looks fine, applied to net-next, thanks!
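
What the quoted shell example exercises can be modelled in a few lines of userspace C. This is a toy model under assumptions, not the module's implementation: the event enum, the fake_dev struct, and the function names are all hypothetical. A per-event error value is configured, the "notifier" returns it, and the MTU change is rolled back when the post-change event fails.

```c
#include <assert.h>

enum netdev_event { EV_PRECHANGEMTU, EV_CHANGEMTU, EV_COUNT };

struct fake_dev { int mtu; };

/* Per-event error to inject; 0 means "let the event through". */
static int injected_error[EV_COUNT];

/* Stand-in for the notifier call: returns whatever error was injected. */
static int notify(enum netdev_event ev)
{
	assert(ev < EV_COUNT);
	return injected_error[ev];
}

/* Change the MTU, honouring failures reported before or after the change. */
static int dev_set_mtu_sketch(struct fake_dev *dev, int new_mtu)
{
	int old = dev->mtu;
	int err;

	err = notify(EV_PRECHANGEMTU);
	if (err)
		return err;

	dev->mtu = new_mtu;
	err = notify(EV_CHANGEMTU);
	if (err)
		dev->mtu = old;	/* roll back, the failure path under test */
	return err;
}
```

Injecting -22 for EV_CHANGEMTU corresponds to the "echo -22" line in the quoted example, where the caller then sees "Invalid argument".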

RE: [PATCH net-next V2 08/17] hv_netvsc: Don't ask for additional head room in the skb

2015-12-01 Thread KY Srinivasan



> -Original Message-
> From: David Miller [mailto:da...@davemloft.net]
> Sent: Tuesday, December 1, 2015 12:42 PM
> To: KY Srinivasan 
> Cc: netdev@vger.kernel.org; linux-ker...@vger.kernel.org;
> de...@linuxdriverproject.org; o...@aepfle.de; a...@canonical.com;
> jasow...@redhat.com
> Subject: Re: [PATCH net-next V2 08/17] hv_netvsc: Don't ask for additional
> head room in the skb
> 
> From: "K. Y. Srinivasan" 
> Date: Sat, 28 Nov 2015 12:20:36 -0800
> 
> > +#elseif defined(CONFIG_WLAN) || IS_ENABLED(CONFIG_AX25)
> 
> The correct CPP directive is "#elif".

Thanks David, I will fix the typo and resubmit.

K. Y

Re: [net-next v2 04/15] i40e: remove BUG_ON from feature string building

2015-12-01 Thread Joe Perches

On Wed, 2015-11-25 at 11:36 -0800, Joe Perches wrote:
> On Wed, 2015-11-25 at 10:35 -0800, Jeff Kirsher wrote:
> > On Wed, 2015-11-25 at 21:26 +0300, Sergei Shtylyov wrote:
> > > On 11/25/2015 09:21 PM, Jeff Kirsher wrote:
> > > 
> > > > From: Shannon Nelson 
> > > > 
> > > > There's really no reason to kill the kernel thread just because
> > > > of a
> > > > little info string. This reworks the code to use snprintf's
> > > > limiting to
> > > > assure that the string is never too long, and WARN_ON to still
> > > > put out
> > > > a warning that we might want to look at the feature list
> > > > length.
> > > > 
> > > > diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c
> > > > b/drivers/net/ethernet/intel/i40e/i40e_main.c
> []
> > > >if (pf->flags & I40E_FLAG_VEB_MODE_ENABLED)
> > > > - buf += sprintf(buf, "VEB ");
> > > > > + i += snprintf(&buf[i], REMAIN(i), "VEPA ");
> > > 
> > > Not "VEB "?
> > 
> > Nice catch Sergei, I will wait a till this afternoon to respin the
> > patch series, just in case there are other changes needed that our
> > validation did not catch. :-)
> 
> trivia:
> 
> If you redo these, it'd be nicer not to use " " after each
> fixed string, but use " " before each fixed string.
> 
> The final output string would be 1 byte shorter overall and
> not have an excess " " before the newline.
> 
> The declaration of i doesn't need initialization to 0:
> 
>   i = snprintf(buf, INFO_STRING_LEN, "Features: PF-id[%d]", ...
> 
> would work.

Maybe something like this patch (net-next)

Fix I40E_FLAG_VEB_MODE_ENABLED output of VEB

Miscellanea:
o Remove unnecessary string variable
o Add space before not after fixed strings
o Use kmalloc not kzalloc
o Don't initialize i to 0, use result of first snprintf
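
The string-building pattern from the list above, as a standalone userspace sketch. The FEAT_* flags and function names are hypothetical and the feature set is cut down to three entries. One deliberate difference from the patch: userspace snprintf() returns the length it would have written, so the offset is clamped here to stay inside the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical feature flags for the example. */
#define FEAT_RSS (1u << 0)
#define FEAT_DCB (1u << 1)
#define FEAT_VEB (1u << 2)

/* Append s at offset off (off < size); return the new, clamped offset. */
static size_t append(char *buf, size_t size, size_t off, const char *s)
{
	int n = snprintf(buf + off, size - off, "%s", s);

	if (n < 0)
		return off;
	return (size_t)n < size - off ? off + (size_t)n : size - 1;
}

/*
 * Build the feature string. The first snprintf() initializes the offset,
 * and each fixed string carries its separator in front, so the result has
 * no trailing space. Returns the length of the string in buf.
 */
static size_t format_features(char *buf, size_t size, int pf_id,
			      unsigned int flags)
{
	size_t i;
	int n;

	assert(buf && size > 0);
	n = snprintf(buf, size, "Features: PF-id[%d]", pf_id);
	if (n < 0) {
		buf[0] = '\0';
		i = 0;
	} else {
		i = (size_t)n < size ? (size_t)n : size - 1;
	}

	if (flags & FEAT_RSS)
		i = append(buf, size, i, " RSS");
	if (flags & FEAT_DCB)
		i = append(buf, size, i, " DCB");
	i = append(buf, size, i, (flags & FEAT_VEB) ? " VEB" : " VEPA");
	return i;
}
```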

---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 42 +
 1 file changed, 19 insertions(+), 23 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c 
b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 4b7d874..145eeb5 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -10240,52 +10240,48 @@ static int i40e_setup_pf_filter_control(struct 
i40e_pf *pf)
 static void i40e_print_features(struct i40e_pf *pf)
 {
struct i40e_hw *hw = &pf->hw;
-   char *buf, *string;
-   int i = 0;
+   char *buf;
+   int i;
 
-   string = kzalloc(INFO_STRING_LEN, GFP_KERNEL);
-   if (!string) {
-   dev_err(&pf->pdev->dev, "Features string allocation failed\n");
+   buf = kmalloc(INFO_STRING_LEN, GFP_KERNEL);
+   if (!buf)
return;
-   }
-
-   buf = string;
 
-   i += snprintf(&buf[i], REMAIN(i), "Features: PF-id[%d] ", hw->pf_id);
+   i = snprintf(buf, INFO_STRING_LEN, "Features: PF-id[%d]", hw->pf_id);
 #ifdef CONFIG_PCI_IOV
-   i += snprintf(&buf[i], REMAIN(i), "VFs: %d ", pf->num_req_vfs);
+   i += snprintf(&buf[i], REMAIN(i), " VFs: %d", pf->num_req_vfs);
 #endif
-   i += snprintf(&buf[i], REMAIN(i), "VSIs: %d QP: %d RX: %s ",
+   i += snprintf(&buf[i], REMAIN(i), " VSIs: %d QP: %d RX: %s",
  pf->hw.func_caps.num_vsis,
  pf->vsi[pf->lan_vsi]->num_queue_pairs,
  pf->flags & I40E_FLAG_RX_PS_ENABLED ? "PS" : "1BUF");
 
if (pf->flags & I40E_FLAG_RSS_ENABLED)
-   i += snprintf(&buf[i], REMAIN(i), "RSS ");
+   i += snprintf(&buf[i], REMAIN(i), " RSS");
if (pf->flags & I40E_FLAG_FD_ATR_ENABLED)
-   i += snprintf(&buf[i], REMAIN(i), "FD_ATR ");
+   i += snprintf(&buf[i], REMAIN(i), " FD_ATR");
if (pf->flags & I40E_FLAG_FD_SB_ENABLED) {
-   i += snprintf(&buf[i], REMAIN(i), "FD_SB ");
-   i += snprintf(&buf[i], REMAIN(i), "NTUPLE ");
+   i += snprintf(&buf[i], REMAIN(i), " FD_SB");
+   i += snprintf(&buf[i], REMAIN(i), " NTUPLE");
}
if (pf->flags & I40E_FLAG_DCB_CAPABLE)
-   i += snprintf(&buf[i], REMAIN(i), "DCB ");
+   i += snprintf(&buf[i], REMAIN(i), " DCB");
 #if IS_ENABLED(CONFIG_VXLAN)
-   i += snprintf(&buf[i], REMAIN(i), "VxLAN ");
+   i += snprintf(&buf[i], REMAIN(i), " VxLAN");
 #endif
if (pf->flags & I40E_FLAG_PTP)
-   i += snprintf(&buf[i], REMAIN(i), "PTP ");
+   i += snprintf(&buf[i], REMAIN(i), " PTP");
 #ifdef I40E_FCOE
if (pf->flags & I40E_FLAG_FCOE_ENABLED)
-   i += snprintf(&buf[i], REMAIN(i), "FCOE ");
+   i += snprintf(&buf[i], REMAIN(i), " FCOE");
 #endif
if (pf->flags & I40E_FLAG_VEB_MODE_ENABLED)
-   i += snprintf(&buf[i], REMAIN(i), "VEPA ");
+   i += snprintf(&buf[i], REMAIN(i), " VEB");
else
-   buf += sprintf(buf, "VEPA ");
+   i += snprintf(&buf[i], REMAIN(i), " VEPA");
 
-   dev_info(&pf->pdev->dev, "%s\n", string);
-   kfree(string);
+   dev_info(&pf->pdev->dev, "%s\n", buf);
+

Re: [PATCH net-next 0/6] qede/qed: Implement various ethtool operations

2015-12-01 Thread David Miller

From: Yuval Mintz 
Date: Mon, 30 Nov 2015 12:25:00 +0200

> This series adds several new ethtool operations to qede:
>   - {get, set}_channels
>   - {get, set}_ringparam
>   - set_phys_id
>   - nway_reset
>   - {get, set}_pauseparam
> As well as extending the qed APIs to support these commands.
> 
> Dave, please consider applying this series to `net-next'.

Series applied, thank you.

Re: [RFC] Stable interface index option

2015-12-01 Thread Stephen Hemminger

On Tue, 01 Dec 2015 22:14:59 +0100
Hannes Frederic Sowa  wrote:

> On Tue, Dec 1, 2015, at 21:57, David Miller wrote:
> > From: Stephen Hemminger 
> > Date: Tue, 1 Dec 2015 12:20:38 -0800
> > 
> > > On Tue, 01 Dec 2015 14:28:47 -0500 (EST)
> > > David Miller  wrote:
> > > 
> > >> From: Stephen Hemminger 
> > >> Date: Tue, 1 Dec 2015 08:06:52 -0800
> > >> 
> > >> > On Tue, 01 Dec 2015 17:02:23 +0100
> > >> > Hannes Frederic Sowa  wrote:
> > >> > 
> > >> >> On Tue, Dec 1, 2015, at 16:50, Maximilian Wilhelm wrote:
> > >> >> > > I'm not sure I understand how this would work- are we going to 
> > >> >> > > pin down the ifindex for some subset of interfaces?
> > >> >> > 
> > >> >> > I'm not sure what your idea is, but I guess we might mean the same
> > >> >> > thing:
> > >> >> > 
> > >> >> > What I have in mind is that the user can supply a list of (ifname ->
> > >> >> > ifindex) entries via a sysfs/procfs interface and if such a list is
> > >> >> > present, the kernel will search the list for every ifname which is
> > >> >> > registered and check if there is an entry. If there is, the ifindex
> > >> >> > for this entry is used. If there is no entry found for the given
> > >> >> > ifname, the usual algorithm is used (therefore inherently providing
> > >> >> > backward compatibility).
> > >> >> 
> > >> >> Sorry to ask because I don't like this feature at all. There was a lot
> > >> >> of work on stable interface names. Why do you need stable ifindexes,
> > >> >> which were never meant to be stable for a longer amount of time?
> > >> > 
> > >> > Also current versions of SNMP provide more useful information about
> > >> > network interface slot information in ifDescription
> > >> 
> > >> Well if they do provide strings, then that is probably a better way
> > >> forward than messing with the kernel.
> > > 
> > > It gives strings based on PCI information but nothing useful
> > > on tunnels.
> > 
> > But at least in theory, that could be extended to do so right?
> 
> I had several snmp installations with net-snmp and munin, cacti and so
> on and all had the interface name in ifDescription already some years
> back.

In net-snmp 5.7 or later, ifDescr is set to the result of pci_lookup_name
(by default).
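
For readers following the thread, the (ifname -> ifindex) list proposed in the quoted text could be sketched as below. This is a userspace illustration with hypothetical names and a simple counter standing in for the "usual algorithm"; it is not a proposed kernel patch. It also shows one wrinkle the proposal would have to handle: dynamically assigned indexes must skip values that are pinned for other names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* One user-supplied (ifname -> ifindex) entry. */
struct ifindex_pin {
	const char *ifname;
	int ifindex;
};

static bool index_is_pinned(const struct ifindex_pin *pins, size_t npins,
			    int ifindex)
{
	size_t i;

	for (i = 0; i < npins; i++)
		if (pins[i].ifindex == ifindex)
			return true;
	return false;
}

/*
 * Return the pinned index for ifname if the list has an entry; otherwise
 * fall back to the next free index, skipping pinned values.
 */
static int pick_ifindex(const struct ifindex_pin *pins, size_t npins,
			const char *ifname, int *next_free)
{
	size_t i;

	assert(ifname && next_free);
	for (i = 0; i < npins; i++)
		if (strcmp(pins[i].ifname, ifname) == 0)
			return pins[i].ifindex;

	while (index_is_pinned(pins, npins, *next_free))
		(*next_free)++;
	return (*next_free)++;
}
```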

Re: [RFC 1/4] net: support per queue tx_usecs in sysfs

2015-12-01 Thread Florian Fainelli

On 01/12/15 00:01, kan.li...@intel.com wrote:
> From: Kan Liang 
> 
> Network devices usually have many queues. Each queue has its own
> tx_usecs options. Currently, we can only set all the queues with same
> value by ethtool. This patch expose the tx_usecs in sysfs. So the user
> can set/get per queue coalesce parameter tx_usecs by sysfs.

The new interface you propose makes things inconsistent, since we have
two separate configuration paths (sysfs and ethtool), and it would seem
better to have per-queue awareness in ethtool, since there is a whole
bunch of other parameters that could be configured on a per-queue basis.

Have you tried to extend existing ethtool interfaces to cover the need
for multiple queues?

> 
> Signed-off-by: Kan Liang 
> ---
>  Documentation/networking/scaling.txt | 12 
>  include/linux/netdevice.h|  8 
>  net/core/net-sysfs.c | 38 
> 
>  3 files changed, 58 insertions(+)
> 
> diff --git a/Documentation/networking/scaling.txt 
> b/Documentation/networking/scaling.txt
> index 59f4db2..636192d 100644
> --- a/Documentation/networking/scaling.txt
> +++ b/Documentation/networking/scaling.txt
> @@ -431,6 +431,18 @@ a max-rate attribute is supported, by setting a Mbps 
> value to
>  
>  A value of zero means disabled, and this is the default.
>  
> +Per Queue interrupt moderation:
> +=
> +
> +The interrupt moderation mechanism, which implemented by HW, employs
> +a series of timers to limit the number of interrupts it generates.
> +TX queue absolute delay timer can be set to a microseconds value with
> +
> +/sys/class/net//queues/tx-/tx_usecs
> +
> +For the device which doesn't support per queue interrupt moderation,
> +it shows "N/A".
> +
>  Further Information
>  ===
>  RPS and RFS were introduced in kernel 2.6.35. XPS was incorporated into
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 7d2d1d7..9db5c57 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -1059,6 +1059,10 @@ typedef u16 (*select_queue_fallback_t)(struct 
> net_device *dev,
>   *   This function is used to get egress tunnel information for given skb.
>   *   This is useful for retrieving outer tunnel header parameters while
>   *   sampling packet.
> + * void (*ndo_set_per_queue_tx_usecs)(struct net_device *dev,
> + * int index, u32 val);
> + * void (*ndo_get_per_queue_tx_usecs)(struct net_device *dev, int index);
> + *   This function is used to set/get per queue coalesce parameter tx_usecs.
>   *
>   */
>  struct net_device_ops {
> @@ -1236,6 +1240,10 @@ struct net_device_ops {
>bool proto_down);
>   int (*ndo_fill_metadata_dst)(struct net_device *dev,
>  struct sk_buff *skb);
> + void(*ndo_set_per_queue_tx_usecs)(struct net_device 
> *dev,
> +   int index, u32 
> val);
> + u32 (*ndo_get_per_queue_tx_usecs)(struct net_device 
> *dev,
> +   int index);
>  };
>  
>  /**
> diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
> index f88a62a..48016b8 100644
> --- a/net/core/net-sysfs.c
> +++ b/net/core/net-sysfs.c
> @@ -1239,12 +1239,50 @@ static struct netdev_queue_attribute 
> xps_cpus_attribute =
>  __ATTR(xps_cpus, S_IRUGO | S_IWUSR, show_xps_map, store_xps_map);
>  #endif /* CONFIG_XPS */
>  
> +static ssize_t tx_usecs_show(struct netdev_queue *queue,
> +  struct netdev_queue_attribute *attr,
> +  char *buf)
> +{
> + struct net_device *dev = queue->dev;
> + int index = queue - dev->_tx;
> + u32 val;
> +
> + if (dev->netdev_ops->ndo_get_per_queue_tx_usecs) {
> + val = dev->netdev_ops->ndo_get_per_queue_tx_usecs(dev, index);
> + return sprintf(buf, "%u\n", val);
> + }
> +
> + return sprintf(buf, "N/A\n");
> +}
> +
> +static ssize_t tx_usecs_store(struct netdev_queue *queue,
> +   struct netdev_queue_attribute *attr,
> +   const char *buf, size_t len)
> +{
> + struct net_device *dev = queue->dev;
> + int index = queue - dev->_tx;
> + u32 val, ret;
> +
> + ret = kstrtouint(buf, 0, &val);
> + if (ret < 0)
> + return -EINVAL;
> +
> + if (dev->netdev_ops->ndo_set_per_queue_tx_usecs)
> + dev->netdev_ops->ndo_set_per_queue_tx_usecs(dev, index, val);
> +
> + return len;
> +}
> +
> +static struct netdev_queue_attribute tx_usecs_attribute =
> +__ATTR(tx_usecs, S_IRUGO | S_IWUSR, tx_usecs_show, tx_usecs_store);
> +
>  static struct attribute *netdev_queue_default_attrs[] = {
>

Re: size overflow in function qdisc_tree_decrease_qlen net/sched/sch_api.c

2015-12-01 Thread Eric Dumazet

On Tue, 2015-12-01 at 12:06 -0800, Eric Dumazet wrote:
> On Tue, 2015-12-01 at 11:17 -0800, Cong Wang wrote:
> > On Tue, Dec 1, 2015 at 11:09 AM, Eric Dumazet  
> > wrote:
> > > On Tue, 2015-12-01 at 10:43 -0800, Cong Wang wrote:
> > >
> > >> This smells hacky... Another way to fix this is to hold the qdisc tree
> > >> lock in mq_dump(), since it is not a hot path (comparing with
> > >> enqueue/dequeue)?
> > >
> > > Really ? Which qdisc tree lock will protect you exactly ???
> > >
> > > Whole point of MQ is that each TX queue has its own lock.
> > >
> > > So multiple cpus can call qdisc_tree_decrease_qlen() at the same time,
> > > holding their own lock.
> > >
> > > Clearly modifying mq 'data' is wrong.
> > 
> > Ah, yeah, but mq _seems_ also the only one who modifies sch->q.qlen
> > in ->dump(), which is the root cause of this bug. I am wondering if it 
> > should
> > just compute the qlen and return it without modifying sch->q.qlen.
> 
> Sure, but then we still would get PAX underflows warnings ...
> 
> Also need to take care of sch->qstats.drops += count;
> 
> Also that would require a change of ->dump() api, since tc_fill_qdisc()
> does :
> 
> if (q->ops->dump && q->ops->dump(q, skb) < 0)
> goto nla_put_failure;
> qlen = q->q.qlen;
> 
> Not sure it is worth the pain, changing signature of all ->dump()
> handlers...
> 
> 
> What about adding TCQ_F_NOPARENT and then :
> 
> Note : Seems to be more invasive patch for net tree (need to properly
> set TCQ_F_NOPARENT)


Hmm... it looks like we have a much more serious bug :

qdisc_lookup() calls qdisc_match_from_root(dev->qdisc, handle) without
proper lock being held, so we might actually crash the host,
if qdisc_tree_decrease_qlen() happens at the time qdiscs are changed. 

qdisc_tree_decrease_qlen() needs serious care :(

Damned.



Re: xen-netfront crash when detaching network while some network activity

2015-12-01 Thread Marek Marczykowski-Górecki

On Tue, Dec 01, 2015 at 05:00:42PM -0500, Konrad Rzeszutek Wilk wrote:
> On Tue, Nov 17, 2015 at 03:45:15AM +0100, Marek Marczykowski-Górecki wrote:
> > On Wed, Oct 21, 2015 at 08:57:34PM +0200, Marek Marczykowski-Górecki wrote:
> > > On Wed, May 27, 2015 at 12:03:12AM +0200, Marek Marczykowski-Górecki 
> > > wrote:
> > > > On Tue, May 26, 2015 at 11:56:00AM +0100, David Vrabel wrote:
> > > > > On 22/05/15 12:49, Marek Marczykowski-Górecki wrote:
> > > > > > Hi all,
> > > > > > 
> > > > > > I'm experiencing xen-netfront crash when doing xl network-detach 
> > > > > > while
> > > > > > some network activity is going on at the same time. It happens only 
> > > > > > when
> > > > > > domU has more than one vcpu. Not sure if this matters, but the 
> > > > > > backend
> > > > > > is in another domU (not dom0). I'm using Xen 4.2.2. It happens on 
> > > > > > kernel
> > > > > > 3.9.4 and 4.1-rc1 as well.
> > > > > > 
> > > > > > Steps to reproduce:
> > > > > > 1. Start the domU with some network interface
> > > > > > 2. Call there 'ping -f some-IP'
> > > > > > 3. Call 'xl network-detach NAME 0'
> 
> Do you see this all the time or just on occasions?

Using above procedure - all the time.

> I tried to reproduce it and couldn't see it. Is your VM an PV or HVM?

PV, started by libvirt. That may have something to do with it: the problem
didn't exist on older Xen (4.1) started by xl. I'm not sure about the kernel
version there, but I think I tried 3.18 there too, which has this
problem.

But I don't see anything special in the domU config file (neither backend
nor frontend) - it may be some libvirt default, if that's really the
cause. How can I get any useful information about that?


-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?



Re: [RFC] Stable interface index option

2015-12-01 Thread Stephen Hemminger

On Tue, 01 Dec 2015 14:28:47 -0500 (EST)
David Miller  wrote:

> From: Stephen Hemminger 
> Date: Tue, 1 Dec 2015 08:06:52 -0800
> 
> > On Tue, 01 Dec 2015 17:02:23 +0100
> > Hannes Frederic Sowa  wrote:
> > 
> >> On Tue, Dec 1, 2015, at 16:50, Maximilian Wilhelm wrote:
> >> > > I'm not sure I understand how this would work- are we going to 
> >> > > pin down the ifindex for some subset of interfaces?
> >> > 
> >> > I'm not sure what your idea is, but I guess we might mean the same
> >> > thing:
> >> > 
> >> > What I have in mind is that the user can supply a list of (ifname ->
> >> > ifindex) entries via a sysfs/procfs interface and if such a list is
> >> > present, the kernel will search the list for every ifname which is
> >> > registered and check if there is an entry. If there is, the ifindex
> >> > for this entry is used. If there is no entry found for the given
> >> > ifname, the usual algorithm is used (therefore inherently providing
> >> > backward compatibility).
> >> 
> >> Sorry to ask because I don't like this feature at all. There was a lot
> >> of work on stable interface names. Why do you need stable ifindexes,
> >> which were never meant to be stable for a longer amount of time?
> > 
> > Also current versions of SNMP provide more useful information about
> > network interface slot information in ifDescription
> 
> Well if they do provide strings, then that is probably a better way
> forward than messing with the kernel.

It gives strings based on PCI information but nothing useful
on tunnels.

Re: [PATCH] vmxnet3: fix checks for dma mapping errors

2015-12-01 Thread David Miller

From: Alexey Khoroshilov 
Date: Sat, 28 Nov 2015 01:29:30 +0300

> vmxnet3_drv does not check dma_addr with dma_mapping_error()
> after mapping dma memory. The patch adds the checks and
> tries to handle failures.
> 
> Found by Linux Driver Verification project (linuxtesting.org).
> 
> Signed-off-by: Alexey Khoroshilov 

Applied with typo fixed, thanks.

Re: [RFCv3 bluetooth-next 1/4] 6lowpan: add lowpan dev register helpers

2015-12-01 Thread Stefan Schmidt


Hello.

On 29/11/15 12:34, Alexander Aring wrote:

This patch introduces register and unregister functionality for lowpan
interfaces. While registering a lowpan interface there are several things
which need to be initialized by the 6lowpan subsystem. Upcoming
functionality needs to register/unregister per-interface components, e.g.
a debugfs entry.

Signed-off-by: Alexander Aring 
---
  include/net/6lowpan.h |  7 ++-
  net/6lowpan/core.c| 33 +++--
  net/bluetooth/6lowpan.c   |  8 +++-
  net/ieee802154/6lowpan/core.c |  6 ++
  4 files changed, 42 insertions(+), 12 deletions(-)

diff --git a/include/net/6lowpan.h b/include/net/6lowpan.h
index cf3bc56..730211f 100644
--- a/include/net/6lowpan.h
+++ b/include/net/6lowpan.h
@@ -185,7 +185,12 @@ static inline void lowpan_push_hc_data(u8 **hc_ptr, const 
void *data,
*hc_ptr += len;
  }
  
-void lowpan_netdev_setup(struct net_device *dev, enum lowpan_lltypes lltype);

+int lowpan_register_netdevice(struct net_device *dev,
+ enum lowpan_lltypes lltype);
+int lowpan_register_netdev(struct net_device *dev,
+  enum lowpan_lltypes lltype);
+void lowpan_unregister_netdevice(struct net_device *dev);
+void lowpan_unregister_netdev(struct net_device *dev);
  
  /**

   * lowpan_header_decompress - replace 6LoWPAN header with IPv6 header
diff --git a/net/6lowpan/core.c b/net/6lowpan/core.c
index 83b19e0..80fc509 100644
--- a/net/6lowpan/core.c
+++ b/net/6lowpan/core.c
@@ -15,7 +15,8 @@
  
  #include 
  
-void lowpan_netdev_setup(struct net_device *dev, enum lowpan_lltypes lltype)

+int lowpan_register_netdevice(struct net_device *dev,
+ enum lowpan_lltypes lltype)
  {
dev->addr_len = EUI64_ADDR_LEN;
dev->type = ARPHRD_6LOWPAN;
@@ -23,8 +24,36 @@ void lowpan_netdev_setup(struct net_device *dev, enum 
lowpan_lltypes lltype)
dev->priv_flags |= IFF_NO_QUEUE;
  
  	lowpan_priv(dev)->lltype = lltype;

+
+   return register_netdevice(dev);
+}
+EXPORT_SYMBOL(lowpan_register_netdevice);
+
+int lowpan_register_netdev(struct net_device *dev,
+  enum lowpan_lltypes lltype)
+{
+   int ret;
+
+   rtnl_lock();
+   ret = lowpan_register_netdevice(dev, lltype);
+   rtnl_unlock();
+   return ret;
+}
+EXPORT_SYMBOL(lowpan_register_netdev);
+
+void lowpan_unregister_netdevice(struct net_device *dev)
+{
+   unregister_netdevice(dev);
+}
+EXPORT_SYMBOL(lowpan_unregister_netdevice);
+
+void lowpan_unregister_netdev(struct net_device *dev)
+{
+   rtnl_lock();
+   lowpan_unregister_netdevice(dev);
+   rtnl_unlock();
  }
-EXPORT_SYMBOL(lowpan_netdev_setup);
+EXPORT_SYMBOL(lowpan_unregister_netdev);
  
  static int __init lowpan_module_init(void)

  {
diff --git a/net/bluetooth/6lowpan.c b/net/bluetooth/6lowpan.c
index 9e9cca3..d040365 100644
--- a/net/bluetooth/6lowpan.c
+++ b/net/bluetooth/6lowpan.c
@@ -825,9 +825,7 @@ static int setup_netdev(struct l2cap_chan *chan, struct 
lowpan_dev **dev)
list_add_rcu(&(*dev)->list, &bt_6lowpan_devices);
spin_unlock(&devices_lock);
  
-	lowpan_netdev_setup(netdev, LOWPAN_LLTYPE_BTLE);

-
-   err = register_netdev(netdev);
+   err = lowpan_register_netdev(netdev, LOWPAN_LLTYPE_BTLE);
if (err < 0) {
BT_INFO("register_netdev failed %d", err);
spin_lock(&devices_lock);
@@ -890,7 +888,7 @@ static void delete_netdev(struct work_struct *work)
struct lowpan_dev *entry = container_of(work, struct lowpan_dev,
delete_netdev);
  
-	unregister_netdev(entry->netdev);

+   lowpan_unregister_netdev(entry->netdev);
  
  	/* The entry pointer is deleted by the netdev destructor. */

  }
@@ -1348,7 +1346,7 @@ static void disconnect_devices(void)
ifdown(entry->netdev);
BT_DBG("Unregistering netdev %s %p",
   entry->netdev->name, entry->netdev);
-   unregister_netdev(entry->netdev);
+   lowpan_unregister_netdev(entry->netdev);
kfree(entry);
}
  }
diff --git a/net/ieee802154/6lowpan/core.c b/net/ieee802154/6lowpan/core.c
index 20c49c7..737c87a 100644
--- a/net/ieee802154/6lowpan/core.c
+++ b/net/ieee802154/6lowpan/core.c
@@ -161,9 +161,7 @@ static int lowpan_newlink(struct net *src_net, struct 
net_device *ldev,
wdev->needed_headroom;
ldev->needed_tailroom = wdev->needed_tailroom;
  
-	lowpan_netdev_setup(ldev, LOWPAN_LLTYPE_IEEE802154);

-
-   ret = register_netdevice(ldev);
+   ret = lowpan_register_netdevice(ldev, LOWPAN_LLTYPE_IEEE802154);
if (ret < 0) {
dev_put(wdev);
return ret;
@@ -180,7 +178,7 @@ static void lowpan_dellink(struct net_device *ldev, struct 
list_head *head)
ASSERT_RTNL();
  
  	wdev->ieee802154_ptr->lowpan_dev =

Re: [RFC] Stable interface index option

2015-12-01 Thread David Miller

From: Stephen Hemminger 
Date: Tue, 1 Dec 2015 12:20:38 -0800

> On Tue, 01 Dec 2015 14:28:47 -0500 (EST)
> David Miller  wrote:
> 
>> From: Stephen Hemminger 
>> Date: Tue, 1 Dec 2015 08:06:52 -0800
>> 
>> > On Tue, 01 Dec 2015 17:02:23 +0100
>> > Hannes Frederic Sowa  wrote:
>> > 
>> >> On Tue, Dec 1, 2015, at 16:50, Maximilian Wilhelm wrote:
>> >> > > I'm not sure I understand how this would work- are we going to 
>> >> > > pin down the ifindex for some subset of interfaces?
>> >> > 
>> >> > I'm not sure what your idea is, but I guess we might mean the same
>> >> > thing:
>> >> > 
>> >> > What I have in mind is that the user can supply a list of (ifname ->
>> >> > ifindex) entries via a sysfs/procfs interface and if such a list is
>> >> > present, the kernel will search the list for every ifname which is
>> >> > registered and check if there is an entry. If there is, the ifindex
>> >> > for this entry is used. If there is no entry found for the given
>> >> > ifname, the usual algorithm is used (therefore inherently providing
>> >> > backward compatibility).
>> >> 
>> >> Sorry to ask because I don't like this feature at all. There was a lot
>> >> of work on stable interface names. Why do you need stable ifindexes,
>> >> which were never meant to be stable for a longer amount of time?
>> > 
>> > Also current versions of SNMP provide more useful information about
>> > network interface slot information in ifDescription
>> 
>> Well if they do provide strings, then that is probably a better way
>> forward than messing with the kernel.
> 
> It gives strings based on PCI information but nothing useful
> on tunnels.

But at least in theory, that could be extended to do so right?

[net-next PATCH] net: ipv6: restrict hop_limit sysctl setting to range [1;255]

2015-12-01 Thread Phil Sutter

Setting a value bigger than 255 resulted in using only the lower eight
bits of that value as it is assigned to the u8 header field. To avoid
this unexpected result, reject such values.

Setting a value of zero is technically possible, but hosts receiving
such a packet have to treat it like hop_limit was set to one, according
to RFC2460. Therefore I don't see a use-case for that.

Setting a route's hop_limit to zero in iproute2 means to use the sysctl
default, which is not the case here: Setting e.g.
net.conf.eth0.hop_limit=0 will not make the kernel use
net.conf.all.hop_limit for outgoing packets on eth0. To avoid these
kinds of confusion, reject zero.

Signed-off-by: Phil Sutter 
---
 net/ipv6/addrconf.c | 16 +++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index d84742f003a9f..a5de1a616c12a 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -5200,6 +5200,20 @@ int addrconf_sysctl_forward(struct ctl_table *ctl, int 
write,
 }
 
 static
+int addrconf_sysctl_hop_limit(struct ctl_table *ctl, int write,
+  void __user *buffer, size_t *lenp, loff_t *ppos)
+{
+   struct ctl_table lctl;
+   int min_hl = 1, max_hl = 255;
+
+   lctl = *ctl;
+   lctl.extra1 = &min_hl;
+   lctl.extra2 = &max_hl;
+
+   return proc_dointvec_minmax(&lctl, write, buffer, lenp, ppos);
+}
+
+static
 int addrconf_sysctl_mtu(struct ctl_table *ctl, int write,
void __user *buffer, size_t *lenp, loff_t *ppos)
 {
@@ -5454,7 +5468,7 @@ static struct addrconf_sysctl_table
.data   = &ipv6_devconf.hop_limit,
.maxlen = sizeof(int),
.mode   = 0644,
-   .proc_handler   = proc_dointvec,
+   .proc_handler   = addrconf_sysctl_hop_limit,
},
{
.procname   = "mtu",
-- 
2.1.2
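
The truncation described in the changelog can be illustrated in user space. This is only a sketch: `hop_limit_to_header()` and `hop_limit_is_valid()` are hypothetical helper names, not functions from the patch, and they merely mirror the u8 assignment and the [1;255] range check.

```c
#include <stdint.h>

/* Models assigning an int sysctl value to the 8-bit hop limit header
 * field: only the lower eight bits survive the conversion. */
static uint8_t hop_limit_to_header(int sysctl_value)
{
    return (uint8_t)sysctl_value;
}

/* Hypothetical helper mirroring the [1;255] range the patch enforces. */
static int hop_limit_is_valid(int value)
{
    return value >= 1 && value <= 255;
}
```

For example, a setting of 300 would end up as 44 in the header, and 256 would end up as 0, which is why rejecting such values up front seems preferable.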


Re: [RESEND PATCH] arm64: bpf: add 'store immediate' instruction

2015-12-01 Thread Shi, Yang


On 11/30/2015 2:24 PM, Yang Shi wrote:

aarch64 doesn't have a native store-immediate instruction, so such an
operation has to be implemented by the instruction sequence below:

Load immediate to register
Store register

Signed-off-by: Yang Shi 
CC: Zi Shen Lim 


I had an offline email exchange with Zi Shen Lim, since he is traveling and
cannot send text-only mail; his reply is quoted below:


"I've given reviewed-by in response to original posting. Unless 
something has changed, feel free to add it."


Since nothing has changed, I added his Reviewed-by.

Reviewed-by: Zi Shen Lim 

Thanks,
Yang


CC: Xi Wang 
---
This patch might be buried by the storm of xadd discussion, however, it is
absolutely irrelevant to xadd, so resend the patch itself.

  arch/arm64/net/bpf_jit_comp.c | 20 +++-
  1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index 6809647..49c1f1b 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -563,7 +563,25 @@ emit_cond_jmp:
case BPF_ST | BPF_MEM | BPF_H:
case BPF_ST | BPF_MEM | BPF_B:
case BPF_ST | BPF_MEM | BPF_DW:
-   goto notyet;
+   /* Load imm to a register then store it */
+   ctx->tmp_used = 1;
+   emit_a64_mov_i(1, tmp2, off, ctx);
+   emit_a64_mov_i(1, tmp, imm, ctx);
+   switch (BPF_SIZE(code)) {
+   case BPF_W:
+   emit(A64_STR32(tmp, dst, tmp2), ctx);
+   break;
+   case BPF_H:
+   emit(A64_STRH(tmp, dst, tmp2), ctx);
+   break;
+   case BPF_B:
+   emit(A64_STRB(tmp, dst, tmp2), ctx);
+   break;
+   case BPF_DW:
+   emit(A64_STR64(tmp, dst, tmp2), ctx);
+   break;
+   }
+   break;

/* STX: *(size *)(dst + off) = src */
case BPF_STX | BPF_MEM | BPF_W:
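
The two-step sequence (load the immediate into a register, then store that register with the requested width) can be modelled in plain C. This is a user-space sketch only, not JIT code: `store_imm()` and the `st_size` enum are hypothetical names, and it assumes the 32-bit immediate is sign-extended before the store, as a 64-bit move would do.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the BPF_B/BPF_H/BPF_W/BPF_DW size codes. */
enum st_size { ST_B, ST_H, ST_W, ST_DW };

/* Emulates *(size *)(dst + off) = imm without a store-immediate
 * instruction. */
static void store_imm(uint8_t *dst, int off, int32_t imm, enum st_size size)
{
    /* Step 1: load the sign-extended immediate into a temporary. */
    int64_t tmp = imm;
    uint8_t *p = dst + off;

    /* Step 2: store the temporary using the requested access width. */
    switch (size) {
    case ST_B:  { uint8_t  v = (uint8_t)tmp;  memcpy(p, &v, sizeof(v)); break; }
    case ST_H:  { uint16_t v = (uint16_t)tmp; memcpy(p, &v, sizeof(v)); break; }
    case ST_W:  { uint32_t v = (uint32_t)tmp; memcpy(p, &v, sizeof(v)); break; }
    case ST_DW: { uint64_t v = (uint64_t)tmp; memcpy(p, &v, sizeof(v)); break; }
    }
}
```

The switch on the access size plays the same role as the `switch (BPF_SIZE(code))` in the patch, with one store variant per width.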




Re: [RFC] Stable interface index option

2015-12-01 Thread Stephen Hemminger

On Tue, 01 Dec 2015 22:54:54 +0100
Hannes Frederic Sowa  wrote:

> 
> On Tue, Dec 1, 2015, at 22:44, Stephen Hemminger wrote:
> > On Tue, 01 Dec 2015 22:14:59 +0100
> > Hannes Frederic Sowa  wrote:
> > > I had several snmp installations with net-snmp and munin, cacti and so
> > > on and all had the interface name in ifDescription already some years
> > > back.
> > 
> > In net-snmp 5.7 or later ifDescr is set to result of pci_lookup_name
> > (by default).
> 
> Seems the data should simply be in ifName nowadays (unconfirmed):
> 
> 
> Bye,
> Hannes

By default
  ifDescr == ifName == interface name (ie eth0, enp0s1, ...)
If the device is on PCI bus then result is like:


IF-MIB::ifDescr.2 = STRING: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 
PCI Express Gigabit Ethernet Controller
IF-MIB::ifName.2 = STRING: enp3s0

Re: [PATCH] wan/x25: Fix use-after-free in x25_asy_open_tty()

2015-12-01 Thread David Miller

From: Peter Hurley 
Date: Fri, 27 Nov 2015 14:18:39 -0500

> The N_X25 line discipline may access the previous line discipline's closed
> and already-freed private data on open [1].
> 
> The tty->disc_data field _never_ refers to valid data on entry to the
> line discipline's open() method. Rather, the ldisc is expected to
> initialize that field for its own use for the lifetime of the instance
> (ie. from open() to close() only).
 ...
> Reported-and-tested-by: Sasha Levin 
> Cc: 
> Signed-off-by: Peter Hurley 

Applied, thanks.

Re: [RFC] Stable interface index option

2015-12-01 Thread Hannes Frederic Sowa

On Tue, Dec 1, 2015, at 20:27, David Miller wrote:
> From: Hannes Frederic Sowa 
> Date: Tue, 01 Dec 2015 17:02:23 +0100
> 
> > On Tue, Dec 1, 2015, at 16:50, Maximilian Wilhelm wrote:
> >> > I'm not sure I understand how this would work- are we going to 
> >> > pin down the ifindex for some subset of interfaces?
> >> 
> >> I'm not sure what your idea is, but I guess we might mean the same
> >> thing:
> >> 
> >> What I have in mind is that the user can supply a list of (ifname ->
> >> ifindex) entries via a sysfs/procfs interface and if such a list is
> >> present, the kernel will search the list for every ifname which is
> >> registered and check if there is an entry. If there is, the ifindex
> >> for this entry is used. If there is no entry found for the given
> >> ifname, the usual algorithm is used (therefore inherently providing
> >> backward compatibility).
> > 
> > Sorry to ask because I don't like this feature at all. There was a lot
> > of work on stable interface names. Why do you need stable ifindexes,
> > which were never meant to be stable for a longer amount of time?
> 
> Because all the remote SNMP tools work with interface indexes, not names.

I know, but it should be terribly simple to patch SNMP tools to even
store the table of ifindex <-> name mappings persistently on the disk
and thus completely avoid this issue. Even though they can check on
interfaces if they have the same characteristics, e.g. tunnel to the
same destinations etc. Those are all policies which user space should
handle.
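
As a rough sketch of what such a user-space tool could do, the current ifindex/name table can be read with the POSIX `if_nameindex()` interface and written to a file. The function name `dump_ifindex_map()` and the "index name" line format are assumptions made for illustration; how a tool would reload and reconcile the table is left out.

```c
#include <net/if.h>
#include <stdio.h>

/* Writes one "ifindex ifname" line per interface to `out`.
 * Returns the number of entries written, or -1 if the interface
 * list could not be obtained. */
static int dump_ifindex_map(FILE *out)
{
    struct if_nameindex *list = if_nameindex();
    int count = 0;

    if (!list)
        return -1;

    /* The array is terminated by an entry with if_index == 0. */
    for (struct if_nameindex *p = list; p->if_index != 0; p++) {
        fprintf(out, "%u %s\n", p->if_index, p->if_name);
        count++;
    }

    if_freenameindex(list);
    return count;
}
```

A tool could call this at shutdown and compare the saved table with the live one at startup, which keeps the policy entirely outside the kernel.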

I agree it would make life much easier for user space if the kernel
kept the ifindex stable over reboots etc., but at a much higher
cost in kernel maintenance.

Bye,
Hannes

Re: [PATCH net v2] openvswitch: properly refcount vport-vxlan module

2015-12-01 Thread David Miller

From: Paolo Abeni 
Date: Mon, 30 Nov 2015 12:31:43 +0100

> After 614732eaa12d, no refcount is maintained for the vport-vxlan module.
> This allows the userspace to remove such module while vport-vxlan
> devices still exist, which leads to later oops.
> 
> v1 -> v2:
>  - move vport 'owner' initialization in ovs_vport_ops_register()
>and make such function a macro
> 
> Fixes: 614732eaa12d ("openvswitch: Use regular VXLAN net_device device")
> Signed-off-by: Paolo Abeni 

Applied, thanks.

Re: [PATCH 00/13] mvneta Buffer Management and enhancements

2015-12-01 Thread Marcin Wojtas

Hi Gregory,

Thanks for the log. I think it may be an overall problem with 4GB size
representation in mvebu_mbus_dram_info structure? Maybe whole DRAM
space is associated to CS0, and the 4GB size (0x1 0000 0000) does not
fit u32 variable?
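
The suspected effect is easy to reproduce in isolation. This is a minimal sketch with a hypothetical structure (`dram_cs_sketch`), not the real mvebu_mbus_dram_info layout: storing a 4GB byte count in a 32-bit field leaves 0, which would match a reported size of 0.

```c
#include <stdint.h>

/* Hypothetical stand-in for a chip-select entry with a u32 size. */
struct dram_cs_sketch {
    uint32_t base;
    uint32_t size;
};

/* Stores a 64-bit byte count in the 32-bit field and returns what
 * was actually kept; bits 32 and above are silently dropped. */
static uint32_t store_size_u32(struct dram_cs_sketch *cs, uint64_t bytes)
{
    cs->size = (uint32_t)bytes;
    return cs->size;
}
```

With this model a 2GB window is stored intact, while exactly 4GB wraps to 0.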

Best regards,
Marcin

2015-12-01 14:12 GMT+01:00 Gregory CLEMENT :
> Hi Marcin,
>
>  On lun., nov. 30 2015, Marcin Wojtas  wrote:
> [...]
>>>> 5. Enable BM on Armada XP and 38X development boards - those ones and
>>>> A370 I could check on my own. In all cases they survived night-long
>>>> linerate iperf. Also tests were performed with A388 SoC working as a
>>>> network bridge between two packet generators. They showed increase of
>>>> maximum processed 64B packets by ~20k (~555k packets with BM enabled
>>>> vs ~535k packets without BM). Also when pushing 1500B-packets with a
>>>> line rate achieved, CPU load decreased from around 25% without BM vs
>>>> 18-20% with BM.
>>>
>>> I was trying to test the BM part of your series on the Armada XP GP
>>> board. However it failed very quickly during the pool allocation. After
>>> a first debug I found that the size of the cs used in the
>>> mvebu_mbus_dram_info struct was 0. I have applied your series on a
>>> v4.4-rc1 kernel. At this stage I don't know if it is a regression in the
>>> mbus driver, a misconfiguration on my side or something else.
>>>
>>> Does it ring a bell for you?
>>
>> Frankly, I'm a bit surprised, I've never seen such problems on any of
>> the boards (AXP-GP/DB, A38X-DB/GP/AP). Did mvebu_mbus_dram_win_info
>> function exit with an error? Can you please apply below diff:
>> http://pastebin.com/2ws1txWk
>
> Yes it exited with errors and I added the same kind traces. It was how I
> knew that the size was 0!
>
> I've just rebuilt a fresh kernel using mvebu_v7_defconfig and adding
> your patch, I got the same issue (see the log at the end of the email.)
>
>
> But the good news is that on the same kernel on Armada 388 GP the pool
> allocation does not fail. I really suspect an issue with my u-boot.
>
>
>> And send me a full log beginning from u-boot?
>>
>>>
>>> How do you test it exactly?
>>> Especially on which kernel and with which U-Boot?
>>>
>>
>> I've just re-built the patchset I sent, which is on top of 4.4-rc1.
>>
>> I use AXP-GP, 78460 @ 1600MHz, 2GB DRAM, and everything works fine. My
>> u-boot version: v2011.12 2014_T2.0_eng_dropv2.
>
> My config is AXP-GP, 78460 @ 1300MHz, 8GB DRAM (only 4GB are used
> because I didn't activate LPAE), but the main difference is the U-Boot
> version: v2011.12 2014_T2.eng_dropv1.ATAG-test02.
>
> Thanks,
>
> Gregory
>
>
> [0.00] Booting Linux on physical CPU 0x0
> [0.00] Linux version 4.4.0-rc1-00013-g76f111f9bdf8-dirty 
> (gclement@FE-laptop) (gcc version 5.2.1 20151010 (Ubuntu 5.2.1-22ubuntu1) ) 
> #1024 SMP Tue Dec 1 14:02:52 CET 2015
> [0.00] CPU: ARMv7 Processor [562f5842] revision 2 (ARMv7), cr=10c5387d
> [0.00] CPU: PIPT / VIPT nonaliasing data cache, PIPT instruction cache
> [0.00] Machine model: Marvell Armada XP Development Board 
> DB-MV784MP-GP
> [0.00] Memory policy: Data cache writealloc
> [0.00] PERCPU: Embedded 12 pages/cpu @ee1ac000 s18752 r8192 d22208 
> u49152
> [0.00] Built 1 zonelists in Zone order, mobility grouping on.  Total 
> pages: 981504
> [0.00] Kernel command line: console=ttyS0,115200 earlyprintk 
> mvneta.rxq_def=2
> [0.00] log_buf_len individual max cpu contribution: 4096 bytes
> [0.00] log_buf_len total cpu_extra contributions: 12288 bytes
> [0.00] log_buf_len min size: 16384 bytes
> [0.00] log_buf_len: 32768 bytes
> [0.00] early log buf free: 14924(91%)
> [0.00] PID hash table entries: 4096 (order: 2, 16384 bytes)
> [0.00] Dentry cache hash table entries: 131072 (order: 7, 524288 
> bytes)
> [0.00] Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
> [0.00] Memory: 3888204K/3932160K available (5576K kernel code, 251K 
> rwdata, 1544K rodata, 4460K init, 207K bss, 43956K reserved, 0K cma-reserved, 
> 3145728K highmem)
> [0.00] Virtual kernel memory layout:
> [0.00] vector  : 0x - 0x1000   (   4 kB)
> [0.00] fixmap  : 0xffc0 - 0xfff0   (3072 kB)
> [0.00] vmalloc : 0xf080 - 0xff80   ( 240 MB)
> [0.00] lowmem  : 0xc000 - 0xf000   ( 768 MB)
> [0.00] pkmap   : 0xbfe0 - 0xc000   (   2 MB)
> [0.00] modules : 0xbf00 - 0xbfe0   (  14 MB)
> [0.00]   .text : 0xc0008000 - 0xc06fc374   (7121 kB)
> [0.00]   .init : 0xc06fd000 - 0xc0b58000   (4460 kB)
> [0.00]   .data : 0xc0b58000 - 0xc0b96d00   ( 252 kB)
> [0.00].bss : 0xc0b96d00 - 0xc0bcaa58   ( 208 kB)
> [0.00] Hierarchical RCU implementation.
> [0.00]  Build-time adjustment of leaf fanout to

Re: Increasing skb->mark size

2015-12-01 Thread Daniel Borkmann


On 12/01/2015 08:13 PM, Andi Kleen wrote:

Lorenzo Colitti  writes:

On Wed, Nov 25, 2015 at 5:32 AM, Matt Bennett
 wrote:

I'm emailing this list for feedback on the feasibility of increasing
skb->mark or adding a new field for marking. Perhaps this extension
could be done under a new CONFIG option.


64-bit marks (both skb->mark and sk->sk_mark) would be useful for
hosts doing complex policy routing as well. Current Android releases
use 20 of the 32 bits. If the mark were 64 bits, we could put the UID
in it, and stop using ip rules to implement per-UID routing.


This would be great. I've recently run into some issues with
the overhead of the Android firewall setup.

So basically you need 4 extra bytes in sk_buff. How about:

- shrinking skb->priority to 2 byte


That wouldn't work, see SO_PRIORITY and such (4 bytes) ...


- skb_iff is either skb->dev->iff or 0. so it could be replaced with a
single bit flag for the 0 case.


... and that one wouldn't work on ingress.

Hmm, thinking out loud, maybe it makes sense to combine {mark, priority}
into a mark64 field as union, if the use-case allows to ignore/overwrite
priorities set by applications, or to infer them otherwise based on
different policies like net_prio cgroup (see skb_update_prio()).
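
To make the union idea concrete, here is a small stand-alone sketch. The struct and helper names are hypothetical and this is not the sk_buff layout; it only shows that a 64-bit mark overlaying {mark, priority} costs no extra bytes, and that writing one view changes the other.

```c
#include <stdint.h>

/* Hypothetical 8-byte slot: either two 32-bit fields or one 64-bit mark. */
struct skb_mark_sketch {
    union {
        struct {
            uint32_t mark;
            uint32_t priority;
        };
        uint64_t mark64;
    };
};

/* Writing the 64-bit view overwrites both 32-bit fields. */
static void set_mark64(struct skb_mark_sketch *m, uint64_t v)
{
    m->mark64 = v;
}
```

The flip side is visible here too: any application-set priority is lost as soon as mark64 is written, so this only works where priorities can be ignored or inferred some other way.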

Re: size overflow in function qdisc_tree_decrease_qlen net/sched/sch_api.c

2015-12-01 Thread Eric Dumazet

On Tue, 2015-12-01 at 11:17 -0800, Cong Wang wrote:
> On Tue, Dec 1, 2015 at 11:09 AM, Eric Dumazet  wrote:
> > On Tue, 2015-12-01 at 10:43 -0800, Cong Wang wrote:
> >
> >> This smells hacky... Another way to fix this is to hold the qdisc tree
> >> lock in mq_dump(), since it is not a hot path (comparing with
> >> enqueue/dequeue)?
> >
> > Really ? Which qdisc tree lock will protect you exactly ???
> >
> > Whole point of MQ is that each TX queue has its own lock.
> >
> > So multiple cpus can call qdisc_tree_decrease_qlen() at the same time,
> > holding their own lock.
> >
> > Clearly modifying mq 'data' is wrong.
> 
> Ah, yeah, but mq _seems_ also the only one who modifies sch->q.qlen
> in ->dump(), which is the root cause of this bug. I am wondering if it should
> just compute the qlen and return it without modifying sch->q.qlen.

Sure, but then we still would get PAX underflows warnings ...

Also need to take care of sch->qstats.drops += count;

Also that would require a change of ->dump() api, since tc_fill_qdisc()
does :

if (q->ops->dump && q->ops->dump(q, skb) < 0)
goto nla_put_failure;
qlen = q->q.qlen;

Not sure it is worth the pain, changing signature of all ->dump()
handlers...


What about adding TCQ_F_NOPARENT and then :

Note : Seems to be more invasive patch for net tree (need to properly
set TCQ_F_NOPARENT)

diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index f43c8f33f09e..20c462b48404 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -753,6 +753,9 @@ void qdisc_tree_decrease_qlen(struct Qdisc *sch, unsigned int n)
 	while ((parentid = sch->parent)) {
 		if (TC_H_MAJ(parentid) == TC_H_MAJ(TC_H_INGRESS))
 			return;
+		/* This qdisc has no 'parent', we are at the top of the tree */
+		if (sch->flags & TCQ_F_NOPARENT)
+			return;
 
 		sch = qdisc_lookup(qdisc_dev(sch), TC_H_MAJ(parentid));
 		if (sch == NULL) {
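
For readers following the thread, here is a minimal userspace sketch of
the early-return idea in the diff above. It is not kernel code: the
toy_* names and TOY_F_NOPARENT are made up for illustration, a plain
parent pointer stands in for qdisc_lookup(), and no locking is modelled.

```c
#include <stddef.h>

#define TOY_F_NOPARENT 0x1u	/* hypothetical stand-in for TCQ_F_NOPARENT */

struct toy_qdisc {
	struct toy_qdisc *parent;	/* NULL at the root */
	unsigned int flags;
	unsigned int qlen;
};

/*
 * Subtract n from the qlen of every ancestor of sch, but stop climbing
 * as soon as the current node is flagged as having no parent to account to.
 */
static void toy_tree_decrease_qlen(struct toy_qdisc *sch, unsigned int n)
{
	if (n == 0)
		return;

	while (sch->parent != NULL) {
		if (sch->flags & TOY_F_NOPARENT)
			return;

		sch = sch->parent;
		sch->qlen -= n;
	}
}
```

With a root standing in for mq, a flagged per-queue child and a leaf
below that child, a call on the leaf adjusts the child and leaves the
root alone, so concurrent callers holding different per-queue locks
would never write to the shared root. The sketch has no guard against
n being larger than qlen.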
 




Re: [PATCH] Improve Atheros ethernet driver not to do order 4 GFP_ATOMIC allocation

2015-12-01 Thread David Miller

From: Michal Hocko 
Date: Mon, 30 Nov 2015 14:21:29 +0100

> On Sat 28-11-15 15:51:13, Pavel Machek wrote:
>> 
>> atl1c driver is doing order-4 allocation with GFP_ATOMIC
>> priority. That often breaks  networking after resume. Switch to
>> GFP_KERNEL. Still not ideal, but should be significantly better.
> 
> It is not clear, either from the changelog or from the patch context,
> why GFP_KERNEL can safely replace GFP_ATOMIC.

Earlier in the function we do a GFP_KERNEL kmalloc so: 

¯\_(ツ)_/¯

It should be fine.

Re: [PATCH net-next v6] mpls: support for dead routes

2015-12-01 Thread David Miller

From: Roopa Prabhu 
Date: Sat, 28 Nov 2015 19:38:33 -0800

> Adds support for RTNH_F_DEAD and RTNH_F_LINKDOWN flags on mpls
> routes due to link events. Also adds code to ignore dead
> routes during route selection.

I agree with Robert's feedback that we probably should use
ACCESS_ONCE(), optionally with a local variable.

Please make this change and I'll apply this patch, thanks Roopa!
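
A minimal userspace sketch of the "ACCESS_ONCE() with a local variable"
pattern, for illustration only. The thread does not show which field
Robert's feedback was about, so struct toy_route, nh_alive and
toy_pick_nexthop() are hypothetical; the macro has the same shape as the
kernel's and relies on the GCC/Clang __typeof__ extension.

```c
/* Force a single load of x; __typeof__ is a GCC/Clang extension. */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

struct toy_route {
	unsigned int nh_alive;	/* may be changed concurrently by link events */
};

/*
 * Pick a nexthop slot from a hash. The count is read exactly once into
 * a local, so the zero check and the modulo operate on the same value.
 */
static int toy_pick_nexthop(struct toy_route *rt, unsigned int hash)
{
	unsigned int alive = ACCESS_ONCE(rt->nh_alive);

	if (alive == 0)
		return -1;

	return (int)(hash % alive);
}
```

Without the local copy, the compiler would be free to load the field
twice, and a concurrent update between the check and the use could turn
the modulo into a division by zero.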

Re: [v8, 2/6] fsl/fman: Add FMan support

2015-12-01 Thread David Miller

From: 
Date: Mon, 30 Nov 2015 14:20:58 +0200

> +typedef irqreturn_t (fman_exceptions_cb)(struct fman *fman,
> +  enum fman_exceptions exception);

Function and function pointer declarations and definitions should be
indented such that the second and subsequent lines begin precisely
at the first column after the opening parenthesis of the first line.

Please audit this and fix it in your entire submission; almost every
new such case is done incorrectly.

> + fman->state->exceptions = (EX_DMA_BUS_ERROR |
> + EX_DMA_READ_ECC  |
> + EX_DMA_SYSTEM_WRITE_ECC  |
> + EX_DMA_FM_WRITE_ECC  |
> + EX_FPM_STALL_ON_TASKS|
> + EX_FPM_SINGLE_ECC|
> + EX_FPM_DOUBLE_ECC|
> + EX_QMI_DEQ_FROM_UNKNOWN_PORTID |
> + EX_BMI_LIST_RAM_ECC  |
> + EX_BMI_STORAGE_PROFILE_ECC   |
> + EX_BMI_STATISTICS_RAM_ECC|
> + EX_MURAM_ECC |
> + EX_BMI_DISPATCH_RAM_ECC  |
> + EX_QMI_DOUBLE_ECC|
> + EX_QMI_SINGLE_ECC);

The same applies to multi-line parenthesized expressions like this
one.  Again, please audit and fix this in your entire submission.
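
As an illustration of the alignment rule described above, here is a
small sketch with made-up names (toy_dev, toy_exception and friends are
not from the fman submission). It assumes 8-column tabs, as the kernel
does, with spaces used only to reach the exact column.

```c
enum toy_exception {
	TOY_EX_A = 0x1,
	TOY_EX_B = 0x2,
	TOY_EX_C = 0x4,
};

struct toy_dev {
	unsigned int exceptions;
};

/* The second line starts in the column right after the '(' above it. */
typedef int (toy_exceptions_cb)(struct toy_dev *dev,
				enum toy_exception exception);

/* The same rule applies to function definitions. */
static int toy_handler(struct toy_dev *dev,
		       enum toy_exception exception)
{
	return (dev->exceptions & (unsigned int)exception) != 0;
}

/* And to multi-line parenthesized expressions. */
static void toy_enable_all(struct toy_dev *dev)
{
	dev->exceptions = (TOY_EX_A |
			   TOY_EX_B |
			   TOY_EX_C);
}
```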

Thanks.

forwarding of ipv4 link local addresses

2015-12-01 Thread David Ahern

RFC 3927 states that packets from/to IPv4 link-local addresses 
(169.254/16) should not be forwarded, yet the Linux networking stack 
happily forwards them. Before sending in a patch I wanted to inquire if 
this behavior is intentional.
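
To make the question concrete, here is a minimal userspace sketch of the
check involved. It is not a proposed patch: the toy_* helpers are
hypothetical and take addresses in host byte order.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if addr (host byte order) lies in 169.254.0.0/16. */
static bool toy_ipv4_is_link_local(uint32_t addr)
{
	return (addr & 0xffff0000u) == 0xa9fe0000u;
}

/* Forwarding decision: refuse if either endpoint is link-local. */
static bool toy_may_forward(uint32_t saddr, uint32_t daddr)
{
	return !toy_ipv4_is_link_local(saddr) &&
	       !toy_ipv4_is_link_local(daddr);
}
```

For example, a packet from 169.254.1.1 (0xa9fe0101) to 192.168.1.1
(0xc0a80101) would be refused by toy_may_forward(), while one between
two 192.168/16 hosts would pass.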


Thanks,
David

Re: [RFCv3 bluetooth-next 2/4] 6lowpan: add debugfs support

2015-12-01 Thread Stefan Schmidt


Hello.

On 29/11/15 12:34, Alexander Aring wrote:

This patch will introduce a 6lowpan entry into the debugfs if enabled.
Inside this 6lowpan directory we create a subdirectory for each 6lowpan
interface to offer per-interface debugfs support.

Signed-off-by: Alexander Aring 
---
  include/net/6lowpan.h   |  3 +++
  net/6lowpan/6lowpan_i.h | 28 ++
  net/6lowpan/Kconfig |  8 
  net/6lowpan/Makefile|  1 +
  net/6lowpan/core.c  | 28 +-
  net/6lowpan/debugfs.c   | 53 +
  6 files changed, 120 insertions(+), 1 deletion(-)
  create mode 100644 net/6lowpan/6lowpan_i.h
  create mode 100644 net/6lowpan/debugfs.c

diff --git a/include/net/6lowpan.h b/include/net/6lowpan.h
index 730211f..2f6a3f2 100644
--- a/include/net/6lowpan.h
+++ b/include/net/6lowpan.h
@@ -53,6 +53,8 @@
  #ifndef __6LOWPAN_H__
  #define __6LOWPAN_H__
  
+#include 

+
  #include 
  #include 
  
@@ -98,6 +100,7 @@ enum lowpan_lltypes {
  
  struct lowpan_priv {

enum lowpan_lltypes lltype;
+   struct dentry *iface_debugfs;
  
  	/* must be last */

u8 priv[0] __aligned(sizeof(void *));
diff --git a/net/6lowpan/6lowpan_i.h b/net/6lowpan/6lowpan_i.h
new file mode 100644
index 0000000..d16bb4b
--- /dev/null
+++ b/net/6lowpan/6lowpan_i.h
@@ -0,0 +1,28 @@
+#ifndef __6LOWPAN_I_H
+#define __6LOWPAN_I_H
+
+#include 
+
+#ifdef CONFIG_6LOWPAN_DEBUGFS
+int lowpan_dev_debugfs_init(struct net_device *dev);
+void lowpan_dev_debugfs_exit(struct net_device *dev);
+
+int __init lowpan_debugfs_init(void);
+void lowpan_debugfs_exit(void);
+#else
+static inline int lowpan_dev_debugfs_init(struct net_device *dev)
+{
+   return 0;
+}
+
+static inline void lowpan_dev_debugfs_exit(struct net_device *dev) { }
+
+static inline int __init lowpan_debugfs_init(void)
+{
+   return 0;
+}
+
+static inline void lowpan_debugfs_exit(void) { }
+#endif /* CONFIG_6LOWPAN_DEBUGFS */
+
+#endif /* __6LOWPAN_I_H */
diff --git a/net/6lowpan/Kconfig b/net/6lowpan/Kconfig
index 7fa0f38..7ecedd7 100644
--- a/net/6lowpan/Kconfig
+++ b/net/6lowpan/Kconfig
@@ -5,6 +5,14 @@ menuconfig 6LOWPAN
  This enables IPv6 over Low power Wireless Personal Area Network -
  "6LoWPAN" which is supported by IEEE 802.15.4 or Bluetooth stacks.
  
+config 6LOWPAN_DEBUGFS

+   bool "6LoWPAN debugfs support"
+   depends on 6LOWPAN
+   depends on DEBUG_FS
+   ---help---
+ This enables 6LoWPAN debugfs support. For example to manipulate
+ IPHC context information at runtime.
+
  menuconfig 6LOWPAN_NHC
tristate "Next Header Compression Support"
depends on 6LOWPAN
diff --git a/net/6lowpan/Makefile b/net/6lowpan/Makefile
index c6ffc55..54cad8d 100644
--- a/net/6lowpan/Makefile
+++ b/net/6lowpan/Makefile
@@ -1,6 +1,7 @@
  obj-$(CONFIG_6LOWPAN) += 6lowpan.o
  
  6lowpan-y := core.o iphc.o nhc.o

+6lowpan-$(CONFIG_6LOWPAN_DEBUGFS) += debugfs.o
  
  #rfc6282 nhcs

  obj-$(CONFIG_6LOWPAN_NHC_DEST) += nhc_dest.o
diff --git a/net/6lowpan/core.c b/net/6lowpan/core.c
index 80fc509..c7f06f5 100644
--- a/net/6lowpan/core.c
+++ b/net/6lowpan/core.c
@@ -15,9 +15,13 @@
  
  #include 
  
+#include "6lowpan_i.h"

+
  int lowpan_register_netdevice(struct net_device *dev,
  enum lowpan_lltypes lltype)
  {
+   int ret;
+
dev->addr_len = EUI64_ADDR_LEN;
dev->type = ARPHRD_6LOWPAN;
dev->mtu = IPV6_MIN_MTU;
@@ -25,7 +29,15 @@ int lowpan_register_netdevice(struct net_device *dev,
  
  	lowpan_priv(dev)->lltype = lltype;
  
-	return register_netdevice(dev);

+   ret = lowpan_dev_debugfs_init(dev);
+   if (ret < 0)
+   return ret;
+
+   ret = register_netdevice(dev);
+   if (ret < 0)
+   lowpan_dev_debugfs_exit(dev);
+
+   return ret;
  }
  EXPORT_SYMBOL(lowpan_register_netdevice);
  
@@ -44,6 +56,7 @@ EXPORT_SYMBOL(lowpan_register_netdev);

  void lowpan_unregister_netdevice(struct net_device *dev)
  {
unregister_netdevice(dev);
+   lowpan_dev_debugfs_exit(dev);
  }
  EXPORT_SYMBOL(lowpan_unregister_netdevice);
  
@@ -57,6 +70,12 @@ EXPORT_SYMBOL(lowpan_unregister_netdev);
  
  static int __init lowpan_module_init(void)

  {
+   int ret;
+
+   ret = lowpan_debugfs_init();
+   if (ret < 0)
+   return ret;
+
request_module_nowait("ipv6");
  
  	request_module_nowait("nhc_dest");

@@ -69,6 +88,13 @@ static int __init lowpan_module_init(void)
  
  	return 0;

  }
+
+static void __exit lowpan_module_exit(void)
+{
+   lowpan_debugfs_exit();
+}
+
  module_init(lowpan_module_init);
+module_exit(lowpan_module_exit);
  
  MODULE_LICENSE("GPL");

diff --git a/net/6lowpan/debugfs.c b/net/6lowpan/debugfs.c
new file mode 100644
index 0000000..88eef84
--- /dev/null
+++ b/net/6lowpan/debugfs.c
@@ -0,0 +1,53 @@
+/* This program is free software; you can redistribute it and/or modify
+ *

Re: [RFC] Stable interface index option

2015-12-01 Thread Hannes Frederic Sowa

On Tue, Dec 1, 2015, at 21:57, David Miller wrote:
> From: Stephen Hemminger 
> Date: Tue, 1 Dec 2015 12:20:38 -0800
> 
> > On Tue, 01 Dec 2015 14:28:47 -0500 (EST)
> > David Miller  wrote:
> > 
> >> From: Stephen Hemminger 
> >> Date: Tue, 1 Dec 2015 08:06:52 -0800
> >> 
> >> > On Tue, 01 Dec 2015 17:02:23 +0100
> >> > Hannes Frederic Sowa  wrote:
> >> > 
> >> >> On Tue, Dec 1, 2015, at 16:50, Maximilian Wilhelm wrote:
> >> >> > > I'm not sure I understand how this would work- are we going to 
> >> >> > > pin down the ifindex for some subset of interfaces?
> >> >> > 
> >> >> > I'm not sure what your idea is, but I guess we might mean the same
> >> >> > thing:
> >> >> > 
> >> >> > What I have in mind is that the user can supply a list of (ifname ->
> >> >> > ifindex) entries via a sysfs/procfs interface and if such a list is
> >> >> > present, the kernel will search the list for every ifname which is
> >> >> > registered and check if there is an entry. If there is, the ifindex
> >> >> > for this entry is used. If there is no entry found for the given
> >> >> > ifname, the usual algorithm is used (therefore inherently providing
> >> >> > backward compatibility).
> >> >> 
> >> >> Sorry to ask because I don't like this feature at all. There was a lot
> >> >> of work on stable interface names. Why do you need stable ifindexes,
> >> >> which were never meant to be stable for a longer amount of time?
> >> > 
> >> > Also current versions of SNMP provide more useful information about
> >> > network interface slot information in ifDescription
> >> 
> >> Well if they do provide strings, then that is probably a better way
> >> forward than messing with the kernel.
> > 
> > It gives strings based on PCI information but nothing useful
> > on tunnels.
> 
> But at least in theory, that could be extended to do so right?

I ran several SNMP installations with net-snmp, munin, cacti and so on,
and all of them already had the interface name in ifDescription some
years back.

Re: "hw csum failure" error on skge driver with 4.3 kernel upon receiving ICMPv6 multicast listener discovery packets

2015-12-01 Thread Stephen Hemminger

On Tue, 1 Dec 2015 22:07:45 +0100
Stefan Ring  wrote:

> On Sun, Nov 15, 2015 at 7:44 PM, David Madore  wrote:
> > The skge driver in the 4.3 kernel reports hardware checksum errors
> > upon receiving (certain?) IPv6 multicast packets containing ICMPv6
> > multicast listener discovery messages.  This is a regression since 4.1
> > (I believe between 4.1 and 4.2).  The e1000e driver on a different
> > Ethernet port of the same machine is not affected.  Disabling offload
> > rx checksumming suppresses the errors.  Nor are all IPv6 multicast
> > packets affected: for some reason, it seems only those containing
> > ICMPv6 multicast listener discovery messages trigger the problem.
> >
> > In case it also matters, the skge interface in question (eth1 in what
> > follows) is part of a bridge that contains another Ethernet interface
> > and a Wifi card.
> >
> > Here is a frame, with its link-level headers, that caused an error
> > when received by skge:
> >
> > 0000   33 33 ff 62 30 d8 60 fb 42 f1 b1 36 86 dd 60 00  33.b0.`.B..6..`.
> > 0010   00 00 00 20 00 01 fe 80 00 00 00 00 00 00 62 fb  ... ..........b.
> > 0020   42 ff fe f1 b1 36 ff 02 00 00 00 00 00 00 00 00  B....6..........
> > 0030   00 01 ff 62 30 d8 3a 00 01 00 05 02 00 00 83 00  ...b0.:.........
> > 0040   c9 8a 00 00 00 00 ff 02 00 00 00 00 00 00 00 00  ................
> > 0050   00 01 ff 62 30 d8                                ...b0.
> >
> > (Network dumps performed on another network device suggest that the
> > checksum is, indeed, correct.)
> >
> > And here is the syslog produced upon receiving the above packet:
> >
> > Nov 15 17:52:13 pleiades kernel: [  661.393163] eth1: hw csum failure
> > Nov 15 17:52:13 pleiades kernel: [  661.394203] CPU: 0 PID: 0 Comm: 
> > swapper/0 Tainted: GW   4.3.0-pleiades #1
> > Nov 15 17:52:13 pleiades kernel: [  661.395192] Hardware name: System 
> > manufacturer System Product Name/P5WD2-Premium, BIOS 0709 03/31/2006
> > Nov 15 17:52:13 pleiades kernel: [  661.395192]  88013a9d5d00 
> > 88013fc03aa8 8129a186 88013afe
> > Nov 15 17:52:13 pleiades kernel: [  661.395192]  88013fc03ac0 
> > 81436425  88013fc03af0
> > Nov 15 17:52:13 pleiades kernel: [  661.395192]  8142b87a 
> > 1027316b3fc03b30 88013a9d5d00 0030
> > Nov 15 17:52:13 pleiades kernel: [  661.395192] Call Trace:
> > Nov 15 17:52:13 pleiades kernel: [  661.395192]
> > [] dump_stack+0x44/0x5e
> > Nov 15 17:52:13 pleiades kernel: [  661.395192]  [] 
> > netdev_rx_csum_fault+0x35/0x40
> > Nov 15 17:52:13 pleiades kernel: [  661.395192]  [] 
> > __skb_checksum_complete+0xca/0xd0
> > Nov 15 17:52:13 pleiades kernel: [  661.395192]  [] 
> > ipv6_mc_validate_checksum+0xab/0x140
> > Nov 15 17:52:13 pleiades kernel: [  661.395192]  [] 
> > skb_checksum_trimmed+0x8f/0x180
> > Nov 15 17:52:13 pleiades kernel: [  661.395192]  [] 
> > ipv6_mc_check_mld+0x105/0x330
> > Nov 15 17:52:13 pleiades kernel: [  661.395192]  [] 
> > br_multicast_rcv+0x8c/0xce0 [bridge]
> > Nov 15 17:52:13 pleiades kernel: [  661.395192]  [] ? 
> > __netif_receive_skb+0x13/0x60
> > Nov 15 17:52:13 pleiades kernel: [  661.395192]  [] ? 
> > netif_receive_skb_internal+0x2e/0x90
> > Nov 15 17:52:13 pleiades kernel: [  661.395192]  [] 
> > br_handle_frame_finish+0x28c/0x5b0 [bridge]
> > Nov 15 17:52:13 pleiades kernel: [  661.395192]  [] ? 
> > usb_hcd_submit_urb+0xa4/0x960 [usbcore]
> > Nov 15 17:52:13 pleiades kernel: [  661.395192]  [] 
> > br_handle_frame+0x151/0x270 [bridge]
> > Nov 15 17:52:13 pleiades kernel: [  661.395192]  [] ? 
> > usb_submit_urb+0x2d2/0x510 [usbcore]
> > Nov 15 17:52:13 pleiades kernel: [  661.395192]  [] 
> > __netif_receive_skb_core+0x1c2/0x990
> > Nov 15 17:52:13 pleiades kernel: [  661.395192]  [] ? 
> > __usb_hcd_giveback_urb+0x82/0xe0 [usbcore]
> > Nov 15 17:52:13 pleiades kernel: [  661.395192]  [] 
> > __netif_receive_skb+0x13/0x60
> > Nov 15 17:52:13 pleiades kernel: [  661.395192]  [] 
> > netif_receive_skb_internal+0x2e/0x90
> > Nov 15 17:52:13 pleiades kernel: [  661.395192]  [] 
> > napi_gro_receive+0xa0/0xd0
> > Nov 15 17:52:13 pleiades kernel: [  661.395192]  [] 
> > skge_poll+0x380/0x7a0 [skge]
> > Nov 15 17:52:13 pleiades kernel: [  661.395192]  [] ? 
> > lapic_next_event+0x18/0x20
> > Nov 15 17:52:13 pleiades kernel: [  661.395192]  [] 
> > net_rx_action+0x13c/0x300
> > Nov 15 17:52:13 pleiades kernel: [  661.395192]  [] 
> > __do_softirq+0xc7/0x240
> > Nov 15 17:52:13 pleiades kernel: [  661.395192]  [] 
> > irq_exit+0x70/0x90
> > Nov 15 17:52:13 pleiades kernel: [  661.395192]  [] 
> > do_IRQ+0x51/0xd0
> > Nov 15 17:52:13 pleiades kernel: [  661.395192]  [] 
> > common_interrupt+0x7c/0x7c
> > Nov 15 17:52:13 pleiades kernel: [  661.395192]
> > [] ? mwait_idle+0x87/0x140
> > Nov 15 17:52:13 pleiades kernel: [  661.395192]  [] 
> > arch_cpu_idle+0xa/0x10
> > Nov 15 17:52:13 pleiades kernel: [  661.395192]

[PATCH net-next V3 11/17] hv_netvsc: Eliminate page_buf from struct hv_netvsc_packet

2015-12-01 Thread K. Y. Srinivasan

Eliminate page_buf from struct hv_netvsc_packet.

Signed-off-by: K. Y. Srinivasan 
Reviewed-by: Haiyang Zhang 
---
 drivers/net/hyperv/hyperv_net.h   |4 ++--
 drivers/net/hyperv/netvsc.c   |   25 ++---
 drivers/net/hyperv/netvsc_drv.c   |   11 ++-
 drivers/net/hyperv/rndis_filter.c |   26 +-
 4 files changed, 35 insertions(+), 31 deletions(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index f5b2145..b541455 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -149,7 +149,6 @@ struct hv_netvsc_packet {
 
 
u64 send_completion_tid;
-   struct hv_page_buffer *page_buf;
 };
 
 struct netvsc_device_info {
@@ -188,7 +187,8 @@ int netvsc_device_add(struct hv_device *device, void *additional_info);
 int netvsc_device_remove(struct hv_device *device);
 int netvsc_send(struct hv_device *device,
struct hv_netvsc_packet *packet,
-   struct rndis_message *rndis_msg);
+   struct rndis_message *rndis_msg,
+   struct hv_page_buffer **page_buffer);
 void netvsc_linkstatus_callback(struct hv_device *device_obj,
struct rndis_message *resp);
 void netvsc_xmit_completion(void *context);
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 081f14f..18058a59 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -702,7 +702,8 @@ static u32 netvsc_copy_to_send_buf(struct netvsc_device *net_device,
   unsigned int section_index,
   u32 pend_size,
   struct hv_netvsc_packet *packet,
-  struct rndis_message *rndis_msg)
+  struct rndis_message *rndis_msg,
+  struct hv_page_buffer **pb)
 {
char *start = net_device->send_buf;
char *dest = start + (section_index * net_device->send_section_size)
@@ -723,9 +724,9 @@ static u32 netvsc_copy_to_send_buf(struct netvsc_device *net_device,
}
 
for (i = 0; i < page_count; i++) {
-   char *src = phys_to_virt(packet->page_buf[i].pfn << PAGE_SHIFT);
-   u32 offset = packet->page_buf[i].offset;
-   u32 len = packet->page_buf[i].len;
+   char *src = phys_to_virt((*pb)[i].pfn << PAGE_SHIFT);
+   u32 offset = (*pb)[i].offset;
+   u32 len = (*pb)[i].len;
 
memcpy(dest, (src + offset), len);
msg_size += len;
@@ -742,7 +743,8 @@ static u32 netvsc_copy_to_send_buf(struct netvsc_device *net_device,
 
 static inline int netvsc_send_pkt(
struct hv_netvsc_packet *packet,
-   struct netvsc_device *net_device)
+   struct netvsc_device *net_device,
+   struct hv_page_buffer **pb)
 {
struct nvsp_message nvmsg;
u16 q_idx = packet->q_idx;
@@ -789,8 +791,8 @@ static inline int netvsc_send_pkt(
packet->xmit_more = false;
 
if (packet->page_buf_cnt) {
-   pgbuf = packet->cp_partial ? packet->page_buf +
-   packet->rmsg_pgcnt : packet->page_buf;
+   pgbuf = packet->cp_partial ? (*pb) +
+   packet->rmsg_pgcnt : (*pb);
ret = vmbus_sendpacket_pagebuffer_ctl(out_channel,
  pgbuf,
  packet->page_buf_cnt,
@@ -838,7 +840,8 @@ static inline int netvsc_send_pkt(
 
 int netvsc_send(struct hv_device *device,
struct hv_netvsc_packet *packet,
-   struct rndis_message *rndis_msg)
+   struct rndis_message *rndis_msg,
+   struct hv_page_buffer **pb)
 {
struct netvsc_device *net_device;
int ret = 0, m_ret = 0;
@@ -891,7 +894,7 @@ int netvsc_send(struct hv_device *device,
if (section_index != NETVSC_INVALID_INDEX) {
netvsc_copy_to_send_buf(net_device,
section_index, msd_len,
-   packet, rndis_msg);
+   packet, rndis_msg, pb);
 
packet->send_buf_index = section_index;
 
@@ -922,7 +925,7 @@ int netvsc_send(struct hv_device *device,
}
 
if (msd_send) {
-   m_ret = netvsc_send_pkt(msd_send, net_device);
+   m_ret = netvsc_send_pkt(msd_send, net_device, pb);
 
if (m_ret != 0) {
netvsc_free_send_slot(net_device,
@@ -932,7 +935,7 @@ int netvsc_send(struct hv_device *device,
}
 
if (cur_send)
-   ret = netvsc_send_pkt(cur_send, net_device);
+   ret = netvsc_send_pkt(cur_send, net_device, pb);
 
if (ret != 0 && section_index !=

Re: [RFC] Stable interface index option

2015-12-01 Thread Maximilian Wilhelm

Anno domini 2015 Hannes Frederic Sowa scripsit:

> On Tue, Dec 1, 2015, at 20:27, David Miller wrote:
> > From: Hannes Frederic Sowa 
> > Date: Tue, 01 Dec 2015 17:02:23 +0100
> > 
> > > On Tue, Dec 1, 2015, at 16:50, Maximilian Wilhelm wrote:
> > >> > I'm not sure I understand how this would work- are we going to 
> > >> > pin down the ifindex for some subset of interfaces?
> > >> 
> > >> I'm not sure what your idea is, but I guess we might mean the same
> > >> thing:
> > >> 
> > >> What I have in mind is that the user can supply a list of (ifname ->
> > >> ifindex) entries via a sysfs/procfs interface and if such a list is
> > >> present, the kernel will search the list for every ifname which is
> > >> registered and check if there is an entry. If there is, the ifindex
> > >> for this entry is used. If there is no entry found for the given
> > >> ifname, the usual algorithm is used (therefore inherently providing
> > >> backward compatibility).
> > > 
> > > Sorry to ask because I don't like this feature at all. There was a lot
> > > of work on stable interface names. Why do you need stable ifindexes,
> > > which were never meant to be stable for a longer amount of time?
> > 
> > Because all the remote SNMP tools work with interface indexes, not names.

That's indeed true and the underlying problem which brought us to this
idea.

> I know, but it should be terribly simple to patch SNMP tools to even
> store the table of ifindex <-> name mappings persistently on the disk
> and thus completely avoid this issue. Even though they can check on
> interfaces if they have the same characteristics, e.g. tunnel to the
> same destinations etc. Those are all policies which user space should
> handle.

How should net-snmp handle cases where new interfaces come up on old
and now unused numbers? What should it report? That would escalate the
problem a lot IMHO.

> I agree it would make life much easier for user space if the kernel
> kept the ifindex stable over reboots etc., but at a much higher
> cost in kernel maintenance.

What would that cost be in the implementation I sketched before?

I don't quite see what the higher cost would be. I can currently set an
ifindex of my choosing manually for newly created GRE tunnels, vlan
interfaces and the like. So what would be the difference between that
and having the optional ability to push some of these predefined
ifindexes into the kernel, so that I don't have to bother when creating
the interface and still get the same outcome? Same effect, but easier to
use once set up.

Regarding the performance issues raised before, the same question
applies: what's the difference between creating some (or a lot of)
interfaces with sparse ifindexes by using "ip link add foo index 1234"
and having a list within the kernel?
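
The proposal can be sketched in a few lines of userspace C. This is only
a model of the lookup-with-fallback idea, not an implementation: the
toy_* names are hypothetical, the "usual algorithm" is reduced to a
simple counter, and collisions between pinned and dynamically assigned
indexes are not handled.

```c
#include <stddef.h>
#include <string.h>

struct toy_pin {
	const char *ifname;
	int ifindex;
};

/* Return the pinned index for ifname, or 0 if the name is not listed. */
static int toy_pinned_ifindex(const struct toy_pin *pins, size_t n,
			      const char *ifname)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(pins[i].ifname, ifname) == 0)
			return pins[i].ifindex;

	return 0;
}

/* Use the pinned index if there is one, else fall back to the counter. */
static int toy_assign_ifindex(const struct toy_pin *pins, size_t n,
			      const char *ifname, int *next_dynamic)
{
	int ifindex = toy_pinned_ifindex(pins, n, ifname);

	if (ifindex > 0)
		return ifindex;

	return (*next_dynamic)++;
}
```

With a single entry mapping "gre0" to 1234, registering "gre0" yields
1234, while any unlisted name gets the next value of the counter,
which is the backward-compatible path.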

I still consider this a feature that is worthwhile and simple enough to
implement, and one that would be a great option for people with such
usage scenarios.

Best
Max
-- 
"Does is bother me, that people hurt others, because they are to weak to face 
the truth? Yeah. Sorry 'bout that."
 -- Thirteen, House M.D.

Re: size overflow in function qdisc_tree_decrease_qlen net/sched/sch_api.c

2015-12-01 Thread Eric Dumazet

On Tue, 2015-12-01 at 14:47 -0800, Cong Wang wrote:
> On Tue, Dec 1, 2015 at 2:33 PM, Eric Dumazet  wrote:
> > Hmm... it looks like we have a much more serious bug :
> >
> > qdisc_lookup() calls qdisc_match_from_root(dev->qdisc, handle) without
> > proper lock being held, so we might actually crash the host,
> > if qdisc_tree_decrease_qlen() happens at the time qdiscs are changed.
> >
> > qdisc_tree_decrease_qlen() needs serious care :(
> 
> Convert qdisc list to RCU protected?

Yes, or/and add a per txqueue list, to shorten lookup times !

If we have a per txqueue list, we do not need RCU as we already own the
qdisc lock.




[PATCH net-next V3 01/17] hv_netvsc: Resize some of the variables in hv_netvsc_packet

2015-12-01 Thread K. Y. Srinivasan

As part of reducing the size of the hv_netvsc_packet, resize some of the
variables based on their usage.

Signed-off-by: K. Y. Srinivasan 
Reviewed-by: Haiyang Zhang 
---
 drivers/net/hyperv/hyperv_net.h |   14 +++---
 1 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index 5fa98f5..972e562 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -127,11 +127,11 @@ struct ndis_tcp_ip_checksum_info;
  */
 struct hv_netvsc_packet {
/* Bookkeeping stuff */
-   u32 status;
+   u8 status;
 
-   bool is_data_pkt;
-   bool xmit_more; /* from skb */
-   bool cp_partial; /* partial copy into send buffer */
+   u8 is_data_pkt;
+   u8 xmit_more; /* from skb */
+   u8 cp_partial; /* partial copy into send buffer */
 
u16 vlan_tci;
 
@@ -147,13 +147,13 @@ struct hv_netvsc_packet {
/* This points to the memory after page_buf */
struct rndis_message *rndis_msg;
 
-   u32 rmsg_size; /* RNDIS header and PPI size */
-   u32 rmsg_pgcnt; /* page count of RNDIS header and PPI */
+   u8 rmsg_size; /* RNDIS header and PPI size */
+   u8 rmsg_pgcnt; /* page count of RNDIS header and PPI */
 
u32 total_data_buflen;
/* Points to the send/receive buffer where the ethernet frame is */
void *data;
-   u32 page_buf_cnt;
+   u8 page_buf_cnt;
struct hv_page_buffer *page_buf;
 };
 
-- 
1.7.4.1
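
A rough userspace illustration of why narrowing these fields shrinks the
structure. The two structs below are hypothetical and contain only the
resized scalar members from the diff above (pointers and u64 members are
left out), so the absolute sizes say nothing about the real
hv_netvsc_packet; actual numbers depend on the ABI.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar members as they were before the patch. */
struct toy_packet_before {
	uint32_t status;
	bool is_data_pkt;
	bool xmit_more;
	bool cp_partial;
	uint16_t vlan_tci;
	uint32_t rmsg_size;
	uint32_t rmsg_pgcnt;
	uint32_t total_data_buflen;
	uint32_t page_buf_cnt;
};

/* The same members with the narrower types from the patch. */
struct toy_packet_after {
	uint8_t status;
	uint8_t is_data_pkt;
	uint8_t xmit_more;
	uint8_t cp_partial;
	uint16_t vlan_tci;
	uint8_t rmsg_size;
	uint8_t rmsg_pgcnt;
	uint32_t total_data_buflen;
	uint8_t page_buf_cnt;
};

/* Bytes saved by the narrower layout on the compiling platform. */
static size_t toy_bytes_saved(void)
{
	return sizeof(struct toy_packet_before) -
	       sizeof(struct toy_packet_after);
}
```

Most of the saving comes from the u32 members becoming u8; bool is
usually a single byte already, so switching it to u8 mainly makes the
width explicit.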


Re: [RFC] Stable interface index option

2015-12-01 Thread Hannes Frederic Sowa

Hello,

On Tue, Dec 1, 2015, at 23:43, Maximilian Wilhelm wrote:
> Anno domini 2015 Hannes Frederic Sowa scripsit:
> 
> > On Tue, Dec 1, 2015, at 20:27, David Miller wrote:
> > > From: Hannes Frederic Sowa 
> > > Date: Tue, 01 Dec 2015 17:02:23 +0100
> > > 
> > > > On Tue, Dec 1, 2015, at 16:50, Maximilian Wilhelm wrote:
> > > >> > I'm not sure I understand how this would work- are we going to 
> > > >> > pin down the ifindex for some subset of interfaces?
> > > >> 
> > > >> I'm not sure what your idea is, but I guess we might mean the same
> > > >> thing:
> > > >> 
> > > >> What I have in mind is that the user can supply a list of (ifname ->
> > > >> ifindex) entries via a sysfs/procfs interface and if such a list is
> > > >> present, the kernel will search the list for every ifname which is
> > > >> registered and check if there is an entry. If there is, the ifindex
> > > >> for this entry is used. If there is no entry found for the given
> > > >> ifname, the usual algorithm is used (therefore inherently providing
> > > >> backward compatibility).
> > > > 
> > > > Sorry to ask because I don't like this feature at all. There was a lot
> > > > of work on stable interface names. Why do you need stable ifindexes,
> > > > which were never meant to be stable for a longer amount of time?
> > > 
> > > Because all the remote SNMP tools work with interface indexes, not names.
> 
> That's indeed true and the underlying problem which brought us to this
> idea.

I really do understand the problem with SNMP tooling, but I hope that
monitoring software can just ignore the ifindex for the time being and
use it only as a way to walk the table. The authoritative identifier
should be the name; this is where a lot of work from the udev/systemd
folks went into making names stable, and that is already pretty hairy
and took a long time. Indexes are simply the way to walk SNMP tables;
names won't work in the SNMP design. But I don't see any reason why
monitoring software should use the ifindex as the key under which it
stores all subsequent interface statistics.

> > I know, but it should be terribly simple to patch SNMP tools to even
> > store the table of ifindex <-> name mappings persistently on the disk
> > and thus completely avoid this issue. Even though they can check on
> > interfaces if they have the same characteristics, e.g. tunnel to the
> > same destinations etc. Those are all policies which user space should
> > handle.
> 
> How should net-snmp handle cases where new interfaces come up on old
> and now unused numbers? What should it report? That would escalate the
> problem a lot IMHO.

ifindexes are only reused when the ifindex allocator wraps around which
should hopefully take a while and that is exactly my point.

In general, ifindexes are designed not to be reused quickly. Most
ifindex usage is in the socket layer, where one specifies which way a
packet should go in sendto/sendmsg calls, either to override routing
lookups or to use link-local addresses. Imagine an application looks up
an interface and determines the ifindex in order to send data to an IPv6
link-local address (which obviously needs the ifindex). If we don't bias
the ifindex selection at device creation time, the app will get an error
and won't race with other tunnels being set up, and it can handle that
accordingly, because new tunnels simply get new ifindexes until the
per-namespace counter wraps around. If we had name-based policies, we
would have to audit how user space applications select interface names
in order to protect them against reused interface names. Based on your
mail, you already ensure that interface names are unique, so your
monitoring software should just use those.
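
The "no quick reuse" behaviour can be modelled with a counter that only
moves forward and skips indexes still in use. This is a userspace
sketch, loosely following how the allocator is described in this mail
rather than quoting kernel code: the toy_* names are hypothetical, and
the index space is tiny so that the wrap-around is easy to see.

```c
#include <stdbool.h>

#define TOY_IFINDEX_MAX 8	/* tiny on purpose; the real counter is an int */

struct toy_netns {
	int last_ifindex;			/* last index handed out */
	bool in_use[TOY_IFINDEX_MAX + 1];	/* slot 0 is never used */
};

/*
 * Hand out the next free index after the last one, wrapping back to 1
 * at the top of the range. Returns 0 if every index is taken.
 */
static int toy_new_ifindex(struct toy_netns *ns)
{
	int ifindex = ns->last_ifindex;

	for (int tries = 0; tries < TOY_IFINDEX_MAX; tries++) {
		if (++ifindex > TOY_IFINDEX_MAX)
			ifindex = 1;

		if (!ns->in_use[ifindex]) {
			ns->in_use[ifindex] = true;
			ns->last_ifindex = ifindex;
			return ifindex;
		}
	}

	return 0;
}
```

After indexes 1 and 2 are handed out and 1 is released again, the next
caller gets 3, not 1. Index 1 only comes back once the counter has
wrapped, which is why a stale ifindex held by an application normally
fails instead of silently hitting a newer device.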

I simply see this feature being misused way too easily.

ip link ... index IDX was added to create devices in new netns for CRIU.
That does make sense, but installing policy in the kernel for interface
indexes seems too much to me.

> > I agree it would make life much easier for user space if the kernel
> > kept the ifindex stable over reboots etc., but at a much higher
> > cost in kernel maintenance.
> 
> What would that cost be in the implementation I sketched before?

I don't think there is a high performance cost. Only the device
allocation path would need to change, and that is not a fast path at all.

> I don't quite see what the higher cost would be. I currently can
> manually set an ifindex of my choosing for newly created GRE tunnels,
> vlan interfaces and the like. So what would be the difference of
> having the optional ability to push some of these predefined ifindexes
> into the kernel and don't bother while creating the interface and
> having the same outcome? Same effect but easier to use once set up.
> 
> Regarding the performance issues raised before the same question
> applies: What's the difference if I create some / a lot of interfaces
> with sparse ifindexes by using "ip link add foo index 1234" or by
> having a list within the kernel.
> 
> I still consider

Re: r8169 regression: UDP packets dropped intermittantly

2015-12-01 Thread Francois Romieu

Jonathan Woithe  :
[...]
> Any thoughts or progress at this stage?  Are there further tests you need me
> to do ?

Yes but you should expect two more days without signal.

-- 
Ueimor

Re: size overflow in function qdisc_tree_decrease_qlen net/sched/sch_api.c

2015-12-01 Thread Cong Wang

On Tue, Dec 1, 2015 at 2:33 PM, Eric Dumazet  wrote:
> Hmm... it looks like we have a much more serious bug :
>
> qdisc_lookup() calls qdisc_match_from_root(dev->qdisc, handle) without
> proper lock being held, so we might actually crash the host,
> if qdisc_tree_decrease_qlen() happens at the time qdiscs are changed.
>
> qdisc_tree_decrease_qlen() needs serious care :(

Convert qdisc list to RCU protected?

[PATCH net-next V3 05/17] hv_netvsc: Eliminate the data field from struct hv_netvsc_packet

2015-12-01 Thread K. Y. Srinivasan

Eliminate the data field from struct hv_netvsc_packet.

Signed-off-by: K. Y. Srinivasan 
Reviewed-by: Haiyang Zhang 
---
 drivers/net/hyperv/hyperv_net.h   |5 ++---
 drivers/net/hyperv/netvsc.c   |5 +++--
 drivers/net/hyperv/netvsc_drv.c   |3 ++-
 drivers/net/hyperv/rndis_filter.c |   11 +++
 4 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index 7fa4f43..506d552 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -148,9 +148,6 @@ struct hv_netvsc_packet {
u64 send_completion_tid;
void *send_completion_ctx;
void (*send_completion)(void *context);
-
-   /* Points to the send/receive buffer where the ethernet frame is */
-   void *data;
struct hv_page_buffer *page_buf;
 };
 
@@ -196,6 +193,7 @@ void netvsc_linkstatus_callback(struct hv_device 
*device_obj,
 void netvsc_xmit_completion(void *context);
 int netvsc_recv_callback(struct hv_device *device_obj,
struct hv_netvsc_packet *packet,
+   void **data,
struct ndis_tcp_ip_checksum_info *csum_info,
struct vmbus_channel *channel);
 void netvsc_channel_cb(void *context);
@@ -206,6 +204,7 @@ int rndis_filter_device_add(struct hv_device *dev,
 void rndis_filter_device_remove(struct hv_device *dev);
 int rndis_filter_receive(struct hv_device *dev,
struct hv_netvsc_packet *pkt,
+   void **data,
struct vmbus_channel *channel);
 
 int rndis_filter_set_packet_filter(struct rndis_device *dev, u32 new_filter);
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 2de9e7f..8fbf816 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -1008,6 +1008,7 @@ static void netvsc_receive(struct netvsc_device 
*net_device,
int i;
int count = 0;
struct net_device *ndev;
+   void *data;
 
ndev = net_device->ndev;
 
@@ -1047,13 +1048,13 @@ static void netvsc_receive(struct netvsc_device 
*net_device,
for (i = 0; i < count; i++) {
/* Initialize the netvsc packet */
netvsc_packet->status = NVSP_STAT_SUCCESS;
-   netvsc_packet->data = (void *)((unsigned long)net_device->
+   data = (void *)((unsigned long)net_device->
recv_buf + vmxferpage_packet->ranges[i].byte_offset);
netvsc_packet->total_data_buflen =
vmxferpage_packet->ranges[i].byte_count;
 
/* Pass it to the upper layer */
-   rndis_filter_receive(device, netvsc_packet, channel);
+   rndis_filter_receive(device, netvsc_packet, &data, channel);
 
if (netvsc_packet->status != NVSP_STAT_SUCCESS)
status = NVSP_STAT_FAIL;
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 77c0849..c73afb1 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -685,6 +685,7 @@ void netvsc_linkstatus_callback(struct hv_device 
*device_obj,
  */
 int netvsc_recv_callback(struct hv_device *device_obj,
struct hv_netvsc_packet *packet,
+   void **data,
struct ndis_tcp_ip_checksum_info *csum_info,
struct vmbus_channel *channel)
 {
@@ -713,7 +714,7 @@ int netvsc_recv_callback(struct hv_device *device_obj,
 * Copy to skb. This copy is needed here since the memory pointed by
 * hv_netvsc_packet cannot be deallocated
 */
-   memcpy(skb_put(skb, packet->total_data_buflen), packet->data,
+   memcpy(skb_put(skb, packet->total_data_buflen), *data,
packet->total_data_buflen);
 
skb->protocol = eth_type_trans(skb, net);
diff --git a/drivers/net/hyperv/rndis_filter.c 
b/drivers/net/hyperv/rndis_filter.c
index 63584e7..be0fa9c 100644
--- a/drivers/net/hyperv/rndis_filter.c
+++ b/drivers/net/hyperv/rndis_filter.c
@@ -351,6 +351,7 @@ static inline void *rndis_get_ppi(struct rndis_packet 
*rpkt, u32 type)
 static void rndis_filter_receive_data(struct rndis_device *dev,
   struct rndis_message *msg,
   struct hv_netvsc_packet *pkt,
+  void **data,
   struct vmbus_channel *channel)
 {
struct rndis_packet *rndis_pkt;
@@ -383,7 +384,7 @@ static void rndis_filter_receive_data(struct rndis_device 
*dev,
 * the data packet to the stack, without the rndis trailer padding
 */
pkt->total_data_buflen = rndis_pkt->data_len;
-   pkt->data = (void *)((unsigned long)pkt->data + data_offset);
+   *data =

[PATCH net-next V3 00/17] hv_netvsc: Eliminate the additional head room

2015-12-01 Thread K. Y. Srinivasan

In an attempt to avoid having to allocate memory on the send path, the netvsc
driver was requesting additional head room so that both rndis header and the
netvsc packet (the state that had to persist) could be placed in the skb.
Since the amount of head room requested was exceeding the default head room
as set in LL_MAX_HEADER, we were forcing a reallocation of skb.

With this patch-set, I have reduced the size of the netvsc packet to less
than 20 bytes and with this reduction we don't need to ask for any additional
headroom. We place the rndis header in the skb head room and we place the
netvsc packet in control buffer area in the skb.

V2:  - Addressed  review comments:
 - Eliminated more fields from netvsc packet structure.

V3:  - Fixed a typo in patch: hv_netvsc: Don't ask for additional head room in 
the skb.
 
K. Y. Srinivasan (15):
  hv_netvsc: Resize some of the variables in hv_netvsc_packet
  hv_netvsc: Rearrange the hv_negtvsc_packet to be space efficient
  hv_netvsc: Eliminate the channel field in hv_netvsc_packet structure
  hv_netvsc: Eliminate rndis_msg pointer from hv_netvsc_packet
structure
  hv_netvsc: Eliminatte the data field from struct hv_netvsc_packet
  hv_netvsc: Eliminate send_completion from struct hv_netvsc_packet
  hv_netvsc: Eliminate send_completion_ctx from struct hv_netvsc_packet
  hv_netvsc: Don't ask for additional head room in the skb
  hv_netvsc: Eliminate page_buf from struct hv_netvsc_packet
  hv_netvsc: Eliminate send_completion_tid from struct hv_netvsc_packet
  hv_netvsc: Eliminate is_data_pkt from struct hv_netvsc_packet
  hv_netvsc: Eliminate completion_func from struct hv_netvsc_packet
  hv_netvsc: Eliminate xmit_more from struct hv_netvsc_packet
  hv_netvsc: Eliminate status from struct hv_netvsc_packet
  hv_netvsc: Eliminate vlan_tci from struct hv_netvsc_packet

Vitaly Kuznetsov (2):
  hv_netvsc: move subchannel existence check to netvsc_select_queue()
  hv_netvsc: remove locking in netvsc_send()

 drivers/net/hyperv/hyperv_net.h   |   48 +++--
 drivers/net/hyperv/netvsc.c   |  102 ++---
 drivers/net/hyperv/netvsc_drv.c   |   93 +-
 drivers/net/hyperv/rndis_filter.c |   66 
 include/linux/netdevice.h |4 +-
 5 files changed, 138 insertions(+), 175 deletions(-)

-- 
1.7.4.1


[PATCH net-next V3 10/17] hv_netvsc: remove locking in netvsc_send()

2015-12-01 Thread K. Y. Srinivasan

From: Vitaly Kuznetsov 

Packet scheduler guarantees there won't be multiple senders for the same
queue and as we use q_idx for multi_send_data the spinlock is redundant.

Signed-off-by: Vitaly Kuznetsov 
Signed-off-by: K. Y. Srinivasan 
---
 drivers/net/hyperv/hyperv_net.h |1 -
 drivers/net/hyperv/netvsc.c |8 
 2 files changed, 0 insertions(+), 9 deletions(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index a9d2bdc5..f5b2145 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -633,7 +633,6 @@ struct nvsp_message {
 #define RNDIS_PKT_ALIGN_DEFAULT 8
 
 struct multi_send_data {
-   spinlock_t lock; /* protect struct multi_send_data */
struct hv_netvsc_packet *pkt; /* netvsc pkt pending */
u32 count; /* counter of batched packets */
 };
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 419b055..081f14f 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -38,7 +38,6 @@ static struct netvsc_device *alloc_net_device(struct 
hv_device *device)
 {
struct netvsc_device *net_device;
struct net_device *ndev = hv_get_drvdata(device);
-   int i;
 
net_device = kzalloc(sizeof(struct netvsc_device), GFP_KERNEL);
if (!net_device)
@@ -58,9 +57,6 @@ static struct netvsc_device *alloc_net_device(struct 
hv_device *device)
net_device->max_pkt = RNDIS_MAX_PKT_DEFAULT;
net_device->pkt_align = RNDIS_PKT_ALIGN_DEFAULT;
 
-   for (i = 0; i < num_online_cpus(); i++)
-   spin_lock_init(&net_device->msd[i].lock);
-
hv_set_drvdata(device, net_device);
return net_device;
 }
@@ -850,7 +846,6 @@ int netvsc_send(struct hv_device *device,
u16 q_idx = packet->q_idx;
u32 pktlen = packet->total_data_buflen, msd_len = 0;
unsigned int section_index = NETVSC_INVALID_INDEX;
-   unsigned long flag;
struct multi_send_data *msdp;
struct hv_netvsc_packet *msd_send = NULL, *cur_send = NULL;
bool try_batch;
@@ -867,7 +862,6 @@ int netvsc_send(struct hv_device *device,
msdp = &net_device->msd[q_idx];
 
/* batch packets in send buffer if possible */
-   spin_lock_irqsave(&msdp->lock, flag);
if (msdp->pkt)
msd_len = msdp->pkt->total_data_buflen;
 
@@ -927,8 +921,6 @@ int netvsc_send(struct hv_device *device,
cur_send = packet;
}
 
-   spin_unlock_irqrestore(&msdp->lock, flag);
-
if (msd_send) {
m_ret = netvsc_send_pkt(msd_send, net_device);
 
-- 
1.7.4.1


[PATCH net-next V3 08/17] hv_netvsc: Don't ask for additional head room in the skb

2015-12-01 Thread K. Y. Srinivasan

The rndis header is 116 bytes big and can be placed in the default
head room that will be available in the skb. Since the netvsc packet
is less than 48 bytes, we can use the skb control buffer
for the netvsc packet. With these changes we don't need to
ask for additional head room.
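
The size argument above can be sketched in userspace C. This is only an illustration under stated assumptions: `mock_skb`, `mock_netvsc_packet`, `MEMBER_SIZE`, `packet_store` and `packet_load` are hypothetical stand-ins, not the kernel definitions, and the field layout is invented. The 48-byte control buffer size is the figure quoted in the description.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct sk_buff: only the control buffer matters. */
struct mock_skb {
    unsigned char cb[48];   /* 48 bytes, the size quoted in the description */
};

/* Hypothetical, shrunken per-packet state; the fields are illustrative only. */
struct mock_netvsc_packet {
    unsigned char status;
    unsigned char xmit_more;
    unsigned short q_idx;
    unsigned int total_data_buflen;
    unsigned long long send_completion_tid;
};

/* Size of one struct member, usable in constant expressions. */
#define MEMBER_SIZE(type, member) (sizeof(((type *)0)->member))

/* Compilation fails if the packet state ever outgrows the control buffer. */
_Static_assert(sizeof(struct mock_netvsc_packet) <=
               MEMBER_SIZE(struct mock_skb, cb),
               "packet state must fit in the control buffer");

/* Copy the packet state into the control buffer. */
static void packet_store(struct mock_skb *skb,
                         const struct mock_netvsc_packet *pkt)
{
    memcpy(skb->cb, pkt, sizeof(*pkt));
}

/* Copy the packet state back out of the control buffer. */
static void packet_load(const struct mock_skb *skb,
                        struct mock_netvsc_packet *pkt)
{
    memcpy(pkt, skb->cb, sizeof(*pkt));
}
```

The sketch copies with memcpy() to stay within ISO C aliasing rules; the patch itself casts skb->cb directly. The compile-time assertion plays the same role as the BUILD_BUG_ON() added in the diff below.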

Signed-off-by: K. Y. Srinivasan 
Reviewed-by: Haiyang Zhang 
---
V2: When HYPERV_NET is configured, set LL_MAX_HEADER to 128 - Vitaly 
Kuznetsov 
V2: Add a build time check on the skb control buffer - Florian Westphal 

V3: Fix a typo - David Miller

 drivers/net/hyperv/hyperv_net.h |3 +++
 drivers/net/hyperv/netvsc_drv.c |   30 +++---
 include/linux/netdevice.h   |4 +++-
 3 files changed, 17 insertions(+), 20 deletions(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index 9504ca9..e15dc2c 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -124,6 +124,9 @@ struct ndis_tcp_ip_checksum_info;
 /*
  * Represent netvsc packet which contains 1 RNDIS and 1 ethernet frame
  * within the RNDIS
+ *
+ * The size of this structure is less than 48 bytes and we can now
+ * place this structure in the skb->cb field.
  */
 struct hv_netvsc_packet {
/* Bookkeeping stuff */
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 947b778..90cc8d9 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -432,7 +432,6 @@ static int netvsc_start_xmit(struct sk_buff *skb, struct 
net_device *net)
u32 net_trans_info;
u32 hash;
u32 skb_length;
-   u32 pkt_sz;
struct hv_page_buffer page_buf[MAX_PAGE_BUFFER_COUNT];
struct netvsc_stats *tx_stats = this_cpu_ptr(net_device_ctx->tx_stats);
 
@@ -460,16 +459,21 @@ check_size:
goto check_size;
}
 
-   pkt_sz = sizeof(struct hv_netvsc_packet) + RNDIS_AND_PPI_SIZE;
-
-   ret = skb_cow_head(skb, pkt_sz);
+   /*
+* Place the rndis header in the skb head room and
+* the skb->cb will be used for hv_netvsc_packet
+* structure.
+*/
+   ret = skb_cow_head(skb, RNDIS_AND_PPI_SIZE);
if (ret) {
netdev_err(net, "unable to alloc hv_netvsc_packet\n");
ret = -ENOMEM;
goto drop;
}
-   /* Use the headroom for building up the packet */
-   packet = (struct hv_netvsc_packet *)skb->head;
+   /* Use the skb control buffer for building up the packet */
+   BUILD_BUG_ON(sizeof(struct hv_netvsc_packet) >
+   FIELD_SIZEOF(struct sk_buff, cb));
+   packet = (struct hv_netvsc_packet *)skb->cb;
 
packet->status = 0;
packet->xmit_more = skb->xmit_more;
@@ -482,8 +486,7 @@ check_size:
packet->is_data_pkt = true;
packet->total_data_buflen = skb->len;
 
-   rndis_msg = (struct rndis_message *)((unsigned long)packet +
-   sizeof(struct hv_netvsc_packet));
+   rndis_msg = (struct rndis_message *)skb->head;
 
memset(rndis_msg, 0, RNDIS_AND_PPI_SIZE);
 
@@ -1071,16 +1074,12 @@ static int netvsc_probe(struct hv_device *dev,
struct netvsc_device_info device_info;
struct netvsc_device *nvdev;
int ret;
-   u32 max_needed_headroom;
 
net = alloc_etherdev_mq(sizeof(struct net_device_context),
num_online_cpus());
if (!net)
return -ENOMEM;
 
-   max_needed_headroom = sizeof(struct hv_netvsc_packet) +
- RNDIS_AND_PPI_SIZE;
-
netif_carrier_off(net);
 
net_device_ctx = netdev_priv(net);
@@ -1116,13 +1115,6 @@ static int netvsc_probe(struct hv_device *dev,
net->ethtool_ops = &ethtool_ops;
SET_NETDEV_DEV(net, &dev->device);
 
-   /*
-* Request additional head room in the skb.
-* We will use this space to build the rndis
-* heaser and other state we need to maintain.
-*/
-   net->needed_headroom = max_needed_headroom;
-
/* Notify the netvsc driver of the new device */
memset(&device_info, 0, sizeof(device_info));
device_info.ring_size = ring_size;
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index d208914..96975f8 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -132,7 +132,9 @@ static inline bool dev_xmit_complete(int rc)
  * used.
  */
 
-#if defined(CONFIG_WLAN) || IS_ENABLED(CONFIG_AX25)
+#if defined(CONFIG_HYPERV_NET)
+# define LL_MAX_HEADER 128
+#elif defined(CONFIG_WLAN) || IS_ENABLED(CONFIG_AX25)
 # if defined(CONFIG_MAC80211_MESH)
 #  define LL_MAX_HEADER 128
 # else
-- 
1.7.4.1


[PATCH net-next V3 04/17] hv_netvsc: Eliminate rndis_msg pointer from hv_netvsc_packet structure

2015-12-01 Thread K. Y. Srinivasan

Eliminate rndis_msg pointer from hv_netvsc_packet structure.

Signed-off-by: K. Y. Srinivasan 
Reviewed-by: Haiyang Zhang 
---
 drivers/net/hyperv/hyperv_net.h   |8 +++-
 drivers/net/hyperv/netvsc.c   |   10 ++
 drivers/net/hyperv/netvsc_drv.c   |7 +++
 drivers/net/hyperv/rndis_filter.c |2 +-
 4 files changed, 13 insertions(+), 14 deletions(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index ac24091..7fa4f43 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -149,10 +149,6 @@ struct hv_netvsc_packet {
void *send_completion_ctx;
void (*send_completion)(void *context);
 
-
-   /* This points to the memory after page_buf */
-   struct rndis_message *rndis_msg;
-
/* Points to the send/receive buffer where the ethernet frame is */
void *data;
struct hv_page_buffer *page_buf;
@@ -189,10 +185,12 @@ struct rndis_device {
 
 
 /* Interface */
+struct rndis_message;
 int netvsc_device_add(struct hv_device *device, void *additional_info);
 int netvsc_device_remove(struct hv_device *device);
 int netvsc_send(struct hv_device *device,
-   struct hv_netvsc_packet *packet);
+   struct hv_netvsc_packet *packet,
+   struct rndis_message *rndis_msg);
 void netvsc_linkstatus_callback(struct hv_device *device_obj,
struct rndis_message *resp);
 void netvsc_xmit_completion(void *context);
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 52533ed..2de9e7f 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -706,7 +706,8 @@ static u32 netvsc_get_next_send_section(struct 
netvsc_device *net_device)
 static u32 netvsc_copy_to_send_buf(struct netvsc_device *net_device,
   unsigned int section_index,
   u32 pend_size,
-  struct hv_netvsc_packet *packet)
+  struct hv_netvsc_packet *packet,
+  struct rndis_message *rndis_msg)
 {
char *start = net_device->send_buf;
char *dest = start + (section_index * net_device->send_section_size)
@@ -722,7 +723,7 @@ static u32 netvsc_copy_to_send_buf(struct netvsc_device 
*net_device,
if (packet->is_data_pkt && packet->xmit_more && remain &&
!packet->cp_partial) {
padding = net_device->pkt_align - remain;
-   packet->rndis_msg->msg_len += padding;
+   rndis_msg->msg_len += padding;
packet->total_data_buflen += padding;
}
 
@@ -841,7 +842,8 @@ static inline int netvsc_send_pkt(
 }
 
 int netvsc_send(struct hv_device *device,
-   struct hv_netvsc_packet *packet)
+   struct hv_netvsc_packet *packet,
+   struct rndis_message *rndis_msg)
 {
struct netvsc_device *net_device;
int ret = 0, m_ret = 0;
@@ -897,7 +899,7 @@ int netvsc_send(struct hv_device *device,
if (section_index != NETVSC_INVALID_INDEX) {
netvsc_copy_to_send_buf(net_device,
section_index, msd_len,
-   packet);
+   packet, rndis_msg);
 
packet->send_buf_index = section_index;
 
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index e5f4eec..77c0849 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -482,10 +482,10 @@ check_size:
packet->is_data_pkt = true;
packet->total_data_buflen = skb->len;
 
-   packet->rndis_msg = (struct rndis_message *)((unsigned long)packet +
+   rndis_msg = (struct rndis_message *)((unsigned long)packet +
sizeof(struct hv_netvsc_packet));
 
-   memset(packet->rndis_msg, 0, RNDIS_AND_PPI_SIZE);
+   memset(rndis_msg, 0, RNDIS_AND_PPI_SIZE);
 
/* Set the completion routine */
packet->send_completion = netvsc_xmit_completion;
@@ -495,7 +495,6 @@ check_size:
isvlan = packet->vlan_tci & VLAN_TAG_PRESENT;
 
/* Add the rndis header */
-   rndis_msg = packet->rndis_msg;
rndis_msg->ndis_msg_type = RNDIS_MSG_PACKET;
rndis_msg->msg_len = packet->total_data_buflen;
rndis_pkt = &rndis_msg->msg.pkt;
@@ -619,7 +618,7 @@ do_send:
packet->page_buf_cnt = init_page_array(rndis_msg, rndis_msg_size,
   skb, packet);
 
-   ret = netvsc_send(net_device_ctx->device_ctx, packet);
+   ret = netvsc_send(net_device_ctx->device_ctx, packet, rndis_msg);
 
 drop:
if (ret == 0) {
diff --git a/drivers/net/hyperv/rndis_filter.c 
b/drivers/net/hyperv/rndis_filter.c
index 1b04d78..63584e7 100644
--- a/drivers/net/hyperv/rndis_filter.c
+++

Re: [PATCH 00/13] mvneta Buffer Management and enhancements

2015-12-01 Thread Marcin Wojtas

Gregory,

Please apply the patch below:
http://pastebin.com/t42xyU3i
It will confirm whether there's an overflow of the CS0 size in your setup.
Please let me know.

So far the issue may have gone unnoticed, because every IO driver
that uses mvebu_mbus_dram_info to configure MBUS windows performs the
following subtraction:
(cs->size - 1) & 0xffff0000

I think there are two options:
1. Change size type to u64.
2. Change condition in mvebu_mbus_get_dram_win_info to:
if (cs->base <= phyaddr && phyaddr <= (cs->base + cs->size -1))
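
A minimal userspace sketch of the suspected overflow and of the two options above. Everything here is illustrative: `cs_info`, `size_as_u32`, `win_size_field` and both range checks are hypothetical helpers, the exclusive-bound check is only a comparison point and not necessarily the current kernel code, and the mask value is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed window-size mask; stands in for the mask used in the subtraction. */
#define WIN_SIZE_MASK 0xffff0000u

/* Hypothetical stand-in for one DRAM chip-select entry with a 32-bit size. */
struct cs_info {
    uint32_t base;
    uint32_t size;
};

/* Storing a 64-bit size in a u32 keeps only the low 32 bits: 4GB becomes 0. */
static uint32_t size_as_u32(uint64_t real_size)
{
    return (uint32_t)real_size;
}

/* (size - 1) & mask: a wrapped size of 0 yields the same value as a true 4GB. */
static uint32_t win_size_field(uint32_t size)
{
    return (size - 1) & WIN_SIZE_MASK;
}

/* Exclusive upper bound: never matches once size has wrapped to 0. */
static bool in_cs_exclusive(const struct cs_info *cs, uint32_t phyaddr)
{
    return cs->base <= phyaddr && phyaddr < cs->base + cs->size;
}

/* Option 2: inclusive upper bound; base + size - 1 wraps back to 0xffffffff. */
static bool in_cs_inclusive(const struct cs_info *cs, uint32_t phyaddr)
{
    return cs->base <= phyaddr && phyaddr <= (cs->base + cs->size - 1);
}
```

With base 0 and a size that wrapped to 0, the inclusive check accepts every 32-bit address while the exclusive one rejects them all. The masked subtraction gives the same result for the wrapped and the true size, which would explain why existing users of the structure did not notice.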

I'm looking forward to your information and opinion.

Best regards,
Marcin

2015-12-01 22:40 GMT+01:00 Marcin Wojtas :
> Hi Gregory,
>
> Thanks for the log. I think it may be a general problem with how a 4GB size
> is represented in the mvebu_mbus_dram_info structure. Maybe the whole DRAM
> space is associated with CS0, and the 4GB size (0x1 0000 0000) does not
> fit in a u32 variable?
>
> Best regards,
> Marcin
>
> 2015-12-01 14:12 GMT+01:00 Gregory CLEMENT 
> :
>> Hi Marcin,
>>
>>  On lun., nov. 30 2015, Marcin Wojtas  wrote:
>> [...]
>>>>> 5. Enable BM on Armada XP and 38X development boards - those ones and
>>>>> A370 I could check on my own. In all cases they survived night-long
>>>>> linerate iperf. Also tests were performed with A388 SoC working as a
>>>>> network bridge between two packet generators. They showed increase of
>>>>> maximum processed 64B packets by ~20k (~555k packets with BM enabled
>>>>> vs ~535k packets without BM). Also when pushing 1500B-packets with a
>>>>> line rate achieved, CPU load decreased from around 25% without BM to
>>>>> 18-20% with BM.
>>>>
>>>> I was trying to test the BM part of your series on the Armada XP GP
>>>> board. However it failed very quickly during the pool allocation. After
>>>> a first debug I found that the size of the cs used in the
>>>> mvebu_mbus_dram_info struct was 0. I have applied your series on a
>>>> v4.4-rc1 kernel. At this stage I don't know if it is a regression in the
>>>> mbus driver, a misconfiguration on my side or something else.
>>>>
>>>> Does it ring a bell for you?
>>>
>>> Frankly, I'm a bit surprised, I've never seen such problems on any of
>>> the boards (AXP-GP/DB, A38X-DB/GP/AP). Did mvebu_mbus_dram_win_info
>>> function exit with an error? Can you please apply below diff:
>>> http://pastebin.com/2ws1txWk
>>
>> Yes it exited with errors and I added the same kind of traces. It was how I
>> knew that the size was 0!
>>
>> I've just rebuilt a fresh kernel using mvebu_v7_defconfig and adding
>> your patch, I got the same issue (see the log at the end of the email.)
>>
>>
>> But the good news is that on the same kernel on Armada 388 GP the pool
>> allocation does not fail. I really suspect an issue with my u-boot.
>>
>>
>>> And send me a full log beginning from u-boot?
>>>

>>>> How do you test it exactly?
>>>> Especially on which kernel and with which U-Boot?

>>>
>>> I've just re-built the patchset I sent, which is on top of 4.4-rc1.
>>>
>>> I use AXP-GP, 78460 @ 1600MHz, 2GB DRAM, and everything works fine. My
>>> u-boot version: v2011.12 2014_T2.0_eng_dropv2.
>>
>> My config is AXP-GP, 78460 @ 1300MHz, 8GB DRAM (only 4GB are used
>> because I didn't activate LPAE), but the main difference is the U-Boot
>> version: v2011.12 2014_T2.eng_dropv1.ATAG-test02.
>>
>> Thanks,
>>
>> Gregory
>>
>>
>> [0.00] Booting Linux on physical CPU 0x0
>> [0.00] Linux version 4.4.0-rc1-00013-g76f111f9bdf8-dirty 
>> (gclement@FE-laptop) (gcc version 5.2.1 20151010 (Ubuntu 5.2.1-22ubuntu1) ) 
>> #1024 SMP Tue Dec 1 14:02:52 CET 2015
>> [0.00] CPU: ARMv7 Processor [562f5842] revision 2 (ARMv7), 
>> cr=10c5387d
>> [0.00] CPU: PIPT / VIPT nonaliasing data cache, PIPT instruction 
>> cache
>> [0.00] Machine model: Marvell Armada XP Development Board 
>> DB-MV784MP-GP
>> [0.00] Memory policy: Data cache writealloc
>> [0.00] PERCPU: Embedded 12 pages/cpu @ee1ac000 s18752 r8192 d22208 
>> u49152
>> [0.00] Built 1 zonelists in Zone order, mobility grouping on.  Total 
>> pages: 981504
>> [0.00] Kernel command line: console=ttyS0,115200 earlyprintk 
>> mvneta.rxq_def=2
>> [0.00] log_buf_len individual max cpu contribution: 4096 bytes
>> [0.00] log_buf_len total cpu_extra contributions: 12288 bytes
>> [0.00] log_buf_len min size: 16384 bytes
>> [0.00] log_buf_len: 32768 bytes
>> [0.00] early log buf free: 14924(91%)
>> [0.00] PID hash table entries: 4096 (order: 2, 16384 bytes)
>> [0.00] Dentry cache hash table entries: 131072 (order: 7, 524288 
>> bytes)
>> [0.00] Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
>> [0.00] Memory: 3888204K/3932160K available (5576K kernel code, 251K 
>> rwdata, 1544K rodata, 4460K init, 207K bss, 43956K reserved, 0K 
>> cma-reserved, 3145728K highmem)
>> [0.00] Virtual kernel memory layout:
>>

Re: [PATCH 1/2 net-next] ravb: Add fallback compatibility strings

2015-12-01 Thread Simon Horman

On Tue, Dec 01, 2015 at 09:42:52AM +0100, Geert Uytterhoeven wrote:
> On Tue, Dec 1, 2015 at 8:43 AM, Simon Horman  
> wrote:
> > Add fallback compatibility strings for R-Car Gen 2 & 3 SoC Families.
> > This is in keeping with the fallback scheme being adopted wherever
> > appropriate for drivers for Renesas SoCs.
> >
> > Also correct typo.
> >
> > Signed-off-by: Simon Horman 
> > ---
> >  Documentation/devicetree/bindings/net/renesas,ravb.txt | 11 +--
> >  drivers/net/ethernet/renesas/ravb_main.c   |  2 ++
> >  2 files changed, 11 insertions(+), 2 deletions(-)
> >
> > diff --git a/Documentation/devicetree/bindings/net/renesas,ravb.txt 
> > b/Documentation/devicetree/bindings/net/renesas,ravb.txt
> > index b486f3f5f6a3..115006325bff 100644
> > --- a/Documentation/devicetree/bindings/net/renesas,ravb.txt
> > +++ b/Documentation/devicetree/bindings/net/renesas,ravb.txt
> > @@ -1,12 +1,19 @@
> >  * Renesas Electronics Ethernet AVB
> >
> >  This file provides information on what the device node for the Ethernet AVB
> > -interface contains.
> > +interface.
> >
> >  Required properties:
> >  - compatible: "renesas,etheravb-r8a7790" if the device is a part of 
> > R8A7790 SoC.
> >   "renesas,etheravb-r8a7794" if the device is a part of R8A7794 
> > SoC.
> >   "renesas,etheravb-r8a7795" if the device is a part of R8A7795 
> > SoC.
> > + "renesas,etheravb-gen2" for generic R-Car Gen 2 compatible 
> > interface.
> > + "renesas,etheravb-gen3" for generic R-Car Gen 3 compatible 
> > interface.
> > +
> 
> (Same comment, different audience)
> 
> Shouldn't that be "renesas,etheravb-rcar-gen" or "renesas,etheravb-rcar2"?
> 
> Else you'll be in trouble when Renesas starts focussing on airplanes
> (R-Plane Gen2), rockets (R-Rocket Gen2), or IoT (R-IoT Gen2).

Sure, let's go with:

renesas,etheravb-rcar-gen2 and renesas,etheravb-rcar-gen3.

Re: size overflow in function qdisc_tree_decrease_qlen net/sched/sch_api.c

2015-12-01 Thread Cong Wang

On Tue, Dec 1, 2015 at 12:06 PM, Eric Dumazet  wrote:
> On Tue, 2015-12-01 at 11:17 -0800, Cong Wang wrote:
>> On Tue, Dec 1, 2015 at 11:09 AM, Eric Dumazet  wrote:
>> > On Tue, 2015-12-01 at 10:43 -0800, Cong Wang wrote:
>> >
>> >> This smells hacky... Another way to fix this is to hold the qdisc tree
>> >> lock in mq_dump(), since it is not a hot path (comparing with
>> >> enqueue/dequeue)?
>> >
>> > Really ? Which qdisc tree lock will protect you exactly ???
>> >
>> > Whole point of MQ is that each TX queue has its own lock.
>> >
>> > So multiple cpus can call qdisc_tree_decrease_qlen() at the same time,
>> > holding their own lock.
>> >
>> > Clearly modifying mq 'data' is wrong.
>>
>> Ah, yeah, but mq _seems_ also the only one who modifies sch->q.qlen
>> in ->dump(), which is the root cause of this bug. I am wondering if it should
>> just compute the qlen and return it without modifying sch->q.qlen.
>
> Sure, but then we still would get PAX underflows warnings ...
>
> Also need to take care of sch->qstats.drops += count;
>
> Also that would require a change of ->dump() api, since tc_fill_qdisc()
> does :
>
> if (q->ops->dump && q->ops->dump(q, skb) < 0)
> goto nla_put_failure;
> qlen = q->q.qlen;
>
> Not sure it is worth the pain, changing signature of all ->dump()
> handlers...

Yeah, I am fully aware of that. Your patch is a quick fix; I was trying
to see if there is any long-term fix for this.

>
>
> What about adding TCQ_F_NOPARENT and then :

This seems equivalent to your fix since TCQ_F_MQROOT implies
no parent:

if (sch->parent != TC_H_ROOT)
return -EOPNOTSUPP;

Again, your patch is fine; I just want to check whether there is a better fix.

Thanks.

[PATCH net-next v2 0/5] ila: Optimization to preserve value of early demux

2015-12-01 Thread Tom Herbert

In the current implementation of ILA, LWT is used to perform
translation on both the input and output paths. This is functional,
however there is a big performance hit in the receive path. Early
demux occurs before the routing lookup (a hit actually obviates the
route lookup). Therefore the stack currently performs early
demux before translation so that a local connection with ILA
addresses is never matched. Note that this issue is not just
with ILA, but pretty much any translated or encapsulated packet
handled by LWT would miss the opportunity for early demux. Solving
the general problem seems nontrivial since we would need to move
the route lookup before early demux, thereby diminishing its value.

This patch set addresses the issue for ILA by adding a fast locator
lookup that occurs before early demux. This is done by setting an iptables
rule in PREROUTING. Something like:

ip6tables -t mangle -A PREROUTING --dst 2001:0:0:33::/64 -j ILAIN

For the backend we implement an rhashtable that contains identifier
to locator mappings. The table also allows more specific matches
that include original locator and interface.

This patch set:
 - Add an rhashtable function to atomically replace an element.
   This is useful to implement sub-trees from a table entry
   without needing to use a special anchor structure as the
   table entry.
 - Add a start callback for starting a netlink dump.
 - Create an ila directory under net/ipv6 and move ila.c to it.
   ila.c is split into ila_common.c and ila_lwt.c.
 - Implement a table to do identifier->locator mapping. This is
   an rhashtable (in ila_xlat.c).
 - Configuration for the table with netlink.
 - Add ILAIN and ILAOUT targets which call into the ILA module

Changes from v1:
 - Use iptables targets instead of a new xfrm function

Testing:
   Running 200 netperf TCP_RR streams

No ILA, baseline
   79.26% CPU utilization
   1678282 tps
   104/189/390 50/90/99% latencies

ILA before fix (LWT on both input and output)
   81.91% CPU utilization
   1464723 tps (-14.5% from baseline)
   121/215/411 50/90/99% latencies

ILA after fix (PREROUTING ILAIN target for input)
   80.41% CPU utilization
   1577483 tps (-6.3% from baseline)
   113/203/393 50/90/99% latencies
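
The relative throughput figures can be rechecked from the tps numbers above. A minimal sketch, assuming a hypothetical helper `pct_change`: measured against the baseline, the gaps come out at roughly 12.7% and 6.0%, while the quoted 14.5% and 6.3% correspond to the same differences measured against the ILA results.

```c
/* Percentage change going from `from` to `to`, relative to `from`. */
static double pct_change(double from, double to)
{
    return (to - from) / from * 100.0;
}
```

For example, pct_change(1678282, 1464723) gives the drop relative to the baseline, and pct_change(1464723, 1678282) gives the gap relative to the ILA-before-fix result.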


Tom Herbert (5):
  ila: Create net/ipv6/ila directory
  rhashtable: add function to replace an element
  netlink: add a start callback for starting a netlink dump
  ila: Add generic ILA translation facility
  net: ILA iptables target

 include/linux/netlink.h|   2 +
 include/linux/rhashtable.h |  82 ++
 include/net/genetlink.h|   2 +
 include/net/ila.h  |  18 ++
 include/uapi/linux/ila.h   |  22 ++
 net/ipv6/Makefile  |   2 +-
 net/ipv6/ila.c | 229 
 net/ipv6/ila/Makefile  |   7 +
 net/ipv6/ila/ila.h |  48 
 net/ipv6/ila/ila_common.c  | 103 
 net/ipv6/ila/ila_lwt.c | 152 +++
 net/ipv6/ila/ila_xlat.c| 645 +
 net/netfilter/Kconfig  |  12 +
 net/netfilter/Makefile |   1 +
 net/netfilter/xt_ILA.c |  82 ++
 net/netlink/af_netlink.c   |   4 +
 net/netlink/genetlink.c|  16 ++
 17 files changed, 1197 insertions(+), 230 deletions(-)
 create mode 100644 include/net/ila.h
 delete mode 100644 net/ipv6/ila.c
 create mode 100644 net/ipv6/ila/Makefile
 create mode 100644 net/ipv6/ila/ila.h
 create mode 100644 net/ipv6/ila/ila_common.c
 create mode 100644 net/ipv6/ila/ila_lwt.c
 create mode 100644 net/ipv6/ila/ila_xlat.c
 create mode 100644 net/netfilter/xt_ILA.c

-- 
2.4.6


[PATCH net-next V3 07/17] hv_netvsc: Eliminate send_completion_ctx from struct hv_netvsc_packet

2015-12-01 Thread K. Y. Srinivasan

Eliminate send_completion_ctx from struct hv_netvsc_packet.

Signed-off-by: K. Y. Srinivasan 
Reviewed-by: Haiyang Zhang 
---
 drivers/net/hyperv/hyperv_net.h |1 -
 drivers/net/hyperv/netvsc.c |3 +--
 drivers/net/hyperv/netvsc_drv.c |1 -
 3 files changed, 1 insertions(+), 4 deletions(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index 9a3c972..9504ca9 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -146,7 +146,6 @@ struct hv_netvsc_packet {
 
 
u64 send_completion_tid;
-   void *send_completion_ctx;
struct hv_page_buffer *page_buf;
 };
 
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 34c16d1..0e0b723 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -653,8 +653,7 @@ static void netvsc_send_completion(struct netvsc_device 
*net_device,
netvsc_free_send_slot(net_device, send_index);
q_idx = nvsc_packet->q_idx;
channel = incoming_channel;
-   netvsc_xmit_completion(nvsc_packet->
-  send_completion_ctx);
+   netvsc_xmit_completion(nvsc_packet);
}
 
num_outstanding_sends =
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 6d71a1e..947b778 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -489,7 +489,6 @@ check_size:
 
/* Set the completion routine */
packet->completion_func = 1;
-   packet->send_completion_ctx = packet;
packet->send_completion_tid = (unsigned long)skb;
 
isvlan = packet->vlan_tci & VLAN_TAG_PRESENT;
-- 
1.7.4.1

[PATCH net-next v2 2/5] rhashtable: add function to replace an element

2015-12-01 Thread Tom Herbert

Add the rhashtable_replace_fast function. This replaces one object in
the table with another atomically. The hashes of the new and old objects
must be equal.

Signed-off-by: Tom Herbert 
---
 include/linux/rhashtable.h | 82 ++
 1 file changed, 82 insertions(+)

diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index 843ceca..77deece 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -819,4 +819,86 @@ out:
return err;
 }
 
+/* Internal function, please use rhashtable_replace_fast() instead */
+static inline int __rhashtable_replace_fast(
+   struct rhashtable *ht, struct bucket_table *tbl,
+   struct rhash_head *obj_old, struct rhash_head *obj_new,
+   const struct rhashtable_params params)
+{
+   struct rhash_head __rcu **pprev;
+   struct rhash_head *he;
+   spinlock_t *lock;
+   unsigned int hash;
+   int err = -ENOENT;
+
+   /* Minimally, the old and new objects must have same hash
+* (which should mean identifiers are the same).
+*/
+   hash = rht_head_hashfn(ht, tbl, obj_old, params);
+   if (hash != rht_head_hashfn(ht, tbl, obj_new, params))
+   return -EINVAL;
+
+   lock = rht_bucket_lock(tbl, hash);
+
+   spin_lock_bh(lock);
+
+   pprev = &tbl->buckets[hash];
+   rht_for_each(he, tbl, hash) {
+   if (he != obj_old) {
+   pprev = &he->next;
+   continue;
+   }
+
+   rcu_assign_pointer(obj_new->next, obj_old->next);
+   rcu_assign_pointer(*pprev, obj_new);
+   err = 0;
+   break;
+   }
+
+   spin_unlock_bh(lock);
+
+   return err;
+}
+
+/**
+ * rhashtable_replace_fast - replace an object in hash table
+ * @ht:hash table
+ * @obj_old:   pointer to hash head inside object being replaced
+ * @obj_new:   pointer to hash head inside object which is new
+ * @params:hash table parameters
+ *
+ * Replacing an object doesn't affect the number of elements in the hash table
+ * or bucket, so we don't need to worry about shrinking or expanding the
+ * table here.
+ *
+ * Returns zero on success, -ENOENT if the entry could not be found,
+ * -EINVAL if hash is not the same for the old and new objects.
+ */
+static inline int rhashtable_replace_fast(
+   struct rhashtable *ht, struct rhash_head *obj_old,
+   struct rhash_head *obj_new,
+   const struct rhashtable_params params)
+{
+   struct bucket_table *tbl;
+   int err;
+
+   rcu_read_lock();
+
+   tbl = rht_dereference_rcu(ht->tbl, ht);
+
+   /* Because we have already taken (and released) the bucket
+* lock in old_tbl, if we find that future_tbl is not yet
+* visible then that guarantees the entry to still be in
+* the old tbl if it exists.
+*/
+   while ((err = __rhashtable_replace_fast(ht, tbl, obj_old,
+   obj_new, params)) &&
+  (tbl = rht_dereference_rcu(tbl->future_tbl, ht)))
+   ;
+
+   rcu_read_unlock();
+
+   return err;
+}
+
 #endif /* _LINUX_RHASHTABLE_H */
-- 
2.4.6
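
The replace operation described in the patch above can be sketched in plain userspace C. This is only an illustration under simplifying assumptions: it is single-threaded, takes no bucket lock and uses no RCU, and struct node, struct table, hash_key(), table_insert() and table_replace() are hypothetical names, not part of the kernel's rhashtable API. It keeps the same contract as the patch: -EINVAL when the two objects hash differently, -ENOENT when the old object is not in the chain, 0 on success.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; not the kernel's rhashtable types. */
struct node {
	struct node *next;
	unsigned int key;
};

#define NBUCKETS 8u

struct table {
	struct node *buckets[NBUCKETS];
};

static unsigned int hash_key(unsigned int key)
{
	return key % NBUCKETS;
}

/* Push a node onto the head of its bucket chain. */
static void table_insert(struct table *t, struct node *n)
{
	unsigned int h = hash_key(n->key);

	n->next = t->buckets[h];
	t->buckets[h] = n;
}

/*
 * Swap obj_old for obj_new in place. Returns 0 on success, -EINVAL if
 * the two objects hash to different buckets, -ENOENT if obj_old is not
 * found in its chain.
 */
static int table_replace(struct table *t, struct node *obj_old,
			 struct node *obj_new)
{
	struct node **pprev;
	unsigned int h;

	assert(t && obj_old && obj_new);

	h = hash_key(obj_old->key);
	if (h != hash_key(obj_new->key))
		return -EINVAL;

	for (pprev = &t->buckets[h]; *pprev; pprev = &(*pprev)->next) {
		if (*pprev != obj_old)
			continue;
		/* Link the new node to the successor first, then publish it. */
		obj_new->next = obj_old->next;
		*pprev = obj_new;
		return 0;
	}

	return -ENOENT;
}
```

The order of the two stores mirrors the two rcu_assign_pointer() calls in the patch: the new node points at the old node's successor before the predecessor is switched over, so a reader walking the chain never sees a dangling link.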

[PATCH net-next V3 17/17] hv_netvsc: Eliminate vlan_tci from struct hv_netvsc_packet

2015-12-01 Thread K. Y. Srinivasan

Eliminate vlan_tci from struct hv_netvsc_packet.

Signed-off-by: K. Y. Srinivasan 
---
 drivers/net/hyperv/hyperv_net.h   |4 ++--
 drivers/net/hyperv/netvsc_drv.c   |   14 +++---
 drivers/net/hyperv/rndis_filter.c |7 +++
 3 files changed, 12 insertions(+), 13 deletions(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index 128b296..0c04362 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -136,7 +136,6 @@ struct hv_netvsc_packet {
u8 rmsg_pgcnt; /* page count of RNDIS header and PPI */
u8 page_buf_cnt;
 
-   u16 vlan_tci;
u16 q_idx;
u32 send_buf_index;
 
@@ -188,7 +187,8 @@ int netvsc_recv_callback(struct hv_device *device_obj,
struct hv_netvsc_packet *packet,
void **data,
struct ndis_tcp_ip_checksum_info *csum_info,
-   struct vmbus_channel *channel);
+   struct vmbus_channel *channel,
+   u16 vlan_tci);
 void netvsc_channel_cb(void *context);
 int rndis_filter_open(struct hv_device *dev);
 int rndis_filter_close(struct hv_device *dev);
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 490672e..ba21272 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -470,7 +470,6 @@ check_size:
FIELD_SIZEOF(struct sk_buff, cb));
packet = (struct hv_netvsc_packet *)skb->cb;
 
-   packet->vlan_tci = skb->vlan_tci;
 
packet->q_idx = skb_get_queue_mapping(skb);
 
@@ -480,7 +479,7 @@ check_size:
 
memset(rndis_msg, 0, RNDIS_AND_PPI_SIZE);
 
-   isvlan = packet->vlan_tci & VLAN_TAG_PRESENT;
+   isvlan = skb->vlan_tci & VLAN_TAG_PRESENT;
 
/* Add the rndis header */
rndis_msg->ndis_msg_type = RNDIS_MSG_PACKET;
@@ -508,8 +507,8 @@ check_size:
IEEE_8021Q_INFO);
vlan = (struct ndis_pkt_8021q_info *)((void *)ppi +
ppi->ppi_offset);
-   vlan->vlanid = packet->vlan_tci & VLAN_VID_MASK;
-   vlan->pri = (packet->vlan_tci & VLAN_PRIO_MASK) >>
+   vlan->vlanid = skb->vlan_tci & VLAN_VID_MASK;
+   vlan->pri = (skb->vlan_tci & VLAN_PRIO_MASK) >>
VLAN_PRIO_SHIFT;
}
 
@@ -676,7 +675,8 @@ int netvsc_recv_callback(struct hv_device *device_obj,
struct hv_netvsc_packet *packet,
void **data,
struct ndis_tcp_ip_checksum_info *csum_info,
-   struct vmbus_channel *channel)
+   struct vmbus_channel *channel,
+   u16 vlan_tci)
 {
struct net_device *net;
struct net_device_context *net_device_ctx;
@@ -716,9 +716,9 @@ int netvsc_recv_callback(struct hv_device *device_obj,
skb->ip_summed = CHECKSUM_NONE;
}
 
-   if (packet->vlan_tci & VLAN_TAG_PRESENT)
+   if (vlan_tci & VLAN_TAG_PRESENT)
__vlan_hwaccel_put_tag(skb, htons(ETH_P_8021Q),
-  packet->vlan_tci);
+  vlan_tci);
 
skb_record_rx_queue(skb, channel->
offermsg.offer.sub_channel_index);
diff --git a/drivers/net/hyperv/rndis_filter.c 
b/drivers/net/hyperv/rndis_filter.c
index 28adf6a..a37bbda 100644
--- a/drivers/net/hyperv/rndis_filter.c
+++ b/drivers/net/hyperv/rndis_filter.c
@@ -354,6 +354,7 @@ static int rndis_filter_receive_data(struct rndis_device 
*dev,
u32 data_offset;
struct ndis_pkt_8021q_info *vlan;
struct ndis_tcp_ip_checksum_info *csum_info;
+   u16 vlan_tci = 0;
 
rndis_pkt = &msg->msg.pkt;
 
@@ -384,15 +385,13 @@ static int rndis_filter_receive_data(struct rndis_device 
*dev,
 
vlan = rndis_get_ppi(rndis_pkt, IEEE_8021Q_INFO);
if (vlan) {
-   pkt->vlan_tci = VLAN_TAG_PRESENT | vlan->vlanid |
+   vlan_tci = VLAN_TAG_PRESENT | vlan->vlanid |
(vlan->pri << VLAN_PRIO_SHIFT);
-   } else {
-   pkt->vlan_tci = 0;
}
 
csum_info = rndis_get_ppi(rndis_pkt, TCPIP_CHKSUM_PKTINFO);
return netvsc_recv_callback(dev->net_dev->dev, pkt, data,
-   csum_info, channel);
+   csum_info, channel, vlan_tci);
 }
 
 int rndis_filter_receive(struct hv_device *dev,
-- 
1.7.4.1
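
The TCI handling that this patch moves from the packet structure to skb->vlan_tci and a function argument can be sketched as follows. The mask and shift values are the ones I believe include/linux/if_vlan.h defined at the time of this series (treat them as an assumption); the helper names vlan_tci_pack(), vlan_tci_vid(), vlan_tci_pri() and vlan_tci_present() are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to match include/linux/if_vlan.h of that era. */
#define VLAN_PRIO_MASK   0xe000
#define VLAN_PRIO_SHIFT  13
#define VLAN_CFI_MASK    0x1000
#define VLAN_TAG_PRESENT VLAN_CFI_MASK
#define VLAN_VID_MASK    0x0fff

/* Receive side: build a TCI from the VLAN id and priority in the PPI. */
static uint16_t vlan_tci_pack(uint16_t vlanid, uint8_t pri)
{
	assert(vlanid <= VLAN_VID_MASK && pri <= 7);
	return (uint16_t)(VLAN_TAG_PRESENT | vlanid | (pri << VLAN_PRIO_SHIFT));
}

/* Transmit side: split the TCI back into the fields the PPI needs. */
static uint16_t vlan_tci_vid(uint16_t tci)
{
	return tci & VLAN_VID_MASK;
}

static uint8_t vlan_tci_pri(uint16_t tci)
{
	return (uint8_t)((tci & VLAN_PRIO_MASK) >> VLAN_PRIO_SHIFT);
}

static int vlan_tci_present(uint16_t tci)
{
	return (tci & VLAN_TAG_PRESENT) != 0;
}
```

Since the whole value fits in a u16 and is only needed for the duration of one call, passing it as an argument instead of storing it in struct hv_netvsc_packet loses nothing.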

[PATCH net-next V3 13/17] hv_netvsc: Eliminate is_data_pkt from struct hv_netvsc_packet

2015-12-01 Thread K. Y. Srinivasan

Eliminate is_data_pkt from struct hv_netvsc_packet.

Signed-off-by: K. Y. Srinivasan 
---
 drivers/net/hyperv/hyperv_net.h   |1 -
 drivers/net/hyperv/netvsc.c   |   14 --
 drivers/net/hyperv/netvsc_drv.c   |1 -
 drivers/net/hyperv/rndis_filter.c |1 -
 4 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index baa40b1..65e340e 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -131,7 +131,6 @@ struct ndis_tcp_ip_checksum_info;
 struct hv_netvsc_packet {
/* Bookkeeping stuff */
u8 status;
-   u8 is_data_pkt;
u8 xmit_more; /* from skb */
u8 cp_partial; /* partial copy into send buffer */
 
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index d18e10c..11b009e 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -704,12 +704,14 @@ static u32 netvsc_copy_to_send_buf(struct netvsc_device 
*net_device,
   u32 pend_size,
   struct hv_netvsc_packet *packet,
   struct rndis_message *rndis_msg,
-  struct hv_page_buffer **pb)
+  struct hv_page_buffer **pb,
+  struct sk_buff *skb)
 {
char *start = net_device->send_buf;
char *dest = start + (section_index * net_device->send_section_size)
 + pend_size;
int i;
+   bool is_data_pkt = (skb != NULL) ? true : false;
u32 msg_size = 0;
u32 padding = 0;
u32 remain = packet->total_data_buflen % net_device->pkt_align;
@@ -717,7 +719,7 @@ static u32 netvsc_copy_to_send_buf(struct netvsc_device 
*net_device,
packet->page_buf_cnt;
 
/* Add padding */
-   if (packet->is_data_pkt && packet->xmit_more && remain &&
+   if (is_data_pkt && packet->xmit_more && remain &&
!packet->cp_partial) {
padding = net_device->pkt_align - remain;
rndis_msg->msg_len += padding;
@@ -758,7 +760,7 @@ static inline int netvsc_send_pkt(
u32 ring_avail = hv_ringbuf_avail_percent(&out_channel->outbound);
 
nvmsg.hdr.msg_type = NVSP_MSG1_TYPE_SEND_RNDIS_PKT;
-   if (packet->is_data_pkt) {
+   if (skb != NULL) {
/* 0 is RMC_DATA; */
nvmsg.msg.v1_msg.send_rndis_pkt.channel_type = 0;
} else {
@@ -868,7 +870,7 @@ int netvsc_send(struct hv_device *device,
if (msdp->pkt)
msd_len = msdp->pkt->total_data_buflen;
 
-   try_batch = packet->is_data_pkt && msd_len > 0 && msdp->count <
+   try_batch = (skb != NULL) && msd_len > 0 && msdp->count <
net_device->max_pkt;
 
if (try_batch && msd_len + pktlen + net_device->pkt_align <
@@ -880,7 +882,7 @@ int netvsc_send(struct hv_device *device,
section_index = msdp->pkt->send_buf_index;
packet->cp_partial = true;
 
-   } else if (packet->is_data_pkt && pktlen + net_device->pkt_align <
+   } else if ((skb != NULL) && pktlen + net_device->pkt_align <
   net_device->send_section_size) {
section_index = netvsc_get_next_send_section(net_device);
if (section_index != NETVSC_INVALID_INDEX) {
@@ -894,7 +896,7 @@ int netvsc_send(struct hv_device *device,
if (section_index != NETVSC_INVALID_INDEX) {
netvsc_copy_to_send_buf(net_device,
section_index, msd_len,
-   packet, rndis_msg, pb);
+   packet, rndis_msg, pb, skb);
 
packet->send_buf_index = section_index;
 
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 1532ae4..eb0c6fa 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -477,7 +477,6 @@ check_size:
 
packet->q_idx = skb_get_queue_mapping(skb);
 
-   packet->is_data_pkt = true;
packet->total_data_buflen = skb->len;
 
rndis_msg = (struct rndis_message *)skb->head;
diff --git a/drivers/net/hyperv/rndis_filter.c 
b/drivers/net/hyperv/rndis_filter.c
index 53139f7..0b98674 100644
--- a/drivers/net/hyperv/rndis_filter.c
+++ b/drivers/net/hyperv/rndis_filter.c
@@ -215,7 +215,6 @@ static int rndis_filter_send_request(struct rndis_device 
*dev,
/* Setup the packet to send it */
packet = &req->pkt;
 
-   packet->is_data_pkt = false;
packet->total_data_buflen = req->request_msg.msg_len;
packet->page_buf_cnt = 1;
 
-- 
1.7.4.1

[PATCH net-next V3 12/17] hv_netvsc: Eliminate send_completion_tid from struct hv_netvsc_packet

2015-12-01 Thread K. Y. Srinivasan

Eliminate send_completion_tid from struct hv_netvsc_packet.

Signed-off-by: K. Y. Srinivasan 
---
 drivers/net/hyperv/hyperv_net.h   |8 ++--
 drivers/net/hyperv/netvsc.c   |   28 ++--
 drivers/net/hyperv/netvsc_drv.c   |   14 ++
 drivers/net/hyperv/rndis_filter.c |2 +-
 4 files changed, 19 insertions(+), 33 deletions(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index b541455..baa40b1 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -145,10 +145,6 @@ struct hv_netvsc_packet {
u32 send_buf_index;
 
u32 total_data_buflen;
-   u32 pad1;
-
-
-   u64 send_completion_tid;
 };
 
 struct netvsc_device_info {
@@ -188,10 +184,10 @@ int netvsc_device_remove(struct hv_device *device);
 int netvsc_send(struct hv_device *device,
struct hv_netvsc_packet *packet,
struct rndis_message *rndis_msg,
-   struct hv_page_buffer **page_buffer);
+   struct hv_page_buffer **page_buffer,
+   struct sk_buff *skb);
 void netvsc_linkstatus_callback(struct hv_device *device_obj,
struct rndis_message *resp);
-void netvsc_xmit_completion(void *context);
 int netvsc_recv_callback(struct hv_device *device_obj,
struct hv_netvsc_packet *packet,
void **data,
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 18058a59..d18e10c 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -614,6 +614,7 @@ static void netvsc_send_completion(struct netvsc_device 
*net_device,
struct hv_netvsc_packet *nvsc_packet;
struct net_device *ndev;
u32 send_index;
+   struct sk_buff *skb;
 
ndev = net_device->ndev;
 
@@ -639,17 +640,17 @@ static void netvsc_send_completion(struct netvsc_device 
*net_device,
int queue_sends;
 
/* Get the send context */
-   nvsc_packet = (struct hv_netvsc_packet *)(unsigned long)
-   packet->trans_id;
+   skb = (struct sk_buff *)(unsigned long)packet->trans_id;
 
/* Notify the layer above us */
-   if (nvsc_packet) {
+   if (skb) {
+   nvsc_packet = (struct hv_netvsc_packet *) skb->cb;
send_index = nvsc_packet->send_buf_index;
if (send_index != NETVSC_INVALID_INDEX)
netvsc_free_send_slot(net_device, send_index);
q_idx = nvsc_packet->q_idx;
channel = incoming_channel;
-   netvsc_xmit_completion(nvsc_packet);
+   dev_kfree_skb_any(skb);
}
 
num_outstanding_sends =
@@ -744,7 +745,8 @@ static u32 netvsc_copy_to_send_buf(struct netvsc_device 
*net_device,
 static inline int netvsc_send_pkt(
struct hv_netvsc_packet *packet,
struct netvsc_device *net_device,
-   struct hv_page_buffer **pb)
+   struct hv_page_buffer **pb,
+   struct sk_buff *skb)
 {
struct nvsp_message nvmsg;
u16 q_idx = packet->q_idx;
@@ -772,10 +774,7 @@ static inline int netvsc_send_pkt(
nvmsg.msg.v1_msg.send_rndis_pkt.send_buf_section_size =
packet->total_data_buflen;
 
-   if (packet->completion_func)
-   req_id = (ulong)packet;
-   else
-   req_id = 0;
+   req_id = (ulong)skb;
 
if (out_channel->rescind)
return -ENODEV;
@@ -841,7 +840,8 @@ static inline int netvsc_send_pkt(
 int netvsc_send(struct hv_device *device,
struct hv_netvsc_packet *packet,
struct rndis_message *rndis_msg,
-   struct hv_page_buffer **pb)
+   struct hv_page_buffer **pb,
+   struct sk_buff *skb)
 {
struct netvsc_device *net_device;
int ret = 0, m_ret = 0;
@@ -907,7 +907,7 @@ int netvsc_send(struct hv_device *device,
}
 
if (msdp->pkt)
-   netvsc_xmit_completion(msdp->pkt);
+   dev_kfree_skb_any(skb);
 
if (packet->xmit_more && !packet->cp_partial) {
msdp->pkt = packet;
@@ -925,17 +925,17 @@ int netvsc_send(struct hv_device *device,
}
 
if (msd_send) {
-   m_ret = netvsc_send_pkt(msd_send, net_device, pb);
+   m_ret = netvsc_send_pkt(msd_send, net_device, pb, skb);
 
if (m_ret != 0) {
netvsc_free_send_slot(net_device,
  msd_send->send_buf_index);
-   netvsc_xmit_completion(msd_send);
+   dev_kfree_skb_any(skb);
}
}
 
if (cur_send)
-

[PATCH net-next V3 09/17] hv_netvsc: move subchannel existence check to netvsc_select_queue()

2015-12-01 Thread K. Y. Srinivasan

From: Vitaly Kuznetsov 

Signed-off-by: Vitaly Kuznetsov 
Signed-off-by: K. Y. Srinivasan 
---
 drivers/net/hyperv/hyperv_net.h |   15 ---
 drivers/net/hyperv/netvsc.c |5 ++---
 drivers/net/hyperv/netvsc_drv.c |3 +++
 3 files changed, 5 insertions(+), 18 deletions(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index e15dc2c..a9d2bdc5 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -1260,19 +1260,4 @@ struct rndis_message {
 #define TRANSPORT_INFO_IPV6_TCP ((INFO_IPV6 << 16) | INFO_TCP)
 #define TRANSPORT_INFO_IPV6_UDP ((INFO_IPV6 << 16) | INFO_UDP)
 
-static inline struct vmbus_channel *get_channel(struct hv_netvsc_packet 
*packet,
-   struct netvsc_device *net_device)
-
-{
-   struct vmbus_channel *out_channel;
-
-   out_channel = net_device->chn_table[packet->q_idx];
-   if (!out_channel) {
-   out_channel = net_device->dev->channel;
-   packet->q_idx = 0;
-   }
-   return out_channel;
-}
-
-
 #endif /* _HYPERV_NET_H */
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 0e0b723..419b055 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -749,8 +749,8 @@ static inline int netvsc_send_pkt(
struct netvsc_device *net_device)
 {
struct nvsp_message nvmsg;
-   struct vmbus_channel *out_channel = get_channel(packet, net_device);
u16 q_idx = packet->q_idx;
+   struct vmbus_channel *out_channel = net_device->chn_table[q_idx];
struct net_device *ndev = net_device->ndev;
u64 req_id;
int ret;
@@ -859,8 +859,7 @@ int netvsc_send(struct hv_device *device,
if (!net_device)
return -ENODEV;
 
-   out_channel = get_channel(packet, net_device);
-   q_idx = packet->q_idx;
+   out_channel = net_device->chn_table[q_idx];
 
packet->send_buf_index = NETVSC_INVALID_INDEX;
packet->cp_partial = false;
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 90cc8d9..da3a224 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -272,6 +272,9 @@ static u16 netvsc_select_queue(struct net_device *ndev, 
struct sk_buff *skb,
skb_set_hash(skb, hash, PKT_HASH_TYPE_L3);
}
 
+   if (!nvsc_dev->chn_table[q_idx])
+   q_idx = 0;
+
return q_idx;
 }
 
-- 
1.7.4.1
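
The effect of moving the check can be sketched in userspace C. The idea, as I read the patch, is that the fallback to queue 0 happens once at queue selection time, so the send path can index the channel table directly. MAX_QUEUES, struct channel, struct nv_device, select_queue() and out_channel() are hypothetical names for this sketch, not the driver's own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_QUEUES 4	/* hypothetical table size */

struct channel {
	int id;
};

struct nv_device {
	/* Slot 0 is the primary channel; other slots may still be NULL. */
	struct channel *chn_table[MAX_QUEUES];
};

/* Selection time: fall back to queue 0 if the subchannel is not set up. */
static uint16_t select_queue(const struct nv_device *dev, uint16_t q_idx)
{
	assert(q_idx < MAX_QUEUES);

	if (!dev->chn_table[q_idx])
		q_idx = 0;

	return q_idx;
}

/* Send time: the index was validated above, so no second check here. */
static struct channel *out_channel(const struct nv_device *dev,
				   uint16_t q_idx)
{
	return dev->chn_table[q_idx];
}
```

This relies on the primary channel in slot 0 always being populated and on a subchannel not disappearing between selection and send; whether the driver guarantees the latter is not something this sketch demonstrates.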

[PATCH net-next V3 14/17] hv_netvsc: Eliminate completion_func from struct hv_netvsc_packet

2015-12-01 Thread K. Y. Srinivasan

Eliminate completion_func from struct hv_netvsc_packet.

Signed-off-by: K. Y. Srinivasan 
---
 drivers/net/hyperv/hyperv_net.h   |1 -
 drivers/net/hyperv/netvsc_drv.c   |3 ---
 drivers/net/hyperv/rndis_filter.c |1 -
 3 files changed, 0 insertions(+), 5 deletions(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index 65e340e..ddf51a0 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -137,7 +137,6 @@ struct hv_netvsc_packet {
u8 rmsg_size; /* RNDIS header and PPI size */
u8 rmsg_pgcnt; /* page count of RNDIS header and PPI */
u8 page_buf_cnt;
-   u8 completion_func;
 
u16 vlan_tci;
u16 q_idx;
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index eb0c6fa..bc4be1d 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -483,9 +483,6 @@ check_size:
 
memset(rndis_msg, 0, RNDIS_AND_PPI_SIZE);
 
-   /* Set the completion routine */
-   packet->completion_func = 1;
-
isvlan = packet->vlan_tci & VLAN_TAG_PRESENT;
 
/* Add the rndis header */
diff --git a/drivers/net/hyperv/rndis_filter.c 
b/drivers/net/hyperv/rndis_filter.c
index 0b98674..6ba5adf 100644
--- a/drivers/net/hyperv/rndis_filter.c
+++ b/drivers/net/hyperv/rndis_filter.c
@@ -236,7 +236,6 @@ static int rndis_filter_send_request(struct rndis_device 
*dev,
pb[0].len;
}
 
-   packet->completion_func = 0;
packet->xmit_more = false;
 
ret = netvsc_send(dev->net_dev->dev, packet, NULL, &pb, NULL);
-- 
1.7.4.1

[PATCH net-next V3 02/17] hv_netvsc: Rearrange the hv_netvsc_packet to be space efficient

2015-12-01 Thread K. Y. Srinivasan

Rearrange the elements of struct hv_netvsc_packet for optimal layout -
eliminate unnecessary padding.

Signed-off-by: K. Y. Srinivasan 
Reviewed-by: Haiyang Zhang 
---
 drivers/net/hyperv/hyperv_net.h |   18 ++
 1 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index 972e562..7435673 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -128,32 +128,34 @@ struct ndis_tcp_ip_checksum_info;
 struct hv_netvsc_packet {
/* Bookkeeping stuff */
u8 status;
-
u8 is_data_pkt;
u8 xmit_more; /* from skb */
u8 cp_partial; /* partial copy into send buffer */
 
-   u16 vlan_tci;
+   u8 rmsg_size; /* RNDIS header and PPI size */
+   u8 rmsg_pgcnt; /* page count of RNDIS header and PPI */
+   u8 page_buf_cnt;
+   u8 pad0;
 
+   u16 vlan_tci;
u16 q_idx;
+   u32 send_buf_index;
+
+   u32 total_data_buflen;
+   u32 pad1;
+
struct vmbus_channel *channel;
 
u64 send_completion_tid;
void *send_completion_ctx;
void (*send_completion)(void *context);
 
-   u32 send_buf_index;
 
/* This points to the memory after page_buf */
struct rndis_message *rndis_msg;
 
-   u8 rmsg_size; /* RNDIS header and PPI size */
-   u8 rmsg_pgcnt; /* page count of RNDIS header and PPI */
-
-   u32 total_data_buflen;
/* Points to the send/receive buffer where the ethernet frame is */
void *data;
-   u8 page_buf_cnt;
struct hv_page_buffer *page_buf;
 };
 
-- 
1.7.4.1
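
Why reordering members removes padding can be shown with a small userspace sketch. The two structures below are hypothetical and only borrow a few field names from the patch; the exact sizes depend on the ABI, and the numbers in the comments assume a typical one where each integer type is aligned to its own size (for example x86-64).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Narrow members interleaved with wide ones: the compiler must pad. */
struct pkt_scattered {
	uint8_t  status;
	uint16_t vlan_tci;
	uint8_t  xmit_more;
	uint32_t send_buf_index;
	uint8_t  rmsg_size;
	uint64_t send_completion_tid;
	uint8_t  page_buf_cnt;
	uint32_t total_data_buflen;
};

/* Same members grouped by width, with the one leftover hole explicit. */
struct pkt_grouped {
	uint8_t  status;
	uint8_t  xmit_more;
	uint8_t  rmsg_size;
	uint8_t  page_buf_cnt;
	uint16_t vlan_tci;
	uint16_t pad0;
	uint32_t send_buf_index;
	uint32_t total_data_buflen;
	uint64_t send_completion_tid;
};

/* Bytes added by the compiler beyond the declared members (22 bytes). */
static size_t scattered_padding(void)
{
	return sizeof(struct pkt_scattered) - (4 * 1 + 2 + 4 + 8 + 4);
}

/* Declared members here sum to 24 bytes, pad0 included. */
static size_t grouped_padding(void)
{
	return sizeof(struct pkt_grouped) - (4 * 1 + 2 + 2 + 4 + 4 + 8);
}

/* Sanity check: grouping by width should never grow the structure. */
static void check_layout(void)
{
	assert(sizeof(struct pkt_grouped) <= sizeof(struct pkt_scattered));
}
```

On the assumed ABI the scattered layout occupies 32 bytes with 10 bytes of hidden padding, while the grouped layout occupies 24 bytes with none. Naming the remaining hole (pad0, pad1 in the patch) also makes it easy to reuse later, which is what a later patch in this series does with pad0.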

[PATCH net-next V3 06/17] hv_netvsc: Eliminate send_completion from struct hv_netvsc_packet

2015-12-01 Thread K. Y. Srinivasan

Eliminate send_completion from struct hv_netvsc_packet.

Signed-off-by: K. Y. Srinivasan 
Reviewed-by: Haiyang Zhang 
---
 drivers/net/hyperv/hyperv_net.h   |3 +--
 drivers/net/hyperv/netvsc.c   |6 +++---
 drivers/net/hyperv/netvsc_drv.c   |2 +-
 drivers/net/hyperv/rndis_filter.c |2 +-
 4 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index 506d552..9a3c972 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -135,7 +135,7 @@ struct hv_netvsc_packet {
u8 rmsg_size; /* RNDIS header and PPI size */
u8 rmsg_pgcnt; /* page count of RNDIS header and PPI */
u8 page_buf_cnt;
-   u8 pad0;
+   u8 completion_func;
 
u16 vlan_tci;
u16 q_idx;
@@ -147,7 +147,6 @@ struct hv_netvsc_packet {
 
u64 send_completion_tid;
void *send_completion_ctx;
-   void (*send_completion)(void *context);
struct hv_page_buffer *page_buf;
 };
 
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 8fbf816..34c16d1 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -653,8 +653,8 @@ static void netvsc_send_completion(struct netvsc_device 
*net_device,
netvsc_free_send_slot(net_device, send_index);
q_idx = nvsc_packet->q_idx;
channel = incoming_channel;
-   nvsc_packet->send_completion(nvsc_packet->
-send_completion_ctx);
+   netvsc_xmit_completion(nvsc_packet->
+  send_completion_ctx);
}
 
num_outstanding_sends =
@@ -775,7 +775,7 @@ static inline int netvsc_send_pkt(
nvmsg.msg.v1_msg.send_rndis_pkt.send_buf_section_size =
packet->total_data_buflen;
 
-   if (packet->send_completion)
+   if (packet->completion_func)
req_id = (ulong)packet;
else
req_id = 0;
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index c73afb1..6d71a1e 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -488,7 +488,7 @@ check_size:
memset(rndis_msg, 0, RNDIS_AND_PPI_SIZE);
 
/* Set the completion routine */
-   packet->send_completion = netvsc_xmit_completion;
+   packet->completion_func = 1;
packet->send_completion_ctx = packet;
packet->send_completion_tid = (unsigned long)skb;
 
diff --git a/drivers/net/hyperv/rndis_filter.c 
b/drivers/net/hyperv/rndis_filter.c
index be0fa9c..c8af172 100644
--- a/drivers/net/hyperv/rndis_filter.c
+++ b/drivers/net/hyperv/rndis_filter.c
@@ -237,7 +237,7 @@ static int rndis_filter_send_request(struct rndis_device 
*dev,
packet->page_buf[0].len;
}
 
-   packet->send_completion = NULL;
+   packet->completion_func = 0;
packet->xmit_more = false;
 
ret = netvsc_send(dev->net_dev->dev, packet, NULL);
-- 
1.7.4.1

[PATCH iproute2 -next] examples, bpf: further improve examples

2015-12-01 Thread Daniel Borkmann

Improve example files further and add a more generic set of possible
helpers for them that can be used.

Signed-off-by: Daniel Borkmann 
---
 examples/bpf/bpf_cyclic.c   |  38 
 examples/bpf/bpf_funcs.h|  76 ---
 examples/bpf/bpf_graft.c|  39 
 examples/bpf/bpf_prog.c |  33 ---
 examples/bpf/bpf_shared.c   |  32 +++
 examples/bpf/bpf_shared.h   |   2 +-
 examples/bpf/bpf_tailcall.c |  84 +++--
 include/bpf_api.h   | 225 
 8 files changed, 327 insertions(+), 202 deletions(-)
 delete mode 100644 examples/bpf/bpf_funcs.h
 create mode 100644 include/bpf_api.h

diff --git a/examples/bpf/bpf_cyclic.c b/examples/bpf/bpf_cyclic.c
index bde061c..c66cbec 100644
--- a/examples/bpf/bpf_cyclic.c
+++ b/examples/bpf/bpf_cyclic.c
@@ -1,32 +1,30 @@
-#include 
-
-#include "bpf_funcs.h"
+#include "../../include/bpf_api.h"
 
 /* Cyclic dependency example to test the kernel's runtime upper
- * bound on loops.
+ * bound on loops. Also demonstrates on how to use direct-actions,
+ * loaded as: tc filter add [...] bpf da obj [...]
  */
-struct bpf_elf_map __section("maps") jmp_tc = {
-   .type   = BPF_MAP_TYPE_PROG_ARRAY,
-   .id = 0xabccba,
-   .size_key   = sizeof(int),
-   .size_value = sizeof(int),
-   .pinning= PIN_OBJECT_NS,
-   .max_elem   = 1,
-};
+#define JMP_MAP_ID 0xabccba
+
+BPF_PROG_ARRAY(jmp_tc, JMP_MAP_ID, PIN_OBJECT_NS, 1);
 
-__section_tail(0xabccba, 0) int cls_loop(struct __sk_buff *skb)
+__section_tail(JMP_MAP_ID, 0)
+int cls_loop(struct __sk_buff *skb)
 {
char fmt[] = "cb: %u\n";
 
-   bpf_printk(fmt, sizeof(fmt), skb->cb[0]++);
-   bpf_tail_call(skb, &jmp_tc, 0);
-   return -1;
+   trace_printk(fmt, sizeof(fmt), skb->cb[0]++);
+   tail_call(skb, &jmp_tc, 0);
+
+   skb->tc_classid = TC_H_MAKE(1, 42);
+   return TC_ACT_OK;
 }
 
-__section("classifier") int cls_entry(struct __sk_buff *skb)
+__section_cls_entry
+int cls_entry(struct __sk_buff *skb)
 {
-   bpf_tail_call(skb, &jmp_tc, 0);
-   return -1;
+   tail_call(skb, &jmp_tc, 0);
+   return TC_ACT_SHOT;
 }
 
-char __license[] __section("license") = "GPL";
+BPF_LICENSE("GPL");
diff --git a/examples/bpf/bpf_funcs.h b/examples/bpf/bpf_funcs.h
deleted file mode 100644
index 6d058f0..000
--- a/examples/bpf/bpf_funcs.h
+++ /dev/null
@@ -1,76 +0,0 @@
-#ifndef __BPF_FUNCS__
-#define __BPF_FUNCS__
-
-#include 
-
-#include "../../include/bpf_elf.h"
-
-/* Misc macros. */
-#ifndef __maybe_unused
-# define __maybe_unused __attribute__ ((__unused__))
-#endif
-
-#ifndef __stringify
-# define __stringify(x) #x
-#endif
-
-#ifndef __section
-# define __section(NAME)   __attribute__((section(NAME), used))
-#endif
-
-#ifndef __section_tail
-# define __section_tail(m, x)  __section(__stringify(m) "/" __stringify(x))
-#endif
-
-#ifndef offsetof
-# define offsetof  __builtin_offsetof
-#endif
-
-#ifndef htons
-# define htons(x)  __constant_htons((x))
-#endif
-
-#ifndef likely
-# define likely(x) __builtin_expect(!!(x), 1)
-#endif
-
-#ifndef unlikely
-# define unlikely(x)   __builtin_expect(!!(x), 0)
-#endif
-
-/* The verifier will translate them to actual function calls. */
-static void *(*bpf_map_lookup_elem)(void *map, void *key) __maybe_unused =
-   (void *) BPF_FUNC_map_lookup_elem;
-
-static int (*bpf_map_update_elem)(void *map, void *key, void *value,
- unsigned long long flags) __maybe_unused =
-   (void *) BPF_FUNC_map_update_elem;
-
-static int (*bpf_map_delete_elem)(void *map, void *key) __maybe_unused =
-   (void *) BPF_FUNC_map_delete_elem;
-
-static unsigned int (*get_smp_processor_id)(void) __maybe_unused =
-   (void *) BPF_FUNC_get_smp_processor_id;
-
-static unsigned int (*get_prandom_u32)(void) __maybe_unused =
-   (void *) BPF_FUNC_get_prandom_u32;
-
-static int (*bpf_printk)(const char *fmt, int fmt_size, ...) __maybe_unused =
-   (void *) BPF_FUNC_trace_printk;
-
-static void (*bpf_tail_call)(void *ctx, void *map, int index) __maybe_unused =
-   (void *) BPF_FUNC_tail_call;
-
-/* LLVM built-in functions that an eBPF C program may use to emit
- * BPF_LD_ABS and BPF_LD_IND instructions.
- */
-unsigned long long load_byte(void *skb, unsigned long long off)
-   asm ("llvm.bpf.load.byte");
-
-unsigned long long load_half(void *skb, unsigned long long off)
-   asm ("llvm.bpf.load.half");
-
-unsigned long long load_word(void *skb, unsigned long long off)
-   asm ("llvm.bpf.load.word");
-
-#endif /* __BPF_FUNCS__ */
diff --git a/examples/bpf/bpf_graft.c b/examples/bpf/bpf_graft.c
index f36d25a..f48fd02 100644
--- a/examples/bpf/bpf_graft.c
+++ b/examples/bpf/bpf_graft.c
@@ -1,6 +1,4 @@
-#include 
-
-#include "bpf_funcs.h"
+#include "../../include/bpf_api.h"
 
 /* This example

Re: [RFC 1/4] net: support per queue tx_usecs in sysfs

2015-12-01 Thread Jesse Brandeburg

On Tue, 1 Dec 2015 14:13:34 -0800
Florian Fainelli  wrote:

> On 01/12/15 00:01, kan.li...@intel.com wrote:
> > From: Kan Liang 
> > 
> > Network devices usually have many queues. Each queue has its own
> > tx_usecs options. Currently, we can only set all the queues with same
> > value by ethtool. This patch expose the tx_usecs in sysfs. So the user
> > can set/get per queue coalesce parameter tx_usecs by sysfs.
> 
> The new interface you propose makes things inconsistent, since we have
> two separate configuration paths (sysfs and ethtool), and it would seem
> better to have per-queue awareness in ethtool, since there is a whole
> bunch of other parameters that could be configured on a per-queue basis.
> 
> Have you tried to extend existing ethtool interfaces to cover the need
> for multiple queues?

While I agree that ethtool provides similar functionality, ethtool
was designed (particularly the ethtool -C/c commands) around
single-queue NICs.  We can't change the output or functionality of the
user interface without breaking a bunch of users' scripts and stuff.

With this effort, Kan is laying groundwork for making further kernel
changes, and having the kernel call back in to drivers via ethtool
mechanisms that were designed before multiple queue adapters.

We can also next migrate the legacy ethtool interfaces to use these
new .ndo_ops should we wish.

These patches were provided with the intent of getting some feedback
about going down this path of making a *consistent* user interface that
is driver agnostic in sysfs, and supports multiple queue adapters.

Re: [PATCH 1/2 net-next] ravb: Add fallback compatibility strings

2015-12-01 Thread Simon Horman

On Tue, Dec 01, 2015 at 01:26:08PM +0300, Sergei Shtylyov wrote:
> Hello.
> 
> On 12/1/2015 10:43 AM, Simon Horman wrote:
> 
> >Add fallback compatibility strings for R-Car Gen 2 & 3 SoC Families.
> >This is in keeping with the fallback scheme being adopted wherever
> >appropriate for drivers for Renesas SoCs.
> >
> >Also correct typo.
> >
> >Signed-off-by: Simon Horman 
> >---
> >  Documentation/devicetree/bindings/net/renesas,ravb.txt | 11 +--
> >  drivers/net/ethernet/renesas/ravb_main.c   |  2 ++
> >  2 files changed, 11 insertions(+), 2 deletions(-)
> >
> >diff --git a/Documentation/devicetree/bindings/net/renesas,ravb.txt 
> >b/Documentation/devicetree/bindings/net/renesas,ravb.txt
> >index b486f3f5f6a3..115006325bff 100644
> >--- a/Documentation/devicetree/bindings/net/renesas,ravb.txt
> >+++ b/Documentation/devicetree/bindings/net/renesas,ravb.txt
> >@@ -1,12 +1,19 @@
> >  * Renesas Electronics Ethernet AVB
> >
> >  This file provides information on what the device node for the Ethernet AVB
> >-interface contains.
> >+interface.
> 
>Why?

Because I misread it (multiple times).
I'll drop this change.

Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload

2015-12-01 Thread Singhai, Anjali




On 12/1/2015 8:08 AM, John W. Linville wrote:
> On Tue, Dec 01, 2015 at 04:49:28PM +0100, Hannes Frederic Sowa wrote:
>> On Tue, Dec 1, 2015, at 16:44, John W. Linville wrote:
>>> On Mon, Nov 30, 2015 at 09:26:51PM -0800, Tom Herbert wrote:
>>>> On Mon, Nov 30, 2015 at 5:28 PM, Jesse Gross wrote:
>>>>> Based on what we can do today, I see only two real choices: do some
>>>>> refactoring to clean up the stack a bit or remove the existing VXLAN
>>>>> offloading altogether. I think this series is trying to do the former
>>>>> and the result is that the stack is cleaner after than before. That
>>>>> seems like a good thing.
>>>>
>>>> There is a third choice which is to do nothing. Creating an
>>>> infrastructure that claims to "Generalize udp based tunnel offload"
>>>> but actually doesn't generalize the mechanism is nothing more than
>>>> window dressing-- this does nothing to help with the VXLAN to
>>>> VXLAN-GPE transition for instance. If geneve specific offload is
>>>> really needed now then that can be should with another ndo function,
>>>> or alternatively ntuple filter with a device specific action would at
>>>> least get the stack out of needing to be concerned with that.
>>>> Regardless, we will work to optimize the rest of the stack for devices
>>>> that implement protocol agnostic mechanisms.
>>>
>>> Is there no concern about NDO proliferation? Does the size of the
>>> netdev_ops structure matter? Beyond that, I can see how a single
>>> entry point with an enum specifying the offload type isn't really any
>>> different in the grand scheme of things than having multiple NDOs,
>>> one per offload.
>>>
>>> Given the need to live with existing hardware offloads, I would lean
>>> toward a consolidated NDO. But if a different NDO per tunnel type is
>>> preferred, I can be satisfied with that.
>>
>> Having per-offloading NDOs helps the stack to gather further information
>> what kind of offloads the driver has even maybe without trying to call
>> down into the layer (just by comparing to NULL). Checking this inside
>> the driver offload function clearly does not have this feature. So we
>> finally can have "ip tunnel please-recommend-type" feature. :)
>
> That is a valuable insight! Maybe the per-offload NDO isn't such a
> bad idea after all... :-)
>
> John

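
For illustration only, here is a minimal user-space sketch of the
"compare to NULL" idea discussed in this thread. It is not kernel code:
struct tunnel_offload_ops, its members, and the helper names are
hypothetical stand-ins for one-hook-per-offload entries, assumed purely
for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-offload ops table. The member names
 * are illustrative and are not the kernel's net_device_ops. */
struct tunnel_offload_ops {
	int (*add_vxlan_port)(unsigned short port);
	int (*add_geneve_port)(unsigned short port);
};

/* Pretend driver hook: accepts any non-zero UDP port. */
int demo_add_vxlan_port(unsigned short port)
{
	return port ? 0 : -1;
}

/* A driver that only implements VXLAN offload leaves the other hook NULL. */
const struct tunnel_offload_ops vxlan_only_ops = {
	.add_vxlan_port = demo_add_vxlan_port,
	.add_geneve_port = NULL,
};

/* Capability discovery without calling into the driver at all. */
bool supports_vxlan(const struct tunnel_offload_ops *ops)
{
	assert(ops != NULL);
	return ops->add_vxlan_port != NULL;
}

bool supports_geneve(const struct tunnel_offload_ops *ops)
{
	assert(ops != NULL);
	return ops->add_geneve_port != NULL;
}
```

With a single consolidated hook that takes an offload-type enum, the
caller would have to invoke the hook and inspect its return value to
learn the same thing, which is the trade-off raised in the thread.
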
This helps me understand why having a separate ndo op might still be ok.
Thanks for the feedback. I will go back to that model. Also, I think I
did finally understand the discussion on using a single 2's complement
checksum method for future silicon.

[PATCH net-next v2 4/5] ila: Add generic ILA translation facility

2015-12-01 Thread Tom Herbert

This patch implements an ILA translation table. This table can be
configured with identifier to locator mappings, and can be queried
to resolve a mapping. Queries can be parameterized based on interface,
direction (incoming or outgoing), and matching locator.  The table is
implemented using rhashtable and is configured via netlink (through
"ip ila .." in iproute).

The table may be used as an alternative means to do ILA translations
other than the lw tunnels.
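
As a rough picture of the parameterized lookup described above, here is
a user-space sketch. It is not the patch's implementation: it uses a
linear scan instead of rhashtable, struct xlat_entry and xlat_lookup()
are hypothetical names, and treating ifindex 0 and locator_match 0 as
wildcards is an assumption made for the example. Only the ILA_DIR_IN and
ILA_DIR_OUT values are taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ILA_DIR_IN  (1 << 0)
#define ILA_DIR_OUT (1 << 1)

/* Hypothetical entry layout, loosely modeled on the fields of
 * struct ila_xlat_params in the patch. */
struct xlat_entry {
	uint64_t identifier;    /* lookup key */
	uint64_t locator;       /* locator to translate to */
	uint64_t locator_match; /* 0 = match any current locator (assumption) */
	int ifindex;            /* 0 = match any interface (assumption) */
	unsigned int dir;       /* bitmask of ILA_DIR_IN / ILA_DIR_OUT */
};

/* Two sample mappings used by the checks below. */
const struct xlat_entry demo_table[] = {
	{ .identifier = 0x1, .locator = 0xAAAA, .locator_match = 0,
	  .ifindex = 0, .dir = ILA_DIR_IN | ILA_DIR_OUT },
	{ .identifier = 0x2, .locator = 0xBBBB, .locator_match = 0xCCCC,
	  .ifindex = 3, .dir = ILA_DIR_OUT },
};

/* Return the first entry matching identifier, direction, interface and
 * current locator, or NULL if no mapping applies. */
const struct xlat_entry *xlat_lookup(const struct xlat_entry *tbl, size_t n,
				     uint64_t identifier, uint64_t cur_locator,
				     int ifindex, unsigned int dir)
{
	assert(tbl != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		const struct xlat_entry *e = &tbl[i];

		if (e->identifier != identifier)
			continue;
		if (!(e->dir & dir))
			continue;
		if (e->ifindex && e->ifindex != ifindex)
			continue;
		if (e->locator_match && e->locator_match != cur_locator)
			continue;
		return e;
	}
	return NULL;
}
```
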

Signed-off-by: Tom Herbert 
---
 include/net/ila.h |  18 ++
 include/uapi/linux/ila.h  |  22 ++
 net/ipv6/ila/Makefile |   2 +-
 net/ipv6/ila/ila.h|   2 +
 net/ipv6/ila/ila_common.c |   8 +
 net/ipv6/ila/ila_xlat.c   | 645 ++
 6 files changed, 696 insertions(+), 1 deletion(-)
 create mode 100644 include/net/ila.h
 create mode 100644 net/ipv6/ila/ila_xlat.c

diff --git a/include/net/ila.h b/include/net/ila.h
new file mode 100644
index 000..9f4f43e
--- /dev/null
+++ b/include/net/ila.h
@@ -0,0 +1,18 @@
+/*
+ * ILA kernel interface
+ *
+ * Copyright (c) 2015 Tom Herbert 
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of
+ * the License, or (at your option) any later version.
+ */
+
+#ifndef _NET_ILA_H
+#define _NET_ILA_H
+
+int ila_xlat_outgoing(struct sk_buff *skb);
+int ila_xlat_incoming(struct sk_buff *skb);
+
+#endif /* _NET_ILA_H */
diff --git a/include/uapi/linux/ila.h b/include/uapi/linux/ila.h
index 7ed9e67..abde7bb 100644
--- a/include/uapi/linux/ila.h
+++ b/include/uapi/linux/ila.h
@@ -3,13 +3,35 @@
 #ifndef _UAPI_LINUX_ILA_H
 #define _UAPI_LINUX_ILA_H
 
+/* NETLINK_GENERIC related info */
+#define ILA_GENL_NAME  "ila"
+#define ILA_GENL_VERSION   0x1
+
 enum {
ILA_ATTR_UNSPEC,
ILA_ATTR_LOCATOR,   /* u64 */
+   ILA_ATTR_IDENTIFIER,    /* u64 */
+   ILA_ATTR_LOCATOR_MATCH, /* u64 */
+   ILA_ATTR_IFINDEX,   /* s32 */
+   ILA_ATTR_DIR,   /* u32 */
 
__ILA_ATTR_MAX,
 };
 
 #define ILA_ATTR_MAX   (__ILA_ATTR_MAX - 1)
 
+enum {
+   ILA_CMD_UNSPEC,
+   ILA_CMD_ADD,
+   ILA_CMD_DEL,
+   ILA_CMD_GET,
+
+   __ILA_CMD_MAX,
+};
+
+#define ILA_CMD_MAX (__ILA_CMD_MAX - 1)
+
+#define ILA_DIR_IN (1 << 0)
+#define ILA_DIR_OUT (1 << 1)
+
 #endif /* _UAPI_LINUX_ILA_H */
diff --git a/net/ipv6/ila/Makefile b/net/ipv6/ila/Makefile
index 31d136b..4b32e59 100644
--- a/net/ipv6/ila/Makefile
+++ b/net/ipv6/ila/Makefile
@@ -4,4 +4,4 @@
 
 obj-$(CONFIG_IPV6_ILA) += ila.o
 
-ila-objs := ila_common.o ila_lwt.o
+ila-objs := ila_common.o ila_lwt.o ila_xlat.o
diff --git a/net/ipv6/ila/ila.h b/net/ipv6/ila/ila.h
index b94081f..28542cb 100644
--- a/net/ipv6/ila/ila.h
+++ b/net/ipv6/ila/ila.h
@@ -42,5 +42,7 @@ void update_ipv6_locator(struct sk_buff *skb, struct 
ila_params *p);
 
 int ila_lwt_init(void);
 void ila_lwt_fini(void);
+int ila_xlat_init(void);
+void ila_xlat_fini(void);
 
 #endif /* __ILA_H */
diff --git a/net/ipv6/ila/ila_common.c b/net/ipv6/ila/ila_common.c
index 64e1904..32dc9aa 100644
--- a/net/ipv6/ila/ila_common.c
+++ b/net/ipv6/ila/ila_common.c
@@ -80,12 +80,20 @@ static int __init ila_init(void)
if (ret)
goto fail_lwt;
 
+   ret = ila_xlat_init();
+   if (ret)
+   goto fail_xlat;
+
+   return 0;
+fail_xlat:
+   ila_lwt_fini();
 fail_lwt:
return ret;
 }
 
 static void __exit ila_fini(void)
 {
+   ila_xlat_fini();
ila_lwt_fini();
 }
 
diff --git a/net/ipv6/ila/ila_xlat.c b/net/ipv6/ila/ila_xlat.c
new file mode 100644
index 000..fcfe998
--- /dev/null
+++ b/net/ipv6/ila/ila_xlat.c
@@ -0,0 +1,645 @@
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "ila.h"
+
+struct ila_xlat_params {
+   struct ila_params ip;
+   __be64 identifier;
+   int ifindex;
+   unsigned int dir;
+};
+
+struct ila_map {
+   struct ila_xlat_params p;
+   struct rhash_head node;
+   struct ila_map *next;
+   struct rcu_head rcu;
+};
+
+static unsigned int ila_net_id;
+
+struct ila_net {
+   struct rhashtable rhash_table;
+   spinlock_t *locks; /* Bucket locks for entry manipulation */
+   unsigned int locks_mask;
+};
+
+#define LOCKS_PER_CPU 10
+
+static int alloc_ila_locks(struct ila_net *ilan, gfp_t gfp)
+{
+   unsigned int i, size;
+   unsigned int nr_pcpus = num_possible_cpus();
+
+   nr_pcpus = min_t(unsigned int, nr_pcpus, 32UL);
+   size = roundup_pow_of_two(nr_pcpus * LOCKS_PER_CPU);
+
+   if (sizeof(spinlock_t) != 0) {
+#ifdef CONFIG_NUMA
+   if (size * sizeof(spinlock_t) > PAGE_SIZE &&
+   gfp == GFP_KERNEL)
+

[PATCH net-next V3 16/17] hv_netvsc: Eliminate status from struct hv_netvsc_packet

2015-12-01 Thread K. Y. Srinivasan

Eliminate status from struct hv_netvsc_packet.

Signed-off-by: K. Y. Srinivasan 
---
 drivers/net/hyperv/hyperv_net.h   |1 -
 drivers/net/hyperv/netvsc.c   |6 ++
 drivers/net/hyperv/netvsc_drv.c   |8 ++--
 drivers/net/hyperv/rndis_filter.c |   20 +---
 4 files changed, 13 insertions(+), 22 deletions(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index 96d34e2..128b296 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -130,7 +130,6 @@ struct ndis_tcp_ip_checksum_info;
  */
 struct hv_netvsc_packet {
/* Bookkeeping stuff */
-   u8 status;
u8 cp_partial; /* partial copy into send buffer */
 
u8 rmsg_size; /* RNDIS header and PPI size */
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index cd5b65e..02bab9a 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -1045,17 +1045,15 @@ static void netvsc_receive(struct netvsc_device 
*net_device,
/* Each range represents 1 RNDIS pkt that contains 1 ethernet frame */
for (i = 0; i < count; i++) {
/* Initialize the netvsc packet */
-   netvsc_packet->status = NVSP_STAT_SUCCESS;
data = (void *)((unsigned long)net_device->
recv_buf + vmxferpage_packet->ranges[i].byte_offset);
netvsc_packet->total_data_buflen =
vmxferpage_packet->ranges[i].byte_count;
 
/* Pass it to the upper layer */
-   rndis_filter_receive(device, netvsc_packet, , channel);
+   status = rndis_filter_receive(device, netvsc_packet, ,
+ channel);
 
-   if (netvsc_packet->status != NVSP_STAT_SUCCESS)
-   status = NVSP_STAT_FAIL;
}
 
netvsc_send_recv_completion(device, channel, net_device,
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 7520c52..490672e 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -470,8 +470,6 @@ check_size:
FIELD_SIZEOF(struct sk_buff, cb));
packet = (struct hv_netvsc_packet *)skb->cb;
 
-   packet->status = 0;
-
packet->vlan_tci = skb->vlan_tci;
 
packet->q_idx = skb_get_queue_mapping(skb);
@@ -687,8 +685,7 @@ int netvsc_recv_callback(struct hv_device *device_obj,
 
net = ((struct netvsc_device *)hv_get_drvdata(device_obj))->ndev;
if (!net || net->reg_state != NETREG_REGISTERED) {
-   packet->status = NVSP_STAT_FAIL;
-   return 0;
+   return NVSP_STAT_FAIL;
}
net_device_ctx = netdev_priv(net);
rx_stats = this_cpu_ptr(net_device_ctx->rx_stats);
@@ -697,8 +694,7 @@ int netvsc_recv_callback(struct hv_device *device_obj,
skb = netdev_alloc_skb_ip_align(net, packet->total_data_buflen);
if (unlikely(!skb)) {
++net->stats.rx_dropped;
-   packet->status = NVSP_STAT_FAIL;
-   return 0;
+   return NVSP_STAT_FAIL;
}
 
/*
diff --git a/drivers/net/hyperv/rndis_filter.c 
b/drivers/net/hyperv/rndis_filter.c
index 3c06aa7..28adf6a 100644
--- a/drivers/net/hyperv/rndis_filter.c
+++ b/drivers/net/hyperv/rndis_filter.c
@@ -344,7 +344,7 @@ static inline void *rndis_get_ppi(struct rndis_packet 
*rpkt, u32 type)
return NULL;
 }
 
-static void rndis_filter_receive_data(struct rndis_device *dev,
+static int rndis_filter_receive_data(struct rndis_device *dev,
   struct rndis_message *msg,
   struct hv_netvsc_packet *pkt,
   void **data,
@@ -371,7 +371,7 @@ static void rndis_filter_receive_data(struct rndis_device 
*dev,
   "overflow detected (got %u, min %u)"
   "...dropping this message!\n",
   pkt->total_data_buflen, rndis_pkt->data_len);
-   return;
+   return NVSP_STAT_FAIL;
}
 
/*
@@ -391,7 +391,8 @@ static void rndis_filter_receive_data(struct rndis_device 
*dev,
}
 
csum_info = rndis_get_ppi(rndis_pkt, TCPIP_CHKSUM_PKTINFO);
-   netvsc_recv_callback(dev->net_dev->dev, pkt, data, csum_info, channel);
+   return netvsc_recv_callback(dev->net_dev->dev, pkt, data,
+   csum_info, channel);
 }
 
 int rndis_filter_receive(struct hv_device *dev,
@@ -406,7 +407,7 @@ int rndis_filter_receive(struct hv_device *dev,
int ret = 0;
 
if (!net_dev) {
-   ret = -EINVAL;
+   ret = NVSP_STAT_FAIL;
goto exit;
}
 
@@ -416,7 +417,7 @@ int rndis_filter_receive(struct hv_device *dev,
if (!net_dev->extension) {

[PATCH net-next v2 3/5] netlink: add a start callback for starting a netlink dump

2015-12-01 Thread Tom Herbert

The start callback allows the caller to set up a context for the
dump callbacks. Presumably, the context can then be destroyed in
the done callback.

Signed-off-by: Tom Herbert 
---
 include/linux/netlink.h  |  2 ++
 include/net/genetlink.h  |  2 ++
 net/netlink/af_netlink.c |  4 
 net/netlink/genetlink.c  | 16 
 4 files changed, 24 insertions(+)

diff --git a/include/linux/netlink.h b/include/linux/netlink.h
index 639e9b8..0b41959 100644
--- a/include/linux/netlink.h
+++ b/include/linux/netlink.h
@@ -131,6 +131,7 @@ netlink_skb_clone(struct sk_buff *skb, gfp_t gfp_mask)
 struct netlink_callback {
struct sk_buff  *skb;
const struct nlmsghdr   *nlh;
+   int (*start)(struct netlink_callback *);
int (*dump)(struct sk_buff * skb,
struct netlink_callback *cb);
int (*done)(struct netlink_callback *cb);
@@ -153,6 +154,7 @@ struct nlmsghdr *
 __nlmsg_put(struct sk_buff *skb, u32 portid, u32 seq, int type, int len, int 
flags);
 
 struct netlink_dump_control {
+   int (*start)(struct netlink_callback *);
int (*dump)(struct sk_buff *skb, struct netlink_callback *);
int (*done)(struct netlink_callback *);
void *data;
diff --git a/include/net/genetlink.h b/include/net/genetlink.h
index 1b6b6dc..43c0e77 100644
--- a/include/net/genetlink.h
+++ b/include/net/genetlink.h
@@ -114,6 +114,7 @@ static inline void genl_info_net_set(struct genl_info 
*info, struct net *net)
  * @flags: flags
  * @policy: attribute validation policy
  * @doit: standard command callback
+ * @start: start callback for dumps
  * @dumpit: callback for dumpers
  * @done: completion callback for dumps
  * @ops_list: operations list
@@ -122,6 +123,7 @@ struct genl_ops {
const struct nla_policy *policy;
int(*doit)(struct sk_buff *skb,
   struct genl_info *info);
+   int(*start)(struct netlink_callback *cb);
int(*dumpit)(struct sk_buff *skb,
 struct netlink_callback *cb);
int(*done)(struct netlink_callback *cb);
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 59651af..81dc1bb 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -2915,6 +2915,7 @@ int __netlink_dump_start(struct sock *ssk, struct sk_buff 
*skb,
 
cb = >cb;
memset(cb, 0, sizeof(*cb));
+   cb->start = control->start;
cb->dump = control->dump;
cb->done = control->done;
cb->nlh = nlh;
@@ -2927,6 +2928,9 @@ int __netlink_dump_start(struct sock *ssk, struct sk_buff 
*skb,
 
mutex_unlock(nlk->cb_mutex);
 
+   if (cb->start)
+   cb->start(cb);
+
ret = netlink_dump(sk);
sock_put(sk);
 
diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c
index bc0e504..8e63662 100644
--- a/net/netlink/genetlink.c
+++ b/net/netlink/genetlink.c
@@ -513,6 +513,20 @@ void *genlmsg_put(struct sk_buff *skb, u32 portid, u32 seq,
 }
 EXPORT_SYMBOL(genlmsg_put);
 
+static int genl_lock_start(struct netlink_callback *cb)
+{
+   /* our ops are always const - netlink API doesn't propagate that */
+   const struct genl_ops *ops = cb->data;
+   int rc = 0;
+
+   if (ops->start) {
+   genl_lock();
+   rc = ops->start(cb);
+   genl_unlock();
+   }
+   return rc;
+}
+
 static int genl_lock_dumpit(struct sk_buff *skb, struct netlink_callback *cb)
 {
/* our ops are always const - netlink API doesn't propagate that */
@@ -577,6 +591,7 @@ static int genl_family_rcv_msg(struct genl_family *family,
.module = family->module,
/* we have const, but the netlink API doesn't */
.data = (void *)ops,
+   .start = genl_lock_start,
.dump = genl_lock_dumpit,
.done = genl_lock_done,
};
@@ -588,6 +603,7 @@ static int genl_family_rcv_msg(struct genl_family *family,
} else {
struct netlink_dump_control c = {
.module = family->module,
+   .start = ops->start,
.dump = ops->dumpit,
.done = ops->done,
};
-- 
2.4.6
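
The start/dump/done lifecycle that this patch describes can be pictured
with a small user-space model. This is only a sketch under stated
assumptions: struct dump_cb, struct iter_ctx and the demo_* functions
are hypothetical and far simpler than struct netlink_callback. Note also
that this sketch stops if start fails, which is a choice made for the
example rather than something taken from the hunks in this patch.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical user-space model of the callback triple. */
struct dump_cb {
	int (*start)(struct dump_cb *cb);
	int (*dump)(struct dump_cb *cb);
	int (*done)(struct dump_cb *cb);
	void *ctx;   /* context created by start, released by done */
	int emitted; /* number of records produced so far */
};

struct iter_ctx {
	int next;
	int limit;
};

/* start: allocate the iteration context once, before the first dump. */
int demo_start(struct dump_cb *cb)
{
	struct iter_ctx *it = malloc(sizeof(*it));

	if (!it)
		return -1;
	it->next = 0;
	it->limit = 3;
	cb->ctx = it;
	return 0;
}

/* dump: emit one record per call; return 1 while more remain, else 0. */
int demo_dump(struct dump_cb *cb)
{
	struct iter_ctx *it = cb->ctx;

	if (it->next >= it->limit)
		return 0;
	it->next++;
	cb->emitted++;
	return 1;
}

/* done: destroy the context that start created. */
int demo_done(struct dump_cb *cb)
{
	free(cb->ctx);
	cb->ctx = NULL;
	return 0;
}

/* Drive one complete dump: start once, dump until finished, then done. */
int run_dump(struct dump_cb *cb)
{
	int rc = 0;

	assert(cb != NULL && cb->dump != NULL);
	if (cb->start) {
		rc = cb->start(cb);
		if (rc)
			return rc;
	}
	while ((rc = cb->dump(cb)) > 0)
		;
	if (cb->done)
		cb->done(cb);
	return rc;
}

/* Returns the number of records emitted, or a negative error. */
int run_demo(void)
{
	struct dump_cb cb = {
		.start = demo_start,
		.dump = demo_dump,
		.done = demo_done,
	};
	int rc = run_dump(&cb);

	return rc ? rc : cb.emitted;
}
```
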


[PATCH net-next v2 5/5] net: ILA iptables target

2015-12-01 Thread Tom Herbert

Add two targets, ILAIN and ILAOUT, which hook into the ILA module.

Signed-off-by: Tom Herbert 
---
 net/netfilter/Kconfig  | 12 
 net/netfilter/Makefile |  1 +
 net/netfilter/xt_ILA.c | 82 ++
 3 files changed, 95 insertions(+)
 create mode 100644 net/netfilter/xt_ILA.c

diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index 4692782..62ae50f 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -749,6 +749,18 @@ config NETFILTER_XT_TARGET_IDLETIMER
 
  To compile it as a module, choose M here.  If unsure, say N.
 
+config NETFILTER_XT_TARGET_ILA
+   tristate "ILA target support"
+   depends on IP_NF_MANGLE || IP6_NF_MANGLE
+   depends on NETFILTER_ADVANCED
+   depends on IPV6_ILA
+   help
+ This option adds an `ILA' target, which allows Identifier Locator
+ Addressing (ILA) translations. The ILA tables are managed by the
+ ILA module.
+
+ To compile it as a module, choose M here.  If unsure, say N.
+
 config NETFILTER_XT_TARGET_LED
tristate '"LED" target support'
depends on LEDS_CLASS && LEDS_TRIGGERS
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index 7638c36..4fc16aa 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -122,6 +122,7 @@ obj-$(CONFIG_NETFILTER_XT_TARGET_TCPOPTSTRIP) += 
xt_TCPOPTSTRIP.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_TEE) += xt_TEE.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_TRACE) += xt_TRACE.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_IDLETIMER) += xt_IDLETIMER.o
+obj-$(CONFIG_NETFILTER_XT_TARGET_ILA) += xt_ILA.o
 
 # matches
 obj-$(CONFIG_NETFILTER_XT_MATCH_ADDRTYPE) += xt_addrtype.o
diff --git a/net/netfilter/xt_ILA.c b/net/netfilter/xt_ILA.c
new file mode 100644
index 000..9b01e2e
--- /dev/null
+++ b/net/netfilter/xt_ILA.c
@@ -0,0 +1,82 @@
+/* x_tables module for Identifier Locator Addressing (ILA) translation
+ *
+ * (C) 2015 by Tom Herbert 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+MODULE_AUTHOR("Tom Herbert ");
+MODULE_DESCRIPTION("Xtables: ILA translation");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS("ip6t_ILA");
+MODULE_ALIAS("ip6t_ILAIN");
+MODULE_ALIAS("ip6t_ILAOUT");
+
+static unsigned int
+ila_tg_input(struct sk_buff *skb, const struct xt_action_param *par)
+{
+   ila_xlat_incoming(skb);
+
+   return XT_CONTINUE;
+}
+
+static unsigned int
+ila_tg_output(struct sk_buff *skb, const struct xt_action_param *par)
+{
+   ila_xlat_outgoing(skb);
+
+   return XT_CONTINUE;
+}
+
+static int ila_tg_check(const struct xt_tgchk_param *par)
+{
+   return 0;
+}
+
+static struct xt_target ila_tg_reg[] __read_mostly = {
+   {
+   .name   = "ILAIN",
+   .family = NFPROTO_IPV6,
+   .checkentry = ila_tg_check,
+   .target = ila_tg_input,
+   .targetsize = 0,
+   .table  = "mangle",
+   .hooks  = (1 << NF_INET_PRE_ROUTING) |
+ (1 << NF_INET_LOCAL_IN),
+   .me = THIS_MODULE,
+   },
+   {
+   .name   = "ILAOUT",
+   .family = NFPROTO_IPV6,
+   .checkentry = ila_tg_check,
+   .target = ila_tg_output,
+   .targetsize = 0,
+   .table  = "mangle",
+   .hooks  = (1 << NF_INET_POST_ROUTING) |
+ (1 << NF_INET_LOCAL_OUT),
+   .me = THIS_MODULE,
+   },
+};
+
+static int __init ila_tg_init(void)
+{
+   return xt_register_targets(ila_tg_reg, ARRAY_SIZE(ila_tg_reg));
+}
+
+static void __exit ila_tg_exit(void)
+{
+   xt_unregister_targets(ila_tg_reg, ARRAY_SIZE(ila_tg_reg));
+}
+
+module_init(ila_tg_init);
+module_exit(ila_tg_exit);
-- 
2.4.6


[PATCH net-next v2 1/5] ila: Create net/ipv6/ila directory

2015-12-01 Thread Tom Herbert

Create ila directory in preparation for supporting hooks in the
kernel other than LWT for doing ILA. This includes:
  - Moving ila.c to ila/ila_lwt.c
  - Splitting out some common functions into ila_common.c

Signed-off-by: Tom Herbert 
---
 net/ipv6/Makefile |   2 +-
 net/ipv6/ila.c| 229 --
 net/ipv6/ila/Makefile |   7 ++
 net/ipv6/ila/ila.h|  46 ++
 net/ipv6/ila/ila_common.c |  95 +++
 net/ipv6/ila/ila_lwt.c| 152 ++
 6 files changed, 301 insertions(+), 230 deletions(-)
 delete mode 100644 net/ipv6/ila.c
 create mode 100644 net/ipv6/ila/Makefile
 create mode 100644 net/ipv6/ila/ila.h
 create mode 100644 net/ipv6/ila/ila_common.c
 create mode 100644 net/ipv6/ila/ila_lwt.c

diff --git a/net/ipv6/Makefile b/net/ipv6/Makefile
index 2c900c7..2fbd90b 100644
--- a/net/ipv6/Makefile
+++ b/net/ipv6/Makefile
@@ -34,7 +34,7 @@ obj-$(CONFIG_INET6_XFRM_MODE_TUNNEL) += xfrm6_mode_tunnel.o
 obj-$(CONFIG_INET6_XFRM_MODE_ROUTEOPTIMIZATION) += xfrm6_mode_ro.o
 obj-$(CONFIG_INET6_XFRM_MODE_BEET) += xfrm6_mode_beet.o
 obj-$(CONFIG_IPV6_MIP6) += mip6.o
-obj-$(CONFIG_IPV6_ILA) += ila.o
+obj-$(CONFIG_IPV6_ILA) += ila/
 obj-$(CONFIG_NETFILTER)+= netfilter/
 
 obj-$(CONFIG_IPV6_VTI) += ip6_vti.o
diff --git a/net/ipv6/ila.c b/net/ipv6/ila.c
deleted file mode 100644
index 1a6852e..000
--- a/net/ipv6/ila.c
+++ /dev/null
@@ -1,229 +0,0 @@
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
-struct ila_params {
-   __be64 locator;
-   __be64 locator_match;
-   __wsum csum_diff;
-};
-
-static inline struct ila_params *ila_params_lwtunnel(
-   struct lwtunnel_state *lwstate)
-{
-   return (struct ila_params *)lwstate->data;
-}
-
-static inline __wsum compute_csum_diff8(const __be32 *from, const __be32 *to)
-{
-   __be32 diff[] = {
-   ~from[0], ~from[1], to[0], to[1],
-   };
-
-   return csum_partial(diff, sizeof(diff), 0);
-}
-
-static inline __wsum get_csum_diff(struct ipv6hdr *ip6h, struct ila_params *p)
-{
-   if (*(__be64 *)>daddr == p->locator_match)
-   return p->csum_diff;
-   else
-   return compute_csum_diff8((__be32 *)>daddr,
- (__be32 *)>locator);
-}
-
-static void update_ipv6_locator(struct sk_buff *skb, struct ila_params *p)
-{
-   __wsum diff;
-   struct ipv6hdr *ip6h = ipv6_hdr(skb);
-   size_t nhoff = sizeof(struct ipv6hdr);
-
-   /* First update checksum */
-   switch (ip6h->nexthdr) {
-   case NEXTHDR_TCP:
-   if (likely(pskb_may_pull(skb, nhoff + sizeof(struct tcphdr {
-   struct tcphdr *th = (struct tcphdr *)
-   (skb_network_header(skb) + nhoff);
-
-   diff = get_csum_diff(ip6h, p);
-   inet_proto_csum_replace_by_diff(>check, skb,
-   diff, true);
-   }
-   break;
-   case NEXTHDR_UDP:
-   if (likely(pskb_may_pull(skb, nhoff + sizeof(struct udphdr {
-   struct udphdr *uh = (struct udphdr *)
-   (skb_network_header(skb) + nhoff);
-
-   if (uh->check || skb->ip_summed == CHECKSUM_PARTIAL) {
-   diff = get_csum_diff(ip6h, p);
-   inet_proto_csum_replace_by_diff(>check, skb,
-   diff, true);
-   if (!uh->check)
-   uh->check = CSUM_MANGLED_0;
-   }
-   }
-   break;
-   case NEXTHDR_ICMP:
-   if (likely(pskb_may_pull(skb,
-nhoff + sizeof(struct icmp6hdr {
-   struct icmp6hdr *ih = (struct icmp6hdr *)
-   (skb_network_header(skb) + nhoff);
-
-   diff = get_csum_diff(ip6h, p);
-   inet_proto_csum_replace_by_diff(>icmp6_cksum, skb,
-   diff, true);
-   }
-   break;
-   }
-
-   /* Now change destination address */
-   *(__be64 *)>daddr = p->locator;
-}
-
-static int ila_output(struct net *net, struct sock *sk, struct sk_buff *skb)
-{
-   struct dst_entry *dst = skb_dst(skb);
-
-   if (skb->protocol != htons(ETH_P_IPV6))
-   goto drop;
-
-   update_ipv6_locator(skb, ila_params_lwtunnel(dst->lwtstate));
-
-   return dst->lwtstate->orig_output(net, sk, skb);
-
-drop:
-   kfree_skb(skb);
-   return -EINVAL;
-}
-
-static int ila_input(struct sk_buff *skb)
-{
-

[PATCH net-next V3 15/17] hv_netvsc: Eliminate xmit_more from struct hv_netvsc_packet

2015-12-01 Thread K. Y. Srinivasan

Eliminate xmit_more from struct hv_netvsc_packet.

Signed-off-by: K. Y. Srinivasan 
---
 drivers/net/hyperv/hyperv_net.h   |1 -
 drivers/net/hyperv/netvsc.c   |   13 -
 drivers/net/hyperv/netvsc_drv.c   |1 -
 drivers/net/hyperv/rndis_filter.c |2 --
 4 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index ddf51a0..96d34e2 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -131,7 +131,6 @@ struct ndis_tcp_ip_checksum_info;
 struct hv_netvsc_packet {
/* Bookkeeping stuff */
u8 status;
-   u8 xmit_more; /* from skb */
u8 cp_partial; /* partial copy into send buffer */
 
u8 rmsg_size; /* RNDIS header and PPI size */
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 11b009e..cd5b65e 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -712,6 +712,7 @@ static u32 netvsc_copy_to_send_buf(struct netvsc_device 
*net_device,
 + pend_size;
int i;
bool is_data_pkt = (skb != NULL) ? true : false;
+   bool xmit_more = (skb != NULL) ? skb->xmit_more : false;
u32 msg_size = 0;
u32 padding = 0;
u32 remain = packet->total_data_buflen % net_device->pkt_align;
@@ -719,7 +720,7 @@ static u32 netvsc_copy_to_send_buf(struct netvsc_device 
*net_device,
packet->page_buf_cnt;
 
/* Add padding */
-   if (is_data_pkt && packet->xmit_more && remain &&
+   if (is_data_pkt && xmit_more && remain &&
!packet->cp_partial) {
padding = net_device->pkt_align - remain;
rndis_msg->msg_len += padding;
@@ -758,6 +759,7 @@ static inline int netvsc_send_pkt(
int ret;
struct hv_page_buffer *pgbuf;
u32 ring_avail = hv_ringbuf_avail_percent(_channel->outbound);
+   bool xmit_more = (skb != NULL) ? skb->xmit_more : false;
 
nvmsg.hdr.msg_type = NVSP_MSG1_TYPE_SEND_RNDIS_PKT;
if (skb != NULL) {
@@ -789,7 +791,7 @@ static inline int netvsc_send_pkt(
 * unnecessarily.
 */
if (ring_avail < (RING_AVAIL_PERCENT_LOWATER + 1))
-   packet->xmit_more = false;
+   xmit_more = false;
 
if (packet->page_buf_cnt) {
pgbuf = packet->cp_partial ? (*pb) +
@@ -801,14 +803,14 @@ static inline int netvsc_send_pkt(
  sizeof(struct 
nvsp_message),
  req_id,
  
VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED,
- !packet->xmit_more);
+ !xmit_more);
} else {
ret = vmbus_sendpacket_ctl(out_channel, ,
   sizeof(struct nvsp_message),
   req_id,
   VM_PKT_DATA_INBAND,
   
VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED,
-  !packet->xmit_more);
+  !xmit_more);
}
 
if (ret == 0) {
@@ -854,6 +856,7 @@ int netvsc_send(struct hv_device *device,
struct multi_send_data *msdp;
struct hv_netvsc_packet *msd_send = NULL, *cur_send = NULL;
bool try_batch;
+   bool xmit_more = (skb != NULL) ? skb->xmit_more : false;
 
net_device = get_outbound_net_device(device);
if (!net_device)
@@ -911,7 +914,7 @@ int netvsc_send(struct hv_device *device,
if (msdp->pkt)
dev_kfree_skb_any(skb);
 
-   if (packet->xmit_more && !packet->cp_partial) {
+   if (xmit_more && !packet->cp_partial) {
msdp->pkt = packet;
msdp->count++;
} else {
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index bc4be1d..7520c52 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -471,7 +471,6 @@ check_size:
packet = (struct hv_netvsc_packet *)skb->cb;
 
packet->status = 0;
-   packet->xmit_more = skb->xmit_more;
 
packet->vlan_tci = skb->vlan_tci;
 
diff --git a/drivers/net/hyperv/rndis_filter.c 
b/drivers/net/hyperv/rndis_filter.c
index 6ba5adf..3c06aa7 100644
--- a/drivers/net/hyperv/rndis_filter.c
+++ b/drivers/net/hyperv/rndis_filter.c
@@ -236,8 +236,6 @@ static int rndis_filter_send_request(struct rndis_device 
*dev,
pb[0].len;
}
 
-   packet->xmit_more = false;
-
ret = netvsc_send(dev->net_dev->dev, packet, NULL, , NULL);
return ret;
 }
-- 
1.7.4.1


Re: ipsec impact on performance

2015-12-01 Thread David Ahern


On 12/1/15 10:17 AM, Rick Jones wrote:
> On 12/01/2015 09:59 AM, Sowmini Varadhan wrote:
>> But these are all still relatively small things - tweaking them
>> doesn't get me significantly past the 3 Gbps limit. Any suggestions
>> on how to make this budge (or design criticism of the patch) would
>> be welcome.
>
> What do the perf profiles show?  Presumably, loss of TSO/GSO means an
> increase in the per-packet costs, but if the ipsec path significantly
> increases the per-byte costs...
>
> Short of a perf profile, I suppose one way to probe for per-packet
> versus per-byte would be to up the MTU.  That should reduce the
> per-packet costs while keeping the per-byte roughly the same.


Using iperf3 and AH with NULL algorithm between 2 peers connected by a 
10G link.


Without AH configured I get a steady 9.9 Gbps with iperf3 consuming 
about 55% cpu.


With AH I get ~1.5 Gbps with MTU at 1500:

[  4]   0.00-1.01   sec   160 MBytes  1.33 Gbits/sec   23    905 KBytes
[  4]   1.01-2.00   sec   211 MBytes  1.79 Gbits/sec    0    996 KBytes

iperf3 runs about 60% CPU and ksoftirqd/2 is at 86%.


Bumping the MTU to 9000:

[  4]   3.00-4.00   sec   914 MBytes  7.67 Gbits/sec  260   1.01 MBytes
[  4]   4.00-5.00   sec  1012 MBytes  8.49 Gbits/sec    0   1.23 MBytes
[  4]   5.00-6.00   sec  1.15 GBytes  9.88 Gbits/sec    0   1.23 MBytes

At this rate iperf3 was at 95% CPU and ksoftirqd was not relevant.
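
One way to read the numbers above is to convert throughput into an
approximate packet rate. The helper below is a back-of-the-envelope
sketch, not a measurement tool: approx_pps() is a hypothetical name, and
it assumes every packet carries a full MTU of payload and ignores header
overhead.

```c
#include <assert.h>

/* Approximate packets per second for a given throughput and MTU.
 * Assumes full-MTU packets and ignores header overhead. */
double approx_pps(double gbits_per_sec, unsigned int mtu_bytes)
{
	assert(mtu_bytes > 0);
	return gbits_per_sec * 1e9 / (mtu_bytes * 8.0);
}
```

Under those assumptions, ~1.5 Gbps at MTU 1500 is about 125,000 packets
per second, and 9.88 Gbps at MTU 9000 is roughly 137,000 packets per
second. The packet rate barely moves while the byte rate grows several
times over, which would be consistent with per-packet cost being the
dominant factor in this setup.
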

Re: ipsec impact on performance

2015-12-01 Thread Sowmini Varadhan

On (12/01/15 16:56), David Ahern wrote:
> 
> Using iperf3 and AH with NULL algorithm between 2 peers connected by
> a 10G link.
> 
I'm using esp-null, not AH, and iperf2, which I understand is 
quite different from, and more aggressive than, iperf3 (though I'm not 
sure that it matters for this single-stream case).

> With AH I get ~1.5 Gbps with MTU at 1500:

But yes, I get approx that too. 

The "good" news is that I can get about 3 Gbps with my patch. So one
could say that I've 2x-ed the perf. Except that:

The "bad" news is that even GSO/GRO can do way better, so we
need to be able to extend that perf to also be available
to some key TCP and IP extensions (like md5 and ipsec, maybe)
and beyond (i.e., we need to de-ossify the stack so we can extend
TCP/IP features without sacrificing perf along the way).

The not-so-great news is that I see that just adding perf tracepoints
(not even enabling them!) seems to make a small diff (3 Gbps vs 3.2 Gbps) 
to my numbers. Is that mere standard-deviation, or something 
one should be aware of, about perf?

> iperf3 runs about 60% CPU and ksoftirqd/2 is at 86%.

yes, not surprising. You really need to compare this to GSO/GRO
for a pure-s/w,  apples-apples comparison.

> Bumping the MTU to 9000:

Yes that's not always an option. See also the comments from Eric/Rick
about latency [http://lists.openwall.net/netdev/2015/11/24/111]. 

--Sowmini

Re: [RFC] Stable interface index option

2015-12-01 Thread Andrew Lunn

> In general the ifindexes are designed to not be reused very fast.

Some parts of multicast group management rely on this. You need to
remove group memberships from a socket when an interface has
disappeared, e.g. a VPN interface has gone away. You can pass the
ifindex of the no longer existing interface when removing the group
memberships. If that ifindex has been re-used, you are going to have
interesting race conditions.

Andrew

Re: [RFC 1/4] net: support per queue tx_usecs in sysfs

2015-12-01 Thread Alexander Duyck

On Tue, Dec 1, 2015 at 3:44 PM, Jesse Brandeburg
 wrote:
> On Tue, 1 Dec 2015 14:13:34 -0800
> Florian Fainelli  wrote:
>
>> On 01/12/15 00:01, kan.li...@intel.com wrote:
>> > From: Kan Liang 
>> >
>> > Network devices usually have many queues. Each queue has its own
>> > tx_usecs options. Currently, we can only set all the queues with same
>> > value by ethtool. This patch expose the tx_usecs in sysfs. So the user
>> > can set/get per queue coalesce parameter tx_usecs by sysfs.
>>
>> The new interface you propose makes things inconsistent, since we have
>> two separate configuration paths (sysfs and ethtool), and it would seem
>> better to have per-queue awareness in ethtool, since there is a whole
>> bunch of other parameters that could be configured on a per-queue basis.
>>
>> Have you tried to extend existing ethtool interfaces to cover the need
>> for multiple queues?
>
> While I agree that ethtool provides a similar functionality, ethtool
> was designed (particularly the ethtool -C/c commands) around one queue
> NICs.  We can't change the output or functionality of the user
> interface without breaking a bunch of user's scripts and stuff.

Actually it seems like it should be fairly easy to extend the existing
interface.  If for example you were to add a couple new keywords such
as rx-queue or tx-queue what you could do is make it so that you would
then specifically overwrite or access the values of a given Tx queue
or Rx queue instead of doing it generically for the entire device.

> With this effort, Kan is laying groundwork for making further kernel
> changes, and having the kernel call back in to drivers via ethtool
> mechanisms that were designed before multiple queue adapters.
>
> We can also next migrate the legacy ethtool interfaces to use these
> new .ndo_ops should we wish.

Maybe that is the path this should start taking now.  Another thing to
keep in mind is that not all drivers make use of the rx-usecs value
the way the Intel drivers do.  As such we need to be able to still
support all the various interrupt moderation types that are supported
by any NICs that might make use of this feature.

> These patches were provided with the intent of getting some feedback
> about going down this path of making a *consistent* user interface that
> is driver agnostic in sysfs, and supports multiple queue adapters.

If you are truly going for something that is driver agnostic you need
to start looking at other drivers.  For example I don't see how this
current implementation deals with the tx/rx_frames values provided in
the mlx4 drivers for their interrupt moderation.  Also it seems like
the assumption here is that you are going to want to run all of the
queues statically without allowing dynamic interrupt moderation.  I
would think that this might be something that could be managed per
queue as well.

Re: [PATCH net-next V3 00/17] hv_netvsc: Eliminate the additional head room

2015-12-01 Thread David Miller

From: "K. Y. Srinivasan" 
Date: Tue,  1 Dec 2015 16:42:58 -0800

> In an attempt to avoid having to allocate memory on the send path, the netvsc
> driver was requesting additional head room so that both rndis header and the
> netvsc packet (the state that had to persist) could be placed in the skb.
> Since the amount of head room requested was exceeding the default head room
> as set in LL_MAX_HEADER, we were forcing a reallocation of skb.
> 
> With this patch-set, I have reduced the size of the netvsc packet to less
> than 20 bytes and with this reduction we don't need to ask for any additional
> headroom. We place the rndis header in the skb head room and we place the
> netvsc packet in control buffer area in the skb.
> 
> V2:  - Addressed  review comments:
>  - Eliminated more fields from netvsc packet structure.
> 
> V3:  - Fixed a typo in patch: hv_netvsc: Don't ask for additional head room 
> in the skb.

Series applied, thanks.

Re: Increasing skb->mark size

2015-12-01 Thread Andi Kleen

> >This would be great. I've recently run into some issues with
> >the overhead of the Android firewall setup.
> >
> >So basically you need 4 extra bytes in sk_buff. How about:
> >
> >- shrinking skb->priority to 2 byte
> 
> That wouldn't work, see SO_PRIORITY and such (4 bytes) ...

But does anybody really use the full 4 bytes for the priority?
SO_PRIORITY could well truncate the value.

> 
> >- skb_iff is either skb->dev->iff or 0. so it could be replaced with a
> >single bit flag for the 0 case.
> 
> ... and that one wouldn't work on ingress.

Please explain.

-Andi

Re: [PATCH net] bpf, array: fix heap out-of-bounds access when updating elements

2015-12-01 Thread David Miller

From: Daniel Borkmann 
Date: Mon, 30 Nov 2015 13:02:55 +0100

> During own review but also reported by Dmitry's syzkaller [1] it has been
> noticed that we trigger a heap out-of-bounds access on eBPF array maps
> when updating elements. This happens with each map whose map->value_size
> (specified during map creation time) is not multiple of 8 bytes.
> 
> In array_map_alloc(), elem_size is round_up(attr->value_size, 8) and
> used to align array map slots for faster access. However, in function
> array_map_update_elem(), we update the element as ...
> 
> memcpy(array->value + array->elem_size * index, value, array->elem_size);
> 
> ... where we access 'value' out-of-bounds, since it was allocated from
> map_update_elem() from syscall side as kmalloc(map->value_size, GFP_USER)
> and later on copied through copy_from_user(value, uvalue, map->value_size).
> Thus, up to 7 bytes, we can access out-of-bounds.
> 
> Same could happen from within an eBPF program, where in worst case we
> access beyond an eBPF program's designated stack.
> 
> Since 1be7f75d1668 ("bpf: enable non-root eBPF programs") didn't hit an
> official release yet, it only affects privileged users.
> 
> In case of array_map_lookup_elem(), the verifier prevents eBPF programs
> from accessing beyond map->value_size through check_map_access(). Also
> from syscall side map_lookup_elem() only copies map->value_size back to
> user, so nothing could leak.
> 
>   [1] http://github.com/google/syzkaller
> 
> Fixes: 28fbcfa08d8e ("bpf: add array type of eBPF maps")
> Reported-by: Dmitry Vyukov 
> Signed-off-by: Daniel Borkmann 

Applied, thanks.
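
The size mismatch described in the commit message can be reproduced in a few lines of user-space C. This is only a sketch, not the kernel code: `toy_array`, `toy_init`, `toy_update` and `overread_bytes` are made-up names, and the remedy shown (copying only `value_size` bytes) is an assumption about the shape of the fix, since the patch itself is not quoted here. Only the round-up-to-8 slot size and the two competing copy lengths come from the text above.

```c
#include <stddef.h>
#include <string.h>

#define TOY_STORAGE 64  /* illustrative backing store, not a kernel constant */

/* Round n up to the next multiple of 8, like round_up(value_size, 8). */
static size_t round_up8(size_t n)
{
    return (n + 7) & ~(size_t)7;
}

struct toy_array {
    size_t value_size;               /* size requested at map creation */
    size_t elem_size;                /* padded slot size used for indexing */
    unsigned char value[TOY_STORAGE];
};

static void toy_init(struct toy_array *a, size_t value_size)
{
    memset(a, 0, sizeof(*a));
    a->value_size = value_size;
    a->elem_size = round_up8(value_size);
}

/* Bytes a copy of elem_size would read past a value_size-sized source. */
static size_t overread_bytes(const struct toy_array *a)
{
    return a->elem_size - a->value_size;   /* 0..7 */
}

/*
 * Update one slot. Slots are still spaced elem_size apart, but only
 * value_size bytes are read from the caller's buffer, so a source
 * buffer of exactly value_size bytes is never over-read.
 */
static int toy_update(struct toy_array *a, size_t index, const void *value)
{
    if ((index + 1) * a->elem_size > TOY_STORAGE)
        return -1;
    memcpy(a->value + a->elem_size * index, value, a->value_size);
    return 0;
}
```

With a `value_size` of 5 the slot size becomes 8, so a copy of `elem_size` bytes would read 3 bytes beyond a 5-byte source buffer, which matches the "up to 7 bytes" bound in the commit message.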

Re: [PATCH v4 net 0/6] Marvell Armada XP/370/38X Neta fixes

2015-12-01 Thread David Miller

From: Marcin Wojtas 
Date: Mon, 30 Nov 2015 13:27:40 +0100

> I'm sending v4 with corrected commit log of the last patch, in order
> to avoid possible conflicts between the branches as suggested by
> Gregory Clement.

Series applied, thanks.

Re: [RFC 1/4] net: support per queue tx_usecs in sysfs

2015-12-01 Thread David Miller

From: Florian Fainelli 
Date: Tue, 01 Dec 2015 14:13:34 -0800

> The new interface you propose makes things inconsistent, since we have
> two separate configuration paths (sysfs and ethtool), and it would seem
> better to have per-queue awareness in ethtool, since there is a whole
> bunch of other parameters that could be configured on a per-queue basis.

Agreed, we have to extend ethtool to support this, in order to provide
a consistent interface.

We could even do this by encapsulating one ethtool command within
another, the outer ethtool command says "apply the inner op to
queue(s) N".
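
One way to picture the encapsulation idea is an outer request that carries a queue mask plus the inner operation. The sketch below is purely hypothetical: `toy_per_queue_op`, `toy_nic`, `toy_apply` and `TOY_MAX_QUEUES` are invented for illustration and do not correspond to any existing ethtool structure or command, and the inner operation is reduced to a single tx_usecs value.

```c
#include <stdint.h>

#define TOY_MAX_QUEUES 64   /* hypothetical upper bound on queues */

/* Hypothetical outer command: "apply the inner op to the queues in queue_mask". */
struct toy_per_queue_op {
    uint32_t inner_cmd;                        /* opcode of the wrapped command */
    uint32_t queue_mask[TOY_MAX_QUEUES / 32];  /* one bit per queue, fixed-size words */
    uint32_t tx_usecs;                         /* payload of the inner op */
};

/* Stand-in for per-queue driver state. */
struct toy_nic {
    uint32_t tx_usecs[TOY_MAX_QUEUES];
};

/* Apply the inner op to every selected queue; returns how many were touched. */
static int toy_apply(struct toy_nic *nic, const struct toy_per_queue_op *op)
{
    int applied = 0;
    unsigned int q;

    for (q = 0; q < TOY_MAX_QUEUES; q++) {
        if (op->queue_mask[q / 32] & (UINT32_C(1) << (q % 32))) {
            nic->tx_usecs[q] = op->tx_usecs;
            applied++;
        }
    }
    return applied;
}
```

The point of the shape is that the inner command keeps its existing meaning; only the wrapper decides which queues it is applied to, so existing whole-device users are unaffected.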

Re: [PATCH net-next v3 03/17] net: ethtool: add new ETHTOOL_GSETTINGS/SSETTINGS API

2015-12-01 Thread David Miller

From: David Decotigny 
Date: Mon, 30 Nov 2015 14:05:41 -0800

> This patch defines a new ETHTOOL_GSETTINGS/SSETTINGS API, handled by
> the new get_ksettings/set_ksettings callbacks. This API provides
> support for most legacy ethtool_cmd fields, adds support for larger
> link mode masks (up to 4064 bits, variable length), and removes
> ethtool_cmd deprecated fields (transceiver/maxrxpkt/maxtxpkt).

Please do not define the mask using a non-fixed type.  I know it makes
it easier to use the various bitmap helper routines if you use 'long',
but here it is clearly superior to use "u32" for the bitmap type and
do the bit operations by hand if necessary.

Otherwise you have to have all of this ulong size CPP conditional code
which is incredibly ugly.

Furthermore you have to use fixed sized types anyways so that we don't
need compat code to deal with 32-bit userspace applications making
these ethtool calls into a 64-bit kernel.

Thanks.
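
As a rough illustration of "u32 for the bitmap type and bit operations by hand", here is a small sketch. The names `toy_link_modes`, `toy_set_mode`, `toy_test_mode` and the word count `TOY_LINK_MODE_WORDS` are assumptions made up for the example; they are not the structure or helpers from the patch under review.

```c
#include <stdint.h>

#define TOY_LINK_MODE_WORDS 4   /* illustrative: room for 128 mode bits */

/*
 * Fixed-width words give the same layout to 32-bit and 64-bit
 * userspace, so no compat translation and no sizeof(long) #ifdefs.
 */
struct toy_link_modes {
    uint32_t mask[TOY_LINK_MODE_WORDS];
};

/* Set one mode bit by hand: word index is bit / 32, position is bit % 32. */
static void toy_set_mode(struct toy_link_modes *m, unsigned int bit)
{
    m->mask[bit / 32] |= UINT32_C(1) << (bit % 32);
}

/* Return 1 if the mode bit is set, 0 otherwise. */
static int toy_test_mode(const struct toy_link_modes *m, unsigned int bit)
{
    return (int)((m->mask[bit / 32] >> (bit % 32)) & 1);
}
```

Because every field is a `uint32_t`, `sizeof(struct toy_link_modes)` is 16 bytes regardless of whether the caller is a 32-bit or 64-bit process.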

[PATCH net 1/3] bnxt_en: Fixed incorrect implementation of ndo_set_mac_address

2015-12-01 Thread Michael Chan

From: Jeffrey Huang 

The existing ndo_set_mac_address only copies the new MAC addr
and doesn't set the new MAC addr in the HW. The correct way is
to delete the existing default MAC filter from HW and add
the new one. Because RFS filters also depend on the
default MAC filter's l2 context, the driver must go through
close_nic() to delete the default MAC and RFS filters, then
open_nic() to set the default MAC address in HW.

Signed-off-by: Jeffrey Huang 
Signed-off-by: Michael Chan 
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c 
b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index db15c5e..651b587 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -5212,13 +5212,22 @@ init_err:
 static int bnxt_change_mac_addr(struct net_device *dev, void *p)
 {
struct sockaddr *addr = p;
+   struct bnxt *bp = netdev_priv(dev);
+   int rc = 0;
 
if (!is_valid_ether_addr(addr->sa_data))
return -EADDRNOTAVAIL;
 
+   if (ether_addr_equal(addr->sa_data, dev->dev_addr))
+   return 0;
+
memcpy(dev->dev_addr, addr->sa_data, dev->addr_len);
+   if (netif_running(dev)) {
+   bnxt_close_nic(bp, false, false);
+   rc = bnxt_open_nic(bp, false, false);
+   }
 
-   return 0;
+   return rc;
 }
 
 /* rtnl_lock held */
-- 
1.8.3.1


Re: ipsec impact on performance

2015-12-01 Thread David Ahern


On 12/1/15 5:09 PM, Sowmini Varadhan wrote:

> The not-so-great news is that I see that just adding perf tracepoints
> (not even enabling them!) seems to make a small diff (3 Gbps vs 3.2 Gbps)
> to my numbers. Is that mere standard-deviation, or something
> one should be aware of, about perf?


The existence of tracepoints has no overhead until they are activated
(i.e., you launch perf or start ftrace for those tracepoints).


Re: [RFC 1/4] net: support per queue tx_usecs in sysfs

2015-12-01 Thread David Miller

From: Jesse Brandeburg 
Date: Tue, 1 Dec 2015 15:44:54 -0800

> While I agree that ethtool provides a similar functionality, ethtool
> was designed (particularly the ethtool -C/c commands) around one queue
> NICs.  We can't change the output or functionality of the user
> interface without breaking a bunch of user's scripts and stuff.

See my suggestion, we can encapsulate existing ethtool commands inside
of a new command that specifies operations that are to be applied to
specific queues.

Do not use sysfs for this.

[PATCH net 0/3] bnxt_en: set mac address and uc_list bug fixes.

2015-12-01 Thread Michael Chan

Fix ndo_set_mac_address() for PF and VF.
Re-apply uc_list after chip reset.

Michael Chan (3):
  bnxt_en: Fixed incorrect implementation of ndo_set_mac_address
  bnxt_en: enforce properly storing of MAC address
  bnxt_en: Setup uc_list mac filters after resetting the chip.

 drivers/net/ethernet/broadcom/bnxt/bnxt.c   | 42 +
 drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c |  7 ++---
 2 files changed, 32 insertions(+), 17 deletions(-)

-- 
1.8.3.1


[PATCH net 2/3] bnxt_en: enforce properly storing of MAC address

2015-12-01 Thread Michael Chan

From: Jeffrey Huang 

For PF, the bp->pf.mac_addr always holds the permanent MAC
addr assigned by the HW.  For VF, the bp->vf.mac_addr always
holds the administrator assigned VF MAC addr. The randomly
generated VF MAC addr should never get stored to bp->vf.mac_addr.
This way, when the VF wants to change the MAC address, we can tell
if the administrator has already set it and disallow the VF from
changing it.

Signed-off-by: Jeffrey Huang 
Signed-off-by: Michael Chan 
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c   | 18 +++---
 drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c |  7 +++
 2 files changed, 14 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c 
b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 651b587..4656480 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -3625,6 +3625,7 @@ static int bnxt_hwrm_func_qcaps(struct bnxt *bp)
pf->fw_fid = le16_to_cpu(resp->fid);
pf->port_id = le16_to_cpu(resp->port_id);
memcpy(pf->mac_addr, resp->perm_mac_address, ETH_ALEN);
+   memcpy(bp->dev->dev_addr, pf->mac_addr, ETH_ALEN);
pf->max_rsscos_ctxs = le16_to_cpu(resp->max_rsscos_ctx);
pf->max_cp_rings = le16_to_cpu(resp->max_cmpl_rings);
pf->max_tx_rings = le16_to_cpu(resp->max_tx_rings);
@@ -3648,8 +3649,11 @@ static int bnxt_hwrm_func_qcaps(struct bnxt *bp)
 
vf->fw_fid = le16_to_cpu(resp->fid);
memcpy(vf->mac_addr, resp->perm_mac_address, ETH_ALEN);
-   if (!is_valid_ether_addr(vf->mac_addr))
-   random_ether_addr(vf->mac_addr);
+   if (is_valid_ether_addr(vf->mac_addr))
+   /* overwrite netdev dev_adr with admin VF MAC */
+   memcpy(bp->dev->dev_addr, vf->mac_addr, ETH_ALEN);
+   else
+   random_ether_addr(bp->dev->dev_addr);
 
vf->max_rsscos_ctxs = le16_to_cpu(resp->max_rsscos_ctx);
vf->max_cp_rings = le16_to_cpu(resp->max_cmpl_rings);
@@ -5218,6 +5222,9 @@ static int bnxt_change_mac_addr(struct net_device *dev, 
void *p)
if (!is_valid_ether_addr(addr->sa_data))
return -EADDRNOTAVAIL;
 
+   if (BNXT_VF(bp) && is_valid_ether_addr(bp->vf.mac_addr))
+   return -EADDRNOTAVAIL;
+
if (ether_addr_equal(addr->sa_data, dev->dev_addr))
return 0;
 
@@ -5695,15 +5702,12 @@ static int bnxt_init_one(struct pci_dev *pdev, const 
struct pci_device_id *ent)
bnxt_set_tpa_flags(bp);
bnxt_set_ring_params(bp);
dflt_rings = netif_get_num_default_rss_queues();
-   if (BNXT_PF(bp)) {
-   memcpy(dev->dev_addr, bp->pf.mac_addr, ETH_ALEN);
+   if (BNXT_PF(bp))
bp->pf.max_irqs = max_irqs;
-   } else {
 #if defined(CONFIG_BNXT_SRIOV)
-   memcpy(dev->dev_addr, bp->vf.mac_addr, ETH_ALEN);
+   else
bp->vf.max_irqs = max_irqs;
 #endif
-   }
bnxt_get_max_rings(bp, &max_rx_rings, &max_tx_rings);
bp->rx_nr_rings = min_t(int, dflt_rings, max_rx_rings);
bp->tx_nr_rings_per_tc = min_t(int, dflt_rings, max_tx_rings);
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c 
b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
index f4cf688..7a9af28 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
@@ -804,10 +804,9 @@ void bnxt_update_vf_mac(struct bnxt *bp)
if (!is_valid_ether_addr(resp->perm_mac_address))
goto update_vf_mac_exit;
 
-   if (ether_addr_equal(resp->perm_mac_address, bp->vf.mac_addr))
-   goto update_vf_mac_exit;
-
-   memcpy(bp->vf.mac_addr, resp->perm_mac_address, ETH_ALEN);
+   if (!ether_addr_equal(resp->perm_mac_address, bp->vf.mac_addr))
+   memcpy(bp->vf.mac_addr, resp->perm_mac_address, ETH_ALEN);
+   /* overwrite netdev dev_adr with admin VF MAC */
memcpy(bp->dev->dev_addr, bp->vf.mac_addr, ETH_ALEN);
 update_vf_mac_exit:
mutex_unlock(&bp->hwrm_cmd_lock);
-- 
1.8.3.1


[PATCH net 3/3] bnxt_en: Setup uc_list mac filters after resetting the chip.

2015-12-01 Thread Michael Chan

Call bnxt_cfg_rx_mode() in bnxt_init_chip() to setup uc_list and
mc_list mac address filters.  Before the patch, uc_list is not
setup again after chip reset (such as ethtool ring size change)
and macvlans don't work any more after that.

Modify bnxt_cfg_rx_mode() to return error codes appropriately so
that the init chip sequence can detect any failures.

Signed-off-by: Michael Chan 
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c 
b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 4656480..aecb2b62 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -3884,6 +3884,8 @@ static int bnxt_alloc_rfs_vnics(struct bnxt *bp)
 #endif
 }
 
+static int bnxt_cfg_rx_mode(struct bnxt *);
+
 static int bnxt_init_chip(struct bnxt *bp, bool irq_re_init)
 {
int rc = 0;
@@ -3950,11 +3952,9 @@ static int bnxt_init_chip(struct bnxt *bp, bool 
irq_re_init)
bp->vnic_info[0].rx_mask |=
CFA_L2_SET_RX_MASK_REQ_MASK_PROMISCUOUS;
 
-   rc = bnxt_hwrm_cfa_l2_set_rx_mask(bp, 0);
-   if (rc) {
-   netdev_err(bp->dev, "HWRM cfa l2 rx mask failure rc: %x\n", rc);
+   rc = bnxt_cfg_rx_mode(bp);
+   if (rc)
goto err_out;
-   }
 
rc = bnxt_hwrm_set_coal(bp);
if (rc)
@@ -4869,7 +4869,7 @@ static void bnxt_set_rx_mode(struct net_device *dev)
}
 }
 
-static void bnxt_cfg_rx_mode(struct bnxt *bp)
+static int bnxt_cfg_rx_mode(struct bnxt *bp)
 {
struct net_device *dev = bp->dev;
struct bnxt_vnic_info *vnic = &bp->vnic_info[0];
@@ -4918,6 +4918,7 @@ static void bnxt_cfg_rx_mode(struct bnxt *bp)
netdev_err(bp->dev, "HWRM vnic filter failure rc: %x\n",
   rc);
vnic->uc_filter_count = i;
+   return rc;
}
}
 
@@ -4926,6 +4927,8 @@ skip_uc:
if (rc)
netdev_err(bp->dev, "HWRM cfa l2 rx mask failure rc: %x\n",
   rc);
+
+   return rc;
 }
 
 static netdev_features_t bnxt_fix_features(struct net_device *dev,
-- 
1.8.3.1


Re: [RFC PATCH V2 0/3] IXGBE/VFIO: Add live migration support for SRIOV NIC

2015-12-01 Thread Alexander Duyck

On Tue, Dec 1, 2015 at 7:28 AM, Michael S. Tsirkin  wrote:
> On Tue, Dec 01, 2015 at 11:04:31PM +0800, Lan, Tianyu wrote:
>>
>>
>> On 12/1/2015 12:07 AM, Alexander Duyck wrote:
>> >They can only be corrected if the underlying assumptions are correct
>> >and they aren't.  Your solution would have never worked correctly.
>> >The problem is you assume you can keep the device running when you are
>> >migrating and you simply cannot.  At some point you will always have
>> >to stop the device in order to complete the migration, and you cannot
>> >stop it before you have stopped your page tracking mechanism.  So
>> >unless the platform has an IOMMU that is somehow taking part in the
>> >dirty page tracking you will not be able to stop the guest and then
>> >the device, it will have to be the device and then the guest.
>> >
>> >>>Doing suspend and resume() may help to do migration easily but some
>> >>>devices requires low service down time. Especially network and I got
>> >>>that some cloud company promised less than 500ms network service downtime.
>> >Honestly focusing on the downtime is getting the cart ahead of the
>> >horse.  First you need to be able to do this without corrupting system
>> >memory and regardless of the state of the device.  You haven't even
>> >gotten to that state yet.  Last I knew the device had to be up in
>> >order for your migration to even work.
>>
>> I think the issue is that the content of an rx packet delivered to the stack may
>> change during migration because that piece of memory won't be migrated to the
>> new machine. This may confuse applications or the stack. The current dummy write
>> solution can ensure the content of the packet won't change after the dummy
>> write, but the content may not be the received data if migration happens before
>> that point. We can recheck the content via checksum or crc in the protocol
>> after the dummy write to ensure the content is what the VF received. I think the stack
>> already does such checks and the packet will be dropped if it fails to
>> pass the check.
>
>
> Most people nowadays rely on hardware checksums so I don't think this can
> fly.

Correct.  The checksum/crc approach will not work since it is possible
for a checksum to even be mangled in the case of some features such as
LRO or GRO.

>> Another way is to tell Qemu about all the memory the driver is using and let Qemu
>> migrate that memory after stopping the VCPU and the device. This seems safe but
>> the implementation may be complex.
>
> Not really 100% safe.  See below.
>
> I think hiding these details behind dma_* API does have
> some appeal. In any case, it gives us a good
> terminology as it covers what most drivers do.

That was kind of my thought.  If we were to build our own
dma_mark_clean() type function that will mark the DMA region dirty on
sync or unmap then that is half the battle right there as we would be
able to at least keep the regions consistent after they have left the
driver.

> There are several components to this:
> - dma_map_* needs to prevent page from
>   being migrated while device is running.
>   For example, expose some kind of bitmap from guest
>   to host, set bit there while page is mapped.
>   What happens if we stop the guest and some
>   bits are still set? See dma_alloc_coherent below
>   for some ideas.

Yeah, I could see something like this working.  Maybe we could do
something like what was done for the NX bit and make use of the upper
order bits beyond the limits of the memory range to mark pages as
non-migratable?

I'm curious.  What we have with a DMA mapped region is essentially
shared memory between the guest and the device.  How would we resolve
something like this with IVSHMEM, or are we blocked there as well in
terms of migration?

> - dma_unmap_* needs to mark page as dirty
>   This can be done by writing into a page.
>
> - dma_sync_* needs to mark page as dirty
>   This is trickier as we can not change the data.
>   One solution is using atomics.
>   For example:
> int x = ACCESS_ONCE(*p);
> cmpxchg(p, x, x);
>   Seems to do a write without changing page
>   contents.

Like I said we can probably kill 2 birds with one stone by just
implementing our own dma_mark_clean() for x86 virtualized
environments.

I'd say we could take your solution one step further and just use 0
instead of bothering to read the value.  After all it won't write the
area if the value at the offset is not 0.  The only downside is that
this is a locked operation so we will take a pretty serious
performance penalty when this is active.  As such my preference would
be to hide the code behind some static key that we could then switch
on in the event of a VM being migrated.
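
The two "write without changing the contents" variants discussed above can be sketched in user space with C11 atomics standing in for the kernel's cmpxchg(). The function names `touch_read_cas` and `touch_zero_cas` are invented for the example. The sketch only demonstrates that the stored value is preserved in both cases; whether the page actually ends up marked dirty is a property of the hardware and hypervisor that this code cannot show.

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Variant 1: read the current value, then compare-and-swap it with
 * itself (the "x = ACCESS_ONCE(*p); cmpxchg(p, x, x);" idea).
 */
static void touch_read_cas(_Atomic uint32_t *p)
{
    uint32_t x = atomic_load(p);

    /* Succeeds and stores x over x unless another writer raced with us. */
    atomic_compare_exchange_strong(p, &x, x);
}

/*
 * Variant 2: skip the read and compare against 0. If the word is 0 it
 * is replaced by 0; if it is anything else the exchange fails and the
 * stored value is left as it was.
 */
static void touch_zero_cas(_Atomic uint32_t *p)
{
    uint32_t expected = 0;

    atomic_compare_exchange_strong(p, &expected, 0);
}
```

In a real driver path this would presumably sit behind the kind of switch mentioned above, so the locked operation is only paid for while a migration is in progress.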

> - dma_alloc_coherent memory (e.g. device rings)
>   must be migrated after device stopped modifying it.
>   Just stopping the VCPU is not enough:
>   you must make sure device is not changing it.
>
>   Or maybe the device has some kind of ring flush operation,
>   if there was a

Re: IPv4 tunnels: why IP-IP and SIT enforce DF bit, but GRE does not?

2015-12-01 Thread David Miller

From: Hannes Frederic Sowa 
Date: Tue, 01 Dec 2015 14:30:55 +0100

> On Tue, Dec 1, 2015, at 14:20, Konstantin Shemyak wrote:
>> My point was not to question its feasibility, but to make it similar 
>> across GRE, IP-IP and SIT tunnels.
> 
> I would send a patch to add it again if Parvin didn't have good reasons
> to remove it.

The tunnel code consolidation created a lot of regressions and subtle
unintended changes in behavior between the different tunnel types.

This DF bit issue is just yet another example of that.

I'm really disappointed at how many bugs and problems were introduced
by those changes.

[PATCH net] openvswitch: fix hangup on vxlan/gre/geneve device deletion

2015-12-01 Thread Paolo Abeni

Each openvswitch tunnel vport (vxlan,gre,geneve) holds a reference
to the underlying tunnel device, but never releases it when such a
device is deleted.
Deleting the underlying device via the ip tool causes the kernel to
hang in the netdev_wait_allrefs() loop.
This commit ensures that on device unregistration dp_detach_port_notify()
is called for all vports that hold the device reference, properly
releasing it.

Fixes: 614732eaa12d ("openvswitch: Use regular VXLAN net_device device")
Fixes: b2acd1dc3949 ("openvswitch: Use regular GRE net_device instead of vport")
Fixes: 6b001e682e90 ("openvswitch: Use Geneve device.")
Signed-off-by: Paolo Abeni 
---
 net/openvswitch/dp_notify.c| 2 +-
 net/openvswitch/vport-netdev.c | 8 ++--
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/net/openvswitch/dp_notify.c b/net/openvswitch/dp_notify.c
index a7a80a6..653d073 100644
--- a/net/openvswitch/dp_notify.c
+++ b/net/openvswitch/dp_notify.c
@@ -58,7 +58,7 @@ void ovs_dp_notify_wq(struct work_struct *work)
struct hlist_node *n;
 
hlist_for_each_entry_safe(vport, n, &dp->ports[i], dp_hash_node) {
-   if (vport->ops->type != OVS_VPORT_TYPE_NETDEV)
+   if (vport->ops->type == OVS_VPORT_TYPE_INTERNAL)
continue;
 
if (!(vport->dev->priv_flags & 
IFF_OVS_DATAPATH))
diff --git a/net/openvswitch/vport-netdev.c b/net/openvswitch/vport-netdev.c
index b327368..6b0190b 100644
--- a/net/openvswitch/vport-netdev.c
+++ b/net/openvswitch/vport-netdev.c
@@ -180,9 +180,13 @@ void ovs_netdev_tunnel_destroy(struct vport *vport)
if (vport->dev->priv_flags & IFF_OVS_DATAPATH)
ovs_netdev_detach_dev(vport);
 
-   /* Early release so we can unregister the device */
+   /* We can be invoked by both explicit vport deletion and
+* underlying netdev deregistration; delete the link only
+* if it's not already shutting down.
+*/
+   if (vport->dev->reg_state == NETREG_REGISTERED)
+   rtnl_delete_link(vport->dev);
dev_put(vport->dev);
-   rtnl_delete_link(vport->dev);
vport->dev = NULL;
rtnl_unlock();
 
-- 
1.8.3.1


Re: [patch net-next 00/26] bonding/team offload + mlxsw implementation

2015-12-01 Thread Jiri Pirko

Tue, Dec 01, 2015 at 05:35:43PM CET, gerlitz...@gmail.com wrote:
>On Tue, Dec 1, 2015 at 5:12 PM, Jiri Pirko  wrote:
>> Tue, Dec 01, 2015 at 04:06:23PM CET, gerlitz...@gmail.com wrote:
>>>On Tue, Dec 1, 2015 at 4:43 PM, Jiri Pirko  wrote:
 Tue, Dec 01, 2015 at 02:48:38PM CET, j...@resnulli.us wrote:
>
>This patchset introduces needed infrastructure for link aggregation
>offload - for both team and bonding. It also implements the offload
>in mlxsw driver.
>
>>>I didn't see any changes to switchdev.h, can you elaborate on that please.
>
>> Correct. This patchset does not extend switchdev api. The extension is
>> done for netdev notifiers. It seems natural and correct.
>> As we discussed already with John on a different thread, it makes sense
>> for non-switchdev drivers to benefit from this extensions as well.
>
>This is understood.
>
>However, the point which is still not clear to me related to the LAG /
>switchdev object model.
>
>All of FDB/VLAN/FIB switchdev objects have corresponding software counterparts
>in the kernel --- what's the case for LAG? the software construct is
>bond or team
>instance, shouldn't there  be a modeling of the HW LAG object in switchdev?

No need for that; what would that be good for?

Switchdev iface (most of it) works with struct net_device. It does not matter
if that is the port netdev directly, or if it is a team/bonding netdev.
It falls into the picture very nicely.


[PATCH net-next] ravb: ptp: Add CONFIG mode support

2015-12-01 Thread Yoshihiro Kaneko

From: Kazuya Mizuguchi 

This patch makes PTP support active in CONFIG mode on R-Car Gen3.

Signed-off-by: Kazuya Mizuguchi 
Signed-off-by: Yoshihiro Kaneko 
---

This patch is based on the master branch of David Miller's next networking
tree.

 drivers/net/ethernet/renesas/ravb.h  |  1 +
 drivers/net/ethernet/renesas/ravb_main.c | 33 +++-
 2 files changed, 29 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/renesas/ravb.h 
b/drivers/net/ethernet/renesas/ravb.h
index f9dee74..9fbe92a 100644
--- a/drivers/net/ethernet/renesas/ravb.h
+++ b/drivers/net/ethernet/renesas/ravb.h
@@ -206,6 +206,7 @@ enum CCC_BIT {
CCC_OPC_RESET   = 0x,
CCC_OPC_CONFIG  = 0x0001,
CCC_OPC_OPERATION = 0x0002,
+   CCC_GAC = 0x0080,
CCC_DTSR= 0x0100,
CCC_CSEL= 0x0003,
CCC_CSEL_HPB= 0x0001,
diff --git a/drivers/net/ethernet/renesas/ravb_main.c 
b/drivers/net/ethernet/renesas/ravb_main.c
index 990dc55..293046d 100644
--- a/drivers/net/ethernet/renesas/ravb_main.c
+++ b/drivers/net/ethernet/renesas/ravb_main.c
@@ -1231,7 +1231,8 @@ static int ravb_open(struct net_device *ndev)
ravb_emac_init(ndev);
 
/* Initialise PTP Clock driver */
-   ravb_ptp_init(ndev, priv->pdev);
+   if (priv->chip_id == RCAR_GEN2)
+   ravb_ptp_init(ndev, priv->pdev);
 
netif_tx_start_all_queues(ndev);
 
@@ -1244,7 +1245,8 @@ static int ravb_open(struct net_device *ndev)
 
 out_ptp_stop:
/* Stop PTP Clock driver */
-   ravb_ptp_stop(ndev);
+   if (priv->chip_id == RCAR_GEN2)
+   ravb_ptp_stop(ndev);
 out_free_irq:
free_irq(ndev->irq, ndev);
free_irq(priv->emac_irq, ndev);
@@ -1476,7 +1478,8 @@ static int ravb_close(struct net_device *ndev)
ravb_write(ndev, 0, TIC);
 
/* Stop PTP Clock driver */
-   ravb_ptp_stop(ndev);
+   if (priv->chip_id == RCAR_GEN2)
+   ravb_ptp_stop(ndev);
 
/* Set the config mode to stop the AVB-DMAC's processes */
if (ravb_stop_dma(ndev) < 0)
@@ -1781,8 +1784,16 @@ static int ravb_probe(struct platform_device *pdev)
ndev->ethtool_ops = &ravb_ethtool_ops;
 
/* Set AVB config mode */
-   ravb_write(ndev, (ravb_read(ndev, CCC) & ~CCC_OPC) | CCC_OPC_CONFIG,
-  CCC);
+   if (chip_id == RCAR_GEN2) {
+   ravb_write(ndev, (ravb_read(ndev, CCC) & ~CCC_OPC) |
+  CCC_OPC_CONFIG, CCC);
+   /* Set CSEL value */
+   ravb_write(ndev, (ravb_read(ndev, CCC) & ~CCC_CSEL) |
+  CCC_CSEL_HPB, CCC);
+   } else {
+   ravb_write(ndev, (ravb_read(ndev, CCC) & ~CCC_OPC) |
+  CCC_OPC_CONFIG | CCC_GAC | CCC_CSEL_HPB, CCC);
+   }
 
/* Set CSEL value */
ravb_write(ndev, (ravb_read(ndev, CCC) & ~CCC_CSEL) | CCC_CSEL_HPB,
@@ -1814,6 +1825,10 @@ static int ravb_probe(struct platform_device *pdev)
/* Initialise HW timestamp list */
INIT_LIST_HEAD(&priv->ts_skb_list);
 
+   /* Initialise PTP Clock driver */
+   if (chip_id != RCAR_GEN2)
+   ravb_ptp_init(ndev, pdev);
+
/* Debug message level */
priv->msg_enable = RAVB_DEF_MSG_ENABLE;
 
@@ -1855,6 +1870,10 @@ out_napi_del:
 out_dma_free:
dma_free_coherent(ndev->dev.parent, priv->desc_bat_size, priv->desc_bat,
  priv->desc_bat_dma);
+
+   /* Stop PTP Clock driver */
+   if (chip_id != RCAR_GEN2)
+   ravb_ptp_stop(ndev);
 out_release:
if (ndev)
free_netdev(ndev);
@@ -1869,6 +1888,10 @@ static int ravb_remove(struct platform_device *pdev)
struct net_device *ndev = platform_get_drvdata(pdev);
struct ravb_private *priv = netdev_priv(ndev);
 
+   /* Stop PTP Clock driver */
+   if (priv->chip_id != RCAR_GEN2)
+   ravb_ptp_stop(ndev);
+
dma_free_coherent(ndev->dev.parent, priv->desc_bat_size, priv->desc_bat,
  priv->desc_bat_dma);
/* Set reset mode */
-- 
1.9.1
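
For readers following the ravb_probe() hunk, the CCC value it ends up writing can be sketched in isolation. This is only an illustration, not driver code: ccc_config_value() is a hypothetical helper, and the bit masks are assumptions that should be checked against ravb.h. It assumes GAC means "gPTP active in CONFIG mode", which is what the commit message suggests.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed CCC bit masks, for illustration only; verify against ravb.h. */
#define CCC_OPC        0x00000003u
#define CCC_OPC_CONFIG 0x00000001u
#define CCC_GAC        0x00000080u
#define CCC_CSEL       0x00030000u
#define CCC_CSEL_HPB   0x00010000u

/* Hypothetical helper: CCC value after the probe-time read-modify-write. */
static uint32_t ccc_config_value(uint32_t ccc, int is_gen2)
{
	/* Clear the operating-mode field, then select CONFIG mode. */
	ccc = (ccc & ~CCC_OPC) | CCC_OPC_CONFIG;
	/* Clear the clock-select field, then select the HPB clock. */
	ccc = (ccc & ~CCC_CSEL) | CCC_CSEL_HPB;
	/* Non-Gen2 parts additionally set GAC so PTP can run in CONFIG mode. */
	if (!is_gen2)
		ccc |= CCC_GAC;
	return ccc;
}
```

Starting from a cleared register, this gives 0x00010001 for Gen2 and 0x00010081 otherwise; the only difference between the two paths is the GAC bit.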


Re: [RFC PATCH] af_unix: fix entry locking in unix_dgram_recvmsg

2015-12-01 Thread Rainer Weikusat

Rainer Weikusat  writes:

[...]

> Insofar as I understand the comment in this code block correctly,
>
> 	err = mutex_lock_interruptible(&u->readlock);
> 	if (unlikely(err)) {
> 		/* recvmsg() in non blocking mode is supposed to return -EAGAIN
> 		 * sk_rcvtimeo is not honored by mutex_lock_interruptible()
> 		 */
> 		err = noblock ? -EAGAIN : -ERESTARTSYS;
> 		goto out;
> 	}
>
> setting a receive timeout for an AF_UNIX datagram socket also doesn't
> work as intended because of this: In case of n readers with the same
> timeout, the nth reader will end up blocking n times the timeout.

Here is a test program which confirms this. It starts four concurrent
reads on the same socket with a receive timeout of 3s. The whole program
should therefore take a little more than 3s to execute, as each read
should time out at about the same time. Instead it takes 12s, because the
reads pile up on the readlock mutex and each one only starts its own
timeout once it can enter the receive loop.

---
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/un.h>
#include <sys/wait.h>
#include <unistd.h>

#define SERVER_ADDR "\0multi-timeout"
#define RCV_TIMEO   3

static void set_rcv_timeo(int sk)
{
	struct timeval tv;

	tv.tv_sec = RCV_TIMEO;
	tv.tv_usec = 0;
	setsockopt(sk, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
}

int main(void)
{
	struct sockaddr_un sun;
	struct timeval tv_start, tv_end;
	int sk, dummy;

	sun.sun_family = AF_UNIX;
	memcpy(sun.sun_path, SERVER_ADDR, sizeof(SERVER_ADDR));
	sk = socket(AF_UNIX, SOCK_DGRAM, 0);
	bind(sk, (struct sockaddr *)&sun, sizeof(sun));
	set_rcv_timeo(sk);

	gettimeofday(&tv_start, NULL);

	if (fork() == 0) {
		read(sk, &dummy, sizeof(dummy));
		_exit(0);
	}

	if (fork() == 0) {
		read(sk, &dummy, sizeof(dummy));
		_exit(0);
	}

	if (fork() == 0) {
		read(sk, &dummy, sizeof(dummy));
		_exit(0);
	}

	read(sk, &dummy, sizeof(dummy));

	while (waitpid(-1, NULL, 0) > 0);

	gettimeofday(&tv_end, NULL);
	printf("Waited for %u timeouts\n",
	       (unsigned)((tv_end.tv_sec - tv_start.tv_sec) / RCV_TIMEO));

	return 0;
}
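
The arithmetic behind the 12s figure can be written down as a small model. This is a sketch of the reasoning only, not kernel code: both function names are hypothetical, and it assumes every reader uses the same timeout and that no datagram ever arrives.

```c
#include <assert.h>

/*
 * Hypothetical model: time in seconds at which the k-th reader (1-based)
 * returns when readers queue up on one mutex and each starts its receive
 * timeout only after acquiring it.
 */
static unsigned int finish_serialized(unsigned int k, unsigned int timeo)
{
	assert(k >= 1);
	/* k - 1 predecessors time out one after another, then reader k does. */
	return k * timeo;
}

/* Intended behaviour: all timeouts run in parallel from the start. */
static unsigned int finish_concurrent(unsigned int k, unsigned int timeo)
{
	assert(k >= 1);
	return timeo;
}
```

With four readers and a 3s timeout, finish_serialized(4, 3) is 12 while finish_concurrent(4, 3) is 3, which matches the difference between the observed and the expected run time of the program above.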

Re: BUG: soft lockup happened for ixgbevf

2015-12-01 Thread Alexander Duyck


On 11/30/2015 11:12 PM, Ding Tianhong wrote:

Hi Everyone:

I found this problem when using the Testgine to send packets to the 82599
Ethernet adapter:
1. Wait for the speed to reach 10 Gbit/s; this works fine.
2. Bring the interface down and then up again, and repeat several times.
3. The soft lockup then occurs.
[  317.005148] IPv6: ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready
[  368.106005] BUG: soft lockup - CPU#1 stuck for 22s! [rcuos/1:15]
[  368.106005] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[  368.106005] task: 88057dd8a220 ti: 88057dd9c000 task.ti: 88057dd9c000
[  368.106005] RIP: 0010:[]  [] fib_table_lookup+0x14/0x390
[  368.106005] RSP: 0018:88061fc83ce8  EFLAGS: 0286
[  368.106005] RAX: 0001 RBX: 020155c0 RCX: 0001
[  368.106005] RDX: 88061fc83d50 RSI: 88061fc83d70 RDI: 880036d11a00
[  368.106005] RBP: 88061fc83d08 R08: 0001 R09: 
[  368.106005] R10: 880036d11a00 R11: 819e0900 R12: 88061fc83c58
[  368.106005] R13: 816154dd R14: 88061fc83d08 R15: 020155c0
[  368.106005] FS:  () GS:88061fc8() knlGS:
[  368.106005] CS:  0010 DS:  ES:  CR0: 80050033
[  368.106005] CR2: 7f8c2aee9c40 CR3: 00057b222000 CR4: 000407e0
[  368.106005] DR0:  DR1:  DR2: 
[  368.106005] DR3:  DR6: 0ff0 DR7: 0400
[  368.106005] Stack:
[  368.106005]  01c0 88057b766000 8802e380b000 88057af03e00
[  368.106005]  88061fc83dc0 815349a6 88061fc83d40 814ee146
[  368.106005]  8802e380af00 e380af00 819e0900 020155c001c0
[  368.106005] Call Trace:
[  368.106005]  <IRQ>
[  368.106005]
[  368.106005]  [] ip_route_input_noref+0x516/0xbd0
[  368.106005]  [] ? skb_release_data+0xd6/0x110
[  368.106005]  [] ? kfree_skb+0x3a/0xa0
[  368.106005]  [] ip_rcv_finish+0x29f/0x350
[  368.106005]  [] ip_rcv+0x234/0x380
[  368.106005]  [] __netif_receive_skb_core+0x676/0x870
[  368.106005]  [] __netif_receive_skb+0x18/0x60
[  368.106005]  [] process_backlog+0xae/0x180
[  368.106005]  [] net_rx_action+0x152/0x240
[  368.106005]  [] __do_softirq+0xef/0x280
[  368.106005]  [] call_softirq+0x1c/0x30
[  368.106005]  <EOI>
[  368.106005]
[  368.106005]  [] do_softirq+0x65/0xa0
[  368.106005]  [] local_bh_enable+0x94/0xa0
[  368.106005]  [] rcu_nocb_kthread+0x232/0x370
[  368.106005]  [] ? wake_up_bit+0x30/0x30
[  368.106005]  [] ? rcu_start_gp+0x40/0x40
[  368.106005]  [] kthread+0xcf/0xe0
[  368.106005]  [] ? kthread_create_on_node+0x140/0x140
[  368.106005]  [] ret_from_fork+0x58/0x90
[  368.106005]  [] ? kthread_create_on_node+0x140/0x140

I have tested this on kernels 3.10 and 4.3; both show this problem.
Thanks for any suggestion.


So I have a couple of questions.  First, what kernel was the above trace 
captured with?  Are you using the in-kernel driver, or did you download 
one and build it out-of-tree?  If you are using the in-tree driver then I 
can assume the above trace is from kernel version 3.10, since the ixgbevf 
driver was fixed to not use the backlog several versions ago.


Can you provide the trace for the 4.3 kernel?  It would greatly help, as 
the 3.10 kernel is a bit old, and a reproduction on the 4.3 kernel or 
net-next would go a long way toward allowing us to figure out the root 
cause.


Thanks.

- Alex

