Re: [PATCH RFC net-next 0/2] tcp: Redundant Data Bundling (RDB)

2015-10-24 Thread Eric Dumazet
On Sat, 2015-10-24 at 08:00 +, Jonas Markussen wrote:

> Repacketization is only on retransmissions; RDB bundles previously sent 
> segments with the next “normal” transmission instead. 
> 
> This makes the flow recover the lost segment before a retransmission is
> triggered by an RTO or fast retransmit.
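Here is a minimal sketch of the bundling idea quoted above, for illustration only. It assumes a flat send buffer indexed by relative sequence number. It bundles the whole unacknowledged range only when that range fits in one MSS together with the new data. The function `rdb_build_payload` and this all-or-nothing policy are hypothetical simplifications, not taken from the RDB patches.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch: build the payload of the next outgoing segment.
 * New data always goes in; bytes that were already sent but are still
 * unacknowledged are prepended only if everything fits in one MSS, so
 * the segment starts at an earlier sequence number and carries a
 * redundant copy of data the receiver may have lost.
 *
 * stream:  send buffer, indexed by relative sequence number
 * snd_una: oldest unacknowledged byte
 * snd_nxt: first byte never sent before
 * new_len: number of new bytes to send (must fit in mss)
 * out:     destination buffer of at least mss bytes
 * *seq:    set to the sequence number of out[0]
 * Returns the number of bytes written to out.
 */
static size_t rdb_build_payload(const unsigned char *stream,
                                size_t snd_una, size_t snd_nxt,
                                size_t new_len, size_t mss,
                                unsigned char *out, size_t *seq)
{
    size_t unacked, start, len;

    assert(snd_una <= snd_nxt && new_len <= mss);
    unacked = snd_nxt - snd_una;
    start = snd_nxt;

    /* Bundle the unacked bytes only when they fit with the new data. */
    if (unacked > 0 && unacked + new_len <= mss)
        start = snd_una;

    len = snd_nxt + new_len - start;
    memcpy(out, stream + start, len);
    *seq = start;
    return len;
}
```

The receiver simply discards bytes it already holds. The redundant prefix therefore costs bandwidth, but it can repair a loss one packet later, without waiting for an RTO or fast retransmit.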

Thank you for this very high quality patch submission.

Please give us a few days for proper evaluation.

Thanks !


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 2/2] net: ethernet: add driver for Aurora VLSI NB8800 Ethernet controller

2015-10-24 Thread David Miller
From: Mans Rullgard 
Date: Thu, 22 Oct 2015 17:28:50 +0100

> +static void nb8800_mac_tx(struct net_device *dev, int enable)
 ...
> +static void nb8800_mac_rx(struct net_device *dev, int enable)
 ...
> +static void nb8800_mac_af(struct net_device *dev, int enable)

Please use 'bool' and true/false for 'enable'.

> +static int nb8800_alloc_rx(struct net_device *dev, int i, int napi)

Likewise here for 'napi'.
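The following small userspace sketch shows the requested style. It assumes a made-up control register with a single TX enable bit. `MAC_CTRL_TX_EN`, `mac_ctrl_update()` and `mac_tx()` are hypothetical stand-ins for the driver's register accessors, which the quote elides.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit in a MAC control register; not the real NB8800 layout. */
#define MAC_CTRL_TX_EN (1u << 0)

/* Set or clear 'bit' in a register value; 'enable' is a bool, not an int. */
static uint32_t mac_ctrl_update(uint32_t reg, uint32_t bit, bool enable)
{
    return enable ? (reg | bit) : (reg & ~bit);
}

/* Hypothetical counterpart of nb8800_mac_tx() operating on a plain variable. */
static void mac_tx(uint32_t *ctrl, bool enable)
{
    assert(ctrl != NULL);
    *ctrl = mac_ctrl_update(*ctrl, MAC_CTRL_TX_EN, enable);
}
```

Call sites then read as `mac_tx(&ctrl, true)`, and the parameter type itself says that only on/off is meant.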


Re: [RFC Patch 05/12] IXGBE: Add new sysfs interface of "notify_vf"

2015-10-24 Thread Lan, Tianyu


On 10/22/2015 4:52 AM, Alexander Duyck wrote:

> Also have you even considered the MSI-X configuration on the VF?  I
> haven't seen anything anywhere that would have migrated the VF's MSI-X
> configuration from BAR 3 on one system to the new system.


MSI-X migration is done by the hypervisor (Qemu).
The following link is my Qemu patch that does this:
http://marc.info/?l=kvm&m=144544706530484&w=2


Re: [PATCH net] ipv6: no CHECKSUM_PARTIAL on skbs with extension headers and recalc checksum during fragmentation

2015-10-24 Thread Tom Herbert
On Fri, Oct 23, 2015 at 9:13 AM, Hannes Frederic Sowa
 wrote:
> CHECKSUM_PARTIAL should only be used on plain vanilla IPv6 + UDP packets
> in ip6_append_data. Some drivers don't correctly handle extension headers,
> especially not ipv6 fragmentation which could result in broken checksums.
>
Yes, we've seen this in some drivers, but the conclusion is that those
drivers are *broken* and need to be fixed! CHECKSUM_PARTIAL works
perfectly well in the presence of extension headers if the
driver/device is correctly implemented (simple algorithm with
csum_start and csum_offset).

Tom
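The "simple algorithm" mentioned above can be sketched as follows. This is a model, not driver code. It assumes the stack has pre-seeded the checksum field with the IPv6 pseudo-header sum, and it roughly mirrors what `skb_checksum_help()` does in software. The helper names are hypothetical, and UDP's rule of sending a computed zero as 0xffff is not modeled. The point is that the device never parses anything before `csum_start`, so extension headers in front of the UDP header cannot confuse it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add len bytes to a 32-bit ones' complement accumulator (big-endian words). */
static uint32_t csum_add_bytes(const uint8_t *p, size_t len, uint32_t sum)
{
    while (len > 1) {
        sum += ((uint32_t)p[0] << 8) | p[1];
        p += 2;
        len -= 2;
    }
    if (len)                        /* odd trailing byte is padded with zero */
        sum += (uint32_t)p[0] << 8;
    return sum;
}

/* Fold carries back into 16 bits and return the ones' complement. */
static uint16_t csum_fold16(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/*
 * Model of a device handling CHECKSUM_PARTIAL: sum everything from
 * csum_start to the end of the packet (the checksum field already holds
 * the pseudo-header sum) and store the result at csum_start + csum_offset.
 */
static void hw_csum_partial(uint8_t *pkt, size_t len,
                            size_t csum_start, size_t csum_offset)
{
    size_t field = csum_start + csum_offset;
    uint16_t csum;

    assert(field + 2 <= len);
    csum = csum_fold16(csum_add_bytes(pkt + csum_start, len - csum_start, 0));
    pkt[field] = (uint8_t)(csum >> 8);
    pkt[field + 1] = (uint8_t)(csum & 0xff);
}
```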

> 1) This patch improves the test for fragmentation and extension headers
> in ip6_append_data, so we set the ip_summed mode as early as possible
> to the correct value to compute the checksum during memory copy-in from
> user space.
>
> 2) We always call skb_checksum_help on CHECKSUM_PARTIAL fragments in
> ip6_fragment, because we don't know if the underlying hardware can deal
> with ip6_fragments.
>
> Fixes: commit 32dce968dd987 ("ipv6: Allow for partial checksums on non-ufo packets")
> See-also: commit 72e843bb09d45 ("ipv6: ip6_fragment() should check CHECKSUM_PARTIAL")
> Cc: Eric Dumazet 
> Cc: Vlad Yasevich 
> Cc: Benjamin Coddington 
> Signed-off-by: Hannes Frederic Sowa 
> ---
>  net/ipv6/ip6_output.c | 78 
> ---
>  1 file changed, 37 insertions(+), 41 deletions(-)
>
> diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
> index 8dddb45..26d2911 100644
> --- a/net/ipv6/ip6_output.c
> +++ b/net/ipv6/ip6_output.c
> @@ -593,6 +593,10 @@ int ip6_fragment(struct sock *sk, struct sk_buff *skb,
> frag_id = ipv6_select_ident(net, &ipv6_hdr(skb)->daddr,
> &ipv6_hdr(skb)->saddr);
>
> +   if ((skb->ip_summed == CHECKSUM_PARTIAL) &&
> +   (err = skb_checksum_help(skb)))
> +   goto fail;
> +
> hroom = LL_RESERVED_SPACE(rt->dst.dev);
> if (skb_has_frag_list(skb)) {
> int first_len = skb_pagelen(skb);
> @@ -721,10 +725,6 @@ slow_path_clean:
> }
>
>  slow_path:
> -   if ((skb->ip_summed == CHECKSUM_PARTIAL) &&
> -   skb_checksum_help(skb))
> -   goto fail;
> -
> left = skb->len - hlen; /* Space per frame */
> ptr = hlen; /* Where to start from */
>
> @@ -1260,6 +1260,7 @@ static int __ip6_append_data(struct sock *sk,
> struct rt6_info *rt = (struct rt6_info *)cork->dst;
> struct ipv6_txoptions *opt = v6_cork->opt;
> int csummode = CHECKSUM_NONE;
> +   unsigned int maxnonfragsize, headersize;
>
> skb = skb_peek_tail(queue);
> if (!skb) {
> @@ -1277,38 +1278,43 @@ static int __ip6_append_data(struct sock *sk,
> maxfraglen = ((mtu - fragheaderlen) & ~7) + fragheaderlen -
>  sizeof(struct frag_hdr);
>
> -   if (mtu <= sizeof(struct ipv6hdr) + IPV6_MAXPLEN) {
> -   unsigned int maxnonfragsize, headersize;
> -
> -   headersize = sizeof(struct ipv6hdr) +
> -(opt ? opt->opt_flen + opt->opt_nflen : 0) +
> -(dst_allfrag(&rt->dst) ?
> - sizeof(struct frag_hdr) : 0) +
> -rt->rt6i_nfheader_len;
> -
> -   if (ip6_sk_ignore_df(sk))
> -   maxnonfragsize = sizeof(struct ipv6hdr) + IPV6_MAXPLEN;
> -   else
> -   maxnonfragsize = mtu;
> +   headersize = sizeof(struct ipv6hdr) +
> +(opt ? opt->opt_flen + opt->opt_nflen : 0) +
> +(dst_allfrag(&rt->dst) ?
> + sizeof(struct frag_hdr) : 0) +
> +rt->rt6i_nfheader_len;
> +
> +   if (cork->length + length > mtu - headersize && dontfrag &&
> +   (sk->sk_protocol == IPPROTO_UDP ||
> +sk->sk_protocol == IPPROTO_RAW)) {
> +   ipv6_local_rxpmtu(sk, fl6, mtu - headersize +
> +   sizeof(struct ipv6hdr));
> +   goto emsgsize;
> +   }
>
> -   /* dontfrag active */
> -   if ((cork->length + length > mtu - headersize) && dontfrag &&
> -   (sk->sk_protocol == IPPROTO_UDP ||
> -sk->sk_protocol == IPPROTO_RAW)) {
> -   ipv6_local_rxpmtu(sk, fl6, mtu - headersize +
> -  sizeof(struct ipv6hdr));
> -   goto emsgsize;
> -   }
> +   if (ip6_sk_ignore_df(sk))
> +   maxnonfragsize = sizeof(struct ipv6hdr) + IPV6_MAXPLEN;
> +   else
> +   maxnonfragsize = mtu;
>
> -   if (cork->length + length > maxnonfragsize - headersize) {
> +   if (cork->length + length > 
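The size arithmetic in the quoted `__ip6_append_data()` hunk can be checked in isolation. The sketch below restates those expressions with plain integers. The function names are hypothetical, and kernel inputs such as the option lengths, `dst_allfrag()` and `rt6i_nfheader_len` are passed in as parameters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Sizes in bytes; the names are illustrative, not kernel symbols. */
enum { IPV6_HDR_LEN = 40, FRAG_HDR_LEN = 8, IPV6_MAX_PAYLOAD = 65535 };

/*
 * Largest length (unfragmentable headers + data, excluding the fragment
 * header) per fragment, keeping the data part a multiple of 8 bytes.
 */
static size_t max_frag_len(size_t mtu, size_t fragheaderlen)
{
    assert(mtu >= fragheaderlen + FRAG_HDR_LEN);
    return ((mtu - fragheaderlen) & ~(size_t)7) + fragheaderlen - FRAG_HDR_LEN;
}

/* IPv6 header + options + optional fragment header + non-fragmentable extras. */
static size_t header_size(size_t opt_len, bool allfrag, size_t nfheader_len)
{
    return IPV6_HDR_LEN + opt_len + (allfrag ? FRAG_HDR_LEN : 0) + nfheader_len;
}

/* Limit for unfragmented packets: the full IPv6 maximum if DF is ignored. */
static size_t max_nonfrag_size(size_t mtu, bool ignore_df)
{
    return ignore_df ? IPV6_HDR_LEN + IPV6_MAX_PAYLOAD : mtu;
}

/* The dontfrag test for UDP/RAW: would this append exceed the path MTU? */
static bool dontfrag_exceeds(size_t corked, size_t length, size_t mtu,
                             size_t hdrsize)
{
    assert(mtu >= hdrsize);
    return corked + length > mtu - hdrsize;
}
```

With a 1500-byte MTU and a bare 40-byte IPv6 header, `max_frag_len()` gives 1488. After the 8-byte fragment header is inserted, each fragment is 1496 bytes, and its 1448-byte data part is a multiple of 8, as fragment offsets require.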

Re: [PATCH net] ipv6: no CHECKSUM_PARTIAL on skbs with extension headers and recalc checksum during fragmentation

2015-10-24 Thread Tom Herbert
On Sat, Oct 24, 2015 at 12:28 PM, Hannes Frederic Sowa
 wrote:
> Hi Tom,
>
> On Sat, Oct 24, 2015, at 18:21, Tom Herbert wrote:
>> On Fri, Oct 23, 2015 at 9:13 AM, Hannes Frederic Sowa
>>  wrote:
>> > CHECKSUM_PARTIAL should only be used on plain vanilla IPv6 + UDP packets
>> > in ip6_append_data. Some drivers don't correctly handle extension headers,
>> > especially not ipv6 fragmentation which could result in broken checksums.
>> >
>> Yes, we've seen this in some drivers, but the conclusion is that those
>> drivers are *broken* and need to be fixed! CHECKSUM_PARTIAL works
>> perfectly well in the presence of extension headers if the
>> driver/device is correctly implemented (simple algorithm with
>> csum_start and csum_offset).
>
> I will do some more research on those drivers, and I agree with you in
> general. But if you look at the code, the intention was clearly not to
> use CHECKSUM_PARTIAL on fragmented IPv6 packets; that intention just
> was not expressed correctly in code. I would still like to see
> something like this go into stable, and we can follow up with more CSUM
> bits, maybe for drivers that do or don't support this.
>
> Also, as we are talking about fragments here, I would not put too much
> effort into this, as fragments are the slow path anyway.
>
Checksum will need to be done before fragmentation, but I see little
value in trying to push that back to the userspace copy. Besides, I
would expect that UFO should work properly with extension headers.

The rule is simple: if a driver states that it is capable of supporting an
IPv6 offload, it must properly handle packets with extension
headers, which are a core part of the IPv6 protocol. We do not
want to add any more feature bits for stuff like this.

It is possible that we will be deploying IPv6 extension headers within
our data center at some point. I would prefer that we retain the value
of HW checksum offload in that, but we simply cannot have devices
silently sending bad checksums...

Please see my patch on the driver helper function to rectify device
capabilities with packets as they are transmitted.

Thanks,
Tom

> Bye,
> Hannes


Re: [PATCH net] ipv6: no CHECKSUM_PARTIAL on skbs with extension headers and recalc checksum during fragmentation

2015-10-24 Thread Hannes Frederic Sowa
Hi Tom,

On Sat, Oct 24, 2015, at 18:46, Tom Herbert wrote:
> On Sat, Oct 24, 2015 at 12:28 PM, Hannes Frederic Sowa
>  wrote:
> > Hi Tom,
> >
> > On Sat, Oct 24, 2015, at 18:21, Tom Herbert wrote:
> >> On Fri, Oct 23, 2015 at 9:13 AM, Hannes Frederic Sowa
> >>  wrote:
> >> > CHECKSUM_PARTIAL should only be used on plain vanilla IPv6 + UDP packets
> >> > in ip6_append_data. Some drivers don't correctly handle extension 
> >> > headers,
> >> > especially not ipv6 fragmentation which could result in broken checksums.
> >> >
> >> Yes, we've seen this in some drivers, but the conclusion is that those
> >> drivers are *broken* and need to be fixed! CHECKSUM_PARTIAL works
> >> perfectly well in the presence of extension headers if the
> >> driver/device is correctly implemented (simple algorithm with
> >> csum_start and csum_offset).
> >
> > I will do some more research on those drivers, and I agree with you in
> > general. But if you look at the code, the intention was clearly not to
> > use CHECKSUM_PARTIAL on fragmented IPv6 packets; that intention just
> > was not expressed correctly in code. I would still like to see
> > something like this go into stable, and we can follow up with more CSUM
> > bits, maybe for drivers that do or don't support this.
> >
> > Also, as we are talking about fragments here, I would not put too much
> > effort into this, as fragments are the slow path anyway.
> >
> Checksum will need to be done before fragmentation, but I see little
> value in trying to push that back to the userspace copy. Besides, I
> would expect that UFO should work properly with extension headers.

Sure, the checksum needs to be calculated before fragmentation, but
touching the data once, during the userspace copy, seems more valuable
to me than doing it later in the kernel, especially if the data is cold
because it has been sitting on a send queue for quite a while and some
context switches have happened (with MSG_MORE, for example).
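To illustrate touching the data only once, here is a minimal byte-wise sketch of a combined copy-and-checksum loop, in the spirit of the kernel's checksum-while-copying helpers. `copy_and_csum()` and `fold16()` are hypothetical names, and a real implementation would process whole words for speed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Copy len bytes from src to dst and accumulate the ones' complement sum
 * in the same pass. 'offset' is the position of src[0] within the
 * checksummed region: even positions are the high byte of a 16-bit word,
 * odd positions the low byte.
 */
static uint32_t copy_and_csum(uint8_t *dst, const uint8_t *src, size_t len,
                              size_t offset, uint32_t sum)
{
    assert(len == 0 || (dst != NULL && src != NULL));
    for (size_t i = 0; i < len; i++) {
        dst[i] = src[i];
        sum += ((offset + i) & 1) ? src[i] : (uint32_t)src[i] << 8;
    }
    return sum;
}

/* Fold carries into 16 bits (not complemented). */
static uint16_t fold16(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}
```

Splitting the copy at an odd offset, as consecutive MSG_MORE appends might, gives the same sum. This works because each byte's position parity decides whether it counts as a high or a low byte.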

For UFO, CHECKSUM_PARTIAL is a must and we require proper checksum
offload capabilities. But almost no network cards provide UFO
offloading; there are very few of them (though they are heavily used,
since virtual devices support it). If you use UFO during UDP output, you
end up with CHECKSUM_PARTIAL.

> The rule is simple: if a driver states that it is capable of supporting an
> IPv6 offload, it must properly handle packets with extension
> headers, which are a core part of the IPv6 protocol. We do not
> want to add any more feature bits for stuff like this.

I would agree, but we don't want to disable CHECKSUM_PARTIAL for most of
the traffic just because some drivers fail to do correct checksum
offloading for fragmented packets.

> It is possible that we will be deploying IPv6 extension headers within
> our data center at some point. I would prefer that we retain the value
> of HW checksum offload in that, but we simply cannot have devices
> silently sending bad checksums...

Yes, sure. Btw. this patch only affects UDP and RAW sockets.

> Please see my patch on the driver helper function to rectify device
> capabilities with packets as they are transmitted.

I have seen that already but will have a second look.

Anyway, it is currently easy to generate broken checksums on the wire,
and I would like to solve that in net; we can certainly improve on it
in net-next.

Bye,
Hannes


Re: [RFC Patch 08/12] IXGBEVF: Rework code of finding the end transmit desc of package

2015-10-24 Thread Lan, Tianyu


On 10/22/2015 5:14 AM, Alexander Duyck wrote:

> Where is i being initialized?  It was here but you removed it.  Are you
> using i without initializing it?


Sorry, the initialization was put into patch 10 by mistake. "i" is
initialized to "tx_ring->next_to_clean".
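The point under review is where the descriptor walk starts. Below is a simplified sketch of such a clean-up loop. It assumes a ring whose completion state is modeled by a plain `desc_done[]` array. `struct tx_ring` and `tx_clean()` are hypothetical and do not reproduce the real ixgbevf completion checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 8u

/* Hypothetical, simplified TX ring; desc_done[] stands in for HW status. */
struct tx_ring {
    bool desc_done[RING_SIZE];
    uint16_t next_to_clean;
    uint16_t next_to_use;
};

/* Reclaim completed descriptors; returns how many were cleaned. */
static unsigned int tx_clean(struct tx_ring *ring)
{
    uint16_t i;
    unsigned int cleaned = 0;

    assert(ring->next_to_clean < RING_SIZE);
    i = ring->next_to_clean;            /* the walk must start here */

    while (i != ring->next_to_use && ring->desc_done[i]) {
        ring->desc_done[i] = false;
        cleaned++;
        i = (uint16_t)((i + 1) % RING_SIZE);   /* wrap around the ring */
    }
    ring->next_to_clean = i;
    return cleaned;
}
```

If `i` were left uninitialized, the loop would start at an arbitrary slot, which is the problem raised in the review.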


openvswitch: net --> net-next merge

2015-10-24 Thread David Miller

I needed to do a merge of 'net' into 'net-next' in order to
facilitate a set of tipc patches that I wanted to apply to
'net-next'.

There were several openvswitch merge conflicts, mostly to do with the
egress tunnel info bug fix conflicting with the simplification of the
vport ->send() method.

I did my best to resolve the conflicts, but if someone would double
check my work I would really appreciate it.

Thanks.


[net-next:master 1605/1613] net/tipc/link.c:166:6: sparse: symbol 'link_is_bc_sndlink' was not declared. Should it be static?

2015-10-24 Thread kbuild test robot
tree:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master
head:   687f079addba1ac7f97ce97080c2291bbe8c8dce
commit: 5266698661401afc5e4a1a521cf9ba10724d10dd [1605/1613] tipc: let broadcast packet reception use new link receive function
reproduce:
# apt-get install sparse
git checkout 5266698661401afc5e4a1a521cf9ba10724d10dd
make ARCH=x86_64 allmodconfig
make C=1 CF=-D__CHECK_ENDIAN__


sparse warnings: (new ones prefixed by >>)

>> net/tipc/link.c:166:6: sparse: symbol 'link_is_bc_sndlink' was not declared. Should it be static?
>> net/tipc/link.c:171:6: sparse: symbol 'link_is_bc_rcvlink' was not declared. Should it be static?
   net/tipc/link.c:635:6: sparse: symbol 'link_prepare_wakeup' was not declared. Should it be static?
   net/tipc/link.c:891:6: sparse: symbol 'tipc_link_advance_backlog' was not declared. Should it be static?
   net/tipc/link.c:967:5: sparse: symbol 'tipc_link_retrans' was not declared. Should it be static?
>> net/tipc/link.c:1561:6: sparse: symbol 'tipc_link_build_bc_init_msg' was not declared. Should it be static?
   include/linux/rcupdate.h:305:9: sparse: context imbalance in 'tipc_link_find_owner' - wrong count at exit
   net/tipc/link.c:1821:5: sparse: context imbalance in 'tipc_nl_link_set' - different lock contexts for basic block
   net/tipc/link.c:2058:5: sparse: context imbalance in 'tipc_nl_link_dump' - different lock contexts for basic block
   net/tipc/link.c:2129:5: sparse: context imbalance in 'tipc_nl_link_get' - different lock contexts for basic block
   net/tipc/link.c:2181:5: sparse: context imbalance in 'tipc_nl_link_reset_stats' - different lock contexts for basic block
--
   net/tipc/node.c:144:18: sparse: symbol 'tipc_node_create' was not declared. Should it be static?
   net/tipc/node.c:831:6: sparse: symbol 'tipc_node_filter_pkt' was not declared. Should it be static?
>> net/tipc/node.c:1090:6: sparse: symbol 'tipc_node_bc_rcv' was not declared. Should it be static?
   net/tipc/node.c:237:5: sparse: context imbalance in 'tipc_node_add_conn' - different lock contexts for basic block
   net/tipc/node.c:268:6: sparse: context imbalance in 'tipc_node_remove_conn' - different lock contexts for basic block
   net/tipc/node.c:303:9: sparse: context imbalance in 'tipc_node_timeout' - different lock contexts for basic block
   net/tipc/node.c:383:13: sparse: context imbalance in 'tipc_node_link_up' - wrong count at exit
   net/tipc/node.c:463:13: sparse: context imbalance in 'tipc_node_link_down' - wrong count at exit
   net/tipc/node.c:497:6: sparse: context imbalance in 'tipc_node_check_dest' - wrong count at exit
   net/tipc/node.c:917:9: sparse: context imbalance in 'tipc_node_get_linkname' - different lock contexts for basic block
   net/tipc/node.c:935:9: sparse: context imbalance in 'tipc_node_unlock' - unexpected unlock
   net/tipc/node.c:1027:5: sparse: context imbalance in 'tipc_node_xmit' - wrong count at exit
>> net/tipc/node.c:1090:6: sparse: context imbalance in 'tipc_node_bc_rcv' - wrong count at exit
   net/tipc/node.c:1277:6: sparse: context imbalance in 'tipc_rcv' - different lock contexts for basic block
   net/tipc/node.c:1348:5: sparse: context imbalance in 'tipc_nl_node_dump' - different lock contexts for basic block

Please review and possibly fold the followup patch.
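Both classes of warning above have small, mechanical fixes. The following self-contained sketch assumes a hypothetical `struct node` with a stand-in lock. File-local functions are marked `static`, and functions that intentionally take or drop a lock are annotated so that sparse can track the lock context. The annotation macros mirror the kernel's `__acquires`/`__releases` definitions but are defined locally here, and they expand to nothing for a normal compiler.

```c
#include <assert.h>

/* Under sparse (__CHECKER__) use the context attribute; otherwise no-ops. */
#ifdef __CHECKER__
# define __acquires(x) __attribute__((context(x, 0, 1)))
# define __releases(x) __attribute__((context(x, 1, 0)))
#else
# define __acquires(x)
# define __releases(x)
#endif

struct node {
    int lock_held;   /* stand-in for a real spinlock */
    int refs;
};

/* 'static' answers sparse's "Should it be static?" for file-local helpers. */
static void node_lock(struct node *n) __acquires(n)
{
    assert(!n->lock_held);
    n->lock_held = 1;
}

static void node_unlock(struct node *n) __releases(n)
{
    assert(n->lock_held);
    n->lock_held = 0;
}

/* Lock and unlock are balanced on every path: no context imbalance. */
static int node_get(struct node *n)
{
    int refs;

    node_lock(n);
    refs = ++n->refs;
    node_unlock(n);
    return refs;
}
```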

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


[RFC PATCH net-next] tipc: link_is_bc_sndlink() can be static

2015-10-24 Thread kbuild test robot
TO: "David S. Miller" 
CC: netdev@vger.kernel.org
CC: Jon Maloy 
CC: Ying Xue 
CC: tipc-discuss...@lists.sourceforge.net
CC: linux-ker...@vger.kernel.org


Signed-off-by: Fengguang Wu 
---
 link.c |8 
 node.c |2 +-
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/net/tipc/link.c b/net/tipc/link.c
index 4449fa0..9efbdbd 100644
--- a/net/tipc/link.c
+++ b/net/tipc/link.c
@@ -163,12 +163,12 @@ bool tipc_link_is_blocked(struct tipc_link *l)
return l->state & (LINK_RESETTING | LINK_PEER_RESET | LINK_FAILINGOVER);
 }
 
-bool link_is_bc_sndlink(struct tipc_link *l)
+static bool link_is_bc_sndlink(struct tipc_link *l)
 {
return !l->bc_sndlink;
 }
 
-bool link_is_bc_rcvlink(struct tipc_link *l)
+static bool link_is_bc_rcvlink(struct tipc_link *l)
 {
return ((l->bc_rcvlink == l) && !link_is_bc_sndlink(l));
 }
@@ -1364,8 +1364,8 @@ static bool tipc_link_build_bc_proto_msg(struct tipc_link *l, bool bcast,
  * Give a newly added peer node the sequence number where it should
  * start receiving and acking broadcast packets.
  */
-void tipc_link_build_bc_init_msg(struct tipc_link *l,
-struct sk_buff_head *xmitq)
+static void tipc_link_build_bc_init_msg(struct tipc_link *l,
+   struct sk_buff_head *xmitq)
 {
struct sk_buff_head list;
 
diff --git a/net/tipc/node.c b/net/tipc/node.c
index 7493506..20cddec 100644
--- a/net/tipc/node.c
+++ b/net/tipc/node.c
@@ -1083,7 +1083,7 @@ int tipc_node_xmit_skb(struct net *net, struct sk_buff *skb, u32 dnode,
  *
  * Invoked with no locks held.
  */
-void tipc_node_bc_rcv(struct net *net, struct sk_buff *skb, int bearer_id)
+static void tipc_node_bc_rcv(struct net *net, struct sk_buff *skb, int bearer_id)
 {
int rc;
struct sk_buff_head xmitq;
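The two predicates made static in this patch encode link roles purely through pointers. The following reduced sketch assumes a hypothetical two-field stand-in for `struct tipc_link`. The wiring in the usage example is illustrative and is inferred only from these two helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Reduced stand-in for struct tipc_link: only the fields the helpers read. */
struct link {
    struct link *bc_sndlink;
    struct link *bc_rcvlink;
};

/* The broadcast send link is the one without a bc_sndlink of its own. */
static bool link_is_bc_sndlink(const struct link *l)
{
    return !l->bc_sndlink;
}

/* A broadcast receive link points to itself and is not the send link. */
static bool link_is_bc_rcvlink(const struct link *l)
{
    return l->bc_rcvlink == l && !link_is_bc_sndlink(l);
}
```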


Re: [PATCH net] ipv6: no CHECKSUM_PARTIAL on skbs with extension headers and recalc checksum during fragmentation

2015-10-24 Thread Hannes Frederic Sowa
Hi Tom,

On Sat, Oct 24, 2015, at 18:21, Tom Herbert wrote:
> On Fri, Oct 23, 2015 at 9:13 AM, Hannes Frederic Sowa
>  wrote:
> > CHECKSUM_PARTIAL should only be used on plain vanilla IPv6 + UDP packets
> > in ip6_append_data. Some drivers don't correctly handle extension headers,
> > especially not ipv6 fragmentation which could result in broken checksums.
> >
> Yes, we've seen this in some drivers, but the conclusion is that those
> drivers are *broken* and need to be fixed! CHECKSUM_PARTIAL works
> perfectly well in the presence of extension headers if the
> driver/device is correctly implemented (simple algorithm with
> csum_start and csum_offset).

I will do some more research on those drivers, and I agree with you in
general. But if you look at the code, the intention was clearly not to
use CHECKSUM_PARTIAL on fragmented IPv6 packets; that intention just
was not expressed correctly in code. I would still like to see
something like this go into stable, and we can follow up with more CSUM
bits, maybe for drivers that do or don't support this.

Also, as we are talking about fragments here, I would not put too much
effort into this, as fragments are the slow path anyway.

Bye,
Hannes


[PATCH 1/1] commit c6825c0976fa7893692e0e43b09740b419b23c09 upstream.

2015-10-24 Thread Ani Sinha
netfilter: nf_conntrack: fix RCU race in nf_conntrack_find_get

Let's look at destroy_conntrack:

hlist_nulls_del_rcu(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode);
...
nf_conntrack_free(ct)
kmem_cache_free(net->ct.nf_conntrack_cachep, ct);

net->ct.nf_conntrack_cachep is created with SLAB_DESTROY_BY_RCU.

The hash is protected by RCU, so readers look up conntracks without
locks.
A conntrack is removed from the hash, but at this moment a few readers
can still use the conntrack. Then this conntrack is released and another
thread creates a conntrack at the same address with an equal tuple.
After this a reader starts to validate the conntrack:
* It's not dying, because a new conntrack was created
* nf_ct_tuple_equal() returns true.

But this conntrack is not initialized yet, so it cannot be used by two
threads concurrently. In this case BUG_ON may be triggered from
nf_nat_setup_info().

Florian Westphal suggested checking the confirmed bit too. I think he's
right.

task 1                  task 2                  task 3
                        nf_conntrack_find_get
                         nf_conntrack_find
destroy_conntrack
 hlist_nulls_del_rcu
 nf_conntrack_free
 kmem_cache_free
                                                __nf_conntrack_alloc
                                                 kmem_cache_alloc
                                                 memset(&ct->tuplehash[IP_CT_DIR_MAX],
                         if (nf_ct_is_dying(ct))
                         if (!nf_ct_tuple_equal()

I'm not sure that I have ever seen this race condition in real life.
Currently we are investigating a bug which is reproduced on a few nodes.
In our case one conntrack is initialized from a few tasks concurrently;
we don't have any other explanation for this.
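The fix described above boils down to a validation step after the lockless lookup. Here is a minimal C11 sketch. It assumes a hypothetical `struct ct` in which `key` stands in for the tuple and zone, and `CT_CONFIRMED` stands in for the confirmed status bit. The real code drops its reference with `nf_ct_put()`, which may also destroy the entry, whereas this sketch only decrements the counter.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define CT_CONFIRMED 0x1u   /* stand-in for the confirmed status bit */

/* Hypothetical, minimal conntrack entry in type-stable memory. */
struct ct {
    atomic_uint use;        /* 0 means free or being torn down */
    atomic_uint status;
    atomic_uint key;        /* stand-in for tuple + zone */
};

/* Take a reference only if the object is not already being freed. */
static bool ref_inc_not_zero(atomic_uint *v)
{
    unsigned int old = atomic_load(v);

    while (old != 0) {
        if (atomic_compare_exchange_weak(v, &old, old + 1))
            return true;
    }
    return false;
}

/*
 * After pinning, recheck both the key and the confirmed bit: a recycled,
 * freshly allocated entry can carry an equal key while another thread is
 * still setting it up.
 */
static struct ct *ct_validate(struct ct *c, unsigned int key)
{
    if (!ref_inc_not_zero(&c->use))
        return NULL;                          /* caller restarts the lookup */
    if (atomic_load(&c->key) == key &&
        (atomic_load(&c->status) & CT_CONFIRMED))
        return c;
    atomic_fetch_sub(&c->use, 1);             /* wrong object: drop our ref */
    return NULL;
}
```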

<2>[46267.083061] kernel BUG at net/ipv4/netfilter/nf_nat_core.c:322!
...
<4>[46267.083951] RIP: 0010:[]  [] 
nf_nat_setup_info+0x564/0x590 [nf_nat]
...
<4>[46267.085549] Call Trace:
<4>[46267.085622]  [] alloc_null_binding+0x5b/0xa0 
[iptable_nat]
<4>[46267.085697]  [] nf_nat_rule_find+0x5c/0x80 [iptable_nat]
<4>[46267.085770]  [] nf_nat_fn+0x111/0x260 [iptable_nat]
<4>[46267.085843]  [] nf_nat_out+0x48/0xd0 [iptable_nat]
<4>[46267.085919]  [] nf_iterate+0x69/0xb0
<4>[46267.085991]  [] ? ip_finish_output+0x0/0x2f0
<4>[46267.086063]  [] nf_hook_slow+0x74/0x110
<4>[46267.086133]  [] ? ip_finish_output+0x0/0x2f0
<4>[46267.086207]  [] ? dst_output+0x0/0x20
<4>[46267.086277]  [] ip_output+0xa4/0xc0
<4>[46267.086346]  [] raw_sendmsg+0x8b4/0x910
<4>[46267.086419]  [] inet_sendmsg+0x4a/0xb0
<4>[46267.086491]  [] ? sock_update_classid+0x3a/0x50
<4>[46267.086562]  [] sock_sendmsg+0x117/0x140
<4>[46267.086638]  [] ? _spin_unlock_bh+0x1b/0x20
<4>[46267.086712]  [] ? autoremove_wake_function+0x0/0x40
<4>[46267.086785]  [] ? do_ip_setsockopt+0x90/0xd80
<4>[46267.086858]  [] ? call_function_interrupt+0xe/0x20
<4>[46267.086936]  [] ? ub_slab_ptr+0x20/0x90
<4>[46267.087006]  [] ? ub_slab_ptr+0x20/0x90
<4>[46267.087081]  [] ? kmem_cache_alloc+0xd8/0x1e0
<4>[46267.087151]  [] sys_sendto+0x139/0x190
<4>[46267.087229]  [] ? sock_setsockopt+0x16d/0x6f0
<4>[46267.087303]  [] ? audit_syscall_entry+0x1d7/0x200
<4>[46267.087378]  [] ? __audit_syscall_exit+0x265/0x290
<4>[46267.087454]  [] ? compat_sys_setsockopt+0x75/0x210
<4>[46267.087531]  [] compat_sys_socketcall+0x13f/0x210
<4>[46267.087607]  [] ia32_sysret+0x0/0x5
<4>[46267.087676] Code: 91 20 e2 01 75 29 48 89 de 4c 89 f7 e8 56 fa ff ff 85 
c0 0f 84 68 fc ff ff 0f b6 4d c6 41 8b 45 00 e9 4d fb ff ff e8 7c 19 e9 e0 <0f> 
0b eb fe f6 05 17 91 20 e2 80 74 ce 80 3d 5f 2e 00 00 00 74
<1>[46267.088023] RIP  [] nf_nat_setup_info+0x564/0x590

Cc: Eric Dumazet 
Cc: Florian Westphal 
Cc: Pablo Neira Ayuso 
Cc: Patrick McHardy 
Cc: Jozsef Kadlecsik 
Cc: "David S. Miller" 
Cc: Cyrill Gorcunov 
Signed-off-by: Andrey Vagin 
Acked-by: Eric Dumazet 
Signed-off-by: Pablo Neira Ayuso 
Signed-off-by: Ani Sinha 
---
 net/netfilter/nf_conntrack_core.c | 21 +
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 9a46908..fd0f7a3 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -309,6 +309,21 @@ static void death_by_timeout(unsigned long ul_conntrack)
nf_ct_put(ct);
 }
 
+static inline bool
+nf_ct_key_equal(struct nf_conntrack_tuple_hash *h,
+   const struct nf_conntrack_tuple *tuple,
+   u16 zone)
+{
+   struct nf_conn *ct = nf_ct_tuplehash_to_ctrack(h);
+
+   /* A conntrack can be recreated with the equal tuple,
+* so we need to check that the conntrack is confirmed
+*/
+   return nf_ct_tuple_equal(tuple, 

Re: [PATCH net] ipv6: no CHECKSUM_PARTIAL on skbs with extension headers and recalc checksum during fragmentation

2015-10-24 Thread Hannes Frederic Sowa
Hi,

On Sat, Oct 24, 2015, at 00:48, Eric Dumazet wrote:
> On Fri, 2015-10-23 at 15:13 +0200, Hannes Frederic Sowa wrote:
> > CHECKSUM_PARTIAL should only be used on plain vanilla IPv6 + UDP packets
> > in ip6_append_data. Some drivers don't correctly handle extension headers,
> > especially not ipv6 fragmentation which could result in broken checksums.
> > 
> > 1) This patch improves the test for fragmentation and extension headers
> > in ip6_append_data, so we set the ip_summed mode as early as possible
> > to the correct value to compute the checksum during memory copy-in from
> > user space.
> > 
> > 2) We always call skb_checksum_help on CHECKSUM_PARTIAL fragments in
> > ip6_fragment, because we don't know if the underlying hardware can deal
> > with ip6_fragments.
> > 
> > Fixes: commit 32dce968dd987 ("ipv6: Allow for partial checksums on non-ufo packets")
> > See-also: commit 72e843bb09d45 ("ipv6: ip6_fragment() should check CHECKSUM_PARTIAL")
> > Cc: Eric Dumazet 
> > Cc: Vlad Yasevich 
> > Cc: Benjamin Coddington 
> > Signed-off-by: Hannes Frederic Sowa 
> > ---
> >  net/ipv6/ip6_output.c | 78 
> > ---
> >  1 file changed, 37 insertions(+), 41 deletions(-)
> > 
> > diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
> > index 8dddb45..26d2911 100644
> > --- a/net/ipv6/ip6_output.c
> > +++ b/net/ipv6/ip6_output.c
> > @@ -593,6 +593,10 @@ int ip6_fragment(struct sock *sk, struct sk_buff *skb,
> > frag_id = ipv6_select_ident(net, &ipv6_hdr(skb)->daddr,
> > &ipv6_hdr(skb)->saddr);
> >  
> > +   if ((skb->ip_summed == CHECKSUM_PARTIAL) &&
> > +   (err = skb_checksum_help(skb)))
> > +   goto fail;
> > +
> > hroom = LL_RESERVED_SPACE(rt->dst.dev);
> > if (skb_has_frag_list(skb)) {
> > int first_len = skb_pagelen(skb);
> > @@ -721,10 +725,6 @@ slow_path_clean:
> > }
> >  
> >  slow_path:
> > -   if ((skb->ip_summed == CHECKSUM_PARTIAL) &&
> > -   skb_checksum_help(skb))
> > -   goto fail;
> > -
> 
> 
> It looks like this patch could be split in two, to ease future bisection
> maybe ?

Sure, will split it up.

Thanks Eric!


[PATCH 1/1] commit e53376bef2cd97d3e3f61fdc677fb8da7d03d0da upstream.

2015-10-24 Thread Ani Sinha
netfilter: nf_conntrack: don't release a conntrack with non-zero refcnt

With this patch, the conntrack refcount is initially set to zero and
it is bumped once it is added to any of the lists, so we fulfill
Eric's golden rule, which is that all released objects always have a
refcount that equals zero.

Andrey Vagin reports that nf_conntrack_free can't be called for a
conntrack with non-zero ref-counter, because it can race with
nf_conntrack_find_get().

A conntrack slab is created with SLAB_DESTROY_BY_RCU. A non-zero
ref-counter means that this conntrack is in use. So when we release
a conntrack with a non-zero counter, we break this assumption.
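The writer side of the same rule can be sketched with C11 atomics. This assumes a hypothetical `struct obj`. The release store in `obj_insert()` plays a role comparable to the `smp_wmb()` followed by setting the count to 2 in the patch, and `obj_get_not_zero()` is the reader-side counterpart of `atomic_inc_not_zero()`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical object living in type-stable (SLAB_DESTROY_BY_RCU-like) memory. */
struct obj {
    atomic_uint use;
    unsigned int key;
};

/* Allocation: set up the lookup key while use == 0 (invisible to readers). */
static void obj_init(struct obj *o, unsigned int key)
{
    o->key = key;
    atomic_store_explicit(&o->use, 0, memory_order_relaxed);
}

/*
 * Publication: one reference for the hash table and one for the caller.
 * The release store orders the key write before the non-zero count.
 */
static void obj_insert(struct obj *o)
{
    atomic_store_explicit(&o->use, 2, memory_order_release);
    /* ... link into the hash chain here (omitted) ... */
}

/* Reader side: pin the object only if it is live. */
static bool obj_get_not_zero(struct obj *o)
{
    unsigned int old = atomic_load_explicit(&o->use, memory_order_relaxed);

    while (old != 0) {
        if (atomic_compare_exchange_weak_explicit(&o->use, &old, old + 1,
                                                  memory_order_acquire,
                                                  memory_order_relaxed))
            return true;
    }
    return false;
}

/* Returns true when the last reference is dropped (safe to free). */
static bool obj_put(struct obj *o)
{
    return atomic_fetch_sub_explicit(&o->use, 1, memory_order_acq_rel) == 1;
}
```

The count stays at zero until the object is published. A stale reader that still holds a pointer to the recycled memory therefore cannot pin it halfway through initialization.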

CPU1                                    CPU2
nf_conntrack_find()
                                        nf_ct_put()
                                         destroy_conntrack()
                                        ...
                                        init_conntrack
                                         __nf_conntrack_alloc (set use = 1)
atomic_inc_not_zero(&ct->use) (use = 2)
                                         if (!l4proto->new(ct, skb, dataoff, timeouts))
                                          nf_conntrack_free(ct); (use = 2 !!!)
                                        ...
                                        __nf_conntrack_alloc (set use = 1)
 if (!nf_ct_key_equal(h, tuple, zone))
  nf_ct_put(ct); (use = 0)
   destroy_conntrack()
                                        /* continue to work with CT */

After applying the patch "[PATCH] netfilter: nf_conntrack: fix RCU
race in nf_conntrack_find_get", another bug was triggered in
destroy_conntrack():

<4>[67096.759334] [ cut here ]
<2>[67096.759353] kernel BUG at net/netfilter/nf_conntrack_core.c:211!
...
<4>[67096.759837] Pid: 498649, comm: atdd veid: 666 Tainted: G C 
---2.6.32-042stab084.18 #1 042stab084_18 /DQ45CB
<4>[67096.759932] RIP: 0010:[]  [] 
destroy_conntrack+0x15c/0x190 [nf_conntrack]
<4>[67096.760255] Call Trace:
<4>[67096.760255]  [] nf_conntrack_destroy+0x17/0x30
<4>[67096.760255]  [] nf_conntrack_find_get+0x85/0x130 
[nf_conntrack]
<4>[67096.760255]  [] nf_conntrack_in+0x352/0xb60 
[nf_conntrack]
<4>[67096.760255]  [] ipv4_conntrack_local+0x51/0x60 
[nf_conntrack_ipv4]
<4>[67096.760255]  [] nf_iterate+0x69/0xb0
<4>[67096.760255]  [] ? dst_output+0x0/0x20
<4>[67096.760255]  [] nf_hook_slow+0x74/0x110
<4>[67096.760255]  [] ? dst_output+0x0/0x20
<4>[67096.760255]  [] raw_sendmsg+0x775/0x910
<4>[67096.760255]  [] ? flush_tlb_others_ipi+0x128/0x130
<4>[67096.760255]  [] ? apic_timer_interrupt+0xe/0x20
<4>[67096.760255]  [] ? apic_timer_interrupt+0xe/0x20
<4>[67096.760255]  [] inet_sendmsg+0x4a/0xb0
<4>[67096.760255]  [] ? sock_sendmsg+0x13/0x140
<4>[67096.760255]  [] sock_sendmsg+0x117/0x140
<4>[67096.760255]  [] ? native_smp_send_reschedule+0x49/0x60
<4>[67096.760255]  [] ? _spin_unlock_bh+0x1b/0x20
<4>[67096.760255]  [] ? autoremove_wake_function+0x0/0x40
<4>[67096.760255]  [] ? do_ip_setsockopt+0x90/0xd80
<4>[67096.760255]  [] ? apic_timer_interrupt+0xe/0x20
<4>[67096.760255]  [] ? apic_timer_interrupt+0xe/0x20
<4>[67096.760255]  [] sys_sendto+0x139/0x190
<4>[67096.760255]  [] ? audit_syscall_entry+0x1d7/0x200
<4>[67096.760255]  [] ? __audit_syscall_exit+0x265/0x290
<4>[67096.760255]  [] compat_sys_socketcall+0x13f/0x210
<4>[67096.760255]  [] ia32_sysret+0x0/0x5

I have reused the original title for the RFC patch that Andrey posted and
most of the original patch description.

Cc: Eric Dumazet 
Cc: Andrew Vagin 
Cc: Florian Westphal 
Cc: Zefan Li 
Signed-off-by: Ani Sinha 
Reported-by: Andrew Vagin 
Signed-off-by: Pablo Neira Ayuso 
Reviewed-by: Eric Dumazet 
Acked-by: Andrew Vagin 
---
 net/netfilter/nf_conntrack_core.c | 18 +-
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 9a171b2..9a46908 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -441,7 +441,9 @@ nf_conntrack_hash_check_insert(struct nf_conn *ct)
goto out;
 
add_timer(&ct->timeout);
-   nf_conntrack_get(&ct->ct_general);
+   smp_wmb();
+   /* The caller holds a reference to this object */
+   atomic_set(&ct->ct_general.use, 2);
__nf_conntrack_hash_insert(ct, hash, repl_hash);
NF_CT_STAT_INC(net, insert);
spin_unlock_bh(&nf_conntrack_lock);
@@ -732,11 +734,10 @@ __nf_conntrack_alloc(struct net *net, u16 zone,
nf_ct_zone->id = zone;
}
 #endif
-   /*
-* changes to lookup keys must be done before setting refcnt to 1
+   /* Because we use RCU lookups, we set ct_general.use to zero before
+* this is inserted in any list.
 */
-   smp_wmb();
-   

Re: [PATCH net-next 00/16] tipc: improve broadcast implementation

2015-10-24 Thread David Miller
From: Jon Maloy 
Date: Thu, 22 Oct 2015 08:51:32 -0400

> The TIPC broadcast link implementation is currently complex and hard to
> follow. It also incurs some amount of code and structure duplication,
> something that can be reduced significantly with a little effort.
> 
> This commit series introduces a number of improvements which address
> the locking structure, the code/structure duplication issue, and
> the overall readability of the code.
> 
> The series consists of three main parts:
> 
> 1-7: Adaptation to the new link structure, and preparation for the next
>  step. In particular, we want the broadcast transmission link to
>  have a life cycle that is longer than any of its potential (unicast
>  and broadcast receive links) users. This eliminates the need to
>  always test for the presence of this link before accessing it.
> 
> 8-10: This is what is really new in this series. Commit #9 is by far
>   the largest and most important one, because it moves most of
>   the broadcast functionality into link.c, partially reusing the
>   fields and functionality of the unicast link. The removal of
>   the "node_map" infrastructure in commit #10 is also an important
>   achievement.
> 
> 11-16: Some improvements leveraging the changes made in the previous
>commits.
> 
> The series needs commit 53387c4e22ac ("tipc: extend broadcast link window size")
> and commit e53567948f82 ("tipc: conditionally expand buffer headroom over udp tunnel"),
> which are both present in 'net' but not yet in 'net-next', to apply cleanly.

Series applied, thanks Jon.

And I really appreciate the exact explanation of what this series
depended upon.

Thanks again.


Re: [RFC Patch 01/12] PCI: Add virtfn_index for struct pci_device

2015-10-24 Thread Lan, Tianyu



On 10/22/2015 2:07 AM, Alexander Duyck wrote:

On 10/21/2015 09:37 AM, Lan Tianyu wrote:

Add a "virtfn_index" member to struct pci_dev to record the VF's index
within its PF. This will be used by the VF sysfs node handler.

Signed-off-by: Lan Tianyu 
---
  drivers/pci/iov.c   | 1 +
  include/linux/pci.h | 1 +
  2 files changed, 2 insertions(+)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index ee0ebff..065b6bb 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -136,6 +136,7 @@ static int virtfn_add(struct pci_dev *dev, int id, int reset)
  virtfn->physfn = pci_dev_get(dev);
  virtfn->is_virtfn = 1;
  virtfn->multifunction = 0;
+virtfn->virtfn_index = id;
  for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
  res = &dev->resource[i + PCI_IOV_RESOURCES];
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 353db8d..85c5531 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -356,6 +356,7 @@ struct pci_dev {
  unsigned int io_window_1k:1; /* Intel P2P bridge 1K I/O windows */
  unsigned int irq_managed:1;
  pci_dev_flags_t dev_flags;
+unsigned int virtfn_index;
  atomic_t enable_cnt; /* pci_enable_device has been called */
  u32 saved_config_space[16]; /* config space saved at suspend time */



Can't you just calculate the VF index based on the VF BDF number
combined with the information in the PF BDF number and VF
offset/stride?  Seems kind of pointless to add a variable that is only
used by one driver and is in a slowpath when you can just calculate it
pretty quickly.


Good suggestion. Will try it.



- Alex
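Alex's suggestion relies on the SR-IOV capability's First VF Offset and VF Stride fields: VF n (counting from 0) has routing ID PF_RID + offset + n * stride, where a routing ID is bus << 8 | devfn. Inverting that gives the index without a new struct member. The sketch below is a standalone model under those assumptions, not kernel code; `vf_index_from_rid()` and `model_pci_rid()` are hypothetical names, and a real driver would read offset and stride from the PF's SR-IOV capability.

```c
#include <assert.h>
#include <stdint.h>

/* PCIe routing ID: bus number in the high byte, devfn in the low byte. */
static uint16_t model_pci_rid(uint8_t bus, uint8_t devfn)
{
    return (uint16_t)((bus << 8) | devfn);
}

/* Hypothetical helper: invert RID(VF n) = pf_rid + first_vf_offset + n * vf_stride
 * (n counted from 0). Returns the 0-based VF index, or -1 if vf_rid does not
 * belong to this PF's VF range. */
static int vf_index_from_rid(uint16_t pf_rid, uint16_t vf_rid,
                             uint16_t first_vf_offset, uint16_t vf_stride,
                             uint16_t num_vfs)
{
    int delta = (int)vf_rid - (int)pf_rid - (int)first_vf_offset;

    assert(vf_stride != 0); /* assumed nonzero for a PF with several VFs */
    if (delta < 0 || delta % vf_stride != 0)
        return -1;
    if (delta / vf_stride >= num_vfs)
        return -1;
    return delta / vf_stride;
}
```

For example, a PF at 01:00.0 with offset 0x80 and stride 2 places VF 3 at RID 0x0186, and the helper maps it back to 3. This is only a slow-path calculation, which is the point Alex makes about not needing a stored field.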



[net-next:master 1610/1613] net/tipc/link.c:176:5: sparse: symbol 'tipc_link_is_active' was not declared. Should it be static?

2015-10-24 Thread kbuild test robot
tree:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master
head:   687f079addba1ac7f97ce97080c2291bbe8c8dce
commit: c72fa872a23f03b2b9c17e88f3b0a8070924e5f1 [1610/1613] tipc: eliminate link's reference to owner node
reproduce:
# apt-get install sparse
git checkout c72fa872a23f03b2b9c17e88f3b0a8070924e5f1
make ARCH=x86_64 allmodconfig
make C=1 CF=-D__CHECK_ENDIAN__


sparse warnings: (new ones prefixed by >>)

   net/tipc/link.c:166:6: sparse: symbol 'link_is_bc_sndlink' was not declared. Should it be static?
   net/tipc/link.c:171:6: sparse: symbol 'link_is_bc_rcvlink' was not declared. Should it be static?
>> net/tipc/link.c:176:5: sparse: symbol 'tipc_link_is_active' was not declared. Should it be static?
   net/tipc/link.c:648:6: sparse: symbol 'link_prepare_wakeup' was not declared. Should it be static?
   net/tipc/link.c:904:6: sparse: symbol 'tipc_link_advance_backlog' was not declared. Should it be static?
   net/tipc/link.c:980:5: sparse: symbol 'tipc_link_retrans' was not declared. Should it be static?
   net/tipc/link.c:1573:6: sparse: symbol 'tipc_link_build_bc_init_msg' was not declared. Should it be static?
   include/linux/rcupdate.h:305:9: sparse: context imbalance in 'tipc_link_find_owner' - wrong count at exit
   net/tipc/link.c:1833:5: sparse: context imbalance in 'tipc_nl_link_set' - different lock contexts for basic block
   net/tipc/link.c:2070:5: sparse: context imbalance in 'tipc_nl_link_dump' - different lock contexts for basic block
   net/tipc/link.c:2141:5: sparse: context imbalance in 'tipc_nl_link_get' - different lock contexts for basic block
   net/tipc/link.c:2193:5: sparse: context imbalance in 'tipc_nl_link_reset_stats' - different lock contexts for basic block

Please review and possibly fold the followup patch.

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


[RFC PATCH net-next] tipc: tipc_link_is_active() can be static

2015-10-24 Thread kbuild test robot
TO: "David S. Miller" 
CC: netdev@vger.kernel.org
CC: Jon Maloy 
CC: Ying Xue 
CC: tipc-discuss...@lists.sourceforge.net
CC: linux-ker...@vger.kernel.org


Signed-off-by: Fengguang Wu 
---
 link.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/tipc/link.c b/net/tipc/link.c
index 4449fa0..b637276 100644
--- a/net/tipc/link.c
+++ b/net/tipc/link.c
@@ -173,7 +173,7 @@ bool link_is_bc_rcvlink(struct tipc_link *l)
return ((l->bc_rcvlink == l) && !link_is_bc_sndlink(l));
 }
 
-int tipc_link_is_active(struct tipc_link *l)
+static int tipc_link_is_active(struct tipc_link *l)
 {
return l->active;
 }


Re: [PATCH 3/3] net: ethernet: add driver for Aurora VLSI NB8800 Ethernet controller

2015-10-24 Thread Måns Rullgård
Florian Fainelli  writes:

 +static void nb8800_set_rx_mode(struct net_device *dev)
 +{
 +  struct nb8800_priv *priv = netdev_priv(dev);
 +  struct netdev_hw_addr *ha;
 +  int af_en;
 +
 +  if ((dev->flags & (IFF_PROMISC | IFF_ALLMULTI)) ||
 +  netdev_mc_count(dev) > 64)
>>>
>>> 64, that's pretty generous for a perfect match filter, nice.
>> 
>> That's bogus; I forgot to delete it.  The hardware uses a 64-entry hash
>> table, and whoever wrote the old driver apparently didn't understand how
>> it works.
>
> Might be best to put the interface in promiscuous mode until you have
> proper multicast support. Since this is for a Set-Top box chip, having
> proper multicast support still seems like something highly desirable.

The code below should work correctly with any number of multicast
addresses.
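Why a hash filter copes with any number of multicast addresses: each address sets one bit in a 64-bit table, and the hardware accepts a frame when its bucket bit is set, so extra addresses only make the filter less selective (collisions let through some unwanted groups, which upper layers are expected to drop), never wrong. A minimal sketch, assuming a CRC-32-based bucket index; the thread does not say which hash the NB8800 actually uses, and every name here is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical hash: bit-serial little-endian Ethernet CRC-32
 * (polynomial 0xEDB88320), keeping the top 6 bits as a 0..63 bucket. */
static unsigned int mc_hash_index(const uint8_t mac[6])
{
    uint32_t crc = 0xFFFFFFFFu;

    for (int i = 0; i < 6; i++) {
        uint8_t byte = mac[i];
        for (int bit = 0; bit < 8; bit++) {
            bool mix = (crc ^ byte) & 1;
            crc >>= 1;
            if (mix)
                crc ^= 0xEDB88320u;
            byte >>= 1;
        }
    }
    return crc >> 26;
}

/* Set the bucket bit for one address in the 64-entry table. */
static void mc_filter_add(uint64_t *table, const uint8_t mac[6])
{
    *table |= UINT64_C(1) << mc_hash_index(mac);
}

/* Hardware-style check: accept if the address's bucket bit is set.
 * May also accept addresses that were never added (collisions). */
static bool mc_filter_accepts(uint64_t table, const uint8_t mac[6])
{
    return (table >> mc_hash_index(mac)) & 1;
}
```

Adding hundreds of addresses never overflows the table; at worst every bit is set and the filter degenerates into accept-all-multicast, which is the same behaviour as the IFF_ALLMULTI fallback.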

 +  phydev = phy_find_first(bus);
 +  if (!phydev || phy_read(phydev, MII_BMSR) <= 0) {
>>>
>>> What is this additional MII_BMSR read used for?
>> 
>> On one of my boards, phylib misdetects a phy on the second ethernet port
>> even though there is none.  Perhaps I should revisit that problem and
>> look for a better solution.
>
> I think that would be best, if you are currently using the Generic PHY
> driver, consider writing a specific driver which would take care of
> quirky behavior.

The problem is that there is no PHY, yet for some reason reading the ID
registers appears to succeed.

-- 
Måns Rullgård
m...@mansr.com


Re: CONFIG_XPS depends on L1_CACHE_BYTES being greater than sizeof(struct xps_map)

2015-10-24 Thread Helge Deller
* Alexander Duyck :
> On 10/23/2015 03:17 PM, Helge Deller wrote:
> >On 24.10.2015 00:00, Alexander Duyck wrote:
> >>On 10/23/2015 02:08 PM, Helge Deller wrote:
> >>>* Eric Dumazet :
> On Fri, 2015-10-23 at 21:25 +0200, Helge Deller wrote:
> 
> >Then, how about simply changing it to twice of L1_CACHE_BYTES ?
> >
> >#define XPS_MIN_MAP_ALLOC ((L1_CACHE_BYTES * 2 - sizeof(struct xps_map)) / sizeof(u16))
> 
> 
> Seems good to me.
> >>>
> >>>Great!
> >>>
> >>>Can you then maybe give me an Acked-by or signed-off for the patch below?
> >>>It further adds a compile-time check to avoid that XPS_MIN_MAP_ALLOC
> >>>gets calculated to zero on any architecture - otherwise no queues would
> >>>be allocated.
> >>>
> >>>In addition I would like to push it for v4.3 then through my parisc-tree
> >>>(after keeping it in for-next for 1-2 days), together with the patch
> >>>which reduces L1_CACHE_BYTES to 16 on parisc.
> >>>Would that be OK too?
> >>>
> >>>Thanks!
> >>>Helge
> >>>
> >>>
> >>>[PATCH] net/xps: Increase initial number of xps queues
> >>>
> >>>Increase the number of initially allocated xps queues, so that the initial
> >>>record allocates twice L1_CACHE_BYTES bytes.
> >>>
> >>>This change is needed to cope with architectures where L1_CACHE_BYTES is
> >>>defined to be 16 bytes or less.
> >>>
> >>>Signed-off-by: Helge Deller 
> >>>
> >>>diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> >>>index 2d15e38..d152788 100644
> >>>--- a/include/linux/netdevice.h
> >>>+++ b/include/linux/netdevice.h
> >>>@@ -718,7 +718,7 @@ struct xps_map {
> >>>   u16 queues[0];
> >>>   };
> >>>   #define XPS_MAP_SIZE(_num) (sizeof(struct xps_map) + ((_num) * sizeof(u16)))
> >>>-#define XPS_MIN_MAP_ALLOC ((L1_CACHE_BYTES - sizeof(struct xps_map))\
> >>>+#define XPS_MIN_MAP_ALLOC ((L1_CACHE_BYTES * 2 - sizeof(struct xps_map)) \
> >>>   / sizeof(u16))
> >>>
> >>>   /*
> >>>diff --git a/net/core/dev.c b/net/core/dev.c
> >>>index 6bb6470..f6d6dd1 100644
> >>>--- a/net/core/dev.c
> >>>+++ b/net/core/dev.c
> >>>@@ -1972,6 +1972,8 @@ static struct xps_map *expand_xps_map(struct xps_map *map,
> >>>   int alloc_len = XPS_MIN_MAP_ALLOC;
> >>>   int i, pos;
> >>>
> >>>+BUILD_BUG_ON(XPS_MIN_MAP_ALLOC == 0);
> >>>+
> >>>   for (pos = 0; map && pos < map->len; pos++) {
> >>>   if (map->queues[pos] != index)
> >>>   continue;
> >>>
> >>>
> >>
> >>Rather than leaving a potential bug, you could probably rewrite the macro so
> >>that it will give you at least 1.
> >>
> >>All you need to do is something like the following
> >>#define XPS_MIN_MAP_ALLOC \
> >> ((L1_CACHE_ALIGN(offsetof(struct xps_map, queues[1])) - \
> >>   sizeof(struct xps_map)) / sizeof(u16))
> >>
> >>That should give you at least an XPS_MIN_MAP_ALLOC of 1.
> >
> >Yes, good idea!
> >
> >What makes me wonder though (because I have no idea about the XPS code/layer):
> >How likely is it that more than 1 (e.g. minimum "X") queues are needed?
> >E.g. if a typical system needs at least 3 queues, then doesn't it make sense to allocate
> >at least 3 initially by using queue[3] in your proposed patch above?
> >What would "X" be then?
> 
> The question I would have is in how many cases it is likely that somebody
> would enable this feature and point a given CPU at more than one queue.  I
> know the Intel drivers that make use of XPS tend to do a 1:1 mapping for
> their ATR feature.  I would think if anything most CPUs would probably be
> mapped many:1, but you probably won't have all that many cases where it is
> 1:many or many:many.
> 
> I'd say starting with at least 1 should be fine.  Worst case scenario is we
> have to make a couple more calls to expand_xps_map which will likely occur
> as a slow path and infrequent event anyway.

OK, can I then get your Signed-off-by or Acked-by for this patch?

Thanks,
Helge


[PATCH] net/xps: Fix calculation of initial number of xps queues

The existing code breaks on architectures where the L1 cache size
(L1_CACHE_BYTES) is smaller than or equal to the size of struct xps_map.

The new code ensures that we get at least one initial xps queue, or
more if they fit into the next multiple of L1_CACHE_BYTES.

Signed-off-by: Helge Deller 

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 2d15e38..2212c82 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -718,8 +718,8 @@ struct xps_map {
u16 queues[0];
 };
 #define XPS_MAP_SIZE(_num) (sizeof(struct xps_map) + ((_num) * sizeof(u16)))
-#define XPS_MIN_MAP_ALLOC ((L1_CACHE_BYTES - sizeof(struct xps_map))   \
-/ sizeof(u16))
+#define XPS_MIN_MAP_ALLOC ((L1_CACHE_ALIGN(offsetof(struct xps_map, queues[1])) \
+   - sizeof(struct xps_map)) / sizeof(u16))
 
 /*
  * This structure holds all XPS maps for device.  
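To see what the old and new XPS_MIN_MAP_ALLOC formulas produce, here is a userspace model. It assumes struct rcu_head is two pointers, so the xps_map header is 16 bytes on 32-bit and 24 bytes on 64-bit builds; the `model_*` names are stand-ins, not kernel symbols. With 16-byte cache lines the old formula gives 0 or a negative value (which the real size_t arithmetic turns into a huge number), while the new one always yields at least 1 and matches the old result whenever one cache line can hold the header plus one queue entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct rcu_head (two pointers). */
struct model_rcu_head {
    struct model_rcu_head *next;
    void (*func)(struct model_rcu_head *head);
};

/* Same field layout as struct xps_map in the patch context. */
struct model_xps_map {
    unsigned int len;
    unsigned int alloc_len;
    struct model_rcu_head rcu;
    uint16_t queues[];
};

/* Round x up to a multiple of align; align must be a power of two. */
static size_t model_cache_align(size_t x, size_t align)
{
    return (x + align - 1) & ~(align - 1);
}

/* Old formula, computed in signed arithmetic so a negative result is
 * visible instead of wrapping around as the size_t macro would. */
static long old_min_alloc(size_t cache_bytes)
{
    return ((long)cache_bytes - (long)sizeof(struct model_xps_map)) /
           (long)sizeof(uint16_t);
}

/* New formula: room for at least queues[0], rounded up to whole cache
 * lines, minus the header. offsetof(queues) + sizeof(uint16_t) equals
 * offsetof(struct xps_map, queues[1]). */
static size_t new_min_alloc(size_t cache_bytes)
{
    size_t need = offsetof(struct model_xps_map, queues) + sizeof(uint16_t);

    return (model_cache_align(need, cache_bytes) -
            sizeof(struct model_xps_map)) / sizeof(uint16_t);
}
```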

[PATCH net] ipv4: fix problems from the RTNH_F_LINKDOWN introduction

2015-10-24 Thread Julian Anastasov
When fib_netdev_event calls fib_disable_ip on a NETDEV_DOWN event,
we should not delete the local routes if the local address
is still present. The confusion comes from the fact that both
fib_netdev_event and fib_inetaddr_event use the NETDEV_DOWN
constant. Fix it by bringing back the 'force' argument.

Steps to reproduce:
modprobe dummy
ifconfig dummy0 192.168.168.1 up
ip route list table local | grep dummy | grep host
local 192.168.168.1 dev dummy0  proto kernel  scope host  src 192.168.168.1

The second fix is for fib_sync_up: when a nexthop is part of a multipath
route, we should clear the LINKDOWN flag when the link goes UP
or when the first address is added. This is needed because we always
set the LINKDOWN flag when the DEAD flag is set, but now the nexthop
is not dead anymore. Examples where the LINKDOWN bit can be left set:

- link goes down (LINKDOWN is set), then link goes UP and device
shows carrier OK but LINKDOWN remains set

- last address is deleted (LINKDOWN is set), then address is
added and device shows carrier OK but LINKDOWN remains set

Fixes: 8a3d03166f19 ("net: track link-status of ipv4 nexthops")
Signed-off-by: Julian Anastasov 
---
 include/net/ip_fib.h |  2 +-
 net/ipv4/fib_frontend.c  | 13 +++--
 net/ipv4/fib_semantics.c | 18 +++---
 3 files changed, 23 insertions(+), 10 deletions(-)

diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index 727d6e9..654aec1 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -317,7 +317,7 @@ void fib_flush_external(struct net *net);
 
 /* Exported by fib_semantics.c */
 int ip_fib_check_default(__be32 gw, struct net_device *dev);
-int fib_sync_down_dev(struct net_device *dev, unsigned long event);
+int fib_sync_down_dev(struct net_device *dev, unsigned long event, int force);
 int fib_sync_down_addr(struct net *net, __be32 local);
 int fib_sync_up(struct net_device *dev, unsigned int nh_flags);
 void fib_select_multipath(struct fib_result *res);
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 690bcbc..4826a22 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -1110,9 +1110,10 @@ static void nl_fib_lookup_exit(struct net *net)
net->ipv4.fibnl = NULL;
 }
 
-static void fib_disable_ip(struct net_device *dev, unsigned long event)
+static void fib_disable_ip(struct net_device *dev, unsigned long event,
+  int force)
 {
-   if (fib_sync_down_dev(dev, event))
+   if (fib_sync_down_dev(dev, event, force))
fib_flush(dev_net(dev));
rt_cache_flush(dev_net(dev));
arp_ifdown(dev);
@@ -1140,7 +1141,7 @@ static int fib_inetaddr_event(struct notifier_block *this, unsigned long event,
/* Last address was deleted from this interface.
 * Disable IP.
 */
-   fib_disable_ip(dev, event);
+   fib_disable_ip(dev, event, 1);
} else {
rt_cache_flush(dev_net(dev));
}
@@ -1157,7 +1158,7 @@ static int fib_netdev_event(struct notifier_block *this, unsigned long event, vo
unsigned int flags;
 
if (event == NETDEV_UNREGISTER) {
-   fib_disable_ip(dev, event);
+   fib_disable_ip(dev, event, 2);
rt_flush_dev(dev);
return NOTIFY_DONE;
}
@@ -1178,14 +1179,14 @@ static int fib_netdev_event(struct notifier_block *this, unsigned long event, vo
rt_cache_flush(net);
break;
case NETDEV_DOWN:
-   fib_disable_ip(dev, event);
+   fib_disable_ip(dev, event, 0);
break;
case NETDEV_CHANGE:
flags = dev_get_flags(dev);
if (flags & (IFF_RUNNING | IFF_LOWER_UP))
fib_sync_up(dev, RTNH_F_LINKDOWN);
else
-   fib_sync_down_dev(dev, event);
+   fib_sync_down_dev(dev, event, 0);
/* fall through */
case NETDEV_CHANGEMTU:
rt_cache_flush(net);
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index 064bd3c..f657418 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -1281,7 +1281,13 @@ int fib_sync_down_addr(struct net *net, __be32 local)
return ret;
 }
 
-int fib_sync_down_dev(struct net_device *dev, unsigned long event)
+/* Event              force Flags           Description
+ * NETDEV_CHANGE      0     LINKDOWN        Carrier OFF, not for scope host
+ * NETDEV_DOWN        0     LINKDOWN|DEAD   Link down, not for scope host
+ * NETDEV_DOWN        1     LINKDOWN|DEAD   Last address removed
+ * NETDEV_UNREGISTER  2     LINKDOWN|DEAD   Device removed
+ */
+int fib_sync_down_dev(struct net_device *dev, unsigned long event, int force)
 {
int ret = 0;
int scope = RT_SCOPE_NOWHERE;
@@ -1290,8 +1296,7 @@ int fib_sync_down_dev(struct 
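The event/force/flags table added to fib_semantics.c can be read as a small decision function. The sketch below encodes only what that table states; the `MODEL_*` enum and flag values are hypothetical stand-ins for the real NETDEV_* and RTNH_F_* constants, and this is not how fib_sync_down_dev() is structured internally.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; the real NETDEV_* and RTNH_F_* values differ. */
enum model_event { MODEL_NETDEV_CHANGE, MODEL_NETDEV_DOWN, MODEL_NETDEV_UNREGISTER };
#define MODEL_F_LINKDOWN 0x1u
#define MODEL_F_DEAD     0x2u

/* Nexthop flags set for each event, per the table in the patch. */
static unsigned int model_down_flags(enum model_event ev)
{
    if (ev == MODEL_NETDEV_CHANGE)
        return MODEL_F_LINKDOWN;
    return MODEL_F_LINKDOWN | MODEL_F_DEAD;
}

/* 'force' value each caller passes, per the table; last_addr_removed
 * selects the fib_inetaddr_event() path. */
static int model_force(enum model_event ev, bool last_addr_removed)
{
    if (ev == MODEL_NETDEV_UNREGISTER)
        return 2;
    if (ev == MODEL_NETDEV_DOWN && last_addr_removed)
        return 1;
    return 0;
}

/* Rows with force == 0 are marked "not for scope host" in the table. */
static bool model_touches_host_scope(int force)
{
    return force != 0;
}
```

The bug being fixed is visible here: before the patch both NETDEV_DOWN rows looked identical to fib_sync_down_dev(), so a plain link-down also removed scope-host (local) routes.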

Re: Missing IPv4 routes

2015-10-24 Thread Brian Rak



On 10/23/2015 6:32 PM, Alexander Duyck wrote:

On 10/23/2015 02:34 PM, Brian Rak wrote:

I've got a weird situation here.  I have a route that the kernel knows
about, but won't display via the general RTM_GETROUTE call, but will
display if I query for that particular route:

# ip -4 route show | grep 108.61.171.x


The use of 'x' here is going to make things confusing.  I assume you
are using a value of 0 here, or is this a route to a specific IP
address that you have?  If not, you should be using 0 for all bits
that would be outside of your subnet mask.



This is a route to a particular IP address:

# ip route show | grep  108.61.171.247
# ip route get  108.61.171.247
108.61.171.247 dev SRVID630287
cache


# ip route get 108.61.171.x
108.61.171.x dev MYIF
 cache


The 'x' being the actual value here should work as this will perform a 
lookup as I recall.



# cat /proc/net/route | grep 108.61.171.x


The IPs are shown as raw hex in network byte order, so this won't work.


# cat /proc/net/route  | grep -i 6c3dac


The byte ordering you are using is backwards here from what I can 
tell.  So it should be ac3d6c you are checking for, not the other way 
around.  So for example if I was using 192.168.1.x I would want to 
look for 01A8C0.

Oops.  This also doesn't show the route, which it should:

# cat /proc/net/route  | grep SRVID630287
#
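/proc/net/route prints each address as the raw network-order 32-bit value with %08X, so on a little-endian host the octets appear reversed (192.168.1.0 shows as 0001A8C0). Note that 171 is 0xAB, not 0xAC, so the reversed pattern for 108.61.171.x is AB3D6C rather than either 6c3dac or ac3d6c. A small helper for building the grep pattern, assuming a little-endian host such as x86; `proc_route_hex()` is a hypothetical name:

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Format an IPv4 address the way /proc/net/route shows it on a
 * little-endian host. Returns 0 on success, -1 on a malformed address. */
static int proc_route_hex(const char *dotted, char out[9])
{
    struct in_addr addr;
    const unsigned char *b = (const unsigned char *)&addr.s_addr;
    uint32_t le;

    if (inet_pton(AF_INET, dotted, &addr) != 1)
        return -1;
    /* b[0] is the first octet (network order); a little-endian load
     * puts it in the least significant byte of the printed value. */
    le = (uint32_t)b[0] | (uint32_t)b[1] << 8 |
         (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
    snprintf(out, 9, "%08X", (unsigned int)le);
    return 0;
}
```

For 108.61.171.247 this gives F7AB3D6C, so `grep -i ab3d6c /proc/net/route` is the right check for that /24.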




# ip route add 108.61.171.x dev MYIF
RTNETLINK answers: File exists
# ip route del 108.61.171.x  < it deletes successfully once
# ip route del 108.61.171.x
RTNETLINK answers: No such process



So at least we have the routes in the FIB.  It looks like this just 
might be a display issue.



This is on a machine running 4.1.3, but I have seen it on earlier
versions in the past.

I don't have great reproduction steps here, I've seen this 4-5 times in
the past few months (on different hardware).  So far, I haven't really
found any way of fixing it (deleting and readding the route has no
effect).  I thought at first this might be related to
e55ffaf457bcc8ec4e9d9f56f955971f834d65b3, but as far as I can tell that
only relates to /proc/net/route.

Any suggestions on further troubleshooting here?  I'm all out of ideas
(and since I can't easily reproduce it yet, I can't reboot to a newer
kernel to see if it goes away)


How many routes do you have on your system?  I'm just wondering if it 
might be possible that the route could be at a boundary for the dump 
call and if it might be possibly losing the data there. Although I 
would expect

ip -4 route show | wc -l shows 67


Also have you tried double checking to verify that grep isn't somehow 
missing the line?
Yes, so we noticed this issue because BIRD stopped picking up the 
route.  BIRD's trying to grab these via netlink: 
https://github.com/BIRD/bird/blob/master/sysdep/linux/netlink.c#L1045 , 
so I don't believe this is just an issue with grep missing the route.  I
also wrote a simple Python script with pyroute2, which also missed the
route.


I was doing some testing to see if I could add routes for nearby IPs, 
and ended up somehow correcting the issue:


# ip route show | grep SRVID630287
# ip route add 108.61.171.200/32 dev SRVID630287
# ip route show | grep SRVID630287
108.61.171.200 dev SRVID630287  scope link
108.61.171.247 dev SRVID630287  scope link
# ip route del 108.61.171.200/32 dev SRVID630287
# ip route show | grep SRVID630287
108.61.171.247 dev SRVID630287  scope link

Does that make any sense?



[PATCH net] ipv6: gre: support SIT encapsulation

2015-10-24 Thread Eric Dumazet
From: Eric Dumazet 

gre_gso_segment() chokes if SIT frames were aggregated by the GRO engine.

Fixes: 61c1db7fae21e ("ipv6: sit: add GSO/TSO support")
Signed-off-by: Eric Dumazet 
---
 net/ipv4/gre_offload.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/gre_offload.c b/net/ipv4/gre_offload.c
index 5aa46d4b44ef..5a8ee3282550 100644
--- a/net/ipv4/gre_offload.c
+++ b/net/ipv4/gre_offload.c
@@ -36,7 +36,8 @@ static struct sk_buff *gre_gso_segment(struct sk_buff *skb,
  SKB_GSO_TCP_ECN |
  SKB_GSO_GRE |
  SKB_GSO_GRE_CSUM |
- SKB_GSO_IPIP)))
+ SKB_GSO_IPIP |
+ SKB_GSO_SIT)))
goto out;
 
if (!skb->encapsulation)
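The fix works because gre_gso_segment() bails out on any skb whose gso_type has a bit outside its supported mask, and a GRO-aggregated SIT frame carries SKB_GSO_SIT. A minimal model of that mask check; the MODEL_GSO_* values are hypothetical, not the real SKB_GSO_* constants, and the example gso_type combination is only illustrative:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values standing in for SKB_GSO_* flags. */
#define MODEL_GSO_TCPV4 (1u << 0)
#define MODEL_GSO_TCPV6 (1u << 1)
#define MODEL_GSO_GRE   (1u << 2)
#define MODEL_GSO_IPIP  (1u << 3)
#define MODEL_GSO_SIT   (1u << 4)

/* Same shape as the check in gre_gso_segment(): any bit outside the
 * supported mask means "can't segment this", i.e. goto out. */
static bool gso_type_supported(unsigned int gso_type, unsigned int supported)
{
    return (gso_type & ~supported) == 0;
}
```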




Re: [RFC Patch 08/12] IXGBEVF: Rework code of finding the end transmit desc of package

2015-10-24 Thread Lan, Tianyu



On 10/22/2015 8:58 PM, Michael S. Tsirkin wrote:

Do you really need to play the shifting games?
Can't you just reset everything and re-initialize the rings?
It's slower but way less intrusive.
Also removes the need to track writes into rings.


Shifting the ring is meant to avoid losing the packets already in it.
This may cause race conditions, so I introduced a
lock to prevent them in a later patch.
Yes, resetting everything after migration would make things simpler,
but, as you said, it would hurt performance and lose
more packets. I can run a test later to get data on these
two approaches.



Re: [PATCH 1/1] commit c6825c0976fa7893692e0e43b09740b419b23c09 upstream.

2015-10-24 Thread Ani Sinha
Please refer to the thread "linux 3.4.43 : kernel crash at
__nf_conntrack_confirm" on netdev for context.

thanks

On Sat, Oct 24, 2015 at 11:27 AM, Ani Sinha  wrote:
> netfilter: nf_conntrack: fix RCU race in nf_conntrack_find_get
>
> Let's look at destroy_conntrack:
>
> hlist_nulls_del_rcu(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode);
> ...
> nf_conntrack_free(ct)
> kmem_cache_free(net->ct.nf_conntrack_cachep, ct);
>
> net->ct.nf_conntrack_cachep is created with SLAB_DESTROY_BY_RCU.
>
> The hash is protected by rcu, so readers look up conntracks without
> locks.
> A conntrack is removed from the hash, but in this moment a few readers
> still can use the conntrack. Then this conntrack is released and another
> thread creates conntrack with the same address and the equal tuple.
> After this a reader starts to validate the conntrack:
> * It's not dying, because a new conntrack was created
> * nf_ct_tuple_equal() returns true.
>
> But this conntrack is not initialized yet, so it can not be used by two
> threads concurrently. In this case BUG_ON may be triggered from
> nf_nat_setup_info().
>
> Florian Westphal suggested to check the confirm bit too. I think it's
> right.
>
> task 1  task 2  task 3
> nf_conntrack_find_get
>  nf_conntrack_find
> destroy_conntrack
>  hlist_nulls_del_rcu
>  nf_conntrack_free
>  kmem_cache_free
> __nf_conntrack_alloc
>  kmem_cache_alloc
>  
> memset(&ct->tuplehash[IP_CT_DIR_MAX],
>  if (nf_ct_is_dying(ct))
>  if (!nf_ct_tuple_equal()
>
> I'm not sure that I have ever seen this race condition in real life.
> Currently we are investigating a bug, which is reproduced on a few nodes.
> In our case one conntrack is initialized from a few tasks concurrently,
> we don't have any other explanation for this.
>
> <2>[46267.083061] kernel BUG at net/ipv4/netfilter/nf_nat_core.c:322!
> ...
> <4>[46267.083951] RIP: 0010:[]  [] nf_nat_setup_info+0x564/0x590 [nf_nat]
> ...
> <4>[46267.085549] Call Trace:
> <4>[46267.085622]  [] alloc_null_binding+0x5b/0xa0 [iptable_nat]
> <4>[46267.085697]  [] nf_nat_rule_find+0x5c/0x80 [iptable_nat]
> <4>[46267.085770]  [] nf_nat_fn+0x111/0x260 [iptable_nat]
> <4>[46267.085843]  [] nf_nat_out+0x48/0xd0 [iptable_nat]
> <4>[46267.085919]  [] nf_iterate+0x69/0xb0
> <4>[46267.085991]  [] ? ip_finish_output+0x0/0x2f0
> <4>[46267.086063]  [] nf_hook_slow+0x74/0x110
> <4>[46267.086133]  [] ? ip_finish_output+0x0/0x2f0
> <4>[46267.086207]  [] ? dst_output+0x0/0x20
> <4>[46267.086277]  [] ip_output+0xa4/0xc0
> <4>[46267.086346]  [] raw_sendmsg+0x8b4/0x910
> <4>[46267.086419]  [] inet_sendmsg+0x4a/0xb0
> <4>[46267.086491]  [] ? sock_update_classid+0x3a/0x50
> <4>[46267.086562]  [] sock_sendmsg+0x117/0x140
> <4>[46267.086638]  [] ? _spin_unlock_bh+0x1b/0x20
> <4>[46267.086712]  [] ? autoremove_wake_function+0x0/0x40
> <4>[46267.086785]  [] ? do_ip_setsockopt+0x90/0xd80
> <4>[46267.086858]  [] ? call_function_interrupt+0xe/0x20
> <4>[46267.086936]  [] ? ub_slab_ptr+0x20/0x90
> <4>[46267.087006]  [] ? ub_slab_ptr+0x20/0x90
> <4>[46267.087081]  [] ? kmem_cache_alloc+0xd8/0x1e0
> <4>[46267.087151]  [] sys_sendto+0x139/0x190
> <4>[46267.087229]  [] ? sock_setsockopt+0x16d/0x6f0
> <4>[46267.087303]  [] ? audit_syscall_entry+0x1d7/0x200
> <4>[46267.087378]  [] ? __audit_syscall_exit+0x265/0x290
> <4>[46267.087454]  [] ? compat_sys_setsockopt+0x75/0x210
> <4>[46267.087531]  [] compat_sys_socketcall+0x13f/0x210
> <4>[46267.087607]  [] ia32_sysret+0x0/0x5
> <4>[46267.087676] Code: 91 20 e2 01 75 29 48 89 de 4c 89 f7 e8 56 fa ff ff 85 c0 0f 84 68 fc ff ff 0f b6 4d c6 41 8b 45 00 e9 4d fb ff ff e8 7c 19 e9 e0 <0f> 0b eb fe f6 05 17 91 20 e2 80 74 ce 80 3d 5f 2e 00 00 00 74
> <1>[46267.088023] RIP  [] nf_nat_setup_info+0x564/0x590
>
> Cc: Eric Dumazet 
> Cc: Florian Westphal 
> Cc: Pablo Neira Ayuso 
> Cc: Patrick McHardy 
> Cc: Jozsef Kadlecsik 
> Cc: "David S. Miller" 
> Cc: Cyrill Gorcunov 
> Signed-off-by: Andrey Vagin 
> Acked-by: Eric Dumazet 
> Signed-off-by: Pablo Neira Ayuso 
> Signed-off-by: Ani Sinha 
> ---
>  net/netfilter/nf_conntrack_core.c | 21 +
>  1 file changed, 17 insertions(+), 4 deletions(-)
>
> diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
> index 9a46908..fd0f7a3 100644
> --- a/net/netfilter/nf_conntrack_core.c
> +++ b/net/netfilter/nf_conntrack_core.c
> @@ -309,6 +309,21 @@ static void death_by_timeout(unsigned long ul_conntrack)
> nf_ct_put(ct);
>  }
>
> +static inline 

Re: [PATCHv2 net 1/3] openvswitch: Fix double-free on ip_defrag() errors

2015-10-24 Thread Joe Stringer
On 24 October 2015 at 01:20, Florian Westphal  wrote:
> Joe Stringer  wrote:
>>  err:
>> + if (err)
>> + kfree_skb(skb);
>>   skb_push(skb, nh_ofs);
>
> That looks... wrong :-}

D'oh. Teaches me for last minute adjustments. I'll resend.


Re: [PATCH 1/1] commit e53376bef2cd97d3e3f61fdc677fb8da7d03d0da upstream.

2015-10-24 Thread Ani Sinha
Please refer to the thread "linux 3.4.43 : kernel crash at
__nf_conntrack_confirm" on netdev for context.

thanks

On Sat, Oct 24, 2015 at 10:27 AM, Ani Sinha  wrote:
> netfilter: nf_conntrack: don't release a conntrack with non-zero
> refcnt
>
> With this patch, the conntrack refcount is initially set to zero and
> it is bumped once it is added to any of the lists, so we fulfill
> Eric's golden rule, which is that all released objects always have a
> refcount that equals zero.
>
> Andrey Vagin reports that nf_conntrack_free can't be called for a
> conntrack with non-zero ref-counter, because it can race with
> nf_conntrack_find_get().
>
> A conntrack slab is created with SLAB_DESTROY_BY_RCU. Non-zero
> ref-counter says that this conntrack is used. So when we release
> a conntrack with non-zero counter, we break this assumption.
>
> CPU1                                      CPU2
> nf_conntrack_find()
>                                           nf_ct_put()
>                                            destroy_conntrack()
>                                           ...
>                                           init_conntrack
>                                            __nf_conntrack_alloc (set use = 1)
> atomic_inc_not_zero(&ct->use) (use = 2)
>                                            if (!l4proto->new(ct, skb, dataoff, timeouts))
>                                             nf_conntrack_free(ct); (use = 2 !!!)
>                                           ...
>                                           __nf_conntrack_alloc (set use = 1)
>  if (!nf_ct_key_equal(h, tuple, zone))
>   nf_ct_put(ct); (use = 0)
>    destroy_conntrack()
>                                           /* continue to work with CT */
>
> After applying the patch "[PATCH] netfilter: nf_conntrack: fix RCU
> race in nf_conntrack_find_get" another bug was triggered in
> destroy_conntrack():
>
> <4>[67096.759334] [ cut here ]
> <2>[67096.759353] kernel BUG at net/netfilter/nf_conntrack_core.c:211!
> ...
> <4>[67096.759837] Pid: 498649, comm: atdd veid: 666 Tainted: G C ---2.6.32-042stab084.18 #1 042stab084_18 /DQ45CB
> <4>[67096.759932] RIP: 0010:[]  [] destroy_conntrack+0x15c/0x190 [nf_conntrack]
> <4>[67096.760255] Call Trace:
> <4>[67096.760255]  [] nf_conntrack_destroy+0x17/0x30
> <4>[67096.760255]  [] nf_conntrack_find_get+0x85/0x130 [nf_conntrack]
> <4>[67096.760255]  [] nf_conntrack_in+0x352/0xb60 [nf_conntrack]
> <4>[67096.760255]  [] ipv4_conntrack_local+0x51/0x60 [nf_conntrack_ipv4]
> <4>[67096.760255]  [] nf_iterate+0x69/0xb0
> <4>[67096.760255]  [] ? dst_output+0x0/0x20
> <4>[67096.760255]  [] nf_hook_slow+0x74/0x110
> <4>[67096.760255]  [] ? dst_output+0x0/0x20
> <4>[67096.760255]  [] raw_sendmsg+0x775/0x910
> <4>[67096.760255]  [] ? flush_tlb_others_ipi+0x128/0x130
> <4>[67096.760255]  [] ? apic_timer_interrupt+0xe/0x20
> <4>[67096.760255]  [] ? apic_timer_interrupt+0xe/0x20
> <4>[67096.760255]  [] inet_sendmsg+0x4a/0xb0
> <4>[67096.760255]  [] ? sock_sendmsg+0x13/0x140
> <4>[67096.760255]  [] sock_sendmsg+0x117/0x140
> <4>[67096.760255]  [] ? native_smp_send_reschedule+0x49/0x60
> <4>[67096.760255]  [] ? _spin_unlock_bh+0x1b/0x20
> <4>[67096.760255]  [] ? autoremove_wake_function+0x0/0x40
> <4>[67096.760255]  [] ? do_ip_setsockopt+0x90/0xd80
> <4>[67096.760255]  [] ? apic_timer_interrupt+0xe/0x20
> <4>[67096.760255]  [] ? apic_timer_interrupt+0xe/0x20
> <4>[67096.760255]  [] sys_sendto+0x139/0x190
> <4>[67096.760255]  [] ? audit_syscall_entry+0x1d7/0x200
> <4>[67096.760255]  [] ? __audit_syscall_exit+0x265/0x290
> <4>[67096.760255]  [] compat_sys_socketcall+0x13f/0x210
> <4>[67096.760255]  [] ia32_sysret+0x0/0x5
>
> I have reused the original title for the RFC patch that Andrey posted and
> most of the original patch description.
>
> Cc: Eric Dumazet 
> Cc: Andrew Vagin 
> Cc: Florian Westphal 
> Cc: Zefan Li 
> Signed-off-by: Ani Sinha 
> Reported-by: Andrew Vagin 
> Signed-off-by: Pablo Neira Ayuso 
> Reviewed-by: Eric Dumazet 
> Acked-by: Andrew Vagin 
> ---
>  net/netfilter/nf_conntrack_core.c | 18 +-
>  1 file changed, 13 insertions(+), 5 deletions(-)
>
> diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
> index 9a171b2..9a46908 100644
> --- a/net/netfilter/nf_conntrack_core.c
> +++ b/net/netfilter/nf_conntrack_core.c
> @@ -441,7 +441,9 @@ nf_conntrack_hash_check_insert(struct nf_conn *ct)
> goto out;
>
> add_timer(&ct->timeout);
> -   nf_conntrack_get(&ct->ct_general);
> +   smp_wmb();
> +   /* The caller holds a reference to this object */
> +   atomic_set(&ct->ct_general.use, 2);
> __nf_conntrack_hash_insert(ct, hash, repl_hash);
> NF_CT_STAT_INC(net, insert);
> 
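As a rough user-space illustration of the refcount race described above (not kernel code), the C11 sketch below mimics two ideas from the fix under stated assumptions: a lookup may only take a reference while the count is non-zero, in the spirit of atomic_inc_not_zero(), and a newly inserted entry gets a count of 2 (one for the hash table, one for the caller) via a release store that stands in, loosely, for smp_wmb() followed by atomic_set(). The names ref_get_not_zero() and publish_with_two_refs() are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical analogue of atomic_inc_not_zero(): take a reference
 * only while the object is still live (count > 0). */
static bool ref_get_not_zero(atomic_int *use)
{
	int old = atomic_load_explicit(use, memory_order_relaxed);

	while (old != 0) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak_explicit(use, &old, old + 1,
							  memory_order_acquire,
							  memory_order_relaxed))
			return true;
	}
	return false;	/* not published yet, or already being freed */
}

/* Hypothetical publish step: one reference for the table, one for the
 * caller. The release store orders earlier initialization before the
 * count becomes visible, similar in spirit to smp_wmb() + atomic_set(). */
static void publish_with_two_refs(atomic_int *use)
{
	atomic_store_explicit(use, 2, memory_order_release);
}
```

In this sketch a count of zero means "not usable", so ref_get_not_zero() refuses the object until publish_with_two_refs() has run. The kernel's memory model is not identical to C11's, so treat this as an analogy only.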

[PATCH RFT v2] sh_eth: fix kernel oops in skb_put()

2015-10-24 Thread Sergei Shtylyov
In a low memory situation the following kernel oops occurs:

Unable to handle kernel NULL pointer dereference at virtual address 0050
pgd = 8490c000
[0050] *pgd=4651e831, *pte=, *ppte=
Internal error: Oops: 17 [#1] PREEMPT ARM
Modules linked in:
CPU: 0Not tainted  (3.4-at16 #9)
PC is at skb_put+0x10/0x98
LR is at sh_eth_poll+0x2c8/0xa10
pc : [<8035f780>]lr : [<8028bf50>]psr: 6113
sp : 84eb1a90  ip : 84eb1ac8  fp : 84eb1ac4
r10: 003f  r9 : 05ea  r8 : 
r7 :   r6 : 940453b0  r5 : 0003  r4 : 9381b180
r3 :   r2 :   r1 : 05ea  r0 : 
Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
Control: 10c53c7d  Table: 4248c059  DAC: 0015
Process klogd (pid: 2046, stack limit = 0x84eb02e8)
[...]

This is because netdev_alloc_skb() fails and 'mdp->rx_skbuff[entry]' is left
NULL but sh_eth_rx() later uses it without checking. Add such a check.

Reported-by: Yasushi SHOJI 
Signed-off-by: Sergei Shtylyov 

---
This patch is against DaveM's 'net.git' repo.

 drivers/net/ethernet/renesas/sh_eth.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Index: net/drivers/net/ethernet/renesas/sh_eth.c
===
--- net.orig/drivers/net/ethernet/renesas/sh_eth.c
+++ net/drivers/net/ethernet/renesas/sh_eth.c
@@ -1481,6 +1481,7 @@ static int sh_eth_rx(struct net_device *
if (mdp->cd->shift_rd0)
desc_status >>= 16;
 
+   skb = mdp->rx_skbuff[entry];
if (desc_status & (RD_RFS1 | RD_RFS2 | RD_RFS3 | RD_RFS4 |
   RD_RFS5 | RD_RFS6 | RD_RFS10)) {
ndev->stats.rx_errors++;
@@ -1496,12 +1497,11 @@ static int sh_eth_rx(struct net_device *
ndev->stats.rx_missed_errors++;
if (desc_status & RD_RFS10)
ndev->stats.rx_over_errors++;
-   } else {
+   } else if (skb) {
if (!mdp->cd->hw_swap)
sh_eth_soft_swap(
phys_to_virt(ALIGN(rxdesc->addr, 4)),
pkt_len + 2);
-   skb = mdp->rx_skbuff[entry];
mdp->rx_skbuff[entry] = NULL;
if (mdp->cd->rpadir)
skb_reserve(skb, NET_IP_ALIGN);
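The fix boils down to never dereferencing an RX ring slot whose buffer allocation failed. A minimal user-space sketch of that pattern follows; struct rx_buf, RX_RING_SIZE and rx_poll() are hypothetical stand-ins for the driver's sk_buff ring and sh_eth_rx(), and the real driver additionally updates error statistics and refills the ring.

```c
#include <assert.h>
#include <stddef.h>

#define RX_RING_SIZE 4	/* hypothetical ring size */

struct rx_buf {
	int len;	/* stand-in for an sk_buff */
};

/* Process 'budget' descriptors starting at *cur_rx. A NULL slot means an
 * earlier refill allocation failed, so the slot is skipped instead of
 * being dereferenced. Returns the number of bytes "delivered". */
static int rx_poll(struct rx_buf *ring[RX_RING_SIZE], unsigned int *cur_rx,
		   int budget)
{
	int delivered = 0;

	while (budget-- > 0) {
		unsigned int entry = *cur_rx % RX_RING_SIZE;
		struct rx_buf *buf = ring[entry];

		if (buf) {
			ring[entry] = NULL;	/* ownership moves up the stack */
			delivered += buf->len;	/* stand-in for netif_receive_skb() */
		}
		(*cur_rx)++;	/* advance even when the slot was empty */
	}
	return delivered;
}
```

Keeping the check in the normal-packet branch, rather than jumping to a label, matches the shape of the v2 patch above.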



RE: [RFC PATCH net-next] tipc: tipc_link_is_active() can be static

2015-10-24 Thread Jon Maloy
Acked-by: Jon Maloy 



> -Original Message-
> From: netdev-ow...@vger.kernel.org [mailto:netdev-
> ow...@vger.kernel.org] On Behalf Of kbuild test robot
> Sent: Saturday, 24 October, 2015 11:11
> To: kbuild-inter...@linux.intel.com; l...@eclists.intel.com
> Cc: kbuild-...@01.org; netdev@vger.kernel.org
> Subject: [RFC PATCH net-next] tipc: tipc_link_is_active() can be static
> 
> TO: "David S. Miller" 
> CC: netdev@vger.kernel.org
> CC: Jon Maloy 
> CC: Ying Xue 
> CC: tipc-discuss...@lists.sourceforge.net
> CC: linux-ker...@vger.kernel.org
> 
> 
> Signed-off-by: Fengguang Wu 
> ---
>  link.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/tipc/link.c b/net/tipc/link.c
> index 4449fa0..b637276 100644
> --- a/net/tipc/link.c
> +++ b/net/tipc/link.c
> @@ -173,7 +173,7 @@ bool link_is_bc_rcvlink(struct tipc_link *l)
>   return ((l->bc_rcvlink == l) && !link_is_bc_sndlink(l));
>  }
> 
> -int tipc_link_is_active(struct tipc_link *l)
> +static int tipc_link_is_active(struct tipc_link *l)
>  {
>   return l->active;
>  }


RE: [RFC PATCH net-next] tipc: link_is_bc_sndlink() can be static

2015-10-24 Thread Jon Maloy
Acked-by: Jon Maloy 


> -Original Message-
> From: netdev-ow...@vger.kernel.org [mailto:netdev-
> ow...@vger.kernel.org] On Behalf Of kbuild test robot
> Sent: Saturday, 24 October, 2015 10:56
> To: kbuild-inter...@linux.intel.com; l...@eclists.intel.com
> Cc: kbuild-...@01.org; netdev@vger.kernel.org
> Subject: [RFC PATCH net-next] tipc: link_is_bc_sndlink() can be static
> 
> TO: "David S. Miller" 
> CC: netdev@vger.kernel.org
> CC: Jon Maloy 
> CC: Ying Xue 
> CC: tipc-discuss...@lists.sourceforge.net
> CC: linux-ker...@vger.kernel.org
> 
> 
> Signed-off-by: Fengguang Wu 
> ---
>  link.c |8 
>  node.c |2 +-
>  2 files changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/net/tipc/link.c b/net/tipc/link.c
> index 4449fa0..9efbdbd 100644
> --- a/net/tipc/link.c
> +++ b/net/tipc/link.c
> @@ -163,12 +163,12 @@ bool tipc_link_is_blocked(struct tipc_link *l)
>   return l->state & (LINK_RESETTING | LINK_PEER_RESET |
>    LINK_FAILINGOVER);
>  }
> 
> -bool link_is_bc_sndlink(struct tipc_link *l)
> +static bool link_is_bc_sndlink(struct tipc_link *l)
>  {
>   return !l->bc_sndlink;
>  }
> 
> -bool link_is_bc_rcvlink(struct tipc_link *l)
> +static bool link_is_bc_rcvlink(struct tipc_link *l)
>  {
>   return ((l->bc_rcvlink == l) && !link_is_bc_sndlink(l));
>  }
> @@ -1364,8 +1364,8 @@ static bool tipc_link_build_bc_proto_msg(struct tipc_link *l, bool bcast,
>   * Give a newly added peer node the sequence number where it should
>   * start receiving and acking broadcast packets.
>   */
> -void tipc_link_build_bc_init_msg(struct tipc_link *l,
> -  struct sk_buff_head *xmitq)
> +static void tipc_link_build_bc_init_msg(struct tipc_link *l,
> + struct sk_buff_head *xmitq)
>  {
>   struct sk_buff_head list;
> 
> diff --git a/net/tipc/node.c b/net/tipc/node.c
> index 7493506..20cddec 100644
> --- a/net/tipc/node.c
> +++ b/net/tipc/node.c
> @@ -1083,7 +1083,7 @@ int tipc_node_xmit_skb(struct net *net, struct sk_buff *skb, u32 dnode,
>   *
>   * Invoked with no locks held.
>   */
> -void tipc_node_bc_rcv(struct net *net, struct sk_buff *skb, int bearer_id)
> +static void tipc_node_bc_rcv(struct net *net, struct sk_buff *skb, int bearer_id)
>  {
>   int rc;
>   struct sk_buff_head xmitq;


Re: openvswitch: net --> net-next merge

2015-10-24 Thread Pravin Shelar
On Sat, Oct 24, 2015 at 6:55 AM, David Miller  wrote:
>
> I needed to do a merge of 'net' into 'net-next' in order to
> facilitate a set of tipc patches that I wanted to apply to
> 'net-next'.
>
> There were several openvswitch merge conflicts, mostly to do with the
> egress tunnel info bug fix conflicting with the simplification of the
> vport ->send() method.
>
> I did my best to resolve the conflicts, but if someone would double
> check my work I would really appreciate it.
>

The resolved conflicts look good. There is one minor issue: an unused
function (ovs_tunnel_route_lookup) was not removed. I will send out a
patch to fix it.


Re: [PATCH] ipv6 route: Aggregate table getting code

2015-10-24 Thread Masashi Honma

On 2015/10/19 15:09, David Miller wrote:

This is not correct.

The whole point of the test is so that the kernel log message
warning for failing to provide NLM_F_CREATE can be printed
in precisely the correct conditions.


Thanks. Now I understand the importance of the warning.

However, fib6_get_table() is called twice just to show the warning,
so I made another patch that avoids the extra call.




[PATCH] ipv6 route: Use flag instead of calling fib6_get_table() twice

2015-10-24 Thread Masashi Honma
fib6_get_table() is called twice just to decide whether to show the
warning. This patch avoids the redundant call by having
fib6_new_table() report whether the table already existed.

Signed-off-by: Masashi Honma 
---
 include/net/ip6_fib.h |  2 +-
 net/ipv6/fib6_rules.c |  3 ++-
 net/ipv6/ip6_fib.c| 10 +++---
 net/ipv6/route.c  | 13 -
 4 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h
index aaf9700..d6c5dff 100644
--- a/include/net/ip6_fib.h
+++ b/include/net/ip6_fib.h
@@ -256,7 +256,7 @@ typedef struct rt6_info *(*pol_lookup_t)(struct net *,
  */
 
 struct fib6_table *fib6_get_table(struct net *net, u32 id);
-struct fib6_table *fib6_new_table(struct net *net, u32 id);
+struct fib6_table *fib6_new_table(struct net *net, u32 id, int *exist);
 struct dst_entry *fib6_rule_lookup(struct net *net, struct flowi6 *fl6,
   int flags, pol_lookup_t lookup);
 
diff --git a/net/ipv6/fib6_rules.c b/net/ipv6/fib6_rules.c
index 9f777ec..7512941 100644
--- a/net/ipv6/fib6_rules.c
+++ b/net/ipv6/fib6_rules.c
@@ -187,12 +187,13 @@ static int fib6_rule_configure(struct fib_rule *rule, struct sk_buff *skb,
int err = -EINVAL;
struct net *net = sock_net(skb->sk);
struct fib6_rule *rule6 = (struct fib6_rule *) rule;
+   int exist;
 
if (rule->action == FR_ACT_TO_TBL) {
if (rule->table == RT6_TABLE_UNSPEC)
goto errout;
 
-   if (fib6_new_table(net, rule->table) == NULL) {
+   if (fib6_new_table(net, rule->table, &exist) == NULL) {
err = -ENOBUFS;
goto errout;
}
diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 7d2e002..abf65ef 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -226,16 +226,19 @@ static struct fib6_table *fib6_alloc_table(struct net *net, u32 id)
return table;
 }
 
-struct fib6_table *fib6_new_table(struct net *net, u32 id)
+struct fib6_table *fib6_new_table(struct net *net, u32 id, int *exist)
 {
struct fib6_table *tb;
 
if (id == 0)
id = RT6_TABLE_MAIN;
tb = fib6_get_table(net, id);
-   if (tb)
+   if (tb) {
+   *exist = 1;
return tb;
+   }
 
+   *exist = 0;
tb = fib6_alloc_table(net, id);
if (tb)
fib6_link_table(net, tb);
@@ -272,8 +275,9 @@ static void __net_init fib6_tables_init(struct net *net)
 }
 #else
 
-struct fib6_table *fib6_new_table(struct net *net, u32 id)
+struct fib6_table *fib6_new_table(struct net *net, u32 id, int *exist)
 {
+   *exist = 1;
return fib6_get_table(net, id);
 }
 
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index cb32ce2..7c4e0c2 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1757,6 +1757,7 @@ int ip6_route_info_create(struct fib6_config *cfg, struct rt6_info **rt_ret)
struct inet6_dev *idev = NULL;
struct fib6_table *table;
int addr_type;
+   int exist;
 
if (cfg->fc_dst_len > 128 || cfg->fc_src_len > 128)
return -EINVAL;
@@ -1778,16 +1779,10 @@ int ip6_route_info_create(struct fib6_config *cfg, struct rt6_info **rt_ret)
cfg->fc_metric = IP6_RT_PRIO_USER;
 
err = -ENOBUFS;
+   table = fib6_new_table(net, cfg->fc_table, &exist);
if (cfg->fc_nlinfo.nlh &&
-   !(cfg->fc_nlinfo.nlh->nlmsg_flags & NLM_F_CREATE)) {
-   table = fib6_get_table(net, cfg->fc_table);
-   if (!table) {
-   pr_warn("NLM_F_CREATE should be specified when creating new route\n");
-   table = fib6_new_table(net, cfg->fc_table);
-   }
-   } else {
-   table = fib6_new_table(net, cfg->fc_table);
-   }
+   !(cfg->fc_nlinfo.nlh->nlmsg_flags & NLM_F_CREATE) && !exist)
+   pr_warn("NLM_F_CREATE should be specified when creating new route\n");
 
if (!table)
goto out;
-- 
1.9.1
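The idea of this patch, a get-or-create call that also reports which path it took so the caller needs no second lookup, can be sketched in user space as below. This is an illustration under assumptions: struct table, get_table(), new_table() and should_warn() are hypothetical, and a fixed-size array replaces the kernel's per-netns table hash.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_TABLES 8	/* hypothetical fixed-size registry */

struct table {
	uint32_t id;
	bool used;
};

static struct table tables[MAX_TABLES];

static struct table *get_table(uint32_t id)
{
	for (size_t i = 0; i < MAX_TABLES; i++)
		if (tables[i].used && tables[i].id == id)
			return &tables[i];
	return NULL;
}

/* Get-or-create in one call; *exist tells the caller which path ran. */
static struct table *new_table(uint32_t id, int *exist)
{
	struct table *tb = get_table(id);

	if (tb) {
		*exist = 1;
		return tb;
	}
	*exist = 0;
	for (size_t i = 0; i < MAX_TABLES; i++) {
		if (!tables[i].used) {
			tables[i].used = true;
			tables[i].id = id;
			return &tables[i];
		}
	}
	return NULL;	/* analogue of an allocation failure */
}

/* Warn only when the request lacked NLM_F_CREATE and no table existed. */
static bool should_warn(bool has_create_flag, int exist)
{
	return !has_create_flag && !exist;
}
```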



Re: CONFIG_XPS depends on L1_CACHE_BYTES being greater than sizeof(struct xps_map)

2015-10-24 Thread Alexander Duyck

On 10/24/2015 07:43 AM, Helge Deller wrote:

* Alexander Duyck :

On 10/23/2015 03:17 PM, Helge Deller wrote:

On 24.10.2015 00:00, Alexander Duyck wrote:

On 10/23/2015 02:08 PM, Helge Deller wrote:

* Eric Dumazet :

On Fri, 2015-10-23 at 21:25 +0200, Helge Deller wrote:


Then, how about simply changing it to twice of L1_CACHE_BYTES ?

#define XPS_MIN_MAP_ALLOC ((L1_CACHE_BYTES * 2 - sizeof(struct xps_map)) / sizeof(u16))



Seems good to me.


Great!

Can you then maybe give me an Acked-by or signed-off for the patch below?
It also adds a compile-time check to ensure that XPS_MIN_MAP_ALLOC
never evaluates to zero on any architecture; otherwise no queues would
be allocated.

In addition, I would like to push it for v4.3 through my parisc tree
(after keeping it in for-next for 1-2 days), together with the patch
which reduces L1_CACHE_BYTES to 16 on parisc.
Would that be OK too?

Thanks!
Helge


[PATCH] net/xps: Increase initial number of xps queues

Increase the number of initially allocated xps queues, so that the initial
record is twice L1_CACHE_BYTES in size.

This change is needed to cope with architectures where L1_CACHE_BYTES is
defined as 16 bytes or less.

Signed-off-by: Helge Deller 

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 2d15e38..d152788 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -718,7 +718,7 @@ struct xps_map {
   u16 queues[0];
   };
   #define XPS_MAP_SIZE(_num) (sizeof(struct xps_map) + ((_num) * sizeof(u16)))
-#define XPS_MIN_MAP_ALLOC ((L1_CACHE_BYTES - sizeof(struct xps_map))\
+#define XPS_MIN_MAP_ALLOC ((L1_CACHE_BYTES * 2 - sizeof(struct xps_map)) \
   / sizeof(u16))

   /*
diff --git a/net/core/dev.c b/net/core/dev.c
index 6bb6470..f6d6dd1 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1972,6 +1972,8 @@ static struct xps_map *expand_xps_map(struct xps_map *map,
   int alloc_len = XPS_MIN_MAP_ALLOC;
   int i, pos;

+BUILD_BUG_ON(XPS_MIN_MAP_ALLOC == 0);
+
   for (pos = 0; map && pos < map->len; pos++) {
   if (map->queues[pos] != index)
   continue;




Rather than leaving a potential bug, you could probably rewrite the macro so
that it will give you at least 1.

All you need to do is something like the following
#define XPS_MIN_MAP_ALLOC \
 ((L1_CACHE_ALIGN(offsetof(struct xps_map, queues[1])) - \
   sizeof(struct xps_map)) / sizeof(u16))

That should give you at least an XPS_MIN_MAP_ALLOC of 1.


Yes, good idea!

What makes me wonder, though (because I have no idea about the XPS code/layer):
How likely is it that more than one queue (e.g. a minimum of "X") is needed?
E.g. if a typical system needs at least 3 queues, doesn't it make sense to
allocate at least 3 initially by using queues[3] in your proposed patch above?
What would "X" be then?


The question I would have is in how many cases it is likely that somebody
would enable this feature and point a given CPU at more than one queue.  I
know the Intel drivers that make use of XPS tend to do a 1:1 mapping for
their ATR feature.  I would think if anything most CPUs would probably be
mapped many:1, but you probably won't have all that many cases where it is
1:many or many:many.

I'd say starting with at least 1 should be fine.  Worst case scenario is we
have to make a couple more calls to expand_xps_map which will likely occur
as a slow path and infrequent event anyway.


Ok, can I get then the signed-off or acked-by from you for this patch?

Thanks,
Helge


[PATCH] net/xps: Fix calculation of initial number of xps queues

The existing code breaks on architectures where the L1 cache size
(L1_CACHE_BYTES) is smaller than or equal to the size of struct xps_map.

The new code ensures that we get at least one initial xps queue, or
even more as long as they fit into the next multiple of L1_CACHE_BYTES.

Signed-off-by: Helge Deller 

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 2d15e38..2212c82 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -718,8 +718,8 @@ struct xps_map {
u16 queues[0];
  };
  #define XPS_MAP_SIZE(_num) (sizeof(struct xps_map) + ((_num) * sizeof(u16)))
-#define XPS_MIN_MAP_ALLOC ((L1_CACHE_BYTES - sizeof(struct xps_map))   \
-/ sizeof(u16))
+#define XPS_MIN_MAP_ALLOC ((L1_CACHE_ALIGN(offsetof(struct xps_map, queues[1])) \
+   - sizeof(struct xps_map)) / sizeof(u16))

  /*
   * This structure holds all XPS maps for device.  Maps are indexed by CPU.



This looks good to me.

Acked-by: Alexander Duyck 
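To see why the old macro fails and the new one cannot, the two formulas can be evaluated in user space as plain functions of the cache-line size and the header size. This is a sketch: the helper names are hypothetical, queues[] is assumed to start exactly at sizeof(struct xps_map), and the 16-byte header used in the checks assumes a 32-bit build where struct xps_map holds two 4-byte counters and an 8-byte rcu_head.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round x up to a multiple of align (a power of two), like L1_CACHE_ALIGN(). */
static size_t align_up(size_t x, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (x + align - 1) & ~(align - 1);
}

/* Old formula: (L1_CACHE_BYTES - sizeof(struct xps_map)) / sizeof(u16).
 * The kernel's subtraction is unsigned; here we clamp to 0 instead of
 * modelling wrap-around when the header is larger than a cache line. */
static size_t old_min_map_alloc(size_t cache_bytes, size_t hdr_size)
{
	if (hdr_size >= cache_bytes)
		return 0;
	return (cache_bytes - hdr_size) / sizeof(uint16_t);
}

/* New formula: (L1_CACHE_ALIGN(offsetof(struct xps_map, queues[1]))
 *               - sizeof(struct xps_map)) / sizeof(u16) */
static size_t new_min_map_alloc(size_t cache_bytes, size_t hdr_size)
{
	size_t end_of_first_queue = hdr_size + sizeof(uint16_t);

	return (align_up(end_of_first_queue, cache_bytes) - hdr_size) /
	       sizeof(uint16_t);
}
```

Because the new formula rounds up the end of the first queue entry, its result is always at least 1, whatever the cache-line size; with 16-byte lines and a 16-byte header it yields 8 where the old one yields 0.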


Re: [PATCH RFC net-next 1/2] tcp: Add DPIFL thin stream detection mechanism

2015-10-24 Thread Bendik Rønning Opstad
On Friday, October 23, 2015 02:44:14 PM Eric Dumazet wrote:
> On Fri, 2015-10-23 at 22:50 +0200, Bendik Rønning Opstad wrote:
> 
> >  
> > +/**
> > + * tcp_stream_is_thin_dpifl() - Tests if the stream is thin based on dynamic PIF
> > + *  limit
> > + * @tp: the tcp_sock struct
> > + *
> > + * Return: true if current packets in flight (PIF) count is lower than
> > + * the dynamic PIF limit, else false
> > + */
> > +static inline bool tcp_stream_is_thin_dpifl(const struct tcp_sock *tp)
> > +{
> > +   u64 dpif_lim = tp->srtt_us >> 3;
> > +   /* Div by is_thin_min_itt_lim, the minimum allowed ITT
> > +* (Inter-transmission time) in usecs.
> > +*/
> > +   do_div(dpif_lim, tp->thin_dpifl_itt_lower_bound);
> > +   return tcp_packets_in_flight(tp) < dpif_lim;
> > +}
> > +
> This is very strange :
> 
> You are using a do_div() while both operands are 32bits.  A regular
> divide would be ok :
> 
> u32 dpif_lim = (tp->srtt_us >> 3) / tp->thin_dpifl_itt_lower_bound;
> 
> But then, you can avoid the divide by using a multiply, less expensive :
> 
> return (u64)tcp_packets_in_flight(tp) * tp->thin_dpifl_itt_lower_bound <
>        (tp->srtt_us >> 3);
> 

You are of course correct. Will fix this and use multiply. Thanks.
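Eric's suggestion replaces the division with a widening multiply. The sketch below, with hypothetical function names and plain parameters in place of struct tcp_sock fields (itt_lower_bound is assumed non-zero), puts both forms side by side. They are not strictly identical: when srtt_us >> 3 is not an exact multiple of the ITT bound, the divide form compares against the truncated quotient while the multiply form compares against the exact ratio, so the multiply form reports "thin" for one extra packets-in-flight value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Divide form: packets_in_flight < (srtt_us >> 3) / itt_lower_bound */
static bool is_thin_div(uint32_t packets_in_flight, uint32_t srtt_us,
			uint32_t itt_lower_bound)
{
	uint32_t dpif_lim = (srtt_us >> 3) / itt_lower_bound;

	return packets_in_flight < dpif_lim;
}

/* Multiply form: widen to 64 bits so the product cannot overflow. */
static bool is_thin_mul(uint32_t packets_in_flight, uint32_t srtt_us,
			uint32_t itt_lower_bound)
{
	return (uint64_t)packets_in_flight * itt_lower_bound < (srtt_us >> 3);
}
```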



Re: [PATCHv2 net 1/3] openvswitch: Fix double-free on ip_defrag() errors

2015-10-24 Thread Florian Westphal
Joe Stringer  wrote:
>  err:
> + if (err)
> + kfree_skb(skb);
>   skb_push(skb, nh_ofs);

That looks... wrong :-}


Re: [PATCH net-next 0/8] xen-netback/core: packet hashing

2015-10-24 Thread David Miller
From: Paul Durrant 
Date: Wed, 21 Oct 2015 11:36:17 +0100

> This series adds xen-netback support for hash negotiation with a frontend
> driver, and an implementation of toeplitz hashing as the initial negotiable
> algorithm.

Ping, I want to see some review from some other xen networking folks.

Thanks.


Re: pull request: bluetooth-next 2015-10-22

2015-10-24 Thread David Miller
From: Johan Hedberg 
Date: Thu, 22 Oct 2015 13:54:55 +0300

> Here's probably the last bluetooth-next pull request for 4.4. Among
> several other changes it contains the rest of the fixes & cleanups from
> the Bluetooth UnplugFest (that didn't need to be hurried to 4.3).
> 
>  - Refactoring & cleanups to 6lowpan code
>  - New USB ids for two Atheros controllers and BCM43142A0 from Broadcom
>  - Fix (quirk) for broken Broadcom BCM2045 controllers
>  - Support for latest Apple controllers
>  - Improvements to the vendor diagnostic message support
> 
> Please let me know if there are any issues pulling. Thanks.

Pulled, thanks Johan.


Re: [Intel-wired-lan] [PATCH] e1000e:Fix incorrect assumption that the function e1000e_up always runs successfully in e1000_change_mtu

2015-10-24 Thread Alexander Duyck

On 10/23/2015 09:09 PM, Nicholas Krause wrote:

This fixes the function e1000_change_mtu to properly check the error
code returned by e1000e_up, as this function can fail, and to return
that error code to the caller of e1000_change_mtu to signal that an
error has occurred when calling this particular function.


The function e1000e_up always returns 0.  It should probably be switched 
to a void instead of an int.  You could probably go through and drop a 
bunch of dead code that is checking for the result of that function as well.



Signed-off-by: Nicholas Krause 
---
  drivers/net/ethernet/intel/e1000e/netdev.c | 7 +--
  1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
index faf4b3f..aebccb1 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -5931,6 +5931,7 @@ static int e1000_change_mtu(struct net_device *netdev, int new_mtu)
  {
struct e1000_adapter *adapter = netdev_priv(netdev);
int max_frame = new_mtu + VLAN_ETH_HLEN + ETH_FCS_LEN;
+   int err = 0;
  
  	/* Jumbo frame support */

if ((max_frame > (VLAN_ETH_FRAME_LEN + ETH_FCS_LEN)) &&
@@ -5984,7 +5985,9 @@ static int e1000_change_mtu(struct net_device *netdev, int new_mtu)
adapter->rx_buffer_len = VLAN_ETH_FRAME_LEN + ETH_FCS_LEN;
  
  	if (netif_running(netdev))

-   e1000e_up(adapter);
+   err = e1000e_up(adapter);
+   if (err)
+   e_err("Failed to successfully bring up this adapter\n");
else
e1000e_reset(adapter);
  
@@ -5992,7 +5995,7 @@ static int e1000_change_mtu(struct net_device *netdev, int new_mtu)
  
  	clear_bit(__E1000_RESETTING, &adapter->state);
  
-	return 0;

+   return err;
  }
  
  static int e1000_mii_ioctl(struct net_device *netdev, struct ifreq *ifr,




Re: [PATCH net-next 0/4] Automatic adjustment of max frame size

2015-10-24 Thread Toshiaki Makita
David,

I found my patch set is marked with Changes Requested, but I haven't
seen any feedback.

Could you give me your feedback?

Thanks,
Toshiaki Makita


Re: [PATCH net-next v7 01/10] qed: Add module with basic common support

2015-10-24 Thread David Miller
From: Yuval Mintz 
Date: Thu, 22 Oct 2015 08:06:48 +0300

> diff --git a/drivers/net/ethernet/qlogic/qed/Makefile b/drivers/net/ethernet/qlogic/qed/Makefile
> new file mode 100644
> index 000..5bbe0c7
> --- /dev/null
> +++ b/drivers/net/ethernet/qlogic/qed/Makefile
> @@ -0,0 +1,3 @@
> +obj-$(CONFIG_QED) := qed.o
> +
> +qed-y := qed_cxt.o qed_dev.o qed_hw.o qed_init_fw_funcs.o qed_init_ops.o qed_int.o qed_main.o qed_mcp.o qed_sp_commands.o qed_spq.o

Please break this qed-y assignment into multiple lines using "\".

> +#ifndef _QED_H
> +#define _QED_H
> +#include 

Please put an empty line after "#define _QED_H"

> +/* helpers */
> +static inline u32 DB_ADDR(u32 cid, u32 DEMS)
> +{
> + u32 db_addr = FIELD_VALUE(DB_LEGACY_ADDR_DEMS, DEMS) |
> +   FIELD_VALUE(DB_LEGACY_ADDR_ICID, cid);
> +
> + return db_addr;
> +}

Generally speaking, inline functions should be named with lowercase
letters, CPP macros that act like functions can be uppercase.

> +#define MAP_WORD_SIZE sizeof(unsigned long)
> +#define BITS_PER_MAP_WORD (MAP_WORD_SIZE * 8)

Please don't reimplement something as fundamental as a bitmap of words; use
the facilities and helpers in linux/bitmap.h instead.

> +/* counts the iids for the CDU/CDUC ILT client configuration */
> +struct qed_cdu_iids {
> + u32 pf_cids;
> +};

This one member structure is excessive, it's only ever instantiated
as a local variable in this file, so just use "u32" directly.

> + p_info->p_cxt = (u8 *)p_mngr->ilt_shadow[line].p_virt +
> +   p_info->iid % cxts_per_p * conn_cxt_size;

p_virt is a void pointer, so you don't need to cast it for
arithmetic on it to work properly.

> + if (qed_mcp_is_init(p_hwfn)) {
> + ether_addr_copy(p_hwfn->hw_info.hw_mac_addr,
> + p_hwfn->mcp_info->func_info.mac);
> + } else {
> + static u8 mcp_hw_mac[6] = { 0, 2, 3, 4, 5, 6 };
> +
> + ether_addr_copy(p_hwfn->hw_info.hw_mac_addr, mcp_hw_mac);
> + p_hwfn->hw_info.hw_mac_addr[5] = p_hwfn->abs_pf_id;
> + }

In this else branch you should probably use a random ethernet address.
If that is not appropriate here, the situation is unexpected and requires a
detailed comment explaining why this fixed ethernet address is being
used.

> +static int qed_get_dev_info(struct qed_dev *cdev)
 ...
> + return 0;

This never returns anything other than zero, make it return void.

> +int qed_hw_prepare(struct qed_dev*cdev,
> +int  personality)

Please do not declare function variables with tab characters
separating the type from the variable name.  Likewise when these
functions are externally declared in header files.

> + void __iomem *p_regview, *p_doorbell;
> +
> + p_regview = (void __iomem *)
> + ((u8 __iomem *)cdev->regview + i *

Again, void pointers do not need to be cast to "u8 *" in order
to perform byte arithmetic on them.


Re: net: fec: stable queue request

2015-10-24 Thread David Miller
From: Lucas Stach 
Date: Wed, 21 Oct 2015 15:20:00 +0200

> can you please add
> 
> b0c6ce24911fcb64715de9569f0f7b4f54d1d045
> net: fec: Remove unneeded use of IS_ERR_VALUE() macro
> 
> 42ea4457aea7aaeddf0c0b06724f297608f5e9d2
> net: fec: normalize return value of pm_runtime_get_sync() in MDIO write
> 
> to your stable 4.2+ queue?

Done.


Re: [PATCH RFC net-next 0/2] tcp: Redundant Data Bundling (RDB)

2015-10-24 Thread Yuchung Cheng
On Fri, Oct 23, 2015 at 1:50 PM, Bendik Rønning Opstad
 wrote:
>
> This is a request for comments.
>
> Redundant Data Bundling (RDB) is a mechanism for TCP aimed at reducing
> the latency for applications sending time-dependent data.
> Latency-sensitive applications or services, such as online games and
> remote desktop, produce traffic with thin-stream characteristics,
> characterized by small packets and a relatively high inter-transmission time (ITT). By bundling
> already sent data in packets with new data, RDB alleviates head-of-line
> blocking by reducing the need to retransmit data segments when packets
> are lost. RDB is a continuation of the work on latency improvements for
> TCP in Linux, previously resulting in two thin-stream mechanisms in the
> Linux kernel
> (https://github.com/torvalds/linux/blob/master/Documentation/networking/tcp-thin.txt).
>
> The RDB implementation has been thoroughly tested, and shows
> significant latency reductions when packet loss occurs[1]. The tests
> show that, by imposing restrictions on the bundling rate, it can be made
> not to negatively affect competing traffic in an unfair manner.
>
> Note: The current patch set depends on a recently submitted patch for
> tcp_skb_cb (tcp: refactor struct tcp_skb_cb: http://patchwork.ozlabs.org/patch/510674)
>
> These patches have been tested with a set of packetdrill scripts located at
> https://github.com/bendikro/packetdrill/tree/master/gtests/net/packetdrill/tests/linux/rdb
> (The tests require patching packetdrill with a new socket option:
> https://github.com/bendikro/packetdrill/commit/9916b6c53e33dd04329d29b7d8baf703b2c2ac1b)
>
> Detailed info about the RDB mechanism can be found at
> http://mlab.no/blog/2015/10/redundant-data-bundling-in-tcp, as well as in the paper

What's the difference between RDB and TCP repacketization
(http://flylib.com/books/en/3.223.1.226/1/) ?

Reading the blog page, I am concerned about the amount of
change (especially on the fast path) just to bundle new writes during timeout and
retransmit, for a specific type of application. Why not just send X
packets with total bytes < MSS on timeout?
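The bundling idea under discussion can be illustrated with a small, hypothetical helper: walk back from the newest unacknowledged segment and prepend earlier payload to the new data for as long as the combined packet still fits in one MSS, keeping the bytes contiguous in sequence space. This is only a sketch of the general concept, not the logic in tcp_rdb.c, which according to the cover letter also restricts the bundling rate and relies on thin-stream detection; segment sizes are plain byte counts here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: how many of the most recent unacknowledged segments
 * (unacked[] ordered oldest..newest) can ride along with new_len bytes of
 * new data without exceeding mss. *payload receives the total bytes sent. */
static size_t rdb_bundle_count(const size_t *unacked, size_t n,
			       size_t new_len, size_t mss, size_t *payload)
{
	size_t total = new_len;
	size_t count = 0;

	/* Walk backwards from the newest segment so the bundle stays
	 * contiguous with the new data. */
	while (count < n && total + unacked[n - 1 - count] <= mss) {
		total += unacked[n - 1 - count];
		count++;
	}
	*payload = total;
	return count;
}
```

With small thin-stream writes several earlier segments fit under one MSS, which is why a single lost packet can often be recovered from the next transmission instead of waiting for a retransmission.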

> "Latency and Fairness Trade-Off for Thin Streams using Redundant Data
> Bundling in TCP"[2].
>
> [1] http://home.ifi.uio.no/paalh/students/BendikOpstad.pdf
> [2] http://home.ifi.uio.no/bendiko/rdb_fairness_tradeoff.pdf
>
>
> Bendik Rønning Opstad (2):
>   tcp: Add DPIFL thin stream detection mechanism
>   tcp: Add Redundant Data Bundling (RDB)
>
>  Documentation/networking/ip-sysctl.txt |  23 +++
>  include/linux/skbuff.h |   1 +
>  include/linux/tcp.h|   9 +-
>  include/net/tcp.h  |  34 
>  include/uapi/linux/tcp.h   |   1 +
>  net/core/skbuff.c  |   3 +-
>  net/ipv4/Makefile  |   3 +-
>  net/ipv4/sysctl_net_ipv4.c |  35 
>  net/ipv4/tcp.c |  19 ++-
>  net/ipv4/tcp_input.c   |   3 +
>  net/ipv4/tcp_output.c  |  11 +-
>  net/ipv4/tcp_rdb.c | 281 +
>  12 files changed, 415 insertions(+), 8 deletions(-)
>  create mode 100644 net/ipv4/tcp_rdb.c
>
> --
> 1.9.1
>


Re: [PATCH RTF] sh_eth: fix kernel oops in skb_put()

2015-10-24 Thread Sergei Shtylyov

Hello.

On 10/24/2015 2:09 AM, Sergei Shtylyov wrote:


In a low memory situation the following kernel oops occurs:

Unable to handle kernel NULL pointer dereference at virtual address 0050
pgd = 8490c000
[0050] *pgd=4651e831, *pte=, *ppte=
Internal error: Oops: 17 [#1] PREEMPT ARM
Modules linked in:
CPU: 0Not tainted  (3.4-at16 #9)
PC is at skb_put+0x10/0x98
LR is at sh_eth_poll+0x2c8/0xa10
pc : [<8035f780>]lr : [<8028bf50>]psr: 6113
sp : 84eb1a90  ip : 84eb1ac8  fp : 84eb1ac4
r10: 003f  r9 : 05ea  r8 : 
r7 :   r6 : 940453b0  r5 : 0003  r4 : 9381b180
r3 :   r2 :   r1 : 05ea  r0 : 
Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
Control: 10c53c7d  Table: 4248c059  DAC: 0015
Process klogd (pid: 2046, stack limit = 0x84eb02e8)
[...]

This is because netdev_alloc_skb() fails and 'mdp->rx_skbuff[entry]' is left
NULL but sh_eth_rx() later uses it without checking. Add such a check.

Reported-by: Yasushi SHOJI 
Signed-off-by: Sergei Shtylyov 

---
This patch is against DaveM's 'net.git' repo.

drivers/net/ethernet/renesas/sh_eth.c |3 +++
  1 file changed, 3 insertions(+)

Index: net/drivers/net/ethernet/renesas/sh_eth.c
===
--- net.orig/drivers/net/ethernet/renesas/sh_eth.c
+++ net/drivers/net/ethernet/renesas/sh_eth.c
@@ -1502,6 +1502,8 @@ static int sh_eth_rx(struct net_device *
phys_to_virt(ALIGN(rxdesc->addr, 4)),
pkt_len + 2);
skb = mdp->rx_skbuff[entry];
+   if (!skb)
+   goto skip;
mdp->rx_skbuff[entry] = NULL;
if (mdp->cd->rpadir)
skb_reserve(skb, NET_IP_ALIGN);
@@ -1516,6 +1518,7 @@ static int sh_eth_rx(struct net_device *
if (desc_status & RD_RFS8)
ndev->stats.multicast++;
}
+skip:
entry = (++mdp->cur_rx) % mdp->num_rx_ring;
rxdesc = &mdp->rx_ring[entry];
}


   In fact, it could be done without goto/label. I'll recast.
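Sergei's remark that the check can be written without a goto/label can be illustrated with a small user-space sketch. It is neither sh_eth code nor his recast patch: `struct rx_ring`, `ring_poll()` and `RING_SIZE` are hypothetical stand-ins for the driver's ring, and the loop runs for a fixed budget instead of checking descriptor ownership. The delivery path sits inside `if (buf)`, so the index advance at the bottom of the loop also runs for empty slots.

```c
#include <stddef.h>

#define RING_SIZE 4	/* hypothetical ring size */

/* Hypothetical stand-in for an RX ring: a slot holds a buffer, or NULL
 * if refilling it failed under memory pressure. */
struct rx_ring {
	const char *slot[RING_SIZE];
	unsigned int cur;	/* free-running index, like mdp->cur_rx */
};

/* Handle 'budget' ring entries and return how many buffers were delivered.
 * The delivery path is wrapped in if (buf) instead of jumping to a label,
 * so the index advance at the bottom runs for empty slots as well. */
static int ring_poll(struct rx_ring *ring, int budget)
{
	int delivered = 0;

	while (budget-- > 0) {
		unsigned int entry = ring->cur % RING_SIZE;
		const char *buf = ring->slot[entry];

		if (buf) {
			ring->slot[entry] = NULL;	/* take ownership */
			delivered++;	/* stands in for passing it up the stack */
		}
		ring->cur++;	/* always move on to the next descriptor */
	}
	return delivered;
}
```

Wrapping the body in an `if` keeps a single path to the end of each iteration; the cost is one more level of indentation in an already deeply nested receive loop.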

MBR, Sergei



Re: [Intel-wired-lan] [PATCH] fm10k:Fix error handling in the function fm10k_probe

2015-10-24 Thread Alexander Duyck

On 10/16/2015 12:42 PM, Nicholas Krause wrote:

This fixes error handling in the function fm10k_probe so that, if the
call to fm10k_iov_configure fails and returns an error code, we jump to
a new goto label, err_iov_configure, which cleans up the resources
allocated between this label and the previous one before finally
returning the error code to the caller of fm10k_probe.

Signed-off-by: Nicholas Krause 
---
  drivers/net/ethernet/intel/fm10k/fm10k_pci.c | 8 ++--
  1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
index ce53ff2..3d7374e 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
@@ -1844,13 +1844,17 @@ static int fm10k_probe(struct pci_dev *pdev,
 	fm10k_slot_warn(interface);
 
 	/* enable SR-IOV after registering netdev to enforce PF/VF ordering */
-	fm10k_iov_configure(pdev, 0);
+	err = fm10k_iov_configure(pdev, 0);
+	if (err)
+		goto err_iov_configure;
 
 	/* clear the service task disable bit to allow service task to start */
 	clear_bit(__FM10K_SERVICE_DISABLE, &interface->state);
 
 	return 0;
-
+err_iov_configure:
+	fm10k_ptp_unregister(interface);
+	unregister_netdev(netdev);
 err_register:
 	fm10k_mbx_free_irq(interface);
 err_mbx_interrupt:


Failing to enable SR-IOV is not grounds for not bringing up the
interface.  The NIC is still usable, it just cannot do SR-IOV.


The only reason fm10k_iov_configure has a return value is because it is
also passed through the PCI interface to allow the sysfs control.  If it
returns an error there, then it is much more important, as the user
actually requested it be enabled.


If you would like, you could display a message there, but this shouldn't
be a hard error.
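A minimal user-space sketch of what Alex suggests, assuming hypothetical stand-ins (`fake_register_netdev()`, `fake_iov_configure()`, `probe_sketch()`) rather than the real fm10k functions: an error from an essential step still aborts the probe, while an SR-IOV failure is only reported and the probe returns 0. In the driver itself the message would more likely go through `dev_warn()` than `fprintf()`.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-ins for two probe steps; these are not fm10k APIs. */
static int fake_register_netdev(void)
{
	return 0;	/* pretend netdev registration always succeeds */
}

static int fake_iov_configure(int should_fail)
{
	return should_fail ? -ENOMEM : 0;
}

/* Probe sketch: an essential step that fails aborts the probe, but an
 * SR-IOV failure is only reported, because the NIC is still usable
 * without virtual functions. Returns 0 on success, -errno on failure. */
static int probe_sketch(int iov_should_fail)
{
	int err;

	err = fake_register_netdev();
	if (err)
		return err;	/* essential step: propagate the error */

	err = fake_iov_configure(iov_should_fail);
	if (err)
		fprintf(stderr, "SR-IOV not enabled (err %d), continuing without VFs\n",
			err);

	return 0;	/* the interface comes up either way */
}
```

The sysfs path Alex mentions is different: there the user explicitly asked for SR-IOV, so returning the error to the caller is the right behavior.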


- Alex




Re: [PATCH net v3] openvswitch: Fix egress tunnel info.

2015-10-24 Thread Jiri Benc
On Fri, 23 Oct 2015 12:47:09 -0700, Pravin Shelar wrote:
> This is not complete code. I found couple of issues with it.

That's likely, I didn't test it :-)

> This code does not copy lwtunnel_state state into new dst.

Not sure what state you mean. All the relevant state should be in
struct ip_tunnel_info.

> And it is converting lwtunnel dst into metadata dst.

I did that intentionally. But your comment made me realize that I was
too much focused on the tunnel implementation side. This function,
should it be used outside of ovs, could be called earlier than the skb
is handed to the tunnel. Then we of course need a valid dst->output and
this approach (converting lwtunnel data to metadata_dst) would fail.
This is the same problem we have with IPv6 ndisc replies.

This needs more thinking and is out of scope of what you're trying to
solve. Sorry for the noise. But at least it made me realize that the
ndisc problem (that I still don't have a good solution for) is broader.

Thanks,

 Jiri

-- 
Jiri Benc


Re: [PATCH RFC net-next 0/2] tcp: Redundant Data Bundling (RDB)

2015-10-24 Thread Jonas Markussen


> On 24 Oct 2015, at 08:11, Yuchung Cheng  wrote:
> 
> On Fri, Oct 23, 2015 at 1:50 PM, Bendik Rønning Opstad
>  wrote:
>> 
>> This is a request for comments.
>> 
>> Redundant Data Bundling (RDB) is a mechanism for TCP aimed at reducing
>> the latency for applications sending time-dependent data.
>> Latency-sensitive applications or services, such as online games and
>> remote desktop, produce traffic with thin-stream characteristics,
>> characterized by small packets and a relatively high inter-transmission
>> time (ITT). By bundling already sent data in packets with new data, RDB
>> alleviates head-of-line blocking by reducing the need to retransmit data
>> segments when packets are lost. RDB is a continuation of the work on
>> latency improvements for
>> TCP in Linux, previously resulting in two thin-stream mechanisms in the
>> Linux kernel
>> (https://github.com/torvalds/linux/blob/master/Documentation/networking/tcp-thin.txt).
>> 
>> The RDB implementation has been thoroughly tested, and shows
>> significant latency reductions when packet loss occurs[1]. The tests
>> show that, by imposing restrictions on the bundling rate, RDB can avoid
>> affecting competing traffic unfairly.
>> 
>> Note: The current patch set depends on a recently submitted patch for
>> tcp_skb_cb (tcp: refactor struct tcp_skb_cb: 
>> http://patchwork.ozlabs.org/patch/510674)
>> 
>> These patches have been tested with a set of packetdrill scripts located at
>> https://github.com/bendikro/packetdrill/tree/master/gtests/net/packetdrill/tests/linux/rdb
>> (The tests require patching packetdrill with a new socket option:
>> https://github.com/bendikro/packetdrill/commit/9916b6c53e33dd04329d29b7d8baf703b2c2ac1b)
>> 
>> Detailed info about the RDB mechanism can be found at
>> http://mlab.no/blog/2015/10/redundant-data-bundling-in-tcp, as well as in 
>> the paper
> 
> What's the difference between RDB and TCP repacketization
> (http://flylib.com/books/en/3.223.1.226/1/) ?
> 
> Reading the blog page, I am concerned about the amount of
> change (esp. on the fast path) just to bundle new writes during timeout &
> retransmit, for a specific type of application. Why not just send X
> packets with total bytes < MSS on timeout?

Repacketization only happens on retransmission; RDB instead bundles
previously sent segments with the next “normal” transmission.

This lets the flow recover a lost segment before a retransmission is
triggered by an RTO or fast retransmit.
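To make the distinction concrete, here is a simplified user-space model of bundling, not the code in tcp_rdb.c: `struct sender`, `rdb_build()` and the 16-byte `MSS` are hypothetical, segments are assumed contiguous in sequence space, and the bundling-rate limits and DPIFL thin-stream detection from the patch set are ignored. On each new write, as many of the most recent unacknowledged segments as fit within the MSS are copied in front of the new data, and the packet's starting sequence number moves back to match.

```c
#include <stddef.h>
#include <string.h>

#define MSS 16		/* hypothetical tiny MSS, for illustration only */
#define MAX_SEGS 8

struct seg {
	unsigned int seq;	/* sequence number of the first byte */
	size_t len;
	const char *data;
};

/* Hypothetical sender state: segments already sent but not yet ACKed,
 * oldest first and contiguous in sequence space. */
struct sender {
	struct seg unacked[MAX_SEGS];
	int n_unacked;
};

/* Build the payload of the next "normal" transmission carrying newseg
 * (assumed to follow the last unacked segment, with len <= MSS).
 * Walking back from the newest unacked segment, bundle as many previously
 * sent segments as still fit within MSS in front of the new data. out must
 * hold at least MSS bytes. Returns the payload length and stores the
 * packet's starting sequence number in *start_seq. */
static size_t rdb_build(const struct sender *s, const struct seg *newseg,
			char *out, unsigned int *start_seq)
{
	size_t total = newseg->len;
	size_t off = 0;
	int first = s->n_unacked;	/* oldest segment to bundle */
	int i;

	while (first > 0 && total + s->unacked[first - 1].len <= MSS) {
		first--;
		total += s->unacked[first].len;
	}

	for (i = first; i < s->n_unacked; i++) {
		memcpy(out + off, s->unacked[i].data, s->unacked[i].len);
		off += s->unacked[i].len;
	}
	memcpy(out + off, newseg->data, newseg->len);
	off += newseg->len;

	*start_seq = first < s->n_unacked ? s->unacked[first].seq : newseg->seq;
	return off;
}
```

With unacked 4-byte segments at sequence 100, 104 and 108 and a new 6-byte write at 112, `rdb_build()` returns a 14-byte payload starting at 104. If the packet that first carried bytes 104-107 was lost but this one arrives, the receiver can fill the hole without waiting for an RTO or fast retransmit.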

>> "Latency and Fairness Trade-Off for Thin Streams using Redundant Data
>> Bundling in TCP"[2].
>> 
>> [1] http://home.ifi.uio.no/paalh/students/BendikOpstad.pdf
>> [2] http://home.ifi.uio.no/bendiko/rdb_fairness_tradeoff.pdf
>> 
>> 
>> Bendik Rønning Opstad (2):
>>  tcp: Add DPIFL thin stream detection mechanism
>>  tcp: Add Redundant Data Bundling (RDB)
>> 
>> Documentation/networking/ip-sysctl.txt |  23 +++
>> include/linux/skbuff.h                 |   1 +
>> include/linux/tcp.h                    |   9 +-
>> include/net/tcp.h                      |  34
>> include/uapi/linux/tcp.h               |   1 +
>> net/core/skbuff.c                      |   3 +-
>> net/ipv4/Makefile                      |   3 +-
>> net/ipv4/sysctl_net_ipv4.c             |  35
>> net/ipv4/tcp.c                         |  19 ++-
>> net/ipv4/tcp_input.c                   |   3 +
>> net/ipv4/tcp_output.c                  |  11 +-
>> net/ipv4/tcp_rdb.c                     | 281 +
>> 12 files changed, 415 insertions(+), 8 deletions(-)
>> create mode 100644 net/ipv4/tcp_rdb.c
>> 
>> --
>> 1.9.1


[PATCH net-next] ipv6: icmp: include addresses in debug messages

2015-10-24 Thread Bjørn Mork
Messages like "icmp6_send: no reply to icmp error" are close
to useless. Add the source and destination addresses to provide
some more clues.

Signed-off-by: Bjørn Mork 
---
I've had this lying around for much too long because I haven't been
convinced it's a good idea...  But keeping it around longer isn't going
to help, so it's better to leave that decision to you experts :)


Bjørn

 net/ipv6/icmp.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/net/ipv6/icmp.c b/net/ipv6/icmp.c
index efb1c00f2270..36c5a98b0472 100644
--- a/net/ipv6/icmp.c
+++ b/net/ipv6/icmp.c
@@ -453,7 +453,8 @@ static void icmp6_send(struct sk_buff *skb, u8 type, u8 code, __u32 info)
 *  and anycast addresses will be checked later.
 */
if ((addr_type == IPV6_ADDR_ANY) || (addr_type & IPV6_ADDR_MULTICAST)) {
-   net_dbg_ratelimited("icmp6_send: addr_any/mcast source\n");
+   net_dbg_ratelimited("icmp6_send: addr_any/mcast source [%pI6c > %pI6c]\n",
+   &hdr->saddr, &hdr->daddr);
return;
}
 
@@ -461,7 +462,8 @@ static void icmp6_send(struct sk_buff *skb, u8 type, u8 code, __u32 info)
 *  Never answer to a ICMP packet.
 */
if (is_ineligible(skb)) {
-   net_dbg_ratelimited("icmp6_send: no reply to icmp error\n");
+   net_dbg_ratelimited("icmp6_send: no reply to icmp error [%pI6c > %pI6c]\n",
+   &hdr->saddr, &hdr->daddr);
return;
}
 
@@ -513,7 +515,8 @@ static void icmp6_send(struct sk_buff *skb, u8 type, u8 code, __u32 info)
len = skb->len - msg.offset;
len = min_t(unsigned int, len, IPV6_MIN_MTU - sizeof(struct ipv6hdr) - sizeof(struct icmp6hdr));
if (len < 0) {
-   net_dbg_ratelimited("icmp: len problem\n");
+   net_dbg_ratelimited("icmp: len problem [%pI6c > %pI6c]\n",
+   &hdr->saddr, &hdr->daddr);
goto out_dst_release;
}
 
@@ -785,7 +788,8 @@ static int icmpv6_rcv(struct sk_buff *skb)
if (type & ICMPV6_INFOMSG_MASK)
break;
 
-   net_dbg_ratelimited("icmpv6: msg of unknown type\n");
+   net_dbg_ratelimited("icmpv6: msg of unknown type [%pI6c > %pI6c]\n",
+   saddr, daddr);
 
/*
 * error of unknown type.
-- 
2.1.4
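The `%pI6c` specifier used in the patch is the kernel's printk extension for printing an IPv6 address in compressed form. As a user-space analogue only (the helper name `format_addr_pair()` is hypothetical), the same "[src > dst]" suffix can be built with `inet_ntop()`, which on common libcs such as glibc also emits the compressed notation:

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdio.h>
#include <string.h>

/* Format "[saddr > daddr]" into buf, mirroring the suffix the patch adds
 * to the net_dbg_ratelimited() messages. Returns 0 on success, or -1 if
 * an address cannot be converted or the result does not fit in buf. */
static int format_addr_pair(const struct in6_addr *saddr,
			    const struct in6_addr *daddr,
			    char *buf, size_t buflen)
{
	char s[INET6_ADDRSTRLEN], d[INET6_ADDRSTRLEN];
	int n;

	if (!inet_ntop(AF_INET6, saddr, s, sizeof(s)) ||
	    !inet_ntop(AF_INET6, daddr, d, sizeof(d)))
		return -1;

	n = snprintf(buf, buflen, "[%s > %s]", s, d);
	if (n < 0 || (size_t)n >= buflen)
		return -1;	/* output truncated */
	return 0;
}
```

For example, source 2001:db8::1 and destination fe80::2 produce "[2001:db8::1 > fe80::2]", which is the kind of clue the debug messages previously lacked.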
