date:20160614

Re: [PATCH net-next] tcp: return sizeof tcp_dctcp_info in dctcp_get_info()

2016-06-14 Thread David Miller

From: Neal Cardwell 
Date: Mon, 13 Jun 2016 11:20:35 -0400

> Make sure that dctcp_get_info() returns only the size of the
> info->dctcp struct that it zeroes out and fills in. Previously it had
> been returning the size of the enclosing tcp_cc_info union,
> sizeof(*info).  There is no problem yet, but that union may one
> day be larger than struct tcp_dctcp_info, in which case the
> TCP_CC_INFO code might accidentally copy uninitialized bytes from the
> stack.
> 
> Signed-off-by: Neal Cardwell 
> Signed-off-by: Soheil Hassas Yeganeh 
> Signed-off-by: Eric Dumazet 

Applied.
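
The sizing concern described in the commit message can be sketched in userspace C. This is only an illustration under stated assumptions, not the kernel code: the struct layouts, the larger `other` union member and the function name `get_info_sketch` are all made up for the example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel types; the layout is illustrative. */
struct dctcp_info_sketch {
	unsigned short enabled;
	unsigned short ce_state;
	unsigned int alpha;
};

struct bigger_info_sketch {
	unsigned int fields[8];	/* imagined future member, larger than dctcp */
};

union cc_info_sketch {
	struct dctcp_info_sketch dctcp;
	struct bigger_info_sketch other;
};

/*
 * Zero and fill only the dctcp member, and report exactly that many bytes,
 * so a caller that copies "returned size" bytes never reads the
 * uninitialized remainder of the union.
 */
size_t get_info_sketch(union cc_info_sketch *info)
{
	memset(&info->dctcp, 0, sizeof(info->dctcp));
	info->dctcp.enabled = 1;
	info->dctcp.alpha = 1024;
	return sizeof(info->dctcp);	/* not sizeof(*info) */
}
```

With the hypothetical larger member present, returning sizeof(*info) would tell the caller to copy bytes that were never written.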

Re: [PATCH -next] sctp: fix error return code in sctp_init()

2016-06-14 Thread David Miller

From: weiyj...@163.com
Date: Mon, 13 Jun 2016 23:08:26 +0800

> From: Wei Yongjun 
> 
> Fix to return a negative error code from the error handling
> case instead of 0, as done elsewhere in this function.
> 
> Signed-off-by: Wei Yongjun 

Applied.
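
The class of bug fixed here (an error path that falls through to cleanup and returns 0) can be sketched as follows. This is a generic userspace illustration, not the sctp_init() code; the function name and the `fail_second` switch are assumptions used only to trigger the error path.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical init routine: two allocations sharing one cleanup path. */
int init_sketch(size_t first, size_t second, int fail_second)
{
	int status = 0;
	void *a, *b = NULL;

	a = malloc(first);
	if (!a) {
		status = -ENOMEM;
		goto out;
	}

	if (!fail_second)
		b = malloc(second);
	if (!b) {
		/*
		 * Without this assignment the function would reach the
		 * cleanup label and return 0 despite the failure.
		 */
		status = -ENOMEM;
		goto err_free_a;
	}

	free(b);	/* the sketch releases everything before returning */
err_free_a:
	free(a);
out:
	return status;
}
```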

Re: [PATCH net-next] net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)

2016-06-14 Thread Matt Wilson

On Tue, Jun 14, 2016 at 11:27:09PM -0700, David Miller wrote:
> From: Matt Wilson 
> Date: Tue, 14 Jun 2016 23:23:36 -0700
> 
> > Point taken, though existing drivers (even fairly popular ones) also
> > aren't as clean as you might like. A quick look around...
> 
> Existing drivers do undesirable things, film at 11...
> 
> Yet are never a reason to accept such things in new drivers.

I generally agree with this philosophy.

> > Like many other network drivers, some of this is common code used for
> > non-Linux systems, and that's why there is some overlap with Linux
> > facilities.
> 
> Again, never an excuse for such things.

I suppose I was just happy to have the *majorly* objectionable parts
cleaned up...

> > Are there other things that jump out at you?
> 
> I review hundreds of patches a day, I invested what I was able to
> before moving on to other people's work.

I wasn't asking you to do more, only if you had anything else you
wanted to say before Netanel sends a v2. 

> Other developers must help review such a large driver submission, it
> can't all be on me.

And I'm certainly not saying it's all on you. I've been reviewing this
with the team for quite a while to get it in pretty reasonable (IMHO)
shape.

--msw

Re: [PATCH net-next 0/2] rxrpc: Rename rxrpc source files

2016-06-14 Thread David Miller

From: David Howells 
Date: Mon, 13 Jun 2016 15:07:50 +0100

> Tagged thusly:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git
>   rxrpc-rewrite-20160613

Pulled.

Re: [PATCH net-next] net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)

2016-06-14 Thread David Miller

From: Matt Wilson 
Date: Tue, 14 Jun 2016 23:23:36 -0700

> Point taken, though existing drivers (even fairly popular ones) also
> aren't as clean as you might like. A quick look around...

Existing drivers do undesirable things, film at 11...

Yet are never a reason to accept such things in new drivers.

> Like many other network drivers, some of this is common code used for
> non-Linux systems, and that's why there is some overlap with Linux
> facilities.

Again, never an excuse for such things.

> Are there other things that jump out at you?

I review hundreds of patches a day, I invested what I was able to
before moving on to other people's work.

Other developers must help review such a large driver submission, it
can't all be on me.

[PATCH resend net-next 1/1] qede: Remove the redundant initialization statement.

2016-06-14 Thread Sudarsana Reddy Kalluru

The callback ".getdcbx" is being initialized twice. This was introduced by
the commit 0fdeb72aa6c9 ("qede: Add dcbnl support."). The patch removes the
redundant initialization statement.

Reported-by: Julia Lawall 
Signed-off-by: Yuval Mintz 
Signed-off-by: Sudarsana Reddy Kalluru 
---
 drivers/net/ethernet/qlogic/qede/qede_dcbnl.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ethernet/qlogic/qede/qede_dcbnl.c b/drivers/net/ethernet/qlogic/qede/qede_dcbnl.c
index 03e8c02..318f0cb 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_dcbnl.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_dcbnl.c
@@ -308,7 +308,6 @@ static const struct dcbnl_rtnl_ops qede_dcbnl_ops = {
.ieee_setets = qede_dcbnl_ieee_setets,
.ieee_getapp = qede_dcbnl_ieee_getapp,
.ieee_setapp = qede_dcbnl_ieee_setapp,
-   .getdcbx = qede_dcbnl_getdcbx,
.ieee_peer_getpfc = qede_dcbnl_ieee_peer_getpfc,
.ieee_peer_getets = qede_dcbnl_ieee_peer_getets,
.getstate = qede_dcbnl_getstate,
-- 
1.8.3.1

Re: [PATCH] net: hns: update the dependency

2016-06-14 Thread David Miller

From: Yisen Zhuang 
Date: Wed, 15 Jun 2016 14:03:33 +0800

> Hi David,
> 
> You mean that I sent this patch 3 times?
> 
> I am sorry for this.
> 
> I don't know why you received it 3 times. I received only one email for
> this patch.

I got three copies, each with a different Date: field.

patchwork saw all 3 copies as well:

http://patchwork.ozlabs.org/patch/634452/
http://patchwork.ozlabs.org/patch/634559/
http://patchwork.ozlabs.org/patch/634586/

Re: [PATCH net-next] net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)

2016-06-14 Thread Matt Wilson

On Tue, Jun 14, 2016 at 10:25:16PM -0700, David Miller wrote:
> From: Netanel Belgazal 
> Date: Mon, 13 Jun 2016 11:46:13 +0300
> 
> > +#define ena_trc_dbg(format, arg...) \
> > +   pr_debug("[ENA_COM: %s] " format, __func__, ##arg)
> > +#define ena_trc_info(format, arg...) \
> > +   pr_info("[ENA_COM: %s] " format, __func__, ##arg)
> > +#define ena_trc_warn(format, arg...) \
> > +   pr_warn("[ENA_COM: %s] " format, __func__, ##arg)
> > +#define ena_trc_err(format, arg...) \
> > +   pr_err("[ENA_COM: %s] " format, __func__, ##arg)
> 
> These custom tracing macros are quite inappropriate.
> 
> We have the function tracer in the kernel when that is needed.  So spitting
> out __func__ all over the place is not something that should be found in
> drivers these days.

Point taken, though existing drivers (even fairly popular ones) also
aren't as clean as you might like. A quick look around...

msw@carbon:~/git/upstream/linux$ git grep -B1 '__func__' drivers/net/ethernet/ | grep -A1 '#define'
drivers/net/ethernet/broadcom/bnx2x/bnx2x.h-#define BNX2X_ERROR(fmt, ...) \
drivers/net/ethernet/broadcom/bnx2x/bnx2x.h:pr_err("[%s:%d]" fmt, __func__, __LINE__, ##__VA_ARGS__)
[...]
drivers/net/ethernet/intel/ixgb/ixgb_osdep.h:#define ENTER() pr_debug("%s\n", __func__);

Like many other network drivers, some of this is common code used for
non-Linux systems, and that's why there is some overlap with Linux
facilities. For example, here are the common ENA parts as they are
situated in DPDK as a PMD:

   http://dpdk.org/browse/dpdk/tree/drivers/net/ena/base/ena_com.c

When you compare to the DPDK version you can see that the common code
has already been contextualized for Linux in this patch in
anticipation of this type of feedback. (e.g., ENA_SPINLOCK_LOCK() ->
spin_lock_irqsave(), etc., as that would obviously never fly).

The Linux-specific bits (ena_netdev.c, ena_ethtool.c, etc.) don't make
use of any of the overlapping functionality needed for the common
code.

> And one can modify pr_fmt to make pr_debug et al. have whatever prefix
> one wants.

Yup, that's an easy improvement.

> I suspect there will be several rounds of review to weed out things
> like this.  You can preempt a lot of that by removing as much as
> possible from your driver that the kernel has existing facilities for.

Are there other things that jump out at you? I felt like this was
pretty good for an initial submission in terms of striking a balance
between using a portable core while avoiding a lot of compatibility
shims.

--msw

[PATCH] tipc: eliminate uninitialized variable warning

2016-06-14 Thread Ying Xue

net/tipc/link.c: In function ‘tipc_link_timeout’:
net/tipc/link.c:744:28: warning: ‘mtyp’ may be used uninitialized in this function [-Wuninitialized]

Fixes: 42b18f605fea ("tipc: refactor function tipc_link_timeout()")
Acked-by: Jon Maloy 
Signed-off-by: Ying Xue 
---
 net/tipc/link.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/tipc/link.c b/net/tipc/link.c
index 7059c94..67b6ab9 100644
--- a/net/tipc/link.c
+++ b/net/tipc/link.c
@@ -704,7 +704,8 @@ static void link_profile_stats(struct tipc_link *l)
  */
 int tipc_link_timeout(struct tipc_link *l, struct sk_buff_head *xmitq)
 {
-   int mtyp, rc = 0;
+   int mtyp = 0;
+   int rc = 0;
bool state = false;
bool probe = false;
bool setup = false;
-- 
1.7.9.5

[PATCH] tipc: fix suspicious RCU usage

2016-06-14 Thread Ying Xue

When running the tipcTS and tipcTC test suites, the following complaint appears:

[   56.926168] ===
[   56.926169] [ INFO: suspicious RCU usage. ]
[   56.926171] 4.7.0-rc1+ #160 Not tainted
[   56.926173] ---
[   56.926174] net/tipc/bearer.c:408 suspicious rcu_dereference_protected() usage!
[   56.926175]
[   56.926175] other info that might help us debug this:
[   56.926175]
[   56.926177]
[   56.926177] rcu_scheduler_active = 1, debug_locks = 1
[   56.926179] 3 locks held by swapper/4/0:
[   56.926180]  #0:  (((&req->timer))){+.-...}, at: [] call_timer_fn+0x5/0x340
[   56.926203]  #1:  (&(&req->lock)->rlock){+.-...}, at: [] disc_timeout+0x1b/0xd0 [tipc]
[   56.926212]  #2:  (rcu_read_lock){..}, at: [] tipc_bearer_xmit_skb+0xb0/0x2e0 [tipc]
[   56.926218]
[   56.926218] stack backtrace:
[   56.926221] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 4.7.0-rc1+ #160
[   56.926222] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
[   56.926224]   880016803d28 813c4423 8800154252c0
[   56.926227]  0001 880016803d58 810b7512 8800124d8120
[   56.926230]  880013f8a160 8800132b5ccc 8800124d8120 880016803d88
[   56.926234] Call Trace:
[   56.926235][] dump_stack+0x67/0x94
[   56.926250]  [] lockdep_rcu_suspicious+0xe2/0x120
[   56.926256]  [] tipc_l2_send_msg+0x131/0x1c0 [tipc]
[   56.926261]  [] tipc_bearer_xmit_skb+0x14c/0x2e0 [tipc]
[   56.926266]  [] ? tipc_bearer_xmit_skb+0xb0/0x2e0 [tipc]
[   56.926273]  [] ? tipc_disc_init_msg+0x1f0/0x1f0 [tipc]
[   56.926278]  [] ? tipc_disc_init_msg+0x1f0/0x1f0 [tipc]
[   56.926283]  [] disc_timeout+0x56/0xd0 [tipc]
[   56.926288]  [] call_timer_fn+0xb8/0x340
[   56.926291]  [] ? call_timer_fn+0x5/0x340
[   56.926296]  [] ? tipc_disc_init_msg+0x1f0/0x1f0 [tipc]
[   56.926300]  [] run_timer_softirq+0x23a/0x390
[   56.926306]  [] ? clockevents_program_event+0x7f/0x130
[   56.926316]  [] __do_softirq+0xc3/0x4a2
[   56.926323]  [] irq_exit+0x8a/0xb0
[   56.926327]  [] smp_apic_timer_interrupt+0x46/0x60
[   56.926331]  [] apic_timer_interrupt+0x89/0x90
[   56.926333][] ? default_idle+0x2a/0x1a0
[   56.926340]  [] ? default_idle+0x28/0x1a0
[   56.926342]  [] arch_cpu_idle+0xf/0x20
[   56.926345]  [] default_idle_call+0x2f/0x50
[   56.926347]  [] cpu_startup_entry+0x215/0x3e0
[   56.926353]  [] start_secondary+0xf9/0x100

The warning appears because rtnl_dereference() is wrongly used in
tipc_l2_send_msg() under RCU read lock protection. rcu_dereference_rtnl()
should be used here instead.

Fixes: 5b7066c3dd24 ("tipc: stricter filtering of packets in bearer layer")
Acked-by: Jon Maloy 
Signed-off-by: Ying Xue 
---
 net/tipc/bearer.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/tipc/bearer.c b/net/tipc/bearer.c
index 6f11c62..bf8f05c 100644
--- a/net/tipc/bearer.c
+++ b/net/tipc/bearer.c
@@ -405,7 +405,7 @@ int tipc_l2_send_msg(struct net *net, struct sk_buff *skb,
return 0;
 
/* Send RESET message even if bearer is detached from device */
-   tipc_ptr = rtnl_dereference(dev->tipc_ptr);
+   tipc_ptr = rcu_dereference_rtnl(dev->tipc_ptr);
if (unlikely(!tipc_ptr && !msg_is_reset(buf_msg(skb))))
goto drop;
 
-- 
1.7.9.5

Re: [PATCH] net: hns: update the dependency

2016-06-14 Thread Yisen Zhuang

Hi David,

You mean that I sent this patch 3 times?

I am sorry for this.

I don't know why you received it 3 times. I received only one email for
this patch.

Thanks,

Yisen

On 2016/6/15 13:26, David Miller wrote:
> From: Yisen Zhuang 
> Date: Mon, 13 Jun 2016 19:56:27 +0800
> 
>> From: Kejian Yan 
>>
>> Now that the patchset adding ACPI support (commit id 6343488) has been
>> applied, HNS no longer depends on OF alone. It depends on OF or ACPI, so
>> the Kconfig file needs to be updated.
>>
>> Signed-off-by: Kejian Yan 
>> Signed-off-by: Yisen Zhuang 
> 
> Why did you submit this same exact patch 3 times?
> 
> .
>

Re: [PATCH v1 1/1] ipv4: Prevent malformed UFO fragments in ip_append_page

2016-06-14 Thread David Miller

From: Steven Caron 
Date: Mon, 13 Jun 2016 14:01:19 +

> As the IP fragment offset field counts 8-byte chunks, the payload of
> non-final IP fragments must be a multiple of 8 bytes. Depending on the
> MTU and IP option sizes, ip_append_page wasn't respecting this,
> notably when running NFS over UDP.
> 
> Signed-off-by: Steven Caron 

This seems to have DOS newlines or something strange like that.

Please fix your email client to send clean patches.

Thanks.
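
The 8-byte constraint mentioned in the patch description comes down to simple arithmetic, sketched below in userspace C. The helper names are hypothetical and this is not the ip_append_page() code; it only shows how the usable payload of a non-final fragment shrinks to a multiple of 8 once the header size is subtracted from the MTU.

```c
#include <stddef.h>

/*
 * Largest payload a non-final IPv4 fragment may carry for a given MTU and
 * IP header length (both in bytes), rounded down to a multiple of 8.
 */
size_t max_nonfinal_frag_payload(size_t mtu, size_t ip_hdr_len)
{
	if (mtu <= ip_hdr_len)
		return 0;
	return (mtu - ip_hdr_len) & ~(size_t)7;
}

/* Value of the fragment offset field (in 8-byte units) for a byte offset. */
unsigned int frag_offset_units(size_t byte_offset)
{
	return (unsigned int)(byte_offset >> 3);
}
```

With a 1500-byte MTU and a 20-byte header the payload is already aligned (1480 bytes), but with 4 bytes of IP options the 1476 available bytes must be trimmed to 1472.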

Re: [PATCH net] gre: fix error handler

2016-06-14 Thread Eric Dumazet

On Tue, 2016-06-14 at 22:15 -0700, Eric Dumazet wrote:
> From: Eric Dumazet 
> 
> 1) gre_parse_header() can be called from gre_err().
> 
>    At this point the transport header points to the ICMP header, not
>    the inner header.
> 
> 2) We cannot really change the transport header, as ipgre_err() will
>    later assume that it still points to the ICMP header.
> 
> 3) The pskb_may_pull() logic in gre_parse_header() really only works
>    if we are interested in the zone pointed to by skb->data.
> 
> So this fix:
> 
> A) changes gre_parse_header() to use skb->data instead of
>    skb_transport_header()
> 
> B) changes gre_err() to pull the IPv4 header immediately following
>    the ICMP header that was already pulled earlier
> 
> C) removes obsolete IPv6 includes
> 
> Signed-off-by: Eric Dumazet 
> Cc: Tom Herbert 
> Cc: Maciej Żenczykowski 
> ---
>  net/ipv4/gre_demux.c |4 ++--
>  net/ipv4/ip_gre.c|9 +++--
>  2 files changed, 5 insertions(+), 8 deletions(-)
> 
> diff --git a/net/ipv4/gre_demux.c b/net/ipv4/gre_demux.c
> index 4c39f4fd332a..0ba26ad9809d 100644
> --- a/net/ipv4/gre_demux.c
> +++ b/net/ipv4/gre_demux.c
> @@ -71,7 +71,7 @@ int gre_parse_header(struct sk_buff *skb, struct tnl_ptk_info *tpi,
>   if (unlikely(!pskb_may_pull(skb, sizeof(struct gre_base_hdr))))
>   return -EINVAL;
>  
> - greh = (struct gre_base_hdr *)skb_transport_header(skb);
> + greh = (struct gre_base_hdr *)skb->data;
>   if (unlikely(greh->flags & (GRE_VERSION | GRE_ROUTING)))
>   return -EINVAL;
>  
> @@ -81,7 +81,7 @@ int gre_parse_header(struct sk_buff *skb, struct tnl_ptk_info *tpi,
>   if (!pskb_may_pull(skb, hdr_len))
>   return -EINVAL;
>  
> - greh = (struct gre_base_hdr *)skb_transport_header(skb);
> + greh = (struct gre_base_hdr *)skb->data;
>   tpi->proto = greh->protocol;
>  
>   options = (__be32 *)(greh + 1);
> diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
> index 4d2025f7ec57..454832bc2897 100644
> --- a/net/ipv4/ip_gre.c
> +++ b/net/ipv4/ip_gre.c
> @@ -49,12 +49,6 @@
>  #include 
>  #include 
>  
> -#if IS_ENABLED(CONFIG_IPV6)
> -#include 
> -#include 
> -#include 
> -#endif
> -
>  /*
> Problems & solutions
> 
> @@ -217,11 +211,14 @@ static void gre_err(struct sk_buff *skb, u32 info)
>* by themselves???
>*/
>  
> + const struct iphdr *iph = (struct iphdr *)skb->data;
>   const int type = icmp_hdr(skb)->type;
>   const int code = icmp_hdr(skb)->code;
>   struct tnl_ptk_info tpi;
>   bool csum_err = false;
>  
> + pskb_pull(skb, iph->ihl * 4);
> +
>   if (gre_parse_header(skb, &tpi, &csum_err, htons(ETH_P_IP)) < 0) {
>   if (!csum_err)  /* ignore csum errors. */
>   return;
> 


Hmm... I read this prior commit. It looks like we need something else.

commit b7f8fe251e4609e2a437bd2c2dea01e61db6849c
Author: Jiri Benc 
Date:   Fri Apr 29 23:31:32 2016 +0200

gre: do not pull header in ICMP error processing

iptunnel_pull_header expects that IP header was already pulled; with this
expectation, it pulls the tunnel header. This is not true in gre_err.
Furthermore, ipv4_update_pmtu and ipv4_redirect expect that skb->data points
to the IP header.

We cannot pull the tunnel header in this path. It's just a matter of not
calling iptunnel_pull_header - we don't need any of its effects.

Fixes: bda7bb463436 ("gre: Allow multiple protocol listener for gre protocol.")
Signed-off-by: Jiri Benc 
Signed-off-by: David S. Miller

Re: [patch net-next] net: hns: add skb_reset_mac_header() after skb being alloc

2016-06-14 Thread David Miller

From: Yisen Zhuang 
Date: Mon, 13 Jun 2016 20:41:22 +0800

> From: Kejian Yan 
> 
> HNS receives a packet without doing anything, but it should call
> skb_reset_mac_header() to initialize the header before using
> eth_hdr().
> 
> Fixes: 0d6b425a3773c3445b0f51b2f333821beaacb619
> Signed-off-by: Kejian Yan 
> Signed-off-by: Yisen Zhuang 

Well, this patch made me look at this function.

You really shouldn't be filtering packets looped back, that is
the stack's job.  It shouldn't be happening in the driver.

And once you remove that code, this patch here is no longer
necessary.

Second of all, unless your card supports every protocol that
exists in the past, present, and _future_ you cannot set
skb->ip_summed to CHECKSUM_UNNECESSARY unconditionally like
that.

You can only set that for protocols your chip actually supports.
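
The rule stated above can be sketched as a small decision function. This is a userspace illustration only: the enum, the function name, and the assumption that the imagined chip verifies TCP and UDP checksums (and nothing else) are all hypothetical and do not describe the HNS hardware.

```c
#include <stdbool.h>

enum csum_sketch {
	CSUM_NONE_SKETCH,		/* stack must verify the checksum itself */
	CSUM_UNNECESSARY_SKETCH,	/* hardware already verified it */
};

#define PROTO_TCP 6	/* IANA protocol numbers */
#define PROTO_UDP 17

/*
 * Only claim "checksum unnecessary" for protocols the (imagined) chip can
 * actually verify, and only when it reported success for this packet.
 */
enum csum_sketch rx_csum_state(unsigned char ip_proto, bool hw_csum_ok)
{
	switch (ip_proto) {
	case PROTO_TCP:
	case PROTO_UDP:
		return hw_csum_ok ? CSUM_UNNECESSARY_SKETCH : CSUM_NONE_SKETCH;
	default:
		/* Unknown or future protocol: leave verification to the stack. */
		return CSUM_NONE_SKETCH;
	}
}
```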

Re: [PATCH net-next 0/3] r8152: code adjustment for PHY

2016-06-14 Thread David Miller

From: Hayes Wang 
Date: Mon, 13 Jun 2016 17:49:35 +0800

> These patches are for adjusting the code about PHY and setting speed.

Series applied, thanks.

Re: [PATCH net-next] net/sched: flower: Return error when hw can't offload and skip_sw is set

2016-06-14 Thread David Miller

From: Amir Vadai 
Date: Mon, 13 Jun 2016 12:06:39 +0300

> From: Amir Vadai 
> 
> When skip_sw is set and hardware fails to apply filter, return error to
> user. This will make error propagation logic similar to the one
> currently used in u32 classifier.
> Also, changed code to use tc_skip_sw() utility function.
> 
> Signed-off-by: Amir Vadai 

Applied, thanks.

Re: [PATCH net-next 1/1] Remove the redundant initialization statement.

2016-06-14 Thread David Miller

From: Sudarsana Reddy Kalluru 
Date: Mon, 13 Jun 2016 05:01:10 -0400

> The callback ".getdcbx" is being initialized twice. This was introduced by
> the commit 0fdeb72aa6c9 ("qede: Add dcbnl support."). The patch removes the
> redundant initialization statement.
> 
> Reported-by: Julia Lawall  
> Signed-off-by: Yuval Mintz 
> Signed-off-by: Sudarsana Reddy Kalluru 

This subject line is not properly formed.

After "[PATCH ...] " you must provide a proper subsystem prefix specifier
followed by a colon character, and a SPACE.

Otherwise nobody reading the GIT shortlog has any idea where in the kernel
your change is taking place.

An appropriate subsystem prefix for this patch would be "qede: ".
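
As a rough illustration of the subject-line convention described above, the following heuristic checks that a single-token prefix followed by a colon and a space comes right after the "[PATCH ...] " tag. The function name and the exact rule are assumptions made for this sketch; it is not an existing kernel or checkpatch facility.

```c
#include <stdbool.h>
#include <string.h>

/* Heuristic: "[PATCH ...] prefix: summary", where prefix contains no space. */
bool has_subsystem_prefix(const char *subject)
{
	const char *p, *colon;

	if (strncmp(subject, "[PATCH", 6) != 0)
		return false;
	p = strstr(subject, "] ");
	if (!p)
		return false;
	p += 2;				/* skip "] " */

	colon = strstr(p, ": ");
	if (!colon || colon == p)
		return false;
	/* The prefix must be one token, e.g. "qede" or "net/sched". */
	return memchr(p, ' ', (size_t)(colon - p)) == NULL;
}
```

Under this heuristic the subject being criticized here fails the check, while the resent version with the "qede: " prefix passes.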

Re: [PATCH] net: hns: update the dependency

2016-06-14 Thread David Miller

From: Yisen Zhuang 
Date: Mon, 13 Jun 2016 19:56:27 +0800

> From: Kejian Yan 
> 
> Now that the patchset adding ACPI support (commit id 6343488) has been
> applied, HNS no longer depends on OF alone. It depends on OF or ACPI, so
> the Kconfig file needs to be updated.
> 
> Signed-off-by: Kejian Yan 
> Signed-off-by: Yisen Zhuang 

Why did you submit this same exact patch 3 times?

Re: [PATCH net-next] net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)

2016-06-14 Thread David Miller

From: Netanel Belgazal 
Date: Mon, 13 Jun 2016 11:46:13 +0300

> +#define ena_trc_dbg(format, arg...) \
> + pr_debug("[ENA_COM: %s] " format, __func__, ##arg)
> +#define ena_trc_info(format, arg...) \
> + pr_info("[ENA_COM: %s] " format, __func__, ##arg)
> +#define ena_trc_warn(format, arg...) \
> + pr_warn("[ENA_COM: %s] " format, __func__, ##arg)
> +#define ena_trc_err(format, arg...) \
> + pr_err("[ENA_COM: %s] " format, __func__, ##arg)

These custom tracing macros are quite inappropriate.

We have the function tracer in the kernel when that is needed.  So spitting
out __func__ all over the place is not something that should be found in
drivers these days.

And one can modify pr_fmt to make pr_debug et al. have whatever prefix
one wants.

I suspect there will be several rounds of review to weed out things
like this.  You can preempt a lot of that by removing as much as
possible from your driver that the kernel has existing facilities for.

Thanks.
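
The pr_fmt suggestion can be mimicked in userspace to show the mechanism. In the kernel, a source file defines pr_fmt() before including the printk header, and pr_info() and friends wrap their format string with it. The sketch below assumes a stand-in macro `pr_info_sketch` that writes into a buffer via snprintf(); that macro, the "ena: " prefix and the function name are illustrative choices, not part of the submitted driver.

```c
#include <stdio.h>

/* Defined once per file, before any logging macro is used. */
#define pr_fmt(fmt) "ena: " fmt

/* Userspace stand-in for pr_info(): formats into a caller-supplied buffer. */
#define pr_info_sketch(buf, len, fmt, ...) \
	snprintf(buf, len, pr_fmt(fmt), __VA_ARGS__)

/* Every message from this file gets the prefix without repeating it. */
int log_queue_count(char *buf, size_t len, int queues)
{
	return pr_info_sketch(buf, len, "using %d queues\n", queues);
}
```

The prefix is applied by string-literal concatenation at compile time, so call sites stay free of per-message decoration such as __func__.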

Re: [PATCH v2 0/5] ipvs: fix backup sync daemon with IPv6, and minor updates

2016-06-14 Thread Julian Anastasov

Hello,

On Tue, 14 Jun 2016, Quentin Armitage wrote:

> This series of patches arises from discovering that:
> ipvsadm --start-daemon backup --mcast-group IPv6_address ...
> would always fail.
> 
> The first patch resolves the problem. The second and third patches are
> optimizations that were noticed while investigating the original problem.
> The fourth patch adds a lock which appears to have been omitted, and the
> final patch adds the recently added sync daemon multicast parameters to
> the log messages that are written when the sync daemons start.
> 
> v2 fixes a compile error in a debug message identified by kbuild test robot.
> Now compiles with CONFIG_IP_VS_DEBUG enabled. Patch 2/5 is modified to correct
> the problem, and patch 3/5 is modified to apply with the modified patch 2/5.
> 
> Quentin Armitage (5):
>   ipvs: Enable setting IPv6 multicast address for ipvs sync daemon
> backup
>   ipvs: Stop calling __dev_get_by_name() repeatedly when starting sync
> daemon
>   ipvs: Don't check result < 0 after setting result = 0
>   ipvs: Lock socket before setting SK_CAN_REUSE
>   ipvs: log additional sync daemon parameters
> 
>  net/netfilter/ipvs/ip_vs_sync.c |  104 +++---
>  1 files changed, 52 insertions(+), 52 deletions(-)
> 
> -- 
> 1.7.7.6

Thanks for catching this bug. Following are my
comments for the patches:

Patch 1:

I missed the fact that link-local addresses (ffx2) require
binding to ifindex due to the __ipv6_addr_needs_scope_id check;
I tested only with an ff05 address. BTW, ff01 is a node-local
address (loopback), you should not use it for IPVS.

Instead of directly writing into sin6_scope_id we can use
'sock->sk->sk_bound_dev_if = ifindex;' before bind(), it will
work for v4 and v6. Let me know if such solution works.
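
The address classes mentioned above differ only in the scope nibble of the multicast address. The sketch below extracts that nibble and flags the scopes that are tied to a single interface. The function names are hypothetical, and this mirrors the idea rather than the kernel's __ipv6_addr_needs_scope_id() implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Scope is the low nibble of the second byte of a multicast address. */
int mcast_scope(const uint8_t addr[16])
{
	if (addr[0] != 0xff)
		return -1;		/* not a multicast address */
	return addr[1] & 0x0f;
}

/* ff01 (node-local) and ffx2 (link-local) groups belong to one interface. */
bool mcast_needs_ifindex(const uint8_t addr[16])
{
	int scope = mcast_scope(addr);

	return scope == 0x1 || scope == 0x2;
}
```

An ff05 (site-local) group returns false here, which is consistent with the ff05-only test not exposing the problem.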

You have to send this patch as a bugfix. It should
apply to the net tree and will later go to the stable trees (4.3+),
i.e. 4.4, 4.5, 4.6 and 4.7; I don't see stable 4.3 on
https://www.kernel.org/. You should mention in the commit message
that this patch is a fix for a specific commit (check
Documentation/SubmittingPatches):

Fixes: d33288172e72 ("ipvs: add more mcast parameters for the sync daemon")

The other patches will go to the net-next tree in a
separate patchset, but I see a little fuzz if patch 2 is applied
without patch 1, so maybe this patchset should wait for the first
patch to appear in the net-next tree.

Patch 2: looks OK

Patch 3: looks OK

It was done this way to not exceed the 80-char limit.
Maybe you can shorten the message for the same reason.

Patch 4: looks OK

Before bind() such operations should be safe without locks.

Patch 5:

No need of <> for the commit IDs.

The indentation of existing pr_info in both cases
should not be changed.

Patches 1, 2, 3 have coding style warnings from checkpatch
that can be fixed, you can check them in this way:

scripts/checkpatch.pl --strict /tmp/file.patch

Regards

--
Julian Anastasov

Re: [PATCH net v2 0/4] ovs: fix rtnl notifications on interface deletion

2016-06-14 Thread David Miller

From: pravin shelar 
Date: Mon, 13 Jun 2016 12:39:16 -0700

> On Mon, Jun 13, 2016 at 1:31 AM, Nicolas Dichtel
>  wrote:
>>
>> There was no rtnl notifications for interfaces (gre, vxlan, geneve) created
>> by ovs. This problem is fixed by adjusting the creation path.
>>
>> v1 -> v2:
>>  - add patch #1 and #4
>>  - rework error handling in patch #2
>>
>>  drivers/net/geneve.c | 14 ++---
>>  drivers/net/vxlan.c  | 58 ++--
>>  net/ipv4/ip_gre.c| 14 ++---
>>  3 files changed, 56 insertions(+), 30 deletions(-)
>>
> 
> All patches look good to me.
> 
> Acked-by: Pravin B Shelar 

Series applied, thanks.

[PATCH net] gre: fix error handler

2016-06-14 Thread Eric Dumazet

From: Eric Dumazet 

1) gre_parse_header() can be called from gre_err().

   At this point the transport header points to the ICMP header, not
   the inner header.

2) We cannot really change the transport header, as ipgre_err() will
   later assume that it still points to the ICMP header.

3) The pskb_may_pull() logic in gre_parse_header() really only works
   if we are interested in the zone pointed to by skb->data.

So this fix:

A) changes gre_parse_header() to use skb->data instead of
   skb_transport_header()

B) changes gre_err() to pull the IPv4 header immediately following
   the ICMP header that was already pulled earlier

C) removes obsolete IPv6 includes

Signed-off-by: Eric Dumazet 
Cc: Tom Herbert 
Cc: Maciej Żenczykowski 
---
 net/ipv4/gre_demux.c |4 ++--
 net/ipv4/ip_gre.c|9 +++--
 2 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/net/ipv4/gre_demux.c b/net/ipv4/gre_demux.c
index 4c39f4fd332a..0ba26ad9809d 100644
--- a/net/ipv4/gre_demux.c
+++ b/net/ipv4/gre_demux.c
@@ -71,7 +71,7 @@ int gre_parse_header(struct sk_buff *skb, struct tnl_ptk_info *tpi,
if (unlikely(!pskb_may_pull(skb, sizeof(struct gre_base_hdr))))
return -EINVAL;
 
-   greh = (struct gre_base_hdr *)skb_transport_header(skb);
+   greh = (struct gre_base_hdr *)skb->data;
if (unlikely(greh->flags & (GRE_VERSION | GRE_ROUTING)))
return -EINVAL;
 
@@ -81,7 +81,7 @@ int gre_parse_header(struct sk_buff *skb, struct tnl_ptk_info *tpi,
if (!pskb_may_pull(skb, hdr_len))
return -EINVAL;
 
-   greh = (struct gre_base_hdr *)skb_transport_header(skb);
+   greh = (struct gre_base_hdr *)skb->data;
tpi->proto = greh->protocol;
 
options = (__be32 *)(greh + 1);
diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index 4d2025f7ec57..454832bc2897 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -49,12 +49,6 @@
 #include 
 #include 
 
-#if IS_ENABLED(CONFIG_IPV6)
-#include 
-#include 
-#include 
-#endif
-
 /*
Problems & solutions

@@ -217,11 +211,14 @@ static void gre_err(struct sk_buff *skb, u32 info)
 * by themselves???
 */
 
+   const struct iphdr *iph = (struct iphdr *)skb->data;
const int type = icmp_hdr(skb)->type;
const int code = icmp_hdr(skb)->code;
struct tnl_ptk_info tpi;
bool csum_err = false;
 
+   pskb_pull(skb, iph->ihl * 4);
+
if (gre_parse_header(skb, &tpi, &csum_err, htons(ETH_P_IP)) < 0) {
if (!csum_err)  /* ignore csum errors. */
return;
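
The patch skips the embedded IPv4 header by pulling iph->ihl * 4 bytes. The header-length arithmetic behind that can be sketched on a plain byte buffer, as below. The function name is hypothetical, and the bounds checks are additions made for the sketch; they are not taken from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Return the offset of the payload that follows an IPv4 header located at
 * the start of buf, or 0 if the buffer cannot hold a valid header.
 */
size_t ipv4_payload_offset(const uint8_t *buf, size_t len)
{
	size_t hdr_len;

	if (len < 20 || (buf[0] >> 4) != 4)
		return 0;
	hdr_len = (size_t)(buf[0] & 0x0f) * 4;	/* IHL counts 32-bit words */
	if (hdr_len < 20 || hdr_len > len)
		return 0;
	return hdr_len;
}
```

A header without options (first byte 0x45) yields 20; each 32-bit word of options adds 4 bytes, up to a maximum of 60.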

Re: [PATCH net-next 00/10] net_sched: defer skb freeing while changing qdiscs

2016-06-14 Thread Eric Dumazet

On Tue, 2016-06-14 at 19:13 -0700, Cong Wang wrote:

> No objection from me. It looks like a good optimization
> before we can improve the qdisc root spinlock.
> 
> Just one nit: You probably want to keep rtnl_kfree_skbs()
> within qdisc layer unless you have any plan to use it
> in other places.

I do not see other places freeing a lot of skbs under rtnl ;)

As a side effect, using rtnl_kfree_skbs() changes the call graph reported
by drop monitor, so it is better to avoid it unless really needed.

Re: [very-RFC 0/8] TSN driver for the kernel

2016-06-14 Thread Takashi Sakamoto


Hi Richard,

On Tue, 14 Jun 2016 19:04:44 +0200, Richard Cochran wrote:
>> Well, I guess I should have said, I am not too familiar with the
>> breadth of current audio hardware, high end or low end.  Of course I
>> would like to see even consumer devices work with AVB, but it is up to
>> the ALSA people to make that happen.  So far, nothing has been done,
>> afaict.

In the OSS world, there are few developers for this kind of device, even
in the alsa-project. Furthermore, manufacturers of recording equipment
have no interest in OSS.

In short, all we can do for these devices is reverse-engineering. For
Ethernet-AVB models, that might amount to just transmitting or receiving
packets and reading them. The devices are still black boxes, and we have
no way to reveal their details.

So when you need such details to implement something on your side, I
think few developers can give them to you.

Regards

Takashi Sakamoto

Re: [very-RFC 0/8] TSN driver for the kernel

2016-06-14 Thread Takashi Sakamoto

Hi Richard,

> On Mon, Jun 13, 2016 at 01:47:13PM +0200, Richard Cochran wrote:
>> 3. ALSA support for tunable AD/DA clocks.  The rate of the Listener's
>>DA clock must match that of the Talker and the other Listeners.
>>Either you adjust it in HW using a VCO or similar, or you do
>>adaptive sample rate conversion in the application. (And that is
>>another reason for *not* having a shared kernel buffer.)  For the
>>Talker, either you adjust the AD clock to match the PTP time, or
>>you measure the frequency offset.
>>
>> I have seen audio PLL/multiplier chips that will take, for example, a
>> 10 kHz input and produce your 48 kHz media clock.  With the right HW
>> design, you can tell your PTP Hardware Clock to produce a 1 PPS,
>> and you will have a synchronized AVB endpoint.  The software is all
>> there already.  Somebody should tell the ALSA guys about it.

Just out of curiosity, could I ask you to explain this in more detail
from the ALSA side?

A similar mechanism for synchronizing endpoints is also applied to audio
and music units on the IEEE 1394 bus. According to IEC 61883-1/6, some of
these units can generate presentation timestamps from the header
information of 8,000 packets per second, and use that signal as the
sampling clock[1].

There are many differences between IEC 61883-1/6 on the IEEE 1394 bus and
Audio Video Bridging on Ethernet[2], especially for synchronization, but
with respect to transferring a synchronization signal and time-based
data, I think the software implementations have similar requirements.

My motivation for joining this discussion is to clarify how to implement
packet-oriented drivers in ALSA kernel-land, and to enhance my work on
drivers that handle IEC 61883-1/6 on the IEEE 1394 bus.

>> I don't know if ALSA has anything for sample rate conversion or not,
>> but haven't seen anything that addresses distributed synchronized
>> audio applications.

In ALSA, sampling rate conversion should be done in userspace, not in
kernel land. In alsa-lib, sampling rate conversion is implemented in a
shared object. When userspace applications start playback or capture,
depending on the PCM node they access, they load the shared object,
convert PCM frames from a userspace buffer into the mmapped DMA buffer,
and then commit them.

Before establishing a PCM substream, userspace applications and
in-kernel drivers communicate to decide the sampling rate, the PCM frame
format, the size of the PCM buffer, and so on (see snd_pcm_hw_params()
and ioctl(SNDRV_PCM_IOCTL_HW_PARAMS)). Thus, as long as the in-kernel
drivers know the specifications of the endpoints, userspace applications
can start PCM substreams correctly.

[1] For details, please refer to the 1394TA specification I introduced:
http://www.spinics.net/lists/netdev/msg381259.html
[2] I guess that the IEC 61883-1/6 packet for Ethernet-AVB is a variant of
the original specifications.

Regards

Takashi Sakamoto
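
The relation between the 8,000 packets per second mentioned above and an audio sampling rate is plain arithmetic, sketched below. This is not the packetization rule set of IEC 61883-6 (which defines its own transmission modes); the function name and the even-spreading strategy are assumptions used only to show why 48 kHz divides cleanly and 44.1 kHz does not.

```c
#include <stdint.h>

#define PACKETS_PER_SEC 8000u

/*
 * Number of samples carried by packet `index` (0-based within one second)
 * if samples are spread as evenly as possible over 8,000 packets.
 */
unsigned int samples_in_packet(unsigned int rate_hz, unsigned int index)
{
	uint64_t end = ((uint64_t)index + 1) * rate_hz / PACKETS_PER_SEC;
	uint64_t start = (uint64_t)index * rate_hz / PACKETS_PER_SEC;

	return (unsigned int)(end - start);
}
```

At 48 kHz every packet carries 6 samples, whereas at 44.1 kHz the count alternates between 5 and 6 so that the total over one second is still 44,100.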

Re: [PATCH net-next 00/10] net_sched: defer skb freeing while changing qdiscs

2016-06-14 Thread Cong Wang

On Mon, Jun 13, 2016 at 8:21 PM, Eric Dumazet  wrote:
> qdiscs/classes are changed under RTNL protection and often
> while blocking BH and root qdisc spinlock.
>
> When lots of skbs need to be dropped, we free
> them under these locks causing TX/RX freezes,
> and more generally latency spikes.
>
> I saw spikes of 50+ ms on quite fast hardware...
>
> This patch series adds a simple queue protected by RTNL
> where skbs can be placed until RTNL is released.
>
> Note that this might also serve in the future for optional
> reinjection of packets when a qdisc is replaced.

No objection from me. It looks like a good optimization
before we can improve the qdisc root spinlock.

Just one nit: You probably want to keep rtnl_kfree_skbs()
within the qdisc layer unless you plan to use it
in other places.

Thanks!
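
The idea in the quoted cover letter (queue buffers while the lock is held,
free them after it is released) can be sketched in userspace C as below.
This is only an illustration under stated assumptions: struct buf,
struct defer_list, defer_free() and flush_deferred() are hypothetical
names, and the code is not the kernel's rtnl_kfree_skbs() implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an skb: a heap buffer with a link field. */
struct buf {
    struct buf *next;
};

/* Singly linked list of buffers waiting to be freed. */
struct defer_list {
    struct buf *head;
    size_t len;
};

/* Called while the (imaginary) lock is held: O(1), frees nothing. */
static void defer_free(struct defer_list *dl, struct buf *b)
{
    assert(dl && b);
    b->next = dl->head;
    dl->head = b;
    dl->len++;
}

/*
 * Called after the lock has been dropped: performs the expensive frees.
 * Returns the number of buffers released.
 */
static size_t flush_deferred(struct defer_list *dl)
{
    size_t freed = 0;

    while (dl->head) {
        struct buf *b = dl->head;

        dl->head = b->next;
        free(b);
        freed++;
    }
    dl->len = 0;
    return freed;
}
```

The point of the split is that the time spent under the lock no longer
grows with the cost of freeing each buffer, only with the cost of linking
it onto the list.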

RE: [Intel-wired-lan] [PATCHv2 net-next] net: igb: Only dma sync frame length

2016-06-14 Thread Brown, Aaron F

> From: Intel-wired-lan [mailto:intel-wired-lan-boun...@lists.osuosl.org] On
> Behalf Of Andrew Lunn
> Sent: Friday, June 3, 2016 2:03 PM
> To: Kirsher, Jeffrey T ; David Miller
> 
> Cc: netdev ; intel-wired-...@lists.osuosl.org;
> Andrew Lunn 
> Subject: [Intel-wired-lan] [PATCHv2 net-next] net: igb: Only dma sync frame
> length
> 
> On some platforms, syncing a buffer for DMA is expensive. Rather than
> sync the whole 2K receive buffer, only synchronise the length of the
> frame, which will typically be the MTU, or a much smaller TCP ACK.
> 
> For an IMX6Q, this gives around 6% increased TCP receive performance,
> which is cache operations bound and reduces CPU load for TCP transmit.
> 
> Signed-off-by: Andrew Lunn 
> ---
> v2:
> Christmas tree the local variables
> Pass size into igb_add_rx_frag() rather than repeating the endianness swap.
> ---
>  drivers/net/ethernet/intel/igb/igb_main.c | 7 ---
>  1 file changed, 4 insertions(+), 3 deletions(-)

Tested-by: Aaron Brown

Re: [PATCH v9 05/22] IB/hns: Add initial profile resource

2016-06-14 Thread oulijun

Hi,
On 2016/6/9 15:01, Leon Romanovsky wrote:
> On Wed, Jun 01, 2016 at 11:37:47PM +0800, Lijun Ou wrote:
>> This patch mainly configures some profile resources, for example the
>> vendor_id, hardware version, and some data structure sizes.
>>
>> Signed-off-by: Wei Hu 
>> Signed-off-by: Nenglong Zhao 
>> Signed-off-by: Lijun Ou 
>> ---
> 
> <...>
> 
>> +#define HNS_ROCE_V1_NUM_COMP_EQE 0x8000
>> +#define HNS_ROCE_V1_NUM_ASYNC_EQE   0x400
> 
> Something wrong with indentation.
> 
I have checked it in my local code and will send a new patch soon.

thanks
Lijun Ou

Re: [PATCH net] htb: call qdisc_root with rcu read lock held

2016-06-14 Thread Cong Wang

On Mon, Jun 13, 2016 at 9:16 PM, Florian Westphal  wrote:
> saw a debug splat:
> net/include/net/sch_generic.h:287 suspicious rcu_dereference_check() usage!
> other info that might help us debug this:
> rcu_scheduler_active = 1, debug_locks = 0
>  2 locks held by kworker/2:1/710:
>   #0:  ("events"){.+.+.+}, at: []
>   #1:  ((&q->work)){+.+...}, at: [] 
> process_one_work+0x14d/0x690
> Workqueue: events htb_work_func
> Call Trace:
>  [] dump_stack+0x85/0xc2
>  [] lockdep_rcu_suspicious+0xe7/0x120
>  [] htb_work_func+0x67/0x70
>
> Signed-off-by: Florian Westphal 

Acked-by: Cong Wang

Re: [RFC PATCH iproute2] tc: let m_ipt work with new iptables API headers

2016-06-14 Thread Stephen Hemminger

On Sun, 29 May 2016 20:27:13 +0200
Alexander Aring  wrote:

> Since commit 5cd1adb ("Update to current iptables headers") the build
> with m_ipt.o and the following config will fail:
> 
> TC_CONFIG_XT:=n
> TC_CONFIG_XT_OLD:=n
> TC_CONFIG_XT_OLD_H:=n
> 
> This patch renames "iptables_target" to "xtables_target", along with some
> other things that were renamed, which I noticed while reading the iptables
> git log. Functions that are not used in m_ipt.c and not exported by the
> header are removed; those still used in m_ipt.c are made static.
> 
> Reported-by: Clemens Gruber 
> Signed-off-by: Alexander Aring 

Applied.
Not sure why this never showed up in patchwork.

Re: [ldv-project] [net] libertas: potential race condition

2016-06-14 Thread James Cameron

On Tue, Jun 14, 2016 at 05:16:11PM +0400, Pavel Andrianov wrote:
> 08.06.2016 02:51, James Cameron wrote:
> >On Tue, Jun 07, 2016 at 09:39:55AM -0500, Dan Williams wrote:
> >>On Tue, 2016-06-07 at 13:30 +0400, Pavel Andrianov wrote:
> >>>Hi!
> >>>
> >>>There is a potential race condition in
> >>>drivers/net/wireless/libertas/libertas.ko.
> >>>In the function lbs_hard_start_xmit(..), line 159, a socket buffer is
> >>>written to priv->current_skb with a spin_lock protection.
> >>>In the function lbs_mac_event_disconnected(..), lines 50-51, the field
> >>>current_skb is cleaned. There is no protection used. The corresponding
> >>>handlers are activated at the same time in lbs_start_card(..) and then
> >>>may be executed simultaneously. Note, there are two structures
> >>>lbs_netdev_ops and mesh_netdev_ops, which have the target handler
> >>>lbs_hard_start_xmit.
> >>>Is it a real race or I have missed something?
> >>Yeah, it looks like it should be grabbing priv->driver_lock before
> >>clearing priv->currenttxskb in lbs_mac_event_disconnected().  Care to
> >>submit a patch after testing?  Do you have any of that hardware?
> >I've hardware, with serial console.
> >
> >Can test any patch, on USB (8388) or SDIO (8686).
> >
> Hi!
> 
> I've prepared a patch for this issue. Could you test it?
> 
> Thank you.

Tested on OLPC XO-1 (usb8388) and XO-1.5 (sd8686) with v4.7-rc3.

Confirmed that lbs_mac_event_disconnected is being called on the
station when hostapd on access point is given SIGHUP.

The longer-duration test was:

- SSH to the station and run "top -d 0.2",

- send SIGHUP every six seconds, for 300 cycles.

You may add my:

Tested-by: James Cameron 

-- 
James Cameron
http://quozl.netrek.org/
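
The fix suggested earlier in this thread (take the driver lock before
clearing the pending transmit buffer) can be sketched as follows. This is
a userspace illustration only, not the patch that was tested: a pthread
mutex stands in for the kernel spinlock, the field names are borrowed from
the discussion, and xmit_store() and disconnect_clear() are hypothetical
helpers.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver's private state. */
struct priv {
    pthread_mutex_t driver_lock;
    void *currenttxskb;
};

/* Transmit path: publish the pending buffer under the lock. */
static void xmit_store(struct priv *p, void *skb)
{
    assert(p);
    pthread_mutex_lock(&p->driver_lock);
    p->currenttxskb = skb;
    pthread_mutex_unlock(&p->driver_lock);
}

/*
 * Disconnect path: detach the pending buffer under the same lock, so the
 * transmit path can never observe a half-updated state. The detached
 * pointer is returned so the caller can release it outside the lock.
 */
static void *disconnect_clear(struct priv *p)
{
    void *old;

    assert(p);
    pthread_mutex_lock(&p->driver_lock);
    old = p->currenttxskb;
    p->currenttxskb = NULL;
    pthread_mutex_unlock(&p->driver_lock);
    return old;
}
```

Because both paths take the same lock, a second call to
disconnect_clear() simply returns NULL instead of handing out the same
buffer twice.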

Re: [PATCH] IPsec NAT-T issue

2016-06-14 Thread Blair Steven

The restoration is happening, but it is being applied at the wrong location.

The destination IP address is being saved and restored, and the SPI is
being written directly after the destination IP address. From my
understanding though, the ESN shuffling should have saved and restored
the UDP source / dest ports + SPI.

-Blair

On 06/13/2016 10:20 PM, Steffen Klassert wrote:
> On Mon, Jun 13, 2016 at 11:48:13AM +1200, Blair Steven wrote:
>> During testing we have discovered an issue with IPsec NAT-T where the SPI
>> is overwriting the source and dest ports of the UDP header.
> The headers should be restored after the crypto operation in
> esp_restore_header(). Does this not happen in your case? What
> kind of problem do you experience?
>

Re: [PATCH iproute2] ip rule: Add support for l3mdev rules

2016-06-14 Thread Stephen Hemminger

On Fri, 10 Jun 2016 10:47:17 -0700
David Ahern  wrote:

> Kernel commit 96c63fa7393d ("net: Add l3mdev rule") added support for
> the FRA_L3MDEV attribute. The attribute enables use of l3mdev rules
> which mean 'get table id from l3 master device'. This patch adds
> support to iproute2 to show, add and delete rules with this attribute.
> 
> Signed-off-by: David Ahern 

Applied to net-next

[PATCH net-next V2 7/9] liquidio: New driver FW command structure

2016-06-14 Thread Raghu Vatsavayi

This patch is for the new driver/firmware control command structures
(octnic_packet_params and octnic_cmd_setup) and the resultant code changes.

Signed-off-by: Derek Chickles 
Signed-off-by: Satanand Burla 
Signed-off-by: Felix Manlunas 
Signed-off-by: Raghu Vatsavayi 
---
 drivers/net/ethernet/cavium/liquidio/lio_main.c| 109 +
 .../net/ethernet/cavium/liquidio/liquidio_common.h |  25 +++--
 drivers/net/ethernet/cavium/liquidio/octeon_nic.h  |  15 +--
 3 files changed, 19 insertions(+), 130 deletions(-)

diff --git a/drivers/net/ethernet/cavium/liquidio/lio_main.c b/drivers/net/ethernet/cavium/liquidio/lio_main.c
index aa28790..1f1a28d 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_main.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_main.c
@@ -2704,68 +2704,6 @@ static inline int send_nic_timestamp_pkt(struct octeon_device *oct,
return retval;
 }
 
-static inline int is_ipv4(struct sk_buff *skb)
-{
-   return (skb->protocol == htons(ETH_P_IP)) &&
-  (ip_hdr(skb)->version == 4);
-}
-
-static inline int is_vlan(struct sk_buff *skb)
-{
-   return skb->protocol == htons(ETH_P_8021Q);
-}
-
-static inline int is_ip_fragmented(struct sk_buff *skb)
-{
-   /* The Don't fragment and Reserved flag fields are ignored.
-* IP is fragmented if
-* -  the More fragments bit is set (indicating this IP is a fragment
-* with more to follow; the current offset could be 0 ).
-* -  ths offset field is non-zero.
-*/
-   return (ip_hdr(skb)->frag_off & htons(IP_MF | IP_OFFSET)) ? 1 : 0;
-}
-
-static inline int is_ipv6(struct sk_buff *skb)
-{
-   return (skb->protocol == htons(ETH_P_IPV6)) &&
-  (ipv6_hdr(skb)->version == 6);
-}
-
-static inline int is_with_extn_hdr(struct sk_buff *skb)
-{
-   return (ipv6_hdr(skb)->nexthdr != IPPROTO_TCP) &&
-  (ipv6_hdr(skb)->nexthdr != IPPROTO_UDP);
-}
-
-static inline int is_tcpudp(struct sk_buff *skb)
-{
-   return (ip_hdr(skb)->protocol == IPPROTO_TCP) ||
-  (ip_hdr(skb)->protocol == IPPROTO_UDP);
-}
-
-static inline u32 get_ipv4_5tuple_tag(struct sk_buff *skb)
-{
-   u32 tag;
-   struct iphdr *iphdr = ip_hdr(skb);
-
-   tag = crc32(0, &iphdr->protocol, 1);
-   tag = crc32(tag, (u8 *)&iphdr->saddr, 8);
-   tag = crc32(tag, skb_transport_header(skb), 4);
-   return tag;
-}
-
-static inline u32 get_ipv6_5tuple_tag(struct sk_buff *skb)
-{
-   u32 tag;
-   struct ipv6hdr *ipv6hdr = ipv6_hdr(skb);
-
-   tag = crc32(0, &ipv6hdr->nexthdr, 1);
-   tag = crc32(tag, (u8 *)&ipv6hdr->saddr, 32);
-   tag = crc32(tag, skb_transport_header(skb), 4);
-   return tag;
-}
-
 /** \brief Transmit networks packets to the Octeon interface
  * @param skbuff   skbuff struct to be passed to network layer.
  * @param netdev   pointer to network device
@@ -2852,52 +2790,11 @@ static int liquidio_xmit(struct sk_buff *skb, struct net_device *netdev)
 
cmdsetup.u64 = 0;
cmdsetup.s.ifidx = lio->linfo.ifidx;
+   cmdsetup.s.iq_no = iq_no;
 
-   if (skb->ip_summed == CHECKSUM_PARTIAL) {
-   if (is_ipv4(skb) && !is_ip_fragmented(skb) && is_tcpudp(skb)) {
-   tag = get_ipv4_5tuple_tag(skb);
-
-   cmdsetup.s.cksum_offset = sizeof(struct ethhdr) + 1;
-
-   if (ip_hdr(skb)->ihl > 5)
-   cmdsetup.s.ipv4opts_ipv6exthdr =
-   OCT_PKT_PARAM_IPV4OPTS;
-
-   } else if (is_ipv6(skb)) {
-   tag = get_ipv6_5tuple_tag(skb);
+   if (skb->ip_summed == CHECKSUM_PARTIAL)
+   cmdsetup.s.transport_csum = 1;
 
-   cmdsetup.s.cksum_offset = sizeof(struct ethhdr) + 1;
-
-   if (is_with_extn_hdr(skb))
-   cmdsetup.s.ipv4opts_ipv6exthdr =
-   OCT_PKT_PARAM_IPV6EXTHDR;
-
-   } else if (is_vlan(skb)) {
-   if (vlan_eth_hdr(skb)->h_vlan_encapsulated_proto
-   == htons(ETH_P_IP) &&
-   !is_ip_fragmented(skb) && is_tcpudp(skb)) {
-   tag = get_ipv4_5tuple_tag(skb);
-
-   cmdsetup.s.cksum_offset =
-   sizeof(struct vlan_ethhdr) + 1;
-
-   if (ip_hdr(skb)->ihl > 5)
-   cmdsetup.s.ipv4opts_ipv6exthdr =
-   OCT_PKT_PARAM_IPV4OPTS;
-
-   } else if (vlan_eth_hdr(skb)->h_vlan_encapsulated_proto
-   == htons(ETH_P_IPV6)) {
-   tag = get_ipv6_5tuple_tag(skb);
-
-   cmdsetup.s.cksum_offset =
-   sizeof(struct vlan_ethhdr) + 1;

[PATCH net-next V2 1/9] liquidio: Avoid double free during soft command

2016-06-14 Thread Raghu Vatsavayi

This patch resolves the double free issue by checking the proper return
values from the soft command.

Signed-off-by: Derek Chickles 
Signed-off-by: Satanand Burla 
Signed-off-by: Felix Manlunas 
Signed-off-by: Raghu Vatsavayi 
---
 drivers/net/ethernet/cavium/liquidio/lio_ethtool.c | 4 ++--
 drivers/net/ethernet/cavium/liquidio/lio_main.c| 4 ++--
 drivers/net/ethernet/cavium/liquidio/octeon_nic.c  | 2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/cavium/liquidio/lio_ethtool.c b/drivers/net/ethernet/cavium/liquidio/lio_ethtool.c
index 245c063..1096cdb 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_ethtool.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_ethtool.c
@@ -317,7 +317,7 @@ octnet_mdio45_access(struct lio *lio, int op, int loc, int *value)
 
retval = octeon_send_soft_command(oct_dev, sc);
 
-   if (retval) {
+   if (retval == IQ_SEND_FAILED) {
dev_err(&oct_dev->pci_dev->dev,
"octnet_mdio45_access instruction failed status: %x\n",
retval);
@@ -722,7 +722,7 @@ static int octnet_set_intrmod_cfg(void *oct, struct oct_intrmod_cfg *intr_cfg)
sc->wait_time = 1000;
 
retval = octeon_send_soft_command(oct_dev, sc);
-   if (retval) {
+   if (retval == IQ_SEND_FAILED) {
octeon_free_soft_command(oct_dev, sc);
return -EINVAL;
}
diff --git a/drivers/net/ethernet/cavium/liquidio/lio_main.c b/drivers/net/ethernet/cavium/liquidio/lio_main.c
index 655d89e..47fba0e 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_main.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_main.c
@@ -2583,7 +2583,7 @@ static inline int send_nic_timestamp_pkt(struct octeon_device *oct,
retval = octeon_send_command(oct, sc->iq_no, ring_doorbell, &sc->cmd,
 sc, ih->dlengsz, ndata->reqtype);
 
-   if (retval) {
+   if (retval == IQ_SEND_FAILED) {
dev_err(&oct->pci_dev->dev, "timestamp data packet failed status: %x\n",
retval);
octeon_free_soft_command(oct, sc);
@@ -3192,7 +3192,7 @@ static int setup_nic_devices(struct octeon_device *octeon_dev)
sc->wait_time = 1000;
 
retval = octeon_send_soft_command(octeon_dev, sc);
-   if (retval) {
+   if (retval == IQ_SEND_FAILED) {
dev_err(&octeon_dev->pci_dev->dev,
"iq/oq config failed status: %x\n",
retval);
diff --git a/drivers/net/ethernet/cavium/liquidio/octeon_nic.c b/drivers/net/ethernet/cavium/liquidio/octeon_nic.c
index 1a01915..aacabe4 100644
--- a/drivers/net/ethernet/cavium/liquidio/octeon_nic.c
+++ b/drivers/net/ethernet/cavium/liquidio/octeon_nic.c
@@ -178,7 +178,7 @@ octnet_send_nic_ctrl_pkt(struct octeon_device *oct,
}
 
retval = octeon_send_soft_command(oct, sc);
-   if (retval) {
+   if (retval == IQ_SEND_FAILED) {
octeon_free_soft_command(oct, sc);
dev_err(&oct->pci_dev->dev, "%s soft command send failed status: %x\n",
__func__, retval);
-- 
1.8.3.1
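
To illustrate why the patch above replaces "if (retval)" with an explicit
comparison, here is a small self-contained model. It assumes the send
routine can return a non-zero status that still means the command was
queued; the enum names and numeric values below are assumptions made for
the sketch, not definitions taken from the driver.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed tri-state result; names and values are illustrative only. */
enum send_status {
    SEND_OK = 0,      /* command queued */
    SEND_STOP = 1,    /* command queued, but the queue is now full */
    SEND_FAILED = -1  /* command not queued */
};

/* Old-style check: any non-zero status is treated as a failure. */
static bool caller_frees_old(enum send_status st)
{
    return st != SEND_OK;
}

/* New-style check: only a real failure lets the caller free. */
static bool caller_frees_new(enum send_status st)
{
    return st == SEND_FAILED;
}

/*
 * True when the command would be freed twice: once by the caller and
 * once by the completion path, which owns every queued command.
 */
static bool double_free(enum send_status st, bool caller_frees)
{
    bool queued = (st != SEND_FAILED);

    assert(st == SEND_OK || st == SEND_STOP || st == SEND_FAILED);
    return queued && caller_frees;
}
```

Under these assumptions, only the "queued but full" status distinguishes
the two checks: the old test frees a command that the completion path
will free again, while the new test leaves it alone.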

[PATCH net-next V2 2/9] liquidio: Host queue mapping changes

2016-06-14 Thread Raghu Vatsavayi

This patch allocates the input queues based on the NUMA node in the tx path
and changes the queue mapping based on the mapping info provided by firmware.

Signed-off-by: Derek Chickles 
Signed-off-by: Satanand Burla 
Signed-off-by: Felix Manlunas 
Signed-off-by: Raghu Vatsavayi 
---
 drivers/net/ethernet/cavium/liquidio/lio_ethtool.c |  4 +-
 drivers/net/ethernet/cavium/liquidio/lio_main.c| 75 --
 .../net/ethernet/cavium/liquidio/liquidio_common.h | 44 -
 .../net/ethernet/cavium/liquidio/octeon_device.c   | 41 +++-
 drivers/net/ethernet/cavium/liquidio/octeon_iq.h   | 11 ++--
 .../net/ethernet/cavium/liquidio/request_manager.c | 32 +++--
 6 files changed, 142 insertions(+), 65 deletions(-)

diff --git a/drivers/net/ethernet/cavium/liquidio/lio_ethtool.c b/drivers/net/ethernet/cavium/liquidio/lio_ethtool.c
index 1096cdb..2937c802 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_ethtool.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_ethtool.c
@@ -653,7 +653,7 @@ static int lio_get_intr_coalesce(struct net_device *netdev,
intrmod_cfg->intrmod_mincnt_trigger;
}
 
-   iq = oct->instr_queue[lio->linfo.txpciq[0]];
+   iq = oct->instr_queue[lio->linfo.txpciq[0].s.q_no];
intr_coal->tx_max_coalesced_frames = iq->fill_threshold;
break;
 
@@ -859,7 +859,7 @@ static int lio_set_intr_coalesce(struct net_device *netdev,
if ((intr_coal->tx_max_coalesced_frames >= CN6XXX_DB_MIN) &&
(intr_coal->tx_max_coalesced_frames <= CN6XXX_DB_MAX)) {
for (j = 0; j < lio->linfo.num_txpciq; j++) {
-   q_no = lio->linfo.txpciq[j];
+   q_no = lio->linfo.txpciq[j].s.q_no;
oct->instr_queue[q_no]->fill_threshold =
intr_coal->tx_max_coalesced_frames;
}
diff --git a/drivers/net/ethernet/cavium/liquidio/lio_main.c b/drivers/net/ethernet/cavium/liquidio/lio_main.c
index 47fba0e..3477a3c 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_main.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_main.c
@@ -682,7 +682,8 @@ static inline void txqs_wake(struct net_device *netdev)
int i;
 
for (i = 0; i < netdev->num_tx_queues; i++)
-   netif_wake_subqueue(netdev, i);
+   if (__netif_subqueue_stopped(netdev, i))
+   netif_wake_subqueue(netdev, i);
} else {
netif_wake_queue(netdev);
}
@@ -752,11 +753,14 @@ static inline int check_txq_status(struct lio *lio)
 
/* check each sub-queue state */
for (q = 0; q < numqs; q++) {
-   iq = lio->linfo.txpciq[q & (lio->linfo.num_txpciq - 1)];
+   iq = lio->linfo.txpciq[q %
+   (lio->linfo.num_txpciq)].s.q_no;
if (octnet_iq_is_full(lio->oct_dev, iq))
continue;
-   wake_q(lio->netdev, q);
-   ret_val++;
+   if (__netif_subqueue_stopped(lio->netdev, q)) {
+   wake_q(lio->netdev, q);
+   ret_val++;
+   }
}
} else {
if (octnet_iq_is_full(lio->oct_dev, lio->txq))
@@ -1230,7 +1234,8 @@ static int liquidio_stop_nic_module(struct octeon_device *oct)
for (i = 0; i < oct->ifcount; i++) {
lio = GET_LIO(oct->props[i].netdev);
for (j = 0; j < lio->linfo.num_rxpciq; j++)
-   octeon_unregister_droq_ops(oct, lio->linfo.rxpciq[j]);
+   octeon_unregister_droq_ops(oct,
+  lio->linfo.rxpciq[j].s.q_no);
}
 
for (i = 0; i < oct->ifcount; i++)
@@ -1337,14 +1342,17 @@ static inline int check_txq_state(struct lio *lio, struct sk_buff *skb)
 
if (netif_is_multiqueue(lio->netdev)) {
q = skb->queue_mapping;
-   iq = lio->linfo.txpciq[(q & (lio->linfo.num_txpciq - 1))];
+   iq = lio->linfo.txpciq[(q % (lio->linfo.num_txpciq))].s.q_no;
} else {
iq = lio->txq;
+   q = iq;
}
 
if (octnet_iq_is_full(lio->oct_dev, iq))
return 0;
-   wake_q(lio->netdev, q);
+
+   if (__netif_subqueue_stopped(lio->netdev, q))
+   wake_q(lio->netdev, q);
return 1;
 }
 
@@ -1743,14 +1751,13 @@ static void if_cfg_callback(struct octeon_device *oct,
 static u16 select_q(struct net_device *dev, struct sk_buff *skb,
void *accel_priv, select_queue_fallback_t fallback)
 {
-   int qindex;
+   u32 qindex = 0;
struct lio *lio;
 
lio = GET_LIO(dev);
-   /* select queue on chosen queue_mapping or co
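
Several hunks above replace "q & (num_txpciq - 1)" with
"q % num_txpciq". The difference only shows when the queue count is not a
power of two, as the following sketch demonstrates. pick_mask() and
pick_mod() are hypothetical helper names used for illustration; they do
not appear in the driver.

```c
#include <assert.h>

/* Index selection that is only correct when n is a power of two. */
static unsigned int pick_mask(unsigned int q, unsigned int n)
{
    assert(n > 0);
    return q & (n - 1);
}

/* Index selection that stays within [0, n) for any n > 0. */
static unsigned int pick_mod(unsigned int q, unsigned int n)
{
    assert(n > 0);
    return q % n;
}
```

With n = 4 the two forms agree. With n = 3 the mask is binary 10, so the
mask form can only yield 0 or 2 and index 1 is never selected, whereas the
modulo form spreads values over all three indices.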

[PATCH net-next V2 6/9] liquidio: Consider PTP for packet size calculations

2016-06-14 Thread Raghu Vatsavayi

This patch refactors the packet size calculations so that they work both
for 66xx and 68xx cards with PTP enabled and for other cards that do not
support PTP.

Signed-off-by: Derek Chickles 
Signed-off-by: Satanand Burla 
Signed-off-by: Felix Manlunas 
Signed-off-by: Raghu Vatsavayi 
---
 drivers/net/ethernet/cavium/liquidio/lio_main.c| 34 +-
 .../net/ethernet/cavium/liquidio/liquidio_common.h |  6 ++--
 2 files changed, 25 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ethernet/cavium/liquidio/lio_main.c b/drivers/net/ethernet/cavium/liquidio/lio_main.c
index 3a4f31f..aa28790 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_main.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_main.c
@@ -84,6 +84,8 @@ static int conf_type;
 module_param(conf_type, int, 0);
 MODULE_PARM_DESC(conf_type, "select octeon configuration 0 default 1 ovs");
 
+static int ptp_enable = 1;
+
 /* Bit mask values for lio->ifstate */
 #define   LIO_IFSTATE_DROQ_OPS 0x01
 #define   LIO_IFSTATE_REGISTERED   0x02
@@ -1851,6 +1853,7 @@ liquidio_push_packet(u32 octeon_id,
if (netdev) {
int packet_was_received;
struct lio *lio = GET_LIO(netdev);
+   struct octeon_device *oct = lio->oct_dev;
 
/* Do not proceed if the interface is not in RUNNING state. */
if (!ifstate_check(lio, LIO_IFSTATE_RUNNING)) {
@@ -1889,21 +1892,26 @@ liquidio_push_packet(u32 octeon_id,
put_page(pg_info->page);
}
 
-   if (rh->r_dh.has_hwtstamp) {
-   /* timestamp is included from the hardware at the
-* beginning of the packet.
-*/
-   if (ifstate_check(lio,
- LIO_IFSTATE_RX_TIMESTAMP_ENABLED)) {
-   /* Nanoseconds are in the first 64-bits
-* of the packet.
+   if (((oct->chip_id == OCTEON_CN66XX) ||
+(oct->chip_id == OCTEON_CN68XX)) &&
+   ptp_enable) {
+   if (rh->r_dh.has_hwtstamp) {
+   /* timestamp is included from the hardware at
+* the beginning of the packet.
 */
-   memcpy(&ns, (skb->data), sizeof(ns));
-   shhwtstamps = skb_hwtstamps(skb);
-   shhwtstamps->hwtstamp =
-   ns_to_ktime(ns + lio->ptp_adjust);
+   if (ifstate_check
+   (lio, LIO_IFSTATE_RX_TIMESTAMP_ENABLED)) {
+   /* Nanoseconds are in the first 64-bits
+* of the packet.
+*/
+   memcpy(&ns, (skb->data), sizeof(ns));
+   shhwtstamps = skb_hwtstamps(skb);
+   shhwtstamps->hwtstamp =
+   ns_to_ktime(ns +
+   lio->ptp_adjust);
+   }
+   skb_pull(skb, sizeof(ns));
}
-   skb_pull(skb, sizeof(ns));
}
 
skb->protocol = eth_type_trans(skb, skb->dev);
diff --git a/drivers/net/ethernet/cavium/liquidio/liquidio_common.h b/drivers/net/ethernet/cavium/liquidio/liquidio_common.h
index 00b3ef5..84ffcae 100644
--- a/drivers/net/ethernet/cavium/liquidio/liquidio_common.h
+++ b/drivers/net/ethernet/cavium/liquidio/liquidio_common.h
@@ -174,9 +174,11 @@ static inline void add_sg_size(struct octeon_sg_entry *sg_entry,
 /*- End Scatter/Gather ---*/
 
 #define   OCTNET_FRM_PTP_HEADER_SIZE  8
-#define   OCTNET_FRM_HEADER_SIZE 30 /* PTP timestamp + VLAN + Ethernet */
 
-#define   OCTNET_MIN_FRM_SIZE (64  + OCTNET_FRM_PTP_HEADER_SIZE)
+#define   OCTNET_FRM_HEADER_SIZE 22 /* VLAN + Ethernet */
+
+#define   OCTNET_MIN_FRM_SIZE 64
+
 #define   OCTNET_MAX_FRM_SIZE (16000 + OCTNET_FRM_HEADER_SIZE)
 
 #define   OCTNET_DEFAULT_FRM_SIZE (1500 + OCTNET_FRM_HEADER_SIZE)
-- 
1.8.3.1
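
The receive-path change in the patch above boils down to this: when a
hardware timestamp is present, read the 64-bit nanosecond value from the
front of the packet and skip past it. The sketch below shows that step on
a plain byte buffer. strip_timestamp() and TS_HDR_SIZE are hypothetical
names for illustration; the driver itself operates on an skb with memcpy()
and skb_pull().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TS_HDR_SIZE 8 /* same value as OCTNET_FRM_PTP_HEADER_SIZE */

/*
 * If has_ts is set and the frame is long enough, copy the leading 8-byte
 * timestamp into *ns (memcpy avoids unaligned loads), shrink *len, and
 * return a pointer just past the timestamp. Otherwise return the frame
 * unchanged and leave *ns and *len alone.
 */
static const uint8_t *strip_timestamp(const uint8_t *frame, size_t *len,
                                      int has_ts, uint64_t *ns)
{
    assert(frame && len && ns);

    if (!has_ts || *len < TS_HDR_SIZE)
        return frame;

    memcpy(ns, frame, sizeof(*ns));
    *len -= TS_HDR_SIZE;
    return frame + TS_HDR_SIZE;
}
```

The header-size macros in the patch are consistent with this: the new
OCTNET_FRM_HEADER_SIZE of 22 is the old value of 30 minus the 8-byte PTP
header, so the timestamp is no longer counted for cards that do not carry
it.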

[PATCH net-next V2 5/9] liquidio: RX desc alloc changes

2016-06-14 Thread Raghu Vatsavayi

This patch adds page-based buffers for the receive-side descriptors of
the driver and separate free routines for rx and tx buffers.

Signed-off-by: Derek Chickles 
Signed-off-by: Satanand Burla 
Signed-off-by: Felix Manlunas 
Signed-off-by: Raghu Vatsavayi 
---
 drivers/net/ethernet/cavium/liquidio/lio_main.c|  34 +++-
 drivers/net/ethernet/cavium/liquidio/octeon_droq.c | 132 +++--
 drivers/net/ethernet/cavium/liquidio/octeon_droq.h |  18 ++
 .../net/ethernet/cavium/liquidio/octeon_network.h  | 215 +++--
 4 files changed, 316 insertions(+), 83 deletions(-)

diff --git a/drivers/net/ethernet/cavium/liquidio/lio_main.c b/drivers/net/ethernet/cavium/liquidio/lio_main.c
index 0daa89a..3a4f31f 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_main.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_main.c
@@ -1439,7 +1439,7 @@ static void free_netbuf(void *buf)
 
check_txq_state(lio, skb);
 
-   recv_buffer_free((struct sk_buff *)skb);
+   tx_buffer_free(skb);
 }
 
 /**
@@ -1484,7 +1484,7 @@ static void free_netsgbuf(void *buf)
 
check_txq_state(lio, skb); /* mq support: sub-queue state check */
 
-   recv_buffer_free((struct sk_buff *)skb);
+   tx_buffer_free(skb);
 }
 
 /**
@@ -1862,6 +1862,32 @@ liquidio_push_packet(u32 octeon_id,
skb->dev = netdev;
 
skb_record_rx_queue(skb, droq->q_no);
+   if (likely(len > MIN_SKB_SIZE)) {
+   struct octeon_skb_page_info *pg_info;
+   unsigned char *va;
+
+   pg_info = ((struct octeon_skb_page_info *)(skb->cb));
+   if (pg_info->page) {
+   /* For Paged allocation use the frags */
+   va = page_address(pg_info->page) +
+   pg_info->page_offset;
+   memcpy(skb->data, va, MIN_SKB_SIZE);
+   skb_put(skb, MIN_SKB_SIZE);
+   skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags,
+   pg_info->page,
+   pg_info->page_offset +
+   MIN_SKB_SIZE,
+   len - MIN_SKB_SIZE,
+   LIO_RXBUFFER_SZ);
+   }
+   } else {
+   struct octeon_skb_page_info *pg_info =
+   ((struct octeon_skb_page_info *)(skb->cb));
+   skb_copy_to_linear_data(skb, page_address(pg_info->page)
+   + pg_info->page_offset, len);
+   skb_put(skb, len);
+   put_page(pg_info->page);
+   }
 
if (rh->r_dh.has_hwtstamp) {
/* timestamp is included from the hardware at the
@@ -2612,7 +2638,7 @@ static void handle_timestamp(struct octeon_device *oct,
}
 
octeon_free_soft_command(oct, sc);
-   recv_buffer_free(skb);
+   tx_buffer_free(skb);
 }
 
 /* \brief Send a data packet that will be timestamped
@@ -3001,7 +3027,7 @@ lio_xmit_failed:
   iq_no, stats->tx_dropped);
dma_unmap_single(&oct->pci_dev->dev, ndata.cmd.dptr,
 ndata.datasize, DMA_TO_DEVICE);
-   recv_buffer_free(skb);
+   tx_buffer_free(skb);
return NETDEV_TX_OK;
 }
 
diff --git a/drivers/net/ethernet/cavium/liquidio/octeon_droq.c b/drivers/net/ethernet/cavium/liquidio/octeon_droq.c
index 1f648dc..a12beaa 100644
--- a/drivers/net/ethernet/cavium/liquidio/octeon_droq.c
+++ b/drivers/net/ethernet/cavium/liquidio/octeon_droq.c
@@ -151,22 +151,26 @@ octeon_droq_destroy_ring_buffers(struct octeon_device *oct,
 struct octeon_droq *droq)
 {
u32 i;
+   struct octeon_skb_page_info *pg_info;
 
for (i = 0; i < droq->max_count; i++) {
-   if (droq->recv_buf_list[i].buffer) {
-   if (droq->desc_ring) {
-   lio_unmap_ring_info(oct->pci_dev,
-   (u64)droq->
-   desc_ring[i].info_ptr,
-   OCT_DROQ_INFO_SIZE);
-   lio_unmap_ring(oct->pci_dev,
-  (u64)droq->desc_ring[i].
-  buffer_ptr,
-  droq->buffer_size);
-   }
-   recv_buffer_free(droq->recv_buf_list[i].buffer);
-   droq->recv_buf_list[i].buffer = NULL;
-   }
+   pg_info = &droq->recv_buf_list[i].pg_info;
+
+   if (pg_info->dma)
+

[PATCH net-next V2 0/9] liquidio: Updates and Bug fixes

2016-06-14 Thread Raghu Vatsavayi

Dave,

Following are updates for liquidio bug fixes and driver
support for the new firmware interface. These updates are divided
into smaller logical patches, as you mentioned. This set of
nine patches should be applied in the following order, as some of
them depend on earlier patches in the list.

Thanks.

Raghu Vatsavayi (9):
  liquidio: Avoid double free during soft command
  liquidio: Host queue mapping changes
  liquidio:Scatter gather list per IQ
  liquidio:RX queue alloc changes
  liquidio: RX desc alloc changes
  liquidio: Consider PTP for packet size calculations
  liquidio: New driver FW command structure
  liquidio: Replace ifidx for FW commands
  liquidio: Introduce new octeon2/3 header


 drivers/net/ethernet/cavium/liquidio/lio_ethtool.c |  41 +-
 drivers/net/ethernet/cavium/liquidio/lio_main.c| 715 +++--
 .../net/ethernet/cavium/liquidio/liquidio_common.h | 260 ++--
 .../net/ethernet/cavium/liquidio/octeon_device.c   |  70 +-
 .../net/ethernet/cavium/liquidio/octeon_device.h   |   1 +
 drivers/net/ethernet/cavium/liquidio/octeon_droq.c | 169 +++--
 drivers/net/ethernet/cavium/liquidio/octeon_droq.h |  21 +-
 drivers/net/ethernet/cavium/liquidio/octeon_iq.h   |  62 +-
 drivers/net/ethernet/cavium/liquidio/octeon_main.h |  23 +-
 .../net/ethernet/cavium/liquidio/octeon_network.h  | 229 ++-
 drivers/net/ethernet/cavium/liquidio/octeon_nic.c  |  35 +-
 drivers/net/ethernet/cavium/liquidio/octeon_nic.h  | 154 +++--
 .../net/ethernet/cavium/liquidio/request_manager.c |  96 +--
 .../ethernet/cavium/liquidio/response_manager.c|   6 +-
 14 files changed, 1218 insertions(+), 664 deletions(-)

-- 
1.8.3.1

[PATCH net-next V2 4/9] liquidio:RX queue alloc changes

2016-06-14 Thread Raghu Vatsavayi

This patch allocates the rx queues' memory based on the NUMA node and also
uses page-based buffers to improve rx traffic.

Signed-off-by: Derek Chickles 
Signed-off-by: Satanand Burla 
Signed-off-by: Felix Manlunas 
Signed-off-by: Raghu Vatsavayi 
---
 .../net/ethernet/cavium/liquidio/octeon_device.c   | 27 -
 drivers/net/ethernet/cavium/liquidio/octeon_droq.c | 35 --
 drivers/net/ethernet/cavium/liquidio/octeon_main.h | 23 --
 3 files changed, 52 insertions(+), 33 deletions(-)

diff --git a/drivers/net/ethernet/cavium/liquidio/octeon_device.c b/drivers/net/ethernet/cavium/liquidio/octeon_device.c
index 967fe4d..c06807d 100644
--- a/drivers/net/ethernet/cavium/liquidio/octeon_device.c
+++ b/drivers/net/ethernet/cavium/liquidio/octeon_device.c
@@ -783,14 +783,15 @@ int octeon_setup_instr_queues(struct octeon_device *oct)
 
 int octeon_setup_output_queues(struct octeon_device *oct)
 {
-   u32 i, num_oqs = 0;
+   u32 num_oqs = 0;
u32 num_descs = 0;
u32 desc_size = 0;
+   u32 oq_no = 0;
+   int numa_node = cpu_to_node(oq_no % num_online_cpus());
 
+   num_oqs = 1;
/* this causes queue 0 to be default queue */
if (OCTEON_CN6XXX(oct)) {
-   /* CFG_GET_OQ_MAX_BASE_Q(CHIP_FIELD(oct, cn6xxx, conf)); */
-   num_oqs = 1;
num_descs =
CFG_GET_NUM_DEF_RX_DESCS(CHIP_FIELD(oct, cn6xxx, conf));
desc_size =
@@ -798,19 +799,15 @@ int octeon_setup_output_queues(struct octeon_device *oct)
}
 
oct->num_oqs = 0;
+   oct->droq[0] = vmalloc_node(sizeof(*oct->droq[0]), numa_node);
+   if (!oct->droq[0])
+   oct->droq[0] = vmalloc(sizeof(*oct->droq[0]));
+   if (!oct->droq[0])
+   return 1;
 
-   for (i = 0; i < num_oqs; i++) {
-   oct->droq[i] = vmalloc(sizeof(*oct->droq[i]));
-   if (!oct->droq[i])
-   return 1;
-
-   memset(oct->droq[i], 0, sizeof(struct octeon_droq));
-
-   if (octeon_init_droq(oct, i, num_descs, desc_size, NULL))
-   return 1;
-
-   oct->num_oqs++;
-   }
+   if (octeon_init_droq(oct, oq_no, num_descs, desc_size, NULL))
+   return 1;
+   oct->num_oqs++;
 
return 0;
 }
diff --git a/drivers/net/ethernet/cavium/liquidio/octeon_droq.c b/drivers/net/ethernet/cavium/liquidio/octeon_droq.c
index 174072b..1f648dc 100644
--- a/drivers/net/ethernet/cavium/liquidio/octeon_droq.c
+++ b/drivers/net/ethernet/cavium/liquidio/octeon_droq.c
@@ -242,6 +242,8 @@ int octeon_init_droq(struct octeon_device *oct,
struct octeon_droq *droq;
u32 desc_ring_size = 0, c_num_descs = 0, c_buf_size = 0;
u32 c_pkts_per_intr = 0, c_refill_threshold = 0;
+   int orig_node = dev_to_node(&oct->pci_dev->dev);
+   int numa_node = cpu_to_node(q_no % num_online_cpus());
 
dev_dbg(&oct->pci_dev->dev, "%s[%d]\n", __func__, q_no);
 
@@ -261,15 +263,23 @@ int octeon_init_droq(struct octeon_device *oct,
struct octeon_config *conf6x = CHIP_FIELD(oct, cn6xxx, conf);
 
c_pkts_per_intr = (u32)CFG_GET_OQ_PKTS_PER_INTR(conf6x);
-   c_refill_threshold = (u32)CFG_GET_OQ_REFILL_THRESHOLD(conf6x);
+   c_refill_threshold =
+   (u32)CFG_GET_OQ_REFILL_THRESHOLD(conf6x);
+   } else {
+   return 1;
}
 
droq->max_count = c_num_descs;
droq->buffer_size = c_buf_size;
 
desc_ring_size = droq->max_count * OCT_DROQ_DESC_SIZE;
+   set_dev_node(&oct->pci_dev->dev, numa_node);
droq->desc_ring = lio_dma_alloc(oct, desc_ring_size,
(dma_addr_t *)&droq->desc_ring_dma);
+   set_dev_node(&oct->pci_dev->dev, orig_node);
+   if (!droq->desc_ring)
+   droq->desc_ring = lio_dma_alloc(oct, desc_ring_size,
+   (dma_addr_t *)&droq->desc_ring_dma);
 
if (!droq->desc_ring) {
dev_err(&oct->pci_dev->dev,
@@ -283,12 +293,11 @@ int octeon_init_droq(struct octeon_device *oct,
droq->max_count);
 
droq->info_list =
-   cnnic_alloc_aligned_dma(oct->pci_dev,
-   (droq->max_count * OCT_DROQ_INFO_SIZE),
-   &droq->info_alloc_size,
-   &droq->info_base_addr,
-   &droq->info_list_dma);
-
+   cnnic_numa_alloc_aligned_dma((droq->max_count *
+ OCT_DROQ_INFO_SIZE),
+&droq->info_alloc_size,
+&droq->info_base_addr,
+numa_node);
if (!droq->info_list) {
dev_err(&oct->pci_dev->

[PATCH net-next V2 3/9] liquidio:Scatter gather list per IQ

2016-06-14 Thread Raghu Vatsavayi

This patch allocates and manages scatter-gather lists per
input queue (IQ) and removes the queues' interdependence.

Signed-off-by: Derek Chickles 
Signed-off-by: Satanand Burla 
Signed-off-by: Felix Manlunas 
Signed-off-by: Raghu Vatsavayi 
---
 drivers/net/ethernet/cavium/liquidio/lio_main.c| 214 ++---
 .../net/ethernet/cavium/liquidio/octeon_network.h  |   8 +-
 2 files changed, 149 insertions(+), 73 deletions(-)

diff --git a/drivers/net/ethernet/cavium/liquidio/lio_main.c b/drivers/net/ethernet/cavium/liquidio/lio_main.c
index 3477a3c..0daa89a 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_main.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_main.c
@@ -166,6 +166,8 @@ struct octnic_gather {
 *  received from the IP layer.
 */
struct octeon_sg_entry *sg;
+
+   u64 sg_dma_ptr;
 };
 
 /** This structure is used by NIC driver to store information required
@@ -791,64 +793,116 @@ static inline struct list_head *list_delete_head(struct list_head *root)
 }
 
 /**
- * \brief Delete gather list
+ * \brief Delete gather lists
  * @param lio per-network private data
  */
-static void delete_glist(struct lio *lio)
+static void delete_glists(struct lio *lio)
 {
struct octnic_gather *g;
+   int i;
 
-   do {
-   g = (struct octnic_gather *)
-   list_delete_head(&lio->glist);
-   if (g) {
-   if (g->sg)
-   kfree((void *)((unsigned long)g->sg -
-   g->adjust));
-   kfree(g);
-   }
-   } while (g);
+   if (!lio->glist)
+   return;
+
+   for (i = 0; i < lio->linfo.num_txpciq; i++) {
+   do {
+   g = (struct octnic_gather *)
+   list_delete_head(&lio->glist[i]);
+   if (g) {
+   if (g->sg) {
+   dma_unmap_single(&lio->oct_dev->
+pci_dev->dev,
+g->sg_dma_ptr,
+g->sg_size,
+DMA_TO_DEVICE);
+   kfree((void *)((unsigned long)g->sg -
+  g->adjust));
+   }
+   kfree(g);
+   }
+   } while (g);
+   }
+
+   kfree((void *)lio->glist);
 }
 
 /**
- * \brief Setup gather list
+ * \brief Setup gather lists
  * @param lio per-network private data
  */
-static int setup_glist(struct lio *lio)
+static int setup_glists(struct octeon_device *oct, struct lio *lio, int 
num_iqs)
 {
-   int i;
+   int i, j;
struct octnic_gather *g;
 
-   INIT_LIST_HEAD(&lio->glist);
+   lio->glist_lock = kcalloc(num_iqs, sizeof(*lio->glist_lock),
+ GFP_KERNEL);
+   if (!lio->glist_lock)
+   return 1;
 
-   for (i = 0; i < lio->tx_qsize; i++) {
-   g = kzalloc(sizeof(*g), GFP_KERNEL);
-   if (!g)
-   break;
+   lio->glist = kcalloc(num_iqs, sizeof(*lio->glist),
+GFP_KERNEL);
+   if (!lio->glist) {
+   kfree((void *)lio->glist_lock);
+   return 1;
+   }
 
-   g->sg_size =
-   ((ROUNDUP4(OCTNIC_MAX_SG) >> 2) * OCT_SG_ENTRY_SIZE);
+   for (i = 0; i < num_iqs; i++) {
+   int numa_node = cpu_to_node(i % num_online_cpus());
 
-   g->sg = kmalloc(g->sg_size + 8, GFP_KERNEL);
-   if (!g->sg) {
-   kfree(g);
-   break;
+   spin_lock_init(&lio->glist_lock[i]);
+
+   INIT_LIST_HEAD(&lio->glist[i]);
+
+   for (j = 0; j < lio->tx_qsize; j++) {
+   g = kzalloc_node(sizeof(*g), GFP_KERNEL,
+numa_node);
+   if (!g)
+   g = kzalloc(sizeof(*g), GFP_KERNEL);
+   if (!g)
+   break;
+
+   g->sg_size = ((ROUNDUP4(OCTNIC_MAX_SG) >> 2) *
+ OCT_SG_ENTRY_SIZE);
+
+   g->sg = kmalloc_node(g->sg_size + 8,
+GFP_KERNEL, numa_node);
+   if (!g->sg)
+   g->sg = kmalloc(g->sg_size + 8, GFP_KERNEL);
+   if (!g->sg) {
+   kfree(g);
+   break;
+   }
+
+   /* The gather component should be aligned on 64-bit
+* bounda

[PATCH net-next V2 9/9] liquidio: Introduce new octeon2/3 header

2016-06-14 Thread Raghu Vatsavayi

Add support for the new instruction header (ih) for octeon2/octeon3,
along with the corresponding changes.

Signed-off-by: Derek Chickles 
Signed-off-by: Satanand Burla 
Signed-off-by: Felix Manlunas 
Signed-off-by: Raghu Vatsavayi 
---
 drivers/net/ethernet/cavium/liquidio/lio_main.c|  40 +++---
 .../net/ethernet/cavium/liquidio/liquidio_common.h | 134 -
 drivers/net/ethernet/cavium/liquidio/octeon_iq.h   |  41 ++-
 drivers/net/ethernet/cavium/liquidio/octeon_nic.c  |  21 ++--
 drivers/net/ethernet/cavium/liquidio/octeon_nic.h  | 130 +++-
 .../net/ethernet/cavium/liquidio/request_manager.c |  59 +
 .../ethernet/cavium/liquidio/response_manager.c|   6 +-
 7 files changed, 334 insertions(+), 97 deletions(-)

diff --git a/drivers/net/ethernet/cavium/liquidio/lio_main.c 
b/drivers/net/ethernet/cavium/liquidio/lio_main.c
index 4119e70..d0ab97c 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_main.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_main.c
@@ -2658,10 +2658,9 @@ static inline int send_nic_timestamp_pkt(struct 
octeon_device *oct,
 {
int retval;
struct octeon_soft_command *sc;
-   struct octeon_instr_ih *ih;
-   struct octeon_instr_rdp *rdp;
struct lio *lio;
int ring_doorbell;
+   u32 len;
 
lio = finfo->lio;
 
@@ -2683,12 +2682,11 @@ static inline int send_nic_timestamp_pkt(struct 
octeon_device *oct,
sc->callback_arg = finfo->skb;
sc->iq_no = ndata->q_no;
 
-   ih = (struct octeon_instr_ih *)&sc->cmd.ih;
-   rdp = (struct octeon_instr_rdp *)&sc->cmd.rdp;
+   len = (u32)((struct octeon_instr_ih2 *)(&sc->cmd.cmd2.ih2))->dlengsz;
 
ring_doorbell = !xmit_more;
retval = octeon_send_command(oct, sc->iq_no, ring_doorbell, &sc->cmd,
-sc, ih->dlengsz, ndata->reqtype);
+sc, len, ndata->reqtype);
 
if (retval == IQ_SEND_FAILED) {
dev_err(&oct->pci_dev->dev, "timestamp data packet failed 
status: %x\n",
@@ -2715,6 +2713,8 @@ static int liquidio_xmit(struct sk_buff *skb, struct 
net_device *netdev)
struct octnic_data_pkt ndata;
struct octeon_device *oct;
struct oct_iq_stats *stats;
+   struct octeon_instr_irh *irh;
+   union tx_info *tx_info;
int status = 0;
int q_idx = 0, iq_no = 0;
int xmit_more, j;
@@ -2800,18 +2800,18 @@ static int liquidio_xmit(struct sk_buff *skb, struct 
net_device *netdev)
cmdsetup.s.u.datasize = skb->len;
octnet_prepare_pci_cmd(oct, &ndata.cmd, &cmdsetup, tag);
/* Offload checksum calculation for TCP/UDP packets */
-   ndata.cmd.dptr = dma_map_single(&oct->pci_dev->dev,
-   skb->data,
-   skb->len,
-   DMA_TO_DEVICE);
-   if (dma_mapping_error(&oct->pci_dev->dev, ndata.cmd.dptr)) {
+   dptr = dma_map_single(&oct->pci_dev->dev,
+ skb->data,
+ skb->len,
+ DMA_TO_DEVICE);
+   if (dma_mapping_error(&oct->pci_dev->dev, dptr)) {
dev_err(&oct->pci_dev->dev, "%s DMA mapping error 1\n",
__func__);
return NETDEV_TX_BUSY;
}
 
-   finfo->dptr = ndata.cmd.dptr;
-
+   ndata.cmd.cmd2.dptr = dptr;
+   finfo->dptr = dptr;
ndata.reqtype = REQTYPE_NORESP_NET;
 
} else {
@@ -2885,18 +2885,17 @@ static int liquidio_xmit(struct sk_buff *skb, struct 
net_device *netdev)
   g->sg_size, DMA_TO_DEVICE);
dptr = g->sg_dma_ptr;
 
-   finfo->dptr = ndata.cmd.dptr;
+   ndata.cmd.cmd2.dptr = dptr;
+   finfo->dptr = dptr;
finfo->g = g;
 
ndata.reqtype = REQTYPE_NORESP_NET_SG;
}
 
-   if (skb_shinfo(skb)->gso_size) {
-   struct octeon_instr_irh *irh =
-   (struct octeon_instr_irh *)&ndata.cmd.irh;
-   union tx_info *tx_info = (union tx_info *)&ndata.cmd.ossp[0];
+   irh = (struct octeon_instr_irh *)&ndata.cmd.cmd2.irh;
+   tx_info = (union tx_info *)&ndata.cmd.cmd2.ossp[0];
 
-   irh->len = 1;   /* to indicate that ossp[0] contains tx_info */
+   if (skb_shinfo(skb)->gso_size) {
tx_info->s.gso_size = skb_shinfo(skb)->gso_size;
tx_info->s.gso_segs = skb_shinfo(skb)->gso_segs;
}
@@ -2926,8 +2925,9 @@ lio_xmit_failed:
stats->tx_dropped++;
netif_info(lio, tx_err, lio->netdev, "IQ%d Transmit dropped:%llu\n",
   iq_no, stats->tx_dropped);
-   dma_unmap_single(&oct->pci_dev->dev,

[PATCH net-next V2 8/9] liquidio: Replace ifidx for FW commands

2016-06-14 Thread Raghu Vatsavayi

This patch decouples the firmware-side ifidx from the host-side interface
number. It also makes minor name changes to linkinfo struct fields.

Signed-off-by: Derek Chickles 
Signed-off-by: Satanand Burla 
Signed-off-by: Felix Manlunas 
Signed-off-by: Raghu Vatsavayi 
---
 drivers/net/ethernet/cavium/liquidio/lio_ethtool.c |  33 ++--
 drivers/net/ethernet/cavium/liquidio/lio_main.c| 205 +++--
 .../net/ethernet/cavium/liquidio/liquidio_common.h |  55 +++---
 .../net/ethernet/cavium/liquidio/octeon_device.c   |   2 +
 .../net/ethernet/cavium/liquidio/octeon_device.h   |   1 +
 drivers/net/ethernet/cavium/liquidio/octeon_droq.c |   4 +-
 drivers/net/ethernet/cavium/liquidio/octeon_droq.h |   3 +-
 drivers/net/ethernet/cavium/liquidio/octeon_iq.h   |  12 +-
 .../net/ethernet/cavium/liquidio/octeon_network.h  |   6 +-
 drivers/net/ethernet/cavium/liquidio/octeon_nic.c  |  12 +-
 drivers/net/ethernet/cavium/liquidio/octeon_nic.h  |  23 ++-
 .../net/ethernet/cavium/liquidio/request_manager.c |   5 +
 12 files changed, 187 insertions(+), 174 deletions(-)

diff --git a/drivers/net/ethernet/cavium/liquidio/lio_ethtool.c 
b/drivers/net/ethernet/cavium/liquidio/lio_ethtool.c
index 2937c802..4523c86 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_ethtool.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_ethtool.c
@@ -127,7 +127,7 @@ static int lio_get_settings(struct net_device *netdev, 
struct ethtool_cmd *ecmd)
dev_err(&oct->pci_dev->dev, "Unknown link interface 
reported\n");
}
 
-   if (linfo->link.s.status) {
+   if (linfo->link.s.link_up) {
ethtool_cmd_speed_set(ecmd, linfo->link.s.speed);
ecmd->duplex = linfo->link.s.duplex;
} else {
@@ -222,23 +222,20 @@ static int octnet_gpio_access(struct net_device *netdev, 
int addr, int val)
struct lio *lio = GET_LIO(netdev);
struct octeon_device *oct = lio->oct_dev;
struct octnic_ctrl_pkt nctrl;
-   struct octnic_ctrl_params nparams;
int ret = 0;
 
memset(&nctrl, 0, sizeof(struct octnic_ctrl_pkt));
 
nctrl.ncmd.u64 = 0;
nctrl.ncmd.s.cmd = OCTNET_CMD_GPIO_ACCESS;
-   nctrl.ncmd.s.param1 = lio->linfo.ifidx;
-   nctrl.ncmd.s.param2 = addr;
-   nctrl.ncmd.s.param3 = val;
+   nctrl.ncmd.s.param1 = addr;
+   nctrl.ncmd.s.param2 = val;
+   nctrl.iq_no = lio->linfo.txpciq[0].s.q_no;
nctrl.wait_time = 100;
nctrl.netpndev = (u64)netdev;
nctrl.cb_fn = liquidio_link_ctrl_cmd_completion;
 
-   nparams.resp_order = OCTEON_RESP_ORDERED;
-
-   ret = octnet_send_nic_ctrl_pkt(lio->oct_dev, &nctrl, nparams);
+   ret = octnet_send_nic_ctrl_pkt(lio->oct_dev, &nctrl);
if (ret < 0) {
dev_err(&oct->pci_dev->dev, "Failed to configure gpio value\n");
return -EINVAL;
@@ -303,9 +300,10 @@ octnet_mdio45_access(struct lio *lio, int op, int loc, int 
*value)
mdio_cmd->mdio_addr = loc;
if (op)
mdio_cmd->value1 = *value;
-   mdio_cmd->value2 = lio->linfo.ifidx;
octeon_swap_8B_data((u64 *)mdio_cmd, sizeof(struct oct_mdio_cmd) / 8);
 
+   sc->iq_no = lio->linfo.txpciq[0].s.q_no;
+
octeon_prepare_soft_command(oct_dev, sc, OPCODE_NIC, OPCODE_NIC_MDIO45,
0, 0, 0);
 
@@ -503,10 +501,10 @@ static void lio_set_msglevel(struct net_device *netdev, 
u32 msglvl)
if ((msglvl ^ lio->msg_enable) & NETIF_MSG_HW) {
if (msglvl & NETIF_MSG_HW)
liquidio_set_feature(netdev,
-OCTNET_CMD_VERBOSE_ENABLE);
+OCTNET_CMD_VERBOSE_ENABLE, 0);
else
liquidio_set_feature(netdev,
-OCTNET_CMD_VERBOSE_DISABLE);
+OCTNET_CMD_VERBOSE_DISABLE, 0);
}
 
lio->msg_enable = msglvl;
@@ -950,7 +948,6 @@ static int lio_set_settings(struct net_device *netdev, 
struct ethtool_cmd *ecmd)
struct octeon_device *oct = lio->oct_dev;
struct oct_link_info *linfo;
struct octnic_ctrl_pkt nctrl;
-   struct octnic_ctrl_params nparams;
int ret = 0;
 
/* get the link info */
@@ -978,9 +975,9 @@ static int lio_set_settings(struct net_device *netdev, 
struct ethtool_cmd *ecmd)
 
nctrl.ncmd.u64 = 0;
nctrl.ncmd.s.cmd = OCTNET_CMD_SET_SETTINGS;
+   nctrl.iq_no = lio->linfo.txpciq[0].s.q_no;
nctrl.wait_time = 1000;
nctrl.netpndev = (u64)netdev;
-   nctrl.ncmd.s.param1 = lio->linfo.ifidx;
nctrl.cb_fn = liquidio_link_ctrl_cmd_completion;
 
/* Passing the parameters sent by ethtool like Speed, Autoneg & Duplex
@@ -990,19 +987,17 @@ static int lio_set_settings(struct net_device *netdev, 
struct ethtool_cmd *ecmd)
/* Autoneg ON */
nct

Re: [iproute PATCH 1/2] tc: m_action: Fix for field init before memset

2016-06-14 Thread Stephen Hemminger

On Wed, 15 Jun 2016 00:26:08 +0200
Phil Sutter  wrote:

> From: Phil Sutter 
> 
> Initializing req.t.tca_family before setting the whole req object to
> zero using memset does not make sense. Instead initialize the field
> after calling memset.
> 
> Note that this change has no functional effect since AF_UNSPEC is
> defined to 0 anyway, so this fix is a purely cosmetic one.
> 
> Signed-off-by: Phil Sutter 

Instead of moving around the code with memset(), it would make more
sense to change this and other places to use C99 style initializers.
They are safer and the code is cleaner.
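
To make the suggestion concrete, here is a minimal sketch comparing the
two styles. The struct layout and every demo_* name are hypothetical,
simplified stand-ins for the request object in tc/m_action.c, not the
real netlink definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MAX_MSG   64 /* hypothetical stand-in for MAX_MSG */
#define DEMO_AF_UNSPEC 0  /* AF_UNSPEC is defined to 0, as the patch notes */

/* Hypothetical, simplified stand-ins for struct nlmsghdr and struct tcamsg. */
struct demo_hdr {
	uint32_t len;
	uint16_t type;
	uint16_t flags;
};

struct demo_tca {
	uint8_t family;
};

struct demo_req {
	struct demo_hdr n;
	struct demo_tca t;
	char buf[DEMO_MAX_MSG];
};

/* memset style: zero the object first, then assign each field. */
static struct demo_req demo_build_memset(uint16_t cmd, uint16_t flags)
{
	struct demo_req req;

	memset(&req, 0, sizeof(req));
	req.t.family = DEMO_AF_UNSPEC;
	req.n.len = (uint32_t)(sizeof(struct demo_hdr) + sizeof(struct demo_tca));
	req.n.type = cmd;
	req.n.flags = flags;
	return req;
}

/* C99 style: named members are set, every other member is zeroed. */
static struct demo_req demo_build_init(uint16_t cmd, uint16_t flags)
{
	struct demo_req req = {
		.n.len = (uint32_t)(sizeof(struct demo_hdr) + sizeof(struct demo_tca)),
		.n.type = cmd,
		.n.flags = flags,
		.t.family = DEMO_AF_UNSPEC,
	};

	return req;
}

/* Sanity check: both styles yield the same member values. */
static void demo_init_check(void)
{
	struct demo_req a = demo_build_memset(49, 1);
	struct demo_req b = demo_build_init(49, 1);

	assert(a.n.len == b.n.len);
	assert(a.n.type == b.n.type && a.n.flags == b.n.flags);
	assert(a.t.family == b.t.family);
	assert(memcmp(a.buf, b.buf, sizeof(a.buf)) == 0);
}
```

With the initializer there is no ordering between a memset() and the
field assignments to get wrong, which is the class of mistake the patch
above cleans up. The check compares members rather than whole objects
because padding bytes are not guaranteed to match.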

Re: [PATCH 2/2] dt: bindings: Add bindings for Cirrus Logic CS89x0 ethernet chip

2016-06-14 Thread Rob Herring

On Mon, Jun 13, 2016 at 06:52:17PM +0300, Alexander Shiyan wrote:
> Add device tree binding documentation details for Cirrus Logic
> CS8900/CS8920 ethernet chip.
> 
> Signed-off-by: Alexander Shiyan 
> ---
>  Documentation/devicetree/bindings/net/cirrus,cs89x0.txt | 13 +
>  1 file changed, 13 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/net/cirrus,cs89x0.txt

Acked-by: Rob Herring

[PATCH net-next] ila: Fix checksum neutral mapping

2016-06-14 Thread Tom Herbert

The algorithm for checksum neutral mapping is incorrect. This problem
was being hidden since we were previously always performing checksum
offload on the translated addresses and only with IPv6 HW csum.
Enabling an ILA router shows the issue.

Corrected algorithm:

old_loc is the original locator in the packet, new_loc is the value
to overwrite with and is found in the lookup table. old_flag is
the old flag value (zero or CSUM_NEUTRAL_FLAG) and new_flag is
then (old_flag ^ CSUM_NEUTRAL_FLAG) & CSUM_NEUTRAL_FLAG.

Need SUM(new_loc + new_flag + diff) == SUM(old_loc + old_flag) for
checksum neutral translation.

Solving for diff gives:

diff = (old_loc - new_loc) + (old_flag - new_flag)

compute_csum_diff8(new_loc, old_loc) gives old_loc - new_loc

If old_flag is set
   old_flag - new_flag = old_flag = CSUM_NEUTRAL_FLAG
Else
   old_flag - new_flag = -new_flag = ~CSUM_NEUTRAL_FLAG

Tested:
  - Implemented a user space program that creates random addresses
and random locators to overwrite. Compares the checksum over
the address before and after translation (must always be equal)
  - Enabled ILA router and showed proper operation.

Signed-off-by: Tom Herbert 
---
 net/ipv6/ila/ila_common.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/net/ipv6/ila/ila_common.c b/net/ipv6/ila/ila_common.c
index b3d00be..ec9efbc 100644
--- a/net/ipv6/ila/ila_common.c
+++ b/net/ipv6/ila/ila_common.c
@@ -34,12 +34,12 @@ static void ila_csum_do_neutral(struct ila_addr *iaddr,
if (p->locator_match.v64) {
diff = p->csum_diff;
} else {
-   diff = compute_csum_diff8((__be32 *)iaddr,
- (__be32 *)&p->locator);
+   diff = compute_csum_diff8((__be32 *)&p->locator,
+ (__be32 *)iaddr);
}
 
fval = (__force __wsum)(ila_csum_neutral_set(iaddr->ident) ?
-   ~CSUM_NEUTRAL_FLAG : CSUM_NEUTRAL_FLAG);
+   CSUM_NEUTRAL_FLAG : ~CSUM_NEUTRAL_FLAG);
 
diff = csum_add(diff, fval);
 
@@ -140,8 +140,8 @@ void ila_init_saved_csum(struct ila_params *p)
return;
 
p->csum_diff = compute_csum_diff8(
-   (__be32 *)&p->locator_match,
-   (__be32 *)&p->locator);
+   (__be32 *)&p->locator,
+   (__be32 *)&p->locator_match);
 }
 
 static int __init ila_init(void)
-- 
2.8.0.rc2
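
The corrected algorithm can be tried out in user space, in the spirit of
the test program mentioned in the description. The following is only a
sketch under stated assumptions: the address is modeled as eight 16-bit
words (words 0..3 the locator, 4..7 the identifier), the flag bit
position DEMO_NEUTRAL_FLAG and the choice of word 7 to absorb the
adjustment are hypothetical, and all demo_* helpers are illustrative
rather than the kernel's functions.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_NEUTRAL_FLAG 0x1000u /* hypothetical flag bit in word 4 */

/* Fold a 32-bit accumulator into a 16-bit one's-complement sum. */
static uint16_t demo_fold(uint32_t s)
{
	while (s >> 16)
		s = (s & 0xffffu) + (s >> 16);
	return (uint16_t)s;
}

/* One's-complement addition of two 16-bit values. */
static uint16_t demo_add(uint16_t a, uint16_t b)
{
	return demo_fold((uint32_t)a + b);
}

/* One's-complement sum over n 16-bit words. */
static uint16_t demo_sum(const uint16_t *w, int n)
{
	uint32_t s = 0;
	int i;

	for (i = 0; i < n; i++)
		s += w[i];
	return demo_fold(s);
}

/* 0x0000 and 0xffff both represent zero in one's-complement arithmetic. */
static int demo_sum_equal(uint16_t a, uint16_t b)
{
	return (a % 0xffffu) == (b % 0xffffu);
}

/* Overwrite the locator and keep the sum over the whole address unchanged. */
static void demo_translate(uint16_t addr[8], const uint16_t new_loc[4])
{
	uint16_t old_sum = demo_sum(addr, 4);
	uint16_t new_sum = demo_sum(new_loc, 4);
	/* old_loc - new_loc: subtract by adding the complement */
	uint16_t diff = demo_add(old_sum, (uint16_t)~new_sum);
	uint16_t fval;
	int i;

	/* old_flag - new_flag, following the two cases in the description */
	if (addr[4] & DEMO_NEUTRAL_FLAG)
		fval = (uint16_t)DEMO_NEUTRAL_FLAG;
	else
		fval = (uint16_t)~DEMO_NEUTRAL_FLAG;
	diff = demo_add(diff, fval);

	addr[7] = demo_add(addr[7], diff);            /* absorb the adjustment */
	addr[4] = (uint16_t)(addr[4] ^ DEMO_NEUTRAL_FLAG); /* toggle the flag */
	for (i = 0; i < 4; i++)
		addr[i] = new_loc[i];
}

/* Sanity check: the sum is preserved when the flag goes from clear to set. */
static void demo_ila_check(void)
{
	uint16_t addr[8] = { 0x2001, 0x0db8, 0x0001, 0x0002, 0, 0, 0, 0x0001 };
	const uint16_t loc[4] = { 0xfd00, 0x1234, 0x5678, 0x9abc };
	uint16_t before = demo_sum(addr, 8);

	demo_translate(addr, loc);
	assert(demo_sum_equal(before, demo_sum(addr, 8)));
	assert(addr[4] & DEMO_NEUTRAL_FLAG);
}
```

Swapping the two fval branches, which is what the pre-patch code
effectively did, makes the comparison in demo_ila_check() fail for this
input.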

Re: [PATCH net-next 00/11] bnxt_en: Updates for net-next.

2016-06-14 Thread David Miller

From: Michael Chan 
Date: Mon, 13 Jun 2016 02:25:27 -0400

> -Add default VLAN support for VFs.
> -Add NPAR (NIC partitioning) support.
> -Add support for new device 5731x and 5741x. GRO logic is different.
> -Support new ETHTOOL_{G|S}LINKSETTINGS.

Series applied, thanks.

Re: [PATCH v2 net-next v2 12/12] net: dsa: mv88e6xxx: add addressing mode to info

2016-06-14 Thread Vivien Didelot

Hi,

Andrew Lunn  writes:

> On Tue, Jun 14, 2016 at 06:24:17PM -0400, Vivien Didelot wrote:
>> Hi Andrew,
>> 
>> Andrew Lunn  writes:
>> 
>> >> - ret = mdiobus_read_nested(bus, addr, reg);
>> >> + ret = mdiobus_read_nested(bus, sw_addr + addr, reg);
>> >>   if (ret < 0)
>> >>   return ret;
>> >
>> > If we are doing direct access, doesn't it mean sw_addr is 0?
>> >
>> > So isn't this pointless?
>> 
>> 6060 has no indirect access and directly responds to 16 SMI addresses,
>> regardless of its chip address which can be strapped to either 0 or 16.
>
> Ah! O.K.
>
> wnr854t-setup.c uses 0.
> rd88f6183ap-ge-setup.c uses 0.
> wrt350n-v2-setup.c uses 0.
> rd88f5181l-fxo-setup.c uses 0.
> rd88f5181l-ge-setup.c uses 0.
> mach-bf518/boards/ezbrd.c uses 0.
>
> The 6060 is a very old device. I doubt we will get any new boards
> contributed using it. We are also going to have trouble actually
> finding a device with one in order to test a merged mv88e6xxx and
> mv88e6060 driver.
>
> So I say we ignore the possibility of a 6060 on 16, until one really
> comes along.

Sounds good to me!

Thanks,

Vivien

Re: [PATCH v4] udp reuseport: fix packet of same flow hashed to different socket

2016-06-14 Thread David Miller

From: Su Xuemin 
Date: Mon, 13 Jun 2016 11:02:50 +0800

> From: "Su, Xuemin" 
> 
> There is a corner case in which udp packets belonging to the same
> flow are hashed to different sockets when hslot->count changes from 10
> to 11:
> 
> 1) When hslot->count <= 10, __udp_lib_lookup() searches udp_table->hash,
> and always passes 'daddr' to udp_ehashfn().
> 
> 2) When hslot->count > 10, __udp_lib_lookup() searches udp_table->hash2,
> but may pass 'INADDR_ANY' to udp_ehashfn() if the sockets are bound to
> INADDR_ANY instead of some specific addr.
> 
> That means when hslot->count changes from 10 to 11, the hash calculated by
> udp_ehashfn() also changes, and the udp packets belonging to the same
> flow will be hashed to a different socket.
> 
> This is easily reproduced:
> 1) Create 10 udp sockets and bind all of them to 0.0.0.0:40000.
> 2) From the same host send udp packets to 127.0.0.1:40000, record the
> socket index which receives the packets.
> 3) Create 1 more udp socket and bind it to 0.0.0.0:44096. The number 44096
> is 40000 + UDP_HASH_SIZE(4096), this makes the new socket put into the
> same hslot as the aforementioned 10 sockets, and makes the hslot->count
> change from 10 to 11.
> 4) From the same host send udp packets to 127.0.0.1:40000, and the socket
> index which receives the packets will be different from the one received
> in step 2.
> This should not happen as the socket bound to 0.0.0.0:44096 should not
> change the behavior of the sockets bound to 0.0.0.0:40000.
> 
> It's the same case for IPv6, and this patch also fixes that.
> 
> Signed-off-by: Su, Xuemin 
> Signed-off-by: Eric Dumazet 

Applied and queued up for -stable, thanks.
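
The port arithmetic in the reproduction steps can be checked with a
small model. This is a sketch under assumptions: the slot is taken to be
the port plus a per-namespace mix value, masked by the table size, with
the mix assumed to be 0 here; the first port is taken as 44096 - 4096 =
40000; all demo_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_UDP_HASH_SIZE   4096u /* UDP_HASH_SIZE(4096) from the description */
#define DEMO_HASH2_THRESHOLD 10u   /* above this count, hash2 is searched */

/* Slot index for a port in a power-of-two sized table. */
static unsigned int demo_port_slot(uint16_t port, unsigned int mix)
{
	return ((unsigned int)port + mix) & (DEMO_UDP_HASH_SIZE - 1u);
}

/* Which table the lookup walks, per cases 1) and 2) of the description. */
static int demo_uses_hash2(unsigned int hslot_count)
{
	return hslot_count > DEMO_HASH2_THRESHOLD;
}

/* Sanity check: both ports share a slot, and the 11th socket flips tables. */
static void demo_udp_check(void)
{
	assert(demo_port_slot(40000, 0) == demo_port_slot(44096, 0));
	assert(!demo_uses_hash2(10));
	assert(demo_uses_hash2(11));
}
```

Under this model both ports land in the same slot, so binding the
eleventh socket to the second port is enough to push the slot's count
past the threshold even though it never receives the test traffic.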

[net PATCH] i40e/i40evf: Fix i40e_rx_checksum

2016-06-14 Thread Alexander Duyck

There are a couple of issues I found in i40e_rx_checksum while doing some
recent testing.  As a result I have found the Rx checksum logic is pretty
much broken and returning that the checksum is valid for tunnels in cases
where it is not.

First, the inner types are not the correct values to use to test whether a
tunnel is present.  In addition, the inner protocol types are not a bitmask,
so performing an OR of the values doesn't make sense.  I have instead
changed the code so that the inner protocol types are used to determine
whether we report CHECKSUM_UNNECESSARY.  For anything that does not end in
UDP, TCP, or SCTP it doesn't make much sense to report a checksum offload
since it won't contain a checksum anyway.

This leaves us with the need to set the csum_level based on some value.
For that purpose I am using the tunnel_type field.  If the tunnel type is
GRENAT or greater then this means we have a GRE or UDP tunnel with an inner
header.  In the case of GRE or UDP we will have a possible checksum present
so for this reason it should be safe to set the csum_level to 1 to indicate
that we are reporting the state of the inner header.
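
The decision logic described above can be modeled outside the driver. In
this sketch the enum names and their numeric values are hypothetical
stand-ins (only the ordering "GRENAT or greater" and the TCP/UDP/SCTP
rule come from the description), and demo_old_tunnel_test() imitates the
removed bitmask test to show how it misfires.

```c
#include <assert.h>

/* Hypothetical stand-ins for the decoded packet type fields. */
enum demo_tunnel {
	DEMO_TUNNEL_NONE,
	DEMO_TUNNEL_IP_IP,
	DEMO_TUNNEL_IP_GRENAT,
	DEMO_TUNNEL_IP_GRENAT_MAC
};

enum demo_inner_prot {
	DEMO_PROT_NONE,
	DEMO_PROT_UDP,
	DEMO_PROT_TCP,
	DEMO_PROT_SCTP,
	DEMO_PROT_ICMP
};

enum demo_summed {
	DEMO_CHECKSUM_NONE,
	DEMO_CHECKSUM_UNNECESSARY
};

struct demo_rx_result {
	enum demo_summed ip_summed;
	int csum_level;
};

/* New logic: level from the tunnel type, offload report from the protocol. */
static struct demo_rx_result demo_rx_checksum(enum demo_tunnel tunnel,
					      enum demo_inner_prot prot)
{
	struct demo_rx_result r = { DEMO_CHECKSUM_NONE, 0 };

	if (tunnel >= DEMO_TUNNEL_IP_GRENAT)
		r.csum_level = 1;

	switch (prot) {
	case DEMO_PROT_TCP:
	case DEMO_PROT_UDP:
	case DEMO_PROT_SCTP:
		r.ip_summed = DEMO_CHECKSUM_UNNECESSARY;
		break;
	default:
		break;
	}
	return r;
}

/* Old logic: treats enumeration values as if they were bit flags. */
static int demo_old_tunnel_test(enum demo_inner_prot prot)
{
	return (prot & (DEMO_PROT_TCP | DEMO_PROT_UDP | DEMO_PROT_SCTP)) != 0;
}

/* Sanity check: a plain TCP frame is no longer reported as tunneled. */
static void demo_i40e_check(void)
{
	struct demo_rx_result r = demo_rx_checksum(DEMO_TUNNEL_NONE, DEMO_PROT_TCP);

	assert(demo_old_tunnel_test(DEMO_PROT_TCP)); /* old test said "tunnel" */
	assert(r.csum_level == 0);
	assert(r.ip_summed == DEMO_CHECKSUM_UNNECESSARY);
}
```

The old test looked only at the inner protocol, so it could not tell a
plain TCP frame from a tunneled one; keying csum_level off the tunnel
type separates the two questions.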

Signed-off-by: Alexander Duyck 
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   |   30 ++---
 drivers/net/ethernet/intel/i40evf/i40e_txrx.c |   30 ++---
 2 files changed, 34 insertions(+), 26 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c 
b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 55f151fca1dc..a8868e1bf832 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -1280,8 +1280,8 @@ static inline void i40e_rx_checksum(struct i40e_vsi *vsi,
union i40e_rx_desc *rx_desc)
 {
struct i40e_rx_ptype_decoded decoded;
-   bool ipv4, ipv6, tunnel = false;
u32 rx_error, rx_status;
+   bool ipv4, ipv6;
u8 ptype;
u64 qword;
 
@@ -1336,19 +1336,23 @@ static inline void i40e_rx_checksum(struct i40e_vsi 
*vsi,
if (rx_error & BIT(I40E_RX_DESC_ERROR_PPRS_SHIFT))
return;
 
-   /* The hardware supported by this driver does not validate outer
-* checksums for tunneled VXLAN or GENEVE frames.  I don't agree
-* with it but the specification states that you "MAY validate", it
-* doesn't make it a hard requirement so if we have validated the
-* inner checksum report CHECKSUM_UNNECESSARY.
+   /* If there is an outer header present that might contain a checksum
+* we need to bump the checksum level by 1 to reflect the fact that
+* we are indicating we validated the inner checksum.
 */
-   if (decoded.inner_prot & (I40E_RX_PTYPE_INNER_PROT_TCP |
- I40E_RX_PTYPE_INNER_PROT_UDP |
- I40E_RX_PTYPE_INNER_PROT_SCTP))
-   tunnel = true;
-
-   skb->ip_summed = CHECKSUM_UNNECESSARY;
-   skb->csum_level = tunnel ? 1 : 0;
+   if (decoded.tunnel_type >= I40E_RX_PTYPE_TUNNEL_IP_GRENAT)
+   skb->csum_level = 1;
+
+   /* Only report checksum unnecessary for TCP, UDP, or SCTP */
+   switch (decoded.inner_prot) {
+   case I40E_RX_PTYPE_INNER_PROT_TCP:
+   case I40E_RX_PTYPE_INNER_PROT_UDP:
+   case I40E_RX_PTYPE_INNER_PROT_SCTP:
+   skb->ip_summed = CHECKSUM_UNNECESSARY;
+   /* fall though */
+   default:
+   break;
+   }
 
return;
 
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c 
b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
index be99189da925..79d99cd91b24 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
@@ -752,8 +752,8 @@ static inline void i40e_rx_checksum(struct i40e_vsi *vsi,
union i40e_rx_desc *rx_desc)
 {
struct i40e_rx_ptype_decoded decoded;
-   bool ipv4, ipv6, tunnel = false;
u32 rx_error, rx_status;
+   bool ipv4, ipv6;
u8 ptype;
u64 qword;
 
@@ -808,19 +808,23 @@ static inline void i40e_rx_checksum(struct i40e_vsi *vsi,
if (rx_error & BIT(I40E_RX_DESC_ERROR_PPRS_SHIFT))
return;
 
-   /* The hardware supported by this driver does not validate outer
-* checksums for tunneled VXLAN or GENEVE frames.  I don't agree
-* with it but the specification states that you "MAY validate", it
-* doesn't make it a hard requirement so if we have validated the
-* inner checksum report CHECKSUM_UNNECESSARY.
+   /* If there is an outer header present that might contain a checksum
+* we need to bump the checksum level by 1 to reflect the fact that
+* we are indicating we validated the inner checksum.
 */
-   if (decoded.inner_prot & (I40E_RX_PTYPE_INNER_PROT_TCP |
- I40E_RX_PTYPE_INNER_

Re: [PATCH v2 net-next v2 12/12] net: dsa: mv88e6xxx: add addressing mode to info

2016-06-14 Thread Andrew Lunn

On Tue, Jun 14, 2016 at 06:24:17PM -0400, Vivien Didelot wrote:
> Hi Andrew,
> 
> Andrew Lunn  writes:
> 
> >> -  ret = mdiobus_read_nested(bus, addr, reg);
> >> +  ret = mdiobus_read_nested(bus, sw_addr + addr, reg);
> >>if (ret < 0)
> >>return ret;
> >
> > If we are doing direct access, doesn't it mean sw_addr is 0?
> >
> > So isn't this pointless?
> 
> 6060 has no indirect access and directly responds to 16 SMI addresses,
> regardless of its chip address which can be strapped to either 0 or 16.

Ah! O.K.

wnr854t-setup.c uses 0.
rd88f6183ap-ge-setup.c uses 0.
wrt350n-v2-setup.c uses 0.
rd88f5181l-fxo-setup.c uses 0.
rd88f5181l-ge-setup.c uses 0.
mach-bf518/boards/ezbrd.c uses 0.

The 6060 is a very old device. I doubt we will get any new boards
contributed using it. We are also going to have trouble actually
finding a device with one in order to test a merged mv88e6xxx and
mv88e6060 driver.

So I say we ignore the possibility of a 6060 on 16, until one really
comes along.

> Question 2) is MV88E6XXX_FLAG_MULTI_CHIP confusing?

No, I think it is fine.

Andrew

Re: [PATCH 3/3] net: hisilicon: Add Fast Ethernet MAC driver

2016-06-14 Thread Rob Herring

On Mon, Jun 13, 2016 at 02:07:56PM +0800, Dongpo Li wrote:
> This patch adds the Hisilicon Fast Ethernet MAC(FEMAC) driver.
> The FEMAC supports max speed 100Mbps and has been used in many
> Hisilicon SoC.
> 
> Reviewed-by: Jiancheng Xue 
> Signed-off-by: Dongpo Li 
> ---
>  .../devicetree/bindings/net/hisilicon-femac.txt|   40 +
>  drivers/net/ethernet/hisilicon/Kconfig |   12 +
>  drivers/net/ethernet/hisilicon/Makefile|1 +
>  drivers/net/ethernet/hisilicon/hisi_femac.c| 1015 
> 
>  4 files changed, 1068 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/net/hisilicon-femac.txt
>  create mode 100644 drivers/net/ethernet/hisilicon/hisi_femac.c
> 
> diff --git a/Documentation/devicetree/bindings/net/hisilicon-femac.txt 
> b/Documentation/devicetree/bindings/net/hisilicon-femac.txt
> new file mode 100644
> index 000..b953a56
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/net/hisilicon-femac.txt
> @@ -0,0 +1,40 @@
> +Hisilicon Fast Ethernet MAC controller
> +
> +Required properties:
> +- compatible: should be "hisilicon,hisi-femac" and one of the following:

This compatible seems a bit pointless. The following 2 are generic 
enough.

> + * "hisilicon,hisi-femac-v1"
> + * "hisilicon,hisi-femac-v2"

SoC specific compatible strings in addition to these please.

> +- reg: specifies base physical address(s) and size of the device registers.
> +  The first region is the MAC core register base and size.
> +  The second region is the global MAC control register.
> +- interrupts: should contain the MAC interrupt.
> +- clocks: clock phandle and specifier pair.

How many clocks?

> +- resets: should contain the phandle to the MAC reset signal(required) and
> + the PHY reset signal(optional).
> +- reset-names: should contain the reset signal name "mac_reset"(required)
> + and "phy_reset"(optional).
> +- mac-address: see ethernet.txt [1].
> +- phy-mode: see ethernet.txt [1].
> +- phy-handle: see ethernet.txt [1].
> +- hisilicon,phy-reset-delays: triplet of delays if PHY reset signal given.
> + The 1st cell is reset pre-delay in micro seconds.
> + The 2nd cell is reset pulse in micro seconds.
> + The 3rd cell is reset post-delay in micro seconds.

Add standard unit suffixes.

> +
> +[1] Documentation/devicetree/bindings/net/ethernet.txt
> +
> +Example:
> + hisi_femac: ethernet@1009 {
> + compatible = "hisilicon,hisi-femac-v2", "hisilicon,hisi-femac";
> + reg = <0x1009 0x1000>,<0x10091300 0x200>;
> + interrupts = <12>;
> + clocks = <&crg HI3518EV200_ETH_CLK>;
> + resets = <&crg 0xec 0>,
> + <&crg 0xec 3>;
> + reset-names = "mac_reset",
> + "phy_reset";
> + mac-address = [00 00 00 00 00 00];
> + phy-mode = "mii";
> + phy-handle = <&phy0>;
> + hisilicon,phy-reset-delays = <1 2 2>;
> + };

[iproute PATCH 2/2] tc: m_action: Drop unused variable nladdr in tc_action_gd()

2016-06-14 Thread Phil Sutter

From: Phil Sutter 

This has been there since the introduction of tc/m_action.c back in 2004
and was apparently never in use.

Signed-off-by: Phil Sutter 
---
 tc/m_action.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/tc/m_action.c b/tc/m_action.c
index 1b34370f4ec09..fd324090db1f2 100644
--- a/tc/m_action.c
+++ b/tc/m_action.c
@@ -386,7 +386,6 @@ static int tc_action_gd(int cmd, unsigned int flags, int 
*argc_p, char ***argv_p
int prio = 0;
int ret = 0;
__u32 i;
-   struct sockaddr_nl nladdr;
struct rtattr *tail;
struct rtattr *tail2;
struct nlmsghdr *ans = NULL;
@@ -400,9 +399,6 @@ static int tc_action_gd(int cmd, unsigned int flags, int 
*argc_p, char ***argv_p
memset(&req, 0, sizeof(req));
req.t.tca_family = AF_UNSPEC;
 
-   memset(&nladdr, 0, sizeof(nladdr));
-   nladdr.nl_family = AF_NETLINK;
-
req.n.nlmsg_len = NLMSG_LENGTH(sizeof(struct tcamsg));
req.n.nlmsg_flags = NLM_F_REQUEST|flags;
req.n.nlmsg_type = cmd;
-- 
2.8.2

[iproute PATCH 1/2] tc: m_action: Fix for field init before memset

2016-06-14 Thread Phil Sutter

From: Phil Sutter 

Initializing req.t.tca_family before setting the whole req object to
zero using memset does not make sense. Instead initialize the field
after calling memset.

Note that this change has no functional effect since AF_UNSPEC is
defined to 0 anyway, so this fix is a purely cosmetic one.

Signed-off-by: Phil Sutter 
---
 tc/m_action.c | 11 +++
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/tc/m_action.c b/tc/m_action.c
index c416d98a775a6..1b34370f4ec09 100644
--- a/tc/m_action.c
+++ b/tc/m_action.c
@@ -397,9 +397,8 @@ static int tc_action_gd(int cmd, unsigned int flags, int 
*argc_p, char ***argv_p
charbuf[MAX_MSG];
} req;
 
-   req.t.tca_family = AF_UNSPEC;
-
memset(&req, 0, sizeof(req));
+   req.t.tca_family = AF_UNSPEC;
 
memset(&nladdr, 0, sizeof(nladdr));
nladdr.nl_family = AF_NETLINK;
@@ -502,10 +501,8 @@ static int tc_action_modify(int cmd, unsigned int flags, 
int *argc_p, char ***ar
charbuf[MAX_MSG];
} req;
 
-   req.t.tca_family = AF_UNSPEC;
-
memset(&req, 0, sizeof(req));
-
+   req.t.tca_family = AF_UNSPEC;
req.n.nlmsg_len = NLMSG_LENGTH(sizeof(struct tcamsg));
req.n.nlmsg_flags = NLM_F_REQUEST|flags;
req.n.nlmsg_type = cmd;
@@ -541,10 +538,8 @@ static int tc_act_list_or_flush(int argc, char **argv, int 
event)
charbuf[MAX_MSG];
} req;
 
-   req.t.tca_family = AF_UNSPEC;
-
memset(&req, 0, sizeof(req));
-
+   req.t.tca_family = AF_UNSPEC;
req.n.nlmsg_len = NLMSG_LENGTH(sizeof(struct tcamsg));
 
tail = NLMSG_TAIL(&req.n);
-- 
2.8.2

RE: [PATCH v6] r8152: Add support for setting pass through MAC address on RTL8153-AD

2016-06-14 Thread Mario_Limonciello

> -Original Message-
> From: David Miller [mailto:da...@davemloft.net]
> Sent: Tuesday, June 14, 2016 1:35 PM
> To: pali.ro...@gmail.com
> Cc: gre...@linuxfoundation.org; and...@lunn.ch; Limonciello, Mario
> ; hayesw...@realtek.com; linux-
> ker...@vger.kernel.org; netdev@vger.kernel.org; linux-
> u...@vger.kernel.org; anthony.w...@canonical.com
> Subject: Re: [PATCH v6] r8152: Add support for setting pass through MAC
> address on RTL8153-AD
> 
> From: Pali Rohár 
> Date: Tue, 14 Jun 2016 18:47:36 +0200
> 
> > You have never seen two ethernet cards with same MAC addresses? Right I
> > have not seen two USB, but there is non zero chance that could happen.
> 
> It would be an error scenario, and something to be avoided.
> 
> It is a valid and correct assumption that one is able to put
> several devices at the same time on the same physical network
> and expect it to work.
> 
> The behavior added by the change in question invalidates that.
> 
> I'm trying to consider the long term aspects of this, which is that if
> more devices adopt this scheme we're in trouble if we blindly
> interpret the MAC address in this way.
> 

Do you mean if other manufacturers start to ship devices with
RTL8153-ADs with this pass-through bit set and people start to
mix and match?

> This firmware MAC property facility seems to be designed with only an
> extremely narrow use case being considered.

Yes, as I understand it this is the reason that it's only on such specific
devices that the MAC address pass-through bit is actually set on the efuse.

Re: [PATCH v2 net-next v2 11/12] net: dsa: mv88e6xxx: add an SMI ops structure

2016-06-14 Thread Vivien Didelot

Hi Andrew,

Andrew Lunn  writes:

>> +struct mv88e6xxx_smi_ops {
>> +int (*read)(struct mii_bus *bus, int sw_addr,
>> +int addr, int reg, u16 *val);
>> +int (*write)(struct mii_bus *bus, int sw_addr,
>> + int addr, int reg, u16 val);
>> +};
>> +
>
> I think this API would be better if it used ps, not bus and sw_addr.
>
> The only problem is the very first read to get the switch ID. I would
> add one more layer in between, so that you can call the lowest level
> functions without having a ps structure.

That's why I keep it simple for the moment.

The low-level API using ps is now _mv88e6xxx_reg_{read,write}. I can
rename them to mv88e6xxx_smi_{read,write} in v3 or later.

Thanks,

Vivien

Re: [PATCH 1/3] net: Add MDIO bus driver for the Hisilicon FEMAC

2016-06-14 Thread Rob Herring

On Mon, Jun 13, 2016 at 02:07:54PM +0800, Dongpo Li wrote:
> This patch adds a separate driver for the MDIO interface of the
> Hisilicon Fast Ethernet MAC.
> 
> Reviewed-by: Jiancheng Xue 
> Signed-off-by: Dongpo Li 
> ---
>  .../bindings/net/hisilicon-femac-mdio.txt  |  22 +++

Acked-by: Rob Herring 

>  drivers/net/phy/Kconfig|   8 +
>  drivers/net/phy/Makefile   |   1 +
>  drivers/net/phy/mdio-hisi-femac.c  | 165 
> +
>  4 files changed, 196 insertions(+)
>  create mode 100644 
> Documentation/devicetree/bindings/net/hisilicon-femac-mdio.txt
>  create mode 100644 drivers/net/phy/mdio-hisi-femac.c

Re: [PATCH] net: smsc911x: If PHY doesn't have an interrupt then POLL

2016-06-14 Thread Andrew Lunn

>   This was DT as well with a recent fedora/NetworkManager. It
> actually seems to be timing related to how fast the device gets
> configured after the initial phy probe. There is something like a 1
> second window or so where it will work, but if network manager takes
> longer than that, the link state drops and cannot be brought back up
> unless the cable is pulled, replugged while the netdevice is being
> restarted.

Ah!

There is another bug in the driver. The phy is connected to the netdev
after calling register_netdev(). You are supposed to do it before,
because the interface is usable, and can be used, directly after the
register.

Move the call to smsc911x_mii_init() before the register_netdev().

 Andrew
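
The ordering problem described above can be illustrated with a small userspace model. This is only a sketch: struct fake_netdev, fake_mii_init(), fake_open() and fake_register_netdev() are hypothetical stand-ins, not the smsc911x or kernel API, and the model assumes the worst case where the interface is opened the instant registration completes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a network device; not the kernel's struct net_device. */
struct fake_netdev {
    bool phy_attached;  /* set by fake_mii_init() */
    bool registered;    /* set by fake_register_netdev() */
    bool link_usable;   /* outcome of the first open after registration */
};

/* Models the PHY attach step (the role smsc911x_mii_init() plays in the thread). */
static void fake_mii_init(struct fake_netdev *dev)
{
    assert(dev != NULL);
    dev->phy_attached = true;
}

/* Models an open callback: link management only works if a PHY is attached. */
static void fake_open(struct fake_netdev *dev)
{
    dev->link_usable = dev->phy_attached;
}

/*
 * Models registration. Assumption for this sketch: userspace opens the
 * device immediately, so the open happens before this function returns.
 */
static void fake_register_netdev(struct fake_netdev *dev)
{
    assert(dev != NULL);
    dev->registered = true;
    fake_open(dev);
}

/* Ordering criticised in the thread: register first, attach the PHY afterwards. */
static bool probe_register_first(void)
{
    struct fake_netdev dev = { false, false, false };

    fake_register_netdev(&dev);
    fake_mii_init(&dev);
    return dev.link_usable;
}

/* Suggested ordering: attach the PHY, then register. */
static bool probe_phy_first(void)
{
    struct fake_netdev dev = { false, false, false };

    fake_mii_init(&dev);
    fake_register_netdev(&dev);
    return dev.link_usable;
}
```

In this model probe_register_first() leaves the link unusable while probe_phy_first() does not. On real hardware the failure is timing dependent rather than deterministic, as the report quoted above suggests.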

[iproute PATCH 0/2] Two cosmetic fixes in tc/m_action.c

2016-06-14 Thread Phil Sutter

Found these when doing something else. Shouldn't cause any functional
change.

Phil Sutter (2):
  tc: m_action: Fix for field init before memset
  tc: m_action: Drop unused variable nladdr in tc_action_gd()

 tc/m_action.c | 15 +++
 1 file changed, 3 insertions(+), 12 deletions(-)

-- 
2.8.2

Re: [PATCH v2 net-next v2 12/12] net: dsa: mv88e6xxx: add addressing mode to info

2016-06-14 Thread Vivien Didelot

Hi Andrew,

Andrew Lunn  writes:

>> -ret = mdiobus_read_nested(bus, addr, reg);
>> +ret = mdiobus_read_nested(bus, sw_addr + addr, reg);
>>  if (ret < 0)
>>  return ret;
>
> If we are doing direct access, doesn't it mean sw_addr is 0?
>
> So isn't this pointless?

6060 has no indirect access and directly responds to 16 SMI addresses,
regardless of its chip address, which can be strapped to either 0 or 16.

If we want to add support for it in mv88e6xxx someday (which is likely),
the code is ready for that.

Question 1) given this, should I still consider your first comment on
this patch about the mv88e6xxx_smi_ops assignment?

Question 2) is MV88E6XXX_FLAG_MULTI_CHIP confusing? I took a short name
for style but maybe a longer MV88E6XXX_FLAG_MULTI_CHIP_ADDRESSING or
MV88E6XXX_FLAG_MULTI_CHIP_MODE would be clearer to make the distinction
between "Single-chip Addressing Mode" and "Multi-chip Addressing Mode".

Thanks,

Vivien

[PATCH] [v5] net: emac: emac gigabit ethernet controller driver

2016-06-14 Thread Timur Tabi

Add support for the Ethernet controller HW on Qualcomm Technologies, Inc. SoC.
This driver supports the following features:
1) Checksum offload.
2) Interrupt coalescing support.
3) SGMII phy.
4) phylib interface for external phy

Based on original work by
Niranjana Vishwanathapura 
Gilad Avidov 

Signed-off-by: Timur Tabi 
---

v5:
 - changed author to Timur, added MAINTAINERS entry
 - use phylib, replacing internal phy code
 - added support for EMAC internal SGMII v2
 - fix ~DIS_INT warning
 - update DT bindings, including removing unused properties
 - removed interrupt handler for internal sgmii
 - removed link status check handler/state (replaced with phylib)
 - removed periodic timer handler (replaced with phylib)
 - removed power management code (will be rewritten later)
 - external phy is now required, not optional
 - removed redundant EMAC_STATUS_DOWN status flag
 - removed redundant link status and speed variables
 - removed redundant status bits (vlan strip, promiscuous, loopback, etc)
 - removed useless watchdog status
 - removed command-line parameters
 - cleaned up probe messages
 - removed redundant params from emac_sgmii_link_init()
 - always call netdev_completed_queue() (per review comment)
 - fix emac_napi_rtx() (per review comment)
 - removed max_ints loop in interrupt handler
 - removed redundant mutex around phy read/write calls
 - added lock for reading emac status (per review comment)
 - generate random MAC address if it can't be read from firmware
 - replace EMAC_DMA_ADDR_HI/LO with upper/lower_32_bits
 - don't test return value from platform_get_resource (per review comment)
 - use net_warn_ratelimited (per review comment)
 - don't set the dma masks (will be set by DT or IORT code)
 - remove unused emac_tx_tpd_ts_save()
 - removed redundant local MTU variable

v4:
 - add missing ipv6 header file
 - correct compatible string
 - fix spacing in emac_reg_write arrays
 - drop unnecessary cell-index property
 - remove unsupported DT properties from docs
 - remove GPIO initialization and update docs

v3:
 - remove most of the memory barriers by using the non xxx_relaxed() api.
 - remove RSS and WOL support.
 - correct comments from physical address to dma address.
 - rearrange structs to make them packed.
 - replace polling loops with readl_poll_timeout().
 - remove unnecessary wrapper functions from phy layer.
 - add blank line before return statements.
 - set to null clocks after clk_put().
 - use module_platform_driver() and dma_set_mask_and_coherent()
 - replace long hex bitmasks with BIT() macro.

v2:
 - replace hw bit fields to macros with bitwise operations.
 - change all iterators to unsized types (int)
 - some minor code flow improvements.
 - change return type to void for functions whose return value is never
   used.
 - replace instance of l_relaxed() io followed by mb() with a
   readl()/writel().


 .../devicetree/bindings/net/qcom-emac.txt       |   66 +
 MAINTAINERS                                     |    6 +
 drivers/net/ethernet/qualcomm/Kconfig           |   11 +
 drivers/net/ethernet/qualcomm/Makefile          |    2 +
 drivers/net/ethernet/qualcomm/emac/Makefile     |    7 +
 drivers/net/ethernet/qualcomm/emac/emac-mac.c   | 1674
 drivers/net/ethernet/qualcomm/emac/emac-mac.h   |  284
 drivers/net/ethernet/qualcomm/emac/emac-phy.c   |  211 +++
 drivers/net/ethernet/qualcomm/emac/emac-phy.h   |   32 +
 drivers/net/ethernet/qualcomm/emac/emac-sgmii.c |  699
 drivers/net/ethernet/qualcomm/emac/emac-sgmii.h |   24 +
 drivers/net/ethernet/qualcomm/emac/emac.c       |  798 ++
 drivers/net/ethernet/qualcomm/emac/emac.h       |  370 +
 13 files changed, 4184 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/net/qcom-emac.txt
 create mode 100644 drivers/net/ethernet/qualcomm/emac/Makefile
 create mode 100644 drivers/net/ethernet/qualcomm/emac/emac-mac.c
 create mode 100644 drivers/net/ethernet/qualcomm/emac/emac-mac.h
 create mode 100644 drivers/net/ethernet/qualcomm/emac/emac-phy.c
 create mode 100644 drivers/net/ethernet/qualcomm/emac/emac-phy.h
 create mode 100644 drivers/net/ethernet/qualcomm/emac/emac-sgmii.c
 create mode 100644 drivers/net/ethernet/qualcomm/emac/emac-sgmii.h
 create mode 100644 drivers/net/ethernet/qualcomm/emac/emac.c
 create mode 100644 drivers/net/ethernet/qualcomm/emac/emac.h

diff --git a/Documentation/devicetree/bindings/net/qcom-emac.txt b/Documentation/devicetree/bindings/net/qcom-emac.txt
new file mode 100644
index 000..e48a9b9
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/qcom-emac.txt
@@ -0,0 +1,66 @@
+Qualcomm EMAC Gigabit Ethernet Controller
+
+Required properties:
+- compatible : Should be "qcom,fsm9900-emac".
+- reg : Offset and length of the register regions for the device
+- reg-names : Register region names referenced in 'reg' above.
+   Required register resource entries are:
+   "base"   : EM

Re: [PATCH v2 net-next v2 10/12] net: dsa: mv88e6xxx: iterate on compatible info

2016-06-14 Thread Vivien Didelot

Hi Andrew,

Andrew Lunn  writes:

> On Tue, Jun 14, 2016 at 02:31:51PM -0400, Vivien Didelot wrote:
>> With legacy probing, we cannot have a compatible info structure. We have
>> to guess it. Instead of using only the first info structure of the info
>> table, iterate over the compatible data.
>> 
>> That way, the legacy code will support new compatible chips with
>> different register access without requiring any code change.
>
> I don't think this is safe when used in combination with multi-chip
> addresses. This code will perform writes on various addresses,
> addresses which could be real registers on a device.
>
> I don't see a need to support guessing. The new binding will work,
> without any guessing. So use that.

OK, I'll drop this patch and limit detection in the legacy probing to
the 6085 chip info.

Thanks,

Vivien

Re: [PATCH v2 net-next v2 09/12] net: dsa: mv88e6xxx: add SMI detection helper

2016-06-14 Thread Vivien Didelot

Hi Andrew,

Andrew Lunn  writes:

>> -name = info->name;
>> +dev_info(dev, "switch 0x%x detected: %s, revision %u\n", prod_num,
>> + info->name, rev);
>>  
>> -ps = devm_kzalloc(dsa_dev, sizeof(*ps), GFP_KERNEL);
>> +ps = devm_kzalloc(dev, sizeof(*ps), GFP_KERNEL);
>>  if (!ps)
>>  return NULL;
>
> I don't like the way this detect function goes a lot further than
> detection. I would say detection finished when you have the info
> structure. Return at that point, and let the probe do the rest.

OK, I'll split detection and allocation.

Thanks,

Vivien

Re: [PATCH v2 net-next v2 12/12] net: dsa: mv88e6xxx: add addressing mode to info

2016-06-14 Thread Andrew Lunn

On Tue, Jun 14, 2016 at 02:31:53PM -0400, Vivien Didelot wrote:
> When the SMI address of the switch chip on the SMI master bus is not
> zero, some chips (e.g. 88E6352) use an indirect access through two SMI
> Command and Data registers, while others (e.g. 88E6060) still use a
> direct access.
> 
> Add a capability flag to describe chips supporting the Multi-chip
> Addressing Mode.
> 
> Use the SMI indirect access ops only for switches with this flag and
> change the direct SMI access ops to support a non-zero chip
> address.
> 
> Signed-off-by: Vivien Didelot 
> ---
>  drivers/net/dsa/mv88e6xxx.c |  6 +++---
>  drivers/net/dsa/mv88e6xxx.h | 16 +++-
>  2 files changed, 18 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c
> index fc28a6c..8e12246 100644
> --- a/drivers/net/dsa/mv88e6xxx.c
> +++ b/drivers/net/dsa/mv88e6xxx.c
> @@ -52,7 +52,7 @@ static int mv88e6xxx_smi_direct_read(struct mii_bus *bus, int sw_addr,
>  {
>   int ret;
>  
> - ret = mdiobus_read_nested(bus, addr, reg);
> + ret = mdiobus_read_nested(bus, sw_addr + addr, reg);
>   if (ret < 0)
>   return ret;

If we are doing direct access, doesn't it mean sw_addr is 0?

So isn't this pointless?

   Andrew

Re: [PATCH v2 net-next v2 11/12] net: dsa: mv88e6xxx: add an SMI ops structure

2016-06-14 Thread Andrew Lunn

> +struct mv88e6xxx_smi_ops {
> + int (*read)(struct mii_bus *bus, int sw_addr,
> + int addr, int reg, u16 *val);
> + int (*write)(struct mii_bus *bus, int sw_addr,
> +  int addr, int reg, u16 val);
> +};
> +

I think this API would be better if it used ps, not bus and sw_addr.

The only problem is the very first read to get the switch ID. I would
add one more layer in between, so that you can call the lowest level
functions without having a ps structure.

  Andrew

Re: [PATCH] net: smsc911x: If PHY doesn't have an interrupt then POLL

2016-06-14 Thread Sergei Shtylyov


On 06/15/2016 12:53 AM, Jeremy Linton wrote:


If the interrupt configuration isn't set and we are using the


It's never set, judging by the driver code.


internal phy, then we need to poll the phy to reliably detect
phy state changes.


What address your internal PHY is at? Mine is at 1, and things
seem
to work reliably after probing:

SMSC LAN8700 1800.etherne:01: attached PHY driver [SMSC LAN8700]
(mii_bus:phy_addr=1800.etherne:01, irq=-1)

I'm using the device tree on my board.


Ok, I'm back on the machine, this is what mine says without that patch.

SMSC LAN911x Internal PHY 1800.etherne:01: attached PHY driver
[SMSC
LAN911x Internal PHY] (mii_bus:phy_addr=1800.etherne:01, irq=0)


Hum, that's unexpected... things are probably more complex than I
thought. Do you have extra patches to this driver by chance?


No, the initial kernel where the problem was discovered is
4.5.2-301.fc24.aarch64, but I built a mainline 4.6, and modprobed the
driver
with the same effect.

Although, now that I'm looking closer at phy_irq, I'm curious how it
works for
anyone else...


Does anything change when you comment out that memcpy()? It
shouldn't probably...


Well that should change the irq to PHY_POLL by default rather than the 0's
in the structure, which may be a better patch.


   It shouldn't due to the wrong size. It should only overwrite IRQ and index 
0, unless I'm mistaken.


MBR, Sergei

Re: [PATCH] net: smsc911x: If PHY doesn't have an interrupt then POLL

2016-06-14 Thread Jeremy Linton


On 06/14/2016 04:42 PM, Sergei Shtylyov wrote:

On 06/15/2016 12:40 AM, Jeremy Linton wrote:


If the interrupt configuration isn't set and we are using the


It's never set, judging by the driver code.


internal phy, then we need to poll the phy to reliably detect
phy state changes.


What address your internal PHY is at? Mine is at 1, and things
seem
to work reliably after probing:

SMSC LAN8700 1800.etherne:01: attached PHY driver [SMSC LAN8700]
(mii_bus:phy_addr=1800.etherne:01, irq=-1)

I'm using the device tree on my board.


Ok, I'm back on the machine, this is what mine says without that patch.

SMSC LAN911x Internal PHY 1800.etherne:01: attached PHY driver
[SMSC
LAN911x Internal PHY] (mii_bus:phy_addr=1800.etherne:01, irq=0)


Hum, that's unexpected... things are probably more complex than I
thought. Do you have extra patches to this driver by chance?


No, the initial kernel where the problem was discovered is
4.5.2-301.fc24.aarch64, but I built a mainline 4.6, and modprobed the
driver
with the same effect.


Although, now that I'm looking closer at phy_irq, I'm curious how it
works for
anyone else...


Does anything change when you comment out that memcpy()? It
shouldn't probably...


	Well that should change the irq to PHY_POLL by default rather than the 
0's in the structure, which may be a better patch.

[PATCH v2 net-next] net: ipv4: Add ability to have GRE ignore DF bit in IPv4 payloads

2016-06-14 Thread Philip Prindeville

From: Philip Prindeville 

In the presence of firewalls which improperly block ICMP Unreachable
(including Fragmentation Required) messages, Path MTU Discovery is
prevented from working.

A workaround is to handle IPv4 payloads opaquely, ignoring the DF bit--as
is done for other payloads like AppleTalk--and doing transparent
fragmentation and reassembly.

Redux includes the enforcement of mutual exclusion between this feature
and Path MTU Discovery as suggested by Alexander Duyck.

Cc: Alexander Duyck 
Reviewed-by: Stephen Hemminger 
Signed-off-by: Philip Prindeville 
---
 include/net/ip_tunnels.h   |  1 +
 include/uapi/linux/if_tunnel.h |  1 +
 net/ipv4/ip_gre.c  | 42 +-
 net/ipv4/ip_tunnel.c   |  2 +-
 4 files changed, 36 insertions(+), 10 deletions(-)

diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
index dbf..9222678 100644
--- a/include/net/ip_tunnels.h
+++ b/include/net/ip_tunnels.h
@@ -132,6 +132,7 @@ struct ip_tunnel {
int ip_tnl_net_id;
struct gro_cellsgro_cells;
boolcollect_md;
+   boolignore_df;
 };
 
 #define TUNNEL_CSUM__cpu_to_be16(0x01)
diff --git a/include/uapi/linux/if_tunnel.h b/include/uapi/linux/if_tunnel.h
index af4de90..1046f55 100644
--- a/include/uapi/linux/if_tunnel.h
+++ b/include/uapi/linux/if_tunnel.h
@@ -113,6 +113,7 @@ enum {
IFLA_GRE_ENCAP_SPORT,
IFLA_GRE_ENCAP_DPORT,
IFLA_GRE_COLLECT_METADATA,
+   IFLA_GRE_IGNORE_DF,
__IFLA_GRE_MAX,
 };
 
diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index 4d2025f..0f8ca3f 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -841,17 +841,19 @@ out:
return ipgre_tunnel_validate(tb, data);
 }
 
-static void ipgre_netlink_parms(struct net_device *dev,
+static int ipgre_netlink_parms(struct net_device *dev,
struct nlattr *data[],
struct nlattr *tb[],
struct ip_tunnel_parm *parms)
 {
+   struct ip_tunnel *t = netdev_priv(dev);
+
memset(parms, 0, sizeof(*parms));
 
parms->iph.protocol = IPPROTO_GRE;
 
if (!data)
-   return;
+   return 0;
 
if (data[IFLA_GRE_LINK])
parms->link = nla_get_u32(data[IFLA_GRE_LINK]);
@@ -880,16 +882,26 @@ static void ipgre_netlink_parms(struct net_device *dev,
if (data[IFLA_GRE_TOS])
parms->iph.tos = nla_get_u8(data[IFLA_GRE_TOS]);
 
-   if (!data[IFLA_GRE_PMTUDISC] || nla_get_u8(data[IFLA_GRE_PMTUDISC]))
+   if (!data[IFLA_GRE_PMTUDISC] || nla_get_u8(data[IFLA_GRE_PMTUDISC])) {
+   if (t->ignore_df)
+   return -EINVAL;
parms->iph.frag_off = htons(IP_DF);
+   }
 
if (data[IFLA_GRE_COLLECT_METADATA]) {
-   struct ip_tunnel *t = netdev_priv(dev);
-
t->collect_md = true;
if (dev->type == ARPHRD_IPGRE)
dev->type = ARPHRD_NONE;
}
+
+   if (data[IFLA_GRE_IGNORE_DF]) {
+   if (nla_get_u8(data[IFLA_GRE_IGNORE_DF])
+ && (parms->iph.frag_off & htons(IP_DF)))
+   return -EINVAL;
+   t->ignore_df = !!nla_get_u8(data[IFLA_GRE_IGNORE_DF]);
+   }
+
+   return 0;
 }
 
 /* This function returns true when ENCAP attributes are present in the nl msg */
@@ -960,16 +972,19 @@ static int ipgre_newlink(struct net *src_net, struct net_device *dev,
 {
struct ip_tunnel_parm p;
struct ip_tunnel_encap ipencap;
+   int err;
 
if (ipgre_netlink_encap_parms(data, &ipencap)) {
struct ip_tunnel *t = netdev_priv(dev);
-   int err = ip_tunnel_encap_setup(t, &ipencap);
+   err = ip_tunnel_encap_setup(t, &ipencap);
 
if (err < 0)
return err;
}
 
-   ipgre_netlink_parms(dev, data, tb, &p);
+   err = ipgre_netlink_parms(dev, data, tb, &p);
+   if (err < 0)
+   return err;
return ip_tunnel_newlink(dev, tb, &p);
 }
 
@@ -978,16 +993,19 @@ static int ipgre_changelink(struct net_device *dev, struct nlattr *tb[],
 {
struct ip_tunnel_parm p;
struct ip_tunnel_encap ipencap;
+   int err;
 
if (ipgre_netlink_encap_parms(data, &ipencap)) {
struct ip_tunnel *t = netdev_priv(dev);
-   int err = ip_tunnel_encap_setup(t, &ipencap);
+   err = ip_tunnel_encap_setup(t, &ipencap);
 
if (err < 0)
return err;
}
 
-   ipgre_netlink_parms(dev, data, tb, &p);
+   err = ipgre_netlink_parms(dev, data, tb, &p);
+   if (err < 0)
+   return err;
return ip_tunnel_changelink(dev, tb, &p);
 }
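
The mutual exclusion between Path MTU Discovery and the new ignore-DF setting can be sketched outside the kernel as follows. This is a simplified model under stated assumptions: struct ex_gre_request, struct ex_tunnel and ex_apply_gre_request() are hypothetical names standing in for the netlink attributes and tunnel state, and only the two checks that return -EINVAL in the patch are modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the two netlink attributes involved. */
struct ex_gre_request {
    bool has_pmtudisc;        /* IFLA_GRE_PMTUDISC present? */
    unsigned char pmtudisc;   /* its value, if present */
    bool has_ignore_df;       /* IFLA_GRE_IGNORE_DF present? */
    unsigned char ignore_df;  /* its value, if present */
};

/* Hypothetical model of the tunnel state touched by the patch. */
struct ex_tunnel {
    bool ignore_df;  /* persistent per-tunnel flag */
    bool set_df;     /* whether the outer header would carry DF */
};

/* Returns 0 on success or -EINVAL when PMTUD and ignore-DF would both be on. */
static int ex_apply_gre_request(struct ex_tunnel *t,
                                const struct ex_gre_request *req)
{
    bool set_df = false;

    assert(t != NULL && req != NULL);

    /* As in the patch, PMTUD is on when the attribute is absent or non-zero. */
    if (!req->has_pmtudisc || req->pmtudisc) {
        if (t->ignore_df)
            return -EINVAL;
        set_df = true;
    }

    if (req->has_ignore_df) {
        if (req->ignore_df && set_df)
            return -EINVAL;
        t->ignore_df = req->ignore_df != 0;
    }

    t->set_df = set_df;
    return 0;
}
```

One consequence visible in the model: because PMTUD defaults to on when its attribute is absent, a request that enables ignore-DF has to disable PMTUD explicitly in the same request.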

Re: [PATCH v2 net-next v2 12/12] net: dsa: mv88e6xxx: add addressing mode to info

2016-06-14 Thread Vivien Didelot

Hi Andrew,

Andrew Lunn  writes:

>> @@ -3681,7 +3681,7 @@ mv88e6xxx_smi_detect(struct device *dev, struct mii_bus *bus, int sw_addr,
>>  u16 id;
>>  
>>  ops = &mv88e6xxx_smi_direct_ops;
>> -if (sw_addr > 0)
>> +if (sw_addr > 0 && info->flags & MV88E6XXX_FLAG_MULTI_CHIP)
>>  ops = &mv88e6xxx_smi_indirect_ops;
>
> If sw_addr is > 0 and MV88E6XXX_FLAG_MULTI_CHIP is not set, you should
> return -EINVAL. The device tree is invalid.

OK, I'll replace this snippet with the following until we explicitly add
support for such devices with a non-zero address and direct SMI access:

        if (sw_addr == 0)
                ops = &mv88e6xxx_smi_direct_ops;
        else if (info->flags & MV88E6XXX_FLAG_MULTI_CHIP)
                ops = &mv88e6xxx_smi_indirect_ops;
        else
                return NULL;

Thanks,

Vivien
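
The three-way decision in the snippet above can be exercised in isolation with a small standalone sketch. The names here (EX_FLAG_MULTI_CHIP, struct ex_info, enum ex_smi_mode, ex_pick_smi_mode()) are hypothetical; the real flag value and info structure layout are not shown in the thread, and the invalid case is reported as an enum value rather than NULL or -EINVAL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bit; the real MV88E6XXX_FLAG_MULTI_CHIP value is not shown in the thread. */
#define EX_FLAG_MULTI_CHIP (1UL << 0)

/* Outcome of the selection; EX_SMI_INVALID models the "return NULL" branch. */
enum ex_smi_mode {
    EX_SMI_INVALID,
    EX_SMI_DIRECT,
    EX_SMI_INDIRECT
};

/* Hypothetical reduced chip description holding only the capability flags. */
struct ex_info {
    unsigned long flags;
};

static enum ex_smi_mode ex_pick_smi_mode(const struct ex_info *info, int sw_addr)
{
    assert(info != NULL);

    /* Address 0: the chip owns the whole bus, so registers are reached directly. */
    if (sw_addr == 0)
        return EX_SMI_DIRECT;

    /* Non-zero address on a multi-chip capable part: go through Command/Data. */
    if (info->flags & EX_FLAG_MULTI_CHIP)
        return EX_SMI_INDIRECT;

    /* Non-zero address without multi-chip support: treated as a bad description. */
    return EX_SMI_INVALID;
}
```

A chip such as the 6060 discussed in this thread, which answers directly at a non-zero address, would fall into the last branch until direct access with an offset is explicitly supported.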

Re: [PATCH v2 net-next v2 09/12] net: dsa: mv88e6xxx: add SMI detection helper

2016-06-14 Thread Andrew Lunn

On Tue, Jun 14, 2016 at 02:31:50PM -0400, Vivien Didelot wrote:
> Extract the allocation and switch ID reading code used by both legacy
> and new probing into a helper function which uses an info structure to
> describe how to access the switch ID register.
> 
> Signed-off-by: Vivien Didelot 
> ---
>  drivers/net/dsa/mv88e6xxx.c | 74 -
>  1 file changed, 32 insertions(+), 42 deletions(-)
> 
> diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c
> index 8ac9f9a..2f36d01 100644
> --- a/drivers/net/dsa/mv88e6xxx.c
> +++ b/drivers/net/dsa/mv88e6xxx.c
> @@ -3631,22 +3631,15 @@ mv88e6xxx_lookup_info(unsigned int prod_num, const struct mv88e6xxx_info *table,
>   return NULL;
>  }
>  
> -static const char *mv88e6xxx_drv_probe(struct device *dsa_dev,
> -struct device *host_dev, int sw_addr,
> -void **priv)
> +static struct mv88e6xxx_priv_state *
> +mv88e6xxx_smi_detect(struct device *dev, struct mii_bus *bus, int sw_addr,
> +  const struct mv88e6xxx_info *info)
>  {
> - const struct mv88e6xxx_info *info;
>   struct mv88e6xxx_priv_state *ps;
> - struct mii_bus *bus;
> - const char *name;
>   int id, prod_num, rev;
> - int err;
>  
> - bus = dsa_host_dev_to_mii_bus(host_dev);
> - if (!bus)
> - return NULL;
> -
> - id = __mv88e6xxx_reg_read(bus, sw_addr, REG_PORT(0), PORT_SWITCH_ID);
> + id = __mv88e6xxx_reg_read(bus, sw_addr, info->port_base_addr,
> +   PORT_SWITCH_ID);
>   if (id < 0)
>   return NULL;
>  
> @@ -3658,28 +3651,46 @@ static const char *mv88e6xxx_drv_probe(struct device *dsa_dev,
>   if (!info)
>   return NULL;
>  
> - name = info->name;
> + dev_info(dev, "switch 0x%x detected: %s, revision %u\n", prod_num,
> +  info->name, rev);
>  
> - ps = devm_kzalloc(dsa_dev, sizeof(*ps), GFP_KERNEL);
> + ps = devm_kzalloc(dev, sizeof(*ps), GFP_KERNEL);
>   if (!ps)
>   return NULL;

I don't like the way this detect function goes a lot further than
detection. I would say detection finished when you have the info
structure. Return at that point, and let the probe do the rest.

   Andrew

Re: [PATCH v2 net-next v2 10/12] net: dsa: mv88e6xxx: iterate on compatible info

2016-06-14 Thread Andrew Lunn

On Tue, Jun 14, 2016 at 02:31:51PM -0400, Vivien Didelot wrote:
> With legacy probing, we cannot have a compatible info structure. We have
> to guess it. Instead of using only the first info structure of the info
> table, iterate over the compatible data.
> 
> That way, the legacy code will support new compatible chips with
> different register access without requiring any code change.

I don't think this is safe when used in combination with multi-chip
addresses. This code will perform writes on various addresses,
addresses which could be real registers on a device.

I don't see a need to support guessing. The new binding will work,
without any guessing. So use that.

Andrew

Re: [PATCH] net: smsc911x: If PHY doesn't have an interrupt then POLL

2016-06-14 Thread Sergei Shtylyov


On 06/15/2016 12:40 AM, Jeremy Linton wrote:


If the interrupt configuration isn't set and we are using the


It's never set, judging by the driver code.


internal phy, then we need to poll the phy to reliably detect
phy state changes.


What address your internal PHY is at? Mine is at 1, and things seem
to work reliably after probing:

SMSC LAN8700 1800.etherne:01: attached PHY driver [SMSC LAN8700]
(mii_bus:phy_addr=1800.etherne:01, irq=-1)

I'm using the device tree on my board.


Ok, I'm back on the machine, this is what mine says without that patch.

SMSC LAN911x Internal PHY 1800.etherne:01: attached PHY driver [SMSC
LAN911x Internal PHY] (mii_bus:phy_addr=1800.etherne:01, irq=0)


Hum, that's unexpected... things are probably more complex than I
thought. Do you have extra patches to this driver by chance?


No, the initial kernel where the problem was discovered is
4.5.2-301.fc24.aarch64, but I built a mainline 4.6, and modprobed the driver
with the same effect.


Although, now that I'm looking closer at phy_irq, I'm curious how it works for
anyone else...


   Does anything change when you comment out that memcpy()? It shouldn't 
probably...


MBR, Sergei

Re: [PATCH] net: smsc911x: If PHY doesn't have an interrupt then POLL

2016-06-14 Thread Jeremy Linton


On 06/14/2016 04:34 PM, Sergei Shtylyov wrote:

On 06/15/2016 12:29 AM, Jeremy Linton wrote:


If the interrupt configuration isn't set and we are using the


It's never set, judging by the driver code.


internal phy, then we need to poll the phy to reliably detect
phy state changes.


What address your internal PHY is at? Mine is at 1, and things seem
to work reliably after probing:

SMSC LAN8700 1800.etherne:01: attached PHY driver [SMSC LAN8700]
(mii_bus:phy_addr=1800.etherne:01, irq=-1)

I'm using the device tree on my board.


Ok, I'm back on the machine, this is what mine says without that patch.



SMSC LAN911x Internal PHY 1800.etherne:01: attached PHY driver [SMSC
LAN911x Internal PHY] (mii_bus:phy_addr=1800.etherne:01, irq=0)


Hum, that's unexpected... things are probably more complex than I
thought. Do you have extra patches to this driver by chance?


No, the initial kernel where the problem was discovered is
4.5.2-301.fc24.aarch64, but I built a mainline 4.6, and modprobed the 
driver with the same effect.



Although, now that I'm looking closer at phy_irq, I'm curious how it 
works for anyone else...

Re: [CRIU] TCP_REPAIR MSS issue

2016-06-14 Thread Andrey Vagin

On Tue, Jun 14, 2016 at 07:37:12PM +, Eggert, Lars wrote:
> Hi,
> 
> On 2016-06-14, at 19:15, Andrey Vagin  wrote:
> > Recently we found that we have to restore more parameters for tcp
> > sockets.
> > https://patchwork.kernel.org/patch/9144995/
> 
> thanks for the pointer.
> 
> > As for your problem, criu saves and restores mss_clamp. Could you check
> > that it works for your case?
> 
> I do this already, but clamping doesn't help here, since it only limits the 
> MSS (but does not increase it from the minimum.)

Yes, you are right.

On my host, I see that dst is set in tcp_v4_connect() -> sk_setup_caps()

> 
> Lars

Re: [PATCH iproute2] ip route: Add annotation for replaced routes

2016-06-14 Thread David Ahern


On 6/14/16 3:34 PM, Stephen Hemminger wrote:

On Thu,  9 Jun 2016 13:05:42 -0700
David Ahern  wrote:


If NLM_F_REPLACE flag is set then a route is replacing an existing route.
Prepend "Replaced " to these routes similar to how "Deleted " is added
to deleted routes.

Signed-off-by: David Ahern 


Why just routes, there are several other places where Replaced entries could
be returned?



b/c that's the use case I was investigating. :-)

Will add the others.
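
The annotation being discussed amounts to choosing a prefix from the netlink message header. A minimal sketch, assuming the delete check takes precedence over the replace check (the thread does not say): ex_route_prefix() is a hypothetical helper, not an iproute2 function, and the two constants are local copies of the uapi values so the example builds without kernel headers.

```c
/* Local copies of Linux uapi values, so the sketch builds without kernel headers. */
#define EX_RTM_DELROUTE   25     /* RTM_DELROUTE in <linux/rtnetlink.h> */
#define EX_NLM_F_REPLACE  0x100  /* NLM_F_REPLACE in <linux/netlink.h> */

/*
 * Hypothetical helper: pick the text printed in front of a route,
 * given the nlmsg_type and nlmsg_flags fields of the message header.
 */
static const char *ex_route_prefix(unsigned short nlmsg_type,
                                   unsigned short nlmsg_flags)
{
    if (nlmsg_type == EX_RTM_DELROUTE)
        return "Deleted ";
    if (nlmsg_flags & EX_NLM_F_REPLACE)
        return "Replaced ";
    return "";
}
```

The same flag test would apply to the other object types mentioned in the review; only the message-type constant for deletion differs per object.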

Re: [PATCH v2 net-next v2 12/12] net: dsa: mv88e6xxx: add addressing mode to info

2016-06-14 Thread Andrew Lunn

> @@ -3681,7 +3681,7 @@ mv88e6xxx_smi_detect(struct device *dev, struct mii_bus *bus, int sw_addr,
>   u16 id;
>  
>   ops = &mv88e6xxx_smi_direct_ops;
> - if (sw_addr > 0)
> + if (sw_addr > 0 && info->flags & MV88E6XXX_FLAG_MULTI_CHIP)
>   ops = &mv88e6xxx_smi_indirect_ops;

Hi Vivien

If sw_addr is > 0 and MV88E6XXX_FLAG_MULTI_CHIP is not set, you should
return -EINVAL. The device tree is invalid.

   Andrew

Re: [PATCH iproute2] ip route: Add annotation for replaced routes

2016-06-14 Thread Stephen Hemminger

On Thu,  9 Jun 2016 13:05:42 -0700
David Ahern  wrote:

> If NLM_F_REPLACE flag is set then a route is replacing an existing route.
> Prepend "Replaced " to these routes similar to how "Deleted " is added
> to deleted routes.
> 
> Signed-off-by: David Ahern 

Why just routes, there are several other places where Replaced entries could
be returned?

Re: [PATCH] net: smsc911x: If PHY doesn't have an interrupt then POLL

2016-06-14 Thread Sergei Shtylyov


On 06/15/2016 12:29 AM, Jeremy Linton wrote:


If the interrupt configuration isn't set and we are using the


It's never set, judging by the driver code.


internal phy, then we need to poll the phy to reliably detect
phy state changes.


What address your internal PHY is at? Mine is at 1, and things seem
to work reliably after probing:

SMSC LAN8700 1800.etherne:01: attached PHY driver [SMSC LAN8700]
(mii_bus:phy_addr=1800.etherne:01, irq=-1)

I'm using the device tree on my board.


Ok, I'm back on the machine, this is what mine says without that patch.



SMSC LAN911x Internal PHY 1800.etherne:01: attached PHY driver [SMSC
LAN911x Internal PHY] (mii_bus:phy_addr=1800.etherne:01, irq=0)


   Hum, that's unexpected... things are probably more complex than I thought.
Do you have extra patches to this driver by chance?


MBR, Sergei

Re: [PATCH net-next v2] tcp: use RFC6298 compliant TCP RTO calculation

2016-06-14 Thread Yuchung Cheng

On Tue, Jun 14, 2016 at 12:18 PM, Daniel Metz  wrote:
> From: Daniel Metz 
>
> This patch adjusts Linux RTO calculation to be RFC6298 Standard
> compliant. MinRTO is no longer added to the computed RTO, RTO damping
> and overestimation are decreased.
>
> In RFC 6298 Standard TCP Retransmission Timeout (RTO) calculation the
> calculated RTO is rounded up to the Minimum RTO (MinRTO), if it is
> less.  The Linux implementation as a discrepancy to the Standard
> basically adds the defined MinRTO to the calculated RTO. When
> comparing both approaches, the Linux calculation seems to perform
> worse for sender limited TCP flows like Telnet, SSH or constant bit
> rate encoded transmissions, especially for Round Trip Times (RTT) of
> 50ms to 800ms.
>
> Compared to the Linux implementation the RFC 6298 proposed RTO
> calculation performs better and more precise in adapting to current
> network characteristics. Extensive measurements for bulk data did not
> show a negative impact of the adjusted calculation.
>
> Exemplary performance comparison for sender-limited-flows:
>
> - Rate: 10Mbit/s
> - Delay: 200ms, Delay Variation: 10ms
> - Time between each scheduled segment: 1s
> - Amount Data Segments: 300
> - Mean of 11 runs
>
>  Mean Response Waiting Time [milliseconds]
>
> PER [%] |   0.5      1    1.5      2      3      5      7     10
> --------+--------------------------------------------------------
> old     | 206.4  208.6  218.0  218.6  227.2  249.3  274.7  308.2
> new     | 203.9  206.0  207.0  209.9  217.3  225.6  238.7  259.1
>
>
> Detailed analysis:
> https://docs.google.com/document/d/1pKmPfnQb6fDK4qpiNVkN8cQyGE4wYDZukcuZfR-BnnM/
>
> Reasoning for historic design:
> Sarolahti, P.; Kuznetsov, A. (2002). Congestion Control in Linux TCP.
> Conference Paper. Proceedings of the FREENIX Track. 2002 USENIX Annual
> https://www.cs.helsinki.fi/research/iwtcp/papers/linuxtcp.pdf
>
>
> Signed-off-by: Hagen Paul Pfeifer 
> Signed-off-by: Daniel Metz 
> Cc: Eric Dumazet 
> Cc: Yuchung Cheng 
> ---
>
> v2:
>  - Using the RFC 6298 compliant implementation, the tcp_sock struct variable
>  u32 mdev_max_us becomes obsolete and consequently is being removed.
>  - Add reference to Kuznetsov paper
>
>
>  include/linux/tcp.h|  1 -
>  net/ipv4/tcp_input.c   | 74 --
>  net/ipv4/tcp_metrics.c |  2 +-
>  3 files changed, 18 insertions(+), 59 deletions(-)
>
> diff --git a/include/linux/tcp.h b/include/linux/tcp.h
> index 7be9b12..d1790c5 100644
> --- a/include/linux/tcp.h
> +++ b/include/linux/tcp.h
> @@ -231,7 +231,6 @@ struct tcp_sock {
>  /* RTT measurement */
> u32 srtt_us;/* smoothed round trip time << 3 in usecs */
> u32 mdev_us;/* medium deviation */
> -   u32 mdev_max_us;/* maximal mdev for the last rtt period */
> u32 rttvar_us;  /* smoothed mdev_max*/
> u32 rtt_seq;/* sequence number to update rttvar */
> struct rtt_meas {
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index 94d4aff..0d53537 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -680,8 +680,7 @@ static void tcp_event_data_recv(struct sock *sk, struct sk_buff *skb)
>  /* Called to compute a smoothed rtt estimate. The data fed to this
>   * routine either comes from timestamps, or from segments that were
>   * known _not_ to have been retransmitted [see Karn/Partridge
> - * Proceedings SIGCOMM 87]. The algorithm is from the SIGCOMM 88
> - * piece by Van Jacobson.
> + * Proceedings SIGCOMM 87].
>   * NOTE: the next three routines used to be one big routine.
>   * To save cycles in the RFC 1323 implementation it was better to break
>   * it up into three procedures. -- erics
> @@ -692,59 +691,21 @@ static void tcp_rtt_estimator(struct sock *sk, long mrtt_us)
> long m = mrtt_us; /* RTT */
> u32 srtt = tp->srtt_us;
>
> -   /*  The following amusing code comes from Jacobson's
> -*  article in SIGCOMM '88.  Note that rtt and mdev
> -*  are scaled versions of rtt and mean deviation.
> -*  This is designed to be as fast as possible
> -*  m stands for "measurement".
> -*
> -*  On a 1990 paper the rto value is changed to:
> -*  RTO = rtt + 4 * mdev
> -*
> -* Funny. This algorithm seems to be very broken.
> -* These formulae increase RTO, when it should be decreased, increase
> -* too slowly, when it should be increased quickly, decrease too quickly
> -* etc. I guess in BSD RTO takes ONE value, so that it is absolutely
> -* does not matter how to _calculate_ it. Seems, it was trap
> -* that VJ failed to avoid. 8)
> -*/
> if (srtt != 0) {
> -   m -= (srtt >> 3);   /* m is now error in rtt est */
> -   srtt += m;  /
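
For reference, the RTO computation that RFC 6298 specifies (and that the quoted patch description says it aligns with) can be written as a short floating-point sketch. This is not the patch's code: the kernel works in scaled integer microseconds, and struct ex_rto_state and ex_rto_update() are hypothetical names. The floor is passed in as a parameter because RFC 6298 recommends rounding up to 1 second, whereas the description above refers to a smaller, implementation-defined MinRTO.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical estimator state, in milliseconds. */
struct ex_rto_state {
    bool   have_sample;  /* false until the first RTT measurement */
    double srtt_ms;      /* smoothed RTT */
    double rttvar_ms;    /* RTT variation */
};

/*
 * Feed one RTT sample and return the new RTO, following RFC 6298 section 2:
 *   first sample:  SRTT = R, RTTVAR = R / 2
 *   later samples: RTTVAR = 3/4 * RTTVAR + 1/4 * |SRTT - R'|
 *                  SRTT   = 7/8 * SRTT   + 1/8 * R'
 *   RTO = SRTT + max(G, 4 * RTTVAR), rounded up to the minimum RTO.
 */
static double ex_rto_update(struct ex_rto_state *st, double rtt_ms,
                            double clock_g_ms, double min_rto_ms)
{
    double var_term;
    double rto;

    assert(st != NULL);

    if (!st->have_sample) {
        st->srtt_ms = rtt_ms;
        st->rttvar_ms = rtt_ms / 2.0;
        st->have_sample = true;
    } else {
        /* RTTVAR must be updated with the old SRTT, before SRTT changes. */
        double err = st->srtt_ms - rtt_ms;

        if (err < 0.0)
            err = -err;
        st->rttvar_ms = 0.75 * st->rttvar_ms + 0.25 * err;
        st->srtt_ms = 0.875 * st->srtt_ms + 0.125 * rtt_ms;
    }

    var_term = 4.0 * st->rttvar_ms;
    if (var_term < clock_g_ms)
        var_term = clock_g_ms;

    rto = st->srtt_ms + var_term;

    /* The minimum is a floor on the result, not a term added to it. */
    return rto < min_rto_ms ? min_rto_ms : rto;
}
```

With a steady 200 ms RTT and a 200 ms floor, the first sample gives 600 ms and the second 500 ms; the floor only takes effect when the computed value drops below it, which is the behavioural difference the description highlights.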

Re: [iproute2 PATCH 1/1] action pedit: stylistic changes

2016-06-14 Thread Stephen Hemminger

On Sun, 12 Jun 2016 17:40:34 -0400
Jamal Hadi Salim  wrote:

> From: Jamal Hadi Salim 
> 
> More modern layout.
> 
> Signed-off-by: Jamal Hadi Salim 

Applied, also added a few more cleanups here.

Re: [PATCH] net: smsc911x: If PHY doesn't have an interrupt then POLL

2016-06-14 Thread Jeremy Linton


On 06/14/2016 03:44 PM, Sergei Shtylyov wrote:

On 06/14/2016 07:16 PM, Jeremy Linton wrote:


If the interrupt configuration isn't set and we are using the


It's never set, judging by the driver code.


internal phy, then we need to poll the phy to reliably detect
phy state changes.


What address your internal PHY is at? Mine is at 1, and things seem
to work reliably after probing:

SMSC LAN8700 1800.etherne:01: attached PHY driver [SMSC LAN8700]
(mii_bus:phy_addr=1800.etherne:01, irq=-1)

I'm using the device tree on my board.


Ok, I'm back on the machine, this is what mine says without that patch.



SMSC LAN911x Internal PHY 1800.etherne:01: attached PHY driver [SMSC 
LAN911x Internal PHY] (mii_bus:phy_addr=1800.etherne:01, irq=0)

Re: [PATCH] net: smsc911x: If PHY doesn't have an interrupt then POLL

2016-06-14 Thread Sergei Shtylyov


On 06/15/2016 12:12 AM, Jeremy Linton wrote:


If the interrupt configuration isn't set and we are using the


It's never set, judging by the driver code.


AFAIK, I think that it's set when the device is configured as a platform
device, or there is an external phy/interrupt setup in DT. I might be wrong on
that..


   I totally fail to see that, even in net-next. The only place that uses 
'phy_irq' is that buggy memcpy()...


MBR, Sergei

Re: [iproute2 PATCH] utils: fix hex digits parsing in hexstring_a2n()

2016-06-14 Thread Stephen Hemminger

On Tue, 14 Jun 2016 20:55:17 +
Beniamino Galvani  wrote:

> strtoul() only modifies errno on overflow, so if errno is not zero
> before calling the function its value is preserved and makes the
> function fail for valid inputs; initialize it.
> 
> Signed-off-by: Beniamino Galvani 
> ---
>  lib/utils.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/lib/utils.c b/lib/utils.c
> index 70e85b7..7dceeb5 100644
> --- a/lib/utils.c
> +++ b/lib/utils.c
> @@ -924,6 +924,7 @@ __u8 *hexstring_a2n(const char *str, __u8 *buf, int blen, 
> unsigned int *len)
>  
>   strncpy(tmpstr, str, 2);
>   tmpstr[2] = '\0';
> + errno = 0;
>   tmp = strtoul(tmpstr, &endptr, 16);
>   if (errno != 0 || tmp > 0xFF || *endptr != '\0')
>   return NULL;

Applied.
The code here is doing things in a manner much harder than necessary...
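
The errno pitfall fixed by this patch can be shown in isolation. The sketch below is a standalone illustration, not the iproute2 code: parse_hex_byte() is a hypothetical helper that converts exactly two hex digits and clears errno before calling strtoul(), so a stale error from an earlier call cannot make valid input fail.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parse exactly two hex digits from s into *out.
 * Returns 0 on success, -1 on any parse error. */
static int parse_hex_byte(const char *s, unsigned char *out)
{
	char tmp[3];
	char *end;
	unsigned long v;

	assert(s != NULL && out != NULL);

	/* Reject anything that is not two hex digits up front, since
	 * strtoul() itself would accept leading whitespace or a sign. */
	if (!isxdigit((unsigned char)s[0]) || !isxdigit((unsigned char)s[1]))
		return -1;

	tmp[0] = s[0];
	tmp[1] = s[1];
	tmp[2] = '\0';

	errno = 0;  /* a successful strtoul() does not reset errno */
	v = strtoul(tmp, &end, 16);
	if (errno != 0 || v > 0xFF || *end != '\0')
		return -1;

	*out = (unsigned char)v;
	return 0;
}
```

Without the `errno = 0` line, the `errno != 0` test would report whatever error an unrelated earlier library call left behind.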

Re: [PATCH] net: smsc911x: If PHY doesn't have an interrupt then POLL

2016-06-14 Thread Sergei Shtylyov


On 06/14/2016 11:59 PM, Jeremy Linton wrote:


If the interrupt configuration isn't set and we are using the


It's never set, judging by the driver code.


internal phy, then we need to poll the phy to reliably detect
phy state changes.


What address your internal PHY is at? Mine is at 1, and things seem
to work reliably after probing:

SMSC LAN8700 1800.etherne:01: attached PHY driver [SMSC LAN8700]
(mii_bus:phy_addr=1800.etherne:01, irq=-1)


Looks like your phy ends up polling (-1==IRQ_POLL)...


   Yeah. You didn't answer my question though...


I'm using the device tree on my board.


This was DT as well with a recent fedora/NetworkManager. It actually seems
to be timing related to how fast the device gets configured after the initial
phy probe. There is something like a 1 second window or so where it will work,
but if network manager takes longer than that, the link state drops and cannot
be brought back up unless the cable is pulled, replugged while the netdevice
is being restarted.


   1 second is the PHY poll interval IIRC.

MBR, Sergei

Re: [PATCH net] net_sched: fix pfifo_head_drop behavior vs backlog

2016-06-14 Thread David Miller

From: Eric Dumazet 
Date: Sun, 12 Jun 2016 20:01:25 -0700

> From: Eric Dumazet 
> 
> When the qdisc is full, we drop a packet at the head of the queue,
> queue the current skb and return NET_XMIT_CN
> 
> Now we track backlog on upper qdiscs, we need to call
> qdisc_tree_reduce_backlog(), even if the qlen did not change.
> 
> Fixes: 2f5fb43f ("net_sched: update hierarchical backlog too")
> Signed-off-by: Eric Dumazet 

Applied and queued up for -stable, thanks.

Re: [PATCH 3/3] net: hisilicon: Add Fast Ethernet MAC driver

2016-06-14 Thread Arnd Bergmann

On Tuesday, June 14, 2016 9:17:44 PM CEST Li Dongpo wrote:
> On 2016/6/13 17:06, Arnd Bergmann wrote:
> > On Monday, June 13, 2016 2:07:56 PM CEST Dongpo Li wrote:
> > Your tx function uses BQL to optimize the queue length, and that
> > is great. You also check xmit reclaim for rx interrupts, so
> > as long as you have both rx and tx traffic, this should work
> > great.
> > 
> > However, I notice that you only have a 'tx fifo empty'
> > interrupt triggering the napi poll, so I guess on a tx-only
> > workload you will always end up pushing packets into the
> > queue until BQL throttles tx, and then get the interrupt
> > after all packets have been sent, which will cause BQL to
> > make the queue longer up to the maximum queue size, and that
> > negates the effect of BQL.
> > 
> > Is there any way you can get a tx interrupt earlier than
> > this in order to get a more balanced queue, or is it ok
> > to just rely on rx packets to come in occasionally, and
> > just use the tx fifo empty interrupt as a fallback?
> > 
> In tx direction, there are only two kinds of interrupts, 'tx fifo empty'
> and 'tx one packet finish'. I didn't use 'tx one packet finish' because
> it would lead to high hardware interrupts rate. This has been verified in
> our chips. It's ok to just use tx fifo empty interrupt.

I'm not convinced by the explanation, I don't think that has anything
to do with the hardware design, but instead is about the correctness
of the BQL logic with your driver.

Maybe your xmit function can do something like

	if (dql_avail(&netdev_get_tx_queue(dev, 0)->dql) < 0)
		enable per-packet interrupt
	else
		use only fifo-empty interrupt

That way, you don't get a lot of interrupts when the system is
in a state of packets being received and sent continuously,
but if you get to the point where your tx queue fills up
and no rx interrupts arrive, you don't have to wait for it
to become completely empty before adding new packets, and
BQL won't keep growing the queue.

> >> +priv->phy_mode = of_get_phy_mode(node);
> >> +if (priv->phy_mode < 0) {
> >> +dev_err(dev, "not find phy-mode\n");
> >> +ret = -EINVAL;
> >> +goto out_disable_clk;
> >> +}
> >> +
> >> +priv->phy_node = of_parse_phandle(node, "phy-handle", 0);
> >> +if (!priv->phy_node) {
> >> +dev_err(dev, "not find phy-handle\n");
> >> +ret = -EINVAL;
> >> +goto out_disable_clk;
> >> +}
> >> +
> >> +priv->phy = of_phy_connect(ndev, priv->phy_node,
> >> +   hisi_femac_adjust_link, 0, priv->phy_mode);
> >> +if (!(priv->phy) || IS_ERR(priv->phy)) {
> >> +dev_err(dev, "connect to PHY failed!\n");
> >> +ret = -ENODEV;
> >> +goto out_phy_node;
> >> +}
> > 
> > I wonder if we could generalize this set of three calls, I
> > get the impression that we duplicate this across several
> > drivers that shouldn't need to bother with the specific
> > phy-handle and phy-mode properties.
> > 
> Some drivers only call 'of_phy_connect' when ndo_open called,
> some call when driver probed. But 'phy_mode' and 'phy_node' are
> usually initialized when driver probed.
> So I think it's not suitable to combine 'of_phy_connect' with
> 'of_get_phy_mode' and 'of_parse_phandle'.
> Do you have any more suggestions ?

My idea was to add another interface that drivers could optionally
call if they use the logic that you have here, but other drivers
could keep using the plain of_phy_connect.

Anyway, this was just an idea, it's not important.

Arnd

Re: [PATCH v2 net-next v2 05/12] net: dsa: mv88e6xxx: add switch register helpers

2016-06-14 Thread Andrew Lunn

On Tue, Jun 14, 2016 at 02:31:46PM -0400, Vivien Didelot wrote:
> The mixed assignments, allocations and registrations in the probe code
> make it hard to follow the logic and figure out what is DSA or chip
> specific.
> 
> Extract the struct dsa_switch related code in a simple
> mv88e6xxx_register_switch helper function.
> 
> For symmetry in the code, add a mv88e6xxx_unregister_switch function.
> 
> Signed-off-by: Vivien Didelot 

Reviewed-by: Andrew Lunn 

Andrew

Re: [PATCH] esp: correct offset for ESN when using NAT-T

2016-06-14 Thread David Miller

From: Blair Steven 
Date: Mon, 13 Jun 2016 11:48:14 +1200

> The offset for calculating ESN was not taking into account the new UDP
> header created for NAT-T.

All submissions must include a proper signoff.

Re: [PATCH v2 net-next v2 01/12] net: dsa: mv88e6xxx: fix style issues

2016-06-14 Thread Andrew Lunn

On Tue, Jun 14, 2016 at 02:31:42PM -0400, Vivien Didelot wrote:
> This patch fixes 5 style problems reported by checkpatch:
> 
> WARNING: suspect code indent for conditional statements (8, 24)
> #492: FILE: drivers/net/dsa/mv88e6xxx.c:492:
> + if (phydev->link)
> + reg |= PORT_PCS_CTRL_LINK_UP;
> 
> CHECK: Logical continuations should be on the previous line
> #1318: FILE: drivers/net/dsa/mv88e6xxx.c:1318:
> +  oldstate == PORT_CONTROL_STATE_FORWARDING)
> + && (state == PORT_CONTROL_STATE_DISABLED ||
> 
> CHECK: multiple assignments should be avoided
> #1662: FILE: drivers/net/dsa/mv88e6xxx.c:1662:
> + vlan->vid_begin = vlan->vid_end = next.vid;
> 
> WARNING: line over 80 characters
> #2097: FILE: drivers/net/dsa/mv88e6xxx.c:2097:
> +const struct switchdev_obj_port_vlan 
> *vlan,
> 
> WARNING: suspect code indent for conditional statements (16, 32)
> #2734: FILE: drivers/net/dsa/mv88e6xxx.c:2734:
> + if (mv88e6xxx_6352_family(ps) || mv88e6xxx_6351_family(ps) ||
> [...]
> + reg |= PORT_CONTROL_EGRESS_ADD_TAG;
> 
> total: 0 errors, 3 warnings, 2 checks, 3805 lines checked
> 
> It also rebases and integrates changes sent by Ben Dooks [1]:
> 
> The driver has a number of functions that are not exported or
> declared elsewhere, so make them static to avoid the following
> warnings from sparse:
> 
> drivers/net/dsa/mv88e6xxx.c:113:5: warning: symbol 'mv88e6xxx_reg_read' 
> was not declared. Should it be static?
> drivers/net/dsa/mv88e6xxx.c:167:5: warning: symbol 'mv88e6xxx_reg_write' 
> was not declared. Should it be static?
> drivers/net/dsa/mv88e6xxx.c:231:5: warning: symbol 'mv88e6xxx_set_addr' 
> was not declared. Should it be static?
> drivers/net/dsa/mv88e6xxx.c:367:6: warning: symbol 
> 'mv88e6xxx_ppu_state_init' was not declared. Should it be static?
> drivers/net/dsa/mv88e6xxx.c:3157:5: warning: symbol 
> 'mv88e6xxx_phy_page_read' was not declared. Should it be static?
> drivers/net/dsa/mv88e6xxx.c:3169:5: warning: symbol 
> 'mv88e6xxx_phy_page_write' was not declared. Should it be static?
> drivers/net/dsa/mv88e6xxx.c:3583:26: warning: symbol 
> 'mv88e6xxx_switch_driver' was not declared. Should it be static?
> drivers/net/dsa/mv88e6xxx.c:3621:5: warning: symbol 'mv88e6xxx_probe' was 
> not declared. Should it be static?
> 
> [1] http://patchwork.ozlabs.org/patch/632708/
> 
> Signed-off-by: Vivien Didelot 

Reviewed-by: Andrew Lunn 

Andrew

Re: [PATCH] net: smsc911x: If PHY doesn't have an interrupt then POLL

2016-06-14 Thread Jeremy Linton


On 06/14/2016 03:44 PM, Sergei Shtylyov wrote:

On 06/14/2016 07:16 PM, Jeremy Linton wrote:


If the interrupt configuration isn't set and we are using the


It's never set, judging by the driver code.


AFAIK, I think that it's set when the device is configured as a platform 
device, or there is an external phy/interrupt setup in DT. I might be 
wrong on that..

internal phy, then we need to poll the phy to reliably detect
phy state changes.


What address your internal PHY is at? Mine is at 1, and things seem
to work reliably after probing:

SMSC LAN8700 1800.etherne:01: attached PHY driver [SMSC LAN8700]
(mii_bus:phy_addr=1800.etherne:01, irq=-1)

I'm using the device tree on my board.

MBR, Sergei

Re: [PATCH] net: smsc911x: If PHY doesn't have an interrupt then POLL

2016-06-14 Thread Jeremy Linton


On 06/14/2016 03:44 PM, Sergei Shtylyov wrote:

On 06/14/2016 07:16 PM, Jeremy Linton wrote:


If the interrupt configuration isn't set and we are using the


It's never set, judging by the driver code.


internal phy, then we need to poll the phy to reliably detect
phy state changes.


What address your internal PHY is at? Mine is at 1, and things seem
to work reliably after probing:

SMSC LAN8700 1800.etherne:01: attached PHY driver [SMSC LAN8700]
(mii_bus:phy_addr=1800.etherne:01, irq=-1)



BTW, the phy that is causing the problem is the one
labeled "SMSC LAN911x Internal PHY" in phy/smsc.c.

Re: [PATCH v2 net-next v2 08/12] net: dsa: mv88e6xxx: read switch ID from info

2016-06-14 Thread Vivien Didelot

Hi,

Sergei Shtylyov  writes:

>> -id = mv88e6xxx_reg_read(ps, REG_PORT(0), PORT_SWITCH_ID);
>> +of_id = of_match_node(mv88e6xxx_of_id_table, np);
>
> You could use of_device_get_match_data() here.
>
>> +if (!of_id)
>> +return -EINVAL;
>> +
>> +info = (const struct mv88e6xxx_info *)of_id->data;
>
> Pointer casts from 'void *' are automatic.

I applied your comments and also squashed patches 7 and 8 together.
I'll respin a v3 soon unless there are other comments.

Thanks,

Vivien
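
The review remark about 'void *' can be illustrated with a small standalone sketch. The struct names below are hypothetical stand-ins for struct of_device_id and struct mv88e6xxx_info, not the kernel definitions. The point is only that C converts a `void *` to any object pointer type on assignment, so the explicit cast adds nothing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-chip info structure. */
struct chip_info {
	unsigned int num_ports;
};

/* Hypothetical stand-in for a match table entry carrying opaque data. */
struct match_entry {
	const char *compatible;
	const void *data;
};

static const struct chip_info example_info = { .num_ports = 7 };

static const struct match_entry example_match = {
	.compatible = "vendor,example-switch",
	.data = &example_info,
};

static unsigned int match_num_ports(const struct match_entry *m)
{
	const struct chip_info *info;

	assert(m != NULL && m->data != NULL);
	info = m->data;  /* no cast needed: void * converts implicitly in C */
	return info->num_ports;
}
```

Dropping the cast also means the compiler will complain if `data` ever stops being a `void *`, which an explicit cast would silence.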

padata - is serial actually serial?

2016-06-14 Thread Jason A. Donenfeld

Hi Steffen & Folks,

I submit a job to padata_do_parallel(). When the parallel() function
triggers, I do some things, and then call padata_do_serial(). Finally
the serial() function triggers, where I complete the job (check a
nonce, etc).

The padata API is very appealing because not only does it allow for
parallel computation, but it claims that the serial() functions will
execute in the order that jobs were originally submitted to
padata_do_parallel().

Unfortunately, in practice, I'm pretty sure I'm seeing deviations from
this. When I submit tons and tons of tasks at rapid speed to
padata_do_parallel(), it seems like the serial() function isn't being
called in exactly the same order that tasks were submitted to
padata_do_parallel().

Is this known (expected) behavior? Or have I stumbled upon a potential
bug that's worthwhile for me to investigate more?

Thanks,
Jason
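
To make the ordering question concrete, here is a small userspace sketch of what "serial callbacks run in submission order" means when parallel work finishes out of order. It is not how padata is implemented. The struct, the WINDOW limit and the release log are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define WINDOW 8  /* hypothetical limit on jobs in flight */

struct reorder {
	unsigned int next;     /* sequence number to release next */
	int done[WINDOW];      /* done[seq % WINDOW] is set once job seq finished */
	unsigned int out[64];  /* log of released jobs, for demonstration */
	size_t out_len;
};

/* Mark job seq as finished, then release every job that is now in order. */
static void reorder_complete(struct reorder *r, unsigned int seq)
{
	assert(seq >= r->next && seq - r->next < WINDOW);
	r->done[seq % WINDOW] = 1;

	while (r->done[r->next % WINDOW]) {
		r->done[r->next % WINDOW] = 0;
		assert(r->out_len < 64);
		r->out[r->out_len] = r->next;  /* the "serial" step for this job */
		r->out_len++;
		r->next++;
	}
}
```

If jobs 2, 0, 1, 3 complete in that order, the log ends up as 0, 1, 2, 3. A deviation from that pattern is the kind of behavior being asked about here.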

[PATCH iproute2 3/6] ip neigh: Add support for keyword

2016-06-14 Thread David Ahern

Add vrf keyword to 'ip neigh' commands. Allows listing neighbor
entries for all links associated with a given VRF.
  $ ip neigh show vrf NAME

versus current syntax:
  $ ip neigh show master DEV

Signed-off-by: David Ahern 
---
 ip/ipneigh.c | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/ip/ipneigh.c b/ip/ipneigh.c
index 4ddb747e2086..3e444712645f 100644
--- a/ip/ipneigh.c
+++ b/ip/ipneigh.c
@@ -48,7 +48,8 @@ static void usage(void)
 {
fprintf(stderr, "Usage: ip neigh { add | del | change | replace }\n"
"{ ADDR [ lladdr LLADDR ] [ nud STATE ] 
| proxy ADDR } [ dev DEV ]\n");
-   fprintf(stderr, "   ip neigh { show | flush } [ proxy ] [ to PREFIX 
] [ dev DEV ] [ nud STATE ]\n\n");
+   fprintf(stderr, "   ip neigh { show | flush } [ proxy ] [ to PREFIX 
] [ dev DEV ] [ nud STATE ]\n");
+   fprintf(stderr, " [ vrf NAME ]\n\n");
fprintf(stderr, "STATE := { permanent | noarp | stale | reachable | 
none |\n"
"   incomplete | delay | probe | failed }\n");
exit(-1);
@@ -385,6 +386,17 @@ static int do_show_or_flush(int argc, char **argv, int 
flush)
invarg("Device does not exist\n", *argv);
addattr32(&req.n, sizeof(req), NDA_MASTER, ifindex);
filter.master = ifindex;
+   } else if (strcmp(*argv, "vrf") == 0) {
+   int ifindex;
+
+   NEXT_ARG();
+   ifindex = ll_name_to_index(*argv);
+   if (!ifindex)
+   invarg("Not a valid VRF name\n", *argv);
+   if (!name_is_vrf(*argv))
+   invarg("Not a valid VRF name\n", *argv);
+   addattr32(&req.n, sizeof(req), NDA_MASTER, ifindex);
+   filter.master = ifindex;
} else if (strcmp(*argv, "unused") == 0) {
filter.unused_only = 1;
} else if (strcmp(*argv, "nud") == 0) {
-- 
2.1.4

[PATCH iproute2 1/6] ip vrf: Add name_is_vrf

2016-06-14 Thread David Ahern

Add name_is_vrf function to determine if given name corresponds to a
VRF device.

Signed-off-by: David Ahern 
---
 ip/ip_common.h  |  2 ++
 ip/iplink_vrf.c | 53 +
 2 files changed, 55 insertions(+)

diff --git a/ip/ip_common.h b/ip/ip_common.h
index e8da9e034b15..410eb135774a 100644
--- a/ip/ip_common.h
+++ b/ip/ip_common.h
@@ -90,6 +90,8 @@ struct link_util *get_link_slave_kind(const char *slave_kind);
 
 void br_dump_bridge_id(const struct ifla_bridge_id *id, char *buf, size_t len);
 
+bool name_is_vrf(char *name);
+
 #ifndef INFINITY_LIFE_TIME
 #define INFINITY_LIFE_TIME  0xU
 #endif
diff --git a/ip/iplink_vrf.c b/ip/iplink_vrf.c
index e3c7b4652da5..abd43c08423e 100644
--- a/ip/iplink_vrf.c
+++ b/ip/iplink_vrf.c
@@ -96,3 +96,56 @@ struct link_util vrf_slave_link_util = {
.print_opt  = vrf_slave_print_opt,
.slave  = true,
 };
+
+bool name_is_vrf(char *name)
+{
+   struct {
+   struct nlmsghdr n;
+   struct ifinfomsg i;
+   char buf[1024];
+   } req = {
+   .n = {
+   .nlmsg_len   = NLMSG_LENGTH(sizeof(struct ifinfomsg)),
+   .nlmsg_flags = NLM_F_REQUEST,
+   .nlmsg_type  = RTM_GETLINK,
+   },
+   .i = {
+   .ifi_family  = preferred_family,
+   },
+   };
+   struct {
+   struct nlmsghdr n;
+   char buf[8192];
+   } answer;
+   struct rtattr *tb[IFLA_MAX+1];
+   struct rtattr *li[IFLA_INFO_MAX+1];
+   struct ifinfomsg *ifi;
+   int len;
+
+   addattr_l(&req.n, sizeof(req), IFLA_IFNAME, name, strlen(name) + 1);
+
+   if (rtnl_talk(&rth, &req.n, &answer.n, sizeof(answer)) < 0)
+   goto err;
+
+   ifi = NLMSG_DATA(&answer.n);
+   len = answer.n.nlmsg_len - NLMSG_LENGTH(sizeof(*ifi));
+   if (len < 0) {
+   fprintf(stderr, "BUG: Invalid response to link query.\n");
+   goto err;
+   }
+
+   parse_rtattr(tb, IFLA_MAX, IFLA_RTA(ifi), len);
+
+   if (!tb[IFLA_LINKINFO])
+   goto err;
+
+   parse_rtattr_nested(li, IFLA_INFO_MAX, tb[IFLA_LINKINFO]);
+
+   if (!li[IFLA_INFO_KIND])
+   goto err;
+
+   return strcmp(RTA_DATA(li[IFLA_INFO_KIND]), "vrf") == 0;
+
+err:
+   return false;
+}
-- 
2.1.4

[PATCH iproute2 5/6] ip vrf: Add ipvrf_get_table

2016-06-14 Thread David Ahern

Add ipvrf_get_table to lookup table id for device name. Returns 0
on any error or if name is not a VRF device.

Signed-off-by: David Ahern 
---
 ip/ip_common.h  |  1 +
 ip/iplink_vrf.c | 66 +
 2 files changed, 67 insertions(+)

diff --git a/ip/ip_common.h b/ip/ip_common.h
index 410eb135774a..8fdb7219fc2b 100644
--- a/ip/ip_common.h
+++ b/ip/ip_common.h
@@ -90,6 +90,7 @@ struct link_util *get_link_slave_kind(const char *slave_kind);
 
 void br_dump_bridge_id(const struct ifla_bridge_id *id, char *buf, size_t len);
 
+__u32 ipvrf_get_table(char *name);
 bool name_is_vrf(char *name);
 
 #ifndef INFINITY_LIFE_TIME
diff --git a/ip/iplink_vrf.c b/ip/iplink_vrf.c
index abd43c08423e..2eecb4564f7e 100644
--- a/ip/iplink_vrf.c
+++ b/ip/iplink_vrf.c
@@ -97,6 +97,72 @@ struct link_util vrf_slave_link_util = {
.slave  = true,
 };
 
+/* returns table id if name is a VRF device */
+__u32 ipvrf_get_table(char *name)
+{
+   struct {
+   struct nlmsghdr n;
+   struct ifinfomsg i;
+   char buf[1024];
+   } req = {
+   .n = {
+   .nlmsg_len   = NLMSG_LENGTH(sizeof(struct ifinfomsg)),
+   .nlmsg_flags = NLM_F_REQUEST,
+   .nlmsg_type  = RTM_GETLINK,
+   },
+   .i = {
+   .ifi_family  = preferred_family,
+   },
+   };
+   struct {
+   struct nlmsghdr n;
+   char buf[8192];
+   } answer;
+   struct rtattr *tb[IFLA_MAX+1];
+   struct rtattr *li[IFLA_INFO_MAX+1];
+   struct rtattr *vrf_attr[IFLA_VRF_MAX + 1];
+   struct ifinfomsg *ifi;
+   __u32 tb_id = 0;
+   int len;
+
+   addattr_l(&req.n, sizeof(req), IFLA_IFNAME, name, strlen(name) + 1);
+
+   if (rtnl_talk(&rth, &req.n, &answer.n, sizeof(answer)) < 0)
+   goto err;
+
+   ifi = NLMSG_DATA(&answer.n);
+   len = answer.n.nlmsg_len - NLMSG_LENGTH(sizeof(*ifi));
+   if (len < 0) {
+   fprintf(stderr, "BUG: Invalid response to link query.\n");
+   goto err;
+   }
+
+   parse_rtattr(tb, IFLA_MAX, IFLA_RTA(ifi), len);
+
+   if (!tb[IFLA_LINKINFO])
+   goto err;
+
+   parse_rtattr_nested(li, IFLA_INFO_MAX, tb[IFLA_LINKINFO]);
+
+   if (!li[IFLA_INFO_KIND] || !li[IFLA_INFO_DATA])
+   goto err;
+
+   if (strcmp(RTA_DATA(li[IFLA_INFO_KIND]), "vrf"))
+   goto err;
+
+   parse_rtattr_nested(vrf_attr, IFLA_VRF_MAX, li[IFLA_INFO_DATA]);
+   if (vrf_attr[IFLA_VRF_TABLE])
+   tb_id = rta_getattr_u32(vrf_attr[IFLA_VRF_TABLE]);
+
+   if (!tb_id)
+   fprintf(stderr, "BUG: VRF %s is missing table id\n", name);
+
+   return tb_id;
+
+err:
+   return 0;
+}
+
 bool name_is_vrf(char *name)
 {
struct {
-- 
2.1.4

[PATCH iproute2 0/6] Add support for vrf keyword

2016-06-14 Thread David Ahern

Currently the syntax for VRF related commands is rather kludgy and
inconsistent from one subcommand to another. This set adds support
for the VRF keyword to the link, address, neigh, and route commands
to improve the user experience listing data associated with vrfs,
modifying routes or doing a route lookup.

David Ahern (6):
  ip vrf: Add name_is_vrf
  ip link/addr: Add support for vrf keyword
  ip neigh: Add support for keyword
  ip route: Change type mask to bitmask
  ip vrf: Add ipvrf_get_table
  ip route: Add support for vrf keyword

 ip/ip_common.h  |   3 ++
 ip/ipaddress.c  |  11 ++
 ip/iplink.c |  15 ++-
 ip/iplink_vrf.c | 119 
 ip/ipneigh.c|  14 ++-
 ip/iproute.c|  43 
 6 files changed, 195 insertions(+), 10 deletions(-)

-- 
2.1.4

[PATCH iproute2 4/6] ip route: Change type mask to bitmask

2016-06-14 Thread David Ahern

Allow option to select multiple route types to show or exclude
specific route types.

Signed-off-by: David Ahern 
---
 ip/iproute.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/ip/iproute.c b/ip/iproute.c
index 8224d7ffa94b..aae693d17be8 100644
--- a/ip/iproute.c
+++ b/ip/iproute.c
@@ -113,7 +113,7 @@ static struct
int flushe;
int protocol, protocolmask;
int scope, scopemask;
-   int type, typemask;
+   __u64 typemask;
int tos, tosmask;
int iif, iifmask;
int oif, oifmask;
@@ -178,7 +178,8 @@ static int filter_nlmsg(struct nlmsghdr *n, struct rtattr 
**tb, int host_len)
return 0;
if ((filter.scope^r->rtm_scope)&filter.scopemask)
return 0;
-   if ((filter.type^r->rtm_type)&filter.typemask)
+
+   if (filter.typemask && !(filter.typemask & (1 << r->rtm_type)))
return 0;
if ((filter.tos^r->rtm_tos)&filter.tosmask)
return 0;
@@ -365,7 +366,8 @@ int print_route(const struct sockaddr_nl *who, struct 
nlmsghdr *n, void *arg)
 
if (n->nlmsg_type == RTM_DELROUTE)
fprintf(fp, "Deleted ");
-   if ((r->rtm_type != RTN_UNICAST || show_details > 0) && !filter.type)
+   if ((r->rtm_type != RTN_UNICAST || show_details > 0) &&
+   (!filter.typemask || (filter.typemask & (1 << r->rtm_type))))
fprintf(fp, "%s ", rtnl_rtntype_n2a(r->rtm_type, b1, 
sizeof(b1)));
 
if (tb[RTA_DST]) {
@@ -1433,10 +1435,9 @@ static int iproute_list_flush_or_save(int argc, char 
**argv, int action)
int type;
 
NEXT_ARG();
-   filter.typemask = -1;
if (rtnl_rtntype_a2n(&type, *argv))
invarg("node type value is invalid\n", *argv);
-   filter.type = type;
+   filter.typemask = (1<
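
The bitmask idea in this patch can be sketched on its own. The example below is an illustration under assumptions, not the iproute2 code. The EX_RTN_* constants stand in for the route type values from <linux/rtnetlink.h>, and the helper names are hypothetical. It uses a 64-bit constant for the shift so that the bit operation matches the width of the mask.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative route type numbers; real code would use RTN_* from
 * <linux/rtnetlink.h>. */
enum { EX_RTN_UNICAST = 1, EX_RTN_LOCAL = 2, EX_RTN_BROADCAST = 3 };

/* Bit representing one route type in a 64-bit mask. */
static uint64_t type_bit(unsigned int type)
{
	assert(type < 64);
	return UINT64_C(1) << type;
}

/* Nonzero if a route of this type passes the filter.
 * An empty mask means "no type filter", so everything passes. */
static int type_selected(uint64_t typemask, unsigned int type)
{
	return typemask == 0 || (typemask & type_bit(type)) != 0;
}
```

A single type is selected with `type_bit(EX_RTN_LOCAL)`, and types are excluded by inverting a union, for example `~(type_bit(EX_RTN_LOCAL) | type_bit(EX_RTN_BROADCAST))`.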

[PATCH iproute2 6/6] ip route: Add support for vrf keyword

2016-06-14 Thread David Ahern

Add vrf keyword to 'ip route' commands. Allows:
1. Users can list routes by VRF name:
   $ ip route show vrf NAME

   VRF tables have all routes including local and broadcast routes.
   The VRF keyword filters LOCAL and BROADCAST routes; to see all
   routes the table option can be used. Or to see local routes only
   for a VRF:
   $ ip route show vrf NAME type local

   Contrast with current syntax:
   $ ip route show table ID

   where the user needs to find the vrf to table ID or maintain a
   text file in /etc/iproute2/rt_tables.d.

2. Add or delete a route for a VRF:
   $ ip route {add|delete} vrf NAME 

   Similarly for this command, users currently need to use table
   option and know the table id or maintain a mapping.

3. Do a route lookup for a VRF:
   $ ip route get vrf NAME ADDRESS

   Contrast with current syntax:
   $ ip route get oif DEV ADDRESS

   (specifying table id for route get does not work kernel side).

Signed-off-by: David Ahern 
---
 ip/iproute.c | 32 ++--
 1 file changed, 30 insertions(+), 2 deletions(-)

diff --git a/ip/iproute.c b/ip/iproute.c
index aae693d17be8..bd661c16cb46 100644
--- a/ip/iproute.c
+++ b/ip/iproute.c
@@ -67,10 +67,10 @@ static void usage(void)
fprintf(stderr, "   ip route showdump\n");
fprintf(stderr, "   ip route get ADDRESS [ from ADDRESS iif STRING 
]\n");
fprintf(stderr, "[ oif STRING ] [ tos TOS 
]\n");
-   fprintf(stderr, "[ mark NUMBER ]\n");
+   fprintf(stderr, "[ mark NUMBER ] [ vrf NAME 
]\n");
fprintf(stderr, "   ip route { add | del | change | append | 
replace } ROUTE\n");
fprintf(stderr, "SELECTOR := [ root PREFIX ] [ match PREFIX ] [ exact 
PREFIX ]\n");
-   fprintf(stderr, "[ table TABLE_ID ] [ proto RTPROTO ]\n");
+   fprintf(stderr, "[ table TABLE_ID ] [ vrf NAME ] [ proto 
RTPROTO ]\n");
fprintf(stderr, "[ type TYPE ] [ scope SCOPE ]\n");
fprintf(stderr, "ROUTE := NODE_SPEC [ INFO_SPEC ]\n");
fprintf(stderr, "NODE_SPEC := [ TYPE ] PREFIX [ tos TOS ]\n");
@@ -1141,6 +1141,20 @@ static int iproute_modify(int cmd, unsigned int flags, 
int argc, char **argv)
addattr32(&req.n, sizeof(req), RTA_TABLE, tid);
}
table_ok = 1;
+   } else if (matches(*argv, "vrf") == 0) {
+   __u32 tid;
+
+   NEXT_ARG();
+   tid = ipvrf_get_table(*argv);
+   if (tid == 0)
+   invarg("Invalid VRF\n", *argv);
+   if (tid < 256)
+   req.r.rtm_table = tid;
+   else {
+   req.r.rtm_table = RT_TABLE_UNSPEC;
+   addattr32(&req.n, sizeof(req), RTA_TABLE, tid);
+   }
+   table_ok = 1;
} else if (strcmp(*argv, "dev") == 0 ||
   strcmp(*argv, "oif") == 0) {
NEXT_ARG();
@@ -1395,6 +1409,15 @@ static int iproute_list_flush_or_save(int argc, char 
**argv, int action)
}
} else
filter.tb = tid;
+   } else if (matches(*argv, "vrf") == 0) {
+   __u32 tid;
+
+   NEXT_ARG();
+   tid = ipvrf_get_table(*argv);
+   if (tid == 0)
+   invarg("Invalid VRF\n", *argv);
+   filter.tb = tid;
+   filter.typemask = ~(1 << RTN_LOCAL | 1<
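
The table id handling in iproute_modify() above follows from rtm_table being an 8-bit field: ids below 256 fit in the header, larger ids need a separate 32-bit RTA_TABLE attribute. A minimal standalone sketch of that decision follows. The struct and function names are hypothetical, and EX_RT_TABLE_UNSPEC stands in for RT_TABLE_UNSPEC.

```c
#include <assert.h>
#include <stdint.h>

#define EX_RT_TABLE_UNSPEC 0  /* stands in for RT_TABLE_UNSPEC */

/* Hypothetical result type: where a table id ends up in a route request. */
struct table_encoding {
	uint8_t  rtm_table;   /* value for the 8-bit header field */
	int      needs_attr;  /* nonzero if a 32-bit table attribute is required */
	uint32_t attr_table;  /* value for that attribute, if needed */
};

static struct table_encoding encode_table(uint32_t tid)
{
	struct table_encoding enc = { EX_RT_TABLE_UNSPEC, 0, 0 };

	assert(tid != 0);  /* 0 is the "not a VRF" result of the lookup */
	if (tid < 256) {
		enc.rtm_table = (uint8_t)tid;  /* fits in the header field */
	} else {
		enc.needs_attr = 1;            /* header stays unspecified */
		enc.attr_table = tid;
	}
	return enc;
}
```

For example, `encode_table(10)` fills only the header field, while `encode_table(1001)` leaves the header unspecified and requests the attribute.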

[PATCH iproute2 2/6] ip link/addr: Add support for vrf keyword

2016-06-14 Thread David Ahern

Add vrf keyword to 'ip link' and 'ip addr' commands (common list code).

Allows:
1. Adding a link to a VRF using the vrf name:
 $ ip link set NAME vrf NAME

   Versus the current syntax:
 $ ip link set NAME master DEV
   Removing a link from a VRF still uses 'ip link set NAME nomaster'

2. Showing links associated with a VRF:
   $ ip link show vrf NAME

   Versus the current syntax:
 $ ip link show master DEV

3. List addresses associated with links in a VRF
 $ ip addr show vrf NAME

   Versus the current syntax:
 $ ip addr show master DEV

Signed-off-by: David Ahern 
---
 ip/ipaddress.c | 11 +++
 ip/iplink.c| 15 +--
 2 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/ip/ipaddress.c b/ip/ipaddress.c
index df363b070d5d..170c4f5d1eb5 100644
--- a/ip/ipaddress.c
+++ b/ip/ipaddress.c
@@ -77,6 +77,7 @@ static void usage(void)
fprintf(stderr, "   ip address del IFADDR dev IFNAME 
[mngtmpaddr]\n");
fprintf(stderr, "   ip address {show|save|flush} [ dev IFNAME ] [ 
scope SCOPE-ID ]\n");
fprintf(stderr, "[ to PREFIX ] [ FLAG-LIST 
] [ label LABEL ] [up]\n");
+   fprintf(stderr, "[ vrf NAME ]\n");
fprintf(stderr, "   ip address {showdump|restore}\n");
fprintf(stderr, "IFADDR := PREFIX | ADDR peer PREFIX\n");
fprintf(stderr, "  [ broadcast ADDR ] [ anycast ADDR ]\n");
@@ -1613,6 +1614,16 @@ static int ipaddr_list_flush_or_save(int argc, char 
**argv, int action)
if (!ifindex)
invarg("Device does not exist\n", *argv);
filter.master = ifindex;
+   } else if (strcmp(*argv, "vrf") == 0) {
+   int ifindex;
+
+   NEXT_ARG();
+   ifindex = ll_name_to_index(*argv);
+   if (!ifindex)
+   invarg("Not a valid VRF name\n", *argv);
+   if (!name_is_vrf(*argv))
+   invarg("Not a valid VRF name\n", *argv);
+   filter.master = ifindex;
} else if (do_link && strcmp(*argv, "type") == 0) {
NEXT_ARG();
filter.kind = *argv;
diff --git a/ip/iplink.c b/ip/iplink.c
index d2e586b6d133..d564aca6406e 100644
--- a/ip/iplink.c
+++ b/ip/iplink.c
@@ -82,11 +82,11 @@ void iplink_usage(void)
fprintf(stderr, "  [ query_rss { on | 
off} ]\n");
fprintf(stderr, "  [ state { auto | 
enable | disable} ] ]\n");
fprintf(stderr, "  [ trust { on | off} 
] ]\n");
-   fprintf(stderr, " [ master DEVICE ]\n");
+   fprintf(stderr, " [ master DEVICE ][ vrf NAME 
]\n");
fprintf(stderr, " [ nomaster ]\n");
fprintf(stderr, " [ addrgenmode { eui64 | none 
| stable_secret | random } ]\n");
fprintf(stderr, " [ protodown { on | off } 
]\n");
-   fprintf(stderr, "   ip link show [ DEVICE | group GROUP ] [up] 
[master DEV] [type TYPE]\n");
+   fprintf(stderr, "   ip link show [ DEVICE | group GROUP ] [up] 
[master DEV] [vrf NAME] [type TYPE]\n");
 
if (iplink_have_newlink()) {
fprintf(stderr, "   ip link help [ TYPE ]\n");
@@ -565,6 +565,17 @@ int iplink_parse(int argc, char **argv, struct iplink_req 
*req,
invarg("Device does not exist\n", *argv);
addattr_l(&req->n, sizeof(*req), IFLA_MASTER,
  &ifindex, 4);
+   } else if (strcmp(*argv, "vrf") == 0) {
+   int ifindex;
+
+   NEXT_ARG();
+   ifindex = ll_name_to_index(*argv);
+   if (!ifindex)
+   invarg("Not a valid VRF name\n", *argv);
+   if (!name_is_vrf(*argv))
+   invarg("Not a valid VRF name\n", *argv);
+   addattr_l(&req->n, sizeof(*req), IFLA_MASTER,
+ &ifindex, sizeof(ifindex));
} else if (matches(*argv, "nomaster") == 0) {
int ifindex = 0;
 
-- 
2.1.4
