Re: [patch net-next 2/3] mlxsw: expose EMAD transactions statistics via debugfs

2015-08-26 Thread Jiri Pirko
Wed, Aug 26, 2015 at 08:08:21AM CEST, da...@davemloft.net wrote:
>From: Jiri Pirko 
>Date: Wed, 26 Aug 2015 07:52:15 +0200
>
>> They are simple statistics. But they do not fit into any existing
>> interface. This is about EMAD packets. They are not per-netdevice, but
>> per-pcidevice. So I cannot put them into ethtool.
>> 
>> I see no other iface to expose this other than debugfs. Please suggest
>> some other way, I don't see it :/
>
>Then create one, instead of crapping up the driver with debugfs
>craziness.

I'm not sure it is possible to come up with a generic interface for
arbitrary statistics for some generic PCI device.

I can imagine the pushback saying "hey, put those statistics into subtree
specific area, like netdev etc". And that is correct. In vast majority
of cases, that can be done.

In mlxsw case however, 36 netdevices are sharing 1 pci device. And the
stuff related to that pci device cannot be exposed via netdev.

I don't think there are many more cases like this. Therefore I think that
for these cases, debugfs might be a good way to expose debugging stats.


>
>>>I'm not applying this, and I'm really getting irritated about how much
>>>garbage people put into debugfs when it has _NO_ business being there.
>> 
>> I think that is the primary purpose of this iface: to put arbitrary
>> debugging garbage there. Am I missing something?
>
>It's not garbage if it's useful for someone.
>
>If it's not useful, why even bother?
>
>This is why I hate debugfs, it's a fundamentally flawed facility.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v2] netlink: add NETLINK_CAP_ACK socket option

2015-08-26 Thread Jiri Benc
On Tue, 25 Aug 2015 21:43:29 +0200, Christophe Ricard wrote:
>  void netlink_ack(struct sk_buff *in_skb, struct nlmsghdr *nlh, int err)
>  {
> + struct netlink_sock *nlk;
>   struct sk_buff *skb;
>   struct nlmsghdr *rep;
>   struct nlmsgerr *errmsg;
>   size_t payload = sizeof(*errmsg);
> + struct sock *sk;
>  
> - /* error messages get the original request appened */
> - if (err)
> + sk = netlink_lookup(sock_net(in_skb->sk),
> + in_skb->sk->sk_protocol,
> + NETLINK_CB(in_skb).portid);

The necessity to look up the socket for every ack was what I didn't
like about this. Would it be possible to add a socket parameter to
various code paths that lead to netlink_ack (or a boolean, as David
suggested)? It will probably be needed to add it to
netlink_sock->netlink_rcv, netlink_kernel_cfg->input, etc.

As an alternative, David also suggested to attach the sender socket to
in_skb->sk. Could work, too.

Thanks,

 Jiri

-- 
Jiri Benc


RE: [Question] Usage of dev_hold()/dev_put()

2015-08-26 Thread Zhangjie (HZ)
Eric,
Thank you for your patient reply.
I still have one question.
In the receive path, the driver does not call dev_hold(), yet when the skb goes up to the host stack, 
skb->dev is likely to be used. 
If the device is destroyed before that, it seems dangerous.

Thank you!
Zhangjie
-----Original Message-----
From: Eric Dumazet [mailto:eric.duma...@gmail.com] 
Sent: Tuesday, August 25, 2015 10:25 PM
To: Zhangjie (HZ)
Cc: Jason Wang; netdev@vger.kernel.org; Qinchuanyu; Yewudi; liuyongan 00175866; 
Wangbicheng; Yan Chen
Subject: Re: [Question] Usage of dev_hold()/dev_put()

On Tue, 2015-08-25 at 08:28 +, Zhangjie (HZ) wrote:
> Hi,
> 
> The comments on the functions dev_hold() and dev_put() are really simple.
> 
> Actually, I can’t find a rule to follow.
> 
> Where should I call dev_hold()/dev_put()?
> 
> Are they necessary during xmit or receive or only in register/release?
> 
> I find that, for tap, it calls dev_hold()/dev_put() for every
> tun_sendmsg/tun_recvmsg:
> 
> So, should I call dev_hold()/dev_put() for each reference, such as:
> “skb->dev = dev” ?
> 
> But, for physical nic, I can’t find dev_hold()/dev_put() during xmit 
> or receive.
> 
> In what kind of scenario is it necessary to call dev_hold()/dev_put()?
> 
>  
> 
> I look forward to your feedback. 
> 
> Thank you ! J
> 
> Zhangjie
> 
> 
Please do not send HTML mails; otherwise they do not reach the netdev mailing list.

In general, you need to use dev_hold() for every reference on 'dev'
stored in the object.

However, there are some paths where you do not need that. For example, in the transmit 
path, skbs are stored either in a qdisc or in a device driver internal queue/ring. At 
device dismantle we properly delete all these skbs, so we do not have to worry 
about using dev_hold()/dev_put() in transmit.

In drivers/net/tun.c, the dev_hold()/dev_put() is only required so that the device 
doesn't disappear between tun_get() and actual use of it after a potentially 
long copy from user space (that might trigger page faults).
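
A minimal userspace sketch of the rule described above, under stated
assumptions: struct fake_dev and the fake_dev_*() helpers are hypothetical
stand-ins, not kernel APIs, and the real unregister path is more involved.
It only illustrates why a reference is taken around a long, possibly
sleeping operation: while the reference is held, teardown cannot complete.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct net_device: only a reference count. */
struct fake_dev {
    atomic_int refcnt;
};

/* Analogue of dev_hold(): take a reference before storing or using 'dev'. */
static void fake_dev_hold(struct fake_dev *dev)
{
    atomic_fetch_add(&dev->refcnt, 1);
}

/* Analogue of dev_put(): drop a reference taken with fake_dev_hold(). */
static void fake_dev_put(struct fake_dev *dev)
{
    int old = atomic_fetch_sub(&dev->refcnt, 1);

    assert(old > 0); /* an unbalanced put would be a bug */
}

/* In this model, teardown may only finish once every reference is gone. */
static bool fake_dev_can_free(struct fake_dev *dev)
{
    return atomic_load(&dev->refcnt) == 0;
}

/* Pin the device across a long operation; report whether it was pinned. */
static bool fake_long_operation(struct fake_dev *dev)
{
    bool pinned;

    fake_dev_hold(dev);
    /* A long copy from user space would happen here. */
    pinned = !fake_dev_can_free(dev);
    fake_dev_put(dev);
    return pinned;
}
```

Here fake_long_operation() plays the role of the tun path mentioned above,
where the device must not disappear between lookup and use.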





Re: [PATCH v2 net-next 1/5] net_sched: make tcf_hash_destroy() static

2015-08-26 Thread Daniel Borkmann

On 08/26/2015 05:06 AM, Alexei Starovoitov wrote:

tcf_hash_destroy() used once. Make it static.

Signed-off-by: Alexei Starovoitov 


Acked-by: Daniel Borkmann 


Re: [PATCH v2 net-next 2/5] net_sched: act_bpf: remove unnecessary copy

2015-08-26 Thread Daniel Borkmann

On 08/26/2015 05:06 AM, Alexei Starovoitov wrote:

Fix harmless typo and avoid unnecessary copy of empty 'prog' into
unused 'struct tcf_bpf_cfg old'.

Fixes: f4eaed28c783 ("act_bpf: fix memory leaks when replacing bpf programs")
Signed-off-by: Alexei Starovoitov 


Correct tag is actually net-next commit a5c90b29e5cc ("act_bpf: properly
support late binding of bpf action to a classifier").

Thanks for catching it!

Acked-by: Daniel Borkmann 


Re: [PATCH v2 net-next 3/5] net_sched: convert tcindex to call tcf_exts_destroy from rcu callback

2015-08-26 Thread Daniel Borkmann

On 08/26/2015 05:06 AM, Alexei Starovoitov wrote:

Adjust destroy path of cls_tcindex to call tcf_exts_destroy() after
rcu grace period.

Signed-off-by: Alexei Starovoitov 


Acked-by: Daniel Borkmann 


Re: [PATCH v2 net-next 4/5] net_sched: convert rsvp to call tcf_exts_destroy from rcu callback

2015-08-26 Thread Daniel Borkmann

On 08/26/2015 05:06 AM, Alexei Starovoitov wrote:

Adjust destroy path of cls_rsvp to call tcf_exts_destroy() after
rcu grace period.

Signed-off-by: Alexei Starovoitov 


Acked-by: Daniel Borkmann 


Re: [PATCH v2 net-next 5/5] net_sched: act_bpf: remove spinlock in fast path

2015-08-26 Thread Daniel Borkmann

On 08/26/2015 05:06 AM, Alexei Starovoitov wrote:

Similar to act_gact/act_mirred, act_bpf can be lockless in packet processing
with extra care taken to free bpf programs after rcu grace period.
Replacement of existing act_bpf (very rare) is done with synchronize_rcu()
and final destruction is done from tc_action_ops->cleanup() callback that is
called from tcf_exts_destroy()->tcf_action_destroy()->__tcf_hash_release() when
bind and refcnt reach zero which is only possible when classifier is destroyed.
Previous two patches fixed the last two classifiers (tcindex and rsvp) to
call tcf_exts_destroy() from rcu callback.

Similar to gact/mirred there is a race between prog->filter and
prog->tcf_action. Meaning that the program being replaced may use
previous default action if it happened to return TC_ACT_UNSPEC.
act_mirred race between tcf_action and tcfm_dev is similar.
In all cases the race is harmless.
Long term we may want to improve the situation by replacing the whole
tc_action->priv as single pointer instead of updating inner fields one by one.

Signed-off-by: Alexei Starovoitov 


Looks good to me, thanks!

Acked-by: Daniel Borkmann 
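
A rough userspace sketch of the long-term idea mentioned in the quoted
changelog: publish the whole configuration through a single pointer, so a
reader never sees the filter of one configuration paired with the default
action of another. This uses C11 atomics rather than kernel RCU, and
struct act_cfg, struct action and both helpers are hypothetical names. In
kernel code the corresponding primitives would be rcu_assign_pointer() and
rcu_dereference(), with the old object freed only after a grace period.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical configuration: both fields must always be seen together. */
struct act_cfg {
    int filter_id;
    int default_action;
};

/* Hypothetical action object holding one pointer to its configuration. */
struct action {
    _Atomic(struct act_cfg *) cfg;
};

/* Reader: a single load yields a consistent snapshot of both fields. */
static struct act_cfg *action_read(struct action *a)
{
    return atomic_load_explicit(&a->cfg, memory_order_acquire);
}

/*
 * Writer: publish a fully built replacement in one step and hand back the
 * old configuration so the caller can free it once no reader can hold it.
 */
static struct act_cfg *action_replace(struct action *a, struct act_cfg *new_cfg)
{
    assert(new_cfg != NULL);
    return atomic_exchange_explicit(&a->cfg, new_cfg, memory_order_acq_rel);
}
```

With field-by-field updates a reader can observe a mix of old and new
values, which is the harmless race the changelog describes; the
single-pointer swap removes that window at the cost of building a new
object for each replacement.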


Re: [PATCH v4 net-next] r8169: Add values missing in @get_stats64 from HW counters

2015-08-26 Thread Corinna Vinschen
On Aug 26 00:54, Francois Romieu wrote:
> David Miller  :
> [...]
> > Your counter offsets should be read at probe time, not open time.
> 
> It can be done but the "CmdRxEnb / rx traffic must be enabled" constraint
> will make it a major pita. 
> 
> Reading counter offsets at the end of open() naturally solves this
> constraint (retentive error unwinding in open() stops being completely
> trivial though :o/ ).
> 
> > Bringing the interface down/up should not reset the
> > counters.
> 
> Afaiks rtl8169_tc_offsets.inited in rtl8169_init_counter_offsets
> takes care of it: it's set during the first open() after probe().
> 
> Looking at it again, the patch directly stores 16 and 32 bit values
> in rtnl_link_stats64. Nobody should care about exact exceedingly high
> error count but rx_multicast ought to be accumulated.

I'll have a look into that for a followup patch.


Thanks,
Corinna




Re: [PATCH v4 net-next] r8169: Add values missing in @get_stats64 from HW counters

2015-08-26 Thread Corinna Vinschen
On Aug 25 16:03, David Miller wrote:
> From: David Miller 
> Date: Tue, 25 Aug 2015 15:59:21 -0700 (PDT)
> 
> > From: Francois Romieu 
> > Date: Wed, 26 Aug 2015 00:54:06 +0200
> > 
> >>> Bringing the interface down/up should not reset the
> >>> counters.
> >> 
> >> Afaiks rtl8169_tc_offsets.inited in rtl8169_init_counter_offsets
> >> takes care of it: it's set during the first open() after probe().
> > 
> > Ok, then it's fine.
> 
> And as such I've applied this patch, thanks.

Thanks,
Corinna




RE: [PATCH] IGMP: Inhibit reports for local multicast groups

2015-08-26 Thread Philip Downey


> -Original Message-
> From: David Miller [mailto:da...@davemloft.net]
> Sent: Tuesday, August 25, 2015 10:20 PM
> To: Philip Downey
> Cc: kuz...@ms2.inr.ac.ru; jmor...@namei.org; yoshf...@linux-ipv6.org;
> ka...@trash.net; linux-ker...@vger.kernel.org; netdev@vger.kernel.org
> Subject: Re: [PATCH] IGMP: Inhibit reports for local multicast groups
> 
> From: Philip Downey 
> Date: Mon, 24 Aug 2015 12:39:17 +0100
> 
> > +extern int sysctl_igmp_link_local_reports;
>  ...
> > +/* IGMP reports for link-local multicast groups are enabled by default */
> > +#define IGMP_ENABLE_LLM 1
> > +
> > +int sysctl_igmp_link_local_reports __read_mostly = IGMP_ENABLE_LLM;
> > +
> > +#define IGMP_INHIBIT_LINK_LOCAL_REPORTS(_ipaddr) \
> > +   (ipv4_is_local_multicast(_ipaddr) && \
> > +(sysctl_igmp_link_local_reports == 0))
> > +
> 
> People know that "1" and "0" mean enable and disable respectively, so this
> macro is pretty excessive.  Just remove it.
> 
> Also, simplify the name of the sysctl to something like
> "sysctl_igmp_llm_reports" or similar, and simplify the test against 0 to be in
> the canonical "!x" format.  Then the test can fit on one
> line:
> 
>   (ipv4_is_local_multicast(_ipaddr) && !sysctl_igmp_llm_reports).

Thanks for reviewing David.
I will make the requested changes  (fitting the test on a single line was my 
main reason for introducing the macro - that and making it patently obvious 
what the test was doing.  Your suggestion would seem to meet that aim).

Will amend and resubmit.

Regards

Philip
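
A small userspace sketch of the simplified test suggested in the review,
under stated assumptions: is_local_multicast() is a host-byte-order
stand-in for the kernel's ipv4_is_local_multicast() (which takes a
big-endian address), inhibit_report() is a hypothetical helper name, and
the variable name follows the reviewer's suggestion.

```c
#include <stdbool.h>
#include <stdint.h>

/* Reports for link-local multicast groups are enabled by default. */
static int sysctl_igmp_llm_reports = 1;

/* Stand-in check for 224.0.0.0/24, with the address in host byte order. */
static bool is_local_multicast(uint32_t addr)
{
    return (addr & 0xffffff00u) == 0xe0000000u;
}

/* The whole test in the canonical "!x" form, fitting on one line. */
static bool inhibit_report(uint32_t addr)
{
    return is_local_multicast(addr) && !sysctl_igmp_llm_reports;
}
```

With the sysctl left at its default of 1 nothing is inhibited; setting it
to 0 suppresses reports only for groups in 224.0.0.0/24.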


Re: [[net-next] lo uses DEPRECATED zero tx_queue_len - convert driver to use IFF_NO_QUEUE instead

2015-08-26 Thread Phil Sutter
On Tue, Aug 25, 2015 at 01:16:41PM +0200, Geert Uytterhoeven wrote:
> I don't know if this was reported before (I'm not subscribed to netdev), but
> Google couldn't find it:
> 
> lo uses DEPRECATED zero tx_queue_len - convert driver to use
> IFF_NO_QUEUE instead

This is fixed by commit e65db2b ("net: loopback: convert to using
IFF_NO_QUEUE"), part of a bigger series converting drivers (see here:
http://lists.openwall.net/netdev/2015/08/18/52).

Cheers, Phil


Re: [net-next PATCH v2 3/3] net: sched: fall back to noqueue when removing root qdisc

2015-08-26 Thread Phil Sutter
Hi Jesper,

On Sun, Aug 23, 2015 at 08:53:57PM +0200, Jesper Dangaard Brouer wrote:
> On Sun, 23 Aug 2015 20:44:42 +0200
> Jesper Dangaard Brouer  wrote:
> 
> > On Sat, 22 Aug 2015 02:20:56 +0200
> > Phil Sutter  wrote:
> > 
> > > When removing the root qdisc, the interface should fall back to noqueue
> > > as the 'real' minimal qdisc instead of the default one. 
> > 
> > I worry this behavior could break existing scripts.
> 
> You would break OpenWRT package "qos-scripts", specifically:
>  
> https://github.com/openwrt-mirror/openwrt/blob/master/package/network/config/qos-scripts/files/usr/bin/qos-stop

Thanks for pointing this out!

> Which cleans-up/clear the qdisc setup by removing the root qdisc,
> assuming and depending on the default qdisc is re-assigned.

OK. Since the premise of the whole thing is to not break existing
scripts, this sadly tears down my approach.

> > I prefer the idea of allowing tc command to assign noqueue (to any
> > device).  This makes the action explicit for the user, instead of being
> > a side-effect of removing a qdisc. (and does not break backward compat)

I will give this another go. What I didn't like was that after attaching
noqueue, tc would output nothing when asked to show the attached qdisc -
which is of debatable correctness at least. But maybe that's just a user
space problem I could address separately.

Cheers, Phil


Re: [[net-next] lo uses DEPRECATED zero tx_queue_len - convert driver to use IFF_NO_QUEUE instead

2015-08-26 Thread Geert Uytterhoeven
Hi Phil,

On Wed, Aug 26, 2015 at 11:34 AM, Phil Sutter  wrote:
> On Tue, Aug 25, 2015 at 01:16:41PM +0200, Geert Uytterhoeven wrote:
>> I don't know if this was reported before (I'm not subscribed to netdev), but
>> Google couldn't find it:
>>
>> lo uses DEPRECATED zero tx_queue_len - convert driver to use
>> IFF_NO_QUEUE instead
>
> This is fixed by commit e65db2b ("net: loopback: convert to using
> IFF_NO_QUEUE"), part of a bigger series converting drivers (see here:
> http://lists.openwall.net/netdev/2015/08/18/52).

Strange, that commit is included in my tree, which is based on
https://git.kernel.org/cgit/linux/kernel/git/geert/renesas-drivers.git/
and includes net-next?

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


[PATCH] mac80211: Do not use sizeof() on pointer type

2015-08-26 Thread Thierry Reding
From: Thierry Reding 

The rate_control_cap_mask() function takes a parameter mcs_mask, which
GCC will take to be u8 * even though it was declared with a fixed size.
This causes the following warning:

net/mac80211/rate.c: In function 'rate_control_cap_mask':
net/mac80211/rate.c:719:25: warning: 'sizeof' on array function parameter 'mcs_mask' will return size of 'u8 * {aka unsigned char *}' [-Wsizeof-array-argument]
   for (i = 0; i < sizeof(mcs_mask); i++)
 ^
net/mac80211/rate.c:684:10: note: declared here
   u8 mcs_mask[IEEE80211_HT_MCS_MASK_LEN],
  ^

This can be easily fixed by using the IEEE80211_HT_MCS_MASK_LEN directly
within the loop condition.

Signed-off-by: Thierry Reding 
---
 net/mac80211/rate.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/mac80211/rate.c b/net/mac80211/rate.c
index 9857693b91ec..9ce8883d5f44 100644
--- a/net/mac80211/rate.c
+++ b/net/mac80211/rate.c
@@ -716,7 +716,7 @@ static bool rate_control_cap_mask(struct ieee80211_sub_if_data *sdata,
 
 	/* Filter out rates that the STA does not support */
 	*mask &= sta->supp_rates[sband->band];
-	for (i = 0; i < sizeof(mcs_mask); i++)
+	for (i = 0; i < IEEE80211_HT_MCS_MASK_LEN; i++)
 		mcs_mask[i] &= sta->ht_cap.mcs.rx_mask[i];
 
 	sta_vht_cap = sta->vht_cap.vht_mcs.rx_mcs_map;
-- 
2.4.5

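
A standalone sketch of the pitfall the patch above addresses, assuming a
mask length of 10 for illustration: MASK_LEN stands in for
IEEE80211_HT_MCS_MASK_LEN, and buggy_bound() and mask_rates() are
hypothetical names. An array parameter is adjusted to a pointer, so
sizeof() inside the function yields the pointer size and, on a 64-bit
build, a loop bounded by it would stop after 8 of the 10 bytes.

```c
#include <stddef.h>

#define MASK_LEN 10 /* illustrative stand-in for IEEE80211_HT_MCS_MASK_LEN */

/* What sizeof(mcs_mask) evaluates to inside the callee: a pointer size. */
static size_t buggy_bound(void)
{
    return sizeof(unsigned char *);
}

/* Fixed loop: bound by the array length constant, not by sizeof(). */
static void mask_rates(unsigned char mask[MASK_LEN],
                       const unsigned char rx_mask[MASK_LEN])
{
    size_t i;

    for (i = 0; i < MASK_LEN; i++)
        mask[i] &= rx_mask[i];
}
```

Using the constant directly keeps the loop correct regardless of the
pointer width of the target.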


Re: [PATCH RFC 0/2] Optimize the snmp stat aggregation for large cpus

2015-08-26 Thread Raghavendra K T

On 08/26/2015 04:37 AM, David Miller wrote:

From: Raghavendra K T 
Date: Tue, 25 Aug 2015 13:24:24 +0530


Please let me know if you have suggestions/comments.


Like Eric Dumazet said the idea is good but needs some adjustments.

You might want to see whether a per-cpu work buffer works for this.


sure, Let me know if I understood correctly,

we allocate the temp buffer,
we will have a  "add_this_cpu_data" function and do

for_each_online_cpu(cpu)
smp_call_function_single(cpu, add_this_cpu_data, buffer, 1)

if not could you please point to an example you had in mind.



It's extremely unfortunate that we can't depend upon the destination
buffer being properly aligned, because we wouldn't need a temporary
scratch area if it were aligned properly.


True, but I think for 64 bit cpus when (pad == 0) we can go ahead and
use stats array directly and get rid of put_unaligned(). Is that correct?

(my internal initial patch had this version but thought it is ugly to
have ifdef BITS_PER_LONG==64)






Re: [PATCH] route: put lwstate before freeing dst to avoid use after free

2015-08-26 Thread Jiri Benc
On Tue, 25 Aug 2015 14:25:14 -0400, Sasha Levin wrote:
> Commit 61adedf3 ("route: move lwtunnel state to dst_entry") is trying to
> release lwstate after getting rid of dst, which causes a use-after-free
> trying to access dst->lwstate.
> 
> Fixes: 61adedf3 ("route: move lwtunnel state to dst_entry")
> Signed-off-by: Sasha Levin 

Already fixed by e252b3d1a174 in net-next.

 Jiri

-- 
Jiri Benc


Re: [[net-next] lo uses DEPRECATED zero tx_queue_len - convert driver to use IFF_NO_QUEUE instead

2015-08-26 Thread Phil Sutter
Hi Geert,

On Wed, Aug 26, 2015 at 12:16:47PM +0200, Geert Uytterhoeven wrote:
> On Wed, Aug 26, 2015 at 11:34 AM, Phil Sutter  wrote:
> > On Tue, Aug 25, 2015 at 01:16:41PM +0200, Geert Uytterhoeven wrote:
> >> I don't know if this was reported before (I'm not subscribed to netdev), 
> >> but
> >> Google couldn't find it:
> >>
> >> lo uses DEPRECATED zero tx_queue_len - convert driver to use
> >> IFF_NO_QUEUE instead
> >
> > This is fixed by commit e65db2b ("net: loopback: convert to using
> > IFF_NO_QUEUE"), part of a bigger series converting drivers (see here:
> > http://lists.openwall.net/netdev/2015/08/18/52).
> 
> Strange, that commit is included in my tree, which is based on
> https://git.kernel.org/cgit/linux/kernel/git/geert/renesas-drivers.git/
> and includes net-next?

I'm sorry, the above statement was too quickly put. Florian Westphal
confirmed the problem you are seeing privately, and I can follow from
looking at the code. Obviously I failed to notice that by using
alloc_netdev instead of alloc_etherdev, there is a way for drivers to
circumvent ether_setup completely which leaves tx_queue_len
uninitialized (i.e., 0) and therefore triggers the warning.

I'm yet unsure how to properly fix this issue, but moving the check to a
more appropriate place is certainly advisable.

Thanks for pointing this out,

Phil


Re: [PATCH RFC 2/2] net: Optimize snmp stat aggregation by walking all the percpu data at once

2015-08-26 Thread Raghavendra K T

On 08/25/2015 09:30 PM, Eric Dumazet wrote:

On Tue, 2015-08-25 at 21:17 +0530, Raghavendra K T wrote:

On 08/25/2015 07:58 PM, Eric Dumazet wrote:





This is a great idea, but kcalloc()/kmalloc() can fail and you'll crash
the whole kernel at this point.



Good catch, and my bad. Though the system is in a bad memory condition,
since fill_stat is not critical for the system, do you think silently
returning from here is a good idea?
Or do you think we should propagate -ENOMEM all the way up?


Hmm... presumably these 288 bytes could be allocated in
inet6_fill_ifla6_attrs() stack frame.


Correct. Since we need to allocate for IPSTATS_MIB_MAX, we could even do
this in the snmp6_fill_stats() stack frame.




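
A userspace sketch of the approach discussed above, under stated
assumptions: fill_stats(), DEMO_CPUS and DEMO_ITEMS are hypothetical
names, plain arrays stand in for the kernel's per-cpu data, and memcpy()
stands in for put_unaligned(). DEMO_ITEMS is set to 36 only so that the
scratch buffer is the 288 bytes quoted in the thread. The point is that a
small on-stack buffer removes the allocation that could fail, while each
CPU's data is still walked only once.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_CPUS  4   /* illustrative CPU count */
#define DEMO_ITEMS 36  /* 36 64-bit counters = 288 bytes */

/*
 * Sum every counter across CPUs into an aligned scratch buffer on the
 * stack, then copy the result to a destination that may be unaligned.
 */
static void fill_stats(void *dst, uint64_t percpu[][DEMO_ITEMS], int ncpus)
{
    uint64_t buff[DEMO_ITEMS] = { 0 }; /* no allocation, so nothing to fail */
    int cpu, i;

    assert(ncpus > 0);
    for (cpu = 0; cpu < ncpus; cpu++)
        for (i = 0; i < DEMO_ITEMS; i++)
            buff[i] += percpu[cpu][i];

    memcpy(dst, buff, sizeof(buff));
}
```

The trade-off is stack usage: 288 bytes is modest, but a larger counter
set would argue for a different scratch area.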


Re: [rhashtable-test] EIP is at lock_is_held

2015-08-26 Thread Phil Sutter
Hi,

(Full-quoting here due to added mailing lists.)

Looks like this is a problem of slow systems. I will try to reproduce
and come up with a similar fix as in commit 685a015 ("rhashtable: Allow
other tasks to be scheduled in large lookup loops").

Thanks for reporting,

Phil

On Mon, Aug 24, 2015 at 12:40:43PM +0800, Fengguang Wu wrote:
> Greetings,
> 
> 0day kernel testing robot got the below dmesg and the first bad commit is
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue.git master
> 
> commit f4a3e90ba5739cfd761b6befadae9728bd3641ed
> Author: Phil Sutter 
> AuthorDate: Sat Aug 15 00:37:15 2015 +0200
> Commit: David S. Miller 
> CommitDate: Mon Aug 17 14:33:47 2015 -0700
> 
> rhashtable-test: extend to test concurrency
> 
> After having tested insertion, lookup, table walk and removal, spawn a
> number of threads running operations on the same rhashtable. Each of
> them will:
> 
> 1) insert its own set of objects,
> 2) lookup every successfully inserted object and finally
> 3) remove objects in several rounds until all of them have been removed,
>making sure the remaining ones are still found after each round.
> 
> This should put a good amount of load onto the system and due to
> synchronising thread startup via two semaphores also extensive
> concurrent table access.
> 
> The default number of ten threads returned within half a second on my
> local VM with two cores. Running 200 threads took about four seconds. If
> slow systems suffer too much from this though, the default could be
> lowered or even set to zero so this extended test does not run at all by
> default.
> 
> Signed-off-by: Phil Sutter 
> Acked-by: Thomas Graf 
> Signed-off-by: David S. Miller 
> 
> +----------------------------------------------------+------------+------------+------------+
> |                                                    | c1f066d4ee | f4a3e90ba5 | 6967aa466b |
> +----------------------------------------------------+------------+------------+------------+
> | boot_successes                                     | 1060       | 808        | 106        |
> | boot_failures                                      | 1          | 102        | 21         |
> | INFO:possible_circular_locking_dependency_detected | 1          |            |            |
> | backtrace:vfs_readv                                | 1          |            |            |
> | backtrace:SyS_readv                                | 1          |            |            |
> | backtrace:blk_mq_sysfs_unregister                  | 1          |            |            |
> | backtrace:blk_mq_queue_reinit_notify               | 1          |            |            |
> | backtrace:debug_hotplug_cpu                        | 1          |            |            |
> | backtrace:kernel_init_freeable                     | 1          |            |            |
> | EIP_is_at_lock_is_held                             | 0          | 63         | 16         |
> | Kernel_panic-not_syncing:softlockup:hung_tasks     | 0          | 102        | 19         |
> | backtrace:threadfunc                               | 0          | 101        | 19         |
> | EIP_is_at_rcu_read_lock_held                       | 0          | 10         | 5          |
> | EIP_is_at_rcu_lockdep_current_cpu_online           | 0          | 9          |            |
> | EIP_is_at_thread_lookup_test                       | 0          | 11         | 1          |
> | EIP_is_at_lock_release                             | 0          | 3          |            |
> | EIP_is_at_lockdep_rht_bucket_is_held               | 0          | 3          |            |
> | EIP_is_at_rcu_is_watching                          | 0          | 5          |            |
> | backtrace:apic_timer_interrupt                     | 0          | 3          |            |
> | EIP_is_at_lock_acquire                             | 0          | 12         | 2          |
> | EIP_is_at_jhash                                    | 0          | 11         |            |
> | EIP_is_at_raw_spin_lock_bh                         | 0          | 1          |            |
> | EIP_is_at_debug_lockdep_rcu_enabled                | 0          | 13         |            |
> | EIP_is_at_lockdep_rht_mutex_is_held                | 0          | 3          | 2          |
> | EIP_is_at_raw_spin_unlock_bh                       | 0          | 1          |            |
> | EIP_is_at_do_raw_spin_lock                         | 0          | 1          |            |
> | EIP_is_at__local_bh_enable_ip                      | 0          | 1          |            |
> | EIP_is_at_threadfunc                               | 0          | 1          |            |
> | IP-Config:Auto-configuration_of_network_failed     | 0          | 0          | 2          |
> +----------------------------------------------------+------------+------------+------------+

Re: [PATCH net-next v2] bridge: vlan: allow to suppress local mac install for all vlans

2015-08-26 Thread Nikolay Aleksandrov

> On Aug 25, 2015, at 11:06 PM, David Miller  wrote:
> 
> From: Nikolay Aleksandrov 
> Date: Tue, 25 Aug 2015 22:28:16 -0700
> 
>> Certainly, that should be done and I will look into it, but the
>> essence of this patch is a bit different. The problem here is not
>> the size of the fdb entries, it’s more the number of them - having
>> 96000 entries (even if they were 1 byte ones) is just way too much
>> especially when the fdb hash size is small and static. We could work
>> on making it dynamic though, but still these type of local entries
>> per vlan per port can easily be avoided with this option.
> 
> 96000 bits can be stored in 12k.  Get where I'm going with this?
> 
> Look at the problem sideways.

Oh okay, I misunderstood your previous comment. I’ll look into that.

Thanks,
 Nik
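
A minimal sketch of the arithmetic behind the "96000 bits can be stored
in 12k" remark, assuming 8-bit bytes. struct flag_bitmap, flag_set() and
flag_test() are hypothetical names, and what each bit would index (for
example a VLAN/port pair) is an assumption; the thread does not spell it
out.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define NBITS        96000
#define BITMAP_BYTES ((NBITS + CHAR_BIT - 1) / CHAR_BIT) /* 12000 if CHAR_BIT is 8 */

/* One bit per would-be entry instead of one table entry each. */
struct flag_bitmap {
    unsigned char bits[BITMAP_BYTES];
};

/* Mark entry n as present. */
static void flag_set(struct flag_bitmap *b, size_t n)
{
    assert(n < NBITS);
    b->bits[n / CHAR_BIT] |= (unsigned char)(1u << (n % CHAR_BIT));
}

/* Report whether entry n is present. */
static bool flag_test(const struct flag_bitmap *b, size_t n)
{
    assert(n < NBITS);
    return (b->bits[n / CHAR_BIT] >> (n % CHAR_BIT)) & 1u;
}
```

The kernel has its own bitmap helpers (set_bit(), test_bit()); the
hand-rolled version here only makes the size calculation concrete.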


Re: ip_rcv_finish() NULL pointer and possibly related Oopses

2015-08-26 Thread Chuck Ebbert
On Wed, 26 Aug 2015 08:46:59 +
Shaun Crampton  wrote:

> Testing our app at scale on Google's GCE, running ~1000 CoreOS hosts: over
> approximately 1 hour, I see about 1 in 50 hosts hit one of the Oopses
> below and then reboot (I'm not sure if the different oopses are related to
> each other).
> 
> The app is Project Calico, which is a datacenter networking fabric.
> calico-felix, the process named below, is our per-host agent.  The
> per-host agent is responsible for reading the network information from a
> central server and applying "ip route" and "iptables" updates to the
> kernel.  We're running on CoreOS, with about 100  docker containers/veths
> pairs running on each host.  calico-felix is running inside one of those
> containers. We also run the BIRD BGP stack to redistribute routes around
> the datacenter.  The errors happen more frequently while Calico is under
> load.
> 
> I'm not sure where to go from here.  I can reproduce these issues easily
> at that scale but I haven't managed to boil it down to a small-scale repro
> scenario for further investigation (yet).
> 

What in the world is going on with those call traces? E.g.:

> [ 4513.712008]  
> [ 4513.712008]  [] ? ip_rcv_finish+0x81/0x360
> [ 4513.712008]  [] ip_rcv+0x2a4/0x400
> [ 4513.712008]  [] ? inet_del_offload+0x40/0x40
> [ 4513.712008]  [] __netif_receive_skb_core+0x6c3/0x9a0
> [ 4513.712008]  [] ? build_skb+0x17/0x90
> [ 4513.712008]  [] __netif_receive_skb+0x18/0x60
> [ 4513.712008]  [] netif_receive_skb_internal+0x33/0xa0
> [ 4513.712008]  [] netif_receive_skb_sk+0x1c/0x70
> [ 4513.712008]  [] 0xa00f772b
> [ 4513.712008]  [] ? __netif_receive_skb_core+0x6c3/0x9a0
> [ 4513.712008]  [] 0xa00f7d81
> [ 4513.712008]  [] net_rx_action+0x159/0x340
> [ 4513.712008]  [] __do_softirq+0xf4/0x290
> [ 4513.712008]  [] irq_exit+0xad/0xc0
> [ 4513.712008]  [] do_IRQ+0x5a/0xf0
> [ 4513.712008]  [] common_interrupt+0x6e/0x6e
> [ 4513.712008]  

There are two functions in the call trace that the kernel knows
nothing about. How did they get in there?

And there is really executable code in there, as can be seen from a
later trace:

> [ 4123.003006]  
> [ 4123.003006]  [] nf_iterate+0x57/0x80
> [ 4123.003006]  [] nf_hook_slow+0x97/0x100
> [ 4123.003006]  [] ip_local_deliver+0x92/0xa0
> [ 4123.003006]  [] ? ip_rcv_finish+0x360/0x360
> [ 4123.003006]  [] ip_rcv_finish+0x81/0x360
> [ 4123.003006]  [] ip_rcv+0x2a4/0x400
> [ 4123.003006]  [] ? inet_del_offload+0x40/0x40
> [ 4123.003006]  [] __netif_receive_skb_core+0x6c3/0x9a0
> [ 4123.003006]  [] ? build_skb+0x17/0x90
> [ 4123.003006]  [] __netif_receive_skb+0x18/0x60
> [ 4123.003006]  [] netif_receive_skb_internal+0x33/0xa0
> [ 4123.003006]  [] netif_receive_skb_sk+0x1c/0x70
> [ 4123.003006]  [] 0xa00d472b
> [ 4123.003006]  [] 0xa00d4d81
> [ 4123.003006]  [] net_rx_action+0x159/0x340
> [ 4123.003006]  [] __do_softirq+0xf4/0x290
> [ 4123.003006]  [] irq_exit+0xad/0xc0
> [ 4123.003006]  [] do_IRQ+0x5a/0xf0
> [ 4123.003006]  [] common_interrupt+0x6e/0x6e
> [ 4123.003006]  
> [ 4123.003006]  [] ? __ip_route_output_key+0x31d/0x860
> [ 4123.003006]  [] ? xfrm_lookup_route+0x5/0x70
> [ 4123.003006]  [] ? ip_route_output_flow+0x54/0x60
> [ 4123.003006]  [] ip_queue_xmit+0x36a/0x3d0
> [ 4123.003006]  [] tcp_transmit_skb+0x4b9/0x990
> [ 4123.003006]  [] tcp_write_xmit+0x115/0xe90
> [ 4123.003006]  [] __tcp_push_pending_frames+0x32/0xd0
> [ 4123.003006]  [] tcp_push+0xef/0x120
> [ 4123.003006]  [] tcp_sendmsg+0xc5/0xb20
> [ 4123.003006]  [] ? lock_hrtimer_base.isra.22+0x29/0x50
> [ 4123.003006]  [] inet_sendmsg+0x64/0xa0
> [ 4123.003006]  [] ? __fget_light+0x25/0x70
> [ 4123.003006]  [] sock_sendmsg+0x3d/0x50
> [ 4123.003006]  [] SYSC_sendto+0x102/0x1a0
> [ 4123.003006]  [] ? __audit_syscall_entry+0xb4/0x110
> [ 4123.003006]  [] ? do_audit_syscall_entry+0x6c/0x70
> [ 4123.003006]  [] ?
> syscall_trace_enter_phase1+0x103/0x160
> [ 4123.003006]  [] SyS_sendto+0xe/0x10
> [ 4123.003006]  [] system_call_fastpath+0x12/0x71
> [ 4123.003006] Code: <48> 8b 88 40 03 00 00 e8 1d dd dd ff 5d c3 0f 1f 00
> 41 83 b9 80 00 
> [ 4123.003006] RIP  [] 0xa0233027
> [ 4123.003006]  RSP 

Presumably the same two functions as before (loaded at a different
base address but same offsets, 0xd81 and 0x72b). And then nf_iterate
call into another unknown function, and there really is code there
and it's consistent with the oops. And the kernel thinks it's
outside of any normal text section, so it does not try to dump any
code from before the instruction pointer.

   0:   48 8b 88 40 03 00 00    mov    0x340(%rax),%rcx
   7:   e8 1d dd dd ff          callq  0xff29
   c:   5d                      pop    %rbp
   d:   c3                      retq

Did you write your own module loader or something?


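
The "same offsets" observation above can be checked with a little
arithmetic. This is a minimal sketch assuming 4 KiB pages and using the
addresses exactly as they are printed in the quoted traces;
offset_in_page() is a hypothetical helper name.

```c
#include <stdint.h>

#define PAGE_OFFSET_MASK_4K 0xfffu /* assumes 4 KiB pages */

/* Keep only the low 12 bits: the position of an address within its page. */
static uint32_t offset_in_page(uint64_t addr)
{
    return (uint32_t)(addr & PAGE_OFFSET_MASK_4K);
}
```

Applied to the two pairs of unknown addresses, 0xa00f772b and 0xa00d472b
both give 0x72b, and 0xa00f7d81 and 0xa00d4d81 both give 0xd81, which is
consistent with the same code being loaded at two different base
addresses.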

[PATCH] atomics,cmpxchg: Privatize the inclusion of asm/cmpxchg.h

2015-08-26 Thread Boqun Feng
After commit:

atomics: add acquire/release/relaxed variants of some atomic operations

Architectures may only provide {cmp,}xchg_relaxed definitions in
asm/cmpxchg.h. Other variants, such as {cmp,}xchg, may be built in
linux/atomic.h, which means simply including asm/cmpxchg.h may not get
the definitions of all the {cmp,}xchg variants. Therefore, we should
privatize the inclusions of asm/cmpxchg.h to keep it only included in
arch/* and replace the inclusions outside with linux/atomic.h.

Acked-by: Will Deacon 
Signed-off-by: Boqun Feng 
---
 Documentation/atomic_ops.txt| 4 
 drivers/net/ethernet/sfc/mcdi.c | 2 +-
 drivers/phy/phy-rcar-gen2.c | 3 +--
 drivers/staging/speakup/selection.c | 2 +-
 4 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/Documentation/atomic_ops.txt b/Documentation/atomic_ops.txt
index b19fc34..c9d1cac 100644
--- a/Documentation/atomic_ops.txt
+++ b/Documentation/atomic_ops.txt
@@ -542,6 +542,10 @@ The routines xchg() and cmpxchg() must provide the same exact
 memory-barrier semantics as the atomic and bit operations returning
 values.
 
+Note: If someone wants to use xchg(), cmpxchg() and their variants,
+linux/atomic.h should be included rather than asm/cmpxchg.h, unless
+the code is in arch/* and can take care of itself.
+
 Spinlocks and rwlocks have memory barrier expectations as well.
 The rule to follow is simple:
 
diff --git a/drivers/net/ethernet/sfc/mcdi.c b/drivers/net/ethernet/sfc/mcdi.c
index 81640f8..968383e 100644
--- a/drivers/net/ethernet/sfc/mcdi.c
+++ b/drivers/net/ethernet/sfc/mcdi.c
@@ -9,7 +9,7 @@
 
 #include 
 #include 
-#include 
+#include 
 #include "net_driver.h"
 #include "nic.h"
 #include "io.h"
diff --git a/drivers/phy/phy-rcar-gen2.c b/drivers/phy/phy-rcar-gen2.c
index 39d9b29..117b495 100644
--- a/drivers/phy/phy-rcar-gen2.c
+++ b/drivers/phy/phy-rcar-gen2.c
@@ -17,8 +17,7 @@
 #include 
 #include 
 #include 
-
-#include 
+#include 
 
 #define USBHS_LPSTS0x02
 #define USBHS_UGCTRL   0x80
diff --git a/drivers/staging/speakup/selection.c b/drivers/staging/speakup/selection.c
index a031570..81c0888 100644
--- a/drivers/staging/speakup/selection.c
+++ b/drivers/staging/speakup/selection.c
@@ -7,7 +7,7 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 
 #include "speakup.h"
 
-- 
2.5.0



[PATCH, net-next] r8169:Correct value on r810x_phy_power_up function

2015-08-26 Thread Corcodel Marian
Correct the value written in r810x_phy_power_up(): powering up should
normally clear the BMCR_PDOWN bit.

Signed-off-by: Corcodel Marian 

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index d6d39df..91cf3a6 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -4669,7 +4669,7 @@ static void r810x_phy_power_down(struct rtl8169_private *tp)
 static void r810x_phy_power_up(struct rtl8169_private *tp)
 {
rtl_writephy(tp, 0x1f, 0x);
-   rtl_writephy(tp, MII_BMCR, BMCR_ANENABLE);
+   rtl_writephy(tp, MII_BMCR, ~BMCR_PDOWN);
 }
 
 static void r810x_pll_power_down(struct rtl8169_private *tp)
-- 
2.1.4
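
For reference, clearing a single register bit is usually written as a
read-modify-write. The sketch below is plain C, not driver code; the bit
values are the ones from the kernel's MII definitions, and the helper
name bmcr_clear_pdown() is hypothetical. It also shows that the literal
value ~BMCR_PDOWN, truncated to 16 bits, has every other bit set,
including BMCR_RESET, BMCR_LOOPBACK and BMCR_ISOLATE. Whether that is
acceptable for this PHY is not something the sketch can answer.

```c
#include <assert.h>
#include <stdint.h>

/* Basic mode control register bits, as in the kernel's MII header. */
#define BMCR_ISOLATE   0x0400
#define BMCR_PDOWN     0x0800
#define BMCR_ANENABLE  0x1000
#define BMCR_LOOPBACK  0x4000
#define BMCR_RESET     0x8000

/*
 * Hypothetical helper: given the current register value, return the
 * value to write back so that only the power-down bit is cleared.
 */
static uint16_t bmcr_clear_pdown(uint16_t bmcr)
{
	return bmcr & (uint16_t)~BMCR_PDOWN;
}
```

A caller would read MII_BMCR, pass the value through the helper, and
write the result back, leaving the other control bits untouched.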



Re: [PATCH net-next] bridge: vlan: allow to suppress local mac install for all vlans

2015-08-26 Thread Vlad Yasevich
On 08/24/2015 08:55 PM, Nikolay Aleksandrov wrote:
> From: Nikolay Aleksandrov 
> 
> This patch adds a new knob that, when enabled, allows to suppress the
> installation of local fdb entries in newly created vlans. This could
> pose a big scalability issue if we have a large number of ports and a
> large number of vlans, e.g. in a 48 port device with 2000 vlans these
> entries easily go up to 96000.
> Note that packets for these macs are still received properly because they
> are added in vlan 0 as "own" macs and referenced when fdb lookup by vlan
> results in a miss.
> Also note that vlan membership of ingress port and the bridge device
> as egress are still being correctly enforced.
> 
> The default (0/off) is keeping the current behaviour.
> 
> Based on a patch by Wilson Kok (w...@cumulusnetworks.com).
> 
> Signed-off-by: Nikolay Aleksandrov 
> ---
> As usual I'll post iproute2 patch if this one gets accepted.
> 

... snip...

> diff --git a/net/bridge/br_vlan.c b/net/bridge/br_vlan.c
> index 3cef6892c0bb..f9efa1b07994 100644
> --- a/net/bridge/br_vlan.c
> +++ b/net/bridge/br_vlan.c
> @@ -98,11 +98,12 @@ static int __vlan_add(struct net_port_vlans *v, u16 vid, u16 flags)
>   return err;
>   }
>  
> - err = br_fdb_insert(br, p, dev->dev_addr, vid);
> - if (err) {
> - br_err(br, "failed insert local address into bridge "
> -"forwarding table\n");
> - goto out_filt;
> + if (!br_vlan_ignore_local_fdb(br) || !v->port_idx) {
> + err = br_fdb_insert(br, p, dev->dev_addr, vid);
> + if (err) {
> + br_err(br, "failed insert local address into bridge forwarding table\n");
> + goto out_filt;
> + }
>   }
>

One question.  Does it make sense to push this down into br_fdb_insert?
This patch prevents automatic entries from being added.  But what about
manual entries for a local fdb?  The code in br_fdb_add() will still add a
vid 0 entry as well as entries for all vlans currently configured on the port.

-vlad

>   set_bit(vid, v->vlan_bitmap);
> @@ -492,6 +493,13 @@ int br_vlan_filter_toggle(struct net_bridge *br, unsigned long val)
>   return 0;
>  }
>  
> +int br_vlan_ignore_local_fdb_toggle(struct net_bridge *br, unsigned long val)
> +{
> + br->vlan_ignore_local_fdb = val ? true : false;
> +
> + return 0;
> +}
> +
>  int br_vlan_set_proto(struct net_bridge *br, unsigned long val)
>  {
>   int err = 0;
> 



Re: [PATCH net-next v2] bridge: vlan: allow to suppress local mac install for all vlans

2015-08-26 Thread Vlad Yasevich
On 08/26/2015 02:10 AM, B Viswanath wrote:
>>>
>>> I'd rather we fix the essence of the scalability problem than add
>>> more spaghetti code to the various bridge paths.
>>>
>>> Can we make the fdb entries smaller?
>>>
>>> Can we enhance how we store such local entries such that they live in
>>> a compact datastructure?  Perhaps the FDB can consist of a very dense
>>> lookup mechanism for local stuff sitting alongside the current table.
>>
>> Certainly, that should be done and I will look into it, but the
>> essence of this patch is a bit different. The problem here is not
>> the size of the fdb entries, it’s more the number of them - having
>> 96000 entries (even if they were 1 byte ones) is just way too much
>> especially when the fdb hash size is small and static. We could work
>> on making it dynamic though, but still these type of local entries
>> per vlan per port can easily be avoided with this option.
>>
> 
> I was wondering if it is possible to assign a vlan bitmap for the FDB
> entry, instead of replicating the entry for each vlan. ( I believe
> Roopa has done something similar, but not so sure). This means that
> the number of FDB entries remains static for any number of vlans.
> 
> I guess it's more complicated than it sounds, but just wanted to know
> if it's feasible at all.

I've actually had this done in one of the earlier attempts.  The issue
was how to compress it because there was absolutely no gain if you have
a sparse vlan bitmap.

I even tried doing something along the lines of vlan_group array, but
that can explode to full size almost as fast.

What actually worked better was a hash table of vlans where each entry
in the table contained a bunch of data one of which was a list of fdbs
for a given vlan.  It didn't replicate fdbs but simply referenced the
ones we cared about and bumped the ref.

However, this made vlan look-ups slower since we now had a hash instead
of a bitmap lookup and Stephen rejected it.

-vlad

> 
> Thanks
> Vissu
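
To make the trade-off in this exchange concrete, here is a minimal C
sketch of the "one fdb entry with a vlan bitmap" idea. It is an
illustration under stated assumptions, not the bridge code: the struct
layout and all names (fdb_sketch, fdb_vlan_set, fdb_vlan_test,
replicated_entries) are hypothetical. It shows that a full bitmap costs
512 bytes per entry no matter how few vlans are set, which is the sparse
case described above, and it reproduces the 48 x 2000 = 96000 entry
count from the original patch description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VLAN_N_VID 4096               /* 12-bit VLAN ID space */
#define VLAN_WORDS (VLAN_N_VID / 64)  /* 64 words of 64 bits */

/* Hypothetical layout: one fdb entry per MAC, shared by many vlans. */
struct fdb_sketch {
	uint8_t  mac[6];
	uint64_t vlans[VLAN_WORDS];   /* 512 bytes, however few bits are set */
};

/* Mark the entry as valid in vlan 'vid' (vid must be < VLAN_N_VID). */
static void fdb_vlan_set(struct fdb_sketch *f, uint16_t vid)
{
	f->vlans[vid / 64] |= UINT64_C(1) << (vid % 64);
}

/* Test whether the entry is valid in vlan 'vid' (vid < VLAN_N_VID). */
static bool fdb_vlan_test(const struct fdb_sketch *f, uint16_t vid)
{
	return (f->vlans[vid / 64] >> (vid % 64)) & 1;
}

/* Entries needed when every port's MAC is replicated into every vlan. */
static unsigned long replicated_entries(unsigned long ports,
					unsigned long vlans)
{
	return ports * vlans;
}
```

With a bitmap, 48 ports need 48 entries instead of 96000, but each entry
carries the full 512-byte bitmap even when only a handful of vlans are
configured.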



Re: [PATCH net-next] bridge: vlan: allow to suppress local mac install for all vlans

2015-08-26 Thread Nikolay Aleksandrov

> On Aug 26, 2015, at 5:42 AM, Vlad Yasevich  wrote:
> 
> On 08/24/2015 08:55 PM, Nikolay Aleksandrov wrote:
>> From: Nikolay Aleksandrov 
>> 
>> This patch adds a new knob that, when enabled, allows to suppress the
>> installation of local fdb entries in newly created vlans. This could
>> pose a big scalability issue if we have a large number of ports and a
>> large number of vlans, e.g. in a 48 port device with 2000 vlans these
>> entries easily go up to 96000.
>> Note that packets for these macs are still received properly because they
>> are added in vlan 0 as "own" macs and referenced when fdb lookup by vlan
>> results in a miss.
>> Also note that vlan membership of ingress port and the bridge device
>> as egress are still being correctly enforced.
>> 
>> The default (0/off) is keeping the current behaviour.
>> 
>> Based on a patch by Wilson Kok (w...@cumulusnetworks.com).
>> 
>> Signed-off-by: Nikolay Aleksandrov 
>> ---
>> As usual I'll post iproute2 patch if this one gets accepted.
>> 
> 
> ... snip...
> 
>> diff --git a/net/bridge/br_vlan.c b/net/bridge/br_vlan.c
>> index 3cef6892c0bb..f9efa1b07994 100644
>> --- a/net/bridge/br_vlan.c
>> +++ b/net/bridge/br_vlan.c
>> @@ -98,11 +98,12 @@ static int __vlan_add(struct net_port_vlans *v, u16 vid, u16 flags)
>>  return err;
>>  }
>> 
>> -err = br_fdb_insert(br, p, dev->dev_addr, vid);
>> -if (err) {
>> -br_err(br, "failed insert local address into bridge "
>> -   "forwarding table\n");
>> -goto out_filt;
>> +if (!br_vlan_ignore_local_fdb(br) || !v->port_idx) {
>> +err = br_fdb_insert(br, p, dev->dev_addr, vid);
>> +if (err) {
>> +br_err(br, "failed insert local address into bridge forwarding table\n");
>> +goto out_filt;
>> +}
>>  }
>> 
> 
> One question.  Does it make sense to push this down into br_fdb_insert?
> This patch prevents automatic entries from being added.  But what about
> manual entries for a local fdb?  The code in br_fdb_add() will still add a
> vid 0 entry as well as entries for all vlans currently configured on the port.
> 
> -vlad
> 

Good point, it would make sense if we go this way, but as Dave suggested
it’d be better to fix the root cause of the scalability issue rather
than trying to work around it, so I’m dropping this patch for now and
will try to come up with a different solution, need to look into this
more.

>>  set_bit(vid, v->vlan_bitmap);
>> @@ -492,6 +493,13 @@ int br_vlan_filter_toggle(struct net_bridge *br, unsigned long val)
>>  return 0;
>> }
>> 
>> +int br_vlan_ignore_local_fdb_toggle(struct net_bridge *br, unsigned long val)
>> +{
>> +br->vlan_ignore_local_fdb = val ? true : false;
>> +
>> +return 0;
>> +}
>> +
>> int br_vlan_set_proto(struct net_bridge *br, unsigned long val)
>> {
>>  int err = 0;



Re: ip_rcv_finish() NULL pointer and possibly related Oopses

2015-08-26 Thread Shaun Crampton

>And the kernel thinks it's
>outside of any normal text section, so it does not try to dump any
>code from before the instruction pointer.
>
>   0:  48 8b 88 40 03 00 00    mov    0x340(%rax),%rcx
>   7:  e8 1d dd dd ff          callq  0xff29
>   c:  5d                      pop    %rbp
>   d:  c3                      retq
>
>Did you write your own module loader or something?

We certainly didn't but CoreOS may have.  I've asked CoreOS if they know
what's going on.

Are there any extra diagnostics I can gather from a CoreOS system to help
figure out what's going on there?  Is there anything I can do to get more
useful diagnostics when one of these failures occurs?  As noted, I can
reproduce the issue but it's expensive, requiring hundreds of VMs to
hammer away for an hour or so.



[PATCH, net-next] r8169:Actually from r810x_pll_power_up

2015-08-26 Thread Corcodel Marian
Remove the r810x_phy_power_up() call from r810x_pll_power_up(), because
there are two situations. In the first, the path runs through
rtl8169_phy_reset(), which already powers on the interface after
resetting MII_BMCR. For the second, I added an explicit call in
__rtl8169_resume().

Signed-off-by: Corcodel Marian 

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index 91cf3a6..2d712a4 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -4699,7 +4699,6 @@ static void r810x_pll_power_up(struct rtl8169_private *tp)
 {
void __iomem *ioaddr = tp->mmio_addr;
 
-   r810x_phy_power_up(tp);
 
switch (tp->mac_version) {
case RTL_GIGA_MAC_VER_07:
@@ -7862,7 +7861,7 @@ static void __rtl8169_resume(struct net_device *dev)
struct rtl8169_private *tp = netdev_priv(dev);
 
netif_device_attach(dev);
-
+   r810x_phy_power_up(tp); 
rtl_pll_power_up(tp);
 
rtl_lock_work(tp);
-- 
2.1.4



Re: [Question] Usage of dev_hold()/dev_put()

2015-08-26 Thread Eric Dumazet
On Wed, 2015-08-26 at 07:48 +, Zhangjie (HZ) wrote:
> Eric,
> Thank you for your patient reply.
> There is still a question.
> In the receive path, the driver does not call dev_hold(); when the skb
> goes to the host stack, skb->dev is likely to be used.
> If the device is destroyed before that, it seems dangerous.

This is also handled properly.

Check : flush_backlog() in net/core/dev.c
   sock_queue_rcv_skb() , and all functions setting skb->dev to NULL




[PATCH] lib/Makefile: remove CONFIG_AVERAGE build rule

2015-08-26 Thread Valentin Rothberg
The Kconfig option AVERAGE and its implementation have been removed by
commit f4e774f55fe0 ("average: remove out-of-line implementation").
Remove the dead build rule in lib/Makefile.

Signed-off-by: Valentin Rothberg 
Reviewed-by: Johannes Berg 
---
I detected the issue with scripts/checkkconfigsymbols.py.
David asked to resend the full patch to netdev so that it gets queued
up in patchwork.

 lib/Makefile | 2 --
 1 file changed, 2 deletions(-)

diff --git a/lib/Makefile b/lib/Makefile
index 51e1d761f0b9..f32d342b75de 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -143,8 +143,6 @@ obj-$(CONFIG_GENERIC_ATOMIC64) += atomic64.o
 
 obj-$(CONFIG_ATOMIC64_SELFTEST) += atomic64_test.o
 
-obj-$(CONFIG_AVERAGE) += average.o
-
 obj-$(CONFIG_CPU_RMAP) += cpu_rmap.o
 
 obj-$(CONFIG_CORDIC) += cordic.o
-- 
1.9.1



Re: [PATCH RFC 0/2] Optimize the snmp stat aggregation for large cpus

2015-08-26 Thread Eric Dumazet
On Wed, 2015-08-26 at 15:55 +0530, Raghavendra K T wrote:
> On 08/26/2015 04:37 AM, David Miller wrote:
> > From: Raghavendra K T 
> > Date: Tue, 25 Aug 2015 13:24:24 +0530
> >
> >> Please let me know if you have suggestions/comments.
> >
> > Like Eric Dumazet said the idea is good but needs some adjustments.
> >
> > You might want to see whether a per-cpu work buffer works for this.
> 
> sure, Let me know if I understood correctly,
> 
> we allocate the temp buffer,
> we will have a  "add_this_cpu_data" function and do
> 
> for_each_online_cpu(cpu)
>  smp_call_function_single(cpu, add_this_cpu_data, buffer, 1)
> 
> if not could you please point to an example you had in mind.


Sorry I do not think it is a good idea.

Sending an IPI is way more expensive and intrusive than reading 4 or 5
cache lines from memory (per cpu)

Definitely not something we want.

> 
> >
> > It's extremely unfortunate that we can't depend upon the destination
> > buffer being properly aligned, because we wouldn't need a temporary
> > scratch area if it were aligned properly.
> 
> True, But I think for 64 bit cpus when (pad == 0) we can go ahead and
> use stats array directly and get rid of put_unaligned(). is it correct?


Nope. We have no alignment guarantee. It could be 0x04
pointer value. (ie not a multiple of 8)

> 
> (my internal initial patch had this version but thought it is ugly to
> have ifdef BITS_PER_LONG==64)

This has nothing to do with arch having 64bit per long. It is about
alignment of a u64.
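
The approach discussed here (sum the per-cpu counters into an aligned
temporary, then copy the totals to a destination whose alignment is not
guaranteed) can be sketched in userspace C as follows. This is an
illustration only: fold_stats() and the flat per-cpu array layout are
assumptions made for the sketch, and memcpy() stands in for the kernel's
put_unaligned().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NR_CPUS_SKETCH  4   /* hypothetical cpu count */
#define NR_ITEMS_SKETCH 3   /* hypothetical number of counters */

/*
 * Sum counters over all cpus into a naturally aligned stack buffer,
 * then copy the totals byte-wise to 'dst', which may not be 8-byte
 * aligned. 'percpu' is laid out as percpu[cpu * NR_ITEMS_SKETCH + item].
 */
static void fold_stats(void *dst, const uint64_t *percpu)
{
	uint64_t tmp[NR_ITEMS_SKETCH] = { 0 };
	size_t cpu, i;

	/* Plain memory reads of each cpu's counters, no cross-cpu calls. */
	for (cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
		for (i = 0; i < NR_ITEMS_SKETCH; i++)
			tmp[i] += percpu[cpu * NR_ITEMS_SKETCH + i];

	/* memcpy() makes no alignment assumption about dst. */
	memcpy(dst, tmp, sizeof(tmp));
}
```

The temporary is what lets the inner loop use ordinary aligned u64
arithmetic; only the final copy has to tolerate a destination such as an
address that is a multiple of 4 but not of 8.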





Re: [PATCH RFC 0/2] Optimize the snmp stat aggregation for large cpus

2015-08-26 Thread Raghavendra K T

On 08/26/2015 07:39 PM, Eric Dumazet wrote:
> On Wed, 2015-08-26 at 15:55 +0530, Raghavendra K T wrote:
>> On 08/26/2015 04:37 AM, David Miller wrote:
>>> From: Raghavendra K T 
>>> Date: Tue, 25 Aug 2015 13:24:24 +0530
>>>
>>>> Please let me know if you have suggestions/comments.
>>>
>>> Like Eric Dumazet said the idea is good but needs some adjustments.
>>>
>>> You might want to see whether a per-cpu work buffer works for this.
>>
>> sure, Let me know if I understood correctly,
>>
>> we allocate the temp buffer,
>> we will have a  "add_this_cpu_data" function and do
>>
>> for_each_online_cpu(cpu)
>>   smp_call_function_single(cpu, add_this_cpu_data, buffer, 1)
>>
>> if not could you please point to an example you had in mind.
>
> Sorry I do not think it is a good idea.
>
> Sending an IPI is way more expensive and intrusive than reading 4 or 5
> cache lines from memory (per cpu)
>
> Definitely not something we want.

Okay. Another problem I thought of here was that we could only loop over
online cpus.

>>> It's extremely unfortunate that we can't depend upon the destination
>>> buffer being properly aligned, because we wouldn't need a temporary
>>> scratch area if it were aligned properly.
>>
>> True, But I think for 64 bit cpus when (pad == 0) we can go ahead and
>> use stats array directly and get rid of put_unaligned(). is it correct?
>
> Nope. We have no alignment guarantee. It could be 0x04
> pointer value. (ie not a multiple of 8)
>
>> (my internal initial patch had this version but thought it is ugly to
>> have ifdef BITS_PER_LONG==64)
>
> This has nothing to do with arch having 64bit per long. It is about
> alignment of a u64.

Okay. I'll send V2 with the tmp buffer declared on the stack.




[PATCH] net: phy: fixed: propagate fixed link values to struct

2015-08-26 Thread Madalin Bucur
The fixed link values parsed from the device tree are stored in
the struct fixed_phy member status. The struct phy_device members
speed, duplex were not updated.

Signed-off-by: Madalin Bucur 
---
 drivers/net/phy/fixed_phy.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/drivers/net/phy/fixed_phy.c b/drivers/net/phy/fixed_phy.c
index 479b93f..20731fc 100644
--- a/drivers/net/phy/fixed_phy.c
+++ b/drivers/net/phy/fixed_phy.c
@@ -292,6 +292,15 @@ struct phy_device *fixed_phy_register(unsigned int irq,
return ERR_PTR(-EINVAL);
}
 
+   /* propagate the fixed link values to struct phy_device */
+   if (status->link) {
+   phy->link = status->link;
+   phy->speed = status->speed;
+   phy->duplex = status->duplex;
+   phy->pause = status->pause;
+   phy->asym_pause = status->asym_pause;
+   }
+
of_node_get(np);
phy->dev.of_node = np;
 
-- 
1.7.11.7



Re: [PATCH] net: phy: fixed: propagate fixed link values to struct

2015-08-26 Thread Stas Sergeev
26.08.2015 17:48, Madalin Bucur wrote:
> The fixed link values parsed from the device tree are stored in
> the struct fixed_phy member status. The struct phy_device members
> speed, duplex were not updated.
> 
> Signed-off-by: Madalin Bucur 
> ---
>  drivers/net/phy/fixed_phy.c | 9 +
>  1 file changed, 9 insertions(+)
> 
> diff --git a/drivers/net/phy/fixed_phy.c b/drivers/net/phy/fixed_phy.c
> index 479b93f..20731fc 100644
> --- a/drivers/net/phy/fixed_phy.c
> +++ b/drivers/net/phy/fixed_phy.c
> @@ -292,6 +292,15 @@ struct phy_device *fixed_phy_register(unsigned int irq,
>   return ERR_PTR(-EINVAL);
>   }
>  
> + /* propagate the fixed link values to struct phy_device */
> + if (status->link) {
> + phy->link = status->link;
Oh, I wonder if you want to initialize phy->link regardless,
outside of the "if (status->link)" block.
Other than that,

Acked-by: Stas Sergeev 


[v2] net: phy: fixed: propagate fixed link values to struct

2015-08-26 Thread Madalin Bucur
The fixed link values parsed from the device tree are stored in
the struct fixed_phy member status. The struct phy_device members
speed, duplex were not updated.

Signed-off-by: Madalin Bucur 
---
v2: always setting phy->link, thanks Stas

 drivers/net/phy/fixed_phy.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/drivers/net/phy/fixed_phy.c b/drivers/net/phy/fixed_phy.c
index 479b93f..99d9bc1 100644
--- a/drivers/net/phy/fixed_phy.c
+++ b/drivers/net/phy/fixed_phy.c
@@ -292,6 +292,15 @@ struct phy_device *fixed_phy_register(unsigned int irq,
return ERR_PTR(-EINVAL);
}
 
+   /* propagate the fixed link values to struct phy_device */
+   phy->link = status->link;
+   if (status->link) {
+   phy->speed = status->speed;
+   phy->duplex = status->duplex;
+   phy->pause = status->pause;
+   phy->asym_pause = status->asym_pause;
+   }
+
of_node_get(np);
phy->dev.of_node = np;
 
-- 
1.7.11.7



pull-request: wireless-drivers-next 2015-08-26

2015-08-26 Thread Kalle Valo
Hi Dave,

here's one more smaller pull request I would like to still get to 4.3.
Nothing really special except the new firmware API 17 support for
iwlwifi and qca6164 support for ath10k which would be good to have in
4.3.

Please let me know if you have any problems.

Kalle

The following changes since commit 4a89ba04ecc6377696e4e26c1abc1cb5764decb9:

  3c59x: Add BQL support for 3c59x ethernet driver. (2015-08-24 12:20:58 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next.git tags/wireless-drivers-next-for-davem-2015-08-26

for you to fetch changes up to 0ba3ac03c1f38be17102d1c76c42a7c66a3e9ff2:

  Merge ath-next from ath.git (2015-08-26 12:40:23 +0300)



Major changes:

iwlwifi:

* new Tx power firmware API
* bump max firmware API to 17
* fix bug in debug prints
* static checker fix
* fix unused defines
* fix command list on newest firmware

brcmfmac:

* support NVRAM loading for bcm47xx platform
* new debugfs entry for msgbuf protocol layer used with PCIe devices

ath10k:

* add spectral scan support for qca99x0
* add qca6164 support


Adrien Schildknecht (1):
  rtlwifi: rtl8192cu: Add new device ID

Arend van Spriel (3):
  brcmfmac: correct interface combination info
  brcmfmac: make use of cfg80211_check_combinations()
  brcmfmac: bump highest event number for 4339 firmware

Ayala Beker (1):
  iwlwifi: mvm: split debug message to avoid exceeding 110 characters

Christian Engelmayer (1):
  rsi: Fix possible leak when loading firmware

Dan Carpenter (1):
  iwlwifi: mvm: catch underflow error earlier

Emmanuel Grumbach (1):
  iwlwifi: mvm: bump firmware API to 17

Franky Lin (2):
  brcmfmac: add debugfs entry for msgbuf statistics
  brcmfmac: block the correct flowring when backup queue overflow

Hante Meuleman (1):
  brcmfmac: Add support for host platform NVRAM loading.

Johannes Berg (2):
  iwlwifi: correctly size command string arrays
  iwlwifi: mvm: support new TX power command

Kalle Valo (2):
  Merge tag 'iwlwifi-next-for-kalle-2015-08-23' of https://git.kernel.org/.../iwlwifi/iwlwifi-next
  Merge ath-next from ath.git

Michal Kazior (6):
  ath10k: wake up offchannel queue properly
  ath10k: wake up queue upon vif creation
  ath10k: split ap/ibss wep key install process
  ath10k: add missing mutex unlock on failpath
  ath10k: fix dma_mapping_error() handling
  ath10k: add qca6164 support

Nicholas Mc Guire (1):
  wil6210: match wait_for_completion_timeout return type

Oleksij Rempel (1):
  ath9k_htc: do ani shortcalibratio if we got -ETIMEDOUT

Rafał Miłecki (1):
  brcmfmac: check all combinations when setting wiphy's addresses

Raja Mani (6):
  ath10k: refactor phyerr event handlers
  ath10k: handle 10.4 firmware phyerr event
  ath10k: ensure pktlog disable cmd reaches fw before pdev suspend
  ath10k: free collected fw stats memory if .pull_fw_stats fails
  ath10k: add spectral scan support for 10.4 fw
  ath10k: fix compilation warnings in wmi phyerr pull function

Sara Sharon (1):
  iwlwifi: mvm: update wakeup reason enum

Vasanthakumar Thiagarajan (3):
  ath10k: fix invalid survey reporting for QCA99X0
  ath10k: add cycle/rx_clear counters frequency to hw_params
  ath10k: fill in wmi 10.4 command handlers for addba/delba debug commands

Wu Fengguang (1):
  rtlwifi: rtl8192ee: fix semicolon.cocci warnings

 drivers/net/wireless/ath/ath10k/core.c |   18 +-
 drivers/net/wireless/ath/ath10k/core.h |3 +
 drivers/net/wireless/ath/ath10k/debug.c|3 +-
 drivers/net/wireless/ath/ath10k/htc.c  |4 +-
 drivers/net/wireless/ath/ath10k/htt_tx.c   |8 +-
 drivers/net/wireless/ath/ath10k/hw.c   |4 +-
 drivers/net/wireless/ath/ath10k/hw.h   |3 +-
 drivers/net/wireless/ath/ath10k/mac.c  |   54 +++--
 drivers/net/wireless/ath/ath10k/pci.c  |   21 +-
 drivers/net/wireless/ath/ath10k/spectral.c |   18 +-
 drivers/net/wireless/ath/ath10k/spectral.h |4 +-
 drivers/net/wireless/ath/ath10k/wmi-ops.h  |   22 ++-
 drivers/net/wireless/ath/ath10k/wmi-tlv.c  |   17 +-
 drivers/net/wireless/ath/ath10k/wmi.c  |  198 +++
 drivers/net/wireless/ath/ath10k/wmi.h  |   64 --
 drivers/net/wireless/ath/ath9k/htc_drv_main.c  |   13 +-
 drivers/net/wireless/ath/wil6210/wmi.c |2 +-
 drivers/net/wireless/brcm80211/brcmfmac/cfg80211.c |  206 
 drivers/net/wireless/brcm80211/brcmfmac/firmware.c |   39 ++--
 drivers/net/wireless/brcm80211/brcmfmac/flowring.c |   10 +-
 drivers/net/wireless/brcm80211/brcmfmac/fweh.h |   10 +-
 drivers/net/wireless/brcm80211

Re: [PATCH net v2] sctp: donot reset the overall_error_count in SHUTDOWN_RECEIVE state

2015-08-26 Thread Vlad Yasevich
On 08/23/2015 07:30 AM, Xin Long wrote:
> commit f8d960524 fix the 0 peer.rwnd issue in SHUTDOWN_PENDING state through
> not reseting the overall_error_count when receive a heartbeat, but the same
> issue also exists in SHUTDOWN_RECEIVE state.
> 
> so we change the condition to state < SCTP_STATE_SHUTDOWN_PENDING to reset the
> overall_error_count when receive a heartbeat, which can avoid the issue happen
> in SCTP_STATE_SHUTDOWN_RECEIVE.
> 
> as to SCTP_STATE_SHUTDOWN_ACK_SENT and SCTP_STATE_SHUTDOWN_SENT state, with
> this patch, it will not be affected by the heartbeat, cause these two states
> have been taken charge of by t2 timer.
> 
> Fixes: f8d960524 ("sctp: Enforce retransmission limit during shutdown")
> Signed-off-by: Xin Long 


The code is OK, but the change log could use some help.

How is this for the explanation:

Commit f8d960524 ("sctp: Enforce retransmission limit during shutdown")
fixed a problem with excessive retransmissions in the SHUTDOWN_PENDING
state by not resetting the association overall_error_count.  This allowed
the association to better enforce the assoc.max_retrans limit.

However, the same issue still exists when the association is in the
SHUTDOWN_RECEIVED state.  In this state, HB-ACKs will continue to reset
the overall_error_count for the association, which would extend the
lifetime of the association unnecessarily.

This patch solves this by resetting the overall_error_count whenever the
current state is less than SCTP_STATE_SHUTDOWN_PENDING.  As a small
side-effect, we end up also handling SCTP_STATE_SHUTDOWN_ACK_SENT and
SCTP_STATE_SHUTDOWN_SENT states, but they are not really impacted because
we disable Heartbeats in those states.


Thanks
-vlad


> ---
>  net/sctp/sm_sideeffect.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/sctp/sm_sideeffect.c b/net/sctp/sm_sideeffect.c
> index fef2acd..85e6f03 100644
> --- a/net/sctp/sm_sideeffect.c
> +++ b/net/sctp/sm_sideeffect.c
> @@ -702,7 +702,7 @@ static void sctp_cmd_transport_on(sctp_cmd_seq_t *cmds,
>* outstanding data and rely on the retransmission limit be reached
>* to shutdown the association.
>*/
> - if (t->asoc->state != SCTP_STATE_SHUTDOWN_PENDING)
> + if (t->asoc->state < SCTP_STATE_SHUTDOWN_PENDING)
>   t->asoc->overall_error_count = 0;
>  
>   /* Clear the hb_sent flag to signal that we had a good
> 
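
The effect of changing "!=" to "<" in the hunk above can be seen in a
small standalone C sketch. It is illustrative only: the enum below is an
assumption that mirrors the ordering implied by the patch (the shutdown
states SENT, RECEIVED and ACK_SENT all compare greater than
SHUTDOWN_PENDING), and the names ST_*, resets_old() and resets_new() are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed to mirror the ordering of the kernel's SCTP state enum. */
enum sctp_state_sketch {
	ST_CLOSED,
	ST_COOKIE_WAIT,
	ST_COOKIE_ECHOED,
	ST_ESTABLISHED,
	ST_SHUTDOWN_PENDING,
	ST_SHUTDOWN_SENT,
	ST_SHUTDOWN_RECEIVED,
	ST_SHUTDOWN_ACK_SENT,
};

/* Old condition: only SHUTDOWN_PENDING keeps its error count. */
static bool resets_old(enum sctp_state_sketch s)
{
	return s != ST_SHUTDOWN_PENDING;
}

/* New condition: every state from SHUTDOWN_PENDING on keeps it. */
static bool resets_new(enum sctp_state_sketch s)
{
	return s < ST_SHUTDOWN_PENDING;
}
```

Under the old test a heartbeat ack in SHUTDOWN_RECEIVED still cleared the
counter; under the new one it no longer does, while ESTABLISHED behaves
as before.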



Re: [PATCH] man ip-link: Add more explanation about vlan reordering

2015-08-26 Thread Jeremy Harris
On 17/08/15 20:22, Vadim Kochan wrote:
> +.BR reorder_hdr " is " on
> +then VLAN header will be not inserted immediately but only before passing to 
> the
> +physical device (if this device does not support VLAN offloading), the 
> similar
> +on the RX direction - by default the packet will be untagged before being
> +received by VLAN device. Reordering allows to accelerate tagging on egress 
> and
> +to hide VLAN header on ingress so the packet looks like regular Ethernet 
> packet,
> +at the same time it might be confusing while the packet sniffing as the VLAN 
> header
  ^

Does not read well.  "for packet capture" perhaps?
-- 
Jeremy




Fw: [Bug 103541] New: netconsole + bonding: WARNING: CPU: 0 PID: 115 at kernel/softirq.c:150 __local_bh_enable_ip+0x6f/0xa0()

2015-08-26 Thread Stephen Hemminger


Begin forwarded message:

Date: Wed, 26 Aug 2015 10:01:30 +
From: "bugzilla-dae...@bugzilla.kernel.org" 

To: "shemmin...@linux-foundation.org" 
Subject: [Bug 103541] New: netconsole + bonding: WARNING: CPU: 0 PID: 115 at 
kernel/softirq.c:150 __local_bh_enable_ip+0x6f/0xa0()


https://bugzilla.kernel.org/show_bug.cgi?id=103541

Bug ID: 103541
   Summary: netconsole + bonding: WARNING: CPU: 0 PID: 115 at
kernel/softirq.c:150 __local_bh_enable_ip+0x6f/0xa0()
   Product: Networking
   Version: 2.5
Kernel Version: 4.1.4
  Hardware: All
OS: Linux
  Tree: Mainline
Status: NEW
  Severity: normal
  Priority: P1
 Component: Other
  Assignee: shemmin...@linux-foundation.org
  Reporter: c...@excellency.fr
Regression: No

I upgraded from 3.4 to 4.1 and got a WARNING I'd never seen before:

[   14.389529] bond0: Setting MII monitoring interval to 100
[   14.389653] bond0: MII monitoring cannot be used with ARP monitoring -
disabling ARP monitoring...
[   14.390613] bond0: link status definitely down for interface eth0, disabling
it
[   14.390755] bond0: making interface eth1 the new active one
[   14.390862] ------------[ cut here ]------------
[   14.390866] WARNING: CPU: 0 PID: 115 at kernel/softirq.c:150
__local_bh_enable_ip+0x6f/0xa0()
[   14.390882] Modules linked in: nf_conntrack_ftp ip6t_REJECT nf_reject_ipv6
nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter xt_length xt_NFLOG xt_limit
nf_conntrack_ipv4 nf_defrag_ipv4 xt_recent xt_owner xt_set xt_multiport
xt_conntrack nf_conntrack ip_set_hash_net ip_set nfnetlink_log nfnetlink
ipmi_poweroff ipmi_devintf nfsd nfs_acl auth_rpcgss oid_registry nfs lockd
grace sunrpc netconsole bonding i2c_i801 lpc_ich mfd_core ehci_pci ehci_hcd
ipmi_si ipmi_msghandler igb dca i2c_algo_bit ptp pps_core hwmon usbhid usbcore
usb_common
[   14.390883] CPU: 0 PID: 115 rUID: 0 rGID: 0 Comm: kworker/u16:4 Not tainted
4.1.4 #2
[   14.390884] Hardware name: Supermicro X10SLH-F/X10SLM+-F/X10SLH-F/X10SLM+-F,
BIOS 2.00 04/24/2014
[   14.390887] Workqueue: bond0 bond_mii_monitor [bonding]
[   14.390889]  0096 8807fa027798 9b7388fc
8807fa0277d8
[   14.390890]  9b06f4a0 8807fa0277c8 0200
8807f9f543c8
[   14.390890]  8800016b4780 c0210b80 8807fa027830
8807fa0277e8
[   14.390891] Call Trace:
[   14.390894]  [] dump_stack+0x68/0x74
[   14.390896]  [] warn_slowpath_common+0x90/0xd0
[   14.390898]  [] warn_slowpath_null+0x15/0x20
[   14.390899]  [] __local_bh_enable_ip+0x6f/0xa0
[   14.390901]  [] bond_poll_controller+0x101/0x140 [bonding]
[   14.390904]  [] netpoll_poll_dev+0x6d/0x1a0
[   14.390905]  [] netpoll_send_skb_on_dev+0x16c/0x270
[   14.390906]  [] netpoll_send_udp+0x2cf/0x420
[   14.390907]  [] write_msg+0xc5/0x110 [netconsole]
[   14.390910]  [] call_console_drivers+0xc0/0xe0
[   14.390911]  [] console_unlock+0x317/0x400
[   14.390913]  [] vprintk_emit+0x2ce/0x530
[   14.390914]  [] vprintk_default+0x1a/0x20
[   14.390915]  [] printk+0x6c/0x6e
[   14.390917]  [] ? vprintk_default+0x1a/0x20
[   14.390918]  [] ? printk+0x6c/0x6e
[   14.390919]  [] ? check_preempt_curr+0x90/0xb0
[   14.390921]  [] __netdev_printk+0x152/0x310
[   14.390923]  [] netdev_info+0x7f/0x90
[   14.390925]  [] ? netdev_info+0x7f/0x90
[   14.390927]  [] ? __queue_work+0x1ba/0x350
[   14.390930]  [] bond_change_active_slave+0x13d/0x6f0
[bonding]
[   14.390932]  [] bond_select_active_slave+0x121/0x1f0
[bonding]
[   14.390934]  [] bond_mii_monitor+0x5e3/0x740 [bonding]
[   14.390935]  [] process_one_work+0x18c/0x4a0
[   14.390936]  [] worker_thread+0x12a/0x5d0
[   14.390938]  [] ? process_one_work+0x4a0/0x4a0
[   14.390939]  [] kthread+0xdb/0xf0
[   14.390940]  [] ? kthread_freezable_should_stop+0x70/0x70
[   14.390942]  [] ret_from_fork+0x42/0x70
[   14.390943]  [] ? kthread_freezable_should_stop+0x70/0x70
[   14.390944] ---[ end trace 4f1bbf12d3a796a0 ]---
[   14.548673] bond0: link status definitely down for interface eth1, disabling
it
[   14.548995] bond0: now running without any active interface!
[   14.590642] bond0: Setting ARP monitoring interval to 500

-- 
You are receiving this mail because:
You are the assignee for the bug.


[PATCH net 1/1] sfc: only use vadaptor stats if firmware is capable

2015-08-26 Thread Shradha Shah
From: Bert Kenward 

Some of the stats handling code differs based on SR-IOV support,
and SR-IOV support is only available if full-featured firmware is
used. Do not use vadaptor stats if the firmware mode is not set to
full-featured.

Signed-off-by: Shradha Shah 
---
 drivers/net/ethernet/sfc/ef10.c | 27 ++-
 1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/sfc/ef10.c b/drivers/net/ethernet/sfc/ef10.c
index 605cc89..b1a4ea2 100644
--- a/drivers/net/ethernet/sfc/ef10.c
+++ b/drivers/net/ethernet/sfc/ef10.c
@@ -1282,7 +1282,12 @@ static size_t efx_ef10_update_stats_common(struct efx_nic *efx, u64 *full_stats,
 		}
 	}
 
-	if (core_stats) {
+	if (!core_stats)
+		return stats_count;
+
+	if (nic_data->datapath_caps &
+			1 << MC_CMD_GET_CAPABILITIES_OUT_EVB_LBN) {
+		/* Use vadaptor stats. */
 		core_stats->rx_packets = stats[EF10_STAT_rx_unicast] +
 					 stats[EF10_STAT_rx_multicast] +
 					 stats[EF10_STAT_rx_broadcast];
@@ -1302,6 +1307,26 @@ static size_t efx_ef10_update_stats_common(struct efx_nic *efx, u64 *full_stats,
 		core_stats->rx_fifo_errors = stats[EF10_STAT_rx_overflow];
 		core_stats->rx_errors = core_stats->rx_crc_errors;
 		core_stats->tx_errors = stats[EF10_STAT_tx_bad];
+	} else {
+		/* Use port stats. */
+		core_stats->rx_packets = stats[EF10_STAT_port_rx_packets];
+		core_stats->tx_packets = stats[EF10_STAT_port_tx_packets];
+		core_stats->rx_bytes = stats[EF10_STAT_port_rx_bytes];
+		core_stats->tx_bytes = stats[EF10_STAT_port_tx_bytes];
+		core_stats->rx_dropped = stats[EF10_STAT_port_rx_nodesc_drops] +
+					 stats[GENERIC_STAT_rx_nodesc_trunc] +
+					 stats[GENERIC_STAT_rx_noskb_drops];
+		core_stats->multicast = stats[EF10_STAT_port_rx_multicast];
+		core_stats->rx_length_errors =
+				stats[EF10_STAT_port_rx_gtjumbo] +
+				stats[EF10_STAT_port_rx_length_error];
+		core_stats->rx_crc_errors = stats[EF10_STAT_port_rx_bad];
+		core_stats->rx_frame_errors =
+				stats[EF10_STAT_port_rx_align_error];
+		core_stats->rx_fifo_errors = stats[EF10_STAT_port_rx_overflow];
+		core_stats->rx_errors = (core_stats->rx_length_errors +
+					 core_stats->rx_crc_errors +
+					 core_stats->rx_frame_errors);
 	}
 
 	return stats_count;
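
The selection logic in this patch comes down to testing one bit of a capability word and choosing a counter source accordingly. A minimal userspace sketch of that pattern, assuming made-up names throughout: `DEMO_CAP_EVB_LBN` only stands in for MC_CMD_GET_CAPABILITIES_OUT_EVB_LBN (its value here is arbitrary), and `demo_stats` is not the driver's stats array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit position standing in for
 * MC_CMD_GET_CAPABILITIES_OUT_EVB_LBN; the real value lives in the
 * driver's MCDI headers and is not reproduced here. */
#define DEMO_CAP_EVB_LBN 14

/* Hypothetical pair of counters: one per vadaptor, one per port. */
struct demo_stats {
	uint64_t vadaptor_rx_packets;
	uint64_t port_rx_packets;
};

/* True when the capability word advertises the bit at position lbn. */
static bool demo_has_cap(uint32_t datapath_caps, unsigned int lbn)
{
	assert(lbn < 32);
	return (datapath_caps & (UINT32_C(1) << lbn)) != 0;
}

/* Use the vadaptor counter only when the capability bit is set,
 * otherwise fall back to the port counter. */
static uint64_t demo_rx_packets(uint32_t datapath_caps,
				const struct demo_stats *s)
{
	if (demo_has_cap(datapath_caps, DEMO_CAP_EVB_LBN))
		return s->vadaptor_rx_packets;
	return s->port_rx_packets;
}
```

With the bit clear, `demo_rx_packets()` returns the port counter, which is the fallback the commit message describes for firmware that is not full-featured.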


Re: [v2] net: phy: fixed: propagate fixed link values to struct

2015-08-26 Thread Stas Sergeev
26.08.2015 17:58, Madalin Bucur wrote:
> The fixed link values parsed from the device tree are stored in
> the struct fixed_phy member status. The struct phy_device members
> speed, duplex were not updated.

ACK, but IMHO it will make more sense if you include that
into your upcoming patch set rather than sending separately,
as otherwise there are simply no in-kernel users of that new
functionality (all the current users likely do not access
these fields as early as you want to, so they don't care).
In any case, the patch looks good to me and the policy is
up to others.


[PATCH] bgmac: support up to 3 cores (devices) on a bus

2015-08-26 Thread Rafał Miłecki
Broadcom buses may have more than 1 Ethernet device. This is used e.g.
to have a few interfaces connected to different switch ports. So far we
saw chipsets with only 2 devices (e.g. BCM4706), but recent ones have
up to 3 (e.g. Netgear R8000 uses the 3rd interface for most of the switch
traffic; the lower interfaces are for some kind of offloading).

Signed-off-by: Rafał Miłecki 
---
 drivers/net/ethernet/broadcom/bgmac.c | 28 +++-
 1 file changed, 23 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bgmac.c b/drivers/net/ethernet/broadcom/bgmac.c
index 21e3c38..d043746 100644
--- a/drivers/net/ethernet/broadcom/bgmac.c
+++ b/drivers/net/ethernet/broadcom/bgmac.c
@@ -1549,11 +1549,20 @@ static int bgmac_probe(struct bcma_device *core)
 	struct net_device *net_dev;
 	struct bgmac *bgmac;
 	struct ssb_sprom *sprom = &core->bus->sprom;
-	u8 *mac = core->core_unit ? sprom->et1mac : sprom->et0mac;
+	u8 *mac;
 	int err;
 
-	/* We don't support 2nd, 3rd, ... units, SPROM has to be adjusted */
-	if (core->core_unit > 1) {
+	switch (core->core_unit) {
+	case 0:
+		mac = sprom->et0mac;
+		break;
+	case 1:
+		mac = sprom->et1mac;
+		break;
+	case 2:
+		mac = sprom->et2mac;
+		break;
+	default:
 		pr_err("Unsupported core_unit %d\n", core->core_unit);
 		return -ENOTSUPP;
 	}
@@ -1588,8 +1597,17 @@ static int bgmac_probe(struct bcma_device *core)
 	}
 	bgmac->cmn = core->bus->drv_gmac_cmn.core;
 
-	bgmac->phyaddr = core->core_unit ? sprom->et1phyaddr :
-			 sprom->et0phyaddr;
+	switch (core->core_unit) {
+	case 0:
+		bgmac->phyaddr = sprom->et0phyaddr;
+		break;
+	case 1:
+		bgmac->phyaddr = sprom->et1phyaddr;
+		break;
+	case 2:
+		bgmac->phyaddr = sprom->et2phyaddr;
+		break;
+	}
 	bgmac->phyaddr &= BGMAC_PHY_MASK;
 	if (bgmac->phyaddr == BGMAC_PHY_MASK) {
 		bgmac_err(bgmac, "No PHY found\n");
-- 
1.8.4.5
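
The unit-to-SPROM mapping that the patch spells out with switch statements can be pictured as a bounded lookup. The following is only an illustrative userspace sketch: `demo_sprom` is a hypothetical layout, not struct ssb_sprom, and `demo_unit_mac()` is not a bgmac function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_ETH_ALEN	6
#define DEMO_MAX_UNITS	3	/* units 0, 1 and 2, as in the patch */

/* Hypothetical stand-in for the et0mac/et1mac/et2mac SPROM fields. */
struct demo_sprom {
	uint8_t mac[DEMO_MAX_UNITS][DEMO_ETH_ALEN];
};

/* Return the MAC for a core unit, or NULL for an unsupported unit.
 * The NULL case corresponds to the "default:" branch of the switch. */
static const uint8_t *demo_unit_mac(const struct demo_sprom *sprom,
				    unsigned int core_unit)
{
	assert(sprom != NULL);
	if (core_unit >= DEMO_MAX_UNITS)
		return NULL;
	return sprom->mac[core_unit];
}
```

Either form rejects unit 3 and above; the switch in the patch has the advantage of working with the separately named fields that the SPROM structure actually provides.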



ss format bug

2015-08-26 Thread Mike Saal

Hi:

I found a formatting bug in the 4.1.1 ss command. The following line was 
incorrectly output due to passing a negative length to printf() when 
displaying the local address. In this instance hostapd does a "bind to 
device" on cdreth0 and then does a udp "in address any" port 67 bind. 
Please note the whitespace between the '*' and ' %cdreth0:67'


   'udp UNCONN 0 0 ** %cdreth0:67* *:* users:(("hostapd",pid=19241,fd=5))'

Attached is my patch for the bug fix, it might be prudent to add more 
guard code looking for negative length format codes.


Sincerely, Mike
diff -Nuar iproute2-4.1.1.orig/misc/ss.c iproute2-4.1.1/misc/ss.c
--- iproute2-4.1.1.orig/misc/ss.c	2015-07-06 17:57:34.0 -0400
+++ iproute2-4.1.1/misc/ss.c	2015-08-20 10:37:17.615100588 -0400
@@ -1023,6 +1023,8 @@
 	if (ifindex) {
 		ifname   = ll_index_to_name(ifindex);
 		est_len -= strlen(ifname) + 1;  /* +1 for percent char */
+		if (est_len < 0)
+			est_len = 0;
 	}
 
 	sock_addr_print_width(est_len, ap, ":", serv_width, resolve_service(port),
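
The stray whitespace described in this report matches how C treats a negative `*` field width: it is taken as the `-` flag followed by a positive width, so the field is left-justified and padded on the right. A minimal standalone sketch, not taken from iproute2 (`clamp_width` and `format_field` are hypothetical helpers used only for illustration):

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: clamp an estimated column width so it never
 * goes negative, mirroring the guard added in the patch. */
static int clamp_width(int est_len)
{
	return est_len < 0 ? 0 : est_len;
}

/* Hypothetical helper: format "text" into buf using a "*" field width.
 * Returns what snprintf() returns. */
static int format_field(char *buf, size_t size, int width, const char *text)
{
	assert(buf != NULL && size > 0);
	return snprintf(buf, size, "%*s", width, text);
}
```

A width of 4 right-justifies "*" behind three spaces, a width of -4 puts the three spaces after it, and a clamped width of 0 prints the bare "*".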




RE: [v2] net: phy: fixed: propagate fixed link values to struct

2015-08-26 Thread Madalin-Cristian Bucur
> -Original Message-
> From: Stas Sergeev [mailto:s...@list.ru]
> Sent: Wednesday, August 26, 2015 6:51 PM
> To: Bucur Madalin-Cristian-B32716 ;
> f.faine...@gmail.com
> Cc: netdev@vger.kernel.org; linux-ker...@vger.kernel.org; Liberman Igal-
> B31950 
> Subject: Re: [v2] net: phy: fixed: propagate fixed link values to struct
> 
> 26.08.2015 17:58, Madalin Bucur wrote:
> > The fixed link values parsed from the device tree are stored in
> > the struct fixed_phy member status. The struct phy_device members
> > speed, duplex were not updated.
> 
> ACK, but IMHO it will make more sense if you include that
> into your upcoming patch set rather than sending separately,
> as otherwise there are simply no in-kernel users of that new
> functionality (all the current users likely do not access
> these fields as early as you want to, so they don't care).
> In any case, the patch looks good to me and the policy is
> up to others.

Given that it's more of a fix than a feature, I think it can be picked up
separately from any particular driver that accesses those fields early, but
I guess Florian and David will decide this.

Thanks,
Madalin


Re: [PATCH net-next] route: fix breakage after moving lwtunnel state

2015-08-26 Thread Jiri Benc
On Sun, 23 Aug 2015 16:51:03 -0700 (PDT), David Miller wrote:
> From: Jiri Benc 
> Date: Fri, 21 Aug 2015 12:41:14 +0200
> 
> > @@ -99,6 +99,9 @@ struct dst_entry {
> >  	atomic_t		__refcnt;	/* client references	*/
> >  	int			__use;
> >  	unsigned long		lastuse;
> > +#ifndef CONFIG_64BIT
> > +	struct lwtunnel_state   *lwtstate;
> > +#endif
> >  	union {
> >  		struct dst_entry	*next;
> >  		struct rtable __rcu	*rt_next;
> 
> I'm going to apply this to fix the build error without reverting your
> change entirely, but this is really an undesirable solution.
> 
> This cache line of the SKB is for write heavy members of struct
> dst_entry and so if you put a read-mostly member here it's going to
> result in performance problems.

I've spent the last few days taking measurements using tbench (which was
originally used for dst_entry reshuffling) and netperf (super_netperf
with various numbers of concurrent TCP streams) on i686 and found no
regression introduced by this.

This looks somewhat surprising until we look at where and how lwtstate is
used. In the non-tunnel case, it's either:

(1) used in netlink route notifications or
(2) used together with the first few fields in struct rtable/rt6_info or
(3) it's a skb_tunnel_info call.

No more cases than those three. Of those, (1) is not of much concern.

As for (2), the first few fields in struct rtable and struct rt6_info share
the same cacheline with __refcnt, thus accessing the lwtstate makes no
difference here.

About (3), skb_tunnel_info is called from ip_route_input_slow and
ip6_route_input. However, skb->dst is set only in case of a metadata
dst_entry. In such case, tunnel info is fetched from metadata; otherwise,
skb->dst is NULL. In either case, lwtstate is not accessed at all.

This confirms what I measured - placing lwtstate in the same cacheline as
__refcnt has no impact on performance in the non-tunneling case.

As for tunneling, I did not see any performance degradation after my patch
in most cases. However, there were some cases where there was a small
degradation (<2%). I bisected it to the addition of IPv6 fields into
ip_tunnel_key (i.e. commit c1ea5d672aaf). This is not very conclusive,
though; the variation of the benchmark results was relatively high and this
might be noise. However, there's definitely room for performance
improvement here: the lwtunnel vxlan throughput is at about 40% of the
non-vxlan throughput. I have not spent much time analyzing this yet,
but it's clear the dst_entry layout is not our biggest concern here.

As a result, I think that the lwtstate field may stay where it is. To
simplify the code and to get rid of the #ifdefs, we can have it
at the end of the struct also for the 64bit case. Let me know if you prefer
this, and I'll submit a patch.

Please let me know if you disagree with my analysis above.

Thanks,

  Jiri

-- 
Jiri Benc
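
Whether two members share a cache line, which is what the argument above turns on, can be checked from their offsets. A minimal userspace sketch under stated assumptions: `demo_dst` is a hypothetical, simplified layout (not the real struct dst_entry), the cache line is assumed to be 64 bytes, and the structure is assumed to start on a cache line boundary.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_CACHELINE 64	/* assumed cache line size in bytes */

/* Hypothetical, simplified layout; NOT the real struct dst_entry. */
struct demo_dst {
	char read_mostly[64];	/* stands in for the read-mostly members */
	int refcnt;		/* write-heavy */
	int use;		/* write-heavy */
	unsigned long lastuse;	/* write-heavy */
	void *lwtstate;		/* read-mostly member placed next to them */
};

/* Index of the cache line an offset falls in, relative to struct start. */
static size_t demo_line_of(size_t offset)
{
	return offset / DEMO_CACHELINE;
}

/* Nonzero when two member offsets land in the same cache line. */
static int demo_same_line(size_t a, size_t b)
{
	return demo_line_of(a) == demo_line_of(b);
}

/* Runtime check of the claim for this demo layout only. */
static void demo_check_layout(void)
{
	assert(demo_same_line(offsetof(struct demo_dst, refcnt),
			      offsetof(struct demo_dst, lwtstate)));
}
```

In this demo layout the pointer sits in the same line as the reference count, so a reader that has already pulled in that line for the neighbouring fields pays nothing extra for it. On a real kernel build, pahole on the object file gives the authoritative layout.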


[patch net-next 2/6] net: add netif_is_bridge_master helper

2015-08-26 Thread Jiri Pirko
From: Jiri Pirko 

Add this helper so code can easily figure out if netdev is a bridge.

Signed-off-by: Jiri Pirko 
---
 include/linux/netdevice.h | 5 +
 1 file changed, 5 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 39f30da..be625f4 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3848,6 +3848,11 @@ static inline bool netif_is_vrf(const struct net_device *dev)
 	return dev->priv_flags & IFF_VRF_MASTER;
 }
 
+static inline bool netif_is_bridge_master(const struct net_device *dev)
+{
+	return dev->priv_flags & IFF_EBRIDGE;
+}
+
 static inline bool netif_index_is_vrf(struct net *net, int ifindex)
 {
 	bool rc = false;
-- 
1.9.3



[patch net-next 1/6] net: introduce change upper device notifier change info

2015-08-26 Thread Jiri Pirko
From: Jiri Pirko 

Add info that is passed along with NETDEV_CHANGEUPPER event.

Signed-off-by: Jiri Pirko 
---
 include/linux/netdevice.h |  7 +++
 net/core/dev.c| 16 ++--
 2 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 6abe0d6..39f30da 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2127,6 +2127,13 @@ struct netdev_notifier_change_info {
 	unsigned int flags_changed;
 };
 
+struct netdev_notifier_changeupper_info {
+	struct netdev_notifier_info info; /* must be first */
+	struct net_device *upper_dev; /* new upper dev */
+	bool master; /* is upper dev master */
+	bool linking; /* is the notification for link or unlink */
+};
+
 static inline void netdev_notifier_info_init(struct netdev_notifier_info *info,
 					     struct net_device *dev)
 {
diff --git a/net/core/dev.c b/net/core/dev.c
index b1f3f48..e7ef971 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5311,6 +5311,7 @@ static int __netdev_upper_dev_link(struct net_device *dev,
 				   struct net_device *upper_dev, bool master,
 				   void *private)
 {
+	struct netdev_notifier_changeupper_info changeupper_info;
 	struct netdev_adjacent *i, *j, *to_i, *to_j;
 	int ret = 0;
 
@@ -5329,6 +5330,10 @@ static int __netdev_upper_dev_link(struct net_device *dev,
 	if (master && netdev_master_upper_dev_get(dev))
 		return -EBUSY;
 
+	changeupper_info.upper_dev = upper_dev;
+	changeupper_info.master = master;
+	changeupper_info.linking = true;
+
 	ret = __netdev_adjacent_dev_link_neighbour(dev, upper_dev, private,
 						   master);
 	if (ret)
@@ -5367,7 +5372,8 @@ static int __netdev_upper_dev_link(struct net_device *dev,
 		goto rollback_lower_mesh;
 	}
 
-	call_netdevice_notifiers(NETDEV_CHANGEUPPER, dev);
+	call_netdevice_notifiers_info(NETDEV_CHANGEUPPER, dev,
+				      &changeupper_info.info);
 	return 0;
 
 rollback_lower_mesh:
@@ -5462,9 +5468,14 @@ EXPORT_SYMBOL(netdev_master_upper_dev_link_private);
 void netdev_upper_dev_unlink(struct net_device *dev,
 			     struct net_device *upper_dev)
 {
+	struct netdev_notifier_changeupper_info changeupper_info;
 	struct netdev_adjacent *i, *j;
 	ASSERT_RTNL();
 
+	changeupper_info.upper_dev = upper_dev;
+	changeupper_info.master = netdev_master_upper_dev_get(dev) == upper_dev;
+	changeupper_info.linking = false;
+
 	__netdev_adjacent_dev_unlink_neighbour(dev, upper_dev);
 
 	/* Here is the tricky part. We must remove all dev's lower
@@ -5484,7 +5495,8 @@ void netdev_upper_dev_unlink(struct net_device *dev,
 	list_for_each_entry(i, &upper_dev->all_adj_list.upper, list)
 		__netdev_adjacent_dev_unlink(dev, i->dev);
 
-	call_netdevice_notifiers(NETDEV_CHANGEUPPER, dev);
+	call_netdevice_notifiers_info(NETDEV_CHANGEUPPER, dev,
+				      &changeupper_info.info);
 }
 EXPORT_SYMBOL(netdev_upper_dev_unlink);
 
-- 
1.9.3
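
The "must be first" comment on the embedded info member is what lets a notifier callback treat the generic pointer it receives as the larger event-specific structure. A minimal userspace sketch of that pattern, with every type and function name here being a hypothetical stand-in rather than the kernel's:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for the kernel types. */
struct demo_dev {
	const char *name;
};

struct demo_notifier_info {
	struct demo_dev *dev;
};

struct demo_changeupper_info {
	struct demo_notifier_info info;	/* must be first */
	struct demo_dev *upper_dev;
	bool master;
	bool linking;
};

/* Because "info" is the first member, a pointer to it is also a valid
 * pointer to the enclosing structure. */
static const struct demo_changeupper_info *demo_to_changeupper(const void *ptr)
{
	assert(ptr != NULL);
	return ptr;
}

/* Returns 1 for "master linked", -1 for "master unlinked" and 0 when
 * the upper device is not a master and the event is ignored. */
static int demo_handle_changeupper(const void *ptr)
{
	const struct demo_changeupper_info *info = demo_to_changeupper(ptr);

	if (!info->master)
		return 0;
	return info->linking ? 1 : -1;
}
```

A caller passes the address of the embedded `info` member, as the patch does with `&changeupper_info.info`, and the handler recovers `upper_dev`, `master` and `linking` from the same address.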



[patch net-next 3/6] net: add netif_is_ovs_master helper with IFF_OPENVSWITCH private flag

2015-08-26 Thread Jiri Pirko
From: Jiri Pirko 

Add this helper so code can easily figure out if netdev is an Open vSwitch master.

Signed-off-by: Jiri Pirko 
---
 include/linux/netdevice.h| 8 
 net/openvswitch/vport-internal_dev.c | 2 +-
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index be625f4..0a884e6 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1264,6 +1264,7 @@ struct net_device_ops {
  * @IFF_MACVLAN: Macvlan device
  * @IFF_VRF_MASTER: device is a VRF master
  * @IFF_NO_QUEUE: device can run without qdisc attached
+ * @IFF_OPENVSWITCH: device is an Open vSwitch master
  */
 enum netdev_priv_flags {
 	IFF_802_1Q_VLAN			= 1<<0,
@@ -1293,6 +1294,7 @@ enum netdev_priv_flags {
 	IFF_IPVLAN_SLAVE		= 1<<24,
 	IFF_VRF_MASTER			= 1<<25,
 	IFF_NO_QUEUE			= 1<<26,
+	IFF_OPENVSWITCH			= 1<<27,
 };
 
 #define IFF_802_1Q_VLAN			IFF_802_1Q_VLAN
@@ -1322,6 +1324,7 @@ enum netdev_priv_flags {
 #define IFF_IPVLAN_SLAVE		IFF_IPVLAN_SLAVE
 #define IFF_VRF_MASTER			IFF_VRF_MASTER
 #define IFF_NO_QUEUE			IFF_NO_QUEUE
+#define IFF_OPENVSWITCH			IFF_OPENVSWITCH
 
 /**
  * struct net_device - The DEVICE structure.
@@ -3853,6 +3856,11 @@ static inline bool netif_is_bridge_master(const struct net_device *dev)
 	return dev->priv_flags & IFF_EBRIDGE;
 }
 
+static inline bool netif_is_ovs_master(const struct net_device *dev)
+{
+	return dev->priv_flags & IFF_OPENVSWITCH;
+}
+
 static inline bool netif_index_is_vrf(struct net *net, int ifindex)
 {
 	bool rc = false;
diff --git a/net/openvswitch/vport-internal_dev.c b/net/openvswitch/vport-internal_dev.c
index c058bbf..80b3e12 100644
--- a/net/openvswitch/vport-internal_dev.c
+++ b/net/openvswitch/vport-internal_dev.c
@@ -135,7 +135,7 @@ static void do_setup(struct net_device *netdev)
 	netdev->netdev_ops = &internal_dev_netdev_ops;
 
 	netdev->priv_flags &= ~IFF_TX_SKB_SHARING;
-	netdev->priv_flags |= IFF_LIVE_ADDR_CHANGE;
+	netdev->priv_flags |= IFF_LIVE_ADDR_CHANGE | IFF_OPENVSWITCH;
 	netdev->destructor = internal_dev_destructor;
 	netdev->ethtool_ops = &internal_dev_ethtool_ops;
 	netdev->rtnl_link_ops = &internal_dev_link_ops;
-- 
1.9.3
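
The helpers in this series all follow one pattern: mask a private flag and return the result as bool. A minimal userspace sketch of why the bool return type matters for a flag in a high bit; the `DEMO_` names and `demo_net_device` are hypothetical stand-ins, with only the bit positions borrowed from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values for illustration; this is not the kernel's
 * enum netdev_priv_flags. */
enum demo_priv_flags {
	DEMO_IFF_EBRIDGE	= 1 << 1,
	DEMO_IFF_OPENVSWITCH	= 1 << 27,
};

struct demo_net_device {
	unsigned int priv_flags;
};

/* Returning bool normalizes any nonzero masked value to true, so a
 * flag in bit 27 is reported correctly instead of being truncated as
 * it would be by a narrower integer return type. */
static bool demo_is_ovs_master(const struct demo_net_device *dev)
{
	assert(dev != NULL);
	return dev->priv_flags & DEMO_IFF_OPENVSWITCH;
}

static bool demo_is_bridge_master(const struct demo_net_device *dev)
{
	assert(dev != NULL);
	return dev->priv_flags & DEMO_IFF_EBRIDGE;
}
```

Callers can then branch on the device kind without comparing rtnl kind strings.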



[patch net-next 5/6] rocker: use new helper to figure out master kind

2015-08-26 Thread Jiri Pirko
From: Jiri Pirko 

Looking at the rtnl kind string is kind of ugly, so use the new helpers
to do this in a nicer way.

Signed-off-by: Jiri Pirko 
---
 drivers/net/ethernet/rocker/rocker.c | 19 +++
 1 file changed, 7 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/rocker/rocker.c b/drivers/net/ethernet/rocker/rocker.c
index a7cb74a..62f383c 100644
--- a/drivers/net/ethernet/rocker/rocker.c
+++ b/drivers/net/ethernet/rocker/rocker.c
@@ -322,21 +322,16 @@ static u16 rocker_port_vlan_to_vid(const struct rocker_port *rocker_port,
 	return ntohs(vlan_id);
 }
 
-static bool rocker_port_is_slave(const struct rocker_port *rocker_port,
-				 const char *kind)
-{
-	return rocker_port->bridge_dev &&
-		!strcmp(rocker_port->bridge_dev->rtnl_link_ops->kind, kind);
-}
-
 static bool rocker_port_is_bridged(const struct rocker_port *rocker_port)
 {
-	return rocker_port_is_slave(rocker_port, "bridge");
+	return rocker_port->bridge_dev &&
+	       netif_is_bridge_master(rocker_port->bridge_dev);
 }
 
 static bool rocker_port_is_ovsed(const struct rocker_port *rocker_port)
 {
-	return rocker_port_is_slave(rocker_port, "openvswitch");
+	return rocker_port->bridge_dev &&
+	       netif_is_ovs_master(rocker_port->bridge_dev);
 }
 
 #define ROCKER_OP_FLAG_REMOVE		BIT(0)
@@ -5338,10 +5333,10 @@ static int rocker_port_master_changed(struct net_device *dev)
 	int err = 0;
 
 	/* N.B: Do nothing if the type of master is not supported */
-	if (master && master->rtnl_link_ops) {
-		if (!strcmp(master->rtnl_link_ops->kind, "bridge"))
+	if (master) {
+		if (netif_is_bridge_master(master))
 			err = rocker_port_bridge_join(rocker_port, master);
-		else if (!strcmp(master->rtnl_link_ops->kind, "openvswitch"))
+		else if (netif_is_ovs_master(master))
 			err = rocker_port_ovs_changed(rocker_port, master);
 	} else if (rocker_port_is_bridged(rocker_port)) {
 		err = rocker_port_bridge_leave(rocker_port);
-- 
1.9.3



[patch net-next 4/6] net: kill long time unused bonding private flags

2015-08-26 Thread Jiri Pirko
From: Jiri Pirko 

We haven't used them for years, so just kill them now.

Signed-off-by: Jiri Pirko 
---
 include/linux/netdevice.h | 57 +--
 1 file changed, 21 insertions(+), 36 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 0a884e6..d895b17 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1240,13 +1240,8 @@ struct net_device_ops {
  *
  * @IFF_802_1Q_VLAN: 802.1Q VLAN device
  * @IFF_EBRIDGE: Ethernet bridging device
- * @IFF_SLAVE_INACTIVE: bonding slave not the curr. active
- * @IFF_MASTER_8023AD: bonding master, 802.3ad
- * @IFF_MASTER_ALB: bonding master, balance-alb
  * @IFF_BONDING: bonding master or slave
- * @IFF_SLAVE_NEEDARP: need ARPs for validation
  * @IFF_ISATAP: ISATAP interface (RFC4214)
- * @IFF_MASTER_ARPMON: bonding master, ARP mon in use
  * @IFF_WAN_HDLC: WAN HDLC device
  * @IFF_XMIT_DST_RELEASE: dev_hard_start_xmit() is allowed to
  * release skb->dst
@@ -1269,43 +1264,33 @@ struct net_device_ops {
 enum netdev_priv_flags {
 	IFF_802_1Q_VLAN			= 1<<0,
 	IFF_EBRIDGE			= 1<<1,
-	IFF_SLAVE_INACTIVE		= 1<<2,
-	IFF_MASTER_8023AD		= 1<<3,
-	IFF_MASTER_ALB			= 1<<4,
-	IFF_BONDING			= 1<<5,
-	IFF_SLAVE_NEEDARP		= 1<<6,
-	IFF_ISATAP			= 1<<7,
-	IFF_MASTER_ARPMON		= 1<<8,
-	IFF_WAN_HDLC			= 1<<9,
-	IFF_XMIT_DST_RELEASE		= 1<<10,
-	IFF_DONT_BRIDGE			= 1<<11,
-	IFF_DISABLE_NETPOLL		= 1<<12,
-	IFF_MACVLAN_PORT		= 1<<13,
-	IFF_BRIDGE_PORT			= 1<<14,
-	IFF_OVS_DATAPATH		= 1<<15,
-	IFF_TX_SKB_SHARING		= 1<<16,
-	IFF_UNICAST_FLT			= 1<<17,
-	IFF_TEAM_PORT			= 1<<18,
-	IFF_SUPP_NOFCS			= 1<<19,
-	IFF_LIVE_ADDR_CHANGE		= 1<<20,
-	IFF_MACVLAN			= 1<<21,
-	IFF_XMIT_DST_RELEASE_PERM	= 1<<22,
-	IFF_IPVLAN_MASTER		= 1<<23,
-	IFF_IPVLAN_SLAVE		= 1<<24,
-	IFF_VRF_MASTER			= 1<<25,
-	IFF_NO_QUEUE			= 1<<26,
-	IFF_OPENVSWITCH			= 1<<27,
+	IFF_BONDING			= 1<<2,
+	IFF_ISATAP			= 1<<3,
+	IFF_WAN_HDLC			= 1<<4,
+	IFF_XMIT_DST_RELEASE		= 1<<5,
+	IFF_DONT_BRIDGE			= 1<<6,
+	IFF_DISABLE_NETPOLL		= 1<<7,
+	IFF_MACVLAN_PORT		= 1<<8,
+	IFF_BRIDGE_PORT			= 1<<9,
+	IFF_OVS_DATAPATH		= 1<<10,
+	IFF_TX_SKB_SHARING		= 1<<11,
+	IFF_UNICAST_FLT			= 1<<12,
+	IFF_TEAM_PORT			= 1<<13,
+	IFF_SUPP_NOFCS			= 1<<14,
+	IFF_LIVE_ADDR_CHANGE		= 1<<15,
+	IFF_MACVLAN			= 1<<16,
+	IFF_XMIT_DST_RELEASE_PERM	= 1<<17,
+	IFF_IPVLAN_MASTER		= 1<<18,
+	IFF_IPVLAN_SLAVE		= 1<<19,
+	IFF_VRF_MASTER			= 1<<20,
+	IFF_NO_QUEUE			= 1<<21,
+	IFF_OPENVSWITCH			= 1<<22,
 };
 
 #define IFF_802_1Q_VLAN			IFF_802_1Q_VLAN
 #define IFF_EBRIDGE			IFF_EBRIDGE
-#define IFF_SLAVE_INACTIVE		IFF_SLAVE_INACTIVE
-#define IFF_MASTER_8023AD		IFF_MASTER_8023AD
-#define IFF_MASTER_ALB			IFF_MASTER_ALB
 #define IFF_BONDING			IFF_BONDING
-#define IFF_SLAVE_NEEDARP		IFF_SLAVE_NEEDARP
 #define IFF_ISATAP			IFF_ISATAP
-#define IFF_MASTER_ARPMON		IFF_MASTER_ARPMON
 #define IFF_WAN_HDLC			IFF_WAN_HDLC
 #define IFF_XMIT_DST_RELEASE		IFF_XMIT_DST_RELEASE
 #define IFF_DONT_BRIDGE			IFF_DONT_BRIDGE
-- 
1.9.3



[patch net-next 0/6] rocker: make master change handling nicer

2015-08-26 Thread Jiri Pirko
From: Jiri Pirko 

Jiri Pirko (6):
  net: introduce change upper device notifier change info
  net: add netif_is_bridge_master helper
  net: add netif_is_ovs_master helper with IFF_OPENVSWITCH private flag
  net: kill long time unused bonding private flags
  rocker: use new helper to figure out master kind
  rocker: use change upper info

 drivers/net/ethernet/rocker/rocker.c | 74 ---
 include/linux/netdevice.h| 75 +++-
 net/core/dev.c   | 16 +++-
 net/openvswitch/vport-internal_dev.c |  2 +-
 4 files changed, 97 insertions(+), 70 deletions(-)

-- 
1.9.3



[patch net-next 6/6] rocker: use change upper info

2015-08-26 Thread Jiri Pirko
From: Jiri Pirko 

Now that information about the changed upper device is passed along,
benefit from that and use this info directly.

This also fixes possible issues that could happen when a non-master device
is added (the current code does not distinguish between master and
non-master upper devices).

Signed-off-by: Jiri Pirko 
---
 drivers/net/ethernet/rocker/rocker.c | 61 ++--
 1 file changed, 38 insertions(+), 23 deletions(-)

diff --git a/drivers/net/ethernet/rocker/rocker.c b/drivers/net/ethernet/rocker/rocker.c
index 62f383c..34ac41a 100644
--- a/drivers/net/ethernet/rocker/rocker.c
+++ b/drivers/net/ethernet/rocker/rocker.c
@@ -5326,46 +5326,61 @@ static int rocker_port_ovs_changed(struct rocker_port *rocker_port,
 	return err;
 }
 
-static int rocker_port_master_changed(struct net_device *dev)
+static int rocker_port_master_linked(struct rocker_port *rocker_port,
+				     struct net_device *master)
+{
+	int err = 0;
+
+	if (netif_is_bridge_master(master))
+		err = rocker_port_bridge_join(rocker_port, master);
+	else if (netif_is_ovs_master(master))
+		err = rocker_port_ovs_changed(rocker_port, master);
+	return err;
+}
+
+static int rocker_port_master_unlinked(struct rocker_port *rocker_port)
 {
-	struct rocker_port *rocker_port = netdev_priv(dev);
-	struct net_device *master = netdev_master_upper_dev_get(dev);
 	int err = 0;
 
-	/* N.B: Do nothing if the type of master is not supported */
-	if (master) {
-		if (netif_is_bridge_master(master))
-			err = rocker_port_bridge_join(rocker_port, master);
-		else if (netif_is_ovs_master(master))
-			err = rocker_port_ovs_changed(rocker_port, master);
-	} else if (rocker_port_is_bridged(rocker_port)) {
+	if (rocker_port_is_bridged(rocker_port))
 		err = rocker_port_bridge_leave(rocker_port);
-	} else if (rocker_port_is_ovsed(rocker_port)) {
+	else if (rocker_port_is_ovsed(rocker_port))
 		err = rocker_port_ovs_changed(rocker_port, NULL);
-	}
-
 	return err;
 }
 
 static int rocker_netdevice_event(struct notifier_block *unused,
 				  unsigned long event, void *ptr)
 {
-	struct net_device *dev;
+	struct net_device *dev = netdev_notifier_info_to_dev(ptr);
+	struct netdev_notifier_changeupper_info *info;
+	struct rocker_port *rocker_port;
 	int err;
 
+	if (!rocker_port_dev_check(dev))
+		return NOTIFY_DONE;
+
 	switch (event) {
 	case NETDEV_CHANGEUPPER:
-		dev = netdev_notifier_info_to_dev(ptr);
-		if (!rocker_port_dev_check(dev))
-			return NOTIFY_DONE;
-		err = rocker_port_master_changed(dev);
-		if (err)
-			netdev_warn(dev,
-				    "failed to reflect master change (err %d)\n",
-				    err);
+		info = ptr;
+		if (!info->master)
+			goto out;
+		rocker_port = netdev_priv(dev);
+		if (info->linking) {
+			err = rocker_port_master_linked(rocker_port,
+							info->upper_dev);
+			if (err)
+				netdev_warn(dev, "failed to reflect master linked (err %d)\n",
+					    err);
+		} else {
+			err = rocker_port_master_unlinked(rocker_port);
+			if (err)
+				netdev_warn(dev, "failed to reflect master unlinked (err %d)\n",
+					    err);
+		}
 		break;
 	}
-
+out:
 	return NOTIFY_DONE;
 }
 
-- 
1.9.3



[PATCH net-next] cxgb4: continue in debug mode, if probe fails

2015-08-26 Thread Hariprasad Shenai
If the adapter is flashed with incorrect firmware, probe can fail.
If probe fails, continue in debug mode, so one can also use the debug
interface to update the firmware via ethtool.

Signed-off-by: Hariprasad Shenai 
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c 
b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index f35dd22..422765d 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -4726,12 +4726,13 @@ static int init_one(struct pci_dev *pdev, const struct 
pci_device_id *ent)
 
setup_memwin(adapter);
err = adap_init0(adapter);
+   if (err)
+   dev_err(&pdev->dev, "Adapter initialization failed, error %d.  "
+   "Continuing in debug mode\n", -err);
 #ifdef CONFIG_DEBUG_FS
bitmap_zero(adapter->sge.blocked_fl, adapter->sge.egr_sz);
 #endif
setup_memwin_rdma(adapter);
-   if (err)
-   goto out_unmap_bar;
 
for_each_port(adapter, i) {
struct net_device *netdev;
@@ -4799,6 +4800,8 @@ static int init_one(struct pci_dev *pdev, const struct 
pci_device_id *ent)
 * soon as the first register_netdev completes.
 */
cfg_queues(adapter);
+   if (!(adapter->flags & FW_OK))
+   goto fw_attach_fail;
 
adapter->l2t = t4_init_l2t(adapter->l2t_start, adapter->l2t_end);
if (!adapter->l2t) {
@@ -4851,6 +4854,7 @@ static int init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
if (err)
goto out_free_dev;
 
+fw_attach_fail:
/*
 * The card is now ready to go.  If any errors occur during device
 * registration we do not fail the whole card but rather proceed only
@@ -4901,7 +4905,6 @@ sriov:
 
  out_free_dev:
free_some_resources(adapter);
- out_unmap_bar:
if (!is_t4(adapter->params.chip))
iounmap(adapter->bar2);
  out_free_adapter:
-- 
2.3.4



Re: [PATCH -next] smsc911x: Fix crash seen if neither ACPI nor OF is configured or used

2015-08-26 Thread Tony Lindgren
Hi,

* Guenter Roeck  [150817 13:48]:
> Commit 0b50dc4fc971 ("Convert smsc911x to use ACPI as well as DT") makes
> the call to smsc911x_probe_config() unconditional, and no longer fails if
> there is no device node. device_get_phy_mode() is called unconditionally,
> and if there is no phy node configured returns an error code. This error
> code is assigned to phy_interface, and interpreted elsewhere in the code
> as valid phy mode. This in turn causes qemu to crash when running a
> variant of realview_pb_defconfig.
> 
>   qemu: hardware error: lan9118_read: Bad reg 0x86
> 
> Fixes: 0b50dc4fc971 ("Convert smsc911x to use ACPI as well as DT")
> Cc: Jeremy Linton 
> Cc Graeme Gregory 
> Signed-off-by: Guenter Roeck 
> ---
>  drivers/net/ethernet/smsc/smsc911x.c | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/smsc/smsc911x.c b/drivers/net/ethernet/smsc/smsc911x.c
> index 0f21aa3bb537..34f97684506b 100644
> --- a/drivers/net/ethernet/smsc/smsc911x.c
> +++ b/drivers/net/ethernet/smsc/smsc911x.c
> @@ -2367,12 +2367,17 @@ static const struct smsc911x_ops shifted_smsc911x_ops = {
>  static int smsc911x_probe_config(struct smsc911x_platform_config *config,
>struct device *dev)
>  {
> + int phy_interface;
>   u32 width = 0;
>  
>   if (!dev)
>   return -ENODEV;
>  
> - config->phy_interface = device_get_phy_mode(dev);
> + phy_interface = device_get_phy_mode(dev);
> + if (phy_interface < 0)
> + return phy_interface;
> +
> + config->phy_interface = phy_interface;
>  
>   device_get_mac_address(dev, config->mac, ETH_ALEN);

Looks like this change makes at least omap boards using smsc911x
fail with -22 for me in Linux next.

Do any of the device tree configured smsc911x devices actually
have a phy configured?

Regards,

Tony


Re: [patch net-next 3/6] net: add netif_is_ovs_master helper with IFF_OPENVSWITCH private flag

2015-08-26 Thread Florian Fainelli
On 26/08/15 09:36, Jiri Pirko wrote:
> From: Jiri Pirko 
> 
> Add this helper so code can easily figure out if netdev is openswitch.
> 
> Signed-off-by: Jiri Pirko 
> ---
>  include/linux/netdevice.h| 8 
>  net/openvswitch/vport-internal_dev.c | 2 +-
>  2 files changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index be625f4..0a884e6 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -1264,6 +1264,7 @@ struct net_device_ops {
>   * @IFF_MACVLAN: Macvlan device
>   * @IFF_VRF_MASTER: device is a VRF master
>   * @IFF_NO_QUEUE: device can run without qdisc attached
> + * @IFF_VRF_OPENVSWITCH: device is a Open vSwitch master

Typo, the flag you introduced is named IFF_OPENVSWITCH, not IFF_VRF_OPENVSWITCH.

>   */
>  enum netdev_priv_flags {
>   IFF_802_1Q_VLAN = 1<<0,
> @@ -1293,6 +1294,7 @@ enum netdev_priv_flags {
>   IFF_IPVLAN_SLAVE= 1<<24,
>   IFF_VRF_MASTER  = 1<<25,
>   IFF_NO_QUEUE= 1<<26,
> + IFF_OPENVSWITCH = 1<<27,
>  };
>  
>  #define IFF_802_1Q_VLAN  IFF_802_1Q_VLAN
> @@ -1322,6 +1324,7 @@ enum netdev_priv_flags {
>  #define IFF_IPVLAN_SLAVE IFF_IPVLAN_SLAVE
>  #define IFF_VRF_MASTER   IFF_VRF_MASTER
>  #define IFF_NO_QUEUE IFF_NO_QUEUE
> +#define IFF_OPENVSWITCH  IFF_OPENVSWITCH
>  
>  /**
>   *   struct net_device - The DEVICE structure.
> @@ -3853,6 +3856,11 @@ static inline bool netif_is_bridge_master(const struct net_device *dev)
>   return dev->priv_flags & IFF_EBRIDGE;
>  }
>  
> +static inline bool netif_is_ovs_master(const struct net_device *dev)
> +{
> + return dev->priv_flags & IFF_OPENVSWITCH;
> +}
> +
>  static inline bool netif_index_is_vrf(struct net *net, int ifindex)
>  {
>   bool rc = false;
> diff --git a/net/openvswitch/vport-internal_dev.c b/net/openvswitch/vport-internal_dev.c
> index c058bbf..80b3e12 100644
> --- a/net/openvswitch/vport-internal_dev.c
> +++ b/net/openvswitch/vport-internal_dev.c
> @@ -135,7 +135,7 @@ static void do_setup(struct net_device *netdev)
>   netdev->netdev_ops = &internal_dev_netdev_ops;
>  
>   netdev->priv_flags &= ~IFF_TX_SKB_SHARING;
> - netdev->priv_flags |= IFF_LIVE_ADDR_CHANGE;
> + netdev->priv_flags |= IFF_LIVE_ADDR_CHANGE | IFF_OPENVSWITCH;
>   netdev->destructor = internal_dev_destructor;
>   netdev->ethtool_ops = &internal_dev_ethtool_ops;
>   netdev->rtnl_link_ops = &internal_dev_link_ops;
> 


-- 
Florian


Re: [PATCH -next] smsc911x: Fix crash seen if neither ACPI nor OF is configured or used

2015-08-26 Thread Jeremy Linton

On 08/26/2015 12:04 PM, Tony Lindgren wrote:
> * Guenter Roeck  [150817 13:48]:
>> Commit 0b50dc4fc971 ("Convert smsc911x to use ACPI as well as DT") makes
> Looks like this change makes at least omap boards using smsc911x
> fail with -22 for me in Linux next.
>
> Do any of the device tree configured smsc911x devices actually
> have a phy configured?


Tony,

	Looks like all the ones in the kernel boot/dts directory have a phy 
including the omap3-lilly except for the ste-snowball.dts.


Do you have smsc,force-internal-phy set instead?

Thanks,





Re: [PATCH RFC 0/5] net: L2 only interfaces

2015-08-26 Thread Florian Fainelli
On 25/08/15 17:12, David Miller wrote:
> From: Florian Fainelli 
> Date: Tue, 25 Aug 2015 15:50:10 -0700
> 
>> This patch series implements a L2 only interface concept which
>> basically denies any kind of IP address configuration on these
>> interfaces, but still allows them to be used as configuration
>> end-points to keep using ethtool and friends.
>>
>> A cleaner approach might be to finally come up with the concept of
>> net_port which a net_device would be a superset of, but this still
>> raises tons of questions as to whether we should be modifying
>> userland tools to be able to configure/query these
>> interfaces. During all the switch talks/discussions last year, it
>> seemed to me like th L2-only interface is closest we have to a
>> "network port".
>>
>> Comments, flames, flying tomatoes welcome!
> 
> Interesting, indeed.
> 
> Do you plan to extend this to defining a more minimal network device
> sub-type as well?
> 
> Then we can pass "net_device_common" or whatever around as a common
> base type of actual net device "implementations".

I am a little worried this is not going to scale well without
introducing massive amounts of churn, but I am not opposed to the idea
of having a common denominator structure which is either further
specialized into a full-fledged net_device, or some other construct.

> 
> Or is you main goal just getting the L2-only semantic?

Yes, this was the main goal behind this submission, along with seeing
whether there was something obviously wrong with doing that.

Now, based on the feedback, it seems like there is both interest and
use cases I had not initially thought about, like making this flag
fully volatile.

Thanks!
-- 
Florian


Re: [PATCH -next] smsc911x: Fix crash seen if neither ACPI nor OF is configured or used

2015-08-26 Thread Guenter Roeck

Hi Tony,

On 08/26/2015 10:04 AM, Tony Lindgren wrote:
> Hi,
>
> * Guenter Roeck  [150817 13:48]:
>> Commit 0b50dc4fc971 ("Convert smsc911x to use ACPI as well as DT") makes
>> the call to smsc911x_probe_config() unconditional, and no longer fails if
>> there is no device node. device_get_phy_mode() is called unconditionally,
>> and if there is no phy node configured returns an error code. This error
>> code is assigned to phy_interface, and interpreted elsewhere in the code
>> as valid phy mode. This in turn causes qemu to crash when running a
>> variant of realview_pb_defconfig.
>>
>>   qemu: hardware error: lan9118_read: Bad reg 0x86
>>
>> Fixes: 0b50dc4fc971 ("Convert smsc911x to use ACPI as well as DT")
>> Cc: Jeremy Linton
>> Cc Graeme Gregory
>> Signed-off-by: Guenter Roeck
>> ---
>>  drivers/net/ethernet/smsc/smsc911x.c | 7 ++-
>>  1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/ethernet/smsc/smsc911x.c b/drivers/net/ethernet/smsc/smsc911x.c
>> index 0f21aa3bb537..34f97684506b 100644
>> --- a/drivers/net/ethernet/smsc/smsc911x.c
>> +++ b/drivers/net/ethernet/smsc/smsc911x.c
>> @@ -2367,12 +2367,17 @@ static const struct smsc911x_ops shifted_smsc911x_ops = {
>>  static int smsc911x_probe_config(struct smsc911x_platform_config *config,
>>                                   struct device *dev)
>>  {
>> +       int phy_interface;
>>         u32 width = 0;
>>
>>         if (!dev)
>>                 return -ENODEV;
>>
>> -       config->phy_interface = device_get_phy_mode(dev);
>> +       phy_interface = device_get_phy_mode(dev);
>> +       if (phy_interface < 0)
>> +               return phy_interface;
>> +
>> +       config->phy_interface = phy_interface;
>>
>>         device_get_mac_address(dev, config->mac, ETH_ALEN);
>
> Looks like this change makes at least omap boards using smsc911x
> fail with -22 for me in Linux next.



What do you see if you revert my patch? It should assign -22, or its
unsigned representation, to phy_interface, which isn't such a good idea
either.


> Do any of the device tree configured smsc911x devices actually
> have a phy configured?


Good question, and beats me. Looking into the original code,
it didn't check for an error return from of_get_phy_mode() either,
and thus _would_ dutifully assign the error code to phy_interface.
I wonder how this was supposed to work to start with.

I'll do some debugging and try to find out what exactly is going on.
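
To make the failure mode above concrete, here is a minimal user-space
sketch. It is not the driver code: demo_config, demo_get_phy_mode() and
the two probe helpers are hypothetical stand-ins, and the unsigned field
type is an assumption made only to show the "unsigned representation"
of -22.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the platform config; the field is unsigned here. */
struct demo_config {
	uint32_t phy_interface;
};

/* Hypothetical lookup: returns a mode (1) or -EINVAL if none is configured. */
static int demo_get_phy_mode(int has_phy_mode)
{
	return has_phy_mode ? 1 : -EINVAL;
}

/* Unchecked variant: the error code is stored as if it were a phy mode. */
static int demo_probe_unchecked(struct demo_config *cfg, int has_phy_mode)
{
	assert(cfg != NULL);
	cfg->phy_interface = (uint32_t)demo_get_phy_mode(has_phy_mode);
	return 0;
}

/* Checked variant: the error is returned and the field is left untouched. */
static int demo_probe_checked(struct demo_config *cfg, int has_phy_mode)
{
	int mode;

	assert(cfg != NULL);
	mode = demo_get_phy_mode(has_phy_mode);
	if (mode < 0)
		return mode;

	cfg->phy_interface = (uint32_t)mode;
	return 0;
}
```

With no phy mode configured, the unchecked variant reports success and
leaves a huge bogus value in the field, while the checked variant fails
the probe, which would match the -22 seen on the omap boards.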

Guenter



Re: [PATCH RFC 0/5] net: L2 only interfaces

2015-08-26 Thread Florian Fainelli
On 25/08/15 21:24, Marcel Holtmann wrote:
> Hi Dave,
> 
>>> This patch series implements a L2 only interface concept which
>>> basically denies any kind of IP address configuration on these
>>> interfaces, but still allows them to be used as configuration
>>> end-points to keep using ethtool and friends.
>>>
>>> A cleaner approach might be to finally come up with the concept of
>>> net_port which a net_device would be a superset of, but this still
>>> raises tons of questions as to whether we should be modifying
>>> userland tools to be able to configure/query these
>>> interfaces. During all the switch talks/discussions last year, it
>>> seemed to me like th L2-only interface is closest we have to a
>>> "network port".
>>>
>>> Comments, flames, flying tomatoes welcome!
>>
>> Interesting, indeed.
>>
>> Do you plan to extend this to defining a more minimal network device
>> sub-type as well?
>>
>> Then we can pass "net_device_common" or whatever around as a common
>> base type of actual net device "implementations".
>>
>> Or is you main goal just getting the L2-only semantic?
> 
> the other end of this could be also an IP only net_device where we do not 
> have ethtool semantics.
> 
> We do have a need for a IPv6 only net_device when utilizing ARPHRD_6LOWPAN 
> for 802.15.4 and Bluetooth LE. Skipping in_dev initialization there might be 
> an interesting step towards that. Not sure how much entangled in_dev and 
> in6_dev still are. If it works for IFF_L2_ONLY, it might work also in the 
> other direction.

Just out of curiosity, is the aim for IPv6 only net_device to be denying
any kind of IPv4 configuration/tools, or is it for performance purposes?

The IFF_L2_ONLY flag would probably need to mean something like
(IFF_NO_IPV4 | IFF_NO_IPV6) such that you could decide which one of the
two IP stacks you want to use, or none.
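
A rough sketch of what I mean, in plain C. The DEMO_ names and bit
positions are made up for illustration and are not existing kernel
flags; the point is only that L2-only becomes the combination of two
independent bits.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits, not real priv_flags values. */
#define DEMO_IFF_NO_IPV4 (1u << 0)
#define DEMO_IFF_NO_IPV6 (1u << 1)
#define DEMO_IFF_L2_ONLY (DEMO_IFF_NO_IPV4 | DEMO_IFF_NO_IPV6)

static_assert((DEMO_IFF_NO_IPV4 & DEMO_IFF_NO_IPV6) == 0,
	      "the two flags must use distinct bits");

/* IPv4 configuration is allowed unless the NO_IPV4 bit is set. */
static bool demo_ipv4_allowed(unsigned int flags)
{
	return !(flags & DEMO_IFF_NO_IPV4);
}

/* IPv6 configuration is allowed unless the NO_IPV6 bit is set. */
static bool demo_ipv6_allowed(unsigned int flags)
{
	return !(flags & DEMO_IFF_NO_IPV6);
}

/* L2-only means both IP stacks are denied. */
static bool demo_is_l2_only(unsigned int flags)
{
	return (flags & DEMO_IFF_L2_ONLY) == DEMO_IFF_L2_ONLY;
}
```

An IPv6-only device such as the 6LoWPAN case would then set only the
NO_IPV4 bit.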
-- 
Florian


[PATCH RFC V2 1/2] net: Introduce helper functions to get the per cpu data

2015-08-26 Thread Raghavendra K T
Signed-off-by: Raghavendra K T 
---
 include/net/ip.h   | 10 ++
 net/ipv4/af_inet.c | 41 +++--
 2 files changed, 37 insertions(+), 14 deletions(-)

diff --git a/include/net/ip.h b/include/net/ip.h
index d5fe9f2..93bf12e 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -202,10 +202,20 @@ void ip_send_unicast_reply(struct sock *sk, struct sk_buff *skb,
 #define NET_ADD_STATS_BH(net, field, adnd) SNMP_ADD_STATS_BH((net)->mib.net_statistics, field, adnd)
 #define NET_ADD_STATS_USER(net, field, adnd) SNMP_ADD_STATS_USER((net)->mib.net_statistics, field, adnd)
 
+u64 snmp_get_cpu_field(void __percpu *mib, int cpu, int offct);
 unsigned long snmp_fold_field(void __percpu *mib, int offt);
 #if BITS_PER_LONG==32
+u64 snmp_get_cpu_field64(void __percpu *mib, int cpu, int offct,
+size_t syncp_offset);
 u64 snmp_fold_field64(void __percpu *mib, int offt, size_t sync_off);
 #else
+static inline u64  snmp_get_cpu_field64(void __percpu *mib, int cpu, int offct,
+   size_t syncp_offset)
+{
+   return snmp_get_cpu_field(mib, cpu, offct);
+
+}
+
 static inline u64 snmp_fold_field64(void __percpu *mib, int offt, size_t syncp_off)
 {
return snmp_fold_field(mib, offt);
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 9532ee8..302e36b 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1448,38 +1448,51 @@ int inet_ctl_sock_create(struct sock **sk, unsigned short family,
 }
 EXPORT_SYMBOL_GPL(inet_ctl_sock_create);
 
+u64 snmp_get_cpu_field(void __percpu *mib, int cpu, int offt)
+{
+   return  *(((unsigned long *)per_cpu_ptr(mib, cpu)) + offt);
+}
+EXPORT_SYMBOL_GPL(snmp_get_cpu_field);
+
 unsigned long snmp_fold_field(void __percpu *mib, int offt)
 {
unsigned long res = 0;
int i;
 
for_each_possible_cpu(i)
-   res += *(((unsigned long *) per_cpu_ptr(mib, i)) + offt);
+   res += snmp_get_cpu_field(mib, i, offt);
return res;
 }
 EXPORT_SYMBOL_GPL(snmp_fold_field);
 
 #if BITS_PER_LONG==32
 
+u64 snmp_get_cpu_field64(void __percpu *mib, int cpu, int offt,
+size_t syncp_offset)
+{
+   void *bhptr;
+   struct u64_stats_sync *syncp;
+   u64 v;
+   unsigned int start;
+
+   bhptr = per_cpu_ptr(mib, cpu);
+   syncp = (struct u64_stats_sync *)(bhptr + syncp_offset);
+   do {
+   start = u64_stats_fetch_begin_irq(syncp);
+   v = *(((u64 *)bhptr) + offt);
+   } while (u64_stats_fetch_retry_irq(syncp, start));
+
+   return v;
+}
+EXPORT_SYMBOL_GPL(snmp_get_cpu_field64);
+
 u64 snmp_fold_field64(void __percpu *mib, int offt, size_t syncp_offset)
 {
u64 res = 0;
int cpu;
 
for_each_possible_cpu(cpu) {
-   void *bhptr;
-   struct u64_stats_sync *syncp;
-   u64 v;
-   unsigned int start;
-
-   bhptr = per_cpu_ptr(mib, cpu);
-   syncp = (struct u64_stats_sync *)(bhptr + syncp_offset);
-   do {
-   start = u64_stats_fetch_begin_irq(syncp);
-   v = *(((u64 *) bhptr) + offt);
-   } while (u64_stats_fetch_retry_irq(syncp, start));
-
-   res += v;
+   res += snmp_get_cpu_field64(mib, cpu, offt, syncp_offset);
}
return res;
 }
-- 
1.7.11.7



[PATCH RFC V2 0/2] Optimize the snmp stat aggregation for large cpus

2015-08-26 Thread Raghavendra K T
While creating 1000 containers, perf shows a lot of time spent in
snmp_fold_field on a system with a large number of cpus.

This series tries to reduce that time by reordering the statistics gathering.

Please note that similar overhead was also reported while creating
veth pairs  https://lkml.org/lkml/2013/3/19/556

Changes in V2:
 - Allocate the stat calculation buffer on the stack. (Eric)
 
Setup:
160 cpu (20 core) baremetal powerpc system with 1TB memory

1000 docker containers were created by running the command
docker run -itd  ubuntu:15.04  /bin/bash in a loop

Observation:
Docker container creation time increased linearly from around 1.6 sec to
7.5 sec (at 1000 containers). perf data showed that creating veth interfaces,
which results in the code path below, was taking the extra time.

rtnl_fill_ifinfo
  -> inet6_fill_link_af
    -> inet6_fill_ifla6_attrs
      -> snmp_fold_field

Proposed idea:
 Currently __snmp6_fill_stats64 calls snmp_fold_field, which walks
through the per cpu data of one item (iteratively, for around 90 items).
 The patch instead aggregates the statistics by going through
all the items of each cpu sequentially, which reduces cache
misses.
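
To illustrate the two traversal orders, here is a small user-space
sketch. It is not the kernel code: demo_mib is a plain 2D array standing
in for the per cpu data, and the sizes and function names are
hypothetical. Both functions produce the same sums; only the memory
access order differs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_NR_CPUS  4	/* hypothetical sizes for the sketch */
#define DEMO_NR_ITEMS 8

/* Stand-in for the per cpu MIB data: one contiguous row per cpu. */
static uint64_t demo_mib[DEMO_NR_CPUS][DEMO_NR_ITEMS];

/* Fill the table with recognizable values: cpu * 100 + item. */
static void demo_fill(void)
{
	for (size_t cpu = 0; cpu < DEMO_NR_CPUS; cpu++)
		for (size_t item = 0; item < DEMO_NR_ITEMS; item++)
			demo_mib[cpu][item] = cpu * 100 + item;
}

/* Current order: for each item, visit every cpu's row (strided access). */
static void demo_fold_item_major(uint64_t *out)
{
	assert(out != NULL);
	for (size_t item = 0; item < DEMO_NR_ITEMS; item++) {
		out[item] = 0;
		for (size_t cpu = 0; cpu < DEMO_NR_CPUS; cpu++)
			out[item] += demo_mib[cpu][item];
	}
}

/* Proposed order: for each cpu, walk its row sequentially into a buffer. */
static void demo_fold_cpu_major(uint64_t *out)
{
	assert(out != NULL);
	for (size_t item = 0; item < DEMO_NR_ITEMS; item++)
		out[item] = 0;

	for (size_t cpu = 0; cpu < DEMO_NR_CPUS; cpu++)
		for (size_t item = 0; item < DEMO_NR_ITEMS; item++)
			out[item] += demo_mib[cpu][item];
}
```

The second form needs the accumulation buffer, which is why V2 carries
one on the stack.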

Performance of docker creation improved by more than 2x
after the patch.

before the patch: 

time docker run -itd  ubuntu:15.04  /bin/bash
3f45ba571a42e925c4ec4aaee0e48d7610a9ed82a4c931f83324d41822cf6617
real    0m6.836s
user    0m0.095s
sys     0m0.011s

perf record -a docker run -itd  ubuntu:15.04  /bin/bash
===
# Samples: 32K of event 'cycles'
# Event count (approx.): 24688700190
# Overhead  Command  Shared Object      Symbol
# ........  .......  .................  ......................
    50.73%  docker   [kernel.kallsyms]  [k] snmp_fold_field
     9.07%  swapper  [kernel.kallsyms]  [k] snooze_loop
     3.49%  docker   [kernel.kallsyms]  [k] veth_stats_one
     2.85%  swapper  [kernel.kallsyms]  [k] _raw_spin_lock
     1.37%  docker   docker             [.] backtrace_qsort
     1.31%  docker   docker             [.] strings.FieldsFunc

cache-misses: 2.7%
  
after the patch:
=
 time docker run -itd  ubuntu:15.04  /bin/bash
4e0619421332990bdea413fe455ab187607ed63d33d5c37aa5291bc2f5b35857
real    0m3.357s
user    0m0.092s
sys     0m0.010s

perf record -a docker run -itd  ubuntu:15.04  /bin/bash
===
# Samples: 15K of event 'cycles'
# Event count (approx.): 11471830714
# Overhead  Command  Shared Object      Symbol
# ........  .......  .................  ......................
    10.56%  swapper  [kernel.kallsyms]  [k] snooze_loop
     8.72%  docker   [kernel.kallsyms]  [k] snmp_get_cpu_field
     7.59%  docker   [kernel.kallsyms]  [k] veth_stats_one
     3.65%  swapper  [kernel.kallsyms]  [k] _raw_spin_lock
     3.06%  docker   docker             [.] strings.FieldsFunc
     2.96%  docker   docker             [.] backtrace_qsort

cache-misses: 1.38%

Please let me know if you have suggestions/comments.
Thanks Eric and David for comments on V1.

Raghavendra K T (2):
  net: Introduce helper functions to get the per cpu data
  net: Optimize snmp stat aggregation by walking all the percpu data at
once

 include/net/ip.h| 10 ++
 net/ipv4/af_inet.c  | 41 +++--
 net/ipv6/addrconf.c | 18 +-
 3 files changed, 50 insertions(+), 19 deletions(-)

-- 
1.7.11.7


[PATCH RFC V2 2/2] net: Optimize snmp stat aggregation by walking all the percpu data at once

2015-08-26 Thread Raghavendra K T
Docker container creation time increased linearly from around 1.6 sec to
7.5 sec (at 1000 containers) and perf data showed 50% overhead in
snmp_fold_field.

reason: currently __snmp6_fill_stats64 calls snmp_fold_field that walks
through per cpu data of an item (iteratively for around 90 items).

idea: This patch aggregates the statistics by going through
all the items of each cpu sequentially, which reduces cache
misses.

Docker creation got faster by more than 2x after the patch.

Result:
                       Before   After
Docker creation time   6.836s   3.357s
cache miss             2.7%     1.38%

perf before:
50.73%  docker   [kernel.kallsyms]   [k] snmp_fold_field
 9.07%  swapper  [kernel.kallsyms]   [k] snooze_loop
 3.49%  docker   [kernel.kallsyms]   [k] veth_stats_one
 2.85%  swapper  [kernel.kallsyms]   [k] _raw_spin_lock

perf after:
10.56%  swapper  [kernel.kallsyms] [k] snooze_loop
 8.72%  docker   [kernel.kallsyms] [k] snmp_get_cpu_field
 7.59%  docker   [kernel.kallsyms] [k] veth_stats_one
 3.65%  swapper  [kernel.kallsyms] [k] _raw_spin_lock

Signed-off-by: Raghavendra K T 
---
 net/ipv6/addrconf.c | 18 +-
 1 file changed, 13 insertions(+), 5 deletions(-)

 Change in V2:
 - Allocate the stat calculation buffer on the stack (Eric)

Thanks to David and Eric for comments on V1. As both of them pointed out,
unfortunately we cannot get rid of the calculation buffer if we want to
avoid unaligned operations.

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 21c2c81..0f6c7a5 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -4624,16 +4624,22 @@ static inline void __snmp6_fill_statsdev(u64 *stats, atomic_long_t *mib,
 }
 
 static inline void __snmp6_fill_stats64(u64 *stats, void __percpu *mib,
- int items, int bytes, size_t syncpoff)
+   int items, int bytes, size_t syncpoff,
+   u64 *buff)
 {
-   int i;
+   int i, c;
int pad = bytes - sizeof(u64) * items;
BUG_ON(pad < 0);
 
/* Use put_unaligned() because stats may not be aligned for u64. */
put_unaligned(items, &stats[0]);
+
+   for_each_possible_cpu(c)
+   for (i = 1; i < items; i++)
+   buff[i] += snmp_get_cpu_field64(mib, c, i, syncpoff);
+
for (i = 1; i < items; i++)
-   put_unaligned(snmp_fold_field64(mib, i, syncpoff), &stats[i]);
+   put_unaligned(buff[i], &stats[i]);
 
memset(&stats[items], 0, pad);
 }
@@ -4641,10 +4647,12 @@ static inline void __snmp6_fill_stats64(u64 *stats, void __percpu *mib,
 static void snmp6_fill_stats(u64 *stats, struct inet6_dev *idev, int attrtype,
 int bytes)
 {
+   u64 buff[IPSTATS_MIB_MAX] = {0,};
+
switch (attrtype) {
case IFLA_INET6_STATS:
-   __snmp6_fill_stats64(stats, idev->stats.ipv6,
-IPSTATS_MIB_MAX, bytes, offsetof(struct ipstats_mib, syncp));
+   __snmp6_fill_stats64(stats, idev->stats.ipv6, IPSTATS_MIB_MAX, bytes,
+offsetof(struct ipstats_mib, syncp), buff);
break;
case IFLA_INET6_ICMP6STATS:
__snmp6_fill_statsdev(stats, idev->stats.icmpv6dev->mibs, ICMP6_MIB_MAX, bytes);
-- 
1.7.11.7



Re: [PATCH, net-next] r8169:Correct value on r810x_phy_power_up function

2015-08-26 Thread David Miller
From: Corcodel Marian 
Date: Wed, 26 Aug 2015 15:16:10 +0300

> Correct value on r810x_phy_power_up function normal clean 
>  bit BMCR_PDOWN
> 
> Signed-off-by: Corcodel Marian 
> 
> diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
> index d6d39df..91cf3a6 100644
> --- a/drivers/net/ethernet/realtek/r8169.c
> +++ b/drivers/net/ethernet/realtek/r8169.c
> @@ -4669,7 +4669,7 @@ static void r810x_phy_power_down(struct rtl8169_private *tp)
>  static void r810x_phy_power_up(struct rtl8169_private *tp)
>  {
>   rtl_writephy(tp, 0x1f, 0x);
> - rtl_writephy(tp, MII_BMCR, BMCR_ANENABLE);
> + rtl_writephy(tp, MII_BMCR, ~BMCR_PDOWN);

This DOES NOT only clear BMCR_PDOWN, it sets every other bit in the
register as well.
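
To spell that out, here is a small user-space sketch of the difference.
The two bit values are the ones from linux/mii.h; the helper names are
made up for illustration and nothing here touches a real PHY.

```c
#include <assert.h>
#include <stdint.h>

/* Bit values as defined in the kernel's linux/mii.h. */
#define BMCR_PDOWN    0x0800	/* power down the PHY */
#define BMCR_ANENABLE 0x1000	/* enable auto-negotiation */

static_assert((BMCR_PDOWN & BMCR_ANENABLE) == 0, "distinct register bits");

/* What the proposed change writes: every bit set except BMCR_PDOWN. */
static uint16_t bmcr_value_from_patch(void)
{
	return (uint16_t)~BMCR_PDOWN;
}

/* Read-modify-write: clear only BMCR_PDOWN and keep all other bits. */
static uint16_t bmcr_clear_pdown(uint16_t current)
{
	return (uint16_t)(current & ~BMCR_PDOWN);
}
```

The first helper yields 0xf7ff, which also sets the reset, loopback and
isolate bits; clearing a single bit needs the current register value as
input, as in the second helper.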


Re: [patch net-next 3/6] net: add netif_is_ovs_master helper with IFF_OPENVSWITCH private flag

2015-08-26 Thread Scott Feldman
On Wed, Aug 26, 2015 at 9:36 AM, Jiri Pirko  wrote:
> From: Jiri Pirko 
>
> Add this helper so code can easily figure out if netdev is openswitch.
>
> Signed-off-by: Jiri Pirko 
> ---
>  include/linux/netdevice.h| 8 
>  net/openvswitch/vport-internal_dev.c | 2 +-
>  2 files changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index be625f4..0a884e6 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -1264,6 +1264,7 @@ struct net_device_ops {
>   * @IFF_MACVLAN: Macvlan device
>   * @IFF_VRF_MASTER: device is a VRF master
>   * @IFF_NO_QUEUE: device can run without qdisc attached
> + * @IFF_VRF_OPENVSWITCH: device is a Open vSwitch master
>   */
>  enum netdev_priv_flags {
> IFF_802_1Q_VLAN = 1<<0,
> @@ -1293,6 +1294,7 @@ enum netdev_priv_flags {
> IFF_IPVLAN_SLAVE= 1<<24,
> IFF_VRF_MASTER  = 1<<25,
> IFF_NO_QUEUE= 1<<26,
> +   IFF_OPENVSWITCH = 1<<27,
>  };
>
>  #define IFF_802_1Q_VLANIFF_802_1Q_VLAN
> @@ -1322,6 +1324,7 @@ enum netdev_priv_flags {
>  #define IFF_IPVLAN_SLAVE   IFF_IPVLAN_SLAVE
>  #define IFF_VRF_MASTER IFF_VRF_MASTER
>  #define IFF_NO_QUEUE   IFF_NO_QUEUE
> +#define IFF_OPENVSWITCHIFF_OPENVSWITCH
>
>  /**
>   * struct net_device - The DEVICE structure.
> @@ -3853,6 +3856,11 @@ static inline bool netif_is_bridge_master(const struct net_device *dev)
> return dev->priv_flags & IFF_EBRIDGE;
>  }
>
> +static inline bool netif_is_ovs_master(const struct net_device *dev)
> +{
> +   return dev->priv_flags & IFF_OPENVSWITCH;
> +}

We're going to run out of priv_flags bits.  This flag doesn't seem
like something that will be checked lots of places.  How about using
rtnl_link_ops->kind to save a bit in priv_flags?

static inline bool netif_is_ovs_master(const struct net_device *dev)
{
        return !strcmp(dev->rtnl_link_ops->kind, "openvswitch");
}
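
A slightly more defensive variant, sketched in user space. The demo_
structs are hypothetical stand-ins for net_device and rtnl_link_ops, and
the NULL checks rest on the assumption that some devices have no
rtnl_link_ops (or no kind string) set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel structures. */
struct demo_link_ops {
	const char *kind;
};

struct demo_net_device {
	const struct demo_link_ops *rtnl_link_ops;
};

/* Identify the device by its link kind string instead of a flag bit. */
static bool demo_is_ovs_master(const struct demo_net_device *dev)
{
	assert(dev != NULL);
	return dev->rtnl_link_ops &&
	       dev->rtnl_link_ops->kind &&
	       !strcmp(dev->rtnl_link_ops->kind, "openvswitch");
}
```

This trades a string compare per call for one saved bit in priv_flags,
which seems acceptable if the check is not on a hot path.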

> +
>  static inline bool netif_index_is_vrf(struct net *net, int ifindex)
>  {
> bool rc = false;
> diff --git a/net/openvswitch/vport-internal_dev.c b/net/openvswitch/vport-internal_dev.c
> index c058bbf..80b3e12 100644
> --- a/net/openvswitch/vport-internal_dev.c
> +++ b/net/openvswitch/vport-internal_dev.c
> @@ -135,7 +135,7 @@ static void do_setup(struct net_device *netdev)
> netdev->netdev_ops = &internal_dev_netdev_ops;
>
> netdev->priv_flags &= ~IFF_TX_SKB_SHARING;
> -   netdev->priv_flags |= IFF_LIVE_ADDR_CHANGE;
> +   netdev->priv_flags |= IFF_LIVE_ADDR_CHANGE | IFF_OPENVSWITCH;
> netdev->destructor = internal_dev_destructor;
> netdev->ethtool_ops = &internal_dev_ethtool_ops;
> netdev->rtnl_link_ops = &internal_dev_link_ops;
> --
> 1.9.3
>


Re: [patch net-next 2/3] mlxsw: expose EMAD transactions statistics via debugfs

2015-08-26 Thread David Miller
From: Jiri Pirko 
Date: Wed, 26 Aug 2015 09:37:57 +0200

> I don't think that are much more cases like this. Therefore I think that
> for this cases, debugfs might be a good way to expose debugging stats.

Scott wanted to do similar things in rocker.  DSA guys too.

Every switch device is going to have some kind of hierarchy like
this, it's not a unique situation.

So I don't buy the "this is a special situation" thing at all.


Re: [PATCH -next] smsc911x: Fix crash seen if neither ACPI nor OF is configured or used

2015-08-26 Thread Tony Lindgren
* Jeremy Linton  [150826 10:35]:
> On 08/26/2015 12:04 PM, Tony Lindgren wrote:
> >* Guenter Roeck  [150817 13:48]:
> >>Commit 0b50dc4fc971 ("Convert smsc911x to use ACPI as well as DT") makes
> >Looks like this change makes at least omap boards using smsc911x
> >fail with -22 for me in Linux next.
> >
> >Do any of the the device tree configured smsc911x devices actually
> >have a phy configured?
> 
> Tony,
> 
>   Looks like all the ones in the kernel boot/dts directory have a phy
> including the omap3-lilly except for the ste-snowball.dts.
> 
>   Do you have smsc,force-internal-phy set instead?

Hmm most of them are using omap-gpmc-smsc911x.dtsi and
omap-gpmc-smsc9221.dtsi which are set up the same way as
omap3-lilly. So no phy.

Regards,

Tony


Re: [PATCH] lib/Makefile: remove CONFIG_AVERAGE build rule

2015-08-26 Thread David Miller
From: Valentin Rothberg 
Date: Wed, 26 Aug 2015 15:36:12 +0200

> The Kconfig option AVERAGE and its implementation has been removed by
> commit f4e774f55fe0 ("average: remove out-of-line implementation").
> Remove the dead build rule in lib/Makefile.
> 
> Signed-off-by: Valentin Rothberg 
> Reviewed-by: Johannes Berg 

Applied, thanks.


Re: [patch net-next 6/6] rocker: use change upper info

2015-08-26 Thread Scott Feldman
On Wed, Aug 26, 2015 at 9:36 AM, Jiri Pirko  wrote:
> From: Jiri Pirko 
>
> Since now information about changed upper is passed along, benefit from
> that and use this info directly.
>
> This also fixes possible issues that could happen when non-master device
> is added (current code does not distinguish between master and non-master
> upper device).
>
> Signed-off-by: Jiri Pirko 

Acked-by: Scott Feldman 


Re: [patch net-next 5/6] rocker: use new helper to figure out master kind

2015-08-26 Thread Scott Feldman
On Wed, Aug 26, 2015 at 9:36 AM, Jiri Pirko  wrote:
> From: Jiri Pirko 
>
> Looking at rtnl kind string is kind of ugly. So use new helpers to do
> this in nicer way.
>
> Signed-off-by: Jiri Pirko 

Acked-by: Scott Feldman 


Re: [PATCH RFC 0/5] net: L2 only interfaces

2015-08-26 Thread Marcel Holtmann
Hi Florian,

>>>> This patch series implements a L2 only interface concept which
>>>> basically denies any kind of IP address configuration on these
>>>> interfaces, but still allows them to be used as configuration
>>>> end-points to keep using ethtool and friends.
>>>>
>>>> A cleaner approach might be to finally come up with the concept of
>>>> net_port which a net_device would be a superset of, but this still
>>>> raises tons of questions as to whether we should be modifying
>>>> userland tools to be able to configure/query these
>>>> interfaces. During all the switch talks/discussions last year, it
>>>> seemed to me like th L2-only interface is closest we have to a
>>>> "network port".
>>>>
>>>> Comments, flames, flying tomatoes welcome!
>>> 
>>> Interesting, indeed.
>>> 
>>> Do you plan to extend this to defining a more minimal network device
>>> sub-type as well?
>>> 
>>> Then we can pass "net_device_common" or whatever around as a common
>>> base type of actual net device "implementations".
>>> 
>>> Or is you main goal just getting the L2-only semantic?
>> 
>> the other end of this could be also an IP only net_device where we do not 
>> have ethtool semantics.
>> 
>> We do have a need for an IPv6-only net_device when utilizing ARPHRD_6LOWPAN
>> for 802.15.4 and Bluetooth LE. Skipping in_dev initialization there might be
>> an interesting step towards that. Not sure how entangled in_dev and
>> in6_dev still are. If it works for IFF_L2_ONLY, it might also work in the
>> other direction.
> 
> Just out of curiosity, is the aim for IPv6 only net_device to be denying
> any kind of IPv4 configuration/tools, or is it for performance purposes?

When you have 6LoWPAN, it would actually be good to forbid IPv4
configuration on these interfaces, since they have no mapping whatsoever.
Eventually it might allow us to decrease the size of the network stack for
embedded sensor devices.

> The IFF_L2_ONLY flag would probably need to mean something like
> (IFF_NO_IPV4 | IFF_NO_IPV6) such that you could decide which one of the
> two IP stacks you want to use, or none.

I think IFF_NO_IPV4 and IFF_NO_IPV6 instead of IFF_L2_ONLY sounds like a good 
idea.
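
A minimal userspace sketch of how such a flag pair could be tested, assuming made-up bit values. IFF_NO_IPV4 and IFF_NO_IPV6 are only proposed in this thread, and struct sketch_netdev and the sketch_* helpers are hypothetical names, not kernel API:

```c
#include <stdbool.h>

/* Hypothetical flag bits, chosen only for illustration. */
#define IFF_NO_IPV4 (1u << 0)
#define IFF_NO_IPV6 (1u << 1)
/* An L2-only interface would simply refuse both address families. */
#define IFF_L2_ONLY (IFF_NO_IPV4 | IFF_NO_IPV6)

/* Hypothetical stand-in for the relevant part of struct net_device. */
struct sketch_netdev {
	unsigned int priv_flags;
};

/* True if IPv4 configuration would be permitted on this device. */
bool sketch_ipv4_allowed(const struct sketch_netdev *dev)
{
	return !(dev->priv_flags & IFF_NO_IPV4);
}

/* True if IPv6 configuration would be permitted on this device. */
bool sketch_ipv6_allowed(const struct sketch_netdev *dev)
{
	return !(dev->priv_flags & IFF_NO_IPV6);
}

/* True only if both IP families are denied. */
bool sketch_is_l2_only(const struct sketch_netdev *dev)
{
	return (dev->priv_flags & IFF_L2_ONLY) == IFF_L2_ONLY;
}
```

With this split, a 6LoWPAN device would set only IFF_NO_IPV4, while a switch port would set both bits.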

Regards

Marcel



Re: [PATCH -next] smsc911x: Fix crash seen if neither ACPI nor OF is configured or used

2015-08-26 Thread Tony Lindgren
* Guenter Roeck  [150826 10:40]:
> Hi Tony,
> 
> On 08/26/2015 10:04 AM, Tony Lindgren wrote:
> >Hi,
> >
> >* Guenter Roeck  [150817 13:48]:
> >>Commit 0b50dc4fc971 ("Convert smsc911x to use ACPI as well as DT") makes
> >>the call to smsc911x_probe_config() unconditional, and no longer fails if
> >>there is no device node. device_get_phy_mode() is called unconditionally,
> >>and if there is no phy node configured returns an error code. This error
> >>code is assigned to phy_interface, and interpreted elsewhere in the code
> >>as valid phy mode. This in turn causes qemu to crash when running a
> >>variant of realview_pb_defconfig.
> >>
> >>qemu: hardware error: lan9118_read: Bad reg 0x86
> >>
> >>Fixes: 0b50dc4fc971 ("Convert smsc911x to use ACPI as well as DT")
> >>Cc: Jeremy Linton 
> >>Cc Graeme Gregory 
> >>Signed-off-by: Guenter Roeck 
> >>---
> >>  drivers/net/ethernet/smsc/smsc911x.c | 7 ++-
> >>  1 file changed, 6 insertions(+), 1 deletion(-)
> >>
> >>diff --git a/drivers/net/ethernet/smsc/smsc911x.c b/drivers/net/ethernet/smsc/smsc911x.c
> >>index 0f21aa3bb537..34f97684506b 100644
> >>--- a/drivers/net/ethernet/smsc/smsc911x.c
> >>+++ b/drivers/net/ethernet/smsc/smsc911x.c
> >>@@ -2367,12 +2367,17 @@ static const struct smsc911x_ops shifted_smsc911x_ops = {
> >>  static int smsc911x_probe_config(struct smsc911x_platform_config *config,
> >> struct device *dev)
> >>  {
> >>+   int phy_interface;
> >>u32 width = 0;
> >>
> >>if (!dev)
> >>return -ENODEV;
> >>
> >>-   config->phy_interface = device_get_phy_mode(dev);
> >>+   phy_interface = device_get_phy_mode(dev);
> >>+   if (phy_interface < 0)
> >>+   return phy_interface;
> >>+
> >>+   config->phy_interface = phy_interface;
> >>
> >>device_get_mac_address(dev, config->mac, ETH_ALEN);
> >
> >Looks like this change makes at least omap boards using smsc911x
> >fail with -22 for me in Linux next.
> >
> 
> What do you see if you revert my patch ? It should assign -22, or its
> unsigned representation, to phy_interface, which isn't such a good idea
> either.

If I revert patch "smsc911x: Fix crash seen if neither ACPI nor OF is
configured or used", things work, as we then just assign config->phy_interface
directly without returning early. The value is -22 in that case as well.
 
> >Do any of the device tree configured smsc911x devices actually
> >have a phy configured?
> >
> Good question, and beats me. Looking into the original code,
> it didn't check for an error return from of_get_phy_mode() either,
> and thus _would_ dutifully assign the error code to phy_interface.
> I wonder how this was supposed to work to start with.
> 
> I'll do some debugging and try to find out what exactly is going on.

Looks like adding "smsc,force-internal-phy" to omap-gpmc-smsc9221.dtsi
does not help. Somehow the default behavior is now different.

Regards,

Tony


Re: [patch net-next 0/6] rocker: make master change handling nicer

2015-08-26 Thread Scott Feldman
On Wed, Aug 26, 2015 at 9:36 AM, Jiri Pirko  wrote:
> From: Jiri Pirko 
>
> Jiri Pirko (6):
>   net: introduce change upper device notifier change info
>   net: add netif_is_bridge_master helper
>   net: add netif_is_ovs_master helper with IFF_OPENVSWITCH private flag
>   net: kill long time unused bonding private flags
>   rocker: use new helper to figure out master kind
>   rocker: use change upper info

Looks good Jiri, thanks for cleaning this up.  Only nit was about
conserving netdev->priv_flags.   Using rtnl_link_ops->kind to
determine netdev type scales better.


Re: [PATCH v2 net-next 0/5] act_bpf: remove spinlock in fast path

2015-08-26 Thread David Miller
From: Alexei Starovoitov 
Date: Tue, 25 Aug 2015 20:06:30 -0700

> v1 version had a race condition in cleanup path of bpf_prog.
> I tried to fix it by adding new callback 'cleanup_rcu' to 'struct tcf_common'
> and call it out of act_api cleanup path, but Daniel noticed
> (thanks for the idea!) that most of the classifiers already do action cleanup
> out of rcu callback.
> So instead this set of patches converts tcindex and rsvp classifiers to call
> tcf_exts_destroy() after rcu grace period and since action cleanup logic
> in __tcf_hash_release() is only called when bind and refcnt goes to zero,
> it's guaranteed that cleanup() callback is called from rcu callback.
> More specifically:
> patches 1 and 2 - simple fixes
> patches 3 and 4 - convert tcf_exts_destroy in tcindex and rsvp to call_rcu
> patch 5 - removes spin_lock from act_bpf
> 
> The cleanup of actions is now universally done after rcu grace period
> and in the future we can drop (now unnecessary) call_rcu from tcf_hash_destroy()
> patch 5 is using synchronize_rcu() in act_bpf replacement path, since it's
> very rare and alternative of dynamically allocating 'struct tcf_bpf_cfg' just
> to pass it to call_rcu looks even less appealing.

Series applied, thanks Alexei.


Re: [patch net-next 2/3] mlxsw: expose EMAD transactions statistics via debugfs

2015-08-26 Thread Scott Feldman
On Wed, Aug 26, 2015 at 10:49 AM, David Miller  wrote:
> From: Jiri Pirko 
> Date: Wed, 26 Aug 2015 09:37:57 +0200
>
>> I don't think that are much more cases like this. Therefore I think that
>> for this cases, debugfs might be a good way to expose debugging stats.
>
> Scott wanted to do similar things in rocker.  DSA guys too.
>
> Every switch device is going to have some kind of hierarchy like
> this, it's not a unique situation.

We've been able to get by so far without a user-visible device for
the switch.  The switch ports are represented by netdevs, so that's
easy.  How can we create an object for the switch itself, so we can
attach common interfaces for the user to dump switch-level stats or
tables?   Using another netdev doesn't seem right.  Do we need a new
device class for switches, and then create some common tool/interfaces
for switch device class?


[PATCHv6 net-next 03/10] ipv6: Export nf_ct_frag6_gather()

2015-08-26 Thread Joe Stringer
Signed-off-by: Joe Stringer 
Acked-by: Thomas Graf 
Acked-by: Pravin B Shelar 
---
v4: Add ack.
v5-v6: No change.
---
 net/ipv6/netfilter/nf_conntrack_reasm.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c b/net/ipv6/netfilter/nf_conntrack_reasm.c
index 6d02498..701cd2b 100644
--- a/net/ipv6/netfilter/nf_conntrack_reasm.c
+++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
@@ -633,6 +633,7 @@ ret_orig:
kfree_skb(clone);
return skb;
 }
+EXPORT_SYMBOL_GPL(nf_ct_frag6_gather);
 
 void nf_ct_frag6_consume_orig(struct sk_buff *skb)
 {
-- 
2.1.4



[PATCHv6 net-next 04/10] dst: Add __skb_dst_copy() variation

2015-08-26 Thread Joe Stringer
This variation on skb_dst_copy() doesn't require two skbs.

Signed-off-by: Joe Stringer 
Acked-by: Pravin B Shelar 
Acked-by: Thomas Graf 
---
v4: Add ack.
v5: No change.
v6: Add ack.
---
 include/net/dst.h | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/include/net/dst.h b/include/net/dst.h
index ef8f1d4..4c48016 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -289,13 +289,18 @@ static inline void skb_dst_drop(struct sk_buff *skb)
}
 }
 
-static inline void skb_dst_copy(struct sk_buff *nskb, const struct sk_buff *oskb)
+static inline void __skb_dst_copy(struct sk_buff *nskb, unsigned long refdst)
 {
-   nskb->_skb_refdst = oskb->_skb_refdst;
+   nskb->_skb_refdst = refdst;
if (!(nskb->_skb_refdst & SKB_DST_NOREF))
dst_clone(skb_dst(nskb));
 }
 
+static inline void skb_dst_copy(struct sk_buff *nskb, const struct sk_buff *oskb)
+{
+   __skb_dst_copy(nskb, oskb->_skb_refdst);
+}
+
 /**
  * skb_dst_force - makes sure skb dst is refcounted
  * @skb: buffer
-- 
2.1.4



[PATCHv6 net-next 08/10] netfilter: connlabels: Export setting connlabel length

2015-08-26 Thread Joe Stringer
Add functions to change connlabel length into nf_conntrack_labels.c so
they may be reused by other modules like OVS and nftables without
needing to jump through xt_match_check() hoops.

Suggested-by: Florian Westphal 
Signed-off-by: Joe Stringer 
Acked-by: Florian Westphal 
Acked-by: Thomas Graf 
---
v2: Protect connlabel modification with spinlock.
Fix reference leak in error case.
Style fixups.
v3: No change.
v4-v5: Add acks.
---
 include/net/netfilter/nf_conntrack_labels.h |  4 
 net/netfilter/nf_conntrack_labels.c | 32 +
 net/netfilter/xt_connlabel.c| 16 ---
 3 files changed, 40 insertions(+), 12 deletions(-)

diff --git a/include/net/netfilter/nf_conntrack_labels.h b/include/net/netfilter/nf_conntrack_labels.h
index dec6336..7e2b1d0 100644
--- a/include/net/netfilter/nf_conntrack_labels.h
+++ b/include/net/netfilter/nf_conntrack_labels.h
@@ -54,7 +54,11 @@ int nf_connlabels_replace(struct nf_conn *ct,
 #ifdef CONFIG_NF_CONNTRACK_LABELS
 int nf_conntrack_labels_init(void);
 void nf_conntrack_labels_fini(void);
+int nf_connlabels_get(struct net *net, unsigned int n_bits);
+void nf_connlabels_put(struct net *net);
 #else
 static inline int nf_conntrack_labels_init(void) { return 0; }
 static inline void nf_conntrack_labels_fini(void) {}
+static inline int nf_connlabels_get(struct net *net, unsigned int n_bits) { return 0; }
+static inline void nf_connlabels_put(struct net *net) {}
 #endif
diff --git a/net/netfilter/nf_conntrack_labels.c b/net/netfilter/nf_conntrack_labels.c
index daa7c13..3ce5c31 100644
--- a/net/netfilter/nf_conntrack_labels.c
+++ b/net/netfilter/nf_conntrack_labels.c
@@ -14,6 +14,8 @@
 #include 
 #include 
 
+static spinlock_t nf_connlabels_lock;
+
 static unsigned int label_bits(const struct nf_conn_labels *l)
 {
unsigned int longs = l->words;
@@ -89,6 +91,35 @@ int nf_connlabels_replace(struct nf_conn *ct,
 }
 EXPORT_SYMBOL_GPL(nf_connlabels_replace);
 
+int nf_connlabels_get(struct net *net, unsigned int n_bits)
+{
+   size_t words;
+
+   if (n_bits > (NF_CT_LABELS_MAX_SIZE * BITS_PER_BYTE))
+   return -ERANGE;
+
+   words = BITS_TO_LONGS(n_bits);
+
+   spin_lock(&nf_connlabels_lock);
+   net->ct.labels_used++;
+   if (words > net->ct.label_words)
+   net->ct.label_words = words;
+   spin_unlock(&nf_connlabels_lock);
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(nf_connlabels_get);
+
+void nf_connlabels_put(struct net *net)
+{
+   spin_lock(&nf_connlabels_lock);
+   net->ct.labels_used--;
+   if (net->ct.labels_used == 0)
+   net->ct.label_words = 0;
+   spin_unlock(&nf_connlabels_lock);
+}
+EXPORT_SYMBOL_GPL(nf_connlabels_put);
+
 static struct nf_ct_ext_type labels_extend __read_mostly = {
.len= sizeof(struct nf_conn_labels),
.align  = __alignof__(struct nf_conn_labels),
@@ -97,6 +128,7 @@ static struct nf_ct_ext_type labels_extend __read_mostly = {
 
 int nf_conntrack_labels_init(void)
 {
+   spin_lock_init(&nf_connlabels_lock);
return nf_ct_extend_register(&labels_extend);
 }
 
diff --git a/net/netfilter/xt_connlabel.c b/net/netfilter/xt_connlabel.c
index 9f8719d..bb9cbeb 100644
--- a/net/netfilter/xt_connlabel.c
+++ b/net/netfilter/xt_connlabel.c
@@ -42,10 +42,6 @@ static int connlabel_mt_check(const struct xt_mtchk_param *par)
XT_CONNLABEL_OP_SET;
struct xt_connlabel_mtinfo *info = par->matchinfo;
int ret;
-   size_t words;
-
-   if (info->bit > XT_CONNLABEL_MAXBIT)
-   return -ERANGE;
 
if (info->options & ~options) {
pr_err("Unknown options in mask %x\n", info->options);
@@ -59,19 +55,15 @@ static int connlabel_mt_check(const struct xt_mtchk_param *par)
return ret;
}
 
-   par->net->ct.labels_used++;
-   words = BITS_TO_LONGS(info->bit+1);
-   if (words > par->net->ct.label_words)
-   par->net->ct.label_words = words;
-
+   ret = nf_connlabels_get(par->net, info->bit + 1);
+   if (ret < 0)
+   nf_ct_l3proto_module_put(par->family);
return ret;
 }
 
 static void connlabel_mt_destroy(const struct xt_mtdtor_param *par)
 {
-   par->net->ct.labels_used--;
-   if (par->net->ct.labels_used == 0)
-   par->net->ct.label_words = 0;
+   nf_connlabels_put(par->net);
nf_ct_l3proto_module_put(par->family);
 }
 
-- 
2.1.4



Re: [PATCH -next] smsc911x: Fix crash seen if neither ACPI nor OF is configured or used

2015-08-26 Thread Guenter Roeck

On 08/26/2015 10:04 AM, Tony Lindgren wrote:
>Hi,
>
>* Guenter Roeck  [150817 13:48]:
>>Commit 0b50dc4fc971 ("Convert smsc911x to use ACPI as well as DT") makes
>>the call to smsc911x_probe_config() unconditional, and no longer fails if
>>there is no device node. device_get_phy_mode() is called unconditionally,
>>and if there is no phy node configured returns an error code. This error
>>code is assigned to phy_interface, and interpreted elsewhere in the code
>>as valid phy mode. This in turn causes qemu to crash when running a
>>variant of realview_pb_defconfig.
>>
>>qemu: hardware error: lan9118_read: Bad reg 0x86
>>
>>Fixes: 0b50dc4fc971 ("Convert smsc911x to use ACPI as well as DT")
>>Cc: Jeremy Linton 
>>Cc Graeme Gregory 
>>Signed-off-by: Guenter Roeck 
>>---
>>  drivers/net/ethernet/smsc/smsc911x.c | 7 ++-
>>  1 file changed, 6 insertions(+), 1 deletion(-)
>>
>>diff --git a/drivers/net/ethernet/smsc/smsc911x.c b/drivers/net/ethernet/smsc/smsc911x.c
>>index 0f21aa3bb537..34f97684506b 100644
>>--- a/drivers/net/ethernet/smsc/smsc911x.c
>>+++ b/drivers/net/ethernet/smsc/smsc911x.c
>>@@ -2367,12 +2367,17 @@ static const struct smsc911x_ops shifted_smsc911x_ops = {
>>  static int smsc911x_probe_config(struct smsc911x_platform_config *config,
>> struct device *dev)
>>  {
>>+   int phy_interface;
>>u32 width = 0;
>>
>>if (!dev)
>>return -ENODEV;
>>
>>-   config->phy_interface = device_get_phy_mode(dev);
>>+   phy_interface = device_get_phy_mode(dev);
>>+   if (phy_interface < 0)
>>+   return phy_interface;
>>+
>>+   config->phy_interface = phy_interface;
>>
>>device_get_mac_address(dev, config->mac, ETH_ALEN);
>
>Looks like this change makes at least omap boards using smsc911x
>fail with -22 for me in Linux next.
>
>Do any of the device tree configured smsc911x devices actually
>have a phy configured?



Ok, this is more subtle than I thought.

Previously, the code would not attempt any devicetree configuration
if devicetree was not configured.

Now it does.

The error return from device_get_phy_mode() isn't the actual problem.
Apparently it doesn't really matter if a nonsensical value is assigned
to phy_interface.

The problem is that the reg-io-width property is obviously not present
in the non-dt and non-acpi case. This overwrites the existing platform data
configuration and selects 16 bit mode, to which the (simulated) hardware
obviously reacts less than enthusiastically.

Fixing this properly won't be easy. If the "reg-io-width" property
is not present or wrong, the default register width is 16 bit. Obviously,
if neither DT nor ACPI is available, it won't be present. This causes
the crash I had observed.

The bad part is that there does not seem to be a reliable means to detect
that platform data should be used in that situation. Other device_get_XXX
functions return -ENXIO if that happens, but not device_property_read_u32().
It is _supposed_ to return it per its API, but it doesn't (it returns
-ENODATA).

We may need two separate patches, one to fix up device_property_read_u32()
to return -ENXIO, and one to fix smsc911x_probe_config() to ignore the error
from device_get_phy_mode(), and to bail out if device_property_read_u32()
returns -ENXIO.

The simpler alternative would be to check the return value from
device_property_read_u32() for both -ENXIO and -ENODATA.
This would make the code independent of the necessary core changes
(which may take a while). I tested this variant, and it works, at least
for the non-DT case.
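
A rough userspace sketch of that simpler alternative, for illustration only. struct sketch_config and sketch_apply_reg_io_width() are hypothetical names rather than driver code. The 'err' argument stands in for the return value of device_property_read_u32(), and the sketch assumes that "reg-io-width" is given in bytes, so that 4 means 32-bit access:

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the register-width part of the driver config. */
enum sketch_width { SKETCH_WIDTH_16BIT, SKETCH_WIDTH_32BIT };

struct sketch_config {
	enum sketch_width width;
};

/*
 * 'err' and 'value' mimic the result of reading "reg-io-width":
 * err is 0 on success or a negative errno.
 * Returns 1 if the existing platform data was left alone, 0 if 'config'
 * was updated from the property.
 */
int sketch_apply_reg_io_width(struct sketch_config *config, int err,
			      uint32_t value)
{
	/* No usable firmware description: keep the platform data as is. */
	if (err == -ENXIO || err == -ENODATA)
		return 1;

	/* Assumed: 4 selects 32 bit, anything else falls back to 16 bit. */
	if (err == 0 && value == 4)
		config->width = SKETCH_WIDTH_32BIT;
	else
		config->width = SKETCH_WIDTH_16BIT;
	return 0;
}
```

Checking both error codes keeps the driver independent of whether the core is later changed to return -ENXIO.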

Does this make sense ?

Thanks,
Guenter



[PATCHv6 net-next 09/10] openvswitch: Allow matching on conntrack label

2015-08-26 Thread Joe Stringer
Allow matching and setting the ct_label field. As with ct_mark, this is
populated by executing the CT action. The label field may be modified by
specifying a label and mask nested under the CT action. It is stored as
metadata attached to the connection. Label modification occurs after
lookup, and will only persist when the conntrack entry is committed by
providing the COMMIT flag to the CT action. Labels are currently fixed
to 128 bits in size.
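
The masked label write described above can be sketched in plain C as follows. This is an illustration under assumptions, not the code from the patch: sketch_label_masked_write() and SKETCH_CT_LABEL_LEN are hypothetical names, with the length mirroring the 16-byte OVS_CT_LABEL_LEN:

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_CT_LABEL_LEN 16	/* 128 bits, mirroring OVS_CT_LABEL_LEN */

/*
 * For each bit set in 'mask', copy the corresponding bit of 'value' into
 * 'label'. Bits that are clear in 'mask' keep their previous state.
 */
void sketch_label_masked_write(uint8_t label[SKETCH_CT_LABEL_LEN],
			       const uint8_t value[SKETCH_CT_LABEL_LEN],
			       const uint8_t mask[SKETCH_CT_LABEL_LEN])
{
	for (size_t i = 0; i < SKETCH_CT_LABEL_LEN; i++)
		label[i] = (uint8_t)((label[i] & ~mask[i]) |
				     (value[i] & mask[i]));
}
```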

Signed-off-by: Joe Stringer 
Acked-by: Thomas Graf 
Acked-by: Pravin B Shelar 
---
v2: Split out setting the connlabel size for the current namespace.
v3: No change.
v4: Only allow setting label via ct action.
Update documentation.
v5: Fix ovs_ct_verify().
Add label to ct action serialization.
Free label bit length/reference properly.
Configure OVS label length per-netns, not per-dp.
Reject ct actions with label length longer than supported.
Replace some #ifdefs with IS_ENABLED.
Rebase.
v6: Add acks.
---
 include/uapi/linux/openvswitch.h |  10 +++
 net/openvswitch/actions.c|   1 +
 net/openvswitch/conntrack.c  | 128 ++-
 net/openvswitch/conntrack.h  |  11 +++-
 net/openvswitch/datapath.c   |  18 +++---
 net/openvswitch/datapath.h   |   3 +
 net/openvswitch/flow.c   |   4 +-
 net/openvswitch/flow.h   |   3 +-
 net/openvswitch/flow_netlink.c   |  46 +-
 net/openvswitch/flow_netlink.h   |   9 +--
 10 files changed, 199 insertions(+), 34 deletions(-)

diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index 7a185b5..9d52058 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -326,6 +326,7 @@ enum ovs_key_attr {
OVS_KEY_ATTR_CT_STATE,  /* u8 bitmask of OVS_CS_F_* */
OVS_KEY_ATTR_CT_ZONE,   /* u16 connection tracking zone. */
OVS_KEY_ATTR_CT_MARK,   /* u32 connection tracking mark */
+   OVS_KEY_ATTR_CT_LABEL,  /* 16-octet connection tracking label */
 
 #ifdef __KERNEL__
OVS_KEY_ATTR_TUNNEL_INFO,  /* struct ip_tunnel_info */
@@ -438,6 +439,11 @@ struct ovs_key_nd {
__u8nd_tll[ETH_ALEN];
 };
 
+#define OVS_CT_LABEL_LEN   16
+struct ovs_key_ct_label {
+   __u8ct_label[OVS_CT_LABEL_LEN];
+};
+
 /* OVS_KEY_ATTR_CT_STATE flags */
 #define OVS_CS_F_NEW   0x01 /* Beginning of a new connection. */
 #define OVS_CS_F_ESTABLISHED   0x02 /* Part of an existing connection. */
@@ -617,12 +623,16 @@ struct ovs_action_hash {
  * @OVS_CT_ATTR_MARK: u32 value followed by u32 mask. For each bit set in the
  * mask, the corresponding bit in the value is copied to the connection
  * tracking mark field in the connection.
+ * @OVS_CT_ATTR_LABEL: %OVS_CT_LABEL_LEN value followed by %OVS_CT_LABEL_LEN
+ * mask. For each bit set in the mask, the corresponding bit in the value is
+ * copied to the connection tracking label field in the connection.
  */
 enum ovs_ct_attr {
OVS_CT_ATTR_UNSPEC,
OVS_CT_ATTR_FLAGS,  /* u8 bitmask of OVS_CT_F_*. */
OVS_CT_ATTR_ZONE,   /* u16 zone id. */
OVS_CT_ATTR_MARK,   /* mark to associate with this connection. */
+   OVS_CT_ATTR_LABEL,  /* label to associate with this connection. */
__OVS_CT_ATTR_MAX
 };
 
diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index 9741d2c..736a113 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -969,6 +969,7 @@ static int execute_masked_set_action(struct sk_buff *skb,
case OVS_KEY_ATTR_CT_STATE:
case OVS_KEY_ATTR_CT_ZONE:
case OVS_KEY_ATTR_CT_MARK:
+   case OVS_KEY_ATTR_CT_LABEL:
err = -EINVAL;
break;
}
diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index e53dafc..a0417fb 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -15,6 +15,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -34,6 +35,12 @@ struct md_mark {
u32 mask;
 };
 
+/* Metadata label for masked write to conntrack label. */
+struct md_label {
+   struct ovs_key_ct_label value;
+   struct ovs_key_ct_label mask;
+};
+
 /* Conntrack action context for execution. */
 struct ovs_conntrack_info {
struct nf_conntrack_zone zone;
@@ -41,6 +48,7 @@ struct ovs_conntrack_info {
u32 flags;
u16 family;
struct md_mark mark;
+   struct md_label label;
 };
 
 static u16 key_to_nfproto(const struct sw_flow_key *key)
@@ -90,6 +98,24 @@ static u8 ovs_ct_get_state(enum ip_conntrack_info ctinfo)
return ct_state;
 }
 
+static void ovs_ct_get_label(const struct nf_conn *ct,
+struct ovs_key_ct_label *label)
+{
+   struct nf_conn_labels *cl = ct ? nf_ct_labels_find(ct) : NULL;
+
+   if (cl) {
+   size_t len = cl->words * sizeof(long);
+
+   if (len > OVS_CT_LABEL_LEN)
+ 

[PATCHv6 net-next 07/10] netfilter: Always export nf_connlabels_replace()

2015-08-26 Thread Joe Stringer
The following patches will reuse this code from OVS.

Signed-off-by: Joe Stringer 
Acked-by: Pravin B Shelar 
Acked-by: Thomas Graf 
---
v2-v4: No change.
v5: Add acks.
v6: No change.
---
 net/netfilter/nf_conntrack_labels.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/net/netfilter/nf_conntrack_labels.c b/net/netfilter/nf_conntrack_labels.c
index bb53f12..daa7c13 100644
--- a/net/netfilter/nf_conntrack_labels.c
+++ b/net/netfilter/nf_conntrack_labels.c
@@ -48,7 +48,6 @@ int nf_connlabel_set(struct nf_conn *ct, u16 bit)
 }
 EXPORT_SYMBOL_GPL(nf_connlabel_set);
 
-#if IS_ENABLED(CONFIG_NF_CT_NETLINK)
 static void replace_u32(u32 *address, u32 mask, u32 new)
 {
u32 old, tmp;
@@ -89,7 +88,6 @@ int nf_connlabels_replace(struct nf_conn *ct,
return 0;
 }
 EXPORT_SYMBOL_GPL(nf_connlabels_replace);
-#endif
 
 static struct nf_ct_ext_type labels_extend __read_mostly = {
.len= sizeof(struct nf_conn_labels),
-- 
2.1.4



[PATCHv6 net-next 10/10] openvswitch: Allow attaching helpers to ct action

2015-08-26 Thread Joe Stringer
Add support for using conntrack helpers to assist protocol detection.
The new OVS_CT_ATTR_HELPER attribute of the CT action specifies a helper
to be used for this connection. If no helper is specified, then helpers
will be automatically applied as per the sysctl configuration of
net.netfilter.nf_conntrack_helper.

The helper may be specified as part of the conntrack action, eg:
ct(helper=ftp). Initial packets for related connections should be
committed to allow later packets for the flow to be considered
established.

Example ovs-ofctl flows allowing FTP connections from ports 1->2:
in_port=1,tcp,action=ct(helper=ftp,commit),2
in_port=2,tcp,ct_state=-trk,action=ct(recirc)
in_port=2,tcp,ct_state=+trk-new+est,action=1
in_port=2,tcp,ct_state=+trk+rel,action=1

Signed-off-by: Joe Stringer 
Acked-by: Thomas Graf 
Acked-by: Pravin B Shelar 
---
v2-v3: No change.
v4: Change error code for unknown helper ENOENT->EINVAL.
v5: Fix rcu access of helpers.
Rebase.
v6: Add acks.
---
 include/uapi/linux/openvswitch.h |   3 ++
 net/openvswitch/conntrack.c  | 109 ++-
 2 files changed, 110 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index 9d52058..32e07d8 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -626,6 +626,7 @@ struct ovs_action_hash {
  * @OVS_CT_ATTR_LABEL: %OVS_CT_LABEL_LEN value followed by %OVS_CT_LABEL_LEN
  * mask. For each bit set in the mask, the corresponding bit in the value is
  * copied to the connection tracking label field in the connection.
+ * @OVS_CT_ATTR_HELPER: variable length string defining conntrack ALG.
  */
 enum ovs_ct_attr {
OVS_CT_ATTR_UNSPEC,
@@ -633,6 +634,8 @@ enum ovs_ct_attr {
OVS_CT_ATTR_ZONE,   /* u16 zone id. */
OVS_CT_ATTR_MARK,   /* mark to associate with this connection. */
OVS_CT_ATTR_LABEL,  /* label to associate with this connection. */
+   OVS_CT_ATTR_HELPER, /* netlink helper to assist detection of
+  related connections. */
__OVS_CT_ATTR_MAX
 };
 
diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index a0417fb..890d3ee 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -15,6 +15,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -43,6 +44,7 @@ struct md_label {
 
 /* Conntrack action context for execution. */
 struct ovs_conntrack_info {
+   struct nf_conntrack_helper *helper;
struct nf_conntrack_zone zone;
struct nf_conn *ct;
u32 flags;
@@ -234,6 +236,51 @@ static int ovs_ct_set_label(struct sk_buff *skb, struct sw_flow_key *key,
return 0;
 }
 
+/* 'skb' should already be pulled to nh_ofs. */
+static int ovs_ct_helper(struct sk_buff *skb, u16 proto)
+{
+   const struct nf_conntrack_helper *helper;
+   const struct nf_conn_help *help;
+   enum ip_conntrack_info ctinfo;
+   unsigned int protoff;
+   struct nf_conn *ct;
+
+   ct = nf_ct_get(skb, &ctinfo);
+   if (!ct || ctinfo == IP_CT_RELATED_REPLY)
+   return NF_ACCEPT;
+
+   help = nfct_help(ct);
+   if (!help)
+   return NF_ACCEPT;
+
+   helper = rcu_dereference(help->helper);
+   if (!helper)
+   return NF_ACCEPT;
+
+   switch (proto) {
+   case NFPROTO_IPV4:
+   protoff = ip_hdrlen(skb);
+   break;
+   case NFPROTO_IPV6: {
+   u8 nexthdr = ipv6_hdr(skb)->nexthdr;
+   __be16 frag_off;
+
+   protoff = ipv6_skip_exthdr(skb, sizeof(struct ipv6hdr),
+  &nexthdr, &frag_off);
+   if (protoff < 0 || (frag_off & htons(~0x7)) != 0) {
+   pr_debug("proto header not found\n");
+   return NF_ACCEPT;
+   }
+   break;
+   }
+   default:
+   WARN_ONCE(1, "helper invoked on non-IP family!");
+   return NF_DROP;
+   }
+
+   return helper->help(skb, protoff, ct, ctinfo);
+}
+
 static int handle_fragments(struct net *net, struct sw_flow_key *key,
u16 zone, struct sk_buff *skb)
 {
@@ -306,6 +353,13 @@ static bool skb_nfct_cached(const struct net *net, const struct sk_buff *skb,
return false;
if (!nf_ct_zone_equal_any(info->ct, nf_ct_zone(ct)))
return false;
+   if (info->helper) {
+   struct nf_conn_help *help;
+
+   help = nf_ct_ext_find(ct, NF_CT_EXT_HELPER);
+   if (help && rcu_access_pointer(help->helper) != info->helper)
+   return false;
+   }
 
return true;
 }
@@ -334,6 +388,11 @@ static int __ovs_ct_lookup(struct net *net, const struct sw_flow_key *key,
if (nf_conntrack_in(net, info->family, NF_INET_PRE_ROUTING,
  

[PATCHv6 net-next 06/10] openvswitch: Allow matching on conntrack mark

2015-08-26 Thread Joe Stringer
Allow matching and setting the ct_mark field. As with ct_state and
ct_zone, these fields are populated when the CT action is executed. To
write to this field, a value and mask can be specified as a nested
attribute under the CT action. This data is stored with the conntrack
entry, and is executed after the lookup occurs for the CT action. The
conntrack entry itself must be committed using the COMMIT flag in the CT
action flags for this change to persist.
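
For the 32-bit mark, the value/mask semantics reduce to a single expression. A minimal sketch, assuming a hypothetical helper sketch_mark_masked_write() that is not part of the patch:

```c
#include <stdint.h>

/*
 * Return the mark that results from copying the bits of 'value' selected
 * by 'mask' into 'old_mark'. Unselected bits are preserved.
 */
uint32_t sketch_mark_masked_write(uint32_t old_mark, uint32_t value,
				  uint32_t mask)
{
	return (old_mark & ~mask) | (value & mask);
}
```

A mask of 0 leaves the mark untouched, and a mask of all ones replaces it with the value.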

Signed-off-by: Justin Pettit 
Signed-off-by: Joe Stringer 
Acked-by: Thomas Graf 
Acked-by: Pravin B Shelar 
---
v1-v3: No change.
v4: Only allow setting conntrack mark via ct action.
Documentation tweaks.
v5: Rebase against conntrack zone changes.
Add ct_mark to ct action serialization
Replace some #ifdefs with IS_ENABLED.
v6: Add acks.
---
 include/uapi/linux/openvswitch.h |  5 +++
 net/openvswitch/actions.c|  1 +
 net/openvswitch/conntrack.c  | 67 ++--
 net/openvswitch/conntrack.h  |  1 +
 net/openvswitch/flow.h   |  1 +
 net/openvswitch/flow_netlink.c   | 12 ++-
 6 files changed, 83 insertions(+), 4 deletions(-)

diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index 55f5997..7a185b5 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -325,6 +325,7 @@ enum ovs_key_attr {
 * the accepted length of the array. */
OVS_KEY_ATTR_CT_STATE,  /* u8 bitmask of OVS_CS_F_* */
OVS_KEY_ATTR_CT_ZONE,   /* u16 connection tracking zone. */
+   OVS_KEY_ATTR_CT_MARK,   /* u32 connection tracking mark */
 
 #ifdef __KERNEL__
OVS_KEY_ATTR_TUNNEL_INFO,  /* struct ip_tunnel_info */
@@ -613,11 +614,15 @@ struct ovs_action_hash {
  * enum ovs_ct_attr - Attributes for %OVS_ACTION_ATTR_CT action.
  * @OVS_CT_ATTR_FLAGS: u32 connection tracking flags.
  * @OVS_CT_ATTR_ZONE: u16 connection tracking zone.
+ * @OVS_CT_ATTR_MARK: u32 value followed by u32 mask. For each bit set in the
+ * mask, the corresponding bit in the value is copied to the connection
+ * tracking mark field in the connection.
  */
 enum ovs_ct_attr {
OVS_CT_ATTR_UNSPEC,
OVS_CT_ATTR_FLAGS,  /* u8 bitmask of OVS_CT_F_*. */
OVS_CT_ATTR_ZONE,   /* u16 zone id. */
+   OVS_CT_ATTR_MARK,   /* mark to associate with this connection. */
__OVS_CT_ATTR_MAX
 };
 
diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index 72ca2c4..9741d2c 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -968,6 +968,7 @@ static int execute_masked_set_action(struct sk_buff *skb,
 
case OVS_KEY_ATTR_CT_STATE:
case OVS_KEY_ATTR_CT_ZONE:
+   case OVS_KEY_ATTR_CT_MARK:
err = -EINVAL;
break;
}
diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 1189fd5..e53dafc 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -28,12 +28,19 @@ struct ovs_ct_len_tbl {
size_t minlen;
 };
 
+/* Metadata mark for masked write to conntrack mark */
+struct md_mark {
+   u32 value;
+   u32 mask;
+};
+
 /* Conntrack action context for execution. */
 struct ovs_conntrack_info {
struct nf_conntrack_zone zone;
struct nf_conn *ct;
u32 flags;
u16 family;
+   struct md_mark mark;
 };
 
 static u16 key_to_nfproto(const struct sw_flow_key *key)
@@ -84,10 +91,12 @@ static u8 ovs_ct_get_state(enum ip_conntrack_info ctinfo)
 }
 
 static void __ovs_ct_update_key(struct sw_flow_key *key, u8 state,
-   const struct nf_conntrack_zone *zone)
+   const struct nf_conntrack_zone *zone,
+   const struct nf_conn *ct)
 {
key->ct.state = state;
key->ct.zone = zone->id;
+   key->ct.mark = ct ? ct->mark : 0;
 }
 
 /* Update 'key' based on skb->nfct. If 'post_ct' is true, then OVS has
@@ -110,7 +119,7 @@ static void ovs_ct_update_key(const struct sk_buff *skb,
} else if (post_ct) {
state = OVS_CS_F_TRACKED | OVS_CS_F_INVALID;
}
-   __ovs_ct_update_key(key, state, zone);
+   __ovs_ct_update_key(key, state, zone, ct);
 }
 
 void ovs_ct_fill_key(const struct sk_buff *skb, struct sw_flow_key *key)
@@ -127,6 +136,35 @@ int ovs_ct_put_key(const struct sw_flow_key *key, struct sk_buff *skb)
nla_put_u16(skb, OVS_KEY_ATTR_CT_ZONE, key->ct.zone))
return -EMSGSIZE;
 
+   if (IS_ENABLED(CONFIG_NF_CONNTRACK_MARK) &&
+   nla_put_u32(skb, OVS_KEY_ATTR_CT_MARK, key->ct.mark))
+   return -EMSGSIZE;
+
+   return 0;
+}
+
+static int ovs_ct_set_mark(struct sk_buff *skb, struct sw_flow_key *key,
+  u32 ct_mark, u32 mask)
+{
+   enum ip_conntrack_info ctinfo;
+   struct nf_conn *ct;
+   u32 new_mark;
+
+   if (!IS_ENABLED(CONFIG_NF_CO

[PATCHv6 net-next 05/10] openvswitch: Add conntrack action

2015-08-26 Thread Joe Stringer
Expose the kernel connection tracker via OVS. Userspace components can
make use of the CT action to populate the connection state (ct_state)
field for a flow. This state can be subsequently matched.

Exposed connection states are OVS_CS_F_*:
- NEW (0x01) - Beginning of a new connection.
- ESTABLISHED (0x02) - Part of an existing connection.
- RELATED (0x04) - Related to an established connection.
- INVALID (0x20) - Could not track the connection for this packet.
- REPLY_DIR (0x40) - This packet is in the reply direction for the flow.
- TRACKED (0x80) - This packet has been sent through conntrack.

When the CT action is executed by itself, it will send the packet
through the connection tracker and populate the ct_state field with one
or more of the connection state flags above. The CT action will always
set the TRACKED bit.
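
As a rough illustration of how the ct_state bits listed above combine, here
is a minimal userspace sketch. The flag values are the ones given in this
commit message; the helper name ct_state_describe() is hypothetical and is
not part of the patch.

```c
#include <stdint.h>
#include <string.h>

/* Flag values as listed in the commit message above. */
#define OVS_CS_F_NEW         0x01
#define OVS_CS_F_ESTABLISHED 0x02
#define OVS_CS_F_RELATED     0x04
#define OVS_CS_F_INVALID     0x20
#define OVS_CS_F_REPLY_DIR   0x40
#define OVS_CS_F_TRACKED     0x80

/* Hypothetical helper: write a '|'-separated list of the flag names that
 * are set in 'state' into 'buf' and return 'buf'.
 */
static const char *ct_state_describe(uint8_t state, char *buf, size_t len)
{
	static const struct {
		uint8_t bit;
		const char *name;
	} names[] = {
		{ OVS_CS_F_NEW, "new" },
		{ OVS_CS_F_ESTABLISHED, "established" },
		{ OVS_CS_F_RELATED, "related" },
		{ OVS_CS_F_INVALID, "invalid" },
		{ OVS_CS_F_REPLY_DIR, "reply_dir" },
		{ OVS_CS_F_TRACKED, "tracked" },
	};
	size_t i;

	if (len == 0)
		return buf;
	buf[0] = '\0';
	for (i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
		if (!(state & names[i].bit))
			continue;
		if (buf[0] != '\0')
			strncat(buf, "|", len - strlen(buf) - 1);
		strncat(buf, names[i].name, len - strlen(buf) - 1);
	}
	return buf;
}
```

A reply packet on an established connection would carry
TRACKED | ESTABLISHED | REPLY_DIR, which this helper renders as
"established|reply_dir|tracked".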

When the COMMIT flag is passed to the conntrack action, this specifies
that information about the connection should be stored. This allows
subsequent packets for the same (or related) connections to be
correlated with this connection. Sending subsequent packets for the
connection through conntrack allows the connection tracker to consider
the packets as ESTABLISHED, RELATED, and/or REPLY_DIR.

The CT action may optionally take a zone to track the flow within. This
allows connections with the same 5-tuple to be kept logically separate
from connections in other zones. If the zone is specified, then the
"ct_zone" match field will be subsequently populated with the zone id.

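The zone behaviour described above can be pictured as the zone id being
part of the connection lookup key. The sketch below is only an
illustration under that assumption; struct tuple5, struct zoned_key and
zoned_key_equal() are hypothetical names and do not reflect how the
kernel connection tracker stores its entries.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 5-tuple, for illustration only. */
struct tuple5 {
	uint32_t src_ip;
	uint32_t dst_ip;
	uint16_t src_port;
	uint16_t dst_port;
	uint8_t proto;
};

/* Hypothetical lookup key: the zone id plus the 5-tuple. */
struct zoned_key {
	uint16_t zone;
	struct tuple5 tuple;
};

static bool tuple5_equal(const struct tuple5 *a, const struct tuple5 *b)
{
	return a->src_ip == b->src_ip && a->dst_ip == b->dst_ip &&
	       a->src_port == b->src_port && a->dst_port == b->dst_port &&
	       a->proto == b->proto;
}

/* Two keys refer to the same connection only if the zones match too. */
static bool zoned_key_equal(const struct zoned_key *a,
			    const struct zoned_key *b)
{
	return a->zone == b->zone && tuple5_equal(&a->tuple, &b->tuple);
}
```

With this shape, an identical 5-tuple seen in zone 1 and in zone 2 compares
as two distinct connections.
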
IP fragments are handled by transparently assembling them as part of the
CT action. The maximum received unit (MRU) size is tracked so that
refragmentation can occur during output.

IP frag handling contributed by Andy Zhou.

Signed-off-by: Joe Stringer 
Signed-off-by: Justin Pettit 
Signed-off-by: Andy Zhou 
Acked-by: Thomas Graf 
---
This can be tested with the corresponding userspace component here:
https://www.github.com/justinpettit/openvswitch conntrack

v2: Don't take references to devs or dsts in output path.
Shift ovs_ct_init()/ovs_ct_exit() into this patch
Handle output case where flow key is invalidated
Store the entire L2 header to apply to fragments
Various minor simplifications
Improve comments/logs
Style fixes
Rebase
v3: Clone dst in output, free final dst reference properly.
Handle CHECKSUM_COMPLETE after fragmentation
Restore L2 skb metadata after fragmentation
Make MRU types more consistent
Better cleanup in error paths
Fix sparse warnings
v4: Reject set_field actions for ct_state,ct_zone
Combine key->ct update from skb->nfct into a single function.
Minor documentation tweaks.
Simplify some codepaths.
v5: Fix ovs_ct_verify().
Don't take references on nf_conntrack_ipv[46]
Replace some #ifdefs with IS_ENABLED.
Remove unused functions.
Rebase.
v6: Make ovs_ct_fill_key() static inline for disabled ovs-ct case.
Conditionally serialize OVS_KEY_ATTR_CT_* fields.
Add ack.
---
 include/uapi/linux/openvswitch.h |  40 
 net/openvswitch/Kconfig  |  11 +
 net/openvswitch/Makefile |   2 +
 net/openvswitch/actions.c| 175 ++-
 net/openvswitch/conntrack.c  | 454 +++
 net/openvswitch/conntrack.h  |  78 +++
 net/openvswitch/datapath.c   |  66 --
 net/openvswitch/datapath.h   |   6 +
 net/openvswitch/flow.c   |   2 +
 net/openvswitch/flow.h   |   6 +
 net/openvswitch/flow_netlink.c   |  69 --
 net/openvswitch/flow_netlink.h   |   4 +-
 net/openvswitch/vport.c  |   1 +
 13 files changed, 877 insertions(+), 37 deletions(-)
 create mode 100644 net/openvswitch/conntrack.c
 create mode 100644 net/openvswitch/conntrack.h

diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index d6b8854..55f5997 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -164,6 +164,9 @@ enum ovs_packet_cmd {
  * %OVS_USERSPACE_ATTR_EGRESS_TUN_PORT attribute, which is sent only if the
  * output port is actually a tunnel port. Contains the output tunnel key
  * extracted from the packet as nested %OVS_TUNNEL_KEY_ATTR_* attributes.
+ * @OVS_PACKET_ATTR_MRU: Present for an %OVS_PACKET_CMD_ACTION and
+ * %OVS_PACKET_ATTR_USERSPACE action specify the Maximum received fragment
+ * size.
  *
  * These attributes follow the &struct ovs_header within the Generic Netlink
  * payload for %OVS_PACKET_* commands.
@@ -180,6 +183,7 @@ enum ovs_packet_attr {
OVS_PACKET_ATTR_UNUSED2,
OVS_PACKET_ATTR_PROBE,  /* Packet operation is a feature probe,
   error logging should be suppressed. */
+   OVS_PACKET_ATTR_MRU,/* Maximum received IP fragment size. */
__OVS_PACKET_ATTR_MAX
 };
 
@@ -319,6 +323,8 @@ enum ovs_key_attr {
OVS_KEY_ATTR_MPLS,  /* array of struct ovs_key_mpls.
 * The i

Re: [GIT PULL nf-next] Second Round of IPVS Updates for v4.3

2015-08-26 Thread Pablo Neira Ayuso
On Fri, Aug 21, 2015 at 09:23:38AM -0700, Simon Horman wrote:
> Hi Pablo,
> 
> please consider these IPVS Updates for v4.3.
> 
> I realise these are a little late in the cycle, so if you would prefer
> me to repost them for v4.4 then just let me know.

Pulled, thanks Simon.

Let me see if this gets into this merge window, otherwise I'll keep it
in my tree and will submit in the next merge window.


Re: [patch net-next 2/3] mlxsw: expose EMAD transactions statistics via debugfs

2015-08-26 Thread Florian Fainelli
On 26/08/15 11:21, Scott Feldman wrote:
> On Wed, Aug 26, 2015 at 10:49 AM, David Miller  wrote:
>> From: Jiri Pirko 
>> Date: Wed, 26 Aug 2015 09:37:57 +0200
>>
>>> I don't think that are much more cases like this. Therefore I think that
>>> for this cases, debugfs might be a good way to expose debugging stats.
>>
>> Scott wanted to do similar things in rocker.  DSA guys too.
>>
>> Every switch device is going to have some kind of hierarchy like
>> this, it's not a unique situation.
> 
> We've been able to get buy so far without a user-visible device for
> the switch.  The switch ports are represented by netdevs, so that's
> easy.  How can we create an object for the switch itself, so we can
> attach common interfaces for the user to dump switch-level stats or
> tables?   Using another netdev doesn't seem right.  Do we need a new
> device class for switches, and then create some common tool/interfaces
> for switch device class?

I agree this is something crucially missing. If we try to list what
could be missing currently, there is mostly:

* switch-wide statistics, tables, databases
* controlling a firmware agent running on the switch
* restarting/re-configuring the switch hardware

All of these already have proper ethtool control interfaces, so using
something that understands a specialized net_device might be the easiest
way to go, but we would need a way to put it in the non-network-device
name space so that tools and users do not get confused.

We could also have a specific SET_NETDEV_DEVTYPE() which helps make that
specific device be part of a "switch-mgmt" class for instance?
-- 
Florian


[PATCHv6 net-next 02/10] openvswitch: Move MASKED* macros to datapath.h

2015-08-26 Thread Joe Stringer
This will allow the ovs-conntrack code to reuse these macros.
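
For readers unfamiliar with the macros, here is a small userspace sketch of
what a masked update does. It assumes the renamed OVS_MASKED and
OVS_SET_MASKED keep the same body as the MASKED and SET_MASKED macros
removed from actions.c in this patch; masked_update_u8() is a hypothetical
helper added only for the example.

```c
#include <stdint.h>

/* Assumed to match the macro bodies removed from actions.c below.
 * 'KEY' must not have any bits set outside of 'MASK'.
 */
#define OVS_MASKED(OLD, KEY, MASK) ((KEY) | ((OLD) & ~(MASK)))
#define OVS_SET_MASKED(OLD, KEY, MASK) ((OLD) = OVS_MASKED(OLD, KEY, MASK))

/* Hypothetical helper: replace only the bits of 'old' selected by 'mask'.
 * The key is pre-masked here so the macro's precondition always holds.
 */
static uint8_t masked_update_u8(uint8_t old, uint8_t key, uint8_t mask)
{
	return (uint8_t)OVS_MASKED(old, key & mask, mask);
}
```

For example, updating 0xAB with key 0x0F under mask 0x0F yields 0xAF: the
low nibble is replaced and the high nibble is retained.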

Signed-off-by: Joe Stringer 
Acked-by: Thomas Graf 
Acked-by: Pravin B Shelar 
---
v2-v3: No change.
v4: Add ack.
v5-v6: No change.
---
 net/openvswitch/actions.c  | 52 ++
 net/openvswitch/datapath.h |  4 
 2 files changed, 29 insertions(+), 27 deletions(-)

diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index 4f42007..520438b 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -185,10 +185,6 @@ static int pop_mpls(struct sk_buff *skb, struct sw_flow_key *key,
return 0;
 }
 
-/* 'KEY' must not have any bits set outside of the 'MASK' */
-#define MASKED(OLD, KEY, MASK) ((KEY) | ((OLD) & ~(MASK)))
-#define SET_MASKED(OLD, KEY, MASK) ((OLD) = MASKED(OLD, KEY, MASK))
-
 static int set_mpls(struct sk_buff *skb, struct sw_flow_key *flow_key,
const __be32 *mpls_lse, const __be32 *mask)
 {
@@ -201,7 +197,7 @@ static int set_mpls(struct sk_buff *skb, struct sw_flow_key *flow_key,
return err;
 
stack = (__be32 *)skb_mpls_header(skb);
-   lse = MASKED(*stack, *mpls_lse, *mask);
+   lse = OVS_MASKED(*stack, *mpls_lse, *mask);
if (skb->ip_summed == CHECKSUM_COMPLETE) {
__be32 diff[] = { ~(*stack), lse };
 
@@ -244,9 +240,9 @@ static void ether_addr_copy_masked(u8 *dst_, const u8 *src_, const u8 *mask_)
const u16 *src = (const u16 *)src_;
const u16 *mask = (const u16 *)mask_;
 
-   SET_MASKED(dst[0], src[0], mask[0]);
-   SET_MASKED(dst[1], src[1], mask[1]);
-   SET_MASKED(dst[2], src[2], mask[2]);
+   OVS_SET_MASKED(dst[0], src[0], mask[0]);
+   OVS_SET_MASKED(dst[1], src[1], mask[1]);
+   OVS_SET_MASKED(dst[2], src[2], mask[2]);
 }
 
 static int set_eth_addr(struct sk_buff *skb, struct sw_flow_key *flow_key,
@@ -338,10 +334,10 @@ static void update_ipv6_checksum(struct sk_buff *skb, u8 l4_proto,
 static void mask_ipv6_addr(const __be32 old[4], const __be32 addr[4],
   const __be32 mask[4], __be32 masked[4])
 {
-   masked[0] = MASKED(old[0], addr[0], mask[0]);
-   masked[1] = MASKED(old[1], addr[1], mask[1]);
-   masked[2] = MASKED(old[2], addr[2], mask[2]);
-   masked[3] = MASKED(old[3], addr[3], mask[3]);
+   masked[0] = OVS_MASKED(old[0], addr[0], mask[0]);
+   masked[1] = OVS_MASKED(old[1], addr[1], mask[1]);
+   masked[2] = OVS_MASKED(old[2], addr[2], mask[2]);
+   masked[3] = OVS_MASKED(old[3], addr[3], mask[3]);
 }
 
 static void set_ipv6_addr(struct sk_buff *skb, u8 l4_proto,
@@ -358,15 +354,15 @@ static void set_ipv6_addr(struct sk_buff *skb, u8 l4_proto,
 static void set_ipv6_fl(struct ipv6hdr *nh, u32 fl, u32 mask)
 {
/* Bits 21-24 are always unmasked, so this retains their values. */
-   SET_MASKED(nh->flow_lbl[0], (u8)(fl >> 16), (u8)(mask >> 16));
-   SET_MASKED(nh->flow_lbl[1], (u8)(fl >> 8), (u8)(mask >> 8));
-   SET_MASKED(nh->flow_lbl[2], (u8)fl, (u8)mask);
+   OVS_SET_MASKED(nh->flow_lbl[0], (u8)(fl >> 16), (u8)(mask >> 16));
+   OVS_SET_MASKED(nh->flow_lbl[1], (u8)(fl >> 8), (u8)(mask >> 8));
+   OVS_SET_MASKED(nh->flow_lbl[2], (u8)fl, (u8)mask);
 }
 
 static void set_ip_ttl(struct sk_buff *skb, struct iphdr *nh, u8 new_ttl,
   u8 mask)
 {
-   new_ttl = MASKED(nh->ttl, new_ttl, mask);
+   new_ttl = OVS_MASKED(nh->ttl, new_ttl, mask);
 
csum_replace2(&nh->check, htons(nh->ttl << 8), htons(new_ttl << 8));
nh->ttl = new_ttl;
@@ -392,7 +388,7 @@ static int set_ipv4(struct sk_buff *skb, struct sw_flow_key *flow_key,
 * makes sense to check if the value actually changed.
 */
if (mask->ipv4_src) {
-   new_addr = MASKED(nh->saddr, key->ipv4_src, mask->ipv4_src);
+   new_addr = OVS_MASKED(nh->saddr, key->ipv4_src, mask->ipv4_src);
 
if (unlikely(new_addr != nh->saddr)) {
set_ip_addr(skb, nh, &nh->saddr, new_addr);
@@ -400,7 +396,7 @@ static int set_ipv4(struct sk_buff *skb, struct sw_flow_key *flow_key,
}
}
if (mask->ipv4_dst) {
-   new_addr = MASKED(nh->daddr, key->ipv4_dst, mask->ipv4_dst);
+   new_addr = OVS_MASKED(nh->daddr, key->ipv4_dst, mask->ipv4_dst);
 
if (unlikely(new_addr != nh->daddr)) {
set_ip_addr(skb, nh, &nh->daddr, new_addr);
@@ -488,7 +484,8 @@ static int set_ipv6(struct sk_buff *skb, struct sw_flow_key *flow_key,
*(__be32 *)nh & htonl(IPV6_FLOWINFO_FLOWLABEL);
}
if (mask->ipv6_hlimit) {
-   SET_MASKED(nh->hop_limit, key->ipv6_hlimit, mask->ipv6_hlimit);
+   OVS_SET_MASKED(nh->hop_limit, key->ipv6_hlimit,
+  mask->ipv6_hlimit);
flow_key->ip.ttl = nh->hop_limit;
}
return 0;
@@ -517,8 

[PATCHv6 net-next 01/10] openvswitch: Serialize acts with original netlink len

2015-08-26 Thread Joe Stringer
Previously, we used the kernel-internal netlink actions length to
calculate the size of messages to serialize back to userspace.
However, the sw_flow_actions may not be formatted exactly the same as the
actions on the wire, so store the original actions length when
de-serializing and re-use the original length when serializing.
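
To make the size calculation concrete, here is a userspace sketch of the
netlink attribute sizing arithmetic. The NLA_* definitions restate the
usual netlink alignment rules (4-byte alignment, 4-byte attribute header);
struct acts_sketch, nla_total_size_sketch() and actions_msg_size() are
hypothetical names used only for this example.

```c
#include <stddef.h>

/* Userspace restatement of the netlink attribute alignment rules. */
#define NLA_ALIGNTO 4
#define NLA_ALIGN(len) (((len) + NLA_ALIGNTO - 1) & ~(NLA_ALIGNTO - 1))
#define NLA_HDRLEN 4 /* aligned size of the attribute header */

/* Space needed for one attribute carrying 'payload' bytes, with padding. */
static size_t nla_total_size_sketch(size_t payload)
{
	return NLA_ALIGN(NLA_HDRLEN + payload);
}

/* Hypothetical stand-in for the two lengths kept in sw_flow_actions. */
struct acts_sketch {
	size_t orig_len;          /* actions length as received on the wire */
	unsigned int actions_len; /* length of the kernel-internal form */
};

/* Size the reply from the original length, as this patch does. */
static size_t actions_msg_size(const struct acts_sketch *acts)
{
	return nla_total_size_sketch(acts->orig_len);
}
```

If the internal form is 40 bytes but userspace originally sent 24 bytes,
the reply is sized for 24 bytes of payload plus the attribute header.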

Signed-off-by: Joe Stringer 
Acked-by: Pravin B Shelar 
Acked-by: Thomas Graf 
---
v2: No change.
v3: Preserve original length across buffer resize.
v4: Add ack.
v5: No change.
v6: Add ack.
---
 net/openvswitch/datapath.c | 2 +-
 net/openvswitch/flow.h | 1 +
 net/openvswitch/flow_netlink.c | 2 ++
 3 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index ffe984f..d5b5473 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -713,7 +713,7 @@ static size_t ovs_flow_cmd_msg_size(const struct sw_flow_actions *acts,
 
/* OVS_FLOW_ATTR_ACTIONS */
if (should_fill_actions(ufid_flags))
-   len += nla_total_size(acts->actions_len);
+   len += nla_total_size(acts->orig_len);
 
return len
+ nla_total_size(sizeof(struct ovs_flow_stats)) /* 
OVS_FLOW_ATTR_STATS */
diff --git a/net/openvswitch/flow.h b/net/openvswitch/flow.h
index b62cdb3..082a87b 100644
--- a/net/openvswitch/flow.h
+++ b/net/openvswitch/flow.h
@@ -144,6 +144,7 @@ struct sw_flow_id {
 
 struct sw_flow_actions {
struct rcu_head rcu;
+   size_t orig_len;/* From flow_cmd_new netlink actions size */
u32 actions_len;
struct nlattr actions[];
 };
diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
index 4e7a3f7..c182b28 100644
--- a/net/openvswitch/flow_netlink.c
+++ b/net/openvswitch/flow_netlink.c
@@ -1619,6 +1619,7 @@ static struct nlattr *reserve_sfa_size(struct sw_flow_actions **sfa,
 
memcpy(acts->actions, (*sfa)->actions, (*sfa)->actions_len);
acts->actions_len = (*sfa)->actions_len;
+   acts->orig_len = (*sfa)->orig_len;
kfree(*sfa);
*sfa = acts;
 
@@ -2223,6 +2224,7 @@ int ovs_nla_copy_actions(const struct nlattr *attr,
if (IS_ERR(*sfa))
return PTR_ERR(*sfa);
 
+   (*sfa)->orig_len = nla_len(attr);
err = __ovs_nla_copy_actions(attr, key, 0, sfa, key->eth.type,
 key->eth.tci, log);
if (err)
-- 
2.1.4



[PATCHv6 net-next 00/10] OVS conntrack support

2015-08-26 Thread Joe Stringer
The goal of this series is to allow OVS to send packets through the Linux
kernel connection tracker, and subsequently match on fields populated by
conntrack. This functionality is enabled through a new
CONFIG_OPENVSWITCH_CONNTRACK option.

This version addresses the feedback from v5, primarily checking the behaviour
is correct with different configurations such as disabling
CONFIG_OPENVSWITCH_CONNTRACK or disabling individual conntrack features like
connlabels.

The branch below has been updated with the corresponding userspace pieces:
https://github.com/joestringer/ovs dev/ct_20150818

Joe Stringer (10):
  openvswitch: Serialize acts with original netlink len
  openvswitch: Move MASKED* macros to datapath.h
  ipv6: Export nf_ct_frag6_gather()
  dst: Add __skb_dst_copy() variation
  openvswitch: Add conntrack action
  openvswitch: Allow matching on conntrack mark
  netfilter: Always export nf_connlabels_replace()
  netfilter: connlabels: Export setting connlabel length
  openvswitch: Allow matching on conntrack label
  openvswitch: Allow attaching helpers to ct action

 include/net/dst.h   |   9 +-
 include/net/netfilter/nf_conntrack_labels.h |   4 +
 include/uapi/linux/openvswitch.h|  58 +++
 net/ipv6/netfilter/nf_conntrack_reasm.c |   1 +
 net/netfilter/nf_conntrack_labels.c |  34 +-
 net/netfilter/xt_connlabel.c|  16 +-
 net/openvswitch/Kconfig |  11 +
 net/openvswitch/Makefile|   2 +
 net/openvswitch/actions.c   | 229 +++--
 net/openvswitch/conntrack.c | 744 
 net/openvswitch/conntrack.h |  86 
 net/openvswitch/datapath.c  |  86 +++-
 net/openvswitch/datapath.h  |  13 +
 net/openvswitch/flow.c  |   6 +-
 net/openvswitch/flow.h  |  11 +-
 net/openvswitch/flow_netlink.c  | 119 -
 net/openvswitch/flow_netlink.h  |  13 +-
 net/openvswitch/vport.c |   1 +
 18 files changed, 1336 insertions(+), 107 deletions(-)
 create mode 100644 net/openvswitch/conntrack.c
 create mode 100644 net/openvswitch/conntrack.h

-- 
2.1.4



[TRIVIAL PATCH] smsc9194: Remove uncompilable #if 0'd use of pr_dbg

2015-08-26 Thread Joe Perches
No pr_dbg method exists.

While this code is #if 0'd, it'd be nicer to
use the generic hex_dump, so use it instead.

Signed-off-by: Joe Perches 
---

Or maybe just delete the driver altogether.

It'd probably be nice to one day create something
like drivers/net/ethernet/obsolete and eventually
kill off code for the stuff that hasn't been
supported or sold in the last 20 years.

 drivers/net/ethernet/smsc/smc9194.c | 32 ++--
 1 file changed, 2 insertions(+), 30 deletions(-)

diff --git a/drivers/net/ethernet/smsc/smc9194.c b/drivers/net/ethernet/smsc/smc9194.c
index 67d9fde..94857c1 100644
--- a/drivers/net/ethernet/smsc/smc9194.c
+++ b/drivers/net/ethernet/smsc/smc9194.c
@@ -1031,36 +1031,8 @@ err_out:
 static void print_packet( byte * buf, int length )
 {
 #if 0
-   int i;
-   int remainder;
-   int lines;
-
-   pr_dbg("Packet of length %d\n", length);
-   lines = length / 16;
-   remainder = length % 16;
-
-   for ( i = 0; i < lines ; i ++ ) {
-   int cur;
-
-   printk(KERN_DEBUG);
-   for ( cur = 0; cur < 8; cur ++ ) {
-   byte a, b;
-
-   a = *(buf ++ );
-   b = *(buf ++ );
-   pr_cont("%02x%02x ", a, b);
-   }
-   pr_cont("\n");
-   }
-   printk(KERN_DEBUG);
-   for ( i = 0; i < remainder/2 ; i++ ) {
-   byte a, b;
-
-   a = *(buf ++ );
-   b = *(buf ++ );
-   pr_cont("%02x%02x ", a, b);
-   }
-   pr_cont("\n");
+   print_hex_dump_dbg(DRV_NAME, DUMP_PREFIX_OFFSET, 16, 1,
+  buf, length, true);
 #endif
 }
 #endif




[PATCH v2 0/2] drivers: net: xgene: Add TSO support

2015-08-26 Thread Iyappan Subramanian
Adding TSO support for 10GbE

iperf Tx data rate without TSO: 3.42 Gbps
  with TSO: 9.41 Gbps

v2: Address review comments from v1
- skb_linearize() if headers don't fit in 3 hardware buffers

v1:
* Initial version

Signed-off-by: Iyappan Subramanian 
---

Iyappan Subramanian (2):
  drivers: net: xgene: Preparatory patch for TSO support
  drivers: net: xgene: Adding support for TSO

 drivers/net/ethernet/apm/xgene/xgene_enet_hw.h|  16 +-
 drivers/net/ethernet/apm/xgene/xgene_enet_main.c  | 274 +++---
 drivers/net/ethernet/apm/xgene/xgene_enet_main.h  |  12 +
 drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c |   8 +-
 drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.h |   2 +
 5 files changed, 283 insertions(+), 29 deletions(-)

-- 
1.9.1



[PATCH v2 1/2] drivers: net: xgene: Preparatory patch for TSO support

2015-08-26 Thread Iyappan Subramanian
- Rearranged descriptor writes
- Moved increment command write to xgene_enet_setup_tx_desc
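
Two small pieces of arithmetic in the diff below may be easier to follow in
isolation. This is a userspace sketch: encode_len() mirrors
xgene_enet_encode_len() from the patch, while ring_next() is a hypothetical
helper restating the tail update. The requirement that the slot count be a
power of two is an assumption inferred from the mask arithmetic.

```c
#include <stdint.h>

#define BUFLEN_16K (16 * 1024)

/* Mirrors xgene_enet_encode_len(): a 16K buffer length is encoded as 0,
 * any other length is passed through unchanged.
 */
static uint16_t encode_len(uint16_t len)
{
	return (len == BUFLEN_16K) ? 0 : len;
}

/* Hypothetical helper: advance a ring index by one slot and wrap around.
 * Assumes 'slots' is a power of two so that the mask works.
 */
static uint16_t ring_next(uint16_t tail, uint16_t slots)
{
	return (uint16_t)((tail + 1) & (slots - 1));
}
```

With 256 slots, ring_next(255, 256) wraps back to slot 0.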

Signed-off-by: Iyappan Subramanian 
---
 drivers/net/ethernet/apm/xgene/xgene_enet_main.c | 29 ++--
 drivers/net/ethernet/apm/xgene/xgene_enet_main.h |  1 +
 2 files changed, 23 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
index 4f68d19..652b4c3 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
@@ -219,6 +219,11 @@ out:
return hopinfo;
 }
 
+static u16 xgene_enet_encode_len(u16 len)
+{
+   return (len == BUFLEN_16K) ? 0 : len;
+}
+
 static int xgene_enet_setup_tx_desc(struct xgene_enet_desc_ring *tx_ring,
struct sk_buff *skb)
 {
@@ -227,27 +232,36 @@ static int xgene_enet_setup_tx_desc(struct xgene_enet_desc_ring *tx_ring,
dma_addr_t dma_addr;
u16 tail = tx_ring->tail;
u64 hopinfo;
+   u32 len, hw_len;
+   u8 count = 1;
 
raw_desc = &tx_ring->raw_desc[tail];
memset(raw_desc, 0, sizeof(struct xgene_enet_raw_desc));
 
-   dma_addr = dma_map_single(dev, skb->data, skb->len, DMA_TO_DEVICE);
+   len = skb_headlen(skb);
+   hw_len = xgene_enet_encode_len(len);
+
+   dma_addr = dma_map_single(dev, skb->data, len, DMA_TO_DEVICE);
if (dma_mapping_error(dev, dma_addr)) {
netdev_err(tx_ring->ndev, "DMA mapping error\n");
return -EINVAL;
}
 
/* Hardware expects descriptor in little endian format */
-   raw_desc->m0 = cpu_to_le64(tail);
raw_desc->m1 = cpu_to_le64(SET_VAL(DATAADDR, dma_addr) |
-  SET_VAL(BUFDATALEN, skb->len) |
+  SET_VAL(BUFDATALEN, hw_len) |
   SET_BIT(COHERENT));
+
+   raw_desc->m0 = cpu_to_le64(SET_VAL(USERINFO, tail));
hopinfo = xgene_enet_work_msg(skb);
raw_desc->m3 = cpu_to_le64(SET_VAL(HENQNUM, tx_ring->dst_ring_num) |
   hopinfo);
tx_ring->cp_ring->cp_skb[tail] = skb;
 
-   return 0;
+   tail = (tail + 1) & (tx_ring->slots - 1);
+   tx_ring->tail = tail;
+
+   return count;
 }
 
 static netdev_tx_t xgene_enet_start_xmit(struct sk_buff *skb,
@@ -257,6 +271,7 @@ static netdev_tx_t xgene_enet_start_xmit(struct sk_buff *skb,
struct xgene_enet_desc_ring *tx_ring = pdata->tx_ring;
struct xgene_enet_desc_ring *cp_ring = tx_ring->cp_ring;
u32 tx_level, cq_level;
+   int count;
 
tx_level = pdata->ring_ops->len(tx_ring);
cq_level = pdata->ring_ops->len(cp_ring);
@@ -266,14 +281,14 @@ static netdev_tx_t xgene_enet_start_xmit(struct sk_buff *skb,
return NETDEV_TX_BUSY;
}
 
-   if (xgene_enet_setup_tx_desc(tx_ring, skb)) {
+   count = xgene_enet_setup_tx_desc(tx_ring, skb);
+   if (count <= 0) {
dev_kfree_skb_any(skb);
return NETDEV_TX_OK;
}
 
-   pdata->ring_ops->wr_cmd(tx_ring, 1);
+   pdata->ring_ops->wr_cmd(tx_ring, count);
skb_tx_timestamp(skb);
-   tx_ring->tail = (tx_ring->tail + 1) & (tx_ring->slots - 1);
 
pdata->stats.tx_packets++;
pdata->stats.tx_bytes += skb->len;
diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_main.h b/drivers/net/ethernet/apm/xgene/xgene_enet_main.h
index 1c85fc8..2ac547e 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_main.h
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_main.h
@@ -40,6 +40,7 @@
 #define XGENE_DRV_VERSION  "v1.0"
 #define XGENE_ENET_MAX_MTU 1536
 #define SKB_BUFFER_SIZE(XGENE_ENET_MAX_MTU - NET_IP_ALIGN)
+#define BUFLEN_16K (16 * 1024)
 #define NUM_PKT_BUF64
 #define NUM_BUFPOOL32
 
-- 
1.9.1



[PATCH v2 2/2] drivers: net: xgene: Adding support for TSO

2015-08-26 Thread Iyappan Subramanian
Signed-off-by: Iyappan Subramanian 
---
 drivers/net/ethernet/apm/xgene/xgene_enet_hw.h|  16 +-
 drivers/net/ethernet/apm/xgene/xgene_enet_main.c  | 249 --
 drivers/net/ethernet/apm/xgene/xgene_enet_main.h  |  11 +
 drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c |   8 +-
 drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.h |   2 +
 5 files changed, 262 insertions(+), 24 deletions(-)

diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.h b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.h
index 541bed0..ff05bbc 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.h
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.h
@@ -193,12 +193,16 @@ enum xgene_enet_rm {
 #define USERINFO_LEN   32
 #define FPQNUM_POS 32
 #define FPQNUM_LEN 12
+#define NV_POS 50
+#define NV_LEN 1
+#define LL_POS 51
+#define LL_LEN 1
 #define LERR_POS   60
 #define LERR_LEN   3
 #define STASH_POS  52
 #define STASH_LEN  2
 #define BUFDATALEN_POS 48
-#define BUFDATALEN_LEN 12
+#define BUFDATALEN_LEN 15
 #define DATAADDR_POS   0
 #define DATAADDR_LEN   42
 #define COHERENT_POS   63
@@ -215,9 +219,19 @@ enum xgene_enet_rm {
 #define IPHDR_LEN  6
 #define EC_POS 22  /* Enable checksum */
 #define EC_LEN 1
+#define ET_POS 23  /* Enable TSO */
 #define IS_POS 24  /* IP protocol select */
 #define IS_LEN 1
 #define TYPE_ETH_WORK_MESSAGE_POS  44
+#define LL_BYTES_MSB_POS   56
+#define LL_BYTES_MSB_LEN   8
+#define LL_BYTES_LSB_POS   48
+#define LL_BYTES_LSB_LEN   12
+#define LL_LEN_POS 48
+#define LL_LEN_LEN 8
+#define DATALEN_MASK   GENMASK(11, 0)
+
+#define LAST_BUFFER(0x7800ULL << BUFDATALEN_POS)
 
 struct xgene_enet_raw_desc {
__le64 m0;
diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
index 652b4c3..b330cb6 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
@@ -147,18 +147,27 @@ static int xgene_enet_tx_completion(struct xgene_enet_desc_ring *cp_ring,
 {
struct sk_buff *skb;
struct device *dev;
+   skb_frag_t *frag;
+   dma_addr_t *frag_dma_addr;
u16 skb_index;
u8 status;
-   int ret = 0;
+   int i, ret = 0;
 
skb_index = GET_VAL(USERINFO, le64_to_cpu(raw_desc->m0));
skb = cp_ring->cp_skb[skb_index];
+   frag_dma_addr = &cp_ring->frag_dma_addr[skb_index * MAX_SKB_FRAGS];
 
dev = ndev_to_dev(cp_ring->ndev);
dma_unmap_single(dev, GET_VAL(DATAADDR, le64_to_cpu(raw_desc->m1)),
-GET_VAL(BUFDATALEN, le64_to_cpu(raw_desc->m1)),
+skb_headlen(skb),
 DMA_TO_DEVICE);
 
+   for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
+   frag = &skb_shinfo(skb)->frags[i];
+   dma_unmap_page(dev, frag_dma_addr[i], skb_frag_size(frag),
+  DMA_TO_DEVICE);
+   }
+
/* Checking for error */
status = GET_VAL(LERR, le64_to_cpu(raw_desc->m0));
if (unlikely(status > 2)) {
@@ -179,12 +188,16 @@ static int xgene_enet_tx_completion(struct xgene_enet_desc_ring *cp_ring,
 
 static u64 xgene_enet_work_msg(struct sk_buff *skb)
 {
+   struct net_device *ndev = skb->dev;
+   struct xgene_enet_pdata *pdata = netdev_priv(ndev);
struct iphdr *iph;
-   u8 l3hlen, l4hlen = 0;
-   u8 csum_enable = 0;
-   u8 proto = 0;
-   u8 ethhdr;
-   u64 hopinfo;
+   u8 l3hlen = 0, l4hlen = 0;
+   u8 ethhdr, proto = 0, csum_enable = 0;
+   u64 hopinfo = 0;
+   u32 hdr_len, mss = 0;
+   u32 i, len, nr_frags;
+
+   ethhdr = xgene_enet_hdr_len(skb->data);
 
if (unlikely(skb->protocol != htons(ETH_P_IP)) &&
unlikely(skb->protocol != htons(ETH_P_8021Q)))
@@ -201,14 +214,40 @@ static u64 xgene_enet_work_msg(struct sk_buff *skb)
l4hlen = tcp_hdrlen(skb) >> 2;
csum_enable = 1;
proto = TSO_IPPROTO_TCP;
+   if (ndev->features & NETIF_F_TSO) {
+   hdr_len = ethhdr + ip_hdrlen(skb) + tcp_hdrlen(skb);
+   mss = skb_shinfo(skb)->gso_size;
+
+   if (skb_is_nonlinear(skb)) {
+   len = skb_headlen(skb);
+   nr_frags = skb_shinfo(skb)->nr_frags;
+
+

[RFC PATCH] smsc911x: Ignore error return from device_get_phy_mode()

2015-08-26 Thread Guenter Roeck
Commit 62ee783bf1f8 ("smsc911x: Fix crash seen if neither ACPI nor OF is
configured or used") introduces an error check for the return value from
device_get_phy_mode() and bails out if there is an error. Unfortunately,
there are configurations where no phy is configured. Those configurations
now fail.

To fix the problem, assign a phy interface mode of PHY_INTERFACE_MODE_NA
if device_get_phy_mode() returns an error.

Check the return value from device_property_read_u32() to see if there
is a suitable firmware interface to read the data, and abort if not.
The function should return -ENXIO in that case; however, it returns
-ENODATA. Check for both.
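
The decision logic described above can be sketched in userspace as follows.
This is only an illustration: PHY_MODE_NA is a stand-in for
PHY_INTERFACE_MODE_NA, and pick_phy_mode() and no_firmware_interface() are
hypothetical helpers that take the return values of the kernel calls as
plain integers.

```c
#include <errno.h>

#define PHY_MODE_NA 0 /* stand-in for PHY_INTERFACE_MODE_NA */

/* A failed phy mode lookup (negative errno) falls back to "not
 * applicable" instead of aborting the probe.
 */
static int pick_phy_mode(int lookup_result)
{
	return lookup_result < 0 ? PHY_MODE_NA : lookup_result;
}

/* Nonzero when the property read suggests there is no usable firmware
 * interface: -ENXIO is the expected code, -ENODATA the one observed
 * according to the commit message.
 */
static int no_firmware_interface(int err)
{
	return err == -ENXIO || err == -ENODATA;
}
```

Any other error from the property read is treated as "property absent", in
which case the 16-bit access mode is selected.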

Fixes: 62ee783bf1f8 ("smsc911x: Fix crash seen if neither ACPI nor OF is configured or used")
Signed-off-by: Guenter Roeck 
---
Needs testing. RFC because I am not sure if the -ENODATA check is acceptable.

 drivers/net/ethernet/smsc/smsc911x.c | 18 --
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/smsc/smsc911x.c b/drivers/net/ethernet/smsc/smsc911x.c
index 6eef3251d833..81e29e420fd0 100644
--- a/drivers/net/ethernet/smsc/smsc911x.c
+++ b/drivers/net/ethernet/smsc/smsc911x.c
@@ -2369,23 +2369,29 @@ static int smsc911x_probe_config(struct smsc911x_platform_config *config,
 {
int phy_interface;
u32 width = 0;
+   int err;
 
phy_interface = device_get_phy_mode(dev);
if (phy_interface < 0)
-   return phy_interface;
-
+   phy_interface = PHY_INTERFACE_MODE_NA;
config->phy_interface = phy_interface;
 
device_get_mac_address(dev, config->mac, ETH_ALEN);
 
-   device_property_read_u32(dev, "reg-shift", &config->shift);
-
-   device_property_read_u32(dev, "reg-io-width", &width);
-   if (width == 4)
+   err = device_property_read_u32(dev, "reg-io-width", &width);
+   /* device_property_read_u32() should return -ENXIO if there is no
+* suitable firmware interface. In reality it returns -ENODATA.
+* Check for both and use platform data if that is the case.
+*/
+   if (err == -ENXIO || err == -ENODATA)
+   return err;
+   if (!err && width == 4)
config->flags |= SMSC911X_USE_32BIT;
else
config->flags |= SMSC911X_USE_16BIT;
 
+   device_property_read_u32(dev, "reg-shift", &config->shift);
+
if (device_property_present(dev, "smsc,irq-active-high"))
config->irq_polarity = SMSC911X_IRQ_POLARITY_ACTIVE_HIGH;
 
-- 
2.1.4



[TRIVIAL PATCH V2] smsc9194: Remove uncompilable #if 0'd use of pr_dbg

2015-08-26 Thread Joe Perches
No pr_dbg method exists.

While this code is #if 0'd, it'd be nicer to
use the generic hex_dump, so use it instead.
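
For comparison with the hand-rolled loops being removed, a row-oriented
hex dump can be sketched in userspace as below. This is not the kernel
helper and makes no claim to reproduce its exact output format; hex_row()
is a hypothetical function that formats at most 16 bytes behind an offset
prefix.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: format up to 16 bytes of 'buf' as one row,
 * "<offset>: xx xx ...", into 'out'. Returns the number of characters
 * that the full row requires, as reported by snprintf().
 */
static int hex_row(char *out, size_t outlen, const unsigned char *buf,
		   size_t len, size_t offset)
{
	size_t i;
	int n = snprintf(out, outlen, "%08zx:", offset);

	for (i = 0; i < len && i < 16 && n > 0 && (size_t)n < outlen; i++)
		n += snprintf(out + n, outlen - (size_t)n, " %02x", buf[i]);
	return n;
}
```

A caller would walk the packet in 16-byte steps and print one row per step,
which replaces both the full-line loop and the remainder loop.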

Signed-off-by: Joe Perches 
---

 no print_hex_dump_dbg method exists either.

V2: Change the uncompilable print_hex_dump_dbg call to
print_hex_dump_debug that actually could be compiled...

Or maybe just delete the driver altogether.

It'd probably be nice to one day create something
like drivers/net/ethernet/obsolete and eventually
kill off code for the stuff that hasn't been
supported or sold in the last 20 years.

 drivers/net/ethernet/smsc/smc9194.c | 32 ++--
 1 file changed, 2 insertions(+), 30 deletions(-)

diff --git a/drivers/net/ethernet/smsc/smc9194.c b/drivers/net/ethernet/smsc/smc9194.c
index 67d9fde..94857c1 100644
--- a/drivers/net/ethernet/smsc/smc9194.c
+++ b/drivers/net/ethernet/smsc/smc9194.c
@@ -1031,36 +1031,8 @@ err_out:
 static void print_packet( byte * buf, int length )
 {
 #if 0
-   int i;
-   int remainder;
-   int lines;
-
-   pr_dbg("Packet of length %d\n", length);
-   lines = length / 16;
-   remainder = length % 16;
-
-   for ( i = 0; i < lines ; i ++ ) {
-       int cur;
-
-       printk(KERN_DEBUG);
-       for ( cur = 0; cur < 8; cur ++ ) {
-           byte a, b;
-
-           a = *(buf ++ );
-           b = *(buf ++ );
-           pr_cont("%02x%02x ", a, b);
-       }
-       pr_cont("\n");
-   }
-   printk(KERN_DEBUG);
-   for ( i = 0; i < remainder/2 ; i++ ) {
-       byte a, b;
-
-       a = *(buf ++ );
-       b = *(buf ++ );
-       pr_cont("%02x%02x ", a, b);
-   }
-   pr_cont("\n");
+   print_hex_dump_debug(DRV_NAME, DUMP_PREFIX_OFFSET, 16, 1,
+                        buf, length, true);
 #endif
 }
 #endif
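
For illustration only: the new call asks for 16 bytes per row in groups of
one byte with an offset prefix, whereas the removed loop printed byte pairs.
The row layout can be approximated in userspace as below. This is a minimal
sketch, not the kernel helper: hex_dump_row() and ROW_BYTES are hypothetical
names, the ASCII column is omitted, and the exact spacing of the kernel
output is not reproduced.

```c
#include <stdio.h>
#include <stddef.h>

#define ROW_BYTES 16	/* bytes per output row, matching the "16" in the call */

/*
 * Format up to ROW_BYTES bytes starting at buf[off] as "OFFSET: xx xx ..."
 * into out. Returns the number of input bytes covered by this row (0 when
 * off is past the end). The text is truncated if out is too small.
 */
static size_t hex_dump_row(const unsigned char *buf, size_t len, size_t off,
			   char *out, size_t outlen)
{
	size_t n, i;
	int used;

	if (outlen == 0)
		return 0;
	out[0] = '\0';
	if (off >= len)
		return 0;
	n = len - off < ROW_BYTES ? len - off : ROW_BYTES;

	/* Offset prefix, then one two-digit group per byte. */
	used = snprintf(out, outlen, "%08zx:", off);
	for (i = 0; i < n && used > 0 && (size_t)used < outlen; i++)
		used += snprintf(out + used, outlen - (size_t)used, " %02x",
				 (unsigned int)buf[off + i]);
	return n;
}
```

A caller would loop, advancing off by the return value until it reaches 0,
which replaces the separate "lines" and "remainder" bookkeeping of the old
code.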


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/




Re: [v2] net: phy: fixed: propagate fixed link values to struct

2015-08-26 Thread Florian Fainelli
On 26/08/15 07:58, Madalin Bucur wrote:
> The fixed link values parsed from the device tree are stored in
> the struct fixed_phy member status. The struct phy_device members
> speed, duplex were not updated.

Arguably you need to start the PHY state machine for this to work
properly, but this looks fine to me.

> 
> Signed-off-by: Madalin Bucur 

Reviewed-by: Florian Fainelli 

> ---
> v2: always setting phy->link, thanks Stas
> 
>  drivers/net/phy/fixed_phy.c | 9 +
>  1 file changed, 9 insertions(+)
> 
> diff --git a/drivers/net/phy/fixed_phy.c b/drivers/net/phy/fixed_phy.c
> index 479b93f..99d9bc1 100644
> --- a/drivers/net/phy/fixed_phy.c
> +++ b/drivers/net/phy/fixed_phy.c
> @@ -292,6 +292,15 @@ struct phy_device *fixed_phy_register(unsigned int irq,
>   return ERR_PTR(-EINVAL);
>   }
>  
> + /* propagate the fixed link values to struct phy_device */
> + phy->link = status->link;
> + if (status->link) {
> + phy->speed = status->speed;
> + phy->duplex = status->duplex;
> + phy->pause = status->pause;
> + phy->asym_pause = status->asym_pause;
> + }
> +
>   of_node_get(np);
>   phy->dev.of_node = np;
>  
> 


-- 
Florian
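
The propagation rule in the quoted hunk (link is always copied, the other
fields only when the link is up) can be exercised outside the kernel. A
minimal sketch, assuming reduced stand-in structures: toy_fixed_status,
toy_phy and propagate_fixed_link() are hypothetical names that carry only
the fields the patch touches.

```c
/* Stand-in for the parsed fixed-link values (the "status" member). */
struct toy_fixed_status {
	int link, speed, duplex, pause, asym_pause;
};

/* Stand-in for the corresponding struct phy_device members. */
struct toy_phy {
	int link, speed, duplex, pause, asym_pause;
};

static void propagate_fixed_link(struct toy_phy *phy,
				 const struct toy_fixed_status *status)
{
	/* v2 behaviour: the link state is copied unconditionally. */
	phy->link = status->link;

	/* The remaining values are only meaningful when the link is up. */
	if (status->link) {
		phy->speed = status->speed;
		phy->duplex = status->duplex;
		phy->pause = status->pause;
		phy->asym_pause = status->asym_pause;
	}
}
```

With the link down, the speed and duplex fields keep whatever they held
before the call, which is why the always-copied link flag matters.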


Re: [PATCHv6 net-next 05/10] openvswitch: Add conntrack action

2015-08-26 Thread Joe Stringer
On 26 August 2015 at 11:31, Joe Stringer wrote:
> Expose the kernel connection tracker via OVS. Userspace components can
> make use of the CT action to populate the connection state (ct_state)
> field for a flow. This state can be subsequently matched.
>
> Exposed connection states are OVS_CS_F_*:
> - NEW (0x01) - Beginning of a new connection.
> - ESTABLISHED (0x02) - Part of an existing connection.
> - RELATED (0x04) - Related to an established connection.
> - INVALID (0x20) - Could not track the connection for this packet.
> - REPLY_DIR (0x40) - This packet is in the reply direction for the flow.
> - TRACKED (0x80) - This packet has been sent through conntrack.
>
> When the CT action is executed by itself, it will send the packet
> through the connection tracker and populate the ct_state field with one
> or more of the connection state flags above. The CT action will always
> set the TRACKED bit.
>
> When the COMMIT flag is passed to the conntrack action, this specifies
> that information about the connection should be stored. This allows
> subsequent packets for the same (or related) connections to be
> correlated with this connection. Sending subsequent packets for the
> connection through conntrack allows the connection tracker to consider
> the packets as ESTABLISHED, RELATED, and/or REPLY_DIR.
>
> The CT action may optionally take a zone to track the flow within. This
> allows connections with the same 5-tuple to be kept logically separate
> from connections in other zones. If the zone is specified, then the
> "ct_zone" match field will be subsequently populated with the zone id.
>
> IP fragments are handled by transparently assembling them as part of the
> CT action. The maximum received unit (MRU) size is tracked so that
> refragmentation can occur during output.
>
> IP frag handling contributed by Andy Zhou.

Based on original design by Justin Pettit.

> Signed-off-by: Joe Stringer 
> Signed-off-by: Justin Pettit 
> Signed-off-by: Andy Zhou 
> Acked-by: Thomas Graf 
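
As an illustration of how the ct_state bits listed above combine, here is a
small userspace sketch. The bit values are the ones from the commit message;
the shortened CS_F_* macro names, the short flag strings, and the helpers
ct_state_format() and is_established_reply() are assumptions made for this
example and are not taken from the patch.

```c
#include <stdio.h>
#include <stddef.h>

/* Bit values as listed in the commit message (OVS_CS_F_*). */
#define CS_F_NEW		0x01
#define CS_F_ESTABLISHED	0x02
#define CS_F_RELATED		0x04
#define CS_F_INVALID		0x20
#define CS_F_REPLY_DIR		0x40
#define CS_F_TRACKED		0x80

struct cs_name {
	unsigned int bit;
	const char *name;	/* illustrative short label */
};

static const struct cs_name cs_names[] = {
	{ CS_F_NEW, "new" },
	{ CS_F_ESTABLISHED, "est" },
	{ CS_F_RELATED, "rel" },
	{ CS_F_INVALID, "inv" },
	{ CS_F_REPLY_DIR, "rpl" },
	{ CS_F_TRACKED, "trk" },
};

/* Write a '|'-separated list of the set flags into buf and return buf. */
static char *ct_state_format(unsigned int state, char *buf, size_t len)
{
	size_t i, used = 0;

	if (len == 0)
		return buf;
	buf[0] = '\0';
	for (i = 0; i < sizeof(cs_names) / sizeof(cs_names[0]); i++) {
		int n;

		if (!(state & cs_names[i].bit))
			continue;
		n = snprintf(buf + used, len - used, "%s%s",
			     used ? "|" : "", cs_names[i].name);
		if (n < 0 || (size_t)n >= len - used)
			break;	/* output truncated, stop here */
		used += (size_t)n;
	}
	return buf;
}

/* Example match: a tracked reply packet of an established connection. */
static int is_established_reply(unsigned int state)
{
	const unsigned int want =
		CS_F_TRACKED | CS_F_ESTABLISHED | CS_F_REPLY_DIR;

	return (state & want) == want;
}
```

For instance, the first packet of a connection sent through the CT action
would carry TRACKED and NEW (0x81), which this sketch renders as "new|trk".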


Re: [PATCH net-next] vrf: Add ethernet header for pass through VRF device

2015-08-26 Thread David Ahern

On 8/25/15 3:51 PM, David Miller wrote:
> From: David Ahern
> Date: Tue, 25 Aug 2015 15:37:55 -0700
>
>> On 8/25/15 2:02 PM, David Miller wrote:
>>> From: David Ahern
>>> Date: Sun, 23 Aug 2015 12:41:00 -0600
>>>
>>>> @@ -250,6 +253,17 @@ static netdev_tx_t vrf_xmit(struct sk_buff *skb, struct net_device *dev)
>>>>
>>>>  static netdev_tx_t vrf_finish(struct sock *sk, struct sk_buff *skb)
>>>>  {
>>>> +   int err;
>>>> +
>>>> +   __skb_pull(skb, skb_network_offset(skb));
>>>> +   err = dev_hard_header(skb, skb->dev, ntohs(skb->protocol),
>>>> +                         NULL, NULL, skb->len);
>>>> +
>>>> +   if (err < 0) {
>>>> +       vrf_tx_error(skb->dev, skb);
>>>> +       return -EINVAL;
>>>> +   }
>>>> +
>>>>     return dev_queue_xmit(skb);
>>>
>>> This is expensive and ridiculous to do for every TX frame.
>>>
>>> You'll need to find another way.
>>
>> The packet is directed here from the IP layer via the custom dst, so
>> there is no L2 header on the skb. So while the push and pop of the
>> header seems silly it is part and parcel of the feature to run tcpdump
>> on the VRF device. I don't see how it could be done any other way.
>
> You're losing a significant optimization on the transmit path by not
> using the neighbour table entry hard header cache.
>
> That's what I want you to fix.
>
> See dst_neigh_output() and in particular neigh_hh_output().



I'm sure you'll correct me if I am wrong ...

For VRF device we don't need dst_neigh_output or neigh_hh_output or a 
neighbor cache. The packet never hits a wire with the VRF device header; 
it just hits tcpdump and then recirculates in the stack, i.e., the vrf 
device xmit just hides the eth header via the skb_pull and recirculates 
the packet back in the stack with the dst pointing to the real device. 
That's just the game for tc, netfilter, tcpdump to work with the VRF device.


As such all we need is to push an eth header to the front of the skb for 
1 loop through the stack and eth_header via dev_hard_header with NULL 
daddr is the simplest path to accomplish that. Any other path is just 
extra overhead.


David
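
To make the push/pull sequence described above concrete, here is a toy
userspace model. It is a sketch under stated assumptions, not the kernel's
skb API: toy_buf, toy_init(), toy_push_eth() and toy_pull_eth() are
hypothetical names, and zeroing the destination address when none is known
is a choice made for this example only.

```c
#include <string.h>
#include <stddef.h>

#define TOY_ETH_HLEN 14	/* 6 bytes dst + 6 bytes src + 2 bytes ethertype */

struct toy_buf {
	unsigned char store[128];
	unsigned char *data;	/* start of the current packet data */
	size_t len;		/* bytes of packet data at data */
};

/* Place an L3 payload in the buffer, leaving headroom bytes in front. */
static int toy_init(struct toy_buf *b, const void *payload, size_t len,
		    size_t headroom)
{
	if (headroom > sizeof(b->store) || len > sizeof(b->store) - headroom)
		return -1;
	b->data = b->store + headroom;
	memcpy(b->data, payload, len);
	b->len = len;
	return 0;
}

/* Prepend an Ethernet header so a packet tap sees a full L2 frame. */
static int toy_push_eth(struct toy_buf *b, const unsigned char *src,
			unsigned short proto)
{
	if ((size_t)(b->data - b->store) < TOY_ETH_HLEN)
		return -1;	/* not enough headroom */
	b->data -= TOY_ETH_HLEN;
	b->len += TOY_ETH_HLEN;
	memset(b->data, 0, 6);		/* no destination known in this sketch */
	memcpy(b->data + 6, src, 6);
	b->data[12] = (unsigned char)(proto >> 8);	/* network byte order */
	b->data[13] = (unsigned char)(proto & 0xff);
	return 0;
}

/* Hide the header again before the packet goes back into the stack. */
static void toy_pull_eth(struct toy_buf *b)
{
	b->data += TOY_ETH_HLEN;
	b->len -= TOY_ETH_HLEN;
}
```

The order in the model mirrors the description: push the header, let the tap
see the frame, pull the header, then hand the unchanged payload onward.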


Re: [RFC PATCH] smsc911x: Ignore error return from device_get_phy_mode()

2015-08-26 Thread Jeremy Linton

On 08/26/2015 01:49 PM, Guenter Roeck wrote:

> Check the return value from device_property_read_u32() to see if there
> is a suitable firmware interface to read the data, and abort if not.
> The function should return -ENXIO in that case; however, it returns
> -ENODATA. Check for both.
>
> Fixes: 62ee783bf1f8 ("smsc911x: Fix crash seen if neither ACPI nor OF is configured or used")
> Signed-off-by: Guenter Roeck
> ---
> Needs testing. RFC because I am not sure if the -ENODATA check is acceptable.


I'm not really sure about it myself. I can think of cases where it might 
cause problems. That said it does work in an ACPI environment with or 
without the _DSD block. If the DSD/property isn't set, obviously the 
device doesn't configure (but it doesn't crash either) so that is good 
and an overall improvement for ACPI.


Also, I personally might have hoisted the reg-io-width ahead of the 
device_get_phy_mode() and removed the phy checks, but I don't imagine 
there is much functional difference at this point.


Tested-by: Jeremy Linton 





