date:20150825

Re: [PATCH net-next v2] bridge: vlan: allow to suppress local mac install for all vlans

2015-08-25 Thread B Viswanath

>>
>> I'd rather we fix the essence of the scalability problem than add
>> more spaghetti code to the various bridge paths.
>>
>> Can we make the fdb entries smaller?
>>
>> Can we enhance how we store such local entries such that they live in
>> a compact datastructure?  Perhaps the FDB can consist of a very dense
>> lookup mechanism for local stuff sitting alongside the current table.
>
> Certainly, that should be done and I will look into it, but the essence of
> this patch is a bit different. The problem here is not the size of the fdb
> entries, it’s more the number of them - having 96000 entries (even if they
> were 1 byte ones) is just way too much especially when the fdb hash size is
> small and static. We could work on making it dynamic though, but still these
> type of local entries per vlan per port can easily be avoided with this
> option.
>

I was wondering if it is possible to assign a vlan bitmap for the FDB
entry, instead of replicating the entry for each vlan. (I believe
Roopa has done something similar, but I'm not so sure.) This means that
the number of FDB entries remains static for any number of vlans.

I guess it's more complicated than it sounds, but just wanted to know
if it's feasible at all.

Thanks
Vissu

>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [patch net-next 2/3] mlxsw: expose EMAD transactions statistics via debugfs

2015-08-25 Thread David Miller

From: Jiri Pirko 
Date: Wed, 26 Aug 2015 07:52:15 +0200

> They are simple statistics. But they do not fit into any existing
> interface. This is about EMAD packets. They are not per-netdevice, but
> per-pcidevice. So I cannot put them into ethtool.
> 
> I see no other iface to expose this other than debugfs. Please suggest
> some other way, I don't see it :/

Then create one, instead of crapping up the driver with debugfs
craziness.

>>I'm not applying this, and I'm really getting irritated about how much
>>garbage people put into debugfs when it has _NO_ business being there.
> 
> I think that is the primary purpose of this iface, to put arbitrary
> debugging garbage there. Am I missing something?

It's not garbage if it's useful for someone.

If it's not useful, why even bother?

This is why I hate debugfs, it's a fundamentally flawed facility.

Re: [PATCH net-next v2] bridge: vlan: allow to suppress local mac install for all vlans

2015-08-25 Thread David Miller

From: Nikolay Aleksandrov 
Date: Tue, 25 Aug 2015 22:28:16 -0700

> Certainly, that should be done and I will look into it, but the
> essence of this patch is a bit different. The problem here is not
> the size of the fdb entries, it’s more the number of them - having
> 96000 entries (even if they were 1 byte ones) is just way too much
> especially when the fdb hash size is small and static. We could work
> on making it dynamic though, but still these type of local entries
> per vlan per port can easily be avoided with this option.

96000 bits can be stored in 12k.  Get where I'm going with this?

Look at the problem sideways.
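
To make the arithmetic concrete: 48 ports x 2000 vlans is 96000 (port, vlan)
pairs, and at one bit per pair that is 96000 / 8 = 12000 bytes. Below is a
minimal userspace sketch of such a dense table. It is not bridge code; the
names local_table, local_index() and the 0-based vlan_idx (an index into the
configured vlans, not the VID itself) are made up for illustration. It only
shows that "is this MAC local for this port in this vlan" can be a bit test
instead of a hash entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define N_PORTS 48u
#define N_VLANS 2000u
#define N_BITS  (N_PORTS * N_VLANS)	/* 96000 (port, vlan) pairs */
#define N_BYTES ((N_BITS + 7u) / 8u)	/* 12000 bytes */

/* Hypothetical dense table: one bit per (port, vlan index) pair. */
struct local_table {
	uint8_t bits[N_BYTES];
};

/* Flatten (port, vlan index) into a bit position. */
static inline unsigned int local_index(unsigned int port, unsigned int vlan_idx)
{
	assert(port < N_PORTS && vlan_idx < N_VLANS);
	return port * N_VLANS + vlan_idx;
}

/* Mark the pair as having a local entry. */
static inline void local_set(struct local_table *t, unsigned int port,
			     unsigned int vlan_idx)
{
	unsigned int i = local_index(port, vlan_idx);

	t->bits[i / 8u] |= (uint8_t)(1u << (i % 8u));
}

/* Query the pair; this replaces a per-vlan hash lookup for local MACs. */
static inline bool local_test(const struct local_table *t, unsigned int port,
			      unsigned int vlan_idx)
{
	unsigned int i = local_index(port, vlan_idx);

	return (t->bits[i / 8u] >> (i % 8u)) & 1u;
}
```

The table size is fixed by the port and vlan counts, so it does not grow as
vlans are configured on more ports.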

Re: [patch net-next 2/3] mlxsw: expose EMAD transactions statistics via debugfs

2015-08-25 Thread Jiri Pirko

Tue, Aug 25, 2015 at 11:25:21PM CEST, da...@davemloft.net wrote:
>From: Jiri Pirko 
>Date: Mon, 24 Aug 2015 16:45:46 +0200
>
>> From: Jiri Pirko 
>> 
>> Signed-off-by: Jiri Pirko 
>> Signed-off-by: Ido Schimmel 
>> Signed-off-by: Elad Raz 
>
>Enough with this debugfs madness.
>
>Expose this stuff through standard interfaces.
>
>They are simple statistics for crying out loud!

They are simple statistics. But they do not fit into any existing
interface. This is about EMAD packets. They are not per-netdevice, but
per-pcidevice. So I cannot put them into ethtool.

I see no other iface to expose this other than debugfs. Please suggest
some other way, I don't see it :/

Thanks.

>
>I'm not applying this, and I'm really getting irritated about how much
>garbage people put into debugfs when it has _NO_ business being there.

I think that is the primary purpose of this iface, to put arbitrary
debugging garbage there. Am I missing something?

>
>Sorry.

Re: [PATCH net-next 0/9] DSA port configuration and status

2015-08-25 Thread Andrew Lunn

> It looks to me like there will be at least one more revision to this
> series, so I'm not applying this version.

Correct, and thanks.

 Andrew

Re: [PATCH net-next] macvtap/macvlan: use IFF_NO_QUEUE

2015-08-25 Thread Jason Wang



On 08/26/2015 12:32 AM, Vlad Yasevich wrote:
> On 08/25/2015 07:30 AM, Jason Wang wrote:
>>
>> On 08/25/2015 06:17 PM, Michael S. Tsirkin wrote:
>>> On Mon, Aug 24, 2015 at 04:33:12PM +0800, Jason Wang wrote:
> For macvlan, switch to use IFF_NO_QUEUE instead of tx_queue_len = 0.
>
> For macvtap, after commit 6acf54f1cf0a6747bac9fea26f34cfc5a9029523
> ("macvtap: Add support of packet capture on macvtap
> device."), multiqueue macvtap suffers from single qdisc lock
> contention. This is because macvtap claims a non-zero tx_queue_len and
> reuses this value as its socket receive queue size. Thanks to
> IFF_NO_QUEUE, we can remove the lock contention without breaking
> existing socket receive queue length logic.
>
> Cc: Patrick McHardy 
> Cc: Vladislav Yasevich 
> Cc: Michael S. Tsirkin 
> Signed-off-by: Jason Wang 
>>> Seems to make sense. Give me a day or two to get over the jet lag
>>> (and get out from under the pile of mail accumulated while I was traveling),
>>> I'll review properly and ack.
>>>
>> A note on this patch: only the default qdisc was removed, but we don't lose
>> the ability to attach a qdisc to macvtap (though it may cause lock
>> contention in the multiqueue case).
>>
> Wouldn't that lock contention be solved if we really had multiple queues
> for multi-queue macvtaps?
>
> -vlad

Yes, but doesn't this introduce another layer of txq lock contention? And it
also needs macvlan multiqueue support. We used to do something like this
but finally switched to NETIF_F_LLTX. You may refer to:

2c11455321f37da6fe6cc36353149f9ac9183334 macvlan: add multiqueue capability
8ffab51b3dfc54876f145f15b351c41f3f703195 macvlan: lockless tx path


Re: [PATCH net-next v2] bridge: vlan: allow to suppress local mac install for all vlans

2015-08-25 Thread Nikolay Aleksandrov

> On Aug 25, 2015, at 5:56 PM, Stephen Hemminger wrote:
> 
> On Tue, 25 Aug 2015 17:34:55 -0700
> Nikolay Aleksandrov  wrote:
> 
>> From: Nikolay Aleksandrov 
>> 
>> This patch adds a new knob that, when enabled, allows to suppress the
>> installation of local fdb entries in newly created vlans. This could
>> pose a big scalability issue if we have a large number of ports and a
>> large number of vlans, e.g. in a 48 port device with 2000 vlans these
>> entries easily go up to 96000.
>> Note that packets for these macs are still received properly because they
>> are added in vlan 0 as "own" macs and referenced when fdb lookup by vlan
>> results in a miss.
>> Also note that vlan membership of ingress port and the bridge device
>> as egress are still being correctly enforced.
>> 
>> The default (0/off) is keeping the current behaviour.
>> 
>> Based on a patch by Wilson Kok (w...@cumulusnetworks.com).
> 
> 
> This is getting messy, but then again the bridge seems to have become
> a ghetto for a long time. I would rather see the lookup code fixed so
> that the fdb was correct.

What do you mean by "it is getting messy"? The entries (normally) are being
added to each vlan, so there’s not much in terms of lookup that you can fix
except making the table bigger/better, but that will only be a temporary win.
If you elaborate on what you mean by the fdb code being fixed, I could spend
time and work on fixing it. If it is resizing the table so it can handle 96k
entries and probably using the rhashtable, that is what I have in mind too.
I still think that it would be nice to have the option to avoid adding the 96k
entries in the first place, so that space could be better utilized by real
ones, which is what this option does.
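
The fallback described in the quoted patch description (a miss in the
per-vlan lookup, then a check against the "own" macs kept in vlan 0) can be
sketched in userspace roughly as below. This is an illustration under
assumptions, not the bridge implementation: the linear scan stands in for the
fdb hash, and struct fdb_sketch and both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified fdb entry. */
struct fdb_sketch {
	uint8_t  mac[6];
	uint16_t vid;
	bool     is_local;
};

/* Exact (mac, vid) match; a linear scan stands in for the hash lookup. */
static const struct fdb_sketch *fdb_find(const struct fdb_sketch *tbl, size_t n,
					 const uint8_t mac[6], uint16_t vid)
{
	assert(tbl != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (tbl[i].vid == vid && memcmp(tbl[i].mac, mac, 6) == 0)
			return &tbl[i];
	return NULL;
}

/* On a per-vlan miss, fall back to local entries stored under vlan 0. */
static const struct fdb_sketch *fdb_find_fallback(const struct fdb_sketch *tbl,
						  size_t n, const uint8_t mac[6],
						  uint16_t vid)
{
	const struct fdb_sketch *e = fdb_find(tbl, n, mac, vid);

	if (e)
		return e;
	e = fdb_find(tbl, n, mac, 0);
	return (e && e->is_local) ? e : NULL;
}
```

With this shape a single vlan 0 entry per port answers for every vlan, at the
cost of a second lookup on the miss path.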


Re: [PATCH net-next v2] bridge: vlan: allow to suppress local mac install for all vlans

2015-08-25 Thread Nikolay Aleksandrov


> On Aug 25, 2015, at 7:42 PM, David Miller  wrote:
> 
> From: Nikolay Aleksandrov 
> Date: Tue, 25 Aug 2015 17:34:55 -0700
> 
>> From: Nikolay Aleksandrov 
>> 
>> This patch adds a new knob that, when enabled, allows to suppress the
>> installation of local fdb entries in newly created vlans. This could
>> pose a big scalability issue if we have a large number of ports and a
>> large number of vlans, e.g. in a 48 port device with 2000 vlans these
>> entries easily go up to 96000.
>> Note that packets for these macs are still received properly because they
>> are added in vlan 0 as "own" macs and referenced when fdb lookup by vlan
>> results in a miss.
>> Also note that vlan membership of ingress port and the bridge device
>> as egress are still being correctly enforced.
>> 
>> The default (0/off) is keeping the current behaviour.
>> 
>> Based on a patch by Wilson Kok (w...@cumulusnetworks.com).
>> 
>> Signed-off-by: Nikolay Aleksandrov 
>> ---
>> v2: Triple checked the timezone
> 
> I'd rather we fix the essence of the scalability problem than add
> more spaghetti code to the various bridge paths.
> 
> Can we make the fdb entries smaller?
> 
> Can we enhance how we store such local entries such that they live in
> a compact datastructure?  Perhaps the FDB can consist of a very dense
> lookup mechanism for local stuff sitting alongside the current table.

Certainly, that should be done and I will look into it, but the essence of
this patch is a bit different. The problem here is not the size of the fdb
entries, it’s more the number of them - having 96000 entries (even if they
were 1 byte ones) is just way too much especially when the fdb hash size is
small and static. We could work on making it dynamic though, but still these
type of local entries per vlan per port can easily be avoided with this
option.



Re: [PATCH RFC 0/5] net: L2 only interfaces

2015-08-25 Thread Marcel Holtmann

Hi Dave,

>> This patch series implements a L2 only interface concept which
>> basically denies any kind of IP address configuration on these
>> interfaces, but still allows them to be used as configuration
>> end-points to keep using ethtool and friends.
>> 
>> A cleaner approach might be to finally come up with the concept of
>> net_port which a net_device would be a superset of, but this still
>> raises tons of questions as to whether we should be modifying
>> userland tools to be able to configure/query these
>> interfaces. During all the switch talks/discussions last year, it
>> seemed to me like the L2-only interface is the closest we have to a
>> "network port".
>> 
>> Comments, flames, flying tomatoes welcome!
> 
> Interesting, indeed.
> 
> Do you plan to extend this to defining a more minimal network device
> sub-type as well?
> 
> Then we can pass "net_device_common" or whatever around as a common
> base type of actual net device "implementations".
> 
> Or is your main goal just getting the L2-only semantic?

The other end of this could also be an IP-only net_device where we do not
have ethtool semantics.

We do have a need for an IPv6-only net_device when utilizing ARPHRD_6LOWPAN
for 802.15.4 and Bluetooth LE. Skipping in_dev initialization there might be
an interesting step towards that. Not sure how entangled in_dev and in6_dev
still are. If it works for IFF_L2_ONLY, it might also work in the other
direction.

Regards

Marcel


Re: [PATCH net] netlink: mmap: fix status setting in skb destructor

2015-08-25 Thread David Miller

From: Ken-ichirou MATSUZAWA 
Date: Thu, 20 Aug 2015 16:07:33 +0900

> I don't know the intention of setting VALID status in the skb
> destructor. But I think it needs to be set to UNUSED status in case of
> error, then release the skb, or the rx ring might be filled with RESERVED
> frames.
> 
> Signed-off-by: Ken-ichirou MATSUZAWA 

I think the idea is to have the user process this "zero length" frame
and advance the status itself.

I think it is probably racy and problematic to have the kernel set a
frame's state to UNUSED.  It is not a valid state transition for the
kernel side of RX ring processing.

Only the user can safely release ring entries back to the kernel.


Re: [PATCH net] netlink: rx mmap: fix POLLIN condition

2015-08-25 Thread David Miller

From: Ken-ichirou MATSUZAWA 
Date: Thu, 20 Aug 2015 14:54:47 +0900

> Now poll() returns immediately after setting the kernel current frame
> (ring->head) to SKIP from user space even if there are no new
> frames. And in the case where all frames are VALID, if a user space program
> unintentionally sets (only) the kernel current frame to UNUSED and then
> calls poll(), it will not return immediately even though there are
> VALID frames.
> 
> To avoid situations like the above, I think we need to scan all frames
> to find a VALID frame at poll(), like netlink_alloc_skb() and
> netlink_forward_ring() do to find an UNUSED frame at skb allocation.
> 
> Signed-off-by: Ken-ichirou MATSUZAWA 

There seem to be a few issues here.

Taking a look at netlink_forward_ring(), it appears buggy.

static void netlink_forward_ring(struct netlink_ring *ring)
{
	unsigned int head = ring->head, pos = head;
	const struct nl_mmap_hdr *hdr;

	do {
		hdr = __netlink_lookup_frame(ring, pos);
		if (hdr->nm_status == NL_MMAP_STATUS_UNUSED)
			break;
		if (hdr->nm_status != NL_MMAP_STATUS_SKIP)
			break;
		netlink_increment_head(ring);
	} while (ring->head != head);
}

No matter what any of this code does, __netlink_lookup_frame() is always
called with the same "pos" value.

So, as far as I can tell, it will look at the same ring entry header over
and over again, every time through this loop.

netlink_increment_head() changes ring->head, but this has no influence
upon the calculations made inside of __netlink_lookup_frame().

So if netlink_forward_ring() _actually_ sees an entry that we should
advance past, it will cycle through the whole ring, advancing ring->head
until the "ring->head != head" loop test fails.

We should definitely fix this bug first.

As per your patch, I wonder if a backwards scan would be faster.
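
For reference, here is a small userspace model of the loop with the lookup
following the moving head instead of the fixed "pos". It is only a sketch of
one possible shape of the fix, not necessarily what should be merged: the
ring is reduced to an array of statuses, and ring_sketch, ring_forward() and
the FRAME_* values are hypothetical stand-ins for the netlink types. The two
status checks of the original collapse into a single "not SKIP" test here.

```c
#include <assert.h>
#include <stddef.h>

enum frame_status { FRAME_UNUSED, FRAME_RESERVED, FRAME_VALID, FRAME_SKIP };

/* Hypothetical, reduced model of the ring: statuses only. */
struct ring_sketch {
	enum frame_status *frames;
	unsigned int frame_max;	/* index of the last frame */
	unsigned int head;
};

/* Step head forward by one frame, wrapping at the end of the ring. */
static void ring_increment_head(struct ring_sketch *ring)
{
	if (ring->head != ring->frame_max)
		ring->head++;
	else
		ring->head = 0;
}

/*
 * Advance head past SKIP frames. Stops at the first frame that is not
 * SKIP, or after one full lap if every frame is SKIP.
 */
static void ring_forward(struct ring_sketch *ring)
{
	unsigned int start = ring->head;

	assert(ring->frames != NULL && ring->head <= ring->frame_max);
	do {
		/* Look at the frame under the *current* head on every pass. */
		if (ring->frames[ring->head] != FRAME_SKIP)
			break;
		ring_increment_head(ring);
	} while (ring->head != start);
}
```

With frames { SKIP, SKIP, UNUSED, VALID } and head at 0, the loop stops with
head at 2; with the fixed "pos" it would keep re-reading frame 0 and run the
full lap.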

[PATCH v2 net-next 0/5] act_bpf: remove spinlock in fast path

2015-08-25 Thread Alexei Starovoitov

v1 version had a race condition in cleanup path of bpf_prog.
I tried to fix it by adding new callback 'cleanup_rcu' to 'struct tcf_common'
and call it out of act_api cleanup path, but Daniel noticed
(thanks for the idea!) that most of the classifiers already do action cleanup
out of rcu callback.
So instead this set of patches converts tcindex and rsvp classifiers to call
tcf_exts_destroy() after rcu grace period and since action cleanup logic
in __tcf_hash_release() is only called when bind and refcnt go to zero,
it's guaranteed that cleanup() callback is called from rcu callback.
More specifically:
patches 1 and 2 - simple fixes
patches 3 and 4 - convert tcf_exts_destroy in tcindex and rsvp to call_rcu
patch 5 - removes spin_lock from act_bpf

The cleanup of actions is now universally done after rcu grace period
and in the future we can drop (now unnecessary) call_rcu from tcf_hash_destroy().
Patch 5 is using synchronize_rcu() in act_bpf replacement path, since it's
very rare and the alternative of dynamically allocating 'struct tcf_bpf_cfg' just
to pass it to call_rcu looks even less appealing.

Alexei Starovoitov (5):
  net_sched: make tcf_hash_destroy() static
  net_sched: act_bpf: remove unnecessary copy
  net_sched: convert tcindex to call tcf_exts_destroy from rcu callback
  net_sched: convert rsvp to call tcf_exts_destroy from rcu callback
  net_sched: act_bpf: remove spinlock in fast path

 include/net/act_api.h   |1 -
 include/net/tc_act/tc_bpf.h |2 +-
 net/sched/act_api.c |3 +--
 net/sched/act_bpf.c |   38 --
 net/sched/cls_rsvp.h|   18 ++
 net/sched/cls_tcindex.c |   29 +
 6 files changed, 61 insertions(+), 30 deletions(-)

-- 
1.7.9.5


[PATCH v2 net-next 3/5] net_sched: convert tcindex to call tcf_exts_destroy from rcu callback

2015-08-25 Thread Alexei Starovoitov

Adjust destroy path of cls_tcindex to call tcf_exts_destroy() after
rcu grace period.

Signed-off-by: Alexei Starovoitov 
---
 net/sched/cls_tcindex.c |   29 +
 1 file changed, 25 insertions(+), 4 deletions(-)

diff --git a/net/sched/cls_tcindex.c b/net/sched/cls_tcindex.c
index a557dbaf5afe..944c8ff45055 100644
--- a/net/sched/cls_tcindex.c
+++ b/net/sched/cls_tcindex.c
@@ -27,6 +27,7 @@
 struct tcindex_filter_result {
struct tcf_exts exts;
struct tcf_result   res;
+   struct rcu_head rcu;
 };
 
 struct tcindex_filter {
@@ -133,8 +134,23 @@ static int tcindex_init(struct tcf_proto *tp)
return 0;
 }
 
-static int
-tcindex_delete(struct tcf_proto *tp, unsigned long arg)
+static void tcindex_destroy_rexts(struct rcu_head *head)
+{
+   struct tcindex_filter_result *r;
+
+   r = container_of(head, struct tcindex_filter_result, rcu);
+   tcf_exts_destroy(&r->exts);
+}
+
+static void tcindex_destroy_fexts(struct rcu_head *head)
+{
+   struct tcindex_filter *f = container_of(head, struct tcindex_filter, rcu);
+
+   tcf_exts_destroy(&f->result.exts);
+   kfree(f);
+}
+
+static int tcindex_delete(struct tcf_proto *tp, unsigned long arg)
 {
struct tcindex_data *p = rtnl_dereference(tp->root);
struct tcindex_filter_result *r = (struct tcindex_filter_result *) arg;
@@ -162,9 +178,14 @@ found:
rcu_assign_pointer(*walk, rtnl_dereference(f->next));
}
tcf_unbind_filter(tp, &r->res);
-   tcf_exts_destroy(&r->exts);
+   /* all classifiers are required to call tcf_exts_destroy() after rcu
+* grace period, since converted-to-rcu actions are relying on that
+* in cleanup() callback
+*/
if (f)
-   kfree_rcu(f, rcu);
+   call_rcu(&f->rcu, tcindex_destroy_fexts);
+   else
+   call_rcu(&r->rcu, tcindex_destroy_rexts);
return 0;
 }
 
-- 
1.7.9.5
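
The pattern the patch relies on (embed a callback head in the object, hand
the head to a deferred callback, recover the object with container_of()) can
be modelled in userspace as below. This is a sketch only and is NOT RCU:
there is no grace period, the pending list is simply drained by hand, and
cb_head, defer_call(), run_deferred() and filter_sketch are hypothetical
names standing in for rcu_head, call_rcu() and the classifier structs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Stand-in for struct rcu_head. */
struct cb_head {
	struct cb_head *next;
	void (*func)(struct cb_head *head);
};

static struct cb_head *pending;	/* callbacks queued but not yet run */

/* Stand-in for call_rcu(): queue the callback, do not run it yet. */
static void defer_call(struct cb_head *head, void (*func)(struct cb_head *))
{
	assert(head != NULL && func != NULL);
	head->func = func;
	head->next = pending;
	pending = head;
}

/* Stand-in for "the grace period has elapsed": run everything queued. */
static void run_deferred(void)
{
	while (pending) {
		struct cb_head *head = pending;

		pending = head->next;	/* unlink before the callback frees it */
		head->func(head);
	}
}

struct filter_sketch {
	int exts;		/* stands in for struct tcf_exts */
	struct cb_head cb;	/* embedded head, like "struct rcu_head rcu" */
};

static int destroyed;	/* counts completed teardowns, for demonstration */

/* Deferred teardown: recover the enclosing object from the embedded head. */
static void filter_destroy_cb(struct cb_head *head)
{
	struct filter_sketch *f = container_of(head, struct filter_sketch, cb);

	destroyed++;	/* where tcf_exts_destroy(&f->exts) would run */
	free(f);
}
```

The point of the shape is that teardown and the final free both happen inside
the callback, so nothing touches the object between queueing and the callback.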


[PATCH v2 net-next 5/5] net_sched: act_bpf: remove spinlock in fast path

2015-08-25 Thread Alexei Starovoitov

Similar to act_gact/act_mirred, act_bpf can be lockless in packet processing
with extra care taken to free bpf programs after rcu grace period.
Replacement of existing act_bpf (very rare) is done with synchronize_rcu()
and final destruction is done from tc_action_ops->cleanup() callback that is
called from tcf_exts_destroy()->tcf_action_destroy()->__tcf_hash_release() when
bind and refcnt reach zero which is only possible when classifier is destroyed.
Previous two patches fixed the last two classifiers (tcindex and rsvp) to
call tcf_exts_destroy() from rcu callback.

Similar to gact/mirred there is a race between prog->filter and
prog->tcf_action. Meaning that the program being replaced may use
previous default action if it happened to return TC_ACT_UNSPEC.
act_mirred race between tcf_action and tcfm_dev is similar.
In all cases the race is harmless.
Long term we may want to improve the situation by replacing the whole
tc_action->priv as single pointer instead of updating inner fields one by one.
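
As a rough illustration of that long-term idea, the sketch below publishes
the (filter, default action) pair through one pointer so a reader always sees
a matching pair. It is a userspace model with C11 atomics standing in for
rcu_assign_pointer()/rcu_dereference(); it is not what this patch does, the
struct and function names are hypothetical, and freeing the old pair after
readers are done is left to the caller.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical bundle of the fields that must change together. */
struct bpf_cfg_sketch {
	int filter_id;		/* stands in for struct bpf_prog *filter */
	int default_action;	/* stands in for tcf_action */
};

struct act_sketch {
	_Atomic(struct bpf_cfg_sketch *) cfg;
};

/*
 * Writer: swap in a complete new pair with one store. Returns the old
 * pair so the caller can free it once no reader can still hold it.
 */
static struct bpf_cfg_sketch *act_replace(struct act_sketch *a,
					  struct bpf_cfg_sketch *new_cfg)
{
	assert(new_cfg != NULL);
	return atomic_exchange_explicit(&a->cfg, new_cfg,
					memory_order_acq_rel);
}

/* Reader: a single load yields a consistent (filter, action) snapshot. */
static const struct bpf_cfg_sketch *act_snapshot(struct act_sketch *a)
{
	return atomic_load_explicit(&a->cfg, memory_order_acquire);
}
```

A reader that takes one snapshot per packet can never pair the old filter
with the new default action, which is the harmless race described above.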

Signed-off-by: Alexei Starovoitov 
---
 include/net/tc_act/tc_bpf.h |2 +-
 net/sched/act_bpf.c |   36 +++-
 2 files changed, 20 insertions(+), 18 deletions(-)

diff --git a/include/net/tc_act/tc_bpf.h b/include/net/tc_act/tc_bpf.h
index a152e9858b2c..958d69cfb19c 100644
--- a/include/net/tc_act/tc_bpf.h
+++ b/include/net/tc_act/tc_bpf.h
@@ -15,7 +15,7 @@
 
 struct tcf_bpf {
struct tcf_common   common;
-   struct bpf_prog *filter;
+   struct bpf_prog __rcu   *filter;
union {
u32 bpf_fd;
u16 bpf_num_ops;
diff --git a/net/sched/act_bpf.c b/net/sched/act_bpf.c
index 458cf647e698..559bfa011bda 100644
--- a/net/sched/act_bpf.c
+++ b/net/sched/act_bpf.c
@@ -37,25 +37,24 @@ static int tcf_bpf(struct sk_buff *skb, const struct tc_action *act,
   struct tcf_result *res)
 {
struct tcf_bpf *prog = act->priv;
+   struct bpf_prog *filter;
int action, filter_res;
bool at_ingress = G_TC_AT(skb->tc_verd) & AT_INGRESS;
 
if (unlikely(!skb_mac_header_was_set(skb)))
return TC_ACT_UNSPEC;
 
-   spin_lock(&prog->tcf_lock);
-
-   prog->tcf_tm.lastuse = jiffies;
-   bstats_update(&prog->tcf_bstats, skb);
+   tcf_lastuse_update(&prog->tcf_tm);
+   bstats_cpu_update(this_cpu_ptr(prog->common.cpu_bstats), skb);
 
-   /* Needed here for accessing maps. */
rcu_read_lock();
+   filter = rcu_dereference(prog->filter);
if (at_ingress) {
__skb_push(skb, skb->mac_len);
-   filter_res = BPF_PROG_RUN(prog->filter, skb);
+   filter_res = BPF_PROG_RUN(filter, skb);
__skb_pull(skb, skb->mac_len);
} else {
-   filter_res = BPF_PROG_RUN(prog->filter, skb);
+   filter_res = BPF_PROG_RUN(filter, skb);
}
rcu_read_unlock();
 
@@ -77,7 +76,7 @@ static int tcf_bpf(struct sk_buff *skb, const struct tc_action *act,
break;
case TC_ACT_SHOT:
action = filter_res;
-   prog->tcf_qstats.drops++;
+   qstats_drop_inc(this_cpu_ptr(prog->common.cpu_qstats));
break;
case TC_ACT_UNSPEC:
action = prog->tcf_action;
@@ -87,7 +86,6 @@ static int tcf_bpf(struct sk_buff *skb, const struct tc_action *act,
break;
}
 
-   spin_unlock(&prog->tcf_lock);
return action;
 }
 
@@ -263,7 +261,10 @@ static void tcf_bpf_prog_fill_cfg(const struct tcf_bpf *prog,
  struct tcf_bpf_cfg *cfg)
 {
cfg->is_ebpf = tcf_bpf_is_ebpf(prog);
-   cfg->filter = prog->filter;
+   /* updates to prog->filter are prevented, since it's called either
+* with rtnl lock or during final cleanup in rcu callback
+*/
+   cfg->filter = rcu_dereference_protected(prog->filter, 1);
 
cfg->bpf_ops = prog->bpf_ops;
cfg->bpf_name = prog->bpf_name;
@@ -294,7 +295,7 @@ static int tcf_bpf_init(struct net *net, struct nlattr *nla,
 
if (!tcf_hash_check(parm->index, act, bind)) {
ret = tcf_hash_create(parm->index, est, act,
- sizeof(*prog), bind, false);
+ sizeof(*prog), bind, true);
if (ret < 0)
return ret;
 
@@ -325,7 +326,7 @@ static int tcf_bpf_init(struct net *net, struct nlattr *nla,
goto out;
 
prog = to_bpf(act);
-   spin_lock_bh(&prog->tcf_lock);
+   ASSERT_RTNL();
 
if (res != ACT_P_CREATED)
tcf_bpf_prog_fill_cfg(prog, &old);
@@ -339,14 +340,15 @@ static int tcf_bpf_init(struct net *net, struct nlattr *nla,
prog->bpf_fd = cfg.bpf_fd;
 
prog->tcf_action = parm->action;
-   prog->filter = cfg.filter;
-
-   spin_unlock_bh(&prog->tcf_l

[PATCH v2 net-next 1/5] net_sched: make tcf_hash_destroy() static

2015-08-25 Thread Alexei Starovoitov

tcf_hash_destroy() is used only once. Make it static.

Signed-off-by: Alexei Starovoitov 
---
 include/net/act_api.h |1 -
 net/sched/act_api.c   |3 +--
 2 files changed, 1 insertion(+), 3 deletions(-)

diff --git a/include/net/act_api.h b/include/net/act_api.h
index 4519c81304bd..9d446f136607 100644
--- a/include/net/act_api.h
+++ b/include/net/act_api.h
@@ -111,7 +111,6 @@ struct tc_action_ops {
 };
 
 int tcf_hash_search(struct tc_action *a, u32 index);
-void tcf_hash_destroy(struct tc_action *a);
 u32 tcf_hash_new_index(struct tcf_hashinfo *hinfo);
 int tcf_hash_check(u32 index, struct tc_action *a, int bind);
 int tcf_hash_create(u32 index, struct nlattr *est, struct tc_action *a,
diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index b087087ccfa9..06e7c4a37245 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -36,7 +36,7 @@ static void free_tcf(struct rcu_head *head)
kfree(p);
 }
 
-void tcf_hash_destroy(struct tc_action *a)
+static void tcf_hash_destroy(struct tc_action *a)
 {
struct tcf_common *p = a->priv;
struct tcf_hashinfo *hinfo = a->ops->hinfo;
@@ -52,7 +52,6 @@ void tcf_hash_destroy(struct tc_action *a)
 */
call_rcu(&p->tcfc_rcu, free_tcf);
 }
-EXPORT_SYMBOL(tcf_hash_destroy);
 
 int __tcf_hash_release(struct tc_action *a, bool bind, bool strict)
 {
-- 
1.7.9.5


[PATCH v2 net-next 2/5] net_sched: act_bpf: remove unnecessary copy

2015-08-25 Thread Alexei Starovoitov

Fix harmless typo and avoid unnecessary copy of empty 'prog' into
unused 'struct tcf_bpf_cfg old'.

Fixes: f4eaed28c783 ("act_bpf: fix memory leaks when replacing bpf programs")
Signed-off-by: Alexei Starovoitov 
---
 net/sched/act_bpf.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sched/act_bpf.c b/net/sched/act_bpf.c
index 1b97dabc621a..458cf647e698 100644
--- a/net/sched/act_bpf.c
+++ b/net/sched/act_bpf.c
@@ -327,7 +327,7 @@ static int tcf_bpf_init(struct net *net, struct nlattr *nla,
prog = to_bpf(act);
spin_lock_bh(&prog->tcf_lock);
 
-   if (ret != ACT_P_CREATED)
+   if (res != ACT_P_CREATED)
tcf_bpf_prog_fill_cfg(prog, &old);
 
prog->bpf_ops = cfg.bpf_ops;
-- 
1.7.9.5


[PATCH v2 net-next 4/5] net_sched: convert rsvp to call tcf_exts_destroy from rcu callback

2015-08-25 Thread Alexei Starovoitov

Adjust destroy path of cls_rsvp to call tcf_exts_destroy() after
rcu grace period.

Signed-off-by: Alexei Starovoitov 
---
 net/sched/cls_rsvp.h |   18 ++
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/net/sched/cls_rsvp.h b/net/sched/cls_rsvp.h
index 02fa82792dab..f9c9fc075fe6 100644
--- a/net/sched/cls_rsvp.h
+++ b/net/sched/cls_rsvp.h
@@ -283,12 +283,22 @@ static int rsvp_init(struct tcf_proto *tp)
return -ENOBUFS;
 }
 
-static void
-rsvp_delete_filter(struct tcf_proto *tp, struct rsvp_filter *f)
+static void rsvp_delete_filter_rcu(struct rcu_head *head)
 {
-   tcf_unbind_filter(tp, &f->res);
+   struct rsvp_filter *f = container_of(head, struct rsvp_filter, rcu);
+
tcf_exts_destroy(&f->exts);
-   kfree_rcu(f, rcu);
+   kfree(f);
+}
+
+static void rsvp_delete_filter(struct tcf_proto *tp, struct rsvp_filter *f)
+{
+   tcf_unbind_filter(tp, &f->res);
+   /* all classifiers are required to call tcf_exts_destroy() after rcu
+* grace period, since converted-to-rcu actions are relying on that
+* in cleanup() callback
+*/
+   call_rcu(&f->rcu, rsvp_delete_filter_rcu);
 }
 
 static bool rsvp_destroy(struct tcf_proto *tp, bool force)
-- 
1.7.9.5


Re: [PATCH net v2] sctp: donot reset the overall_error_count in SHUTDOWN_RECEIVE state

2015-08-25 Thread David Miller

From: Xin Long 
Date: Sun, 23 Aug 2015 19:30:15 +0800

Vlad et al., please review.

> commit f8d960524 fixed the 0 peer.rwnd issue in SHUTDOWN_PENDING state by
> not resetting the overall_error_count when receiving a heartbeat, but the same
> issue also exists in SHUTDOWN_RECEIVE state.
> 
> so we change the condition to state < SCTP_STATE_SHUTDOWN_PENDING to reset the
> overall_error_count when receiving a heartbeat, which prevents the issue from
> happening in SCTP_STATE_SHUTDOWN_RECEIVE.
> 
> as to SCTP_STATE_SHUTDOWN_ACK_SENT and SCTP_STATE_SHUTDOWN_SENT state, with
> this patch, they will not be affected by the heartbeat, because these two states
> are already taken care of by the t2 timer.
> 
> Fixes: f8d960524 ("sctp: Enforce retransmission limit during shutdown")
> Signed-off-by: Xin Long 
> ---
>  net/sctp/sm_sideeffect.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/sctp/sm_sideeffect.c b/net/sctp/sm_sideeffect.c
> index fef2acd..85e6f03 100644
> --- a/net/sctp/sm_sideeffect.c
> +++ b/net/sctp/sm_sideeffect.c
> @@ -702,7 +702,7 @@ static void sctp_cmd_transport_on(sctp_cmd_seq_t *cmds,
>* outstanding data and rely on the retransmission limit be reached
>* to shutdown the association.
>*/
> - if (t->asoc->state != SCTP_STATE_SHUTDOWN_PENDING)
> + if (t->asoc->state < SCTP_STATE_SHUTDOWN_PENDING)
>   t->asoc->overall_error_count = 0;
>  
>   /* Clear the hb_sent flag to signal that we had a good
> -- 
> 2.1.0
> 
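
The one-line change depends on the numeric order of the association states.
The sketch below shows which states reset the counter under the old "!="
test and under the new "<" test. The enum is assumed to mirror the order of
sctp_state_t as I understand it (please check include/net/sctp/constants.h);
the ST_* names and both helpers are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed to follow the ordering of sctp_state_t; verify before relying on it. */
enum state_sketch {
	ST_CLOSED,
	ST_COOKIE_WAIT,
	ST_COOKIE_ECHOED,
	ST_ESTABLISHED,
	ST_SHUTDOWN_PENDING,
	ST_SHUTDOWN_SENT,
	ST_SHUTDOWN_RECEIVED,
	ST_SHUTDOWN_ACK_SENT,
};

/* The "<" test only excludes every shutdown state if they all sort last. */
static_assert(ST_ESTABLISHED < ST_SHUTDOWN_PENDING,
	      "shutdown states must sort after ESTABLISHED");

/* Old condition: reset in every state except SHUTDOWN_PENDING. */
static inline bool reset_old(enum state_sketch s)
{
	return s != ST_SHUTDOWN_PENDING;
}

/* New condition: reset only in states before SHUTDOWN_PENDING. */
static inline bool reset_new(enum state_sketch s)
{
	return s < ST_SHUTDOWN_PENDING;
}
```

Under these assumptions the behaviour is unchanged up to and including
SHUTDOWN_PENDING, and the reset is additionally skipped in SHUTDOWN_SENT,
SHUTDOWN_RECEIVED and SHUTDOWN_ACK_SENT.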

Re: [PATCH 2/2] usbnet: Fix a race between usbnet_stop() and the BH

2015-08-25 Thread David Miller

From: Eugene Shatokhin 
Date: Mon, 24 Aug 2015 23:13:43 +0300

> The race may happen when a device (e.g. YOTA 4G LTE Modem) is
> unplugged while the system is downloading a large file from the Net.
> 
> Hardware breakpoints and Kprobes with delays were used to confirm that
> the race does actually happen.
> 
> The race is on skb_queue ('next' pointer) between usbnet_stop()
> and rx_complete(), which, in turn, calls usbnet_bh().
> 
> Here is a part of the call stack with the code where the changes to the
> queue happen. The line numbers are for the kernel 4.1.0:
 ...

It looks like this patch needs more discussion/work.


Re: [PATCH 1/2] usbnet: Get EVENT_NO_RUNTIME_PM bit before it is cleared

2015-08-25 Thread David Miller

From: Eugene Shatokhin 
Date: Mon, 24 Aug 2015 23:13:42 +0300

> It is needed to check EVENT_NO_RUNTIME_PM bit of dev->flags in
> usbnet_stop(), but its value should be read before it is cleared
> when dev->flags is set to 0.
> 
> The problem was spotted and the fix was provided by
> Oliver Neukum .
> 
> Signed-off-by: Eugene Shatokhin 

Applied and queued up for -stable.

Re: [PATCH net-next v2] bridge: vlan: allow to suppress local mac install for all vlans

2015-08-25 Thread David Miller

From: Nikolay Aleksandrov 
Date: Tue, 25 Aug 2015 17:34:55 -0700

> From: Nikolay Aleksandrov 
> 
> This patch adds a new knob that, when enabled, allows to suppress the
> installation of local fdb entries in newly created vlans. This could
> pose a big scalability issue if we have a large number of ports and a
> large number of vlans, e.g. in a 48 port device with 2000 vlans these
> entries easily go up to 96000.
> Note that packets for these macs are still received properly because they
> are added in vlan 0 as "own" macs and referenced when fdb lookup by vlan
> results in a miss.
> Also note that vlan membership of ingress port and the bridge device
> as egress are still being correctly enforced.
> 
> The default (0/off) is keeping the current behaviour.
> 
> Based on a patch by Wilson Kok (w...@cumulusnetworks.com).
> 
> Signed-off-by: Nikolay Aleksandrov 
> ---
> v2: Triple checked the timezone

I'd rather we fix the essence of the scalability problem than add
more spaghetti code to the various bridge paths.

Can we make the fdb entries smaller?

Can we enhance how we store such local entries such that they live in
a compact datastructure?  Perhaps the FDB can consist of a very dense
lookup mechanism for local stuff sitting alongside the current table.
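
One way to picture a compact store for local entries is a single record
per local MAC carrying a VLAN bitmap, so the entry count follows the
number of ports rather than ports times vlans. This is only a user-space
sketch under that assumption: struct local_table, local_add() and
local_is_own() are hypothetical names, the linear search is for brevity,
and none of it is the bridge's actual FDB code. Each record carries a
512-byte bitmap covering all 4096 VLAN ids.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define VLAN_N_VID 4096
#define MAX_LOCAL  64	/* hypothetical cap: roughly one entry per port */

struct local_entry {
	uint8_t  mac[6];
	uint64_t vlans[VLAN_N_VID / 64];	/* one bit per VLAN id */
};

struct local_table {
	struct local_entry e[MAX_LOCAL];
	int n;
};

static struct local_entry *local_find(struct local_table *t,
				      const uint8_t *mac)
{
	for (int i = 0; i < t->n; i++)
		if (memcmp(t->e[i].mac, mac, 6) == 0)
			return &t->e[i];
	return NULL;
}

/* Mark mac as local in vid. Returns false if vid is out of range or
 * the table is full. */
static bool local_add(struct local_table *t, const uint8_t *mac,
		      unsigned int vid)
{
	struct local_entry *e;

	if (vid >= VLAN_N_VID)
		return false;
	e = local_find(t, mac);
	if (!e) {
		if (t->n == MAX_LOCAL)
			return false;
		e = &t->e[t->n++];
		memset(e, 0, sizeof(*e));
		memcpy(e->mac, mac, 6);
	}
	e->vlans[vid / 64] |= UINT64_C(1) << (vid % 64);
	return true;
}

/* True if mac is one of our own addresses in vid. */
static bool local_is_own(struct local_table *t, const uint8_t *mac,
			 unsigned int vid)
{
	struct local_entry *e;

	if (vid >= VLAN_N_VID)
		return false;
	e = local_find(t, mac);
	if (!e)
		return false;
	return ((e->vlans[vid / 64] >> (vid % 64)) & 1) != 0;
}
```

Adding the same MAC to 2000 vlans leaves the table at one entry; a real
implementation would presumably hash on the MAC instead of scanning.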

Re: [PATCH net-next 4/9] net: dsa: Allow configuration of CPU & DSA port speeds/duplex

2015-08-25 Thread Florian Fainelli

Le 08/23/15 14:24, Andrew Lunn a écrit :
>>> +   port_dn = cd->port_dn[port];
>>> +   if (of_phy_is_fixed_link(port_dn)) {
>>> +   ret = of_phy_register_fixed_link(port_dn);
>>> +   if (ret) {
>>> +   netdev_err(master,
>>> +  "failed to register fixed PHY\n");
>>> +   return ret;
>>> +   }
>>> +   phydev = of_phy_find_device(port_dn);
>>> +   genphy_config_init(phydev);
>>> +   genphy_read_status(phydev);
>>> +   if (ds->drv->adjust_link)
>>> +   ds->drv->adjust_link(ds, port, phydev);
>>
>> This kind of hack here because what you really need is just the link
>> parameters, but you cannot obtain such information without first
>> configuring the PHY up to a certain point in genphy_config_init(), and
>> then have genphy_read_status() copy these values in your phydev structure.
>>
>> Maybe we should really consider something like this after all:
>>
>> https://lkml.org/lkml/2015/8/5/490
> 
> Hi Florian
> 
> This half solves the problem. The nice thing about using the
> fixed_link, is that i can just call the adjust_link function with it.
> The fixed_phy_status cannot be passed directly to adjust_link. Some
> code refactoring or duplication would be needed.

BTW, this is really the reason why I was trying to push for having MDIO
connected switches as PHY devices, because then you get the PHY library
to calculate the advertised/supported intersection for you, and you
could even imagine re-negotiating the CPU port link with the Ethernet MAC.

Maybe I should repost these patches some day once I can get that working
with multiple switches in a tree ;)

>  
>> Or maybe, we should really introduce this "cpu" network device after all
>> with a dropping xmit function, such that we get ethtool counters to work
>> on it, and we can also attach it to a PHY device to configure link
>> parameters?
> 
> I keep humming and hawing about this. I don't really like the idea of
> having an interface which you cannot send/receive packets. Yet it
> solves a number of problems like this, and gives you access to
> statistics and registers in the usual way. If we do it for the CPU
> port, we should also do it for the DSA ports. And we probably want the
> call for up to return -ENOSUP, just to make it clear it cannot be used
> for anything.
> 
>   Andrew
> 


-- 
Florian

Re: [PATCH net] net: dsa: fix EDSA frame from hwaccel frame

2015-08-25 Thread Guenter Roeck

On Tue, Aug 25, 2015 at 06:28:34PM -0700, Florian Fainelli wrote:
> Le 08/03/15 23:35, Vivien Didelot a écrit :
> > If the underlying network device features NETIF_F_HW_VLAN_CTAG_TX,
> > an EDSA frame is prepended with a 802.1q header once queued.
> > 
> > To fix this, push the VLAN tag to the payload if present, before
> > checking the frame protocol.
> 
> Makes sense, but you would want Andrew or Guenter to ack this patch.
> 
I'll have to carve out some time soon to see if the current dsa code
still works for me. Just don't know when and how to do that :-(

Guenter

> Thanks!
> 
> > 
> > [note: we may prefer to access directly VLAN TCI from hwaccel frames,
> > but this approach is simpler.]
> > 
> > Signed-off-by: Vivien Didelot 
> > ---
> >  net/dsa/tag_edsa.c | 5 +
> >  1 file changed, 5 insertions(+)
> > 
> > diff --git a/net/dsa/tag_edsa.c b/net/dsa/tag_edsa.c
> > index 2288c80..3ada4eb 100644
> > --- a/net/dsa/tag_edsa.c
> > +++ b/net/dsa/tag_edsa.c
> > @@ -9,6 +9,7 @@
> >   */
> >  
> >  #include 
> > +#include 
> >  #include 
> >  #include 
> >  #include "dsa_priv.h"
> > @@ -21,6 +22,10 @@ static struct sk_buff *edsa_xmit(struct sk_buff *skb, struct net_device *dev)
> > struct dsa_slave_priv *p = netdev_priv(dev);
> > u8 *edsa_header;
> >  
> > +   skb = vlan_hwaccel_push_inside(skb);
> > +   if (unlikely(!skb))
> > +   return NULL;
> > +
> > /*
> >  * Convert the outermost 802.1q tag to a DSA tag and prepend
> >  * a DSA ethertype field is the packet is tagged, or insert
> > 
> 
> 
> -- 
> Florian

Re: [PATCH net] net: dsa: fix EDSA frame from hwaccel frame

2015-08-25 Thread Florian Fainelli

Le 08/03/15 23:35, Vivien Didelot a écrit :
> If the underlying network device features NETIF_F_HW_VLAN_CTAG_TX,
> an EDSA frame is prepended with a 802.1q header once queued.
> 
> To fix this, push the VLAN tag to the payload if present, before
> checking the frame protocol.

Makes sense, but you would want Andrew or Guenter to ack this patch.

Thanks!

> 
> [note: we may prefer to access directly VLAN TCI from hwaccel frames,
> but this approach is simpler.]
> 
> Signed-off-by: Vivien Didelot 
> ---
>  net/dsa/tag_edsa.c | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/net/dsa/tag_edsa.c b/net/dsa/tag_edsa.c
> index 2288c80..3ada4eb 100644
> --- a/net/dsa/tag_edsa.c
> +++ b/net/dsa/tag_edsa.c
> @@ -9,6 +9,7 @@
>   */
>  
>  #include 
> +#include 
>  #include 
>  #include 
>  #include "dsa_priv.h"
> @@ -21,6 +22,10 @@ static struct sk_buff *edsa_xmit(struct sk_buff *skb, struct net_device *dev)
>   struct dsa_slave_priv *p = netdev_priv(dev);
>   u8 *edsa_header;
>  
> + skb = vlan_hwaccel_push_inside(skb);
> + if (unlikely(!skb))
> + return NULL;
> +
>   /*
>* Convert the outermost 802.1q tag to a DSA tag and prepend
>* a DSA ethertype field is the packet is tagged, or insert
> 


-- 
Florian

Re: netlink_route kernel data dump size increased

2015-08-25 Thread tej parkash

On Thu, Aug 20, 2015 at 2:47 AM, Eric Dumazet  wrote:
> On Wed, 2015-08-19 at 23:41 +0530, tej parkash wrote:
>> All,
>>
>> We are running an application on Linux kernel 3.10 that collects network
>> interface information using the NETLINK_ROUTE protocol. Earlier (kernel
>> 2.6.32) we had an 8K buffer allocated to collect all the data, but
>> with the new kernel (3.10) we are seeing a read socket error, as the buffer
>> size is not sufficient for all the network dump data.
>>
>> We want to understand whether the userspace buffer limit has increased to
>> 16K, or whether we need some other mechanism to collect the data in 8K
>> chunks. Or is there any other way an application can use the NETLINK_ROUTE
>> protocol so that it will not break if the data size increases
>> in future?
>>
>> I did some browsing and found a link, but it was not very
>> conclusive.
>> http://www.spinics.net/lists/netdev/msg162185.html
>>
>> I would appreciate any kind of help or pointers here.
>>
>
> This sounds like a bug that might have been fixed later.
https://lists.ubuntu.com/archives/kernel-team/2014-August/046758.html

Is this the patch we are talking about?
It is only available from kernel 3.13 onwards. Let me also verify whether it
works for us.
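
Independent of which kernel fix applies, one approach that avoids
hard-coding 8K or 16K is to ask the socket how large the next message is
before reading it. The sketch below assumes the MSG_PEEK | MSG_TRUNC
behaviour documented in recv(2) for datagram-style sockets (netlink
included); recv_whole_datagram() is a hypothetical helper name, not an
existing API. A NETLINK_ROUTE dump arrives as a series of such messages
ending with NLMSG_DONE, so a caller would invoke this in a loop.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Receive one pending datagram into a buffer sized to fit it.
 * Returns a malloc()ed buffer (caller frees) and stores the number of
 * bytes read in *len, or returns NULL on error.
 */
static char *recv_whole_datagram(int fd, size_t *len)
{
	char probe;
	ssize_t need, got;
	char *buf;

	/* MSG_PEEK leaves the datagram queued; MSG_TRUNC makes recv()
	 * return its real length even though the probe is one byte. */
	need = recv(fd, &probe, sizeof(probe), MSG_PEEK | MSG_TRUNC);
	if (need < 0)
		return NULL;

	buf = malloc(need > 0 ? (size_t)need : 1);
	if (!buf)
		return NULL;

	/* Second call actually dequeues the datagram. */
	got = recv(fd, buf, (size_t)need, 0);
	if (got < 0) {
		free(buf);
		return NULL;
	}
	*len = (size_t)got;
	return buf;
}
```

The cost is one extra system call per message; the benefit is that the
application no longer depends on the dump chunk size chosen by a given
kernel version.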

Re: [PATCH net-next v2] bridge: vlan: allow to suppress local mac install for all vlans

2015-08-25 Thread Stephen Hemminger

On Tue, 25 Aug 2015 17:34:55 -0700
Nikolay Aleksandrov  wrote:

> diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
> index 3d95647039d0..2bda472c5a6e 100644
> --- a/net/bridge/br_private.h
> +++ b/net/bridge/br_private.h
> @@ -294,6 +294,7 @@ struct net_bridge
>   u32 auto_cnt;
>  #ifdef CONFIG_BRIDGE_VLAN_FILTERING
>   u8  vlan_enabled;
> + bool vlan_ignore_local_fdb;

bool takes more space than u8.


>   __be16  vlan_proto;
>   u16 default_pvid;
>   struct net_port_vlans __rcu *vlan_info;

> +int br_vlan_ignore_local_fdb_toggle(struct net_bridge *br, unsigned long val)
> +{
> + br->vlan_ignore_local_fdb = val ? true : false;

personal preference is for:
br->vlan_ignore_local_fdb = !!val;
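
For a C99 bool field the two spellings behave the same, since conversion
to bool already yields 0 or 1. The normalization starts to matter if the
flag is stored in an 8-bit integer instead. A small user-space sketch of
that case, with hypothetical helper names:

```c
#include <stdint.h>

/* Plain assignment to an 8-bit field keeps only the low 8 bits,
 * so a non-zero value such as 256 is stored as 0. */
static uint8_t store_truncating(unsigned long val)
{
	return (uint8_t)val;
}

/* !! first collapses any non-zero value to 1, so nothing is lost. */
static uint8_t store_normalized(unsigned long val)
{
	return !!val;
}
```

Here store_truncating(256) yields 0 while store_normalized(256) yields 1.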

Re: [PATCH net-next v2] bridge: vlan: allow to suppress local mac install for all vlans

2015-08-25 Thread Stephen Hemminger

On Tue, 25 Aug 2015 17:34:55 -0700
Nikolay Aleksandrov  wrote:

> From: Nikolay Aleksandrov 
> 
> This patch adds a new knob that, when enabled, allows to suppress the
> installation of local fdb entries in newly created vlans. This could
> pose a big scalability issue if we have a large number of ports and a
> large number of vlans, e.g. in a 48 port device with 2000 vlans these
> entries easily go up to 96000.
> Note that packets for these macs are still received properly because they
> are added in vlan 0 as "own" macs and referenced when fdb lookup by vlan
> results in a miss.
> Also note that vlan membership of ingress port and the bridge device
> as egress are still being correctly enforced.
> 
> The default (0/off) is keeping the current behaviour.
> 
> Based on a patch by Wilson Kok (w...@cumulusnetworks.com).


This is getting messy, but then again the bridge seems to have become
a ghetto for a long time. I would rather see the lookup code fixed so
that the fdb was correct.

[PATCH net-next v2] bridge: vlan: allow to suppress local mac install for all vlans

2015-08-25 Thread Nikolay Aleksandrov

From: Nikolay Aleksandrov 

This patch adds a new knob that, when enabled, allows to suppress the
installation of local fdb entries in newly created vlans. This could
pose a big scalability issue if we have a large number of ports and a
large number of vlans, e.g. in a 48 port device with 2000 vlans these
entries easily go up to 96000.
Note that packets for these macs are still received properly because they
are added in vlan 0 as "own" macs and referenced when fdb lookup by vlan
results in a miss.
Also note that vlan membership of ingress port and the bridge device
as egress are still being correctly enforced.

The default (0/off) is keeping the current behaviour.

Based on a patch by Wilson Kok (w...@cumulusnetworks.com).

Signed-off-by: Nikolay Aleksandrov 
---
v2: Triple checked the timezone

 include/uapi/linux/if_link.h |  1 +
 net/bridge/br_input.c        |  7 +++
 net/bridge/br_netlink.c      | 14 +-
 net/bridge/br_private.h      | 18 ++
 net/bridge/br_sysfs_br.c     | 18 ++
 net/bridge/br_vlan.c         | 18 +-
 6 files changed, 70 insertions(+), 6 deletions(-)

diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 313c305fd1ad..df1c601dd315 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -231,6 +231,7 @@ enum {
IFLA_BR_STP_STATE,
IFLA_BR_PRIORITY,
IFLA_BR_VLAN_FILTERING,
+   IFLA_BR_VLAN_IGNORE_LOCAL_FDB,
__IFLA_BR_MAX,
 };
 
diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
index f921a5dce22d..a2b00849de3c 100644
--- a/net/bridge/br_input.c
+++ b/net/bridge/br_input.c
@@ -186,6 +186,13 @@ int br_handle_frame_finish(struct sock *sk, struct sk_buff *skb)
skb2 = skb;
/* Do not forward the packet since it's local. */
skb = NULL;
+   } else if (br_vlan_enabled(br) && br_vlan_ignore_local_fdb(br)) {
+   dst = __br_fdb_get(br, dest, 0);
+   if (dst && dst->is_local) {
+   skb2 = skb;
+   /* Do not forward the packet since it's local. */
+   skb = NULL;
+   }
}
 
if (skb) {
diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
index dbcb1949ea58..07978f7b6245 100644
--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -729,6 +729,7 @@ static const struct nla_policy br_policy[IFLA_BR_MAX + 1] = {
[IFLA_BR_STP_STATE] = { .type = NLA_U32 },
[IFLA_BR_PRIORITY] = { .type = NLA_U16 },
[IFLA_BR_VLAN_FILTERING] = { .type = NLA_U8 },
+   [IFLA_BR_VLAN_IGNORE_LOCAL_FDB] = { .type = NLA_U8 },
 };
 
 static int br_changelink(struct net_device *brdev, struct nlattr *tb[],
@@ -784,6 +785,14 @@ static int br_changelink(struct net_device *brdev, struct nlattr *tb[],
return err;
}
 
+   if (data[IFLA_BR_VLAN_IGNORE_LOCAL_FDB]) {
+   u8 vlan_ignore_local = nla_get_u8(data[IFLA_BR_VLAN_IGNORE_LOCAL_FDB]);
+
+   err = br_vlan_ignore_local_fdb_toggle(br, vlan_ignore_local);
+   if (err)
+   return err;
+   }
+
return 0;
 }
 
@@ -796,6 +805,7 @@ static size_t br_get_size(const struct net_device *brdev)
   nla_total_size(sizeof(u32)) +/* IFLA_BR_STP_STATE */
   nla_total_size(sizeof(u16)) +/* IFLA_BR_PRIORITY */
   nla_total_size(sizeof(u8)) + /* IFLA_BR_VLAN_FILTERING */
+  nla_total_size(sizeof(u8)) + /* IFLA_BR_VLAN_IGNORE_LOCAL_FDB */
   0;
 }
 
@@ -809,6 +819,7 @@ static int br_fill_info(struct sk_buff *skb, const struct net_device *brdev)
u32 stp_enabled = br->stp_enabled;
u16 priority = (br->bridge_id.prio[0] << 8) | br->bridge_id.prio[1];
u8 vlan_enabled = br_vlan_enabled(br);
+   u8 vlan_ignore_local = br_vlan_ignore_local_fdb(br);
 
if (nla_put_u32(skb, IFLA_BR_FORWARD_DELAY, forward_delay) ||
nla_put_u32(skb, IFLA_BR_HELLO_TIME, hello_time) ||
@@ -816,7 +827,8 @@ static int br_fill_info(struct sk_buff *skb, const struct net_device *brdev)
nla_put_u32(skb, IFLA_BR_AGEING_TIME, ageing_time) ||
nla_put_u32(skb, IFLA_BR_STP_STATE, stp_enabled) ||
nla_put_u16(skb, IFLA_BR_PRIORITY, priority) ||
-   nla_put_u8(skb, IFLA_BR_VLAN_FILTERING, vlan_enabled))
+   nla_put_u8(skb, IFLA_BR_VLAN_FILTERING, vlan_enabled) ||
+   nla_put_u8(skb, IFLA_BR_VLAN_IGNORE_LOCAL_FDB, vlan_ignore_local))
return -EMSGSIZE;
 
return 0;
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index 3d95647039d0..2bda472c5a6e 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -294,6 +294,7 @@ struct net_bridge
u32 auto_cnt;
 #ifdef CONFIG_BRIDGE_VLAN_FILTERING
u8  vlan

Re: [PATCH] lib/Makefile: remove CONFIG_AVERAGE build rule

2015-08-25 Thread David Miller

From: Johannes Berg 
Date: Fri, 21 Aug 2015 12:13:23 +0200

> On Fri, 2015-08-21 at 10:05 +, Valentin Rothberg wrote:
>> The Kconfig option AVERAGE and its implementation has been removed by
>> commit f4e774f55fe0 ("average: remove out-of-line implementation").
>> Remove the dead build rule in lib/Makefile.
> 
> D'oh, sorry about that.
> 
> Reviewed-by: Johannes Berg 
> 
> [reproducing patch in full for netdev]

The full patch needs to be resubmitted to netdev so that it gets
queued up in patchwork, thanks.

Re: tcp: add NV congestion control

2015-08-25 Thread David Miller

From: Lawrence Brakmo 
Date: Tue, 25 Aug 2015 16:33:50 -0700

> Changes from v5: cleaning of NV code, changing some default parameters

I have no fundamental objections to this patch series.

Re: [PATCH RFC 0/5] net: L2 only interfaces

2015-08-25 Thread David Miller

From: Florian Fainelli 
Date: Tue, 25 Aug 2015 15:50:10 -0700

> This patch series implements a L2 only interface concept which
> basically denies any kind of IP address configuration on these
> interfaces, but still allows them to be used as configuration
> end-points to keep using ethtool and friends.
> 
> A cleaner approach might be to finally come up with the concept of
> net_port which a net_device would be a superset of, but this still
> raises tons of questions as to whether we should be modifying
> userland tools to be able to configure/query these
> interfaces. During all the switch talks/discussions last year, it
> seemed to me like the L2-only interface is the closest we have to a
> "network port".
> 
> Comments, flames, flying tomatoes welcome!

Interesting, indeed.

Do you plan to extend this to defining a more minimal network device
sub-type as well?

Then we can pass "net_device_common" or whatever around as a common
base type of actual net device "implementations".

Or is your main goal just getting the L2-only semantic?

Re: [PATCH net] net: dsa: fix EDSA frame from hwaccel frame

2015-08-25 Thread Vivien Didelot

Hi All,

On Aug. Tuesday 04 (32) 02:35 AM, Vivien Didelot wrote:
> If the underlying network device features NETIF_F_HW_VLAN_CTAG_TX,
> an EDSA frame is prepended with a 802.1q header once queued.
> 
> To fix this, push the VLAN tag to the payload if present, before
> checking the frame protocol.
> 
> [note: we may prefer to access directly VLAN TCI from hwaccel frames,
> but this approach is simpler.]
> 
> Signed-off-by: Vivien Didelot 
> ---
>  net/dsa/tag_edsa.c | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/net/dsa/tag_edsa.c b/net/dsa/tag_edsa.c
> index 2288c80..3ada4eb 100644
> --- a/net/dsa/tag_edsa.c
> +++ b/net/dsa/tag_edsa.c
> @@ -9,6 +9,7 @@
>   */
>  
>  #include 
> +#include 
>  #include 
>  #include 
>  #include "dsa_priv.h"
> @@ -21,6 +22,10 @@ static struct sk_buff *edsa_xmit(struct sk_buff *skb, struct net_device *dev)
>   struct dsa_slave_priv *p = netdev_priv(dev);
>   u8 *edsa_header;
>  
> + skb = vlan_hwaccel_push_inside(skb);
> + if (unlikely(!skb))
> + return NULL;
> +
>   /*
>* Convert the outermost 802.1q tag to a DSA tag and prepend
>* a DSA ethertype field is the packet is tagged, or insert
> -- 
> 2.4.6
> 

Did someone have a chance to give this a look?

Thanks,
-v

Re: [PATCH RFC 5/5] net: dsa: bcm_sf2: Allow disabling tagging protocol

2015-08-25 Thread David Miller

From: Florian Fainelli 
Date: Tue, 25 Aug 2015 15:50:15 -0700

>   /* Enable Broadcom tags for IMP port */
>   reg = core_readl(priv, CORE_BRCM_HDR_CTRL);
> - reg |= val;
> + if (!tagging_disabled)

Just a pet-peeve of mine, I hate conditions that read as double
negatives, like this one does.

Re: [RFC PATCH v5 net-next 4/4] tcp: add NV congestion control

2015-08-25 Thread Lawrence Brakmo

On 8/5/15, 5:51 PM, "knn...@gmail.com on behalf of Kenneth Klette
Jonassen"  wrote:

>On Wed, Aug 5, 2015 at 3:39 AM, Lawrence Brakmo  wrote:
>> This is a request for comments.
>
>Nice to see more development on delay-based congestion control.
>
>It would be good to see how NV stacks up against CDG. Any chance of
>adding cdg as a congestion control parameter to your experiments?

Done. I'm updating the TCP-NV page with the updated results, should be
done by tomorrow (8/26).

>Experiments on NV without its temporary cwnd reductions would also be
>of interest -- to get a reference of how effective this mechanism is.

Done. It turns out that it only improves fairness a little but hurts
P99 latencies more significantly. So it is now off by default.
Thanks for making me re-examine this feature.


Re: [PATCH] drivers: net: xgene: fix: Oops in linkwatch_fire_event

2015-08-25 Thread David Miller

From: Iyappan Subramanian 
Date: Tue, 25 Aug 2015 15:03:03 -0700

> [ 1065.801569] Internal error: Oops: 9606 [#1] SMP
> ...
> [ 1065.866655] Hardware name: AppliedMicro Mustang/Mustang, BIOS 1.1.0 Apr 22 2015
> [ 1065.873937] Workqueue: events_power_efficient phy_state_machine
> [ 1065.879837] task: fe01de105e80 ti: fe00bcf18000 task.ti: fe00bcf18000
> [ 1065.887288] PC is at linkwatch_fire_event+0xac/0xc0
> [ 1065.892141] LR is at linkwatch_fire_event+0xa0/0xc0
> [ 1065.896995] pc : [] lr : [] pstate: 21c5
> [ 1065.904356] sp : fe00bcf1bd00
> ...
> [ 1066.196813] Call Trace:
> [ 1066.199248] [] linkwatch_fire_event+0xac/0xc0
> [ 1066.205140] [] netif_carrier_off+0x54/0x64
> [ 1066.210773] [] phy_state_machine+0x120/0x3bc
> [ 1066.216578] [] process_one_work+0x15c/0x3a8
> [ 1066.96] [] worker_thread+0x134/0x470
> [ 1066.227757] [] kthread+0xe0/0xf8
> [ 1066.232525] Code: 97f65ee9 f9420660 d538d082 8b42 (885f7c40)
> 
> The fix is to call phy_disconnect() from xgene_enet_mdio_remove(),
> which in turn calls cancel_delayed_work_sync().
> 
> Signed-off-by: Iyappan Subramanian 

Applied, thanks.

Re: [PATCH RFC 0/5] net: L2 only interfaces

2015-08-25 Thread Sowmini Varadhan

On Tue, Aug 25, 2015 at 4:52 PM, David Ahern  wrote:


> The VRF driver can check the device when the enslave request happens.
>

Will this work correctly if I set up a bonding interface or SVI,
and want to put the bond-master or SVI in the vrf (but subsequently
want to get, say, timestamp/other-stats from the L2 slave in the vrf?)

--Sowmini

Re: [PATCH v3 net-next 7/8] geneve: Consolidate Geneve functionality in single module.

2015-08-25 Thread Pravin Shelar

On Tue, Aug 25, 2015 at 2:35 PM, Jesse Gross  wrote:
> On Tue, Aug 25, 2015 at 1:54 PM, Pravin Shelar  wrote:
>> On Tue, Aug 25, 2015 at 12:03 PM, Jesse Gross  wrote:
>>> On Mon, Aug 24, 2015 at 10:43 AM, Pravin B Shelar  wrote:
>>>> diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
>>>> index c05bc13..8eb875d 100644
>>>> --- a/drivers/net/geneve.c
>>>> +++ b/drivers/net/geneve.c
>>>> @@ -492,36 +813,36 @@ static int geneve_configure(struct net *net, struct net_device *dev,
>>> [...]
>>>> +   gs = geneve_find_sock(gn, geneve->dst_port);
>>>> +   if (gs) {
>>>> +   if (metadata) {
>>>> +   if (gs->collect_md)
>>>> +   return -EEXIST;
>>>> +   else
>>>> +   return -EPERM;
>>>> +   } else {
>>>> +   if (gs->collect_md)
>>>> +   return -EPERM;
>>>> +
>>>> +   t = geneve_lookup(gn, htons(dst_port),
>>>> + rem_addr, geneve->vni);
>>>> +   if (t)
>>>> +   return -EBUSY;
>>>> +   }
>>>> +   }
>>>
>>> I like the new structure but unfortunately, I think there is a race.
>>> If two devices are created with conflicting configurations but neither
>>> is brought up then creation of both devices will succeed. However,
>>> when the second one is brought up, it will silently collide with the
>>> first.
>>
>> geneve tunnel is added to hash table during configure time. So the
>> lookup does not have any dependency on device up or down state. The
>> Lookup and hash table updates are done under rtnl lock.
>
> But the check for duplicates is contingent on finding a socket. If we
> configure two devices before calling geneve_open(), then there won't
> be a socket yet and therefore no check.

ah.. ok. I have to traverse the geneve tunnel list here.
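
A rough user-space sketch of that direction, assuming the conflict check
walks every configured tunnel regardless of whether it has been opened.
struct fake_tunnel and both functions are hypothetical names, not the
geneve driver's; the error codes simply mirror the ones in the quoted
hunk.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct fake_tunnel {
	uint16_t dst_port;
	uint32_t vni;
	uint32_t remote;	/* IPv4 remote address, for simplicity */
	bool collect_md;
	bool is_open;		/* deliberately ignored by the check */
	struct fake_tunnel *next;
};

/* Compare cand against every configured tunnel, open or not. */
static int fake_check_conflict(const struct fake_tunnel *list,
			       const struct fake_tunnel *cand)
{
	for (const struct fake_tunnel *t = list; t; t = t->next) {
		if (t->dst_port != cand->dst_port)
			continue;
		if (cand->collect_md)
			return t->collect_md ? -EEXIST : -EPERM;
		if (t->collect_md)
			return -EPERM;
		if (t->vni == cand->vni && t->remote == cand->remote)
			return -EBUSY;
	}
	return 0;
}

/* Configure-time entry point: reject conflicts, then link the tunnel
 * into the list so later candidates are checked against it too. */
static int fake_configure(struct fake_tunnel **list,
			  struct fake_tunnel *cand)
{
	int err = fake_check_conflict(*list, cand);

	if (err)
		return err;
	cand->next = *list;
	*list = cand;
	return 0;
}
```

Because the list is populated at configure time, two conflicting devices
are caught even if neither has been brought up yet.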

Re: [PATCH net-next] bridge: vlan: allow to suppress local mac install for all vlans

2015-08-25 Thread David Miller

From: Nikolay Aleksandrov 
Date: Tue, 25 Aug 2015 16:47:00 -0700

> hahaha :-)
> Sorry about that, I’m in the US but my VM is in another timezone. Anyway I’ll
> submit a v2 shortly back in the future.

"the future", very funny...

Re: [Patch net] cls_u32: complete the check for non-forced case in u32_destroy()

2015-08-25 Thread David Miller

From: Cong Wang 
Date: Tue, 25 Aug 2015 16:38:12 -0700

> In commit 1e052be69d04 ("net_sched: destroy proto tp when all filters are gone")
> I added a check in u32_destroy() to see if all real filters are gone
> for each tp, however, that is only done for root_ht, same is needed
> for others.
> 
> This can be reproduced by the following tc commands:
> 
> tc filter add dev eth0 parent 1:0 prio 5 handle 15: protocol ip u32 divisor 256
> tc filter add dev eth0 protocol ip parent 1: prio 5 handle 15:2:2 u32 ht 15:2: match ip src 10.0.0.2 flowid 1:10
> tc filter add dev eth0 protocol ip parent 1: prio 5 handle 15:2:3 u32 ht 15:2: match ip src 10.0.0.3 flowid 1:10
> 
> Fixes: 1e052be69d04 ("net_sched: destroy proto tp when all filters are gone")
> Reported-by: Akshat Kakkar 
> Cc: Jamal Hadi Salim 
> Signed-off-by: Cong Wang 
> Signed-off-by: Cong Wang 

Applied and queued up for -stable, thanks.

Re: [PATCH net-next v3 0/2] Documentation: dsa

2015-08-25 Thread David Miller

From: Florian Fainelli 
Date: Tue, 25 Aug 2015 15:33:12 -0700

> This patch series adds some documentation about DSA as a subsystem as well
> as the SF2 driver since it slightly diverges from your average DSA driver ;)

This looks great, applied, thanks!

Re: [PATCH RFC 0/5] net: L2 only interfaces

2015-08-25 Thread David Ahern


On 8/25/15 4:44 PM, Sowmini Varadhan wrote:
> On Tue, Aug 25, 2015 at 3:50 PM, Florian Fainelli  wrote:
>> Hi all,
>>
>> This patch series implements a L2 only interface concept which basically denies
>> any kind of IP address configuration on these interfaces, but still allows them
>> to be used as configuration end-points to keep using ethtool and friends.
>
> This is a very interesting idea. A few questions/thoughts: will there
> be any eventual restrictions on which types of interfaces can be L2_ONLY?
> Ideally, it should be possible to let interfaces wink in/out of L2 only
> state administratively (as can be done on a typical router, after unwinding
> existing config as needed)
>
> I'm assuming something will prevent an L2-only interface from being
> part of a vrf.

The VRF driver can check the device when the enslave request happens.


Re: [PATCH net-next] bridge: vlan: allow to suppress local mac install for all vlans

2015-08-25 Thread Nikolay Aleksandrov


> On Aug 25, 2015, at 4:41 PM, David Miller  wrote:
> 
> From: Nikolay Aleksandrov 
> Date: Tue, 25 Aug 2015 03:55:27 +0300
> 
> If it is actually 03:55:27 +0300 where you are, please send me your
> time travel device.
> 
> And in the future please submit patches outside of your time
> travelling activities (ie. fix the date on your computer)
> thanks :-)
> 

hahaha :-)
Sorry about that, I’m in the US but my VM is in another timezone. Anyway I’ll
submit a v2 shortly back in the future.



Re: [PATCH RFC 0/5] net: L2 only interfaces

2015-08-25 Thread Sowmini Varadhan

On Tue, Aug 25, 2015 at 3:50 PM, Florian Fainelli  wrote:
> Hi all,
>
> This patch series implements a L2 only interface concept which basically denies
> any kind of IP address configuration on these interfaces, but still allows them
> to be used as configuration end-points to keep using ethtool and friends.
>

This is a very interesting idea. A few questions/thoughts: will there
be any eventual restrictions on which types of interfaces can be L2_ONLY?
Ideally, it should be possible to let interfaces wink in/out of L2 only
state administratively (as can be done on a typical router, after unwinding
existing config as needed)

I'm assuming something will prevent an L2-only interface from being
part of a vrf.

--Sowmini

Re: [PATCH net-next v3 2/2] Documentation: networking: dsa: Add Broadcom SF2 document

2015-08-25 Thread Vivien Didelot

On Aug. Tuesday 25 (35) 03:33 PM, Florian Fainelli wrote:
> Add a document describing the Broadcom Starfighter 2 switch hardware,
> its specifics, and how the driver is implemented and its specifics.
> 
> Signed-off-by: Florian Fainelli 
> ---
> Changes in v2:
> 
> - address Randy Dunlap's feedback
> - address Vivien's feedback
> - fix typos/misc spelling mistakes and punctuation

Reviewed-by: Vivien Didelot 

Re: [PATCH net-next] bridge: vlan: allow to suppress local mac install for all vlans

2015-08-25 Thread David Miller

From: Nikolay Aleksandrov 
Date: Tue, 25 Aug 2015 03:55:27 +0300

If it is actually 03:55:27 +0300 where you are, please send me your
time travel device.

And in the future please submit patches outside of your time
travelling activities (ie. fix the date on your computer)
thanks :-)


[PATCH net-next] bridge: vlan: allow to suppress local mac install for all vlans

2015-08-25 Thread Nikolay Aleksandrov

From: Nikolay Aleksandrov 

This patch adds a new knob that, when enabled, suppresses the
installation of local fdb entries in newly created vlans. Installing
these entries can pose a big scalability issue if we have a large number
of ports and a large number of vlans, e.g. in a 48 port device with 2000
vlans these entries easily go up to 96000.
Note that packets for these macs are still received properly because they
are added in vlan 0 as "own" macs and referenced when fdb lookup by vlan
results in a miss.
Also note that vlan membership of ingress port and the bridge device
as egress are still being correctly enforced.

The default (0/off) keeps the current behaviour.
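
A minimal user-space sketch of the lookup fallback described above, assuming
a flat table in place of the real fdb hash. The names fdb_entry, fdb_get,
deliver_locally and local_entries are hypothetical and only illustrate the
idea: on a per-vlan miss, the vlan 0 "own" entry is consulted when the knob
is enabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a bridge fdb entry. */
struct fdb_entry {
	uint8_t  mac[6];
	uint16_t vid;       /* vlan 0 holds the bridge's "own" addresses */
	bool     is_local;
};

/* Linear search standing in for the real hashed fdb lookup. */
static const struct fdb_entry *fdb_get(const struct fdb_entry *tbl, size_t n,
				       const uint8_t *mac, uint16_t vid)
{
	assert(tbl || n == 0);
	for (size_t i = 0; i < n; i++)
		if (tbl[i].vid == vid && memcmp(tbl[i].mac, mac, 6) == 0)
			return &tbl[i];
	return NULL;
}

/* Returns true if a unicast frame should be delivered locally. */
static bool deliver_locally(const struct fdb_entry *tbl, size_t n,
			    const uint8_t *mac, uint16_t vid,
			    bool ignore_local_fdb)
{
	const struct fdb_entry *dst = fdb_get(tbl, n, mac, vid);

	if (dst && dst->is_local)
		return true;
	if (!ignore_local_fdb)
		return false;
	/* No local entry in this vlan: fall back to the vlan 0 entry. */
	dst = fdb_get(tbl, n, mac, 0);
	return dst && dst->is_local;
}

/* Local entries needed when one is installed per port per vlan. */
static unsigned long local_entries(unsigned long ports, unsigned long vlans)
{
	return ports * vlans;
}
```

With a single vlan 0 entry in the table, a lookup in any other vlan is only
delivered locally when the knob is on, and local_entries(48, 2000) gives the
96000 figure quoted above.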

Based on a patch by Wilson Kok (w...@cumulusnetworks.com).

Signed-off-by: Nikolay Aleksandrov 
---
As usual I'll post iproute2 patch if this one gets accepted.

 include/uapi/linux/if_link.h |  1 +
 net/bridge/br_input.c|  7 +++
 net/bridge/br_netlink.c  | 14 +-
 net/bridge/br_private.h  | 18 ++
 net/bridge/br_sysfs_br.c | 18 ++
 net/bridge/br_vlan.c | 18 +-
 6 files changed, 70 insertions(+), 6 deletions(-)

diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 313c305fd1ad..df1c601dd315 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -231,6 +231,7 @@ enum {
IFLA_BR_STP_STATE,
IFLA_BR_PRIORITY,
IFLA_BR_VLAN_FILTERING,
+   IFLA_BR_VLAN_IGNORE_LOCAL_FDB,
__IFLA_BR_MAX,
 };
 
diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
index f921a5dce22d..a2b00849de3c 100644
--- a/net/bridge/br_input.c
+++ b/net/bridge/br_input.c
@@ -186,6 +186,13 @@ int br_handle_frame_finish(struct sock *sk, struct sk_buff *skb)
skb2 = skb;
/* Do not forward the packet since it's local. */
skb = NULL;
+   } else if (br_vlan_enabled(br) && br_vlan_ignore_local_fdb(br)) {
+   dst = __br_fdb_get(br, dest, 0);
+   if (dst && dst->is_local) {
+   skb2 = skb;
+   /* Do not forward the packet since it's local. */
+   skb = NULL;
+   }
}
 
if (skb) {
diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
index dbcb1949ea58..07978f7b6245 100644
--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -729,6 +729,7 @@ static const struct nla_policy br_policy[IFLA_BR_MAX + 1] = {
[IFLA_BR_STP_STATE] = { .type = NLA_U32 },
[IFLA_BR_PRIORITY] = { .type = NLA_U16 },
[IFLA_BR_VLAN_FILTERING] = { .type = NLA_U8 },
+   [IFLA_BR_VLAN_IGNORE_LOCAL_FDB] = { .type = NLA_U8 },
 };
 
 static int br_changelink(struct net_device *brdev, struct nlattr *tb[],
@@ -784,6 +785,14 @@ static int br_changelink(struct net_device *brdev, struct nlattr *tb[],
return err;
}
 
+   if (data[IFLA_BR_VLAN_IGNORE_LOCAL_FDB]) {
+   u8 vlan_ignore_local = nla_get_u8(data[IFLA_BR_VLAN_IGNORE_LOCAL_FDB]);
+
+   err = br_vlan_ignore_local_fdb_toggle(br, vlan_ignore_local);
+   if (err)
+   return err;
+   }
+
return 0;
 }
 
@@ -796,6 +805,7 @@ static size_t br_get_size(const struct net_device *brdev)
   nla_total_size(sizeof(u32)) +/* IFLA_BR_STP_STATE */
   nla_total_size(sizeof(u16)) +/* IFLA_BR_PRIORITY */
   nla_total_size(sizeof(u8)) + /* IFLA_BR_VLAN_FILTERING */
+  nla_total_size(sizeof(u8)) + /* IFLA_BR_VLAN_IGNORE_LOCAL_FDB */
   0;
 }
 
@@ -809,6 +819,7 @@ static int br_fill_info(struct sk_buff *skb, const struct net_device *brdev)
u32 stp_enabled = br->stp_enabled;
u16 priority = (br->bridge_id.prio[0] << 8) | br->bridge_id.prio[1];
u8 vlan_enabled = br_vlan_enabled(br);
+   u8 vlan_ignore_local = br_vlan_ignore_local_fdb(br);
 
if (nla_put_u32(skb, IFLA_BR_FORWARD_DELAY, forward_delay) ||
nla_put_u32(skb, IFLA_BR_HELLO_TIME, hello_time) ||
@@ -816,7 +827,8 @@ static int br_fill_info(struct sk_buff *skb, const struct net_device *brdev)
nla_put_u32(skb, IFLA_BR_AGEING_TIME, ageing_time) ||
nla_put_u32(skb, IFLA_BR_STP_STATE, stp_enabled) ||
nla_put_u16(skb, IFLA_BR_PRIORITY, priority) ||
-   nla_put_u8(skb, IFLA_BR_VLAN_FILTERING, vlan_enabled))
+   nla_put_u8(skb, IFLA_BR_VLAN_FILTERING, vlan_enabled) ||
+   nla_put_u8(skb, IFLA_BR_VLAN_IGNORE_LOCAL_FDB, vlan_ignore_local))
return -EMSGSIZE;
 
return 0;
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index 3d95647039d0..2bda472c5a6e 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -294,6 +294,7 @@ struct net_bridge
u32 auto_cnt;
 #ifdef CONFIG_BRIDGE_VLAN_FILTERING
u8

[Patch net] cls_u32: complete the check for non-forced case in u32_destroy()

2015-08-25 Thread Cong Wang

In commit 1e052be69d04 ("net_sched: destroy proto tp when all filters are gone")
I added a check in u32_destroy() to see if all real filters are gone
for each tp. However, that is only done for root_ht; the same check is
needed for the other hash tables.
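
As a rough illustration of the missing check, the sketch below walks a list
of hash tables and reports whether every one of them is empty. The struct
hnode type and both helpers are hypothetical simplifications, not the
kernel's tc_u_hnode or ht_empty().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified hash table node. */
struct hnode {
	struct hnode *next;     /* next table on the shared list */
	size_t divisor;         /* number of buckets */
	const void **buckets;   /* a NULL bucket holds no filters */
};

/* A table is empty when none of its buckets holds a filter. */
static bool ht_is_empty(const struct hnode *ht)
{
	assert(ht);
	for (size_t i = 0; i < ht->divisor; i++)
		if (ht->buckets[i])
			return false;
	return true;
}

/* Destruction is allowed only if every table on the list is empty. */
static bool all_tables_empty(const struct hnode *hlist)
{
	for (const struct hnode *ht = hlist; ht; ht = ht->next)
		if (!ht_is_empty(ht))
			return false;
	return true;
}
```

Checking only the first table would miss a filter that lives in a later
table on the list, which is the situation the tc commands below create.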

This can be reproduced by the following tc commands:

tc filter add dev eth0 parent 1:0 prio 5 handle 15: protocol ip u32 divisor 256
tc filter add dev eth0 protocol ip parent 1: prio 5 handle 15:2:2 u32 ht 15:2: match ip src 10.0.0.2 flowid 1:10
tc filter add dev eth0 protocol ip parent 1: prio 5 handle 15:2:3 u32 ht 15:2: match ip src 10.0.0.3 flowid 1:10

Fixes: 1e052be69d04 ("net_sched: destroy proto tp when all filters are gone")
Reported-by: Akshat Kakkar 
Cc: Jamal Hadi Salim 
Signed-off-by: Cong Wang 
---
 net/sched/cls_u32.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index cab9e9b..4fbb674 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -490,6 +490,19 @@ static bool u32_destroy(struct tcf_proto *tp, bool force)
return false;
}
}
+
+   if (tp_c->refcnt > 1)
+   return false;
+
+   if (tp_c->refcnt == 1) {
+   struct tc_u_hnode *ht;
+
+   for (ht = rtnl_dereference(tp_c->hlist);
+ht;
+ht = rtnl_dereference(ht->next))
+   if (!ht_empty(ht))
+   return false;
+   }
}
 
if (root_ht && --root_ht->refcnt == 0)
-- 
1.8.3.1


tcp: add NV congestion control

2015-08-25 Thread Lawrence Brakmo

Changes from v5: cleaning of NV code, changing some default parameters

I've run more extensive tests and am working on updating the NV website
(http://www.brakmo.org/networking/tcp-nv/TCPNV.html); it should be updated
by tomorrow (8/26).

The updated tests include Reno, Cubic, NV and CDG and include more types
of traffic. Overview of results:
1) NV has slightly lower throughput (2-3% less) with a small number of flows
   as compared to Reno, Cubic and CDG
2) NV is less fair with few flows but becomes more fair with more flows
3) Fewer losses with NV (none in many cases) as compared to all others.
   One exception is when things get very congested (64 flows into one
   server), NV has 50% more losses than CDG, Cubic has 1.8x to 10x more
   losses than CDG. Reno has about the same losses as CDG.
4) In mixed traffic (1M and 10K RPCs), 10K flows achieve much higher
   average throughput with NV than with the others (which are
   very similar). In one example, 2 clients sending 1M and 10K to 2
   servers, with NV 10K flows average 1Gbps and 1M flows 3.7Gbps,
   whereas they average about 226Mbps and 4.4Gbps for Reno, Cubic and
   CDG. They all have similar link utilization.

Consists of the following patches:

[RFC PATCH v6 net-next 1/4] tcp: replace cnt & rtt with struct in pkts_acked()
[RFC PATCH v6 net-next 2/4] tcp: refactor struct tcp_skb_cb
[RFC PATCH v6 net-next 3/4] tcp: add in_flight to tcp_skb_cb
[RFC PATCH v6 net-next 4/4] tcp: add NV congestion control

Signed-off-by: Lawrence Brakmo 

include/net/tcp.h   |  20 ++-
net/ipv4/Kconfig|  16 ++
net/ipv4/Makefile   |   1 +
net/ipv4/tcp_bic.c  |   6 +-
net/ipv4/tcp_cdg.c  |  14 +-
net/ipv4/tcp_cubic.c|   6 +-
net/ipv4/tcp_htcp.c |  10 +-
net/ipv4/tcp_illinois.c |  20 +--
net/ipv4/tcp_input.c|  10 +-
net/ipv4/tcp_lp.c   |   6 +-
net/ipv4/tcp_nv.c   | 489 ++
net/ipv4/tcp_output.c   |   4 +-
net/ipv4/tcp_vegas.c|   6 +-
net/ipv4/tcp_vegas.h|   2 +-
net/ipv4/tcp_veno.c |   7 +-
net/ipv4/tcp_westwood.c |   7 +-
net/ipv4/tcp_yeah.c |   7 +-
17 files changed, 580 insertions(+), 51 deletions(-)

[RFC PATCH v6 net-next 1/4] tcp: replace cnt & rtt with struct in pkts_acked()

2015-08-25 Thread Lawrence Brakmo

Replace 2 arguments (cnt and rtt) in the congestion control modules'
pkts_acked() function with a struct. This will allow adding more
information without having to modify existing congestion control
modules (tcp_nv in particular needs bytes in flight when packet
was sent).

As proposed by Neal Cardwell in his comments to the tcp_nv patch.
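
A small user-space sketch of why passing a struct helps, assuming
fixed-width integer types in place of the kernel's u32/s32. The demo_ca
state and demo_pkts_acked() callback are hypothetical; the point is that a
module reads only the fields it needs, so adding a field to the sample later
does not change the callback signature.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the struct introduced by the patch, with user-space types. */
struct ack_sample {
	uint32_t pkts_acked;
	int32_t  rtt_us;
};

/* Hypothetical per-connection state for an illustrative module. */
struct demo_ca {
	uint32_t total_acked;
	int32_t  min_rtt_us;    /* 0 means no valid sample seen yet */
};

/* Callback in the new style: one pointer instead of separate arguments. */
static void demo_pkts_acked(struct demo_ca *ca, const struct ack_sample *sample)
{
	assert(ca && sample);
	ca->total_acked += sample->pkts_acked;
	if (sample->rtt_us <= 0)
		return;         /* no valid RTT measurement in this sample */
	if (ca->min_rtt_us == 0 || sample->rtt_us < ca->min_rtt_us)
		ca->min_rtt_us = sample->rtt_us;
}
```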

Signed-off-by: Lawrence Brakmo 
---
 include/net/tcp.h   |  7 ++-
 net/ipv4/tcp_bic.c  |  6 +++---
 net/ipv4/tcp_cdg.c  | 14 +++---
 net/ipv4/tcp_cubic.c|  6 +++---
 net/ipv4/tcp_htcp.c | 10 +-
 net/ipv4/tcp_illinois.c | 20 ++--
 net/ipv4/tcp_input.c|  7 +--
 net/ipv4/tcp_lp.c   |  6 +++---
 net/ipv4/tcp_vegas.c|  6 +++---
 net/ipv4/tcp_vegas.h|  2 +-
 net/ipv4/tcp_veno.c |  7 ---
 net/ipv4/tcp_westwood.c |  7 ---
 net/ipv4/tcp_yeah.c |  7 ---
 13 files changed, 58 insertions(+), 47 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 364426a..0121529 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -834,6 +834,11 @@ enum tcp_ca_ack_event_flags {
 
 union tcp_cc_info;
 
+struct ack_sample {
+   u32 pkts_acked;
+   s32 rtt_us;
+};
+
 struct tcp_congestion_ops {
struct list_head list;
u32 key;
@@ -857,7 +862,7 @@ struct tcp_congestion_ops {
/* new value of cwnd after loss (optional) */
u32  (*undo_cwnd)(struct sock *sk);
/* hook for packet ack accounting (optional) */
-   void (*pkts_acked)(struct sock *sk, u32 num_acked, s32 rtt_us);
+   void (*pkts_acked)(struct sock *sk, const struct ack_sample *sample);
/* get info for inet_diag (optional) */
size_t (*get_info)(struct sock *sk, u32 ext, int *attr,
   union tcp_cc_info *info);
diff --git a/net/ipv4/tcp_bic.c b/net/ipv4/tcp_bic.c
index fd1405d..f469f1b 100644
--- a/net/ipv4/tcp_bic.c
+++ b/net/ipv4/tcp_bic.c
@@ -197,15 +197,15 @@ static void bictcp_state(struct sock *sk, u8 new_state)
 /* Track delayed acknowledgment ratio using sliding window
  * ratio = (15*ratio + sample) / 16
  */
-static void bictcp_acked(struct sock *sk, u32 cnt, s32 rtt)
+static void bictcp_acked(struct sock *sk, const struct ack_sample *sample)
 {
const struct inet_connection_sock *icsk = inet_csk(sk);
 
if (icsk->icsk_ca_state == TCP_CA_Open) {
struct bictcp *ca = inet_csk_ca(sk);
 
-   cnt -= ca->delayed_ack >> ACK_RATIO_SHIFT;
-   ca->delayed_ack += cnt;
+   ca->delayed_ack += sample->pkts_acked - 
+   (ca->delayed_ack >> ACK_RATIO_SHIFT);
}
 }
 
diff --git a/net/ipv4/tcp_cdg.c b/net/ipv4/tcp_cdg.c
index 167b6a3..b4e5af7 100644
--- a/net/ipv4/tcp_cdg.c
+++ b/net/ipv4/tcp_cdg.c
@@ -294,12 +294,12 @@ static void tcp_cdg_cong_avoid(struct sock *sk, u32 ack, u32 acked)
ca->shadow_wnd = max(ca->shadow_wnd, ca->shadow_wnd + incr);
 }
 
-static void tcp_cdg_acked(struct sock *sk, u32 num_acked, s32 rtt_us)
+static void tcp_cdg_acked(struct sock *sk, const struct ack_sample *sample)
 {
struct cdg *ca = inet_csk_ca(sk);
struct tcp_sock *tp = tcp_sk(sk);
 
-   if (rtt_us <= 0)
+   if (sample->rtt_us <= 0)
return;
 
/* A heuristic for filtering delayed ACKs, adapted from:
@@ -307,20 +307,20 @@ static void tcp_cdg_acked(struct sock *sk, u32 num_acked, s32 rtt_us)
 * delay and rate based TCP mechanisms." TR 100219A. CAIA, 2010.
 */
if (tp->sacked_out == 0) {
-   if (num_acked == 1 && ca->delack) {
+   if (sample->pkts_acked == 1 && ca->delack) {
/* A delayed ACK is only used for the minimum if it is
 * provenly lower than an existing non-zero minimum.
 */
-   ca->rtt.min = min(ca->rtt.min, rtt_us);
+   ca->rtt.min = min(ca->rtt.min, sample->rtt_us);
ca->delack--;
return;
-   } else if (num_acked > 1 && ca->delack < 5) {
+   } else if (sample->pkts_acked > 1 && ca->delack < 5) {
ca->delack++;
}
}
 
-   ca->rtt.min = min_not_zero(ca->rtt.min, rtt_us);
-   ca->rtt.max = max(ca->rtt.max, rtt_us);
+   ca->rtt.min = min_not_zero(ca->rtt.min, sample->rtt_us);
+   ca->rtt.max = max(ca->rtt.max, sample->rtt_us);
 }
 
 static u32 tcp_cdg_ssthresh(struct sock *sk)
diff --git a/net/ipv4/tcp_cubic.c b/net/ipv4/tcp_cubic.c
index 28011fb..c5d0ba5 100644
--- a/net/ipv4/tcp_cubic.c
+++ b/net/ipv4/tcp_cubic.c
@@ -416,21 +416,21 @@ static void hystart_update(struct sock *sk, u32 delay)
 /* Track delayed acknowledgment ratio using sliding window
  * ratio = (15*ratio + sample) / 16
  */
-static void bictcp_acked(struct sock *sk, u32 cnt, s32 rtt_us)
+static void bictcp_acked(struct sock *sk, const struct ack_samp

[RFC PATCH v6 net-next 4/4] tcp: add NV congestion control

2015-08-25 Thread Lawrence Brakmo

This is a request for comments.

TCP-NV (New Vegas) is a major update to TCP-Vegas.
An earlier version of NV was presented at 2010's LPC.
It is a delay-based congestion avoidance algorithm for the
data center. This version has been tested within a
10G rack where the HW RTTs are 20-50us.

A description of TCP-NV, including implementation
details as well as experimental results, can be found at:
http://www.brakmo.org/networking/tcp-nv/TCPNV.html

The current version includes many module parameters to support
experimentation with the parameters.

Signed-off-by: Lawrence Brakmo 
---
 net/ipv4/Kconfig  |  16 ++
 net/ipv4/Makefile |   1 +
 net/ipv4/tcp_nv.c | 489 ++
 3 files changed, 506 insertions(+)
 create mode 100644 net/ipv4/tcp_nv.c

diff --git a/net/ipv4/Kconfig b/net/ipv4/Kconfig
index 6fb3c90..f11f2f8 100644
--- a/net/ipv4/Kconfig
+++ b/net/ipv4/Kconfig
@@ -539,6 +539,22 @@ config TCP_CONG_VEGAS
window. TCP Vegas should provide less packet loss, but it is
not as aggressive as TCP Reno.
 
+config TCP_CONG_NV
+   tristate "TCP NV"
+   default n
+   ---help---
+   TCP NV is a follow up to TCP Vegas. It has been modified to deal with
+   10G networks, measurement noise introduced by LRO, GRO and interrupt
+   coalescence. In addition, it will decrease its cwnd multiplicatively
+   instead of linearly.
+
+   Note that in general congestion avoidance (cwnd decreased when # packets
+   queued grows) cannot coexist with congestion control (cwnd decreased only
+   when there is packet loss) due to fairness issues. One scenario when they
+   can coexist safely is when the CA flows have RTTs << CC flows RTTs.
+
+   For further details see http://www.brakmo.org/networking/tcp-nv/
+
 config TCP_CONG_SCALABLE
tristate "Scalable TCP"
default n
diff --git a/net/ipv4/Makefile b/net/ipv4/Makefile
index efc43f3..06f335f 100644
--- a/net/ipv4/Makefile
+++ b/net/ipv4/Makefile
@@ -50,6 +50,7 @@ obj-$(CONFIG_TCP_CONG_HSTCP) += tcp_highspeed.o
 obj-$(CONFIG_TCP_CONG_HYBLA) += tcp_hybla.o
 obj-$(CONFIG_TCP_CONG_HTCP) += tcp_htcp.o
 obj-$(CONFIG_TCP_CONG_VEGAS) += tcp_vegas.o
+obj-$(CONFIG_TCP_CONG_NV) += tcp_nv.o
 obj-$(CONFIG_TCP_CONG_VENO) += tcp_veno.o
 obj-$(CONFIG_TCP_CONG_SCALABLE) += tcp_scalable.o
 obj-$(CONFIG_TCP_CONG_LP) += tcp_lp.o
diff --git a/net/ipv4/tcp_nv.c b/net/ipv4/tcp_nv.c
new file mode 100644
index 000..b0bbd85
--- /dev/null
+++ b/net/ipv4/tcp_nv.c
@@ -0,0 +1,489 @@
+/*
+ * TCP NV: TCP with Congestion Avoidance
+ *
+ * TCP-NV is a successor of TCP-Vegas that has been developed to
+ * deal with the issues that occur in modern networks.
+ * Like TCP-Vegas, TCP-NV supports true congestion avoidance,
+ * the ability to detect congestion before packet losses occur.
+ * When congestion (queue buildup) starts to occur, TCP-NV
+ * predicts what the cwnd size should be for the current
+ * throughput and it reduces the cwnd proportionally to
+ * the difference between the current cwnd and the predicted cwnd.
+ * TCP-NV behaves like Reno when no congestion is detected, or when
+ * recovering from packet losses.
+ *
+ * TODO:
+ * 1) Add option to not decrease cwnd on losses below certain level
+ * 2) Add mechanism to deal with reverse congestion.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/* TCP NV parameters */
+static int nv_enable __read_mostly = 1;
+static int nv_pad __read_mostly = 10;
+static int nv_pad_buffer __read_mostly = 2;
+static int nv_reset_period __read_mostly = 5;
+static int nv_min_cwnd = 10;
+static int nv_ssthresh_eval_min_calls = 30;
+static int nv_cong_decrease_mult = 30*128/100;
+static int nv_ssthresh_factor = 8;
+static int nv_rtt_factor = 128;
+static int nv_rtt_cnt_dec_delta = 0; /* dec cwnd by this many RTTs */
+static int nv_dec_factor = 8;  /* actual value is factor/8 */
+static int nv_loss_dec_factor = 512; /* on loss reduce cwnd by 50% */
+static int nv_cwnd_growth_factor = 0; /* 0 => Reno growth rate,
+  * 1 => double rate every 2 RTTs
+  * 2 => double rate every 3 RTTs, etc.
+  */
+static u8  nv_dec_eval_min_calls = 60;
+static u8  nv_rtt_min_cnt = 2;
+
+module_param(nv_enable, int, 0644);
+MODULE_PARM_DESC(nv_enable, "enable NV (congestion avoidance) behavior");
+module_param(nv_pad, int, 0644);
+MODULE_PARM_DESC(nv_pad, "extra packets above congestion level");
+module_param(nv_pad_buffer, int, 0644);
+MODULE_PARM_DESC(nv_pad_buffer, "no growth buffer zone");
+module_param(nv_reset_period, int, 0644);
+MODULE_PARM_DESC(nv_reset_period, "nv_min_rtt reset period (secs)");
+module_param(nv_min_cwnd, int, 0644);
+MODULE_PARM_DESC(nv_min_cwnd, "NV will not decrease cwnd below this value"
+" without losses");
+module_param(nv_dec_eval_min_calls, byte, 0644);
+MODULE_PARM_DESC(nv_dec_eval_min_calls, "Wait for this many

[RFC PATCH v6 net-next 2/4] tcp: refactor struct tcp_skb_cb

2015-08-25 Thread Lawrence Brakmo

Refactor tcp_skb_cb to create two overlapping areas to store
state for incoming or outgoing skbs based on comments by
Neal Cardwell to tcp_nv patch:

   AFAICT this patch would not require an increase in the size of
   sk_buff cb[] if it were to take advantage of the fact that the
   tcp_skb_cb header.h4 and header.h6 fields are only used in the packet
   reception code path, and this in_flight field is only used on the
   transmit side.
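
The overlap can be sketched in user space as below. The rx_parm4 and
rx_parm6 types and their sizes are hypothetical stand-ins for the
receive-side parameter blocks; only the union layout is the point. The
anonymous union needs C11 or the GNU extension.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the receive-side parameter blocks. */
struct rx_parm4 { uint8_t opaque[16]; };
struct rx_parm6 { uint8_t opaque[20]; };

struct demo_cb {
	uint32_t seq;
	uint32_t end_seq;
	union {
		struct {
			uint32_t in_flight;     /* transmit-only state */
		} tx;
		union {
			struct rx_parm4 h4;
			struct rx_parm6 h6;
		} header;                       /* receive-only state */
	};
};

/* Both areas must start at the same offset for the storage to be shared. */
static_assert(offsetof(struct demo_cb, tx) == offsetof(struct demo_cb, header),
	      "tx and header must overlap");

static size_t demo_cb_size(void)
{
	return sizeof(struct demo_cb);
}

static size_t demo_cb_tx_offset(void)
{
	return offsetof(struct demo_cb, tx);
}

static size_t demo_cb_header_offset(void)
{
	return offsetof(struct demo_cb, header);
}
```

Because the transmit area is smaller than the largest receive area, adding
it does not grow the structure in this sketch.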

Signed-off-by: Lawrence Brakmo 
---
 include/net/tcp.h | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 0121529..a086a98 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -755,11 +755,16 @@ struct tcp_skb_cb {
/* 1 byte hole */
__u32   ack_seq;/* Sequence number ACK'd*/
union {
-   struct inet_skb_parm h4;
+   struct {
+   /* There is space for up to 20 bytes */
+   } tx;   /* only used for outgoing skbs */
+   union {
+   struct inet_skb_parm h4;
 #if IS_ENABLED(CONFIG_IPV6)
-   struct inet6_skb_parm   h6;
+   struct inet6_skb_parm   h6;
 #endif
-   } header;   /* For incoming frames  */
+   } header;   /* For incoming skbs */
+   };
 };
 
 #define TCP_SKB_CB(__skb)  ((struct tcp_skb_cb *)&((__skb)->cb[0]))
-- 
1.8.1


[RFC PATCH v6 net-next 3/4] tcp: add in_flight to tcp_skb_cb

2015-08-25 Thread Lawrence Brakmo

Add in_flight (bytes in flight when packet was sent) field
to tx component of tcp_skb_cb and make it available to
congestion modules' pkts_acked() function through the
ack_sample function argument.
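
The value stored is the distance between the end of the segment being sent
and the oldest unacknowledged byte. A minimal sketch, assuming 32-bit
sequence numbers as in TCP; bytes_in_flight() is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bytes in flight once a segment ending at end_seq is sent, given the
 * oldest unacknowledged sequence number snd_una. Unsigned 32-bit
 * subtraction stays correct when the sequence space wraps around.
 */
static uint32_t bytes_in_flight(uint32_t end_seq, uint32_t snd_una)
{
	return end_seq - snd_una;
}
```

For example, end_seq 3000 with snd_una 1000 gives 2000 bytes, and the same
subtraction still yields the right distance when end_seq has wrapped past
zero while snd_una has not.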

Signed-off-by: Lawrence Brakmo 
---
 include/net/tcp.h | 2 ++
 net/ipv4/tcp_input.c  | 5 -
 net/ipv4/tcp_output.c | 4 +++-
 3 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index a086a98..cdd93e5 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -757,6 +757,7 @@ struct tcp_skb_cb {
union {
struct {
/* There is space for up to 20 bytes */
+   __u32 in_flight;/* Bytes in flight when packet sent */
} tx;   /* only used for outgoing skbs */
union {
struct inet_skb_parm h4;
@@ -842,6 +843,7 @@ union tcp_cc_info;
 struct ack_sample {
u32 pkts_acked;
s32 rtt_us;
+   u32 in_flight;
 };
 
 struct tcp_congestion_ops {
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index f506a0a..338e6bb 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3069,6 +3069,7 @@ static int tcp_clean_rtx_queue(struct sock *sk, int prior_fackets,
long ca_rtt_us = -1L;
struct sk_buff *skb;
u32 pkts_acked = 0;
+   u32 last_in_flight = 0;
bool rtt_update;
int flag = 0;
 
@@ -3108,6 +3109,7 @@ static int tcp_clean_rtx_queue(struct sock *sk, int prior_fackets,
if (!first_ackt.v64)
first_ackt = last_ackt;
 
+   last_in_flight = TCP_SKB_CB(skb)->tx.in_flight;
reord = min(pkts_acked, reord);
if (!after(scb->end_seq, tp->high_seq))
flag |= FLAG_ORIG_SACK_ACKED;
@@ -3197,7 +3199,8 @@ static int tcp_clean_rtx_queue(struct sock *sk, int prior_fackets,
}
 
if (icsk->icsk_ca_ops->pkts_acked) {
-   struct ack_sample sample = {pkts_acked, ca_rtt_us};
+   struct ack_sample sample = {pkts_acked, ca_rtt_us,
+   last_in_flight};
 
icsk->icsk_ca_ops->pkts_acked(sk, &sample);
}
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 444ab5b..244d201 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -920,9 +920,12 @@ static int tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, int clone_it,
int err;
 
BUG_ON(!skb || !tcp_skb_pcount(skb));
+   tp = tcp_sk(sk);
 
if (clone_it) {
skb_mstamp_get(&skb->skb_mstamp);
+   TCP_SKB_CB(skb)->tx.in_flight = TCP_SKB_CB(skb)->end_seq
+   - tp->snd_una;
 
if (unlikely(skb_cloned(skb)))
skb = pskb_copy(skb, gfp_mask);
@@ -933,7 +936,6 @@ static int tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, int clone_it,
}
 
inet = inet_sk(sk);
-   tp = tcp_sk(sk);
tcb = TCP_SKB_CB(skb);
memset(&opts, 0, sizeof(opts));
 
-- 
1.8.1


Re: [PATCH] net: fec: use reinit_completion() in mdio accessor functions

2015-08-25 Thread David Miller

From: Russell King 
Date: Tue, 25 Aug 2015 09:49:53 +0100

> Rather than re-initialising the entire completion on every mdio access,
> use reinit_completion() which only resets the completion count.  This
> avoids possible reinitialisation of the contained spinlock and waitqueue
> while they may be in use (eg, mid-completion.)
> 
> Such an event could occur if there's a long delay in interrupt handling
> causing the mdio accessor to time out, then a second access comes in
> while the interrupt handler on a different CPU has called complete().
> Another scenario where this has been observed is while locking has
> been missing at the phy layer, allowing concurrent attempts to access
> the MDIO bus.
> 
> Signed-off-by: Russell King 

Applied.
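
A toy user-space model of the distinction the patch relies on. This is not
the kernel's struct completion; the demo_completion fields are hypothetical
and only track which parts each operation touches.

```c
#include <assert.h>

/* Hypothetical model; field names are illustrative only. */
struct demo_completion {
	unsigned int done;      /* pending completion count */
	unsigned int lock_gen;  /* bumped each time the lock is re-created */
	unsigned int waiters;   /* stand-in for the wait queue contents */
};

/* Full init: also rebuilds the lock and wait queue, unsafe while in use. */
static void demo_init_completion(struct demo_completion *x)
{
	assert(x);
	x->done = 0;
	x->lock_gen++;
	x->waiters = 0;
}

/* Reinit: only resets the count, leaving lock and waiters untouched. */
static void demo_reinit_completion(struct demo_completion *x)
{
	assert(x);
	x->done = 0;
}
```

In this model a reinit between accesses leaves a concurrent waiter and the
lock alone, whereas a full init would discard both.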

Re: [PATCH RFC 0/5] net: L2 only interfaces

2015-08-25 Thread David Ahern


On 8/25/15 4:20 PM, Alexei Starovoitov wrote:
> On Tue, Aug 25, 2015 at 03:50:10PM -0700, Florian Fainelli wrote:
>> Hi all,
>>
>> This patch series implements a L2 only interface concept which basically denies
>> any kind of IP address configuration on these interfaces, but still allows them
>> to be used as configuration end-points to keep using ethtool and friends.
>>
>> A cleaner approach might be to finally come up with the concept of net_port
>> which a net_device would be a superset of, but this still raises tons of
>> questions as to whether we should be modifying userland tools to be able to
>> configure/query these interfaces. During all the switch talks/discussions last
>> year, it seemed to me like the L2-only interface is closest we have to a
>> "network port".
>>
>> Comments, flames, flying tomatoes welcome!
>>
>> Florian Fainelli (5):
>>   net: add IFF_L2_ONLY flag
>>   net: ipv4: Skip in_dev initialization for IFF_L2_ONLY interfaces
>>   net: ipv6: Skip in6_dev initialization for IFF_L2_ONLY interfaces
>
> interesting idea! Do you know how kernel/iproute2 will react to lack of in_dev?
> No crashes I'm assuming, but what kind of errors are thrown?
> imo great first step to have lightweight netdevs. +1 for 'net_port' in the
> future.

I was looking at a lightweight netdevice a couple of months ago --
bypassing procfs, sysfs and reducing the overall size of the net_device 
struct (which needs to go on a diet). In my POC (which is not ready for 
posting) I am using a link attribute (IFLA_LWT_NETDEV) as the trigger to 
bypass devinet_sysctl_register for example.


In your case you are proposing an interface flag. Is the intention to 
allow a run time change?


David

Re: [PATCH] net: phy: add locking to phy_read_mmd_indirect()/phy_write_mmd_indirect()

2015-08-25 Thread David Miller

From: Florian Fainelli 
Date: Tue, 25 Aug 2015 07:08:24 -0700

> Le 08/25/15 01:49, Russell King a écrit :
>> The phy layer is missing locking for the above two functions - it
>> has been observed that two threads (userspace and the phy worker
>> thread) can race, entering the bus ->write or ->read functions
>> simultaneously.
>> 
>> This causes the FEC driver to initialise a completion while another
>> thread is waiting on it or while the interrupt is calling complete()
>> on it, which causes spinlock unlock-without-lock, spinlock lockups,
>> and completion timeouts.
>> 
>> Signed-off-by: Russell King 
> 
> Acked-by: Florian Fainelli 
> Fixes: a59a4d192 ("phy: add the EEE support and the way to access to the
> MMD registers.")
> Fixes: 0c1d77dfb ("net: libphy: Add phy specific function to access mmd
> phy registers")

Applied, thanks.

Re: [PATCH 0/6] RDS: Few more fixes

2015-08-25 Thread David Miller

From: Santosh Shilimkar 
Date: Tue, 25 Aug 2015 12:01:57 -0700

> As indicated in the earlier series [1], this is a follow-up series which
> addresses a few issues around the RDS FMR code. With [1] and the subject
> series, I can now run many parallel threads with multiple sockets with
> N x N traffic. The stress tests have survived overnight runs.

Series applied, thanks.

Re: [PATCH RFC 0/5] net: L2 only interfaces

2015-08-25 Thread Florian Fainelli

On 25/08/15 16:20, Alexei Starovoitov wrote:
> On Tue, Aug 25, 2015 at 03:50:10PM -0700, Florian Fainelli wrote:
>> Hi all,
>>
>> This patch series implements a L2 only interface concept which basically
>> denies any kind of IP address configuration on these interfaces, but still
>> allows them to be used as configuration end-points to keep using ethtool
>> and friends.
>>
>> A cleaner approach might be to finally come up with the concept of net_port
>> which a net_device would be a superset of, but this still raises tons of
>> questions as to whether we should be modifying userland tools to be able to
>> configure/query these interfaces. During all the switch talks/discussions
>> last year, it seemed to me like the L2-only interface is closest we have to
>> a "network port".
>>
>> Comments, flames, flying tomatoes welcome!
>>
>> Florian Fainelli (5):
>>   net: add IFF_L2_ONLY flag
>>   net: ipv4: Skip in_dev initialization for IFF_L2_ONLY interfaces
>>   net: ipv6: Skip in6_dev initialization for IFF_L2_ONLY interfaces
> 
> interesting idea! Do you know how kernel/iproute2 will react to lack of in_dev?

Surprisingly well so far; I have not found a way to make the kernel
crash ;)

> No crashes I'm assuming, but what kind of errors are thrown?

If you try to assign an IP address to such an interface, you get:

# ifconfig gphy 192.168.1.1
ifconfig: SIOCSIFADDR: No buffer space available
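
A user-space sketch of the behaviour shown above, assuming the error comes
from the missing IPv4 per-device state. DEMO_IFF_L2_ONLY, struct demo_netdev
and both helpers are hypothetical; the real flag value is defined by the
series in if.h and is not shown here.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical flag value, for illustration only. */
#define DEMO_IFF_L2_ONLY 0x1u

struct demo_netdev {
	unsigned int flags;
	int has_in_dev;         /* nonzero once IPv4 state is attached */
};

/* Device setup: skip IPv4 state for L2-only interfaces. */
static void demo_register(struct demo_netdev *dev)
{
	assert(dev);
	dev->has_in_dev = !(dev->flags & DEMO_IFF_L2_ONLY);
}

/* Address assignment fails with -ENOBUFS when there is no IPv4 state. */
static int demo_set_addr(const struct demo_netdev *dev)
{
	assert(dev);
	return dev->has_in_dev ? 0 : -ENOBUFS;
}
```

ENOBUFS is the errno that ifconfig reports as "No buffer space available".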

> imo great first step to have lightweight netdevs. +1 for 'net_port' in the
> future.

Thanks!
-- 
Florian

Re: [PATCH RFC 0/5] net: L2 only interfaces

2015-08-25 Thread Florian Fainelli

On 25/08/15 16:24, Stephen Hemminger wrote:
> On Tue, 25 Aug 2015 15:50:10 -0700
> Florian Fainelli  wrote:
> 
>> Hi all,
>>
>> This patch series implements a L2 only interface concept which basically
>> denies any kind of IP address configuration on these interfaces, but still
>> allows them to be used as configuration end-points to keep using ethtool
>> and friends.
>>
>> A cleaner approach might be to finally come up with the concept of net_port
>> which a net_device would be a superset of, but this still raises tons of
>> questions as to whether we should be modifying userland tools to be able to
>> configure/query these interfaces. During all the switch talks/discussions
>> last year, it seemed to me like the L2-only interface is closest we have to
>> a "network port".
>>
>> Comments, flames, flying tomatoes welcome!
>>
>> Florian Fainelli (5):
>>   net: add IFF_L2_ONLY flag
>>   net: ipv4: Skip in_dev initialization for IFF_L2_ONLY interfaces
>>   net: ipv6: Skip in6_dev initialization for IFF_L2_ONLY interfaces
>>   net: dsa: Flag slave network devices with IFF_L2_ONLY
>>   net: dsa: bcm_sf2: Allow disabling tagging protocol
>>
>>  drivers/net/dsa/bcm_sf2.c | 16 +---
>>  include/uapi/linux/if.h   |  5 -
>>  net/dsa/slave.c   |  1 +
>>  net/ipv4/devinet.c|  3 +++
>>  net/ipv6/addrconf.c   |  4 
>>  5 files changed, 25 insertions(+), 4 deletions(-)
>>
> 
> Can you bridge these?

You can add such an interface to the bridge, but I am still figuring out
how functional such a bridge is, because with my change to bcm_sf2,
there is no switch tag inserted, so I cannot differentiate a BPDU from
Port 0, 1 etc... probably of limited use. You could still configure
VLANs using bridge vlan filtering though, which was the main idea.
-- 
Florian

Re: [PATCH net] vxlan: re-ignore EADDRINUSE from igmp_join

2015-08-25 Thread David Miller

From: Marcelo Ricardo Leitner 
Date: Tue, 25 Aug 2015 20:22:35 -0300

> Before 56ef9c909b40[1] it used to ignore all errors from igmp_join().
> That commit enhanced that and made it error out whatever error happened
> with igmp_join(), but that's not good because when using multicast
> groups vxlan will try to join it multiple times if the socket is reused
> and then the 2nd and further attempts will fail with EADDRINUSE.
> 
> As we don't track to which groups the socket is already subscribed, it's
> okay to just ignore that error.
> 
> Fixes: 56ef9c909b40 ("vxlan: Move socket initialization to within rtnl scope")
> Reported-by: John Nielsen 
> Signed-off-by: Marcelo Ricardo Leitner 
> ---
> 
> John, please see how this goes for you. It worked in here.
> 
> Dave, please consider this for stable trees. At least 4.1 is affected.

Applied and queued up for -stable, thanks.
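
The behaviour described in the quoted changelog amounts to filtering one
error code. A minimal sketch, with join_result_filter() as a hypothetical
helper name rather than anything from the vxlan driver:

```c
#include <assert.h>
#include <errno.h>

/*
 * Treat "already joined" as success and pass every other result through.
 * The socket may already be subscribed to the group when it is reused.
 */
static int join_result_filter(int ret)
{
	if (ret == -EADDRINUSE)
		ret = 0;
	return ret;
}
```

Other errors, such as -ENODEV, still propagate to the caller unchanged.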

Re: [PATCH RFC 0/5] net: L2 only interfaces

2015-08-25 Thread Stephen Hemminger

On Tue, 25 Aug 2015 15:50:10 -0700
Florian Fainelli  wrote:

> Hi all,
> 
> This patch series implements a L2 only interface concept which basically denies
> any kind of IP address configuration on these interfaces, but still allows them
> to be used as configuration end-points to keep using ethtool and friends.
> 
> A cleaner approach might be to finally come up with the concept of net_port
> which a net_device would be a superset of, but this still raises tons of
> questions as to whether we should be modifying userland tools to be able to
> configure/query these interfaces. During all the switch talks/discussions last
> year, it seemed to me like the L2-only interface is the closest we have to a
> "network port".
> 
> Comments, flames, flying tomatoes welcome!
> 
> Florian Fainelli (5):
>   net: add IFF_L2_ONLY flag
>   net: ipv4: Skip in_dev initialization for IFF_L2_ONLY interfaces
>   net: ipv6: Skip in6_dev initialization for IFF_L2_ONLY interfaces
>   net: dsa: Flag slave network devices with IFF_L2_ONLY
>   net: dsa: bcm_sf2: Allow disabling tagging protocol
> 
>  drivers/net/dsa/bcm_sf2.c | 16 +---
>  include/uapi/linux/if.h   |  5 -
>  net/dsa/slave.c   |  1 +
>  net/ipv4/devinet.c|  3 +++
>  net/ipv6/addrconf.c   |  4 
>  5 files changed, 25 insertions(+), 4 deletions(-)
> 

Can you bridge these?

Re: [PATCH net-next 1/4] net/mlx5_core: Expose transobj APIs from mlx5 core

2015-08-25 Thread David Miller

From: Achiad Shochat 
Date: Tue, 25 Aug 2015 15:29:57 +0300

> From: Yishai Hadas 
> 
> Move transobj.h from the core library to include/linux/mlx5
> and expose its APIs.
> It enables using its functionality outside of mlx5 core.
> 
> Signed-off-by: Yishai Hadas 

You can submit this patch when you submit the upstream change that makes
use of this code in such a way, not beforehand.

[PATCH net] vxlan: re-ignore EADDRINUSE from igmp_join

2015-08-25 Thread Marcelo Ricardo Leitner

Before 56ef9c909b40[1] it used to ignore all errors from igmp_join().
That commit changed this so that any error from igmp_join() is now
returned, but that's not good because when using multicast groups vxlan
will try to join the same group multiple times if the socket is reused,
and the 2nd and further attempts will fail with EADDRINUSE.

As we don't track to which groups the socket is already subscribed, it's
okay to just ignore that error.

Fixes: 56ef9c909b40 ("vxlan: Move socket initialization to within rtnl scope")
Reported-by: John Nielsen 
Signed-off-by: Marcelo Ricardo Leitner 
---

John, please see how this goes for you. It worked in here.

Dave, please consider this for stable trees. At least 4.1 is affected.

Thanks.

 drivers/net/vxlan.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 61b457b9ec00517037e4833790bea97ac53aa832..b0a2da5d4e57e602c915b65abd34a83939c4c473 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -2279,6 +2279,8 @@ static int vxlan_open(struct net_device *dev)
 
 	if (vxlan_addr_multicast(&vxlan->default_dst.remote_ip)) {
 		ret = vxlan_igmp_join(vxlan);
+		if (ret == -EADDRINUSE)
+			ret = 0;
 		if (ret) {
 			vxlan_sock_release(vs);
 			return ret;
-- 
2.4.3
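
As a rough illustration of the behaviour this patch restores, here is a
minimal userspace sketch. The helpers join_group() and open_device() are
hypothetical stand-ins (they are not functions from vxlan.c), and the
"already joined" flag only models a socket that is being reused:

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for vxlan_igmp_join(): fails with -EADDRINUSE
 * when the shared socket has already joined the group.
 */
static int join_group(bool *already_joined)
{
	if (*already_joined)
		return -EADDRINUSE;
	*already_joined = true;
	return 0;
}

/* Illustrative open path: a repeated join on a reused socket is not
 * treated as a failure; any other error would still be propagated.
 */
static int open_device(bool *already_joined)
{
	int ret = join_group(already_joined);

	if (ret == -EADDRINUSE)
		ret = 0;
	return ret;
}
```

Calling open_device() twice with the same flag succeeds both times, while
a direct second call to join_group() still reports -EADDRINUSE.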


Re: pull request: batman-adv 20150825

2015-08-25 Thread David Miller

From: Antonio Quartulli 
Date: Tue, 25 Aug 2015 13:02:24 +0200

> here you have another batch intended for net-next/linux-4.3.
> 
> Most of the patches are about code restyling and beautification,
> but we also have a couple of non-critical fixes which didn't make their
> way through net.
> 
> Please pull or let me know of any problem!

Pulled, thanks Antonio.

Re: [PATCH RFC 0/5] net: L2 only interfaces

2015-08-25 Thread Alexei Starovoitov

On Tue, Aug 25, 2015 at 03:50:10PM -0700, Florian Fainelli wrote:
> Hi all,
> 
> This patch series implements a L2 only interface concept which basically denies
> any kind of IP address configuration on these interfaces, but still allows them
> to be used as configuration end-points to keep using ethtool and friends.
> 
> A cleaner approach might be to finally come up with the concept of net_port
> which a net_device would be a superset of, but this still raises tons of
> questions as to whether we should be modifying userland tools to be able to
> configure/query these interfaces. During all the switch talks/discussions last
> year, it seemed to me like the L2-only interface is the closest we have to a
> "network port".
> 
> Comments, flames, flying tomatoes welcome!
> 
> Florian Fainelli (5):
>   net: add IFF_L2_ONLY flag
>   net: ipv4: Skip in_dev initialization for IFF_L2_ONLY interfaces
>   net: ipv6: Skip in6_dev initialization for IFF_L2_ONLY interfaces

interesting idea! Do you know how kernel/iproute2 will react to lack of in_dev?
No crashes I'm assuming, but what kind of errors are thrown?
imo great first step to have lightweight netdevs. +1 for 'net_port' in the future.


Re: pull-request: can 2015-08-25

2015-08-25 Thread David Miller

From: Marc Kleine-Budde 
Date: Tue, 25 Aug 2015 08:55:45 +0200

> this is the updated pull request of one patch by me for the peak_usb driver. It
> fixes the driver, so that non FD adapters don't provide CAN FD bittimings.
 ...
> are available in the git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can.git tags/linux-can-fixes-for-4.2-20150825

Pulled, thanks.

Re: [RFC, RFT PATCH 1/2] dl2k: Add support for IP1000A-based cards

2015-08-25 Thread Francois Romieu

Ondrej Zary  :
[...]
> Actually, gigabit works with this patch. The "PHY magic" part contains 
> mii_write(9, 0x0700) which makes gigabit work.

It'd be ecstatic if you could set mii->supports_gmii for the appropriate
PHY so that dl2k.c::rio_get_settings returns sensible gigabit status
(note: neither ipg.c nor dl2k.c currently does).

-- 
Ueimor

Re: [PATCH RFC 0/2] Optimize the snmp stat aggregation for large cpus

2015-08-25 Thread David Miller

From: Raghavendra K T 
Date: Tue, 25 Aug 2015 13:24:24 +0530

> Please let me know if you have suggestions/comments.

Like Eric Dumazet said the idea is good but needs some adjustments.

You might want to see whether a per-cpu work buffer works for this.

It's extremely unfortunate that we can't depend upon the destination
buffer being properly aligned, because we wouldn't need a temporary
scratch area if it were aligned properly.

Re: [PATCH net-next v2] enic: reduce ioread in devcmd2

2015-08-25 Thread David Miller

From: Govindarajulu Varadarajan <_gov...@gmx.com>
Date: Tue, 25 Aug 2015 14:15:11 +0530

> posted_index is RO in firmware. We need not do ioread every time to get
> posted index. Store posted index locally.
> 
> Signed-off-by: Govindarajulu Varadarajan <_gov...@gmx.com>
> ---
> v2: initialize devcmd2->posted properly in vnic_dev_init_devcmd2

Applied, thanks.

Re: [PATCH repost net-next] net: compile renesas directory if NET_VENDOR_RENESAS is configured

2015-08-25 Thread David Miller

From: Simon Horman 
Date: Wed, 26 Aug 2015 08:55:39 +1000

> On Wed, Aug 26, 2015 at 12:15:57AM +0300, Sergei Shtylyov wrote:
>> On 08/25/2015 11:34 AM, Sergei Shtylyov wrote:
>> 
>> >>From: Kazuya Mizuguchi 
>> >>
>> >>Currently the renesas ethernet driver directory is compiled if SH_ETH is
>> >>configured rather than NET_VENDOR_RENESAS. Although incorrect that was
>> >>quite harmless, as until recently SH_ETH configured the only driver in
>> >>the renesas directory. However, as of c156633f1353 ("Renesas Ethernet AVB
>> >>driver proper") the renesas directory includes another driver, configured
>> >>by RAVB, and it makes little sense for it to have a hidden dependency on
>> >>SH_ETH.
>> >>
>> >>Signed-off-by: Kazuya Mizuguchi 
>> >>[horms: rewrote changelog]
>> >>Signed-off-by: Simon Horman 
>> >
>> >Sorry about missing that when submitting the AVB driver.
>> 
>>BTW, why against net-next? I consider this a fix.
> 
> I wasn't sure which way to go and decided to err on the side of caution with a
> net-next submission. I have no objections to it being considered for next.

I'll put it in the 'net' tree.

Applied, thanks.


Re: [net-next:master 1267/1290] net/rds/ib_recv.c:382:28: sparse: incorrect type in initializer (different base types)

2015-08-25 Thread santosh.shilim...@oracle.com


On 8/25/15 3:55 PM, David Miller wrote:
> From: kbuild test robot 
> Date: Wed, 26 Aug 2015 06:42:39 +0800
>
>> sparse warnings: (new ones prefixed by >>)
>>
>> net/rds/ib_recv.c:382:28: sparse: incorrect type in initializer (different base types)
>> net/rds/ib_recv.c:382:28:expected int [signed] can_wait
>> net/rds/ib_recv.c:382:28:got restricted gfp_t
>> net/rds/ib_recv.c:828:23: sparse: cast to restricted __le64
>
> Fixed by:
>
> [PATCH] rds: Fix improper gfp_t usage.
>
> net/rds/ib_recv.c:382:28: sparse: incorrect type in initializer (different base types)
> net/rds/ib_recv.c:382:28:expected int [signed] can_wait
> net/rds/ib_recv.c:382:28:got restricted gfp_t
> net/rds/ib_recv.c:828:23: sparse: cast to restricted __le64
>
> Reported-by: kbuild test robot 
> Signed-off-by: David S. Miller 
> ---

Thanks Dave. I was just creating the patch after noticing
the error from kbuild on my tree.

Regards,
Santosh

Re: [PATCH v4 net-next] r8169: Add values missing in @get_stats64 from HW counters

2015-08-25 Thread David Miller

From: David Miller 
Date: Tue, 25 Aug 2015 15:59:21 -0700 (PDT)

> From: Francois Romieu 
> Date: Wed, 26 Aug 2015 00:54:06 +0200
> 
>>> Bringing the interface down/up should not reset the
>>> counters.
>> 
>> Afaiks rtl8169_tc_offsets.inited in rtl8169_init_counter_offsets
>> takes care of it: it's set during the first open() after probe().
> 
> Ok, then it's fine.

And as such I've applied this patch, thanks.

Re: [PATCH v4 net-next] r8169: Add values missing in @get_stats64 from HW counters

2015-08-25 Thread David Miller

From: Francois Romieu 
Date: Wed, 26 Aug 2015 00:54:06 +0200

> David Miller  :
> [...]
>> Your counter offsets should be read at probe time, not open time.
> 
> It can be done but the "CmdRxEnb / rx traffic must be enabled" constraint
> will make it a major pita. 
> 
> Reading counter offsets at the end of open() naturally solves this
> constraint (retentive error unwinding in open() stops being completely
> trivial though :o/ ).

So you can manage whether you've done the "once per device probe"
counter reset, and act upon it at the end of ->open().

>> Bringing the interface down/up should not reset the
>> counters.
> 
> Afaiks rtl8169_tc_offsets.inited in rtl8169_init_counter_offsets
> takes care of it: it's set during the first open() after probe().

Ok, then it's fine.

Re: [PATCH repost net-next] net: compile renesas directory if NET_VENDOR_RENESAS is configured

2015-08-25 Thread Simon Horman

On Wed, Aug 26, 2015 at 12:15:57AM +0300, Sergei Shtylyov wrote:
> On 08/25/2015 11:34 AM, Sergei Shtylyov wrote:
> 
> >>From: Kazuya Mizuguchi 
> >>
> >>Currently the renesas ethernet driver directory is compiled if SH_ETH is
> >>configured rather than NET_VENDOR_RENESAS. Although incorrect that was
> >>quite harmless, as until recently SH_ETH configured the only driver in
> >>the renesas directory. However, as of c156633f1353 ("Renesas Ethernet AVB
> >>driver proper") the renesas directory includes another driver, configured
> >>by RAVB, and it makes little sense for it to have a hidden dependency on
> >>SH_ETH.
> >>
> >>Signed-off-by: Kazuya Mizuguchi 
> >>[horms: rewrote changelog]
> >>Signed-off-by: Simon Horman 
> >
> >Sorry about missing that when submitting the AVB driver.
> 
>BTW, why against net-next? I consider this a fix.

I wasn't sure which way to go and decided to err on the side of caution with a
net-next submission. I have no objections to it being considered for next.

Re: [net-next:master 1267/1290] net/rds/ib_recv.c:382:28: sparse: incorrect type in initializer (different base types)

2015-08-25 Thread David Miller

From: kbuild test robot 
Date: Wed, 26 Aug 2015 06:42:39 +0800

> sparse warnings: (new ones prefixed by >>)
> 
>>> net/rds/ib_recv.c:382:28: sparse: incorrect type in initializer (different 
>>> base types)
>net/rds/ib_recv.c:382:28:expected int [signed] can_wait
>net/rds/ib_recv.c:382:28:got restricted gfp_t
>net/rds/ib_recv.c:828:23: sparse: cast to restricted __le64

Fixed by:


[PATCH] rds: Fix improper gfp_t usage.

>> net/rds/ib_recv.c:382:28: sparse: incorrect type in initializer (different 
>> base types)
   net/rds/ib_recv.c:382:28:expected int [signed] can_wait
   net/rds/ib_recv.c:382:28:got restricted gfp_t
   net/rds/ib_recv.c:828:23: sparse: cast to restricted __le64

Reported-by: kbuild test robot 
Signed-off-by: David S. Miller 
---
 net/rds/ib_recv.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/rds/ib_recv.c b/net/rds/ib_recv.c
index 3afdcbd..ed9b41e 100644
--- a/net/rds/ib_recv.c
+++ b/net/rds/ib_recv.c
@@ -379,7 +379,7 @@ void rds_ib_recv_refill(struct rds_connection *conn, int prefill, gfp_t gfp)
 	struct ib_recv_wr *failed_wr;
 	unsigned int posted = 0;
 	int ret = 0;
-	int can_wait = gfp & __GFP_WAIT;
+	bool can_wait = !!(gfp & __GFP_WAIT);
 	u32 pos;
 
 	/* the goal here is to just make sure that someone, somewhere
-- 
2.1.0
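
The "!!" idiom used in this fix can be shown in isolation. The sketch
below is plain userspace C; EXAMPLE_FLAG_WAIT and flag_can_wait() are
made-up names, and the bit value is illustrative rather than the
kernel's real __GFP_WAIT:

```c
#include <stdbool.h>

/* Illustrative flag bit; not the kernel's real __GFP_WAIT value. */
#define EXAMPLE_FLAG_WAIT 0x10u

/* Collapse a masked flag word to a plain true/false value, so the
 * result no longer carries the type or the bit position of the mask.
 */
static bool flag_can_wait(unsigned int flags)
{
	return !!(flags & EXAMPLE_FLAG_WAIT);
}
```

The caller only ever sees true or false, whichever other bits happen to
be set in the flag word.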


Re: [PATCH v4 net-next] r8169: Add values missing in @get_stats64 from HW counters

2015-08-25 Thread Francois Romieu

David Miller  :
[...]
> Your counter offsets should be read at probe time, not open time.

It can be done but the "CmdRxEnb / rx traffic must be enabled" constraint
will make it a major pita. 

Reading counter offsets at the end of open() naturally solves this
constraint (retentive error unwinding in open() stops being completely
trivial though :o/ ).

> Bringing the interface down/up should not reset the
> counters.

Afaiks rtl8169_tc_offsets.inited in rtl8169_init_counter_offsets
takes care of it: it's set during the first open() after probe().

Looking at it again, the patch directly stores 16 and 32 bit values
in rtnl_link_stats64. Nobody should care about an exact, exceedingly high
error count, but rx_multicast ought to be accumulated.

-- 
Ueimor
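
The two points above (take the baseline once, and accumulate narrow
hardware counters instead of storing them directly) can be sketched as
follows. This is a hypothetical userspace model, not r8169 code: struct
counter_state and both helpers are invented names, and a 32-bit
free-running hardware counter is assumed:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-device state; field names are illustrative only. */
struct counter_state {
	bool inited;      /* set once, on the first open() after probe() */
	uint32_t last_hw; /* last raw 32-bit hardware reading */
	uint64_t total;   /* 64-bit value reported to the stack */
};

/* Record the baseline once, so later down/up cycles keep the total. */
static void counter_init_offset(struct counter_state *st, uint32_t hw_now)
{
	if (st->inited)
		return;
	st->last_hw = hw_now;
	st->inited = true;
}

/* Fold a new raw reading into the 64-bit total. Unsigned subtraction
 * yields the right delta across a single 32-bit wraparound.
 */
static uint64_t counter_update(struct counter_state *st, uint32_t hw_now)
{
	st->total += (uint32_t)(hw_now - st->last_hw);
	st->last_hw = hw_now;
	return st->total;
}
```

A second call to counter_init_offset() is a no-op, which models the
interface going down and up again without the counters being reset.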

[PATCH RFC 0/5] net: L2 only interfaces

2015-08-25 Thread Florian Fainelli

Hi all,

This patch series implements a L2 only interface concept which basically denies
any kind of IP address configuration on these interfaces, but still allows them
to be used as configuration end-points to keep using ethtool and friends.

A cleaner approach might be to finally come up with the concept of net_port
which a net_device would be a superset of, but this still raises tons of
questions as to whether we should be modifying userland tools to be able to
configure/query these interfaces. During all the switch talks/discussions last
year, it seemed to me like the L2-only interface is the closest we have to a
"network port".

Comments, flames, flying tomatoes welcome!

Florian Fainelli (5):
  net: add IFF_L2_ONLY flag
  net: ipv4: Skip in_dev initialization for IFF_L2_ONLY interfaces
  net: ipv6: Skip in6_dev initialization for IFF_L2_ONLY interfaces
  net: dsa: Flag slave network devices with IFF_L2_ONLY
  net: dsa: bcm_sf2: Allow disabling tagging protocol

 drivers/net/dsa/bcm_sf2.c | 16 +---
 include/uapi/linux/if.h   |  5 -
 net/dsa/slave.c   |  1 +
 net/ipv4/devinet.c|  3 +++
 net/ipv6/addrconf.c   |  4 
 5 files changed, 25 insertions(+), 4 deletions(-)

-- 
2.1.0


[PATCH RFC 3/5] net: ipv6: Skip in6_dev initialization for IFF_L2_ONLY interfaces

2015-08-25 Thread Florian Fainelli

IFF_L2_ONLY interfaces are Layer-2 only network devices and do not
support configuration of IPv6 addresses, nor the full IPv6 protocol
stack. Do nothing for these interfaces.

Signed-off-by: Florian Fainelli 
---
 net/ipv6/addrconf.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 0f08d3b9e238..0365b5ffe339 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -3161,6 +3161,9 @@ static int addrconf_notify(struct notifier_block *this, unsigned long event,
 	int run_pending = 0;
 	int err;
 
+	if (dev->flags & IFF_L2_ONLY)
+		goto out;
+
 	switch (event) {
 	case NETDEV_REGISTER:
 		if (!idev && dev->mtu >= IPV6_MIN_MTU) {
@@ -3304,6 +3307,7 @@ static int addrconf_notify(struct notifier_block *this, unsigned long event,
 		addrconf_type_change(dev, event);
 		break;
 	}
+out:
 
 	return NOTIFY_OK;
 }
-- 
2.1.0


[PATCH RFC 1/5] net: add IFF_L2_ONLY flag

2015-08-25 Thread Florian Fainelli

Allow network device drivers to flag specific network devices as being
L2 only, that is, no IPv4/v6 configuration will be allowed on these
interfaces, yet they are still usable as configuration endpoints for
ethtool interfaces.

Signed-off-by: Florian Fainelli 
---
 include/uapi/linux/if.h | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/if.h b/include/uapi/linux/if.h
index 9cf2394f0bcf..2de818930edf 100644
--- a/include/uapi/linux/if.h
+++ b/include/uapi/linux/if.h
@@ -87,6 +87,7 @@ enum net_device_flags {
 	IFF_LOWER_UP			= 1<<16, /* volatile */
 	IFF_DORMANT			= 1<<17, /* volatile */
 	IFF_ECHO			= 1<<18, /* volatile */
+	IFF_L2_ONLY			= 1<<19, /* volatile */
 };
 
 #define IFF_UP				IFF_UP
@@ -108,9 +109,11 @@ enum net_device_flags {
 #define IFF_LOWER_UP			IFF_LOWER_UP
 #define IFF_DORMANT			IFF_DORMANT
 #define IFF_ECHO			IFF_ECHO
+#define IFF_L2_ONLY			IFF_L2_ONLY
 
 #define IFF_VOLATILE	(IFF_LOOPBACK|IFF_POINTOPOINT|IFF_BROADCAST|IFF_ECHO|\
-		IFF_MASTER|IFF_SLAVE|IFF_RUNNING|IFF_LOWER_UP|IFF_DORMANT)
+		IFF_MASTER|IFF_SLAVE|IFF_RUNNING|IFF_LOWER_UP|IFF_DORMANT|\
+		IFF_L2_ONLY)
 
 #define IF_GET_IFACE	0x0001		/* for querying only */
 #define IF_GET_PROTO	0x0002
-- 
2.1.0


[PATCH RFC 4/5] net: dsa: Flag slave network devices with IFF_L2_ONLY

2015-08-25 Thread Florian Fainelli

When tagging is not supported by the underlying switch driver,
ds->tag_protocol will be set to DSA_TAG_PROTO_NONE, and we should be
flagging the slave network devices with IFF_L2_ONLY such that IP
configuration is denied and they are just control end-points.

Signed-off-by: Florian Fainelli 
---
 net/dsa/slave.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index cce97385f743..855c66dddced 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -1185,6 +1185,7 @@ int dsa_slave_create(struct dsa_switch *ds, struct device *parent,
 		break;
 #endif
 	default:
+		slave_dev->flags |= IFF_L2_ONLY;
 		p->xmit	= dsa_slave_notag_xmit;
 		break;
 	}
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH RFC 2/5] net: ipv4: Skip in_dev initialization for IFF_L2_ONLY interfaces

2015-08-25 Thread Florian Fainelli

IFF_L2_ONLY interfaces are Layer-2 only and do not support configuration
of IPv4 addresses, nor the full IPv4 protocol stack. Do nothing for
these interfaces.

Signed-off-by: Florian Fainelli 
---
 net/ipv4/devinet.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index 2d9cb1748f81..30068754e821 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -1383,6 +1383,9 @@ static int inetdev_event(struct notifier_block *this, unsigned long event,
 	ASSERT_RTNL();
 
 	if (!in_dev) {
+		if (dev->flags & IFF_L2_ONLY)
+			goto out;
+
 		if (event == NETDEV_REGISTER) {
 			in_dev = inetdev_init(dev);
 			if (IS_ERR(in_dev))
-- 
2.1.0
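
The effect of a check like the one in this patch can be modelled in
userspace as below. Everything here is a hypothetical sketch: struct
example_dev, example_register() and example_add_address() are invented
for illustration, only the bit position 1<<19 comes from the proposed
series, and the -ENODEV return is an arbitrary choice because the thread
does not say which error userspace would actually see:

```c
#include <errno.h>

/* Bit position taken from the proposed series; the rest of this file
 * is a userspace model, not kernel code.
 */
#define EXAMPLE_IFF_L2_ONLY (1u << 19)

struct example_dev {
	unsigned int flags;
	int has_ip_state; /* models in_dev / in6_dev having been set up */
};

/* Models the register notifier: skip L3 setup for L2-only devices. */
static void example_register(struct example_dev *dev)
{
	if (dev->flags & EXAMPLE_IFF_L2_ONLY)
		return;
	dev->has_ip_state = 1;
}

/* Address configuration needs the L3 state to exist. The error code
 * is an assumption made for this sketch only.
 */
static int example_add_address(const struct example_dev *dev)
{
	return dev->has_ip_state ? 0 : -ENODEV;
}
```

A device registered with the flag set never gets the per-protocol state,
so any later attempt to configure an address on it fails, while an
ordinary device is unaffected.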


[PATCH RFC 5/5] net: dsa: bcm_sf2: Allow disabling tagging protocol

2015-08-25 Thread Florian Fainelli

Update the IMP port configuration to check whether tagging is enabled
(DSA_TAG_PROTO_BRCM) or disabled (DSA_TAG_PROTO_NONE) and correctly
program the relevant registers in both cases.

Signed-off-by: Florian Fainelli 
---
 drivers/net/dsa/bcm_sf2.c | 16 +---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/drivers/net/dsa/bcm_sf2.c b/drivers/net/dsa/bcm_sf2.c
index 289e20443d83..68abcc545231 100644
--- a/drivers/net/dsa/bcm_sf2.c
+++ b/drivers/net/dsa/bcm_sf2.c
@@ -159,6 +159,7 @@ static void bcm_sf2_imp_vlan_setup(struct dsa_switch *ds, int cpu_port)
 
 static void bcm_sf2_imp_setup(struct dsa_switch *ds, int port)
 {
+	bool tagging_disabled = !!(ds->tag_protocol == DSA_TAG_PROTO_NONE);
 	struct bcm_sf2_priv *priv = ds_to_priv(ds);
 	u32 reg, val;
 
@@ -199,21 +200,30 @@ static void bcm_sf2_imp_setup(struct dsa_switch *ds, int port)
 
 	/* Enable Broadcom tags for IMP port */
 	reg = core_readl(priv, CORE_BRCM_HDR_CTRL);
-	reg |= val;
+	if (!tagging_disabled)
+		reg |= val;
+	else
+		reg &= ~val;
 	core_writel(priv, reg, CORE_BRCM_HDR_CTRL);
 
 	/* Enable reception Broadcom tag for CPU TX (switch RX) to
 	 * allow us to tag outgoing frames
 	 */
 	reg = core_readl(priv, CORE_BRCM_HDR_RX_DIS);
-	reg &= ~(1 << port);
+	if (tagging_disabled)
+		reg |= 1 << port;
+	else
+		reg &= ~(1 << port);
 	core_writel(priv, reg, CORE_BRCM_HDR_RX_DIS);
 
 	/* Enable transmission of Broadcom tags from the switch (CPU RX) to
 	 * allow delivering frames to the per-port net_devices
 	 */
 	reg = core_readl(priv, CORE_BRCM_HDR_TX_DIS);
-	reg &= ~(1 << port);
+	if (tagging_disabled)
+		reg |= 1 << port;
+	else
+		reg &= ~(1 << port);
 	core_writel(priv, reg, CORE_BRCM_HDR_TX_DIS);
 
 	/* Force link status for IMP port */
-- 
2.1.0
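
The per-port set/clear pattern repeated in this patch for the RX_DIS and
TX_DIS registers can be expressed as one small helper. The sketch below
works on a plain register image, with no hardware access;
example_hdr_disable() is a hypothetical name and is not part of the
bcm_sf2 driver:

```c
#include <stdbool.h>
#include <stdint.h>

/* Set or clear the bit for one port in a "disable" register image.
 * Mirrors the read-modify-write step in the patch: the bit is set
 * when tagging is disabled and cleared when it is enabled.
 */
static uint32_t example_hdr_disable(uint32_t reg, unsigned int port,
				    bool tagging_disabled)
{
	if (tagging_disabled)
		reg |= 1u << port;
	else
		reg &= ~(1u << port);
	return reg;
}
```

With port 8 as an example, disabling tagging sets bit 8 and leaves the
other bits alone; enabling it clears only that bit.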


Re: [PATCH net-next] inetpeer: Add support for VRFs

2015-08-25 Thread David Miller

From: David Ahern 
Date: Tue, 25 Aug 2015 15:41:36 -0700

> Meaning rename struct inetpeer_addr to struct inetpeer_key and
> addr_compare to entry_compare or key_compare?

I'm not talking about inetpeer specifically, but generally speaking
everywhere you're going to have to handle this including inetpeer.

So something like "inet4_daddr_key" which is a __be32 and the ifindex.

Re: [PATCH net-next] vrf: Add ethernet header for pass through VRF device

2015-08-25 Thread David Miller

From: David Ahern 
Date: Tue, 25 Aug 2015 15:37:55 -0700

> On 8/25/15 2:02 PM, David Miller wrote:
>> From: David Ahern 
>> Date: Sun, 23 Aug 2015 12:41:00 -0600
>>
>>> @@ -250,6 +253,17 @@ static netdev_tx_t vrf_xmit(struct sk_buff *skb,
>>> struct net_device *dev)
>>>
>>>   static netdev_tx_t vrf_finish(struct sock *sk, struct sk_buff *skb)
>>>   {
>>> +   int err;
>>> +
>>> +   __skb_pull(skb, skb_network_offset(skb));
>>> +   err = dev_hard_header(skb, skb->dev, ntohs(skb->protocol),
>>> + NULL, NULL, skb->len);
>>> +
>>> +   if (err < 0) {
>>> +   vrf_tx_error(skb->dev, skb);
>>> +   return -EINVAL;
>>> +   }
>>> +
>>> return dev_queue_xmit(skb);
>>
>> This is expensive and ridiculous to do for every TX frame.
>>
>> You'll need to find another way.
>>
> 
> The packet is directed here from the IP layer via the custom dst, so
> there is no L2 header on the skb. So while the push and pop of the
> header seems silly it is part and parcel of the feature to run tcpdump
> on the VRF device. I don't see how it could be done any other way.

You're losing a significant optimization on the transmit path by not
using the neighbour table entry hard header cache.

That's what I want you to fix.

See dst_neigh_output() and in particular neigh_hh_output().

Re: [PATCH net-next] inetpeer: Add support for VRFs

2015-08-25 Thread David Ahern

On 8/25/15 1:47 PM, David Miller wrote:
> From: David Ahern 
> Date: Sun, 23 Aug 2015 20:01:34 -0600
>
>> On 8/23/15 6:15 PM, Thomas Graf wrote:
>>> On 08/23/15 at 08:26am, David Ahern wrote:
>>>> inetpeer caches based on address only, so duplicate IP addresses within
>>>> a namespace return the same cached entry. Similar to IP fragments handle
>>>> duplicate addresses across VRFs by adding the VRF master device index to
>>>> the lookup.
>>>
>>> We have a lot of other places which use the address only. Are you
>>> going to add the VRF id to all these places as well?
>>
>> If appropriate, yes. I have fixed IP fragments and this patch fixes
>> inetpeer cache. In both cases (L3 artifacts) the vrf device index
>> provides the means to uniquely identify duplicate IP addresses within
>> a namespace. If you know of other code that might be impacted I will
>> investigate and fix as needed.
>
> Anyways, what this inetpeer patch is doing is the wrong abstraction.
>
> The key is really "daddr + netdev" so make a helper that works using
> those arguments.

That's what I have here:

struct inetpeer_addr {
	struct inetpeer_addr_base	addr;
	__u16				family;
#if IS_ENABLED(CONFIG_NET_VRF)
	int				vif;
#endif
};

the addr_compare then checks the vif (VRF device index) after the N-word
address compare.

> Then it is clear as we propagate this around that addresses need to
> be coupled with the device in question in order to be keyed properly.

Meaning rename struct inetpeer_addr to struct inetpeer_key and
addr_compare to entry_compare or key_compare? Everything else still
treats the address + VRF device as the key.

David
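
For readers following the "daddr + netdev" keying discussed in this
thread, a minimal sketch of such a key and its compare function is shown
below. It is not the inetpeer code: struct example_peer_key and
example_key_compare() are hypothetical names, and an IPv4-only key is
assumed for brevity:

```c
#include <stdint.h>

/* Hypothetical lookup key: an IPv4 destination plus a device index,
 * loosely following the "daddr + netdev" idea from this thread.
 */
struct example_peer_key {
	uint32_t daddr; /* address, in any fixed byte order */
	int ifindex;    /* VRF master device index, 0 if none */
};

/* Three-way compare suitable for a tree lookup: address first, then
 * device index, so equal addresses in different VRFs stay distinct.
 */
static int example_key_compare(const struct example_peer_key *a,
			       const struct example_peer_key *b)
{
	if (a->daddr != b->daddr)
		return a->daddr < b->daddr ? -1 : 1;
	if (a->ifindex != b->ifindex)
		return a->ifindex < b->ifindex ? -1 : 1;
	return 0;
}
```

Two keys with the same address but different device indexes compare as
different entries, which is the property the duplicate-address case
needs.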

[net-next:master 1267/1290] net/rds/ib_recv.c:382:28: sparse: incorrect type in initializer (different base types)

2015-08-25 Thread kbuild test robot

tree:   git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master
head:   04e1b7341dc33abe4dd3f761e2e9137701e55684
commit: 73ce4317bf983282593aff710b112a7e705620c3 [1267/1290] RDS: make sure we post recv buffers
reproduce:
  # apt-get install sparse
  git checkout 73ce4317bf983282593aff710b112a7e705620c3
  make ARCH=x86_64 allmodconfig
  make C=1 CF=-D__CHECK_ENDIAN__


sparse warnings: (new ones prefixed by >>)

>> net/rds/ib_recv.c:382:28: sparse: incorrect type in initializer (different 
>> base types)
   net/rds/ib_recv.c:382:28:expected int [signed] can_wait
   net/rds/ib_recv.c:382:28:got restricted gfp_t
   net/rds/ib_recv.c:828:23: sparse: cast to restricted __le64

vim +382 net/rds/ib_recv.c

   366  }
   367  
   368  /*
   369   * This tries to allocate and post unused work requests after making 
sure that
   370   * they have all the allocations they need to queue received fragments 
into
   371   * sockets.
   372   *
   373   * -1 is returned if posting fails due to temporary resource exhaustion.
   374   */
   375  void rds_ib_recv_refill(struct rds_connection *conn, int prefill, gfp_t 
gfp)
   376  {
   377  struct rds_ib_connection *ic = conn->c_transport_data;
   378  struct rds_ib_recv_work *recv;
   379  struct ib_recv_wr *failed_wr;
   380  unsigned int posted = 0;
   381  int ret = 0;
 > 382  int can_wait = gfp & __GFP_WAIT;
   383  u32 pos;
   384  
   385  /* the goal here is to just make sure that someone, somewhere
   386   * is posting buffers.  If we can't get the refill lock,
   387   * let them do their thing
   388   */
   389  if (!acquire_refill(conn))
   390  return;

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

Re: [PATCH net-next] vrf: Add ethernet header for pass through VRF device

2015-08-25 Thread David Ahern

On 8/25/15 2:02 PM, David Miller wrote:
> From: David Ahern 
> Date: Sun, 23 Aug 2015 12:41:00 -0600
>
>> @@ -250,6 +253,17 @@ static netdev_tx_t vrf_xmit(struct sk_buff *skb, struct net_device *dev)
>>
>>   static netdev_tx_t vrf_finish(struct sock *sk, struct sk_buff *skb)
>>   {
>> +	int err;
>> +
>> +	__skb_pull(skb, skb_network_offset(skb));
>> +	err = dev_hard_header(skb, skb->dev, ntohs(skb->protocol),
>> +			      NULL, NULL, skb->len);
>> +
>> +	if (err < 0) {
>> +		vrf_tx_error(skb->dev, skb);
>> +		return -EINVAL;
>> +	}
>> +
>> 	return dev_queue_xmit(skb);
>
> This is expensive and ridiculous to do for every TX frame.
>
> You'll need to find another way.

The packet is directed here from the IP layer via the custom dst, so
there is no L2 header on the skb. So while the push and pop of the
header seems silly it is part and parcel of the feature to run tcpdump
on the VRF device. I don't see how it could be done any other way.

David

[PATCH net-next v3 2/2] Documentation: networking: dsa: Add Broadcom SF2 document

2015-08-25 Thread Florian Fainelli

Add a document describing the Broadcom Starfighter 2 switch hardware,
its specifics, and how the driver is implemented.

Signed-off-by: Florian Fainelli 
---
Changes in v2:

- address Randy Dunlap's feedback
- address Vivien's feedback
- fix typos/misc spelling mistakes and punctuation

 Documentation/networking/dsa/bcm_sf2.txt | 114 +++
 1 file changed, 114 insertions(+)
 create mode 100644 Documentation/networking/dsa/bcm_sf2.txt

diff --git a/Documentation/networking/dsa/bcm_sf2.txt b/Documentation/networking/dsa/bcm_sf2.txt
new file mode 100644
index ..d999d0c1c5b8
--- /dev/null
+++ b/Documentation/networking/dsa/bcm_sf2.txt
@@ -0,0 +1,114 @@
+Broadcom Starfighter 2 Ethernet switch driver
+=============================================
+
+Broadcom's Starfighter 2 Ethernet switch hardware block is commonly found and
+deployed in the following products:
+
+- xDSL gateways such as BCM63138
+- streaming/multimedia Set Top Box such as BCM7445
+- Cable Modem/residential gateways such as BCM7145/BCM3390
+
+The switch is typically deployed in a configuration involving between 5 and 13
+ports, offering a range of built-in and customizable interfaces:
+
+- single integrated Gigabit PHY
+- quad integrated Gigabit PHY
+- quad external Gigabit PHY w/ MDIO multiplexer
+- integrated MoCA PHY
+- several external MII/RevMII/GMII/RGMII interfaces
+
+The switch also supports specific congestion control features which allow MoCA
+fail-over not to lose packets during a MoCA role re-election, as well as out of
+band back-pressure to the host CPU network interface when downstream interfaces
+are connected at a lower speed.
+
+The switch hardware block is typically interfaced using MMIO accesses and
+contains a bunch of sub-blocks/registers:
+
+* SWITCH_CORE: common switch registers
+* SWITCH_REG: external interfaces switch register
+* SWITCH_MDIO: external MDIO bus controller (there is another one in SWITCH_CORE,
+  which is used for indirect PHY accesses)
+* SWITCH_INDIR_RW: 64-bit wide register helper block
+* SWITCH_INTRL2_0/1: Level-2 interrupt controllers
+* SWITCH_ACB: Admission control block
+* SWITCH_FCB: Fail-over control block
+
+Implementation details
+======================
+
+The driver is located in drivers/net/dsa/bcm_sf2.c and is implemented as a DSA
+driver; see Documentation/networking/dsa/dsa.txt for details on the subsystem
+and what it provides.
+
+The SF2 switch is configured to enable a Broadcom-specific 4-byte switch tag
+which gets inserted by the switch for every packet forwarded to the CPU
+interface. Conversely, the CPU network interface should insert a similar tag
+for packets entering the CPU port. The tag format is described in
+net/dsa/tag_brcm.c.
+
+Overall, the SF2 driver is a fairly regular DSA driver; there are a few
+specifics covered below.
+
+Device Tree probing
+-------------------
+
+The DSA platform device driver is probed using a specific compatible string
+provided in net/dsa/dsa.c. This is because the DSA subsystem currently gets
+registered as a platform device driver. DSA will provide the needed
+device_node pointers which are then accessible by the switch driver setup
+function to setup resources such as register ranges and interrupts. This
+currently works very well because none of the of_* functions utilized by the
+driver require a struct device to be bound to a struct device_node, but things
+may change in the future.
+
+MDIO indirect accesses
+----------------------
+
+Due to a limitation in how Broadcom switches have been designed, external
+Broadcom switches connected to a SF2 require the use of the DSA slave MDIO bus
+in order to properly configure them. By default, the SF2 pseudo-PHY address, and
+an external switch pseudo-PHY address will both be snooping for incoming MDIO
+transactions, since they are at the same address (30), resulting in some kind of
+"double" programming. Using DSA, and setting ds->phys_mii_mask accordingly, we
+selectively divert reads and writes towards external Broadcom switches
+pseudo-PHY addresses. Newer revisions of the SF2 hardware have introduced a
+configurable pseudo-PHY address which circumvents the initial design limitation.
+
+Multimedia over CoAxial (MoCA) interfaces
+-----------------------------------------
+
+MoCA interfaces are fairly specific and require the use of a firmware blob which
+gets loaded onto the MoCA processor(s) for packet processing. The switch
+hardware contains logic which will assert/de-assert link states accordingly for
+the MoCA interface whenever the MoCA coaxial cable gets disconnected or the
+firmware gets reloaded. The SF2 driver relies on such events to properly set its
+MoCA interface carrier state and properly report this to the networking stack.
+
+The MoCA interfaces are supported using the PHY library's fixed PHY/emulated PHY
+device and the switch driver registers a fixed_link_update callback for such
+PHYs which ref

[PATCH net-next v3 0/2] Documentation: dsa

2015-08-25 Thread Florian Fainelli

Hi all,

This patch series adds some documentation about DSA as a subsystem as well
as the SF2 driver since it slightly diverges from your average DSA driver ;)

Florian Fainelli (2):
  Documentation: networking: add a DSA document
  Documentation: networking: dsa: Add Broadcom SF2 document

 Documentation/networking/dsa/bcm_sf2.txt | 113 ++
 Documentation/networking/dsa/dsa.txt | 615 +++
 2 files changed, 728 insertions(+)
 create mode 100644 Documentation/networking/dsa/bcm_sf2.txt
 create mode 100644 Documentation/networking/dsa/dsa.txt

-- 
2.1.0


[PATCH net-next v3 1/2] Documentation: networking: add a DSA document

2015-08-25 Thread Florian Fainelli

Describe how the DSA subsystem works, its design principles,
limitations, and describe in detail how to implement a DSA switch
driver.

Acked-by: Andrew Lunn 
Acked-by: Scott Feldman 
Reviewed-by: Vivien Didelot 
Signed-off-by: Florian Fainelli 
---
Changes in v3:

- add tags from Scott, Andrew and Vivien

Changes in v2:

- fixing incomplete sentence in the SWITCHDEV paragraph
- fix spelling mistakes spotted by Andrew and Vivien
- add additional feedback from Vivien
- add a convergence TODO item between SWITCHDEV and DSA
- misc fixes and improvements


 Documentation/networking/dsa/dsa.txt | 615 +++
 1 file changed, 615 insertions(+)
 create mode 100644 Documentation/networking/dsa/dsa.txt

diff --git a/Documentation/networking/dsa/dsa.txt b/Documentation/networking/dsa/dsa.txt
new file mode 100644
index ..aa9c1f9313cd
--- /dev/null
+++ b/Documentation/networking/dsa/dsa.txt
@@ -0,0 +1,615 @@
+Distributed Switch Architecture
+===============================
+
+Introduction
+============
+
+This document describes the Distributed Switch Architecture (DSA) subsystem
+design principles, limitations, interactions with other subsystems, and how to
+develop drivers for this subsystem as well as a TODO for developers interested
+in joining the effort.
+
+Design principles
+=================
+
+The Distributed Switch Architecture is a subsystem which was primarily designed
+to support Marvell Ethernet switches (MV88E6xxx, a.k.a Linkstreet product line)
+using Linux, but has since evolved to support other vendors as well.
+
+The original philosophy behind this design was to be able to use unmodified
+Linux tools such as bridge, iproute2, ifconfig to work transparently whether
+they configured/queried a switch port network device or a regular network
+device.
+
+An Ethernet switch is typically comprised of multiple front-panel ports, and one
+or more CPU or management ports. The DSA subsystem currently relies on the
+presence of a management port connected to an Ethernet controller capable of
+receiving Ethernet frames from the switch. This is a very common setup for all
+kinds of Ethernet switches found in Small Home and Office products: routers,
+gateways, or even top-of-the-rack switches. This host Ethernet controller will
+be later referred to as "master" and "cpu" in DSA terminology and code.
+
+The D in DSA stands for Distributed, because the subsystem has been designed
+with the ability to configure and manage cascaded switches on top of each other
+using upstream and downstream Ethernet links between switches. These specific
+ports are referred to as "dsa" ports in DSA terminology and code. A collection
+of multiple switches connected to each other is called a "switch tree".
+
+For each front-panel port, DSA will create specialized network devices which are
+used as controlling and data-flowing endpoints for use by the Linux networking
+stack. These specialized network interfaces are referred to as "slave" network
+interfaces in DSA terminology and code.
+
+The ideal case for using DSA is when an Ethernet switch supports a "switch tag"
+which is a hardware feature making the switch insert a specific tag for each
+Ethernet frame it receives from or sends to specific ports, to help the
+management interface figure out:
+
+- what port is this frame coming from
+- what was the reason why this frame got forwarded
+- how to send CPU originated traffic to specific ports
+
+The subsystem does support switches not capable of inserting/stripping tags, but
+the features might be slightly limited in that case (traffic separation relies
+on Port-based VLAN IDs).
+
+Note that DSA does not currently create network interfaces for the "cpu" and
+"dsa" ports because:
+
+- the "cpu" port is the Ethernet switch facing side of the management
+  controller, and as such, would create a duplication of feature, since you
+  would get two interfaces for the same conduit: master netdev, and "cpu" netdev
+
+- the "dsa" port(s) are just conduits between two or more switches, and as such
+  cannot really be used as proper network interfaces either, only the
+  downstream, or the top-most upstream interface makes sense with that model
+
+Switch tagging protocols
+------------------------
+
+DSA currently supports 4 different tagging protocols, and a tag-less mode as
+well. The different protocols are implemented in:
+
+net/dsa/tag_trailer.c: Marvell's 4 trailer tag mode (legacy)
+net/dsa/tag_dsa.c: Marvell's original DSA tag
+net/dsa/tag_edsa.c: Marvell's enhanced DSA tag
+net/dsa/tag_brcm.c: Broadcom's 4-byte tag
+
+The exact format of the tag protocol is vendor specific, but in general, they
+all contain something which:
+
+- identifies which port the Ethernet frame came from/should be sent to
+- provides a reason why this frame was forwarded to the management interface
+
+Master network devices
+----------------------
+
+Master network devices are regular, unmodified Linux network device drivers

[PATCH] drivers: net: xgene: fix: Oops in linkwatch_fire_event

2015-08-25 Thread Iyappan Subramanian

[ 1065.801569] Internal error: Oops: 9606 [#1] SMP
...
[ 1065.866655] Hardware name: AppliedMicro Mustang/Mustang, BIOS 1.1.0 Apr 22 2015
[ 1065.873937] Workqueue: events_power_efficient phy_state_machine
[ 1065.879837] task: fe01de105e80 ti: fe00bcf18000 task.ti: fe00bcf18000
[ 1065.887288] PC is at linkwatch_fire_event+0xac/0xc0
[ 1065.892141] LR is at linkwatch_fire_event+0xa0/0xc0
[ 1065.896995] pc : [] lr : [] pstate: 21c5
[ 1065.904356] sp : fe00bcf1bd00
...
[ 1066.196813] Call Trace:
[ 1066.199248] [] linkwatch_fire_event+0xac/0xc0
[ 1066.205140] [] netif_carrier_off+0x54/0x64
[ 1066.210773] [] phy_state_machine+0x120/0x3bc
[ 1066.216578] [] process_one_work+0x15c/0x3a8
[ 1066.96] [] worker_thread+0x134/0x470
[ 1066.227757] [] kthread+0xe0/0xf8
[ 1066.232525] Code: 97f65ee9 f9420660 d538d082 8b42 (885f7c40)

The fix is to call phy_disconnect() from xgene_enet_mdio_remove(),
which in turn calls cancel_delayed_work_sync().

Signed-off-by: Iyappan Subramanian 
---
 drivers/net/ethernet/apm/xgene/xgene_enet_hw.c   | 3 +++
 drivers/net/ethernet/apm/xgene/xgene_enet_main.c | 5 +++--
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c
index a626c43..cfa3704 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c
@@ -801,6 +801,9 @@ int xgene_enet_mdio_config(struct xgene_enet_pdata *pdata)
 
 void xgene_enet_mdio_remove(struct xgene_enet_pdata *pdata)
 {
+   if (pdata->phy_dev)
+   phy_disconnect(pdata->phy_dev);
+
mdiobus_unregister(pdata->mdio_bus);
mdiobus_free(pdata->mdio_bus);
pdata->mdio_bus = NULL;
diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
index 299eb43..a02ea7f8 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
@@ -1277,9 +1277,10 @@ static int xgene_enet_remove(struct platform_device *pdev)
mac_ops->tx_disable(pdata);
 
xgene_enet_napi_del(pdata);
-   xgene_enet_mdio_remove(pdata);
-   xgene_enet_delete_desc_rings(pdata);
+   if (pdata->phy_mode == PHY_INTERFACE_MODE_RGMII)
+   xgene_enet_mdio_remove(pdata);
unregister_netdev(ndev);
+   xgene_enet_delete_desc_rings(pdata);
pdata->port_ops->shutdown(pdata);
free_netdev(ndev);
 
-- 
1.9.1


Re: [PATCH net-next] MAINTAINERS: update vmxnet3 driver maintainer

2015-08-25 Thread David Miller

From: Shrikrishna Khare 
Date: Mon, 24 Aug 2015 14:24:11 -0700

> Shreyas Bhatewara would no longer maintain the vmxnet3 driver. Taking over
> the role of vmxnet3 maintainer.
> 
> Signed-off-by: Shrikrishna Khare 
> Signed-off-by: Shreyas Bhatewara 

Applied, thanks.

Re: [PATCH v3 net-next 7/8] geneve: Consolidate Geneve functionality in single module.

2015-08-25 Thread Jesse Gross

On Tue, Aug 25, 2015 at 1:54 PM, Pravin Shelar  wrote:
> On Tue, Aug 25, 2015 at 12:03 PM, Jesse Gross  wrote:
>> On Mon, Aug 24, 2015 at 10:43 AM, Pravin B Shelar  wrote:
>>> diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
>>> index c05bc13..8eb875d 100644
>>> --- a/drivers/net/geneve.c
>>> +++ b/drivers/net/geneve.c
>>> @@ -492,36 +813,36 @@ static int geneve_configure(struct net *net, struct net_device *dev,
>> [...]
>>> +   gs = geneve_find_sock(gn, geneve->dst_port);
>>> +   if (gs) {
>>> +   if (metadata) {
>>> +   if (gs->collect_md)
>>> +   return -EEXIST;
>>> +   else
>>> +   return -EPERM;
>>> +   } else {
>>> +   if (gs->collect_md)
>>> +   return -EPERM;
>>> +
>>> +   t = geneve_lookup(gn, htons(dst_port),
>>> + rem_addr, geneve->vni);
>>> +   if (t)
>>> +   return -EBUSY;
>>> +   }
>>> +   }
>>
>> I like the new structure but unfortunately, I think there is a race.
>> If two devices are created with conflicting configurations but neither
>> is brought up then creation of both devices will succeed. However,
>> when the second one is brought up, it will silently collide with the
>> first.
>
> The geneve tunnel is added to the hash table at configure time, so the
> lookup does not depend on the device being up or down. The lookup and
> hash table updates are done under the rtnl lock.

But the check for duplicates is contingent on finding a socket. If we
configure two devices before calling geneve_open(), then there won't
be a socket yet and therefore no check.
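
To make the concern concrete, here is a rough user-space sketch of a
configure-time check that scans every configured tunnel instead of relying on
a socket lookup. It is only an illustration of the idea, not what the patch
does and not a proposed kernel change. struct demo_tunnel,
demo_check_conflict() and the -1 return value are hypothetical, and the rules
encoded (a metadata device owns the whole port, otherwise port/VNI/remote must
be unique) are just my reading of the hunk quoted above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified tunnel record; not the kernel's geneve_dev. */
struct demo_tunnel {
	uint16_t dst_port;
	uint32_t vni;
	uint32_t remote;      /* remote IPv4 address */
	bool     collect_md;  /* metadata-collecting device */
};

/* Return 0 if 'cand' can coexist with every tunnel already configured,
 * -1 otherwise. The scan covers all configured devices, opened or not,
 * so the result does not depend on a socket existing yet. */
static int demo_check_conflict(const struct demo_tunnel *tbl, size_t n,
			       const struct demo_tunnel *cand)
{
	for (size_t i = 0; i < n; i++) {
		if (tbl[i].dst_port != cand->dst_port)
			continue;
		/* A metadata device cannot share its port with anything. */
		if (tbl[i].collect_md || cand->collect_md)
			return -1;
		/* Plain devices on one port must differ in VNI or remote. */
		if (tbl[i].vni == cand->vni && tbl[i].remote == cand->remote)
			return -1;
	}
	return 0;
}
```

With a check of this shape, the second of two conflicting devices would be
rejected at creation time even if neither had been brought up.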

Re: [PATCH net] ip6_gre: release cached dst on tunnel removal

2015-08-25 Thread David Miller

From: Nicolas Dichtel 
Date: Tue, 25 Aug 2015 16:20:34 +0200

> From: huaibin Wang 
> 
> When a tunnel is deleted, the cached dst entry should be released.
> 
> This problem may prevent the removal of a netns (seen with a x-netns IPv6
> gre tunnel):
>   unregister_netdevice: waiting for lo to become free. Usage count = 3
> 
> CC: Dmitry Kozlov 
> Fixes: c12b395a4664 ("gre: Support GRE over IPv6")
> Signed-off-by: huaibin Wang 
> Signed-off-by: Nicolas Dichtel 

Applied and queued up for -stable, thanks.

Re: [PATCH net-next] vxlan: fix multiple inclusion of vxlan.h

2015-08-25 Thread David Miller

From: Jiri Benc 
Date: Tue, 25 Aug 2015 18:36:50 +0200

> The vxlan_get_sk_family inline function was added after the last #endif,
> making multiple inclusion of net/vxlan.h fail. Move it to the proper place.
> 
> Reported-by: Mark Rustad 
> Fixes: 705cc62f6728c ("vxlan: provide access function for vxlan socket address family")
> Signed-off-by: Jiri Benc 

Applied, thanks.

Re: [PATCH net-next] MAINTAINERS: Add VRF entry

2015-08-25 Thread David Miller

From: David Ahern 
Date: Tue, 25 Aug 2015 10:26:22 -0700

> Add entry for new VRF device driver.
> 
> Signed-off-by: David Ahern 

Applied.

Re: [Patch net-next] route: fix a use-after-free

2015-08-25 Thread David Miller

From: Cong Wang 
Date: Tue, 25 Aug 2015 10:38:53 -0700

> This patch fixes the following crash:
> ...
> dst is freed right before lwtstate_put(), this is not correct...
> 
> Fixes: 61adedf3e3f1 ("route: move lwtunnel state to dst_entry")
> Acked-by: Jiri Benc 
> Signed-off-by: Cong Wang 
> Signed-off-by: Cong Wang 

Applied, thanks.

Re: [PATCH stable] Revert "dev: set iflink to 0 for virtual interfaces"

2015-08-25 Thread David Miller

From: Stephen Hemminger 
Date: Tue, 25 Aug 2015 14:09:58 -0700

> Dave, could you include this in the next 4.1 stable update?

Queued up, thanks Stephen.

Re: [RFC, RFT PATCH 1/2] dl2k: Add support for IP1000A-based cards

2015-08-25 Thread Ondrej Zary

On Tuesday 25 August 2015 23:04:30 David Miller wrote:
> From: Ondrej Zary 
> Date: Sun, 23 Aug 2015 23:06:27 +0200
>
> > Add support for IP1000A chips to dl2k driver.
> > IP1000A chip looks like a TC9020 with integrated PHY.
> >
> > Tested with Asus NX1101.
>
> You're saying the PHY support is incomplete, so gigabit isn't even
> detected for these chips.
>
> So in a way this is a regression of sorts.
>
> Come back with these proposed changes once you have the PHY support
> situation sorted out.
>
> Thanks.

Actually, gigabit works with this patch. The "PHY magic" part contains 
mii_write(9, 0x0700) which makes gigabit work.
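
For reference, a small sketch decoding that value, assuming the write targets
standard MII register 9 (the 1000BASE-T control register). The mail implies
this but does not say so outright. The constants carry the names and values
used in the kernel's linux/mii.h (CTL1000_PREFER_MASTER only appears in newer
headers). The two helper functions are made up for illustration.

```c
#include <stdint.h>

/* Names and values as in the kernel's include/uapi/linux/mii.h. */
#define MII_CTRL1000          0x09    /* 1000BASE-T control register */
#define ADVERTISE_1000HALF    0x0100  /* advertise 1000BASE-T half duplex */
#define ADVERTISE_1000FULL    0x0200  /* advertise 1000BASE-T full duplex */
#define CTL1000_PREFER_MASTER 0x0400  /* prefer to operate as master */

/* Hypothetical helper: nonzero if the value advertises any gigabit mode. */
static int advertises_gigabit(uint16_t ctrl1000)
{
	return (ctrl1000 & (ADVERTISE_1000HALF | ADVERTISE_1000FULL)) != 0;
}

/* Hypothetical helper: bits not accounted for by the three flags above. */
static uint16_t unexplained_bits(uint16_t ctrl1000)
{
	return ctrl1000 & (uint16_t)~(ADVERTISE_1000HALF |
				      ADVERTISE_1000FULL |
				      CTL1000_PREFER_MASTER);
}
```

Under that assumption, 0x0700 is ADVERTISE_1000HALF | ADVERTISE_1000FULL |
CTL1000_PREFER_MASTER with no bits left over, which would be consistent with
gigabit being negotiated once the register is written.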

BTW. D-Link DGE-550T is on the way so I'll test that too.

-- 
Ondrej Zary

Re: [patch net-next 2/3] mlxsw: expose EMAD transactions statistics via debugfs

2015-08-25 Thread David Miller

From: Jiri Pirko 
Date: Mon, 24 Aug 2015 16:45:46 +0200

> From: Jiri Pirko 
> 
> Signed-off-by: Jiri Pirko 
> Signed-off-by: Ido Schimmel 
> Signed-off-by: Elad Raz 

Enough with this debugfs madness.

Expose this stuff through standard interfaces.

They are simple statistics for crying out loud!

I'm not applying this, and I'm really getting irritated about how much
garbage people put into debugfs when it has _NO_ business being there.

Sorry.

Re: [PATCH] net-next: Fix warning while make xmldocs caused by skbuff.c

2015-08-25 Thread David Miller

From: Masanari Iida 
Date: Mon, 24 Aug 2015 22:56:54 +0900

> This patch fixes the following warnings.
> 
> .//net/core/skbuff.c:407: warning: No description found
> for parameter 'len'
> .//net/core/skbuff.c:407: warning: Excess function parameter
>  'length' description in '__netdev_alloc_skb'
> .//net/core/skbuff.c:476: warning: No description found
>  for parameter 'len'
> .//net/core/skbuff.c:476: warning: Excess function parameter
> 'length' description in '__napi_alloc_skb'
> 
> Signed-off-by: Masanari Iida 

Applied, thank you.
