Re: [PATCH net-next 0/4 v2] net: VRF support in IPv6 stack

2015-10-11 Thread David Miller

Please minimize the crazy pointer casting in this series.

All of these casts between ipv4/ipv6 route objects and dsts are
completely unnecessary.

Prefer to maintain an rt/rt6 in local variables, and then when you
need the dst say >dst, or to get the device say rt6->dst.dev or
similar.

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch net 0/2] mlxsw: couple of fixes

2015-10-11 Thread David Miller
From: Jiri Pirko 
Date: Thu,  8 Oct 2015 15:17:36 +0200

> Just a couple of small fixes.

Series applied, thanks Jiri.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: switchdev and VLAN ranges

2015-10-11 Thread Jiri Pirko
Sat, Oct 10, 2015 at 12:36:26PM CEST, niko...@cumulusnetworks.com wrote:
>On 10/10/2015 09:49 AM, Elad Raz wrote:
>> 
>>> On Oct 10, 2015, at 2:30 AM, Vivien Didelot 
>>>  wrote:
>>>
>>> I have two concerns in mind:
>>>
>>> a) if we imagine that drivers like Rocker allocate memory in the prepare
>>> phase for each VID, preparing a range like 100-4000 would definitely not
>>> be recommended.
>>>
>>> b) imagine that you have two Linux bridges on a switch, one using the
>>> hardware VLAN 100. If you request the VLAN range 99-101 for the other
>>> bridge members, it is not possible for the driver to say "I can
>>> accelerate VLAN 99 and 101, but not 100". It must return OPNOTSUPP for
>>> the whole range.
>> 
>> Another concern I have with vid_being..vid_end range is the “flags”. Where 
>> flags can be BRIDGE_VLAN_INFO_PVID.
>> There is no sense having more than one VLAN as a PVID.
>> This leave the HW vendor the choice which VLAN id they will use as the PVID.
>> 
>
>iproute2 doesn't allow to do it but I can see that someone can actually make it
>so the flags for the range have it and it doesn't look correct. Perhaps we need
>something like the patch below to enforce this from kernel-side.
>
>
>diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
>index d78b4429505a..02b17b53e9a6 100644
>--- a/net/bridge/br_netlink.c
>+++ b/net/bridge/br_netlink.c
>@@ -524,6 +524,9 @@ static int br_afspec(struct net_bridge *br,
>   if (vinfo_start)
>   return -EINVAL;
>   vinfo_start = vinfo;
>+  /* don't allow range of pvids */
>+  if (vinfo_start->flags & BRIDGE_VLAN_INFO_PVID)
>+  return -EINVAL;
>   continue;
>   }
> 

Looks correct to me. Could you please submit this properly? Thanks!
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Patch net] sch_hhf: fix return value of hhf_drop()

2015-10-11 Thread David Miller
From: Cong Wang 
Date: Wed,  7 Oct 2015 16:47:32 -0700

> Similar to commit c0afd9ce4d6a ("fq_codel: fix return value of 
> fq_codel_drop()")
> ->drop() is supposed to return the number of bytes it dropped,
> but hhf_drop () returns the id of the bucket where it drops
> a packet from.
> 
> Cc: Jamal Hadi Salim 
> Cc: Terry Lam 
> Signed-off-by: Cong Wang 
> Signed-off-by: Cong Wang 

Applied, thanks.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 0/3] net: unix: fix use-after-free

2015-10-11 Thread David Miller
From: Jason Baron 
Date: Fri,  9 Oct 2015 00:15:59 -0400

> These patches are against mainline, I can re-base to net-next, please
> let me know.
> 
> They have been tested against: https://lkml.org/lkml/2015/9/13/195,
> which causes the use-after-free quite quickly and here:
> https://lkml.org/lkml/2015/10/2/693.

I'd like to understand how patches that don't even compile can be
"tested"?

net/unix/af_unix.c: In function ‘unix_dgram_writable’:
net/unix/af_unix.c:2480:3: error: ‘other_full’ undeclared (first use in this 
function)
net/unix/af_unix.c:2480:3: note: each undeclared identifier is reported only 
once for each function it appears in

Could you explain how that works, I'm having a hard time understanding
this?

Also please address Hannes's feedback, thanks.


Re: [PATCH net-next] net: synack packets can be attached to request sockets

2015-10-11 Thread David Miller
From: Eric Dumazet 
Date: Thu, 08 Oct 2015 05:01:55 -0700

> From: Eric Dumazet 
> 
> selinux needs few changes to accommodate fact that SYNACK messages
> can be attached to a request socket, lacking sk_security pointer
> 
> (Only syncookies are still attached to a TCP_LISTEN socket)
> 
> Adds a new sk_listener() helper, and use it in selinux and sch_fq
> 
> Fixes: ca6fb0651883 ("tcp: attach SYNACK messages to request sockets instead 
> of listener")
> Signed-off-by: Eric Dumazet 
> Reported by: kernel test robot 

Applied, thanks everyone.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next v3] BNX2: fix a Null Pointer for stats_blk

2015-10-11 Thread David Miller
From: Weidong Wang 
Date: Thu, 8 Oct 2015 18:03:47 +0800

> we have two processes to do:
> P1#: ifconfig eth0 down; which will call bnx2_close, then will
> , and set Null to stats_blk
> P2#: ifconfig eth0; which will call bnx2_get_stats64, it will
> use stats_blk.
> In one case:
> --P1#--   --P2#--
>   stats_blk(no null)
> bnx2_free_mem
> ->bp->stats_blk = NULL
>   GET_64BIT_NET_STATS
> 
> then it will cause 'NULL Pointer' Problem.
> it is as well with 'ethtool -S ethx'.
> 
> Allocate the statistics block at probe time so that this problem is
> impossible
> 
> Signed-off-by: Wang Weidong 

Applied, thank you.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH net-next] bridge: vlan: enforce no pvid flag in vlan ranges

2015-10-11 Thread Nikolay Aleksandrov
From: Nikolay Aleksandrov 

Currently it's possible for someone to send a vlan range to the kernel
with the pvid flag set which will result in the pvid bouncing from a
vlan to vlan and isn't correct, it also introduces problems for hardware
where it doesn't make sense having more than 1 pvid. iproute2 already
enforces this, so let's enforce it on kernel-side as well.

Reported-by: Elad Raz 
Signed-off-by: Nikolay Aleksandrov 
---
 net/bridge/br_netlink.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
index d78b4429505a..02b17b53e9a6 100644
--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -524,6 +524,9 @@ static int br_afspec(struct net_bridge *br,
if (vinfo_start)
return -EINVAL;
vinfo_start = vinfo;
+   /* don't allow range of pvids */
+   if (vinfo_start->flags & BRIDGE_VLAN_INFO_PVID)
+   return -EINVAL;
continue;
}
 
-- 
2.4.3

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 net-next] bpf: fix cb access in socket filter programs

2015-10-11 Thread David Miller
From: Alexei Starovoitov 
Date: Wed,  7 Oct 2015 10:55:41 -0700

> eBPF socket filter programs may see junk in 'u32 cb[5]' area,
> since it could have been used by protocol layers earlier.
> 
> For socket filter programs used in af_packet we need to clean
> 20 bytes of skb->cb area if it could be used by the program.
> For programs attached to TCP/UDP sockets we need to save/restore
> these 20 bytes, since it's used by protocol layers.
> 
> Remove SK_RUN_FILTER macro, since it's no longer used.
> 
> Long term we may move this bpf cb area to per-cpu scratch, but that
> requires addition of new 'per-cpu load/store' instructions,
> so not suitable as a short term fix.
> 
> Fixes: d691f9e8d440 ("bpf: allow programs to write to certain skb fields")
> Reported-by: Eric Dumazet 
> Signed-off-by: Alexei Starovoitov 
> ---
> v1->v2:
> dropped extra conditional for clearing of cb for af_packet.
> 
> Since eBPF is root only, the impact of the bug is low and
> the fix can stay in net-next.

Applied to net-next, thanks.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 net-next 01/10] qed: Add module with basic common support

2015-10-11 Thread David Miller
From: Yuval Mintz 
Date: Sat, 10 Oct 2015 14:18:18 +

> So far we've worked on stabalizing it on 64-bit LE endian.
> I can easily understand why we'd want to relax this restriction later on
> [and sooner rather than later], but what would you gain be gained by
> compile-fixing for platforms on which it was never tested?

This is exactly the kind of attitude I don't want to see.

I am so glad that so many driver authors supported all PCI platforms
from the very beginning so that casually I could test their driver on
my sparc64 systems.

There is huge value in expanding the compiler testing your driver
gets, from day one.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next v4] net: ipv6: Make address flushing on ifdown optional

2015-10-11 Thread David Miller
From: David Ahern 
Date: Wed,  7 Oct 2015 08:17:39 -0700

> +static void fixup_managed_addr(struct inet6_dev *idev, struct inet6_ifaddr 
> *ifp)
> +{
> + if (!ifp->rt)
> + ifp->rt = addrconf_dst_alloc(idev, >addr, false);

This potentially leaves an error pointer dangling in ifp->rt
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH net-next] bnxt_en: New Broadcom ethernet driver.

2015-10-11 Thread Dmitry Kravkov
> -Original Message-
> From: netdev-ow...@vger.kernel.org [mailto:netdev-
> ow...@vger.kernel.org] On Behalf Of Michael Chan
> Sent: Saturday, October 10, 2015 2:39 PM
> To: David Miller 
> Cc: netdev 
> Subject: [PATCH net-next] bnxt_en: New Broadcom ethernet driver.
> 
> +config BNXT
> + tristate "Broadcom NetXtreme-C/E support"
> + depends on PCI
> + select FW_LOADER
> + select LIBCRC32C
> + ---help---
> +   This driver supports Broadcom NetXtremeII 10/25/40/50 gigabit
> +   Ethernet cards.  To compile this driver as a module, choose M here:
> +   the module will be called bnx2x.  This is recommended.
Should be bnxt

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next 0/2] net: fix some bugs in HNS drivers

2015-10-11 Thread David Miller
From: huangdaode 
Date: Thu, 8 Oct 2015 14:17:23 +0800

> This patchset fixes the two bugs in HNS driver, one is for fixing the
> compilation warning bug on arm 32-bit platform, another is fixing the wrong
> mac port judgement bug which is found during internal tests.

This patch series does not apply to the net-next tree at all.

Also, both patches should use a subsystem prefix of "hisilicon: "
in the Subject lines.

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next] tun: use sk_fullsock() before reading sk->sk_tsflags

2015-10-11 Thread Michael S. Tsirkin
On Fri, Oct 09, 2015 at 03:42:21PM -0700, Eric Dumazet wrote:
> From: Eric Dumazet 
> 
> timewait or request sockets are small and do not contain sk->sk_tsflags
> 
> Without this fix, we might read garbage, and crash later in
> 
> __skb_complete_tx_timestamp()
>  -> sock_queue_err_skb()
> 
> (These pseudo sockets do not have an error queue either)
> 
> Fixes: ca6fb0651883 ("tcp: attach SYNACK messages to request sockets instead 
> of listener")
> Signed-off-by: Eric Dumazet 
> Cc: Willem de Bruijn 


Acked-by: Michael S. Tsirkin 

> ---
>  Note this bug also exists on net tree for timewait sockets but only
>  in exceptional conditions (routing glitches and IP early demux)
> 
>  drivers/net/tun.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index 976aa9704297..b1878faea397 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -858,7 +858,7 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, 
> struct net_device *dev)
>   if (unlikely(skb_orphan_frags(skb, GFP_ATOMIC)))
>   goto drop;
>  
> - if (skb->sk) {
> + if (skb->sk && sk_fullsock(skb->sk)) {
>   sock_tx_timestamp(skb->sk, _shinfo(skb)->tx_flags);
>   sw_tx_timestamp(skb);
>   }
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] qlcnic: constify qlcnic_mbx_ops structure

2015-10-11 Thread Julia Lawall
The only instance of a qlcnic_mbx_ops structure is never modified.  Thus
the declaration of the structure and all references to the structure type
can be made const.

In the definition of the qlcnic_mailbox structure, the ops field is no
longer lined up with the other fields.  This was left as is, to avoid a lot
of trivial changes on the other lines.

Done with the help of Coccinelle.

Signed-off-by: Julia Lawall 

---
 drivers/net/ethernet/qlogic/qlcnic/qlcnic.h |2 +-
 drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c |4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c 
b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c
index 9f0bdd9..37a731b 100644
--- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c
+++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c
@@ -4048,7 +4048,7 @@ static void qlcnic_83xx_mailbox_worker(struct work_struct 
*work)
struct qlcnic_mailbox *mbx = container_of(work, struct qlcnic_mailbox,
  work);
struct qlcnic_adapter *adapter = mbx->adapter;
-   struct qlcnic_mbx_ops *mbx_ops = mbx->ops;
+   const struct qlcnic_mbx_ops *mbx_ops = mbx->ops;
struct device *dev = >pdev->dev;
atomic_t *rsp_status = >rsp_status;
struct list_head *head = >cmd_q;
@@ -4098,7 +4098,7 @@ static void qlcnic_83xx_mailbox_worker(struct work_struct 
*work)
}
 }
 
-static struct qlcnic_mbx_ops qlcnic_83xx_mbx_ops = {
+static const struct qlcnic_mbx_ops qlcnic_83xx_mbx_ops = {
.enqueue_cmd= qlcnic_83xx_enqueue_mbx_cmd,
.dequeue_cmd= qlcnic_83xx_dequeue_mbx_cmd,
.decode_resp= qlcnic_83xx_decode_mbx_rsp,
diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic.h 
b/drivers/net/ethernet/qlogic/qlcnic/qlcnic.h
index d6696cf..46bbea8 100644
--- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic.h
+++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic.h
@@ -1092,7 +1092,7 @@ struct qlcnic_filter_hash {
 struct qlcnic_mailbox {
struct workqueue_struct *work_q;
struct qlcnic_adapter   *adapter;
-   struct qlcnic_mbx_ops   *ops;
+   const struct qlcnic_mbx_ops *ops;
struct work_struct  work;
struct completion   completion;
struct list_headcmd_q;

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next] net: encx24j600: Fix typos in Kconfig

2015-10-11 Thread David Miller
From: j...@ringle.org
Date: Thu,  8 Oct 2015 11:10:19 -0400

> From: Jon Ringle 
> 
> Signed-off-by: Jon Ringle 

Applied, thanks.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next 4/5] bridge: vlan: fix possible null ptr derefs on port init and deinit

2015-10-11 Thread Nikolay Aleksandrov
On 10/11/2015 02:21 PM, Ido Schimmel wrote:
> Wed, Sep 30, 2015 at 09:16:54PM IDT, ra...@blackwall.org wrote:
>> From: Nikolay Aleksandrov 
>>
>> When a new port is being added we need to make vlgrp available after
>> rhashtable has been initialized and when removing a port we need to
>> flush the vlans and free the resources after we're sure noone can use
>> the port, i.e. after it's removed from the port list and synchronize_rcu
>> is executed.
> 
> Hi Nikolay,
> 
> Changing the order of port deinit breaks symmetry with the init
> sequence. It also introduces a problem for switchdev drivers. Flushing
> the VLANs clears HW VLAN filters and then, when port is unlinked from
> bridge and CHANGEUPPER is received, port is configured to direct traffic
> to CPU (as it's not offloaded anymore). Doing the reverse (like in this
> patch) renders the port unusable.
> 
> Regarding the reason for this change, are you afraid that vlgrp will be
> accessed in bridge's rx handler or xmit function after it's freed? If
> so, maybe we can access it using RCU primitives? That way, both the rx
> handler and xmit function (executed under RCU lock) will either have a
> valid copy or not. Looking at previous iterations of this code, I see
> that was the case with the 'net_port_vlans' struct.
> 
> I can start working on a fix if you agree with the proposed solution.
> 
> Thanks.
> 

Hi,
Ah, I didn't know about this, I feared that something might rely on the
particular order of the operations but didn't have a way to test this one in
particular. Anyway, your proposed solution sounds good to me.

Thank you,
 Nik


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: simplify configfs attributes V2

2015-10-11 Thread Christoph Hellwig
On Fri, Oct 09, 2015 at 04:37:51PM -0500, Felipe Balbi wrote:
> For reference, I'm fine if you guys take all patches through FS
> tree. Another option is waiting for dependencies to be merged in v4.4,
> and the gadget changes merge in v4.5, whatever works.

I'd prefer to take them all in one go.  If we really have to I could
drop the last patch for now and clean up later, but I'd prefer not to.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] net/fsl_pq_mdio: check TBI address for consistency with mapped range

2015-10-11 Thread Timur Tabi

Gerlando Falauto wrote:


Change-Id: If1e7d8931f440ea9259726c36d3df797dda016fb


You need to remove these from patches that are emailed, and fix the 
pointer type comparison.


Otherwise,

Acked-by: Timur Tabi 
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 18/23] spear13xx_pcie_gadget: use per-attribute show and store methods

2015-10-11 Thread Christoph Hellwig
On Fri, Oct 09, 2015 at 04:05:17PM -0500, Felipe Balbi wrote:
> Pratyush Anand  writes:
> 
> > On Sat, Oct 3, 2015 at 7:02 PM, Christoph Hellwig  wrote:
> >> Signed-off-by: Christoph Hellwig 
> >
> > Acked-by: Pratyush Anand 
> 
> I don't seem to have the actual patch, care to resend?

The whole series was sent to all the receipients in the To and Cc lists,
so check your spam folder or one of the list archives.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 02/23] usb-gadget: use per-attribute show and store methods

2015-10-11 Thread Christoph Hellwig
On Fri, Oct 09, 2015 at 04:19:57PM -0500, Felipe Balbi wrote:
> > Signed-off-by: Christoph Hellwig 
> > Reviewed-by: Andrzej Pietrasiewicz 
> 
> I suppose this depends on other fs/configfs changes ?

The whole series should be applied in order, but doesn't have any
other dependencies.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next] net: Fix vti use case with oif in dst lookups for IPv6

2015-10-11 Thread Hajime Tazaki

At Fri, 9 Oct 2015 11:27:36 -0600,
David Ahern wrote:
> 
> [1  ]
> On 10/9/15 1:17 AM, Steffen Klassert wrote:
> >>> diff --git a/net/ipv6/xfrm6_policy.c b/net/ipv6/xfrm6_policy.c
> >>> index 30caa289c5db..5cedfda4b241 100644
> >>> --- a/net/ipv6/xfrm6_policy.c
> >>> +++ b/net/ipv6/xfrm6_policy.c
> >>> @@ -37,6 +37,7 @@ static struct dst_entry *xfrm6_dst_lookup(struct net 
> >>> *net, int tos, int oif,
> >>>
> >>>   memset(, 0, sizeof(fl6));
> >>>   fl6.flowi6_oif = oif;
> >>> + fl6.flowi6_flags = FLOWI_FLAG_SKIP_NH_OIF;
> >>>   memcpy(, daddr, sizeof(fl6.daddr));
> >>>   if (saddr)
> >>>   memcpy(, saddr, sizeof(fl6.saddr));
> >>
> >> I found that this fix is still not sufficient with the mip6
> >> (Mobile IPv6) use case.
> >
> > It does not even fix the vti case. The behaviour of the vti devices is
> > the same, with and without the patch.
> >
> 
> The attached patch applied to Linus' tree works for me. Currently the 
> above change is not in his tree, so I added it to this patch. Once you 
> confirm that it works for you I'll create the delta-patch for net and 
> send out.

I gave it a try but without any luck unfortunately.
I may need to look carefully what mip6 does here.

the code path where I'm looking at for MH packet (raw socket
with IPPROTO_MH) is:

#0  ip6_route_output (net=0x73a40a40 , sk=0x730b5c50, 
fl6=0x70ed2fe0) at net/ipv6/route.c:1195
#1  0x735d155f in ip6_dst_lookup_tail (net=0x73a40a40 
, sk=0x730b5c50, dst=0x70ed2f18, fl6=0x70ed2fe0)
at net/ipv6/ip6_output.c:929
#2  0x735d1707 in ip6_dst_lookup_flow (sk=0x730b5c50, 
fl6=0x70ed2fe0, final_dst=0x0) at net/ipv6/ip6_output.c:1024
#3  0x736199f7 in rawv6_sendmsg (sk=0x730b5c50, msg=0x70ed3320, 
len=32) at net/ipv6/raw.c:872
#4  0x73526d21 in inet_sendmsg (sock=0x730b5810, 
msg=0x70ed3320, size=32) at net/ipv4/af_inet.c:737
#5  0x7338dc8a in sock_sendmsg_nosec (msg=0x70ed3320, 
sock=0x730b5810) at net/socket.c:610
#6  sock_sendmsg (sock=0x730b5810, msg=0x70ed3320) at net/socket.c:620


which (*dst)->error of ip6_route_output gives -22 (EINVAL).

I will be back once I got any findings.

-- Hajime

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next] net: Fix vti use case with oif in dst lookups for IPv6

2015-10-11 Thread David Ahern

On 10/11/15 7:22 AM, Hajime Tazaki wrote:


At Fri, 9 Oct 2015 11:27:36 -0600,
David Ahern wrote:


[1  ]
On 10/9/15 1:17 AM, Steffen Klassert wrote:

diff --git a/net/ipv6/xfrm6_policy.c b/net/ipv6/xfrm6_policy.c
index 30caa289c5db..5cedfda4b241 100644
--- a/net/ipv6/xfrm6_policy.c
+++ b/net/ipv6/xfrm6_policy.c
@@ -37,6 +37,7 @@ static struct dst_entry *xfrm6_dst_lookup(struct net *net, 
int tos, int oif,

memset(, 0, sizeof(fl6));
fl6.flowi6_oif = oif;
+   fl6.flowi6_flags = FLOWI_FLAG_SKIP_NH_OIF;
memcpy(, daddr, sizeof(fl6.daddr));
if (saddr)
memcpy(, saddr, sizeof(fl6.saddr));


I found that this fix is still not sufficient with the mip6
(Mobile IPv6) use case.


It does not even fix the vti case. The behaviour of the vti devices is
the same, with and without the patch.



The attached patch applied to Linus' tree works for me. Currently the
above change is not in his tree, so I added it to this patch. Once you
confirm that it works for you I'll create the delta-patch for net and
send out.


I gave it a try but without any luck unfortunately.
I may need to look carefully what mip6 does here.

the code path where I'm looking at for MH packet (raw socket
with IPPROTO_MH) is:

#0  ip6_route_output (net=0x73a40a40 , sk=0x730b5c50, 
fl6=0x70ed2fe0) at net/ipv6/route.c:1195
#1  0x735d155f in ip6_dst_lookup_tail (net=0x73a40a40 
, sk=0x730b5c50, dst=0x70ed2f18, fl6=0x70ed2fe0)
 at net/ipv6/ip6_output.c:929
#2  0x735d1707 in ip6_dst_lookup_flow (sk=0x730b5c50, 
fl6=0x70ed2fe0, final_dst=0x0) at net/ipv6/ip6_output.c:1024
#3  0x736199f7 in rawv6_sendmsg (sk=0x730b5c50, msg=0x70ed3320, 
len=32) at net/ipv6/raw.c:872
#4  0x73526d21 in inet_sendmsg (sock=0x730b5810, 
msg=0x70ed3320, size=32) at net/ipv4/af_inet.c:737
#5  0x7338dc8a in sock_sendmsg_nosec (msg=0x70ed3320, 
sock=0x730b5810) at net/socket.c:610
#6  sock_sendmsg (sock=0x730b5810, msg=0x70ed3320) at net/socket.c:620


which (*dst)->error of ip6_route_output gives -22 (EINVAL).

I will be back once I got any findings.


How does a xfrm change affect this code path?

Can you apply this patch, and then run:

perf record -e fib6:* -a -g
perf script

Thanks,
David


>From 5da83796c110d5f2c995b22b38be8e5268a7f573 Mon Sep 17 00:00:00 2001
From: David Ahern 
Date: Thu, 1 Oct 2015 14:52:58 -0700
Subject: [PATCH net-next] net: IPv6 fib and route tracepoints

Signed-off-by: David Ahern 
---
 include/trace/events/fib6.h | 237 
 net/core/net-traces.c   |   7 ++
 net/ipv6/route.c|  24 -
 3 files changed, 267 insertions(+), 1 deletion(-)
 create mode 100644 include/trace/events/fib6.h

diff --git a/include/trace/events/fib6.h b/include/trace/events/fib6.h
new file mode 100644
index ..9f34da0d59a3
--- /dev/null
+++ b/include/trace/events/fib6.h
@@ -0,0 +1,237 @@
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM fib6
+
+#if !defined(_TRACE_FIB6_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_FIB6_H
+
+#include 
+#include 
+#include 
+#include 
+
+TRACE_EVENT(fib6_table_lookup,
+
+   TP_PROTO(const struct net *net, const struct rt6_info *rt,
+u32 tb_id, const struct flowi6 *flp),
+
+   TP_ARGS(net, rt, tb_id, flp),
+
+   TP_STRUCT__entry(
+   __field(u32,tb_id   )
+
+   __field(int,oif )
+   __field(int,iif )
+   __field(__u8,   tos )
+   __field(__u8,   scope   )
+   __field(__u8,   flags   )
+   __array(__u8,   src,16  )
+   __array(__u8,   dst,16  )
+
+   __dynamic_array(char,   name,   IFNAMSIZ )
+   __array(__u8,   gw, 16   )
+   ),
+
+   TP_fast_assign(
+   struct in6_addr *in6;
+
+   __entry->tb_id = tb_id;
+   __entry->oif = flp->flowi6_oif;
+   __entry->iif = flp->flowi6_iif;
+   __entry->tos = flp->flowi6_tos;
+   __entry->scope = flp->flowi6_scope;
+   __entry->flags = flp->flowi6_flags;
+
+   in6 = (struct in6_addr *)__entry->src;
+   *in6 = flp->saddr;
+
+   in6 = (struct in6_addr *)__entry->dst;
+   *in6 = flp->daddr;
+
+   if (rt->rt6i_idev) {
+   __assign_str(name, rt->rt6i_idev->dev->name);
+   } else {
+   __assign_str(name, "");
+   }
+   if (rt == net->ipv6.ip6_null_entry) {
+   struct in6_addr in6_zero = {};
+
+   in6 = (struct in6_addr *)__entry->gw;
+ 

Re: [PATCH v4 1/3] net: unix: fix use-after-free in unix_dgram_poll()

2015-10-11 Thread Rainer Weikusat
Hannes Frederic Sowa  writes:
> Jason Baron  writes:
>
>> The unix_dgram_poll() routine calls sock_poll_wait() not only for the wait
>> queue associated with the socket s that we are poll'ing against, but also 
>> calls
>> sock_poll_wait() for a remote peer socket p, if it is connected. Thus,
>> if we call poll()/select()/epoll() for the socket s, there are then
>> a couple of code paths in which the remote peer socket p and its associated
>> peer_wait queue can be freed before poll()/select()/epoll() have a chance
>> to remove themselves from the remote peer socket.
>>
>> The way that remote peer socket can be freed are:
>>
>> 1. If s calls connect() to a connect to a new socket other than p, it will
>> drop its reference on p, and thus a close() on p will free it.
>>
>> 2. If we call close on p(), then a subsequent sendmsg() from s, will drop
>> the final reference to p, allowing it to be freed.
>>
>> Address this issue, by reverting unix_dgram_poll() to only register with
>> the wait queue associated with s and register a callback with the remote peer
>> socket on connect() that will wake up the wait queue associated with s. If
>> scenarios 1 or 2 occur above we then simply remove the callback from the
>> remote peer. This then presents the expected semantics to poll()/select()/
>> epoll().
>>
>> I've implemented this for sock-type, SOCK_RAW, SOCK_DGRAM, and SOCK_SEQPACKET
>> but not for SOCK_STREAM, since SOCK_STREAM does not use unix_dgram_poll().
>>
>> Introduced in commit ec0d215f9420 ("af_unix: fix 'poll for write'/connected
>> DGRAM sockets").
>>
>> Tested-by: Mathias Krause 
>> Signed-off-by: Jason Baron 
>
> While I think this approach works, I haven't seen where the current code
> leaks a reference.

It doesn't "leak a reference" (strictly). It possibly registers a wait queue 
with
whatever invoked the poll-routine which belongs to the peer socket of
the socket poll was called on. And the inherent problem with that is
that the lifetime of the peer socket is not necessarily the same as
the lifetime of the polled socket. If the polled socket is disconnected
from its peer while still being polled (or registered for being
polled), the former peer may be freed despite the polling code (of whatever
provenience) still references the peer_wait member of the unix socket
structure for this socket.

As pointed out in the original mail, two ways for this to happen is to
call connect on the polled socket or cause a unix_dgram_sendmsg call
on that after the peer socket was closed.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next] bridge: vlan: enforce no pvid flag in vlan ranges

2015-10-11 Thread Jiri Pirko
Sun, Oct 11, 2015 at 12:49:56PM CEST, ra...@blackwall.org wrote:
>From: Nikolay Aleksandrov 
>
>Currently it's possible for someone to send a vlan range to the kernel
>with the pvid flag set which will result in the pvid bouncing from a
>vlan to vlan and isn't correct, it also introduces problems for hardware
>where it doesn't make sense having more than 1 pvid. iproute2 already
>enforces this, so let's enforce it on kernel-side as well.
>
>Reported-by: Elad Raz 
>Signed-off-by: Nikolay Aleksandrov 

Acked-by: Jiri Pirko 
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next] net: Fix vti use case with oif in dst lookups for IPv6

2015-10-11 Thread Hajime Tazaki

At Sun, 11 Oct 2015 07:31:32 -0600,
David Ahern wrote:

> >> The attached patch applied to Linus' tree works for me. Currently the
> >> above change is not in his tree, so I added it to this patch. Once you
> >> confirm that it works for you I'll create the delta-patch for net and
> >> send out.
> >
> > I gave it a try but without any luck unfortunately.
> > I may need to look carefully what mip6 does here.
> >
> > the code path where I'm looking at for MH packet (raw socket
> > with IPPROTO_MH) is:
> >
> > #0  ip6_route_output (net=0x73a40a40 , 
> > sk=0x730b5c50, fl6=0x70ed2fe0) at net/ipv6/route.c:1195
> > #1  0x735d155f in ip6_dst_lookup_tail (net=0x73a40a40 
> > , sk=0x730b5c50, dst=0x70ed2f18, 
> > fl6=0x70ed2fe0)
> >  at net/ipv6/ip6_output.c:929
> > #2  0x735d1707 in ip6_dst_lookup_flow (sk=0x730b5c50, 
> > fl6=0x70ed2fe0, final_dst=0x0) at net/ipv6/ip6_output.c:1024
> > #3  0x736199f7 in rawv6_sendmsg (sk=0x730b5c50, 
> > msg=0x70ed3320, len=32) at net/ipv6/raw.c:872
> > #4  0x73526d21 in inet_sendmsg (sock=0x730b5810, 
> > msg=0x70ed3320, size=32) at net/ipv4/af_inet.c:737
> > #5  0x7338dc8a in sock_sendmsg_nosec (msg=0x70ed3320, 
> > sock=0x730b5810) at net/socket.c:610
> > #6  sock_sendmsg (sock=0x730b5810, msg=0x70ed3320) at 
> > net/socket.c:620
> >
> >
> > which (*dst)->error of ip6_route_output gives -22 (EINVAL).
> >
> > I will be back once I got any findings.
> 
> How does a xfrm change affect this code path?

I've faced this issue since the following patch was applied.

commit 741a11d9e4103a8e1c590ef1280143fe654e4e33
Author: David Ahern 
Date:   Mon Sep 28 10:12:13 2015 -0700

net: ipv6: Add RT6_LOOKUP_F_IFACE flag if oif is set

I still couldn't spot which part (other than my posted call
graph) is broken and am not sure whether the xfrm change
affects or not (which I need to check the mip6 code again).

> Can you apply this patch, and then run:
> 
> perf record -e fib6:* -a -g
> perf script

I'm using libos environment right now, so the perf trace
can't be used as it is.

I'll keep post here.

-- Hajime
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next 4/5] bridge: vlan: fix possible null ptr derefs on port init and deinit

2015-10-11 Thread Nikolay Aleksandrov
On 10/11/2015 02:42 PM, Nikolay Aleksandrov wrote:
> On 10/11/2015 02:21 PM, Ido Schimmel wrote:
>> Wed, Sep 30, 2015 at 09:16:54PM IDT, ra...@blackwall.org wrote:
>>> From: Nikolay Aleksandrov 
>>>
>>> When a new port is being added we need to make vlgrp available after
>>> rhashtable has been initialized and when removing a port we need to
>>> flush the vlans and free the resources after we're sure noone can use
>>> the port, i.e. after it's removed from the port list and synchronize_rcu
>>> is executed.
>>
>> Hi Nikolay,
>>
>> Changing the order of port deinit breaks symmetry with the init
>> sequence. It also introduces a problem for switchdev drivers. Flushing
>> the VLANs clears HW VLAN filters and then, when port is unlinked from
>> bridge and CHANGEUPPER is received, port is configured to direct traffic
>> to CPU (as it's not offloaded anymore). Doing the reverse (like in this
>> patch) renders the port unusable.
>>
>> Regarding the reason for this change, are you afraid that vlgrp will be
>> accessed in bridge's rx handler or xmit function after it's freed? If
>> so, maybe we can access it using RCU primitives? That way, both the rx
>> handler and xmit function (executed under RCU lock) will either have a
>> valid copy or not. Looking at previous iterations of this code, I see
>> that was the case with the 'net_port_vlans' struct.
>>
>> I can start working on a fix if you agree with the proposed solution.
>>
>> Thanks.
>>
> 
> Hi,
> Ah, I didn't know about this, I feared that something might rely on the
> particular order of the operations but didn't have a way to test this one in
> particular. Anyway, your proposed solution sounds good to me.
> 
> Thank you,
>  Nik

One thing to be careful about is the creation/destruction of the rhashtable 
itself
and the order of operations in regard to vlgrp visibility, so it's not only if
vlgrp is visible or not - it should be visible after rhashtable has been 
initialized
and should be removed before it's freed, the rest is pretty straight-forward.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[patch net-next 0/7] switchdev: change locking

2015-10-11 Thread Jiri Pirko
From: Jiri Pirko 

This is something which I'm currently struggling with.
Callers of attr_set and obj_add/del often hold not only RTNL, but also
spinlock (bridge). So in that case, the driver implementing the op cannot sleep.

The way rocker is dealing with this now is just to invoke driver operation
and go out, without any checking or reporting of the operation status.

Since it would be nice to at least put a warning in case the operation fails,
it makes sense to do this in delayed work directly in switchdev core
instead of implementing this in separate drivers. And that is what this patchset
is introducing.

So from now on, the locking of switchdev mod ops is consistent. Caller either
holds rtnl mutex or in case it does not, caller sets defer flag, telling
switchdev core to process the op later in delayed work.

Flush function for switchdev deferred ops can be called by op
caller in appropriate location, for example after it releases
spin lock, to force switchdev core to process pending ops.

Jiri Pirko (7):
  switchdev: introduce switchdev workqueue
  switchdev: allow caller to explicitly request attr_set as deferred
  switchdev: remove pointers from switchdev objects
  switchdev: introduce possibility to defer obj_add/del
  bridge: defer switchdev fdb del call in fdb_del_external_learn
  rocker: remove nowait from switchdev callbacks.
  switchdev: assert rtnl mutex when going over lower netdevs

 drivers/net/ethernet/rocker/rocker.c |  13 +-
 include/net/switchdev.h  |  14 +-
 net/bridge/br_fdb.c  |   7 +-
 net/bridge/br_if.c   |   3 +
 net/bridge/br_stp.c  |   3 +-
 net/dsa/slave.c  |   2 +-
 net/switchdev/switchdev.c| 289 ---
 7 files changed, 231 insertions(+), 100 deletions(-)

-- 
1.9.3

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[patch net-next 7/7] switchdev: assert rtnl mutex when going over lower netdevs

2015-10-11 Thread Jiri Pirko
From: Jiri Pirko 

netdev_for_each_lower_dev has to be called with rtnl mutex held. So
better enforce it in switchdev functions.

Signed-off-by: Jiri Pirko 
---
 net/switchdev/switchdev.c | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
index 5e483d2..6323162 100644
--- a/net/switchdev/switchdev.c
+++ b/net/switchdev/switchdev.c
@@ -487,6 +487,8 @@ EXPORT_SYMBOL_GPL(switchdev_port_obj_del);
  * @id: object ID
  * @obj: object to dump
  * @cb: function to call with a filled object
+ *
+ * rtnl_lock must be held.
  */
 int switchdev_port_obj_dump(struct net_device *dev, struct switchdev_obj *obj,
switchdev_obj_dump_cb_t *cb)
@@ -496,6 +498,8 @@ int switchdev_port_obj_dump(struct net_device *dev, struct 
switchdev_obj *obj,
struct list_head *iter;
int err = -EOPNOTSUPP;
 
+   ASSERT_RTNL();
+
if (ops && ops->switchdev_port_obj_dump)
return ops->switchdev_port_obj_dump(dev, obj, cb);
 
@@ -1061,6 +1065,8 @@ static struct net_device *switchdev_get_dev_by_nhs(struct 
fib_info *fi)
struct net_device *dev = NULL;
int nhsel;
 
+   ASSERT_RTNL();
+
/* For this route, all nexthop devs must be on the same switch. */
 
for (nhsel = 0; nhsel < fi->fib_nhs; nhsel++) {
@@ -1291,10 +1297,11 @@ void switchdev_port_fwd_mark_set(struct net_device *dev,
u32 mark = dev->ifindex;
u32 reset_mark = 0;
 
-   if (group_dev && joining) {
-   mark = switchdev_port_fwd_mark_get(dev, group_dev);
-   } else if (group_dev && !joining) {
-   if (dev->offload_fwd_mark == mark)
+   if (group_dev) {
+   ASSERT_RTNL();
+   if (joining)
+   mark = switchdev_port_fwd_mark_get(dev, group_dev);
+   else if (dev->offload_fwd_mark == mark)
/* Ohoh, this port was the mark reference port,
 * but it's leaving the group, so reset the
 * mark for the remaining ports in the group.
-- 
1.9.3

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[patch net-next 2/7] switchdev: allow caller to explicitly request attr_set as deferred

2015-10-11 Thread Jiri Pirko
From: Jiri Pirko 

Caller should know if he can call attr_set directly (when holding RTNL)
or if he has to defer the att_set processing for later.

This also allows drivers to sleep inside attr_set and report operation
status back to switchdev core. Switchdev core then warns if status is
not ok, instead of silent errors happening in drivers.

Signed-off-by: Jiri Pirko 
---
 include/net/switchdev.h   |   1 +
 net/bridge/br_stp.c   |   3 +-
 net/switchdev/switchdev.c | 107 --
 3 files changed, 59 insertions(+), 52 deletions(-)

diff --git a/include/net/switchdev.h b/include/net/switchdev.h
index cc7f772..74e349a 100644
--- a/include/net/switchdev.h
+++ b/include/net/switchdev.h
@@ -16,6 +16,7 @@
 #include 
 
 #define SWITCHDEV_F_NO_RECURSE BIT(0)
+#define SWITCHDEV_F_DEFER  BIT(1)
 
 struct switchdev_trans_item {
struct list_head list;
diff --git a/net/bridge/br_stp.c b/net/bridge/br_stp.c
index 3a982c0..a90bf18 100644
--- a/net/bridge/br_stp.c
+++ b/net/bridge/br_stp.c
@@ -41,13 +41,14 @@ void br_set_state(struct net_bridge_port *p, unsigned int 
state)
 {
struct switchdev_attr attr = {
.id = SWITCHDEV_ATTR_ID_PORT_STP_STATE,
+   .flags = SWITCHDEV_F_DEFER,
.u.stp_state = state,
};
int err;
 
p->state = state;
err = switchdev_port_attr_set(p->dev, );
-   if (err && err != -EOPNOTSUPP)
+   if (err)
br_warn(p->br, "error setting offload STP state on port 
%u(%s)\n",
(unsigned int) p->port_no, p->dev->name);
 }
diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
index 10306b6..8831dbd 100644
--- a/net/switchdev/switchdev.c
+++ b/net/switchdev/switchdev.c
@@ -166,6 +166,49 @@ static int __switchdev_port_attr_set(struct net_device 
*dev,
return err;
 }
 
+static int switchdev_port_attr_set_now(struct net_device *dev,
+  struct switchdev_attr *attr)
+{
+   struct switchdev_trans trans;
+   int err;
+
+   switchdev_trans_init();
+
+   /* Phase I: prepare for attr set. Driver/device should fail
+* here if there are going to be issues in the commit phase,
+* such as lack of resources or support.  The driver/device
+* should reserve resources needed for the commit phase here,
+* but should not commit the attr.
+*/
+
+   trans.ph_prepare = true;
+   err = __switchdev_port_attr_set(dev, attr, );
+   if (err) {
+   /* Prepare phase failed: abort the transaction.  Any
+* resources reserved in the prepare phase are
+* released.
+*/
+
+   if (err != -EOPNOTSUPP)
+   switchdev_trans_items_destroy();
+
+   return err;
+   }
+
+   /* Phase II: commit attr set.  This cannot fail as a fault
+* of driver/device.  If it does, it's a bug in the driver/device
+* because the driver said everythings was OK in phase I.
+*/
+
+   trans.ph_prepare = false;
+   err = __switchdev_port_attr_set(dev, attr, );
+   WARN(err, "%s: Commit of attribute (id=%d) failed.\n",
+dev->name, attr->id);
+   switchdev_trans_items_warn_destroy(dev, );
+
+   return err;
+}
+
 struct switchdev_attr_set_work {
struct work_struct work;
struct net_device *dev;
@@ -176,14 +219,17 @@ static void switchdev_port_attr_set_work(struct 
work_struct *work)
 {
struct switchdev_attr_set_work *asw =
container_of(work, struct switchdev_attr_set_work, work);
+   bool rtnl_locked = rtnl_is_locked();
int err;
 
-   rtnl_lock();
-   err = switchdev_port_attr_set(asw->dev, >attr);
+   if (!rtnl_locked)
+   rtnl_lock();
+   err = switchdev_port_attr_set_now(asw->dev, >attr);
if (err && err != -EOPNOTSUPP)
netdev_err(asw->dev, "failed (err=%d) to set attribute 
(id=%d)\n",
   err, asw->attr.id);
-   rtnl_unlock();
+   if (!rtnl_locked)
+   rtnl_unlock();
 
dev_put(asw->dev);
kfree(work);
@@ -204,7 +250,7 @@ static int switchdev_port_attr_set_defer(struct net_device 
*dev,
asw->dev = dev;
memcpy(>attr, attr, sizeof(asw->attr));
 
-   schedule_work(>work);
+   queue_work(switchdev_wq, >work);
 
return 0;
 }
@@ -218,57 +264,16 @@ static int switchdev_port_attr_set_defer(struct 
net_device *dev,
  * Use a 2-phase prepare-commit transaction model to ensure
  * system is not left in a partially updated state due to
  * failure from driver/device.
+ *
+ * rtnl_lock must be held and must not be in atomic section,
+ * in case SWITCHDEV_F_DEFER flag is not set.
  */
 int switchdev_port_attr_set(struct net_device *dev, struct 

[patch net-next 1/7] switchdev: introduce switchdev workqueue

2015-10-11 Thread Jiri Pirko
From: Jiri Pirko 

This is going to be used for deferred operations.

Signed-off-by: Jiri Pirko 
---
 include/net/switchdev.h   |  5 +
 net/switchdev/switchdev.c | 19 +++
 2 files changed, 24 insertions(+)

diff --git a/include/net/switchdev.h b/include/net/switchdev.h
index 89266a3..cc7f772 100644
--- a/include/net/switchdev.h
+++ b/include/net/switchdev.h
@@ -202,6 +202,7 @@ int switchdev_port_fdb_dump(struct sk_buff *skb, struct 
netlink_callback *cb,
 void switchdev_port_fwd_mark_set(struct net_device *dev,
 struct net_device *group_dev,
 bool joining);
+void switchdev_flush_deferred(void);
 
 #else
 
@@ -323,6 +324,10 @@ static inline void switchdev_port_fwd_mark_set(struct 
net_device *dev,
 {
 }
 
+static inline void switchdev_flush_deferred(void)
+{
+}
+
 #endif
 
 #endif /* _LINUX_SWITCHDEV_H_ */
diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
index 6e4a4f9..10306b6 100644
--- a/net/switchdev/switchdev.c
+++ b/net/switchdev/switchdev.c
@@ -17,9 +17,12 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
+static struct workqueue_struct *switchdev_wq;
+
 /**
  * switchdev_trans_item_enqueue - Enqueue data item to transaction queue
  *
@@ -1210,3 +1213,19 @@ void switchdev_port_fwd_mark_set(struct net_device *dev,
dev->offload_fwd_mark = mark;
 }
 EXPORT_SYMBOL_GPL(switchdev_port_fwd_mark_set);
+
+void switchdev_flush_deferred(void)
+{
+   flush_workqueue(switchdev_wq);
+}
+EXPORT_SYMBOL_GPL(switchdev_flush_deferred);
+
+static int __init switchdev_init(void)
+{
+   switchdev_wq = create_workqueue("switchdev");
+   if (!switchdev_wq)
+   return -ENOMEM;
+   return 0;
+}
+
+subsys_initcall(switchdev_init);
-- 
1.9.3

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[patch net-next 4/7] switchdev: introduce possibility to defer obj_add/del

2015-10-11 Thread Jiri Pirko
From: Jiri Pirko 

Similar to the attr usecase, the caller knows if he is holding RTNL and is
in atomic section. So let the called to decide the correct call variant.

This allows drivers to sleep inside their ops and wait for hw to get the
operation status. Then the status is propagated into switchdev core.
This avoids silent errors in drivers.

Signed-off-by: Jiri Pirko 
---
 include/net/switchdev.h   |   1 +
 net/switchdev/switchdev.c | 137 +-
 2 files changed, 112 insertions(+), 26 deletions(-)

diff --git a/include/net/switchdev.h b/include/net/switchdev.h
index cd281db..c31908f 100644
--- a/include/net/switchdev.h
+++ b/include/net/switchdev.h
@@ -66,6 +66,7 @@ enum switchdev_obj_id {
 
 struct switchdev_obj {
enum switchdev_obj_id id;
+   u32 flags;
 };
 
 /* SWITCHDEV_OBJ_ID_PORT_VLAN */
diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
index 32a0390..5e483d2 100644
--- a/net/switchdev/switchdev.c
+++ b/net/switchdev/switchdev.c
@@ -304,21 +304,8 @@ static int __switchdev_port_obj_add(struct net_device *dev,
return err;
 }
 
-/**
- * switchdev_port_obj_add - Add port object
- *
- * @dev: port device
- * @id: object ID
- * @obj: object to add
- *
- * Use a 2-phase prepare-commit transaction model to ensure
- * system is not left in a partially updated state due to
- * failure from driver/device.
- *
- * rtnl_lock must be held.
- */
-int switchdev_port_obj_add(struct net_device *dev,
-  const struct switchdev_obj *obj)
+static int switchdev_port_obj_add_now(struct net_device *dev,
+ const struct switchdev_obj *obj)
 {
struct switchdev_trans trans;
int err;
@@ -360,17 +347,9 @@ int switchdev_port_obj_add(struct net_device *dev,
 
return err;
 }
-EXPORT_SYMBOL_GPL(switchdev_port_obj_add);
 
-/**
- * switchdev_port_obj_del - Delete port object
- *
- * @dev: port device
- * @id: object ID
- * @obj: object to delete
- */
-int switchdev_port_obj_del(struct net_device *dev,
-  const struct switchdev_obj *obj)
+static int switchdev_port_obj_del_now(struct net_device *dev,
+ const struct switchdev_obj *obj)
 {
const struct switchdev_ops *ops = dev->switchdev_ops;
struct net_device *lower_dev;
@@ -386,13 +365,119 @@ int switchdev_port_obj_del(struct net_device *dev,
 */
 
netdev_for_each_lower_dev(dev, lower_dev, iter) {
-   err = switchdev_port_obj_del(lower_dev, obj);
+   err = switchdev_port_obj_del_now(lower_dev, obj);
if (err)
break;
}
 
return err;
 }
+
+struct switchdev_obj_work {
+   struct work_struct work;
+   struct net_device *dev;
+   struct switchdev_obj obj;
+   bool add; /* add of del */
+};
+
+static void switchdev_port_obj_work(struct work_struct *work)
+{
+   struct switchdev_obj_work *ow =
+   container_of(work, struct switchdev_obj_work, work);
+   bool rtnl_locked = rtnl_is_locked();
+   int err;
+
+   if (!rtnl_locked)
+   rtnl_lock();
+   if (ow->add)
+   err = switchdev_port_obj_add_now(ow->dev, >obj);
+   else
+   err = switchdev_port_obj_del_now(ow->dev, >obj);
+   if (err && err != -EOPNOTSUPP)
+   netdev_err(ow->dev, "failed (err=%d) to %s object (id=%d)\n",
+  err, ow->add ? "add" : "del", ow->obj.id);
+   if (!rtnl_locked)
+   rtnl_unlock();
+
+   dev_put(ow->dev);
+   kfree(ow);
+}
+
+static int switchdev_port_obj_work_schedule(struct net_device *dev,
+   const struct switchdev_obj *obj,
+   bool add)
+{
+   struct switchdev_obj_work *ow;
+
+   ow = kmalloc(sizeof(*ow), GFP_ATOMIC);
+   if (!ow)
+   return -ENOMEM;
+
+   INIT_WORK(>work, switchdev_port_obj_work);
+
+   dev_hold(dev);
+   ow->dev = dev;
+   memcpy(>obj, obj, sizeof(ow->obj));
+   ow->add = add;
+
+   queue_work(switchdev_wq, >work);
+   return 0;
+}
+
+static int switchdev_port_obj_add_defer(struct net_device *dev,
+   const struct switchdev_obj *obj)
+{
+   return switchdev_port_obj_work_schedule(dev, obj, true);
+}
+
+/**
+ * switchdev_port_obj_add - Add port object
+ *
+ * @dev: port device
+ * @id: object ID
+ * @obj: object to add
+ *
+ * Use a 2-phase prepare-commit transaction model to ensure
+ * system is not left in a partially updated state due to
+ * failure from driver/device.
+ *
+ * rtnl_lock must be held and must not be in atomic section,
+ * in case SWITCHDEV_F_DEFER flag is not set.
+ */
+int 

[patch net-next 6/7] rocker: remove nowait from switchdev callbacks.

2015-10-11 Thread Jiri Pirko
From: Jiri Pirko 

No need to avoid sleeping in switchdev callbacks now, as the switchdev
core allows it.

Signed-off-by: Jiri Pirko 
---
 drivers/net/ethernet/rocker/rocker.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/rocker/rocker.c 
b/drivers/net/ethernet/rocker/rocker.c
index 5473fbe..862da2f 100644
--- a/drivers/net/ethernet/rocker/rocker.c
+++ b/drivers/net/ethernet/rocker/rocker.c
@@ -3672,7 +3672,7 @@ static int rocker_port_fdb_flush(struct rocker_port 
*rocker_port,
rocker_port->stp_state == BR_STATE_FORWARDING)
return 0;
 
-   flags |= ROCKER_OP_FLAG_REMOVE;
+   flags |= ROCKER_OP_FLAG_NOWAIT | ROCKER_OP_FLAG_REMOVE;
 
spin_lock_irqsave(>fdb_tbl_lock, lock_flags);
 
@@ -4370,8 +4370,7 @@ static int rocker_port_attr_set(struct net_device *dev,
 
switch (attr->id) {
case SWITCHDEV_ATTR_ID_PORT_STP_STATE:
-   err = rocker_port_stp_update(rocker_port, trans,
-ROCKER_OP_FLAG_NOWAIT,
+   err = rocker_port_stp_update(rocker_port, trans, 0,
 attr->u.stp_state);
break;
case SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS:
@@ -4501,7 +4500,7 @@ static int rocker_port_fdb_del(struct rocker_port 
*rocker_port,
   const struct switchdev_obj_port_fdb *fdb)
 {
__be16 vlan_id = rocker_port_vid_to_vlan(rocker_port, fdb->vid, NULL);
-   int flags = ROCKER_OP_FLAG_NOWAIT | ROCKER_OP_FLAG_REMOVE;
+   int flags = ROCKER_OP_FLAG_REMOVE;
 
if (!rocker_port_is_bridged(rocker_port))
return -EINVAL;
-- 
1.9.3

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[patch net-next 5/7] bridge: defer switchdev fdb del call in fdb_del_external_learn

2015-10-11 Thread Jiri Pirko
From: Jiri Pirko 

Since spinlock is held here, defer the switchdev operation.

Signed-off-by: Jiri Pirko 
---
 net/bridge/br_fdb.c | 5 -
 net/bridge/br_if.c  | 3 +++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
index b3801fd..8b723b4 100644
--- a/net/bridge/br_fdb.c
+++ b/net/bridge/br_fdb.c
@@ -134,7 +134,10 @@ static void fdb_del_hw_addr(struct net_bridge *br, const 
unsigned char *addr)
 static void fdb_del_external_learn(struct net_bridge_fdb_entry *f)
 {
struct switchdev_obj_port_fdb fdb = {
-   .obj.id = SWITCHDEV_OBJ_ID_PORT_FDB,
+   .obj = {
+   .id = SWITCHDEV_OBJ_ID_PORT_FDB,
+   .flags = SWITCHDEV_F_DEFER,
+   },
.vid = f->vlan_id,
};
 
diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
index 934cae9..09147cb 100644
--- a/net/bridge/br_if.c
+++ b/net/bridge/br_if.c
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "br_private.h"
 
@@ -249,6 +250,8 @@ static void del_nbp(struct net_bridge_port *p)
list_del_rcu(>list);
 
br_fdb_delete_by_port(br, p, 0, 1);
+   switchdev_flush_deferred();
+
nbp_update_port_count(br);
 
netdev_upper_dev_unlink(dev, br->dev);
-- 
1.9.3

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[patch net-next 3/7] switchdev: remove pointers from switchdev objects

2015-10-11 Thread Jiri Pirko
From: Jiri Pirko 

When object is used in deferred work, we cannot use pointers in
switchdev object structures because the memory they point at may be already
used by someone else. So rather do local copy of the value.

Signed-off-by: Jiri Pirko 
---
 drivers/net/ethernet/rocker/rocker.c |  6 +++---
 include/net/switchdev.h  |  7 +++
 net/bridge/br_fdb.c  |  2 +-
 net/dsa/slave.c  |  2 +-
 net/switchdev/switchdev.c| 11 +++
 5 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/rocker/rocker.c 
b/drivers/net/ethernet/rocker/rocker.c
index cf91ffc..5473fbe 100644
--- a/drivers/net/ethernet/rocker/rocker.c
+++ b/drivers/net/ethernet/rocker/rocker.c
@@ -4453,7 +4453,7 @@ static int rocker_port_obj_add(struct net_device *dev,
fib4 = SWITCHDEV_OBJ_IPV4_FIB(obj);
err = rocker_port_fib_ipv4(rocker_port, trans,
   htonl(fib4->dst), fib4->dst_len,
-  fib4->fi, fib4->tb_id, 0);
+  >fi, fib4->tb_id, 0);
break;
case SWITCHDEV_OBJ_ID_PORT_FDB:
err = rocker_port_fdb_add(rocker_port, trans,
@@ -4525,7 +4525,7 @@ static int rocker_port_obj_del(struct net_device *dev,
fib4 = SWITCHDEV_OBJ_IPV4_FIB(obj);
err = rocker_port_fib_ipv4(rocker_port, NULL,
   htonl(fib4->dst), fib4->dst_len,
-  fib4->fi, fib4->tb_id,
+  >fi, fib4->tb_id,
   ROCKER_OP_FLAG_REMOVE);
break;
case SWITCHDEV_OBJ_ID_PORT_FDB:
@@ -4555,7 +4555,7 @@ static int rocker_port_fdb_dump(const struct rocker_port 
*rocker_port,
hash_for_each_safe(rocker->fdb_tbl, bkt, tmp, found, entry) {
if (found->key.rocker_port != rocker_port)
continue;
-   fdb->addr = found->key.addr;
+   ether_addr_copy(fdb->addr, found->key.addr);
fdb->ndm_state = NUD_REACHABLE;
fdb->vid = rocker_port_vlan_to_vid(rocker_port,
   found->key.vlan_id);
diff --git a/include/net/switchdev.h b/include/net/switchdev.h
index 74e349a..cd281db 100644
--- a/include/net/switchdev.h
+++ b/include/net/switchdev.h
@@ -14,6 +14,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define SWITCHDEV_F_NO_RECURSE BIT(0)
 #define SWITCHDEV_F_DEFER  BIT(1)
@@ -56,8 +57,6 @@ struct switchdev_attr {
} u;
 };
 
-struct fib_info;
-
 enum switchdev_obj_id {
SWITCHDEV_OBJ_ID_UNDEFINED,
SWITCHDEV_OBJ_ID_PORT_VLAN,
@@ -85,7 +84,7 @@ struct switchdev_obj_ipv4_fib {
struct switchdev_obj obj;
u32 dst;
int dst_len;
-   struct fib_info *fi;
+   struct fib_info fi;
u8 tos;
u8 type;
u32 nlflags;
@@ -98,7 +97,7 @@ struct switchdev_obj_ipv4_fib {
 /* SWITCHDEV_OBJ_ID_PORT_FDB */
 struct switchdev_obj_port_fdb {
struct switchdev_obj obj;
-   const unsigned char *addr;
+   unsigned char addr[ETH_ALEN];
u16 vid;
u16 ndm_state;
 };
diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
index 7f7d551..b3801fd 100644
--- a/net/bridge/br_fdb.c
+++ b/net/bridge/br_fdb.c
@@ -135,10 +135,10 @@ static void fdb_del_external_learn(struct 
net_bridge_fdb_entry *f)
 {
struct switchdev_obj_port_fdb fdb = {
.obj.id = SWITCHDEV_OBJ_ID_PORT_FDB,
-   .addr = f->addr.addr,
.vid = f->vlan_id,
};
 
+   ether_addr_copy(fdb.addr, f->addr.addr);
switchdev_port_obj_del(f->dst->dev, );
 }
 
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 4f607bc..a477975 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -390,7 +390,7 @@ static int dsa_slave_port_fdb_dump(struct net_device *dev,
if (ret < 0)
break;
 
-   fdb->addr = addr;
+   ether_addr_copy(fdb->addr, addr);
fdb->vid = vid;
fdb->ndm_state = is_static ? NUD_NOARP : NUD_REACHABLE;
 
diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
index 8831dbd..32a0390 100644
--- a/net/switchdev/switchdev.c
+++ b/net/switchdev/switchdev.c
@@ -15,6 +15,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -830,10 +831,10 @@ int switchdev_port_fdb_add(struct ndmsg *ndm, struct 
nlattr *tb[],
 {
struct switchdev_obj_port_fdb fdb = {
.obj.id = SWITCHDEV_OBJ_ID_PORT_FDB,
-   .addr = addr,
.vid = vid,
};
 
+   ether_addr_copy(fdb.addr, addr);
return switchdev_port_obj_add(dev, );
 }
 

Re: [PATCH net] ipv6: gre: setup default multicast routes over PtP links

2015-10-11 Thread David Miller
From: Hannes Frederic Sowa 
Date: Thu,  8 Oct 2015 18:19:39 +0200

> GRE point-to-point interfaces should also support ipv6 multicast. Setting
> up default multicast routes on interface creation was forgotten. Add it.
> 
> Bugzilla: 
> Cc: Julien Muchembled 
> Cc: Eric Dumazet 
> Cc: Nicolas Dumazet 
> Signed-off-by: Hannes Frederic Sowa 

Applied.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next] tcp: fix RFS vs lockless listeners

2015-10-11 Thread David Miller
From: Eric Dumazet 
Date: Thu, 08 Oct 2015 11:16:48 -0700

> From: Eric Dumazet 
> 
> Before recent TCP listener patches, we were updating listener
> sk->sk_rxhash before the cloning of master socket.
> 
> children sk_rxhash was therefore correct after the normal 3WHS.
> 
> But with lockless listener, we no longer dirty/change listener sk_rxhash
> as it would be racy.
> 
> We need to correctly update the child sk_rxhash, otherwise first data
> packet wont hit correct cpu if RFS is used.
> 
> Fixes: 079096f103fa ("tcp/dccp: install syn_recv requests into ehash table")
> Signed-off-by: Eric Dumazet 
> Reported-by: Willem de Bruijn 
> Cc: Tom Herbert 

Applied, thanks Eric.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net] ipv6: drop frames with attached skb->sk in forwarding

2015-10-11 Thread David Miller
From: Hannes Frederic Sowa 
Date: Thu,  8 Oct 2015 18:19:53 +0200

> This is a clone of commit 2ab957492d13b ("ip_forward: Drop frames with
> attached skb->sk") for ipv6.
> 
> This commit has exactly the same reasons as the above mentioned commit,
> namely to prevent panics during netfilter reload or a misconfigured stack.
> 
> Signed-off-by: Hannes Frederic Sowa 

Also applied, thanks Hannes.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next 4/5] bridge: vlan: fix possible null ptr derefs on port init and deinit

2015-10-11 Thread Ido Schimmel
Wed, Sep 30, 2015 at 09:16:54PM IDT, ra...@blackwall.org wrote:
>From: Nikolay Aleksandrov 
>
>When a new port is being added we need to make vlgrp available after
>rhashtable has been initialized and when removing a port we need to
>flush the vlans and free the resources after we're sure noone can use
>the port, i.e. after it's removed from the port list and synchronize_rcu
>is executed.

Hi Nikolay,

Changing the order of port deinit breaks symmetry with the init
sequence. It also introduces a problem for switchdev drivers. Flushing
the VLANs clears HW VLAN filters and then, when port is unlinked from
bridge and CHANGEUPPER is received, port is configured to direct traffic
to CPU (as it's not offloaded anymore). Doing the reverse (like in this
patch) renders the port unusable.

Regarding the reason for this change, are you afraid that vlgrp will be
accessed in bridge's rx handler or xmit function after it's freed? If
so, maybe we can access it using RCU primitives? That way, both the rx
handler and xmit function (executed under RCU lock) will either have a
valid copy or not. Looking at previous iterations of this code, I see
that was the case with the 'net_port_vlans' struct.

I can start working on a fix if you agree with the proposed solution.

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: pull request: bluetooth-next 2015-10-08

2015-10-11 Thread David Miller
From: Johan Hedberg 
Date: Thu, 8 Oct 2015 17:00:58 +0300

> Here's another set of Bluetooth & 802.15.4 patches for the 4.4 kernel.
> 
> 802.15.4:
>  - Many improvements & fixes to the mrf24j40 driver
>  - Fixes and cleanups to nl802154, mac802154 & ieee802154 code
> 
> Bluetooth:
>  - New chipset support in btmrvl driver
>  - Fixes & cleanups to btbcm, btmrvl, bpa10x & btintel drivers
>  - Support for vendor specific diagnostic data through common API
>  - Cleanups to the 6lowpan code
>  - New events & message types for monitor channel
> 
> Please let me know if there are any issues pulling. Thanks.

Pulled, thanks Johan.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next v2 0/3] net: dsa: push switchdev prepare phase in FDB ops

2015-10-11 Thread David Miller
From: Vivien Didelot 
Date: Thu,  8 Oct 2015 11:35:11 -0400

> This patchset pushes the switchdev prepare phase for the FDB add and del
> operations down to the DSA drivers. Currently only mv88e6xxx is affected.
> 
> Since the dump requires a bit of refactoring in the driver, it'll come in a
> future patchset.
> 
> Changes in v2:
>  * forward declare switchdev structs instead of fixing the dsa.h include.

Series applied, thanks.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BUG: Mellanox crash with iommu=soft and swiotlb=force

2015-10-11 Thread Christoffer Dall
Hi,

On Tue, Sep 29, 2015 at 12:59:35AM +0300, Or Gerlitz wrote:
> On Tue, Sep 29, 2015 at 12:04 AM, Christoffer Dall
>  wrote:
> > Hi,
> >
> > In doing some performance experiments I found that using a 10G Mellanox
> > MX354A Dual port FDR CX3 device on a server running Apache and running
> > ab against that server causes the system to crash with 'iommu=soft
> > swiotlb=force'.  The same behavior is seen without these options on Dom0
> > running under Xen.
> >
> > I have tried this on v4.0 and v4.3-rc3.
> 
> Woops, needs looking indeed. Unfortunately many people in the team are
> off for the Sukkot holiday with real backing to business coming to
> play on Oct 6th -- not sure we can really respond on that before.
> 
Friendly reminder on this one.  Did anyone have a look?

Thanks,
-Christoffer
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [kbuild-all] [PATCH net-next] net: Lookup actual route when oif is VRF device

2015-10-11 Thread Fengguang Wu
Hi David,

On Mon, Oct 05, 2015 at 12:10:12PM -0600, David Ahern wrote:
> On 10/5/15 12:01 PM, kbuild test robot wrote:
> >Hi David,
> >
> >[auto build test ERROR on v4.3-rc4 -- if it's inappropriate base, please 
> >ignore]
> >
> 
> net-next patches can *not* be applied to Linus' tree. If you are going to
> run the build bot for patches submitted to netdev, please be sure to apply
> those patches to net-next tree.

OK. Hope these rules will work:

- "[PATCH net-next]" will be tested on net-next/master
- "[PATCH net]"  will be tested on net/master
- "[PATCH nf]"   will be tested on nf/master
- "[PATCH nf-next]"  will be tested on nf-next/master

Thanks,
Fengguang
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH nf-next 7/7] netfilter-ipv6: pointer cast layout

2015-10-11 Thread Ian Morris
Correct whitespace layout of a pointer casting.

No changes detected by objdiff.

Signed-off-by: Ian Morris 
---
 net/ipv6/netfilter/nf_conntrack_reasm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c 
b/net/ipv6/netfilter/nf_conntrack_reasm.c
index cfcc71f..deab0a1 100644
--- a/net/ipv6/netfilter/nf_conntrack_reasm.c
+++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
@@ -59,7 +59,7 @@ struct nf_ct_frag6_skb_cb
struct sk_buff  *orig;
 };
 
-#define NFCT_FRAG6_CB(skb) ((struct nf_ct_frag6_skb_cb*)((skb)->cb))
+#define NFCT_FRAG6_CB(skb) ((struct nf_ct_frag6_skb_cb *)((skb)->cb))
 
 static struct inet_frags nf_frags;
 
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH nf-next 5/7] netfilter-ipv6: ternary operator layout

2015-10-11 Thread Ian Morris
Correct whitespace layout of ternary operators in the netfilter-ipv6
code.

No changes detected by objdiff.

Signed-off-by: Ian Morris 
---
 net/ipv6/netfilter/ip6_tables.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c
index 62190c3..80f014a 100644
--- a/net/ipv6/netfilter/ip6_tables.c
+++ b/net/ipv6/netfilter/ip6_tables.c
@@ -117,7 +117,7 @@ ip6_packet_match(const struct sk_buff *skb,
if (FWINV(ret != 0, IP6T_INV_VIA_IN)) {
dprintf("VIA in mismatch (%s vs %s).%s\n",
indev, ip6info->iniface,
-   ip6info->invflags_INV_VIA_IN ?" (INV)":"");
+   ip6info->invflags_INV_VIA_IN ? " (INV)" : "");
return false;
}
 
@@ -126,7 +126,7 @@ ip6_packet_match(const struct sk_buff *skb,
if (FWINV(ret != 0, IP6T_INV_VIA_OUT)) {
dprintf("VIA out mismatch (%s vs %s).%s\n",
outdev, ip6info->outiface,
-   ip6info->invflags_INV_VIA_OUT ?" (INV)":"");
+   ip6info->invflags_INV_VIA_OUT ? " (INV)" : "");
return false;
}
 
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH nf-next 2/7] netfilter-ipv6: function definition layout

2015-10-11 Thread Ian Morris
Use tabs instead of spaces to indent second line of parameters in
function definitions.

No changes detected by objdiff.

Signed-off-by: Ian Morris 
---
 net/ipv6/netfilter/ip6_tables.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c
index 67a2821..0654312 100644
--- a/net/ipv6/netfilter/ip6_tables.c
+++ b/net/ipv6/netfilter/ip6_tables.c
@@ -816,7 +816,7 @@ static void cleanup_entry(struct ip6t_entry *e, struct net 
*net)
newinfo) */
 static int
 translate_table(struct net *net, struct xt_table_info *newinfo, void *entry0,
-const struct ip6t_replace *repl)
+   const struct ip6t_replace *repl)
 {
struct ip6t_entry *iter;
unsigned int i;
@@ -1090,7 +1090,7 @@ static int compat_table_info(const struct xt_table_info 
*info,
 #endif
 
 static int get_info(struct net *net, void __user *user,
-const int *len, int compat)
+   const int *len, int compat)
 {
char name[XT_TABLE_MAXNAMELEN];
struct xt_table *t;
@@ -1152,7 +1152,7 @@ static int get_info(struct net *net, void __user *user,
 
 static int
 get_entries(struct net *net, struct ip6t_get_entries __user *uptr,
-const int *len)
+   const int *len)
 {
int ret;
struct ip6t_get_entries get;
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH nf-next 4/7] netfilter-ipv6: whitespace around operators

2015-10-11 Thread Ian Morris
This patch cleanses whitespace around arithmetical operators.

No changes detected by objdiff.

Signed-off-by: Ian Morris 
---
 net/ipv6/netfilter/nf_conntrack_proto_icmpv6.c | 4 ++--
 net/ipv6/netfilter/nf_conntrack_reasm.c| 2 +-
 net/ipv6/netfilter/nf_reject_ipv6.c| 4 ++--
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/net/ipv6/netfilter/nf_conntrack_proto_icmpv6.c 
b/net/ipv6/netfilter/nf_conntrack_proto_icmpv6.c
index d3b7974..660bc10 100644
--- a/net/ipv6/netfilter/nf_conntrack_proto_icmpv6.c
+++ b/net/ipv6/netfilter/nf_conntrack_proto_icmpv6.c
@@ -57,12 +57,12 @@ static const u_int8_t invmap[] = {
[ICMPV6_ECHO_REQUEST - 128] = ICMPV6_ECHO_REPLY + 1,
[ICMPV6_ECHO_REPLY - 128]   = ICMPV6_ECHO_REQUEST + 1,
[ICMPV6_NI_QUERY - 128] = ICMPV6_NI_REPLY + 1,
-   [ICMPV6_NI_REPLY - 128] = ICMPV6_NI_QUERY +1
+   [ICMPV6_NI_REPLY - 128] = ICMPV6_NI_QUERY + 1
 };
 
 static const u_int8_t noct_valid_new[] = {
[ICMPV6_MGM_QUERY - 130] = 1,
-   [ICMPV6_MGM_REPORT -130] = 1,
+   [ICMPV6_MGM_REPORT - 130] = 1,
[ICMPV6_MGM_REDUCTION - 130] = 1,
[NDISC_ROUTER_SOLICITATION - 130] = 1,
[NDISC_ROUTER_ADVERTISEMENT - 130] = 1,
diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c 
b/net/ipv6/netfilter/nf_conntrack_reasm.c
index 701cd2b..cfcc71f 100644
--- a/net/ipv6/netfilter/nf_conntrack_reasm.c
+++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
@@ -445,7 +445,7 @@ nf_ct_frag6_reasm(struct frag_queue *fq, struct net_device 
*dev)
skb_reset_transport_header(head);
skb_push(head, head->data - skb_network_header(head));
 
-   for (fp=head->next; fp; fp = fp->next) {
+   for (fp = head->next; fp; fp = fp->next) {
head->data_len += fp->len;
head->len += fp->len;
if (head->ip_summed != fp->ip_summed)
diff --git a/net/ipv6/netfilter/nf_reject_ipv6.c 
b/net/ipv6/netfilter/nf_reject_ipv6.c
index 94b4c6d..5629db8 100644
--- a/net/ipv6/netfilter/nf_reject_ipv6.c
+++ b/net/ipv6/netfilter/nf_reject_ipv6.c
@@ -26,7 +26,7 @@ const struct tcphdr *nf_reject_ip6_tcphdr_get(struct sk_buff 
*oldskb,
int tcphoff;
 
proto = oip6h->nexthdr;
-   tcphoff = ipv6_skip_exthdr(oldskb, ((u8*)(oip6h+1) - oldskb->data),
+   tcphoff = ipv6_skip_exthdr(oldskb, ((u8 *)(oip6h + 1) - oldskb->data),
   , _off);
 
if ((tcphoff < 0) || (tcphoff > oldskb->len)) {
@@ -224,7 +224,7 @@ static bool reject6_csum_ok(struct sk_buff *skb, int hook)
return true;
 
proto = ip6h->nexthdr;
-   thoff = ipv6_skip_exthdr(skb, ((u8*)(ip6h+1) - skb->data), , );
+   thoff = ipv6_skip_exthdr(skb, ((u8 *)(ip6h + 1) - skb->data), , 
);
 
if (thoff < 0 || thoff >= skb->len || (fo & htons(~0x7)) != 0)
return false;
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH nf-next 6/7] netfilter-ipv6: improve if statements

2015-10-11 Thread Ian Morris
Correct whitespace layout of if statements.

No changes detected by objdiff.

Signed-off-by: Ian Morris 
---
 net/ipv6/netfilter/ip6_tables.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c
index 80f014a..bd660ec 100644
--- a/net/ipv6/netfilter/ip6_tables.c
+++ b/net/ipv6/netfilter/ip6_tables.c
@@ -133,7 +133,7 @@ ip6_packet_match(const struct sk_buff *skb,
 /* ... might want to do something with class and flowlabel here ... */
 
/* look for the desired protocol header */
-   if((ip6info->flags & IP6T_F_PROTO)) {
+   if (ip6info->flags & IP6T_F_PROTO) {
int protohdr;
unsigned short _frag_off;
 
@@ -151,7 +151,7 @@ ip6_packet_match(const struct sk_buff *skb,
ip6info->proto);
 
if (ip6info->proto == protohdr) {
-   if(ip6info->invflags & IP6T_INV_PROTO) {
+   if (ip6info->invflags & IP6T_INV_PROTO) {
return false;
}
return true;
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH nf-next 1/7] netfilter-ipv6: label placement

2015-10-11 Thread Ian Morris
Whitespace cleansing: Labels should not be indented.

No changes detected by objdiff.

Signed-off-by: Ian Morris 
---
 net/ipv6/netfilter/ip6_tables.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c
index 80e3bd7..67a2821 100644
--- a/net/ipv6/netfilter/ip6_tables.c
+++ b/net/ipv6/netfilter/ip6_tables.c
@@ -561,7 +561,7 @@ mark_source_chains(const struct xt_table_info *newinfo,
pos = newpos;
}
}
-   next:
+next:
duprintf("Finished chain %u\n", hook);
}
return 1;
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH nf-next 3/7] netfilter-ipv6: code indentation

2015-10-11 Thread Ian Morris
Use tabs instead of spaces to indent code.

No changes detected by objdiff.

Signed-off-by: Ian Morris 
---
 net/ipv6/netfilter/ip6_tables.c   | 4 ++--
 net/ipv6/netfilter/ip6t_SYNPROXY.c| 2 +-
 net/ipv6/netfilter/nft_chain_route_ipv6.c | 6 +++---
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c
index 0654312..62190c3 100644
--- a/net/ipv6/netfilter/ip6_tables.c
+++ b/net/ipv6/netfilter/ip6_tables.c
@@ -443,8 +443,8 @@ ip6t_do_table(struct sk_buff *skb,
break;
} while (!acpar.hotdrop);
 
-   xt_write_recseq_end(addend);
-   local_bh_enable();
+   xt_write_recseq_end(addend);
+   local_bh_enable();
 
 #ifdef DEBUG_ALLOW_ALL
return NF_ACCEPT;
diff --git a/net/ipv6/netfilter/ip6t_SYNPROXY.c 
b/net/ipv6/netfilter/ip6t_SYNPROXY.c
index c235660..5312e9d 100644
--- a/net/ipv6/netfilter/ip6t_SYNPROXY.c
+++ b/net/ipv6/netfilter/ip6t_SYNPROXY.c
@@ -244,7 +244,7 @@ synproxy_send_client_ack(const struct synproxy_net *snet,
synproxy_build_options(nth, opts);
 
synproxy_send_tcp(snet, skb, nskb, skb->nfct, IP_CT_ESTABLISHED_REPLY,
- niph, nth, tcp_hdr_size);
+ niph, nth, tcp_hdr_size);
 }
 
 static bool
diff --git a/net/ipv6/netfilter/nft_chain_route_ipv6.c 
b/net/ipv6/netfilter/nft_chain_route_ipv6.c
index 9df75bd..71d995f 100644
--- a/net/ipv6/netfilter/nft_chain_route_ipv6.c
+++ b/net/ipv6/netfilter/nft_chain_route_ipv6.c
@@ -61,11 +61,11 @@ static const struct nf_chain_type nft_chain_route_ipv6 = {
.name   = "route",
.type   = NFT_CHAIN_T_ROUTE,
.family = NFPROTO_IPV6,
-.owner = THIS_MODULE,
+   .owner  = THIS_MODULE,
.hook_mask  = (1 << NF_INET_LOCAL_OUT),
.hooks  = {
-[NF_INET_LOCAL_OUT]= nf_route_table_hook,
-},
+   [NF_INET_LOCAL_OUT] = nf_route_table_hook,
+   },
 };
 
 static int __init nft_chain_route_init(void)
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next v3 0/2] mpls: multipath support

2015-10-11 Thread Eric W. Biederman
Roopa Prabhu  writes:

> From: Roopa Prabhu 
>
> This patch adds support for MPLS multipath routes.
>
> Includes following changes to support multipath:
> - splits struct mpls_route into 'struct mpls_route + struct mpls_nh'.
>
> - struct mpls_nh represents a mpls nexthop label forwarding entry
>
> - Adds support to parse/fill RTA_MULTIPATH netlink attribute for
> multipath routes similar to ipv4/v6 fib
>
> - In the process of restructuring, this patch also consistently changes all
> labels to u8
>
> $ip -f mpls route add 100 nexthop as 200 via inet 10.1.1.2 dev swp1 \
>   nexthop as 700 via inet 10.1.1.6 dev swp2 \
>   nexthop as 800 via inet 40.1.1.2 dev swp3
>
> $ip  -f mpls route show
> 100 
>   nexthop as to 200 via inet 10.1.1.2  dev swp1
>   nexthop as to 700 via inet 10.1.1.6  dev swp2
>   nexthop as to 800 via inet 40.1.1.2  dev swp3
>
> Roopa Prabhu (2):
>   mpls: multipath support
>   mpls: flow-based multipath selection

Overall this looks good, very reviewable and maintainable.

I have a couple of minor nits that can probably be best addressed by
follow on patches.

Eric


> Signed-off-by: Roopa Prabhu 
>
> 
> v2:
>   - Incorporate some feedback from Robert:
>   use dynamic allocation (list) instead of static allocation
>   for nexthops
> v3:
>   - Move back to arrays (same as v1), also suggested by Eric Biederman
>
>
>  include/net/mpls_iptunnel.h |   2 +-
>  net/mpls/af_mpls.c  | 668 
> ++--
>  net/mpls/internal.h |  57 +++-
>  3 files changed, 572 insertions(+), 155 deletions(-)
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next] net: Fix vti use case with oif in dst lookups for IPv6

2015-10-11 Thread David Ahern

On 10/11/15 8:24 AM, Hajime Tazaki wrote:


I've faced this issue since the following patch was applied.

commit 741a11d9e4103a8e1c590ef1280143fe654e4e33
Author: David Ahern 
Date:   Mon Sep 28 10:12:13 2015 -0700

 net: ipv6: Add RT6_LOOKUP_F_IFACE flag if oif is set

I still couldn't spot which part (other than my posted call
graph) is broken and am not sure whether the xfrm change
affects or not (which I need to check the mip6 code again).


Ok, this is a separate problem from what Steffen is hitting.




Can you apply this patch, and then run:

perf record -e fib6:* -a -g
perf script


I'm using libos environment right now, so the perf trace
can't be used as it is.


ok.

Some path in raw6_sendmsg is setting fl6.flowi6_oif. Can you instrument it?
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH net-next v3 1/2] mpls: multipath route support

2015-10-11 Thread Roopa Prabhu
From: Roopa Prabhu 

This patch adds support for MPLS multipath routes.

Includes following changes to support multipath:
- splits struct mpls_route into 'struct mpls_route + struct mpls_nh'

- 'struct mpls_nh' represents a mpls nexthop label forwarding entry

- moves mpls route and nexthop structures into internal.h

- A mpls_route can point to multiple mpls_nh structs

- the nexthops are maintained as a array (similar to ipv4 fib)

- In the process of restructuring, this patch also consistently changes
  all labels to u8

- Adds support to parse/fill RTA_MULTIPATH netlink attribute for
multipath routes similar to ipv4/v6 fib

- In this patch, the multipath route nexthop selection algorithm
simply returns the first nexthop. It is replaced by a
hash based algorithm from Robert Shearman in the next patch

- mpls_route_update cleanup: remove 'dev' handling in mpls_route_update.
mpls_route_update though implemented to update based on dev, it was
never used that way. And the dev handling gets tricky with multiple nexthops.
Cannot match against any single nexthops dev. So, this patch removes the unused
'dev' handling in mpls_route_update.

Example:

$ip -f mpls route add 100 nexthop as 200 via inet 10.1.1.2 dev swp1 \
nexthop as 700 via inet 10.1.1.6 dev swp2 \
nexthop as 800 via inet 40.1.1.2 dev swp3

$ip  -f mpls route show
100
nexthop as to 200 via inet 10.1.1.2  dev swp1
nexthop as to 700 via inet 10.1.1.6  dev swp2
nexthop as to 800 via inet 40.1.1.2  dev swp3

Signed-off-by: Roopa Prabhu 
---
 include/net/mpls_iptunnel.h |   2 +-
 net/mpls/af_mpls.c  | 500 +++-
 net/mpls/internal.h |  52 -
 3 files changed, 404 insertions(+), 150 deletions(-)

diff --git a/include/net/mpls_iptunnel.h b/include/net/mpls_iptunnel.h
index 4757997..179253f 100644
--- a/include/net/mpls_iptunnel.h
+++ b/include/net/mpls_iptunnel.h
@@ -18,7 +18,7 @@
 
 struct mpls_iptunnel_encap {
u32 label[MAX_NEW_LABELS];
-   u32 labels;
+   u8  labels;
 };
 
 static inline struct mpls_iptunnel_encap *mpls_lwtunnel_encap(struct 
lwtunnel_state *lwtstate)
diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index bb185a2..4d819df 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -19,37 +19,9 @@
 #include 
 #include 
 #endif
+#include 
 #include "internal.h"
 
-#define LABEL_NOT_SPECIFIED (1<<20)
-#define MAX_NEW_LABELS 2
-
-/* This maximum ha length copied from the definition of struct neighbour */
-#define MAX_VIA_ALEN (ALIGN(MAX_ADDR_LEN, sizeof(unsigned long)))
-
-enum mpls_payload_type {
-   MPT_UNSPEC, /* IPv4 or IPv6 */
-   MPT_IPV4 = 4,
-   MPT_IPV6 = 6,
-
-   /* Other types not implemented:
-*  - Pseudo-wire with or without control word (RFC4385)
-*  - GAL (RFC5586)
-*/
-};
-
-struct mpls_route { /* next hop label forwarding entry */
-   struct net_device __rcu *rt_dev;
-   struct rcu_head rt_rcu;
-   u32 rt_label[MAX_NEW_LABELS];
-   u8  rt_protocol; /* routing protocol that set this 
entry */
-   u8  rt_payload_type;
-   u8  rt_labels;
-   u8  rt_via_alen;
-   u8  rt_via_table;
-   u8  rt_via[0];
-};
-
 static int zero = 0;
 static int label_limit = (1 << 20) - 1;
 
@@ -80,10 +52,10 @@ bool mpls_output_possible(const struct net_device *dev)
 }
 EXPORT_SYMBOL_GPL(mpls_output_possible);
 
-static unsigned int mpls_rt_header_size(const struct mpls_route *rt)
+static unsigned int mpls_nh_header_size(const struct mpls_nh *nh)
 {
/* The size of the layer 2.5 labels to be added for this route */
-   return rt->rt_labels * sizeof(struct mpls_shim_hdr);
+   return nh->nh_labels * sizeof(struct mpls_shim_hdr);
 }
 
 unsigned int mpls_dev_mtu(const struct net_device *dev)
@@ -105,6 +77,12 @@ bool mpls_pkt_too_big(const struct sk_buff *skb, unsigned 
int mtu)
 }
 EXPORT_SYMBOL_GPL(mpls_pkt_too_big);
 
+static struct mpls_nh *mpls_select_multipath(struct mpls_route *rt)
+{
+   /* assume single nexthop for now */
+   return >rt_nh[0];
+}
+
 static bool mpls_egress(struct mpls_route *rt, struct sk_buff *skb,
struct mpls_entry_decoded dec)
 {
@@ -159,6 +137,7 @@ static int mpls_forward(struct sk_buff *skb, struct 
net_device *dev,
struct net *net = dev_net(dev);
struct mpls_shim_hdr *hdr;
struct mpls_route *rt;
+   struct mpls_nh *nh;
struct mpls_entry_decoded dec;
struct net_device *out_dev;
struct mpls_dev *mdev;
@@ -166,6 +145,7 @@ static int mpls_forward(struct sk_buff *skb, struct 
net_device *dev,
unsigned int new_header_size;
unsigned int mtu;
int err;
+   int nhidx;
 
/* Careful 

[PATCH net-next v3 0/2] mpls: multipath support

2015-10-11 Thread Roopa Prabhu
From: Roopa Prabhu 

This patch adds support for MPLS multipath routes.

Includes following changes to support multipath:
- splits struct mpls_route into 'struct mpls_route + struct mpls_nh'.

- struct mpls_nh represents a mpls nexthop label forwarding entry

- Adds support to parse/fill RTA_MULTIPATH netlink attribute for
multipath routes similar to ipv4/v6 fib

- In the process of restructuring, this patch also consistently changes all
labels to u8

$ip -f mpls route add 100 nexthop as 200 via inet 10.1.1.2 dev swp1 \
nexthop as 700 via inet 10.1.1.6 dev swp2 \
nexthop as 800 via inet 40.1.1.2 dev swp3

$ip  -f mpls route show
100 
nexthop as to 200 via inet 10.1.1.2  dev swp1
nexthop as to 700 via inet 10.1.1.6  dev swp2
nexthop as to 800 via inet 40.1.1.2  dev swp3

Roopa Prabhu (2):
  mpls: multipath support
  mpls: flow-based multipath selection

Signed-off-by: Roopa Prabhu 


v2:
- Incorporate some feedback from Robert:
use dynamic allocation (list) instead of static allocation
for nexthops
v3:
- Move back to arrays (same as v1), also suggested by Eric Biederman


 include/net/mpls_iptunnel.h |   2 +-
 net/mpls/af_mpls.c  | 668 ++--
 net/mpls/internal.h |  57 +++-
 3 files changed, 572 insertions(+), 155 deletions(-)

-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH net-next v3 2/2] mpls: flow-based multipath selection

2015-10-11 Thread Roopa Prabhu
From: Robert Shearman 

Change the selection of a multipath route to use a flow-based
hash. This more suitable for traffic sensitive to reordering within a
flow (e.g. TCP, L2VPN) and whilst still allowing a good distribution
of traffic given enough flows.

Selection of the path for a multipath route is done using a hash of:
1. Label stack up to MAX_MP_SELECT_LABELS labels or up to and
   including entropy label, whichever is first.
2. 3-tuple of (L3 src, L3 dst, proto) from IPv4/IPv6 header in MPLS
   payload, if present.

Naturally, a 5-tuple hash using L4 information in addition would be
possible and be better in some scenarios, but there is a tradeoff
between looking deeper into the packet to achieve good distribution,
and packet forwarding performance, and I have erred on the side of the
latter as the default.

Signed-off-by: Robert Shearman 
---
 net/mpls/af_mpls.c | 88 ++
 1 file changed, 83 insertions(+), 5 deletions(-)

diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index 4d819df..15dd2eb 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -22,6 +22,11 @@
 #include 
 #include "internal.h"
 
+/* Maximum number of labels to look ahead at when selecting a path of
+ * a multipath route
+ */
+#define MAX_MP_SELECT_LABELS 4
+
 static int zero = 0;
 static int label_limit = (1 << 20) - 1;
 
@@ -77,10 +82,78 @@ bool mpls_pkt_too_big(const struct sk_buff *skb, unsigned 
int mtu)
 }
 EXPORT_SYMBOL_GPL(mpls_pkt_too_big);
 
-static struct mpls_nh *mpls_select_multipath(struct mpls_route *rt)
+static struct mpls_nh *mpls_select_multipath(struct mpls_route *rt,
+struct sk_buff *skb, bool bos)
 {
-   /* assume single nexthop for now */
-   return >rt_nh[0];
+   struct mpls_entry_decoded dec;
+   struct mpls_shim_hdr *hdr;
+   bool eli_seen = false;
+   int label_index;
+   int nh_index = 0;
+   u32 hash = 0;
+
+   /* No need to look further into packet if there's only
+* one path
+*/
+   if (rt->rt_nhn == 1)
+   goto out;
+
+   for (label_index = 0; label_index < MAX_MP_SELECT_LABELS && !bos;
+label_index++) {
+   if (!pskb_may_pull(skb, sizeof(*hdr) * label_index))
+   break;
+
+   /* Read and decode the current label */
+   hdr = mpls_hdr(skb) + label_index;
+   dec = mpls_entry_decode(hdr);
+
+   /* RFC6790 - reserved labels MUST NOT be used as keys
+* for the load-balancing function
+*/
+   if (dec.label == MPLS_LABEL_ENTROPY) {
+   eli_seen = true;
+   } else if (dec.label >= MPLS_LABEL_FIRST_UNRESERVED) {
+   hash = jhash_1word(dec.label, hash);
+
+   /* The entropy label follows the entropy label
+* indicator, so this means that the entropy
+* label was just added to the hash - no need to
+* go any deeper either in the label stack or in the
+* payload
+*/
+   if (eli_seen)
+   break;
+   }
+
+   bos = dec.bos;
+   if (bos && pskb_may_pull(skb, sizeof(*hdr) * label_index +
+sizeof(struct iphdr))) {
+   const struct iphdr *v4hdr;
+
+   v4hdr = (const struct iphdr *)(mpls_hdr(skb) +
+  label_index);
+   if (v4hdr->version == 4) {
+   hash = jhash_3words(ntohl(v4hdr->saddr),
+   ntohl(v4hdr->daddr),
+   v4hdr->protocol, hash);
+   } else if (v4hdr->version == 6 &&
+   pskb_may_pull(skb, sizeof(*hdr) * label_index +
+ sizeof(struct ipv6hdr))) {
+   const struct ipv6hdr *v6hdr;
+
+   v6hdr = (const struct ipv6hdr *)(mpls_hdr(skb) +
+   label_index);
+
+   hash = __ipv6_addr_jhash(>saddr, hash);
+   hash = __ipv6_addr_jhash(>daddr, hash);
+   hash = jhash_1word(v6hdr->nexthdr, hash);
+   }
+   }
+   }
+
+   nh_index = hash % rt->rt_nhn;
+out:
+   return >rt_nh[nh_index];
 }
 
 static bool mpls_egress(struct mpls_route *rt, struct sk_buff *skb,
@@ -145,7 +218,6 @@ static int mpls_forward(struct sk_buff *skb, struct 
net_device *dev,
unsigned int new_header_size;
unsigned 

[PATCH nf-next 0/7] coding style improvements for netfilter

2015-10-11 Thread Ian Morris
This series of patches improves the coding style of the netfilter-ipv6 
code by addressing some issues detected by checkpatch.

The changes were previously submitted as part of a larger monolithic 
patch but on advice from Pablo, these are being re-sent in smaller, 
more structured batches.

Ian Morris (7):
  netfilter-ipv6: label placement
  netfilter-ipv6: function definition layout
  netfilter-ipv6: code indentation
  netfilter-ipv6: whitespace around operators
  netfilter-ipv6: ternary operator layout
  netfilter-ipv6: improve if statements
  netfilter-ipv6: pointer cast layout

 net/ipv6/netfilter/ip6_tables.c| 20 ++--
 net/ipv6/netfilter/ip6t_SYNPROXY.c |  2 +-
 net/ipv6/netfilter/nf_conntrack_proto_icmpv6.c |  4 ++--
 net/ipv6/netfilter/nf_conntrack_reasm.c|  4 ++--
 net/ipv6/netfilter/nf_reject_ipv6.c|  4 ++--
 net/ipv6/netfilter/nft_chain_route_ipv6.c  |  6 +++---
 6 files changed, 20 insertions(+), 20 deletions(-)

-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


e1000_driver_checkpatch_remove_errors_warnings_checks

2015-10-11 Thread Janusz Wolak

From daf0a1f5100c21f10b9e08829433258267748c44 Mon Sep 17 00:00:00 2001
From: Janusz Wolak 
Date: Tue, 6 Oct 2015 21:03:19 +0200
Subject: [PATCH 1/6] Remove checkpatch warnings and checks.

Signed-off-by: Janusz Wolak 
---
 drivers/net/ethernet/intel/e1000/e1000_main.c | 145 
++

 1 file changed, 76 insertions(+), 69 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c 
b/drivers/net/ethernet/intel/e1000/e1000_main.c

index e0c3c14..583a094 100644
--- a/drivers/net/ethernet/intel/e1000/e1000_main.c
+++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
@@ -1,5 +1,5 @@

/***
-
+*
   Intel PRO/1000 Linux driver
   Copyright(c) 1999 - 2006 Intel Corporation.

@@ -34,7 +34,7 @@
 #include 

 char e1000_driver_name[] = "e1000";
-static char e1000_driver_string[] = "Intel(R) PRO/1000 Network Driver";
+static const char e1000_driver_string[] = "Intel(R) PRO/1000 Network 
Driver";

 #define DRV_VERSION "7.3.21-k8-NAPI"
 const char e1000_driver_version[] = DRV_VERSION;
 static const char e1000_copyright[] = "Copyright (c) 1999-2006 Intel 
Corporation.";
@@ -149,6 +149,7 @@ static void e1000_alloc_dummy_rx_buffers(struct 
e1000_adapter *adapter,

 int cleaned_count)
 {
 }
+
 static void e1000_alloc_rx_buffers(struct e1000_adapter *adapter,
   struct e1000_rx_ring *rx_ring,
   int cleaned_count);
@@ -185,14 +186,14 @@ static void e1000_shutdown(struct pci_dev *pdev);

 #ifdef CONFIG_NET_POLL_CONTROLLER
 /* for netdump / net console */
-static void e1000_netpoll (struct net_device *netdev);
+static void e1000_netpoll(struct net_device *netdev);
 #endif

 #define COPYBREAK_DEFAULT 256
 static unsigned int copybreak __read_mostly = COPYBREAK_DEFAULT;
 module_param(copybreak, uint, 0644);
 MODULE_PARM_DESC(copybreak,
-   "Maximum size of packet that is copied to a new buffer on receive");
+"Maximum size of packet that is copied to a new buffer on 
receive");

 static pci_ers_result_t e1000_io_error_detected(struct pci_dev *pdev,
pci_channel_state_t state);
@@ -224,7 +225,7 @@ MODULE_DESCRIPTION("Intel(R) PRO/1000 Network Driver");
 MODULE_LICENSE("GPL");
 MODULE_VERSION(DRV_VERSION);

-#define DEFAULT_MSG_ENABLE (NETIF_MSG_DRV|NETIF_MSG_PROBE|NETIF_MSG_LINK)
+#define DEFAULT_MSG_ENABLE (NETIF_MSG_DRV | NETIF_MSG_PROBE | 
NETIF_MSG_LINK)

 static int debug = -1;
 module_param(debug, int, 0);
 MODULE_PARM_DESC(debug, "Debug level (0=none,...,16=all)");
@@ -237,6 +238,7 @@ MODULE_PARM_DESC(debug, "Debug level 
(0=none,...,16=all)");

 struct net_device *e1000_get_hw_dev(struct e1000_hw *hw)
 {
struct e1000_adapter *adapter = hw->back;
+
return adapter->netdev;
 }

@@ -249,6 +251,7 @@ struct net_device *e1000_get_hw_dev(struct e1000_hw *hw)
 static int __init e1000_init_module(void)
 {
int ret;
+
pr_info("%s - version %s\n", e1000_driver_string, e1000_driver_version);

pr_info("%s\n", e1000_copyright);
@@ -258,8 +261,7 @@ static int __init e1000_init_module(void)
if (copybreak == 0)
pr_info("copybreak disabled\n");
else
-   pr_info("copybreak enabled for "
-  "packets <= %u bytes\n", copybreak);
+   pr_info("copybreak enabled for packets <= %u bytes\n", 
copybreak);
}
return ret;
 }
@@ -288,9 +290,8 @@ static int e1000_request_irq(struct e1000_adapter 
*adapter)


err = request_irq(adapter->pdev->irq, handler, irq_flags, netdev->name,
  netdev);
-   if (err) {
+   if (err)
e_err(probe, "Unable to allocate interrupt Error: %d\n", err);
-   }

return err;
 }
@@ -406,6 +407,7 @@ static void e1000_configure(struct e1000_adapter 
*adapter)

 */
for (i = 0; i < adapter->num_rx_queues; i++) {
struct e1000_rx_ring *ring = >rx_ring[i];
+
adapter->alloc_rx_buf(adapter, ring,
  E1000_DESC_UNUSED(ring));
}
@@ -466,7 +468,7 @@ static void e1000_power_down_phy(struct 
e1000_adapter *adapter)

 * (c) SoL/IDER session is active
 */
if (!adapter->wol && hw->mac_type >= e1000_82540 &&
-  hw->media_type == e1000_media_type_copper) {
+   hw->media_type == e1000_media_type_copper) {
u16 mii_reg = 0;

switch (hw->mac_type) {
@@ -501,8 +503,7 @@ static void e1000_down_and_stop(struct e1000_adapter 
*adapter)


cancel_delayed_work_sync(>watchdog_task);

-   /*
-* Since the watchdog task can reschedule other tasks, we should cancel
+   /* Since the watchdog task can reschedule other tasks, we 

[PATCH net-next] RDS-TCP: Reset tcp callbacks if re-using an outgoing socket in rds_tcp_accept_one()

2015-10-11 Thread Sowmini Varadhan

Consider the following "duelling syn" sequence between two peers A and B:
A   B
SYN1 -->
<-- SYN2
SYN2ACK  -->

Note that the SYN/ACK has already been sent out by TCP before
rds_tcp_accept_one() gets invoked as part of callbacks.

If the inet_addr(A) is numerically less than inet_addr(B),
the arbitration scheme in rds_tcp_accept_one() will prefer the
TCP connection triggered by SYN1, and will send a CLOSE for the
SYN2 (just after the SYN2ACK was sent).

Since B also follows the same arbitration scheme, it will send the SYN-ACK
for SYN1 that will set up a healthy ESTABLISHED connection on both sides.
B will also get a  CLOSE for SYN2, which should result in the cleanup
of the TCP state machine for SYN2, but it should not trigger any
stale RDS-TCP callbacks (such as ->writespace, ->state_change etc),
that would disrupt the progress of the SYN2 based RDS-TCP  connection.

Thus the arbitration scheme in rds_tcp_accept_one() should restore
rds_tcp callbacks for the winner before setting them up for the
new accept socket, and also make sure that conn->c_outgoing
is set to 0 so that we do not trigger any reconnect attempts on the
passive side of the tcp socket in the future, in conformance with
commit c82ac7e69efe ("net/rds: RDS-TCP: only initiate reconnect attempt
on outgoing TCP socket.")

Signed-off-by: Sowmini Varadhan 
---
 net/rds/tcp_listen.c |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/net/rds/tcp_listen.c b/net/rds/tcp_listen.c
index 1d90240..0936a4a 100644
--- a/net/rds/tcp_listen.c
+++ b/net/rds/tcp_listen.c
@@ -125,6 +125,9 @@ int rds_tcp_accept_one(struct socket *sock)
new_sock = NULL;
ret = 0;
goto out;
+   } else if (rs_tcp->t_sock) {
+   rds_tcp_restore_callbacks(rs_tcp->t_sock, rs_tcp);
+   conn->c_outgoing = 0;
}
 
rds_conn_transition(conn, RDS_CONN_DOWN, RDS_CONN_CONNECTING);
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] icmp: Fixed bug in raw sockets causing incorrect ICMP SNMP counter values

2015-10-11 Thread Ben Cox
Forgive me for possibly being a little stupid here (This is my first
patch to Linux so I am slightly over my head)

Is this issue not addressed above the file where the following check is done?

if (iphlen > length)
goto error_free;



On Sun, Oct 11, 2015 at 11:10 PM, Eric Dumazet  wrote:
> On Sun, 2015-10-11 at 20:55 +, Ben Cartwright-Cox wrote:
>> Sending ICMP packets with raw sockets ends up in the SNMP counters
>> logging the type as the first byte of the IPv4 header rather than
>> the ICMP header (in nearly all cases this is seen as "OutType69".
>> This is fixed by adding the IP Header Length to the casting into
>> a icmphdr struct.
>>
>> Signed-off-by: Ben Cartwright-Cox 
>> ---
>>  net/ipv4/raw.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
>> index 561cd4b..1ad8bae 100644
>> --- a/net/ipv4/raw.c
>> +++ b/net/ipv4/raw.c
>> @@ -409,7 +409,7 @@ static int raw_send_hdrinc(struct sock *sk, struct 
>> flowi4 *fl4,
>>   }
>>   if (iph->protocol == IPPROTO_ICMP)
>>   icmp_out_count(net, ((struct icmphdr *)
>> - skb_transport_header(skb))->type);
>> + skb_transport_header(skb) + iphlen)->type);
>>
>>   err = NF_HOOK(NFPROTO_IPV4, NF_INET_LOCAL_OUT, sk, skb,
>> NULL, rt->dst.dev, dst_output_sk);
>
>
> Hmm... This seems to lack checks against a malicious user ?
>
> The only guarantee you have here is that iphlen < length.
>
> It is not enough.
>
> Make sure you do not access not initialized memory or even non existent
> one.
>
>
>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next v3 3/4] bridge: push bridge setting ageing_time down to switchdev

2015-10-11 Thread Vivien Didelot
On Oct. Saturday 10 (41) 05:07 PM, Florian Fainelli wrote:
> 2015-10-10 15:41 GMT-07:00 Vivien Didelot 
> :
> > On Oct. Saturday 10 (41) 11:09 PM, Jiri Pirko wrote:
> >> Sat, Oct 10, 2015 at 05:56:19PM CEST, vivien.dide...@savoirfairelinux.com 
> >> wrote:
> >> >On Oct. Saturday 10 (41) 09:04 AM, Jiri Pirko wrote:
> >> >> Sat, Oct 10, 2015 at 04:53:52AM CEST, sfel...@gmail.com wrote:
> >> >> >On Thu, Oct 8, 2015 at 9:38 PM, Premkumar Jonnala 
> >> >> > wrote:
> >> >> >>
> >> >> >>
> >> >> >>> -Original Message-
> >> >> >>> From: sfel...@gmail.com [mailto:sfel...@gmail.com]
> >> >> >>> Sent: Friday, October 09, 2015 7:53 AM
> >> >> >>> To: netdev@vger.kernel.org
> >> >> >>> Cc: da...@davemloft.net; j...@resnulli.us; 
> >> >> >>> siva.mannem@gmail.com;
> >> >> >>> Premkumar Jonnala; step...@networkplumber.org;
> >> >> >>> ro...@cumulusnetworks.com; and...@lunn.ch; f.faine...@gmail.com;
> >> >> >>> vivien.dide...@savoirfairelinux.com
> >> >> >>> Subject: [PATCH net-next v3 3/4] bridge: push bridge setting 
> >> >> >>> ageing_time down
> >> >> >>> to switchdev
> >> >> >>>
> >> >> >>> From: Scott Feldman 
> >> >> >>>
> >> >> >>> Use SWITCHDEV_F_SKIP_EOPNOTSUPP to skip over ports in bridge that 
> >> >> >>> don't
> >> >> >>> support setting ageing_time (or setting bridge attrs in general).
> >> >> >>>
> >> >> >>> If push fails, don't update ageing_time in bridge and return err to 
> >> >> >>> user.
> >> >> >>>
> >> >> >>> If push succeeds, update ageing_time in bridge and run gc_timer now 
> >> >> >>> to
> >> >> >>> recalabrate when to run gc_timer next, based on new ageing_time.
> >> >> >>>
> >> >> >>> Signed-off-by: Scott Feldman 
> >> >> >>> Signed-off-by: Jiri Pirko 
> >> >> >
> >> >> >
> >> >> >
> >> >> >>> +int br_set_ageing_time(struct net_bridge *br, u32 ageing_time)
> >> >> >>> +{
> >> >> >>> + struct switchdev_attr attr = {
> >> >> >>> + .id = SWITCHDEV_ATTR_ID_BRIDGE_AGEING_TIME,
> >> >> >>> + .flags = SWITCHDEV_F_SKIP_EOPNOTSUPP,
> >> >> >>> + .u.ageing_time = ageing_time,
> >> >> >>> + };
> >> >> >>> + unsigned long t = clock_t_to_jiffies(ageing_time);
> >> >> >>> + int err;
> >> >> >>> +
> >> >> >>> + if (t < BR_MIN_AGEING_TIME || t > BR_MAX_AGEING_TIME)
> >> >> >>> + return -ERANGE;
> >> >> >>> +
> >> >> >>> + err = switchdev_port_attr_set(br->dev, );
> >> >> >>
> >> >> >> A thought - given that the ageing time is not a per-bridge-port 
> >> >> >> attr, why are we using a "port based api"
> >> >> >> to pass the attribute down?  May be I'm missing something here?
> >> >> >
> >> >> >I think Florian raised the same point earlier.  Sigh, I think this
> >> >> >should be addressedv4 coming soon...thanks guys for keeping the
> >> >> >standard high.
> >> >>
> >> >> Scott, can you tell us how do you want to address this? I like the
> >> >> current implementation.
> >> >
> >> >Scott, didn't you have a plan to add a struct device for the parent of
> >> >switchdev ports?
> >> >
> >> >I think it would be good to introduce such device with an helper to
> >> >retrieve this upper parent, and move the switchdev ops to this guy.
> >> >
> >> >So switchdev drivers may implement such API calls:
> >> >
> >> >.obj_add(struct device *swdev, struct switchdev_obj *obj);
> >> >
> >> >.port_obj_add(struct device *swdev, struct net_device *port,
> >> >  struct switchdev_obj *obj);
> >> >
> >> >Then switchdev code may have a parent API and the current port API may
> >> >look like this:
> >> >
> >> >int switchdev_port_obj_add(struct net_device *dev,
> >> >   struct switchdev_obj *obj)
> >> >{
> >> >struct device *swdev = switchdev_get_parent(dev);
> >> >int err = -EOPNOTSUPP;
> >> >
> >> >if (swdev && swdev->switchdev_ops &&
> >> >swdev->switchdev_ops->port_obj_add)
> >> >err = swdev->switchdev_ops->port_obj_add(swdev, dev, obj);
> >> >
> >> >return err;
> >> >}
> >>
> >> Fro the record, I don't see any reason for this "device". It would just
> >> introduce unnecessary complexicity. So far, we are fine without it.
> >
> > I wouldn't say that. I beleive that an Ethernet switch deserves its
> > struct device in the tree, since it is a physical chip, like any other.
> 
> Agreed, but gating these patches because we do not yet have a device
> driver model for an Ethernet switch outside of its individual ports
> does not seem like it hurts the current patch series, nor the existing
> model (and future).

Sure, I didn't mean to NAK the patch with this comment, I just wrote it
because we raised a concern about an API higher than the port level.

> >
> > Configuring it through one of its port (net_device) is fine, since you
> > want to change the port behavior, and Linux config is on per-port basis.
> >
> > But the 

Re: switchdev and VLAN ranges

2015-10-11 Thread Vivien Didelot
On Oct. Sunday 11 (41) 09:12 AM, Jiri Pirko wrote:
> Sat, Oct 10, 2015 at 12:36:26PM CEST, niko...@cumulusnetworks.com wrote:
> >On 10/10/2015 09:49 AM, Elad Raz wrote:
> >> 
> >>> On Oct 10, 2015, at 2:30 AM, Vivien Didelot 
> >>>  wrote:
> >>>
> >>> I have two concerns in mind:
> >>>
> >>> a) if we imagine that drivers like Rocker allocate memory in the prepare
> >>> phase for each VID, preparing a range like 100-4000 would definitely not
> >>> be recommended.
> >>>
> >>> b) imagine that you have two Linux bridges on a switch, one using the
> >>> hardware VLAN 100. If you request the VLAN range 99-101 for the other
> >>> bridge members, it is not possible for the driver to say "I can
> >>> accelerate VLAN 99 and 101, but not 100". It must return OPNOTSUPP for
> >>> the whole range.
> >> 
> >> Another concern I have with vid_being..vid_end range is the “flags”. Where 
> >> flags can be BRIDGE_VLAN_INFO_PVID.
> >> There is no sense having more than one VLAN as a PVID.
> >> This leave the HW vendor the choice which VLAN id they will use as the 
> >> PVID.
> >> 
> >
> >iproute2 doesn't allow to do it but I can see that someone can actually make 
> >it
> >so the flags for the range have it and it doesn't look correct. Perhaps we 
> >need
> >something like the patch below to enforce this from kernel-side.
> >
> >
> >diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
> >index d78b4429505a..02b17b53e9a6 100644
> >--- a/net/bridge/br_netlink.c
> >+++ b/net/bridge/br_netlink.c
> >@@ -524,6 +524,9 @@ static int br_afspec(struct net_bridge *br,
> > if (vinfo_start)
> > return -EINVAL;
> > vinfo_start = vinfo;
> >+/* don't allow range of pvids */
> >+if (vinfo_start->flags & BRIDGE_VLAN_INFO_PVID)
> >+return -EINVAL;
> > continue;
> > }
> > 
> 
> Looks correct to me. Could you please submit this properly? Thanks!

The above patch is correct, but we only solve part of the problem, since
the range and bridge flags are exposed by switchdev_obj_port_vlan as is.

Thanks,
-v
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] icmp: Fixed bug in raw sockets causing incorrect ICMP SNMP counter values

2015-10-11 Thread Eric Dumazet
On Sun, 2015-10-11 at 15:43 -0700, Eric Dumazet wrote:

> But your code reads 21th byte.

BTW, nice catch !

Your patch only need a small addition.

Thanks.


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] icmp: Fixed bug in raw sockets causing incorrect ICMP SNMP counter values

2015-10-11 Thread Eric Dumazet
On Sun, 2015-10-11 at 23:17 +0100, Ben Cox wrote:
> Forgive me for possibly being a little stupid here (This is my first
> patch to Linux so I am slightly over my head)
> 
> Is this issue not addressed above the file where the following check is done?
> 
> if (iphlen > length)
> goto error_free;
> 

Imagine someone sends a frame, pretending it is ICMP, but containing
only the IPv4 header. And not a _single_ byte more.

length = 20

iphlen = 20 (if say ihl == 5)

We copied 20 bytes from user land.

But your code reads 21th byte.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] icmp: Fixed bug in raw sockets causing incorrect ICMP SNMP counter values

2015-10-11 Thread Eric Dumazet
On Sun, 2015-10-11 at 15:44 -0700, Eric Dumazet wrote:
> On Sun, 2015-10-11 at 15:43 -0700, Eric Dumazet wrote:
> 
> > But your code reads 21th byte.
> 
> BTW, nice catch !

Maybe the following one.

1) We properly set transport header
2) We use icmp_hdr() helper.

diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index 561cd4b8fc6e..ffe25cd1f0e0 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -406,11 +406,11 @@ static int raw_send_hdrinc(struct sock *sk, struct flowi4 
*fl4,
ip_select_ident(net, skb, NULL);
 
iph->check = ip_fast_csum((unsigned char *)iph, iph->ihl);
+   skb->transport_header += iphlen;
+   if (iph->protocol == IPPROTO_ICMP &&
+   length >= iphlen + sizeof(struct icmphdr))
+   icmp_out_count(net, icmp_hdr(skb)->type);
}
-   if (iph->protocol == IPPROTO_ICMP)
-   icmp_out_count(net, ((struct icmphdr *)
-   skb_transport_header(skb))->type);
-
err = NF_HOOK(NFPROTO_IPV4, NF_INET_LOCAL_OUT, sk, skb,
  NULL, rt->dst.dev, dst_output_sk);
if (err > 0)


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] icmp: Fixed bug in raw sockets causing incorrect ICMP SNMP counter values

2015-10-11 Thread Ben Cox
Nice!

That works in my head at least, Sorry about not seeing that fairly
glaring memory issue there.

Are you sure " skb->transport_header += iphlen; " won't have a knock
on affect when it's given to NF_HOOK ( as in, would a potential
userspace program get something it does not expect anymore ) ?

How does submission work at this point if the above is not a issue
(apologies if this is already in a FAQ somewhere I missed)?

On Mon, Oct 12, 2015 at 12:09 AM, Eric Dumazet  wrote:
> On Sun, 2015-10-11 at 15:44 -0700, Eric Dumazet wrote:
>> On Sun, 2015-10-11 at 15:43 -0700, Eric Dumazet wrote:
>>
>> > But your code reads 21th byte.
>>
>> BTW, nice catch !
>
> Maybe the following one.
>
> 1) We properly set transport header
> 2) We use icmp_hdr() helper.
>
> diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
> index 561cd4b8fc6e..ffe25cd1f0e0 100644
> --- a/net/ipv4/raw.c
> +++ b/net/ipv4/raw.c
> @@ -406,11 +406,11 @@ static int raw_send_hdrinc(struct sock *sk, struct 
> flowi4 *fl4,
> ip_select_ident(net, skb, NULL);
>
> iph->check = ip_fast_csum((unsigned char *)iph, iph->ihl);
> +   skb->transport_header += iphlen;
> +   if (iph->protocol == IPPROTO_ICMP &&
> +   length >= iphlen + sizeof(struct icmphdr))
> +   icmp_out_count(net, icmp_hdr(skb)->type);
> }
> -   if (iph->protocol == IPPROTO_ICMP)
> -   icmp_out_count(net, ((struct icmphdr *)
> -   skb_transport_header(skb))->type);
> -
> err = NF_HOOK(NFPROTO_IPV4, NF_INET_LOCAL_OUT, sk, skb,
>   NULL, rt->dst.dev, dst_output_sk);
> if (err > 0)
>
>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next v3 2/2] mpls: flow-based multipath selection

2015-10-11 Thread Eric W. Biederman
Roopa Prabhu  writes:

> From: Robert Shearman 
>
> Change the selection of a multipath route to use a flow-based
> hash. This more suitable for traffic sensitive to reordering within a
> flow (e.g. TCP, L2VPN) and whilst still allowing a good distribution
> of traffic given enough flows.
>
> Selection of the path for a multipath route is done using a hash of:
> 1. Label stack up to MAX_MP_SELECT_LABELS labels or up to and
>including entropy label, whichever is first.
> 2. 3-tuple of (L3 src, L3 dst, proto) from IPv4/IPv6 header in MPLS
>payload, if present.
>
> Naturally, a 5-tuple hash using L4 information in addition would be
> possible and be better in some scenarios, but there is a tradeoff
> between looking deeper into the packet to achieve good distribution,
> and packet forwarding performance, and I have erred on the side of the
> latter as the default.

Not a fault with this patch, but this patches use of entropy labels
does highlight that we don't handle creating entropy labels on ingress
nor dealing with entropy labels on egress.

> Signed-off-by: Robert Shearman 
> ---
>  net/mpls/af_mpls.c | 88 
> ++
>  1 file changed, 83 insertions(+), 5 deletions(-)
>
> diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
> index 4d819df..15dd2eb 100644
> --- a/net/mpls/af_mpls.c
> +++ b/net/mpls/af_mpls.c
> @@ -22,6 +22,11 @@
>  #include 
>  #include "internal.h"
>  
> +/* Maximum number of labels to look ahead at when selecting a path of
> + * a multipath route
> + */
> +#define MAX_MP_SELECT_LABELS 4

This number seems a little small.  Especially given that an entopy label
and the entropy label indicator consume two of these.  Although I
suspect 4 is enough for most cases in practice.

> +
>  static int zero = 0;
>  static int label_limit = (1 << 20) - 1;
>  
> @@ -77,10 +82,78 @@ bool mpls_pkt_too_big(const struct sk_buff *skb, unsigned 
> int mtu)
>  }
>  EXPORT_SYMBOL_GPL(mpls_pkt_too_big);
>  
> -static struct mpls_nh *mpls_select_multipath(struct mpls_route *rt)
> +static struct mpls_nh *mpls_select_multipath(struct mpls_route *rt,
> +  struct sk_buff *skb, bool bos)
>  {
> - /* assume single nexthop for now */
> - return >rt_nh[0];
> + struct mpls_entry_decoded dec;
> + struct mpls_shim_hdr *hdr;
> + bool eli_seen = false;
> + int label_index;
> + int nh_index = 0;
> + u32 hash = 0;
> +
> + /* No need to look further into packet if there's only
> +  * one path
> +  */
> + if (rt->rt_nhn == 1)
> + goto out;
> +
> + for (label_index = 0; label_index < MAX_MP_SELECT_LABELS && !bos;
> +  label_index++) {
> + if (!pskb_may_pull(skb, sizeof(*hdr) * label_index))
> + break;
> +
> + /* Read and decode the current label */
> + hdr = mpls_hdr(skb) + label_index;
> + dec = mpls_entry_decode(hdr);
> +
> + /* RFC6790 - reserved labels MUST NOT be used as keys
> +  * for the load-balancing function
> +  */
> + if (dec.label == MPLS_LABEL_ENTROPY) {
> + eli_seen = true;
> + } else if (dec.label >= MPLS_LABEL_FIRST_UNRESERVED) {

We should probably test this first and mark this branch as likely.

> + hash = jhash_1word(dec.label, hash);
> +
> + /* The entropy label follows the entropy label
> +  * indicator, so this means that the entropy
> +  * label was just added to the hash - no need to
> +  * go any deeper either in the label stack or in the
> +  * payload
> +  */
> + if (eli_seen)
> + break;
> + }
> +
> + bos = dec.bos;
> + if (bos && pskb_may_pull(skb, sizeof(*hdr) * label_index +
> +  sizeof(struct iphdr))) {
> + const struct iphdr *v4hdr;
> +
> + v4hdr = (const struct iphdr *)(mpls_hdr(skb) +
> +label_index);
> + if (v4hdr->version == 4) {
> + hash = jhash_3words(ntohl(v4hdr->saddr),
> + ntohl(v4hdr->daddr),
> + v4hdr->protocol, hash);
> + } else if (v4hdr->version == 6 &&
> + pskb_may_pull(skb, sizeof(*hdr) * label_index +
> +   sizeof(struct ipv6hdr))) {
> + const struct ipv6hdr *v6hdr;
> +
> + v6hdr = (const struct ipv6hdr *)(mpls_hdr(skb) +
> +

Re: [PATCH] net: nhs: fix 32-bit build warning

2015-10-11 Thread Joe Perches
On Tue, 2015-10-06 at 23:53 +0200, Arnd Bergmann wrote:
> diff --git a/drivers/net/ethernet/hisilicon/hns/hnae.c 
> b/drivers/net/ethernet/hisilicon/hns/hnae.c
[]
> @@ -448,12 +448,12 @@ static ssize_t handles_show(struct device *dev,
>   s += sprintf(buf + s, "handle %d (eport_id=%u from %s):\n",
>   i++, h->eport_id, h->dev->name);
>   for (j = 0; j < h->q_num; j++) {
> - s += sprintf(buf + s, "\tqueue[%d] on 0x%llx\n",
> -  j, (u64)h->qs[i]->io_base);
> -#define HANDEL_TX_MSG "\t\ttx_ring on 0x%llx:%u,%u,%u,%u,%u,%llu,%llu\n"
> + s += sprintf(buf + s, "\tqueue[%d] on %p\n",
> +  j, h->qs[i]->io_base);
> +#define HANDEL_TX_MSG "\t\ttx_ring on %p:%u,%u,%u,%u,%u,%llu,%llu\n"
>   s += sprintf(buf + s,
>HANDEL_TX_MSG,

Maybe one day remove the misspelled and
used once HANDEL_TX_MSG macro too.

Another possibility is to use a generally
smaller object code style like:

char *s = buf;
...
s += sprintf(s, ...)
...
return s - buf;

instead of

ssize_t s = 0;
...
s += sprintf(buf + s, ...)
...
return s;
---
 drivers/net/ethernet/hisilicon/hns/hnae.c | 35 +++
 1 file changed, 17 insertions(+), 18 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns/hnae.c 
b/drivers/net/ethernet/hisilicon/hns/hnae.c
index f52e99a..d2af46f 100644
--- a/drivers/net/ethernet/hisilicon/hns/hnae.c
+++ b/drivers/net/ethernet/hisilicon/hns/hnae.c
@@ -439,40 +439,39 @@ EXPORT_SYMBOL(hnae_ae_unregister);
 static ssize_t handles_show(struct device *dev,
struct device_attribute *attr, char *buf)
 {
-   ssize_t s = 0;
+   char *s = buf;
struct hnae_ae_dev *hdev = cls_to_ae_dev(dev);
struct hnae_handle *h;
int i = 0, j;
 
list_for_each_entry_rcu(h, >handle_list, node) {
-   s += sprintf(buf + s, "handle %d (eport_id=%u from %s):\n",
-   i++, h->eport_id, h->dev->name);
+   s += sprintf(s, "handle %d (eport_id=%u from %s):\n",
+i++, h->eport_id, h->dev->name);
for (j = 0; j < h->q_num; j++) {
-   s += sprintf(buf + s, "\tqueue[%d] on %p\n",
+   s += sprintf(s, "\tqueue[%d] on %p\n",
 j, h->qs[i]->io_base);
-#define HANDEL_TX_MSG "\t\ttx_ring on %p:%u,%u,%u,%u,%u,%llu,%llu\n"
-   s += sprintf(buf + s,
-HANDEL_TX_MSG,
+   s += sprintf(s,
+"\t\ttx_ring on 
%p:%u,%u,%u,%u,%u,%llu,%llu\n",
 h->qs[i]->tx_ring.io_base,
 h->qs[i]->tx_ring.buf_size,
 h->qs[i]->tx_ring.desc_num,
 h->qs[i]->tx_ring.max_desc_num_per_pkt,
 h->qs[i]->tx_ring.max_raw_data_sz_per_desc,
 h->qs[i]->tx_ring.max_pkt_size,
-h->qs[i]->tx_ring.stats.sw_err_cnt,
-h->qs[i]->tx_ring.stats.io_err_cnt);
-   s += sprintf(buf + s,
-   "\t\trx_ring on %p:%u,%u,%llu,%llu,%llu\n",
-   h->qs[i]->rx_ring.io_base,
-   h->qs[i]->rx_ring.buf_size,
-   h->qs[i]->rx_ring.desc_num,
-   h->qs[i]->rx_ring.stats.sw_err_cnt,
-   h->qs[i]->rx_ring.stats.io_err_cnt,
-   h->qs[i]->rx_ring.stats.seg_pkt_cnt);
+h->qs[i]->tx_ring.stats.sw_err_cnt,
+h->qs[i]->tx_ring.stats.io_err_cnt);
+   s += sprintf(s,
+"\t\trx_ring on %p:%u,%u,%llu,%llu,%llu\n",
+h->qs[i]->rx_ring.io_base,
+h->qs[i]->rx_ring.buf_size,
+h->qs[i]->rx_ring.desc_num,
+h->qs[i]->rx_ring.stats.sw_err_cnt,
+h->qs[i]->rx_ring.stats.io_err_cnt,
+h->qs[i]->rx_ring.stats.seg_pkt_cnt);
}
}
 
-   return s;
+   return s - buf;
 }
 
 static DEVICE_ATTR_RO(handles);


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH net-next] RDS: Invoke ->laddr_check() in rds_bind() for explicitly bound transports.

2015-10-11 Thread Sowmini Varadhan

The IP address passed to rds_bind() should be vetted by the
transport's ->laddr_check() for a previously bound transport.
This needs to be done to avoid cases where, for example,
the application has asked for an IB transport,
but the IP address passed to bind is only usable on
ethernet interfaces.

Signed-off-by: Sowmini Varadhan 
---
 net/rds/bind.c |9 -
 1 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/net/rds/bind.c b/net/rds/bind.c
index bc6b93e..6192566 100644
--- a/net/rds/bind.c
+++ b/net/rds/bind.c
@@ -196,7 +196,14 @@ int rds_bind(struct socket *sock, struct sockaddr *uaddr, 
int addr_len)
goto out;
 
if (rs->rs_transport) { /* previously bound */
-   ret = 0;
+   trans = rs->rs_transport;
+   if (trans->laddr_check(sock_net(sock->sk),
+  sin->sin_addr.s_addr) != 0) {
+   ret = -ENOPROTOOPT;
+   rds_remove_bound(rs);
+   } else {
+   ret = 0;
+   }
goto out;
}
trans = rds_trans_get_preferred(sock_net(sock->sk),
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH iproute2 v2] bridge: add batch command support

2015-10-11 Thread Roopa Prabhu
From: Wilson Kok  

This patch adds support to batch bridge commands.
Follows ip batch code.

Signed-off-by: Wilson Kok 
Signed-off-by: Roopa Prabhu 
Acked-by: Christophe Gouault 
---
v2 - change tab to space in usage as pointed out by Christophe Gouault

 bridge/bridge.c   | 59 +++
 man/man8/bridge.8 | 11 +++
 2 files changed, 70 insertions(+)

diff --git a/bridge/bridge.c b/bridge/bridge.c
index eaf09c8..72f153f 100644
--- a/bridge/bridge.c
+++ b/bridge/bridge.c
@@ -9,6 +9,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "SNAPSHOT.h"
 #include "utils.h"
@@ -23,6 +24,8 @@ int show_stats;
 int show_details;
 int compress_vlans;
 int timestamp;
+char *batch_file;
+int force;
 const char *_SL_;
 
 static void usage(void) __attribute__((noreturn));
@@ -31,6 +34,7 @@ static void usage(void)
 {
fprintf(stderr,
 "Usage: bridge [ OPTIONS ] OBJECT { COMMAND | help }\n"
+"   bridge [ -force ] -batch filename\n"
 "where OBJECT := { link | fdb | mdb | vlan | monitor }\n"
 "  OPTIONS := { -V[ersion] | -s[tatistics] | -d[etails] |\n"
 "   -o[neline] | -t[imestamp] | -n[etns] name |\n"
@@ -71,6 +75,50 @@ static int do_cmd(const char *argv0, int argc, char **argv)
return -1;
 }
 
+static int batch(const char *name)
+{
+   char *line = NULL;
+   size_t len = 0;
+   int ret = EXIT_SUCCESS;
+
+   if (name && strcmp(name, "-") != 0) {
+   if (freopen(name, "r", stdin) == NULL) {
+   fprintf(stderr,
+   "Cannot open file \"%s\" for reading: %s\n",
+   name, strerror(errno));
+   return EXIT_FAILURE;
+   }
+   }
+
+   if (rtnl_open(, 0) < 0) {
+   fprintf(stderr, "Cannot open rtnetlink\n");
+   return EXIT_FAILURE;
+   }
+
+   cmdlineno = 0;
+   while (getcmdline(, , stdin) != -1) {
+   char *largv[100];
+   int largc;
+
+   largc = makeargs(line, largv, 100);
+   if (largc == 0)
+   continue;   /* blank line */
+
+   if (do_cmd(largv[0], largc, largv)) {
+   fprintf(stderr, "Command failed %s:%d\n",
+   name, cmdlineno);
+   ret = EXIT_FAILURE;
+   if (!force)
+   break;
+   }
+   }
+   if (line)
+   free(line);
+
+   rtnl_close();
+   return ret;
+}
+
 int
 main(int argc, char **argv)
 {
@@ -123,6 +171,14 @@ main(int argc, char **argv)
exit(-1);
} else if (matches(opt, "-compressvlans") == 0) {
++compress_vlans;
+   } else if (matches(opt, "-force") == 0) {
+   ++force;
+   } else if (matches(opt, "-batch") == 0) {
+   argc--;
+   argv++;
+   if (argc <= 1)
+   usage();
+   batch_file = argv[1];
} else {
fprintf(stderr,
"Option \"%s\" is unknown, try \"bridge 
help\".\n",
@@ -134,6 +190,9 @@ main(int argc, char **argv)
 
_SL_ = oneline ? "\\" : "\n";
 
+   if (batch_file)
+   return batch(batch_file);
+
if (rtnl_open(, 0) < 0)
exit(1);
 
diff --git a/man/man8/bridge.8 b/man/man8/bridge.8
index 5347a56..d45c728 100644
--- a/man/man8/bridge.8
+++ b/man/man8/bridge.8
@@ -21,6 +21,7 @@ bridge \- show / manipulate bridge addresses and devices
 \fB\-V\fR[\fIersion\fR] |
 \fB\-s\fR[\fItatistics\fR] |
 \fB\-n\fR[\fIetns\fR] name }
+\fB\-b\fR[\fIatch\fR] filename }
 
 .ti -8
 .BR "bridge link set"
@@ -137,6 +138,16 @@ to
 .RI "-n[etns] " NETNS " [ " OPTIONS " ] " OBJECT " { " COMMAND " | "
 .BR help " }"
 
+.TP
+.BR "\-b", " \-batch " 
+Read commands from provided file or standard input and invoke them.
+First failure will cause termination of bridge command.
+
+.TP
+.BR "\-force"
+Don't terminate bridge command on errors in batch mode.
+If there were any errors during execution of the commands, the application
+return code will be non zero.
 
 .SH BRIDGE - COMMAND SYNTAX
 
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next v3 1/2] mpls: multipath route support

2015-10-11 Thread roopa
On 10/11/15, 1:41 PM, Eric W. Biederman wrote:
> Roopa Prabhu  writes:
>
>> From: Roopa Prabhu 
>>
>> This patch adds support for MPLS multipath routes.
>>
>> Includes following changes to support multipath:
>> - splits struct mpls_route into 'struct mpls_route + struct mpls_nh'
>>
>> - 'struct mpls_nh' represents a mpls nexthop label forwarding entry
>>
>> - moves mpls route and nexthop structures into internal.h
>>
>> - A mpls_route can point to multiple mpls_nh structs
>>
>> - the nexthops are maintained as a array (similar to ipv4 fib)
>>
>> - In the process of restructuring, this patch also consistently changes
>>   all labels to u8
>>
>> - Adds support to parse/fill RTA_MULTIPATH netlink attribute for
>> multipath routes similar to ipv4/v6 fib
>>
>> - In this patch, the multipath route nexthop selection algorithm
>> simply returns the first nexthop. It is replaced by a
>> hash based algorithm from Robert Shearman in the next patch
>>
>> - mpls_route_update cleanup: remove 'dev' handling in mpls_route_update.
>> mpls_route_update though implemented to update based on dev, it was
>> never used that way. And the dev handling gets tricky with multiple nexthops.
>> Cannot match against any single nexthops dev. So, this patch removes the 
>> unused
>> 'dev' handling in mpls_route_update.
> Ugh.  Looking at this patch I have found if not a bug a misfeature.
>
> The existing code has a function:
>
>> bool mpls_output_possible(const struct net_device *dev)
>> {
>>  return dev && (dev->flags & IFF_UP) && netif_carrier_ok(dev);
>> }
> Which makes a lot of sense if you only have a single path to choose
> from.   When multipath comes into play we need to do something to ensure
> we don't select a path that where no output is possible for a prolonged
> period of time.

I was planning to add support for this in incremental patches (they are in my 
todo). Basically introduce the required
support for RTNH_F_DEAD and RTNH_F_LINKDOWN and skipping dead and link down 
nexthops.
I will submit this as an incremental patch.
>
>> Example:
>>
>> $ip -f mpls route add 100 nexthop as 200 via inet 10.1.1.2 dev swp1 \
>> nexthop as 700 via inet 10.1.1.6 dev swp2 \
>> nexthop as 800 via inet 40.1.1.2 dev swp3
>>
>> $ip  -f mpls route show
>> 100
>> nexthop as to 200 via inet 10.1.1.2  dev swp1
>> nexthop as to 700 via inet 10.1.1.6  dev swp2
>> nexthop as to 800 via inet 40.1.1.2  dev swp3
>>
>> Signed-off-by: Roopa Prabhu 
>> ---
>>  include/net/mpls_iptunnel.h |   2 +-
>>  net/mpls/af_mpls.c  | 500 
>> +++-
>>  net/mpls/internal.h |  52 -
>>  3 files changed, 404 insertions(+), 150 deletions(-)
>>
> [...]
>> @@ -196,9 +176,13 @@ static int mpls_forward(struct sk_buff *skb, struct 
>> net_device *dev,
>>  if (!rt)
>>  goto drop;
>>  
>> +nh = mpls_select_multipath(rt);
>> +if (!nh)
>> +goto drop;
>> +
>>  /* Find the output device */
>> -out_dev = rcu_dereference(rt->rt_dev);
>> -if (!mpls_output_possible(out_dev))
>> +out_dev = rcu_dereference(nh->nh_dev);
>> +if (!out_dev || !mpls_output_possible(out_dev))
>>  goto drop;
>>
> nit: mpls_output_possible already embeds the !out_dev test, making this
> hunk redundant.
ack,
>
>>  if (skb_warn_if_lro(skb))
>> @@ -635,9 +763,11 @@ static void mpls_ifdown(struct net_device *dev)
>>  struct mpls_route *rt = rtnl_dereference(platform_label[index]);
>>  if (!rt)
>>  continue;
>> -if (rtnl_dereference(rt->rt_dev) != dev)
>> -continue;
>> -rt->rt_dev = NULL;
>> +for_nexthops(rt) {
>> +if (rtnl_dereference(nh->nh_dev) != dev)
>> +continue;
>> +nh->nh_dev = NULL;
>> +} endfor_nexthops(rt);
>>  }
>>  
>>  mdev = mpls_dev_get(dev);
>> diff --git a/net/mpls/internal.h b/net/mpls/internal.h
>> index 2681a4b..d7757be3 100644
>> --- a/net/mpls/internal.h
>> +++ b/net/mpls/internal.h
>> @@ -21,6 +21,54 @@ struct mpls_dev {
>>  
>>  struct sk_buff;
>>  
>> +#define LABEL_NOT_SPECIFIED (1 << 20)
>> +#define MAX_NEW_LABELS 2
>> +
>> +/* This maximum ha length copied from the definition of struct neighbour */
>> +#define MAX_VIA_ALEN (ALIGN(MAX_ADDR_LEN, sizeof(unsigned long)))
>>
>> +enum mpls_payload_type {
>> +MPT_UNSPEC, /* IPv4 or IPv6 */
>> +MPT_IPV4 = 4,
>> +MPT_IPV6 = 6,
>> +
>> +/* Other types not implemented:
>> + *  - Pseudo-wire with or without control word (RFC4385)
>> + *  - GAL (RFC5586)
>> + */
>> +};
>> +
>> +struct mpls_nh { /* next hop label forwarding entry */
>> +struct net_device __rcu *nh_dev;
>> +u32 nh_label[MAX_NEW_LABELS];
>> +u8  nh_labels;
>> 

[PATCH] icmp: Fixed bug in raw sockets causing incorrect ICMP SNMP counter values

2015-10-11 Thread Ben Cartwright-Cox
Sending ICMP packets with raw sockets ends up in the SNMP counters
logging the type as the first byte of the IPv4 header rather than
the ICMP header (in nearly all cases this is seen as "OutType69".
This is fixed by adding the IP Header Length to the casting into
a icmphdr struct.

Signed-off-by: Ben Cartwright-Cox 
---
 net/ipv4/raw.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index 561cd4b..1ad8bae 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -409,7 +409,7 @@ static int raw_send_hdrinc(struct sock *sk, struct flowi4 
*fl4,
}
if (iph->protocol == IPPROTO_ICMP)
icmp_out_count(net, ((struct icmphdr *)
-   skb_transport_header(skb))->type);
+   skb_transport_header(skb) + iphlen)->type);
 
err = NF_HOOK(NFPROTO_IPV4, NF_INET_LOCAL_OUT, sk, skb,
  NULL, rt->dst.dev, dst_output_sk);
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next v3 2/2] mpls: flow-based multipath selection

2015-10-11 Thread roopa
On 10/11/15, 12:43 PM, Eric W. Biederman wrote:
> Roopa Prabhu  writes:
>
>> From: Robert Shearman 
>>
>> Change the selection of a multipath route to use a flow-based
>> hash. This more suitable for traffic sensitive to reordering within a
>> flow (e.g. TCP, L2VPN) and whilst still allowing a good distribution
>> of traffic given enough flows.
>>
>> Selection of the path for a multipath route is done using a hash of:
>> 1. Label stack up to MAX_MP_SELECT_LABELS labels or up to and
>>including entropy label, whichever is first.
>> 2. 3-tuple of (L3 src, L3 dst, proto) from IPv4/IPv6 header in MPLS
>>payload, if present.
>>
>> Naturally, a 5-tuple hash using L4 information in addition would be
>> possible and be better in some scenarios, but there is a tradeoff
>> between looking deeper into the packet to achieve good distribution,
>> and packet forwarding performance, and I have erred on the side of the
>> latter as the default.
> Not a fault with this patch, but this patches use of entropy labels
> does highlight that we don't handle creating entropy labels on ingress
> nor dealing with entropy labels on egress.
>
>> Signed-off-by: Robert Shearman 
>> ---
>>  net/mpls/af_mpls.c | 88 
>> ++
>>  1 file changed, 83 insertions(+), 5 deletions(-)
>>
>> diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
>> index 4d819df..15dd2eb 100644
>> --- a/net/mpls/af_mpls.c
>> +++ b/net/mpls/af_mpls.c
>> @@ -22,6 +22,11 @@
>>  #include 
>>  #include "internal.h"
>>  
>> +/* Maximum number of labels to look ahead at when selecting a path of
>> + * a multipath route
>> + */
>> +#define MAX_MP_SELECT_LABELS 4
> This number seems a little small.  Especially given that an entopy label
> and the entropy label indicator consume two of these.  Although I
> suspect 4 is enough for most cases in practice.
yes, we have seen 4 to be enough in practice as well. We can increase it in 
future if need be.
>
>> +
>>  static int zero = 0;
>>  static int label_limit = (1 << 20) - 1;
>>  
>> @@ -77,10 +82,78 @@ bool mpls_pkt_too_big(const struct sk_buff *skb, 
>> unsigned int mtu)
>>  }
>>  EXPORT_SYMBOL_GPL(mpls_pkt_too_big);
>>  
>> -static struct mpls_nh *mpls_select_multipath(struct mpls_route *rt)
>> +static struct mpls_nh *mpls_select_multipath(struct mpls_route *rt,
>> + struct sk_buff *skb, bool bos)
>>  {
>> -/* assume single nexthop for now */
>> -return >rt_nh[0];
>> +struct mpls_entry_decoded dec;
>> +struct mpls_shim_hdr *hdr;
>> +bool eli_seen = false;
>> +int label_index;
>> +int nh_index = 0;
>> +u32 hash = 0;
>> +
>> +/* No need to look further into packet if there's only
>> + * one path
>> + */
>> +if (rt->rt_nhn == 1)
>> +goto out;
>> +
>> +for (label_index = 0; label_index < MAX_MP_SELECT_LABELS && !bos;
>> + label_index++) {
>> +if (!pskb_may_pull(skb, sizeof(*hdr) * label_index))
>> +break;
>> +
>> +/* Read and decode the current label */
>> +hdr = mpls_hdr(skb) + label_index;
>> +dec = mpls_entry_decode(hdr);
>> +
>> +/* RFC6790 - reserved labels MUST NOT be used as keys
>> + * for the load-balancing function
>> + */
>> +if (dec.label == MPLS_LABEL_ENTROPY) {
>> +eli_seen = true;
>> +} else if (dec.label >= MPLS_LABEL_FIRST_UNRESERVED) {
> We should probably test this first and mark this branch as likely.

ok
>
>> +hash = jhash_1word(dec.label, hash);
>> +
>> +/* The entropy label follows the entropy label
>> + * indicator, so this means that the entropy
>> + * label was just added to the hash - no need to
>> + * go any deeper either in the label stack or in the
>> + * payload
>> + */
>> +if (eli_seen)
>> +break;
>> +}
>> +
>> +bos = dec.bos;
>> +if (bos && pskb_may_pull(skb, sizeof(*hdr) * label_index +
>> + sizeof(struct iphdr))) {
>> +const struct iphdr *v4hdr;
>> +
>> +v4hdr = (const struct iphdr *)(mpls_hdr(skb) +
>> +   label_index);
>> +if (v4hdr->version == 4) {
>> +hash = jhash_3words(ntohl(v4hdr->saddr),
>> +ntohl(v4hdr->daddr),
>> +v4hdr->protocol, hash);
>> +} else if (v4hdr->version == 6 &&
>> +pskb_may_pull(skb, sizeof(*hdr) * label_index +
>> +  

Re: [PATCH net-next v3 1/2] mpls: multipath route support

2015-10-11 Thread Eric W. Biederman
Roopa Prabhu  writes:

> From: Roopa Prabhu 
>
> This patch adds support for MPLS multipath routes.
>
> Includes following changes to support multipath:
> - splits struct mpls_route into 'struct mpls_route + struct mpls_nh'
>
> - 'struct mpls_nh' represents a mpls nexthop label forwarding entry
>
> - moves mpls route and nexthop structures into internal.h
>
> - A mpls_route can point to multiple mpls_nh structs
>
> - the nexthops are maintained as a array (similar to ipv4 fib)
>
> - In the process of restructuring, this patch also consistently changes
>   all labels to u8
>
> - Adds support to parse/fill RTA_MULTIPATH netlink attribute for
> multipath routes similar to ipv4/v6 fib
>
> - In this patch, the multipath route nexthop selection algorithm
> simply returns the first nexthop. It is replaced by a
> hash based algorithm from Robert Shearman in the next patch
>
> - mpls_route_update cleanup: remove 'dev' handling in mpls_route_update.
> mpls_route_update though implemented to update based on dev, it was
> never used that way. And the dev handling gets tricky with multiple nexthops.
> Cannot match against any single nexthops dev. So, this patch removes the 
> unused
> 'dev' handling in mpls_route_update.

Ugh.  Looking at this patch I have found if not a bug a misfeature.

The existing code has a function:

> bool mpls_output_possible(const struct net_device *dev)
> {
>   return dev && (dev->flags & IFF_UP) && netif_carrier_ok(dev);
> }

Which makes a lot of sense if you only have a single path to choose
from.   When multipath comes into play we need to do something to ensure
we don't select a path that where no output is possible for a prolonged
period of time.

> Example:
>
> $ip -f mpls route add 100 nexthop as 200 via inet 10.1.1.2 dev swp1 \
> nexthop as 700 via inet 10.1.1.6 dev swp2 \
> nexthop as 800 via inet 40.1.1.2 dev swp3
>
> $ip  -f mpls route show
> 100
> nexthop as to 200 via inet 10.1.1.2  dev swp1
> nexthop as to 700 via inet 10.1.1.6  dev swp2
> nexthop as to 800 via inet 40.1.1.2  dev swp3
>
> Signed-off-by: Roopa Prabhu 
> ---
>  include/net/mpls_iptunnel.h |   2 +-
>  net/mpls/af_mpls.c  | 500 
> +++-
>  net/mpls/internal.h |  52 -
>  3 files changed, 404 insertions(+), 150 deletions(-)
>
[...]
> @@ -196,9 +176,13 @@ static int mpls_forward(struct sk_buff *skb, struct 
> net_device *dev,
>   if (!rt)
>   goto drop;
>  
> + nh = mpls_select_multipath(rt);
> + if (!nh)
> + goto drop;
> +
>   /* Find the output device */
> - out_dev = rcu_dereference(rt->rt_dev);
> - if (!mpls_output_possible(out_dev))
> + out_dev = rcu_dereference(nh->nh_dev);
> + if (!out_dev || !mpls_output_possible(out_dev))
>   goto drop;
>

nit: mpls_output_possible already embeds the !out_dev test, making this
hunk redundant.

>   if (skb_warn_if_lro(skb))

> @@ -635,9 +763,11 @@ static void mpls_ifdown(struct net_device *dev)
>   struct mpls_route *rt = rtnl_dereference(platform_label[index]);
>   if (!rt)
>   continue;
> - if (rtnl_dereference(rt->rt_dev) != dev)
> - continue;
> - rt->rt_dev = NULL;
> + for_nexthops(rt) {
> + if (rtnl_dereference(nh->nh_dev) != dev)
> + continue;
> + nh->nh_dev = NULL;
> + } endfor_nexthops(rt);
>   }
>  
>   mdev = mpls_dev_get(dev);
> diff --git a/net/mpls/internal.h b/net/mpls/internal.h
> index 2681a4b..d7757be3 100644
> --- a/net/mpls/internal.h
> +++ b/net/mpls/internal.h
> @@ -21,6 +21,54 @@ struct mpls_dev {
>  
>  struct sk_buff;
>  
> +#define LABEL_NOT_SPECIFIED (1 << 20)
> +#define MAX_NEW_LABELS 2
> +
> +/* This maximum ha length copied from the definition of struct neighbour */
> +#define MAX_VIA_ALEN (ALIGN(MAX_ADDR_LEN, sizeof(unsigned long)))
>
> +enum mpls_payload_type {
> + MPT_UNSPEC, /* IPv4 or IPv6 */
> + MPT_IPV4 = 4,
> + MPT_IPV6 = 6,
> +
> + /* Other types not implemented:
> +  *  - Pseudo-wire with or without control word (RFC4385)
> +  *  - GAL (RFC5586)
> +  */
> +};
> +
> +struct mpls_nh { /* next hop label forwarding entry */
> + struct net_device __rcu *nh_dev;
> + u32 nh_label[MAX_NEW_LABELS];
> + u8  nh_labels;
> + u8  nh_via_alen;
> + u8  nh_via_table;
> + u8  nh_via[MAX_VIA_ALEN];
> +};

I think for this patch this is a good version of mpls_nh.

We do have significant optimization opportunities with the mpls_nh
structure.

- We waste a byte on each mpls label.
- We need a full byte on any of the fields (nh_lables, nh_via_alen, 

[PATCH net-next 4/4] net: dsa: mv88e6xxx: fix hardware bridging

2015-10-11 Thread Vivien Didelot
Playing with the VLAN map of every port to implement "hardware bridging"
in the 88E6352 driver was a hack until full 802.1Q was supported.

Indeed with 802.1Q port mode "Disabled" or "Fallback", this feature is
used to restrict which output ports an input port can egress frames to.

A Linux bridge is an untagged VLAN. With full 802.1Q support, we don't
need this hack anymore and can use the "Secure" strict 802.1Q port mode.

With this mode, the port-based VLAN map still needs to be configured,
but all the logic is VTU-centric. This means that the switch only cares
about rules described in its hardware VLAN table, which is exactly what
Linux bridge expects and what we want.

Note also that the hardware bridging was broken with the previous
flexible "Fallback" 802.1Q port mode. Here's an example:

Port0 and Port1 belong to the same bridge. If Port0 sends crafted tagged
frames with VID 200 to Port1, Port1 receives it. Even if Port1 is in
hardware VLAN 200, but not Port0, Port1 will still receive it, because
Fallback mode doesn't care about invalid VID or non-member source port.

Signed-off-by: Vivien Didelot 
---
 drivers/net/dsa/mv88e6171.c |  2 --
 drivers/net/dsa/mv88e6352.c |  2 --
 drivers/net/dsa/mv88e6xxx.c | 47 +++--
 drivers/net/dsa/mv88e6xxx.h |  2 --
 4 files changed, 3 insertions(+), 50 deletions(-)

diff --git a/drivers/net/dsa/mv88e6171.c b/drivers/net/dsa/mv88e6171.c
index ca3330a..dfca352 100644
--- a/drivers/net/dsa/mv88e6171.c
+++ b/drivers/net/dsa/mv88e6171.c
@@ -113,8 +113,6 @@ struct dsa_switch_driver mv88e6171_switch_driver = {
 #endif
.get_regs_len   = mv88e6xxx_get_regs_len,
.get_regs   = mv88e6xxx_get_regs,
-   .port_join_bridge   = mv88e6xxx_join_bridge,
-   .port_leave_bridge  = mv88e6xxx_leave_bridge,
.port_stp_update= mv88e6xxx_port_stp_update,
.port_pvid_get  = mv88e6xxx_port_pvid_get,
.port_pvid_set  = mv88e6xxx_port_pvid_set,
diff --git a/drivers/net/dsa/mv88e6352.c b/drivers/net/dsa/mv88e6352.c
index 078a358..796fdcb 100644
--- a/drivers/net/dsa/mv88e6352.c
+++ b/drivers/net/dsa/mv88e6352.c
@@ -340,8 +340,6 @@ struct dsa_switch_driver mv88e6352_switch_driver = {
.set_eeprom = mv88e6352_set_eeprom,
.get_regs_len   = mv88e6xxx_get_regs_len,
.get_regs   = mv88e6xxx_get_regs,
-   .port_join_bridge   = mv88e6xxx_join_bridge,
-   .port_leave_bridge  = mv88e6xxx_leave_bridge,
.port_stp_update= mv88e6xxx_port_stp_update,
.port_pvid_get  = mv88e6xxx_port_pvid_get,
.port_pvid_set  = mv88e6xxx_port_pvid_set,
diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c
index 787352df..63736f9 100644
--- a/drivers/net/dsa/mv88e6xxx.c
+++ b/drivers/net/dsa/mv88e6xxx.c
@@ -1124,41 +1124,6 @@ static int _mv88e6xxx_port_vlan_map_set(struct 
dsa_switch *ds, int port,
return _mv88e6xxx_reg_write(ds, REG_PORT(port), PORT_BASE_VLAN, reg);
 }
 
-/* Bridge handling functions */
-
-static int mv88e6xxx_map_bridge(struct dsa_switch *ds, u16 members)
-{
-   struct mv88e6xxx_priv_state *ps = ds_to_priv(ds);
-   const unsigned long output = members | BIT(dsa_upstream_port(ds));
-   int port, err = 0;
-
-   mutex_lock(>smi_mutex);
-
-   for_each_set_bit(port, , ps->num_ports) {
-   if (dsa_is_cpu_port(ds, port))
-   continue;
-
-   err = _mv88e6xxx_port_vlan_map_set(ds, port, output & ~port);
-   if (err)
-   break;
-   }
-
-   mutex_unlock(>smi_mutex);
-
-   return err;
-}
-
-
-int mv88e6xxx_join_bridge(struct dsa_switch *ds, int port, u32 br_port_mask)
-{
-   return mv88e6xxx_map_bridge(ds, br_port_mask);
-}
-
-int mv88e6xxx_leave_bridge(struct dsa_switch *ds, int port, u32 br_port_mask)
-{
-   return mv88e6xxx_map_bridge(ds, br_port_mask & ~port);
-}
-
 int mv88e6xxx_port_stp_update(struct dsa_switch *ds, int port, u8 state)
 {
struct mv88e6xxx_priv_state *ps = ds_to_priv(ds);
@@ -2007,7 +1972,7 @@ static int mv88e6xxx_setup_port(struct dsa_switch *ds, 
int port)
reg |= PORT_CONTROL_2_FORWARD_UNKNOWN;
}
 
-   reg |= PORT_CONTROL_2_8021Q_FALLBACK;
+   reg |= PORT_CONTROL_2_8021Q_SECURE;
 
if (reg) {
ret = _mv88e6xxx_reg_write(ds, REG_PORT(port),
@@ -2103,15 +2068,9 @@ static int mv88e6xxx_setup_port(struct dsa_switch *ds, 
int port)
goto abort;
 
/* Port based VLAN map: do not give each port its own address
-* database, allow the CPU port to talk to each of the 'real'
-* ports, and allow each of the 'real' ports to only talk to
-* the upstream port.
+* database, and allow every port to egress frames on all other ports.
 */
-   if 

[PATCH net-next 2/4] net: dsa: mv88e6xxx: do not support per-port FID

2015-10-11 Thread Vivien Didelot
Since we configure a switch chip through a Linux bridge, and a bridge is
implemented as a VLAN, there is no need for per-port FID anymore.

This patch gets rid of this and simplifies the driver code since we can
now directly map all 4095 FIDs available to all VLANs.

Signed-off-by: Vivien Didelot 
---
 drivers/net/dsa/mv88e6xxx.c | 69 -
 drivers/net/dsa/mv88e6xxx.h |  5 
 2 files changed, 11 insertions(+), 63 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c
index 254f9bb..787352df 100644
--- a/drivers/net/dsa/mv88e6xxx.c
+++ b/drivers/net/dsa/mv88e6xxx.c
@@ -1468,6 +1468,7 @@ static int _mv88e6xxx_vlan_init(struct dsa_switch *ds, 
u16 vid,
struct mv88e6xxx_vtu_stu_entry vlan = {
.valid = true,
.vid = vid,
+   .fid = vid, /* We use one FID per VLAN */
};
int i;
 
@@ -1501,22 +1502,10 @@ static int _mv88e6xxx_vlan_init(struct dsa_switch *ds, 
u16 vid,
return err;
}
 
-   /* Non-bridged ports and bridge groups use FIDs from 1 to
-* num_ports; VLANs use FIDs from num_ports+1 to 4095.
-*/
-   vlan.fid = find_next_zero_bit(ps->fid_bitmap, VLAN_N_VID,
- ps->num_ports + 1);
-   if (unlikely(vlan.fid == VLAN_N_VID)) {
-   pr_err("no more FID available for VLAN %d\n", vid);
-   return -ENOSPC;
-   }
-
/* Clear all MAC addresses from the new database */
err = _mv88e6xxx_atu_flush(ds, vlan.fid, true);
if (err)
return err;
-
-   set_bit(vlan.fid, ps->fid_bitmap);
}
 
*entry = vlan;
@@ -1556,7 +1545,6 @@ int mv88e6xxx_port_vlan_del(struct dsa_switch *ds, int 
port, u16 vid)
 {
struct mv88e6xxx_priv_state *ps = ds_to_priv(ds);
struct mv88e6xxx_vtu_stu_entry vlan;
-   bool keep = false;
int i, err;
 
mutex_lock(>smi_mutex);
@@ -1574,28 +1562,22 @@ int mv88e6xxx_port_vlan_del(struct dsa_switch *ds, int 
port, u16 vid)
vlan.data[port] = GLOBAL_VTU_DATA_MEMBER_TAG_NON_MEMBER;
 
/* keep the VLAN unless all ports are excluded */
+   vlan.valid = false;
for (i = 0; i < ps->num_ports; ++i) {
if (dsa_is_cpu_port(ds, i))
continue;
 
if (vlan.data[i] != GLOBAL_VTU_DATA_MEMBER_TAG_NON_MEMBER) {
-   keep = true;
+   vlan.valid = true;
break;
}
}
 
-   vlan.valid = keep;
err = _mv88e6xxx_vtu_loadpurge(ds, );
if (err)
goto unlock;
 
err = _mv88e6xxx_atu_remove(ds, vlan.fid, port, false);
-   if (err)
-   goto unlock;
-
-   if (!keep)
-   clear_bit(vlan.fid, ps->fid_bitmap);
-
 unlock:
mutex_unlock(>smi_mutex);
 
@@ -1722,37 +1704,13 @@ static int _mv88e6xxx_atu_load(struct dsa_switch *ds,
return _mv88e6xxx_atu_cmd(ds, GLOBAL_ATU_OP_LOAD_DB);
 }
 
-static int _mv88e6xxx_port_vid_to_fid(struct dsa_switch *ds, int port, u16 vid)
-{
-   struct mv88e6xxx_priv_state *ps = ds_to_priv(ds);
-   struct mv88e6xxx_vtu_stu_entry vlan;
-   int err;
-
-   if (vid == 0)
-   return ps->fid[port];
-
-   err = _mv88e6xxx_port_vtu_getnext(ds, port, vid - 1, );
-   if (err)
-   return err;
-
-   if (vlan.vid == vid)
-   return vlan.fid;
-
-   return -ENOENT;
-}
-
 static int _mv88e6xxx_port_fdb_load(struct dsa_switch *ds, int port,
const unsigned char *addr, u16 vid,
u8 state)
 {
struct mv88e6xxx_atu_entry entry = { 0 };
-   int ret;
 
-   ret = _mv88e6xxx_port_vid_to_fid(ds, port, vid);
-   if (ret < 0)
-   return ret;
-
-   entry.fid = ret;
+   entry.fid = vid; /* We use one FID per VLAN */
entry.state = state;
ether_addr_copy(entry.mac, addr);
if (state != GLOBAL_ATU_DATA_STATE_UNUSED) {
@@ -1767,6 +1725,10 @@ int mv88e6xxx_port_fdb_prepare(struct dsa_switch *ds, 
int port,
   const struct switchdev_obj_port_fdb *fdb,
   struct switchdev_trans *trans)
 {
+   /* We don't use per-port FDB */
+   if (fdb->vid == 0)
+   return -EOPNOTSUPP;
+
/* We don't need any dynamic resource from the kernel (yet),
 * so skip the prepare phase.
 */
@@ -1864,16 +1826,11 @@ int mv88e6xxx_port_fdb_getnext(struct dsa_switch *ds, 
int port,
 {
struct mv88e6xxx_priv_state *ps = ds_to_priv(ds);
struct mv88e6xxx_atu_entry next;
-   u16 fid;
+   u16 fid = *vid; /* We use 

[PATCH net-next 1/4] net: dsa: mv88e6xxx: bridges do not need an FID

2015-10-11 Thread Vivien Didelot
With 88E6352 and similar switch chips, each port has a map to restrict
which output port this input port can egress frames to.

The current driver code implements hardware bridging using this feature,
and assigns to a bridge group the FID of its first member.

Now that 802.1Q is fully implemented in this driver, a Linux bridge
which is a simple untagged VLAN, already gets its own FID.

This patch gets rid of the per-bridge FID and explicits the usage of the
port based VLAN map feature.

Signed-off-by: Vivien Didelot 
---
 drivers/net/dsa/mv88e6xxx.c | 157 +++-
 drivers/net/dsa/mv88e6xxx.h |   1 -
 2 files changed, 40 insertions(+), 118 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c
index e381bfc..254f9bb 100644
--- a/drivers/net/dsa/mv88e6xxx.c
+++ b/drivers/net/dsa/mv88e6xxx.c
@@ -1046,11 +1046,6 @@ static int _mv88e6xxx_atu_flush(struct dsa_switch *ds, 
u16 fid, bool static_too)
return _mv88e6xxx_atu_flush_move(ds, , static_too);
 }
 
-static int _mv88e6xxx_flush_fid(struct dsa_switch *ds, int fid)
-{
-   return _mv88e6xxx_atu_flush(ds, fid, false);
-}
-
 static int _mv88e6xxx_atu_move(struct dsa_switch *ds, u16 fid, int from_port,
   int to_port, bool static_too)
 {
@@ -1112,130 +1107,56 @@ abort:
return ret;
 }
 
-/* Must be called with smi lock held */
-static int _mv88e6xxx_update_port_config(struct dsa_switch *ds, int port)
+static int _mv88e6xxx_port_vlan_map_set(struct dsa_switch *ds, int port,
+   u16 output_ports)
 {
struct mv88e6xxx_priv_state *ps = ds_to_priv(ds);
-   u8 fid = ps->fid[port];
-   u16 reg = fid << 12;
+   const u16 mask = (1 << ps->num_ports) - 1;
+   int reg;
 
-   if (dsa_is_cpu_port(ds, port))
-   reg |= ds->phys_port_mask;
-   else
-   reg |= (ps->bridge_mask[fid] |
-  (1 << dsa_upstream_port(ds))) & ~(1 << port);
+   reg = _mv88e6xxx_reg_read(ds, REG_PORT(port), PORT_BASE_VLAN);
+   if (reg < 0)
+   return reg;
+
+   reg &= ~mask;
+   reg |= output_ports & mask;
 
return _mv88e6xxx_reg_write(ds, REG_PORT(port), PORT_BASE_VLAN, reg);
 }
 
-/* Must be called with smi lock held */
-static int _mv88e6xxx_update_bridge_config(struct dsa_switch *ds, int fid)
-{
-   struct mv88e6xxx_priv_state *ps = ds_to_priv(ds);
-   int port;
-   u32 mask;
-   int ret;
-
-   mask = ds->phys_port_mask;
-   while (mask) {
-   port = __ffs(mask);
-   mask &= ~(1 << port);
-   if (ps->fid[port] != fid)
-   continue;
-
-   ret = _mv88e6xxx_update_port_config(ds, port);
-   if (ret)
-   return ret;
-   }
-
-   return _mv88e6xxx_flush_fid(ds, fid);
-}
-
 /* Bridge handling functions */
 
+static int mv88e6xxx_map_bridge(struct dsa_switch *ds, u16 members)
+{
+   struct mv88e6xxx_priv_state *ps = ds_to_priv(ds);
+   const unsigned long output = members | BIT(dsa_upstream_port(ds));
+   int port, err = 0;
+
+   mutex_lock(>smi_mutex);
+
+   for_each_set_bit(port, , ps->num_ports) {
+   if (dsa_is_cpu_port(ds, port))
+   continue;
+
+   err = _mv88e6xxx_port_vlan_map_set(ds, port, output & ~port);
+   if (err)
+   break;
+   }
+
+   mutex_unlock(>smi_mutex);
+
+   return err;
+}
+
+
 int mv88e6xxx_join_bridge(struct dsa_switch *ds, int port, u32 br_port_mask)
 {
-   struct mv88e6xxx_priv_state *ps = ds_to_priv(ds);
-   int ret = 0;
-   u32 nmask;
-   int fid;
-
-   /* If the bridge group is not empty, join that group.
-* Otherwise create a new group.
-*/
-   fid = ps->fid[port];
-   nmask = br_port_mask & ~(1 << port);
-   if (nmask)
-   fid = ps->fid[__ffs(nmask)];
-
-   nmask = ps->bridge_mask[fid] | (1 << port);
-   if (nmask != br_port_mask) {
-   netdev_err(ds->ports[port],
-  "join: Bridge port mask mismatch fid=%d mask=0x%x 
expected 0x%x\n",
-  fid, br_port_mask, nmask);
-   return -EINVAL;
-   }
-
-   mutex_lock(>smi_mutex);
-
-   ps->bridge_mask[fid] = br_port_mask;
-
-   if (fid != ps->fid[port]) {
-   clear_bit(ps->fid[port], ps->fid_bitmap);
-   ps->fid[port] = fid;
-   ret = _mv88e6xxx_update_bridge_config(ds, fid);
-   }
-
-   mutex_unlock(>smi_mutex);
-
-   return ret;
+   return mv88e6xxx_map_bridge(ds, br_port_mask);
 }
 
 int mv88e6xxx_leave_bridge(struct dsa_switch *ds, int port, u32 br_port_mask)
 {
-   struct mv88e6xxx_priv_state *ps = ds_to_priv(ds);
-   u8 fid, newfid;
-   int ret;
-
-   fid = ps->fid[port];
-
-   

[PATCH net-next 0/4] net: dsa: mv88e6xxx: fix hardware bridging

2015-10-11 Thread Vivien Didelot
DSA and its drivers currently hook the NETDEV_CHANGEUPPER net_device event in
order to configure the VLAN map of every port.

This VLAN map is a feature of these switch chips to hardcode and restrict which
output ports a given input port can egress frames to.

A Linux bridge is a simple untagged VLAN propagated by the bridge code itself.
With a proper 802.1Q support, a driver does not need this hook anymore, and
will simply program the related VLAN object.

This patchset improves the hardware bridging code in the mv88e6xxx driver with
a strict 802.1Q mode.

Ideally, the equivalent must be done for Broadcom Starfighter 2 and Rocker,
before completely getting rid of this hook.

Vivien Didelot (4):
  net: dsa: mv88e6xxx: bridges do not need an FID
  net: dsa: mv88e6xxx: do not support per-port FID
  net: dsa: do not warn unsupported bridge ops
  net: dsa: mv88e6xxx: fix hardware bridging

 drivers/net/dsa/mv88e6171.c |   2 -
 drivers/net/dsa/mv88e6352.c |   2 -
 drivers/net/dsa/mv88e6xxx.c | 215 ++--
 drivers/net/dsa/mv88e6xxx.h |   8 --
 net/dsa/slave.c |   2 +-
 5 files changed, 26 insertions(+), 203 deletions(-)

-- 
2.6.0

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] icmp: Fixed bug in raw sockets causing incorrect ICMP SNMP counter values

2015-10-11 Thread Eric Dumazet
On Sun, 2015-10-11 at 20:55 +, Ben Cartwright-Cox wrote:
> Sending ICMP packets with raw sockets ends up in the SNMP counters
> logging the type as the first byte of the IPv4 header rather than
> the ICMP header (in nearly all cases this is seen as "OutType69".
> This is fixed by adding the IP Header Length to the casting into
> a icmphdr struct.
> 
> Signed-off-by: Ben Cartwright-Cox 
> ---
>  net/ipv4/raw.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
> index 561cd4b..1ad8bae 100644
> --- a/net/ipv4/raw.c
> +++ b/net/ipv4/raw.c
> @@ -409,7 +409,7 @@ static int raw_send_hdrinc(struct sock *sk, struct flowi4 
> *fl4,
>   }
>   if (iph->protocol == IPPROTO_ICMP)
>   icmp_out_count(net, ((struct icmphdr *)
> - skb_transport_header(skb))->type);
> + skb_transport_header(skb) + iphlen)->type);
>  
>   err = NF_HOOK(NFPROTO_IPV4, NF_INET_LOCAL_OUT, sk, skb,
> NULL, rt->dst.dev, dst_output_sk);


Hmm... This seems to lack checks against a malicious user ?

The only guarantee you have here is that iphlen < length.

It is not enough.

Make sure you do not access not initialized memory or even non existent
one.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH net-next 3/4] net: dsa: do not warn unsupported bridge ops

2015-10-11 Thread Vivien Didelot
A DSA driver may not provide the port_join_bridge and port_leave_bridge
functions, so don't warn in such case.

Signed-off-by: Vivien Didelot 
---
 net/dsa/slave.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index bb2bd3b..43d7342 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -1276,7 +1276,7 @@ int dsa_slave_netdevice_event(struct notifier_block 
*unused,
goto out;
 
err = dsa_slave_master_changed(dev);
-   if (err)
+   if (err && err != -EOPNOTSUPP)
netdev_warn(dev, "failed to reflect master change\n");
 
break;
-- 
2.6.0

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] icmp: Fixed bug in raw sockets causing incorrect ICMP SNMP counter values

2015-10-11 Thread Eric Dumazet
Please do not top-post on netdev and/or lkml


On Mon, 2015-10-12 at 00:14 +0100, Ben Cox wrote:
> Nice!
> 
> That works in my head at least, Sorry about not seeing that fairly
> glaring memory issue there.

No problem, this is why we review patches ;)

> 
> Are you sure " skb->transport_header += iphlen; " won't have a knock
> on affect when it's given to NF_HOOK ( as in, would a potential
> userspace program get something it does not expect anymore ) ?


netfilter does not depend on transport_header being set (otherwise as
you can see RAW packets would have problems with netfilter since the
very beginning.


> How does submission work at this point if the above is not a issue
> (apologies if this is already in a FAQ somewhere I missed)?
> 

Just send a V2 of the patch, using following title :

[PATCH v2 net] raw: increment correct SNMP counters for ICMP messages


I'll add my 'Acked-by:' after your 'Signed-off-by:'

Thanks !


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: switchdev and VLAN ranges

2015-10-11 Thread Nikolay Aleksandrov
On 10/12/2015 12:41 AM, Vivien Didelot wrote:
> On Oct. Sunday 11 (41) 09:12 AM, Jiri Pirko wrote:
>> Sat, Oct 10, 2015 at 12:36:26PM CEST, niko...@cumulusnetworks.com wrote:
>>> On 10/10/2015 09:49 AM, Elad Raz wrote:

> On Oct 10, 2015, at 2:30 AM, Vivien Didelot 
>  wrote:
>
> I have two concerns in mind:
>
> a) if we imagine that drivers like Rocker allocate memory in the prepare
> phase for each VID, preparing a range like 100-4000 would definitely not
> be recommended.
>
> b) imagine that you have two Linux bridges on a switch, one using the
> hardware VLAN 100. If you request the VLAN range 99-101 for the other
> bridge members, it is not possible for the driver to say "I can
> accelerate VLAN 99 and 101, but not 100". It must return OPNOTSUPP for
> the whole range.

 Another concern I have with vid_being..vid_end range is the “flags”. Where 
 flags can be BRIDGE_VLAN_INFO_PVID.
 There is no sense having more than one VLAN as a PVID.
 This leave the HW vendor the choice which VLAN id they will use as the 
 PVID.

>>>
>>> iproute2 doesn't allow to do it but I can see that someone can actually 
>>> make it
>>> so the flags for the range have it and it doesn't look correct. Perhaps we 
>>> need
>>> something like the patch below to enforce this from kernel-side.
>>>
>>>
>>> diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
>>> index d78b4429505a..02b17b53e9a6 100644
>>> --- a/net/bridge/br_netlink.c
>>> +++ b/net/bridge/br_netlink.c
>>> @@ -524,6 +524,9 @@ static int br_afspec(struct net_bridge *br,
>>> if (vinfo_start)
>>> return -EINVAL;
>>> vinfo_start = vinfo;
>>> +   /* don't allow range of pvids */
>>> +   if (vinfo_start->flags & BRIDGE_VLAN_INFO_PVID)
>>> +   return -EINVAL;
>>> continue;
>>> }
>>>
>>
>> Looks correct to me. Could you please submit this properly? Thanks!
> 
> The above patch is correct, but we only solve part of the problem, since
> the range and bridge flags are exposed by switchdev_obj_port_vlan as is.
> 
> Thanks,
> -v
> 

Yes, the above fixes the bridge side. About the switchdev side it seems like 
it's
up to the switchdev driver to do the right thing in its switchdev_ops. I took a
quick look at DSA and it seems correct, the flag isn't saved and on dump request
the flags are generated so it shouldn't be possible to export multiple pvids.
But switchdev_port_br_afspec() seems problematic, in fact I don't even see a 
vlan
id check, i.e. ==0 || >= VLAN_N_MASK.
Of course, I might be totally off point as I'm not that familiar with switchdev 
and
it's very late. :-)
But maybe it needs something like:

diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
index 6e4a4f9ad927..3dd52a53867f 100644
--- a/net/switchdev/switchdev.c
+++ b/net/switchdev/switchdev.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -716,10 +717,14 @@ static int switchdev_port_br_afspec(struct net_device 
*dev,
return -EINVAL;
vinfo = nla_data(attr);
vlan.flags = vinfo->flags;
+   if (!vinfo->vid || vinfo->vid >= VLAN_VID_MASK)
+return -EINVAL;
if (vinfo->flags & BRIDGE_VLAN_INFO_RANGE_BEGIN) {
if (vlan.vid_begin)
return -EINVAL;
vlan.vid_begin = vinfo->vid;
+   if (vlan.flags & BRIDGE_VLAN_INFO_PVID)
+   return -EINVAL;
} else if (vinfo->flags & BRIDGE_VLAN_INFO_RANGE_END) {
if (!vlan.vid_begin)
return -EINVAL;
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] net: nhs: fix 32-bit build warning

2015-10-11 Thread huangdaode

On 2015/10/12 4:13, Joe Perches wrote:

On Tue, 2015-10-06 at 23:53 +0200, Arnd Bergmann wrote:

diff --git a/drivers/net/ethernet/hisilicon/hns/hnae.c 
b/drivers/net/ethernet/hisilicon/hns/hnae.c

[]

@@ -448,12 +448,12 @@ static ssize_t handles_show(struct device *dev,
s += sprintf(buf + s, "handle %d (eport_id=%u from %s):\n",
i++, h->eport_id, h->dev->name);
for (j = 0; j < h->q_num; j++) {
-   s += sprintf(buf + s, "\tqueue[%d] on 0x%llx\n",
-j, (u64)h->qs[i]->io_base);
-#define HANDEL_TX_MSG "\t\ttx_ring on 0x%llx:%u,%u,%u,%u,%u,%llu,%llu\n"
+   s += sprintf(buf + s, "\tqueue[%d] on %p\n",
+j, h->qs[i]->io_base);
+#define HANDEL_TX_MSG "\t\ttx_ring on %p:%u,%u,%u,%u,%u,%llu,%llu\n"
s += sprintf(buf + s,
 HANDEL_TX_MSG,

Maybe one day remove the misspelled and
used once HANDEL_TX_MSG macro too.

Another possibility is to use a generally
smaller object code style like:

char *s = buf;
...
s += sprintf(s, ...)
...
return s - buf;

instead of

ssize_t s = 0;
...
s += sprintf(buf + s, ...)
...
return s;
---
  drivers/net/ethernet/hisilicon/hns/hnae.c | 35 +++
  1 file changed, 17 insertions(+), 18 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns/hnae.c 
b/drivers/net/ethernet/hisilicon/hns/hnae.c
index f52e99a..d2af46f 100644
--- a/drivers/net/ethernet/hisilicon/hns/hnae.c
+++ b/drivers/net/ethernet/hisilicon/hns/hnae.c
@@ -439,40 +439,39 @@ EXPORT_SYMBOL(hnae_ae_unregister);
  static ssize_t handles_show(struct device *dev,
struct device_attribute *attr, char *buf)
  {
-   ssize_t s = 0;
+   char *s = buf;
struct hnae_ae_dev *hdev = cls_to_ae_dev(dev);
struct hnae_handle *h;
int i = 0, j;
  
  	list_for_each_entry_rcu(h, >handle_list, node) {

-   s += sprintf(buf + s, "handle %d (eport_id=%u from %s):\n",
-   i++, h->eport_id, h->dev->name);
+   s += sprintf(s, "handle %d (eport_id=%u from %s):\n",
+i++, h->eport_id, h->dev->name);
for (j = 0; j < h->q_num; j++) {
-   s += sprintf(buf + s, "\tqueue[%d] on %p\n",
+   s += sprintf(s, "\tqueue[%d] on %p\n",
 j, h->qs[i]->io_base);
-#define HANDEL_TX_MSG "\t\ttx_ring on %p:%u,%u,%u,%u,%u,%llu,%llu\n"
-   s += sprintf(buf + s,
-HANDEL_TX_MSG,
+   s += sprintf(s,
+"\t\ttx_ring on 
%p:%u,%u,%u,%u,%u,%llu,%llu\n",
 h->qs[i]->tx_ring.io_base,
 h->qs[i]->tx_ring.buf_size,
 h->qs[i]->tx_ring.desc_num,
 h->qs[i]->tx_ring.max_desc_num_per_pkt,
 h->qs[i]->tx_ring.max_raw_data_sz_per_desc,
 h->qs[i]->tx_ring.max_pkt_size,
-h->qs[i]->tx_ring.stats.sw_err_cnt,
-h->qs[i]->tx_ring.stats.io_err_cnt);
-   s += sprintf(buf + s,
-   "\t\trx_ring on %p:%u,%u,%llu,%llu,%llu\n",
-   h->qs[i]->rx_ring.io_base,
-   h->qs[i]->rx_ring.buf_size,
-   h->qs[i]->rx_ring.desc_num,
-   h->qs[i]->rx_ring.stats.sw_err_cnt,
-   h->qs[i]->rx_ring.stats.io_err_cnt,
-   h->qs[i]->rx_ring.stats.seg_pkt_cnt);
+h->qs[i]->tx_ring.stats.sw_err_cnt,
+h->qs[i]->tx_ring.stats.io_err_cnt);
+   s += sprintf(s,
+"\t\trx_ring on %p:%u,%u,%llu,%llu,%llu\n",
+h->qs[i]->rx_ring.io_base,
+h->qs[i]->rx_ring.buf_size,
+h->qs[i]->rx_ring.desc_num,
+h->qs[i]->rx_ring.stats.sw_err_cnt,
+h->qs[i]->rx_ring.stats.io_err_cnt,
+h->qs[i]->rx_ring.stats.seg_pkt_cnt);
}
}
  
-	return s;

+   return s - buf;
  }
  
  static DEVICE_ATTR_RO(handles);




.


thanks Joe
your suggestion is good, will send out a patch to fix it.


--
Best Regards
Daode Huang


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo 

Re: [PATCH net-next 0/2] net: fix some bugs in HNS drivers

2015-10-11 Thread huangdaode

On 2015/10/11 20:02, David Miller wrote:

From: huangdaode 
Date: Thu, 8 Oct 2015 14:17:23 +0800


This patchset fixes the two bugs in HNS driver, one is for fixing the
compilation warning bug on arm 32-bit platform, another is fixing the wrong
mac port judgement bug which is found during internal tests.

This patch series does not apply to the net-next tree at all.

Also, both patches should use a subsystem prefix of "hisilicon: "
in the Subject lines.

Thanks.

.



Thanks David,
Because it conflicts with arnd's patches,  which fixes this bug.
in the mail, Joe Perches advised another fix, we will send out the patch 
soon.

thanks

--
Best Regards
Daode Huang


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next v3 3/4] bridge: push bridge setting ageing_time down to switchdev

2015-10-11 Thread Scott Feldman
On Sat, Oct 10, 2015 at 12:04 AM, Jiri Pirko  wrote:
> Sat, Oct 10, 2015 at 04:53:52AM CEST, sfel...@gmail.com wrote:
>>On Thu, Oct 8, 2015 at 9:38 PM, Premkumar Jonnala  
>>wrote:
>>>
>>>
 -Original Message-
 From: sfel...@gmail.com [mailto:sfel...@gmail.com]
 Sent: Friday, October 09, 2015 7:53 AM
 To: netdev@vger.kernel.org
 Cc: da...@davemloft.net; j...@resnulli.us; siva.mannem@gmail.com;
 Premkumar Jonnala; step...@networkplumber.org;
 ro...@cumulusnetworks.com; and...@lunn.ch; f.faine...@gmail.com;
 vivien.dide...@savoirfairelinux.com
 Subject: [PATCH net-next v3 3/4] bridge: push bridge setting ageing_time 
 down
 to switchdev

 From: Scott Feldman 

 Use SWITCHDEV_F_SKIP_EOPNOTSUPP to skip over ports in bridge that don't
 support setting ageing_time (or setting bridge attrs in general).

 If push fails, don't update ageing_time in bridge and return err to user.

 If push succeeds, update ageing_time in bridge and run gc_timer now to
 recalabrate when to run gc_timer next, based on new ageing_time.

 Signed-off-by: Scott Feldman 
 Signed-off-by: Jiri Pirko 
>>
>>
>>
 +int br_set_ageing_time(struct net_bridge *br, u32 ageing_time)
 +{
 + struct switchdev_attr attr = {
 + .id = SWITCHDEV_ATTR_ID_BRIDGE_AGEING_TIME,
 + .flags = SWITCHDEV_F_SKIP_EOPNOTSUPP,
 + .u.ageing_time = ageing_time,
 + };
 + unsigned long t = clock_t_to_jiffies(ageing_time);
 + int err;
 +
 + if (t < BR_MIN_AGEING_TIME || t > BR_MAX_AGEING_TIME)
 + return -ERANGE;
 +
 + err = switchdev_port_attr_set(br->dev, );
>>>
>>> A thought - given that the ageing time is not a per-bridge-port attr, why 
>>> are we using a "port based api"
>>> to pass the attribute down?  May be I'm missing something here?
>>
>>I think Florian raised the same point earlier.  Sigh, I think this
>>should be addressedv4 coming soon...thanks guys for keeping the
>>standard high.
>
> Scott, can you tell us how do you want to address this? I like the
> current implementation.

I like it also, but there is some awkwardness in calling
switchdev_port_attr_set() with the first argument being the bridge
netdev, not a port netdev.  But, the basic algorithm to recurse from
_this_ netdev down to its lower netdevs works great in this case; it's
just the name "switchdev_port_attr_set" implies a port netdev for
top-level netdev.  So I was thinking about adding another call,
something like "switchdev_master_attr_set", which basically just does
the same thing as switchdev_port_attr_set, except maybe skips the
check to call the ops->switchdev_port_attr_add on the top-level
netdev.  But now I don't like that idea so much as "master" would be
confusing when your passing a bond netdev (which is also a master),
but the bond _is_ the port netdev this time, a port on the bridge.

So let's scrap v4 and go with v3.  I think I can live with this naming
awkwardness, given that we got something for essentially free by using
switchdev_port_attr_set() in a new way.

Davem, I think we're OK going with v3.

(There is a follow-on discussion about a switch device, which we'll
continue but it shouldn't block this v3 version).
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next] RDS: Invoke ->laddr_check() in rds_bind() for explicitly bound transports.

2015-10-11 Thread santosh.shilim...@oracle.com

On 10/11/15 1:46 PM, Sowmini Varadhan wrote:


The IP address passed to rds_bind() should be vetted by the
transport's ->laddr_check() for a previously bound transport.
This needs to be done to avoid cases where, for example,
the application has asked for an IB transport,
but the IP address passed to bind is only usable on
ethernet interfaces.


Right. Probably it should go into stable as well.


Signed-off-by: Sowmini Varadhan 
---
  net/rds/bind.c |9 -
  1 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/net/rds/bind.c b/net/rds/bind.c
index bc6b93e..6192566 100644
--- a/net/rds/bind.c
+++ b/net/rds/bind.c
@@ -196,7 +196,14 @@ int rds_bind(struct socket *sock, struct sockaddr *uaddr, 
int addr_len)
goto out;

if (rs->rs_transport) { /* previously bound */
-   ret = 0;

Minor comment. If you retain above line you can drop the below else.

+   trans = rs->rs_transport;
+   if (trans->laddr_check(sock_net(sock->sk),
+  sin->sin_addr.s_addr) != 0) {
+   ret = -ENOPROTOOPT;
+   rds_remove_bound(rs);
+   } else {
+   ret = 0;
+   }


Acked-by: Santosh Shilimkar 
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next] RDS-TCP: Reset tcp callbacks if re-using an outgoing socket in rds_tcp_accept_one()

2015-10-11 Thread santosh.shilim...@oracle.com

On 10/11/15 1:49 PM, Sowmini Varadhan wrote:


Consider the following "duelling syn" sequence between two peers A and B:
A   B
SYN1 -->
<--  SYN2
SYN2ACK  -->

Note that the SYN/ACK has already been sent out by TCP before
rds_tcp_accept_one() gets invoked as part of callbacks.

If the inet_addr(A) is numerically less than inet_addr(B),
the arbitration scheme in rds_tcp_accept_one() will prefer the
TCP connection triggered by SYN1, and will send a CLOSE for the
SYN2 (just after the SYN2ACK was sent).

Since B also follows the same arbitration scheme, it will send the SYN-ACK
for SYN1 that will set up a healthy ESTABLISHED connection on both sides.
B will also get a  CLOSE for SYN2, which should result in the cleanup
of the TCP state machine for SYN2, but it should not trigger any
stale RDS-TCP callbacks (such as ->writespace, ->state_change etc),
that would disrupt the progress of the SYN2 based RDS-TCP  connection.

Thus the arbitration scheme in rds_tcp_accept_one() should restore
rds_tcp callbacks for the winner before setting them up for the
new accept socket, and also make sure that conn->c_outgoing
is set to 0 so that we do not trigger any reconnect attempts on the
passive side of the tcp socket in the future, in conformance with
commit c82ac7e69efe ("net/rds: RDS-TCP: only initiate reconnect attempt
on outgoing TCP socket.")

Signed-off-by: Sowmini Varadhan 
---


Looks fine to me.

Acked-by: Santosh Shilimkar 
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: switchdev and VLAN ranges

2015-10-11 Thread Scott Feldman
On Sun, Oct 11, 2015 at 5:13 PM, Nikolay Aleksandrov
 wrote:
> On 10/12/2015 12:41 AM, Vivien Didelot wrote:
>> On Oct. Sunday 11 (41) 09:12 AM, Jiri Pirko wrote:
>>> Sat, Oct 10, 2015 at 12:36:26PM CEST, niko...@cumulusnetworks.com wrote:
 On 10/10/2015 09:49 AM, Elad Raz wrote:
>
>> On Oct 10, 2015, at 2:30 AM, Vivien Didelot 
>>  wrote:
>>
>> I have two concerns in mind:
>>
>> a) if we imagine that drivers like Rocker allocate memory in the prepare
>> phase for each VID, preparing a range like 100-4000 would definitely not
>> be recommended.
>>
>> b) imagine that you have two Linux bridges on a switch, one using the
>> hardware VLAN 100. If you request the VLAN range 99-101 for the other
>> bridge members, it is not possible for the driver to say "I can
>> accelerate VLAN 99 and 101, but not 100". It must return OPNOTSUPP for
>> the whole range.
>
> Another concern I have with vid_being..vid_end range is the “flags”. 
> Where flags can be BRIDGE_VLAN_INFO_PVID.
> There is no sense having more than one VLAN as a PVID.
> This leave the HW vendor the choice which VLAN id they will use as the 
> PVID.
>

 iproute2 doesn't allow to do it but I can see that someone can actually 
 make it
 so the flags for the range have it and it doesn't look correct. Perhaps we 
 need
 something like the patch below to enforce this from kernel-side.


 diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
 index d78b4429505a..02b17b53e9a6 100644
 --- a/net/bridge/br_netlink.c
 +++ b/net/bridge/br_netlink.c
 @@ -524,6 +524,9 @@ static int br_afspec(struct net_bridge *br,
 if (vinfo_start)
 return -EINVAL;
 vinfo_start = vinfo;
 +   /* don't allow range of pvids */
 +   if (vinfo_start->flags & BRIDGE_VLAN_INFO_PVID)
 +   return -EINVAL;
 continue;
 }

>>>
>>> Looks correct to me. Could you please submit this properly? Thanks!
>>
>> The above patch is correct, but we only solve part of the problem, since
>> the range and bridge flags are exposed by switchdev_obj_port_vlan as is.
>>
>> Thanks,
>> -v
>>
>
> Yes, the above fixes the bridge side. About the switchdev side it seems like 
> it's
> up to the switchdev driver to do the right thing in its switchdev_ops. I took 
> a
> quick look at DSA and it seems correct, the flag isn't saved and on dump 
> request
> the flags are generated so it shouldn't be possible to export multiple pvids.
> But switchdev_port_br_afspec() seems problematic, in fact I don't even see a 
> vlan
> id check, i.e. ==0 || >= VLAN_N_MASK.
> Of course, I might be totally off point as I'm not that familiar with 
> switchdev and
> it's very late. :-)
> But maybe it needs something like:
>
> diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
> index 6e4a4f9ad927..3dd52a53867f 100644
> --- a/net/switchdev/switchdev.c
> +++ b/net/switchdev/switchdev.c
> @@ -16,6 +16,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -716,10 +717,14 @@ static int switchdev_port_br_afspec(struct net_device 
> *dev,
> return -EINVAL;
> vinfo = nla_data(attr);
> vlan.flags = vinfo->flags;
> +   if (!vinfo->vid || vinfo->vid >= VLAN_VID_MASK)
> +return -EINVAL;
> if (vinfo->flags & BRIDGE_VLAN_INFO_RANGE_BEGIN) {
> if (vlan.vid_begin)
> return -EINVAL;
> vlan.vid_begin = vinfo->vid;
> +   if (vlan.flags & BRIDGE_VLAN_INFO_PVID)
> +   return -EINVAL;
> } else if (vinfo->flags & BRIDGE_VLAN_INFO_RANGE_END) {
> if (!vlan.vid_begin)
> return -EINVAL;

This (and you other patch) seem right to me, if we're going to block
setting PVID when specifying a vlan range.  Would you mind combining
and resending both patches as one as a proper patch?
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next v2 1/2] hisilicon net: removes the once HANDEL_TX_MSG macro

2015-10-11 Thread Joe Perches
Hello Huang.

On Mon, 2015-10-12 at 11:23 +0800, huangdaode wrote:
> This patch changes the code style to make the code more simple.
> also removes the once used HNADEL_TX_MSG macro, according to the

HANDEL_TX_MSG typo

> review comments from Joe Perches.
> 
> Signed-off-by: huangdaode 
> Reviewed-by: Joe Perches 

I didn't review this.

Please do not add signatures for another person when
not specifically added by that person.


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch net-next] bridge: try switchdev op first in __vlan_vid_add/del

2015-10-11 Thread Scott Feldman
On Sat, Oct 10, 2015 at 9:03 AM, Vivien Didelot
 wrote:
> On Oct. Friday 09 (41) 08:20 PM, Scott Feldman wrote:
>> On Fri, Oct 9, 2015 at 3:44 PM, Vivien Didelot
>>  wrote:
>> > Hi Jiri,
>> >
>> > On Oct. Friday 09 (41) 01:54 PM, Jiri Pirko wrote:
>> >> From: Jiri Pirko 
>> >>
>> >> Some drivers need to implement both switchdev vlan ops and
>> >> vid_add/kill ndos. For that to work in bridge code, we need to try
>> >> switchdev op first when adding/deleting vlan id.
>> >
>> > Just curious, when would a driver need to have both operations?
>>
>> Ya, I was kind of curious of that myself. Is this because the driver
>> wants to support standalone VLANs using 802.1q module and vconfig, as
>> well as bridge vlans?  With the vlan support built into the bridge,
>> I've been working under the assumption that 802.1q module (and
>> vconfig) aren't needed, and vlans for a bridged and non-bridge port
>> can be managed using the "bridge" iproute2 cmd.
>>
>> > I kinda have the same question regarding ndo_fdb_{add,del} and the
>> > bridge_{get,set}link equivalent, which is a bit confusing to me.
>>
>> I had to look back at my commit 7f109539 to remind myself about the
>> vid_add/kill ndos and bridge_{get,set}link usage.   Maybe that
>> write-up helps?  I'm not following you on the ndo_fdb_add/del part of
>> your question.
>
> Sorry I wasn't clear. What is confusing to me for FDB ops is that we
> define net_device_ops like:
>
> .ndo_fdb_add= switchdev_port_fdb_add,
> .ndo_fdb_del= switchdev_port_fdb_del,
> .ndo_fdb_dump   = switchdev_port_fdb_dump,
>
> But if I'm not mistaken, "bridge fdb" commands use the
> .ndo_bridge_{get,set,del}link ops, isn't it?

No, the "bridge fdb" cmds use the .ndo_fdb_xxx ops.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next v2 1/2] hisilicon net: removes the once HANDEL_TX_MSG macro

2015-10-11 Thread huangdaode

On 2015/10/12 11:24, Joe Perches wrote:

Hello Huang.

On Mon, 2015-10-12 at 11:23 +0800, huangdaode wrote:

This patch changes the code style to make the code more simple.
also removes the once used HNADEL_TX_MSG macro, according to the

HANDEL_TX_MSG typo


review comments from Joe Perches.

Signed-off-by: huangdaode 
Reviewed-by: Joe Perches 

I didn't review this.
'

Hi Joe
please refer to http://lists.openwall.net/netdev/2015/10/11/61
thanks.


Please do not add signatures for another person when
not specifically added by that person.



.




--
Best Regards
Daode Huang


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH net-next v3 3/4] bridge: push bridge setting ageing_time down to switchdev

2015-10-11 Thread Premkumar Jonnala
> 2015-10-10 15:41 GMT-07:00 Vivien Didelot
> :
> > On Oct. Saturday 10 (41) 11:09 PM, Jiri Pirko wrote:
> >> Sat, Oct 10, 2015 at 05:56:19PM CEST, vivien.dide...@savoirfairelinux.com
> wrote:
> >> >On Oct. Saturday 10 (41) 09:04 AM, Jiri Pirko wrote:
> >> >> Sat, Oct 10, 2015 at 04:53:52AM CEST, sfel...@gmail.com wrote:
> >> >> >On Thu, Oct 8, 2015 at 9:38 PM, Premkumar Jonnala
>  wrote:
> >> >> >>
> >> >> >>
> >> >> >>> -Original Message-
> >> >> >>> From: sfel...@gmail.com [mailto:sfel...@gmail.com]
> >> >> >>> Sent: Friday, October 09, 2015 7:53 AM
> >> >> >>> To: netdev@vger.kernel.org
> >> >> >>> Cc: da...@davemloft.net; j...@resnulli.us;
> siva.mannem@gmail.com;
> >> >> >>> Premkumar Jonnala; step...@networkplumber.org;
> >> >> >>> ro...@cumulusnetworks.com; and...@lunn.ch;
> f.faine...@gmail.com;
> >> >> >>> vivien.dide...@savoirfairelinux.com
> >> >> >>> Subject: [PATCH net-next v3 3/4] bridge: push bridge setting
> ageing_time down
> >> >> >>> to switchdev
> >> >> >>>
> >> >> >>> From: Scott Feldman 
> >> >> >>>
> >> >> >>> Use SWITCHDEV_F_SKIP_EOPNOTSUPP to skip over ports in bridge
> that don't
> >> >> >>> support setting ageing_time (or setting bridge attrs in general).
> >> >> >>>
> >> >> >>> If push fails, don't update ageing_time in bridge and return err to 
> >> >> >>> user.
> >> >> >>>
> >> >> >>> If push succeeds, update ageing_time in bridge and run gc_timer now
> to
> >> >> >>> recalabrate when to run gc_timer next, based on new ageing_time.
> >> >> >>>
> >> >> >>> Signed-off-by: Scott Feldman 
> >> >> >>> Signed-off-by: Jiri Pirko 
> >> >> >
> >> >> >
> >> >> >
> >> >> >>> +int br_set_ageing_time(struct net_bridge *br, u32 ageing_time)
> >> >> >>> +{
> >> >> >>> + struct switchdev_attr attr = {
> >> >> >>> + .id = SWITCHDEV_ATTR_ID_BRIDGE_AGEING_TIME,
> >> >> >>> + .flags = SWITCHDEV_F_SKIP_EOPNOTSUPP,
> >> >> >>> + .u.ageing_time = ageing_time,
> >> >> >>> + };
> >> >> >>> + unsigned long t = clock_t_to_jiffies(ageing_time);
> >> >> >>> + int err;
> >> >> >>> +
> >> >> >>> + if (t < BR_MIN_AGEING_TIME || t > BR_MAX_AGEING_TIME)
> >> >> >>> + return -ERANGE;
> >> >> >>> +
> >> >> >>> + err = switchdev_port_attr_set(br->dev, );
> >> >> >>
> >> >> >> A thought - given that the ageing time is not a per-bridge-port 
> >> >> >> attr, why
> are we using a "port based api"
> >> >> >> to pass the attribute down?  May be I'm missing something here?
> >> >> >
> >> >> >I think Florian raised the same point earlier.  Sigh, I think this
> >> >> >should be addressedv4 coming soon...thanks guys for keeping the
> >> >> >standard high.
> >> >>
> >> >> Scott, can you tell us how do you want to address this? I like the
> >> >> current implementation.
> >> >
> >> >Scott, didn't you have a plan to add a struct device for the parent of
> >> >switchdev ports?
> >> >
> >> >I think it would be good to introduce such device with an helper to
> >> >retrieve this upper parent, and move the switchdev ops to this guy.
> >> >
> >> >So switchdev drivers may implement such API calls:
> >> >
> >> >.obj_add(struct device *swdev, struct switchdev_obj *obj);
> >> >
> >> >.port_obj_add(struct device *swdev, struct net_device *port,
> >> >  struct switchdev_obj *obj);
> >> >
> >> >Then switchdev code may have a parent API and the current port API may
> >> >look like this:
> >> >
> >> >int switchdev_port_obj_add(struct net_device *dev,
> >> >   struct switchdev_obj *obj)
> >> >{
> >> >struct device *swdev = switchdev_get_parent(dev);
> >> >int err = -EOPNOTSUPP;
> >> >
> >> >if (swdev && swdev->switchdev_ops &&
> >> >swdev->switchdev_ops->port_obj_add)
> >> >err = swdev->switchdev_ops->port_obj_add(swdev, dev, obj);
> >> >
> >> >return err;
> >> >}
> >>
> >> Fro the record, I don't see any reason for this "device". It would just
> >> introduce unnecessary complexicity. So far, we are fine without it.
> >
> > I wouldn't say that. I beleive that an Ethernet switch deserves its
> > struct device in the tree, since it is a physical chip, like any other.
> 
> Agreed, but gating these patches because we do not yet have a device
> driver model for an Ethernet switch outside of its individual ports
> does not seem like it hurts the current patch series, nor the existing
> model (and future).

I don't think we have to gate this patch because of this change, but I wanted to
raise this issue because it didn't seem right to use "port level" API when what 
we
want to configure is really a "bridge level" attribute.

> 
> >
> > Configuring it through one of its port (net_device) is fine, since you
> > want to change the port behavior, and Linux config is on per-port basis.
> >
> > But the complexity is already 

[PATCH net-next v2 2/2] hisilicon net: fix a bug on Hisilicon Network Subsystem

2015-10-11 Thread huangdaode
This patch fixes the wrong judgement of mac_id when get port num.

Signed-off-by: huangdaode 
Signed-off-by: yankejian 
---
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c 
b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c
index 95bf42a..e162e4a 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c
@@ -179,7 +179,7 @@ static int hns_mac_get_inner_port_num(struct hns_mac_cb 
*mac_cb,
return -EINVAL;
}
} else if (mac_cb->dsaf_dev->dsaf_mode < DSAF_MODE_MAX) {
-   if (mac_cb->mac_id <= DSAF_MAX_PORT_NUM_PER_CHIP) {
+   if (mac_cb->mac_id >= DSAF_MAX_PORT_NUM_PER_CHIP) {
dev_err(mac_cb->dev,
"input invalid,%s mac%d vmid%d!\n",
mac_cb->dsaf_dev->ae_dev.name,
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH net-next v2 0/2] hisilicon net: fix some bugs in HNS drivers

2015-10-11 Thread huangdaode
This patchset fixes the two bugs in HNS driver, one is change the code style 
according to the review comments from Joe Perches, another
is fixing the wrong mac_id judgement bug which is found during internal tests.

change log:
v2
  1) remove first bug fix, which is fixed in another patch submitted by
 Arnd Bergmann 
  2) change the code sytyle according to Joe.
v1
 initial version.
 
huangdaode (2):
  hisilicon net: removes the once HANDEL_TX_MSG macro
  hisilicon net: fix a bug on Hisilicon Network Subsystem

 drivers/net/ethernet/hisilicon/hns/hnae.c | 45 +++
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c |  2 +-
 2 files changed, 23 insertions(+), 24 deletions(-)

-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH net-next v2 1/2] hisilicon net: removes the once HANDEL_TX_MSG macro

2015-10-11 Thread huangdaode
This patch changes the code style to make the code more simple.
also removes the once used HNADEL_TX_MSG macro, according to the
review comments from Joe Perches.

Signed-off-by: huangdaode 
Reviewed-by: Joe Perches 
---
 drivers/net/ethernet/hisilicon/hns/hnae.c | 45 +++
 1 file changed, 22 insertions(+), 23 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns/hnae.c 
b/drivers/net/ethernet/hisilicon/hns/hnae.c
index f52e99a..40f5779 100644
--- a/drivers/net/ethernet/hisilicon/hns/hnae.c
+++ b/drivers/net/ethernet/hisilicon/hns/hnae.c
@@ -439,40 +439,39 @@ EXPORT_SYMBOL(hnae_ae_unregister);
 static ssize_t handles_show(struct device *dev,
struct device_attribute *attr, char *buf)
 {
-   ssize_t s = 0;
+   char *s = buf;
struct hnae_ae_dev *hdev = cls_to_ae_dev(dev);
struct hnae_handle *h;
int i = 0, j;
 
list_for_each_entry_rcu(h, >handle_list, node) {
-   s += sprintf(buf + s, "handle %d (eport_id=%u from %s):\n",
+   s += sprintf(s, "handle %d (eport_id=%u from %s):\n",
i++, h->eport_id, h->dev->name);
for (j = 0; j < h->q_num; j++) {
-   s += sprintf(buf + s, "\tqueue[%d] on %p\n",
+   s += sprintf(s, "\tqueue[%d] on %p\n",
 j, h->qs[i]->io_base);
-#define HANDEL_TX_MSG "\t\ttx_ring on %p:%u,%u,%u,%u,%u,%llu,%llu\n"
-   s += sprintf(buf + s,
-HANDEL_TX_MSG,
-h->qs[i]->tx_ring.io_base,
-h->qs[i]->tx_ring.buf_size,
-h->qs[i]->tx_ring.desc_num,
-h->qs[i]->tx_ring.max_desc_num_per_pkt,
-h->qs[i]->tx_ring.max_raw_data_sz_per_desc,
-h->qs[i]->tx_ring.max_pkt_size,
-h->qs[i]->tx_ring.stats.sw_err_cnt,
-h->qs[i]->tx_ring.stats.io_err_cnt);
-   s += sprintf(buf + s,
-   "\t\trx_ring on %p:%u,%u,%llu,%llu,%llu\n",
-   h->qs[i]->rx_ring.io_base,
-   h->qs[i]->rx_ring.buf_size,
-   h->qs[i]->rx_ring.desc_num,
-   h->qs[i]->rx_ring.stats.sw_err_cnt,
-   h->qs[i]->rx_ring.stats.io_err_cnt,
-   h->qs[i]->rx_ring.stats.seg_pkt_cnt);
+   s += sprintf(s,
+   "\t\ttx_ring on 
%p:%u,%u,%u,%u,%u,%llu,%llu\n",
+   h->qs[i]->tx_ring.io_base,
+   h->qs[i]->tx_ring.buf_size,
+   h->qs[i]->tx_ring.desc_num,
+   h->qs[i]->tx_ring.max_desc_num_per_pkt,
+   h->qs[i]->tx_ring.max_raw_data_sz_per_desc,
+   h->qs[i]->tx_ring.max_pkt_size,
+   h->qs[i]->tx_ring.stats.sw_err_cnt,
+   h->qs[i]->tx_ring.stats.io_err_cnt);
+   s += sprintf(s,
+   "\t\trx_ring on %p:%u,%u,%llu,%llu,%llu\n",
+   h->qs[i]->rx_ring.io_base,
+   h->qs[i]->rx_ring.buf_size,
+   h->qs[i]->rx_ring.desc_num,
+   h->qs[i]->rx_ring.stats.sw_err_cnt,
+   h->qs[i]->rx_ring.stats.io_err_cnt,
+   h->qs[i]->rx_ring.stats.seg_pkt_cnt);
}
}
 
-   return s;
+   return s - buf;
 }
 
 static DEVICE_ATTR_RO(handles);
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html