date:20160610

Re: [PATCH net-next 0/4] net_sched: remove qdisc_is_throttled()

2016-06-10 Thread David Miller

From: Eric Dumazet 
Date: Fri, 10 Jun 2016 16:41:35 -0700

> HTB, CBQ and HFSC pay a very high cost updating the qdisc 'throttled'
> status that nothing but CBQ seems to use.
> 
> CBQ usage is flaky anyway, since no qdisc ->enqueue() updates the
> 'throttled' qdisc status.
> 
> This looks like some 'optimization' that actually cost more than code
> without the optimization, and might cause latency issues with CBQ.
> 
> In my tests, I could achieve a 8 % performance increase in TCP_RR
> workload through HTB qdisc, in presence of throttled classes,
> and 5 % without throttled classes.

Series applied, thanks.

Re: [PATCH] [PATCH v2] net: au1000_eth: fix PHY detection

2016-06-10 Thread David Miller

From: Manuel Lauss 
Date: Sat, 11 Jun 2016 00:13:04 +0200

> Commit 7f854420fbfe9d49afe2ffb1df052cfe8e215541
> ("phy: Add API for {un}registering an mdio device to a bus.")
> broke PHY detection on this driver with a copy-paste bug:
> The code is looking 32 times for a PHY at address 0.
> 
> Fixes ethernet on AMD DB1100/DB1500/DB1550 boards which have
> their (autodetected) PHYs at address 31.
> 
> Cc: Andrew Lunn 
> Signed-off-by: Manuel Lauss 
> ---
> v2: unbreak "use highest phy addr" case.

I just noticed this and fixed up my tree to use this version instead.

Re: [PATCH] net: au1000_eth: fix PHY detection

2016-06-10 Thread David Miller

From: Manuel Lauss 
Date: Fri, 10 Jun 2016 16:53:05 +0200

> Commit 7f854420fbfe9d49afe2ffb1df052cfe8e215541
> ("phy: Add API for {un}registering an mdio device to a bus.")
> broke PHY detection on this driver with a copy-paste bug:
> The code is looking 32 times for a PHY at address 0.
> 
> Fixes ethernet on AMD DB1100/DB1500/DB1550 boards which have
> their (autodetected) PHYs at address 31.
> 
> Cc: Andrew Lunn 
> Signed-off-by: Manuel Lauss 

Applied.

Re: [PATCH next] sched: remove NET_XMIT_POLICED

2016-06-10 Thread David Miller

From: Florian Westphal 
Date: Fri, 10 Jun 2016 14:21:29 +0200

> ... its not returned anywhere.
> 
> BATMAN uses it as an intermediate return value to signal
> forwarding vs. buffering, but it will not return POLICED to
> callers outside of BATMAN.
> 
> sch_atm uses it, but on closer look its not returned either since
> the initialisation is always overwritten with qdisc_enqueue() retval.
> 
> Signed-off-by: Florian Westphal 

Applied, thanks.

Re: [PATCH V2 00/11] net: mediatek: various small fixes

2016-06-10 Thread David Miller

From: John Crispin 
Date: Fri, 10 Jun 2016 13:27:57 +0200

> This series contains various small fixes that we stumbled across while
> doing thorough testing and code level reviewing of the driver. The only
> patch that sticks out is the first one, which addresses a DQL related
> issue. The rest are just minor fixes.
> 
> Changes in V2:
> * drop the DQL patch from the list until a better solution is found

Series applied, thanks.

Re: [PATCH] [V3] net: ipconfig: avoid warning by making ic_addrservaddr static

2016-06-10 Thread David Miller

From: Ben Dooks 
Date: Fri, 10 Jun 2016 12:11:06 +0100

> The symbol ic_addrservaddr is not static, but has no declaration
> to match so make it static to fix the following warning:
> 
> net/ipv4/ipconfig.c:130:8: warning: symbol 'ic_addrservaddr' was not 
> declared. Should it be static?
> 
> Signed-off-by: Ben Dooks 

Applied.

Re: [PATCH v5 0/7] Add MDIO bus multiplexer support for iProc SoCs

2016-06-10 Thread David Miller

From: Pramod Kumar 
Date: Fri, 10 Jun 2016 11:03:44 +0530

> Broadcom iProc based SoCs use a MDIO bus multiplexer where child buses
> could be internal as well external to SoCs. These buses could supports
> MDIO transaction compatible to C-22/C-45.
> 
> Broadcom MDIO bus multiplexer is an integrated multiplexer where child bus
> selection and mdio transaction logic lies inside multiplexer itself.
> To accommodate this multiplexer in existing mux framework below changes
> were required-
> 
> 1. Passed MDIO parent bus via mdio_mux_init to MDIO mux framework.
> 
> This patch set includes MDIO bus multiplexer driver along with above
> framework change. It includes one external bus node having Ethernet PHY
> attached and two internal bus node holding PCIe PHYs.
 ...

Series applied, thanks.

Re: [PATCH] tipc: fix potential null pointer dereference in the nla_data function

2016-06-10 Thread David Miller

From: Baozeng Ding 
Date: Fri, 10 Jun 2016 10:26:59 +0800

> Before calling the nla_data function, make sure the argument is not null.
> Fix potential null pointer dereference vulnerability for this.
> 
> Signed-off-by: Baozeng Ding 

TIPC maintainers, please review.

Re: [PATCH] [V2] net: diag: add missing declarations

2016-06-10 Thread David Miller

From: Ben Dooks 
Date: Thu,  9 Jun 2016 18:05:09 +0100

> The functions inet_diag_msg_common_fill and inet_diag_msg_attrs_fill
> seem to have been missed from the include/linux/inet_diag.h header
> file. Add them to fix the following warnings:
> 
> net/ipv4/inet_diag.c:69:6: warning: symbol 'inet_diag_msg_common_fill' was 
> not declared. Should it be static?
> net/ipv4/inet_diag.c:108:5: warning: symbol 'inet_diag_msg_attrs_fill' was 
> not declared. Should it be static?
> 
> Signed-off-by: Ben Dooks 

Applied, thanks.

Re: [PATCHv4 net-next] sctp: sctp should change socket state when shutdown is received

2016-06-10 Thread David Miller

From: Xin Long 
Date: Thu,  9 Jun 2016 22:48:18 +0800

> Now sctp doesn't change socket state upon shutdown reception. It changes
> just the assoc state, even though it's a TCP-style socket.
> 
> For some cases, if we really need to check sk->sk_state, it's necessary to
> fix this issue, at least when we use ss or netstat to dump, we can get a
> more exact information.
> 
> As an improvement, we will change sk->sk_state when we change asoc->state
> to SHUTDOWN_RECEIVED, and also do it in sctp_shutdown to keep consistent
> with sctp_close.
> 
> Signed-off-by: Xin Long 

Applied.

Re: pull-request: mac80211-next 2016-06-09

2016-06-10 Thread David Miller

From: Johannes Berg 
Date: Thu,  9 Jun 2016 15:04:25 +0200

> Here's my first set of changes for -next. The only exciting part are the
> changes from Michał to integrate FQ/codel into mac80211's software queues
> to improve cross-station latency and finally solve much of the latency
> issues so many people have spent so much time talking about, but never
> actually working on fixing until he did :)
> 
> Can I ask you to pull net into net-next, preferably before merging this?
> I have a patch that is unsafe without a fix that I submitted through
> mac80211 previously (last time, not the pull request a few minutes ago),
> and I'd prefer not to have that in our -next trees without the fix. If
> you merge net into net-next before this pull request I can fast-forward
> mac80211-next to this merge later.
> 
> Let me know if there's any problem.

Looks good, pulled, thanks!

Re: [PATCH net-next v3 0/4] arm64 BPF JIT updates

2016-06-10 Thread David Miller

From: Zi Shen Lim 
Date: Wed,  8 Jun 2016 21:18:46 -0700

> Updates for arm64 eBPF JIT.

Series applied to net-next, thanks.

Re: [PATCH net-next v2 0/2] tcp: add NV congestion control

2016-06-10 Thread David Miller

From: Lawrence Brakmo 
Date: Wed, 8 Jun 2016 21:16:43 -0700

> Removed most of the module parameters
> 
> Tested in a rack using between 1 and 380 active TCP-NV flows.
> 
> Consists of the following patches:
> [PATCH net-next v2 1/2] tcp: add in_flight to tcp_skb_cb
> [PATCH net-next v2 2/2] tcp: add NV congestion control
> 
> Signed-off-by: Lawrence Brakmo 

Series applied, thanks.

Re: [PATCH 0/6] virtio_net: use common code for virtio_net_hdr and skb GSO conversion

2016-06-10 Thread David Miller

From: Mike Rapoport 
Date: Wed,  8 Jun 2016 16:09:16 +0300

> This patches introduce virtio_net_hdr_{from,to}_skb functions for
> conversion of GSO information between skb and virtio_net_hdr.

Looks like a nice cleanup to me, series applied, thanks Mike.

Re: [PATCH net-next] net: ipv4: Add ability to have GRE ignore DF bit in IPv4 payloads

2016-06-10 Thread David Miller

From: "Philip Prindeville" 
Date: Tue,  7 Jun 2016 13:48:46 -0600

> From: Philip Prindeville 
> 
> In the presence of firewalls which improperly block ICMP Unreachable
> (including Fragmentation Required) messages, Path MTU Discovery is
> prevented from working.
> 
> A workaround is to handle IPv4 payloads opaquely, ignoring the DF bit--as
> is done for other payloads like AppleTalk--and doing transparent
> fragmentation and reassembly.
> 
> Reviewed-by: Stephen Hemminger 
> Signed-off-by: Philip Prindeville 

I think the feedback that this feature should be mutually exclusive
to pmtudisc is valid and you should address that.

Re: [PATCH] RDS: IB: Remove deprecated create_workqueue

2016-06-10 Thread David Miller

From: Bhaktipriya Shridhar 
Date: Wed, 8 Jun 2016 01:03:45 +0530

> alloc_workqueue replaces deprecated create_workqueue().
> 
> Since the driver is infiniband which can be used as block device and the
> workqueue seems involved in regular operation of the device, so a
> dedicated workqueue has been used  with WQ_MEM_RECLAIM set to guarantee
> forward progress under memory pressure.
> Since there are only a fixed number of work items, explicit concurrency
> limit is unnecessary here.
> 
> Signed-off-by: Bhaktipriya Shridhar 

Applied, thanks.

Re: [PATCH v6] r8152: Add support for setting pass through MAC address on RTL8153-AD

2016-06-10 Thread David Miller

From: Mario Limonciello 
Date: Tue,  7 Jun 2016 13:22:37 -0500

> The RTL8153-AD supports a persistent system specific MAC address.
> This means a device plugged into two different systems with host side
> support will show different (but persistent) MAC addresses.
> 
> This information for the system's persistent MAC address is burned in when
> the system HW is built and available under \_SB.AMAC in the DSDT at runtime.
> 
> This technology is currently implemented in the Dell TB15 and WD15 Type-C
> docks.  More information is available here:
> http://www.dell.com/support/article/us/en/04/SLN301147
> 
> Signed-off-by: Mario Limonciello 
> ---
> Changes from v5:
>  * Correct return value if hex2bin succesful but invalid ether addr

Have things calmed down enough now that I can apply this?

Let's see some ACKs.

Re: [PATCH v2 1/2] net: ethernet: ti: cpsw: remove rx_descs property

2016-06-10 Thread David Miller

From: Ivan Khoronzhuk 
Date: Tue,  7 Jun 2016 16:59:35 +0300

>   if (!cpsw_common_res_usage_state(priv)) {
> + int buf_num;
>   struct cpsw_priv *priv_sl0 = cpsw_get_slave_priv(priv, 0);
>  

Please always order local variable declarations from longest to shortest
line.

Re: [PATCH 0/6] eBPF JIT for PPC64

2016-06-10 Thread David Miller

From: "Naveen N. Rao" 
Date: Tue,  7 Jun 2016 19:02:17 +0530

> Please note that patch [2] is a pre-requisite for this patchset, and is
> not yet upstream.
 ...
> [1] http://thread.gmane.org/gmane.linux.kernel/2188694
> [2] http://thread.gmane.org/gmane.linux.ports.ppc.embedded/96514

Because of #2 I don't think I can take this directly into the networking
tree, right?

Therefore, how would you like this to be merged?

Re: [PATCH net v2] bridge: Fix incorrect re-injection of STP packets

2016-06-10 Thread David Miller

From: Ido Schimmel 
Date: Tue, 7 Jun 2016 12:06:58 +0300

> Commit 8626c56c8279 ("bridge: fix potential use-after-free when hook
> returns QUEUE or STOLEN verdict") fixed incorrect usage of NF_HOOK's
> return value by consuming packets in okfn via br_pass_frame_up().
> 
> However, this function re-injects packets to the Rx path with skb->dev
> set to the bridge device, which breaks kernel's STP, as all STP packets
> appear to originate from the bridge device itself.
> 
> Instead, if STP is enabled and bridge isn't a 802.1ad bridge, then learn
> packet's SMAC and inject it back to the Rx path for further processing
> by the packet handlers.
> 
> The patch also makes netfilter's behavior consistent with regards to
> packets destined to the Bridge Group Address, as no hook registered at
> LOCAL_IN will ever be called, regardless if STP is enabled or not.
> 
> Cc: Florian Westphal 
> Cc: Shmulik Ladkani 
> Cc: Toshiaki Makita 
> Fixes: 8626c56c8279 ("bridge: fix potential use-after-free when hook returns 
> QUEUE or STOLEN verdict")
> Signed-off-by: Jiri Pirko 
> Signed-off-by: Ido Schimmel 

Applied, thanks.

Re: [PATCH net-next v2] bridge: Synchronize unicast filtering with FDB

2016-06-10 Thread David Miller

From: Toshiaki Makita 
Date: Mon,  6 Jun 2016 21:20:13 +0900

> Patrick Schaaf reported that flooding due to a missing fdb entry of
> the address of macvlan on the bridge device caused high CPU
> consumption of an openvpn process behind a tap bridge port.
> Adding an fdb entry of the macvlan address can suppress flooding
> and avoid this problem.
> 
> This change makes bridge able to synchronize unicast filtering with
> fdb automatically so admin do not need to manually add an fdb entry.
> This effectively supports IFF_UNICAST_FLT in bridge, thus adding an
> macvlan device would not place bridge into promiscuous mode as well.
> 
> v2:
> - Test vlan with br_vlan_should_use() in br_fdb_sync_uc() as per
>   Nikolay Aleksandrov.
> 
> Reported-by: Patrick Schaaf 
> Signed-off-by: Toshiaki Makita 

I really need bridging experts to review and ACK/NACK this.

Thanks.

Re: [PATCH] net: phy: smsc: reintroduced unconditional soft reset

2016-06-10 Thread David Miller

From: Manfred Schlaegl 
Date: Mon, 6 Jun 2016 10:47:47 +0200

> We detected some problems using the smsc lan8720a in combination with
> i.MX28 and tracked this down to commit 21009686662f ("net: phy: smsc: move
> smsc_phy_config_init reset part in a soft_reset function")
> With 2100968666 the generic soft reset is replaced by a specific function
> which handles power down state correctly. But additionally the soft reset
> itself got conditional and is therefore also only performed if the phy is
> in power down state.
> 
> This patch keeps the conditional wake up from power down, but
> re-introduces the unconditional soft reset using the generic soft reset
> function.
> It was tested on linux-4.1.25 and linux-4.7.0-rc2.
> 
> Signed-off-by: Manfred Schlaegl 

Applied.

Re: [PATCH] NET: PHY: adds driver for Intel XWAY PHY

2016-06-10 Thread David Miller

From: Hauke Mehrtens 
Date: Sun,  5 Jun 2016 23:41:11 +0200

> This adds support for the Intel (former Lantiq) XWAY 11G and 22E PHYs.
> These PHYs are also named PEF 7061, PEF 7071, PEF 7072.
> 
> Signed-off-by: John Crispin 
> Signed-off-by: Hauke Mehrtens 

Applied to net-next.

Re: [PATCH 4/4] Revert "powerpc/fsl: Move fsl_guts.h out of arch/powerpc"

2016-06-10 Thread Scott Wood

On Thu, 2016-06-02 at 11:01 +0200, Arnd Bergmann wrote:
> On Wednesday, June 1, 2016 8:24:20 PM CEST Scott Wood wrote:
> > On Mon, 2016-05-30 at 15:18 +0200, Arnd Bergmann wrote:
> > > All users of this driver are PowerPC specific and the header file
> > > has no business in the global include/linux/ hierarchy, so move
> > > it back before anyone starts using it on ARM.
> > > 
> > > This reverts commit 948486544713492f00ac8a9572909101ea892cb0.
> > > 
> > > Signed-off-by: Arnd Bergmann 
> > > ---
> > > This part of the series is not required for the eSDHC quirk,
> > > but it restores the asm/fsl_guts.h header so it doesn't accidentally
> > > get abused for this in the future. I found two drivers outside of
> > > arch/powerpc that already accessed the registers directly, but the
> > > functions look fairly contained, and can be easily hidden in an
> > > #ifdef CONFIG_PPC
> > 
> > NACK
> > 
> > Besides adding ifdef pollution for no good reason, this register block is
> > used
> > on some ARM chips as well.  Why is it a problem if "anyone starts using it
> > on
> > ARM"?
> 
> It's just not a good interface when it's defined as "this is the layout of
> a register area that any driver can ioremap() if they can figure out the
> device node".

That's why I want to move accesses into one guts driver.

>  It's not uncommon to have register areas like that, but
> normally you have at the minimum a 'syscon' device to handle locking
> between drivers accessing the same registers and to avoid having to map
> the same area multiple times.

syscon requires device tree changes.

I don't see read-modify-write operations in regmap -- how does locking around
an individual, inherently-atomic load or store help?

> If we need to use 'guts' registers on ARM, we can find a way to abstract
> them properly for the given use cases, using a syscon or a driver with
> exported functions, but just making a PowerPC platform specific header
> global to all Linux drivers by putting it into include/linux doesn't seem
> right.

Again, it's not PowerPC-specific!  It started that way but then the same
register block got put onto some ARM chips.

It's not global to "all Linux drivers", just the ones that choose to include
an fsl-specific header.

If and when all uses of guts are moved into the guts driver, the header can be
moved into drivers/soc/fsl.

> Note that the header file uses a structure definition rather than the more
> common macros with register offsets, which is fine for a driver that has
> its own registers and abstracts them, but it doesn't really work with
> the regmap interface, so if we want to use it with syscon, it also needs to
> be rewritten.

We don't want to use it with syscon.  If we did, the solution wouldn't be to
move the header back to arch/powerpc, but to convert the struct into offsets.

-Scott

Re: [PATCH 3/4] mmc: sdhci-of-esdhc: fix host version for T4240-R1.0-R2.0

2016-06-10 Thread Scott Wood

On Thu, 2016-06-02 at 10:52 +0200, Arnd Bergmann wrote:
> On Wednesday, June 1, 2016 8:11:14 PM CEST Scott Wood wrote:
> > > +#define T4240_HOST_VER ((VENDOR_V_23 << SDHCI_VENDOR_VER_SHIFT) |
> > > SDHCI_SPEC_200)
> > > +static const struct soc_device_attribute esdhc_t4240_quirk = {
> > > + /* T4240 revision < 0x20 uses vendor version 23, SDHCI version 200
> > > */
> > > + { .soc_id = "T4*(0x824000)", .revision = "0x[01]?",
> > > +   .data = (void *)(uintptr_t)(T4240_HOST_VER) },
> > 
> > Why should this code need to care that the string begins with "T4"?  This
> > creates dual maintenance if that were to change.  It's also broken because
> > T4240 has compatible = "fsl,t4240-device-config", "fsl,qoriq-device-config
> > -2.0" and thus with these patches it would incorrectly show up as "P
> > series
> > (0x824000)".  The compatible string of this node was never meant to be a
> > key
> > for choosing a string to describe the system to userspace.
> 
> This is an artifact of not knowing the specific SoC name, and we can change
> that by looking up the name from the SVR value in the soc_device driver.

...or we could keep it simple and just match the number.

> > 0x824000 is a magic number which should be represented symbolically.
> 
> Sure, feel free to change the format of the soc_device string in any
> name,

That's not what I was asking for...  The match should be numeric but the
knowledge of what the number is should come from a symbolic #define.

> > If T4240 is affected, then so are the reduced-core variants T4160 and
> > T4080,
> > but 0x824000 doesn't match them (Yangbo's patch had the same problem). 
> >  And
> > please don't respond with "0x824*"
> > 
> > You also didn't strip out the E bit of SVR which indicates encryption
> > capability and nothing else (Yangbo's patch did not have this problem
> > because
> > it used SVR_SOC_VER).
> 
> Ok, that should be easy enough to fix in the soc_device driver.

No, because the soc_device driver doesn't know whether the consumer of the ID
cares about the E bit.

> > What happens if the revision condition is more complicated, such as <=
> > 0x20
> > with 0x21 being fine?  Multiple quirk entries where before we had as
> > simple
> > comparison?
> 
> I guess yes. I would really hope that there is no need to use this interface
> pervasively, it's really just to work around the cases where there is no
> way to pass the information in DT otherwise.

How does putting it in the DT work when you have multiple versions of the same
SoC, some of which have the bug and some which don't?

> > I fail to see how this approach is an improvement (much less one that
> > needs to
> > hold up a patchset that is fixing a problem and is not touching any
> > generic
> > code).  Why does this need to be a string?
> 
> A string is what user space gets in /sys/devices/soc/*,

It is rare that the kernel accesses information in the exact same way that
userspace does.  And once we expose this to userspace we're stuck with it, so
exporting anything other than a simple number is even less desirable.

>  and we already have
> code that does the same things there to work around quirks, here we just
> use the same interface in a completely generic way. Note that not every
> SoC family uses numbers in the same way, some have multiple subrevisions,
> some have names etc.

Where is the need for a "completely generic way" for one piece of vendor
-specific code to get information that is inherently specific to that vendor,
that is supplied by code specific to that vendor?

-Scott

Re: [PATCH 2/3] net: stmmac: dwmac-rk: keep the PHY up for WoL

2016-06-10 Thread Heiko Stuebner

Am Freitag, 10. Juni 2016, 18:00:38 schrieb Vincent Palatin:
> When suspending the machine, do not shutdown the external PHY by cutting
> its regulator in the mac platform driver suspend code if Wake-on-Lan is
> enabled, else it cannot wake us up.
> In order to do this, split the suspend/resume callbacks from the
> init/exit callbacks, so we can condition the power-down on the lack of
> need to wake-up from the LAN but do it unconditionally when unloading the
> module.
> 
> Signed-off-by: Vincent Palatin 
> ---
>  drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c | 49
> +++--- 1 file changed, 44 insertions(+), 5
> deletions(-)
> 
> diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c
> b/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c index 0cd3ecf..fa05771
> 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c
> @@ -46,6 +46,7 @@ struct rk_priv_data {
>   struct platform_device *pdev;
>   int phy_iface;
>   struct regulator *regulator;
> + bool powered_down;
>   const struct rk_gmac_ops *ops;
> 
>   bool clk_enabled;
> @@ -529,9 +530,8 @@ static struct rk_priv_data *rk_gmac_setup(struct
> platform_device *pdev, return bsp_priv;
>  }
> 
> -static int rk_gmac_init(struct platform_device *pdev, void *priv)
> +static int rk_gmac_powerup(struct rk_priv_data *bsp_priv)
>  {
> - struct rk_priv_data *bsp_priv = priv;
>   int ret;
> 
>   ret = phy_power_on(bsp_priv, true);
> @@ -542,15 +542,52 @@ static int rk_gmac_init(struct platform_device
> *pdev, void *priv) if (ret)
>   return ret;
> 
> + bsp_priv->powered_down = true;
> +
>   return 0;
>  }
> 
> -static void rk_gmac_exit(struct platform_device *pdev, void *priv)
> +static void rk_gmac_powerdown(struct rk_priv_data *gmac)
>  {
> - struct rk_priv_data *gmac = priv;
> -
>   phy_power_on(gmac, false);
>   gmac_clk_enable(gmac, false);
> + gmac->powered_down = true;

naming it gmac->suspended and doing all accesses in the suspend/resume 
callback might provide a nicer way? Now the check is in resume while the 
powerdown callback is setting it.

> +}
> +
> +static int rk_gmac_init(struct platform_device *pdev, void *priv)
> +{
> + struct rk_priv_data *bsp_priv = priv;
> +
> + return rk_gmac_powerup(bsp_priv);
> +}
> +
> +static void rk_gmac_exit(struct platform_device *pdev, void *priv)
> +{
> + struct rk_priv_data *bsp_priv = priv;
> +
> + rk_gmac_powerdown(bsp_priv);
> +}
> +
> +static void rk_gmac_suspend(struct platform_device *pdev, void *priv)
> +{
> + struct rk_priv_data *bsp_priv = priv;
> +
> + /* Keep the PHY up if we use Wake-on-Lan. */
> + if (device_may_wakeup(&pdev->dev))
> + return;
> +
> + rk_gmac_powerdown(bsp_priv);

aka do
bsp_priv->suspended = true;
here

> +}
> +
> +static void rk_gmac_resume(struct platform_device *pdev, void *priv)
> +{
> + struct rk_priv_data *bsp_priv = priv;
> +
> + /* The PHY was up for Wake-on-Lan. */
> + if (!bsp_priv->powered_down)
> + return;
> +
> + rk_gmac_powerup(bsp_priv);

missing something like
bsp_priv->suspended = false;

Right now it looks like your bsp_priv->powered_down will always be true 
after the first suspend with powerdown.

>  }
> 
>  static void rk_fix_speed(void *priv, unsigned int speed)
> @@ -591,6 +628,8 @@ static int rk_gmac_probe(struct platform_device *pdev)
> plat_dat->init = rk_gmac_init;
>   plat_dat->exit = rk_gmac_exit;
>   plat_dat->fix_mac_speed = rk_fix_speed;
> + plat_dat->suspend = rk_gmac_suspend;
> + plat_dat->resume = rk_gmac_resume;
> 
>   plat_dat->bsp_priv = rk_gmac_setup(pdev, data);
>   if (IS_ERR(plat_dat->bsp_priv))
> --
> 2.8.0.rc3.226.g39d4020

Re: [PATCH 2/4] soc: fsl: add GUTS driver for QorIQ platforms

2016-06-10 Thread Scott Wood

On Thu, 2016-06-02 at 10:43 +0200, Arnd Bergmann wrote:
> On Wednesday, June 1, 2016 8:47:22 PM CEST Scott Wood wrote:
> > On Mon, 2016-05-30 at 15:15 +0200, Arnd Bergmann wrote:
> > > diff --git a/drivers/soc/fsl/guts.c b/drivers/soc/fsl/guts.c
> > > new file mode 100644
> > > index ..2f30698f5bcf
> > > --- /dev/null
> > > +++ b/drivers/soc/fsl/guts.c
> > > @@ -0,0 +1,130 @@
> > > +/*
> > > + * Freescale QorIQ Platforms GUTS Driver
> > > + *
> > > + * Copyright (C) 2016 Freescale Semiconductor, Inc.
> > > + *
> > > + * This program is free software; you can redistribute it and/or modify
> > > + * it under the terms of the GNU General Public License as published by
> > > + * the Free Software Foundation; either version 2 of the License, or
> > > + * (at your option) any later version.
> > > + */
> > > +
> > > +#include 
> > > +#include 
> > > +#include 
> > > +#include 
> > > +#include 
> > > +#include 
> > > +#include 
> > > +
> > > +#define GUTS_PVR 0x0a0
> > > +#define GUTS_SVR 0x0a4
> > > +
> > > +struct guts {
> > > + void __iomem *regs;
> > 
> > We already have a struct to define guts.  Why are you not using it?  Why
> > do
> > you consider using it to be "abuse"?  What if we want to move more guts
> > functionality into this driver?
> 
> This structure was in the original patch, I left it in there, only
> removed the inclusion of the powerpc header file, which seemed to
> be misplaced.

I'm not refering "struct guts".  I'm referring to changing
"struct ccsr_guts __iomem *regs" into "void __iomem *regs".

And it's not a powerpc header file.

> > > +/*
> > > + * Table for matching compatible strings, for device tree
> > > + * guts node, for Freescale QorIQ SOCs.
> > > + */
> > > +static const struct of_device_id fsl_guts_of_match[] = {
> > > + /* For T4 & B4 Series SOCs */
> > > + { .compatible = "fsl,qoriq-device-config-1.0", .data = "T4/B4
> > > series" },
> > [snip]
> > > + { .compatible = "fsl,qoriq-device-config-2.0", .data = "P
> > > series"
> > 
> > As noted in my comment on patch 3/4, these descriptions are reversed.
> > 
> > They're also incomplete.  t2080 has device config 2.0.  t1040 is described
> > as
> > 2.0 though it should probably be 2.1 (or better, drop the generic
> > compatible
> > altogether).
> 
> Ok. Ideally I think we'd even look up the specific SoC names from the
> SVC rather than the compatible string. I just didn't have a good list
> for those to put in the driver.

The list is in arch/powerpc/include/asm/mpc85xx.h but I don't know why we need
to convert it to a string in the first place.

> 
> > > + /*
> > > +  * syscon devices default to little-endian, but on powerpc we
> > > have
> > > +  * existing device trees with big-endian maps and an absent
> > > endianess
> > > +  * "big-property"
> > > +  */
> > > + if (!IS_ENABLED(CONFIG_POWERPC) &&
> > > + !of_property_read_bool(dev->of_node, "big-endian"))
> > > + guts->little_endian = true;
> > 
> > This is not a syscon device (Yangbo's patch to add a guts node on ls2080
> > is
> > the only guts node that says "syscon", and that was a leftover from
> > earlier
> > revisions and should probably be removed).  Even if it were, where is it
> > documented that syscon defaults to little-endian?
> 
> Documentation/devicetree/bindings/regmap/regmap.txt
> 
> We had a little screwup here, basically regmap (and by consequence, syscon)
> always defaulted to little-endian way before that was documented, so it's
> too late to change it, 

What causes a device node to fall under the jurisdiction of regmap.txt? 
 Again, these nodes do not claim "syscon" compatibility.

> although I agree it would have made sense to document
> regmap to default to big-endian on powerpc.

Please don't.  It's enough of a mess as is; no need to start throwing in
architecture ifdefs.

> > Documentation/devicetree/bindings/common-properties.txt says that the
> > individual binding specifies the default.  The default for this node
> > should be
> > big-endian because that's what existed before there was a need to describe
> > the
> > endianness.  And we need an update to the guts binding to specify that.
> 
> Good point. This proably means that specifying both the "guts" and "syscon"
> compatible strings implies having to also specify the endianess explicitly
> both ways, because otherwise we break one of the two bindings.

Yes, but the node should only specify "guts".

> 
> > > +
> > > + guts->regs = devm_ioremap_resource(dev, 0);
> > > + if (!guts->regs) {
> > > + ret = -ENOMEM;
> > > + kfree(guts);
> > > + goto out;
> > > + }
> > > +
> > > + fsl_guts_init(dev, guts);
> > > + ret = 0;
> > > +out:
> > > + return ret;
> > > +}
> > > +
> > > +static struct platform_driver fsl_soc_guts = {
> > > + .probe = fsl_guts_probe,
> > > + .driver.of_match_table = fsl_guts_of_match,
> > > +};
> > > +
> > > +module_platform_driver(fsl_soc_guts);
> > 
> > Again, this means that the information is not available during early boot,
> >

Re: net: stmmac: dwmac-rk: fixes for Wake-on-Lan on RK3288

2016-06-10 Thread David Miller

From: Vincent Palatin 
Date: Fri, 10 Jun 2016 18:00:36 -0700

> In order to support Wake-On-Lan when using the RK3288 integrated MAC
> (with an external RGMII PHY), we need to avoid shutting down the regulator
> of the external PHY when the MAC is suspended as it's currently done in the 
> MAC 
> platform code.
> As a first step, create independant callbacks for suspend/resume rather than
> re-using exit/init callbacks. So the dwmac platform driver can behave 
> differently
> on suspend where it might skip shutting the PHY and at module unloading.
> Then update the dwmac-rk driver to switch off the PHY regulator only if we are
> not planning to wake up from the LAN.
> Finally add the PMT interrupt to the MAC device tree configuration, so we can
> wake up the core from it when the PHY has received the magic packet.
> 

Ignore my previous email, but in the future please tag these postings
properly with "[PATCH 0/N] " at the beginning of the subject line.

Thanks.

Re: [PATCH] net: ethernet: ti: cpsw: use destroy ctlr to destroy channels

2016-06-10 Thread David Miller

From: Ivan Khoronzhuk 
Date: Sat, 11 Jun 2016 01:11:54 +0300

> Based on master

master... of what?

Re: [PATCH 1/3] net: stmmac: allow to split suspend/resume from init/exit callbacks

2016-06-10 Thread David Miller


All proper patch serieses must start with an introductory postings
ala "Subject: [PATCH 0/3] ..." which explains what the patch series
is doing at a high level, why, and how.

Thanks.

Re: [PATCH net-next] rxrpc: Trim line-terminal whitespace

2016-06-10 Thread David Miller

From: David Howells 
Date: Fri, 10 Jun 2016 22:30:27 +0100

> Trim line-terminal whitespace in net/rxrpc/
> 
> Signed-off-by: David Howells 

Applied.

Re: [PATCH net-next] rxrpc: Limit the listening backlog

2016-06-10 Thread David Miller

From: David Howells 
Date: Fri, 10 Jun 2016 22:30:37 +0100

> Limit the socket incoming call backlog queue size so that a remote client
> can't pump in sufficient new calls that the server runs out of memory.  Note
> that this is partially theoretical at the moment since whilst the number of
> calls is limited, the number of packets trying to set up new calls is not.
> This will be addressed in a later patch.
> 
> If the caller of listen() specifies a backlog INT_MAX, then they get the
> current maximum; anything else greater than max_backlog or anything
> negative incurs EINVAL.
> 
> The limit on the maximum queue size can be set by:
> 
>   echo N >/proc/sys/net/rxrpc/max_backlog
> 
> where 4<=N<=32.
> 
> Further, set the default backlog to 0, requiring listen() to be called
> before we start actually queueing new calls.  Whilst this kind of is a
> change in the UAPI, the caller can't actually *accept* new calls anyway
> unless they've first called listen() to put the socket into the LISTENING
> state - thus the aforementioned new calls would otherwise just sit there,
> eating up kernel memory.  (Note that sockets that don't have a non-zero
> service ID bound don't get incoming calls anyway.)
> 
> Given that the default backlog is now 0, make the AFS filesystem call
> kernel_listen() to set the maximum backlog for itself.
> 
> Possible improvements include:
> 
>  (1) Trimming a too-large backlog to max_backlog when listen is called.
> 
>  (2) Trimming the backlog value whenever the value is used so that changes
>  to max_backlog are applied to an open socket automatically.  Note that
>  the AFS filesystem opens one socket and keeps it open for extended
>  periods, so would miss out on changes to max_backlog.
> 
>  (3) Having a separate setting for the AFS filesystem.
> 
> Signed-off-by: David Howells 

Applied.

Re: [PATCH net-next v2] net, cls: allow for deleting all filters for given parent

2016-06-10 Thread David Miller

From: Daniel Borkmann 
Date: Fri, 10 Jun 2016 23:10:22 +0200

> Add a possibility where the user can just specify the parent and
> all filters under that parent are then being purged. Currently,
> for example for scripting, one needs to specify pref/prio to have
> a well-defined number for 'tc filter del' command for addressing
> the previously created instance or additionally filter handle in
> case of priorities being the same. Improve usage by allowing the
> option for tc to specify the parent and removing the whole chain
> for that given parent.
> 
> Example usage after patch, no tc changes required:
 ...
> Previously, RTM_DELTFILTER requests with invalid prio of 0 were
> rejected, so only netlink requests with RTM_NEWTFILTER and NLM_F_CREATE
> flag were allowed where the kernel would auto-generate a pref/prio.
> We can piggyback on that and use prio of 0 as a wildcard for
> requests of RTM_DELTFILTER.
> 
> For notifying tc netlink monitoring users (e.g. libnl uses this
> for caching), there are two options, that is, sending individual
> tfilter_notify() notifications for each tcf_proto, or sending a
> single one indicating wildcard removal. I tried both and there
> are pros and cons for each, eventually I decided for sending
> individual tfilter_notify(), so that user space can support this
> seamlessly and there won't be a mess of changing each and every
> application to make sure expectations from the kernel won't break
> when they don't understand single notification. Since linear chains
> don't really scale, I expect only a handful of classifiers to be
> attached at max for a given parent anyway.
> 
> Signed-off-by: Daniel Borkmann 
> Acked-by: Jamal Hadi Salim 

Applied.

Re: [PATCH net-next,v2] tools: hv: Add a script to help bonding synthetic and VF NICs

2016-06-10 Thread David Miller

From: Haiyang Zhang 
Date: Fri, 10 Jun 2016 14:23:12 -0700

> This script helps to create bonding network devices based on synthetic NIC
> (the virtual network adapter usually provided by Hyper-V) and the matching
> VF NIC (SRIOV virtual function). So the synthetic NIC and VF NIC can
> function as one network device, and fail over to the synthetic NIC if VF is
> down.
> 
> Signed-off-by: Haiyang Zhang 
> Reviewed-by: K. Y. Srinivasan 
> ---
> v2: Make it to be distro neutral.

This tool doesn't configure anything.

It fishes information out of sysfs and prints it out.

I don't think this is useful, not appropriate for the kernel
tools subdirectory at all.

When I said make it distro neutral, that meant if you wanted to submit
this again you were expected to do the hard work of making it
configure things properly on all major distributions.

[PATCH 3/3] ARM: dts: rockchip: add interrupt for Wake-on-Lan on RK3288

2016-06-10 Thread Vincent Palatin

In order to use Wake-on-Lan on RK3288 integrated MAC, we need to wake-up
the CPU on the PMT interrupt when the MAC and the PHY are in low power mode.
Adding the interrupt declaration.

Signed-off-by: Vincent Palatin 
---
 arch/arm/boot/dts/rk3288.dtsi | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/arm/boot/dts/rk3288.dtsi b/arch/arm/boot/dts/rk3288.dtsi
index 3b44ef3..3ebee53 100644
--- a/arch/arm/boot/dts/rk3288.dtsi
+++ b/arch/arm/boot/dts/rk3288.dtsi
@@ -539,8 +539,9 @@
gmac: ethernet@ff29 {
compatible = "rockchip,rk3288-gmac";
reg = <0xff29 0x1>;
-   interrupts = ;
-   interrupt-names = "macirq";
+   interrupts = ,
+   ;
+   interrupt-names = "macirq", "eth_wake_irq";
rockchip,grf = <&grf>;
clocks = <&cru SCLK_MAC>,
<&cru SCLK_MAC_RX>, <&cru SCLK_MAC_TX>,
-- 
2.8.0.rc3.226.g39d4020

[PATCH 2/3] net: stmmac: dwmac-rk: keep the PHY up for WoL

2016-06-10 Thread Vincent Palatin

When suspending the machine, do not shutdown the external PHY by cutting
its regulator in the mac platform driver suspend code if Wake-on-Lan is enabled,
else it cannot wake us up.
In order to do this, split the suspend/resume callbacks from the
init/exit callbacks, so we can condition the power-down on the lack of
need to wake-up from the LAN but do it unconditionally when unloading the
module.

Signed-off-by: Vincent Palatin 
---
 drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c | 49 +++---
 1 file changed, 44 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c 
b/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c
index 0cd3ecf..fa05771 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c
@@ -46,6 +46,7 @@ struct rk_priv_data {
struct platform_device *pdev;
int phy_iface;
struct regulator *regulator;
+   bool powered_down;
const struct rk_gmac_ops *ops;
 
bool clk_enabled;
@@ -529,9 +530,8 @@ static struct rk_priv_data *rk_gmac_setup(struct 
platform_device *pdev,
return bsp_priv;
 }
 
-static int rk_gmac_init(struct platform_device *pdev, void *priv)
+static int rk_gmac_powerup(struct rk_priv_data *bsp_priv)
 {
-   struct rk_priv_data *bsp_priv = priv;
int ret;
 
ret = phy_power_on(bsp_priv, true);
@@ -542,15 +542,52 @@ static int rk_gmac_init(struct platform_device *pdev, 
void *priv)
if (ret)
return ret;
 
+   bsp_priv->powered_down = true;
+
return 0;
 }
 
-static void rk_gmac_exit(struct platform_device *pdev, void *priv)
+static void rk_gmac_powerdown(struct rk_priv_data *gmac)
 {
-   struct rk_priv_data *gmac = priv;
-
phy_power_on(gmac, false);
gmac_clk_enable(gmac, false);
+   gmac->powered_down = true;
+}
+
+static int rk_gmac_init(struct platform_device *pdev, void *priv)
+{
+   struct rk_priv_data *bsp_priv = priv;
+
+   return rk_gmac_powerup(bsp_priv);
+}
+
+static void rk_gmac_exit(struct platform_device *pdev, void *priv)
+{
+   struct rk_priv_data *bsp_priv = priv;
+
+   rk_gmac_powerdown(bsp_priv);
+}
+
+static void rk_gmac_suspend(struct platform_device *pdev, void *priv)
+{
+   struct rk_priv_data *bsp_priv = priv;
+
+   /* Keep the PHY up if we use Wake-on-Lan. */
+   if (device_may_wakeup(&pdev->dev))
+   return;
+
+   rk_gmac_powerdown(bsp_priv);
+}
+
+static void rk_gmac_resume(struct platform_device *pdev, void *priv)
+{
+   struct rk_priv_data *bsp_priv = priv;
+
+   /* The PHY was up for Wake-on-Lan. */
+   if (!bsp_priv->powered_down)
+   return;
+
+   rk_gmac_powerup(bsp_priv);
 }
 
 static void rk_fix_speed(void *priv, unsigned int speed)
@@ -591,6 +628,8 @@ static int rk_gmac_probe(struct platform_device *pdev)
plat_dat->init = rk_gmac_init;
plat_dat->exit = rk_gmac_exit;
plat_dat->fix_mac_speed = rk_fix_speed;
+   plat_dat->suspend = rk_gmac_suspend;
+   plat_dat->resume = rk_gmac_resume;
 
plat_dat->bsp_priv = rk_gmac_setup(pdev, data);
if (IS_ERR(plat_dat->bsp_priv))
-- 
2.8.0.rc3.226.g39d4020

net: stmmac: dwmac-rk: fixes for Wake-on-Lan on RK3288

2016-06-10 Thread Vincent Palatin

In order to support Wake-On-Lan when using the RK3288 integrated MAC
(with an external RGMII PHY), we need to avoid shutting down the regulator
of the external PHY when the MAC is suspended as it's currently done in the MAC 
platform code.
As a first step, create independant callbacks for suspend/resume rather than
re-using exit/init callbacks. So the dwmac platform driver can behave 
differently
on suspend where it might skip shutting the PHY and at module unloading.
Then update the dwmac-rk driver to switch off the PHY regulator only if we are
not planning to wake up from the LAN.
Finally add the PMT interrupt to the MAC device tree configuration, so we can
wake up the core from it when the PHY has received the magic packet.

[PATCH 1/3] net: stmmac: allow to split suspend/resume from init/exit callbacks

2016-06-10 Thread Vincent Palatin

Let the stmmac platform drivers provide dedicated suspend and resume
callbacks rather than always re-using the init and exits callbacks.
If the driver does not provide the suspend or resume callback, we fall
back to the old behavior trying to use exit or init.

This allows a specific platform to perform only a partial power-down on
suspend if Wake-on-Lan is enabled but always perform the full shutdown
sequence if the module is unloaded.

Signed-off-by: Vincent Palatin 
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c | 8 ++--
 include/linux/stmmac.h| 2 ++
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c 
b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
index 409db91..a96714d 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
@@ -411,7 +411,9 @@ static int stmmac_pltfr_suspend(struct device *dev)
struct platform_device *pdev = to_platform_device(dev);
 
ret = stmmac_suspend(dev);
-   if (priv->plat->exit)
+   if (priv->plat->suspend)
+   priv->plat->suspend(pdev, priv->plat->bsp_priv);
+   else if (priv->plat->exit)
priv->plat->exit(pdev, priv->plat->bsp_priv);
 
return ret;
@@ -430,7 +432,9 @@ static int stmmac_pltfr_resume(struct device *dev)
struct stmmac_priv *priv = netdev_priv(ndev);
struct platform_device *pdev = to_platform_device(dev);
 
-   if (priv->plat->init)
+   if (priv->plat->resume)
+   priv->plat->resume(pdev, priv->plat->bsp_priv);
+   else if (priv->plat->init)
priv->plat->init(pdev, priv->plat->bsp_priv);
 
return stmmac_resume(dev);
diff --git a/include/linux/stmmac.h b/include/linux/stmmac.h
index ffdaca9..0507dbf 100644
--- a/include/linux/stmmac.h
+++ b/include/linux/stmmac.h
@@ -135,6 +135,8 @@ struct plat_stmmacenet_data {
void (*bus_setup)(void __iomem *ioaddr);
int (*init)(struct platform_device *pdev, void *priv);
void (*exit)(struct platform_device *pdev, void *priv);
+   void (*suspend)(struct platform_device *pdev, void *priv);
+   void (*resume)(struct platform_device *pdev, void *priv);
void *bsp_priv;
struct stmmac_axi *axi;
int has_gmac4;
-- 
2.8.0.rc3.226.g39d4020

Re: [PATCH net-next 0/2] bpf: couple of fixes

2016-06-10 Thread David Miller

From: Daniel Borkmann 
Date: Fri, 10 Jun 2016 21:19:05 +0200

> These are two fixes for BPF, one to introduce xmit recursion limiter for
> tc bpf programs and the other one to reject filters a bit earlier. For
> more details please see individual patches. I have no strong opinion
> to which tree they should go, they apply to both, but I think net-next
> seems okay to me.

Series applied to net-next, thanks Daniel.

Re: PATCH [1/1] ipv4: Prevent malformed UFO fragments in ip_append_page

2016-06-10 Thread David Miller

From: Steven Caron 
Date: Fri, 10 Jun 2016 19:21:26 +

> As  the ip fragment offset field counts 8-byte chunks, non-final ip
> fragments must be multiples of 8 bytes of payload. Depending  on
> the mtu and ip option sizes, ip_append_page wasn't respecting this,
> notably when running NFS under UDP.
> 

All patch submissions need a proper signoff.

Please resubmit this properly.

Re: [net-next,v3] openvswitch: Add packet truncation support.

2016-06-10 Thread David Miller

From: William Tu 
Date: Fri, 10 Jun 2016 11:49:33 -0700

> The patch adds a new OVS action, OVS_ACTION_ATTR_TRUNC, in order to
> truncate packets. A 'max_len' is added for setting up the maximum
> packet size, and a 'cutlen' field is to record the number of bytes
> to trim the packet when the packet is outputting to a port, or when
> the packet is sent to userspace.
> 
> Signed-off-by: William Tu 

Applied.

Re: [PATCH net-next v2] net, cls: allow for deleting all filters for given parent

2016-06-10 Thread Alexei Starovoitov

On Fri, Jun 10, 2016 at 11:10:22PM +0200, Daniel Borkmann wrote:
> Add a possibility where the user can just specify the parent and
> all filters under that parent are then being purged. Currently,
> for example for scripting, one needs to specify pref/prio to have
> a well-defined number for 'tc filter del' command for addressing
> the previously created instance or additionally filter handle in
> case of priorities being the same. Improve usage by allowing the
> option for tc to specify the parent and removing the whole chain
> for that given parent.
> 
> Example usage after patch, no tc changes required:
> 
>   # tc qdisc replace dev foo clsact
>   # tc filter add dev foo egress bpf da obj ./bpf.o
>   # tc filter add dev foo egress bpf da obj ./bpf.o
>   # tc filter show dev foo egress
>   filter protocol all pref 49151 bpf
>   filter protocol all pref 49151 bpf handle 0x1 bpf.o:[classifier] 
> direct-action
>   filter protocol all pref 49152 bpf
>   filter protocol all pref 49152 bpf handle 0x1 bpf.o:[classifier] 
> direct-action
>   # tc filter del dev foo egress
>   # tc filter show dev foo egress
>   #

useful feature. thanks!
Acked-by: Alexei Starovoitov

[PATCH net-next 4/4] net_sched: remove generic throttled management

2016-06-10 Thread Eric Dumazet

__QDISC_STATE_THROTTLED bit manipulation is rather expensive
for HTB and few others.

I already removed it for sch_fq in commit f2600cf02b5b
("net: sched: avoid costly atomic operation in fq_dequeue()")
and so far nobody complained.

When one ore more packets are stuck in one or more throttled
HTB class, a htb dequeue() performs two atomic operations
to clear/set __QDISC_STATE_THROTTLED bit, while root qdisc
lock is held.

Removing this pair of atomic operations bring me a 8 % performance
increase on 200 TCP_RR tests, in presence of throttled classes.

This patch has no side effect, since nothing actually uses
disc_is_throttled() anymore.

Signed-off-by: Eric Dumazet 
---
 include/net/pkt_sched.h   |  4 ++--
 include/net/sch_generic.h | 16 
 net/sched/sch_api.c   |  7 +--
 net/sched/sch_cbq.c   |  2 --
 net/sched/sch_fq.c|  3 +--
 net/sched/sch_hfsc.c  |  1 -
 net/sched/sch_htb.c   |  3 +--
 net/sched/sch_netem.c |  1 -
 net/sched/sch_tbf.c   |  4 +---
 9 files changed, 6 insertions(+), 35 deletions(-)

diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index fea53f4d92ca..7caa99b482c6 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -67,12 +67,12 @@ struct qdisc_watchdog {
 };
 
 void qdisc_watchdog_init(struct qdisc_watchdog *wd, struct Qdisc *qdisc);
-void qdisc_watchdog_schedule_ns(struct qdisc_watchdog *wd, u64 expires, bool 
throttle);
+void qdisc_watchdog_schedule_ns(struct qdisc_watchdog *wd, u64 expires);
 
 static inline void qdisc_watchdog_schedule(struct qdisc_watchdog *wd,
   psched_time_t expires)
 {
-   qdisc_watchdog_schedule_ns(wd, PSCHED_TICKS2NS(expires), true);
+   qdisc_watchdog_schedule_ns(wd, PSCHED_TICKS2NS(expires));
 }
 
 void qdisc_watchdog_cancel(struct qdisc_watchdog *wd);
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 9f3581980c15..9a0d177884c6 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -26,7 +26,6 @@ struct qdisc_rate_table {
 enum qdisc_state_t {
__QDISC_STATE_SCHED,
__QDISC_STATE_DEACTIVATED,
-   __QDISC_STATE_THROTTLED,
 };
 
 struct qdisc_size_table {
@@ -125,21 +124,6 @@ static inline int qdisc_avail_bulklimit(const struct 
netdev_queue *txq)
 #endif
 }
 
-static inline bool qdisc_is_throttled(const struct Qdisc *qdisc)
-{
-   return test_bit(__QDISC_STATE_THROTTLED, &qdisc->state) ? true : false;
-}
-
-static inline void qdisc_throttled(struct Qdisc *qdisc)
-{
-   set_bit(__QDISC_STATE_THROTTLED, &qdisc->state);
-}
-
-static inline void qdisc_unthrottled(struct Qdisc *qdisc)
-{
-   clear_bit(__QDISC_STATE_THROTTLED, &qdisc->state);
-}
-
 struct Qdisc_class_ops {
/* Child qdisc manipulation */
struct netdev_queue *   (*select_queue)(struct Qdisc *, struct tcmsg *);
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index d4a8bbfcc953..401eda6de682 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -583,7 +583,6 @@ static enum hrtimer_restart qdisc_watchdog(struct hrtimer 
*timer)
 timer);
 
rcu_read_lock();
-   qdisc_unthrottled(wd->qdisc);
__netif_schedule(qdisc_root(wd->qdisc));
rcu_read_unlock();
 
@@ -598,15 +597,12 @@ void qdisc_watchdog_init(struct qdisc_watchdog *wd, 
struct Qdisc *qdisc)
 }
 EXPORT_SYMBOL(qdisc_watchdog_init);
 
-void qdisc_watchdog_schedule_ns(struct qdisc_watchdog *wd, u64 expires, bool 
throttle)
+void qdisc_watchdog_schedule_ns(struct qdisc_watchdog *wd, u64 expires)
 {
if (test_bit(__QDISC_STATE_DEACTIVATED,
 &qdisc_root_sleeping(wd->qdisc)->state))
return;
 
-   if (throttle)
-   qdisc_throttled(wd->qdisc);
-
if (wd->last_expires == expires)
return;
 
@@ -620,7 +616,6 @@ EXPORT_SYMBOL(qdisc_watchdog_schedule_ns);
 void qdisc_watchdog_cancel(struct qdisc_watchdog *wd)
 {
hrtimer_cancel(&wd->timer);
-   qdisc_unthrottled(wd->qdisc);
 }
 EXPORT_SYMBOL(qdisc_watchdog_cancel);
 
diff --git a/net/sched/sch_cbq.c b/net/sched/sch_cbq.c
index 6e61f9aa8783..a29fd811d7b9 100644
--- a/net/sched/sch_cbq.c
+++ b/net/sched/sch_cbq.c
@@ -513,7 +513,6 @@ static enum hrtimer_restart cbq_undelay(struct hrtimer 
*timer)
hrtimer_start(&q->delay_timer, time, HRTIMER_MODE_ABS_PINNED);
}
 
-   qdisc_unthrottled(sch);
__netif_schedule(qdisc_root(sch));
return HRTIMER_NORESTART;
 }
@@ -819,7 +818,6 @@ cbq_dequeue(struct Qdisc *sch)
if (skb) {
qdisc_bstats_update(sch, skb);
sch->q.qlen--;
-   qdisc_unthrottled(sch);
return skb;
}
 
diff --git a/net/sched/sch_fq.c b/net/sched/sch_fq.c
index 3c6a47d66a04..f49c81e91acd 100644
--- a/net/sched/sch_fq.c
+++ b/net/sched/sch_fq.c
@@ -445,8 +44

[PATCH net-next 3/4] net_sched: netem: remove qdisc_is_throttled() use

2016-06-10 Thread Eric Dumazet

Looks like it is only there as some optimization attempt.

Since __QDISC_STATE_THROTTLED set/unset is way too expensive,
and netem is the last user, just remove this check.

Signed-off-by: Eric Dumazet 
---
 net/sched/sch_netem.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index 9ca7947ab643..2dbe732ca135 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -582,9 +582,6 @@ static struct sk_buff *netem_dequeue(struct Qdisc *sch)
struct sk_buff *skb;
struct rb_node *p;
 
-   if (qdisc_is_throttled(sch))
-   return NULL;
-
 tfifo_dequeue:
skb = __skb_dequeue(&sch->q);
if (skb) {
-- 
2.8.0.rc3.226.g39d4020

[PATCH net-next 1/4] net_sched: sch_plug: use a private throttled status

2016-06-10 Thread Eric Dumazet

We want to get rid of generic qdisc throttled management,
so this qdisc has to use a private flag.

Signed-off-by: Eric Dumazet 
---
 net/sched/sch_plug.c | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/net/sched/sch_plug.c b/net/sched/sch_plug.c
index ff0d968750df..a12cd37680f8 100644
--- a/net/sched/sch_plug.c
+++ b/net/sched/sch_plug.c
@@ -64,6 +64,8 @@ struct plug_sched_data {
 */
bool unplug_indefinite;
 
+   bool throttled;
+
/* Queue Limit in bytes */
u32 limit;
 
@@ -103,7 +105,7 @@ static struct sk_buff *plug_dequeue(struct Qdisc *sch)
 {
struct plug_sched_data *q = qdisc_priv(sch);
 
-   if (qdisc_is_throttled(sch))
+   if (q->throttled)
return NULL;
 
if (!q->unplug_indefinite) {
@@ -111,7 +113,7 @@ static struct sk_buff *plug_dequeue(struct Qdisc *sch)
/* No more packets to dequeue. Block the queue
 * and wait for the next release command.
 */
-   qdisc_throttled(sch);
+   q->throttled = true;
return NULL;
}
q->pkts_to_release--;
@@ -141,7 +143,7 @@ static int plug_init(struct Qdisc *sch, struct nlattr *opt)
q->limit = ctl->limit;
}
 
-   qdisc_throttled(sch);
+   q->throttled = true;
return 0;
 }
 
@@ -173,7 +175,7 @@ static int plug_change(struct Qdisc *sch, struct nlattr 
*opt)
q->pkts_last_epoch = q->pkts_current_epoch;
q->pkts_current_epoch = 0;
if (q->unplug_indefinite)
-   qdisc_throttled(sch);
+   q->throttled = true;
q->unplug_indefinite = false;
break;
case TCQ_PLUG_RELEASE_ONE:
@@ -182,7 +184,7 @@ static int plug_change(struct Qdisc *sch, struct nlattr 
*opt)
 */
q->pkts_to_release += q->pkts_last_epoch;
q->pkts_last_epoch = 0;
-   qdisc_unthrottled(sch);
+   q->throttled = false;
netif_schedule_queue(sch->dev_queue);
break;
case TCQ_PLUG_RELEASE_INDEFINITE:
@@ -190,7 +192,7 @@ static int plug_change(struct Qdisc *sch, struct nlattr 
*opt)
q->pkts_to_release = 0;
q->pkts_last_epoch = 0;
q->pkts_current_epoch = 0;
-   qdisc_unthrottled(sch);
+   q->throttled = false;
netif_schedule_queue(sch->dev_queue);
break;
case TCQ_PLUG_LIMIT:
-- 
2.8.0.rc3.226.g39d4020

[PATCH net-next 0/4] net_sched: remove qdisc_is_throttled()

2016-06-10 Thread Eric Dumazet

HTB, CBQ and HFSC pay a very high cost updating the qdisc 'throttled'
status that nothing but CBQ seems to use.

CBQ usage is flaky anyway, since no qdisc ->enqueue() updates the
'throttled' qdisc status.

This looks like some 'optimization' that actually cost more than code
without the optimization, and might cause latency issues with CBQ.

In my tests, I could achieve a 8 % performance increase in TCP_RR
workload through HTB qdisc, in presence of throttled classes,
and 5 % without throttled classes.

Eric Dumazet (4):
  net_sched: sch_plug: use a private throttled status
  net_sched: cbq: remove a flaky use of qdisc_is_throttled()
  net_sched: netem: remove qdisc_is_throttled() use
  net_sched: remove generic throttled management

 include/net/pkt_sched.h   |  4 ++--
 include/net/sch_generic.h | 16 
 net/sched/sch_api.c   |  7 +--
 net/sched/sch_cbq.c   |  4 +---
 net/sched/sch_fq.c|  3 +--
 net/sched/sch_hfsc.c  |  1 -
 net/sched/sch_htb.c   |  3 +--
 net/sched/sch_netem.c |  4 
 net/sched/sch_plug.c  | 14 --
 net/sched/sch_tbf.c   |  4 +---
 10 files changed, 15 insertions(+), 45 deletions(-)

-- 
2.8.0.rc3.226.g39d4020

[PATCH net-next 2/4] net_sched: cbq: remove a flaky use of qdisc_is_throttled()

2016-06-10 Thread Eric Dumazet

So far no qdisc ever unset the throttled bit at enqueue() time,
so CBQ usage of qdisc_is_throttled() was flaky.

Since __QDISC_STATE_THROTTLED set/unset is way too expensive
considering that only CBQ was eventually caring for this status,
it would make sense to implement a Qdisc ops ->is_throttled()
if we find that this is needed.

Signed-off-by: Eric Dumazet 
---
 net/sched/sch_cbq.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sched/sch_cbq.c b/net/sched/sch_cbq.c
index f2af31be6370..6e61f9aa8783 100644
--- a/net/sched/sch_cbq.c
+++ b/net/sched/sch_cbq.c
@@ -345,7 +345,7 @@ cbq_mark_toplevel(struct cbq_sched_data *q, struct 
cbq_class *cl)
 {
int toplevel = q->toplevel;
 
-   if (toplevel > cl->level && !(qdisc_is_throttled(cl->q))) {
+   if (toplevel > cl->level) {
psched_time_t now = psched_get_time();
 
do {
-- 
2.8.0.rc3.226.g39d4020

Re: [PATCH v2 0/2] net: ethernet: ti: cpsw: delete rx_descs property

2016-06-10 Thread Schuyler Patton




On 06/08/2016 07:03 PM, Ivan Khoronzhuk wrote:



On 09.06.16 02:11, Schuyler Patton wrote:



On 06/08/2016 09:06 AM, Ivan Khoronzhuk wrote:



On 08.06.16 17:01, Ivan Khoronzhuk wrote:

Hi Schuyer,

On 07.06.16 18:26, Schuyler Patton wrote:

Hi,

On 06/07/2016 08:59 AM, Ivan Khoronzhuk wrote:

There is no reason in rx_descs property because davinici_cpdma
driver splits pool of descriptors equally between tx and rx 
channels.

So, this patch series makes driver to use available number of
descriptors for rx channels.


I agree with the idea of consolidating how the descriptors are 
defined because of
the two variable components, number and size of the pool can be 
confusing to
end users. I would like to request to change how it is being 
proposed here.


I think the number of descriptors should be left in the device 
tree source file as
is and remove the BD size variable and have the driver calculate 
the size of the
pool necessary to support the descriptor request. From an user 
perspective it is
easier I think to be able to list the number of descriptors 
necessary vs. the size

of the pool.

Since the patch series points out how it is used so in the driver 
so to make that

consistent is perhaps change rx_descs to total_descs.

Regards,
Schuyler


The DT entry for cpsw doesn't have property for size of the pool.
It contains only BD ram size, if you mean this. The size of the 
pool is
software decision. Current version of DT entry contain only rx desc 
number.
That is not correct, as it depends on the size of the descriptor, 
which is also
h/w parameter. The DT entry has to describe only h/w part and 
shouldn't contain
driver implementation details, and I'm looking on it from this 
perspective.


Besides, rx_descs describes only rx number of descriptors, that are 
taken from
the same pool as tx descriptors, and setting rx desc to some new 
value doesn't
mean that rest of them are freed for tx. Also, I'm going to send 
series that
adds multi channel support to the driver, and in this case, 
splitting of the
pool will be more sophisticated than now, after what setting those 
parameters
for user (he should do this via device tree) can be even more 
confusing. But,

should -> shouldn't

as it's supposed, it's software decision that shouldn't leak to the 
DT.


If this rx-desc field is removed how will the number of descriptors 
be set?


This field has been used to increase the number of descriptors so high
volume short packets are not dropped due to descriptor exhaustion. 
The current
default number of 64 rx descriptors is too low for gigabit networks. 
Some users
have a strong requirement for zero loss of UDP packets setting this 
field to a

larger number and setting the descriptors off-chip was a means to solve
the problem.
The current implementation of cpdma driver splits descs num on 2 parts 
equally.
Total number = 256, then 128 reserved for rx and 128 for tx, but 
setting this to

64, simply limits usage of reserved rx descriptors to 64, so that:
64 rx descs, 128 tx descs and 64 are always present in the pool but 
cannot be used,

(as new rx descriptor is allocated only after previous was freed).
That means, 64 rx descs are unused. In case of rx descriptor 
exhaustion, an user can
set rx_descs to 128, for instance, in this case all descriptors will 
be in use, but then question,
why intentionally limit number of rx descs, anyway rest 64 descs 
cannot be used for other
purposes. In case of this patch, all rx descs are in use, and no need 
to correct number
of rx descs anymore, use all of themand it doesn't have impact on 
performance, as
anyway, bunch of rx descs were simply limited by DT and unused. So, 
probably, there is no

reason to worry about that.


When we see this issue we set the descriptors to DDR and put a large number
in the desc count. unfortunately I wish I could provide a number, 
usually the issue

is a volume burst of short UDP packets.



PS:
It doesn't concern this patch, but, which PPS makes rx descs to be 
exhausted?...
(In this case "desc_alloc_fail" counter contains some value for rx 
channel,
and can be read with "ethtool -S eth0". Also, the user will be WARNed 
ON by the driver)


it's interesting to test it, I'm worrying about, because in case of 
multichannel,
the pool is split between all channels... they are throughput limited, 
but
anyway, it's good to correlate the number of descs with throughput 
assigned to
a channel, if possible. That has to be possible, if setting to 128 
helps, then
has to be value between 64 and 128 to make handling of rx packets fast 
enough.
After what, can be calculated correlation between number of rx descs 
and throughput

split between channels


With gigabit networks 64 or 128 rx descriptors is not going to enough to 
fix the
DMA overrun problem. Usually we set this number to an arbitrarily large 
2000
descriptors in external DDR to demonstrate it is possible to not drop 
packets. All

this does is move the problem higher up so

[PATCH] [PATCH v2] net: au1000_eth: fix PHY detection

2016-06-10 Thread Manuel Lauss

Commit 7f854420fbfe9d49afe2ffb1df052cfe8e215541
("phy: Add API for {un}registering an mdio device to a bus.")
broke PHY detection on this driver with a copy-paste bug:
The code is looking 32 times for a PHY at address 0.

Fixes ethernet on AMD DB1100/DB1500/DB1550 boards which have
their (autodetected) PHYs at address 31.

Cc: Andrew Lunn 
Signed-off-by: Manuel Lauss 
---
v2: unbreak "use highest phy addr" case.

before:
libphy: au1000_eth_mii: probed
au1000-eth au1000-eth.0 (unnamed net_device) (uninitialized): no PHY found
au1000-eth: probe of au1000-eth.0 failed with error -1

after:
libphy: au1000_eth_mii: probed
AM79C874 au1000-eth-0:1f: attached PHY driver [AM79C874] 
(mii_bus:phy_addr=au1000-eth-0:1f, irq=-1)
au1000-eth au1000-eth.0 eth0: Au1xx0 Ethernet found at 0x1150, irq 36

 drivers/net/ethernet/amd/au1000_eth.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/amd/au1000_eth.c 
b/drivers/net/ethernet/amd/au1000_eth.c
index e0fb0f1..20760e1 100644
--- a/drivers/net/ethernet/amd/au1000_eth.c
+++ b/drivers/net/ethernet/amd/au1000_eth.c
@@ -509,8 +509,8 @@ static int au1000_mii_probe(struct net_device *dev)
 * on the current MAC's MII bus
 */
for (phy_addr = 0; phy_addr < PHY_MAX_ADDR; phy_addr++)
-   if (mdiobus_get_phy(aup->mii_bus, aup->phy_addr)) {
-   phydev = mdiobus_get_phy(aup->mii_bus, aup->phy_addr);
+   if (mdiobus_get_phy(aup->mii_bus, phy_addr)) {
+   phydev = mdiobus_get_phy(aup->mii_bus, phy_addr);
if (!aup->phy_search_highest_addr)
/* break out with first one found */
break;
-- 
2.8.4

[PATCH] net: ethernet: ti: cpsw: use destroy ctlr to destroy channels

2016-06-10 Thread Ivan Khoronzhuk

There is no reason to destroy channels that are destroyed while
cpdma_ctlr destroy. In this case no need to remember how much
channels where created and destroy them by one, as cpdma_ctlr
destroys all of them.

Signed-off-by: Ivan Khoronzhuk 
---

Based on master

 drivers/net/ethernet/ti/cpsw.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index e6bb0ec..5319089 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -2505,8 +2505,6 @@ static int cpsw_probe(struct platform_device *pdev)
 clean_ale_ret:
cpsw_ale_destroy(priv->ale);
 clean_dma_ret:
-   cpdma_chan_destroy(priv->txch);
-   cpdma_chan_destroy(priv->rxch);
cpdma_ctlr_destroy(priv->dma);
 clean_runtime_disable_ret:
pm_runtime_disable(&pdev->dev);
@@ -2534,8 +2532,6 @@ static int cpsw_remove(struct platform_device *pdev)
unregister_netdev(ndev);
 
cpsw_ale_destroy(priv->ale);
-   cpdma_chan_destroy(priv->txch);
-   cpdma_chan_destroy(priv->rxch);
cpdma_ctlr_destroy(priv->dma);
pm_runtime_disable(&pdev->dev);
device_for_each_child(&pdev->dev, NULL, cpsw_remove_child_device);
-- 
1.9.1

Re: [PATCH net-next 4/7] ixgbe: protect vxlan_get_rx_port in ixgbe_service_task with rtnl_lock

2016-06-10 Thread Alexander Duyck

On Mon, Apr 18, 2016 at 12:19 PM, Hannes Frederic Sowa
 wrote:
> vxlan_get_rx_port requires rtnl_lock to be held.
>
> Cc: Jeff Kirsher 
> Cc: Jesse Brandeburg 
> Cc: Shannon Nelson 
> Cc: Carolyn Wyborny 
> Cc: Don Skidmore 
> Cc: Bruce Allan 
> Cc: John Ronciak 
> Cc: Mitch Williams 
> Signed-off-by: Hannes Frederic Sowa 
> ---
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c 
> b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> index 2976df77bf14f5..b2f2cf40f06a87 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> @@ -7192,10 +7192,12 @@ static void ixgbe_service_task(struct work_struct 
> *work)
> return;
> }
>  #ifdef CONFIG_IXGBE_VXLAN
> +   rtnl_lock();
> if (adapter->flags2 & IXGBE_FLAG2_VXLAN_REREG_NEEDED) {
> adapter->flags2 &= ~IXGBE_FLAG2_VXLAN_REREG_NEEDED;
> vxlan_get_rx_port(adapter->netdev);
> }
> +   rtnl_unlock();
>  #endif /* CONFIG_IXGBE_VXLAN */
> ixgbe_reset_subtask(adapter);
> ixgbe_phy_interrupt_subtask(adapter);
> --
> 2.5.5
>

Would it be possible to only take the lock inside of the conditional
statement?  There isn't really any need to be holding the rtnl_lock to
check the adapter flag.

- Alex

Re: [PATCH net-next 4/7] ixgbe: protect vxlan_get_rx_port in ixgbe_service_task with rtnl_lock

2016-06-10 Thread Alexander Duyck

On Fri, Jun 10, 2016 at 2:17 PM, Eric Dumazet  wrote:
> On Mon, 2016-04-18 at 21:19 +0200, Hannes Frederic Sowa wrote:
>> vxlan_get_rx_port requires rtnl_lock to be held.
>>
>> Cc: Jeff Kirsher 
>> Cc: Jesse Brandeburg 
>> Cc: Shannon Nelson 
>> Cc: Carolyn Wyborny 
>> Cc: Don Skidmore 
>> Cc: Bruce Allan 
>> Cc: John Ronciak 
>> Cc: Mitch Williams 
>> Signed-off-by: Hannes Frederic Sowa 
>> ---
>>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c 
>> b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
>> index 2976df77bf14f5..b2f2cf40f06a87 100644
>> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
>> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
>> @@ -7192,10 +7192,12 @@ static void ixgbe_service_task(struct work_struct 
>> *work)
>>   return;
>>   }
>>  #ifdef CONFIG_IXGBE_VXLAN
>> + rtnl_lock();
>>   if (adapter->flags2 & IXGBE_FLAG2_VXLAN_REREG_NEEDED) {
>>   adapter->flags2 &= ~IXGBE_FLAG2_VXLAN_REREG_NEEDED;
>>   vxlan_get_rx_port(adapter->netdev);
>>   }
>> + rtnl_unlock();
>>  #endif /* CONFIG_IXGBE_VXLAN */
>>   ixgbe_reset_subtask(adapter);
>>   ixgbe_phy_interrupt_subtask(adapter);
>
> Although there might be a deadlock with a concurrent
> cancel_work_sync(&adapter->service_task);  from ixgbe_remove()

I don't think we are holding the rtnl_lock when that is called.  The
ixgbe_remove function is a PCI function.  The ixgbe_close routine is
the one called with the rtnl_lock held.

- Alex

[PATCH net-next] rxrpc: Limit the listening backlog

2016-06-10 Thread David Howells

Limit the socket incoming call backlog queue size so that a remote client
can't pump in sufficient new calls that the server runs out of memory.  Note
that this is partially theoretical at the moment since whilst the number of
calls is limited, the number of packets trying to set up new calls is not.
This will be addressed in a later patch.

If the caller of listen() specifies a backlog INT_MAX, then they get the
current maximum; anything else greater than max_backlog or anything
negative incurs EINVAL.

The limit on the maximum queue size can be set by:

echo N >/proc/sys/net/rxrpc/max_backlog

where 4<=N<=32.

Further, set the default backlog to 0, requiring listen() to be called
before we start actually queueing new calls.  Whilst this kind of is a
change in the UAPI, the caller can't actually *accept* new calls anyway
unless they've first called listen() to put the socket into the LISTENING
state - thus the aforementioned new calls would otherwise just sit there,
eating up kernel memory.  (Note that sockets that don't have a non-zero
service ID bound don't get incoming calls anyway.)

Given that the default backlog is now 0, make the AFS filesystem call
kernel_listen() to set the maximum backlog for itself.

Possible improvements include:

 (1) Trimming a too-large backlog to max_backlog when listen is called.

 (2) Trimming the backlog value whenever the value is used so that changes
 to max_backlog are applied to an open socket automatically.  Note that
 the AFS filesystem opens one socket and keeps it open for extended
 periods, so would miss out on changes to max_backlog.

 (3) Having a separate setting for the AFS filesystem.

Signed-off-by: David Howells 
---

 fs/afs/rxrpc.c  |   34 +++---
 net/rxrpc/af_rxrpc.c|   19 +++
 net/rxrpc/ar-internal.h |1 +
 net/rxrpc/misc.c|6 ++
 net/rxrpc/sysctl.c  |   10 ++
 5 files changed, 47 insertions(+), 23 deletions(-)

diff --git a/fs/afs/rxrpc.c b/fs/afs/rxrpc.c
index 63cd9f939f19..4832de84d52c 100644
--- a/fs/afs/rxrpc.c
+++ b/fs/afs/rxrpc.c
@@ -85,18 +85,14 @@ int afs_open_socket(void)
 
skb_queue_head_init(&afs_incoming_calls);
 
+   ret = -ENOMEM;
afs_async_calls = create_singlethread_workqueue("kafsd");
-   if (!afs_async_calls) {
-   _leave(" = -ENOMEM [wq]");
-   return -ENOMEM;
-   }
+   if (!afs_async_calls)
+   goto error_0;
 
ret = sock_create_kern(&init_net, AF_RXRPC, SOCK_DGRAM, PF_INET, 
&socket);
-   if (ret < 0) {
-   destroy_workqueue(afs_async_calls);
-   _leave(" = %d [socket]", ret);
-   return ret;
-   }
+   if (ret < 0)
+   goto error_1;
 
socket->sk->sk_allocation = GFP_NOFS;
 
@@ -111,18 +107,26 @@ int afs_open_socket(void)
   sizeof(srx.transport.sin.sin_addr));
 
ret = kernel_bind(socket, (struct sockaddr *) &srx, sizeof(srx));
-   if (ret < 0) {
-   sock_release(socket);
-   destroy_workqueue(afs_async_calls);
-   _leave(" = %d [bind]", ret);
-   return ret;
-   }
+   if (ret < 0)
+   goto error_2;
+
+   ret = kernel_listen(socket, INT_MAX);
+   if (ret < 0)
+   goto error_2;
 
rxrpc_kernel_intercept_rx_messages(socket, afs_rx_interceptor);
 
afs_socket = socket;
_leave(" = 0");
return 0;
+
+error_2:
+   sock_release(socket);
+error_1:
+   destroy_workqueue(afs_async_calls);
+error_0:
+   _leave(" = %d", ret);
+   return ret;
 }
 
 /*
diff --git a/net/rxrpc/af_rxrpc.c b/net/rxrpc/af_rxrpc.c
index 38512a200db6..a1bcb0e17250 100644
--- a/net/rxrpc/af_rxrpc.c
+++ b/net/rxrpc/af_rxrpc.c
@@ -33,8 +33,6 @@ unsigned int rxrpc_debug; // = RXRPC_DEBUG_KPROTO;
 module_param_named(debug, rxrpc_debug, uint, S_IWUSR | S_IRUGO);
 MODULE_PARM_DESC(debug, "RxRPC debugging mask");
 
-static int sysctl_rxrpc_max_qlen __read_mostly = 10;
-
 static struct proto rxrpc_proto;
 static const struct proto_ops rxrpc_rpc_ops;
 
@@ -191,6 +189,7 @@ static int rxrpc_listen(struct socket *sock, int backlog)
 {
struct sock *sk = sock->sk;
struct rxrpc_sock *rx = rxrpc_sk(sk);
+   unsigned int max;
int ret;
 
_enter("%p,%d", rx, backlog);
@@ -201,17 +200,21 @@ static int rxrpc_listen(struct socket *sock, int backlog)
case RXRPC_UNBOUND:
ret = -EADDRNOTAVAIL;
break;
-   case RXRPC_CLIENT_UNBOUND:
-   case RXRPC_CLIENT_BOUND:
-   default:
-   ret = -EBUSY;
-   break;
case RXRPC_SERVER_BOUND:
ASSERT(rx->local != NULL);
+   max = READ_ONCE(rxrpc_max_backlog);
+   ret = -EINVAL;
+   if (backlog == INT_MAX)
+   backlog = max;
+   else if (backlog < 0 || backlog > m

[PATCH net-next] rxrpc: Trim line-terminal whitespace

2016-06-10 Thread David Howells

Trim line-terminal whitespace in net/rxrpc/

Signed-off-by: David Howells 
---

 net/rxrpc/ar-input.c |2 +-
 net/rxrpc/ar-local.c |2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/rxrpc/ar-input.c b/net/rxrpc/ar-input.c
index d7c2a0bc839e..e0815a033999 100644
--- a/net/rxrpc/ar-input.c
+++ b/net/rxrpc/ar-input.c
@@ -734,7 +734,7 @@ void rxrpc_data_ready(struct sock *sk)
rxrpc_post_packet_to_local(local, skb);
goto out;
}
-   
+
if (sp->hdr.type == RXRPC_PACKET_TYPE_DATA &&
(sp->hdr.callNumber == 0 || sp->hdr.seq == 0))
goto bad_message;
diff --git a/net/rxrpc/ar-local.c b/net/rxrpc/ar-local.c
index 701c42b7050e..111f250b045f 100644
--- a/net/rxrpc/ar-local.c
+++ b/net/rxrpc/ar-local.c
@@ -388,7 +388,7 @@ static void rxrpc_process_local_events(struct work_struct 
*work)
_enter("");
 
atomic_inc(&local->usage);
-   
+
while ((skb = skb_dequeue(&local->event_queue))) {
struct rxrpc_skb_priv *sp = rxrpc_skb(skb);

Re: [PATCH net-next 4/7] ixgbe: protect vxlan_get_rx_port in ixgbe_service_task with rtnl_lock

2016-06-10 Thread Eric Dumazet

On Mon, 2016-04-18 at 21:19 +0200, Hannes Frederic Sowa wrote:
> vxlan_get_rx_port requires rtnl_lock to be held.
> 
> Cc: Jeff Kirsher 
> Cc: Jesse Brandeburg 
> Cc: Shannon Nelson 
> Cc: Carolyn Wyborny 
> Cc: Don Skidmore 
> Cc: Bruce Allan 
> Cc: John Ronciak 
> Cc: Mitch Williams 
> Signed-off-by: Hannes Frederic Sowa 
> ---
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c 
> b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> index 2976df77bf14f5..b2f2cf40f06a87 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> @@ -7192,10 +7192,12 @@ static void ixgbe_service_task(struct work_struct 
> *work)
>   return;
>   }
>  #ifdef CONFIG_IXGBE_VXLAN
> + rtnl_lock();
>   if (adapter->flags2 & IXGBE_FLAG2_VXLAN_REREG_NEEDED) {
>   adapter->flags2 &= ~IXGBE_FLAG2_VXLAN_REREG_NEEDED;
>   vxlan_get_rx_port(adapter->netdev);
>   }
> + rtnl_unlock();
>  #endif /* CONFIG_IXGBE_VXLAN */
>   ixgbe_reset_subtask(adapter);
>   ixgbe_phy_interrupt_subtask(adapter);

Although there might be a deadlock with a concurrent
cancel_work_sync(&adapter->service_task);  from ixgbe_remove()

Re: [PATCH] phy: marvell: remove LED config override

2016-06-10 Thread Andrew Lunn

> We could move that 0x30 LED configuration to .config_init instead of
> .config_aneg, so that if nobody configures it with marvell,reg-init, the
> behavior does not change. I'd have to create a new .config_init function
> for the 1121, 1318 and 1510.
> 
> Would you prefer that?

Hi Clemens

Yes, i would prefer that.

 Thanks
Andrew

[PATCH net-next v2] net, cls: allow for deleting all filters for given parent

2016-06-10 Thread Daniel Borkmann

Add a possibility where the user can just specify the parent and
all filters under that parent are then being purged. Currently,
for example for scripting, one needs to specify pref/prio to have
a well-defined number for 'tc filter del' command for addressing
the previously created instance or additionally filter handle in
case of priorities being the same. Improve usage by allowing the
option for tc to specify the parent and removing the whole chain
for that given parent.

Example usage after patch, no tc changes required:

  # tc qdisc replace dev foo clsact
  # tc filter add dev foo egress bpf da obj ./bpf.o
  # tc filter add dev foo egress bpf da obj ./bpf.o
  # tc filter show dev foo egress
  filter protocol all pref 49151 bpf
  filter protocol all pref 49151 bpf handle 0x1 bpf.o:[classifier] direct-action
  filter protocol all pref 49152 bpf
  filter protocol all pref 49152 bpf handle 0x1 bpf.o:[classifier] direct-action
  # tc filter del dev foo egress
  # tc filter show dev foo egress
  #

Previously, RTM_DELTFILTER requests with invalid prio of 0 were
rejected, so only netlink requests with RTM_NEWTFILTER and NLM_F_CREATE
flag were allowed where the kernel would auto-generate a pref/prio.
We can piggyback on that and use prio of 0 as a wildcard for
requests of RTM_DELTFILTER.

For notifying tc netlink monitoring users (e.g. libnl uses this
for caching), there are two options, that is, sending individual
tfilter_notify() notifications for each tcf_proto, or sending a
single one indicating wildcard removal. I tried both and there
are pros and cons for each, eventually I decided for sending
individual tfilter_notify(), so that user space can support this
seamlessly and there won't be a mess of changing each and every
application to make sure expectations from the kernel won't break
when they don't understand single notification. Since linear chains
don't really scale, I expect only a handful of classifiers to be
attached at max for a given parent anyway.

Signed-off-by: Daniel Borkmann 
Acked-by: Jamal Hadi Salim 
---
 v1 -> v2:
  - Added notification support, thanks Cong!
  - Kept Jamal's ack from previous RFC for v2.

 net/sched/cls_api.c | 37 +
 1 file changed, 33 insertions(+), 4 deletions(-)

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index aafa6bce..cca1ef5 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -103,6 +103,17 @@ static int tfilter_notify(struct net *net, struct sk_buff 
*oskb,
  struct nlmsghdr *n, struct tcf_proto *tp,
  unsigned long fh, int event);
 
+static void tfilter_notify_chain(struct net *net, struct sk_buff *oskb,
+struct nlmsghdr *n,
+struct tcf_proto __rcu **chain, int event)
+{
+   struct tcf_proto __rcu **it_chain;
+   struct tcf_proto *tp;
+
+   for (it_chain = chain; (tp = rtnl_dereference(*it_chain)) != NULL;
+it_chain = &tp->next)
+   tfilter_notify(net, oskb, n, tp, 0, event);
+}
 
 /* Select new prio value from the range, managed by kernel. */
 
@@ -156,11 +167,23 @@ replay:
cl = 0;
 
if (prio == 0) {
-   /* If no priority is given, user wants we allocated it. */
-   if (n->nlmsg_type != RTM_NEWTFILTER ||
-   !(n->nlmsg_flags & NLM_F_CREATE))
+   switch (n->nlmsg_type) {
+   case RTM_DELTFILTER:
+   if (protocol || t->tcm_handle)
+   return -ENOENT;
+   break;
+   case RTM_NEWTFILTER:
+   /* If no priority is provided by the user,
+* we allocate one.
+*/
+   if (n->nlmsg_flags & NLM_F_CREATE) {
+   prio = TC_H_MAKE(0x8000U, 0U);
+   break;
+   }
+   /* fall-through */
+   default:
return -ENOENT;
-   prio = TC_H_MAKE(0x8000U, 0U);
+   }
}
 
/* Find head of filter chain. */
@@ -200,6 +223,12 @@ replay:
err = -EINVAL;
if (chain == NULL)
goto errout;
+   if (n->nlmsg_type == RTM_DELTFILTER && prio == 0) {
+   tfilter_notify_chain(net, skb, n, chain, RTM_DELTFILTER);
+   tcf_destroy_chain(chain);
+   err = 0;
+   goto errout;
+   }
 
/* Check the chain for existence of proto-tcf with this priority */
for (back = chain;
-- 
1.9.3

Re: [PATCH net-next 3/7] mlx4: protect mlx4_en_start_port in mlx4_en_restart with rtnl_lock

2016-06-10 Thread Eric Dumazet

On Mon, 2016-04-18 at 21:19 +0200, Hannes Frederic Sowa wrote:
> mlx4_en_start_port requires rtnl_lock to be held.
> 
> Cc: Eugenia Emantayev 
> Cc: Yishai Hadas 
> Signed-off-by: Hannes Frederic Sowa 
> ---
>  drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c 
> b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> index b4b258c8ca47d4..8bd143dda95d11 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> @@ -1856,6 +1856,7 @@ static void mlx4_en_restart(struct work_struct *work)
>  
>   en_dbg(DRV, priv, "Watchdog task called for port %d\n", priv->port);
>  
> + rtnl_lock();
>   mutex_lock(&mdev->state_lock);
>   if (priv->port_up) {
>   mlx4_en_stop_port(dev, 1);
> @@ -1863,6 +1864,7 @@ static void mlx4_en_restart(struct work_struct *work)
>   en_err(priv, "Failed restarting port %d\n", priv->port);
>   }
>   mutex_unlock(&mdev->state_lock);
> + rtnl_unlock();
>  }
>  
>  static void mlx4_en_clear_stats(struct net_device *dev)

It looks that this work queue is not canceled at device dismantle.

I suspect a use after free is possible.

Re: [PATCH 1/2] rxrpc: Trim line-terminal whitespace [ver #2]

2016-06-10 Thread David Howells

David Miller  wrote:

> There is a clear, documented, way to do this properly with your
> patches themselves.  Put it in the Subject line:
> 
>   Subject: "[PATCH net-next 1/2] rxrpc: ..."

Hmmm... so there is.  I don't remember seeing the netdev-FAQ appear.  I wonder
how to make stgit do this...

David

Re: [PATCH] phy: marvell: remove LED config override

2016-06-10 Thread Clemens Gruber

Hi Andrew,

On Fri, Jun 10, 2016 at 08:38:57PM +0200, Andrew Lunn wrote:
> On Fri, Jun 10, 2016 at 07:42:52PM +0200, Clemens Gruber wrote:
> > Configuring the PHY LED registers for the Marvell 88E1510 and others is
> > not possible, because regardless of the values in marvell,reg-init, it
> > is later overridden in m88e1121_config_aneg with a non-standard default.
> > 
> > This became visible after we moved the call of marvell_of_reg_init in
> > commit 79be1a1c9090 ("phy: marvell: Fix and unify reg-init behavior").
> > Moving it to _config_init was necessary due to the PHY state machine
> > getting stuck if the PHY interrupts were not enabled early on.
> > As that default LED configuration is set in _config_aneg, it overrides
> > the marvell,reg-init configuration from the device tree.
> > 
> > This patch removes this override and allows the user to again set the
> > PHY LED configuration through marvell,reg-init or if that does not
> > exist, keep the original defaults as documented in the datasheet.
> 
> That override code exists for a reason. I don't like just removing it
> without understanding why it is there. Sergei Poselenov added it as
> part of the 1121 support. Maybe he knows why it is there?

Maybe because he needed that special LED configuration (LED[0] .. Link,
LED[1] .. Activity) on his board?
On my board it's LED[0] as Activity and LED[1] as Link LED.

We could move that 0x30 LED configuration to .config_init instead of
.config_aneg, so that if nobody configures it with marvell,reg-init, the
behavior does not change. I'd have to create a new .config_init function
for the 1121, 1318 and 1510.

Would you prefer that?

(I thought it would be good to be consistent with the defaults mentioned
in the datasheet, but being backwards-compatible is probably more
important, you are right)

Regards,
Clemens

Re: [PATCH net-next 8/8] net: dsa: mv88e6xxx: fail on mismatching probe

2016-06-10 Thread Vivien Didelot

Hi,

Andrew Lunn  writes:

> On Wed, Jun 08, 2016 at 08:44:56PM -0400, Vivien Didelot wrote:
>> Now that we have access at probe time to the chip info described in the
>> device tree, check if the probed device matches the device node,
>> otherwise warn the user and fail.
>
> What good is this? So what if the device tree says a different
> model. We don't care, we don't use that information at all, we read it
> from the device itself.

So we can end up with a badly described device tree. It seems to be a
question of rigor vs. flexibility. I don't know much about the DT
philosophy and I don't really mind as long as we warn the user.

I'd like to have other opinions on this though before pushing v2.

Thanks,

Vivien

[PATCH] netlink.7: describe netlink socket options

2016-06-10 Thread Andrey Vagin

Cc: Kir Kolyshkin 
Cc: Michael Kerrisk 
Cc: Herbert Xu 
Cc: Patrick McHardy 
Cc: Christophe Ricard 
Cc: Nicolas Dichtel 
Signed-off-by: Andrey Vagin 
---
 man7/netlink.7 | 75 ++
 1 file changed, 75 insertions(+)

diff --git a/man7/netlink.7 b/man7/netlink.7
index 513f854..b4848df 100644
--- a/man7/netlink.7
+++ b/man7/netlink.7
@@ -368,6 +368,81 @@ and
 .BR NETLINK_SELINUX
 groups allow other users to receive messages.
 No groups allow other users to send messages.
+
+.SS Socket options
+To set or get a netlink socket option, call
+.BR getsockopt (2)
+to read or
+.BR setsockopt (2)
+to write the option with the option level argument set to
+.BR SOL_NETLINK .
+Unless otherwise noted,
+.I optval
+is a pointer to an
+.IR int .
+.TP
+.BR NETLINK_PKTINFO " (since Linux 2.6.14)"
+Enable
+.B nl_pktinfo
+control messages for received packets to get the extended
+destination group number.
+.TP
+.BR NETLINK_ADD_MEMBERSHIP ,\  NETLINK_DROP_MEMBERSHIP " (since Linux 2.6.14)"
+Join/leave a group specified by
+.IR optval .
+.\"commit 9a4595bc7e67962f13232ee55a64e063062c3a99
+.\"Author: Patrick McHardy 
+.TP
+.BR NETLINK_LIST_MEMBERSHIPS " (since Linux 4.2)"
+Retrieve all groups a socket is a member of.
+.I optval
+is a pointer to
+.B __u32
+and
+.I optlen
+is the size of the array. The array is filled with the full membership set of 
the
+socket, and the required array size is returned in
+.I optlen.
+.\"commit b42be38b2778eda2237fc759e55e3b698b05b315
+.\"Author: David Herrmann 
+.TP
+.BR NETLINK_BROADCAST_ERROR " (since Linux 2.6.30)"
+When not set,
+.B netlink_broadcast()
+only reports
+.B ESRCH
+errors and silently ignore
+.B NOBUFS
+errors.
+.\"commit be0c22a46cfb79ab2342bb28fde99afa94ef868e
+.\"Author: Pablo Neira Ayuso 
+.TP
+.BR NETLINK_NO_ENOBUFS " (since Linux 2.6.30)"
+This flag can be used by unicast and broadcast listeners to avoid receiving
+.B ENOBUFS
+errors.
+.\"commit 38938bfe3489394e2eed5e40c9bb8f66a2ce1405
+.\"Author: Pablo Neira Ayuso 
+.TP
+.BR NETLINK_LISTEN_ALL_NSID " (since Linux 4.2)"
+When set, this socket will receive netlink notifications from all network 
namespaces that
+have an
+.I nsid
+assigned into the network namespace where the socket has been opened. The
+.I nsid
+is sent to user space via an ancillary data.
+.\"commit 59324cf35aba5336b611074028777838a963d03b
+.\"Author: Nicolas Dichtel 
+.TP
+.BR NETLINK_CAP_ACK " (since Linux 4.2)"
+The kernel may fail to allocate the necessary room for the acknowledgment
+message back to userspace. This option trims off the payload of the original
+netlink message.
+The netlink message header is still included, so the user can guess from the
+sequence number what is the message that has triggered the acknowledgment.
+.\"commit 0a6a3a23ea6efde079a5b77688541a98bf202721
+.\"Author: Christophe Ricard 
+
 .SH VERSIONS
 The socket interface to netlink is a new feature of Linux 2.2.
 
-- 
2.5.5

Re: [PATCH net-next 7/8] net: dsa: mv88e6xxx: explicit compatible devices

2016-06-10 Thread Vivien Didelot

Hi,

Andrew Lunn  writes:

> On Wed, Jun 08, 2016 at 08:44:55PM -0400, Vivien Didelot wrote:
>> Thanks to the new device probing, we can explicit the exact switch model
>> in the device tree.
>> 
>> Name the driver "mv88e6xxx" and list all its compatible supported chips.
>
> No. This goes against the usual way of doing device tree compatible
> strings. As far as probing goes, all the currently supported switches
> are compatible with 6095. We can at run time determine the specific
> switch model. This list is just a pain to managed, and has no value.
>
> We only need to add a new compatible string when we cannot determine
> at probe time what a switch model is, or we need to read the ID
> register in a different way.

So thinking about this, I might agree that a "compatible" string per
model is not necessary (even though some drivers are doing this, such as
b53), but at least we might want one compatible string per Marvell
switch family. They have different number of ports, different way to
access them via SMI, different way to access the switch ID register.
this information is useful at probe time.

If one string per model is not recommended, I'd suggest one per
family. What do you guys think?

Thanks,

Vivien

Re: [net-next,v3] openvswitch: Add packet truncation support.

2016-06-10 Thread pravin shelar

On Fri, Jun 10, 2016 at 11:49 AM, William Tu  wrote:
> The patch adds a new OVS action, OVS_ACTION_ATTR_TRUNC, in order to
> truncate packets. A 'max_len' is added for setting up the maximum
> packet size, and a 'cutlen' field is to record the number of bytes
> to trim the packet when the packet is outputting to a port, or when
> the packet is sent to userspace.
>
> Signed-off-by: William Tu 
> Cc: Pravin Shelar 
>
> ---
> v2 -> v3
> - Change minimum packet size after truncate to 14 (ETH_HLEN).
> - Pass cutlen to userspace action.
> - Allocate smaller skb buffer if packet is truncated.
>
> v1 -> v2:
> - Use 32 bits cutlen and fix unnecessary cutlen initilization.
> - Fix cutlen leaks to remaining actions under sample action.
> - In queue_userspace_packet, don't extract hlen when using skb_zerocopy.
> ---

Acked-by: Pravin B Shelar 

Thanks.

Re: [PATCH net-next 4/8] net: dsa: mv88e6xxx: do not increment bus refcount

2016-06-10 Thread Andrew Lunn

On Fri, Jun 10, 2016 at 03:59:22PM -0400, Vivien Didelot wrote:
> Hi,
> 
> Andrew Lunn  writes:
> 
> > On Wed, Jun 08, 2016 at 08:44:52PM -0400, Vivien Didelot wrote:
> >> The MDIO device probe and remove functions are respectively incrementing
> >> and decrementing the bus refcount themselves. Since these bus level
> >> actions are out of the device scope, remove them.
> >
> > I agree with the patch. But have you checked the mdio layer is doing
> > the right thing? If not, we should fix that first.
> 
> So I added some printing after incrementing/decrementing the refcount in
> get_device/put_device to track &ps->bus->dev, which name is "0.1".
> 
> Regardless having this patch or not, the refcount of the 0.1 mii_bus
> device is 5 before loading the mv88e6xxx module on vf610-zii-dev-rev-b.
> 
> Below is a portion of dmesg:
> 
> [8.921647] get_device: 400d1000.etherne refcount: 4
> [8.926225] get_device: 0.1 refcount: 2
> [8.929561] get_device: mdio-mux refcount: 5
> [8.934076] get_device: 0.1 refcount: 3
> [8.937446] get_device: 0.1 refcount: 4
> [8.940792] put_device: 0.1 refcount: 3
> [8.944181] libphy: mdio_mux: probed
> [8.947885] mdio_bus 0.1:00: mdio_device_register
> [8.952649] get_device: 0.1:00 refcount: 2
> [8.956283] get_device: 0.1 refcount: 4
> [8.959838] get_device: 0.1:00 refcount: 3
> [8.963991] get_device: 0.1:00 refcount: 4
> [8.967598] put_device: 0.1:00 refcount: 3
> [8.971298] get_device: 0.2 refcount: 2
> [8.974687] get_device: mdio-mux refcount: 7
> 
> So it seems like of_ is managing the bus refcount on events such as a
> new child node (0.1:00).

Great, thanks for looking at this.

Acked-by: Andrew Lunn 

Andrew

Re: [PATCH net-next 4/8] net: dsa: mv88e6xxx: do not increment bus refcount

2016-06-10 Thread Vivien Didelot

Hi,

Andrew Lunn  writes:

> On Wed, Jun 08, 2016 at 08:44:52PM -0400, Vivien Didelot wrote:
>> The MDIO device probe and remove functions are respectively incrementing
>> and decrementing the bus refcount themselves. Since these bus level
>> actions are out of the device scope, remove them.
>
> I agree with the patch. But have you checked the mdio layer is doing
> the right thing? If not, we should fix that first.

So I added some printing after incrementing/decrementing the refcount in
get_device/put_device to track &ps->bus->dev, which name is "0.1".

Regardless having this patch or not, the refcount of the 0.1 mii_bus
device is 5 before loading the mv88e6xxx module on vf610-zii-dev-rev-b.

Below is a portion of dmesg:

[8.921647] get_device: 400d1000.etherne refcount: 4
[8.926225] get_device: 0.1 refcount: 2
[8.929561] get_device: mdio-mux refcount: 5
[8.934076] get_device: 0.1 refcount: 3
[8.937446] get_device: 0.1 refcount: 4
[8.940792] put_device: 0.1 refcount: 3
[8.944181] libphy: mdio_mux: probed
[8.947885] mdio_bus 0.1:00: mdio_device_register
[8.952649] get_device: 0.1:00 refcount: 2
[8.956283] get_device: 0.1 refcount: 4
[8.959838] get_device: 0.1:00 refcount: 3
[8.963991] get_device: 0.1:00 refcount: 4
[8.967598] put_device: 0.1:00 refcount: 3
[8.971298] get_device: 0.2 refcount: 2
[8.974687] get_device: mdio-mux refcount: 7

So it seems like of_ is managing the bus refcount on events such as a
new child node (0.1:00).

Thanks,

Vivien

Re: [PATCH v2 2/3] mwifiex: move .get_tx_power logic to station ioctl file

2016-06-10 Thread Kalle Valo

Javier Martinez Canillas  writes:

>> This patch (2/3) is only for code rearrangement and adds an
>> unnecessary wrapper function. We can simply drop the patch.
>
> Agreed.
>
> Kalle,
>
> Patch 3/3 applies cleanly even after dropping patch 2/3.
> Is that Ok for you or do you want me to re-resend a v3
> with only patches 1/3 and 3/3?

I can drop patch 2, no need to resend. Thanks.

-- 
Kalle Valo

[PATCH net-next,v2] tools: hv: Add a script to help bonding synthetic and VF NICs

2016-06-10 Thread Haiyang Zhang

This script helps to create bonding network devices based on synthetic NIC
(the virtual network adapter usually provided by Hyper-V) and the matching
VF NIC (SRIOV virtual function). So the synthetic NIC and VF NIC can
function as one network device, and fail over to the synthetic NIC if VF is
down.

Signed-off-by: Haiyang Zhang 
Reviewed-by: K. Y. Srinivasan 
---
v2: Make it to be distro neutral.
---
 tools/hv/bondvf.sh |   79 
 1 files changed, 79 insertions(+), 0 deletions(-)
 create mode 100755 tools/hv/bondvf.sh

diff --git a/tools/hv/bondvf.sh b/tools/hv/bondvf.sh
new file mode 100755
index 000..ec7abf1
--- /dev/null
+++ b/tools/hv/bondvf.sh
@@ -0,0 +1,79 @@
+#!/bin/bash
+
+# The script helps creating bonding network devices based on synthetic NIC
+# (the virtual network adapter usually provided by Hyper-V) and the matching
+# VF NIC (SRIOV virtual function). So the synthetic NIC and VF NIC can
+# function as one network device, and fail over to the synthetic NIC if VF is
+# down.
+#
+# Usage:
+# - After configured vSwitch and vNIC with SRIOV, start Linux virtual
+#   machine (VM)
+# - Run this script on the VM. It will list the NIC pairs which should be
+#   bonded together. Also indicates which one should be the primary slave.
+# - User may create bonding NIC configurations based on the Distro manual.
+#   The bonding mode shoudl be "active-backup".
+#   Then, reboot the VM, so that the bonding config are enabled.
+#
+
+sysdir=/sys/class/net
+netvsc_cls={f8615163-df3e-46c5-913f-f2d2f965ed0e}
+bondcnt=0
+
+# Get a list of ethernet names
+list_eth=(`cd $sysdir && ls -d */ | cut -d/ -f1 | grep -v bond`)
+eth_cnt=${#list_eth[@]}
+
+echo List of net devices:
+
+# Get the MAC addresses
+for (( i=0; i < $eth_cnt; i++ ))
+do
+   list_mac[$i]=`cat $sysdir/${list_eth[$i]}/address`
+   echo ${list_eth[$i]}, ${list_mac[$i]}
+done
+
+# Find NIC with matching MAC
+for (( i=0; i < $eth_cnt-1; i++ ))
+do
+   for (( j=i+1; j < $eth_cnt; j++ ))
+   do
+   if [ "${list_mac[$i]}" = "${list_mac[$j]}" ]
+   then
+   list_match[$i]=${list_eth[$j]}
+   break
+   fi
+   done
+done
+
+function create_bond_cfg {
+   echo $'\nBond name:' $1
+   echo should include primary slave: $2, secondary slave: $3
+}
+
+function create_bond {
+   local bondname=bond$bondcnt
+
+   local class_id1=`cat $sysdir/$1/device/class_id 2>/dev/null`
+   local class_id2=`cat $sysdir/$2/device/class_id 2>/dev/null`
+
+   if [ "$class_id1" = "$netvsc_cls" ]
+   then
+   create_bond_cfg $bondname $2 $1
+   elif [ "$class_id2" = "$netvsc_cls" ]
+   then
+   create_bond_cfg $bondname $1 $2
+   else
+   return 0
+   fi
+
+   let bondcnt=bondcnt+1
+}
+
+for (( i=0; i < $eth_cnt-1; i++ ))
+do
+if [ -n "${list_match[$i]}" ]
+then
+   create_bond ${list_eth[$i]} ${list_match[$i]}
+fi
+done
-- 
1.7.4.1

Re: [PATCH v2 2/3] mwifiex: move .get_tx_power logic to station ioctl file

2016-06-10 Thread Javier Martinez Canillas

Hello Amitkumar,

On 06/10/2016 12:26 PM, Amitkumar Karwar wrote:
> Hi Kalle/Javier,
> 
>> From: Javier Martinez Canillas [mailto:jav...@osg.samsung.com]
>> Sent: Friday, June 10, 2016 8:07 PM
>> To: Kalle Valo
>> Cc: linux-ker...@vger.kernel.org; Julian Calaby; Shengzhen Li; Enric
>> Balletbo i Serra; Amitkumar Karwar; netdev@vger.kernel.org; linux-
>> wirel...@vger.kernel.org; Nishant Sarmukadam
>> Subject: Re: [PATCH v2 2/3] mwifiex: move .get_tx_power logic to station
>> ioctl file
>>
>> Hello Kalle,
>>
>> On 06/10/2016 10:30 AM, Kalle Valo wrote:
>>> Javier Martinez Canillas  writes:
>>>
 From: Shengzhen Li 

 Most cfg80211 operations are just a wrappers to functions defined in
 the sta_ioctl.c file, so for consistency move the .get_tx_power logic
>> there.

 Signed-off-by: Shengzhen Li 
 Signed-off-by: Amitkumar Karwar 
 [javier: update the subject line and commit message]
 Signed-off-by: Javier Martinez Canillas 
>>>
>>> [...]
>>>
 --- a/drivers/net/wireless/marvell/mwifiex/cfg80211.c
 +++ b/drivers/net/wireless/marvell/mwifiex/cfg80211.c
 @@ -385,18 +385,10 @@ mwifiex_cfg80211_get_tx_power(struct wiphy
>> *wiphy,
  int *dbm)
  {
struct mwifiex_adapter *adapter =
>> mwifiex_cfg80211_get_adapter(wiphy);
 -  struct mwifiex_private *priv = mwifiex_get_priv(adapter,
 -  MWIFIEX_BSS_ROLE_ANY);
 -  int ret = mwifiex_send_cmd(priv, HostCmd_CMD_RF_TX_PWR,
 - HostCmd_ACT_GEN_GET, 0, NULL, true);
 -
 -  if (ret < 0)
 -  return ret;
 -
 -  /* tx_power_level is set in HostCmd_CMD_RF_TX_PWR command handler
>> */
 -  *dbm = priv->tx_power_level;
 +  struct mwifiex_private *priv;

 -  return 0;
 +  priv = mwifiex_get_priv(adapter, MWIFIEX_BSS_ROLE_ANY);
 +  return mwifiex_get_tx_power(priv, dbm);
  }
>>>
>>> So in patch 1 you added the patch and in patch 2 you move it to a
>>> different location? That doesn't make any sense, can't you just fold
>>> the two patches into one so that the function is added only once.
>>>
>>
>> I posted this patch in v1 but then Amitkumar shared his patch with me
>> that was very similar to mine, only that the logic was in a different
>> location.
>>
>> So I included his delta as a separate patch to try keeping attribution
>> as best as possible.
>>
> 
> This patch (2/3) is only for code rearrangement and adds an unnecessary 
> wrapper function. We can simply drop the patch.
>

Agreed.

Kalle,

Patch 3/3 applies cleanly even after dropping patch 2/3.
Is that Ok for you or do you want me to re-resend a v3
with only patches 1/3 and 3/3?

> Regards,
> Amitkumar
> 

Best regards,
-- 
Javier Martinez Canillas
Open Source Group
Samsung Research America

PATCH [1/1] ipv4: Prevent malformed UFO fragments in ip_append_page

2016-06-10 Thread Steven Caron

As  the ip fragment offset field counts 8-byte chunks, non-final ip
fragments must be multiples of 8 bytes of payload. Depending  on
the mtu and ip option sizes, ip_append_page wasn't respecting this,
notably when running NFS under UDP.

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 124bf0a..21ec54e 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -1239,7 +1239,7 @@ ssize_t   ip_append_page(struct sock *sk, struct flowi4 
*fl4, struct page *page,
if (skb->ip_summed != CHECKSUM_PARTIAL)
return -EOPNOTSUPP;

-   skb_shinfo(skb)->gso_size = mtu - fragheaderlen;
+   skb_shinfo(skb)->gso_size = maxfraglen - fragheaderlen;
skb_shinfo(skb)->gso_type = SKB_GSO_UDP;
}
cork->length += size;

PATCH [0/1] ipv4: Prevent malformed UFO fragments in ip_append_page

2016-06-10 Thread Steven Caron

Hi Hannes,

It's a very simple patch which was already included in the original email.
The patch is from Linux 4.7-rc1.

net/ipv4/ip_output.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

Thanks, 

Steven

[PATCH net-next 0/2] bpf: couple of fixes

2016-06-10 Thread Daniel Borkmann

These are two fixes for BPF, one to introduce xmit recursion limiter for
tc bpf programs and the other one to reject filters a bit earlier. For
more details please see individual patches. I have no strong opinion
to which tree they should go, they apply to both, but I think net-next
seems okay to me.

Thanks!

Daniel Borkmann (2):
  bpf: enforce recursion limit on redirects
  bpf: reject wrong sized filters earlier

 include/linux/netdevice.h |  2 ++
 net/core/dev.c|  6 ++--
 net/core/filter.c | 78 +--
 3 files changed, 53 insertions(+), 33 deletions(-)

-- 
1.9.3

[PATCH net-next 1/2] bpf: enforce recursion limit on redirects

2016-06-10 Thread Daniel Borkmann

Respect the stack's xmit_recursion limit for calls into dev_queue_xmit().
Currently, they are not handeled by the limiter when attached to clsact's
egress parent, for example, and a buggy program redirecting it to the
same device again could run into stack overflow eventually. It would be
good if we could notify an admin to give him a chance to react. We reuse
xmit_recursion instead of having one private to eBPF, so that the stack's
current recursion depth will be taken into account as well. Follow-up to
commit 3896d655f4d4 ("bpf: introduce bpf_clone_redirect() helper") and
27b29f63058d ("bpf: add bpf_redirect() helper").

Signed-off-by: Daniel Borkmann 
Acked-by: Alexei Starovoitov 
---
 include/linux/netdevice.h |  2 ++
 net/core/dev.c|  6 ++
 net/core/filter.c | 55 +--
 3 files changed, 38 insertions(+), 25 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 4f234b1..94eef35 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2389,6 +2389,8 @@ void synchronize_net(void);
 int init_dummy_netdev(struct net_device *dev);
 
 DECLARE_PER_CPU(int, xmit_recursion);
+#define XMIT_RECURSION_LIMIT   10
+
 static inline int dev_recursion_level(void)
 {
return this_cpu_read(xmit_recursion);
diff --git a/net/core/dev.c b/net/core/dev.c
index c43c9d2..b148357 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3144,8 +3144,6 @@ static void skb_update_prio(struct sk_buff *skb)
 DEFINE_PER_CPU(int, xmit_recursion);
 EXPORT_SYMBOL(xmit_recursion);
 
-#define RECURSION_LIMIT 10
-
 /**
  * dev_loopback_xmit - loop back @skb
  * @net: network namespace this loopback is happening in
@@ -3388,8 +3386,8 @@ static int __dev_queue_xmit(struct sk_buff *skb, void 
*accel_priv)
int cpu = smp_processor_id(); /* ok because BHs are off */
 
if (txq->xmit_lock_owner != cpu) {
-
-   if (__this_cpu_read(xmit_recursion) > RECURSION_LIMIT)
+   if (unlikely(__this_cpu_read(xmit_recursion) >
+XMIT_RECURSION_LIMIT))
goto recursion_alert;
 
skb = validate_xmit_skb(skb, dev);
diff --git a/net/core/filter.c b/net/core/filter.c
index 68adb5f..d11744d 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -1603,9 +1603,36 @@ static const struct bpf_func_proto bpf_csum_diff_proto = 
{
.arg5_type  = ARG_ANYTHING,
 };
 
+static inline int __bpf_rx_skb(struct net_device *dev, struct sk_buff *skb)
+{
+   if (skb_at_tc_ingress(skb))
+   skb_postpush_rcsum(skb, skb_mac_header(skb), skb->mac_len);
+
+   return dev_forward_skb(dev, skb);
+}
+
+static inline int __bpf_tx_skb(struct net_device *dev, struct sk_buff *skb)
+{
+   int ret;
+
+   if (unlikely(__this_cpu_read(xmit_recursion) > XMIT_RECURSION_LIMIT)) {
+   net_crit_ratelimited("bpf: recursion limit reached on datapath, 
buggy bpf program?\n");
+   kfree_skb(skb);
+   return -ENETDOWN;
+   }
+
+   skb->dev = dev;
+
+   __this_cpu_inc(xmit_recursion);
+   ret = dev_queue_xmit(skb);
+   __this_cpu_dec(xmit_recursion);
+
+   return ret;
+}
+
 static u64 bpf_clone_redirect(u64 r1, u64 ifindex, u64 flags, u64 r4, u64 r5)
 {
-   struct sk_buff *skb = (struct sk_buff *) (long) r1, *skb2;
+   struct sk_buff *skb = (struct sk_buff *) (long) r1;
struct net_device *dev;
 
if (unlikely(flags & ~(BPF_F_INGRESS)))
@@ -1615,19 +1642,12 @@ static u64 bpf_clone_redirect(u64 r1, u64 ifindex, u64 
flags, u64 r4, u64 r5)
if (unlikely(!dev))
return -EINVAL;
 
-   skb2 = skb_clone(skb, GFP_ATOMIC);
-   if (unlikely(!skb2))
+   skb = skb_clone(skb, GFP_ATOMIC);
+   if (unlikely(!skb))
return -ENOMEM;
 
-   if (flags & BPF_F_INGRESS) {
-   if (skb_at_tc_ingress(skb2))
-   skb_postpush_rcsum(skb2, skb_mac_header(skb2),
-  skb2->mac_len);
-   return dev_forward_skb(dev, skb2);
-   }
-
-   skb2->dev = dev;
-   return dev_queue_xmit(skb2);
+   return flags & BPF_F_INGRESS ?
+  __bpf_rx_skb(dev, skb) : __bpf_tx_skb(dev, skb);
 }
 
 static const struct bpf_func_proto bpf_clone_redirect_proto = {
@@ -1671,15 +1691,8 @@ int skb_do_redirect(struct sk_buff *skb)
return -EINVAL;
}
 
-   if (ri->flags & BPF_F_INGRESS) {
-   if (skb_at_tc_ingress(skb))
-   skb_postpush_rcsum(skb, skb_mac_header(skb),
-  skb->mac_len);
-   return dev_forward_skb(dev, skb);
-   }
-
-   skb->dev = dev;
-   return dev_queue_xmit(skb);
+   return ri->flags & BPF_F_INGRESS ?
+  __bpf_rx_skb(dev, skb) : __bpf_tx_skb(dev, skb);

[PATCH net-next 2/2] bpf: reject wrong sized filters earlier

2016-06-10 Thread Daniel Borkmann

Add a bpf_check_basics_ok() and reject filters that are of invalid
size much earlier, so we don't do any useless work such as invoking
bpf_prog_alloc(). Currently, rejection happens in bpf_check_classic()
only, but it's really unnecessarily late and they should be rejected
at earliest point. While at it, also clean up one bpf_prog_size() to
make it consistent with the remaining invocations.

Signed-off-by: Daniel Borkmann 
Acked-by: Alexei Starovoitov 
---
 net/core/filter.c | 23 +++
 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index d11744d..df6860c 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -748,6 +748,17 @@ static bool chk_code_allowed(u16 code_to_probe)
return codes[code_to_probe];
 }
 
+static bool bpf_check_basics_ok(const struct sock_filter *filter,
+   unsigned int flen)
+{
+   if (filter == NULL)
+   return false;
+   if (flen == 0 || flen > BPF_MAXINSNS)
+   return false;
+
+   return true;
+}
+
 /**
  * bpf_check_classic - verify socket filter code
  * @filter: filter to verify
@@ -768,9 +779,6 @@ static int bpf_check_classic(const struct sock_filter 
*filter,
bool anc_found;
int pc;
 
-   if (flen == 0 || flen > BPF_MAXINSNS)
-   return -EINVAL;
-
/* Check the filter code now */
for (pc = 0; pc < flen; pc++) {
const struct sock_filter *ftest = &filter[pc];
@@ -1065,7 +1073,7 @@ int bpf_prog_create(struct bpf_prog **pfp, struct 
sock_fprog_kern *fprog)
struct bpf_prog *fp;
 
/* Make sure new filter is there and in the right amounts. */
-   if (fprog->filter == NULL)
+   if (!bpf_check_basics_ok(fprog->filter, fprog->len))
return -EINVAL;
 
fp = bpf_prog_alloc(bpf_prog_size(fprog->len), 0);
@@ -1112,7 +1120,7 @@ int bpf_prog_create_from_user(struct bpf_prog **pfp, 
struct sock_fprog *fprog,
int err;
 
/* Make sure new filter is there and in the right amounts. */
-   if (fprog->filter == NULL)
+   if (!bpf_check_basics_ok(fprog->filter, fprog->len))
return -EINVAL;
 
fp = bpf_prog_alloc(bpf_prog_size(fprog->len), 0);
@@ -1207,7 +1215,6 @@ static
 struct bpf_prog *__get_filter(struct sock_fprog *fprog, struct sock *sk)
 {
unsigned int fsize = bpf_classic_proglen(fprog);
-   unsigned int bpf_fsize = bpf_prog_size(fprog->len);
struct bpf_prog *prog;
int err;
 
@@ -1215,10 +1222,10 @@ struct bpf_prog *__get_filter(struct sock_fprog *fprog, 
struct sock *sk)
return ERR_PTR(-EPERM);
 
/* Make sure new filter is there and in the right amounts. */
-   if (fprog->filter == NULL)
+   if (!bpf_check_basics_ok(fprog->filter, fprog->len))
return ERR_PTR(-EINVAL);
 
-   prog = bpf_prog_alloc(bpf_fsize, 0);
+   prog = bpf_prog_alloc(bpf_prog_size(fprog->len), 0);
if (!prog)
return ERR_PTR(-ENOMEM);
 
-- 
1.9.3

Re: [PATCH net-next 4/4] net: dsa: bcm_sf2: Add VLAN support

2016-06-10 Thread Vivien Didelot

Hi Florian,

Florian Fainelli  writes:

> Another possible approach would have been to allocate the vlan structure
> upong port_vlan_prepare() though that would typically result in more
> fragmentation over time once se start using more VLANs.

This is indeed what the switchdev prepare phase is for. If anything goes
wrong before calling the commit phase, switchdev will free the allocated
data.

Thanks,

Vivien

Re: [PATCH net-next 1/5] net: dsa: b53: Add support for Broadcom RoboSwitch

2016-06-10 Thread Andrew Lunn

> >> +static inline int b53_switch_get_reset_gpio(struct b53_device *dev)
> >> +{
> >> +  enum bcm47xx_board board = bcm47xx_board_get();
> >> +
> >> +  switch (board) {
> >> +  case BCM47XX_BOARD_LINKSYS_WRT300NV11:
> >> +  case BCM47XX_BOARD_LINKSYS_WRT310NV1:
> >> +  return 8;
> > 
> > Rather than hard coding it, could we get it from device tree?
> 
> Difficult for now, this is for in-tree MIPS-based platforms under
> arch/mips/bcm47xx, which are using SSB/platform data.

Ah, O.K.

But it would not harm to also support getting the GPIO from device
tree, if device tree is available. But maybe that should wait until
such a user comes along.

 Andrew

Re: [PATCH net-next 4/4] net: dsa: bcm_sf2: Add VLAN support

2016-06-10 Thread Andrew Lunn

On Fri, Jun 10, 2016 at 11:47:48AM -0700, Florian Fainelli wrote:
> On 06/10/2016 05:00 AM, Andrew Lunn wrote:
> >> @@ -148,6 +155,9 @@ struct bcm_sf2_priv {
> >>struct device_node  *master_mii_dn;
> >>struct mii_bus  *slave_mii_bus;
> >>struct mii_bus  *master_mii_bus;
> >> +
> >> +  /* Cache of programmed VLANs */
> >> +  struct bcm_sf2_vlan vlans[VLAN_N_VID];
> > 
> > Hi Florian
> > 
> > This is a 16Kbyte array. So i assume the whole priv structure needs 5
> > pages. Have you had any trouble allocating this much memory,
> > particularly once it has been used for a while and fragmented?
> 
> Well, since this is using the old binding, we can't unload the driver,
> it's built into the kernel, so initializes early enough we have got
> plenty of memory.

Don't you want to use the new binding at some point?

> > I just wondered if it might be better to use vmalloc() for the
> > vlans.
> 
> That's a very good point, I can't really see a drawback to doing this,
> will submit a patch moving this to a dynamic allocation.
> 
> Another possible approach would have been to allocate the vlan structure
> upong port_vlan_prepare() though that would typically result in more
> fragmentation over time once se start using more VLANs.

It is a trade off, code complexity vs saved memory. I don't think such
a table is too bad, assuming your not trying to run the whole system
in 32Mbyes of RAM.

   Andrew

[net-next,v3] openvswitch: Add packet truncation support.

2016-06-10 Thread William Tu

The patch adds a new OVS action, OVS_ACTION_ATTR_TRUNC, in order to
truncate packets. A 'max_len' is added for setting up the maximum
packet size, and a 'cutlen' field is to record the number of bytes
to trim the packet when the packet is outputting to a port, or when
the packet is sent to userspace.

Signed-off-by: William Tu 
Cc: Pravin Shelar 

---
v2 -> v3
- Change minimum packet size after truncate to 14 (ETH_HLEN).
- Pass cutlen to userspace action.
- Allocate smaller skb buffer if packet is truncated.

v1 -> v2:
- Use 32 bits cutlen and fix unnecessary cutlen initilization.
- Fix cutlen leaks to remaining actions under sample action.
- In queue_userspace_packet, don't extract hlen when using skb_zerocopy.
---
---
 include/uapi/linux/openvswitch.h |  6 ++
 net/openvswitch/actions.c| 40 
 net/openvswitch/datapath.c   | 29 +
 net/openvswitch/datapath.h   |  5 -
 net/openvswitch/flow_netlink.c   |  9 +
 net/openvswitch/vport.c  |  1 +
 6 files changed, 73 insertions(+), 17 deletions(-)

diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index bb0d515..8274675 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -580,6 +580,10 @@ enum ovs_userspace_attr {
 
 #define OVS_USERSPACE_ATTR_MAX (__OVS_USERSPACE_ATTR_MAX - 1)
 
+struct ovs_action_trunc {
+   uint32_t max_len; /* Max packet size in bytes. */
+};
+
 /**
  * struct ovs_action_push_mpls - %OVS_ACTION_ATTR_PUSH_MPLS action argument.
  * @mpls_lse: MPLS label stack entry to push.
@@ -703,6 +707,7 @@ enum ovs_nat_attr {
  * enum ovs_action_attr - Action types.
  *
  * @OVS_ACTION_ATTR_OUTPUT: Output packet to port.
+ * @OVS_ACTION_ATTR_TRUNC: Output packet to port with truncated packet size.
  * @OVS_ACTION_ATTR_USERSPACE: Send packet to userspace according to nested
  * %OVS_USERSPACE_ATTR_* attributes.
  * @OVS_ACTION_ATTR_SET: Replaces the contents of an existing header.  The
@@ -756,6 +761,7 @@ enum ovs_action_attr {
   * The data must be zero for the unmasked
   * bits. */
OVS_ACTION_ATTR_CT,   /* Nested OVS_CT_ATTR_* . */
+   OVS_ACTION_ATTR_TRUNC,/* u32 struct ovs_action_trunc. */
 
__OVS_ACTION_ATTR_MAX,/* Nothing past this will be accepted
   * from userspace. */
diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index 9a3eb7a..1ecbd77 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -750,6 +750,14 @@ static void do_output(struct datapath *dp, struct sk_buff 
*skb, int out_port,
 
if (likely(vport)) {
u16 mru = OVS_CB(skb)->mru;
+   u32 cutlen = OVS_CB(skb)->cutlen;
+
+   if (unlikely(cutlen > 0)) {
+   if (skb->len - cutlen > ETH_HLEN)
+   pskb_trim(skb, skb->len - cutlen);
+   else
+   pskb_trim(skb, ETH_HLEN);
+   }
 
if (likely(!mru || (skb->len <= mru + ETH_HLEN))) {
ovs_vport_send(vport, skb);
@@ -775,7 +783,8 @@ static void do_output(struct datapath *dp, struct sk_buff 
*skb, int out_port,
 
 static int output_userspace(struct datapath *dp, struct sk_buff *skb,
struct sw_flow_key *key, const struct nlattr *attr,
-   const struct nlattr *actions, int actions_len)
+   const struct nlattr *actions, int actions_len,
+   uint32_t cutlen)
 {
struct dp_upcall_info upcall;
const struct nlattr *a;
@@ -822,7 +831,7 @@ static int output_userspace(struct datapath *dp, struct 
sk_buff *skb,
} /* End of switch. */
}
 
-   return ovs_dp_upcall(dp, skb, key, &upcall);
+   return ovs_dp_upcall(dp, skb, key, &upcall, cutlen);
 }
 
 static int sample(struct datapath *dp, struct sk_buff *skb,
@@ -832,6 +841,7 @@ static int sample(struct datapath *dp, struct sk_buff *skb,
const struct nlattr *acts_list = NULL;
const struct nlattr *a;
int rem;
+   u32 cutlen = 0;
 
for (a = nla_data(attr), rem = nla_len(attr); rem > 0;
 a = nla_next(a, &rem)) {
@@ -858,13 +868,24 @@ static int sample(struct datapath *dp, struct sk_buff 
*skb,
return 0;
 
/* The only known usage of sample action is having a single user-space
+* action, or having a truncate action followed by a single user-space
 * action. Treat this usage as a special case.
 * The output_userspace() should clone the skb to be sent to the
 * user space. This skb will be consumed by its caller.
 */
+   if (unlikely(nla_type(a) == OVS_ACTION_ATTR_TRUNC)) {
+   struct ovs_action_tru

Re: [PATCH net-next 1/5] net: dsa: b53: Add support for Broadcom RoboSwitch

2016-06-10 Thread Florian Fainelli

On 06/10/2016 05:11 AM, Andrew Lunn wrote:
>> +static void b53_switch_reset_gpio(struct b53_device *dev)
>> +{
>> +int gpio = dev->reset_gpio;
>> +
>> +if (gpio < 0)
>> +return;
>> +
>> +/* Reset sequence: RESET low(50ms)->high(20ms)
>> + */
>> +gpio_set_value(gpio, 0);
>> +mdelay(50);
>> +
>> +gpio_set_value(gpio, 1);
>> +mdelay(20);
>> +
>> +dev->current_page = 0xff;
>> +}
> 
> Hi Florian
> 
> It would be better to use the gpiod API here, so the active hi/active
> low flag is respected.

OK.

> 
>> +dev->reset_gpio = b53_switch_get_reset_gpio(dev);
>> +if (dev->reset_gpio >= 0) {
>> +ret = devm_gpio_request_one(dev->dev, dev->reset_gpio,
>> +GPIOF_OUT_INIT_HIGH, "robo_reset");
>> +if (ret)
>> +return ret;
>> +}
>> +
>> +return 0;
>> +}
> 
>> +static inline int b53_switch_get_reset_gpio(struct b53_device *dev)
>> +{
>> +enum bcm47xx_board board = bcm47xx_board_get();
>> +
>> +switch (board) {
>> +case BCM47XX_BOARD_LINKSYS_WRT300NV11:
>> +case BCM47XX_BOARD_LINKSYS_WRT310NV1:
>> +return 8;
> 
> Rather than hard coding it, could we get it from device tree?

Difficult for now, this is for in-tree MIPS-based platforms under
arch/mips/bcm47xx, which are using SSB/platform data.
-- 
Florian

Re: [PATCH net-next 4/4] net: dsa: bcm_sf2: Add VLAN support

2016-06-10 Thread Florian Fainelli

On 06/10/2016 05:00 AM, Andrew Lunn wrote:
>> @@ -148,6 +155,9 @@ struct bcm_sf2_priv {
>>  struct device_node  *master_mii_dn;
>>  struct mii_bus  *slave_mii_bus;
>>  struct mii_bus  *master_mii_bus;
>> +
>> +/* Cache of programmed VLANs */
>> +struct bcm_sf2_vlan vlans[VLAN_N_VID];
> 
> Hi Florian
> 
> This is a 16Kbyte array. So i assume the whole priv structure needs 5
> pages. Have you had any trouble allocating this much memory,
> particularly once it has been used for a while and fragmented?

Well, since this is using the old binding, we can't unload the driver,
it's built into the kernel, so initializes early enough we have got
plenty of memory.

> 
> I just wondered if it might be better to use vmalloc() for the
> vlans.

That's a very good point, I can't really see a drawback to doing this,
will submit a patch moving this to a dynamic allocation.

Another possible approach would have been to allocate the vlan structure
upong port_vlan_prepare() though that would typically result in more
fragmentation over time once se start using more VLANs.
-- 
Florian

Re: [PATCH] phy: marvell: remove LED config override

2016-06-10 Thread Andrew Lunn

On Fri, Jun 10, 2016 at 07:42:52PM +0200, Clemens Gruber wrote:
> Configuring the PHY LED registers for the Marvell 88E1510 and others is
> not possible, because regardless of the values in marvell,reg-init, it
> is later overridden in m88e1121_config_aneg with a non-standard default.
> 
> This became visible after we moved the call of marvell_of_reg_init in
> commit 79be1a1c9090 ("phy: marvell: Fix and unify reg-init behavior").
> Moving it to _config_init was necessary due to the PHY state machine
> getting stuck if the PHY interrupts were not enabled early on.
> As that default LED configuration is set in _config_aneg, it overrides
> the marvell,reg-init configuration from the device tree.
> 
> This patch removes this override and allows the user to again set the
> PHY LED configuration through marvell,reg-init or if that does not
> exist, keep the original defaults as documented in the datasheet.

That override code exists for a reason. I don't like just removing it
without understanding why it is there. Sergei Poselenov added it as
part of the 1121 support. Maybe he knows why it is there?

 Andrew

Re: [PATCH net 1/2] ovs/vxlan: fix rtnl notifications on iface deletion

2016-06-10 Thread Jesse Gross

On Fri, Jun 10, 2016 at 2:32 AM, Nicolas Dichtel
 wrote:
> The function vxlan_dev_create() (only used by ovs) never calls
> rtnl_configure_link(). The consequence is that dev->rtnl_link_state is
> never set to RTNL_LINK_INITIALIZED.
> During the deletion phase, the function rollback_registered_many() sends
> a RTM_DELLINK only if dev->rtnl_link_state is set to RTNL_LINK_INITIALIZED.
>
> Fixes: dcc38c033b32 ("openvswitch: Re-add CONFIG_OPENVSWITCH_VXLAN")
> CC: Thomas Graf 
> CC: Pravin B Shelar 
> Signed-off-by: Nicolas Dichtel 

Does Geneve need this as well?

Re: [PATCH net 2/2] ovs/gre: fix rtnl notifications on iface deletion

2016-06-10 Thread pravin shelar

On Fri, Jun 10, 2016 at 2:32 AM, Nicolas Dichtel
 wrote:
> The function gretap_fb_dev_create() (only used by ovs) never calls
> rtnl_configure_link(). The consequence is that dev->rtnl_link_state is
> never set to RTNL_LINK_INITIALIZED.
> During the deletion phase, the function rollback_registered_many() sends
> a RTM_DELLINK only if dev->rtnl_link_state is set to RTNL_LINK_INITIALIZED.
>
> Fixes: b2acd1dc3949 ("openvswitch: Use regular GRE net_device instead of 
> vport")
> CC: Thomas Graf 
> CC: Pravin B Shelar 
> Signed-off-by: Nicolas Dichtel 
> ---
>  net/ipv4/ip_gre.c | 4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
> index 4d2025f7ec57..c6a46ee87a3a 100644
> --- a/net/ipv4/ip_gre.c
> +++ b/net/ipv4/ip_gre.c
> @@ -1146,6 +1146,10 @@ struct net_device *gretap_fb_dev_create(struct net 
> *net, const char *name,
> if (err)
> goto out;
>
> +   err = rtnl_configure_link(dev, NULL);
> +   if (err < 0)
> +   goto out;
> +

As the first patch, similar issue exist here. But existing error
handling code already is buggy.

> return dev;
>  out:
> free_netdev(dev);
> --
> 2.4.2
>

Re: [PATCH net 1/2] ovs/vxlan: fix rtnl notifications on iface deletion

2016-06-10 Thread pravin shelar

On Fri, Jun 10, 2016 at 2:32 AM, Nicolas Dichtel
 wrote:
> The function vxlan_dev_create() (only used by ovs) never calls
> rtnl_configure_link(). The consequence is that dev->rtnl_link_state is
> never set to RTNL_LINK_INITIALIZED.
> During the deletion phase, the function rollback_registered_many() sends
> a RTM_DELLINK only if dev->rtnl_link_state is set to RTNL_LINK_INITIALIZED.
>
> Fixes: dcc38c033b32 ("openvswitch: Re-add CONFIG_OPENVSWITCH_VXLAN")
> CC: Thomas Graf 
> CC: Pravin B Shelar 
> Signed-off-by: Nicolas Dichtel 
> ---
>  drivers/net/vxlan.c | 6 ++
>  1 file changed, 6 insertions(+)
>
> diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
> index f999db2f97b4..f7669d60b3f7 100644
> --- a/drivers/net/vxlan.c
> +++ b/drivers/net/vxlan.c
> @@ -2972,6 +2972,12 @@ struct net_device *vxlan_dev_create(struct net *net, 
> const char *name,
> return ERR_PTR(err);
> }
>
> +   err = rtnl_configure_link(dev, NULL);
> +   if (err < 0) {
> +   free_netdev(dev);
> +   return ERR_PTR(err);
> +   }
> +
Patch looks good except the error handling code. earlier call to
vxlan_dev_configure() registers the net-device  and add the device to
vxlan device list. it needs to be undone before freeing the device.

[PATCH] phy: marvell: remove LED config override

2016-06-10 Thread Clemens Gruber

Configuring the PHY LED registers for the Marvell 88E1510 and others is
not possible, because regardless of the values in marvell,reg-init, it
is later overridden in m88e1121_config_aneg with a non-standard default.

This became visible after we moved the call of marvell_of_reg_init in
commit 79be1a1c9090 ("phy: marvell: Fix and unify reg-init behavior").
Moving it to _config_init was necessary due to the PHY state machine
getting stuck if the PHY interrupts were not enabled early on.
As that default LED configuration is set in _config_aneg, it overrides
the marvell,reg-init configuration from the device tree.

This patch removes this override and allows the user to again set the
PHY LED configuration through marvell,reg-init or if that does not
exist, keep the original defaults as documented in the datasheet.

Signed-off-by: Clemens Gruber 
---
 drivers/net/phy/marvell.c | 14 +-
 1 file changed, 1 insertion(+), 13 deletions(-)

diff --git a/drivers/net/phy/marvell.c b/drivers/net/phy/marvell.c
index 280e879..7a94039 100644
--- a/drivers/net/phy/marvell.c
+++ b/drivers/net/phy/marvell.c
@@ -115,10 +115,6 @@
 #define MII_88E1318S_PHY_WOL_CTRL_CLEAR_WOL_STATUS  BIT(12)
 #define MII_88E1318S_PHY_WOL_CTRL_MAGIC_PACKET_MATCH_ENABLE BIT(14)
 
-#define MII_88E1121_PHY_LED_CTRL   16
-#define MII_88E1121_PHY_LED_PAGE   3
-#define MII_88E1121_PHY_LED_DEF0x0030
-
 #define MII_M1011_PHY_STATUS   0x11
 #define MII_M1011_PHY_STATUS_1000  0x8000
 #define MII_M1011_PHY_STATUS_100   0x4000
@@ -407,15 +403,7 @@ static int m88e1121_config_aneg(struct phy_device *phydev)
if (err < 0)
return err;
 
-   oldpage = phy_read(phydev, MII_MARVELL_PHY_PAGE);
-
-   phy_write(phydev, MII_MARVELL_PHY_PAGE, MII_88E1121_PHY_LED_PAGE);
-   phy_write(phydev, MII_88E1121_PHY_LED_CTRL, MII_88E1121_PHY_LED_DEF);
-   phy_write(phydev, MII_MARVELL_PHY_PAGE, oldpage);
-
-   err = genphy_config_aneg(phydev);
-
-   return err;
+   return genphy_config_aneg(phydev);
 }
 
 static int m88e1318_config_aneg(struct phy_device *phydev)
-- 
2.8.3

Re: [PATCH 1/2] rxrpc: Trim line-terminal whitespace [ver #2]

2016-06-10 Thread David Miller

From: David Howells 
Date: Fri, 10 Jun 2016 14:41:21 +0100

> Hi Dave,
> 
> Can you please add these two patches to net-next?
> 
> Thanks,
> David

David, please stop telling me which tree things go into this way.

There is a clear, documented, way to do this properly with your
patches themselves.  Put it in the Subject line:

Subject: "[PATCH net-next 1/2] rxrpc: ..."

Also, you must provides a header introductory "[PATCH net-next 0/2] ..."
postings giving a high level overview of what the patch series is
about, what it is doing, why it is doing so, and how it is doing it.

Re: [PATCH V2 00/11] net: mediatek: various small fixes

2016-06-10 Thread John Crispin



On 10/06/2016 19:46, David Miller wrote:
> From: John Crispin 
> Date: Fri, 10 Jun 2016 13:30:15 +0200
> 
>>
>>
>> On 10/06/2016 13:27, John Crispin wrote:
>>> This series contains various small fixes that we stumbled across while
>>> doing thorough testing and code level reviewing of the driver. The only
>>> patch that sticks out is the first one, which addresses a DQL related
>>> issue. The rest are just minor fixes.
>>>
>>
>> Hi David,
>>
>> i forgot to remove the last sentence here. can you live with that as it
>> wont end up in the git history or do you want me to send a V3 with this
>> line removed.
> 
> What do you mean it won't end up in the GIT history?  I always put this
> introductory text into the merge commit for the patch series.
> 
> Now, I can remove it for you, which I will do.
> 

cool, thanks a lot !

John

Re: [PATCH 2/2] rxrpc: Limit the listening backlog

2016-06-10 Thread David Miller

From: David Howells 
Date: Fri, 10 Jun 2016 14:40:28 +0100

> David Howells  wrote:
> 
>> +else if (backlog > max)
>> +break;
> 
> Oops.  Please ignore this version of the patch - I forgot to commit a change
> to it before posting.

You also forgot to include a "0/2" introductory posting for the series
as well.

[PATCH iproute2] ip rule: Add support for l3mdev rules

2016-06-10 Thread David Ahern

Kernel commit 96c63fa7393d ("net: Add l3mdev rule") added support for
the FRA_L3MDEV attribute. The attribute enables use of l3mdev rules
which mean 'get table id from l3 master device'. This patch adds
support to iproute2 to show, add and delete rules with this attribute.

Signed-off-by: David Ahern 
---
Assumes FRA_L3MDEV is added by header sync to latest kernel tree

 include/linux/fib_rules.h |  1 +
 ip/iprule.c   | 21 ++---
 2 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/ip/iprule.c b/ip/iprule.c
index 7cb19e4d5ebd..b0566d5e4259 100644
--- a/ip/iprule.c
+++ b/ip/iprule.c
@@ -37,7 +37,7 @@ static void usage(void)
fprintf(stderr, "   ip rule { flush | save | restore }\n");
fprintf(stderr, "   ip rule [ list ]\n");
fprintf(stderr, "SELECTOR := [ not ] [ from PREFIX ] [ to PREFIX ] [ 
tos TOS ] [ fwmark FWMARK[/MASK] ]\n");
-   fprintf(stderr, "[ iif STRING ] [ oif STRING ] [ pref 
NUMBER ]\n");
+   fprintf(stderr, "[ iif STRING ] [ oif STRING ] [ pref 
NUMBER ] [ l3mdev ]\n");
fprintf(stderr, "ACTION := [ table TABLE_ID ]\n");
fprintf(stderr, "  [ nat ADDRESS ]\n");
fprintf(stderr, "  [ realms [SRCREALM/]DSTREALM ]\n");
@@ -139,6 +139,11 @@ int print_rule(const struct sockaddr_nl *who, struct 
nlmsghdr *n, void *arg)
fprintf(fp, "[detached] ");
}
 
+   if (tb[FRA_L3MDEV]) {
+   if (rta_getattr_u8(tb[FRA_L3MDEV]))
+   fprintf(fp, "lookup [l3mdev-table] ");
+   }
+
table = rtm_get_table(r, tb);
if (table) {
fprintf(fp, "lookup %s ", rtnl_rttable_n2a(table, b1, 
sizeof(b1)));
@@ -311,7 +316,9 @@ static int iprule_restore(void)
 
 static int iprule_modify(int cmd, int argc, char **argv)
 {
+   int l3mdev_rule = 0;
int table_ok = 0;
+   __u32 tid = 0;
struct {
struct nlmsghdr n;
struct rtmsgr;
@@ -393,8 +400,6 @@ static int iprule_modify(int cmd, int argc, char **argv)
addattr32(&req.n, sizeof(req), FRA_FLOW, realm);
} else if (matches(*argv, "table") == 0 ||
   strcmp(*argv, "lookup") == 0) {
-   __u32 tid;
-
NEXT_ARG();
if (rtnl_rttable_a2n(&tid, *argv))
invarg("invalid table ID\n", *argv);
@@ -428,6 +433,10 @@ static int iprule_modify(int cmd, int argc, char **argv)
} else if (strcmp(*argv, "oif") == 0) {
NEXT_ARG();
addattr_l(&req.n, sizeof(req), FRA_OIFNAME, *argv, 
strlen(*argv)+1);
+   } else if (strcmp(*argv, "l3mdev") == 0) {
+   addattr8(&req.n, sizeof(req), FRA_L3MDEV, 1);
+   table_ok = 1;
+   l3mdev_rule = 1;
} else if (strcmp(*argv, "nat") == 0 ||
   matches(*argv, "map-to") == 0) {
NEXT_ARG();
@@ -461,6 +470,12 @@ static int iprule_modify(int cmd, int argc, char **argv)
argv++;
}
 
+   if (l3mdev_rule && tid != 0) {
+   fprintf(stderr,
+   "table can not be specified for l3mdev rules\n");
+   return -EINVAL;
+   }
+
if (req.r.rtm_family == AF_UNSPEC)
req.r.rtm_family = AF_INET;
 
-- 
2.1.4

Re: [PATCH V2 00/11] net: mediatek: various small fixes

2016-06-10 Thread David Miller

From: John Crispin 
Date: Fri, 10 Jun 2016 13:30:15 +0200

> 
> 
> On 10/06/2016 13:27, John Crispin wrote:
>> This series contains various small fixes that we stumbled across while
>> doing thorough testing and code level reviewing of the driver. The only
>> patch that sticks out is the first one, which addresses a DQL related
>> issue. The rest are just minor fixes.
>> 
> 
> Hi David,
> 
> i forgot to remove the last sentence here. can you live with that as it
> wont end up in the git history or do you want me to send a V3 with this
> line removed.

What do you mean it won't end up in the GIT history?  I always put this
introductory text into the merge commit for the patch series.

Now, I can remove it for you, which I will do.

Re: [net-next,v2] openvswitch: Add packet truncation support.

2016-06-10 Thread pravin shelar

On Thu, Jun 9, 2016 at 4:57 PM, William Tu  wrote:
> The patch adds a new OVS action, OVS_ACTION_ATTR_TRUNC, in order to
> truncate packets. A 'max_len' is added for setting up the maximum
> packet size, and a 'cutlen' field is to record the number of bytes
> to trim the packet when the packet is outputting to a port, or when
> the packet is sent to userspace.
>
> Signed-off-by: William Tu 
> ---
> v1 -> v2:
> - Use 32 bits cutlen and fix unnecessary cutlen initilization.
> - Fix cutlen leaks to remaining actions under sample action.
> - In queue_userspace_packet, don't extract hlen when using skb_zerocopy.
> ---
>  include/uapi/linux/openvswitch.h |  7 +++
>  net/openvswitch/actions.c| 34 ++
>  net/openvswitch/datapath.c   | 27 ---
>  net/openvswitch/datapath.h   |  5 -
>  net/openvswitch/flow_netlink.c   |  9 +
>  net/openvswitch/vport.c  |  1 +
>  6 files changed, 67 insertions(+), 16 deletions(-)

...
...
> diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
> index 9a3eb7a..64dbc80 100644
> --- a/net/openvswitch/actions.c
> +++ b/net/openvswitch/actions.c
> @@ -750,6 +750,10 @@ static void do_output(struct datapath *dp, struct 
> sk_buff *skb, int out_port,
>
> if (likely(vport)) {
> u16 mru = OVS_CB(skb)->mru;
> +   u32 cutlen = OVS_CB(skb)->cutlen;
> +
> +   if (cutlen > 0)
> +   pskb_trim(skb, skb->len - cutlen);
>
Lets add check min packet length. also add unlikely annotation  to the
condition.

> if (likely(!mru || (skb->len <= mru + ETH_HLEN))) {
> ovs_vport_send(vport, skb);
...
...
> @@ -1059,8 +1077,16 @@ static int do_execute_actions(struct datapath *dp, 
> struct sk_buff *skb,
> prev_port = nla_get_u32(a);
> break;
>
> +   case OVS_ACTION_ATTR_TRUNC: {
> +   struct ovs_action_trunc *trunc = nla_data(a);
> +
> +   if (skb->len > trunc->max_len)
> +   OVS_CB(skb)->cutlen = skb->len - 
> trunc->max_len;
> +   break;
> +   }
> +
> case OVS_ACTION_ATTR_USERSPACE:
> -   output_userspace(dp, skb, key, a, attr, len);
> +   output_userspace(dp, skb, key, a, attr, len, 0);
Need to pass cutlen here to truncate packet for userspace action. Also
reset cutlen to zero for the skb after this action is executed.

> break;
>
> case OVS_ACTION_ATTR_HASH:
> diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
> index 856bd8d..26f947e 100644
> --- a/net/openvswitch/datapath.c
> +++ b/net/openvswitch/datapath.c
...
...
>
>  static int queue_userspace_packet(struct datapath *dp, struct sk_buff *skb,
>   const struct sw_flow_key *key,
> - const struct dp_upcall_info *upcall_info)
> + const struct dp_upcall_info *upcall_info,
> + uint32_t cutlen)
>  {
> struct ovs_header *upcall;
> struct sk_buff *nskb = NULL;
> @@ -515,9 +520,9 @@ static int queue_userspace_packet(struct datapath *dp, 
> struct sk_buff *skb,
> err = -ENOBUFS;
> goto out;
> }
> -   nla->nla_len = nla_attr_size(skb->len);
> +   nla->nla_len = nla_attr_size(skb->len - cutlen);
>
> -   err = skb_zerocopy(user_skb, skb, skb->len, hlen);
> +   err = skb_zerocopy(user_skb, skb, skb->len - cutlen, hlen);
> if (err)
> goto out;

If packet is truncated there is no need to allocate buffer according
to full packet packet length.

Re: [PATCH][RT] netpoll: Always take poll_lock when doing polling

2016-06-10 Thread Steven Rostedt

On Fri, 10 Jun 2016 18:20:22 +0200
Sebastian Andrzej Siewior  wrote:

> > We actually triggered a starvation due to this. I was just seeing if
> > Alison hit the same issue we did in our tests.  
> 
> Okay. Didn't get this information from him. But this is only because
> the softirqs not running in ksoftirqd, right?

Right, I think we supplied this patch before we added the one that
would push repeated softirqs to ksoftirqd. We probably should see if
that patch alone fixes our issue.

-- Steve

[PATCHi next] veth: advertise peer link relationship for both devices

2016-06-10 Thread Lance Richardson

Currently, when creating a veth pair, notfications to user
space only include link peer for one end of the veth pair:
   # ip monitor link &
   # ip link add dev vm1 type veth peer name vm2
   30: vm2@NONE:  mtu 1500 qdisc noop state DOWN
   link/ether be:e3:b7:0e:14:52 brd ff:ff:ff:ff:ff:ff
   31: vm1@vm2:  mtu 1500 qdisc noop state DOWN
   link/ether da:e6:a6:c5:42:54 brd ff:ff:ff:ff:ff:ff

With this change, netlink notifications are sent with complete
information for both interfaces of the veth pair:

   # 3: vm2@NONE:  mtu 1500 qdisc noop state DOWN
   link/ether e2:94:54:8a:ac:f5 brd ff:ff:ff:ff:ff:ff
   4: vm1@vm2:  mtu 1500 qdisc noop state DOWN
   link/ether b2:05:70:e0:fc:35 brd ff:ff:ff:ff:ff:ff
   3: vm2@vm1:  mtu 1500 qdisc noop state DOWN
   link/ether e2:94:54:8a:ac:f5 brd ff:ff:ff:ff:ff:ff

Signed-off-by: Lance Richardson 
---
 drivers/net/veth.c   | 10 +-
 net/core/rtnetlink.c | 10 --
 2 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index f37a6e6..9151686 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -458,7 +458,7 @@ static int veth_newlink(struct net *src_net, struct 
net_device *dev,
netif_carrier_off(dev);
 
/*
-* tie the deviced together
+* tie the devices together
 */
 
priv = netdev_priv(dev);
@@ -466,8 +466,16 @@ static int veth_newlink(struct net *src_net, struct 
net_device *dev,
 
priv = netdev_priv(peer);
rcu_assign_pointer(priv->peer, dev);
+
+   err = rtnl_configure_link(dev, NULL);
+   if (err < 0)
+   goto err_configure_dev;
+
+   rtmsg_ifinfo(RTM_NEWLINK, peer, 0, GFP_KERNEL);
return 0;
 
+err_configure_dev:
+   /* nothing to do */
 err_register_dev:
/* nothing to do */
 err_configure_peer:
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index d69c464..e0956bb 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -2165,6 +2165,7 @@ static int rtnl_dellink(struct sk_buff *skb, struct 
nlmsghdr *nlh)
 int rtnl_configure_link(struct net_device *dev, const struct ifinfomsg *ifm)
 {
unsigned int old_flags;
+   unsigned int gchanges;
int err;
 
old_flags = dev->flags;
@@ -2174,9 +2175,14 @@ int rtnl_configure_link(struct net_device *dev, const 
struct ifinfomsg *ifm)
return err;
}
 
-   dev->rtnl_link_state = RTNL_LINK_INITIALIZED;
+   if (dev->rtnl_link_state == RTNL_LINK_INITIALIZING) {
+   dev->rtnl_link_state = RTNL_LINK_INITIALIZED;
+   gchanges = ~0U;
+   } else {
+   gchanges = dev->flags ^ old_flags;
+   }
 
-   __dev_notify_flags(dev, old_flags, ~0U);
+   __dev_notify_flags(dev, old_flags, gchanges);
return 0;
 }
 EXPORT_SYMBOL(rtnl_configure_link);
-- 
2.5.5

Re: [PATCH next] sched: remove NET_XMIT_POLICED

2016-06-10 Thread Sven Eckelmann

On Friday 10 June 2016 14:21:29 Florian Westphal wrote:
[...]
> BATMAN uses it as an intermediate return value to signal
> forwarding vs. buffering, but it will not return POLICED to
> callers outside of BATMAN.
[...]
> diff --git a/net/batman-adv/send.c b/net/batman-adv/send.c
> index f2f1256..b1a4e8a 100644
> --- a/net/batman-adv/send.c
> +++ b/net/batman-adv/send.c
> @@ -156,7 +156,7 @@ int batadv_send_unicast_skb(struct sk_buff *skb,
>   * attempted.
>   *
>   * Return: NET_XMIT_SUCCESS on success, NET_XMIT_DROP on failure, or
> - * NET_XMIT_POLICED if the skb is buffered for later transmit.
> + * -EINPROGRESS if the skb is buffered for later transmit.
>   */
>  int batadv_send_skb_to_orig(struct sk_buff *skb,
>   struct batadv_orig_node *orig_node,
> @@ -188,7 +188,7 @@ int batadv_send_skb_to_orig(struct sk_buff *skb,
>* network coding fails, then send the packet as usual.
>*/
>   if (recv_if && batadv_nc_skb_forward(skb, neigh_node)) {
> - ret = NET_XMIT_POLICED;
> + ret = -EINPROGRESS;
>   } else {
>   batadv_send_unicast_skb(skb, neigh_node);
>   ret = NET_XMIT_SUCCESS;

Looks good to me. But it has a minor conflict with a queued up patch
for David (not yet submitted by Antonio). This is not real problem for
you but it looks like there is something wrong with the queued up patch.
I have submitted a fix for it and Antonio should just know that it is
related to this one.

But to your patch:

Reviewed-by: Sven Eckelmann 

Kind regards,
Sven


signature.asc
Description: This is a digitally signed message part.

Re: [PATCH][RT] netpoll: Always take poll_lock when doing polling

2016-06-10 Thread Eric Dumazet

On Fri, 2016-06-10 at 17:30 +0200, Sebastian Andrzej Siewior wrote:
> * Steven Rostedt | 2016-05-26 19:56:41 [-0400]:
> 
> >For example:
> >
> >   
> >
> >   napi_schedule_prep()
> >test_and_set_bit(NAPI_STATE_SCHED, &n->state)
> >
> >   
> >
> >   sk_busy_loop()
> >
> >  do {
> >   rc = busy_poll()
> >   ret = napi_schedule_prep()
> >return !test_and_set_bit(NAPI_STATE_SCHED, &n->state)
> > 
> >   if (!ret) return 0
> >   
> >  } while (...) /* for ever */
> >
> 
> No, I don't see the busyloop. while() is here:
> |while (!nonblock && skb_queue_empty(&sk->sk_receive_queue) &&
> |  !need_resched() && !busy_loop_timeout(end_time));
> 
> and this seems to be the case since v3.11 where it was introduced (but
> now it moved to dev.c). So even if there is no busy_poll() and
> napi_schedule_prep() returns 0 our cycles here are limited by
> busy_loop_timeout().

Well, before linux-4.5 and commit 2a028ecb76497d ("net: allow BH
servicing in sk_busy_loop()") , sk_busy_loop() was completely blocking
BH.

Not sure it matters in your case.

RE: [PATCH v2 2/3] mwifiex: move .get_tx_power logic to station ioctl file

2016-06-10 Thread Amitkumar Karwar

Hi Kalle/Javier,

> From: Javier Martinez Canillas [mailto:jav...@osg.samsung.com]
> Sent: Friday, June 10, 2016 8:07 PM
> To: Kalle Valo
> Cc: linux-ker...@vger.kernel.org; Julian Calaby; Shengzhen Li; Enric
> Balletbo i Serra; Amitkumar Karwar; netdev@vger.kernel.org; linux-
> wirel...@vger.kernel.org; Nishant Sarmukadam
> Subject: Re: [PATCH v2 2/3] mwifiex: move .get_tx_power logic to station
> ioctl file
> 
> Hello Kalle,
> 
> On 06/10/2016 10:30 AM, Kalle Valo wrote:
> > Javier Martinez Canillas  writes:
> >
> >> From: Shengzhen Li 
> >>
> >> Most cfg80211 operations are just a wrappers to functions defined in
> >> the sta_ioctl.c file, so for consistency move the .get_tx_power logic
> there.
> >>
> >> Signed-off-by: Shengzhen Li 
> >> Signed-off-by: Amitkumar Karwar 
> >> [javier: update the subject line and commit message]
> >> Signed-off-by: Javier Martinez Canillas 
> >
> > [...]
> >
> >> --- a/drivers/net/wireless/marvell/mwifiex/cfg80211.c
> >> +++ b/drivers/net/wireless/marvell/mwifiex/cfg80211.c
> >> @@ -385,18 +385,10 @@ mwifiex_cfg80211_get_tx_power(struct wiphy
> *wiphy,
> >>  int *dbm)
> >>  {
> >>struct mwifiex_adapter *adapter =
> mwifiex_cfg80211_get_adapter(wiphy);
> >> -  struct mwifiex_private *priv = mwifiex_get_priv(adapter,
> >> -  MWIFIEX_BSS_ROLE_ANY);
> >> -  int ret = mwifiex_send_cmd(priv, HostCmd_CMD_RF_TX_PWR,
> >> - HostCmd_ACT_GEN_GET, 0, NULL, true);
> >> -
> >> -  if (ret < 0)
> >> -  return ret;
> >> -
> >> -  /* tx_power_level is set in HostCmd_CMD_RF_TX_PWR command handler
> */
> >> -  *dbm = priv->tx_power_level;
> >> +  struct mwifiex_private *priv;
> >>
> >> -  return 0;
> >> +  priv = mwifiex_get_priv(adapter, MWIFIEX_BSS_ROLE_ANY);
> >> +  return mwifiex_get_tx_power(priv, dbm);
> >>  }
> >
> > So in patch 1 you added the patch and in patch 2 you move it to a
> > different location? That doesn't make any sense, can't you just fold
> > the two patches into one so that the function is added only once.
> >
> 
> I posted this patch in v1 but then Amitkumar shared his patch with me
> that was very similar to mine, only that the logic was in a different
> location.
> 
> So I included his delta as a separate patch to try keeping attribution
> as best as possible.
> 

This patch (2/3) is only for code rearrangement and adds an unnecessary wrapper 
function. We can simply drop the patch.

Regards,
Amitkumar

Re: [PATCH][RT] netpoll: Always take poll_lock when doing polling

2016-06-10 Thread Sebastian Andrzej Siewior

On 06/10/2016 06:11 PM, Steven Rostedt wrote:
>> It is true that in RT we don't have such a limit like in !RT. You would
>> need to use __raise_softirq_irqoff_ksoft() instead the normal or +
>> wakeup() since you may have timers pending and those need to go to the
>> "other" ksoftirqd.
>> But then I don't see much change. ksoftirqd runs now at SCHED_OTHER so
>> it will end up on the CPU right away unless there other tasks that need
>> the CPU. So the scheduler will balance it the same way. The only change
>> will be that softirqs which are processed in context of any application
>> for more than two jiffies will be moved to ksoftirqd. This could be a
>> win.
> 
> We actually triggered a starvation due to this. I was just seeing if
> Alison hit the same issue we did in our tests.

Okay. Didn't get this information from him. But this is only because
the softirqs not running in ksoftirqd, right?

> -- Steve
> 
Sebastian

Re: [PATCH][RT] netpoll: Always take poll_lock when doing polling

2016-06-10 Thread Steven Rostedt

On Fri, 10 Jun 2016 17:57:17 +0200
Sebastian Andrzej Siewior  wrote:

> * Steven Rostedt | 2016-06-04 07:11:31 [-0400]:
> 
> >From: Steven Rostedt 
> >Date: Tue, 5 Jan 2016 14:53:09 -0500
> >Subject: [PATCH] softirq: Perform softirqs in local_bh_enable() for a limited
> > amount of time
> >
> >To prevent starvation of tasks like ksoftirqd, if the task that is
> >processing its softirqs takes more than 2 jiffies to do so, and the
> >softirqs are constantly being re-added, then defer the processing to
> >ksoftirqd.  
> 
> I'm not sure about this. Alison didn't scream "yes it solves my problem"
> and I am not sure what it is.
> It is true that in RT we don't have such a limit like in !RT. You would
> need to use __raise_softirq_irqoff_ksoft() instead the normal or +
> wakeup() since you may have timers pending and those need to go to the
> "other" ksoftirqd.
> But then I don't see much change. ksoftirqd runs now at SCHED_OTHER so
> it will end up on the CPU right away unless there other tasks that need
> the CPU. So the scheduler will balance it the same way. The only change
> will be that softirqs which are processed in context of any application
> for more than two jiffies will be moved to ksoftirqd. This could be a
> win.

We actually triggered a starvation due to this. I was just seeing if
Alison hit the same issue we did in our tests.

-- Steve

Re: [PATCH][RT] netpoll: Always take poll_lock when doing polling

2016-06-10 Thread Sebastian Andrzej Siewior

* Steven Rostedt | 2016-06-04 07:11:31 [-0400]:

>From: Steven Rostedt 
>Date: Tue, 5 Jan 2016 14:53:09 -0500
>Subject: [PATCH] softirq: Perform softirqs in local_bh_enable() for a limited
> amount of time
>
>To prevent starvation of tasks like ksoftirqd, if the task that is
>processing its softirqs takes more than 2 jiffies to do so, and the
>softirqs are constantly being re-added, then defer the processing to
>ksoftirqd.

I'm not sure about this. Alison didn't scream "yes it solves my problem"
and I am not sure what it is.
It is true that in RT we don't have such a limit like in !RT. You would
need to use __raise_softirq_irqoff_ksoft() instead the normal or +
wakeup() since you may have timers pending and those need to go to the
"other" ksoftirqd.
But then I don't see much change. ksoftirqd runs now at SCHED_OTHER so
it will end up on the CPU right away unless there other tasks that need
the CPU. So the scheduler will balance it the same way. The only change
will be that softirqs which are processed in context of any application
for more than two jiffies will be moved to ksoftirqd. This could be a
win.

>Signed-off-by: Steven Rostedt 
>Signed-off-by: Luis Claudio R. Goncalves 

Sebastian

Re: [RFC 1/3] vsockmon: Add tap functions.

2016-06-10 Thread Stefan Hajnoczi

On Thu, Jun 09, 2016 at 05:02:47PM +0200, Gerard Garcia wrote:
> On 06/01/2016 11:07 PM, Stefan Hajnoczi wrote:
> > On Sat, May 28, 2016 at 06:29:05PM +0200, ggar...@abra.uab.cat wrote:
> > > diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
> > > index 6b158ab..ec7a05d 100644
> > > --- a/net/vmw_vsock/af_vsock.c
> > > +++ b/net/vmw_vsock/af_vsock.c
> > > @@ -96,6 +96,7 @@
> > >   #include 
> > >   #include 
> > >   #include 
> > > +#include 
> > >   #include 
> > >   #include 
> > > @@ -2012,6 +2013,110 @@ const struct vsock_transport 
> > > *vsock_core_get_transport(void)
> > >   }
> > >   EXPORT_SYMBOL_GPL(vsock_core_get_transport);
> > > +/ TAP /
> > Feel free to put this in a separate source file.  The Kbuild can link
> > multiple objects into a single kernel module.  That would be cleaner
> > than using a big comment to separate it from af_vsock.c code.
> I'm following the af_vsock.c style, where different logic is separated using
> this style of comments. It is not a lot of code
> so I thought it would be cleaner to have it in the same file.

It's up to the af_vsock.c maintainer, but if we keep appending
independent chunks of code to one file it becomes hard to manage and
chances of conflicts during patch merging increases.

> > > +int vsock_add_tap(struct vsock_tap *vt) {
> > > + if (unlikely(vt->dev->type != ARPHRD_VSOCKMON))
> > > + return -EINVAL;
> > > +
> > > + spin_lock(&vsock_tap_lock);
> > > + list_add_rcu(&vt->list, &vsock_tap_all);
> > > + spin_unlock(&vsock_tap_lock);
> > > +
> > > + __module_get(vt->module);
> > It's slightly safer to get the module before publishing it on the list.
> > But in practice I guess the caller is the module so the module won't
> > disappear underneath us.
> This function is equal to the function in af_netlink.c used by nlmon. As you
> said, in practice the caller is the module
> so it won't disappear.

Yes, there isn't a huge win right now but given that it's easy to
resolve the issue I'd do it.  The problem comes when people copy-paste
code that contains assumptions and the assumption no longer holds.
Better to write it in the safe way, eliminating the assumption, so that
derived code will be correct under more conditions.  There is no
drawback to moving __module_get() above the spin_lock().


signature.asc
Description: PGP signature

1 2 >

1 - 100 of 157 matches

Mail list logo