date:20170623

Re: [PATCH NET v3 1/2] net: phy: Add phy loopback support in net phy framework

2017-06-23 Thread Yunsheng Lin



On 2017/6/24 11:40, Yunsheng Lin wrote:
> Hi, Andrew
> 
> On 2017/6/24 11:12, Andrew Lunn wrote:
>>> +int phy_loopback(struct phy_device *phydev, bool enable)
>>> +{
>>> +   struct phy_driver *phydrv = to_phy_driver(phydev->mdio.dev.driver);
>>> +   int ret = 0;
>>> +
>>> +   if (enable && phydev->loopback_enabled)
>>> +   return -EBUSY;
>>> +
>>> +   if (!enable && !phydev->loopback_enabled)
>>> +   return -EINVAL;
>>> +
>>> +   if (phydev->drv && phydrv->set_loopback)
>>> +   ret = phydrv->set_loopback(phydev, enable);
>>
>>  else
>>  ret = -EOPNOTSUPP;
>>
>>> +
>>> +   if (ret)
>>> +   return ret;
>>> +
>>> +   phydev->loopback_enabled = enable;
>>> +
>>> +   return 0;
>>> +}
>>> +EXPORT_SYMBOL(phy_loopback);
>>
>> One of the comments we made of the PHY code in the hns driver is that
>> its locking is completely broken. You have made the same error
>> here. The core needs to hold the mutex while calling into the PHY
>> driver.
> Do you mean that hns_nic_config_phy_loopback needs to hold the mutex while
> calling phy_loopback, and likewise every other place that calls a phy_* function?
I took some time to look into how the mutex is taken in the phy core. Here is
what I found:
phy_resume calls phydrv->resume without taking the mutex.
If the phy driver implements its own resume function, for example
marvell_resume, then:

static int marvell_resume(struct phy_device *phydev)
{
int err;

/* Resume the fiber mode first */
if (!(phydev->supported & SUPPORTED_FIBRE)) {
err = phy_write(phydev, MII_MARVELL_PHY_PAGE, MII_M_FIBER);
if (err < 0)
goto error;

/* With the page set, use the generic resume */
err = genphy_resume(phydev);
if (err < 0)
goto error;

/* Then, the copper link */
err = phy_write(phydev, MII_MARVELL_PHY_PAGE, MII_M_COPPER);
if (err < 0)
goto error;
}

the code above executes without holding the mutex, except for genphy_resume

/* With the page set, use the generic resume */
return genphy_resume(phydev);

error:
phy_write(phydev, MII_MARVELL_PHY_PAGE, MII_M_COPPER);
return err;
}

So I think the correct place to take the lock is in the phy_* functions, not in
the genphy_* ones, and the current genphy_resume and phy_resume are broken in
this respect.
Please let me know if I have misunderstood how the mutex is taken in the phy core.

Best Regards
Yunsheng Lin
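
The argument above, that the lock belongs in the phy_* wrapper rather than
in the generic helper, can be modelled in user space. This is only a sketch
under stated assumptions: every sketch_* name is hypothetical, a pthread
mutex stands in for the phydev mutex, and the "registers" are plain struct
fields. The wrapper takes the lock once, so the page writes and the generic
resume step all run under it. With a non-recursive mutex the generic helper
must then not take the lock itself.

```c
#include <pthread.h>

/* Hypothetical user-space stand-in for a PHY device. */
struct sketch_phy {
	pthread_mutex_t lock;
	int page;               /* models the page-select register */
	int powered_down;       /* models the power-down bit */
	int steps_under_lock;   /* register accesses that saw the lock held */
};

/* Returns 1 if the lock is currently held, 0 otherwise. */
static int sketch_lock_is_held(struct sketch_phy *phy)
{
	if (pthread_mutex_trylock(&phy->lock) == 0) {
		pthread_mutex_unlock(&phy->lock);
		return 0;
	}
	return 1;
}

static void sketch_write_page(struct sketch_phy *phy, int page)
{
	phy->page = page;
	phy->steps_under_lock += sketch_lock_is_held(phy);
}

/* Generic helper: does not lock, it relies on the caller's lock. */
static void sketch_generic_resume_unlocked(struct sketch_phy *phy)
{
	phy->powered_down = 0;
	phy->steps_under_lock += sketch_lock_is_held(phy);
}

/* Driver callback: several register accesses, no locking of its own. */
static int sketch_driver_resume(struct sketch_phy *phy)
{
	sketch_write_page(phy, 1);              /* select the fiber page */
	sketch_generic_resume_unlocked(phy);
	sketch_write_page(phy, 0);              /* back to the copper page */
	return 0;
}

/* Core wrapper: the only place that takes the lock. */
int sketch_phy_resume(struct sketch_phy *phy)
{
	int ret;

	pthread_mutex_lock(&phy->lock);
	ret = sketch_driver_resume(phy);
	pthread_mutex_unlock(&phy->lock);
	return ret;
}
```

In this model all three register accesses made by the driver callback see
the lock held, which is the property the unlocked page writes in
marvell_resume lack.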

Re: [PATCH NET v3 1/2] net: phy: Add phy loopback support in net phy framework

2017-06-23 Thread Yunsheng Lin

Hi, Andrew

On 2017/6/24 11:12, Andrew Lunn wrote:
>> +int phy_loopback(struct phy_device *phydev, bool enable)
>> +{
>> +struct phy_driver *phydrv = to_phy_driver(phydev->mdio.dev.driver);
>> +int ret = 0;
>> +
>> +if (enable && phydev->loopback_enabled)
>> +return -EBUSY;
>> +
>> +if (!enable && !phydev->loopback_enabled)
>> +return -EINVAL;
>> +
>> +if (phydev->drv && phydrv->set_loopback)
>> +ret = phydrv->set_loopback(phydev, enable);
> 
>   else
>   ret = -EOPNOTSUPP;
> 
>> +
>> +if (ret)
>> +return ret;
>> +
>> +phydev->loopback_enabled = enable;
>> +
>> +return 0;
>> +}
>> +EXPORT_SYMBOL(phy_loopback);
> 
> One of the comments we made of the PHY code in the hns driver is that
> its locking is completely broken. You have made the same error
> here. The core needs to hold the mutex while calling into the PHY
> driver.
Do you mean that hns_nic_config_phy_loopback needs to hold the mutex while
calling phy_loopback, and likewise every other place that calls a phy_* function?

Best Regards
Yunsheng Lin

Re: [PATCH net-next] net: dsa: mv88e6xxx: fix error code in mv88e6390_serdes_power()

2017-06-23 Thread Andrew Lunn

On Fri, Jun 23, 2017 at 06:17:04PM +0300, Dan Carpenter wrote:
> We're accidentally returning the wrong variable.  "cmode" is
> uninitialized at this point so it causes a static checker warning.
> 
> Fixes: 6335e9f2446b ("net: dsa: mv88e6xxx: mv88e6390X SERDES support")
> Signed-off-by: Dan Carpenter 

Reviewed-by: Andrew Lunn 

Andrew

Re: [PATCH NET v3 2/2] net: hns: Use phy_driver to setup Phy loopback

2017-06-23 Thread Andrew Lunn

>  static int hns_nic_config_phy_loopback(struct phy_device *phy_dev, u8 en)
>  {
> -#define COPPER_CONTROL_REG 0
> -#define PHY_POWER_DOWN BIT(11)
> -#define PHY_LOOP_BACK BIT(14)
> - u16 val = 0;
> + int err;
>  
>   if (phy_dev->is_c45) /* c45 branch adding for XGE PHY */
>   return -ENOTSUPP;

You should take this out as well. You want the core to tell you if
loopback is supported or not. At some point, a c45 PHY could support
loopback.

> + case MAC_LOOP_PHY_NONE:
>   if ((phy_dev) && (!phy_dev->is_c45))
>   ret |= hns_nic_config_phy_loopback(phy_dev, 0x0);

same here.

 Andrew

Re: [PATCH NET v3 1/2] net: phy: Add phy loopback support in net phy framework

2017-06-23 Thread Andrew Lunn

> +int phy_loopback(struct phy_device *phydev, bool enable)
> +{
> + struct phy_driver *phydrv = to_phy_driver(phydev->mdio.dev.driver);
> + int ret = 0;
> +
> + if (enable && phydev->loopback_enabled)
> + return -EBUSY;
> +
> + if (!enable && !phydev->loopback_enabled)
> + return -EINVAL;
> +
> + if (phydev->drv && phydrv->set_loopback)
> + ret = phydrv->set_loopback(phydev, enable);

else
ret = -EOPNOTSUPP;

> +
> + if (ret)
> + return ret;
> +
> + phydev->loopback_enabled = enable;
> +
> + return 0;
> +}
> +EXPORT_SYMBOL(phy_loopback);

One of the comments we made of the PHY code in the hns driver is that
its locking is completely broken. You have made the same error
here. The core needs to hold the mutex while calling into the PHY
driver.

Andrew
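
For illustration, the two review comments above (return -EOPNOTSUPP when the
driver has no callback, and hold the mutex while calling into the driver)
can be modelled in user space. This is a minimal sketch, not the kernel
patch: model_phy, model_phy_driver and model_phy_loopback are hypothetical
names, and a pthread mutex stands in for the phydev mutex.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct model_phy;

/* Hypothetical driver ops table with an optional loopback callback. */
struct model_phy_driver {
	int (*set_loopback)(struct model_phy *phy, bool enable);
};

struct model_phy {
	pthread_mutex_t lock;
	bool loopback_enabled;
	const struct model_phy_driver *drv;
};

/* Example callback that always succeeds. */
static int model_noop_set_loopback(struct model_phy *phy, bool enable)
{
	(void)phy;
	(void)enable;
	return 0;
}

const struct model_phy_driver model_driver = {
	.set_loopback = model_noop_set_loopback,
};

int model_phy_loopback(struct model_phy *phy, bool enable)
{
	int ret;

	/* State checks, driver call and state update all run under the lock. */
	pthread_mutex_lock(&phy->lock);

	if (enable && phy->loopback_enabled) {
		ret = -EBUSY;
		goto out;
	}
	if (!enable && !phy->loopback_enabled) {
		ret = -EINVAL;
		goto out;
	}

	if (phy->drv && phy->drv->set_loopback)
		ret = phy->drv->set_loopback(phy, enable);
	else
		ret = -EOPNOTSUPP;      /* no callback: report "not supported" */
	if (ret)
		goto out;

	phy->loopback_enabled = enable;
out:
	pthread_mutex_unlock(&phy->lock);
	return ret;
}
```

A caller written against this shape does not need its own capability
checks; it can propagate whatever error the core returns.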

Re: [PATCH 05/11] net: stmmac: dwmac-rk: Add internal phy support

2017-06-23 Thread Andrew Lunn

On Fri, Jun 23, 2017 at 12:59:07PM +0800, David Wu wrote:
> To make the internal phy work, we need to configure the phy_clock,
> phy cru_reset and related registers.
> 
> Change-Id: I6971c0a769754b824b1b908b56080cbaf7867d13
> Signed-off-by: David Wu 
> ---
>  .../devicetree/bindings/net/rockchip-dwmac.txt |  3 +
>  drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c | 82 
> ++
>  2 files changed, 85 insertions(+)
> 
> diff --git a/Documentation/devicetree/bindings/net/rockchip-dwmac.txt 
> b/Documentation/devicetree/bindings/net/rockchip-dwmac.txt
> index 8f42755..0514f69 100644
> --- a/Documentation/devicetree/bindings/net/rockchip-dwmac.txt
> +++ b/Documentation/devicetree/bindings/net/rockchip-dwmac.txt
> @@ -22,6 +22,7 @@ Required properties:
>  <&cru SCLK_MACREF_OUT> clock gate for RMII reference clock output
>  <&cru ACLK_GMAC>: AXI clock gate for GMAC
>  <&cru PCLK_GMAC>: APB clock gate for GMAC
> +<&cru MAC_PHY>: clock for internal macphy

If this is the PHY clock, should it actually be specified in the PHY
binding? Can you read the PHY ID registers with this clock off?

 Andrew

Re: [PATCH 04/11] net: stmmac: dwmac-rk: Remove unwanted code for rk3328_set_to_rmii()

2017-06-23 Thread Andrew Lunn

On Fri, Jun 23, 2017 at 12:42:02PM +0800, David Wu wrote:
> This is the wrong setting for rk3328_set_to_rmii(), so remove it.
> 
> Change-Id: I9953784ea44335d90710e5473960c95b3d68a5fd

Hi David

This is not a recognised tag for a patch.

 Andrew

Re: [PATCH 01/11] net: phy: Add rockchip phy driver support

2017-06-23 Thread Andrew Lunn

On Fri, Jun 23, 2017 at 12:41:59PM +0800, David Wu wrote:
> Support internal ephy currently.
> 
> Signed-off-by: David Wu 
> ---
>  drivers/net/phy/Kconfig|  4 ++
>  drivers/net/phy/Makefile   |  1 +
>  drivers/net/phy/rockchip.c | 94 
> ++
>  3 files changed, 99 insertions(+)
>  create mode 100644 drivers/net/phy/rockchip.c
> 
> diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig
> index c360dd6..86010d4 100644
> --- a/drivers/net/phy/Kconfig
> +++ b/drivers/net/phy/Kconfig
> @@ -350,6 +350,10 @@ config XILINX_GMII2RGMII
>   the Reduced Gigabit Media Independent Interface(RGMII) between
>   Ethernet physical media devices and the Gigabit Ethernet controller.
>  
> +config ROCKCHIP_MAC_PHY

This is a bit of an odd name, having both MAC and PHY in it. Are there
any other RockChip PHYs? Any external PHYs? Are they register
incompatible with the internal PHY?  Is it even RockChip IP? Or has it
been licensed from somebody else?

I would more likely just call it ROCKCHIP_PHY.

  Andrew

[PATCH] net/icmp: restore source address if packet is NATed

2017-06-23 Thread Jason A. Donenfeld

The ICMP routines use the source address for two reasons:

1. Rate-limiting ICMP transmissions based on source address, so
   that one source address cannot provoke a flood of replies. If
   the source address is wrong, the rate limiting will be
   incorrectly applied.

2. Choosing the interface and hence new source address of the
   generated ICMP packet. If the original packet source address
   is wrong, ICMP replies will be sent from the wrong source
   address, resulting in either a misdelivery, infoleak, or just
   general network admin confusion.

Most of the time, the icmp_send and icmpv6_send routines can just reach
down into the skb's IP header to determine the saddr. However, if
icmp_send or icmpv6_send is being called from a network device driver --
there are a few in the tree -- then it's possible that by the time
icmp_send or icmpv6_send looks at the packet, the packet's source
address has already been transformed by SNAT or MASQUERADE or some other
transformation that CONNTRACK knows about. In this case, the packet's
source address is most certainly the *wrong* source address to be used
for the purpose of ICMP replies.

Rather, the source address we want to use for ICMP replies is the
original one, from before the transformation occurred.

Fortunately, it's very easy to just ask CONNTRACK if it knows about this
packet, and if so, how to fix it up. The saddr is the only field in the
header we need to fix up, for the purposes of the subsequent processing
in the icmp_send and icmpv6_send functions, so we do the lookup very
early on, so that the rest of the ICMP machinery can progress as usual.
In my tests, this setup works very well.

Signed-off-by: Jason A. Donenfeld 
---
 net/ipv4/icmp.c | 21 +
 net/ipv6/icmp.c | 21 +
 2 files changed, 42 insertions(+)

diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index c2be26b98b5f..30aa6aa79fd2 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -97,6 +97,10 @@
 #include 
 #include 
 #include 
+#if IS_ENABLED(CONFIG_NF_CONNTRACK)
+#include 
+#include 
+#endif
 
 /*
  * Build xmit assembly blocks
@@ -586,6 +590,10 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, 
__be32 info)
u32 mark;
struct net *net;
struct sock *sk;
+#if IS_ENABLED(CONFIG_NF_CONNTRACK)
+   enum ip_conntrack_info ctinfo;
+   struct nf_conn *ct;
+#endif
 
if (!rt)
goto out;
@@ -604,6 +612,19 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, 
__be32 info)
goto out;
 
/*
+*  If this function is called after the skb has already been
+*  NAT transformed, the ratelimiting will apply to the wrong
+*  saddr, and the reply will be marked as coming from the
+*  wrong host. So, we fix it up here in case connection tracking
+*  enables that.
+*/
+#if IS_ENABLED(CONFIG_NF_CONNTRACK)
+   ct = nf_ct_get(skb_in, &ctinfo);
+   if (ct)
+   iph->saddr = ct->tuplehash[0].tuple.src.u3.ip;
+#endif
+
+   /*
 *  No replies to physical multicast/broadcast
 */
if (skb_in->pkt_type != PACKET_HOST)
diff --git a/net/ipv6/icmp.c b/net/ipv6/icmp.c
index 8d7b113958b1..ee8a2853121e 100644
--- a/net/ipv6/icmp.c
+++ b/net/ipv6/icmp.c
@@ -69,6 +69,10 @@
 #include 
 #include 
 #include 
+#if IS_ENABLED(CONFIG_NF_CONNTRACK)
+#include 
+#include 
+#endif
 
 #include 
 
@@ -422,12 +426,29 @@ static void icmp6_send(struct sk_buff *skb, u8 type, u8 
code, __u32 info,
int len;
int err = 0;
u32 mark = IP6_REPLY_MARK(net, skb->mark);
+#if IS_ENABLED(CONFIG_NF_CONNTRACK)
+   enum ip_conntrack_info ctinfo;
+   struct nf_conn *ct;
+#endif
 
if ((u8 *)hdr < skb->head ||
(skb_network_header(skb) + sizeof(*hdr)) > skb_tail_pointer(skb))
return;
 
/*
+*  If this function is called after the skb has already been
+*  NAT transformed, the ratelimiting will apply to the wrong
+*  saddr, and the reply will be marked as coming from the
+*  wrong host. So, we fix it up here in case connection tracking
+*  enables that.
+*/
+#if IS_ENABLED(CONFIG_NF_CONNTRACK)
+   ct = nf_ct_get(skb, &ctinfo);
+   if (ct)
+   hdr->saddr = ct->tuplehash[0].tuple.src.u3.in6;
+#endif
+
+   /*
 *  Make sure we respect the rules
 *  i.e. RFC 1885 2.4(e)
 *  Rule (e.1) is enforced by not using icmp6_send
-- 
2.13.1
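
The idea behind the patch can be shown with a small user-space model. This
is a sketch under stated assumptions, not the netfilter API: model_conn,
model_pkt and model_icmp_fixup_saddr are hypothetical names, and addresses
are plain 32-bit integers. If a tracked connection is attached to the
packet, the source address recorded for the original direction replaces
the (possibly NATed) one in the header before any further ICMP processing.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a tracked connection. */
struct model_conn {
	uint32_t orig_src;      /* source address before any NAT rewrite */
	uint32_t nat_src;       /* source address after SNAT/MASQUERADE */
};

/* Hypothetical stand-in for a packet with an optional tracked connection. */
struct model_pkt {
	uint32_t saddr;                 /* current source address in the header */
	const struct model_conn *ct;    /* NULL if the packet is untracked */
};

/*
 * Restore the pre-NAT source address when tracking information exists.
 * Returns the address that rate limiting and reply routing should use.
 */
uint32_t model_icmp_fixup_saddr(struct model_pkt *pkt)
{
	if (pkt->ct)
		pkt->saddr = pkt->ct->orig_src;
	return pkt->saddr;
}
```

An untracked packet passes through unchanged, which matches the patch
only touching saddr when a conntrack entry is found.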

Re: [PATCH 01/11] net: phy: Add rockchip phy driver support

2017-06-23 Thread Andrew Lunn

> +
> +static int internal_config_init(struct phy_device *phydev)
> +{

internal_ is a bit generic. The Marvell Ethernet switches have
internal PHYs, etc. rockchip_ would be a better prefix.

 Andrew

[PATCH net-next 4/4] sctp: adjust ssthresh when transport is idle

2017-06-23 Thread Marcelo Ricardo Leitner

RFC 4960 Errata 3.27 identifies that ssthresh should be adjusted to cwnd,
because otherwise the transport could get locked into the congestion
avoidance phase, especially if ssthresh was previously reduced by some
packet drop, leading to poor performance.

The Errata says to adjust ssthresh to cwnd only once, though the same
goal is achieved by updating it every time we update cwnd too. The
caveat is that we could take longer to get back up to speed, but that
should be compensated by the fact that we don't adjust on an RTO basis (as
the RFC says) but based on Heartbeats, which usually have a much longer
interval.

See-also: 
https://tools.ietf.org/html/draft-ietf-tsvwg-rfc4960-errata-01#section-3.27
Signed-off-by: Marcelo Ricardo Leitner 
---
 net/sctp/transport.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/sctp/transport.c b/net/sctp/transport.c
index 
e3ebf04ddbd092a1ba04962c8fbe85a58588771a..7cdd6bcddbc5ad277f525635cf56788f70a4ee30
 100644
--- a/net/sctp/transport.c
+++ b/net/sctp/transport.c
@@ -569,6 +569,8 @@ void sctp_transport_lower_cwnd(struct sctp_transport 
*transport,
 */
transport->cwnd = max(transport->cwnd/2,
 4*asoc->pathmtu);
+   /* RFC 4960 Errata 3.27.2: also adjust ssthresh */
+   transport->ssthresh = transport->cwnd;
break;
}
 
-- 
2.9.4
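
As a worked illustration of the hunk above, the idle adjustment can be
written as a stand-alone function. This is a sketch, not the kernel code:
idle_state and lower_cwnd_idle are hypothetical names, and only the two
assignments shown in the diff are modelled.

```c
#include <stdint.h>

/* Hypothetical stand-in for the two transport fields involved. */
struct idle_state {
	uint32_t cwnd;
	uint32_t ssthresh;
};

/* Halve cwnd on idle, but never go below 4 * PMTU; then align ssthresh. */
void lower_cwnd_idle(struct idle_state *t, uint32_t pmtu)
{
	uint32_t half = t->cwnd / 2;
	uint32_t floor = 4 * pmtu;

	t->cwnd = (half > floor) ? half : floor;
	t->ssthresh = t->cwnd;  /* the line added by this patch */
}
```

With cwnd = 20000, ssthresh = 8000 and pmtu = 1500, one idle adjustment
gives cwnd = ssthresh = 10000; a second one hits the 4 * PMTU floor and
gives cwnd = ssthresh = 6000.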

[PATCH net-next 3/4] sctp: adjust cwnd increase in Congestion Avoidance phase

2017-06-23 Thread Marcelo Ricardo Leitner

RFC4960 Errata 3.26 points out that, although RFC4960 states that cwnd
should never grow by more than 1*MTU per RTT, Section 7.2.2 was
underspecified and, as written, could allow cwnd to grow faster than
that.

This patch updates the code so that partial_bytes_acked is capped at cwnd
when flight_size does not reach cwnd, which guards against that case.

See-also: 
https://tools.ietf.org/html/draft-ietf-tsvwg-rfc4960-errata-01#section-3.26
Signed-off-by: Marcelo Ricardo Leitner 
---
 net/sctp/transport.c | 26 ++
 1 file changed, 18 insertions(+), 8 deletions(-)

diff --git a/net/sctp/transport.c b/net/sctp/transport.c
index 
9d3589451a967a31ee241a5138a58f2f81a2f2a1..e3ebf04ddbd092a1ba04962c8fbe85a58588771a
 100644
--- a/net/sctp/transport.c
+++ b/net/sctp/transport.c
@@ -405,13 +405,6 @@ void sctp_transport_raise_cwnd(struct sctp_transport 
*transport,
TSN_lte(asoc->fast_recovery_exit, sack_ctsn))
asoc->fast_recovery = 0;
 
-   /* The appropriate cwnd increase algorithm is performed if, and only
-* if the congestion window is being fully utilized.
-* Note that RFC4960 Errata 3.22 removed the other condition.
-*/
-   if (flight_size < cwnd)
-   return;
-
ssthresh = transport->ssthresh;
pba = transport->partial_bytes_acked;
pmtu = transport->asoc->pathmtu;
@@ -434,6 +427,14 @@ void sctp_transport_raise_cwnd(struct sctp_transport 
*transport,
if (asoc->fast_recovery)
return;
 
+   /* The appropriate cwnd increase algorithm is performed
+* if, and only if the congestion window is being fully
+* utilized.  Note that RFC4960 Errata 3.22 removed the
+* other condition on ctsn moving.
+*/
+   if (flight_size < cwnd)
+   return;
+
if (bytes_acked > pmtu)
cwnd += pmtu;
else
@@ -451,6 +452,13 @@ void sctp_transport_raise_cwnd(struct sctp_transport 
*transport,
 * acknowledged by the new Cumulative TSN Ack and by Gap
 * Ack Blocks. (updated by RFC4960 Errata 3.22)
 *
+* When partial_bytes_acked is greater than cwnd and
+* before the arrival of the SACK the sender had less
+* bytes of data outstanding than cwnd (i.e., before
+* arrival of the SACK, flightsize was less than cwnd),
+* reset partial_bytes_acked to cwnd. (RFC 4960 Errata
+* 3.26)
+*
 * When partial_bytes_acked is equal to or greater than
 * cwnd and before the arrival of the SACK the sender
 * had cwnd or more bytes of data outstanding (i.e.,
@@ -460,7 +468,9 @@ void sctp_transport_raise_cwnd(struct sctp_transport 
*transport,
 * increased by MTU. (RFC 4960 Errata 3.12)
 */
pba += bytes_acked;
-   if (pba >= cwnd) {
+   if (pba > cwnd && flight_size < cwnd)
+   pba = cwnd;
+   if (pba >= cwnd && flight_size >= cwnd) {
pba = pba - cwnd;
cwnd += pmtu;
}
-- 
2.9.4
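
The congestion avoidance update after this patch can be exercised in
isolation. The following is a minimal sketch: ca_state and ca_on_sack are
hypothetical names, and only the partial_bytes_acked/cwnd arithmetic from
the last hunk is modelled, not the rest of sctp_transport_raise_cwnd().

```c
#include <stdint.h>

/* Hypothetical stand-in for the per-transport congestion state. */
struct ca_state {
	uint32_t cwnd;
	uint32_t pba;   /* partial_bytes_acked */
};

/* Apply one SACK in congestion avoidance (cwnd > ssthresh). */
void ca_on_sack(struct ca_state *s, uint32_t bytes_acked,
		uint32_t flight_size, uint32_t pmtu)
{
	s->pba += bytes_acked;

	/* Errata 3.26: window not fully used, so cap pba at cwnd. */
	if (s->pba > s->cwnd && flight_size < s->cwnd)
		s->pba = s->cwnd;

	/* Errata 3.12: window fully used, reset pba first, then grow cwnd. */
	if (s->pba >= s->cwnd && flight_size >= s->cwnd) {
		s->pba -= s->cwnd;
		s->cwnd += pmtu;
	}
}
```

Starting from cwnd = 10000 and pba = 9000, a SACK for 3000 bytes with only
5000 bytes in flight leaves cwnd at 10000 and caps pba at 10000. A later
SACK for 500 bytes with 10000 bytes in flight then yields pba = 500 and
cwnd = 11500.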

[PATCH net-next 2/4] sctp: allow increasing cwnd regardless of ctsn moving or not

2017-06-23 Thread Marcelo Ricardo Leitner

As per RFC4960 Errata 3.22, this condition is not needed anymore as it
could cause the partial_bytes_acked to not consider the TSNs acked in
the Gap Ack Blocks although they were received by the peer successfully.

This patch thus drops the check for new Cumulative TSN Ack Point,
leaving just the flight_size < cwnd one.

See-also: 
https://tools.ietf.org/html/draft-ietf-tsvwg-rfc4960-errata-01#section-3.22
Signed-off-by: Marcelo Ricardo Leitner 
---
 net/sctp/transport.c | 17 -
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/net/sctp/transport.c b/net/sctp/transport.c
index 
04b6dd1a07ded25fe5874518b0944a6d9df4099b..9d3589451a967a31ee241a5138a58f2f81a2f2a1
 100644
--- a/net/sctp/transport.c
+++ b/net/sctp/transport.c
@@ -406,11 +406,10 @@ void sctp_transport_raise_cwnd(struct sctp_transport 
*transport,
asoc->fast_recovery = 0;
 
/* The appropriate cwnd increase algorithm is performed if, and only
-* if the cumulative TSN whould advanced and the congestion window is
-* being fully utilized.
+* if the congestion window is being fully utilized.
+* Note that RFC4960 Errata 3.22 removed the other condition.
 */
-   if (TSN_lte(sack_ctsn, transport->asoc->ctsn_ack_point) ||
-   (flight_size < cwnd))
+   if (flight_size < cwnd)
return;
 
ssthresh = transport->ssthresh;
@@ -446,11 +445,11 @@ void sctp_transport_raise_cwnd(struct sctp_transport 
*transport,
 flight_size, pba);
} else {
/* RFC 2960 7.2.2 Whenever cwnd is greater than ssthresh,
-* upon each SACK arrival that advances the Cumulative TSN Ack
-* Point, increase partial_bytes_acked by the total number of
-* bytes of all new chunks acknowledged in that SACK including
-* chunks acknowledged by the new Cumulative TSN Ack and by
-* Gap Ack Blocks.
+* upon each SACK arrival, increase partial_bytes_acked
+* by the total number of bytes of all new chunks
+* acknowledged in that SACK including chunks
+* acknowledged by the new Cumulative TSN Ack and by Gap
+* Ack Blocks. (updated by RFC4960 Errata 3.22)
 *
 * When partial_bytes_acked is equal to or greater than
 * cwnd and before the arrival of the SACK the sender
-- 
2.9.4

[PATCH net-next 0/4] RFC 4960 Errata fixes

2017-06-23 Thread Marcelo Ricardo Leitner

This patchset contains fixes for 4 Errata topics from
https://tools.ietf.org/html/draft-ietf-tsvwg-rfc4960-errata-01
Namely, sections:
 3.12. Order of Adjustments of partial_bytes_acked and cwnd
 3.22. Increase of partial_bytes_acked in Congestion Avoidance
 3.26. CWND Increase in Congestion Avoidance Phase
 3.27. Refresh of cwnd and ssthresh after Idle Period

Tests performed with netperf using net namespaces, with drop rates at
0%, 0.5% and 1% by netem, IPv4 and IPv6, 10 runs for each combination.
I couldn't spot differences on the stats. With and without these patches
the results vary in a similar way in terms of throughput and
retransmissions.

Tests with 20ms delay and 20ms delay + drops at 0.5% and 1% gave
similar results, with no noticeable difference.

Looking at cwnd, it was possible to notice slightly lower values being
used while still sustaining the same throughput profile.

Marcelo Ricardo Leitner (4):
  sctp: update order of adjustments of partial_bytes_acked and cwnd
  sctp: allow increasing cwnd regardless of ctsn moving or not
  sctp: adjust cwnd increase in Congestion Avoidance phase
  sctp: adjust ssthresh when transport is idle

 net/sctp/transport.c | 54 
 1 file changed, 33 insertions(+), 21 deletions(-)

-- 
2.9.4

[PATCH net-next 1/4] sctp: update order of adjustments of partial_bytes_acked and cwnd

2017-06-23 Thread Marcelo Ricardo Leitner

RFC4960 Errata 3.12 says RFC4960 is unclear about the order of
adjustments applied to partial_bytes_acked and cwnd in the congestion
avoidance phase, and that the actual order should be:
partial_bytes_acked is reset to (partial_bytes_acked - cwnd). Next, cwnd
is increased by MTU.

We were first increasing cwnd and then subtracting the new cwnd value from
pba, which gives a different result: pba ends up smaller than it should be,
which could cause cwnd to not grow as much.

See-also: 
https://tools.ietf.org/html/draft-ietf-tsvwg-rfc4960-errata-01#section-3.12
Signed-off-by: Marcelo Ricardo Leitner 
---
 net/sctp/transport.c | 15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/net/sctp/transport.c b/net/sctp/transport.c
index 
721eeebfcd8a50609877db61ede41575e012606a..04b6dd1a07ded25fe5874518b0944a6d9df4099b
 100644
--- a/net/sctp/transport.c
+++ b/net/sctp/transport.c
@@ -452,17 +452,18 @@ void sctp_transport_raise_cwnd(struct sctp_transport 
*transport,
 * chunks acknowledged by the new Cumulative TSN Ack and by
 * Gap Ack Blocks.
 *
-* When partial_bytes_acked is equal to or greater than cwnd
-* and before the arrival of the SACK the sender had cwnd or
-* more bytes of data outstanding (i.e., before arrival of the
-* SACK, flightsize was greater than or equal to cwnd),
-* increase cwnd by MTU, and reset partial_bytes_acked to
-* (partial_bytes_acked - cwnd).
+* When partial_bytes_acked is equal to or greater than
+* cwnd and before the arrival of the SACK the sender
+* had cwnd or more bytes of data outstanding (i.e.,
+* before arrival of the SACK, flightsize was greater
+* than or equal to cwnd), partial_bytes_acked is reset
+* to (partial_bytes_acked - cwnd). Next, cwnd is
+* increased by MTU. (RFC 4960 Errata 3.12)
 */
pba += bytes_acked;
if (pba >= cwnd) {
+   pba = pba - cwnd;
cwnd += pmtu;
-   pba = ((cwnd < pba) ? (pba - cwnd) : 0);
}
 
pr_debug("%s: congestion avoidance: transport:%p, "
-- 
2.9.4
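
The effect of the reordering is easiest to see with concrete numbers. The
sketch below puts the removed and the added lines of the hunk side by side
as two stand-alone functions; pba_cwnd, adjust_old_order and
adjust_new_order are hypothetical names used only for this comparison.

```c
#include <stdint.h>

struct pba_cwnd {
	uint32_t pba;   /* partial_bytes_acked */
	uint32_t cwnd;
};

/* Old order: grow cwnd first, then subtract the already-grown cwnd. */
struct pba_cwnd adjust_old_order(struct pba_cwnd s, uint32_t pmtu)
{
	if (s.pba >= s.cwnd) {
		s.cwnd += pmtu;
		s.pba = (s.cwnd < s.pba) ? (s.pba - s.cwnd) : 0;
	}
	return s;
}

/* New order (Errata 3.12): subtract the old cwnd, then grow cwnd. */
struct pba_cwnd adjust_new_order(struct pba_cwnd s, uint32_t pmtu)
{
	if (s.pba >= s.cwnd) {
		s.pba = s.pba - s.cwnd;
		s.cwnd += pmtu;
	}
	return s;
}
```

For pba = 10500, cwnd = 10000 and pmtu = 1500, both variants end with
cwnd = 11500, but the old order leaves pba = 0 while the new order keeps
the 500 bytes of surplus in pba.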

Re: [PATCH v2] arm: eBPF JIT compiler

2017-06-23 Thread Shubham Bansal

Hi Russell, Daniel and Kees,

I am attaching the latest patch to this mail. It includes support
for BPF_CALL | BPF_JMP, tested with and without constant blinding on an
ARMv7 machine.
Due to the limitations of my machine I can't test the tail call. It
would be a great help if any of you could help me with this.

It's been a long time since this patch has been in the works. Russell, can
you please help with sending this patch to the ARM patch tracker?

Thanks.
Shubham


0001-Added-Support-for-BPF_CALL-BPF_JMP.patch
Description: Binary data

Re: BUG: KASAN: use-after-free in free_old_xmit_skbs

2017-06-23 Thread Cong Wang

On Fri, Jun 23, 2017 at 1:43 AM, Jason Wang  wrote:
>
>
> On 2017年06月23日 02:53, Michael S. Tsirkin wrote:
>>
>> On Thu, Jun 22, 2017 at 08:15:58AM +0200, jean-philippe menil wrote:
>>>
>>> Hi Michael,
>>>
>>> from what i see, the race appear when we hit virtnet_reset in
>>> virtnet_xdp_set.
>>> virtnet_reset
>>>_remove_vq_common
>>>  virtnet_del_vqs
>>>virtnet_free_queues
>>>  kfree(vi->sq)
>>> when the xdp program (with two instances of the program to trigger it
>>> faster)
>>> is added or removed.
>>>
>>> It's easily repeatable, with 2 cpus and 4 queues on the qemu command
>>> line,
>>> running the xdp_ttl tool from Jesper.
>>>
>>> For now, i'm able to continue my qualification, testing if xdp_qp is not
>>> null,
>>> but do not seem to be a sustainable trick.
>>> if (xdp_qp && vi->xdp_queues_pairs != xdp_qp)
>>>
>>> Maybe it will be more clear to you with theses informations.
>>>
>>> Best regards.
>>>
>>> Jean-Philippe
>>
>>
>> I'm pretty clear about the issue here, I was trying to figure out a fix.
>> Jason, any thoughts?
>>
>>
>
> Hi Jean:
>
> Does the following fix this issue? (I can't reproduce it locally through
> xdp_ttl)

It is tricky here.

From my understanding of the code base, the tx_lock is not sufficient
here, because in virtnet_del_vqs() all vqs are deleted and one vq
maps to one txq.

I am afraid you have to add a spinlock somewhere to serialize
free_old_xmit_skbs() vs. vring_del_virtqueue(). As you can see,
they are in different layers, so it is hard to figure out where to add
it...

Also, make sure we don't sleep inside the spinlock, I see a
synchronize_net().
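
One possible shape of such serialization, sketched in user space purely as
an illustration and not as a proposed fix: model_dev, model_sq,
model_free_old_xmit and model_del_vqs are hypothetical names, and a pthread
mutex stands in for the spinlock. The reclaim path checks the queue pointer
under the lock, and the teardown path detaches the pointer under the lock
but frees it only after dropping the lock, so nothing that could sleep runs
inside the critical section.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical send queue with a count of completed buffers to reclaim. */
struct model_sq {
	int pending;
};

struct model_dev {
	pthread_mutex_t lock;   /* stands in for the suggested spinlock */
	struct model_sq *sq;    /* NULL once the queues are torn down */
};

/* Reclaim path: returns the number of buffers freed, or -ENODEV. */
int model_free_old_xmit(struct model_dev *dev)
{
	int freed;

	pthread_mutex_lock(&dev->lock);
	if (!dev->sq) {
		pthread_mutex_unlock(&dev->lock);
		return -ENODEV;         /* queues already gone, touch nothing */
	}
	freed = dev->sq->pending;
	dev->sq->pending = 0;
	pthread_mutex_unlock(&dev->lock);
	return freed;
}

/* Teardown path: detach under the lock, free outside of it. */
void model_del_vqs(struct model_dev *dev)
{
	struct model_sq *old;

	pthread_mutex_lock(&dev->lock);
	old = dev->sq;
	dev->sq = NULL;
	pthread_mutex_unlock(&dev->lock);

	free(old);      /* potentially slow work happens without the lock */
}
```

Where such a lock would live in the real driver, given that the two paths
sit in different layers, is exactly the open question raised above.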

[PATCH v3 net] net: ipv6: reset daddr and dport in sk if connect() fails

2017-06-23 Thread Wei Wang

From: Wei Wang 

In __ip6_datagram_connect(), reset sk->sk_v6_daddr and inet->inet_dport if
an error occurs.
In udp_v6_early_demux(), check sk_state to make sure the socket is in the
TCP_ESTABLISHED state.
Together, these changes make sure an unconnected UDP socket won't be
considered a valid candidate for early demux.

Fixes: 5425077d73e0 ("net: ipv6: Add early demux handler for UDP unicast")

v3: add TCP_ESTABLISHED state check in udp_v6_early_demux()
v2: fix compilation error

Signed-off-by: Wei Wang 
Acked-by: Maciej Żenczykowski 
---
 net/ipv6/datagram.c | 8 +++-
 net/ipv6/udp.c  | 3 ++-
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c
index e011122ebd43..5c786f5ab961 100644
--- a/net/ipv6/datagram.c
+++ b/net/ipv6/datagram.c
@@ -250,8 +250,14 @@ int __ip6_datagram_connect(struct sock *sk, struct 
sockaddr *uaddr,
 */
 
err = ip6_datagram_dst_update(sk, true);
-   if (err)
+   if (err) {
+   /* Reset daddr and dport so that udp_v6_early_demux()
+* fails to find this socket
+*/
+   memset(&sk->sk_v6_daddr, 0, sizeof(sk->sk_v6_daddr));
+   inet->inet_dport = 0;
goto out;
+   }
 
sk->sk_state = TCP_ESTABLISHED;
sk_set_txhash(sk);
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 2b33847bf931..d494b2621b11 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -880,7 +880,8 @@ static struct sock *__udp6_lib_demux_lookup(struct net *net,
struct sock *sk;
 
udp_portaddr_for_each_entry_rcu(sk, &hslot2->head) {
-   if (INET6_MATCH(sk, net, rmt_addr, loc_addr, ports, dif))
+   if (sk->sk_state == TCP_ESTABLISHED &&
+   INET6_MATCH(sk, net, rmt_addr, loc_addr, ports, dif))
return sk;
/* Only check first socket in chain */
break;
-- 
2.13.1.611.g7e3b11ae1-goog
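
The two safeguards in this patch can be modelled together in user space.
This is a simplified sketch under stated assumptions: model_sock,
model_connect and model_early_demux are hypothetical names, addresses are
32-bit integers rather than IPv6 addresses, and the route lookup result is
passed in as a parameter.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

enum { MODEL_UNCONNECTED = 0, MODEL_ESTABLISHED = 1 };

/* Hypothetical stand-in for the socket fields the patch touches. */
struct model_sock {
	int state;
	uint32_t daddr;
	uint16_t dport;
};

/* Connect: on a routing error, clear daddr/dport again before returning. */
int model_connect(struct model_sock *sk, uint32_t daddr, uint16_t dport,
		  int route_err)
{
	sk->daddr = daddr;
	sk->dport = dport;

	if (route_err) {
		sk->daddr = 0;
		sk->dport = 0;
		return route_err;
	}

	sk->state = MODEL_ESTABLISHED;
	return 0;
}

/* Early demux: only an established socket with a matching peer qualifies. */
const struct model_sock *model_early_demux(const struct model_sock *sk,
					   uint32_t rmt_addr, uint16_t rmt_port)
{
	if (sk->state == MODEL_ESTABLISHED &&
	    sk->daddr == rmt_addr && sk->dport == rmt_port)
		return sk;
	return NULL;
}
```

After a failed connect the socket neither carries the peer address nor is
in the established state, so the lookup rejects it on both counts.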

Re: unregister_netdevice: waiting for lo to become free. Usage count = 1

2017-06-23 Thread Andrei Vagin

On Fri, Jun 23, 2017 at 02:49:58PM -0700, Andrei Vagin wrote:
> Hello Everyone,
> 
> Today I ran into a problem where every attempt to create a new network
> namespace hangs.
> I see that one of the previous namespaces can't be destroyed, because the
> usage count for one of its devices isn't zero. To reproduce the problem,
> you need to execute the attached program in a separate network namespace:
> 
> [root@fc24 net]# cat run.sh
> ip link set up dev lo
> ./tcp-bug
> [root@fc24 net]# unshare -n sh run.sh

If this program is executed in new user, pid, mnt, and net namespaces,
the kernel reports a bug:

[root@fc24 net]# unshare -Urmfpn sh run.sh
[root@fc24 net]# unshare -Urmfpn sh run.sh
^C

[   30.592295] unregister_netdevice: waiting for lo to become free. Usage count = 1
[   37.958180] =
[   37.963675] BUG kmalloc-1024 (Not tainted): Poison overwritten
[   37.966160] -

[   37.969497] Disabling lock debugging due to kernel taint
[   37.971181] INFO: 0x93eef8bc3760-0x93eef8bc3760. First byte 0x38 instead of 0x6b
[   37.973066] INFO: Slab 0xe69fc1e2f000 objects=15 used=15 fp=0x  (null) flags=0x1fffc08100
[   37.974902] INFO: Object 0x93eef8bc35a8 @offset=13736 fp=0x93eef8bc1088

[   37.976618] Redzone 93eef8bc35a0: bb bb bb bb bb bb bb bb
[   37.978355] Object 93eef8bc35a8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
[   37.980239] Object 93eef8bc35b8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
[   37.982209] Object 93eef8bc35c8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
[   37.984072] Object 93eef8bc35d8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
[   37.985729] Object 93eef8bc35e8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
[   37.987273] Object 93eef8bc35f8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
[   37.988930] Object 93eef8bc3608: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
[   37.990582] Object 93eef8bc3618: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
[   37.991921] Object 93eef8bc3628: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
[   37.993541] Object 93eef8bc3638: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
[   37.994922] Object 93eef8bc3648: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
[   37.996745] Object 93eef8bc3658: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
[   37.998135] Object 93eef8bc3668: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
[   37.999350] Object 93eef8bc3678: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
[   38.000656] Object 93eef8bc3688: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
[   38.001920] Object 93eef8bc3698: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
[   38.003142] Object 93eef8bc36a8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
[   38.004549] Object 93eef8bc36b8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
[   38.005925] Object 93eef8bc36c8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
[   38.007296] Object 93eef8bc36d8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
[   38.008852] Object 93eef8bc36e8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
[   38.010375] Object 93eef8bc36f8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
[   38.011686] Object 93eef8bc3708: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
[   38.012967] Object 93eef8bc3718: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
[   38.014244] Object 93eef8bc3728: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
[   38.015513] Object 93eef8bc3738: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
[   38.016796] Object 93eef8bc3748: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
[   38.018101] Object 93eef8bc3758: 6b 6b 6b 6b 6b 6b 6b 6b 38 6b 6b 6b 6b 6b 6b 6b  8kkk
[   38.019151] Object 93eef8bc3768: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
[   38.020349] Object 93eef8bc3778: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
[   38.021622] Object 93eef8bc3788: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
[   38.022931] Object 93eef8bc3798: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
[   38.024223] Object 93eef8bc37a8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b

Re: [PATCH 2/6] wl1251: Use request_firmware_prefer_user() for loading NVS calibration data

2017-06-23 Thread Luis R. Rodriguez

On Tue, May 16, 2017 at 10:41:08AM +0200, Arend Van Spriel wrote:
> On 16-5-2017 1:13, Luis R. Rodriguez wrote:
> > Since no upstream delta is needed for firmwared I'd like to first encourage
> > evaluating the above. While distributions don't carry it yet that may be 
> > seen as
> > an issue but since what we are looking for are corner cases, only folks 
> > needing
> > to deploy a specific solution would need it or a custom proprietary 
> > solution.
> 
> Ok. I will go try and run firmwared in OpenWrt on a router platform.
> Have to steal one from a colleague :-p Will study firmwared.

Arend, curious how this effort goes. It's important to me because if this works
we'll know it's a good approach to recommend going forward, one that should also
prove less complex than the soup we had with the custom fallback stuff.

  Luis

unregister_netdevice: waiting for lo to become free. Usage count = 1

2017-06-23 Thread Andrei Vagin

Hello Everyone,

Today I ran into a problem where every attempt to create a new network
namespace hangs. I see that one of the previous namespaces can't be destroyed
because the usage count of one of its devices isn't zero. To reproduce the
problem, execute the attached program in a separate network namespace:

[root@fc24 net]# cat run.sh
ip link set up dev lo
./tcp-bug
[root@fc24 net]# unshare -n sh run.sh
[root@fc24 net]# echo $?
0
[root@fc24 ~]# cat /proc/40/stack
[] msleep+0x3e/0x50
[] netdev_run_todo+0x12a/0x320
[] rtnl_unlock+0xe/0x10
[] default_device_exit_batch+0x14a/0x170
[] ops_exit_list.isra.6+0x52/0x60
[] cleanup_net+0x1ee/0x2f0
[] process_one_work+0x205/0x620
[] worker_thread+0x4e/0x3b0
[] kthread+0x114/0x150
[] ret_from_fork+0x2a/0x40
[] 0x
[root@fc24 ~]# dmesg | tail
[   97.071533] unregister_netdevice: waiting for lo to become free. Usage count = 1
[   97.079561] systemd-journald[180]: Sent WATCHDOG=1 notification.
[  107.319260] unregister_netdevice: waiting for lo to become free. Usage count = 1
[  117.567180] unregister_netdevice: waiting for lo to become free. Usage count = 1
[  127.807401] unregister_netdevice: waiting for lo to become free. Usage count = 1
[  138.055324] unregister_netdevice: waiting for lo to become free. Usage count = 1
[  148.303308] unregister_netdevice: waiting for lo to become free. Usage count = 1
[  158.559118] unregister_netdevice: waiting for lo to become free. Usage count = 1
[  168.807423] unregister_netdevice: waiting for lo to become free. Usage count = 1
[  179.055590] unregister_netdevice: waiting for lo to become free. Usage count = 1

This program creates a server TCP socket, then creates a pair of connected TCP
sockets, and then performs the actions that trigger the problem: it calls
connect() with AF_UNSPEC on one of the connected sockets and then calls
connect() again with the address of the server socket.
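
The sequence described above can be sketched in a few lines. This is only an
illustration, not the attached tcp-bug program: the helper names are made up,
and it assumes the connected pair is created through the same server socket,
which the description does not state explicitly.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: listen on 127.0.0.1 and let the kernel pick the port.
 * On success returns the listening socket and fills *addr with its address. */
int listen_on_loopback(struct sockaddr_in *addr)
{
	socklen_t len = sizeof(*addr);
	int srv;

	assert(addr != NULL);
	srv = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
	if (srv < 0)
		return -1;

	memset(addr, 0, sizeof(*addr));
	addr->sin_family = AF_INET;
	addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);

	if (bind(srv, (struct sockaddr *)addr, sizeof(*addr)) < 0 ||
	    listen(srv, 1) < 0 ||
	    getsockname(srv, (struct sockaddr *)addr, &len) < 0) {
		close(srv);
		return -1;
	}
	return srv;
}

/* Runs the sequence; returns 0 if every call succeeded, -1 otherwise. */
int reconnect_after_unspec(void)
{
	struct sockaddr_in srv_addr;
	struct sockaddr unspec;
	int srv, cli, acc, ret = -1;

	srv = listen_on_loopback(&srv_addr);
	if (srv < 0)
		return -1;

	cli = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
	if (cli < 0)
		goto out_srv;
	if (connect(cli, (struct sockaddr *)&srv_addr, sizeof(srv_addr)) < 0)
		goto out_cli;
	acc = accept(srv, NULL, NULL);
	if (acc < 0)
		goto out_cli;

	/* Step 1: dissolve the association on one end of the pair. */
	memset(&unspec, 0, sizeof(unspec));
	unspec.sa_family = AF_UNSPEC;
	if (connect(cli, &unspec, sizeof(unspec)) < 0)
		goto out_acc;

	/* Step 2: connect the same socket to the server again. */
	if (connect(cli, (struct sockaddr *)&srv_addr, sizeof(srv_addr)) < 0)
		goto out_acc;

	ret = 0;
out_acc:
	close(acc);
out_cli:
	close(cli);
out_srv:
	close(srv);
	return ret;
}
```

Calling reconnect_after_unspec() inside "unshare -n" with lo brought up would
follow the same steps as run.sh; whether it triggers the hang depends on the
kernel in use.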


Thanks,
Andrei
#include 
#include 
#include   /* for sockaddr_in and inet_ntoa() */

#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 


#define pr_err(fmt, ...)   \
printf("Error: " fmt, ##__VA_ARGS__)

#define pr_perror(fmt, ...) \
pr_err(fmt ": %s\n", ##__VA_ARGS__, strerror(errno))

#define fail(fmt, ...) \
pr_err(fmt ": %s\n", ##__VA_ARGS__, strerror(errno))

union sockaddr_inet {
	struct sockaddr addr;
	struct sockaddr_in v4;
	struct sockaddr_in6 v6;
};

int tcp_init_server(int family, int *port)
{
	union sockaddr_inet addr;
	int sock;
	int yes = 1, ret;

	memset(&addr,0,sizeof(addr));
	if (family == AF_INET) {
		addr.v4.sin_family = family;
		inet_pton(family, "0.0.0.0", &(addr.v4.sin_addr));
	} else if (family == AF_INET6){
		addr.v6.sin6_family = family;
		inet_pton(family, "::0", &(addr.v6.sin6_addr));
	} else
		return -1;

	sock = socket(family, SOCK_STREAM, IPPROTO_TCP);
	if (sock == -1) {
		pr_perror("socket() failed");
		return -1;
	}

	if (setsockopt(sock, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(int)) == -1 ) {
		pr_perror("setsockopt() error");
		return -1;
	}

	while (1) {
		if (family == AF_INET)
			addr.v4.sin_port = htons(*port);
		else if (family == AF_INET6)
			addr.v6.sin6_port = htons(*port);

		ret = bind(sock, (struct sockaddr *) &addr, sizeof(addr));

		/* criu doesn't restore sock opts, so we need this hack */
		if (ret == -1 && errno == EADDRINUSE) {
			(*port)++;
			continue;
		}
		break;
	}

	if (ret == -1) {
		pr_perror("bind() failed");
		return -1;
	}

	if (listen(sock, 1) == -1) {
		pr_perror("listen() failed");
		return -1;
	}
	return sock;
}

int tcp_accept_server(int sock)
{
	struct sockaddr_in maddr;
	int sock2;
	socklen_t addrlen;
#ifdef DEBUG
	test_msg ("Waiting for connection..\n");
#endif
	addrlen = sizeof(maddr);
	sock2 = accept(sock,(struct sockaddr *) &maddr, &addrlen);

	if (sock2 == -1) {
		pr_perror("accept() failed");
		return -1;
	}

#ifdef DEBUG
	test_msg ("Connection!!\n");
#endif
	return sock2;
}

int __tcp_init_client(int sock, int family, char *servIP, unsigned short servPort)
{
	union sockaddr_inet servAddr;

	/* Construct the server address structure */
	memset(&servAddr, 0, sizeof(servAddr));
	if (family == AF_INET) {
		servAddr.v4.sin_family  = AF_INET;
		servAddr.v4.sin_port= htons(servPort);
		inet_pton(AF_INET, servIP, &servAddr.v4.sin_addr);
	} else {
		servAddr.v6.sin6_family  = AF_INET6;
		servAddr.v6.sin6_port= htons(servPort);
		inet_pton(AF_INET6, servIP, &servAddr.v6.sin6_addr);
	}
	if (connect(sock, (struct sockaddr *) &servAddr, sizeof(servAddr)) < 0) {
		pr_perror("can't connect to server");
		return -1;
	}
	return sock;
}

int tcp_init_client(int family, char *servIP, unsigned short servPort)
{
	int sock;

	if ((sock = socket(family, SOCK_STREAM, IPPROTO_TCP)) < 0) {
		pr_perror("can't create socket");
		return -1;
	}
	return __tcp_init_cli

Re: [PATCH net-next v3 01/15] bpf: BPF support for sock_ops

2017-06-23 Thread Daniel Borkmann

On 06/23/2017 01:57 AM, Lawrence Brakmo wrote:

On 6/22/17, 4:19 PM, "netdev-ow...@vger.kernel.org on behalf of Daniel Borkmann" 
 wrote:

 On 06/23/2017 12:58 AM, Lawrence Brakmo wrote:
 [...]
 > Daniel, I see value in having a global program, so I would like to keep that. When
 > this patchset is accepted, I will submit one that adds support for per cgroup
 > sock_ops programs, with the option to use the global one if none is
 > specified for a cgroup. We could also have the option of the cgroup sock_ops
 > program choosing if the global program should run for a particular op based on
 > its return value. We can iron out the details when that patch is submitted.

 Hm, could you elaborate on the value part compared to per cgroups ops?
 My understanding is that per cgroup would already be a proper superset
 of just the global one anyway, so why not go with that in the first
 place since you're working on it?

 What would be the additional value? How would global vs per cgroup one
 interact with each other in terms of enforcement e.g., there's already
 semantics in place for cgroups descendants, would it be that we set
 TCP parameters twice or would you disable the global one altogether?
 Just wondering as you could avoid these altogether with going via cgroups
 initially.

 Thanks,
 Daniel

Well, for starters the global program will work even if CONFIG_CGROUP_BPF is
not defined. It is also an easier concept for when a global program is all that

Otoh, major distros are highly likely to enable this by default anyway.

is required. But I also had in mind that behaviors that were in common for
most cgroup programs could be handled by the global program instead of
adding it to all cgroup programs. In this scenario the global program
represents the default behavior that can be overridden by the cgroup
program (per op). For example, the cgroup program could return a value
to indicate that that op should be passed to the global program.

But then you would need to go through two program passes for setting
such parameters? Other option could be to make the per cgroup ops more
fine grained and use the effective one that was inherited for delegating
to default ops. My gut feeling is just that this makes interactions to
manage this and enforcement in combination with the later planned per
cgroups ops more complex if the same use-case could indeed be resolved
with per cgroups only.
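
To make the delegation idea and the two-pass concern concrete, here is a small
user-space model. It is a sketch only: the function-pointer "programs", the
EX_SOCK_OPS_DELEGATE return value and all other names are hypothetical and do
not correspond to any existing kernel interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical return value a per-cgroup program would use to say
 * "let the global program handle this op". */
#define EX_SOCK_OPS_DELEGATE (-1)

typedef int (*ex_sock_ops_prog)(int op);

/* Run the per-cgroup program first; fall through to the global program
 * when there is no cgroup program or when it asks to delegate. */
int ex_run_sock_ops(ex_sock_ops_prog cgroup_prog,
		    ex_sock_ops_prog global_prog, int op)
{
	int ret = EX_SOCK_OPS_DELEGATE;

	assert(cgroup_prog || global_prog);	/* something must be attached */

	if (cgroup_prog)
		ret = cgroup_prog(op);		/* first pass */
	if (ret == EX_SOCK_OPS_DELEGATE && global_prog)
		ret = global_prog(op);		/* second pass */
	return ret;
}

/* Example programs: the global one answers every op, the cgroup one
 * only overrides op 1 and delegates everything else. */
int ex_global_prog(int op)
{
	return 100 + op;
}

int ex_cgroup_prog(int op)
{
	return op == 1 ? 7 : EX_SOCK_OPS_DELEGATE;
}
```

With these example programs, op 1 is answered by the cgroup program in one
pass, while any other op costs two program runs, which is the overhead being
questioned.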

I agree 100% with you on the value of cgroup programs, but I just happen
to think there is also value in the global program.

Thanks,
Lawrence

Re: [PATCH v2] net/sctp/ulpevent.c: Deinline sctp_ulpevent_set_owner, save 1616 bytes

2017-06-23 Thread Marcelo Ricardo Leitner

On Wed, Jun 21, 2017 at 07:03:27PM +0200, Denys Vlasenko wrote:
> This function compiles to 147 bytes of machine code. 13 callsites.
> 
> I'm no expert on SCTP events, but quick reading of SCTP docs tells me that
> SCTP events are not happening on every packet.
> They are ASSOC_CHANGE, PEER_ADDR_CHANGE, REMOTE_ERROR and such.
> Does not look performance critical.
> 
> Signed-off-by: Denys Vlasenko 
> CC: Vlad Yasevich 
> CC: Neil Horman 
> CC: David Miller 
> CC: linux-s...@vger.kernel.org
> CC: netdev@vger.kernel.org
> CC: linux-ker...@vger.kernel.org
> ---
> Changed since v1:
> * reindented function argument list

Dave, this patch is marked as Changes Requested on patchwork, but the v2
here looks good. There was no change request on it, only on the v1, and
those were satisfied in v2. The only thing I asked on v2 was whether he had
a bigger plan for this; not sure if that's what caused the confusion.

Thanks,
Marcelo

Re: [PATCH v3 07/11] tty: improve tty_insert_flip_char() fast path

2017-06-23 Thread Greg Kroah-Hartman

On Thu, Jun 22, 2017 at 07:13:51PM +0200, Arnd Bergmann wrote:
> kernelci.org reports a crazy stack usage for the VT code when CONFIG_KASAN
> is enabled:
> 
> drivers/tty/vt/keyboard.c: In function 'kbd_keycode':
> drivers/tty/vt/keyboard.c:1452:1: error: the frame size of 2240 bytes is larger than 2048 bytes [-Werror=frame-larger-than=]
> 
> The problem is that tty_insert_flip_char() gets inlined many times into
> kbd_keycode(), and also into other functions, and each copy requires 128
> bytes for stack redzone to check for a possible out-of-bounds access on
> the 'ch' and 'flags' arguments that are passed into
> tty_insert_flip_string_flags as a variable-length string.
> 
> This introduces a new __tty_insert_flip_char() function for the slow
> path, which receives the two arguments by value. This completely avoids
> the problem and the stack usage goes back down to around 100 bytes.
> 
> Without KASAN, this is also slightly better, as we don't have to
> spill the arguments to the stack but can simply pass 'ch' and 'flag'
> in registers, saving a few bytes in .text for each call site.
> 
> This should be backported to linux-4.0 or later, which first introduced
> the stack sanitizer in the kernel.
> 
> Cc: sta...@vger.kernel.org
> Fixes: c420f167db8c ("kasan: enable stack instrumentation")
> Signed-off-by: Arnd Bergmann 
> ---
> I already submitted this separately to Greg, but he hasn't replied
> yet. I assume that it's fine if Andrew picks it up along with the
> other patches and drops it again in case Greg applies it to linux-next.

I've been traveling in China this week, give me a chance to catch up
please.

And no, I don't like this patch either. I think kasan needs to be fixed
here, not worked around in odd ways in code that is completely
acceptable to "sane" compilers.  But give me a week to catch up on my
pending stuff first...
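
For readers following along, the fast-path/slow-path split that the quoted
patch describes can be modelled in user space roughly as below. This is a
simplified sketch with made-up names (struct ex_char_buf and friends), not the
tty code: the point is only that the inline fast path never takes the address
of 'ch' or 'flag', and the out-of-line slow path receives them by value.

```c
#include <assert.h>
#include <stddef.h>

struct ex_char_buf {
	unsigned char data[4];
	unsigned char flags[4];
	size_t used;
	unsigned long flushed;	/* characters handed off so far */
};

/* Slow path, kept out of line (in the kernel it would live in a .c file).
 * Arguments arrive by value, so callers need no stack slots for them. */
int ex_insert_char_slow(struct ex_char_buf *b, unsigned char ch,
			unsigned char flag)
{
	b->flushed += b->used;	/* pretend the full buffer was pushed out */
	b->data[0] = ch;
	b->flags[0] = flag;
	b->used = 1;
	return 1;
}

/* Fast path: a bounds check and two stores, cheap enough to inline. */
static inline int ex_insert_char(struct ex_char_buf *b, unsigned char ch,
				 unsigned char flag)
{
	assert(b != NULL);
	if (b->used < sizeof(b->data)) {
		b->data[b->used] = ch;
		b->flags[b->used] = flag;
		b->used++;
		return 1;
	}
	return ex_insert_char_slow(b, ch, flag);
}
```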

thanks,

greg k-h

Re: [PATCH net-next 1/2] ipmr: restrict mroute "queue full" warning to related error values

2017-06-23 Thread Julien Gomes

On 06/23/2017 11:47 AM, David Miller wrote:

> From: Julien Gomes 
> Date: Fri, 23 Jun 2017 10:52:26 -0700
>
>> On 06/23/2017 10:39 AM, David Miller wrote:
>>
>>> From: Julien Gomes 
>>> Date: Wed, 21 Jun 2017 10:58:10 -0700
>>>
>>>> When sending a cache report on mroute_sk, mroute will emit a
>>>> "pending queue full" warning for every error value returned by
>>>> sock_queue_rcv_skb().
>>>> This warning can be misleading, for example on the EPERM error value
>>>> that sk_filter() can return.
>>>>
>>>> Restricting this warning to only ENOMEM or ENOBUFS seems more
>>>> appropriate.
>>>>
>>>> Signed-off-by: Julien Gomes 
>>> Incorrect, no other error codes are possible.
>>>
>>> We never attach a socket filter to these kernel internal sockets,
>>> therefore sk_filter() is not even applicable in this analysis.
>>>
>>> Therefore, -ENOBUFS and -ENOMEM are the only errors we can ever see
>>> returned from sock_queue_rcv_skb().
>>>
>>> This goes for your second patch as well.
>> Up to now I would agree, but now that cache reports are also sent
>> through Netlink, wouldn't it make sense to allow the user of mroute_sk
>> to attach a filter to it in order to not receive cache reports on it?
> There is not visibility of this socket outside of the kernel.

Hmm, maybe we are not talking about the same thing.

The value of this socket pointer is set by the MRT_INIT sockopt.
The socket is definitely visible outside of the kernel as it is first
created and used by the user for kernel-user communications
via the sockopts and the messages sent by the kernel to the user
through it.

The typical user setup up to now was to create this socket,
MRT_INIT to register it in ipmr, and handle the incoming packets,
including the cache reports.

Now that the cache reports are also sent through another medium,
the user should be able to decide whether it also wants the
reports on this socket, which, once again, it is already using.
And if the user wants to get the reports only through Netlink,
the kernel will currently emit those unrelated warnings.

Once again, we are not in the case of a purely internal kernel socket,
as this socket is used for internal kernel-user communications.
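
The check proposed in the patch amounts to a small predicate on the value
returned by sock_queue_rcv_skb(). A minimal sketch, with a hypothetical helper
name and assuming kernel-style negative errno values:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* True only for the errors that really mean "receive queue is full";
 * anything else (for example -EPERM from a socket filter) should not
 * produce the "pending queue full" warning. */
bool ex_is_queue_full(int err)
{
	assert(err <= 0);	/* 0 on success, negative errno on failure */
	return err == -ENOMEM || err == -ENOBUFS;
}
```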

-- 
Julien Gomes

[PATCH net-next v3 08/12] nfp: provide nfp_port to of nfp_net_get_mac_addr()

2017-06-23 Thread Simon Horman

Provide port rather than vNIC as parameter of nfp_net_get_mac_addr.
This is to allow this function to be used by representor netdevs where
a vNIC may have more than one physical port none of which are associated
with the vNIC.

Signed-off-by: Simon Horman 
Reviewed-by: Jakub Kicinski 
---
 drivers/net/ethernet/netronome/nfp/nfp_app_nic.c  |  2 +-
 drivers/net/ethernet/netronome/nfp/nfp_main.h |  3 ++-
 drivers/net/ethernet/netronome/nfp/nfp_net_main.c | 25 +++
 3 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_app_nic.c b/drivers/net/ethernet/netronome/nfp/nfp_app_nic.c
index 7b966bd3d214..c11a6c34e217 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_app_nic.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_app_nic.c
@@ -69,7 +69,7 @@ int nfp_app_nic_vnic_init(struct nfp_app *app, struct nfp_net *nn,
if (err)
return err < 0 ? err : 0;
 
-   nfp_net_get_mac_addr(app->pf, nn, id);
+   nfp_net_get_mac_addr(app->pf, nn->port, id);
 
return 0;
 }
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_main.h b/drivers/net/ethernet/netronome/nfp/nfp_main.h
index aa69d4101eb9..edc14dc78674 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_main.h
+++ b/drivers/net/ethernet/netronome/nfp/nfp_main.h
@@ -58,6 +58,7 @@ struct nfp_hwinfo;
 struct nfp_mip;
 struct nfp_net;
 struct nfp_nsp_identify;
+struct nfp_port;
 struct nfp_rtsym_table;
 
 /**
@@ -147,7 +148,7 @@ void nfp_hwmon_unregister(struct nfp_pf *pf);
 struct nfp_eth_table_port *
 nfp_net_find_port(struct nfp_eth_table *eth_tbl, unsigned int id);
 void
-nfp_net_get_mac_addr(struct nfp_pf *pf, struct nfp_net *nn, unsigned int id);
+nfp_net_get_mac_addr(struct nfp_pf *pf, struct nfp_port *port, unsigned int id);
 
 bool nfp_ctrl_tx(struct nfp_net *nn, struct sk_buff *skb);
 
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_main.c b/drivers/net/ethernet/netronome/nfp/nfp_net_main.c
index 911b764d7641..cfcbc3b9a9aa 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_main.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_main.c
@@ -135,25 +135,24 @@ static u8 __iomem *nfp_net_map_area(struct nfp_cpp *cpp,
 /**
  * nfp_net_get_mac_addr() - Get the MAC address.
  * @pf:   NFP PF handle
- * @nn:   NFP Network structure
+ * @port: NFP port structure
  * @id:  NFP port id
  *
  * First try to get the MAC address from NSP ETH table. If that
  * fails try HWInfo.  As a last resort generate a random address.
  */
 void
-nfp_net_get_mac_addr(struct nfp_pf *pf, struct nfp_net *nn, unsigned int id)
+nfp_net_get_mac_addr(struct nfp_pf *pf, struct nfp_port *port, unsigned int id)
 {
struct nfp_eth_table_port *eth_port;
-   struct nfp_net_dp *dp = &nn->dp;
u8 mac_addr[ETH_ALEN];
const char *mac_str;
char name[32];
 
-   eth_port = __nfp_port_get_eth_port(nn->port);
+   eth_port = __nfp_port_get_eth_port(port);
if (eth_port) {
-   ether_addr_copy(dp->netdev->dev_addr, eth_port->mac_addr);
-   ether_addr_copy(dp->netdev->perm_addr, eth_port->mac_addr);
+   ether_addr_copy(port->netdev->dev_addr, eth_port->mac_addr);
+   ether_addr_copy(port->netdev->perm_addr, eth_port->mac_addr);
return;
}
 
@@ -161,22 +160,22 @@ nfp_net_get_mac_addr(struct nfp_pf *pf, struct nfp_net *nn, unsigned int id)
 
mac_str = nfp_hwinfo_lookup(pf->hwinfo, name);
if (!mac_str) {
-   dev_warn(dp->dev, "Can't lookup MAC address. Generate\n");
-   eth_hw_addr_random(dp->netdev);
+   nfp_warn(pf->cpp, "Can't lookup MAC address. Generate\n");
+   eth_hw_addr_random(port->netdev);
return;
}
 
if (sscanf(mac_str, "%02hhx:%02hhx:%02hhx:%02hhx:%02hhx:%02hhx",
   &mac_addr[0], &mac_addr[1], &mac_addr[2],
   &mac_addr[3], &mac_addr[4], &mac_addr[5]) != 6) {
-   dev_warn(dp->dev,
-"Can't parse MAC address (%s). Generate.\n", mac_str);
-   eth_hw_addr_random(dp->netdev);
+   nfp_warn(pf->cpp, "Can't parse MAC address (%s). Generate.\n",
+mac_str);
+   eth_hw_addr_random(port->netdev);
return;
}
 
-   ether_addr_copy(dp->netdev->dev_addr, mac_addr);
-   ether_addr_copy(dp->netdev->perm_addr, mac_addr);
+   ether_addr_copy(port->netdev->dev_addr, mac_addr);
+   ether_addr_copy(port->netdev->perm_addr, mac_addr);
 }
 
 struct nfp_eth_table_port *
-- 
2.1.4
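
As a side note on the HWInfo step of this function, the sscanf()-based parsing
with a fallback signal can be tried out in user space. The sketch below uses a
hypothetical helper name and only mirrors the parse step; the lookup of the
string and the random-address fallback are left to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define EX_ETH_ALEN 6

/* Parse "aa:bb:cc:dd:ee:ff" into mac[]. Returns false for a NULL or
 * malformed string, in which case mac[] is left untouched and the caller
 * is expected to fall back (the patch falls back to a random address). */
bool ex_parse_mac(const char *mac_str, unsigned char mac[EX_ETH_ALEN])
{
	unsigned char tmp[EX_ETH_ALEN];

	assert(mac != NULL);
	if (!mac_str)
		return false;

	if (sscanf(mac_str, "%02hhx:%02hhx:%02hhx:%02hhx:%02hhx:%02hhx",
		   &tmp[0], &tmp[1], &tmp[2],
		   &tmp[3], &tmp[4], &tmp[5]) != 6)
		return false;

	memcpy(mac, tmp, sizeof(tmp));
	return true;
}
```

Parsing into a temporary and copying only on success keeps a half-parsed
address from leaking into the destination.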

[PATCH net-next v3 09/12] nfp: add support for tx/rx with metadata portid

2017-06-23 Thread Simon Horman

Allow tx/rx with metadata port id. This will be used for tx/rx of
representor netdevs acting as upper-devices while a pf netdev acts
as a lower-device.

Signed-off-by: Simon Horman 
Reviewed-by: Jakub Kicinski 
---
 drivers/net/ethernet/netronome/nfp/nfp_net.h   |  1 +
 .../net/ethernet/netronome/nfp/nfp_net_common.c| 57 +++---
 2 files changed, 52 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net.h b/drivers/net/ethernet/netronome/nfp/nfp_net.h
index b7446793106d..b1fa77bd708b 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net.h
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net.h
@@ -318,6 +318,7 @@ struct nfp_meta_parsed {
u8 csum_type;
u32 hash;
u32 mark;
+   u32 portid;
__wsum csum;
 };
 
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index 2134493ec8a8..2e728543e840 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -755,6 +755,26 @@ static void nfp_net_tx_xmit_more_flush(struct nfp_net_tx_ring *tx_ring)
tx_ring->wr_ptr_add = 0;
 }
 
+static int nfp_net_prep_port_id(struct sk_buff *skb)
+{
+   struct metadata_dst *md_dst = skb_metadata_dst(skb);
+   unsigned char *data;
+
+   if (likely(!md_dst))
+   return 0;
+   if (unlikely(md_dst->type != METADATA_HW_PORT_MUX))
+   return 0;
+
+   if (unlikely(skb_cow_head(skb, 8)))
+   return -ENOMEM;
+
+   data = skb_push(skb, 8);
+   put_unaligned_be32(NFP_NET_META_PORTID, data);
+   put_unaligned_be32(md_dst->u.port_info.port_id, data + 4);
+
+   return 8;
+}
+
 /**
  * nfp_net_tx() - Main transmit entry point
  * @skb:SKB to transmit
@@ -767,6 +787,7 @@ static int nfp_net_tx(struct sk_buff *skb, struct net_device *netdev)
struct nfp_net *nn = netdev_priv(netdev);
const struct skb_frag_struct *frag;
struct nfp_net_tx_desc *txd, txdg;
+   int f, nr_frags, wr_idx, md_bytes;
struct nfp_net_tx_ring *tx_ring;
struct nfp_net_r_vector *r_vec;
struct nfp_net_tx_buf *txbuf;
@@ -774,8 +795,6 @@ static int nfp_net_tx(struct sk_buff *skb, struct net_device *netdev)
struct nfp_net_dp *dp;
dma_addr_t dma_addr;
unsigned int fsize;
-   int f, nr_frags;
-   int wr_idx;
u16 qidx;
 
dp = &nn->dp;
@@ -797,6 +816,13 @@ static int nfp_net_tx(struct sk_buff *skb, struct net_device *netdev)
return NETDEV_TX_BUSY;
}
 
+   md_bytes = nfp_net_prep_port_id(skb);
+   if (unlikely(md_bytes < 0)) {
+   nfp_net_tx_xmit_more_flush(tx_ring);
+   dev_kfree_skb_any(skb);
+   return NETDEV_TX_OK;
+   }
+
/* Start with the head skbuf */
dma_addr = dma_map_single(dp->dev, skb->data, skb_headlen(skb),
  DMA_TO_DEVICE);
@@ -815,7 +841,7 @@ static int nfp_net_tx(struct sk_buff *skb, struct net_device *netdev)
 
/* Build TX descriptor */
txd = &tx_ring->txds[wr_idx];
-   txd->offset_eop = (nr_frags == 0) ? PCIE_DESC_TX_EOP : 0;
+   txd->offset_eop = (nr_frags ? 0 : PCIE_DESC_TX_EOP) | md_bytes;
txd->dma_len = cpu_to_le16(skb_headlen(skb));
nfp_desc_set_dma_addr(txd, dma_addr);
txd->data_len = cpu_to_le16(skb->len);
@@ -855,7 +881,7 @@ static int nfp_net_tx(struct sk_buff *skb, struct net_device *netdev)
*txd = txdg;
txd->dma_len = cpu_to_le16(fsize);
nfp_desc_set_dma_addr(txd, dma_addr);
-   txd->offset_eop =
+   txd->offset_eop |=
(f == nr_frags - 1) ? PCIE_DESC_TX_EOP : 0;
}
 
@@ -1450,6 +1476,10 @@ nfp_net_parse_meta(struct net_device *netdev, struct nfp_meta_parsed *meta,
meta->mark = get_unaligned_be32(data);
data += 4;
break;
+   case NFP_NET_META_PORTID:
+   meta->portid = get_unaligned_be32(data);
+   data += 4;
+   break;
case NFP_NET_META_CSUM:
meta->csum_type = CHECKSUM_COMPLETE;
meta->csum =
@@ -1594,6 +1624,7 @@ static int nfp_net_rx(struct nfp_net_rx_ring *rx_ring, int budget)
struct nfp_net_rx_buf *rxbuf;
struct nfp_net_rx_desc *rxd;
struct nfp_meta_parsed meta;
+   struct net_device *netdev;
dma_addr_t new_dma_addr;
void *new_frag;
 
@@ -1672,7 +1703,7 @@ static int nfp_net_rx(struct nfp_net_rx_ring *rx_ring, int budget)
}
 
if (xdp_prog && !(rxd->rxd.flags & PCIE_DESC_RX_BPF &&
-
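
The 8-byte prepend built by nfp_net_prep_port_id() above is two big-endian
32-bit words: a type tag followed by the port id. The user-space sketch below
reproduces that layout. The tag value and all ex_* names are assumptions for
illustration, and the read-back helper only reverses the layout written here;
it is not a model of nfp_net_parse_meta().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type tag; the real NFP_NET_META_PORTID value is defined by
 * the driver and is not shown in the quoted hunks. */
#define EX_META_PORTID	5u
#define EX_META_LEN	8

static void ex_put_be32(uint32_t v, unsigned char *p)
{
	p[0] = (unsigned char)(v >> 24);
	p[1] = (unsigned char)(v >> 16);
	p[2] = (unsigned char)(v >> 8);
	p[3] = (unsigned char)v;
}

static uint32_t ex_get_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Write the 8-byte prepend into buf: type word, then port id. */
size_t ex_prep_port_id(unsigned char *buf, uint32_t port_id)
{
	assert(buf != NULL);
	ex_put_be32(EX_META_PORTID, buf);
	ex_put_be32(port_id, buf + 4);
	return EX_META_LEN;
}

/* Read it back; returns 0 on success, -1 if the type word does not match. */
int ex_parse_port_id(const unsigned char *buf, uint32_t *port_id)
{
	assert(buf != NULL && port_id != NULL);
	if (ex_get_be32(buf) != EX_META_PORTID)
		return -1;
	*port_id = ex_get_be32(buf + 4);
	return 0;
}
```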

[PATCH net-next v3 11/12] nfp: add flower app

2017-06-23 Thread Simon Horman

Add app for flower offload. At this point the PF netdev and phys port
representor netdevs are initialised. Follow-up work will add support for
VF and PF representors and beyond that offloading the flower classifier.

Based in part on work by Benjamin LaHaise and Bert van Leeuwen.

Signed-off-by: Simon Horman 
Reviewed-by: Jakub Kicinski 
---
 drivers/net/ethernet/netronome/nfp/Makefile  |   1 +
 drivers/net/ethernet/netronome/nfp/flower/main.c | 294 +++
 drivers/net/ethernet/netronome/nfp/nfp_app.c |   1 +
 drivers/net/ethernet/netronome/nfp/nfp_app.h |   4 +
 4 files changed, 300 insertions(+)
 create mode 100644 drivers/net/ethernet/netronome/nfp/flower/main.c

diff --git a/drivers/net/ethernet/netronome/nfp/Makefile b/drivers/net/ethernet/netronome/nfp/Makefile
index e14f62863add..10b556b2c59d 100644
--- a/drivers/net/ethernet/netronome/nfp/Makefile
+++ b/drivers/net/ethernet/netronome/nfp/Makefile
@@ -28,6 +28,7 @@ nfp-objs := \
bpf/main.o \
bpf/offload.o \
flower/cmsg.o \
+   flower/main.o \
nic/main.o
 
 ifeq ($(CONFIG_BPF_SYSCALL),y)
diff --git a/drivers/net/ethernet/netronome/nfp/flower/main.c b/drivers/net/ethernet/netronome/nfp/flower/main.c
new file mode 100644
index ..54d8180317ec
--- /dev/null
+++ b/drivers/net/ethernet/netronome/nfp/flower/main.c
@@ -0,0 +1,294 @@
+/*
+ * Copyright (C) 2017 Netronome Systems, Inc.
+ *
+ * This software is dual licensed under the GNU General License Version 2,
+ * June 1991 as shown in the file COPYING in the top-level directory of this
+ * source tree or the BSD 2-Clause License provided below.  You have the
+ * option to license this software under the complete terms of either license.
+ *
+ * The BSD 2-Clause License:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  1. Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ *  2. Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "../nfpcore/nfp_cpp.h"
+#include "../nfpcore/nfp_nsp.h"
+#include "../nfp_app.h"
+#include "../nfp_main.h"
+#include "../nfp_net.h"
+#include "../nfp_net_repr.h"
+#include "../nfp_port.h"
+#include "./cmsg.h"
+
+/**
+ * struct nfp_flower_priv - Flower APP per-vNIC priv data
+ * @nn: Pointer to vNIC
+ */
+struct nfp_flower_priv {
+   struct nfp_net *nn;
+};
+
+static const char *nfp_flower_extra_cap(struct nfp_app *app, struct nfp_net *nn)
+{
+   return "FLOWER";
+}
+
+static enum devlink_eswitch_mode eswitch_mode_get(struct nfp_app *app)
+{
+   return DEVLINK_ESWITCH_MODE_SWITCHDEV;
+}
+
+static enum nfp_repr_type
+nfp_flower_repr_get_type_and_port(struct nfp_app *app, u32 port_id, u8 *port)
+{
+   switch (FIELD_GET(NFP_FLOWER_CMSG_PORT_TYPE, port_id)) {
+   case NFP_FLOWER_CMSG_PORT_TYPE_PHYS_PORT:
+   *port = FIELD_GET(NFP_FLOWER_CMSG_PORT_PHYS_PORT_NUM,
+ port_id);
+   return NFP_REPR_TYPE_PHYS_PORT;
+
+   case NFP_FLOWER_CMSG_PORT_TYPE_PCIE_PORT:
+   *port = FIELD_GET(NFP_FLOWER_CMSG_PORT_VNIC, port_id);
+   if (FIELD_GET(NFP_FLOWER_CMSG_PORT_VNIC_TYPE, port_id) ==
+   NFP_FLOWER_CMSG_PORT_VNIC_TYPE_PF)
+   return NFP_REPR_TYPE_PF;
+   else
+   return NFP_REPR_TYPE_VF;
+   }
+
+   return NFP_FLOWER_CMSG_PORT_TYPE_UNSPEC;
+}
+
+static struct net_device *
+nfp_flower_repr_get(struct nfp_app *app, u32 port_id)
+{
+   enum nfp_repr_type repr_type;
+   struct nfp_reprs *reprs;
+   u8 port = 0;
+
+   repr_type = nfp_flower_repr_get_type_and_port(app, port_id, &port);
+
+   reprs = rcu_dereference(app->reprs[repr_type]);
+   if (!reprs)
+   return NULL;
+
+   if (port >= reprs->num_reprs)
+   return NULL;
+
+   return reprs->reprs[port];
+}
+
+static void
+nfp_flower_repr_netdev_get_stats64(struct net_device *netdev,
+  str
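
The representor lookup in nfp_flower_repr_get() above boils down to decoding
bit fields from port_id and doing a bounds-checked table lookup. A user-space
sketch of that idea follows. The field layout, the type value and every ex_*
name are made up for illustration (the real NFP_FLOWER_CMSG_PORT_* masks live
in the driver's cmsg.h), and only the physical-port case is modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical field layout, for illustration only. */
#define EX_PORT_TYPE	0xf0000000u	/* bits 31..28: port type */
#define EX_PORT_NUM	0x000000ffu	/* bits  7..0:  port number */

#define EX_TYPE_PHYS	1u

/* User-space stand-in for the kernel's FIELD_GET(): mask the value, then
 * divide by the mask's lowest set bit, which equals a right shift. */
static inline uint32_t ex_field_get(uint32_t mask, uint32_t val)
{
	assert(mask != 0);
	return (val & mask) / (mask & (~mask + 1u));
}

struct ex_reprs {
	unsigned int num_reprs;
	const char *reprs[4];	/* stand-in for struct net_device pointers */
};

/* Decode port_id and return the matching entry, or NULL if the type is not
 * a physical port, the table is missing, or the index is out of range. */
const char *ex_repr_get(const struct ex_reprs *phys, uint32_t port_id)
{
	uint32_t port;

	if (ex_field_get(EX_PORT_TYPE, port_id) != EX_TYPE_PHYS)
		return NULL;
	if (!phys)
		return NULL;

	port = ex_field_get(EX_PORT_NUM, port_id);
	if (port >= phys->num_reprs)
		return NULL;

	return phys->reprs[port];
}
```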

[PATCH net-next v3 12/12] nfp: add VF and PF representors to flower app

2017-06-23 Thread Simon Horman

Initialise VF and PF representors in flower app.

Based in part on work by Benjamin LaHaise, Bert van Leeuwen and
Jakub Kicinski.

Signed-off-by: Simon Horman 
Reviewed-by: Jakub Kicinski 
---
v3
* Do not associate VF representors with PF using SET_NETDEV_DEV
* Use random MAC address for VF representors
---
 drivers/net/ethernet/netronome/nfp/flower/main.c | 85 +++-
 1 file changed, 83 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/flower/main.c b/drivers/net/ethernet/netronome/nfp/flower/main.c
index 54d8180317ec..8e5ca6b4bb33 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/main.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/main.c
@@ -149,15 +149,80 @@ static const struct net_device_ops nfp_flower_repr_netdev_ops = {
.ndo_get_offload_stats  = nfp_repr_get_offload_stats,
 };
 
+static void nfp_flower_sriov_disable(struct nfp_app *app)
+{
+   nfp_reprs_clean_and_free_by_type(app, NFP_REPR_TYPE_VF);
+}
+
+static int
+nfp_flower_spawn_vnic_reprs(struct nfp_app *app,
+   enum nfp_flower_cmsg_port_vnic_type vnic_type,
+   enum nfp_repr_type repr_type, unsigned int cnt)
+{
+   u8 nfp_pcie = nfp_cppcore_pcie_unit(app->pf->cpp);
+   struct nfp_flower_priv *priv = app->priv;
+   struct nfp_reprs *reprs, *old_reprs;
+   const u8 queue = 0;
+   int i, err;
+
+   reprs = nfp_reprs_alloc(cnt);
+   if (!reprs)
+   return -ENOMEM;
+
+   for (i = 0; i < cnt; i++) {
+   u32 port_id;
+
+   reprs->reprs[i] = nfp_repr_alloc(app);
+   if (!reprs->reprs[i]) {
+   err = -ENOMEM;
+   goto err_reprs_clean;
+   }
+
+   eth_hw_addr_random(reprs->reprs[i]);
+
+   port_id = nfp_flower_cmsg_pcie_port(nfp_pcie, vnic_type,
+   i, queue);
+   err = nfp_repr_init(app, reprs->reprs[i],
+   &nfp_flower_repr_netdev_ops,
+   port_id, NULL, priv->nn->dp.netdev);
+   if (err)
+   goto err_reprs_clean;
+
+   nfp_info(app->cpp, "%s%d Representor(%s) created\n",
+repr_type == NFP_REPR_TYPE_PF ? "PF" : "VF", i,
+reprs->reprs[i]->name);
+   }
+
+   old_reprs = nfp_app_reprs_set(app, repr_type, reprs);
+   if (IS_ERR(old_reprs)) {
+   err = PTR_ERR(old_reprs);
+   goto err_reprs_clean;
+   }
+
+   return 0;
+err_reprs_clean:
+   nfp_reprs_clean_and_free(reprs);
+   return err;
+}
+
+static int nfp_flower_sriov_enable(struct nfp_app *app, int num_vfs)
+{
+   return nfp_flower_spawn_vnic_reprs(app,
+  NFP_FLOWER_CMSG_PORT_VNIC_TYPE_VF,
+  NFP_REPR_TYPE_VF, num_vfs);
+}
+
 static void nfp_flower_stop(struct nfp_app *app)
 {
+   nfp_reprs_clean_and_free_by_type(app, NFP_REPR_TYPE_PF);
nfp_reprs_clean_and_free_by_type(app, NFP_REPR_TYPE_PHYS_PORT);
+
 }
 
-static int nfp_flower_start(struct nfp_app *app)
+static int
+nfp_flower_spawn_phy_reprs(struct nfp_app *app, struct nfp_flower_priv *priv)
 {
struct nfp_eth_table *eth_tbl = app->pf->eth_tbl;
-   struct nfp_flower_priv *priv = app->priv;
struct nfp_reprs *reprs, *old_reprs;
unsigned int i;
int err;
@@ -218,6 +283,19 @@ static int nfp_flower_start(struct nfp_app *app)
return err;
 }
 
+static int nfp_flower_start(struct nfp_app *app)
+{
+   int err;
+
+   err = nfp_flower_spawn_phy_reprs(app, app->priv);
+   if (err)
+   return err;
+
+   return nfp_flower_spawn_vnic_reprs(app,
+  NFP_FLOWER_CMSG_PORT_VNIC_TYPE_PF,
+  NFP_REPR_TYPE_PF, 1);
+}
+
 static void nfp_flower_vnic_clean(struct nfp_app *app, struct nfp_net *nn)
 {
kfree(app->priv);
@@ -289,6 +367,9 @@ const struct nfp_app_type app_flower = {
 
.ctrl_msg_rx= nfp_flower_cmsg_rx,
 
+   .sriov_enable   = nfp_flower_sriov_enable,
+   .sriov_disable  = nfp_flower_sriov_disable,
+
.eswitch_mode_get  = eswitch_mode_get,
.repr_get   = nfp_flower_repr_get,
 };
-- 
2.1.4
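
The error handling in nfp_flower_spawn_vnic_reprs() follows a common shape:
build the whole set first, publish it only if the slot is still empty, and
unwind through a single label on any failure. A user-space sketch of that
shape, with hypothetical ex_* names and a fail_at parameter added purely so
the failure path can be exercised:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct ex_reprs {
	unsigned int num;
	void *items[];		/* stand-in for representor netdevs */
};

struct ex_reprs *ex_reprs_alloc(unsigned int n)
{
	struct ex_reprs *r = calloc(1, sizeof(*r) + n * sizeof(r->items[0]));

	if (r)
		r->num = n;
	return r;
}

void ex_reprs_free(struct ex_reprs *r)
{
	unsigned int i;

	if (!r)
		return;
	for (i = 0; i < r->num; i++)
		free(r->items[i]);	/* free(NULL) is a no-op */
	free(r);
}

/* Fill a new set and publish it in *slot. fail_at is the index at which
 * item allocation is made to fail, or -1 for no injected failure. */
int ex_spawn_reprs(struct ex_reprs **slot, unsigned int cnt, int fail_at)
{
	struct ex_reprs *r;
	unsigned int i;
	int err;

	assert(slot != NULL);

	r = ex_reprs_alloc(cnt);
	if (!r)
		return -ENOMEM;

	for (i = 0; i < cnt; i++) {
		r->items[i] = ((int)i == fail_at) ? NULL : malloc(1);
		if (!r->items[i]) {
			err = -ENOMEM;
			goto err_clean;
		}
	}

	if (*slot) {			/* a set is already published */
		err = -EBUSY;
		goto err_clean;
	}
	*slot = r;
	return 0;

err_clean:
	ex_reprs_free(r);
	return err;
}
```

Because nothing is published until every item exists, a failure halfway
through never leaves a partially initialised set visible to readers.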

[PATCH net-next v3 05/12] nfp: general representor implementation

2017-06-23 Thread Simon Horman

Provide infrastructure to create and destroy representors of a given type.

Parts based on work by Bert van Leeuwen, Benjamin LaHaise,
and Jakub Kicinski.

Signed-off-by: Simon Horman 
Reviewed-by: Jakub Kicinski 
---
 drivers/net/ethernet/netronome/nfp/Makefile   |   1 +
 drivers/net/ethernet/netronome/nfp/nfp_app.c  |  20 +++
 drivers/net/ethernet/netronome/nfp/nfp_app.h  |  18 +++
 drivers/net/ethernet/netronome/nfp/nfp_net_repr.c | 156 ++
 drivers/net/ethernet/netronome/nfp/nfp_net_repr.h |  92 +
 5 files changed, 287 insertions(+)
 create mode 100644 drivers/net/ethernet/netronome/nfp/nfp_net_repr.c
 create mode 100644 drivers/net/ethernet/netronome/nfp/nfp_net_repr.h

diff --git a/drivers/net/ethernet/netronome/nfp/Makefile b/drivers/net/ethernet/netronome/nfp/Makefile
index 5ad9a557f06a..a401113035f5 100644
--- a/drivers/net/ethernet/netronome/nfp/Makefile
+++ b/drivers/net/ethernet/netronome/nfp/Makefile
@@ -22,6 +22,7 @@ nfp-objs := \
nfp_net_common.o \
nfp_net_ethtool.o \
nfp_net_main.o \
+   nfp_net_repr.o \
nfp_netvf_main.o \
nfp_port.o \
bpf/main.o \
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_app.c b/drivers/net/ethernet/netronome/nfp/nfp_app.c
index 396b93f54823..c9ccb0f94604 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_app.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_app.c
@@ -38,6 +38,7 @@
 #include "nfpcore/nfp_nffw.h"
 #include "nfp_app.h"
 #include "nfp_main.h"
+#include "nfp_net_repr.h"
 
 static const struct nfp_app_type *apps[] = {
&app_nic,
@@ -68,6 +69,25 @@ struct sk_buff *nfp_app_ctrl_msg_alloc(struct nfp_app *app, unsigned int size)
return skb;
 }
 
+struct nfp_reprs *
+nfp_app_reprs_set(struct nfp_app *app, enum nfp_repr_type type,
+ struct nfp_reprs *reprs)
+{
+   struct nfp_reprs *old;
+
+   old = rcu_dereference_protected(app->reprs[type],
+   lockdep_is_held(&app->pf->lock));
+   if (reprs && old) {
+   old = ERR_PTR(-EBUSY);
+   goto exit_unlock;
+   }
+
+   rcu_assign_pointer(app->reprs[type], reprs);
+
+exit_unlock:
+   return old;
+}
+
 struct nfp_app *nfp_app_alloc(struct nfp_pf *pf, enum nfp_app_id id)
 {
struct nfp_app *app;
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_app.h b/drivers/net/ethernet/netronome/nfp/nfp_app.h
index 0fee14ffa081..af023a0491e7 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_app.h
+++ b/drivers/net/ethernet/netronome/nfp/nfp_app.h
@@ -36,6 +36,8 @@
 
 #include 
 
+#include "nfp_net_repr.h"
+
 struct bpf_prog;
 struct net_device;
 struct pci_dev;
@@ -73,6 +75,7 @@ extern const struct nfp_app_type app_bpf;
  * @tc_busy:   TC HW offload busy (rules loaded)
  * @xdp_offload:offload an XDP program
  * @eswitch_mode_get:get SR-IOV eswitch mode
+ * @repr_get:  get representor netdev
  */
 struct nfp_app_type {
enum nfp_app_id id;
@@ -100,6 +103,7 @@ struct nfp_app_type {
   struct bpf_prog *prog);
 
enum devlink_eswitch_mode (*eswitch_mode_get)(struct nfp_app *app);
+   struct net_device *(*repr_get)(struct nfp_app *app, u32 id);
 };
 
 /**
@@ -108,6 +112,7 @@ struct nfp_app_type {
  * @pf:backpointer to NFP PF structure
  * @cpp:   pointer to the CPP handle
  * @ctrl:  pointer to ctrl vNIC struct
+ * @reprs: array of pointers to representors
  * @type:  pointer to const application ops and info
  */
 struct nfp_app {
@@ -116,6 +121,7 @@ struct nfp_app {
struct nfp_cpp *cpp;
 
struct nfp_net *ctrl;
+   struct nfp_reprs __rcu *reprs[NFP_REPR_TYPE_MAX + 1];
 
const struct nfp_app_type *type;
 };
@@ -231,6 +237,18 @@ static inline int nfp_app_eswitch_mode_get(struct nfp_app *app, u16 *mode)
return 0;
 }
 
+static inline struct net_device *nfp_app_repr_get(struct nfp_app *app, u32 id)
+{
+   if (unlikely(!app || !app->type->repr_get))
+   return NULL;
+
+   return app->type->repr_get(app, id);
+}
+
+struct nfp_reprs *
+nfp_app_reprs_set(struct nfp_app *app, enum nfp_repr_type type,
+ struct nfp_reprs *reprs);
+
 const char *nfp_app_mip_name(struct nfp_app *app);
 struct sk_buff *nfp_app_ctrl_msg_alloc(struct nfp_app *app, unsigned int size);
 
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c b/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c
new file mode 100644
index ..8e02f843ae92
--- /dev/null
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c
@@ -0,0 +1,156 @@
+/*
+ * Copyright (C) 2017 Netronome Systems, Inc.
+ *
+ * This software is dual licensed under the GNU General License Version 2,
+ * June 1991 as shown in the file COPYING in the top-level directory of this
+ * source tree or the BSD 2-Clause License provided below.  You have the
+ * option to license this software under the comp

[PATCH net-next v3 06/12] nfp: add stats and xmit helpers for representors

2017-06-23 Thread Simon Horman

Provide helpers for stats and xmit on representor netdevs.

Parts based on work by Bert van Leeuwen, Benjamin LaHaise and
Jakub Kicinski.

Signed-off-by: Simon Horman 
Reviewed-by: Jakub Kicinski 
---
 drivers/net/ethernet/netronome/nfp/nfp_net_repr.c | 199 +-
 drivers/net/ethernet/netronome/nfp/nfp_net_repr.h |  28 +++
 2 files changed, 226 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c b/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c
index 8e02f843ae92..44adcc5df11e 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c
@@ -32,15 +32,198 @@
  */
 
 #include 
+#include 
 #include 
 #include 
 
 #include "nfpcore/nfp_cpp.h"
 #include "nfp_app.h"
 #include "nfp_main.h"
+#include "nfp_net_ctrl.h"
 #include "nfp_net_repr.h"
 #include "nfp_port.h"
 
+static void
+nfp_repr_inc_tx_stats(struct net_device *netdev, unsigned int len,
+ int tx_status)
+{
+   struct nfp_repr *repr = netdev_priv(netdev);
+   struct nfp_repr_pcpu_stats *stats;
+
+   if (unlikely(tx_status != NET_XMIT_SUCCESS &&
+tx_status != NET_XMIT_CN)) {
+   this_cpu_inc(repr->stats->tx_drops);
+   return;
+   }
+
+   stats = this_cpu_ptr(repr->stats);
+   u64_stats_update_begin(&stats->syncp);
+   stats->tx_packets++;
+   stats->tx_bytes += len;
+   u64_stats_update_end(&stats->syncp);
+}
+
+void nfp_repr_inc_rx_stats(struct net_device *netdev, unsigned int len)
+{
+   struct nfp_repr *repr = netdev_priv(netdev);
+   struct nfp_repr_pcpu_stats *stats;
+
+   stats = this_cpu_ptr(repr->stats);
+   u64_stats_update_begin(&stats->syncp);
+   stats->rx_packets++;
+   stats->rx_bytes += len;
+   u64_stats_update_end(&stats->syncp);
+}
+
+static void
+nfp_repr_phy_port_get_stats64(const struct nfp_app *app, u8 phy_port,
+ struct rtnl_link_stats64 *stats)
+{
+   u8 __iomem *mem;
+
+   mem = app->pf->mac_stats_mem + phy_port * NFP_MAC_STATS_SIZE;
+
+   /* TX and RX stats are flipped as we are returning the stats as seen
+* at the switch port corresponding to the phys port.
+*/
+   stats->tx_packets = readq(mem + NFP_MAC_STATS_RX_FRAMES_RECEIVED_OK);
+   stats->tx_bytes = readq(mem + NFP_MAC_STATS_RX_IN_OCTETS);
+   stats->tx_dropped = readq(mem + NFP_MAC_STATS_RX_IN_ERRORS);
+
+   stats->rx_packets = readq(mem + NFP_MAC_STATS_TX_FRAMES_TRANSMITTED_OK);
+   stats->rx_bytes = readq(mem + NFP_MAC_STATS_TX_OUT_OCTETS);
+   stats->rx_dropped = readq(mem + NFP_MAC_STATS_TX_OUT_ERRORS);
+}
+
+static void
+nfp_repr_vf_get_stats64(const struct nfp_app *app, u8 vf,
+   struct rtnl_link_stats64 *stats)
+{
+   u8 __iomem *mem;
+
+   mem = app->pf->vf_cfg_mem + vf * NFP_NET_CFG_BAR_SZ;
+
+   /* TX and RX stats are flipped as we are returning the stats as seen
+* at the switch port corresponding to the VF.
+*/
+   stats->tx_packets = readq(mem + NFP_NET_CFG_STATS_RX_FRAMES);
+   stats->tx_bytes = readq(mem + NFP_NET_CFG_STATS_RX_OCTETS);
+   stats->tx_dropped = readq(mem + NFP_NET_CFG_STATS_RX_DISCARDS);
+
+   stats->rx_packets = readq(mem + NFP_NET_CFG_STATS_TX_FRAMES);
+   stats->rx_bytes = readq(mem + NFP_NET_CFG_STATS_TX_OCTETS);
+   stats->rx_dropped = readq(mem + NFP_NET_CFG_STATS_TX_DISCARDS);
+}
+
+static void
+nfp_repr_pf_get_stats64(const struct nfp_app *app, u8 pf,
+   struct rtnl_link_stats64 *stats)
+{
+   u8 __iomem *mem;
+
+   if (pf)
+   return;
+
+   mem = nfp_cpp_area_iomem(app->pf->data_vnic_bar);
+
+   stats->tx_packets = readq(mem + NFP_NET_CFG_STATS_RX_FRAMES);
+   stats->tx_bytes = readq(mem + NFP_NET_CFG_STATS_RX_OCTETS);
+   stats->tx_dropped = readq(mem + NFP_NET_CFG_STATS_RX_DISCARDS);
+
+   stats->rx_packets = readq(mem + NFP_NET_CFG_STATS_TX_FRAMES);
+   stats->rx_bytes = readq(mem + NFP_NET_CFG_STATS_TX_OCTETS);
+   stats->rx_dropped = readq(mem + NFP_NET_CFG_STATS_TX_DISCARDS);
+}
+
+void
+nfp_repr_get_stats64(const struct nfp_app *app, enum nfp_repr_type type,
+u8 port, struct rtnl_link_stats64 *stats)
+{
+   switch (type) {
+   case NFP_REPR_TYPE_PHYS_PORT:
+   nfp_repr_phy_port_get_stats64(app, port, stats);
+   break;
+   case NFP_REPR_TYPE_PF:
+   nfp_repr_pf_get_stats64(app, port, stats);
+   break;
+   case NFP_REPR_TYPE_VF:
+   nfp_repr_vf_get_stats64(app, port, stats);
+   default:
+   break;
+   }
+}
+
+bool
+nfp_repr_has_offload_stats(const struct net_device *dev, int attr_id)
+{
+   switch (attr_id) {
+   case IFLA_OFFLOAD_XSTATS_CPU_HIT:
+   return true;
+   }
+
+   return false;
+}
+
+static int
+nfp_repr_get_hos

[PATCH net-next v3 10/12] nfp: add support for control messages for flower app

2017-06-23 Thread Simon Horman

In preparation for adding a new flower app - targeted at offloading
the flower classifier - provide support for the control messages that it
will use to communicate with the NFP.

Based in part on work by Bert van Leeuwen.

Signed-off-by: Simon Horman 
Reviewed-by: Jakub Kicinski 
---
v3
* Provide carrier_ok as a parameter to nfp_flower_cmsg_portmod() rather
  than calling netif_carrier_ok() as nfp_flower_cmsg_portmod() may
  be called before the netdev is updated.
---
 drivers/net/ethernet/netronome/nfp/Makefile  |   1 +
 drivers/net/ethernet/netronome/nfp/flower/cmsg.c | 159 +++
 drivers/net/ethernet/netronome/nfp/flower/cmsg.h | 116 +
 drivers/net/ethernet/netronome/nfp/nfp_app.c |   5 +-
 drivers/net/ethernet/netronome/nfp/nfp_app.h |   3 +-
 5 files changed, 281 insertions(+), 3 deletions(-)
 create mode 100644 drivers/net/ethernet/netronome/nfp/flower/cmsg.c
 create mode 100644 drivers/net/ethernet/netronome/nfp/flower/cmsg.h

diff --git a/drivers/net/ethernet/netronome/nfp/Makefile b/drivers/net/ethernet/netronome/nfp/Makefile
index a401113035f5..e14f62863add 100644
--- a/drivers/net/ethernet/netronome/nfp/Makefile
+++ b/drivers/net/ethernet/netronome/nfp/Makefile
@@ -27,6 +27,7 @@ nfp-objs := \
nfp_port.o \
bpf/main.o \
bpf/offload.o \
+   flower/cmsg.o \
nic/main.o
 
 ifeq ($(CONFIG_BPF_SYSCALL),y)
diff --git a/drivers/net/ethernet/netronome/nfp/flower/cmsg.c b/drivers/net/ethernet/netronome/nfp/flower/cmsg.c
new file mode 100644
index ..7761be436726
--- /dev/null
+++ b/drivers/net/ethernet/netronome/nfp/flower/cmsg.c
@@ -0,0 +1,159 @@
+/*
+ * Copyright (C) 2015-2017 Netronome Systems, Inc.
+ *
+ * This software is dual licensed under the GNU General License Version 2,
+ * June 1991 as shown in the file COPYING in the top-level directory of this
+ * source tree or the BSD 2-Clause License provided below.  You have the
+ * option to license this software under the complete terms of either license.
+ *
+ * The BSD 2-Clause License:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  1. Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ *  2. Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+#include "../nfpcore/nfp_cpp.h"
+#include "../nfp_net_repr.h"
+#include "./cmsg.h"
+
+#define nfp_flower_cmsg_warn(app, fmt, args...) \
+   do {\
+   if (net_ratelimit())\
+   nfp_warn((app)->cpp, fmt, ## args); \
+   } while (0)
+
+static struct nfp_flower_cmsg_hdr *
+nfp_flower_cmsg_get_hdr(struct sk_buff *skb)
+{
+   return (struct nfp_flower_cmsg_hdr *)skb->data;
+}
+
+static void *nfp_flower_cmsg_get_data(struct sk_buff *skb)
+{
+   return (unsigned char *)skb->data + NFP_FLOWER_CMSG_HLEN;
+}
+
+static struct sk_buff *
+nfp_flower_cmsg_alloc(struct nfp_app *app, unsigned int size,
+ enum nfp_flower_cmsg_type_port type)
+{
+   struct nfp_flower_cmsg_hdr *ch;
+   struct sk_buff *skb;
+
+   size += NFP_FLOWER_CMSG_HLEN;
+
+   skb = nfp_app_ctrl_msg_alloc(app, size, GFP_KERNEL);
+   if (!skb)
+   return NULL;
+
+   ch = nfp_flower_cmsg_get_hdr(skb);
+   ch->pad = 0;
+   ch->version = NFP_FLOWER_CMSG_VER1;
+   ch->type = type;
+   skb_put(skb, size);
+
+   return skb;
+}
+
+int nfp_flower_cmsg_portmod(struct net_device *netdev, bool carrier_ok)
+{
+   struct nfp_repr *repr = netdev_priv(netdev);
+   struct nfp_flower_cmsg_portmod *msg;
+   struct sk_buff *skb;
+
+   skb = nfp_flower_cmsg_alloc(repr->app, sizeof(*msg),
+   NFP_FLOWER_CMSG_TYPE_PORT_MOD);
+   if (!skb)
+   return -ENOMEM;
+
+   msg = nfp_flower_cmsg_get_data(skb);
+   msg->portnum = cpu_to_be32(repr->dst->u.port_info.port_id);
+   msg->reserve

[PATCH net-next v3 02/12] nfp: devlink add support for getting eswitch mode

2017-06-23 Thread Simon Horman

From: Jakub Kicinski 

Add an app callback for reporting eswitch mode.  Non-SRIOV apps
should not implement this callback; nfp_app code will then
respond with -EOPNOTSUPP.
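
For illustration only, a minimal user-space sketch of the optional-callback
pattern described above. All demo_* names are hypothetical stand-ins and are
not part of the driver; the real wrapper is nfp_app_eswitch_mode_get() in the
diff below.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver types; not the real nfp structs. */
enum demo_eswitch_mode { DEMO_ESWITCH_LEGACY, DEMO_ESWITCH_SWITCHDEV };

struct demo_app;

struct demo_app_type {
	/* Optional: left NULL by apps that have no SR-IOV support. */
	enum demo_eswitch_mode (*eswitch_mode_get)(struct demo_app *app);
};

struct demo_app {
	const struct demo_app_type *type;
};

/* Wrapper: answers -EOPNOTSUPP on behalf of apps without the callback. */
static int demo_app_eswitch_mode_get(struct demo_app *app,
				     unsigned short *mode)
{
	if (!app->type->eswitch_mode_get)
		return -EOPNOTSUPP;

	*mode = (unsigned short)app->type->eswitch_mode_get(app);

	return 0;
}

/* Example callback for an app that does support SR-IOV. */
static enum demo_eswitch_mode demo_switchdev_mode(struct demo_app *app)
{
	(void)app;
	return DEMO_ESWITCH_SWITCHDEV;
}
```

Keeping the NULL check in one wrapper means callers never have to know
which apps implement the callback.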

Signed-off-by: Jakub Kicinski 
Signed-off-by: Simon Horman 
---
 drivers/net/ethernet/netronome/nfp/nfp_app.h | 15 +++
 drivers/net/ethernet/netronome/nfp/nfp_devlink.c | 18 ++
 2 files changed, 33 insertions(+)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_app.h b/drivers/net/ethernet/netronome/nfp/nfp_app.h
index f5e373fa8c3b..0fee14ffa081 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_app.h
+++ b/drivers/net/ethernet/netronome/nfp/nfp_app.h
@@ -34,6 +34,8 @@
 #ifndef _NFP_APP_H
 #define _NFP_APP_H 1
 
+#include 
+
 struct bpf_prog;
 struct net_device;
 struct pci_dev;
@@ -70,6 +72,7 @@ extern const struct nfp_app_type app_bpf;
  * @setup_tc:  setup TC ndo
  * @tc_busy:   TC HW offload busy (rules loaded)
  * @xdp_offload:offload an XDP program
+ * @eswitch_mode_get:get SR-IOV eswitch mode
  */
 struct nfp_app_type {
enum nfp_app_id id;
@@ -95,6 +98,8 @@ struct nfp_app_type {
bool (*tc_busy)(struct nfp_app *app, struct nfp_net *nn);
int (*xdp_offload)(struct nfp_app *app, struct nfp_net *nn,
   struct bpf_prog *prog);
+
+   enum devlink_eswitch_mode (*eswitch_mode_get)(struct nfp_app *app);
 };
 
 /**
@@ -216,6 +221,16 @@ static inline void nfp_app_ctrl_rx(struct nfp_app *app, struct sk_buff *skb)
app->type->ctrl_msg_rx(app, skb);
 }
 
+static inline int nfp_app_eswitch_mode_get(struct nfp_app *app, u16 *mode)
+{
+   if (!app->type->eswitch_mode_get)
+   return -EOPNOTSUPP;
+
+   *mode = app->type->eswitch_mode_get(app);
+
+   return 0;
+}
+
 const char *nfp_app_mip_name(struct nfp_app *app);
 struct sk_buff *nfp_app_ctrl_msg_alloc(struct nfp_app *app, unsigned int size);
 
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_devlink.c b/drivers/net/ethernet/netronome/nfp/nfp_devlink.c
index 2609a0f28e81..6c9f29c2e975 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_devlink.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_devlink.c
@@ -149,9 +149,27 @@ nfp_devlink_port_unsplit(struct devlink *devlink, unsigned int port_index)
return ret;
 }
 
+static int nfp_devlink_eswitch_mode_get(struct devlink *devlink, u16 *mode)
+{
+   struct nfp_pf *pf = devlink_priv(devlink);
+   int ret;
+
+   mutex_lock(&pf->lock);
+   if (!pf->app) {
+   ret = -EBUSY;
+   goto out;
+   }
+   ret = nfp_app_eswitch_mode_get(pf->app, mode);
+out:
+   mutex_unlock(&pf->lock);
+
+   return ret;
+}
+
 const struct devlink_ops nfp_devlink_ops = {
.port_split = nfp_devlink_port_split,
.port_unsplit   = nfp_devlink_port_unsplit,
+   .eswitch_mode_get   = nfp_devlink_eswitch_mode_get,
 };
 
 int nfp_devlink_port_register(struct nfp_app *app, struct nfp_port *port)
-- 
2.1.4

[PATCH net-next v3 04/12] nfp: map mac_stats and vf_cfg BARs

2017-06-23 Thread Simon Horman

If present, map the mac_stats and vf_cfg BARs. These will be used by
representor netdevs to read statistics for phys port and VF representors.

Also provide defines describing the layout of the mac_stats area.
Similar defines are already present for the vf_cfg area.
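
For illustration only, a user-space sketch of treating a missing optional
symbol as "not present" rather than as a failure. The demo_* names, the
symbol table layout and the choice of -EINVAL for undersized symbols are
assumptions made for this sketch; the driver itself uses
nfp_net_pf_map_rtsym() and ERR_PTR() values as shown in the diff below.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical symbol table entry: a name and the size of its area. */
struct demo_sym {
	const char *name;
	size_t size;
};

/* Strict lookup: -ENOENT if the symbol is missing, -EINVAL if too small. */
static int demo_sym_lookup(const struct demo_sym *tbl, size_t n,
			   const char *name, size_t min_size,
			   const struct demo_sym **out)
{
	size_t i;

	*out = NULL;
	for (i = 0; i < n; i++) {
		if (strcmp(tbl[i].name, name))
			continue;
		if (tbl[i].size < min_size)
			return -EINVAL;
		*out = &tbl[i];
		return 0;
	}

	return -ENOENT;
}

/* Optional lookup: a missing symbol leaves *out NULL and is not an error. */
static int demo_map_optional(const struct demo_sym *tbl, size_t n,
			     const char *name, size_t min_size,
			     const struct demo_sym **out)
{
	int err = demo_sym_lookup(tbl, n, name, min_size, out);

	return err == -ENOENT ? 0 : err;
}
```

Only the "not found" case is tolerated; any other error is still passed
back to the caller.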

Based in part on work by Jakub Kicinski.

Signed-off-by: Simon Horman 
Reviewed-by: Jakub Kicinski 
---
v3
* Do not log an error if optional symbols are not found
---
 drivers/net/ethernet/netronome/nfp/nfp_main.h  |   8 ++
 drivers/net/ethernet/netronome/nfp/nfp_net_main.c  | 122 +++--
 drivers/net/ethernet/netronome/nfp/nfp_port.h  |  60 ++
 .../net/ethernet/netronome/nfp/nfpcore/nfp_nsp.h   |   2 +
 .../ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c   |   5 +-
 5 files changed, 164 insertions(+), 33 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_main.h b/drivers/net/ethernet/netronome/nfp/nfp_main.h
index 88724f8d0dcd..aa69d4101eb9 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_main.h
+++ b/drivers/net/ethernet/netronome/nfp/nfp_main.h
@@ -68,6 +68,10 @@ struct nfp_rtsym_table;
  * @data_vnic_bar: Pointer to the CPP area for the data vNICs' BARs
  * @ctrl_vnic_bar: Pointer to the CPP area for the ctrl vNIC's BAR
  * @qc_area:   Pointer to the CPP area for the queues
+ * @mac_stats_bar: Pointer to the CPP area for the MAC stats
+ * @mac_stats_mem: Pointer to mapped MAC stats area
+ * @vf_cfg_bar:Pointer to the CPP area for the VF configuration BAR
+ * @vf_cfg_mem:Pointer to mapped VF configuration area
  * @irq_entries:   Array of MSI-X entries for all vNICs
  * @limit_vfs: Number of VFs supported by firmware (~0 for PCI limit)
  * @num_vfs:   Number of SR-IOV VFs enabled
@@ -97,6 +101,10 @@ struct nfp_pf {
struct nfp_cpp_area *data_vnic_bar;
struct nfp_cpp_area *ctrl_vnic_bar;
struct nfp_cpp_area *qc_area;
+   struct nfp_cpp_area *mac_stats_bar;
+   u8 __iomem *mac_stats_mem;
+   struct nfp_cpp_area *vf_cfg_bar;
+   u8 __iomem *vf_cfg_mem;
 
struct msix_entry *irq_entries;
 
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_main.c b/drivers/net/ethernet/netronome/nfp/nfp_net_main.c
index bc2bc0886176..911b764d7641 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_main.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_main.c
@@ -235,10 +235,8 @@ nfp_net_pf_map_rtsym(struct nfp_pf *pf, const char *name, const char *sym_fmt,
 nfp_cppcore_pcie_unit(pf->cpp));
 
sym = nfp_rtsym_lookup(pf->rtbl, pf_symbol);
-   if (!sym) {
-   nfp_err(pf->cpp, "Failed to find PF symbol %s\n", pf_symbol);
+   if (!sym)
return (u8 __iomem *)ERR_PTR(-ENOENT);
-   }
 
if (sym->size < min_size) {
nfp_err(pf->cpp, "PF symbol %s too small\n", pf_symbol);
@@ -486,6 +484,7 @@ nfp_net_pf_app_init(struct nfp_pf *pf, u8 __iomem *qc_bar, unsigned int stride)
NFP_PF_CSR_SLICE_SIZE,
&pf->ctrl_vnic_bar);
if (IS_ERR(ctrl_bar)) {
+   nfp_err(pf->cpp, "Failed to find data vNIC memory symbol\n");
err = PTR_ERR(ctrl_bar);
goto err_free;
}
@@ -570,6 +569,80 @@ static void nfp_net_pf_app_stop(struct nfp_pf *pf)
nfp_net_pf_app_stop_ctrl(pf);
 }
 
+static void nfp_net_pci_unmap_mem(struct nfp_pf *pf)
+{
+   if (pf->vf_cfg_bar)
+   nfp_cpp_area_release_free(pf->vf_cfg_bar);
+   if (pf->mac_stats_bar)
+   nfp_cpp_area_release_free(pf->mac_stats_bar);
+   nfp_cpp_area_release_free(pf->qc_area);
+   nfp_cpp_area_release_free(pf->data_vnic_bar);
+}
+
+static int nfp_net_pci_map_mem(struct nfp_pf *pf)
+{
+   u32 ctrl_bar_sz;
+   u8 __iomem *mem;
+   int err;
+
+   ctrl_bar_sz = pf->max_data_vnics * NFP_PF_CSR_SLICE_SIZE;
+   mem = nfp_net_pf_map_rtsym(pf, "net.ctrl", "_pf%d_net_bar0",
+  ctrl_bar_sz, &pf->data_vnic_bar);
+   if (IS_ERR(mem)) {
+   nfp_err(pf->cpp, "Failed to find data vNIC memory symbol\n");
+   err = PTR_ERR(mem);
+   if (!pf->fw_loaded && err == -ENOENT)
+   err = -EPROBE_DEFER;
+   return err;
+   }
+
+   pf->mac_stats_mem = nfp_net_pf_map_rtsym(pf, "net.macstats",
+"_mac_stats",
+NFP_MAC_STATS_SIZE *
+(pf->eth_tbl->max_index + 1),
+&pf->mac_stats_bar);
+   if (IS_ERR(pf->mac_stats_mem)) {
+   if (PTR_ERR(pf->mac_stats_mem) != -ENOENT) {
+   err = PTR_ERR(pf->mac_stats_mem);
+   goto err_unmap_ctrl;
+   }
+   pf->mac_st

[PATCH net-next v3 03/12] nfp: move physical port init into a helper

2017-06-23 Thread Simon Horman

From: Jakub Kicinski 

Move MAC/PHY port init into a helper to make it easier to reuse
it in the representor code.

Signed-off-by: Jakub Kicinski 
Signed-off-by: Simon Horman 
---
 drivers/net/ethernet/netronome/nfp/nfp_app_nic.c | 23 ++
 drivers/net/ethernet/netronome/nfp/nfp_port.c| 25 
 drivers/net/ethernet/netronome/nfp/nfp_port.h|  3 +++
 3 files changed, 34 insertions(+), 17 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_app_nic.c b/drivers/net/ethernet/netronome/nfp/nfp_app_nic.c
index 83c65e6291ee..7b966bd3d214 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_app_nic.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_app_nic.c
@@ -42,6 +42,8 @@ static int
 nfp_app_nic_vnic_init_phy_port(struct nfp_pf *pf, struct nfp_app *app,
   struct nfp_net *nn, unsigned int id)
 {
+   int err;
+
if (!pf->eth_tbl)
return 0;
 
@@ -49,26 +51,13 @@ nfp_app_nic_vnic_init_phy_port(struct nfp_pf *pf, struct nfp_app *app,
if (IS_ERR(nn->port))
return PTR_ERR(nn->port);
 
-   nn->port->eth_id = id;
-   nn->port->eth_port = nfp_net_find_port(pf->eth_tbl, id);
-
-   /* Check if vNIC has external port associated and cfg is OK */
-   if (!nn->port->eth_port) {
-   nfp_err(app->cpp,
-   "NSP port entries don't match vNICs (no entry for port #%d)\n",
-   id);
+   err = nfp_port_init_phy_port(pf, app, nn->port, id);
+   if (err) {
nfp_port_free(nn->port);
-   return -EINVAL;
-   }
-   if (nn->port->eth_port->override_changed) {
-   nfp_warn(app->cpp,
-"Config changed for port #%d, reboot required before port will be operational\n",
-id);
-   nn->port->type = NFP_PORT_INVALID;
-   return 1;
+   return err;
}
 
-   return 0;
+   return nn->port->type == NFP_PORT_INVALID;
 }
 
 int nfp_app_nic_vnic_init(struct nfp_app *app, struct nfp_net *nn,
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_port.c b/drivers/net/ethernet/netronome/nfp/nfp_port.c
index a17410ac01ab..19bceeb82225 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_port.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_port.c
@@ -33,6 +33,7 @@
 
 #include 
 
+#include "nfpcore/nfp_cpp.h"
 #include "nfpcore/nfp_nsp.h"
 #include "nfp_app.h"
 #include "nfp_main.h"
@@ -112,6 +113,30 @@ nfp_port_get_phys_port_name(struct net_device *netdev, char *name, size_t len)
return 0;
 }
 
+int nfp_port_init_phy_port(struct nfp_pf *pf, struct nfp_app *app,
+  struct nfp_port *port, unsigned int id)
+{
+   port->eth_id = id;
+   port->eth_port = nfp_net_find_port(pf->eth_tbl, id);
+
+   /* Check if vNIC has external port associated and cfg is OK */
+   if (!port->eth_port) {
+   nfp_err(app->cpp,
+   "NSP port entries don't match vNICs (no entry for port #%d)\n",
+   id);
+   return -EINVAL;
+   }
+   if (port->eth_port->override_changed) {
+   nfp_warn(app->cpp,
+"Config changed for port #%d, reboot required before port will be operational\n",
+id);
+   port->type = NFP_PORT_INVALID;
+   return 0;
+   }
+
+   return 0;
+}
+
 struct nfp_port *
 nfp_port_alloc(struct nfp_app *app, enum nfp_port_type type,
   struct net_device *netdev)
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_port.h b/drivers/net/ethernet/netronome/nfp/nfp_port.h
index 4d1a9b3fed41..fb28c7071987 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_port.h
+++ b/drivers/net/ethernet/netronome/nfp/nfp_port.h
@@ -104,6 +104,9 @@ nfp_port_alloc(struct nfp_app *app, enum nfp_port_type type,
   struct net_device *netdev);
 void nfp_port_free(struct nfp_port *port);
 
+int nfp_port_init_phy_port(struct nfp_pf *pf, struct nfp_app *app,
+  struct nfp_port *port, unsigned int id);
+
 int nfp_net_refresh_eth_port(struct nfp_port *port);
 void nfp_net_refresh_port_table(struct nfp_port *port);
 int nfp_net_refresh_port_table_sync(struct nfp_pf *pf);
-- 
2.1.4

[PATCH net-next v3 07/12] nfp: app callbacks for SRIOV

2017-06-23 Thread Simon Horman

Add app-callbacks for app-specific initialisation of SRIOV.

Disabling SRIOV is brought forward in nfp_pci_remove()
so that nfp_app_sriov_disable is called while the app still exists.

This is intended to be used to implement representor netdevs for virtual
ports.
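
For illustration only, a user-space sketch of the enable ordering and
unwinding this implies: the app hook runs before the PCI step and is undone
if the PCI step fails. All demo_* names and the pci_fail test knob are
hypothetical, and the pf->lock handling from the real patch is omitted here.

```c
#include <errno.h>

/* Hypothetical stand-in for struct nfp_pf; only what the sketch needs. */
struct demo_pf {
	int limit_vfs;		/* maximum number of VFs allowed */
	int num_vfs;		/* number of VFs currently enabled */
	int app_enabled;	/* set while app-specific SR-IOV state exists */
	int pci_fail;		/* test knob: make the simulated PCI step fail */
};

static int demo_app_sriov_enable(struct demo_pf *pf, int num_vfs)
{
	(void)num_vfs;
	pf->app_enabled = 1;
	return 0;
}

static void demo_app_sriov_disable(struct demo_pf *pf)
{
	pf->app_enabled = 0;
}

/* Simulated PCI step; stands in for pci_enable_sriov(). */
static int demo_pci_enable_sriov(struct demo_pf *pf, int num_vfs)
{
	(void)num_vfs;
	return pf->pci_fail ? -EIO : 0;
}

/* Returns the number of VFs enabled on success, negative errno on error. */
static int demo_sriov_enable(struct demo_pf *pf, int num_vfs)
{
	int err;

	if (num_vfs > pf->limit_vfs)
		return -EINVAL;

	err = demo_app_sriov_enable(pf, num_vfs);
	if (err)
		return err;

	err = demo_pci_enable_sriov(pf, num_vfs);
	if (err)
		goto err_app_sriov_disable;

	pf->num_vfs = num_vfs;

	return num_vfs;

err_app_sriov_disable:
	demo_app_sriov_disable(pf);
	return err;
}
```

On the disable side the app hook would run first, which is why the real
patch moves SR-IOV disable ahead of nfp_net_pci_remove().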

Signed-off-by: Simon Horman 
Reviewed-by: Jakub Kicinski 
---
 drivers/net/ethernet/netronome/nfp/nfp_app.h  | 18 
 drivers/net/ethernet/netronome/nfp/nfp_main.c | 42 +++
 2 files changed, 55 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_app.h b/drivers/net/ethernet/netronome/nfp/nfp_app.h
index af023a0491e7..ff2d43615808 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_app.h
+++ b/drivers/net/ethernet/netronome/nfp/nfp_app.h
@@ -75,6 +75,8 @@ extern const struct nfp_app_type app_bpf;
  * @tc_busy:   TC HW offload busy (rules loaded)
  * @xdp_offload:offload an XDP program
  * @eswitch_mode_get:get SR-IOV eswitch mode
+ * @sriov_enable: app-specific sriov initialisation
+ * @sriov_disable: app-specific sriov clean-up
  * @repr_get:  get representor netdev
  */
 struct nfp_app_type {
@@ -102,6 +104,9 @@ struct nfp_app_type {
int (*xdp_offload)(struct nfp_app *app, struct nfp_net *nn,
   struct bpf_prog *prog);
 
+   int (*sriov_enable)(struct nfp_app *app, int num_vfs);
+   void (*sriov_disable)(struct nfp_app *app);
+
enum devlink_eswitch_mode (*eswitch_mode_get)(struct nfp_app *app);
struct net_device *(*repr_get)(struct nfp_app *app, u32 id);
 };
@@ -237,6 +242,19 @@ static inline int nfp_app_eswitch_mode_get(struct nfp_app *app, u16 *mode)
return 0;
 }
 
+static inline int nfp_app_sriov_enable(struct nfp_app *app, int num_vfs)
+{
+   if (!app || !app->type->sriov_enable)
+   return -EOPNOTSUPP;
+   return app->type->sriov_enable(app, num_vfs);
+}
+
+static inline void nfp_app_sriov_disable(struct nfp_app *app)
+{
+   if (app && app->type->sriov_disable)
+   app->type->sriov_disable(app);
+}
+
 static inline struct net_device *nfp_app_repr_get(struct nfp_app *app, u32 id)
 {
if (unlikely(!app || !app->type->repr_get))
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_main.c b/drivers/net/ethernet/netronome/nfp/nfp_main.c
index 4e59dcb78c36..748e54cc885e 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_main.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_main.c
@@ -54,6 +54,7 @@
 
 #include "nfpcore/nfp6000_pcie.h"
 
+#include "nfp_app.h"
 #include "nfp_main.h"
 #include "nfp_net.h"
 
@@ -97,28 +98,45 @@ static int nfp_pcie_sriov_enable(struct pci_dev *pdev, int num_vfs)
struct nfp_pf *pf = pci_get_drvdata(pdev);
int err;
 
+   mutex_lock(&pf->lock);
+
if (num_vfs > pf->limit_vfs) {
nfp_info(pf->cpp, "Firmware limits number of VFs to %u\n",
 pf->limit_vfs);
-   return -EINVAL;
+   err = -EINVAL;
+   goto err_unlock;
+   }
+
+   err = nfp_app_sriov_enable(pf->app, num_vfs);
+   if (err) {
+   dev_warn(&pdev->dev, "App specific PCI sriov configuration failed: %d\n",
+err);
+   goto err_unlock;
}
 
err = pci_enable_sriov(pdev, num_vfs);
if (err) {
dev_warn(&pdev->dev, "Failed to enable PCI sriov: %d\n", err);
-   return err;
+   goto err_app_sriov_disable;
}
 
pf->num_vfs = num_vfs;
 
dev_dbg(&pdev->dev, "Created %d VFs.\n", pf->num_vfs);
 
+   mutex_unlock(&pf->lock);
return num_vfs;
+
+err_app_sriov_disable:
+   nfp_app_sriov_disable(pf->app);
+err_unlock:
+   mutex_unlock(&pf->lock);
+   return err;
 #endif
return 0;
 }
 
-static int nfp_pcie_sriov_disable(struct pci_dev *pdev)
+static int __nfp_pcie_sriov_disable(struct pci_dev *pdev)
 {
 #ifdef CONFIG_PCI_IOV
struct nfp_pf *pf = pci_get_drvdata(pdev);
@@ -132,6 +150,8 @@ static int nfp_pcie_sriov_disable(struct pci_dev *pdev)
return -EPERM;
}
 
+   nfp_app_sriov_disable(pf->app);
+
pf->num_vfs = 0;
 
pci_disable_sriov(pdev);
@@ -140,6 +160,18 @@ static int nfp_pcie_sriov_disable(struct pci_dev *pdev)
return 0;
 }
 
+static int nfp_pcie_sriov_disable(struct pci_dev *pdev)
+{
+   struct nfp_pf *pf = pci_get_drvdata(pdev);
+   int err;
+
+   mutex_lock(&pf->lock);
+   err = __nfp_pcie_sriov_disable(pdev);
+   mutex_unlock(&pf->lock);
+
+   return err;
+}
+
 static int nfp_pcie_sriov_configure(struct pci_dev *pdev, int num_vfs)
 {
if (num_vfs == 0)
@@ -431,11 +463,11 @@ static void nfp_pci_remove(struct pci_dev *pdev)
 
devlink = priv_to_devlink(pf);
 
-   nfp_net_pci_remove(pf);
-
nfp_pcie_sriov_disable(pdev);
pci_sriov_set_totalvfs(pf->pdev, 0);
 
+   nfp_net_pci_remove(pf);
+
devlink_unre

[PATCH net-next v3 01/12] net: store port/representator id in metadata_dst

2017-06-23 Thread Simon Horman

From: Jakub Kicinski 

Switches and modern SR-IOV enabled NICs may multiplex traffic from port
representors and control messages over a single set of hardware queues.
Control messages and muxed traffic may need ordered delivery.

Those requirements make it hard to comfortably use TC infrastructure today
unless we have a way of attaching metadata to skbs at the upper device.
Because a single set of queues is used for many netdevs, reliably stopping
the TC/sched queues of all of them is impossible, and the lower device has
to fall back to returning NETDEV_TX_BUSY and usually has to take extra
locks on the fastpath.

This patch attempts to enable port/representor devs to attach metadata
which carries the port id to skbs.  This way representors can be queueless
and all queuing can be performed at the lower netdev in the usual way.

Traffic arriving on the port/representor interfaces will have metadata
attached and will subsequently be queued to the lower device for
transmission.  The lower device should recognize the metadata and translate
it to a HW-specific format, which is most likely either a special header
inserted before the network headers or descriptor/metadata fields.

Metadata is associated with the lower device by storing the netdev pointer
along with the port id, so that if TC decides to redirect or mirror, the
new netdev will not try to interpret it.

This is mostly for SR-IOV devices since switches don't have lower netdevs
today.
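
For illustration only, a user-space sketch of the ownership check described
above: port metadata is honoured only when its type matches and the stored
lower device is the one doing the transmit. The demo_* names are
hypothetical; the real types are struct metadata_dst and struct hw_port_info
in the diff below.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the tagged union added by this patch. */
enum demo_md_type { DEMO_MD_IP_TUNNEL, DEMO_MD_HW_PORT_MUX };

struct demo_tun_info {
	uint32_t tun_id;
};

struct demo_port_info {
	const void *lower_dev;	/* device that is allowed to interpret port_id */
	uint32_t port_id;
};

struct demo_md {
	enum demo_md_type type;
	union {
		struct demo_tun_info tun_info;
		struct demo_port_info port_info;
	} u;
};

/* Returns 0 and fills *port_id only if md is port metadata owned by dev.
 * Returns -1 for missing metadata, tunnel metadata, or a foreign device,
 * e.g. after the skb was redirected to another netdev.
 */
static int demo_md_port_id(const struct demo_md *md, const void *dev,
			   uint32_t *port_id)
{
	if (!md || md->type != DEMO_MD_HW_PORT_MUX)
		return -1;
	if (md->u.port_info.lower_dev != dev)
		return -1;

	*port_id = md->u.port_info.port_id;

	return 0;
}
```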

Signed-off-by: Jakub Kicinski 
Signed-off-by: Sridhar Samudrala 
Signed-off-by: Simon Horman 
---
 include/net/dst_metadata.h | 41 -
 net/core/dst.c | 15 ++-
 net/core/filter.c  |  1 +
 net/ipv4/ip_tunnel_core.c  |  6 --
 net/openvswitch/flow_netlink.c |  4 +++-
 5 files changed, 50 insertions(+), 17 deletions(-)

diff --git a/include/net/dst_metadata.h b/include/net/dst_metadata.h
index 701fc814d0af..a803129a4849 100644
--- a/include/net/dst_metadata.h
+++ b/include/net/dst_metadata.h
@@ -5,10 +5,22 @@
 #include 
 #include 
 
+enum metadata_type {
+   METADATA_IP_TUNNEL,
+   METADATA_HW_PORT_MUX,
+};
+
+struct hw_port_info {
+   struct net_device *lower_dev;
+   u32 port_id;
+};
+
 struct metadata_dst {
struct dst_entrydst;
+   enum metadata_type  type;
union {
struct ip_tunnel_info   tun_info;
+   struct hw_port_info port_info;
} u;
 };
 
@@ -27,7 +39,7 @@ static inline struct ip_tunnel_info *skb_tunnel_info(struct sk_buff *skb)
struct metadata_dst *md_dst = skb_metadata_dst(skb);
struct dst_entry *dst;
 
-   if (md_dst)
+   if (md_dst && md_dst->type == METADATA_IP_TUNNEL)
return &md_dst->u.tun_info;
 
dst = skb_dst(skb);
@@ -55,22 +67,33 @@ static inline int skb_metadata_dst_cmp(const struct sk_buff *skb_a,
a = (const struct metadata_dst *) skb_dst(skb_a);
b = (const struct metadata_dst *) skb_dst(skb_b);
 
-   if (!a != !b || a->u.tun_info.options_len != b->u.tun_info.options_len)
+   if (!a != !b || a->type != b->type)
return 1;
 
-   return memcmp(&a->u.tun_info, &b->u.tun_info,
- sizeof(a->u.tun_info) + a->u.tun_info.options_len);
+   switch (a->type) {
+   case METADATA_HW_PORT_MUX:
+   return memcmp(&a->u.port_info, &b->u.port_info,
+ sizeof(a->u.port_info));
+   case METADATA_IP_TUNNEL:
+   return memcmp(&a->u.tun_info, &b->u.tun_info,
+ sizeof(a->u.tun_info) +
+a->u.tun_info.options_len);
+   default:
+   return 1;
+   }
 }
 
 void metadata_dst_free(struct metadata_dst *);
-struct metadata_dst *metadata_dst_alloc(u8 optslen, gfp_t flags);
-struct metadata_dst __percpu *metadata_dst_alloc_percpu(u8 optslen, gfp_t flags);
+struct metadata_dst *metadata_dst_alloc(u8 optslen, enum metadata_type type,
+   gfp_t flags);
+struct metadata_dst __percpu *
+metadata_dst_alloc_percpu(u8 optslen, enum metadata_type type, gfp_t flags);
 
 static inline struct metadata_dst *tun_rx_dst(int md_size)
 {
struct metadata_dst *tun_dst;
 
-   tun_dst = metadata_dst_alloc(md_size, GFP_ATOMIC);
+   tun_dst = metadata_dst_alloc(md_size, METADATA_IP_TUNNEL, GFP_ATOMIC);
if (!tun_dst)
return NULL;
 
@@ -85,11 +108,11 @@ static inline struct metadata_dst *tun_dst_unclone(struct sk_buff *skb)
int md_size;
struct metadata_dst *new_md;
 
-   if (!md_dst)
+   if (!md_dst || md_dst->type != METADATA_IP_TUNNEL)
return ERR_PTR(-EINVAL);
 
md_size = md_dst->u.tun_info.options_len;
-   new_md = metadata_dst_alloc(md_size, GFP_ATOMIC);
+   new_md = metadata_dst_alloc(md_size, METADATA_IP_TUNNEL, GFP_ATOMIC);
if (!new_md

[PATCH net-next v3 00/12] nfp: add flower app with representors

2017-06-23 Thread Simon Horman

Hi,

this series adds a flower app to the NFP driver.
It initialises four types of netdevs:

* PF netdev - lower-device for communication of packets to device
* PF representor netdev
* VF representor netdevs
* Phys port representor netdevs

The PF netdev acts as a lower-device which sends and receives packets to
and from the firmware. The representors act as upper-devices. For TX,
representors attach a metadata dst to the skb which is used by the PF
netdev to prepend metadata to the packet before forwarding it to the firmware. On
RX the PF netdev looks up the representor based on the prepended metadata
received from the firmware and forwards the skb to the representor after
removing the metadata.
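
The TX/RX flow described above can be modelled in user space roughly as follows. This is only a sketch: the 4-byte big-endian port-id header, the `repr_table` array and every function name here are illustrative assumptions, not the NFP firmware format or the driver's API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define META_LEN  4 /* assumed size of the prepended metadata */
#define MAX_REPRS 8 /* assumed number of representors */

struct repr {
	int in_use;
	uint32_t port_id;        /* id carried in the metadata */
	unsigned int rx_packets; /* packets delivered to this representor */
};

static struct repr repr_table[MAX_REPRS];

/* TX: the lower device prepends the representor's port id to the frame.
 * Returns the new length, or 0 if the output buffer is too small. */
static size_t lower_dev_xmit(uint32_t port_id, const uint8_t *frame, size_t len,
			     uint8_t *out, size_t out_size)
{
	assert(frame != NULL && out != NULL);
	if (out_size < len + META_LEN)
		return 0;
	out[0] = (uint8_t)(port_id >> 24);
	out[1] = (uint8_t)(port_id >> 16);
	out[2] = (uint8_t)(port_id >> 8);
	out[3] = (uint8_t)port_id;
	memcpy(out + META_LEN, frame, len);
	return len + META_LEN;
}

/* RX: look up the representor from the metadata, then strip the metadata. */
static struct repr *lower_dev_rx(const uint8_t *buf, size_t len,
				 const uint8_t **payload, size_t *payload_len)
{
	uint32_t port_id;
	size_t i;

	if (len < META_LEN)
		return NULL;
	port_id = (uint32_t)buf[0] << 24 | (uint32_t)buf[1] << 16 |
		  (uint32_t)buf[2] << 8 | (uint32_t)buf[3];
	for (i = 0; i < MAX_REPRS; i++) {
		if (repr_table[i].in_use && repr_table[i].port_id == port_id) {
			*payload = buf + META_LEN;
			*payload_len = len - META_LEN;
			repr_table[i].rx_packets++;
			return &repr_table[i];
		}
	}
	return NULL; /* no representor registered for this port id */
}
```

In the real series the port id travels in a metadata dst attached to the skb rather than as a function argument.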

Control queues are used to send and receive control messages which are
used to communicate configuration information with the firmware. These
are in a separate vNIC from the queues belonging to the PF netdev. The control
queues are not exposed to user-space via a netdev or any other means.

The first 9 patches of this series provide app-independent infrastructure
to instantiate representors and the remaining 3 patches provide an app
which uses this infrastructure.

As the name implies this app is targeted at providing offload of TC flower.
Flower offload - allowing classifiers to be attached to representor netdevs
- is intended to be provided by follow-up patches at which point it will
become the dominant feature of the app.

Minor changes since v2 noted in changelogs of individual patches.
Review comments on v1 and v2 of this patchset have been addressed either
through discussion on-list or changes in this patchset.


Jakub Kicinski (3):
  net: store port/representator id in metadata_dst
  nfp: devlink add support for getting eswitch mode
  nfp: move physical port init into a helper

Simon Horman (9):
  nfp: map mac_stats and vf_cfg BARs
  nfp: general representor implementation
  nfp: add stats and xmit helpers for representors
  nfp: app callbacks for SRIOV
  nfp: provide nfp_port to of nfp_net_get_mac_addr()
  nfp: add support for tx/rx with metadata portid
  nfp: add support for control messages for flower app
  nfp: add flower app
  nfp: add VF and PF representors to flower app

 drivers/net/ethernet/netronome/nfp/Makefile|   3 +
 drivers/net/ethernet/netronome/nfp/flower/cmsg.c   | 159 +
 drivers/net/ethernet/netronome/nfp/flower/cmsg.h   | 116 +++
 drivers/net/ethernet/netronome/nfp/flower/main.c   | 375 +
 drivers/net/ethernet/netronome/nfp/nfp_app.c   |  26 +-
 drivers/net/ethernet/netronome/nfp/nfp_app.h   |  58 +++-
 drivers/net/ethernet/netronome/nfp/nfp_app_nic.c   |  25 +-
 drivers/net/ethernet/netronome/nfp/nfp_devlink.c   |  18 +
 drivers/net/ethernet/netronome/nfp/nfp_main.c  |  42 ++-
 drivers/net/ethernet/netronome/nfp/nfp_main.h  |  11 +-
 drivers/net/ethernet/netronome/nfp/nfp_net.h   |   1 +
 .../net/ethernet/netronome/nfp/nfp_net_common.c|  57 +++-
 drivers/net/ethernet/netronome/nfp/nfp_net_main.c  | 147 +---
 drivers/net/ethernet/netronome/nfp/nfp_net_repr.c  | 353 +++
 drivers/net/ethernet/netronome/nfp/nfp_net_repr.h  | 120 +++
 drivers/net/ethernet/netronome/nfp/nfp_port.c  |  25 ++
 drivers/net/ethernet/netronome/nfp/nfp_port.h  |  63 
 .../net/ethernet/netronome/nfp/nfpcore/nfp_nsp.h   |   2 +
 .../ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c   |   5 +-
 include/net/dst_metadata.h |  41 ++-
 net/core/dst.c |  15 +-
 net/core/filter.c  |   1 +
 net/ipv4/ip_tunnel_core.c  |   6 +-
 net/openvswitch/flow_netlink.c |   4 +-
 24 files changed, 1577 insertions(+), 96 deletions(-)
 create mode 100644 drivers/net/ethernet/netronome/nfp/flower/cmsg.c
 create mode 100644 drivers/net/ethernet/netronome/nfp/flower/cmsg.h
 create mode 100644 drivers/net/ethernet/netronome/nfp/flower/main.c
 create mode 100644 drivers/net/ethernet/netronome/nfp/nfp_net_repr.c
 create mode 100644 drivers/net/ethernet/netronome/nfp/nfp_net_repr.h

-- 
2.1.4

Faster TCP keepalive

2017-06-23 Thread Stephen Suryaputra Lin

Greetings,

I'm writing to ask whether there have been any thoughts or efforts toward
allowing a sub-second TCP keepalive interval. One application is TCP
connections between IP hosts connected by an internal backplane, where
faster detection is a necessity and the increased traffic can be
accommodated.

Suggestions on other ways to quickly tear down TCP connections to a
rebooted host in the application above are welcome.

Thank you,

Stephen.
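
For context, a minimal sketch of what the existing per-socket options allow on Linux. The three keepalive options take whole seconds, so this does not provide the sub-second interval asked about; TCP_USER_TIMEOUT takes milliseconds and bounds how long transmitted data may remain unacknowledged. The function name and the values used are illustrative assumptions.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Configure the fastest keepalive the existing options allow and, in
 * addition, a millisecond-granularity user timeout.
 * Returns 0 on success, -1 on error (errno is set by setsockopt). */
static int set_fast_failure_detection(int fd, int idle_s, int intvl_s,
				      int probes, unsigned int user_timeout_ms)
{
	int on = 1;

	assert(fd >= 0);
	if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
		return -1;
	/* These three options are expressed in whole seconds (or a count). */
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s,
		       sizeof(idle_s)) < 0)
		return -1;
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl_s,
		       sizeof(intvl_s)) < 0)
		return -1;
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &probes,
		       sizeof(probes)) < 0)
		return -1;
	/* TCP_USER_TIMEOUT is expressed in milliseconds. */
	if (setsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT, &user_timeout_ms,
		       sizeof(user_timeout_ms)) < 0)
		return -1;
	return 0;
}
```

With idle and interval both set to 1 second and 3 probes, an idle connection to a dead peer would be detected after a few seconds at best.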

[PATCH 3/3] net: qcom/emac: add support for emulation systems

2017-06-23 Thread Timur Tabi

On emulation systems, the EMAC's internal PHY ("SGMII") is not present,
and it is not needed for network functionality.  So just display a warning
message and ignore the SGMII.

Tested-by: Philip Elcan 
Tested-by: Adam Wallis 
Signed-off-by: Timur Tabi 
---
 drivers/net/ethernet/qualcomm/emac/emac-sgmii.c | 23 +--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/qualcomm/emac/emac-sgmii.c b/drivers/net/ethernet/qualcomm/emac/emac-sgmii.c
index 18c184e..29ba37a 100644
--- a/drivers/net/ethernet/qualcomm/emac/emac-sgmii.c
+++ b/drivers/net/ethernet/qualcomm/emac/emac-sgmii.c
@@ -297,6 +297,14 @@ static int emac_sgmii_acpi_match(struct device *dev, void *data)
{}
 };
 
+/* Dummy function for systems without an internal PHY. This avoids having
+ * to check for NULL pointers before calling the functions.
+ */
+static int emac_sgmii_dummy(struct emac_adapter *adpt)
+{
+   return 0;
+}
+
 int emac_sgmii_config(struct platform_device *pdev, struct emac_adapter *adpt)
 {
struct platform_device *sgmii_pdev = NULL;
@@ -311,8 +319,19 @@ int emac_sgmii_config(struct platform_device *pdev, struct emac_adapter *adpt)
emac_sgmii_acpi_match);
 
if (!dev) {
-   dev_err(&pdev->dev, "cannot find internal phy node\n");
-   return -ENODEV;
+   dev_warn(&pdev->dev, "cannot find internal phy node\n");
+   /* There is typically no internal PHY on emulation
+* systems, so if we can't find the node, assume
+* we are on an emulation system and stub-out
+* support for the internal PHY.  These systems only
+* use ACPI.
+*/
+   phy->open = emac_sgmii_dummy;
+   phy->close = emac_sgmii_dummy;
+   phy->link_up = emac_sgmii_dummy;
+   phy->link_down = emac_sgmii_dummy;
+
+   return 0;
}
 
sgmii_pdev = to_platform_device(dev);
-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc.  Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.
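
The stub-callback approach used by this patch can be illustrated outside the kernel with a small sketch. The struct layout, the `phy_hw_op` stand-in and the `hw_calls` counter are hypothetical; only the idea of installing a function that returns 0 instead of leaving the pointers NULL comes from the patch.

```c
#include <assert.h>
#include <stddef.h>

struct adapter;

/* Hypothetical ops table, modelled on the callbacks the patch assigns. */
struct phy_ops {
	int (*open)(struct adapter *adpt);
	int (*close)(struct adapter *adpt);
};

struct adapter {
	struct phy_ops phy;
	int hw_calls; /* counts calls that reached the "real" PHY code */
};

/* No-op used when the internal PHY is absent: always reports success. */
static int phy_noop(struct adapter *adpt)
{
	(void)adpt;
	return 0;
}

/* Stand-in for a callback that would touch real hardware. */
static int phy_hw_op(struct adapter *adpt)
{
	adpt->hw_calls++;
	return 0;
}

/* Every pointer is always populated, so callers never test for NULL. */
static void phy_config(struct adapter *adpt, int has_internal_phy)
{
	assert(adpt != NULL);
	adpt->phy.open = has_internal_phy ? phy_hw_op : phy_noop;
	adpt->phy.close = has_internal_phy ? phy_hw_op : phy_noop;
}

static int adapter_up(struct adapter *adpt)
{
	return adpt->phy.open(adpt); /* safe with or without a PHY */
}
```

The trade-off is one indirect call to an empty function per operation, in exchange for call sites that need no conditionals.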

[PATCH 1/3] net: qcom/emac: add shutdown function

2017-06-23 Thread Timur Tabi

The shutdown function halts all DMA and interrupts, so that all
operations are discontinued when the system shuts down, e.g. via
kexec or a forced reboot.

Tested-by: Tyler Baicar 
Signed-off-by: Timur Tabi 
---
 drivers/net/ethernet/qualcomm/emac/emac.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/drivers/net/ethernet/qualcomm/emac/emac.c b/drivers/net/ethernet/qualcomm/emac/emac.c
index 98a326f..77c5c92 100644
--- a/drivers/net/ethernet/qualcomm/emac/emac.c
+++ b/drivers/net/ethernet/qualcomm/emac/emac.c
@@ -762,6 +762,19 @@ static int emac_remove(struct platform_device *pdev)
return 0;
 }
 
+static void emac_shutdown(struct platform_device *pdev)
+{
+   struct net_device *netdev = dev_get_drvdata(&pdev->dev);
+   struct emac_adapter *adpt = netdev_priv(netdev);
+   struct emac_sgmii *sgmii = &adpt->phy;
+
+   /* Closing the SGMII turns off its interrupts */
+   sgmii->close(adpt);
+
+   /* Resetting the MAC turns off all DMA and its interrupts */
+   emac_mac_reset(adpt);
+}
+
 static struct platform_driver emac_platform_driver = {
.probe  = emac_probe,
.remove = emac_remove,
@@ -770,6 +783,7 @@ static int emac_remove(struct platform_device *pdev)
.of_match_table = emac_dt_match,
.acpi_match_table = ACPI_PTR(emac_acpi_match),
},
+   .shutdown = emac_shutdown,
 };
 
 module_platform_driver(emac_platform_driver);
-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc.  Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.

[PATCH 2/3][v2] net: qcom/emac: do not reset the EMAC during initialization

2017-06-23 Thread Timur Tabi

On ACPI systems, the driver depends on firmware pre-initializing the
EMAC because we don't have access to the clocks, and the EMAC has specific
clock programming requirements.  Therefore, we don't want to reset the
EMAC while we are completing the initialization.

Tested-by: Richard Ruigrok 
Signed-off-by: Timur Tabi 
---
v2: improve the patch description.

 drivers/net/ethernet/qualcomm/emac/emac.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/net/ethernet/qualcomm/emac/emac.c b/drivers/net/ethernet/qualcomm/emac/emac.c
index 77c5c92..746d94e 100644
--- a/drivers/net/ethernet/qualcomm/emac/emac.c
+++ b/drivers/net/ethernet/qualcomm/emac/emac.c
@@ -683,8 +683,6 @@ static int emac_probe(struct platform_device *pdev)
goto err_undo_mdiobus;
}
 
-   emac_mac_reset(adpt);
-
/* set hw features */
netdev->features = NETIF_F_SG | NETIF_F_HW_CSUM | NETIF_F_RXCSUM |
NETIF_F_TSO | NETIF_F_TSO6 | NETIF_F_HW_VLAN_CTAG_RX |
-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc.  Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.

[PATCH 0/3][v2] net: qcom/emac: various minor improvements

2017-06-23 Thread Timur Tabi

A collection of minor fixes and features to the Qualcomm Technologies
EMAC network driver.

Timur Tabi (3):
  net: qcom/emac: add shutdown function
  [v2] net: qcom/emac: do not reset the EMAC during initialization
  net: qcom/emac: add support for emulation systems

 drivers/net/ethernet/qualcomm/emac/emac-sgmii.c | 23 +--
 drivers/net/ethernet/qualcomm/emac/emac.c   | 16 ++--
 2 files changed, 35 insertions(+), 4 deletions(-)

-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc.  Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.

Re: [PATCH net-next 0/4] net: phy: Support "internal" PHY interface

2017-06-23 Thread David Miller

From: Florian Fainelli 
Date: Fri, 23 Jun 2017 10:33:12 -0700

> This makes the "internal" phy-mode property generally available and
> documented and this allows us to remove some custom parsing code
> we had for bcmgenet and bcm_sf2 which both used that specific value.

Nice cleanup.

Series applied, thanks.

Re: [PATCH 2/3] net: qcom/emac: do not reset the EMAC during initialization

2017-06-23 Thread David Miller

From: Timur Tabi 
Date: Fri, 23 Jun 2017 13:37:58 -0500

> On 06/23/2017 01:00 PM, David Miller wrote:
>> What if the boot loader or something else left the chip in
>> a weird state?
> 
> We depend on the boot loader leaving the NIC in a very specific state
> already, otherwise the driver can't initialize the hardware.  The
> firmware has to pre-initialize the EMAC for us.
> 
> Not only that, but the driver was resetting the MAC *after*
> programming the clocks (on non-ACPI systems) and initializing both
> PHYs.
> 
>> I'm not applying this.
>> If it's correct, the explanation in this commit message needs
>> to be improved.  The change must be better justified.
> 
> Since this is for ACPI systems, I could do this:
> 
>   if (!has_acpi_companion(&pdev->dev))
>   emac_mac_reset(adpt);
> 
> But at the very least, the call should be moved to earlier in the
> function.

Please just explain the ACPI situation in the commit log message
and resubmit the series.

Thanks.

Re: [PATCH net-next 2/2] af_iucv: Move sockaddr length checks to before accessing sa_family in bind and connect handlers

2017-06-23 Thread David Miller

From: Oliver Hartkopp 
Date: Fri, 23 Jun 2017 19:36:12 +0200

> 
> 
> On 06/23/2017 07:32 PM, Julian Wiedmann wrote:
>> From: Mateusz Jurczyk 
>> 
>> Verify that the caller-provided sockaddr structure is large enough to
>> contain the sa_family field, before accessing it in bind() and connect()
>> handlers of the AF_IUCV socket. Since neither syscall enforces a minimum
>> size of the corresponding memory region, very short sockaddrs (zero or
>> one byte long) result in operating on uninitialized memory while
>> referencing .sa_family.
> 
> Wouldn't it make sense to generally check the minimum length for .sa_family at a
> single point before fixing all call sites?

We had this discussion last week and we decided that putting it into
the handlers is the way to go for now.
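
The per-handler check being discussed amounts to validating the length before the first read of `sa_family`. A user-space sketch, where the helper name and the choice of -EINVAL as the error are assumptions rather than what the af_iucv patch necessarily does:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>

/* Return 0 if the caller-supplied buffer is long enough to hold
 * sa_family and the family matches, otherwise -EINVAL. The length is
 * checked first so sa_family is never read from a too-short buffer. */
static int check_sockaddr(const struct sockaddr *addr, size_t addr_len,
			  sa_family_t expected)
{
	assert(addr != NULL);
	if (addr_len < offsetof(struct sockaddr, sa_family) +
		       sizeof(addr->sa_family))
		return -EINVAL;
	if (addr->sa_family != expected)
		return -EINVAL;
	return 0;
}
```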

Re: [PATCH net 0/2] bnxt_en: Error handling and netpoll fixes.

2017-06-23 Thread David Miller

From: Michael Chan 
Date: Fri, 23 Jun 2017 14:00:59 -0400

> Add missing error handling and fix netpoll handling.  The current code
> handles RX and TX events in netpoll mode and is causing lots of warnings
> and errors in the RX code path in netpoll mode.  The fix is to only handle
> TX events in netpoll mode.

Series applied, thanks.

Re: [PATCH net-next 1/2] ipmr: restrict mroute "queue full" warning to related error values

2017-06-23 Thread David Miller

From: Julien Gomes 
Date: Fri, 23 Jun 2017 10:52:26 -0700

> On 06/23/2017 10:39 AM, David Miller wrote:
> 
>> From: Julien Gomes 
>> Date: Wed, 21 Jun 2017 10:58:10 -0700
>>
>>> When sending a cache report on mroute_sk, mroute will emit a
>>> "pending queue full" warning for every error value returned by
>>> sock_queue_rcv_skb().
>>> This warning can be misleading, for example on the EPERM error value
>>> that sk_filter() can return.
>>>
>>> Restricting this warning to only ENOMEM or ENOBUFS seems more
>>> appropriate.
>>>
>>> Signed-off-by: Julien Gomes 
>> Incorrect, no other error codes are possible.
>>
>> We never attach a socket filter to these kernel internal sockets,
>> therefore sk_filter() is not even applicable in this analysis.
>>
>> Therefore, -ENOBUFS and -ENOMEM are the only errors we can ever see
>> returned from sock_queue_rcv_skb().
>>
>> This goes for your second patch as well.
> 
> Up to now I would agree, but now that cache reports are also sent
> through Netlink, wouldn't it make sense to allow the user of mroute_sk
> to attach a filter to it in order to not receive cache reports on it?

There is no visibility of this socket outside of the kernel.

I doubt it would ever be exported in any way, and until it is,
worrying about this is truly a huge waste of time and developer
resources.

Thank you.

Re: [PATCH net] net: dp83640: Avoid NULL pointer dereference.

2017-06-23 Thread David Miller

From: Richard Cochran 
Date: Fri, 23 Jun 2017 17:51:31 +0200

> The function, skb_complete_tx_timestamp(), used to allow passing in a
> NULL pointer for the time stamps, but that was changed in commit
> 62bccb8cdb69051b95a55ab0c489e3cab261c8ef ("net-timestamp: Make the
> clone operation stand-alone from phy timestamping"), and the existing
> call sites, all of which are in the dp83640 driver, were fixed up.
> 
> Even though the kernel-doc was subsequently updated in commit
> 7a76a021cd5a292be875fbc616daf03eab1e6996 ("net-timestamp: Update
> skb_complete_tx_timestamp comment"), still a bug fix from Manfred
> Rudigier came into the driver using the old semantics.  Probably
> Manfred derived that patch from an older kernel version.
> 
> This fix should be applied to the stable trees as well.
> 
> Fixes: 81e8f2e930fe ("net: dp83640: Fix tx timestamp overflow handling.")
> Signed-off-by: Richard Cochran 

Applied and queued up for -stable, thank you.

Re: [PATCH 2/3] net: qcom/emac: do not reset the EMAC during initialization

2017-06-23 Thread Timur Tabi


On 06/23/2017 01:00 PM, David Miller wrote:
> What if the boot loader or something else left the chip in
> a weird state?

We depend on the boot loader leaving the NIC in a very specific state 
already, otherwise the driver can't initialize the hardware.  The 
firmware has to pre-initialize the EMAC for us.

Not only that, but the driver was resetting the MAC *after* programming 
the clocks (on non-ACPI systems) and initializing both PHYs.

> I'm not applying this.
>
> If it's correct, the explanation in this commit message needs
> to be improved.  The change must be better justified.

Since this is for ACPI systems, I could do this:

	if (!has_acpi_companion(&pdev->dev))
		emac_mac_reset(adpt);

But at the very least, the call should be moved to earlier in the function.

--
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc.  Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.

Re: [pull request][net-next 00/15] Mellanox, mlx5 updates 2017-06-23

2017-06-23 Thread David Miller

From: Saeed Mahameed 
Date: Fri, 23 Jun 2017 17:26:07 +0300

> This series, mainly from Tariq and Or, includes updates to mlx5 core and
> netdevice drivers.
> 
> From Tariq, RX path improvements.
> From Or, header re-write updates and FW flash support.
> For more details please see tag log below.
> 
> Please pull and let me know if there's any problem.

Pulled, thanks Saeed.

Re: [PATCH net-next 2/2] cxgb4: Use Firmware params to get buffer-group map

2017-06-23 Thread David Miller

From: Ganesh Goudar 
Date: Fri, 23 Jun 2017 19:14:37 +0530

> From: Arjun Vynipadath 
> 
> Buffer group mappings can be obtained using FW_PARAMs cmd for newer FW.
> 
> Since some of the bg_maps are obtained in atomic context, add another
> function, t4_query_params_ns(), that won't sleep while awaiting mbox cmd
> completion.
> 
> Signed-off-by: Casey Leedom 
> Signed-off-by: Arjun Vynipadath 
> Signed-off-by: Ganesh Goudar 

Applied.

Re: [PATCH net-next 1/2] cxgb4: Update T6 Buffer Group and Channel Mappings

2017-06-23 Thread David Miller

From: Ganesh Goudar 
Date: Fri, 23 Jun 2017 19:14:36 +0530

> From: Arjun Vynipadath 
> 
> We were using t4_get_mps_bg_map() for both t4_get_port_stats()
> to determine which MPS Buffer Groups to report statistics on for a given
> Port, and also for t4_sge_alloc_rxq() to provide a TP Ingress Channel
> Congestion Map.  For T4/T5 these are actually the same values (because they
> are ~somewhat~ related), but for T6 they should return different values
> (T6 has Port 0 associated with MPS Buffer Group 0 (with MPS Buffer Group 1
> silently cascading off) and Port 1 is associated with MPS Buffer Group 2
> (with 3 cascading off)).
> 
> Based on the original work by Casey Leedom 
> Signed-off-by: Arjun Vynipadath 
> Signed-off-by: Ganesh Goudar 

Applied.

Re: [PATCH -net] tls: return -EFAULT if copy_to_user() fails

2017-06-23 Thread David Miller

From: Dan Carpenter 
Date: Fri, 23 Jun 2017 13:15:44 +0300

> The copy_to_user() function returns the number of bytes remaining but we
> want to return -EFAULT here.
> 
> Fixes: 3c4d7559159b ("tls: kernel TLS support")
> Signed-off-by: Dan Carpenter 

Dan, I happened to realize that tls is only in net-next, but please
indicate the target tree properly in your Subject lines in the
future.

Applied, thanks.
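
The bug class fixed here is returning the "bytes not copied" count as if it were an error code. A user-space sketch of the pattern, using a stand-in for copy_to_user() (the `writable` parameter simulating a partially writable destination and both function names are assumptions made for illustration):

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for copy_to_user(): copies up to `writable` bytes and returns
 * the number of bytes that could NOT be copied (0 on full success). */
static unsigned long fake_copy_to_user(void *dst, size_t writable,
				       const void *src, unsigned long n)
{
	unsigned long done = n < writable ? n : (unsigned long)writable;

	memcpy(dst, src, done);
	return n - done;
}

/* Translate "bytes remaining" into an errno-style return value instead
 * of passing the positive count back to the caller. */
static int get_option(void *user_buf, size_t writable,
		      const void *val, unsigned long len)
{
	assert(user_buf != NULL && val != NULL);
	if (fake_copy_to_user(user_buf, writable, val, len))
		return -EFAULT;
	return 0;
}
```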

Re: pull request (net-next): ipsec-next 2017-06-23

2017-06-23 Thread David Miller

From: Steffen Klassert 
Date: Fri, 23 Jun 2017 10:38:24 +0200

> 1) Use memdup_user to simplify xfrm_user_policy.
>    From Geliang Tang.
> 
> 2) Make xfrm_dev_register static to silence a sparse warning.
>    From Wei Yongjun.
> 
> 3) Use crypto_memneq to check the ICV in the AH protocol.
>    From Sabrina Dubroca.
> 
> 4) Remove some unused variables in esp6.
>    From Stephen Hemminger.
> 
> 5) Extend XFRM MIGRATE to allow changing the UDP encapsulation port.
>    From Antony Antony.
> 
> 6) Include the UDP encapsulation port in km_migrate announcements.
>    From Antony Antony.
> 
> Please pull or let me know if there are problems.

Pulled, thank you very much.
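
Item 3 concerns comparing an integrity check value without leaking, through timing, where the first mismatching byte is. The idea can be sketched as below. This is not the kernel's crypto_memneq() implementation, only an illustration of the technique, and `ct_memneq` is a made-up name.

```c
#include <assert.h>
#include <stddef.h>

/* Returns nonzero if the buffers differ, zero if they are equal.
 * Every byte is examined regardless of where the first mismatch is, so
 * the loop does not exit early the way a typical memcmp() may. */
static unsigned int ct_memneq(const void *a, const void *b, size_t len)
{
	const volatile unsigned char *pa = a;
	const volatile unsigned char *pb = b;
	unsigned char diff = 0;
	size_t i;

	assert(a != NULL && b != NULL);
	for (i = 0; i < len; i++)
		diff |= (unsigned char)(pa[i] ^ pb[i]);
	return diff;
}
```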

Re: [PATCH V2 net-next 00/11] update ena ethernet driver to version 1.2.0

2017-06-23 Thread David Miller

From: 
Date: Fri, 23 Jun 2017 11:21:49 +0300

> From: Netanel Belgazal 
> 
> This patchset contains some new features/improvements that were added
> to the ENA driver to increase its robustness and are based on
> experience of wide ENA deployment.
> 
> Change log:
> 
> V2:
> * Remove patch that adds inline to C-file static functions (contradicts
>   coding style).
> * Remove patch that moves MTU parameter validation into ena_change_mtu()
>   instead of using the network stack.
> * Use upper_32_bits()/lower_32_bits() instead of casting.

Series applied, thanks.

Re: pull request (net): ipsec 2017-06-23

2017-06-23 Thread David Miller

From: Steffen Klassert 
Date: Fri, 23 Jun 2017 09:06:28 +0200

> 1) Fix xfrm garbage collecting when unregistering a netdevice.
>    From Hangbin Liu.
> 
> 2) Fix NULL pointer dereference when exiting a network namespace.
>    From Hangbin Liu.
> 
> 3) Fix some error codes in pfkey to prevent a NULL pointer dereference.
>    From Dan Carpenter.
> 
> 4) Fix NULL pointer dereference on allocation failure in pfkey.
>    From Dan Carpenter.
> 
> 5) Adjust IPv6 payload_len to include extension headers. Otherwise
>    we corrupt the packets when doing ESP GRO on transport mode.
>    From Yossi Kuperman.
> 
> 6) Set nhoff to the proper offset of the IPv6 nexthdr when doing ESP GRO.
>    From Yossi Kuperman.
> 
> Please pull or let me know if there are problems.

Pulled, thanks Steffen!

Re: [PATCH net-next] tcp: fix out-of-bounds access in ULP sysctl

2017-06-23 Thread David Miller

From: Jakub Kicinski 
Date: Thu, 22 Jun 2017 18:57:55 -0700

> KASAN reports out-of-bound access in proc_dostring() coming from
> proc_tcp_available_ulp() because in case TCP ULP list is empty
> the buffer allocated for the response will not have anything
> printed into it.  Set the first byte to zero to avoid strlen()
> going out-of-bounds.
> 
> Fixes: 734942cc4ea6 ("tcp: ULP infrastructure")
> Signed-off-by: Jakub Kicinski 

Applied, thanks.
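
The failure mode is a buffer that is never written when the list is empty and is then handed to code that calls strlen() on it. A user-space sketch of the fix, with a hypothetical list builder (`build_name_list` and its parameters are not kernel functions):

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Build a space-separated list of names into buf. Terminating the buffer
 * up front guarantees a valid empty string when there are no names. */
static void build_name_list(char *buf, size_t size,
			    const char *const *names, size_t count)
{
	size_t off = 0;
	size_t i;

	assert(buf != NULL && size > 0);
	buf[0] = '\0';
	for (i = 0; i < count; i++) {
		int n = snprintf(buf + off, size - off, "%s%s",
				 off ? " " : "", names[i]);

		if (n < 0 || (size_t)n >= size - off) {
			buf[off] = '\0'; /* drop the partially written name */
			break;
		}
		off += (size_t)n;
	}
}
```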

Re: [Patch net] sit: use __GFP_NOWARN for user controlled allocation

2017-06-23 Thread David Miller

From: Cong Wang 
Date: Thu, 22 Jun 2017 15:29:33 -0700

> The memory allocation size is controlled by user-space,
> if it is too large just fail silently and return NULL,
> not to mention there is a fallback allocation later.
> 
> Reported-by: Andrey Konovalov 
> Cc: Andrey Konovalov 
> Signed-off-by: Cong Wang 

Applied, thanks.

Re: [PATCH net-next] bpf: possibly avoid extra masking for narrower load in verifier

2017-06-23 Thread David Miller

From: Yonghong Song 
Date: Thu, 22 Jun 2017 15:07:39 -0700

> Commit 31fd85816dbe ("bpf: permits narrower load from bpf program
> context fields") permits narrower load for certain ctx fields.
> That commit, however, always generates masking, even if
> the prog-specific ctx conversion produces the result with
> narrower size.
> 
> For example, for __sk_buff->protocol, the ctx conversion
> loads the data into register with 2-byte load.
> A narrower 2-byte load should not generate masking.
> For __sk_buff->vlan_present, the conversion function
> sets the result to either 0 or 1, essentially a byte.
> The narrower 2-byte or 1-byte load should not generate masking.
> 
> To avoid unnecessary masking, prog-specific *_is_valid_access
> now passes converted_op_size back to verifier, which indicates
> the valid data width after perceived future conversion.
> Based on this information, verifier is able to avoid
> unnecessary masking.
> 
> Since we want more information back from prog-specific
> *_is_valid_access checking, all of them are packed into
> one data structure for more clarity.
> 
> Acked-by: Daniel Borkmann 
> Signed-off-by: Yonghong Song 

Applied, thank you.
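
The decision rule described in the commit message can be written down as a small sketch: masking is only needed when the program loads fewer bytes than the converted value may occupy. The function names below are hypothetical and this is not the verifier's code.

```c
#include <assert.h>
#include <stdint.h>

/* Mask selecting the low `size` bytes (size in 1..8). */
static uint64_t narrow_mask(unsigned int size)
{
	assert(size >= 1 && size <= 8);
	return size == 8 ? UINT64_MAX : (1ULL << (size * 8)) - 1;
}

/* A mask is only needed when the program asks for fewer bytes than the
 * value produced by the ctx conversion may occupy. */
static int needs_mask(unsigned int load_size, unsigned int converted_size)
{
	return load_size < converted_size;
}

static uint64_t narrow_load(uint64_t converted_value, unsigned int load_size,
			    unsigned int converted_size)
{
	if (needs_mask(load_size, converted_size))
		return converted_value & narrow_mask(load_size);
	return converted_value;
}
```

With a 2-byte converted value (the __sk_buff->protocol case) a 2-byte load skips the mask, and with a 1-byte converted value (the vlan_present case) both 1-byte and 2-byte loads skip it.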

Re: [PATCH v2 net] udpv6: reset daddr and dport in sk if connect() fails

2017-06-23 Thread David Miller

From: Wei Wang 
Date: Thu, 22 Jun 2017 12:03:41 -0700

> From: Wei Wang 
> 
> In __ip6_datagram_connect(), reset sk->sk_v6_daddr and inet->dport if
> error occurs so that udp_v6_early_demux() won't consider this socket
> as a valid candidate for early demux.
> 
> v2: fix compilation error
> 
> Signed-off-by: Wei Wang 
> Acked-by: Maciej Żenczykowski 

Please also add a state test against TCP_ESTABLISHED in UDP v6 early
demux to close this race completely.

Thanks.

[PATCH net 2/2] bnxt_en: Fix netpoll handling.

2017-06-23 Thread Michael Chan

To handle netpoll properly, the driver must only handle TX packets
during NAPI.  Handling RX events causes warnings and errors in
netpoll mode. The ndo_poll_controller() method should call
napi_schedule() directly so that a NAPI weight of zero will be used
during netpoll mode.

The bnxt_en driver supports 2 ring modes: combined, and separate rx/tx.
In separate rx/tx mode, the ndo_poll_controller() method will only
process the tx rings.  In combined mode, the rx and tx completion
entries are mixed in the completion ring and we need to drop the rx
entries and recycle the rx buffers.

Add a function bnxt_force_rx_discard() to handle this in netpoll mode
when we see rx entries in combined ring mode.

Reported-by: Calvin Owens 
Signed-off-by: Michael Chan 
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 54 +++
 1 file changed, 48 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index f5ba8ec..74e8e21 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -1563,6 +1563,45 @@ static int bnxt_rx_pkt(struct bnxt *bp, struct bnxt_napi *bnapi, u32 *raw_cons,
return rc;
 }
 
+/* In netpoll mode, if we are using a combined completion ring, we need to
+ * discard the rx packets and recycle the buffers.
+ */
+static int bnxt_force_rx_discard(struct bnxt *bp, struct bnxt_napi *bnapi,
+u32 *raw_cons, u8 *event)
+{
+   struct bnxt_cp_ring_info *cpr = &bnapi->cp_ring;
+   u32 tmp_raw_cons = *raw_cons;
+   struct rx_cmp_ext *rxcmp1;
+   struct rx_cmp *rxcmp;
+   u16 cp_cons;
+   u8 cmp_type;
+
+   cp_cons = RING_CMP(tmp_raw_cons);
+   rxcmp = (struct rx_cmp *)
+   &cpr->cp_desc_ring[CP_RING(cp_cons)][CP_IDX(cp_cons)];
+
+   tmp_raw_cons = NEXT_RAW_CMP(tmp_raw_cons);
+   cp_cons = RING_CMP(tmp_raw_cons);
+   rxcmp1 = (struct rx_cmp_ext *)
+   &cpr->cp_desc_ring[CP_RING(cp_cons)][CP_IDX(cp_cons)];
+
+   if (!RX_CMP_VALID(rxcmp1, tmp_raw_cons))
+   return -EBUSY;
+
+   cmp_type = RX_CMP_TYPE(rxcmp);
+   if (cmp_type == CMP_TYPE_RX_L2_CMP) {
+   rxcmp1->rx_cmp_cfa_code_errors_v2 |=
+   cpu_to_le32(RX_CMPL_ERRORS_CRC_ERROR);
+   } else if (cmp_type == CMP_TYPE_RX_L2_TPA_END_CMP) {
+   struct rx_tpa_end_cmp_ext *tpa_end1;
+
+   tpa_end1 = (struct rx_tpa_end_cmp_ext *)rxcmp1;
+   tpa_end1->rx_tpa_end_cmp_errors_v2 |=
+   cpu_to_le32(RX_TPA_END_CMP_ERRORS);
+   }
+   return bnxt_rx_pkt(bp, bnapi, raw_cons, event);
+}
+
 #define BNXT_GET_EVENT_PORT(data)  \
((data) &   \
 ASYNC_EVENT_CMPL_PORT_CONN_NOT_ALLOWED_EVENT_DATA1_PORT_ID_MASK)
@@ -1745,7 +1784,11 @@ static int bnxt_poll_work(struct bnxt *bp, struct bnxt_napi *bnapi, int budget)
if (unlikely(tx_pkts > bp->tx_wake_thresh))
rx_pkts = budget;
} else if ((TX_CMP_TYPE(txcmp) & 0x30) == 0x10) {
-   rc = bnxt_rx_pkt(bp, bnapi, &raw_cons, &event);
+   if (likely(budget))
+   rc = bnxt_rx_pkt(bp, bnapi, &raw_cons, &event);
+   else
+   rc = bnxt_force_rx_discard(bp, bnapi, &raw_cons,
+  &event);
if (likely(rc >= 0))
rx_pkts += rc;
else if (rc == -EBUSY)  /* partial completion */
@@ -6664,12 +6707,11 @@ static void bnxt_poll_controller(struct net_device *dev)
struct bnxt *bp = netdev_priv(dev);
int i;
 
-   for (i = 0; i < bp->cp_nr_rings; i++) {
-   struct bnxt_irq *irq = &bp->irq_tbl[i];
+   /* Only process tx rings/combined rings in netpoll mode. */
+   for (i = 0; i < bp->tx_nr_rings; i++) {
+   struct bnxt_tx_ring_info *txr = &bp->tx_ring[i];
 
-   disable_irq(irq->vector);
-   irq->handler(irq->vector, bp->bnapi[i]);
-   enable_irq(irq->vector);
+   napi_schedule(&txr->bnapi->napi);
}
 }
 #endif
-- 
1.8.3.1
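
The budget-based branching added to bnxt_poll_work() can be modelled in user space as follows. This is a simplified sketch: the ring is reduced to an array of entry types, and `poll_work`, `poll_stats` and the counters are illustrative names, not driver code.

```c
#include <assert.h>
#include <stddef.h>

enum cmpl_type { CMPL_TX, CMPL_RX };

struct poll_stats {
	unsigned int tx_done;     /* TX completions reclaimed */
	unsigned int rx_passed;   /* RX packets handed to the stack */
	unsigned int rx_recycled; /* RX buffers returned to the ring unread */
};

/* Walk a combined completion ring. With budget == 0 (the netpoll case)
 * TX completions are still reclaimed, but RX entries are discarded and
 * their buffers recycled instead of being passed up. */
static void poll_work(const enum cmpl_type *ring, size_t n, int budget,
		      struct poll_stats *st)
{
	size_t i;

	assert(ring != NULL && st != NULL);
	for (i = 0; i < n; i++) {
		if (ring[i] == CMPL_TX) {
			st->tx_done++;
		} else if (budget > 0) {
			st->rx_passed++;
			if (st->rx_passed == (unsigned int)budget)
				break; /* NAPI budget exhausted */
		} else {
			st->rx_recycled++;
		}
	}
}
```

In the patch itself the discard is done by flagging the completion as errored and reusing the normal RX path, which then recycles the buffer.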

[PATCH net 1/2] bnxt_en: Add missing logic to handle TPA end error conditions.

2017-06-23 Thread Michael Chan

When we get a TPA_END completion to handle a completed LRO packet, it
is possible that hardware would indicate errors.  The current code is
not checking for the error condition.  Define the proper error bits and
the macro to check for this error and abort properly.

Signed-off-by: Michael Chan 
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 7 ---
 drivers/net/ethernet/broadcom/bnxt/bnxt.h | 6 +-
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 03f55da..f5ba8ec 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -1301,10 +1301,11 @@ static inline struct sk_buff *bnxt_tpa_end(struct bnxt *bp,
cp_cons = NEXT_CMP(cp_cons);
}
 
-   if (unlikely(agg_bufs > MAX_SKB_FRAGS)) {
+   if (unlikely(agg_bufs > MAX_SKB_FRAGS || TPA_END_ERRORS(tpa_end1))) {
bnxt_abort_tpa(bp, bnapi, cp_cons, agg_bufs);
-   netdev_warn(bp->dev, "TPA frags %d exceeded MAX_SKB_FRAGS %d\n",
-   agg_bufs, (int)MAX_SKB_FRAGS);
+   if (agg_bufs > MAX_SKB_FRAGS)
+   netdev_warn(bp->dev, "TPA frags %d exceeded MAX_SKB_FRAGS %d\n",
+   agg_bufs, (int)MAX_SKB_FRAGS);
return NULL;
}
 
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index 3ef42db..d46a850 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -374,12 +374,16 @@ struct rx_tpa_end_cmp_ext {
 
__le32 rx_tpa_end_cmp_errors_v2;
#define RX_TPA_END_CMP_V2   (0x1 << 0)
-   #define RX_TPA_END_CMP_ERRORS   (0x7fff << 1)
+   #define RX_TPA_END_CMP_ERRORS   (0x3 << 1)
#define RX_TPA_END_CMPL_ERRORS_SHIFT 1
 
u32 rx_tpa_end_cmp_start_opaque;
 };
 
+#define TPA_END_ERRORS(rx_tpa_end_ext) \
+   ((rx_tpa_end_ext)->rx_tpa_end_cmp_errors_v2 &   \
+cpu_to_le32(RX_TPA_END_CMP_ERRORS))
+
 #define DB_IDX_MASK0xff
 #define DB_IDX_VALID   (0x1 << 26)
 #define DB_IRQ_DIS (0x1 << 27)
-- 
1.8.3.1
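
The effect of the narrowed RX_TPA_END_CMP_ERRORS mask and the new TPA_END_ERRORS() check can be tried in user space with the sketch below. The two mask values are taken from the patch; the struct, the `to_le32()` helper (a portable stand-in for cpu_to_le32()) and the function name are assumptions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define RX_TPA_END_CMP_V2     (0x1 << 0)
#define RX_TPA_END_CMP_ERRORS (0x3 << 1)

struct tpa_end_ext {
	uint32_t errors_v2; /* stored little-endian, like __le32 */
};

/* Portable stand-in for cpu_to_le32(). */
static uint32_t to_le32(uint32_t v)
{
	uint8_t b[4] = { (uint8_t)v, (uint8_t)(v >> 8),
			 (uint8_t)(v >> 16), (uint8_t)(v >> 24) };
	uint32_t out;

	memcpy(&out, b, sizeof(out));
	return out;
}

/* Nonzero if either error bit is set. As in the patch's macro, the
 * constant is converted to little-endian rather than the field. */
static int tpa_end_errors(const struct tpa_end_ext *ext)
{
	assert(ext != NULL);
	return (ext->errors_v2 & to_le32(RX_TPA_END_CMP_ERRORS)) != 0;
}
```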

[PATCH net 0/2] bnxt_en: Error handling and netpoll fixes.

2017-06-23 Thread Michael Chan

Add missing error handling and fix netpoll handling.  The current code
handles RX and TX events in netpoll mode and is causing lots of warnings
and errors in the RX code path in netpoll mode.  The fix is to only handle
TX events in netpoll mode.

Michael Chan (2):
  bnxt_en: Add missing logic to handle TPA end error conditions.
  bnxt_en: Fix netpoll handling.

 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 61 ++-
 drivers/net/ethernet/broadcom/bnxt/bnxt.h |  6 ++-
 2 files changed, 57 insertions(+), 10 deletions(-)

-- 
1.8.3.1

Re: [PATCH 2/3] net: qcom/emac: do not reset the EMAC during initialization

2017-06-23 Thread David Miller

From: Timur Tabi 
Date: Thu, 22 Jun 2017 13:05:31 -0500

> It doesn't make sense to reset the EMAC in the middle of initializing
> it during the probe.
> 
> Tested-by: Richard Ruigrok 
> Signed-off-by: Timur Tabi 

Why not?

What if the boot loader or something else left the chip in
a weird state?

I'm not applying this.

If it's correct, the explanation in this commit message needs
to be improved.  The change must be better justified.

RE: Bluetooth: might sleep error in hidp_session_thread

2017-06-23 Thread Rohit Vaswani

I don't have a way to reply to the older message, but you can use my 
Tested-by for the patch below and re-send:

For patch: [v4,3/3] Bluetooth: hidp: fix possible might sleep error in 
hidp_session_thread

Tested-by: Rohit Vaswani 

-Rohit

-Original Message-
From: jeffy [mailto:jeffy.c...@rock-chips.com] 
Sent: Friday, June 23, 2017 05:39
To: Rohit Vaswani; linux-blueto...@vger.kernel.org
Cc: Brian Norris; Douglas Anderson; Johan Hedberg; Peter Hurley; Johan Hedberg; 
netdev@vger.kernel.org; linux-ker...@vger.kernel.org; David S. Miller; Marcel 
Holtmann; Gustavo Padovan
Subject: Re: Bluetooth: might sleep error in hidp_session_thread

Hi Rohit,

Thanks for your reply, and sorry for the delay; somehow your mail was marked as 
spam by the mail server :(

On 06/13/2017 02:31 AM, Rohit Vaswani wrote:
> Hi Jeffy,
>
> I was looking into the patch from Jeffy Chen from February 14 2017  :
>   [v4,3/3] Bluetooth: hidp: fix possible might sleep error in 
> hidp_session_thread: https://patchwork.kernel.org/patch/9570931/
>
> We faced a similar issue and this patch seems to fix the problem in our 
> preliminary test.
> I am trying to check if there was a reason this wasn't merged earlier ?

Hmm, I'm not sure why, but please feel free to add your Tested-by.
>
> Thanks,
> Rohit
>
> nvpublic
>
>
>

Re: [PATCH] net: stmmac: make some functions static

2017-06-23 Thread David Miller

From: Colin King 
Date: Thu, 22 Jun 2017 17:17:29 +0100

> From: Colin Ian King 
> 
> The functions dwmac4_dma_init_rx_chan, dwmac4_dma_init_tx_chan and
> dwmac4_dma_init_channel do not need to be in global scope, so make
> them static.
> 
> Cleans up sparse warnings:
> "symbol 'dwmac4_dma_init_rx_chan' was not declared. Should it be static?"
> "symbol 'dwmac4_dma_init_tx_chan' was not declared. Should it be static?"
> "symbol 'dwmac4_dma_init_channel' was not declared. Should it be static?"
> 
> Signed-off-by: Colin Ian King 

Applied to net-next, thanks.

Re: [GIT PULL] arcnet: fixes and features

2017-06-23 Thread David Miller

From: Michael Grzeschik 
Date: Thu, 22 Jun 2017 17:31:02 +0200

> are available in the git repository at:
> 
>   pub...@git.pengutronix.de:/mgr/linux.git tags/arcnet-for-mainline

I'm not pulling from that address; either set up a proper kernel.org
GIT account or we work with just plain patches.

Next, you need to provide a proper commit message in a
"[PATCH 0/N] " posting explaining at a high level what
the patch series is doing as a unit, how it is doing it,
and why it is doing it that way.

Finally, in patch #1 the assignment of "flags" to "0" in the
declaration is unnecessary; please remove it.

Re: [PATCH net-next 1/2] ipmr: restrict mroute "queue full" warning to related error values

2017-06-23 Thread Julien Gomes

On 06/23/2017 10:39 AM, David Miller wrote:

> From: Julien Gomes 
> Date: Wed, 21 Jun 2017 10:58:10 -0700
>
>> When sending a cache report on mroute_sk, mroute will emit a
>> "pending queue full" warning for every error value returned by
>> sock_queue_rcv_skb().
>> This warning can be misleading, for example on the EPERM error value
>> that sk_filter() can return.
>>
>> Restricting this warning to only ENOMEM or ENOBUFS seems more
>> appropriate.
>>
>> Signed-off-by: Julien Gomes 
> Incorrect, no other error codes are possible.
>
> We never attach a socket filter to these kernel internal sockets,
> therefore sk_filter() is not even applicable in this analysis.
>
> Therefore, -ENOBUFS and -ENOMEM are the only errors we can ever see
> returned from sock_queue_rcv_skb().
>
> This goes for your second patch as well.

Up to now I would agree, but now that cache reports are also sent
through Netlink, wouldn't it make sense to allow the user of mroute_sk
to attach a filter to it in order to not receive cache reports on it?

I agree this is not crucial in any way, but this could be a way to let
the user program choose whether it receives the reports through one
of the mediums, and not inevitably both.

-- 
Julien Gomes
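
For reference, the distinction discussed in this thread can be sketched
in plain user-space C. This is only an illustration, not the ipmr code:
mroute_should_warn_queue_full() is a hypothetical name, and it only
assumes that the queueing function reports failures as negative errno
values, as described above.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical helper (not part of ipmr): report whether an error
 * returned while queueing a cache report actually points at queue or
 * memory pressure, i.e. whether a "pending queue full" warning would
 * be accurate for it.
 */
bool mroute_should_warn_queue_full(int err)
{
	/* Only these two values mean the receive queue could not take the skb. */
	return err == -ENOMEM || err == -ENOBUFS;
}
```

With such a check, an -EPERM coming from a socket filter would be
dropped silently instead of being reported as a full queue.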

Re: [PATCH net-next 0/8] xdp: offload mode

2017-06-23 Thread David Miller

From: Jakub Kicinski 
Date: Wed, 21 Jun 2017 18:25:02 -0700

> While we discuss the representors.. :)
> 
> This set adds an XDP flag for forcing offload and an attachment mode
> for reporting to user space that program has been offloaded.  The
> nfp driver is modified to make use of the new flags, but also to
> adhere to the DRV_MODE flag which should disable the HW offload.
> 
> The intended driver behaviour is:
>            DRV mode   offload
> no flags   yes        attempted
> DRV_MODE   yes        no
> HW_MODE    no         yes
> 
> Where 'yes' means required, and an error will be returned if setup fails.
> 'Attempted' means the offload will only happen automatically if HW is
> capable and offloading the program will cause no change in system
> behaviour (e.g. maps don't have to be bound).
> 
> Thanks to loading the program both to the driver and HW by default we
> can fallback to the driver mode without disruption in case user replaces
> the program with one which cannot be offloaded later.
> 
> Note that the NFP driver currently claims XDP offload support but 
> lacks most basic features like direct packet access.
> 
> Only change compared to the RFC is fixing the double bpf_prog_put()
> which Daniel has spotted (patch 5).

Applied, thank you.
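
The behaviour table in the cover letter can be restated as a small
decision function. The sketch below is user-space C with made-up names
(SKETCH_DRV_MODE, SKETCH_HW_MODE, sketch_plan_for); the flag values are
placeholders and are not the uapi XDP flag values. The table does not
say what happens when both flags are set, so the sketch arbitrarily
lets DRV_MODE win in that case.

```c
/* Placeholder flag bits for illustration only (not the uapi values). */
#define SKETCH_DRV_MODE (1u << 0)
#define SKETCH_HW_MODE  (1u << 1)

enum sketch_req {
	SKETCH_NO,		/* do not set up this path */
	SKETCH_YES,		/* required: fail if setup fails */
	SKETCH_ATTEMPTED,	/* best effort: only if HW can do it transparently */
};

struct sketch_plan {
	enum sketch_req drv;		/* load into the driver? */
	enum sketch_req offload;	/* load into the HW? */
};

/* Map the requested flags to the behaviour rows of the table. */
struct sketch_plan sketch_plan_for(unsigned int flags)
{
	/* "no flags" row: driver required, offload attempted. */
	struct sketch_plan plan = { SKETCH_YES, SKETCH_ATTEMPTED };

	if (flags & SKETCH_DRV_MODE) {
		/* DRV_MODE row: driver required, HW offload disabled. */
		plan.offload = SKETCH_NO;
	} else if (flags & SKETCH_HW_MODE) {
		/* HW_MODE row: offload required, nothing in the driver. */
		plan.drv = SKETCH_NO;
		plan.offload = SKETCH_YES;
	}
	return plan;
}
```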

Re: [PATCH net-next 1/2] ipmr: restrict mroute "queue full" warning to related error values

2017-06-23 Thread David Miller

From: Julien Gomes 
Date: Wed, 21 Jun 2017 10:58:10 -0700

> When sending a cache report on mroute_sk, mroute will emit a
> "pending queue full" warning for every error value returned by
> sock_queue_rcv_skb().
> This warning can be misleading, for example on the EPERM error value
> that sk_filter() can return.
> 
> Restricting this warning to only ENOMEM or ENOBUFS seems more
> appropriate.
> 
> Signed-off-by: Julien Gomes 

Incorrect, no other error codes are possible.

We never attach a socket filter to these kernel internal sockets,
therefore sk_filter() is not even applicable in this analysis.

Therefore, -ENOBUFS and -ENOMEM are the only errors we can ever see
returned from sock_queue_rcv_skb().

This goes for your second patch as well.

Re: [PATCH net-next 2/2] af_iucv: Move sockaddr length checks to before accessing sa_family in bind and connect handlers

2017-06-23 Thread Oliver Hartkopp



On 06/23/2017 07:32 PM, Julian Wiedmann wrote:
> From: Mateusz Jurczyk 
> 
> Verify that the caller-provided sockaddr structure is large enough to
> contain the sa_family field, before accessing it in bind() and connect()
> handlers of the AF_IUCV socket. Since neither syscall enforces a minimum
> size of the corresponding memory region, very short sockaddrs (zero or
> one byte long) result in operating on uninitialized memory while
> referencing .sa_family.

Wouldn't it make sense to generally check the minimum length for .sa_family at a
single point before fixing all call sites?

Regards,
Oliver

> 
> Fixes: 52a82e23b9f2 ("af_iucv: Validate socket address length in 
> iucv_sock_bind()")
> Signed-off-by: Mateusz Jurczyk 
> [jwi: removed unneeded null-check for addr]
> Signed-off-by: Julian Wiedmann 
> ---
>  net/iucv/af_iucv.c | 8 +++-
>  1 file changed, 3 insertions(+), 5 deletions(-)
> 
> diff --git a/net/iucv/af_iucv.c b/net/iucv/af_iucv.c
> index 05112094d76b..ac033e413bc5 100644
> --- a/net/iucv/af_iucv.c
> +++ b/net/iucv/af_iucv.c
> @@ -715,10 +715,8 @@ static int iucv_sock_bind(struct socket *sock, struct 
> sockaddr *addr,
>   char uid[9];
>  
>   /* Verify the input sockaddr */
> - if (!addr || addr->sa_family != AF_IUCV)
> - return -EINVAL;
> -
> - if (addr_len < sizeof(struct sockaddr_iucv))
> + if (addr_len < sizeof(struct sockaddr_iucv) ||
> + addr->sa_family != AF_IUCV)
>   return -EINVAL;
>  
>   lock_sock(sk);
> @@ -862,7 +860,7 @@ static int iucv_sock_connect(struct socket *sock, struct 
> sockaddr *addr,
>   struct iucv_sock *iucv = iucv_sk(sk);
>   int err;
>  
> - if (addr->sa_family != AF_IUCV || alen < sizeof(struct sockaddr_iucv))
> + if (alen < sizeof(struct sockaddr_iucv) || addr->sa_family != AF_IUCV)
>   return -EINVAL;
>  
>   if (sk->sk_state != IUCV_OPEN && sk->sk_state != IUCV_BOUND)
>

[PATCH 2/4] net: phy: Support "internal" PHY interface

2017-06-23 Thread Florian Fainelli

Now that the Device Tree binding has been updated, update the PHY
library phy_interface_t and phy_modes to support the "internal" PHY
interface type.

Signed-off-by: Florian Fainelli 
---
 include/linux/phy.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/include/linux/phy.h b/include/linux/phy.h
index 23d2e46dd322..1d8d70193782 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -64,6 +64,7 @@
 /* Interface Mode definitions */
 typedef enum {
PHY_INTERFACE_MODE_NA,
+   PHY_INTERFACE_MODE_INTERNAL,
PHY_INTERFACE_MODE_MII,
PHY_INTERFACE_MODE_GMII,
PHY_INTERFACE_MODE_SGMII,
@@ -114,6 +115,8 @@ static inline const char *phy_modes(phy_interface_t 
interface)
switch (interface) {
case PHY_INTERFACE_MODE_NA:
return "";
+   case PHY_INTERFACE_MODE_INTERNAL:
+   return "internal";
case PHY_INTERFACE_MODE_MII:
return "mii";
case PHY_INTERFACE_MODE_GMII:
-- 
2.9.3

[PATCH 1/4] dt-bindings: Add "internal" as a valid 'phy-mode' property

2017-06-23 Thread Florian Fainelli

A number of Ethernet MACs have internal Ethernet PHYs and the internal
wiring makes it so that this knowledge needs to be available using the
standard 'phy-mode' property.

Signed-off-by: Florian Fainelli 
---
 Documentation/devicetree/bindings/net/ethernet.txt | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/devicetree/bindings/net/ethernet.txt 
b/Documentation/devicetree/bindings/net/ethernet.txt
index d4abe9a98109..edd7fd2bbbf9 100644
--- a/Documentation/devicetree/bindings/net/ethernet.txt
+++ b/Documentation/devicetree/bindings/net/ethernet.txt
@@ -11,6 +11,7 @@ The following properties are common to the Ethernet 
controllers:
   the maximum frame size (there's contradiction in ePAPR).
 - phy-mode: string, operation mode of the PHY interface. This is now a de-facto
   standard property; supported values are:
+  * "internal"
   * "mii"
   * "gmii"
   * "sgmii"
-- 
2.9.3

[PATCH net-next 4/4] net: dsa: bcm_sf2: Remove special handling of "internal" phy-mode

2017-06-23 Thread Florian Fainelli

The PHY library now supports an "internal" phy-mode, thus making our
custom parsing code now unnecessary.

Signed-off-by: Florian Fainelli 
---
 drivers/net/dsa/bcm_sf2.c | 16 +---
 1 file changed, 5 insertions(+), 11 deletions(-)

diff --git a/drivers/net/dsa/bcm_sf2.c b/drivers/net/dsa/bcm_sf2.c
index 76e98e8ed315..648f91b58d1e 100644
--- a/drivers/net/dsa/bcm_sf2.c
+++ b/drivers/net/dsa/bcm_sf2.c
@@ -498,10 +498,8 @@ static void bcm_sf2_identify_ports(struct bcm_sf2_priv 
*priv,
   struct device_node *dn)
 {
struct device_node *port;
-   const char *phy_mode_str;
int mode;
unsigned int port_num;
-   int ret;
 
priv->moca_port = -1;
 
@@ -515,15 +513,11 @@ static void bcm_sf2_identify_ports(struct bcm_sf2_priv 
*priv,
 * time
 */
mode = of_get_phy_mode(port);
-   if (mode < 0) {
-   ret = of_property_read_string(port, "phy-mode",
- &phy_mode_str);
-   if (ret < 0)
-   continue;
-
-   if (!strcasecmp(phy_mode_str, "internal"))
-   priv->int_phy_mask |= 1 << port_num;
-   }
+   if (mode < 0)
+   continue;
+
+   if (mode == PHY_INTERFACE_MODE_INTERNAL)
+   priv->int_phy_mask |= 1 << port_num;
 
if (mode == PHY_INTERFACE_MODE_MOCA)
priv->moca_port = port_num;
-- 
2.9.3
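
The loop after this change boils down to: skip ports without a usable
phy-mode, and set one bit per port whose mode is "internal". A minimal
user-space sketch of that idea follows. sketch_internal_phy_mask() and
its parameters are hypothetical; negative entries in modes[] stand in
for a failed phy-mode lookup, and nports is assumed to be no larger
than the number of bits in an unsigned int.

```c
/*
 * Hypothetical sketch: build a bitmask with bit N set when port N
 * reported the "internal" mode. Entries below zero model a failed
 * phy-mode lookup and are skipped, like the "continue" in the patch.
 */
unsigned int sketch_internal_phy_mask(const int *modes, unsigned int nports,
				      int internal_mode)
{
	unsigned int mask = 0;
	unsigned int port;

	for (port = 0; port < nports; port++) {
		if (modes[port] < 0)
			continue;	/* no valid phy-mode for this port */

		if (modes[port] == internal_mode)
			mask |= 1u << port;
	}
	return mask;
}
```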

[PATCH net-next 0/4] net: phy: Support "internal" PHY interface

2017-06-23 Thread Florian Fainelli

Hi all,

This makes the "internal" phy-mode property generally available and
documented and this allows us to remove some custom parsing code
we had for bcmgenet and bcm_sf2 which both used that specific value.

Florian Fainelli (4):
  dt-bindings: Add "internal" as a valid 'phy-mode' property
  net: phy: Support "internal" PHY interface
  net: bcmgenet: Remove special handling of "internal" phy-mode
  net: dsa: bcm_sf2: Remove special handling of "internal" phy-mode

 Documentation/devicetree/bindings/net/ethernet.txt |  1 +
 drivers/net/dsa/bcm_sf2.c  | 16 +--
 drivers/net/ethernet/broadcom/genet/bcmmii.c   | 24 --
 include/linux/phy.h|  3 +++
 4 files changed, 17 insertions(+), 27 deletions(-)

-- 
2.9.3

[PATCH net-next 1/4] dt-bindings: Add "internal" as a valid 'phy-mode' property

2017-06-23 Thread Florian Fainelli

A number of Ethernet MACs have internal Ethernet PHYs and the internal
wiring makes it so that this knowledge needs to be available using the
standard 'phy-mode' property.

Signed-off-by: Florian Fainelli 
---
 Documentation/devicetree/bindings/net/ethernet.txt | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/devicetree/bindings/net/ethernet.txt 
b/Documentation/devicetree/bindings/net/ethernet.txt
index d4abe9a98109..edd7fd2bbbf9 100644
--- a/Documentation/devicetree/bindings/net/ethernet.txt
+++ b/Documentation/devicetree/bindings/net/ethernet.txt
@@ -11,6 +11,7 @@ The following properties are common to the Ethernet 
controllers:
   the maximum frame size (there's contradiction in ePAPR).
 - phy-mode: string, operation mode of the PHY interface. This is now a de-facto
   standard property; supported values are:
+  * "internal"
   * "mii"
   * "gmii"
   * "sgmii"
-- 
2.9.3

Re: [PATCH net-next 0/4] net: phy: Support "internal" PHY interface

2017-06-23 Thread Florian Fainelli

On 06/23/2017 10:33 AM, Florian Fainelli wrote:
> Hi all,
> 
> This makes the "internal" phy-mode property generally available and
> documented and this allows us to remove some custom parsing code
> we had for bcmgenet and bcm_sf2 which both used that specific value.

Sorry for the resend, this is net-next material.

> 
> Florian Fainelli (4):
>   dt-bindings: Add "internal" as a valid 'phy-mode' property
>   net: phy: Support "internal" PHY interface
>   net: bcmgenet: Remove special handling of "internal" phy-mode
>   net: dsa: bcm_sf2: Remove special handling of "internal" phy-mode
> 
>  Documentation/devicetree/bindings/net/ethernet.txt |  1 +
>  drivers/net/dsa/bcm_sf2.c  | 16 +--
>  drivers/net/ethernet/broadcom/genet/bcmmii.c   | 24 
> --
>  include/linux/phy.h|  3 +++
>  4 files changed, 17 insertions(+), 27 deletions(-)
> 


-- 
Florian

[PATCH net-next 3/4] net: bcmgenet: Remove special handling of "internal" phy-mode

2017-06-23 Thread Florian Fainelli

The PHY library now supports an "internal" phy-mode, thus making our
custom parsing code now unnecessary.

Signed-off-by: Florian Fainelli 
---
 drivers/net/ethernet/broadcom/genet/bcmmii.c | 24 
 1 file changed, 8 insertions(+), 16 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/genet/bcmmii.c 
b/drivers/net/ethernet/broadcom/genet/bcmmii.c
index 285676f8da6b..071fcbd14e6a 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmmii.c
+++ b/drivers/net/ethernet/broadcom/genet/bcmmii.c
@@ -251,11 +251,8 @@ int bcmgenet_mii_config(struct net_device *dev)
priv->ext_phy = !priv->internal_phy &&
(priv->phy_interface != PHY_INTERFACE_MODE_MOCA);
 
-   if (priv->internal_phy)
-   priv->phy_interface = PHY_INTERFACE_MODE_NA;
-
switch (priv->phy_interface) {
-   case PHY_INTERFACE_MODE_NA:
+   case PHY_INTERFACE_MODE_INTERNAL:
case PHY_INTERFACE_MODE_MOCA:
/* Irrespective of the actually configured PHY speed (100 or
 * 1000) GENETv4 only has an internal GPHY so we will just end
@@ -471,7 +468,6 @@ static int bcmgenet_mii_of_init(struct bcmgenet_priv *priv)
 {
struct device_node *dn = priv->pdev->dev.of_node;
struct device *kdev = &priv->pdev->dev;
-   const char *phy_mode_str = NULL;
struct phy_device *phydev = NULL;
char *compat;
int phy_mode;
@@ -510,23 +506,19 @@ static int bcmgenet_mii_of_init(struct bcmgenet_priv 
*priv)
 
/* Get the link mode */
phy_mode = of_get_phy_mode(dn);
+   if (phy_mode < 0) {
+   dev_err(kdev, "invalid PHY mode property\n");
+   return phy_mode;
+   }
+
priv->phy_interface = phy_mode;
 
/* We need to specifically look up whether this PHY interface is 
internal
 * or not *before* we even try to probe the PHY driver over MDIO as we
 * may have shut down the internal PHY for power saving purposes.
 */
-   if (phy_mode < 0) {
-   ret = of_property_read_string(dn, "phy-mode", &phy_mode_str);
-   if (ret < 0) {
-   dev_err(kdev, "invalid PHY mode property\n");
-   return ret;
-   }
-
-   priv->phy_interface = PHY_INTERFACE_MODE_NA;
-   if (!strcasecmp(phy_mode_str, "internal"))
-   priv->internal_phy = true;
-   }
+   if (priv->phy_interface == PHY_INTERFACE_MODE_INTERNAL)
+   priv->internal_phy = true;
 
/* Make sure we initialize MoCA PHYs with a link down */
if (phy_mode == PHY_INTERFACE_MODE_MOCA) {
-- 
2.9.3

Re: [PATCH] PATCH v3 Convert multiple netdev_info messages to netdev_dbg

2017-06-23 Thread David Miller

From: Michael J Dilmore 
Date: Wed, 21 Jun 2017 03:08:54 +0100

> The bond_options.c file contains multiple netdev_info messages that
> clutter kernel output. This patch replaces these with netdev_dbg messages
> and adds a netdev_dbg for packets for slave.
> 
> Signed-off-by: Michael J Dilmore 
> Suggested-by: Joe Perches 

This falls under the category of "very low quality patch submission"
I'm afraid.

First of all your Subject lines need to be done more properly.

Putting "PATCH" twice in there is pointless, just once inside of the
brackets is enough.  The "v3" belongs inside of the brackets too.

Seriously, just look at how other developers format their Subject lines
on this mailing list, and you are less likely to go wrong.  Doing things
your own unique way is asking for trouble.

But more importantly, your commit log message says you are converting
netdev_info calls into netdev_dbg ones, but that is not at all what
this patch does.

>   netdev_dbg(bond->dev, "%s mode is incompatible with arp 
> monitoring, start mii monitoring\n",
> - newval->string);
> +newval->string);

You're simply adjusting indentation of the code.

There are no "conversions" going on here at all.

[PATCH net-next 2/4] net: phy: Support "internal" PHY interface

2017-06-23 Thread Florian Fainelli

Now that the Device Tree binding has been updated, update the PHY
library phy_interface_t and phy_modes to support the "internal" PHY
interface type.

Signed-off-by: Florian Fainelli 
---
 include/linux/phy.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/include/linux/phy.h b/include/linux/phy.h
index 23d2e46dd322..1d8d70193782 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -64,6 +64,7 @@
 /* Interface Mode definitions */
 typedef enum {
PHY_INTERFACE_MODE_NA,
+   PHY_INTERFACE_MODE_INTERNAL,
PHY_INTERFACE_MODE_MII,
PHY_INTERFACE_MODE_GMII,
PHY_INTERFACE_MODE_SGMII,
@@ -114,6 +115,8 @@ static inline const char *phy_modes(phy_interface_t 
interface)
switch (interface) {
case PHY_INTERFACE_MODE_NA:
return "";
+   case PHY_INTERFACE_MODE_INTERNAL:
+   return "internal";
case PHY_INTERFACE_MODE_MII:
return "mii";
case PHY_INTERFACE_MODE_GMII:
-- 
2.9.3
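
To illustrate why adding the enum value and the string together is
enough for generic parsing, here is a reduced user-space sketch. All
sketch_* names are hypothetical and only a few modes are included. The
reverse lookup assumes the parser matches the property string
case-insensitively against the mode names, which is an assumption made
for this example rather than something shown in the patch.

```c
#include <strings.h>	/* strcasecmp() */

/* Reduced stand-in for phy_interface_t, for illustration only. */
typedef enum {
	SKETCH_MODE_NA,
	SKETCH_MODE_INTERNAL,
	SKETCH_MODE_MII,
	SKETCH_MODE_GMII,
	SKETCH_MODE_MAX,
} sketch_interface_t;

/* Enum to string, mirroring the switch extended by the patch. */
const char *sketch_modes(sketch_interface_t interface)
{
	switch (interface) {
	case SKETCH_MODE_NA:
		return "";
	case SKETCH_MODE_INTERNAL:
		return "internal";
	case SKETCH_MODE_MII:
		return "mii";
	case SKETCH_MODE_GMII:
		return "gmii";
	default:
		return "unknown";
	}
}

/* String to enum; returns -1 when the name matches no known mode. */
int sketch_mode_from_string(const char *name)
{
	int i;

	/* Start after NA so that an empty string is not a valid mode. */
	for (i = SKETCH_MODE_INTERNAL; i < SKETCH_MODE_MAX; i++)
		if (!strcasecmp(name, sketch_modes((sketch_interface_t)i)))
			return i;

	return -1;
}
```

Once "internal" resolves through the common table like this, drivers
no longer need their own string comparison for it.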

[PATCH net-next 0/2] af_iucv updates

2017-06-23 Thread Julian Wiedmann

Hi Dave,
please apply two af_iucv patches for net-next.

Thanks,
Julian


Hans Wippel (1):
  net/iucv: improve endianness handling

Mateusz Jurczyk (1):
  af_iucv: Move sockaddr length checks to before accessing sa_family in
bind and connect handlers

 net/iucv/af_iucv.c | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

-- 
2.11.2

[PATCH 3/4] net: bcmgenet: Remove special handling of "internal" phy-mode

2017-06-23 Thread Florian Fainelli

The PHY library now supports an "internal" phy-mode, thus making our
custom parsing code now unnecessary.

Signed-off-by: Florian Fainelli 
---
 drivers/net/ethernet/broadcom/genet/bcmmii.c | 24 
 1 file changed, 8 insertions(+), 16 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/genet/bcmmii.c 
b/drivers/net/ethernet/broadcom/genet/bcmmii.c
index 285676f8da6b..071fcbd14e6a 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmmii.c
+++ b/drivers/net/ethernet/broadcom/genet/bcmmii.c
@@ -251,11 +251,8 @@ int bcmgenet_mii_config(struct net_device *dev)
priv->ext_phy = !priv->internal_phy &&
(priv->phy_interface != PHY_INTERFACE_MODE_MOCA);
 
-   if (priv->internal_phy)
-   priv->phy_interface = PHY_INTERFACE_MODE_NA;
-
switch (priv->phy_interface) {
-   case PHY_INTERFACE_MODE_NA:
+   case PHY_INTERFACE_MODE_INTERNAL:
case PHY_INTERFACE_MODE_MOCA:
/* Irrespective of the actually configured PHY speed (100 or
 * 1000) GENETv4 only has an internal GPHY so we will just end
@@ -471,7 +468,6 @@ static int bcmgenet_mii_of_init(struct bcmgenet_priv *priv)
 {
struct device_node *dn = priv->pdev->dev.of_node;
struct device *kdev = &priv->pdev->dev;
-   const char *phy_mode_str = NULL;
struct phy_device *phydev = NULL;
char *compat;
int phy_mode;
@@ -510,23 +506,19 @@ static int bcmgenet_mii_of_init(struct bcmgenet_priv 
*priv)
 
/* Get the link mode */
phy_mode = of_get_phy_mode(dn);
+   if (phy_mode < 0) {
+   dev_err(kdev, "invalid PHY mode property\n");
+   return phy_mode;
+   }
+
priv->phy_interface = phy_mode;
 
/* We need to specifically look up whether this PHY interface is 
internal
 * or not *before* we even try to probe the PHY driver over MDIO as we
 * may have shut down the internal PHY for power saving purposes.
 */
-   if (phy_mode < 0) {
-   ret = of_property_read_string(dn, "phy-mode", &phy_mode_str);
-   if (ret < 0) {
-   dev_err(kdev, "invalid PHY mode property\n");
-   return ret;
-   }
-
-   priv->phy_interface = PHY_INTERFACE_MODE_NA;
-   if (!strcasecmp(phy_mode_str, "internal"))
-   priv->internal_phy = true;
-   }
+   if (priv->phy_interface == PHY_INTERFACE_MODE_INTERNAL)
+   priv->internal_phy = true;
 
/* Make sure we initialize MoCA PHYs with a link down */
if (phy_mode == PHY_INTERFACE_MODE_MOCA) {
-- 
2.9.3

[PATCH net-next 2/2] af_iucv: Move sockaddr length checks to before accessing sa_family in bind and connect handlers

2017-06-23 Thread Julian Wiedmann

From: Mateusz Jurczyk 

Verify that the caller-provided sockaddr structure is large enough to
contain the sa_family field, before accessing it in bind() and connect()
handlers of the AF_IUCV socket. Since neither syscall enforces a minimum
size of the corresponding memory region, very short sockaddrs (zero or
one byte long) result in operating on uninitialized memory while
referencing .sa_family.

Fixes: 52a82e23b9f2 ("af_iucv: Validate socket address length in 
iucv_sock_bind()")
Signed-off-by: Mateusz Jurczyk 
[jwi: removed unneeded null-check for addr]
Signed-off-by: Julian Wiedmann 
---
 net/iucv/af_iucv.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/net/iucv/af_iucv.c b/net/iucv/af_iucv.c
index 05112094d76b..ac033e413bc5 100644
--- a/net/iucv/af_iucv.c
+++ b/net/iucv/af_iucv.c
@@ -715,10 +715,8 @@ static int iucv_sock_bind(struct socket *sock, struct 
sockaddr *addr,
char uid[9];
 
/* Verify the input sockaddr */
-   if (!addr || addr->sa_family != AF_IUCV)
-   return -EINVAL;
-
-   if (addr_len < sizeof(struct sockaddr_iucv))
+   if (addr_len < sizeof(struct sockaddr_iucv) ||
+   addr->sa_family != AF_IUCV)
return -EINVAL;
 
lock_sock(sk);
@@ -862,7 +860,7 @@ static int iucv_sock_connect(struct socket *sock, struct 
sockaddr *addr,
struct iucv_sock *iucv = iucv_sk(sk);
int err;
 
-   if (addr->sa_family != AF_IUCV || alen < sizeof(struct sockaddr_iucv))
+   if (alen < sizeof(struct sockaddr_iucv) || addr->sa_family != AF_IUCV)
return -EINVAL;
 
if (sk->sk_state != IUCV_OPEN && sk->sk_state != IUCV_BOUND)
-- 
2.11.2
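
The ordering issue fixed here (length first, then sa_family) can be
shown in isolation with a user-space sketch. sockaddr_family_is() is a
hypothetical helper, not an existing kernel or libc function. It checks
only that the buffer covers the sa_family field, whereas the patch
checks against the full sizeof(struct sockaddr_iucv), which implies the
same thing.

```c
#include <stddef.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: returns 1 only when the caller-provided buffer
 * is long enough to contain sa_family AND the family matches. The
 * length test comes first so that sa_family is never read from memory
 * the caller did not provide.
 */
int sockaddr_family_is(const struct sockaddr *addr, size_t addr_len,
		       sa_family_t family)
{
	size_t needed = offsetof(struct sockaddr, sa_family) +
			sizeof(addr->sa_family);

	if (addr_len < needed)
		return 0;

	return addr->sa_family == family;
}
```

A zero or one byte long address is rejected before any field is
touched, which is the property the commit message asks for.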

[PATCH 4/4] net: dsa: bcm_sf2: Remove special handling of "internal" phy-mode

2017-06-23 Thread Florian Fainelli

The PHY library now supports an "internal" phy-mode, thus making our
custom parsing code now unnecessary.

Signed-off-by: Florian Fainelli 
---
 drivers/net/dsa/bcm_sf2.c | 16 +---
 1 file changed, 5 insertions(+), 11 deletions(-)

diff --git a/drivers/net/dsa/bcm_sf2.c b/drivers/net/dsa/bcm_sf2.c
index 76e98e8ed315..648f91b58d1e 100644
--- a/drivers/net/dsa/bcm_sf2.c
+++ b/drivers/net/dsa/bcm_sf2.c
@@ -498,10 +498,8 @@ static void bcm_sf2_identify_ports(struct bcm_sf2_priv 
*priv,
   struct device_node *dn)
 {
struct device_node *port;
-   const char *phy_mode_str;
int mode;
unsigned int port_num;
-   int ret;
 
priv->moca_port = -1;
 
@@ -515,15 +513,11 @@ static void bcm_sf2_identify_ports(struct bcm_sf2_priv 
*priv,
 * time
 */
mode = of_get_phy_mode(port);
-   if (mode < 0) {
-   ret = of_property_read_string(port, "phy-mode",
- &phy_mode_str);
-   if (ret < 0)
-   continue;
-
-   if (!strcasecmp(phy_mode_str, "internal"))
-   priv->int_phy_mask |= 1 << port_num;
-   }
+   if (mode < 0)
+   continue;
+
+   if (mode == PHY_INTERFACE_MODE_INTERNAL)
+   priv->int_phy_mask |= 1 << port_num;
 
if (mode == PHY_INTERFACE_MODE_MOCA)
priv->moca_port = port_num;
-- 
2.9.3

[PATCH 0/4] net: phy: Support "internal" PHY interface

2017-06-23 Thread Florian Fainelli

Hi all,

This makes the "internal" phy-mode property generally available and
documented and this allows us to remove some custom parsing code
we had for bcmgenet and bcm_sf2 which both used that specific value.

Florian Fainelli (4):
  dt-bindings: Add "internal" as a valid 'phy-mode' property
  net: phy: Support "internal" PHY interface
  net: bcmgenet: Remove special handling of "internal" phy-mode
  net: dsa: bcm_sf2: Remove special handling of "internal" phy-mode

 Documentation/devicetree/bindings/net/ethernet.txt |  1 +
 drivers/net/dsa/bcm_sf2.c  | 16 +--
 drivers/net/ethernet/broadcom/genet/bcmmii.c   | 24 --
 include/linux/phy.h|  3 +++
 4 files changed, 17 insertions(+), 27 deletions(-)

-- 
2.9.3

[PATCH net-next 1/2] net/iucv: improve endianness handling

2017-06-23 Thread Julian Wiedmann

From: Hans Wippel 

Use proper endianness conversion for an skb protocol assignment. Given
that IUCV is only available on big endian systems (s390), this simply
avoids an endianness warning reported by sparse.

Signed-off-by: Hans Wippel 
Reviewed-by: Julian Wiedmann 
Reviewed-by: Ursula Braun 
Signed-off-by: Julian Wiedmann 
---
 net/iucv/af_iucv.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/iucv/af_iucv.c b/net/iucv/af_iucv.c
index 2cf9d59f1b72..05112094d76b 100644
--- a/net/iucv/af_iucv.c
+++ b/net/iucv/af_iucv.c
@@ -362,7 +362,7 @@ static int afiucv_hs_send(struct iucv_message *imsg, struct 
sock *sock,
else
skb_trim(skb, skb->dev->mtu);
}
-   skb->protocol = ETH_P_AF_IUCV;
+   skb->protocol = cpu_to_be16(ETH_P_AF_IUCV);
nskb = skb_clone(skb, GFP_ATOMIC);
if (!nskb)
return -ENOMEM;
-- 
2.11.2
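
As a user-space analogue of the conversion above, htons() plays the
role of cpu_to_be16(): it is a no-op on big endian hosts and a byte
swap on little endian ones, so the bytes in memory come out the same
either way. sketch_store_proto() is a hypothetical helper and 0x0800 in
the usage note is just an example value.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: store a 16-bit protocol number in network
 * (big endian) byte order into out[0..1], regardless of host order.
 */
void sketch_store_proto(uint16_t host_value, unsigned char out[2])
{
	uint16_t wire = htons(host_value);	/* analogue of cpu_to_be16() */

	memcpy(out, &wire, sizeof(wire));
}
```

Calling sketch_store_proto(0x0800, buf) leaves buf[0] == 0x08 and
buf[1] == 0x00 on any host.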

Re: [PATCH net] net: account for current skb length when deciding about UFO

2017-06-23 Thread David Miller

From: Michal Kubecek 
Date: Mon, 19 Jun 2017 13:03:43 +0200 (CEST)

> Our customer encountered stuck NFS writes for blocks starting at specific
> offsets w.r.t. page boundary caused by networking stack sending packets via
> UFO enabled device with wrong checksum. The problem can be reproduced by
> composing a long UDP datagram from multiple parts using MSG_MORE flag:
> 
>   sendto(sd, buff, 1000, MSG_MORE, ...);
>   sendto(sd, buff, 1000, MSG_MORE, ...);
>   sendto(sd, buff, 3000, 0, ...);
> 
> Assume this packet is to be routed via a device with MTU 1500 and
> NETIF_F_UFO enabled. When second sendto() gets into __ip_append_data(),
> this condition is tested (among others) to decide whether to call
> ip_ufo_append_data():
> 
>   ((length + fragheaderlen) > mtu) || (skb && skb_is_gso(skb))
> 
> At the moment, we already have skb with 1028 bytes of data which is not
> marked for GSO so that the test is false (fragheaderlen is usually 20).
> Thus we append second 1000 bytes to this skb without invoking UFO. Third
> sendto(), however, has sufficient length to trigger the UFO path so that we
> end up with non-UFO skb followed by a UFO one. Later on, udp_send_skb()
> uses udp_csum() to calculate the checksum but that assumes all fragments
> have correct checksum in skb->csum which is not true for UFO fragments.
> 
> When checking against MTU, we need to add skb->len to length of new segment
> if we already have a partially filled skb and fragheaderlen only if there
> isn't one.
> 
> In the IPv6 case, skb can only be null if this is the first segment so that
> we have to use headersize (length of the first IPv6 header) rather than
> fragheaderlen (length of IPv6 header of further fragments) for skb == NULL.
> 
> Fixes: e89e9cf539a2 ("[IPv4/IPv6]: UFO Scatter-gather approach")
> Fixes: e4c5e13aa45c ("ipv6: Should use consistent conditional judgement for
>   ip6 fragment between __ip6_append_data and ip6_finish_output")
> Signed-off-by: Michal Kubecek 

Applied and queued up for -stable, thanks.
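
The numbers from the commit message can be plugged into the two
variants of the condition. The following is a user-space sketch with
hypothetical function names, covering only the IPv4 form described
above (for IPv6 the message says headersize is used instead of
fragheaderlen when there is no skb yet).

```c
#include <stdbool.h>
#include <stddef.h>

/* Condition as quoted in the commit message, before the fix. */
bool ufo_check_old(size_t length, size_t fragheaderlen, size_t mtu,
		   bool have_skb, bool skb_is_gso)
{
	return (length + fragheaderlen > mtu) || (have_skb && skb_is_gso);
}

/*
 * Condition as described for the fix: if a partially filled skb
 * exists, add its current length to the new segment; only fall back
 * to fragheaderlen when there is no skb yet.
 */
bool ufo_check_new(size_t length, size_t fragheaderlen, size_t mtu,
		   bool have_skb, size_t skb_len, bool skb_is_gso)
{
	size_t base = have_skb ? skb_len : fragheaderlen;

	return (length + base > mtu) || (have_skb && skb_is_gso);
}
```

For the second sendto() in the example (length 1000, skb already
holding 1028 bytes, MTU 1500) the old check sees 1020 <= 1500 and
skips UFO, while the new one sees 2028 > 1500 and takes the UFO path,
so the datagram no longer ends up as a non-UFO skb followed by a UFO
one.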

Re: [PATCH 05/11] net: stmmac: dwmac-rk: Add internal phy support

2017-06-23 Thread Heiko Stuebner

Hi David,

Am Freitag, 23. Juni 2017, 12:59:07 CEST schrieb David Wu:
> To make the internal phy work, the phy_clock, phy cru_reset and
> related registers need to be configured.
> 
> Change-Id: I6971c0a769754b824b1b908b56080cbaf7867d13

please remove all Change-Ids from patches before sending upstream.
There were more affected patches in this series.

> Signed-off-by: David Wu 
> ---
>  .../devicetree/bindings/net/rockchip-dwmac.txt |  3 +
>  drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c | 82 
> ++
>  2 files changed, 85 insertions(+)
> 
> diff --git a/Documentation/devicetree/bindings/net/rockchip-dwmac.txt 
> b/Documentation/devicetree/bindings/net/rockchip-dwmac.txt
> index 8f42755..0514f69 100644
> --- a/Documentation/devicetree/bindings/net/rockchip-dwmac.txt
> +++ b/Documentation/devicetree/bindings/net/rockchip-dwmac.txt
> @@ -22,6 +22,7 @@ Required properties:
>  <&cru SCLK_MACREF_OUT> clock gate for RMII reference clock output
>  <&cru ACLK_GMAC>: AXI clock gate for GMAC
>  <&cru PCLK_GMAC>: APB clock gate for GMAC
> +<&cru MAC_PHY>: clock for internal macphy

that clock should not be listed as always "Required" like it is here.
Make it some sort of extra paragraph marking it as required when using
an internal phy.

>   - clock-names: One name for each entry in the clocks property.
>   - phy-mode: See ethernet.txt file in the same directory.
>   - pinctrl-names: Names corresponding to the numbered pinctrl states.
> @@ -35,6 +36,8 @@ Required properties:
>   - assigned-clocks: main clock, should be <&cru SCLK_MAC>;
>   - assigned-clock-parents = parent of main clock.
> can be <&ext_gmac> or <&cru SCLK_MAC_PLL>.
> + - phy-type: For internal phy, it must be "internal"; For external phy, no 
> need
> +   to configure this.
>  
>  Optional properties:
>   - tx_delay: Delay value for TXD timing. Range value is 0~0x7F, 0x30 as 
> default.
> diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c 
> b/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c
> index a8e8fd5..c1a1413 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c
> @@ -41,6 +41,7 @@ struct rk_gmac_ops {
>   void (*set_to_rmii)(struct rk_priv_data *bsp_priv);
>   void (*set_rgmii_speed)(struct rk_priv_data *bsp_priv, int speed);
>   void (*set_rmii_speed)(struct rk_priv_data *bsp_priv, int speed);
> + void (*internal_phy_powerup)(struct rk_priv_data *bsp_priv);
>  };
>  
>  struct rk_priv_data {
> @@ -52,6 +53,7 @@ struct rk_priv_data {
>  
>   bool clk_enabled;
>   bool clock_input;
> + bool internal_phy;
>  
>   struct clk *clk_mac;
>   struct clk *gmac_clkin;
> @@ -61,6 +63,9 @@ struct rk_priv_data {
>   struct clk *clk_mac_refout;
>   struct clk *aclk_mac;
>   struct clk *pclk_mac;
> + struct clk *clk_macphy;
> +
> + struct reset_control *macphy_reset;
>  
>   int tx_delay;
>   int rx_delay;
> @@ -750,6 +755,48 @@ static void rk3399_set_rmii_speed(struct rk_priv_data 
> *bsp_priv, int speed)
>   .set_rmii_speed = rk3399_set_rmii_speed,
>  };
>  
> +#define RK_GRF_MACPHY_CON0   0xb00
> +#define RK_GRF_MACPHY_CON1   0xb04
> +#define RK_GRF_MACPHY_CON2   0xb08
> +#define RK_GRF_MACPHY_CON3   0xb0c
> +
> +#define RK_MACPHY_ENABLE          GRF_BIT(0)
> +#define RK_MACPHY_DISABLE         GRF_CLR_BIT(0)
> +#define RK_MACPHY_CFG_CLK_50M     GRF_BIT(14)
> +#define RK_GMAC2PHY_RMII_MODE     (GRF_BIT(6) | GRF_CLR_BIT(7))
> +#define RK_GRF_CON2_MACPHY_ID     HIWORD_UPDATE(0x1234, 0x, 0)
> +#define RK_GRF_CON3_MACPHY_ID     HIWORD_UPDATE(0x35, 0x3f, 0)

These are primarily registers for the rk3328 and come from the GRF, which is
somewhat prone to chip designers moving bits around in registers; the
register offsets (*_CONx) in particular will probably not stay the same
on future SoCs.
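
As background for the GRF_BIT()/GRF_CLR_BIT() values in the quoted
hunk, here is a user-space model of a write-masked register. It rests
on an assumption that is not shown in this patch: that the upper 16
bits of the written value select which of the lower 16 bits may change.
All SKETCH_* and sketch_* names are hypothetical.

```c
#include <stdint.h>

/* Assumed convention: bit (nr + 16) enables a write to bit nr. */
#define SKETCH_BIT(nr)		(UINT32_C(1) << (nr))
#define SKETCH_GRF_BIT(nr)	(SKETCH_BIT(nr) | SKETCH_BIT((nr) + 16))
#define SKETCH_GRF_CLR_BIT(nr)	(SKETCH_BIT((nr) + 16))

/*
 * Model one write to a 16-bit register with a write-enable mask in
 * the upper half of the written value: masked bits take the new
 * value, all other bits keep their old contents.
 */
uint16_t sketch_grf_write(uint16_t reg, uint32_t val)
{
	uint16_t mask = (uint16_t)(val >> 16);
	uint16_t data = (uint16_t)(val & 0xffffu);

	return (uint16_t)((reg & (uint16_t)~mask) | (data & mask));
}
```

Under that assumption, separate writes to the same register (as done
for MACPHY_CON0 in the quoted function) do not clobber each other,
because each write only touches the bits it names.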


> +static void rk_gmac_internal_phy_powerup(struct rk_priv_data *priv)
> +{
> + if (priv->ops->internal_phy_powerup)
> + priv->ops->internal_phy_powerup(priv);
> +
> + regmap_write(priv->grf, RK_GRF_MACPHY_CON0, RK_MACPHY_CFG_CLK_50M);
> + regmap_write(priv->grf, RK_GRF_MACPHY_CON0, RK_GMAC2PHY_RMII_MODE);
> +
> + regmap_write(priv->grf, RK_GRF_MACPHY_CON2, RK_GRF_CON2_MACPHY_ID);
> + regmap_write(priv->grf, RK_GRF_MACPHY_CON3, RK_GRF_CON3_MACPHY_ID);
> +
> + /* disable macphy, the default value is enabled */

that comment is not providing useful information, maybe
/* macphy needs to be disabled before trying to reset it */


> + regmap_write(priv->grf, RK_GRF_MACPHY_CON0, RK_MACPHY_DISABLE);
> + if (priv->macphy_reset)
> + reset_control_assert(priv->macphy_reset);
> + usleep_range(10, 20);
> + if (priv->macphy_reset)
> + reset_control_deassert(priv->macphy_reset);
> + usleep_range(10, 20);
> + regmap_w

RE: RXFH manual configuration

2017-06-23 Thread Keller, Jacob E



> -Original Message-
> From: Tariq Toukan [mailto:tar...@mellanox.com]
> Sent: Sunday, June 18, 2017 1:32 AM
> To: Keller, Jacob E 
> Cc: Linux Kernel Network Developers ; Saeed
> Mahameed ; Eran Ben Elisha
> 
> Subject: RXFH manual configuration
> 
> Hi Jacob,
> 
> I am looking at your patch:
> d4ab4286276f ethtool: correctly ensure {GS}CHANNELS doesn't conflict ...
> 
> I wonder - if I want to configure the number of channels (ethtool -L),
> without being aware to the history of indirection table (manually set or
> not), how can I know what is the expected behavior?
> I should be able, from userspace, to query the value of
> netif_is_rxfh_configured() in order to decide whether the outcome of an
> ethtool -L command is correct or not.
> 
> Do you know if this indication exists?
> 
> Thanks,
> Tariq

I mis-spoke earlier, this is set in priv_flags and is thus not shared with 
userspace. I'm not sure if there is any current way to tell the status.

Thanks,
Jake

RE: RXFH manual configuration

2017-06-23 Thread Keller, Jacob E

> -Original Message-
> From: Tariq Toukan [mailto:tar...@mellanox.com]
> Sent: Sunday, June 18, 2017 1:32 AM
> To: Keller, Jacob E 
> Cc: Linux Kernel Network Developers ; Saeed
> Mahameed ; Eran Ben Elisha
> 
> Subject: RXFH manual configuration
> 
> Hi Jacob,
> 
> I am looking at your patch:
> d4ab4286276f ethtool: correctly ensure {GS}CHANNELS doesn't conflict ...
> 
> I wonder - if I want to configure the number of channels (ethtool -L),
> without being aware to the history of indirection table (manually set or
> not), how can I know what is the expected behavior?
> I should be able, from userspace, to query the value of
> netif_is_rxfh_configured() in order to decide whether the outcome of an
> ethtool -L command is correct or not.
> 
> Do you know if this indication exists?
> 
> Thanks,
> Tariq

Hi,

I believe you can check the flags value, from /sys/class/net//flags, 
though you'll have to manually parse that. I don't think iproute2 suite shows 
this flag, but I suspect it could be modified to do so. Specifically you check 
for IFF_RXFH_CONFIGURED
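
The manual parsing could look roughly like the sketch below. It assumes the
sysfs file holds a single hexadecimal value such as 0x1003.
parse_iface_flags() and flag_is_set() are hypothetical helper names, and
EXAMPLE_IFF_UP is used as the tested bit only because it is known to be part
of the user-visible flags. The sketch does not establish that
IFF_RXFH_CONFIGURED is visible in that file.

```c
#include <errno.h>
#include <stdlib.h>

#define EXAMPLE_IFF_UP 0x1ul /* same value as IFF_UP in <net/if.h> */

/* Parse the text read from the flags file; returns 0 on success. */
static int parse_iface_flags(const char *text, unsigned long *flags)
{
	char *end;

	errno = 0;
	*flags = strtoul(text, &end, 16); /* base 16 accepts a "0x" prefix */
	if (errno || end == text)
		return -1;
	return 0;
}

/* Test one bit of the parsed value. */
static int flag_is_set(unsigned long flags, unsigned long bit)
{
	return (flags & bit) != 0;
}
```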

Thanks,
Jake

Re: fs, net: deadlock between bind/splice on af_unix

2017-06-23 Thread Cong Wang

Hi,

On Thu, Jun 22, 2017 at 10:49 AM,   wrote:
> I was getting below crash while running mp4.

Are you sure your 3.14 kernel has my patch in this thread?
commit 0fb44559ffd67de8517098 is merged in 4.10.

Also, your crash is on unix_dgram_sendmsg() path, not
unix_bind().

Re: [PATCH 05/11] net: stmmac: dwmac-rk: Add internal phy support

2017-06-23 Thread Florian Fainelli

On 06/22/2017 09:59 PM, David Wu wrote:
> To make internal phy worked, need to configure the phy_clock,
> phy cru_reset and related registers.
> 
> Change-Id: I6971c0a769754b824b1b908b56080cbaf7867d13
> Signed-off-by: David Wu 
> ---
>  .../devicetree/bindings/net/rockchip-dwmac.txt |  3 +
>  drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c | 82 
> ++
>  2 files changed, 85 insertions(+)
> 
> diff --git a/Documentation/devicetree/bindings/net/rockchip-dwmac.txt 
> b/Documentation/devicetree/bindings/net/rockchip-dwmac.txt
> index 8f42755..0514f69 100644
> --- a/Documentation/devicetree/bindings/net/rockchip-dwmac.txt
> +++ b/Documentation/devicetree/bindings/net/rockchip-dwmac.txt
> @@ -22,6 +22,7 @@ Required properties:
>  <&cru SCLK_MACREF_OUT> clock gate for RMII reference clock output
>  <&cru ACLK_GMAC>: AXI clock gate for GMAC
>  <&cru PCLK_GMAC>: APB clock gate for GMAC
> +<&cru MAC_PHY>: clock for internal macphy
>   - clock-names: One name for each entry in the clocks property.
>   - phy-mode: See ethernet.txt file in the same directory.
>   - pinctrl-names: Names corresponding to the numbered pinctrl states.
> @@ -35,6 +36,8 @@ Required properties:
>   - assigned-clocks: main clock, should be <&cru SCLK_MAC>;
>   - assigned-clock-parents = parent of main clock.
> can be <&ext_gmac> or <&cru SCLK_MAC_PLL>.
> + - phy-type: For internal phy, it must be "internal"; For external phy, no 
> need
> +   to configure this.

Use the standard "phy-mode" property. You will see
drivers/net/ethernet/broadcom/genet/ actually define a phy-mode =
"internal" property specifically for that. This should probably be
generalized so it is useful to other drivers as well, I will do just that.
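
As a rough illustration of keying the decision off "phy-mode" rather than a
new property, here is a userland sketch. The enum and both functions are
hypothetical names; a real driver would get the mode from the kernel's
device tree helpers (for example of_get_phy_mode()) instead of comparing
strings itself.

```c
#include <string.h>

/* Hypothetical subset of interface modes, for illustration only. */
enum ex_phy_mode {
	EX_MODE_UNKNOWN = 0,
	EX_MODE_RMII,
	EX_MODE_RGMII,
	EX_MODE_INTERNAL,
};

/* Map a "phy-mode" string to the enum above. */
static enum ex_phy_mode ex_parse_phy_mode(const char *s)
{
	if (!s)
		return EX_MODE_UNKNOWN;
	if (!strcmp(s, "rmii"))
		return EX_MODE_RMII;
	if (!strcmp(s, "rgmii"))
		return EX_MODE_RGMII;
	if (!strcmp(s, "internal"))
		return EX_MODE_INTERNAL;
	return EX_MODE_UNKNOWN;
}

/* The driver-side check then no longer needs a separate "phy-type". */
static int ex_has_internal_phy(const char *phy_mode)
{
	return ex_parse_phy_mode(phy_mode) == EX_MODE_INTERNAL;
}
```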

>  
>  Optional properties:
>   - tx_delay: Delay value for TXD timing. Range value is 0~0x7F, 0x30 as 
> default.
> diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c 
> b/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c
> index a8e8fd5..c1a1413 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c
> @@ -41,6 +41,7 @@ struct rk_gmac_ops {
>   void (*set_to_rmii)(struct rk_priv_data *bsp_priv);
>   void (*set_rgmii_speed)(struct rk_priv_data *bsp_priv, int speed);
>   void (*set_rmii_speed)(struct rk_priv_data *bsp_priv, int speed);
> + void (*internal_phy_powerup)(struct rk_priv_data *bsp_priv);
>  };
>  
>  struct rk_priv_data {
> @@ -52,6 +53,7 @@ struct rk_priv_data {
>  
>   bool clk_enabled;
>   bool clock_input;
> + bool internal_phy;
>  
>   struct clk *clk_mac;
>   struct clk *gmac_clkin;
> @@ -61,6 +63,9 @@ struct rk_priv_data {
>   struct clk *clk_mac_refout;
>   struct clk *aclk_mac;
>   struct clk *pclk_mac;
> + struct clk *clk_macphy;
> +
> + struct reset_control *macphy_reset;
>  
>   int tx_delay;
>   int rx_delay;
> @@ -750,6 +755,48 @@ static void rk3399_set_rmii_speed(struct rk_priv_data 
> *bsp_priv, int speed)
>   .set_rmii_speed = rk3399_set_rmii_speed,
>  };
>  
> +#define RK_GRF_MACPHY_CON0   0xb00
> +#define RK_GRF_MACPHY_CON1   0xb04
> +#define RK_GRF_MACPHY_CON2   0xb08
> +#define RK_GRF_MACPHY_CON3   0xb0c
> +
> +#define RK_MACPHY_ENABLE GRF_BIT(0)
> +#define RK_MACPHY_DISABLEGRF_CLR_BIT(0)
> +#define RK_MACPHY_CFG_CLK_50MGRF_BIT(14)
> +#define RK_GMAC2PHY_RMII_MODE(GRF_BIT(6) | GRF_CLR_BIT(7))
> +#define RK_GRF_CON2_MACPHY_IDHIWORD_UPDATE(0x1234, 0x, 0)
> +#define RK_GRF_CON3_MACPHY_IDHIWORD_UPDATE(0x35, 0x3f, 0)
> +
> +static void rk_gmac_internal_phy_powerup(struct rk_priv_data *priv)
> +{
> + if (priv->ops->internal_phy_powerup)
> + priv->ops->internal_phy_powerup(priv);
> +
> + regmap_write(priv->grf, RK_GRF_MACPHY_CON0, RK_MACPHY_CFG_CLK_50M);
> + regmap_write(priv->grf, RK_GRF_MACPHY_CON0, RK_GMAC2PHY_RMII_MODE);
> +
> + regmap_write(priv->grf, RK_GRF_MACPHY_CON2, RK_GRF_CON2_MACPHY_ID);
> + regmap_write(priv->grf, RK_GRF_MACPHY_CON3, RK_GRF_CON3_MACPHY_ID);
> +
> + /* disable macphy, the default value is enabled */
> + regmap_write(priv->grf, RK_GRF_MACPHY_CON0, RK_MACPHY_DISABLE);
> + if (priv->macphy_reset)
> + reset_control_assert(priv->macphy_reset);
> + usleep_range(10, 20);
> + if (priv->macphy_reset)
> + reset_control_deassert(priv->macphy_reset);
> + usleep_range(10, 20);
> + regmap_write(priv->grf, RK_GRF_MACPHY_CON0, RK_MACPHY_ENABLE);
> + msleep(30);
> +}
> +
> +static void rk_gmac_internal_phy_powerdown(struct rk_priv_data *priv)
> +{
> + regmap_write(priv->grf, RK_GRF_MACPHY_CON0, RK_MACPHY_DISABLE);
> + if (priv->macphy_reset)
> + reset_control_assert(priv->macphy_reset);
> +}
> +
>  static int gmac_clk_init(struct rk_priv_data *bsp_priv)
>  {
>   struct

Re: [PATCH 01/11] net: phy: Add rockchip phy driver support

2017-06-23 Thread Florian Fainelli

On 06/22/2017 09:41 PM, David Wu wrote:
> Support internal ephy currently.
> 
> Signed-off-by: David Wu 
> ---
>  drivers/net/phy/Kconfig|  4 ++
>  drivers/net/phy/Makefile   |  1 +
>  drivers/net/phy/rockchip.c | 94 
> ++
>  3 files changed, 99 insertions(+)
>  create mode 100644 drivers/net/phy/rockchip.c
> 
> diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig
> index c360dd6..86010d4 100644
> --- a/drivers/net/phy/Kconfig
> +++ b/drivers/net/phy/Kconfig
> @@ -350,6 +350,10 @@ config XILINX_GMII2RGMII
>   the Reduced Gigabit Media Independent Interface(RGMII) between
>   Ethernet physical media devices and the Gigabit Ethernet controller.
>  
> +config ROCKCHIP_MAC_PHY
> + tristate "Drivers for ROCKCHIP MAC PHY"
> + ---help---
> +   Currently supports the mac internal ephy.
>  endif # PHYLIB
>  
>  config MICREL_KS8995MA
> diff --git a/drivers/net/phy/Makefile b/drivers/net/phy/Makefile
> index e36db9a..6d96779 100644
> --- a/drivers/net/phy/Makefile
> +++ b/drivers/net/phy/Makefile
> @@ -69,3 +69,4 @@ obj-$(CONFIG_STE10XP)   += ste10Xp.o
>  obj-$(CONFIG_TERANETICS_PHY) += teranetics.o
>  obj-$(CONFIG_VITESSE_PHY)+= vitesse.o
>  obj-$(CONFIG_XILINX_GMII2RGMII) += xilinx_gmii2rgmii.o
> +obj-$(CONFIG_ROCKCHIP_MAC_PHY)   += rockchip.o
> diff --git a/drivers/net/phy/rockchip.c b/drivers/net/phy/rockchip.c
> new file mode 100644
> index 000..69e96ec
> --- /dev/null
> +++ b/drivers/net/phy/rockchip.c
> @@ -0,0 +1,94 @@
> +/**
> + * Rockchip mac phy driver

MAC and PHY, capitalized.

> + *
> + * Copyright (c) 2017, Fuzhou Rockchip Electronics Co., Ltd
> + *
> + * David Wu
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + */
> +
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +static int internal_config_init(struct phy_device *phydev)
> +{
> + int val;
> + u32 features;
> +
> + /*enable auto mdix*/
> + phy_write(phydev, 0x11, 0x0080);

That is probably the only meaningful change needed by this driver, and
even that is not quite correct because auto MDI-X can be changed from
user-space through ethtool, see
drivers/net/phy/marvell.c::marvell_config_aneg()
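
To illustrate the point, a config_aneg-style helper would translate the mode
requested through ethtool into a register value instead of hard-coding auto
MDI-X. This is a userland sketch under assumptions: the EX_TP_* constants
mirror the numeric values of ETH_TP_MDI* in <linux/ethtool.h>, 0x0080 is the
value the patch writes to register 0x11, and EX_MDIX_FORCE_X as well as
ex_mdix_to_reg() are hypothetical (the real bit layout is not given in this
thread).

```c
#include <errno.h>
#include <stdint.h>

/* Same numeric values as ETH_TP_MDI* in <linux/ethtool.h>. */
enum {
	EX_TP_MDI_INVALID = 0,
	EX_TP_MDI = 1,
	EX_TP_MDI_X = 2,
	EX_TP_MDI_AUTO = 3,
};

#define EX_MDIX_AUTO    0x0080 /* value written by the patch */
#define EX_MDIX_FORCE_X 0x0040 /* hypothetical bit, for illustration */

/* Translate the requested MDI mode into a register value. */
static int ex_mdix_to_reg(int mdix_ctrl, uint16_t *reg)
{
	switch (mdix_ctrl) {
	case EX_TP_MDI:
		*reg = 0;
		return 0;
	case EX_TP_MDI_X:
		*reg = EX_MDIX_FORCE_X;
		return 0;
	case EX_TP_MDI_AUTO:
		*reg = EX_MDIX_AUTO;
		return 0;
	default:
		return -EINVAL; /* unknown or invalid request */
	}
}
```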

> +
> + features = (SUPPORTED_TP | SUPPORTED_MII
> + | SUPPORTED_AUI | SUPPORTED_FIBRE |
> + SUPPORTED_BNC);

This is not necessary; use driver::features set to PHY_GBIT_FEATURES instead.

> +
> + /* Do we support autonegotiation? */
> + val = phy_read(phydev, MII_BMSR);
> + if (val < 0)
> + return val;
> +
> + if (val & BMSR_ANEGCAPABLE)
> + features |= SUPPORTED_Autoneg;

If we have disabled auto-negotiation prior to probing this driver, and
somehow the PHY is not reset, then you are falsely not advertising
support for auto-negotiation just because it *currently is* disabled.

> +
> + if (val & BMSR_100FULL)
> + features |= SUPPORTED_100baseT_Full;
> + if (val & BMSR_100HALF)
> + features |= SUPPORTED_100baseT_Half;
> + if (val & BMSR_10FULL)
> + features |= SUPPORTED_10baseT_Full;
> + if (val & BMSR_10HALF)
> + features |= SUPPORTED_10baseT_Half;
> +
> + if (val & BMSR_ESTATEN) {
> + val = phy_read(phydev, MII_ESTATUS);
> + if (val < 0)
> + return val;
> +
> + if (val & ESTATUS_1000_TFULL)
> + features |= SUPPORTED_1000baseT_Full;
> + if (val & ESTATUS_1000_THALF)
> + features |= SUPPORTED_1000baseT_Half;
> + }
> +
> + phydev->supported = features;
> + phydev->advertising = features;
> +
> + return 0;
> +}
> +
> +static struct phy_driver rockchip_phy_driver[] = {
> +{
> + .phy_id = 0x1234d400,
> + .phy_id_mask= 0x,

The last 4 bits are supposed to hold the revision, do you really need to
have such a strict mask here?
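
For illustration, the effect of the mask on matching can be sketched as
below. ex_phy_id_matches() is a hypothetical name that mirrors the usual
masked comparison. The mask in the quoted patch is truncated, so the strict
0xffffffff and the relaxed 0xfffffff0 used in the usage note are assumptions.

```c
#include <stdint.h>

/* A driver entry matches when the IDs agree on all bits set in the mask. */
static int ex_phy_id_matches(uint32_t hw_id, uint32_t drv_id, uint32_t mask)
{
	return (hw_id & mask) == (drv_id & mask);
}
```

With drv_id 0x1234d400, a chip reporting 0x1234d401 (revision 1) matches
under 0xfffffff0 but not under 0xffffffff.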

> + .name   = "rockchip internal ephy",
> + .features   = 0,

features should be set to what you support: PHY_GBIT_FEATURES

> + .config_init= internal_config_init,
> + .config_aneg= genphy_config_aneg,
> + .read_status= genphy_read_status,
> + .suspend= genphy_suspend,
> + .resume = genphy_resume,
> +},
> +};
> +
> +module_phy_driver(rockchip_phy_driver);
> +
> +static struct mdio_device_id __maybe_unused rockchip_phy_tbl[] = {
> + { 0x1234d400, 0x },
> + { }
> +};
> +
> +MODULE_DEVICE_TABLE(mdio, rockchip_phy_tbl);
> +
> +MODULE_AUTHOR("David Wu");
> +MODULE_DESCRIPTION("Rockchip mac phy driver");
> +MODULE_LICENSE("GPL v2");
> 


-- 
Florian

Re: [PATCH -net] tls: return -EFAULT if copy_to_user() fails

2017-06-23 Thread Dave Watson

On 06/23/17 01:15 PM, Dan Carpenter wrote:
> The copy_to_user() function returns the number of bytes remaining but we
> want to return -EFAULT here.
> 
> Fixes: 3c4d7559159b ("tls: kernel TLS support")
> Signed-off-by: Dan Carpenter 

Acked-by: Dave Watson 

Yes, -EFAULT seems like the correct choice here, the return from
copy_to_user isn't useful.  Thanks
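
The pattern can be shown in userland with a stand-in for copy_to_user().
fake_copy_to_user() and ex_getsockopt() are hypothetical names, and
fail_after only exists to simulate a partial copy. The sketch reproduces the
convention that the copy routine returns the number of bytes it could not
copy.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for copy_to_user(): returns the number of bytes NOT copied. */
static size_t fake_copy_to_user(void *dst, const void *src, size_t len,
				size_t fail_after)
{
	size_t n = len < fail_after ? len : fail_after;

	memcpy(dst, src, n);
	return len - n;
}

/* Any non-zero remainder is reported as -EFAULT, never as a byte count. */
static int ex_getsockopt(void *optval, const void *info, size_t len,
			 size_t fail_after)
{
	if (fake_copy_to_user(optval, info, len, fail_after))
		return -EFAULT;
	return 0;
}
```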

> 
> diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
> index 2ebc328bda96..a03130a47b85 100644
> --- a/net/tls/tls_main.c
> +++ b/net/tls/tls_main.c
> @@ -273,7 +273,8 @@ static int do_tls_getsockopt_tx(struct sock *sk, char 
> __user *optval,
>   }
>  
>   if (len == sizeof(crypto_info)) {
> - rc = copy_to_user(optval, crypto_info, sizeof(*crypto_info));
> + if (copy_to_user(optval, crypto_info, sizeof(*crypto_info)))
> + rc = -EFAULT;
>   goto out;
>   }
>  
> @@ -293,9 +294,10 @@ static int do_tls_getsockopt_tx(struct sock *sk, char 
> __user *optval,
>   memcpy(crypto_info_aes_gcm_128->iv, ctx->iv,
>  TLS_CIPHER_AES_GCM_128_IV_SIZE);
>   release_sock(sk);
> - rc = copy_to_user(optval,
> -   crypto_info_aes_gcm_128,
> -   sizeof(*crypto_info_aes_gcm_128));
> + if (copy_to_user(optval,
> +  crypto_info_aes_gcm_128,
> +  sizeof(*crypto_info_aes_gcm_128)))
> + rc = -EFAULT;
>   break;
>   }
>   default:

[PATCH net] net: dp83640: Avoid NULL pointer dereference.

2017-06-23 Thread Richard Cochran

The function, skb_complete_tx_timestamp(), used to allow passing in a
NULL pointer for the time stamps, but that was changed in commit
62bccb8cdb69051b95a55ab0c489e3cab261c8ef ("net-timestamp: Make the
clone operation stand-alone from phy timestamping"), and the existing
call sites, all of which are in the dp83640 driver, were fixed up.

Even though the kernel-doc was subsequently updated in commit
7a76a021cd5a292be875fbc616daf03eab1e6996 ("net-timestamp: Update
skb_complete_tx_timestamp comment"), still a bug fix from Manfred
Rudigier came into the driver using the old semantics.  Probably
Manfred derived that patch from an older kernel version.

This fix should be applied to the stable trees as well.

Fixes: 81e8f2e930fe ("net: dp83640: Fix tx timestamp overflow handling.")
Signed-off-by: Richard Cochran 
---
 drivers/net/phy/dp83640.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/phy/dp83640.c b/drivers/net/phy/dp83640.c
index ed0d10f54f26..c3065236ffcc 100644
--- a/drivers/net/phy/dp83640.c
+++ b/drivers/net/phy/dp83640.c
@@ -908,7 +908,7 @@ static void decode_txts(struct dp83640_private *dp83640,
if (overflow) {
pr_debug("tx timestamp queue overflow, count %d\n", overflow);
while (skb) {
-   skb_complete_tx_timestamp(skb, NULL);
+   kfree_skb(skb);
skb = skb_dequeue(&dp83640->tx_queue);
}
return;
-- 
2.11.0
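
The shape of the fixed loop can be modelled in userland as follows. This is
a sketch with hypothetical types (struct ex_buf, struct ex_queue) standing in
for sk_buff and the driver's tx queue: on overflow every pending buffer is
simply freed, without calling a completion routine that requires a valid
timestamp.

```c
#include <stdlib.h>

struct ex_buf {
	struct ex_buf *next;
};

struct ex_queue {
	struct ex_buf *head;
};

/* Remove and return the first queued buffer, or NULL if the queue is empty. */
static struct ex_buf *ex_dequeue(struct ex_queue *q)
{
	struct ex_buf *b = q->head;

	if (b)
		q->head = b->next;
	return b;
}

/* Free the current buffer and everything still queued; return the count. */
static unsigned int ex_drain(struct ex_queue *q, struct ex_buf *first)
{
	unsigned int freed = 0;
	struct ex_buf *b = first;

	while (b) {
		free(b);
		freed++;
		b = ex_dequeue(q);
	}
	return freed;
}
```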

Re: [PATCH net-next] udp: fix poll()

2017-06-23 Thread David Miller

From: Paolo Abeni 
Date: Fri, 23 Jun 2017 14:19:51 +0200

> Michael reported an UDP breakage caused by the commit b65ac44674dd
> ("udp: try to avoid 2 cache miss on dequeue").
> The function __first_packet_length() can update the checksum bits
> of the pending skb, making the scratched area out-of-sync, and
> setting skb->csum, if the skb was previously in need of checksum
> validation.
> 
> On later recvmsg() for such skb, checksum validation will be
> invoked again - due to the wrong udp_skb_csum_unnecessary()
> value - and will fail, causing the valid skb to be dropped.
> 
> This change addresses the issue refreshing the scratch area in
> __first_packet_length() after the possible checksum update.
> 
> Fixes: b65ac44674dd ("udp: try to avoid 2 cache miss on dequeue")
> Reported-by: Michael Ellerman 
> Signed-off-by: Hannes Frederic Sowa 
> Signed-off-by: Paolo Abeni 

Thanks for fixing this so quickly, applied.

[PATCH net-next] net: dsa: mv88e6xxx: fix error code in mv88e6390_serdes_power()

2017-06-23 Thread Dan Carpenter

We're accidentally returning the wrong variable.  "cmode" is
uninitialized at this point so it causes a static checker warning.

Fixes: 6335e9f2446b ("net: dsa: mv88e6xxx: mv88e6390X SERDES support")
Signed-off-by: Dan Carpenter 

diff --git a/drivers/net/dsa/mv88e6xxx/serdes.c 
b/drivers/net/dsa/mv88e6xxx/serdes.c
index 411b4f522792..f3c01119b3d1 100644
--- a/drivers/net/dsa/mv88e6xxx/serdes.c
+++ b/drivers/net/dsa/mv88e6xxx/serdes.c
@@ -192,7 +192,7 @@ int mv88e6390_serdes_power(struct mv88e6xxx_chip *chip, int 
port, bool on)
 
err = mv88e6xxx_port_get_cmode(chip, port, &cmode);
if (err)
-   return cmode;
+   return err;
 
switch (port) {
case 2:
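
The bug class is easy to reproduce outside the driver. In the sketch below,
ex_get_cmode() and ex_serdes_power() are hypothetical stand-ins. The getter
leaves its out-parameter untouched on failure, so the caller has to
propagate err and must not read cmode on that path.

```c
#include <errno.h>

/* On failure the out-parameter is left untouched (still uninitialized). */
static int ex_get_cmode(int port, unsigned char *cmode)
{
	if (port < 0)
		return -EINVAL;
	*cmode = 0x9; /* arbitrary example value */
	return 0;
}

static int ex_serdes_power(int port, unsigned char *mode_out)
{
	unsigned char cmode;
	int err;

	err = ex_get_cmode(port, &cmode);
	if (err)
		return err; /* not cmode, which holds no valid value here */

	*mode_out = cmode;
	return 0;
}
```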
