Re: TCP fast retransmit issues

2017-07-27 Thread Christoph Paasch
Hello,

On Wed, Jul 26, 2017 at 7:32 AM, Eric Dumazet  wrote:
> On Wed, 2017-07-26 at 15:42 +0200, Willy Tarreau wrote:
>> On Wed, Jul 26, 2017 at 06:31:21AM -0700, Eric Dumazet wrote:
>> > On Wed, 2017-07-26 at 14:18 +0200, Klavs Klavsen wrote:
>> > > the 192.168.32.44 is a Centos 7 box.
>> >
>> > Could you grab a capture on this box, to see if the bogus packets are
>> > sent by it, or later mangled by a middle box ?
>>
>> Given the huge difference between the window and the ranges of the
>> values in the SACK field, I'm pretty sure there's a firewall doing
>> some sequence numbers randomization in the middle, not aware of SACK
>> and not converting these ones. I've had to disable such broken
>> features more than once in field after similar observations! Probably
>> that the Mac doesn't advertise SACK support and doesn't experience the
>> problem.
>
> We need to check RFC if such invalid SACK blocks should be ignored (DUP
> ACK would be processed and trigger fast retransmit anyway), or strongly
> validated (as I suspect we currently do), leading to a total freeze.

This issue with sequence-number-randomizing middleboxes already came up
quite some time ago
(http://marc.info/?l=netfilter-devel&m=137691623129822&w=2). From what
I remember, the RFC does not say that invalid SACK blocks should be
strongly validated. So triggering a dup-ack retransmission seems fine.

I had some patches at the time that ignored invalid sack-blocks and
allowed fast-retransmit to happen thanks to the duplicate acks:
https://patchwork.ozlabs.org/patch/268297/
https://patchwork.ozlabs.org/patch/268298/
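
A rough, userspace-only sketch of the behaviour discussed here: SACK blocks that fall outside the range of outstanding data are ignored rather than treated as fatal, and duplicate ACKs still count toward fast retransmit. The struct and function names are made up for illustration and this is not the kernel's code; the threshold of three duplicate ACKs is the conventional value.

```c
#include <stdbool.h>
#include <stdint.h>

#define DUPACK_THRESHOLD 3 /* conventional fast-retransmit threshold */

/* Hypothetical types, not kernel structures. */
struct sack_block {
	uint32_t start;
	uint32_t end;
};

struct sender_state {
	uint32_t snd_una;  /* oldest unacknowledged sequence number */
	uint32_t snd_nxt;  /* next sequence number to be sent */
	unsigned int dupacks;
};

/* True if a comes before b in 32-bit wrapping sequence space. */
static bool seq_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/* A block is plausible only if snd_una <= start < end <= snd_nxt. */
static bool sack_block_plausible(const struct sender_state *s,
				 const struct sack_block *b)
{
	return !seq_before(b->start, s->snd_una) &&
	       seq_before(b->start, b->end) &&
	       !seq_before(s->snd_nxt, b->end);
}

/*
 * Process one incoming ACK. Implausible SACK blocks are only skipped;
 * they never prevent the duplicate ACK from being counted.
 * Returns true when the segment at snd_una should be fast-retransmitted.
 */
static bool on_ack(struct sender_state *s, uint32_t ack,
		   const struct sack_block *blocks, int nblocks,
		   int *valid_blocks)
{
	int i;

	*valid_blocks = 0;
	for (i = 0; i < nblocks; i++)
		if (sack_block_plausible(s, &blocks[i]))
			(*valid_blocks)++;

	/* The ACK advances snd_una: new data was acknowledged. */
	if (seq_before(s->snd_una, ack) && !seq_before(s->snd_nxt, ack)) {
		s->snd_una = ack;
		s->dupacks = 0;
		return false;
	}

	/* Duplicate ACK while data is still outstanding. */
	if (ack == s->snd_una && seq_before(s->snd_una, s->snd_nxt)) {
		s->dupacks++;
		return s->dupacks == DUPACK_THRESHOLD;
	}

	return false;
}
```

With a middlebox that rewrites sequence numbers but not SACK options, every block would fail the plausibility check, yet the third duplicate ACK would still trigger a retransmission instead of a stall.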


Cheers,
Christoph


Re: [PATCH v2 05/11] net: stmmac: dwmac-rk: Add internal phy support

2017-07-27 Thread David.Wu

Hi Florian,

On 2017/7/28 0:54, Florian Fainelli wrote:

- if you need knowledge about this PHY connection type prior to binding
the PHY device and its driver (that is, before of_phy_connect()) we
could add a boolean property e.g: "phy-is-internal" that allows us to
know that, or we can have a new phy-mode value, e.g: "internal-rmii"
which describes that, either way would probably be fine, but the former
scales better



Using "phy-is-internal" is very helpful, but it is easy to confuse with
a real internal PHY. Could we use other wording, like "phy is inlined"? 🙂



Then again, using phy-mode = "internal" even though this is Reduced MII
is not big of a deal IMHO as long as there is no loss of information and
that internal de-facto means internal reduced MII for instance.
--




Re: [PATCH v2 05/11] net: stmmac: dwmac-rk: Add internal phy support

2017-07-27 Thread David.Wu

Hi Andrew,

On 2017/7/27 21:48, Andrew Lunn wrote:

I think we need to discuss this. This PHY appears to be on an MDIO
bus, it uses a standard PHY driver, and it appears to be using an RMII
interface. So it is just an ordinary PHY.

Internal is supposed to be something which is not ordinary, does not
use one of the standard phy modes, needs something special to make it
work.


Yes, it is in fact an ordinary PHY, using the MDIO bus, but it is an internal
PHY inside the SoC, so this "internal" is not the "internal" Florian described.




Re: TCP fast retransmit issues

2017-07-27 Thread Klavs Klavsen

The network guys know what caused it.

Apparently on (at least some) Cisco equipment the feature:

TCP Sequence Number Randomization

is enabled by default.


It would most definitely be beneficial if Linux handled SACK "not
working" better than it does - but then I might never have found the
culprit who destroyed SACK :)


Willy Tarreau wrote on 26-07-2017 16:38:

On Wed, Jul 26, 2017 at 04:25:29PM +0200, Klavs Klavsen wrote:

Thank you very much guys for your insight.. its highly appreciated.

Next up for me, is waiting till the network guys come back from summer
vacation, and convince them to sniff on the devices in between to pinpoint
the culprit :)

That said, Eric, I'm a bit surprised that it completely stalls. Shouldn't
the sender end up retransmitting unacked segments after seeing a certain
number of ACKs not making progress ? Or maybe this is disabled when SACKs
are in use but it seems to me that once invalid SACKs are ignored we should
ideally fall back to the normal way to deal with losses. Here the server
ACKed 3903858556 for the first time at 15:59:54.292743 and repeated this
one 850 times till 16:01:17.296407 but the client kept sending past this
point probably due to a huge window, so this looks suboptimal to me.

Willy



--
Regards,
Klavs Klavsen, GSEC - k...@vsen.dk - http://www.vsen.dk - Tlf. 61281200

"Those who do not understand Unix are condemned to reinvent it, poorly."
  --Henry Spencer


Re: After a while of system running no incoming UDP any more?

2017-07-27 Thread Marc Haber
On Mon, Jul 24, 2017 at 04:19:10PM +0200, Paolo Abeni wrote:
> Once that a system enter the buggy status, do the packets reach the
> relevant socket's queue?
> 
> ss -u
> nstat |grep -e Udp -e Ip
> 
> will help checking that.

I now have the issue on one machine, a Xen guest acting as authoritative
nameserver for my domains. Here are the outputs during normal use, with
artificial queries coming in on eth0:

[9/1075]mh@impetus:~ $ ss -u
Recv-Q Send-Q Local Address:Port        Peer Address:Port
0      0      127.0.0.1:56547           127.0.0.1:domain
0      0      216.231.132.60:27667      198.41.0.4:domain
0      0      216.231.132.60:44121      8.8.8.8:domain
0      0      216.231.132.60:29814      198.41.0.4:domain
[10/1076]mh@impetus:~ $ ss -u
Recv-Q Send-Q Local Address:Port        Peer Address:Port
[11/1076]mh@impetus:~ $ ss -u
Recv-Q Send-Q Local Address:Port        Peer Address:Port
[12/1076]mh@impetus:~ $ ss -u
Recv-Q Send-Q Local Address:Port        Peer Address:Port
[13/1076]mh@impetus:~ $ ss -u
Recv-Q Send-Q Local Address:Port        Peer Address:Port
[14/1076]mh@impetus:~ $ nstat  | grep -e Udp -e Ip
IpInReceives            400688      0.0
IpInAddrErrors          18567       0.0
IpInUnknownProtos       3           0.0
IpInDelivers            330634      0.0
IpOutRequests           283637      0.0
UdpInDatagrams          145860      0.0
UdpNoPorts              1313        0.0
UdpInErrors             9356        0.0
UdpOutDatagrams         153093      0.0
UdpIgnoredMulti         34148       0.0
Ip6InReceives           161178      0.0
Ip6InNoRoutes           8           0.0
Ip6InDelivers           73841       0.0
Ip6OutRequests          77575       0.0
Ip6InMcastPkts          87332       0.0
Ip6OutMcastPkts         109         0.0
Ip6InOctets             21880674    0.0
Ip6OutOctets            9633059     0.0
Ip6InMcastOctets        9371483     0.0
Ip6OutMcastOctets       6636        0.0
Ip6InNoECTPkts          161202      0.0
Ip6InECT1Pkts           15          0.0
Ip6InECT0Pkts           11          0.0
Ip6InCEPkts             4           0.0
Udp6InDatagrams         11725       0.0
Udp6NoPorts             2           0.0
Udp6InErrors            1989        0.0
Udp6OutDatagrams        14483       0.0
IpExtInBcastPkts        34148       0.0
IpExtInOctets           47462716    0.0
IpExtOutOctets          31262696    0.0
IpExtInBcastOctets      7476059     0.0
IpExtInNoECTPkts        400178      0.0
IpExtInECT1Pkts         22          0.0
IpExtInECT0Pkts         481         0.0
IpExtInCEPkts           14          0.0
[15/1077]mh@impetus:~ $ nstat  | grep -e Udp -e Ip
IpInReceives            25          0.0
IpInDelivers            25          0.0
IpOutRequests           16          0.0
UdpInDatagrams          1           0.0
UdpInErrors             24          0.0
UdpOutDatagrams         16          0.0
Ip6InReceives           15          0.0
Ip6InDelivers           14          0.0
Ip6OutRequests          12          0.0
Ip6InMcastPkts          1           0.0
Ip6InOctets             1219        0.0
Ip6OutOctets            4384        0.0
Ip6InMcastOctets        131         0.0
Ip6InNoECTPkts          15          0.0
IpExtInOctets           11779       0.0
IpExtOutOctets          1023        0.0
IpExtInNoECTPkts        25          0.0
[16/1

[PATCH] netfilter: SYNPROXY: fix process non tcp packet bug in {ipv4,ipv6}_synproxy_hook

2017-07-27 Thread Lin Zhang
In {ipv4,ipv6}_synproxy_hook we expect a normal TCP packet, but the
real server may reply with an ICMP error packet related to the existing
TCP conntrack, in which case we would access wrong TCP data.

To fix this, simply accept IP_CT_RELATED_REPLY packets.

Signed-off-by: Lin Zhang 
---
 net/ipv4/netfilter/ipt_SYNPROXY.c  | 2 +-
 net/ipv6/netfilter/ip6t_SYNPROXY.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/netfilter/ipt_SYNPROXY.c b/net/ipv4/netfilter/ipt_SYNPROXY.c
index f1528f7..3971fd9 100644
--- a/net/ipv4/netfilter/ipt_SYNPROXY.c
+++ b/net/ipv4/netfilter/ipt_SYNPROXY.c
@@ -330,7 +330,7 @@ static unsigned int ipv4_synproxy_hook(void *priv,
if (synproxy == NULL)
return NF_ACCEPT;
 
-   if (nf_is_loopback_packet(skb))
+   if (nf_is_loopback_packet(skb) || ctinfo == IP_CT_RELATED_REPLY)
return NF_ACCEPT;
 
thoff = ip_hdrlen(skb);
diff --git a/net/ipv6/netfilter/ip6t_SYNPROXY.c b/net/ipv6/netfilter/ip6t_SYNPROXY.c
index ce203dd..c4bcefe 100644
--- a/net/ipv6/netfilter/ip6t_SYNPROXY.c
+++ b/net/ipv6/netfilter/ip6t_SYNPROXY.c
@@ -347,7 +347,7 @@ static unsigned int ipv6_synproxy_hook(void *priv,
if (synproxy == NULL)
return NF_ACCEPT;
 
-   if (nf_is_loopback_packet(skb))
+   if (nf_is_loopback_packet(skb) || ctinfo == IP_CT_RELATED_REPLY)
return NF_ACCEPT;
 
nexthdr = ipv6_hdr(skb)->nexthdr;
-- 
1.8.3.1
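
The idea behind the fix can be sketched outside the kernel as follows. This is only an illustration under assumptions: the enums, the packet struct and the function name are hypothetical stand-ins, not netfilter definitions, and the drop for short or non-TCP packets is an assumed policy for the sketch. The point it shows is that the conntrack state is checked before anything is interpreted as a TCP header.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for conntrack state and hook verdicts. */
enum ct_info { CT_ESTABLISHED, CT_RELATED_REPLY };
enum verdict { VERDICT_ACCEPT, VERDICT_DROP, VERDICT_PROCESS_TCP };

#define PROTO_TCP	6
#define TCP_MIN_HDR	20 /* minimal TCP header length in bytes */

/* Simplified view of a packet: layer-4 protocol and payload length. */
struct pkt {
	uint8_t l4_proto;
	size_t l4_len;
};

static enum verdict synproxy_hook_sketch(const struct pkt *p,
					 enum ct_info ctinfo)
{
	/*
	 * A "related reply" is typically an ICMP error that refers to the
	 * tracked TCP flow. It carries no TCP header of its own, so let it
	 * through before any TCP parsing happens.
	 */
	if (ctinfo == CT_RELATED_REPLY)
		return VERDICT_ACCEPT;

	/* Assumed policy for this sketch: refuse anything too short for TCP. */
	if (p->l4_proto != PROTO_TCP || p->l4_len < TCP_MIN_HDR)
		return VERDICT_DROP;

	return VERDICT_PROCESS_TCP;
}
```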



Re: [RFC] switchdev: generate phys_port_name in the core

2017-07-27 Thread Jiri Pirko
Fri, Jul 28, 2017 at 07:35:07AM CEST, j...@resnulli.us wrote:
>Fri, Jul 28, 2017 at 04:31:22AM CEST, jakub.kicin...@netronome.com wrote:
>>On Thu, 27 Jul 2017 13:30:44 +0300, Or Gerlitz wrote:
>>> > want to add port splitting support, for example, reporting the name on
>>> > physical ports will become more of a necessity.  
>>> 
>>> > If we adopt Jiri's suggestion of returning structured data it will be
>>> > very easy to give user space type and indexes separately, but we should
>>> > probably still return the string for backwards compatibility.  
>>> 
>>> I am not still clear how the structured data would look like
>>
>>I decided to just quickly write the code, that should be easier to 
>>understand.  We can probably leave out the netlink part of the API
>>if there is no need for it right now, but that's what I meant by
>>returning the information in a more structured way.
>>
>>Tested-by: nobody :)
>>Suggested-by: Jiri (if I understood correctly)
>
>Yes, you did :) Couple of nits inlined.
>
>
>>---
>> drivers/net/ethernet/mellanox/mlx5/core/en_rep.c |  8 ++-
>> drivers/net/ethernet/mellanox/mlxsw/switchx2.c   | 10 ++--
>> drivers/net/ethernet/netronome/nfp/nfp_port.c| 26 -
>> drivers/net/ethernet/netronome/nfp/nfp_port.h|  4 +-
>> include/linux/netdevice.h| 18 ++-
>> include/uapi/linux/if_link.h | 16 ++
>> net/core/dev.c   | 31 +--
>> net/core/rtnetlink.c | 69 
>> 
>> 8 files changed, 153 insertions(+), 29 deletions(-)
>>
>>diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
>>index 45e60be9c277..7a71291b8ec3 100644
>>--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
>>+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
>>@@ -637,16 +637,14 @@ static int mlx5e_rep_close(struct net_device *dev)
>> }
>> 
>> static int mlx5e_rep_get_phys_port_name(struct net_device *dev,
>>- char *buf, size_t len)
>>+ struct netdev_port_info *info)
>
>Either we rename ndo to something like ndo_get_port_info or you rename
>the struct to netdev_port_name_info. These 2 should be in sync

As this is strictly related to port name, I think it should be:
struct netdev_phys_port_name_info

And the rest named in-sync with this
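
To make the "structured data" idea concrete, here is a small standalone sketch of how a core helper might turn a driver-filled structure back into the legacy string. All type, field and function names are hypothetical and do not come from the RFC patch; the formats ("p%d", "p%ds%d", and a bare "%d" for a VF) simply mirror the snprintf() calls that the quoted diff removes from the drivers.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical port name types; the real naming is still under discussion. */
enum port_name_type {
	PORT_NAME_TYPE_EXTERNAL,
	PORT_NAME_TYPE_PCI_VF,
};

/* Hypothetical structure a driver would fill instead of a string buffer. */
struct port_name_info {
	enum port_name_type type;
	union {
		struct {
			int id;
			int is_split;
			int subport_id;
		} external;
		struct {
			int vf_id;
		} pci;
	};
};

/* Build the backwards-compatible name string in one central place. */
static int format_phys_port_name(const struct port_name_info *info,
				 char *buf, size_t len)
{
	int n;

	switch (info->type) {
	case PORT_NAME_TYPE_EXTERNAL:
		if (info->external.is_split)
			n = snprintf(buf, len, "p%ds%d", info->external.id,
				     info->external.subport_id);
		else
			n = snprintf(buf, len, "p%d", info->external.id);
		break;
	case PORT_NAME_TYPE_PCI_VF:
		n = snprintf(buf, len, "%d", info->pci.vf_id);
		break;
	default:
		return -EOPNOTSUPP;
	}

	/* Treat truncation as an error rather than returning a cut-off name. */
	if (n < 0 || (size_t)n >= len)
		return -EINVAL;

	return 0;
}
```

Keeping the formatting in one helper means user space could receive the type and indexes separately while the string stays consistent across drivers.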


Re: [RFC] switchdev: generate phys_port_name in the core

2017-07-27 Thread Jiri Pirko
Fri, Jul 28, 2017 at 04:31:22AM CEST, jakub.kicin...@netronome.com wrote:
>On Thu, 27 Jul 2017 13:30:44 +0300, Or Gerlitz wrote:
>> > want to add port splitting support, for example, reporting the name on
>> > physical ports will become more of a necessity.  
>> 
>> > If we adopt Jiri's suggestion of returning structured data it will be
>> > very easy to give user space type and indexes separately, but we should
>> > probably still return the string for backwards compatibility.  
>> 
>> I am not still clear how the structured data would look like
>
>I decided to just quickly write the code, that should be easier to 
>understand.  We can probably leave out the netlink part of the API
>if there is no need for it right now, but that's what I meant by
>returning the information in a more structured way.
>
>Tested-by: nobody :)
>Suggested-by: Jiri (if I understood correctly)

Yes, you did :) Couple of nits inlined.


>---
> drivers/net/ethernet/mellanox/mlx5/core/en_rep.c |  8 ++-
> drivers/net/ethernet/mellanox/mlxsw/switchx2.c   | 10 ++--
> drivers/net/ethernet/netronome/nfp/nfp_port.c| 26 -
> drivers/net/ethernet/netronome/nfp/nfp_port.h|  4 +-
> include/linux/netdevice.h| 18 ++-
> include/uapi/linux/if_link.h | 16 ++
> net/core/dev.c   | 31 +--
> net/core/rtnetlink.c | 69 
> 8 files changed, 153 insertions(+), 29 deletions(-)
>
>diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
>index 45e60be9c277..7a71291b8ec3 100644
>--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
>+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
>@@ -637,16 +637,14 @@ static int mlx5e_rep_close(struct net_device *dev)
> }
> 
> static int mlx5e_rep_get_phys_port_name(struct net_device *dev,
>-  char *buf, size_t len)
>+  struct netdev_port_info *info)

Either we rename ndo to something like ndo_get_port_info or you rename
the struct to netdev_port_name_info. These 2 should be in sync


> {
>   struct mlx5e_priv *priv = netdev_priv(dev);
>   struct mlx5e_rep_priv *rpriv = priv->ppriv;
>   struct mlx5_eswitch_rep *rep = rpriv->rep;
>-  int ret;
> 
>-  ret = snprintf(buf, len, "%d", rep->vport - 1);
>-  if (ret >= len)
>-  return -EOPNOTSUPP;
>+  info->type = NETDEV_PORT_PCI_VF;

NETDEV_PORT_TYPE_PCI_VF
or
NETDEV_PORT_NAME_TYPE_PCI_VF
depends on the option you chose above.


>+  info->pci.vf_id = rep->vport - 1;
> 
>   return 0;
> }
>diff --git a/drivers/net/ethernet/mellanox/mlxsw/switchx2.c b/drivers/net/ethernet/mellanox/mlxsw/switchx2.c
>index 3b0f72455681..383b8b5f41cf 100644
>--- a/drivers/net/ethernet/mellanox/mlxsw/switchx2.c
>+++ b/drivers/net/ethernet/mellanox/mlxsw/switchx2.c
>@@ -413,15 +413,13 @@ mlxsw_sx_port_get_stats64(struct net_device *dev,
>   stats->tx_dropped   = tx_dropped;
> }
> 
>-static int mlxsw_sx_port_get_phys_port_name(struct net_device *dev, char *name,
>-  size_t len)
>+static int mlxsw_sx_port_get_phys_port_name(struct net_device *dev,
>+  struct netdev_port_info *info)
> {
>   struct mlxsw_sx_port *mlxsw_sx_port = netdev_priv(dev);
>-  int err;
> 
>-  err = snprintf(name, len, "p%d", mlxsw_sx_port->mapping.module + 1);
>-  if (err >= len)
>-  return -EINVAL;
>+  info->type = NETDEV_PORT_EXTERNAL;
>+  info->port.id = mlxsw_sx_port->mapping.module + 1;
> 
>   return 0;
> }
>diff --git a/drivers/net/ethernet/netronome/nfp/nfp_port.c b/drivers/net/ethernet/netronome/nfp/nfp_port.c
>index d16a7b78ba9b..8f5c37b9a79c 100644
>--- a/drivers/net/ethernet/netronome/nfp/nfp_port.c
>+++ b/drivers/net/ethernet/netronome/nfp/nfp_port.c
>@@ -143,11 +143,11 @@ struct nfp_eth_table_port *nfp_port_get_eth_port(struct nfp_port *port)
> }
> 
> int
>-nfp_port_get_phys_port_name(struct net_device *netdev, char *name, size_t len)
>+nfp_port_get_phys_port_name(struct net_device *netdev,
>+  struct netdev_port_info *info)
> {
>   struct nfp_eth_table_port *eth_port;
>   struct nfp_port *port;
>-  int n;
> 
>   port = nfp_port_from_netdev(netdev);
>   if (!port)
>@@ -159,25 +159,27 @@ nfp_port_get_phys_port_name(struct net_device *netdev, char *name, size_t len)
>   if (!eth_port)
>   return -EOPNOTSUPP;
> 
>-  if (!eth_port->is_split)
>-  n = snprintf(name, len, "p%d", eth_port->label_port);
>-  else
>-  n = snprintf(name, len, "p%ds%d", eth_port->label_port,
>-   eth_port->label_subport);
>+  info->type = NETDEV_PORT_EXTERNAL;
>+  info->external.id = eth_p

Re: [PATCH v2 2/4] can: fixed-transceiver: Add documentation for CAN fixed transceiver bindings

2017-07-27 Thread Kurt Van Dijck

> 
> On 07/27/2017 01:47 PM, Oliver Hartkopp wrote:
> > On 07/26/2017 08:29 PM, Franklin S Cooper Jr wrote:
> >>
> > 
> >> I'm fine with switching to using bitrate instead of speed. Kurt was
> >> originally the one that suggested to use the term arbitration and data
> >> since that's how the spec refers to it. Which I do agree with. But you're
> >> right that in the drivers (struct can_priv) we just use bittiming and
> >> data_bittiming (CAN-FD timings). I don't think adding "fd" into the
> >> property name makes sense unless we are calling it something like
> >> "max-canfd-bitrate" which I would agree is the easiest to understand.
> >>
> >> So what is the preference if we end up sticking with two properties?
> >> Option 1 or 2?
> >>
> >> 1)
> >> max-bitrate
> >> max-data-bitrate
> >>
> >> 2)
> >> max-bitrate
> >> max-canfd-bitrate
> >>
> >>
> > 
> > 1
> > 
> >>> A CAN transceiver is limited in bandwidth. But you only have one RX and
> >>> one TX line between the CAN controller and the CAN transceiver. The
> >>> transceiver does not know about CAN FD - it has just a physical(!) layer
> >>> with a limited bandwidth. This is ONE limitation.
> >>>
> >>> So I tend to specify only ONE 'max-bitrate' property for the
> >>> fixed-transceiver binding.
> >>>
> >>> The fact whether the CAN controller is CAN FD capable or not is provided
> >>> by the netlink configuration interface for CAN controllers.
> >>
> >> Part of the reasoning to have two properties is to indicate that you
> >> don't support CAN FD while limiting the "arbitration" bit rate.
> > 
> > ??
> > 
> > It's a physical layer device which only has a bandwidth limitation.
> > The transceiver does not know about CAN FD.
> > 
> >> With one
> >> property you can not determine this and end up having to make some
> >> assumptions that can quickly end up biting people.
> > 
> > Despite the fact that the transceiver does not know anything about ISO
> > layer 2 (CAN/CAN FD) the properties should look like
> > 
> > max-bitrate
> > canfd-capable
> > 
> > then.
> > 
> > But when the transceiver is 'canfd-capable' agnostic, why provide a
> > property for it?
> > 
> > Maybe I'm wrong but I still can't follow your argumentation ideas.
> 

The transceiver does not know about CAN FD, but CAN FD uses
the different restrictions of the arbitration & data phase in the CAN
frame, i.e. during arbitration, the RX must indicate the wire
(dominant/recessive) within 1 bit time, during data in CAN FD, this is
not necessary.

So while _a_ transceiver may be spec'd to 1MBit during arbitration,
CAN FD packets may IMHO exceed that speed during data phase.
That was the whole point of CAN FD: exceed the limits required for
correct arbitration on transceiver & wire.

So I do not agree on the single bandwidth limitation.

The word 'max-arbitration-bitrate' makes the difference very clear.
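
As an illustration of why two limits carry more information than one, here is a minimal sketch of a bitrate check with separate arbitration-phase and data-phase limits. The struct, the field names and the convention that 0 means "no data-phase switch" (classic CAN) are assumptions made for this example; they are not part of any existing binding or driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical transceiver limits in bit/s. */
struct transceiver_limits {
	uint32_t max_arbitration_bitrate;
	uint32_t max_data_bitrate;
};

/*
 * Check a requested configuration against both limits.
 * data_bitrate == 0 is taken to mean classic CAN, where the whole frame
 * runs at the arbitration bitrate and only that limit applies.
 */
static bool bitrates_allowed(const struct transceiver_limits *lim,
			     uint32_t arbitration_bitrate,
			     uint32_t data_bitrate)
{
	if (arbitration_bitrate > lim->max_arbitration_bitrate)
		return false;

	if (data_bitrate != 0 && data_bitrate > lim->max_data_bitrate)
		return false;

	return true;
}
```

With limits of 1 Mbit/s for arbitration and 5 Mbit/s for data, a 500 kbit/s / 2 Mbit/s CAN FD setup passes, while 2 Mbit/s arbitration is rejected even though it is below the data-phase limit.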

> You're right. I spoke to our CAN transceiver team and I finally get your
> points.
> 
> So yes using "max-bitrate" alone is all we need. Sorry for the confusion
> and I'll create a new rev using this approach.
> > 
> > Regards,
> > Oliver

Kind regards,
Kurt


Re: [PATCH net] ipv6: no need to return rt->dst.error if it is not null entry.

2017-07-27 Thread Cong Wang
On Wed, Jul 26, 2017 at 11:49 AM, David Ahern  wrote:
> On 7/26/17 12:27 PM, Roopa Prabhu wrote:
>> agreed...so looks like the check in v3 should be
>>
>>
>> +   if ( rt == net->ipv6.ip6_null_entry ||
>> +(rt->dst.error &&
>> + #ifdef CONFIG_IPV6_MULTIPLE_TABLES
>> +  rt != net->ipv6.ip6_prohibit_entry &&
>> +  rt != net->ipv6.ip6_blk_hole_entry &&
>> +#endif
>> + )) {
>> err = rt->dst.error;
>> ip6_rt_put(rt);
>> goto errout;
>>
>
> I don't think so. If I add a prohibit route and use the fibmatch
> attribute, I want to see the route from the FIB that was matched.

But net->ipv6.ip6_prohibit_entry is not the prohibit route you can
add in user-space, it is only used by rule actions. So do you really
want to dump it?? My gut feeling is no, but I am definitely not sure.

When you add a prohibit route, a new rt is allocated dynamically,
net->ipv6.ip6_prohibit_entry is relatively static, internal and is the
only one per netns. (Same for net->ipv6.ip6_blk_hole_entry)

I think Hangbin's example doesn't have ip rules, so this case
is not shown up.


Re: Possible race in hysdn.ko

2017-07-27 Thread isdn
Hello Anton,

first of all, this code was developed by other people and I
never managed to get one of these cards - so I do not know much about
this driver at all.
Unfortunately the firm behind hysdn no longer exists; it
was taken over by Hermstedt AG years ago, and even Hermstedt AG is no
longer active in this business I think (ISDN is an obsolete technology).

Am 27.07.2017 um 18:19 schrieb Anton Volkov:
> Hello.
> 
> While searching for races in the Linux kernel I've come across
> "drivers/isdn/hysdn/hysdn.ko" module. Here is a question that I came up
> with while analysing results. Lines are given using the info from Linux
> v4.12.
> 
> In hysdn_proclog.c file in put_log_buffer function a non-standard type
> of synchronization is employed. It uses pd->del_lock as some kind of
> semaphore (hysdn_proclog.c: lines 129 and 143). Consider the following
> case:
> 
> Thread 1:                     Thread 2:
> hysdn_log_write
> -> hysdn_add_log
> -> put_log_buffer
>      spin_lock()              hysdn_conf_open
>      i = pd->del_lock++       -> hysdn_add_log
>      spin_unlock()            -> put_log_buffer
>      if (!i) <delete loop>         spin_lock()
>      pd->del_lock--                i = pd->del_lock++
>                                    spin_unlock()
>                                    if (!i) <delete loop>
>                                    pd->del_lock--
> 
> <delete loop> - the loop that deletes unused buffer entries
> (hysdn_proclog.c: lines 134-142).
> pd->del_lock-- is not an atomic operation and is executed without any
> locks. Thus it may interfere in the increment process of pd->del_lock in
> another thread. There may be cases that lead to the inability of any
> thread going through the <delete loop>.

Good catch.

> 
> I see several possible solutions to this problem:
> 1) move the <delete loop> under the spin_lock and delete
> pd->del_lock synchronization;
> 2) wrap pd->del_lock-- with spin_lock protection.
> 
> What do you think should be done about it?

I think the intention of this construct was to avoid holding the card
lock for a long time during /proc/ access to the log data, since that may
disrupt normal operation. This is only a guess - I have not really
analyzed the code deeply enough, but I fear there are other critical
problems with this code, since without extra protection the list could
be damaged during the deletion loop, I think.
So maybe having the complete loop under the lock is a good idea.
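
A minimal userspace sketch of that first option, with the pruning loop inside the same critical section as the append, might look like the following. It is not the hysdn code: the types and names are hypothetical, the pruning rule is simplified, and a C11 atomic_flag spinlock stands in for the card's spinlock.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical log entry; usage_cnt counts readers that still need it. */
struct log_entry {
	struct log_entry *next;
	int usage_cnt;
};

struct log_state {
	atomic_flag lock;        /* stand-in for the card spinlock */
	struct log_entry *head;  /* oldest entry */
	struct log_entry *tail;  /* newest entry */
};

static void log_lock(struct log_state *s)
{
	while (atomic_flag_test_and_set_explicit(&s->lock,
						 memory_order_acquire))
		; /* spin until the flag is released */
}

static void log_unlock(struct log_state *s)
{
	atomic_flag_clear_explicit(&s->lock, memory_order_release);
}

/*
 * Append a new entry and prune fully consumed old entries in a single
 * critical section, so no separate del_lock counter is needed.
 * Returns the number of pruned entries, or -1 if allocation fails.
 */
static int put_log_entry(struct log_state *s, int readers)
{
	struct log_entry *e = malloc(sizeof(*e));
	int freed = 0;

	if (!e)
		return -1;
	e->next = NULL;
	e->usage_cnt = readers;

	log_lock(s);
	if (s->tail)
		s->tail->next = e;
	else
		s->head = e;
	s->tail = e;

	/* Drop unused entries from the head, always keeping the newest one. */
	while (s->head->next && s->head->usage_cnt <= 0) {
		struct log_entry *old = s->head;

		s->head = old->next;
		free(old);
		freed++;
	}
	log_unlock(s);

	return freed;
}
```

The trade-off is the one guessed at above: the lock is held for the whole loop, so the hold time grows with the number of stale entries.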


Best regards
Karsten




Re: [PATCH V4 net-next 2/8] net: hns3: Add support of the HNAE3 framework

2017-07-27 Thread Leon Romanovsky
On Thu, Jul 27, 2017 at 11:44:32PM +, Salil Mehta wrote:
> Hi Leon
>
> > -Original Message-
> > From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-
> > ow...@vger.kernel.org] On Behalf Of Leon Romanovsky
> > Sent: Sunday, July 23, 2017 2:16 PM
> > To: Salil Mehta
> > Cc: da...@davemloft.net; Zhuangyuzeng (Yisen); huangdaode; lipeng (Y);
> > mehta.salil@gmail.com; netdev@vger.kernel.org; linux-
> > ker...@vger.kernel.org; linux-r...@vger.kernel.org; Linuxarm
> > Subject: Re: [PATCH V4 net-next 2/8] net: hns3: Add support of the
> > HNAE3 framework
> >
> > On Sat, Jul 22, 2017 at 11:09:36PM +0100, Salil Mehta wrote:
> > > This patch adds the support of the HNAE3 (Hisilicon Network
> > > Acceleration Engine 3) framework support to the HNS3 driver.
> > >
> > > Framework facilitates clients like ENET(HNS3 Ethernet Driver), RoCE
> > > and user-space Ethernet drivers (like ODP etc.) to register with
> > HNAE3
> > > devices and their associated operations.
> > >
> > > Signed-off-by: Daode Huang 
> > > Signed-off-by: lipeng 
> > > Signed-off-by: Salil Mehta 
> > > Signed-off-by: Yisen Zhuang 
> > > ---
> > > Patch V4: Addressed following comments
> > >   1. Andrew Lunn:
> > >  https://lkml.org/lkml/2017/6/17/233
> > >  https://lkml.org/lkml/2017/6/18/105
> > >   2. Bo Yu:
> > >  https://lkml.org/lkml/2017/6/18/112
> > >   3. Stephen Hamminger:
> > >  https://lkml.org/lkml/2017/6/19/778
> > > Patch V3: Addressed below comments
> > >   1. Andrew Lunn:
> > >  https://lkml.org/lkml/2017/6/13/1025
> > > Patch V2: No change
> > > Patch V1: Initial Submit
> > > ---
> > >  drivers/net/ethernet/hisilicon/hns3/hnae3.c | 319
> > 
> > >  drivers/net/ethernet/hisilicon/hns3/hnae3.h | 449
> > 
> > >  2 files changed, 768 insertions(+)
> > >  create mode 100644 drivers/net/ethernet/hisilicon/hns3/hnae3.c
> > >  create mode 100644 drivers/net/ethernet/hisilicon/hns3/hnae3.h
> > >
> > > diff --git a/drivers/net/ethernet/hisilicon/hns3/hnae3.c
> > b/drivers/net/ethernet/hisilicon/hns3/hnae3.c
> > > new file mode 100644
> > > index ..7a11aaff0a23
> > > --- /dev/null
> > > +++ b/drivers/net/ethernet/hisilicon/hns3/hnae3.c
> > > @@ -0,0 +1,319 @@
> > > +/*
> > > + * Copyright (c) 2016-2017 Hisilicon Limited.
> > > + *
> > > + * This program is free software; you can redistribute it and/or
> > modify
> > > + * it under the terms of the GNU General Public License as published
> > by
> > > + * the Free Software Foundation; either version 2 of the License, or
> > > + * (at your option) any later version.
> > > + */
> > > +
> > > +#include 
> > > +#include 
> > > +#include 
> > > +
> > > +#include "hnae3.h"
> > > +
> > > +static LIST_HEAD(hnae3_ae_algo_list);
> > > +static LIST_HEAD(hnae3_client_list);
> > > +static LIST_HEAD(hnae3_ae_dev_list);
> > > +
> > > +/* we are keeping things simple and using single lock for all the
> > > + * list. This is a non-critical code so other updations, if happen
> > > + * in parallel, can wait.
> > > + */
> > > +static DEFINE_MUTEX(hnae3_common_lock);
> > > +
> > > +static bool hnae3_client_match(enum hnae3_client_type client_type,
> > > +enum hnae3_dev_type dev_type)
> > > +{
> > > + if (dev_type == HNAE3_DEV_KNIC) {
> > > + switch (client_type) {
> > > + case HNAE3_CLIENT_KNIC:
> > > + case HNAE3_CLIENT_ROCE:
> > > + return true;
> > > + default:
> > > + return false;
> > > + }
> > > + } else if (dev_type == HNAE3_DEV_UNIC) {
> > > + switch (client_type) {
> > > + case HNAE3_CLIENT_UNIC:
> > > + return true;
> > > + default:
> > > + return false;
> > > + }
> > > + } else {
> > > + return false;
> > > + }
> > > +}
> > > +
> > > +static int hnae3_match_n_instantiate(struct hnae3_client *client,
> > > +  struct hnae3_ae_dev *ae_dev,
> > > +  bool is_reg, bool *matched)
> > > +{
> > > + int ret;
> > > +
> > > + *matched = false;
> > > +
> > > + /* check if this client matches the type of ae_dev */
> > > + if (!(hnae3_client_match(client->type, ae_dev->dev_type) &&
> > > +   hnae_get_bit(ae_dev->flag, HNAE3_DEV_INITED_B))) {
> > > + return 0;
> > > + }
> > > + /* there is a match of client and dev */
> > > + *matched = true;
> > > +
> > > + if (!(ae_dev->ops && ae_dev->ops->init_client_instance &&
> > > +   ae_dev->ops->uninit_client_instance)) {
> > > + dev_err(&ae_dev->pdev->dev,
> > > + "ae_dev or client init/uninit ops are null\n");
> > > + return -EOPNOTSUPP;
> > > + }
> > > +
> > > + /* now, (un-)instantiate client by calling lower layer */
> > > + if (is_reg) {
> > > + ret = ae_dev->ops->init_client_instance(client, ae_dev);
> > > + if (ret)
> > > + dev_err(&ae_dev->pdev->dev,
> > > +  

Re: [PATCH net-next 03/18] net: mvpp2: set the SMI PHY address when connecting to the PHY

2017-07-27 Thread Andrew Lunn
On Thu, Jul 27, 2017 at 06:49:05PM -0700, Antoine Tenart wrote:
> Hi Andrew,
> 
> On Wed, Jul 26, 2017 at 06:08:06PM +0200, Andrew Lunn wrote:
> > On Mon, Jul 24, 2017 at 03:48:33PM +0200, Antoine Tenart wrote:
> > >  
> > > + if (priv->hw_version != MVPP22)
> > > + return 0;
> > > +
> > > + /* Set the SMI PHY address */
> > > + if (of_property_read_u32(port->phy_node, "reg", &phy_addr)) {
> > > + netdev_err(port->dev, "cannot find the PHY address\n");
> > > + return -EINVAL;
> > > + }
> > > +
> > > + writel(phy_addr, priv->iface_base + MVPP22_SMI_PHY_ADDR(port->gop_id));
> > >   return 0;
> > >  }
> > 
> > You could use phy_dev->mdiodev->addr, rather than parse the DT.
> 
> OK.
> 
> > Why does the MAC need to know this address? The phylib and PHY driver
> > should be the only thing accessing the PHY, otherwise you are asking
> > for trouble.
> 
> This is part of the SMI/xSMI interface. I added into the mvpp2 driver
> and not in the mvmdio one because the GoP port number must be known to
> set this register (so that would be even less clean to do it).

Hi Antoine

It is still not clear to me why you need to program the address into
the hardware. Is the hardware talking to the PHY?

Andrew


Re: [PATCH net-next 3/3] tap: XDP support

2017-07-27 Thread Michael S. Tsirkin
On Fri, Jul 28, 2017 at 11:50:45AM +0800, Jason Wang wrote:
> 
> 
> On 2017-07-28 11:46, Michael S. Tsirkin wrote:
> > On Fri, Jul 28, 2017 at 11:28:54AM +0800, Jason Wang wrote:
> > > > > + old_prog = rtnl_dereference(tun->xdp_prog);
> > > > > + if (old_prog)
> > > > > + bpf_prog_put(old_prog);
> > > > > + rcu_assign_pointer(tun->xdp_prog, prog);
> > > > Is this OK?  Could this lead to the program getting freed and then
> > > > datapath accessing a stale pointer?  I mean in the scenario where the
> > > > process gets pre-empted between the bpf_prog_put() and
> > > > rcu_assign_pointer()?
> > > Will call bpf_prog_put() after rcu_assign_pointer().
> > I suspect you need to sync RCU or something before that.
> 
> __bpf_prog_put() will do call_rcu(), so looks like it was ok.
> 
> Thanks

True - I missed that.

-- 
MST


Re: [PATCH net-next 3/3] tap: XDP support

2017-07-27 Thread Jakub Kicinski
On Fri, 28 Jul 2017 06:46:40 +0300, Michael S. Tsirkin wrote:
> On Fri, Jul 28, 2017 at 11:28:54AM +0800, Jason Wang wrote:
> > > > +   old_prog = rtnl_dereference(tun->xdp_prog);
> > > > +   if (old_prog)
> > > > +   bpf_prog_put(old_prog);
> > > > +   rcu_assign_pointer(tun->xdp_prog, prog);  
> > > Is this OK?  Could this lead to the program getting freed and then
> > > datapath accessing a stale pointer?  I mean in the scenario where the
> > > process gets pre-empted between the bpf_prog_put() and
> > > rcu_assign_pointer()?  
> > 
> > Will call bpf_prog_put() after rcu_assign_pointer().  
> 
> I suspect you need to sync RCU or something before that.

I think the bpf_prog_put() will use call_rcu() to do the actual free:

static void __bpf_prog_put(struct bpf_prog *prog, bool do_idr_lock)
{
if (atomic_dec_and_test(&prog->aux->refcnt)) {
trace_bpf_prog_put_rcu(prog);
/* bpf_prog_free_id() must be called first */
bpf_prog_free_id(prog, do_idr_lock);
bpf_prog_kallsyms_del(prog);
call_rcu(&prog->aux->rcu, __bpf_prog_put_rcu);
}
}

It's just that we are only under the rtnl here, RCU lock is not held, so
grace period may elapse between bpf_prog_put() and rcu_assign_pointer().
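
The ordering agreed on in this thread (publish the new program first, then drop the reference to the old one) can be sketched in plain C11 as below. This is a userspace analogy under assumptions: the types and function names are hypothetical, atomic_exchange() stands in for the rtnl_dereference() plus rcu_assign_pointer() pair, and setting a flag stands in for the deferred free that call_rcu() would schedule.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical refcounted program object. */
struct prog {
	atomic_int refcnt;
	int freed; /* stand-in for "a deferred free has been scheduled" */
};

/* Hypothetical device holding the currently attached program. */
struct dev {
	_Atomic(struct prog *) xdp_prog;
};

/* Drop one reference; the last put marks the object as freed. */
static void prog_put(struct prog *p)
{
	if (atomic_fetch_sub(&p->refcnt, 1) == 1)
		p->freed = 1;
}

/*
 * Swap in the new program before releasing the old one. At no point
 * does the device pointer refer to a program whose reference has
 * already been dropped.
 */
static void dev_set_prog(struct dev *d, struct prog *new_prog)
{
	struct prog *old = atomic_exchange(&d->xdp_prog, new_prog);

	if (old)
		prog_put(old);
}
```

In the original ordering the put came first, so the pointer still visible to the datapath could refer to a program whose last reference was already gone.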


Re: [PATCH net-next 3/3] tap: XDP support

2017-07-27 Thread Jason Wang



On 2017-07-28 11:46, Michael S. Tsirkin wrote:

On Fri, Jul 28, 2017 at 11:28:54AM +0800, Jason Wang wrote:

+   old_prog = rtnl_dereference(tun->xdp_prog);
+   if (old_prog)
+   bpf_prog_put(old_prog);
+   rcu_assign_pointer(tun->xdp_prog, prog);

Is this OK?  Could this lead to the program getting freed and then
datapath accessing a stale pointer?  I mean in the scenario where the
process gets pre-empted between the bpf_prog_put() and
rcu_assign_pointer()?

Will call bpf_prog_put() after rcu_assign_pointer().

I suspect you need to sync RCU or something before that.


__bpf_prog_put() will do call_rcu(), so looks like it was ok.

Thanks


Re: [PATCH net-next 3/3] tap: XDP support

2017-07-27 Thread Michael S. Tsirkin
On Fri, Jul 28, 2017 at 11:28:54AM +0800, Jason Wang wrote:
> > > + old_prog = rtnl_dereference(tun->xdp_prog);
> > > + if (old_prog)
> > > + bpf_prog_put(old_prog);
> > > + rcu_assign_pointer(tun->xdp_prog, prog);
> > Is this OK?  Could this lead to the program getting freed and then
> > datapath accessing a stale pointer?  I mean in the scenario where the
> > process gets pre-empted between the bpf_prog_put() and
> > rcu_assign_pointer()?
> 
> Will call bpf_prog_put() after rcu_assign_pointer().

I suspect you need to sync RCU or something before that.


Re: [PATCH net-next 3/3] tap: XDP support

2017-07-27 Thread Jason Wang



On 2017-07-28 11:13, Jakub Kicinski wrote:
> On Thu, 27 Jul 2017 17:25:33 +0800, Jason Wang wrote:
> > This patch tries to implement XDP for tun. The implementation was
> > split into two parts:
> >
> > - fast path: small and no gso packet. We try to do XDP at page level
> >   before build_skb(). For XDP_TX, since creating/destroying queues
> >   were completely under control of userspace, it was implemented
> >   through generic XDP helper after skb has been built. This could be
> >   optimized in the future.
> > - slow path: big or gso packet. We try to do it after skb was created
> >   through generic XDP helpers.
> >
> > XDP_REDIRECT was not implemented, it could be done on top.
> >
> > xdp1 test shows 47.6% improvement:
> >
> > Before: ~2.1Mpps
> > After:  ~3.1Mpps
> >
> > Suggested-by: Michael S. Tsirkin
> > Signed-off-by: Jason Wang
> >
> > @@ -1008,6 +1016,56 @@ tun_net_get_stats64(struct net_device *dev, struct
> > rtnl_link_stats64 *stats)
> >     stats->tx_dropped = tx_dropped;
> >  }
> >
> > +static int tun_xdp_set(struct net_device *dev, struct bpf_prog *prog,
> > +                       struct netlink_ext_ack *extack)
> > +{
> > +   struct tun_struct *tun = netdev_priv(dev);
> > +   struct bpf_prog *old_prog;
> > +
> > +   /* We will shift the packet that can't be handled to generic
> > +    * XDP layer.
> > +    */
> > +
> > +   old_prog = rtnl_dereference(tun->xdp_prog);
> > +   if (old_prog)
> > +           bpf_prog_put(old_prog);
> > +   rcu_assign_pointer(tun->xdp_prog, prog);
>
> Is this OK?  Could this lead to the program getting freed and then
> datapath accessing a stale pointer?  I mean in the scenario where the
> process gets pre-empted between the bpf_prog_put() and
> rcu_assign_pointer()?

Will call bpf_prog_put() after rcu_assign_pointer().

> > +   if (prog) {
> > +           prog = bpf_prog_add(prog, 1);
> > +           if (IS_ERR(prog))
> > +                   return PTR_ERR(prog);
> > +   }
>
> I don't think you need this extra reference here.  dev_change_xdp_fd()
> will call bpf_prog_get_type(), which means the driver gets the program
> with a reference already taken; the driver does have to free that
> reference when the program is removed (or the device is freed, as you
> correctly do).

I see, will drop this in next version.

Thanks.

> > +   return 0;
> > +}
> > +




Re: [PATCH v2] net: inet: diag: expose sockets cgroup classid

2017-07-27 Thread Jakub Kicinski
On Thu, 27 Jul 2017 18:11:32 +, Levin, Alexander (Sasha Levin)
wrote:
> This is useful for directly looking up a task based on class id rather than
> having to scan through all open file descriptors.
> 
> Signed-off-by: Sasha Levin 
> ---
> 
> Changes in V2:
>  - Addressed comments from Cong Wang (use nla_put_u32())
> 
>  include/uapi/linux/inet_diag.h |  1 +
>  net/ipv4/inet_diag.c   | 10 ++
>  2 files changed, 11 insertions(+)
> 
> diff --git a/include/uapi/linux/inet_diag.h b/include/uapi/linux/inet_diag.h
> index bbe201047df6..678496897a68 100644
> --- a/include/uapi/linux/inet_diag.h
> +++ b/include/uapi/linux/inet_diag.h
> @@ -142,6 +142,7 @@ enum {
>   INET_DIAG_PAD,
>   INET_DIAG_MARK,
>   INET_DIAG_BBRINFO,
> + INET_DIAG_CLASS_ID,
>   __INET_DIAG_MAX,
>  };
>  
> diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
> index 3828b3a805cd..2c2445d4bb58 100644
> --- a/net/ipv4/inet_diag.c
> +++ b/net/ipv4/inet_diag.c
> @@ -274,6 +274,16 @@ int inet_sk_diag_fill(struct sock *sk, struct 
> inet_connection_sock *icsk,
>   goto errout;
>   }
>  
> + if (ext & (1 << (INET_DIAG_CLASS_ID - 1))) {
> + u32 classid = 0;
> +
> +#ifdef CONFIG_SOCK_CGROUP_DATA
> + classid = sock_cgroup_classid(&sk->sk_cgrp_data);
> +#endif
> +
> + nla_put_u32(skb, INET_DIAG_CLASS_ID, classid);

You need to check the return value from nla_put_u32() and goto errout
if it's non-zero.

Perhaps adding __must_check to the nla_put_*() helpers would be a good
idea.
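
A minimal sketch of the check being asked for, written as a user-space toy
so it can be compiled on its own. Everything prefixed toy_/TOY_ is
hypothetical: toy_put_u32() stands in for nla_put_u32(), the 8-byte buffer
stands in for the skb, and resetting the length on the error path stands in
for cancelling the partially built message. The attribute number is made up.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { TOY_DIAG_CLASS_ID = 3 };	/* hypothetical attribute number */

struct toy_msg {
	unsigned char buf[8];
	size_t len;
};

/* Models nla_put_u32(): append a value, or fail if there is no room left. */
static int toy_put_u32(struct toy_msg *msg, uint32_t value)
{
	assert(msg->len <= sizeof(msg->buf));
	if (sizeof(msg->buf) - msg->len < sizeof(value))
		return -EMSGSIZE;
	memcpy(msg->buf + msg->len, &value, sizeof(value));
	msg->len += sizeof(value);
	return 0;
}

/* Models the fill routine: one fixed field, then the optional class id. */
static int toy_diag_fill(struct toy_msg *msg, uint32_t ext, uint32_t classid)
{
	size_t start = msg->len;

	if (toy_put_u32(msg, 0xabcd))
		goto errout;

	/* Same style of extension bit test as in the quoted patch. */
	if (ext & (1u << (TOY_DIAG_CLASS_ID - 1))) {
		if (toy_put_u32(msg, classid))
			goto errout;
	}
	return 0;

errout:
	msg->len = start;	/* drop the partially written message */
	return -EMSGSIZE;
}
```

Without the check, a full buffer would silently produce a message that lacks
the requested attribute; with it, the caller sees the error and can retry
with a larger buffer.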

> + }
> +
>  out:
>   nlmsg_end(skb, nlh);
>   return 0;



Re: [PATCH net-next 3/3] tap: XDP support

2017-07-27 Thread Jakub Kicinski
On Thu, 27 Jul 2017 17:25:33 +0800, Jason Wang wrote:
> This patch tries to implement XDP for tun. The implementation was
> split into two parts:
> 
> - fast path: small and no gso packet. We try to do XDP at page level
>   before build_skb(). For XDP_TX, since creating/destroying queues
>   were completely under control of userspace, it was implemented
>   through generic XDP helper after skb has been built. This could be
>   optimized in the future.
> - slow path: big or gso packet. We try to do it after skb was created
>   through generic XDP helpers.
> 
> XDP_REDIRECT was not implemented, it could be done on top.
> 
> xdp1 test shows 47.6% improvement:
> 
> Before: ~2.1Mpps
> After:  ~3.1Mpps
> 
> Suggested-by: Michael S. Tsirkin 
> Signed-off-by: Jason Wang 

> @@ -1008,6 +1016,56 @@ tun_net_get_stats64(struct net_device *dev, struct 
> rtnl_link_stats64 *stats)
>   stats->tx_dropped = tx_dropped;
>  }
>  
> +static int tun_xdp_set(struct net_device *dev, struct bpf_prog *prog,
> +struct netlink_ext_ack *extack)
> +{
> + struct tun_struct *tun = netdev_priv(dev);
> + struct bpf_prog *old_prog;
> +
> + /* We will shift the packet that can't be handled to generic
> +  * XDP layer.
> +  */
> +
> + old_prog = rtnl_dereference(tun->xdp_prog);
> + if (old_prog)
> + bpf_prog_put(old_prog);
> + rcu_assign_pointer(tun->xdp_prog, prog);

Is this OK?  Could this lead to the program getting freed and then
datapath accessing a stale pointer?  I mean in the scenario where the
process gets pre-empted between the bpf_prog_put() and
rcu_assign_pointer()?

> + if (prog) {
> + prog = bpf_prog_add(prog, 1);
> + if (IS_ERR(prog))
> + return PTR_ERR(prog);
> + }

I don't think you need this extra reference here.  dev_change_xdp_fd()
will call bpf_prog_get_type(), which means the driver gets the program
with a reference already taken; the driver does have to free that
reference when the program is removed (or the device is freed, as you
correctly do).

> + return 0;
> +}
> +


Re: [PATCH v7 2/3] PCI: Enable PCIe Relaxed Ordering if supported

2017-07-27 Thread Ding Tianhong


On 2017/7/28 1:49, Alexander Duyck wrote:
> On Wed, Jul 26, 2017 at 6:08 PM, Ding Tianhong  
> wrote:
>>
>>
>> On 2017/7/27 2:26, Casey Leedom wrote:
>>>   By the way Ding, two issues:
>>>
>>>  1. Did we ever get any acknowledgement from either Intel or AMD
>>> on this patch?  I know that we can't ensure that, but it sure would
>>> be nice since the PCI Quirks that we're putting in affect their
>>> products.
>>>
>>
>> No one from Intel or AMD has acked this yet, which is what I am worried about.
>> Should I ping someone again?
>>
>> Thanks
>> Ding
> 
> 
> I probably wouldn't worry about it too much. If anything all this
> patch is doing is disabling relaxed ordering on the platforms we know
> have issues based on what Casey originally had. If nothing else we can
> follow up once the patches are in the kernel and if somebody has an
> issue then.
> 
> You can include my acked-by, but it is mostly related to how this
> interacts with NICs, and not so much about the PCI chipsets
> themselves.
> 
> Acked-by: Alexander Duyck 
> 

Thanks, Alex. :)

> .
> 



Re: [PATCH v7 2/3] PCI: Enable PCIe Relaxed Ordering if supported

2017-07-27 Thread Ding Tianhong


On 2017/7/28 2:42, Raj, Ashok wrote:
> Hi Casey
> 
>> | No one from Intel or AMD has acked this yet, which is what I am worried about.
>> | Should I ping someone again?
> 
> 
> I can ack the patch set for Intel specific changes. Now that the doc is made
> public :-).
> 

Good, Thanks. :)

> Can you/Ding resend the patch series? I do have the most recent v7, but some
> of the commit messages weren't easy to read. Seems like this patch has
> gotten bigger than originally intended, but it seems to be for the overall
> good :-).
> 

OK, I will send the v8 patch set, which will update the patch title and add
Casey's new modification for his VF driver. Thanks.

Ding

> Sorry for staying silent up until now.
> 
> Cheers,
> Ashok
> 
> .
> 



Re: [PATCH v7 2/3] PCI: Enable PCIe Relaxed Ordering if supported

2017-07-27 Thread Ding Tianhong


On 2017/7/28 1:44, Casey Leedom wrote:
> | From: Ding Tianhong 
> | Sent: Wednesday, July 26, 2017 6:01 PM
> |
> | On 2017/7/27 3:05, Casey Leedom wrote:
> | >
> | > Ding, send me a note if you'd like me to work that [cxgb4vf patch] up
> | > for you.
> |
> | OK, you could send the change log and I could put it in the v8 version
> | together. Will you base it on patch 3/3 or build an independent patch?
> 
> Whichever you'd prefer.  It would basically mirror the exact same code that
> you've got for cxgb4.  I.e. testing the setting of the VF's PCIe Capability
> Device Control[Relaxed Ordering Enable], setting a new flag in
> adapter->flags, testing that flag in cxgb4vf/sge.c:t4vf_sge_alloc_rxq().
> But since the VF's PF will already have disabled the PF's Relaxed Ordering
> Enable, the VF will also have its Relaxed Ordering Enable disabled and any
> effort by the internal chip to send TLPs with the Relaxed Ordering Attribute
> will be gated by the PCIe logic.  So it's not critical that this be in the
> first patch.  Your call.  Let me know if you'd like me to send that to you.
> 

Good, please send it to me. I will put it together and send the v8 this week;
I think Bjorn will be back next week. :)

> 
> | From: Ding Tianhong 
> | Sent: Wednesday, July 26, 2017 6:08 PM
> |
> | On 2017/7/27 2:26, Casey Leedom wrote:
> | >
> | >  1. Did we ever get any acknowledgement from either Intel or AMD
> | > on this patch?  I know that we can't ensure that, but it sure would
> | > be nice since the PCI Quirks that we're putting in affect their
> | > products.
> |
> | No one from Intel or AMD has acked this yet, which is what I am worried about.
> | Should I ping someone again?
> 
> By amusing coincidence, Patrik Cramer (now Cc'ed) from Intel sent me a note
> yesterday with a link to the official Intel performance tuning documentation
> which covers this issue:
> 
> https://software.intel.com/sites/default/files/managed/9e/bc/64-ia-32-architectures-optimization-manual.pdf
> 
> In section 3.9.1 we have:
> 
> 3.9.1 Optimizing PCIe Performance for Accesses Toward Coherent Memory
>   and Toward MMIO Regions (P2P)
> 
> In order to maximize performance for PCIe devices in the processors
> listed in Table 3-6 below, the software should determine whether the
> accesses are toward coherent memory (system memory) or toward MMIO
> regions (P2P access to other devices). If the access is toward MMIO
> region, then software can command HW to set the RO bit in the TLP
> header, as this would allow hardware to achieve maximum throughput for
> these types of accesses. For accesses toward coherent memory, software
> can command HW to clear the RO bit in the TLP header (no RO), as this
> would allow hardware to achieve maximum throughput for these types of
> accesses.
> 
> Table 3-6. Intel Processor CPU RP Device IDs for Processors Optimizing
>            PCIe Performance
>
> Processor                          CPU RP Device IDs
>
> Intel Xeon processors based on     6F01H-6F0EH
> Broadwell microarchitecture
>
> Intel Xeon processors based on     2F01H-2F0EH
> Haswell microarchitecture
> 
> Unfortunately that's a pretty thin section.  But it does expand the set of
> Intel Root Complexes that our Linux PCI Quirk will need to cover.  So
> you should add those to the next (and hopefully final) spin of your patch.
> And, it also verifies the need to handle the use of Relaxed Ordering more
> subtly than simply turning it off since the NVMe peer-to-peer example I
> keep bringing up would fall into the "need to use Relaxed Ordering" case ...
> 
> It would have been nice to know why this is happening and if any future
> processor would fix this.  After all, Relaxed Ordering, is just supposed to
> be a hint.  At worst, a receiving device could just ignore the attribute
> entirely.  Obviously someone made an effort to implement it but ... it
> didn't go the way they wanted.
> 
> And, it also would have been nice to know if there was any hidden register
> in these Intel Root Complexes which can completely turn off the effort to
> pay attention to the Relaxed Ordering Attribute.  We've spent an enormous
> amount of effort on this issue here on the Linux PCI email list struggling
> mightily to come up with a way to determine when it's
> safe/recommended/not-recommended/unsafe to use Relaxed Ordering when
> directing TLPs towards the Root Complex.  And some architectures require RO
> for decent performance so we can't just "turn it off" unilaterally.
> 
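
The device ID ranges quoted from Table 3-6 above lend themselves to a simple
range lookup. The following is only a stand-alone sketch of that idea, not
the quirk code from the patch series: the toy_* names are hypothetical, and
only the two ranges (6F01H-6F0EH and 2F01H-2F0EH) come from the quoted text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct toy_id_range {
	uint16_t first;
	uint16_t last;
};

/* Root port device ID ranges taken from the quoted Table 3-6. */
static const struct toy_id_range toy_no_ro_root_ports[] = {
	{ 0x6f01, 0x6f0e },	/* Xeon, Broadwell microarchitecture */
	{ 0x2f01, 0x2f0e },	/* Xeon, Haswell microarchitecture */
};

/*
 * Returns true if the root port device ID falls in one of the listed
 * ranges, i.e. Relaxed Ordering toward coherent memory is best avoided.
 */
static bool toy_root_port_prefers_no_ro(uint16_t device_id)
{
	size_t n = sizeof(toy_no_ro_root_ports) / sizeof(toy_no_ro_root_ports[0]);
	size_t i;

	for (i = 0; i < n; i++) {
		const struct toy_id_range *r = &toy_no_ro_root_ports[i];

		assert(r->first <= r->last);
		if (device_id >= r->first && device_id <= r->last)
			return true;
	}
	return false;
}
```

A table keyed on ranges keeps the list short; as the quoted text notes, the
peer-to-peer (MMIO) case would still want Relaxed Ordering, so a lookup like
this only answers the coherent-memory half of the question.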

I am glad to hear that more people are focusing on this problem. It would be
great if they could join our discussion and give us more suggestions. :)

Thanks
Ding

> Casey
> 
> .
> 



Re: [RFC] switchdev: generate phys_port_name in the core

2017-07-27 Thread Jakub Kicinski
On Thu, 27 Jul 2017 19:31:22 -0700, Jakub Kicinski wrote:
> +static size_t rtnl_port_info_size(void)
> +{
> + size_t port_info_size = nla_total_size(0) + /* nest IFLA_PORT_INFO */

nla_total_size(4) + /* TYPE */

> + nla_total_size(4) + /* EXTERNAL_ID or PF_ID */
> + nla_total_size(4);  /* SPLIT_ID or VF_ID*/
> +
> + return port_info_size;
> +}
> +
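
For reference, the size this helper reserves once the missing TYPE line is
added can be worked out with a small user-space model of the attribute size
arithmetic. The toy_* names are hypothetical; the model assumes the usual
netlink layout of a 4-byte attribute header with 4-byte alignment.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NLA_ALIGNTO	((size_t)4)
#define TOY_NLA_ALIGN(len) \
	(((len) + TOY_NLA_ALIGNTO - 1) & ~(TOY_NLA_ALIGNTO - 1))
#define TOY_NLA_HDRLEN	((size_t)4)	/* 16-bit length + 16-bit type */

/* Models nla_total_size(): header plus payload, padded to the alignment. */
static size_t toy_nla_total_size(size_t payload)
{
	size_t total = TOY_NLA_ALIGN(TOY_NLA_HDRLEN + payload);

	assert(total % TOY_NLA_ALIGNTO == 0);
	return total;
}

/* Mirrors the quoted helper, including the TYPE attribute. */
static size_t toy_port_info_size(void)
{
	return toy_nla_total_size(0) +	/* nest IFLA_PORT_INFO */
	       toy_nla_total_size(4) +	/* TYPE */
	       toy_nla_total_size(4) +	/* EXTERNAL_ID or PF_ID */
	       toy_nla_total_size(4);	/* SPLIT_ID or VF_ID */
}
```

Under those assumptions the nest header costs 4 bytes and each 32-bit
attribute costs 8, for 28 bytes in total.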


Re: [PATCH net-next v2 0/3] ethtool: support for forward error correction mode setting on a link

2017-07-27 Thread Jakub Kicinski
On Thu, 27 Jul 2017 16:47:25 -0700, Roopa Prabhu wrote:
> From: Roopa Prabhu 
> 
> Forward Error Correction (FEC) modes i.e Base-R
> and Reed-Solomon modes are introduced in 25G/40G/100G standards
> for providing good BER at high speeds. Various networking devices
> which support 25G/40G/100G provides ability to manage supported FEC
> modes and the lack of FEC encoding control and reporting today is a
> source for interoperability issues for many vendors.
> FEC capability as well as specific FEC mode i.e. Base-R
> or RS modes can be requested or advertised through bits D44:47 of base link
> codeword.
> 
> This patch set intends to provide option under ethtool to manage and
> report FEC encoding settings for networking devices as per IEEE 802.3
> bj, bm and by specs.
> 
> v2 :
> - minor patch format fixes and typos pointed out by Andrew
> - there was a pending discussion on the use of 'auto' vs
>   'automatic' for fec settings. I have left it as 'auto'
>   because in most cases today auto is used in place of
>   automatic to represent automatically generated values.
>   We use it in other networking config too. I would prefer
>   leaving it as auto.

On the subject of resetting the values when a module is replugged, I
assume what was previously described remains:
 - we always allow users to set the FEC regardless of the module type;
 - if the user sets an incorrect FEC for the module type (or the module
   gets swapped) the link will be administratively taken down by either
   the driver or FW.

Is that correct?  Am I misremembering?


[RFC] switchdev: generate phys_port_name in the core

2017-07-27 Thread Jakub Kicinski
On Thu, 27 Jul 2017 13:30:44 +0300, Or Gerlitz wrote:
> > want to add port splitting support, for example, reporting the name on
> > physical ports will become more of a necessity.  
> 
> > If we adopt Jiri's suggestion of returning structured data it will be
> > very easy to give user space type and indexes separately, but we should
> > probably still return the string for backwards compatibility.  
> 
> I am still not clear what the structured data would look like

I decided to just quickly write the code, that should be easier to
understand.  We can probably leave out the netlink part of the API
if there is no need for it right now, but that's what I meant by
returning the information in a more structured way.
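
As a rough stand-alone illustration of the idea (drivers fill in structured
port information and common code formats the name), here is a user-space
sketch. The toy_* types are hypothetical and deliberately simpler than the
structure in the patch below; the "p%d", "p%ds%d" and "pf%d" formats are the
ones the drivers currently generate themselves.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

enum toy_port_type {
	TOY_PORT_EXTERNAL,
	TOY_PORT_EXTERNAL_SPLIT,
	TOY_PORT_PCI_PF,
};

struct toy_port_info {
	enum toy_port_type type;
	unsigned int id;	/* front panel port or PF index */
	unsigned int split_id;	/* only used for TOY_PORT_EXTERNAL_SPLIT */
};

/* Format the port name in one place from the structured description. */
static int toy_port_name(const struct toy_port_info *info, char *buf, size_t len)
{
	int n;

	assert(buf != NULL && len > 0);

	switch (info->type) {
	case TOY_PORT_EXTERNAL:
		n = snprintf(buf, len, "p%u", info->id);
		break;
	case TOY_PORT_EXTERNAL_SPLIT:
		n = snprintf(buf, len, "p%us%u", info->id, info->split_id);
		break;
	case TOY_PORT_PCI_PF:
		n = snprintf(buf, len, "pf%u", info->id);
		break;
	default:
		return -EINVAL;
	}

	/* snprintf() reports the untruncated length; reject truncation. */
	if (n < 0 || (size_t)n >= len)
		return -EINVAL;
	return 0;
}
```

Keeping the formatting in one function means the string stays available for
backwards compatibility while the type and indexes can also be reported
separately.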

Tested-by: nobody :)
Suggested-by: Jiri (if I understood correctly)
---
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c |  8 ++-
 drivers/net/ethernet/mellanox/mlxsw/switchx2.c   | 10 ++--
 drivers/net/ethernet/netronome/nfp/nfp_port.c| 26 -
 drivers/net/ethernet/netronome/nfp/nfp_port.h|  4 +-
 include/linux/netdevice.h| 18 ++-
 include/uapi/linux/if_link.h | 16 ++
 net/core/dev.c   | 31 +--
 net/core/rtnetlink.c | 69 
 8 files changed, 153 insertions(+), 29 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index 45e60be9c277..7a71291b8ec3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -637,16 +637,14 @@ static int mlx5e_rep_close(struct net_device *dev)
 }
 
 static int mlx5e_rep_get_phys_port_name(struct net_device *dev,
-   char *buf, size_t len)
+   struct netdev_port_info *info)
 {
struct mlx5e_priv *priv = netdev_priv(dev);
struct mlx5e_rep_priv *rpriv = priv->ppriv;
struct mlx5_eswitch_rep *rep = rpriv->rep;
-   int ret;
 
-   ret = snprintf(buf, len, "%d", rep->vport - 1);
-   if (ret >= len)
-   return -EOPNOTSUPP;
+   info->type = NETDEV_PORT_PCI_VF;
+   info->pci.vf_id = rep->vport - 1;
 
return 0;
 }
diff --git a/drivers/net/ethernet/mellanox/mlxsw/switchx2.c 
b/drivers/net/ethernet/mellanox/mlxsw/switchx2.c
index 3b0f72455681..383b8b5f41cf 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/switchx2.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/switchx2.c
@@ -413,15 +413,13 @@ mlxsw_sx_port_get_stats64(struct net_device *dev,
stats->tx_dropped   = tx_dropped;
 }
 
-static int mlxsw_sx_port_get_phys_port_name(struct net_device *dev, char *name,
-   size_t len)
+static int mlxsw_sx_port_get_phys_port_name(struct net_device *dev,
+   struct netdev_port_info *info)
 {
struct mlxsw_sx_port *mlxsw_sx_port = netdev_priv(dev);
-   int err;
 
-   err = snprintf(name, len, "p%d", mlxsw_sx_port->mapping.module + 1);
-   if (err >= len)
-   return -EINVAL;
+   info->type = NETDEV_PORT_EXTERNAL;
+   info->port.id = mlxsw_sx_port->mapping.module + 1;
 
return 0;
 }
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_port.c 
b/drivers/net/ethernet/netronome/nfp/nfp_port.c
index d16a7b78ba9b..8f5c37b9a79c 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_port.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_port.c
@@ -143,11 +143,11 @@ struct nfp_eth_table_port *nfp_port_get_eth_port(struct 
nfp_port *port)
 }
 
 int
-nfp_port_get_phys_port_name(struct net_device *netdev, char *name, size_t len)
+nfp_port_get_phys_port_name(struct net_device *netdev,
+   struct netdev_port_info *info)
 {
struct nfp_eth_table_port *eth_port;
struct nfp_port *port;
-   int n;
 
port = nfp_port_from_netdev(netdev);
if (!port)
@@ -159,25 +159,27 @@ nfp_port_get_phys_port_name(struct net_device *netdev, 
char *name, size_t len)
if (!eth_port)
return -EOPNOTSUPP;
 
-   if (!eth_port->is_split)
-   n = snprintf(name, len, "p%d", eth_port->label_port);
-   else
-   n = snprintf(name, len, "p%ds%d", eth_port->label_port,
-eth_port->label_subport);
+   info->type = NETDEV_PORT_EXTERNAL;
+   info->external.id = eth_port->label_port;
+
+   if (eth_port->is_split) {
+   info->type = NETDEV_PORT_EXTERNAL_SPLIT;
+   info->external.split_id = eth_port->label_subport;
+   }
break;
case NFP_PORT_PF_PORT:
-   n = snprintf(name, len, "pf%d", port->pf_id);
+   info->type = NETDEV_PORT_PCI_PF;
+   info->pci.pf_id = port->pf

Re: [PATCH net-next 08/18] net: mvpp2: make the phy optional

2017-07-27 Thread Antoine Tenart
Hi Andrew,

On Wed, Jul 26, 2017 at 06:20:00PM +0200, Andrew Lunn wrote:
> On Mon, Jul 24, 2017 at 03:48:38PM +0200, Antoine Tenart wrote:
> > SFP ports do not necessarily need to have an Ethernet PHY between the
> > SoC and the actual physical port. However, the driver currently makes
> > the "phy" property mandatory, contrary to what is stated in the Device
> > Tree binding.
> > 
> > To allow handling the PPv2 controller on those boards, this patch makes
> > the PHY optional, and aligns the PPv2 driver on its device tree
> > documentation.
> 
> It is an architectural question...
> 
> but with the boards i have with an SFF port, i actually use a
> fixed-phy to represent the SFF. Then nothing special is needed.

I was not aware of the fixed-phy, that might work for us here.
Thanks for the hint!

> Also, Russell King posted his phylink patches. Once accepted, you are
> going to want to re-write some of this to make use of that code.

And there's that as well.

Thanks!
Antoine

-- 
Antoine Ténart, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com




Re: [PATCH net-next 03/18] net: mvpp2: set the SMI PHY address when connecting to the PHY

2017-07-27 Thread Antoine Tenart
Hi Andrew,

On Wed, Jul 26, 2017 at 06:08:06PM +0200, Andrew Lunn wrote:
> On Mon, Jul 24, 2017 at 03:48:33PM +0200, Antoine Tenart wrote:
> >  
> > +   if (priv->hw_version != MVPP22)
> > +   return 0;
> > +
> > +   /* Set the SMI PHY address */
> > +   if (of_property_read_u32(port->phy_node, "reg", &phy_addr)) {
> > +   netdev_err(port->dev, "cannot find the PHY address\n");
> > +   return -EINVAL;
> > +   }
> > +
> > +   writel(phy_addr, priv->iface_base + MVPP22_SMI_PHY_ADDR(port->gop_id));
> > return 0;
> >  }
> 
> You could use phy_dev->mdiodev->addr, rather than parse the DT.

OK.

> Why does the MAC need to know this address? The phylib and PHY driver
> should be the only thing accessing the PHY, otherwise you are asking
> for trouble.

This is part of the SMI/xSMI interface. I added it to the mvpp2 driver
rather than the mvmdio one because the GoP port number must be known to
set this register (so doing it in mvmdio would be even less clean).

> What if the PHY is hanging off some other mdio bus? I've got a
> freescale board with dual ethernets and a Marvell switch on the
> hardware MDIO bus and a PHY on a bit-banging MDIO bus.

Then it wouldn't be controlled by the PPv2 SMI/xSMI interface, so we
wouldn't need to set this register.

Thanks!
Antoine

-- 
Antoine Ténart, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com




Re: [PATCH net-next 04/18] net: mvpp2: move the mii configuration in the ndo_open path

2017-07-27 Thread Antoine Tenart
Hi Andrew,

On Wed, Jul 26, 2017 at 06:11:11PM +0200, Andrew Lunn wrote:
> On Mon, Jul 24, 2017 at 03:48:34PM +0200, Antoine Tenart wrote:
> > This moves the mii configuration in the ndo_open path, to allow handling
> > different mii configurations later and to switch between these
> > configurations at runtime.
> > 
> > Signed-off-by: Antoine Tenart 
> > ---
> >  drivers/net/ethernet/marvell/mvpp2.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/net/ethernet/marvell/mvpp2.c 
> > b/drivers/net/ethernet/marvell/mvpp2.c
> > index 6929b22a..9d204ffb9b89 100644
> > --- a/drivers/net/ethernet/marvell/mvpp2.c
> > +++ b/drivers/net/ethernet/marvell/mvpp2.c
> > @@ -5862,6 +5862,7 @@ static void mvpp2_start_dev(struct mvpp2_port *port)
> > /* Enable interrupts on all CPUs */
> > mvpp2_interrupts_enable(port);
> >  
> > +   mvpp2_port_mii_set(port);
> 
> You probably should take a look at mvpp2_port_mii_set() and have it
> handle all PHY_INTERFACE_MODE_RGMII variants.

I'll have a look at these variants (and update the whole series).

Thanks!
Antoine

-- 
Antoine Ténart, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com




Re: [PATCH v4 net-next] net: systemport: Support 64bit statistics

2017-07-27 Thread Florian Fainelli
On 07/27/2017 05:43 PM, Jianming.qiao wrote:
> When using the Broadcom Systemport device on a 32-bit platform, ifconfig
> can only report TX/RX statistics up to 4G; the counters wrap to 0 once the
> number of incoming or outgoing packets exceeds 4G, which takes only
> around 2 hours in a busy network environment (such as streaming).
> This makes it hard for network diagnostic tools to get reliable
> statistics, so this patch adds 64-bit statistics support for the
> Broadcom Systemport device on 32-bit platforms.
> 
> Signed-off-by: Jianming.qiao 
> ---
>  drivers/net/ethernet/broadcom/bcmsysport.c | 74 
> --
>  drivers/net/ethernet/broadcom/bcmsysport.h |  9 +++-
>  2 files changed, 57 insertions(+), 26 deletions(-)
> 
> diff --git a/drivers/net/ethernet/broadcom/bcmsysport.c 
> b/drivers/net/ethernet/broadcom/bcmsysport.c
> index 5274501..16cd8a6 100644
> --- a/drivers/net/ethernet/broadcom/bcmsysport.c
> +++ b/drivers/net/ethernet/broadcom/bcmsysport.c
> @@ -662,6 +662,7 @@ static int bcm_sysport_alloc_rx_bufs(struct 
> bcm_sysport_priv *priv)
>  static unsigned int bcm_sysport_desc_rx(struct bcm_sysport_priv *priv,
>   unsigned int budget)
>  {
> + struct bcm_sysport_stats *stats64 = &priv->stats64;
>   struct net_device *ndev = priv->netdev;
>   unsigned int processed = 0, to_process;
>   struct bcm_sysport_cb *cb;
> @@ -765,6 +766,10 @@ static unsigned int bcm_sysport_desc_rx(struct 
> bcm_sysport_priv *priv,
>   skb->protocol = eth_type_trans(skb, ndev);
>   ndev->stats.rx_packets++;
>   ndev->stats.rx_bytes += len;
> + u64_stats_update_begin(&stats64->syncp);
> + stats64->rx_packets++;
> + stats64->rx_bytes += len;
> + u64_stats_update_end(&stats64->syncp);
>  
>   napi_gro_receive(&priv->napi, skb);
>  next:
> @@ -784,24 +789,32 @@ static void bcm_sysport_tx_reclaim_one(struct 
> bcm_sysport_tx_ring *ring,
>  unsigned int *pkts_compl)
>  {
>   struct bcm_sysport_priv *priv = ring->priv;
> + struct bcm_sysport_stats *stats64 = &priv->stats64;
>   struct device *kdev = &priv->pdev->dev;
> + unsigned int len = 0;
>  
>   if (cb->skb) {
> - ring->bytes += cb->skb->len;
> - *bytes_compl += cb->skb->len;
> + len = cb->skb->len;
> + *bytes_compl += len;
>   dma_unmap_single(kdev, dma_unmap_addr(cb, dma_addr),
>dma_unmap_len(cb, dma_len),
>DMA_TO_DEVICE);
> - ring->packets++;
>   (*pkts_compl)++;
> - bcm_sysport_free_cb(cb);
>   /* SKB fragment */
>   } else if (dma_unmap_addr(cb, dma_addr)) {
> - ring->bytes += dma_unmap_len(cb, dma_len);
> + len = dma_unmap_len(cb, dma_len);
>   dma_unmap_page(kdev, dma_unmap_addr(cb, dma_addr),
>  dma_unmap_len(cb, dma_len), DMA_TO_DEVICE);
>   dma_unmap_addr_set(cb, dma_addr, 0);
>   }
> +
> + u64_stats_update_begin(&stats64->syncp);
> + ring->bytes += len;
> + if (cb->skb) {
> + ring->packets++;
> + bcm_sysport_free_cb(cb);

This does look better, but we should probably just call
bcm_sysport_free_cb() outside of the statistics update, so something
like this instead:

	u64_stats_update_begin(&stats64->syncp);
	ring->bytes += len;
	if (cb->skb)
		ring->packets++;
	u64_stats_update_end(&stats64->syncp);

	if (cb->skb)
		bcm_sysport_free_cb(cb);
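
To illustrate why only the counter updates belong between the begin/end
calls, here is a simplified, single-threaded model of a sequence-counter
scheme of this kind. It is a sketch under assumptions: the toy_* names are
hypothetical, there are no memory barriers, and it does not reproduce the
kernel implementation. The point is only that a reader has to retry whenever
a writer was active, so the write section should stay as short as possible.

```c
#include <assert.h>
#include <stdint.h>

struct toy_syncp {
	unsigned int seq;	/* odd while an update is in progress */
};

struct toy_ring {
	struct toy_syncp syncp;
	uint64_t bytes;
	uint64_t packets;
};

static void toy_update_begin(struct toy_syncp *s) { s->seq++; }
static void toy_update_end(struct toy_syncp *s) { s->seq++; }

static unsigned int toy_fetch_begin(const struct toy_syncp *s)
{
	return s->seq;
}

/* A reader must retry if it started mid-update or the counter moved. */
static int toy_fetch_retry(const struct toy_syncp *s, unsigned int start)
{
	return (start & 1u) || s->seq != start;
}

/* Writer: only the counter updates sit inside the begin/end pair. */
static void toy_account(struct toy_ring *ring, unsigned int len, int is_skb)
{
	toy_update_begin(&ring->syncp);
	ring->bytes += len;
	if (is_skb)
		ring->packets++;
	toy_update_end(&ring->syncp);

	/* Any cleanup (freeing the buffer, etc.) would happen out here. */
	assert((ring->syncp.seq & 1u) == 0);
}
```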

Or maybe just do the 64-bit statistics update outside of
bcm_sysport_tx_reclaim_one(). For a given TX clean run we can't possibly
wrap the two 32-bit counters (pkts_compl and bytes_compl): we have at most
1536 TX descriptors, and even if they were all 9000 bytes that would still
be well within 4GB. So something like this maybe:

diff --git a/drivers/net/ethernet/broadcom/bcmsysport.c
b/drivers/net/ethernet/broadcom/bcmsysport.c
index 5333601f855f..c085deef61ee 100644
--- a/drivers/net/ethernet/broadcom/bcmsysport.c
+++ b/drivers/net/ethernet/broadcom/bcmsysport.c
@@ -787,17 +787,14 @@ static void bcm_sysport_tx_reclaim_one(struct
bcm_sysport_tx_ring *ring,
struct device *kdev = &priv->pdev->dev;

if (cb->skb) {
-   ring->bytes += cb->skb->len;
*bytes_compl += cb->skb->len;
dma_unmap_single(kdev, dma_unmap_addr(cb, dma_addr),
 dma_unmap_len(cb, dma_len),
 DMA_TO_DEVICE);
-   ring->packets++;
(*pkts_compl)++;
bcm_sysport_free_cb(cb);
/* SKB fragment */
} else if (dma_unmap_addr(cb, dma_addr)) {
-   ring->bytes += dma_unmap_len(cb, dma_len);
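
The wrap-around bound argued above (at most 1536 TX descriptors of up to
9000 bytes each per clean run) can be checked with a short calculation. The
helper names below are hypothetical and exist only for this worked example.

```c
#include <assert.h>
#include <stdint.h>

/* Worst-case bytes completed in one reclaim run, computed in 64 bits. */
static uint64_t toy_max_reclaim_bytes(uint32_t n_desc, uint32_t frame_bytes)
{
	assert(n_desc > 0 && frame_bytes > 0);
	return (uint64_t)n_desc * frame_bytes;
}

/* True if the value can be held in a 32-bit counter without wrapping. */
static int toy_fits_in_u32(uint64_t value)
{
	return value <= UINT32_MAX;
}
```

1536 * 9000 is 13,824,000 bytes, far below the 4,294,967,295 limit of a
32-bit counter, so accumulating per-run totals in 32 bits and folding them
into the 64-bit counters afterwards does not lose anything.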

[PATCH v4 net-next] net: systemport: Support 64bit statistics

2017-07-27 Thread Jianming.qiao
When using the Broadcom Systemport device on a 32-bit platform, ifconfig
can only report TX/RX statistics up to 4G; the counters wrap to 0 once the
number of incoming or outgoing packets exceeds 4G, which takes only
around 2 hours in a busy network environment (such as streaming).
This makes it hard for network diagnostic tools to get reliable
statistics, so this patch adds 64-bit statistics support for the
Broadcom Systemport device on 32-bit platforms.

Signed-off-by: Jianming.qiao 
---
 drivers/net/ethernet/broadcom/bcmsysport.c | 74 --
 drivers/net/ethernet/broadcom/bcmsysport.h |  9 +++-
 2 files changed, 57 insertions(+), 26 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bcmsysport.c 
b/drivers/net/ethernet/broadcom/bcmsysport.c
index 5274501..16cd8a6 100644
--- a/drivers/net/ethernet/broadcom/bcmsysport.c
+++ b/drivers/net/ethernet/broadcom/bcmsysport.c
@@ -662,6 +662,7 @@ static int bcm_sysport_alloc_rx_bufs(struct 
bcm_sysport_priv *priv)
 static unsigned int bcm_sysport_desc_rx(struct bcm_sysport_priv *priv,
unsigned int budget)
 {
+   struct bcm_sysport_stats *stats64 = &priv->stats64;
struct net_device *ndev = priv->netdev;
unsigned int processed = 0, to_process;
struct bcm_sysport_cb *cb;
@@ -765,6 +766,10 @@ static unsigned int bcm_sysport_desc_rx(struct 
bcm_sysport_priv *priv,
skb->protocol = eth_type_trans(skb, ndev);
ndev->stats.rx_packets++;
ndev->stats.rx_bytes += len;
+   u64_stats_update_begin(&stats64->syncp);
+   stats64->rx_packets++;
+   stats64->rx_bytes += len;
+   u64_stats_update_end(&stats64->syncp);
 
napi_gro_receive(&priv->napi, skb);
 next:
@@ -784,24 +789,32 @@ static void bcm_sysport_tx_reclaim_one(struct 
bcm_sysport_tx_ring *ring,
   unsigned int *pkts_compl)
 {
struct bcm_sysport_priv *priv = ring->priv;
+   struct bcm_sysport_stats *stats64 = &priv->stats64;
struct device *kdev = &priv->pdev->dev;
+   unsigned int len = 0;
 
if (cb->skb) {
-   ring->bytes += cb->skb->len;
-   *bytes_compl += cb->skb->len;
+   len = cb->skb->len;
+   *bytes_compl += len;
dma_unmap_single(kdev, dma_unmap_addr(cb, dma_addr),
 dma_unmap_len(cb, dma_len),
 DMA_TO_DEVICE);
-   ring->packets++;
(*pkts_compl)++;
-   bcm_sysport_free_cb(cb);
/* SKB fragment */
} else if (dma_unmap_addr(cb, dma_addr)) {
-   ring->bytes += dma_unmap_len(cb, dma_len);
+   len = dma_unmap_len(cb, dma_len);
dma_unmap_page(kdev, dma_unmap_addr(cb, dma_addr),
   dma_unmap_len(cb, dma_len), DMA_TO_DEVICE);
dma_unmap_addr_set(cb, dma_addr, 0);
}
+
+   u64_stats_update_begin(&stats64->syncp);
+   ring->bytes += len;
+   if (cb->skb) {
+   ring->packets++;
+   bcm_sysport_free_cb(cb);
+   }
+   u64_stats_update_end(&stats64->syncp);
 }
 
 /* Reclaim queued SKBs for transmission completion, lockless version */
@@ -1671,24 +1684,6 @@ static int bcm_sysport_change_mac(struct net_device 
*dev, void *p)
return 0;
 }
 
-static struct net_device_stats *bcm_sysport_get_nstats(struct net_device *dev)
-{
-   struct bcm_sysport_priv *priv = netdev_priv(dev);
-   unsigned long tx_bytes = 0, tx_packets = 0;
-   struct bcm_sysport_tx_ring *ring;
-   unsigned int q;
-
-   for (q = 0; q < dev->num_tx_queues; q++) {
-   ring = &priv->tx_rings[q];
-   tx_bytes += ring->bytes;
-   tx_packets += ring->packets;
-   }
-
-   dev->stats.tx_bytes = tx_bytes;
-   dev->stats.tx_packets = tx_packets;
-   return &dev->stats;
-}
-
 static void bcm_sysport_netif_start(struct net_device *dev)
 {
struct bcm_sysport_priv *priv = netdev_priv(dev);
@@ -1923,6 +1918,37 @@ static int bcm_sysport_stop(struct net_device *dev)
return 0;
 }
 
+static void bcm_sysport_get_stats64(struct net_device *dev,
+   struct rtnl_link_stats64 *stats)
+{
+   struct bcm_sysport_priv *priv = netdev_priv(dev);
+   struct bcm_sysport_stats *stats64 = &priv->stats64;
+   struct bcm_sysport_tx_ring *ring;
+   u64 tx_packets = 0, tx_bytes = 0;
+   unsigned int start;
+   unsigned int q;
+
+   netdev_stats_to_stats64(stats, &dev->stats);
+
+   for (q = 0; q < dev->num_tx_queues; q++) {
+   ring = &priv->tx_rings[q];
+   do {
+   start = u64_stats_fetch_begin_irq(&stats64->syncp);
+   tx_bytes += ring->bytes;
+   tx_packets += ring->packets;
+   

Re: [PATCH v3 net-next 0/2] liquidio: standardization and cleanup

2017-07-27 Thread David Miller
From: Felix Manlunas 
Date: Wed, 26 Jul 2017 12:09:45 -0700

> From: Rick Farrington 
> 
> This patchset corrects some non-standard macro usage.
> 
> 1. Replaced custom MIN macro with use of standard 'min_t'.
> 2. Removed cryptic and misleading macro 'CAST_ULL'.
> 
> change log:
> V1 -> V2:
>   1. Add driver cleanup of macro 'CAST_ULL'.
> V2 -> V3:
>   1. Remove extra parentheses from previous usage of macro 'CAST_ULL'.

Series applied, thank you.
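
For readers unfamiliar with min_t, here is a minimal userspace sketch of
the idea: both operands are converted to one explicitly named type before
the comparison. This is an approximation for illustration, not the kernel's
definition; it relies on GNU C statement expressions (gcc/clang), and
clamp_len() is a hypothetical helper that does not come from the liquidio
driver.

```c
#include <assert.h>

/* Userspace approximation of min_t(): evaluate each argument once,
 * convert both to 'type', then compare.  Needs GNU C statement
 * expressions, so it is a gcc/clang-only sketch.
 */
#define min_t(type, x, y) ({			\
	type min_a_ = (x);			\
	type min_b_ = (y);			\
	min_a_ < min_b_ ? min_a_ : min_b_; })

/* Hypothetical helper: limit a requested length to a buffer size.
 * The comparison is done in 'unsigned int', so a negative request
 * converts to a very large value and the buffer size wins.
 */
static unsigned int clamp_len(int requested, unsigned int bufsize)
{
	return min_t(unsigned int, requested, bufsize);
}
```

Naming the type at the call site makes the conversion visible to the
reader instead of leaving it to the usual arithmetic conversions.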


Re: [PATCH net-next] net: phy: Remove stale comments referencing timer

2017-07-27 Thread David Miller
From: Florian Fainelli 
Date: Wed, 26 Jul 2017 12:05:38 -0700

> Since commit a390d1f379cf ("phylib: convert state_queue work to
> delayed_work"), the PHYLIB state machine was converted to use delayed
> workqueues, yet some functions were still referencing the PHY library
> timer in their comments, fix that and remove the now unused
> linux/timer.h include.
> 
> Signed-off-by: Florian Fainelli 

Applied.


Re: [PATCH net-next 0/3] nfp: extend firmware request logic

2017-07-27 Thread David Miller
From: Jakub Kicinski 
Date: Wed, 26 Jul 2017 11:09:45 -0700

> We have been pondering for some time how to support loading different
> application firmwares onto NFP.  We want to support both users selecting
> one of the firmware images provided by Netronome (which are optimized
> for different use cases each) as well as firmware images created  by 
> users themselves or other companies.
> 
> In the end we decided to go with the simplest solution - depending on
> the right firmware image being placed in /lib/firmware.  This vastly
> simplifies driver logic and also doesn't require any new API.
> 
> Different NICs on one system may want to run different applications
> therefore we first try to load firmware specific to the device (by 
> serial number or PCI slot) and if not present try the device model
> based name we have been using so far.

Series applied, thanks Jakub.
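
The lookup order described in the cover letter (device-specific name first,
model-based name as fallback) can be sketched roughly as follows. This is
only an illustration under assumptions: the "demo/..." file names,
pick_firmware(), fw_exists_fn and demo_exists() are hypothetical and are
not the names or helpers used by the nfp driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical callback: nonzero if a firmware file of this name exists. */
typedef int (*fw_exists_fn)(const char *name);

/* Stand-in for a real lookup: exactly one name is "present". */
static const char *demo_present;

static int demo_exists(const char *name)
{
	return demo_present && strcmp(name, demo_present) == 0;
}

/* Try the most specific name first and fall back to the model name.
 * On success the chosen name is left in 'out' and 0 is returned;
 * otherwise 'out' is emptied and -1 is returned.
 */
static int pick_firmware(const char *serial, const char *pci_slot,
			 const char *model, fw_exists_fn exists,
			 char *out, size_t len)
{
	const char *kind[] = { "serial", "pci", "model" };
	const char *key[] = { serial, pci_slot, model };
	size_t i;

	for (i = 0; i < 3; i++) {
		int n;

		if (!key[i])
			continue;
		n = snprintf(out, len, "demo/%s-%s.fw", kind[i], key[i]);
		if (n < 0 || (size_t)n >= len)
			continue;	/* name did not fit, skip it */
		if (exists(out))
			return 0;
	}
	if (len)
		out[0] = '\0';
	return -1;
}
```

With only the model-based file present, the first two candidates miss and
the third one is returned, which is the fallback behaviour the cover
letter describes.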


Re: [PATCH 1/2] net: phy: rework Kconfig settings for MDIO_BUS

2017-07-27 Thread David Miller
From: Arnd Bergmann 
Date: Wed, 26 Jul 2017 17:13:59 +0200

> I still see build errors in randconfig builds and have had this
> patch for a while to locally work around it:
> 
> drivers/built-in.o: In function `xgene_mdio_probe':
> mux-core.c:(.text+0x352154): undefined reference to `of_mdiobus_register'
> mux-core.c:(.text+0x352168): undefined reference to `mdiobus_free'
> mux-core.c:(.text+0x3521c0): undefined reference to `mdiobus_alloc_size'
> 
> The idea is that CONFIG_MDIO_BUS now reflects whether the mdio_bus
> code is built-in or a module, and other drivers that use the core
> code can simply depend on that, instead of having a complex
> dependency line.
> 
> Fixes: 90eff9096c01 ("net: phy: Allow splitting MDIO bus/device support from PHYs")
> Signed-off-by: Arnd Bergmann 
> Reviewed-by: Florian Fainelli 

Applied.


Re: [PATCH 2/2] phy: bcm-ns-usb3: fix MDIO_BUS dependency

2017-07-27 Thread David Miller
From: Arnd Bergmann 
Date: Wed, 26 Jul 2017 17:14:00 +0200

> The driver attempts to 'select MDIO_DEVICE', but the code
> is actually a loadable module when PHYLIB=m:
> 
> drivers/phy/broadcom/phy-bcm-ns-usb3.o: In function `bcm_ns_usb3_mdiodev_phy_write':
> phy-bcm-ns-usb3.c:(.text.bcm_ns_usb3_mdiodev_phy_write+0x28): undefined reference to `mdiobus_write'
> drivers/phy/broadcom/phy-bcm-ns-usb3.o: In function `bcm_ns_usb3_module_exit':
> phy-bcm-ns-usb3.c:(.exit.text+0x18): undefined reference to `mdio_driver_unregister'
> drivers/phy/broadcom/phy-bcm-ns-usb3.o: In function `bcm_ns_usb3_module_init':
> phy-bcm-ns-usb3.c:(.init.text+0x18): undefined reference to `mdio_driver_register'
> phy-bcm-ns-usb3.c:(.init.text+0x38): undefined reference to `mdio_driver_unregister'
> 
> Using 'depends on MDIO_BUS' instead will avoid the link error.
> 
> Fixes: af850e14a7ae ("phy: bcm-ns-usb3: add MDIO driver using proper bus layer")
> Signed-off-by: Arnd Bergmann 
> Reviewed-by: Florian Fainelli 

Applied.


[PATCH net-next v2 1/3] net: ethtool: add support for forward error correction modes

2017-07-27 Thread Roopa Prabhu
From: Vidya Sagar Ravipati 

Forward Error Correction (FEC) modes, i.e. Base-R and Reed-Solomon (RS),
were introduced in the 25G/40G/100G standards to provide a good bit
error rate (BER) at high speeds. Various networking devices that support
25G/40G/100G provide the ability to manage the supported FEC modes, and
the lack of FEC encoding control and reporting today is a source of
interoperability issues for many vendors. FEC capability, as well as a
specific FEC mode (Base-R or RS), can be requested or advertised through
bits D44:47 of the base link codeword.

This patch set provides options under ethtool to manage and report FEC
encoding settings for networking devices as per the IEEE 802.3bj,
802.3bm and 802.3by specs.

The set-fec/show-fec options are designed to control and report the FEC
encoding on the link.

SET FEC option:
root@tor: ethtool --set-fec  swp1 encoding [off | RS | BaseR | auto]

Encoding: Types of encoding
Off:  Turning off any encoding
RS :  enforcing RS-FEC encoding on supported speeds
BaseR  :  enforcing Base R encoding on supported speeds
Auto   :  IEEE defaults for the speed/medium combination

Here are a few examples of what we would expect if encoding=auto:
- if autoneg is on, we expect FEC to be negotiated as on or off,
  as long as the protocol supports it
- if the hardware is capable of detecting the FEC encoding on its
  receiver, it will reconfigure its encoder to match
- in the absence of the above, the configuration would be set to IEEE
  defaults.

From our understanding, this is essentially what most hardware/driver
combinations are doing today in the absence of a way for users to
control the behavior.
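
The three encoding=auto cases above could be sketched as a small decision
function. This is only an illustration, not code from this patch or from
any driver: enum fec_mode, struct fec_auto_inputs and resolve_auto_fec()
are hypothetical names, and checking autoneg before receiver detection is
an assumption about the ordering.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical FEC modes, for illustration only. */
enum fec_mode { FEC_MODE_OFF, FEC_MODE_BASER, FEC_MODE_RS };

/* Hypothetical inputs a driver might have when encoding=auto. */
struct fec_auto_inputs {
	bool autoneg_on;		/* link autonegotiation enabled */
	enum fec_mode negotiated;	/* autoneg result, if autoneg_on */
	bool rx_detect_capable;		/* hw can detect FEC on its receiver */
	enum fec_mode rx_detected;	/* detected mode, if capable */
	enum fec_mode ieee_default;	/* default for this speed/medium */
};

/* Resolve "auto" in the order listed above (assumed priority):
 * negotiated result, then receiver detection, then the IEEE default.
 */
static enum fec_mode resolve_auto_fec(const struct fec_auto_inputs *in)
{
	if (in->autoneg_on)
		return in->negotiated;
	if (in->rx_detect_capable)
		return in->rx_detected;
	return in->ieee_default;
}
```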

SHOW FEC option:
root@tor: ethtool --show-fec  swp1
FEC parameters for swp1:
Active FEC encodings: RS
Configured FEC encodings:  RS | BaseR

ETHTOOL DEVNAME output modification:

ethtool devname output:
root@tor:~# ethtool swp1
Settings for swp1:
root@hpe-7712-03:~# ethtool swp18
Settings for swp18:
Supported ports: [ FIBRE ]
Supported link modes:   40000baseCR4/Full
                        40000baseSR4/Full
                        40000baseLR4/Full
                        100000baseSR4/Full
                        100000baseCR4/Full
                        100000baseLR4_ER4/Full
Supported pause frame use: No
Supports auto-negotiation: Yes
Supported FEC modes: [RS | BaseR | None | Not reported]
Advertised link modes:  Not reported
Advertised pause frame use: No
Advertised auto-negotiation: No
Advertised FEC modes: [RS | BaseR | None | Not reported]
 One or more FEC modes
Speed: 100000Mb/s
Duplex: Full
Port: FIBRE
PHYAD: 106
Transceiver: internal
Auto-negotiation: off
Link detected: yes

This patch includes the following changes:
a) New ETHTOOL_GFECPARAM/ETHTOOL_SFECPARAM API, handled by
  the new get_fecparam/set_fecparam callbacks, provides support
  for configuration of forward error correction modes.
b) Link mode bits for FEC modes, i.e. None (no FEC mode), RS and BaseR/FC,
  are defined so that users can configure these FEC modes for the supported
  and advertising fields as part of link autonegotiation.

Signed-off-by: Vidya Sagar Ravipati 
Signed-off-by: Dustin Byford 
Signed-off-by: Roopa Prabhu 
---
 include/linux/ethtool.h  |  4 
 include/uapi/linux/ethtool.h | 48 +++-
 net/core/ethtool.c   | 34 +++
 3 files changed, 85 insertions(+), 1 deletion(-)

diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
index 83cc986..afdbb70 100644
--- a/include/linux/ethtool.h
+++ b/include/linux/ethtool.h
@@ -374,5 +374,9 @@ struct ethtool_ops {
  struct ethtool_link_ksettings *);
int (*set_link_ksettings)(struct net_device *,
  const struct ethtool_link_ksettings *);
+   int (*get_fecparam)(struct net_device *,
+ struct ethtool_fecparam *);
+   int (*set_fecparam)(struct net_device *,
+ struct ethtool_fecparam *);
 };
 #endif /* _LINUX_ETHTOOL_H */
diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h
index 7d4a594..9c041da 100644
--- a/include/uapi/linux/ethtool.h
+++ b/include/uapi/linux/ethtool.h
@@ -1238,6 +1238,47 @@ struct ethtool_per_queue_op {
char    data[];
 };
 
+/**
+ * struct ethtool_fecparam - Ethernet forward error correction(fec) parameters
+ * @cmd: Command number = %ETHTOOL_GFECPARAM or %ETHTOOL_SFECPARAM
+ * @active_fec: FEC mode which is active on port
+ * @fec: Bitmask of supported/configured FEC modes
+ * @rsvd: Reserved for future extensions. i.e FEC bypass feature.
+ *
+ * Drivers should reject a non-zero setting of @autoneg when
+ * autonegotiation is disabled (or not supported) for the link.
+ *
+ */
+struct ethtool_fecparam {
+   __u32   cmd;
+   /* bitmask of FEC modes */
+   __u32   active_fec;

[PATCH net-next v2 3/3] cxgb4: ethtool forward error correction management support

2017-07-27 Thread Roopa Prabhu
From: Casey Leedom 

Signed-off-by: Casey Leedom 
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_ethtool.c | 100 +
 1 file changed, 100 insertions(+)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_ethtool.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_ethtool.c
index 26eb00a..03f593e 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_ethtool.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_ethtool.c
@@ -801,6 +801,104 @@ static int set_link_ksettings(struct net_device *dev,
return ret;
 }
 
+/* Translate the Firmware FEC value into the ethtool value. */
+static inline unsigned int fwcap_to_eth_fec(unsigned int fw_fec)
+{
+   unsigned int eth_fec = 0;
+
+   if (fw_fec & FW_PORT_CAP_FEC_RS)
+   eth_fec |= ETHTOOL_FEC_RS;
+   if (fw_fec & FW_PORT_CAP_FEC_BASER_RS)
+   eth_fec |= ETHTOOL_FEC_BASER;
+
+   /* if nothing is set, then FEC is off */
+   if (!eth_fec)
+   eth_fec = ETHTOOL_FEC_OFF;
+
+   return eth_fec;
+}
+
+/* Translate Common Code FEC value into ethtool value. */
+static inline unsigned int cc_to_eth_fec(unsigned int cc_fec)
+{
+   unsigned int eth_fec = 0;
+
+   if (cc_fec & FEC_AUTO)
+   eth_fec |= ETHTOOL_FEC_AUTO;
+   if (cc_fec & FEC_RS)
+   eth_fec |= ETHTOOL_FEC_RS;
+   if (cc_fec & FEC_BASER_RS)
+   eth_fec |= ETHTOOL_FEC_BASER;
+
+   /* if nothing is set, then FEC is off */
+   if (!eth_fec)
+   eth_fec = ETHTOOL_FEC_OFF;
+
+   return eth_fec;
+}
+
+/* Translate ethtool FEC value into Common Code value. */
+static inline unsigned int eth_to_cc_fec(unsigned int eth_fec)
+{
+   unsigned int cc_fec = 0;
+
+   if (eth_fec & ETHTOOL_FEC_OFF)
+   return cc_fec;
+
+   if (eth_fec & ETHTOOL_FEC_AUTO)
+   cc_fec |= FEC_AUTO;
+   if (eth_fec & ETHTOOL_FEC_RS)
+   cc_fec |= FEC_RS;
+   if (eth_fec & ETHTOOL_FEC_BASER)
+   cc_fec |= FEC_BASER_RS;
+
+   return cc_fec;
+}
+
+static int get_fecparam(struct net_device *dev, struct ethtool_fecparam *fec)
+{
+   const struct port_info *pi = netdev_priv(dev);
+   const struct link_config *lc = &pi->link_cfg;
+
+   /* Translate the Firmware FEC Support into the ethtool value.  We
+* always support IEEE 802.3 "automatic" selection of Link FEC type if
+* any FEC is supported.
+*/
+   fec->fec = fwcap_to_eth_fec(lc->supported);
+   if (fec->fec != ETHTOOL_FEC_OFF)
+   fec->fec |= ETHTOOL_FEC_AUTO;
+
+   /* Translate the current internal FEC parameters into the
+* ethtool values.
+*/
+   fec->active_fec = cc_to_eth_fec(lc->fec);
+
+   return 0;
+}
+
+static int set_fecparam(struct net_device *dev, struct ethtool_fecparam *fec)
+{
+   struct port_info *pi = netdev_priv(dev);
+   struct link_config *lc = &pi->link_cfg;
+   struct link_config old_lc;
+   int ret;
+
+   /* Save old Link Configuration in case the L1 Configure below
+* fails.
+*/
+   old_lc = *lc;
+
+   /* Try to perform the L1 Configure and return the result of that
+* effort.  If it fails, revert the attempted change.
+*/
+   lc->requested_fec = eth_to_cc_fec(fec->fec);
+   ret = t4_link_l1cfg(pi->adapter, pi->adapter->mbox,
+   pi->tx_chan, lc);
+   if (ret)
+   *lc = old_lc;
+   return ret;
+}
+
 static void get_pauseparam(struct net_device *dev,
   struct ethtool_pauseparam *epause)
 {
@@ -1255,6 +1353,8 @@ static int get_rxnfc(struct net_device *dev, struct ethtool_rxnfc *info,
 static const struct ethtool_ops cxgb_ethtool_ops = {
.get_link_ksettings = get_link_ksettings,
.set_link_ksettings = set_link_ksettings,
+   .get_fecparam  = get_fecparam,
+   .set_fecparam  = set_fecparam,
.get_drvinfo   = get_drvinfo,
.get_msglevel  = get_msglevel,
.set_msglevel  = set_msglevel,
-- 
2.1.4
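
As a reading aid for the translation helpers in this patch, the sketch
below reproduces cc_to_eth_fec() and eth_to_cc_fec() in userspace so the
handling of "off" can be checked in isolation. The numeric bit values are
assumptions chosen for illustration; the real definitions live in
include/uapi/linux/ethtool.h and in the cxgb4 common code.

```c
#include <assert.h>

/* Assumed bit values, for illustration only. */
#define ETHTOOL_FEC_AUTO	(1u << 1)
#define ETHTOOL_FEC_OFF		(1u << 2)
#define ETHTOOL_FEC_RS		(1u << 3)
#define ETHTOOL_FEC_BASER	(1u << 4)

#define FEC_AUTO		(1u << 0)
#define FEC_RS			(1u << 1)
#define FEC_BASER_RS		(1u << 2)

/* Common Code value to ethtool value; an empty mask means FEC is off. */
static unsigned int cc_to_eth_fec(unsigned int cc_fec)
{
	unsigned int eth_fec = 0;

	if (cc_fec & FEC_AUTO)
		eth_fec |= ETHTOOL_FEC_AUTO;
	if (cc_fec & FEC_RS)
		eth_fec |= ETHTOOL_FEC_RS;
	if (cc_fec & FEC_BASER_RS)
		eth_fec |= ETHTOOL_FEC_BASER;
	if (!eth_fec)
		eth_fec = ETHTOOL_FEC_OFF;

	return eth_fec;
}

/* ethtool value to Common Code value; OFF overrides every other bit. */
static unsigned int eth_to_cc_fec(unsigned int eth_fec)
{
	unsigned int cc_fec = 0;

	if (eth_fec & ETHTOOL_FEC_OFF)
		return cc_fec;
	if (eth_fec & ETHTOOL_FEC_AUTO)
		cc_fec |= FEC_AUTO;
	if (eth_fec & ETHTOOL_FEC_RS)
		cc_fec |= FEC_RS;
	if (eth_fec & ETHTOOL_FEC_BASER)
		cc_fec |= FEC_BASER_RS;

	return cc_fec;
}
```

Note that a request combining OFF with another mode resolves to "no FEC"
on the way in, while an empty Common Code mask is reported back as OFF.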



[PATCH net-next v2 0/3] ethtool: support for forward error correction mode setting on a link

2017-07-27 Thread Roopa Prabhu
From: Roopa Prabhu 

Forward Error Correction (FEC) modes, i.e. Base-R and Reed-Solomon (RS),
were introduced in the 25G/40G/100G standards to provide a good bit
error rate (BER) at high speeds. Various networking devices that support
25G/40G/100G provide the ability to manage the supported FEC modes, and
the lack of FEC encoding control and reporting today is a source of
interoperability issues for many vendors. FEC capability, as well as a
specific FEC mode (Base-R or RS), can be requested or advertised through
bits D44:47 of the base link codeword.

This patch set provides options under ethtool to manage and report FEC
encoding settings for networking devices as per the IEEE 802.3bj,
802.3bm and 802.3by specs.

v2 :
- minor patch format fixes and typos pointed out by Andrew
- there was a pending discussion on the use of 'auto' vs
  'automatic' for fec settings. I have left it as 'auto'
  because in most cases today auto is used in place of
  automatic to represent automatically generated values.
  We use it in other networking config too. I would prefer
  leaving it as auto.

Casey Leedom (2):
  cxgb4: core hardware/firmware support for Forward Error Correction on
a link
  cxgb4: ethtool forward error correction management support

Vidya Sagar Ravipati (1):
  net: ethtool: add support for forward error correction modes

 drivers/net/ethernet/chelsio/cxgb4/cxgb4_ethtool.c |  99 ++
 drivers/net/ethernet/chelsio/cxgb4/t4_hw.c | 151 -
 include/linux/ethtool.h|   4 +
 include/uapi/linux/ethtool.h   |  48 ++-
 net/core/ethtool.c |  34 +
 5 files changed, 300 insertions(+), 36 deletions(-)

-- 
2.1.4



[PATCH net-next v2 2/3] cxgb4: core hardware/firmware support for Forward Error Correction on a link

2017-07-27 Thread Roopa Prabhu
From: Casey Leedom 

Signed-off-by: Casey Leedom 
---
 drivers/net/ethernet/chelsio/cxgb4/t4_hw.c | 152 ++---
 1 file changed, 117 insertions(+), 35 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
index db41b3e..24087c8 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
@@ -3840,11 +3840,64 @@ void t4_ulprx_read_la(struct adapter *adap, u32 *la_buf)
 FW_PORT_CAP_SPEED_40G | FW_PORT_CAP_SPEED_100G | \
 FW_PORT_CAP_ANEG)
 
+/* Translate Firmware Port Capabilities Pause specification to Common Code */
+static inline unsigned int fwcap_to_cc_pause(unsigned int fw_pause)
+{
+   unsigned int cc_pause = 0;
+
+   if (fw_pause & FW_PORT_CAP_FC_RX)
+   cc_pause |= PAUSE_RX;
+   if (fw_pause & FW_PORT_CAP_FC_TX)
+   cc_pause |= PAUSE_TX;
+
+   return cc_pause;
+}
+
+/* Translate Common Code Pause specification into Firmware Port Capabilities */
+static inline unsigned int cc_to_fwcap_pause(unsigned int cc_pause)
+{
+   unsigned int fw_pause = 0;
+
+   if (cc_pause & PAUSE_RX)
+   fw_pause |= FW_PORT_CAP_FC_RX;
+   if (cc_pause & PAUSE_TX)
+   fw_pause |= FW_PORT_CAP_FC_TX;
+
+   return fw_pause;
+}
+
+/* Translate Firmware Forward Error Correction specification to Common Code */
+static inline unsigned int fwcap_to_cc_fec(unsigned int fw_fec)
+{
+   unsigned int cc_fec = 0;
+
+   if (fw_fec & FW_PORT_CAP_FEC_RS)
+   cc_fec |= FEC_RS;
+   if (fw_fec & FW_PORT_CAP_FEC_BASER_RS)
+   cc_fec |= FEC_BASER_RS;
+
+   return cc_fec;
+}
+
+/* Translate Common Code Forward Error Correction specification to Firmware */
+static inline unsigned int cc_to_fwcap_fec(unsigned int cc_fec)
+{
+   unsigned int fw_fec = 0;
+
+   if (cc_fec & FEC_RS)
+   fw_fec |= FW_PORT_CAP_FEC_RS;
+   if (cc_fec & FEC_BASER_RS)
+   fw_fec |= FW_PORT_CAP_FEC_BASER_RS;
+
+   return fw_fec;
+}
+
 /**
  * t4_link_l1cfg - apply link configuration to MAC/PHY
- * @phy: the PHY to setup
- * @mac: the MAC to setup
- * @lc: the requested link configuration
+ * @adapter: the adapter
+ * @mbox: the Firmware Mailbox to use
+ * @port: the Port ID
+ * @lc: the Port's Link Configuration
  *
  * Set up a port's MAC and PHY according to a desired link configuration.
  * - If the PHY can auto-negotiate first decide what to advertise, then
@@ -3857,22 +3910,46 @@ int t4_link_l1cfg(struct adapter *adap, unsigned int mbox, unsigned int port,
  struct link_config *lc)
 {
struct fw_port_cmd c;
-   unsigned int mdi = FW_PORT_CAP_MDI_V(FW_PORT_CAP_MDI_AUTO);
-   unsigned int fc = 0, fec = 0, fw_fec = 0;
+   unsigned int fw_mdi = FW_PORT_CAP_MDI_V(FW_PORT_CAP_MDI_AUTO);
+   unsigned int fw_fc, cc_fec, fw_fec;
+   unsigned int rcap;
 
lc->link_ok = 0;
-   if (lc->requested_fc & PAUSE_RX)
-   fc |= FW_PORT_CAP_FC_RX;
-   if (lc->requested_fc & PAUSE_TX)
-   fc |= FW_PORT_CAP_FC_TX;
 
-   fec = lc->requested_fec & FEC_AUTO ? lc->auto_fec : lc->requested_fec;
+   /* Convert driver coding of Pause Frame Flow Control settings into the
+* Firmware's API.
+*/
+   fw_fc = cc_to_fwcap_pause(lc->requested_fc);
+
+   /* Convert Common Code Forward Error Control settings into the
+* Firmware's API.  If the current Requested FEC has "Automatic"
+* (IEEE 802.3) specified, then we use whatever the Firmware
+* sent us as part of its IEEE 802.3-based interpretation of
+* the Transceiver Module EPROM FEC parameters.  Otherwise we
+* use whatever is in the current Requested FEC settings.
+*/
+   if (lc->requested_fec & FEC_AUTO)
+   cc_fec = lc->auto_fec;
+   else
+   cc_fec = lc->requested_fec;
+   fw_fec = cc_to_fwcap_fec(cc_fec);
 
-   if (fec & FEC_RS)
-   fw_fec |= FW_PORT_CAP_FEC_RS;
-   if (fec & FEC_BASER_RS)
-   fw_fec |= FW_PORT_CAP_FEC_BASER_RS;
+   /* Figure out what our Requested Port Capabilities are going to be.
+*/
+   if (!(lc->supported & FW_PORT_CAP_ANEG)) {
+   rcap = (lc->supported & ADVERT_MASK) | fw_fc | fw_fec;
+   lc->fc = lc->requested_fc & (PAUSE_RX | PAUSE_TX);
+   lc->fec = cc_fec;
+   } else if (lc->autoneg == AUTONEG_DISABLE) {
+   rcap = lc->requested_speed | fw_fc | fw_fec | fw_mdi;
+   lc->fc = lc->requested_fc & (PAUSE_RX | PAUSE_TX);
+   lc->fec = cc_fec;
+   } else {
+   rcap = lc->advertising | fw_fc | fw_fec | fw_mdi;
+   }
 
+   /* And send that on to the Firmware ...
+*/
memset(&c, 0, sizeof(c));
c.op_to_portid =

RE: [PATCH V4 net-next 2/8] net: hns3: Add support of the HNAE3 framework

2017-07-27 Thread Salil Mehta
Hi Leon

> -Original Message-
> From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-
> ow...@vger.kernel.org] On Behalf Of Leon Romanovsky
> Sent: Sunday, July 23, 2017 2:16 PM
> To: Salil Mehta
> Cc: da...@davemloft.net; Zhuangyuzeng (Yisen); huangdaode; lipeng (Y);
> mehta.salil@gmail.com; netdev@vger.kernel.org; linux-
> ker...@vger.kernel.org; linux-r...@vger.kernel.org; Linuxarm
> Subject: Re: [PATCH V4 net-next 2/8] net: hns3: Add support of the
> HNAE3 framework
> 
> On Sat, Jul 22, 2017 at 11:09:36PM +0100, Salil Mehta wrote:
> > This patch adds support for the HNAE3 (Hisilicon Network
> > Acceleration Engine 3) framework to the HNS3 driver.
> >
> > The framework allows clients like ENET (HNS3 Ethernet Driver), RoCE
> > and user-space Ethernet drivers (like ODP etc.) to register with HNAE3
> > devices and their associated operations.
> >
> > Signed-off-by: Daode Huang 
> > Signed-off-by: lipeng 
> > Signed-off-by: Salil Mehta 
> > Signed-off-by: Yisen Zhuang 
> > ---
> > Patch V4: Addressed following comments
> >   1. Andrew Lunn:
> >  https://lkml.org/lkml/2017/6/17/233
> >  https://lkml.org/lkml/2017/6/18/105
> >   2. Bo Yu:
> >  https://lkml.org/lkml/2017/6/18/112
> >   3. Stephen Hamminger:
> >  https://lkml.org/lkml/2017/6/19/778
> > Patch V3: Addressed below comments
> >   1. Andrew Lunn:
> >  https://lkml.org/lkml/2017/6/13/1025
> > Patch V2: No change
> > Patch V1: Initial Submit
> > ---
> >  drivers/net/ethernet/hisilicon/hns3/hnae3.c | 319
> >  drivers/net/ethernet/hisilicon/hns3/hnae3.h | 449
> >  create mode 100644 drivers/net/ethernet/hisilicon/hns3/hnae3.c
> >  create mode 100644 drivers/net/ethernet/hisilicon/hns3/hnae3.h
> >
> > diff --git a/drivers/net/ethernet/hisilicon/hns3/hnae3.c b/drivers/net/ethernet/hisilicon/hns3/hnae3.c
> > new file mode 100644
> > index 000000000000..7a11aaff0a23
> > --- /dev/null
> > +++ b/drivers/net/ethernet/hisilicon/hns3/hnae3.c
> > @@ -0,0 +1,319 @@
> > +/*
> > + * Copyright (c) 2016-2017 Hisilicon Limited.
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License as published by
> > + * the Free Software Foundation; either version 2 of the License, or
> > + * (at your option) any later version.
> > + */
> > +
> > +#include 
> > +#include 
> > +#include 
> > +
> > +#include "hnae3.h"
> > +
> > +static LIST_HEAD(hnae3_ae_algo_list);
> > +static LIST_HEAD(hnae3_client_list);
> > +static LIST_HEAD(hnae3_ae_dev_list);
> > +
> > +/* we are keeping things simple and using single lock for all the
> > + * list. This is a non-critical code so other updations, if happen
> > + * in parallel, can wait.
> > + */
> > +static DEFINE_MUTEX(hnae3_common_lock);
> > +
> > +static bool hnae3_client_match(enum hnae3_client_type client_type,
> > +  enum hnae3_dev_type dev_type)
> > +{
> > +   if (dev_type == HNAE3_DEV_KNIC) {
> > +   switch (client_type) {
> > +   case HNAE3_CLIENT_KNIC:
> > +   case HNAE3_CLIENT_ROCE:
> > +   return true;
> > +   default:
> > +   return false;
> > +   }
> > +   } else if (dev_type == HNAE3_DEV_UNIC) {
> > +   switch (client_type) {
> > +   case HNAE3_CLIENT_UNIC:
> > +   return true;
> > +   default:
> > +   return false;
> > +   }
> > +   } else {
> > +   return false;
> > +   }
> > +}
> > +
> > +static int hnae3_match_n_instantiate(struct hnae3_client *client,
> > +struct hnae3_ae_dev *ae_dev,
> > +bool is_reg, bool *matched)
> > +{
> > +   int ret;
> > +
> > +   *matched = false;
> > +
> > +   /* check if this client matches the type of ae_dev */
> > +   if (!(hnae3_client_match(client->type, ae_dev->dev_type) &&
> > + hnae_get_bit(ae_dev->flag, HNAE3_DEV_INITED_B))) {
> > +   return 0;
> > +   }
> > +   /* there is a match of client and dev */
> > +   *matched = true;
> > +
> > +   if (!(ae_dev->ops && ae_dev->ops->init_client_instance &&
> > + ae_dev->ops->uninit_client_instance)) {
> > +   dev_err(&ae_dev->pdev->dev,
> > +   "ae_dev or client init/uninit ops are null\n");
> > +   return -EOPNOTSUPP;
> > +   }
> > +
> > +   /* now, (un-)instantiate client by calling lower layer */
> > +   if (is_reg) {
> > +   ret = ae_dev->ops->init_client_instance(client, ae_dev);
> > +   if (ret)
> > +   dev_err(&ae_dev->pdev->dev,
> > +   "fail to instantiate client\n");
> > +   return ret;
> > +   }
> > +
> > +   ae_dev->ops->uninit_client_instance(client, ae_dev);
> > +   return 0;
> > +}
> > +
> > +int hnae3_register_client(struct hnae3_client *client)
> > +{
> 

[RFC PATCH net-next 3/6] tcp: remove low_latency sysctl

2017-07-27 Thread Florian Westphal
This option was used by the removed prequeue code; it no longer has
any effect.

Signed-off-by: Florian Westphal 
---
 Documentation/networking/ip-sysctl.txt | 7 +--
 include/net/tcp.h  | 1 -
 net/ipv4/sysctl_net_ipv4.c | 3 +++
 net/ipv4/tcp_ipv4.c| 2 --
 4 files changed, 4 insertions(+), 9 deletions(-)

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index f485d553e65c..84c9b8cee780 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -353,12 +353,7 @@ tcp_l3mdev_accept - BOOLEAN
compiled with CONFIG_NET_L3_MASTER_DEV.
 
 tcp_low_latency - BOOLEAN
-   If set, the TCP stack makes decisions that prefer lower
-   latency as opposed to higher throughput.  By default, this
-   option is not set meaning that higher throughput is preferred.
-   An example of an application where this default should be
-   changed would be a Beowulf compute cluster.
-   Default: 0
+   This is a legacy option, it has no effect anymore.
 
 tcp_max_orphans - INTEGER
Maximal number of TCP sockets not attached to any user file handle,
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 93f115cfc8f8..8507c81fb0e9 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -256,7 +256,6 @@ extern int sysctl_tcp_rmem[3];
 extern int sysctl_tcp_app_win;
 extern int sysctl_tcp_adv_win_scale;
 extern int sysctl_tcp_frto;
-extern int sysctl_tcp_low_latency;
 extern int sysctl_tcp_nometrics_save;
 extern int sysctl_tcp_moderate_rcvbuf;
 extern int sysctl_tcp_tso_win_divisor;
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 9bf809726066..0d3c038d7b04 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -45,6 +45,9 @@ static int tcp_syn_retries_max = MAX_TCP_SYNCNT;
 static int ip_ping_group_range_min[] = { 0, 0 };
 static int ip_ping_group_range_max[] = { GID_T_MAX, GID_T_MAX };
 
+/* obsolete */
+static int sysctl_tcp_low_latency __read_mostly;
+
 /* Update system visible IP port range */
 static void set_local_port_range(struct net *net, int range[2])
 {
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index a68eb4577d36..9b51663cd5a4 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -85,8 +85,6 @@
 #include 
 #include 
 
-int sysctl_tcp_low_latency __read_mostly;
-
 #ifdef CONFIG_TCP_MD5SIG
 static int tcp_v4_md5_hash_hdr(char *md5_hash, const struct tcp_md5sig_key *key,
   __be32 daddr, __be32 saddr, const struct tcphdr *th);
-- 
2.13.0



[RFC PATCH net-next 2/6] tcp: reindent two spots after prequeue removal

2017-07-27 Thread Florian Westphal
These two branches are now always taken, so remove the conditionals.
objdiff shows no changes.

Signed-off-by: Florian Westphal 
---
 net/ipv4/tcp_input.c | 50 +++---
 1 file changed, 23 insertions(+), 27 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 770ce6cb3eca..87efde9f5a90 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4611,16 +4611,14 @@ static void tcp_data_queue(struct sock *sk, struct sk_buff *skb)
goto out_of_window;
 
/* Ok. In sequence. In window. */
-   if (eaten <= 0) {
 queue_and_out:
-   if (eaten < 0) {
-   if (skb_queue_len(&sk->sk_receive_queue) == 0)
-   sk_forced_mem_schedule(sk, skb->truesize);
-   else if (tcp_try_rmem_schedule(sk, skb, skb->truesize))
-   goto drop;
-   }
-   eaten = tcp_queue_rcv(sk, skb, 0, &fragstolen);
+   if (eaten < 0) {
+   if (skb_queue_len(&sk->sk_receive_queue) == 0)
+   sk_forced_mem_schedule(sk, skb->truesize);
+   else if (tcp_try_rmem_schedule(sk, skb, skb->truesize))
+   goto drop;
}
+   eaten = tcp_queue_rcv(sk, skb, 0, &fragstolen);
tcp_rcv_nxt_update(tp, TCP_SKB_CB(skb)->end_seq);
if (skb->len)
tcp_event_data_recv(sk, skb);
@@ -5410,30 +5408,28 @@ void tcp_rcv_established(struct sock *sk, struct sk_buff *skb,
int eaten = 0;
bool fragstolen = false;
 
-   if (!eaten) {
-   if (tcp_checksum_complete(skb))
-   goto csum_error;
+   if (tcp_checksum_complete(skb))
+   goto csum_error;
 
-   if ((int)skb->truesize > sk->sk_forward_alloc)
-   goto step5;
+   if ((int)skb->truesize > sk->sk_forward_alloc)
+   goto step5;
 
-   /* Predicted packet is in window by definition.
-* seq == rcv_nxt and rcv_wup <= rcv_nxt.
-* Hence, check seq<=rcv_wup reduces to:
-*/
-   if (tcp_header_len ==
-   (sizeof(struct tcphdr) + TCPOLEN_TSTAMP_ALIGNED) &&
-   tp->rcv_nxt == tp->rcv_wup)
-   tcp_store_ts_recent(tp);
+   /* Predicted packet is in window by definition.
+* seq == rcv_nxt and rcv_wup <= rcv_nxt.
+* Hence, check seq<=rcv_wup reduces to:
+*/
+   if (tcp_header_len ==
+   (sizeof(struct tcphdr) + TCPOLEN_TSTAMP_ALIGNED) &&
+   tp->rcv_nxt == tp->rcv_wup)
+   tcp_store_ts_recent(tp);
 
-   tcp_rcv_rtt_measure_ts(sk, skb);
+   tcp_rcv_rtt_measure_ts(sk, skb);
 
-   NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPHPHITS);
+   NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPHPHITS);
 
-   /* Bulk data transfer: receiver */
-   eaten = tcp_queue_rcv(sk, skb, tcp_header_len,
- &fragstolen);
-   }
+   /* Bulk data transfer: receiver */
+   eaten = tcp_queue_rcv(sk, skb, tcp_header_len,
+ &fragstolen);
 
tcp_event_data_recv(sk, skb);
 
-- 
2.13.0



[RFC PATCH net-next 5/6] tcp: remove CA_ACK_SLOWPATH

2017-07-27 Thread Florian Westphal
Re-indent tcp_ack() and remove CA_ACK_SLOWPATH; it is always set now.

Signed-off-by: Florian Westphal 
---
 include/net/tcp.h   |  5 ++---
 net/ipv4/tcp_input.c| 35 ---
 net/ipv4/tcp_westwood.c | 31 ---
 3 files changed, 22 insertions(+), 49 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 8f11b82b5b5a..3ecb62811004 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -880,9 +880,8 @@ enum tcp_ca_event {
 
 /* Information about inbound ACK, passed to cong_ops->in_ack_event() */
 enum tcp_ca_ack_event_flags {
-   CA_ACK_SLOWPATH = (1 << 0), /* In slow path processing */
-   CA_ACK_WIN_UPDATE   = (1 << 1), /* ACK updated window */
-   CA_ACK_ECE  = (1 << 2), /* ECE bit is set on ack */
+   CA_ACK_WIN_UPDATE   = (1 << 0), /* ACK updated window */
+   CA_ACK_ECE  = (1 << 1), /* ECE bit is set on ack */
 };
 
 /*
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index bfde9d7d210e..af0a98d54b62 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3547,6 +3547,7 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
u32 lost = tp->lost;
int acked = 0; /* Number of packets newly acked */
int rexmit = REXMIT_NONE; /* Flag to (re)transmit to recover losses */
+   u32 ack_ev_flags = 0;
 
sack_state.first_sackt = 0;
sack_state.rate = &rs;
@@ -3590,30 +3591,26 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
if (flag & FLAG_UPDATE_TS_RECENT)
tcp_replace_ts_recent(tp, TCP_SKB_CB(skb)->seq);
 
-   {
-   u32 ack_ev_flags = CA_ACK_SLOWPATH;
-
-   if (ack_seq != TCP_SKB_CB(skb)->end_seq)
-   flag |= FLAG_DATA;
-   else
-   NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPPUREACKS);
+   if (ack_seq != TCP_SKB_CB(skb)->end_seq)
+   flag |= FLAG_DATA;
+   else
+   NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPPUREACKS);
 
-   flag |= tcp_ack_update_window(sk, skb, ack, ack_seq);
+   flag |= tcp_ack_update_window(sk, skb, ack, ack_seq);
 
-   if (TCP_SKB_CB(skb)->sacked)
-   flag |= tcp_sacktag_write_queue(sk, skb, prior_snd_una,
-   &sack_state);
+   if (TCP_SKB_CB(skb)->sacked)
+   flag |= tcp_sacktag_write_queue(sk, skb, prior_snd_una,
+   &sack_state);
 
-   if (tcp_ecn_rcv_ecn_echo(tp, tcp_hdr(skb))) {
-   flag |= FLAG_ECE;
-   ack_ev_flags |= CA_ACK_ECE;
-   }
+   if (tcp_ecn_rcv_ecn_echo(tp, tcp_hdr(skb))) {
+   flag |= FLAG_ECE;
+   ack_ev_flags = CA_ACK_ECE;
+   }
 
-   if (flag & FLAG_WIN_UPDATE)
-   ack_ev_flags |= CA_ACK_WIN_UPDATE;
+   if (flag & FLAG_WIN_UPDATE)
+   ack_ev_flags |= CA_ACK_WIN_UPDATE;
 
-   tcp_in_ack_event(sk, ack_ev_flags);
-   }
+   tcp_in_ack_event(sk, ack_ev_flags);
 
/* We passed data and got it acked, remove any soft error
 * log. Something worked...
diff --git a/net/ipv4/tcp_westwood.c b/net/ipv4/tcp_westwood.c
index bec9cafbe3f9..e5de84310949 100644
--- a/net/ipv4/tcp_westwood.c
+++ b/net/ipv4/tcp_westwood.c
@@ -154,24 +154,6 @@ static inline void update_rtt_min(struct westwood *w)
 }
 
 /*
- * @westwood_fast_bw
- * It is called when we are in fast path. In particular it is called when
- * header prediction is successful. In such case in fact update is
- * straight forward and doesn't need any particular care.
- */
-static inline void westwood_fast_bw(struct sock *sk)
-{
-   const struct tcp_sock *tp = tcp_sk(sk);
-   struct westwood *w = inet_csk_ca(sk);
-
-   westwood_update_window(sk);
-
-   w->bk += tp->snd_una - w->snd_una;
-   w->snd_una = tp->snd_una;
-   update_rtt_min(w);
-}
-
-/*
  * @westwood_acked_count
  * This function evaluates cumul_ack for evaluating bk in case of
  * delayed or partial acks.
@@ -223,17 +205,12 @@ static u32 tcp_westwood_bw_rttmin(const struct sock *sk)
 
 static void tcp_westwood_ack(struct sock *sk, u32 ack_flags)
 {
-   if (ack_flags & CA_ACK_SLOWPATH) {
-   struct westwood *w = inet_csk_ca(sk);
-
-   westwood_update_window(sk);
-   w->bk += westwood_acked_count(sk);
+   struct westwood *w = inet_csk_ca(sk);
 
-   update_rtt_min(w);
-   return;
-   }
+   westwood_update_window(sk);
+   w->bk += westwood_acked_count(sk);
 
-   westwood_fast_bw(sk);
+   update_rtt_min(w);
 }
 
 static void tcp_westwood_event(struct sock *sk, enum tcp_ca_event event)
-- 
2.13.0



[RFC PATCH net-next 4/6] tcp: remove header prediction

2017-07-27 Thread Florian Westphal
Like prequeue, I am not sure this is overly useful nowadays.

If we receive a train of packets, GRO will aggregate them if the
headers are the same (HP predates GRO by several years) so we don't
get a per-packet benefit, only a per-aggregated-packet one.

Signed-off-by: Florian Westphal 
---
 include/linux/tcp.h  |   6 --
 include/net/tcp.h|  23 --
 net/ipv4/tcp.c   |   4 +-
 net/ipv4/tcp_input.c | 192 +++
 net/ipv4/tcp_minisocks.c |   2 -
 net/ipv4/tcp_output.c|   2 -
 6 files changed, 10 insertions(+), 219 deletions(-)

diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 32fb37cfb0d1..d7389ea36e10 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -148,12 +148,6 @@ struct tcp_sock {
u16 gso_segs;   /* Max number of segs per GSO packet*/
 
 /*
- * Header prediction flags
- * 0x5?10 << 16 + snd_wnd in net byte order
- */
-   __be32  pred_flags;
-
-/*
  * RFC793 variables by their proper names. This means you can
  * read the code and the spec side by side (and laugh ...)
  * See RFC793 and RFC1122. The RFC writes these in capitals.
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 8507c81fb0e9..8f11b82b5b5a 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -631,29 +631,6 @@ static inline u32 __tcp_set_rto(const struct tcp_sock *tp)
return usecs_to_jiffies((tp->srtt_us >> 3) + tp->rttvar_us);
 }
 
-static inline void __tcp_fast_path_on(struct tcp_sock *tp, u32 snd_wnd)
-{
-   tp->pred_flags = htonl((tp->tcp_header_len << 26) |
-  ntohl(TCP_FLAG_ACK) |
-  snd_wnd);
-}
-
-static inline void tcp_fast_path_on(struct tcp_sock *tp)
-{
-   __tcp_fast_path_on(tp, tp->snd_wnd >> tp->rx_opt.snd_wscale);
-}
-
-static inline void tcp_fast_path_check(struct sock *sk)
-{
-   struct tcp_sock *tp = tcp_sk(sk);
-
-   if (RB_EMPTY_ROOT(&tp->out_of_order_queue) &&
-   tp->rcv_wnd &&
-   atomic_read(&sk->sk_rmem_alloc) < sk->sk_rcvbuf &&
-   !tp->urg_data)
-   tcp_fast_path_on(tp);
-}
-
 /* Compute the actual rto_min value */
 static inline u32 tcp_rto_min(struct sock *sk)
 {
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 62018ea6f45f..e022874d509f 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1848,10 +1848,8 @@ int tcp_recvmsg(struct sock *sk, struct msghdr *msg, 
size_t len, int nonblock,
tcp_rcv_space_adjust(sk);
 
 skip_copy:
-   if (tp->urg_data && after(tp->copied_seq, tp->urg_seq)) {
+   if (tp->urg_data && after(tp->copied_seq, tp->urg_seq))
tp->urg_data = 0;
-   tcp_fast_path_check(sk);
-   }
if (used + offset < skb->len)
continue;
 
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 87efde9f5a90..bfde9d7d210e 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -103,7 +103,6 @@ int sysctl_tcp_invalid_ratelimit __read_mostly = HZ/2;
 #define FLAG_DATA_SACKED   0x20 /* New SACK.   
*/
 #define FLAG_ECE   0x40 /* ECE in this ACK 
*/
 #define FLAG_LOST_RETRANS  0x80 /* This ACK marks some retransmission lost 
*/
-#define FLAG_SLOWPATH  0x100 /* Do not skip RFC checks for window 
update.*/
 #define FLAG_ORIG_SACK_ACKED   0x200 /* Never retransmitted data are (s)acked  
*/
 #define FLAG_SND_UNA_ADVANCED  0x400 /* Snd_una was changed (!= 
FLAG_DATA_ACKED) */
 #define FLAG_DSACKING_ACK  0x800 /* SACK blocks contained D-SACK info */
@@ -3367,12 +3366,6 @@ static int tcp_ack_update_window(struct sock *sk, const 
struct sk_buff *skb, u32
if (tp->snd_wnd != nwin) {
tp->snd_wnd = nwin;
 
-   /* Note, it is the only place, where
-* fast path is recovered for sending TCP.
-*/
-   tp->pred_flags = 0;
-   tcp_fast_path_check(sk);
-
if (tcp_send_head(sk))
tcp_slow_start_after_idle_check(sk);
 
@@ -3597,19 +3590,7 @@ static int tcp_ack(struct sock *sk, const struct sk_buff 
*skb, int flag)
if (flag & FLAG_UPDATE_TS_RECENT)
tcp_replace_ts_recent(tp, TCP_SKB_CB(skb)->seq);
 
-   if (!(flag & FLAG_SLOWPATH) && after(ack, prior_snd_una)) {
-   /* Window is constant, pure forward advance.
-* No more checks are required.
-* Note, we use the fact that SND.UNA>=SND.WL2.
-*/
-   tcp_update_wl(tp, ack_seq);
-   tcp_snd_una_update(tp, ack);
-   flag |= FLAG_WIN_UPDATE;
-
-   tcp_in_ack_event(sk, CA_ACK_WIN_UPDATE);
-
-   NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPHPACKS);
-   } else {

[RFC PATCH net-next 6/6] tcp: remove unused mib counters

2017-07-27 Thread Florian Westphal
These counters were used by tcp prequeue; the use of TCPFORWARDRETRANS
was removed in January.

Signed-off-by: Florian Westphal 
---
 include/uapi/linux/snmp.h | 8 
 net/ipv4/proc.c   | 8 
 2 files changed, 16 deletions(-)

diff --git a/include/uapi/linux/snmp.h b/include/uapi/linux/snmp.h
index d85693295798..73c15719fd35 100644
--- a/include/uapi/linux/snmp.h
+++ b/include/uapi/linux/snmp.h
@@ -185,13 +185,7 @@ enum
LINUX_MIB_LISTENOVERFLOWS,  /* ListenOverflows */
LINUX_MIB_LISTENDROPS,  /* ListenDrops */
LINUX_MIB_TCPPREQUEUED, /* TCPPrequeued */
-   LINUX_MIB_TCPDIRECTCOPYFROMBACKLOG, /* TCPDirectCopyFromBacklog */
-   LINUX_MIB_TCPDIRECTCOPYFROMPREQUEUE,/* TCPDirectCopyFromPrequeue */
-   LINUX_MIB_TCPPREQUEUEDROPPED,   /* TCPPrequeueDropped */
-   LINUX_MIB_TCPHPHITS,/* TCPHPHits */
-   LINUX_MIB_TCPHPHITSTOUSER,  /* TCPHPHitsToUser */
LINUX_MIB_TCPPUREACKS,  /* TCPPureAcks */
-   LINUX_MIB_TCPHPACKS,/* TCPHPAcks */
LINUX_MIB_TCPRENORECOVERY,  /* TCPRenoRecovery */
LINUX_MIB_TCPSACKRECOVERY,  /* TCPSackRecovery */
LINUX_MIB_TCPSACKRENEGING,  /* TCPSACKReneging */
@@ -208,14 +202,12 @@ enum
LINUX_MIB_TCPSACKFAILURES,  /* TCPSackFailures */
LINUX_MIB_TCPLOSSFAILURES,  /* TCPLossFailures */
LINUX_MIB_TCPFASTRETRANS,   /* TCPFastRetrans */
-   LINUX_MIB_TCPFORWARDRETRANS,/* TCPForwardRetrans */
LINUX_MIB_TCPSLOWSTARTRETRANS,  /* TCPSlowStartRetrans */
LINUX_MIB_TCPTIMEOUTS,  /* TCPTimeouts */
LINUX_MIB_TCPLOSSPROBES,/* TCPLossProbes */
LINUX_MIB_TCPLOSSPROBERECOVERY, /* TCPLossProbeRecovery */
LINUX_MIB_TCPRENORECOVERYFAIL,  /* TCPRenoRecoveryFail */
LINUX_MIB_TCPSACKRECOVERYFAIL,  /* TCPSackRecoveryFail */
-   LINUX_MIB_TCPSCHEDULERFAILED,   /* TCPSchedulerFailed */
LINUX_MIB_TCPRCVCOLLAPSED,  /* TCPRcvCollapsed */
LINUX_MIB_TCPDSACKOLDSENT,  /* TCPDSACKOldSent */
LINUX_MIB_TCPDSACKOFOSENT,  /* TCPDSACKOfoSent */
diff --git a/net/ipv4/proc.c b/net/ipv4/proc.c
index 43eb6567b3a0..e2c91375cadc 100644
--- a/net/ipv4/proc.c
+++ b/net/ipv4/proc.c
@@ -207,13 +207,7 @@ static const struct snmp_mib snmp4_net_list[] = {
SNMP_MIB_ITEM("ListenOverflows", LINUX_MIB_LISTENOVERFLOWS),
SNMP_MIB_ITEM("ListenDrops", LINUX_MIB_LISTENDROPS),
SNMP_MIB_ITEM("TCPPrequeued", LINUX_MIB_TCPPREQUEUED),
-   SNMP_MIB_ITEM("TCPDirectCopyFromBacklog", 
LINUX_MIB_TCPDIRECTCOPYFROMBACKLOG),
-   SNMP_MIB_ITEM("TCPDirectCopyFromPrequeue", 
LINUX_MIB_TCPDIRECTCOPYFROMPREQUEUE),
-   SNMP_MIB_ITEM("TCPPrequeueDropped", LINUX_MIB_TCPPREQUEUEDROPPED),
-   SNMP_MIB_ITEM("TCPHPHits", LINUX_MIB_TCPHPHITS),
-   SNMP_MIB_ITEM("TCPHPHitsToUser", LINUX_MIB_TCPHPHITSTOUSER),
SNMP_MIB_ITEM("TCPPureAcks", LINUX_MIB_TCPPUREACKS),
-   SNMP_MIB_ITEM("TCPHPAcks", LINUX_MIB_TCPHPACKS),
SNMP_MIB_ITEM("TCPRenoRecovery", LINUX_MIB_TCPRENORECOVERY),
SNMP_MIB_ITEM("TCPSackRecovery", LINUX_MIB_TCPSACKRECOVERY),
SNMP_MIB_ITEM("TCPSACKReneging", LINUX_MIB_TCPSACKRENEGING),
@@ -230,14 +224,12 @@ static const struct snmp_mib snmp4_net_list[] = {
SNMP_MIB_ITEM("TCPSackFailures", LINUX_MIB_TCPSACKFAILURES),
SNMP_MIB_ITEM("TCPLossFailures", LINUX_MIB_TCPLOSSFAILURES),
SNMP_MIB_ITEM("TCPFastRetrans", LINUX_MIB_TCPFASTRETRANS),
-   SNMP_MIB_ITEM("TCPForwardRetrans", LINUX_MIB_TCPFORWARDRETRANS),
SNMP_MIB_ITEM("TCPSlowStartRetrans", LINUX_MIB_TCPSLOWSTARTRETRANS),
SNMP_MIB_ITEM("TCPTimeouts", LINUX_MIB_TCPTIMEOUTS),
SNMP_MIB_ITEM("TCPLossProbes", LINUX_MIB_TCPLOSSPROBES),
SNMP_MIB_ITEM("TCPLossProbeRecovery", LINUX_MIB_TCPLOSSPROBERECOVERY),
SNMP_MIB_ITEM("TCPRenoRecoveryFail", LINUX_MIB_TCPRENORECOVERYFAIL),
SNMP_MIB_ITEM("TCPSackRecoveryFail", LINUX_MIB_TCPSACKRECOVERYFAIL),
-   SNMP_MIB_ITEM("TCPSchedulerFailed", LINUX_MIB_TCPSCHEDULERFAILED),
SNMP_MIB_ITEM("TCPRcvCollapsed", LINUX_MIB_TCPRCVCOLLAPSED),
SNMP_MIB_ITEM("TCPDSACKOldSent", LINUX_MIB_TCPDSACKOLDSENT),
SNMP_MIB_ITEM("TCPDSACKOfoSent", LINUX_MIB_TCPDSACKOFOSENT),
-- 
2.13.0



[RFC net-next 0/6] tcp: remove prequeue and header prediction

2017-07-27 Thread Florian Westphal
This RFC removes tcp prequeueing and header prediction support.

In a hallway discussion with Eric Dumazet, some
maybe-not-so-useful-anymore TCP stack features came up, header
prediction (HP) and prequeue among them.

So this RFC proposes to axe both.

In brief, TCP prequeue assumes a single-process-blocking-read
design, which is not that common anymore, and the most frequently
used high-performance networking program that does this is netperf :)

With more common (e)poll designs, prequeue doesn't work.
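
To make the difference concrete, here is a minimal userspace sketch of
the two designs (not kernel code; the helper names read_blocking() and
read_polled() are made up for this example). In the first one the task
sleeps inside recv() on a single socket, which is the case prequeue
targets. In the second one the task sleeps in poll(), so nobody is
blocked in recv() on the socket when segments arrive.

```c
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Blocking-read design: the task sleeps inside recv() on one socket. */
ssize_t read_blocking(int fd, char *buf, size_t cap)
{
	size_t used = 0;

	while (used < cap) {
		ssize_t n = recv(fd, buf + used, cap - used, 0);

		if (n < 0)
			return -1;
		if (n == 0)	/* peer closed its side */
			break;
		used += (size_t)n;
	}
	return (ssize_t)used;
}

/* Event-driven design: the task sleeps in poll() and only calls
 * recv() in non-blocking mode once the socket is reported readable.
 */
ssize_t read_polled(int fd, char *buf, size_t cap)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	size_t used = 0;

	while (used < cap) {
		ssize_t n;

		if (poll(&pfd, 1, -1) < 0)
			return -1;
		n = recv(fd, buf + used, cap - used, MSG_DONTWAIT);
		if (n < 0) {
			if (errno == EAGAIN || errno == EWOULDBLOCK)
				continue;	/* spurious wakeup, poll again */
			return -1;
		}
		if (n == 0)	/* peer closed its side */
			break;
		used += (size_t)n;
	}
	return (ssize_t)used;
}
```

Both helpers return the same bytes; only the place where the task
sleeps differs, and that is the part prequeue depends on.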

The idea behind prequeueing isn't so bad in itself; it moves
part of tcp processing -- including ack processing (including
retransmit queue processing) into process context.
However, removing it would not just avoid some code; for most
programs it eliminates dead code.

As processing then always occurs in BH context, it would allow us
to experiment e.g. with bulk-freeing of skb heads when a packet acks
data on the retransmit queue.

Header prediction is also less useful nowadays.
For packet trains, GRO will aggregate packets so we do not get
a per-packet benefit.
Header prediction will also break down with light packet loss due to SACK.
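
For reference, a rough userspace sketch of what the prediction word
encodes. hp_pred_flags() follows the arithmetic of __tcp_fast_path_on()
from this series; the HP_* constants and hp_predicted() are written
from memory of the kernel definitions and should be read as
assumptions, not as the exact in-tree code.

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Host-order values of bits in the fourth 32-bit word of the TCP
 * header (data offset, reserved bits, flags, window). Assumed values.
 */
#define HP_FLAG_ACK	0x00100000u
#define HP_FLAG_PSH	0x00080000u
#define HP_RESERVED	0x0f000000u

/* Header length in bytes shifted by 26 puts the data offset
 * (length / 4) into the top nibble of the word.
 */
uint32_t hp_pred_flags(uint32_t tcp_header_len, uint32_t snd_wnd)
{
	return htonl((tcp_header_len << 26) | HP_FLAG_ACK | snd_wnd);
}

/* A segment counts as predicted when its flag word, with PSH and the
 * reserved bits masked out, equals the stored prediction word.
 */
int hp_predicted(uint32_t flag_word_be, uint32_t pred_flags)
{
	return (flag_word_be & htonl(~(HP_RESERVED | HP_FLAG_PSH))) == pred_flags;
}
```

With a 20 byte header and a window field of 0x1000 the word is
0x50101000 in network byte order. Any other flag, a different window
or a different header length (for example a header that carries SACK
blocks) makes the comparison fail.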

So, in short: What do others think?

Florian Westphal (6):
  tcp: remove prequeue support
  tcp: reindent two spots after prequeue removal
  tcp: remove low_latency sysctl
  tcp: remove header prediction
  tcp: remove CA_ACK_SLOWPATH
  tcp: remove unused mib counters

 Documentation/networking/ip-sysctl.txt |7 
 include/linux/tcp.h|   15 -
 include/net/tcp.h  |   40 
 include/uapi/linux/snmp.h  |8 
 net/ipv4/proc.c|8 
 net/ipv4/sysctl_net_ipv4.c |3 
 net/ipv4/tcp.c |  109 ---
 net/ipv4/tcp_input.c   |  303 +++--
 net/ipv4/tcp_ipv4.c|   63 --
 net/ipv4/tcp_minisocks.c   |3 
 net/ipv4/tcp_output.c  |2 
 net/ipv4/tcp_timer.c   |   12 -
 net/ipv4/tcp_westwood.c|   31 ---
 net/ipv6/tcp_ipv6.c|3 
 14 files changed, 43 insertions(+), 564 deletions(-)



[RFC PATCH net-next 1/6] tcp: remove prequeue support

2017-07-27 Thread Florian Westphal
prequeue is a tcp receive optimization that moves part of rx processing from
bh to process context.

This only works if the socket being processed belongs to a process that
blocks in recv on this socket.  In practice, this doesn't happen that often
anymore, as servers normally use an event driven (epoll) model.

Even normal clients (e.g. web browsers) commonly use many tcp connections
in parallel.

Let's remove this.

This has measurable impact only on netperf from host to local vm.
There are no changes with bulk transfers that use select/poll etc. to
get notified about new data.

I also see no changes when using netperf between two physical hosts
with ixgbe interfaces.

Signed-off-by: Florian Westphal 
---
 include/linux/tcp.h  |   9 
 include/net/tcp.h|  11 -
 net/ipv4/tcp.c   | 105 ---
 net/ipv4/tcp_input.c |  62 
 net/ipv4/tcp_ipv4.c  |  61 +--
 net/ipv4/tcp_minisocks.c |   1 -
 net/ipv4/tcp_timer.c |  12 --
 net/ipv6/tcp_ipv6.c  |   3 +-
 8 files changed, 2 insertions(+), 262 deletions(-)

diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 542ca1ae02c4..32fb37cfb0d1 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -192,15 +192,6 @@ struct tcp_sock {
 
struct list_head tsq_node; /* anchor in tsq_tasklet.head list */
 
-   /* Data for direct copy to user */
-   struct {
-   struct sk_buff_head prequeue;
-   struct task_struct  *task;
-   struct msghdr   *msg;
-   int memory;
-   int len;
-   } ucopy;
-
u32 snd_wl1;/* Sequence for window update   */
u32 snd_wnd;/* The window we expect to receive  */
u32 max_window; /* Maximal window ever seen from peer   */
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 12d68335acd4..93f115cfc8f8 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1244,17 +1244,6 @@ static inline bool tcp_checksum_complete(struct sk_buff 
*skb)
__tcp_checksum_complete(skb);
 }
 
-/* Prequeue for VJ style copy to user, combined with checksumming. */
-
-static inline void tcp_prequeue_init(struct tcp_sock *tp)
-{
-   tp->ucopy.task = NULL;
-   tp->ucopy.len = 0;
-   tp->ucopy.memory = 0;
-   skb_queue_head_init(&tp->ucopy.prequeue);
-}
-
-bool tcp_prequeue(struct sock *sk, struct sk_buff *skb);
 bool tcp_add_backlog(struct sock *sk, struct sk_buff *skb);
 int tcp_filter(struct sock *sk, struct sk_buff *skb);
 
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 71ce33decd97..62018ea6f45f 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -400,7 +400,6 @@ void tcp_init_sock(struct sock *sk)
 
tp->out_of_order_queue = RB_ROOT;
tcp_init_xmit_timers(sk);
-   tcp_prequeue_init(tp);
INIT_LIST_HEAD(&tp->tsq_node);
 
icsk->icsk_rto = TCP_TIMEOUT_INIT;
@@ -1525,20 +1524,6 @@ static void tcp_cleanup_rbuf(struct sock *sk, int copied)
tcp_send_ack(sk);
 }
 
-static void tcp_prequeue_process(struct sock *sk)
-{
-   struct sk_buff *skb;
-   struct tcp_sock *tp = tcp_sk(sk);
-
-   NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPPREQUEUED);
-
-   while ((skb = __skb_dequeue(&tp->ucopy.prequeue)) != NULL)
-   sk_backlog_rcv(sk, skb);
-
-   /* Clear memory counter. */
-   tp->ucopy.memory = 0;
-}
-
 static struct sk_buff *tcp_recv_skb(struct sock *sk, u32 seq, u32 *off)
 {
struct sk_buff *skb;
@@ -1671,7 +1656,6 @@ int tcp_recvmsg(struct sock *sk, struct msghdr *msg, 
size_t len, int nonblock,
int err;
int target; /* Read at least this many bytes */
long timeo;
-   struct task_struct *user_recv = NULL;
struct sk_buff *skb, *last;
u32 urg_hole = 0;
 
@@ -1806,51 +1790,6 @@ int tcp_recvmsg(struct sock *sk, struct msghdr *msg, 
size_t len, int nonblock,
 
tcp_cleanup_rbuf(sk, copied);
 
-   if (!sysctl_tcp_low_latency && tp->ucopy.task == user_recv) {
-   /* Install new reader */
-   if (!user_recv && !(flags & (MSG_TRUNC | MSG_PEEK))) {
-   user_recv = current;
-   tp->ucopy.task = user_recv;
-   tp->ucopy.msg = msg;
-   }
-
-   tp->ucopy.len = len;
-
-   WARN_ON(tp->copied_seq != tp->rcv_nxt &&
-   !(flags & (MSG_PEEK | MSG_TRUNC)));
-
-   /* Ugly... If prequeue is not empty, we have to
-* process it before releasing socket, otherwise
-* order will be broken at second iteration.
-* More elegant solution is required!!!
- 

Re: [PATCH net] xgene: Don't fail probe, if there is no clk resource for SGMII interfaces

2017-07-27 Thread Laura Abbott
On 07/27/2017 02:39 PM, Tom Bogendoerfer wrote:
> On Thu, Jul 27, 2017 at 02:03:42PM -0700, Laura Abbott wrote:
>> This change causes boot failures for me on my APM Mustang system running
>> Fedora rawhide:
>>
>> [   16.669089] Synchronous External Abort: synchronous external abort (0x960
>> [   16.669099] Internal error: : 9610 [#1] SMP
>> [   16.669103] Modules linked in: xgene_enet(+) at803x realtek mdio_xgene xgenes
>> [   16.669127] CPU: 2 PID: 534 Comm: systemd-udevd Not tainted 4.13.0-0.rc1.git1
>> [   16.669128] Hardware name: AppliedMicro X-Gene Mustang Board/X-Gene Mustang 6
>> [   16.669131] task: 8003e6f8ce00 task.stack: 8003e4fd8000
>> [   16.669144] PC is at xgene_enet_wr_mac+0xa0/0x128 [xgene_enet]
>> [   16.669152] LR is at xgene_enet_wr_mac+0x64/0x128 [xgene_enet]
> 
> on the first glance I don't see anything clock related there.
> 

I don't know the intricacies of the Mustang hardware but external
aborts have been a symptom of missing clocks on other hardware.

> What firmware version is installed on your mustang board ? I saw
> ethernet related crashes with mustang boards because the device tree
> in firmware was too old for the xgene ethernet driver.
> 
> Thoms.
> 

TianoCore 3.06.12 UEFI 2.4.0 Aug 12 2016 13:30:51
CPU: APM ARM 64-bit Potenza Rev B0 2400MHz PCP 2400MHz
 32 KB ICACHE, 32 KB DCACHE
 SOC 2000MHz IOBAXI 400MHz AXI 250MHz AHB 200MHz GFC 125MHz
Board: X-Gene Mustang Board
Little Endian build
Slimpro FW:
Ver: 3.5 (build 03.06.12.00 2016/08/12)
PMD: 1000 mV
SOC: 950 mV


Thanks,
Laura


Re: [PATCH v3 1/3] ptp: introduce ptp auxiliary worker

2017-07-27 Thread Grygorii Strashko



On 07/27/2017 03:08 PM, Richard Cochran wrote:
> On Wed, Jul 26, 2017 at 05:11:36PM -0500, Grygorii Strashko wrote:
>> @@ -217,6 +231,19 @@ struct ptp_clock *ptp_clock_register(struct ptp_clock_info
>> *info,
>> mutex_init(&ptp->pincfg_mux);
>> init_waitqueue_head(&ptp->tsev_wq);
>>
>> +	if (ptp->info->do_aux_work) {
>> +   char *worker_name = kasprintf(GFP_KERNEL, "ptp%d", ptp->index);
>
> This string is allocated but never freed.
>
>> +   kthread_init_delayed_work(&ptp->aux_work, ptp_aux_kworker);
>> +   ptp->kworker = kthread_create_worker(0, worker_name ?
>> +worker_name : info->name);

Oops. Right, I need to add kfree(worker_name) here.
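
A minimal userspace sketch of the ownership pattern in question (not
the actual follow-up patch). struct worker, worker_create() and
ptp_worker_create() are hypothetical stand-ins, and the sketch assumes
that creating the worker copies the name, which is what makes freeing
the temporary string right afterwards safe.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical worker that keeps its own copy of the name. */
struct worker {
	char name[16];
};

struct worker *worker_create(const char *name)
{
	struct worker *w = calloc(1, sizeof(*w));

	if (w)
		snprintf(w->name, sizeof(w->name), "%s", name);
	return w;
}

/* Same flow as the quoted hunk: build "ptp<index>", fall back to a
 * default name if the allocation failed, then release the temporary
 * string once the worker has been created.
 */
struct worker *ptp_worker_create(int index, const char *fallback)
{
	size_t len = (size_t)snprintf(NULL, 0, "ptp%d", index) + 1;
	char *worker_name = malloc(len);
	struct worker *w;

	if (worker_name)
		snprintf(worker_name, len, "ptp%d", index);
	w = worker_create(worker_name ? worker_name : fallback);
	free(worker_name);	/* free(NULL) is a no-op, like kfree(NULL) */
	return w;
}
```

The free is unconditional, so both the success path and the
worker-creation failure path release the string.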

>> +   if (IS_ERR(ptp->kworker)) {
>> +   err = PTR_ERR(ptp->kworker);
>> +   pr_err("failed to create ptp aux_worker %d\n", err);
>> +   goto kworker_err;
>> +   }
>> +   }
>> +
>
> Thanks,
> Richard



--
regards,
-grygorii


[PATCH] net: tc35815: fix spelling mistake: "Intterrupt" -> "Interrupt"

2017-07-27 Thread Colin King
From: Colin Ian King 

Trivial fix to spelling mistake in printk message

Signed-off-by: Colin Ian King 
---
 drivers/net/ethernet/toshiba/tc35815.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/toshiba/tc35815.c 
b/drivers/net/ethernet/toshiba/tc35815.c
index d9db8a06afd2..cce9c9ed46aa 100644
--- a/drivers/net/ethernet/toshiba/tc35815.c
+++ b/drivers/net/ethernet/toshiba/tc35815.c
@@ -1338,7 +1338,7 @@ static int tc35815_send_packet(struct sk_buff *skb, 
struct net_device *dev)
 static void tc35815_fatal_error_interrupt(struct net_device *dev, u32 status)
 {
static int count;
-   printk(KERN_WARNING "%s: Fatal Error Intterrupt (%#x):",
+   printk(KERN_WARNING "%s: Fatal Error Interrupt (%#x):",
   dev->name, status);
if (status & Int_IntPCI)
printk(" IntPCI");
-- 
2.11.0



[PATCH] wl3501_cs: fix spelling mistake: "Insupported" -> "Unsupported"

2017-07-27 Thread Colin King
From: Colin Ian King 

Trivial fix to spelling mistake in printk message

Signed-off-by: Colin Ian King 
---
 drivers/net/wireless/wl3501_cs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/wl3501_cs.c b/drivers/net/wireless/wl3501_cs.c
index acec0d9ec422..da62220b9c01 100644
--- a/drivers/net/wireless/wl3501_cs.c
+++ b/drivers/net/wireless/wl3501_cs.c
@@ -965,7 +965,7 @@ static inline void wl3501_md_ind_interrupt(struct 
net_device *dev,
&addr4, sizeof(addr4));
if (!(addr4[0] == 0xAA && addr4[1] == 0xAA &&
  addr4[2] == 0x03 && addr4[4] == 0x00)) {
-   printk(KERN_INFO "Insupported packet type!\n");
+   printk(KERN_INFO "Unsupported packet type!\n");
return;
}
pkt_len = sig.size + 12 - 24 - 4 - 6;
-- 
2.11.0



[PATCH] mwifiex: fix spelling mistake: "Insuffient" -> "Insufficient"

2017-07-27 Thread Colin King
From: Colin Ian King 

Trivial fix to spelling mistake in mwifiex_dbg debug message

Signed-off-by: Colin Ian King 
---
 drivers/net/wireless/marvell/mwifiex/tdls.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/marvell/mwifiex/tdls.c 
b/drivers/net/wireless/marvell/mwifiex/tdls.c
index 39cd677d4159..e76af2866a19 100644
--- a/drivers/net/wireless/marvell/mwifiex/tdls.c
+++ b/drivers/net/wireless/marvell/mwifiex/tdls.c
@@ -130,7 +130,7 @@ mwifiex_tdls_append_rates_ie(struct mwifiex_private *priv,
 
if (skb_tailroom(skb) < rates_size + 4) {
mwifiex_dbg(priv->adapter, ERROR,
-   "Insuffient space while adding rates\n");
+   "Insufficient space while adding rates\n");
return -ENOMEM;
}
 
-- 
2.11.0



Re: [PATCH net] xgene: Don't fail probe, if there is no clk resource for SGMII interfaces

2017-07-27 Thread Tom Bogendoerfer
On Thu, Jul 27, 2017 at 02:03:42PM -0700, Laura Abbott wrote:
> This change causes boot failures for me on my APM Mustang system running
> Fedora rawhide:
> 
> [   16.669089] Synchronous External Abort: synchronous external abort (0x960
> [   16.669099] Internal error: : 9610 [#1] SMP
> [   16.669103] Modules linked in: xgene_enet(+) at803x realtek mdio_xgene xgenes
> [   16.669127] CPU: 2 PID: 534 Comm: systemd-udevd Not tainted 4.13.0-0.rc1.git1
> [   16.669128] Hardware name: AppliedMicro X-Gene Mustang Board/X-Gene Mustang 6
> [   16.669131] task: 8003e6f8ce00 task.stack: 8003e4fd8000
> [   16.669144] PC is at xgene_enet_wr_mac+0xa0/0x128 [xgene_enet]
> [   16.669152] LR is at xgene_enet_wr_mac+0x64/0x128 [xgene_enet]

At first glance I don't see anything clock-related there.

What firmware version is installed on your Mustang board? I saw
ethernet-related crashes with Mustang boards because the device tree
in firmware was too old for the xgene ethernet driver.

Thomas.

-- 
Crap can work. Given enough thrust pigs will fly, but it's not necessarily a
good idea.[ RFC1925, 2.3 ]


Re: Performance regression with virtio_net

2017-07-27 Thread Michael S. Tsirkin
On Thu, Jul 27, 2017 at 04:14:30PM -0500, Seth Forshee wrote:
> On Thu, Jul 27, 2017 at 11:38:52PM +0300, Michael S. Tsirkin wrote:
> > On Thu, Jul 27, 2017 at 12:09:42PM -0500, Seth Forshee wrote:
> > > I'm seeing a performance regression with virtio_net that looks to have
> > > started in 4.12-rc1. I only see it in one context though, downloading
> > > snap packages from the Ubuntu snap store. For example:
> > > 
> > >  
> > > https://api.snapcraft.io/api/v1/snaps/download/b8X2psL1ryVrPt5WEmpYiqfr5emixTd7_1797.snap
> > > 
> > > which redirects to Internap's CDN. Normally this downloads in a few
> > > seconds at ~10 MB/s, but with 4.12 and 4.13 it takes minutes with a rate
> > > of ~150 KB/s. Everything else I've tried downloads as normal speeds.
> > 
> > So just wget that URL should be enough?
> 
> Yes. Note that sometimes it starts out faster then slows down.
>
> > > I bisected this to 680557cf79f8 "virtio_net: rework mergeable buffer
> > > handling". If I revert this on top of 4.13-rc2 (along with other changes
> > > needed to successfully revert it) speeds return to normal.
> > > 
> > > Thanks,
> > > Seth
> > 
> > 
> > Interesting. A more likely suspect would be
> > e377fcc8486d40867c6c217077ad0fa40977e060 - could you please try
> > reverting that one instead?
> 
> I tried it, and I still get slow download speeds. I did test at
> 680557cf79f82623e2c4fd42733077d60a843513 during the bisect so I'm
> reasonably confident that this is the one where things went bad.
>
> > Also, could you please look at mergeable_rx_buffer_size in sysfs with
> > and without the change?
> 
> In all cases (stock 4.13-rc2, 680557cf79f8 reverted, and e377fcc8486d
> reverted) mergeable_rx_buffer_size was 1536.
> 
> Thanks,
> Seth

Do you see any error counters incrementing after it slows down?

-- 
MST


Re: Performance regression with virtio_net

2017-07-27 Thread Seth Forshee
On Thu, Jul 27, 2017 at 11:38:52PM +0300, Michael S. Tsirkin wrote:
> On Thu, Jul 27, 2017 at 12:09:42PM -0500, Seth Forshee wrote:
> > I'm seeing a performance regression with virtio_net that looks to have
> > started in 4.12-rc1. I only see it in one context though, downloading
> > snap packages from the Ubuntu snap store. For example:
> > 
> >  
> > https://api.snapcraft.io/api/v1/snaps/download/b8X2psL1ryVrPt5WEmpYiqfr5emixTd7_1797.snap
> > 
> > which redirects to Internap's CDN. Normally this downloads in a few
> > seconds at ~10 MB/s, but with 4.12 and 4.13 it takes minutes with a rate
> > of ~150 KB/s. Everything else I've tried downloads as normal speeds.
> 
> So just wget that URL should be enough?

Yes. Note that sometimes it starts out faster then slows down.

> > I bisected this to 680557cf79f8 "virtio_net: rework mergeable buffer
> > handling". If I revert this on top of 4.13-rc2 (along with other changes
> > needed to successfully revert it) speeds return to normal.
> > 
> > Thanks,
> > Seth
> 
> 
> Interesting. A more likely suspect would be
> e377fcc8486d40867c6c217077ad0fa40977e060 - could you please try
> reverting that one instead?

I tried it, and I still get slow download speeds. I did test at
680557cf79f82623e2c4fd42733077d60a843513 during the bisect so I'm
reasonably confident that this is the one where things went bad.

> Also, could you please look at mergeable_rx_buffer_size in sysfs with
> and without the change?

In all cases (stock 4.13-rc2, 680557cf79f8 reverted, and e377fcc8486d
reverted) mergeable_rx_buffer_size was 1536.
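
In case someone wants to script the comparison, a small sketch for
reading such an attribute. read_attr_u32() is a made-up helper, and the
sysfs location is an assumption (on my reading it is
/sys/class/net/<iface>/queues/rx-0/virtio_net/mergeable_rx_buffer_size),
so the path is passed in rather than hard-coded.

```c
#include <stdio.h>

/* Read one unsigned decimal value from a sysfs-style attribute file.
 * Returns 0 on success, -1 if the file is missing or malformed.
 */
int read_attr_u32(const char *path, unsigned int *val)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = fscanf(f, "%u", val) == 1;
	fclose(f);
	return ok ? 0 : -1;
}
```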

Thanks,
Seth


Re: [PATCH v2 2/4] can: fixed-transceiver: Add documentation for CAN fixed transceiver bindings

2017-07-27 Thread Franklin S Cooper Jr


On 07/27/2017 01:47 PM, Oliver Hartkopp wrote:
> On 07/26/2017 08:29 PM, Franklin S Cooper Jr wrote:
>>
> 
>> I'm fine with switching to using bitrate instead of speed. Kurk was
>> originally the one that suggested to use the term arbitration and data
>> since thats how the spec refers to it. Which I do agree with. But your
>> right that in the drivers (struct can_priv) we just use bittiming and
>> data_bittiming (CAN-FD timings). I don't think adding "fd" into the
>> property name makes sense unless we are calling it something like
>> "max-canfd-bitrate" which I would agree is the easiest to understand.
>>
>> So what is the preference if we end up sticking with two properties?
>> Option 1 or 2?
>>
>> 1)
>> max-bitrate
>> max-data-bitrate
>>
>> 2)
>> max-bitrate
>> max-canfd-bitrate
>>
>>
> 
> 1
> 
>>> A CAN transceiver is limited in bandwidth. But you only have one RX and
>>> one TX line between the CAN controller and the CAN transceiver. The
>>> transceiver does not know about CAN FD - it has just a physical(!) layer
>>> with a limited bandwidth. This is ONE limitation.
>>>
>>> So I tend to specify only ONE 'max-bitrate' property for the
>>> fixed-transceiver binding.
>>>
>>> The fact whether the CAN controller is CAN FD capable or not is provided
>>> by the netlink configuration interface for CAN controllers.
>>
>> Part of the reasoning to have two properties is to indicate that you
>> don't support CAN FD while limiting the "arbitration" bit rate.
> 
> ??
> 
> It's a physical layer device which only has a bandwidth limitation.
> The transceiver does not know about CAN FD.
> 
>> With one
>> property you can not determine this and end up having to make some
>> assumptions that can quickly end up biting people.
> 
> Despite the fact that the transceiver does not know anything about ISO
> layer 2 (CAN/CAN FD) the properties should look like
> 
> max-bitrate
> canfd-capable
> 
> then.
> 
> But when the tranceiver is 'canfd-capable' agnostic, why provide a
> property for it?
> 
> Maybe I'm wrong but I still can't follow your argumentation ideas.

You're right. I spoke to our CAN transceiver team and I finally get your
points.

So yes, using "max-bitrate" alone is all we need. Sorry for the confusion;
I'll create a new rev using this approach.
> 
> Regards,
> Oliver


Re: [PATCH net] xgene: Don't fail probe, if there is no clk resource for SGMII interfaces

2017-07-27 Thread Laura Abbott
On 07/13/2017 01:57 AM, Thomas Bogendoerfer wrote:
> From: Thomas Bogendoerfer 
> 
> This change fixes following problem
> 
> [1.827940] xgene-enet: probe of 1f210030.ethernet failed with error -2
> 
> which leads to a missing ethernet interface (reproducable at least on
> Gigabyte MP30-AR0 and APM Mustang systems).
> 
> The check for a valid clk resource fails, because DT doesn't provide a
> clock for sgenet1. But the driver doesn't use this clk, if the ethernet
> port is connected via SGMII. Therefore this patch avoids probing for clk
> on SGMII interfaces.
> 
> Fixes: 9aea7779b764 drivers: net: xgene: Fix crash on DT systems
> Signed-off-by: Thomas Bogendoerfer 
> ---
>  drivers/net/ethernet/apm/xgene/xgene_enet_main.c | 22 --
>  1 file changed, 12 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c 
> b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
> index d3906f6b01bd..86058a9f3417 100644
> --- a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
> +++ b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
> @@ -1785,16 +1785,18 @@ static int xgene_enet_get_resources(struct 
> xgene_enet_pdata *pdata)
>  
>   xgene_enet_gpiod_get(pdata);
>  
> - pdata->clk = devm_clk_get(&pdev->dev, NULL);
> - if (IS_ERR(pdata->clk)) {
> - /* Abort if the clock is defined but couldn't be retrived.
> -  * Always abort if the clock is missing on DT system as
> -  * the driver can't cope with this case.
> -  */
> - if (PTR_ERR(pdata->clk) != -ENOENT || dev->of_node)
> - return PTR_ERR(pdata->clk);
> - /* Firmware may have set up the clock already. */
> - dev_info(dev, "clocks have been setup already\n");
> + if (pdata->phy_mode != PHY_INTERFACE_MODE_SGMII) {
> + pdata->clk = devm_clk_get(&pdev->dev, NULL);
> + if (IS_ERR(pdata->clk)) {
> + /* Abort if the clock is defined but couldn't be
> +  * retrived. Always abort if the clock is missing on
> +  * DT system as the driver can't cope with this case.
> +  */
> + if (PTR_ERR(pdata->clk) != -ENOENT || dev->of_node)
> + return PTR_ERR(pdata->clk);
> + /* Firmware may have set up the clock already. */
> + dev_info(dev, "clocks have been setup already\n");
> + }
>   }
>  
>   if (pdata->phy_mode != PHY_INTERFACE_MODE_XGMII)
> 

This change causes boot failures for me on my APM Mustang system running
Fedora rawhide:

[   16.669089] Synchronous External Abort: synchronous external abort (0x960
[   16.669099] Internal error: : 9610 [#1] SMP  
[   16.669103] Modules linked in: xgene_enet(+) at803x realtek mdio_xgene xgenes
[   16.669127] CPU: 2 PID: 534 Comm: systemd-udevd Not tainted 4.13.0-0.rc1.git1
[   16.669128] Hardware name: AppliedMicro X-Gene Mustang Board/X-Gene Mustang 6
[   16.669131] task: 8003e6f8ce00 task.stack: 8003e4fd8000  
[   16.669144] PC is at xgene_enet_wr_mac+0xa0/0x128 [xgene_enet]   
[   16.669152] LR is at xgene_enet_wr_mac+0x64/0x128 [xgene_enet] 


   
[   16.669345] [] xgene_enet_wr_mac+0xa0/0x128 [xgene_enet]   
[   16.669354] [] xgene_sgmac_reset+0x28/0x48 [xgene_enet]
[   16.669362] [] xgene_sgmac_init+0x1e0/0x2e8 [xgene_enet]   
[   16.669370] [] xgene_enet_probe+0xfa4/0x1368 [xgene_enet]  
[   16.669376] [] platform_drv_probe+0x60/0xc0
[   16.669379] [] driver_probe_device+0x31c/0x458 
[   16.669381] [] __driver_attach+0xe4/0x130  
[   16.669384] [] bus_for_each_dev+0x5c/0xa8  
[   16.669386] [] driver_attach+0x30/0x40 
[   16.669388] [] bus_add_driver+0x220/0x2c0  
[   16.669390] [] driver_register+0x6c/0x118  
[   16.669392] [] __platform_driver_register+0x54/0x60
[   16.669400] [] xgene_enet_driver_init+0x14/0x1000 [xgene_e]
[   16.669404] [] do_one_initcall+0x44/0x138  
[   16.669408] [] do_init_module+0x64/0x1d0   
[   16.669410] [] load_module+0x151c/0x1770   
[   16.669413] [] SyS_finit_module+0xd8/0xf0  
[   16.669415] [] __sys_trace_return+0x0/0x4  
[   16.669418] Code: 1404 d503201f d28218e0 95f24031 (b94002a2) 

I suspect the clock is actually needed on some systems.

Thanks,
Laura


RE: [PATCH V4 net-next 7/8] net: hns3: Add Ethtool support to HNS3 driver

2017-07-27 Thread Salil Mehta
Hi Florian,

> -Original Message-
> From: Florian Fainelli [mailto:f.faine...@gmail.com]
> Sent: Thursday, July 27, 2017 7:05 PM
> To: Salil Mehta; da...@davemloft.net
> Cc: Zhuangyuzeng (Yisen); huangdaode; lipeng (Y);
> mehta.salil@gmail.com; netdev@vger.kernel.org; linux-
> ker...@vger.kernel.org; linux-r...@vger.kernel.org; Linuxarm
> Subject: Re: [PATCH V4 net-next 7/8] net: hns3: Add Ethtool support to
> HNS3 driver
> 
> On 07/27/2017 11:01 AM, Salil Mehta wrote:
> > Hi Florian,
> >
> >> -Original Message-
> >> From: Florian Fainelli [mailto:f.faine...@gmail.com]
> >> Sent: Sunday, July 23, 2017 6:05 PM
> >> To: Salil Mehta; da...@davemloft.net
> >> Cc: Zhuangyuzeng (Yisen); huangdaode; lipeng (Y);
> >> mehta.salil@gmail.com; netdev@vger.kernel.org; linux-
> >> ker...@vger.kernel.org; linux-r...@vger.kernel.org; Linuxarm
> >> Subject: Re: [PATCH V4 net-next 7/8] net: hns3: Add Ethtool support
> to
> >> HNS3 driver
> >>
> >>
> >>
> >> On 07/22/2017 03:09 PM, Salil Mehta wrote:
> >>> This patch adds the support of the Ethtool interface to
> >>> the HNS3 Ethernet driver. Various commands to read the
> >>> statistics, configure the offloading, loopback selftest etc.
> >>> are supported.
> >>>
> >>> Signed-off-by: Daode Huang 
> >>> Signed-off-by: lipeng 
> >>> Signed-off-by: Salil Mehta 
> >>> Signed-off-by: Yisen Zhuang 
> >>> ---
> >>> Patch V4: addressed below comments
> >>>  1. Andrew Lunn
> >>> Removed the support of loop PHY back for now
> >>> Patch V3: Address below comments
> >>>  1. Stephen Hemminger
> >>> https://lkml.org/lkml/2017/6/13/974
> >>>  2. Andrew Lunn
> >>> https://lkml.org/lkml/2017/6/13/1037
> >>> Patch V2: No change
> >>> Patch V1: Initial Submit
> >>> ---
> >>>  .../ethernet/hisilicon/hns3/hns3pf/hns3_ethtool.c  | 543
> >> +
> >>>  1 file changed, 543 insertions(+)
> >>>  create mode 100644
> >> drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_ethtool.c
> >>>
> >>> diff --git
> >> a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_ethtool.c
> >> b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_ethtool.c
> >>> new file mode 100644
> >>> index ..82b0d4d829f8
> >>> --- /dev/null
> >>> +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_ethtool.c
> >>> @@ -0,0 +1,543 @@
> >>> +/*
> >>> + * Copyright (c) 2016~2017 Hisilicon Limited.
> >>> + *
> >>> + * This program is free software; you can redistribute it and/or
> >> modify
> >>> + * it under the terms of the GNU General Public License as
> published
> >> by
> >>> + * the Free Software Foundation; either version 2 of the License,
> or
> >>> + * (at your option) any later version.
> >>> + */
> >>> +
> >>> +#include 
> >>> +#include "hns3_enet.h"
> >>> +
> >>> +struct hns3_stats {
> >>> + char stats_string[ETH_GSTRING_LEN];
> >>> + int stats_size;
> >>> + int stats_offset;
> >>> +};
> >>> +
> >>> +/* netdev related stats */
> >>> +#define HNS3_NETDEV_STAT(_string, _member)   \
> >>> + { _string,  \
> >>> +   FIELD_SIZEOF(struct rtnl_link_stats64, _member),  \
> >>> +   offsetof(struct rtnl_link_stats64, _member),  \
> >>> + }
> >>
> >> Can you make this macro use named initializers?
> > Can you please explain bit more or point out some
> > example. This would be very handy.
> 
>   .stat_string = _string,
>   .stats_size = FIELD_SIZEOF(struct rtnl_link_stat64, _member),
>   .stats_offset = offsetof(struct rtnl_link_stats64, _member),
> 
> https://gcc.gnu.org/onlinedocs/gcc/Designated-Inits.html

Ok got it, thanks!

> --
> Florian


RE: [PATCH V4 net-next 1/8] net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC

2017-07-27 Thread Salil Mehta
Hi Florian,

> -Original Message-
> From: Florian Fainelli [mailto:f.faine...@gmail.com]
> Sent: Sunday, July 23, 2017 6:24 PM
> To: Salil Mehta; da...@davemloft.net
> Cc: Zhuangyuzeng (Yisen); huangdaode; lipeng (Y);
> mehta.salil@gmail.com; netdev@vger.kernel.org; linux-
> ker...@vger.kernel.org; linux-r...@vger.kernel.org; Linuxarm
> Subject: Re: [PATCH V4 net-next 1/8] net: hns3: Add support of HNS3
> Ethernet Driver for hip08 SoC
> 
> 
> 
> On 07/22/2017 03:09 PM, Salil Mehta wrote:
> > This patch adds the support of Hisilicon Network Subsystem 3
> > Ethernet driver to hip08 family of SoCs.
> >
> > This driver includes basic Rx/Tx functionality. It also includes
> > the client registration code with the HNAE3(Hisilicon Network
> > Acceleration Engine 3) framework.
> >
> > This work provides the initial support to the hip08 SoC and
> > would incrementally add features or enhancements.
> >
> > Signed-off-by: Daode Huang 
> > Signed-off-by: lipeng 
> > Signed-off-by: Salil Mehta 
> > Signed-off-by: Yisen Zhuang 
> > ---
> > Patch V4: addressed comments by:
> >   1. Andrew Lunn:
> >  https://lkml.org/lkml/2017/6/17/222
> >  https://lkml.org/lkml/2017/6/17/232
> >   2. Bo Yu:
> >  https://lkml.org/lkml/2017/6/18/110
> >  https://lkml.org/lkml/2017/6/18/115
> > Patch V3: Addresed below comments:
> >   1. Stephen Hemminger:
> >  https://lkml.org/lkml/2017/6/13/972
> >   2. Yuval Mintz:
> >  https://lkml.org/lkml/2017/6/14/151
> > Patch V2: Addressed below comments:
> >   1. Kbuild:
> >  https://lkml.org/lkml/2017/6/11/73
> >   2. Yuval Mintz:
> >  https://lkml.org/lkml/2017/6/10/78
> > Patch V1: Initial Submit
> > ---
> >  .../net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c | 2894
> 
> >  .../net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.h |  598 
> >  2 files changed, 3492 insertions(+)
> >  create mode 100644
> drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c
> >  create mode 100644
> drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.h
> >
> > diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c
> b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c
> > new file mode 100644
> > index ..6e0e2967db42
> > --- /dev/null
> > +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c
> > @@ -0,0 +1,2894 @@
> > +/*
> > + * Copyright (c) 2016~2017 Hisilicon Limited.
> > + *
> > + * This program is free software; you can redistribute it and/or
> modify
> > + * it under the terms of the GNU General Public License as published
> by
> > + * the Free Software Foundation; either version 2 of the License, or
> > + * (at your option) any later version.
> > + */
> > +
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +
> > +#include "hnae3.h"
> > +#include "hns3_enet.h"
> > +
> > +const char hns3_driver_name[] = "hns3";
> > +static const char hns3_driver_string[] =
> > +   "Hisilicon Ethernet Network Driver for Hi162x
> Family";
> > +static const char hns3_copyright[] = "Copyright (c) 2017 Huawei
> Corporation.";
> > +static struct hnae3_client client;
> > +
> > +/* hns3_pci_tbl - PCI Device ID Table
> > + *
> > + * Last entry must be all 0s
> > + *
> > + * { Vendor ID, Device ID, SubVendor ID, SubDevice ID,
> > + *   Class, Class Mask, private data (not used) }
> > + */
> > +static const struct pci_device_id hns3_pci_tbl[] = {
> > +   {PCI_VDEVICE(HUAWEI, HNAE3_DEV_ID_GE), 0},
> > +   {PCI_VDEVICE(HUAWEI, HNAE3_DEV_ID_25GE), 0},
> > +   {PCI_VDEVICE(HUAWEI, HNAE3_DEV_ID_25GE_RDMA), 0},
> > +   {PCI_VDEVICE(HUAWEI, HNAE3_DEV_ID_25GE_RDMA_MACSEC), 0},
> > +   {PCI_VDEVICE(HUAWEI, HNAE3_DEV_ID_50GE_RDMA), 0},
> > +   {PCI_VDEVICE(HUAWEI, HNAE3_DEV_ID_50GE_RDMA_MACSEC), 0},
> > +   {PCI_VDEVICE(HUAWEI, HNAE3_DEV_ID_100G_RDMA_MACSEC), 0},
> > +   /* required last entry */
> > +   {0, }
> > +};
> > +MODULE_DEVICE_TABLE(pci, hns3_pci_tbl);
> > +
> > +static irqreturn_t hns3_irq_handle(int irq, void *dev)
> > +{
> > +   struct hns3_enet_tqp_vector *tqp_vector = dev;
> > +
> > +   napi_schedule(&tqp_vector->napi);
> > +
> > +   return IRQ_HANDLED;
> > +}
> > +
> > +static int hns3_nic_init_irq(struct hns3_nic_priv *priv)
> > +{
> > +   struct pci_dev *pdev = priv->ae_handle->pdev;
> > +   struct hns3_enet_tqp_vector *tqp_vectors;
> > +   int txrx_int_idx = 0;
> > +   int rx_int_idx = 0;
> > +   int tx_int_idx = 0;
> > +   int ret;
> > +   int i;
> 
> unsigned int i
Ok.

> 
> > +
> > +   for (i = 0; i < priv->vector_num; i++) {
> > +   tqp_vectors = &priv->tqp_vector[i];
> > +
> > +   if (tqp_vectors->irq_init_flag == HNS3_VECTOR_INITED)
> > +   continue;
> > +
> > +   if (tqp_vectors->tx_group.ring && tqp_vectors-
> >rx_group.ring) {
> > +   snprintf(tqp_vectors->name, HNAE3_INT_NAME_LEN - 1,
> > + 

Re: Performance regression with virtio_net

2017-07-27 Thread Michael S. Tsirkin
On Thu, Jul 27, 2017 at 12:09:42PM -0500, Seth Forshee wrote:
> I'm seeing a performance regression with virtio_net that looks to have
> started in 4.12-rc1. I only see it in one context though, downloading
> snap packages from the Ubuntu snap store. For example:
> 
>  
> https://api.snapcraft.io/api/v1/snaps/download/b8X2psL1ryVrPt5WEmpYiqfr5emixTd7_1797.snap
> 
> which redirects to Internap's CDN. Normally this downloads in a few
> seconds at ~10 MB/s, but with 4.12 and 4.13 it takes minutes with a rate
> of ~150 KB/s. Everything else I've tried downloads as normal speeds.

So just wget that URL should be enough?

> I bisected this to 680557cf79f8 "virtio_net: rework mergeable buffer
> handling". If I revert this on top of 4.13-rc2 (along with other changes
> needed to successfully revert it) speeds return to normal.
> 
> Thanks,
> Seth


Interesting. A more likely suspect would be
e377fcc8486d40867c6c217077ad0fa40977e060 - could you please try
reverting that one instead?
Also, could you please look at mergeable_rx_buffer_size in sysfs with
and without the change?

-- 
MST


Re: [PATCH V2 net] Revert "vhost: cache used event for better performance"

2017-07-27 Thread Michael S. Tsirkin
On Thu, Jul 27, 2017 at 11:22:05AM +0800, Jason Wang wrote:
> This reverts commit 809ecb9bca6a9424ccd392d67e368160f8b76c92. Since it
> was reported to break vhost_net. We want to cache used event and use
> it to check for notification. The assumption was that guest won't move
> the event idx back, but this could happen in fact when 16 bit index
> wraps around after 64K entries.
> 
> Signed-off-by: Jason Wang 

Acked-by: Michael S. Tsirkin 

> ---
> - Changes from V1: tweak commit log
> - The patch is needed for -stable.
> ---
>  drivers/vhost/vhost.c | 28 ++--
>  drivers/vhost/vhost.h |  3 ---
>  2 files changed, 6 insertions(+), 25 deletions(-)
> 
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index e4613a3..9cb3f72 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -308,7 +308,6 @@ static void vhost_vq_reset(struct vhost_dev *dev,
>   vq->avail = NULL;
>   vq->used = NULL;
>   vq->last_avail_idx = 0;
> - vq->last_used_event = 0;
>   vq->avail_idx = 0;
>   vq->last_used_idx = 0;
>   vq->signalled_used = 0;
> @@ -1402,7 +1401,7 @@ long vhost_vring_ioctl(struct vhost_dev *d, int ioctl, 
> void __user *argp)
>   r = -EINVAL;
>   break;
>   }
> - vq->last_avail_idx = vq->last_used_event = s.num;
> + vq->last_avail_idx = s.num;
>   /* Forget the cached index value. */
>   vq->avail_idx = vq->last_avail_idx;
>   break;
> @@ -2241,6 +2240,10 @@ static bool vhost_notify(struct vhost_dev *dev, struct 
> vhost_virtqueue *vq)
>   __u16 old, new;
>   __virtio16 event;
>   bool v;
> + /* Flush out used index updates. This is paired
> +  * with the barrier that the Guest executes when enabling
> +  * interrupts. */
> + smp_mb();
>  
>   if (vhost_has_feature(vq, VIRTIO_F_NOTIFY_ON_EMPTY) &&
>   unlikely(vq->avail_idx == vq->last_avail_idx))
> @@ -2248,10 +2251,6 @@ static bool vhost_notify(struct vhost_dev *dev, struct 
> vhost_virtqueue *vq)
>  
>   if (!vhost_has_feature(vq, VIRTIO_RING_F_EVENT_IDX)) {
>   __virtio16 flags;
> - /* Flush out used index updates. This is paired
> -  * with the barrier that the Guest executes when enabling
> -  * interrupts. */
> - smp_mb();
>   if (vhost_get_avail(vq, flags, &vq->avail->flags)) {
>   vq_err(vq, "Failed to get flags");
>   return true;
> @@ -2266,26 +2265,11 @@ static bool vhost_notify(struct vhost_dev *dev, 
> struct vhost_virtqueue *vq)
>   if (unlikely(!v))
>   return true;
>  
> - /* We're sure if the following conditions are met, there's no
> -  * need to notify guest:
> -  * 1) cached used event is ahead of new
> -  * 2) old to new updating does not cross cached used event. */
> - if (vring_need_event(vq->last_used_event, new + vq->num, new) &&
> - !vring_need_event(vq->last_used_event, new, old))
> - return false;
> -
> - /* Flush out used index updates. This is paired
> -  * with the barrier that the Guest executes when enabling
> -  * interrupts. */
> - smp_mb();
> -
>   if (vhost_get_avail(vq, event, vhost_used_event(vq))) {
>   vq_err(vq, "Failed to get used event idx");
>   return true;
>   }
> - vq->last_used_event = vhost16_to_cpu(vq, event);
> -
> - return vring_need_event(vq->last_used_event, new, old);
> + return vring_need_event(vhost16_to_cpu(vq, event), new, old);
>  }
>  
>  /* This actually signals the guest, using eventfd. */
> diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
> index f720958..bb7c29b 100644
> --- a/drivers/vhost/vhost.h
> +++ b/drivers/vhost/vhost.h
> @@ -115,9 +115,6 @@ struct vhost_virtqueue {
>   /* Last index we used. */
>   u16 last_used_idx;
>  
> - /* Last used evet we've seen */
> - u16 last_used_event;
> -
>   /* Used flags */
>   u16 used_flags;
>  
> -- 
> 2.7.4
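
For readers following the revert, here is a minimal user-space sketch of why the cached value is unsafe. need_event() uses the same arithmetic as vring_need_event() in include/uapi/linux/virtio_ring.h; cached_says_skip() mirrors the shortcut removed by the hunk above. All index values used with it are hypothetical, chosen only to illustrate a stale cache.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same arithmetic as vring_need_event(): true when event_idx lies in
 * [old_idx, new_idx), computed modulo 2^16 so that it survives wraparound.
 */
static bool need_event(uint16_t event_idx, uint16_t new_idx, uint16_t old_idx)
{
	return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx - old_idx);
}

/* Mirror of the removed shortcut: skip the notification when the cached
 * used event looks "ahead of new" and was not crossed by old -> new.
 * It never re-reads the guest's current used_event, so it trusts a value
 * that may be out of date.
 */
static bool cached_says_skip(uint16_t cached_event, uint16_t num,
			     uint16_t new_idx, uint16_t old_idx)
{
	assert(num != 0);
	return need_event(cached_event, (uint16_t)(new_idx + num), new_idx) &&
	       !need_event(cached_event, new_idx, old_idx);
}
```

With a stale cached event of 100, a ring of 256 entries, old = 50 and new = 60, cached_says_skip() returns true. If the guest's current used_event is 55, need_event(55, 60, 50) is also true, so a notification the guest asked for would be suppressed. The wrap-safe comparison itself is fine: need_event(65535, 2, 65530) still reports that an event is needed.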


Re: [PATCH v3 1/3] ptp: introduce ptp auxiliary worker

2017-07-27 Thread Richard Cochran
On Wed, Jul 26, 2017 at 05:11:36PM -0500, Grygorii Strashko wrote:
> @@ -217,6 +231,19 @@ struct ptp_clock *ptp_clock_register(struct 
> ptp_clock_info *info,
>   mutex_init(&ptp->pincfg_mux);
>   init_waitqueue_head(&ptp->tsev_wq);
>  
> + if (ptp->info->do_aux_work) {
> + char *worker_name = kasprintf(GFP_KERNEL, "ptp%d", ptp->index);

This string is allocated but never freed.

> + kthread_init_delayed_work(&ptp->aux_work, ptp_aux_kworker);
> + ptp->kworker = kthread_create_worker(0, worker_name ?
> +  worker_name : info->name);
> + if (IS_ERR(ptp->kworker)) {
> + err = PTR_ERR(ptp->kworker);
> + pr_err("failed to create ptp aux_worker %d\n", err);
> + goto kworker_err;
> + }
> + }
> +

Thanks,
Richard
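
One way to avoid the allocation altogether, sketched here in user space: kthread_create_worker() takes a printf-style name format, so the index can be passed as an argument instead of being formatted into a heap string first. demo_create_worker() and struct demo_worker below are hypothetical stand-ins, not kernel API; they only show that the callee can format the name into its own storage.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define DEMO_NAME_LEN 16	/* assumption: small fixed-size name buffer */

/* Hypothetical stand-in for the worker object that owns its name. */
struct demo_worker {
	char name[DEMO_NAME_LEN];
};

/* Hypothetical stand-in for a printf-style creation call: the name is
 * formatted straight into storage owned by the worker, so the caller
 * has nothing to allocate and nothing to free.
 */
static void demo_create_worker(struct demo_worker *w, const char *namefmt, ...)
{
	va_list args;

	assert(w != NULL && namefmt != NULL);
	va_start(args, namefmt);
	vsnprintf(w->name, sizeof(w->name), namefmt, args);
	va_end(args);
}
```

Calling demo_create_worker(&w, "ptp%d", 3) leaves "ptp3" in w.name. Passing a literal format also avoids handing a runtime string to a format parameter.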


Re: [PATCHv4 net] ipv6: no need to check rt->dst.error when get route info

2017-07-27 Thread Roopa Prabhu
On Thu, Jul 27, 2017 at 9:25 AM, Hangbin Liu  wrote:
> After commit 18c3a61c4264 ("net: ipv6: RTM_GETROUTE: return matched fib
> result when requested"). When we get a prohibit ertry, we will return
> -EACCES directly instead of dump route info.
>
> Fix it by remove the rt->dst.error check.
>
> Before fix:
> \# ip -6 route add prohibit 2003::/64 dev eth1
> \# ip -6 route get fibmatch 2003::1
> RTNETLINK answers: Permission denied
> \# ip -6 route add unreachable 2004::/64 dev eth1
> \# ip -6 route get fibmatch 2004::1
> RTNETLINK answers: No route to host
>
> After fix:
> \# ip -6 route add prohibit 2003::/64 dev eth1
> \# ip -6 route get fibmatch 2003::1
> prohibit 2003::/64 dev lo metric 1024 error -13 pref medium
> \# ip -6 route add unreachable 2004::/64 dev eth1
> \# ip -6 route get fibmatch 2004::1
> unreachable 2004::/64 dev lo metric 1024 error -113 pref medium
>
> Fixes: 18c3a61c4264 ("net: ipv6: RTM_GETROUTE: return matched fib...")
> Signed-off-by: Hangbin Liu 
> ---

Acked-by: Roopa Prabhu 


[PATCH net-next] liquidio: bump up driver version to match newer NIC firmware

2017-07-27 Thread Felix Manlunas
Bump up driver version to match newer NIC firmware.  Also update
nic_rx_stats (a struct common to host driver and firmware) by adding a new
field:  fw_total_fwd_bytes.

Signed-off-by: Felix Manlunas 
Signed-off-by: Raghu Vatsavayi 
---
 drivers/net/ethernet/cavium/liquidio/liquidio_common.h | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/cavium/liquidio/liquidio_common.h 
b/drivers/net/ethernet/cavium/liquidio/liquidio_common.h
index 53aaf41..3b9e364 100644
--- a/drivers/net/ethernet/cavium/liquidio/liquidio_common.h
+++ b/drivers/net/ethernet/cavium/liquidio/liquidio_common.h
@@ -27,8 +27,8 @@
 
 #define LIQUIDIO_PACKAGE ""
 #define LIQUIDIO_BASE_MAJOR_VERSION 1
-#define LIQUIDIO_BASE_MINOR_VERSION 5
-#define LIQUIDIO_BASE_MICRO_VERSION 1
+#define LIQUIDIO_BASE_MINOR_VERSION 6
+#define LIQUIDIO_BASE_MICRO_VERSION 0
 #define LIQUIDIO_BASE_VERSION   __stringify(LIQUIDIO_BASE_MAJOR_VERSION) "." \
__stringify(LIQUIDIO_BASE_MINOR_VERSION)
 #define LIQUIDIO_MICRO_VERSION  "." __stringify(LIQUIDIO_BASE_MICRO_VERSION)
@@ -768,6 +768,7 @@ struct nic_rx_stats {
/* firmware stats */
u64 fw_total_rcvd;
u64 fw_total_fwd;
+   u64 fw_total_fwd_bytes;
u64 fw_err_pko;
u64 fw_err_link;
u64 fw_err_drop;


[PATCH net] bpf: don't indicate success when copy_from_user fails

2017-07-27 Thread Daniel Borkmann
err in bpf_prog_get_info_by_fd() still holds 0 at that time from prior
check_uarg_tail_zero() check. Explicitly return -EFAULT instead, so
user space can be notified of buggy behavior.

Fixes: 1e2709769086 ("bpf: Add BPF_OBJ_GET_INFO_BY_FD")
Signed-off-by: Daniel Borkmann 
---
 kernel/bpf/syscall.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 045646d..84bb399 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1289,7 +1289,7 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog *prog,
info_len = min_t(u32, sizeof(info), info_len);
 
if (copy_from_user(&info, uinfo, info_len))
-   return err;
+   return -EFAULT;
 
info.type = prog->type;
info.id = prog->aux->id;
-- 
1.9.3
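
The failure mode is easy to reproduce outside the kernel. The following is a user-space sketch only: demo_copy() is a hypothetical stand-in for copy_from_user() (it returns the number of bytes not copied), and the two demo_get_info_*() functions are simplified illustrations of the before and after behaviour, not the real bpf_prog_get_info_by_fd().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for copy_from_user(): returns the number of
 * bytes that could NOT be copied; a NULL source simulates a fault.
 */
static size_t demo_copy(void *dst, const void *src, size_t len)
{
	assert(dst != NULL);
	if (src == NULL)
		return len;
	memcpy(dst, src, len);
	return 0;
}

/* Before the fix: err still holds 0 from an earlier successful check,
 * so a failed copy is reported to the caller as success.
 */
static int demo_get_info_buggy(const void *uinfo, size_t len)
{
	char info[64];
	int err = 0;	/* left at 0 by the prior check */

	if (len > sizeof(info))
		len = sizeof(info);
	if (demo_copy(info, uinfo, len))
		return err;	/* bug: returns 0 */
	return 0;
}

/* After the fix: the failure is reported explicitly. */
static int demo_get_info_fixed(const void *uinfo, size_t len)
{
	char info[64];

	if (len > sizeof(info))
		len = sizeof(info);
	if (demo_copy(info, uinfo, len))
		return -EFAULT;
	return 0;
}
```

With a faulting source (NULL here), demo_get_info_buggy() returns 0 while demo_get_info_fixed() returns -EFAULT.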



Re: [PATCH v2 2/4] can: fixed-transceiver: Add documentation for CAN fixed transceiver bindings

2017-07-27 Thread Oliver Hartkopp

On 07/26/2017 08:29 PM, Franklin S Cooper Jr wrote:





I'm fine with switching to using bitrate instead of speed. Kurk was
originally the one that suggested to use the term arbitration and data
since that's how the spec refers to it. Which I do agree with. But you're
right that in the drivers (struct can_priv) we just use bittiming and
data_bittiming (CAN-FD timings). I don't think adding "fd" into the
property name makes sense unless we are calling it something like
"max-canfd-bitrate" which I would agree is the easiest to understand.

So what is the preference if we end up sticking with two properties?
Option 1 or 2?

1)
max-bitrate
max-data-bitrate

2)
max-bitrate
max-canfd-bitrate




1


A CAN transceiver is limited in bandwidth. But you only have one RX and
one TX line between the CAN controller and the CAN transceiver. The
transceiver does not know about CAN FD - it has just a physical(!) layer
with a limited bandwidth. This is ONE limitation.

So I tend to specify only ONE 'max-bitrate' property for the
fixed-transceiver binding.

The fact whether the CAN controller is CAN FD capable or not is provided
by the netlink configuration interface for CAN controllers.


Part of the reasoning to have two properties is to indicate that you
don't support CAN FD while limiting the "arbitration" bit rate.


??

It's a physical layer device which only has a bandwidth limitation.
The transceiver does not know about CAN FD.


With one
property you can not determine this and end up having to make some
assumptions that can quickly end up biting people.


Despite the fact that the transceiver does not know anything about ISO 
layer 2 (CAN/CAN FD) the properties should look like


max-bitrate
canfd-capable

then.

But when the transceiver is 'canfd-capable' agnostic, why provide a 
property for it?


Maybe I'm wrong but I still can't follow your argumentation ideas.

Regards,
Oliver


Re: [PATCH RFC 11/13] phylink: add support for MII ioctl access to Clause 45 PHYs

2017-07-27 Thread Andrew Lunn
On Tue, Jul 25, 2017 at 03:03:28PM +0100, Russell King wrote:
> Add support for reading and writing the clause 45 MII registers.
> 
> Signed-off-by: Russell King 

Reviewed-by: Andrew Lunn 

Andrew


Re: [PATCH v7 2/3] PCI: Enable PCIe Relaxed Ordering if supported

2017-07-27 Thread Raj, Ashok
Hi Casey

> | Still no Intel and AMD guys has ack this, this is what I am worried about,
> | should I ping some man again ?


I can ack the patch set for Intel specific changes. Now that the doc is made
public :-).

Can you/Ding resend the patch series? I do have the most recent v7, but some
of the commit messages weren't easy to read. Seems like this patch has
gotten bigger than originally intended, but seems to be for the overall
good :-).

Sorry for staying silent up until now.

Cheers,
Ashok


RE: [Intel-wired-lan] [PATCH] igb: support BCM54616 PHY

2017-07-27 Thread Wyborny, Carolyn
> -Original Message-
> From: netdev-ow...@vger.kernel.org [mailto:netdev-
> ow...@vger.kernel.org] On Behalf Of Florian Fainelli
> Sent: Thursday, July 27, 2017 10:58 AM
> To: Andrew Lunn ; Brown, Aaron F
> 
> Cc: John W. Linville ; netdev@vger.kernel.org; intel-
> wired-...@lists.osuosl.org
> Subject: Re: [Intel-wired-lan] [PATCH] igb: support BCM54616 PHY
> 
> On 07/27/2017 08:37 AM, Andrew Lunn wrote:
> > On Thu, Jul 27, 2017 at 12:40:01AM +, Brown, Aaron F wrote:
> >>> From: Intel-wired-lan [mailto:intel-wired-lan-boun...@osuosl.org] On
> Behalf
> >>> Of John W. Linville
> >>> Sent: Friday, July 21, 2017 11:12 AM
> >>> To: netdev@vger.kernel.org
> >>> Cc: intel-wired-...@lists.osuosl.org; John W. Linville
> >>> 
> >>> Subject: [Intel-wired-lan] [PATCH] igb: support BCM54616 PHY
> >>>
> >>> The management port on an Edgecore AS7712-32 switch uses an igb MAC,
> >>> but
> >>> it uses a BCM54616 PHY. Without a patch like this, loading the igb
> >>> module produces dmesg output like this:
> >>>
> >>> [3.439125] igb: Copyright (c) 2007-2014 Intel Corporation.
> >>> [3.439866] igb: probe of :00:14.0 failed with error -2
> >>>
> >>> Signed-off-by: John W. Linville 
> >>> Cc: Jeff Kirsher 
> >>> ---
> >>>  drivers/net/ethernet/intel/igb/e1000_82575.c   | 6 ++
> >>>  drivers/net/ethernet/intel/igb/e1000_defines.h | 1 +
> >>>  drivers/net/ethernet/intel/igb/e1000_hw.h  | 1 +
> >>>  3 files changed, 8 insertions(+)
> >>
> >> I do not have the specific hardware (Edgecore switch) but as far as
> regression tests go this works fine.
> >> Tested-by: Aaron Brown 
> >
> > Sorry, missed the initial post, so replying to a reply.
> >
> > Linux has supported the BCM54616 PHY since April 2015. If the Intel
> > drivers used the Linux PHY drivers, you would not of had this problem.
> >
> > It would be good if somebody spent the time to migrate these MAC
> > drivers to use the common Linux PHY infrastructure.
> 
> I suspect there is a design pattern within the Intel drivers to share as
> much low-level code as possible between OSes and only have some
> Linux-ism where necessary (e.g: net_device, ethtool etc.).
> 
> PHY code is a pain in general, especially if you are serious about
> testing interoperability (which is where you can spend tons of $$$ with
> little reward but just say: yes it works), so it may make sense to share
> it across different OSes.
> 
> I too, wish there was more sharing, but considering that this works for
> the Intel driver, there is little incentive in doing this I suppose...
> --
> Florian

Yes Florian you are correct generally, but future implementations have not been 
ruled out.  With this driver, we had our custom phy init code tested and 
released for several products before phylib existed.  I began research on a 
phylib implementation for it at one point, but internal product decisions ended 
up lowering the priority of that work.

Thanks,

Carolyn

Carolyn Wyborny 
Linux Development 
Networking Division 
Intel Corporation 




Re: [PATCHv2 iproute2] utils: return default family when rtm_family is not RTNL_FAMILY_IPMR/IP6MR

2017-07-27 Thread Stephen Hemminger
On Thu, 27 Jul 2017 12:03:02 +0200
Phil Sutter  wrote:

> On Thu, Jul 27, 2017 at 05:44:15PM +0800, Hangbin Liu wrote:
> > When we get a multicast route, the rtm_type is RTN_MULTICAST, but the
> > rtm_family may be AF_INET. If we only check the type with RTNL_FAMILY_IPMR,
> > we will get malformed address. e.g.
> > 
> > + ip -4 route add multicast 172.111.1.1 dev em1 table main
> > 
> > Before fix:
> > + ip route list type multicast table main
> > multicast ac6f:101:800:400:400:0:3c00:0 dev em1 scope link
> > 
> > After fix:
> > + ip route list type multicast table main
> > multicast 172.111.1.1 dev em1 scope link
> > 
> > Fixes: 56e3eb4c3400 ("ip: route: fix multicast route dumps")
> > Signed-off-by: Hangbin Liu   
> 
> Acked-by: Phil Sutter 

Applied, thanks.
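
The malformed address in the "Before fix" output starts with the bytes of 172.111.1.1 read as IPv6 (172 = 0xac, 111 = 0x6f). A small sketch of that effect, assuming a zero-filled 16-byte buffer; the trailing groups in the real output come from neighbouring bytes that are not reproduced here, and demo_format() is a hypothetical helper, not iproute2 code.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Format the same raw address bytes under a caller-chosen family.
 * Choosing AF_INET6 for a 4-byte IPv4 address makes inet_ntop() read
 * 16 bytes and print them as IPv6 groups.
 */
static const char *demo_format(int family, const unsigned char *bytes,
			       char *out, size_t outlen)
{
	assert(family == AF_INET || family == AF_INET6);
	return inet_ntop(family, bytes, out, (socklen_t)outlen);
}
```

For the bytes {172, 111, 1, 1} followed by zeros, AF_INET yields "172.111.1.1", while AF_INET6 yields a string beginning with "ac6f:101", matching the start of the bogus address quoted above.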


Re: [PATCH] netns: more input validation

2017-07-27 Thread Stephen Hemminger
On Tue, 25 Jul 2017 15:30:31 +0200
Matteo Croce  wrote:

> ip netns accepts invalid input as namespace name like an empty string or a
> string longer than the maximum file name length.
> Check that the netns name is not empty and less than or equal to NAME_MAX.
> 
> Signed-off-by: Matteo Croce 

Sure, applied.


Re: [PATCH iproute2] geneve: support for modifying geneve device

2017-07-27 Thread Stephen Hemminger
On Tue, 25 Jul 2017 19:11:43 -0700
Girish Moodalbail  wrote:

> Ability to change geneve device attributes was added to kernel through
> commit 5b861f6baa3a ("geneve: add rtnl changelink support"), however one
> cannot do the same through ip-link(8) command.  Changing the allowed
> geneve device attributes using 'ip link set  type geneve id
>  ' currently fails with 'operation not
> supported' error.  This patch adds support for it.
> 
> Signed-off-by: Girish Moodalbail 

Looks good, applied.
Please send a follow on patch to update usage help and man page.


Re: [Intel-wired-lan] [PATCH] igb: support BCM54616 PHY

2017-07-27 Thread Andrew Lunn
> I recall someone mentioning that there were plans to have phylib support
> [Q]SFP[+] modules as well. I am very interested in that work, if someone
> has patches/plans I would like to take a look.

Hi Jon

The first part was posted yesterday:

https://www.spinics.net/lists/netdev/msg445767.html

Andrew


[PATCH v2] net: inet: diag: expose sockets cgroup classid

2017-07-27 Thread Levin, Alexander (Sasha Levin)
This is useful for directly looking up a task based on class id rather than
having to scan through all open file descriptors.

Signed-off-by: Sasha Levin 
---

Changes in V2:
 - Addressed comments from Cong Wang (use nla_put_u32())

 include/uapi/linux/inet_diag.h |  1 +
 net/ipv4/inet_diag.c   | 10 ++
 2 files changed, 11 insertions(+)

diff --git a/include/uapi/linux/inet_diag.h b/include/uapi/linux/inet_diag.h
index bbe201047df6..678496897a68 100644
--- a/include/uapi/linux/inet_diag.h
+++ b/include/uapi/linux/inet_diag.h
@@ -142,6 +142,7 @@ enum {
INET_DIAG_PAD,
INET_DIAG_MARK,
INET_DIAG_BBRINFO,
+   INET_DIAG_CLASS_ID,
__INET_DIAG_MAX,
 };
 
diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
index 3828b3a805cd..2c2445d4bb58 100644
--- a/net/ipv4/inet_diag.c
+++ b/net/ipv4/inet_diag.c
@@ -274,6 +274,16 @@ int inet_sk_diag_fill(struct sock *sk, struct 
inet_connection_sock *icsk,
goto errout;
}
 
+   if (ext & (1 << (INET_DIAG_CLASS_ID - 1))) {
+   u32 classid = 0;
+
+#ifdef CONFIG_SOCK_CGROUP_DATA
+   classid = sock_cgroup_classid(&sk->sk_cgrp_data);
+#endif
+
+   nla_put_u32(skb, INET_DIAG_CLASS_ID, classid);
+   }
+
 out:
nlmsg_end(skb, nlh);
return 0;
-- 
2.11.0


Re: [PATCH V4 net-next 7/8] net: hns3: Add Ethtool support to HNS3 driver

2017-07-27 Thread Florian Fainelli
On 07/27/2017 11:01 AM, Salil Mehta wrote:
> Hi Florian,
> 
>> -Original Message-
>> From: Florian Fainelli [mailto:f.faine...@gmail.com]
>> Sent: Sunday, July 23, 2017 6:05 PM
>> To: Salil Mehta; da...@davemloft.net
>> Cc: Zhuangyuzeng (Yisen); huangdaode; lipeng (Y);
>> mehta.salil@gmail.com; netdev@vger.kernel.org; linux-
>> ker...@vger.kernel.org; linux-r...@vger.kernel.org; Linuxarm
>> Subject: Re: [PATCH V4 net-next 7/8] net: hns3: Add Ethtool support to
>> HNS3 driver
>>
>>
>>
>> On 07/22/2017 03:09 PM, Salil Mehta wrote:
>>> This patch adds the support of the Ethtool interface to
>>> the HNS3 Ethernet driver. Various commands to read the
>>> statistics, configure the offloading, loopback selftest etc.
>>> are supported.
>>>
>>> Signed-off-by: Daode Huang 
>>> Signed-off-by: lipeng 
>>> Signed-off-by: Salil Mehta 
>>> Signed-off-by: Yisen Zhuang 
>>> ---
>>> Patch V4: addressed below comments
>>>  1. Andrew Lunn
>>> Removed the support of loop PHY back for now
>>> Patch V3: Address below comments
>>>  1. Stephen Hemminger
>>> https://lkml.org/lkml/2017/6/13/974
>>>  2. Andrew Lunn
>>> https://lkml.org/lkml/2017/6/13/1037
>>> Patch V2: No change
>>> Patch V1: Initial Submit
>>> ---
>>>  .../ethernet/hisilicon/hns3/hns3pf/hns3_ethtool.c  | 543
>> +
>>>  1 file changed, 543 insertions(+)
>>>  create mode 100644
>> drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_ethtool.c
>>>
>>> diff --git
>> a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_ethtool.c
>> b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_ethtool.c
>>> new file mode 100644
>>> index ..82b0d4d829f8
>>> --- /dev/null
>>> +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_ethtool.c
>>> @@ -0,0 +1,543 @@
>>> +/*
>>> + * Copyright (c) 2016~2017 Hisilicon Limited.
>>> + *
>>> + * This program is free software; you can redistribute it and/or
>> modify
>>> + * it under the terms of the GNU General Public License as published
>> by
>>> + * the Free Software Foundation; either version 2 of the License, or
>>> + * (at your option) any later version.
>>> + */
>>> +
>>> +#include 
>>> +#include "hns3_enet.h"
>>> +
>>> +struct hns3_stats {
>>> +   char stats_string[ETH_GSTRING_LEN];
>>> +   int stats_size;
>>> +   int stats_offset;
>>> +};
>>> +
>>> +/* netdev related stats */
>>> +#define HNS3_NETDEV_STAT(_string, _member) \
>>> +   { _string,  \
>>> + FIELD_SIZEOF(struct rtnl_link_stats64, _member),  \
>>> + offsetof(struct rtnl_link_stats64, _member),  \
>>> +   }
>>
>> Can you make this macro use named initializers?
> Can you please explain bit more or point out some
> example. This would be very handy. 

.stats_string = _string,
.stats_size = FIELD_SIZEOF(struct rtnl_link_stats64, _member),
.stats_offset = offsetof(struct rtnl_link_stats64, _member),

https://gcc.gnu.org/onlinedocs/gcc/Designated-Inits.html
-- 
Florian
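
Put together, the macro with designated initializers could look roughly like the sketch below. It is a user-space illustration, not the driver code: ETH_GSTRING_LEN and FIELD_SIZEOF are redefined locally, struct demo_link_stats64 is a hypothetical two-field substitute for struct rtnl_link_stats64, and demo_stats[] and demo_stat() exist only for the example. The member names follow the struct hns3_stats definition quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Local stand-ins so the sketch builds outside the kernel. */
#define ETH_GSTRING_LEN 32
#define FIELD_SIZEOF(t, f) (sizeof(((t *)0)->f))

/* Hypothetical, trimmed-down substitute for struct rtnl_link_stats64. */
struct demo_link_stats64 {
	unsigned long long rx_packets;
	unsigned long long tx_packets;
};

struct hns3_stats {
	char stats_string[ETH_GSTRING_LEN];
	int stats_size;
	int stats_offset;
};

/* Each member is named, so reordering the struct fields cannot silently
 * mismatch the initializer.
 */
#define HNS3_NETDEV_STAT(_string, _member)				\
	{								\
		.stats_string = _string,				\
		.stats_size = FIELD_SIZEOF(struct demo_link_stats64,	\
					   _member),			\
		.stats_offset = offsetof(struct demo_link_stats64,	\
					 _member),			\
	}

static const struct hns3_stats demo_stats[] = {
	HNS3_NETDEV_STAT("rx_packets", rx_packets),
	HNS3_NETDEV_STAT("tx_packets", tx_packets),
};

/* Bounds-checked accessor used only by this example. */
static const struct hns3_stats *demo_stat(size_t i)
{
	assert(i < sizeof(demo_stats) / sizeof(demo_stats[0]));
	return &demo_stats[i];
}
```

Here demo_stat(1) describes tx_packets with a size and an offset of 8 bytes each on platforms where unsigned long long is 8 bytes wide.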


Re: [PATCHv4 net] ipv6: no need to check rt->dst.error when get route info

2017-07-27 Thread David Ahern
On 7/27/17 10:25 AM, Hangbin Liu wrote:
> After commit 18c3a61c4264 ("net: ipv6: RTM_GETROUTE: return matched fib
> result when requested"). When we get a prohibit ertry, we will return
> -EACCES directly instead of dump route info.
> 
> Fix it by remove the rt->dst.error check.
> 
> Before fix:
> \# ip -6 route add prohibit 2003::/64 dev eth1
> \# ip -6 route get fibmatch 2003::1
> RTNETLINK answers: Permission denied
> \# ip -6 route add unreachable 2004::/64 dev eth1
> \# ip -6 route get fibmatch 2004::1
> RTNETLINK answers: No route to host
> 
> After fix:
> \# ip -6 route add prohibit 2003::/64 dev eth1
> \# ip -6 route get fibmatch 2003::1
> prohibit 2003::/64 dev lo metric 1024 error -13 pref medium
> \# ip -6 route add unreachable 2004::/64 dev eth1
> \# ip -6 route get fibmatch 2004::1
> unreachable 2004::/64 dev lo metric 1024 error -113 pref medium
> 
> Fixes: 18c3a61c4264 ("net: ipv6: RTM_GETROUTE: return matched fib...")
> Signed-off-by: Hangbin Liu 
> ---
>  net/ipv6/route.c | 6 --
>  1 file changed, 6 deletions(-)
> 

Acked-by: David Ahern 


RE: [PATCH V4 net-next 7/8] net: hns3: Add Ethtool support to HNS3 driver

2017-07-27 Thread Salil Mehta
Hi Florian,

> -----Original Message-----
> From: Florian Fainelli [mailto:f.faine...@gmail.com]
> Sent: Sunday, July 23, 2017 6:05 PM
> To: Salil Mehta; da...@davemloft.net
> Cc: Zhuangyuzeng (Yisen); huangdaode; lipeng (Y);
> mehta.salil@gmail.com; netdev@vger.kernel.org; linux-
> ker...@vger.kernel.org; linux-r...@vger.kernel.org; Linuxarm
> Subject: Re: [PATCH V4 net-next 7/8] net: hns3: Add Ethtool support to
> HNS3 driver
> 
> 
> 
> On 07/22/2017 03:09 PM, Salil Mehta wrote:
> > This patch adds the support of the Ethtool interface to
> > the HNS3 Ethernet driver. Various commands to read the
> > statistics, configure the offloading, loopback selftest etc.
> > are supported.
> >
> > Signed-off-by: Daode Huang 
> > Signed-off-by: lipeng 
> > Signed-off-by: Salil Mehta 
> > Signed-off-by: Yisen Zhuang 
> > ---
> > Patch V4: addressed below comments
> >  1. Andrew Lunn
> > Removed the support of loop PHY back for now
> > Patch V3: Address below comments
> >  1. Stephen Hemminger
> > https://lkml.org/lkml/2017/6/13/974
> >  2. Andrew Lunn
> > https://lkml.org/lkml/2017/6/13/1037
> > Patch V2: No change
> > Patch V1: Initial Submit
> > ---
> >  .../ethernet/hisilicon/hns3/hns3pf/hns3_ethtool.c  | 543
> +
> >  1 file changed, 543 insertions(+)
> >  create mode 100644
> drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_ethtool.c
> >
> > diff --git
> a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_ethtool.c
> b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_ethtool.c
> > new file mode 100644
> > index ..82b0d4d829f8
> > --- /dev/null
> > +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_ethtool.c
> > @@ -0,0 +1,543 @@
> > +/*
> > + * Copyright (c) 2016~2017 Hisilicon Limited.
> > + *
> > + * This program is free software; you can redistribute it and/or
> modify
> > + * it under the terms of the GNU General Public License as published
> by
> > + * the Free Software Foundation; either version 2 of the License, or
> > + * (at your option) any later version.
> > + */
> > +
> > +#include 
> > +#include "hns3_enet.h"
> > +
> > +struct hns3_stats {
> > +   char stats_string[ETH_GSTRING_LEN];
> > +   int stats_size;
> > +   int stats_offset;
> > +};
> > +
> > +/* netdev related stats */
> > +#define HNS3_NETDEV_STAT(_string, _member) \
> > +   { _string,  \
> > + FIELD_SIZEOF(struct rtnl_link_stats64, _member),  \
> > + offsetof(struct rtnl_link_stats64, _member),  \
> > +   }
> 
> Can you make this macro use named initializers?
Can you please explain a bit more or point to an
example? That would be very handy.

Thanks

> 
> > +
> > +static const struct hns3_stats hns3_netdev_stats[] = {
> > +   /* misc. Rx/Tx statistics */
> > +   HNS3_NETDEV_STAT("rx_packets", rx_packets),
> > +   HNS3_NETDEV_STAT("tx_packets", tx_packets),
> > +   HNS3_NETDEV_STAT("rx_bytes", rx_bytes),
> > +   HNS3_NETDEV_STAT("tx_bytes", tx_bytes),
> > +   HNS3_NETDEV_STAT("rx_errors", rx_errors),
> > +   HNS3_NETDEV_STAT("tx_errors", tx_errors),
> > +   HNS3_NETDEV_STAT("rx_dropped", rx_dropped),
> > +   HNS3_NETDEV_STAT("tx_dropped", tx_dropped),
> > +   HNS3_NETDEV_STAT("multicast", multicast),
> > +   HNS3_NETDEV_STAT("collisions", collisions),
> > +
> > +   /* detailed Rx errors */
> > +   HNS3_NETDEV_STAT("rx_length_errors", rx_length_errors),
> > +   HNS3_NETDEV_STAT("rx_over_errors", rx_over_errors),
> > +   HNS3_NETDEV_STAT("rx_crc_errors", rx_crc_errors),
> > +   HNS3_NETDEV_STAT("rx_frame_errors", rx_frame_errors),
> > +   HNS3_NETDEV_STAT("rx_fifo_errors", rx_fifo_errors),
> > +   HNS3_NETDEV_STAT("rx_missed_errors", rx_missed_errors),
> > +
> > +   /* detailed Tx errors */
> > +   HNS3_NETDEV_STAT("tx_aborted_errors", tx_aborted_errors),
> > +   HNS3_NETDEV_STAT("tx_carrier_errors", tx_carrier_errors),
> > +   HNS3_NETDEV_STAT("tx_fifo_errors", tx_fifo_errors),
> > +   HNS3_NETDEV_STAT("tx_heartbeat_errors", tx_heartbeat_errors),
> > +   HNS3_NETDEV_STAT("tx_window_errors", tx_window_errors),
> > +
> > +   /* for cslip etc */
> > +   HNS3_NETDEV_STAT("rx_compressed", rx_compressed),
> > +   HNS3_NETDEV_STAT("tx_compressed", tx_compressed),
> > +};
> > +
> > +#define HNS3_NETDEV_STATS_COUNT ARRAY_SIZE(hns3_netdev_stats)
> > +
> > +/* tqp related stats */
> > +#define HNS3_TQP_STAT(_string, _member)\
> > +   { _string,  \
> > + FIELD_SIZEOF(struct ring_stats, _member), \
> > + offsetof(struct hns3_enet_ring, stats),   \
> > +   }
> > +
> 
> Same here.
Ok.
> 
> > +static const struct hns3_stats hns3_txq_stats[] = {
> > +   /* Tx per-queue statistics */
> > +   HNS3_TQP_STAT("tx_io_err_cnt", io_err_cnt),
> > +   HNS3_TQP_STAT("tx_sw_err_cnt", sw_err_cnt),
> > +   HNS3_TQP_STAT("tx_seg_pkt_cnt", seg_pkt_cnt),
> > +   HNS3_TQP_STAT("tx_pkts", tx_pkts),
> > +   HNS3_TQP_STAT("tx_bytes", tx_bytes),
> > +   HNS3_TQ

Re: [Intel-wired-lan] [PATCH] igb: support BCM54616 PHY

2017-07-27 Thread Florian Fainelli
On 07/27/2017 08:37 AM, Andrew Lunn wrote:
> On Thu, Jul 27, 2017 at 12:40:01AM +, Brown, Aaron F wrote:
>>> From: Intel-wired-lan [mailto:intel-wired-lan-boun...@osuosl.org] On Behalf
>>> Of John W. Linville
>>> Sent: Friday, July 21, 2017 11:12 AM
>>> To: netdev@vger.kernel.org
>>> Cc: intel-wired-...@lists.osuosl.org; John W. Linville
>>> 
>>> Subject: [Intel-wired-lan] [PATCH] igb: support BCM54616 PHY
>>>
>>> The management port on an Edgecore AS7712-32 switch uses an igb MAC,
>>> but
>>> it uses a BCM54616 PHY. Without a patch like this, loading the igb
>>> module produces dmesg output like this:
>>>
>>> [3.439125] igb: Copyright (c) 2007-2014 Intel Corporation.
>>> [3.439866] igb: probe of :00:14.0 failed with error -2
>>>
>>> Signed-off-by: John W. Linville 
>>> Cc: Jeff Kirsher 
>>> ---
>>>  drivers/net/ethernet/intel/igb/e1000_82575.c   | 6 ++
>>>  drivers/net/ethernet/intel/igb/e1000_defines.h | 1 +
>>>  drivers/net/ethernet/intel/igb/e1000_hw.h  | 1 +
>>>  3 files changed, 8 insertions(+)
>>
>> I do not have the specific hardware (Edgecore switch) but as far as 
>> regression tests go this works fine.
>> Tested-by: Aaron Brown 
> 
> Sorry, missed the initial post, so replying to a reply.
> 
> Linux has supported the BCM54616 PHY since April 2015. If the Intel
> drivers used the Linux PHY drivers, you would not have had this problem.
> 
> It would be good if somebody spent the time to migrate these MAC
> drivers to use the common Linux PHY infrastructure.

I suspect there is a design pattern within the Intel drivers to share as
much low-level code as possible between OSes and only have some
Linux-ism where necessary (e.g: net_device, ethtool etc.).

PHY code is a pain in general, especially if you are serious about
testing interoperability (which is where you can spend tons of $$$ with
little reward but just say: yes it works), so it may make sense to share
it across different OSes.

I too, wish there was more sharing, but considering that this works for
the Intel driver, there is little incentive in doing this I suppose...
-- 
Florian


RE: [PATCH V4 net-next 6/8] net: hns3: Add MDIO support to HNS3 Ethernet driver for hip08 SoC

2017-07-27 Thread Salil Mehta
Hi Florian,

> -----Original Message-----
> From: Florian Fainelli [mailto:f.faine...@gmail.com]
> Sent: Sunday, July 23, 2017 5:54 PM
> To: Salil Mehta; da...@davemloft.net
> Cc: Zhuangyuzeng (Yisen); huangdaode; lipeng (Y);
> mehta.salil@gmail.com; netdev@vger.kernel.org; linux-
> ker...@vger.kernel.org; linux-r...@vger.kernel.org; Linuxarm
> Subject: Re: [PATCH V4 net-next 6/8] net: hns3: Add MDIO support to
> HNS3 Ethernet driver for hip08 SoC
> 
> 
> 
> On 07/22/2017 03:09 PM, Salil Mehta wrote:
> > This patch adds the support of MDIO bus interface for HNS3 driver.
> > Code provides various interfaces to start and stop the PHY layer
> > and to read and write the MDIO bus or PHY.
> >
> > Signed-off-by: Daode Huang 
> > Signed-off-by: lipeng 
> > Signed-off-by: Salil Mehta 
> > Signed-off-by: Yisen Zhuang 
> > ---
> > Patch V4: Addressed following comments:
> >  1. Andrew Lunn:
> > https://lkml.org/lkml/2017/6/17/208
> > Patch V3: Addressed Below comments:
> >  1. Florian Fainelli:
> > https://lkml.org/lkml/2017/6/13/963
> >  2. Andrew Lunn:
> > https://lkml.org/lkml/2017/6/13/1039
> > Patch V2: Addressed below comments:
> >  1. Florian Fainelli:
> > https://lkml.org/lkml/2017/6/10/130
> >  2. Andrew Lunn:
> > https://lkml.org/lkml/2017/6/10/168
> > Patch V1: Initial Submit
> > ---
> >  .../ethernet/hisilicon/hns3/hns3pf/hclge_mdio.c| 230
> +
> >  1 file changed, 230 insertions(+)
> >  create mode 100644
> drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mdio.c
> >
> > diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mdio.c
> b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mdio.c
> > new file mode 100644
> > index ..6036a97f7de5
> > --- /dev/null
> > +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mdio.c
> > @@ -0,0 +1,230 @@
> > +/*
> > + * Copyright (c) 2016~2017 Hisilicon Limited.
> > + *
> > + * This program is free software; you can redistribute it and/or
> modify
> > + * it under the terms of the GNU General Public License as published
> by
> > + * the Free Software Foundation; either version 2 of the License, or
> > + * (at your option) any later version.
> > + */
> > +
> > +#include 
> > +#include 
> > +
> > +#include "hclge_cmd.h"
> > +#include "hclge_main.h"
> > +
> > +enum hclge_mdio_c22_op_seq {
> > +   HCLGE_MDIO_C22_WRITE = 1,
> > +   HCLGE_MDIO_C22_READ = 2
> > +};
> > +
> > +#define HCLGE_MDIO_CTRL_START_B    0
> > +#define HCLGE_MDIO_CTRL_ST_S   1
> > +#define HCLGE_MDIO_CTRL_ST_M   (0x3 << HCLGE_MDIO_CTRL_ST_S)
> > +#define HCLGE_MDIO_CTRL_OP_S   3
> > +#define HCLGE_MDIO_CTRL_OP_M   (0x3 << HCLGE_MDIO_CTRL_OP_S)
> > +
> > +#define HCLGE_MDIO_PHYID_S 0
> > +#define HCLGE_MDIO_PHYID_M (0x1f << HCLGE_MDIO_PHYID_S)
> > +
> > +#define HCLGE_MDIO_PHYREG_S    0
> > +#define HCLGE_MDIO_PHYREG_M    (0x1f << HCLGE_MDIO_PHYREG_S)
> > +
> > +#define HCLGE_MDIO_STA_B   0
> > +
> > +struct hclge_mdio_cfg_cmd {
> > +   u8 ctrl_bit;
> > +   u8 phyid;
> > +   u8 phyad;
> > +   u8 rsvd;
> > +   __le16 reserve;
> > +   __le16 data_wr;
> > +   __le16 data_rd;
> > +   __le16 sta;
> > +};
> > +
> > +static int hclge_mdio_write(struct mii_bus *bus, int phyid, int
> regnum,
> > +   u16 data)
> > +{
> > +   struct hclge_dev *hdev = (struct hclge_dev *)bus->priv;
> 
> Cast is not needed here since bus->priv is already a void *.
> 
> > +   struct hclge_mdio_cfg_cmd *mdio_cmd;
> > +   enum hclge_cmd_status status;
> > +   struct hclge_desc desc;
> > +
> > +   if (!bus)
> > +   return -EINVAL;
> 
> How can this be possible?
Agreed, will remove.

> 
> > +
> > +   hclge_cmd_setup_basic_desc(&desc, HCLGE_OPC_MDIO_CONFIG, false);
> > +
> > +   mdio_cmd = (struct hclge_mdio_cfg_cmd *)desc.data;
> 
> Same here, can we not cast this into a struct hclge_mdio_cfg_cmd?
Do you mean we should assign directly, without the cast?
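
For reference, a small sketch of the two situations. C converts a void * to any object pointer type implicitly, so the bus->priv assignment needs no cast. Whether the desc.data cast can go depends on the type of that field, which is not shown in this mail; the sketch assumes it is a typed array, in which case some conversion is still required. All demo_* types here are hypothetical stand-ins, not the real hclge or mii structures:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins; the real hclge/mii types are not shown here. */
struct demo_dev { int id; };
struct demo_bus { void *priv; };
struct demo_cmd { uint8_t ctrl_bit; uint8_t phyid; };
struct demo_desc { uint32_t data[6]; };	/* assumed: typed array, not void * */

static int demo_dev_id(struct demo_bus *bus)
{
	/* void * converts implicitly in C, so no cast is needed. */
	struct demo_dev *hdev = bus->priv;

	return hdev->id;
}

static void demo_set_phyid(struct demo_desc *desc, uint8_t phyid)
{
	/* uint32_t * does not convert implicitly to struct demo_cmd *,
	 * so this cast (or a void * intermediate) has to stay.
	 */
	struct demo_cmd *cmd = (struct demo_cmd *)desc->data;

	cmd->phyid = phyid;
}
```

If desc.data turned out to be a void * instead, the second cast could be dropped in the same way as the first.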

> 
> > +
> > +   hnae_set_field(mdio_cmd->phyid, HCLGE_MDIO_PHYID_M,
> > +  HCLGE_MDIO_PHYID_S, phyid);
> > +   hnae_set_field(mdio_cmd->phyad, HCLGE_MDIO_PHYREG_M,
> > +  HCLGE_MDIO_PHYREG_S, regnum);
> > +
> > +   hnae_set_bit(mdio_cmd->ctrl_bit, HCLGE_MDIO_CTRL_START_B, 1);
> > +   hnae_set_field(mdio_cmd->ctrl_bit, HCLGE_MDIO_CTRL_ST_M,
> > +  HCLGE_MDIO_CTRL_ST_S, 1);
> > +   hnae_set_field(mdio_cmd->ctrl_bit, HCLGE_MDIO_CTRL_OP_M,
> > +  HCLGE_MDIO_CTRL_OP_S, HCLGE_MDIO_C22_WRITE);
> > +
> > +   mdio_cmd->data_wr = cpu_to_le16(data);
> > +
> > +   status = hclge_cmd_send(&hdev->hw, &desc, 1);
> > +   if (status) {
> > +   dev_err(&hdev->pdev->dev,
> > +   "mdio write fail when sending cmd, status is %d.\n",
> > +   status);
> > +   return -EIO;
> > +   }
> > +
> > +   return 0;
> > +}
> > +
> > +static int hclge_mdio_read(struct mii_bus *bus, int phyid, int
> regnum)
> > +{
> > +

Re: [PATCH v7 2/3] PCI: Enable PCIe Relaxed Ordering if supported

2017-07-27 Thread Alexander Duyck
On Wed, Jul 26, 2017 at 6:08 PM, Ding Tianhong  wrote:
>
>
> On 2017/7/27 2:26, Casey Leedom wrote:
>>   By the way Ding, two issues:
>>
>>  1. Did we ever get any acknowledgement from either Intel or AMD
>> on this patch?  I know that we can't ensure that, but it sure would
>> be nice since the PCI Quirks that we're putting in affect their
>> products.
>>
>
> Still no one from Intel or AMD has acked this, which is what I am worried
> about. Should I ping someone again?
>
> Thanks
> Ding


I probably wouldn't worry about it too much. If anything all this
patch is doing is disabling relaxed ordering on the platforms we know
have issues based on what Casey originally had. If nothing else, we can
follow up once the patches are in the kernel if somebody runs into an
issue.

You can include my acked-by, but it is mostly related to how this
interacts with NICs, and not so much about the PCI chipsets
themselves.

Acked-by: Alexander Duyck 


Re: [PATCH v7 2/3] PCI: Enable PCIe Relaxed Ordering if supported

2017-07-27 Thread Casey Leedom
| From: Ding Tianhong 
| Sent: Wednesday, July 26, 2017 6:01 PM
|
| On 2017/7/27 3:05, Casey Leedom wrote:
| >
| > Ding, send me a note if you'd like me to work that [cxgb4vf patch] up
| > for you.
|
| OK, you could send the change log and I could include it in the v8 version.
| Will you base it on patch 3/3 or make it an independent patch?

Whichever you'd prefer.  It would basically mirror the same exact code that
you've got for cxgb4.  I.e. testing the setting of the VF's PCIe Capability
Device Control[Relaxed Ordering Enable], setting a new flag in
adapter->flags, and testing that flag in cxgb4vf/sge.c:t4vf_sge_alloc_rxq().
But since the VF's PF will already have disabled the PF's Relaxed Ordering
Enable, the VF will also have its Relaxed Ordering Enable disabled and any
effort by the internal chip to send TLPs with the Relaxed Ordering Attribute
will be gated by the PCIe logic.  So it's not critical that this be in the
first patch.  Your call.  Let me know if you'd like me to send that to you.
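
The three steps above (test the Device Control bit, record a flag, consult the flag at Rx queue setup) could be sketched roughly as follows. This is a userspace model, not the cxgb4vf code: the flag name and the demo_* helpers are hypothetical, and only the bit value is taken from the kernel's PCI_EXP_DEVCTL_RELAX_EN definition:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same value as PCI_EXP_DEVCTL_RELAX_EN in the kernel's pci_regs.h. */
#define DEMO_DEVCTL_RELAX_EN 0x0010

/* Hypothetical adapter flag; the real flag name is not quoted in this mail. */
#define DEMO_FLAG_NO_RELAXED_ORDERING (1u << 0)

struct demo_adapter {
	unsigned int flags;
};

/* devctl is the 16-bit PCIe Device Control value, however it was read. */
static void demo_probe_relaxed_ordering(struct demo_adapter *adap,
					uint16_t devctl)
{
	if (!(devctl & DEMO_DEVCTL_RELAX_EN))
		adap->flags |= DEMO_FLAG_NO_RELAXED_ORDERING;
}

/* Rx queue setup would consult the flag before asking for RO on its TLPs. */
static bool demo_rxq_may_use_ro(const struct demo_adapter *adap)
{
	return !(adap->flags & DEMO_FLAG_NO_RELAXED_ORDERING);
}
```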


| From: Ding Tianhong 
| Sent: Wednesday, July 26, 2017 6:08 PM
|
| On 2017/7/27 2:26, Casey Leedom wrote:
| >
| >  1. Did we ever get any acknowledgement from either Intel or AMD
| > on this patch?  I know that we can't ensure that, but it sure would
| > be nice since the PCI Quirks that we're putting in affect their
| > products.
|
| Still no one from Intel or AMD has acked this, which is what I am worried
| about. Should I ping someone again?

By amusing coincidence, Patrik Cramer (now Cc'ed) from Intel sent me a note
yesterday with a link to the official Intel performance tuning documentation
which covers this issue:

https://software.intel.com/sites/default/files/managed/9e/bc/64-ia-32-architectures-optimization-manual.pdf

In section 3.9.1 we have:

3.9.1 Optimizing PCIe Performance for Accesses Toward Coherent Memory
  and Toward MMIO Regions (P2P)

In order to maximize performance for PCIe devices in the processors
listed in Table 3-6 below, the software should determine whether the
accesses are toward coherent memory (system memory) or toward MMIO
regions (P2P access to other devices). If the access is toward MMIO
region, then software can command HW to set the RO bit in the TLP
header, as this would allow hardware to achieve maximum throughput for
these types of accesses. For accesses toward coherent memory, software
can command HW to clear the RO bit in the TLP header (no RO), as this
would allow hardware to achieve maximum throughput for these types of
accesses.

Table 3-6. Intel Processor CPU RP Device IDs for Processors Optimizing
   PCIe Performance

Processor                        CPU RP Device IDs

Intel Xeon processors based on   6F01H-6F0EH
Broadwell microarchitecture

Intel Xeon processors based on   2F01H-2F0EH
Haswell microarchitecture

Unfortunately that's a pretty thin section.  But it does expand the set of
Intel Root Complexes which our Linux PCI Quirk will need to cover.  So
you should add those to the next (and hopefully final) spin of your patch.
And, it also verifies the need to handle the use of Relaxed Ordering more
subtly than simply turning it off since the NVMe peer-to-peer example I
keep bringing up would fall into the "need to use Relaxed Ordering" case ...
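
As a rough illustration of what covering those IDs means, here is a self-contained range check over the two Root Port device ID ranges from Table 3-6 as quoted above. It is only a sketch: the demo_* names are made up, the real quirk would be written against the kernel's PCI fixup machinery, and the Intel vendor ID 0x8086 is the only value not taken from the table:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_VENDOR_INTEL 0x8086

/* Root Port device ID ranges from Table 3-6 as quoted above. */
static const struct {
	uint16_t first;
	uint16_t last;
} demo_no_ro_ranges[] = {
	{ 0x6f01, 0x6f0e },	/* Xeon, Broadwell microarchitecture */
	{ 0x2f01, 0x2f0e },	/* Xeon, Haswell microarchitecture */
};

/* True if RO should be avoided for TLPs aimed at coherent memory behind
 * this Root Port. Peer-to-peer (MMIO) traffic is a separate decision.
 */
static bool demo_avoid_ro_to_memory(uint16_t vendor, uint16_t device)
{
	size_t i;

	if (vendor != DEMO_VENDOR_INTEL)
		return false;

	for (i = 0; i < sizeof(demo_no_ro_ranges) / sizeof(demo_no_ro_ranges[0]); i++) {
		if (device >= demo_no_ro_ranges[i].first &&
		    device <= demo_no_ro_ranges[i].last)
			return true;
	}
	return false;
}
```

Note that this only answers the coherent-memory half of the question; per the quoted section, accesses toward MMIO regions on the same parts are better off with RO set.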

It would have been nice to know why this is happening and if any future
processor would fix this.  After all, Relaxed Ordering is just supposed to
be a hint.  At worst, a receiving device could just ignore the attribute
entirely.  Obviously someone made an effort to implement it but ... it
didn't go the way they wanted.

And, it also would have been nice to know if there was any hidden register
in these Intel Root Complexes which can completely turn off the effort to
pay attention to the Relaxed Ordering Attribute.  We've spent an enormous
amount of effort on this issue here on the Linux PCI email list struggling
mightily to come up with a way to determine when it's
safe/recommended/not-recommended/unsafe to use Relaxed Ordering when
directing TLPs towards the Root Complex.  And some architectures require RO
for decent performance so we can't just "turn it off" unilaterally.

Casey


Re: [Intel-wired-lan] [PATCH] igb: support BCM54616 PHY

2017-07-27 Thread Jonathan Toppins
On 07/27/2017 11:37 AM, Andrew Lunn wrote:
> On Thu, Jul 27, 2017 at 12:40:01AM +, Brown, Aaron F wrote:
>>> From: Intel-wired-lan [mailto:intel-wired-lan-boun...@osuosl.org] On Behalf
>>> Of John W. Linville
>>> Sent: Friday, July 21, 2017 11:12 AM
>>> To: netdev@vger.kernel.org
>>> Cc: intel-wired-...@lists.osuosl.org; John W. Linville
>>> 
>>> Subject: [Intel-wired-lan] [PATCH] igb: support BCM54616 PHY
>>>
>>> The management port on an Edgecore AS7712-32 switch uses an igb MAC,
>>> but
>>> it uses a BCM54616 PHY. Without a patch like this, loading the igb
>>> module produces dmesg output like this:
>>>
>>> [3.439125] igb: Copyright (c) 2007-2014 Intel Corporation.
>>> [3.439866] igb: probe of :00:14.0 failed with error -2
>>>
>>> Signed-off-by: John W. Linville 
>>> Cc: Jeff Kirsher 
>>> ---
>>>  drivers/net/ethernet/intel/igb/e1000_82575.c   | 6 ++
>>>  drivers/net/ethernet/intel/igb/e1000_defines.h | 1 +
>>>  drivers/net/ethernet/intel/igb/e1000_hw.h  | 1 +
>>>  3 files changed, 8 insertions(+)
>>
>> I do not have the specific hardware (Edgecore switch) but as far as 
>> regression tests go this works fine.
>> Tested-by: Aaron Brown 
> 
> Sorry, missed the initial post, so replying to a reply.
> 
> Linux has supported the BCM54616 PHY since April 2015. If the Intel
> drivers used the Linux PHY drivers, you would not have had this problem.
> 
> It would be good if somebody spent the time to migrate these MAC
> drivers to use the common Linux PHY infrastructure.
> 
>   Andrew
> 

Thank you. This was a point I made in 2015 when I posted the original
patch for igb to support this phy.

http://marc.info/?l=linux-netdev&m=142870703206646&w=2
http://marc.info/?l=linux-netdev&m=142870703806647&w=2
http://marc.info/?l=linux-netdev&m=142870703806648&w=2

http://marc.info/?l=linux-netdev&m=142930225730399&w=2
http://marc.info/?l=linux-netdev&m=142930226230400&w=2

It would be good to accept this patch and then later work to port older
drivers to use phylib.

It might be worthwhile to start beating up on new driver submissions
that don't use phylib.

I recall someone mentioning that there were plans to have phylib support
[Q]SFP[+] modules as well. I am very interested in that work, if someone
has patches/plans I would like to take a look.

-Jon


Re: [PATCH v2 05/11] net: stmmac: dwmac-rk: Add internal phy support

2017-07-27 Thread Corentin Labbe
On Thu, Jul 27, 2017 at 09:54:01AM -0700, Florian Fainelli wrote:
> On 07/27/2017 06:48 AM, Andrew Lunn wrote:
> > On Thu, Jul 27, 2017 at 09:02:16PM +0800, David Wu wrote:
> >> To make internal phy work, need to configure the phy_clock,
> >> phy cru_reset and related registers.
> >>
> >> Signed-off-by: David Wu 
> >> ---
> >> changes in v2:
> >>  - Use the standard "phy-mode" property for internal phy. (Florian)
> > 
> > I think we need to discuss this. This PHY appears to be on an MDIO
> > bus, it uses a standard PHY driver, and it appears to be using an RMII
> > interface. So it is just an ordinary PHY.
> 
> First, the fact that the internal PHY also appears through MDIO is
> orthogonal to the fact that it is internal or external. Plenty of
> designs have internal PHYs exposed through MDIO because that is
> convenient. What matters though is how the data/clock lines are wired
> internally, which is what "phy-mode" describes.
> 
> > 
> > Internal is supposed to be something which is not ordinary, does not
> > use one of the standard phy modes, needs something special to make it
> > work.
> > 
> > Florian, it appears to be your suggestion to use internal. What do you
> > say?
> 
> phy-mode = "internal" really means that it is not a standard MII variant
> to connect the data/clock lines between the Ethernet MAC and the PHY,
> and this can happen in some designs (although quite unlikely). So from
> there we could do several things depending on the requirements:
> 
> - if you can have your Ethernet MAC driver perform the necessary
> configuration *after* you have been able to bind the PHY device with its
> PHY driver, then the PHY driver should have PHY_IS_INTERNAL in its
> flags, and you can use phy_is_internal() from PHYLIB to tell you that
> and we could imagine using: phy-mode = "rmii" because that would not be too
> much of a stretch
> 
> - if you need knowledge about this PHY connection type prior to binding
> the PHY device and its driver (that is, before of_phy_connect()) we
> could add a boolean property e.g: "phy-is-internal" that allows us to
> know that, or we can have a new phy-mode value, e.g: "internal-rmii"
> which describes that, either way would probably be fine, but the former
> scales better
> 

Hello

We have the same problem on Allwinner SoCs for dwmac-sun8i: we need to set a
syscon to choose between the internal and the external PHY.
Having this phy-is-internal property would be very helpful (adding
internal-xmii modes would add too many flags in our case).

Thanks
Regards
Corentin Labbe


Performance regression with virtio_net

2017-07-27 Thread Seth Forshee
I'm seeing a performance regression with virtio_net that looks to have
started in 4.12-rc1. I only see it in one context though, downloading
snap packages from the Ubuntu snap store. For example:

 
https://api.snapcraft.io/api/v1/snaps/download/b8X2psL1ryVrPt5WEmpYiqfr5emixTd7_1797.snap

which redirects to Internap's CDN. Normally this downloads in a few
seconds at ~10 MB/s, but with 4.12 and 4.13 it takes minutes with a rate
of ~150 KB/s. Everything else I've tried downloads at normal speeds.

I bisected this to 680557cf79f8 "virtio_net: rework mergeable buffer
handling". If I revert this on top of 4.13-rc2 (along with other changes
needed to successfully revert it) speeds return to normal.

Thanks,
Seth


Re: [PATCH v2 05/11] net: stmmac: dwmac-rk: Add internal phy support

2017-07-27 Thread Florian Fainelli
On 07/27/2017 06:48 AM, Andrew Lunn wrote:
> On Thu, Jul 27, 2017 at 09:02:16PM +0800, David Wu wrote:
>> To make internal phy work, need to configure the phy_clock,
>> phy cru_reset and related registers.
>>
>> Signed-off-by: David Wu 
>> ---
>> changes in v2:
>>  - Use the standard "phy-mode" property for internal phy. (Florian)
> 
> I think we need to discuss this. This PHY appears to be on an MDIO
> bus, it uses a standard PHY driver, and it appears to be using an RMII
> interface. So it is just an ordinary PHY.

First, the fact that the internal PHY also appears through MDIO is
orthogonal to the fact that it is internal or external. Plenty of
designs have internal PHYs exposed through MDIO because that is
convenient. What matters though is how the data/clock lines are wired
internally, which is what "phy-mode" describes.

> 
> Internal is supposed to be something which is not ordinary, does not
> use one of the standard phy modes, needs something special to make it
> work.
> 
> Florian, it appears to be your suggestion to use internal. What do you
> say?

phy-mode = "internal" really means that it is not a standard MII variant
to connect the data/clock lines between the Ethernet MAC and the PHY,
and this can happen in some designs (although quite unlikely). So from
there we could do several things depending on the requirements:

- if you can have your Ethernet MAC driver perform the necessary
configuration *after* you have been able to bind the PHY device with its
PHY driver, then the PHY driver should have PHY_IS_INTERNAL in its
flags, and you can use phy_is_internal() from PHYLIB to tell you that
and we could imagine using: phy-mode = "rmii" because that would not be too
much of a stretch

- if you need knowledge about this PHY connection type prior to binding
the PHY device and its driver (that is, before of_phy_connect()) we
could add a boolean property e.g: "phy-is-internal" that allows us to
know that, or we can have a new phy-mode value, e.g: "internal-rmii"
which describes that, either way would probably be fine, but the former
scales better

Then again, using phy-mode = "internal" even though this is Reduced MII
is not a big deal IMHO as long as there is no loss of information and
internal de facto means internal reduced MII, for instance.
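
To make the two cases above concrete, here is a small userspace model of the decision. Everything in it is a mock: the demo_* types do not correspond to real PHYLIB or OF structures, and the phy_is_internal field merely models the boolean property proposed above, which does not exist as a binding at this point:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mock types only; none of these are the real PHYLIB or OF structures. */
#define DEMO_PHY_IS_INTERNAL 0x1u

struct demo_phy_driver {
	unsigned int flags;
};

struct demo_phy_device {
	const struct demo_phy_driver *drv;	/* NULL until a driver is bound */
};

struct demo_dt_node {
	bool phy_is_internal;	/* models the proposed boolean property */
};

/* Case 1: the PHY is bound, so the driver flag is authoritative. */
static bool demo_phy_is_internal(const struct demo_phy_device *phydev)
{
	return phydev->drv && (phydev->drv->flags & DEMO_PHY_IS_INTERNAL);
}

/* Case 2: no driver bound yet, so fall back to the device tree hint. */
static bool demo_mac_wants_internal_phy(const struct demo_dt_node *np,
					const struct demo_phy_device *phydev)
{
	if (phydev && phydev->drv)
		return demo_phy_is_internal(phydev);
	return np->phy_is_internal;
}
```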
-- 
Florian


Re: [PATCH v2 01/11] net: phy: Add rockchip phy driver support

2017-07-27 Thread Florian Fainelli
On 07/27/2017 05:55 AM, David Wu wrote:
> Support internal ephy currently.
> 
> Signed-off-by: David Wu 
> ---
> changes in v2:
>  - Alphabetic order for Kconfig and Makefile.
>  - Add analog register init.
>  - Disable auto-mdix for workround.
>  - Rename config
> 
>  drivers/net/phy/Kconfig|   5 ++
>  drivers/net/phy/Makefile   |   1 +
>  drivers/net/phy/rockchip.c | 128 
> +
>  3 files changed, 134 insertions(+)
>  create mode 100644 drivers/net/phy/rockchip.c
> 
> diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig
> index 2dda720..8dc6cd7 100644
> --- a/drivers/net/phy/Kconfig
> +++ b/drivers/net/phy/Kconfig
> @@ -334,6 +334,11 @@ config REALTEK_PHY
>   ---help---
> Supports the Realtek 821x PHY.
>  
> +config ROCKCHIP_PHY
> +tristate "Drivers for ROCKCHIP PHYs"

"Driver for Rockchip Ethernet PHYs" would seem more appropriate, this is
just one driver for now.

> +---help---
> +  Currently supports the internal ephy.

EPHY is usually how an Ethernet PHY is abbreviated.

> +
>  config SMSC_PHY
>   tristate "SMSC PHYs"
>   ---help---
> diff --git a/drivers/net/phy/Makefile b/drivers/net/phy/Makefile
> index 8e9b9f3..350520e 100644
> --- a/drivers/net/phy/Makefile
> +++ b/drivers/net/phy/Makefile
> @@ -66,6 +66,7 @@ obj-$(CONFIG_MICROSEMI_PHY) += mscc.o
>  obj-$(CONFIG_NATIONAL_PHY)   += national.o
>  obj-$(CONFIG_QSEMI_PHY)  += qsemi.o
>  obj-$(CONFIG_REALTEK_PHY)+= realtek.o
> +obj-$(CONFIG_ROCKCHIP_PHY)   += rockchip.o
>  obj-$(CONFIG_SMSC_PHY)   += smsc.o
>  obj-$(CONFIG_STE10XP)+= ste10Xp.o
>  obj-$(CONFIG_TERANETICS_PHY) += teranetics.o
> diff --git a/drivers/net/phy/rockchip.c b/drivers/net/phy/rockchip.c
> new file mode 100644
> index 000..3f74658
> --- /dev/null
> +++ b/drivers/net/phy/rockchip.c
> @@ -0,0 +1,128 @@
> +/**
> + * drivers/net/phy/rockchip.c
> + *
> + * Driver for ROCKCHIP PHY
> + *
> + * Copyright (c) 2017, Fuzhou Rockchip Electronics Co., Ltd
> + *
> + * David Wu

Missing space between your last name and your email address, there is
another typo like this in the MODULE_AUTHOR() macro.

> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#define MII_INTERNAL_CTRL_STATUS 17
> +#define SMI_ADDR_TSTCNTL 20
> +#define SMI_ADDR_TSTREAD1        21
> +#define SMI_ADDR_TSTREAD2        22
> +#define SMI_ADDR_TSTWRITE        23
> +
> +#define AUTOMDIX_EN  BIT(7)
> +#define TSTCNTL_RD   (BIT(15) | BIT(10))
> +#define TSTCNTL_WR   (BIT(14) | BIT(10))
> +
> +#define WR_ADDR_A7CFG            0x18
> +
> +static void rockchip_init_tstmode(struct phy_device *phydev)
> +{
> + /* Enable access to Analog and DSP register banks */
> + phy_write(phydev, SMI_ADDR_TSTCNTL, 0x0400);
> + phy_write(phydev, SMI_ADDR_TSTCNTL, 0x);
> + phy_write(phydev, SMI_ADDR_TSTCNTL, 0x0400);
> +}
> +
> +static void  rockchip_close_tstmode(struct phy_device *phydev)
> +{
> + /* Back to basic register bank */
> + phy_write(phydev, SMI_ADDR_TSTCNTL, 0x);
> +}
> +
> +static void rockchip_internal_phy_analog_init(struct phy_device *phydev)
> +{
> + rockchip_init_tstmode(phydev);

Technically MDIO writes can fail, but you are not propagating the return
value, so you could be stuck on a bad page/bank.
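
Something along these lines, as a sketch only. It uses a mock write helper that can be told to fail, and the register numbers and bank values are hypothetical placeholders rather than the ones from the patch; the point is just the shape of the error handling:

```c
#include <assert.h>
#include <errno.h>

/* Mock PHY: the fail_at-th write (1-based) fails; 0 means no failure. */
struct demo_phy {
	int writes;
	int fail_at;
};

/* Hypothetical register numbers and values, for illustration only. */
#define DEMO_REG_BANK		20
#define DEMO_REG_DATA		23
#define DEMO_BANK_ANALOG	0x0400
#define DEMO_BANK_BASIC		0x0001

static int demo_phy_write(struct demo_phy *phy, int reg, int val)
{
	(void)reg;
	(void)val;
	if (++phy->writes == phy->fail_at)
		return -EIO;
	return 0;
}

static int demo_analog_init(struct demo_phy *phy)
{
	int ret, ret2;

	ret = demo_phy_write(phy, DEMO_REG_BANK, DEMO_BANK_ANALOG);
	if (ret)
		return ret;	/* could not enter the analog bank */

	ret = demo_phy_write(phy, DEMO_REG_DATA, 0xB);

	/* Try to leave the analog bank even if the tuning write failed. */
	ret2 = demo_phy_write(phy, DEMO_REG_BANK, DEMO_BANK_BASIC);

	return ret ? ret : ret2;
}
```

The caller (config_init in this case) would then return the error instead of unconditionally returning 0.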

> +
> + /*
> +  * Adjust tx amplitude to make sginal better,
> +  * the default value is 0x8.
> +  */
> + phy_write(phydev, SMI_ADDR_TSTWRITE, 0xB);
> + phy_write(phydev, SMI_ADDR_TSTCNTL, TSTCNTL_WR | WR_ADDR_A7CFG);

Likewise.

> +
> + rockchip_close_tstmode(phydev);

Same here.

> +}
> +
> +static int rockchip_internal_phy_config_init(struct phy_device *phydev)
> +{
> + int val;
> +
> + /*
> +  * The auto MIDX has linked problem on some board,
> +  * workround to disable auto MDIX.
> +  */

If this is a board-specific problem, you may consider registering a PHY
fixup for the affected boards, or introducing a specific property to
indicate that MDI-X is broken.

> + val = phy_read(phydev, MII_INTERNAL_CTRL_STATUS);
> + val &= ~AUTOMDIX_EN;
> + phy_write(phydev, MII_INTERNAL_CTRL_STATUS, val);

You also need to reject MDI configuration requests coming via
phy_ethtool_ksettings_set(), which you should see in the
phydrv::config_aneg callback.
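
Roughly like the sketch below, assuming the workaround leaves the PHY fixed in MDI mode (that is an assumption; which fixed mode remains is not stated in the patch). The constants mirror the ETH_TP_MDI* values from the ethtool UAPI header, and struct demo_phydev is a mock with only the one field that matters here:

```c
#include <assert.h>
#include <errno.h>

/* Values mirror ETH_TP_MDI* in the ethtool UAPI header. */
enum {
	DEMO_TP_MDI_INVALID = 0,	/* no request from user space */
	DEMO_TP_MDI = 1,
	DEMO_TP_MDI_X = 2,
	DEMO_TP_MDI_AUTO = 3,
};

/* Mock PHY device carrying only the requested MDI-X control setting. */
struct demo_phydev {
	int mdix_ctrl;
};

static int demo_config_aneg(struct demo_phydev *phydev)
{
	switch (phydev->mdix_ctrl) {
	case DEMO_TP_MDI_INVALID:
	case DEMO_TP_MDI:
		/* A real driver would continue with generic aneg setup here. */
		return 0;
	default:
		/* Auto-MDIX is disabled, so refuse MDI-X and auto requests. */
		return -EOPNOTSUPP;
	}
}
```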

> +
> + rockchip_internal_phy_analog_init(phydev);
> +
> + return 0;
> +}
> +
> +static int rockchip_internal_phy_read_status(struct phy_device *phydev)
> +{
> + int ret, old_speed;
> +
> + old_speed = phydev->speed;
> + ret = genphy_read_s

Re: [PATCH net] xen-netback: correctly schedule rate-limited queues

2017-07-27 Thread David Miller
From: Jean-Louis Dupond 
Date: Thu, 27 Jul 2017 10:21:56 +0200

> On 2017-06-22 17:16, David Miller wrote:
>> From: Wei Liu 
>> Date: Wed, 21 Jun 2017 10:21:22 +0100
>> 
>>> Add a flag to indicate if a queue is rate-limited. Test the flag in
>>> NAPI poll handler and avoid rescheduling the queue if true, otherwise
>>> we risk locking up the host. The rescheduling will be done in the
>>> timer callback function.
>>> Reported-by: Jean-Louis Dupond 
>>> Signed-off-by: Wei Liu 
>>> Tested-by: Jean-Louis Dupond 
>> Applied.
> 
> Could this get applied to stable & LTS kernels also?
> Seems important enough in my opinion.

Sure, queued up.


[PATCHv4 net] ipv6: no need to check rt->dst.error when get route info

2017-07-27 Thread Hangbin Liu
After commit 18c3a61c4264 ("net: ipv6: RTM_GETROUTE: return matched fib
result when requested"), when we get a prohibit entry, we return
-EACCES directly instead of dumping the route info.

Fix it by removing the rt->dst.error check.

Before fix:
# ip -6 route add prohibit 2003::/64 dev eth1
# ip -6 route get fibmatch 2003::1
RTNETLINK answers: Permission denied
# ip -6 route add unreachable 2004::/64 dev eth1
# ip -6 route get fibmatch 2004::1
RTNETLINK answers: No route to host

After fix:
# ip -6 route add prohibit 2003::/64 dev eth1
# ip -6 route get fibmatch 2003::1
prohibit 2003::/64 dev lo metric 1024 error -13 pref medium
# ip -6 route add unreachable 2004::/64 dev eth1
# ip -6 route get fibmatch 2004::1
unreachable 2004::/64 dev lo metric 1024 error -113 pref medium

Fixes: 18c3a61c4264 ("net: ipv6: RTM_GETROUTE: return matched fib...")
Signed-off-by: Hangbin Liu 
---
 net/ipv6/route.c | 6 --
 1 file changed, 6 deletions(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 4d30c96..8fc52de 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -3637,12 +3637,6 @@ static int inet6_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh,
dst = ip6_route_lookup(net, &fl6, 0);
 
rt = container_of(dst, struct rt6_info, dst);
-   if (rt->dst.error) {
-   err = rt->dst.error;
-   ip6_rt_put(rt);
-   goto errout;
-   }
-
if (rt == net->ipv6.ip6_null_entry) {
err = rt->dst.error;
ip6_rt_put(rt);
-- 
2.5.5



Possible race in hysdn.ko

2017-07-27 Thread Anton Volkov

Hello.

While searching for races in the Linux kernel I came across the
"drivers/isdn/hysdn/hysdn.ko" module. Here is a question that came up
while analysing the results. Line numbers refer to Linux v4.12.

In the put_log_buffer function in hysdn_proclog.c a non-standard type
of synchronization is employed. It uses pd->del_lock as a kind of
semaphore (hysdn_proclog.c: lines 129 and 143). Consider the following
case:

Thread 1:                    Thread 2:
hysdn_log_write
-> hysdn_add_log
-> put_log_buffer
  spin_lock()                hysdn_conf_open
  i = pd->del_lock++         -> hysdn_add_log
  spin_unlock()              -> put_log_buffer
  if (!i) [delete loop]        spin_lock()
  pd->del_lock--               i = pd->del_lock++
                               spin_unlock()
                               if (!i) [delete loop]
                               pd->del_lock--

[delete loop] - the loop that deletes unused buffer entries
(hysdn_proclog.c: lines 134-142).
pd->del_lock-- is not an atomic operation and is executed without any
locks, so it may interfere with the increment of pd->del_lock in
another thread. Some interleavings can leave every thread unable to
enter the [delete loop].

I see several possible solutions to this problem:
1) move the loop that deletes unused buffer entries under the spin_lock
and drop the pd->del_lock synchronization;
2) wrap pd->del_lock-- with spin_lock protection.
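
To make the lost update concrete, here is a deterministic userspace C model of one bad interleaving. It is a sketch only: log_model and the function names are hypothetical, no real threads or spinlocks are involved, and the unlocked decrement is written out as a separate load and store so that the second caller's increment can be placed between them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: only the del_lock counter of the log state. */
struct log_model {
	int del_lock;
};

/*
 * "i = pd->del_lock++" as done under the spinlock. Returns true when
 * the caller saw 0 and would therefore run the deletion loop.
 */
bool enter(struct log_model *pd)
{
	int i;

	assert(pd != NULL);
	i = pd->del_lock++;
	return i == 0;
}

/* One lost-update interleaving of the unlocked "pd->del_lock--". */
int racy_interleaving(void)
{
	struct log_model pd = { .del_lock = 0 };
	int t1_tmp;

	enter(&pd);			/* T1: 0 -> 1, runs the loop */
	t1_tmp = pd.del_lock;		/* T1: decrement begins, loads 1 */
	enter(&pd);			/* T2: 1 -> 2 under the lock, skips the loop */
	pd.del_lock = t1_tmp - 1;	/* T1: stores 0, T2's increment is lost */
	pd.del_lock--;			/* T2: its own decrement, 0 -> -1 */
	return pd.del_lock;
}

/* Same two callers, but every decrement is one indivisible step. */
int serialized_interleaving(void)
{
	struct log_model pd = { .del_lock = 0 };

	enter(&pd);		/* T1: 0 -> 1 */
	enter(&pd);		/* T2: 1 -> 2 */
	pd.del_lock--;		/* T1: 2 -> 1 */
	pd.del_lock--;		/* T2: 1 -> 0 */
	return pd.del_lock;
}

/* Would a later caller be allowed into the deletion loop? */
bool later_caller_can_delete(int del_lock)
{
	struct log_model pd = { .del_lock = del_lock };

	return enter(&pd);
}
```

In the racy case the counter ends at -1. Every later caller then reads a non-zero value, skips the loop, and leaves the counter at -1 again, so the loop is never entered. With the decrement serialized (as in option 2) the counter returns to 0.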

What do you think should be done about it?

Thank you for your time.


Re: [PATCH net] ipv6: no need to return rt->dst.error if it is not null entry.

2017-07-27 Thread Hangbin Liu
Hi David,

On Wed, Jul 26, 2017 at 01:00:26PM -0600, David Ahern wrote:
> >> I don't think so. If I add a prohibit route and use the fibmatch
> >> attribute, I want to see the route from the FIB that was matched.
> > 
> > 
> > yes, exactly. wouldn't  'rt != net->ipv6.ip6_prohibit_entry' above let
> > it fall through to the route fill code ?
> > 
> > ah...but i guess you are saying that they will have rt6_info's of
> > their own and will not match. got it. ack.
> > 
> 
> This:
> 
> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> index 4d30c96a819d..24de81c804c2 100644
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -3637,11 +3637,6 @@ static int inet6_rtm_getroute(struct sk_buff
> *in_skb, struct nlmsghdr *nlh,
> dst = ip6_route_lookup(net, &fl6, 0);
> 
> rt = container_of(dst, struct rt6_info, dst);
> -   if (rt->dst.error) {
> -   err = rt->dst.error;
> -   ip6_rt_put(rt);
> -   goto errout;
> -   }
> 
> if (rt == net->ipv6.ip6_null_entry) {
> err = rt->dst.error;
> 
> Puts back the original behavior. In that case, only rt == null_entry
> drops to the error path which is correct. All other rt values will drop
> to rt6_fill_node and return rt data.

Thanks for your explanation. Now I know where I made the mistake: I confused
FR_ACT_UNREACHABLE with RTN_UNREACHABLE and thought we returned rt =
net->ipv6.ip6_null_entry in fib6_rule_action().

With your help I now understand that we just set rt->dst.error to -EACCES
for a prohibit entry in ip6_route_info_create(). So removing the
rt->dst.error check is enough.

Thanks
Hangbin


RE: [PATCH V4 net-next 8/8] net: hns3: Add HNS3 driver to kernel build framework & MAINTAINERS

2017-07-27 Thread Salil Mehta
Hi Leon,

> -Original Message-
> From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-
> ow...@vger.kernel.org] On Behalf Of Leon Romanovsky
> Sent: Sunday, July 23, 2017 2:12 PM
> To: Salil Mehta
> Cc: da...@davemloft.net; Zhuangyuzeng (Yisen); huangdaode; lipeng (Y);
> mehta.salil@gmail.com; netdev@vger.kernel.org; linux-
> ker...@vger.kernel.org; linux-r...@vger.kernel.org; Linuxarm
> Subject: Re: [PATCH V4 net-next 8/8] net: hns3: Add HNS3 driver to
> kernel build framework & MAINTAINERS
> 
> On Sat, Jul 22, 2017 at 11:09:42PM +0100, Salil Mehta wrote:
> > This patch updates the MAINTAINERS file with HNS3 Ethernet driver
> > maintainers names and other details. This also introduces the new
> > Makefiles required to build the HNS3 Ethernet driver and updates
> > the existing Kconfig file in the hisilicon folder.
> >
> > Signed-off-by: Salil Mehta 
> > ---
> > Patch V3: Addressed below errors:
> >  1. Intel kbuild: https://lkml.org/lkml/2017/6/14/313
> >  2. Intel Kbuild: https://lkml.org/lkml/2017/6/14/636
> > Patch V2: No change
> > Patch V1: Initial Submit
> > ---
> >  MAINTAINERS|  8 +++
> >  drivers/net/ethernet/hisilicon/Kconfig | 27
> ++
> >  drivers/net/ethernet/hisilicon/Makefile|  1 +
> >  drivers/net/ethernet/hisilicon/hns3/Makefile   |  7 ++
> >  .../net/ethernet/hisilicon/hns3/hns3pf/Makefile| 11 +
> >  5 files changed, 54 insertions(+)
> >  create mode 100644 drivers/net/ethernet/hisilicon/hns3/Makefile
> >  create mode 100644
> drivers/net/ethernet/hisilicon/hns3/hns3pf/Makefile
> >
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index 297e610c9163..a22d5b86c2b7 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -6197,6 +6197,14 @@ S:   Maintained
> >  F: drivers/net/ethernet/hisilicon/
> >  F: Documentation/devicetree/bindings/net/hisilicon*.txt
> >
> > +HISILICON NETWORK SUBSYSTEM 3 DRIVER (HNS3)
> > +M: Yisen Zhuang 
> > +M: Salil Mehta 
> > +L: netdev@vger.kernel.org
> > +W: http://www.hisilicon.com
> > +S: Maintained
> > +F: drivers/net/ethernet/hisilicon/hns3/
> > +
> >  HISILICON ROCE DRIVER
> >  M: Lijun Ou 
> >  M: Wei Hu(Xavier) 
> > diff --git a/drivers/net/ethernet/hisilicon/Kconfig
> b/drivers/net/ethernet/hisilicon/Kconfig
> > index d11287e11371..9f8ea283c531 100644
> > --- a/drivers/net/ethernet/hisilicon/Kconfig
> > +++ b/drivers/net/ethernet/hisilicon/Kconfig
> > @@ -76,4 +76,31 @@ config HNS_ENET
> >   This selects the general ethernet driver for HNS.  This module
> make
> >   use of any HNS AE driver, such as HNS_DSAF
> >
> > +config HNS3
> > +   tristate "Hisilicon Network Subsystem Support HNS3 (Framework)"
> > +depends on PCI
> > +   ---help---
> > + This selects the framework support for Hisilicon Network
> Subsystem 3.
> > + This layer facilitates clients like ENET, RoCE and user-space
> ethernet
> > + drivers(like ODP)to register with HNAE devices and their
> associated
> > + operations.
> > +
> > +config HNS3_HCLGE
> > +   tristate "Hisilicon HNS3 HCLGE Acceleration Engine &
> Compatibility Layer Support"
> > +depends on PCI_MSI
> > +   select HNS3
> 
> IMHO it should be "depends" and not "select".
Agreed, will fix in next patch.

Thanks
> 
> > +   ---help---
> > + This selects the HNS3_HCLGE network acceleration engine & its
> hardware
> > + compatibility layer. The engine would be used in Hisilicon
> hip08 family of
> > + SoCs and further upcoming SoCs.
> > +
> > +config HNS3_ENET
> > +   tristate "Hisilicon HNS3 Ethernet Device Support"
> > +depends on 64BIT && PCI
> > +   select HNS3
> 
> Ditto
Agreed, will fix in next patch.

Thanks 
> > +   ---help---
> > + This selects the Ethernet Driver for Hisilicon Network
> Subsystem 3 for hip08
> > + family of SoCs. This module depends upon HNAE3 driver to access
> the HNAE3
> > + devices and their associated operations.
> > +
> >  endif # NET_VENDOR_HISILICON
> > diff --git a/drivers/net/ethernet/hisilicon/Makefile
> b/drivers/net/ethernet/hisilicon/Makefile
> > index 8661695024dc..3828c435c18f 100644
> > --- a/drivers/net/ethernet/hisilicon/Makefile
> > +++ b/drivers/net/ethernet/hisilicon/Makefile
> > @@ -6,4 +6,5 @@ obj-$(CONFIG_HIX5HD2_GMAC) += hix5hd2_gmac.o
> >  obj-$(CONFIG_HIP04_ETH) += hip04_eth.o
> >  obj-$(CONFIG_HNS_MDIO) += hns_mdio.o
> >  obj-$(CONFIG_HNS) += hns/
> > +obj-$(CONFIG_HNS3) += hns3/
> >  obj-$(CONFIG_HISI_FEMAC) += hisi_femac.o
> > diff --git a/drivers/net/ethernet/hisilicon/hns3/Makefile
> b/drivers/net/ethernet/hisilicon/hns3/Makefile
> > new file mode 100644
> > index ..5e53735b2d4e
> > --- /dev/null
> > +++ b/drivers/net/ethernet/hisilicon/hns3/Makefile
> > @@ -0,0 +1,7 @@
> > +#
> > +# Makefile for the HISILICON network device drivers.
> > +#
> > +
> > +obj-$(CONFIG_HNS3) += hns3pf/
> > +
> > +obj-$(CONFIG_HNS3) +=hnae3.o
> 
> There is a missing space after "+="
Will fix. thanks.

> 
>

RE: [PATCH V4 net-next 7/8] net: hns3: Add Ethtool support to HNS3 driver

2017-07-27 Thread Salil Mehta
Hi Stephen,

> -Original Message-
> From: Stephen Hemminger [mailto:step...@networkplumber.org]
> Sent: Sunday, July 23, 2017 6:26 PM
> To: Salil Mehta
> Cc: da...@davemloft.net; Zhuangyuzeng (Yisen); huangdaode; lipeng (Y);
> mehta.salil@gmail.com; netdev@vger.kernel.org; linux-
> ker...@vger.kernel.org; linux-r...@vger.kernel.org; Linuxarm
> Subject: Re: [PATCH V4 net-next 7/8] net: hns3: Add Ethtool support to
> HNS3 driver
> 
> On Sat, 22 Jul 2017 23:09:41 +0100
> Salil Mehta  wrote:
> 
> > +   HNS3_NETDEV_STAT("rx_packets", rx_packets),
> > +   HNS3_NETDEV_STAT("tx_packets", tx_packets),
> > +   HNS3_NETDEV_STAT("rx_bytes", rx_bytes),
> > +   HNS3_NETDEV_STAT("tx_bytes", tx_bytes),
> > +   HNS3_NETDEV_STAT("rx_errors", rx_errors),
> > +   HNS3_NETDEV_STAT("tx_errors", tx_errors),
> > +   HNS3_NETDEV_STAT("rx_dropped", rx_dropped),
> > +   HNS3_NETDEV_STAT("tx_dropped", tx_dropped),
> > +   HNS3_NETDEV_STAT("multicast", multicast),
> > +   HNS3_NETDEV_STAT("collisions", collisions),
> > +
> > +   /* detailed Rx errors */
> 
> Do not put network statistics in ethtool statistics.
> This is redundant and unnecessary. Yes some other drivers may do it
> but it is not best practice.

Ok sure, will remove netdev stats from the ethtool.

Thanks
Salil


Re: qmi_wwan: Null pointer dereference when removing driver

2017-07-27 Thread Nathaniel Roach
At some point in the suspend procedure the error occurs, so the first 
suspend works but subsequent ones fail with something like "timeout 
waiting for processes to suspend". I just assumed it happened before the 
suspend happens but was too late to be a hindrance.


Presumably the driver dies during the re-probe stage you mentioned, but 
a rmmod was how I found the issue (I was trying to pull out the driver 
to see if it was causing the suspend issues).



On 27/07/17 23:39, Dan Williams wrote:
> On Thu, 2017-07-27 at 13:31 +0800, Nathaniel Roach wrote:
>> Unsure at which point was added, but issue not present in stock
>> debian 4.11 kernel.
>>
>> Running on a Thinkpad X220 with coreboot.
>>
>> I'm building from upstream. When I attempt to remove the qmi_wwan
>> module (which also happens pre-suspend) the rmmod process gets
>> killed, and the following shows in dmesg:
>
> Unrelated to the crash (which should be fixed), why do you need to
> remove the module pre-suspend?  Typically on a laptop the device will
> either have all power cut to it over suspend and thus it'll get
> reprobed on resume, or else suspend gets handled OK by the driver.  I'm
> curious what the problem was that required an rmmod over suspend.
>
> Dan


[   59.979791] usb 2-1.4: USB disconnect, device number 4
[   59.980102] qmi_wwan 2-1.4:1.6 wwp0s29u1u4i6: unregister
'qmi_wwan' usb-:00:1d.0-1.4, WWAN/QMI device
[   60.006821] BUG: unable to handle kernel NULL pointer dereference
at 00e0
[   60.006879] IP: qmi_wwan_disconnect+0x25/0xc0 [qmi_wwan]
[   60.006911] PGD 0
[   60.006911] P4D 0
[   60.006957] Oops:  [#1] SMP
[   60.006978] Modules linked in: fuse(E) ccm(E) rfcomm(E) cmac(E)
bnep(E) qmi_wwan(E) cdc_wdm(E) cdc_ether(E) usbnet(E) mii(E) btusb(E)
btrtl(E) btbcm(E) btintel(E) bluetooth(E) joydev(E) xpad(E)
ecdh_generic(E) ff_memless(E) binfmt_misc(E) snd_hda_codec_hdmi(E)
snd_hda_codec_conexant(E) snd_hda_codec_generic(E) arc4(E)
iTCO_wdt(E) iTCO_vendor_support(E) intel_rapl(E)
x86_pkg_temp_thermal(E) kvm_intel(E) kvm(E) irqbypass(E)
crct10dif_pclmul(E) crc32_pclmul(E) crc32c_intel(E)
ghash_clmulni_intel(E) aesni_intel(E) iwlmvm(E) aes_x86_64(E)
crypto_simd(E) mac80211(E) cryptd(E) glue_helper(E) snd_hda_intel(E)
snd_hda_codec(E) iwlwifi(E) snd_hwdep(E) psmouse(E) snd_hda_core(E)
snd_pcm(E) serio_raw(E) sdhci_pci(E) pcspkr(E) snd_timer(E)
ehci_pci(E) e1000e(E) i2c_i801(E) ehci_hcd(E) snd(E) sg(E) i915(E)
lpc_ich(E)
[   60.007366]  ptp(E) usbcore(E) cfg80211(E) mfd_core(E) pps_core(E)
shpchp(E) ac(E) battery(E) tpm_tis(E) tpm_tis_core(E) evdev(E) tpm(E)
parport_pc(E) ppdev(E) lp(E) parport(E) ip_tables(E) x_tables(E)
autofs4(E)
[   60.007474] CPU: 2 PID: 33 Comm: kworker/2:1 Tainted:
GE   4.12.3-nr44-normandy-r1500619820+ #1
[   60.007524] Hardware name: LENOVO 4291LR7/4291LR7, BIOS CBET4000
4.6-810-g50522254fb 07/21/2017
[   60.007580] Workqueue: usb_hub_wq hub_event [usbcore]
[   60.007609] task: 8c882b716040 task.stack: b8e800d84000
[   60.007644] RIP: 0010:qmi_wwan_disconnect+0x25/0xc0 [qmi_wwan]
[   60.007678] RSP: 0018:b8e800d87b38 EFLAGS: 00010246
[   60.007711] RAX:  RBX:  RCX:

[   60.007752] RDX: 0001 RSI: 8c8824f3f1d0 RDI:
8c8824ef6400
[   60.007792] RBP: 8c8824ef6400 R08:  R09:

[   60.007833] R10: b8e800d87780 R11: 0011 R12:
c07ea0e8
[   60.007874] R13: 8c8824e2e000 R14: 8c8824e2e098 R15:

[   60.007915] FS:  () GS:8c883530()
knlGS:
[   60.007960] CS:  0010 DS:  ES:  CR0: 80050033
[   60.007994] CR2: 00e0 CR3: 000229ca5000 CR4:
000406e0
[   60.008035] Call Trace:
[   60.008065]  ? usb_unbind_interface+0x71/0x270 [usbcore]
[   60.008101]  ? device_release_driver_internal+0x154/0x210
[   60.008135]  ? qmi_wwan_unbind+0x6d/0xc0 [qmi_wwan]
[   60.008168]  ? usbnet_disconnect+0x6c/0xf0 [usbnet]
[   60.008194]  ? qmi_wwan_disconnect+0x87/0xc0 [qmi_wwan]
[   60.008232]  ? usb_unbind_interface+0x71/0x270 [usbcore]
[   60.008264]  ? device_release_driver_internal+0x154/0x210
[   60.008296]  ? bus_remove_device+0xf5/0x160
[   60.008324]  ? device_del+0x1dc/0x310
[   60.008355]  ? usb_remove_ep_devs+0x1b/0x30 [usbcore]
[   60.008393]  ? usb_disable_device+0x93/0x250 [usbcore]
[   60.008430]  ? usb_disconnect+0x90/0x260 [usbcore]
[   60.008468]  ? hub_event+0x1d9/0x14a0 [usbcore]
[   60.008500]  ? process_one_work+0x175/0x370
[   60.008528]  ? worker_thread+0x4a/0x380
[   60.008555]  ? kthread+0xfc/0x130
[   60.008579]  ? process_one_work+0x370/0x370
[   60.008606]  ? kthread_park+0x60/0x60
[   60.008631]  ? ret_from_fork+0x22/0x30
[   60.008656] Code: 66 0f 1f 44 00 00 66 66 66 66 90 55 48 89 fd 53
48 83 ec 10 48 8b 9f c8 00 00 00 65 48 8b 04 25 28 00 00 00 48 89 44
24 08 31 c0  83 e0 00 00 00 02 74 51 e8 0d b3 2b cd 85 c0 74 67
48 8b bb
[   60.011925] RIP: qmi_wwan_disconnect+0x25/0xc0

Re: qmi_wwan: Null pointer dereference when removing driver

2017-07-27 Thread Dan Williams
On Thu, 2017-07-27 at 13:31 +0800, Nathaniel Roach wrote:
> Unsure at which point was added, but issue not present in stock
> debian 4.11 kernel.
> 
> Running on a Thinkpad X220 with coreboot.
> 
> I'm building from upstream. When I attempt to remove the qmi_wwan
> module (which also happens pre-suspend) the rmmod process gets
> killed, and the following shows in dmesg:

Unrelated to the crash (which should be fixed), why do you need to
remove the module pre-suspend?  Typically on a laptop the device will
either have all power cut to it over suspend and thus it'll get
reprobed on resume, or else suspend gets handled OK by the driver.  I'm
curious what the problem was that required an rmmod over suspend.

Dan

> [   59.979791] usb 2-1.4: USB disconnect, device number 4
> [   59.980102] qmi_wwan 2-1.4:1.6 wwp0s29u1u4i6: unregister
> 'qmi_wwan' usb-:00:1d.0-1.4, WWAN/QMI device
> [   60.006821] BUG: unable to handle kernel NULL pointer dereference
> at 00e0
> [   60.006879] IP: qmi_wwan_disconnect+0x25/0xc0 [qmi_wwan]
> [   60.006911] PGD 0
> [   60.006911] P4D 0
> [   60.006957] Oops:  [#1] SMP
> [   60.006978] Modules linked in: fuse(E) ccm(E) rfcomm(E) cmac(E)
> bnep(E) qmi_wwan(E) cdc_wdm(E) cdc_ether(E) usbnet(E) mii(E) btusb(E)
> btrtl(E) btbcm(E) btintel(E) bluetooth(E) joydev(E) xpad(E)
> ecdh_generic(E) ff_memless(E) binfmt_misc(E) snd_hda_codec_hdmi(E)
> snd_hda_codec_conexant(E) snd_hda_codec_generic(E) arc4(E)
> iTCO_wdt(E) iTCO_vendor_support(E) intel_rapl(E)
> x86_pkg_temp_thermal(E) kvm_intel(E) kvm(E) irqbypass(E)
> crct10dif_pclmul(E) crc32_pclmul(E) crc32c_intel(E)
> ghash_clmulni_intel(E) aesni_intel(E) iwlmvm(E) aes_x86_64(E)
> crypto_simd(E) mac80211(E) cryptd(E) glue_helper(E) snd_hda_intel(E)
> snd_hda_codec(E) iwlwifi(E) snd_hwdep(E) psmouse(E) snd_hda_core(E)
> snd_pcm(E) serio_raw(E) sdhci_pci(E) pcspkr(E) snd_timer(E)
> ehci_pci(E) e1000e(E) i2c_i801(E) ehci_hcd(E) snd(E) sg(E) i915(E)
> lpc_ich(E)
> [   60.007366]  ptp(E) usbcore(E) cfg80211(E) mfd_core(E) pps_core(E)
> shpchp(E) ac(E) battery(E) tpm_tis(E) tpm_tis_core(E) evdev(E) tpm(E)
> parport_pc(E) ppdev(E) lp(E) parport(E) ip_tables(E) x_tables(E)
> autofs4(E)
> [   60.007474] CPU: 2 PID: 33 Comm: kworker/2:1 Tainted:
> GE   4.12.3-nr44-normandy-r1500619820+ #1
> [   60.007524] Hardware name: LENOVO 4291LR7/4291LR7, BIOS CBET4000
> 4.6-810-g50522254fb 07/21/2017
> [   60.007580] Workqueue: usb_hub_wq hub_event [usbcore]
> [   60.007609] task: 8c882b716040 task.stack: b8e800d84000
> [   60.007644] RIP: 0010:qmi_wwan_disconnect+0x25/0xc0 [qmi_wwan]
> [   60.007678] RSP: 0018:b8e800d87b38 EFLAGS: 00010246
> [   60.007711] RAX:  RBX:  RCX:
> 
> [   60.007752] RDX: 0001 RSI: 8c8824f3f1d0 RDI:
> 8c8824ef6400
> [   60.007792] RBP: 8c8824ef6400 R08:  R09:
> 
> [   60.007833] R10: b8e800d87780 R11: 0011 R12:
> c07ea0e8
> [   60.007874] R13: 8c8824e2e000 R14: 8c8824e2e098 R15:
> 
> [   60.007915] FS:  () GS:8c883530()
> knlGS:
> [   60.007960] CS:  0010 DS:  ES:  CR0: 80050033
> [   60.007994] CR2: 00e0 CR3: 000229ca5000 CR4:
> 000406e0
> [   60.008035] Call Trace:
> [   60.008065]  ? usb_unbind_interface+0x71/0x270 [usbcore]
> [   60.008101]  ? device_release_driver_internal+0x154/0x210
> [   60.008135]  ? qmi_wwan_unbind+0x6d/0xc0 [qmi_wwan]
> [   60.008168]  ? usbnet_disconnect+0x6c/0xf0 [usbnet]
> [   60.008194]  ? qmi_wwan_disconnect+0x87/0xc0 [qmi_wwan]
> [   60.008232]  ? usb_unbind_interface+0x71/0x270 [usbcore]
> [   60.008264]  ? device_release_driver_internal+0x154/0x210
> [   60.008296]  ? bus_remove_device+0xf5/0x160
> [   60.008324]  ? device_del+0x1dc/0x310
> [   60.008355]  ? usb_remove_ep_devs+0x1b/0x30 [usbcore]
> [   60.008393]  ? usb_disable_device+0x93/0x250 [usbcore]
> [   60.008430]  ? usb_disconnect+0x90/0x260 [usbcore]
> [   60.008468]  ? hub_event+0x1d9/0x14a0 [usbcore]
> [   60.008500]  ? process_one_work+0x175/0x370
> [   60.008528]  ? worker_thread+0x4a/0x380
> [   60.008555]  ? kthread+0xfc/0x130
> [   60.008579]  ? process_one_work+0x370/0x370
> [   60.008606]  ? kthread_park+0x60/0x60
> [   60.008631]  ? ret_from_fork+0x22/0x30
> [   60.008656] Code: 66 0f 1f 44 00 00 66 66 66 66 90 55 48 89 fd 53
> 48 83 ec 10 48 8b 9f c8 00 00 00 65 48 8b 04 25 28 00 00 00 48 89 44
> 24 08 31 c0  83 e0 00 00 00 02 74 51 e8 0d b3 2b cd 85 c0 74 67
> 48 8b bb
> [   60.011925] RIP: qmi_wwan_disconnect+0x25/0xc0 [qmi_wwan] RSP:
> b8e800d87b38
> [   60.013564] CR2: 00e0
> [   60.022125] ---[ end trace e536b59f45bc0f25 ]---
> [   60.025385] IPv6: ADDRCONF(NETDEV_UP): wlp2s0: link is not ready
> 
> If I attempt a second rmmod, the process hangs. If I attempt it on
> 4.11.x it works as expected:
> 
> [   16.897783] fuse init (API v

Re: [Intel-wired-lan] [PATCH] igb: support BCM54616 PHY

2017-07-27 Thread Andrew Lunn
On Thu, Jul 27, 2017 at 12:40:01AM +, Brown, Aaron F wrote:
> > From: Intel-wired-lan [mailto:intel-wired-lan-boun...@osuosl.org] On Behalf
> > Of John W. Linville
> > Sent: Friday, July 21, 2017 11:12 AM
> > To: netdev@vger.kernel.org
> > Cc: intel-wired-...@lists.osuosl.org; John W. Linville
> > 
> > Subject: [Intel-wired-lan] [PATCH] igb: support BCM54616 PHY
> > 
> > The management port on an Edgecore AS7712-32 switch uses an igb MAC,
> > but
> > it uses a BCM54616 PHY. Without a patch like this, loading the igb
> > module produces dmesg output like this:
> > 
> > [3.439125] igb: Copyright (c) 2007-2014 Intel Corporation.
> > [3.439866] igb: probe of :00:14.0 failed with error -2
> > 
> > Signed-off-by: John W. Linville 
> > Cc: Jeff Kirsher 
> > ---
> >  drivers/net/ethernet/intel/igb/e1000_82575.c   | 6 ++
> >  drivers/net/ethernet/intel/igb/e1000_defines.h | 1 +
> >  drivers/net/ethernet/intel/igb/e1000_hw.h  | 1 +
> >  3 files changed, 8 insertions(+)
> 
> I do not have the specific hardware (Edgecore switch) but as far as 
> regression tests go this works fine.
> Tested-by: Aaron Brown 

Sorry, missed the initial post, so replying to a reply.

Linux has supported the BCM54616 PHY since April 2015. If the Intel
drivers used the Linux PHY drivers, you would not have had this problem.

It would be good if somebody spent the time to migrate these MAC
drivers to use the common Linux PHY infrastructure.

Andrew


Re: [PATCH net-next v2 07/10] net: dsa: lan9303: Added basic offloading of unicast traffic

2017-07-27 Thread Egil Hjelmeland

On 27 July 2017 15:31, Andrew Lunn wrote:

I think you are over-simplifying here. Say i have a layer 2 VPN and i
bridge port 1 and the VPN? The software bridge still wants to do STP
on port 1, in order to solve loops.



Problem is that the mainline lan9303_separate_ports() does its
work by setting port 1 & 2 in STP BLOCKING state (and port 0 in
FORWARDING state). So my understanding is that it would break port
separation if LAN9303_SWE_PORT_STATE is written while the driver
is in the non-bridged state.


If the hardware cannot do it, that is a different matter. But if the
hardware can do STP states per port, you should try to make use of it
here.



The HW does STP states per port, but not per pair of ports.
I can set port 1 in learning, but I cannot tell port 2
to ignore addresses learned on port 1 (except by using VLANs).

Unless somebody can come up with another way to implement the
port separation, I think this is how it has to be. I suppose
we don't want to break the port separation feature.
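
One possible way to keep port separation intact is to remember the STP state the bridge asks for and only write it to the hardware while the ports are bridged. The userspace C model below illustrates that idea. It is a sketch, not the lan9303 driver: switch_model, the enum values and all function names are hypothetical, and the hw[] array merely stands in for the port state register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_PORTS 3	/* port 0 faces the CPU, ports 1 and 2 are external */

enum port_state { PORT_BLOCKING, PORT_LEARNING, PORT_FORWARDING };

/* Hypothetical model of the switch; not the driver's data structure. */
struct switch_model {
	bool is_bridged;
	enum port_state hw[NUM_PORTS];		/* what the state register holds */
	enum port_state wanted[NUM_PORTS];	/* last state the bridge asked for */
};

/* Non-bridged mode: separation relies on ports 1 and 2 being blocked. */
void separate_ports(struct switch_model *sw)
{
	assert(sw != NULL);
	sw->is_bridged = false;
	sw->hw[0] = PORT_FORWARDING;
	sw->hw[1] = PORT_BLOCKING;
	sw->hw[2] = PORT_BLOCKING;
}

/* Always record the request, but touch the hardware only when bridged. */
void stp_state_set(struct switch_model *sw, int port, enum port_state state)
{
	assert(sw != NULL);
	assert(port >= 1 && port < NUM_PORTS);
	sw->wanted[port] = state;
	if (sw->is_bridged)
		sw->hw[port] = state;
}

/* Entering bridged mode: apply the states recorded so far. */
void bridge_ports(struct switch_model *sw)
{
	assert(sw != NULL);
	sw->is_bridged = true;
	for (int port = 1; port < NUM_PORTS; port++)
		sw->hw[port] = sw->wanted[port];
}
```

In this model a FORWARDING request for port 1 made while the ports are separated leaves the hardware state at BLOCKING, and the request only takes effect once bridge_ports() is called.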



I thought the SW bridge would carry on doing its STP work even if
there is a port_stp_state_set method on a DSA port?


It will, but it means you are dropping frames in software, adding
extra load to the CPU, reducing the available bandwidth for the other
port, etc.



That is exactly the case with all traffic with the current mainline
driver.



Andrew



Egil


[net V2 14/14] net/mlx5: Fix mlx5_add_flow_rules call with correct num of dests

2017-07-27 Thread Saeed Mahameed
From: Paul Blakey 

When adding an ethtool steering rule with action DISCARD we wrongly
pass a NULL dest with dest_num 1 to mlx5_add_flow_rules().
This error appears to have caused VPORT 0
(MLX5_FLOW_DESTINATION_TYPE_VPORT) to be sent as the fte dest instead
of no dests. The fte action is correctly set to DROP, so the bogus dest
might have been ignored anyway.
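
The NULL-destination case can be illustrated with a small userspace C model. It is a sketch only: dest_model, dest_count and walk_dests are hypothetical names and say nothing about how mlx5_add_flow_rules() is implemented; the point is just that the count passed along with a destination pointer has to be 0 when the pointer is NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a flow destination; not the mlx5 struct. */
struct dest_model {
	int type;
};

/* Count to pass along with dst: a NULL pointer means "no destination". */
int dest_count(const struct dest_model *dst)
{
	return dst ? 1 : 0;
}

/*
 * A consumer that trusts the count it is given, as any array walker
 * does. Returns the number of entries it visited.
 */
int walk_dests(const struct dest_model *dst, int dest_num)
{
	int visited = 0;

	for (int i = 0; i < dest_num; i++) {
		assert(dst != NULL);	/* count > 0 with a NULL array is a caller bug */
		visited++;
	}
	return visited;
}
```

Passing dest_count(dst) instead of a hard-coded 1 means a drop rule with no destination is walked zero times, while a rule with a real destination is still walked once.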

To reproduce use:
 # sudo ethtool --config-nfc <dev> flow-type ether \
   dst aa:bb:cc:dd:ee:ff action -1

Fixes: 74491de93712 ("net/mlx5: Add multi dest support")
Signed-off-by: Paul Blakey 
Reviewed-by: Mark Bloch 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c
index 22fb993987b4..eafc59280ada 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c
@@ -320,7 +320,7 @@ add_ethtool_flow_rule(struct mlx5e_priv *priv,
 
spec->match_criteria_enable = 
(!outer_header_zero(spec->match_criteria));
flow_act.flow_tag = MLX5_FS_DEFAULT_FLOW_TAG;
-   rule = mlx5_add_flow_rules(ft, spec, &flow_act, dst, 1);
+   rule = mlx5_add_flow_rules(ft, spec, &flow_act, dst, dst ? 1 : 0);
if (IS_ERR(rule)) {
err = PTR_ERR(rule);
netdev_err(priv->netdev, "%s: failed to add ethtool steering 
rule: %d\n",
-- 
2.13.0


