date:20180903

RE: [net-next 02/15] i40e: move ethtool stats boiler plate code to i40e_ethtool_stats.h

2018-09-03 Thread Keller, Jacob E




> -Original Message-
> From: netdev-ow...@vger.kernel.org [mailto:netdev-ow...@vger.kernel.org]
> On Behalf Of David Miller
> Sent: Wednesday, August 29, 2018 8:05 PM
> To: Kirsher, Jeffrey T 
> Cc: Keller, Jacob E ; netdev@vger.kernel.org;
> nhor...@redhat.com; sassm...@redhat.com; jogre...@redhat.com
> Subject: Re: [net-next 02/15] i40e: move ethtool stats boiler plate code to
> i40e_ethtool_stats.h
> 
> From: Jeff Kirsher 
> Date: Wed, 29 Aug 2018 15:48:21 -0700
> 
> > diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool_stats.h
> b/drivers/net/ethernet/intel/i40e/i40e_ethtool_stats.h
> > new file mode 100644
> > index ..0290ade7494b
> > --- /dev/null
> > +++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool_stats.h
> > @@ -0,0 +1,221 @@
> ...
> > +/**
> > + * __i40e_add_stat_strings - copy stat strings into ethtool buffer
> > + * @p: ethtool supplied buffer
> > + * @stats: stat definitions array
> > + * @size: size of the stats array
> > + *
> > + * Format and copy the strings described by stats into the buffer pointed 
> > at
> > + * by p.
> > + **/
> > +static void __i40e_add_stat_strings(u8 **p, const struct i40e_stats 
> > stats[],
> > +   const unsigned int size, ...)
> 
> Need to be marked inline.

Marking this inline seems to be causing build problems on some systems because 
it uses variadic arguments.

Thanks,
Jake

Re: [PATCH net 0/3] bnxt_en: Bug fixes.

2018-09-03 Thread Michael Chan

On Mon, Sep 3, 2018 at 10:50 PM, Michael Chan  wrote:
> On Mon, Sep 3, 2018 at 10:01 PM, David Miller  wrote:
>>
>> From: Michael Chan 
>> Date: Mon,  3 Sep 2018 04:23:16 -0400
>>
>> > This short series fixes resource related logic in the driver, mostly
>> > affecting the RDMA driver under corner cases.
>>
>> Series applied, thanks Michael.
>>
>> Do you want patch #3 queued up for -stable?
>
> Yes, please go ahead.  Thanks.

But there is a dependency on patch #2 though.  So #2 needs to be queued as well.

Re: [PATCH net 0/3] bnxt_en: Bug fixes.

2018-09-03 Thread Michael Chan

On Mon, Sep 3, 2018 at 10:01 PM, David Miller  wrote:
>
> From: Michael Chan 
> Date: Mon,  3 Sep 2018 04:23:16 -0400
>
> > This short series fixes resource related logic in the driver, mostly
> > affecting the RDMA driver under corner cases.
>
> Series applied, thanks Michael.
>
> Do you want patch #3 queued up for -stable?

Yes, please go ahead.  Thanks.

Why not use all the syn queues? in the function "tcp_conn_request", I have some questions.

2018-09-03 Thread Ttttabcd

Hello everyone,recently I am looking at the source code for handling TCP 
three-way handshake(Linux Kernel version 4.18.5).

I found some strange places in the source code for handling syn messages.

in the function "tcp_conn_request"

This code will be executed when we don't enable the syn cookies.

if (!net->ipv4.sysctl_tcp_syncookies &&
(net->ipv4.sysctl_max_syn_backlog - 
inet_csk_reqsk_queue_len(sk) <
 (net->ipv4.sysctl_max_syn_backlog >> 2)) &&
!tcp_peer_is_proven(req, dst)) {
/* Without syncookies last quarter of
 * backlog is filled with destinations,
 * proven to be alive.
 * It means that we continue to communicate
 * to destinations, already remembered
 * to the moment of synflood.
 */
pr_drop_req(req, ntohs(tcp_hdr(skb)->source),
rsk_ops->family);
goto drop_and_release;
}

But why don't we use all the syn queues?

Why do we need to leave the size of (net->ipv4.sysctl_max_syn_backlog >> 2) in 
the queue?

Even if the system is attacked by a syn flood, there is no need to leave a 
part. Why do we need to leave a part?

The value of sysctl_max_syn_backlog is the maximum length of the queue only if 
syn cookies are enabled.


This is the first strange place, here is another strange place

__u32 isn = TCP_SKB_CB(skb)->tcp_tw_isn;

if ((net->ipv4.sysctl_tcp_syncookies == 2 ||
 inet_csk_reqsk_queue_is_full(sk)) && !isn) {

if (!want_cookie && !isn) {

The value of "isn" comes from TCP_SKB_CB(skb)->tcp_tw_isn, then it is judged 
twice whether its value is indeed 0.

But "tcp_tw_isn" is initialized in the function "tcp_v4_fill_cb"

TCP_SKB_CB(skb)->tcp_tw_isn = 0;

So it has always been 0, I used printk to test, and the result is always 0.

Since it is always 0, why do you need to judge twice?

This is two strange places I found.

Can anyone tell me why the code here is written like this?

Re: [PATCH v3 0/6] Ethernet over hdlc

2018-09-03 Thread David Miller

From: David Gounaris 
Date: Mon,  3 Sep 2018 14:47:24 +0200

> Here is what has been changed in v3 after the review comments from v2.
> 
> v3-0001: corrected style problems
> v3-0002: corrected style problems
> v3-0003: corrected style problems
> v3-0004: corrected style problems
> v3-0005: corrected style problems
> v3-0006: corrected style problems
> 
> Sorry for that, I did not know about scripts/checkpatch.pl.

Series applied to net-next.

Re: pull-request: mac80211 2018-09-03

2018-09-03 Thread David Miller

From: Johannes Berg 
Date: Mon,  3 Sep 2018 14:15:45 +0200

> This time around for mac80211 I have a larger than usual number of
> fixes, in part because Luca dumped our (Intel's) patches out after
> quite a while - we'll try to make sure this doesn't happen again.
> 
> Shortlog below, as usual, each fix is pretty self-contained but it
> adds up to quite a bit overall.
> 
> Please pull and let me know if there's any problem.

Ok pulled, I'll try to get this merged into net-next in the next day
or two.

Thanks.

Re: [PATCH net-next v2] cxgb4: collect hardware queue descriptors

2018-09-03 Thread David Miller

From: Rahul Lakkireddy 
Date: Mon,  3 Sep 2018 17:41:29 +0530

> Collect descriptors of all ULD and LLD hardware queues managed
> by LLD.
> 
> Signed-off-by: Rahul Lakkireddy 
> Signed-off-by: Ganesh Goudar 
> ---
> v2:
> - Move inline functions to header file.
> - Add missing undefine for QDESC_GET_* macros.

Applied.

Re: [PATCH net-next] cxgb4: when max_tx_rate is 0 disable tx rate limiting

2018-09-03 Thread David Miller

From: Ganesh Goudar 
Date: Mon,  3 Sep 2018 16:21:46 +0530

> in ndo_set_vf_rate() when max_tx_rate is 0 disable tx
> rate limiting for that vf.
> 
> Signed-off-by: Casey Leedom 
> Signed-off-by: Ganesh Goudar 

Applied, thanks.

Re: [PATCH v2 00/11] mscc: ocelot: add support for SerDes muxing configuration

2018-09-03 Thread David Miller

From: Alexandre Belloni 
Date: Mon, 3 Sep 2018 15:45:22 +0200

> On 03/09/2018 15:34:15+0200, Andrew Lunn wrote:
>> > I suggest patches 1 and 8 go through MIPS tree, 2 to 5 and 11 go through
>> > net while the others (6, 7, 9 and 10) go through the generic PHY subsystem.
>> 
>> Hi Quentin
>> 
>> Are you expecting merge conflicts? If not, it might be simpler to gets
>> ACKs from each maintainer, and then merge it though one tree.
>> 
> 
> There are some other DT changes for this cycle so those should probably
> go through MIPS.

No objection for this going through the MIPS tree, and from me:

Acked-by: David S. Miller

Re: [PATCH net-next 00/11] Misc. bug fixes & small enhancements for HNS3 Driver

2018-09-03 Thread David Miller

From: Salil Mehta 
Date: Mon, 3 Sep 2018 11:21:45 +0100

> This patch-set presents some fixes and minor enhancements to HNS3
> Driver

Series applied, thank you.

Re: [PATCH net-next 2/2] tipc: correct spelling errors for tipc_topsrv_queue_evt() comments

2018-09-03 Thread David Miller

From: Zhenbo Gao 
Date: Mon, 3 Sep 2018 16:36:46 +0800

> tipc_conn_queue_evt -> tipc_topsrv_queue_evt
> tipc_send_work -> tipc_conn_send_work
> tipc_send_to_sock -> tipc_conn_send_to_sock
> 
> Signed-off-by: Zhenbo Gao 
> Reviewed-by: Ying Xue 

Applied.

Re: [PATCH net-next 1/2] tipc: correct spelling errors for struct tipc_bc_base's comment

2018-09-03 Thread David Miller

From: Zhenbo Gao 
Date: Mon, 3 Sep 2018 16:36:45 +0800

> Trivial fix for two spelling mistakes.
> 
> Signed-off-by: Zhenbo Gao 
> Reviewed-by: Ying Xue 

Applied.

Re: [PATCH net 0/3] bnxt_en: Bug fixes.

2018-09-03 Thread David Miller

From: Michael Chan 
Date: Mon,  3 Sep 2018 04:23:16 -0400

> This short series fixes resource related logic in the driver, mostly
> affecting the RDMA driver under corner cases.

Series applied, thanks Michael.

Do you want patch #3 queued up for -stable?

Re: [PATCH net 0/2] sctp: two fixes for spp_ipv6_flowlabel and spp_dscp sockopts

2018-09-03 Thread David Miller

From: Marcelo Ricardo Leitner 
Date: Mon, 3 Sep 2018 19:06:06 -0300

> On Mon, Sep 03, 2018 at 03:47:09PM +0800, Xin Long wrote:
>> This patchset fixes two problems in sctp_apply_peer_addr_params()
>> when setting spp_ipv6_flowlabel or spp_dscp.
>> 
>> Xin Long (2):
>>   sctp: fix invalid reference to the index variable of the iterator
>>   sctp: not traverse asoc trans list if non-ipv6 trans exists for
>> ipv6_flowlabel
>> 
>>  net/sctp/socket.c | 34 +++---
>>  1 file changed, 19 insertions(+), 15 deletions(-)
> 
> Series
> Acked-by: Marcelo Ricardo Leitner 

Series applied.

Re: [PATCH v2] net/ibm/emac: wrong emac_calc_base call was used by typo

2018-09-03 Thread David Miller

From: Ivan Mikhaylov 
Date: Mon, 3 Sep 2018 10:26:28 +0300

> __emac_calc_base_mr1 was used instead of __emac4_calc_base_mr1
> by copy-paste mistake for emac4syn.
> 
> Fixes: 45d6e545505fd32edb812f085be7de45b6a5c0af ("net/ibm/emac: add 8192 
> rx/tx fifo size")
> Signed-off-by: Ivan Mikhaylov 

Applied.

Re: [PATCH 00/25] Change tty_port(standard)_install's return type

2018-09-03 Thread Sam Ravnborg

Hi Jaejoong.

> Change return type for tty functions. Patch No.01
>   tty: Change return type to void
Adding this patch first will generate a lot of warnings
until all users are updated.
It is usual practice to prepare all users
and then apply the infrastructure changes as the
last patch.
Then people will not see a lot of warnings when
they build something in the middle,
and I guess current stack set may also generate
a few mails from the 0-day build infrastructure.


>   isdn: i4l: isdn_tty: Change return type to void

And a nitpick on the patch description.
This patch do not change any return type, but
it ignore the return value og tty_part_install().
Same goes for all ramaining patches.

Sam

Re: [PATCH net-next] net: sched: null actions array pointer before releasing action

2018-09-03 Thread David Miller

From: Cong Wang 
Date: Mon, 3 Sep 2018 11:18:43 -0700

> On Mon, Sep 3, 2018 at 12:05 AM Vlad Buslov  wrote:
>>
>> Currently, tcf_action_delete() nulls actions array pointer after putting
>> and deleting it. However, if tcf_idr_delete_index() returns an error,
>> pointer to action is not set to null. That results it being released second
>> time in error handling code of tca_action_gd().
> 
> Oops, good catch.
> 
> Acked-by: Cong Wang 
> 
> David, this one should also go to -net rather than -net-next.

Applied to 'net', thanks.

Re: [PATCH net-next v1 3/5] ipv4: enable IFA_IF_NETNSID for RTM_GETADDR

2018-09-03 Thread David Ahern

On 9/2/18 10:37 PM, Christian Brauner wrote:
> - Backwards Compatibility:
>   If userspace wants to determine whether ipv4 RTM_GETADDR requests support
>   the new IFA_IF_NETNSID property they should verify that the reply after
>   sending a request includes the IFA_IF_NETNSID property. If it does not
>   userspace should assume that IFA_IF_NETNSID is not supported for ipv4
>   RTM_GETADDR requests on this kernel.

Can only use it once per message type, but NLM_F_DUMP_FILTERED is a flag
that can be set to explicitly say the request is filtered as requested.
See 21fdd092acc7e. I would like to see other filters added for addresses
in the same release this gets used. The only one that comes to mind for
addresses is to only return addresses for devices with master device
index N (same intent as 21fdd092acc7e for neighbors).

Re: [PATCH net-next 10/12] net: ethernet: Add helper for set_pauseparam for Asym Pause

2018-09-03 Thread Florian Fainelli




On 09/03/18 12:58, Andrew Lunn wrote:
>> Don't you want to go one step further and incorporate the logic that xgenet,
>> tg3, gianfar and others have? That is: look at the currently advertised
>> parameters, determine what changed, and re-start auto-negotiation as a
>> result of it being enabled and something changed?
> 
> Hi Florian
> 
> Given the number of changes i'm making, over a so many different
> drivers which i cannot test, i wanted to try to keep the changes
> KISS. That way we are more likely to spot errors during review.  So i
> would prefer to be done later, unless it actually makes review
> simpler...

That's fine, but you still need to keep the test for phydev->autoneg in
xgene I believe. Since all of these drivers appear to be doing the same
thing, that's why I suggested moving this into the helper directly.
-- 
Florian

Re: [PATCH net-next] veth: report NEWLINK event when moving the peer device in a new namespace

2018-09-03 Thread David Ahern

On 9/3/18 3:10 AM, Thomas Haller wrote:
> Hi,
> 
> On Sat, 2018-09-01 at 17:45 -0600, David Ahern wrote:
>> On 9/1/18 3:05 AM, Lorenzo Bianconi wrote:
>>>
>>> I was thinking about the commit 38e01b30563a and then I realized I
>>> misread the code
>>> yesterday. The commit 38e01b30563a provides all relevant info but
>>> it
>>> emits the event
>>> for veth1 (the device moved in the new namespace).
>>> An userspace application will not receive that message if it
>>> filters
>>> events for just
>>> a specific device (veth0 in this case) despite that some device
>>> properties have changed
>>> (since veth0 and veth1 are paired devices). To fix that behavior in
>>> veth_notify routine
>>> I emits a RTM_NEWLINK event for veth0.
>>
>> Userspace is managing a veth a pair. It moves one of them to a new
>> namespace and decides to filter messages related to that device
>> before
>> the move. If it did not filter the DELLINK message it would get the
>> information you want. That to me is a userspace problem. What am I
>> missing?
> 
> The userspace component which moves the veth peer is likely not the
> same that is listening to these events.
> 
>> Fundamentally, your proposal means 3 messages when moving a device to
>> a
>> new namespace which is what I think is unnecessary.
> 
> You are correct, the necessary information is signaled in this case.
> 
> Usually, one can manage the information about one link by considering
> only RTM_NEWLINK/RTM_DELLINK message for that link. That seems
> desirable behavior of the netlink API, as it simplifies user space
> implementations.
> 
> For example, libnl3's cache for links doesn't properly handles this,
> and you end up with invalid data in the cache. You may call that a
> userspace problem and bug. But does it really need to be that
> complicated?
> 
> FWIW, a similar thing was also NACK'ed earlier:
>   http://lists.openwall.net/netdev/2016/06/12/8

And from what I can see, I agree with that NACK; a 3rd message is not
necessary. Userspace has the ability to learn what it needs and should
be looking at the existing options that bleed data across namespaces.

> 
> 
> 
> Paolo and Lorenzo pointed out another issue when moving the peer to a
> third namespace:
> 

> 
> ip netns del ns1
> ip netns del ns2
> ip netns del ns3
> ip netns add ns1
> ip netns add ns2
> ip netns add ns3
> 
> 
> ip -n ns1 link add dev veth0 type veth peer name veth1
> ip -n ns1 monitor link &
> 
> ip -n ns1 link set veth1 netns ns2
> # Deleted 9: veth1@veth0:  mtu 1500 qdisc noop 
> state DOWN group default 
> #link/ether 86:79:1d:c6:45:bc brd ff:ff:ff:ff:ff:ff new-netnsid 0 
> new-ifindex 9
> 
> ip -n ns2 link set veth1 netns ns3
> # no event.


ip li add veth1 type veth peer name veth2
ip netns add foo
ip netns add bar
--> start ip monitor here; see below

ip li set veth2 netns foo
ip -netns foo li set veth2 netns bar


>From init_net:
$ ip monitor all-nsid
[nsid current]Deleted inet veth2
[nsid current]Deleted inet6 veth2
[nsid current]nsid 0 (iproute2 netns name: foo)
[nsid current]Deleted 16: veth2@veth1:  mtu
1500 qdisc noop state DOWN group default
link/ether 0a:28:87:f9:17:7c brd ff:ff:ff:ff:ff:ff new-netns foo
new-ifindex 16
[nsid current]inet if16 forwarding on rp_filter off mc_forwarding off
proxy_neigh off ignore_routes_with_linkdown off
[nsid current]inet6 if16 forwarding off mc_forwarding off proxy_neigh
off ignore_routes_with_linkdown off
[nsid current]nsid 0 (iproute2 netns name: foo)
[nsid current]16: veth2@if17:  mtu 1500 qdisc noop
state DOWN group default
link/ether 0a:28:87:f9:17:7c brd ff:ff:ff:ff:ff:ff link-netnsid 0
[nsid current]Deleted inet veth2
[nsid current]Deleted inet6 veth2
[nsid current]nsid 1
[nsid current]Deleted 16: veth2@if17:  mtu 1500
qdisc noop state DOWN group default
link/ether 0a:28:87:f9:17:7c brd ff:ff:ff:ff:ff:ff link-netnsid 0
new-netnsid 1 new-ifindex 16

[PATCH v2 net-next] failover: Add missing check to validate 'slave_dev' in net_failover_slave_unregister

2018-09-03 Thread YueHaibing

Fixes gcc '-Wunused-but-set-variable' warning:

drivers/net/net_failover.c: In function 'net_failover_slave_unregister':
drivers/net/net_failover.c:598:35: warning:
 variable 'primary_dev' set but not used [-Wunused-but-set-variable]

There should check the validity of 'slave_dev'.

Fixes: cfc80d9a1163 ("net: Introduce net_failover driver")

Signed-off-by: YueHaibing 
---
v2: use WARN_ON_ONCE as Liran Alon suggested
---
 drivers/net/net_failover.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/net_failover.c b/drivers/net/net_failover.c
index 7ae1856..5a749dc 100644
--- a/drivers/net/net_failover.c
+++ b/drivers/net/net_failover.c
@@ -603,6 +603,9 @@ static int net_failover_slave_unregister(struct net_device 
*slave_dev,
primary_dev = rtnl_dereference(nfo_info->primary_dev);
standby_dev = rtnl_dereference(nfo_info->standby_dev);
 
+   if (WARN_ON_ONCE(slave_dev != primary_dev && slave_dev != standby_dev))
+   return -ENODEV;
+
vlan_vids_del_by_dev(slave_dev, failover_dev);
dev_uc_unsync(slave_dev, failover_dev);
dev_mc_unsync(slave_dev, failover_dev);

[PATCH 10/25] tty: synclink: Change return type to void

2018-09-03 Thread Jaejoong Kim

Since tty_port_install() always returns 0, the return type has changed
to void. Now apply this and remove the obsolete error check.

Signed-off-by: Jaejoong Kim 
---
 drivers/tty/synclink.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/tty/synclink.c b/drivers/tty/synclink.c
index fbdf4d0..6e7e4d6 100644
--- a/drivers/tty/synclink.c
+++ b/drivers/tty/synclink.c
@@ -3355,8 +3355,9 @@ static int mgsl_install(struct tty_driver *driver, struct 
tty_struct *tty)
if (mgsl_paranoia_check(info, tty->name, "mgsl_open"))
return -ENODEV;
tty->driver_data = info;
+   tty_port_install(&info->port, driver, tty);
 
-   return tty_port_install(&info->port, driver, tty);
+   return 0;
 }
 
 /* mgsl_open()
-- 
2.7.4

[PATCH 16/25] isdn: capi: Change return type to void

2018-09-03 Thread Jaejoong Kim

Since tty_standard_install() always returns 0, the return type has changed
to void. Now apply this and remove the obsolete error check.

Signed-off-by: Jaejoong Kim 
---
 drivers/isdn/capi/capi.c | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/drivers/isdn/capi/capi.c b/drivers/isdn/capi/capi.c
index ef5560b..08daf3a 100644
--- a/drivers/isdn/capi/capi.c
+++ b/drivers/isdn/capi/capi.c
@@ -999,13 +999,11 @@ static int
 capinc_tty_install(struct tty_driver *driver, struct tty_struct *tty)
 {
struct capiminor *mp = capiminor_get(tty->index);
-   int ret = tty_standard_install(driver, tty);
 
-   if (ret == 0)
-   tty->driver_data = mp;
-   else
-   capiminor_put(mp);
-   return ret;
+   tty_standard_install(driver, tty);
+   tty->driver_data = mp;
+
+   return 0;
 }
 
 static void capinc_tty_cleanup(struct tty_struct *tty)
-- 
2.7.4

Re: [PATCH net-next 00/12] Preparing for phylib limkmodes

2018-09-03 Thread David Miller

From: Andrew Lunn 
Date: Sun,  2 Sep 2018 19:06:29 +0200

> phylib currently makes us of a u32 bitmap for advertising, supported,
> and link partner capabilities. For a long time, this has been
> sufficient, for devices up to 1Gbps. With more MAC/PHY combinations
> now supporting speeds greater than 1Gbps, we have run out of
> bits. There is the need to replace this u32 with an
> __ETHTOOL_DECLARE_LINK_MODE_MASK, which makes use of linux's generic
> bitmaps.
> 
> This patchset does some of the work preparing for this change. A few
> cleanups are applied to PHY drivers. Some MAC drivers directly access
> members of phydev which are going to change type. These patches adds
> some helpers and swaps MAC drivers to use them, mostly dealing with
> Pause configuration.

Andrew, please fix the indentation issue Florian pointed out in patch
#8 and finish the other feedback threads which seem to still be in
progress.

Thanks!

[PATCH net] net: phy: sfp: Handle unimplemented hwmon limits and alarms

2018-09-03 Thread Andrew Lunn

Not all SFPs implement the registers containing sensor limits and
alarms. Luckily, there is a bit indicating if they are implemented or
not. Add checking for this bit, when deciding if the hwmon attributes
should be visible.

Fixes: 1323061a018a ("net: phy: sfp: Add HWMON support for module sensors")
Signed-off-by: Andrew Lunn 
---
 drivers/net/phy/sfp.c | 20 
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/drivers/net/phy/sfp.c b/drivers/net/phy/sfp.c
index 4637d980310e..52fffb98fde9 100644
--- a/drivers/net/phy/sfp.c
+++ b/drivers/net/phy/sfp.c
@@ -398,7 +398,6 @@ static umode_t sfp_hwmon_is_visible(const void *data,
switch (type) {
case hwmon_temp:
switch (attr) {
-   case hwmon_temp_input:
case hwmon_temp_min_alarm:
case hwmon_temp_max_alarm:
case hwmon_temp_lcrit_alarm:
@@ -407,13 +406,16 @@ static umode_t sfp_hwmon_is_visible(const void *data,
case hwmon_temp_max:
case hwmon_temp_lcrit:
case hwmon_temp_crit:
+   if (!(sfp->id.ext.enhopts & SFP_ENHOPTS_ALARMWARN))
+   return 0;
+   /* fall through */
+   case hwmon_temp_input:
return 0444;
default:
return 0;
}
case hwmon_in:
switch (attr) {
-   case hwmon_in_input:
case hwmon_in_min_alarm:
case hwmon_in_max_alarm:
case hwmon_in_lcrit_alarm:
@@ -422,13 +424,16 @@ static umode_t sfp_hwmon_is_visible(const void *data,
case hwmon_in_max:
case hwmon_in_lcrit:
case hwmon_in_crit:
+   if (!(sfp->id.ext.enhopts & SFP_ENHOPTS_ALARMWARN))
+   return 0;
+   /* fall through */
+   case hwmon_in_input:
return 0444;
default:
return 0;
}
case hwmon_curr:
switch (attr) {
-   case hwmon_curr_input:
case hwmon_curr_min_alarm:
case hwmon_curr_max_alarm:
case hwmon_curr_lcrit_alarm:
@@ -437,6 +442,10 @@ static umode_t sfp_hwmon_is_visible(const void *data,
case hwmon_curr_max:
case hwmon_curr_lcrit:
case hwmon_curr_crit:
+   if (!(sfp->id.ext.enhopts & SFP_ENHOPTS_ALARMWARN))
+   return 0;
+   /* fall through */
+   case hwmon_curr_input:
return 0444;
default:
return 0;
@@ -452,7 +461,6 @@ static umode_t sfp_hwmon_is_visible(const void *data,
channel == 1)
return 0;
switch (attr) {
-   case hwmon_power_input:
case hwmon_power_min_alarm:
case hwmon_power_max_alarm:
case hwmon_power_lcrit_alarm:
@@ -461,6 +469,10 @@ static umode_t sfp_hwmon_is_visible(const void *data,
case hwmon_power_max:
case hwmon_power_lcrit:
case hwmon_power_crit:
+   if (!(sfp->id.ext.enhopts & SFP_ENHOPTS_ALARMWARN))
+   return 0;
+   /* fall through */
+   case hwmon_power_input:
return 0444;
default:
return 0;
-- 
2.19.0.rc1

Re: [PATCH net] ip6_tunnel: respect ttl inherit for ip6tnl

2018-09-03 Thread David Miller

From: Hangbin Liu 
Date: Fri, 31 Aug 2018 16:52:01 +0800

> man ip-tunnel ttl section says:
> 0 is a special value meaning that packets inherit the TTL value.
> 
> IPv4 tunnel respect this in ip_tunnel_xmit(), but IPv6 tunnel has not
> implement it yet. To make IPv6 behave consistently with IP tunnel,
> add ipv6 tunnel inherit support.
> 
> Signed-off-by: Hangbin Liu 

Applied, thank you.

Re: [PATCH net-next 01/12] net: phy: ste10Xp: Remove wrong SUPPORTED_Pause

2018-09-03 Thread Andrew Lunn

On Mon, Sep 03, 2018 at 10:53:58AM -0700, Florian Fainelli wrote:
> 
> 
> On 9/2/2018 10:06 AM, Andrew Lunn wrote:
> >The PHY driver should not indicate that Pause is supported. It is upto
> >the MAC drive enable it, if it supports Pause frames. So remove it
> >from the ste10Xp driver.
> 
> This came up before when Timur was cleaning up the Pause|ASym_Pause
> advertisment bits, and we agreed that a driver that cannot have the
> Asym_Pause bit writable, e.g: bcm63xx, would have to specifically leave
> SUPPORTED_Pause as a way to tell PHYLIB about that situation. See:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/net/phy?id=529ed12752635ba8a35dc78ec70ed6f42570b4ca
> 
> Can you check the datasheet if available?

Hi Florian

Datasheet is at:

https://www.st.com/resource/en/datasheet/ste100p.pdf

It indicates that bit 10 of the Auto-Neg advertisement for Pause is
R/W and defaults to 1. There is no support for Asym Pause, bit 11 is
reserved.

So we don't have the case of the bcm63xx. It should be that this PHY
supports plain symmetric Pause.

 Andrew

Re: [PATCH net-next v2 1/2] netlink: ipv4 igmp join notifications

2018-09-03 Thread Roopa Prabhu

On Sun, Sep 2, 2018 at 4:18 AM, Patrick Ruddy
 wrote:
> Hi Roopa
>
> inline
>
> thx
>
> -pr
>
> On Fri, 2018-08-31 at 09:29 -0700, Roopa Prabhu wrote:
>> On Fri, Aug 31, 2018 at 4:20 AM, Patrick Ruddy
>>  wrote:
>> > Some userspace applications need to know about IGMP joins from the kernel
>> > for 2 reasons
>> > 1. To allow the programming of multicast MAC filters in hardware
>> > 2. To form a multicast FORUS list for non link-local multicast
>> >groups to be sent to the kernel and from there to the interested
>> >party.
>> > (1) can be fulfilled but simply sending the hardware multicast MAC
>> > address to be programmed but (2) requires the L3 address to be sent
>> > since this cannot be constructed from the MAC address whereas the
>> > reverse translation is a standard library function.
>> >
>> > This commit provides addition and deletion of multicast addresses
>> > using the RTM_NEWADDR and RTM_DELADDR messages. It also provides
>> > the RTM_GETADDR extension to allow multicast join state to be read
>> > from the kernel.
>> >
>> > Signed-off-by: Patrick Ruddy 
>> > ---
>> > v2: fix kbuild warnings.
>>
>> I am still going through the series, but AFAICT, user-space caches listening 
>> to
>> RTNLGRP_IPV4_IFADDR will now also get multicast addresses by default ?
>>
>
> Yes that's the crux of this change. It's unfortunate that I could not
> use IFA_MULTICAST to distinguish the SAFI. I suppose the other option
> would be to create a set of new NEW/DEL/GETMULTICAST messages but the
> partial code for RTM_GETMULTICAST in ipv6/mcast.c complicates that
> slightly. Happy to look at it if you think that would be be better.
>

yeah, true. Thinking about this some more, you are adding an interface
for multicast entries learnt via igmp.
There is already a netlink channel for layer2 mc addresses via igmp. I
can't see why that cannot be used.
It is RTM_*MDB msgs. It is currently only available for the bridge.
But, I have a requirement for it to be
available via a vxlan dev...so, I am looking at making it available on
other devices.

Can you check if RTM_*MDB msgs can be made to work for your case ?.

The reason I think it should be possible is because this is similar to
bridge fdb entries.
The bridge fdb api  (RTM_NEWNEIGH with AF_BRIDGE) is overloaded to
notify and dump netdev unicast addresses.
similarly I think the mdb api can be overloaded to notify and dump
netdev multicast addresses (statically added or learnt via igmp)

Re: [PATCH net 0/2] sctp: two fixes for spp_ipv6_flowlabel and spp_dscp sockopts

2018-09-03 Thread Marcelo Ricardo Leitner

On Mon, Sep 03, 2018 at 03:47:09PM +0800, Xin Long wrote:
> This patchset fixes two problems in sctp_apply_peer_addr_params()
> when setting spp_ipv6_flowlabel or spp_dscp.
> 
> Xin Long (2):
>   sctp: fix invalid reference to the index variable of the iterator
>   sctp: not traverse asoc trans list if non-ipv6 trans exists for
> ipv6_flowlabel
> 
>  net/sctp/socket.c | 34 +++---
>  1 file changed, 19 insertions(+), 15 deletions(-)

Series
Acked-by: Marcelo Ricardo Leitner

[PATCH net-next v2] net: sched: action_ife: take reference to meta module

2018-09-03 Thread Vlad Buslov

Recent refactoring of add_metainfo() caused use_all_metadata() to add
metainfo to ife action metalist without taking reference to module. This
causes warning in module_put called from ife action cleanup function.

Implement add_metainfo_and_get_ops() function that returns with reference
to module taken if metainfo was added successfully, and call it from
use_all_metadata(), instead of calling __add_metainfo() directly.

Example warning:

[  646.344393] WARNING: CPU: 1 PID: 2278 at kernel/module.c:1139 
module_put+0x1cb/0x230
[  646.352437] Modules linked in: act_meta_skbtcindex act_meta_mark 
act_meta_skbprio act_ife ife veth nfsv3 nfs fscache xt_CHECKSUM iptable_mangle 
ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat xt_conntrack nf_conntrack 
nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c tun ebtable_filter ebtables 
ip6table_filter ip6_tables bridge stp llc mlx5_ib ib_uverbs ib_core intel_rapl 
sb_edac x86_pkg_temp_thermal mlx5_core coretemp kvm_intel kvm nfsd igb 
irqbypass crct10dif_pclmul devlink crc32_pclmul mei_me joydev ses crc32c_intel 
enclosure auth_rpcgss i2c_algo_bit ioatdma ptp mei pps_core ghash_clmulni_intel 
iTCO_wdt iTCO_vendor_support pcspkr dca ipmi_ssif lpc_ich target_core_mod 
i2c_i801 ipmi_si ipmi_devintf pcc_cpufreq wmi ipmi_msghandler nfs_acl lockd 
acpi_pad acpi_power_meter grace sunrpc mpt3sas raid_class scsi_transport_sas
[  646.425631] CPU: 1 PID: 2278 Comm: tc Not tainted 4.19.0-rc1+ #799
[  646.432187] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 
03/30/2017
[  646.440595] RIP: 0010:module_put+0x1cb/0x230
[  646.445238] Code: f3 66 94 02 e8 26 ff fa ff 85 c0 74 11 0f b6 1d 51 30 94 
02 80 fb 01 77 60 83 e3 01 74 13 65 ff 0d 3a 83 db 73 e9 2b ff ff ff <0f> 0b e9 
00 ff ff ff e8 59 01 fb ff 85 c0 75 e4 48 c7 c2 20 62 6b
[  646.464997] RSP: 0018:880354d37068 EFLAGS: 00010286
[  646.470599] RAX:  RBX: c0a52518 RCX: 8c2668db
[  646.478118] RDX: 0003 RSI: dc00 RDI: c0a52518
[  646.485641] RBP: c0a52180 R08: fbfff814a4a4 R09: fbfff814a4a3
[  646.493164] R10: c0a5251b R11: fbfff814a4a4 R12: 11006a9a6e0d
[  646.500687] R13:  R14: 880362bab890 R15: dead0100
[  646.508213] FS:  7f4164c99800() GS:88036fe4() 
knlGS:
[  646.516961] CS:  0010 DS:  ES:  CR0: 80050033
[  646.523080] CR2: 7f41638b8420 CR3: 000351df0004 CR4: 001606e0
[  646.530595] Call Trace:
[  646.533408]  ? find_symbol_in_section+0x260/0x260
[  646.538509]  tcf_ife_cleanup+0x11b/0x200 [act_ife]
[  646.543695]  tcf_action_cleanup+0x29/0xa0
[  646.548078]  __tcf_action_put+0x5a/0xb0
[  646.552289]  ? nla_put+0x65/0xe0
[  646.555889]  __tcf_idr_release+0x48/0x60
[  646.560187]  tcf_generic_walker+0x448/0x6b0
[  646.564764]  ? tcf_action_dump_1+0x450/0x450
[  646.569411]  ? __lock_is_held+0x84/0x110
[  646.573720]  ? tcf_ife_walker+0x10c/0x20f [act_ife]
[  646.578982]  tca_action_gd+0x972/0xc40
[  646.583129]  ? tca_get_fill.constprop.17+0x250/0x250
[  646.588471]  ? mark_lock+0xcf/0x980
[  646.592324]  ? check_chain_key+0x140/0x1f0
[  646.596832]  ? debug_show_all_locks+0x240/0x240
[  646.601839]  ? memset+0x1f/0x40
[  646.605350]  ? nla_parse+0xca/0x1a0
[  646.609217]  tc_ctl_action+0x215/0x230
[  646.613339]  ? tcf_action_add+0x220/0x220
[  646.617748]  rtnetlink_rcv_msg+0x56a/0x6d0
[  646.67]  ? rtnl_fdb_del+0x3f0/0x3f0
[  646.626466]  netlink_rcv_skb+0x18d/0x200
[  646.630752]  ? rtnl_fdb_del+0x3f0/0x3f0
[  646.634959]  ? netlink_ack+0x500/0x500
[  646.639106]  netlink_unicast+0x2d0/0x370
[  646.643409]  ? netlink_attachskb+0x340/0x340
[  646.648050]  ? _copy_from_iter_full+0xe9/0x3e0
[  646.652870]  ? import_iovec+0x11e/0x1c0
[  646.657083]  netlink_sendmsg+0x3b9/0x6a0
[  646.661388]  ? netlink_unicast+0x370/0x370
[  646.665877]  ? netlink_unicast+0x370/0x370
[  646.670351]  sock_sendmsg+0x6b/0x80
[  646.674212]  ___sys_sendmsg+0x4a1/0x520
[  646.678443]  ? copy_msghdr_from_user+0x210/0x210
[  646.683463]  ? lock_downgrade+0x320/0x320
[  646.687849]  ? debug_show_all_locks+0x240/0x240
[  646.692760]  ? do_raw_spin_unlock+0xa2/0x130
[  646.697418]  ? _raw_spin_unlock+0x24/0x30
[  646.701798]  ? __handle_mm_fault+0x1819/0x1c10
[  646.706619]  ? __pmd_alloc+0x320/0x320
[  646.710738]  ? debug_show_all_locks+0x240/0x240
[  646.715649]  ? restore_nameidata+0x7b/0xa0
[  646.720117]  ? check_chain_key+0x140/0x1f0
[  646.724590]  ? check_chain_key+0x140/0x1f0
[  646.729070]  ? __fget_light+0xbc/0xd0
[  646.733121]  ? __sys_sendmsg+0xd7/0x150
[  646.737329]  __sys_sendmsg+0xd7/0x150
[  646.741359]  ? __ia32_sys_shutdown+0x30/0x30
[  646.746003]  ? up_read+0x53/0x90
[  646.749601]  ? __do_page_fault+0x484/0x780
[  646.754105]  ? do_syscall_64+0x1e/0x2c0
[  646.758320]  do_syscall_64+0x72/0x2c0
[  646.762353]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  646.767776] RIP: 0033:0x7f4163872150
[  646.771713] Code: 8b 15 3c 7d 2b 00

Re: [bpf-next 1/3] flow_dissector: implements flow dissector BPF hook

2018-09-03 Thread Petar Penkov

On Sun, Sep 2, 2018 at 2:03 PM, Daniel Borkmann  wrote:
> On 08/30/2018 08:22 PM, Petar Penkov wrote:
>> From: Petar Penkov 
>>
>> Adds a hook for programs of type BPF_PROG_TYPE_FLOW_DISSECTOR and
>> attach type BPF_FLOW_DISSECTOR that is executed in the flow dissector
>> path. The BPF program is per-network namespace.
>>
>> Signed-off-by: Petar Penkov 
>> Signed-off-by: Willem de Bruijn 
> [...]
>> + err = check_flow_keys_access(env, off, size);
>> + if (!err && t == BPF_READ && value_regno >= 0)
>> + mark_reg_unknown(env, regs, value_regno);
>>   } else {
>>   verbose(env, "R%d invalid mem access '%s'\n", regno,
>>   reg_type_str[reg->type]);
>> @@ -1925,6 +1954,8 @@ static int check_helper_mem_access(struct 
>> bpf_verifier_env *env, int regno,
>>   case PTR_TO_PACKET_META:
>>   return check_packet_access(env, regno, reg->off, access_size,
>>  zero_size_allowed);
>> + case PTR_TO_FLOW_KEYS:
>> + return check_flow_keys_access(env, reg->off, access_size);
>>   case PTR_TO_MAP_VALUE:
>>   return check_map_access(env, regno, reg->off, access_size,
>>   zero_size_allowed);
>> @@ -3976,6 +4007,7 @@ static bool may_access_skb(enum bpf_prog_type type)
>>   case BPF_PROG_TYPE_SOCKET_FILTER:
>>   case BPF_PROG_TYPE_SCHED_CLS:
>>   case BPF_PROG_TYPE_SCHED_ACT:
>> + case BPF_PROG_TYPE_FLOW_DISSECTOR:
>>   return true;
>
> This one should not be added here. It would allow for LD_ABS to be used, but
> you already have direct packet access as well as bpf_skb_load_bytes() helper
> enabled. Downside on LD_ABS is that error path will exit the BPF prog with
> return 0 for historical reasons w/o user realizing (here: to BPF_OK mapping).
> So we should not encourage use of LD_ABS/IND anymore in eBPF context and
> avoid surprises.
>
>>   default:
>>   return false;
>> @@ -4451,6 +4483,7 @@ static bool regsafe(struct bpf_reg_state *rold, struct 
>> bpf_reg_state *rcur,
>>   case PTR_TO_CTX:
>>   case CONST_PTR_TO_MAP:
>>   case PTR_TO_PACKET_END:
>> + case PTR_TO_FLOW_KEYS:
>>   /* Only valid matches are exact, which memcmp() above
>>* would have accepted
>>*/
>> diff --git a/net/core/filter.c b/net/core/filter.c
>> index c25eb36f1320..0143b9c0c67e 100644
>> --- a/net/core/filter.c
>> +++ b/net/core/filter.c
>> @@ -5092,6 +5092,17 @@ sk_skb_func_proto(enum bpf_func_id func_id, const 
>> struct bpf_prog *prog)
>>   }
>>  }
>>
>> +static const struct bpf_func_proto *
>> +flow_dissector_func_proto(enum bpf_func_id func_id, const struct bpf_prog 
>> *prog)
>> +{
>> + switch (func_id) {
>> + case BPF_FUNC_skb_load_bytes:
>> + return &bpf_skb_load_bytes_proto;
>
> Probably makes sense to also enable bpf_skb_pull_data helper for direct packet
> access use to fetch non-linear data from here once.
>
>> + default:
>> + return bpf_base_func_proto(func_id);
>> + }
>> +}
>> +
>>  static const struct bpf_func_proto *
>>  lwt_out_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
>>  {
>> @@ -5207,6 +5218,7 @@ static bool bpf_skb_is_valid_access(int off, int size, 
>> enum bpf_access_type type
>>   case bpf_ctx_range(struct __sk_buff, data):
>>   case bpf_ctx_range(struct __sk_buff, data_meta):
>>   case bpf_ctx_range(struct __sk_buff, data_end):
>> + case bpf_ctx_range(struct __sk_buff, flow_keys):
>>   if (size != size_default)
>>   return false;
>>   break;
>> @@ -5235,6 +5247,7 @@ static bool sk_filter_is_valid_access(int off, int 
>> size,
>>   case bpf_ctx_range(struct __sk_buff, data):
>>   case bpf_ctx_range(struct __sk_buff, data_meta):
>>   case bpf_ctx_range(struct __sk_buff, data_end):
>> + case bpf_ctx_range(struct __sk_buff, flow_keys):
>>   case bpf_ctx_range_till(struct __sk_buff, family, local_port):
> [...]
> Thanks,
> Daniel

Thank you for your feedback, Daniel! I'll make these changes and submit a v2.
Petar

Re: [PATCH net-next] net: sched: change tcf_del_walker() to use concurrent-safe delete

2018-09-03 Thread Vlad Buslov



On Mon 03 Sep 2018 at 18:50, Cong Wang  wrote:
> On Mon, Sep 3, 2018 at 12:06 AM Vlad Buslov  wrote:
>>
>> Action API was changed to work with actions and action_idr in concurrency
>> safe manner, however tcf_del_walker() still uses actions without taking
>> reference to them first and deletes them directly, disregarding possible
>> concurrent delete.
>>
>> Change tcf_del_walker() to use tcf_idr_delete_index() that doesn't require
>> caller to hold reference to action and accepts action id as argument,
>> instead of direct action pointer.
>
> Hmm, why doesn't tcf_del_walker() just take idrinfo->lock? At least
> tcf_dump_walker() already does.

Because tcf_del_walker() calls __tcf_idr_release(), which take
idrinfo->lock itself (deadlock). It also calls sleeping functions like
tcf_action_goto_chain_fini(), so just implementing function that
releases action without taking idrinfo->lock is not enough.

Re: [PATCH net-next 10/12] net: ethernet: Add helper for set_pauseparam for Asym Pause

2018-09-03 Thread Andrew Lunn

> Don't you want to go one step further and incorporate the logic that xgenet,
> tg3, gianfar and others have? That is: look at the currently advertised
> parameters, determine what changed, and re-start auto-negotiation as a
> result of it being enabled and something changed?

Hi Florian

Given the number of changes i'm making, over a so many different
drivers which i cannot test, i wanted to try to keep the changes
KISS. That way we are more likely to spot errors during review.  So i
would prefer to be done later, unless it actually makes review
simpler...

   Andrew

Re: [Patch net] tipc: call start and done ops directly in __tipc_nl_compat_dumpit()

2018-09-03 Thread Cong Wang

On Mon, Sep 3, 2018 at 12:49 PM Cong Wang  wrote:
>
> __tipc_nl_compat_dumpit() uses a netlink_callback on stack,
> so the only way to align it with other ->dumpit() call path
> is calling tipc_dump_start() and tipc_dump_done() directly
> inside it. Otherwise ->dumpit() would always get NULL from
> cb->args[0].
>
> Fixes: 9a07efa9aea2 ("tipc: switch to rhashtable iterator")
> Reported-by: syzbot+e93a2c41f91b8e2c7...@syzkaller.appspotmail.com
> Cc: Jon Maloy 
> Cc: Ying Xue 
> Signed-off-by: Cong Wang 
> ---
>  net/tipc/netlink_compat.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/net/tipc/netlink_compat.c b/net/tipc/netlink_compat.c
> index a2f76743c73a..aa05934613f2 100644
> --- a/net/tipc/netlink_compat.c
> +++ b/net/tipc/netlink_compat.c
> @@ -185,6 +185,7 @@ static int __tipc_nl_compat_dumpit(struct 
> tipc_nl_compat_cmd_dump *cmd,
> return -ENOMEM;
>
> buf->sk = msg->dst_sk;
> +   tipc_dump_start(&cb);

Well, tipc_dump_start() uses sock_net(cb->skb->sk) which
seems not set here... I need to pass msg->dst_sk in.

I will send v2.

Re: [PATCH v2 net-next 7/7] net: dsa: Add Lantiq / Intel DSA driver for vrx200

2018-09-03 Thread Florian Fainelli





On 9/1/2018 5:05 AM, Hauke Mehrtens wrote:

This adds the DSA driver for the GSWIP Switch found in the VRX200 SoC.
This switch is integrated in the DSL SoC, this SoC uses a GSWIP version
2.1, there are other SoCs using different versions of this IP block, but
this driver was only tested with the version found in the VRX200.
Currently only the basic features are implemented which will forward all
packages to the CPU and let the CPU do the forwarding. The hardware also
support Layer 2 offloading which is not yet implemented in this driver.

The GPHY FW loaded is now done by this driver and not any more by the
separate driver in drivers/soc/lantiq/gphy.c, I will remove this driver
is a separate patch. to make use of the GPHY this switch driver is
needed anyway. Other SoCs have more embedded GPHYs so this driver should
support a variable number of GPHYs. After the firmware was loaded the
GPHY can be probed on the MDIO bus and it behaves like an external GPHY,
without the firmware it can not be probed on the MDIO bus.

Currently this depends on SOC_TYPE_XWAY because the SoC revision
detection function ltq_soc_type() is used to load the correct GPHY
firmware on the VRX200 SoCs.

The clock names in the sysctrl.c file have to be changed because the
clocks are now used by a different driver. This should be cleaned up and
a real common clock driver should provide the clocks instead.

Signed-off-by: Hauke Mehrtens 
---


Looks great, just a few suggestions below

[snip]


+static void gswip_adjust_link(struct dsa_switch *ds, int port,
+ struct phy_device *phydev)
+{
+   struct gswip_priv *priv = ds->priv;
+   u16 macconf = phydev->mdio.addr & GSWIP_MDIO_PHY_ADDR_MASK;
+   u16 miirate = 0;
+   u16 miimode;
+   u16 lcl_adv = 0, rmt_adv = 0;
+   u8 flowctrl;
+
+   /* do not run this for the CPU port */
+   if (dsa_is_cpu_port(ds, port))
+   return;


Typically we expect the adjust_link callback to run for fixed link 
ports, that is inter-switch links (between switches) or between the CPU 
port and the Ethernet MAC attached to the switch. Here you are running 
this for the user facing ports (IIRC), which should really not be 
necessary, most Ethernet switches will be able to look at their built-in 
PHY's state and configure the switch's port automatically. Maybe this is 
not possible here because you had to disable polling?


Can you consider implementing PHYLINK operations which would make the 
driver more future proof, should you consider newer generations of 
switches that support 10G PHY, SGMII, SFP/SFF and so on?


[snip]


+   if (priv->ds->dst->cpu_dp->index != priv->hw_info->cpu_port) {
+   dev_err(dev, "wrong CPU port defined, HW only supports port: 
%i",
+   priv->hw_info->cpu_port);
+   err = -EINVAL;
+   goto mdio_bus;
+   }


There are a number of switches (b53, qca8k, mt7530) that have this 
requirement, we should probably be moving this check down into the core 
DSA layer and allow either to continue but disable switch tagging, if it 
was requested. Andrew what do you think?

--
Florian

Re: [PATCH net-next 09/12] net: ethernet: Add helper for MACs which support pause

2018-09-03 Thread Andrew Lunn

On Mon, Sep 03, 2018 at 10:39:04AM -0700, Florian Fainelli wrote:
> 
> 
> On 9/2/2018 10:06 AM, Andrew Lunn wrote:
> >Rather than have the MAC drivers manipulate phydev members, add a
> >helper function for MACs supporting Pause, but not Asym Pause.
> >
> >Signed-off-by: Andrew Lunn 
> 
> Reviewed-by: Florian Fainelli 
> 
> I wonder if this would be better named phy_support_sym_pause() as opposed to
> asym_pause()?

Hi Florian

Maybe, bit it is then very similar to asym. I wounder if there is some
other word we can use for the symmetrical case?

  Andrew

[Patch net] tipc: call start and done ops directly in __tipc_nl_compat_dumpit()

2018-09-03 Thread Cong Wang

__tipc_nl_compat_dumpit() uses a netlink_callback on stack,
so the only way to align it with other ->dumpit() call path
is calling tipc_dump_start() and tipc_dump_done() directly
inside it. Otherwise ->dumpit() would always get NULL from
cb->args[0].

Fixes: 9a07efa9aea2 ("tipc: switch to rhashtable iterator")
Reported-by: syzbot+e93a2c41f91b8e2c7...@syzkaller.appspotmail.com
Cc: Jon Maloy 
Cc: Ying Xue 
Signed-off-by: Cong Wang 
---
 net/tipc/netlink_compat.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/tipc/netlink_compat.c b/net/tipc/netlink_compat.c
index a2f76743c73a..aa05934613f2 100644
--- a/net/tipc/netlink_compat.c
+++ b/net/tipc/netlink_compat.c
@@ -185,6 +185,7 @@ static int __tipc_nl_compat_dumpit(struct 
tipc_nl_compat_cmd_dump *cmd,
return -ENOMEM;
 
buf->sk = msg->dst_sk;
+   tipc_dump_start(&cb);
 
do {
int rem;
@@ -216,6 +217,7 @@ static int __tipc_nl_compat_dumpit(struct 
tipc_nl_compat_cmd_dump *cmd,
err = 0;
 
 err_out:
+   tipc_dump_done(&cb);
kfree_skb(buf);
 
if (err == -EMSGSIZE) {
-- 
2.14.4

Re: [PATCH v2 net-next 4/7] dt-bindings: net: Add lantiq,xrx200-net DT bindings

2018-09-03 Thread Florian Fainelli





On 9/1/2018 5:04 AM, Hauke Mehrtens wrote:

This adds the binding for the PMAC core between the CPU and the GSWIP
switch found on the xrx200 / VR9 Lantiq / Intel SoC.

Signed-off-by: Hauke Mehrtens 
Cc: devicet...@vger.kernel.org
---
  .../devicetree/bindings/net/lantiq,xrx200-net.txt   | 21 +
  1 file changed, 21 insertions(+)
  create mode 100644 Documentation/devicetree/bindings/net/lantiq,xrx200-net.txt

diff --git a/Documentation/devicetree/bindings/net/lantiq,xrx200-net.txt 
b/Documentation/devicetree/bindings/net/lantiq,xrx200-net.txt
new file mode 100644
index ..8a2fe5200cdc
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/lantiq,xrx200-net.txt
@@ -0,0 +1,21 @@
+Lantiq xRX200 GSWIP PMAC Ethernet driver
+==
+
+Required properties:
+
+- compatible   : "lantiq,xrx200-net" for the PMAC of the embedded
+   : GSWIP in the xXR200
+- reg  : memory range of the PMAC core inside of the GSWIP core
+- interrupts   : TX and RX DMA interrupts. Use interrupt-names "tx" for
+   : the TX interrupt and "rx" for the RX interrupt.


You would likely want to document that the order should be strict, that 
is TX interrupt first and RX interrupt second, but other than that:


Reviewed-by: Florian Fainelli 
--
Florian

Re: [PATCH v2 net-next 5/7] net: lantiq: Add Lantiq / Intel VRX200 Ethernet driver

2018-09-03 Thread Florian Fainelli





On 9/1/2018 5:04 AM, Hauke Mehrtens wrote:

This drives the PMAC between the GSWIP Switch and the CPU in the VRX200
SoC. This is currently only the very basic version of the Ethernet
driver.

When the DMA channel is activated we receive some packets which were
send to the SoC while it was still in U-Boot, these packets have the
wrong header. Resetting the IP cores did not work so we read out the
extra packets at the beginning and discard them.

This also adapts the clock code in sysctrl.c to use the default name of
the device node so that the driver gets the correct clock. sysctrl.c
should be replaced with a proper common clock driver later.

Signed-off-by: Hauke Mehrtens 
---
  MAINTAINERS  |   1 +
  arch/mips/lantiq/xway/sysctrl.c  |   6 +-
  drivers/net/ethernet/Kconfig |   7 +
  drivers/net/ethernet/Makefile|   1 +
  drivers/net/ethernet/lantiq_xrx200.c | 591 +++
  5 files changed, 603 insertions(+), 3 deletions(-)
  create mode 100644 drivers/net/ethernet/lantiq_xrx200.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 4b2ee65f6086..912d31b5 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8171,6 +8171,7 @@ M:Hauke Mehrtens 
  L:netdev@vger.kernel.org
  S:Maintained
  F:net/dsa/tag_gswip.c
+F: drivers/net/ethernet/lantiq_xrx200.c
  
  LANTIQ MIPS ARCHITECTURE

  M:John Crispin 
diff --git a/arch/mips/lantiq/xway/sysctrl.c b/arch/mips/lantiq/xway/sysctrl.c
index e0af39b33e28..eeb89a37e27e 100644
--- a/arch/mips/lantiq/xway/sysctrl.c
+++ b/arch/mips/lantiq/xway/sysctrl.c
@@ -505,7 +505,7 @@ void __init ltq_soc_init(void)
clkdev_add_pmu("1a80.pcie", "msi", 1, 1, PMU1_PCIE2_MSI);
clkdev_add_pmu("1a80.pcie", "pdi", 1, 1, PMU1_PCIE2_PDI);
clkdev_add_pmu("1a80.pcie", "ctl", 1, 1, PMU1_PCIE2_CTL);
-   clkdev_add_pmu("1e108000.eth", NULL, 0, 0, PMU_SWITCH | 
PMU_PPE_DP);
+   clkdev_add_pmu("1e10b308.eth", NULL, 0, 0, PMU_SWITCH | 
PMU_PPE_DP);
clkdev_add_pmu("1da0.usif", "NULL", 1, 0, PMU_USIF);
clkdev_add_pmu("1e103100.deu", NULL, 1, 0, PMU_DEU);
} else if (of_machine_is_compatible("lantiq,ar10")) {
@@ -513,7 +513,7 @@ void __init ltq_soc_init(void)
  ltq_ar10_fpi_hz(), ltq_ar10_pp32_hz());
clkdev_add_pmu("1e101000.usb", "otg", 1, 0, PMU_USB0);
clkdev_add_pmu("1e106000.usb", "otg", 1, 0, PMU_USB1);
-   clkdev_add_pmu("1e108000.eth", NULL, 0, 0, PMU_SWITCH |
+   clkdev_add_pmu("1e10b308.eth", NULL, 0, 0, PMU_SWITCH |
   PMU_PPE_DP | PMU_PPE_TC);


Should not that be part of patch 4 where you define the base register 
address?



clkdev_add_pmu("1da0.usif", "NULL", 1, 0, PMU_USIF);
clkdev_add_pmu("1f203020.gphy", NULL, 1, 0, PMU_GPHY);
@@ -536,7 +536,7 @@ void __init ltq_soc_init(void)
clkdev_add_pmu(NULL, "ahb", 1, 0, PMU_AHBM | PMU_AHBS);
  
  		clkdev_add_pmu("1da0.usif", "NULL", 1, 0, PMU_USIF);

-   clkdev_add_pmu("1e108000.eth", NULL, 0, 0,
+   clkdev_add_pmu("1e10b308.eth", NULL, 0, 0,
PMU_SWITCH | PMU_PPE_DPLUS | PMU_PPE_DPLUM |
PMU_PPE_EMA | PMU_PPE_TC | PMU_PPE_SLL01 |
PMU_PPE_QSB | PMU_PPE_TOP);


Likewise.

[snip]


+static int xrx200_open(struct net_device *dev)
+{
+   struct xrx200_priv *priv = netdev_priv(dev);
+
+   ltq_dma_open(&priv->chan_tx.dma);
+   ltq_dma_enable_irq(&priv->chan_tx.dma);
+
+   napi_enable(&priv->chan_rx.napi);
+   ltq_dma_open(&priv->chan_rx.dma);
+   /* The boot loader does not always deactivate the receiving of frames
+* on the ports and then some packets queue up in the PPE buffers.
+* They already passed the PMAC so they do not have the tags
+* configured here. Read the these packets here and drop them.
+* The HW should have written them into memory after 10us
+*/
+   udelay(10);


You execute in process context with the ndo_open() callback (AFAIR), 
would usleep_range() work here?



+   xrx200_flush_dma(&priv->chan_rx);
+   ltq_dma_enable_irq(&priv->chan_rx.dma);
+
+   netif_wake_queue(dev);
+
+   return 0;
+}
+
+static int xrx200_close(struct net_device *dev)
+{
+   struct xrx200_priv *priv = netdev_priv(dev);
+
+   netif_stop_queue(dev);
+
+   napi_disable(&priv->chan_rx.napi);
+   ltq_dma_close(&priv->chan_rx.dma);
+
+   ltq_dma_close(&priv->chan_tx.dma);
+
+   return 0;
+}
+
+static int xrx200_alloc_skb(struct xrx200_chan *ch)
+{
+   int ret = 0;
+
+#define DMA_PAD(NET_IP_ALIGN + NET_SKB_PAD)
+   ch->skb[ch->dma.desc] = dev_alloc_skb(XRX200_DMA_DATA_LEN + DMA_PAD);
+   if (!ch->skb[ch->dma.desc]) {
+   ret = -

Re: [PATCH v2 net-next 3/7] net: dsa: Add Lantiq / Intel GSWIP tag support

2018-09-03 Thread Florian Fainelli





On 9/1/2018 5:03 AM, Hauke Mehrtens wrote:

This handles the tag added by the PMAC on the VRX200 SoC line.

The GSWIP uses internally a GSWIP special tag which is located after the
Ethernet header. The PMAC which connects the GSWIP to the CPU converts
this special tag used by the GSWIP into the PMAC special tag which is
added in front of the Ethernet header.

This was tested with GSWIP 2.1 found in the VRX200 SoCs, other GSWIP
versions use slightly different PMAC special tags.

Signed-off-by: Hauke Mehrtens 


Reviewed-by: Florian Fainelli 

Just one suggestion below, if you need to resubmit this:

[snip]


+static struct sk_buff *gswip_tag_rcv(struct sk_buff *skb,
+struct net_device *dev,
+struct packet_type *pt)
+{
+   int port;
+   u8 *gswip_tag;
+
+   if (unlikely(!pskb_may_pull(skb, GSWIP_RX_HEADER_LEN)))
+   return NULL;
+
+   gswip_tag = skb->data - ETH_HLEN;
+   skb_pull_rcsum(skb, GSWIP_RX_HEADER_LEN);


I would be moving this after the port lookup was successful, that way if 
you are discarding a frame, you can do this as quickly as possible, this 
should not have a functional impact since you return a skb with the 
checksum updated past the return of that function.

--
Florian

Re: [PATCH net-next] net: sched: change tcf_del_walker() to use concurrent-safe delete

2018-09-03 Thread Cong Wang

On Mon, Sep 3, 2018 at 12:06 AM Vlad Buslov  wrote:
>
> Action API was changed to work with actions and action_idr in concurrency
> safe manner, however tcf_del_walker() still uses actions without taking
> reference to them first and deletes them directly, disregarding possible
> concurrent delete.
>
> Change tcf_del_walker() to use tcf_idr_delete_index() that doesn't require
> caller to hold reference to action and accepts action id as argument,
> instead of direct action pointer.

Hmm, why doesn't tcf_del_walker() just take idrinfo->lock? At least
tcf_dump_walker() already does.

[RFC PATCH bpf-next 4/4] tools/bpf: bpftool: add net support

2018-09-03 Thread Yonghong Song

Add "bpftool net" support. Networking devices are enumerated
to dump all xdp information. Also, for each networking device,
tc classes and qdiscs are enumerated in order to check bpf filters
with these classes and qdiscs. In addition, root handle and
clsact ingress/egress are also checked for bpf filters.

For example,

  $ bpftool net
  xdp [
  ]
  netdev_filters [
  ifindex 2 name handle_icmp flags direct-action flags_gen [not_in_hw ]
prog_id 3194 tag 846d29c14d0d7d26 act []
  ifindex 2 name handle_egress flags direct-action flags_gen [not_in_hw ]
prog_id 3193 tag 387d281be9fe77aa
  ]

  $ bpftool -jp net
  [{
"xdp": [],
"netdev_filters": [{
"ifindex": 2,
"name": "handle_icmp",
"flags": "direct-action",
"flags_gen": ["not_in_hw"
],
"prog_id": 3194,
"tag": "846d29c14d0d7d26",
"act": []
},{
"ifindex": 2,
"name": "handle_egress",
"flags": "direct-action",
"flags_gen": ["not_in_hw"
],
"prog_id": 3193,
"tag": "387d281be9fe77aa"
}
]
}
  ]

Signed-off-by: Yonghong Song 
---
 tools/bpf/bpftool/main.c   |   3 +-
 tools/bpf/bpftool/main.h   |   7 +
 tools/bpf/bpftool/net.c| 219 
 tools/bpf/bpftool/netlink_dumper.c | 261 +
 tools/bpf/bpftool/netlink_dumper.h | 103 
 5 files changed, 592 insertions(+), 1 deletion(-)
 create mode 100644 tools/bpf/bpftool/net.c
 create mode 100644 tools/bpf/bpftool/netlink_dumper.c
 create mode 100644 tools/bpf/bpftool/netlink_dumper.h

diff --git a/tools/bpf/bpftool/main.c b/tools/bpf/bpftool/main.c
index d15a62be6cf0..79dc3f193547 100644
--- a/tools/bpf/bpftool/main.c
+++ b/tools/bpf/bpftool/main.c
@@ -85,7 +85,7 @@ static int do_help(int argc, char **argv)
"   %s batch file FILE\n"
"   %s version\n"
"\n"
-   "   OBJECT := { prog | map | cgroup | perf }\n"
+   "   OBJECT := { prog | map | cgroup | perf | net }\n"
"   " HELP_SPEC_OPTIONS "\n"
"",
bin_name, bin_name, bin_name);
@@ -215,6 +215,7 @@ static const struct cmd cmds[] = {
{ "map",do_map },
{ "cgroup", do_cgroup },
{ "perf",   do_perf },
+   { "net",do_net },
{ "version",do_version },
{ 0 }
 };
diff --git a/tools/bpf/bpftool/main.h b/tools/bpf/bpftool/main.h
index 238e734d75b3..f82aeb08a043 100644
--- a/tools/bpf/bpftool/main.h
+++ b/tools/bpf/bpftool/main.h
@@ -136,6 +136,7 @@ int do_map(int argc, char **arg);
 int do_event_pipe(int argc, char **argv);
 int do_cgroup(int argc, char **arg);
 int do_perf(int argc, char **arg);
+int do_net(int argc, char **arg);
 
 int prog_parse_fd(int *argc, char ***argv);
 int map_parse_fd(int *argc, char ***argv);
@@ -165,4 +166,10 @@ struct btf_dumper {
  */
 int btf_dumper_type(const struct btf_dumper *d, __u32 type_id,
const void *data);
+
+struct nlattr;
+struct ifinfomsg;
+struct tcmsg;
+int do_filter_dump(struct tcmsg *ifinfo, struct nlattr **tb);
+int do_xdp_dump(struct ifinfomsg *ifinfo, struct nlattr **tb);
 #endif
diff --git a/tools/bpf/bpftool/net.c b/tools/bpf/bpftool/net.c
new file mode 100644
index ..0ced41d9fee9
--- /dev/null
+++ b/tools/bpf/bpftool/net.c
@@ -0,0 +1,219 @@
+// SPDX-License-Identifier: GPL-2.0+
+// Copyright (C) 2018 Facebook
+
+#define _GNU_SOURCE
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include "main.h"
+#include "netlink_dumper.h"
+
+struct bpf_netdev_t {
+   int *ifindex_array;
+   int used_len;
+   int array_len;
+   int filter_idx;
+};
+
+struct bpf_tcinfo_t {
+   int *handle_array;
+   int used_len;
+   int array_len;
+   boolis_qdisc;
+};
+
+static int dump_link_nlmsg(void *cookie, void *msg, struct nlattr **tb)
+{
+   struct bpf_netdev_t *netinfo = cookie;
+   struct ifinfomsg *ifinfo = msg;
+
+   if (netinfo->filter_idx > 0 && netinfo->filter_idx != ifinfo->ifi_index)
+   return 0;
+
+   if (netinfo->used_len == netinfo->array_len) {
+   netinfo->ifindex_array = realloc(netinfo->ifindex_array,
+   netinfo->array_len * sizeof(int) + 64);
+   netinfo->array_len += 64/sizeof(int);
+   }
+   netinfo->ifindex_array[netinfo->used_len++] = ifinfo->ifi_index;
+
+   return do_xdp_dump(ifinfo, tb);
+}
+
+static int dump_class_qdisc_nlmsg(void *cookie, void *msg, struct nlattr **tb)
+{
+   struct bpf_tcinfo_t *tcinfo = cookie;
+   struct tcmsg *info = msg;
+
+   if (tcinfo->is_qdisc

[RFC PATCH bpf-next 2/4] tools/bpf: move bpf/lib netlink related functions into a new file

2018-09-03 Thread Yonghong Song

There are no functionality change for this patch.

In the subsequent patches, more netlink related library functions
will be added and a separate file is better than cluttering bpf.c.

Signed-off-by: Yonghong Song 
---
 tools/lib/bpf/Build |   2 +-
 tools/lib/bpf/bpf.c | 129 ---
 tools/lib/bpf/netlink.c | 164 
 3 files changed, 165 insertions(+), 130 deletions(-)
 create mode 100644 tools/lib/bpf/netlink.c

diff --git a/tools/lib/bpf/Build b/tools/lib/bpf/Build
index 13a861135127..512b2c0ba0d2 100644
--- a/tools/lib/bpf/Build
+++ b/tools/lib/bpf/Build
@@ -1 +1 @@
-libbpf-y := libbpf.o bpf.o nlattr.o btf.o libbpf_errno.o
+libbpf-y := libbpf.o bpf.o nlattr.o btf.o libbpf_errno.o netlink.o
diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 60aa4ca8b2c5..3878a26a2071 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -28,16 +28,8 @@
 #include 
 #include "bpf.h"
 #include "libbpf.h"
-#include "nlattr.h"
-#include 
-#include 
-#include 
 #include 
 
-#ifndef SOL_NETLINK
-#define SOL_NETLINK 270
-#endif
-
 /*
  * When building perf, unistd.h is overridden. __NR_bpf is
  * required to be defined explicitly.
@@ -499,127 +491,6 @@ int bpf_raw_tracepoint_open(const char *name, int prog_fd)
return sys_bpf(BPF_RAW_TRACEPOINT_OPEN, &attr, sizeof(attr));
 }
 
-int bpf_set_link_xdp_fd(int ifindex, int fd, __u32 flags)
-{
-   struct sockaddr_nl sa;
-   int sock, seq = 0, len, ret = -1;
-   char buf[4096];
-   struct nlattr *nla, *nla_xdp;
-   struct {
-   struct nlmsghdr  nh;
-   struct ifinfomsg ifinfo;
-   char attrbuf[64];
-   } req;
-   struct nlmsghdr *nh;
-   struct nlmsgerr *err;
-   socklen_t addrlen;
-   int one = 1;
-
-   memset(&sa, 0, sizeof(sa));
-   sa.nl_family = AF_NETLINK;
-
-   sock = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
-   if (sock < 0) {
-   return -errno;
-   }
-
-   if (setsockopt(sock, SOL_NETLINK, NETLINK_EXT_ACK,
-  &one, sizeof(one)) < 0) {
-   fprintf(stderr, "Netlink error reporting not supported\n");
-   }
-
-   if (bind(sock, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
-   ret = -errno;
-   goto cleanup;
-   }
-
-   addrlen = sizeof(sa);
-   if (getsockname(sock, (struct sockaddr *)&sa, &addrlen) < 0) {
-   ret = -errno;
-   goto cleanup;
-   }
-
-   if (addrlen != sizeof(sa)) {
-   ret = -LIBBPF_ERRNO__INTERNAL;
-   goto cleanup;
-   }
-
-   memset(&req, 0, sizeof(req));
-   req.nh.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg));
-   req.nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
-   req.nh.nlmsg_type = RTM_SETLINK;
-   req.nh.nlmsg_pid = 0;
-   req.nh.nlmsg_seq = ++seq;
-   req.ifinfo.ifi_family = AF_UNSPEC;
-   req.ifinfo.ifi_index = ifindex;
-
-   /* started nested attribute for XDP */
-   nla = (struct nlattr *)(((char *)&req)
-   + NLMSG_ALIGN(req.nh.nlmsg_len));
-   nla->nla_type = NLA_F_NESTED | IFLA_XDP;
-   nla->nla_len = NLA_HDRLEN;
-
-   /* add XDP fd */
-   nla_xdp = (struct nlattr *)((char *)nla + nla->nla_len);
-   nla_xdp->nla_type = IFLA_XDP_FD;
-   nla_xdp->nla_len = NLA_HDRLEN + sizeof(int);
-   memcpy((char *)nla_xdp + NLA_HDRLEN, &fd, sizeof(fd));
-   nla->nla_len += nla_xdp->nla_len;
-
-   /* if user passed in any flags, add those too */
-   if (flags) {
-   nla_xdp = (struct nlattr *)((char *)nla + nla->nla_len);
-   nla_xdp->nla_type = IFLA_XDP_FLAGS;
-   nla_xdp->nla_len = NLA_HDRLEN + sizeof(flags);
-   memcpy((char *)nla_xdp + NLA_HDRLEN, &flags, sizeof(flags));
-   nla->nla_len += nla_xdp->nla_len;
-   }
-
-   req.nh.nlmsg_len += NLA_ALIGN(nla->nla_len);
-
-   if (send(sock, &req, req.nh.nlmsg_len, 0) < 0) {
-   ret = -errno;
-   goto cleanup;
-   }
-
-   len = recv(sock, buf, sizeof(buf), 0);
-   if (len < 0) {
-   ret = -errno;
-   goto cleanup;
-   }
-
-   for (nh = (struct nlmsghdr *)buf; NLMSG_OK(nh, len);
-nh = NLMSG_NEXT(nh, len)) {
-   if (nh->nlmsg_pid != sa.nl_pid) {
-   ret = -LIBBPF_ERRNO__WRNGPID;
-   goto cleanup;
-   }
-   if (nh->nlmsg_seq != seq) {
-   ret = -LIBBPF_ERRNO__INVSEQ;
-   goto cleanup;
-   }
-   switch (nh->nlmsg_type) {
-   case NLMSG_ERROR:
-   err = (struct nlmsgerr *)NLMSG_DATA(nh);
-   if (!err->error)
-   continue;
-   ret = err->error;
-   nla_dump_e

[RFC PATCH bpf-next 3/4] tools/bpf: add more netlink functionalities in lib/bpf

2018-09-03 Thread Yonghong Song

This patch added a few netlink attribute parsing functions
and the netlink API functions to query networking links, tc classes,
tc qdiscs and tc filters. For example, the following API is
to get networking links:
  int nl_get_link(int sock, unsigned int nl_pid,
  dump_nlmsg_t dump_link_nlmsg,
  void *cookie);

Note that when the API is called, the user also provided a
callback function with the following signature:
  int (*dump_nlmsg_t)(void *cookie, void *msg, struct nlattr **tb);

The "cookie" is the parameter the user passed to the API and will
be available for the callback function.
The "msg" is the information about the result, e.g., ifinfomsg or
tcmsg. The "tb" is the parsed netlink attributes.

Signed-off-by: Yonghong Song 
---
 tools/lib/bpf/libbpf.h   |  16 
 tools/lib/bpf/libbpf_errno.c |   1 +
 tools/lib/bpf/netlink.c  | 165 ++-
 tools/lib/bpf/nlattr.c   |  33 ---
 tools/lib/bpf/nlattr.h   |  38 
 5 files changed, 238 insertions(+), 15 deletions(-)

diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index 96c55fac54c3..e3b00e23e181 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -46,6 +46,7 @@ enum libbpf_errno {
LIBBPF_ERRNO__PROGTYPE, /* Kernel doesn't support this program type */
LIBBPF_ERRNO__WRNGPID,  /* Wrong pid in netlink message */
LIBBPF_ERRNO__INVSEQ,   /* Invalid netlink sequence */
+   LIBBPF_ERRNO__NLPARSE,  /* netlink parsing error */
__LIBBPF_ERRNO__END,
 };
 
@@ -297,4 +298,19 @@ int bpf_perf_event_read_simple(void *mem, unsigned long 
size,
   unsigned long page_size,
   void **buf, size_t *buf_len,
   bpf_perf_event_print_t fn, void *priv);
+
+struct nlmsghdr;
+struct nlattr;
+typedef int (*dump_nlmsg_t)(void *cookie, void *msg, struct nlattr **tb);
+typedef int (*__dump_nlmsg_t)(struct nlmsghdr *nlmsg, dump_nlmsg_t,
+ void *cookie);
+int bpf_netlink_open(unsigned int *nl_pid);
+int nl_get_link(int sock, unsigned int nl_pid, dump_nlmsg_t dump_link_nlmsg,
+   void *cookie);
+int nl_get_class(int sock, unsigned int nl_pid, int ifindex,
+dump_nlmsg_t dump_class_nlmsg, void *cookie);
+int nl_get_qdisc(int sock, unsigned int nl_pid, int ifindex,
+dump_nlmsg_t dump_qdisc_nlmsg, void *cookie);
+int nl_get_filter(int sock, unsigned int nl_pid, int ifindex, int handle,
+ dump_nlmsg_t dump_filter_nlmsg, void *cookie);
 #endif
diff --git a/tools/lib/bpf/libbpf_errno.c b/tools/lib/bpf/libbpf_errno.c
index d9ba851bd7f9..2464ade3b326 100644
--- a/tools/lib/bpf/libbpf_errno.c
+++ b/tools/lib/bpf/libbpf_errno.c
@@ -42,6 +42,7 @@ static const char *libbpf_strerror_table[NR_ERRNO] = {
[ERRCODE_OFFSET(PROGTYPE)]  = "Kernel doesn't support this program 
type",
[ERRCODE_OFFSET(WRNGPID)]   = "Wrong pid in netlink message",
[ERRCODE_OFFSET(INVSEQ)]= "Invalid netlink sequence",
+   [ERRCODE_OFFSET(NLPARSE)]   = "Incorrect netlink message parsing",
 };
 
 int libbpf_strerror(int err, char *buf, size_t size)
diff --git a/tools/lib/bpf/netlink.c b/tools/lib/bpf/netlink.c
index ba067b8236a5..4779c0ae9e7a 100644
--- a/tools/lib/bpf/netlink.c
+++ b/tools/lib/bpf/netlink.c
@@ -17,7 +17,7 @@
 #define SOL_NETLINK 270
 #endif
 
-static int bpf_netlink_open(__u32 *nl_pid)
+int bpf_netlink_open(__u32 *nl_pid)
 {
struct sockaddr_nl sa;
socklen_t addrlen;
@@ -60,7 +60,9 @@ static int bpf_netlink_open(__u32 *nl_pid)
return ret;
 }
 
-static int bpf_netlink_recv(int sock, __u32 nl_pid, int seq)
+static int bpf_netlink_recv(int sock, __u32 nl_pid, int seq,
+   __dump_nlmsg_t _fn, dump_nlmsg_t fn,
+   void *cookie)
 {
struct nlmsgerr *err;
struct nlmsghdr *nh;
@@ -97,6 +99,11 @@ static int bpf_netlink_recv(int sock, __u32 nl_pid, int seq)
default:
break;
}
+   if (_fn) {
+   ret = _fn(nh, fn, cookie);
+   if (ret)
+   return ret;
+   }
}
}
ret = 0;
@@ -156,9 +163,161 @@ int bpf_set_link_xdp_fd(int ifindex, int fd, __u32 flags)
ret = -errno;
goto cleanup;
}
-   ret = bpf_netlink_recv(sock, nl_pid, seq);
+   ret = bpf_netlink_recv(sock, nl_pid, seq, NULL, NULL, NULL);
 
 cleanup:
close(sock);
return ret;
 }
+
+static int __dump_link_nlmsg(struct nlmsghdr *nlh, dump_nlmsg_t 
dump_link_nlmsg,
+void *cookie)
+{
+   struct nlattr *tb[IFLA_MAX + 1], *attr;
+   struct ifinfomsg *ifi = NLMSG_DATA(nlh);
+   int len;
+
+

[RFC PATCH bpf-next 0/4] tools/bpf: bpftool: add net support

2018-09-03 Thread Yonghong Song

The functionality to dump network driver and tc related bpf programs
are added. Currently, users can already use "ip link show "
and "tc filter show dev  ..." to dump bpf program attachment
information for xdp programs and tc bpf programs.
The implementation here allows bpftool as a central place for
bpf introspection and users do not need to revert to other tools.
Also, we can make command simpler to dump bpf related information,
e.g., "bpftool net" is able to dump all xdp and tc bpf programs.

For example,

  $ bpftool net
  xdp [
  ]
  netdev_filters [
  ifindex 2 name handle_icmp flags direct-action flags_gen [not_in_hw ]
prog_id 3194 tag 846d29c14d0d7d26 act []
  ifindex 2 name handle_egress flags direct-action flags_gen [not_in_hw ]
prog_id 3193 tag 387d281be9fe77aa
  ] 

  $ bpftool -jp net
  [{
"xdp": [],
"netdev_filters": [{
"ifindex": 2,
"name": "handle_icmp",
"flags": "direct-action",
"flags_gen": ["not_in_hw"
],
"prog_id": 3194,
"tag": "846d29c14d0d7d26",
"act": []
},{
"ifindex": 2,
"name": "handle_egress",
"flags": "direct-action",
"flags_gen": ["not_in_hw"
],
"prog_id": 3193,
"tag": "387d281be9fe77aa"
}
]
}
  ]

This is labeled as RFC as
  . the current command line as "bpftool net {show|list} [dev name]",
does this sound reasonable? When "dev name" is specified, only
bpf programs for that particular device are displayed.
  . for some netlink attributes, currently I only print out the
raw numbers, maybe I should print better similar to iproute2?
  . bpftool documentation and bash completion are not implemented.

Patch #1 sync'd kernel uapi header if_link.h to tools directory.
Patch #2 re-organized the bpf/lib bpf.c to have a separate file
for netlink related functions.
Patch #3 added additional netlink related functions.
Patch #4 implemented "bpftool net" command.

Yonghong Song (4):
  tools/bpf: sync kernel uapi header if_link.h to tools
  tools/bpf: move bpf/lib netlink related functions into a new file
  tools/bpf: add more netlink functionalities in lib/bpf
  tools/bpf: bpftool: add net support

 tools/bpf/bpftool/main.c   |   3 +-
 tools/bpf/bpftool/main.h   |   7 +
 tools/bpf/bpftool/net.c| 219 +++
 tools/bpf/bpftool/netlink_dumper.c | 261 +++
 tools/bpf/bpftool/netlink_dumper.h | 103 +
 tools/include/uapi/linux/if_link.h |  17 ++
 tools/lib/bpf/Build|   2 +-
 tools/lib/bpf/bpf.c| 129 
 tools/lib/bpf/libbpf.h |  16 ++
 tools/lib/bpf/libbpf_errno.c   |   1 +
 tools/lib/bpf/netlink.c| 323 +
 tools/lib/bpf/nlattr.c |  33 +--
 tools/lib/bpf/nlattr.h |  38 
 13 files changed, 1009 insertions(+), 143 deletions(-)
 create mode 100644 tools/bpf/bpftool/net.c
 create mode 100644 tools/bpf/bpftool/netlink_dumper.c
 create mode 100644 tools/bpf/bpftool/netlink_dumper.h
 create mode 100644 tools/lib/bpf/netlink.c

-- 
2.17.1

[RFC PATCH bpf-next 1/4] tools/bpf: sync kernel uapi header if_link.h to tools

2018-09-03 Thread Yonghong Song

Among others, this header will be used later for
bpftool net support.

Signed-off-by: Yonghong Song 
---
 tools/include/uapi/linux/if_link.h | 17 +
 1 file changed, 17 insertions(+)

diff --git a/tools/include/uapi/linux/if_link.h 
b/tools/include/uapi/linux/if_link.h
index cf01b6824244..43391e2d1153 100644
--- a/tools/include/uapi/linux/if_link.h
+++ b/tools/include/uapi/linux/if_link.h
@@ -164,6 +164,8 @@ enum {
IFLA_CARRIER_UP_COUNT,
IFLA_CARRIER_DOWN_COUNT,
IFLA_NEW_IFINDEX,
+   IFLA_MIN_MTU,
+   IFLA_MAX_MTU,
__IFLA_MAX
 };
 
@@ -334,6 +336,7 @@ enum {
IFLA_BRPORT_GROUP_FWD_MASK,
IFLA_BRPORT_NEIGH_SUPPRESS,
IFLA_BRPORT_ISOLATED,
+   IFLA_BRPORT_BACKUP_PORT,
__IFLA_BRPORT_MAX
 };
 #define IFLA_BRPORT_MAX (__IFLA_BRPORT_MAX - 1)
@@ -459,6 +462,16 @@ enum {
 
 #define IFLA_MACSEC_MAX (__IFLA_MACSEC_MAX - 1)
 
+/* XFRM section */
+enum {
+   IFLA_XFRM_UNSPEC,
+   IFLA_XFRM_LINK,
+   IFLA_XFRM_IF_ID,
+   __IFLA_XFRM_MAX
+};
+
+#define IFLA_XFRM_MAX (__IFLA_XFRM_MAX - 1)
+
 enum macsec_validation_type {
MACSEC_VALIDATE_DISABLED = 0,
MACSEC_VALIDATE_CHECK = 1,
@@ -920,6 +933,7 @@ enum {
XDP_ATTACHED_DRV,
XDP_ATTACHED_SKB,
XDP_ATTACHED_HW,
+   XDP_ATTACHED_MULTI,
 };
 
 enum {
@@ -928,6 +942,9 @@ enum {
IFLA_XDP_ATTACHED,
IFLA_XDP_FLAGS,
IFLA_XDP_PROG_ID,
+   IFLA_XDP_DRV_PROG_ID,
+   IFLA_XDP_SKB_PROG_ID,
+   IFLA_XDP_HW_PROG_ID,
__IFLA_XDP_MAX,
 };
 
-- 
2.17.1

Re: [PATCH net-next] net: sched: null actions array pointer before releasing action

2018-09-03 Thread Cong Wang

On Mon, Sep 3, 2018 at 12:05 AM Vlad Buslov  wrote:
>
> Currently, tcf_action_delete() nulls actions array pointer after putting
> and deleting it. However, if tcf_idr_delete_index() returns an error,
> pointer to action is not set to null. That results it being released second
> time in error handling code of tca_action_gd().

Oops, good catch.

Acked-by: Cong Wang 

David, this one should also go to -net rather than -net-next.

[Patch net] act_ife: fix a potential use-after-free

2018-09-03 Thread Cong Wang

Immediately after module_put(), user could delete this
module, so e->ops could be already freed before we call
e->ops->release().

Fix this by moving module_put() after ops->release().

Fixes: ef6980b6becb ("introduce IFE action")
Cc: Jamal Hadi Salim 
Signed-off-by: Cong Wang 
---
 net/sched/act_ife.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sched/act_ife.c b/net/sched/act_ife.c
index 196430aefe87..fc412769a1be 100644
--- a/net/sched/act_ife.c
+++ b/net/sched/act_ife.c
@@ -400,7 +400,6 @@ static void _tcf_ife_cleanup(struct tc_action *a)
struct tcf_meta_info *e, *n;
 
list_for_each_entry_safe(e, n, &ife->metalist, metalist) {
-   module_put(e->ops->owner);
list_del(&e->metalist);
if (e->metaval) {
if (e->ops->release)
@@ -408,6 +407,7 @@ static void _tcf_ife_cleanup(struct tc_action *a)
else
kfree(e->metaval);
}
+   module_put(e->ops->owner);
kfree(e);
}
 }
-- 
2.14.4

Re: [PATCH net-next 01/12] net: phy: ste10Xp: Remove wrong SUPPORTED_Pause

2018-09-03 Thread Florian Fainelli





On 9/2/2018 10:06 AM, Andrew Lunn wrote:

The PHY driver should not indicate that Pause is supported. It is upto
the MAC drive enable it, if it supports Pause frames. So remove it
from the ste10Xp driver.


This came up before when Timur was cleaning up the Pause|ASym_Pause 
advertisment bits, and we agreed that a driver that cannot have the 
Asym_Pause bit writable, e.g: bcm63xx, would have to specifically leave 
SUPPORTED_Pause as a way to tell PHYLIB about that situation. See:


https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/net/phy?id=529ed12752635ba8a35dc78ec70ed6f42570b4ca

Can you check the datasheet if available?




Signed-off-by: Andrew Lunn 
---
  drivers/net/phy/ste10Xp.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/phy/ste10Xp.c b/drivers/net/phy/ste10Xp.c
index fbd548a1ad84..2fe9a87b55b5 100644
--- a/drivers/net/phy/ste10Xp.c
+++ b/drivers/net/phy/ste10Xp.c
@@ -86,7 +86,7 @@ static struct phy_driver ste10xp_pdriver[] = {
.phy_id = STE101P_PHY_ID,
.phy_id_mask = 0xfff0,
.name = "STe101p",
-   .features = PHY_BASIC_FEATURES | SUPPORTED_Pause,
+   .features = PHY_BASIC_FEATURES,
.flags = PHY_HAS_INTERRUPT,
.config_init = ste10Xp_config_init,
.ack_interrupt = ste10Xp_ack_interrupt,
@@ -97,7 +97,7 @@ static struct phy_driver ste10xp_pdriver[] = {
.phy_id = STE100P_PHY_ID,
.phy_id_mask = 0x,
.name = "STe100p",
-   .features = PHY_BASIC_FEATURES | SUPPORTED_Pause,
+   .features = PHY_BASIC_FEATURES,
.flags = PHY_HAS_INTERRUPT,
.config_init = ste10Xp_config_init,
.ack_interrupt = ste10Xp_ack_interrupt,



--
Florian

Re: [PATCH net-next 12/12] net: ethernet: Add helper to determine if pause configuration is supported

2018-09-03 Thread Florian Fainelli





On 9/2/2018 10:06 AM, Andrew Lunn wrote:

Rather than have MAC drivers open code the test, add a helper in
phylib. This will help when we change the type of phydev->supported.

Signed-off-by: Andrew Lunn 


Reviewed-by: Florian Fainelli 
--
Florian

Re: [PATCH net-next 11/12] net: ethernet: Add helper for set_pauseparam for Pause

2018-09-03 Thread Florian Fainelli





On 9/2/2018 10:06 AM, Andrew Lunn wrote:

ethtool can be used to enable/disable pause. Add a helper to configure
the PHY when Pause is supported.


One comment, see below:

[snip]

+/**
+ * phy_set_pause - Configure Pause
+ * @phydev: target phy_device struct
+ * @rx: Receiver Pause is supported
+ * @autoneg: Auto neg should be used
+ *
+ * Description: Configure advertised Pause support depending on if
+ * receiver pause and pause auto neg is supported. Generally called
+ * from the set_pauseparam .ndo.
+ */
+void phy_set_pause(struct phy_device *phydev, bool rx, bool autoneg)
+{
+   phydev->supported &= ~SUPPORTED_Pause;
+
+   if (rx || autoneg)
+   phydev->supported |= SUPPORTED_Pause;
+
+   phydev->advertising = phydev->supported;


This is the logic from FEC, but I think the one from bcm63xx_enet.c is 
actually the correct one, you can enable symmetric pause support if it 
is enabled for both the RX and TX paths here.


Similar comment to patch 9, I would name this phy_set_sym_pause() for 
clarity.

--
Florian

Re: [PATCH net-next 10/12] net: ethernet: Add helper for set_pauseparam for Asym Pause

2018-09-03 Thread Florian Fainelli





On 9/2/2018 10:06 AM, Andrew Lunn wrote:

ethtool can be used to enable/disable pause. Add a helper to configure
the PHY when asym pause is supported.


Don't you want to go one step further and incorporate the logic that 
xgenet, tg3, gianfar and others have? That is: look at the currently 
advertised parameters, determine what changed, and re-start 
auto-negotiation as a result of it being enabled and something changed?


Could be done as a follow-up patch I suppose, although see below:

[snip]

index 4f50f11718f4..dfe03afd00b0 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c
@@ -306,7 +306,6 @@ static int xgene_set_pauseparam(struct net_device *ndev,
  {
struct xgene_enet_pdata *pdata = netdev_priv(ndev);
struct phy_device *phydev = ndev->phydev;
-   u32 oldadv, newadv;
  
  	if (phy_interface_mode_is_rgmii(pdata->phy_mode) ||

pdata->phy_mode == PHY_INTERFACE_MODE_SGMII) {
@@ -322,29 +321,12 @@ static int xgene_set_pauseparam(struct net_device *ndev,
pdata->tx_pause = pp->tx_pause;
pdata->rx_pause = pp->rx_pause;
  
-		oldadv = phydev->advertising;

-   newadv = oldadv & ~(ADVERTISED_Pause | ADVERTISED_Asym_Pause);
+   phy_set_asym_pause(phydev, pp->rx_pause,  pp->tx_pause);
  
-		if (pp->rx_pause)

-   newadv |= ADVERTISED_Pause | ADVERTISED_Asym_Pause;
-
-   if (pp->tx_pause)
-   newadv ^= ADVERTISED_Asym_Pause;
-
-   if (oldadv ^ newadv) {
-   phydev->advertising = newadv;
-
-   if (phydev->autoneg)
-   return phy_start_aneg(phydev);
-


You have remove the part that checks for phydev->autoneg here, was that 
intentional?



-   if (!pp->autoneg) {
-   pdata->mac_ops->flowctl_tx(pdata,
-  pdata->tx_pause);
-   pdata->mac_ops->flowctl_rx(pdata,
-  pdata->rx_pause);
-   }
+   if (!pp->autoneg) {
+   pdata->mac_ops->flowctl_tx(pdata, pdata->tx_pause);
+   pdata->mac_ops->flowctl_rx(pdata, pdata->rx_pause);
}



--
Florian

Re: [PATCH net-next 02/12] net: phy: et1011c: Remove incorrect missing 1000 Half

2018-09-03 Thread Florian Fainelli





On 9/2/2018 10:06 AM, Andrew Lunn wrote:

The driver indicates it can do 10/100 full and half duplex, plus 1G
Full. The datasheet indicates 1G half is also supported. So make use
of the standard PHY_GBIT_FEATURES.

It could be, this was added because there is a MAC which does not
support 1G half. Bit this is the wrong place to enforce this.


Indeed, this should not be left to the PHY to decide, although in my 
experience using 1000/Half is always a "roll the dice" thing, since it 
is not AFAIR properly specified.




Signed-off-by: Andrew Lunn 


Reviewed-by: Florian Fainelli 
--
Florian

[PATCH mlx5-next] net/mlx5: Add memic command opcode to command checker

2018-09-03 Thread Leon Romanovsky

From: Ariel Levkovich 

Adding the alloc/dealloc memic FW command opcodes to
avoid "unknown command" prints in the command string
converter and internal error status handler.

Signed-off-by: Ariel Levkovich 
Signed-off-by: Leon Romanovsky 
---
 drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c 
b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
index 853240f5ae82..39750fca371d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
@@ -312,6 +312,7 @@ static int mlx5_internal_err_ret_value(struct mlx5_core_dev 
*dev, u16 op,
case MLX5_CMD_OP_DEALLOC_MODIFY_HEADER_CONTEXT:
case MLX5_CMD_OP_FPGA_DESTROY_QP:
case MLX5_CMD_OP_DESTROY_GENERAL_OBJECT:
+   case MLX5_CMD_OP_DEALLOC_MEMIC:
return MLX5_CMD_STAT_OK;
 
case MLX5_CMD_OP_QUERY_HCA_CAP:
@@ -435,6 +436,7 @@ static int mlx5_internal_err_ret_value(struct mlx5_core_dev 
*dev, u16 op,
case MLX5_CMD_OP_CREATE_GENERAL_OBJECT:
case MLX5_CMD_OP_MODIFY_GENERAL_OBJECT:
case MLX5_CMD_OP_QUERY_GENERAL_OBJECT:
+   case MLX5_CMD_OP_ALLOC_MEMIC:
*status = MLX5_DRIVER_STATUS_ABORTED;
*synd = MLX5_DRIVER_SYND;
return -EIO;
@@ -617,6 +619,8 @@ const char *mlx5_command_str(int command)
MLX5_COMMAND_STR_CASE(MODIFY_GENERAL_OBJECT);
MLX5_COMMAND_STR_CASE(QUERY_GENERAL_OBJECT);
MLX5_COMMAND_STR_CASE(QUERY_MODIFY_HEADER_CONTEXT);
+   MLX5_COMMAND_STR_CASE(ALLOC_MEMIC);
+   MLX5_COMMAND_STR_CASE(DEALLOC_MEMIC);
default: return "unknown command opcode";
}
 }
-- 
2.14.4

Re: [PATCH net-next] net: sched: action_ife: take reference to meta module

2018-09-03 Thread Cong Wang

On Mon, Sep 3, 2018 at 12:10 AM Vlad Buslov  wrote:
>
> Recent refactoring of add_metainfo() caused use_all_metadata() to add
> metainfo to ife action metalist without taking reference to module. This
> causes warning in module_put called from ife action cleanup function.
>
> Implement add_metainfo_and_get_ops() function that returns with reference
> to module taken if metainfo was added successfully, and call it from
> use_all_metadata(), instead of calling __add_metainfo() directly.
>
...
>
> Fixes: 5ffe57da29b3 ("act_ife: fix a potential deadlock")
> Signed-off-by: Vlad Buslov 

This patch should be applied to -net rather than -net-next.

Acked-by: Cong Wang 

One nit below.

>  static int add_metainfo(struct tcf_ife_info *ife, u32 metaid, void *metaval,
> int len, bool exists)
>  {
> @@ -349,7 +364,8 @@ static int use_all_metadata(struct tcf_ife_info *ife, 
> bool exists)
>
> read_lock(&ife_mod_lock);
> list_for_each_entry(o, &ifeoplist, list) {
> -   rc = __add_metainfo(o, ife, o->metaid, NULL, 0, true, exists);
> +   rc = add_metainfo_and_get_ops(o, ife, o->metaid, NULL, 0, 
> true,
> + exists);

Nit: you can fold constants into this helper as it only has one caller.

Re: [PATCH net-next 09/12] net: ethernet: Add helper for MACs which support pause

2018-09-03 Thread Florian Fainelli





On 9/2/2018 10:06 AM, Andrew Lunn wrote:

Rather than have the MAC drivers manipulate phydev members, add a
helper function for MACs supporting Pause, but not Asym Pause.

Signed-off-by: Andrew Lunn 


Reviewed-by: Florian Fainelli 

I wonder if this would be better named phy_support_sym_pause() as 
opposed to asym_pause()?

--
Florian

Re: [PATCH net-next 08/12] net: ethernet: Add helper for MACs which support asym pause

2018-09-03 Thread Florian Fainelli





On 9/2/2018 10:06 AM, Andrew Lunn wrote:

Rather than have the MAC drivers manipulate phydev members to indicate
they support Asym Pause, add a helper function.

Signed-off-by: Andrew Lunn 


Reviewed-by: Florian Fainelli 

Just one nit in tg3.c:


diff --git a/drivers/net/ethernet/broadcom/tg3.c 
b/drivers/net/ethernet/broadcom/tg3.c
index eab00239a47a..9aa7955d5d31 100644
--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -2123,15 +2123,13 @@ static int tg3_phy_init(struct tg3 *tp)
case PHY_INTERFACE_MODE_RGMII:
if (!(tp->phy_flags & TG3_PHYFLG_10_100_ONLY)) {
phy_set_max_speed(phydev, SPEED_1000);
-   phydev->supported |= (SUPPORTED_Pause |
- SUPPORTED_Asym_Pause);
+   phy_support_asym_pause(phydev);
break;
}
/* fallthru */
case PHY_INTERFACE_MODE_MII:
phy_set_max_speed(phydev, SPEED_100);
-   phydev->supported |= (SUPPORTED_Pause |
- SUPPORTED_Asym_Pause);
+   phy_support_asym_pause(phydev);


Your indentation is off by a tab here.
--
Florian

Re: [PATCH net-next] net: sched: action_ife: take reference to meta module

2018-09-03 Thread Cong Wang

On Mon, Sep 3, 2018 at 10:12 AM Vlad Buslov  wrote:
>
>
> On Mon 03 Sep 2018 at 17:03, Cong Wang  wrote:
> > On Mon, Sep 3, 2018 at 12:10 AM Vlad Buslov  wrote:
> >>
> >> Recent refactoring of add_metainfo() caused use_all_metadata() to add
> >> metainfo to ife action metalist without taking reference to module. This
> >> causes warning in module_put called from ife action cleanup function.
> >>
> >> Implement add_metainfo_and_get_ops() function that returns with reference
> >> to module taken if metainfo was added successfully, and call it from
> >> use_all_metadata(), instead of calling __add_metainfo() directly.
> >
> > Good catch!
> >
> > I thought every entry in ifeoplist must hold a refcnt to its module, looks
> > like I was wrong.
> >
> >
> >> read_lock(&ife_mod_lock);
> >> list_for_each_entry(o, &ifeoplist, list) {
> >> -   rc = __add_metainfo(o, ife, o->metaid, NULL, 0, true, 
> >> exists);
> >> +   rc = add_metainfo_and_get_ops(o, ife, o->metaid, NULL, 0, 
> >> true,
> >> + exists);
> >> if (rc == 0)
> >> installed += 1;
> >
> > I am afraid you have to rollback on failure inside this loop, that is,
> > releasing all previous module refcnt properly on error.
>
> Do I? This function looks like it is explicitly designed to succeed if
> at least one metainfo was successfully added. And this is how it was
> originally implemented before deadlock fix refactoring.

Hmm, well, it is taken care by tcf_idr_release() via _tcf_ife_cleanup()...

So, your patch should be fine.

Re: [PATCH net-next 07/12] net: ethernet: Add helper to remove a supported link mode

2018-09-03 Thread Florian Fainelli





On 9/2/2018 10:06 AM, Andrew Lunn wrote:

Some MAC hardware cannot support a subset of link modes. e.g. often
1Gbps Full duplex is supported, but Half duplex is not. Add a helper
to remove such a link mode.

Signed-off-by: Andrew Lunn 


Reviewed-by: Florian Fainelli 
--
Florian

Re: [PATCH net-next 06/12] net: ethernet: Fix up drivers masking pause support

2018-09-03 Thread Florian Fainelli





On 9/2/2018 10:06 AM, Andrew Lunn wrote:

PHY drivers don't indicate they support pause. They expect MAC drivers
to enable its support if the MAC has the needed hardware. Thus MAC
drivers should not mask Pause support, but enable it.

Change a few ANDs to ORs.

Signed-off-by: Andrew Lunn 


Reviewed-by: Florian Fainelli 

Adding Michael for tg3.c


---
  drivers/net/ethernet/broadcom/tg3.c | 4 ++--
  drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mdio.c | 4 ++--
  drivers/net/ethernet/mediatek/mtk_eth_soc.c | 2 +-
  drivers/net/ethernet/smsc/smsc911x.c| 2 +-
  drivers/net/ethernet/smsc/smsc9420.c| 2 +-
  5 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/tg3.c 
b/drivers/net/ethernet/broadcom/tg3.c
index cdc32724c9d9..eab00239a47a 100644
--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -2123,14 +2123,14 @@ static int tg3_phy_init(struct tg3 *tp)
case PHY_INTERFACE_MODE_RGMII:
if (!(tp->phy_flags & TG3_PHYFLG_10_100_ONLY)) {
phy_set_max_speed(phydev, SPEED_1000);
-   phydev->supported &= (SUPPORTED_Pause |
+   phydev->supported |= (SUPPORTED_Pause |
  SUPPORTED_Asym_Pause);
break;
}
/* fallthru */
case PHY_INTERFACE_MODE_MII:
phy_set_max_speed(phydev, SPEED_100);
-   phydev->supported &= (SUPPORTED_Pause |
+   phydev->supported |= (SUPPORTED_Pause |
  SUPPORTED_Asym_Pause);
break;
default:
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mdio.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mdio.c
index 398971a062f4..05b15d254e32 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mdio.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mdio.c
@@ -10,8 +10,6 @@
  
  #define HCLGE_PHY_SUPPORTED_FEATURES	(SUPPORTED_Autoneg | \

 SUPPORTED_TP | \
-SUPPORTED_Pause | \
-SUPPORTED_Asym_Pause | \
 PHY_10BT_FEATURES | \
 PHY_100BT_FEATURES | \
 PHY_1000BT_FEATURES)
@@ -213,6 +211,8 @@ int hclge_mac_connect_phy(struct hclge_dev *hdev)
}
  
  	phydev->supported &= HCLGE_PHY_SUPPORTED_FEATURES;

+   phydev->supported |= SUPPORTED_Pause | SUPPORTED_Asym_Pause;
+
phydev->advertising = phydev->supported;
  
  	return 0;

diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c 
b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index e93b5375504b..db231bda7c2a 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -360,7 +360,7 @@ static int mtk_phy_connect(struct net_device *dev)
SUPPORTED_Pause | SUPPORTED_Asym_Pause;
  
  	phy_set_max_speed(dev->phydev, SPEED_1000);

-   dev->phydev->supported &= SUPPORTED_Pause | SUPPORTED_Asym_Pause;
+   dev->phydev->supported |= SUPPORTED_Pause | SUPPORTED_Asym_Pause;
dev->phydev->advertising = dev->phydev->supported |
ADVERTISED_Autoneg;
phy_start_aneg(dev->phydev);
diff --git a/drivers/net/ethernet/smsc/smsc911x.c 
b/drivers/net/ethernet/smsc/smsc911x.c
index f84dbd0beb8e..3e34bf53f055 100644
--- a/drivers/net/ethernet/smsc/smsc911x.c
+++ b/drivers/net/ethernet/smsc/smsc911x.c
@@ -1051,7 +1051,7 @@ static int smsc911x_mii_probe(struct net_device *dev)
phy_set_max_speed(phydev, SPEED_100);
  
  	/* mask with MAC supported features */

-   phydev->supported &= (SUPPORTED_Pause | SUPPORTED_Asym_Pause);
+   phydev->supported |= (SUPPORTED_Pause | SUPPORTED_Asym_Pause);
phydev->advertising = phydev->supported;
  
  	pdata->last_duplex = -1;

diff --git a/drivers/net/ethernet/smsc/smsc9420.c 
b/drivers/net/ethernet/smsc/smsc9420.c
index 795f60d92611..326177384544 100644
--- a/drivers/net/ethernet/smsc/smsc9420.c
+++ b/drivers/net/ethernet/smsc/smsc9420.c
@@ -1138,7 +1138,7 @@ static int smsc9420_mii_probe(struct net_device *dev)
phy_set_max_speed(phydev, SPEED_100);
  
  	/* mask with MAC supported features */

-   phydev->supported &= (SUPPORTED_Pause | SUPPORTED_Asym_Pause);
+   phydev->supported |= (SUPPORTED_Pause | SUPPORTED_Asym_Pause);
phydev->advertising = phydev->supported;
  
  	phy_attached_info(phydev);




--
Florian

Re: [PATCH net-next 05/12] net: ethernet: genet: Fix speed selection

2018-09-03 Thread Florian Fainelli





On 9/2/2018 10:06 AM, Andrew Lunn wrote:

The phy supported speed is being used to determine if the MAC should
be configured to 100 or 1G. The masking logic is broken. Instead, look
1G supported speeds to enable 1G MAC support.

Signed-off-by: Andrew Lunn 
---
  drivers/net/ethernet/broadcom/genet/bcmmii.c | 7 +++
  1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/genet/bcmmii.c 
b/drivers/net/ethernet/broadcom/genet/bcmmii.c
index 881e566730f3..69587a61e8d6 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmmii.c
+++ b/drivers/net/ethernet/broadcom/genet/bcmmii.c
@@ -220,11 +220,10 @@ int bcmgenet_mii_config(struct net_device *dev, bool init)
 * capabilities, use that knowledge to also configure the
 * Reverse MII interface correctly.
 */
-   if ((dev->phydev->supported & PHY_BASIC_FEATURES) ==
-   PHY_BASIC_FEATURES)
-   port_ctrl = PORT_MODE_EXT_RVMII_25;
-   else
+   if (dev->phydev->supported & PHY_1000BT_FEATURES)
port_ctrl = PORT_MODE_EXT_RVMII_50;
+   else
+   port_ctrl = PORT_MODE_EXT_RVMII_25;


Your change is not wrong, but the driver was wrong previously, reverse 
MII is 10/100 only, but the reference clock selection (25Mhz or 50Mhz) 
cannot be automatically discovered and we need the help of Device 
Tree/platform data here. I will submit a separate patch for "net", and 
this one can go in for now.


Reviewed-by: Florian Fainelli 
--
Florian

Re: [PATCH net-next 04/12] net: ethernet: Use phy_set_max_speed() to limit advertised speed

2018-09-03 Thread Florian Fainelli





On 9/2/2018 10:06 AM, Andrew Lunn wrote:

Many Ethernet MAC drivers want to limit the PHY to only advertise a
maximum speed of 100Mbs or 1Gbps. Rather than using a mask, make use
of the helper function phy_set_max_speed().

Signed-off-by: Andrew Lunn 


Reviewed-by: Florian Fainelli 
--
Florian

Re: [PATCH net-next 03/12] net: phy: bcm63xx: Allow to be built with COMPILE_TEST

2018-09-03 Thread Florian Fainelli





On 9/2/2018 10:06 AM, Andrew Lunn wrote:

There is nothing in this driver which prevents it to be compiled for
other architectures. Add COMPILE_TEST so we get better compile test
coverage.

Signed-off-by: Andrew Lunn 


Reviewed-by: Florian Fainelli 
--
Florian

[PATCH mlx5-next] net/mlx5: Fix atomic_mode enum values

2018-09-03 Thread Leon Romanovsky

From: Moni Shoua 

The field atomic_mode is 4 bits wide and therefore can hold values
from 0x0 to 0xf. Remove the unnecessary 20 bit shift that made the values
be incorrect. While that, remove unused enum values.

Fixes: 57cda166bbe0 ("net/mlx5: Add DCT command interface")
Signed-off-by: Moni Shoua 
Reviewed-by: Artemy Kovalyov 
Signed-off-by: Leon Romanovsky 
---
 include/linux/mlx5/driver.h | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 66c1170500cf..89caed98ef0b 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -163,10 +163,7 @@ enum mlx5_dcbx_oper_mode {
 };
 
 enum mlx5_dct_atomic_mode {
-   MLX5_ATOMIC_MODE_DCT_OFF= 20,
-   MLX5_ATOMIC_MODE_DCT_NONE   = 0 << MLX5_ATOMIC_MODE_DCT_OFF,
-   MLX5_ATOMIC_MODE_DCT_IB_COMP= 1 << MLX5_ATOMIC_MODE_DCT_OFF,
-   MLX5_ATOMIC_MODE_DCT_CX = 2 << MLX5_ATOMIC_MODE_DCT_OFF,
+   MLX5_ATOMIC_MODE_DCT_CX = 2,
 };
 
 enum {
-- 
2.14.4

Re: [PATCH net-next] net: sched: action_ife: take reference to meta module

2018-09-03 Thread Vlad Buslov



On Mon 03 Sep 2018 at 17:03, Cong Wang  wrote:
> On Mon, Sep 3, 2018 at 12:10 AM Vlad Buslov  wrote:
>>
>> Recent refactoring of add_metainfo() caused use_all_metadata() to add
>> metainfo to ife action metalist without taking reference to module. This
>> causes warning in module_put called from ife action cleanup function.
>>
>> Implement add_metainfo_and_get_ops() function that returns with reference
>> to module taken if metainfo was added successfully, and call it from
>> use_all_metadata(), instead of calling __add_metainfo() directly.
>
> Good catch!
>
> I thought every entry in ifeoplist must hold a refcnt to its module, looks
> like I was wrong.
>
>
>> read_lock(&ife_mod_lock);
>> list_for_each_entry(o, &ifeoplist, list) {
>> -   rc = __add_metainfo(o, ife, o->metaid, NULL, 0, true, 
>> exists);
>> +   rc = add_metainfo_and_get_ops(o, ife, o->metaid, NULL, 0, 
>> true,
>> + exists);
>> if (rc == 0)
>> installed += 1;
>
> I am afraid you have to rollback on failure inside this loop, that is,
> releasing all previous module refcnt properly on error.

Do I? This function looks like it is explicitly designed to succeed if
at least one metainfo was successfully added. And this is how it was
originally implemented before deadlock fix refactoring.

Re: [PATCH net-next] net: sched: action_ife: take reference to meta module

2018-09-03 Thread Cong Wang

On Mon, Sep 3, 2018 at 12:10 AM Vlad Buslov  wrote:
>
> Recent refactoring of add_metainfo() caused use_all_metadata() to add
> metainfo to ife action metalist without taking reference to module. This
> causes warning in module_put called from ife action cleanup function.
>
> Implement add_metainfo_and_get_ops() function that returns with reference
> to module taken if metainfo was added successfully, and call it from
> use_all_metadata(), instead of calling __add_metainfo() directly.

Good catch!

I thought every entry in ifeoplist must hold a refcnt to its module, looks
like I was wrong.


> read_lock(&ife_mod_lock);
> list_for_each_entry(o, &ifeoplist, list) {
> -   rc = __add_metainfo(o, ife, o->metaid, NULL, 0, true, exists);
> +   rc = add_metainfo_and_get_ops(o, ife, o->metaid, NULL, 0, 
> true,
> + exists);
> if (rc == 0)
> installed += 1;

I am afraid you have to rollback on failure inside this loop, that is,
releasing all previous module refcnt properly on error.

Re: [PATCH net-next 1/2] net: stmmac: Rework coalesce timer and fix multi-queue races

2018-09-03 Thread Jerome Brunet

On Mon, 2018-09-03 at 16:19 +0100, Jose Abreu wrote:
> On 03-09-2018 15:07, Jerome Brunet wrote:
> > 
> > You had it on what you sent in the RFT, but this different.
> 
> Yeah, I had to fix the logic where tx queues != rx queues...
> 
> > 
> > Like with the RFT, the network breakdown we had is no longer reproduced.
> > However this patch wreck the Rx throughput (680MBps -> 35MBps)
> 
> Damn, thats low. And I cant reproduce it here :/
> 
> Strange because I barely messed around with RX path...
> 
> Can you try attached patch in top of this one please?

Situation is even worse with this.
I'm using an NFS root filesystem. With your fixup, I'm not reaching the prompt
anymore. Looks like a the same kind of network breakdown we had previously

> 
> > 
> > BTW, this patch and the RFT assume that 4ae0169fd1b3 ("net: stmmac: Do not 
> > keep
> > rearming the coalesce timer in stmmac_xmit") is still applied but I believe
> > David reverted the patch.
> > 
> > If you still need this change, you should include it back in your changeset.
> 
> Yes I know it was reverted but -net was not merged into -next yet...
> 
> Thanks and Best Regards,
> Jose Miguel Abreu
> 
> > 
> > > Thanks and Best Regards,
> > > Jose Miguel Abreu
> > > ---
> > >  drivers/net/ethernet/stmicro/stmmac/common.h  |   4 +-
> > >  drivers/net/ethernet/stmicro/stmmac/stmmac.h  |   7 +-
> > >  drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 177 
> > > +++---
> > >  3 files changed, 126 insertions(+), 62 deletions(-)
> 
>

[PATCH v3 3/3] IB/ipoib: Log sysfs 'dev_id' accesses from userspace

2018-09-03 Thread Arseny Maslennikov

Signed-off-by: Arseny Maslennikov 
---
 drivers/infiniband/ulp/ipoib/ipoib_main.c | 38 +++
 1 file changed, 38 insertions(+)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c 
b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index 30f840f874b3..7386e5bde3d3 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -2386,6 +2386,42 @@ int ipoib_add_pkey_attr(struct net_device *dev)
return device_create_file(&dev->dev, &dev_attr_pkey);
 }
 
+/*
+ * We erroneously exposed the iface's port number in the dev_id
+ * sysfs field long after dev_port was introduced for that purpose[1],
+ * and we need to stop everyone from relying on that.
+ * Let's overload the shower routine for the dev_id file here
+ * to gently bring the issue up.
+ *
+ * [1] https://www.spinics.net/lists/netdev/msg272123.html
+ */
+static ssize_t dev_id_show(struct device *dev,
+  struct device_attribute *attr, char *buf)
+{
+   struct net_device *ndev = to_net_dev(dev);
+   ssize_t ret = -EINVAL;
+
+   if (ndev->dev_id == ndev->dev_port) {
+   netdev_info_once(ndev,
+   "\"%s\" wants to know my dev_id. "
+   "Should it look at dev_port instead?\n",
+   current->comm);
+   netdev_info_once(ndev,
+   "See Documentation/ABI/testing/sysfs-class-net for more 
info.\n");
+   }
+
+   ret = sprintf(buf, "%#x\n", ndev->dev_id);
+
+   return ret;
+}
+static DEVICE_ATTR_RO(dev_id);
+
+int ipoib_intercept_dev_id_attr(struct net_device *dev)
+{
+   device_remove_file(&dev->dev, &dev_attr_dev_id);
+   return device_create_file(&dev->dev, &dev_attr_dev_id);
+}
+
 static struct net_device *ipoib_add_port(const char *format,
 struct ib_device *hca, u8 port)
 {
@@ -2427,6 +2463,8 @@ static struct net_device *ipoib_add_port(const char 
*format,
 */
ndev->priv_destructor = ipoib_intf_free;
 
+   if (ipoib_intercept_dev_id_attr(ndev))
+   goto sysfs_failed;
if (ipoib_cm_add_mode_attr(ndev))
goto sysfs_failed;
if (ipoib_add_pkey_attr(ndev))
-- 
2.19.0.rc1

[PATCH v3 1/3] Documentation/ABI: document /sys/class/net/*/dev_port

2018-09-03 Thread Arseny Maslennikov

The sysfs field was introduced 4 years ago along with fixes to various
drivers that erroneously used `dev_id' for that purpose, but it was not
properly documented anywhere.
See commit v3.14-rc3-739-g3f85944fe207.

Signed-off-by: Arseny Maslennikov 
---
 Documentation/ABI/testing/sysfs-class-net | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-class-net 
b/Documentation/ABI/testing/sysfs-class-net
index 2f1788111cd9..ec2232f6a949 100644
--- a/Documentation/ABI/testing/sysfs-class-net
+++ b/Documentation/ABI/testing/sysfs-class-net
@@ -91,6 +91,24 @@ Description:
stacked (e.g: VLAN interfaces) but still have the same MAC
address as their parent device.
 
+What:  /sys/class/net//dev_port
+Date:  February 2014
+KernelVersion: 3.15
+Contact:   netdev@vger.kernel.org
+Description:
+   Indicates the port number of this network device, formatted
+   as a decimal value. Some NICs have multiple independent ports
+   on the same PCI bus, device and function. This attribute allows
+   userspace to distinguish the respective interfaces.
+
+   Note: some device drivers started to use 'dev_id' for this
+   purpose since long before 3.15 and have not adopted the new
+   attribute ever since. To query the port number, some tools look
+   exclusively at 'dev_port', while others only consult 'dev_id'.
+   If a network device has multiple client adapter ports as
+   described in the previous paragraph and does not set this
+   attribute to its port number, it's a kernel bug.
+
 What:  /sys/class/net//dormant
 Date:  March 2006
 KernelVersion: 2.6.17
-- 
2.19.0.rc1

[PATCH v3 2/3] IB/ipoib: Use dev_port to expose network interface port numbers

2018-09-03 Thread Arseny Maslennikov

Some InfiniBand network devices have multiple ports on the same PCI
function. This initializes the `dev_port' sysfs field of those
network interfaces with their port number.

Prior to this the kernel erroneously used the `dev_id' sysfs
field of those network interfaces to convey the port number to userspace.

The use of `dev_id' was considered correct until Linux 3.15,
when another field, `dev_port', was defined for this particular
purpose and `dev_id' was reserved for distinguishing stacked ifaces
(e.g: VLANs) with the same hardware address as their parent device.

Similar fixes to net/mlx4_en and many other drivers, which started
exporting this information through `dev_id' before 3.15, were accepted
into the kernel 4 years ago.
See 76a066f2a2a0 (`net/mlx4_en: Expose port number through sysfs').

Signed-off-by: Arseny Maslennikov 
---
 drivers/infiniband/ulp/ipoib/ipoib_main.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c 
b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index e3d28f9ad9c0..30f840f874b3 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -1880,6 +1880,8 @@ static int ipoib_parent_init(struct net_device *ndev)
   sizeof(union ib_gid));
 
SET_NETDEV_DEV(priv->dev, priv->ca->dev.parent);
+   priv->dev->dev_port = priv->port - 1;
+   /* Let's set this one too for backwards compatibility. */
priv->dev->dev_id = priv->port - 1;
 
return 0;
-- 
2.19.0.rc1

[PATCH v3 0/3] IB/ipoib: Use dev_port to disambiguate port numbers

2018-09-03 Thread Arseny Maslennikov

Pre-3.15 userspace had trouble distinguishing different ports
of a NIC on a single PCI bus/device/function. To solve this,
a sysfs field `dev_port' was introduced quite a while ago
(commit v3.14-rc3-739-g3f85944fe207), and some relevant device
drivers were fixed to use it, but not in case of IPoIB.

The convention for some reason never got documented in the kernel, but
was immediately adopted by userspace (notably udev[1][2], biosdevname[3])

1/3 documents the sysfs field — that's why I'm CC-ing netdev.

This series was tested on and applies to 4.19-rc2.

[1] https://lists.freedesktop.org/archives/systemd-devel/2014-June/020788.html
[2] https://lists.freedesktop.org/archives/systemd-devel/2014-July/020804.html
[3] 
https://github.com/CloudAutomationNTools/biosdevname/blob/c795d51dd93a5309652f0d635f12a3ecfabfaa72/src/eths.c#L38

v1->v2: replace a line instead of inserting and then removing.
v2->v3: restore both attributes, output a notice of deprecation to kmsg.

Arseny Maslennikov (3):
  Documentation/ABI: document /sys/class/net/*/dev_port
  IB/ipoib: Use dev_port to expose network interface port numbers
  IB/ipoib: Log sysfs 'dev_id' accesses from userspace

 Documentation/ABI/testing/sysfs-class-net | 18 ++
 drivers/infiniband/ulp/ipoib/ipoib_main.c | 40 +++
 2 files changed, 58 insertions(+)

-- 
2.19.0.rc1

Re: [RFT net-next] net: stmmac: Rework coalesce timer and fix multi-queue races

2018-09-03 Thread Jose Abreu

On 03-09-2018 16:38, Jerome Brunet wrote:
> On Mon, 2018-09-03 at 16:22 +0100, Jose Abreu wrote:
>> On 03-09-2018 15:10, Jerome Brunet wrote:
>>> On Mon, 2018-09-03 at 12:47 +0100, Jose Abreu wrote:
 On 03-09-2018 11:16, Jerome Brunet wrote:
> No notable change. Rx is fine but Tx:
> [  5]   3.00-4.00   sec  3.55 MBytes  29.8 Mbits/sec   51   12.7 KBytes
>
> I suppose the problem as something to do with the retries. When doing Tx 
> test
> alone, we don't have such a things a throughput where we expect it to be.
 Yeah, I just remembered you are not using GMAC4 so it wouldn't
 make a difference. Is your version 3.710? If so please try adding
 the following compatible to your DT bindings "snps,dwmac-3.710".
>>> According to the documentation, it is a 3.70a but I learn (the hard way) 
>>> not to
>>> trust the documentation too much. Is there anyway to make sure which 
>>> version we
>>> have. Like a register to read ?
>> It should be dumped at probe by a string like this one:
>>
>> "User ID: 0xXY, Synopsys ID: 0xXZ"
> User ID: 0x11, Synopsys ID: 0x37 ? What to does it map to ?

Its 3.7. As for the User ID this can be changed by final HW team
so I can't confirm what it means.

>
>>> Out of curiosity, I changed the compatible to "snps,dwmac-3.710" anyway. For
>>> some reason, the MDIO bus failed to register with this. Since it is not the
>>> documented version, I did not check why.
>> No you can't change. You need to add it. So it should stay like this:
>>
>> compatible = "amlogic,meson-gxbb-dwmac", "snps,dwmac",
>> "snps,dwmac-3.710";
> Adding "snps,dwmac-3.710" does not change anything for me.
> Having both Tx and Rx at the same time still wreck Tx throughput 
> unfortunately 

Okay, so you said that there are lots of retries: can you disable
COE at all ? (it should be something like: ethtool -K eth0 rx off
tx off).

Thanks and Best Regards,
Jose Miguel Abreu

>
>> Thanks and Best Regards,
>> Jose Miguel Abreu
>>
> By the way, your mailer (and its auto 80 column rule I suppose) made the 
> patch
> below a bit harder to apply
 Sorry. Next time I will send as attachment.
>>> No worries
>>>
 Thanks and Best Regards,
 Jose Miguel Abreu
>>
>

Re: [RFT net-next] net: stmmac: Rework coalesce timer and fix multi-queue races

2018-09-03 Thread Jerome Brunet

On Mon, 2018-09-03 at 16:22 +0100, Jose Abreu wrote:
> On 03-09-2018 15:10, Jerome Brunet wrote:
> > On Mon, 2018-09-03 at 12:47 +0100, Jose Abreu wrote:
> > > On 03-09-2018 11:16, Jerome Brunet wrote:
> > > > No notable change. Rx is fine but Tx:
> > > > [  5]   3.00-4.00   sec  3.55 MBytes  29.8 Mbits/sec   51   12.7 KBytes
> > > > 
> > > > I suppose the problem as something to do with the retries. When doing 
> > > > Tx test
> > > > alone, we don't have such a things a throughput where we expect it to 
> > > > be.
> > > 
> > > Yeah, I just remembered you are not using GMAC4 so it wouldn't
> > > make a difference. Is your version 3.710? If so please try adding
> > > the following compatible to your DT bindings "snps,dwmac-3.710".
> > 
> > According to the documentation, it is a 3.70a but I learn (the hard way) 
> > not to
> > trust the documentation too much. Is there anyway to make sure which 
> > version we
> > have. Like a register to read ?
> 
> It should be dumped at probe by a string like this one:
> 
> "User ID: 0xXY, Synopsys ID: 0xXZ"

User ID: 0x11, Synopsys ID: 0x37 ? What to does it map to ?

> 
> > 
> > Out of curiosity, I changed the compatible to "snps,dwmac-3.710" anyway. For
> > some reason, the MDIO bus failed to register with this. Since it is not the
> > documented version, I did not check why.
> 
> No you can't change. You need to add it. So it should stay like this:
> 
> compatible = "amlogic,meson-gxbb-dwmac", "snps,dwmac",
> "snps,dwmac-3.710";

Adding "snps,dwmac-3.710" does not change anything for me.
Having both Tx and Rx at the same time still wreck Tx throughput unfortunately 

> 
> Thanks and Best Regards,
> Jose Miguel Abreu
> 
> > 
> > > > By the way, your mailer (and its auto 80 column rule I suppose) made 
> > > > the patch
> > > > below a bit harder to apply
> > > 
> > > Sorry. Next time I will send as attachment.
> > 
> > No worries
> > 
> > > Thanks and Best Regards,
> > > Jose Miguel Abreu
> 
>

Re: [RFT net-next] net: stmmac: Rework coalesce timer and fix multi-queue races

2018-09-03 Thread Jose Abreu

On 03-09-2018 15:10, Jerome Brunet wrote:
> On Mon, 2018-09-03 at 12:47 +0100, Jose Abreu wrote:
>> On 03-09-2018 11:16, Jerome Brunet wrote:
>>> No notable change. Rx is fine but Tx:
>>> [  5]   3.00-4.00   sec  3.55 MBytes  29.8 Mbits/sec   51   12.7 KBytes
>>>
>>> I suppose the problem as something to do with the retries. When doing Tx 
>>> test
>>> alone, we don't have such a things a throughput where we expect it to be.
>> Yeah, I just remembered you are not using GMAC4 so it wouldn't
>> make a difference. Is your version 3.710? If so please try adding
>> the following compatible to your DT bindings "snps,dwmac-3.710".
> According to the documentation, it is a 3.70a but I learn (the hard way) not 
> to
> trust the documentation too much. Is there anyway to make sure which version 
> we
> have. Like a register to read ?

It should be dumped at probe by a string like this one:

"User ID: 0xXY, Synopsys ID: 0xXZ"

>
> Out of curiosity, I changed the compatible to "snps,dwmac-3.710" anyway. For
> some reason, the MDIO bus failed to register with this. Since it is not the
> documented version, I did not check why.

No you can't change. You need to add it. So it should stay like this:

compatible = "amlogic,meson-gxbb-dwmac", "snps,dwmac",
"snps,dwmac-3.710";

Thanks and Best Regards,
Jose Miguel Abreu

>
>>> By the way, your mailer (and its auto 80 column rule I suppose) made the 
>>> patch
>>> below a bit harder to apply
>> Sorry. Next time I will send as attachment.
> No worries
>
>> Thanks and Best Regards,
>> Jose Miguel Abreu
>

[PATCH net] net/mlx5: Fix SQ offset in QPs with small RQ

2018-09-03 Thread Tariq Toukan

Correct the formula for calculating the RQ page remainder,
which should be in byte granularity.  The result will be
non-zero only for RQs smaller than PAGE_SIZE, as an RQ size
is a power of 2.

Divide this by the SQ stride (MLX5_SEND_WQE_BB) to get the
SQ offset in strides granularity.

Fixes: d7037ad73daa ("net/mlx5: Fix QP fragmented buffer allocation")
Signed-off-by: Tariq Toukan 
Reviewed-by: Eran Ben Elisha 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/wq.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Hi Dave,
Please queue for -stable v4.18.

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/wq.c 
b/drivers/net/ethernet/mellanox/mlx5/core/wq.c
index 86478a6b99c5..c8c315eb5128 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/wq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/wq.c
@@ -139,14 +139,15 @@ int mlx5_wq_qp_create(struct mlx5_core_dev *mdev, struct 
mlx5_wq_param *param,
  struct mlx5_wq_ctrl *wq_ctrl)
 {
u32 sq_strides_offset;
+   u32 rq_pg_remainder;
int err;
 
mlx5_fill_fbc(MLX5_GET(qpc, qpc, log_rq_stride) + 4,
  MLX5_GET(qpc, qpc, log_rq_size),
  &wq->rq.fbc);
 
-   sq_strides_offset =
-   ((wq->rq.fbc.frag_sz_m1 + 1) % PAGE_SIZE) / MLX5_SEND_WQE_BB;
+   rq_pg_remainder   = mlx5_wq_cyc_get_byte_size(&wq->rq) % PAGE_SIZE;
+   sq_strides_offset = rq_pg_remainder / MLX5_SEND_WQE_BB;
 
mlx5_fill_fbc_offset(ilog2(MLX5_SEND_WQE_BB),
 MLX5_GET(qpc, qpc, log_sq_size),
-- 
1.8.3.1

Re: [PATCH net-next 1/2] net: stmmac: Rework coalesce timer and fix multi-queue races

2018-09-03 Thread Jose Abreu

On 03-09-2018 15:07, Jerome Brunet wrote:
>
> You had it on what you sent in the RFT, but this different.

Yeah, I had to fix the logic where tx queues != rx queues...

>
> Like with the RFT, the network breakdown we had is no longer reproduced.
> However this patch wreck the Rx throughput (680MBps -> 35MBps)

Damn, thats low. And I cant reproduce it here :/

Strange because I barely messed around with RX path...

Can you try attached patch in top of this one please?

>
> BTW, this patch and the RFT assume that 4ae0169fd1b3 ("net: stmmac: Do not 
> keep
> rearming the coalesce timer in stmmac_xmit") is still applied but I believe
> David reverted the patch.
>
> If you still need this change, you should include it back in your changeset.

Yes I know it was reverted but -net was not merged into -next yet...

Thanks and Best Regards,
Jose Miguel Abreu

>
>> Thanks and Best Regards,
>> Jose Miguel Abreu
>> ---
>>  drivers/net/ethernet/stmicro/stmmac/common.h  |   4 +-
>>  drivers/net/ethernet/stmicro/stmmac/stmmac.h  |   7 +-
>>  drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 177 
>> +++---
>>  3 files changed, 126 insertions(+), 62 deletions(-)
>

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 14f890f2a970..3c7cfda80433 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -2247,10 +2247,8 @@ static void stmmac_tx_timer(struct timer_list *t)
 	struct stmmac_tx_queue *tx_q = from_timer(tx_q, t, txtimer);
 	struct stmmac_priv *priv = tx_q->priv_data;
 
-	if (napi_schedule_prep(&tx_q->napi)) {
-		stmmac_disable_dma_irq(priv, priv->ioaddr, tx_q->queue_index);
+	if (napi_schedule_prep(&tx_q->napi))
 		__napi_schedule(&tx_q->napi);
-	}
 
 	tx_q->tx_timer_active = 0;
 }

[PATCH v2 net-next] packet: add sockopt to ignore outgoing packets

2018-09-03 Thread Vincent Whitchurch

Currently, the only way to ignore outgoing packets on a packet socket is
via the BPF filter.  With MSG_ZEROCOPY, packets that are looped into
AF_PACKET are copied in dev_queue_xmit_nit(), and this copy happens even
if the filter run from packet_rcv() would reject them.  So the presence
of a packet socket on the interface takes away the benefits of
MSG_ZEROCOPY, even if the packet socket is not interested in outgoing
packets.  (Even when MSG_ZEROCOPY is not used, the skb is unnecessarily
cloned, but the cost for that is much lower.)

Add a socket option to allow AF_PACKET sockets to ignore outgoing
packets to solve this.  Note that the *BSDs already have something
similar: BIOCSSEESENT/BIOCSDIRECTION and BIOCSDIRFILT.

The first intended user is lldpd.

Signed-off-by: Vincent Whitchurch 
---
v2: Stricter value validation.
Moved ignore check out of skb_loop_sk().

 include/linux/netdevice.h  |  1 +
 include/uapi/linux/if_packet.h |  1 +
 net/core/dev.c |  3 +++
 net/packet/af_packet.c | 17 +
 4 files changed, 22 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index ca5ab98053c8..8ef14d9edc58 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2317,6 +2317,7 @@ static inline struct sk_buff 
*call_gro_receive_sk(gro_receive_sk_t cb,
 
 struct packet_type {
__be16  type;   /* This is really htons(ether_type). */
+   boolignore_outgoing;
struct net_device   *dev;   /* NULL is wildcarded here   */
int (*func) (struct sk_buff *,
 struct net_device *,
diff --git a/include/uapi/linux/if_packet.h b/include/uapi/linux/if_packet.h
index 67b61d91d89b..467b654bd4c7 100644
--- a/include/uapi/linux/if_packet.h
+++ b/include/uapi/linux/if_packet.h
@@ -57,6 +57,7 @@ struct sockaddr_ll {
 #define PACKET_QDISC_BYPASS20
 #define PACKET_ROLLOVER_STATS  21
 #define PACKET_FANOUT_DATA 22
+#define PACKET_IGNORE_OUTGOING 23
 
 #define PACKET_FANOUT_HASH 0
 #define PACKET_FANOUT_LB   1
diff --git a/net/core/dev.c b/net/core/dev.c
index 325fc5088370..09dcf190c081 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1970,6 +1970,9 @@ void dev_queue_xmit_nit(struct sk_buff *skb, struct 
net_device *dev)
rcu_read_lock();
 again:
list_for_each_entry_rcu(ptype, ptype_list, list) {
+   if (ptype->ignore_outgoing)
+   continue;
+
/* Never send packets back to the socket
 * they originated from - MvS (miqu...@drinkel.ow.org)
 */
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 5610061e7f2e..23336498eb9f 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -3805,6 +3805,20 @@ packet_setsockopt(struct socket *sock, int level, int 
optname, char __user *optv
 
return fanout_set_data(po, optval, optlen);
}
+   case PACKET_IGNORE_OUTGOING:
+   {
+   int val;
+
+   if (optlen != sizeof(val))
+   return -EINVAL;
+   if (copy_from_user(&val, optval, sizeof(val)))
+   return -EFAULT;
+   if (val < 0 || val > 1)
+   return -EINVAL;
+
+   po->prot_hook.ignore_outgoing = !!val;
+   return 0;
+   }
case PACKET_TX_HAS_OFF:
{
unsigned int val;
@@ -3928,6 +3942,9 @@ static int packet_getsockopt(struct socket *sock, int 
level, int optname,
((u32)po->fanout->flags << 24)) :
   0);
break;
+   case PACKET_IGNORE_OUTGOING:
+   val = po->prot_hook.ignore_outgoing;
+   break;
case PACKET_ROLLOVER_STATS:
if (!po->rollover)
return -EINVAL;
-- 
2.11.0

Re: [PATCH v7 1/4] gpiolib: Pass bitmaps, not integer arrays, to get/set array

2018-09-03 Thread Geert Uytterhoeven

On Mon, Sep 3, 2018 at 6:31 AM Matthew Wilcox  wrote:
> > +++ b/drivers/auxdisplay/hd44780.c
> > @@ -62,17 +62,12 @@ static void hd44780_strobe_gpio(struct hd44780 *hd)
> >  /* write to an LCD panel register in 8 bit GPIO mode */
> >  static void hd44780_write_gpio8(struct hd44780 *hd, u8 val, unsigned int 
> > rs)
> >  {
> > - int values[10]; /* for DATA[0-7], RS, RW */
> > - unsigned int i, n;
> > -
> > - for (i = 0; i < 8; i++)
> > - values[PIN_DATA0 + i] = !!(val & BIT(i));
> > - values[PIN_CTRL_RS] = rs;
> > - n = 9;
> > - if (hd->pins[PIN_CTRL_RW]) {
> > - values[PIN_CTRL_RW] = 0;
> > - n++;
> > - }
> > + DECLARE_BITMAP(values, 10); /* for DATA[0-7], RS, RW */
> > + unsigned int n;
> > +
> > + *values = val;
> > + __assign_bit(8, values, rs);
> > + n = hd->pins[PIN_CTRL_RW] ? 10 : 9;
>
> Doesn't this assume little endian bitmaps?  Has anyone tested this on
> big-endian machines?

include/linux/bitops.h:

static __always_inline void __assign_bit(long nr, volatile unsigned long *addr,
 bool value)
{
if (value)
__set_bit(nr, addr);
else
__clear_bit(nr, addr);
}

include/asm-generic/bitops/non-atomic.h:

static inline void __set_bit(int nr, volatile unsigned long *addr)
{
unsigned long mask = BIT_MASK(nr);
unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);

*p  |= mask;
}

include/linux/bits.h:

#define BIT_MASK(nr)(1UL << ((nr) % BITS_PER_LONG))

Looks like native endianness to me.

Gr{oetje,eeting}s,

Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

Re: [RFT net-next] net: stmmac: Rework coalesce timer and fix multi-queue races

2018-09-03 Thread Jerome Brunet

On Mon, 2018-09-03 at 12:47 +0100, Jose Abreu wrote:
> On 03-09-2018 11:16, Jerome Brunet wrote:
> > No notable change. Rx is fine but Tx:
> > [  5]   3.00-4.00   sec  3.55 MBytes  29.8 Mbits/sec   51   12.7 KBytes
> > 
> > I suppose the problem as something to do with the retries. When doing Tx 
> > test
> > alone, we don't have such a things a throughput where we expect it to be.
> 
> Yeah, I just remembered you are not using GMAC4 so it wouldn't
> make a difference. Is your version 3.710? If so please try adding
> the following compatible to your DT bindings "snps,dwmac-3.710".

According to the documentation, it is a 3.70a but I learn (the hard way) not to
trust the documentation too much. Is there anyway to make sure which version we
have. Like a register to read ?

Out of curiosity, I changed the compatible to "snps,dwmac-3.710" anyway. For
some reason, the MDIO bus failed to register with this. Since it is not the
documented version, I did not check why.

> 
> > 
> > By the way, your mailer (and its auto 80 column rule I suppose) made the 
> > patch
> > below a bit harder to apply
> 
> Sorry. Next time I will send as attachment.

No worries

> 
> Thanks and Best Regards,
> Jose Miguel Abreu

Re: [PATCH net-next 1/2] net: stmmac: Rework coalesce timer and fix multi-queue races

2018-09-03 Thread Jerome Brunet

On Mon, 2018-09-03 at 14:35 +0100, Jose Abreu wrote:
> This follows David Miller advice and tries to fix coalesce timer in
> multi-queue scenarios.
> 
> We are now using per-queue coalesce values and per-queue TX timer.
> 
> Coalesce timer default values was changed to 1ms and the coalesce frames
> to 25.
> 
> Tested in B2B setup between XGMAC2 and GMAC5.
> 
> Signed-off-by: Jose Abreu 
> Cc: Jerome Brunet 
> Cc: Martin Blumenstingl 
> Cc: David S. Miller 
> Cc: Joao Pinto 
> Cc: Giuseppe Cavallaro 
> Cc: Alexandre Torgue 
> ---
> Jerome,
> 
> Can I have your Tested-by in this patch?

You had it on what you sent in the RFT, but this different.

Like with the RFT, the network breakdown we had is no longer reproduced.
However this patch wreck the Rx throughput (680MBps -> 35MBps)

BTW, this patch and the RFT assume that 4ae0169fd1b3 ("net: stmmac: Do not keep
rearming the coalesce timer in stmmac_xmit") is still applied but I believe
David reverted the patch.

If you still need this change, you should include it back in your changeset.

> 
> Thanks and Best Regards,
> Jose Miguel Abreu
> ---
>  drivers/net/ethernet/stmicro/stmmac/common.h  |   4 +-
>  drivers/net/ethernet/stmicro/stmmac/stmmac.h  |   7 +-
>  drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 177 
> +++---
>  3 files changed, 126 insertions(+), 62 deletions(-)

Re: phys_port_id in switchdev mode?

2018-09-03 Thread Marcelo Ricardo Leitner

On Sat, Sep 01, 2018 at 01:34:12PM +0200, Jakub Kicinski wrote:
> On Fri, 31 Aug 2018 17:13:22 -0300, Marcelo Ricardo Leitner wrote:
> > On Tue, Aug 28, 2018 at 08:43:51PM +0200, Jakub Kicinski wrote:
> > > Ugh, CC: netdev..
> > > 
> > > On Tue, 28 Aug 2018 20:05:39 +0200, Jakub Kicinski wrote:  
> > > > Hi!
> > > > 
> > > > I wonder if we can use phys_port_id in switchdev to group together
> > > > interfaces of a single PCI PF?  Here is the problem:  
> > 
> > On Mellanox cards, this is already possible via phys_switch_id, as
> > each PF has its own phys_switch_id. So all VFs with a given
> > phys_switch_id belong to the PF with that same phys_switch_id.
> 
> You mean Connect-X4 and on, Connect-X3 also shares PF between ports.

Yes ConnectX-3 shares PF beween ports but doesn't support switchdev
mode.

I see the issue now. I was still considering the external ports as
uplink representors.

[PATCH net-next 0/2] net: stmmac: Coalesce and tail addr fixes

2018-09-03 Thread Jose Abreu

The fix for coalesce timer and a fix in tail address setting that impacts
XGMAC2 operation.

Cc: Jerome Brunet 
Cc: Martin Blumenstingl 
Cc: David S. Miller 
Cc: Joao Pinto 
Cc: Giuseppe Cavallaro 
Cc: Alexandre Torgue 

Jose Abreu (2):
  net: stmmac: Rework coalesce timer and fix multi-queue races
  net: stmmac: Fixup the tail addr setting in xmit path

 drivers/net/ethernet/stmicro/stmmac/common.h  |   4 +-
 drivers/net/ethernet/stmicro/stmmac/stmmac.h  |   7 +-
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 182 +++---
 3 files changed, 129 insertions(+), 64 deletions(-)

-- 
2.7.4

[PATCH net-next 2/2] net: stmmac: Fixup the tail addr setting in xmit path

2018-09-03 Thread Jose Abreu

Currently we are always setting the tail address of descriptor list to
the end of the pre-allocated list.

According to databook this is not correct. Tail address should point to
the last available descriptor + 1, which means we have to update the
tail address everytime we call the xmit function.

This should make no impact in older versions of MAC but in newer
versions there are some DMA features which allows the IP to fetch
descriptors in advance and in a non sequential order so its critical
that we set the tail address correctly.

Signed-off-by: Jose Abreu 
Cc: David S. Miller 
Cc: Joao Pinto 
Cc: Giuseppe Cavallaro 
Cc: Alexandre Torgue 
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c 
b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 1fca66ad6b17..14f890f2a970 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -2224,8 +2224,7 @@ static int stmmac_init_dma_engine(struct stmmac_priv 
*priv)
stmmac_init_tx_chan(priv, priv->ioaddr, priv->plat->dma_cfg,
tx_q->dma_tx_phy, chan);
 
-   tx_q->tx_tail_addr = tx_q->dma_tx_phy +
-   (DMA_TX_SIZE * sizeof(struct dma_desc));
+   tx_q->tx_tail_addr = tx_q->dma_tx_phy;
stmmac_set_tx_tail_ptr(priv, priv->ioaddr,
   tx_q->tx_tail_addr, chan);
}
@@ -3015,6 +3014,7 @@ static netdev_tx_t stmmac_tso_xmit(struct sk_buff *skb, 
struct net_device *dev)
 
netdev_tx_sent_queue(netdev_get_tx_queue(dev, queue), skb->len);
 
+   tx_q->tx_tail_addr = tx_q->dma_tx_phy + (tx_q->cur_tx * sizeof(*desc));
stmmac_set_tx_tail_ptr(priv, priv->ioaddr, tx_q->tx_tail_addr, queue);
 
if (priv->tx_coal_timer && !tx_q->tx_timer_active) {
@@ -3235,6 +3235,7 @@ static netdev_tx_t stmmac_xmit(struct sk_buff *skb, 
struct net_device *dev)
 
stmmac_enable_dma_transmission(priv, priv->ioaddr);
 
+   tx_q->tx_tail_addr = tx_q->dma_tx_phy + (tx_q->cur_tx * sizeof(*desc));
stmmac_set_tx_tail_ptr(priv, priv->ioaddr, tx_q->tx_tail_addr, queue);
 
if (priv->tx_coal_timer && !tx_q->tx_timer_active) {
-- 
2.7.4

[PATCH net-next 1/2] net: stmmac: Rework coalesce timer and fix multi-queue races

2018-09-03 Thread Jose Abreu

This follows David Miller advice and tries to fix coalesce timer in
multi-queue scenarios.

We are now using per-queue coalesce values and per-queue TX timer.

Coalesce timer default values was changed to 1ms and the coalesce frames
to 25.

Tested in B2B setup between XGMAC2 and GMAC5.

Signed-off-by: Jose Abreu 
Cc: Jerome Brunet 
Cc: Martin Blumenstingl 
Cc: David S. Miller 
Cc: Joao Pinto 
Cc: Giuseppe Cavallaro 
Cc: Alexandre Torgue 
---
Jerome,

Can I have your Tested-by in this patch?

Thanks and Best Regards,
Jose Miguel Abreu
---
 drivers/net/ethernet/stmicro/stmmac/common.h  |   4 +-
 drivers/net/ethernet/stmicro/stmmac/stmmac.h  |   7 +-
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 177 +++---
 3 files changed, 126 insertions(+), 62 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/common.h 
b/drivers/net/ethernet/stmicro/stmmac/common.h
index 1854f270ad66..b1b305f8f414 100644
--- a/drivers/net/ethernet/stmicro/stmmac/common.h
+++ b/drivers/net/ethernet/stmicro/stmmac/common.h
@@ -258,10 +258,10 @@ struct stmmac_safety_stats {
 #define MAX_DMA_RIWT   0xff
 #define MIN_DMA_RIWT   0x20
 /* Tx coalesce parameters */
-#define STMMAC_COAL_TX_TIMER   4
+#define STMMAC_COAL_TX_TIMER   1000
 #define STMMAC_MAX_COAL_TX_TICK10
 #define STMMAC_TX_MAX_FRAMES   256
-#define STMMAC_TX_FRAMES   64
+#define STMMAC_TX_FRAMES   25
 
 /* Packets types */
 enum packets_types {
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac.h 
b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
index 76649adf8fb0..957030cfb833 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac.h
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
@@ -48,6 +48,9 @@ struct stmmac_tx_info {
 
 /* Frequently used values are kept adjacent for cache effect */
 struct stmmac_tx_queue {
+   u32 tx_count_frames;
+   int tx_timer_active;
+   struct timer_list txtimer;
u32 queue_index;
struct stmmac_priv *priv_data;
struct dma_extended_desc *dma_etx cacheline_aligned_in_smp;
@@ -59,6 +62,7 @@ struct stmmac_tx_queue {
dma_addr_t dma_tx_phy;
u32 tx_tail_addr;
u32 mss;
+   struct napi_struct napi cacheline_aligned_in_smp;
 };
 
 struct stmmac_rx_queue {
@@ -109,15 +113,12 @@ struct stmmac_pps_cfg {
 
 struct stmmac_priv {
/* Frequently used values are kept adjacent for cache effect */
-   u32 tx_count_frames;
u32 tx_coal_frames;
u32 tx_coal_timer;
-   bool tx_timer_armed;
 
int tx_coalesce;
int hwts_tx_en;
bool tx_path_in_lpi_mode;
-   struct timer_list txtimer;
bool tso;
 
unsigned int dma_buf_sz;
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c 
b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index ff1ffb46198a..1fca66ad6b17 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -148,6 +148,7 @@ static void stmmac_verify_args(void)
 static void stmmac_disable_all_queues(struct stmmac_priv *priv)
 {
u32 rx_queues_cnt = priv->plat->rx_queues_to_use;
+   u32 tx_queues_cnt = priv->plat->tx_queues_to_use;
u32 queue;
 
for (queue = 0; queue < rx_queues_cnt; queue++) {
@@ -155,6 +156,12 @@ static void stmmac_disable_all_queues(struct stmmac_priv 
*priv)
 
napi_disable(&rx_q->napi);
}
+
+   for (queue = 0; queue < tx_queues_cnt; queue++) {
+   struct stmmac_tx_queue *tx_q = &priv->tx_queue[queue];
+
+   napi_disable(&tx_q->napi);
+   }
 }
 
 /**
@@ -164,6 +171,7 @@ static void stmmac_disable_all_queues(struct stmmac_priv 
*priv)
 static void stmmac_enable_all_queues(struct stmmac_priv *priv)
 {
u32 rx_queues_cnt = priv->plat->rx_queues_to_use;
+   u32 tx_queues_cnt = priv->plat->tx_queues_to_use;
u32 queue;
 
for (queue = 0; queue < rx_queues_cnt; queue++) {
@@ -171,6 +179,12 @@ static void stmmac_enable_all_queues(struct stmmac_priv 
*priv)
 
napi_enable(&rx_q->napi);
}
+
+   for (queue = 0; queue < tx_queues_cnt; queue++) {
+   struct stmmac_tx_queue *tx_q = &priv->tx_queue[queue];
+
+   napi_enable(&tx_q->napi);
+   }
 }
 
 /**
@@ -1843,18 +1857,18 @@ static void stmmac_dma_operation_mode(struct 
stmmac_priv *priv)
  * @queue: TX queue index
  * Description: it reclaims the transmit resources after transmission 
completes.
  */
-static void stmmac_tx_clean(struct stmmac_priv *priv, u32 queue)
+static int stmmac_tx_clean(struct stmmac_priv *priv, int limit, u32 queue)
 {
struct stmmac_tx_queue *tx_q = &priv->tx_queue[queue];
unsigned int bytes_compl = 0, pkts_compl = 0;
-   unsigned int entry;
+   unsigned int entry, count = 0;
 
netif_tx_lock(priv->dev);
 
priv->xstats.tx_clean++;
 
entry = tx_q->dirty_tx;
-   while (entry != tx_q->

Re: [PATCH net] sctp: hold transport before accessing its asoc in sctp_transport_get_next

2018-09-03 Thread Neil Horman

On Fri, Aug 31, 2018 at 08:03:23AM -0400, Neil Horman wrote:
> On Fri, Aug 31, 2018 at 03:09:05PM +0800, Xin Long wrote:
> > On Wed, Aug 29, 2018 at 7:36 PM Neil Horman  wrote:
> > >
> > > On Wed, Aug 29, 2018 at 12:08:40AM +0800, Xin Long wrote:
> > > > On Mon, Aug 27, 2018 at 9:08 PM Neil Horman  
> > > > wrote:
> > > > >
> > > > > On Mon, Aug 27, 2018 at 06:38:31PM +0800, Xin Long wrote:
> > > > > > As Marcelo noticed, in sctp_transport_get_next, it is iterating over
> > > > > > transports but then also accessing the association directly, without
> > > > > > checking any refcnts before that, which can cause an use-after-free
> > > > > > Read.
> > > > > >
> > > > > > So fix it by holding transport before accessing the association. 
> > > > > > With
> > > > > > that, sctp_transport_hold calls can be removed in the later places.
> > > > > >
> > > > > > Fixes: 626d16f50f39 ("sctp: export some apis or variables for 
> > > > > > sctp_diag and reuse some for proc")
> > > > > > Reported-by: syzbot+fe62a0c9aa6a85c6d...@syzkaller.appspotmail.com
> > > > > > Signed-off-by: Xin Long 
> > > > > > ---
> > > > > >  net/sctp/proc.c   |  4 
> > > > > >  net/sctp/socket.c | 22 +++---
> > > > > >  2 files changed, 15 insertions(+), 11 deletions(-)
> > > > > >
> > > > > > diff --git a/net/sctp/proc.c b/net/sctp/proc.c
> > > > > > index ef5c9a8..4d6f1c8 100644
> > > > > > --- a/net/sctp/proc.c
> > > > > > +++ b/net/sctp/proc.c
> > > > > > @@ -264,8 +264,6 @@ static int sctp_assocs_seq_show(struct seq_file 
> > > > > > *seq, void *v)
> > > > > >   }
> > > > > >
> > > > > >   transport = (struct sctp_transport *)v;
> > > > > > - if (!sctp_transport_hold(transport))
> > > > > > - return 0;
> > > > > >   assoc = transport->asoc;
> > > > > >   epb = &assoc->base;
> > > > > >   sk = epb->sk;
> > > > > > @@ -322,8 +320,6 @@ static int sctp_remaddr_seq_show(struct 
> > > > > > seq_file *seq, void *v)
> > > > > >   }
> > > > > >
> > > > > >   transport = (struct sctp_transport *)v;
> > > > > > - if (!sctp_transport_hold(transport))
> > > > > > - return 0;
> > > > > >   assoc = transport->asoc;
> > > > > >
> > > > > >   list_for_each_entry_rcu(tsp, &assoc->peer.transport_addr_list,
> > > > > > diff --git a/net/sctp/socket.c b/net/sctp/socket.c
> > > > > > index e96b15a..aa76586 100644
> > > > > > --- a/net/sctp/socket.c
> > > > > > +++ b/net/sctp/socket.c
> > > > > > @@ -5005,9 +5005,14 @@ struct sctp_transport 
> > > > > > *sctp_transport_get_next(struct net *net,
> > > > > >   break;
> > > > > >   }
> > > > > >
> > > > > > + if (!sctp_transport_hold(t))
> > > > > > + continue;
> > > > > > +
> > > > > >   if (net_eq(sock_net(t->asoc->base.sk), net) &&
> > > > > >   t->asoc->peer.primary_path == t)
> > > > > >   break;
> > > > > > +
> > > > > > + sctp_transport_put(t);
> > > > > >   }
> > > > > >
> > > > > >   return t;
> > > > > > @@ -5017,13 +5022,18 @@ struct sctp_transport 
> > > > > > *sctp_transport_get_idx(struct net *net,
> > > > > > struct rhashtable_iter 
> > > > > > *iter,
> > > > > > int pos)
> > > > > >  {
> > > > > > - void *obj = SEQ_START_TOKEN;
> > > > > > + struct sctp_transport *t;
> > > > > >
> > > > > > - while (pos && (obj = sctp_transport_get_next(net, iter)) &&
> > > > > > -!IS_ERR(obj))
> > > > > > - pos--;
> > > > > > + if (!pos)
> > > > > > + return SEQ_START_TOKEN;
> > > > > >
> > > > > > - return obj;
> > > > > > + while ((t = sctp_transport_get_next(net, iter)) && 
> > > > > > !IS_ERR(t)) {
> > > > > > + if (!--pos)
> > > > > > + break;
> > > > > > + sctp_transport_put(t);
> > > > > > + }
> > > > > > +
> > > > > > + return t;
> > > > > >  }
> > > > > >
> > > > > >  int sctp_for_each_endpoint(int (*cb)(struct sctp_endpoint *, void 
> > > > > > *),
> > > > > > @@ -5082,8 +5092,6 @@ int sctp_for_each_transport(int (*cb)(struct 
> > > > > > sctp_transport *, void *),
> > > > > >
> > > > > >   tsp = sctp_transport_get_idx(net, &hti, *pos + 1);
> > > > > >   for (; !IS_ERR_OR_NULL(tsp); tsp = 
> > > > > > sctp_transport_get_next(net, &hti)) {
> > > > > > - if (!sctp_transport_hold(tsp))
> > > > > > - continue;
> > > > > >   ret = cb(tsp, p);
> > > > > >   if (ret)
> > > > > >   break;
> > > > > > --
> > > > > > 2.1.0
> > > > > >
> > > > > >
> > > > > Acked-by: Neil Horman 
> > > > >
> > > > > Additionally, its not germaine to this particular fix, but why are we 
> > > > > still
> > > > > using that pos variable in sctp_transport_get_idx?  With the 
> > > > > conversion to
> > > > > rhashtables, it doesn't seem parti

[PATCH net-next v2] cxgb4: collect hardware queue descriptors

2018-09-03 Thread Rahul Lakkireddy

Collect descriptors of all ULD and LLD hardware queues managed
by LLD.

Signed-off-by: Rahul Lakkireddy 
Signed-off-by: Ganesh Goudar 
---
v2:
- Move inline functions to header file.
- Add missing undefine for QDESC_GET_* macros.

 drivers/net/ethernet/chelsio/cxgb4/cudbg_entity.h |  42 
 drivers/net/ethernet/chelsio/cxgb4/cudbg_if.h |   3 +-
 drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c| 238 ++
 drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.h| 106 ++
 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h|   7 +
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.c  |   4 +
 6 files changed, 399 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cudbg_entity.h 
b/drivers/net/ethernet/chelsio/cxgb4/cudbg_entity.h
index 36d25883d123..b2d617abcf49 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cudbg_entity.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cudbg_entity.h
@@ -315,6 +315,48 @@ struct cudbg_pbt_tables {
u32 pbt_data[CUDBG_PBT_DATA_ENTRIES];
 };
 
+enum cudbg_qdesc_qtype {
+   CUDBG_QTYPE_UNKNOWN = 0,
+   CUDBG_QTYPE_NIC_TXQ,
+   CUDBG_QTYPE_NIC_RXQ,
+   CUDBG_QTYPE_NIC_FLQ,
+   CUDBG_QTYPE_CTRLQ,
+   CUDBG_QTYPE_FWEVTQ,
+   CUDBG_QTYPE_INTRQ,
+   CUDBG_QTYPE_PTP_TXQ,
+   CUDBG_QTYPE_OFLD_TXQ,
+   CUDBG_QTYPE_RDMA_RXQ,
+   CUDBG_QTYPE_RDMA_FLQ,
+   CUDBG_QTYPE_RDMA_CIQ,
+   CUDBG_QTYPE_ISCSI_RXQ,
+   CUDBG_QTYPE_ISCSI_FLQ,
+   CUDBG_QTYPE_ISCSIT_RXQ,
+   CUDBG_QTYPE_ISCSIT_FLQ,
+   CUDBG_QTYPE_CRYPTO_TXQ,
+   CUDBG_QTYPE_CRYPTO_RXQ,
+   CUDBG_QTYPE_CRYPTO_FLQ,
+   CUDBG_QTYPE_TLS_RXQ,
+   CUDBG_QTYPE_TLS_FLQ,
+   CUDBG_QTYPE_MAX,
+};
+
+#define CUDBG_QDESC_REV 1
+
+struct cudbg_qdesc_entry {
+   u32 data_size;
+   u32 qtype;
+   u32 qid;
+   u32 desc_size;
+   u32 num_desc;
+   u8 data[0]; /* Must be last */
+};
+
+struct cudbg_qdesc_info {
+   u32 qdesc_entry_size;
+   u32 num_queues;
+   u8 data[0]; /* Must be last */
+};
+
 #define IREG_NUM_ELEM 4
 
 static const u32 t6_tp_pio_array[][IREG_NUM_ELEM] = {
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cudbg_if.h 
b/drivers/net/ethernet/chelsio/cxgb4/cudbg_if.h
index 215fe6260fd7..dec63c15c0ba 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cudbg_if.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cudbg_if.h
@@ -81,7 +81,8 @@ enum cudbg_dbg_entity_type {
CUDBG_MBOX_LOG = 66,
CUDBG_HMA_INDIRECT = 67,
CUDBG_HMA = 68,
-   CUDBG_MAX_ENTITY = 70,
+   CUDBG_QDESC = 70,
+   CUDBG_MAX_ENTITY = 71,
 };
 
 struct cudbg_init {
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c 
b/drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c
index d97e0d7e541a..7c49681407ad 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c
@@ -19,6 +19,7 @@
 
 #include "t4_regs.h"
 #include "cxgb4.h"
+#include "cxgb4_cudbg.h"
 #include "cudbg_if.h"
 #include "cudbg_lib_common.h"
 #include "cudbg_entity.h"
@@ -2890,3 +2891,240 @@ int cudbg_collect_hma_indirect(struct cudbg_init 
*pdbg_init,
}
return cudbg_write_and_release_buff(pdbg_init, &temp_buff, dbg_buff);
 }
+
+void cudbg_fill_qdesc_num_and_size(const struct adapter *padap,
+  u32 *num, u32 *size)
+{
+   u32 tot_entries = 0, tot_size = 0;
+
+   /* NIC TXQ, RXQ, FLQ, and CTRLQ */
+   tot_entries += MAX_ETH_QSETS * 3;
+   tot_entries += MAX_CTRL_QUEUES;
+
+   tot_size += MAX_ETH_QSETS * MAX_TXQ_ENTRIES * MAX_TXQ_DESC_SIZE;
+   tot_size += MAX_ETH_QSETS * MAX_RSPQ_ENTRIES * MAX_RXQ_DESC_SIZE;
+   tot_size += MAX_ETH_QSETS * MAX_RX_BUFFERS * MAX_FL_DESC_SIZE;
+   tot_size += MAX_CTRL_QUEUES * MAX_CTRL_TXQ_ENTRIES *
+   MAX_CTRL_TXQ_DESC_SIZE;
+
+   /* FW_EVTQ and INTRQ */
+   tot_entries += INGQ_EXTRAS;
+   tot_size += INGQ_EXTRAS * MAX_RSPQ_ENTRIES * MAX_RXQ_DESC_SIZE;
+
+   /* PTP_TXQ */
+   tot_entries += 1;
+   tot_size += MAX_TXQ_ENTRIES * MAX_TXQ_DESC_SIZE;
+
+   /* ULD TXQ, RXQ, and FLQ */
+   tot_entries += CXGB4_TX_MAX * MAX_OFLD_QSETS;
+   tot_entries += CXGB4_ULD_MAX * MAX_ULD_QSETS * 2;
+
+   tot_size += CXGB4_TX_MAX * MAX_OFLD_QSETS * MAX_TXQ_ENTRIES *
+   MAX_TXQ_DESC_SIZE;
+   tot_size += CXGB4_ULD_MAX * MAX_ULD_QSETS * MAX_RSPQ_ENTRIES *
+   MAX_RXQ_DESC_SIZE;
+   tot_size += CXGB4_ULD_MAX * MAX_ULD_QSETS * MAX_RX_BUFFERS *
+   MAX_FL_DESC_SIZE;
+
+   /* ULD CIQ */
+   tot_entries += CXGB4_ULD_MAX * MAX_ULD_QSETS;
+   tot_size += CXGB4_ULD_MAX * MAX_ULD_QSETS * SGE_MAX_IQ_SIZE *
+   MAX_RXQ_DESC_SIZE;
+
+   tot_size += sizeof(struct cudbg_ver_hdr) +
+   sizeof(struct cudbg_qdesc_info) +
+   sizeof(struct cudbg_qdesc_entry) * tot_entries;
+
+   if (num)
+   *num = tot_e

Re: [RFT net-next] net: stmmac: Rework coalesce timer and fix multi-queue races

2018-09-03 Thread Jose Abreu

On 03-09-2018 11:16, Jerome Brunet wrote:
> No notable change. Rx is fine but Tx:
> [  5]   3.00-4.00   sec  3.55 MBytes  29.8 Mbits/sec   51   12.7 KBytes
>
> I suppose the problem as something to do with the retries. When doing Tx test
> alone, we don't have such a things a throughput where we expect it to be.

Yeah, I just remembered you are not using GMAC4 so it wouldn't
make a difference. Is your version 3.710? If so please try adding
the following compatible to your DT bindings "snps,dwmac-3.710".

>
> By the way, your mailer (and its auto 80 column rule I suppose) made the patch
> below a bit harder to apply

Sorry. Next time I will send as attachment.

Thanks and Best Regards,
Jose Miguel Abreu

Re: [PATCH RFC net-next 00/11] udp gso

2018-09-03 Thread Sowmini Varadhan

On (09/03/18 10:02), Steffen Klassert wrote:
> I'm working on patches that builds such skb lists. The list is chained
> at the frag_list pointer of the first skb, all subsequent skbs are linked
> to the next pointer of the skb. It looks like this:

there are some risks to using the frag_list pointer, Alex Duyck
had pointed this out to me in 
  https://www.mail-archive.com/netdev@vger.kernel.org/msg131081.html
(see last paragraph, or search for the string "gotcha" there)

I dont know the details of your playground patch, but might want to
watch out for those to make sure it is immune to these issues..

--Sowmini

[PATCH V2 ipsec-next 0/2] xfrm: bug fixes when processing multiple transforms

2018-09-03 Thread Sowmini Varadhan

This series contains bug fixes that were encountered when I set
up a libreswan tunnel using the config below, which will set up
an IPsec policy involving 2 tmpls.

type=transport
compress=yes
esp=aes_gcm_c-128-null # offloaded to Niantic
auto=start

The non-offload test case uses  esp=aes_gcm_c-256-null.

Each patch has a technical description of the contents of the fix.

V2: added Fixes tag so that it can be backported to the stable trees.

Sowmini Varadhan (2):
  xfrm: reset transport header back to network header after all input
transforms ahave been applied
  xfrm: reset crypto_done when iterating over multiple input xfrms

 net/ipv4/xfrm4_input.c  |1 +
 net/ipv4/xfrm4_mode_transport.c |4 +---
 net/ipv6/xfrm6_input.c  |1 +
 net/ipv6/xfrm6_mode_transport.c |4 +---
 net/xfrm/xfrm_input.c   |1 +
 5 files changed, 5 insertions(+), 6 deletions(-)

[PATCH V2 ipsec-next 1/2] xfrm: reset transport header back to network header after all input transforms ahave been applied

2018-09-03 Thread Sowmini Varadhan

A policy may have been set up with multiple transforms (e.g., ESP
and ipcomp). In this situation, the ingress IPsec processing
iterates in xfrm_input() and applies each transform in turn,
processing the nexthdr to find any additional xfrm that may apply.

This patch resets the transport header back to network header
only after the last transformation so that subsequent xfrms
can find the correct transport header.

Fixes: 7785bba299a8 ("esp: Add a software GRO codepath")
Suggested-by: Steffen Klassert 
Signed-off-by: Sowmini Varadhan 
---
v2: added "Fixes" tag

 net/ipv4/xfrm4_input.c  |1 +
 net/ipv4/xfrm4_mode_transport.c |4 +---
 net/ipv6/xfrm6_input.c  |1 +
 net/ipv6/xfrm6_mode_transport.c |4 +---
 4 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/net/ipv4/xfrm4_input.c b/net/ipv4/xfrm4_input.c
index bcfc00e..f8de248 100644
--- a/net/ipv4/xfrm4_input.c
+++ b/net/ipv4/xfrm4_input.c
@@ -67,6 +67,7 @@ int xfrm4_transport_finish(struct sk_buff *skb, int async)
 
if (xo && (xo->flags & XFRM_GRO)) {
skb_mac_header_rebuild(skb);
+   skb_reset_transport_header(skb);
return 0;
}
 
diff --git a/net/ipv4/xfrm4_mode_transport.c b/net/ipv4/xfrm4_mode_transport.c
index 3d36644..1ad2c2c 100644
--- a/net/ipv4/xfrm4_mode_transport.c
+++ b/net/ipv4/xfrm4_mode_transport.c
@@ -46,7 +46,6 @@ static int xfrm4_transport_output(struct xfrm_state *x, 
struct sk_buff *skb)
 static int xfrm4_transport_input(struct xfrm_state *x, struct sk_buff *skb)
 {
int ihl = skb->data - skb_transport_header(skb);
-   struct xfrm_offload *xo = xfrm_offload(skb);
 
if (skb->transport_header != skb->network_header) {
memmove(skb_transport_header(skb),
@@ -54,8 +53,7 @@ static int xfrm4_transport_input(struct xfrm_state *x, struct 
sk_buff *skb)
skb->network_header = skb->transport_header;
}
ip_hdr(skb)->tot_len = htons(skb->len + ihl);
-   if (!xo || !(xo->flags & XFRM_GRO))
-   skb_reset_transport_header(skb);
+   skb_reset_transport_header(skb);
return 0;
 }
 
diff --git a/net/ipv6/xfrm6_input.c b/net/ipv6/xfrm6_input.c
index 841f4a0..9ef490d 100644
--- a/net/ipv6/xfrm6_input.c
+++ b/net/ipv6/xfrm6_input.c
@@ -59,6 +59,7 @@ int xfrm6_transport_finish(struct sk_buff *skb, int async)
 
if (xo && (xo->flags & XFRM_GRO)) {
skb_mac_header_rebuild(skb);
+   skb_reset_transport_header(skb);
return -1;
}
 
diff --git a/net/ipv6/xfrm6_mode_transport.c b/net/ipv6/xfrm6_mode_transport.c
index 9ad07a9..3c29da5 100644
--- a/net/ipv6/xfrm6_mode_transport.c
+++ b/net/ipv6/xfrm6_mode_transport.c
@@ -51,7 +51,6 @@ static int xfrm6_transport_output(struct xfrm_state *x, 
struct sk_buff *skb)
 static int xfrm6_transport_input(struct xfrm_state *x, struct sk_buff *skb)
 {
int ihl = skb->data - skb_transport_header(skb);
-   struct xfrm_offload *xo = xfrm_offload(skb);
 
if (skb->transport_header != skb->network_header) {
memmove(skb_transport_header(skb),
@@ -60,8 +59,7 @@ static int xfrm6_transport_input(struct xfrm_state *x, struct 
sk_buff *skb)
}
ipv6_hdr(skb)->payload_len = htons(skb->len + ihl -
   sizeof(struct ipv6hdr));
-   if (!xo || !(xo->flags & XFRM_GRO))
-   skb_reset_transport_header(skb);
+   skb_reset_transport_header(skb);
return 0;
 }
 
-- 
1.7.1

[PATCH V2 ipsec-next 2/2] xfrm: reset crypto_done when iterating over multiple input xfrms

2018-09-03 Thread Sowmini Varadhan

We only support one offloaded xfrm (we do not have devices that
can handle more than one offload), so reset crypto_done in
xfrm_input() when iterating over multiple transforms in xfrm_input,
so that we can invoke the appropriate x->type->input for the
non-offloaded transforms

Fixes: d77e38e612a0 ("xfrm: Add an IPsec hardware offloading API")

Signed-off-by: Sowmini Varadhan 
---
v2: added "Fixes" tag

 net/xfrm/xfrm_input.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c
index b89c9c7..be3520e 100644
--- a/net/xfrm/xfrm_input.c
+++ b/net/xfrm/xfrm_input.c
@@ -458,6 +458,7 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 
spi, int encap_type)
XFRM_INC_STATS(net, LINUX_MIB_XFRMINHDRERROR);
goto drop;
}
+   crypto_done = false;
} while (!err);
 
err = xfrm_rcv_cb(skb, family, x->type->proto, 0);
-- 
1.7.1

[PATCH net-next] cxgb4: when max_tx_rate is 0 disable tx rate limiting

2018-09-03 Thread Ganesh Goudar

in ndo_set_vf_rate() when max_tx_rate is 0 disable tx
rate limiting for that vf.

Signed-off-by: Casey Leedom 
Signed-off-by: Ganesh Goudar 
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 25 +++--
 1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c 
b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index 961e3087..2e1e286 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -2749,6 +2749,27 @@ static int cxgb4_mgmt_set_vf_rate(struct net_device 
*dev, int vf,
return -EINVAL;
}
 
+   if (max_tx_rate == 0) {
+   /* unbind VF to to any Traffic Class */
+   fw_pfvf =
+   (FW_PARAMS_MNEM_V(FW_PARAMS_MNEM_PFVF) |
+FW_PARAMS_PARAM_X_V(FW_PARAMS_PARAM_PFVF_SCHEDCLASS_ETH));
+   fw_class = 0x;
+   ret = t4_set_params(adap, adap->mbox, adap->pf, vf + 1, 1,
+   &fw_pfvf, &fw_class);
+   if (ret) {
+   dev_err(adap->pdev_dev,
+   "Err %d in unbinding PF %d VF %d from TX Rate 
Limiting\n",
+   ret, adap->pf, vf);
+   return -EINVAL;
+   }
+   dev_info(adap->pdev_dev,
+"PF %d VF %d is unbound from TX Rate Limiting\n",
+adap->pf, vf);
+   adap->vfinfo[vf].tx_rate = 0;
+   return 0;
+   }
+
ret = t4_get_link_params(pi, &link_ok, &speed, &mtu);
if (ret != FW_SUCCESS) {
dev_err(adap->pdev_dev,
@@ -2798,8 +2819,8 @@ static int cxgb4_mgmt_set_vf_rate(struct net_device *dev, 
int vf,
&fw_class);
if (ret) {
dev_err(adap->pdev_dev,
-   "Err %d in binding VF %d to Traffic Class %d\n",
-   ret, vf, class_id);
+   "Err %d in binding PF %d VF %d to Traffic Class %d\n",
+   ret, adap->pf, vf, class_id);
return -EINVAL;
}
dev_info(adap->pdev_dev, "PF %d VF %d is bound to Class %d\n",
-- 
2.1.0

Re: [RFT net-next] net: stmmac: Rework coalesce timer and fix multi-queue races

2018-09-03 Thread Jerome Brunet

On Mon, 2018-09-03 at 10:36 +0100, Jose Abreu wrote:
> Hi Jerome,
> 
> On 03-09-2018 09:56, Jerome Brunet wrote:
> > On Thu, 2018-08-30 at 11:37 +0100, Jose Abreu wrote:
> > > [ As for now this is only for testing! ]
> > > 
> > > This follows David Miller advice and tries to fix coalesce timer in
> > > multi-queue scenarios.
> > > 
> > > We are now using per-queue coalesce values and per-queue TX timer. This
> > > assumes that tx_queues == rx_queues, which can not be necessarly true.
> > > Official patch will need to have this fixed.
> > > 
> > > Coalesce timer default values was changed to 1ms and the coalesce frames
> > > to 25.
> > > 
> > > Tested in B2B setup between XGMAC2 and GMAC5.
> > 
> > Tested on Amlogic meson-axg-s400. No regression seen so far.
> > (arch/arm64/boot/dts/amlogic/meson-axg-s400.dts)
> > 
> > As far as I understand from the device tree parsing, this platform (and all
> > other amlogic platforms) use single queue.
> 
> Thanks for testing! I will send a formal patch once I get around
> the problem of rx queues != tx queues.
> 
> > 
> > ---
> > 
> > Jose,
> > 
> > On another topic doing iperf3 test on amlogic's devices we seen a strange
> > behavior.
> > 
> > Doing Tx or Rx test usually works fine (700MBps to 900MBps depending on the
> > platform). However, when doing both Rx and Tx at the same time, We see the 
> > Tx
> > throughput dropping significantly (~30MBps) and lot of TCP retries.
> > 
> > Would you any idea what might be our problem ? or how to start investigating
> > this ?
> > 
> 
> I'm not able to reproduce this here but I'm using multiple queue.
> I will try with single queue. In the meantime please try this
> patch (it shall be applied directly on top of this RFT):

No notable change. Rx is fine but Tx:
[  5]   3.00-4.00   sec  3.55 MBytes  29.8 Mbits/sec   51   12.7 KBytes

I suppose the problem as something to do with the retries. When doing Tx test
alone, we don't have such a things a throughput where we expect it to be.

By the way, your mailer (and its auto 80 column rule I suppose) made the patch
below a bit harder to apply

> 
> 
> --->8
> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> index ae26a6e8608e..1407975320aa 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> @@ -2210,8 +2210,7 @@ static int stmmac_init_dma_engine(struct
> stmmac_priv *priv)
> stmmac_init_tx_chan(priv, priv->ioaddr,
> priv->plat->dma_cfg,
> tx_q->dma_tx_phy, chan);
>  
> -   tx_q->tx_tail_addr = tx_q->dma_tx_phy +
> -   (DMA_TX_SIZE * sizeof(struct dma_desc));
> +   tx_q->tx_tail_addr = tx_q->dma_tx_phy;
> stmmac_set_tx_tail_ptr(priv, priv->ioaddr,
>tx_q->tx_tail_addr, chan);
> }
> @@ -3004,6 +3003,7 @@ static netdev_tx_t stmmac_tso_xmit(struct
> sk_buff *skb, struct net_device *dev)
>  
> netdev_tx_sent_queue(netdev_get_tx_queue(dev, queue),
> skb->len);
>  
> +   tx_q->tx_tail_addr = tx_q->dma_tx_phy + (tx_q->cur_tx *
> sizeof(*desc));
> stmmac_set_tx_tail_ptr(priv, priv->ioaddr,
> tx_q->tx_tail_addr, queue);
>  
> if (priv->tx_coal_timer && !tx_q->tx_timer_active) {
> @@ -3223,6 +3223,8 @@ static netdev_tx_t stmmac_xmit(struct
> sk_buff *skb, struct net_device *dev)
> netdev_tx_sent_queue(netdev_get_tx_queue(dev, queue),
> skb->len);
>  
> stmmac_enable_dma_transmission(priv, priv->ioaddr);
> +
> +   tx_q->tx_tail_addr = tx_q->dma_tx_phy + (tx_q->cur_tx *
> sizeof(*desc));
> stmmac_set_tx_tail_ptr(priv, priv->ioaddr,
> tx_q->tx_tail_addr, queue);
>  
> if (priv->tx_coal_timer && !tx_q->tx_timer_active) {
> --->8
> 
> Thanks and Best Regards,
> Jose Miguel Abreu

Re: phys_port_id in switchdev mode?

2018-09-03 Thread Or Gerlitz

On Fri, Aug 31, 2018 at 11:13 PM, Marcelo Ricardo Leitner
 wrote:
> On Tue, Aug 28, 2018 at 08:43:51PM +0200, Jakub Kicinski wrote:
>> Ugh, CC: netdev..
>>
>> On Tue, 28 Aug 2018 20:05:39 +0200, Jakub Kicinski wrote:
>> > Hi!
>> >
>> > I wonder if we can use phys_port_id in switchdev to group together
>> > interfaces of a single PCI PF?  Here is the problem:
>
> On Mellanox cards, this is already possible via phys_switch_id, as
> each PF has its own phys_switch_id. So all VFs with a given
> phys_switch_id belong to the PF with that same phys_switch_id.

This is due to the fact that currently when getting into switchdev mode
the PF netdev becomes the uplink representor. This is problematic and we
are working to have an uplink repr as nfp and others have. Bottom line,
this is not the correct way to group PF with it's VFs, switch id is something
that relates to switch port reprs not the entities behind them.

Re: phys_port_id in switchdev mode?

2018-09-03 Thread Or Gerlitz

On Tue, Aug 28, 2018 at 9:05 PM, Jakub Kicinski
 wrote:
> Hi!


Hi Jakub and sorry for the late reply, this crazigly hot summer refuses to die,

Note I replied couple of minutes ago but it didn't get to the list, so
lets take it from this one:

> I wonder if we can use phys_port_id in switchdev to group together
> interfaces of a single PCI PF?  Here is the problem:
>
> With a mix of PF and VF interfaces it gets increasingly difficult to
> figure out which one corresponds to which PF.  We can identify which
> *representor* is which, by means of phys_port_name and devlink
> flavours.  But if the actual VF/PF interfaces are also present on the
> same host, it gets confusing when one tries to identify the PF they
> came from.  Generally one has to resort of matching between PCI DBDF of
> the PF and VFs or read relevant info out of ethtool -i.
>
> In multi host scenario this is particularly painful, as there seems to
> be no immediately obvious way to match PCI interface ID of a card (0,
> 1, 2, 3, 4...) to the DBDF we have connected.
>
> Another angle to this is legacy SR-IOV NDOs.  User space picks a netdev
> from /sys/bus/pci/$VF_DBDF/physfn/net/ to run the NDOs on in somehow
> random manner, which means we have to provide those for all devices with
> link to the PF (all reprs).  And we have to link them (a) because it's
> right (tm) and (b) to get correct naming.


wait, as you commented in later, not only the mellanox vf reprs but rather also
the nfp vf reprs are not linked to the PF, because ip link output
grows quadratically.

> The only reliable way to make
> user space (libvirt) choose the repr it should run the NDOs on (which is
> IMHO the corresponding PF repr) is to set phys_port_id on actual VFs,
> VF reprs, PFs and PF reprs to a value corresponding to the *PCI PF*,
> not the external/Ethernet port when in switchdev mode.  User space
> should understand phys_port_id in this context, given it was originally
> introduced for matching VFs to ports.


Using phy_port_id to match/group VFs to PFs makes sense to me.

So what would be the libvirt use case you envision that needs
the VF and PF reprs to support that as well? or maybe you were
not referring to libvirt but to some other provisioning element? I need
to refresh my memory on that area.

Re: [RFT net-next] net: stmmac: Rework coalesce timer and fix multi-queue races

2018-09-03 Thread Jose Abreu

Hi Jerome,

On 03-09-2018 09:56, Jerome Brunet wrote:
> On Thu, 2018-08-30 at 11:37 +0100, Jose Abreu wrote:
>> [ As for now this is only for testing! ]
>>
>> This follows David Miller advice and tries to fix coalesce timer in
>> multi-queue scenarios.
>>
>> We are now using per-queue coalesce values and per-queue TX timer. This
>> assumes that tx_queues == rx_queues, which can not be necessarly true.
>> Official patch will need to have this fixed.
>>
>> Coalesce timer default values was changed to 1ms and the coalesce frames
>> to 25.
>>
>> Tested in B2B setup between XGMAC2 and GMAC5.
> Tested on Amlogic meson-axg-s400. No regression seen so far.
> (arch/arm64/boot/dts/amlogic/meson-axg-s400.dts)
>
> As far as I understand from the device tree parsing, this platform (and all
> other amlogic platforms) use single queue.

Thanks for testing! I will send a formal patch once I get around
the problem of rx queues != tx queues.

>
> ---
>
> Jose,
>
> On another topic doing iperf3 test on amlogic's devices we seen a strange
> behavior.
>
> Doing Tx or Rx test usually works fine (700MBps to 900MBps depending on the
> platform). However, when doing both Rx and Tx at the same time, We see the Tx
> throughput dropping significantly (~30MBps) and lot of TCP retries.
>
> Would you any idea what might be our problem ? or how to start investigating
> this ?
>

I'm not able to reproduce this here but I'm using multiple queue.
I will try with single queue. In the meantime please try this
patch (it shall be applied directly on top of this RFT):


--->8
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index ae26a6e8608e..1407975320aa 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -2210,8 +2210,7 @@ static int stmmac_init_dma_engine(struct
stmmac_priv *priv)
stmmac_init_tx_chan(priv, priv->ioaddr,
priv->plat->dma_cfg,
tx_q->dma_tx_phy, chan);
 
-   tx_q->tx_tail_addr = tx_q->dma_tx_phy +
-   (DMA_TX_SIZE * sizeof(struct dma_desc));
+   tx_q->tx_tail_addr = tx_q->dma_tx_phy;
stmmac_set_tx_tail_ptr(priv, priv->ioaddr,
   tx_q->tx_tail_addr, chan);
}
@@ -3004,6 +3003,7 @@ static netdev_tx_t stmmac_tso_xmit(struct
sk_buff *skb, struct net_device *dev)
 
netdev_tx_sent_queue(netdev_get_tx_queue(dev, queue),
skb->len);
 
+   tx_q->tx_tail_addr = tx_q->dma_tx_phy + (tx_q->cur_tx *
sizeof(*desc));
stmmac_set_tx_tail_ptr(priv, priv->ioaddr,
tx_q->tx_tail_addr, queue);
 
if (priv->tx_coal_timer && !tx_q->tx_timer_active) {
@@ -3223,6 +3223,8 @@ static netdev_tx_t stmmac_xmit(struct
sk_buff *skb, struct net_device *dev)
netdev_tx_sent_queue(netdev_get_tx_queue(dev, queue),
skb->len);
 
stmmac_enable_dma_transmission(priv, priv->ioaddr);
+
+   tx_q->tx_tail_addr = tx_q->dma_tx_phy + (tx_q->cur_tx *
sizeof(*desc));
stmmac_set_tx_tail_ptr(priv, priv->ioaddr,
tx_q->tx_tail_addr, queue);
 
if (priv->tx_coal_timer && !tx_q->tx_timer_active) {
--->8

Thanks and Best Regards,
Jose Miguel Abreu

Re: [PATCH net-next] veth: report NEWLINK event when moving the peer device in a new namespace

2018-09-03 Thread Thomas Haller

Hi,

On Sat, 2018-09-01 at 17:45 -0600, David Ahern wrote:
> On 9/1/18 3:05 AM, Lorenzo Bianconi wrote:
> > 
> > I was thinking about the commit 38e01b30563a and then I realized I
> > misread the code
> > yesterday. The commit 38e01b30563a provides all relevant info but
> > it
> > emits the event
> > for veth1 (the device moved in the new namespace).
> > An userspace application will not receive that message if it
> > filters
> > events for just
> > a specific device (veth0 in this case) despite that some device
> > properties have changed
> > (since veth0 and veth1 are paired devices). To fix that behavior in
> > veth_notify routine
> > I emits a RTM_NEWLINK event for veth0.
> 
> Userspace is managing a veth a pair. It moves one of them to a new
> namespace and decides to filter messages related to that device
> before
> the move. If it did not filter the DELLINK message it would get the
> information you want. That to me is a userspace problem. What am I
> missing?

The userspace component which moves the veth peer is likely not the
same that is listening to these events.

> Fundamentally, your proposal means 3 messages when moving a device to
> a
> new namespace which is what I think is unnecessary.

You are correct, the necessary information is signaled in this case.

Usually, one can manage the information about one link by considering
only RTM_NEWLINK/RTM_DELLINK message for that link. That seems
desirable behavior of the netlink API, as it simplifies user space
implementations.

For example, libnl3's cache for links doesn't properly handles this,
and you end up with invalid data in the cache. You may call that a
userspace problem and bug. But does it really need to be that
complicated?

FWIW, a similar thing was also NACK'ed earlier:
  http://lists.openwall.net/netdev/2016/06/12/8

Paolo and Lorenzo pointed out another issue when moving the peer to a
third namespace:

>>>

ip netns del ns1
ip netns del ns2
ip netns del ns3
ip netns add ns1
ip netns add ns2
ip netns add ns3

ip -n ns1 link add dev veth0 type veth peer name veth1
ip -n ns1 monitor link &

ip -n ns1 link set veth1 netns ns2
# Deleted 9: veth1@veth0:  mtu 1500 qdisc noop 
state DOWN group default 
#link/ether 86:79:1d:c6:45:bc brd ff:ff:ff:ff:ff:ff new-netnsid 0 
new-ifindex 9

ip -n ns2 link set veth1 netns ns3
# no event.

>>

best,
Thomas

signature.asc
Description: This is a digitally signed message part

Re: [RFT net-next] net: stmmac: Rework coalesce timer and fix multi-queue races

2018-09-03 Thread Jerome Brunet

On Thu, 2018-08-30 at 11:37 +0100, Jose Abreu wrote:
> [ As for now this is only for testing! ]
> 
> This follows David Miller advice and tries to fix coalesce timer in
> multi-queue scenarios.
> 
> We are now using per-queue coalesce values and per-queue TX timer. This
> assumes that tx_queues == rx_queues, which can not be necessarly true.
> Official patch will need to have this fixed.
> 
> Coalesce timer default values was changed to 1ms and the coalesce frames
> to 25.
> 
> Tested in B2B setup between XGMAC2 and GMAC5.

Tested on Amlogic meson-axg-s400. No regression seen so far.
(arch/arm64/boot/dts/amlogic/meson-axg-s400.dts)

As far as I understand from the device tree parsing, this platform (and all
other amlogic platforms) use single queue.

---

Jose,

On another topic doing iperf3 test on amlogic's devices we seen a strange
behavior.

Doing Tx or Rx test usually works fine (700MBps to 900MBps depending on the
platform). However, when doing both Rx and Tx at the same time, We see the Tx
throughput dropping significantly (~30MBps) and lot of TCP retries.

Would you any idea what might be our problem ? or how to start investigating
this ?

[PATCH net 1/3] bnxt_en: Fix firmware signaled resource change logic in open.

2018-09-03 Thread Michael Chan

When the driver detects that resources have changed during open, it
should reset the rx and tx rings to 0.  This will properly setup the
init sequence to initialize the default rings again.  We also need
to signal the RDMA driver to stop and clear its interrupts.  We then
call the RoCE driver to restart if a new set of default rings is
successfully reserved.

Fixes: 25e1acd6b92b ("bnxt_en: Notify firmware about IF state changes.")
Signed-off-by: Michael Chan 
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c 
b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 8bb1e38..6a1baf3 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -6684,6 +6684,8 @@ static int bnxt_hwrm_if_change(struct bnxt *bp, bool up)
hw_resc->resv_rx_rings = 0;
hw_resc->resv_hw_ring_grps = 0;
hw_resc->resv_vnics = 0;
+   bp->tx_nr_rings = 0;
+   bp->rx_nr_rings = 0;
}
return rc;
 }
@@ -8769,20 +8771,25 @@ static int bnxt_init_dflt_ring_mode(struct bnxt *bp)
if (bp->tx_nr_rings)
return 0;
 
+   bnxt_ulp_irq_stop(bp);
+   bnxt_clear_int_mode(bp);
rc = bnxt_set_dflt_rings(bp, true);
if (rc) {
netdev_err(bp->dev, "Not enough rings available.\n");
-   return rc;
+   goto init_dflt_ring_err;
}
rc = bnxt_init_int_mode(bp);
if (rc)
-   return rc;
+   goto init_dflt_ring_err;
+
bp->tx_nr_rings_per_tc = bp->tx_nr_rings;
if (bnxt_rfs_supported(bp) && bnxt_rfs_capable(bp)) {
bp->flags |= BNXT_FLAG_RFS;
bp->dev->features |= NETIF_F_NTUPLE;
}
-   return 0;
+init_dflt_ring_err:
+   bnxt_ulp_irq_restart(bp, rc);
+   return rc;
 }
 
 int bnxt_restore_pf_fw_resources(struct bnxt *bp)
-- 
2.5.1

[PATCH net 3/3] bnxt_en: Do not adjust max_cp_rings by the ones used by RDMA.

2018-09-03 Thread Michael Chan

Currently, the driver adjusts the bp->hw_resc.max_cp_rings by the number
of MSIX vectors used by RDMA.  There is one code path in open that needs
to check the true max_cp_rings including any used by RDMA.  This code
is now checking for the reduced max_cp_rings which will fail when the
number of cp rings is very small.

To fix this in a clean way, we don't adjust max_cp_rings anymore.
Instead, we add a helper bnxt_get_max_func_cp_rings_for_en() to get the
reduced max_cp_rings when appropriate.

Fixes: ec86f14ea506 ("bnxt_en: Add ULP calls to stop and restart IRQs.")
Signed-off-by: Michael Chan 
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c   | 7 ---
 drivers/net/ethernet/broadcom/bnxt/bnxt.h   | 2 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c | 7 ---
 drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c   | 5 -
 4 files changed, 9 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c 
b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 6472ce4..cecbb1d 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -5913,9 +5913,9 @@ unsigned int bnxt_get_max_func_cp_rings(struct bnxt *bp)
return bp->hw_resc.max_cp_rings;
 }
 
-void bnxt_set_max_func_cp_rings(struct bnxt *bp, unsigned int max)
+unsigned int bnxt_get_max_func_cp_rings_for_en(struct bnxt *bp)
 {
-   bp->hw_resc.max_cp_rings = max;
+   return bp->hw_resc.max_cp_rings - bnxt_get_ulp_msix_num(bp);
 }
 
 static unsigned int bnxt_get_max_func_irqs(struct bnxt *bp)
@@ -8631,7 +8631,8 @@ static void _bnxt_get_max_rings(struct bnxt *bp, int 
*max_rx, int *max_tx,
 
*max_tx = hw_resc->max_tx_rings;
*max_rx = hw_resc->max_rx_rings;
-   *max_cp = min_t(int, hw_resc->max_irqs, hw_resc->max_cp_rings);
+   *max_cp = min_t(int, bnxt_get_max_func_cp_rings_for_en(bp),
+   hw_resc->max_irqs);
*max_cp = min_t(int, *max_cp, hw_resc->max_stat_ctxs);
max_ring_grps = hw_resc->max_hw_ring_grps;
if (BNXT_CHIP_TYPE_NITRO_A0(bp) && BNXT_PF(bp)) {
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h 
b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index c4c77b9..bde3846 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -1481,7 +1481,7 @@ int bnxt_hwrm_set_coal(struct bnxt *);
 unsigned int bnxt_get_max_func_stat_ctxs(struct bnxt *bp);
 void bnxt_set_max_func_stat_ctxs(struct bnxt *bp, unsigned int max);
 unsigned int bnxt_get_max_func_cp_rings(struct bnxt *bp);
-void bnxt_set_max_func_cp_rings(struct bnxt *bp, unsigned int max);
+unsigned int bnxt_get_max_func_cp_rings_for_en(struct bnxt *bp);
 int bnxt_get_avail_msix(struct bnxt *bp, int num);
 int bnxt_reserve_rings(struct bnxt *bp);
 void bnxt_tx_disable(struct bnxt *bp);
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c 
b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
index 6d583bc..fcd085a 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c
@@ -451,7 +451,7 @@ static int bnxt_hwrm_func_vf_resc_cfg(struct bnxt *bp, int 
num_vfs)
 
bnxt_hwrm_cmd_hdr_init(bp, &req, HWRM_FUNC_VF_RESOURCE_CFG, -1, -1);
 
-   vf_cp_rings = hw_resc->max_cp_rings - bp->cp_nr_rings;
+   vf_cp_rings = bnxt_get_max_func_cp_rings_for_en(bp) - bp->cp_nr_rings;
vf_stat_ctx = hw_resc->max_stat_ctxs - bp->num_stat_ctxs;
if (bp->flags & BNXT_FLAG_AGG_RINGS)
vf_rx_rings = hw_resc->max_rx_rings - bp->rx_nr_rings * 2;
@@ -549,7 +549,8 @@ static int bnxt_hwrm_func_cfg(struct bnxt *bp, int num_vfs)
max_stat_ctxs = hw_resc->max_stat_ctxs;
 
/* Remaining rings are distributed equally amongs VF's for now */
-   vf_cp_rings = (hw_resc->max_cp_rings - bp->cp_nr_rings) / num_vfs;
+   vf_cp_rings = (bnxt_get_max_func_cp_rings_for_en(bp) -
+  bp->cp_nr_rings) / num_vfs;
vf_stat_ctx = (max_stat_ctxs - bp->num_stat_ctxs) / num_vfs;
if (bp->flags & BNXT_FLAG_AGG_RINGS)
vf_rx_rings = (hw_resc->max_rx_rings - bp->rx_nr_rings * 2) /
@@ -643,7 +644,7 @@ static int bnxt_sriov_enable(struct bnxt *bp, int *num_vfs)
 */
vfs_supported = *num_vfs;
 
-   avail_cp = hw_resc->max_cp_rings - bp->cp_nr_rings;
+   avail_cp = bnxt_get_max_func_cp_rings_for_en(bp) - bp->cp_nr_rings;
avail_stat = hw_resc->max_stat_ctxs - bp->num_stat_ctxs;
avail_cp = min_t(int, avail_cp, avail_stat);
 
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c 
b/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c
index deac73e..beee612 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c
@@ -169,7 +169,6 @@ static int bnxt_req_msix_vecs(struct bnxt_en_dev *edev, int 
ulp_id,
edev->ulp_tbl[ulp_id].msix_requested = avail_msix;
}
bnxt_fill_msix_vecs(bp, ent

1 2 >

1 - 100 of 117 matches

Mail list logo