date:20181025

Re: [PATCH net-next] net/ipv6: Block IPv6 addrconf on team ports

2018-10-25 Thread Jiri Pirko

Fri, Oct 26, 2018 at 12:40:31AM CEST, jay.vosbu...@canonical.com wrote:
>Chas Williams <3ch...@gmail.com> wrote:
>
>>On 10/25/2018 05:59 PM, Jay Vosburgh wrote:
>>> Chas Williams <3ch...@gmail.com> wrote:
>>>
>>>> netif_is_lag_port should be used to identify link aggregation ports.
>>>> For this to work, we need to reorganize the bonding and team drivers
>>>> so that the necessary flags are set before dev_open is called.
>>>>
>>>> commit 31e77c93e432 ("sched/fair: Update blocked load when newly idle")
>>>> made this decision originally based on the IFF_SLAVE flag which isn't
>>>> used by the team driver.  Note, we do need to retain the IFF_SLAVE
>>>> check for the eql driver.
>>>
>>> Is 31e77c93e432 the correct commit reference?  I don't see
>>> anything in there about IFF_SLAVE or bonding; it's a patch to the
>>> process scheduler.
>>
>>No, that's wrong.  It should be c2edacf80e155.
>>
>>> And, as Jiri said, the subject doesn't mention bonding.
>>
>>The behavior of bonding wasn't changed.  The intent of the patch
>>is to add team slaves to the interfaces that don't get automatic
>>IPv6 addresses.  The body discusses why bonding had to change as
>>well.
>
>   Sure, but the bonding code has changed, and the current
>presentation makes it harder for reviewers to follow (or perhaps even
>notice).
>
>>I was under the impression that the subject needs to be kept short.
>>Is there a better way to phrase what I want to do?
>
>   I'd suggest splitting this into three patches: A first patch
>that adds the new IPv6 functionality, then one patch each for team and
>bonding to take advantage of that new functionality.  Each of the three
>would then be very straightforward, change just one thing, and should be
>clearer all around.

+1

[PATCH net] ipv4/igmp: fix v1/v2 switchback timeout based on rfc3376, 8.12

2018-10-25 Thread Hangbin Liu

Similar to ipv6 mcast commit 89225d1ce6af3 ("net: ipv6: mld: fix v1/v2
switchback timeout to rfc3810, 9.12.")

i) RFC3376 8.12. Older Version Querier Present Timeout says:

   The Older Version Querier Interval is the time-out for transitioning
   a host back to IGMPv3 mode once an older version query is heard.
   When an older version query is received, hosts set their Older
   Version Querier Present Timer to Older Version Querier Interval.

   This value MUST be ((the Robustness Variable) times (the Query
   Interval in the last Query received)) plus (one Query Response
   Interval).

Currently we only use the hardcoded value IGMP_V1/V2_ROUTER_PRESENT_TIMEOUT.
Fix it by adding two new fields, mr_qi (Query Interval) and mr_qri (Query
Response Interval), to struct in_device.

Now we can calculate the switchback time via (mr_qrv * mr_qi) + mr_qri.
We need to update these values when receiving IGMPv3 queries.
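
As a rough illustration of the calculation above, here is a minimal
userspace sketch in C. It works in seconds rather than jiffies. The function
names and the DEFAULT_* constants are made up for this example and are not
kernel identifiers; DEFAULT_QRV stands in for the sysctl value the patch
falls back to. The QQIC decoding follows my reading of RFC 3376, 4.1.7, and
the zero fallback and the clamp of the Query Response Interval mirror what
the patch does in igmp_heard_query().

```c
/* RFC 3376 default values, in seconds (assumed here for illustration). */
#define DEFAULT_QRV 2u   /* Robustness Variable */
#define DEFAULT_QI  125u /* Query Interval */

/* Decode a QQIC byte into seconds: values below 128 are literal,
 * larger values carry a 3-bit exponent and a 4-bit mantissa.
 */
static unsigned int qqic_to_seconds(unsigned char qqic)
{
	unsigned int mant, exp;

	if (qqic < 128)
		return qqic;

	mant = qqic & 0x0f;
	exp = (qqic >> 4) & 0x07;
	return (mant | 0x10) << (exp + 3);
}

/* Older Version Querier Present Timeout:
 * (Robustness Variable * Query Interval) + Query Response Interval.
 * A zero qrv or qi means "use the default"; qri is kept below qi.
 */
static unsigned int older_version_querier_timeout(unsigned int qrv,
						  unsigned int qi,
						  unsigned int qri)
{
	if (!qrv)
		qrv = DEFAULT_QRV;
	if (!qi)
		qi = DEFAULT_QI;
	if (qri >= qi)
		qri = qi - 1;

	return qrv * qi + qri;
}
```

With a Robustness Variable of 2, a Query Interval of 125 seconds and a Query
Response Interval of 10 seconds, this gives 260 seconds instead of the
previously hardcoded 400.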

Reported-by: Ying Xu 
Signed-off-by: Hangbin Liu 
---
 include/linux/inetdevice.h |  4 +++-
 net/ipv4/igmp.c            | 53 +++---
 2 files changed, 39 insertions(+), 18 deletions(-)

diff --git a/include/linux/inetdevice.h b/include/linux/inetdevice.h
index c759d1c..a64f21a 100644
--- a/include/linux/inetdevice.h
+++ b/include/linux/inetdevice.h
@@ -37,7 +37,9 @@ struct in_device {
unsigned long   mr_v1_seen;
unsigned long   mr_v2_seen;
unsigned long   mr_maxdelay;
-   unsigned char   mr_qrv;
+   unsigned long   mr_qi;  /* Query Interval */
+   unsigned long   mr_qri; /* Query Response Interval */
+   unsigned char   mr_qrv; /* Query Robustness Variable */
unsigned char   mr_gq_running;
unsigned char   mr_ifc_count;
struct timer_list   mr_gq_timer;/* general query timer */
diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c
index 4da3944..765b2b3 100644
--- a/net/ipv4/igmp.c
+++ b/net/ipv4/igmp.c
@@ -111,13 +111,10 @@
 #ifdef CONFIG_IP_MULTICAST
 /* Parameter names and values are taken from igmp-v2-06 draft */
 
-#define IGMP_V1_ROUTER_PRESENT_TIMEOUT (400*HZ)
-#define IGMP_V2_ROUTER_PRESENT_TIMEOUT (400*HZ)
 #define IGMP_V2_UNSOLICITED_REPORT_INTERVAL(10*HZ)
 #define IGMP_V3_UNSOLICITED_REPORT_INTERVAL(1*HZ)
+#define IGMP_QUERY_INTERVAL(125*HZ)
 #define IGMP_QUERY_RESPONSE_INTERVAL   (10*HZ)
-#define IGMP_QUERY_ROBUSTNESS_VARIABLE 2
-
 
 #define IGMP_INITIAL_REPORT_DELAY  (1)
 
@@ -935,13 +932,15 @@ static bool igmp_heard_query(struct in_device *in_dev, struct sk_buff *skb,
 
max_delay = IGMP_QUERY_RESPONSE_INTERVAL;
in_dev->mr_v1_seen = jiffies +
-   IGMP_V1_ROUTER_PRESENT_TIMEOUT;
+   (in_dev->mr_qrv * in_dev->mr_qi) +
+   in_dev->mr_qri;
group = 0;
} else {
/* v2 router present */
max_delay = ih->code*(HZ/IGMP_TIMER_SCALE);
in_dev->mr_v2_seen = jiffies +
-   IGMP_V2_ROUTER_PRESENT_TIMEOUT;
+   (in_dev->mr_qrv * in_dev->mr_qi) +
+   in_dev->mr_qri;
}
/* cancel the interface change timer */
in_dev->mr_ifc_count = 0;
@@ -981,8 +980,21 @@ static bool igmp_heard_query(struct in_device *in_dev, struct sk_buff *skb,
if (!max_delay)
max_delay = 1;  /* can't mod w/ 0 */
in_dev->mr_maxdelay = max_delay;
-   if (ih3->qrv)
-   in_dev->mr_qrv = ih3->qrv;
+
+   /* RFC3376, 4.1.6. QRV and 4.1.7. QQIC, when the most recently
+* received value was zero, use the default or statically
+* configured value.
+*/
+   in_dev->mr_qrv = ih3->qrv ?: net->ipv4.sysctl_igmp_qrv;
+   in_dev->mr_qi = IGMPV3_QQIC(ih3->qqic)*HZ ?: IGMP_QUERY_INTERVAL;
+
+   /* RFC3376, 8.3. Query Response Interval:
+* The number of seconds represented by the [Query Response
+* Interval] must be less than the [Query Interval].
+*/
+   if (in_dev->mr_qri >= in_dev->mr_qi)
+   in_dev->mr_qri = (in_dev->mr_qi/HZ - 1)*HZ;
+
if (!group) { /* general query */
if (ih3->nsrcs)
return true;/* no sources allowed */
@@ -1723,18 +1735,30 @@ void ip_mc_down(struct in_device *in_dev)
ip_mc_dec_group(in_dev, IGMP_ALL_HOSTS);
 }
 
-void ip_mc_init_dev(struct in_device *in_dev)
-{
 #ifdef CONFIG_IP_MULTICAST
+static void ip_mc_reset(struct in_device *in_dev)
+{
struct net *net = dev

Re: [PATCH v3 2/2] net: qcom/emac: add phy-handle support for ACPI

2018-10-25 Thread Wang, Dongsheng

On 2018/10/26 10:37, Timur Tabi wrote:
> On 10/25/18 9:18 PM, Wang, Dongsheng wrote:
>> But when I was reading Documentation/acpi/DSD-properties-rules.txt, my
>> understanding is we should try to conform to DT bindings. So maybe ACPI
>> doesn't have such a document, just DT bindings.
> There was an attempt to document DSDs, but it was abandoned after a while.
>
> https://github.com/ahs3/dsd
>

Yes, that was a database concept. I asked some Intel guys, and the answer
I got was that there is no such database or document. :(


Cheers,

Dongsheng

Re: [PATCH v3 2/2] net: qcom/emac: add phy-handle support for ACPI

2018-10-25 Thread Timur Tabi


On 10/25/18 9:18 PM, Wang, Dongsheng wrote:
> But when I was reading Documentation/acpi/DSD-properties-rules.txt, my
> understanding is we should try to conform to DT bindings. So maybe ACPI
> doesn't have such a document, just DT bindings.

There was an attempt to document DSDs, but it was abandoned after a while.

https://github.com/ahs3/dsd

[PATCH net] ipv6/mcast: update mc_qrv when join new group

2018-10-25 Thread Hangbin Liu

Currently we only set mc_qrv to sysctl_mld_qrv when the interface comes up.
If we change sysctl_mld_qrv after the interface is up, it has no effect.

Fix it by assigning the latest sysctl_mld_qrv to idev mc_qrv when joining a
new group.
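
The stale-value problem described above can be modelled in a few lines of
userspace C. This is only a sketch: sysctl_qrv_model, struct idev_model and
both functions are hypothetical stand-ins for the sysctl, the inet6 device
and the two code paths, not kernel code.

```c
/* Stand-in for the tunable; an administrator may change it at any time. */
static int sysctl_qrv_model = 2;

/* Stand-in for the per-interface state that caches the tunable. */
struct idev_model {
	int mc_qrv;
};

/* Interface comes up: the value is copied once. */
static void model_dev_up(struct idev_model *idev)
{
	idev->mc_qrv = sysctl_qrv_model;
}

/* Joining a new group: with the fix, the cached value is refreshed. */
static void model_group_added(struct idev_model *idev)
{
	idev->mc_qrv = sysctl_qrv_model;
}
```

If the tunable changes from 2 to 5 after model_dev_up(), the cached mc_qrv
stays at 2 until model_group_added() runs, which is the behaviour the patch
is after.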

Reported-by: Ying Xu 
Signed-off-by: Hangbin Liu 
---
 net/ipv6/mcast.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/ipv6/mcast.c b/net/ipv6/mcast.c
index dbab62e..bed4890 100644
--- a/net/ipv6/mcast.c
+++ b/net/ipv6/mcast.c
@@ -680,6 +680,7 @@ static void igmp6_group_added(struct ifmcaddr6 *mc)
if (!(dev->flags & IFF_UP) || (mc->mca_flags & MAF_NOREPORT))
return;
 
+   mc->idev->mc_qrv = sysctl_mld_qrv;
if (mld_in_v1_mode(mc->idev)) {
igmp6_join_group(mc);
return;
-- 
2.5.5

[PATCH net] bridge: do not add port to router list when receives query with source 0.0.0.0

2018-10-25 Thread Hangbin Liu

Based on RFC 4541, 2.1.1.  IGMP Forwarding Rules

  The switch supporting IGMP snooping must maintain a list of
  multicast routers and the ports on which they are attached.  This
  list can be constructed in any combination of the following ways:

  a) This list should be built by the snooping switch sending
 Multicast Router Solicitation messages as described in IGMP
 Multicast Router Discovery [MRDISC].  It may also snoop
 Multicast Router Advertisement messages sent by and to other
 nodes.

  b) The arrival port for IGMP Queries (sent by multicast routers)
 where the source address is not 0.0.0.0.

We should not add the port to the router list when it receives a query with
source 0.0.0.0.
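
A small userspace sketch of that rule, for illustration only. The types and
the function name are invented for this example (the real check in the patch
operates on struct br_ip); the IPv6 branch treats the unspecified address
(::) the same way, as the patch does with ipv6_addr_any().

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

enum model_proto { MODEL_IPV4, MODEL_IPV6 };

/* Hypothetical, simplified source-address holder. */
struct model_addr {
	enum model_proto proto;
	union {
		uint32_t ip4;
		uint8_t ip6[16];
	} u;
};

/* Return true when a query from this source should mark the arrival
 * port as a router port, i.e. when the source is not all zeroes.
 */
static bool query_marks_router_port(const struct model_addr *saddr)
{
	static const uint8_t any6[16]; /* zero-initialized: "::" */

	if (saddr->proto == MODEL_IPV4)
		return saddr->u.ip4 != 0;

	return memcmp(saddr->u.ip6, any6, sizeof(any6)) != 0;
}
```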

Reported-by: Ying Xu 
Signed-off-by: Hangbin Liu 
---
 net/bridge/br_multicast.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index 024139b..41cdafb 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -1422,7 +1422,15 @@ static void br_multicast_query_received(struct net_bridge *br,
return;
 
br_multicast_update_query_timer(br, query, max_delay);
-   br_multicast_mark_router(br, port);
+
+   /* Based on RFC4541, section 2.1.1 IGMP Forwarding Rules,
+* the arrival port for IGMP Queries where the source address
+* is 0.0.0.0 should not be added to router port list.
+*/
+   if ((saddr->proto == htons(ETH_P_IP) && saddr->u.ip4) ||
+   (saddr->proto == htons(ETH_P_IPV6) &&
+!ipv6_addr_any(&saddr->u.ip6)))
+   br_multicast_mark_router(br, port);
 }
 
 static void br_ip4_multicast_query(struct net_bridge *br,
-- 
2.5.5

Re: [PATCH v3 2/2] net: qcom/emac: add phy-handle support for ACPI

2018-10-25 Thread Wang, Dongsheng

On 2018/10/26 3:24, Andrew Lunn wrote:
> On Thu, Oct 25, 2018 at 06:09:15PM +0800, Wang Dongsheng wrote:
>> Use "phy-handle" to point to an internal MDIO device port.
> Hi Dongsheng
>
> You are basically defining how all future ACPI based MAC drivers get
> access to their PHY. This needs to become part of the ACPI standard,
> etc.
>
> This code should not be hidden away in the emac driver. It needs to be
> placed somewhere public so other drivers can use it. And it needs good
> documentation, including an example of what needs to go into the ACPI
> tables, etc.
Hi Andrew

I saw that AppliedMicro (apm) xgene already uses "phy-handle" for its ACPI
method, so I guessed that "phy-handle" had become part of the ACPI standard.
But I cannot be sure.
I tried to confirm that the property is defined in a document (like a DT
binding). However, I did not find any documentation describing the
property on the UEFI/ACPICA website.
But when I was reading Documentation/acpi/DSD-properties-rules.txt, my
understanding is we should try to conform to DT bindings. So maybe ACPI
doesn't have such a document, just DT bindings.

Cheers,
Dongsheng

> Thanks
>   Andrew
>

Re: [PATCH bpf 0/7] Batch of direct packet access fixes for BPF

2018-10-25 Thread Alexei Starovoitov

On Wed, Oct 24, 2018 at 10:05:42PM +0200, Daniel Borkmann wrote:
> Several fixes to get direct packet access in order from verifier
> side. Also test suite fix to run cg_skb as unpriv and an improvement
> to make direct packet write less error prone in future.

Applied, Thanks

Re: [PATCH] net/{ipv4,ipv6}: Do not put target net if input nsid is invalid

2018-10-25 Thread David Miller

From: Bjørn Mork 
Date: Thu, 25 Oct 2018 21:18:25 +0200

> The cleanup path will put the target net when netnsid is set.  So we must
> reset netnsid if the input is invalid.
> 
> Fixes: d7e38611b81e ("net/ipv4: Put target net when address dump fails due to bad attributes")
> Fixes: 242afaa6968c ("net/ipv6: Put target net when address dump fails due to bad attributes")
> Cc: David Ahern 
> Signed-off-by: Bjørn Mork 

Applied, thank you.

Re: [PATCH v1 net] lan743x: Remove SPI dependency from Microchip group.

2018-10-25 Thread David Miller

From: Bryan Whitehead 
Date: Thu, 25 Oct 2018 13:09:38 -0400

> The SPI dependency does not apply to lan743x driver, and other
> drivers in the group already state their dependence on SPI.
> 
> Signed-off-by: Bryan Whitehead 

Yep, makes sense.

Applied, thanks.

Re: [PATCH net] drivers: net: remove inclusion when not needed

2018-10-25 Thread David Miller

From: Eric Dumazet 
Date: Thu, 25 Oct 2018 06:42:12 -0700

> Drivers using generic NAPI interface no longer need to include
> , since busy polling was moved to core networking
> stack long ago.
> 
> See commit 79e7fff47b7b ("net: remove support for per driver
> ndo_busy_poll()") for reference.
> 
> Signed-off-by: Eric Dumazet 

Applied, thanks Eric.

Re: [PATCH net] net: phy: genphy_10g_driver: Avoid NULL pointer dereference

2018-10-25 Thread David Miller

From: Andrew Lunn 
Date: Thu, 25 Oct 2018 14:42:38 +0200

> This driver got missed during the recent change of .features from a
> u32 to a pointer to a Linux bitmap. Change the initialisation from 0
> to PHY_10GBIT_FEATURES so removing the danger of a NULL pointer
> dereference.
> 
> Fixes: 719655a14971 ("net: phy: Replace phy driver features u32 with link_mode bitmap")
> Reported-by: Jose Abreu 
> Signed-off-by: Andrew Lunn 

Applied, thanks Andrew.

Re: [PATCH net] r8169: fix broken Wake-on-LAN from S5 (poweroff)

2018-10-25 Thread David Miller

From: Heiner Kallweit 
Date: Thu, 25 Oct 2018 18:40:19 +0200

> It was reported that WoL from S5 is broken (WoL from S3 works) and the
> analysis showed that during system shutdown the network interface was
> brought down already when the actual kernel shutdown started.
> Therefore netif_running() returned false and as a consequence the PHY
> was suspended. Obviously WoL wasn't working then.
> To fix this the original patch needs to be effectively reverted.
> A side effect is that when normally bringing down the interface and
> WoL is enabled the PHY will remain powered on (like it was before the
> original patch).
> 
> Fixes: fe87bef01f9b ("r8169: don't check WoL when powering down PHY and interface is down")
> Reported-by: Neil MacLeod 
> Signed-off-by: Heiner Kallweit 

Applied and queued up for -stable, thanks.

Re: [PATCH v2 bpf] bpf: devmap: fix wrong interface selection in notifier_call

2018-10-25 Thread Daniel Borkmann

On 10/24/2018 01:15 PM, Taehee Yoo wrote:
> dev_map_notification() removes an interface from the devmap if the
> unregistering interface's ifindex matches.
> But checking only the ifindex is not enough, because another netns can
> have the same ifindex, so the wrong interface could be selected.
> Hence a netdev pointer comparison is added.
> 
> v2: compare netdev pointer instead of using net_eq() (Daniel Borkmann)
> v1: Initial patch
> 
> Fixes: 2ddf71e23cc2 ("net: add notifier hooks for devmap bpf map")
> Signed-off-by: Taehee Yoo 

Applied to bpf, thanks Taehee!
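
For readers following along, the issue in the quoted description can be
shown with a toy userspace model in C. Everything here (struct model_netdev,
MAP_SLOTS, model_unregister) is hypothetical and only mimics the idea of the
v2 fix: match map entries by device pointer, since an ifindex is only unique
within one netns.

```c
#include <stddef.h>

/* Toy device: the ifindex may repeat across namespaces. */
struct model_netdev {
	int ifindex;
	int netns;
};

#define MAP_SLOTS 4

/* Drop map entries that refer to the device being unregistered.
 * Entries are compared by pointer, so another device that merely
 * shares the ifindex is left alone. Returns the number of slots cleared.
 */
static int model_unregister(struct model_netdev *map[MAP_SLOTS],
			    const struct model_netdev *gone)
{
	int cleared = 0;

	for (int i = 0; i < MAP_SLOTS; i++) {
		if (map[i] == NULL || map[i] != gone)
			continue;
		map[i] = NULL;
		cleared++;
	}
	return cleared;
}
```

Two devices with ifindex 3 in different namespaces are distinct objects, so
unregistering one no longer removes the other from the map.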

Re: [PATCH] selftests/bpf: add config fragments BPF_STREAM_PARSER and XDP_SOCKETS

2018-10-25 Thread Daniel Borkmann

On 10/25/2018 04:47 PM, Naresh Kamboju wrote:
> BPF sockmap and hashmap are dependent on CONFIG_BPF_STREAM_PARSER and
> xskmap is dependent on CONFIG_XDP_SOCKETS
> 
> Signed-off-by: Naresh Kamboju 

Applied to bpf, thanks Naresh!

Re: [PATCH net-next] net/ipv6: Block IPv6 addrconf on team ports

2018-10-25 Thread Jay Vosburgh

Chas Williams <3ch...@gmail.com> wrote:

>On 10/25/2018 05:59 PM, Jay Vosburgh wrote:
>> Chas Williams <3ch...@gmail.com> wrote:
>>
>>> netif_is_lag_port should be used to identify link aggregation ports.
>>> For this to work, we need to reorganize the bonding and team drivers
>>> so that the necessary flags are set before dev_open is called.
>>>
>>> commit 31e77c93e432 ("sched/fair: Update blocked load when newly idle")
>>> made this decision originally based on the IFF_SLAVE flag which isn't
>>> used by the team driver.  Note, we do need to retain the IFF_SLAVE
>>> check for the eql driver.
>>
>>  Is 31e77c93e432 the correct commit reference?  I don't see
>> anything in there about IFF_SLAVE or bonding; it's a patch to the
>> process scheduler.
>
>No, that's wrong.  It should be c2edacf80e155.
>
>>  And, as Jiri said, the subject doesn't mention bonding.
>
>The behavior of bonding wasn't changed.  The intent of the patch
>is to add team slaves to the interfaces that don't get automatic
>IPv6 addresses.  The body discusses why bonding had to change as
>well.

Sure, but the bonding code has changed, and the current
presentation makes it harder for reviewers to follow (or perhaps even
notice).

>I was under the impression that the subject needs to be kept short.
>Is there a better way to phrase what I want to do?

I'd suggest splitting this into three patches: A first patch
that adds the new IPv6 functionality, then one patch each for team and
bonding to take advantage of that new functionality.  Each of the three
would then be very straightforward, change just one thing, and should be
clearer all around.

-J

>>> Signed-off-by: Chas Williams <3ch...@gmail.com>
>>> ---
>>> drivers/net/bonding/bond_main.c | 4 ++--
>>> drivers/net/team/team.c | 7 +--
>>> net/ipv6/addrconf.c | 2 +-
>>> 3 files changed, 8 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/drivers/net/bonding/bond_main.c 
>>> b/drivers/net/bonding/bond_main.c
>>> index ffa37adb7681..5cdad164332b 100644
>>> --- a/drivers/net/bonding/bond_main.c
>>> +++ b/drivers/net/bonding/bond_main.c
>>> @@ -1536,6 +1536,7 @@ int bond_enslave(struct net_device *bond_dev, struct 
>>> net_device *slave_dev,
>>>
>>> /* set slave flag before open to prevent IPv6 addrconf */
>>> slave_dev->flags |= IFF_SLAVE;
>>> +   slave_dev->priv_flags |= IFF_BONDING;
>>>
>>> /* open the slave since the application closed it */
>>> res = dev_open(slave_dev);
>>> @@ -1544,7 +1545,6 @@ int bond_enslave(struct net_device *bond_dev, struct 
>>> net_device *slave_dev,
>>> goto err_restore_mac;
>>> }
>>>
>>> -   slave_dev->priv_flags |= IFF_BONDING;
>>> /* initialize slave stats */
>>> dev_get_stats(new_slave->dev, &new_slave->slave_stats);
>>>
>>> @@ -1804,10 +1804,10 @@ int bond_enslave(struct net_device *bond_dev, 
>>> struct net_device *slave_dev,
>>> slave_disable_netpoll(new_slave);
>>>
>>> err_close:
>>> -   slave_dev->priv_flags &= ~IFF_BONDING;
>>> dev_close(slave_dev);
>>>
>>> err_restore_mac:
>>> +   slave_dev->priv_flags &= ~IFF_BONDING;
>>> slave_dev->flags &= ~IFF_SLAVE;
>>> if (!bond->params.fail_over_mac ||
>>> BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP) {
>>> diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
>>> index db633ae9f784..8fc7d57e9f6d 100644
>>> --- a/drivers/net/team/team.c
>>> +++ b/drivers/net/team/team.c
>>> @@ -1128,14 +1128,12 @@ static int team_upper_dev_link(struct team *team, 
>>> struct team_port *port,
>>>&lag_upper_info, extack);
>>> if (err)
>>> return err;
>>> -   port->dev->priv_flags |= IFF_TEAM_PORT;
>>> return 0;
>>> }
>>>
>>> static void team_upper_dev_unlink(struct team *team, struct team_port *port)
>>> {
>>> netdev_upper_dev_unlink(port->dev, team->dev);
>>> -   port->dev->priv_flags &= ~IFF_TEAM_PORT;
>>> }
>>>
>>> static void __team_port_change_port_added(struct team_port *port, bool 
>>> linkup);
>>> @@ -1214,6 +1212,9 @@ static int team_port_add(struct team *team, struct 
>>> net_device *port_dev,
>>> goto err_port_enter;
>>> }
>>>
>>> +   /* set slave flag before open to prevent IPv6 addrconf */
>>> +   port->dev->priv_flags |= IFF_TEAM_PORT;
>>> +
>>> err = dev_open(port_dev);
>>> if (err) {
>>> netdev_dbg(dev, "Device %s opening failed\n",
>>> @@ -1292,6 +1293,7 @@ static int team_port_add(struct team *team, struct 
>>> net_device *port_dev,
>>> dev_close(port_dev);
>>>
>>> err_dev_open:
>>> +   port->dev->priv_flags &= ~IFF_TEAM_PORT;
>>> team_port_leave(team, port);
>>> team_port_set_orig_dev_addr(port);
>>>
>>> @@ -1328,6 +1330,7 @@ static int team_port_del(struct team *team, struct 
>>> net_device *port_dev)
>>> dev_uc_unsync(port_dev, dev);
>>> dev_mc_unsync(port_dev, dev);
>>> dev_close(port_dev);
>>> +   port->dev->priv_flags &= ~IFF_TEAM_PORT;
>>> team_port_leave(team, p

Re: [PATCH net-next] net/ipv6: Block IPv6 addrconf on team ports

2018-10-25 Thread Chas Williams

On 10/25/2018 05:59 PM, Jay Vosburgh wrote:
> Chas Williams <3ch...@gmail.com> wrote:
>
>> netif_is_lag_port should be used to identify link aggregation ports.
>> For this to work, we need to reorganize the bonding and team drivers
>> so that the necessary flags are set before dev_open is called.
>>
>> commit 31e77c93e432 ("sched/fair: Update blocked load when newly idle")
>> made this decision originally based on the IFF_SLAVE flag which isn't
>> used by the team driver.  Note, we do need to retain the IFF_SLAVE
>> check for the eql driver.
>
> Is 31e77c93e432 the correct commit reference?  I don't see
> anything in there about IFF_SLAVE or bonding; it's a patch to the
> process scheduler.

No, that's wrong.  It should be c2edacf80e155.

> And, as Jiri said, the subject doesn't mention bonding.

The behavior of bonding wasn't changed.  The intent of the patch
is to add team slaves to the interfaces that don't get automatic
IPv6 addresses.  The body discusses why bonding had to change as
well.

I was under the impression that the subject needs to be kept short.
Is there a better way to phrase what I want to do?




Signed-off-by: Chas Williams <3ch...@gmail.com>
---
drivers/net/bonding/bond_main.c | 4 ++--
drivers/net/team/team.c | 7 +--
net/ipv6/addrconf.c | 2 +-
3 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index ffa37adb7681..5cdad164332b 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1536,6 +1536,7 @@ int bond_enslave(struct net_device *bond_dev, struct 
net_device *slave_dev,

/* set slave flag before open to prevent IPv6 addrconf */
slave_dev->flags |= IFF_SLAVE;
+   slave_dev->priv_flags |= IFF_BONDING;

/* open the slave since the application closed it */
res = dev_open(slave_dev);
@@ -1544,7 +1545,6 @@ int bond_enslave(struct net_device *bond_dev, struct 
net_device *slave_dev,
goto err_restore_mac;
}

-   slave_dev->priv_flags |= IFF_BONDING;
/* initialize slave stats */
dev_get_stats(new_slave->dev, &new_slave->slave_stats);

@@ -1804,10 +1804,10 @@ int bond_enslave(struct net_device *bond_dev, struct 
net_device *slave_dev,
slave_disable_netpoll(new_slave);

err_close:
-   slave_dev->priv_flags &= ~IFF_BONDING;
dev_close(slave_dev);

err_restore_mac:
+   slave_dev->priv_flags &= ~IFF_BONDING;
slave_dev->flags &= ~IFF_SLAVE;
if (!bond->params.fail_over_mac ||
BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP) {
diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
index db633ae9f784..8fc7d57e9f6d 100644
--- a/drivers/net/team/team.c
+++ b/drivers/net/team/team.c
@@ -1128,14 +1128,12 @@ static int team_upper_dev_link(struct team *team, 
struct team_port *port,
   &lag_upper_info, extack);
if (err)
return err;
-   port->dev->priv_flags |= IFF_TEAM_PORT;
return 0;
}

static void team_upper_dev_unlink(struct team *team, struct team_port *port)
{
netdev_upper_dev_unlink(port->dev, team->dev);
-   port->dev->priv_flags &= ~IFF_TEAM_PORT;
}

static void __team_port_change_port_added(struct team_port *port, bool linkup);
@@ -1214,6 +1212,9 @@ static int team_port_add(struct team *team, struct 
net_device *port_dev,
goto err_port_enter;
}

+   /* set slave flag before open to prevent IPv6 addrconf */
+   port->dev->priv_flags |= IFF_TEAM_PORT;
+
err = dev_open(port_dev);
if (err) {
netdev_dbg(dev, "Device %s opening failed\n",
@@ -1292,6 +1293,7 @@ static int team_port_add(struct team *team, struct 
net_device *port_dev,
dev_close(port_dev);

err_dev_open:
+   port->dev->priv_flags &= ~IFF_TEAM_PORT;
team_port_leave(team, port);
team_port_set_orig_dev_addr(port);

@@ -1328,6 +1330,7 @@ static int team_port_del(struct team *team, struct 
net_device *port_dev)
dev_uc_unsync(port_dev, dev);
dev_mc_unsync(port_dev, dev);
dev_close(port_dev);
+   port->dev->priv_flags &= ~IFF_TEAM_PORT;
team_port_leave(team, port);

__team_option_inst_mark_removed_port(team, port);
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 45b84dd5c4eb..121f863022ed 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -3482,7 +3482,7 @@ static int addrconf_notify(struct notifier_block *this, 
unsigned long event,

case NETDEV_UP:
case NETDEV_CHANGE:
-   if (dev->flags & IFF_SLAVE)
+   if (netif_is_lag_port(dev) || dev->flags & IFF_SLAVE)


> Note that netvsc_vf_join() also uses IFF_SLAVE in order to skip
> IPv6 addrconf for netvsc devices; I don't believe its usage will pass
> netif_is_lag_port().  It looks like the above will work, but your commit
> message mentions eql as the reason for retaini

Re: [PATCH net-next] net/ipv6: Block IPv6 addrconf on team ports

2018-10-25 Thread Chas Williams

On 10/25/2018 05:10 PM, Jiri Pirko wrote:
> Thu, Oct 25, 2018 at 11:02:27PM CEST, 3ch...@gmail.com wrote:
>> netif_is_lag_port should be used to identify link aggregation ports.
>> For this to work, we need to reorganize the bonding and team drivers
>> so that the necessary flags are set before dev_open is called.
>>
>> commit 31e77c93e432 ("sched/fair: Update blocked load when newly idle")
>> made this decision originally based on the IFF_SLAVE flag which isn't
>> used by the team driver.  Note, we do need to retain the IFF_SLAVE
>> check for the eql driver.
>>
>> Signed-off-by: Chas Williams <3ch...@gmail.com>
>> ---
>> drivers/net/bonding/bond_main.c | 4 ++--
>> drivers/net/team/team.c | 7 +--
>> net/ipv6/addrconf.c | 2 +-
>
> Subject talks about "team" yet you modify bond and team. Confusing..


The subject discusses what I want to do. The body of the message
covers how I had to do it. The behavior of bonding with respect to
addrconf isn't changed but netif_is_lag_port is picky about the
flags it wants to see from bonding.  So some bonding changes are
necessary.
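
To make the "picky about the flags" point concrete, here is a userspace model
in C of the predicates involved. It is a sketch under assumptions: the M_*
bit values and struct model_dev are invented for illustration, and the
helper bodies reflect my understanding of netif_is_bond_slave(),
netif_is_team_port() and netif_is_lag_port() in include/linux/netdevice.h
rather than a copy of them.

```c
#include <stdbool.h>

/* Illustrative bit values only; not the kernel's definitions. */
#define M_IFF_SLAVE     (1u << 0) /* lives in flags */
#define M_IFF_BONDING   (1u << 0) /* lives in priv_flags */
#define M_IFF_TEAM_PORT (1u << 1) /* lives in priv_flags */

struct model_dev {
	unsigned int flags;
	unsigned int priv_flags;
};

/* A bonding slave needs BOTH the slave flag and the bonding flag. */
static bool model_is_bond_slave(const struct model_dev *d)
{
	return (d->flags & M_IFF_SLAVE) && (d->priv_flags & M_IFF_BONDING);
}

static bool model_is_team_port(const struct model_dev *d)
{
	return (d->priv_flags & M_IFF_TEAM_PORT) != 0;
}

static bool model_is_lag_port(const struct model_dev *d)
{
	return model_is_bond_slave(d) || model_is_team_port(d);
}

/* The check proposed for addrconf_notify(): LAG ports, plus anything
 * that only sets the plain slave flag.
 */
static bool model_skip_addrconf(const struct model_dev *d)
{
	return model_is_lag_port(d) || (d->flags & M_IFF_SLAVE);
}
```

In this model a device with only the slave flag set is not a LAG port, which
is why the bonding flag has to be in place before the device is opened, and
why the plain slave-flag test is kept alongside the new helper.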

Re: [PATCH net-next] net/ipv6: Block IPv6 addrconf on team ports

2018-10-25 Thread Jay Vosburgh

Chas Williams <3ch...@gmail.com> wrote:

>netif_is_lag_port should be used to identify link aggregation ports.
>For this to work, we need to reorganize the bonding and team drivers
>so that the necessary flags are set before dev_open is called.
>
>commit 31e77c93e432 ("sched/fair: Update blocked load when newly idle")
>made this decision originally based on the IFF_SLAVE flag which isn't
>used by the team driver.  Note, we do need to retain the IFF_SLAVE
>check for the eql driver.

Is 31e77c93e432 the correct commit reference?  I don't see
anything in there about IFF_SLAVE or bonding; it's a patch to the
process scheduler.

And, as Jiri said, the subject doesn't mention bonding.

>Signed-off-by: Chas Williams <3ch...@gmail.com>
>---
> drivers/net/bonding/bond_main.c | 4 ++--
> drivers/net/team/team.c | 7 +--
> net/ipv6/addrconf.c | 2 +-
> 3 files changed, 8 insertions(+), 5 deletions(-)
>
>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>index ffa37adb7681..5cdad164332b 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -1536,6 +1536,7 @@ int bond_enslave(struct net_device *bond_dev, struct 
>net_device *slave_dev,
> 
>   /* set slave flag before open to prevent IPv6 addrconf */
>   slave_dev->flags |= IFF_SLAVE;
>+  slave_dev->priv_flags |= IFF_BONDING;
> 
>   /* open the slave since the application closed it */
>   res = dev_open(slave_dev);
>@@ -1544,7 +1545,6 @@ int bond_enslave(struct net_device *bond_dev, struct 
>net_device *slave_dev,
>   goto err_restore_mac;
>   }
> 
>-  slave_dev->priv_flags |= IFF_BONDING;
>   /* initialize slave stats */
>   dev_get_stats(new_slave->dev, &new_slave->slave_stats);
> 
>@@ -1804,10 +1804,10 @@ int bond_enslave(struct net_device *bond_dev, struct 
>net_device *slave_dev,
>   slave_disable_netpoll(new_slave);
> 
> err_close:
>-  slave_dev->priv_flags &= ~IFF_BONDING;
>   dev_close(slave_dev);
> 
> err_restore_mac:
>+  slave_dev->priv_flags &= ~IFF_BONDING;
>   slave_dev->flags &= ~IFF_SLAVE;
>   if (!bond->params.fail_over_mac ||
>   BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP) {
>diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
>index db633ae9f784..8fc7d57e9f6d 100644
>--- a/drivers/net/team/team.c
>+++ b/drivers/net/team/team.c
>@@ -1128,14 +1128,12 @@ static int team_upper_dev_link(struct team *team, 
>struct team_port *port,
>  &lag_upper_info, extack);
>   if (err)
>   return err;
>-  port->dev->priv_flags |= IFF_TEAM_PORT;
>   return 0;
> }
> 
> static void team_upper_dev_unlink(struct team *team, struct team_port *port)
> {
>   netdev_upper_dev_unlink(port->dev, team->dev);
>-  port->dev->priv_flags &= ~IFF_TEAM_PORT;
> }
> 
> static void __team_port_change_port_added(struct team_port *port, bool 
> linkup);
>@@ -1214,6 +1212,9 @@ static int team_port_add(struct team *team, struct 
>net_device *port_dev,
>   goto err_port_enter;
>   }
> 
>+  /* set slave flag before open to prevent IPv6 addrconf */
>+  port->dev->priv_flags |= IFF_TEAM_PORT;
>+
>   err = dev_open(port_dev);
>   if (err) {
>   netdev_dbg(dev, "Device %s opening failed\n",
>@@ -1292,6 +1293,7 @@ static int team_port_add(struct team *team, struct 
>net_device *port_dev,
>   dev_close(port_dev);
> 
> err_dev_open:
>+  port->dev->priv_flags &= ~IFF_TEAM_PORT;
>   team_port_leave(team, port);
>   team_port_set_orig_dev_addr(port);
> 
>@@ -1328,6 +1330,7 @@ static int team_port_del(struct team *team, struct 
>net_device *port_dev)
>   dev_uc_unsync(port_dev, dev);
>   dev_mc_unsync(port_dev, dev);
>   dev_close(port_dev);
>+  port->dev->priv_flags &= ~IFF_TEAM_PORT;
>   team_port_leave(team, port);
> 
>   __team_option_inst_mark_removed_port(team, port);
>diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
>index 45b84dd5c4eb..121f863022ed 100644
>--- a/net/ipv6/addrconf.c
>+++ b/net/ipv6/addrconf.c
>@@ -3482,7 +3482,7 @@ static int addrconf_notify(struct notifier_block *this, 
>unsigned long event,
> 
>   case NETDEV_UP:
>   case NETDEV_CHANGE:
>-  if (dev->flags & IFF_SLAVE)
>+  if (netif_is_lag_port(dev) || dev->flags & IFF_SLAVE)

	Note that netvsc_vf_join() also uses IFF_SLAVE in order to skip
IPv6 addrconf for netvsc devices; I don't believe its usage will pass
netif_is_lag_port().  It looks like the above will work, but your commit
message mentions eql as the reason for retaining the IFF_SLAVE test, and
eql isn't the only user of IFF_SLAVE in this manner.

-J

>   break;
> 
>   if (idev && idev->cnf.disable_ipv6)
>-- 
>2.14.4
>

---
-Jay Vosburgh, jay.vosbu...@canonical.com

Re: [PATCH net-next] net/ipv6: Block IPv6 addrconf on team ports

2018-10-25 Thread Jiri Pirko

Thu, Oct 25, 2018 at 11:02:27PM CEST, 3ch...@gmail.com wrote:
>netif_is_lag_port should be used to identify link aggregation ports.
>For this to work, we need to reorganize the bonding and team drivers
>so that the necessary flags are set before dev_open is called.
>
>commit 31e77c93e432 ("sched/fair: Update blocked load when newly idle")
>made this decision originally based on the IFF_SLAVE flag which isn't
>used by the team driver.  Note, we do need to retain the IFF_SLAVE
>check for the eql driver.
>
>Signed-off-by: Chas Williams <3ch...@gmail.com>
>---
> drivers/net/bonding/bond_main.c | 4 ++--
> drivers/net/team/team.c | 7 +--
> net/ipv6/addrconf.c | 2 +-

Subject talks about "team" yet you modify bond and team. Confusing..

[PATCH net-next] net/ipv6: Block IPv6 addrconf on team ports

2018-10-25 Thread Chas Williams

netif_is_lag_port should be used to identify link aggregation ports.
For this to work, we need to reorganize the bonding and team drivers
so that the necessary flags are set before dev_open is called.

commit 31e77c93e432 ("sched/fair: Update blocked load when newly idle")
made this decision originally based on the IFF_SLAVE flag which isn't
used by the team driver.  Note, we do need to retain the IFF_SLAVE
check for the eql driver.
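
Why the ordering relative to dev_open matters can be shown with a toy
userspace model in C. This is only a sketch: struct model_port,
MODEL_TEAM_PORT_FLAG and the model_* functions are hypothetical, and
model_dev_open() merely stands in for the notifier that runs when a device
is brought up and decides whether to autoconfigure IPv6 addresses.

```c
#include <stdbool.h>

#define MODEL_TEAM_PORT_FLAG 1u /* illustrative value, not the kernel's */

struct model_port {
	unsigned int priv_flags;
	bool addrconf_ran;
};

/* Opening the device fires the "up" notification, which looks at the
 * flags as they are at that moment.
 */
static void model_dev_open(struct model_port *p)
{
	if (!(p->priv_flags & MODEL_TEAM_PORT_FLAG))
		p->addrconf_ran = true;
}

/* Old ordering: the flag is set only after the device is opened. */
static void model_port_add_old(struct model_port *p)
{
	model_dev_open(p);
	p->priv_flags |= MODEL_TEAM_PORT_FLAG;
}

/* New ordering: the flag is set first, so the notifier sees it. */
static void model_port_add_new(struct model_port *p)
{
	p->priv_flags |= MODEL_TEAM_PORT_FLAG;
	model_dev_open(p);
}
```

With the old ordering the model runs address configuration on the port; with
the new ordering it does not.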

Signed-off-by: Chas Williams <3ch...@gmail.com>
---
 drivers/net/bonding/bond_main.c | 4 ++--
 drivers/net/team/team.c | 7 +--
 net/ipv6/addrconf.c | 2 +-
 3 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index ffa37adb7681..5cdad164332b 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1536,6 +1536,7 @@ int bond_enslave(struct net_device *bond_dev, struct 
net_device *slave_dev,
 
/* set slave flag before open to prevent IPv6 addrconf */
slave_dev->flags |= IFF_SLAVE;
+   slave_dev->priv_flags |= IFF_BONDING;
 
/* open the slave since the application closed it */
res = dev_open(slave_dev);
@@ -1544,7 +1545,6 @@ int bond_enslave(struct net_device *bond_dev, struct 
net_device *slave_dev,
goto err_restore_mac;
}
 
-   slave_dev->priv_flags |= IFF_BONDING;
/* initialize slave stats */
dev_get_stats(new_slave->dev, &new_slave->slave_stats);
 
@@ -1804,10 +1804,10 @@ int bond_enslave(struct net_device *bond_dev, struct 
net_device *slave_dev,
slave_disable_netpoll(new_slave);
 
 err_close:
-   slave_dev->priv_flags &= ~IFF_BONDING;
dev_close(slave_dev);
 
 err_restore_mac:
+   slave_dev->priv_flags &= ~IFF_BONDING;
slave_dev->flags &= ~IFF_SLAVE;
if (!bond->params.fail_over_mac ||
BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP) {
diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
index db633ae9f784..8fc7d57e9f6d 100644
--- a/drivers/net/team/team.c
+++ b/drivers/net/team/team.c
@@ -1128,14 +1128,12 @@ static int team_upper_dev_link(struct team *team, 
struct team_port *port,
   &lag_upper_info, extack);
if (err)
return err;
-   port->dev->priv_flags |= IFF_TEAM_PORT;
return 0;
 }
 
 static void team_upper_dev_unlink(struct team *team, struct team_port *port)
 {
netdev_upper_dev_unlink(port->dev, team->dev);
-   port->dev->priv_flags &= ~IFF_TEAM_PORT;
 }
 
 static void __team_port_change_port_added(struct team_port *port, bool linkup);
@@ -1214,6 +1212,9 @@ static int team_port_add(struct team *team, struct 
net_device *port_dev,
goto err_port_enter;
}
 
+   /* set slave flag before open to prevent IPv6 addrconf */
+   port->dev->priv_flags |= IFF_TEAM_PORT;
+
err = dev_open(port_dev);
if (err) {
netdev_dbg(dev, "Device %s opening failed\n",
@@ -1292,6 +1293,7 @@ static int team_port_add(struct team *team, struct 
net_device *port_dev,
dev_close(port_dev);
 
 err_dev_open:
+   port->dev->priv_flags &= ~IFF_TEAM_PORT;
team_port_leave(team, port);
team_port_set_orig_dev_addr(port);
 
@@ -1328,6 +1330,7 @@ static int team_port_del(struct team *team, struct 
net_device *port_dev)
dev_uc_unsync(port_dev, dev);
dev_mc_unsync(port_dev, dev);
dev_close(port_dev);
+   port->dev->priv_flags &= ~IFF_TEAM_PORT;
team_port_leave(team, port);
 
__team_option_inst_mark_removed_port(team, port);
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 45b84dd5c4eb..121f863022ed 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -3482,7 +3482,7 @@ static int addrconf_notify(struct notifier_block *this, 
unsigned long event,
 
case NETDEV_UP:
case NETDEV_CHANGE:
-   if (dev->flags & IFF_SLAVE)
+   if (netif_is_lag_port(dev) || dev->flags & IFF_SLAVE)
break;
 
if (idev && idev->cnf.disable_ipv6)
-- 
2.14.4

[RFC] net: stmmac: RX Jumbo packet size > 8191 problem

2018-10-25 Thread Thor Thayer


Hi,

I'm running into a weird issue at the DMA boundary for large packets 
(>8192) that I can't explain.  I'm hoping someone here has an idea on 
why I'm seeing this issue.


This is the Synopsys DesignWare Ethernet GMAC core (3.74) using the 
stmmac driver found at drivers/net/ethernet/stmicro/stmmac.


If I ping with data sizes that exceed the first DMA buffer size (size 
set to 8191), ping reports a data mismatch as follows at byte #8144:


$ ping -c 1 -M do -s 8150 192.168.1.99
PING 192.168.1.99 (192.168.1.99) 8150(8178) bytes of data.
8158 bytes from 192.168.1.99: icmp_seq=1 ttl=64 time=0.669 ms
wrong data byte #8144 should be 0xd0 but was 0x0
#16	10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f 20 21 22 23 24 25 26 
27 28 29 2a 2b 2c 2d 2e 2f

%< ---snip--
#8112	b0 b1 b2 b3 b4 b5 b6 b7 b8 b9 ba bb bc bd be bf c0 c1 c2 c3 c4 c5 
c6 c7 c8 c9 ca cb cc cd ce cf

#8144   0 0 0 0 d0 d1
        ^^^^^^^
Notice the 4 bytes of 0 there before the expected byte of d0. I 
confirmed the on-wire result with wireshark - same data packet as shown 
above.


Looking at the queue, I'm seeing these values in the RX descriptors (I'm 
using ring mode, enhanced descriptors).

0xa0040320 0x9fff1fff 0x7a358042 0x7a35a042
^des0      ^des1      ^des2      ^des3

desc0 => 8196 bytes, OWN, First & Last Descriptor, Frame type = Eth
desc1 => Disable IRQ on done, Rx Buffer2 sz = 8191, Rx Buffer1 sz = 8191
desc2 => Buffer 1 Addr Pointer
desc3 => Buffer 2 Addr Pointer

If I adjust init_desc3() and refill_desc3() to initialize des3 to 
des2+BUF_SIZE_8KiB-4, I get a descriptor as shown below and ping 
completes successfully.

0xa0040320 0x9fff1fff 0x77df8042 0x77dfa03e
                                 ^ this is now different

But I'm not sure why the -4 works because desc3 overlaps into the end of 
the first DMA buffer area (des2) which is counterintuitive.


At first I thought the 4 extra bytes were the FCS, but that should occur 
at the end of the complete transfer, so I'd expect it to be at the end 
of all the data (in buffer 2).


Here is the change that works. I ran a ping sweep with packet sizes from 
8100 to 8300 successfully with this change.

---
$ git diff
diff --git a/drivers/net/ethernet/stmicro/stmmac/ring_mode.c 
b/drivers/net/ethernet/stmicro/stmmac/ring_mode.c

index abc3f85270cd..b52be0235d8f 100644
--- a/drivers/net/ethernet/stmicro/stmmac/ring_mode.c
+++ b/drivers/net/ethernet/stmicro/stmmac/ring_mode.c
@@ -115,13 +115,13 @@ static void refill_desc3(void *priv_ptr, struct 
dma_desc *p)


/* Fill DES3 in case of RING mode */
if (priv->dma_buf_sz >= BUF_SIZE_8KiB)
-   p->des3 = cpu_to_le32(le32_to_cpu(p->des2) + BUF_SIZE_8KiB);
+   p->des3 = cpu_to_le32(le32_to_cpu(p->des2) +
+ BUF_SIZE_8KiB - 4);

 }

 /* In ring mode we need to fill the desc3 because it is used as buffer */
 static void init_desc3(struct dma_desc *p)
 {
-   p->des3 = cpu_to_le32(le32_to_cpu(p->des2) + BUF_SIZE_8KiB);
+   p->des3 = cpu_to_le32(le32_to_cpu(p->des2) + BUF_SIZE_8KiB - 4);
 }

 static void clean_desc3(void *priv_ptr, struct dma_desc *p)
---

Any thoughts on why I need to change the indexing?

Thanks,

Thor

[4.18-stable 1/1] netfilter: use kvmalloc_array to allocate memory for hashtable

2018-10-25 Thread Mark Asselstine

David,

Please promote mainline commit 285189c78eeb6f684a024b86fb5997d10c6aa564 
[netfilter: use kvmalloc_array to allocate memory for hashtable] to 
linux-4.18.y stable. As it happens this not only fixes the issue described in 
the commit log, it also solves the issue of kmemleak reporting false positives 
of 'struct nf_conn' objects.

unreferenced object 0x9af78fa6de00 (size 256): 
  comm "rdate", pid 4215, jiffies 4299506036 (age 115.149s) 
  hex dump (first 32 bytes): 
01 00 00 00 00 00 00 00 0a 00 96 98 f7 9a ff ff  
45 e6 00 00 00 00 00 00 10 99 a3 94 f7 9a ff ff E... 
  backtrace: 
[<06b47d03>] kmem_cache_alloc+0x146/0x200 
[] __nf_conntrack_alloc.isra.13+0x4d/0x170[nf_conntrack] 
[<8c1c1285>] init_conntrack+0x6a/0x2f0 [nf_conntrack] 
[] nf_conntrack_in+0x2c5/0x360 [nf_conntrack] 
[<00213d80>] ipv4_conntrack_local+0x5d/0x70 [nf_conntrack_ipv4] 
[] nf_hook_slow+0x48/0xd0 
[] __ip_local_out+0xbd/0xf0 
[] ip_local_out+0x1c/0x50 
[<71f63135>] ip_queue_xmit+0x15f/0x3d0 
[<8fb87cfd>] __tcp_transmit_skb+0x5bf/0xab0 
[<73c7808d>] tcp_connect+0x648/0x830 
[<0e12e101>] tcp_v4_connect+0x458/0x4d0 
[<3223764c>] __inet_stream_connect+0xe2/0x380 
[<5c32d180>] inet_stream_connect+0x3b/0x60 
[<465bcd15>] __sys_connect+0xce/0x100 
[<55a63178>] __x64_sys_connect+0x1a/0x20 

The main object pointer to these struct nf_conn objects is 'salted' with 
ip_conntrack_info in sk_buff._nfct, and as such is not a viable pointer to 
this object by the kmemleak logic.

The only other consistent reference to these objects or contents is found in 
the hash table. But it appears that kmemleak does not scan the 
nf_conntrack_hash which is initialized in nf_ct_alloc_hashtable() via 
__get_free_pages(). This results in the objects appearing as "leaks".

I could solve this by keeping the original code and adding a call to 
kmemleak_alloc() in nf_ct_alloc_hashtable() and similarly a call to 
kmemleak_free() in nf_ct_free_hashtable(). But since this mainline commit 
exists which happens to also sort out this issue we are most likely best to do 
the backport and kill two birds with one stone.

He Zhe previously sent out a patch to this list "[RFC] [PATCH] netfilter: Fix 
kmemleak false positive reports". With the additional analysis summarized here 
that patch should not be considered for merging.

Thanks,
Mark Asselstine

Re: [PATCH] net/{ipv4,ipv6}: Do not put target net if input nsid is invalid

2018-10-25 Thread David Ahern

On 10/25/18 1:18 PM, Bjørn Mork wrote:
> The cleanup path will put the target net when netnsid is set.  So we must
> reset netnsid if the input is invalid.
> 
> Fixes: d7e38611b81e ("net/ipv4: Put target net when address dump fails due to 
> bad attributes")
> Fixes: 242afaa6968c ("net/ipv6: Put target net when address dump fails due to 
> bad attributes")
> Cc: David Ahern 
> Signed-off-by: Bjørn Mork 
> ---
>  net/ipv4/devinet.c  | 1 +
>  net/ipv6/addrconf.c | 1 +
>  2 files changed, 2 insertions(+)
> 

Reviewed-by: David Ahern

Re: [PATCH v3 2/2] net: qcom/emac: add phy-handle support for ACPI

2018-10-25 Thread Andrew Lunn

On Thu, Oct 25, 2018 at 06:09:15PM +0800, Wang Dongsheng wrote:
> Use "phy-handle" to point to an internal MDIO device port.

Hi Dongsheng

You are basically defining how all future ACPI based MAC drivers get
access to their PHY. This needs to become part of the ACPI standard,
etc.

This code should not be hidden away in the emac driver. It needs to be
placed somewhere public so other drivers can use it. And it needs good
documentation, including an example of what needs to go into the ACPI
tables, etc.

Thanks
Andrew

[PATCH] net/{ipv4,ipv6}: Do not put target net if input nsid is invalid

2018-10-25 Thread Bjørn Mork

The cleanup path will put the target net when netnsid is set.  So we must
reset netnsid if the input is invalid.

Fixes: d7e38611b81e ("net/ipv4: Put target net when address dump fails due to 
bad attributes")
Fixes: 242afaa6968c ("net/ipv6: Put target net when address dump fails due to 
bad attributes")
Cc: David Ahern 
Signed-off-by: Bjørn Mork 
---
 net/ipv4/devinet.c  | 1 +
 net/ipv6/addrconf.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index 9250b309c742..a34602ae27de 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -1704,6 +1704,7 @@ static int inet_valid_dump_ifaddr_req(const struct 
nlmsghdr *nlh,
 
net = rtnl_get_net_ns_capable(sk, fillargs->netnsid);
if (IS_ERR(net)) {
+   fillargs->netnsid = -1;
NL_SET_ERR_MSG(extack, "ipv4: Invalid target 
network namespace id");
return PTR_ERR(net);
}
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 7eb09c86fa13..63a808d5af15 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -5058,6 +5058,7 @@ static int inet6_valid_dump_ifaddr_req(const struct 
nlmsghdr *nlh,
fillargs->netnsid = nla_get_s32(tb[i]);
net = rtnl_get_net_ns_capable(sk, fillargs->netnsid);
if (IS_ERR(net)) {
+   fillargs->netnsid = -1;
NL_SET_ERR_MSG_MOD(extack, "Invalid target 
network namespace id");
return PTR_ERR(net);
}
-- 
2.11.0

Re: [net-next][PATCH] net/ipv4: fix a net leak

2018-10-25 Thread David Ahern

On 10/25/18 12:43 PM, Bjørn Mork wrote:
> 
> inet_valid_dump_ifaddr_req() will bail out with an error, but only
> *after* setting fillargs->netnsid:
> 
> if (i == IFA_TARGET_NETNSID) {
> struct net *net;
> 
> fillargs->netnsid = nla_get_s32(tb[i]);
> 
> net = rtnl_get_net_ns_capable(sk, fillargs->netnsid);
> if (IS_ERR(net)) {
> NL_SET_ERR_MSG(extack, "ipv4: Invalid target 
> network namespace id");
> return PTR_ERR(net);
> }
> *tgt_net = net;
> } else {
> 
> 
> 
> So inet_dump_ifaddr() ends up doing put_net(tgt_net):
> 
> 
> err = inet_valid_dump_ifaddr_req(nlh, &fillargs, &tgt_net,
>  skb->sk, cb);
> if (err < 0)
> goto put_tgt_net;
> ..
> put_tgt_net:
> if (fillargs.netnsid >= 0)
> put_net(tgt_net);
> 
> 
> 
> I believe you should set fillargs->netnsid back to -1 in the
> inet_valid_dump_ifaddr_req() error path, or use a temp variable to avoid
> changing it unless get_net is successful.

Good point. Either use an intermediate variable or reset nsid on failure.
Will you send a patch to fix ipv4 and v6?

Thanks,

Re: [net-next][PATCH] net/ipv4: fix a net leak

2018-10-25 Thread Bjørn Mork

David Ahern  writes:
> On 10/24/18 9:02 AM, David Ahern wrote:
>> On 10/24/18 3:36 AM, Li RongQing wrote:
>>> put net when the input is an invalid ifindex, otherwise it will be leaked
>>>
>>> Fixes: 5fcd266a9f64("net/ipv4: Add support for dumping addresses for a 
>>> specific device")
>>> Cc: David Ahern 
>>> Signed-off-by: Zhang Yu 
>>> Signed-off-by: Li RongQing 
>>> ---
>>>  net/ipv4/devinet.c | 4 +++-
>>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
>>> index 63d5b58fbfdb..fd0c5a47e742 100644
>>> --- a/net/ipv4/devinet.c
>>> +++ b/net/ipv4/devinet.c
>>> @@ -1775,8 +1775,10 @@ static int inet_dump_ifaddr(struct sk_buff *skb, 
>>> struct netlink_callback *cb)
>>>  
>>> if (fillargs.ifindex) {
>>> dev = __dev_get_by_index(tgt_net, fillargs.ifindex);
>>> -   if (!dev)
>>> +   if (!dev) {
>>> +   put_net(tgt_net);
>>> return -ENODEV;
>>> +   }
>>>  
>>> in_dev = __in_dev_get_rtnl(dev);
>>> if (in_dev) {
>>>
>> 
>> Good catch. IPv6 has the same problem. Will fix that one.
>> 
> Actually remove that 'Reviewed-by'. You should only call put_net if
> (fillargs.netnsid >= 0)
>
> DaveM: just want to call this out since I mistakenly added the
> Reviewed-by. This patch should be dropped.

Hmm, I see that you implemented that.  But I believe it's still buggy if
called with an invalid netnsid.

inet_valid_dump_ifaddr_req() will bail out with an error, but only
*after* setting fillargs->netnsid:

if (i == IFA_TARGET_NETNSID) {
struct net *net;

fillargs->netnsid = nla_get_s32(tb[i]);

net = rtnl_get_net_ns_capable(sk, fillargs->netnsid);
if (IS_ERR(net)) {
NL_SET_ERR_MSG(extack, "ipv4: Invalid target 
network namespace id");
return PTR_ERR(net);
}
*tgt_net = net;
} else {



So inet_dump_ifaddr() ends up doing put_net(tgt_net):


err = inet_valid_dump_ifaddr_req(nlh, &fillargs, &tgt_net,
 skb->sk, cb);
if (err < 0)
goto put_tgt_net;
..
put_tgt_net:
if (fillargs.netnsid >= 0)
put_net(tgt_net);



I believe you should set fillargs->netnsid back to -1 in the
inet_valid_dump_ifaddr_req() error path, or use a temp variable to avoid
changing it unless get_net is successful.



Bjørn

Re: [PATCH net] net: ethernet: cadence: fix socket buffer corruption problem

2018-10-25 Thread Florian Fainelli

On 10/25/18 11:32 AM, David Miller wrote:
> From: 
> Date: Wed, 24 Oct 2018 14:51:23 -0700
> 
>> From: Tristram Ha 
>>
>> Socket buffer is not re-created when headroom is 2 and tailroom is 1.
>>
>> Signed-off-by: Tristram Ha 
> 
> Applied.

No fixes tag?
-- 
Florian

Re: [PATCH] octeontx2-af: Use GFP_ATOMIC under spin lock

2018-10-25 Thread David Miller

From: Wei Yongjun 
Date: Thu, 25 Oct 2018 01:42:26 +

> The function nix_update_mce_list() is called from
> nix_update_bcast_mce_list(), and a spin lock is held
> here, so we should use GFP_ATOMIC instead.
> 
> Fixes: 4b05528ebf0c ("octeontx2-af: Update bcast list upon NIXLF alloc/free")
> Signed-off-by: Wei Yongjun 

I'm applying this.

I'm really disappointed in how the octeontx2 driver submission has gone.

The Intel folks can get an entire new driver in with 2 series
of patches, we're on the 3rd or 4th here and the driver still
isn't complete enough to have basic functionality working.

This driver is huge, overly complicated, and is being submitted in a
very painful way.

Re: [PATCH net] net: ethernet: cadence: fix socket buffer corruption problem

2018-10-25 Thread David Miller

From: 
Date: Wed, 24 Oct 2018 14:51:23 -0700

> From: Tristram Ha 
> 
> Socket buffer is not re-created when headroom is 2 and tailroom is 1.
> 
> Signed-off-by: Tristram Ha 

Applied.

Re: netif_receive_skb is taking long time

2018-10-25 Thread David Miller

From: Keyur Amrutbhai Patel 
Date: Thu, 25 Oct 2018 17:22:02 +

> The most time-consuming functions currently are "netif_receive_skb" and
> "napi_alloc_skb"; these two calls take the largest amount of time.

netif_receive_skb() calls the entire networking stack receive path.
So measuring it by itself is not very useful.

Use 'perf' or a similar tool to fully profile the kernel and get a
more detailed analysis.

Re: Fw: [Bug 201423] New: eth0: hw csum failure

2018-10-25 Thread Eric Dumazet




On 10/24/2018 12:41 PM, Andre Tomt wrote:
> 
> It eventually showed up again with mlx4, on 4.18.16 + fix and also on 4.19. I 
> still do not have a useful packet capture.
> 
> It is running a torrent client serving up various linux distributions.
>

Have you also applied this fix ?

https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=db4f1be3ca9b0ef7330763d07bf4ace83ad6f913

Re: [PATCH ghak90 (was ghak32) V4 03/10] audit: log container info of syscalls

2018-10-25 Thread Richard Guy Briggs

On 2018-10-25 17:57, Steve Grubb wrote:
> On Thu, 25 Oct 2018 08:27:32 -0400
> Richard Guy Briggs  wrote:
> 
> > On 2018-10-25 06:49, Paul Moore wrote:
> > > On Thu, Oct 25, 2018 at 2:06 AM Steve Grubb 
> > > wrote:  
> > > > On Wed, 24 Oct 2018 20:42:55 -0400
> > > > Richard Guy Briggs  wrote:  
> > > > > On 2018-10-24 16:55, Paul Moore wrote:  
> > > > > > On Wed, Oct 24, 2018 at 11:15 AM Richard Guy Briggs
> > > > > >  wrote:  
> > > > > > > On 2018-10-19 19:16, Paul Moore wrote:  
> > > > > > > > On Sun, Aug 5, 2018 at 4:32 AM Richard Guy Briggs
> > > > > > > >  wrote:  
> > > 
> > > ...
> > >   
> > > > > > > > > +/*
> > > > > > > > > + * audit_log_contid - report container info
> > > > > > > > > + * @tsk: task to be recorded
> > > > > > > > > + * @context: task or local context for record
> > > > > > > > > + * @op: contid string description
> > > > > > > > > + */
> > > > > > > > > > +int audit_log_contid(struct task_struct *tsk,
> > > > > > > > > > +                     struct audit_context *context, char *op)
> > > > > > > > > > +{
> > > > > > > > > > +   struct audit_buffer *ab;
> > > > > > > > > > +
> > > > > > > > > > +   if (!audit_contid_set(tsk))
> > > > > > > > > > +       return 0;
> > > > > > > > > > +   /* Generate AUDIT_CONTAINER record with container ID */
> > > > > > > > > > +   ab = audit_log_start(context, GFP_KERNEL, AUDIT_CONTAINER);
> > > > > > > > > > +   if (!ab)
> > > > > > > > > > +       return -ENOMEM;
> > > > > > > > > > +   audit_log_format(ab, "op=%s contid=%llu",
> > > > > > > > > > +                    op, audit_get_contid(tsk));
> > > > > > > > > > +   audit_log_end(ab);
> > > > > > > > > > +   return 0;
> > > > > > > > > > +}
> > > > > > > > > > +EXPORT_SYMBOL(audit_log_contid);
> > > > > > > >
> > > > > > > > As discussed in the previous iteration of the patch, I
> > > > > > > > prefer AUDIT_CONTAINER_ID here over AUDIT_CONTAINER.  If
> > > > > > > > you feel strongly about keeping it as-is with
> > > > > > > > > AUDIT_CONTAINER I suppose I could live with that, but it
> > > > > > > > > isn't my first choice.
> > > > > > >
> > > > > > > I don't have a strong opinion on this one, mildly
> > > > > > > preferring the shorter one only because it is shorter.  
> > > > > >
> > > > > > We already have multiple AUDIT_CONTAINER* record types, so it
> > > > > > seems as though we should use "AUDIT_CONTAINER" as a prefix
> > > > > > of sorts, rather than a type itself.  
> > > > >
> > > > > I'm fine with that.  I'd still like to hear Steve's input.  He
> > > > > had stronger opinions than me.  
> > > >
> > > > The creation event should be separate and distinct from the
> > > > > continuing use when it's used as a supplemental record. IOW,
> > > > binding the ID to a container is part of the lifecycle and needs
> > > > to be kept distinct.  
> > > 
> > > Steve's comment is pretty ambiguous when it comes to AUDIT_CONTAINER
> > > vs AUDIT_CONTAINER_ID, but one could argue that AUDIT_CONTAINER_ID
> > > helps distinguish the audit container id marking record and gets to
> > > what I believe is the spirit of Steve's comment.  Taking this in
> > > context with my previous remarks, let's switch to using
> > > AUDIT_CONTAINER_ID.  
> > 
> > > I suspect Steve is mixing up AUDIT_CONTAINER_OP with
> > > AUDIT_CONTAINER_ID, confusing the fact that they are two separate
> > > records.  As a summary, the suggested records are:
> > > CONTAINER_OP    audit container identifier creation
> > > CONTAINER       audit container identifier aux record to an event
> > > 
> > > and what Paul is suggesting (which is fine by me) is:
> > > CONTAINER_OP    audit container identifier creation event
> > > CONTAINER_ID    audit container identifier aux record to an event
> > 
> > Steve, please indicate you are fine with this.
> 
> I thought it was:

It *was*.  It was changed at Paul's request in this v3 thread:
https://www.redhat.com/archives/linux-audit/2018-July/msg00087.html

And listed in the examples and changelog to this v4 patchset:
https://www.redhat.com/archives/linux-audit/2018-July/msg00178.html

It is also listed in this userspace patchset update v4 (which should
also have had a changelog added to it, note to self...):
https://www.redhat.com/archives/linux-audit/2018-July/msg00189.html

I realize it is hard to keep up with all the detail changes in these
patchsets...

> CONTAINER_ID audit container identifier creation event
> CONTAINER audit container identifier aux record to an event
> 
> Or vice versa. Don't mix up creation of the identifier with operations.

Exactly what I'm trying to avoid...  Worded another way: "Don't mix up
the creation operation with routine reporting of the identifier in
events."  Steve, can you and Paul discuss and agree on what they should
be called?  I don't have a horse in this race, but I need to record the
result of that run.  ;-)

> -Steve

- RGB

--
Richard Guy Briggs 
Sr. S/W

Re: netif_receive_skb is taking long time

2018-10-25 Thread Eric Dumazet



Please do not top post, and use normal quoting.

On 10/25/2018 10:22 AM, Keyur Amrutbhai Patel wrote:
> Hi Eric,
> 
> First of all thank you for replying and giving some spotlight.
> 
> First step would be to read Documentation/networking/scaling.txt and see if 
> anything there helps.
>  - This is good article. I had gone through it.  Any suggestion on RSS? How 
> to configure it? Do I need to take care anything specially in my NIC driver?

Just read the page and apply the various configurations.

> 
> Have you tried to profile the kernel and see if some contention or hot 
> function appears ?
> - I have added timestamps in different functions. That is how I came to
> know that roughly 3375 nanoseconds are used by just "netif_receive_skb";
> I don't know why. In less time than that, my DMA operation finishes and
> descriptors are managed.
> The most time-consuming functions currently are "netif_receive_skb" and
> "napi_alloc_skb"; these two calls take the largest amount of time.
> 

So... networking spends more time in the upper stack than in a driver.

A driver does almost nothing, just passing around bits that the NIC put in
memory.

In most workloads, a driver would not use more than 5% of total cpu cycles.

Now, if all you need is to impress your friends/boss about some
crazy number of RX packets per second,
just do not allocate skbs and do not call netif_receive_skb();
use something like XDP to drop incoming frames :)

> Maybe use a faster cpu, or remove not needed features like too heavy 
> netfilter rules.
> - I am using Intel Xeon Platinum series processors. These are fast enough CPUs,
> with 64 cores in total: 2 CPU nodes of 32 cores each.
> 
> We can not really answer your question, you do not provide enough information.
> - Please let me know what additional details you need. We have 6 queues in 
> HW. Each is mapped to MSI-X vector. Each vector is giving interrupt on 
> different CPU. From interrupt I am scheduling napi and from napi poll 
> function I am getting DMA page and constructing skb and passing it to network 
> layer with "netif_receive_skb".
> 
> Let me know additional details which are required.
> 
> Regards,
> Keyur
> 
> -Original Message-
> From: Eric Dumazet  
> Sent: Thursday, October 25, 2018 10:38 PM
> To: Keyur Amrutbhai Patel ; netdev@vger.kernel.org
> Subject: Re: netif_receive_skb is taking long time
> 
> On 10/25/2018 08:39 AM, Keyur Amrutbhai Patel wrote:
>> Hi,
>>
>> In my NIC driver "netif_receive_skb" is taking too long: almost 3375
>> nanoseconds, which is more than the whole packet processing from the interrupt.
>>
>> Could anyone please help me to understand what could be the reason behind 
>> this? How to solve it to take minimum time?
>>
>> Is there any standard calls which we need to follow in order to get faster 
>> performance?
>>
> 
> First step would be to read Documentation/networking/scaling.txt and see if 
> anything there helps.
> 
> Have you tried to profile the kernel and see if some contention or hot 
> function appears ?
> 
> Maybe use a faster cpu, or remove not needed features like too heavy 
> netfilter rules.
> 
> We can not really answer your question, you do not provide enough information.
>

RE: netif_receive_skb is taking long time

2018-10-25 Thread Keyur Amrutbhai Patel

Hi Eric,

First of all thank you for replying and giving some spotlight.

First step would be to read Documentation/networking/scaling.txt and see if 
anything there helps.
 - This is good article. I had gone through it.  Any suggestion on RSS? How to 
configure it? Do I need to take care anything specially in my NIC driver?

Have you tried to profile the kernel and see if some contention or hot function 
appears ?
- I have added timestamps in different functions. That is how I came to
know that roughly 3375 nanoseconds are used by just "netif_receive_skb";
I don't know why. In less time than that, my DMA operation finishes and
descriptors are managed.
The most time-consuming functions currently are "netif_receive_skb" and
"napi_alloc_skb"; these two calls take the largest amount of time.

Maybe use a faster cpu, or remove not needed features like too heavy netfilter 
rules.
- I am using Intel Xeon Platinum series processors. These are fast enough CPUs,
with 64 cores in total: 2 CPU nodes of 32 cores each.

We can not really answer your question, you do not provide enough information.
- Please let me know what additional details you need. We have 6 queues in HW. 
Each is mapped to MSI-X vector. Each vector is giving interrupt on different 
CPU. From interrupt I am scheduling napi and from napi poll function I am 
getting DMA page and constructing skb and passing it to network layer with 
"netif_receive_skb".

Let me know additional details which are required.

Regards,
Keyur

-Original Message-
From: Eric Dumazet  
Sent: Thursday, October 25, 2018 10:38 PM
To: Keyur Amrutbhai Patel ; netdev@vger.kernel.org
Subject: Re: netif_receive_skb is taking long time

On 10/25/2018 08:39 AM, Keyur Amrutbhai Patel wrote:
> Hi,
>
> In my NIC driver "netif_receive_skb" is taking too long: almost 3375
> nanoseconds, which is more than the whole packet processing from the interrupt.
>
> Could anyone please help me to understand what could be the reason behind 
> this? How to solve it to take minimum time?
>
> Is there any standard calls which we need to follow in order to get faster 
> performance?
>

First step would be to read Documentation/networking/scaling.txt and see if 
anything there helps.

Have you tried to profile the kernel and see if some contention or hot function 
appears ?

Maybe use a faster cpu, or remove not needed features like too heavy netfilter 
rules.

We can not really answer your question, you do not provide enough information.

[PATCH v1 net] lan743x: Remove SPI dependency from Microchip group.

2018-10-25 Thread Bryan Whitehead

The SPI dependency does not apply to lan743x driver, and other
drivers in the group already state their dependence on SPI.

Signed-off-by: Bryan Whitehead 
---
 drivers/net/ethernet/microchip/Kconfig | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ethernet/microchip/Kconfig 
b/drivers/net/ethernet/microchip/Kconfig
index 16bd3f4..cf1d491 100644
--- a/drivers/net/ethernet/microchip/Kconfig
+++ b/drivers/net/ethernet/microchip/Kconfig
@@ -5,7 +5,6 @@
 config NET_VENDOR_MICROCHIP
bool "Microchip devices"
default y
-   depends on SPI
---help---
  If you have a network (Ethernet) card belonging to this class, say Y.
 
-- 
2.7.4

[PATCH v1 net] lan743x: Remove SPI dependency from Microchip group

2018-10-25 Thread Bryan Whitehead

The SPI dependency does not apply to lan743x driver, and other
drivers in the group already state their dependence on SPI.

Bryan Whitehead (1):
  lan743x: Remove SPI dependency from Microchip group.

 drivers/net/ethernet/microchip/Kconfig | 1 -
 1 file changed, 1 deletion(-)

-- 
2.7.4

Re: netif_receive_skb is taking long time

2018-10-25 Thread Eric Dumazet

On 10/25/2018 08:39 AM, Keyur Amrutbhai Patel wrote:
> Hi,
> 
> In my NIC driver "netif_receive_skb" is taking too long: almost 3375
> nanoseconds, which is more than the whole packet processing from the interrupt.
> 
> Could anyone please help me to understand what could be the reason behind 
> this? How to solve it to take minimum time?
> 
> Is there any standard calls which we need to follow in order to get faster 
> performance?
> 

First step would be to read Documentation/networking/scaling.txt and see if 
anything there helps.

Have you tried to profile the kernel and see if some contention or hot function 
appears ?

Maybe use a faster cpu, or remove not needed features like too heavy netfilter 
rules.

We can not really answer your question, you do not provide enough information.

RE: netif_receive_skb is taking long time

2018-10-25 Thread Keyur Amrutbhai Patel

Any help on this would be appreciated. 

-Original Message-
From: netdev-ow...@vger.kernel.org  On Behalf Of 
Keyur Amrutbhai Patel
Sent: Thursday, October 25, 2018 9:09 PM
To: netdev@vger.kernel.org
Subject: netif_receive_skb is taking long time

Hi,

In my NIC driver "netif_receive_skb" is taking too long: almost 3375
nanoseconds, which is more than the whole packet processing from the interrupt.

Could anyone please help me to understand what could be the reason behind this? 
How to solve it to take minimum time?

Is there any standard calls which we need to follow in order to get faster 
performance?

Regards,
Keyur

This email and any attachments are intended for the sole use of the named 
recipient(s) and contain(s) confidential information that may be proprietary, 
privileged or copyrighted under applicable law. If you are not the intended 
recipient, do not read, copy, or forward this email message or any attachments. 
Delete this email message and any attachments immediately.

[PATCH net] r8169: fix broken Wake-on-LAN from S5 (poweroff)

2018-10-25 Thread Heiner Kallweit

It was reported that WoL from S5 is broken (WoL from S3 works) and the
analysis showed that during system shutdown the network interface was
brought down already when the actual kernel shutdown started.
Therefore netif_running() returned false and as a consequence the PHY
was suspended. Obviously WoL wasn't working then.
To fix this the original patch needs to be effectively reverted.
A side effect is that when normally bringing down the interface and
WoL is enabled the PHY will remain powered on (like it was before the
original patch).

Fixes: fe87bef01f9b ("r8169: don't check WoL when powering down PHY and 
interface is down")
Reported-by: Neil MacLeod 
Signed-off-by: Heiner Kallweit 
---
 drivers/net/ethernet/realtek/r8169.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index 0d8070adc..987fedeee 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -4162,10 +4162,15 @@ static void rtl_wol_suspend_quirk(struct rtl8169_private *tp)
 
 static bool rtl_wol_pll_power_down(struct rtl8169_private *tp)
 {
-	if (!netif_running(tp->dev) || !__rtl8169_get_wol(tp))
+	struct phy_device *phydev;
+
+	if (!__rtl8169_get_wol(tp))
 		return false;
 
-	phy_speed_down(tp->dev->phydev, false);
+	/* phydev may not be attached to netdevice */
+	phydev = mdiobus_get_phy(tp->mii_bus, 0);
+
+	phy_speed_down(phydev, false);
 	rtl_wol_suspend_quirk(tp);
 
 	return true;
-- 
2.19.1
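
As a rough illustration of the condition change in the r8169 patch above, the decision can be modelled in plain userspace C. This is only a sketch: `struct nic_state` and both `keep_phy_powered_*` helpers are hypothetical names invented here, and the two booleans merely stand in for `netif_running(tp->dev)` and `__rtl8169_get_wol(tp)`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the driver state; not the real rtl8169_private. */
struct nic_state {
	bool running;     /* models netif_running(tp->dev) */
	bool wol_enabled; /* models a non-zero __rtl8169_get_wol(tp) */
};

/* Condition before the patch: the PHY stays up only if the interface is running. */
static bool keep_phy_powered_old(const struct nic_state *s)
{
	if (!s->running || !s->wol_enabled)
		return false;
	return true;
}

/* Condition after the patch: only the WoL setting matters. */
static bool keep_phy_powered_new(const struct nic_state *s)
{
	return s->wol_enabled;
}
```

In the S5 case described in the commit message, the interface is already down when shutdown starts, so `running` is false while `wol_enabled` is true; the old condition lets the PHY be suspended and the new one keeps it powered.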

netif_receive_skb is taking long time

2018-10-25 Thread Keyur Amrutbhai Patel

Hi,

In my NIC driver, "netif_receive_skb" is taking too long: almost 3375
nanoseconds, which is more than the whole packet processing from the
interrupt.

Could anyone please help me understand what the reason for this could be?
How can I reduce it to the minimum time?

Are there any standard calls that we need to use in order to get better
performance?

Regards,
Keyur



Re: [PATCH v2 1/3] bpf: allow zero-initializing hash map seed

2018-10-25 Thread Lorenz Bauer

On Tue, 9 Oct 2018 at 01:08, Song Liu  wrote:
>
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -253,6 +253,8 @@ enum bpf_attach_type {
> >  #define BPF_F_NO_COMMON_LRU (1U << 1)
> >  /* Specify numa node during map creation */
> >  #define BPF_F_NUMA_NODE (1U << 2)
> > +/* Zero-initialize hash function seed. This should only be used for testing. */
> > +#define BPF_F_ZERO_SEED (1U << 6)
>
> Please add this line after
> #define BPF_F_STACK_BUILD_ID (1U << 5)

I wanted to keep the flags for BPF_MAP_CREATE grouped together.
Maybe the correct value is (1U << 3)? It seemed like the other flags
were allocated to avoid
overlap between different BPF commands, however, so I tried to follow suit.
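
To make the overlap concern concrete, here is a small userspace sketch using the four flag values quoted in this thread. The helper `flags_are_disjoint` is a hypothetical name made up for this illustration, not something from the kernel headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag values as quoted in this thread. */
#define BPF_F_NO_COMMON_LRU  (1U << 1)
#define BPF_F_NUMA_NODE      (1U << 2)
#define BPF_F_STACK_BUILD_ID (1U << 5)
#define BPF_F_ZERO_SEED      (1U << 6)

/*
 * Hypothetical helper: returns true if every entry is a single bit
 * and no bit is used by more than one entry.
 */
static bool flags_are_disjoint(const uint32_t *flags, unsigned int n)
{
	uint32_t seen = 0;

	for (unsigned int i = 0; i < n; i++) {
		uint32_t f = flags[i];

		if (f == 0 || (f & (f - 1)) != 0)
			return false;	/* not exactly one bit */
		if (seen & f)
			return false;	/* bit already taken */
		seen |= f;
	}
	return true;
}
```

With (1U << 6) the new flag does not collide with any of the quoted values; (1U << 3) would not collide with them either, so the open question is only whether flags of different commands should share bit positions.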

-- 
Lorenz Bauer  |  Systems Engineer
25 Lavington St., London SE1 0NZ

www.cloudflare.com

Re: [PATCH v2 2/3] tools: sync linux/bpf.h

2018-10-25 Thread Lorenz Bauer

On Tue, 9 Oct 2018 at 01:12, Song Liu  wrote:
>
> On Mon, Oct 8, 2018 at 3:34 AM Lorenz Bauer  wrote:
> >
> > Synchronize changes to linux/bpf.h from
> > commit 88db241b34bf ("bpf: allow zero-initializing hash map seed").
> I guess we cannot keep this hash during git-am? We probably don't
> need this hash anyway, as the two patches will be applied back to back.

I copied what was done in one of the previous commits that synced the
header. I'm a bit at a loss as to what to put in the commit message
otherwise.

-- 
Lorenz Bauer  |  Systems Engineer
25 Lavington St., London SE1 0NZ

www.cloudflare.com

Re: [RFC net-next v2 2/8] net: add netif_is_geneve()

2018-10-25 Thread John Hurley

On Thu, Oct 25, 2018 at 2:00 PM Jiri Pirko  wrote:
>
> Thu, Oct 25, 2018 at 02:26:51PM CEST, john.hur...@netronome.com wrote:
> >Add a helper function to determine if the type of a netdev is geneve based
> >on its rtnl_link_ops. This allows drivers that may wish to offload tunnels
> >to check the underlying type of the device.
> >
> >A recent patch added a similar helper to vxlan.h
> >
> >Signed-off-by: John Hurley 
> >Reviewed-by: Jakub Kicinski 
>
> I don't understand why this and the next patch are part of this
> patchset. They don't seem directly related.

This is used in later patches that implement the indirect block
offload but I suppose it is not directly related.
We can probably move it to a separate patchset.
Thanks

Re: [RFC net-next v2 0/8] indirect tc block cb registration

2018-10-25 Thread John Hurley

On Thu, Oct 25, 2018 at 1:58 PM Jiri Pirko  wrote:
>
> Thu, Oct 25, 2018 at 02:26:49PM CEST, john.hur...@netronome.com wrote:
> >This patchset introduces an alternative to egdev offload by allowing a
> >driver to register for block updates when an external device (e.g. tunnel
> >netdev) is bound to a TC block. Drivers can track new netdevs or register
> >to existing ones to receive information on such events. Based on this,
> >they may register for block offload rules using already existing
> >functions.
> >
> >The patchset also implements this new indirect block registration in the
> >NFP driver to allow the offloading of tunnel rules. The use of egdev
> >offload (which is currently only used for tunnel offload) is subsequently
> >removed.
>
> John, I'm missing v1->v2 changelog. Could you please add it?
>
> Thanks!

Hi Jiri,
There's little change outside the NFP in v2 but here's a short changelog:

v1->v2:
- free allocated owner struct in block_owner_clean function
- add geneve type helper function
- move test stub in NFP (v1 patch 2) to full tunnel offload
implementation via indirect blocks (v2 patches 3-8)

[PATCH] selftests/bpf: add config fragments BPF_STREAM_PARSER and XDP_SOCKETS

2018-10-25 Thread Naresh Kamboju

BPF sockmap and hashmap are dependent on CONFIG_BPF_STREAM_PARSER and
xskmap is dependent on CONFIG_XDP_SOCKETS

Signed-off-by: Naresh Kamboju 
---
 tools/testing/selftests/bpf/config | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/testing/selftests/bpf/config 
b/tools/testing/selftests/bpf/config
index dd49df5e2df4..7f90d3645af8 100644
--- a/tools/testing/selftests/bpf/config
+++ b/tools/testing/selftests/bpf/config
@@ -20,3 +20,5 @@ CONFIG_VXLAN=y
 CONFIG_GENEVE=y
 CONFIG_NET_CLS_FLOWER=m
 CONFIG_LWTUNNEL=y
+CONFIG_BPF_STREAM_PARSER=y
+CONFIG_XDP_SOCKETS=y
-- 
2.17.1

[PATCH net] drivers: net: remove <net/busy_poll.h> inclusion when not needed

2018-10-25 Thread Eric Dumazet

Drivers using generic NAPI interface no longer need to include
<net/busy_poll.h>, since busy polling was moved to core networking
stack long ago.

See commit 79e7fff47b7b ("net: remove support for per driver
ndo_busy_poll()") for reference.

Signed-off-by: Eric Dumazet 
---
 drivers/net/ethernet/amd/xgbe/xgbe-drv.c | 1 -
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c  | 1 -
 drivers/net/ethernet/intel/i40e/i40e_txrx.c  | 1 -
 drivers/net/ethernet/intel/iavf/iavf_txrx.c  | 1 -
 drivers/net/ethernet/intel/ixgbe/ixgbe.h | 1 -
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c   | 1 -
 drivers/net/ethernet/mellanox/mlx4/en_rx.c   | 1 -
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c  | 1 -
 drivers/net/ethernet/myricom/myri10ge/myri10ge.c | 1 -
 9 files changed, 9 deletions(-)

diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-drv.c 
b/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
index 
d96a84a62d78dff9625ce78d15779a05df8b510c..0cc911f928b143c5e511b83f3e1a4494443a1f2e
 100644
--- a/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
+++ b/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
@@ -119,7 +119,6 @@
 #include 
 #include 
 #include 
-#include <net/busy_poll.h>
 #include 
 #include 
 #include 
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c 
b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
index 
5a727d4729da7348075b75101154cca3cf515073..686899d7e555e84a4c78459e1780765b9794b913
 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
@@ -27,7 +27,6 @@
 #include 
 #include 
 #include 
-#include <net/busy_poll.h>
 #include 
 #include "bnx2x_cmn.h"
 #include "bnx2x_init.h"
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c 
b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 
740ea58ba938d412d87048eb2b109a2a5d358170..aef3c89ee79c4e7384e0713c55b12090c1c36f60
 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -2,7 +2,6 @@
 /* Copyright(c) 2013 - 2018 Intel Corporation. */
 
 #include 
-#include <net/busy_poll.h>
 #include 
 #include 
 #include "i40e.h"
diff --git a/drivers/net/ethernet/intel/iavf/iavf_txrx.c 
b/drivers/net/ethernet/intel/iavf/iavf_txrx.c
index 
edc349f4974827d6a09afe2f650c6327357f591e..fb9bfad96daff5f57f079a24d47be0d55cb8f96c
 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_txrx.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_txrx.c
@@ -2,7 +2,6 @@
 /* Copyright(c) 2013 - 2018 Intel Corporation. */
 
 #include 
-#include <net/busy_poll.h>
 
 #include "iavf.h"
 #include "iavf_trace.h"
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h 
b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index 
7a7679e7be84c548c091452bdcea736567fffc66..ec1b87cc44100904bf7b486692bc1d06b256fc80
 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -30,7 +30,6 @@
 #include "ixgbe_ipsec.h"
 
 #include 
-#include <net/busy_poll.h>
 
 /* common prefix used by pr_<> macros */
 #undef pr_fmt
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c 
b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index 
fe49384eba48cb3f75bd33f5bfb9cf1fa15af791..b744cd49a7856e97917bcdce93e7d5ee205f09cd
 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -39,7 +39,6 @@
 #include 
 #include 
 #include 
-#include <net/busy_poll.h>
 #include 
 #include 
 
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c 
b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index 
a1aeeb8094c376f9fac9610b7a4606e92860d4fc..5a6d0919533d6e0e619927abd753c5d07ed95dac
 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -31,7 +31,6 @@
  *
  */
 
-#include <net/busy_poll.h>
 #include 
 #include 
 #include 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 
2f7fb8de6967293f26d9ee72b96a66807fdfdde9..94224c22ecc310a87b6715051e335446f29bec03
 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -34,7 +34,6 @@
 #include 
 #include 
 #include 
-#include <net/busy_poll.h>
 #include 
 #include 
 #include 
diff --git a/drivers/net/ethernet/myricom/myri10ge/myri10ge.c 
b/drivers/net/ethernet/myricom/myri10ge/myri10ge.c
index 
b2d2ec8c11e2d15e0562ca89c2f76d58bc4e69c3..5f384f73007daf478b25dc529159d9d9da062419
 100644
--- a/drivers/net/ethernet/myricom/myri10ge/myri10ge.c
+++ b/drivers/net/ethernet/myricom/myri10ge/myri10ge.c
@@ -70,7 +70,6 @@
 #include 
 #include 
 #include 
-#include <net/busy_poll.h>
 
 #include "myri10ge_mcp.h"
 #include "myri10ge_mcp_gen_header.h"
-- 
2.19.1.568.g152ad8e336-goog

Re: [RFC PATCH v2 01/10] udp: implement complete book-keeping for encap_needed

2018-10-25 Thread Paolo Abeni

Hi,

I'm sorry for lagging behind; this one fell off my radar.

On Mon, 2018-10-22 at 12:06 -0400, Willem de Bruijn wrote:
> @@ -2431,7 +2435,9 @@ int udp_lib_setsockopt(struct sock *sk, int level, int 
> optname,
> > /* FALLTHROUGH */
> > case UDP_ENCAP_L2TPINUDP:
> > up->encap_type = val;
> > -   udp_encap_enable();
> > +   if (!up->encap_enabled)
> > +   udp_encap_enable();
> > +   up->encap_enabled = 1;
> 
> nit: no need for the branch: udp_encap_enable already has a branch and
> is static inline.

Uhm... I think it's needed, so that we call udp_encap_enable() at most
once per socket. If up->encap_enabled we also call static_key_disable
at socket destruction time (once per socket, again) and hopefully we
don't "leak" static_key_enable invocation.
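
The accounting I have in mind can be sketched in userspace C roughly as follows. All names here (`encap_key_refs`, `struct fake_udp_sock`, `fake_encap_setsockopt`, `fake_sock_destroy`) are hypothetical stand-ins; the counter only models how many enable calls are outstanding, not the real static key implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical counter modelling outstanding enable calls on the key. */
static int encap_key_refs;

/* Hypothetical stand-in for the per-socket state. */
struct fake_udp_sock {
	bool encap_enabled;
};

/* Models the setsockopt path: enable at most once per socket. */
static void fake_encap_setsockopt(struct fake_udp_sock *up)
{
	if (!up->encap_enabled)
		encap_key_refs++;	/* models udp_encap_enable() */
	up->encap_enabled = true;
}

/* Models socket destruction: undo the enable, again once per socket. */
static void fake_sock_destroy(struct fake_udp_sock *up)
{
	if (up->encap_enabled) {
		encap_key_refs--;
		up->encap_enabled = false;
	}
}
```

Without the branch in the setsockopt path, a socket that sets the option twice would bump the counter twice but drop it only once at destruction, which is the leak I would like to avoid.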

> Perhaps it makes sense to convert that to take the udp_sock and handle
> the state change within, to avoid having to open code at multiple
> locations.

Possibly by calling udp_tunnel_encap_enable() directly? That additionally
copes with IPv6, which is not needed here but should not hurt.

Cheers,

Paolo

Re: [RFC net-next v2 2/8] net: add netif_is_geneve()

2018-10-25 Thread Jiri Pirko

Thu, Oct 25, 2018 at 02:26:51PM CEST, john.hur...@netronome.com wrote:
>Add a helper function to determine if the type of a netdev is geneve based
>on its rtnl_link_ops. This allows drivers that may wish to offload tunnels
>to check the underlying type of the device.
>
>A recent patch added a similar helper to vxlan.h
>
>Signed-off-by: John Hurley 
>Reviewed-by: Jakub Kicinski 

I don't understand why this and the next patch are part of this
patchset. They don't seem directly related.

Re: [RFC net-next v2 0/8] indirect tc block cb registration

2018-10-25 Thread Jiri Pirko

Thu, Oct 25, 2018 at 02:26:49PM CEST, john.hur...@netronome.com wrote:
>This patchset introduces an alternative to egdev offload by allowing a
>driver to register for block updates when an external device (e.g. tunnel
>netdev) is bound to a TC block. Drivers can track new netdevs or register
>to existing ones to receive information on such events. Based on this,
>they may register for block offload rules using already existing
>functions.
>
>The patchset also implements this new indirect block registration in the
>NFP driver to allow the offloading of tunnel rules. The use of egdev
>offload (which is currently only used for tunnel offload) is subsequently
>removed.

John, I'm missing v1->v2 changelog. Could you please add it?

Thanks!

[PATCH net] net: phy: genphy_10g_driver: Avoid NULL pointer dereference

2018-10-25 Thread Andrew Lunn

This driver got missed during the recent change of .features from a
u32 to a pointer to a Linux bitmap. Change the initialisation from 0
to PHY_10GBIT_FEATURES, thereby removing the danger of a NULL pointer
dereference.

Fixes: 719655a14971 ("net: phy: Replace phy driver features u32 with link_mode bitmap")
Reported-by: Jose Abreu 
Signed-off-by: Andrew Lunn 
---
 drivers/net/phy/phy-c45.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/phy/phy-c45.c b/drivers/net/phy/phy-c45.c
index e1225545362d..d7636ff03bc7 100644
--- a/drivers/net/phy/phy-c45.c
+++ b/drivers/net/phy/phy-c45.c
@@ -329,7 +329,7 @@ struct phy_driver genphy_10g_driver = {
 	.name		= "Generic 10G PHY",
 	.soft_reset	= gen10g_no_soft_reset,
 	.config_init	= gen10g_config_init,
-	.features	= 0,
+	.features	= PHY_10GBIT_FEATURES,
 	.config_aneg	= gen10g_config_aneg,
 	.read_status	= gen10g_read_status,
 	.suspend	= gen10g_suspend,
-- 
2.19.1
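
The hazard fixed by the genphy_10g_driver patch above can be sketched in userspace C. Once a field changes from an integer to a pointer, an initialiser of 0 still compiles but now means NULL. Everything below is a hypothetical model: `struct fake_phy_driver`, `fake_10gbit_features` and the helpers are invented for illustration, and the NULL guard in `fake_supports` exists only so the sketch can show the broken case safely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a driver struct whose .features became a pointer. */
struct fake_phy_driver {
	const char *name;
	const unsigned long *features;	/* was an integer mask before the change */
};

/* Hypothetical one-word bitmap with bit 3 set. */
static const unsigned long fake_10gbit_features[1] = { 1UL << 3 };

/* Left-over initialiser from the integer days: 0 is now a NULL pointer. */
static const struct fake_phy_driver broken_drv = {
	.name = "Generic 10G PHY",
	.features = 0,
};

/* Fixed initialiser pointing at a real bitmap. */
static const struct fake_phy_driver fixed_drv = {
	.name = "Generic 10G PHY",
	.features = fake_10gbit_features,
};

/* Returns true if the driver advertises the given bit (bit < 32 assumed). */
static bool fake_supports(const struct fake_phy_driver *drv, unsigned int bit)
{
	if (drv->features == NULL)
		return false;	/* guard added for this sketch only */
	return (drv->features[0] >> bit) & 1UL;
}
```

Any code that reads through `broken_drv.features` without such a guard would dereference NULL, which is the danger the commit message refers to.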

[RFC net-next v2 8/8] nfp: flower: remove unnecessary code in flow lookup

2018-10-25 Thread John Hurley

Recent changes to NFP mean that stats updates from fw to driver no longer
require a flow lookup and (because egdev offload has been removed) the
ingress netdev for a lookup is now always known.

Remove obsolete code in a flow lookup that matches on host context and
that allows for a netdev to be NULL.

Signed-off-by: John Hurley 
Reviewed-by: Jakub Kicinski 
---
 drivers/net/ethernet/netronome/nfp/flower/main.h |  3 +--
 drivers/net/ethernet/netronome/nfp/flower/metadata.c | 11 +++
 drivers/net/ethernet/netronome/nfp/flower/offload.c  |  6 ++
 3 files changed, 6 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/flower/main.h 
b/drivers/net/ethernet/netronome/nfp/flower/main.h
index d8c8f0d..3d3a13f 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/main.h
+++ b/drivers/net/ethernet/netronome/nfp/flower/main.h
@@ -20,7 +20,6 @@ struct nfp_fl_pre_lag;
 struct net_device;
 struct nfp_app;
 
-#define NFP_FL_STATS_CTX_DONT_CARE cpu_to_be32(0x)
 #define NFP_FL_STATS_ELEM_RS   FIELD_SIZEOF(struct nfp_fl_stats_id, \
 init_unalloc)
 #define NFP_FLOWER_MASK_ENTRY_RS   256
@@ -248,7 +247,7 @@ int nfp_modify_flow_metadata(struct nfp_app *app,
 
 struct nfp_fl_payload *
 nfp_flower_search_fl_table(struct nfp_app *app, unsigned long tc_flower_cookie,
-  struct net_device *netdev, __be32 host_ctx);
+  struct net_device *netdev);
 struct nfp_fl_payload *
 nfp_flower_remove_fl_table(struct nfp_app *app, unsigned long 
tc_flower_cookie);
 
diff --git a/drivers/net/ethernet/netronome/nfp/flower/metadata.c 
b/drivers/net/ethernet/netronome/nfp/flower/metadata.c
index 9b4711c..573a440 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/metadata.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/metadata.c
@@ -21,7 +21,6 @@ struct nfp_mask_id_table {
 struct nfp_fl_flow_table_cmp_arg {
struct net_device *netdev;
unsigned long cookie;
-   __be32 host_ctx;
 };
 
 static int nfp_release_stats_entry(struct nfp_app *app, u32 stats_context_id)
@@ -76,14 +75,13 @@ static int nfp_get_stats_entry(struct nfp_app *app, u32 
*stats_context_id)
 /* Must be called with either RTNL or rcu_read_lock */
 struct nfp_fl_payload *
 nfp_flower_search_fl_table(struct nfp_app *app, unsigned long tc_flower_cookie,
-  struct net_device *netdev, __be32 host_ctx)
+  struct net_device *netdev)
 {
struct nfp_fl_flow_table_cmp_arg flower_cmp_arg;
struct nfp_flower_priv *priv = app->priv;
 
flower_cmp_arg.netdev = netdev;
flower_cmp_arg.cookie = tc_flower_cookie;
-   flower_cmp_arg.host_ctx = host_ctx;
 
return rhashtable_lookup_fast(&priv->flow_table, &flower_cmp_arg,
  nfp_flower_table_params);
@@ -307,8 +305,7 @@ int nfp_compile_flow_metadata(struct nfp_app *app,
priv->stats[stats_cxt].bytes = 0;
priv->stats[stats_cxt].used = jiffies;
 
-   check_entry = nfp_flower_search_fl_table(app, flow->cookie, netdev,
-NFP_FL_STATS_CTX_DONT_CARE);
+   check_entry = nfp_flower_search_fl_table(app, flow->cookie, netdev);
if (check_entry) {
if (nfp_release_stats_entry(app, stats_cxt))
return -EINVAL;
@@ -353,9 +350,7 @@ static int nfp_fl_obj_cmpfn(struct rhashtable_compare_arg 
*arg,
const struct nfp_fl_flow_table_cmp_arg *cmp_arg = arg->key;
const struct nfp_fl_payload *flow_entry = obj;
 
-   if ((!cmp_arg->netdev || flow_entry->ingress_dev == cmp_arg->netdev) &&
-   (cmp_arg->host_ctx == NFP_FL_STATS_CTX_DONT_CARE ||
-flow_entry->meta.host_ctx_id == cmp_arg->host_ctx))
+   if (flow_entry->ingress_dev == cmp_arg->netdev)
return flow_entry->tc_flower_cookie != cmp_arg->cookie;
 
return 1;
diff --git a/drivers/net/ethernet/netronome/nfp/flower/offload.c 
b/drivers/net/ethernet/netronome/nfp/flower/offload.c
index 392d292..07ff728 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/offload.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/offload.c
@@ -512,8 +512,7 @@ nfp_flower_del_offload(struct nfp_app *app, struct 
net_device *netdev,
if (nfp_netdev_is_nfp_repr(netdev))
port = nfp_port_from_netdev(netdev);
 
-   nfp_flow = nfp_flower_search_fl_table(app, flow->cookie, netdev,
- NFP_FL_STATS_CTX_DONT_CARE);
+   nfp_flow = nfp_flower_search_fl_table(app, flow->cookie, netdev);
if (!nfp_flow)
return -ENOENT;
 
@@ -561,8 +560,7 @@ nfp_flower_get_stats(struct nfp_app *app, struct net_device 
*netdev,
struct nfp_fl_payload *nfp_flow;
u32 ctx_id;
 
-   nfp_flow = nfp_flower_search_fl_table(app, flow->cookie, netdev,
-

[RFC net-next v2 5/8] nfp: flower: add infrastructure for indirect TC block register

2018-10-25 Thread John Hurley

Add support structures and functions that can be used by NFP to implement
the indirect block register functionality of TC.

Signed-off-by: John Hurley 
Reviewed-by: Jakub Kicinski 
---
 drivers/net/ethernet/netronome/nfp/flower/main.c   |  13 +++
 drivers/net/ethernet/netronome/nfp/flower/main.h   |   8 ++
 .../net/ethernet/netronome/nfp/flower/offload.c| 129 +
 3 files changed, 150 insertions(+)

diff --git a/drivers/net/ethernet/netronome/nfp/flower/main.c 
b/drivers/net/ethernet/netronome/nfp/flower/main.c
index 3a54728..518006c 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/main.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/main.c
@@ -568,8 +568,18 @@ static int nfp_flower_init(struct nfp_app *app)
goto err_cleanup_metadata;
}
 
+   INIT_LIST_HEAD(&app_priv->indr_block_cb_priv);
+   app_priv->indr_block_owner = tc_indr_block_owner_create();
+   if (!app_priv->indr_block_owner) {
+   err = -ENOMEM;
+   goto err_lag_clean;
+   }
+
return 0;
 
+err_lag_clean:
+   if (app_priv->flower_ext_feats & NFP_FL_FEATS_LAG)
+   nfp_flower_lag_cleanup(&app_priv->nfp_lag);
 err_cleanup_metadata:
nfp_flower_metadata_cleanup(app);
 err_free_app_priv:
@@ -588,6 +598,8 @@ static void nfp_flower_clean(struct nfp_app *app)
if (app_priv->flower_ext_feats & NFP_FL_FEATS_LAG)
nfp_flower_lag_cleanup(&app_priv->nfp_lag);
 
+   nfp_flower_clean_indr_block_priv(app);
+
nfp_flower_metadata_cleanup(app);
vfree(app->priv);
app->priv = NULL;
@@ -678,6 +690,7 @@ static void nfp_flower_stop(struct nfp_app *app)
unregister_netdevice_notifier(&app_priv->nfp_lag.lag_nb);
 
nfp_tunnel_config_stop(app);
+   tc_indr_block_owner_clean(app_priv->indr_block_owner);
 }
 
 const struct nfp_app_type app_flower = {
diff --git a/drivers/net/ethernet/netronome/nfp/flower/main.h 
b/drivers/net/ethernet/netronome/nfp/flower/main.h
index a91ac52..8b4bcf3 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/main.h
+++ b/drivers/net/ethernet/netronome/nfp/flower/main.h
@@ -133,6 +133,8 @@ struct nfp_fl_lag {
  * @reify_wait_queue:  wait queue for repr reify response counting
  * @mtu_conf:  Configuration of repr MTU value
  * @nfp_lag:   Link aggregation data block
+ * @indr_block_cb_priv:List of priv data passed to indirect block 
registers
+ * @indr_block_owner:  Struct required for indirect blocks
  */
 struct nfp_flower_priv {
struct nfp_app *app;
@@ -166,6 +168,8 @@ struct nfp_flower_priv {
wait_queue_head_t reify_wait_queue;
struct nfp_mtu_conf mtu_conf;
struct nfp_fl_lag nfp_lag;
+   struct list_head indr_block_cb_priv;
+   struct tcf_indr_block_owner *indr_block_owner;
 };
 
 /**
@@ -269,5 +273,9 @@ int nfp_flower_lag_populate_pre_action(struct nfp_app *app,
   struct nfp_fl_pre_lag *pre_act);
 int nfp_flower_lag_get_output_id(struct nfp_app *app,
 struct net_device *master);
+void
+nfp_flower_register_indr_block(struct nfp_app *app, struct net_device *netdev);
+void nfp_flower_unregister_indr_block(struct net_device *netdev);
+void nfp_flower_clean_indr_block_priv(struct nfp_app *app);
 
 #endif
diff --git a/drivers/net/ethernet/netronome/nfp/flower/offload.c 
b/drivers/net/ethernet/netronome/nfp/flower/offload.c
index 2c32edf..f701b2e 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/offload.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/offload.c
@@ -693,3 +693,132 @@ int nfp_flower_setup_tc(struct nfp_app *app, struct 
net_device *netdev,
return -EOPNOTSUPP;
}
 }
+
+struct nfp_flower_indr_block_cb_priv {
+   struct net_device *netdev;
+   struct nfp_app *app;
+   struct list_head list;
+};
+
+static struct nfp_flower_indr_block_cb_priv *
+nfp_flower_indr_block_cb_priv_lookup(struct nfp_app *app,
+struct net_device *netdev)
+{
+   struct nfp_flower_indr_block_cb_priv *cb_priv;
+   struct nfp_flower_priv *priv = app->priv;
+
+   /* All callback list access should be protected by RTNL. */
+   ASSERT_RTNL();
+
+   list_for_each_entry(cb_priv, &priv->indr_block_cb_priv, list)
+   if (cb_priv->netdev == netdev)
+   return cb_priv;
+
+   return NULL;
+}
+
+void nfp_flower_clean_indr_block_priv(struct nfp_app *app)
+{
+   struct nfp_flower_indr_block_cb_priv *cb_priv, *temp;
+   struct nfp_flower_priv *priv = app->priv;
+
+   list_for_each_entry_safe(cb_priv, temp, &priv->indr_block_cb_priv, list)
+   kfree(cb_priv);
+}
+
+static int nfp_flower_setup_indr_block_cb(enum tc_setup_type type,
+ void *type_data, void *cb_priv)
+{
+   struct nfp_flower_indr_block_cb_priv *priv = cb_priv;
+   struct tc_cls_flower_of

[RFC net-next v2 6/8] nfp: flower: offload tunnel decap rules via indirect TC blocks

2018-10-25 Thread John Hurley

Previously, TC block tunnel decap rules were only offloaded when a
callback was triggered through registration of the rule's egress device.
This meant that the driver had no access to the ingress netdev and so
could not verify it was the same tunnel type that the rule implied.

Register tunnel devices for indirect TC block offloads in NFP, giving
access to new rules based on the ingress device rather than egress. Use
this to verify the netdev type of VXLAN and Geneve based rules and offload
the rules to HW if applicable.

Tunnel registration is done via a netdev notifier. On notifier
registration, this is triggered for already existing netdevs. This means
that NFP can register for offloads from devices that exist before it is
loaded (filter rules will be replayed from the TC core).

Signed-off-by: John Hurley 
Reviewed-by: Jakub Kicinski 
---
 drivers/net/ethernet/netronome/nfp/flower/action.c  | 15 ---
 drivers/net/ethernet/netronome/nfp/flower/cmsg.h| 13 +
 drivers/net/ethernet/netronome/nfp/flower/offload.c | 11 +++
 drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c |  9 -
 4 files changed, 28 insertions(+), 20 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/flower/action.c 
b/drivers/net/ethernet/netronome/nfp/flower/action.c
index 04349c7..1260825 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/action.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/action.c
@@ -91,21 +91,6 @@ nfp_fl_pre_lag(struct nfp_app *app, const struct tc_action 
*action,
return act_size;
 }
 
-static bool nfp_fl_netdev_is_tunnel_type(struct net_device *out_dev,
-enum nfp_flower_tun_type tun_type)
-{
-   if (!out_dev->rtnl_link_ops)
-   return false;
-
-   if (!strcmp(out_dev->rtnl_link_ops->kind, "vxlan"))
-   return tun_type == NFP_FL_TUNNEL_VXLAN;
-
-   if (!strcmp(out_dev->rtnl_link_ops->kind, "geneve"))
-   return tun_type == NFP_FL_TUNNEL_GENEVE;
-
-   return false;
-}
-
 static int
 nfp_fl_output(struct nfp_app *app, struct nfp_fl_output *output,
  const struct tc_action *action, struct nfp_fl_payload *nfp_flow,
diff --git a/drivers/net/ethernet/netronome/nfp/flower/cmsg.h 
b/drivers/net/ethernet/netronome/nfp/flower/cmsg.h
index 29d673a..06e2888 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/cmsg.h
+++ b/drivers/net/ethernet/netronome/nfp/flower/cmsg.h
@@ -8,6 +8,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "../nfp_app.h"
 #include "../nfpcore/nfp_cpp.h"
@@ -475,6 +476,18 @@ static inline int nfp_flower_cmsg_get_data_len(struct 
sk_buff *skb)
return skb->len - NFP_FLOWER_CMSG_HLEN;
 }
 
+static inline bool
+nfp_fl_netdev_is_tunnel_type(struct net_device *dev,
+enum nfp_flower_tun_type tun_type)
+{
+   if (netif_is_vxlan(dev))
+   return tun_type == NFP_FL_TUNNEL_VXLAN;
+   if (netif_is_geneve(dev))
+   return tun_type == NFP_FL_TUNNEL_GENEVE;
+
+   return false;
+}
+
 struct sk_buff *
 nfp_flower_cmsg_mac_repr_start(struct nfp_app *app, unsigned int num_ports);
 void
diff --git a/drivers/net/ethernet/netronome/nfp/flower/offload.c 
b/drivers/net/ethernet/netronome/nfp/flower/offload.c
index f701b2e..1dc6044 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/offload.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/offload.c
@@ -128,6 +128,7 @@ nfp_flower_calc_opt_layer(struct 
flow_dissector_key_enc_opts *enc_opts,
 
 static int
 nfp_flower_calculate_key_layers(struct nfp_app *app,
+   struct net_device *netdev,
struct nfp_fl_key_ls *ret_key_ls,
struct tc_cls_flower_offload *flow,
bool egress,
@@ -186,8 +187,6 @@ nfp_flower_calculate_key_layers(struct nfp_app *app,
skb_flow_dissector_target(flow->dissector,
  
FLOW_DISSECTOR_KEY_ENC_CONTROL,
  flow->key);
-   if (!egress)
-   return -EOPNOTSUPP;
 
if (mask_enc_ctl->addr_type != 0x ||
enc_ctl->addr_type != FLOW_DISSECTOR_KEY_IPV4_ADDRS)
@@ -250,6 +249,10 @@ nfp_flower_calculate_key_layers(struct nfp_app *app,
default:
return -EOPNOTSUPP;
}
+
+   /* Ensure the ingress netdev matches the expected tun type. */
+   if (!nfp_fl_netdev_is_tunnel_type(netdev, *tun_type))
+   return -EOPNOTSUPP;
} else if (egress) {
/* Reject non tunnel matches offloaded to egress repr. */
return -EOPNOTSUPP;
@@ -451,8 +454,8 @@ nfp_flower_add_offload(struct nfp_app *app, struct 
net_device *netdev,
if (!key_layer)
return -ENOMEM;
 
-

[RFC net-next v2 7/8] nfp: flower: remove TC egdev offloads

2018-10-25 Thread John Hurley

Previously, only tunnel decap rules required egdev registration for
offload in NFP. These are now supported via indirect TC block callbacks.

Remove the egdev code from NFP.

Signed-off-by: John Hurley 
Reviewed-by: Jakub Kicinski 
---
 drivers/net/ethernet/netronome/nfp/flower/main.c   | 12 
 drivers/net/ethernet/netronome/nfp/flower/main.h   |  3 -
 .../net/ethernet/netronome/nfp/flower/metadata.c   |  1 +
 .../net/ethernet/netronome/nfp/flower/offload.c| 79 +-
 4 files changed, 17 insertions(+), 78 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/flower/main.c 
b/drivers/net/ethernet/netronome/nfp/flower/main.c
index 518006c..45ab4be 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/main.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/main.c
@@ -146,23 +146,12 @@ nfp_flower_repr_netdev_stop(struct nfp_app *app, struct 
nfp_repr *repr)
return nfp_flower_cmsg_portmod(repr, false, repr->netdev->mtu, false);
 }
 
-static int
-nfp_flower_repr_netdev_init(struct nfp_app *app, struct net_device *netdev)
-{
-   return tc_setup_cb_egdev_register(netdev,
- nfp_flower_setup_tc_egress_cb,
- netdev_priv(netdev));
-}
-
 static void
 nfp_flower_repr_netdev_clean(struct nfp_app *app, struct net_device *netdev)
 {
struct nfp_repr *repr = netdev_priv(netdev);
 
kfree(repr->app_priv);
-
-   tc_setup_cb_egdev_unregister(netdev, nfp_flower_setup_tc_egress_cb,
-netdev_priv(netdev));
 }
 
 static void
@@ -711,7 +700,6 @@ const struct nfp_app_type app_flower = {
.vnic_init  = nfp_flower_vnic_init,
.vnic_clean = nfp_flower_vnic_clean,
 
-   .repr_init  = nfp_flower_repr_netdev_init,
.repr_preclean  = nfp_flower_repr_netdev_preclean,
.repr_clean = nfp_flower_repr_netdev_clean,
 
diff --git a/drivers/net/ethernet/netronome/nfp/flower/main.h 
b/drivers/net/ethernet/netronome/nfp/flower/main.h
index 8b4bcf3..d8c8f0d 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/main.h
+++ b/drivers/net/ethernet/netronome/nfp/flower/main.h
@@ -213,7 +213,6 @@ struct nfp_fl_payload {
char *unmasked_data;
char *mask_data;
char *action_data;
-   bool ingress_offload;
 };
 
 extern const struct rhashtable_params nfp_flower_table_params;
@@ -262,8 +261,6 @@ void nfp_tunnel_del_ipv4_off(struct nfp_app *app, __be32 
ipv4);
 void nfp_tunnel_add_ipv4_off(struct nfp_app *app, __be32 ipv4);
 void nfp_tunnel_request_route(struct nfp_app *app, struct sk_buff *skb);
 void nfp_tunnel_keep_alive(struct nfp_app *app, struct sk_buff *skb);
-int nfp_flower_setup_tc_egress_cb(enum tc_setup_type type, void *type_data,
- void *cb_priv);
 void nfp_flower_lag_init(struct nfp_fl_lag *lag);
 void nfp_flower_lag_cleanup(struct nfp_fl_lag *lag);
 int nfp_flower_lag_reset(struct nfp_fl_lag *lag);
diff --git a/drivers/net/ethernet/netronome/nfp/flower/metadata.c 
b/drivers/net/ethernet/netronome/nfp/flower/metadata.c
index 48729bf..9b4711c 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/metadata.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/metadata.c
@@ -287,6 +287,7 @@ int nfp_compile_flow_metadata(struct nfp_app *app,
 
nfp_flow->meta.host_ctx_id = cpu_to_be32(stats_cxt);
nfp_flow->meta.host_cookie = cpu_to_be64(flow->cookie);
+   nfp_flow->ingress_dev = netdev;
 
new_mask_id = 0;
if (!nfp_check_mask_add(app, nfp_flow->mask_data,
diff --git a/drivers/net/ethernet/netronome/nfp/flower/offload.c 
b/drivers/net/ethernet/netronome/nfp/flower/offload.c
index 1dc6044..392d292 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/offload.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/offload.c
@@ -131,7 +131,6 @@ nfp_flower_calculate_key_layers(struct nfp_app *app,
struct net_device *netdev,
struct nfp_fl_key_ls *ret_key_ls,
struct tc_cls_flower_offload *flow,
-   bool egress,
enum nfp_flower_tun_type *tun_type)
 {
struct flow_dissector_key_basic *mask_basic = NULL;
@@ -253,9 +252,6 @@ nfp_flower_calculate_key_layers(struct nfp_app *app,
/* Ensure the ingress netdev matches the expected tun type. */
if (!nfp_fl_netdev_is_tunnel_type(netdev, *tun_type))
return -EOPNOTSUPP;
-   } else if (egress) {
-   /* Reject non tunnel matches offloaded to egress repr. */
-   return -EOPNOTSUPP;
}
 
if (dissector_uses_key(flow->dissector, FLOW_DISSECTOR_KEY_BASIC)) {
@@ -376,7 +372,7 @@ nfp_flower_calculate_key_layers(struct nfp_app *app,
 }
 
 static struct nfp_fl_payload *
-nfp_flower_allocate_new(struct nfp_fl_key_ls *key_layer, bool egress)
+nfp_flower_allocate_new(struct n

[RFC net-next v2 2/8] net: add netif_is_geneve()

2018-10-25 Thread John Hurley

Add a helper function to determine if the type of a netdev is geneve based
on its rtnl_link_ops. This allows drivers that may wish to offload tunnels
to check the underlying type of the device.

A recent patch added a similar helper to vxlan.h

Signed-off-by: John Hurley 
Reviewed-by: Jakub Kicinski 
---
 include/net/geneve.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/include/net/geneve.h b/include/net/geneve.h
index a7600ed..fc6a7e0 100644
--- a/include/net/geneve.h
+++ b/include/net/geneve.h
@@ -60,6 +60,12 @@ struct genevehdr {
struct geneve_opt options[];
 };
 
+static inline bool netif_is_geneve(const struct net_device *dev)
+{
+   return dev->rtnl_link_ops &&
+  !strcmp(dev->rtnl_link_ops->kind, "geneve");
+}
+
 #ifdef CONFIG_INET
 struct net_device *geneve_dev_create_fb(struct net *net, const char *name,
u8 name_assign_type, u16 dst_port);
-- 
2.7.4
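
For readers following along outside a kernel tree, here is a minimal userspace sketch of the check that netif_is_geneve() performs: make sure the link ops pointer is set, then compare its kind string. The mock_link_ops and mock_net_device structures and the mock_netif_is_kind() name are hypothetical stand-ins, not kernel API; only the NULL check followed by strcmp() is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct rtnl_link_ops (only the kind string). */
struct mock_link_ops {
	const char *kind;
};

/* Hypothetical stand-in for struct net_device. */
struct mock_net_device {
	const struct mock_link_ops *rtnl_link_ops;
};

/*
 * Same shape as the helper in the patch: a device without link ops never
 * matches, otherwise the kind string decides.
 */
static bool mock_netif_is_kind(const struct mock_net_device *dev,
			       const char *kind)
{
	assert(dev != NULL && kind != NULL);
	return dev->rtnl_link_ops &&
	       !strcmp(dev->rtnl_link_ops->kind, kind);
}
```

A device whose ops carry the kind "geneve" matches "geneve" and nothing else, and a device with no link ops at all matches no kind.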

[RFC net-next v2 4/8] nfp: flower: allow non repr netdev offload

2018-10-25 Thread John Hurley

Previously the offload functions in NFP assumed that the ingress (or
egress) netdev passed to them was an nfp repr.

Modify the driver to permit the passing of non repr netdevs as the ingress
device for an offload rule candidate. This may include devices such as
tunnels. The driver should then base its offload decision on a combination
of ingress device and egress port for a rule.

Signed-off-by: John Hurley 
Reviewed-by: Jakub Kicinski 
---
 drivers/net/ethernet/netronome/nfp/flower/action.c | 14 
 drivers/net/ethernet/netronome/nfp/flower/main.h   |  3 +-
 drivers/net/ethernet/netronome/nfp/flower/match.c  | 38 --
 .../net/ethernet/netronome/nfp/flower/offload.c| 33 +++
 4 files changed, 49 insertions(+), 39 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/flower/action.c b/drivers/net/ethernet/netronome/nfp/flower/action.c
index 244dc26..04349c7 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/action.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/action.c
@@ -151,11 +151,12 @@ nfp_fl_output(struct nfp_app *app, struct nfp_fl_output *output,
/* Set action output parameters. */
output->flags = cpu_to_be16(tmp_flags);
 
-   /* Only offload if egress ports are on the same device as the
-* ingress port.
-*/
-   if (!switchdev_port_same_parent_id(in_dev, out_dev))
-   return -EOPNOTSUPP;
+   if (nfp_netdev_is_nfp_repr(in_dev)) {
+   /* Confirm ingress and egress are on same device. */
+   if (!switchdev_port_same_parent_id(in_dev, out_dev))
+   return -EOPNOTSUPP;
+   }
+
if (!nfp_netdev_is_nfp_repr(out_dev))
return -EOPNOTSUPP;
 
@@ -728,9 +729,8 @@ nfp_flower_loop_action(struct nfp_app *app, const struct tc_action *a,
*a_len += sizeof(struct nfp_fl_push_vlan);
} else if (is_tcf_tunnel_set(a)) {
struct ip_tunnel_info *ip_tun = tcf_tunnel_info(a);
-   struct nfp_repr *repr = netdev_priv(netdev);
 
-   *tun_type = nfp_fl_get_tun_from_act_l4_port(repr->app, a);
+   *tun_type = nfp_fl_get_tun_from_act_l4_port(app, a);
if (*tun_type == NFP_FL_TUNNEL_NONE)
return -EOPNOTSUPP;
 
diff --git a/drivers/net/ethernet/netronome/nfp/flower/main.h b/drivers/net/ethernet/netronome/nfp/flower/main.h
index 90045ba..a91ac52 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/main.h
+++ b/drivers/net/ethernet/netronome/nfp/flower/main.h
@@ -226,7 +226,8 @@ void nfp_flower_metadata_cleanup(struct nfp_app *app);
 
 int nfp_flower_setup_tc(struct nfp_app *app, struct net_device *netdev,
enum tc_setup_type type, void *type_data);
-int nfp_flower_compile_flow_match(struct tc_cls_flower_offload *flow,
+int nfp_flower_compile_flow_match(struct nfp_app *app,
+ struct tc_cls_flower_offload *flow,
  struct nfp_fl_key_ls *key_ls,
  struct net_device *netdev,
  struct nfp_fl_payload *nfp_flow,
diff --git a/drivers/net/ethernet/netronome/nfp/flower/match.c b/drivers/net/ethernet/netronome/nfp/flower/match.c
index e54fb60..cdf7559 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/match.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/match.c
@@ -52,10 +52,13 @@ nfp_flower_compile_port(struct nfp_flower_in_port *frame, u32 cmsg_port,
return 0;
}
 
-   if (tun_type)
+   if (tun_type) {
frame->in_port = cpu_to_be32(NFP_FL_PORT_TYPE_TUN | tun_type);
-   else
+   } else {
+   if (!cmsg_port)
+   return -EOPNOTSUPP;
frame->in_port = cpu_to_be32(cmsg_port);
+   }
 
return 0;
 }
@@ -289,17 +292,21 @@ nfp_flower_compile_ipv4_udp_tun(struct nfp_flower_ipv4_udp_tun *frame,
}
 }
 
-int nfp_flower_compile_flow_match(struct tc_cls_flower_offload *flow,
+int nfp_flower_compile_flow_match(struct nfp_app *app,
+ struct tc_cls_flower_offload *flow,
  struct nfp_fl_key_ls *key_ls,
  struct net_device *netdev,
  struct nfp_fl_payload *nfp_flow,
  enum nfp_flower_tun_type tun_type)
 {
-   struct nfp_repr *netdev_repr;
+   u32 cmsg_port = 0;
int err;
u8 *ext;
u8 *msk;
 
+   if (nfp_netdev_is_nfp_repr(netdev))
+   cmsg_port = nfp_repr_get_port_id(netdev);
+
memset(nfp_flow->unmasked_data, 0, key_ls->key_size);
memset(nfp_flow->mask_data, 0, key_ls->key_size);
 
@@ -327,15 +334,13 @@ int nfp_flower_compile_flow_match(struct 
tc_cls_flower_of

[RFC net-next v2 3/8] nfp: flower: include geneve as supported offload tunnel type

2018-10-25 Thread John Hurley

Offload of geneve decap rules is supported in NFP. Include geneve in the
check for supported types.

Signed-off-by: John Hurley 
Reviewed-by: Jakub Kicinski 
---
 drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c b/drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c
index 8e5bec0..170f314 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c
@@ -190,6 +190,8 @@ static bool nfp_tun_is_netdev_to_offload(struct net_device *netdev)
return true;
if (netif_is_vxlan(netdev))
return true;
+   if (netif_is_geneve(netdev))
+   return true;
 
return false;
 }
-- 
2.7.4

[RFC net-next v2 0/8] indirect tc block cb registration

2018-10-25 Thread John Hurley

This patchset introduces an alternative to egdev offload by allowing a
driver to register for block updates when an external device (e.g. tunnel
netdev) is bound to a TC block. Drivers can track new netdevs or register
to existing ones to receive information on such events. Based on this,
they may register for block offload rules using already existing
functions.

The patchset also implements this new indirect block registration in the
NFP driver to allow the offloading of tunnel rules. The use of egdev
offload (which is currently only used for tunnel offload) is subsequently
removed.

John Hurley (8):
  net: sched: register callbacks for indirect tc block binds
  net: add netif_is_geneve()
  nfp: flower: include geneve as supported offload tunnel type
  nfp: flower: allow non repr netdev offload
  nfp: flower: add infastructure for indirect TC block register
  nfp: flower: offload tunnel decap rules via indirect TC blocks
  nfp: flower: remove TC egdev offloads
  nfp: flower: remove unnecessary code in flow lookup

 drivers/net/ethernet/netronome/nfp/flower/action.c |  29 +-
 drivers/net/ethernet/netronome/nfp/flower/cmsg.h   |  13 +
 drivers/net/ethernet/netronome/nfp/flower/main.c   |  25 +-
 drivers/net/ethernet/netronome/nfp/flower/main.h   |  17 +-
 drivers/net/ethernet/netronome/nfp/flower/match.c  |  38 +--
 .../net/ethernet/netronome/nfp/flower/metadata.c   |  12 +-
 .../net/ethernet/netronome/nfp/flower/offload.c| 246 +++--
 .../ethernet/netronome/nfp/flower/tunnel_conf.c|  11 +-
 include/net/geneve.h   |   6 +
 include/net/pkt_cls.h  |  56 
 include/net/sch_generic.h  |   3 +
 net/sched/cls_api.c| 299 -
 12 files changed, 609 insertions(+), 146 deletions(-)

-- 
2.7.4

[RFC net-next v2 1/8] net: sched: register callbacks for indirect tc block binds

2018-10-25 Thread John Hurley

Currently drivers can register to receive TC block bind/unbind callbacks
by implementing the setup_tc ndo in any of their given netdevs. However,
drivers may also be interested in binds to higher level devices (e.g.
tunnel drivers) to potentially offload filters applied to them.

Introduce indirect block devs which allow drivers to register callbacks
for block binds on other devices. The calling driver is expected to
reference an 'owner' struct that it will pass to all block registrations.
This is used to track the callbacks from a given driver and free them if
the driver is removed while the upper level device is still active.
Freeing a callback will also trigger an unbind event (if necessary) to
direct the driver to remove any offloaded rules and unreg any block filter
callbacks.

Allow registering an indirect block dev callback for a device that is
already bound to a block. In this case (if it is an ingress block),
register and also trigger the callback meaning that any already installed
rules can be replayed to the calling driver.

Signed-off-by: John Hurley 
Signed-off-by: Jakub Kicinski 
---
 include/net/pkt_cls.h |  56 +
 include/net/sch_generic.h |   3 +
 net/sched/cls_api.c   | 299 +-
 3 files changed, 357 insertions(+), 1 deletion(-)

diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 72ffb31..1b47837 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -37,6 +37,7 @@ struct tcf_block_ext_info {
 };
 
 struct tcf_block_cb;
+struct tcf_indr_block_owner;
 bool tcf_queue_work(struct rcu_work *rwork, work_func_t func);
 
 #ifdef CONFIG_NET_CLS
@@ -81,6 +82,20 @@ void __tcf_block_cb_unregister(struct tcf_block *block,
   struct tcf_block_cb *block_cb);
 void tcf_block_cb_unregister(struct tcf_block *block,
 tc_setup_cb_t *cb, void *cb_ident);
+int __tc_indr_block_cb_register(struct net_device *dev, void *cb_priv,
+   tc_indr_block_bind_cb_t *cb, void *cb_ident,
+   struct tcf_indr_block_owner *owner);
+int tc_indr_block_cb_register(struct net_device *dev, void *cb_priv,
+ tc_indr_block_bind_cb_t *cb, void *cb_ident,
+ struct tcf_indr_block_owner *owner);
+void __tc_indr_block_cb_unregister(struct net_device *dev,
+  tc_indr_block_bind_cb_t *cb, void *cb_ident);
+void tc_indr_block_cb_unregister(struct net_device *dev,
+tc_indr_block_bind_cb_t *cb,
+void *cb_ident);
+
+struct tcf_indr_block_owner *tc_indr_block_owner_create(void);
+void tc_indr_block_owner_clean(struct tcf_indr_block_owner *owner);
 
 int tcf_classify(struct sk_buff *skb, const struct tcf_proto *tp,
 struct tcf_result *res, bool compat_mode);
@@ -183,6 +198,47 @@ void tcf_block_cb_unregister(struct tcf_block *block,
 {
 }
 
+static inline
+int __tc_indr_block_cb_register(struct net_device *dev, void *cb_priv,
+   tc_indr_block_bind_cb_t *cb,
+   void *cb_ident,
+   struct tcf_indr_block_owner *owner)
+{
+   return 0;
+}
+
+static inline
+int tc_indr_block_cb_register(struct net_device *dev, void *cb_priv,
+ tc_indr_block_bind_cb_t *cb, void *cb_ident,
+ struct tcf_indr_block_owner *owner)
+{
+   return 0;
+}
+
+static inline
+void __tc_indr_block_cb_unregister(struct net_device *dev,
+  tc_indr_block_bind_cb_t *cb,
+  void *cb_ident)
+{
+}
+
+static inline
+void tc_indr_block_cb_unregister(struct net_device *dev,
+tc_indr_block_bind_cb_t *cb,
+void *cb_ident)
+{
+}
+
+static inline struct tcf_indr_block_owner *tc_indr_block_owner_create(void)
+{
+   /* NULL would mean an error, only CONFIG_NET_CLS can dereference this */
+   return (void *)1;
+}
+
+static inline void tc_indr_block_owner_clean(struct tcf_indr_block_owner *owner)
+{
+}
+
 static inline int tcf_classify(struct sk_buff *skb, const struct tcf_proto *tp,
   struct tcf_result *res, bool compat_mode)
 {
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 4d73642..8301581 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -24,6 +24,9 @@ struct bpf_flow_keys;
 typedef int tc_setup_cb_t(enum tc_setup_type type,
  void *type_data, void *cb_priv);
 
+typedef int tc_indr_block_bind_cb_t(struct net_device *dev, void *cb_priv,
+   enum tc_setup_type type, void *type_data);
+
 struct qdisc_rate_table {
struct tc_ratespec rate;
u32 data[256];
diff --git a/net/sched/cls_api.c b/net/sched/cls_ap

[PATCH net V2 1/1] net/smc: fix smc_buf_unuse to use the lgr pointer

2018-10-25 Thread Ursula Braun

From: Karsten Graul 

The pointer to the link group is unset in the smc connection structure
right before the call to smc_buf_unuse. Provide the lgr pointer to
smc_buf_unuse explicitly.
And move the call to smc_lgr_schedule_free_work to the end of
smc_conn_free.

Fixes: a6920d1d130c ("net/smc: handle unregistered buffers")
Signed-off-by: Karsten Graul 
Signed-off-by: Ursula Braun 
---
 net/smc/smc_core.c | 25 -
 1 file changed, 12 insertions(+), 13 deletions(-)

diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c
index e871368500e3..18daebcef181 100644
--- a/net/smc/smc_core.c
+++ b/net/smc/smc_core.c
@@ -122,22 +122,17 @@ static void __smc_lgr_unregister_conn(struct smc_connection *conn)
sock_put(&smc->sk); /* sock_hold in smc_lgr_register_conn() */
 }
 
-/* Unregister connection and trigger lgr freeing if applicable
+/* Unregister connection from lgr
  */
 static void smc_lgr_unregister_conn(struct smc_connection *conn)
 {
struct smc_link_group *lgr = conn->lgr;
-   int reduced = 0;
 
write_lock_bh(&lgr->conns_lock);
if (conn->alert_token_local) {
-   reduced = 1;
__smc_lgr_unregister_conn(conn);
}
write_unlock_bh(&lgr->conns_lock);
-   if (!reduced || lgr->conns_num)
-   return;
-   smc_lgr_schedule_free_work(lgr);
 }
 
 /* Send delete link, either as client to request the initiation
@@ -291,7 +286,8 @@ static int smc_lgr_create(struct smc_sock *smc, bool is_smcd,
return rc;
 }
 
-static void smc_buf_unuse(struct smc_connection *conn)
+static void smc_buf_unuse(struct smc_connection *conn,
+ struct smc_link_group *lgr)
 {
if (conn->sndbuf_desc)
conn->sndbuf_desc->used = 0;
@@ -301,8 +297,6 @@ static void smc_buf_unuse(struct smc_connection *conn)
conn->rmb_desc->used = 0;
} else {
/* buf registration failed, reuse not possible */
-   struct smc_link_group *lgr = conn->lgr;
-
write_lock_bh(&lgr->rmbs_lock);
list_del(&conn->rmb_desc->list);
write_unlock_bh(&lgr->rmbs_lock);
@@ -315,16 +309,21 @@ static void smc_buf_unuse(struct smc_connection *conn)
 /* remove a finished connection from its link group */
 void smc_conn_free(struct smc_connection *conn)
 {
-   if (!conn->lgr)
+   struct smc_link_group *lgr = conn->lgr;
+
+   if (!lgr)
return;
-   if (conn->lgr->is_smcd) {
+   if (lgr->is_smcd) {
smc_ism_unset_conn(conn);
tasklet_kill(&conn->rx_tsklet);
} else {
smc_cdc_tx_dismiss_slots(conn);
}
-   smc_lgr_unregister_conn(conn);
-   smc_buf_unuse(conn);
+   smc_lgr_unregister_conn(conn);  /* unsets conn->lgr */
+   smc_buf_unuse(conn, lgr);   /* allow buffer reuse */
+
+   if (!lgr->conns_num)
+   smc_lgr_schedule_free_work(lgr);
 }
 
 static void smc_link_clear(struct smc_link *lnk)
-- 
2.16.4
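
The ordering problem this patch addresses can be sketched in userspace: once the unregister step clears conn->lgr, any later step that still needs the link group has to be handed a pointer that was saved beforehand. The mock_* types and functions below are hypothetical stand-ins with locking omitted; only the idea of caching the pointer, passing it explicitly, and doing the "no connections left" check last comes from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct smc_link_group. */
struct mock_lgr {
	int conns_num;      /* connections still registered */
	int bufs_released;  /* how often buffers were handed back */
};

/* Hypothetical stand-in for struct smc_connection. */
struct mock_conn {
	struct mock_lgr *lgr;
};

/* Mirrors the unregister step: drops the count and unsets conn->lgr. */
static void mock_unregister(struct mock_conn *conn)
{
	assert(conn != NULL && conn->lgr != NULL);
	conn->lgr->conns_num--;
	conn->lgr = NULL;
}

/* Gets the link group explicitly, since conn->lgr is already NULL here. */
static void mock_buf_unuse(struct mock_conn *conn, struct mock_lgr *lgr)
{
	(void)conn;
	lgr->bufs_released++;
}

/* Returns true when the caller would schedule the free work. */
static bool mock_conn_free(struct mock_conn *conn)
{
	struct mock_lgr *lgr = conn->lgr;  /* cache before it is unset */

	if (!lgr)
		return false;
	mock_unregister(conn);
	mock_buf_unuse(conn, lgr);
	return lgr->conns_num == 0;
}
```

With two connections on one link group, freeing the first releases its buffers without requesting the free work; freeing the second does request it, and freeing an already freed connection is a no-op.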

[PATCH v3 2/2] net: qcom/emac: add phy-handle support for ACPI

2018-10-25 Thread Wang Dongsheng

Use "phy-handle" to point to an internal MDIO device port.

Signed-off-by: Wang Dongsheng 
---
 drivers/net/ethernet/qualcomm/emac/emac-phy.c | 115 +++---
 1 file changed, 100 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ethernet/qualcomm/emac/emac-phy.c b/drivers/net/ethernet/qualcomm/emac/emac-phy.c
index f2ed013ce5d5..3dc3ae55e5bb 100644
--- a/drivers/net/ethernet/qualcomm/emac/emac-phy.c
+++ b/drivers/net/ethernet/qualcomm/emac/emac-phy.c
@@ -96,6 +96,96 @@ static int emac_mdio_write(struct mii_bus *bus, int addr, int regnum, u16 val)
return 0;
 }
 
+static int acpi_device_match(struct device *dev, void *fwnode)
+{
+   return dev->fwnode == fwnode;
+}
+
+static struct phy_device *
+emac_acpi_get_phydev_from_phy_handle(struct platform_device *pdev)
+{
+   struct fwnode_reference_args args;
+   struct fwnode_handle *fw_node;
+   struct acpi_device *adev;
+   acpi_handle handle;
+   struct device *dev;
+   struct phy_device *phydev;
+   struct net_device *netdev;
+   struct emac_adapter *adpt;
+   int phy_addr;
+   int ret;
+
+   /* Get PHY Port reference from phy-handle */
+   fw_node = acpi_fwnode_handle(ACPI_COMPANION(&pdev->dev));
+   ret = acpi_node_get_property_reference(fw_node, "phy-handle", 0,
+  &args);
+   if (ACPI_FAILURE(ret) || !is_acpi_device_node(args.fwnode))
+   return ERR_PTR(-ENODEV);
+
+   /* Get PHY addr from the port node */
+   if (fwnode_property_read_u32(args.fwnode, "phy-channel", &phy_addr))
+   return ERR_PTR(-ENODEV);
+
+   /* Get the MDIO bus that included the port */
+   handle = ACPI_HANDLE_FWNODE(args.fwnode);
+   if (!handle || acpi_bus_get_device(handle, &adev))
+   return ERR_PTR(-ENODEV);
+
+   while (adev->parent) {
+   if (!strcmp(acpi_device_hid(adev), "QCOM8070"))
+   break;
+   adev = adev->parent;
+   }
+   if (!adev->parent)
+   return ERR_PTR(-ENODEV);
+
+   dev = bus_find_device(&platform_bus_type, NULL,
+ &adev->fwnode,
+ acpi_device_match);
+   if (!dev)
+   return ERR_PTR(-ENODEV);
+
+   netdev = dev_get_drvdata(dev);
+   if (!netdev)
+   return ERR_PTR(-EPROBE_DEFER);
+
+   adpt = netdev_priv(netdev);
+   if (!adpt->mii_bus)
+   return ERR_PTR(-EPROBE_DEFER);
+
+   phydev = mdiobus_get_phy(adpt->mii_bus, phy_addr);
+   return phydev ? phydev : ERR_PTR(-ENODEV);
+}
+
+static struct phy_device *
+emac_acpi_get_phydev(struct platform_device *pdev, struct emac_adapter *adpt)
+{
+   struct phy_device *phydev = NULL;
+   int phy_addr;
+   int ret;
+
+   /* Compatible with "phy-channel" */
+   ret = device_property_read_u32(&pdev->dev, "phy-channel",
+  &phy_addr);
+   if (!ret)
+   phydev = mdiobus_get_phy(adpt->mii_bus, phy_addr);
+   if (phydev)
+   return phydev;
+
+   /* Get PHY Port reference from phy-handle */
+   phydev = emac_acpi_get_phydev_from_phy_handle(pdev);
+   if (!IS_ERR(phydev))
+   return phydev;
+   if (PTR_ERR(phydev) == -EPROBE_DEFER)
+   return ERR_PTR(-EPROBE_DEFER);
+
+   /* If we can't read a valid phy address from "phy-channel"/"phy-handle",
+* then assume that there is only one phy on local mdio bus.
+*/
+   phydev = phy_find_first(adpt->mii_bus);
+   return phydev ? phydev : ERR_PTR(-ENODEV);
+}
+
 static int emac_mdio_bus_create(struct platform_device *pdev,
struct emac_adapter *adpt)
 {
@@ -128,13 +218,9 @@ static int emac_get_phydev(struct platform_device *pdev,
struct emac_adapter *adpt)
 {
struct device_node *np = pdev->dev.of_node;
-   struct mii_bus *bus = adpt->mii_bus;
struct device_node *phy_np;
struct phy_device *phydev;
 
-   u32 phy_addr;
-   int ret;
-
if (!has_acpi_companion(&pdev->dev)) {
phy_np = of_parse_phandle(np, "phy-handle", 0);
adpt->phydev = of_phy_find_device(phy_np);
@@ -142,14 +228,9 @@ static int emac_get_phydev(struct platform_device *pdev,
return adpt->phydev ? 0 : -ENODEV;
}
 
-   ret = device_property_read_u32(&pdev->dev, "phy-channel",
-  &phy_addr);
-   /* If we can't read a valid phy address, then assume
-* that there is only one phy on this mdio bus.
-*/
-   phydev = ret ? phy_find_first(bus) : mdiobus_get_phy(bus, phy_addr);
-   if (!phydev)
-   return -ENODEV;
+   phydev = emac_acpi_get_phydev(pdev, adpt);
+   if (IS_ERR(phydev))
+   return PTR_ERR(phydev);
 
/* of_phy_find_device() claims a reference

[PATCH v3 0/2] net: qcom/emac: add shared mdio bus support

2018-10-25 Thread Wang Dongsheng

The emac includes an MDIO controller, and the motherboard has more than one
PHY connected to a single MDIO bus. So share that mii_bus with the other MAC
devices that have no MDIO bus connected.

With ACPI, "phy-handle" cannot point directly to a _DSD sub-package, so we
use "phy-handle" to point to an internal MDIO device port instead.
The port describes the phy address.

Tested: QDF2400 (ACPI), buildin/insmod/rmmod

V3:
 - Add "phy-handle" support.
 - Remove all of DT changes.

V2:
 - Separate patch.

Wang Dongsheng (2):
  net: qcom/emac: split phy_config to mdio bus create and get phy device
  net: qcom/emac: add phy-handle support for ACPI

 drivers/net/ethernet/qualcomm/emac/emac-phy.c | 183 ++
 1 file changed, 142 insertions(+), 41 deletions(-)

-- 
2.18.0

[PATCH v3 1/2] net: qcom/emac: split phy_config to mdio bus create and get phy device

2018-10-25 Thread Wang Dongsheng

This patch separates emac_mdio_bus_create and emac_get_phydev from
emac_phy_config, and does some code cleanup.

Signed-off-by: Wang Dongsheng 
---
 drivers/net/ethernet/qualcomm/emac/emac-phy.c | 96 +++
 1 file changed, 56 insertions(+), 40 deletions(-)

diff --git a/drivers/net/ethernet/qualcomm/emac/emac-phy.c b/drivers/net/ethernet/qualcomm/emac/emac-phy.c
index 53dbf1e163a8..f2ed013ce5d5 100644
--- a/drivers/net/ethernet/qualcomm/emac/emac-phy.c
+++ b/drivers/net/ethernet/qualcomm/emac/emac-phy.c
@@ -96,15 +96,15 @@ static int emac_mdio_write(struct mii_bus *bus, int addr, int regnum, u16 val)
return 0;
 }
 
-/* Configure the MDIO bus and connect the external PHY */
-int emac_phy_config(struct platform_device *pdev, struct emac_adapter *adpt)
+static int emac_mdio_bus_create(struct platform_device *pdev,
+   struct emac_adapter *adpt)
 {
struct device_node *np = pdev->dev.of_node;
struct mii_bus *mii_bus;
int ret;
 
/* Create the mii_bus object for talking to the MDIO bus */
-   adpt->mii_bus = mii_bus = devm_mdiobus_alloc(&pdev->dev);
+   mii_bus = devm_mdiobus_alloc(&pdev->dev);
if (!mii_bus)
return -ENOMEM;
 
@@ -115,50 +115,66 @@ int emac_phy_config(struct platform_device *pdev, struct emac_adapter *adpt)
mii_bus->parent = &pdev->dev;
mii_bus->priv = adpt;
 
-   if (has_acpi_companion(&pdev->dev)) {
-   u32 phy_addr;
-
-   ret = mdiobus_register(mii_bus);
-   if (ret) {
-   dev_err(&pdev->dev, "could not register mdio bus\n");
-   return ret;
-   }
-   ret = device_property_read_u32(&pdev->dev, "phy-channel",
-  &phy_addr);
-   if (ret)
-   /* If we can't read a valid phy address, then assume
-* that there is only one phy on this mdio bus.
-*/
-   adpt->phydev = phy_find_first(mii_bus);
-   else
-   adpt->phydev = mdiobus_get_phy(mii_bus, phy_addr);
-
-   /* of_phy_find_device() claims a reference to the phydev,
-* so we do that here manually as well. When the driver
-* later unloads, it can unilaterally drop the reference
-* without worrying about ACPI vs DT.
-*/
-   if (adpt->phydev)
-   get_device(&adpt->phydev->mdio.dev);
-   } else {
-   struct device_node *phy_np;
-
-   ret = of_mdiobus_register(mii_bus, np);
-   if (ret) {
-   dev_err(&pdev->dev, "could not register mdio bus\n");
-   return ret;
-   }
+   ret = of_mdiobus_register(mii_bus, has_acpi_companion(&pdev->dev) ?
+ NULL : np);
+   if (ret)
+   dev_err(&pdev->dev, "Could not register mdio bus\n");
+
+   adpt->mii_bus = ret ? NULL : mii_bus;
+   return ret;
+}
+
+static int emac_get_phydev(struct platform_device *pdev,
+  struct emac_adapter *adpt)
+{
+   struct device_node *np = pdev->dev.of_node;
+   struct mii_bus *bus = adpt->mii_bus;
+   struct device_node *phy_np;
+   struct phy_device *phydev;
 
+   u32 phy_addr;
+   int ret;
+
+   if (!has_acpi_companion(&pdev->dev)) {
phy_np = of_parse_phandle(np, "phy-handle", 0);
adpt->phydev = of_phy_find_device(phy_np);
of_node_put(phy_np);
+   return adpt->phydev ? 0 : -ENODEV;
}
 
-   if (!adpt->phydev) {
-   dev_err(&pdev->dev, "could not find external phy\n");
-   mdiobus_unregister(mii_bus);
+   ret = device_property_read_u32(&pdev->dev, "phy-channel",
+  &phy_addr);
+   /* If we can't read a valid phy address, then assume
+* that there is only one phy on this mdio bus.
+*/
+   phydev = ret ? phy_find_first(bus) : mdiobus_get_phy(bus, phy_addr);
+   if (!phydev)
return -ENODEV;
-   }
 
+   /* of_phy_find_device() claims a reference to the phydev,
+* so we do that here manually as well. When the driver
+* later unloads, it can unilaterally drop the reference
+* without worrying about ACPI vs DT.
+*/
+   get_device(&phydev->mdio.dev);
+   adpt->phydev = phydev;
return 0;
 }
+
+/* Configure the MDIO bus and connect the external PHY */
+int emac_phy_config(struct platform_device *pdev, struct emac_adapter *adpt)
+{
+   int ret;
+
+   ret = emac_mdio_bus_create(pdev, adpt);
+   if (ret)
+   return ret;
+
+   ret = emac_get_phydev(pdev, adpt);
+   if (ret) {
+   dev_err(&pdev->dev, "Could not find externa

Re: Regression in 4.19 net/phy/realtek: garbled sysfs output

2018-10-25 Thread Holger Hoffstätte


On 10/24/18 22:12, Andrew Lunn wrote:

On Wed, Oct 24, 2018 at 09:36:02PM +0200, Holger Hoffstätte wrote:

Hi,

Since 4.19 r8169 depends on phylib:

$lsmod | grep r8169
r8169  81920  0
libphy 57344  2 r8169,realtek

Unfortunately this now gives me the following sysfs error:

$cd /sys/module/realtek/drivers
$ls -l
ls: cannot access 'mdio_bus:RTL8201F 10/100Mbps Ethernet': No such file or directory
total 0
lrwxrwxrwx 1 root root 0 Oct 24 21:09 'mdio_bus:RTL8201CP Ethernet' -> '../../../bus/mdio_bus/drivers/RTL8201CP Ethernet'
l? ? ???? 'mdio_bus:RTL8201F 10/100Mbps Ethernet'
lrwxrwxrwx 1 root root 0 Oct 24 21:09 'mdio_bus:RTL8211 Gigabit Ethernet' -> '../../../bus/mdio_bus/drivers/RTL8211 Gigabit Ethernet'
[..]

Apparently the forward slash in "10/100Mbps Ethernet" is interpreted as a
directory separator that leads nowhere; the name was introduced in commit
513588dd44b ("net: phy: realtek: add RTL8201F phy-id and functions").

Would it be acceptable to change the name simply to "RTL8201F Ethernet"?


Hi Holger

Or use "RTL8201F Fast Ethernet"


Yes, even better since it's correct. :)
As expected, changing the .name entry fixes the sysfs behaviour.

diff --git a/drivers/net/phy/realtek.c b/drivers/net/phy/realtek.c
index 7fc8508b5..271e8adc3 100644
--- a/drivers/net/phy/realtek.c
+++ b/drivers/net/phy/realtek.c
@@ -220,7 +220,7 @@ static struct phy_driver realtek_drvs[] = {
.flags  = PHY_HAS_INTERRUPT,
}, {
.phy_id = 0x001cc816,
-   .name   = "RTL8201F 10/100Mbps Ethernet",
+   .name   = "RTL8201F Fast Ethernet",
.phy_id_mask= 0x001f,
.features   = PHY_BASIC_FEATURES,
.flags  = PHY_HAS_INTERRUPT,


I wonder if other drivers have similar problems?

davicom.c:      .name = "Davicom DM9161B/C",
intel-xway.c:   .name = "Intel XWAY PHY11G (PEF 7071/PEF 7072) v1.3",
intel-xway.c:   .name = "Intel XWAY PHY11G (PEF 7071/PEF 7072) v1.4",
intel-xway.c:   .name = "Intel XWAY PHY11G (PEF 7071/PEF 7072) v1.5 / v1.6",
intel-xway.c:   .name = "Intel XWAY PHY22F (PEF 7061) v1.5 / v1.6",
smsc.c:         .name = "SMSC LAN8710/LAN8720",


I'm open to suggestions about how to rename those identifiers.
"|" seems to work but IMHO looks a bit weird:

"Davicom DM9161B/C" -> "Davicom DM9161B|C"
"(PEF 7071/PEF 7072) v1.5 / v1.6" -> "(PEF 7071|7072) v1.5|6"

We can go full regex, which will probably get me voted off the island:
"(PEF 7071/PEF 7072) v1.5 / v1.6" -> "(PEF {7071,7072}) v1.{5,6}"

Cast your votes now!

cheers,
Holger
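
A quick way to audit the remaining names is to check each candidate string for a '/' before it is used as a single sysfs path component. The snippet below is only a userspace sketch of that check; name_is_sysfs_safe() is a hypothetical helper, not something phylib provides, and it only covers the slash problem reported in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Returns true if name could serve as one path component: it must be
 * non-empty and must not contain the directory separator '/'.
 */
static bool name_is_sysfs_safe(const char *name)
{
	assert(name != NULL);
	return name[0] != '\0' && strchr(name, '/') == NULL;
}
```

Run over the names quoted above, this rejects "RTL8201F 10/100Mbps Ethernet" and "SMSC LAN8710/LAN8720", and accepts "RTL8201F Fast Ethernet" as well as the "|" variant "Davicom DM9161B|C".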

Re: [PATCH bpf-next 6/6] selftests/bpf: test_verifier, check bpf_map_lookup_elem access in bpf prog

2018-10-25 Thread Prashant Bhole





On 10/25/2018 5:54 PM, Naresh Kamboju wrote:

On Tue, 9 Oct 2018 at 12:32, Song Liu  wrote:


On Mon, Oct 8, 2018 at 6:07 PM Prashant Bhole
 wrote:


map_lookup_elem isn't supported by certain map types like:
- BPF_MAP_TYPE_PROG_ARRAY
- BPF_MAP_TYPE_STACK_TRACE
- BPF_MAP_TYPE_XSKMAP
- BPF_MAP_TYPE_SOCKMAP/BPF_MAP_TYPE_SOCKHASH
Let's add verifier tests to check whether the verifier prevents
bpf_map_lookup_elem calls on the above map types from a bpf program.

Signed-off-by: Prashant Bhole 
Acked-by: Alexei Starovoitov 

Acked-by: Song Liu 


---
  tools/testing/selftests/bpf/test_verifier.c | 121 +++-
  1 file changed, 120 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
index 65ae44c85d27..cf4cd32b6772 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -48,7 +48,7 @@

  #define MAX_INSNS  BPF_MAXINSNS
  #define MAX_FIXUPS 8
-#define MAX_NR_MAPS8
+#define MAX_NR_MAPS13
  #define POINTER_VALUE  0xcafe4all
  #define TEST_DATA_LEN  64

@@ -65,6 +65,10 @@ struct bpf_test {
 int fixup_map_hash_48b[MAX_FIXUPS];
 int fixup_map_hash_16b[MAX_FIXUPS];
 int fixup_map_array_48b[MAX_FIXUPS];
+   int fixup_map_sockmap[MAX_FIXUPS];
+   int fixup_map_sockhash[MAX_FIXUPS];
+   int fixup_map_xskmap[MAX_FIXUPS];
+   int fixup_map_stacktrace[MAX_FIXUPS];
 int fixup_prog1[MAX_FIXUPS];
 int fixup_prog2[MAX_FIXUPS];
 int fixup_map_in_map[MAX_FIXUPS];
@@ -4541,6 +4545,85 @@ static struct bpf_test tests[] = {
 .errstr = "invalid access to packet",
 .prog_type = BPF_PROG_TYPE_SCHED_CLS,
 },
+   {
+   "prevent map lookup in sockmap",
+   .insns = {
+   BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+   BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+   BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+   BPF_LD_MAP_FD(BPF_REG_1, 0),
+   BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+BPF_FUNC_map_lookup_elem),
+   BPF_EXIT_INSN(),
+   },
+   .fixup_map_sockmap = { 3 },
+   .result = REJECT,
+   .errstr = "cannot pass map_type 15 into func 
bpf_map_lookup_elem",
+   .prog_type = BPF_PROG_TYPE_SOCK_OPS,
+   },
+   {
+   "prevent map lookup in sockhash",
+   .insns = {
+   BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+   BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+   BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+   BPF_LD_MAP_FD(BPF_REG_1, 0),
+   BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+BPF_FUNC_map_lookup_elem),
+   BPF_EXIT_INSN(),
+   },
+   .fixup_map_sockhash = { 3 },
+   .result = REJECT,
+   .errstr = "cannot pass map_type 18 into func 
bpf_map_lookup_elem",
+   .prog_type = BPF_PROG_TYPE_SOCK_OPS,
+   },
+   {
+   "prevent map lookup in xskmap",
+   .insns = {
+   BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+   BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+   BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+   BPF_LD_MAP_FD(BPF_REG_1, 0),
+   BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+BPF_FUNC_map_lookup_elem),
+   BPF_EXIT_INSN(),
+   },
+   .fixup_map_xskmap = { 3 },
+   .result = REJECT,
+   .errstr = "cannot pass map_type 17 into func 
bpf_map_lookup_elem",
+   .prog_type = BPF_PROG_TYPE_XDP,
+   },
+   {
+   "prevent map lookup in stack trace",
+   .insns = {
+   BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+   BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+   BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+   BPF_LD_MAP_FD(BPF_REG_1, 0),
+   BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+BPF_FUNC_map_lookup_elem),
+   BPF_EXIT_INSN(),
+   },
+   .fixup_map_stacktrace = { 3 },
+   .result = REJECT,
+   .errstr = "cannot pass map_type 7 into func 
bpf_map_lookup_elem",
+   .prog_type = BPF_PROG_TYPE_PERF_EVENT,
+   },
+   {
+   "prevent map lookup in prog array",
+   .insns = {
+   BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+   BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+   BPF_ALU64_IMM(BPF_ADD, BPF

Re: [PATCH bpf-next 6/6] selftests/bpf: test_verifier, check bpf_map_lookup_elem access in bpf prog

2018-10-25 Thread Naresh Kamboju

On Tue, 9 Oct 2018 at 12:32, Song Liu  wrote:
>
> On Mon, Oct 8, 2018 at 6:07 PM Prashant Bhole
>  wrote:
> >
> > map_lookup_elem isn't supported by certain map types like:
> > - BPF_MAP_TYPE_PROG_ARRAY
> > - BPF_MAP_TYPE_STACK_TRACE
> > - BPF_MAP_TYPE_XSKMAP
> > - BPF_MAP_TYPE_SOCKMAP/BPF_MAP_TYPE_SOCKHASH
> > Let's add verifier tests to check whether the verifier prevents
> > bpf_map_lookup_elem calls on the above map types from a bpf program.
> >
> > Signed-off-by: Prashant Bhole 
> > Acked-by: Alexei Starovoitov 
> Acked-by: Song Liu 
>
> > ---
> >  tools/testing/selftests/bpf/test_verifier.c | 121 +++-
> >  1 file changed, 120 insertions(+), 1 deletion(-)
> >
> > diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
> > index 65ae44c85d27..cf4cd32b6772 100644
> > --- a/tools/testing/selftests/bpf/test_verifier.c
> > +++ b/tools/testing/selftests/bpf/test_verifier.c
> > @@ -48,7 +48,7 @@
> >
> >  #define MAX_INSNS  BPF_MAXINSNS
> >  #define MAX_FIXUPS 8
> > -#define MAX_NR_MAPS8
> > +#define MAX_NR_MAPS13
> >  #define POINTER_VALUE  0xcafe4all
> >  #define TEST_DATA_LEN  64
> >
> > @@ -65,6 +65,10 @@ struct bpf_test {
> > int fixup_map_hash_48b[MAX_FIXUPS];
> > int fixup_map_hash_16b[MAX_FIXUPS];
> > int fixup_map_array_48b[MAX_FIXUPS];
> > +   int fixup_map_sockmap[MAX_FIXUPS];
> > +   int fixup_map_sockhash[MAX_FIXUPS];
> > +   int fixup_map_xskmap[MAX_FIXUPS];
> > +   int fixup_map_stacktrace[MAX_FIXUPS];
> > int fixup_prog1[MAX_FIXUPS];
> > int fixup_prog2[MAX_FIXUPS];
> > int fixup_map_in_map[MAX_FIXUPS];
> > @@ -4541,6 +4545,85 @@ static struct bpf_test tests[] = {
> > .errstr = "invalid access to packet",
> > .prog_type = BPF_PROG_TYPE_SCHED_CLS,
> > },
> > +   {
> > +   "prevent map lookup in sockmap",
> > +   .insns = {
> > +   BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
> > +   BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
> > +   BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
> > +   BPF_LD_MAP_FD(BPF_REG_1, 0),
> > +   BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
> > +BPF_FUNC_map_lookup_elem),
> > +   BPF_EXIT_INSN(),
> > +   },
> > +   .fixup_map_sockmap = { 3 },
> > +   .result = REJECT,
> > +   .errstr = "cannot pass map_type 15 into func bpf_map_lookup_elem",
> > +   .prog_type = BPF_PROG_TYPE_SOCK_OPS,
> > +   },
> > +   {
> > +   "prevent map lookup in sockhash",
> > +   .insns = {
> > +   BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
> > +   BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
> > +   BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
> > +   BPF_LD_MAP_FD(BPF_REG_1, 0),
> > +   BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
> > +BPF_FUNC_map_lookup_elem),
> > +   BPF_EXIT_INSN(),
> > +   },
> > +   .fixup_map_sockhash = { 3 },
> > +   .result = REJECT,
> > +   .errstr = "cannot pass map_type 18 into func bpf_map_lookup_elem",
> > +   .prog_type = BPF_PROG_TYPE_SOCK_OPS,
> > +   },
> > +   {
> > +   "prevent map lookup in xskmap",
> > +   .insns = {
> > +   BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
> > +   BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
> > +   BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
> > +   BPF_LD_MAP_FD(BPF_REG_1, 0),
> > +   BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
> > +BPF_FUNC_map_lookup_elem),
> > +   BPF_EXIT_INSN(),
> > +   },
> > +   .fixup_map_xskmap = { 3 },
> > +   .result = REJECT,
> > +   .errstr = "cannot pass map_type 17 into func bpf_map_lookup_elem",
> > +   .prog_type = BPF_PROG_TYPE_XDP,
> > +   },
> > +   {
> > +   "prevent map lookup in stack trace",
> > +   .insns = {
> > +   BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
> > +   BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
> > +   BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
> > +   BPF_LD_MAP_FD(BPF_REG_1, 0),
> > +   BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
> > +BPF_FUNC_map_lookup_elem),
> > +   BPF_EXIT_INSN(),
> > +   },
> > +   .fixup_map_stacktrace = { 3 },
> > +   .result = REJECT,
> > +   .errstr = "cannot pass map_

RE: [PATCH v3 0/2] net: qcom/emac: add shared mdio bus support

2018-10-25 Thread Wang, Dongsheng

Sorry, please ignore this patch.

Cheers,
-Dongsheng


> -Original Message-
> From: Wang, Dongsheng
> Sent: Thursday, October 25, 2018 4:15 PM
> To: ti...@kernel.org; and...@lunn.ch
> Cc: netdev@vger.kernel.org; Wang, Dongsheng; Zheng, Joey;
> f.faine...@gmail.com
> Subject: [PATCH v3 0/2] net: qcom/emac: add shared mdio bus support
> 
> The EMAC includes an MDIO controller, and the motherboard has more than one
> PHY connected to a single MDIO bus. So share the mii_bus with the other MAC
> devices that have no MDIO bus connected.
> 
> Under ACPI, "phy-handle" cannot point directly to a _DSD sub-package,
> so we use "phy-handle" to point to an internal MDIO device port.
> The port describes the PHY address.
> 
> Tested: QDF2400 (ACPI), built-in/insmod/rmmod
> 
> V3:
>  - Add "phy-handle" support.
>  - Remove all of DT changes.
> 
> V2:
>  - Separate patch.
> 
> Wang Dongsheng (2):
>   net: qcom/emac: split phy_config to mdio bus create and get phy device
>   net: qcom/emac: add phy-handle support for ACPI
> 
>  drivers/net/ethernet/qualcomm/emac/emac-phy.c | 183 ++
>  1 file changed, 142 insertions(+), 41 deletions(-)
> 
> --
> 2.18.0

[PATCH v3 0/2] net: qcom/emac: add shared mdio bus support

2018-10-25 Thread Wang Dongsheng

The EMAC includes an MDIO controller, and the motherboard has more than one
PHY connected to a single MDIO bus. So share the mii_bus with the other MAC
devices that have no MDIO bus connected.

Under ACPI, "phy-handle" cannot point directly to a _DSD sub-package,
so we use "phy-handle" to point to an internal MDIO device port.
The port describes the PHY address.

Tested: QDF2400 (ACPI), built-in/insmod/rmmod

V3:
 - Add "phy-handle" support.
 - Remove all of DT changes.

V2:
 - Separate patch.

Wang Dongsheng (2):
  net: qcom/emac: split phy_config to mdio bus create and get phy device
  net: qcom/emac: add phy-handle support for ACPI

 drivers/net/ethernet/qualcomm/emac/emac-phy.c | 183 ++
 1 file changed, 142 insertions(+), 41 deletions(-)

-- 
2.18.0
