date:20171228

Re: [RFT net-next v3 0/5] dwmac-meson8b: RGMII clock fixes for Meson8b

2017-12-28 Thread Martin Blumenstingl

On Fri, Dec 29, 2017 at 8:48 AM, Martin Blumenstingl
 wrote:
> Hi Emiliano,
>
> On Fri, Dec 29, 2017 at 2:31 AM, Emiliano Ingrassia
>  wrote:
>> Hi Martin, Hi Dave,
>>
>> On Thu, Dec 28, 2017 at 11:21:23PM +0100, Martin Blumenstingl wrote:
>>> Hi Dave,
>>>
>>> please do not apply this series until it got a Tested-by from Emiliano.
>>>
>>>
>>> Hi Emiliano,
>>>
>>> you reported [0] that you couldn't get dwmac-meson8b to work on your
>>> Odroid-C1. With your findings (register dumps, clk_summary output, etc.)
>>> I think I was able to find a fix: it consists of two patches (which you
>>> find in this series)
>>>
>>> Unfortunately I don't have any Meson8b boards with RGMII PHY so I could
>>> only partially test this (I could only check if the clocks were
>>> calculated correctly when using a dummy 52394Hz input clock instead
>>> of MPLL2).
>>>
>>> Could you please give this series a try and let me know about the
>>> results?
>>> You obviously still need your two "ARM: dts: meson8b" patches which
>>> - add the "amlogic,meson8b-dwmac" compatible to meson8b.dtsi
>>> - enable Ethernet on the Odroid-C1
>>>
>>> When testing on Meson8b this also needs a fix for the MPLL clock driver:
>>> "clk: meson: mpll: use 64-bit maths in params_from_rate", see:
>>> https://patchwork.kernel.org/patch/10131677/
>>>
>>>
>>> I have tested this myself on a Khadas VIM (GXL SoC, internal RMII PHY)
>>> and a Khadas VIM2 (GXM SoC, external RGMII PHY). Both are still working
>>> fine (so let's hope that this also fixes your Meson8b issue :)).
>>>
>>>
>>> changes since v1 at [1]:
>>> - changed the subject of the cover-letter to indicate that this is all
>>>   about the RGMII clock
>>> - added PATCH #1 which ensures that we don't unnecessarily change the
>>>   parent clocks in RMII mode (and also makes the code easier to
>>>   understand)
>>> - changed subject of PATCH #2 (formerly PATCH #1) to state that this
>>>   is about the RGMII clock
>>> - added Jerome's Reviewed-by to PATCH #2 (formerly PATCH #1)
>>> - replaced PATCH #3 (formerly PATCH #2) with one that sets
>>>   CLK_SET_RATE_PARENT on the mux and thus re-configures the MPLL2 clock
>>>   on Meson8b correctly
>>>
>>> changes since v2 at [2]:
>>> - added PATCH #2 to make the following patch easier
>>> - Emiliano reported that there's currently another bug in the
>>>   dwmac-meson8b driver which prevents it from working with RGMII PHYs on
>>>   Meson8b: bit 10 of the PRG_ETH0 register configures a clock gate
>>>   (instead of a divide by 5 or divide by 10 clock divider). This has not
>>>   been visible on GXBB and later due to the input clock which always led
>>>   to a selection of "divide by 10" (which is done internally in the IP
>>>   block, but the bit actually means "enable RGMII clock output").
>>>   PATCH #3 was added to address this issue.
>>> - the commit message of PATCH #4 and #5 (formerly PATCH #2 and #3) were
>>>   updated and the patch itself rebased because the m25_div clock was
>>>   removed with the new PATCH #3 (so some of the statements were not
>>>   valid anymore)
>>>
>>
>> Here is the clk_summary relative to ethernet on Odroid-C1+
>> with this new series applied:
>>
>> xtal112400  0 0
>>  sys_pll00  12  0 0
>>   cpu_clk   00  12  0 0
>>  vid_pll00   73200  0 0
>>  fixed_pll  22  255000  0 0
>>   mpll2 11   24701  0 0
>>c941.ethernet#m250_sel   11   24701  0 0
>> c941.ethernet#m250_div  11   24701  0 0
>>  c941.ethernet#fixed_div10  112470  0 0
>>   c941.ethernet#m25_en  112470  0 0
>>
>> The ethernet prg0 register is set to 0x74A1 which should be correct with
>> respect to the information contained in the S805 SoC manual.
>> Actually, the ethernet is not yet fully functional.
>> Trying to ping the board, I can see ARP request from host to board using
>> tcpdump. However, the host can't see any response.
> great - we're getting closer!
>
>> Following the U-Boot value for prg0 register, which is 0x7d21, I also
>> tried to set bit 11. As expected, this did not have any influence.
> it *may* be something outside the PRG_ETH0 register then.
> to confirm that: could you temporarily revert the last patch from this
> series ("net: stmmac: dwmac-meson8b: propagate rate changes to the
> parent clock")? this way MPLL2 will stay at ~500MHz and PRG_ETH0
> should be identical to what u-boot sets (apart from bit 11, but that
> is only relevant in RMII mode according to the datasheet)
>
>> Another thing that we should check is the "Ethernet Memory PD"

Re: [RFT net-next v3 0/5] dwmac-meson8b: RGMII clock fixes for Meson8b

2017-12-28 Thread Martin Blumenstingl

Hi Emiliano,

On Fri, Dec 29, 2017 at 2:31 AM, Emiliano Ingrassia
 wrote:
> Hi Martin, Hi Dave,
>
> On Thu, Dec 28, 2017 at 11:21:23PM +0100, Martin Blumenstingl wrote:
>> Hi Dave,
>>
>> please do not apply this series until it got a Tested-by from Emiliano.
>>
>>
>> Hi Emiliano,
>>
>> you reported [0] that you couldn't get dwmac-meson8b to work on your
>> Odroid-C1. With your findings (register dumps, clk_summary output, etc.)
>> I think I was able to find a fix: it consists of two patches (which you
>> find in this series)
>>
>> Unfortunately I don't have any Meson8b boards with RGMII PHY so I could
>> only partially test this (I could only check if the clocks were
>> calculated correctly when using a dummy 52394Hz input clock instead
>> of MPLL2).
>>
>> Could you please give this series a try and let me know about the
>> results?
>> You obviously still need your two "ARM: dts: meson8b" patches which
>> - add the "amlogic,meson8b-dwmac" compatible to meson8b.dtsi
>> - enable Ethernet on the Odroid-C1
>>
>> When testing on Meson8b this also needs a fix for the MPLL clock driver:
>> "clk: meson: mpll: use 64-bit maths in params_from_rate", see:
>> https://patchwork.kernel.org/patch/10131677/
>>
>>
>> I have tested this myself on a Khadas VIM (GXL SoC, internal RMII PHY)
>> and a Khadas VIM2 (GXM SoC, external RGMII PHY). Both are still working
>> fine (so let's hope that this also fixes your Meson8b issue :)).
>>
>>
>> changes since v1 at [1]:
>> - changed the subject of the cover-letter to indicate that this is all
>>   about the RGMII clock
>> - added PATCH #1 which ensures that we don't unnecessarily change the
>>   parent clocks in RMII mode (and also makes the code easier to
>>   understand)
>> - changed subject of PATCH #2 (formerly PATCH #1) to state that this
>>   is about the RGMII clock
>> - added Jerome's Reviewed-by to PATCH #2 (formerly PATCH #1)
>> - replaced PATCH #3 (formerly PATCH #2) with one that sets
>>   CLK_SET_RATE_PARENT on the mux and thus re-configures the MPLL2 clock
>>   on Meson8b correctly
>>
>> changes since v2 at [2]:
>> - added PATCH #2 to make the following patch easier
>> - Emiliano reported that there's currently another bug in the
>>   dwmac-meson8b driver which prevents it from working with RGMII PHYs on
>>   Meson8b: bit 10 of the PRG_ETH0 register configures a clock gate
>>   (instead of a divide by 5 or divide by 10 clock divider). This has not
>>   been visible on GXBB and later due to the input clock which always led
>>   to a selection of "divide by 10" (which is done internally in the IP
>>   block, but the bit actually means "enable RGMII clock output").
>>   PATCH #3 was added to address this issue.
>> - the commit message of PATCH #4 and #5 (formerly PATCH #2 and #3) were
>>   updated and the patch itself rebased because the m25_div clock was
>>   removed with the new PATCH #3 (so some of the statements were not
>>   valid anymore)
>>
>
> Here is the clk_summary relative to ethernet on Odroid-C1+
> with this new series applied:
>
> xtal112400  0 0
>  sys_pll00  12  0 0
>   cpu_clk   00  12  0 0
>  vid_pll00   73200  0 0
>  fixed_pll  22  255000  0 0
>   mpll2 11   24701  0 0
>c941.ethernet#m250_sel   11   24701  0 0
> c941.ethernet#m250_div  11   24701  0 0
>  c941.ethernet#fixed_div10  112470  0 0
>   c941.ethernet#m25_en  112470  0 0
>
> The ethernet prg0 register is set to 0x74A1 which should be correct with
> respect to the information contained in the S805 SoC manual.
> Actually, the ethernet is not yet fully functional.
> Trying to ping the board, I can see ARP request from host to board using
> tcpdump. However, the host can't see any response.
great - we're getting closer!

> Following the U-Boot value for prg0 register, which is 0x7d21, I also
> tried to set bit 11. As expected, this did not have any influence.
it *may* be something outside the PRG_ETH0 register then.
to confirm that: could you temporarily revert the last patch from this
series ("net: stmmac: dwmac-meson8b: propagate rate changes to the
parent clock")? this way MPLL2 will stay at ~500MHz and PRG_ETH0
should be identical to what u-boot sets (apart from bit 11, but that
is only relevant in RMII mode according to the datasheet)
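
The two PRG_ETH0 values quoted in this thread (0x74A1 from the kernel, 0x7d21 from U-Boot) can be compared bit by bit. The following is a minimal sketch, not driver code: the helper names reg_diff_mask() and reg_field() are hypothetical, and treating bits 9:7 as one 3-bit field is an assumption, since the thread itself only names bits 10 and 11.

```c
#include <stdint.h>

/* Bit mask of the positions where two register values differ. */
static uint32_t reg_diff_mask(uint32_t a, uint32_t b)
{
	return a ^ b;
}

/*
 * Extract `width` bits starting at bit `shift`.
 * Assumes width is between 1 and 31.
 */
static uint32_t reg_field(uint32_t val, unsigned int shift, unsigned int width)
{
	return (val >> shift) & ((1u << width) - 1u);
}
```

reg_diff_mask(0x74A1, 0x7d21) yields 0x0980, so the two values differ only in bits 7, 8 and 11. Under the 3-bit-field assumption, reg_field(value, 7, 3) gives 1 for the kernel value and 2 for the U-Boot value, and bit 11 is the only remaining difference.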

> Another thing that we should check is the "Ethernet Memory PD" (see S805
> manual - sec. 5.4) register which bits 3-2 enable/disable ethernet
> normal operation. However, those bits are already cleared by U-Boot.
if the peripheral registers itself are configured

Re: [PATCH net 3/3] net: ena: invoke netif_carrier_off() only after netdev registered

2017-12-28 Thread Jakub Kicinski

On Thu, 28 Dec 2017 21:30:20 +, neta...@amazon.com wrote:
> From: Netanel Belgazal 
> 
> netif_carrier_off() should be called only after register netdev.
> Move the function's call after the registration.

By "should" you mean in your driver, right?  I think calling
netif_carrier_off() on an unregistered netdev is a pretty standard
thing to do for drivers which manage carrier state.

> Signed-off-by: Netanel Belgazal 
> ---
>  drivers/net/ethernet/amazon/ena/ena_netdev.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c
> index fbe21a817bd8..ee50c56765a4 100644
> --- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
> +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
> @@ -3276,14 +3276,14 @@ static int ena_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
>  
>   memcpy(adapter->netdev->perm_addr, adapter->mac_addr, netdev->addr_len);
>  
> - netif_carrier_off(netdev);
> -
>   rc = register_netdev(netdev);
>   if (rc) {
>   dev_err(>dev, "Cannot register net device\n");
>   goto err_rss;
>   }
>  
> + netif_carrier_off(netdev);
> +
>   INIT_WORK(>reset_task, ena_fw_reset_device);

This looks suspicious.  After you call register_netdev() someone can
open the device and link may come up before you clear it again with
carrier off.  Leading to netdev without a carrier until it's reopened.

>   adapter->last_keep_alive_jiffies = jiffies;

Re: [PATCH][next] wcn36xx: remove redundant assignment to msg_body.min_ch_time

2017-12-28 Thread Loic Poulain

Hi Colin, Bjorn,

On 26 December 2017 at 21:13, Bjorn Andersson
 wrote:
> On Tue 19 Dec 09:04 PST 2017, Colin King wrote:
>
>> From: Colin Ian King 
>>
>> msg_body.min_ch_time is being assigned twice; remove the redundant
>> first assignment.
>>
>> Detected by CoverityScan, CID#1463042 ("Unused Value")
>>
>
> Happy to see Coverity working for us :)
>
>
> This should have had a:
>
> Fixes: 2f3bef4b247e ("wcn36xx: Add hardware scan offload support")
>
>> Signed-off-by: Colin Ian King 
>> ---
>>  drivers/net/wireless/ath/wcn36xx/smd.c | 1 -
>>  1 file changed, 1 deletion(-)
>>
>> diff --git a/drivers/net/wireless/ath/wcn36xx/smd.c b/drivers/net/wireless/ath/wcn36xx/smd.c
>> index 2914618a0335..bab2eca5fcac 100644
>> --- a/drivers/net/wireless/ath/wcn36xx/smd.c
>> +++ b/drivers/net/wireless/ath/wcn36xx/smd.c
>> @@ -625,7 +625,6 @@ int wcn36xx_smd_start_hw_scan(struct wcn36xx *wcn, struct ieee80211_vif *vif,
>>   INIT_HAL_MSG(msg_body, WCN36XX_HAL_START_SCAN_OFFLOAD_REQ);
>>
>>   msg_body.scan_type = WCN36XX_HAL_SCAN_TYPE_ACTIVE;
>> - msg_body.min_ch_time = 30;
>>   msg_body.min_ch_time = 100;
>
> But I strongly suspect the second line is supposed to be max_ch_time.
>
> @Loic, do you agree?

You're absolutely right.
Colin could you please update your patch accordingly?

Regards,
Loic

Re: [RFC PATCH bpf-next v2 4/4] error-injection: Support fault injection framework

2017-12-28 Thread Masami Hiramatsu

On Thu, 28 Dec 2017 17:11:31 -0800
Alexei Starovoitov  wrote:

> On 12/27/17 11:51 PM, Masami Hiramatsu wrote:
> >
> > Then what happen if the user set invalid retval to those functions?
> > even if we limit the injectable functions, it can cause a problem,
> >
> > for example,
> >
> >  obj = func_return_object();
> >  if (!obj) {
> > handling_error...;
> >  }
> >  obj->field = x;
> >
> > In this case, obviously func_return_object() must return NULL if there is
> > an error, not -ENOMEM. But without the correct retval information, how would
> > you check the BPF code doesn't cause a trouble?
> > Currently it seems you are expecting only the functions which return error 
> > code.
> >
> >  ret = func_return_state();
> >  if (ret < 0) {
> > handling_error...;
> >  }
> >
> > But how we can distinguish those?
> >
> > If we have the error range for each function, we can ensure what is
> > *correct* error code, NULL or errno, or any other error numbers. :)
> 
> messing up return values may cause problems and range check is
> not going to magically help.
> The caller may handle only a certain set of errors or interpret
> some of them like EBUSY as a signal to retry.
> It's plain impossible to make sure that kernel will be functional
> after error injection has been made.

Hmm, if so, why do we need this injectable table?
If we cannot ensure the safety of error injection (and of course we cannot),
why do we need to restrict error injection to such a limited set of functions?
I think we don't need it anymore. Any function can be injectable, and there is
no need to ensure safety.

Thank you,

> Like kmalloc() unconditionally returning NULL will be deadly
> for the kernel, hence this patch 4/4 has very limited practical
> use. The bpf program need to make intelligent decisions when
> to return an error and what kind of error to return.
> Doing blank range check adds a false sense of additional safety.
> More so it wastes kilobytes of memory to do this check, hence nack.
> 


-- 
Masami Hiramatsu
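
The "error range for each function" idea discussed above can be sketched as a per-function return convention that an injected value is checked against. This is only an illustration of the proposal under discussion, which is nacked in the quoted reply; it is not the interface of the patch. The enum, the helper name and the local MAX_ERRNO_SKETCH constant (mirroring the kernel's MAX_ERRNO of 4095) are all assumptions.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of how a function reports failure. */
enum err_convention {
	ERR_RETURNS_NULL,	/* pointer-returning: failure is NULL */
	ERR_RETURNS_ERRNO,	/* int-returning: failure is a negative errno */
};

/* Assumption: same bound as the kernel's MAX_ERRNO. */
#define MAX_ERRNO_SKETCH 4095

/* Would `retval` be a well-formed error for a function with this convention? */
static bool injected_value_is_valid(enum err_convention conv, intptr_t retval)
{
	switch (conv) {
	case ERR_RETURNS_NULL:
		return retval == 0;
	case ERR_RETURNS_ERRNO:
		return retval < 0 && retval >= -MAX_ERRNO_SKETCH;
	}
	return false;
}
```

With this check, injecting -ENOMEM into a function like func_return_object() is rejected, while the same value is accepted for func_return_state(). As the quoted reply points out, such a check says nothing about whether the caller actually handles that particular error.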

[PATCH RESEND 1/3] net: Fix possible race in peernet2id_alloc()

2017-12-28 Thread Kirill Tkhai

peernet2id_alloc() is racy without rtnl_lock(), as atomic_read(>count)
under net->nsid_lock does not guarantee that peer is alive:

rcu_read_lock()
peernet2id_alloc()                            ..
  spin_lock_bh(>nsid_lock)                    ..
  atomic_read(>count) == 1                    ..
  ..                                          put_net()
  ..                                            cleanup_net()
  ..                                              for_each_net(tmp)
  ..                                                spin_lock_bh(>nsid_lock)
  ..                                                __peernet2id(tmp, net) == -1
  ..                                                    ..
  ..                                                    ..
    __peernet2id_alloc(alloc == true)                   ..
  ..                                                    ..
rcu_read_unlock()                                       ..
..                                                synchronize_rcu()
..                                                kmem_cache_free(net)

After the sequence above, net::netns_ids contains an id that points to freed
memory, and any later dereference through that id will operate on freed memory.

Currently, peernet2id_alloc() is used under rtnl_lock() everywhere except
ovs_vport_cmd_fill_info(), and this race can't occur. But peernet2id_alloc()
is a generic interface, and it is better to fix it before someone actually
starts using it in the wrong context.

Signed-off-by: Kirill Tkhai 
---
 net/core/net_namespace.c |   23 +++
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 60a71be75aea..6a4eab438221 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -221,17 +221,32 @@ static void rtnl_net_notifyid(struct net *net, int cmd, int id);
  */
 int peernet2id_alloc(struct net *net, struct net *peer)
 {
-   bool alloc;
+   bool alloc = false, alive = false;
int id;
 
-   if (atomic_read(>count) == 0)
-   return NETNSA_NSID_NOT_ASSIGNED;
spin_lock_bh(>nsid_lock);
-   alloc = atomic_read(>count) == 0 ? false : true;
+   /* Spinlock guarantees we never hash a peer to net->netns_ids
+* after idr_destroy(>netns_ids) occurs in cleanup_net().
+*/
+   if (atomic_read(>count) == 0) {
+   id = NETNSA_NSID_NOT_ASSIGNED;
+   goto unlock;
+   }
+   /*
+* When peer is obtained from RCU lists, we may race with
+* its cleanup. Check whether it's alive, and this guarantees
+* we never hash a peer back to net->netns_ids, after it has
+* just been idr_remove()'d from there in cleanup_net().
+*/
+   if (maybe_get_net(peer))
+   alive = alloc = true;
id = __peernet2id_alloc(net, peer, );
+unlock:
spin_unlock_bh(>nsid_lock);
if (alloc && id >= 0)
rtnl_net_notifyid(net, RTM_NEWNSID, id);
+   if (alive)
+   put_net(peer);
return id;
 }
 EXPORT_SYMBOL_GPL(peernet2id_alloc);
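
The liveness check that the patch relies on (maybe_get_net() succeeding only while the reference count is non-zero) follows the general "increment unless zero" pattern. Below is a minimal userspace sketch of that pattern using C11 atomics; struct obj, obj_get_unless_zero() and obj_put() are hypothetical stand-ins and are not the kernel implementation.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a refcounted object such as struct net. */
struct obj {
	atomic_int count;
};

/* Take a reference only if the object is still alive (count > 0). */
static bool obj_get_unless_zero(struct obj *o)
{
	int old = atomic_load(&o->count);

	while (old != 0) {
		/* On failure, `old` is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&o->count, &old, old + 1))
			return true;
	}
	return false;	/* count already hit zero: object is being torn down */
}

/* Drop a reference; returns true when the caller released the last one. */
static bool obj_put(struct obj *o)
{
	return atomic_fetch_sub(&o->count, 1) == 1;
}
```

A plain read of the counter followed by later use, as in the old code, leaves a window in which the last reference can be dropped. The compare-and-swap loop closes that window because the check and the increment happen as one atomic step.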

[PATCH RESEND 3/3] net: Remove spinlock from get_net_ns_by_id()

2017-12-28 Thread Kirill Tkhai

idr_find() is safe under rcu_read_lock() and
maybe_get_net() guarantees that net is alive.

Signed-off-by: Kirill Tkhai 
---
 net/core/net_namespace.c |2 --
 1 file changed, 2 deletions(-)

diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 6a4eab438221..a675f35a18ff 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -279,11 +279,9 @@ struct net *get_net_ns_by_id(struct net *net, int id)
return NULL;
 
rcu_read_lock();
-   spin_lock_bh(>nsid_lock);
peer = idr_find(>netns_ids, id);
if (peer)
peer = maybe_get_net(peer);
-   spin_unlock_bh(>nsid_lock);
rcu_read_unlock();
 
return peer;

[PATCH RESEND 2/3] net: Add BUG_ON() to get_net()

2017-12-28 Thread Kirill Tkhai

Since people may mistakenly obtain a net that is being destroyed
from net_namespace_list or from net::netns_ids
without checking its net::count, let's protect
against such situations and insert BUG_ON() to stop
execution at that point.

A panic is better than memory corruption and undefined
behavior.

Signed-off-by: Kirill Tkhai 
---
 include/net/net_namespace.h |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 10f99dafd5ac..ff0e47471d5b 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -195,7 +195,7 @@ void __put_net(struct net *net);
 
 static inline struct net *get_net(struct net *net)
 {
-   atomic_inc(>count);
+   BUG_ON(atomic_inc_return(>count) <= 1);
return net;
 }
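
The condition in the BUG_ON() above works because an increment-and-return yields the new counter value: a result of 1 or less means the counter was 0 (or negative) beforehand, i.e. the reference was taken on an object that was already being released. A minimal userspace sketch with C11 atomics follows; the helper name is hypothetical, and it returns a flag instead of halting.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Increment the counter and report whether it was non-positive before,
 * which would indicate a reference taken on a dead object.
 */
static bool inc_hit_dead_object(atomic_int *count)
{
	/* atomic_fetch_add() returns the old value; +1 gives the new one. */
	int new_value = atomic_fetch_add(count, 1) + 1;

	return new_value <= 1;
}
```

Note that the increment has already happened by the time the condition is evaluated, so this is a detection mechanism rather than a way to refuse the reference.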

[PATCH net-next] cxgb4: Check alignment constraint for T6

2017-12-28 Thread Ganesh Goudar

Update the check for setting IPv4 filters and align filter_id
to a multiple of 2 only for IPv6 filters in the case of T6.

Signed-off-by: Arjun Vynipadath 
Signed-off-by: Ganesh Goudar 
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c | 17 -
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c
index 5980f30..29178cf 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c
@@ -1189,6 +1189,7 @@ int __cxgb4_set_filter(struct net_device *dev, int filter_id,
   struct filter_ctx *ctx)
 {
struct adapter *adapter = netdev2adap(dev);
+   unsigned int chip_ver = CHELSIO_CHIP_VERSION(adapter->params.chip);
unsigned int max_fidx, fidx;
struct filter_entry *f;
u32 iconf;
@@ -1225,12 +1226,18 @@ int __cxgb4_set_filter(struct net_device *dev, int filter_id,
 * insertion.
 */
if (fs->type == 0) { /* IPv4 */
-   /* If our IPv4 filter isn't being written to a
-* multiple of four filter index and there's an IPv6
-* filter at the multiple of 4 base slot, then we
-* prevent insertion.
+   /* For T6, If our IPv4 filter isn't being written to a
+* multiple of two filter index and there's an IPv6
+* filter at the multiple of 2 base slot, then we need
+* to delete that IPv6 filter ...
+* For adapters below T6, IPv6 filter occupies 4 entries.
+* Hence we need to delete the filter in multiple of 4 slot.
 */
-   fidx = filter_id & ~0x3;
+   if (chip_ver < CHELSIO_T6)
+   fidx = filter_id & ~0x3;
+   else
+   fidx = filter_id & ~0x1;
+
if (fidx != filter_id &&
adapter->tids.ftid_tab[fidx].fs.type) {
f = >tids.ftid_tab[fidx];
-- 
2.1.0
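
The index arithmetic in the hunk above can be tried out in isolation. This is a small sketch, assuming only what the patch comment states: an IPv6 filter occupies a group of 4 slots before T6 and a group of 2 slots on T6. The enum values and helper names are hypothetical stand-ins for the driver's chip-version macros.

```c
/* Hypothetical chip-version constants standing in for the driver's own. */
enum chip_ver { CHIP_T4 = 4, CHIP_T5 = 5, CHIP_T6 = 6 };

/*
 * First slot of the aligned group that filter_id falls into:
 * groups of 4 slots before T6, groups of 2 slots on T6.
 */
static unsigned int ipv6_base_slot(unsigned int filter_id, enum chip_ver ver)
{
	unsigned int mask = (ver < CHIP_T6) ? 0x3u : 0x1u;

	return filter_id & ~mask;
}

/* Non-zero when filter_id is not itself the base slot of its group. */
static int shares_group_with_base(unsigned int filter_id, enum chip_ver ver)
{
	return ipv6_base_slot(filter_id, ver) != filter_id;
}
```

For filter_id 6, the base slot is 4 on a pre-T6 adapter but 6 on T6, so on T6 an IPv4 filter at index 6 no longer needs to be checked against an IPv6 filter at index 4.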

Re: [PATCH v6 0/6] Add M_CAN Support for Dra76 platform

2017-12-28 Thread Yang, Wenyou




On 2017/12/22 21:31, Faiz Abbas wrote:
> This patch series adds support for M_CAN on the TI Dra76
> platform. Device tree patches will be sent separately.
> A bunch of patches were sent before by
> Franklin Cooper. I have clubbed the
> series together and rebased to the latest kernel.

Tested this series on SAMA5D2 Xplained board.

Tested-by: Wenyou Yang 

> v6 changes:
> Dropped the patches to make hclk optional. Drivers
> which enable hclk as the interface clock using
> pm_runtime calls must still provide a hclk in the
> clocks property.
>
> Support higher speed CAN-FD bitrate:
> The community decided that data sampling point be used
> for the secondary sampling point here
> https://patchwork.kernel.org/patch/9909845/
>
> Franklin S Cooper Jr (6):
>   can: dev: Add support for limiting configured bitrate
>   can: m_can: Add call to of_can_transceiver
>   can: m_can: Add PM Runtime
>   can: m_can: Support higher speed CAN-FD bitrates
>   dt-bindings: can: m_can: Document new can transceiver binding
>   dt-bindings: can: can-transceiver: Document new binding
>
>  .../bindings/net/can/can-transceiver.txt   | 24 +++
>  .../devicetree/bindings/net/can/m_can.txt  |  9 +++
>  drivers/net/can/dev.c  | 39 +++
>  drivers/net/can/m_can/m_can.c  | 81 --
>  include/linux/can/dev.h|  8 +++
>  5 files changed, 156 insertions(+), 5 deletions(-)
>  create mode 100644 Documentation/devicetree/bindings/net/can/can-transceiver.txt



Best Regards,
Wenyou Yang

Re: [PATCH net-next] virtio_net: implement VIRTIO_CONFIG_S_NEEDS_RESET

2017-12-28 Thread Jason Wang




On 2017年12月29日 03:11, Willem de Bruijn wrote:

On Mon, Oct 16, 2017 at 11:44 PM, Michael S. Tsirkin  wrote:

On Tue, Oct 17, 2017 at 11:05:07AM +0800, Jason Wang wrote:


On 2017年10月17日 06:34, Willem de Bruijn wrote:

On Mon, Oct 16, 2017 at 12:38 PM, Michael S. Tsirkin  wrote:

On Mon, Oct 16, 2017 at 12:04:57PM -0400, Willem de Bruijn wrote:

On Mon, Oct 16, 2017 at 11:31 AM, Michael S. Tsirkin  wrote:

On Mon, Oct 16, 2017 at 11:03:18AM -0400, Willem de Bruijn wrote:

+static int virtnet_reset(struct virtnet_info *vi)
+{
+ struct virtio_device *dev = vi->vdev;
+ int ret;
+
+ virtio_config_disable(dev);
+ dev->failed = dev->config->get_status(dev) & VIRTIO_CONFIG_S_FAILED;
+ virtnet_freeze_down(dev, true);
+ remove_vq_common(vi);
+
+ virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
+ virtio_add_status(dev, VIRTIO_CONFIG_S_DRIVER);
+
+ ret = virtio_finalize_features(dev);
+ if (ret)
+ goto err;
+
+ ret = virtnet_restore_up(dev);
+ if (ret)
+ goto err;
+
+ ret = virtnet_set_queues(vi, vi->curr_queue_pairs);
+ if (ret)
+ goto err;
+
+ virtio_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
+ virtio_config_enable(dev);
+ return 0;
+
+err:
+ virtio_add_status(dev, VIRTIO_CONFIG_S_FAILED);
+ return ret;
+}
+
   static int virtnet_set_guest_offloads(struct virtnet_info *vi, u64 offloads)
   {
struct scatterlist sg;

I have a question here though. How do things like MAC address
get restored?

What about the rx mode?

vlans?

The function as is releases and reinitializes only ring state.
Device configuration such as mac and vlan persist across
the reset.

What gave you this impression? Take a look at e.g. this
code in qemu:

static void virtio_net_reset(VirtIODevice *vdev)
{
  VirtIONet *n = VIRTIO_NET(vdev);

  /* Reset back to compatibility mode */
  n->promisc = 1;
  n->allmulti = 0;
  n->alluni = 0;
  n->nomulti = 0;
  n->nouni = 0;
  n->nobcast = 0;
  /* multiqueue is disabled by default */
  n->curr_queues = 1;
  timer_del(n->announce_timer);
  n->announce_counter = 0;
  n->status &= ~VIRTIO_NET_S_ANNOUNCE;

  /* Flush any MAC and VLAN filter table state */
  n->mac_table.in_use = 0;
  n->mac_table.first_multi = 0;
  n->mac_table.multi_overflow = 0;
  n->mac_table.uni_overflow = 0;
  memset(n->mac_table.macs, 0, MAC_TABLE_ENTRIES * ETH_ALEN);
  memcpy(>mac[0], >nic->conf->macaddr, sizeof(n->mac));
  qemu_format_nic_info_str(qemu_get_queue(n->nic), n->mac);
  memset(n->vlans, 0, MAX_VLAN >> 3);
}

So device seems to lose all state, you have to re-program it.

Oh, indeed! The guest does not reset its state, so it might
be out of sync with the host after the operation. Was this not
an issue when previously resetting in the context of xdp?

I suspect it was broken back then, too.

Okay. I guess that in principle this is all programmable through
virtnet_set_rx_mode, virtnet_vlan_rx_add_vid, etc. But it's a
lot more complex than just restoring virtnet_reset. Will need to
be careful about concurrency issues at the least. Similar to the
ones you point out below.


The problem has been pointed out during developing virtio-net XDP. But it
may not be a big issue since vhost_net ignores all kinds of the filters now.

Thanks

It might not keep doing that in the future though.
And virtio-net in userspace doesn't ignore the filters.

How about the guest honor the request only if no state has been
offloaded to the host?

This is the common case for vhost_net, and not expected to change
soon.


FYI, I'm implementing to use tun eBPF filter for virtio-net. So 
recovering filter should be considered.


Thanks



Even when it does, we have a graceful degradation strategy. Guest
revert state prior to reset and reapply. Though for the time being,
solving this only in the case without state offload would be solve my
use case.

[PATCH net-next v7 6/6] net: dccp: Remove dccpprobe module

2017-12-28 Thread Masami Hiramatsu

Remove DCCP probe module since jprobe has been deprecated.
That function is now replaced by dccp/dccp_probe trace-event.
You can use it via ftrace or perftools.

Signed-off-by: Masami Hiramatsu 
---
 Changes in v5:
  - Fix a conflict with previous change in Makefile.
---
 net/dccp/Kconfig  |   17 
 net/dccp/Makefile |2 -
 net/dccp/probe.c  |  203 -
 3 files changed, 222 deletions(-)
 delete mode 100644 net/dccp/probe.c

diff --git a/net/dccp/Kconfig b/net/dccp/Kconfig
index 8c0ef71bed2f..b270e84d9c13 100644
--- a/net/dccp/Kconfig
+++ b/net/dccp/Kconfig
@@ -39,23 +39,6 @@ config IP_DCCP_DEBUG
 
  Just say N.
 
-config NET_DCCPPROBE
-   tristate "DCCP connection probing"
-   depends on PROC_FS && KPROBES
-   ---help---
-   This module allows for capturing the changes to DCCP connection
-   state in response to incoming packets. It is used for debugging
-   DCCP congestion avoidance modules. If you don't understand
-   what was just said, you don't need it: say N.
-
-   Documentation on how to use DCCP connection probing can be found
-   at:
-   http://www.linuxfoundation.org/collaborate/workgroups/networking/dccpprobe
-
-   To compile this code as a module, choose M here: the
-   module will be called dccp_probe.
-
 
 endmenu
 
diff --git a/net/dccp/Makefile b/net/dccp/Makefile
index 4215f13a63af..5b4ff37bc806 100644
--- a/net/dccp/Makefile
+++ b/net/dccp/Makefile
@@ -21,12 +21,10 @@ obj-$(subst y,$(CONFIG_IP_DCCP),$(CONFIG_IPV6)) += dccp_ipv6.o
 dccp_ipv6-y := ipv6.o
 
 obj-$(CONFIG_INET_DCCP_DIAG) += dccp_diag.o
-obj-$(CONFIG_NET_DCCPPROBE) += dccp_probe.o
 
 dccp-$(CONFIG_SYSCTL) += sysctl.o
 
 dccp_diag-y := diag.o
-dccp_probe-y := probe.o
 
 # build with local directory for trace.h
 CFLAGS_proto.o := -I$(src)
diff --git a/net/dccp/probe.c b/net/dccp/probe.c
deleted file mode 100644
index 3d3fda05b32d..
--- a/net/dccp/probe.c
+++ /dev/null
@@ -1,203 +0,0 @@
-/*
- * dccp_probe - Observe the DCCP flow with kprobes.
- *
- * The idea for this came from Werner Almesberger's umlsim
- * Copyright (C) 2004, Stephen Hemminger 
- *
- * Modified for DCCP from Stephen Hemminger's code
- * Copyright (C) 2006, Ian McDonald 
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, write to the Free Software
- * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
- */
-
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
-#include "dccp.h"
-#include "ccid.h"
-#include "ccids/ccid3.h"
-
-static int port;
-
-static int bufsize = 64 * 1024;
-
-static const char procname[] = "dccpprobe";
-
-static struct {
-   struct kfifo  fifo;
-   spinlock_t    lock;
-   wait_queue_head_t wait;
-   struct timespec64 tstart;
-} dccpw;
-
-static void printl(const char *fmt, ...)
-{
-   va_list args;
-   int len;
-   struct timespec64 now;
-   char tbuf[256];
-
-   va_start(args, fmt);
-   getnstimeofday64();
-
-   now = timespec64_sub(now, dccpw.tstart);
-
-   len = sprintf(tbuf, "%lu.%06lu ",
- (unsigned long) now.tv_sec,
- (unsigned long) now.tv_nsec / NSEC_PER_USEC);
-   len += vscnprintf(tbuf+len, sizeof(tbuf)-len, fmt, args);
-   va_end(args);
-
-   kfifo_in_locked(, tbuf, len, );
-   wake_up();
-}
-
-static int jdccp_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
-{
-   const struct inet_sock *inet = inet_sk(sk);
-   struct ccid3_hc_tx_sock *hc = NULL;
-
-   if (ccid_get_current_tx_ccid(dccp_sk(sk)) == DCCPC_CCID3)
-   hc = ccid3_hc_tx_sk(sk);
-
-   if (port == 0 || ntohs(inet->inet_dport) == port ||
-   ntohs(inet->inet_sport) == port) {
-   if (hc)
-   printl("%pI4:%u %pI4:%u %d %d %d %d %u %llu %llu %d\n",
-  >inet_saddr, ntohs(inet->inet_sport),
-  >inet_daddr, ntohs(inet->inet_dport), size,
-  hc->tx_s, hc->tx_rtt, hc->tx_p,
-  hc->tx_x_calc, hc->tx_x_recv >> 6,
-  hc->tx_x >> 6, hc->tx_t_ipi);
-   else
-   printl("%pI4:%u %pI4:%u

[PATCH net-next v7 4/6] net: sctp: Remove debug SCTP probe module

2017-12-28 Thread Masami Hiramatsu

Remove SCTP probe module since jprobe has been deprecated.
That function is now replaced by sctp/sctp_probe and
sctp/sctp_probe_path trace-events.
You can use it via ftrace or perftools.

Signed-off-by: Masami Hiramatsu 
---
 net/sctp/Kconfig  |   12 ---
 net/sctp/Makefile |3 -
 net/sctp/probe.c  |  244 -
 3 files changed, 259 deletions(-)
 delete mode 100644 net/sctp/probe.c

diff --git a/net/sctp/Kconfig b/net/sctp/Kconfig
index d9c04dc1b3f3..c740b189d4ba 100644
--- a/net/sctp/Kconfig
+++ b/net/sctp/Kconfig
@@ -37,18 +37,6 @@ menuconfig IP_SCTP
 
 if IP_SCTP
 
-config NET_SCTPPROBE
-   tristate "SCTP: Association probing"
-depends on PROC_FS && KPROBES
----help---
-This module allows for capturing the changes to SCTP association
-state in response to incoming packets. It is used for debugging
-SCTP congestion control algorithms. If you don't understand
-what was just said, you don't need it: say N.
-
-To compile this code as a module, choose M here: the
-module will be called sctp_probe.
-
 config SCTP_DBG_OBJCNT
bool "SCTP: Debug object counts"
depends on PROC_FS
diff --git a/net/sctp/Makefile b/net/sctp/Makefile
index 54bd9c1a8aa1..6776582ec449 100644
--- a/net/sctp/Makefile
+++ b/net/sctp/Makefile
@@ -4,7 +4,6 @@
 #
 
 obj-$(CONFIG_IP_SCTP) += sctp.o
-obj-$(CONFIG_NET_SCTPPROBE) += sctp_probe.o
 obj-$(CONFIG_INET_SCTP_DIAG) += sctp_diag.o
 
 sctp-y := sm_statetable.o sm_statefuns.o sm_sideeffect.o \
@@ -16,8 +15,6 @@ sctp-y := sm_statetable.o sm_statefuns.o sm_sideeffect.o \
  offload.o stream_sched.o stream_sched_prio.o \
  stream_sched_rr.o stream_interleave.o
 
-sctp_probe-y := probe.o
-
 sctp-$(CONFIG_SCTP_DBG_OBJCNT) += objcnt.o
 sctp-$(CONFIG_PROC_FS) += proc.o
 sctp-$(CONFIG_SYSCTL) += sysctl.o
diff --git a/net/sctp/probe.c b/net/sctp/probe.c
deleted file mode 100644
index 1280f85a598d..
--- a/net/sctp/probe.c
+++ /dev/null
@@ -1,244 +0,0 @@
-/*
- * sctp_probe - Observe the SCTP flow with kprobes.
- *
- * The idea for this came from Werner Almesberger's umlsim
- * Copyright (C) 2004, Stephen Hemminger 
- *
- * Modified for SCTP from Stephen Hemminger's code
- * Copyright (C) 2010, Wei Yongjun 
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, write to the Free Software
- * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
- */
-
-#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
-
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
-#include 
-#include 
-
-MODULE_SOFTDEP("pre: sctp");
-MODULE_AUTHOR("Wei Yongjun ");
-MODULE_DESCRIPTION("SCTP snooper");
-MODULE_LICENSE("GPL");
-
-static int port __read_mostly = 0;
-MODULE_PARM_DESC(port, "Port to match (0=all)");
-module_param(port, int, 0);
-
-static unsigned int fwmark __read_mostly = 0;
-MODULE_PARM_DESC(fwmark, "skb mark to match (0=no mark)");
-module_param(fwmark, uint, 0);
-
-static int bufsize __read_mostly = 64 * 1024;
-MODULE_PARM_DESC(bufsize, "Log buffer size (default 64k)");
-module_param(bufsize, int, 0);
-
-static int full __read_mostly = 1;
-MODULE_PARM_DESC(full, "Full log (1=every ack packet received,  0=only cwnd changes)");
-module_param(full, int, 0);
-
-static const char procname[] = "sctpprobe";
-
-static struct {
-   struct kfifo  fifo;
-   spinlock_t lock;
-   wait_queue_head_t wait;
-   struct timespec64 tstart;
-} sctpw;
-
-static __printf(1, 2) void printl(const char *fmt, ...)
-{
-   va_list args;
-   int len;
-   char tbuf[256];
-
-   va_start(args, fmt);
-   len = vscnprintf(tbuf, sizeof(tbuf), fmt, args);
-   va_end(args);
-
-   kfifo_in_locked(&sctpw.fifo, tbuf, len, &sctpw.lock);
-   wake_up(&sctpw.wait);
-}
-
-static int sctpprobe_open(struct inode *inode, struct file *file)
-{
-   kfifo_reset(&sctpw.fifo);
-   ktime_get_ts64(&sctpw.tstart);
-
-   return 0;
-}
-
-static ssize_t sctpprobe_read(struct file *file, char __user *buf,
- size_t len, loff_t *ppos)
-{
-   int error = 0, cnt = 0;
-   unsigned char *tbuf;
-
-   if (!buf)
-   return -EINVAL;
-
-   if (len == 0)
-   return 0;
-
-   tbuf = vmalloc(len);
-   if (!tbuf)
-   return

[PATCH net-next v7 5/6] net: dccp: Add DCCP sendmsg trace event

2017-12-28 Thread Masami Hiramatsu

Add a DCCP sendmsg trace event (dccp/dccp_probe) to
replace dccpprobe. Users can trace this event via
ftrace or perftools.

Signed-off-by: Masami Hiramatsu 
---
  Changes in v5:
   - Fix to add local directory to include for trace.h.
 Thanks Steven!
  Changes in v7:
   - Avoid preprocessor directives in tracepoint macro args
 by sharing TP_STORE_ADDR_PORTS() macro with tcp.h.
---
 include/trace/events/net_probe_common.h |   44 
 include/trace/events/tcp.h  |   39 --
 net/dccp/Makefile   |3 +
 net/dccp/proto.c|5 ++
 net/dccp/trace.h|   84 +++
 5 files changed, 137 insertions(+), 38 deletions(-)
 create mode 100644 include/trace/events/net_probe_common.h
 create mode 100644 net/dccp/trace.h

diff --git a/include/trace/events/net_probe_common.h b/include/trace/events/net_probe_common.h
new file mode 100644
index ..3930119cab08
--- /dev/null
+++ b/include/trace/events/net_probe_common.h
@@ -0,0 +1,44 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#if !defined(_TRACE_NET_PROBE_COMMON_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_NET_PROBE_COMMON_H
+
+#define TP_STORE_ADDR_PORTS_V4(__entry, inet, sk)  \
+   do {\
+   struct sockaddr_in *v4 = (void *)__entry->saddr;\
+   \
+   v4->sin_family = AF_INET;   \
+   v4->sin_port = inet->inet_sport;\
+   v4->sin_addr.s_addr = inet->inet_saddr; \
+   v4 = (void *)__entry->daddr;\
+   v4->sin_family = AF_INET;   \
+   v4->sin_port = inet->inet_dport;\
+   v4->sin_addr.s_addr = inet->inet_daddr; \
+   } while (0)
+
+#if IS_ENABLED(CONFIG_IPV6)
+
+#define TP_STORE_ADDR_PORTS(__entry, inet, sk) \
+   do {\
+   if (sk->sk_family == AF_INET6) {\
+   struct sockaddr_in6 *v6 = (void *)__entry->saddr; \
+   \
+   v6->sin6_family = AF_INET6; \
+   v6->sin6_port = inet->inet_sport;   \
+   v6->sin6_addr = inet6_sk(sk)->saddr;\
+   v6 = (void *)__entry->daddr;\
+   v6->sin6_family = AF_INET6; \
+   v6->sin6_port = inet->inet_dport;   \
+   v6->sin6_addr = sk->sk_v6_daddr;\
+   } else  \
+   TP_STORE_ADDR_PORTS_V4(__entry, inet, sk);  \
+   } while (0)
+
+#else
+
+#define TP_STORE_ADDR_PORTS(__entry, inet, sk) \
+   TP_STORE_ADDR_PORTS_V4(__entry, inet, sk);
+
+#endif
+
+#endif
diff --git a/include/trace/events/tcp.h b/include/trace/events/tcp.h
index 4dea6342f7d4..1501ca91814f 100644
--- a/include/trace/events/tcp.h
+++ b/include/trace/events/tcp.h
@@ -279,44 +279,7 @@ TRACE_EVENT(tcp_retransmit_synack,
  __entry->saddr_v6, __entry->daddr_v6)
 );
 
-
-#define TP_STORE_ADDR_PORTS_V4(__entry, inet, sk)  \
-   do {\
-   struct sockaddr_in *v4 = (void *)__entry->saddr;\
-   \
-   v4->sin_family = AF_INET;   \
-   v4->sin_port = inet->inet_sport;\
-   v4->sin_addr.s_addr = inet->inet_saddr; \
-   v4 = (void *)__entry->daddr;\
-   v4->sin_family = AF_INET;   \
-   v4->sin_port = inet->inet_dport;\
-   v4->sin_addr.s_addr = inet->inet_daddr; \
-   } while (0)
-
-#if IS_ENABLED(CONFIG_IPV6)
-
-#define TP_STORE_ADDR_PORTS(__entry, inet, sk) \
-   do {\
-   if (sk->sk_family == AF_INET6) {\
-   struct sockaddr_in6 *v6 = (void *)__entry->saddr; \
-   \
-   v6->sin6_family = AF_INET6; \
-   v6->sin6_port = inet->inet_sport;

[PATCH net-next v7 3/6] net: sctp: Add SCTP ACK tracking trace event

2017-12-28 Thread Masami Hiramatsu

Add SCTP ACK tracking trace event to trace the changes of SCTP
association state in response to incoming packets.
It is used for debugging SCTP congestion control algorithms,
and will replace sctp_probe module.

Note that this event is a bit tricky. Since it consists of two
events (sctp_probe and sctp_probe_path), you have to enable
both events as below.

  # cd /sys/kernel/debug/tracing
  # echo 1 > events/sctp/sctp_probe/enable
  # echo 1 > events/sctp/sctp_probe_path/enable

Or, you can enable all the events under sctp.

  # echo 1 > events/sctp/enable

Since the sctp_probe_path event is always invoked from the
sctp_probe event, you cannot see any output if you enable only
sctp_probe_path.
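
The dependency between the two events can be sketched in plain
userspace C. This is only a model under stated assumptions: the struct
and function names below are hypothetical and are not kernel APIs. It
assumes that a disabled sctp_probe tracepoint never runs its assignment
code, which is where the per-path event is emitted from.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two enable switches (not a kernel API). */
struct sctp_trace_switches {
	bool probe;      /* events/sctp/sctp_probe/enable */
	bool probe_path; /* events/sctp/sctp_probe_path/enable */
};

/*
 * Count the trace records one incoming chunk would produce for an
 * association with n_paths transports. The per-path event is emitted
 * from inside the sctp_probe assignment code, so it is reachable only
 * when sctp_probe itself is enabled.
 */
static int records_per_chunk(const struct sctp_trace_switches *sw, int n_paths)
{
	int records = 0;

	assert(n_paths >= 0);

	if (!sw->probe)
		return 0;		/* sctp_probe body never runs */

	records += 1;			/* the sctp_probe record itself */
	if (sw->probe_path)
		records += n_paths;	/* one sctp_probe_path per transport */

	return records;
}
```

With only probe_path set, records_per_chunk() returns 0 for any number
of paths, which matches the "no output" case described above.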

Signed-off-by: Masami Hiramatsu 
---
  Changes in v3:
   - Add checking whether sctp_probe_path event is enabled
 before iterating sctp paths to record. Thanks Steven.
  Changes in v4:
   - Move a temporary variable definition into the block.
   - Fix to cast pointer to unsigned long instead of __u64
 for 32bit environment.
---
 include/trace/events/sctp.h |   99 +++
 net/sctp/sm_statefuns.c |5 ++
 2 files changed, 104 insertions(+)
 create mode 100644 include/trace/events/sctp.h

diff --git a/include/trace/events/sctp.h b/include/trace/events/sctp.h
new file mode 100644
index ..7475c7be165a
--- /dev/null
+++ b/include/trace/events/sctp.h
@@ -0,0 +1,99 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM sctp
+
+#if !defined(_TRACE_SCTP_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_SCTP_H
+
+#include 
+#include 
+
+TRACE_EVENT(sctp_probe_path,
+
+   TP_PROTO(struct sctp_transport *sp,
+const struct sctp_association *asoc),
+
+   TP_ARGS(sp, asoc),
+
+   TP_STRUCT__entry(
+   __field(__u64, asoc)
+   __field(__u32, primary)
+   __array(__u8, ipaddr, sizeof(union sctp_addr))
+   __field(__u32, state)
+   __field(__u32, cwnd)
+   __field(__u32, ssthresh)
+   __field(__u32, flight_size)
+   __field(__u32, partial_bytes_acked)
+   __field(__u32, pathmtu)
+   ),
+
+   TP_fast_assign(
+   __entry->asoc = (unsigned long)asoc;
+   __entry->primary = (sp == asoc->peer.primary_path);
+   memcpy(__entry->ipaddr, &sp->ipaddr, sizeof(union sctp_addr));
+   __entry->state = sp->state;
+   __entry->cwnd = sp->cwnd;
+   __entry->ssthresh = sp->ssthresh;
+   __entry->flight_size = sp->flight_size;
+   __entry->partial_bytes_acked = sp->partial_bytes_acked;
+   __entry->pathmtu = sp->pathmtu;
+   ),
+
+   TP_printk("asoc=%#llx%s ipaddr=%pISpc state=%u cwnd=%u ssthresh=%u "
+ "flight_size=%u partial_bytes_acked=%u pathmtu=%u",
+ __entry->asoc, __entry->primary ? "(*)" : "",
+ __entry->ipaddr, __entry->state, __entry->cwnd,
+ __entry->ssthresh, __entry->flight_size,
+ __entry->partial_bytes_acked, __entry->pathmtu)
+);
+
+TRACE_EVENT(sctp_probe,
+
+   TP_PROTO(const struct sctp_endpoint *ep,
+const struct sctp_association *asoc,
+struct sctp_chunk *chunk),
+
+   TP_ARGS(ep, asoc, chunk),
+
+   TP_STRUCT__entry(
+   __field(__u64, asoc)
+   __field(__u32, mark)
+   __field(__u16, bind_port)
+   __field(__u16, peer_port)
+   __field(__u32, pathmtu)
+   __field(__u32, rwnd)
+   __field(__u16, unack_data)
+   ),
+
+   TP_fast_assign(
+   struct sk_buff *skb = chunk->skb;
+
+   __entry->asoc = (unsigned long)asoc;
+   __entry->mark = skb->mark;
+   __entry->bind_port = ep->base.bind_addr.port;
+   __entry->peer_port = asoc->peer.port;
+   __entry->pathmtu = asoc->pathmtu;
+   __entry->rwnd = asoc->peer.rwnd;
+   __entry->unack_data = asoc->unack_data;
+
+   if (trace_sctp_probe_path_enabled()) {
+   struct sctp_transport *sp;
+
+   list_for_each_entry(sp, &asoc->peer.transport_addr_list,
+   transports) {
+   trace_sctp_probe_path(sp, asoc);
+   }
+   }
+   ),
+
+   TP_printk("asoc=%#llx mark=%#x bind_port=%d peer_port=%d pathmtu=%d "
+ "rwnd=%u unack_data=%d",
+ __entry->asoc, __entry->mark, __entry->bind_port,
+ __entry->peer_port, __entry->pathmtu, __entry->rwnd,
+ __entry->unack_data)
+);
+
+#endif /* _TRACE_SCTP_H */
+
+/* This part must be outside protection */
+#include 
diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c
index

[PATCH net-next v7 2/6] net: tcp: Remove TCP probe module

2017-12-28 Thread Masami Hiramatsu

Remove TCP probe module since jprobe has been deprecated.
That function is now replaced by tcp/tcp_probe trace-event.
You can use it via ftrace or perftools.

Signed-off-by: Masami Hiramatsu 
---
 net/Kconfig  |   17 ---
 net/ipv4/Makefile|1 
 net/ipv4/tcp_probe.c |  301 --
 3 files changed, 319 deletions(-)
 delete mode 100644 net/ipv4/tcp_probe.c

diff --git a/net/Kconfig b/net/Kconfig
index 9dba2715919d..efe930db3c08 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -336,23 +336,6 @@ config NET_PKTGEN
  To compile this code as a module, choose M here: the
  module will be called pktgen.
 
-config NET_TCPPROBE
-   tristate "TCP connection probing"
-   depends on INET && PROC_FS && KPROBES
-   ---help---
-   This module allows for capturing the changes to TCP connection
-   state in response to incoming packets. It is used for debugging
-   TCP congestion avoidance modules. If you don't understand
-   what was just said, you don't need it: say N.
-
-   Documentation on how to use TCP connection probing can be found
-   at:
-   
- 
http://www.linuxfoundation.org/collaborate/workgroups/networking/tcpprobe
-
-   To compile this code as a module, choose M here: the
-   module will be called tcp_probe.
-
 config NET_DROP_MONITOR
tristate "Network packet drop alerting service"
depends on INET && TRACEPOINTS
diff --git a/net/ipv4/Makefile b/net/ipv4/Makefile
index c6c8ad1d4b6d..47a0a6649a9d 100644
--- a/net/ipv4/Makefile
+++ b/net/ipv4/Makefile
@@ -43,7 +43,6 @@ obj-$(CONFIG_INET_DIAG) += inet_diag.o
 obj-$(CONFIG_INET_TCP_DIAG) += tcp_diag.o
 obj-$(CONFIG_INET_UDP_DIAG) += udp_diag.o
 obj-$(CONFIG_INET_RAW_DIAG) += raw_diag.o
-obj-$(CONFIG_NET_TCPPROBE) += tcp_probe.o
 obj-$(CONFIG_TCP_CONG_BBR) += tcp_bbr.o
 obj-$(CONFIG_TCP_CONG_BIC) += tcp_bic.o
 obj-$(CONFIG_TCP_CONG_CDG) += tcp_cdg.o
diff --git a/net/ipv4/tcp_probe.c b/net/ipv4/tcp_probe.c
deleted file mode 100644
index 697f4c67b2e3..
--- a/net/ipv4/tcp_probe.c
+++ /dev/null
@@ -1,301 +0,0 @@
-/*
- * tcpprobe - Observe the TCP flow with kprobes.
- *
- * The idea for this came from Werner Almesberger's umlsim
- * Copyright (C) 2004, Stephen Hemminger 
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, write to the Free Software
- * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
- */
-
-#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
-
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
-#include 
-
-MODULE_AUTHOR("Stephen Hemminger ");
-MODULE_DESCRIPTION("TCP cwnd snooper");
-MODULE_LICENSE("GPL");
-MODULE_VERSION("1.1");
-
-static int port __read_mostly;
-MODULE_PARM_DESC(port, "Port to match (0=all)");
-module_param(port, int, 0);
-
-static unsigned int bufsize __read_mostly = 4096;
-MODULE_PARM_DESC(bufsize, "Log buffer size in packets (4096)");
-module_param(bufsize, uint, 0);
-
-static unsigned int fwmark __read_mostly;
-MODULE_PARM_DESC(fwmark, "skb mark to match (0=no mark)");
-module_param(fwmark, uint, 0);
-
-static int full __read_mostly;
-MODULE_PARM_DESC(full, "Full log (1=every ack packet received,  0=only cwnd changes)");
-module_param(full, int, 0);
-
-static const char procname[] = "tcpprobe";
-
-struct tcp_log {
-   ktime_t tstamp;
-   union {
-   struct sockaddr raw;
-   struct sockaddr_in  v4;
-   struct sockaddr_in6 v6;
-   }   src, dst;
-   u16 length;
-   u32 snd_nxt;
-   u32 snd_una;
-   u32 snd_wnd;
-   u32 rcv_wnd;
-   u32 snd_cwnd;
-   u32 ssthresh;
-   u32 srtt;
-};
-
-static struct {
-   spinlock_t  lock;
-   wait_queue_head_t wait;
-   ktime_t start;
-   u32 lastcwnd;
-
-   unsigned long   head, tail;
-   struct tcp_log  *log;
-} tcp_probe;
-
-static inline int tcp_probe_used(void)
-{
-   return (tcp_probe.head - tcp_probe.tail) & (bufsize - 1);
-}
-
-static inline int tcp_probe_avail(void)
-{
-   return bufsize - tcp_probe_used() - 1;
-}
-
-#define tcp_probe_copy_fl_to_si4(inet, si4, mem)   \
-   do {\
-   si4.sin_family = AF_INET;

[PATCH net-next v7 1/6] net: tcp: Add trace events for TCP congestion window tracing

2017-12-28 Thread Masami Hiramatsu

This adds an event to trace TCP state variables with a
slightly intrusive trace-event. It uses the ftrace/perf
event log buffer to trace that state, so there is no need to
prepare a separate ring buffer or custom user apps.

Users can trace this event with ftrace as below:

  # cd /sys/kernel/debug/tracing
  # echo 1 > events/tcp/tcp_probe/enable
  (run workloads)
  # cat trace

Signed-off-by: Masami Hiramatsu 
---
 Changes in v6:
  - Avoid preprocessor directives in tracepoint macro args as
Mat did on net tree.
---
 include/trace/events/tcp.h |   97 
 net/ipv4/tcp_input.c   |3 +
 2 files changed, 100 insertions(+)

diff --git a/include/trace/events/tcp.h b/include/trace/events/tcp.h
index 8e88a1671538..4dea6342f7d4 100644
--- a/include/trace/events/tcp.h
+++ b/include/trace/events/tcp.h
@@ -1,3 +1,4 @@
+/* SPDX-License-Identifier: GPL-2.0 */
 #undef TRACE_SYSTEM
 #define TRACE_SYSTEM tcp
 
@@ -8,6 +9,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /*
  * tcp event with arguments sk and skb
@@ -277,6 +279,101 @@ TRACE_EVENT(tcp_retransmit_synack,
  __entry->saddr_v6, __entry->daddr_v6)
 );
 
+
+#define TP_STORE_ADDR_PORTS_V4(__entry, inet, sk)  \
+   do {\
+   struct sockaddr_in *v4 = (void *)__entry->saddr;\
+   \
+   v4->sin_family = AF_INET;   \
+   v4->sin_port = inet->inet_sport;\
+   v4->sin_addr.s_addr = inet->inet_saddr; \
+   v4 = (void *)__entry->daddr;\
+   v4->sin_family = AF_INET;   \
+   v4->sin_port = inet->inet_dport;\
+   v4->sin_addr.s_addr = inet->inet_daddr; \
+   } while (0)
+
+#if IS_ENABLED(CONFIG_IPV6)
+
+#define TP_STORE_ADDR_PORTS(__entry, inet, sk) \
+   do {\
+   if (sk->sk_family == AF_INET6) {\
+   struct sockaddr_in6 *v6 = (void *)__entry->saddr; \
+   \
+   v6->sin6_family = AF_INET6; \
+   v6->sin6_port = inet->inet_sport;   \
+   v6->sin6_addr = inet6_sk(sk)->saddr;\
+   v6 = (void *)__entry->daddr;\
+   v6->sin6_family = AF_INET6; \
+   v6->sin6_port = inet->inet_dport;   \
+   v6->sin6_addr = sk->sk_v6_daddr;\
+   } else  \
+   TP_STORE_ADDR_PORTS_V4(__entry, inet, sk);  \
+   } while (0)
+
+#else
+
+#define TP_STORE_ADDR_PORTS(__entry, inet, sk) \
+   TP_STORE_ADDR_PORTS_V4(__entry, inet, sk);
+
+#endif
+
+TRACE_EVENT(tcp_probe,
+
+   TP_PROTO(struct sock *sk, struct sk_buff *skb),
+
+   TP_ARGS(sk, skb),
+
+   TP_STRUCT__entry(
+   /* sockaddr_in6 is always bigger than sockaddr_in */
+   __array(__u8, saddr, sizeof(struct sockaddr_in6))
+   __array(__u8, daddr, sizeof(struct sockaddr_in6))
+   __field(__u16, sport)
+   __field(__u16, dport)
+   __field(__u32, mark)
+   __field(__u16, length)
+   __field(__u32, snd_nxt)
+   __field(__u32, snd_una)
+   __field(__u32, snd_cwnd)
+   __field(__u32, ssthresh)
+   __field(__u32, snd_wnd)
+   __field(__u32, srtt)
+   __field(__u32, rcv_wnd)
+   ),
+
+   TP_fast_assign(
+   const struct tcp_sock *tp = tcp_sk(sk);
+   const struct inet_sock *inet = inet_sk(sk);
+
+   memset(__entry->saddr, 0, sizeof(struct sockaddr_in6));
+   memset(__entry->daddr, 0, sizeof(struct sockaddr_in6));
+
+   TP_STORE_ADDR_PORTS(__entry, inet, sk);
+
+   /* For filtering use */
+   __entry->sport = ntohs(inet->inet_sport);
+   __entry->dport = ntohs(inet->inet_dport);
+   __entry->mark = skb->mark;
+
+   __entry->length = skb->len;
+   __entry->snd_nxt = tp->snd_nxt;
+   __entry->snd_una = tp->snd_una;
+   __entry->snd_cwnd = tp->snd_cwnd;
+   __entry->snd_wnd = tp->snd_wnd;
+   __entry->rcv_wnd = tp->rcv_wnd;
+   __entry->ssthresh = tcp_current_ssthresh(sk);
+   __entry->srtt = tp->srtt_us >> 3;
+   ),
+
+

[PATCH net-next v7 0/6] net: tcp: sctp: dccp: Replace jprobe usage with trace events

2017-12-28 Thread Masami Hiramatsu

Hi,

This series is v7 of the replacement of jprobe usage with trace
events. This version fixes net/dccp/trace.h to avoid a sparse
warning. Since the TP_STORE_ADDR_PORTS macro can be shared
with trace/events/tcp.h, it also introduces a new common header
file and moves the definition of that macro there.

The previous version is here:
 https://lkml.org/lkml/2017/12/28/7

Changes from v6:
  [5/6]: Avoid preprocessor directives in tracepoint macro args

Thank you,

---

Masami Hiramatsu (6):
  net: tcp: Add trace events for TCP congestion window tracing
  net: tcp: Remove TCP probe module
  net: sctp: Add SCTP ACK tracking trace event
  net: sctp: Remove debug SCTP probe module
  net: dccp: Add DCCP sendmsg trace event
  net: dccp: Remove dccpprobe module


 include/trace/events/net_probe_common.h |   44 +
 include/trace/events/sctp.h |   99 ++
 include/trace/events/tcp.h  |   60 ++
 net/Kconfig |   17 --
 net/dccp/Kconfig|   17 --
 net/dccp/Makefile   |5 -
 net/dccp/probe.c|  203 -
 net/dccp/proto.c|5 +
 net/dccp/trace.h|   84 +
 net/ipv4/Makefile   |1 
 net/ipv4/tcp_input.c|3 
 net/ipv4/tcp_probe.c|  301 ---
 net/sctp/Kconfig|   12 -
 net/sctp/Makefile   |3 
 net/sctp/probe.c|  244 -
 net/sctp/sm_statefuns.c |5 +
 16 files changed, 303 insertions(+), 800 deletions(-)
 create mode 100644 include/trace/events/net_probe_common.h
 create mode 100644 include/trace/events/sctp.h
 delete mode 100644 net/dccp/probe.c
 create mode 100644 net/dccp/trace.h
 delete mode 100644 net/ipv4/tcp_probe.c
 delete mode 100644 net/sctp/probe.c

--
Masami Hiramatsu (Linaro)

[PATCH net-next 2/2] tun: allow to attach ebpf socket filter

2017-12-28 Thread Jason Wang

This patch allows userspace to attach an eBPF filter to tun. This
makes it possible to implement VM dataplane filtering more
efficiently than with a cBPF filter.
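
How the filter result is used on the transmit path can be modelled in
userspace C. This is a sketch based on reading the diff below, not
kernel code: tun_filtered_len() and TUN_DROP are hypothetical names. It
assumes the value returned by the program is the number of bytes to
keep, that zero means drop, and that trimming never grows a packet.

```c
#include <assert.h>

#define TUN_DROP (-1)	/* hypothetical marker, used only in this sketch */

/*
 * Model of the transmit-path decision. verdict is the value the
 * attached filter program returned; has_prog == 0 models a device
 * with no filter attached, where the original length is kept.
 * Returns the resulting packet length, or TUN_DROP.
 */
static int tun_filtered_len(int pkt_len, int has_prog, unsigned int verdict)
{
	unsigned int len;

	assert(pkt_len >= 0);

	len = (unsigned int)pkt_len;
	if (has_prog)
		len = verdict;
	if (len == 0)
		return TUN_DROP;

	/* An oversized verdict leaves the packet at its original length. */
	return len < (unsigned int)pkt_len ? (int)len : pkt_len;
}
```

For a 1500-byte packet, a verdict of 0 drops it, a verdict of 64
truncates it to 64 bytes, and any verdict of 1500 or more passes it
unchanged.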

Signed-off-by: Jason Wang 
---
 drivers/net/tun.c   | 26 ++
 include/uapi/linux/if_tun.h |  1 +
 2 files changed, 27 insertions(+)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 0853829..6e9452b 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -238,6 +238,7 @@ struct tun_struct {
struct tun_pcpu_stats __percpu *pcpu_stats;
struct bpf_prog __rcu *xdp_prog;
struct tun_prog __rcu *steering_prog;
+   struct tun_prog __rcu *filter_prog;
 };
 
 static int tun_napi_receive(struct napi_struct *napi, int budget)
@@ -984,12 +985,25 @@ static void tun_automq_xmit(struct tun_struct *tun, struct sk_buff *skb)
 #endif
 }
 
+static unsigned int run_ebpf_filter(struct tun_struct *tun,
+   struct sk_buff *skb,
+   int len)
+{
+   struct tun_prog *prog = rcu_dereference(tun->filter_prog);
+
+   if (prog)
+   len = bpf_prog_run_clear_cb(prog->prog, skb);
+
+   return len;
+}
+
 /* Net device start xmit */
 static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
 {
struct tun_struct *tun = netdev_priv(dev);
int txq = skb->queue_mapping;
struct tun_file *tfile;
+   int len = skb->len;
 
rcu_read_lock();
tfile = rcu_dereference(tun->tfiles[txq]);
@@ -1015,9 +1029,16 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
sk_filter(tfile->socket.sk, skb))
goto drop;
 
+   len = run_ebpf_filter(tun, skb, len);
+   if (!len)
+   goto drop;
+
if (unlikely(skb_orphan_frags_rx(skb, GFP_ATOMIC)))
goto drop;
 
+   if (pskb_trim(skb, len))
+   goto drop;
+
skb_tx_timestamp(skb);
 
/* Orphan the skb - required as we might hang on to it
@@ -2068,6 +2089,7 @@ static void tun_free_netdev(struct net_device *dev)
tun_flow_uninit(tun);
security_tun_dev_free_security(tun->security);
__tun_set_ebpf(tun, &tun->steering_prog, NULL);
+   __tun_set_ebpf(tun, &tun->filter_prog, NULL);
 }
 
 static void tun_setup(struct net_device *dev)
@@ -2849,6 +2871,10 @@ static long __tun_chr_ioctl(struct file *file, unsigned int cmd,
ret = tun_set_ebpf(tun, &tun->steering_prog, argp);
break;
 
+   case TUNSETFILTEREBPF:
+   ret = tun_set_ebpf(tun, &tun->filter_prog, argp);
+   break;
+
default:
ret = -EINVAL;
break;
diff --git a/include/uapi/linux/if_tun.h b/include/uapi/linux/if_tun.h
index fb38c17..ee432cd 100644
--- a/include/uapi/linux/if_tun.h
+++ b/include/uapi/linux/if_tun.h
@@ -58,6 +58,7 @@
 #define TUNSETVNETBE _IOW('T', 222, int)
 #define TUNGETVNETBE _IOR('T', 223, int)
 #define TUNSETSTEERINGEBPF _IOR('T', 224, int)
+#define TUNSETFILTEREBPF _IOR('T', 225, int)
 
 /* TUNSETIFF ifr flags */
 #define IFF_TUN0x0001
-- 
2.7.4

[PATCH net-next 0/2] tun: allow to attach eBPF filter

2017-12-28 Thread Jason Wang

Hi all:

This series implements an eBPF socket filter for tun. It could
be used to implement an efficient virtio-net receive filter for
vhost-net.

Thanks

Jason Wang (2):
  tuntap: rename struct tun_steering_prog to struct tun_prog
  tun: allow to attach ebpf socket filter

 drivers/net/tun.c   | 58 -
 include/uapi/linux/if_tun.h |  1 +
 2 files changed, 43 insertions(+), 16 deletions(-)

-- 
2.7.4

[PATCH net-next 1/2] tuntap: rename struct tun_steering_prog to struct tun_prog

2017-12-28 Thread Jason Wang

To be reused by eBPF programs other than queue selection.

Signed-off-by: Jason Wang 
---
 drivers/net/tun.c | 32 
 1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index e367d631..0853829 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -195,7 +195,7 @@ struct tun_flow_entry {
 
 #define TUN_NUM_FLOW_ENTRIES 1024
 
-struct tun_steering_prog {
+struct tun_prog {
struct rcu_head rcu;
struct bpf_prog *prog;
 };
@@ -237,7 +237,7 @@ struct tun_struct {
u32 rx_batched;
struct tun_pcpu_stats __percpu *pcpu_stats;
struct bpf_prog __rcu *xdp_prog;
-   struct tun_steering_prog __rcu *steering_prog;
+   struct tun_prog __rcu *steering_prog;
 };
 
 static int tun_napi_receive(struct napi_struct *napi, int budget)
@@ -571,7 +571,7 @@ static u16 tun_automq_select_queue(struct tun_struct *tun, struct sk_buff *skb)
 
 static u16 tun_ebpf_select_queue(struct tun_struct *tun, struct sk_buff *skb)
 {
-   struct tun_steering_prog *prog;
+   struct tun_prog *prog;
u16 ret = 0;
 
prog = rcu_dereference(tun->steering_prog);
@@ -2027,19 +2027,18 @@ static ssize_t tun_chr_read_iter(struct kiocb *iocb, struct iov_iter *to)
return ret;
 }
 
-static void tun_steering_prog_free(struct rcu_head *rcu)
+static void tun_prog_free(struct rcu_head *rcu)
 {
-   struct tun_steering_prog *prog = container_of(rcu,
-struct tun_steering_prog, rcu);
+   struct tun_prog *prog = container_of(rcu, struct tun_prog, rcu);
 
bpf_prog_destroy(prog->prog);
kfree(prog);
 }
 
-static int __tun_set_steering_ebpf(struct tun_struct *tun,
-  struct bpf_prog *prog)
+static int __tun_set_ebpf(struct tun_struct *tun, struct tun_prog **prog_p,
+ struct bpf_prog *prog)
 {
-   struct tun_steering_prog *old, *new = NULL;
+   struct tun_prog *old, *new = NULL;
 
if (prog) {
new = kmalloc(sizeof(*new), GFP_KERNEL);
@@ -2049,13 +2048,13 @@ static int __tun_set_steering_ebpf(struct tun_struct *tun,
}
 
spin_lock_bh(&tun->lock);
-   old = rcu_dereference_protected(tun->steering_prog,
+   old = rcu_dereference_protected(*prog_p,
lockdep_is_held(&tun->lock));
-   rcu_assign_pointer(tun->steering_prog, new);
+   rcu_assign_pointer(*prog_p, new);
spin_unlock_bh(&tun->lock);
 
if (old)
-   call_rcu(&old->rcu, tun_steering_prog_free);
+   call_rcu(&old->rcu, tun_prog_free);
 
return 0;
 }
@@ -2068,7 +2067,7 @@ static void tun_free_netdev(struct net_device *dev)
free_percpu(tun->pcpu_stats);
tun_flow_uninit(tun);
security_tun_dev_free_security(tun->security);
-   __tun_set_steering_ebpf(tun, NULL);
+   __tun_set_ebpf(tun, &tun->steering_prog, NULL);
 }
 
 static void tun_setup(struct net_device *dev)
@@ -2550,7 +2549,8 @@ static int tun_set_queue(struct file *file, struct ifreq *ifr)
return ret;
 }
 
-static int tun_set_steering_ebpf(struct tun_struct *tun, void __user *data)
+static int tun_set_ebpf(struct tun_struct *tun, struct tun_prog **prog_p,
+   void __user *data)
 {
struct bpf_prog *prog;
int fd;
@@ -2566,7 +2566,7 @@ static int tun_set_steering_ebpf(struct tun_struct *tun, void __user *data)
return PTR_ERR(prog);
}
 
-   return __tun_set_steering_ebpf(tun, prog);
+   return __tun_set_ebpf(tun, prog_p, prog);
 }
 
 static long __tun_chr_ioctl(struct file *file, unsigned int cmd,
@@ -2846,7 +2846,7 @@ static long __tun_chr_ioctl(struct file *file, unsigned int cmd,
break;
 
case TUNSETSTEERINGEBPF:
-   ret = tun_set_steering_ebpf(tun, argp);
+   ret = tun_set_ebpf(tun, &tun->steering_prog, argp);
break;
 
default:
-- 
2.7.4

[GIT] Networking

2017-12-28 Thread David Miller


1) IPv6 gre tunnels end up with different default features enabled
   depending upon whether netlink or ioctls are used to bring them
   up.  Fix from Alexey Kodanev.

2) Fix read past end of user control message in RDS, from Avinash
   Repaka.

3) Missing RCU barrier in mini qdisc code, from Cong Wang.

4) Missing policy put when reusing per-cpu route entries, from
   Florian Westphal.

5) Handle nested PCI errors properly in bnx2x driver, from Guilherme
   G. Piccoli.

6) Run nested transport mode IPSEC packets via tasklet, from Herbert
   Xu.

7) Fix handling poll() for stream sockets in tipc, from Parthasarathy
   Bhuvaragan.

8) Fix two stack-out-of-bounds issues in IPSEC, from Steffen Klassert.

9) Another zerocopy ubuf handling fix, from Willem de Bruijn.

Please pull, thanks a lot!

The following changes since commit ead68f216110170ec729e2c4dec0aad6d38259d7:

  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net (2017-12-21 15:57:30 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 

for you to fetch changes up to d5902f6d1fbdb27e6a33c418063466d94be9dfa2:

  Merge branch 'strparser-Fix-lockdep-issue' (2017-12-28 14:28:23 -0500)


Alexey Kodanev (1):
  ip6_gre: fix device features for ioctl setup

Antony Antony (1):
  xfrm: fix xfrm_do_migrate() with AEAD e.g(AES-GCM)

Avinash Repaka (1):
  RDS: Check cmsg_len before dereferencing CMSG_DATA

Aviv Heller (1):
  xfrm: Fix xfrm_input() to verify state is valid when (encap_type < 0)

Cong Wang (2):
  xfrm: check id proto in validate_tmpl()
  net_sched: fix a missing rcu barrier in mini_qdisc_pair_swap()

Daniel Borkmann (1):
  Merge branch 'bpf-bpftool-various-fixes'

David S. Miller (4):
  Merge branch 'master' of git://git.kernel.org/.../klassert/ipsec
  Merge branch 'tg3-fixes'
  Merge git://git.kernel.org/.../bpf/bpf
  Merge branch 'strparser-Fix-lockdep-issue'

Florian Westphal (1):
  xfrm: put policies when reusing pcpu xdst entry

Fugang Duan (1):
  net: fec: unmap the xmit buffer that are not transferred by DMA

Grygorii Strashko (1):
  net: phy: micrel: ksz9031: reconfigure autoneg after phy autoneg workaround

Guilherme G. Piccoli (1):
  bnx2x: Improve reliability in case of nested PCI errors

Herbert Xu (1):
  xfrm: Reinject transport-mode packets through tasklet

Jakub Kicinski (2):
  tools: bpftool: maps: close json array on error paths of show
  tools: bpftool: protect against races with disappearing objects

Jiri Pirko (1):
  net: sched: fix possible null pointer deref in tcf_block_put

Jon Maloy (2):
  tipc: base group replicast ack counter on number of actual receivers
  tipc: fix memory leak of group member when peer node is lost

Mat Martineau (1):
  tcp: Avoid preprocessor directives in tracepoint macro args

Michal Kubecek (1):
  xfrm: fix XFRMA_OUTPUT_MARK policy entry

Parthasarathy Bhuvaragan (1):
  tipc: fix hanging poll() for stream sockets

Quentin Monnet (1):
  selftests/bpf: fix Makefile for passing LLC to the command line

Russell King (2):
  phylink: ensure the PHY interface mode is appropriately set
  phylink: ensure AN is enabled

Siva Reddy Kallam (3):
  tg3: Update copyright
  tg3: Add workaround to restrict 5762 MRRS to 2048
  tg3: Enable PHY reset in MTU change path for 5720

Steffen Klassert (2):
  xfrm: Fix stack-out-of-bounds read on socket policy lookup.
  xfrm: Fix stack-out-of-bounds with misconfigured transport mode policies.

Tom Herbert (2):
  sock: Add sock_owned_by_user_nocheck
  strparser: Call sock_owned_by_user_nocheck

Tommi Rantala (2):
  tipc: error path leak fixes in tipc_enable_bearer()
  tipc: fix tipc_mon_delete() oops in tipc_enable_bearer() error path

Tonghao Zhang (1):
  sctp: Replace use of sockets_allocated with specified macro.

Willem de Bruijn (1):
  skbuff: in skb_copy_ubufs unclone before releasing zerocopy

 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c  |  4 +--
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 14 +-
 drivers/net/ethernet/broadcom/tg3.c  | 19 +++--
 drivers/net/ethernet/broadcom/tg3.h  |  7 -
 drivers/net/ethernet/freescale/fec_main.c|  6 
 drivers/net/phy/micrel.c |  1 +
 drivers/net/phy/phylink.c|  2 ++
 include/net/sock.h   |  5 
 include/net/xfrm.h   |  3 ++
 include/trace/events/tcp.h   | 97 
+
 net/core/skbuff.c|  6 ++--
 net/ipv4/xfrm4_input.c   | 12 +++-
 net/ipv6/ip6_gre.c   | 57 
+-
 net/ipv6/xfrm6_input.c

Re: [RFT net-next v3 0/5] dwmac-meson8b: RGMII clock fixes for Meson8b

2017-12-28 Thread Emiliano Ingrassia

Hi Martin, Hi Dave,

On Thu, Dec 28, 2017 at 11:21:23PM +0100, Martin Blumenstingl wrote:
> Hi Dave,
> 
> please do not apply this series until it got a Tested-by from Emiliano.
> 
> 
> Hi Emiliano,
> 
> you reported [0] that you couldn't get dwmac-meson8b to work on your
> Odroid-C1. With your findings (register dumps, clk_summary output, etc.)
> I think I was able to find a fix: it consists of two patches (which you
> find in this series)
> 
> Unfortunately I don't have any Meson8b boards with RGMII PHY so I could
> only partially test this (I could only check if the clocks were
> calculated correctly when using a dummy 52394Hz input clock instead
> of MPLL2).
> 
> Could you please give this series a try and let me know about the
> results?
> You obviously still need your two "ARM: dts: meson8b" patches which
> - add the "amlogic,meson8b-dwmac" compatible to meson8b.dtsi
> - enable Ethernet on the Odroid-C1
> 
> When testing on Meson8b this also needs a fix for the MPLL clock driver:
> "clk: meson: mpll: use 64-bit maths in params_from_rate", see:
> https://patchwork.kernel.org/patch/10131677/
> 
> 
> I have tested this myself on a Khadas VIM (GXL SoC, internal RMII PHY)
> and a Khadas VIM2 (GXM SoC, external RGMII PHY). Both are still working
> fine (so let's hope that this also fixes your Meson8b issue :)).
> 
> 
> changes since v1 at [1]:
> - changed the subject of the cover-letter to indicate that this is all
>   about the RGMII clock
> - added PATCH #1 which ensures that we don't unnecessarily change the
>   parent clocks in RMII mode (and also makes the code easier to
>   understand)
> - changed subject of PATCH #2 (formerly PATCH #1) to state that this
>   is about the RGMII clock
> - added Jerome's Reviewed-by to PATCH #2 (formerly PATCH #1)
> - replaced PATCH #3 (formerly PATCH #2) with one that sets
>   CLK_SET_RATE_PARENT on the mux and thus re-configures the MPLL2 clock
>   on Meson8b correctly
> 
> changes since v2 at [2]:
> - added PATCH #2 to make the following patch easier
> - Emiliano reported that there's currently another bug in the
>   dwmac-meson8b driver which prevents it from working with RGMII PHYs on
>   Meson8b: bit 10 of the PRG_ETH0 register configures a clock gate
>   (instead of a divide by 5 or divide by 10 clock divider). This has not
>   been visible on GXBB and later due to the input clock which always led
>   to a selection of "divide by 10" (which is done internally in the IP
>   block, but the bit actually means "enable RGMII clock output").
>   PATCH #3 was added to address this issue.
> - the commit message of PATCH #4 and #5 (formerly PATCH #2 and #3) were
>   updated and the patch itself rebased because the m25_div clock was
>   removed with the new PATCH #3 (so some of the statements were not
>   valid anymore)
>

Here is the Ethernet-related part of clk_summary on the Odroid-C1+
with this new series applied:

xtal112400  0 0
 sys_pll00  12  0 0
  cpu_clk   00  12  0 0
 vid_pll00   73200  0 0
 fixed_pll  22  255000  0 0
  mpll2 11   24701  0 0
   c941.ethernet#m250_sel   11   24701  0 0
c941.ethernet#m250_div  11   24701  0 0
 c941.ethernet#fixed_div10  112470  0 0
  c941.ethernet#m25_en  112470  0 0

The Ethernet prg0 register is set to 0x74A1, which should be correct
according to the S805 SoC manual.
However, Ethernet is not yet fully functional.
When I ping the board, tcpdump shows the ARP requests going from the host
to the board, but the host never sees a response.

Since U-Boot sets the prg0 register to 0x7d21, I also tried setting
bit 11. As expected, this did not make any difference.
Another thing we should check is the "Ethernet Memory PD" register (see
S805 manual, sec. 5.4), whose bits 3-2 enable/disable normal Ethernet
operation. However, those bits are already cleared by U-Boot.
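
The two prg0 values mentioned here (0x74A1 under Linux, 0x7d21 under
U-Boot) can be compared bit by bit. The following is a minimal userspace
sketch, not driver code: the helper names are made up for illustration,
and the code only lists which bit positions differ without interpreting
what any of those bits mean.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of any driver): store the positions of
 * all bits that differ between two register values in out[], lowest bit
 * first, and return how many positions were stored.
 */
int prg0_diff_bits(uint32_t a, uint32_t b, int out[32])
{
	uint32_t diff = a ^ b;
	int n = 0;

	for (int bit = 0; bit < 32; bit++) {
		if (diff & (UINT32_C(1) << bit))
			out[n++] = bit;
	}
	return n;
}

/* Print the differing bits of the Linux and U-Boot values quoted above. */
void prg0_print_diff(void)
{
	int bits[32];
	int n = prg0_diff_bits(0x74A1, 0x7d21, bits);

	for (int i = 0; i < n; i++)
		printf("bit %d differs\n", bits[i]);
}
```

For these two values the XOR is 0x0980, so they differ in bits 7, 8 and
11. Bit 11 is the one already tried above; bits 7 and 8 would be the
remaining candidates to look at.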

Thank you for the support.

Best regards,

Emiliano

> 
> [0] 
> http://lists.infradead.org/pipermail/linux-amlogic/2017-December/005596.html
> [1] 
> http://lists.infradead.org/pipermail/linux-amlogic/2017-December/005848.html
> [2] 
> http://lists.infradead.org/pipermail/linux-amlogic/2017-December/005861.html
> 
> 
> Martin Blumenstingl (5):
>   net: stmmac: dwmac-meson8b: only configure the clocks in RGMII mode
>   net: stmmac: dwmac-meson8b: simplify generating the clock names
>   net: stmmac: dwmac-meson8b: fix internal RGMII clock configuration
>   net: stmmac: dwmac-meson8b: fix setting the RGMII clock on Meson8b
>   net: stmmac: dwmac-meson8b: propagate rate

Re: [RFC PATCH bpf-next v2 4/4] error-injection: Support fault injection framework

2017-12-28 Thread Alexei Starovoitov


On 12/27/17 11:51 PM, Masami Hiramatsu wrote:


Then what happens if the user sets an invalid retval for those functions?
Even if we limit the injectable functions, it can cause a problem,

for example,

 obj = func_return_object();
 if (!obj) {
         handling_error...;
 }
 obj->field = x;

In this case, obviously func_return_object() must return NULL if there is
an error, not -ENOMEM. But without the correct retval information, how would
you check that the BPF code doesn't cause trouble?
Currently it seems you are expecting only functions which return an error code.

 ret = func_return_state();
 if (ret < 0) {
         handling_error...;
 }

But how can we distinguish those?

If we have the error range for each function, we can ensure what the
*correct* error code is: NULL, an errno, or any other error number. :)
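
The NULL-versus-errno concern above can be illustrated in userspace. This
is only a sketch under stated assumptions: demo_err_ptr() and
demo_is_err() imitate the kernel's ERR_PTR()/IS_ERR() helpers and are not
the kernel implementations, and all names are invented for this example.
It shows that a caller which only tests for NULL does not notice an
injected -ENOMEM encoded as a pointer.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_ERRNO 4095

/* Userspace imitation of ERR_PTR(): encode a negative errno as a pointer. */
void *demo_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

/* Userspace imitation of IS_ERR(): true for the top 4095 addresses. */
int demo_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-DEMO_MAX_ERRNO;
}

/* A caller written for a function that reports failure by returning NULL. */
int demo_null_check_catches(const void *obj)
{
	return obj == NULL;
}
```

With these helpers, demo_null_check_catches(demo_err_ptr(-ENOMEM)) is 0:
the caller would go on to dereference the bogus pointer, which is the
situation described in the first example above.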


messing up return values may cause problems and range check is
not going to magically help.
The caller may handle only a certain set of errors or interpret
some of them like EBUSY as a signal to retry.
It's plain impossible to make sure that kernel will be functional
after error injection has been made.
Like kmalloc() unconditionally returning NULL will be deadly
for the kernel, hence this patch 4/4 has very limited practical
use. The bpf program needs to make intelligent decisions about when
to return an error and what kind of error to return.
Doing a blanket range check adds a false sense of additional safety.
More so it wastes kilobytes of memory to do this check, hence nack.

Re: [RFC PATCH bpf-next v2 1/4] tracing/kprobe: bpf: Check error injectable event is on function entry

2017-12-28 Thread Alexei Starovoitov


On 12/28/17 12:20 AM, Masami Hiramatsu wrote:

On Wed, 27 Dec 2017 20:32:07 -0800
Alexei Starovoitov  wrote:


On 12/27/17 8:16 PM, Steven Rostedt wrote:

On Wed, 27 Dec 2017 19:45:42 -0800
Alexei Starovoitov  wrote:


I don't think that's the case. My reading of current
trace_kprobe_ftrace() -> arch_check_ftrace_location()
is that it will not be true for old mcount case.


In the old mcount case, you can't use ftrace to return without calling
the function. That is, no modification of the return ip, unless you
created a trampoline that could handle arbitrary stack frames, and
remove them from the stack before returning back to the function.


correct. I was saying that trace_kprobe_ftrace() won't let us do
bpf_override_return with old mcount.


No, trace_kprobe_ftrace() just checks whether the given address will be
managed by ftrace. You can see arch_check_ftrace_location() in kernel/kprobes.c.

FYI, CONFIG_KPROBES_ON_FTRACE depends on DYNAMIC_FTRACE_WITH_REGS, and
DYNAMIC_FTRACE_WITH_REGS doesn't depend on CC_USING_FENTRY.
This means that if you compile the kernel with an old gcc and enable
DYNAMIC_FTRACE, kprobes uses ftrace on the mcount address, which is NOT
the entry point of the target function.


ok. fair enough. I think we can gate the feature to !mcount only.


On the other hand, the IP-changing feature was originally implemented
by kprobes with int3 (sw breakpoint). This means that if you use kprobes
at the correct address (the entry address of the function), you can hijack
the function, as jprobe did.


As for the rest of your arguments, it very much puzzles me that
you claim that this patch is supposed to work based on historical
reasoning whereas you did NOT test it.


I believe that Masami is saying that the modification of the IP from
kprobes has been very well tested. But I'm guessing that you still want
a test case for using kprobes in this particular instance. It's not the
implementation of modifying the IP that you are worried about, but the
implementation of BPF using it in this case. Right?


exactly. No doubt that old code works.
But it doesn't mean that bpf_override_return() will continue to
work in kprobes that are not ftrace based.
I suspect Josef's existing test case will cover this situation.
Probably only special .config is needed to disable ftrace, so
"kprobe on entry but not ftrace" check will kick in.


Right. If you need to test it, you can run Josef's test case without
CONFIG_DYNAMIC_FTRACE.


It should be obvious that the person who submits the patch
must run the tests.


But I didn't get an impression that this situation was tested.
Instead I see only logical reasoning that it's _supposed_ to work.
That's not enough.


OK, so would you just ask me to run samples/bpf ?


Please run Josef's test in the !ftrace setup.

Re: [pull request][for-next V3 00/11] Mellanox, mlx5 E-Switch updates 2017-12-19

2017-12-28 Thread David Miller

From: Saeed Mahameed 
Date: Fri, 29 Dec 2017 01:23:03 +0200

> ==
> This series includes updates for mlx5 E-Switch infrastructures,
> to be merged into net-next and rdma-next trees.
> 
> Mark's patches provide E-Switch refactoring that generalizes the mlx5
> E-Switch vf representors interfaces and data structures. The series is
> mainly focused on moving ethernet (netdev) specific representors logic out
> of E-Switch (eswitch.c) into mlx5e representor module (en_rep.c), which
> provides better separation and allows future support for other types of vf
> representors (e.g. RDMA).
> 
> Gal's patches at the end of this series provide a simple syntax fix and
> two other patches that handle vport ingress/egress ACL steering name
> spaces to be aligned with the Firmware/Hardware specs.
> ===
> 
> V1->V2:
>  - Addressed coding style comments in patches #1 and #7
>  - The series is still based on rc4, as now I see net-next is also @rc4.
> 
> V2->V3:
>  - Fixed compilation warning, reported by Dave.
> 
> Please pull and let me know if there's any problem.

Looks good, pulled, thank you.

Re: [PATCH net-next v6 0/6] net: tcp: sctp: dccp: Replace jprobe usage with trace events

2017-12-28 Thread Masami Hiramatsu

On Thu, 28 Dec 2017 12:06:13 -0500 (EST)
David Miller  wrote:

> From: Masami Hiramatsu 
> Date: Thu, 28 Dec 2017 15:10:00 +0900
> 
> > Changes from v5:
> >   [1/6]: Avoid preprocessor directives in tracepoint macro args
> 
> Patch #1 is not the only patch which has this problem, at a minimum
> patch #5 has it too.

Oops, sorry...

> Please audit the entire series for an issue when it is brought to your
> attention.

Thank you for your kind advice.

> 
> Thank you.


-- 
Masami Hiramatsu

Re: [ovs-dev] Pravin Shelar

2017-12-28 Thread Pravin Shelar

On Wed, Dec 27, 2017 at 10:33 AM, Joe Perches  wrote:
> On Wed, 2017-12-27 at 10:25 -0800, Ben Pfaff wrote:
>> On Wed, Dec 27, 2017 at 04:22:55PM +0100, Julia Lawall wrote:
>> > The email address pshe...@nicira.com listed for Pravin Shelar in
>> > MAINTAINERS (OPENVSWITCH section) seems to bounce.
>>
>> Pravin has used a newer address recently, so I sent out a suggested
>> update (for OVS):
>> https://patchwork.ozlabs.org/patch/853232/
>
> As Pravin is still active with acks but not any authored patches in
> the
> last year, this should still be updated in the linux-kernel's
> MAINTAINERS
> file too.
> ---
> diff --git a/MAINTAINERS b/MAINTAINERS
> index
> a6e86e20761e..5869e5f0b930 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@
> -10137,7 +10137,7 @@ F: drivers/irqchip/irq-ompic.c
>  F: dri
> vers/irqchip/irq-or1k-*
>
>  OPENVSWITCH
> -M: Pravin Shelar  cira.com>
> +M: Pravin Shelar 
>  L: netdev@vge
> r.kernel.org
>  L: d...@openvswitch.org
>  W: http://openvswitch.org

Thanks Joe for the patch, but it is corrupted. I will send an updated patch soon.

Re: iproute2 net-next

2017-12-28 Thread Daniel Borkmann

On 12/26/2017 10:35 AM, Leon Romanovsky wrote:
> On Mon, Dec 25, 2017 at 10:14:26PM -0800, Stephen Hemminger wrote:
>> On Tue, 26 Dec 2017 06:47:43 +0200
>> Leon Romanovsky  wrote:
>>
>>> On Mon, Dec 25, 2017 at 10:49:19AM -0800, Stephen Hemminger wrote:
>>>> David Ahern has agreed to take over managing the net-next branch of
>>>> iproute2.
>>>> The new location is:
>>>>  https://git.kernel.org/pub/scm/linux/kernel/git/dsahern/iproute2-next.git/
>>>>
>>>> In the past, I have accepted new features into iproute2 master branch, but
>>>> am changing the policy so that outside of the merge window (up until -rc1)
>>>> new features will get put into net-next to get some more review and testing
>>>> time. This means that things like the proposed batch streaming mode will
>>>> go through net-next.
>>>
>>> Did you consider creating one shared repo for iproute2 to allow
>>> a multiple-committer workflow?
>>
>> For now having separate trees is best, there is no need for multiple
>> committers the load is very light.
>>
>>> It will be much more convenient for the users to have one place for
>>> master/stable/net-next branches, instead of actually following two
>>> different repositories.
>>
>> If you are doing network development, you already need to deal with
>> multiple repo's on the kernel side so there is no difference.
> 
> I agree with you that one extra "git remote add .." is not so huge and
> all people who develop for the netdev will do it. My concern is about
> Documentation and newcomers, who will have a hard time to find a right
> tree.

I guess it would certainly help to identify the official repo to rebase
against much quicker if it would be under a common group on korg e.g.

  * iproute2/iproute2.git      - for current cycle
  * iproute2/iproute2-next.git - for net-next bits

and also be in line with other tooling (ethtool and others), even if
not as high volume, but it would make it unambiguous right away from
the other, private iproute2 repos on korg, imho. Just a thought.

>>> Example, of such shared repo:
>>> BPF: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/
>>> Bluetooth: 
>>> https://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next.git/
>>> RDMA: https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git/
>>
>> Most of these are high volume or vendor silo'd which is not the case here.
Cheers,
Daniel

[for-next V3 03/11] net/mlx5: E-Switch, Simplify representor load/unload callback API

2017-12-28 Thread Saeed Mahameed

From: Mark Bloch 

In the load() callback for loading representors we don't really need
struct mlx5_eswitch but struct mlx5_core_dev, pass it directly.

In the unload() callback for unloading representors we don't need the
struct mlx5_eswitch argument, remove it.

Signed-off-by: Mark Bloch 
Reviewed-by: Or Gerlitz 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c   | 14 +++---
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.h  |  5 ++---
 drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c |  6 +++---
 3 files changed, 12 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index 4661ef12c18c..6d2219f3acf6 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -967,7 +967,7 @@ static const struct mlx5e_profile mlx5e_rep_profile = {
 /* e-Switch vport representors */
 
 static int
-mlx5e_nic_rep_load(struct mlx5_eswitch *esw, struct mlx5_eswitch_rep *rep)
+mlx5e_nic_rep_load(struct mlx5_core_dev *dev, struct mlx5_eswitch_rep *rep)
 {
struct mlx5e_priv *priv = netdev_priv(rep->netdev);
struct mlx5e_rep_priv *rpriv = priv->ppriv;
@@ -992,7 +992,7 @@ mlx5e_nic_rep_load(struct mlx5_eswitch *esw, struct mlx5_eswitch_rep *rep)
 }
 
 static void
-mlx5e_nic_rep_unload(struct mlx5_eswitch *esw, struct mlx5_eswitch_rep *rep)
+mlx5e_nic_rep_unload(struct mlx5_eswitch_rep *rep)
 {
struct mlx5e_priv *priv = netdev_priv(rep->netdev);
struct mlx5e_rep_priv *rpriv = priv->ppriv;
@@ -1008,7 +1008,7 @@ mlx5e_nic_rep_unload(struct mlx5_eswitch *esw, struct mlx5_eswitch_rep *rep)
 }
 
 static int
-mlx5e_vport_rep_load(struct mlx5_eswitch *esw, struct mlx5_eswitch_rep *rep)
+mlx5e_vport_rep_load(struct mlx5_core_dev *dev, struct mlx5_eswitch_rep *rep)
 {
struct mlx5e_rep_priv *rpriv;
struct net_device *netdev;
@@ -1019,7 +1019,7 @@ mlx5e_vport_rep_load(struct mlx5_eswitch *esw, struct mlx5_eswitch_rep *rep)
if (!rpriv)
return -ENOMEM;
 
-   netdev = mlx5e_create_netdev(esw->dev, &mlx5e_rep_profile, rpriv);
+   netdev = mlx5e_create_netdev(dev, &mlx5e_rep_profile, rpriv);
if (!netdev) {
pr_warn("Failed to create representor netdev for vport %d\n",
rep->vport);
@@ -1044,7 +1044,7 @@ mlx5e_vport_rep_load(struct mlx5_eswitch *esw, struct mlx5_eswitch_rep *rep)
goto err_detach_netdev;
}
 
-   upriv = netdev_priv(mlx5_eswitch_get_uplink_netdev(esw));
+   upriv = netdev_priv(mlx5_eswitch_get_uplink_netdev(dev->priv.eswitch));
err = tc_setup_cb_egdev_register(netdev, mlx5e_setup_tc_block_cb,
 upriv);
if (err)
@@ -1076,7 +1076,7 @@ mlx5e_vport_rep_load(struct mlx5_eswitch *esw, struct mlx5_eswitch_rep *rep)
 }
 
 static void
-mlx5e_vport_rep_unload(struct mlx5_eswitch *esw, struct mlx5_eswitch_rep *rep)
+mlx5e_vport_rep_unload(struct mlx5_eswitch_rep *rep)
 {
struct net_device *netdev = rep->netdev;
struct mlx5e_priv *priv = netdev_priv(netdev);
@@ -1085,7 +1085,7 @@ mlx5e_vport_rep_unload(struct mlx5_eswitch *esw, struct mlx5_eswitch_rep *rep)
struct mlx5e_priv *upriv;
 
unregister_netdev(rep->netdev);
-   upriv = netdev_priv(mlx5_eswitch_get_uplink_netdev(esw));
+   upriv = netdev_priv(mlx5_eswitch_get_uplink_netdev(priv->mdev->priv.eswitch));
tc_setup_cb_egdev_unregister(netdev, mlx5e_setup_tc_block_cb,
 upriv);
mlx5e_rep_neigh_cleanup(rpriv);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
index 9722c2a96090..23808a65889c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
@@ -139,10 +139,9 @@ struct mlx5_esw_sq {
 };
 
 struct mlx5_eswitch_rep {
-   int(*load)(struct mlx5_eswitch *esw,
+   int(*load)(struct mlx5_core_dev *dev,
   struct mlx5_eswitch_rep *rep);
-   void   (*unload)(struct mlx5_eswitch *esw,
-struct mlx5_eswitch_rep *rep);
+   void   (*unload)(struct mlx5_eswitch_rep *rep);
u16vport;
u8 hw_id[ETH_ALEN];
struct net_device  *netdev;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index 26fbc50ddc6d..aa20f51c0a99 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -777,7 +777,7 @@ static
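
The callback change described in this patch can be modeled outside the
kernel. The sketch below is a userspace illustration only: demo_core_dev,
demo_rep and the two callbacks are stand-ins invented here, not the mlx5
types. It shows the shape of the new API, where load() receives the core
device directly and unload() receives only the representor.

```c
/* Stand-in for struct mlx5_core_dev (hypothetical, for illustration). */
struct demo_core_dev {
	int id;
};

/* Stand-in for struct mlx5_eswitch_rep with the new callback signatures. */
struct demo_rep {
	int  (*load)(struct demo_core_dev *dev, struct demo_rep *rep);
	void (*unload)(struct demo_rep *rep);
	int  vport;
	int  loaded_on;	/* id of the device the rep was loaded on, -1 if none */
};

/* load() needs the core device, so it is passed directly. */
int demo_rep_load(struct demo_core_dev *dev, struct demo_rep *rep)
{
	rep->loaded_on = dev->id;
	return 0;
}

/* unload() works from the representor alone. */
void demo_rep_unload(struct demo_rep *rep)
{
	rep->loaded_on = -1;
}
```

A caller fills in .load and .unload once and then invokes rep.load(&dev,
&rep) and rep.unload(&rep); neither call needs an E-Switch handle.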

[for-next V3 04/11] net/mlx5: E-Switch, Move mlx5e only logic outside E-Switch

2017-12-28 Thread Saeed Mahameed

From: Mark Bloch 

In our pursuit to clean up the e-switch sub-module from mlx5e specific code,
we move the functions that insert/remove the flow steering rules that
allow mlx5e representors to send packets directly to VFs into the EN
driver code.

Signed-off-by: Mark Bloch 
Reviewed-by: Or Gerlitz 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c   | 57 +-
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.h  |  9 ++--
 .../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 55 +
 3 files changed, 59 insertions(+), 62 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index 6d2219f3acf6..19edaa155062 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -190,6 +190,59 @@ int mlx5e_attr_get(struct net_device *dev, struct switchdev_attr *attr)
return 0;
 }
 
+static void mlx5e_sqs2vport_stop(struct mlx5_eswitch *esw,
+struct mlx5_eswitch_rep *rep)
+{
+   struct mlx5_esw_sq *esw_sq, *tmp;
+
+   if (esw->mode != SRIOV_OFFLOADS)
+   return;
+
+   list_for_each_entry_safe(esw_sq, tmp, &rep->vport_sqs_list, list) {
+   mlx5_del_flow_rules(esw_sq->send_to_vport_rule);
+   list_del(&esw_sq->list);
+   kfree(esw_sq);
+   }
+}
+
+static int mlx5e_sqs2vport_start(struct mlx5_eswitch *esw,
+struct mlx5_eswitch_rep *rep,
+u16 *sqns_array, int sqns_num)
+{
+   struct mlx5_flow_handle *flow_rule;
+   struct mlx5_esw_sq *esw_sq;
+   int err;
+   int i;
+
+   if (esw->mode != SRIOV_OFFLOADS)
+   return 0;
+
+   for (i = 0; i < sqns_num; i++) {
+   esw_sq = kzalloc(sizeof(*esw_sq), GFP_KERNEL);
+   if (!esw_sq) {
+   err = -ENOMEM;
+   goto out_err;
+   }
+
+   /* Add re-inject rule to the PF/representor sqs */
+   flow_rule = mlx5_eswitch_add_send_to_vport_rule(esw,
+   rep->vport,
+   sqns_array[i]);
+   if (IS_ERR(flow_rule)) {
+   err = PTR_ERR(flow_rule);
+   kfree(esw_sq);
+   goto out_err;
+   }
+   esw_sq->send_to_vport_rule = flow_rule;
+   list_add(&esw_sq->list, &rep->vport_sqs_list);
+   }
+   return 0;
+
+out_err:
+   mlx5e_sqs2vport_stop(esw, rep);
+   return err;
+}
+
 int mlx5e_add_sqs_fwd_rules(struct mlx5e_priv *priv)
 {
struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
@@ -210,7 +263,7 @@ int mlx5e_add_sqs_fwd_rules(struct mlx5e_priv *priv)
sqs[num_sqs++] = c->sq[tc].sqn;
}
 
-   err = mlx5_eswitch_sqs2vport_start(esw, rep, sqs, num_sqs);
+   err = mlx5e_sqs2vport_start(esw, rep, sqs, num_sqs);
kfree(sqs);
 
 out:
@@ -225,7 +278,7 @@ void mlx5e_remove_sqs_fwd_rules(struct mlx5e_priv *priv)
struct mlx5e_rep_priv *rpriv = priv->ppriv;
struct mlx5_eswitch_rep *rep = rpriv->rep;
 
-   mlx5_eswitch_sqs2vport_stop(esw, rep);
+   mlx5e_sqs2vport_stop(esw, rep);
 }
 
 static void mlx5e_rep_neigh_update_init_interval(struct mlx5e_rep_priv *rpriv)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
index 23808a65889c..21b506fd2b67 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
@@ -222,6 +222,9 @@ int mlx5_eswitch_get_vport_config(struct mlx5_eswitch *esw,
 int mlx5_eswitch_get_vport_stats(struct mlx5_eswitch *esw,
 int vport,
 struct ifla_vf_stats *vf_stats);
+struct mlx5_flow_handle *
+mlx5_eswitch_add_send_to_vport_rule(struct mlx5_eswitch *esw, int vport,
+   u32 sqn);
 
 struct mlx5_flow_spec;
 struct mlx5_esw_flow_attr;
@@ -258,12 +261,6 @@ struct mlx5_esw_flow_attr {
struct mlx5e_tc_flow_parse_attr *parse_attr;
 };
 
-int mlx5_eswitch_sqs2vport_start(struct mlx5_eswitch *esw,
-struct mlx5_eswitch_rep *rep,
-u16 *sqns_array, int sqns_num);
-void mlx5_eswitch_sqs2vport_stop(struct mlx5_eswitch *esw,
-struct mlx5_eswitch_rep *rep);
-
 int mlx5_devlink_eswitch_mode_set(struct devlink *devlink, u16 mode);
 int mlx5_devlink_eswitch_mode_get(struct devlink *devlink, u16 *mode);
 int mlx5_devlink_eswitch_inline_mode_set(struct devlink *devlink, u8 mode);
diff --git
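
The start/stop pair moved by this patch follows an all-or-nothing
pattern: if adding the rule for any send queue fails, everything added so
far is torn down again. The following userspace sketch imitates that
pattern with a plain singly linked list. All names are invented for
illustration, the fail_at parameter merely simulates a rule-insertion
failure, and the SRIOV_OFFLOADS mode check of the real functions is left
out.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct mlx5_esw_sq: one node per send queue. */
struct demo_sq {
	struct demo_sq *next;
	unsigned int sqn;
};

/* Free every node on the list and leave the list empty. */
void demo_sqs_stop(struct demo_sq **head)
{
	while (*head) {
		struct demo_sq *sq = *head;

		*head = sq->next;
		free(sq);
	}
}

/*
 * Add one node per queue number. fail_at simulates a rule-insertion
 * failure at that index (pass -1 for no failure). On any failure the
 * nodes added so far are removed again, so the caller sees all or nothing.
 */
int demo_sqs_start(struct demo_sq **head, const unsigned int *sqns, int num,
		   int fail_at)
{
	int err;

	for (int i = 0; i < num; i++) {
		struct demo_sq *sq = calloc(1, sizeof(*sq));

		if (!sq) {
			err = -ENOMEM;
			goto out_err;
		}
		if (i == fail_at) {
			err = -EINVAL;
			free(sq);	/* not on the list yet, free it here */
			goto out_err;
		}
		sq->sqn = sqns[i];
		sq->next = *head;
		*head = sq;
	}
	return 0;

out_err:
	demo_sqs_stop(head);
	return err;
}

/* Count the nodes currently on the list. */
int demo_sqs_count(const struct demo_sq *head)
{
	int n = 0;

	for (; head; head = head->next)
		n++;
	return n;
}
```

Reusing the stop function as the error path, as the patch does with
mlx5e_sqs2vport_stop(), keeps a single place that knows how to undo one
entry.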

[for-next V3 01/11] net/mlx5: E-Switch, Refactor vport representors initialization

2017-12-28 Thread Saeed Mahameed

From: Mark Bloch 

Refactor the init stage of vport representors registration.
vport number and hw id can be assigned by the E-Switch driver and not by
the netdevice driver. While here, make the error path of mlx5_eswitch_init()
a reverse order of the good path, also use kcalloc to allocate an array
instead of kzalloc.

Signed-off-by: Mark Bloch 
Reviewed-by: Or Gerlitz 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c   |  7 
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.c  | 12 +++
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.h  |  2 ++
 .../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 39 +++---
 4 files changed, 41 insertions(+), 19 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index 2c43606c26b5..4661ef12c18c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -1100,17 +1100,12 @@ static void mlx5e_rep_register_vf_vports(struct mlx5e_priv *priv)
struct mlx5_eswitch *esw   = mdev->priv.eswitch;
int total_vfs = MLX5_TOTAL_VPORTS(mdev);
int vport;
-   u8 mac[ETH_ALEN];
-
-   mlx5_query_nic_vport_mac_address(mdev, 0, mac);
 
for (vport = 1; vport < total_vfs; vport++) {
struct mlx5_eswitch_rep rep;
 
rep.load = mlx5e_vport_rep_load;
rep.unload = mlx5e_vport_rep_unload;
-   rep.vport = vport;
-   ether_addr_copy(rep.hw_id, mac);
mlx5_eswitch_register_vport_rep(esw, vport, &rep);
}
 }
@@ -1132,10 +1127,8 @@ void mlx5e_register_vport_reps(struct mlx5e_priv *priv)
struct mlx5_eswitch *esw   = mdev->priv.eswitch;
struct mlx5_eswitch_rep rep;
 
-   mlx5_query_nic_vport_mac_address(mdev, 0, rep.hw_id);
rep.load = mlx5e_nic_rep_load;
rep.unload = mlx5e_nic_rep_unload;
-   rep.vport = FDB_UPLINK_VPORT;
rep.netdev = priv->netdev;
mlx5_eswitch_register_vport_rep(esw, 0, &rep); /* UPLINK PF vport*/
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
index bbb140f517c4..6d4cbdb69823 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
@@ -1644,13 +1644,9 @@ int mlx5_eswitch_init(struct mlx5_core_dev *dev)
goto abort;
}
 
-   esw->offloads.vport_reps =
-   kzalloc(total_vports * sizeof(struct mlx5_eswitch_rep),
-   GFP_KERNEL);
-   if (!esw->offloads.vport_reps) {
-   err = -ENOMEM;
+   err = esw_offloads_init_reps(esw);
+   if (err)
goto abort;
-   }
 
hash_init(esw->offloads.encap_tbl);
hash_init(esw->offloads.mod_hdr_tbl);
@@ -1681,8 +1677,8 @@ int mlx5_eswitch_init(struct mlx5_core_dev *dev)
 abort:
if (esw->work_queue)
destroy_workqueue(esw->work_queue);
+   esw_offloads_cleanup_reps(esw);
kfree(esw->vports);
-   kfree(esw->offloads.vport_reps);
kfree(esw);
return err;
 }
@@ -1696,7 +1692,7 @@ void mlx5_eswitch_cleanup(struct mlx5_eswitch *esw)
 
esw->dev->priv.eswitch = NULL;
destroy_workqueue(esw->work_queue);
-   kfree(esw->offloads.vport_reps);
+   esw_offloads_cleanup_reps(esw);
kfree(esw->vports);
kfree(esw);
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
index 565c8b7a399a..9722c2a96090 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
@@ -197,6 +197,8 @@ struct mlx5_eswitch {
 
 void esw_offloads_cleanup(struct mlx5_eswitch *esw, int nvports);
 int esw_offloads_init(struct mlx5_eswitch *esw, int nvports);
+void esw_offloads_cleanup_reps(struct mlx5_eswitch *esw);
+int esw_offloads_init_reps(struct mlx5_eswitch *esw);
 
 /* E-Switch API */
 int mlx5_eswitch_init(struct mlx5_core_dev *dev);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index 1143d80119bd..7e15854c1087 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -732,6 +732,41 @@ static int esw_offloads_start(struct mlx5_eswitch *esw)
return err;
 }
 
+void esw_offloads_cleanup_reps(struct mlx5_eswitch *esw)
+{
+   kfree(esw->offloads.vport_reps);
+}
+
+int esw_offloads_init_reps(struct mlx5_eswitch *esw)
+{
+   int total_vfs = MLX5_TOTAL_VPORTS(esw->dev);
+   struct mlx5_core_dev *dev = esw->dev;
+   struct mlx5_esw_offload *offloads;
+   struct mlx5_eswitch_rep *rep;
+   u8
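
The commit message mentions switching the array allocation from kzalloc
to kcalloc. The usual reason for preferring an array allocator is that it
can check the element-count multiplication for overflow. A minimal
userspace sketch of that idea follows; demo_zalloc_array() is a
hypothetical helper written for this example, not a kernel function.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Userspace sketch of an overflow-checked, zeroed array allocation.
 * Returns NULL if n * size would not fit in size_t, instead of
 * silently allocating a too-small buffer.
 */
void *demo_zalloc_array(size_t n, size_t size)
{
	if (size != 0 && n > SIZE_MAX / size)
		return NULL;
	return calloc(n, size);
}
```

With an open-coded n * size passed to a plain allocator, an overflowing
product would wrap around and the caller would receive a buffer that is
smaller than it believes.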

[for-next V3 09/11] net/mlx5e: E-Switch, Use the name of static array instead of its address

2017-12-28 Thread Saeed Mahameed

From: Gal Pressman 

Using the address of a static array is the same as using its name (in
this specific use-case), but it's confusing and makes the code less
readable.

Fixes: 1bd27b11c1df ("net/mlx5: Introduce E-switch QoS management")
Fixes: bd77bf1cb595 ("net/mlx5: Add SRIOV VF max rate configuration support")
Signed-off-by: Gal Pressman 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.c | 26 +++
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
index 6d4cbdb69823..cdf65ed8714c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
@@ -1290,7 +1290,7 @@ static int esw_create_tsar(struct mlx5_eswitch *esw)
 
err = mlx5_create_scheduling_element_cmd(dev,
 SCHEDULING_HIERARCHY_E_SWITCH,
-&tsar_ctx,
+tsar_ctx,
 &esw->qos.root_tsar_id);
if (err) {
esw_warn(esw->dev, "E-Switch create TSAR failed (%d)\n", err);
@@ -1333,20 +1333,20 @@ static int esw_vport_enable_qos(struct mlx5_eswitch *esw, int vport_num,
if (vport->qos.enabled)
return -EEXIST;
 
-   MLX5_SET(scheduling_context, &sched_ctx, element_type,
+   MLX5_SET(scheduling_context, sched_ctx, element_type,
 SCHEDULING_CONTEXT_ELEMENT_TYPE_VPORT);
-   vport_elem = MLX5_ADDR_OF(scheduling_context, &sched_ctx,
+   vport_elem = MLX5_ADDR_OF(scheduling_context, sched_ctx,
  element_attributes);
MLX5_SET(vport_element, vport_elem, vport_number, vport_num);
-   MLX5_SET(scheduling_context, &sched_ctx, parent_element_id,
+   MLX5_SET(scheduling_context, sched_ctx, parent_element_id,
 esw->qos.root_tsar_id);
-   MLX5_SET(scheduling_context, &sched_ctx, max_average_bw,
+   MLX5_SET(scheduling_context, sched_ctx, max_average_bw,
 initial_max_rate);
-   MLX5_SET(scheduling_context, &sched_ctx, bw_share, initial_bw_share);
+   MLX5_SET(scheduling_context, sched_ctx, bw_share, initial_bw_share);
 
err = mlx5_create_scheduling_element_cmd(dev,
 SCHEDULING_HIERARCHY_E_SWITCH,
-&sched_ctx,
+sched_ctx,
 &vport->qos.esw_tsar_ix);
if (err) {
esw_warn(esw->dev, "E-Switch create TSAR vport element failed (vport=%d,err=%d)\n",
@@ -1392,22 +1392,22 @@ static int esw_vport_qos_config(struct mlx5_eswitch *esw, int vport_num,
if (!vport->qos.enabled)
return -EIO;
 
-   MLX5_SET(scheduling_context, &sched_ctx, element_type,
+   MLX5_SET(scheduling_context, sched_ctx, element_type,
 SCHEDULING_CONTEXT_ELEMENT_TYPE_VPORT);
-   vport_elem = MLX5_ADDR_OF(scheduling_context, &sched_ctx,
+   vport_elem = MLX5_ADDR_OF(scheduling_context, sched_ctx,
  element_attributes);
MLX5_SET(vport_element, vport_elem, vport_number, vport_num);
-   MLX5_SET(scheduling_context, &sched_ctx, parent_element_id,
+   MLX5_SET(scheduling_context, sched_ctx, parent_element_id,
 esw->qos.root_tsar_id);
-   MLX5_SET(scheduling_context, &sched_ctx, max_average_bw,
+   MLX5_SET(scheduling_context, sched_ctx, max_average_bw,
 max_rate);
-   MLX5_SET(scheduling_context, &sched_ctx, bw_share, bw_share);
+   MLX5_SET(scheduling_context, sched_ctx, bw_share, bw_share);
bitmask |= MODIFY_SCHEDULING_ELEMENT_IN_MODIFY_BITMASK_MAX_AVERAGE_BW;
bitmask |= MODIFY_SCHEDULING_ELEMENT_IN_MODIFY_BITMASK_BW_SHARE;
 
err = mlx5_modify_scheduling_element_cmd(dev,
 SCHEDULING_HIERARCHY_E_SWITCH,
-&sched_ctx,
+sched_ctx,
 vport->qos.esw_tsar_ix,
 bitmask);
if (err) {
-- 
2.13.0
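
The point of this patch, that an array's name and the address of the
array refer to the same location, can be checked in a few lines of
standard C. This is a standalone sketch with invented names; demo_ctx
merely stands in for the context buffers touched by the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* A fixed-size array standing in for the sched_ctx/tsar_ctx buffers. */
static uint32_t demo_ctx[4];

/* The array name and the address of the array point at the same byte. */
int demo_same_address(void)
{
	return (void *)demo_ctx == (void *)&demo_ctx;
}

/* ...but the pointer types differ: one element versus the whole array. */
size_t demo_elem_size(void)
{
	return sizeof(*demo_ctx);	/* size of one uint32_t */
}

size_t demo_array_size(void)
{
	return sizeof(*&demo_ctx);	/* size of all four elements */
}
```

Because the two expressions have different pointer types, they are only
interchangeable where the callee ignores the type, which matches the
commit message's "in this specific use-case" caveat.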

[for-next V3 05/11] net/mlx5: E-Switch, Create a dedicated send to vport rule deletion function

2017-12-28 Thread Saeed Mahameed

From: Mark Bloch 

In order for representors to send packets directly to VFs we use an
E-Switch function which inserts special rules into the HW. For symmetry,
create an E-Switch function that deletes these rules as well.

Signed-off-by: Mark Bloch 
Reviewed-by: Or Gerlitz 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c   | 2 +-
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.h  | 1 +
 drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c | 5 +
 3 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index 19edaa155062..01bf4e3c8afa 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -199,7 +199,7 @@ static void mlx5e_sqs2vport_stop(struct mlx5_eswitch *esw,
return;
 
list_for_each_entry_safe(esw_sq, tmp, &rep->vport_sqs_list, list) {
-   mlx5_del_flow_rules(esw_sq->send_to_vport_rule);
+   mlx5_eswitch_del_send_to_vport_rule(esw_sq->send_to_vport_rule);
list_del(&esw_sq->list);
kfree(esw_sq);
}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
index 21b506fd2b67..9ed401225225 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
@@ -225,6 +225,7 @@ int mlx5_eswitch_get_vport_stats(struct mlx5_eswitch *esw,
 struct mlx5_flow_handle *
 mlx5_eswitch_add_send_to_vport_rule(struct mlx5_eswitch *esw, int vport,
u32 sqn);
+void mlx5_eswitch_del_send_to_vport_rule(struct mlx5_flow_handle *rule);
 
 struct mlx5_flow_spec;
 struct mlx5_esw_flow_attr;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c 
b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index 90a30c51d92e..121609b823c6 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -339,6 +339,11 @@ mlx5_eswitch_add_send_to_vport_rule(struct mlx5_eswitch 
*esw, int vport, u32 sqn
return flow_rule;
 }
 
+void mlx5_eswitch_del_send_to_vport_rule(struct mlx5_flow_handle *rule)
+{
+   mlx5_del_flow_rules(rule);
+}
+
 static int esw_add_fdb_miss_rule(struct mlx5_eswitch *esw)
 {
struct mlx5_flow_act flow_act = {0};
-- 
2.13.0

[for-next V3 11/11] net/mlx5: Separate ingress/egress namespaces for each vport

2017-12-28 Thread Saeed Mahameed

From: Gal Pressman 

Each vport has its own root flow table for the ACL flow tables, and a
root flow table is per namespace; therefore we should create a namespace
for each vport.
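
The per-vport lookup can be pictured with the following userspace sketch.
It is only an illustration under stated assumptions: TOTAL_VPORTS, struct
steering, struct root_ns and both helper names are hypothetical stand-ins,
not the mlx5 definitions.

```c
#include <assert.h>
#include <stddef.h>

#define TOTAL_VPORTS 4 /* hypothetical vport count for this sketch */

enum acl_dir { ACL_EGRESS, ACL_INGRESS };

struct root_ns { int id; };

struct steering {
	/* one root namespace pointer per vport and per direction */
	struct root_ns *egress[TOTAL_VPORTS];
	struct root_ns *ingress[TOTAL_VPORTS];
};

/* Store the ACL root namespace of one vport. */
void set_vport_acl_ns(struct steering *st, enum acl_dir dir, int vport,
		      struct root_ns *ns)
{
	assert(st && vport >= 0 && vport < TOTAL_VPORTS);
	if (dir == ACL_EGRESS)
		st->egress[vport] = ns;
	else
		st->ingress[vport] = ns;
}

/* Return the ACL root namespace of one vport, or NULL if unavailable. */
struct root_ns *get_vport_acl_ns(struct steering *st, enum acl_dir dir,
				 int vport)
{
	if (!st || vport < 0 || vport >= TOTAL_VPORTS)
		return NULL;

	switch (dir) {
	case ACL_EGRESS:
		return st->egress[vport];
	case ACL_INGRESS:
		return st->ingress[vport];
	default:
		return NULL;
	}
}
```

The point of the sketch is that the vport index becomes part of the key, so
an out-of-range vport or an unpopulated slot yields NULL instead of a shared
namespace.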

Fixes: efdc810ba39d ("net/mlx5: Flow steering, Add vport ACL support")
Signed-off-by: Gal Pressman 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.c |  10 +-
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 145 ++
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.h |   4 +-
 include/linux/mlx5/fs.h   |   4 +
 4 files changed, 133 insertions(+), 30 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c 
b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
index cdf65ed8714c..7649e36653d9 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
@@ -867,9 +867,10 @@ static int esw_vport_enable_egress_acl(struct mlx5_eswitch 
*esw,
esw_debug(dev, "Create vport[%d] egress ACL log_max_size(%d)\n",
  vport->vport, MLX5_CAP_ESW_EGRESS_ACL(dev, log_max_ft_size));
 
-   root_ns = mlx5_get_flow_namespace(dev, MLX5_FLOW_NAMESPACE_ESW_EGRESS);
+   root_ns = mlx5_get_flow_vport_acl_namespace(dev, 
MLX5_FLOW_NAMESPACE_ESW_EGRESS,
+   vport->vport);
if (!root_ns) {
-   esw_warn(dev, "Failed to get E-Switch egress flow namespace\n");
+   esw_warn(dev, "Failed to get E-Switch egress flow namespace for 
vport (%d)\n", vport->vport);
return -EOPNOTSUPP;
}
 
@@ -984,9 +985,10 @@ static int esw_vport_enable_ingress_acl(struct 
mlx5_eswitch *esw,
esw_debug(dev, "Create vport[%d] ingress ACL log_max_size(%d)\n",
  vport->vport, MLX5_CAP_ESW_INGRESS_ACL(dev, log_max_ft_size));
 
-   root_ns = mlx5_get_flow_namespace(dev, MLX5_FLOW_NAMESPACE_ESW_INGRESS);
+   root_ns = mlx5_get_flow_vport_acl_namespace(dev, 
MLX5_FLOW_NAMESPACE_ESW_INGRESS,
+   vport->vport);
if (!root_ns) {
-   esw_warn(dev, "Failed to get E-Switch ingress flow 
namespace\n");
+   esw_warn(dev, "Failed to get E-Switch ingress flow namespace 
for vport (%d)\n", vport->vport);
return -EOPNOTSUPP;
}
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c 
b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index 5e786e29f93a..45e75b1010f7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -2014,16 +2014,6 @@ struct mlx5_flow_namespace 
*mlx5_get_flow_namespace(struct mlx5_core_dev *dev,
return &steering->fdb_root_ns->ns;
else
return NULL;
-   case MLX5_FLOW_NAMESPACE_ESW_EGRESS:
-   if (steering->esw_egress_root_ns)
-   return &steering->esw_egress_root_ns->ns;
-   else
-   return NULL;
-   case MLX5_FLOW_NAMESPACE_ESW_INGRESS:
-   if (steering->esw_ingress_root_ns)
-   return &steering->esw_ingress_root_ns->ns;
-   else
-   return NULL;
case MLX5_FLOW_NAMESPACE_SNIFFER_RX:
if (steering->sniffer_rx_root_ns)
return &steering->sniffer_rx_root_ns->ns;
@@ -2054,6 +2044,33 @@ struct mlx5_flow_namespace 
*mlx5_get_flow_namespace(struct mlx5_core_dev *dev,
 }
 EXPORT_SYMBOL(mlx5_get_flow_namespace);
 
+struct mlx5_flow_namespace *mlx5_get_flow_vport_acl_namespace(struct 
mlx5_core_dev *dev,
+ enum 
mlx5_flow_namespace_type type,
+ int vport)
+{
+   struct mlx5_flow_steering *steering = dev->priv.steering;
+
+   if (!steering || vport >= MLX5_TOTAL_VPORTS(dev))
+   return NULL;
+
+   switch (type) {
+   case MLX5_FLOW_NAMESPACE_ESW_EGRESS:
+   if (steering->esw_egress_root_ns &&
+   steering->esw_egress_root_ns[vport])
+   return &steering->esw_egress_root_ns[vport]->ns;
+   else
+   return NULL;
+   case MLX5_FLOW_NAMESPACE_ESW_INGRESS:
+   if (steering->esw_ingress_root_ns &&
+   steering->esw_ingress_root_ns[vport])
+   return &steering->esw_ingress_root_ns[vport]->ns;
+   else
+   return NULL;
+   default:
+   return NULL;
+   }
+}
+
 static struct fs_prio *fs_create_prio(struct mlx5_flow_namespace *ns,
  unsigned int prio, int num_levels)
 {
@@ -2331,13 +2348,41 @@ static void cleanup_root_ns(struct 
mlx5_flow_root_namespace *root_ns)
clean_tree(&root_ns->ns.node);
 }
 
+static

[for-next V3 07/11] net/mlx5: E-Switch, Create generic header struct to be used by representors

2017-12-28 Thread Saeed Mahameed

From: Mark Bloch 

Now that we don't store type dependent data in struct mlx5_eswitch_rep
we can create a generic interface, and representor type.

struct mlx5_eswitch_rep will store an array of interfaces, each
interface is used by a different representor type.

Once we move to a more generic interface, RDMA driver representors can
be added and use the same mechanism as the Ethernet driver
representors.
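
As a rough userspace sketch of the layout described above (an array of
interfaces indexed by representor type): REP_ETH appears in the patch, while
REP_IB, struct rep, rep_register() and rep_get_priv() are hypothetical names
chosen for the illustration.

```c
#include <assert.h>
#include <stddef.h>

enum rep_type { REP_ETH, REP_IB, NUM_REP_TYPES }; /* REP_IB is hypothetical */

struct rep;

struct rep_if {
	int  (*load)(struct rep *rep);
	void (*unload)(struct rep *rep);
	void *priv;  /* owned by the driver that registered the interface */
	int   valid;
};

struct rep {
	int vport;
	struct rep_if rep_if[NUM_REP_TYPES]; /* one slot per representor type */
};

/* Copy a driver-provided interface into the slot for its type. */
void rep_register(struct rep *rep, const struct rep_if *src, enum rep_type type)
{
	assert(type < NUM_REP_TYPES);
	rep->rep_if[type] = *src;
	rep->rep_if[type].valid = 1;
}

/* Return the private data of one representor type, or NULL if unset. */
void *rep_get_priv(struct rep *rep, enum rep_type type)
{
	assert(type < NUM_REP_TYPES);
	return rep->rep_if[type].valid ? rep->rep_if[type].priv : NULL;
}
```

Each driver only ever touches its own slot, which is what would let a second
representor type coexist with the Ethernet one.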

Signed-off-by: Mark Bloch 
Reviewed-by: Or Gerlitz 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c   | 29 -
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.h   |  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c|  9 +--
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.h  | 22 +--
 .../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 70 +++---
 5 files changed, 88 insertions(+), 44 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index 3c74f0599ad3..5b2b673c0b13 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -1086,7 +1086,7 @@ mlx5e_vport_rep_load(struct mlx5_core_dev *dev, struct 
mlx5_eswitch_rep *rep)
 
rpriv->netdev = netdev;
rpriv->rep = rep;
-   rep->priv = rpriv;
+   rep->rep_if[REP_ETH].priv = rpriv;
INIT_LIST_HEAD(&rpriv->vport_sqs_list);
 
err = mlx5e_attach_netdev(netdev_priv(netdev));
@@ -1103,7 +1103,7 @@ mlx5e_vport_rep_load(struct mlx5_core_dev *dev, struct 
mlx5_eswitch_rep *rep)
goto err_detach_netdev;
}
 
-   uplink_rpriv = mlx5_eswitch_get_uplink_priv(dev->priv.eswitch);
+   uplink_rpriv = mlx5_eswitch_get_uplink_priv(dev->priv.eswitch, REP_ETH);
upriv = netdev_priv(uplink_rpriv->netdev);
err = tc_setup_cb_egdev_register(netdev, mlx5e_setup_tc_block_cb,
 upriv);
@@ -1146,7 +1146,8 @@ mlx5e_vport_rep_unload(struct mlx5_eswitch_rep *rep)
struct mlx5e_priv *upriv;
 
unregister_netdev(netdev);
-   uplink_rpriv = mlx5_eswitch_get_uplink_priv(priv->mdev->priv.eswitch);
+   uplink_rpriv = mlx5_eswitch_get_uplink_priv(priv->mdev->priv.eswitch,
+   REP_ETH);
upriv = netdev_priv(uplink_rpriv->netdev);
tc_setup_cb_egdev_unregister(netdev, mlx5e_setup_tc_block_cb,
 upriv);
@@ -1164,11 +1165,11 @@ static void mlx5e_rep_register_vf_vports(struct 
mlx5e_priv *priv)
int vport;
 
for (vport = 1; vport < total_vfs; vport++) {
-   struct mlx5_eswitch_rep rep = {};
+   struct mlx5_eswitch_rep_if rep_if = {};
 
-   rep.load = mlx5e_vport_rep_load;
-   rep.unload = mlx5e_vport_rep_unload;
-   mlx5_eswitch_register_vport_rep(esw, vport, &rep);
+   rep_if.load = mlx5e_vport_rep_load;
+   rep_if.unload = mlx5e_vport_rep_unload;
+   mlx5_eswitch_register_vport_rep(esw, vport, &rep_if, REP_ETH);
}
 }
 
@@ -1180,24 +1181,24 @@ static void mlx5e_rep_unregister_vf_vports(struct 
mlx5e_priv *priv)
int vport;
 
for (vport = 1; vport < total_vfs; vport++)
-   mlx5_eswitch_unregister_vport_rep(esw, vport);
+   mlx5_eswitch_unregister_vport_rep(esw, vport, REP_ETH);
 }
 
 void mlx5e_register_vport_reps(struct mlx5e_priv *priv)
 {
struct mlx5_core_dev *mdev = priv->mdev;
struct mlx5_eswitch *esw   = mdev->priv.eswitch;
+   struct mlx5_eswitch_rep_if rep_if;
struct mlx5e_rep_priv *rpriv;
-   struct mlx5_eswitch_rep rep;
 
rpriv = priv->ppriv;
rpriv->netdev = priv->netdev;
 
-   rep.load = mlx5e_nic_rep_load;
-   rep.unload = mlx5e_nic_rep_unload;
-   rep.priv = rpriv;
+   rep_if.load = mlx5e_nic_rep_load;
+   rep_if.unload = mlx5e_nic_rep_unload;
+   rep_if.priv = rpriv;
INIT_LIST_HEAD(&rpriv->vport_sqs_list);
-   mlx5_eswitch_register_vport_rep(esw, 0, &rep); /* UPLINK PF vport*/
+   mlx5_eswitch_register_vport_rep(esw, 0, &rep_if, REP_ETH); /* UPLINK PF 
vport*/
 
mlx5e_rep_register_vf_vports(priv); /* VFs vports */
 }
@@ -1208,7 +1209,7 @@ void mlx5e_unregister_vport_reps(struct mlx5e_priv *priv)
struct mlx5_eswitch *esw   = mdev->priv.eswitch;
 
mlx5e_rep_unregister_vf_vports(priv); /* VFs vports */
-   mlx5_eswitch_unregister_vport_rep(esw, 0); /* UPLINK PF*/
+   mlx5_eswitch_unregister_vport_rep(esw, 0, REP_ETH); /* UPLINK PF*/
 }
 
 void *mlx5e_alloc_nic_rep_priv(struct mlx5_core_dev *mdev)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.h
index 8db68369367e..e4473a9ebd50 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.h
+++

[pull request][for-next V3 00/11] Mellanox, mlx5 E-Switch updates 2017-12-19

2017-12-28 Thread Saeed Mahameed

Hi Dave and Doug,

==
This series includes updates for mlx5 E-Switch infrastructures,
to be merged into net-next and rdma-next trees.

Mark's patches provide E-Switch refactoring that generalizes the mlx5
E-Switch vf representor interfaces and data structures. The series is
mainly focused on moving ethernet (netdev) specific representor logic out
of E-Switch (eswitch.c) into the mlx5e representor module (en_rep.c), which
provides better separation and allows future support for other types of vf
representors (e.g. RDMA).

Gal's patches at the end of this series provide a simple syntax fix and
two other patches that align the vport ingress/egress ACL steering
namespaces with the Firmware/Hardware specs.
===

V1->V2:
 - Addressed coding style comments in patches #1 and #7
 - The series is still based on rc4, as now I see net-next is also @rc4.

V2->V3:
 - Fixed compilation warning, reported by Dave.

Please pull and let me know if there's any problem.

Thanks,
Saeed.

---

The following changes since commit 1291a0d5049dbc06baaaf66a9ff3f53db493b19b:

  Linux 4.15-rc4 (2017-12-17 18:59:59 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git 
tags/mlx5-shared-4.16-1

for you to fetch changes up to 9b93ab981e3bf62ff95a8cbb6faf652cd400decd:

  net/mlx5: Separate ingress/egress namespaces for each vport (2017-12-29 
00:43:52 +0200)


mlx5-shared-4.16-1

mlx5 shared code for both rdma-next and net-next trees.


Gal Pressman (3):
  net/mlx5e: E-Switch, Use the name of static array instead of its address
  net/mlx5: Fix ingress/egress naming mistake
  net/mlx5: Separate ingress/egress namespaces for each vport

Mark Bloch (8):
  net/mlx5: E-Switch, Refactor vport representors initialization
  net/mlx5: E-Switch, Refactor load/unload of representors
  net/mlx5: E-Switch, Simplify representor load/unload callback API
  net/mlx5: E-Switch, Move mlx5e only logic outside E-Switch
  net/mlx5: E-Switch, Create a dedicated send to vport rule deletion 
function
  net/mlx5e: Move ethernet representors data into separate struct
  net/mlx5: E-Switch, Create generic header struct to be used by 
representors
  net/mlx5e: E-Switch, Move send-to-vport rule struct to en_rep

 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c   | 147 ++
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.h   |  14 ++
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c|  15 +-
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.c  |  48 +++--
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.h  |  45 +++--
 .../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 216 -
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c  | 145 +++---
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.h  |   4 +-
 include/linux/mlx5/fs.h|   4 +
 9 files changed, 424 insertions(+), 214 deletions(-)

[for-next V3 10/11] net/mlx5: Fix ingress/egress naming mistake

2017-12-28 Thread Saeed Mahameed

From: Gal Pressman 

The function names do not match what the functions do; swap the mistaken
ingress/egress names.

Fixes: fba53f7b5719 ("net/mlx5: Introduce mlx5_flow_steering structure")
Signed-off-by: Gal Pressman 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c 
b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index c70fd663a633..5e786e29f93a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -2406,7 +2406,7 @@ static int init_fdb_root_ns(struct mlx5_flow_steering 
*steering)
return PTR_ERR(prio);
 }
 
-static int init_ingress_acl_root_ns(struct mlx5_flow_steering *steering)
+static int init_egress_acl_root_ns(struct mlx5_flow_steering *steering)
 {
struct fs_prio *prio;
 
@@ -2420,7 +2420,7 @@ static int init_ingress_acl_root_ns(struct 
mlx5_flow_steering *steering)
return PTR_ERR_OR_ZERO(prio);
 }
 
-static int init_egress_acl_root_ns(struct mlx5_flow_steering *steering)
+static int init_ingress_acl_root_ns(struct mlx5_flow_steering *steering)
 {
struct fs_prio *prio;
 
-- 
2.13.0

[for-next V3 06/11] net/mlx5e: Move ethernet representors data into separate struct

2017-12-28 Thread Saeed Mahameed

From: Mark Bloch 

Ethernet representors have a need to store data which is applicable
only for them. Create a priv void pointer in struct mlx5_eswitch_rep
and move mlx5e to store the relevant data there. As part of this change
we also initialize rep_if in mlx5e_rep_register_vf_vports() as otherwise the
E-Switch code will copy a priv value which is garbage.

We also rename mlx5_eswitch_get_uplink_netdev() to
mlx5_eswitch_get_uplink_priv() and make it return void *.
This way E-Switch code doesn't need to deal with net devices and
we leave the task of getting it to mlx5e.

Signed-off-by: Mark Bloch 
Reviewed-by: Or Gerlitz 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c   | 58 ++
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.h   |  9 
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c| 14 --
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.h  |  7 +--
 .../ethernet/mellanox/mlx5/core/eswitch_offloads.c |  7 ++-
 5 files changed, 60 insertions(+), 35 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index 01bf4e3c8afa..3c74f0599ad3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -194,11 +194,13 @@ static void mlx5e_sqs2vport_stop(struct mlx5_eswitch *esw,
 struct mlx5_eswitch_rep *rep)
 {
struct mlx5_esw_sq *esw_sq, *tmp;
+   struct mlx5e_rep_priv *rpriv;
 
if (esw->mode != SRIOV_OFFLOADS)
return;
 
-   list_for_each_entry_safe(esw_sq, tmp, &rep->vport_sqs_list, list) {
+   rpriv = mlx5e_rep_to_rep_priv(rep);
+   list_for_each_entry_safe(esw_sq, tmp, &rpriv->vport_sqs_list, list) {
mlx5_eswitch_del_send_to_vport_rule(esw_sq->send_to_vport_rule);
list_del(&esw_sq->list);
kfree(esw_sq);
@@ -210,6 +212,7 @@ static int mlx5e_sqs2vport_start(struct mlx5_eswitch *esw,
 u16 *sqns_array, int sqns_num)
 {
struct mlx5_flow_handle *flow_rule;
+   struct mlx5e_rep_priv *rpriv;
struct mlx5_esw_sq *esw_sq;
int err;
int i;
@@ -217,6 +220,7 @@ static int mlx5e_sqs2vport_start(struct mlx5_eswitch *esw,
if (esw->mode != SRIOV_OFFLOADS)
return 0;
 
+   rpriv = mlx5e_rep_to_rep_priv(rep);
for (i = 0; i < sqns_num; i++) {
esw_sq = kzalloc(sizeof(*esw_sq), GFP_KERNEL);
if (!esw_sq) {
@@ -234,7 +238,7 @@ static int mlx5e_sqs2vport_start(struct mlx5_eswitch *esw,
goto out_err;
}
esw_sq->send_to_vport_rule = flow_rule;
-   list_add(&esw_sq->list, &rep->vport_sqs_list);
+   list_add(&esw_sq->list, &rpriv->vport_sqs_list);
}
return 0;
 
@@ -291,7 +295,7 @@ static void mlx5e_rep_neigh_update_init_interval(struct 
mlx5e_rep_priv *rpriv)
 #endif
unsigned long ipv4_interval = NEIGH_VAR(&arp_tbl.parms,
DELAY_PROBE_TIME);
-   struct net_device *netdev = rpriv->rep->netdev;
+   struct net_device *netdev = rpriv->netdev;
struct mlx5e_priv *priv = netdev_priv(netdev);
 
rpriv->neigh_update.min_interval = min_t(unsigned long, ipv6_interval, 
ipv4_interval);
@@ -312,7 +316,7 @@ static void mlx5e_rep_neigh_stats_work(struct work_struct 
*work)
 {
struct mlx5e_rep_priv *rpriv = container_of(work, struct mlx5e_rep_priv,

neigh_update.neigh_stats_work.work);
-   struct net_device *netdev = rpriv->rep->netdev;
+   struct net_device *netdev = rpriv->netdev;
struct mlx5e_priv *priv = netdev_priv(netdev);
struct mlx5e_neigh_hash_entry *nhe;
 
@@ -408,7 +412,7 @@ static int mlx5e_rep_netevent_event(struct notifier_block 
*nb,
struct mlx5e_rep_priv *rpriv = container_of(nb, struct mlx5e_rep_priv,
neigh_update.netevent_nb);
struct mlx5e_neigh_update_table *neigh_update = &rpriv->neigh_update;
-   struct net_device *netdev = rpriv->rep->netdev;
+   struct net_device *netdev = rpriv->netdev;
struct mlx5e_priv *priv = netdev_priv(netdev);
struct mlx5e_neigh_hash_entry *nhe = NULL;
struct mlx5e_neigh m_neigh = {};
@@ -536,7 +540,7 @@ static int mlx5e_rep_neigh_init(struct mlx5e_rep_priv 
*rpriv)
 static void mlx5e_rep_neigh_cleanup(struct mlx5e_rep_priv *rpriv)
 {
struct mlx5e_neigh_update_table *neigh_update = &rpriv->neigh_update;
-   struct mlx5e_priv *priv = netdev_priv(rpriv->rep->netdev);
+   struct mlx5e_priv *priv = netdev_priv(rpriv->netdev);
 
unregister_netevent_notifier(&neigh_update->netevent_nb);
 
@@ -957,7 +961,7 @@ static int mlx5e_init_rep_rx(struct mlx5e_priv *priv)

[for-next V3 02/11] net/mlx5: E-Switch, Refactor load/unload of representors

2017-12-28 Thread Saeed Mahameed

From: Mark Bloch 

Refactor the load/unload stages for better code reuse.
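
The load-with-rollback shape this refactoring factors out can be sketched in
userspace as follows. struct rep, rep_load(), rep_unload() and the
fail_on_load knob are hypothetical simplifications, not the driver's types.

```c
#include <assert.h>

#define MAX_VPORTS 8 /* hypothetical upper bound for this sketch */

struct rep {
	int valid;
	int loaded;
	int fail_on_load; /* illustration knob: make load fail for this rep */
};

static int rep_load(struct rep *rep)
{
	if (rep->fail_on_load)
		return -1;
	rep->loaded = 1;
	return 0;
}

static void rep_unload(struct rep *rep)
{
	rep->loaded = 0;
}

/* Unload reps [0, nvports) in reverse order, skipping invalid ones. */
void unload_reps(struct rep *reps, int nvports)
{
	int vport;

	for (vport = nvports - 1; vport >= 0; vport--) {
		if (!reps[vport].valid)
			continue;
		rep_unload(&reps[vport]);
	}
}

/* Load reps [0, nvports); on failure undo the ones already loaded. */
int load_reps(struct rep *reps, int nvports)
{
	int vport, err;

	assert(nvports <= MAX_VPORTS);
	for (vport = 0; vport < nvports; vport++) {
		if (!reps[vport].valid)
			continue;
		err = rep_load(&reps[vport]);
		if (err) {
			/* vport is the failing index, so only [0, vport) is undone */
			unload_reps(reps, vport);
			return err;
		}
	}
	return 0;
}
```

Passing the failing index as the count lets one unload helper serve both the
error path and the normal cleanup path.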

Signed-off-by: Mark Bloch 
Reviewed-by: Or Gerlitz 
Signed-off-by: Saeed Mahameed 
---
 .../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 66 +-
 1 file changed, 40 insertions(+), 26 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c 
b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index 7e15854c1087..26fbc50ddc6d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -767,12 +767,47 @@ int esw_offloads_init_reps(struct mlx5_eswitch *esw)
return 0;
 }
 
-int esw_offloads_init(struct mlx5_eswitch *esw, int nvports)
+static void esw_offloads_unload_reps(struct mlx5_eswitch *esw, int nvports)
+{
+   struct mlx5_eswitch_rep *rep;
+   int vport;
+
+   for (vport = nvports - 1; vport >= 0; vport--) {
+   rep = &esw->offloads.vport_reps[vport];
+   if (!rep->valid)
+   continue;
+
+   rep->unload(esw, rep);
+   }
+}
+
+static int esw_offloads_load_reps(struct mlx5_eswitch *esw, int nvports)
 {
struct mlx5_eswitch_rep *rep;
int vport;
int err;
 
+   for (vport = 0; vport < nvports; vport++) {
+   rep = &esw->offloads.vport_reps[vport];
+   if (!rep->valid)
+   continue;
+
+   err = rep->load(esw, rep);
+   if (err)
+   goto err_reps;
+   }
+
+   return 0;
+
+err_reps:
+   esw_offloads_unload_reps(esw, vport);
+   return err;
+}
+
+int esw_offloads_init(struct mlx5_eswitch *esw, int nvports)
+{
+   int err;
+
/* disable PF RoCE so missed packets don't go through RoCE steering */
mlx5_dev_list_lock();
mlx5_remove_dev_by_protocol(esw->dev, MLX5_INTERFACE_PROTOCOL_IB);
@@ -790,25 +825,13 @@ int esw_offloads_init(struct mlx5_eswitch *esw, int 
nvports)
if (err)
goto create_fg_err;
 
-   for (vport = 0; vport < nvports; vport++) {
-   rep = &esw->offloads.vport_reps[vport];
-   if (!rep->valid)
-   continue;
-
-   err = rep->load(esw, rep);
-   if (err)
-   goto err_reps;
-   }
+   err = esw_offloads_load_reps(esw, nvports);
+   if (err)
+   goto err_reps;
 
return 0;
 
 err_reps:
-   for (vport--; vport >= 0; vport--) {
-   rep = &esw->offloads.vport_reps[vport];
-   if (!rep->valid)
-   continue;
-   rep->unload(esw, rep);
-   }
esw_destroy_vport_rx_group(esw);
 
 create_fg_err:
@@ -849,16 +872,7 @@ static int esw_offloads_stop(struct mlx5_eswitch *esw)
 
 void esw_offloads_cleanup(struct mlx5_eswitch *esw, int nvports)
 {
-   struct mlx5_eswitch_rep *rep;
-   int vport;
-
-   for (vport = nvports - 1; vport >= 0; vport--) {
-   rep = &esw->offloads.vport_reps[vport];
-   if (!rep->valid)
-   continue;
-   rep->unload(esw, rep);
-   }
-
+   esw_offloads_unload_reps(esw, nvports);
esw_destroy_vport_rx_group(esw);
esw_destroy_offloads_table(esw);
esw_destroy_offloads_fdb_tables(esw);
-- 
2.13.0

[for-next V3 08/11] net/mlx5e: E-Switch, Move send-to-vport rule struct to en_rep

2017-12-28 Thread Saeed Mahameed

From: Mark Bloch 

Move struct mlx5_esw_sq, which keeps the send-to-vport rule, from the
eswitch code to mlx5e and rename it to better reflect where it belongs.

Signed-off-by: Mark Bloch 
Reviewed-by: Or Gerlitz 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c  | 22 +++---
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.h  |  5 +
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.h |  5 -
 3 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index 5b2b673c0b13..c6a77f8e99a4 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -193,17 +193,17 @@ int mlx5e_attr_get(struct net_device *dev, struct 
switchdev_attr *attr)
 static void mlx5e_sqs2vport_stop(struct mlx5_eswitch *esw,
 struct mlx5_eswitch_rep *rep)
 {
-   struct mlx5_esw_sq *esw_sq, *tmp;
+   struct mlx5e_rep_sq *rep_sq, *tmp;
struct mlx5e_rep_priv *rpriv;
 
if (esw->mode != SRIOV_OFFLOADS)
return;
 
rpriv = mlx5e_rep_to_rep_priv(rep);
-   list_for_each_entry_safe(esw_sq, tmp, &rpriv->vport_sqs_list, list) {
-   mlx5_eswitch_del_send_to_vport_rule(esw_sq->send_to_vport_rule);
-   list_del(&esw_sq->list);
-   kfree(esw_sq);
+   list_for_each_entry_safe(rep_sq, tmp, &rpriv->vport_sqs_list, list) {
+   mlx5_eswitch_del_send_to_vport_rule(rep_sq->send_to_vport_rule);
+   list_del(&rep_sq->list);
+   kfree(rep_sq);
}
 }
 
@@ -213,7 +213,7 @@ static int mlx5e_sqs2vport_start(struct mlx5_eswitch *esw,
 {
struct mlx5_flow_handle *flow_rule;
struct mlx5e_rep_priv *rpriv;
-   struct mlx5_esw_sq *esw_sq;
+   struct mlx5e_rep_sq *rep_sq;
int err;
int i;
 
@@ -222,8 +222,8 @@ static int mlx5e_sqs2vport_start(struct mlx5_eswitch *esw,
 
rpriv = mlx5e_rep_to_rep_priv(rep);
for (i = 0; i < sqns_num; i++) {
-   esw_sq = kzalloc(sizeof(*esw_sq), GFP_KERNEL);
-   if (!esw_sq) {
+   rep_sq = kzalloc(sizeof(*rep_sq), GFP_KERNEL);
+   if (!rep_sq) {
err = -ENOMEM;
goto out_err;
}
@@ -234,11 +234,11 @@ static int mlx5e_sqs2vport_start(struct mlx5_eswitch *esw,
sqns_array[i]);
if (IS_ERR(flow_rule)) {
err = PTR_ERR(flow_rule);
-   kfree(esw_sq);
+   kfree(rep_sq);
goto out_err;
}
-   esw_sq->send_to_vport_rule = flow_rule;
-   list_add(&esw_sq->list, &rpriv->vport_sqs_list);
+   rep_sq->send_to_vport_rule = flow_rule;
+   list_add(&rep_sq->list, &rpriv->vport_sqs_list);
}
return 0;
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.h
index e4473a9ebd50..b9b481f2833a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.h
@@ -133,6 +133,11 @@ struct mlx5e_encap_entry {
int encap_size;
 };
 
+struct mlx5e_rep_sq {
+   struct mlx5_flow_handle *send_to_vport_rule;
+   struct list_head list;
+};
+
 void *mlx5e_alloc_nic_rep_priv(struct mlx5_core_dev *mdev);
 void mlx5e_register_vport_reps(struct mlx5e_priv *priv);
 void mlx5e_unregister_vport_reps(struct mlx5e_priv *priv);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h 
b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
index 91175965df7f..3b481182f13a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
@@ -138,11 +138,6 @@ struct mlx5_eswitch_fdb {
};
 };
 
-struct mlx5_esw_sq {
-   struct mlx5_flow_handle *send_to_vport_rule;
-   struct list_head list;
-};
-
 struct mlx5_eswitch_rep;
 struct mlx5_eswitch_rep_if {
int(*load)(struct mlx5_core_dev *dev,
-- 
2.13.0

Re: [pull request][for-next V2 00/11] Mellanox, mlx5 E-Switch updates 2017-12-19

2017-12-28 Thread Saeed Mahameed

On Thu, Dec 28, 2017 at 12:03 AM, David Miller  wrote:
> From: David Miller 
> Date: Wed, 27 Dec 2017 17:01:22 -0500 (EST)
>
>> Pulled, thank you.
>
> Actually, I had to revert.  Please fix this and resubmit:
>
> drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c: In function 
> ‘esw_offloads_load_reps’:
> drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c:774:2: warning: 
> this ‘for’ clause does not guard... [-Wmisleading-indentation]
>   for (rep_type = 0; rep_type < NUM_REP_TYPES; rep_type++)
>   ^~~
> drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c:776:3: note: 
> ...this statement, but the latter is misleadingly indented as if it is 
> guarded by the ‘for’
>if (err)
>^~

Thanks Dave! This is not just a warning; it is an actual bug. We
will fix it and re-spin soon.
Sorry for any inconvenience.
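
For readers unfamiliar with this class of bug, here is a reduced illustration
of why a brace-less 'for' followed by an error check is more than a style
issue. It is not the code from the series: NUM_TYPES, load_type() and both
load_all_*() functions are hypothetical, and the buggy variant is indented
according to its real control flow.

```c
#include <assert.h>

#define NUM_TYPES 3 /* hypothetical number of representor types */

/* Hypothetical loader: fails for the type given in fail_type. */
static int load_type(int type, int fail_type, int *loaded)
{
	assert(type >= 0 && type < NUM_TYPES);
	if (type == fail_type)
		return -1;
	loaded[type] = 1;
	return 0;
}

/* Buggy shape: without braces only the first statement belongs to the
 * loop, so a failure can be overwritten by a later successful call. */
int load_all_buggy(int fail_type, int *loaded)
{
	int type, err = 0;

	for (type = 0; type < NUM_TYPES; type++)
		err = load_type(type, fail_type, loaded);
	if (err) /* evaluated once, after the loop has finished */
		return err;
	return 0;
}

/* Fixed shape: braces make the check part of every iteration. */
int load_all_fixed(int fail_type, int *loaded)
{
	int type, err;

	for (type = 0; type < NUM_TYPES; type++) {
		err = load_type(type, fail_type, loaded);
		if (err)
			return err;
	}
	return 0;
}
```

With a failure on the first type, the buggy variant still loads the remaining
types and reports success, while the fixed variant stops immediately.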

[RFT net-next v3 2/5] net: stmmac: dwmac-meson8b: simplify generating the clock names

2017-12-28 Thread Martin Blumenstingl

Instead of using a custom buffer, snprintf() and devm_kstrdup() we can
simplify this by using devm_kasprintf().
No functional changes - this just makes the code shorter.
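
The idea behind a kasprintf()-style helper (measure, allocate, format) can be
sketched in userspace as below. alloc_printf() is a hypothetical stand-in: it
is not device-managed like devm_kasprintf(), so the caller has to free() the
result.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Userspace stand-in for the kasprintf() idea: measure, allocate, format. */
char *alloc_printf(const char *fmt, ...)
{
	va_list ap, ap2;
	char *buf;
	int len;

	assert(fmt != NULL);
	va_start(ap, fmt);
	va_copy(ap2, ap);
	len = vsnprintf(NULL, 0, fmt, ap); /* length without the trailing NUL */
	va_end(ap);
	if (len < 0) {
		va_end(ap2);
		return NULL;
	}
	buf = malloc((size_t)len + 1);
	if (buf)
		vsnprintf(buf, (size_t)len + 1, fmt, ap2);
	va_end(ap2);
	return buf;
}
```

Because the buffer is sized from the formatted length, a long device name
cannot be truncated the way it could be with a fixed 32-byte array.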

Signed-off-by: Martin Blumenstingl 
---
 drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c 
b/drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c
index e1d5907e481c..1c14210df465 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c
@@ -86,7 +86,6 @@ static int meson8b_init_rgmii_clk(struct meson8b_dwmac *dwmac)
struct clk_init_data init;
int i, ret;
struct device *dev = &dwmac->pdev->dev;
-   char clk_name[32];
const char *clk_div_parents[1];
const char *mux_parent_names[MUX_CLK_NUM_PARENTS];
static const struct clk_div_table clk_25m_div_table[] = {
@@ -113,8 +112,8 @@ static int meson8b_init_rgmii_clk(struct meson8b_dwmac 
*dwmac)
}
 
/* create the m250_mux */
-   snprintf(clk_name, sizeof(clk_name), "%s#m250_sel", dev_name(dev));
-   init.name = clk_name;
+   init.name = devm_kasprintf(dev, GFP_KERNEL, "%s#m250_sel",
+  dev_name(dev));
init.ops = &clk_mux_ops;
init.flags = 0;
init.parent_names = mux_parent_names;
@@ -132,8 +131,8 @@ static int meson8b_init_rgmii_clk(struct meson8b_dwmac 
*dwmac)
return PTR_ERR(dwmac->m250_mux_clk);
 
/* create the m250_div */
-   snprintf(clk_name, sizeof(clk_name), "%s#m250_div", dev_name(dev));
-   init.name = devm_kstrdup(dev, clk_name, GFP_KERNEL);
+   init.name = devm_kasprintf(dev, GFP_KERNEL, "%s#m250_div",
+  dev_name(dev));
init.ops = &clk_divider_ops;
init.flags = CLK_SET_RATE_PARENT;
clk_div_parents[0] = __clk_get_name(dwmac->m250_mux_clk);
@@ -151,8 +150,8 @@ static int meson8b_init_rgmii_clk(struct meson8b_dwmac 
*dwmac)
return PTR_ERR(dwmac->m250_div_clk);
 
/* create the m25_div */
-   snprintf(clk_name, sizeof(clk_name), "%s#m25_div", dev_name(dev));
-   init.name = devm_kstrdup(dev, clk_name, GFP_KERNEL);
+   init.name = devm_kasprintf(dev, GFP_KERNEL, "%s#m25_div",
+  dev_name(dev));
init.ops = &clk_divider_ops;
init.flags = CLK_IS_BASIC | CLK_SET_RATE_PARENT;
clk_div_parents[0] = __clk_get_name(dwmac->m250_div_clk);
-- 
2.15.1

[RFT net-next v3 4/5] net: stmmac: dwmac-meson8b: fix setting the RGMII clock on Meson8b

2017-12-28 Thread Martin Blumenstingl

Meson8b only supports MPLL2 as clock input. The rate of the MPLL2 clock
set by Odroid-C1's u-boot is close to 500MHz. The exact rate is
500002394Hz, which is calculated in drivers/clk/meson/clk-mpll.c
using the following formula:
DIV_ROUND_UP_ULL((u64)parent_rate * SDM_DEN, (SDM_DEN * n2) + sdm)
Odroid-C1's u-boot configures MPLL2 with the following values:
- SDM_DEN = 16384
- SDM = 1638
- N2 = 5
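
As a worked check of the formula, the sketch below evaluates it in plain C.
The 2550MHz parent rate used in the usage note is an assumption (the message
does not state it), and div_round_up_u64() and mpll_rate() are hypothetical
helper names.

```c
#include <assert.h>
#include <stdint.h>

/* Round-up division, the operation DIV_ROUND_UP_ULL performs. */
static uint64_t div_round_up_u64(uint64_t n, uint64_t d)
{
	assert(d != 0);
	return (n + d - 1) / d;
}

/* rate = ceil(parent_rate * SDM_DEN / (SDM_DEN * n2 + sdm)) */
uint64_t mpll_rate(uint64_t parent_rate, uint32_t sdm_den, uint32_t sdm,
		   uint32_t n2)
{
	uint64_t divisor = (uint64_t)sdm_den * n2 + sdm;

	return div_round_up_u64(parent_rate * sdm_den, divisor);
}
```

Under that assumed parent rate, mpll_rate(2550000000ULL, 16384, 1638, 5)
evaluates to 500002394, i.e. slightly above 500MHz.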

The 250MHz clock (m250_div) inside dwmac-meson8b driver is derived from
the MPLL2 clock. Due to MPLL2 running slightly faster than 500MHz the
common clock framework chooses a divider which is too big to generate
the 250MHz clock (a divider of 2 would be needed, but this is rounded up
to a divider of 3). This breaks the RTL8211F RGMII PHY on Odroid-C1
because it requires a (close to) 25MHz clock.

Round the divider to the closest value to prevent this issue on Meson8b.
This means we'll now end up with a clock rate of 25000120Hz (= 25MHz
plus 120Hz).
This has no effect on the Meson GX SoCs since there fclk_div2 is used as
input clock, which has a rate of 1000MHz (and thus is divisible cleanly
to 250MHz and 25MHz).
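
The effect of the two rounding policies can be reproduced with a simplified
model. This is not the common clock framework's actual divider search, only
the arithmetic it boils down to for this case; both function names are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest divider whose output does not exceed target (round-up policy). */
uint32_t div_round_up_policy(uint64_t parent, uint64_t target)
{
	assert(parent != 0 && target != 0);
	return (uint32_t)((parent + target - 1) / target);
}

/* Divider whose output is nearest to target (round-closest policy). */
uint32_t div_round_closest_policy(uint64_t parent, uint64_t target)
{
	uint32_t up = div_round_up_policy(parent, target);
	uint64_t rate_up, rate_down;

	if (up == 1)
		return 1; /* parent is already at or below target */

	rate_up = parent / up;         /* at or below target */
	rate_down = parent / (up - 1); /* at or above target */

	return (rate_down - target) <= (target - rate_up) ? up - 1 : up;
}
```

For a 500002394Hz input and a 250MHz target the round-up policy yields 3 and
the round-closest policy yields 2 (250001197Hz); for a 1000MHz input both
yield 4.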

Fixes: 566e8251625304 ("net: stmmac: add a glue driver for the Amlogic Meson 8b 
/ GXBB DWMAC")
Reported-by: Emiliano Ingrassia 
Signed-off-by: Martin Blumenstingl 
Reviewed-by: Jerome Brunet 
---
 drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c 
b/drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c
index 7199e8c08536..d06106417063 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c
@@ -139,7 +139,9 @@ static int meson8b_init_rgmii_clk(struct meson8b_dwmac 
*dwmac)
dwmac->m250_div.shift = PRG_ETH0_CLK_M250_DIV_SHIFT;
dwmac->m250_div.width = PRG_ETH0_CLK_M250_DIV_WIDTH;
dwmac->m250_div.hw.init = &init;
-   dwmac->m250_div.flags = CLK_DIVIDER_ONE_BASED | CLK_DIVIDER_ALLOW_ZERO;
+   dwmac->m250_div.flags = CLK_DIVIDER_ONE_BASED |
+   CLK_DIVIDER_ALLOW_ZERO |
+   CLK_DIVIDER_ROUND_CLOSEST;
 
dwmac->m250_div_clk = devm_clk_register(dev, &dwmac->m250_div.hw);
if (WARN_ON(IS_ERR(dwmac->m250_div_clk)))
-- 
2.15.1

[RFT net-next v3 3/5] net: stmmac: dwmac-meson8b: fix internal RGMII clock configuration

2017-12-28 Thread Martin Blumenstingl

While testing the dwmac-meson8b with an RGMII PHY on Meson8b we
discovered that the m25_div is not actually a divider but rather a gate.
This matches with the datasheet which describes bit 10 as "Generate
25MHz clock for PHY". Back when the driver was written it was assumed
that this was a divider (which could divide by 5 or 10) because other
clock bits in the datasheet were documented incorrectly.

Tests have shown that without bit 10 set the RTL8211F RGMII PHY on
Odroid-C1 (using a Meson8b SoC) does not work.
On GXBB and newer SoCs (where the driver was initially tested with RGMII
PHYs) this is not a problem because the input clock is running at 1GHz.
The m250_div clock's biggest possible divider is 7 (3-bit divider, with
value 0 being reserved). Thus we end up with a m250_div of 4 and a
"m25_div" of 10 (= register value 1).

Instead it turns out that the Ethernet IP block seems to have a fixed
"divide by 10" clock internally. This means that bit 10 is a gate clock
which enables the RGMII clock output.

This replaces the "m25_div" clock with a clock gate called "m25_en"
which ensures that we can set this bit to 1 whenever we enable RGMII
mode. This however means that we are now missing a "divide by 10" after
the m250_div (and before our new m25_en), otherwise the common clock
framework thinks that the rate of the m25_en clock is 10-times higher
than it is in the actual hardware. That is solved by adding a
fixed-factor clock which divides the m250_div output by 10.
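
To make the difference between the old and the new clock model concrete, here is a small user-space sketch of the rate calculation. It only models the arithmetic described above and is not driver code; model_old_rate, model_new_rate and MODEL_FIXED_DIV are names invented for this example.

```c
#include <stdint.h>

/* Fixed divide-by-10 inside the Ethernet IP block, as described above. */
#define MODEL_FIXED_DIV 10

/*
 * Old (incorrect) model: bit 10 was treated as a divider that selects
 * divide-by-5 (bit unset) or divide-by-10 (bit set).
 */
uint64_t model_old_rate(uint64_t mux_hz, unsigned int m250_div, int bit10)
{
	return mux_hz / m250_div / (bit10 ? 10 : 5);
}

/*
 * New model: a fixed divide-by-10 followed by a gate controlled by bit 10.
 * A gated-off clock is modelled as 0 Hz.
 */
uint64_t model_new_rate(uint64_t mux_hz, unsigned int m250_div, int bit10)
{
	if (!bit10)
		return 0;

	return mux_hz / m250_div / MODEL_FIXED_DIV;
}
```

With the 1GHz input clock of GXBB and newer, both models give 25MHz for an m250_div of 4 with bit 10 set. In the old model a "divide by 5" setting would have needed an m250_div of 8, which a 3-bit divider cannot provide, so bit 10 always ended up set and the bug stayed hidden.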

Fixes: 566e8251625304 ("net: stmmac: add a glue driver for the Amlogic Meson 8b / GXBB DWMAC")
Reported-by: Emiliano Ingrassia 
Signed-off-by: Martin Blumenstingl 
---
 .../net/ethernet/stmicro/stmmac/dwmac-meson8b.c| 66 +-
 1 file changed, 38 insertions(+), 28 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c 
b/drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c
index 1c14210df465..7199e8c08536 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c
@@ -40,9 +40,7 @@
 #define PRG_ETH0_CLK_M250_DIV_SHIFT7
 #define PRG_ETH0_CLK_M250_DIV_WIDTH3
 
-/* divides the result of m25_sel by either 5 (bit unset) or 10 (bit set) */
-#define PRG_ETH0_CLK_M25_DIV_SHIFT 10
-#define PRG_ETH0_CLK_M25_DIV_WIDTH 1
+#define PRG_ETH0_GENERATE_25M_PHY_CLOCK10
 
 #define PRG_ETH0_INVERTED_RMII_CLK BIT(11)
 #define PRG_ETH0_TX_AND_PHY_REF_CLKBIT(12)
@@ -63,8 +61,11 @@ struct meson8b_dwmac {
struct clk_divider  m250_div;
struct clk  *m250_div_clk;
 
-   struct clk_divider  m25_div;
-   struct clk  *m25_div_clk;
+   struct clk_fixed_factor fixed_div10;
+   struct clk  *fixed_div10_clk;
+
+   struct clk_gate m25_en;
+   struct clk  *m25_en_clk;
 
u32 tx_delay_ns;
 };
@@ -88,11 +89,6 @@ static int meson8b_init_rgmii_clk(struct meson8b_dwmac 
*dwmac)
struct device *dev = &dwmac->pdev->dev;
const char *clk_div_parents[1];
const char *mux_parent_names[MUX_CLK_NUM_PARENTS];
-   static const struct clk_div_table clk_25m_div_table[] = {
-   { .val = 0, .div = 5 },
-   { .val = 1, .div = 10 },
-   { /* sentinel */ },
-   };
 
/* get the mux parents from DT */
for (i = 0; i < MUX_CLK_NUM_PARENTS; i++) {
@@ -149,25 +145,39 @@ static int meson8b_init_rgmii_clk(struct meson8b_dwmac 
*dwmac)
if (WARN_ON(IS_ERR(dwmac->m250_div_clk)))
return PTR_ERR(dwmac->m250_div_clk);
 
-   /* create the m25_div */
-   init.name = devm_kasprintf(dev, GFP_KERNEL, "%s#m25_div",
+   /* create the fixed_div10 */
+   init.name = devm_kasprintf(dev, GFP_KERNEL, "%s#fixed_div10",
   dev_name(dev));
-   init.ops = &clk_divider_ops;
-   init.flags = CLK_IS_BASIC | CLK_SET_RATE_PARENT;
+   init.ops = &clk_fixed_factor_ops;
+   init.flags = CLK_SET_RATE_PARENT;
clk_div_parents[0] = __clk_get_name(dwmac->m250_div_clk);
init.parent_names = clk_div_parents;
init.num_parents = ARRAY_SIZE(clk_div_parents);
 
-   dwmac->m25_div.reg = dwmac->regs + PRG_ETH0;
-   dwmac->m25_div.shift = PRG_ETH0_CLK_M25_DIV_SHIFT;
-   dwmac->m25_div.width = PRG_ETH0_CLK_M25_DIV_WIDTH;
-   dwmac->m25_div.table = clk_25m_div_table;
-   dwmac->m25_div.hw.init = &init;
-   dwmac->m25_div.flags = CLK_DIVIDER_ALLOW_ZERO;
+   dwmac->fixed_div10.mult = 1;
+   dwmac->fixed_div10.div = 10;
+   dwmac->fixed_div10.hw.init = &init;
+
+   dwmac->fixed_div10_clk = devm_clk_register(dev, &dwmac->fixed_div10.hw);
+   if (WARN_ON(IS_ERR(dwmac->fixed_div10_clk)))
+   return PTR_ERR(dwmac->fixed_div10_clk);
+
+   /* create the m25_en */
+   init.name = devm_kasprintf(dev, GFP_KERNEL,

[RFT net-next v3 0/5] dwmac-meson8b: RGMII clock fixes for Meson8b

2017-12-28 Thread Martin Blumenstingl

Hi Dave,

please do not apply this series until it got a Tested-by from Emiliano.


Hi Emiliano,

you reported [0] that you couldn't get dwmac-meson8b to work on your
Odroid-C1. With your findings (register dumps, clk_summary output, etc.)
I think I was able to find a fix: it consists of two patches (which you
find in this series)

Unfortunately I don't have any Meson8b boards with RGMII PHY so I could
only partially test this (I could only check if the clocks were
calculated correctly when using a dummy 52394Hz input clock instead
of MPLL2).

Could you please give this series a try and let me know about the
results?
You obviously still need your two "ARM: dts: meson8b" patches which
- add the "amlogic,meson8b-dwmac" compatible to meson8b.dtsi
- enable Ethernet on the Odroid-C1

When testing on Meson8b this also needs a fix for the MPLL clock driver:
"clk: meson: mpll: use 64-bit maths in params_from_rate", see:
https://patchwork.kernel.org/patch/10131677/


I have tested this myself on a Khadas VIM (GXL SoC, internal RMII PHY)
and a Khadas VIM2 (GXM SoC, external RGMII PHY). Both are still working
fine (so let's hope that this also fixes your Meson8b issue :)).


changes since v1 at [1]:
- changed the subject of the cover-letter to indicate that this is all
  about the RGMII clock
- added PATCH #1 which ensures that we don't unnecessarily change the
  parent clocks in RMII mode (and also makes the code easier to
  understand)
- changed subject of PATCH #2 (formerly PATCH #1) to state that this
  is about the RGMII clock
- added Jerome's Reviewed-by to PATCH #2 (formerly PATCH #1)
- replaced PATCH #3 (formerly PATCH #2) with one that sets
  CLK_SET_RATE_PARENT on the mux and thus re-configures the MPLL2 clock
  on Meson8b correctly

changes since v2 at [2]:
- added PATCH #2 to make the following patch easier
- Emiliano reported that there's currently another bug in the
  dwmac-meson8b driver which prevents it from working with RGMII PHYs on
  Meson8b: bit 10 of the PRG_ETH0 register configures a clock gate
  (instead of a divide by 5 or divide by 10 clock divider). This has not
  been visible on GXBB and later due to the input clock which always led
  to a selection of "divide by 10" (which is done internally in the IP
  block, but the bit actually means "enable RGMII clock output").
  PATCH #3 was added to address this issue.
- the commit message of PATCH #4 and #5 (formerly PATCH #2 and #3) were
  updated and the patch itself rebased because the m25_div clock was
  removed with the new PATCH #3 (so some of the statements were not
  valid anymore)


[0] http://lists.infradead.org/pipermail/linux-amlogic/2017-December/005596.html
[1] http://lists.infradead.org/pipermail/linux-amlogic/2017-December/005848.html
[2] http://lists.infradead.org/pipermail/linux-amlogic/2017-December/005861.html


Martin Blumenstingl (5):
  net: stmmac: dwmac-meson8b: only configure the clocks in RGMII mode
  net: stmmac: dwmac-meson8b: simplify generating the clock names
  net: stmmac: dwmac-meson8b: fix internal RGMII clock configuration
  net: stmmac: dwmac-meson8b: fix setting the RGMII clock on Meson8b
  net: stmmac: dwmac-meson8b: propagate rate changes to the parent clock

 .../net/ethernet/stmicro/stmmac/dwmac-meson8b.c| 119 +++--
 1 file changed, 63 insertions(+), 56 deletions(-)

-- 
2.15.1

[RFT net-next v3 1/5] net: stmmac: dwmac-meson8b: only configure the clocks in RGMII mode

2017-12-28 Thread Martin Blumenstingl

None of the m25_div_clk, m250_div_clk and m250_mux_clk clocks are used
in RMII mode. The m25_div_clk output is routed to the RGMII PHY's "RGMII
clock".
This means that we don't need to configure the clocks in RMII mode. The
driver however did this - with no effect since the clocks are not routed
to the PHY in RMII mode.

While here also rename meson8b_init_clk to meson8b_init_rgmii_clk to
make it easier to understand the code.

Fixes: 566e8251625304 ("net: stmmac: add a glue driver for the Amlogic Meson 8b / GXBB DWMAC")
Signed-off-by: Martin Blumenstingl 
---
 .../net/ethernet/stmicro/stmmac/dwmac-meson8b.c| 46 ++
 1 file changed, 21 insertions(+), 25 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c 
b/drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c
index 4404650b32c5..e1d5907e481c 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c
@@ -81,7 +81,7 @@ static void meson8b_dwmac_mask_bits(struct meson8b_dwmac 
*dwmac, u32 reg,
writel(data, dwmac->regs + reg);
 }
 
-static int meson8b_init_clk(struct meson8b_dwmac *dwmac)
+static int meson8b_init_rgmii_clk(struct meson8b_dwmac *dwmac)
 {
struct clk_init_data init;
int i, ret;
@@ -176,7 +176,6 @@ static int meson8b_init_clk(struct meson8b_dwmac *dwmac)
 static int meson8b_init_prg_eth(struct meson8b_dwmac *dwmac)
 {
int ret;
-   unsigned long clk_rate;
u8 tx_dly_val = 0;
 
switch (dwmac->phy_mode) {
@@ -191,9 +190,6 @@ static int meson8b_init_prg_eth(struct meson8b_dwmac *dwmac)
 
case PHY_INTERFACE_MODE_RGMII_ID:
case PHY_INTERFACE_MODE_RGMII_TXID:
-   /* Generate a 25MHz clock for the PHY */
-   clk_rate = 25 * 1000 * 1000;
-
/* enable RGMII mode */
meson8b_dwmac_mask_bits(dwmac, PRG_ETH0, PRG_ETH0_RGMII_MODE,
PRG_ETH0_RGMII_MODE);
@@ -204,12 +200,24 @@ static int meson8b_init_prg_eth(struct meson8b_dwmac 
*dwmac)
 
meson8b_dwmac_mask_bits(dwmac, PRG_ETH0, PRG_ETH0_TXDLY_MASK,
tx_dly_val << PRG_ETH0_TXDLY_SHIFT);
+
+   ret = clk_prepare_enable(dwmac->m25_div_clk);
+   if (ret) {
+   dev_err(&dwmac->pdev->dev, "failed to enable the PHY clock\n");
+   return ret;
+   }
+
+   /* Generate the 25MHz RGMII clock for the PHY */
+   ret = clk_set_rate(dwmac->m25_div_clk, 25 * 1000 * 1000);
+   if (ret) {
+   clk_disable_unprepare(dwmac->m25_div_clk);
+
+   dev_err(&dwmac->pdev->dev, "failed to set PHY clock\n");
+   return ret;
+   }
break;
 
case PHY_INTERFACE_MODE_RMII:
-   /* Use the rate of the mux clock for the internal RMII PHY */
-   clk_rate = clk_get_rate(dwmac->m250_mux_clk);
-
/* disable RGMII mode -> enables RMII mode */
meson8b_dwmac_mask_bits(dwmac, PRG_ETH0, PRG_ETH0_RGMII_MODE,
0);
@@ -231,20 +239,6 @@ static int meson8b_init_prg_eth(struct meson8b_dwmac 
*dwmac)
return -EINVAL;
}
 
-   ret = clk_prepare_enable(dwmac->m25_div_clk);
-   if (ret) {
-   dev_err(&dwmac->pdev->dev, "failed to enable the PHY clock\n");
-   return ret;
-   }
-
-   ret = clk_set_rate(dwmac->m25_div_clk, clk_rate);
-   if (ret) {
-   clk_disable_unprepare(dwmac->m25_div_clk);
-
-   dev_err(&dwmac->pdev->dev, "failed to set PHY clock\n");
-   return ret;
-   }
-
/* enable TX_CLK and PHY_REF_CLK generator */
meson8b_dwmac_mask_bits(dwmac, PRG_ETH0, PRG_ETH0_TX_AND_PHY_REF_CLK,
PRG_ETH0_TX_AND_PHY_REF_CLK);
@@ -294,7 +288,7 @@ static int meson8b_dwmac_probe(struct platform_device *pdev)
 &dwmac->tx_delay_ns))
dwmac->tx_delay_ns = 2;
 
-   ret = meson8b_init_clk(dwmac);
+   ret = meson8b_init_rgmii_clk(dwmac);
if (ret)
goto err_remove_config_dt;
 
@@ -311,7 +305,8 @@ static int meson8b_dwmac_probe(struct platform_device *pdev)
return 0;
 
 err_clk_disable:
-   clk_disable_unprepare(dwmac->m25_div_clk);
+   if (phy_interface_mode_is_rgmii(dwmac->phy_mode))
+   clk_disable_unprepare(dwmac->m25_div_clk);
 err_remove_config_dt:
stmmac_remove_config_dt(pdev, plat_dat);
 
@@ -322,7 +317,8 @@ static int meson8b_dwmac_remove(struct platform_device 
*pdev)
 {
struct meson8b_dwmac *dwmac = get_stmmac_bsp_priv(&pdev->dev);
 
-   clk_disable_unprepare(dwmac->m25_div_clk);
+   if (phy_interface_mode_is_rgmii(dwmac->phy_mode))
+

[RFT net-next v3 5/5] net: stmmac: dwmac-meson8b: propagate rate changes to the parent clock

2017-12-28 Thread Martin Blumenstingl

On Meson8b the only valid input clock is MPLL2. The bootloader
configures that to run at 52394Hz which cannot be divided evenly
down to 25MHz using the m250_div and m25_div clocks. Currently the
common clock framework chooses a m250_div of 2 - with the internal fixed
"divide by 10" this results in a RGMII clock of 25000120Hz (120Hz above
the requested 25MHz).

Letting the common clock framework propagate the rate changes up to the
parent of m250_mux allows us to get the best possible clock rate. With
this patch the common clock framework calculates a rate of
very-close-to-250MHz (24701Hz to be exact) for the MPLL2 clock
(which is the mux input). Dividing that by 1 (using m250_div) along with
the internal fixed divide-by-10 gives us a RGMII clock of 2470Hz
(which is only 30Hz off the requested 25MHz, compared to 120Hz from
u-boot and the vendor driver).

SoCs from the Meson GX series are not affected by this change because
the input clock is FCLK_DIV2 whose rate cannot be changed (which is fine
since it's running at 1GHz, thus it's a multiple of 250MHz).

Fixes: 566e8251625304 ("net: stmmac: add a glue driver for the Amlogic Meson 8b / GXBB DWMAC")
Suggested-by: Jerome Brunet 
Signed-off-by: Martin Blumenstingl 
---
 drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c 
b/drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c
index d06106417063..9c3cdfef414a 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c
@@ -111,7 +111,7 @@ static int meson8b_init_rgmii_clk(struct meson8b_dwmac 
*dwmac)
init.name = devm_kasprintf(dev, GFP_KERNEL, "%s#m250_sel",
   dev_name(dev));
init.ops = &clk_mux_ops;
-   init.flags = 0;
+   init.flags = CLK_SET_RATE_PARENT;
init.parent_names = mux_parent_names;
init.num_parents = MUX_CLK_NUM_PARENTS;
 
-- 
2.15.1

[PATCH net-next 1/2] update ENA driver to version 1.5.0

2017-12-28 Thread netanel

From: Netanel Belgazal 

This patchset contains two changes:
* Add a robust mechanism for detection of stuck Rx/Tx rings due to
  missed or misrouted MSI-X
* Increase the driver version to 1.5.0

Netanel Belgazal (2):
  net: ena: add detection and recovery mechanism for handling
missed/misrouted MSI-X
  net: ena: increase ena driver version to 1.5.0

 drivers/net/ethernet/amazon/ena/ena_eth_com.c   | 12 +
 drivers/net/ethernet/amazon/ena/ena_eth_com.h   |  2 +
 drivers/net/ethernet/amazon/ena/ena_netdev.c| 68 ++---
 drivers/net/ethernet/amazon/ena/ena_netdev.h|  6 ++-
 drivers/net/ethernet/amazon/ena/ena_regs_defs.h |  2 +
 5 files changed, 82 insertions(+), 8 deletions(-)

-- 
2.7.3.AMZN

[PATCH net-next 2/2] net: ena: increase ena driver version to 1.5.0

2017-12-28 Thread netanel

From: Netanel Belgazal 

Signed-off-by: Netanel Belgazal 
---
 drivers/net/ethernet/amazon/ena/ena_netdev.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.h 
b/drivers/net/ethernet/amazon/ena/ena_netdev.h
index 734ff2e84494..f1972b5ab650 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.h
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.h
@@ -44,7 +44,7 @@
 #include "ena_eth_com.h"
 
 #define DRV_MODULE_VER_MAJOR   1
-#define DRV_MODULE_VER_MINOR   3
+#define DRV_MODULE_VER_MINOR   5
 #define DRV_MODULE_VER_SUBMINOR 0
 
 #define DRV_MODULE_NAME"ena"
-- 
2.7.3.AMZN

[PATCH net-next 1/2] net: ena: add detection and recovery mechanism for handling missed/misrouted MSI-X

2017-12-28 Thread netanel

From: Netanel Belgazal 

Add a mechanism for detecting stuck Rx/Tx rings due to missed or
misrouted interrupts.
Check whether there are unhandled completion descriptors before the
first MSI-X interrupt has arrived.
The check is per queue and per interrupt vector.
Once such a condition is detected, a driver and device reset is scheduled.
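
The detection described above can be sketched as a small stand-alone model of the Rx-side check. It follows the logic of check_for_rx_interrupt_queue() in the diff below, but the struct, the function name and the threshold value are assumptions for illustration only; the actual value of ENA_MAX_NO_INTERRUPT_ITERATIONS is not shown in this patch excerpt.

```c
#include <stdbool.h>

/* Hypothetical threshold; the real driver constant is not shown here. */
#define MODEL_MAX_NO_INTERRUPT_ITERATIONS 3

struct model_ring {
	bool first_interrupt;			/* set by the interrupt handler */
	bool cq_has_work;			/* unhandled completion descriptors pending */
	unsigned int no_interrupt_event_cnt;	/* periodic checks that found pending work */
};

/* Returns true when a reset should be scheduled for this ring. */
bool model_check_rx_ring(struct model_ring *ring)
{
	/* An interrupt has been seen at least once: nothing to do. */
	if (ring->first_interrupt)
		return false;

	/* No pending completions: the missing interrupt is not suspicious. */
	if (!ring->cq_has_work)
		return false;

	ring->no_interrupt_event_cnt++;

	return ring->no_interrupt_event_cnt == MODEL_MAX_NO_INTERRUPT_ITERATIONS;
}
```

In this model the check is expected to run periodically. A ring that has pending completions but has never seen an interrupt trips the reset on the third consecutive check, while a ring that has already received its first interrupt is never flagged.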

Signed-off-by: Netanel Belgazal 
---
 drivers/net/ethernet/amazon/ena/ena_eth_com.c   | 12 +
 drivers/net/ethernet/amazon/ena/ena_eth_com.h   |  2 +
 drivers/net/ethernet/amazon/ena/ena_netdev.c| 68 ++---
 drivers/net/ethernet/amazon/ena/ena_netdev.h|  4 ++
 drivers/net/ethernet/amazon/ena/ena_regs_defs.h |  2 +
 5 files changed, 81 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/amazon/ena/ena_eth_com.c 
b/drivers/net/ethernet/amazon/ena/ena_eth_com.c
index b11e573ad57a..04ed2b57ff20 100644
--- a/drivers/net/ethernet/amazon/ena/ena_eth_com.c
+++ b/drivers/net/ethernet/amazon/ena/ena_eth_com.c
@@ -504,3 +504,15 @@ int ena_com_tx_comp_req_id_get(struct ena_com_io_cq 
*io_cq, u16 *req_id)
 
return 0;
 }
+
+bool ena_com_cq_empty(struct ena_com_io_cq *io_cq)
+{
+   struct ena_eth_io_rx_cdesc_base *cdesc;
+
+   cdesc = ena_com_get_next_rx_cdesc(io_cq);
+   if (cdesc)
+   return false;
+   else
+   return true;
+}
+
diff --git a/drivers/net/ethernet/amazon/ena/ena_eth_com.h 
b/drivers/net/ethernet/amazon/ena/ena_eth_com.h
index bb53c3a4f8e9..2f7657227cfe 100644
--- a/drivers/net/ethernet/amazon/ena/ena_eth_com.h
+++ b/drivers/net/ethernet/amazon/ena/ena_eth_com.h
@@ -88,6 +88,8 @@ int ena_com_add_single_rx_desc(struct ena_com_io_sq *io_sq,
 
 int ena_com_tx_comp_req_id_get(struct ena_com_io_cq *io_cq, u16 *req_id);
 
+bool ena_com_cq_empty(struct ena_com_io_cq *io_cq);
+
 static inline void ena_com_unmask_intr(struct ena_com_io_cq *io_cq,
   struct ena_eth_io_intr_reg *intr_reg)
 {
diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c 
b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index 97c5a89a9cf7..a6f283232cb7 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -158,6 +158,8 @@ static void ena_init_io_rings_common(struct ena_adapter 
*adapter,
ring->per_napi_packets = 0;
ring->per_napi_bytes = 0;
ring->cpu = 0;
+   ring->first_interrupt = false;
+   ring->no_interrupt_event_cnt = 0;
u64_stats_init(&ring->syncp);
 }
 
@@ -1274,6 +1276,9 @@ static irqreturn_t ena_intr_msix_io(int irq, void *data)
 {
struct ena_napi *ena_napi = data;
 
+   ena_napi->tx_ring->first_interrupt = true;
+   ena_napi->rx_ring->first_interrupt = true;
+
napi_schedule_irqoff(&ena_napi->napi);
 
return IRQ_HANDLED;
@@ -2648,8 +2653,32 @@ static void ena_fw_reset_device(struct work_struct *work)
rtnl_unlock();
 }
 
-static int check_missing_comp_in_queue(struct ena_adapter *adapter,
-  struct ena_ring *tx_ring)
+static int check_for_rx_interrupt_queue(struct ena_adapter *adapter,
+   struct ena_ring *rx_ring)
+{
+   if (likely(rx_ring->first_interrupt))
+   return 0;
+
+   if (ena_com_cq_empty(rx_ring->ena_com_io_cq))
+   return 0;
+
+   rx_ring->no_interrupt_event_cnt++;
+
+   if (rx_ring->no_interrupt_event_cnt == ENA_MAX_NO_INTERRUPT_ITERATIONS) {
+   netif_err(adapter, rx_err, adapter->netdev,
+ "Potential MSIX issue on Rx side Queue = %d. Reset the device\n",
+ rx_ring->qid);
+   adapter->reset_reason = ENA_REGS_RESET_MISS_INTERRUPT;
+   smp_mb__before_atomic();
+   set_bit(ENA_FLAG_TRIGGER_RESET, &adapter->flags);
+   return -EIO;
+   }
+
+   return 0;
+}
+
+static int check_missing_comp_in_tx_queue(struct ena_adapter *adapter,
+ struct ena_ring *tx_ring)
 {
struct ena_tx_buffer *tx_buf;
unsigned long last_jiffies;
@@ -2659,8 +2688,27 @@ static int check_missing_comp_in_queue(struct 
ena_adapter *adapter,
for (i = 0; i < tx_ring->ring_size; i++) {
tx_buf = &tx_ring->tx_buffer_info[i];
last_jiffies = tx_buf->last_jiffies;
-   if (unlikely(last_jiffies &&
-time_is_before_jiffies(last_jiffies + 
adapter->missing_tx_completion_to))) {
+
+   if (last_jiffies == 0)
+   /* no pending Tx at this location */
+   continue;
+
+   if (unlikely(!tx_ring->first_interrupt && 
time_is_before_jiffies(last_jiffies +
+2 * adapter->missing_tx_completion_to))) {
+   /* If after graceful period interrupt is still not
+* received, we schedule a reset
+

[PATCH net 3/3] net: ena: invoke netif_carrier_off() only after netdev registered

2017-12-28 Thread netanel

From: Netanel Belgazal 

netif_carrier_off() should be called only after the netdev is registered.
Move the call to after the registration.

Signed-off-by: Netanel Belgazal 
---
 drivers/net/ethernet/amazon/ena/ena_netdev.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c 
b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index fbe21a817bd8..ee50c56765a4 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -3276,14 +3276,14 @@ static int ena_probe(struct pci_dev *pdev, const struct 
pci_device_id *ent)
 
memcpy(adapter->netdev->perm_addr, adapter->mac_addr, netdev->addr_len);
 
-   netif_carrier_off(netdev);
-
rc = register_netdev(netdev);
if (rc) {
dev_err(&pdev->dev, "Cannot register net device\n");
goto err_rss;
}
 
+   netif_carrier_off(netdev);
+
INIT_WORK(&adapter->reset_task, ena_fw_reset_device);
 
adapter->last_keep_alive_jiffies = jiffies;
-- 
2.7.3.AMZN

[PATCH net 2/3] net: ena: fix error handling in ena_down() sequence

2017-12-28 Thread netanel

From: Netanel Belgazal 

ENA admin command queue errors are not handled as part of ena_down().
As a result, in case of an error the admin queue transitions to a
non-running state and aborts all subsequent commands, including those
coming from ena_up(). A reset scheduled by the driver from the timer
service context would not proceed because it shares rtnl with
ena_up()/ena_down().

Signed-off-by: Netanel Belgazal 
---
 drivers/net/ethernet/amazon/ena/ena_netdev.c | 19 +--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c 
b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index 6fb28fd43eb3..fbe21a817bd8 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -75,6 +75,9 @@ static struct workqueue_struct *ena_wq;
 MODULE_DEVICE_TABLE(pci, ena_pci_tbl);
 
 static int ena_rss_init_default(struct ena_adapter *adapter);
+static void check_for_admin_com_state(struct ena_adapter *adapter);
+static void ena_destroy_device(struct ena_adapter *adapter);
+static int ena_restore_device(struct ena_adapter *adapter);
 
 static void ena_tx_timeout(struct net_device *dev)
 {
@@ -1884,6 +1887,17 @@ static int ena_close(struct net_device *netdev)
if (test_bit(ENA_FLAG_DEV_UP, &adapter->flags))
ena_down(adapter);
 
+   /* Check for device status and issue reset if needed*/
+   check_for_admin_com_state(adapter);
+   if (unlikely(test_bit(ENA_FLAG_TRIGGER_RESET, &adapter->flags))) {
+   netif_err(adapter, ifdown, adapter->netdev,
+ "Destroy failure, restarting device\n");
+   ena_dump_stats_to_dmesg(adapter);
+   /* rtnl lock already obtained in dev_ioctl() layer */
+   ena_destroy_device(adapter);
+   ena_restore_device(adapter);
+   }
+
return 0;
 }
 
@@ -2544,11 +2558,12 @@ static void ena_destroy_device(struct ena_adapter 
*adapter)
 
ena_com_set_admin_running_state(ena_dev, false);
 
-   ena_close(netdev);
+   if (test_bit(ENA_FLAG_DEV_UP, &adapter->flags))
+   ena_down(adapter);
 
/* Before releasing the ENA resources, a device reset is required.
 * (to prevent the device from accessing them).
-* In case the reset flag is set and the device is up, ena_close
+* In case the reset flag is set and the device is up, ena_down()
 * already perform the reset, so it can be skipped.
 */
if (!(test_bit(ENA_FLAG_TRIGGER_RESET, &adapter->flags) && dev_up))
-- 
2.7.3.AMZN

[PATCH net 1/3] net: ena: unmask MSI-X only after device initialization is completed

2017-12-28 Thread netanel

From: Netanel Belgazal 

Under certain conditions an MSI-X interrupt might arrive right after it
was unmasked in ena_up(). There is a chance it would be processed by
the driver before the device's ENA_FLAG_DEV_UP flag is set. In such a
case the interrupt is ignored.
The ENA device operates in auto-masked mode, so ignoring the
interrupt leaves it masked for good.
Move the unmasking of the interrupt to be the last step in ena_up().
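
The ordering problem can be illustrated with a tiny single-threaded model. It assumes, as the description above implies, that the device masks the interrupt on delivery and that the driver only re-arms it when the interrupt is actually handled. All names here (model_dev, model_irq, model_up_old_order, model_up_new_order) are invented for this sketch and do not exist in the driver.

```c
#include <stdbool.h>

struct model_dev {
	bool dev_up;	/* stands in for ENA_FLAG_DEV_UP */
	bool masked;	/* interrupt masked at the device */
};

/* Deliver one interrupt: the device auto-masks, the driver re-arms only if up. */
static void model_irq(struct model_dev *d)
{
	if (d->masked)
		return;

	d->masked = true;		/* auto-mask on delivery */
	if (d->dev_up)
		d->masked = false;	/* handled, so the interrupt is re-armed */
}

/* Old order: unmask first, and an interrupt arrives before the flag is set. */
bool model_up_old_order(void)
{
	struct model_dev d = { .dev_up = false, .masked = true };

	d.masked = false;
	model_irq(&d);
	d.dev_up = true;

	return !d.masked;	/* true if interrupts can still be delivered */
}

/* New order: set the flag first, unmask as the very last step. */
bool model_up_new_order(void)
{
	struct model_dev d = { .dev_up = false, .masked = true };

	d.dev_up = true;
	d.masked = false;
	model_irq(&d);

	return !d.masked;
}
```

In this model the old order leaves the interrupt masked after the early interrupt is ignored, while the new order keeps it deliverable.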

Signed-off-by: Netanel Belgazal 
---
 drivers/net/ethernet/amazon/ena/ena_netdev.c | 26 +-
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c 
b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index 97c5a89a9cf7..6fb28fd43eb3 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -1565,7 +1565,7 @@ static int ena_rss_configure(struct ena_adapter *adapter)
 
 static int ena_up_complete(struct ena_adapter *adapter)
 {
-   int rc, i;
+   int rc;
 
rc = ena_rss_configure(adapter);
if (rc)
@@ -1584,17 +1584,6 @@ static int ena_up_complete(struct ena_adapter *adapter)
 
ena_napi_enable_all(adapter);
 
-   /* Enable completion queues interrupt */
-   for (i = 0; i < adapter->num_queues; i++)
-   ena_unmask_interrupt(&adapter->tx_ring[i],
-   &adapter->rx_ring[i]);
-
-   /* schedule napi in case we had pending packets
-* from the last time we disable napi
-*/
-   for (i = 0; i < adapter->num_queues; i++)
-   napi_schedule(&adapter->ena_napi[i].napi);
-
return 0;
 }
 
@@ -1731,7 +1720,7 @@ static int ena_create_all_io_rx_queues(struct ena_adapter 
*adapter)
 
 static int ena_up(struct ena_adapter *adapter)
 {
-   int rc;
+   int rc, i;
 
netdev_dbg(adapter->netdev, "%s\n", __func__);
 
@@ -1774,6 +1763,17 @@ static int ena_up(struct ena_adapter *adapter)
 
set_bit(ENA_FLAG_DEV_UP, &adapter->flags);
 
+   /* Enable completion queues interrupt */
+   for (i = 0; i < adapter->num_queues; i++)
+   ena_unmask_interrupt(&adapter->tx_ring[i],
+   &adapter->rx_ring[i]);
+
+   /* schedule napi in case we had pending packets
+* from the last time we disable napi
+*/
+   for (i = 0; i < adapter->num_queues; i++)
+   napi_schedule(&adapter->ena_napi[i].napi);
+
return rc;
 
 err_up:
-- 
2.7.3.AMZN

[PATCH net 0/3] bug fixes for ENA Ethernet driver

2017-12-28 Thread netanel

From: Netanel Belgazal 



This patchset contains 3 bug fixes:
* handle rare race condition during MSI-X initialization
* fix error processing in ena_down()
* call netif_carrier_off() only after netdev is registered

Netanel Belgazal (3):
  net: ena: unmask MSI-X only after device initialization is completed
  net: ena: fix error handling in ena_down() sequence
  net: ena: invoke netif_carrier_off() only after netdev registered

 drivers/net/ethernet/amazon/ena/ena_netdev.c | 49 ++--
 1 file changed, 32 insertions(+), 17 deletions(-)

-- 
2.7.3.AMZN

Re: pull-request: bpf-next 2017-12-28

2017-12-28 Thread Daniel Borkmann

On 12/28/2017 02:41 AM, David Miller wrote:
> From: Daniel Borkmann 
> Date: Thu, 28 Dec 2017 01:18:21 +0100
> 
>> The following pull-request contains BPF updates for your *net-next*
>> tree.
> 
> Pulled.

Thanks!

> Any progress on those tests failing on strict alignment architectures?

Sorry for the delay, was swamped right before Christmas break; I'm back
from vacation 2nd week of Jan, so I'll get right to it then.

Thanks,
Daniel

Re: [PATCH net-next 2/2] l2tp: add peer_offset parameter

2017-12-28 Thread Guillaume Nault

On Thu, Dec 28, 2017 at 07:23:48PM +0100, Lorenzo Bianconi wrote:
> On Dec 28, Guillaume Nault wrote:
> > After a quick review of L2TPv3 and pseudowires RFCs, I still don't see
> > how adding some padding between the L2TPv3 header and the payload could
> > constitute a valid frame. Of course, the base feature is already there,
> > but after a quick test, it looks like the padding bits aren't
> > initialised and leak memory.
> 
> Do you mean for L2TPv2 or L2TPv3? For L2TPv3 offset/peer_offset are 
> initialized
> in l2tp_nl_cmd_session_create()
>
That's the offsets stored in the l2tp_session_cfg structure. But I was
talking about the xmit path: l2tp_build_l2tpv3_header() doesn't
initialise the padding between the header and the payload. So when
someone activates this option, then every transmitted packet leaks
memory on the wire.
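
To illustrate the kind of fix this would need, here is a minimal user-space sketch of a header builder that zeroes the padding between the header and the payload. It is not the kernel's l2tp_build_l2tpv3_header(); the function name model_build_header and its parameters are made up for the example.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy hdr_len header bytes into buf, then append `offset` zeroed padding
 * bytes. Returns the total number of bytes written, or 0 if buf is too small.
 */
size_t model_build_header(unsigned char *buf, size_t buf_len,
			  const unsigned char *hdr, size_t hdr_len,
			  size_t offset)
{
	if (hdr_len > buf_len || offset > buf_len - hdr_len)
		return 0;

	memcpy(buf, hdr, hdr_len);

	/* Explicitly clear the padding so no stale memory reaches the wire. */
	memset(buf + hdr_len, 0, offset);

	return hdr_len + offset;
}
```

Without the memset() the padding bytes would keep whatever the buffer held before, which is the leak described above.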

> Setting session data offset is already supported in L2TP kernel module
> (and could be already used by userspace applications);
> for L2TPv2 there is an optional 16-bit value in the header while for L2TPv3
> the offset is configured by userspace.
> At the moment the kernel (for L2TPv3) uses offset for both tx and rx side.
> Userspace (iproute2) allows to distinguish tx offset (offset) from rx one
> (peer_offset) but since the rx part is not handled at the moment
> (I fixed peer_offset support in iproute2, I have not sent the patch upstream 
> yet, attached below)
> this leads to a misalignment between tunnel endpoints.
> You can easily reproduce the issue using this setup (and the below patch for 
> iproute2):
> 
> ip l2tp add tunnel local  remote  tunnel_id  peer_tunnel_id 
>  udp_sport  udp_dport 
> ip l2tp add tunnel local  remote  tunnel_id  peer_tunnel_id 
>  udp_sport  udp_dport 
> 
> ip l2tp add session name l2tp0 tunnel_id  session_id  
> peer_session_id  offset 8 peer_offset 16
> ip l2tp add session name l2tp0 tunnel_id  session_id  
> peer_session_id  offset 16 peer_offset 8
> 
Yes, I'm well aware of that. And thanks for having worked on a full
solution including iproute2. But does one really need to set
asymmetrical offset values? It doesn't look wrong to require setting the
same value on both sides. Other options need this, like "l2spec_type".

Here we have an option that:
  * creates invalid packets (AFAIK),
  * is buggy and leaks memory on the network,
  * doesn't seem to have any use case (even the manpage
says "This is hardly ever used").

So I'm sorry, but I don't see the point in expanding this option to
allow even stranger setups. If there's a use case, then fine.
Otherwise, let's just acknowledge that the "peer_offset" option of
iproute2 is a noop (and maybe remove it from the manpage).

And the kernel "offset" option needs to be fixed. Actually, I wouldn't
mind if it was converted to be a noop, or even rejected. L2TP already
has its share of unused features that complicate the code and hamper
evolution and bug fixing. As I said earlier, if it's buggy, unused and
can't even produce valid packets, then why bother with it?

But that's just my point of view. James, do you have an opinion on
this?

Re: [PATCH net] skbuff: in skb_copy_ubufs unclone before releasing zerocopy

2017-12-28 Thread David Miller

From: Willem de Bruijn 
Date: Thu, 28 Dec 2017 12:38:13 -0500

> From: Willem de Bruijn 
> 
> skb_copy_ubufs must unclone before it is safe to modify its
> skb_shared_info with skb_zcopy_clear.
> 
> Commit b90ddd568792 ("skbuff: skb_copy_ubufs must release uarg even
> without user frags") ensures that all skbs release their zerocopy
> state, even those without frags.
> 
> But I forgot an edge case where such an skb arrives that is cloned.
> 
> The stack does not build such packets. Vhost/tun skbs have their
> frags orphaned before cloning. TCP skbs only attach zerocopy state
> when a frag is added.
> 
> But if TCP packets can be trimmed or linearized, this might occur.
> Tracing the code I found no instance so far (e.g., skb_linearize
> ends up calling skb_zcopy_clear if !skb->data_len).
> 
> Still, it is non-obvious that no path exists. And it is fragile to
> rely on this.
> 
> Fixes: b90ddd568792 ("skbuff: skb_copy_ubufs must release uarg even without user frags")
> Signed-off-by: Willem de Bruijn 

Applied and queued up for -stable.

Re: [PATCH net 0/2] strparser: Fix lockdep issue

2017-12-28 Thread David Miller

From: Tom Herbert 
Date: Thu, 28 Dec 2017 11:00:42 -0800

> When sock_owned_by_user returns true in strparser. Fix is to add and
> call sock_owned_by_user_nocheck since the check for owned by user is
> not an error condition in this case.
> 
> Fixes: 43a0c6751a322847 ("strparser: Stream parser for messages")
> Reported-by: syzbot 
> Reported-and-tested-by: 
> 

Series applied.

Re: [RFT net-next v2 0/3] dwmac-meson8b: RGMII clock fixes for Meson8b

2017-12-28 Thread Martin Blumenstingl

Hi Emiliano,

On Thu, Dec 28, 2017 at 6:51 PM, Emiliano Ingrassia
 wrote:
> Hi Martin,
>
> thank you for the quick response!
>
> On Thu, Dec 28, 2017 at 05:58:34PM +0100, Martin Blumenstingl wrote:
>> Hi Emiliano,
>>
>> thank you for testing this!
>>
>> On Thu, Dec 28, 2017 at 5:16 PM, Emiliano Ingrassia
>>  wrote:
>> > Hi Martin, Hi Dave,
>> >
>> > On Sun, Dec 24, 2017 at 12:40:57AM +0100, Martin Blumenstingl wrote:
>> >> Hi Dave,
>> >>
>> >> please do not apply this series until it got a Tested-by from Emiliano.
>> >>
>> >>
>> >> Hi Emiliano,
>> >>
>> >> you reported [0] that you couldn't get dwmac-meson8b to work on your
>> >> Odroid-C1. With your findings (register dumps, clk_summary output, etc.)
>> >> I think I was able to find a fix: it consists of two patches (which you
>> >> find in this series)
>> >>
>> >> Unfortunately I don't have any Meson8b boards with RGMII PHY so I could
>> >> only partially test this (I could only check if the clocks were
>> >> calculated correctly when using a dummy 52394Hz input clock instead
>> >> of MPLL2).
>> >>
>> >> Could you please give this series a try and let me know about the
>> >> results?
>> >> You obviously still need your two "ARM: dts: meson8b" patches which
>> >> - add the "amlogic,meson8b-dwmac" compatible to meson8b.dtsi
>> >> - enable Ethernet on the Odroid-C1
>> >>
>> >> I have tested this myself on a Khadas VIM (GXL SoC, internal RMII PHY)
>> >> and a Khadas VIM2 (GXM SoC, external RGMII PHY). Both are still working
>> >> fine (so let's hope that this also fixes your Meson8b issue :)).
>> >>
>> >>
>> >> changes since v1 at [1]:
>> >> - changed the subject of the cover-letter to indicate that this is all
>> >>   about the RGMII clock
>> >> - added PATCH #1 which ensures that we don't unnecessarily change the
>> >>   parent clocks in RMII mode (and also makes the code easier to
>> >>   understand)
>> >> - changed subject of PATCH #2 (formerly PATCH #1) to state that this
>> >>   is about the RGMII clock
>> >> - added Jerome's Reviewed-by to PATCH #2 (formerly PATCH #1)
>> >> - replaced PATCH #3 (formerly PATCH #2) with one that sets
>> >>   CLK_SET_RATE_PARENT on the mux and thus re-configures the MPLL2 clock
>> >>   on Meson8b correctly
>> >>
>> >
>> > Really thank you for your help and effort. I tried your patch but
>> > unfortunately it didn't solve the problem.
>> this is probably my fault: I forgot to mention that it requires a fix
>> for the 32-bit SoCs in the clock driver ("clk: meson: mpll: use 64-bit
>> maths in params_from_rate", see [0]) to work properly
>>
>
> Ok, with that patch applied I got:
>
> xtal   112400  0 0
>  sys_pll   00  12  0 0
>   cpu_clk  00  12  0 0
>  vid_pll   00   73200  0 0
>  fixed_pll 22  255000  0 0
>   mpll211   124999851  0 0
>c941.ethernet#m250_sel  11   124999851  0 0
> c941.ethernet#m250_div 11   124999851  0 0
>  c941.ethernet#m25_div 112471  0 0
in theory this looks good...!

> which is equal to your result. However, the ethernet is still not working.
OK, I'll send an updated version later (or tomorrow, depending on how
much time I have left today) which adds a fixed divider and converts
bit 10 to a gate.
On GXBB and newer, bit 10 is always set since m250_div is only
3 bits wide (= max divider of 7; 0 is invalid according to the
datasheet). With a 1000MHz input clock (fclk_div2), m250_div divides
this by 4 and m25_div divides the result by 10.

> The prg0 register is set to 0x70A1.
>
> A problem that I see with this solution is that MPLL2 is set to ~125 MHz.
> The S805 SoC manual reports that bits 9-7 should contain a value x such
> that: MPLL2 = 250 MHz * x (with x >= 1).
> In our case, bits 9-7 are set to 1 which is incorrect.
> I think that MPLL2 should be 250 MHz at least.
When looking at the GXBB clock tree, we need a fixed divide-by-10.
This also means that the mpll2 clock will probably be set to ~250MHz and
m250_div to 1; with the fixed divider of 10 we then get close to our
desired 25MHz.
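
The divider chain described above can be checked with a few lines of C.
This is only a sketch of the arithmetic, not dwmac-meson8b code: the
helper name rgmii_tx_clk_hz() and its parameters are made up for
illustration, and the fixed divide-by-10 is the assumption stated in the
paragraph above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel API): rate left after the
 * programmable m250_div and a fixed post-divider.
 * m250_div is a 3-bit field, so valid values are 1..7 (0 is invalid).
 */
static uint64_t rgmii_tx_clk_hz(uint64_t parent_hz, unsigned int m250_div,
				unsigned int fixed_div)
{
	assert(m250_div >= 1 && m250_div <= 7);
	assert(fixed_div != 0);

	return parent_hz / m250_div / fixed_div;
}
```

With fclk_div2 at 1000MHz, rgmii_tx_clk_hz(1000000000, 4, 10) yields
25000000; with MPLL2 at 250MHz and m250_div = 1 the result is the same.
An MPLL2 rate of ~125MHz cannot reach 25MHz through a divide-by-10.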

>> >
>> > Here is the new clk_summary:
>> >
>> > xtal112400  0 0
>> >  sys_pll00  12  0 0
>> >   cpu_clk   00  12  0 0
>> >  vid_pll00   73200  0 0
>> >  fixed_pll  22  255000  0 0
>> >   mpll2 11   10625  0 0
>> >c941.ethernet#m250_sel   11   10625

Re: [PATCH net-next] virtio_net: implement VIRTIO_CONFIG_S_NEEDS_RESET

2017-12-28 Thread Willem de Bruijn

On Mon, Oct 16, 2017 at 11:44 PM, Michael S. Tsirkin  wrote:
> On Tue, Oct 17, 2017 at 11:05:07AM +0800, Jason Wang wrote:
>>
>>
>> On 2017年10月17日 06:34, Willem de Bruijn wrote:
>> > On Mon, Oct 16, 2017 at 12:38 PM, Michael S. Tsirkin  
>> > wrote:
>> > > On Mon, Oct 16, 2017 at 12:04:57PM -0400, Willem de Bruijn wrote:
>> > > > On Mon, Oct 16, 2017 at 11:31 AM, Michael S. Tsirkin  
>> > > > wrote:
>> > > > > On Mon, Oct 16, 2017 at 11:03:18AM -0400, Willem de Bruijn wrote:
>> > > > > > > > +static int virtnet_reset(struct virtnet_info *vi)
>> > > > > > > > +{
>> > > > > > > > + struct virtio_device *dev = vi->vdev;
>> > > > > > > > + int ret;
>> > > > > > > > +
>> > > > > > > > + virtio_config_disable(dev);
>> > > > > > > > + dev->failed = dev->config->get_status(dev) & VIRTIO_CONFIG_S_FAILED;
>> > > > > > > > + virtnet_freeze_down(dev, true);
>> > > > > > > > + remove_vq_common(vi);
>> > > > > > > > +
>> > > > > > > > + virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
>> > > > > > > > + virtio_add_status(dev, VIRTIO_CONFIG_S_DRIVER);
>> > > > > > > > +
>> > > > > > > > + ret = virtio_finalize_features(dev);
>> > > > > > > > + if (ret)
>> > > > > > > > + goto err;
>> > > > > > > > +
>> > > > > > > > + ret = virtnet_restore_up(dev);
>> > > > > > > > + if (ret)
>> > > > > > > > + goto err;
>> > > > > > > > +
>> > > > > > > > + ret = virtnet_set_queues(vi, vi->curr_queue_pairs);
>> > > > > > > > + if (ret)
>> > > > > > > > + goto err;
>> > > > > > > > +
>> > > > > > > > + virtio_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
>> > > > > > > > + virtio_config_enable(dev);
>> > > > > > > > + return 0;
>> > > > > > > > +
>> > > > > > > > +err:
>> > > > > > > > + virtio_add_status(dev, VIRTIO_CONFIG_S_FAILED);
>> > > > > > > > + return ret;
>> > > > > > > > +}
>> > > > > > > > +
>> > > > > > > >   static int virtnet_set_guest_offloads(struct virtnet_info *vi, u64 offloads)
>> > > > > > > >   {
>> > > > > > > >struct scatterlist sg;
>> > > > > > > I have a question here though. How do things like MAC address
>> > > > > > > get restored?
>> > > > > > >
>> > > > > > > What about the rx mode?
>> > > > > > >
>> > > > > > > vlans?
>> > > > > > The function as is releases and reinitializes only ring state.
>> > > > > > Device configuration such as mac and vlan persist across
>> > > > > > the reset.
>> > > > > What gave you this impression? Take a look at e.g. this
>> > > > > code in qemu:
>> > > > >
>> > > > > static void virtio_net_reset(VirtIODevice *vdev)
>> > > > > {
>> > > > >  VirtIONet *n = VIRTIO_NET(vdev);
>> > > > >
>> > > > >  /* Reset back to compatibility mode */
>> > > > >  n->promisc = 1;
>> > > > >  n->allmulti = 0;
>> > > > >  n->alluni = 0;
>> > > > >  n->nomulti = 0;
>> > > > >  n->nouni = 0;
>> > > > >  n->nobcast = 0;
>> > > > >  /* multiqueue is disabled by default */
>> > > > >  n->curr_queues = 1;
>> > > > >  timer_del(n->announce_timer);
>> > > > >  n->announce_counter = 0;
>> > > > >  n->status &= ~VIRTIO_NET_S_ANNOUNCE;
>> > > > >
>> > > > >  /* Flush any MAC and VLAN filter table state */
>> > > > >  n->mac_table.in_use = 0;
>> > > > >  n->mac_table.first_multi = 0;
>> > > > >  n->mac_table.multi_overflow = 0;
>> > > > >  n->mac_table.uni_overflow = 0;
>> > > > >  memset(n->mac_table.macs, 0, MAC_TABLE_ENTRIES * ETH_ALEN);
>> > > > >  memcpy(&n->mac[0], &n->nic->conf->macaddr, sizeof(n->mac));
>> > > > >  qemu_format_nic_info_str(qemu_get_queue(n->nic), n->mac);
>> > > > >  memset(n->vlans, 0, MAX_VLAN >> 3);
>> > > > > }
>> > > > >
>> > > > > So device seems to lose all state, you have to re-program it.
>> > > > Oh, indeed! The guest does not reset its state, so it might
>> > > > be out of sync with the host after the operation. Was this not
>> > > > an issue when previously resetting in the context of xdp?
>> > > I suspect it was broken back then, too.
>> > Okay. I guess that in principle this is all programmable through
>> > virtnet_set_rx_mode, virtnet_vlan_rx_add_vid, etc. But it's a
>> > lot more complex than just restoring virtnet_reset. Will need to
>> > be careful about concurrency issues at the least. Similar to the
>> > ones you point out below.
>> >
>>
>> The problem has been pointed out during developing virtio-net XDP. But it
>> may not be a big issue since vhost_net ignores all kinds of the filters now.
>>
>> Thanks
>
> It might not keep doing that in the future though.
> And virtio-net in userspace doesn't ignore the filters.

How about the guest honoring the request only if no state has been
offloaded to the host?

This is the common case for vhost_net, and is not expected to change
soon.

Even when it does, we have a graceful degradation strategy: the guest
reverts the offloaded state prior to reset and reapplies it afterwards.

[PATCH net 2/2] strparser: Call sock_owned_by_user_nocheck

2017-12-28 Thread Tom Herbert

strparser wants to check socket ownership without producing any
warnings. As indicated by the comment in the code, it is permissible
for owned_by_user to return true.

Fixes: 43a0c6751a322847 ("strparser: Stream parser for messages")
Reported-by: syzbot 
Reported-and-tested-by: 

Signed-off-by: Tom Herbert 
---
 net/strparser/strparser.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/strparser/strparser.c b/net/strparser/strparser.c
index c5fda15ba319..1fdab5c4eda8 100644
--- a/net/strparser/strparser.c
+++ b/net/strparser/strparser.c
@@ -401,7 +401,7 @@ void strp_data_ready(struct strparser *strp)
 * allows a thread in BH context to safely check if the process
 * lock is held. In this case, if the lock is held, queue work.
 */
-   if (sock_owned_by_user(strp->sk)) {
+   if (sock_owned_by_user_nocheck(strp->sk)) {
        queue_work(strp_wq, &strp->work);
        return;
    }
-- 
2.11.0

[PATCH net 0/2] strparser: Fix lockdep issue

2017-12-28 Thread Tom Herbert

A lockdep warning is produced when sock_owned_by_user returns true in
strparser. The fix is to add and call sock_owned_by_user_nocheck, since
the socket being owned by the user is not an error condition in this case.

Fixes: 43a0c6751a322847 ("strparser: Stream parser for messages")
Reported-by: syzbot 
Reported-and-tested-by: 


Tom Herbert (2):
  sock: Add sock_owned_by_user_nocheck
  strparser: Call sock_owned_by_user_nocheck

 include/net/sock.h| 5 +
 net/strparser/strparser.c | 2 +-
 2 files changed, 6 insertions(+), 1 deletion(-)

-- 
2.11.0

[PATCH net 1/2] sock: Add sock_owned_by_user_nocheck

2017-12-28 Thread Tom Herbert

This allows checking socket lock ownership without producing lockdep
warnings.

Signed-off-by: Tom Herbert 
---
 include/net/sock.h | 5 +
 1 file changed, 5 insertions(+)

diff --git a/include/net/sock.h b/include/net/sock.h
index 9155da422692..7a7b14e9628a 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1514,6 +1514,11 @@ static inline bool sock_owned_by_user(const struct sock *sk)
    return sk->sk_lock.owned;
 }
 
+static inline bool sock_owned_by_user_nocheck(const struct sock *sk)
+{
+   return sk->sk_lock.owned;
+}
+
 /* no reclassification while locks are held */
 static inline bool sock_allow_reclassification(const struct sock *csk)
 {
-- 
2.11.0

Re: [PATCH net-next 5/6] arm64: dts: marvell: mcbin: enable the fourth network interface

2017-12-28 Thread Russell King - ARM Linux

On Thu, Dec 28, 2017 at 11:04:16AM +0100, Antoine Tenart wrote:
> Hi Russell,
> 
> On Wed, Dec 27, 2017 at 11:20:00PM +, Russell King - ARM Linux wrote:
> > On Wed, Dec 27, 2017 at 11:42:52PM +0100, Antoine Tenart wrote:
> > > 
> > > What do you suggest to describe this in the dt, to enable a port using
> > > the current PPv2 driver?
> > 
> > I don't - I'm merely pointing out that you're bodging support for the
> > SFP cage rather than productively discussing phylink for mvpp2.
> > 
> > As far as I remember, the discussion stalled at this point:
> > 
> > - You think there's modes that mvpp2 supports that are not supportable
> >   if you use phylink.
> > 
> > - I've described what phylink supports, and I've asked you for details
> >   about what you can't support.
> 
> That's not what I remembered. You had some valid points, and others
> related to PHY modes the driver wasn't supporting before the phylink
> transition. My understanding of this was that you wanted a full
> featured support while I only wanted to convert the already supported
> modes.

You are mistaken - you can get a full refresher on where things were
at via https://patchwork.kernel.org/patch/9963971/

There are two points in that thread where discussion stopped awaiting
input:

1. I asked for details about what mvpp2.c supports that phylink does
   not (as you indicated that there were certain things that mvpp2
   supports that phylink does not.)  I'm still awaiting a response.

2. 25th Sept, you indicated that you would get someone to test
   an issue related to in-band AN. No results of that testing have
   been forthcoming.

Consequently, the ball is in your court on both these issues.

I am not after a full featured support, what I'm after is ensuring
that phylink is (a) used correctly and (b) implementations using it
are correct.  Part of that is ensuring that users don't introduce
unexpected failure conditions.

So, when you do this in the validate() callback:

 +   phylink_set(mask, 1000baseX_Full);

and then do this in the mac_config() callback:

 +   if (!phy_interface_mode_is_rgmii(port->phy_interface) &&
 +   port->phy_interface != PHY_INTERFACE_MODE_SGMII)
 +   return;

and this in the link_state() callback:

 +   if (!phy_interface_mode_is_rgmii(port->phy_interface) &&
 +   port->phy_interface != PHY_INTERFACE_MODE_SGMII)
 +   return 0;

the result is that phylink thinks that you support 1000base-X modes,
and it will call mac_config() asking for 1000base-X, but you silently
ignore that, leaving the hardware configured in whatever state it was.
That leads to a silent failure as far as the user is concerned.

So, if you do not intend to support 1000base-X initially, don't
allow it in the validate callback until you do.

It gets worse, because the return in link_state() means that phylink
thinks that the link is up if it has requested 1000base-X, which it
won't be unless you've properly configured it.

It's this kind of unreliability that I was concerned about in your
patch.  I'm not demanding "full featured implementation" but I do
want you to use it correctly.
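
The rule above (only advertise in validate() what mac_config() and
link_state() really handle) can be modelled in plain C. This is a
userspace sketch, not phylink code: the demo_* names and the reduced
mode enum are invented for illustration, and the real mac_config()
callback returns void rather than an error code.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for PHY_INTERFACE_MODE_*; not the kernel enum. */
enum demo_mode { DEMO_RGMII, DEMO_SGMII, DEMO_1000BASEX, DEMO_MODE_COUNT };

/* Modes the (hypothetical) MAC code can really program. */
static bool demo_mac_can_config(enum demo_mode mode)
{
	return mode == DEMO_RGMII || mode == DEMO_SGMII;
}

/* validate(): advertise a mode only if the MAC code handles it. */
static unsigned int demo_validate(void)
{
	unsigned int mask = 0;
	int m;

	for (m = 0; m < DEMO_MODE_COUNT; m++)
		if (demo_mac_can_config((enum demo_mode)m))
			mask |= 1u << m;
	return mask;
}

/* mac_config(): 0 on success, -1 for a mode that was never advertised. */
static int demo_mac_config(unsigned int advertised, enum demo_mode mode)
{
	if (!(advertised & (1u << mode)))
		return -1;	/* report it instead of silently returning */
	assert(demo_mac_can_config(mode));
	/* ... program the hardware here ... */
	return 0;
}
```

Because the advertised mask is derived from the same predicate that
guards the configuration path, a request for 1000base-X can never reach
a code path that ignores it.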

> You're probably right about not wanting this dt patch. The non-dt
> patches still are relevant regardless of phylink being supported in the
> PPv2 driver. I'll send a v2 without the dt parts.

Thanks.

> > What I'm most concerned about, given the bindings for comphy that
> > have been merged, is that Free Electrons is pushing forward seemingly
> > with no regard to the requirement that the serdes lanes are dynamically
> > reconfigurable, and that's a basic requirement for SFP, and for the
> > 88x3310 PHYs on the Macchiatobin platform.
> 
> The main idea behind the comphy driver is to provide a way to
> reconfigure the serdes lanes at runtime. Could you develop what are
> blocking points to properly support SFP, regarding the current comphy
> support?

If it supports serdes lane mode reconfiguration (iow, switching between
1000base-X, 2500base-X, SGMII, 10G-KR), then that's all that's required.
Note that you need comphy to switch between SGMII / 10G-KR to support
the 88x3310 fully too.

Having looked deeper at this, it seems it does have the capability of
doing what's required, sorry for the noise.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up

Re: [PATCH net-next 5/6] arm64: dts: marvell: mcbin: enable the fourth network interface

2017-12-28 Thread Russell King - ARM Linux

On Thu, Dec 28, 2017 at 07:27:39PM +0100, Antoine Tenart wrote:
> Hi Florian,
> 
> On Thu, Dec 28, 2017 at 07:02:09AM -0800, Florian Fainelli wrote:
> > On 12/28/2017 02:05 AM, Antoine Tenart wrote:
> > > On Thu, Dec 28, 2017 at 08:46:23AM +0100, Andrew Lunn wrote:
> > >> On Wed, Dec 27, 2017 at 10:24:01PM +, Russell King - ARM Linux wrote:
> > >>> On Wed, Dec 27, 2017 at 11:14:45PM +0100, Antoine Tenart wrote:
> >   
> >  +_eth2 {
> >  +  /* CPS Lane 5 */
> >  +  status = "okay";
> >  +  phy-mode = "2500base-x";
> >  +  /* Generic PHY, providing serdes lanes */
> >  +  phys = <_comphy5 2>;
> >  +};
> >  +
> > >>>
> > >>> This is wrong.  This lane is connected to a SFP cage which can support
> > >>> more than just 2500base-X.  Tying it in this way to 2500base-X means
> > >>> that this port does not support connections at 1000base-X, despite
> > >>> that's one of the most popular and more standardised speeds.
> > >>>
> > >>
> > >> I agree with Russell here. SFP modules are hot pluggable, and support
> > >> a range of interface modes. You need to query what the SFP module is
> > >> in order to know how to configure the SERDES interface. The phylink
> > >> infrastructure does that for you.
> > > 
> > > Sure, I understand. We'll be able to support such interfaces only when
> > > the phylink PPv2 support lands in.
> > 
> > Should we expect PHYLINK support to make it as the first patch in your
> > v2 of this patch series, or is someone else doing that?
> 
> No, the phylink patch conflicts with Marcin's ACPI series and we agreed
> to let him get his series merged first. And I will probably work on a
> few other topics before having the chance to work on it. So it'll
> probably be me doing that, but not right now.

ACPI is going to be a problem with phylink for a while.  There's patches
queued in net-next which convert phylink and SFP mostly to the fwnode
and property based systems, but phylib and i2c do not seem to have the
necessary bits to be able to deal with those.

Specifically, in DT we have "of_find_i2c_adapter_by_node()" but afaics
there is no equivalent in ACPI - which means in an ACPI based system
we have no way to determine the I2C bus associated with a SFP socket,
which is a rather fundamental issue for SFP modules.

For phylib side, there's "of_phy_attach()" but again there is no
equivalent in ACPI. This should not be that much of a problem, because
network drivers using the DT phylib calls (eg, "of_phy_connect()") are
already restricted by this. That may have been solved by Marcin's
series, but I've not seen it to know.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up

Re: WARNING in strp_data_ready

2017-12-28 Thread Dmitry Vyukov

On Thu, Dec 28, 2017 at 7:21 PM, Ozgur  wrote:
>
>
> 28.12.2017, 19:33, "Dmitry Vyukov" :
>> On Thu, Dec 28, 2017 at 5:14 PM, Tom Herbert  wrote:
>>>  On Thu, Dec 28, 2017 at 12:59 AM, Ozgur  wrote:
  28.12.2017, 04:19, "Tom Herbert" :
>  On Wed, Dec 27, 2017 at 12:20 PM, Ozgur  wrote:
>>   27.12.2017, 23:14, "Dmitry Vyukov" :
>>>   On Wed, Dec 27, 2017 at 9:08 PM, Ozgur  wrote:
27.12.2017, 22:21, "Dmitry Vyukov" :
>On Wed, Dec 27, 2017 at 8:09 PM, Tom Herbert 
>  wrote:
>> Did you try the patch I posted?
>
>Hi Tom,

Hello Dmitry,

>No. And I didn't know I need to. Why?
>If you think the patch needs additional testing, you can ask 
> syzbot to
>test it. See 
> https://github.com/google/syzkaller/blob/master/docs/syzbot.md#communication-with-syzbot
>Otherwise proceed with committing it. Or what are we waiting for?
>
>Thanks

I think we need to fixed patch for crash, in fact check to patch 
 code and test solve the bug.
How do test it because there is no patch in the following bug?
>>>
>>>   Hi Ozgur,
>>>
>>>   I am not sure I completely understand what you mean. But the
>>>   reproducer for this bug (which one can use for testing) is here:
>>>   https://groups.google.com/forum/#!topic/syzkaller-bugs/Kxs05ziCpgY
>>>   Tom also mentions there is some patch for this, but I don't know where
>>>   it is, it doesn't seem to be referenced from this thread.
>>
>>   Hello Dmitry,
>>
>>   Ah, I'm sorry I don't seen Tom mail and I don't have a patch not 
>> tested :)
>>   I think Tom send patch to only you and are you tested?
>>
>>   kcmsock.c will change and strp_data_ready I think locked.
>>
>>   Tom, please send a patch for me? I can test and inform you.
>
>  Hi Ozgur,
>
>  I reposted the patches as RFC "kcm: Fix lockdep issue". Please test if 
> you can!
>
>  Thanks,
>  Tom

  Hello Tom,

  Which are you use the repos? I pulled but I don't seen this patches.
>>>  They are not in any public repo yet. I posted the patches to netdev
>>>  list so they can be reviewed and tested by third parties. Posting
>>>  patches to the list a normal path to get patches into the kernel
>>>  
>>> (http://nickdesaulniers.github.io/blog/2017/05/16/submitting-your-first-patch-to-the-linux-kernel-and-responding-to-feedback/).
>>>
>>>  These patches were applied to net-next but are simple enough that they
>>>  should apply to other branches. I will repost and target to net per
>>>  Dave's directive once they are verified to fix the issue.
>
> Hello,
>
> thanks Tom and I have tested the fixed patch for linux-next builds and don't 
> have to kernel panic. when nocheck funcs call sk_lock.owned and kernel 
> doesn't give a panic.  I have compiled and uploaded next-kernel.
>
> Dmitry,
> could you test it on linux-next?

If you are trying to test how many times I can repeat this, I can
repeat this lots of times:

If you think the patch needs additional testing, you can ask syzbot to
test it. See 
https://github.com/google/syzkaller/blob/master/docs/syzbot.md#communication-with-syzbot

Re: [PATCH net-next 5/6] arm64: dts: marvell: mcbin: enable the fourth network interface

2017-12-28 Thread Antoine Tenart

Hi Florian,

On Thu, Dec 28, 2017 at 07:02:09AM -0800, Florian Fainelli wrote:
> On 12/28/2017 02:05 AM, Antoine Tenart wrote:
> > On Thu, Dec 28, 2017 at 08:46:23AM +0100, Andrew Lunn wrote:
> >> On Wed, Dec 27, 2017 at 10:24:01PM +, Russell King - ARM Linux wrote:
> >>> On Wed, Dec 27, 2017 at 11:14:45PM +0100, Antoine Tenart wrote:
>   
>  +_eth2 {
>  +/* CPS Lane 5 */
>  +status = "okay";
>  +phy-mode = "2500base-x";
>  +/* Generic PHY, providing serdes lanes */
>  +phys = <_comphy5 2>;
>  +};
>  +
> >>>
> >>> This is wrong.  This lane is connected to a SFP cage which can support
> >>> more than just 2500base-X.  Tying it in this way to 2500base-X means
> >>> that this port does not support connections at 1000base-X, despite
> >>> that's one of the most popular and more standardised speeds.
> >>>
> >>
> >> I agree with Russell here. SFP modules are hot pluggable, and support
> >> a range of interface modes. You need to query what the SFP module is
> >> in order to know how to configure the SERDES interface. The phylink
> >> infrastructure does that for you.
> > 
> > Sure, I understand. We'll be able to support such interfaces only when
> > the phylink PPv2 support lands in.
> 
> Should we expect PHYLINK support to make it as the first patch in your
> v2 of this patch series, or is someone else doing that?

No, the phylink patch conflicts with Marcin's ACPI series and we agreed
to let him get his series merged first. And I will probably work on a
few other topics before having the chance to work on it. So it'll
probably be me doing that, but not right now.

This isn't an issue regarding the PPv2 and PHY patches of this series,
but yes we probably won't get the fourth network interface supported on
the mcbin during this release.

Thanks!
Antoine

-- 
Antoine Ténart, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

Re: [PATCH net-next 1/6] phy: add 2.5G SGMII mode to the phy_mode enum

2017-12-28 Thread Antoine Tenart

Hi Florian,

On Thu, Dec 28, 2017 at 06:16:51AM -0800, Florian Fainelli wrote:
> 
> And since you are respinning, please make sure you update phy_modes() in
> the same header file as well as
> Documentation/devicetree/bindings/net/ethernet.txt with the newly added
> PHY interface mode.

You're right. Thanks for pointing this out!

Antoine

-- 
Antoine Ténart, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

Re: [PATCH net-next 2/2] l2tp: add peer_offset parameter

2017-12-28 Thread Lorenzo Bianconi

On Dec 28, Guillaume Nault wrote:
> On Fri, Dec 22, 2017 at 03:10:18PM +0100, Lorenzo Bianconi wrote:
> > Introduce peer_offset parameter in order to add the capability
> > to specify two different values for payload offset on tx/rx side.
> > If just offset is provided by userspace use it for rx side as well
> > in order to maintain compatibility with older l2tp versions
> > 
> Sorry for being late on this, I originally missed this patchset and
> only noticed it yesterday.
> 
> Lorenzo, can you give some context around this new feature?
> Quite frankly I can't see the point of it. I've never heard of offsets
> in L2TPv3, and for L2TPv2, the offset value is already encoded in the
> header.

Hi Guillaume,

thanks for your feedback.

> 
> After a quick review of L2TPv3 and pseudowires RFCs, I still don't see
> how adding some padding between the L2TPv3 header and the payload could
> constitute a valid frame. Of course, the base feature is already there,
> but after a quick test, it looks like the padding bits aren't
> initialised and leak memory.

Do you mean for L2TPv2 or L2TPv3? For L2TPv3 offset/peer_offset are initialized
in l2tp_nl_cmd_session_create()

> 
> So, unless I missed something, setting offsets in L2TPv3 is
> non-compliant, the current implementation is buggy and most likely
> unused. I'd really prefer getting the implementation fixed, or even
> removed entirely. Extending it to allow asymmetrical offset values
> looks wrong to me, unless you have a use case in mind.
> 
> Regards,
> 
> Guillaume
> 
> PS: I also noticed that iproute2 has a "peer_offset" option, but it's a
> noop.

Setting the session data offset is already supported in the L2TP kernel
module (and could already be used by userspace applications);
for L2TPv2 there is an optional 16-bit value in the header, while for L2TPv3
the offset is configured by userspace.
At the moment the kernel (for L2TPv3) uses offset for both the tx and rx side.
Userspace (iproute2) allows distinguishing the tx offset (offset) from the rx
one (peer_offset), but since the rx part is not handled at the moment,
this leads to a misalignment between tunnel endpoints.
(I fixed peer_offset support in iproute2; I have not sent the patch upstream
yet, it is attached below.)
You can easily reproduce the issue using this setup (and the below patch for
iproute2):

ip l2tp add tunnel local  remote  tunnel_id  peer_tunnel_id 
 udp_sport  udp_dport 
ip l2tp add tunnel local  remote  tunnel_id  peer_tunnel_id 
 udp_sport  udp_dport 

ip l2tp add session name l2tp0 tunnel_id  session_id  peer_session_id 
 offset 8 peer_offset 16
ip l2tp add session name l2tp0 tunnel_id  session_id  peer_session_id 
 offset 16 peer_offset 8
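
The condition the two session commands above are meant to satisfy can be
written down as a small check. This is an illustrative sketch only: the
struct and function names are hypothetical and are not taken from the
kernel's l2tp code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative only: not the kernel's struct l2tp_session. */
struct demo_l2tp_session {
	uint16_t offset;	/* padding inserted on transmit */
	uint16_t peer_offset;	/* padding expected on receive */
};

/*
 * Two endpoints agree when each side's tx offset equals the other
 * side's rx offset.
 */
static bool demo_offsets_match(const struct demo_l2tp_session *a,
			       const struct demo_l2tp_session *b)
{
	assert(a && b);
	return a->offset == b->peer_offset && b->offset == a->peer_offset;
}
```

With offset 8 / peer_offset 16 on one side and offset 16 / peer_offset 8
on the other, the check holds. If the receive side simply reuses its own
offset (8 and 16 respectively), it fails, which is the misalignment
described above.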

commit ee1b976f22fbea530c94a5233ac8c7cd8d643ae9
Author: Lorenzo Bianconi 
Date:   Thu Dec 21 14:41:39 2017 +0100

ip: l2tp: add peer_offset netlink callback

diff --git a/include/uapi/linux/l2tp.h b/include/uapi/linux/l2tp.h
index 472e9924..21223df7 100644
--- a/include/uapi/linux/l2tp.h
+++ b/include/uapi/linux/l2tp.h
@@ -127,6 +127,7 @@ enum {
L2TP_ATTR_UDP_ZERO_CSUM6_TX,/* flag */
L2TP_ATTR_UDP_ZERO_CSUM6_RX,/* flag */
L2TP_ATTR_PAD,
+   L2TP_ATTR_PEER_OFFSET,  /* u16 */
__L2TP_ATTR_MAX,
 };
 
diff --git a/ip/ipl2tp.c b/ip/ipl2tp.c
index 7c5ed313..a3220a8b 100644
--- a/ip/ipl2tp.c
+++ b/ip/ipl2tp.c
@@ -176,6 +176,8 @@ static int create_session(struct l2tp_parm *p)
  p->reorder_timeout);
if (p->offset)
addattr16(, 1024, L2TP_ATTR_OFFSET, p->offset);
+   if (p->peer_offset)
+   addattr16(, 1024, L2TP_ATTR_PEER_OFFSET, p->peer_offset);
if (p->cookie_len)
addattr_l(, 1024, L2TP_ATTR_COOKIE,
  p->cookie, p->cookie_len);
@@ -316,6 +318,8 @@ static int get_response(struct nlmsghdr *n, void *arg)
p->encap = rta_getattr_u16(attrs[L2TP_ATTR_ENCAP_TYPE]);
if (attrs[L2TP_ATTR_OFFSET])
p->offset = rta_getattr_u16(attrs[L2TP_ATTR_OFFSET]);
+   if (attrs[L2TP_ATTR_PEER_OFFSET])
+   p->peer_offset = rta_getattr_u16(attrs[L2TP_ATTR_PEER_OFFSET]);
if (attrs[L2TP_ATTR_DATA_SEQ])
p->data_seq = rta_getattr_u16(attrs[L2TP_ATTR_DATA_SEQ]);
if (attrs[L2TP_ATTR_CONN_ID])


Regards,
Lorenzo

Re: [PATCH iproute2 2/3] utils: ll_map: Update name and type for existing entry

2017-12-28 Thread Serhey Popovych

Stephen Hemminger wrote:
> On Wed, 20 Dec 2017 09:37:30 +0200
> Serhey Popovych  wrote:
> 
>> In case of we update existing entry we need not only rehash
>> but also update name in existing entry.
>>
>> Need to update device type too since cached interface might
>> be deleted and new with same index, but different type
>> added (e.g. eth0 and ppp0).
>>
>> Reuse new entry initialization path to avoid duplications.
>>
>> Signed-off-by: Serhey Popovych 
> 
> Can you provide an example where this is an observed bug?
> I suspect that unless you use a batch mode command the reload
> of the cache on next invocation is solving this.
> 
From my side, the example from the description is pretty clear: eth0 -> ppp0,
or a rename eth0 -> eth1, etc.

According to the ll_remember_index() code, we might be called with a
non-empty cache. If ll_get_by_index() returns an entry with a
name that differs from the current one, we need to:

  1. Rehash in ->name_hash (done with current code)
  2. Update entry name (not done with current code)

That's my point of view.
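
A minimal sketch of the two steps above, assuming a cut-down cache entry.
The demo_* names are hypothetical and the hash list handling of the real
ll_map.c is left out; the return value only tells the caller whether a
rehash is needed.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define DEMO_IFNAMSIZ 16	/* same value as IFNAMSIZ, redefined for the demo */

/* Cut-down stand-in for struct ll_cache; hash list members omitted. */
struct demo_entry {
	unsigned int index;
	unsigned short type;
	char name[DEMO_IFNAMSIZ];
};

/*
 * Refresh a cached entry from a new link message.
 * Returns true when the name changed, i.e. the caller must also move
 * the entry to a different name-hash bucket.
 */
static bool demo_update_entry(struct demo_entry *im, const char *ifname,
			      unsigned short type)
{
	bool rehash;

	assert(im && ifname);
	rehash = strcmp(im->name, ifname) != 0;

	im->type = type;	/* the index may now belong to another device type */
	if (rehash) {
		strncpy(im->name, ifname, DEMO_IFNAMSIZ - 1);
		im->name[DEMO_IFNAMSIZ - 1] = '\0';
	}
	return rehash;
}
```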

Thanks.





Re: [PATCH iproute2 2/3] utils: ll_map: Update name and type for existing entry

2017-12-28 Thread Stephen Hemminger

On Wed, 20 Dec 2017 09:37:30 +0200
Serhey Popovych  wrote:

> In case of we update existing entry we need not only rehash
> but also update name in existing entry.
> 
> Need to update device type too since cached interface might
> be deleted and new with same index, but different type
> added (e.g. eth0 and ppp0).
> 
> Reuse new entry initialization path to avoid duplications.
> 
> Signed-off-by: Serhey Popovych 

Can you provide an example where this is an observed bug?
I suspect that unless you use a batch mode command the reload
of the cache on next invocation is solving this.

Re: [PATCH iproute2 3/3] utils: ll_map: Make network device name fixed size array of char

2017-12-28 Thread Stephen Hemminger

On Wed, 20 Dec 2017 09:37:31 +0200
Serhey Popovych  wrote:

> Network device names are fixed in size and never exceed
> IFNAMSIZ (16 bytes).
> 
> Make name fixed size array to always malloc() same size chunk
> of memory and use memcpy()/memcmp() with constant IFNAMSIZ
> to benefit from possible compiler optimizations replacing
> call to a function with two/four load/store instructions
> on 64/32 bit systems.
> 
> Check if IFLA_IFNAME attribute present in netlink message
> (should always) and use strncpy() to pad name with zeros.
> 
> Signed-off-by: Serhey Popovych 
> ---
>  lib/ll_map.c |   20 
>  1 file changed, 12 insertions(+), 8 deletions(-)
> 
> diff --git a/lib/ll_map.c b/lib/ll_map.c
> index abe7bdc..fcbf0fb 100644
> --- a/lib/ll_map.c
> +++ b/lib/ll_map.c
> @@ -30,7 +30,7 @@ struct ll_cache {
>   unsignedflags;
>   unsignedindex;
>   unsigned short  type;
> - charname[];
> + charname[IFNAMSIZ];
>  };
>  
>  #define IDXMAP_SIZE  1024
> @@ -71,7 +71,7 @@ static struct ll_cache *ll_get_by_name(const char *name)
>   struct ll_cache *im
>   = container_of(n, struct ll_cache, name_hash);
>  
> - if (strncmp(im->name, name, IFNAMSIZ) == 0)
> + if (!strcmp(im->name, name))
>   return im;
>   }
>  
> @@ -82,7 +82,7 @@ int ll_remember_index(const struct sockaddr_nl *who,
> struct nlmsghdr *n, void *arg)
>  {
>   unsigned int h;
> - const char *ifname;
> + char ifname[IFNAMSIZ];
>   struct ifinfomsg *ifi = NLMSG_DATA(n);
>   struct ll_cache *im;
>   struct rtattr *tb[IFLA_MAX+1];
> @@ -105,17 +105,21 @@ int ll_remember_index(const struct sockaddr_nl *who,
>   }
>  
>   parse_rtattr(tb, IFLA_MAX, IFLA_RTA(ifi), IFLA_PAYLOAD(n));
> - ifname = rta_getattr_str(tb[IFLA_IFNAME]);
> - if (ifname == NULL)
> +
> + if (!tb[IFLA_IFNAME])
> + return 0;
> + strncpy(ifname, rta_getattr_str(tb[IFLA_IFNAME]), IFNAMSIZ);
> + if (!ifname[0])
>   return 0;
> + ifname[IFNAMSIZ - 1] = '\0';
>  
>   if (im) {
>   /* change to existing entry */
> - rehash = strcmp(im->name, ifname);
> + rehash = memcmp(im->name, ifname, IFNAMSIZ);

This is not safe. There is no guarantee that bytes after the end of the string
are zero. And in your code, strncpy() will overwrite characters from the
beginning to the null; it will not overwrite after that. Then the comparison
with the entry may not work because of the data after that.

I really doubt this is critical path on anything. Probably just having a better
hash table implementation would solve that.
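
The concern above hinges on what lies after the terminating null in each
buffer. As a reference point: ISO C specifies that strncpy() writes
exactly n bytes, padding with null bytes when the source is shorter than
n, so two buffers filled that way compare equal under memcmp(). A buffer
filled any other way (strcpy(), or a partly initialised malloc() block)
carries no such guarantee. The sketch below uses hypothetical demo_*
helpers and a local DEMO_IFNAMSIZ; it is not the ll_map.c code.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define DEMO_IFNAMSIZ 16	/* same value as IFNAMSIZ, redefined for the demo */

/*
 * Copy a name into a fixed-size buffer: strncpy() fills the rest of the
 * buffer with '\0' when src is shorter than DEMO_IFNAMSIZ, and the last
 * byte is forced to '\0' to terminate longer names.
 */
static void demo_set_name(char dst[DEMO_IFNAMSIZ], const char *src)
{
	assert(dst && src);
	strncpy(dst, src, DEMO_IFNAMSIZ);
	dst[DEMO_IFNAMSIZ - 1] = '\0';
}

/* Whole-buffer comparison; only meaningful if both buffers were padded. */
static bool demo_same_name(const char a[DEMO_IFNAMSIZ],
			   const char b[DEMO_IFNAMSIZ])
{
	return memcmp(a, b, DEMO_IFNAMSIZ) == 0;
}
```

Two buffers set through demo_set_name() with the same name compare equal
even if they held different garbage beforehand. A buffer written with
strcpy() over old contents keeps its stale tail, so memcmp() can report
a difference where strcmp() would not.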

Re: [PATCH iproute2 3/3] utils: ll_map: Make network device name fixed size array of char

2017-12-28 Thread Serhey Popovych

Stephen Hemminger wrote:
> On Wed, 20 Dec 2017 09:37:31 +0200
> Serhey Popovych  wrote:
> 
>> Network device names are fixed in size and never exceed
>> IFNAMSIZ (16 bytes).
>>
>> Make name fixed size array to always malloc() same size chunk
>> of memory and use memcpy()/memcmp() with constant IFNAMSIZ
>> to benefit from possible compiler optimizations replacing
>> call to a function with two/four load/store instructions
>> on 64/32 bit systems.
>>
>> Check if IFLA_IFNAME attribute present in netlink message
>> (should always) and use strncpy() to pad name with zeros.
>>
>> Signed-off-by: Serhey Popovych 
>> ---
>>  lib/ll_map.c |   20 
>>  1 file changed, 12 insertions(+), 8 deletions(-)
>>
>> diff --git a/lib/ll_map.c b/lib/ll_map.c
>> index abe7bdc..fcbf0fb 100644
>> --- a/lib/ll_map.c
>> +++ b/lib/ll_map.c
>> @@ -30,7 +30,7 @@ struct ll_cache {
>>  unsigned        flags;
>>  unsigned        index;
>>  unsigned short  type;
>> -char            name[];
>> +char            name[IFNAMSIZ];
>>  };
>>  
>>  #define IDXMAP_SIZE 1024
>> @@ -71,7 +71,7 @@ static struct ll_cache *ll_get_by_name(const char *name)
>>  struct ll_cache *im
>>  = container_of(n, struct ll_cache, name_hash);
>>  
>> -if (strncmp(im->name, name, IFNAMSIZ) == 0)
>> +if (!strcmp(im->name, name))
>>  return im;
>>  }
>>  
>> @@ -82,7 +82,7 @@ int ll_remember_index(const struct sockaddr_nl *who,
>>struct nlmsghdr *n, void *arg)
>>  {
>>  unsigned int h;
>> -const char *ifname;
>> +char ifname[IFNAMSIZ];
>>  struct ifinfomsg *ifi = NLMSG_DATA(n);
>>  struct ll_cache *im;
>>  struct rtattr *tb[IFLA_MAX+1];
>> @@ -105,17 +105,21 @@ int ll_remember_index(const struct sockaddr_nl *who,
>>  }
>>  
>>  parse_rtattr(tb, IFLA_MAX, IFLA_RTA(ifi), IFLA_PAYLOAD(n));
>> -ifname = rta_getattr_str(tb[IFLA_IFNAME]);
>> -if (ifname == NULL)
>> +
>> +if (!tb[IFLA_IFNAME])
>> +return 0;
>> +strncpy(ifname, rta_getattr_str(tb[IFLA_IFNAME]), IFNAMSIZ);
>> +if (!ifname[0])
>>  return 0;
>> +ifname[IFNAMSIZ - 1] = '\0';
>>  
>>  if (im) {
>>  /* change to existing entry */
>> -rehash = strcmp(im->name, ifname);
>> +rehash = memcmp(im->name, ifname, IFNAMSIZ);
> 
> This is not safe. There is no guarantee that bytes after the end of the
> string are zero.

Sorry Stephen, please correct me if my assumptions are wrong:

  1. struct ll_cache entries are only modified in ll_remember_index().
     There are no other places where we may modify ll_cache entries.

  2. strncpy() always pads with zeroes to the end of the IFNAMSIZ sized
     buffer.

  3. strncpy() may not return a null terminated string: this is addressed
     with ifname[IFNAMSIZ - 1] = '\0' in the code above.

Assuming 1 and 2, we always have im->name[] initialized with the string and
zero padding up to IFNAMSIZ. We prepare ifname using strncpy() too, so it
is zero padded and we can safely use memcmp() to compare byte by byte.
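
The zero-padding argument can be checked in isolation. The following is a minimal sketch, not the iproute2 code: NAME_SIZE stands in for IFNAMSIZ, and copy_name() and names_equal() are hypothetical helpers. Both buffers are pre-filled with different garbage, so any byte that strncpy() left untouched would make memcmp() report a mismatch.

```c
#include <assert.h>
#include <string.h>

#define NAME_SIZE 16	/* stand-in for IFNAMSIZ; an assumption of this sketch */

/* Copy src into a fixed-size buffer the way the patch does:
 * strncpy() zero-fills the remainder when src is shorter than
 * NAME_SIZE, and the explicit store terminates over-long names.
 */
void copy_name(char dst[NAME_SIZE], const char *src)
{
	assert(dst != NULL && src != NULL);
	strncpy(dst, src, NAME_SIZE);
	dst[NAME_SIZE - 1] = '\0';
}

/* Returns 1 when both names compare equal over all NAME_SIZE bytes. */
int names_equal(const char *a, const char *b)
{
	char x[NAME_SIZE], y[NAME_SIZE];

	/* Different garbage in each buffer before the copy. */
	memset(x, 0xAA, sizeof(x));
	memset(y, 0x55, sizeof(y));

	copy_name(x, a);
	copy_name(y, b);

	return memcmp(x, y, NAME_SIZE) == 0;
}
```

With these helpers, two copies of the same short name compare equal even though the buffers started out with different contents, which is the property the memcmp() in the patch relies on.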

> And in your code, strncpy() will overwrite characters from the beginning to
> null, it will not overwrite after that. Then comparison with entry may not
> work because of the data after that.
Won't strncpy() pad with zeroes up to IFNAMSIZ? As I read strncpy(3), it
pads to the end of the buffer, so all IFNAMSIZ bytes are initialized.

Please correct me if I'm wrong at some point.

Thanks.

> 
> I really doubt this is critical path on anything. Probably just having a
> better hash table implementation would solve that.
> 

Well, I guess this is critical with thousands of interfaces. I have not run
any tests yet, but I can do that. Of course the difference might not be as
big as I expect, but anyway.




Re: [PATCH iproute2 3/3] ip/tunnel: Document "external" parameter

2017-12-28 Thread Stephen Hemminger

On Thu, 28 Dec 2017 13:11:42 +0200
Serhey Popovych  wrote:

> Add it to ip-link(8) "type gre" output help message
> as well as to ip-link(8) page.
> 
> Signed-off-by: Serhey Popovych 

Applied. Thanks

Re: [PATCH iproute2 1/3] vxcan,veth: Forbid "type" for peer device

2017-12-28 Thread Stephen Hemminger

On Thu, 28 Dec 2017 13:01:04 +0200
Serhey Popovych  wrote:

> It is already given for original device we configure this
> peer for.
> 
> Results from following command before/after change applied
> are shown below:
> 
>   $ ip link add dev veth1a type veth peer name veth1b \
>type veth peer name veth1c
> 
> Before:
> ---
> 
> 
> 
> After:
> --
> 
> Error: duplicate "type": "veth" is the second value.
> 
> Signed-off-by: Serhey Popovych 

Applied this one. The other util patches have some issues

Re: [PATCH v2 bpf-next 06/11] bpf: Add sock_ops RTO callback

2017-12-28 Thread Yuchung Cheng

On Thu, Dec 21, 2017 at 5:20 PM, Lawrence Brakmo  wrote:
>
> Adds an optional call to sock_ops BPF program based on whether the
> BPF_SOCK_OPS_RTO_CB_FLAG is set in bpf_sock_ops_flags.
> The BPF program is passed 2 arguments: icsk_retransmits and whether the
> RTO has expired.
>
> Signed-off-by: Lawrence Brakmo 
> ---
>  include/uapi/linux/bpf.h | 5 +
>  include/uapi/linux/tcp.h | 3 +++
>  net/ipv4/tcp_timer.c | 9 +
>  3 files changed, 17 insertions(+)
>
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 62b2c89..3cf9014 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -995,6 +995,11 @@ enum {
>  * a congestion threshold. RTTs above
>  * this indicate congestion
>  */
> +   BPF_SOCK_OPS_RTO_CB,/* Called when an RTO has triggered.
> +* Arg1: value of icsk_retransmits
> +* Arg2: value of icsk_rto
> +* Arg3: whether RTO has expired
> +*/
>  };
>
>  #define TCP_BPF_IW 1001    /* Set TCP initial congestion window */
> diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
> index b4a4f64..089c19e 100644
> --- a/include/uapi/linux/tcp.h
> +++ b/include/uapi/linux/tcp.h
> @@ -259,6 +259,9 @@ struct tcp_md5sig {
> __u8    tcpm_key[TCP_MD5SIG_MAXKEYLEN]; /* key (binary) */
>  };
>
> +/* Definitions for bpf_sock_ops_flags */
> +#define BPF_SOCK_OPS_RTO_CB_FLAG   (1<<0)
> +
>  /* INET_DIAG_MD5SIG */
>  struct tcp_diag_md5sig {
> __u8    tcpm_family;
> diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
> index 6db3124..f9c57e2 100644
> --- a/net/ipv4/tcp_timer.c
> +++ b/net/ipv4/tcp_timer.c
> @@ -215,9 +215,18 @@ static int tcp_write_timeout(struct sock *sk)
> tcp_fastopen_active_detect_blackhole(sk, expired);
can't we just call it here once w/ 'expired' as a parameter, instead
of duplicating the code?
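
A sketch of that suggestion, written as a stand-alone user-space model rather than kernel code: the callback is invoked once, with `expired` passed through, before the expired branch. Everything here (struct rto_probe, notify_rto(), write_timeout_model()) is hypothetical and only mimics the control flow at the end of tcp_write_timeout(); in the patch itself the call would be tcp_call_bpf_3arg() guarded by BPF_SOCK_OPS_TEST_FLAG().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Records what the hypothetical callback saw; stands in for the BPF program. */
struct rto_probe {
	int calls;
	unsigned int retransmits;
	unsigned int rto;
	int expired;
};

/* Stand-in for tcp_call_bpf_3arg(sk, BPF_SOCK_OPS_RTO_CB, ...). */
void notify_rto(struct rto_probe *p, unsigned int retransmits,
		unsigned int rto, bool expired)
{
	assert(p != NULL);
	p->calls++;
	p->retransmits = retransmits;
	p->rto = rto;
	p->expired = expired ? 1 : 0;
}

/* Models the tail of tcp_write_timeout() with one call site instead of two.
 * Returns 1 when the connection is given up on, 0 otherwise.
 */
int write_timeout_model(struct rto_probe *p, bool cb_enabled,
			unsigned int retransmits, unsigned int rto,
			bool expired)
{
	if (cb_enabled)
		notify_rto(p, retransmits, rto, expired);

	if (expired)
		return 1;	/* the real code calls tcp_write_err(sk) here */

	return 0;
}
```

The callback sees the same three values in both cases; only the third argument differs, so the two duplicated blocks collapse into one.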

> if (expired) {
> /* Has it gone just too far? */
> +   if (BPF_SOCK_OPS_TEST_FLAG(tp, BPF_SOCK_OPS_RTO_CB_FLAG))
> +   tcp_call_bpf_3arg(sk, BPF_SOCK_OPS_RTO_CB,
> + icsk->icsk_retransmits,
> + icsk->icsk_rto, 1);
> tcp_write_err(sk);
> return 1;
> }
> +
> +   if (BPF_SOCK_OPS_TEST_FLAG(tp, BPF_SOCK_OPS_RTO_CB_FLAG))
> +   tcp_call_bpf_3arg(sk, BPF_SOCK_OPS_RTO_CB,
> + icsk->icsk_retransmits,
> + icsk->icsk_rto, 0);
> return 0;
>  }
>
> --
> 2.9.5
>

Re: [RFT net-next v2 0/3] dwmac-meson8b: RGMII clock fixes for Meson8b

2017-12-28 Thread Emiliano Ingrassia

Hi Martin,

thank you for the quick response!

On Thu, Dec 28, 2017 at 05:58:34PM +0100, Martin Blumenstingl wrote:
> Hi Emiliano,
> 
> thank you for testing this!
> 
> On Thu, Dec 28, 2017 at 5:16 PM, Emiliano Ingrassia
>  wrote:
> > Hi Martin, Hi Dave,
> >
> > On Sun, Dec 24, 2017 at 12:40:57AM +0100, Martin Blumenstingl wrote:
> >> Hi Dave,
> >>
> >> please do not apply this series until it got a Tested-by from Emiliano.
> >>
> >>
> >> Hi Emiliano,
> >>
> >> you reported [0] that you couldn't get dwmac-meson8b to work on your
> >> Odroid-C1. With your findings (register dumps, clk_summary output, etc.)
> >> I think I was able to find a fix: it consists of two patches (which you
> >> find in this series)
> >>
> >> Unfortunately I don't have any Meson8b boards with RGMII PHY so I could
> >> only partially test this (I could only check if the clocks were
> >> calculated correctly when using a dummy 52394Hz input clock instead
> >> of MPLL2).
> >>
> >> Could you please give this series a try and let me know about the
> >> results?
> >> You obviously still need your two "ARM: dts: meson8b" patches which
> >> - add the amlogic,meson8b-dwmac" compatible to meson8b.dtsi
> >> - enable Ethernet on the Odroid-C1
> >>
> >> I have tested this myself on a Khadas VIM (GXL SoC, internal RMII PHY)
> >> and a Khadas VIM2 (GXM SoC, external RGMII PHY). Both are still working
> >> fine (so let's hope that this also fixes your Meson8b issue :)).
> >>
> >>
> >> changes since v1 at [1]:
> >> - changed the subject of the cover-letter to indicate that this is all
> >>   about the RGMII clock
> >> - added PATCH #1 which ensures that we don't unnecessarily change the
> >>   parent clocks in RMII mode (and also makes the code easier to
> >>   understand)
> >> - changed subject of PATCH #2 (formerly PATCH #1) to state that this
> >>   is about the RGMII clock
> >> - added Jerome's Reviewed-by to PATCH #2 (formerly PATCH #1)
> >> - replaced PATCH #3 (formerly PATCH #2) with one that sets
> >>   CLK_SET_RATE_PARENT on the mux and thus re-configures the MPLL2 clock
> >>   on Meson8b correctly
> >>
> >
> > Really thank you for your help and effort. I tried your patch but
> > unfortunately it didn't solve the problem.
> this is probably my fault: I forgot to mention that it requires a fix
> for the 32-bit SoCs in the clock driver ("clk: meson: mpll: use 64-bit
> maths in params_from_rate", see [0]) to work properly
>

Ok, with that patch applied I got:

xtal   112400  0 0
 sys_pll   00  12  0 0
  cpu_clk  00  12  0 0
 vid_pll   00   73200  0 0
 fixed_pll 22  255000  0 0
  mpll211   124999851  0 0
   c941.ethernet#m250_sel  11   124999851  0 0
c941.ethernet#m250_div 11   124999851  0 0
 c941.ethernet#m25_div 112471  0 0

which is equal to your result. However, the ethernet is still not working.
The prg0 register is set to 0x70A1.

A problem that I see with this solution is that MPLL2 is set to ~125 MHz.
The S805 SoC manual reports that bits 9-7 should contain a value x such
that: MPLL2 = 250 MHz * x (with x >= 1).
In our case, bits 9-7 are set to 1 which is incorrect.
I think that MPLL2 should be 250 MHz at least.
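
The bit positions mentioned here can be checked against the register value with a few shifts. This sketch only decodes the two fields discussed in this thread (bits 9-7 and bit 10 of prg0); the helper name reg_field() is made up for illustration, and the field meanings are the ones quoted from the S805 manual in this thread, not independently verified.

```c
#include <assert.h>
#include <stdint.h>

/* Extract `width` bits starting at bit `shift` from a 32-bit register value. */
uint32_t reg_field(uint32_t reg, unsigned int shift, unsigned int width)
{
	assert(width > 0 && width < 32 && shift + width <= 32);
	return (reg >> shift) & ((1u << width) - 1u);
}
```

For prg0 = 0x70A1, reg_field(0x70A1, 7, 3) yields 1 and reg_field(0x70A1, 10, 1) yields 0, which matches the values reported in this thread.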

> >
> > Here is the new clk_summary:
> >
> > xtal112400  0 0
> >  sys_pll00  12  0 0
> >   cpu_clk   00  12  0 0
> >  vid_pll00   73200  0 0
> >  fixed_pll  22  255000  0 0
> >   mpll2 11   10625  0 0
> >c941.ethernet#m250_sel   11   10625  0 0
> > c941.ethernet#m250_div  11   10625  0 0
> >  c941.ethernet#m25_div  112125  0 0
> >
> > which leads to a value of 0x70a1 in the prg0 ethernet register.
> > As you can see, something is changed but the RGMII clock is not at 25 MHz.
> > In particular, the bit 10 of prg0, which enables the "generation of 25 MHz
> > clock for PHY" (see S805 manual), is 0.
> assuming that the description in the datasheet is correct
> after Kevin and Mike got updated information from Amlogic about the
> PRG_ETHERNET0 register documentation (see [1]) we thought that bit 10
> means "0 = divide by 5, 1 = divide by 10" (see [2]). I didn't question
> this so far, but I'll test this on a newer SoC later (by forcing
> m250_div to 125MHz, then m25_div will have register value 0 = divide
> by 5)
> 
>

[PATCH net] skbuff: in skb_copy_ubufs unclone before releasing zerocopy

2017-12-28 Thread Willem de Bruijn

From: Willem de Bruijn 

skb_copy_ubufs must unclone before it is safe to modify its
skb_shared_info with skb_zcopy_clear.

Commit b90ddd568792 ("skbuff: skb_copy_ubufs must release uarg even
without user frags") ensures that all skbs release their zerocopy
state, even those without frags.

But I forgot an edge case where such an skb arrives that is cloned.

The stack does not build such packets. Vhost/tun skbs have their
frags orphaned before cloning. TCP skbs only attach zerocopy state
when a frag is added.

But if TCP packets can be trimmed or linearized, this might occur.
Tracing the code I found no instance so far (e.g., skb_linearize
ends up calling skb_zcopy_clear if !skb->data_len).

Still, it is non-obvious that no path exists. And it is fragile to
rely on this.

Fixes: b90ddd568792 ("skbuff: skb_copy_ubufs must release uarg even without user frags")
Signed-off-by: Willem de Bruijn 

---

I should have spotted this when preparing the above patch, of course.
Apologies for the added work.

Since I did not spot a path that triggers the issue, this can also
go to net-next, instead. It applies to both cleanly.
---
 net/core/skbuff.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index a3cb0be4c6f3..08f574081315 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1177,12 +1177,12 @@ int skb_copy_ubufs(struct sk_buff *skb, gfp_t gfp_mask)
int i, new_frags;
u32 d_off;
 
-   if (!num_frags)
-   goto release;
-
if (skb_shared(skb) || skb_unclone(skb, gfp_mask))
return -EINVAL;
 
+   if (!num_frags)
+   goto release;
+
new_frags = (__skb_pagelen(skb) + PAGE_SIZE - 1) >> PAGE_SHIFT;
for (i = 0; i < new_frags; i++) {
page = alloc_page(gfp_mask);
-- 
2.15.1.620.gb9897f4670-goog

Re: [PATCH net-next 0/4] mlx4 misc for 4.16

2017-12-28 Thread David Miller

From: Tariq Toukan 
Date: Thu, 28 Dec 2017 16:26:07 +0200

> This patchset contains misc cleanups and improvements
> to the mlx4 Core and Eth drivers.
> 
> In patches 1 and 2 I reduce and reorder the branches in the RX csum flow.
> In patch 3 I align the FMR unmapping flow with the device spec, to allow
>   a remapping afterwards.
> Patch 4 by Moni changes the default QoS settings so that a pause
>   frame stops all traffic regardless of its prio.
> 
> Series generated against net-next commit:
> 836df24a7062 net: hns3: hns3_get_channels() can be static

Series applied, thanks Tariq.

I can't say that that ipv6 ifdef you added in patch #1 is the nicest.

It's one thing to ifdef control entire blocks of code, or individual
values.

It's another thing to partially include pieces of code structure
such as closing parenthesis, arithmetic operators, and curly braces.
Two of those were included in the ifdef section.

For example, if we have:

if (x & (IPV4_THING | IPV6_THING)) {

I don't want to see:

if (x & (IPV4_THING |
#ifdef IPV6_TEST
 IPV6_THING)) {
#else
 0)) {
#endif

Those closing parentheses and the opening curly brace are there
regardless of the CPP test, and duplicating them in both arms of
the CPP test only makes the code more confusing to read.

I can say I would prefer:

if (x & (IPV4_THING |
#ifdef IPV6_TEST
 IPV6_THING
#else
 0
#endif
 )) {

which is better.  However, best would be:

#ifdef IPV6_TEST
#define THING_MASK  (IPV4_THING | IPV6_THING)
#else
#define THING_MASK  (IPV4_THING)
#endif

if (x & THING_MASK) {
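
As a compilable illustration of that last form, here is a small sketch. IPV4_THING, IPV6_THING and IPV6_TEST are the placeholder names from the example, given arbitrary values here; OTHER_THING and has_ip_thing() are hypothetical additions. The mask is parenthesized so that it still behaves as a single operand after macro expansion in an expression such as `x & THING_MASK`.

```c
#include <assert.h>

#define IPV4_THING	0x1
#define IPV6_THING	0x2
#define OTHER_THING	0x4	/* unrelated flag, only used for checking */

#define IPV6_TEST	1	/* pretend the IPv6 option is enabled in this build */

/* The conditional lives in one place; users of the mask see no #ifdef. */
#ifdef IPV6_TEST
#define THING_MASK	(IPV4_THING | IPV6_THING)
#else
#define THING_MASK	(IPV4_THING)
#endif

/* IPv4 is part of the mask in both arms of the conditional. */
static_assert((THING_MASK & IPV4_THING) != 0, "IPv4 must be in the mask");

/* Returns 1 when x has any of the bits selected by THING_MASK. */
int has_ip_thing(unsigned int x)
{
	return (x & THING_MASK) != 0;
}
```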

Re: [patch net-next] net: sched: don't set extack message in case the qdisc will be created

2017-12-28 Thread David Miller

From: Jiri Pirko 
Date: Thu, 28 Dec 2017 16:52:10 +0100

> From: Jiri Pirko 
> 
> If the qdisc is not found here, it is going to be created. Therefore,
> this is not an error path. Remove the extack message set and don't
> confuse user with error message in case the qdisc was created
> successfully.
> 
> Fixes: 09215598119e ("net: sched: sch_api: handle generic qdisc errors")
> Signed-off-by: Jiri Pirko 

Applied.

Re: [PATCH 1/3] net: Fix possible race in peernet2id_alloc()

2017-12-28 Thread David Miller

From: Kirill Tkhai 
Date: Thu, 28 Dec 2017 15:55:15 +0300

> Could you please clarify the status or what I should do with the patchset
> (because it's not clear for me)?

Please resend.

Re: [PATCH net v1 1/1] tipc: fix hanging poll() for stream sockets

2017-12-28 Thread David Miller

From: Parthasarathy Bhuvaragan 
Date: Thu, 28 Dec 2017 12:03:06 +0100

> In commit 42b531de17d2f6 ("tipc: Fix missing connection request
> handling"), we replaced unconditional wakeup() with conditional
> wakeup for clients with flags POLLIN | POLLRDNORM | POLLRDBAND.
> 
> This breaks the applications which do a connect followed by poll
> with POLLOUT flag. These applications are not woken when the
> connection is ESTABLISHED and hence sleep forever.
> 
> In this commit, we fix it by including the POLLOUT event for
> sockets in TIPC_CONNECTING state.
> 
> Fixes: 42b531de17d2f6 ("tipc: Fix missing connection request handling")
> Acked-by: Jon Maloy 
> Signed-off-by: Parthasarathy Bhuvaragan 

Applied and queued up for -stable.

Re: [PATCH net-next v9 0/2] add UniPhier AVE ethernet support

2017-12-28 Thread David Miller

From: Kunihiko Hayashi 
Date: Thu, 28 Dec 2017 15:58:10 +0900

> This series adds support for Socionext AVE ethernet controller implemented
> on UniPhier SoCs. This driver supports RGMII/RMII modes.

Series applied.

Re: [PATCH net-next] cxgb4/cxgb4vf: support for XLAUI Port Type

2017-12-28 Thread David Miller

From: Ganesh Goudar 
Date: Thu, 28 Dec 2017 12:07:15 +0530

> Add support for new Backplane XLAUI port type.
> 
> Signed-off-by: Casey Leedom 
> Signed-off-by: Ganesh Goudar 

Applied.

Fw: [Bug 198297] New: Unable to add ethX to bridge if ethX. is already present in this bridge

2017-12-28 Thread Stephen Hemminger

I don't think this is ever going to work as expected.

Begin forwarded message:

Date: Thu, 28 Dec 2017 08:38:37 +
From: bugzilla-dae...@bugzilla.kernel.org
To: step...@networkplumber.org
Subject: [Bug 198297] New: Unable to add ethX to bridge if ethX. is already present in this bridge

https://bugzilla.kernel.org/show_bug.cgi?id=198297

            Bug ID: 198297
           Summary: Unable to add ethX to bridge if ethX. is already
                    present in this bridge
           Product: Networking
           Version: 2.5
    Kernel Version: 4.14.2
          Hardware: ARM
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: Other
          Assignee: step...@networkplumber.org
          Reporter: alexander_cheremshin...@yahoo.com
        Regression: No

Kernel fails adding ethX to bridge if ethX. is already present in this
bridge.
Steps to reproduce:
# vconfig add eth2 10
# brctl addbr br
# brctl addif br eth2.10
# brctl show
bridge name     bridge id               STP enabled     interfaces
br              8000.0024a407481a       no              eth2.10
# brctl addif br eth2
can't add eth2 to bridge br: File exists
# brctl show
bridge name     bridge id               STP enabled     interfaces
br              8000.0024a407481a       no              eth2.10

But it is ok if ethX is added before ethX.
Steps to reproduce:
# brctl delif br eth2.10
# brctl addif br eth2
# brctl addif br eth2.10
# brctl show
bridge name     bridge id               STP enabled     interfaces
br              8000.0024a407481a       no              eth2
                                                        eth2.10

So the result depends on the order in which the interfaces are added, which
does not look logical to me. This works fine at least in kernel 3.10.70.

From my investigation it fails in function __netdev_upper_dev_link
(net/core/dev.c) on lines:
if (netdev_has_upper_dev(dev, upper_dev))
return -EEXIST;
I checked the source code of kernel 4.14.8, but it looks the same, so I think
it also has this issue.
I'm not familiar enough with the Linux kernel to fix this myself, so it would
be very nice to get a patch that fixes this issue, or an explanation of why
such behavior is correct.
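
To see why the order could matter, the quoted check can be modelled in user space. The following is a toy model, not kernel code: it assumes that netdev_has_upper_dev() follows upper-device links transitively, and that a VLAN device is an upper device of its parent while a bridge is an upper device of each of its ports. struct toy_dev, toy_has_upper() and toy_link() are invented for this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_MAX_UPPER 4

struct toy_dev {
	const char *name;
	struct toy_dev *upper[TOY_MAX_UPPER];
	int n_upper;
};

/* Is `target` reachable from `dev` by following upper links? */
int toy_has_upper(const struct toy_dev *dev, const struct toy_dev *target)
{
	int i;

	for (i = 0; i < dev->n_upper; i++) {
		if (dev->upper[i] == target ||
		    toy_has_upper(dev->upper[i], target))
			return 1;
	}
	return 0;
}

/* Make `upper` an upper device of `dev`; refuse if it already is one,
 * directly or indirectly, as the quoted check does.
 */
int toy_link(struct toy_dev *dev, struct toy_dev *upper)
{
	assert(dev != NULL && upper != NULL);
	if (toy_has_upper(dev, upper))
		return -EEXIST;
	if (dev->n_upper == TOY_MAX_UPPER)
		return -ENOSPC;
	dev->upper[dev->n_upper++] = upper;
	return 0;
}
```

In this model, linking eth2 -> eth2.10 and then eth2.10 -> br makes br reachable from eth2, so a later eth2 -> br link returns -EEXIST. Adding eth2 to br first and eth2.10 afterwards succeeds, because eth2.10 has no upper devices at that point. Whether this is the real cause depends on the transitivity assumption stated above.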

Thanks in advance,
Alex.

-- 
You are receiving this mail because:
You are the assignee for the bug.

Re: [PATCH net-next v6 0/6] net: tcp: sctp: dccp: Replace jprobe usage with trace events

2017-12-28 Thread David Miller

From: Masami Hiramatsu 
Date: Thu, 28 Dec 2017 15:10:00 +0900

> Changes from v5:
>   [1/6]: Avoid preprocessor directives in tracepoint macro args

Patch #1 is not the only patch which has this problem, at a minimum
patch #5 has it too.

Please audit the entire series for an issue when it is brought to your
attention.

Thank you.

Re: [PATCH net-next] cxgb4: display VNI correctly

2017-12-28 Thread David Miller

From: Ganesh Goudar 
Date: Thu, 28 Dec 2017 11:29:52 +0530

> Fix incorrect VNI display in mps_tcam
> 
> Signed-off-by: Santosh Rastapur 
> Signed-off-by: Ganesh Goudar 

Applied.

Re: [RFT net-next v2 0/3] dwmac-meson8b: RGMII clock fixes for Meson8b

2017-12-28 Thread Martin Blumenstingl

Hi Emiliano,

thank you for testing this!

On Thu, Dec 28, 2017 at 5:16 PM, Emiliano Ingrassia
 wrote:
> Hi Martin, Hi Dave,
>
> On Sun, Dec 24, 2017 at 12:40:57AM +0100, Martin Blumenstingl wrote:
>> Hi Dave,
>>
>> please do not apply this series until it got a Tested-by from Emiliano.
>>
>>
>> Hi Emiliano,
>>
>> you reported [0] that you couldn't get dwmac-meson8b to work on your
>> Odroid-C1. With your findings (register dumps, clk_summary output, etc.)
>> I think I was able to find a fix: it consists of two patches (which you
>> find in this series)
>>
>> Unfortunately I don't have any Meson8b boards with RGMII PHY so I could
>> only partially test this (I could only check if the clocks were
>> calculated correctly when using a dummy 52394Hz input clock instead
>> of MPLL2).
>>
>> Could you please give this series a try and let me know about the
>> results?
>> You obviously still need your two "ARM: dts: meson8b" patches which
>> - add the amlogic,meson8b-dwmac" compatible to meson8b.dtsi
>> - enable Ethernet on the Odroid-C1
>>
>> I have tested this myself on a Khadas VIM (GXL SoC, internal RMII PHY)
>> and a Khadas VIM2 (GXM SoC, external RGMII PHY). Both are still working
>> fine (so let's hope that this also fixes your Meson8b issue :)).
>>
>>
>> changes since v1 at [1]:
>> - changed the subject of the cover-letter to indicate that this is all
>>   about the RGMII clock
>> - added PATCH #1 which ensures that we don't unnecessarily change the
>>   parent clocks in RMII mode (and also makes the code easier to
>>   understand)
>> - changed subject of PATCH #2 (formerly PATCH #1) to state that this
>>   is about the RGMII clock
>> - added Jerome's Reviewed-by to PATCH #2 (formerly PATCH #1)
>> - replaced PATCH #3 (formerly PATCH #2) with one that sets
>>   CLK_SET_RATE_PARENT on the mux and thus re-configures the MPLL2 clock
>>   on Meson8b correctly
>>
>
> Really thank you for your help and effort. I tried your patch but
> unfortunately it didn't solve the problem.
this is probably my fault: I forgot to mention that it requires a fix
for the 32-bit SoCs in the clock driver ("clk: meson: mpll: use 64-bit
maths in params_from_rate", see [0]) to work properly

>
> Here is the new clk_summary:
>
> xtal112400  0 0
>  sys_pll00  12  0 0
>   cpu_clk   00  12  0 0
>  vid_pll00   73200  0 0
>  fixed_pll  22  255000  0 0
>   mpll2 11   10625  0 0
>c941.ethernet#m250_sel   11   10625  0 0
> c941.ethernet#m250_div  11   10625  0 0
>  c941.ethernet#m25_div  112125  0 0
>
> which leads to a value of 0x70a1 in the prg0 ethernet register.
> As you can see, something is changed but the RGMII clock is not at 25 MHz.
> In particular, the bit 10 of prg0, which enables the "generation of 25 MHz
> clock for PHY" (see S805 manual), is 0.
assuming that the description in the datasheet is correct
after Kevin and Mike got updated information from Amlogic about the
PRG_ETHERNET0 register documentation (see [1]) we thought that bit 10
means "0 = divide by 5, 1 = divide by 10" (see [2]). I didn't question
this so far, but I'll test this on a newer SoC later (by forcing
m250_div to 125MHz, then m25_div will have register value 0 = divide
by 5)

if the description from the documentation is correct then we need to
replace m25_div with a fixed-factor clock (mult = 1, div = 5) and make
it a m25_en gate clock instead
the resulting clock path would look like this: mpll2 > m250_sel >
m250_div > fixed_factor > m25_en
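
The arithmetic behind that proposed clock path can be written down as a short sketch. It is not driver code: rgmii_phy_clk_hz() is a hypothetical helper, the fixed divide-by-5 is the factor suggested here, and the example rates (a 250 MHz parent with a divider of 2) are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Rate at the end of: parent > m250_div > fixed_factor (1/5) > m25_en.
 * Returns 0 when the m25_en gate is disabled.
 */
unsigned long rgmii_phy_clk_hz(unsigned long parent_hz,
			       unsigned int m250_div, bool m25_en)
{
	assert(m250_div > 0);

	if (!m25_en)
		return 0;

	return parent_hz / m250_div / 5;
}
```

With a 250 MHz parent and m250_div = 2, the intermediate rate is 125 MHz and the gated output is 25 MHz, which is the PHY clock the thread is trying to reach.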

> Please, if you have other suggestions let me know.
could you please re-test this with the patch from [0] applied? no
other changes should be needed!

> Best regards,
>
> Emiliano
>
>>
>> [0] http://lists.infradead.org/pipermail/linux-amlogic/2017-December/005596.html
>> [1] http://lists.infradead.org/pipermail/linux-amlogic/2017-December/005848.html
>>
>>
>> Martin Blumenstingl (3):
>>   net: stmmac: dwmac-meson8b: only configure the clocks in RGMII mode
>>   net: stmmac: dwmac-meson8b: fix setting the RGMII clock on Meson8b
>>   net: stmmac: dwmac-meson8b: propagate rate changes to the parent clock
>>
>>  .../net/ethernet/stmicro/stmmac/dwmac-meson8b.c| 55 +++---
>>  1 file changed, 27 insertions(+), 28 deletions(-)
>>
>> --
>> 2.15.1
>>

Regards
Martin


[0] https://patchwork.kernel.org/patch/10131677/
[1] https://www.mail-archive.com/netdev@vger.kernel.org/msg119290.html
[2] https://www.mail-archive.com/netdev@vger.kernel.org/msg119293.html

Re: [patch net-next v2 00/10] Add support for resource abstraction

2017-12-28 Thread Jiri Pirko

Thu, Dec 28, 2017 at 05:33:58PM CET, d...@cumulusnetworks.com wrote:
>On 12/28/17 10:23 AM, Jiri Pirko wrote:
>>> So there are 4 tables exported to userspace:
>>>
>>> 1. mlxsw_erif table which is not in any of the kvd regions (no resource
>>> path is given) and it has a size of 1000. Does mlxsw_erif mean a rif as
>>> in Router Interfaces? So the switch supports up to 1000 router interfaces.
>>>
>>> 2. mlxsw_host4 in /kvd/hash_single with a size of 62. Based on the
>> Size tells you the actual size. It cannot give you max size. The reason
>> is simple. The resources are shared among multiple tables. That is
>> exactly what this resource patch makes visible.
>> 
>> 
>
>In the erif table, the 1000 is the max not current usage. I do not have

I believe that is a bug in the erif dpipe implementation
(mlxsw_sp_dpipe_table_erif_size_get). We'll fix that. Thanks!



>1000 interfaces:
>
>$ ip -br li sh | wc -l
>597
>
>
>$ devlink dpipe table dump pci/:03:00.0 name mlxsw_erif
>...
>  index 503
>  match_value:
>type field_exact header mlxsw_meta field erif_port mapping ifindex
>mapping_value 601 value 503
>  action_value:
>type field_modify header mlxsw_meta field l3_forward value 1
>
>
>The host4 table it is current size with no maximum.
>
>The meaning of table size needs to be consistent across tables.

Re: [patch net-next v2 00/10] Add support for resource abstraction

2017-12-28 Thread David Ahern

On 12/28/17 10:23 AM, Jiri Pirko wrote:
>> So there are 4 tables exported to userspace:
>>
>> 1. mlxsw_erif table which is not in any of the kvd regions (no resource
>> path is given) and it has a size of 1000. Does mlxsw_erif mean a rif as
>> in Router Interfaces? So the switch supports up to 1000 router interfaces.
>>
>> 2. mlxsw_host4 in /kvd/hash_single with a size of 62. Based on the
> Size tells you the actual size. It cannot give you max size. The reason
> is simple. The resources are shared among multiple tables. That is
> exactly what this resource patch makes visible.
> 
> 

In the erif table, the 1000 is the max not current usage. I do not have
1000 interfaces:

$ ip -br li sh | wc -l
597

$ devlink dpipe table dump pci/:03:00.0 name mlxsw_erif
...
  index 503
  match_value:
type field_exact header mlxsw_meta field erif_port mapping ifindex
mapping_value 601 value 503
  action_value:
type field_modify header mlxsw_meta field l3_forward value 1

For the host4 table, it is the current size with no maximum.

The meaning of table size needs to be consistent across tables.

Re: WARNING in strp_data_ready

2017-12-28 Thread Dmitry Vyukov

On Thu, Dec 28, 2017 at 5:14 PM, Tom Herbert  wrote:
> On Thu, Dec 28, 2017 at 12:59 AM, Ozgur  wrote:
>>
>>
>> 28.12.2017, 04:19, "Tom Herbert" :
>>> On Wed, Dec 27, 2017 at 12:20 PM, Ozgur  wrote:
  27.12.2017, 23:14, "Dmitry Vyukov" :
>  On Wed, Dec 27, 2017 at 9:08 PM, Ozgur  wrote:
>>   27.12.2017, 22:21, "Dmitry Vyukov" :
>>>   On Wed, Dec 27, 2017 at 8:09 PM, Tom Herbert  
>>> wrote:
Did you try the patch I posted?
>>>
>>>   Hi Tom,
>>
>>   Hello Dmitry,
>>
>>>   No. And I didn't know I need to. Why?
>>>   If you think the patch needs additional testing, you can ask syzbot to
>>>   test it. See 
>>> https://github.com/google/syzkaller/blob/master/docs/syzbot.md#communication-with-syzbot
>>>   Otherwise proceed with committing it. Or what are we waiting for?
>>>
>>>   Thanks
>>
>>   I think we need to fixed patch for crash, in fact check to patch code 
>> and test solve the bug.
>>   How do test it because there is no patch in the following bug?
>
>  Hi Ozgur,
>
>  I am not sure I completely understand what you mean. But the
>  reproducer for this bug (which one can use for testing) is here:
>  https://groups.google.com/forum/#!topic/syzkaller-bugs/Kxs05ziCpgY
>  Tom also mentions there is some patch for this, but I don't know where
>  it is, it doesn't seem to be referenced from this thread.

  Hello Dmitry,

  Ah, I'm sorry I don't seen Tom mail and I don't have a patch not tested :)
  I think Tom send patch to only you and are you tested?

  kcmsock.c will change and strp_data_ready I think locked.

  Tom, please send a patch for me? I can test and inform you.
>>>
>>> Hi Ozgur,
>>>
>>> I reposted the patches as RFC "kcm: Fix lockdep issue". Please test if you 
>>> can!
>>>
>>> Thanks,
>>> Tom
>>
>> Hello Tom,
>>
>> Which are you use the repos? I pulled but I don't seen this patches.
>>
> They are not in any public repo yet. I posted the patches to netdev
> list so they can be reviewed and tested by third parties. Posting
> patches to the list a normal path to get patches into the kernel
> (http://nickdesaulniers.github.io/blog/2017/05/16/submitting-your-first-patch-to-the-linux-kernel-and-responding-to-feedback/).
>
> These patches were applied to net-next but are simple enough that they
> should apply to other branches. I will repost and target to net per
> Dave's directive once they are verified to fix the issue.

FWIW they are already verified to fix the issue, see a few emails up, also here:
https://groups.google.com/d/msg/syzkaller-bugs/Kxs05ziCpgY/fPdZcO_GAwAJ
and don't forget this:
https://groups.google.com/d/msg/syzkaller-bugs/Kxs05ziCpgY/uGjsrA3HAwAJ

Re: [patch net-next v2 00/10] Add support for resource abstraction

2017-12-28 Thread Jiri Pirko

Thu, Dec 28, 2017 at 05:09:09PM CET, d...@cumulusnetworks.com wrote:
>On 12/28/17 2:25 AM, Yuval Mintz wrote:
> Again, I have no objections to kvd, linear, hash, etc terms as they do
> relate to Mellanox products. But kvd/linear, for example, does correlate
> to industry standard concepts in some way. My request is that the
> resource listing guide the user in some way, stating what these
> resources mean.

 So would the relation shown to the dpipe table be enough, or would you
 still like to see some description? I don't like the description concept
 here, as the relations to the dpipe table should tell the user exactly
 what he needs to know.
>>> I believe it is useful to have a 1-line, short description that gives
>>> the user some memory jogger as to what the resource is used for. It does
>>> not have to be an exhaustive list, but the user should not have to do
>>> mental jumping jacks running a bunch of commands to understand the
>>> resources for vendor specific asics.
>> 
>> Perhaps we can simply have devlink utility output the dpipe
>> table[s] associated with the resource when showing the resource?
>> It would contain live information as well as prevent the need for
>> 'mental jumping jacks'.
>> 
>
>My primary contention for this static partitioning is that the proposal
>is not giving the user the information they need to make decisions.
>
>As I mentioned earlier, the resource show command gives this:
>$ devlink resource show pci/:03:00.0
>pci/:03:00.0:
>  name kvd size 245760 size_valid true
>  resources:
>name linear size 98304 occ 0
>name hash_double size 60416
>name hash_single size 87040
>
>the paths /kvd/linear, /kvd/hash_single and /kvd/hash_double are
>essentially random names (nothing related to industry standard names)

Of course. There is no industry standard for internal ASIC
implementations. This is the same as for dpipe. There is no industry
standard for ASIC pipeline. dpipe visualizes it. This resource patch
visualizes the internal ASIC resources and their mapping to the
individual dpipe tables.


>and the listed sizes are random numbers (no units)[1]. There is nothing
>there to tell a user what they can adjust or why they would want to make
>an adjustment.
>
>
>Looking at 'dpipe table show':
>
>$ devlink dpipe table show pci/:03:00.0
>pci/:03:00.0:
>  name mlxsw_erif size 1000 counters_enabled false
>  match:
>    type field_exact header mlxsw_meta field erif_port mapping ifindex
>  action:
>    type field_modify header mlxsw_meta field l3_forward
>    type field_modify header mlxsw_meta field l3_drop
>
>  resource_path /kvd/hash_single name mlxsw_host4 size 62 counters_enabled false
>  match:
>    type field_exact header mlxsw_meta field erif_port mapping ifindex
>    type field_exact header ipv4 field destination ip
>  action:
>    type field_modify header ethernet field destination mac
>
>  resource_path /kvd/hash_double name mlxsw_host6 size 0 counters_enabled false
>  match:
>    type field_exact header mlxsw_meta field erif_port mapping ifindex
>    type field_exact header ipv6 field destination ip
>  action:
>    type field_modify header ethernet field destination mac
>
>  resource_path /kvd/linear name mlxsw_adj size 0 counters_enabled false
>  match:
>    type field_exact header mlxsw_meta field adj_index
>    type field_exact header mlxsw_meta field adj_size
>    type field_exact header mlxsw_meta field adj_hash_index
>  action:
>    type field_modify header ethernet field destination mac
>    type field_modify header mlxsw_meta field erif_port mapping ifindex
>
>
>So there are 4 tables exported to userspace:
>
>1. mlxsw_erif table which is not in any of the kvd regions (no resource
>path is given) and it has a size of 1000. Does mlxsw_erif mean a rif as
>in Router Interfaces? So the switch supports up to 1000 router interfaces.
>
>2. mlxsw_host4 in /kvd/hash_single with a size of 62. Based on the

Size tells you the actual size. It cannot give you max size. The reason
is simple. The resources are shared among multiple tables. That is
exactly what this resource patch makes visible.


>fields mlxsw_host4 means IPv4 host entries (neighbor entries). Why is
>the size set at 62? Seems really low.
>
>3. mlxsw_host6 in /kvd/hash_double with a size of 0. Based on the fields
>mlxsw_host6 means IPv6 host entries (neighbor entries). The size of 0 is
>concerning. I guess the switch is not configured to do IPv6?
>
>4. mlxsw_adj in /kvd/linear with a size of 0. Based on the fields I am
>going to guess it is an fdb entry

Re: [RFT net-next v2 0/3] dwmac-meson8b: RGMII clock fixes for Meson8b

2017-12-28 Thread Emiliano Ingrassia

Hi Martin, Hi Dave,

On Sun, Dec 24, 2017 at 12:40:57AM +0100, Martin Blumenstingl wrote:
> Hi Dave,
> 
> please do not apply this series until it got a Tested-by from Emiliano.
> 
> 
> Hi Emiliano,
> 
> you reported [0] that you couldn't get dwmac-meson8b to work on your
> Odroid-C1. With your findings (register dumps, clk_summary output, etc.)
> I think I was able to find a fix: it consists of two patches (which you
> find in this series)
> 
> Unfortunately I don't have any Meson8b boards with RGMII PHY so I could
> only partially test this (I could only check if the clocks were
> calculated correctly when using a dummy 52394Hz input clock instead
> of MPLL2).
> 
> Could you please give this series a try and let me know about the
> results?
> You obviously still need your two "ARM: dts: meson8b" patches which
> - add the amlogic,meson8b-dwmac" compatible to meson8b.dtsi
> - enable Ethernet on the Odroid-C1
> 
> I have tested this myself on a Khadas VIM (GXL SoC, internal RMII PHY)
> and a Khadas VIM2 (GXM SoC, external RGMII PHY). Both are still working
> fine (so let's hope that this also fixes your Meson8b issue :)).
> 
> 
> changes since v1 at [1]:
> - changed the subject of the cover-letter to indicate that this is all
>   about the RGMII clock
> - added PATCH #1 which ensures that we don't unnecessarily change the
>   parent clocks in RMII mode (and also makes the code easier to
>   understand)
> - changed subject of PATCH #2 (formerly PATCH #1) to state that this
>   is about the RGMII clock
> - added Jerome's Reviewed-by to PATCH #2 (formerly PATCH #1)
> - replaced PATCH #3 (formerly PATCH #2) with one that sets
>   CLK_SET_RATE_PARENT on the mux and thus re-configures the MPLL2 clock
>   on Meson8b correctly
>

Thank you very much for your help and effort. I tried your patches, but
unfortunately they did not solve the problem.

Here is the new clk_summary:

xtal112400  0 0
 sys_pll00  12  0 0
  cpu_clk   00  12  0 0
 vid_pll00   73200  0 0
 fixed_pll  22  255000  0 0
  mpll2 11   10625  0 0
   c941.ethernet#m250_sel   11   10625  0 0
c941.ethernet#m250_div  11   10625  0 0
 c941.ethernet#m25_div  112125  0 0

which leads to a value of 0x70a1 in the prg0 Ethernet register.
As you can see, something has changed, but the RGMII clock is still not at 25 MHz.
In particular, bit 10 of prg0, which enables the "generation of 25 MHz
clock for PHY" (see the S805 manual), is 0.
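
The bit-10 observation can be checked by hand. The following is only a minimal sketch: the macro name is made up for illustration, and the meaning of bit 10 is taken from the description above rather than from the driver source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: bit 10 of prg0 is the "generation of 25 MHz clock for PHY"
 * enable, as described in the message. The macro name is hypothetical. */
#define PRG0_PHY_25M_CLK_EN_BIT 10u

/* Return true if the given bit (0..31) is set in a 32-bit register value. */
static bool reg_bit_is_set(uint32_t reg, unsigned int bit)
{
	assert(bit < 32);
	return (reg >> bit) & 1u;
}
```

With the reported value, `reg_bit_is_set(0x70a1, PRG0_PHY_25M_CLK_EN_BIT)` evaluates to false, since 0x70a1 has bits 0, 5, 7, 12, 13 and 14 set and nothing at bit 10 (0x0400).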

If you have other suggestions, please let me know.

Best regards,

Emiliano

> 
> [0] 
> http://lists.infradead.org/pipermail/linux-amlogic/2017-December/005596.html
> [1] 
> http://lists.infradead.org/pipermail/linux-amlogic/2017-December/005848.html
> 
> 
> Martin Blumenstingl (3):
>   net: stmmac: dwmac-meson8b: only configure the clocks in RGMII mode
>   net: stmmac: dwmac-meson8b: fix setting the RGMII clock on Meson8b
>   net: stmmac: dwmac-meson8b: propagate rate changes to the parent clock
> 
>  .../net/ethernet/stmicro/stmmac/dwmac-meson8b.c | 55 +++---
>  1 file changed, 27 insertions(+), 28 deletions(-)
> 
> -- 
> 2.15.1
>

Re: WARNING in strp_data_ready

2017-12-28 Thread Tom Herbert

On Thu, Dec 28, 2017 at 12:59 AM, Ozgur  wrote:
>
>
> 28.12.2017, 04:19, "Tom Herbert" :
>> On Wed, Dec 27, 2017 at 12:20 PM, Ozgur  wrote:
>>>  27.12.2017, 23:14, "Dmitry Vyukov" :
>>>>  On Wed, Dec 27, 2017 at 9:08 PM, Ozgur  wrote:
>>>>>   27.12.2017, 22:21, "Dmitry Vyukov" :
>>>>>>   On Wed, Dec 27, 2017 at 8:09 PM, Tom Herbert  wrote:
>>>>>>>    Did you try the patch I posted?
>>>>>>
>>>>>>   Hi Tom,
>>>>>
>>>>>   Hello Dmitry,
>>>>>
>>>>>>   No. And I didn't know I need to. Why?
>>>>>>   If you think the patch needs additional testing, you can ask syzbot to
>>>>>>   test it. See https://github.com/google/syzkaller/blob/master/docs/syzbot.md#communication-with-syzbot
>>>>>>   Otherwise proceed with committing it. Or what are we waiting for?
>>>>>>
>>>>>>   Thanks
>>>>>
>>>>>   I think we need to fixed patch for crash, in fact check to patch code and test solve the bug.
>>>>>   How do test it because there is no patch in the following bug?
>>>>
>>>>  Hi Ozgur,
>>>>
>>>>  I am not sure I completely understand what you mean. But the
>>>>  reproducer for this bug (which one can use for testing) is here:
>>>>  https://groups.google.com/forum/#!topic/syzkaller-bugs/Kxs05ziCpgY
>>>>  Tom also mentions there is some patch for this, but I don't know where
>>>>  it is, it doesn't seem to be referenced from this thread.
>>>
>>>  Hello Dmitry,
>>>
>>>  Ah, I'm sorry I don't seen Tom mail and I don't have a patch not tested :)
>>>  I think Tom send patch to only you and are you tested?
>>>
>>>  kcmsock.c will change and strp_data_ready I think locked.
>>>
>>>  Tom, please send a patch for me? I can test and inform you.
>>
>> Hi Ozgur,
>>
>> I reposted the patches as RFC "kcm: Fix lockdep issue". Please test if you can!
>>
>> Thanks,
>> Tom
>
> Hello Tom,
>
> Which are you use the repos? I pulled but I don't seen this patches.
>
They are not in any public repo yet. I posted the patches to netdev
list so they can be reviewed and tested by third parties. Posting
patches to the list is a normal path to get patches into the kernel
(http://nickdesaulniers.github.io/blog/2017/05/16/submitting-your-first-patch-to-the-linux-kernel-and-responding-to-feedback/).

These patches were applied to net-next but are simple enough that they
should apply to other branches. I will repost and target to net per
Dave's directive once they are verified to fix the issue.

Tom

Re: [patch net-next v2 00/10] Add support for resource abstraction

2017-12-28 Thread David Ahern

On 12/28/17 2:25 AM, Yuval Mintz wrote:
>>>> Again, I have no objections to kvd, linear, hash, etc terms as they do
>>>> relate to Mellanox products. But kvd/linear, for example, does correlate
>>>> to industry standard concepts in some way. My request is that the
>>>> resource listing guide the user in some way, stating what these
>>>> resources mean.
>>>
>>> So the showed relation to dpipe table would be enough or you would still
>>> like to see some description? I don't like the description concept here
>>> as the relations to dpipe table should tell user exactly what he needs
>>> to know.
>>
>> I believe it is useful to have a 1-line, short description that gives
>> the user some memory jogger as to what the resource is used for. It does
>> not have to be an exhaustive list, but the user should not have to do
>> mental jumping jacks running a bunch of commands to understand the
>> resources for vendor specific asics.
> 
> Perhaps we can simply have devlink utility output the dpipe
> table[s] associated with the resource when showing the resource?
> It would contain live information as well as prevent the need for
> 'mental jumping jacks'.
> 

My primary contention for this static partitioning is that the proposal
is not giving the user the information they need to make decisions.

As I mentioned earlier, the resource show command gives this:
$ devlink resource show pci/:03:00.0
pci/:03:00.0:
  name kvd size 245760 size_valid true
  resources:
    name linear size 98304 occ 0
    name hash_double size 60416
    name hash_single size 87040

the paths /kvd/linear, /kvd/hash_single and /kvd/hash_double are
essentially random names (nothing related to industry standard names)
and the listed sizes are random numbers (no units)[1]. There is nothing
there to tell a user what they can adjust or why they would want to make
an adjustment.
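
The one relationship that can be read off this listing without further documentation is that the three child sizes add up to the parent size. A minimal sketch of that check follows; the helper and variable names are made up for illustration, and the numbers are copied from the output above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum the sizes of the child resources of one parent resource. */
static uint64_t sum_sizes(const uint64_t *sizes, size_t n)
{
	uint64_t total = 0;

	assert(sizes != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		total += sizes[i];
	return total;
}

/* Sizes copied from the "devlink resource show" output above. */
static const uint64_t kvd_size = 245760;
static const uint64_t kvd_children[] = {
	98304,	/* linear */
	60416,	/* hash_double */
	87040,	/* hash_single */
};
```

Here `sum_sizes(kvd_children, 3)` equals `kvd_size`, which suggests the children partition the parent, but it still says nothing about units or about which split is appropriate for a given workload.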

Looking at 'dpipe table show':

$ devlink dpipe table show pci/:03:00.0
pci/:03:00.0:
  name mlxsw_erif size 1000 counters_enabled false
  match:
    type field_exact header mlxsw_meta field erif_port mapping ifindex
  action:
    type field_modify header mlxsw_meta field l3_forward
    type field_modify header mlxsw_meta field l3_drop

  resource_path /kvd/hash_single name mlxsw_host4 size 62 counters_enabled false
  match:
    type field_exact header mlxsw_meta field erif_port mapping ifindex
    type field_exact header ipv4 field destination ip
  action:
    type field_modify header ethernet field destination mac

  resource_path /kvd/hash_double name mlxsw_host6 size 0 counters_enabled false
  match:
    type field_exact header mlxsw_meta field erif_port mapping ifindex
    type field_exact header ipv6 field destination ip
  action:
    type field_modify header ethernet field destination mac

  resource_path /kvd/linear name mlxsw_adj size 0 counters_enabled false
  match:
    type field_exact header mlxsw_meta field adj_index
    type field_exact header mlxsw_meta field adj_size
    type field_exact header mlxsw_meta field adj_hash_index
  action:
    type field_modify header ethernet field destination mac
    type field_modify header mlxsw_meta field erif_port mapping ifindex

So there are 4 tables exported to userspace:

1. mlxsw_erif table which is not in any of the kvd regions (no resource
path is given) and it has a size of 1000. Does mlxsw_erif mean a rif as
in Router Interfaces? So the switch supports up to 1000 router interfaces.

2. mlxsw_host4 in /kvd/hash_single with a size of 62. Based on the
fields mlxsw_host4 means IPv4 host entries (neighbor entries). Why is
the size set at 62? Seems really low.

3. mlxsw_host6 in /kvd/hash_double with a size of 0. Based on the fields
mlxsw_host6 means IPv6 host entries (neighbor entries). The size of 0 is
concerning. I guess the switch is not configured to do IPv6?

4. mlxsw_adj in /kvd/linear with a size of 0. Based on the fields I am
going to guess it is an fdb entry

So if I combine the output of "resource show" and "dpipe table show", I
can conclude:

1. /kvd/linear is only used for fdb entries. So if I want an L3-only use
case I can set /kvd/linear to 0. Is that correct? I believe the answer is
no, but based on the information given it seems that way.

2. /kvd/hash_double has a size of 60416, but it is only used for
mlxsw_host6 table which has a size of 0. Now, I am confused.

3. /kvd/hash_single has a size of 87040, but it is only used for
mlxsw_host4 table which has a size of 62. Again confused.

What is a user to make of this output? How is it useful for making
decisions on whether to increase or decrease some resource?

[1] In a response, Jiri mentioned units are added by this patch set but
all I see in the patches is this:

@@ -245,4 +256,8 @@ enum devlink_dpipe_header_id {
 	DEVLINK_DPIPE_HEADER_IPV6,
 };
 
+enum devlink_resource_unit {
+	DEVLINK_RESOURCE_UNIT_DOUBLE_WORD,
+};
+
 #endif /* _UAPI_LINUX_DEVLINK_H_ */

DEVLINK_RESOURCE_UNIT_DOUBLE_WORD means what???

Re: [PATCH net-next v2 1/3] virtio_net: propagate linkspeed/duplex settings from the hypervisor

2017-12-28 Thread Jason Baron



On 12/27/2017 04:43 PM, David Miller wrote:
> From: Jason Baron 
> Date: Fri, 22 Dec 2017 16:54:01 -0500
> 
>> The ability to set speed and duplex for virtio_net is useful in various
>> scenarios as described here:
>>
>> 16032be virtio_net: add ethtool support for set and get of settings
>>
>> However, it would be nice to be able to set this from the hypervisor,
>> such that virtio_net doesn't require custom guest ethtool commands.
>>
>> Introduce a new feature flag, VIRTIO_NET_F_SPEED_DUPLEX, which allows
>> the hypervisor to export a linkspeed and duplex setting. The user can
>> subsequently overwrite it later if desired via: 'ethtool -s'.
>>
>> Signed-off-by: Jason Baron 
>> Cc: "Michael S. Tsirkin" 
>> Cc: Jason Wang 
> 
> Looks mostly fine to me but need some virtio_net reviewers on this one.
> 
>> @@ -57,6 +57,8 @@
>>   * Steering */
>>  #define VIRTIO_NET_F_CTRL_MAC_ADDR 23   /* Set MAC address */
>>  
>> +#define VIRTIO_NET_F_SPEED_DUPLEX 63	/* Host set linkspeed and duplex */
>> +
> 
> Why use a value so far away from the largest existing one?
> 
> Just curious.
> 

So that came from a discussion with Michael about which bit to use for
this, and he suggested using 63:

"
Transports started from bit 24 and are growing up.
So I would say devices should start from bit 63 and grow down.
"

https://patchwork.ozlabs.org/patch/848814/#1826669

I will add a comment to explain it.
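
As a side note on why bit 63 needs some care: a feature number that high only fits in a 64-bit feature word, so the mask has to be built from a 64-bit constant. A minimal sketch, with hypothetical helper names and only the two bit numbers taken from the quoted hunk:

```c
#include <assert.h>
#include <stdint.h>

#define VIRTIO_NET_F_CTRL_MAC_ADDR 23	/* from the quoted hunk */
#define VIRTIO_NET_F_SPEED_DUPLEX  63	/* from the quoted hunk */

/* Build the mask for one feature bit; bit 63 requires a 64-bit type. */
static uint64_t feature_mask(unsigned int fbit)
{
	assert(fbit < 64);
	return UINT64_C(1) << fbit;
}

/* Test a feature bit in a 64-bit feature word (hypothetical helper). */
static int has_feature(uint64_t features, unsigned int fbit)
{
	return (features & feature_mask(fbit)) != 0;
}
```

Shifting a plain `int` constant by 63 would be undefined behaviour in C, which is the main reason the sketch uses `UINT64_C(1)`.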

Thanks,

-Jason

[patch net-next] net: sched: don't set extack message in case the qdisc will be created

2017-12-28 Thread Jiri Pirko

From: Jiri Pirko 

If the qdisc is not found here, it is going to be created. Therefore,
this is not an error path. Remove the extack message and don't
confuse the user with an error message in case the qdisc was created
successfully.

Fixes: 09215598119e ("net: sched: sch_api: handle generic qdisc errors")
Signed-off-by: Jiri Pirko 
---
 net/sched/sch_api.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 3a3a1da..81ecf5b 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -1402,10 +1402,8 @@ static int tc_modify_qdisc(struct sk_buff *skb, struct nlmsghdr *n,
 						return -EINVAL;
 					}
 					q = qdisc_lookup(dev, tcm->tcm_handle);
-					if (!q) {
-						NL_SET_ERR_MSG(extack, "No qdisc found for specified handle");
+					if (!q)
 						goto create_n_graft;
-					}
 					if (n->nlmsg_flags & NLM_F_EXCL) {
 						NL_SET_ERR_MSG(extack, "Exclusivity flag on, cannot override");
 						return -EEXIST;
-- 
2.9.5

Re: [PATCH iproute2] gre/tunnel: Print erspan_index using print_uint()

2017-12-28 Thread Serhey Popovych


> Hi Serhey,

Hi William,

Yes, the iproute2-next/net-next branch already contains the fix, so we
probably do not need the proposed change. Sorry for the noise.

I'm currently focused on bug fixing in stable which affects my work for
iproute2-next. All my future work will be based on iproute2-next as it
contains refactoring, optimization and cleanups, not fixes.

Sorry again and thanks for feedback.

> 
> On Thu, Dec 28, 2017 at 3:12 AM, Serhey Popovych
>  wrote:
>> One is missing in JSON output because fprintf()
>> is used instead of print_uint().
>>
>> Signed-off-by: Serhey Popovych 
>> ---
>>  ip/link_gre.c  |3 ++-
>>  ip/link_gre6.c |4 +++-
>>  2 files changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/ip/link_gre.c b/ip/link_gre.c
>> index 896bb19..1e331c8 100644
>> --- a/ip/link_gre.c
>> +++ b/ip/link_gre.c
>> @@ -476,7 +476,8 @@ static void gre_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[])
>>  	if (tb[IFLA_GRE_ERSPAN_INDEX]) {
>>  		__u32 erspan_idx = rta_getattr_u32(tb[IFLA_GRE_ERSPAN_INDEX]);
>>
>> -		fprintf(f, "erspan_index %u ", erspan_idx);
>> +		print_uint(PRINT_ANY,
>> +			   "erspan_index", "erspan_index %u ", erspan_idx);
>>  	}
>>
> 
> Did you pull the latest code from the net-next branch?
> I think I already fixed it.
> William
> 





Re: [PATCH iproute2] gre/tunnel: Print erspan_index using print_uint()

2017-12-28 Thread William Tu

Hi Serhey,

On Thu, Dec 28, 2017 at 3:12 AM, Serhey Popovych
 wrote:
> One is missing in JSON output because fprintf()
> is used instead of print_uint().
>
> Signed-off-by: Serhey Popovych 
> ---
>  ip/link_gre.c  |3 ++-
>  ip/link_gre6.c |4 +++-
>  2 files changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/ip/link_gre.c b/ip/link_gre.c
> index 896bb19..1e331c8 100644
> --- a/ip/link_gre.c
> +++ b/ip/link_gre.c
> @@ -476,7 +476,8 @@ static void gre_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[])
>  	if (tb[IFLA_GRE_ERSPAN_INDEX]) {
>  		__u32 erspan_idx = rta_getattr_u32(tb[IFLA_GRE_ERSPAN_INDEX]);
>
> -		fprintf(f, "erspan_index %u ", erspan_idx);
> +		print_uint(PRINT_ANY,
> +			   "erspan_index", "erspan_index %u ", erspan_idx);
>  	}
>

Did you pull the latest code from the net-next branch?
I think I already fixed it.
William
