date:20170125

Re: [PATCH net-next 5/5] net: dsa: mv88e6xxx: Implement the 6390 external MDIO bus

2017-01-25 Thread Gregory CLEMENT

Hi Andrew,
 
On Wed., Jan. 25 2017, Andrew Lunn  wrote:

>> diff --git a/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h 
>> b/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
>> index 7d24add45e74..572d585dc1e2 100644
>> --- a/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
>> +++ b/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
>> @@ -387,6 +387,7 @@
>>  #define GLOBAL2_PTP_AVB_DATA	0x17
>>  #define GLOBAL2_SMI_PHY_CMD	0x18
>>  #define GLOBAL2_SMI_PHY_CMD_BUSY		BIT(15)
>> +#define GLOBAL2_SMI_PHY_CMD_EXTERNAL		BIT(13)
>>  #define GLOBAL2_SMI_PHY_CMD_MODE_22		BIT(12)
>>  #define GLOBAL2_SMI_PHY_CMD_OP_22_WRITE_DATA	((0x1 << 10) | \
>
> Hi Gregory
>
> Please could you check if the 88E6341 has an external MDIO. Global 2,
> register 0x18, bit 13.

I confirm that the 88E6341 has Global 2, register 0x18, bit 13, referred to
as "External access".
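As a rough illustration of where the "External access" bit sits, here is a minimal sketch that composes a Global 2 SMI PHY command word (register 0x18) from the bit definitions quoted above. The placement of the PHY address in bits 9:5 and the register in bits 4:0, the `SMI_OP_C22_WRITE` macro and the helper `g2_smi_phy_cmd_c22_write()` are assumptions for illustration, not the driver's code; check the datasheet before relying on them.

```c
#include <assert.h>
#include <stdint.h>

#define BIT(n) (1u << (n))

/* Bit positions as quoted from mv88e6xxx.h in the patch above. */
#define GLOBAL2_SMI_PHY_CMD_BUSY      BIT(15)
#define GLOBAL2_SMI_PHY_CMD_EXTERNAL  BIT(13)
#define GLOBAL2_SMI_PHY_CMD_MODE_22   BIT(12)

/* Assumption: Clause 22 "write data" opcode, taken from the (0x1 << 10)
 * term visible in the quoted OP_22_WRITE_DATA definition. */
#define SMI_OP_C22_WRITE              (0x1u << 10)

/* Hypothetical helper: build a Clause 22 write command word for PHY
 * address `dev_addr` (assumed bits 9:5) and register `reg` (assumed
 * bits 4:0). With `external` set, the command targets the external
 * MDIO bus instead of the internal PHYs. */
static uint16_t g2_smi_phy_cmd_c22_write(unsigned int dev_addr,
					 unsigned int reg, int external)
{
	unsigned int cmd;

	/* Clause 22 PHY and register addresses are 5 bits wide. */
	assert(dev_addr < 32 && reg < 32);

	cmd = GLOBAL2_SMI_PHY_CMD_BUSY | GLOBAL2_SMI_PHY_CMD_MODE_22 |
	      SMI_OP_C22_WRITE | (dev_addr << 5) | reg;
	if (external)
		cmd |= GLOBAL2_SMI_PHY_CMD_EXTERNAL;

	return (uint16_t)cmd;
}
```

Under these assumptions, PHY address 1, register 2 yields 0x9422 on the internal bus and 0xB422 with the external bit set; the only difference is bit 13.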

Gregory

>
> Thanks
>   Andrew

-- 
Gregory Clement, Free Electrons
Kernel, drivers, real-time and embedded Linux
development, consulting, training and support.
http://free-electrons.com

[PATCH net v2 4/4] r8152: check rx after napi is enabled

2017-01-25 Thread Hayes Wang

Schedule the napi after napi_enable() for rx, if it is necessary.

If the rx is completed when napi is disabled, the scheduling of napi
would be lost. Then, no one handles the rx packet until the next napi
is scheduled.
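To make the lost-wakeup window concrete, here is a single-threaded user-space model. The types and functions (`struct napi_model`, `model_napi_schedule()`, `model_resume()`) are hypothetical and only mimic the behaviour described above: a schedule request made while NAPI is disabled is dropped, so the resume path has to look at the pending rx list itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a NAPI context; not the kernel API. */
struct napi_model {
	bool enabled;    /* false between napi_disable() and napi_enable() */
	bool scheduled;  /* a poll is pending */
};

struct dev_model {
	struct napi_model napi;
	int rx_done;     /* completed rx buffers waiting to be polled */
};

/* A request made while the context is disabled is simply dropped. */
static void model_napi_schedule(struct napi_model *n)
{
	if (n->enabled)
		n->scheduled = true;
}

/* rx completion: queue the buffer and ask for a poll. */
static void model_rx_complete(struct dev_model *d)
{
	d->rx_done++;
	model_napi_schedule(&d->napi);
}

/* Re-enable polling. With `recheck` set, mirror the patch: if completed
 * rx buffers are already waiting, schedule a poll for them now. */
static void model_resume(struct dev_model *d, bool recheck)
{
	d->napi.enabled = true;
	if (recheck && d->rx_done > 0)
		model_napi_schedule(&d->napi);
}
```

Without the re-check, a completion that arrives while disabled leaves `rx_done` non-zero and nothing scheduled; with it, the resume path picks the work up.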

Signed-off-by: Hayes Wang 
---
 drivers/net/usb/r8152.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index 45d168e..8924520 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -32,7 +32,7 @@
 #define NETNEXT_VERSION		"08"
 
 /* Information for net */
-#define NET_VERSION		"7"
+#define NET_VERSION		"8"
 
 #define DRIVER_VERSION		"v1." NETNEXT_VERSION "." NET_VERSION
 #define DRIVER_AUTHOR "Realtek linux nic maintainers "
@@ -3561,6 +3561,9 @@ static int rtl8152_post_reset(struct usb_interface *intf)
 	netif_wake_queue(netdev);
 	usb_submit_urb(tp->intr_urb, GFP_KERNEL);
 
+	if (!list_empty(&tp->rx_done))
+		napi_schedule(&tp->napi);
+
 	return 0;
 }
 
@@ -3700,6 +3703,8 @@ static int rtl8152_resume(struct usb_interface *intf)
 			napi_enable(&tp->napi);
 			clear_bit(SELECTIVE_SUSPEND, &tp->flags);
 			smp_mb__after_atomic();
+			if (!list_empty(&tp->rx_done))
+				napi_schedule(&tp->napi);
 		} else {
 			tp->rtl_ops.up(tp);
 			netif_carrier_off(tp->netdev);
-- 
2.7.4

[PATCH net v2 3/4] r8152: re-schedule napi for tx

2017-01-25 Thread Hayes Wang

Re-schedule napi after napi_complete() for tx, if it is necessary.

In r8152_poll(), if the tx is completed after tx_bottom() and before
napi_complete(), the scheduling of napi would be lost. Then, no
one handles the next tx until the next napi_schedule() is called.
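A similar toy model for the tx case, again with hypothetical names rather than the kernel API: while a poll is still marked as scheduled, a new schedule request is absorbed by it, so a tx completion that lands between tx_bottom() and napi_complete() is forgotten unless the poll re-checks the tx state after completing.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the state r8152_poll() looks at. */
struct txmodel {
	bool scheduled;  /* a poll is pending or running */
	int rx_done;     /* completed rx buffers */
	int tx_queue;    /* packets waiting to be sent */
	int tx_free;     /* free tx buffers */
};

/* Returns false when the request is absorbed by an already pending poll. */
static bool model_napi_schedule(struct txmodel *t)
{
	if (t->scheduled)
		return false;
	t->scheduled = true;
	return true;
}

static void model_napi_complete(struct txmodel *t)
{
	t->scheduled = false;
}

/* Tail of a poll that finished under budget. With `recheck_tx` set,
 * mirror the patch: reschedule if packets are queued and buffers are
 * free, since their completion may have been absorbed above. */
static void model_poll_tail(struct txmodel *t, bool recheck_tx)
{
	model_napi_complete(t);
	if (t->rx_done > 0)
		model_napi_schedule(t);
	else if (recheck_tx && t->tx_queue > 0 && t->tx_free > 0)
		model_napi_schedule(t);
}
```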

Signed-off-by: Hayes Wang 
---
 drivers/net/usb/r8152.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index ec882be..45d168e 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -1936,6 +1936,9 @@ static int r8152_poll(struct napi_struct *napi, int budget)
 		napi_complete(napi);
 		if (!list_empty(&tp->rx_done))
 			napi_schedule(napi);
+		else if (!skb_queue_empty(&tp->tx_queue) &&
+			 !list_empty(&tp->tx_free))
+			napi_schedule(&tp->napi);
 	}
 
 	return work_done;
-- 
2.7.4

[PATCH net v2 1/4] r8152: avoid start_xmit to call napi_schedule during autosuspend

2017-01-25 Thread Hayes Wang

Adjust the setting of the flag of SELECTIVE_SUSPEND to prevent start_xmit()
from calling napi_schedule() directly during runtime suspend.

After calling napi_disable() or clearing the flag of WORK_ENABLE,
scheduling the napi is useless.
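The ordering this patch establishes (publish SELECTIVE_SUSPEND before the checks that may abort the suspend, and withdraw it on every abort path) can be sketched with C11 atomics. Everything here is a hypothetical model: `atomic_thread_fence(memory_order_seq_cst)` only loosely stands in for `smp_mb__after_atomic()`, and the "deferred" flag stands in for whatever the driver does instead of calling napi_schedule() while suspended.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of the SELECTIVE_SUSPEND handshake. */
struct suspend_model {
	atomic_bool selective_suspend;
	bool napi_scheduled;  /* start_xmit asked for a poll directly */
	bool deferred;        /* start_xmit handed the request elsewhere */
};

static void model_init(struct suspend_model *m)
{
	atomic_init(&m->selective_suspend, false);
	m->napi_scheduled = false;
	m->deferred = false;
}

/* Transmit path: only schedule the poll directly when not suspending. */
static void model_start_xmit(struct suspend_model *m)
{
	if (atomic_load(&m->selective_suspend))
		m->deferred = true;
	else
		m->napi_scheduled = true;
}

/* Runtime-suspend path: set the flag first, clear it again on abort. */
static int model_runtime_suspend(struct suspend_model *m, bool busy)
{
	atomic_store_explicit(&m->selective_suspend, true,
			      memory_order_relaxed);
	/* Loosely analogous to smp_mb__after_atomic() after set_bit(). */
	atomic_thread_fence(memory_order_seq_cst);

	if (busy) {
		atomic_store_explicit(&m->selective_suspend, false,
				      memory_order_relaxed);
		atomic_thread_fence(memory_order_seq_cst);
		return -EBUSY;
	}
	return 0;
}
```

The single-threaded checks below only exercise the control flow; the fences matter when start_xmit runs concurrently on another CPU.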

Signed-off-by: Hayes Wang 
---
 drivers/net/usb/r8152.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index e1466b4..23bef8e 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -3585,10 +3585,15 @@ static int rtl8152_rumtime_suspend(struct r8152 *tp)
 	struct net_device *netdev = tp->netdev;
 	int ret = 0;
 
+	set_bit(SELECTIVE_SUSPEND, &tp->flags);
+	smp_mb__after_atomic();
+
 	if (netif_running(netdev) && test_bit(WORK_ENABLE, &tp->flags)) {
 		u32 rcr = 0;
 
 		if (delay_autosuspend(tp)) {
+			clear_bit(SELECTIVE_SUSPEND, &tp->flags);
+			smp_mb__after_atomic();
 			ret = -EBUSY;
 			goto out1;
 		}
@@ -3605,6 +3610,8 @@ static int rtl8152_rumtime_suspend(struct r8152 *tp)
 			if (!(ocp_data & RXFIFO_EMPTY)) {
 				rxdy_gated_en(tp, false);
 				ocp_write_dword(tp, MCU_TYPE_PLA, PLA_RCR, rcr);
+				clear_bit(SELECTIVE_SUSPEND, &tp->flags);
+				smp_mb__after_atomic();
 				ret = -EBUSY;
 				goto out1;
 			}
@@ -3624,8 +3631,6 @@ static int rtl8152_rumtime_suspend(struct r8152 *tp)
 		}
 	}
 
-	set_bit(SELECTIVE_SUSPEND, &tp->flags);
-
 out1:
 	return ret;
 }
@@ -3681,12 +3686,13 @@ static int rtl8152_resume(struct usb_interface *intf)
 	if (netif_running(tp->netdev) && tp->netdev->flags & IFF_UP) {
 		if (test_bit(SELECTIVE_SUSPEND, &tp->flags)) {
 			tp->rtl_ops.autosuspend_en(tp, false);
-			clear_bit(SELECTIVE_SUSPEND, &tp->flags);
 			napi_disable(&tp->napi);
 			set_bit(WORK_ENABLE, &tp->flags);
 			if (netif_carrier_ok(tp->netdev))
 				rtl_start_rx(tp);
 			napi_enable(&tp->napi);
+			clear_bit(SELECTIVE_SUSPEND, &tp->flags);
+			smp_mb__after_atomic();
 		} else {
 			tp->rtl_ops.up(tp);
 			netif_carrier_off(tp->netdev);
-- 
2.7.4

[PATCH net v2 0/4] r8152: fix scheduling napi

2017-01-25 Thread Hayes Wang

v2:
Add smp_mb__after_atomic() for patch #1.

v1:
Scheduling the napi during the following periods would cause it to be
ignored, and the events wouldn't be handled until the next napi_schedule()
is called.

1. After napi_disable() and before napi_enable().
2. After all actions of the napi function are completed and before calling
   napi_complete().

If no further napi_schedule() is called, tx or rx would stop working.

In order to avoid these situations, the following solutions are applied.

1. Prevent start_xmit() from calling napi_schedule() during runtime suspend
   or after napi_disable().
2. Re-schedule the napi for tx if it is necessary.
3. Check whether any rx has finished after napi_enable().

Hayes Wang (4):
  r8152: avoid start_xmit to call napi_schedule during autosuspend
  r8152: avoid start_xmit to schedule napi when napi is disabled
  r8152: re-schedule napi for tx
  r8152: check rx after napi is enabled

 drivers/net/usb/r8152.c | 26 ++++++++++++++++++++------
 1 file changed, 20 insertions(+), 6 deletions(-)

-- 
2.7.4

[PATCH net v2 2/4] r8152: avoid start_xmit to schedule napi when napi is disabled

2017-01-25 Thread Hayes Wang

Stop the tx when the napi is disabled to prevent napi_schedule() from
being called.

Signed-off-by: Hayes Wang 
---
 drivers/net/usb/r8152.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index 23bef8e..ec882be 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -3155,10 +3155,13 @@ static void set_carrier(struct r8152 *tp)
 		if (!netif_carrier_ok(netdev)) {
 			tp->rtl_ops.enable(tp);
 			set_bit(RTL8152_SET_RX_MODE, &tp->flags);
+			netif_stop_queue(netdev);
 			napi_disable(&tp->napi);
 			netif_carrier_on(netdev);
 			rtl_start_rx(tp);
 			napi_enable(&tp->napi);
+			netif_wake_queue(netdev);
+			netif_info(tp, link, netdev, "carrier on\n");
 		}
 	} else {
 		if (netif_carrier_ok(netdev)) {
@@ -3166,6 +3169,7 @@ static void set_carrier(struct r8152 *tp)
 			napi_disable(&tp->napi);
 			tp->rtl_ops.disable(tp);
 			napi_enable(&tp->napi);
+			netif_info(tp, link, netdev, "carrier off\n");
 		}
 	}
 }
@@ -3515,12 +3519,12 @@ static int rtl8152_pre_reset(struct usb_interface *intf)
 	if (!netif_running(netdev))
 		return 0;
 
+	netif_stop_queue(netdev);
 	napi_disable(&tp->napi);
 	clear_bit(WORK_ENABLE, &tp->flags);
 	usb_kill_urb(tp->intr_urb);
 	cancel_delayed_work_sync(&tp->schedule);
 	if (netif_carrier_ok(netdev)) {
-		netif_stop_queue(netdev);
 		mutex_lock(&tp->control);
 		tp->rtl_ops.disable(tp);
 		mutex_unlock(&tp->control);
@@ -3548,10 +3552,10 @@ static int rtl8152_post_reset(struct usb_interface *intf)
 		rtl_start_rx(tp);
 		rtl8152_set_rx_mode(netdev);
 		mutex_unlock(&tp->control);
-		netif_wake_queue(netdev);
 	}
 
 	napi_enable(&tp->napi);
+	netif_wake_queue(netdev);
 	usb_submit_urb(tp->intr_urb, GFP_KERNEL);
 
 	return 0;
-- 
2.7.4

Re: [PATCH 2/2] mac80211: use accessor functions to set sta->_flags

2017-01-25 Thread Amadeusz Slawinski

And yes I did. Somehow managed to ignore those warnings though, sorry
about that.
Rechecked with just first patch and it should still be good. Please
ignore this one ;)

On 24 January 2017 at 16:44, Johannes Berg  wrote:
> On Tue, 2017-01-24 at 16:42 +0100, Amadeusz Sławiński wrote:
>> cleanup patch to make use of set_sta_flag()/clear_sta_flag() in places
>> where we access sta->_flags
>>
>> Signed-off-by: Amadeusz Sławiński 
>> ---
>>  net/mac80211/sta_info.c | 12 ++++++------
>>  1 file changed, 6 insertions(+), 6 deletions(-)
>>
>> diff --git a/net/mac80211/sta_info.c b/net/mac80211/sta_info.c
>> index b6cfcf0..6c9cc2f 100644
>> --- a/net/mac80211/sta_info.c
>> +++ b/net/mac80211/sta_info.c
>> @@ -1855,13 +1855,13 @@ int sta_info_move_state(struct sta_info *sta,
>>  	switch (new_state) {
>>  	case IEEE80211_STA_NONE:
>>  		if (sta->sta_state == IEEE80211_STA_AUTH)
>> -			clear_bit(WLAN_STA_AUTH, &sta->_flags);
>> +			clear_sta_flag(sta, WLAN_STA_AUTH);
>
> You should try to run this patch sometime :)
>
> johannes

Re: [PATCH for bnxt_re V4 17/21] RDMA/bnxt_re: Handling dispatching of events to IB stack

2017-01-25 Thread Selvin Xavier

On Tue, Jan 24, 2017 at 5:48 PM, Leon Romanovsky  wrote:
> All callers to this function in this patch set qp_wait to be false.
> Do you have in following patches qp_wait == true?
> I'm curious because of your msleep below.

Thanks for pointing it out. The driver in our internal tree had one more
condition with qp_wait == true. I forgot to remove this before posting
upstream. Will include the fix in V5.

Re: [PATCH for bnxt_re V4 20/21] RDMA/bnxt_re: Add QP event handling

2017-01-25 Thread Selvin Xavier

On Tue, Jan 24, 2017 at 5:50 PM, Leon Romanovsky  wrote:
> it looks like if( ... ) return 0

Yes. There is some code to be added in this area as part of error
reporting. We will add it once the driver is accepted. Perhaps I
will add a debug print here for now.

[PATCH 2/2] net-next: ethernet: mediatek: change the compatible string

2017-01-25 Thread John Crispin

When the binding was defined, I was not aware that mt2701 was an earlier
version of the SoC. For the sake of consistency, the ethernet driver should
use mt2701 inside the compat string, as this is the earliest SoC with the
ethernet core.

The ethernet driver is currently of no real use until we finish and
upstream the DSA driver. There are no users of this binding yet. It should
be safe to fix this now before it is too late and we need to provide
backward compatibility for the mt7623-eth compat string.

Reported-by: Sean Wang 
Signed-off-by: John Crispin 
---
 drivers/net/ethernet/mediatek/mtk_eth_soc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index 25ae0c5..9e75768 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -2515,7 +2515,7 @@ static int mtk_remove(struct platform_device *pdev)
 }
 
 const struct of_device_id of_mtk_match[] = {
-	{ .compatible = "mediatek,mt7623-eth" },
+	{ .compatible = "mediatek,mt2701-eth" },
 	{},
 };
 MODULE_DEVICE_TABLE(of, of_mtk_match);
-- 
1.7.10.4

[PATCH 1/2] Documentation: devicetree: change the mediatek ethernet compatible string

2017-01-25 Thread John Crispin

When the binding was defined, I was not aware that mt2701 was an earlier
version of the SoC. For the sake of consistency, the ethernet driver should
use mt2701 inside the compat string, as this is the earliest SoC with the
ethernet core.

The ethernet driver is currently of no real use until we finish and
upstream the DSA driver. There are no users of this binding yet. It should
be safe to fix this now before it is too late and we need to provide
backward compatibility for the mt7623-eth compat string.

Reported-by: Sean Wang 
Signed-off-by: John Crispin 
---
 Documentation/devicetree/bindings/net/mediatek-net.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/net/mediatek-net.txt b/Documentation/devicetree/bindings/net/mediatek-net.txt
index c010faf..c7194e8 100644
--- a/Documentation/devicetree/bindings/net/mediatek-net.txt
+++ b/Documentation/devicetree/bindings/net/mediatek-net.txt
@@ -7,7 +7,7 @@ have dual GMAC each represented by a child node..
 * Ethernet controller node
 
 Required properties:
-- compatible: Should be "mediatek,mt7623-eth"
+- compatible: Should be "mediatek,mt2701-eth"
 - reg: Address and length of the register set for the device
 - interrupts: Should contain the three frame engines interrupts in numeric
order. These are fe_int0, fe_int1 and fe_int2.
-- 
1.7.10.4

Re: [PATCH 2/2] mac80211: use accessor functions to set sta->_flags

2017-01-25 Thread Johannes Berg

On Wed, 2017-01-25 at 09:55 +0100, Amadeusz Slawinski wrote:
> And yes I did. Somehow managed to ignore those warnings though, sorry
> about that.

:)
That was intentional so nobody changing mac80211 in the future will
accidentally play with those flags through the normal accessors.
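A minimal sketch of the design Johannes describes, where the generic flag accessor refuses flags owned by the station state machine so that only the state-transition code changes them. The enum, function names and warning text are hypothetical, not mac80211's actual code, and refusing the change (rather than only warning) is a simplifying choice of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical station flags; AUTH and ASSOC belong to the state machine. */
enum sta_flag_model { STA_FLAG_AUTH, STA_FLAG_ASSOC, STA_FLAG_PS, STA_FLAG_WME };

struct sta_model {
	unsigned long flags;
};

static bool is_state_flag(enum sta_flag_model f)
{
	return f == STA_FLAG_AUTH || f == STA_FLAG_ASSOC;
}

/* Generic accessor: warns about, and refuses, state-machine flags. */
static bool set_sta_flag_model(struct sta_model *sta, enum sta_flag_model f)
{
	if (is_state_flag(f)) {
		fprintf(stderr, "WARN: flag %d is managed by the state machine\n",
			(int)f);
		return false;
	}
	sta->flags |= 1UL << f;
	return true;
}

/* Only the state-transition code touches the state flags directly. */
static void move_state_model(struct sta_model *sta, bool authenticated)
{
	if (authenticated)
		sta->flags |= 1UL << STA_FLAG_AUTH;
	else
		sta->flags &= ~(1UL << STA_FLAG_AUTH);
}
```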

> Rechecked with just first patch and it should still be good. Please
> ignore this one ;)

Yeah, I still have that one pending, no worries :)

johannes

Re: [PATCH net-next v2] macb: Common code to enable ptp support for MACB/GEM

2017-01-25 Thread Nicolas Ferre

On 19/01/2017 at 17:07, Nicolas Ferre wrote:
> On 19/01/2017 at 08:56, Andrei Pistirica wrote:
>> This patch does the following:
>> - MACB/GEM-PTP interface
>> - registers and bitfields for TSU
>> - capability flags to enable PTP per platform basis
>>
>> Signed-off-by: Andrei Pistirica 
> 
> Acked-by: Nicolas Ferre 

Harini or Rafal, do you plan to review this patch and add your
"Reviewed-by" tags? It can be useful to make this support move forward.

Regards,

>> ---
>> Patch history:
>>
>> Version 1:
>> This is just the common code for MACB/GEM-PTP support.
>> Code is based on the comments related to the following patch series:
>> - [RFC PATCH net-next v1-to-4 1/2] macb: Add 1588 support in Cadence GEM
>> - [RFC PATCH net-next v1-to-4 2/2] macb: Enable 1588 support in SAMA5Dx platforms
>>
>> Version 2:
>> - Cosmetic changes and PTP capability flag changed due to overlapping with JUMBO.
>>
>> Note: Patch on net-next: January 19.
>>
>>  drivers/net/ethernet/cadence/macb.c | 32 +++-
>>  drivers/net/ethernet/cadence/macb.h | 74 +
>>  2 files changed, 104 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/cadence/macb.c b/drivers/net/ethernet/cadence/macb.c
>> index c0fb80a..ff1e648 100644
>> --- a/drivers/net/ethernet/cadence/macb.c
>> +++ b/drivers/net/ethernet/cadence/macb.c
>> @@ -2085,6 +2085,9 @@ static int macb_open(struct net_device *dev)
>>  
>>  	netif_tx_start_all_queues(dev);
>>  
>> +	if (bp->ptp_info)
>> +		bp->ptp_info->ptp_init(dev);
>> +
>>  	return 0;
>>  }
>>  
>> @@ -2106,6 +2109,9 @@ static int macb_close(struct net_device *dev)
>>  
>>  	macb_free_consistent(bp);
>>  
>> +	if (bp->ptp_info)
>> +		bp->ptp_info->ptp_remove(dev);
>> +
>>  	return 0;
>>  }
>>  
>> @@ -2379,6 +2385,17 @@ static int macb_set_ringparam(struct net_device *netdev,
>>  	return 0;
>>  }
>>  
>> +static int macb_get_ts_info(struct net_device *netdev,
>> +			    struct ethtool_ts_info *info)
>> +{
>> +	struct macb *bp = netdev_priv(netdev);
>> +
>> +	if (bp->ptp_info)
>> +		return bp->ptp_info->get_ts_info(netdev, info);
>> +
>> +	return ethtool_op_get_ts_info(netdev, info);
>> +}
>> +
>>  static const struct ethtool_ops macb_ethtool_ops = {
>>  	.get_regs_len		= macb_get_regs_len,
>>  	.get_regs		= macb_get_regs,
>> @@ -2396,7 +2413,7 @@ static const struct ethtool_ops gem_ethtool_ops = {
>>  	.get_regs_len		= macb_get_regs_len,
>>  	.get_regs		= macb_get_regs,
>>  	.get_link		= ethtool_op_get_link,
>> -	.get_ts_info		= ethtool_op_get_ts_info,
>> +	.get_ts_info		= macb_get_ts_info,
>>  	.get_ethtool_stats	= gem_get_ethtool_stats,
>>  	.get_strings		= gem_get_ethtool_strings,
>>  	.get_sset_count		= gem_get_sset_count,
>> @@ -2409,6 +2426,7 @@ static const struct ethtool_ops gem_ethtool_ops = {
>>  static int macb_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
>>  {
>>  	struct phy_device *phydev = dev->phydev;
>> +	struct macb *bp = netdev_priv(dev);
>>  
>>  	if (!netif_running(dev))
>>  		return -EINVAL;
>> @@ -2416,7 +2434,17 @@ static int macb_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
>>  	if (!phydev)
>>  		return -ENODEV;
>>  
>> -	return phy_mii_ioctl(phydev, rq, cmd);
>> +	if (!bp->ptp_info)
>> +		return phy_mii_ioctl(phydev, rq, cmd);
>> +
>> +	switch (cmd) {
>> +	case SIOCSHWTSTAMP:
>> +		return bp->ptp_info->set_hwtst(dev, rq, cmd);
>> +	case SIOCGHWTSTAMP:
>> +		return bp->ptp_info->get_hwtst(dev, rq);
>> +	default:
>> +		return phy_mii_ioctl(phydev, rq, cmd);
>> +	}
>>  }
>>  
>>  static int macb_set_features(struct net_device *netdev,
>> diff --git a/drivers/net/ethernet/cadence/macb.h b/drivers/net/ethernet/cadence/macb.h
>> index d67adad..94ddedd 100644
>> --- a/drivers/net/ethernet/cadence/macb.h
>> +++ b/drivers/net/ethernet/cadence/macb.h
>> @@ -131,6 +131,20 @@
>>  #define GEM_RXIPCCNT		0x01a8 /* IP header Checksum Error Counter */
>>  #define GEM_RXTCPCCNT		0x01ac /* TCP Checksum Error Counter */
>>  #define GEM_RXUDPCCNT		0x01b0 /* UDP Checksum Error Counter */
>> +#define GEM_TISUBN		0x01bc /* 1588 Timer Increment Sub-ns */
>> +#define GEM_TSH			0x01c0 /* 1588 Timer Seconds High */
>> +#define GEM_TSL			0x01d0 /* 1588 Timer Seconds Low */
>> +#define GEM_TN			0x01d4 /* 1588 Timer Nanoseconds */
>> +#define GEM_TA			0x01d8 /* 1588 Timer Adjust */
>> +#define GEM_TI			0x01dc /* 1588 Timer Increment */
>> +#define GEM_EFTSL		0x01e0 /* PTP Event Frame Tx Seconds Low */
>> +#define GEM_EFTN		0x01e4 /* PTP Event Frame Tx Nan
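The optional-hook pattern in this patch (call the `bp->ptp_info` hooks when a platform provides them, otherwise fall back to the generic handler) can be sketched in plain C as below. `struct ptp_ops_model`, the command values and the return codes are hypothetical; the real code dispatches on `SIOCSHWTSTAMP`/`SIOCGHWTSTAMP` and falls back to `phy_mii_ioctl()`.

```c
#include <assert.h>
#include <stddef.h>

enum { CMD_SET_HWTSTAMP = 1, CMD_GET_HWTSTAMP = 2, CMD_MII = 3 };

/* Hypothetical per-platform hooks; the pointer is NULL without PTP. */
struct ptp_ops_model {
	int (*set_hwtst)(int cmd);
	int (*get_hwtst)(void);
};

struct dev_model {
	const struct ptp_ops_model *ptp_info;
};

/* Stand-in for phy_mii_ioctl(): the generic fallback. */
static int generic_ioctl(int cmd)
{
	return 100 + cmd;
}

static int dev_ioctl(const struct dev_model *dev, int cmd)
{
	if (!dev->ptp_info)
		return generic_ioctl(cmd);

	switch (cmd) {
	case CMD_SET_HWTSTAMP:
		return dev->ptp_info->set_hwtst(cmd);
	case CMD_GET_HWTSTAMP:
		return dev->ptp_info->get_hwtst();
	default:
		return generic_ioctl(cmd);
	}
}

/* Example hooks for a platform that supports PTP. */
static int demo_set(int cmd) { return 10 + cmd; }
static int demo_get(void) { return 20; }
static const struct ptp_ops_model demo_ops = { demo_set, demo_get };
```

Keeping the NULL check in one place means platforms without a TSU need no stub hooks at all.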

NAPI on USB network drivers

2017-01-25 Thread Oliver Neukum

Hi,

looking at r8152 I noticed that it uses NAPI. I never considered
this for the generic USB networking code as you cannot disable
interrupts for USB. Is it still worth it? What are the benefits?

Regards
Oliver

Re: [PATCH net-next v2] macb: Common code to enable ptp support for MACB/GEM

2017-01-25 Thread Harini Katakam

On Wed, Jan 25, 2017 at 2:56 PM, Nicolas Ferre  wrote:
> On 19/01/2017 at 17:07, Nicolas Ferre wrote:
>> On 19/01/2017 at 08:56, Andrei Pistirica wrote:
>>> This patch does the following:
>>> - MACB/GEM-PTP interface
>>> - registers and bitfields for TSU
>>> - capability flags to enable PTP on a per-platform basis
>>>
>>> Signed-off-by: Andrei Pistirica 
>>
>> Acked-by: Nicolas Ferre 

Reviewed-by: Harini Katakam 

>
> Harini or Rafal, do you plan to review this patch and add your
> "Reviewed-by" tags? It can be useful to make this support move forward.

Sure, reviewed and working with it, meant to add tag :)

Regards,
Harini

RE: NAPI on USB network drivers

2017-01-25 Thread Hayes Wang

Oliver Neukum [mailto:oneu...@suse.com]
> Sent: Wednesday, January 25, 2017 5:35 PM
[...]
> looking at r8152 I noticed that it uses NAPI. I never considered
> this for the generic USB networking code as you cannot disable
> interrupts for USB. Is it still worth it? What are the benefits?

With NAPI you can use napi_gro_receive(), which improves performance.

Best Regards,
Hayes

Re: [PATCH 2/2] net: phy: leds: Fix truncated LED trigger names

2017-01-25 Thread Geert Uytterhoeven

Hi Andrew,

On Tue, Jan 24, 2017 at 9:03 PM, Andrew Lunn  wrote:
>> diff --git a/include/linux/phy.h b/include/linux/phy.h
>> index 5c9d2529685fe215..f6ab919528ab3627 100644
>> --- a/include/linux/phy.h
>> +++ b/include/linux/phy.h
>> @@ -25,7 +25,6 @@
>>  #include 
>>  #include 
>>  #include 
>> -#include 
>>
>>  #include 
>>
>> @@ -339,6 +338,8 @@ struct phy_c45_device_ids {
>>   u32 device_ids[8];
>>  };
>>
>> +#include 
>> +
>>  /* phy_device: An instance of a PHY
>>   *
>>   * drv: Pointer to the driver for this PHY instance
>> diff --git a/include/linux/phy_led_triggers.h b/include/linux/phy_led_triggers.h
>> index a2daea0a37d2ae14..69dffb4fc5a294e9 100644
>> --- a/include/linux/phy_led_triggers.h
>> +++ b/include/linux/phy_led_triggers.h
>> @@ -20,9 +20,8 @@
>>  #include 
>>
>>  #define PHY_LED_TRIGGER_SPEED_SUFFIX_SIZE	10
>> -#define PHY_MII_BUS_ID_SIZE	(20 - 3)
>>
>> -#define PHY_LINK_LED_TRIGGER_NAME_SIZE (PHY_MII_BUS_ID_SIZE + \
>> +#define PHY_LINK_LED_TRIGGER_NAME_SIZE (MII_BUS_ID_SIZE + \
>>  				       FIELD_SIZEOF(struct mdio_device, addr)+\
>>  				       PHY_LED_TRIGGER_SPEED_SUFFIX_SIZE)
>
> Hi Geert
>
> Using the macro is great, but it does seem a bit ugly having the
> include in the middle of the file.
>
> As far as i can see, phy.h only uses a pointer to a struct
> phy_led_trigger, not struct phy_led_trigger itself. Could you try
> removing the header file all together and just have a forward
> declaration of phy_led_trigger?

Thanks for the suggestion!
Yes, the include can be removed. A forward declaration isn't even needed.
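Geert's observation relies on a C rule: naming `struct tag` in a pointer member declaration is enough to declare an incomplete struct type at file scope, so a header that only stores pointers needs neither the defining header nor a separate forward declaration. A minimal sketch with hypothetical names:

```c
#include <assert.h>
#include <stddef.h>

/* "Header" part: only pointers to struct led_trigger_model are stored.
 * The first member below introduces the incomplete type at file scope,
 * so no #include and no explicit forward declaration are needed. */
struct phy_device_model {
	unsigned int num_triggers;
	struct led_trigger_model *triggers;
	struct led_trigger_model *last_triggered;
};

/* "Implementation" part: only code that dereferences the pointers needs
 * the complete type. */
struct led_trigger_model {
	char name[32];
};

static const char *last_trigger_name(const struct phy_device_model *phy)
{
	return phy->last_triggered ? phy->last_triggered->name : NULL;
}
```

One caveat of the rule: if the first mention of the tag were inside a function prototype's parameter list instead, it would only have prototype scope, which is when an explicit forward declaration becomes necessary.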

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

[PATCH v2 2/3] net: phy: leds: Break dependency of phy.h on phy_led_triggers.h

2017-01-25 Thread Geert Uytterhoeven

 includes , which is not really
needed.  Drop the include from , and add it to all users
that didn't include it explicitly.

Suggested-by: Andrew Lunn 
Signed-off-by: Geert Uytterhoeven 
---
v2:
  - New.
---
 drivers/net/phy/phy.c              | 1 +
 drivers/net/phy/phy_led_triggers.c | 1 +
 include/linux/phy.h                | 1 -
 3 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index 48da6e93c3f783e0..807abd6e331f8aa2 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -29,6 +29,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
diff --git a/drivers/net/phy/phy_led_triggers.c b/drivers/net/phy/phy_led_triggers.c
index 3f619e7371e97d8a..94ca42e630bbead0 100644
--- a/drivers/net/phy/phy_led_triggers.c
+++ b/drivers/net/phy/phy_led_triggers.c
@@ -12,6 +12,7 @@
  */
 #include 
 #include 
+#include 
 #include 
 
 static struct phy_led_trigger *phy_speed_to_led_trigger(struct phy_device *phy,
diff --git a/include/linux/phy.h b/include/linux/phy.h
index 5c9d2529685fe215..43474f39ef6523c5 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -25,7 +25,6 @@
 #include 
 #include 
 #include 
-#include 
 
 #include 
 
-- 
1.9.1

[PATCH v2 1/3] net: phy: leds: Clear phy_num_led_triggers on failure to avoid crash

2017-01-25 Thread Geert Uytterhoeven

phy_attach_direct() ignores errors returned by
phy_led_triggers_register(). I think that's OK, as LED triggers can be
considered a non-critical feature.

However, this causes problems later:
  - phy_led_trigger_change_speed() will access the array
phy_device.phy_led_triggers, which has been freed in the error path
of phy_led_triggers_register(), which may lead to a crash.

  - phy_led_triggers_unregister() will access the same array, leading to
crashes during s2ram or poweroff, like:

Unable to handle kernel NULL pointer dereference at virtual address

...
[] (__list_del_entry_valid) from [] (led_trigger_unregister+0x34/0xcc)
[] (led_trigger_unregister) from [] (phy_led_triggers_unregister+0x28/0x34)
[] (phy_led_triggers_unregister) from [] (phy_detach+0x30/0x74)
[] (phy_detach) from [] (sh_eth_close+0x64/0x9c)
[] (sh_eth_close) from [] (dpm_run_callback+0x48/0xc8)

or:

list_del corruption. prev->next should be dede6540, but was 2e323931
[ cut here ]
kernel BUG at lib/list_debug.c:52!
...
[] (__list_del_entry_valid) from [] (led_trigger_unregister+0x34/0xcc)
[] (led_trigger_unregister) from [] (phy_led_triggers_unregister+0x28/0x34)
[] (phy_led_triggers_unregister) from [] (phy_detach+0x30/0x74)
[] (phy_detach) from [] (sh_eth_close+0x6c/0xa4)
[] (sh_eth_close) from [] (__dev_close_many+0xac/0xd0)

To fix this, clear phy_device.phy_num_led_triggers in the error path of
phy_led_triggers_register().

Note that the "No phy led trigger registered for speed" message will
still be printed on link speed changes, which is a good cue that
something went wrong with the LED triggers.
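The essence of the fix can be modelled in user space: whatever the error path frees, the element count that later users loop over must be reset too. The names below are hypothetical, and the model uses calloc()/free() in place of devm_kzalloc()/devm_kfree().

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model; not the kernel's phy_led_triggers code. */
struct trigger_model {
	int registered;
};

struct phy_model {
	unsigned int num_triggers;
	struct trigger_model *triggers;
};

/* Pretend registration fails at index `fail_at` (e.g. a duplicate name). */
static int register_one(struct trigger_model *t, unsigned int idx,
			unsigned int fail_at)
{
	if (idx == fail_at)
		return -1;
	t->registered = 1;
	return 0;
}

static int triggers_register(struct phy_model *phy, unsigned int count,
			     unsigned int fail_at)
{
	unsigned int i;
	int err;

	phy->num_triggers = count;
	phy->triggers = calloc(count, sizeof(*phy->triggers));
	if (!phy->triggers) {
		err = -1;
		goto out_clear;
	}

	for (i = 0; i < count; i++) {
		err = register_one(&phy->triggers[i], i, fail_at);
		if (err)
			goto out_unreg;
	}
	return 0;

out_unreg:
	while (i--)
		phy->triggers[i].registered = 0;
	free(phy->triggers);
	phy->triggers = NULL;
out_clear:
	/* The fix: later users loop over num_triggers entries, so the
	 * count must not keep describing an array that is gone. */
	phy->num_triggers = 0;
	return err;
}

/* Later teardown: harmless after a failed registration because the
 * count was reset to zero. */
static void triggers_unregister(struct phy_model *phy)
{
	unsigned int i;

	for (i = 0; i < phy->num_triggers; i++)
		phy->triggers[i].registered = 0;
	free(phy->triggers);
	phy->triggers = NULL;
}
```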

Fixes: 2e0bc452f4721520 ("net: phy: leds: add support for led triggers on phy link state change")
Signed-off-by: Geert Uytterhoeven 
Reviewed-by: Florian Fainelli 
---
Alternatively, phy_attach_direct() could consider
phy_led_triggers_register() failures as fatal, so
phy_led_trigger_change_speed() and phy_led_triggers_unregister() are
never called afterwards.

Exposed by commit 4567d686f5c6d955 ("phy: increase size of
MII_BUS_ID_SIZE and bus_id"), which caused duplicate trigger names.

v2:
  - Add Reviewed-by.
---
 drivers/net/phy/phy_led_triggers.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/net/phy/phy_led_triggers.c b/drivers/net/phy/phy_led_triggers.c
index fa62bdf2f52694de..3f619e7371e97d8a 100644
--- a/drivers/net/phy/phy_led_triggers.c
+++ b/drivers/net/phy/phy_led_triggers.c
@@ -102,8 +102,10 @@ int phy_led_triggers_register(struct phy_device *phy)
 					    sizeof(struct phy_led_trigger) *
 						   phy->phy_num_led_triggers,
 					    GFP_KERNEL);
-	if (!phy->phy_led_triggers)
-		return -ENOMEM;
+	if (!phy->phy_led_triggers) {
+		err = -ENOMEM;
+		goto out_clear;
+	}
 
 	for (i = 0; i < phy->phy_num_led_triggers; i++) {
 		err = phy_led_trigger_register(phy, &phy->phy_led_triggers[i],
@@ -120,6 +122,8 @@ int phy_led_triggers_register(struct phy_device *phy)
 	while (i--)
 		phy_led_trigger_unregister(&phy->phy_led_triggers[i]);
 	devm_kfree(&phy->mdio.dev, phy->phy_led_triggers);
+out_clear:
+	phy->phy_num_led_triggers = 0;
 	return err;
 }
 EXPORT_SYMBOL_GPL(phy_led_triggers_register);
-- 
1.9.1

[PATCH v2 3/3] net: phy: leds: Fix truncated LED trigger names

2017-01-25 Thread Geert Uytterhoeven

Commit 4567d686f5c6d955 ("phy: increase size of MII_BUS_ID_SIZE and
bus_id") increased the size of MII bus IDs, but forgot to update the
private definition in .
This may cause:
  1. Truncation of LED trigger names,
  2. Duplicate LED trigger names,
  3. Failures registering LED triggers,
  4. Crashes due to bad error handling in the LED trigger failure path.

To fix this, and prevent the definitions going out of sync again in the
future, let the PHY LED trigger code use the existing MII_BUS_ID_SIZE
definition.

Example:
  - Before I had triggers "ee70.etherne:01:100Mbps" and
"ee70.etherne:01:10Mbps",
  - After the increase of MII_BUS_ID_SIZE, both became
"ee70.ethernet-:01:" => FAIL,
  - Now, the triggers are "ee70.ethernet-:01:100Mbps" and
"ee70.ethernet-:01:10Mbps", which are unique again.
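A small user-space sketch of how a stale size constant turns distinct names into duplicates. The name layout `<bus id>:<addr>:<speed>` is assumed from the example names above, `make_trigger_name()` and `names_collide()` are hypothetical helpers, and the buffer sizes 28 and 64 are arbitrary stand-ins for an out-of-date private size and one derived from the shared definition.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* snprintf() never overflows buf, but it silently truncates the result
 * when `size` is too small, dropping the speed suffix first. */
static void make_trigger_name(char *buf, size_t size, const char *bus_id,
			      unsigned int addr, const char *speed)
{
	snprintf(buf, size, "%s:%02x:%s", bus_id, addr, speed);
}

/* Two triggers can only both be registered if their names differ. */
static int names_collide(const char *a, const char *b)
{
	return strcmp(a, b) == 0;
}
```

With a 28-byte buffer and a 23-character bus ID, both "100Mbps" and "10Mbps" names are cut to the same 27-character prefix, so the second registration would fail; with a large enough buffer they stay distinct.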

Fixes: 4567d686f5c6d955 ("phy: increase size of MII_BUS_ID_SIZE and bus_id")
Fixes: 2e0bc452f4721520 ("net: phy: leds: add support for led triggers on phy link state change")
Signed-off-by: Geert Uytterhoeven 
---
v2:
  - Drop moving the include of , as
 no longer includes it,
  - #include  from .
---
 include/linux/phy_led_triggers.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/phy_led_triggers.h b/include/linux/phy_led_triggers.h
index a2daea0a37d2ae14..b37b05bfd1a6dd8a 100644
--- a/include/linux/phy_led_triggers.h
+++ b/include/linux/phy_led_triggers.h
@@ -18,11 +18,11 @@
 #ifdef CONFIG_LED_TRIGGER_PHY
 
 #include 
+#include 
 
 #define PHY_LED_TRIGGER_SPEED_SUFFIX_SIZE	10
-#define PHY_MII_BUS_ID_SIZE	(20 - 3)
 
-#define PHY_LINK_LED_TRIGGER_NAME_SIZE (PHY_MII_BUS_ID_SIZE + \
+#define PHY_LINK_LED_TRIGGER_NAME_SIZE (MII_BUS_ID_SIZE + \
 				       FIELD_SIZEOF(struct mdio_device, addr)+\
 				       PHY_LED_TRIGGER_SPEED_SUFFIX_SIZE)
 
-- 
1.9.1

[PATCH v2 0/3] net: phy: leds: Fix truncated LED trigger names and crashes

2017-01-25 Thread Geert Uytterhoeven

Hi David,

I started seeing crashes during s2ram and poweroff on all my ARM boards,
like:

Unable to handle kernel NULL pointer dereference at virtual address 
...
[] (__list_del_entry_valid) from [] (led_trigger_unregister+0x34/0xcc)
[] (led_trigger_unregister) from [] (phy_led_triggers_unregister+0x28/0x34)
[] (phy_led_triggers_unregister) from [] (phy_detach+0x30/0x74)
[] (phy_detach) from [] (sh_eth_close+0x64/0x9c)
[] (sh_eth_close) from [] (dpm_run_callback+0x48/0xc8)

or:

list_del corruption. prev->next should be dede6540, but was 2e323931
[ cut here ]
kernel BUG at lib/list_debug.c:52!
...
[] (__list_del_entry_valid) from [] (led_trigger_unregister+0x34/0xcc)
[] (led_trigger_unregister) from [] (phy_led_triggers_unregister+0x28/0x34)
[] (phy_led_triggers_unregister) from [] (phy_detach+0x30/0x74)
[] (phy_detach) from [] (sh_eth_close+0x6c/0xa4)
[] (sh_eth_close) from [] (__dev_close_many+0xac/0xd0)

As the only clue was a kernel message like

sh-eth ee70.ethernet eth0: No phy led trigger registered for speed(100)

I had to bisect this, leading to commit 4567d686f5c6d955 ("phy:
increase size of MII_BUS_ID_SIZE and bus_id").  Reverting that commit
fixed the issue.

More investigation revealed the crashes are due to the combination of
two things:
  - Truncated LED trigger names, leading to duplicate names, and
registration failures,
  - Bad error handling in case of registration failures.

Both are fixed by this patch series.

Changes compared to v1:
  - Add Reviewed-by,
  - New patch "net: phy: leds: Break dependency of phy.h on
phy_led_triggers.h",
  - Drop moving the include of , as
 no longer includes it,
  - #include  from .

Thanks!

Geert Uytterhoeven (3):
  net: phy: leds: Clear phy_num_led_triggers on failure to avoid crash
  net: phy: leds: Break dependency of phy.h on phy_led_triggers.h
  net: phy: leds: Fix truncated LED trigger names

 drivers/net/phy/phy.c              | 1 +
 drivers/net/phy/phy_led_triggers.c | 9 +++++++--
 include/linux/phy.h                | 1 -
 include/linux/phy_led_triggers.h   | 4 ++--
 4 files changed, 10 insertions(+), 5 deletions(-)

-- 
1.9.1

Gr{oetje,eeting}s,

Geert

Re: [PATCH 1/2] Documentation: devicetree: change the mediatek ethernet compatible string

2017-01-25 Thread Matthias Brugger

On 01/25/2017 09:20 AM, John Crispin wrote:
> When the binding was defined, I was not aware that mt2701 was an earlier
> version of the SoC. For the sake of consistency, the ethernet driver should
> use mt2701 inside the compat string, as this is the earliest SoC with the
> ethernet core.
>
> The ethernet driver is currently of no real use until we finish and
> upstream the DSA driver. There are no users of this binding yet. It should
> be safe to fix this now before it is too late and we need to provide
> backward compatibility for the mt7623-eth compat string.
>
> Reported-by: Sean Wang 
> Signed-off-by: John Crispin 
> ---

Sounds reasonable to me:
Reviewed-by: Matthias Brugger 

>  Documentation/devicetree/bindings/net/mediatek-net.txt | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/Documentation/devicetree/bindings/net/mediatek-net.txt b/Documentation/devicetree/bindings/net/mediatek-net.txt
> index c010faf..c7194e8 100644
> --- a/Documentation/devicetree/bindings/net/mediatek-net.txt
> +++ b/Documentation/devicetree/bindings/net/mediatek-net.txt
> @@ -7,7 +7,7 @@ have dual GMAC each represented by a child node..
>  * Ethernet controller node
>
>  Required properties:
> -- compatible: Should be "mediatek,mt7623-eth"
> +- compatible: Should be "mediatek,mt2701-eth"
>  - reg: Address and length of the register set for the device
>  - interrupts: Should contain the three frame engines interrupts in numeric
>  	order. These are fe_int0, fe_int1 and fe_int2.

RE: [patch] samples/bpf: silence shift wrapping warning

2017-01-25 Thread David Laight

From: Alexei Starovoitov
> Sent: 22 January 2017 22:51
> On Sat, Jan 21, 2017 at 07:51:43AM +0300, Dan Carpenter wrote:
> > max_key is a value in the 0-63 range, so on 32 bit systems the shift
> > could wrap.
> >
> > Signed-off-by: Dan Carpenter 
> 
> Looks fine. I think 'net-next' is ok.
> 
> Acked-by: Alexei Starovoitov 
> 
> > diff --git a/samples/bpf/lwt_len_hist_user.c b/samples/bpf/lwt_len_hist_user.c
> > index ec8f3bb..bd06eef 100644
> > --- a/samples/bpf/lwt_len_hist_user.c
> > +++ b/samples/bpf/lwt_len_hist_user.c
> > @@ -68,7 +68,7 @@ int main(int argc, char **argv)
> > for (i = 1; i <= max_key + 1; i++) {
> > stars(starstr, data[i - 1], max_value, MAX_STARS);
> > printf("%8ld -> %-8ld : %-8ld |%-*s|\n",
> > -  (1l << i) >> 1, (1l << i) - 1, data[i - 1],
> > +  (1ULL << i) >> 1, (1ULL << i) - 1, data[i - 1],
> >MAX_STARS, starstr);
> > }

The format effectors are wrong on 32-bit systems: %ld expects a long, but the bucket bounds are now unsigned long long.
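A minimal sketch of a format string that matches the widened arguments, assuming the fix is simply to print the bucket bounds with %llu; format_bucket() is a hypothetical helper, not part of samples/bpf. It also stops at i == 63: if max_key can really be 63, the loop in the patch reaches i == 64, and shifting a 64-bit value by 64 is undefined in C.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: format one histogram bucket's bounds.
 * Both bounds are unsigned long long, so %llu is the matching
 * conversion on 32-bit and 64-bit hosts alike. */
static int format_bucket(char *buf, size_t len, int i)
{
	unsigned long long lo, hi;

	/* 1ULL << 64 is undefined behaviour, so keep i in 1..63 */
	assert(i >= 1 && i <= 63);
	lo = (1ULL << i) >> 1;
	hi = (1ULL << i) - 1;
	return snprintf(buf, len, "%8llu -> %-8llu", lo, hi);
}
```

For example, format_bucket(buf, sizeof(buf), 1) writes "       1 -> 1       " and returns 20.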

David

Re: [PATCH 5/6] treewide: use kv[mz]alloc* rather than opencoded variants

2017-01-25 Thread Vlastimil Babka

On 01/24/2017 04:00 PM, Michal Hocko wrote:
> > > Well, I am not opposed to kvmalloc_array but I would argue that this
> > > conversion cannot introduce new overflow issues. The code would have
> > > to be broken already because even though kmalloc_array checks for the
> > > overflow but vmalloc fallback doesn't...
> >
> > Yeah I agree, but if some of the places were really wrong, after the
> > conversion we won't see them anymore.
> >
> > > If there is a general interest for this API I can add it.
> >
> > I think it would be better, yes.
>
> OK, fair enough. I will fold the following into the original patch. I
> was little bit reluctant to create kvcalloc so I've made the original
> callers more talkative and added | __GFP_ZERO.

Fair enough,

> To be honest I do not
> really like how kcalloc...

how kcalloc what?

[...]

> diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
> index cdc55d5ee4ad..eca16612b1ae 100644
> --- a/net/netfilter/x_tables.c
> +++ b/net/netfilter/x_tables.c
> @@ -712,10 +712,7 @@ EXPORT_SYMBOL(xt_check_entry_offsets);
>   */
>  unsigned int *xt_alloc_entry_offsets(unsigned int size)
>  {
> -   if (size < (SIZE_MAX / sizeof(unsigned int)))
> -   return kvzalloc(size * sizeof(unsigned int), GFP_KERNEL);
> -
> -   return NULL;
> +   return kvmalloc_array(size * sizeof(unsigned int), GFP_KERNEL | __GFP_ZERO);

This one wouldn't compile.
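kvmalloc_array() follows the kmalloc_array() convention of taking the element count and the element size as separate arguments (n, size, flags), which is why the single-product call quoted above does not compile. A minimal userspace sketch of why that split matters, assuming plain malloc() in place of the kernel allocators; alloc_array_zeroed() is a hypothetical name. Keeping n and size apart lets the helper reject a multiplication that would overflow before anything is allocated.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: allocate a zeroed array of n elements of size
 * bytes each, refusing requests whose total size would overflow. */
static void *alloc_array_zeroed(size_t n, size_t size)
{
	void *p;

	if (size != 0 && n > SIZE_MAX / size)
		return NULL;	/* n * size would wrap around */

	p = malloc(n * size);
	if (p)
		memset(p, 0, n * size);
	return p;
}
```

Called as alloc_array_zeroed(size, sizeof(unsigned int)), it has the same argument shape as the presumably intended kvmalloc_array(size, sizeof(unsigned int), GFP_KERNEL | __GFP_ZERO).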

[PATCH 0/2] More ethtool support and BGX configuration changes

2017-01-25 Thread sunil . kovvuri

From: Sunil Goutham 

These patches add support for setting queue sizes via ethtool and change
how the BGX driver configures serdes lanes on 81/83xx platforms.

Sunil Goutham (2):
  net: thunderx: Support to configure queue sizes from ethtool
  net: thunderx: Leave serdes lane config on 81/83xx to firmware

 .../net/ethernet/cavium/thunder/nicvf_ethtool.c| 39 -
 drivers/net/ethernet/cavium/thunder/nicvf_queues.c | 19 -
 drivers/net/ethernet/cavium/thunder/nicvf_queues.h | 16 +++-
 drivers/net/ethernet/cavium/thunder/thunder_bgx.c  | 95 --
 4 files changed, 83 insertions(+), 86 deletions(-)

-- 
2.7.4

[PATCH 2/2] net: thunderx: Leave serdes lane config on 81/83xx to firmware

2017-01-25 Thread sunil . kovvuri

From: Sunil Goutham 

For DLMs and SLMs on 80/81/83xx, many different lane configurations
are turning up across boards. The kernel has no way to identify the
board type, but the firmware does, so stop deriving the lane-to-serdes
configuration in the driver and use whatever the low-level firmware
has programmed.

Signed-off-by: Sunil Goutham 
---
 drivers/net/ethernet/cavium/thunder/thunder_bgx.c | 95 +--
 1 file changed, 18 insertions(+), 77 deletions(-)

diff --git a/drivers/net/ethernet/cavium/thunder/thunder_bgx.c b/drivers/net/ethernet/cavium/thunder/thunder_bgx.c
index 2f85b64..dfb2bad 100644
--- a/drivers/net/ethernet/cavium/thunder/thunder_bgx.c
+++ b/drivers/net/ethernet/cavium/thunder/thunder_bgx.c
@@ -894,17 +894,15 @@ static void bgx_print_qlm_mode(struct bgx *bgx, u8 lmacid)
struct device *dev = &bgx->pdev->dev;
struct lmac *lmac;
char str[20];
-   u8 dlm;
 
-   if (lmacid > bgx->max_lmac)
+   if (!bgx->is_dlm && lmacid)
return;
 
lmac = &bgx->lmac[lmacid];
-   dlm = (lmacid / 2) + (bgx->bgx_id * 2);
if (!bgx->is_dlm)
sprintf(str, "BGX%d QLM mode", bgx->bgx_id);
else
-   sprintf(str, "BGX%d DLM%d mode", bgx->bgx_id, dlm);
+   sprintf(str, "BGX%d LMAC%d mode", bgx->bgx_id, lmacid);
 
switch (lmac->lmac_type) {
case BGX_MODE_SGMII:
@@ -990,7 +988,6 @@ static void lmac_set_training(struct bgx *bgx, struct lmac *lmac, int lmacid)
 static void bgx_set_lmac_config(struct bgx *bgx, u8 idx)
 {
struct lmac *lmac;
-   struct lmac *olmac;
u64 cmr_cfg;
u8 lmac_type;
u8 lane_to_sds;
@@ -1010,62 +1007,26 @@ static void bgx_set_lmac_config(struct bgx *bgx, u8 idx)
return;
}
 
-   /* On 81xx BGX can be split across 2 DLMs
-* firmware programs lmac_type of LMAC0 and LMAC2
+   /* For DLMs or SLMs on 80/81/83xx so many lane configurations
+* are possible and vary across boards. Also Kernel doesn't have
+* any way to identify board type/info and since firmware does,
+* just take lmac type and serdes lane config as is.
 */
-   if ((idx == 0) || (idx == 2)) {
-   cmr_cfg = bgx_reg_read(bgx, idx, BGX_CMRX_CFG);
-   lmac_type = (u8)((cmr_cfg >> 8) & 0x07);
-   lane_to_sds = (u8)(cmr_cfg & 0xFF);
-   /* Check if config is not reset value */
-   if ((lmac_type == 0) && (lane_to_sds == 0xE4))
-   lmac->lmac_type = BGX_MODE_INVALID;
-   else
-   lmac->lmac_type = lmac_type;
-   lmac_set_training(bgx, lmac, lmac->lmacid);
-   lmac_set_lane2sds(bgx, lmac);
-
-   olmac = &bgx->lmac[idx + 1];
-   /*  Check if other LMAC on the same DLM is already configured by
-*  firmware, if so use the same config or else set as same, as
-*  that of LMAC 0/2.
-*  This check is needed as on 80xx only one lane of each of the
-*  DLM of BGX0 is used, so have to rely on firmware for
-*  distingushing 80xx from 81xx.
-*/
-   cmr_cfg = bgx_reg_read(bgx, idx + 1, BGX_CMRX_CFG);
-   lmac_type = (u8)((cmr_cfg >> 8) & 0x07);
-   lane_to_sds = (u8)(cmr_cfg & 0xFF);
-   if ((lmac_type == 0) && (lane_to_sds == 0xE4)) {
-   olmac->lmac_type = lmac->lmac_type;
-   lmac_set_lane2sds(bgx, olmac);
-   } else {
-   olmac->lmac_type = lmac_type;
-   olmac->lane_to_sds = lane_to_sds;
-   }
-   lmac_set_training(bgx, olmac, olmac->lmacid);
-   }
-}
-
-static bool is_dlm0_in_bgx_mode(struct bgx *bgx)
-{
-   struct lmac *lmac;
-
-   if (!bgx->is_dlm)
-   return true;
-
-   lmac = &bgx->lmac[0];
-   if (lmac->lmac_type == BGX_MODE_INVALID)
-   return false;
-
-   return true;
+   cmr_cfg = bgx_reg_read(bgx, idx, BGX_CMRX_CFG);
+   lmac_type = (u8)((cmr_cfg >> 8) & 0x07);
+   lane_to_sds = (u8)(cmr_cfg & 0xFF);
+   /* Check if config is reset value */
+   if ((lmac_type == 0) && (lane_to_sds == 0xE4))
+   lmac->lmac_type = BGX_MODE_INVALID;
+   else
+   lmac->lmac_type = lmac_type;
+   lmac->lane_to_sds = lane_to_sds;
+   lmac_set_training(bgx, lmac, lmac->lmacid);
 }
 
 static void bgx_get_qlm_mode(struct bgx *bgx)
 {
struct lmac *lmac;
-   struct lmac *lmac01;
-   struct lmac *lmac23;
u8  idx;
 
/* Init all LMAC's type to invalid */
@@ -1081,29 +1042,9 @@ static void bgx_get_qlm_mode(struct bgx *bgx)
if (bgx->lmac_count > bgx->max_lmac)
bgx->lmac_count = bgx->max_lmac;
 
-   for (idx = 0; idx < bgx->max_lmac; idx++)
-
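With this patch the driver no longer works out the lane mapping itself; it only decodes what firmware left in BGX_CMRX_CFG. A standalone sketch of that decode, using the same shifts and masks as the new bgx_set_lmac_config() and treating LMAC type 0 with lane map 0xE4 as the untouched reset value, as the driver does. The struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Fields of BGX_CMRX_CFG as read by bgx_set_lmac_config():
 * bits [10:8] hold the LMAC type, bits [7:0] the lane-to-serdes map. */
struct cmr_cfg_fields {
	uint8_t lmac_type;
	uint8_t lane_to_sds;
	bool is_reset_value;	/* type 0 and lane map 0xE4 */
};

/* Hypothetical decoder mirroring the driver's masks. */
static struct cmr_cfg_fields decode_cmr_cfg(uint64_t cmr_cfg)
{
	struct cmr_cfg_fields f;

	f.lmac_type = (uint8_t)((cmr_cfg >> 8) & 0x07);
	f.lane_to_sds = (uint8_t)(cmr_cfg & 0xFF);
	f.is_reset_value = (f.lmac_type == 0 && f.lane_to_sds == 0xE4);
	return f;
}
```

A register that still reads 0xE4 is therefore reported as BGX_MODE_INVALID, while anything else is taken as the firmware's choice.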

[PATCH 1/2] net: thunderx: Support to configure queue sizes from ethtool

2017-01-25 Thread sunil . kovvuri

From: Sunil Goutham 

Add support for setting Rx/Tx queue sizes via ethtool, and fix
get_ringparam to report the CQ length rather than the RBDR length
for Rx. Also set the SQ's CQ_LIMIT based on the configured Tx queue
size so that HW stops processing SQEs when there is not enough space
in the CQ to post completions.

Signed-off-by: Sunil Goutham 
---
 .../net/ethernet/cavium/thunder/nicvf_ethtool.c| 39 --
 drivers/net/ethernet/cavium/thunder/nicvf_queues.c | 19 +--
 drivers/net/ethernet/cavium/thunder/nicvf_queues.h | 16 ++---
 3 files changed, 65 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_ethtool.c b/drivers/net/ethernet/cavium/thunder/nicvf_ethtool.c
index 5ac4746..02a986c 100644
--- a/drivers/net/ethernet/cavium/thunder/nicvf_ethtool.c
+++ b/drivers/net/ethernet/cavium/thunder/nicvf_ethtool.c
@@ -471,12 +471,46 @@ static void nicvf_get_ringparam(struct net_device *netdev,
struct nicvf *nic = netdev_priv(netdev);
struct queue_set *qs = nic->qs;
 
-   ring->rx_max_pending = MAX_RCV_BUF_COUNT;
-   ring->rx_pending = qs->rbdr_len;
+   ring->rx_max_pending = MAX_CMP_QUEUE_LEN;
+   ring->rx_pending = qs->cq_len;
ring->tx_max_pending = MAX_SND_QUEUE_LEN;
ring->tx_pending = qs->sq_len;
 }
 
+static int nicvf_set_ringparam(struct net_device *netdev,
+  struct ethtool_ringparam *ring)
+{
+   struct nicvf *nic = netdev_priv(netdev);
+   struct queue_set *qs = nic->qs;
+   u32 rx_count, tx_count;
+
+   /* Due to HW errata this is not supported on T88 pass 1.x silicon */
+   if (pass1_silicon(nic->pdev))
+   return -EINVAL;
+
+   if ((ring->rx_mini_pending) || (ring->rx_jumbo_pending))
+   return -EINVAL;
+
+   tx_count = clamp_t(u32, ring->tx_pending,
+  MIN_SND_QUEUE_LEN, MAX_SND_QUEUE_LEN);
+   rx_count = clamp_t(u32, ring->rx_pending,
+  MIN_CMP_QUEUE_LEN, MAX_CMP_QUEUE_LEN);
+
+   if ((tx_count == qs->sq_len) && (rx_count == qs->cq_len))
+   return 0;
+
+   /* Permitted lengths are 1K, 2K, 4K, 8K, 16K, 32K, 64K */
+   qs->sq_len = rounddown_pow_of_two(tx_count);
+   qs->cq_len = rounddown_pow_of_two(rx_count);
+
+   if (netif_running(netdev)) {
+   nicvf_stop(netdev);
+   nicvf_open(netdev);
+   }
+
+   return 0;
+}
+
 static int nicvf_get_rss_hash_opts(struct nicvf *nic,
   struct ethtool_rxnfc *info)
 {
@@ -787,6 +821,7 @@ static const struct ethtool_ops nicvf_ethtool_ops = {
.get_regs   = nicvf_get_regs,
.get_coalesce   = nicvf_get_coalesce,
.get_ringparam  = nicvf_get_ringparam,
+   .set_ringparam  = nicvf_set_ringparam,
.get_rxnfc  = nicvf_get_rxnfc,
.set_rxnfc  = nicvf_set_rxnfc,
.get_rxfh_key_size  = nicvf_get_rxfh_key_size,
diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_queues.c b/drivers/net/ethernet/cavium/thunder/nicvf_queues.c
index d2ac133..ac0390b 100644
--- a/drivers/net/ethernet/cavium/thunder/nicvf_queues.c
+++ b/drivers/net/ethernet/cavium/thunder/nicvf_queues.c
@@ -603,7 +603,7 @@ void nicvf_cmp_queue_config(struct nicvf *nic, struct queue_set *qs,
cq_cfg.ena = 1;
cq_cfg.reset = 0;
cq_cfg.caching = 0;
-   cq_cfg.qsize = CMP_QSIZE;
+   cq_cfg.qsize = ilog2(qs->cq_len >> 10);
cq_cfg.avg_con = 0;
nicvf_queue_reg_write(nic, NIC_QSET_CQ_0_7_CFG, qidx, *(u64 *)&cq_cfg);
 
@@ -652,9 +652,12 @@ static void nicvf_snd_queue_config(struct nicvf *nic, struct queue_set *qs,
sq_cfg.ena = 1;
sq_cfg.reset = 0;
sq_cfg.ldwb = 0;
-   sq_cfg.qsize = SND_QSIZE;
+   sq_cfg.qsize = ilog2(qs->sq_len >> 10);
sq_cfg.tstmp_bgx_intf = 0;
-   sq_cfg.cq_limit = 0;
+   /* CQ's level at which HW will stop processing SQEs to avoid
+* transmitting a pkt with no space in CQ to post CQE_TX.
+*/
+   sq_cfg.cq_limit = (CMP_QUEUE_PIPELINE_RSVD * 256) / qs->cq_len;
nicvf_queue_reg_write(nic, NIC_QSET_SQ_0_7_CFG, qidx, *(u64 *)&sq_cfg);
 
/* Set threshold value for interrupt generation */
@@ -816,11 +819,21 @@ int nicvf_config_data_transfer(struct nicvf *nic, bool enable)
 {
bool disable = false;
struct queue_set *qs = nic->qs;
+   struct queue_set *pqs = nic->pnicvf->qs;
int qidx;
 
if (!qs)
return 0;
 
+   /* Take primary VF's queue lengths.
+* This is needed to take queue lengths set from ethtool
+* into consideration.
+*/
+   if (nic->sqs_mode && pqs) {
+   qs->cq_len = pqs->cq_len;
+   qs->sq_len = pqs->sq_len;
+   }
+
if (enable) {
if (nicvf_alloc_resources(nic))
return -ENOMEM;
diff --git a/drivers/net/ethernet/cavium/thunder/nicvf
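
The new set_ringparam path clamps the request, rounds it down to a power of two, and then encodes the length in the queue's qsize field as ilog2(len >> 10). A userspace sketch of that arithmetic, assuming the 1K..64K range given in the patch comment; the EX_ bounds and the helper names are hypothetical stand-ins for the driver's MIN/MAX_*_QUEUE_LEN constants, rounddown_pow_of_two() and ilog2().

```c
#include <stdint.h>

/* Hypothetical bounds taken from the "1K, 2K, ... 64K" comment in
 * nicvf_set_ringparam(); the driver has separate MIN/MAX constants
 * for SQs and CQs. */
#define EX_MIN_QUEUE_LEN 1024u
#define EX_MAX_QUEUE_LEN 65536u

/* Stand-in for rounddown_pow_of_two(); expects x > 0. */
static uint32_t round_down_pow2(uint32_t x)
{
	uint32_t p = 1;

	while (p <= x / 2)
		p <<= 1;
	return p;
}

/* Stand-in for ilog2() on a power of two. */
static unsigned int log2_pow2(uint32_t x)
{
	unsigned int n = 0;

	while (x >>= 1)
		n++;
	return n;
}

static uint32_t clamp_u32(uint32_t v, uint32_t lo, uint32_t hi)
{
	return v < lo ? lo : (v > hi ? hi : v);
}

/* Clamp the requested ring size, round it down to a power of two and
 * compute the qsize encoding ilog2(len >> 10): 0 for 1K ... 6 for 64K. */
static uint32_t queue_len_from_request(uint32_t requested,
				       unsigned int *qsize)
{
	uint32_t len = round_down_pow2(clamp_u32(requested, EX_MIN_QUEUE_LEN,
						 EX_MAX_QUEUE_LEN));

	*qsize = log2_pow2(len >> 10);
	return len;
}
```

Under these assumptions a request for 3000 descriptors becomes a 2048-entry queue with qsize 1, and anything above 64K is clamped to 65536 with qsize 6.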

Re: [PATCH v2 3/3] net: phy: leds: Fix truncated LED trigger names

2017-01-25 Thread Andrew Lunn

On Wed, Jan 25, 2017 at 11:39:50AM +0100, Geert Uytterhoeven wrote:
> Commit 4567d686f5c6d955 ("phy: increase size of MII_BUS_ID_SIZE and
> bus_id") increased the size of MII bus IDs, but forgot to update the
> private definition in .
> This may cause:
>   1. Truncation of LED trigger names,
>   2. Duplicate LED trigger names,
>   3. Failures registering LED triggers,
>   4. Crashes due to bad error handling in the LED trigger failure path.
> 
> To fix this, and prevent the definitions going out of sync again in the
> future, let the PHY LED trigger code use the existing MII_BUS_ID_SIZE
> definition.
> 
> Example:
>   - Before I had triggers "ee70.etherne:01:100Mbps" and
> "ee70.etherne:01:10Mbps",
>   - After the increase of MII_BUS_ID_SIZE, both became
> "ee70.ethernet-:01:" => FAIL,
>   - Now, the triggers are "ee70.ethernet-:01:100Mbps" and
> "ee70.ethernet-:01:10Mbps", which are unique again.
> 
> Fixes: 4567d686f5c6d955 ("phy: increase size of MII_BUS_ID_SIZE and bus_id")
> Fixes: 2e0bc452f4721520 ("net: phy: leds: add support for led triggers on phy link state change")
> Signed-off-by: Geert Uytterhoeven 

Reviewed-by: Andrew Lunn 

Andrew
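
snprintf() truncates silently when the destination is too small, so a trigger-name buffer sized from a stale bus-ID length can cut off the speed suffix and make two triggers identical, as in the example above. A minimal sketch; the "%s:%s" layout, the made-up PHY id string and the 30- and 64-byte buffers are hypothetical and only loosely modeled on the PHY LED trigger names.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: build "<phy id>:<speed>" into buf.
 * Returns true if snprintf() had to truncate the result. */
static bool format_trigger_name(char *buf, size_t size,
				const char *phy_id, const char *speed)
{
	int n = snprintf(buf, size, "%s:%s", phy_id, speed);

	return n < 0 || (size_t)n >= size;
}
```

With a 29-character PHY id and a 30-byte buffer, "100Mbps" and "10Mbps" both collapse to the bare id, while a 64-byte buffer keeps them distinct. Deriving every such size from one shared constant, as the patch does with MII_BUS_ID_SIZE, keeps the definitions from drifting apart again.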

Re: [PATCH v2 2/3] net: phy: leds: Break dependency of phy.h on phy_led_triggers.h

2017-01-25 Thread Andrew Lunn

On Wed, Jan 25, 2017 at 11:39:49AM +0100, Geert Uytterhoeven wrote:
>  includes , which is not really
> needed.  Drop the include from , and add it to all users
> that didn't include it explicitly.
> 
> Suggested-by: Andrew Lunn 
> Signed-off-by: Geert Uytterhoeven 

Reviewed-by: Andrew Lunn 

Andrew

Re: [PATCH 5/6] treewide: use kv[mz]alloc* rather than opencoded variants

2017-01-25 Thread Michal Hocko

On Wed 25-01-17 12:15:59, Vlastimil Babka wrote:
> On 01/24/2017 04:00 PM, Michal Hocko wrote:
> > > > Well, I am not opposed to kvmalloc_array but I would argue that this
> > > > conversion cannot introduce new overflow issues. The code would have
> > > > to be broken already because even though kmalloc_array checks for the
> > > > overflow but vmalloc fallback doesn't...
> > > 
> > > Yeah I agree, but if some of the places were really wrong, after the
> > > conversion we won't see them anymore.
> > > 
> > > > If there is a general interest for this API I can add it.
> > > 
> > > I think it would be better, yes.
> > 
> > OK, fair enough. I will fold the following into the original patch. I
> > was little bit reluctant to create kvcalloc so I've made the original
> > callers more talkative and added | __GFP_ZERO.
> 
> Fair enough,
> 
> > To be honest I do not
> > really like how kcalloc...
> 
> how kcalloc what?

how kcalloc hides the GFP_ZERO and the name doesn't reflect that.
 
> [...]
> > diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
> > index cdc55d5ee4ad..eca16612b1ae 100644
> > --- a/net/netfilter/x_tables.c
> > +++ b/net/netfilter/x_tables.c
> > @@ -712,10 +712,7 @@ EXPORT_SYMBOL(xt_check_entry_offsets);
> >   */
> >  unsigned int *xt_alloc_entry_offsets(unsigned int size)
> >  {
> > -   if (size < (SIZE_MAX / sizeof(unsigned int)))
> > -   return kvzalloc(size * sizeof(unsigned int), GFP_KERNEL);
> > -
> > -   return NULL;
> > +   return kvmalloc_array(size * sizeof(unsigned int), GFP_KERNEL | __GFP_ZERO);
> 
> This one wouldn't compile.

fixed, thanks!

-- 
Michal Hocko
SUSE Labs

ip link SR-IOV VF MAC address disparity

2017-01-25 Thread Leon Goldberg

Hey,

Using ip link to retrieve the MAC addresses of some SR-IOV virtual
functions, I'm receiving mixed results:

[root@nari05 sys]# ip link
1: lo:  mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: enp2s0f0:  mtu 1500 qdisc mq
master ovirtmgmt state UP mode DEFAULT qlen 1000
link/ether 78:e7:d1:e4:9b:64 brd ff:ff:ff:ff:ff:ff
3: enp2s0f1:  mtu 1500 qdisc mq master test1
state DOWN mode DEFAULT qlen 1000
link/ether 78:e7:d1:e4:9b:65 brd ff:ff:ff:ff:ff:ff
vf 0 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
vf 1 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
vf 2 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
vf 3 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
vf 4 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
4: ovirtmgmt:  mtu 1500 qdisc noqueue
state UP mode DEFAULT
link/ether 78:e7:d1:e4:9b:64 brd ff:ff:ff:ff:ff:ff
5: test1:  mtu 1500 qdisc noqueue
state DOWN mode DEFAULT
link/ether 78:e7:d1:e4:9b:65 brd ff:ff:ff:ff:ff:ff
11: ;vdsmdummy;:  mtu 1500 qdisc noop state DOWN
mode DEFAULT
link/ether 5e:b4:ac:5c:b9:a1 brd ff:ff:ff:ff:ff:ff
37: enp2s16f1:  mtu 1500 qdisc noop state DOWN
mode DEFAULT qlen 1000
link/ether 00:00:00:00:00:02 brd ff:ff:ff:ff:ff:ff
38: enp2s16f3:  mtu 1500 qdisc noop state DOWN
mode DEFAULT qlen 1000
link/ether d6:ee:45:57:c0:39 brd ff:ff:ff:ff:ff:ff
39: enp2s16f5:  mtu 1500 qdisc noop state DOWN
mode DEFAULT qlen 1000
link/ether 4a:2c:25:42:97:4a brd ff:ff:ff:ff:ff:ff
40: enp2s16f7:  mtu 1500 qdisc noop state DOWN
mode DEFAULT qlen 1000
link/ether c2:fe:2f:5e:f5:e8 brd ff:ff:ff:ff:ff:ff
41: enp2s17f1:  mtu 1500 qdisc noop state DOWN
mode DEFAULT qlen 1000
link/ether e6:31:a9:59:5f:ad brd ff:ff:ff:ff:ff:ff

enp2s0f1 is the physical function; enp2s16f* and enp2s17f* are the
interfaces to the virtual functions.

Essentially, I have 2 questions:
1) What is the difference between the entries under the physical
function and the interfaces?
2) How should I retrieve the correct MAC addresses? I'm aware of
/sys/...//net/address, but I am now not sure it is the correct
source.

Thanks,
Leon

Re: NAPI on USB network drivers

2017-01-25 Thread Oliver Hartkopp

On 01/25/2017 10:39 AM, Hayes Wang wrote:
> Oliver Neukum [mailto:oneu...@suse.com]
>> Sent: Wednesday, January 25, 2017 5:35 PM
> [...]
>> looking at r8152 I noticed that it uses NAPI. I never considered
>> this for the generic USB networking code as you cannot disable
>> interrupts for USB. Is it still worth it? What are the benefits?
>
> You could use napi_gro_receive() and it influences the performance.

Another positive effect with NAPI is that you won't face out-of-order
ethernet frames as you get with non-NAPI drivers, e.g. ax88179_178a

http://marc.info/?l=linux-can&m=148049063812807&w=2

We have the issue with CAN drivers where all USB drivers and >90% of the
I/O mapped drivers do not use NAPI.

I wonder whether it makes sense to add NAPI to a driver which only has
ONE RX buffer ... but when searching for a solution for o-o-o frames I
was always pointed to NAPI.

Regards,
Oliver

Re: [PATCH 5/6] treewide: use kv[mz]alloc* rather than opencoded variants

2017-01-25 Thread Ilya Dryomov

On Wed, Jan 25, 2017 at 2:09 PM, Michal Hocko  wrote:
> On Wed 25-01-17 12:15:59, Vlastimil Babka wrote:
>> On 01/24/2017 04:00 PM, Michal Hocko wrote:
>> > > > Well, I am not opposed to kvmalloc_array but I would argue that this
>> > > > conversion cannot introduce new overflow issues. The code would have
>> > > > to be broken already because even though kmalloc_array checks for the
>> > > > overflow but vmalloc fallback doesn't...
>> > >
>> > > Yeah I agree, but if some of the places were really wrong, after the
>> > > conversion we won't see them anymore.
>> > >
>> > > > If there is a general interest for this API I can add it.
>> > >
>> > > I think it would be better, yes.
>> >
>> > OK, fair enough. I will fold the following into the original patch. I
>> > was little bit reluctant to create kvcalloc so I've made the original
>> > callers more talkative and added | __GFP_ZERO.
>>
>> Fair enough,
>>
>> > To be honest I do not
>> > really like how kcalloc...
>>
>> how kcalloc what?
>
> how kcalloc hides the GFP_ZERO and the name doesn't reflect that.

The userspace calloc() is specified to zero memory, so I'd say the name
does reflect that.

Thanks,

Ilya

[PATCH net-next] sfc: reduce severity of PIO buffer alloc failures

2017-01-25 Thread Bert Kenward

From: Tomáš Pilař 

PIO buffer allocation can fail for two valid reasons:
 - we've run out of them (results in -ENOSPC)
 - the NIC configuration doesn't support them (results in -EPERM)
Since both of these failures are expected, logging them with netif_err is excessive.

Signed-off-by: Bert Kenward 
---
 drivers/net/ethernet/sfc/ef10.c | 18 +++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/sfc/ef10.c b/drivers/net/ethernet/sfc/ef10.c
index dccbbd323616..7c53da28ad64 100644
--- a/drivers/net/ethernet/sfc/ef10.c
+++ b/drivers/net/ethernet/sfc/ef10.c
@@ -1170,7 +1170,13 @@ static int efx_ef10_dimension_resources(struct efx_nic *efx)
 nic_data->piobuf_size / efx_piobuf_size);
 
rc = efx_ef10_alloc_piobufs(efx, n_piobufs);
-   if (rc)
+   if (rc == -ENOSPC)
+   netif_dbg(efx, probe, efx->net_dev,
+ "out of PIO buffers; cannot allocate more\n");
+   else if (rc == -EPERM)
+   netif_dbg(efx, probe, efx->net_dev,
+ "not permitted to allocate PIO buffers\n");
+   else if (rc)
netif_err(efx, probe, efx->net_dev,
  "failed to allocate PIO buffers (%d)\n", rc);
else
@@ -1317,8 +1323,14 @@ static int efx_ef10_init_nic(struct efx_nic *efx)
efx_ef10_free_piobufs(efx);
}
 
-   /* Log an error on failure, but this is non-fatal */
-   if (rc)
+   /* Log an error on failure, but this is non-fatal.
+* Permission errors are less important - we've presumably
+* had the PIO buffer licence removed.
+*/
+   if (rc == -EPERM)
+   netif_dbg(efx, drv, efx->net_dev,
+ "not permitted to restore PIO buffers\n");
+   else if (rc)
netif_err(efx, drv, efx->net_dev,
  "failed to restore PIO buffers (%d)\n", rc);
nic_data->must_restore_piobufs = false;
-- 
2.7.4
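
The allocation path in this patch downgrades the two expected failure codes to debug level and keeps everything else as an error. That policy, pulled out into a small hypothetical helper for illustration; the driver itself uses inline if/else chains with netif_dbg() and netif_err(), and its restore path in efx_ef10_init_nic() downgrades only -EPERM.

```c
#include <errno.h>

/* Hypothetical severity levels standing in for netif_dbg()/netif_err(). */
enum ex_severity { EX_SEV_NONE, EX_SEV_DEBUG, EX_SEV_ERROR };

/* Severity for a PIO buffer allocation result, following the patch:
 * -ENOSPC (out of buffers) and -EPERM (not permitted by the NIC
 * configuration) are expected; any other failure is an error. */
static enum ex_severity piobuf_alloc_severity(int rc)
{
	if (rc == 0)
		return EX_SEV_NONE;
	if (rc == -ENOSPC || rc == -EPERM)
		return EX_SEV_DEBUG;
	return EX_SEV_ERROR;
}
```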

[PATCH 14/23] tools lib bpf: Fix map offsets in relocation

2017-01-25 Thread Arnaldo Carvalho de Melo

From: Joe Stringer 

Commit 4708bbda5cb2 ("tools lib bpf: Fix maps resolution") attempted to
fix map resolution by identifying the number of symbols that point to
maps, and using this number to resolve each of the maps.

However, during relocation the original definition of the map size was
still in use. For up to two maps, the calculation was correct if there
was a small difference in size between the map definition in libbpf and
the one that the client library uses. However if the difference was
large, particularly if more than two maps were used in the BPF program,
the relocation would fail.

For example, when using a map definition with size 28, with three maps,
map relocation would count:

(sym_offset / sizeof(struct bpf_map_def) => map_idx)
(0 / 16 => 0), ie map_idx = 0
(28 / 16 => 1), ie map_idx = 1
(56 / 16 => 3), ie map_idx = 3

So, libbpf reports:

libbpf: bpf relocation: map_idx 3 large than 2

Fix map relocation by checking the exact offset of maps when doing
relocation.

Signed-off-by: Joe Stringer 
[Allow different map size in an object]
Signed-off-by: Wang Nan 
Cc: Alexei Starovoitov 
Cc: Daniel Borkmann 
Cc: netdev@vger.kernel.org
Fixes: 4708bbda5cb2 ("tools lib bpf: Fix maps resolution")
Link: http://lkml.kernel.org/r/20170123011128.26534-2-...@ovn.org
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/lib/bpf/libbpf.c | 15 ---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 84e6b35da4bd..671d5ad07cf1 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -779,7 +779,7 @@ static int
 bpf_program__collect_reloc(struct bpf_program *prog,
   size_t nr_maps, GElf_Shdr *shdr,
   Elf_Data *data, Elf_Data *symbols,
-  int maps_shndx)
+  int maps_shndx, struct bpf_map *maps)
 {
int i, nrels;
 
@@ -829,7 +829,15 @@ bpf_program__collect_reloc(struct bpf_program *prog,
return -LIBBPF_ERRNO__RELOC;
}
 
-   map_idx = sym.st_value / sizeof(struct bpf_map_def);
+   /* TODO: 'maps' is sorted. We can use bsearch to make it faster. */
+   for (map_idx = 0; map_idx < nr_maps; map_idx++) {
+   if (maps[map_idx].offset == sym.st_value) {
+   pr_debug("relocation: find map %zd (%s) for insn %u\n",
+map_idx, maps[map_idx].name, insn_idx);
+   break;
+   }
+   }
+
if (map_idx >= nr_maps) {
pr_warning("bpf relocation: map_idx %d large than %d\n",
   (int)map_idx, (int)nr_maps - 1);
@@ -953,7 +961,8 @@ static int bpf_object__collect_reloc(struct bpf_object *obj)
err = bpf_program__collect_reloc(prog, nr_maps,
 shdr, data,
 obj->efile.symbols,
-obj->efile.maps_shndx);
+obj->efile.maps_shndx,
+obj->maps);
if (err)
return err;
}
-- 
2.9.3
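
The arithmetic in the commit message can be reproduced directly. A sketch contrasting the old division-based index with an exact-offset lookup; struct map_entry is a hypothetical, trimmed-down stand-in for libbpf's struct bpf_map, and the lookup uses bsearch() as the TODO in the patch suggests, whereas the patch itself does a linear scan.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for libbpf's struct bpf_map: only the
 * symbol offset inside the maps section matters here. */
struct map_entry {
	size_t offset;
	const char *name;
};

/* Old approach: assumes every map definition is def_size bytes. */
static size_t map_idx_by_division(size_t sym_offset, size_t def_size)
{
	return sym_offset / def_size;
}

static int cmp_offset(const void *key, const void *elem)
{
	size_t k = *(const size_t *)key;
	size_t o = ((const struct map_entry *)elem)->offset;

	return (k > o) - (k < o);
}

/* Fixed approach: find the map whose offset matches exactly in the
 * table sorted by offset. Returns nr_maps if no map starts there. */
static size_t map_idx_by_offset(const struct map_entry *maps,
				size_t nr_maps, size_t sym_offset)
{
	const struct map_entry *m = bsearch(&sym_offset, maps, nr_maps,
					    sizeof(*maps), cmp_offset);

	return m ? (size_t)(m - maps) : nr_maps;
}
```

With 28-byte definitions at offsets 0, 28 and 56, division by 16 yields indices 0, 1 and 3, while the offset lookup yields 0, 1 and 2.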

[PATCH 15/23] tools lib bpf: Define prog_type fns with macro

2017-01-25 Thread Arnaldo Carvalho de Melo

From: Joe Stringer 

Turning this into a macro allows future prog types to be added with a
single line per type.

Signed-off-by: Joe Stringer 
Acked-by: Wang Nan 
Cc: Alexei Starovoitov 
Cc: Daniel Borkmann 
Cc: netdev@vger.kernel.org
Link: http://lkml.kernel.org/r/20170123011128.26534-3-...@ovn.org
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/lib/bpf/libbpf.c | 41 -
 1 file changed, 16 insertions(+), 25 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 671d5ad07cf1..371cb40a2304 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -1428,37 +1428,28 @@ static void bpf_program__set_type(struct bpf_program *prog,
prog->type = type;
 }
 
-int bpf_program__set_tracepoint(struct bpf_program *prog)
-{
-   if (!prog)
-   return -EINVAL;
-   bpf_program__set_type(prog, BPF_PROG_TYPE_TRACEPOINT);
-   return 0;
-}
-
-int bpf_program__set_kprobe(struct bpf_program *prog)
-{
-   if (!prog)
-   return -EINVAL;
-   bpf_program__set_type(prog, BPF_PROG_TYPE_KPROBE);
-   return 0;
-}
-
 static bool bpf_program__is_type(struct bpf_program *prog,
 enum bpf_prog_type type)
 {
return prog ? (prog->type == type) : false;
 }
 
-bool bpf_program__is_tracepoint(struct bpf_program *prog)
-{
-   return bpf_program__is_type(prog, BPF_PROG_TYPE_TRACEPOINT);
-}
-
-bool bpf_program__is_kprobe(struct bpf_program *prog)
-{
-   return bpf_program__is_type(prog, BPF_PROG_TYPE_KPROBE);
-}
+#define BPF_PROG_TYPE_FNS(NAME, TYPE)  \
+int bpf_program__set_##NAME(struct bpf_program *prog)  \
+{  \
+   if (!prog)  \
+   return -EINVAL; \
+   bpf_program__set_type(prog, TYPE);  \
+   return 0;   \
+}  \
+   \
+bool bpf_program__is_##NAME(struct bpf_program *prog)  \
+{  \
+   return bpf_program__is_type(prog, TYPE);\
+}  \
+
+BPF_PROG_TYPE_FNS(kprobe, BPF_PROG_TYPE_KPROBE);
+BPF_PROG_TYPE_FNS(tracepoint, BPF_PROG_TYPE_TRACEPOINT);
 
 int bpf_map__fd(struct bpf_map *map)
 {
-- 
2.9.3
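
The macro relies on ## token pasting, so one line per program type emits both the setter and the predicate. A self-contained sketch of the same pattern; struct program, enum prog_type and the program__ prefix are hypothetical stand-ins for the libbpf types.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical minimal types standing in for libbpf's. */
enum prog_type { PROG_TYPE_KPROBE, PROG_TYPE_TRACEPOINT, PROG_TYPE_XDP };
struct program { enum prog_type type; };

/* Emits program__set_NAME() and program__is_NAME() for one type. */
#define PROG_TYPE_FNS(NAME, TYPE) \
int program__set_##NAME(struct program *prog) \
{ \
	if (!prog) \
		return -EINVAL; \
	prog->type = (TYPE); \
	return 0; \
} \
 \
bool program__is_##NAME(const struct program *prog) \
{ \
	return prog ? prog->type == (TYPE) : false; \
}

PROG_TYPE_FNS(kprobe, PROG_TYPE_KPROBE)
PROG_TYPE_FNS(tracepoint, PROG_TYPE_TRACEPOINT)
PROG_TYPE_FNS(xdp, PROG_TYPE_XDP)
```

These invocations omit the trailing semicolon used in the patch: after a function definition it becomes an empty declaration at file scope, which pedantic ISO C builds warn about.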

[PATCH 17/23] tools lib bpf: Add libbpf_get_error()

2017-01-25 Thread Arnaldo Carvalho de Melo

From: Joe Stringer 

This function will turn a libbpf pointer into a standard error code (or
0 if the pointer is valid).

This also allows removal of the dependency on linux/err.h in the public
header file, which causes problems in userspace programs built against
libbpf.

Signed-off-by: Joe Stringer 
Acked-by: Wang Nan 
Cc: Alexei Starovoitov 
Cc: Daniel Borkmann 
Cc: netdev@vger.kernel.org
Link: http://lkml.kernel.org/r/20170123011128.26534-5-...@ovn.org
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/lib/bpf/libbpf.c  | 8 
 tools/lib/bpf/libbpf.h  | 4 +++-
 tools/perf/tests/llvm.c | 2 +-
 3 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 406838fa9c4f..e6cd62b1264b 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -28,6 +28,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1542,3 +1543,10 @@ bpf_object__find_map_by_offset(struct bpf_object *obj, size_t offset)
}
return ERR_PTR(-ENOENT);
 }
+
+long libbpf_get_error(const void *ptr)
+{
+   if (IS_ERR(ptr))
+   return PTR_ERR(ptr);
+   return 0;
+}
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index 2188ccdc0e2d..4014d1ba5e3d 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -22,8 +22,8 @@
 #define __BPF_LIBBPF_H
 
 #include 
+#include 
 #include 
-#include 
 #include   // for size_t
 
 enum libbpf_errno {
@@ -234,4 +234,6 @@ int bpf_map__set_priv(struct bpf_map *map, void *priv,
  bpf_map_clear_priv_t clear_priv);
 void *bpf_map__priv(struct bpf_map *map);
 
+long libbpf_get_error(const void *ptr);
+
 #endif
diff --git a/tools/perf/tests/llvm.c b/tools/perf/tests/llvm.c
index 02a33ebcd992..d357dab72e68 100644
--- a/tools/perf/tests/llvm.c
+++ b/tools/perf/tests/llvm.c
@@ -13,7 +13,7 @@ static int test__bpf_parsing(void *obj_buf, size_t obj_buf_sz)
struct bpf_object *obj;
 
obj = bpf_object__open_buffer(obj_buf, obj_buf_sz, NULL);
-   if (IS_ERR(obj))
+   if (libbpf_get_error(obj))
return TEST_FAIL;
bpf_object__close(obj);
return TEST_OK;
-- 
2.9.3

Re: [PATCH net v2 3/4] r8152: re-schedule napi for tx

2017-01-25 Thread Eric Dumazet

On Wed, 2017-01-25 at 16:13 +0800, Hayes Wang wrote:
> Re-schedule napi after napi_complete() for tx, if it is necessary.
> 
> In r8152_poll(), if the tx is completed after tx_bottom() and before
> napi_complete(), the scheduling of napi would be lost. Then, no
> one handles the next tx until the next napi_schedule() is called.
> 
> Signed-off-by: Hayes Wang 
> ---
>  drivers/net/usb/r8152.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
> index ec882be..45d168e 100644
> --- a/drivers/net/usb/r8152.c
> +++ b/drivers/net/usb/r8152.c
> @@ -1936,6 +1936,9 @@ static int r8152_poll(struct napi_struct *napi, int budget)
>   napi_complete(napi);
>   if (!list_empty(&tp->rx_done))
>   napi_schedule(napi);
> + else if (!skb_queue_empty(&tp->tx_queue) &&
> +  !list_empty(&tp->tx_free))
> + napi_schedule(&tp->napi);

Why use &tp->napi here instead of napi, as is done 3 lines above?

[PATCH 16/23] tools lib bpf: Add set/is helpers for all prog types

2017-01-25 Thread Arnaldo Carvalho de Melo

From: Joe Stringer 

These bpf_prog_types were exposed in the uapi but there were no
corresponding functions to set these types for programs in libbpf.

Signed-off-by: Joe Stringer 
Acked-by: Wang Nan 
Cc: Alexei Starovoitov 
Cc: Daniel Borkmann 
Cc: netdev@vger.kernel.org
Link: http://lkml.kernel.org/r/20170123011128.26534-4-...@ovn.org
Signed-off-by: Arnaldo Carvalho de Melo 
---
 tools/lib/bpf/libbpf.c |  5 +
 tools/lib/bpf/libbpf.h | 10 ++
 2 files changed, 15 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 371cb40a2304..406838fa9c4f 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -1448,8 +1448,13 @@ bool bpf_program__is_##NAME(struct bpf_program *prog) \
return bpf_program__is_type(prog, TYPE);\
 }  \
 
+BPF_PROG_TYPE_FNS(socket_filter, BPF_PROG_TYPE_SOCKET_FILTER);
 BPF_PROG_TYPE_FNS(kprobe, BPF_PROG_TYPE_KPROBE);
+BPF_PROG_TYPE_FNS(sched_cls, BPF_PROG_TYPE_SCHED_CLS);
+BPF_PROG_TYPE_FNS(sched_act, BPF_PROG_TYPE_SCHED_ACT);
 BPF_PROG_TYPE_FNS(tracepoint, BPF_PROG_TYPE_TRACEPOINT);
+BPF_PROG_TYPE_FNS(xdp, BPF_PROG_TYPE_XDP);
+BPF_PROG_TYPE_FNS(perf_event, BPF_PROG_TYPE_PERF_EVENT);
 
 int bpf_map__fd(struct bpf_map *map)
 {
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index a5a8b86a06fe..2188ccdc0e2d 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -174,11 +174,21 @@ int bpf_program__nth_fd(struct bpf_program *prog, int n);
 /*
  * Adjust type of bpf program. Default is kprobe.
  */
+int bpf_program__set_socket_filter(struct bpf_program *prog);
 int bpf_program__set_tracepoint(struct bpf_program *prog);
 int bpf_program__set_kprobe(struct bpf_program *prog);
+int bpf_program__set_sched_cls(struct bpf_program *prog);
+int bpf_program__set_sched_act(struct bpf_program *prog);
+int bpf_program__set_xdp(struct bpf_program *prog);
+int bpf_program__set_perf_event(struct bpf_program *prog);
 
+bool bpf_program__is_socket_filter(struct bpf_program *prog);
 bool bpf_program__is_tracepoint(struct bpf_program *prog);
 bool bpf_program__is_kprobe(struct bpf_program *prog);
+bool bpf_program__is_sched_cls(struct bpf_program *prog);
+bool bpf_program__is_sched_act(struct bpf_program *prog);
+bool bpf_program__is_xdp(struct bpf_program *prog);
+bool bpf_program__is_perf_event(struct bpf_program *prog);
 
 /*
  * We don't need __attribute__((packed)) now since it is
-- 
2.9.3
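The `BPF_PROG_TYPE_FNS()` macro in the hunk above generates one setter and one predicate per program type from a single line. Below is a minimal userspace sketch of the same token-pasting pattern. It assumes a simplified `struct program` that stores only a type tag. The struct, the enum, and the `program__*` names are hypothetical stand-ins, not libbpf's actual definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for enum bpf_prog_type values. */
enum prog_type {
	PROG_TYPE_UNSPEC,
	PROG_TYPE_SOCKET_FILTER,
	PROG_TYPE_KPROBE,
	PROG_TYPE_SCHED_CLS,
	PROG_TYPE_XDP,
};

/* Simplified program object: only the type tag matters here. */
struct program {
	enum prog_type type;
};

static int program__set_type(struct program *prog, enum prog_type type)
{
	if (!prog)
		return -1;
	prog->type = type;
	return 0;
}

static bool program__is_type(const struct program *prog, enum prog_type type)
{
	return prog ? prog->type == type : false;
}

/* One invocation emits program__set_<NAME>() and program__is_<NAME>(). */
#define PROG_TYPE_FNS(NAME, TYPE)					\
int program__set_##NAME(struct program *prog)				\
{									\
	return program__set_type(prog, TYPE);				\
}									\
									\
bool program__is_##NAME(const struct program *prog)			\
{									\
	return program__is_type(prog, TYPE);				\
}

PROG_TYPE_FNS(socket_filter, PROG_TYPE_SOCKET_FILTER)
PROG_TYPE_FNS(kprobe, PROG_TYPE_KPROBE)
PROG_TYPE_FNS(sched_cls, PROG_TYPE_SCHED_CLS)
PROG_TYPE_FNS(xdp, PROG_TYPE_XDP)
```

With this pattern, adding a program type takes one more `PROG_TYPE_FNS()` line plus the matching prototypes in the header. The patch does the same for socket_filter, sched_cls, sched_act, xdp and perf_event.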

[PATCH net] net: dsa: Mop up remaining NET_DSA_HWMON references

2017-01-25 Thread Andrew Lunn

Previous patches have moved the temperature sensor code into the
Marvell PHYs. A few now dead references to NET_DSA_HWMON were left
behind. Go reap them.

Reported-by: Valentin Rothberg 
Signed-off-by: Andrew Lunn 
---
 Documentation/networking/dsa/dsa.txt | 24 
 include/net/dsa.h|  8 
 2 files changed, 32 deletions(-)

diff --git a/Documentation/networking/dsa/dsa.txt b/Documentation/networking/dsa/dsa.txt
index 63912ef34606..b8b40753133e 100644
--- a/Documentation/networking/dsa/dsa.txt
+++ b/Documentation/networking/dsa/dsa.txt
@@ -295,7 +295,6 @@ DSA currently leverages the following subsystems:
 - MDIO/PHY library: drivers/net/phy/phy.c, mdio_bus.c
 - Switchdev: net/switchdev/*
 - Device Tree for various of_* functions
-- HWMON: drivers/hwmon/*
 
 MDIO/PHY library
 
@@ -349,12 +348,6 @@ Documentation/devicetree/bindings/net/dsa/dsa.txt. PHY/MDIO library helper
 functions such as of_get_phy_mode(), of_phy_connect() are also used to query
 per-port PHY specific details: interface connection, MDIO bus location etc..
 
-HWMON
--
-
-Some switch drivers feature internal temperature sensors which are exposed as
-regular HWMON devices in /sys/class/hwmon/.
-
 Driver development
 ==
 
@@ -495,23 +488,6 @@ Power management
   BR_STATE_DISABLED and propagating changes to the hardware if this port is
   disabled while being a bridge member
 
-Hardware monitoring

-
-These callbacks are only available if CONFIG_NET_DSA_HWMON is enabled:
-
-- get_temp: this function queries the given switch for its temperature
-
-- get_temp_limit: this function returns the switch current maximum temperature
-  limit
-
-- set_temp_limit: this function configures the maximum temperature limit allowed
-
-- get_temp_alarm: this function returns the critical temperature threshold
-  returning an alarm notification
-
-See Documentation/hwmon/sysfs-interface for details.
-
 Bridge layer
 
 
diff --git a/include/net/dsa.h b/include/net/dsa.h
index 9d6cd923c48c..08b340403927 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -178,14 +178,6 @@ struct dsa_switch {
 */
s8  rtable[DSA_MAX_SWITCHES];
 
-#ifdef CONFIG_NET_DSA_HWMON
-   /*
-* Hardware monitoring information
-*/
-   char            hwmon_name[IFNAMSIZ + 8];
-   struct device   *hwmon_dev;
-#endif
-
/*
 * The lower device this switch uses to talk to the host
 */
-- 
2.11.0

Re: TCP stops sending packets over loopback on 4.10-rc3?

2017-01-25 Thread Josef Bacik

On Tue, Jan 24, 2017 at 9:07 AM, Eric Dumazet wrote:
> On Tue, 2017-01-24 at 06:20 -0500, Josef Bacik wrote:
> > Hello,
> >
> > I've been trying to test some NBD changes I had made recently and I
> > started having packet timeouts.  I traced this down to tcp just
> > stopping sending packets after a lot of writing.  All NBD does is call
> > kernel_sendmsg() with a request struct and some pages when it does
> > writes.  I did a bunch of tracing and I've narrowed it down to running
> > out of sk_wmem_queued space.  In tcp_sendmsg() here
> >
> > new_segment:
> > 	/* Allocate new segment. If the interface is SG,
> > 	 * allocate skb fitting to single page.
> > 	 */
> > 	if (!sk_stream_memory_free(sk))
> > 		goto wait_for_sndbuf;
> >
> > we hit this pretty regularly, and eventually just get stuck in
> > sk_stream_wait_memory until the timeout ends and we error out
> > everything.  Now sk_stream_memory_free checks the sk_wmem_queued and
> > calls into the sk_prot->stream_memory_free(), so I broke this out like
> > the following
> >
> > if (sk->sk_wmem_queued >= sk->sk_sndbuf) {
> > 	trace_printk("sk_wmem_queued %d, sk_sndbuf %d\n",
> > 		     sk->sk_wmem_queued, sk->sk_sndbuf);
> > 	goto wait_for_sndbuf;
> > }
> > if (sk->sk_prot->stream_memory_free &&
> >     !sk->sk_prot->stream_memory_free(sk)) {
> > 	trace_printk("sk_stream_memory_free\n");
> > 	goto wait_for_sndbuf;
> > }
> >
> > And I got this in my tracing
> >
> > kworker/u16:5-112   [001]   1375.637564: tcp_sendmsg: sk_wmem_queued 4204872, sk_sndbuf 4194304
> > kworker/u16:5-112   [001]   1375.639657: tcp_sendmsg: sk_wmem_queued 4204872, sk_sndbuf 4194304
> > kworker/u16:5-112   [003]   1375.641128: tcp_sendmsg: sk_wmem_queued 4204872, sk_sndbuf 4194304
> > kworker/u16:5-112   [003]   1375.643441: tcp_sendmsg: sk_wmem_queued 4204872, sk_sndbuf 4194304
> > kworker/u16:5-112   [001]   1375.807614: tcp_sendmsg: sk_wmem_queued 4204872, sk_sndbuf 4194304
> > kworker/u16:5-112   [001]   1377.538744: tcp_sendmsg: sk_wmem_queued 4204872, sk_sndbuf 4194304
> > kworker/u16:5-112   [001]   1377.543418: tcp_sendmsg: sk_wmem_queued 4204872, sk_sndbuf 4194304
> > kworker/2:4H-1535   [002]   1377.544685: tcp_sendmsg: sk_wmem_queued 4204872, sk_sndbuf 4194304
> > kworker/u16:5-112   [000]   1379.378352: tcp_sendmsg: sk_wmem_queued 4205796, sk_sndbuf 4194304
> > kworker/u16:5-112   [003]   1380.985721: tcp_sendmsg: sk_wmem_queued 4212416, sk_sndbuf 4194304
> >
> > This is as far as I've gotten and I'll keep digging into it, but I was
> > wondering if this looks familiar to anybody?  Also one thing I've
> > noticed is sk_stream_wait_memory() will wait on sk_sleep(sk), but
> > basically nothing wakes this up.  For example it seems the main way we
> > reduce sk_wmem_queued is through sk_wmem_free_skb(), which doesn't
> > appear to wake anything up in any of its callers, so anybody who does
> > end up sleeping will basically never wake up.  That seems like it
> > should be more broken than it is, so I'm curious to know how things are
> > actually woken up in this case.  Thanks,
>
> git grep -n SOCK_QUEUE_SHRUNK
>
> -> tcp_check_space()


But tcp_check_space() doesn't actually reduce sk_wmem_queued from what 
I can see.  The only places that appear to reduce it are tcp_trim_head, 
which is only called in the retransmit path, and sk_wmem_free_skb, 
which seems to be right, but I added a trace_printk() in it to see if 
it was firing during my test and it never fires.  So we _appear_ to 
only ever be incrementing this counter, but never decrementing it.  I'm 
doing a bunch of tracing trying to figure out what is going on here but 
so far nothing is popping which is starting to make me think ftrace is 
broken.  Thanks,


Josef

Re: NAPI on USB network drivers

2017-01-25 Thread Eric Dumazet

On Wed, 2017-01-25 at 09:39 +, Hayes Wang wrote:
> Oliver Neukum [mailto:oneu...@suse.com]
> > Sent: Wednesday, January 25, 2017 5:35 PM
> [...]
> > looking at r8152 I noticed that it uses NAPI. I never considered
> > this for the generic USB networking code as you cannot disable
> > interrupts for USB. Is it still worth it? What are the benefits?
> 
> You could use napi_gro_receive() and it influences the performance.

You also could use napi_complete_done() instead of napi_complete(), as
it allows users to tune the performance vs latency for GRO.

Looking at this driver, I do not see any limitation on the number of
skbs that can be pushed into tp->rx_queue.

I wonder if this queue can end up consuming all memory of a host under
stress.

diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index e1466b4d2b6c727148a884672bbd9593bf04b3ac..221df4a931b5c1073f1922d0fa0bbff158c73b7d 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -1840,7 +1840,10 @@ static int rx_bottom(struct r8152 *tp, int budget)
stats->rx_packets++;
stats->rx_bytes += pkt_len;
} else {
-   __skb_queue_tail(&tp->rx_queue, skb);
+   if (unlikely(skb_queue_len(&tp->rx_queue) >= 1000))
+   kfree_skb(skb);
+   else
+   __skb_queue_tail(&tp->rx_queue, skb);
}
 
 find_next_rx:
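The change above caps the backlog at 1000 skbs. Once the cap is reached, it drops the newest packet (tail drop). Below is a minimal userspace sketch of that policy. It assumes a hypothetical fixed-capacity ring (`struct rx_backlog`, `RX_BACKLOG_MAX`) in place of `struct sk_buff_head`, and an integer packet id in place of an skb.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RX_BACKLOG_MAX 1000 /* same cap as the proposed change */

/* Hypothetical bounded backlog standing in for tp->rx_queue. */
struct rx_backlog {
	int pkts[RX_BACKLOG_MAX];
	size_t head;         /* index of the oldest queued packet */
	size_t len;          /* number of queued packets */
	unsigned long drops; /* packets dropped because the queue was full */
};

/* Enqueue, or tail-drop when full (kfree_skb() in the driver). */
static bool rx_backlog_push(struct rx_backlog *q, int pkt)
{
	if (q->len >= RX_BACKLOG_MAX) {
		q->drops++;
		return false;
	}
	q->pkts[(q->head + q->len) % RX_BACKLOG_MAX] = pkt;
	q->len++;
	return true;
}

/* Dequeue the oldest packet; returns false when the backlog is empty. */
static bool rx_backlog_pop(struct rx_backlog *q, int *pkt)
{
	if (q->len == 0)
		return false;
	*pkt = q->pkts[q->head];
	q->head = (q->head + 1) % RX_BACKLOG_MAX;
	q->len--;
	return true;
}
```

Tail drop keeps the packets already queued in order. The bound turns unbounded memory growth under stress into counted drops.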

Re: [PATCH v2] virtio_net: fix PAGE_SIZE > 64k

2017-01-25 Thread Michael S. Tsirkin

On Tue, Jan 24, 2017 at 08:07:40PM -0800, Alexei Starovoitov wrote:
> On Tue, Jan 24, 2017 at 7:48 PM, John Fastabend
>  wrote:
> >
> > It is a concern on my side. I want XDP and Linux stack to work
> > reasonably well together.
> 
> btw the micro benchmarks showed that page per packet approach
> that xdp took in mlx4 should be 10% slower vs normal operation
> for tcp/ip stack.

Interesting. TCP only or UDP too? What's the packet size? Are you tuning
your rmem limits at all?  The slowdown would be more noticeable with
UDP with default values and small packet sizes.

> We thought that for our LB use case
> it will be an acceptable slowdown, but turned out that overall we
> got a performance boost, since xdp model simplified user space
> and got data path faster, so we magically got extra free cpu
> that is used for other apps on the same host and overall
> perf win despite extra overhead in tcp/ip.
> Not all use cases are the same and not everyone will be as lucky,
> so I'd like to see performance of xdp_pass improving too, though
> it turned out to be not as high priority as I initially estimated.

Re: TCP stops sending packets over loopback on 4.10-rc3?

2017-01-25 Thread Eric Dumazet

On Wed, 2017-01-25 at 09:14 -0500, Josef Bacik wrote:
> On Tue, Jan 24, 2017 at 9:07 AM, Eric Dumazet  

> > 
> > git grep -n SOCK_QUEUE_SHRUNK
> > 
> > -> tcp_check_space()
> 
> But tcp_check_space() doesn't actually reduce sk_wmem_queued from what 
> I can see.  The only places that appear to reduce it are tcp_trim_head, 
> which is only called in the retransmit path, and sk_wmem_free_skb, 
> which seems to be right,

This is exactly how it works.

We free a bunch of skbs (an ACK can acknowledge dozens of them), and set
the SOCK_QUEUE_SHRUNK.

Then later, tcp_check_space() is called once and checks whether the bit was
set by a prior call to tcp_trim_head() or full skb freeing.
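The mechanism described above is a deferred wakeup. Freeing acknowledged skbs only lowers the queued-bytes counter and sets a flag, and a single later check decides whether a blocked writer should be woken. Below is a minimal single-threaded userspace model of that flow. The fields `wmem_queued`, `sndbuf`, `queue_shrunk` and `writer_wakeups` are hypothetical stand-ins for `sk_wmem_queued`, `sk_sndbuf`, the SOCK_QUEUE_SHRUNK bit and the real wakeup. The actual tcp_check_space() logic has more conditions than this.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified send-side socket state. */
struct model_sock {
	int wmem_queued;    /* bytes queued for transmit (cf. sk_wmem_queued) */
	int sndbuf;         /* send buffer limit (cf. sk_sndbuf) */
	bool queue_shrunk;  /* cf. the SOCK_QUEUE_SHRUNK flag */
	int writer_wakeups; /* wakeups delivered to a blocked writer */
};

/* Mirrors the first test in sk_stream_memory_free(). */
static bool model_memory_free(const struct model_sock *sk)
{
	return sk->wmem_queued < sk->sndbuf;
}

/* Sender side: account for a newly queued skb. */
static void model_queue_skb(struct model_sock *sk, int truesize)
{
	sk->wmem_queued += truesize;
}

/* ACK processing: free one acked skb; this only marks the queue. */
static void model_free_acked_skb(struct model_sock *sk, int truesize)
{
	assert(truesize <= sk->wmem_queued);
	sk->wmem_queued -= truesize;
	sk->queue_shrunk = true;
}

/* Called once after a batch of frees: consume the flag, wake if room. */
static void model_check_space(struct model_sock *sk)
{
	if (!sk->queue_shrunk)
		return;
	sk->queue_shrunk = false;
	if (model_memory_free(sk))
		sk->writer_wakeups++;
}
```

In this model, one ACK that frees two skbs yields a single wakeup, not two. If ACKs stop arriving, the flag is never set, so a writer blocked on the full queue would never be woken.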

>  but I added a trace_printk() in it to see if 
> it was firing during my test and it never fires.  So we _appear_ to 
> only ever be incrementing this counter, but never decrementing it.  I'm 
> doing a bunch of tracing trying to figure out what is going on here but 
> so far nothing is popping which is starting to make me think ftrace is 
> broken.  Thanks,
> 

Just to make sure, are you telling us native/standard TCP is broken
over loopback, or is that only when using an additional kernel module?

Re: TCP stops sending packets over loopback on 4.10-rc3?

2017-01-25 Thread Josef Bacik


On Wed, Jan 25, 2017 at 9:14 AM, Josef Bacik wrote:
> On Tue, Jan 24, 2017 at 9:07 AM, Eric Dumazet wrote:
> > On Tue, 2017-01-24 at 06:20 -0500, Josef Bacik wrote:
> > > Hello,
> > >
> > > I've been trying to test some NBD changes I had made recently and I
> > > started having packet timeouts.  I traced this down to tcp just
> > > stopping sending packets after a lot of writing.  All NBD does is call
> > > kernel_sendmsg() with a request struct and some pages when it does
> > > writes.  I did a bunch of tracing and I've narrowed it down to running
> > > out of sk_wmem_queued space.  In tcp_sendmsg() here
> > >
> > > new_segment:
> > > 	/* Allocate new segment. If the interface is SG,
> > > 	 * allocate skb fitting to single page.
> > > 	 */
> > > 	if (!sk_stream_memory_free(sk))
> > > 		goto wait_for_sndbuf;
> > >
> > > we hit this pretty regularly, and eventually just get stuck in
> > > sk_stream_wait_memory until the timeout ends and we error out
> > > everything.  Now sk_stream_memory_free checks the sk_wmem_queued and
> > > calls into the sk_prot->stream_memory_free(), so I broke this out like
> > > the following
> > >
> > > if (sk->sk_wmem_queued >= sk->sk_sndbuf) {
> > > 	trace_printk("sk_wmem_queued %d, sk_sndbuf %d\n",
> > > 		     sk->sk_wmem_queued, sk->sk_sndbuf);
> > > 	goto wait_for_sndbuf;
> > > }
> > > if (sk->sk_prot->stream_memory_free &&
> > >     !sk->sk_prot->stream_memory_free(sk)) {
> > > 	trace_printk("sk_stream_memory_free\n");
> > > 	goto wait_for_sndbuf;
> > > }
> > >
> > > And I got this in my tracing
> > >
> > > kworker/u16:5-112   [001]   1375.637564: tcp_sendmsg: sk_wmem_queued 4204872, sk_sndbuf 4194304
> > > kworker/u16:5-112   [001]   1375.639657: tcp_sendmsg: sk_wmem_queued 4204872, sk_sndbuf 4194304
> > > kworker/u16:5-112   [003]   1375.641128: tcp_sendmsg: sk_wmem_queued 4204872, sk_sndbuf 4194304
> > > kworker/u16:5-112   [003]   1375.643441: tcp_sendmsg: sk_wmem_queued 4204872, sk_sndbuf 4194304
> > > kworker/u16:5-112   [001]   1375.807614: tcp_sendmsg: sk_wmem_queued 4204872, sk_sndbuf 4194304
> > > kworker/u16:5-112   [001]   1377.538744: tcp_sendmsg: sk_wmem_queued 4204872, sk_sndbuf 4194304
> > > kworker/u16:5-112   [001]   1377.543418: tcp_sendmsg: sk_wmem_queued 4204872, sk_sndbuf 4194304
> > > kworker/2:4H-1535   [002]   1377.544685: tcp_sendmsg: sk_wmem_queued 4204872, sk_sndbuf 4194304
> > > kworker/u16:5-112   [000]   1379.378352: tcp_sendmsg: sk_wmem_queued 4205796, sk_sndbuf 4194304
> > > kworker/u16:5-112   [003]   1380.985721: tcp_sendmsg: sk_wmem_queued 4212416, sk_sndbuf 4194304
> > >
> > > This is as far as I've gotten and I'll keep digging into it, but I was
> > > wondering if this looks familiar to anybody?  Also one thing I've
> > > noticed is sk_stream_wait_memory() will wait on sk_sleep(sk), but
> > > basically nothing wakes this up.  For example it seems the main way we
> > > reduce sk_wmem_queued is through sk_wmem_free_skb(), which doesn't
> > > appear to wake anything up in any of its callers, so anybody who does
> > > end up sleeping will basically never wake up.  That seems like it
> > > should be more broken than it is, so I'm curious to know how things are
> > > actually woken up in this case.  Thanks,
> >
> > git grep -n SOCK_QUEUE_SHRUNK
> >
> > -> tcp_check_space()
>
> But tcp_check_space() doesn't actually reduce sk_wmem_queued from
> what I can see.  The only places that appear to reduce it are
> tcp_trim_head, which is only called in the retransmit path, and
> sk_wmem_free_skb, which seems to be right, but I added a
> trace_printk() in it to see if it was firing during my test and it
> never fires.  So we _appear_ to only ever be incrementing this
> counter, but never decrementing it.  I'm doing a bunch of tracing
> trying to figure out what is going on here but so far nothing is
> popping which is starting to make me think ftrace is broken.  Thanks,


Nope ftrace isn't broken, I'm just dumb, the space is being reclaimed 
by sk_wmem_free_skb().  So I guess I need to figure out why I stop 
getting ACK's from the other side of the loopback.  Thanks,


Josef

Re: TCP stops sending packets over loopback on 4.10-rc3?

2017-01-25 Thread Eric Dumazet

On Wed, 2017-01-25 at 09:26 -0500, Josef Bacik wrote:

> Nope ftrace isn't broken, I'm just dumb, the space is being reclaimed 
> by sk_wmem_free_skb().  So I guess I need to figure out why I stop 
> getting ACK's from the other side of the loopback.  Thanks,

ss -temoi dst 127.0.0.1

Might give you some hints, like packets being dropped.

ACK can be delayed if the reader is slow to consume bytes.

Re: [net PATCH v5 1/6] virtio_net: use dev_kfree_skb for small buffer XDP receive

2017-01-25 Thread Michael S. Tsirkin

On Tue, Jan 24, 2017 at 08:02:29PM -0800, John Fastabend wrote:
> On 17-01-24 07:23 PM, Michael S. Tsirkin wrote:
> > On Wed, Jan 25, 2017 at 10:57:12AM +0800, Jason Wang wrote:
> >>
> >>
> >> On 2017-01-25 04:08, Michael S. Tsirkin wrote:
> >>> On Tue, Jan 24, 2017 at 02:43:28PM -0500, David Miller wrote:
>  From: "Michael S. Tsirkin" 
>  Date: Mon, 23 Jan 2017 23:08:35 +0200
> 
> > On Tue, Jan 17, 2017 at 02:19:50PM -0800, John Fastabend wrote:
> >> In the small buffer case during driver unload we currently use
> >> put_page instead of dev_kfree_skb. Resolve this by adding a check
> >> for virtnet mode when checking XDP queue type. Also name the
> >> function so that the code reads correctly to match the additional
> >> check.
> >>
> >> Fixes: bb91accf2733 ("virtio-net: XDP support for small buffers")
> >> Signed-off-by: John Fastabend 
> >> Acked-by: Jason Wang 
> > Acked-by: Michael S. Tsirkin 
> >
> > I think we definitely want this one in -net as it's
> > a bugfix.
>  This whole series is a bug fix, we must have adjust_header XDP
>  support in the virtio_net driver before v4.10 goes out, it is
>  a required base feature for XDP.
> >>> I have to say device resets outside probe have a huge potential
> >>> to uncover hypervisor bugs.
> >>
> >> Maybe not if it reuses most of current codes? Since we've already used them
> >> in sleep or hibernation?
> >>
> >> Thanks
> > 
> > Except almost no one uses sleep or hibernate with VMs.  I'm not saying
> > it's a bad idea, just that it needs a lot of testing before release and
> > we won't get enough if we merge at this point.
> > 
> 
> Then it would seem like a good thing to have another user of these paths and
> find the bugs versus letting them sit there for some poor folks who do use
> sleep/hibernate.

Absolutely. But -rc6 is not the time to test the waters, IMO.

> >>>   I am rather uncomfortable
> >>> doing that after -rc1.
> >>>
> >>> How about a module option to disable it by default?
> >>> We can then ship a partial implementation in 4.10
> >>> and work on completing it in 4.11.
> >>>
> 
> Ugh I would prefer to avoid module options. This will only happen if users
> push XDP program into driver anyways.

Again I agree, it's an idea for a stopgap measure so we can have
something in 4.10 - and also assuming that 256b headroom is a must.

> > 
> > To clarify, I'm thinking an option similar to enable_xdp,
> > and have all packets have a 256 byte headroom for 4.10.
> 
> An option where? On the QEMU side, in the driver? Is the reset really that bad? Coming
> from the hardware driver side, lots of configuration changes can cause resets. I
> agree it's not overly elegant, but could follow-on patches be used to make it
> prettier, if possible?

Again, I agree, and it's not that bad; it's just not something we should
do past rc5.

> I know folks prefer to avoid tuning knobs but I think exposing the headroom
> configuration to users might not be a bad idea. After all these same users are
> already programming maps and ebpf codes. A simple tuning knob should not be a
> big deal and reasonable defaults would of course be used. That is a net-next
> debate though.

No arguments from my side here.

> > 
> > Consider our options for 4.11.
> > 
> 
> Finally just to point out here are the drivers with XDP support on latest
> net tree,
> 
>   mlx/mlx5
>   mlx/mlx4
>   qlogic/qede
>   netronome/nfp
>   virtio_net
> 
> And here is the list of adjust header support,
> 
>   mlx/mlx4

The above seems to imply that an interface for userspace to detect the amount
of head space would be beneficial.

> 
> So we currently have the same feature gap on all the other drivers except one.
> Although I do not think that is a very good excuse. Let's figure out what we
> should do about virtio.
> 
> Thanks,
> John

If we can simply defer adjust_head patches to 4.11 then that's fine.

-- 
MST

Re: [net PATCH v5 1/6] virtio_net: use dev_kfree_skb for small buffer XDP receive

2017-01-25 Thread Michael S. Tsirkin

On Wed, Jan 25, 2017 at 01:46:46PM +0800, Jason Wang wrote:
> > Then it would seem like a good thing to have another user of these paths and
> > find the bugs versus letting them sit there for some poor folks who do use
> > sleep/hibernate.
> > 
> 
> Yes, and uncovering hypervisor bugs now is better than uncovering it in the
> future.
> 
> Thanks

Not really, all the uncovering should happen in -next or early rc.
Right now we need to fix what has been uncovered so far.

-- 
MST

Re: [net PATCH v5 1/6] virtio_net: use dev_kfree_skb for small buffer XDP receive

2017-01-25 Thread Michael S. Tsirkin

On Tue, Jan 24, 2017 at 11:33:56PM -0800, Alexei Starovoitov wrote:
> On Tue, Jan 24, 2017 at 8:02 PM, John Fastabend
>  wrote:
> >
> > Finally just to point out here are the drivers with XDP support on latest
> > net tree,
> >
> > mlx/mlx5
> > mlx/mlx4
> > qlogic/qede
> > netronome/nfp
> > virtio_net
> >
> > And here is the list of adjust header support,
> >
> > mlx/mlx4
> >
> 
> in net-next it's actually:
> yes: mlx4, mlx5
> no: qede, nfp, virtio
> while nfp and virtio are working on it.
> 
> xdp_adjust_head() is must have for load balancer,

What amount of head space does it need? 70 bytes
to do vxlan kind of thing?
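For reference, the outer headers that VXLAN encapsulation places in front of the inner Ethernet frame add up as shown below. The sketch uses standard header sizes and assumes no outer VLAN tag, IP options or IPv6 extension headers. The 70-byte figure matches an IPv6 outer header; an IPv4 outer header needs 50 bytes. The constant and function names are illustrative only.

```c
#include <stddef.h>

/* Fixed header sizes in bytes (no VLAN tags, IP options or extension headers). */
enum {
	ETH_HDR_LEN   = 14,
	IPV4_HDR_LEN  = 20,
	IPV6_HDR_LEN  = 40,
	UDP_HDR_LEN   = 8,
	VXLAN_HDR_LEN = 8,
};

/* Headroom an XDP program would need to prepend a VXLAN outer header stack. */
static size_t vxlan_encap_headroom(int outer_is_ipv6)
{
	size_t ip = outer_is_ipv6 ? IPV6_HDR_LEN : IPV4_HDR_LEN;

	return ETH_HDR_LEN + ip + UDP_HDR_LEN + VXLAN_HDR_LEN;
}
```

Both figures fit within the 256-byte headroom mentioned earlier in this thread.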

> so the sooner it lands for virtio the easier it will be
> to develop xdp programs. Initially I expected
> e1k+xdp to be the base line for debugging and
> development of xdp programs, but since not everyone
> agreed on e1k the virtio+xdp filled in the gap.
> So without adjust_head in virtio I see very little use for it
> in our environment.
> It is a must have feature regardless of timing.
> I will backport whatever is necessary, but distros
> will stick with official releases and imo it's not great
> from xdp adoption point of view to have
> virtio driver lacking key features.

If everyone can agree it's net-next material then I'm happy.

-- 
MST

RE: [Xen-devel] xennet_start_xmit assumptions

2017-01-25 Thread Paul Durrant

> -Original Message-
> From: Sowmini Varadhan [mailto:sowmini.varad...@oracle.com]
> Sent: 19 January 2017 11:14
> To: Paul Durrant 
> Cc: Konrad Rzeszutek Wilk ; Wei Liu
> ; netdev@vger.kernel.org; xen-
> de...@lists.xenproject.org
> Subject: Re: [Xen-devel] xennet_start_xmit assumptions
> 
> On (01/19/17 09:36), Paul Durrant wrote:
> >
> > Hi Sowmini,
> >
> >   Sounds like a straightforward bug to me... netfront should be able
> > to handle an empty skb and clearly, if it's relying on skb_headlen()
> > being non-zero, that's not the case.
> >
> >   Paul
> 
> I see. Seems like there are 2 things broken here: recovering
> from skb->len = 0, and recovering from  the more complex
> case of (skb->len > 0 && skb_headlen(skb) == 0)
> 
> Do you folks want to take a shot at fixing this,
> since you know the code better? If you are interested,
> I can share my test program to help you reproduce the
> simpler skb->len == 0 case, but it's the fully non-linear
> skbs that may be more interesting to reproduce/fix.
> 
> I'll probably work on fixing packet_snd to return -EINVAL
> or similar when the len is zero this week.
> 

Sowmini,

  I knocked together the following patch, which seems to work for me:

---8<---
diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 40f26b6..a957c89 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -567,6 +567,10 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
u16 queue_index;
struct sk_buff *nskb;

+   /* Drop packets that are not at least ETH_HLEN in length */
+   if (skb->len < ETH_HLEN)
+   goto drop;
+
/* Drop the packet if no queues are set up */
if (num_queues < 1)
goto drop;
@@ -609,6 +613,8 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
}

len = skb_headlen(skb);
+   if ((len < ETH_HLEN) && !__pskb_pull_tail(skb, ETH_HLEN))
+   goto drop;

spin_lock_irqsave(&queue->tx_lock, flags);

---8<---

  Making netfront cope with a fully non-linear skb looks like it would be quite 
intrusive and probably not worth it so I opted for just doing the ETH_HLEN 
pull-tail if necessary. Can you check it works for you?
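The two checks in the patch sort packets by total length (`skb->len`) and by linear length (`skb_headlen()`). Below is a minimal userspace model of that decision. It assumes a hypothetical `struct pkt` holding just those two lengths, and treats the pull as moving header bytes from the paged area into the linear area. The real `__pskb_pull_tail()` can also fail on allocation, which this model ignores.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_ETH_HLEN 14 /* same value as ETH_HLEN */

/* Hypothetical packet: total length and bytes in the linear (head) area. */
struct pkt {
	size_t len;     /* cf. skb->len */
	size_t headlen; /* cf. skb_headlen(skb) */
};

/* Returns false if the packet would be dropped before transmit. */
static bool xmit_prepare(struct pkt *p)
{
	/* Too short to hold an Ethernet header at all: drop. */
	if (p->len < MODEL_ETH_HLEN)
		return false;

	/* Header not (fully) in the linear area: pull it in. */
	if (p->headlen < MODEL_ETH_HLEN)
		p->headlen = MODEL_ETH_HLEN;

	return true;
}
```

The first check makes the pull always satisfiable in this model, because at least ETH_HLEN bytes are known to exist somewhere in the packet.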

  Paul

Re: [PATCH net] sctp: sctp_addr_id2transport should verify the addr before looking up assoc

2017-01-25 Thread Neil Horman

On Tue, Jan 24, 2017 at 02:01:53PM +0800, Xin Long wrote:
> sctp_addr_id2transport is a function for sockopt to look up assoc by
> address. As the address is from userspace, it can be a v4-mapped v6
> address. But in sctp protocol stack, it always handles a v4-mapped
> v6 address as a v4 address. So it's necessary to convert it to a v4
> address before looking up assoc by address.
> 
> This patch is to fix it by calling sctp_verify_addr in which it can do
> this conversion before calling sctp_endpoint_lookup_assoc, just like
> what sctp_sendmsg and __sctp_connect do for the address from users.
> 
> Signed-off-by: Xin Long 
> ---
>  net/sctp/socket.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
> index 318c678..37eeab7 100644
> --- a/net/sctp/socket.c
> +++ b/net/sctp/socket.c
> @@ -235,8 +235,12 @@ static struct sctp_transport *sctp_addr_id2transport(struct sock *sk,
> sctp_assoc_t id)
>  {
>   struct sctp_association *addr_asoc = NULL, *id_asoc = NULL;
> - struct sctp_transport *transport;
> + struct sctp_af *af = sctp_get_af_specific(addr->ss_family);
>   union sctp_addr *laddr = (union sctp_addr *)addr;
> + struct sctp_transport *transport;
> +
> + if (sctp_verify_addr(sk, laddr, af->sockaddr_len))
> + return NULL;
>  
>   addr_asoc = sctp_endpoint_lookup_assoc(sctp_sk(sk)->ep,
>  laddr,
> -- 
> 2.1.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
Acked-by: Neil Horman

Re: [PATCH net] sctp: sctp_addr_id2transport should verify the addr before looking up assoc

2017-01-25 Thread Xin Long

On Wed, Jan 25, 2017 at 11:27 PM, Vladislav Yasevich
 wrote:
> On Tue, Jan 24, 2017 at 1:01 AM, Xin Long  wrote:
>>
>> sctp_addr_id2transport is a function for sockopt to look up assoc by
>> address. As the address is from userspace, it can be a v4-mapped v6
>> address. But in sctp protocol stack, it always handles a v4-mapped
>> v6 address as a v4 address. So it's necessary to convert it to a v4
>> address before looking up assoc by address.
>>
>> This patch is to fix it by calling sctp_verify_addr in which it can do
>> this conversion before calling sctp_endpoint_lookup_assoc, just like
>> what sctp_sendmsg and __sctp_connect do for the address from users.
>>
>> Signed-off-by: Xin Long 
>> ---
>>  net/sctp/socket.c | 6 +-
>>  1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
>> index 318c678..37eeab7 100644
>> --- a/net/sctp/socket.c
>> +++ b/net/sctp/socket.c
>> @@ -235,8 +235,12 @@ static struct sctp_transport *sctp_addr_id2transport(struct sock *sk,
>>   sctp_assoc_t id)
>>  {
>> struct sctp_association *addr_asoc = NULL, *id_asoc = NULL;
>> -   struct sctp_transport *transport;
>> +   struct sctp_af *af = sctp_get_af_specific(addr->ss_family);
>> union sctp_addr *laddr = (union sctp_addr *)addr;
>> +   struct sctp_transport *transport;
>> +
>> +   if (sctp_verify_addr(sk, laddr, af->sockaddr_len))
>> +   return NULL;
>>
>
> This causes a side-effect such that GET options will end up with ipv4
> address instead of a v4mapped address that was passed in.
not really

(more below)
>
> -vlad
>
>>
>> addr_asoc = sctp_endpoint_lookup_assoc(sctp_sk(sk)->ep,
>>laddr,
sctp_get_pf_specific(sk->sk_family)->addr_to_user(sctp_sk(sk),
(union sctp_addr *)addr);

here it will convert it back to v4mapped v6 address.
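The round trip described here can be illustrated in userspace with the standard socket address types. The v4-mapped form `::ffff:a.b.c.d` carries the IPv4 address in its last four bytes. This sketch models only the address conversion, not `sctp_verify_addr()` or the `addr_to_user()` callback themselves. The helper names are hypothetical.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>

/* If sin6 holds ::ffff:a.b.c.d, fill sin with a.b.c.d and return true. */
static bool v4mapped_to_v4(const struct sockaddr_in6 *sin6,
			   struct sockaddr_in *sin)
{
	if (sin6->sin6_family != AF_INET6 ||
	    !IN6_IS_ADDR_V4MAPPED(&sin6->sin6_addr))
		return false;

	memset(sin, 0, sizeof(*sin));
	sin->sin_family = AF_INET;
	sin->sin_port = sin6->sin6_port;
	/* The IPv4 address is the last 4 of the 16 IPv6 address bytes. */
	memcpy(&sin->sin_addr, &sin6->sin6_addr.s6_addr[12], 4);
	return true;
}

/* Reverse direction, as when handing the address back to userspace. */
static void v4_to_v4mapped(const struct sockaddr_in *sin,
			   struct sockaddr_in6 *sin6)
{
	memset(sin6, 0, sizeof(*sin6));
	sin6->sin6_family = AF_INET6;
	sin6->sin6_port = sin->sin_port;
	sin6->sin6_addr.s6_addr[10] = 0xff;
	sin6->sin6_addr.s6_addr[11] = 0xff;
	memcpy(&sin6->sin6_addr.s6_addr[12], &sin->sin_addr, 4);
}
```

The lookup works on the IPv4 form, and the caller still gets back the v4-mapped form it passed in.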

>> --
>> 2.1.0
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>

Re: [Xen-devel] xennet_start_xmit assumptions

2017-01-25 Thread Sowmini Varadhan

On (01/25/17 15:06), Paul Durrant wrote:
> 
>   Making netfront cope with a fully non-linear skb looks like it would
> be quite intrusive and probably not worth it so I opted for just doing
> the ETH_HLEN pull-tail if necessary. Can you check it works for you?

I tested it, and it works fine, but note DaveM's comments in
this thread: the DKI is that we *must* provide at least hard_header_len
bytes in the non-paged part of the skb. So it might not even be necessary
to handle the fully non-linear skb (though it's probably prudent to check
and bail for this, as your patch does).

I just posted an RFC patch for fixing the pf_packet layer,
in case other drivers like xen_netfront don't explicitly
check for this:
   http://patchwork.ozlabs.org/patch/719236/

Re: ip link SR-IOV VF MAC address disparity

2017-01-25 Thread Greg

On Wed, 2017-01-25 at 15:34 +0200, Leon Goldberg wrote:
> Hey,
> 
> Using ip link to retrieve the MAC addresses of some SR-IOV virtual
> functions, I'm receiving mixed results:
> 
> [root@nari05 sys]# ip link
> 1: lo:  mtu 65536 qdisc noqueue state UNKNOWN mode 
> DEFAULT
> link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
> 2: enp2s0f0:  mtu 1500 qdisc mq
> master ovirtmgmt state UP mode DEFAULT qlen 1000
> link/ether 78:e7:d1:e4:9b:64 brd ff:ff:ff:ff:ff:ff
> 3: enp2s0f1:  mtu 1500 qdisc mq master test1
> state DOWN mode DEFAULT qlen 1000
> link/ether 78:e7:d1:e4:9b:65 brd ff:ff:ff:ff:ff:ff
> vf 0 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
> vf 1 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
> vf 2 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
> vf 3 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
> vf 4 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
> 4: ovirtmgmt:  mtu 1500 qdisc noqueue
> state UP mode DEFAULT
> link/ether 78:e7:d1:e4:9b:64 brd ff:ff:ff:ff:ff:ff
> 5: test1:  mtu 1500 qdisc noqueue
> state DOWN mode DEFAULT
> link/ether 78:e7:d1:e4:9b:65 brd ff:ff:ff:ff:ff:ff
> 11: ;vdsmdummy;:  mtu 1500 qdisc noop state DOWN
> mode DEFAULT
> link/ether 5e:b4:ac:5c:b9:a1 brd ff:ff:ff:ff:ff:ff
> 37: enp2s16f1:  mtu 1500 qdisc noop state DOWN
> mode DEFAULT qlen 1000
> link/ether 00:00:00:00:00:02 brd ff:ff:ff:ff:ff:ff
> 38: enp2s16f3:  mtu 1500 qdisc noop state DOWN
> mode DEFAULT qlen 1000
> link/ether d6:ee:45:57:c0:39 brd ff:ff:ff:ff:ff:ff
> 39: enp2s16f5:  mtu 1500 qdisc noop state DOWN
> mode DEFAULT qlen 1000
> link/ether 4a:2c:25:42:97:4a brd ff:ff:ff:ff:ff:ff
> 40: enp2s16f7:  mtu 1500 qdisc noop state DOWN
> mode DEFAULT qlen 1000
> link/ether c2:fe:2f:5e:f5:e8 brd ff:ff:ff:ff:ff:ff
> 41: enp2s17f1:  mtu 1500 qdisc noop state DOWN
> mode DEFAULT qlen 1000
> link/ether e6:31:a9:59:5f:ad brd ff:ff:ff:ff:ff:ff
> 
> enp2s0f1 is the physical function; enp2s1f* are the interfaces to the
> virtual functions.
> 
> Essentially, I have 2 questions:
> 1) What is the difference between the entries under the physical
> function and the interfaces?
> 2) How should I retrieve the correct MAC addresses? I'm aware of
> /sys/...//net/address, but I am now not sure it is the correct
> source.

If you haven't used ip link commands to set the VF device MAC addresses,
then the VF devices will create their own as they register their net device
entries.  In that case, the "vf N MAC" entries listed under the PF show
all 00's because you haven't set them.

Best known methods call for using the ip link command to set the MAC
addresses for the VF devices rather than letting them set their own
temporary LAA type MAC addresses.
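Whether a MAC address is locally administered (LAA) is encoded in its first octet. Bit 1 set means locally administered, and bit 0 clear means unicast. Below is a small C check using hypothetical helper names, applied to addresses from the listing above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 1 of the first octet: 0 = universally administered, 1 = locally administered. */
static bool mac_is_local_admin(const uint8_t mac[6])
{
	return (mac[0] & 0x02) != 0;
}

/* Bit 0 of the first octet: 1 = group (multicast) address. */
static bool mac_is_unicast(const uint8_t mac[6])
{
	return (mac[0] & 0x01) == 0;
}
```

The PF address 78:e7:d1:e4:9b:65 tests as universally administered. A self-assigned VF address such as d6:ee:45:57:c0:39 tests as a locally administered unicast address.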

- Greg

> 
> Thanks,
> Leon

Re: [net PATCH v5 1/6] virtio_net: use dev_kfree_skb for small buffer XDP receive

2017-01-25 Thread John Fastabend

On 17-01-25 06:52 AM, Michael S. Tsirkin wrote:
> On Tue, Jan 24, 2017 at 11:33:56PM -0800, Alexei Starovoitov wrote:
>> On Tue, Jan 24, 2017 at 8:02 PM, John Fastabend
>>  wrote:
>>>
>>> Finally just to point out here are the drivers with XDP support on latest
>>> net tree,
>>>
>>> mlx/mlx5
>>> mlx/mlx4
>>> qlogic/qede
>>> netronome/nfp
>>> virtio_net
>>>
>>> And here is the list of adjust header support,
>>>
>>> mlx/mlx4
>>>
>>
>> in net-next it's actually:
>> yes: mlx4, mlx5
>> no: qede, nfp, virtio
>> while nfp and virtio are working on it.
>>
>> xdp_adjust_head() is must have for load balancer,
> 
> What amount of head space does it need? 70 bytes
> to do vxlan kind of thing?
> 
>> so the sooner it lands for virtio the easier it will be
>> to develop xdp programs. Initially I expected
>> e1k+xdp to be the base line for debugging and
>> development of xdp programs, but since not everyone
>> agreed on e1k the virtio+xdp filled in the gap.
>> So without adjust_head in virtio I see very little use for it
>> in our environment.
>> It is a must have feature regardless of timing.
>> I will backport whatever is necessary, but distros
>> will stick with official releases and imo it's not great
>> from xdp adoption point of view to have
>> virtio driver lacking key features.
> 
> If everyone can agree it's net-next material then I'm happy.
> 

Considering that the only support for adjust_head in net branch
is in mlx4 and most drivers are aborting when programs get loaded
with adjust_head support I am OK with applying the patch below to
net and this series to net-next.

https://patchwork.ozlabs.org/patch/707118/

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 08327e005ccc..db761f37783e 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1677,6 +1677,11 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog)
 	u16 xdp_qp = 0, curr_qp;
 	int i, err;
 
+	if (prog && prog->xdp_adjust_head) {
+		netdev_warn(dev, "Does not support bpf_xdp_adjust_head()\n");
+		return -EOPNOTSUPP;
+	}
+
 	if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO4) ||
 	    virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO6)) {
 		netdev_warn(dev, "can't set XDP while host is implementing LRO, disable LRO first\n");


Thanks,
John
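As a rough illustration of the gating John proposes for net, the check can be modelled in userspace C. This is a sketch, not driver code: `struct mock_bpf_prog` and `mock_xdp_set()` are hypothetical stand-ins for `struct bpf_prog` and `virtnet_xdp_set()` that keep only the `xdp_adjust_head` flag the patch tests.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for struct bpf_prog: only the flag the patch
 * checks is modelled. */
struct mock_bpf_prog {
	bool xdp_adjust_head;	/* program calls bpf_xdp_adjust_head() */
};

/* Mirrors the check added at the top of virtnet_xdp_set(): a program that
 * needs head adjustment is refused, while prog == NULL (detach) and
 * ordinary programs pass this particular check. */
static int mock_xdp_set(const struct mock_bpf_prog *prog)
{
	if (prog && prog->xdp_adjust_head) {
		fprintf(stderr, "Does not support bpf_xdp_adjust_head()\n");
		return -EOPNOTSUPP;
	}
	return 0;
}
```

Refusing at attach time means a load balancer program fails loudly instead of running on a driver that cannot honour the head adjustment.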

[PATCH 1/3] net: phy: broadcom: use auxctl reading helper in BCM54612E code

2017-01-25 Thread Rafał Miłecki

From: Rafał Miłecki 

Starting with commit 5b4e29005123 ("net: phy: broadcom: add
bcm54xx_auxctl_read") we have a reading helper so use it and avoid code
duplication.
It also means we don't need the MII_BCM54XX_AUXCTL_MISC_RDSEL_MISC define,
as it's the same as MII_BCM54XX_AUXCTL_SHDWSEL_MISC, just for reading needs
(same value shifted by 12 bits).

Signed-off-by: Rafał Miłecki 
---
 drivers/net/phy/broadcom.c | 6 ++----
 include/linux/brcmphy.h    | 1 -
 2 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/net/phy/broadcom.c b/drivers/net/phy/broadcom.c
index 4223e35490b0..25c6e6cea2dc 100644
--- a/drivers/net/phy/broadcom.c
+++ b/drivers/net/phy/broadcom.c
@@ -395,10 +395,8 @@ static int bcm54612e_config_aneg(struct phy_device *phydev)
(phydev->interface != PHY_INTERFACE_MODE_RGMII_RXID)) {
u16 reg;
 
-   /* Errata: reads require filling in the write selector field */
-   bcm54xx_auxctl_write(phydev, MII_BCM54XX_AUXCTL_SHDWSEL_MISC,
-MII_BCM54XX_AUXCTL_MISC_RDSEL_MISC);
-   reg = phy_read(phydev, MII_BCM54XX_AUX_CTL);
+   reg = bcm54xx_auxctl_read(phydev,
+ MII_BCM54XX_AUXCTL_SHDWSEL_MISC);
/* Disable RXD to RXC delay (default set) */
reg &= ~MII_BCM54XX_AUXCTL_MISC_RXD_RXC_SKEW;
/* Clear shadow selector field */
diff --git a/include/linux/brcmphy.h b/include/linux/brcmphy.h
index 295fb3e73de5..34e61004b9dc 100644
--- a/include/linux/brcmphy.h
+++ b/include/linux/brcmphy.h
@@ -111,7 +111,6 @@
 #define MII_BCM54XX_AUXCTL_MISC_WREN   0x8000
 #define MII_BCM54XX_AUXCTL_MISC_RXD_RXC_SKEW   0x0100
 #define MII_BCM54XX_AUXCTL_MISC_FORCE_AMDIX    0x0200
-#define MII_BCM54XX_AUXCTL_MISC_RDSEL_MISC 0x7000
 #define MII_BCM54XX_AUXCTL_SHDWSEL_MISC    0x0007
 #define MII_BCM54XX_AUXCTL_SHDWSEL_READ_SHIFT  12
 #define MII_BCM54XX_AUXCTL_SHDWSEL_MISC_RGMII_SKEW_EN  (1 << 8)
-- 
2.11.0
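The claim that the dropped define is only the MISC selector moved into the read field can be checked mechanically. The sketch below assumes that `bcm54xx_auxctl_write()` writes `regnum | val` and that the read helper writes the shadow number into both the write-select field (bits 2:0) and the read-select field (bits 14:12); `old_errata_selector()` and `helper_selector()` are hypothetical names used only here.

```c
#include <assert.h>
#include <stdint.h>

#define MII_BCM54XX_AUXCTL_SHDWSEL_MISC		0x0007
#define MII_BCM54XX_AUXCTL_SHDWSEL_READ_SHIFT	12
#define MII_BCM54XX_AUXCTL_MISC_RDSEL_MISC	0x7000	/* removed by the patch */

/* The removed define is the MISC selector moved into the read field. */
static_assert(MII_BCM54XX_AUXCTL_MISC_RDSEL_MISC ==
	      (MII_BCM54XX_AUXCTL_SHDWSEL_MISC << MII_BCM54XX_AUXCTL_SHDWSEL_READ_SHIFT),
	      "RDSEL_MISC is SHDWSEL_MISC shifted by 12");

/* Value the old open-coded errata sequence wrote to AUX_CTL, assuming
 * bcm54xx_auxctl_write(phydev, regnum, val) writes (regnum | val). */
static uint16_t old_errata_selector(void)
{
	return MII_BCM54XX_AUXCTL_SHDWSEL_MISC | MII_BCM54XX_AUXCTL_MISC_RDSEL_MISC;
}

/* Value the read helper is assumed to write: the shadow number in both
 * the write-select and the read-select fields. */
static uint16_t helper_selector(uint16_t regnum)
{
	return regnum | (uint16_t)(regnum << MII_BCM54XX_AUXCTL_SHDWSEL_READ_SHIFT);
}
```

Under those assumptions both paths put 0x7007 on the bus, which is why the open-coded sequence can be replaced by the helper.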

[PATCH 2/3] net: phy: broadcom: drop duplicated define for RXD to RXC delay

2017-01-25 Thread Rafał Miłecki

From: Rafał Miłecki 

We had two defines for the same bit (both were used with the
MII_BCM54XX_AUXCTL_SHDWSEL_MISC register).

Signed-off-by: Rafał Miłecki 
---
 drivers/net/phy/broadcom.c | 2 +-
 include/linux/brcmphy.h    | 1 -
 2 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/phy/broadcom.c b/drivers/net/phy/broadcom.c
index 25c6e6cea2dc..9b7c2d57ca92 100644
--- a/drivers/net/phy/broadcom.c
+++ b/drivers/net/phy/broadcom.c
@@ -42,7 +42,7 @@ static int bcm54810_config(struct phy_device *phydev)
return rc;
 
val = bcm54xx_auxctl_read(phydev, MII_BCM54XX_AUXCTL_SHDWSEL_MISC);
-   val &= ~MII_BCM54XX_AUXCTL_SHDWSEL_MISC_RGMII_SKEW_EN;
+   val &= ~MII_BCM54XX_AUXCTL_MISC_RXD_RXC_SKEW;
val |= MII_BCM54XX_AUXCTL_MISC_WREN;
rc = bcm54xx_auxctl_write(phydev, MII_BCM54XX_AUXCTL_SHDWSEL_MISC,
  val);
diff --git a/include/linux/brcmphy.h b/include/linux/brcmphy.h
index 34e61004b9dc..bff53da82b58 100644
--- a/include/linux/brcmphy.h
+++ b/include/linux/brcmphy.h
@@ -113,7 +113,6 @@
 #define MII_BCM54XX_AUXCTL_MISC_FORCE_AMDIX    0x0200
 #define MII_BCM54XX_AUXCTL_SHDWSEL_MISC    0x0007
 #define MII_BCM54XX_AUXCTL_SHDWSEL_READ_SHIFT  12
-#define MII_BCM54XX_AUXCTL_SHDWSEL_MISC_RGMII_SKEW_EN  (1 << 8)
 #define MII_BCM54XX_AUXCTL_SHDWSEL_MISC_WIRESPEED_EN   (1 << 4)
 
 #define MII_BCM54XX_AUXCTL_SHDWSEL_MASK    0x0007
-- 
2.11.0
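A small sketch of why one define can go: both spell bit 8, and the read-modify-write in `bcm54810_config()` only needs one name for it. `bcm54810_misc_update()` is a hypothetical helper that mirrors the two bit operations shown in the hunk; it is not driver code.

```c
#include <assert.h>
#include <stdint.h>

#define MII_BCM54XX_AUXCTL_MISC_WREN			0x8000
#define MII_BCM54XX_AUXCTL_MISC_RXD_RXC_SKEW		0x0100
#define MII_BCM54XX_AUXCTL_SHDWSEL_MISC_RGMII_SKEW_EN	(1 << 8)	/* dropped */

/* Both names describe the same bit of the MISC shadow register. */
static_assert(MII_BCM54XX_AUXCTL_SHDWSEL_MISC_RGMII_SKEW_EN ==
	      MII_BCM54XX_AUXCTL_MISC_RXD_RXC_SKEW, "same bit 8");

/* Clear the RXD to RXC skew bit and set write-enable, as the hunk does
 * before writing the value back. */
static uint16_t bcm54810_misc_update(uint16_t val)
{
	val &= (uint16_t)~MII_BCM54XX_AUXCTL_MISC_RXD_RXC_SKEW;
	val |= MII_BCM54XX_AUXCTL_MISC_WREN;
	return val;
}
```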

[PATCH 3/3] net: phy: bcm-phy-lib: clean up remaining AUXCTL register defines

2017-01-25 Thread Rafał Miłecki

From: Rafał Miłecki 

1) Use 0x%02x format for register number. This follows some other
   defines and makes it easier to distinguish registers from values.
2) Put register define above values and sort the values. It makes
   reading header code easier.
3) Drop SHDWSEL_ name part from the only value define using it. For all
   other values we just start with MISC_.

Signed-off-by: Rafał Miłecki 
---
 drivers/net/phy/bcm-phy-lib.c | 6 +++---
 include/linux/brcmphy.h       | 8 ++++----
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/net/phy/bcm-phy-lib.c b/drivers/net/phy/bcm-phy-lib.c
index ab9ad689617c..8d7e21a9fa77 100644
--- a/drivers/net/phy/bcm-phy-lib.c
+++ b/drivers/net/phy/bcm-phy-lib.c
@@ -241,7 +241,7 @@ int bcm_phy_downshift_get(struct phy_device *phydev, u8 *count)
return val;
 
/* Check if wirespeed is enabled or not */
-   if (!(val & MII_BCM54XX_AUXCTL_SHDWSEL_MISC_WIRESPEED_EN)) {
+   if (!(val & MII_BCM54XX_AUXCTL_MISC_WIRESPEED_EN)) {
*count = DOWNSHIFT_DEV_DISABLE;
return 0;
}
@@ -283,12 +283,12 @@ int bcm_phy_downshift_set(struct phy_device *phydev, u8 count)
val |= MII_BCM54XX_AUXCTL_MISC_WREN;
 
if (count == DOWNSHIFT_DEV_DISABLE) {
-   val &= ~MII_BCM54XX_AUXCTL_SHDWSEL_MISC_WIRESPEED_EN;
+   val &= ~MII_BCM54XX_AUXCTL_MISC_WIRESPEED_EN;
return bcm54xx_auxctl_write(phydev,
MII_BCM54XX_AUXCTL_SHDWSEL_MISC,
val);
} else {
-   val |= MII_BCM54XX_AUXCTL_SHDWSEL_MISC_WIRESPEED_EN;
+   val |= MII_BCM54XX_AUXCTL_MISC_WIRESPEED_EN;
ret = bcm54xx_auxctl_write(phydev,
   MII_BCM54XX_AUXCTL_SHDWSEL_MISC,
   val);
diff --git a/include/linux/brcmphy.h b/include/linux/brcmphy.h
index bff53da82b58..a79d57caf7f1 100644
--- a/include/linux/brcmphy.h
+++ b/include/linux/brcmphy.h
@@ -104,16 +104,16 @@
 /*
  * AUXILIARY CONTROL SHADOW ACCESS REGISTERS.  (PHY REG 0x18)
  */
-#define MII_BCM54XX_AUXCTL_SHDWSEL_AUXCTL  0x0000
+#define MII_BCM54XX_AUXCTL_SHDWSEL_AUXCTL  0x00
 #define MII_BCM54XX_AUXCTL_ACTL_TX_6DB 0x0400
 #define MII_BCM54XX_AUXCTL_ACTL_SMDSP_ENA  0x0800
 
-#define MII_BCM54XX_AUXCTL_MISC_WREN   0x8000
+#define MII_BCM54XX_AUXCTL_SHDWSEL_MISC    0x07
+#define MII_BCM54XX_AUXCTL_MISC_WIRESPEED_EN   0x0010
 #define MII_BCM54XX_AUXCTL_MISC_RXD_RXC_SKEW   0x0100
 #define MII_BCM54XX_AUXCTL_MISC_FORCE_AMDIX    0x0200
-#define MII_BCM54XX_AUXCTL_SHDWSEL_MISC    0x0007
+#define MII_BCM54XX_AUXCTL_MISC_WREN   0x8000
 #define MII_BCM54XX_AUXCTL_SHDWSEL_READ_SHIFT  12
-#define MII_BCM54XX_AUXCTL_SHDWSEL_MISC_WIRESPEED_EN   (1 << 4)
 
 #define MII_BCM54XX_AUXCTL_SHDWSEL_MASK    0x0007
 
-- 
2.11.0
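The value side of the cleanup can be sanity-checked in userspace. This sketch uses the names as posted in this version of the patch (Florian asks below for the `SHDWSEL_` rename to be dropped); `format_regnum()` and `misc_set_wirespeed()` are hypothetical helpers illustrating the `0x%02x` convention and the bit handling in `bcm_phy_downshift_set()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define MII_BCM54XX_AUXCTL_SHDWSEL_MISC		0x07
#define MII_BCM54XX_AUXCTL_MISC_WIRESPEED_EN	0x0010
#define MII_BCM54XX_AUXCTL_MISC_WREN		0x8000

/* The renamed define keeps the value of the old (1 << 4) spelling. */
static_assert(MII_BCM54XX_AUXCTL_MISC_WIRESPEED_EN == (1 << 4), "bit 4");

/* Register numbers as 0x%02x, values as 0x%04x, so the two are easy to
 * tell apart when reading the header. */
static void format_regnum(char *buf, size_t len, unsigned int regnum)
{
	snprintf(buf, len, "0x%02x", regnum);
}

/* Always set WREN, then set or clear the wirespeed bit. */
static uint16_t misc_set_wirespeed(uint16_t val, bool enable)
{
	val |= MII_BCM54XX_AUXCTL_MISC_WREN;
	if (enable)
		val |= MII_BCM54XX_AUXCTL_MISC_WIRESPEED_EN;
	else
		val &= (uint16_t)~MII_BCM54XX_AUXCTL_MISC_WIRESPEED_EN;
	return val;
}
```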

Re: [PATCH net-next 5/5] bpf: enable verifier to better track const alu ops

2017-01-25 Thread William Tu

Looks good to me, I tested with several complex program without any
problem. Thanks for the patch.
--William

On Mon, Jan 23, 2017 at 4:06 PM, Daniel Borkmann  wrote:
> William reported a couple of issues in relation to direct packet
> access. Typical scheme is to check for data + [off] <= data_end,
> where [off] can be either immediate or coming from a tracked
> register that contains an immediate, depending on the branch, we
> can then access the data. However, in case of calculating [off]
> for either the mentioned test itself or for access after the test
> in a more "complex" way, then the verifier will stop tracking the
> CONST_IMM marked register and will mark it as UNKNOWN_VALUE one.
>
> Adding that UNKNOWN_VALUE typed register to a pkt() marked
> register, the verifier then bails out in check_packet_ptr_add()
> as it finds the register's imm value below 48. In the first example
> below, that is due to evaluate_reg_imm_alu() not handling right
> shifts and thus marking the register as UNKNOWN_VALUE via helper
> __mark_reg_unknown_value() that resets imm to 0.
>
> In the second case the same happens at the time when r4 is set
> to r4 &= r5, where it transitions to UNKNOWN_VALUE from
> evaluate_reg_imm_alu(). Later on r4 we shift right by 3 inside
> evaluate_reg_alu(), where the register's imm turns into 3. That
> is, for registers with type UNKNOWN_VALUE, imm of 0 means that
> we don't know what value the register has, and for imm > 0 it
> means that the value has [imm] upper zero bits. F.e. when shifting
> an UNKNOWN_VALUE register by 3 to the right, no matter what value
> it had, we know that the 3 upper most bits must be zero now.
> This is to make sure that ALU operations with unknown registers
> don't overflow. Meaning, once we know that we have more than 48
> upper zero bits, or, in other words, cannot go beyond 0xffff offset
> with ALU ops, such an addition will track the target register
> as a new pkt() register with a new id, but 0 offset and 0 range,
> so for that a new data/data_end test will be required. If the source
> register is a CONST_IMM one that is to be added to the pkt() register,
> or the source instruction is an add instruction with an immediate
> value, then it will get added if it stays within max 0xffff bounds.
> From there, the pkt() register can be accessed if reg->off + imm is
> within the access range of pkt().
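The "upper zero bits" bookkeeping described above can be sketched as a toy model. It is deliberately simplified and is not the verifier's code: `struct model_unknown_reg` and the `model_*` functions are hypothetical, and only the two rules stated in the message are modelled (a right shift by k adds k known zero bits, and at least 48 are needed before adding to a pkt() pointer, which caps the value at 0xffff).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Number of top bits of a 64-bit register known to be zero
 * (0 means nothing is known). */
struct model_unknown_reg {
	unsigned int upper_zero_bits;
};

/* A logical right shift by k makes at least k more top bits zero. */
static void model_rsh(struct model_unknown_reg *reg, unsigned int k)
{
	reg->upper_zero_bits += k;
	if (reg->upper_zero_bits > 64)
		reg->upper_zero_bits = 64;
}

/* Adding to a pkt() pointer needs at least 48 known zero top bits. */
static bool model_can_add_to_pkt(const struct model_unknown_reg *reg)
{
	return reg->upper_zero_bits >= 48;
}

/* Largest value a register with n known zero top bits can hold. */
static uint64_t model_max_value(unsigned int upper_zero_bits)
{
	return upper_zero_bits >= 64 ? 0 : UINT64_MAX >> upper_zero_bits;
}
```

In the second trace, `r4 >>= 3` on an UNKNOWN_VALUE register only yields 3 known zero bits, far short of 48, hence the rejection.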
>
>   [...]
>   from 28 to 30: R0=imm1,min_value=1,max_value=1
> R1=pkt(id=0,off=0,r=22) R2=pkt_end
> R3=imm144,min_value=144,max_value=144
> R4=imm0,min_value=0,max_value=0
> R5=inv48,min_value=2054,max_value=2054 R10=fp
>   30: (bf) r5 = r3
>   31: (07) r5 += 23
>   32: (77) r5 >>= 3
>   33: (bf) r6 = r1
>   34: (0f) r6 += r5
>   cannot add integer value with 0 upper zero bits to ptr_to_packet
>
>   [...]
>   from 52 to 80: R0=imm1,min_value=1,max_value=1
> R1=pkt(id=0,off=0,r=34) R2=pkt_end R3=inv
> R4=imm272 R5=inv56,min_value=17,max_value=17
> R6=pkt(id=0,off=26,r=34) R10=fp
>   80: (07) r4 += 71
>   81: (18) r5 = 0xfff8
>   83: (5f) r4 &= r5
>   84: (77) r4 >>= 3
>   85: (0f) r1 += r4
>   cannot add integer value with 3 upper zero bits to ptr_to_packet
>
> Thus to get above use-cases working, evaluate_reg_imm_alu() has
> been extended for further ALU ops. This is fine, because we only
> operate strictly within realm of CONST_IMM types, so here we don't
> care about overflows as they will happen in the simulated but also
> real execution and interaction with pkt() in check_packet_ptr_add()
> will check actual imm value once added to pkt(), but it's irrelevant
> before.
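Replaying the two failing traces on concrete constants shows what keeping the registers CONST_IMM buys: both offsets become known small values. The functions below are hypothetical and simply repeat the ALU steps printed in the traces; the mask is taken as printed (`0xfff8`), and any wider mask ending in `...fff8` gives the same result here because it only clears the low three bits of 343.

```c
#include <assert.h>
#include <stdint.h>

/* Trace 1, insns 30-32: r5 = r3 (imm144); r5 += 23; r5 >>= 3 */
static uint64_t fold_trace1(uint64_t r3)
{
	uint64_t r5 = r3;

	r5 += 23;
	r5 >>= 3;
	return r5;
}

/* Trace 2, insns 80-84: r4 (imm272) += 71; r5 = 0xfff8; r4 &= r5; r4 >>= 3 */
static uint64_t fold_trace2(uint64_t r4)
{
	uint64_t r5 = 0xfff8;

	r4 += 71;
	r4 &= r5;
	r4 >>= 3;
	return r4;
}
```

Both results (20 and 42) stay far below the 0xffff bound, so the addition to pkt() can proceed as a known constant.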
>
> With regards to 06c1c049721a ("bpf: allow helpers access to variable
> memory") that works on UNKNOWN_VALUE registers, the verifier becomes
> now a bit smarter as it can better resolve ALU ops, so we need to
> adapt two test cases there, as min/max bound tracking only becomes
> necessary when registers were spilled to stack. So while mask was
> set before to track upper bound for UNKNOWN_VALUE case, it's now
> resolved directly as CONST_IMM, and such constructs are only necessary
> when f.e. registers are spilled.
>
> For commit 6b17387307ba ("bpf: recognize 64bit immediate loads as
> consts") that initially enabled dw load tracking only for nfp jit/
> analyzer, I did a couple of tests on large, complex programs and we
> don't increase complexity badly (my tests were in ~3% range on avg).
> I've added a couple of tests similar to affected code above, and
> it works fine with verifier now.
>
> Reported-by: William Tu 
> Signed-off-by: Daniel Borkmann 
> Cc: Gianluca Borello 
> Cc: William Tu 
> Acked-by: Alexei Starovoitov 
> ---
>  kernel/bpf/verifier.c   | 64 +++---
>  tools/testing/selftests/bpf/test_verifier.c | 82 
> +
>  2 files changed, 127 insertions(+), 19 deletions(-)
>
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 8f69df7..fb3513b 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kern

Re: [PATCH net] sctp: sctp_addr_id2transport should verify the addr before looking up assoc

2017-01-25 Thread Vladislav Yasevich

On Wed, Jan 25, 2017 at 10:34 AM, Xin Long  wrote:
>
> On Wed, Jan 25, 2017 at 11:27 PM, Vladislav Yasevich
>  wrote:
> > On Tue, Jan 24, 2017 at 1:01 AM, Xin Long  wrote:
> >>
> >> sctp_addr_id2transport is a function for sockopt to look up assoc by
> >> address. As the address is from userspace, it can be a v4-mapped v6
> >> address. But in sctp protocol stack, it always handles a v4-mapped
> >> v6 address as a v4 address. So it's necessary to convert it to a v4
> >> address before looking up assoc by address.
> >>
> >> This patch is to fix it by calling sctp_verify_addr in which it can do
> >> this conversion before calling sctp_endpoint_lookup_assoc, just like
> >> what sctp_sendmsg and __sctp_connect do for the address from users.
> >>
> >> Signed-off-by: Xin Long 
> >> ---
> >>  net/sctp/socket.c | 6 +-
> >>  1 file changed, 5 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
> >> index 318c678..37eeab7 100644
> >> --- a/net/sctp/socket.c
> >> +++ b/net/sctp/socket.c
> >> @@ -235,8 +235,12 @@ static struct sctp_transport *sctp_addr_id2transport(struct sock *sk,
> >>   sctp_assoc_t id)
> >>  {
> >> struct sctp_association *addr_asoc = NULL, *id_asoc = NULL;
> >> -   struct sctp_transport *transport;
> >> +   struct sctp_af *af = sctp_get_af_specific(addr->ss_family);
> >> union sctp_addr *laddr = (union sctp_addr *)addr;
> >> +   struct sctp_transport *transport;
> >> +
> >> +   if (sctp_verify_addr(sk, laddr, af->sockaddr_len))
> >> +   return NULL;
> >>
> >
> > This causes a side-effect such that GET options will end up with ipv4
> > address instead
> > of a v4mapped address that was passed in.
> not really
>
> (more below)
> >
> > -vlad
> >
> >>
> >> addr_asoc = sctp_endpoint_lookup_assoc(sctp_sk(sk)->ep,
> >>laddr,
> sctp_get_pf_specific(sk->sk_family)->addr_to_user(sctp_sk(sk),
> (union sctp_addr *)addr);
>
> here it will convert it back to v4mapped v6 address.
>

Yep, you are right.  Missed the fact that it was already there.

ACK

-vlad
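For readers less familiar with v4-mapped addresses, the round trip Xin Long describes can be illustrated with plain POSIX socket types. This is a userspace analogue, not the SCTP code: `v4mapped_to_v4()` plays the role of the normalisation done via `sctp_verify_addr()`, and `v4_to_v4mapped()` the role assumed for `addr_to_user()`; both helper names are hypothetical.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <string.h>

/* ::ffff:a.b.c.d -> a.b.c.d; returns false for non-mapped addresses. */
static bool v4mapped_to_v4(const struct sockaddr_in6 *in6, struct sockaddr_in *in4)
{
	if (!IN6_IS_ADDR_V4MAPPED(&in6->sin6_addr))
		return false;
	memset(in4, 0, sizeof(*in4));
	in4->sin_family = AF_INET;
	in4->sin_port = in6->sin6_port;
	/* The IPv4 address is the last four bytes of the IPv6 address. */
	memcpy(&in4->sin_addr.s_addr, &in6->sin6_addr.s6_addr[12], 4);
	return true;
}

/* a.b.c.d -> ::ffff:a.b.c.d, as handed back to a v6 socket. */
static void v4_to_v4mapped(const struct sockaddr_in *in4, struct sockaddr_in6 *in6)
{
	memset(in6, 0, sizeof(*in6));
	in6->sin6_family = AF_INET6;
	in6->sin6_port = in4->sin_port;
	in6->sin6_addr.s6_addr[10] = 0xff;
	in6->sin6_addr.s6_addr[11] = 0xff;
	memcpy(&in6->sin6_addr.s6_addr[12], &in4->sin_addr.s_addr, 4);
}
```

The lookup works on the v4 form while the caller still sees the v4-mapped form, which is why the GET options keep returning what was passed in.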

> >> --
> >> 2.1.0

Re: [PATCH net] net: dsa: Mop up remaining NET_DSA_HWMON references

2017-01-25 Thread kbuild test robot

Hi Andrew,

[auto build test ERROR on net/master]

url:
https://github.com/0day-ci/linux/commits/Andrew-Lunn/net-dsa-Mop-up-remaining-NET_DSA_HWMON-references/20170125-221411
config: i386-allmodconfig (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
# save the attached .config to linux build tree
make ARCH=i386 

All errors (new ones prefixed by >>):

   net/dsa/dsa.c: In function 'dsa_switch_setup_one':
>> net/dsa/dsa.c:447:15: error: 'struct dsa_switch' has no member named 'hwmon_name'
  scnprintf(ds->hwmon_name, sizeof(ds->hwmon_name), "%s_dsa%d",
  ^~
   net/dsa/dsa.c:447:38: error: 'struct dsa_switch' has no member named 'hwmon_name'
  scnprintf(ds->hwmon_name, sizeof(ds->hwmon_name), "%s_dsa%d",
 ^~
>> net/dsa/dsa.c:449:5: error: 'struct dsa_switch' has no member named 'hwmon_dev'
  ds->hwmon_dev = hwmon_device_register_with_groups(NULL,
^~
   net/dsa/dsa.c:450:8: error: 'struct dsa_switch' has no member named 'hwmon_name'
 ds->hwmon_name, ds, dsa_hwmon_groups);
   ^~
   net/dsa/dsa.c:451:16: error: 'struct dsa_switch' has no member named 'hwmon_dev'
  if (IS_ERR(ds->hwmon_dev))
   ^~
   net/dsa/dsa.c:452:6: error: 'struct dsa_switch' has no member named 'hwmon_dev'
   ds->hwmon_dev = NULL;
 ^~
   net/dsa/dsa.c: In function 'dsa_switch_destroy':
   net/dsa/dsa.c:518:8: error: 'struct dsa_switch' has no member named 'hwmon_dev'
 if (ds->hwmon_dev)
   ^~
   net/dsa/dsa.c:519:29: error: 'struct dsa_switch' has no member named 'hwmon_dev'
  hwmon_device_unregister(ds->hwmon_dev);
^~

vim +447 net/dsa/dsa.c

51579c3f Guenter Roeck 2014-10-29  441      /* Create valid hwmon 'name' attribute */
51579c3f Guenter Roeck 2014-10-29  442      for (i = j = 0; i < IFNAMSIZ && netname[i]; i++) {
51579c3f Guenter Roeck 2014-10-29  443          if (isalnum(netname[i]))
51579c3f Guenter Roeck 2014-10-29  444              hname[j++] = netname[i];
51579c3f Guenter Roeck 2014-10-29  445      }
51579c3f Guenter Roeck 2014-10-29  446      hname[j] = '\0';
51579c3f Guenter Roeck 2014-10-29 @447      scnprintf(ds->hwmon_name, sizeof(ds->hwmon_name), "%s_dsa%d",
51579c3f Guenter Roeck 2014-10-29  448                hname, index);
51579c3f Guenter Roeck 2014-10-29 @449      ds->hwmon_dev = hwmon_device_register_with_groups(NULL,
51579c3f Guenter Roeck 2014-10-29  450                      ds->hwmon_name, ds, dsa_hwmon_groups);
51579c3f Guenter Roeck 2014-10-29  451      if (IS_ERR(ds->hwmon_dev))
51579c3f Guenter Roeck 2014-10-29  452          ds->hwmon_dev = NULL;

:: The code at line 447 was first introduced by commit
:: 51579c3f1a9192b75365576227d40c7619493285 net: dsa: Add support for reporting switch chip temperatures

:: TO: Guenter Roeck 
:: CC: David S. Miller 

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation



Re: [PATCH net] net: dsa: Mop up remaining NET_DSA_HWMON references

2017-01-25 Thread Andrew Lunn

On Wed, Jan 25, 2017 at 03:04:17PM +0100, Andrew Lunn wrote:
> Previous patches have moved the temperature sensor code into the
> Marvell PHYs. A few now dead references to NET_DSA_HWMON were left
> behind. Go reap them.

Hi David

Wrong subject line. This should be net-next, which is why 0-day has
found issues. It applied it to the wrong branch.

  Andrew

Re: [PATCH/RFC v3 net] ravb: unmap descriptors when freeing rings

2017-01-25 Thread Sergei Shtylyov


Hello.

On 01/24/2017 09:21 PM, Simon Horman wrote:


From: Kazuya Mizuguchi 

"swiotlb buffer is full" errors occur after repeated initialisation of a
device - f.e. suspend/resume or ip link set up/down. This is because memory
mapped using dma_map_single() in ravb_ring_format() and ravb_start_xmit()
is not released.  Resolve this problem by unmapping descriptors when
freeing rings.

Note, ravb_tx_free() is moved but not otherwise modified by this patch.

Signed-off-by: Kazuya Mizuguchi 
[simon: reworked]
Signed-off-by: Simon Horman 
--
v3 [Simon Horman]
* As suggested by Sergei Shtylyov
  - consistently use le32_to_cpu(desc->dptr)
  - Do not clear desc->ds_cc as it is not used
* Parametrise ravb_tx_free() to allow it to free non-transmitted buffers

v2 [Simon Horman]
* As suggested by Sergei Shtylyov
  - Use dma_mapping_error() and rx_desc->ds_cc when unmapping RX descriptors;
this is consistent with the way that they are mapped
  - Use ravb_tx_free() to clear TX descriptors
* Reduce scope of new local variable

v1 [Kazuya Mizuguchi]
---
 drivers/net/ethernet/renesas/ravb_main.c | 113 ++-
 1 file changed, 65 insertions(+), 48 deletions(-)

diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
index 89ac1e3f6175..57fe1411bb9d 100644
--- a/drivers/net/ethernet/renesas/ravb_main.c
+++ b/drivers/net/ethernet/renesas/ravb_main.c
@@ -179,6 +179,51 @@ static struct mdiobb_ops bb_ops = {
.get_mdio_data = ravb_get_mdio_data,
 };

+enum ravb_tx_free_mode {
+   ravb_tx_free_all,
+   ravb_tx_free_txed_only,
+};
+
+/* Free TX skb function for AVB-IP */
+static int ravb_tx_free(struct net_device *ndev, int q,
+   enum ravb_tx_free_mode free_mode)


   Hmm... Sorry, but this looks over-engineered. A *bool* parameter (named e.g.
'all') would suffice IMHO.



+{
+   struct ravb_private *priv = netdev_priv(ndev);
+   struct net_device_stats *stats = &priv->stats[q];
+   struct ravb_tx_desc *desc;
+   int free_num = 0;
+   int entry;
+   u32 size;
+
+   for (; priv->cur_tx[q] - priv->dirty_tx[q] > 0; priv->dirty_tx[q]++) {
+   entry = priv->dirty_tx[q] % (priv->num_tx_ring[q] *
+NUM_TX_DESC);
+   desc = &priv->tx_ring[q][entry];
+   if (free_mode == ravb_tx_free_txed_only &&
+   desc->die_dt != DT_FEMPTY)
+   break;
+   /* Descriptor type must be checked before all other reads */
+   dma_rmb();
+   size = le16_to_cpu(desc->ds_tagl) & TX_DS;
+   /* Free the original skb. */
+   if (priv->tx_skb[q][entry / NUM_TX_DESC]) {
+   dma_unmap_single(ndev->dev.parent, le32_to_cpu(desc->dptr),
+                    size, DMA_TO_DEVICE);
+   /* Last packet descriptor? */
+   if (entry % NUM_TX_DESC == NUM_TX_DESC - 1) {
+   entry /= NUM_TX_DESC;
+   dev_kfree_skb_any(priv->tx_skb[q][entry]);
+   priv->tx_skb[q][entry] = NULL;
+   stats->tx_packets++;
+   }
+   free_num++;
+   }
+   stats->tx_bytes += size;


   Hmmm... we shouldn't count the discarded unsent packets/bytes as sent, right?
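One way to address both review comments (a plain `bool` instead of the enum, and not counting unsent buffers as transmitted) is sketched below against a toy ring. Everything here is hypothetical (`struct model_tx_desc`, `model_tx_free()`); it counts each descriptor as one packet, unlike the driver's `NUM_TX_DESC` grouping, and `done` stands in for the `DT_FEMPTY` check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_tx_desc {
	bool done;		/* hardware finished with this descriptor */
	bool mapped;		/* buffer still DMA-mapped */
	unsigned int len;	/* bytes in this descriptor */
};

struct model_stats {
	unsigned long tx_packets;
	unsigned long tx_bytes;
};

/* Release descriptors from *dirty up to cur. With all == false, stop at
 * the first descriptor the hardware has not completed; with all == true
 * (ring teardown), release everything but only count completed ones as
 * transmitted. Returns the number of buffers unmapped. */
static int model_tx_free(struct model_tx_desc *ring, size_t *dirty, size_t cur,
			 struct model_stats *stats, bool all)
{
	int freed = 0;

	for (; *dirty < cur; (*dirty)++) {
		struct model_tx_desc *desc = &ring[*dirty];
		bool txed = desc->done;

		if (!all && !txed)
			break;
		if (desc->mapped) {
			desc->mapped = false;	/* stands in for dma_unmap_single() */
			freed++;
		}
		if (txed) {
			stats->tx_packets++;
			stats->tx_bytes += desc->len;
		}
	}
	return freed;
}
```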

[...]

@@ -215,12 +262,19 @@ static void ravb_ring_free(struct net_device *ndev, int q)
}

if (priv->tx_ring[q]) {
+   ravb_tx_free(ndev, q, ravb_tx_free_all);
+
ring_size = sizeof(struct ravb_tx_desc) *
(priv->num_tx_ring[q] * NUM_TX_DESC + 1);
dma_free_coherent(ndev->dev.parent, ring_size, priv->tx_ring[q],
  priv->tx_desc_dma[q]);
priv->tx_ring[q] = NULL;
}
+
+   /* Free TX skb ringbuffer.
+* SKBs are freed by ravb_tx_free() call above. */


   This is not a recommended comment format:

/* bla
 * bla
 */

[...]

MBR, Sergei

Re: [PATCH] cxgbit: use T6 specific macro to set force bit

2017-01-25 Thread Joe Perches

On Wed, 2017-01-25 at 12:08 +0530, Varun Prakash wrote:
> On Wed, Jan 25, 2017 at 02:41:40AM +0530, Joe Perches wrote:
> > On Tue, 2017-01-24 at 17:07 +0530, Varun Prakash wrote:
> > > For T6 adapters use T6 specific macro to set
> > > force bit.
> > 
> > []
> > > diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4_msg.h b/drivers/net/ethernet/chelsio/cxgb4/t4_msg.h
> > 
> > []
> > > @@ -1349,6 +1349,10 @@ struct cpl_tx_data {
> > >  #define TX_FORCE_S   13
> > >  #define TX_FORCE_V(x)    ((x) << TX_FORCE_S)
> > >  
> > > +#define T6_TX_FORCE_S    20
> > > +#define T6_TX_FORCE_V(x) ((x) << T6_TX_FORCE_S)
> > > +#define T6_TX_FORCE_F    T6_TX_FORCE_V(1U)
> > > +
> > >  enum {
> > >   ULP_TX_MEM_READ = 2,
> > >   ULP_TX_MEM_WRITE = 3,
> > > diff --git a/drivers/target/iscsi/cxgbit/cxgbit_target.c b/drivers/target/iscsi/cxgbit/cxgbit_target.c
> > 
> > []
> > > @@ -162,12 +162,14 @@ cxgbit_tx_data_wr(struct cxgbit_sock *csk, struct sk_buff *skb, u32 dlen,
> > > u32 len, u32 credits, u32 compl)
> > >  {
> > >   struct fw_ofld_tx_data_wr *req;
> > > + const struct cxgb4_lld_info *lldi = &csk->com.cdev->lldi;
> > >   u32 submode = cxgbit_skcb_submode(skb);
> > >   u32 wr_ulp_mode = 0;
> > >   u32 hdr_size = sizeof(*req);
> > >   u32 opcode = FW_OFLD_TX_DATA_WR;
> > >   u32 immlen = 0;
> > > - u32 force = TX_FORCE_V(!submode);
> > > + u32 force = is_t5(lldi->adapter_type) ? TX_FORCE_V(!submode) :
> > > + T6_TX_FORCE_F;
> > 
> > Perhaps it'd be better to add an is_t6() mechanism so this
> > is written in the positive rather than the negative.
> > 
> 
> At present

That's the key phrase that describes the reason why it's
generally better to write code in the positive than the
negative.

> cxgbit driver supports only T5 and T6 adapters so
> if an adapter is not T5 then it is T6.

Your code, your choices...
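Joe's suggestion can be sketched as follows. The `TX_FORCE_*` and `T6_TX_FORCE_*` macros are copied from the hunk above; `enum model_chip`, `model_is_t6()` and `model_force_bits()` are hypothetical and only show the positive form of the test, not the actual cxgb4 helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TX_FORCE_S		13
#define TX_FORCE_V(x)		((x) << TX_FORCE_S)

#define T6_TX_FORCE_S		20
#define T6_TX_FORCE_V(x)	((x) << T6_TX_FORCE_S)
#define T6_TX_FORCE_F		T6_TX_FORCE_V(1U)

/* Hypothetical chip identifiers for the sketch. */
enum model_chip { MODEL_CHIP_T5, MODEL_CHIP_T6 };

static bool model_is_t6(enum model_chip chip)
{
	return chip == MODEL_CHIP_T6;
}

/* Name the T6 case explicitly instead of treating "not T5" as T6, so a
 * future chip does not silently fall into the T6 branch. */
static uint32_t model_force_bits(enum model_chip chip, uint32_t submode)
{
	return model_is_t6(chip) ? T6_TX_FORCE_F : TX_FORCE_V(!submode);
}
```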

Re: [PATCH/RFC v3 net] ravb: unmap descriptors when freeing rings

2017-01-25 Thread Sergei Shtylyov


On 01/24/2017 09:21 PM, Simon Horman wrote:


From: Kazuya Mizuguchi 

"swiotlb buffer is full" errors occur after repeated initialisation of a
device - f.e. suspend/resume or ip link set up/down. This is because memory
mapped using dma_map_single() in ravb_ring_format() and ravb_start_xmit()
is not released.  Resolve this problem by unmapping descriptors when
freeing rings.


   Could you look into the sh_eth driver which seems to have the same issue?


Note, ravb_tx_free() is moved but not otherwise modified by this patch.


   This is not true anymore BTW.


Signed-off-by: Kazuya Mizuguchi 
[simon: reworked]
Signed-off-by: Simon Horman 


MBR, Sergei

[PULL] virtio, vhost: fixes

2017-01-25 Thread Michael S. Tsirkin

The following changes since commit 7a308bb3016f57e5be11a677d15b821536419d36:

  Linux 4.10-rc5 (2017-01-22 12:54:15 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git tags/for_linus

for you to fetch changes up to c7070619f3408d9a0dffbed9149e6f00479cf43b:

  vring: Force use of DMA API for ARM-based systems with legacy devices (2017-01-25 00:33:11 +0200)


virtio, vhost: fixes

ARM DMA fixes
vhost vsock bugfix

Signed-off-by: Michael S. Tsirkin 


Robin Murphy (1):
  virtio_mmio: Set DMA masks appropriately

Stefan Hajnoczi (1):
  vhost/vsock: handle vhost_vq_init_access() error

Will Deacon (1):
  vring: Force use of DMA API for ARM-based systems with legacy devices

 drivers/vhost/vsock.c| 13 +
 drivers/virtio/virtio_mmio.c | 20 +++-
 drivers/virtio/virtio_ring.c |  7 +++
 3 files changed, 35 insertions(+), 5 deletions(-)

[PATCH net-next] xen-netfront: reject short packets and handle non-linear packets

2017-01-25 Thread Paul Durrant

Sowmini points out two vulnerabilities in xen-netfront:

a) The code assumes that skb->len is at least ETH_HLEN.
b) The code assumes that at least ETH_HLEN octets are in the linear
   portion of the socket buffer.

This patch adds tests for both of these, and in the case of the latter
pulls sufficient bytes into the linear area.

Signed-off-by: Paul Durrant 
Reported-by: Sowmini Varadhan 
Tested-by: Sowmini Varadhan 
---
Cc: Boris Ostrovsky 
Cc: Juergen Gross 
---
 drivers/net/xen-netfront.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 40f26b6..0478809 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -567,6 +567,10 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
u16 queue_index;
struct sk_buff *nskb;
 
+   /* Basic sanity check */
+   if (unlikely(skb->len < ETH_HLEN))
+   goto drop;
+
/* Drop the packet if no queues are set up */
if (num_queues < 1)
goto drop;
@@ -609,6 +613,11 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
}
 
len = skb_headlen(skb);
+   if (unlikely(len < ETH_HLEN)) {
+   if (!__pskb_pull_tail(skb, ETH_HLEN - len))
+   goto drop;
+   len = ETH_HLEN;
+   }
 
spin_lock_irqsave(&queue->tx_lock, flags);
 
-- 
2.1.4
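The two checks the patch adds can be modelled outside the kernel. In this sketch `struct model_pkt` is a hypothetical two-part buffer (a linear area plus one fragment) and `model_pull()` is a simplified stand-in for `__pskb_pull_tail()`; only the `ETH_HLEN` value (14) is taken from the kernel headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define ETH_HLEN 14	/* Ethernet header length, as in <linux/if_ether.h> */

/* 'linear' holds the first headlen bytes, 'frag' the remaining
 * len - headlen bytes. */
struct model_pkt {
	unsigned char linear[64];
	size_t headlen;
	const unsigned char *frag;
	size_t len;
};

/* Move bytes from the fragment into the linear area until it holds
 * 'want' bytes. */
static bool model_pull(struct model_pkt *p, size_t want)
{
	size_t need;

	if (want > sizeof(p->linear) || want > p->len)
		return false;
	if (p->headlen >= want)
		return true;
	need = want - p->headlen;
	memcpy(p->linear + p->headlen, p->frag, need);
	p->frag += need;
	p->headlen = want;
	return true;
}

/* Drop runts, then make sure the whole Ethernet header is linear.
 * Returns false if the packet would be dropped. */
static bool model_xmit_prepare(struct model_pkt *p)
{
	if (p->len < ETH_HLEN)
		return false;
	if (p->headlen < ETH_HLEN && !model_pull(p, ETH_HLEN))
		return false;
	return true;
}
```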

[PATCH 0/1] pull request for net: batman-adv 2017-01-25

2017-01-25 Thread Simon Wunderlich

Hi David,

here is a bugfix for net which we would like to have integrated.

Please pull or let me know of any problem!

Thank you,
  Simon

The following changes since commit 7ce7d89f48834cefece7804d38fc5d85382edf77:

  Linux 4.10-rc1 (2016-12-25 16:13:08 -0800)

are available in the git repository at:

  git://git.open-mesh.org/linux-merge.git tags/batadv-net-for-davem-20170125

for you to fetch changes up to 4ea33ef0f9e95b69db9131d7afd98563713e81b0:

  batman-adv: Decrease hardif refcnt on fragmentation send error (2017-01-04 08:22:04 +0100)


Here is a batman-adv bugfix:

 - fix reference count handling on fragmentation error, by Sven Eckelmann


Sven Eckelmann (1):
  batman-adv: Decrease hardif refcnt on fragmentation send error

 net/batman-adv/fragmentation.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

[PATCH 1/1] batman-adv: Decrease hardif refcnt on fragmentation send error

2017-01-25 Thread Simon Wunderlich

From: Sven Eckelmann 

An error before the hardif is found has to free the skb. But every error
after that has to free the skb + put the hard interface.

Fixes: 8def0be82dd1 ("batman-adv: Consume skb in batadv_frag_send_packet")
Signed-off-by: Sven Eckelmann 
Signed-off-by: Simon Wunderlich 
---
 net/batman-adv/fragmentation.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/net/batman-adv/fragmentation.c b/net/batman-adv/fragmentation.c
index 9c561e683f4b..0854ebd8613e 100644
--- a/net/batman-adv/fragmentation.c
+++ b/net/batman-adv/fragmentation.c
@@ -474,7 +474,7 @@ int batadv_frag_send_packet(struct sk_buff *skb,
primary_if = batadv_primary_if_get_selected(bat_priv);
if (!primary_if) {
ret = -EINVAL;
-   goto put_primary_if;
+   goto free_skb;
}
 
/* Create one header to be copied to all fragments */
@@ -502,7 +502,7 @@ int batadv_frag_send_packet(struct sk_buff *skb,
skb_fragment = batadv_frag_create(skb, &frag_header, mtu);
if (!skb_fragment) {
ret = -ENOMEM;
-   goto free_skb;
+   goto put_primary_if;
}
 
batadv_inc_counter(bat_priv, BATADV_CNT_FRAG_TX);
@@ -511,7 +511,7 @@ int batadv_frag_send_packet(struct sk_buff *skb,
ret = batadv_send_unicast_skb(skb_fragment, neigh_node);
if (ret != NET_XMIT_SUCCESS) {
ret = NET_XMIT_DROP;
-   goto free_skb;
+   goto put_primary_if;
}
 
frag_header.no++;
@@ -519,7 +519,7 @@ int batadv_frag_send_packet(struct sk_buff *skb,
/* The initial check in this function should cover this case */
if (frag_header.no == BATADV_FRAG_MAX_FRAGMENTS - 1) {
ret = -EINVAL;
-   goto free_skb;
+   goto put_primary_if;
}
}
 
@@ -527,7 +527,7 @@ int batadv_frag_send_packet(struct sk_buff *skb,
if (batadv_skb_head_push(skb, header_size) < 0 ||
pskb_expand_head(skb, header_size + ETH_HLEN, 0, GFP_ATOMIC) < 0) {
ret = -ENOMEM;
-   goto free_skb;
+   goto put_primary_if;
}
 
memcpy(skb->data, &frag_header, header_size);
-- 
2.11.0
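The refcount rule in the commit message can be illustrated with a goto ladder in userspace C. `struct model_iface`, `struct model_state` and `model_frag_send()` are hypothetical; the point is only the label order: failures before the interface reference is taken jump to `free_skb`, and failures after it jump to `put_primary_if`, which falls through to `free_skb`.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct model_iface {
	int refcnt;
};

struct model_state {
	struct model_iface *primary;	/* NULL models "no primary interface" */
	int skbs_sent;			/* skbs handed to the send path */
	int skbs_freed;			/* skbs freed on an error path */
};

static int model_frag_send(struct model_state *st, bool fail_after_get)
{
	struct model_iface *primary_if = st->primary;
	bool own_skb = true;
	int ret = 0;

	if (!primary_if) {
		ret = -EINVAL;
		goto free_skb;		/* no reference taken yet */
	}
	primary_if->refcnt++;		/* like batadv_primary_if_get_selected() */

	if (fail_after_get) {
		ret = -ENOMEM;		/* e.g. fragment allocation failed */
		goto put_primary_if;
	}

	own_skb = false;		/* skb handed off, no longer ours to free */
	st->skbs_sent++;

put_primary_if:
	primary_if->refcnt--;		/* like batadv_hardif_put() */
free_skb:
	if (own_skb)
		st->skbs_freed++;
	return ret;
}
```

With the labels swapped as in the old code, the first error would dereference a NULL interface and the later ones would leak a reference.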

Re: [PATCH 1/3] net: phy: broadcom: use auxctl reading helper in BCM54612E code

2017-01-25 Thread Florian Fainelli

On 01/25/2017 07:54 AM, Rafał Miłecki wrote:
> From: Rafał Miłecki 
> 
> Starting with commit 5b4e29005123 ("net: phy: broadcom: add
> bcm54xx_auxctl_read") we have a reading helper so use it and avoid code
> duplication.
> It also means we don't need the MII_BCM54XX_AUXCTL_MISC_RDSEL_MISC define,
> as it's the same as MII_BCM54XX_AUXCTL_SHDWSEL_MISC, just for reading needs
> (same value shifted by 12 bits).
> 
> Signed-off-by: Rafał Miłecki 

Reviewed-by: Florian Fainelli 
-- 
Florian

Re: [PATCH 3/3] net: phy: bcm-phy-lib: clean up remaining AUXCTL register defines

2017-01-25 Thread Florian Fainelli

On 01/25/2017 07:54 AM, Rafał Miłecki wrote:
> From: Rafał Miłecki 
> 
> 1) Use 0x%02x format for register number. This follows some other
>defines and makes it easier to distinguish registers from values.
> 2) Put register define above values and sort the values. It makes
>reading header code easier.
> 3) Drop SHDWSEL_ name part from the only value define using it. For all
>other values we just start with MISC_.

That's not how these bits are defined in the data sheet, so please drop
that part of the patch.

> 
> Signed-off-by: Rafał Miłecki 
> ---
>  drivers/net/phy/bcm-phy-lib.c | 6 +++---
>  include/linux/brcmphy.h   | 8 
>  2 files changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/net/phy/bcm-phy-lib.c b/drivers/net/phy/bcm-phy-lib.c
> index ab9ad689617c..8d7e21a9fa77 100644
> --- a/drivers/net/phy/bcm-phy-lib.c
> +++ b/drivers/net/phy/bcm-phy-lib.c
> @@ -241,7 +241,7 @@ int bcm_phy_downshift_get(struct phy_device *phydev, u8 *count)
>   return val;
>  
>   /* Check if wirespeed is enabled or not */
> - if (!(val & MII_BCM54XX_AUXCTL_SHDWSEL_MISC_WIRESPEED_EN)) {
> + if (!(val & MII_BCM54XX_AUXCTL_MISC_WIRESPEED_EN)) {
>   *count = DOWNSHIFT_DEV_DISABLE;
>   return 0;
>   }
> @@ -283,12 +283,12 @@ int bcm_phy_downshift_set(struct phy_device *phydev, u8 count)
>   val |= MII_BCM54XX_AUXCTL_MISC_WREN;
>  
>   if (count == DOWNSHIFT_DEV_DISABLE) {
> - val &= ~MII_BCM54XX_AUXCTL_SHDWSEL_MISC_WIRESPEED_EN;
> + val &= ~MII_BCM54XX_AUXCTL_MISC_WIRESPEED_EN;
>   return bcm54xx_auxctl_write(phydev,
>   MII_BCM54XX_AUXCTL_SHDWSEL_MISC,
>   val);
>   } else {
> - val |= MII_BCM54XX_AUXCTL_SHDWSEL_MISC_WIRESPEED_EN;
> + val |= MII_BCM54XX_AUXCTL_MISC_WIRESPEED_EN;
>   ret = bcm54xx_auxctl_write(phydev,
>  MII_BCM54XX_AUXCTL_SHDWSEL_MISC,
>  val);
> diff --git a/include/linux/brcmphy.h b/include/linux/brcmphy.h
> index bff53da82b58..a79d57caf7f1 100644
> --- a/include/linux/brcmphy.h
> +++ b/include/linux/brcmphy.h
> @@ -104,16 +104,16 @@
>  /*
>   * AUXILIARY CONTROL SHADOW ACCESS REGISTERS.  (PHY REG 0x18)
>   */
> -#define MII_BCM54XX_AUXCTL_SHDWSEL_AUXCTL  0x0000
> +#define MII_BCM54XX_AUXCTL_SHDWSEL_AUXCTL  0x00
>  #define MII_BCM54XX_AUXCTL_ACTL_TX_6DB 0x0400
>  #define MII_BCM54XX_AUXCTL_ACTL_SMDSP_ENA  0x0800
>  
> -#define MII_BCM54XX_AUXCTL_MISC_WREN   0x8000
> +#define MII_BCM54XX_AUXCTL_SHDWSEL_MISC    0x07
> +#define MII_BCM54XX_AUXCTL_MISC_WIRESPEED_EN   0x0010
>  #define MII_BCM54XX_AUXCTL_MISC_RXD_RXC_SKEW   0x0100
>  #define MII_BCM54XX_AUXCTL_MISC_FORCE_AMDIX    0x0200
> -#define MII_BCM54XX_AUXCTL_SHDWSEL_MISC    0x0007
> +#define MII_BCM54XX_AUXCTL_MISC_WREN   0x8000
>  #define MII_BCM54XX_AUXCTL_SHDWSEL_READ_SHIFT  12
> -#define MII_BCM54XX_AUXCTL_SHDWSEL_MISC_WIRESPEED_EN   (1 << 4)
>  
>  #define MII_BCM54XX_AUXCTL_SHDWSEL_MASK    0x0007
>  
> 


-- 
Florian
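A standalone sketch of the read-modify-write that bcm_phy_downshift_set() performs on the misc shadow value, using the renamed defines. The bit values are copied from the brcmphy.h hunk above; misc_wirespeed_update() is a hypothetical helper, and the MDIO access done by bcm54xx_auxctl_read()/bcm54xx_auxctl_write() is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values as they appear in include/linux/brcmphy.h in the hunk above. */
#define MII_BCM54XX_AUXCTL_SHDWSEL_MASK       0x0007
#define MII_BCM54XX_AUXCTL_MISC_WIRESPEED_EN  0x0010
#define MII_BCM54XX_AUXCTL_MISC_WREN          0x8000

/* The value bits must not overlap the shadow-select field (bits 2:0). */
static_assert(((MII_BCM54XX_AUXCTL_MISC_WIRESPEED_EN |
		MII_BCM54XX_AUXCTL_MISC_WREN) &
	       MII_BCM54XX_AUXCTL_SHDWSEL_MASK) == 0,
	      "misc value bits overlap the shadow selector");

/*
 * Hypothetical helper: given the value read back from the misc shadow
 * register, return the value to write so that wirespeed (downshift) is
 * enabled or disabled. Other misc bits are preserved, and WREN is set
 * before the write, as the patched bcm_phy_downshift_set() does.
 */
static uint16_t misc_wirespeed_update(uint16_t read_val, bool enable)
{
	uint16_t val = read_val | MII_BCM54XX_AUXCTL_MISC_WREN;

	if (enable)
		val |= MII_BCM54XX_AUXCTL_MISC_WIRESPEED_EN;
	else
		val &= (uint16_t)~MII_BCM54XX_AUXCTL_MISC_WIRESPEED_EN;

	return val;
}
```

For example, a read-back of 0x0110 (wirespeed on, RXD/RXC skew on) with enable == false yields 0x8100: skew preserved, wirespeed cleared, WREN set.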

Re: [PATCH 2/3] net: phy: broadcom: drop duplicated define for RXD to RXC delay

2017-01-25 Thread Florian Fainelli

On 01/25/2017 07:54 AM, Rafał Miłecki wrote:
> From: Rafał Miłecki 
> 
> We had two defines for the same bit (both were used with the
> MII_BCM54XX_AUXCTL_SHDWSEL_MISC register).
> 
> Signed-off-by: Rafał Miłecki 
> ---
>  drivers/net/phy/broadcom.c | 2 +-
>  include/linux/brcmphy.h| 1 -
>  2 files changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/drivers/net/phy/broadcom.c b/drivers/net/phy/broadcom.c
> index 25c6e6cea2dc..9b7c2d57ca92 100644
> --- a/drivers/net/phy/broadcom.c
> +++ b/drivers/net/phy/broadcom.c
> @@ -42,7 +42,7 @@ static int bcm54810_config(struct phy_device *phydev)
>   return rc;
>  
>   val = bcm54xx_auxctl_read(phydev, MII_BCM54XX_AUXCTL_SHDWSEL_MISC);
> - val &= ~MII_BCM54XX_AUXCTL_SHDWSEL_MISC_RGMII_SKEW_EN;
> + val &= ~MII_BCM54XX_AUXCTL_MISC_RXD_RXC_SKEW;
>   val |= MII_BCM54XX_AUXCTL_MISC_WREN;
>   rc = bcm54xx_auxctl_write(phydev, MII_BCM54XX_AUXCTL_SHDWSEL_MISC,
> val);
> diff --git a/include/linux/brcmphy.h b/include/linux/brcmphy.h
> index 34e61004b9dc..bff53da82b58 100644
> --- a/include/linux/brcmphy.h
> +++ b/include/linux/brcmphy.h
> @@ -113,7 +113,6 @@
>  #define MII_BCM54XX_AUXCTL_MISC_FORCE_AMDIX  0x0200
>  #define MII_BCM54XX_AUXCTL_SHDWSEL_MISC  0x0007
>  #define MII_BCM54XX_AUXCTL_SHDWSEL_READ_SHIFT12
> -#define MII_BCM54XX_AUXCTL_SHDWSEL_MISC_RGMII_SKEW_EN(1 << 8)
>  #define MII_BCM54XX_AUXCTL_SHDWSEL_MISC_WIRESPEED_EN (1 << 4)

Please drop the other one and keep this one instead; the SHDWSEL prefix
is intentional and correct here since it matches the datasheet.
-- 
Florian
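To make the duplication concrete: the two defines discussed in this thread encode the same bit, which a compile-time check can confirm. Both values are copied from the brcmphy.h context before this patch; the static_assert and clear_rxd_rxc_skew() are illustrations only, not something proposed for the header.

```c
#include <assert.h>

/* Both values as they appear in include/linux/brcmphy.h before this patch. */
#define MII_BCM54XX_AUXCTL_MISC_RXD_RXC_SKEW           0x0100
#define MII_BCM54XX_AUXCTL_SHDWSEL_MISC_RGMII_SKEW_EN  (1 << 8)

/* The two names refer to the same bit of the misc shadow value. */
static_assert(MII_BCM54XX_AUXCTL_MISC_RXD_RXC_SKEW ==
	      MII_BCM54XX_AUXCTL_SHDWSEL_MISC_RGMII_SKEW_EN,
	      "duplicate defines must name the same bit");

/* Hypothetical helper: clear the RXD-to-RXC skew bit, as bcm54810_config()
 * does before writing the misc shadow value back. */
static unsigned int clear_rxd_rxc_skew(unsigned int val)
{
	return val & ~MII_BCM54XX_AUXCTL_MISC_RXD_RXC_SKEW;
}
```

Because the bits are identical, whichever name is kept, the behaviour of bcm54810_config() does not change; only the naming convention is at stake.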

[PATCH 3/7] batman-adv: Remove one condition check in batadv_route_unicast_packet

2017-01-25 Thread Simon Wunderlich

From: Gao Feng 

This removes one redundant condition check by moving the statements it
guarded into the identical condition block that follows.

Signed-off-by: Gao Feng 
Signed-off-by: Sven Eckelmann 
Signed-off-by: Simon Wunderlich 
---
 net/batman-adv/routing.c | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/net/batman-adv/routing.c b/net/batman-adv/routing.c
index 6b08b26da4d9..9d657cff93de 100644
--- a/net/batman-adv/routing.c
+++ b/net/batman-adv/routing.c
@@ -719,20 +719,18 @@ static int batadv_route_unicast_packet(struct sk_buff *skb,
 
len = skb->len;
res = batadv_send_skb_to_orig(skb, orig_node, recv_if);
-   if (res == NET_XMIT_SUCCESS || res == NET_XMIT_CN)
-   ret = NET_RX_SUCCESS;
-
-   /* skb was consumed */
-   skb = NULL;
-
/* translate transmit result into receive result */
if (res == NET_XMIT_SUCCESS || res == NET_XMIT_CN) {
+   ret = NET_RX_SUCCESS;
/* skb was transmitted and consumed */
batadv_inc_counter(bat_priv, BATADV_CNT_FORWARD);
batadv_add_counter(bat_priv, BATADV_CNT_FORWARD_BYTES,
   len + ETH_HLEN);
}
 
+   /* skb was consumed */
+   skb = NULL;
+
 put_orig_node:
batadv_orig_node_put(orig_node);
 free_skb:
-- 
2.11.0
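After this cleanup, the tail of batadv_route_unicast_packet() translates a transmit result into a receive result in a single branch, and only bumps the forward counters when the packet actually went out. A standalone model of that logic follows; the counters are reduced to a hypothetical struct and translate_forward_result() is a hypothetical name, while the NET_* and ETH_HLEN constants are copied with their kernel values.

```c
#include <assert.h>
#include <stddef.h>

#define NET_RX_SUCCESS    0
#define NET_RX_DROP       1
#define NET_XMIT_SUCCESS  0x00
#define NET_XMIT_DROP     0x01
#define NET_XMIT_CN       0x02
#define ETH_HLEN          14

/* Hypothetical stand-in for the BATADV_CNT_FORWARD* counters. */
struct fwd_counters {
	unsigned long forward;
	unsigned long forward_bytes;
};

/* One check decides both the return value and the statistics update. */
static int translate_forward_result(struct fwd_counters *cnt, int res,
				    unsigned int len)
{
	int ret = NET_RX_DROP;

	assert(cnt != NULL);

	if (res == NET_XMIT_SUCCESS || res == NET_XMIT_CN) {
		ret = NET_RX_SUCCESS;
		cnt->forward++;
		cnt->forward_bytes += len + ETH_HLEN;
	}

	return ret;
}
```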

[PATCH 6/7] batman-adv: update copyright years for 2017

2017-01-25 Thread Simon Wunderlich

From: Sven Eckelmann 

Signed-off-by: Sven Eckelmann 
Signed-off-by: Simon Wunderlich 
---
 include/uapi/linux/batman_adv.h| 2 +-
 net/batman-adv/Makefile| 2 +-
 net/batman-adv/bat_algo.c  | 2 +-
 net/batman-adv/bat_algo.h  | 2 +-
 net/batman-adv/bat_iv_ogm.c| 2 +-
 net/batman-adv/bat_iv_ogm.h| 2 +-
 net/batman-adv/bat_v.c | 2 +-
 net/batman-adv/bat_v.h | 2 +-
 net/batman-adv/bat_v_elp.c | 2 +-
 net/batman-adv/bat_v_elp.h | 2 +-
 net/batman-adv/bat_v_ogm.c | 2 +-
 net/batman-adv/bat_v_ogm.h | 2 +-
 net/batman-adv/bitarray.c  | 2 +-
 net/batman-adv/bitarray.h  | 2 +-
 net/batman-adv/bridge_loop_avoidance.c | 2 +-
 net/batman-adv/bridge_loop_avoidance.h | 2 +-
 net/batman-adv/debugfs.c   | 2 +-
 net/batman-adv/debugfs.h   | 2 +-
 net/batman-adv/distributed-arp-table.c | 2 +-
 net/batman-adv/distributed-arp-table.h | 2 +-
 net/batman-adv/fragmentation.c | 2 +-
 net/batman-adv/fragmentation.h | 2 +-
 net/batman-adv/gateway_client.c| 2 +-
 net/batman-adv/gateway_client.h| 2 +-
 net/batman-adv/gateway_common.c| 2 +-
 net/batman-adv/gateway_common.h| 2 +-
 net/batman-adv/hard-interface.c| 2 +-
 net/batman-adv/hard-interface.h| 2 +-
 net/batman-adv/hash.c  | 2 +-
 net/batman-adv/hash.h  | 2 +-
 net/batman-adv/icmp_socket.c   | 2 +-
 net/batman-adv/icmp_socket.h   | 2 +-
 net/batman-adv/log.c   | 2 +-
 net/batman-adv/log.h   | 2 +-
 net/batman-adv/main.c  | 2 +-
 net/batman-adv/main.h  | 2 +-
 net/batman-adv/multicast.c | 2 +-
 net/batman-adv/multicast.h | 2 +-
 net/batman-adv/netlink.c   | 2 +-
 net/batman-adv/netlink.h   | 2 +-
 net/batman-adv/network-coding.c| 2 +-
 net/batman-adv/network-coding.h| 2 +-
 net/batman-adv/originator.c| 2 +-
 net/batman-adv/originator.h| 2 +-
 net/batman-adv/packet.h| 2 +-
 net/batman-adv/routing.c   | 2 +-
 net/batman-adv/routing.h   | 2 +-
 net/batman-adv/send.c  | 2 +-
 net/batman-adv/send.h  | 2 +-
 net/batman-adv/soft-interface.c| 2 +-
 net/batman-adv/soft-interface.h| 2 +-
 net/batman-adv/sysfs.c | 2 +-
 net/batman-adv/sysfs.h | 2 +-
 net/batman-adv/tp_meter.c  | 2 +-
 net/batman-adv/tp_meter.h  | 2 +-
 net/batman-adv/translation-table.c | 2 +-
 net/batman-adv/translation-table.h | 2 +-
 net/batman-adv/tvlv.c  | 2 +-
 net/batman-adv/tvlv.h  | 2 +-
 net/batman-adv/types.h | 2 +-
 60 files changed, 60 insertions(+), 60 deletions(-)

diff --git a/include/uapi/linux/batman_adv.h b/include/uapi/linux/batman_adv.h
index 734fe83ab645..a83ddb7b63db 100644
--- a/include/uapi/linux/batman_adv.h
+++ b/include/uapi/linux/batman_adv.h
@@ -1,4 +1,4 @@
-/* Copyright (C) 2016 B.A.T.M.A.N. contributors:
+/* Copyright (C) 2016-2017  B.A.T.M.A.N. contributors:
  *
  * Matthias Schiffer
  *
diff --git a/net/batman-adv/Makefile b/net/batman-adv/Makefile
index f724d3c98a81..915987bc6d29 100644
--- a/net/batman-adv/Makefile
+++ b/net/batman-adv/Makefile
@@ -1,5 +1,5 @@
 #
-# Copyright (C) 2007-2016  B.A.T.M.A.N. contributors:
+# Copyright (C) 2007-2017  B.A.T.M.A.N. contributors:
 #
 # Marek Lindner, Simon Wunderlich
 #
diff --git a/net/batman-adv/bat_algo.c b/net/batman-adv/bat_algo.c
index 623d04302aa2..44fd073b7546 100644
--- a/net/batman-adv/bat_algo.c
+++ b/net/batman-adv/bat_algo.c
@@ -1,4 +1,4 @@
-/* Copyright (C) 2007-2016  B.A.T.M.A.N. contributors:
+/* Copyright (C) 2007-2017  B.A.T.M.A.N. contributors:
  *
  * Marek Lindner, Simon Wunderlich
  *
diff --git a/net/batman-adv/bat_algo.h b/net/batman-adv/bat_algo.h
index 3b5b69cdd12b..29f6312f9bf1 100644
--- a/net/batman-adv/bat_algo.h
+++ b/net/batman-adv/bat_algo.h
@@ -1,4 +1,4 @@
-/* Copyright (C) 2011-2016  B.A.T.M.A.N. contributors:
+/* Copyright (C) 2011-2017  B.A.T.M.A.N. contributors:
  *
  * Marek Lindner, Linus Lüssing
  *
diff --git a/net/batman-adv/bat_iv_ogm.c b/net/batman-adv/bat_iv_ogm.c
index f00f666e2ccd..7c3d994e90d8 100644
--- a/net/batman-adv/bat_iv_ogm.c
+++ b/net/batman-adv/bat_iv_ogm.c
@@ -1,4 +1,4 @@
-/* Copyright (C) 2007-2016  B.A.T.M.A.N. contributors:
+/* Copyright (C) 2007-2017  B.A.T.M.A.N. contributors:
  *
  * Marek Lindner, Simon Wunderlich
  *
diff --git a/net/batman-adv/bat_iv_ogm.h b/net/batman-adv/bat_iv_ogm.h
index b9f3550faaf7..ae2ab526bdb1 100644
--- a/net/batman-adv/bat_iv_ogm.h
+++ b/net/batman-adv/bat_iv_ogm.h
@@ -1,4 +1,4 @@
-/* Copyright (C) 2007-2016  B.A.T.M.A.N. contributors:
+/* Copyright (C) 2007-2017  B.A.T.M.A.N. contributors:
  *
  * Marek

[PATCH 4/7] batman-adv: don't add loop detect macs to TT

2017-01-25 Thread Simon Wunderlich

From: Simon Wunderlich 

The bridge loop avoidance (BLA) feature of batman-adv sends packets to
probe for Mesh/LAN packet loops. Those packets are not sent by real
clients and should therefore not be added to the translation table (TT).

Signed-off-by: Simon Wunderlich 
---
 net/batman-adv/bridge_loop_avoidance.h | 18 ++
 net/batman-adv/soft-interface.c|  3 ++-
 2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/net/batman-adv/bridge_loop_avoidance.h b/net/batman-adv/bridge_loop_avoidance.h
index 1ae93e46fb98..2827cd3c13d2 100644
--- a/net/batman-adv/bridge_loop_avoidance.h
+++ b/net/batman-adv/bridge_loop_avoidance.h
@@ -20,6 +20,8 @@
 
 #include "main.h"
 
+#include 
+#include 
 #include 
 
 struct net_device;
@@ -27,6 +29,22 @@ struct netlink_callback;
 struct seq_file;
 struct sk_buff;
 
+/**
+ * batadv_bla_is_loopdetect_mac - check if the mac address is from a loop detect
+ *  frame sent by bridge loop avoidance
+ * @mac: mac address to check
+ *
+ * Return: true if it looks like a loop detect frame
+ * (mac starts with BA:BE), false otherwise
+ */
+static inline bool batadv_bla_is_loopdetect_mac(const uint8_t *mac)
+{
+   if (mac[0] == 0xba && mac[1] == 0xbe)
+   return true;
+
+   return false;
+}
+
 #ifdef CONFIG_BATMAN_ADV_BLA
 bool batadv_bla_rx(struct batadv_priv *bat_priv, struct sk_buff *skb,
   unsigned short vid, bool is_bcast);
diff --git a/net/batman-adv/soft-interface.c b/net/batman-adv/soft-interface.c
index 60516bbb7e83..4e447bf17332 100644
--- a/net/batman-adv/soft-interface.c
+++ b/net/batman-adv/soft-interface.c
@@ -258,7 +258,8 @@ static int batadv_interface_tx(struct sk_buff *skb,
ethhdr = eth_hdr(skb);
 
/* Register the client MAC in the transtable */
-   if (!is_multicast_ether_addr(ethhdr->h_source)) {
+   if (!is_multicast_ether_addr(ethhdr->h_source) &&
+   !batadv_bla_is_loopdetect_mac(ethhdr->h_source)) {
client_added = batadv_tt_local_add(soft_iface, ethhdr->h_source,
   vid, skb->skb_iif,
   skb->mark);
-- 
2.11.0
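A standalone sketch of the source-MAC filter that batadv_interface_tx() applies after this patch. Only the BA:BE prefix test is taken directly from the patch; mac_is_multicast() reimplements the group-bit test performed by is_multicast_ether_addr(), and all three helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Group (multicast) bit: least significant bit of the first octet. */
static bool mac_is_multicast(const uint8_t *mac)
{
	return (mac[0] & 0x01) != 0;
}

/* Mirrors batadv_bla_is_loopdetect_mac() from the patch. */
static bool mac_is_loopdetect(const uint8_t *mac)
{
	return mac[0] == 0xba && mac[1] == 0xbe;
}

/* Should this source MAC be registered in the local translation table? */
static bool should_add_to_tt(const uint8_t *mac)
{
	assert(mac != NULL);
	return !mac_is_multicast(mac) && !mac_is_loopdetect(mac);
}
```

Note that 0xba has the group bit clear, so loop detect frames would pass the multicast check alone; that is why the extra prefix test is needed.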

[PATCH 0/7] pull request for net-next: batman-adv 2017-01-25

2017-01-25 Thread Simon Wunderlich


Hi David,

here is a small feature/cleanup pull request of batman-adv to go into net-next.

Please pull or let me know of any problem!

Thank you,
  Simon

The following changes since commit 7ce7d89f48834cefece7804d38fc5d85382edf77:

  Linux 4.10-rc1 (2016-12-25 16:13:08 -0800)

are available in the git repository at:

  git://git.open-mesh.org/linux-merge.git tags/batadv-next-for-davem-20170125

for you to fetch changes up to 03d17903cbfdc3bdf6e08c9fb6603936135bba5b:

  batman-adv: Remove unused variable in batadv_tt_local_set_flags (2017-01-04 08:20:21 +0100)


This feature/cleanup patchset includes the following patches:

 - bump version strings, by Simon Wunderlich

 - Handle NET_XMIT_CN like NET_XMIT_SUCCESS, and a follow-up code cleanup,
   by Gao Feng (2 patches)

 - ignore self-generated loop detect MAC addresses in translation table,
   by Simon Wunderlich

 - install uapi batman_adv.h header, by Sven Eckelmann

 - bump copyright years, by Sven Eckelmann

 - Remove an unused variable in translation table code, by Sven Eckelmann


Gao Feng (2):
  batman-adv: Treat NET_XMIT_CN as transmit successfully
  batman-adv: Remove one condition check in batadv_route_unicast_packet

Simon Wunderlich (2):
  batman-adv: Start new development cycle
  batman-adv: don't add loop detect macs to TT

Sven Eckelmann (3):
  uapi: install batman_adv.h header
  batman-adv: update copyright years for 2017
  batman-adv: Remove unused variable in batadv_tt_local_set_flags

 include/uapi/linux/Kbuild  |  1 +
 include/uapi/linux/batman_adv.h|  2 +-
 net/batman-adv/Makefile|  2 +-
 net/batman-adv/bat_algo.c  |  2 +-
 net/batman-adv/bat_algo.h  |  2 +-
 net/batman-adv/bat_iv_ogm.c|  2 +-
 net/batman-adv/bat_iv_ogm.h|  2 +-
 net/batman-adv/bat_v.c |  2 +-
 net/batman-adv/bat_v.h |  2 +-
 net/batman-adv/bat_v_elp.c |  2 +-
 net/batman-adv/bat_v_elp.h |  2 +-
 net/batman-adv/bat_v_ogm.c |  2 +-
 net/batman-adv/bat_v_ogm.h |  2 +-
 net/batman-adv/bitarray.c  |  2 +-
 net/batman-adv/bitarray.h  |  2 +-
 net/batman-adv/bridge_loop_avoidance.c |  2 +-
 net/batman-adv/bridge_loop_avoidance.h | 20 +++-
 net/batman-adv/debugfs.c   |  2 +-
 net/batman-adv/debugfs.h   |  2 +-
 net/batman-adv/distributed-arp-table.c |  5 +++--
 net/batman-adv/distributed-arp-table.h |  2 +-
 net/batman-adv/fragmentation.c |  4 ++--
 net/batman-adv/fragmentation.h |  2 +-
 net/batman-adv/gateway_client.c|  2 +-
 net/batman-adv/gateway_client.h|  2 +-
 net/batman-adv/gateway_common.c|  2 +-
 net/batman-adv/gateway_common.h|  2 +-
 net/batman-adv/hard-interface.c|  2 +-
 net/batman-adv/hard-interface.h|  2 +-
 net/batman-adv/hash.c  |  2 +-
 net/batman-adv/hash.h  |  2 +-
 net/batman-adv/icmp_socket.c   |  2 +-
 net/batman-adv/icmp_socket.h   |  2 +-
 net/batman-adv/log.c   |  2 +-
 net/batman-adv/log.h   |  2 +-
 net/batman-adv/main.c  |  2 +-
 net/batman-adv/main.h  |  4 ++--
 net/batman-adv/multicast.c |  2 +-
 net/batman-adv/multicast.h |  2 +-
 net/batman-adv/netlink.c   |  2 +-
 net/batman-adv/netlink.h   |  2 +-
 net/batman-adv/network-coding.c|  2 +-
 net/batman-adv/network-coding.h|  2 +-
 net/batman-adv/originator.c|  2 +-
 net/batman-adv/originator.h|  2 +-
 net/batman-adv/packet.h|  2 +-
 net/batman-adv/routing.c   | 20 +---
 net/batman-adv/routing.h   |  2 +-
 net/batman-adv/send.c  |  2 +-
 net/batman-adv/send.h  |  2 +-
 net/batman-adv/soft-interface.c|  7 ---
 net/batman-adv/soft-interface.h|  2 +-
 net/batman-adv/sysfs.c |  2 +-
 net/batman-adv/sysfs.h |  2 +-
 net/batman-adv/tp_meter.c  |  4 ++--
 net/batman-adv/tp_meter.h  |  2 +-
 net/batman-adv/translation-table.c |  4 +---
 net/batman-adv/translation-table.h |  2 +-
 net/batman-adv/tvlv.c  |  2 +-
 net/batman-adv/tvlv.h  |  2 +-
 net/batman-adv/types.h |  2 +-
 61 files changed, 95 insertions(+), 78 deletions(-)

[PATCH 5/7] uapi: install batman_adv.h header

2017-01-25 Thread Simon Wunderlich

From: Sven Eckelmann 

09748a22f4ab ("batman-adv: add generic netlink family for batman-adv")
introduced the new batman_adv.h which describes the netlink attributes and
commands of batman-adv. But the Kbuild entry to install the header was not
added.

All currently known tools ship their own copy of batman_adv.h, but the
header should be installed anyway so that they can later migrate to the
system batman_adv.h.

Signed-off-by: Sven Eckelmann 
Signed-off-by: Simon Wunderlich 
---
 include/uapi/linux/Kbuild | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/uapi/linux/Kbuild b/include/uapi/linux/Kbuild
index a8b93e685239..7fdceb2ac5b7 100644
--- a/include/uapi/linux/Kbuild
+++ b/include/uapi/linux/Kbuild
@@ -64,6 +64,7 @@ header-y += auto_fs.h
 header-y += auxvec.h
 header-y += ax25.h
 header-y += b1lli.h
+header-y += batman_adv.h
 header-y += baycom.h
 header-y += bcm933xx_hcs.h
 header-y += bfs_fs.h
-- 
2.11.0

[PATCH 1/7] batman-adv: Start new development cycle

2017-01-25 Thread Simon Wunderlich

Signed-off-by: Simon Wunderlich 
---
 net/batman-adv/main.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/batman-adv/main.h b/net/batman-adv/main.h
index a6cc8040a21d..8683542067ba 100644
--- a/net/batman-adv/main.h
+++ b/net/batman-adv/main.h
@@ -24,7 +24,7 @@
 #define BATADV_DRIVER_DEVICE "batman-adv"
 
 #ifndef BATADV_SOURCE_VERSION
-#define BATADV_SOURCE_VERSION "2016.5"
+#define BATADV_SOURCE_VERSION "2017.0"
 #endif
 
 /* B.A.T.M.A.N. parameters */
-- 
2.11.0

[PATCH 2/7] batman-adv: Treat NET_XMIT_CN as transmit successfully

2017-01-25 Thread Simon Wunderlich

From: Gao Feng 

tc may return NET_XMIT_CN as a congestion notification, but that does
not mean the packet was lost. Other modules, such as ipvlan and macvlan,
treat NET_XMIT_CN as success too.

So batman-adv should treat NET_XMIT_CN as success as well.

Signed-off-by: Gao Feng 
Signed-off-by: Sven Eckelmann 
Signed-off-by: Simon Wunderlich 
---
 net/batman-adv/distributed-arp-table.c |  3 ++-
 net/batman-adv/fragmentation.c |  2 +-
 net/batman-adv/routing.c   | 10 +-
 net/batman-adv/soft-interface.c|  2 +-
 net/batman-adv/tp_meter.c  |  2 +-
 5 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/net/batman-adv/distributed-arp-table.c b/net/batman-adv/distributed-arp-table.c
index 49576c5a3fe3..3641765d55df 100644
--- a/net/batman-adv/distributed-arp-table.c
+++ b/net/batman-adv/distributed-arp-table.c
@@ -659,7 +659,8 @@ static bool batadv_dat_send_data(struct batadv_priv *bat_priv,
}
 
send_status = batadv_send_unicast_skb(tmp_skb, neigh_node);
-   if (send_status == NET_XMIT_SUCCESS) {
+   if (send_status == NET_XMIT_SUCCESS ||
+   send_status == NET_XMIT_CN) {
/* count the sent packet */
switch (packet_subtype) {
case BATADV_P_DAT_DHT_GET:
diff --git a/net/batman-adv/fragmentation.c b/net/batman-adv/fragmentation.c
index 9c561e683f4b..52396160360b 100644
--- a/net/batman-adv/fragmentation.c
+++ b/net/batman-adv/fragmentation.c
@@ -509,7 +509,7 @@ int batadv_frag_send_packet(struct sk_buff *skb,
batadv_add_counter(bat_priv, BATADV_CNT_FRAG_TX_BYTES,
   skb_fragment->len + ETH_HLEN);
ret = batadv_send_unicast_skb(skb_fragment, neigh_node);
-   if (ret != NET_XMIT_SUCCESS) {
+   if (ret != NET_XMIT_SUCCESS && ret != NET_XMIT_CN) {
ret = NET_XMIT_DROP;
goto free_skb;
}
diff --git a/net/batman-adv/routing.c b/net/batman-adv/routing.c
index 6713bdf414cd..6b08b26da4d9 100644
--- a/net/batman-adv/routing.c
+++ b/net/batman-adv/routing.c
@@ -262,7 +262,7 @@ static int batadv_recv_my_icmp_packet(struct batadv_priv *bat_priv,
icmph->ttl = BATADV_TTL;
 
res = batadv_send_skb_to_orig(skb, orig_node, NULL);
-   if (res == NET_XMIT_SUCCESS)
+   if (res == NET_XMIT_SUCCESS || res == NET_XMIT_CN)
ret = NET_RX_SUCCESS;
 
/* skb was consumed */
@@ -330,7 +330,7 @@ static int batadv_recv_icmp_ttl_exceeded(struct batadv_priv *bat_priv,
icmp_packet->ttl = BATADV_TTL;
 
res = batadv_send_skb_to_orig(skb, orig_node, NULL);
-   if (res == NET_RX_SUCCESS)
+   if (res == NET_RX_SUCCESS || res == NET_XMIT_CN)
ret = NET_XMIT_SUCCESS;
 
/* skb was consumed */
@@ -424,7 +424,7 @@ int batadv_recv_icmp_packet(struct sk_buff *skb,
 
/* route it */
res = batadv_send_skb_to_orig(skb, orig_node, recv_if);
-   if (res == NET_XMIT_SUCCESS)
+   if (res == NET_XMIT_SUCCESS || res == NET_XMIT_CN)
ret = NET_RX_SUCCESS;
 
/* skb was consumed */
@@ -719,14 +719,14 @@ static int batadv_route_unicast_packet(struct sk_buff *skb,
 
len = skb->len;
res = batadv_send_skb_to_orig(skb, orig_node, recv_if);
-   if (res == NET_XMIT_SUCCESS)
+   if (res == NET_XMIT_SUCCESS || res == NET_XMIT_CN)
ret = NET_RX_SUCCESS;
 
/* skb was consumed */
skb = NULL;
 
/* translate transmit result into receive result */
-   if (res == NET_XMIT_SUCCESS) {
+   if (res == NET_XMIT_SUCCESS || res == NET_XMIT_CN) {
/* skb was transmitted and consumed */
batadv_inc_counter(bat_priv, BATADV_CNT_FORWARD);
batadv_add_counter(bat_priv, BATADV_CNT_FORWARD_BYTES,
diff --git a/net/batman-adv/soft-interface.c b/net/batman-adv/soft-interface.c
index 7b3494ae6ad9..60516bbb7e83 100644
--- a/net/batman-adv/soft-interface.c
+++ b/net/batman-adv/soft-interface.c
@@ -386,7 +386,7 @@ static int batadv_interface_tx(struct sk_buff *skb,
ret = batadv_send_skb_via_tt(bat_priv, skb, dst_hint,
 vid);
}
-   if (ret != NET_XMIT_SUCCESS)
+   if (ret != NET_XMIT_SUCCESS && ret != NET_XMIT_CN)
goto dropped_freed;
}
 
diff --git a/net/batman-adv/tp_meter.c b/net/batman-adv/tp_meter.c
index 981e8c5b07e9..c367c8316a82 100644
--- a/net/batman-adv/tp_meter.c
+++ b/net/batman-adv/tp_meter.c
@@ -615,7 +615,7 @@ static int batadv_tp_send_msg(struct batadv_tp_vars *tp_vars, const u8 *src,
batadv_tp_fill_prerandom(tp_vars, data, data_len);
 
r = batadv_send_skb_to_orig(skb, orig_node, NULL);
-
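The recurring pattern in this patch is "treat NET_XMIT_CN as success". The kernel already provides a net_xmit_eval() macro in include/linux/netdevice.h that folds NET_XMIT_CN into 0; the standalone sketch below mirrors that idea outside the kernel, with the constants copied using their kernel values. The helper name batadv_xmit_ok() is hypothetical and not part of batman-adv.

```c
#include <assert.h>
#include <stdbool.h>

#define NET_XMIT_SUCCESS  0x00
#define NET_XMIT_DROP     0x01
#define NET_XMIT_CN       0x02

/* Same idea as the kernel's net_xmit_eval(): a congestion notification
 * is not a loss, so fold it into 0. */
#define net_xmit_eval(e)  ((e) == NET_XMIT_CN ? 0 : (e))

/* Folding CN into 0 only means "success" because NET_XMIT_SUCCESS is 0. */
static_assert(NET_XMIT_SUCCESS == 0, "NET_XMIT_SUCCESS must be 0");

/* Hypothetical helper: true when the packet was handed off, even if the
 * qdisc signalled congestion. */
static bool batadv_xmit_ok(int res)
{
	return net_xmit_eval(res) == NET_XMIT_SUCCESS;
}
```

With such a helper, each of the open-coded `res == NET_XMIT_SUCCESS || res == NET_XMIT_CN` tests above would become a single call; whether that is preferable to the explicit form is a style choice.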

[PATCH 7/7] batman-adv: Remove unused variable in batadv_tt_local_set_flags

2017-01-25 Thread Simon Wunderlich

From: Sven Eckelmann 

Signed-off-by: Sven Eckelmann 
Signed-off-by: Simon Wunderlich 
---
 net/batman-adv/translation-table.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/net/batman-adv/translation-table.c b/net/batman-adv/translation-table.c
index 941afad92121..6077a87d46f0 100644
--- a/net/batman-adv/translation-table.c
+++ b/net/batman-adv/translation-table.c
@@ -3714,7 +3714,6 @@ static void batadv_tt_local_set_flags(struct batadv_priv *bat_priv, u16 flags,
 {
struct batadv_hashtable *hash = bat_priv->tt.local_hash;
struct batadv_tt_common_entry *tt_common_entry;
-   u16 changed_num = 0;
struct hlist_head *head;
u32 i;
 
@@ -3736,7 +3735,6 @@ static void batadv_tt_local_set_flags(struct batadv_priv *bat_priv, u16 flags,
continue;
tt_common_entry->flags &= ~flags;
}
-   changed_num++;
 
if (!count)
continue;
-- 
2.11.0

Re: [PATCH/RFC v3 net] ravb: unmap descriptors when freeing rings

2017-01-25 Thread Simon Horman

On Wed, Jan 25, 2017 at 07:05:08PM +0300, Sergei Shtylyov wrote:
> Hello.
> 
> On 01/24/2017 09:21 PM, Simon Horman wrote:
> 
> >From: Kazuya Mizuguchi 
> >
> >"swiotlb buffer is full" errors occur after repeated initialisation of a
> >device - e.g. suspend/resume or ip link set up/down. This is because memory
> >mapped using dma_map_single() in ravb_ring_format() and ravb_start_xmit()
> >is not released.  Resolve this problem by unmapping descriptors when
> >freeing rings.
> >
> >Note, ravb_tx_free() is moved but not otherwise modified by this patch.
> >
> >Signed-off-by: Kazuya Mizuguchi 
> >[simon: reworked]
> >Signed-off-by: Simon Horman 
> >--
> >v3 [Simon Horman]
> >* As suggested by Sergei Shtylyov
> >  - consistently use le32_to_cpu(desc->dptr)
> >  - Do not clear desc->ds_cc as it is not used
> >* Parameterise ravb_tx_free() to allow it to free non-transmitted buffers
> >
> >v2 [Simon Horman]
> >* As suggested by Sergei Shtylyov
> >  - Use dma_mapping_error() and rx_desc->ds_cc when unmapping RX descriptors;
> >this is consistent with the way that they are mapped
> >  - Use ravb_tx_free() to clear TX descriptors
> >* Reduce scope of new local variable
> >
> >v1 [Kazuya Mizuguchi]
> >---
> > drivers/net/ethernet/renesas/ravb_main.c | 113 ++-
> > 1 file changed, 65 insertions(+), 48 deletions(-)
> >
> >diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
> >index 89ac1e3f6175..57fe1411bb9d 100644
> >--- a/drivers/net/ethernet/renesas/ravb_main.c
> >+++ b/drivers/net/ethernet/renesas/ravb_main.c
> >@@ -179,6 +179,51 @@ static struct mdiobb_ops bb_ops = {
> > .get_mdio_data = ravb_get_mdio_data,
> > };
> >
> >+enum ravb_tx_free_mode {
> >+ravb_tx_free_all,
> >+ravb_tx_free_txed_only,
> >+};
> >+
> >+/* Free TX skb function for AVB-IP */
> >+static int ravb_tx_free(struct net_device *ndev, int q,
> >+enum ravb_tx_free_mode free_mode)
> 
>Hmm... Sorry, but this looks over-engineered. A *bool* parameter (named
> e.g. 'all') would suffice IMHO.

Ha! The last time I used a bool for something like this I was encouraged
to use an enum. Admittedly that was not kernel code, but I was unsure
which way to go this time. I'll change things to bool as you suggest.

> >+{
> >+struct ravb_private *priv = netdev_priv(ndev);
> >+struct net_device_stats *stats = &priv->stats[q];
> >+struct ravb_tx_desc *desc;
> >+int free_num = 0;
> >+int entry;
> >+u32 size;
> >+
> >+for (; priv->cur_tx[q] - priv->dirty_tx[q] > 0; priv->dirty_tx[q]++) {
> >+entry = priv->dirty_tx[q] % (priv->num_tx_ring[q] *
> >+ NUM_TX_DESC);
> >+desc = &priv->tx_ring[q][entry];
> >+if (free_mode == ravb_tx_free_txed_only &&
> >+desc->die_dt != DT_FEMPTY)
> >+break;
> >+/* Descriptor type must be checked before all other reads */
> >+dma_rmb();
> >+size = le16_to_cpu(desc->ds_tagl) & TX_DS;
> >+/* Free the original skb. */
> >+if (priv->tx_skb[q][entry / NUM_TX_DESC]) {
> >+dma_unmap_single(ndev->dev.parent, le32_to_cpu(desc->dptr),
> >+ size, DMA_TO_DEVICE);
> >+/* Last packet descriptor? */
> >+if (entry % NUM_TX_DESC == NUM_TX_DESC - 1) {
> >+entry /= NUM_TX_DESC;
> >+dev_kfree_skb_any(priv->tx_skb[q][entry]);
> >+priv->tx_skb[q][entry] = NULL;
> >+stats->tx_packets++;
> >+}
> >+free_num++;
> >+}
> >+stats->tx_bytes += size;
> 
>Hmmm... we shouldn't count the discarded unsent packets/bytes as sent, 
> right?

Yes, I think so. Sorry for missing that.

> [...]
> >@@ -215,12 +262,19 @@ static void ravb_ring_free(struct net_device *ndev, int q)
> > }
> >
> > if (priv->tx_ring[q]) {
> >+ravb_tx_free(ndev, q, ravb_tx_free_all);
> >+
> > ring_size = sizeof(struct ravb_tx_desc) *
> > (priv->num_tx_ring[q] * NUM_TX_DESC + 1);
> > dma_free_coherent(ndev->dev.parent, ring_size, priv->tx_ring[q],
> >   priv->tx_desc_dma[q]);
> > priv->tx_ring[q] = NULL;
> > }
> >+
> >+/* Free TX skb ringbuffer.
> >+ * SKBs are freed by ravb_tx_free() call above. */
> 
>This is not a recommended comment format:
> 
> /* bla
>  * bla
>  */

Thanks, I will fix that.
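A userspace model of ravb_tx_free() with both review points applied: a plain bool selects between "reclaim only completed descriptors" and "reclaim everything", and buffers discarded at ring teardown are not counted as transmitted. The ring layout, struct names and the one-descriptor-per-packet simplification are assumptions for illustration; the real function also unmaps DMA and frees skbs, which is only marked by a comment here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 8

/* Hypothetical, heavily simplified TX slot: one descriptor per packet. */
struct tx_slot {
	bool in_use;      /* a buffer is mapped and still owned by the driver */
	bool sent;        /* hardware reported completion (DT_FEMPTY in ravb) */
	unsigned int len;
};

struct tx_ring {
	struct tx_slot slot[RING_SIZE];
	unsigned int cur;     /* next slot to fill */
	unsigned int dirty;   /* oldest slot not yet reclaimed */
	unsigned long tx_packets;
	unsigned long tx_bytes;
};

/* With all == false, stop at the first slot the hardware has not
 * completed; with all == true (ring teardown), reclaim every slot.
 * Only completed packets are added to the statistics. */
static int tx_free(struct tx_ring *ring, bool all)
{
	int free_num = 0;

	assert(ring != NULL);

	for (; ring->cur != ring->dirty; ring->dirty++) {
		struct tx_slot *s = &ring->slot[ring->dirty % RING_SIZE];
		bool txed = s->sent;

		if (!all && !txed)
			break;

		if (s->in_use) {
			/* dma_unmap_single() and dev_kfree_skb_any() go here */
			s->in_use = false;
			free_num++;
			if (txed) {
				ring->tx_packets++;
				ring->tx_bytes += s->len;
			}
		}
		s->sent = false;
	}

	return free_num;
}
```

The completion path would call tx_free(ring, false) and ring teardown tx_free(ring, true); in the second case a still-pending packet is released but does not inflate tx_packets or tx_bytes.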

Re: [PATCH/RFC v3 net] ravb: unmap descriptors when freeing rings

2017-01-25 Thread Simon Horman

On Wed, Jan 25, 2017 at 07:18:15PM +0300, Sergei Shtylyov wrote:
> On 01/24/2017 09:21 PM, Simon Horman wrote:
> 
> >From: Kazuya Mizuguchi 
> >
> >"swiotlb buffer is full" errors occur after repeated initialisation of a
> >device - e.g. suspend/resume or ip link set up/down. This is because memory
> >mapped using dma_map_single() in ravb_ring_format() and ravb_start_xmit()
> >is not released.  Resolve this problem by unmapping descriptors when
> >freeing rings.
> 
>Could you look into the sh_eth driver which seems to have the same issue?

Sure, I will check.

> >Note, ravb_tx_free() is moved but not otherwise modified by this patch.
> 
>This is not true anymore BTW.

Thanks for noticing, I'll fix that.

> >Signed-off-by: Kazuya Mizuguchi 
> >[simon: reworked]
> >Signed-off-by: Simon Horman 
> 
> MBR, Sergei
>

Re: [PATCH net-next] xen-netfront: reject short packets and handle non-linear packets

2017-01-25 Thread Eric Dumazet

On Wed, 2017-01-25 at 16:26 +, Paul Durrant wrote:
> Sowmini points out two vulnerabilities in xen-netfront:
> 
> a) The code assumes that skb->len is at least ETH_HLEN.
> b) The code assumes that at least ETH_HLEN octets are in the linear
>portion of the socket buffer.
> 
> This patch adds tests for both of these, and in the case of the latter
> pulls sufficient bytes into the linear area.
> 
> Signed-off-by: Paul Durrant 
> Reported-by: Sowmini Varadhan 
> Tested-by: Sowmini Varadhan 
> ---
> Cc: Boris Ostrovsky 
> Cc: Juergen Gross 
> ---
>  drivers/net/xen-netfront.c | 9 +
>  1 file changed, 9 insertions(+)
> 
> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> index 40f26b6..0478809 100644
> --- a/drivers/net/xen-netfront.c
> +++ b/drivers/net/xen-netfront.c
> @@ -567,6 +567,10 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
>   u16 queue_index;
>   struct sk_buff *nskb;
>  
> + /* Basic sanity check */
> + if (unlikely(skb->len < ETH_HLEN))
> + goto drop;
> +
>   /* Drop the packet if no queues are set up */
>   if (num_queues < 1)
>   goto drop;
> @@ -609,6 +613,11 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
>   }
>  
>   len = skb_headlen(skb);
> + if (unlikely(len < ETH_HLEN)) {
> + if (!__pskb_pull_tail(skb, ETH_HLEN - len))
> + goto drop;
> + len = ETH_HLEN;
> + }

This looks like duplicated code, and it is buggy, considering the code above:

page = virt_to_page(skb->data);
offset = offset_in_page(skb->data);

Your patch might end up reallocating skb->data/head, in which case a
use-after-free would happen.

What about something like this?

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 40f26b69beb11459f0566fc1d1d739aa75e643bf..99a67fe4de86d3141169143b0820d00968cb09f2 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -583,6 +583,8 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
skb->len);
goto drop;
}
+   if (!pskb_may_pull(skb, ETH_HLEN))
+   goto drop;
 
slots = xennet_count_skb_slots(skb);
if (unlikely(slots > MAX_XEN_SKB_FRAGS + 1)) {
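The ordering problem can be shown with a userspace analogy: a pull that may reallocate the linear area invalidates any pointer computed from the old data pointer beforehand, which is what page/offset are in xennet_start_xmit(). The struct pkt type and both helpers are hypothetical; pkt_may_pull() only imitates the contract of pskb_may_pull() (fail if fewer than len bytes exist, otherwise make them linear, possibly moving the data) and is not the kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define ETH_HLEN 14

/* Hypothetical packet: a linear head plus non-linear trailing bytes. */
struct pkt {
	unsigned char *data;        /* linear area */
	size_t headlen;             /* bytes currently in the linear area */
	const unsigned char *frag;  /* remaining, non-linear bytes */
	size_t fraglen;
};

/* Analogy of pskb_may_pull(): ensure at least len bytes are linear.
 * May move p->data, so pointers derived from it earlier become stale. */
static bool pkt_may_pull(struct pkt *p, size_t len)
{
	size_t need;
	unsigned char *nd;

	if (len <= p->headlen)
		return true;
	need = len - p->headlen;
	if (need > p->fraglen)
		return false;
	nd = realloc(p->data, len);
	if (!nd)
		return false;
	memcpy(nd + p->headlen, p->frag, need);
	p->data = nd;
	p->headlen = len;
	p->frag += need;
	p->fraglen -= need;
	return true;
}

/* Validate and pull first; only then derive anything from p->data. */
static const unsigned char *xmit_prepare(struct pkt *p)
{
	if (p->headlen + p->fraglen < ETH_HLEN)
		return NULL;            /* too short: drop, as in check a) */
	if (!pkt_may_pull(p, ETH_HLEN))
		return NULL;
	return p->data;         /* valid only after the pull */
}
```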

Re: [RFC PATCH net-next 0/5] bridge: per vlan lwt and dst_metadata support

2017-01-25 Thread Roopa Prabhu

On 1/24/17, 7:47 AM, Stephen Hemminger wrote:
> On Fri, 20 Jan 2017 21:46:51 -0800
> Roopa Prabhu  wrote:
>
>> From: Roopa Prabhu 
>>
>> High level summary:
>> lwt and dst_metadata/collect_metadata have enabled vxlan l3 deployments
>> to use a single vxlan netdev for multiple vnis eliminating the scalability
>> problem with using a single vxlan netdev per vni. This series tries to
>> do the same for vxlan netdevs in pure l2 bridged networks.
>> Use-case/deployment and details are below.
>>
>> Deployment scenario details:
>> As we know VXLAN is used to build layer 2 virtual networks across the
>> underlay layer3 infrastructure. A VXLAN tunnel endpoint (VTEP)
>> originates and terminates VXLAN tunnels. And a VTEP can be a TOR switch
>> or a vswitch in the hypervisor. This patch series mainly
>> focuses on the TOR switch configured as a Vtep. Vxlan segment ID (vni)
>> along with vlan id is used to identify layer 2 segments in a vxlan
>> overlay network. Vxlan bridging is the function provided by Vteps to terminate
>> vxlan tunnels and map the vxlan vni to traditional end host vlan. This is
>> covered in the "VXLAN Deployment Scenarios" in sections 6 and 6.1 in RFC 7348.
>> To provide vxlan bridging function, a vtep has to map vlan to a vni. The rfc
>> says that the ingress VTEP device shall remove the IEEE 802.1Q VLAN tag in
>> the original Layer 2 packet if there is one before encapsulating the packet
>> into the VXLAN format to transmit it through the underlay network. The remote
>> VTEP devices have information about the VLAN in which the packet will be
>> placed based on their own VLAN-to-VXLAN VNI mapping configurations.
>>
>> Existing solution:
>> Without this patch series, one can deploy such a vtep configuration
>> by adding the local ports and vxlan netdevs into a vlan filtering bridge.
>> The local ports are configured as trunk ports carrying all vlans.
>> A vxlan netdev per vni is added to the bridge. Vlan mapping to vni is
>> achieved by configuring the vlan as pvid on the corresponding vxlan netdev.
>> The vxlan netdev only receives traffic corresponding to the vlan it is mapped
>> to. This configuration maps traffic belonging to a vlan to the corresponding
>> vxlan segment.
>>
>>   ---
>>  |  bridge   |
>>  |   |
>>   ---
>> |100,200   |100 (pvid)|200 (pvid)
>> |  |  |
>>swp1  vxlan1000  vxlan2000
>> 
>> This provides the required vxlan bridging function but poses a
>> scalability problem with using a single vxlan netdev for each vni.
>>
>> Solution in this patch series:
>> The goal is to use a single vxlan device to carry all vnis, similar
>> to the vxlan collect metadata mode, but with the vxlan driver still
>> carrying all the forwarding information.
>> - vxlan driver changes:
>> - enable collect metadata mode device to be used with learning,
>>   replication, fdb
>> - A single fdb table hashed by (mac, vni)
>> - rx path already has the vni
>> - tx path expects a vni in the packet with dst_metadata and vxlan
>>   driver has all the forwarding information for the vni in the
>>   dst_metadata.
>>
>> - Bridge driver changes: per vlan LWT and dst_metadata support:
>> - Our use case is vxlan and 1-1 mapping between vlan and vni, but I have
>>   kept the api generic for any tunnel info
>> - Uapi to configure/unconfigure/dump per vlan tunnel data
>> - new bridge port flag to turn this feature on/off. off by default
>> - ingress hook:
>> - if port is a lwt tunnel port, use tunnel info in
>>   attached dst_metadata to map it to a local vlan
>> - egress hook:
>> - if port is a lwt tunnel port, use tunnel info attached to vlan
>>   to set dst_metadata on the skb
>>
>> Other approaches tried and vetoed:
>> - tc vlan push/pop and tunnel metadata dst:
>> - poses a tc rule scalability problem (2 rules per vni)
>> - cannot handle the case where a packet needs to be replicated to
>>   multiple vxlan remote tunnel end-points, which the vxlan driver
>>   can do today by having multiple remote destinations per fdb.
>> - making vxlan driver understand vlan-vni mapping:
>> - I had a series almost ready with this one but soon realized
>>   it duplicated a lot of vlan handling code in the vxlan driver
>>
>> This series has only been briefly tested for functionality. I am sending
>> it out as an RFC while I continue to test it. There are some rough edges
>> which I am in the process of fixing.
>>
>> Signed-off-by: Roopa Prabhu 
>>
>> Roopa Prabhu (5):
>>   ip_tunnels: new IP_TUNNEL_INFO_BRIDGE flag for ip_tunnel_info mode
>>   vxlan: make COLLECT_METADATA mode bridge friendly
>>   bridge: uapi: add per vlan tunnel info
>>   bridge: vlan lwt and dst_metadata netlink support
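The two lookups the cover letter relies on can be modelled in a few lines of user-space C: one forwarding table keyed by (mac, vni), and a 1-1 vlan/vni map used by the ingress hook (vni to vlan) and the egress hook (vlan to vni). This is a toy sketch under stated assumptions, not the vxlan or bridge code: `struct fdb_entry`, `fdb_add()`, `fdb_find()`, `vlan_to_vni[]`, `egress_vni()` and `ingress_vlan()` are hypothetical names, the remote VTEP is reduced to one IPv4 address, and the table sizes are arbitrary.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define FDB_BUCKETS 256  /* arbitrary sizes for the sketch */
#define FDB_MAX     1024
#define VLAN_MAX    4096

/* One entry in the single fdb: key is (mac, vni), value is a remote VTEP. */
struct fdb_entry {
	uint8_t  mac[6];
	uint32_t vni;
	uint32_t remote_ip;  /* IPv4 address of the remote VTEP */
	int      next;       /* next entry index in the bucket, -1 = end */
};

static struct fdb_entry fdb[FDB_MAX];
static int fdb_used;
static int buckets[FDB_BUCKETS];

/* Per-vlan tunnel info: 1-1 vlan -> vni map, 0 means "not mapped". */
static uint32_t vlan_to_vni[VLAN_MAX];

static void fdb_init(void)
{
	fdb_used = 0;
	for (int i = 0; i < FDB_BUCKETS; i++)
		buckets[i] = -1;
	memset(vlan_to_vni, 0, sizeof(vlan_to_vni));
}

/* FNV-1a over the MAC and the VNI, so both fields select the bucket. */
static unsigned fdb_hash(const uint8_t mac[6], uint32_t vni)
{
	uint32_t h = 2166136261u;

	for (int i = 0; i < 6; i++)
		h = (h ^ mac[i]) * 16777619u;
	for (int i = 0; i < 4; i++)
		h = (h ^ ((vni >> (8 * i)) & 0xff)) * 16777619u;
	return h % FDB_BUCKETS;
}

static bool fdb_add(const uint8_t mac[6], uint32_t vni, uint32_t remote_ip)
{
	unsigned b = fdb_hash(mac, vni);
	struct fdb_entry *e;

	if (fdb_used == FDB_MAX)
		return false;
	e = &fdb[fdb_used];
	memcpy(e->mac, mac, 6);
	e->vni = vni;
	e->remote_ip = remote_ip;
	e->next = buckets[b];
	buckets[b] = fdb_used++;
	return true;
}

static const struct fdb_entry *fdb_find(const uint8_t mac[6], uint32_t vni)
{
	for (int i = buckets[fdb_hash(mac, vni)]; i != -1; i = fdb[i].next)
		if (fdb[i].vni == vni && memcmp(fdb[i].mac, mac, 6) == 0)
			return &fdb[i];
	return NULL;
}

/* Egress hook: the vni to attach as tunnel metadata (0 = none). */
static uint32_t egress_vni(uint16_t vid)
{
	return vid < VLAN_MAX ? vlan_to_vni[vid] : 0;
}

/* Ingress hook: map the received vni back to a local vlan (0 = none).
 * A linear scan keeps the sketch short; a real table would use a hash. */
static uint16_t ingress_vlan(uint32_t vni)
{
	if (vni == 0)
		return 0;
	for (uint16_t vid = 1; vid < VLAN_MAX; vid++)
		if (vlan_to_vni[vid] == vni)
			return vid;
	return 0;
}
```

Because the same MAC address can be learned in more than one VNI, the vni is part of both the hash and the equality check; keying on the MAC alone would let one VNI's entry shadow another's.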

[PATCH net] net: dsa: Bring back device detaching in dsa_slave_suspend()

2017-01-25 Thread Florian Fainelli

Commit 448b4482c671 ("net: dsa: Add lockdep class to tx queues to avoid
lockdep splat") removed the netif_device_detach() call from
dsa_slave_suspend(). That call is necessary, since it is paired with a
corresponding netif_device_attach(), so bring it back.

Fixes: 448b4482c671 ("net: dsa: Add lockdep class to tx queues to avoid lockdep splat")
Signed-off-by: Florian Fainelli 
---
David,

Can you also queue this for -stable? Thanks!

 net/dsa/slave.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 68c9eea00518..020a28d5c93e 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -1203,6 +1203,8 @@ int dsa_slave_suspend(struct net_device *slave_dev)
 {
struct dsa_slave_priv *p = netdev_priv(slave_dev);
 
+   netif_device_detach(slave_dev);
+
if (p->phy) {
phy_stop(p->phy);
p->old_pause = -1;
-- 
2.9.3

Re: [PATCH net-next 3/3] net/tcp-fastopen: Add new API support

2017-01-25 Thread David Miller

From: Wei Wang 
Date: Wed, 25 Jan 2017 09:15:34 -0800

> Looks like you sent a separate patch on top of this patch series to
> address double connect().  Then I think this patch series should be
> good to go.

Indeed, Willy please give some kind of ACK.

Thanks.

Re: [PATCH net] sctp: sctp_addr_id2transport should verify the addr before looking up assoc

2017-01-25 Thread David Miller

From: Xin Long 
Date: Tue, 24 Jan 2017 14:01:53 +0800

> sctp_addr_id2transport is a function used by sockopt to look up an assoc
> by address. As the address comes from userspace, it can be a v4-mapped
> v6 address, but the sctp protocol stack always handles a v4-mapped v6
> address as a v4 address. So it's necessary to convert it to a v4
> address before looking up the assoc by address.
> 
> This patch fixes it by calling sctp_verify_addr, which does this
> conversion, before calling sctp_endpoint_lookup_assoc, just like
> sctp_sendmsg and __sctp_connect do for addresses from users.
> 
> Signed-off-by: Xin Long 

Applied.
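The v4-mapped to v4 conversion described above can be illustrated in user space with the standard `IN6_IS_ADDR_V4MAPPED()` macro from `<netinet/in.h>`. This is a sketch of the idea only, not `sctp_verify_addr()` itself; the helper name `unmap_v4()` is hypothetical.

```c
#include <netinet/in.h>
#include <stdbool.h>
#include <string.h>

/* If sin6 holds a v4-mapped address (::ffff:a.b.c.d), fill *sin with the
 * equivalent AF_INET address and return true; otherwise return false. */
static bool unmap_v4(const struct sockaddr_in6 *sin6, struct sockaddr_in *sin)
{
	if (sin6->sin6_family != AF_INET6 ||
	    !IN6_IS_ADDR_V4MAPPED(&sin6->sin6_addr))
		return false;

	memset(sin, 0, sizeof(*sin));
	sin->sin_family = AF_INET;
	sin->sin_port = sin6->sin6_port;  /* already in network byte order */
	/* The IPv4 address is the last 4 bytes of the 16-byte IPv6 address. */
	memcpy(&sin->sin_addr, &sin6->sin6_addr.s6_addr[12], 4);
	return true;
}
```

Normalising to AF_INET before the lookup means a user passing `::ffff:192.0.2.1` and one passing `192.0.2.1` end up comparing against the same stored v4 address.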

Re: [PATCH net-next 1/2] net: dsa: mv88e6xxx: Workaround missing PHY ID on mv88e6390

2017-01-25 Thread Gregory CLEMENT

Hi Andrew,

 On mer., janv. 25 2017, Andrew Lunn  wrote:

> The internal PHYs of the mv88e6390 do not have a model ID. Trap any
> calls to the ID register, and if it is zero, return the ID for the
> mv88e6390. The Marvell PHY driver can then bind to this ID.
>
> Signed-off-by: Andrew Lunn 
> Reviewed-by: Florian Fainelli 
> ---
>  drivers/net/dsa/mv88e6xxx/global2.c | 16 +++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/dsa/mv88e6xxx/global2.c b/drivers/net/dsa/mv88e6xxx/global2.c
> index 353e26bea3c3..521a5511bd5f 100644
> --- a/drivers/net/dsa/mv88e6xxx/global2.c
> +++ b/drivers/net/dsa/mv88e6xxx/global2.c
> @@ -520,7 +520,21 @@ int mv88e6xxx_g2_smi_phy_read(struct mv88e6xxx_chip *chip,
>   if (err)
>   return err;
>  
> - return mv88e6xxx_g2_read(chip, GLOBAL2_SMI_PHY_DATA, val);
> + err = mv88e6xxx_g2_read(chip, GLOBAL2_SMI_PHY_DATA, val);
> + if (err)
> + return err;
> +
> + if (reg == MII_PHYSID2) {
> + /* The mv88e6390 internal PHYS don't have a model number.
> +  * Use the switch family model number instead.
> +  */
> + if (!(*val & 0x3ff)) {

I tested this series on the Topaz switch, but it failed: although I said
we read 0x1410C00, we actually read 0x01410C01. MARVELL_PHY_ID_MASK masks
out the lower 4 bits, which is why in my patch "phy: marvell: Add support
for the PHY embedded in the topaz switch" I used the value 0x01410C00 for
MARVELL_PHY_ID_88E6141.

However, it doesn't work with the mask you use.

So this mask should be changed to 0x3f0 for the Topaz. Actually 0x3fe
would be enough, but it seems more logical to use the same mask as
MARVELL_PHY_ID_MASK.

We could either use the same mask for both families and still use 6390,
as they seem compatible, or use two different families based on the
lower bit.

Gregory

> + if (chip->info->family == MV88E6XXX_FAMILY_6390)
> + *val |= PORT_SWITCH_ID_PROD_NUM_6390;
> + }
> + }
> +
> + return 0;
>  }
>  
>  int mv88e6xxx_g2_smi_phy_write(struct mv88e6xxx_chip *chip,
> -- 
> 2.11.0
>

-- 
Gregory Clement, Free Electrons
Kernel, drivers, real-time and embedded Linux
development, consulting, training and support.
http://free-electrons.com
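The arithmetic behind this report can be checked directly. In the IEEE 802.3 clause 22 layout, PHYSID2 bits 15:10 hold OUI bits, bits 9:4 the model number and bits 3:0 the revision, so 0x3ff tests model plus revision while 0x3f0 tests only the model. The helper below is a hypothetical stand-in for the check in the patch; 0x0C01 is the low half of the 0x01410C01 ID read on Topaz, and 0x390 is only a placeholder substitute value (an assumption, not necessarily the value of PORT_SWITCH_ID_PROD_NUM_6390).

```c
#include <stdint.h>

/* Hypothetical model of the PHYSID2 workaround: when none of the bits
 * selected by `mask` are set, OR in a substitute model number. */
static uint16_t fixup_physid2(uint16_t physid2, uint16_t mask, uint16_t model)
{
	if (!(physid2 & mask))
		physid2 |= model;
	return physid2;
}
```

With mask 0x3ff, the revision bit in 0x0C01 makes the test see a non-zero value, so the ID is left untouched; with 0x3f0 the revision is ignored and the substitute model (0x390 here) is ORed in, giving 0x0F91.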

Re: [PATCH net] sctp: sctp gso should set feature with NETIF_F_SG when calling skb_segment

2017-01-25 Thread David Miller

From: Xin Long 
Date: Tue, 24 Jan 2017 14:05:16 +0800

> Now sctp gso puts segments into skb's frag_list, then processes these
> segments in skb_segment. But skb_segment handles them only when sg is
> enabled, as it's in the same branch as skb's frags.
> 
> Although almost all NICs other than some old ones support sg, since
> commit 1e16aa3ddf86 ("net: gso: use feature flag argument in all
> protocol gso handlers"), features &= skb->dev->hw_enc_features, and
> xfrm_output_gso calls skb_segment with features = 0, which means sctp
> gso would call skb_segment with sg = 0, and skb_segment would not work
> as expected.
> 
> This patch fixes it by setting the features param with NETIF_F_SG when
> calling skb_segment so that it takes the right branch to process the
> skb's frag_list.
> 
> Signed-off-by: Xin Long 

Applied, thanks.

Re: [PATCH v4] net: ethernet: faraday: To support device tree usage.

2017-01-25 Thread David Miller

From: Greentime Hu 
Date: Tue, 24 Jan 2017 16:46:14 +0800

> We also use the same binding document to describe the same faraday ethernet
> controller and add faraday to vendor-prefixes.txt.
> 
> Signed-off-by: Greentime Hu 
> ---
> Changes in v4:
>   - Use the same binding document to describe the same faraday ethernet
>     controller and add faraday to vendor-prefixes.txt.
> Changes in v3:
>   - Nothing changed in this patch but I have committed andestech to
>     vendor-prefixes.txt.
> Changes in v2:
>   - Change atmac100_of_ids to ftmac100_of_ids
> 
> ---
>  .../net/{moxa,moxart-mac.txt => faraday,ftmac.txt} |7 +--
>  .../devicetree/bindings/vendor-prefixes.txt|1 +
>  drivers/net/ethernet/faraday/ftmac100.c|7 +++
>  3 files changed, 13 insertions(+), 2 deletions(-)
>  rename Documentation/devicetree/bindings/net/{moxa,moxart-mac.txt => faraday,ftmac.txt} (68%)
> 
> diff --git a/Documentation/devicetree/bindings/net/moxa,moxart-mac.txt b/Documentation/devicetree/bindings/net/faraday,ftmac.txt
> similarity index 68%
> rename from Documentation/devicetree/bindings/net/moxa,moxart-mac.txt
> rename to Documentation/devicetree/bindings/net/faraday,ftmac.txt

Why are you renaming the MOXA binding file instead of adding a completely new one
for faraday?  The MOXA one should stick around; I don't see a justification for
removing it.

Re: [PATCH] net: ethernet: mvneta: add support for 2.5G DRSGMII mode

2017-01-25 Thread David Miller

From: Jan Luebbe 
Date: Mon, 23 Jan 2017 15:22:06 +0100

> The Marvell MVNETA Ethernet controller supports a 2.5 Gbps SGMII mode
> called DRSGMII.
> 
> This patch adds a corresponding phy-mode string 'drsgmii' and parses it
> from DT. The MVNETA then configures the SERDES protocol value
> accordingly.
> 
> It was successfully tested on an MV78460 connected to an FPGA.
> 
> Signed-off-by: Jan Luebbe 

I still haven't seen a sufficient explanation as to why this change
works without any explicit MAC programming changes to this driver.

That really needs to be explained before I will apply this patch.

Thanks.

Re: [PATCH net-next v7 1/1] net sched actions: Add support for user cookies

2017-01-25 Thread David Miller

From: Jamal Hadi Salim 
Date: Tue, 24 Jan 2017 07:02:41 -0500

> Introduce optional 128-bit action cookie.

Applied, but like Jiri I think you can use one buffer instead of two
to store the user's cookie data.

Thanks.
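Keeping the cookie in one buffer, as suggested above, is commonly done in C with a flexible array member, so the length header and the data come from a single allocation and a single free. The sketch is user-space and hypothetical: `struct act_cookie`, `act_cookie_alloc()` and `ACT_COOKIE_MAX_LEN` are illustrative names, not the tc API; only the 128-bit (16-byte) limit comes from the patch description.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define ACT_COOKIE_MAX_LEN 16  /* 128-bit cookie, per the patch description */

/* Length and data live in one allocation, so one free() releases both. */
struct act_cookie {
	uint32_t len;
	uint8_t  data[];  /* flexible array member */
};

/* Returns a newly allocated cookie holding a copy of buf, or NULL if the
 * length is out of range or allocation fails. */
static struct act_cookie *act_cookie_alloc(const void *buf, uint32_t len)
{
	struct act_cookie *c;

	if (len == 0 || len > ACT_COOKIE_MAX_LEN)
		return NULL;
	c = malloc(sizeof(*c) + len);
	if (!c)
		return NULL;
	c->len = len;
	memcpy(c->data, buf, len);
	return c;
}
```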

Re: [PATCH RFC net-next] packet: always ensure that we pass hard_header_len bytes in skb_headlen() to the driver

2017-01-25 Thread David Miller

From: Sowmini Varadhan 
Date: Tue, 24 Jan 2017 08:11:49 -0800

> @@ -2685,21 +2685,22 @@ static inline int dev_parse_header(const struct sk_buff *skb,
>  }
>  
>  /* ll_header must have at least hard_header_len allocated */
> -static inline bool dev_validate_header(const struct net_device *dev,
> +static inline int dev_validate_header(const struct net_device *dev,
>  char *ll_header, int len)
>  {
>   if (likely(len >= dev->hard_header_len))
> - return true;
> + return len;
>  
>   if (capable(CAP_SYS_RAWIO)) {
>   memset(ll_header + len, 0, dev->hard_header_len - len);
> - return true;
> + return dev->hard_header_len;
>   }
>  
>   if (dev->header_ops && dev->header_ops->validate)
> - return dev->header_ops->validate(ll_header, len);
> + if (!dev->header_ops->validate(ll_header, len))
> + return -1;
>  
> - return false;
> + return dev->hard_header_len;
>  }
>  
> typedef int gifconf_func_t(struct net_device * dev, char __user * bufptr, int len);

This mostly looks good.  But I'm not so sure you handle the variable length header
case properly.  That's why we have the header_ops->validate() callback, to accommodate
that.

In the variable length case, you'll end up having to return something other than
just hard_header_len.  Probably you'll need to make header_ops->validate() return
that length.
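A user-space model of this suggestion, assuming a hypothetical `validate` callback that returns the parsed header length (or -1 if invalid) instead of a bool. `struct fake_dev`, `hdr_validate_fn`, `validate_header()` and `toy_validate()` are illustrative names only, not the kernel API, and `capable(CAP_SYS_RAWIO)` is modelled as a plain bool argument. The point is that for a variable-length link-layer header the caller can only learn the real length from the callback.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical callback: returns the header length it parsed, or -1. */
typedef int (*hdr_validate_fn)(const char *ll_header, int len);

struct fake_dev {
	int hard_header_len;       /* length of a full fixed-size header */
	hdr_validate_fn validate;  /* NULL when headers are fixed-length */
};

/* Returns how many header bytes to hand to the driver, or -1 to reject. */
static int validate_header(const struct fake_dev *dev, char *ll_header,
			   int len, bool cap_sys_rawio)
{
	if (len >= dev->hard_header_len)
		return len;

	if (cap_sys_rawio) {
		/* Privileged callers get the short header zero-padded. */
		memset(ll_header + len, 0, dev->hard_header_len - len);
		return dev->hard_header_len;
	}

	/* Variable-length headers: only the callback knows the real length,
	 * which may be shorter than hard_header_len. */
	if (dev->validate)
		return dev->validate(ll_header, len);

	return -1;
}

/* Toy variable-length header: byte 0 holds the total header length. */
static int toy_validate(const char *ll_header, int len)
{
	int hlen;

	if (len < 1)
		return -1;
	hlen = (unsigned char)ll_header[0];
	return (hlen >= 1 && hlen <= len) ? hlen : -1;
}
```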

Re: [PATCH net-next 1/2] net: dsa: mv88e6xxx: Workaround missing PHY ID on mv88e6390

2017-01-25 Thread Vivien Didelot

Hi Gregory, Andrew,

Gregory CLEMENT  writes:

>> +if (reg == MII_PHYSID2) {
>> +/* The mv88e6390 internal PHYS don't have a model number.
>> + * Use the switch family model number instead.
>> + */
>> +if (!(*val & 0x3ff)) {
>
> I tested this series on the Topaz switch but it failed because while I
> said we read 0x1410C00 actually we read 0x01410C01. With the
> MARVELL_PHY_ID_MASK we mask the 4 lower bits so that's why in my patch
> "phy: marvell: Add support for the PHY embedded in the topaz switch" I
> used the 0x01410C00 value for MARVELL_PHY_ID_88E6141.
>
> However with the mask you use it doesn't work.
>
> So this mask should be changed to 0x3f0 for the Topaz. Actually 0x3fe
> would be enough but it seems more logical to use the same mask that for
> MARVELL_PHY_ID_MASK.
>
> We could either use the same mask for both family and still use 6390 as
> they seem compatible or we use two different families based on the lower
> bit.

Since several chips have this issue, we can introduce a u16 physid2_mask
member in the mv88e6xxx_info structure and move the check into
mv88e6xxx_phy_read(), so that the device helpers (as in Global2) are not
affected by such a (necessary) hack. Something like:

static int mv88e6xxx_phy_read(struct mv88e6xxx_chip *chip, int phy,
			      int reg, u16 *val)
{
	...

	err = chip->info->ops->phy_read(chip, bus, addr, reg, val);
	if (err)
		return err;

	if (reg == MII_PHYSID2 && chip->info->physid2_mask) {
		/* Some internal PHYs don't have a model number,
		 * so return the switch family model number directly.
		 */
		if (!(*val & chip->info->physid2_mask))
			*val |= chip->info->prod_num;
	}

	return 0;
}

Thanks,

Vivien

Re: [patch net-next] tipc: uninitialized return code in tipc_setsockopt()

2017-01-25 Thread David Miller

From: Dan Carpenter 
Date: Tue, 24 Jan 2017 12:49:35 +0300

> We shuffled some code around and added some new case statements here and
> now "res" isn't initialized on all paths.
> 
> Fixes: 01fd12bb189a ("tipc: make replicast a user selectable option")
> Signed-off-by: Dan Carpenter 

Applied, thanks Dan.
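The bug class fixed here is easy to reproduce in a toy function: once a `switch` gains a case that never assigns the return variable, that path returns an indeterminate value. The sketch below is not tipc_setsockopt(); the option names are hypothetical, and it only illustrates one common fix, initialising `res` at its declaration. GCC can flag the unfixed pattern with `-Wmaybe-uninitialized`.

```c
#include <errno.h>

/* Hypothetical option numbers, for illustration only. */
enum { OPT_SET_STATE = 1, OPT_SET_POSITIVE = 2 };

/* Every path through the switch must leave res defined. Initialising it
 * up front covers cases (like OPT_SET_STATE) that never assign it. */
static int set_opt(int opt, int value, int *state)
{
	int res = 0;

	switch (opt) {
	case OPT_SET_STATE:
		*state = value;
		break;
	case OPT_SET_POSITIVE:
		if (value <= 0)
			res = -EINVAL;
		else
			*state = value;
		break;
	default:
		res = -ENOPROTOOPT;
		break;
	}
	return res;
}
```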

Re: [PATCH net-next 1/2] net: dsa: mv88e6xxx: Workaround missing PHY ID on mv88e6390

2017-01-25 Thread Florian Fainelli

On 01/25/2017 09:27 AM, Gregory CLEMENT wrote:
> Hi Andrew,
>  
>  On mer., janv. 25 2017, Andrew Lunn  wrote:
> 
>> The internal PHYs of the mv88e6390 do not have a model ID. Trap any
>> calls to the ID register, and if it is zero, return the ID for the
>> mv88e6390. The Marvell PHY driver can then bind to this ID.
>>
>> Signed-off-by: Andrew Lunn 
>> Reviewed-by: Florian Fainelli 
>> ---
>>  drivers/net/dsa/mv88e6xxx/global2.c | 16 +++-
>>  1 file changed, 15 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/dsa/mv88e6xxx/global2.c b/drivers/net/dsa/mv88e6xxx/global2.c
>> index 353e26bea3c3..521a5511bd5f 100644
>> --- a/drivers/net/dsa/mv88e6xxx/global2.c
>> +++ b/drivers/net/dsa/mv88e6xxx/global2.c
>> @@ -520,7 +520,21 @@ int mv88e6xxx_g2_smi_phy_read(struct mv88e6xxx_chip *chip,
>>  if (err)
>>  return err;
>>  
>> -return mv88e6xxx_g2_read(chip, GLOBAL2_SMI_PHY_DATA, val);
>> +err = mv88e6xxx_g2_read(chip, GLOBAL2_SMI_PHY_DATA, val);
>> +if (err)
>> +return err;
>> +
>> +if (reg == MII_PHYSID2) {
>> +/* The mv88e6390 internal PHYS don't have a model number.
>> + * Use the switch family model number instead.
>> + */
>> +if (!(*val & 0x3ff)) {
> 
> I tested this series on the Topaz switch but it failed because while I
> said we read 0x1410C00 actually we read 0x01410C01. With the
> MARVELL_PHY_ID_MASK we mask the 4 lower bits so that's why in my patch
> "phy: marvell: Add support for the PHY embedded in the topaz switch" I
> used the 0x01410C00 value for MARVELL_PHY_ID_88E6141.
> 
> However with the mask you use it doesn't work.
> 
> So this mask should be changed to 0x3f0 for the Topaz. Actually 0x3fe
> would be enough but it seems more logical to use the same mask that for
> MARVELL_PHY_ID_MASK.
> 
> We could either use the same mask for both family and still use 6390 as
> they seem compatible or we use two different families based on the lower
> bit.

By convention, the lower 4 bits are used to carry revision information,
which is why most drivers use 0x_fff0, can you try to use that here
for the PHY mask value?
-- 
Florian

Re: [PATCH net-next 1/2] net: dsa: mv88e6xxx: Workaround missing PHY ID on mv88e6390

2017-01-25 Thread Andrew Lunn

> I tested this series on the Topaz switch but it failed because while I
> said we read 0x1410C00 actually we read 0x01410C01. With the
> MARVELL_PHY_ID_MASK we mask the 4 lower bits so that's why in my patch
> "phy: marvell: Add support for the PHY embedded in the topaz switch" I
> used the 0x01410C00 value for MARVELL_PHY_ID_88E6141.

O.K. The lower 4 bits seem to be the silicon revision. Marvell's own
SDK ignores those bits. So let's do the same here and use
MARVELL_PHY_ID_MASK.

Andrew
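Ignoring the revision nibble works because phylib matches a driver when `(phy_id & mask) == (driver_id & mask)`. The sketch below models that rule in user space; `phy_id_matches()` is a hypothetical helper, and `PHY_ID_MASK_NO_REV` assumes `MARVELL_PHY_ID_MASK` is 0xfffffff0, i.e. it masks exactly the lower 4 bits as described in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed value: mask off the 4-bit silicon revision field. */
#define PHY_ID_MASK_NO_REV 0xfffffff0u

/* Model of the phylib match rule: IDs match when equal under the mask. */
static bool phy_id_matches(uint32_t phy_id, uint32_t drv_id, uint32_t mask)
{
	return (phy_id & mask) == (drv_id & mask);
}
```

With the revision masked, the 0x01410C01 read on Topaz matches a driver entry of 0x01410C00, while a full 32-bit mask would not; a difference in the model bits (e.g. 0x01410C11) still fails to match.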

Re: [pull request][net-next 00/12] Mellanox mlx5 updates 2017-01-24

2017-01-25 Thread David Miller

From: Saeed Mahameed 
Date: Tue, 24 Jan 2017 22:16:40 +0200

> This pull request includes one new feature to support offloading IPv6
> tunnels in switchdev mode, in addition to some small mlx5 updates.
> Details are below.
> 
> Please pull and let me know if there's any problem.

Pulled, thanks a lot.

Re: [PATCH net-next 3/3] net/tcp-fastopen: Add new API support

2017-01-25 Thread Willy Tarreau

Hi Wei,

On Wed, Jan 25, 2017 at 09:15:34AM -0800, Wei Wang wrote:
> Willy,
> 
> Looks like you sent a separate patch on top of this patch series to address
> double connect().

Yes, sorry, I wanted to reply to this thread after the git-send-email
and got caught immediately after :-)

So as suggested by Eric in order to make the review easier, it was done
on top of your series.

> Then I think this patch series should be good to go.
> I will get your patch tested with our TFO test cases.

I think so as well. Thanks for running the tests. On my side I could fix
the haproxy bug which triggered this, and I verified that the whole
series works fine both with and without the haproxy fix. So I think we're
good now.

Thanks,
Willy

Re: [PATCH net-next 1/2] net: dsa: mv88e6xxx: Workaround missing PHY ID on mv88e6390

2017-01-25 Thread Andrew Lunn

> Since several chips have this issue, we can introduce a u16 physid2_mask
> member in the mv88e6xxx_info structure and move the check in
> mv88e6xxx_phy_read() so that the logic of device (as in Global2) helpers
> are not affected by such (necessary) hack. Something like:
> 
> static int mv88e6xxx_phy_read(struct mv88e6xxx_chip *chip, int phy,
>   int reg, u16 *val)
> {
> ...
> 
> err = chip->info->ops->phy_read(chip, bus, addr, reg, val);
> if (err)
> return err;
> 
> if (reg == MII_PHYSID2 && chip->info->physid2_mask) {
> /* Some internal PHYs don't have a model number,
>  * so return the switch family model number directly.
>  */
> if (!(*val & chip->info->physid2_mask))

Hi Vivien

I don't see the need to have per-switch masks. Let's just hard code it
to ignore the lower 4 bits.

> *val |= chip->info->prod_num;

and this is not good. I deliberately picked the family num, not the
product num. Otherwise for the 6390 family, we would have 6 different PHY
IDs, and two more for Gregory's two switches.

 Andrew
