Re: ath9k_htc - Division by zero in kernel (as well as firmware panic)

2017-06-07 Thread Oleksij Rempel
On 07.06.2017 at 02:12, Tobias Diedrich wrote:
> Oleksij Rempel wrote:
>> Yes, this is a "normal" problem. The firmware has no error handler for PCI
>> bus related exceptions. So if we fail to read the PCI bus the first time, we
>> have the choice to Oops and stall or Oops and reboot ASAP. So we reboot
>> and provide a kernel "firmware panic!" message.
>> Anyone who can and wants to fix this is welcome.
>>
>>> *
>>> Jun 02 14:55:30 computer kernel: usb 1-1.1: ath: firmware panic!
>>> exccause: 0x000d; pc: 0x0090ae81; badvaddr: 0x10ff4038.
> [...]
> 
>> memdmp 50ae78 50ae88
> 
> 50ae78: 6c10 0412 6aa2 0c02 0088 20c0 2008 1940  l...j..@
> 
> [...copy to bin...]
> $ bin/objdump -b binary -m xtensa  -D /tmp/memdump.bin 
> [..]
>    0:   6c1004      entry   a1, 32
>    3:   126aa2      l32r    a2, 0xfffdaa8c
>    6:   0c0200      memw
>    9:   8820        l32i.n  a8, a2, 0  <-- Exception cause PC still points at load
>    b:   c020        movi.n  a2, 0
>    d:   081940      extui   a9, a8, 1, 1
> 
> Judging from that it should be fairly simple to at least implement
> some sort of retry, possibly after triggering a PCIe link retrain?

I assume, yes.

> There are some related PCIe root complex registers that may point to
> what exactly failed if they were dumped.
> 
> The root complex registers live at 0x0004 and I think match the
> registers described for the root complex in the AR9344 datasheet.

Unfortunately I don't have the AR7010 docs to confirm this.

> PCIE_INT_MASK would map to 0x40050 and has a bit for SYS_ERR:
> "A system error. The RC Core asserts CFG_SYS_ERR_RC if any device in
> the hierarchy reports any of the following errors and the associated
> enable bit is set in the Root Control register: ERR_COR, ERR_FATAL,
> ERR_NONFATAL."
> 
> AFAICS link retrain can be done by setting bit3 (INIT_RST,
> "Application request to initiate a training reset") in
> PCIE_APP (0x4).
> 
> See sboot/magpie_1_1/sboot/cmnos/eeprom/src/cmnos_eeprom.c (which
> flips some bits in the RC to enable the PCIe bus for reading the
> EEPROM).
> 
> The root complex pci configuration space is at 0x2 which could
> have further error details:
>> memdmp 2 20200
> 
> 02: a02a 168c 0010 0006  0001 0001   .*..
> 020010:          
> 020020:          
> 020030:    0040    01ff  ...@
> 020040: 5bc3 5001        [.P.
> 020050: 0080 7005        ..p.
> 020060:          
> 020070: 0042 0010  8701  2010 0013 4411  .BD.
> 020080: 3011    00c0 03c0    0...
> 020090:    0010      
> 0200a0:          
> 0200b0:          
> 0200c0:          
> 0200d0:          
> 0200e0:          
> 0200f0:          
> 020100: 1401 0001     0006 2030  ...0
> 020110:    2000  00a0    
> 020120:          
> 020130:          
> 020140: 0001 0002        
> 020150:   8000 00ff      
> 020160:          
> 020170:          
> 020180:          
> 020190:          
> 0201a0:          
> 0201b0:          
> 0201c0:          
> 0201d0:          
> 0201e0:          
> 0201f0:          
> 
> Transformed into something suitable for feeding into lspci -F:
> 
> 00:00.0 Description filled in by lspci
> 00: 8c 16 2a a0 06 00 10 00 01 00 00 00 00 00 01 00
> 10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 30: 00 00 00 00 40 00 00 00 00 00 00 00 ff 01 00 00
> 40: 01 50 c3 5b 00 00 00 00 00 00 00 00 00 00 00 00
> 50: 05 70 80 00 00 00 00 00 00 00 00 00 00 00 00 00
> 60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 70: 10 00 42 00 01 87 00 00 10 20 00 00 11 44 13 00
> 80: 00 00 11 30 00 00 00 00 c0 03 c0 00 00 00 00 00
>
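
The transformation from the memdmp output to the lspci -F input can be
sketched as below. This is only an illustration, assuming the dump prints
each 32-bit word as two 16-bit halves (so "a02a 168c" is the word
0xa02a168c) and that lspci wants the bytes least significant first. The
function names are made up for the example and are not part of any tool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Split one 32-bit dump word into bytes, least significant byte first. */
void word_to_cfg_bytes(uint32_t word, uint8_t out[4])
{
	out[0] = word & 0xff;
	out[1] = (word >> 8) & 0xff;
	out[2] = (word >> 16) & 0xff;
	out[3] = (word >> 24) & 0xff;
}

/*
 * Format one 16-byte row (four words) as "<offset>: b0 b1 ... b15".
 * Returns the number of characters written, or -1 if buf is too small.
 */
int format_cfg_row(unsigned int offset, const uint32_t words[4],
		   char *buf, size_t len)
{
	size_t pos;
	int n, i, j;

	assert(buf != NULL && words != NULL);

	n = snprintf(buf, len, "%02x:", offset);
	if (n < 0 || (size_t)n >= len)
		return -1;
	pos = (size_t)n;

	for (i = 0; i < 4; i++) {
		uint8_t bytes[4];

		word_to_cfg_bytes(words[i], bytes);
		for (j = 0; j < 4; j++) {
			n = snprintf(buf + pos, len - pos, " %02x", bytes[j]);
			if (n < 0 || (size_t)n >= len - pos)
				return -1;
			pos += (size_t)n;
		}
	}
	return (int)pos;
}
```

With the first two words of the dump, 0xa02a168c and 0x00100006, this
yields the bytes "8c 16 2a a0 06 00 10 00", which matches the start of the
00: row above.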

Re: [PATCH net-next] net: ipmr: add getlink support

2017-06-07 Thread Nikolay Aleksandrov
On 06/06/17 19:41, Nikolay Aleksandrov wrote:
> Currently there's no way to dump the VIF table for an ipmr table other
> than the default (via proc). This is a major issue when debugging ipmr
> issues and in general it is good to know which interfaces are
> configured. This patch adds support for RTM_GETLINK for the ipmr family
> so we can dump the VIF table and the ipmr table's current config for
> each table. We're protected by rtnl so no need to acquire RCU or
> mrt_lock.
> 
> Signed-off-by: Nikolay Aleksandrov 
> ---
> The plan is to add full netlink control to ipmr via new/set/dellink later.
> Also this would allow us to dump any number of VIFs in the future when we
> remove the VIF device limit.
> 
>  include/uapi/linux/mroute.h |  38 +++
>  net/ipv4/ipmr.c | 114 
> 
>  2 files changed, 152 insertions(+)
> 

Self-NAK, after discussing this further with Roopa, I think it'll be best
if we turn all of these table and vif options into embedded separate
netlink attributes. We'll get all the netlink benefits and it will be much
easier for user-space to manipulate them later.

Will send v2 later after some testing.

Thanks,
 Nik




Re: [PATCH net-next v2 3/3] udp: try to avoid 2 cache miss on dequeue

2017-06-07 Thread Paolo Abeni
Hi David,

On Tue, 2017-06-06 at 16:23 +0200, Paolo Abeni wrote:
> when udp_recvmsg() is executed, on x86_64 and other archs, most skb
> fields are on cold cachelines.
> If the skbs are linear and the kernel doesn't need to compute the udp
> csum, only a handful of skb fields are required by udp_recvmsg().
> Since we already use skb->dev_scratch to cache hot data, and
> there are 32 bits unused on 64 bit archs, use such field to cache
> as much data as we can, and try to prefetch on dequeue the relevant
> fields that are left out.
> 
> This can save up to 2 cache misses per packet.
> 
> v1 -> v2:
>   - changed udp_dev_scratch fields types to u{32,16} variant,
> replaced bitfield with bool
> 
> Signed-off-by: Paolo Abeni 

Can you please put this series on hold for a little while? The lkp-robot
just reported a performance regression on v1 which I still have to
investigate. I can't look at it right away, but I expect the same
applies to v2.

It sounds quite weird to me, since the bisected patch touches the UDP
code only and the regression is on apachebench.

Thank you,

Paolo
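
For readers following along, the layout described in the quoted commit
message can be sketched roughly as below. This is an illustration only,
not the code from the patch: the struct and field names are hypothetical,
and it simply assumes one 32-bit field that is already cached plus a
16-bit length and two bool flags packed into the 32 bits that are
otherwise unused on 64-bit archs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the data cached in skb->dev_scratch. */
struct udp_scratch_sketch {
	uint32_t truesize;	/* assumed: the hot data already cached */
	uint16_t len;		/* packet length, fits in 16 bits for UDP */
	bool is_linear;		/* skb has no paged fragments */
	bool csum_unnecessary;	/* no checksum computation needed */
};

/* The whole point is that this fits in one 64-bit scratch word. */
static_assert(sizeof(struct udp_scratch_sketch) <= 8,
	      "scratch data must fit in 64 bits");

struct udp_scratch_sketch udp_scratch_fill(uint32_t truesize, uint16_t len,
					   bool is_linear,
					   bool csum_unnecessary)
{
	struct udp_scratch_sketch s = {
		.truesize = truesize,
		.len = len,
		.is_linear = is_linear,
		.csum_unnecessary = csum_unnecessary,
	};

	return s;
}

/* True when the receive path can rely on the cached fields alone. */
bool udp_scratch_fast_path(const struct udp_scratch_sketch *s)
{
	return s->is_linear && s->csum_unnecessary;
}
```

When both flags are set, the receive path would only need the cached
fields, which is where the saved cache misses would come from.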




[PATCH 3/9] net: mvmdio: use GENMASK for masks

2017-06-07 Thread Antoine Tenart
Cosmetic patch to use the GENMASK helper for masks.

Signed-off-by: Antoine Tenart 
---
 drivers/net/ethernet/marvell/mvmdio.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/marvell/mvmdio.c 
b/drivers/net/ethernet/marvell/mvmdio.c
index 17b518b13ae3..96af8d57d9e5 100644
--- a/drivers/net/ethernet/marvell/mvmdio.c
+++ b/drivers/net/ethernet/marvell/mvmdio.c
@@ -138,7 +138,7 @@ static int orion_mdio_read(struct mii_bus *bus, int mii_id,
goto out;
}
 
-   ret = val & 0x;
+   ret = val & GENMASK(15,0);
 out:
mutex_unlock(&dev->lock);
return ret;
-- 
2.9.4
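
As a quick illustration of what the helper computes: GENMASK(h, l) builds
a mask with bits h down to l set, so GENMASK(15, 0) covers the low 16 bits.
The macros below are userspace stand-ins written for this example, limited
to 32-bit values; they are not the kernel definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's GENMASK(), 32-bit only. */
#define GENMASK32(h, l) \
	((UINT32_MAX << (l)) & (UINT32_MAX >> (31 - (h))))

/* Bits 15..0 set, checked at compile time. */
static_assert(GENMASK32(15, 0) == 0xffff, "low 16 bits expected");

/* Extract the 16-bit data field from a raw SMI register value. */
uint32_t smi_data(uint32_t val)
{
	return val & GENMASK32(15, 0);
}
```

For example, smi_data(0x1234abcd) keeps only the low half, 0xabcd.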



[PATCH 2/9] net: mvmdio: use tabs for defines

2017-06-07 Thread Antoine Tenart
Cosmetic patch replacing spaces by tabs for defined values.

Signed-off-by: Antoine Tenart 
---
 drivers/net/ethernet/marvell/mvmdio.c | 26 +-
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvmdio.c 
b/drivers/net/ethernet/marvell/mvmdio.c
index 109a2bff334d..17b518b13ae3 100644
--- a/drivers/net/ethernet/marvell/mvmdio.c
+++ b/drivers/net/ethernet/marvell/mvmdio.c
@@ -30,25 +30,25 @@
 #include 
 #include 
 
-#define MVMDIO_SMI_DATA_SHIFT  0
-#define MVMDIO_SMI_PHY_ADDR_SHIFT  16
-#define MVMDIO_SMI_PHY_REG_SHIFT   21
-#define MVMDIO_SMI_READ_OPERATION  BIT(26)
-#define MVMDIO_SMI_WRITE_OPERATION 0
-#define MVMDIO_SMI_READ_VALID  BIT(27)
-#define MVMDIO_SMI_BUSY  BIT(28)
-#define MVMDIO_ERR_INT_CAUSE  0x007C
-#define  MVMDIO_ERR_INT_SMI_DONE  0x0010
-#define MVMDIO_ERR_INT_MASK   0x0080
+#define MVMDIO_SMI_DATA_SHIFT  0
+#define MVMDIO_SMI_PHY_ADDR_SHIFT  16
+#define MVMDIO_SMI_PHY_REG_SHIFT   21
+#define MVMDIO_SMI_READ_OPERATION  BIT(26)
+#define MVMDIO_SMI_WRITE_OPERATION 0
+#define MVMDIO_SMI_READ_VALID  BIT(27)
+#define MVMDIO_SMI_BUSY  BIT(28)
+#define MVMDIO_ERR_INT_CAUSE   0x007C
+#define  MVMDIO_ERR_INT_SMI_DONE   0x0010
+#define MVMDIO_ERR_INT_MASK   0x0080
 
 /*
  * SMI Timeout measurements:
  * - Kirkwood 88F6281 (Globalscale Dreamplug): 45us to 95us (Interrupt)
  * - Armada 370   (Globalscale Mirabox):   41us to 43us (Polled)
  */
-#define MVMDIO_SMI_TIMEOUT  1000 /* 1000us = 1ms */
-#define MVMDIO_SMI_POLL_INTERVAL_MIN  45
-#define MVMDIO_SMI_POLL_INTERVAL_MAX  55
+#define MVMDIO_SMI_TIMEOUT 1000 /* 1000us = 1ms */
+#define MVMDIO_SMI_POLL_INTERVAL_MIN   45
+#define MVMDIO_SMI_POLL_INTERVAL_MAX   55
 
 struct orion_mdio_dev {
struct mutex lock;
-- 
2.9.4



[PATCH 8/9] dt-bindings: orion-mdio: document the new xmdio compatible

2017-06-07 Thread Antoine Tenart
A new compatible for Marvell xMDIO interfaces was added into the Marvell
MDIO driver. Document this new compatible.

Signed-off-by: Antoine Tenart 
---
 Documentation/devicetree/bindings/net/marvell-orion-mdio.txt | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/marvell-orion-mdio.txt 
b/Documentation/devicetree/bindings/net/marvell-orion-mdio.txt
index ccdabdcc8618..315036ff8fed 100644
--- a/Documentation/devicetree/bindings/net/marvell-orion-mdio.txt
+++ b/Documentation/devicetree/bindings/net/marvell-orion-mdio.txt
@@ -1,12 +1,12 @@
 * Marvell MDIO Ethernet Controller interface
 
 The Ethernet controllers of the Marvel Kirkwood, Dove, Orion5x,
-MV78xx0, Armada 370 and Armada XP have an identical unit that provides
-an interface with the MDIO bus. This driver handles this MDIO
-interface.
+MV78xx0, Armada 370, Armada XP, Armada 7k and Armada 8k have an
+identical unit that provides an interface with the MDIO bus or to
+the xMDIO bus. This driver handles these interfaces.
 
 Required properties:
-- compatible: "marvell,orion-mdio"
+- compatible: "marvell,orion-mdio" or "marvell,xmdio"
 - reg: address and length of the MDIO registers.  When an interrupt is
   not present, the length is the size of the SMI register (4 bytes)
   otherwise it must be 0x84 bytes to cover the interrupt control
-- 
2.9.4



[PATCH 1/9] net: mvmdio: reorder headers alphabetically

2017-06-07 Thread Antoine Tenart
Cosmetic fix reordering headers alphabetically.

Signed-off-by: Antoine Tenart 
---
 drivers/net/ethernet/marvell/mvmdio.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvmdio.c 
b/drivers/net/ethernet/marvell/mvmdio.c
index 90a60b98c28e..109a2bff334d 100644
--- a/drivers/net/ethernet/marvell/mvmdio.c
+++ b/drivers/net/ethernet/marvell/mvmdio.c
@@ -17,16 +17,16 @@
  * warranty of any kind, whether express or implied.
  */
 
+#include 
+#include 
+#include 
+#include 
 #include 
 #include 
 #include 
+#include 
 #include 
-#include 
 #include 
-#include 
-#include 
-#include 
-#include 
 #include 
 #include 
 
-- 
2.9.4



[PATCH 5/9] net: mvmdio: introduce an ops structure

2017-06-07 Thread Antoine Tenart
Introduce an ops structure to add an indirection on functions accessing
the registers. This is needed to add the xMDIO support later.

Signed-off-by: Antoine Tenart 
---
 drivers/net/ethernet/marvell/mvmdio.c | 65 +++
 1 file changed, 51 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvmdio.c 
b/drivers/net/ethernet/marvell/mvmdio.c
index 56bbe3990590..8a71ef93a61b 100644
--- a/drivers/net/ethernet/marvell/mvmdio.c
+++ b/drivers/net/ethernet/marvell/mvmdio.c
@@ -62,6 +62,15 @@ struct orion_mdio_dev {
 */
int err_interrupt;
wait_queue_head_t smi_busy_wait;
+   struct orion_mdio_ops *ops;
+};
+
+struct orion_mdio_ops {
+   int (*is_done)(struct orion_mdio_dev *);
+   int (*is_read_valid)(struct orion_mdio_dev *);
+   void (*start_read)(struct orion_mdio_dev *, int, int);
+   u16 (*read)(struct orion_mdio_dev *);
+   void (*write)(struct orion_mdio_dev *, int, int, u16);
 };
 
 static int orion_mdio_smi_is_done(struct orion_mdio_dev *dev)
@@ -74,6 +83,30 @@ static int orion_mdio_smi_is_read_valid(struct 
orion_mdio_dev *dev)
return !(readl(dev->regs) & MVMDIO_SMI_READ_VALID);
 }
 
+static void orion_mdio_start_read_op(struct orion_mdio_dev *dev, int mii_id,
+int regnum)
+{
+   writel(((mii_id << MVMDIO_SMI_PHY_ADDR_SHIFT) |
+   (regnum << MVMDIO_SMI_PHY_REG_SHIFT)  |
+   MVMDIO_SMI_READ_OPERATION),
+  dev->regs);
+}
+
+static u16 orion_mdio_read_op(struct orion_mdio_dev *dev)
+{
+   return readl(dev->regs) & GENMASK(15,0);
+}
+
+static void orion_mdio_write_op(struct orion_mdio_dev *dev, int mii_id,
+   int regnum, u16 value)
+{
+   writel(((mii_id << MVMDIO_SMI_PHY_ADDR_SHIFT) |
+   (regnum << MVMDIO_SMI_PHY_REG_SHIFT)  |
+   MVMDIO_SMI_WRITE_OPERATION|
+   (value << MVMDIO_SMI_DATA_SHIFT)),
+  dev->regs);
+}
+
 /* Wait for the SMI unit to be ready for another operation
  */
 static int orion_mdio_wait_ready(struct mii_bus *bus)
@@ -84,7 +117,7 @@ static int orion_mdio_wait_ready(struct mii_bus *bus)
int timedout = 0;
 
while (1) {
-   if (orion_mdio_smi_is_done(dev))
+   if (dev->ops->is_done(dev))
return 0;
else if (timedout)
break;
@@ -103,8 +136,7 @@ static int orion_mdio_wait_ready(struct mii_bus *bus)
if (timeout < 2)
timeout = 2;
wait_event_timeout(dev->smi_busy_wait,
-  orion_mdio_smi_is_done(dev),
-  timeout);
+  dev->ops->is_done(dev), timeout);
 
++timedout;
}
@@ -126,22 +158,19 @@ static int orion_mdio_read(struct mii_bus *bus, int 
mii_id,
if (ret < 0)
goto out;
 
-   writel(((mii_id << MVMDIO_SMI_PHY_ADDR_SHIFT) |
-   (regnum << MVMDIO_SMI_PHY_REG_SHIFT)  |
-   MVMDIO_SMI_READ_OPERATION),
-  dev->regs);
+   dev->ops->start_read(dev, mii_id, regnum);
 
ret = orion_mdio_wait_ready(bus);
if (ret < 0)
goto out;
 
-   if (orion_mdio_smi_is_read_valid(dev)) {
+   if (dev->ops->is_read_valid(dev)) {
dev_err(bus->parent, "SMI bus read not valid\n");
ret = -ENODEV;
goto out;
}
 
-   ret = readl(dev->regs) & GENMASK(15,0);
+   ret = dev->ops->read(dev);
 out:
mutex_unlock(&dev->lock);
return ret;
@@ -159,11 +188,7 @@ static int orion_mdio_write(struct mii_bus *bus, int 
mii_id,
if (ret < 0)
goto out;
 
-   writel(((mii_id << MVMDIO_SMI_PHY_ADDR_SHIFT) |
-   (regnum << MVMDIO_SMI_PHY_REG_SHIFT)  |
-   MVMDIO_SMI_WRITE_OPERATION|
-   (value << MVMDIO_SMI_DATA_SHIFT)),
-  dev->regs);
+   dev->ops->write(dev, mii_id, regnum, value);
 
 out:
mutex_unlock(&dev->lock);
@@ -190,6 +215,7 @@ static int orion_mdio_probe(struct platform_device *pdev)
struct resource *r;
struct mii_bus *bus;
struct orion_mdio_dev *dev;
+   struct orion_mdio_ops *ops;
int i, ret;
 
r = platform_get_resource(pdev, IORESOURCE_MEM, 0);
@@ -249,6 +275,17 @@ static int orion_mdio_probe(struct platform_device *pdev)
 
mutex_init(&dev->lock);
 
+   ops = devm_kzalloc(&pdev->dev, sizeof(*ops), GFP_KERNEL);
+   if (!ops)
+   return -ENOMEM;
+
+   ops->is_done = orion_mdio_smi_is_done;
+   ops->is_read_valid = orion_mdio_smi_is_read_valid;
+   ops->start_read = orion_mdio_start_read_op;
+   ops->read = orion_mdio_read_op;
+   ops->write = orion_mdio_wri

[PATCH 4/9] net: mvmdio: move the read valid check into its own function

2017-06-07 Thread Antoine Tenart
Move the read valid check into its own function. This is a prerequisite
for refactoring the driver so that xMDIO support can be added in the
future.

Signed-off-by: Antoine Tenart 
---
 drivers/net/ethernet/marvell/mvmdio.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvmdio.c 
b/drivers/net/ethernet/marvell/mvmdio.c
index 96af8d57d9e5..56bbe3990590 100644
--- a/drivers/net/ethernet/marvell/mvmdio.c
+++ b/drivers/net/ethernet/marvell/mvmdio.c
@@ -69,6 +69,11 @@ static int orion_mdio_smi_is_done(struct orion_mdio_dev *dev)
return !(readl(dev->regs) & MVMDIO_SMI_BUSY);
 }
 
+static int orion_mdio_smi_is_read_valid(struct orion_mdio_dev *dev)
+{
+   return !(readl(dev->regs) & MVMDIO_SMI_READ_VALID);
+}
+
 /* Wait for the SMI unit to be ready for another operation
  */
 static int orion_mdio_wait_ready(struct mii_bus *bus)
@@ -113,7 +118,6 @@ static int orion_mdio_read(struct mii_bus *bus, int mii_id,
   int regnum)
 {
struct orion_mdio_dev *dev = bus->priv;
-   u32 val;
int ret;
 
mutex_lock(&dev->lock);
@@ -131,14 +135,13 @@ static int orion_mdio_read(struct mii_bus *bus, int 
mii_id,
if (ret < 0)
goto out;
 
-   val = readl(dev->regs);
-   if (!(val & MVMDIO_SMI_READ_VALID)) {
+   if (orion_mdio_smi_is_read_valid(dev)) {
dev_err(bus->parent, "SMI bus read not valid\n");
ret = -ENODEV;
goto out;
}
 
-   ret = val & GENMASK(15,0);
+   ret = readl(dev->regs) & GENMASK(15,0);
 out:
mutex_unlock(&dev->lock);
return ret;
-- 
2.9.4
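
One detail worth spelling out for readers of the patch: as posted, the new
helper returns nonzero when the READ_VALID bit is clear, and the caller
treats a nonzero result as the error case, so the behaviour of the read
path is unchanged even though the helper name suggests the opposite
polarity. A small userspace sketch of that logic, with the register value
passed in instead of being read through readl(), and with function names
invented for the example:

```c
#include <assert.h>
#include <stdint.h>

#define SMI_READ_VALID	(UINT32_C(1) << 27)	/* BIT(27), as in the driver */

static_assert(SMI_READ_VALID == 0x08000000, "bit 27 expected");

/* Mirrors the posted helper: nonzero when READ_VALID is *clear*. */
int smi_is_read_valid_as_posted(uint32_t reg)
{
	return !(reg & SMI_READ_VALID);
}

/* Returns 0 on success, -1 where the driver would return -ENODEV. */
int smi_check_read(uint32_t reg)
{
	if (smi_is_read_valid_as_posted(reg))
		return -1;
	return 0;
}
```

A register value with bit 27 set passes the check; one with bit 27 clear
is reported as "SMI bus read not valid".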



[PATCH 7/9] net: mvmdio: add xmdio support

2017-06-07 Thread Antoine Tenart
This patch adds the xMDIO interface support in the mvmdio driver. This
interface is used in Ethernet controllers on Marvell 370, 7k and 8k (as
of now). The xSMI interface supported by this driver complies with the
IEEE 802.3 clause 45 (while the SMI interface complies with the clause
22). The xSMI interface is used by 10GbE devices.

Signed-off-by: Antoine Tenart 
---
 drivers/net/ethernet/marvell/Kconfig  |   6 +-
 drivers/net/ethernet/marvell/mvmdio.c | 121 --
 2 files changed, 104 insertions(+), 23 deletions(-)

diff --git a/drivers/net/ethernet/marvell/Kconfig 
b/drivers/net/ethernet/marvell/Kconfig
index da6fb825afea..205bb7e683b7 100644
--- a/drivers/net/ethernet/marvell/Kconfig
+++ b/drivers/net/ethernet/marvell/Kconfig
@@ -35,9 +35,9 @@ config MVMDIO
depends on HAS_IOMEM
select PHYLIB
---help---
- This driver supports the MDIO interface found in the network
- interface units of the Marvell EBU SoCs (Kirkwood, Orion5x,
- Dove, Armada 370 and Armada XP).
+ This driver supports the MDIO and xMDIO interfaces found in
+ the network interface units of the Marvell EBU SoCs (Kirkwood,
+ Orion5x, Dove, Armada 370, Armada XP, Armada 7k and Armada 8k).
 
  This driver is used by the MV643XX_ETH and MVNETA drivers.
 
diff --git a/drivers/net/ethernet/marvell/mvmdio.c 
b/drivers/net/ethernet/marvell/mvmdio.c
index 3cb3dd3331d8..13b198aca5c1 100644
--- a/drivers/net/ethernet/marvell/mvmdio.c
+++ b/drivers/net/ethernet/marvell/mvmdio.c
@@ -41,6 +41,15 @@
 #define  MVMDIO_ERR_INT_SMI_DONE   0x0010
 #define MVMDIO_ERR_INT_MASK   0x0080
 
+#define MVMDIO_XSMI_MGNT_REG   0x0
+#define  MVMDIO_XSMI_READ_VALID    BIT(29)
+#define  MVMDIO_XSMI_BUSY  BIT(30)
+#define MVMDIO_XSMI_ADDR_REG   0x8
+#define  MVMDIO_XSMI_PHYADDR_SHIFT 16
+#define  MVMDIO_XSMI_DEVADDR_SHIFT 21
+#define  MVMDIO_XSMI_READ_OPERATION    (0x7 << 26)
+#define  MVMDIO_XSMI_WRITE_OPERATION   (0x5 << 27)
+
 /*
  * SMI Timeout measurements:
  * - Kirkwood 88F6281 (Globalscale Dreamplug): 45us to 95us (Interrupt)
@@ -50,6 +59,9 @@
 #define MVMDIO_SMI_POLL_INTERVAL_MIN   45
 #define MVMDIO_SMI_POLL_INTERVAL_MAX   55
 
+#define MVMDIO_XSMI_POLL_INTERVAL_MIN  150
+#define MVMDIO_XSMI_POLL_INTERVAL_MAX  160
+
 struct orion_mdio_dev {
struct mutex lock;
void __iomem *regs;
@@ -76,18 +88,19 @@ struct orion_mdio_ops {
void (*write)(struct orion_mdio_dev *, int, int, u16);
 };
 
-static int orion_mdio_smi_is_done(struct orion_mdio_dev *dev)
+/* smi */
+static int smi_is_done(struct orion_mdio_dev *dev)
 {
return !(readl(dev->regs) & MVMDIO_SMI_BUSY);
 }
 
-static int orion_mdio_smi_is_read_valid(struct orion_mdio_dev *dev)
+static int smi_is_read_valid(struct orion_mdio_dev *dev)
 {
return !(readl(dev->regs) & MVMDIO_SMI_READ_VALID);
 }
 
-static void orion_mdio_start_read_op(struct orion_mdio_dev *dev, int mii_id,
-int regnum)
+static void smi_start_read_op(struct orion_mdio_dev *dev, int mii_id,
+ int regnum)
 {
writel(((mii_id << MVMDIO_SMI_PHY_ADDR_SHIFT) |
(regnum << MVMDIO_SMI_PHY_REG_SHIFT)  |
@@ -95,13 +108,13 @@ static void orion_mdio_start_read_op(struct orion_mdio_dev 
*dev, int mii_id,
   dev->regs);
 }
 
-static u16 orion_mdio_read_op(struct orion_mdio_dev *dev)
+static u16 smi_read_op(struct orion_mdio_dev *dev)
 {
return readl(dev->regs) & GENMASK(15,0);
 }
 
-static void orion_mdio_write_op(struct orion_mdio_dev *dev, int mii_id,
-   int regnum, u16 value)
+static void smi_write_op(struct orion_mdio_dev *dev, int mii_id,
+int regnum, u16 value)
 {
writel(((mii_id << MVMDIO_SMI_PHY_ADDR_SHIFT) |
(regnum << MVMDIO_SMI_PHY_REG_SHIFT)  |
@@ -110,6 +123,47 @@ static void orion_mdio_write_op(struct orion_mdio_dev 
*dev, int mii_id,
   dev->regs);
 }
 
+/* xsmi */
+static int xsmi_is_done(struct orion_mdio_dev *dev)
+{
+   return !(readl(dev->regs + MVMDIO_XSMI_MGNT_REG) & MVMDIO_XSMI_BUSY);
+}
+
+static int xsmi_is_read_valid(struct orion_mdio_dev *dev)
+{
+   return !(readl(dev->regs + MVMDIO_XSMI_MGNT_REG) &
+MVMDIO_XSMI_READ_VALID);
+}
+
+static void xsmi_start_read_op(struct orion_mdio_dev *dev, int mii_id,
+ int regnum)
+{
+   u16 dev_addr = regnum >> 16;
+
+   writel(regnum & GENMASK(15,0), dev->regs + MVMDIO_XSMI_ADDR_REG);
+   writel((mii_id << MVMDIO_XSMI_PHYADDR_SHIFT) |
+  (dev_addr << MVMDIO_XSMI_DEVADDR_SHIFT) |
+  MVMDIO_XSMI_READ_OPERATION,
+  dev->regs + MVMDIO_XSMI_MGNT_REG);
+}
+
+static u16 xsmi_read_op(struct orion_mdio_dev *dev)
+{
+   return readl(dev->regs + MVMDIO_XSMI_MGNT_REG) & GENMASK(15,0);
+}
+
+static void xsmi_write

[PATCH 0/9] net: mvmdio: add xSMI support

2017-06-07 Thread Antoine Tenart
Hello,

This series aims to add the xSMI support (also called xMDIO) to the
mvmdio driver. The xSMI interface complies with the IEEE 802.3 clause 45
and is used by 10GbE devices. Such an interface is found on the Armada
7k and 8k (as of now), where it is used by the Ethernet controllers.

Patches 1-3 are cosmetic cleanups.

Patches 4-6 are prerequisites to the xSMI support.

Patches 7-9 add the xSMI support to the mvmdio driver, and a node is
added both in the cp110 slave and master device trees.

This was tested on an Armada 8040 mcbin, as well as on both the
Armada 7040 DB and the Armada 8040 DB to ensure the SMI interface
was still working.

Thanks,
Antoine

Antoine Tenart (9):
  net: mvmdio: reorder headers alphabetically
  net: mvmdio: use tabs for defines
  net: mvmdio: use GENMASK for masks
  net: mvmdio: move the read valid check into its own function
  net: mvmdio: introduce an ops structure
  net: mvmdio: put the poll intervals in the private structure
  net: mvmdio: add xmdio support
  dt-bindings: orion-mdio: document the new xmdio compatible
  arm64: marvell: dts: add xmdio nodes for 7k/8k

 .../devicetree/bindings/net/marvell-orion-mdio.txt |   8 +-
 .../boot/dts/marvell/armada-cp110-master.dtsi  |   7 +
 .../arm64/boot/dts/marvell/armada-cp110-slave.dtsi |   7 +
 drivers/net/ethernet/marvell/Kconfig   |   6 +-
 drivers/net/ethernet/marvell/mvmdio.c  | 200 +
 5 files changed, 184 insertions(+), 44 deletions(-)

-- 
2.9.4
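
To make the clause 45 addressing a bit more concrete, here is a rough
userspace sketch of how an xSMI read command could be assembled from a
combined register number. The shifts and the read opcode are the values
used in patch 7. The 5-bit mask on the device address is an addition made
for this example (clause 45 device addresses are 5 bits wide), so that any
flag bits a caller might keep above the device address do not leak into
the command word; the posted code shifts without masking. Function names
are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define XSMI_PHYADDR_SHIFT	16
#define XSMI_DEVADDR_SHIFT	21
#define XSMI_READ_OPERATION	(UINT32_C(0x7) << 26)

/* Low 16 bits of regnum: the register address within the device. */
uint16_t xsmi_reg_addr(uint32_t regnum)
{
	return (uint16_t)(regnum & 0xffff);
}

/* Build the management register word for a read. */
uint32_t xsmi_read_cmd(uint32_t phy_addr, uint32_t regnum)
{
	/* Assumption: device address lives in bits 20..16 of regnum. */
	uint32_t dev_addr = (regnum >> 16) & 0x1f;

	assert(phy_addr < 32);

	return (phy_addr << XSMI_PHYADDR_SHIFT) |
	       (dev_addr << XSMI_DEVADDR_SHIFT) |
	       XSMI_READ_OPERATION;
}
```

The register address goes to the address register first, then the command
word is written to the management register, as xsmi_start_read_op() does
in patch 7.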



[PATCH 6/9] net: mvmdio: put the poll intervals in the private structure

2017-06-07 Thread Antoine Tenart
Put the two poll intervals (min and max) in the driver's private
structure. This is needed to add the xmdio support later.

Signed-off-by: Antoine Tenart 
---
 drivers/net/ethernet/marvell/mvmdio.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvmdio.c 
b/drivers/net/ethernet/marvell/mvmdio.c
index 8a71ef93a61b..3cb3dd3331d8 100644
--- a/drivers/net/ethernet/marvell/mvmdio.c
+++ b/drivers/net/ethernet/marvell/mvmdio.c
@@ -63,6 +63,9 @@ struct orion_mdio_dev {
int err_interrupt;
wait_queue_head_t smi_busy_wait;
struct orion_mdio_ops *ops;
+
+   unsigned int poll_interval_min;
+   unsigned int poll_interval_max;
 };
 
 struct orion_mdio_ops {
@@ -123,8 +126,8 @@ static int orion_mdio_wait_ready(struct mii_bus *bus)
break;
 
if (dev->err_interrupt <= 0) {
-   usleep_range(MVMDIO_SMI_POLL_INTERVAL_MIN,
-MVMDIO_SMI_POLL_INTERVAL_MAX);
+   usleep_range(dev->poll_interval_min,
+dev->poll_interval_max);
 
if (time_is_before_jiffies(end))
++timedout;
@@ -279,6 +282,8 @@ static int orion_mdio_probe(struct platform_device *pdev)
if (!ops)
return -ENOMEM;
 
+   dev->poll_interval_min = MVMDIO_SMI_POLL_INTERVAL_MIN;
+   dev->poll_interval_max = MVMDIO_SMI_POLL_INTERVAL_MAX;
ops->is_done = orion_mdio_smi_is_done;
ops->is_read_valid = orion_mdio_smi_is_read_valid;
ops->start_read = orion_mdio_start_read_op;
-- 
2.9.4



[PATCH 9/9] arm64: marvell: dts: add xmdio nodes for 7k/8k

2017-06-07 Thread Antoine Tenart
Add the description of the xMDIO bus for the Marvell Armada 7k and
Marvell Armada 8k; for both CP110 slave and master. This bus is found
on Marvell Ethernet controllers and provides an interface with the
xMDIO bus.

Signed-off-by: Antoine Tenart 
---
 arch/arm64/boot/dts/marvell/armada-cp110-master.dtsi | 7 +++
 arch/arm64/boot/dts/marvell/armada-cp110-slave.dtsi  | 7 +++
 2 files changed, 14 insertions(+)

diff --git a/arch/arm64/boot/dts/marvell/armada-cp110-master.dtsi 
b/arch/arm64/boot/dts/marvell/armada-cp110-master.dtsi
index 037ed30d75a7..95953743455e 100644
--- a/arch/arm64/boot/dts/marvell/armada-cp110-master.dtsi
+++ b/arch/arm64/boot/dts/marvell/armada-cp110-master.dtsi
@@ -98,6 +98,13 @@
clocks = <&cpm_syscon0 1 9>, <&cpm_syscon0 1 5>;
};
 
+   cpm_xmdio: mdio@12a600 {
+   #address-cells = <1>;
+   #size-cells = <0>;
+   compatible = "marvell,xmdio";
+   reg = <0x12a600 0x10>;
+   };
+
cpm_icu: interrupt-controller@1e {
compatible = "marvell,cp110-icu"; 
reg = <0x1e 0x10>;
diff --git a/arch/arm64/boot/dts/marvell/armada-cp110-slave.dtsi 
b/arch/arm64/boot/dts/marvell/armada-cp110-slave.dtsi
index 2a99ff8fca2a..594356243ddb 100644
--- a/arch/arm64/boot/dts/marvell/armada-cp110-slave.dtsi
+++ b/arch/arm64/boot/dts/marvell/armada-cp110-slave.dtsi
@@ -103,6 +103,13 @@
clocks = <&cps_syscon0 1 9>, <&cps_syscon0 1 5>;
};
 
+   cps_xmdio: mdio@12a600 {
+   #address-cells = <1>;
+   #size-cells = <0>;
+   compatible = "marvell,xmdio";
+   reg = <0x12a600 0x10>;
+   };
+
cps_icu: interrupt-controller@1e {
compatible = "marvell,cp110-icu";
reg = <0x1e 0x10>;
-- 
2.9.4



[PATCH 1/2(net.git)] stmmac: fix ptp header for GMAC3 hw timestamp

2017-06-07 Thread Mario Molitor
From e87e1a88a3bf054c45156edf8a63d72169064baa Mon Sep 17 00:00:00 2001
From: Mario Molitor 
Date: Tue, 6 Jun 2017 21:31:02 +0200
Subject: [PATCH 1/2] stmmac: fix ptp header for GMAC3 hw timestamp

According to the Cyclone V documentation, only bit 16 of snaptypesel
should be set.
(For more information, see Table 17-20 in cv_5v4.pdf:
 Timestamp Snapshot Dependency on Register Bits)

fixed: commit d2042052a0aa ("stmmac: update the PTP header file")
Signed-off-by: Mario Molitor 
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 15 ---
 drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.h  |  3 ++-
 2 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c 
b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index a74c481..14d0bf3 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -546,7 +546,10 @@ static int stmmac_hwtstamp_ioctl(struct net_device *dev, 
struct ifreq *ifr)
/* PTP v1, UDP, any kind of event packet */
config.rx_filter = HWTSTAMP_FILTER_PTP_V1_L4_EVENT;
/* take time stamp for all event messages */
-   snap_type_sel = PTP_TCR_SNAPTYPSEL_1;
+   if (priv->plat->has_gmac4)
+   snap_type_sel = PTP_GMAC4_TCR_SNAPTYPSEL_1;
+   else
+   snap_type_sel = PTP_TCR_SNAPTYPSEL_1;
 
ptp_over_ipv4_udp = PTP_TCR_TSIPV4ENA;
ptp_over_ipv6_udp = PTP_TCR_TSIPV6ENA;
@@ -578,7 +581,10 @@ static int stmmac_hwtstamp_ioctl(struct net_device *dev, 
struct ifreq *ifr)
config.rx_filter = HWTSTAMP_FILTER_PTP_V2_L4_EVENT;
ptp_v2 = PTP_TCR_TSVER2ENA;
/* take time stamp for all event messages */
-   snap_type_sel = PTP_TCR_SNAPTYPSEL_1;
+   if (priv->plat->has_gmac4)
+   snap_type_sel = PTP_GMAC4_TCR_SNAPTYPSEL_1;
+   else
+   snap_type_sel = PTP_TCR_SNAPTYPSEL_1;
 
ptp_over_ipv4_udp = PTP_TCR_TSIPV4ENA;
ptp_over_ipv6_udp = PTP_TCR_TSIPV6ENA;
@@ -612,7 +618,10 @@ static int stmmac_hwtstamp_ioctl(struct net_device *dev, 
struct ifreq *ifr)
config.rx_filter = HWTSTAMP_FILTER_PTP_V2_EVENT;
ptp_v2 = PTP_TCR_TSVER2ENA;
/* take time stamp for all event messages */
-   snap_type_sel = PTP_TCR_SNAPTYPSEL_1;
+   if (priv->plat->has_gmac4)
+   snap_type_sel = PTP_GMAC4_TCR_SNAPTYPSEL_1;
+   else
+   snap_type_sel = PTP_TCR_SNAPTYPSEL_1;
 
ptp_over_ipv4_udp = PTP_TCR_TSIPV4ENA;
ptp_over_ipv6_udp = PTP_TCR_TSIPV6ENA;
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.h 
b/drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.h
index 48fb72f..f4b31d6 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.h
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.h
@@ -59,7 +59,8 @@
 /* Enable Snapshot for Messages Relevant to Master */
 #define PTP_TCR_TSMSTRENA   BIT(15)
 /* Select PTP packets for Taking Snapshots */
-#define PTP_TCR_SNAPTYPSEL_1    GENMASK(17, 16)
+#define PTP_TCR_SNAPTYPSEL_1    BIT(16)
+#define PTP_GMAC4_TCR_SNAPTYPSEL_1  GENMASK(17, 16)
 /* Enable MAC address for PTP Frame Filtering */
 #define PTP_TCR_TSENMACADDR BIT(18)
 
-- 
2.7.4
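
The difference between the two definitions is easy to check numerically:
BIT(16) is 0x10000, while GENMASK(17, 16) is 0x30000 and sets bit 17 as
well. The sketch below uses userspace stand-ins for the kernel macros and
a hypothetical helper, ptp_snap_type_sel_1(), which is not part of the
patch. It only shows how the if/else that the patch repeats in three
branches could be expressed once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's BIT() and GENMASK(), 32-bit only. */
#define BIT32(n)	(UINT32_C(1) << (n))
#define GENMASK32(h, l) \
	((UINT32_MAX << (l)) & (UINT32_MAX >> (31 - (h))))

#define PTP_TCR_SNAPTYPSEL_1		BIT32(16)
#define PTP_GMAC4_TCR_SNAPTYPSEL_1	GENMASK32(17, 16)

static_assert(PTP_TCR_SNAPTYPSEL_1 == 0x10000, "only bit 16");
static_assert(PTP_GMAC4_TCR_SNAPTYPSEL_1 == 0x30000, "bits 17 and 16");

/* Hypothetical helper: pick the snapshot type value for the core in use. */
uint32_t ptp_snap_type_sel_1(bool has_gmac4)
{
	return has_gmac4 ? PTP_GMAC4_TCR_SNAPTYPSEL_1 : PTP_TCR_SNAPTYPSEL_1;
}
```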



Re: [PATCH 9/9] arm64: marvell: dts: add xmdio nodes for 7k/8k

2017-06-07 Thread Gregory CLEMENT
Hi Dave,
 
On Wed, Jun 07 2017, Antoine Tenart wrote:

> Add the description of the xMDIO bus for the Marvell Armada 7k and
> Marvell Armada 8k; for both CP110 slave and master. This bus is found
> on Marvell Ethernet controllers and provides an interface with the
> xMDIO bus.
>

If you agree with this series, please don't apply this patch. I will
take care of it. We have many changes in the dt directory for the next
release and I want all the changes to be in the same place so that I can
take care of the merge conflicts.

Thanks,

Gregory

> Signed-off-by: Antoine Tenart 
> ---
>  arch/arm64/boot/dts/marvell/armada-cp110-master.dtsi | 7 +++
>  arch/arm64/boot/dts/marvell/armada-cp110-slave.dtsi  | 7 +++
>  2 files changed, 14 insertions(+)
>
> diff --git a/arch/arm64/boot/dts/marvell/armada-cp110-master.dtsi 
> b/arch/arm64/boot/dts/marvell/armada-cp110-master.dtsi
> index 037ed30d75a7..95953743455e 100644
> --- a/arch/arm64/boot/dts/marvell/armada-cp110-master.dtsi
> +++ b/arch/arm64/boot/dts/marvell/armada-cp110-master.dtsi
> @@ -98,6 +98,13 @@
>   clocks = <&cpm_syscon0 1 9>, <&cpm_syscon0 1 5>;
>   };
>  
> + cpm_xmdio: mdio@12a600 {
> + #address-cells = <1>;
> + #size-cells = <0>;
> + compatible = "marvell,xmdio";
> + reg = <0x12a600 0x10>;
> + };
> +
>   cpm_icu: interrupt-controller@1e {
>   compatible = "marvell,cp110-icu"; 
>   reg = <0x1e 0x10>;
> diff --git a/arch/arm64/boot/dts/marvell/armada-cp110-slave.dtsi 
> b/arch/arm64/boot/dts/marvell/armada-cp110-slave.dtsi
> index 2a99ff8fca2a..594356243ddb 100644
> --- a/arch/arm64/boot/dts/marvell/armada-cp110-slave.dtsi
> +++ b/arch/arm64/boot/dts/marvell/armada-cp110-slave.dtsi
> @@ -103,6 +103,13 @@
>   clocks = <&cps_syscon0 1 9>, <&cps_syscon0 1 5>;
>   };
>  
> + cps_xmdio: mdio@12a600 {
> + #address-cells = <1>;
> + #size-cells = <0>;
> + compatible = "marvell,xmdio";
> + reg = <0x12a600 0x10>;
> + };
> +
>   cps_icu: interrupt-controller@1e {
>   compatible = "marvell,cp110-icu";
>   reg = <0x1e 0x10>;
> -- 
> 2.9.4
>

-- 
Gregory Clement, Free Electrons
Kernel, drivers, real-time and embedded Linux
development, consulting, training and support.
http://free-electrons.com


[PATCH 2/2(net.git)] stmmac: fix for hw timestamp of GMAC3 unit

2017-06-07 Thread Mario Molitor
From 94f3314c6d67246e015a50fd14e665732a5e78b0 Mon Sep 17 00:00:00 2001
From: Mario Molitor 
Date: Tue, 6 Jun 2017 21:47:16 +0200
Subject: [PATCH 2/2] stmmac: fix for hw timestamp of GMAC3 unit

1.) Bugfix of function stmmac_get_tx_hwtstamp.
Corrected the tx timestamp available check (same as 4.8 and older)
Change printout from info syslevel to debug.

2.) Bugfix of function stmmac_get_rx_hwtstamp.
Corrected the rx timestamp available check (same as 4.8 and older)
Change printout from info syslevel to debug.

fixed: commit ba1ffd74df74 ("stmmac: fix PTP support for GMAC4")
Signed-off-by: Mario Molitor 
---
 drivers/net/ethernet/stmicro/stmmac/dwmac4_descs.c | 11 +++
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c  | 10 +-
 2 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4_descs.c b/drivers/net/ethernet/stmicro/stmmac/dwmac4_descs.c
index aa64764..6c2f956 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac4_descs.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4_descs.c
@@ -214,13 +214,13 @@ static int dwmac4_wrback_get_tx_timestamp_status(struct dma_desc *p)
 {
/* Context type from W/B descriptor must be zero */
if (le32_to_cpu(p->des3) & TDES3_CONTEXT_TYPE)
-   return -EINVAL;
+   return 0;
 
/* Tx Timestamp Status is 1 so des0 and des1'll have valid values */
if (le32_to_cpu(p->des3) & TDES3_TIMESTAMP_STATUS)
-   return 0;
+   return 1;
 
-   return 1;
+   return 0;
 }
 
 static inline u64 dwmac4_get_timestamp(void *desc, u32 ats)
@@ -282,7 +282,10 @@ static int dwmac4_wrback_get_rx_timestamp_status(void *desc, u32 ats)
}
}
 exit:
-   return ret;
+   if (likely(ret == 0))
+   return 1;
+   else
+   return 0;
 }
 
 static void dwmac4_rd_init_rx_desc(struct dma_desc *p, int disable_rx_ic,
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 14d0bf3..9eb8132 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -434,14 +434,14 @@ static void stmmac_get_tx_hwtstamp(struct stmmac_priv *priv,
return;
 
/* check tx tstamp status */
-   if (!priv->hw->desc->get_tx_timestamp_status(p)) {
+   if (priv->hw->desc->get_tx_timestamp_status(p)) {
/* get the valid tstamp */
ns = priv->hw->desc->get_timestamp(p, priv->adv_ts);
 
memset(&shhwtstamp, 0, sizeof(struct skb_shared_hwtstamps));
shhwtstamp.hwtstamp = ns_to_ktime(ns);
 
-   netdev_info(priv->dev, "get valid TX hw timestamp %llu\n", ns);
+   netdev_dbg(priv->dev, "get valid TX hw timestamp %llu\n", ns);
/* pass tstamp to stack */
skb_tstamp_tx(skb, &shhwtstamp);
}
@@ -468,19 +468,19 @@ static void stmmac_get_rx_hwtstamp(struct stmmac_priv *priv, struct dma_desc *p,
return;
 
/* Check if timestamp is available */
-   if (!priv->hw->desc->get_rx_timestamp_status(p, priv->adv_ts)) {
+   if (priv->hw->desc->get_rx_timestamp_status(p, priv->adv_ts)) {
/* For GMAC4, the valid timestamp is from CTX next desc. */
if (priv->plat->has_gmac4)
ns = priv->hw->desc->get_timestamp(np, priv->adv_ts);
else
ns = priv->hw->desc->get_timestamp(p, priv->adv_ts);
 
-   netdev_info(priv->dev, "get valid RX hw timestamp %llu\n", ns);
+   netdev_dbg(priv->dev, "get valid RX hw timestamp %llu\n", ns);
shhwtstamp = skb_hwtstamps(skb);
memset(shhwtstamp, 0, sizeof(struct skb_shared_hwtstamps));
shhwtstamp->hwtstamp = ns_to_ktime(ns);
} else  {
-   netdev_err(priv->dev, "cannot get RX hw timestamp\n");
+   netdev_dbg(priv->dev, "cannot get RX hw timestamp\n");
}
 }
 
-- 
2.7.4
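
For readers following the patch above: the fix flips the status helpers from "0 means a timestamp is available" to "non-zero means a timestamp is available", and the callers drop their negation to match. The following is a minimal userspace sketch of that convention, not driver code. The `demo_` names and the two bit positions are placeholders chosen for illustration and are assumptions, not values taken from the stmmac headers.

```c
#include <stdint.h>

/* Placeholder bit positions, assumed for illustration only. */
#define DEMO_CONTEXT_TYPE      (UINT32_C(1) << 30)
#define DEMO_TIMESTAMP_STATUS  (UINT32_C(1) << 17)

/* Returns 1 when the descriptor carries a valid TX timestamp, 0 otherwise. */
int demo_tx_timestamp_status(uint32_t des3)
{
	/* A context descriptor never carries a packet timestamp. */
	if (des3 & DEMO_CONTEXT_TYPE)
		return 0;

	if (des3 & DEMO_TIMESTAMP_STATUS)
		return 1;

	return 0;
}

/*
 * Caller side: a truthy status means "go read the timestamp".
 * Returns raw_ns when the status says it is valid, 0 otherwise.
 */
uint64_t demo_get_tx_hwtstamp(uint32_t des3, uint64_t raw_ns)
{
	if (demo_tx_timestamp_status(des3))
		return raw_ns;

	return 0;
}
```

With a boolean-style return value, the call site reads as a plain question ("is there a timestamp?") without a leading `!`.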



[PATCH v3 0/3] Add new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag

2017-06-07 Thread Ding Tianhong
Some devices have problems with Transaction Layer Packets with the Relaxed
Ordering Attribute set.  This patch set adds a new PCIe Device Flag,
PCI_DEV_FLAGS_NO_RELAXED_ORDERING, a set of PCI Quirks to catch some known
devices with Relaxed Ordering issues, and a use of this new flag by the
cxgb4 driver to avoid using Relaxed Ordering with problematic Root Complex
Ports.

It's been years since I've submitted kernel.org patches, so I apologise for the
almost certain submission errors.

v2: Alexander pointed out that v1 was only part of the whole solution.
A platform with known issues can use the new flag to indicate that it
is not safe to enable the Relaxed Ordering attribute, so the Relaxed
Ordering enable bit in PCI configuration space must be cleared when the
device is initialized. This version therefore adds a new second patch
that modifies the PCI initialization code to clear the enable bit when
the root complex does not want Relaxed Ordering enabled.

The third patch is based on v1's second patch. It is changed only to
query the Relaxed Ordering enable bit in PCI configuration space before
allowing the Chelsio NIC to send TLPs with the Relaxed Ordering
attribute set.

This version does not yet drop the defines in the Intel drivers in
favour of the new check for enabling Relaxed Ordering. That is not the
most pressing part at the moment, and it can be handled in a follow-up
patch set once these patches are accepted.

v3: Redesigned the logic of pci_configure_relaxed_ordering. If a PCIe
device does not have the Relaxed Ordering attribute enabled by default,
nothing is changed in its PCIe configuration. Otherwise we check whether
any device above it is marked with PCI_DEV_FLAGS_NO_RELAXED_ORDERING,
and if so we disable Relaxed Ordering in the device's configuration
space. If the upstream device does not exist or is not a PCIe device,
we skip the update entirely, because we are probably running in a guest.

Casey Leedom (2):
  PCI: Add new PCIe Fabric End Node flag,
PCI_DEV_FLAGS_NO_RELAXED_ORDERING
  net/cxgb4: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag

Ding Tianhong (1):
  PCI: Enable PCIe Relaxed Ordering if supported

 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h  |  1 +
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 17 ++
 drivers/net/ethernet/chelsio/cxgb4/sge.c|  5 +--
 drivers/pci/pci.c   | 29 +
 drivers/pci/probe.c | 43 +
 drivers/pci/quirks.c| 38 ++
 include/linux/pci.h |  4 +++
 7 files changed, 135 insertions(+), 2 deletions(-)

-- 
1.9.0
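
The v3 decision logic described in this cover letter can be modelled in a few lines of plain C. This is a hedged userspace sketch and not the kernel implementation: `struct demo_dev` and every `demo_` name are made up for illustration, and the boolean fields merely stand in for the real device flag and the config-space enable bit.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a device in a PCIe hierarchy (illustrative only). */
struct demo_dev {
	struct demo_dev *parent;   /* upstream bridge, NULL at the top */
	bool is_pcie;
	bool no_relaxed_ordering;  /* models PCI_DEV_FLAGS_NO_RELAXED_ORDERING */
	bool ro_enabled;           /* models the Relaxed Ordering enable bit */
};

/* True if the device or anything above it is flagged against RO. */
bool demo_path_forbids_ro(const struct demo_dev *dev)
{
	for (; dev; dev = dev->parent)
		if (dev->no_relaxed_ordering)
			return true;
	return false;
}

void demo_configure_ro(struct demo_dev *dev)
{
	/* No PCIe upstream bridge: probably a guest, leave things alone. */
	if (!dev->is_pcie || !dev->parent || !dev->parent->is_pcie)
		return;

	/* RO was not enabled by default: nothing to do. */
	if (!dev->ro_enabled)
		return;

	if (demo_path_forbids_ro(dev))
		dev->ro_enabled = false;
}
```

Note that the model only ever clears the enable bit. It never sets it, which matches the description above: a device that did not enable Relaxed Ordering by default is left untouched.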




[PATCH v3 1/3] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING

2017-06-07 Thread Ding Tianhong
From: Casey Leedom 

The new flag PCI_DEV_FLAGS_NO_RELAXED_ORDERING indicates that the Relaxed
Ordering Attribute should not be used on Transaction Layer Packets destined
for the PCIe End Node so flagged.  Initially flagged this way are Intel
E5-26xx Root Complex Ports which suffer from a Flow Control Credit
Performance Problem and AMD A1100 ARM ("SEATTLE") Root Complex Ports which
don't obey PCIe 3.0 ordering rules which can lead to Data Corruption.

Signed-off-by: Casey Leedom 
Signed-off-by: Ding Tianhong 
---
 drivers/pci/quirks.c | 38 ++
 include/linux/pci.h  |  2 ++
 2 files changed, 40 insertions(+)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 085fb78..58bdd23 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -3999,6 +3999,44 @@ static void quirk_tw686x_class(struct pci_dev *pdev)
  quirk_tw686x_class);
 
 /*
+ * Some devices have problems with Transaction Layer Packets with the Relaxed
+ * Ordering Attribute set.  Such devices should mark themselves and other
+ * Device Drivers should check before sending TLPs with RO set.
+ */
+static void quirk_relaxedordering_disable(struct pci_dev *dev)
+{
+   dev->dev_flags |= PCI_DEV_FLAGS_NO_RELAXED_ORDERING;
+}
+
+/*
+ * Intel E5-26xx Root Complex has a Flow Control Credit issue which can
+ * cause performance problems with Upstream Transaction Layer Packets with
+ * Relaxed Ordering set.
+ */
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f02, PCI_CLASS_NOT_DEFINED, 8,
+ quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f04, PCI_CLASS_NOT_DEFINED, 8,
+ quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f08, PCI_CLASS_NOT_DEFINED, 8,
+ quirk_relaxedordering_disable);
+
+/*
+ * The AMD ARM A1100 (AKA "SEATTLE") SoC has a bug in its PCIe Root Complex
+ * where Upstream Transaction Layer Packets with the Relaxed Ordering
+ * Attribute clear are allowed to bypass earlier TLPs with Relaxed Ordering
+ * set.  This is a violation of the PCIe 3.0 Transaction Ordering Rules
+ * outlined in Section 2.4.1 (PCI Express(r) Base Specification Revision 3.0
+ * November 10, 2010).  As a result, on this platform we can't use Relaxed
+ * Ordering for Upstream TLPs.
+ */
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_AMD, 0x1a00, PCI_CLASS_NOT_DEFINED, 8,
+ quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_AMD, 0x1a01, PCI_CLASS_NOT_DEFINED, 8,
+ quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_AMD, 0x1a02, PCI_CLASS_NOT_DEFINED, 8,
+ quirk_relaxedordering_disable);
+
+/*
  * Per PCIe r3.0, sec 2.2.9, "Completion headers must supply the same
  * values for the Attribute as were supplied in the header of the
  * corresponding Request, except as explicitly allowed when IDO is used."
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 33c2b0b..e1e8428 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -183,6 +183,8 @@ enum pci_dev_flags {
PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT = (__force pci_dev_flags_t) (1 << 9),
/* Do not use FLR even if device advertises PCI_AF_CAP */
PCI_DEV_FLAGS_NO_FLR_RESET = (__force pci_dev_flags_t) (1 << 10),
+   /* Don't use Relaxed Ordering for TLPs directed at this device */
+   PCI_DEV_FLAGS_NO_RELAXED_ORDERING = (__force pci_dev_flags_t) (1 << 11),
 };
 
 enum pci_irq_reroute_variant {
-- 
1.9.0




[PATCH v3 2/3] PCI: Enable PCIe Relaxed Ordering if supported

2017-06-07 Thread Ding Tianhong
Bit 4 of the PCIe Device Control register indicates whether the device
is permitted to use Relaxed Ordering. Relaxed Ordering is not safe on
some platforms that can only handle strongly ordered writes, and devices
are allowed (but not required) to have the enable bit set by default.

If a PCIe device does not have the Relaxed Ordering attribute enabled by
default, we do not change anything in its PCIe configuration. Otherwise
we check whether any device above it is marked with
PCI_DEV_FLAGS_NO_RELAXED_ORDERING, and if so we update our device to
disable Relaxed Ordering in configuration space. If the device above us
does not exist or is not a PCIe device, we skip the update entirely,
because we are probably running in a guest.

Signed-off-by: Ding Tianhong 
---
 drivers/pci/pci.c   | 29 +
 drivers/pci/probe.c | 43 +++
 include/linux/pci.h |  2 ++
 3 files changed, 74 insertions(+)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index b01bd5b..3d42b38 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4878,6 +4878,35 @@ int pcie_set_mps(struct pci_dev *dev, int mps)
 EXPORT_SYMBOL(pcie_set_mps);
 
 /**
+ * pcie_clear_relaxed_ordering - clear PCI Express relaxed ordering bit
+ * @dev: PCI device to query
+ *
+ * If possible clear relaxed ordering
+ */
+int pcie_clear_relaxed_ordering(struct pci_dev *dev)
+{
+   return pcie_capability_clear_word(dev, PCI_EXP_DEVCTL,
+ PCI_EXP_DEVCTL_RELAX_EN);
+}
+EXPORT_SYMBOL(pcie_clear_relaxed_ordering);
+
+/**
+ * pcie_get_relaxed_ordering - check PCI Express relaxed ordering bit
+ * @dev: PCI device to query
+ *
+ * Returns 1 if the relaxed ordering enable bit is set, 0 otherwise
+ */
+int pcie_get_relaxed_ordering(struct pci_dev *dev)
+{
+   u16 v;
+
+   pcie_capability_read_word(dev, PCI_EXP_DEVCTL, &v);
+
+   return (v & PCI_EXP_DEVCTL_RELAX_EN) >> 4;
+}
+EXPORT_SYMBOL(pcie_get_relaxed_ordering);
+
+/**
  * pcie_get_minimum_link - determine minimum link settings of a PCI device
  * @dev: PCI device to query
  * @speed: storage for minimum speed
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 19c8950..0c94c80 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1701,6 +1701,48 @@ static void pci_configure_extended_tags(struct pci_dev *dev)
 PCI_EXP_DEVCTL_EXT_TAG);
 }
 
+/**
+ * pci_dev_disable_relaxed_ordering - check if the PCI device
+ * should disable the relaxed ordering attribute.
+ * @dev: PCI device
+ *
+ * Return true if any of the PCI devices above us do not support
+ * relaxed ordering.
+ */ 
+static int pci_dev_disable_relaxed_ordering(struct pci_dev *dev)
+{
+   int ro_disabled = 0;
+
+   while(dev) {
+   if (dev->dev_flags & PCI_DEV_FLAGS_NO_RELAXED_ORDERING) {
+   ro_disabled = 1;
+   break;
+   }
+   dev = dev->bus->self;
+   }
+
+   return ro_disabled;
+}
+
+static void pci_configure_relaxed_ordering(struct pci_dev *dev)
+{
+   struct pci_dev *bridge = pci_upstream_bridge(dev);
+   int origin_ero;
+
+   if (!pci_is_pcie(dev) || !bridge || !pci_is_pcie(bridge))
+   return;
+
+   origin_ero = pcie_get_relaxed_ordering(dev);
+   /* If the relaxed ordering enable bit is not set, do nothing. */
+   if (!origin_ero)
+   return;
+
+   if (pci_dev_disable_relaxed_ordering(dev)) {
+   pcie_clear_relaxed_ordering(dev);
+   dev_info(&dev->dev, "Disable Relaxed Ordering\n");
+   }
+}
+
 static void pci_configure_device(struct pci_dev *dev)
 {
struct hotplug_params hpp;
@@ -1708,6 +1750,7 @@ static void pci_configure_device(struct pci_dev *dev)
 
pci_configure_mps(dev);
pci_configure_extended_tags(dev);
+   pci_configure_relaxed_ordering(dev);
 
memset(&hpp, 0, sizeof(hpp));
ret = pci_get_hp_params(dev, &hpp);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index e1e8428..299d2f3 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1105,6 +1105,8 @@ int __pci_enable_wake(struct pci_dev *dev, pci_power_t state,
 void pci_pme_wakeup_bus(struct pci_bus *bus);
 void pci_d3cold_enable(struct pci_dev *dev);
 void pci_d3cold_disable(struct pci_dev *dev);
+int pcie_clear_relaxed_ordering(struct pci_dev *dev);
+int pcie_get_relaxed_ordering(struct pci_dev *dev);
 
 static inline int pci_enable_wake(struct pci_dev *dev, pci_power_t state,
  bool enable)
-- 
1.9.0
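
As a small illustration of the two helpers in this patch, here is a userspace sketch that operates on a plain 16-bit copy of the Device Control register rather than on real configuration space. The `demo_` names are invented for this note. The mask value 0x0010 corresponds to bit 4, which is what the `>> 4` in the patch relies on.

```c
#include <stdint.h>

/* Enable Relaxed Ordering is bit 4 of the Device Control register. */
#define DEMO_DEVCTL_RELAX_EN UINT16_C(0x0010)

/* Returns 1 if the enable bit is set in the given register value, else 0. */
int demo_get_relaxed_ordering(uint16_t devctl)
{
	return (devctl & DEMO_DEVCTL_RELAX_EN) >> 4;
}

/* Returns the register value with only the enable bit cleared. */
uint16_t demo_clear_relaxed_ordering(uint16_t devctl)
{
	return (uint16_t)(devctl & ~DEMO_DEVCTL_RELAX_EN);
}
```

For example, a register value of 0x2810 has the bit set, and clearing it yields 0x2800 with all other fields preserved.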




[PATCH v3 3/3] net/cxgb4: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag

2017-06-07 Thread Ding Tianhong
From: Casey Leedom 

cxgb4 Ethernet driver now queries Root Complex Port to determine if it can
send TLPs to it with the Relaxed Ordering Attribute set.

Signed-off-by: Casey Leedom 
Signed-off-by: Ding Tianhong 
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h  |  1 +
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 17 +
 drivers/net/ethernet/chelsio/cxgb4/sge.c|  5 +++--
 3 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
index e88c180..478f25a 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
@@ -521,6 +521,7 @@ enum { /* adapter flags */
USING_SOFT_PARAMS  = (1 << 6),
MASTER_PF  = (1 << 7),
FW_OFLD_CONN   = (1 << 9),
+   ROOT_NO_RELAXED_ORDERING = (1 << 10),
 };
 
 enum {
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index 38a5c67..fbfe341 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -4628,6 +4628,7 @@ static int init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 #ifdef CONFIG_PCI_IOV
u32 v, port_vec;
 #endif
+   struct pci_dev *root;
 
printk_once(KERN_INFO "%s - version %s\n", DRV_DESC, DRV_VERSION);
 
@@ -4726,6 +4727,22 @@ static int init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
adapter->msg_enable = DFLT_MSG_ENABLE;
memset(adapter->chan_map, 0xff, sizeof(adapter->chan_map));
 
+   /* If possible, we use PCIe Relaxed Ordering Attribute to deliver
+* Ingress Packet Data to Free List Buffers in order to allow for
+* chipset performance optimizations between the Root Complex and
+* Memory Controllers.  (Messages to the associated Ingress Queue
+* notifying new Packet Placement in the Free Lists Buffers will be
+* sent without the Relaxed Ordering Attribute thus guaranteeing that
+* all preceding PCIe Transaction Layer Packets will be processed
+* first.)  But some Root Complexes have various issues with Upstream
+* Transaction Layer Packets with the Relaxed Ordering Attribute set.
+* So we check our Root Complex to see if it's flagged with advice
+* against using Relaxed Ordering.
+*/
+   root = pci_find_pcie_root_port(adapter->pdev);
+   if (pcie_get_relaxed_ordering(root))
+   adapter->flags |= ROOT_NO_RELAXED_ORDERING;
+
spin_lock_init(&adapter->stats_lock);
spin_lock_init(&adapter->tid_release_lock);
spin_lock_init(&adapter->win0_lock);
diff --git a/drivers/net/ethernet/chelsio/cxgb4/sge.c b/drivers/net/ethernet/chelsio/cxgb4/sge.c
index f05f0d4..ac229a3 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/sge.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/sge.c
@@ -2571,6 +2571,7 @@ int t4_sge_alloc_rxq(struct adapter *adap, struct sge_rspq *iq, bool fwevtq,
struct fw_iq_cmd c;
struct sge *s = &adap->sge;
struct port_info *pi = netdev_priv(dev);
+   int relaxed = !(adap->flags & ROOT_NO_RELAXED_ORDERING);
 
/* Size needs to be multiple of 16, including status entry. */
iq->size = roundup(iq->size, 16);
@@ -2624,8 +2625,8 @@ int t4_sge_alloc_rxq(struct adapter *adap, struct sge_rspq *iq, bool fwevtq,
 
flsz = fl->size / 8 + s->stat_len / sizeof(struct tx_desc);
c.iqns_to_fl0congen |= htonl(FW_IQ_CMD_FL0PACKEN_F |
-FW_IQ_CMD_FL0FETCHRO_F |
-FW_IQ_CMD_FL0DATARO_F |
+FW_IQ_CMD_FL0FETCHRO_V(relaxed) |
+FW_IQ_CMD_FL0DATARO_V(relaxed) |
 FW_IQ_CMD_FL0PADEN_F);
if (cong >= 0)
c.iqns_to_fl0congen |=
-- 
1.9.0
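
The last hunk swaps the unconditional `_F` flag macros for the `_V(x)` value macros, so the two Relaxed Ordering bits are only set when `relaxed` is 1. A minimal sketch of that macro pattern follows. The shift value and all `DEMO_` names are hypothetical and chosen for illustration; the real field positions live in the driver's firmware API header and are not reproduced here. Only the `(1 << 10)` adapter flag value is taken from the patch.

```c
#include <stdint.h>

/* Hypothetical field position, assumed for this sketch only. */
#define DEMO_FL0DATARO_S     12
#define DEMO_FL0DATARO_V(x)  ((uint32_t)(x) << DEMO_FL0DATARO_S)
#define DEMO_FL0DATARO_F     DEMO_FL0DATARO_V(1U)

/* Same bit as ROOT_NO_RELAXED_ORDERING in the patch above. */
#define DEMO_ROOT_NO_RELAXED_ORDERING (1U << 10)

/* Returns the RO field bits to OR into the command word. */
uint32_t demo_fl_data_ro_bits(uint32_t adapter_flags)
{
	/* relaxed is 1 unless the root port was flagged against RO. */
	int relaxed = !(adapter_flags & DEMO_ROOT_NO_RELAXED_ORDERING);

	/* _V(0) contributes no bits, _V(1) is the same as _F. */
	return DEMO_FL0DATARO_V(relaxed);
}
```

The design point is that `_V(relaxed)` keeps the expression branch-free: the command word is built the same way in both cases and only the field value differs.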




[PATCH net-next] cxgb4: Fix tids count for ipv6 offload connection

2017-06-07 Thread Ganesh Goudar
The adapter consumes two TIDs for every IPv6 offload connection, whether
active or passive, so calculate the TID usage count accordingly.

Also change the signatures of the relevant functions so that they
receive the address family.

Signed-off-by: Rizwan Ansari 
Signed-off-by: Varun Prakash 
Signed-off-by: Ganesh Goudar 
---
 drivers/infiniband/hw/cxgb4/cm.c   | 14 +---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c | 12 +--
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c| 38 +++---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h | 24 ++
 drivers/scsi/cxgbi/cxgb4i/cxgb4i.c |  8 +++--
 drivers/target/iscsi/cxgbit/cxgbit_cm.c|  6 ++--
 6 files changed, 71 insertions(+), 31 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb4/cm.c b/drivers/infiniband/hw/cxgb4/cm.c
index 0910faf..b0ae4f0 100644
--- a/drivers/infiniband/hw/cxgb4/cm.c
+++ b/drivers/infiniband/hw/cxgb4/cm.c
@@ -398,7 +398,8 @@ void _c4iw_free_ep(struct kref *kref)
(const u32 *)&sin6->sin6_addr.s6_addr,
1);
}
-   cxgb4_remove_tid(ep->com.dev->rdev.lldi.tids, 0, ep->hwtid);
+   cxgb4_remove_tid(ep->com.dev->rdev.lldi.tids, 0, ep->hwtid,
+ep->com.local_addr.ss_family);
dst_release(ep->dst);
cxgb4_l2t_release(ep->l2t);
if (ep->mpa_skb)
@@ -1199,7 +1200,7 @@ static int act_establish(struct c4iw_dev *dev, struct sk_buff *skb)
 
/* setup the hwtid for this connection */
ep->hwtid = tid;
-   cxgb4_insert_tid(t, ep, tid);
+   cxgb4_insert_tid(t, ep, tid, ep->com.local_addr.ss_family);
insert_ep_tid(ep);
 
ep->snd_seq = be32_to_cpu(req->snd_isn);
@@ -2304,7 +2305,8 @@ static int act_open_rpl(struct c4iw_dev *dev, struct sk_buff *skb)
   (const u32 *)&sin6->sin6_addr.s6_addr, 1);
}
if (status && act_open_has_tid(status))
-   cxgb4_remove_tid(ep->com.dev->rdev.lldi.tids, 0, GET_TID(rpl));
+   cxgb4_remove_tid(ep->com.dev->rdev.lldi.tids, 0, GET_TID(rpl),
+ep->com.local_addr.ss_family);
 
remove_handle(ep->com.dev, &ep->com.dev->atid_idr, atid);
cxgb4_free_atid(t, atid);
@@ -2581,7 +2583,8 @@ static int pass_accept_req(struct c4iw_dev *dev, struct sk_buff *skb)
 child_ep->tx_chan, child_ep->smac_idx, child_ep->rss_qid);
 
init_timer(&child_ep->timer);
-   cxgb4_insert_tid(t, child_ep, hwtid);
+   cxgb4_insert_tid(t, child_ep, hwtid,
+child_ep->com.local_addr.ss_family);
insert_ep_tid(child_ep);
if (accept_cr(child_ep, skb, req)) {
c4iw_put_ep(&parent_ep->com);
@@ -2849,7 +2852,8 @@ static int peer_abort(struct c4iw_dev *dev, struct sk_buff *skb)
1);
}
remove_handle(ep->com.dev, &ep->com.dev->hwtid_idr, ep->hwtid);
-   cxgb4_remove_tid(ep->com.dev->rdev.lldi.tids, 0, ep->hwtid);
+   cxgb4_remove_tid(ep->com.dev->rdev.lldi.tids, 0, ep->hwtid,
+ep->com.local_addr.ss_family);
dst_release(ep->dst);
cxgb4_l2t_release(ep->l2t);
c4iw_reconnect(ep);
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c
index 1fa34b0..00044d7 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c
@@ -2669,6 +2669,8 @@ static int tid_info_show(struct seq_file *seq, void *v)
 
if (t4_read_reg(adap, LE_DB_CONFIG_A) & HASHEN_F) {
unsigned int sb;
+   seq_printf(seq, "Connections in use: %u\n",
+  atomic_read(&t->conns_in_use));
 
if (chip <= CHELSIO_T5)
sb = t4_read_reg(adap, LE_DB_SERVER_INDEX_A) / 4;
@@ -2699,17 +2701,23 @@ static int tid_info_show(struct seq_file *seq, void *v)
   atomic_read(&t->hash_tids_in_use));
}
} else if (t->ntids) {
+   seq_printf(seq, "Connections in use: %u\n",
+  atomic_read(&t->conns_in_use));
+
seq_printf(seq, "TID range: 0..%u", t->ntids - 1);
seq_printf(seq, ", in use: %u\n",
   atomic_read(&t->tids_in_use));
}
 
if (t->nstids)
-   seq_printf(seq, "STID range: %u..%u, in use: %u\n",
+   seq_printf(seq, "STID range: %u..%u, in use-IPv4/IPv6: %u/%u\n",
   (!t->stid_base &&
   (chip <= CHELSIO_T5)) ?
   t->stid_base + 1 : t->stid_base,
-  t->stid_base + t->nstids - 1, t

Re: [PATCH 4/9] net: mvmdio: move the read valid check into its own function

2017-06-07 Thread Sergei Shtylyov

Hello!

On 6/7/2017 11:38 AM, Antoine Tenart wrote:


Move the read valid check in its own function. This is needed as a
requirement to factorize the driver to add the xMDIO support in the
future.

Signed-off-by: Antoine Tenart 
---
 drivers/net/ethernet/marvell/mvmdio.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvmdio.c b/drivers/net/ethernet/marvell/mvmdio.c
index 96af8d57d9e5..56bbe3990590 100644
--- a/drivers/net/ethernet/marvell/mvmdio.c
+++ b/drivers/net/ethernet/marvell/mvmdio.c
@@ -69,6 +69,11 @@ static int orion_mdio_smi_is_done(struct orion_mdio_dev *dev)
return !(readl(dev->regs) & MVMDIO_SMI_BUSY);
 }

+static int orion_mdio_smi_is_read_valid(struct orion_mdio_dev *dev)
+{
+   return !(readl(dev->regs) & MVMDIO_SMI_READ_VALID);
+}
+
 /* Wait for the SMI unit to be ready for another operation
  */
 static int orion_mdio_wait_ready(struct mii_bus *bus)
@@ -113,7 +118,6 @@ static int orion_mdio_read(struct mii_bus *bus, int mii_id,
   int regnum)
 {
struct orion_mdio_dev *dev = bus->priv;
-   u32 val;
int ret;

mutex_lock(&dev->lock);
@@ -131,14 +135,13 @@ static int orion_mdio_read(struct mii_bus *bus, int mii_id,
if (ret < 0)
goto out;

-   val = readl(dev->regs);
-   if (!(val & MVMDIO_SMI_READ_VALID)) {
+   if (orion_mdio_smi_is_read_valid(dev)) {
dev_err(bus->parent, "SMI bus read not valid\n");


   I think you reversed the valid/invalid sense in the new function's name.


ret = -ENODEV;
goto out;
}


[...]

MBR, Sergei
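
To make the naming point concrete, here is a hedged userspace sketch of a helper whose return value matches its name, together with a caller that treats "not valid" as the error case. It works on a plain register value instead of `readl()`. The `demo_` names, the bit position and the data mask are assumptions for illustration and are not taken from the driver.

```c
#include <errno.h>
#include <stdint.h>

/* Placeholder bit and mask, assumed for illustration only. */
#define DEMO_SMI_READ_VALID (UINT32_C(1) << 27)
#define DEMO_SMI_DATA_MASK  UINT32_C(0xffff)

/* Non-zero when the register reports a valid read, as the name says. */
int demo_smi_is_read_valid(uint32_t reg)
{
	return !!(reg & DEMO_SMI_READ_VALID);
}

/* Returns the 16-bit data on success or -ENODEV when the read is invalid. */
int demo_smi_read(uint32_t reg)
{
	if (!demo_smi_is_read_valid(reg))
		return -ENODEV;

	return (int)(reg & DEMO_SMI_DATA_MASK);
}
```

With this polarity the error path reads as `if (!is_read_valid(...))`, which is the sense the review comment asks for.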



Re: [PATCH v2] xfrm: fix xfrm_dev_event() missing when compile without CONFIG_XFRM_OFFLOAD

2017-06-07 Thread Steffen Klassert
On Tue, Jun 06, 2017 at 05:26:01PM +0800, Hangbin Liu wrote:
> On Tue, Jun 06, 2017 at 10:06:58AM +0200, Steffen Klassert wrote:
> > On Thu, Jun 01, 2017 at 02:57:56PM +0800, Hangbin Liu wrote:
> > > In commit d77e38e612a0 ("xfrm: Add an IPsec hardware offloading API") we
> > > make xfrm_device.o only compiled when enable option CONFIG_XFRM_OFFLOAD.
> > > But this will make xfrm_dev_event() missing if we only enable default XFRM
> > > options.
> > > 
> > > Then if we set down and unregister an interface with IPsec on it.
> > 
> > You should not be able to register an interface with IPsec offload
> > without CONFIG_XFRM_OFFLOAD.
> 
> Yes, I mean when compile with default CONFIG_XFRM, the xfrm_dev_event() ->
> xfrm_dev_down() -> xfrm_garbage_collect() will missing.

Ok, I see what you mean now. Thanks for the explanation!


Re: [PATCH v2] xfrm: fix xfrm_dev_event() missing when compile without CONFIG_XFRM_OFFLOAD

2017-06-07 Thread Steffen Klassert
On Thu, Jun 01, 2017 at 02:57:56PM +0800, Hangbin Liu wrote:
> In commit d77e38e612a0 ("xfrm: Add an IPsec hardware offloading API") we
> make xfrm_device.o only compiled when enable option CONFIG_XFRM_OFFLOAD.
> But this will make xfrm_dev_event() missing if we only enable default XFRM
> options.
> 
> Then if we set down and unregister an interface with IPsec on it. there
> will no xfrm_garbage_collect(), which will cause dev usage count hold and
> get error like:
> 
> unregister_netdevice: waiting for  to become free. Usage count = 4
> 
> Fixes: d77e38e612a0 ("xfrm: Add an IPsec hardware offloading API")
> Signed-off-by: Hangbin Liu 

Applied, thanks Hangbin!


Re: [PATCH v2 2/2] xfrm: add UDP encapsulation port in migrate message

2017-06-07 Thread Steffen Klassert
On Tue, Jun 06, 2017 at 12:12:14PM +0200, Antony Antony wrote:
> Add XFRMA_ENCAP, UDP encapsulation port, to km_migrate announcement
> to userland. Only add if XFRMA_ENCAP was in user migrate request.
> 
> Signed-off-by: Antony Antony 
> Reviewed-by: Richard Guy Briggs 

Both patches applied to ipsec-next, thanks a lot!


RE: [PATCH v2 4/4] net: macb: Add hardware PTP support

2017-06-07 Thread Rafal Ozieblo
> From: Richard Cochran [mailto:richardcoch...@gmail.com]
> Sent: 6 czerwca 2017 20:34
> To: Rafal Ozieblo 
> Cc: David Miller ; nicolas.fe...@atmel.com;
> netdev@vger.kernel.org; linux-ker...@vger.kernel.org;
> devicet...@vger.kernel.org; linux-arm-ker...@lists.infradead.org;
> harini.kata...@xilinx.com; andrei.pistir...@microchip.com
> Subject: Re: [PATCH v2 4/4] net: macb: Add hardware PTP support
> 
> On Fri, Jun 02, 2017 at 03:28:10PM +0100, Rafal Ozieblo wrote:
> > +static void gem_ptp_clear_timer(struct macb *bp)
> > +{
> > +   bp->tsu_incr.ns = 0;
> > +   bp->tsu_incr.sub_ns = 0;
> 
> What is the point of this function?
Clearing all of the registers below will stop the hardware PTP clock.

> 
> > +   gem_writel(bp, TISUBN, GEM_BF(SUBNSINCR, 0));
> > +   gem_writel(bp, TI, GEM_BF(NSINCR, 0));
> > +   gem_writel(bp, TA, 0);
> > +}
(...)
> > +int gem_ptp_txstamp(struct macb_queue *queue, struct sk_buff *skb,
> > +   struct macb_dma_desc *desc)
> > +{
> > +   struct gem_tx_ts *tx_timestamp;
> > +   struct macb_dma_desc_ptp *desc_ptp;
> > +   unsigned long head = queue->tx_ts_head;
> > +   unsigned long tail = READ_ONCE(queue->tx_ts_tail);
> > +
> > +   if (!GEM_BFEXT(DMA_TXVALID, desc->ctrl))
> > +   return -EINVAL;
> > +
> > +   if (CIRC_SPACE(head, tail, PTP_TS_BUFFER_SIZE) == 0)
> > +   return -ENOMEM;
> > +
> > +   skb_shinfo(skb)->tx_flags |= SKBTX_IN_PROGRESS;
> > +   desc_ptp = macb_ptp_desc(queue->bp, desc);
> > +   tx_timestamp = &queue->tx_timestamps[head];
> > +   tx_timestamp->skb = skb;
> > +   tx_timestamp->desc_ptp.ts_1 = desc_ptp->ts_1;
> > +   tx_timestamp->desc_ptp.ts_2 = desc_ptp->ts_2;
> > +   /* move head */
> > +   smp_store_release(&queue->tx_ts_head,
> > + (head + 1) & (PTP_TS_BUFFER_SIZE - 1));
> > +
> > +   schedule_work(&queue->tx_ts_task);
> 
> Since the time stamp is in the buffer descriptor, why delay the
> delivery via the work item?

I commented on this a few months ago:
https://patchwork.ozlabs.org/patch/705629/
Let me quote the part about what goes wrong when it is not done via a worker:

" I think, you can not do it in that way. 
It would hold two locks. If you enable the appropriate kernel option (as far as I
remember, CONFIG_DEBUG_SPINLOCK), you will get a warning here.

Please look at following call-stack:

1. macb_interrupt()   // spin_lock(&bp->lock) is taken
2. macb_tx_interrupt()
3. macb_handle_txtstamp()
4. skb_tstamp_tx()
5. __skb_tstamp_tx()
6. skb_may_tx_timestamp()
7. read_lock_bh() // second lock is taken

I know that those are different locks of different types, but this could still
lead to deadlocks, which is the reason for the warning I saw.
It is also why I fetch the timestamp in the interrupt routine but pass it to
the skb outside the interrupt (using a circular buffer).

Please, refer to this:
https://lkml.org/lkml/2016/11/18/168

1. macb_tx_interrupt()
2. macb_tx_timestamp_add() and schedule_work(&queue->tx_timestamp_task)

Then, outside interrupt (without holding a lock) :
1. macb_tx_timestamp_flush()
2. macb_tstamp_tx()
3. skb_tstamp_tx()"
> 
> > +   return 0;
> > +}
(...)
> > +void gem_ptp_remove(struct net_device *ndev)
> > +{
> > +   struct macb *bp = netdev_priv(ndev);
> > +
> > +   if (bp->ptp_clock)
> > +   ptp_clock_unregister(bp->ptp_clock);
> > +
> > +   gem_ptp_clear_timer(bp);
> 
> Why is this 'clear' needed?
To stop the hardware PTP clock.
> 
> > +   dev_info(&bp->pdev->dev, "%s ptp clock unregistered.\n",
> > +GEM_PTP_TIMER_NAME);
> > +}
> 
> Thanks,
> Richard

I'll correct all other issues.

Regards,
Rafal
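
The hand-off discussed in this reply (record the timestamp in the interrupt path, deliver it later from a worker) rests on a power-of-two circular buffer. Below is a minimal single-threaded userspace sketch of that structure. It deliberately leaves out the memory barriers (`smp_store_release`, `READ_ONCE`) that the quoted patch uses, so it illustrates only the index arithmetic and is not safe for concurrent use. The `DEMO_` macros mirror the usual count and space calculations for such a ring; all names are invented for this note.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_RING_SIZE 8 /* must be a power of two */

/* Number of filled slots, and number of free slots (one is kept empty). */
#define DEMO_CIRC_CNT(head, tail, size)   (((head) - (tail)) & ((size) - 1))
#define DEMO_CIRC_SPACE(head, tail, size) DEMO_CIRC_CNT((tail), ((head) + 1), (size))

struct demo_ring {
	uint64_t slots[DEMO_RING_SIZE];
	unsigned long head; /* advanced by the producer ("interrupt") */
	unsigned long tail; /* advanced by the consumer ("worker") */
};

/* Producer: store one timestamp; returns -1 when the ring is full. */
int demo_ring_put(struct demo_ring *r, uint64_t ts)
{
	if (DEMO_CIRC_SPACE(r->head, r->tail, DEMO_RING_SIZE) == 0)
		return -1;

	r->slots[r->head] = ts;
	r->head = (r->head + 1) & (DEMO_RING_SIZE - 1);
	return 0;
}

/* Consumer: drain up to max entries in FIFO order; returns the count. */
size_t demo_ring_flush(struct demo_ring *r, uint64_t *out, size_t max)
{
	size_t n = 0;

	while (n < max && DEMO_CIRC_CNT(r->head, r->tail, DEMO_RING_SIZE) > 0) {
		out[n++] = r->slots[r->tail];
		r->tail = (r->tail + 1) & (DEMO_RING_SIZE - 1);
	}
	return n;
}
```

Because one slot always stays empty to distinguish "full" from "empty", a ring of size 8 holds at most 7 entries. The producer only writes `head` and the consumer only writes `tail`, which is what lets the real code get away without taking a lock around the ring itself.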


[net-next] vxlan: use a more suitable function when assigning NULL

2017-06-07 Thread Mark Bloch
When stopping the vxlan interface we detach it from the socket.
Use RCU_INIT_POINTER() and not rcu_assign_pointer() to do so.

Suggested-by: Stephen Hemminger 
Signed-off-by: Mark Bloch 
---
 drivers/net/vxlan.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index a6b5052..7cb21a0 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1077,10 +1077,10 @@ static void vxlan_sock_release(struct vxlan_dev *vxlan)
 #if IS_ENABLED(CONFIG_IPV6)
struct vxlan_sock *sock6 = rtnl_dereference(vxlan->vn6_sock);
 
-   rcu_assign_pointer(vxlan->vn6_sock, NULL);
+   RCU_INIT_POINTER(vxlan->vn6_sock, NULL);
 #endif
 
-   rcu_assign_pointer(vxlan->vn4_sock, NULL);
+   RCU_INIT_POINTER(vxlan->vn4_sock, NULL);
synchronize_net();
 
vxlan_vs_del_dev(vxlan);
-- 
1.8.4.3
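
As a rough userspace analogy for the change above, C11 atomics can show the difference between publishing a pointer and clearing it. Publishing a new object needs release ordering so that readers see the initialised contents. Storing NULL publishes nothing, so a plain (relaxed) store is enough. This is only an analogy under that assumption and not how the kernel macros are implemented; all `demo_` names are invented.

```c
#include <stdatomic.h>
#include <stddef.h>

struct demo_sock {
	int fd;
};

static _Atomic(struct demo_sock *) demo_slot;

/* Analogue of publishing a pointer: readers must see *s fully set up. */
void demo_publish(struct demo_sock *s)
{
	atomic_store_explicit(&demo_slot, s, memory_order_release);
}

/* Analogue of detaching: NULL carries no data, so no ordering is needed. */
void demo_clear(void)
{
	atomic_store_explicit(&demo_slot, (struct demo_sock *)NULL,
			      memory_order_relaxed);
}

/* Reader side of the analogy. */
struct demo_sock *demo_read(void)
{
	return atomic_load_explicit(&demo_slot, memory_order_acquire);
}
```

In the patch the grace period is still enforced by the existing `synchronize_net()` call after the pointers are cleared; the sketch does not model that part.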



[PATCH net] bpf, arm64: use separate register for state in stxr

2017-06-07 Thread Daniel Borkmann
Will reported that in BPF_XADD we must use a different register for the
status flag in the stxr instruction, since the architecture otherwise defines
the behavior as CONSTRAINED UNPREDICTABLE. The reference manual says [1]:

  If s == t, then one of the following behaviors must occur:

   * The instruction is UNDEFINED.
   * The instruction executes as a NOP.
   * The instruction performs the store to the specified address, but
 the value stored is UNKNOWN.

Thus, use a different temporary register for the status flag to fix it.

Disassembly extract from test 226/STX_XADD_DW from test_bpf.ko:

  [...]
  003c:  c85f7d4b  ldxr x11, [x10]
  0040:  8b07016b  add x11, x11, x7
  0044:  c80c7d4b  stxr w12, x11, [x10]
  0048:  35ffffac  cbnz w12, 0x003c
  [...]

  [1] https://static.docs.arm.com/ddi0487/b/DDI0487B_a_armv8_arm.pdf, p.6132

Fixes: 85f68fe89832 ("bpf, arm64: implement jiting of BPF_XADD")
Reported-by: Will Deacon 
Signed-off-by: Daniel Borkmann 
---
 arch/arm64/net/bpf_jit_comp.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index 71f9305..c870d6f 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -36,6 +36,7 @@
 #define TMP_REG_1 (MAX_BPF_JIT_REG + 0)
 #define TMP_REG_2 (MAX_BPF_JIT_REG + 1)
 #define TCALL_CNT (MAX_BPF_JIT_REG + 2)
+#define TMP_REG_3 (MAX_BPF_JIT_REG + 3)
 
 /* Map BPF registers to A64 registers */
 static const int bpf2a64[] = {
@@ -57,6 +58,7 @@
/* temporary registers for internal BPF JIT */
[TMP_REG_1] = A64_R(10),
[TMP_REG_2] = A64_R(11),
+   [TMP_REG_3] = A64_R(12),
/* tail_call_cnt */
[TCALL_CNT] = A64_R(26),
/* temporary register for blinding constants */
@@ -319,6 +321,7 @@ static int build_insn(const struct bpf_insn *insn, struct 
jit_ctx *ctx)
const u8 src = bpf2a64[insn->src_reg];
const u8 tmp = bpf2a64[TMP_REG_1];
const u8 tmp2 = bpf2a64[TMP_REG_2];
+   const u8 tmp3 = bpf2a64[TMP_REG_3];
const s16 off = insn->off;
const s32 imm = insn->imm;
const int i = insn - ctx->prog->insnsi;
@@ -689,10 +692,10 @@ static int build_insn(const struct bpf_insn *insn, struct 
jit_ctx *ctx)
emit(A64_PRFM(tmp, PST, L1, STRM), ctx);
emit(A64_LDXR(isdw, tmp2, tmp), ctx);
emit(A64_ADD(isdw, tmp2, tmp2, src), ctx);
-   emit(A64_STXR(isdw, tmp2, tmp, tmp2), ctx);
+   emit(A64_STXR(isdw, tmp2, tmp, tmp3), ctx);
jmp_offset = -3;
check_imm19(jmp_offset);
-   emit(A64_CBNZ(0, tmp2, jmp_offset), ctx);
+   emit(A64_CBNZ(0, tmp3, jmp_offset), ctx);
break;
 
/* R0 = ntohx(*(size *)(((struct sk_buff *)R6)->data + imm)) */
-- 
1.9.3
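
The load-exclusive/store-exclusive loop in the patch above can be sketched in
portable C11 as a compare-and-swap retry loop. This is only a user-space
analogy under stated assumptions: xadd64() and the variable names are
hypothetical, and the compiler, not this code, chooses the actual registers.
The point it illustrates is that the success/failure status lives in its own
variable, separate from the value being stored.

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical user-space analogue of the JITed BPF_XADD sequence.
 * "status" plays the role of the separate status register (w12 in the
 * disassembly); it never aliases the variable holding the data.
 */
static uint64_t xadd64(_Atomic uint64_t *addr, uint64_t val)
{
	/* Corresponds to ldxr: read the current value. */
	uint64_t old = atomic_load_explicit(addr, memory_order_relaxed);
	int status;

	do {
		uint64_t sum = old + val;	/* add */

		/*
		 * Corresponds to stxr: try to publish the sum. On failure
		 * "old" is refreshed with the current memory contents.
		 */
		status = !atomic_compare_exchange_weak_explicit(
				addr, &old, sum,
				memory_order_relaxed, memory_order_relaxed);
	} while (status);			/* cbnz: retry on failure */

	return old;				/* value before the add */
}
```

Calling xadd64(&counter, 7) on a counter holding 5 leaves 12 in memory and
returns the previous value.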



Re: [PATCH net] bpf, arm64: use separate register for state in stxr

2017-06-07 Thread Will Deacon
Hi Daniel,

On Wed, Jun 07, 2017 at 01:45:37PM +0200, Daniel Borkmann wrote:
> Will reported that in BPF_XADD we must use a different register in stxr
> instruction for the status flag due to otherwise CONSTRAINED UNPREDICTABLE
> behavior per architecture. Reference manual says [1]:
> 
>   If s == t, then one of the following behaviors must occur:
> 
>* The instruction is UNDEFINED.
>* The instruction executes as a NOP.
>* The instruction performs the store to the specified address, but
>  the value stored is UNKNOWN.
> 
> Thus, use a different temporary register for the status flag to fix it.
> 
> Disassembly extract from test 226/STX_XADD_DW from test_bpf.ko:
> 
>   [...]
>   003c:  c85f7d4b  ldxr x11, [x10]
>   0040:  8b07016b  add x11, x11, x7
>   0044:  c80c7d4b  stxr w12, x11, [x10]
>   0048:  35ac  cbnz w12, 0x003c
>   [...]
> 
>   [1] https://static.docs.arm.com/ddi0487/b/DDI0487B_a_armv8_arm.pdf, p.6132
> 
> Fixes: 85f68fe89832 ("bpf, arm64: implement jiting of BPF_XADD")
> Reported-by: Will Deacon 
> Signed-off-by: Daniel Borkmann 
> ---
>  arch/arm64/net/bpf_jit_comp.c | 7 +--
>  1 file changed, 5 insertions(+), 2 deletions(-)

Cheers for fixing this up:

Acked-by: Will Deacon 

Will


Re: [PATCH 7/9] net: mvmdio: add xmdio support

2017-06-07 Thread Andrew Lunn
On Wed, Jun 07, 2017 at 10:38:08AM +0200, Antoine Tenart wrote:
> This patch adds the xMDIO interface support in the mvmdio driver. This
> interface is used in Ethernet controllers on Marvell 370, 7k and 8k (as
> of now). The xSMI interface supported by this driver complies with the
> IEEE 802.3 clause 45 (while the SMI interface complies with the clause
> 22). The xSMI interface is used by 10GbE devices.
> 
> Signed-off-by: Antoine Tenart 

Hi Antoine

I've only taken a quick look, but I don't see anywhere that you look at the
register address to see if it has MII_ADDR_C45 set, to determine whether a
C45 or a C22 transaction should be done. The MDIO bus can have a mix of
C45 and C22 devices on it, and you need to use the correct transaction
type depending on the target device/address.

  Andrew
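
The dispatch asked for in the review above could look roughly like the sketch
below. It is not taken from the mvmdio driver: struct mdio_xfer and
mdio_decode() are hypothetical names, and the constants are local copies that
assume the kernel's encoding of a clause 45 address (flag in bit 30, device
address in bits 16-20, register in bits 0-15).

```c
#include <stdbool.h>
#include <stdint.h>

/* Local copies for illustration; assumed to match the kernel's encoding. */
#define MII_ADDR_C45	(1u << 30)
#define C45_DEVAD_SHIFT	16		/* hypothetical name */
#define C45_DEVAD_MASK	0x1fu		/* hypothetical name */
#define C45_REGAD_MASK	0xffffu		/* hypothetical name */
#define C22_REGAD_MASK	0x1fu		/* hypothetical name */

/* Hypothetical description of the bus transaction to issue. */
struct mdio_xfer {
	bool c45;		/* true: clause 45, false: clause 22 */
	unsigned int devad;	/* device address, clause 45 only */
	unsigned int regad;	/* register address */
};

/* Decide the transaction type from the regnum passed to read/write. */
static struct mdio_xfer mdio_decode(uint32_t regnum)
{
	struct mdio_xfer x = { false, 0, 0 };

	if (regnum & MII_ADDR_C45) {
		x.c45 = true;
		x.devad = (regnum >> C45_DEVAD_SHIFT) & C45_DEVAD_MASK;
		x.regad = regnum & C45_REGAD_MASK;
	} else {
		x.regad = regnum & C22_REGAD_MASK;
	}
	return x;
}
```

A bus read or write callback would call mdio_decode() first and then drive
either the SMI or the xSMI register set depending on the c45 flag.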


[PATCH iproute] tc: flower: add support for matching on ip tos and ttl

2017-06-07 Thread Or Gerlitz
Allow users to set flower classifier filter rules which
include matches for ip tos and ttl.

Signed-off-by: Or Gerlitz 
Reviewed-by: Jiri Pirko 
---
 man/man8/tc-flower.8 | 17 +++-
 tc/f_flower.c| 75 
 2 files changed, 91 insertions(+), 1 deletion(-)

diff --git a/man/man8/tc-flower.8 b/man/man8/tc-flower.8
index 7648079..be46f02 100644
--- a/man/man8/tc-flower.8
+++ b/man/man8/tc-flower.8
@@ -30,7 +30,11 @@ flower \- flow based traffic control filter
 .BR vlan_ethtype " { " ipv4 " | " ipv6 " | "
 .IR ETH_TYPE " } | "
 .BR ip_proto " { " tcp " | " udp " | " sctp " | " icmp " | " icmpv6 " | "
-.IR IP_PROTO " } | { "
+.IR IP_PROTO " } | "
+.B ip_tos
+.IR MASKED_IP_TOS " | "
+.B ip_ttl
+.IR MASKED_IP_TTL " | { "
 .BR dst_ip " | " src_ip " } "
 .IR PREFIX " | { "
 .BR dst_port " | " src_port " } "
@@ -122,6 +126,17 @@ may be
 .BR tcp ", " udp ", " sctp ", " icmp ", " icmpv6
 or an unsigned 8bit value in hexadecimal format.
 .TP
+.BI ip_tos " MASKED_IP_TOS"
+Match on ipv4 TOS or ipv6 traffic-class - eight bits in hexadecimal format.
+A mask may be optionally provided to limit the bits which are matched. A mask
+is provided by following the value with a slash and then the mask. If the mask
+is missing then a match on all bits is assumed.
+.TP
+.BI ip_ttl " MASKED_IP_TTL"
+Match on ipv4 TTL or ipv6 hop-limit  - eight bits value in decimal or 
hexadecimal format.
+A mask may be optionally provided to limit the bits which are matched. Same
+logic is used for the mask as with matching on ip_tos.
+.TP
 .BI dst_ip " PREFIX"
 .TQ
 .BI src_ip " PREFIX"
diff --git a/tc/f_flower.c b/tc/f_flower.c
index 1b6b46e..5be693a 100644
--- a/tc/f_flower.c
+++ b/tc/f_flower.c
@@ -53,6 +53,8 @@ static void explain(void)
"   dst_mac MASKED-LLADDR |\n"
"   src_mac MASKED-LLADDR |\n"
"   ip_proto [tcp | udp | sctp | icmp | 
icmpv6 | IP-PROTO ] |\n"
+   "   ip_tos MASKED-IP_TOS |\n"
+   "   ip_ttl MASKED-IP_TTL |\n"
"   dst_ip PREFIX |\n"
"   src_ip PREFIX |\n"
"   dst_port PORT-NUMBER |\n"
@@ -510,6 +512,41 @@ err:
return err;
 }
 
+static int flower_parse_ip_tos_ttl(char *str, int key_type, int mask_type,
+  struct nlmsghdr *n)
+{
+   char *slash;
+   int ret, err = -1;
+   __u8 tos_ttl;
+
+   slash = strchr(str, '/');
+   if (slash)
+   *slash = '\0';
+
+   ret = get_u8(&tos_ttl, str, 10);
+   if (ret < 0)
+   ret = get_u8(&tos_ttl, str, 16);
+   if (ret < 0)
+   goto err;
+
+   addattr8(n, MAX_MSG, key_type, tos_ttl);
+
+   if (slash) {
+   ret = get_u8(&tos_ttl, slash + 1, 16);
+   if (ret < 0)
+   goto err;
+   } else {
+   tos_ttl = 0xff;
+   }
+   addattr8(n, MAX_MSG, mask_type, tos_ttl);
+
+   err = 0;
+err:
+   if (slash)
+   *slash = '/';
+   return err;
+}
+
 static int flower_parse_key_id(const char *str, int type, struct nlmsghdr *n)
 {
int ret;
@@ -665,6 +702,26 @@ static int flower_parse_opt(struct filter_util *qu, char 
*handle,
fprintf(stderr, "Illegal \"ip_proto\"\n");
return -1;
}
+   } else if (matches(*argv, "ip_tos") == 0) {
+   NEXT_ARG();
+   ret = flower_parse_ip_tos_ttl(*argv,
+ TCA_FLOWER_KEY_IP_TOS,
+ 
TCA_FLOWER_KEY_IP_TOS_MASK,
+ n);
+   if (ret < 0) {
+   fprintf(stderr, "Illegal \"ip_tos\"\n");
+   return -1;
+   }
+   } else if (matches(*argv, "ip_ttl") == 0) {
+   NEXT_ARG();
+   ret = flower_parse_ip_tos_ttl(*argv,
+ TCA_FLOWER_KEY_IP_TTL,
+ 
TCA_FLOWER_KEY_IP_TTL_MASK,
+ n);
+   if (ret < 0) {
+   fprintf(stderr, "Illegal \"ip_ttl\"\n");
+   return -1;
+   }
} else if (matches(*argv, "dst_ip") == 0) {
NEXT_ARG();
ret = flower_parse_ip_addr(*argv, vlan_ethtype ?
@@ -963,6 +1020,19 @@ static void flower_print_ip_proto(FILE *f, __u8 
*p_ip_proto,
*p_ip_proto = ip_proto;
 }
 
+static void flower_pri
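
The VALUE[/MASK] parsing done by flower_parse_ip_tos_ttl() above can be
sketched as a standalone function. This is a simplified illustration, not the
iproute2 code: parse_u8() is a hypothetical stand-in for get_u8(), and the
sketch works on a private copy of the argument instead of patching the slash
in place. It keeps the order used in the patch: the value is tried as decimal
first and then as hexadecimal, the mask is always hexadecimal, and a missing
mask means 0xff.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for get_u8(): the whole string must parse and fit. */
static int parse_u8(uint8_t *out, const char *s, int base)
{
	char *end;
	unsigned long v;

	if (!s || !*s)
		return -1;
	errno = 0;
	v = strtoul(s, &end, base);
	if (errno || *end != '\0' || v > 0xff)
		return -1;
	*out = (uint8_t)v;
	return 0;
}

/* Parse "VALUE" or "VALUE/MASK" into an 8-bit value and mask. */
static int parse_masked_u8(const char *arg, uint8_t *val, uint8_t *mask)
{
	char buf[32];
	char *slash;

	if (strlen(arg) >= sizeof(buf))
		return -1;
	strcpy(buf, arg);		/* work on a copy, leave arg intact */

	slash = strchr(buf, '/');
	if (slash)
		*slash = '\0';

	/* Decimal first, then hexadecimal, as in the patch. */
	if (parse_u8(val, buf, 10) < 0 && parse_u8(val, buf, 16) < 0)
		return -1;

	if (!slash) {
		*mask = 0xff;		/* no mask given: match all bits */
		return 0;
	}
	return parse_u8(mask, slash + 1, 16);
}
```

With this order an argument such as "10" is read as decimal ten, while
"0x10" or "1f" fall through to the hexadecimal attempt.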

[PATCH] netfilter: nfnetlink: Improve input length sanitization in nfnetlink_rcv

2017-06-07 Thread Mateusz Jurczyk
Verify that the length of the socket buffer is sufficient to cover the
entire nlh->nlmsg_len field before accessing that field for further
input sanitization. If the client only supplies 1-3 bytes of data in
sk_buff, then nlh->nlmsg_len remains partially uninitialized and
contains leftover memory from the corresponding kernel allocation.
Operating on such data may result in indeterminate evaluation of the
nlmsg_len < NLMSG_HDRLEN expression.

The bug was discovered by a runtime instrumentation designed to detect
use of uninitialized memory in the kernel. The patch prevents this and
other similar tools (e.g. KMSAN) from flagging this behavior in the future.

Signed-off-by: Mateusz Jurczyk 
---
 net/netfilter/nfnetlink.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/netfilter/nfnetlink.c b/net/netfilter/nfnetlink.c
index 80f5ecf2c3d7..c634cfca40ec 100644
--- a/net/netfilter/nfnetlink.c
+++ b/net/netfilter/nfnetlink.c
@@ -491,7 +491,8 @@ static void nfnetlink_rcv(struct sk_buff *skb)
 {
struct nlmsghdr *nlh = nlmsg_hdr(skb);
 
-   if (nlh->nlmsg_len < NLMSG_HDRLEN ||
+   if (skb->len < sizeof(nlh->nlmsg_len) ||
+   nlh->nlmsg_len < NLMSG_HDRLEN ||
skb->len < nlh->nlmsg_len)
return;
 
-- 
2.13.1.508.gb3defc5cc-goog



Re: [PATCH 2/2] tcp: md5: add fields to the tcp_md5sig struct to set a key address prefix

2017-06-07 Thread Eric Dumazet
On Wed, 2017-06-07 at 08:13 +0200, Ivan Delalande wrote:
> On Tue, Jun 06, 2017 at 09:08:22PM -0700, Eric Dumazet wrote:
> > On Tue, 2017-06-06 at 17:54 -0700, Ivan Delalande wrote:
> >> diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
> >> index 38a2b07afdff..52ac30aa0652 100644
> >> --- a/include/uapi/linux/tcp.h
> >> +++ b/include/uapi/linux/tcp.h
> >> @@ -234,9 +234,13 @@ enum {
> >>  /* for TCP_MD5SIG socket option */
> >>  #define TCP_MD5SIG_MAXKEYLEN  80
> >>  
> >> +/* tcp_md5sig flags */
> >> +#define TCP_MD5SIG_FLAG_PREFIX1   /* address prefix 
> >> length */
> >> +
> >>  struct tcp_md5sig {
> >>struct __kernel_sockaddr_storage tcpm_addr; /* address associated */
> >> -  __u16   __tcpm_pad1;/* zero */
> >> +  __u8tcpm_flags; /* flags */
> >> +  __u8tcpm_prefixlen; /* address prefix */
> >>__u16   tcpm_keylen;/* key length */
> >>__u32   __tcpm_pad2;/* zero */
> >>__u8tcpm_key[TCP_MD5SIG_MAXKEYLEN]; /* key (binary) */
> > 
> > This will break some applications that maybe did not clear the
> > __tcpm_pad1 field ?
> > 
> > 
> > You need to find another way to maintain compatibility with old
> > applications.
> 
> All right, I thought this was acceptable after seeing a few examples of
> this in commits extending other structures in uapi, but the context and
> use were probably different for those.
> 
> We had another version of this patch which steals a bit from tcpm_keylen
> to use as a flag for this feature specifically and with the prefixlen at
> the same place as this patch. So when the flag is set we know we can
> safely interpret this part of the padding field as a prefix as all valid
> calls from older user programs should not have a key length greater than
> 80 bytes.
> 
> Would this be better? Programs compiled with the new headers could break
> on older kernels if they don't check the version, I don't know if that's
> a concern.
> 
> Or should we just add these two new fields at the end of tcp_md5sig and
> use them only if the value of optlen in the parse function called from
> setsockopt is large enough?

I believe this is the preferable way to handle this.

But note that old kernels would not send an error back if an
application tries the new semantics.
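
A minimal sketch of the optlen-based approach discussed above, using
hypothetical, simplified structures (struct opt_v1, struct opt_v2 and
parse_opt() are illustrations only, not the real struct tcp_md5sig or the
kernel's setsockopt path). New fields are appended at the end and are only
interpreted when the caller passed a buffer large enough to contain them.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical original layout. */
struct opt_v1 {
	uint16_t keylen;
	uint8_t key[16];
};

/* Hypothetical extended layout: new fields appended at the end. */
struct opt_v2 {
	struct opt_v1 base;
	uint8_t flags;		/* bit 0: prefixlen is valid */
	uint8_t prefixlen;
};

struct parsed {
	uint16_t keylen;
	uint8_t prefixlen;
	int has_prefix;
};

/* Accept both layouts; only trust the new fields if optlen covers them. */
static int parse_opt(const void *optval, size_t optlen, struct parsed *out)
{
	struct opt_v2 tmp;
	size_t n = optlen < sizeof(tmp) ? optlen : sizeof(tmp);

	if (optlen < sizeof(struct opt_v1))
		return -1;

	memset(&tmp, 0, sizeof(tmp));
	memcpy(&tmp, optval, n);

	out->keylen = tmp.base.keylen;
	out->has_prefix = optlen >= sizeof(struct opt_v2) && (tmp.flags & 1);
	out->prefixlen = out->has_prefix ? tmp.prefixlen : 0;
	return 0;
}
```

An old binary passing only the v1 size keeps working unchanged. The caveat
raised above still applies: a kernel that predates the extension only looks
at the v1 part and silently ignores the trailing fields.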





[PATCH net-next] net: dsa: mv88e6xxx: Have 6161/6123 use EDSA tags

2017-06-07 Thread Andrew Lunn
The mv88e6161 and mv88e6123 are capable of using EDSA tags when
passing frames from the host to the switch and back.

Signed-off-by: Andrew Lunn 
---
 drivers/net/dsa/mv88e6xxx/chip.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index ea20c4ee30b6..f53bae07387f 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -3279,7 +3279,7 @@ static const struct mv88e6xxx_info mv88e6xxx_table[] = {
.g1_irqs = 9,
.atu_move_port_mask = 0xf,
.pvt = true,
-   .tag_protocol = DSA_TAG_PROTO_DSA,
+   .tag_protocol = DSA_TAG_PROTO_EDSA,
.flags = MV88E6XXX_FLAGS_FAMILY_6165,
.ops = &mv88e6123_ops,
},
@@ -3331,7 +3331,7 @@ static const struct mv88e6xxx_info mv88e6xxx_table[] = {
.g1_irqs = 9,
.atu_move_port_mask = 0xf,
.pvt = true,
-   .tag_protocol = DSA_TAG_PROTO_DSA,
+   .tag_protocol = DSA_TAG_PROTO_EDSA,
.flags = MV88E6XXX_FLAGS_FAMILY_6165,
.ops = &mv88e6161_ops,
},
-- 
2.11.0



[PATCH] decnet: dn_rtmsg: Improve input length sanitization in dnrmg_receive_user_skb

2017-06-07 Thread Mateusz Jurczyk
Verify that the length of the socket buffer is sufficient to cover the
entire nlh->nlmsg_len field before accessing that field for further
input sanitization. If the client only supplies 1-3 bytes of data in
sk_buff, then nlh->nlmsg_len remains partially uninitialized and
contains leftover memory from the corresponding kernel allocation.
Operating on such data may result in indeterminate evaluation of the
nlmsg_len < sizeof(*nlh) expression.

The bug was discovered by a runtime instrumentation designed to detect
use of uninitialized memory in the kernel. The patch prevents this and
other similar tools (e.g. KMSAN) from flagging this behavior in the future.

Signed-off-by: Mateusz Jurczyk 
---
 net/decnet/netfilter/dn_rtmsg.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/decnet/netfilter/dn_rtmsg.c b/net/decnet/netfilter/dn_rtmsg.c
index 1ed81ac6dd1a..26e020e9d415 100644
--- a/net/decnet/netfilter/dn_rtmsg.c
+++ b/net/decnet/netfilter/dn_rtmsg.c
@@ -102,7 +102,9 @@ static inline void dnrmg_receive_user_skb(struct sk_buff 
*skb)
 {
struct nlmsghdr *nlh = nlmsg_hdr(skb);
 
-   if (nlh->nlmsg_len < sizeof(*nlh) || skb->len < nlh->nlmsg_len)
+   if (skb->len < sizeof(nlh->nlmsg_len) ||
+   nlh->nlmsg_len < sizeof(*nlh) ||
+   skb->len < nlh->nlmsg_len)
return;
 
if (!netlink_capable(skb, CAP_NET_ADMIN))
-- 
2.13.1.508.gb3defc5cc-goog



[iproute PATCH] iproute: Remove useless check for nexthop keyword when setting RTA_OIF

2017-06-07 Thread Jakub Sitnicki
When modifying a route we set the RTA_OIF attribute only if a device was
specified with "dev" or "oif" keyword. But for some unknown reason we
earlier alternatively check also for the presence of "nexthop" keyword,
even though it has no effect. So remove the pointless check.

Signed-off-by: Jakub Sitnicki 
---
 ip/iproute.c | 12 +---
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/ip/iproute.c b/ip/iproute.c
index b4ca291..4fd36a1 100644
--- a/ip/iproute.c
+++ b/ip/iproute.c
@@ -1241,16 +1241,14 @@ static int iproute_modify(int cmd, unsigned int flags, 
int argc, char **argv)
if (!dst_ok)
usage();
 
-   if (d || nhs_ok)  {
+   if (d) {
int idx;
 
-   if (d) {
-   if ((idx = ll_name_to_index(d)) == 0) {
-   fprintf(stderr, "Cannot find device \"%s\"\n", 
d);
-   return -1;
-   }
-   addattr32(&req.n, sizeof(req), RTA_OIF, idx);
+   if ((idx = ll_name_to_index(d)) == 0) {
+   fprintf(stderr, "Cannot find device \"%s\"\n", d);
+   return -1;
}
+   addattr32(&req.n, sizeof(req), RTA_OIF, idx);
}
 
if (mxrta->rta_len > RTA_LENGTH(0)) {
-- 
2.9.4



Re: [PATCH] netfilter: nfnetlink: Improve input length sanitization in nfnetlink_rcv

2017-06-07 Thread Eric Dumazet
On Wed, 2017-06-07 at 14:35 +0200, Mateusz Jurczyk wrote:
> Verify that the length of the socket buffer is sufficient to cover the
> entire nlh->nlmsg_len field before accessing that field for further
> input sanitization. If the client only supplies 1-3 bytes of data in
> sk_buff, then nlh->nlmsg_len remains partially uninitialized and
> contains leftover memory from the corresponding kernel allocation.
> Operating on such data may result in indeterminate evaluation of the
> nlmsg_len < NLMSG_HDRLEN expression.
> 
> The bug was discovered by a runtime instrumentation designed to detect
> use of uninitialized memory in the kernel. The patch prevents this and
> other similar tools (e.g. KMSAN) from flagging this behavior in the future.
> 
> Signed-off-by: Mateusz Jurczyk 
> ---
>  net/netfilter/nfnetlink.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/net/netfilter/nfnetlink.c b/net/netfilter/nfnetlink.c
> index 80f5ecf2c3d7..c634cfca40ec 100644
> --- a/net/netfilter/nfnetlink.c
> +++ b/net/netfilter/nfnetlink.c
> @@ -491,7 +491,8 @@ static void nfnetlink_rcv(struct sk_buff *skb)
>  {
>   struct nlmsghdr *nlh = nlmsg_hdr(skb);
>  
> - if (nlh->nlmsg_len < NLMSG_HDRLEN ||
> + if (skb->len < sizeof(nlh->nlmsg_len) ||

This assumes nlmsg_len is the first field of the structure.

offsetofend() might be more descriptive, one does not have to check the
structure to make sure the code is correct.
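
As a rough illustration of the offsetofend() idea: the macro below has the
same shape as the kernel helper but is redefined locally, and struct msg_hdr
and len_field_readable() are hypothetical names that only mirror the field
order of struct nlmsghdr.

```c
#include <stddef.h>
#include <stdint.h>

/* Same shape as the kernel's offsetofend(), redefined here locally. */
#define offsetofend(TYPE, MEMBER) \
	(offsetof(TYPE, MEMBER) + sizeof(((TYPE *)0)->MEMBER))

/* Hypothetical header mirroring the field order of struct nlmsghdr. */
struct msg_hdr {
	uint32_t len;
	uint16_t type;
	uint16_t flags;
	uint32_t seq;
	uint32_t pid;
};

/* True if a buffer of buf_len bytes fully contains the len field. */
static int len_field_readable(size_t buf_len)
{
	return buf_len >= offsetofend(struct msg_hdr, len);
}
```

The check stays correct even if the field is later moved within the
structure, which is what makes it more descriptive than sizeof() alone.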

Or simply use the more common form:

if (skb->len < NLMSG_HDRLEN ||

> + nlh->nlmsg_len < NLMSG_HDRLEN ||
>   skb->len < nlh->nlmsg_len)
>   return;
>  




[PATCH net-next 0/3] mlx4 drivers: version update

2017-06-07 Thread Tariq Toukan
Hi Dave,

This patchset contains version updates for the MLX4 drivers:
Core, EN, and IB.

Just like we've done in mlx5, we modify the outdated driver
version (reported in ethtool for example).
This better reflects the current driver state, and removes the
redundant date string.
We are not going to change this frequently or even use it.

I include the IB patch in this series as it has similar subject
and content.
It does not cause any kind of conflict with Doug's tree.
The rdma mailing list is CCed.
Please let me know if I need to submit this differently.

Series generated against net-next commit:
216fe8f021e3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Thanks,
Tariq.

Tariq Toukan (3):
  net/mlx4_core: Bump driver version
  net/mlx4_en: Bump driver version
  IB/mlx4: Bump driver version

 drivers/infiniband/hw/mlx4/main.c   | 5 ++---
 drivers/net/ethernet/mellanox/mlx4/en_ethtool.c | 2 +-
 drivers/net/ethernet/mellanox/mlx4/en_main.c| 4 ++--
 drivers/net/ethernet/mellanox/mlx4/main.c   | 2 +-
 drivers/net/ethernet/mellanox/mlx4/mlx4.h   | 3 +--
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h| 3 +--
 6 files changed, 8 insertions(+), 11 deletions(-)

-- 
1.8.3.1



[PATCH net-next 3/3] IB/mlx4: Bump driver version

2017-06-07 Thread Tariq Toukan
Remove date and bump version for mlx4_ib driver.

Signed-off-by: Tariq Toukan 
---
 drivers/infiniband/hw/mlx4/main.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/main.c 
b/drivers/infiniband/hw/mlx4/main.c
index 521d0def2d9e..75b2f7d4cd95 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -61,8 +61,7 @@
 #include 
 
 #define DRV_NAME   MLX4_IB_DRV_NAME
-#define DRV_VERSION"2.2-1"
-#define DRV_RELDATE"Feb 2014"
+#define DRV_VERSION"4.0-0"
 
 #define MLX4_IB_FLOW_MAX_PRIO 0xFFF
 #define MLX4_IB_FLOW_QPN_MASK 0xFF
@@ -79,7 +78,7 @@
 
 static const char mlx4_ib_version[] =
DRV_NAME ": Mellanox ConnectX InfiniBand driver v"
-   DRV_VERSION " (" DRV_RELDATE ")\n";
+   DRV_VERSION "\n";
 
 static void do_slave_init(struct mlx4_ib_dev *ibdev, int slave, int do_init);
 
-- 
1.8.3.1
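
The version banner touched by the patch above relies on compile-time
concatenation of adjacent string literals. A small sketch with a placeholder
driver name ("example_drv" and drv_banner are hypothetical and not part of
the mlx4 sources):

```c
#include <string.h>

#define DRV_NAME	"example_drv"	/* placeholder, not a real driver */
#define DRV_VERSION	"4.0-0"

/* Adjacent literals are merged by the compiler into one string. */
static const char drv_banner[] =
	DRV_NAME ": example driver v"
	DRV_VERSION "\n";
```

Dropping the release date macro therefore only removes one literal from the
concatenation; the rest of the banner is unchanged.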



[PATCH net-next 2/3] net/mlx4_en: Bump driver version

2017-06-07 Thread Tariq Toukan
Remove date and bump version for mlx4_en driver.

Signed-off-by: Tariq Toukan 
---
 drivers/net/ethernet/mellanox/mlx4/en_ethtool.c | 2 +-
 drivers/net/ethernet/mellanox/mlx4/en_main.c| 4 ++--
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h| 3 +--
 3 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c 
b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
index ffbcb27c05e5..e97fbf327594 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
@@ -89,7 +89,7 @@ static int mlx4_en_moderation_update(struct mlx4_en_priv 
*priv)
struct mlx4_en_dev *mdev = priv->mdev;
 
strlcpy(drvinfo->driver, DRV_NAME, sizeof(drvinfo->driver));
-   strlcpy(drvinfo->version, DRV_VERSION " (" DRV_RELDATE ")",
+   strlcpy(drvinfo->version, DRV_VERSION,
sizeof(drvinfo->version));
snprintf(drvinfo->fw_version, sizeof(drvinfo->fw_version),
"%d.%d.%d",
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_main.c 
b/drivers/net/ethernet/mellanox/mlx4/en_main.c
index 36a7a54bbb82..d94f981eafc4 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_main.c
@@ -46,11 +46,11 @@
 MODULE_AUTHOR("Liran Liss, Yevgeny Petrilin");
 MODULE_DESCRIPTION("Mellanox ConnectX HCA Ethernet driver");
 MODULE_LICENSE("Dual BSD/GPL");
-MODULE_VERSION(DRV_VERSION " ("DRV_RELDATE")");
+MODULE_VERSION(DRV_VERSION);
 
 static const char mlx4_en_version[] =
DRV_NAME ": Mellanox ConnectX HCA Ethernet driver v"
-   DRV_VERSION " (" DRV_RELDATE ")\n";
+   DRV_VERSION "\n";
 
 #define MLX4_EN_PARM_INT(X, def_val, desc) \
static unsigned int X = def_val;\
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h 
b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index 39f401aa3047..8c4f63946b14 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -58,8 +58,7 @@
 #include "mlx4_stats.h"
 
 #define DRV_NAME   "mlx4_en"
-#define DRV_VERSION"2.2-1"
-#define DRV_RELDATE"Feb 2014"
+#define DRV_VERSION"4.0-0"
 
 #define MLX4_EN_MSG_LEVEL  (NETIF_MSG_LINK | NETIF_MSG_IFDOWN)
 
-- 
1.8.3.1



[PATCH net-next 1/3] net/mlx4_core: Bump driver version

2017-06-07 Thread Tariq Toukan
Remove date and bump version for mlx4_core driver.

Signed-off-by: Tariq Toukan 
---
 drivers/net/ethernet/mellanox/mlx4/main.c | 2 +-
 drivers/net/ethernet/mellanox/mlx4/mlx4.h | 3 +--
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c 
b/drivers/net/ethernet/mellanox/mlx4/main.c
index 83aab1e4c8c8..ccae3c6593c4 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -119,7 +119,7 @@
 
 static char mlx4_version[] =
DRV_NAME ": Mellanox ConnectX core driver v"
-   DRV_VERSION " (" DRV_RELDATE ")\n";
+   DRV_VERSION "\n";
 
 static struct mlx4_profile default_profile = {
.num_qp = 1 << 18,
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4.h 
b/drivers/net/ethernet/mellanox/mlx4/mlx4.h
index b4f1bc56cc68..6ea2b7a0c34d 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4.h
@@ -56,8 +56,7 @@
 
 #define DRV_NAME   "mlx4_core"
 #define PFXDRV_NAME ": "
-#define DRV_VERSION"2.2-1"
-#define DRV_RELDATE"Feb, 2014"
+#define DRV_VERSION"4.0-0"
 
 #define MLX4_FS_UDP_UC_EN  (1 << 1)
 #define MLX4_FS_TCP_UC_EN  (1 << 2)
-- 
1.8.3.1



Re: [PATCH v2 4/4] net: macb: Add hardware PTP support

2017-06-07 Thread Richard Cochran
On Wed, Jun 07, 2017 at 11:13:36AM +, Rafal Ozieblo wrote:
> Please look at following call-stack:
> 
> 1. macb_interrupt()   // spin_lock(&bp->lock) is taken
> 2. macb_tx_interrupt()
> 3. macb_handle_txtstamp()
> 4. skb_tstamp_tx()
> 5. __skb_tstamp_tx()
> 6. skb_may_tx_timestamp()
> 7. read_lock_bh() // second lock is taken

Well, you can always drop the lock, or postpone the call to
skb_tstamp_tx() until after the spin lock is released...
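
The second option could be structured roughly as in the sketch below. It is a
user-space illustration only: the toy lock just records whether it is held,
report_timestamp() is a hypothetical stand-in for skb_tstamp_tx(), and none
of the names come from the macb driver.

```c
#include <stddef.h>

#define MAX_PENDING 8

/* Toy lock that only records whether it is held (stand-in for bp->lock). */
struct toy_lock {
	int held;
};

static void lock_acquire(struct toy_lock *l) { l->held = 1; }
static void lock_release(struct toy_lock *l) { l->held = 0; }

struct dev_state {
	struct toy_lock lock;
	int completed[MAX_PENDING];	/* ids of frames with a timestamp */
	size_t n_completed;
	int reported;			/* total reports made */
	int reported_under_lock;	/* reports made while lock was held */
};

/* Hypothetical stand-in for skb_tstamp_tx(); may take other locks. */
static void report_timestamp(struct dev_state *d, int id)
{
	(void)id;
	d->reported++;
	if (d->lock.held)
		d->reported_under_lock++;
}

/* Collect work under the lock, report it only after the lock is dropped. */
static void tx_interrupt(struct dev_state *d)
{
	int pending[MAX_PENDING];
	size_t n, i;

	lock_acquire(&d->lock);
	n = d->n_completed < MAX_PENDING ? d->n_completed : MAX_PENDING;
	for (i = 0; i < n; i++)
		pending[i] = d->completed[i];
	d->n_completed = 0;
	lock_release(&d->lock);

	for (i = 0; i < n; i++)
		report_timestamp(d, pending[i]);
}
```

The reporting function never runs with the device lock held, so whatever
locks it takes internally cannot nest inside it.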

> I know that those are different locks and different types. But this could
> lead to deadlocks. This is the reason of warning I could see.

Can you please post the lockdep splat?

Thanks,
Richard



Re: [PATCH net-next] net: fec: Clear and enable MIB counters on imx51

2017-06-07 Thread Fabio Estevam
On Tue, Jun 6, 2017 at 10:57 PM, Andrew Lunn  wrote:
> Both the IMX51 and IMX53 datasheet indicates that the MIB counters
> should be cleared during setup. Otherwise random numbers are returned
> via ethtool -S.  Add a quirk and a function to do this.
>
> Tested on an IMX51.
>
> Signed-off-by: Andrew Lunn 

Thanks for the fix:

Reviewed-by: Fabio Estevam 


[PATCH v2] netfilter: nfnetlink: Improve input length sanitization in nfnetlink_rcv

2017-06-07 Thread Mateusz Jurczyk
Verify that the length of the socket buffer is sufficient to cover the
nlmsghdr structure before accessing the nlh->nlmsg_len field for further
input sanitization. If the client only supplies 1-3 bytes of data in
sk_buff, then nlh->nlmsg_len remains partially uninitialized and
contains leftover memory from the corresponding kernel allocation.
Operating on such data may result in indeterminate evaluation of the
nlmsg_len < NLMSG_HDRLEN expression.

The bug was discovered by a runtime instrumentation designed to detect
use of uninitialized memory in the kernel. The patch prevents this and
other similar tools (e.g. KMSAN) from flagging this behavior in the future.

Signed-off-by: Mateusz Jurczyk 
---
Changes in v2:
  - Compare skb->len against NLMSG_HDRLEN to avoid assuming the layout of
the nlmsghdr structure.

 net/netfilter/nfnetlink.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/netfilter/nfnetlink.c b/net/netfilter/nfnetlink.c
index 80f5ecf2c3d7..1f9667f52be5 100644
--- a/net/netfilter/nfnetlink.c
+++ b/net/netfilter/nfnetlink.c
@@ -491,7 +491,8 @@ static void nfnetlink_rcv(struct sk_buff *skb)
 {
struct nlmsghdr *nlh = nlmsg_hdr(skb);
 
-   if (nlh->nlmsg_len < NLMSG_HDRLEN ||
+   if (skb->len < NLMSG_HDRLEN ||
+   nlh->nlmsg_len < NLMSG_HDRLEN ||
skb->len < nlh->nlmsg_len)
return;
 
-- 
2.13.1.508.gb3defc5cc-goog
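
The order of checks in the v2 patch above can be sketched outside the kernel
as follows. struct msg_hdr, MSG_HDRLEN and msg_len_ok() are hypothetical
stand-ins for struct nlmsghdr, NLMSG_HDRLEN and the condition in
nfnetlink_rcv(); the buffer length is validated before the length field
inside the buffer is read.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header mirroring the field order of struct nlmsghdr. */
struct msg_hdr {
	uint32_t len;
	uint16_t type;
	uint16_t flags;
	uint32_t seq;
	uint32_t pid;
};

#define MSG_HDRLEN sizeof(struct msg_hdr)

/* Return 1 if buf holds a complete, self-consistent message, else 0. */
static int msg_len_ok(const unsigned char *buf, size_t buf_len)
{
	struct msg_hdr hdr;

	/* 1. The buffer must cover the header before any field is read. */
	if (buf_len < MSG_HDRLEN)
		return 0;

	memcpy(&hdr, buf, sizeof(hdr));

	/* 2. The claimed length must cover at least the header ... */
	if (hdr.len < MSG_HDRLEN)
		return 0;

	/* 3. ... and must not exceed what was actually received. */
	if (buf_len < hdr.len)
		return 0;

	return 1;
}
```

With a 1 to 3 byte buffer the function returns at the first check and never
touches the partially supplied length field.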



[PATCH v2 2/2] net: emac: fix and unify emac_mdio functions

2017-06-07 Thread Christian Lamparter
emac_mdio_read_link() was not copying the link settings read by the PHY
library back into the emac driver's own phy structure. This caused a link
speed mismatch for the AR8035, as the emac driver kept trying to connect
at 10/100 Mbit/s on a 1 Gbit/s link.

This patch also unifies the code shared between emac_mdio_setup_aneg()
and emac_mdio_setup_forced(). Furthermore, it removes a chunk of
emac_mdio_init_phy() that was copying the same data into itself.

Signed-off-by: Christian Lamparter 
---
 drivers/net/ethernet/ibm/emac/core.c | 41 
 1 file changed, 18 insertions(+), 23 deletions(-)

diff --git a/drivers/net/ethernet/ibm/emac/core.c 
b/drivers/net/ethernet/ibm/emac/core.c
index b6e871bfb659..259e69a52ec5 100644
--- a/drivers/net/ethernet/ibm/emac/core.c
+++ b/drivers/net/ethernet/ibm/emac/core.c
@@ -2478,20 +2478,24 @@ static int emac_mii_bus_reset(struct mii_bus *bus)
return emac_reset(dev);
 }
 
+static int emac_mdio_phy_start_aneg(struct mii_phy *phy,
+   struct phy_device *phy_dev)
+{
+   phy_dev->autoneg = phy->autoneg;
+   phy_dev->speed = phy->speed;
+   phy_dev->duplex = phy->duplex;
+   phy_dev->advertising = phy->advertising;
+   return phy_start_aneg(phy_dev);
+}
+
 static int emac_mdio_setup_aneg(struct mii_phy *phy, u32 advertise)
 {
struct net_device *ndev = phy->dev;
struct emac_instance *dev = netdev_priv(ndev);
 
-   dev->phy.autoneg = AUTONEG_ENABLE;
-   dev->phy.speed = SPEED_1000;
-   dev->phy.duplex = DUPLEX_FULL;
-   dev->phy.advertising = advertise;
phy->autoneg = AUTONEG_ENABLE;
-   phy->speed = dev->phy.speed;
-   phy->duplex = dev->phy.duplex;
phy->advertising = advertise;
-   return phy_start_aneg(dev->phy_dev);
+   return emac_mdio_phy_start_aneg(phy, dev->phy_dev);
 }
 
 static int emac_mdio_setup_forced(struct mii_phy *phy, int speed, int fd)
@@ -2499,13 +2503,10 @@ static int emac_mdio_setup_forced(struct mii_phy *phy, 
int speed, int fd)
struct net_device *ndev = phy->dev;
struct emac_instance *dev = netdev_priv(ndev);
 
-   dev->phy.autoneg =  AUTONEG_DISABLE;
-   dev->phy.speed = speed;
-   dev->phy.duplex = fd;
phy->autoneg = AUTONEG_DISABLE;
phy->speed = speed;
phy->duplex = fd;
-   return phy_start_aneg(dev->phy_dev);
+   return emac_mdio_phy_start_aneg(phy, dev->phy_dev);
 }
 
 static int emac_mdio_poll_link(struct mii_phy *phy)
@@ -2527,16 +2528,17 @@ static int emac_mdio_read_link(struct mii_phy *phy)
 {
struct net_device *ndev = phy->dev;
struct emac_instance *dev = netdev_priv(ndev);
+   struct phy_device *phy_dev = dev->phy_dev;
int res;
 
-   res = phy_read_status(dev->phy_dev);
+   res = phy_read_status(phy_dev);
if (res)
return res;
 
-   dev->phy.speed = phy->speed;
-   dev->phy.duplex = phy->duplex;
-   dev->phy.pause = phy->pause;
-   dev->phy.asym_pause = phy->asym_pause;
+   phy->speed = phy_dev->speed;
+   phy->duplex = phy_dev->duplex;
+   phy->pause = phy_dev->pause;
+   phy->asym_pause = phy_dev->asym_pause;
return 0;
 }
 
@@ -2546,13 +2548,6 @@ static int emac_mdio_init_phy(struct mii_phy *phy)
struct emac_instance *dev = netdev_priv(ndev);
 
phy_start(dev->phy_dev);
-   dev->phy.autoneg = phy->autoneg;
-   dev->phy.speed = phy->speed;
-   dev->phy.duplex = phy->duplex;
-   dev->phy.advertising = phy->advertising;
-   dev->phy.pause = phy->pause;
-   dev->phy.asym_pause = phy->asym_pause;
-
return phy_init_hw(dev->phy_dev);
 }
 
-- 
2.11.0
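
The direction of the copy fixed in emac_mdio_read_link() above can be shown
with a trimmed-down sketch. struct link_state and sync_link_from_phylib() are
hypothetical; they only stand in for the fields shared by the PHY library's
device and the driver's own phy structure.

```c
/* Hypothetical subset of the link parameters kept on both sides. */
struct link_state {
	int speed;
	int duplex;
	int pause;
	int asym_pause;
};

/*
 * "lib" holds what the PHY library just read from the hardware,
 * "drv" is the driver's private copy that must be refreshed from it.
 */
static void sync_link_from_phylib(struct link_state *drv,
				  const struct link_state *lib)
{
	drv->speed = lib->speed;
	drv->duplex = lib->duplex;
	drv->pause = lib->pause;
	drv->asym_pause = lib->asym_pause;
}
```

If source and destination refer to the same object, as in the code removed by
the patch, the driver's copy never picks up the negotiated speed.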



[PATCH v2 1/2] net: emac: fix reset timeout with AR8035 phy

2017-06-07 Thread Christian Lamparter
This patch fixes a problem where the AR8035 PHY can't be
detected on a Cisco Meraki MR24 if the ethernet cable is
not connected on boot.

Russell Senior provided steps to reproduce the issue:
|Disconnect ethernet cable, apply power, wait until device has booted,
|plug in ethernet, check for interfaces, no eth0 is listed.
|
|This appears to be a problem during probing of the AR8035 Phy chip.
|When ethernet has no link, the phy detection fails, and eth0 is not
|created. Plugging ethernet later has no effect, because there is no
|interface as far as the kernel is concerned. The relevant part of
|the boot log looks like this:
|this is the failing case:
|
|[0.876611] /plb/opb/emac-rgmii@ef601500: input 0 in RGMII mode
|[0.882532] /plb/opb/ethernet@ef600c00: reset timeout
|[0.888546] /plb/opb/ethernet@ef600c00: can't find PHY!
|and the succeeding case:
|
|[0.876672] /plb/opb/emac-rgmii@ef601500: input 0 in RGMII mode
|[0.883952] eth0: EMAC-0 /plb/opb/ethernet@ef600c00, MAC 00:01:..
|[0.890822] eth0: found Atheros 8035 Gigabit Ethernet PHY (0x01)

Based on the comment and the commit message of
commit 23fbb5a87c56 ("emac: Fix EMAC soft reset on 460EX/GT"),
this is because the AR8035 PHY doesn't provide the TX clock
if the ethernet cable is not attached. This causes the reset
to time out, and the PHY detection code in emac_init_phy() is
unable to detect the AR8035 PHY. As a result, the emac driver
bails out early and the user is left with no ethernet.

In order to stay compatible with existing configurations, the driver
tries the current reset approach first. Only if that first attempt
times out does it retry once more, with the clock temporarily
switched to the internal source for just the duration of the reset.
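
The retry policy described above can be sketched as a small helper. This is
an illustration under assumptions, not the emac_reset() code:
reset_with_fallback(), the reset_fn callback type and
phy_without_cable_reset() are hypothetical, and the callback stands for one
complete soft-reset attempt that includes switching the clock source and
switching it back afterwards.

```c
#include <stdbool.h>

enum clk_src { CLK_EXTERNAL, CLK_INTERNAL };

/* Hypothetical: one reset attempt; returns true if it finished in time. */
typedef bool (*reset_fn)(enum clk_src clk, void *ctx);

/* Try the normal reset first, then once more on the internal clock. */
static int reset_with_fallback(reset_fn try_reset, void *ctx, bool no_phy)
{
	enum clk_src clk = no_phy ? CLK_INTERNAL : CLK_EXTERNAL;

	if (try_reset(clk, ctx))
		return 0;

	/* First attempt timed out: one retry with the internal clock. */
	return try_reset(CLK_INTERNAL, ctx) ? 0 : -1;
}

/*
 * Hypothetical model of a PHY that provides no TX clock without a cable:
 * the reset only completes when the internal clock is selected.
 * ctx points to an int that counts the attempts.
 */
static bool phy_without_cable_reset(enum clk_src clk, void *ctx)
{
	int *attempts = ctx;

	(*attempts)++;
	return clk == CLK_INTERNAL;
}
```

Configurations whose first reset already succeeds never reach the retry,
which is how existing setups stay unaffected.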

LEDE-Bug: #687 

Cc: Chris Blake 
Reported-by: Russell Senior 
Fixes: 23fbb5a87c56e98 ("emac: Fix EMAC soft reset on 460EX/GT")
Signed-off-by: Christian Lamparter 
---
v1 -> v2:
- made it clear, that the clock source is only switched
  temporarily.
- fixed missing goto label, if !CONFIG_PPC_DCR_NATIVE
---
 drivers/net/ethernet/ibm/emac/core.c | 26 ++
 1 file changed, 22 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/ibm/emac/core.c 
b/drivers/net/ethernet/ibm/emac/core.c
index 508923f39ccf..b6e871bfb659 100644
--- a/drivers/net/ethernet/ibm/emac/core.c
+++ b/drivers/net/ethernet/ibm/emac/core.c
@@ -343,6 +343,7 @@ static int emac_reset(struct emac_instance *dev)
 {
struct emac_regs __iomem *p = dev->emacp;
int n = 20;
+   bool __maybe_unused try_internal_clock = false;
 
DBG(dev, "reset" NL);
 
@@ -355,6 +356,7 @@ static int emac_reset(struct emac_instance *dev)
}
 
 #ifdef CONFIG_PPC_DCR_NATIVE
+do_retry:
/*
 * PPC460EX/GT Embedded Processor Advanced User's Manual
 * section 28.10.1 Mode Register 0 (EMACx_MR0) states:
@@ -362,10 +364,19 @@ static int emac_reset(struct emac_instance *dev)
 * of the EMAC. If none is present, select the internal clock
 * (SDR0_ETH_CFG[EMACx_PHY_CLK] = 1).
 * After a soft reset, select the external clock.
+*
+* The AR8035-A PHY Meraki MR24 does not provide a TX Clk if the
+* ethernet cable is not attached. This causes the reset to timeout
+* and the PHY detection code in emac_init_phy() is unable to
+* communicate and detect the AR8035-A PHY. As a result, the emac
+* driver bails out early and the user has no ethernet.
+* In order to stay compatible with existing configurations, the
+* driver will temporarily switch to the internal clock, after
+* the first reset fails.
 */
if (emac_has_feature(dev, EMAC_FTR_460EX_PHY_CLK_FIX)) {
-   if (dev->phy_address == 0x &&
-   dev->phy_map == 0x) {
+   if (try_internal_clock || (dev->phy_address == 0x &&
+  dev->phy_map == 0x)) {
/* No PHY: select internal loop clock before reset */
dcri_clrset(SDR0, SDR0_ETH_CFG,
0, SDR0_ETH_CFG_ECS << dev->cell_index);
@@ -383,8 +394,15 @@ static int emac_reset(struct emac_instance *dev)
 
 #ifdef CONFIG_PPC_DCR_NATIVE
if (emac_has_feature(dev, EMAC_FTR_460EX_PHY_CLK_FIX)) {
-   if (dev->phy_address == 0x &&
-   dev->phy_map == 0x) {
+   if (!n && !try_internal_clock) {
+   /* first attempt has timed out. */
+   n = 20;
+   try_internal_clock = true;
+   goto do_retry;
+   }
+
+   if (try_internal_clock || (dev->phy_address == 0x &&
+  dev->phy_map == 0x)) {
  

Re: [PATCH net-next] net: fec: Clear and enable MIB counters on imx51

2017-06-07 Thread David Miller
From: Andrew Lunn 
Date: Wed,  7 Jun 2017 03:57:09 +0200

> Both the IMX51 and IMX53 datasheets indicate that the MIB counters
> should be cleared during setup. Otherwise random numbers are returned
> via ethtool -S.  Add a quirk and a function to do this.
> 
> Tested on an IMX51.
> 
> Signed-off-by: Andrew Lunn 

Applied, thanks.


Re: [PATCH net-next v2 3/3] udp: try to avoid 2 cache miss on dequeue

2017-06-07 Thread David Miller
From: Paolo Abeni 
Date: Wed, 07 Jun 2017 09:56:45 +0200

> Hi David,
> 
> On Tue, 2017-06-06 at 16:23 +0200, Paolo Abeni wrote:
>> when udp_recvmsg() is executed, on x86_64 and other archs, most skb
>> fields are on cold cachelines.
>> If the skbs are linear and the kernel doesn't need to compute the udp
>> csum, only a handful of skb fields are required by udp_recvmsg().
>> Since we already use skb->dev_scratch to cache hot data, and
>> there are 32 bits unused on 64 bit archs, use such field to cache
>> as much data as we can, and try to prefetch on dequeue the relevant
>> fields that are left out.
>> 
>> This can save up to 2 cache misses per packet.
>> 
>> v1 -> v2:
>>   - changed udp_dev_scratch fields types to u{32,16} variant,
>> replaced bitfield with bool
>> 
>> Signed-off-by: Paolo Abeni 
> 
> Can you please put this series on hold for a little while? The lkp-robot
> just reported a performance regression on v1 which I still have to
> investigate. I can't look at it right away, but I expect the same
> applies to v2.
> 
> It sounds quite weird to me, since the bisected patch touches the UDP
> code only and the regression is on apachebench.

Hmmm, DNS lookups?

Thanks for looking into this.


Re: [PATCH v2 1/2] net: emac: fix reset timeout with AR8035 phy

2017-06-07 Thread Andrew Lunn
On Wed, Jun 07, 2017 at 03:51:15PM +0200, Christian Lamparter wrote:
> This patch fixes a problem where the AR8035 PHY can't be
> detected on a Cisco Meraki MR24 if the ethernet cable is
> not connected on boot.
> 
> Russell Senior provided steps to reproduce the issue:
> |Disconnect ethernet cable, apply power, wait until device has booted,
> |plug in ethernet, check for interfaces, no eth0 is listed.
> |
> |This appears to be a problem during probing of the AR8035 Phy chip.
> |When ethernet has no link, the phy detection fails, and eth0 is not
> |created. Plugging ethernet later has no effect, because there is no
> |interface as far as the kernel is concerned. The relevant part of
> |the boot log looks like this:
> |this is the failing case:
> |
> |[0.876611] /plb/opb/emac-rgmii@ef601500: input 0 in RGMII mode
> |[0.882532] /plb/opb/ethernet@ef600c00: reset timeout
> |[0.888546] /plb/opb/ethernet@ef600c00: can't find PHY!
> |and the succeeding case:
> |
> |[0.876672] /plb/opb/emac-rgmii@ef601500: input 0 in RGMII mode
> |[0.883952] eth0: EMAC-0 /plb/opb/ethernet@ef600c00, MAC 00:01:..
> |[0.890822] eth0: found Atheros 8035 Gigabit Ethernet PHY (0x01)
> 
> Based on the comment and the commit message of
> commit 23fbb5a87c56 ("emac: Fix EMAC soft reset on 460EX/GT"),
> this is because the AR8035 PHY doesn't provide the TX Clock
> if the ethernet cable is not attached. This causes the reset
> to time out, and the PHY detection code in emac_init_phy() is
> unable to detect the AR8035 PHY. As a result, the emac driver
> bails out early and the user is left with no ethernet.
> 
> In order to stay compatible with existing configurations, the driver
> tries the current reset approach first. Only if the first attempt
> times out does it retry once more, with the clock temporarily
> switched to the internal source for just the duration of the reset.
> 
> LEDE-Bug: #687 
> 
> 
> Cc: Chris Blake 
> Reported-by: Russell Senior 
> Fixes: 23fbb5a87c56e98 ("emac: Fix EMAC soft reset on 460EX/GT")
> Signed-off-by: Christian Lamparter 

Reviewed-by: Andrew Lunn 

Andrew
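
For readers following along, the fallback described in the quoted commit message can be modelled in plain C. This is only a sketch under stated assumptions: struct fake_mac, reset_completes() and reset_with_fallback() are hypothetical names, and the model simply assumes the soft reset completes whenever a TX clock is present. The real driver toggles SDR0_ETH_CFG and polls the EMAC mode register instead.

```c
#include <stdbool.h>

enum clk_src { CLK_EXTERNAL, CLK_INTERNAL };

/* Hypothetical model of the MAC/PHY pair, not the driver's data structure. */
struct fake_mac {
	bool cable_attached;   /* AR8035 only drives the TX clock with a cable */
	enum clk_src clk;      /* currently selected clock source */
	bool used_internal;    /* set if the fallback path was taken */
};

/* Assumption: the soft reset completes only while a TX clock is present. */
static bool reset_completes(const struct fake_mac *m)
{
	return m->clk == CLK_INTERNAL || m->cable_attached;
}

/* Poll up to `tries` times, similar to the n = 20 loop in emac_reset(). */
static bool poll_reset(const struct fake_mac *m, int tries)
{
	while (tries-- > 0)
		if (reset_completes(m))
			return true;
	return false;
}

static bool reset_with_fallback(struct fake_mac *m)
{
	bool ok;

	/* First attempt: keep the existing behaviour (external clock). */
	m->clk = CLK_EXTERNAL;
	if (poll_reset(m, 20))
		return true;

	/* First attempt timed out: retry once on the internal clock. */
	m->clk = CLK_INTERNAL;
	m->used_internal = true;
	ok = poll_reset(m, 20);

	/* The switch is temporary; return to the external clock. */
	m->clk = CLK_EXTERNAL;
	return ok;
}
```

With cable_attached set to false the model only succeeds through the fallback path, and with it set to true the internal clock is never selected, which is the compatibility property the patch description aims for.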


Re: [PATCH v2 2/2] net: emac: fix and unify emac_mdio functions

2017-06-07 Thread Andrew Lunn
On Wed, Jun 07, 2017 at 03:51:16PM +0200, Christian Lamparter wrote:
> emac_mdio_read_link() was not copying the requested phy settings
> back into the emac driver's own phy api. This caused a link
> speed mismatch for the AR8035, as the emac driver kept
> trying to connect at 10/100 Mbit/s on a 1 Gbit/s link.
> 
> This patch also unifies shared code between emac_setup_aneg()
> and emac_mdio_setup_forced(). Furthermore, it removes
> a chunk of emac_mdio_init_phy() that was copying the same
> data into itself.
> 
> Signed-off-by: Christian Lamparter 

Reviewed-by: Andrew Lunn 

Andrew


[PATCH 1/2] ip_tunnel: fix potential issue in ip_tunnel_rcv

2017-06-07 Thread Haishuang Yan
When ip_tunnel_rcv fails, the tun_dst won't be freed, so move
skb_dst_set to the beginning so that tun_dst is freed by kfree_skb.

Signed-off-by: Haishuang Yan 
---
 net/ipv4/ip_tunnel.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
index b878ecb..27fc20f 100644
--- a/net/ipv4/ip_tunnel.c
+++ b/net/ipv4/ip_tunnel.c
@@ -386,6 +386,9 @@ int ip_tunnel_rcv(struct ip_tunnel *tunnel, struct sk_buff *skb,
const struct iphdr *iph = ip_hdr(skb);
int err;
 
+   if (tun_dst)
+   skb_dst_set(skb, (struct dst_entry *)tun_dst);
+
 #ifdef CONFIG_NET_IPGRE_BROADCAST
if (ipv4_is_multicast(iph->daddr)) {
tunnel->dev->stats.multicast++;
@@ -439,9 +442,6 @@ int ip_tunnel_rcv(struct ip_tunnel *tunnel, struct sk_buff *skb,
skb->dev = tunnel->dev;
}
 
-   if (tun_dst)
-   skb_dst_set(skb, (struct dst_entry *)tun_dst);
-
gro_cells_receive(&tunnel->gro_cells, skb);
return 0;
 
-- 
1.8.3.1
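
The ownership argument in the commit message can be illustrated with a small user-space model. This is a sketch only: struct dst, struct buf, buf_free() and rcv() are hypothetical stand-ins for the kernel objects, and live_dsts is a counter added purely so the leak is observable.

```c
#include <stdbool.h>
#include <stdlib.h>

struct dst { int unused; };
struct buf { struct dst *dst; };   /* stand-in for an skb holding a dst */

static int live_dsts;              /* number of dst objects not yet freed */

static struct dst *dst_alloc(void)
{
	struct dst *d = malloc(sizeof(*d));

	if (d)
		live_dsts++;
	return d;
}

static void dst_free(struct dst *d)
{
	if (d) {
		live_dsts--;
		free(d);
	}
}

/* Stand-in for kfree_skb(): dropping the buffer releases its dst too. */
static void buf_free(struct buf *b)
{
	dst_free(b->dst);
	b->dst = NULL;
}

/* Attach the dst first, so every later error path frees it with the buffer. */
static int rcv(struct buf *b, struct dst *d, bool packet_ok)
{
	if (d)
		b->dst = d;        /* ownership moves to the buffer up front */

	if (!packet_ok) {
		buf_free(b);       /* drop path: dst is released as well */
		return -1;
	}
	return 0;
}
```

If the attach step were done only after the validity checks, the drop path would free the buffer while the caller-provided dst stayed allocated, which is the leak the patch describes.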





[PATCH 2/2] ip6_tunnel: fix potential issue in __ip6_tnl_rcv

2017-06-07 Thread Haishuang Yan
When __ip6_tnl_rcv fails, the tun_dst won't be freed, so move
skb_dst_set to the beginning so that tun_dst is freed by kfree_skb.

Signed-off-by: Haishuang Yan 
---
 net/ipv6/ip6_tunnel.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index 9b37f97..bf45f1b 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -789,6 +789,9 @@ static int __ip6_tnl_rcv(struct ip6_tnl *tunnel, struct sk_buff *skb,
const struct ipv6hdr *ipv6h = ipv6_hdr(skb);
int err;
 
+   if (tun_dst)
+   skb_dst_set(skb, (struct dst_entry *)tun_dst);
+
if ((!(tpi->flags & TUNNEL_CSUM) &&
 (tunnel->parms.i_flags & TUNNEL_CSUM)) ||
((tpi->flags & TUNNEL_CSUM) &&
@@ -852,9 +855,6 @@ static int __ip6_tnl_rcv(struct ip6_tnl *tunnel, struct sk_buff *skb,
 
skb_scrub_packet(skb, !net_eq(tunnel->net, dev_net(tunnel->dev)));
 
-   if (tun_dst)
-   skb_dst_set(skb, (struct dst_entry *)tun_dst);
-
gro_cells_receive(&tunnel->gro_cells, skb);
return 0;
 
-- 
1.8.3.1





Re: [PATCH v2] decnet: dn_rtmsg: Improve input length sanitization in dnrmg_receive_user_skb

2017-06-07 Thread Florian Westphal
Mateusz Jurczyk  wrote:
> Verify that the length of the socket buffer is sufficient to cover the
> nlmsghdr structure before accessing the nlh->nlmsg_len field for further
> input sanitization. If the client only supplies 1-3 bytes of data in
> sk_buff, then nlh->nlmsg_len remains partially uninitialized and
> contains leftover memory from the corresponding kernel allocation.
> Operating on such data may result in indeterminate evaluation of the
> nlmsg_len < sizeof(*nlh) expression.
> 
> The bug was discovered by a runtime instrumentation designed to detect
> use of uninitialized memory in the kernel. The patch prevents this and
> other similar tools (e.g. KMSAN) from flagging this behavior in the future.

Instead of changing all the internal users wouldn't it be better
to add this check once in netlink_unicast_kernel?



RE: [PATCH net] bnx2x: fix pf2vf bulletin DMA mapping leak

2017-06-07 Thread Mintz, Yuval
> When freeing VF's DMA mappings, an already NULLed pointer was checked
> again due to an apparent copy&paste error. Consequently, the pf2vf bulletin
> DMA mapping was not freed.
> 
> Signed-off-by: Michal Schmidt 

Thanks Michal.

Acked-by: Yuval Mintz 


[PATCH v2] decnet: dn_rtmsg: Improve input length sanitization in dnrmg_receive_user_skb

2017-06-07 Thread Mateusz Jurczyk
Verify that the length of the socket buffer is sufficient to cover the
nlmsghdr structure before accessing the nlh->nlmsg_len field for further
input sanitization. If the client only supplies 1-3 bytes of data in
sk_buff, then nlh->nlmsg_len remains partially uninitialized and
contains leftover memory from the corresponding kernel allocation.
Operating on such data may result in indeterminate evaluation of the
nlmsg_len < sizeof(*nlh) expression.

The bug was discovered by a runtime instrumentation designed to detect
use of uninitialized memory in the kernel. The patch prevents this and
other similar tools (e.g. KMSAN) from flagging this behavior in the future.

Signed-off-by: Mateusz Jurczyk 
---
Changes in v2:
  - Compare skb->len against sizeof(*nlh) instead of sizeof(nlh->nlmsg_len)
to avoid assuming the layout of the nlmsghdr structure. This was
motivated by Eric Dumazet's comment on a related patch submission.

 net/decnet/netfilter/dn_rtmsg.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/decnet/netfilter/dn_rtmsg.c b/net/decnet/netfilter/dn_rtmsg.c
index 1ed81ac6dd1a..aa8ffecc46a4 100644
--- a/net/decnet/netfilter/dn_rtmsg.c
+++ b/net/decnet/netfilter/dn_rtmsg.c
@@ -102,7 +102,9 @@ static inline void dnrmg_receive_user_skb(struct sk_buff *skb)
 {
struct nlmsghdr *nlh = nlmsg_hdr(skb);
 
-   if (nlh->nlmsg_len < sizeof(*nlh) || skb->len < nlh->nlmsg_len)
+   if (skb->len < sizeof(*nlh) ||
+   nlh->nlmsg_len < sizeof(*nlh) ||
+   skb->len < nlh->nlmsg_len)
return;
 
if (!netlink_capable(skb, CAP_NET_ADMIN))
-- 
2.13.1.508.gb3defc5cc-goog
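
The ordering of the three checks matters, and a user-space sketch may make that easier to see. Assumptions: struct msg_hdr is a hypothetical header that only mirrors the general shape of nlmsghdr (a length field first), and msg_len_ok() is an illustrative helper, not a kernel function.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header; only the leading length field matters here. */
struct msg_hdr {
	uint32_t len;      /* total message length, header included */
	uint16_t type;
	uint16_t flags;
	uint32_t seq;
	uint32_t pid;
};

static bool msg_len_ok(const unsigned char *buf, size_t buf_len)
{
	struct msg_hdr hdr;

	/* 1. The buffer must hold a whole header before hdr.len is read. */
	if (buf_len < sizeof(hdr))
		return false;
	memcpy(&hdr, buf, sizeof(hdr));

	/* 2. The claimed length must cover at least the header itself. */
	if (hdr.len < sizeof(hdr))
		return false;

	/* 3. The claimed length must not exceed what was received. */
	return buf_len >= hdr.len;
}
```

Without step 1, a 1-3 byte buffer would let the length field be read partly from memory the sender never supplied, which is the situation the patch description calls out.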



Re: [PATCH 4/9] net: mvmdio: move the read valid check into its own function

2017-06-07 Thread Antoine Tenart
Hello,

On Wed, Jun 07, 2017 at 01:00:21PM +0300, Sergei Shtylyov wrote:
> On 6/7/2017 11:38 AM, Antoine Tenart wrote:
> > 
> > -   val = readl(dev->regs);
> > -   if (!(val & MVMDIO_SMI_READ_VALID)) {
> > +   if (orion_mdio_smi_is_read_valid(dev)) {
> > dev_err(bus->parent, "SMI bus read not valid\n");
> 
> I think you reversed the valid/invalid sense in the new function's name.

Good catch, I'll fix this.

Thanks!
Antoine
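
For reference, a minimal sketch of the intended sense: a helper named "is_read_valid" returns true when the valid bit is set, and the caller reports an error on the negated result. SKETCH_SMI_READ_VALID, smi_is_read_valid() and smi_check_read() are hypothetical names, and the bit position is a placeholder rather than a value taken from the driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit position for this sketch; see the driver for the real one. */
#define SKETCH_SMI_READ_VALID (1u << 27)

/* True when the register value reports a valid read. */
static bool smi_is_read_valid(uint32_t reg_val)
{
	return (reg_val & SKETCH_SMI_READ_VALID) != 0;
}

/* Returns 0 when the read data can be used, -1 otherwise. */
static int smi_check_read(uint32_t reg_val)
{
	if (!smi_is_read_valid(reg_val))
		return -1;   /* the "SMI bus read not valid" case */
	return 0;
}
```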

-- 
Antoine Ténart, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com




Re: [PATCH 7/9] net: mvmdio: add xmdio support

2017-06-07 Thread Antoine Tenart
Hi Andrew,

On Wed, Jun 07, 2017 at 02:12:05PM +0200, Andrew Lunn wrote:
> On Wed, Jun 07, 2017 at 10:38:08AM +0200, Antoine Tenart wrote:
> > This patch adds the xMDIO interface support in the mvmdio driver. This
> > interface is used in Ethernet controllers on Marvell 370, 7k and 8k (as
> > of now). The xSMI interface supported by this driver complies with the
> > IEEE 802.3 clause 45 (while the SMI interface complies with the clause
> > 22). The xSMI interface is used by 10GbE devices.
> 
> I've only taken a quick look, but I don't see anywhere you look at the
> register address and see if it has MII_ADDR_C45 to determine if a C45
> transaction should be done, or a C22. The MDIO bus can have a mix of
> C45 and C22 devices on it, and you need to use the correct transaction
> type depending on the target device/address.

So this could be decided dynamically rather than based on the compatible
string. I'll try this and see if it works.

Thanks!
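
A rough sketch of the per-access dispatch Andrew describes, for reference. Assumptions: the C45 flag sits in bit 30 of the register number, with the device address in bits 16-20 and the register address in the low 16 bits; to my knowledge this matches MII_ADDR_C45 in include/linux/mdio.h, but treat the layout as an assumption. SKETCH_MII_ADDR_C45, struct mdio_xfer and mdio_decode() are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed flag marking a clause 45 register number. */
#define SKETCH_MII_ADDR_C45 (1u << 30)

struct mdio_xfer {
	bool c45;            /* true: clause 45 transaction, false: clause 22 */
	unsigned int devad;  /* clause 45 device address (unused for C22) */
	unsigned int reg;    /* register address within the device */
};

/* Decide the transaction type from the register number of each access. */
static struct mdio_xfer mdio_decode(uint32_t regnum)
{
	struct mdio_xfer x;

	x.c45 = (regnum & SKETCH_MII_ADDR_C45) != 0;
	if (x.c45) {
		x.devad = (regnum >> 16) & 0x1f;
		x.reg = regnum & 0xffff;
	} else {
		x.devad = 0;
		x.reg = regnum & 0x1f;   /* clause 22 has 32 registers */
	}
	return x;
}
```

A read or write callback could call such a helper first and then pick the SMI or xSMI path, so that one bus can serve a mix of C22 and C45 devices.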

-- 
Antoine Ténart, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com




Re: [PATCH v2] decnet: dn_rtmsg: Improve input length sanitization in dnrmg_receive_user_skb

2017-06-07 Thread Mateusz Jurczyk
On Wed, Jun 7, 2017 at 4:18 PM, Florian Westphal  wrote:
> Mateusz Jurczyk  wrote:
>> Verify that the length of the socket buffer is sufficient to cover the
>> nlmsghdr structure before accessing the nlh->nlmsg_len field for further
>> input sanitization. If the client only supplies 1-3 bytes of data in
>> sk_buff, then nlh->nlmsg_len remains partially uninitialized and
>> contains leftover memory from the corresponding kernel allocation.
>> Operating on such data may result in indeterminate evaluation of the
>> nlmsg_len < sizeof(*nlh) expression.
>>
>> The bug was discovered by a runtime instrumentation designed to detect
>> use of uninitialized memory in the kernel. The patch prevents this and
>> other similar tools (e.g. KMSAN) from flagging this behavior in the future.
>
> Instead of changing all the internal users wouldn't it be better
> to add this check once in netlink_unicast_kernel?
>

Perhaps. I must admit I'm not very familiar with this code
area/interface, so I preferred to fix the few specific cases instead
of submitting a general patch, which might have some unexpected side
effects, e.g. behavior different from one of the internal clients etc.

If you think one check in netlink_unicast_kernel is a better way to do
it, I'm happy to implement it like that.

Thanks,
Mateusz


[RFC PATCH net-next 0/5] bpf: rewrite value tracking in verifier

2017-06-07 Thread Edward Cree
This series simplifies alignment tracking, generalises bounds tracking and
 fixes some bounds-tracking bugs in the BPF verifier.  Pointer arithmetic on
 packet pointers, stack pointers, map value pointers and context pointers has
 been unified, and bounds on these pointers are only checked when the pointer
 is dereferenced.
Operations on pointers which destroy all relation to the original pointer
 (such as multiplies and shifts) are disallowed if !env->allow_ptr_leaks,
 otherwise they convert the pointer to an unknown scalar and feed it to the
 normal scalar arithmetic handling.
Pointer types have been unified with the corresponding adjusted-pointer types
 where those existed (e.g. PTR_TO_MAP_VALUE[_ADJ] or FRAME_PTR vs
 PTR_TO_STACK); similarly, CONST_IMM and UNKNOWN_VALUE have been unified into
 SCALAR_VALUE.
Pointer types (except CONST_PTR_TO_MAP, PTR_TO_MAP_VALUE_OR_NULL and
 PTR_TO_PACKET_END, which do not allow arithmetic) have a 'fixed offset' and
 a 'variable offset'; the former is used when e.g. adding an immediate or a
 known-constant register, as long as it does not overflow.  Otherwise the
 latter is used, and any operation creating a new variable offset creates a
 new 'id' (and, for PTR_TO_PACKET, clears the 'range').
SCALAR_VALUEs use the 'variable offset' fields to track the range of possible
 values; the 'fixed offset' should never be set on a scalar.
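
As a rough illustration of the "as long as it does not overflow" rule for the fixed offset, here is a sketch. It is not code from the series: add_fixed_off() is a hypothetical helper, and the 32-bit signed width is taken from the "s32 off" field in patch 2/5.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Try to fold a known constant into the fixed offset.
 * Returns false (leaving *off untouched) if the result does not fit,
 * in which case a caller would have to use the variable offset instead.
 */
static bool add_fixed_off(int32_t *off, int64_t imm)
{
	int64_t sum;

	/* Reject constants that cannot fit on their own. */
	if (imm < INT32_MIN || imm > INT32_MAX)
		return false;

	/* Both operands fit in 32 bits, so the 64-bit sum cannot overflow. */
	sum = (int64_t)*off + imm;
	if (sum < INT32_MIN || sum > INT32_MAX)
		return false;

	*off = (int32_t)sum;
	return true;
}
```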

Patch 2/5 is rather on the big side, but since it changes the contents and
 semantics of a fairly central data structure, I'm not really sure how to go
 about splitting it up further without producing broken intermediate states.

With the changes in patch 5/5, all tools/testing/selftests/bpf/test_verifier
 tests pass.

Edward Cree (5):
  selftests/bpf: add test for mixed signed and unsigned bounds checks
  bpf/verifier: rework value tracking
  bpf/verifier: feed pointer-to-unknown-scalar casts into scalar ALU
    path
  bpf/verifier: track signed and unsigned min/max values
  selftests/bpf: change test_verifier expectations

 include/linux/bpf.h |   34 +-
 include/linux/bpf_verifier.h|   56 +-
 include/linux/tnum.h|   58 +
 kernel/bpf/Makefile |2 +-
 kernel/bpf/tnum.c   |  163 +++
 kernel/bpf/verifier.c   | 1852 ---
 tools/testing/selftests/bpf/test_verifier.c |  248 ++--
 7 files changed, 1482 insertions(+), 931 deletions(-)
 create mode 100644 include/linux/tnum.h
 create mode 100644 kernel/bpf/tnum.c



[RFC PATCH net-next 1/5] selftests/bpf: add test for mixed signed and unsigned bounds checks

2017-06-07 Thread Edward Cree
Currently fails due to bug in verifier bounds handling.

Signed-off-by: Edward Cree 
---
 tools/testing/selftests/bpf/test_verifier.c | 26 ++
 1 file changed, 26 insertions(+)

diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
index cabb19b..5074cfa 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -5169,6 +5169,32 @@ static struct bpf_test tests[] = {
},
.result = ACCEPT,
},
+   {
+   "bounds checks mixing signed and unsigned",
+   .insns = {
+   BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+   BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+   BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+   BPF_LD_MAP_FD(BPF_REG_1, 0),
+   BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+BPF_FUNC_map_lookup_elem),
+   BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 7),
+   BPF_ST_MEM(BPF_DW, BPF_REG_10, -16, -8),
+   BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_10, -16),
+   BPF_MOV64_IMM(BPF_REG_2, -1),
+   BPF_JMP_REG(BPF_JGT, BPF_REG_1, BPF_REG_2, 3),
+   BPF_JMP_IMM(BPF_JSGT, BPF_REG_1, 1, 2),
+   BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_1),
+   BPF_ST_MEM(BPF_B, BPF_REG_0, 0, 0),
+   BPF_MOV64_IMM(BPF_REG_0, 0),
+   BPF_EXIT_INSN(),
+   },
+   .fixup_map1 = { 3 },
+   .errstr_unpriv = "R0 pointer arithmetic prohibited",
+   .errstr = "R0 min value is negative, either use unsigned index or do a if (index >=0) check.",
+   .result = REJECT,
+   .result_unpriv = REJECT,
+   },
 };
 
 static int probe_filter_length(const struct bpf_insn *fp)
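
To see why the new test must be rejected, the two conditional jumps can be replayed in ordinary C. This is a sketch of the arithmetic only, not of the verifier: passes_both_checks() is a hypothetical helper that returns true when neither jump is taken, i.e. when execution reaches the pointer addition.

```c
#include <stdbool.h>
#include <stdint.h>

static bool passes_both_checks(uint64_t r1)
{
	/* BPF_JGT against -1: unsigned compare with U64_MAX, never true. */
	if (r1 > (uint64_t)-1)
		return false;

	/* BPF_JSGT against 1: signed compare, bounds r1 from above only. */
	if ((int64_t)r1 > 1)
		return false;

	return true;
}
```

The value -8 stored by the test program passes both checks while still being negative, so adding it to the map value pointer would move the access before the start of the value.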



[RFC PATCH net-next 4/5] bpf/verifier: track signed and unsigned min/max values

2017-06-07 Thread Edward Cree
Allows us to, sometimes, combine information from a signed check of one
 bound and an unsigned check of the other.
We now track the full range of possible values, rather than restricting
 ourselves to [0, 1<<30) and considering anything beyond that as
 unknown.  While this is probably not necessary, it makes the code more
 straightforward and symmetrical between signed and unsigned bounds.
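
One way the two kinds of bounds can inform each other is sketched below. This is an illustration of the idea only, not code from this patch (whose diff is truncated in this archive): struct bounds and deduce_bounds() are hypothetical names.

```c
#include <stdint.h>

struct bounds {
	int64_t smin, smax;    /* range of the value viewed as s64 */
	uint64_t umin, umax;   /* range of the value viewed as u64 */
};

static void deduce_bounds(struct bounds *b)
{
	/*
	 * Known non-negative as s64: the s64 and u64 views are identical,
	 * so the signed range also bounds the unsigned view.
	 */
	if (b->smin >= 0) {
		if ((uint64_t)b->smin > b->umin)
			b->umin = (uint64_t)b->smin;
		if ((uint64_t)b->smax < b->umax)
			b->umax = (uint64_t)b->smax;
	}

	/*
	 * Top bit known clear as u64: again both views agree, so the
	 * unsigned range also bounds the signed view.
	 */
	if (b->umax <= (uint64_t)INT64_MAX) {
		if ((int64_t)b->umin > b->smin)
			b->smin = (int64_t)b->umin;
		if ((int64_t)b->umax < b->smax)
			b->smax = (int64_t)b->umax;
	}
}
```

When neither condition holds, as with the -8 case in the test from patch 1/5, nothing can be transferred and the two ranges stay independent.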

Signed-off-by: Edward Cree 
---
 include/linux/bpf_verifier.h |  22 +-
 kernel/bpf/verifier.c| 661 +--
 2 files changed, 395 insertions(+), 288 deletions(-)

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index e341469..10a5944 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -11,11 +11,15 @@
 #include  /* for MAX_BPF_STACK */
 #include 
 
- /* Just some arbitrary values so we can safely do math without overflowing and
-  * are obviously wrong for any sort of memory access.
-  */
-#define BPF_REGISTER_MAX_RANGE (1024 * 1024 * 1024)
-#define BPF_REGISTER_MIN_RANGE -1
+/* Maximum variable offset umax_value permitted when resolving memory accesses.
+ * In practice this is far bigger than any realistic pointer offset; this limit
+ * ensures that umax_value + (int)off + (int)size cannot overflow a u64.
+ */
+#define BPF_MAX_VAR_OFF (1ULL << 31)
+/* Maximum variable size permitted for ARG_CONST_SIZE[_OR_ZERO].  This ensures
+ * that converting umax_value to int cannot overflow.
+ */
+#define BPF_MAX_VAR_SIZ INT_MAX
 
 struct bpf_reg_state {
enum bpf_reg_type type;
@@ -38,7 +42,7 @@ struct bpf_reg_state {
 * PTR_TO_MAP_VALUE_OR_NULL, we have to NULL-check it _first_.
 */
u32 id;
-   /* These three fields must be last.  See states_equal() */
+   /* These five fields must be last.  See states_equal() */
/* For scalar types (SCALAR_VALUE), this represents our knowledge of
 * the actual value.
 * For pointer types, this represents the variable part of the offset
@@ -51,8 +55,10 @@ struct bpf_reg_state {
 * These refer to the same value as align, not necessarily the actual
 * contents of the register.
 */
-   s64 min_value; /* minimum possible (s64)value */
-   u64 max_value; /* maximum possible (u64)value */
+   s64 smin_value; /* minimum possible (s64)value */
+   s64 smax_value; /* maximum possible (s64)value */
+   u64 umin_value; /* minimum possible (u64)value */
+   u64 umax_value; /* maximum possible (u64)value */
 };
 
 enum bpf_stack_slot_type {
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 1ff5b5d..a5bb3f1 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -234,12 +234,20 @@ static void print_verifier_state(struct bpf_verifier_state *state)
verbose(",ks=%d,vs=%d",
reg->map_ptr->key_size,
reg->map_ptr->value_size);
-   if (reg->min_value != BPF_REGISTER_MIN_RANGE)
-   verbose(",min_value=%lld",
-   (long long)reg->min_value);
-   if (reg->max_value != BPF_REGISTER_MAX_RANGE)
-   verbose(",max_value=%llu",
-   (unsigned long long)reg->max_value);
+   if (reg->smin_value != reg->umin_value &&
+   reg->smin_value != S64_MIN)
+   verbose(",smin_value=%lld",
+   (long long)reg->smin_value);
+   if (reg->smax_value != reg->umax_value &&
+   reg->smax_value != S64_MAX)
+   verbose(",smax_value=%lld",
+   (long long)reg->smax_value);
+   if (reg->umin_value != 0)
+   verbose(",umin_value=%llu",
+   (unsigned long long)reg->umin_value);
+   if (reg->umax_value != U64_MAX)
+   verbose(",umax_value=%llu",
+   (unsigned long long)reg->umax_value);
if (~reg->align.mask) {
char tn_buf[48];
 
@@ -464,14 +472,24 @@ static const int caller_saved[CALLER_SAVED_REGS] = {
BPF_REG_0, BPF_REG_1, BPF_REG_2, BPF_REG_3, BPF_REG_4, BPF_REG_5
 };
 
+/* Mark the unknown part of a register (variable offset or scalar value) as
+ * known to have the value @imm.
+ */
+static void __mark_reg_known(struct bpf_reg_state *reg, u64 imm)
+{
+   reg->align = tn_const(imm);
+   reg->smin_value = (s64)imm;
+   reg->smax_value = (s64)imm;
+   reg->umin_value = imm;
+   reg->umax_value = imm;
+}
+
 /* Mark the 'variable offset' part of a register as zero.  This should be

[RFC PATCH net-next 5/5] selftests/bpf: change test_verifier expectations

2017-06-07 Thread Edward Cree
Some of the verifier's error messages have changed, and some constructs
 that previously couldn't be verified are now accepted.

Signed-off-by: Edward Cree 
---
 tools/testing/selftests/bpf/test_verifier.c | 226 ++--
 1 file changed, 116 insertions(+), 110 deletions(-)

diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
index 5074cfa..f5281df 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -421,7 +421,7 @@ static struct bpf_test tests[] = {
BPF_STX_MEM(BPF_DW, BPF_REG_1, BPF_REG_0, 0),
BPF_EXIT_INSN(),
},
-   .errstr_unpriv = "R1 pointer arithmetic",
+   .errstr_unpriv = "R1 subtraction from stack pointer",
.result_unpriv = REJECT,
.errstr = "R1 invalid mem access",
.result = REJECT,
@@ -603,8 +603,9 @@ static struct bpf_test tests[] = {
BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_2, -4),
BPF_EXIT_INSN(),
},
-   .errstr = "misaligned access",
+   .errstr = "misaligned stack access",
.result = REJECT,
+   .flags = F_LOAD_WITH_STRICT_ALIGNMENT,
},
{
"invalid map_fd for function call",
@@ -650,8 +651,9 @@ static struct bpf_test tests[] = {
BPF_EXIT_INSN(),
},
.fixup_map1 = { 3 },
-   .errstr = "misaligned access",
+   .errstr = "misaligned value access",
.result = REJECT,
+   .flags = F_LOAD_WITH_STRICT_ALIGNMENT,
},
{
"sometimes access memory with incorrect alignment",
@@ -672,6 +674,7 @@ static struct bpf_test tests[] = {
.errstr = "R0 invalid mem access",
.errstr_unpriv = "R0 leaks addr",
.result = REJECT,
+   .flags = F_LOAD_WITH_STRICT_ALIGNMENT,
},
{
"jump test 1",
@@ -1184,8 +1187,9 @@ static struct bpf_test tests[] = {
offsetof(struct __sk_buff, cb[0]) + 1),
BPF_EXIT_INSN(),
},
-   .errstr = "misaligned access",
+   .errstr = "misaligned context access",
.result = REJECT,
+   .flags = F_LOAD_WITH_STRICT_ALIGNMENT,
},
{
"check cb access: half, oob 1",
@@ -1279,8 +1283,9 @@ static struct bpf_test tests[] = {
offsetof(struct __sk_buff, cb[0]) + 2),
BPF_EXIT_INSN(),
},
-   .errstr = "misaligned access",
+   .errstr = "misaligned context access",
.result = REJECT,
+   .flags = F_LOAD_WITH_STRICT_ALIGNMENT,
},
{
"check cb access: word, unaligned 2",
@@ -1290,8 +1295,9 @@ static struct bpf_test tests[] = {
offsetof(struct __sk_buff, cb[4]) + 1),
BPF_EXIT_INSN(),
},
-   .errstr = "misaligned access",
+   .errstr = "misaligned context access",
.result = REJECT,
+   .flags = F_LOAD_WITH_STRICT_ALIGNMENT,
},
{
"check cb access: word, unaligned 3",
@@ -1301,8 +1307,9 @@ static struct bpf_test tests[] = {
offsetof(struct __sk_buff, cb[4]) + 2),
BPF_EXIT_INSN(),
},
-   .errstr = "misaligned access",
+   .errstr = "misaligned context access",
.result = REJECT,
+   .flags = F_LOAD_WITH_STRICT_ALIGNMENT,
},
{
"check cb access: word, unaligned 4",
@@ -1312,8 +1319,9 @@ static struct bpf_test tests[] = {
offsetof(struct __sk_buff, cb[4]) + 3),
BPF_EXIT_INSN(),
},
-   .errstr = "misaligned access",
+   .errstr = "misaligned context access",
.result = REJECT,
+   .flags = F_LOAD_WITH_STRICT_ALIGNMENT,
},
{
"check cb access: double",
@@ -1339,8 +1347,9 @@ static struct bpf_test tests[] = {
offsetof(struct __sk_buff, cb[1])),
BPF_EXIT_INSN(),
},
-   .errstr = "misaligned access",
+   .errstr = "misaligned context access",
.result = REJECT,
+   .flags = F_LOAD_WITH_STRICT_ALIGNMENT,
},
{
"check cb access: double, unaligned 2",
@@ -1350,8 +1359,9 @@ static struct bpf_test tests[] = {
offsetof(struct __sk_buff, c

[RFC PATCH net-next 2/5] bpf/verifier: rework value tracking

2017-06-07 Thread Edward Cree
Tracks value alignment by means of tracking known & unknown bits.
Tightens some min/max value checks and fixes a couple of bugs therein.

Signed-off-by: Edward Cree 
---
 include/linux/bpf.h  |   34 +-
 include/linux/bpf_verifier.h |   40 +-
 include/linux/tnum.h |   58 ++
 kernel/bpf/Makefile  |2 +-
 kernel/bpf/tnum.c|  163 +
 kernel/bpf/verifier.c| 1641 +++---
 6 files changed, 1170 insertions(+), 768 deletions(-)
 create mode 100644 include/linux/tnum.h
 create mode 100644 kernel/bpf/tnum.c

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 6bb38d7..5ac19ab 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -115,35 +115,25 @@ enum bpf_access_type {
 };
 
 /* types of values stored in eBPF registers */
+/* Pointer types represent:
+ * pointer
+ * pointer + imm
+ * pointer + (u16) var
+ * pointer + (u16) var + imm
+ * if (range > 0) then [ptr, ptr + range - off) is safe to access
+ * if (id > 0) means that some 'var' was added
+ * if (off > 0) means that 'imm' was added
+ */
 enum bpf_reg_type {
NOT_INIT = 0,/* nothing was written into register */
-   UNKNOWN_VALUE,   /* reg doesn't contain a valid pointer */
+   SCALAR_VALUE,/* reg doesn't contain a valid pointer */
PTR_TO_CTX,  /* reg points to bpf_context */
CONST_PTR_TO_MAP,/* reg points to struct bpf_map */
PTR_TO_MAP_VALUE,/* reg points to map element value */
PTR_TO_MAP_VALUE_OR_NULL,/* points to map elem value or NULL */
-   FRAME_PTR,   /* reg == frame_pointer */
-   PTR_TO_STACK,/* reg == frame_pointer + imm */
-   CONST_IMM,   /* constant integer value */
-
-   /* PTR_TO_PACKET represents:
-* skb->data
-* skb->data + imm
-* skb->data + (u16) var
-* skb->data + (u16) var + imm
-* if (range > 0) then [ptr, ptr + range - off) is safe to access
-* if (id > 0) means that some 'var' was added
-* if (off > 0) menas that 'imm' was added
-*/
-   PTR_TO_PACKET,
+   PTR_TO_STACK,/* reg == frame_pointer + offset */
+   PTR_TO_PACKET,   /* reg points to skb->data */
PTR_TO_PACKET_END,   /* skb->data + headlen */
-
-   /* PTR_TO_MAP_VALUE_ADJ is used for doing pointer math inside of a map
-* elem value.  We only allow this if we can statically verify that
-* access from this register are going to fall within the size of the
-* map element.
-*/
-   PTR_TO_MAP_VALUE_ADJ,
 };
 
 struct bpf_prog;
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index d5093b5..e341469 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -9,6 +9,7 @@
 
 #include  /* for enum bpf_reg_type */
 #include  /* for MAX_BPF_STACK */
+#include 
 
  /* Just some arbitrary values so we can safely do math without overflowing and
   * are obviously wrong for any sort of memory access.
@@ -19,30 +20,39 @@
 struct bpf_reg_state {
enum bpf_reg_type type;
union {
-   /* valid when type == CONST_IMM | PTR_TO_STACK | UNKNOWN_VALUE 
*/
-   s64 imm;
-
-   /* valid when type == PTR_TO_PACKET* */
-   struct {
-   u16 off;
-   u16 range;
-   };
+   /* valid when type == PTR_TO_PACKET */
+   u32 range;
 
/* valid when type == CONST_PTR_TO_MAP | PTR_TO_MAP_VALUE |
 *   PTR_TO_MAP_VALUE_OR_NULL
 */
struct bpf_map *map_ptr;
};
+   /* Fixed part of pointer offset, pointer types only */
+   s32 off;
+   /* Used to find other pointers with the same variable offset, so they
+* can share range knowledge.
+* Exception: for PTR_TO_MAP_VALUE_OR_NULL this is used to share which
+* map value we came from, when one is tested for != NULL.  Note that
+* this overloading means that we can't do pointer arithmetic on a
+* PTR_TO_MAP_VALUE_OR_NULL, we have to NULL-check it _first_.
+*/
u32 id;
+   /* These three fields must be last.  See states_equal() */
+   /* For scalar types (SCALAR_VALUE), this represents our knowledge of
+* the actual value.
+* For pointer types, this represents the variable part of the offset
+* from the pointed-to object, and is shared with all bpf_reg_states
+* with the same id as us.
+*/
+   struct tnum align;
/* Used to determine if any memory access using this register will
-* result in a bad access. These two fields must be last.
-* See states_equal()
+* result in a bad access.
+* These refer to the same value as align, not necessarily the actual
+* contents of t

[RFC PATCH net-next 3/5] bpf/verifier: feed pointer-to-unknown-scalar casts into scalar ALU path

2017-06-07 Thread Edward Cree
If pointer leaks are allowed, and adjust_ptr_min_max_vals returns -EACCES,
 treat the pointer as an unknown scalar and try again, because we might be
 able to conclude something about the result (e.g. pointer & 0x40 is either
 0 or 0x40).
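
The "pointer & 0x40" example can be worked through with a small known-bits sketch. Assumptions: the (value, mask) representation below is my reading of the "known & unknown bits" description in patch 2/5, and kernel/bpf/tnum.c itself is not shown in this archive, so sk_tnum and its helpers are hypothetical names rather than the series' API.

```c
#include <stdint.h>

/* value: bits known to be 1; mask: bits whose state is unknown. */
struct sk_tnum {
	uint64_t value;
	uint64_t mask;
};

static struct sk_tnum sk_tnum_const(uint64_t v)
{
	return (struct sk_tnum){ v, 0 };
}

static struct sk_tnum sk_tnum_unknown(void)
{
	return (struct sk_tnum){ 0, UINT64_MAX };
}

static struct sk_tnum sk_tnum_and(struct sk_tnum a, struct sk_tnum b)
{
	uint64_t alpha = a.value | a.mask;   /* bits that may be 1 in a */
	uint64_t beta = b.value | b.mask;    /* bits that may be 1 in b */
	uint64_t v = a.value & b.value;      /* bits known to be 1 in both */

	/* A result bit is unknown if it may be 1 but is not known to be 1. */
	return (struct sk_tnum){ v, alpha & beta & ~v };
}
```

ANDing a fully unknown scalar with the constant 0x40 gives value 0 and mask 0x40: every bit except bit 6 is known to be zero, so the result is either 0 or 0x40, as the commit message says.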

Signed-off-by: Edward Cree 
---
 kernel/bpf/verifier.c | 244 ++
 1 file changed, 127 insertions(+), 117 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index dd06e4e..1ff5b5d 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1566,6 +1566,8 @@ static void coerce_reg_to_32(struct bpf_reg_state *reg)
 /* Handles arithmetic on a pointer and a scalar: computes new min/max and 
align.
  * Caller must check_reg_overflow all argument regs beforehand.
  * Caller should also handle BPF_MOV case separately.
+ * If we return -EACCES, caller may want to try again treating pointer as a
+ * scalar.  So we only emit a diagnostic if !env->allow_ptr_leaks.
  */
 static int adjust_ptr_min_max_vals(struct bpf_verifier_env *env,
   struct bpf_insn *insn,
@@ -1588,43 +1590,29 @@ static int adjust_ptr_min_max_vals(struct bpf_verifier_env *env,
 
if (BPF_CLASS(insn->code) != BPF_ALU64) {
/* 32-bit ALU ops on pointers produce (meaningless) scalars */
-   if (!env->allow_ptr_leaks) {
+   if (!env->allow_ptr_leaks)
verbose("R%d 32-bit pointer arithmetic prohibited\n",
dst);
-   return -EACCES;
-   }
-   __mark_reg_unknown(dst_reg);
-   /* High bits are known zero */
-   dst_reg->align.mask = (u32)-1;
-   return 0;
+   return -EACCES;
}
 
if (ptr_reg->type == PTR_TO_MAP_VALUE_OR_NULL) {
-   if (!env->allow_ptr_leaks) {
+   if (!env->allow_ptr_leaks)
verbose("R%d pointer arithmetic on 
PTR_TO_MAP_VALUE_OR_NULL prohibited, null-check it first\n",
dst);
-   return -EACCES;
-   }
-   __mark_reg_unknown(dst_reg);
-   return 0;
+   return -EACCES;
}
if (ptr_reg->type == CONST_PTR_TO_MAP) {
-   if (!env->allow_ptr_leaks) {
+   if (!env->allow_ptr_leaks)
verbose("R%d pointer arithmetic on CONST_PTR_TO_MAP prohibited\n",
dst);
-   return -EACCES;
-   }
-   __mark_reg_unknown(dst_reg);
-   return 0;
+   return -EACCES;
}
if (ptr_reg->type == PTR_TO_PACKET_END) {
-   if (!env->allow_ptr_leaks) {
+   if (!env->allow_ptr_leaks)
verbose("R%d pointer arithmetic on PTR_TO_PACKET_END prohibited\n",
dst);
-   return -EACCES;
-   }
-   __mark_reg_unknown(dst_reg);
-   return 0;
+   return -EACCES;
}
 
/* In case of 'scalar += pointer', dst_reg inherits pointer type and id.
@@ -1648,8 +1636,9 @@ static int adjust_ptr_min_max_vals(struct bpf_verifier_env *env,
break;
}
if (max_val == BPF_REGISTER_MAX_RANGE) {
-   verbose("R%d tried to add unbounded value to pointer\n",
-   dst);
+   if (!env->allow_ptr_leaks)
+   verbose("R%d tried to add unbounded value to pointer\n",
+   dst);
return -EACCES;
}
/* A new variable offset is created.  Note that off_reg->off
@@ -1676,28 +1665,20 @@ static int adjust_ptr_min_max_vals(struct bpf_verifier_env *env,
case BPF_SUB:
if (dst_reg == off_reg) {
/* scalar -= pointer.  Creates an unknown scalar */
-   if (!env->allow_ptr_leaks) {
+   if (!env->allow_ptr_leaks)
verbose("R%d tried to subtract pointer from scalar\n",
dst);
-   return -EACCES;
-   }
-   /* Make it an unknown scalar */
-   __mark_reg_unknown(dst_reg);
-   break;
+   return -EACCES;
}
/* We don't allow subtraction from FP, because (according to
 * test_verifier.c test "invalid fp arithmetic", JITs might not
 * be able to deal with it.
 */
if (ptr_reg->type == PTR_TO_STACK) {
-   if (!env->allow_ptr_leaks) {
+   if (!env->allow_ptr_leaks)

Re: [PATCH 1/2] ip_tunnel: fix potential issue in ip_tunnel_rcv

2017-06-07 Thread Eric Dumazet
On Wed, 2017-06-07 at 22:16 +0800, Haishuang Yan wrote:
> When ip_tunnel_rcv fails, the tun_dst won't be freed, so move
> skb_dst_set to begin and tun_dst would be freed by kfree_skb.
> 
> Signed-off-by: Haishuang Yan 
> ---

Please add the missing Fixes: tag and CC author of the patch that added
this bug, so that he has a chance to comment and avoid future similar
bugs.

Thanks.




[PATCH net-next v2] net: ipmr: add getlink support

2017-06-07 Thread Nikolay Aleksandrov
Currently there's no way to dump the VIF table for an ipmr table other
than the default (via proc). This is a major issue when debugging ipmr
issues and in general it is good to know which interfaces are
configured. This patch adds support for RTM_GETLINK for the ipmr family
so we can dump the VIF table and the ipmr table's current config for
each table. We're protected by rtnl so no need to acquire RCU or
mrt_lock.

Signed-off-by: Nikolay Aleksandrov 
---
v2: use netlink attributes for all mrtable and vif fields, and set message
type to RTM_NEWLINK

The plan is to add full netlink control to ipmr via new/set/dellink later.
Also this would allow us to dump any number of VIFs in the future when we
remove the VIF device limit.
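
As an illustration of the nested layout used in v2 (table attributes,
with IPMRA_TABLE_VIFS containing one IPMRA_VIF nest per VIF), here is a
small userspace sketch that builds such a buffer by hand. It is only a
model: struct ex_attr stands in for struct nlattr, the ex_* helpers are
hypothetical, and the type numbers simply follow the enum order in the
patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for struct nlattr: len covers header plus payload. */
struct ex_attr {
	uint16_t len;
	uint16_t type;
};

/* Attributes are padded to 4-byte boundaries, as in netlink. */
#define EX_ALIGN(n)	(((n) + 3u) & ~(size_t)3u)

/* Type numbers mirroring the enum order in the patch (assumption). */
enum { EX_TABLE_ID = 1, EX_TABLE_VIFS = 6 };
enum { EX_VIF = 1 };
enum { EX_VIFA_IFINDEX = 1, EX_VIFA_VIF_ID = 2 };

/* Append one u32 attribute at offset 'off'; returns the new offset. */
static size_t ex_put_u32(unsigned char *buf, size_t off, uint16_t type,
			 uint32_t v)
{
	struct ex_attr h = { (uint16_t)(sizeof(h) + sizeof(v)), type };

	memcpy(buf + off, &h, sizeof(h));
	memcpy(buf + off + sizeof(h), &v, sizeof(v));
	return off + EX_ALIGN(h.len);
}

/* Open a nest: write a header whose length is patched by ex_nest_end(). */
static size_t ex_nest_start(unsigned char *buf, size_t off, uint16_t type)
{
	struct ex_attr h = { 0, type };

	memcpy(buf + off, &h, sizeof(h));
	return off + sizeof(h);
}

/* Close a nest whose header sits at offset 'start'; 'end' is the
 * current write offset.
 */
static void ex_nest_end(unsigned char *buf, size_t start, size_t end)
{
	uint16_t len = (uint16_t)(end - start);

	memcpy(buf + start + offsetof(struct ex_attr, len), &len, sizeof(len));
}

/* Build: TABLE_ID, then TABLE_VIFS { VIF { IFINDEX, VIF_ID } }.
 * The caller must supply at least 32 bytes; returns the bytes used.
 */
static size_t ex_build_table(unsigned char *buf, uint32_t table_id,
			     uint32_t ifindex, uint32_t vif_id)
{
	size_t off = 0, vifs, vif;

	off = ex_put_u32(buf, off, EX_TABLE_ID, table_id);
	vifs = off;
	off = ex_nest_start(buf, off, EX_TABLE_VIFS);
	vif = off;
	off = ex_nest_start(buf, off, EX_VIF);
	off = ex_put_u32(buf, off, EX_VIFA_IFINDEX, ifindex);
	off = ex_put_u32(buf, off, EX_VIFA_VIF_ID, vif_id);
	ex_nest_end(buf, vif, off);
	ex_nest_end(buf, vifs, off);
	return off;
}
```

A reader of such a message walks the outer attributes, and on finding the
VIFS type descends into its payload and repeats the same walk for each
VIF nest.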

 include/uapi/linux/mroute.h |  42 +++
 net/ipv4/ipmr.c | 126 
 2 files changed, 168 insertions(+)

diff --git a/include/uapi/linux/mroute.h b/include/uapi/linux/mroute.h
index 1fe4c1e7d66e..f904367c0cee 100644
--- a/include/uapi/linux/mroute.h
+++ b/include/uapi/linux/mroute.h
@@ -110,6 +110,48 @@ struct igmpmsg {
struct in_addr im_src,im_dst;
 };
 
+/* ipmr netlink table attributes */
+enum {
+   IPMRA_TABLE_UNSPEC,
+   IPMRA_TABLE_ID,
+   IPMRA_TABLE_CACHE_RES_QUEUE_LEN,
+   IPMRA_TABLE_MROUTE_REG_VIF_NUM,
+   IPMRA_TABLE_MROUTE_DO_ASSERT,
+   IPMRA_TABLE_MROUTE_DO_PIM,
+   IPMRA_TABLE_VIFS,
+   __IPMRA_TABLE_MAX
+};
+#define IPMRA_TABLE_MAX (__IPMRA_TABLE_MAX - 1)
+
+/* ipmr netlink vif attribute format
+ * [ IPMRA_TABLE_VIFS ] - nested attribute
+ *   [ IPMRA_VIF ] - nested attribute
+ * [ IPMRA_VIFA_xxx ]
+ */
+enum {
+   IPMRA_VIF_UNSPEC,
+   IPMRA_VIF,
+   __IPMRA_VIF_MAX
+};
+#define IPMRA_VIF_MAX (__IPMRA_VIF_MAX - 1)
+
+/* vif-specific attributes */
+enum {
+   IPMRA_VIFA_UNSPEC,
+   IPMRA_VIFA_IFINDEX,
+   IPMRA_VIFA_VIF_ID,
+   IPMRA_VIFA_FLAGS,
+   IPMRA_VIFA_BYTES_IN,
+   IPMRA_VIFA_BYTES_OUT,
+   IPMRA_VIFA_PACKETS_IN,
+   IPMRA_VIFA_PACKETS_OUT,
+   IPMRA_VIFA_LOCAL_ADDR,
+   IPMRA_VIFA_REMOTE_ADDR,
+   IPMRA_VIFA_PAD,
+   __IPMRA_VIFA_MAX
+};
+#define IPMRA_VIFA_MAX (__IPMRA_VIFA_MAX - 1)
+
 /* That's all usermode folks */
 
 #define MFC_ASSERT_THRESH (3*HZ)   /* Maximal freq. of asserts */
diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index 551de4d023a8..9374b99c7c17 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -2528,6 +2528,129 @@ static int ipmr_rtm_route(struct sk_buff *skb, struct nlmsghdr *nlh,
return ipmr_mfc_delete(tbl, &mfcc, parent);
 }
 
+static bool ipmr_fill_table(struct mr_table *mrt, struct sk_buff *skb)
+{
+   u32 queue_len = atomic_read(&mrt->cache_resolve_queue_len);
+
+   if (nla_put_u32(skb, IPMRA_TABLE_ID, mrt->id) ||
+   nla_put_u32(skb, IPMRA_TABLE_CACHE_RES_QUEUE_LEN, queue_len) ||
+   nla_put_s32(skb, IPMRA_TABLE_MROUTE_REG_VIF_NUM,
+   mrt->mroute_reg_vif_num) ||
+   nla_put_u8(skb, IPMRA_TABLE_MROUTE_DO_ASSERT,
+  mrt->mroute_do_assert) ||
+   nla_put_u8(skb, IPMRA_TABLE_MROUTE_DO_PIM, mrt->mroute_do_pim))
+   return false;
+
+   return true;
+}
+
+static bool ipmr_fill_vif(struct mr_table *mrt, u32 vifid, struct sk_buff *skb)
+{
+   struct nlattr *vif_nest;
+   struct vif_device *vif;
+
+   /* if the VIF doesn't exist just continue */
+   if (!VIF_EXISTS(mrt, vifid))
+   return true;
+
+   vif = &mrt->vif_table[vifid];
+   vif_nest = nla_nest_start(skb, IPMRA_VIF);
+   if (!vif_nest)
+   return false;
+   if (nla_put_u32(skb, IPMRA_VIFA_IFINDEX, vif->dev->ifindex) ||
+   nla_put_u32(skb, IPMRA_VIFA_VIF_ID, vifid) ||
+   nla_put_u16(skb, IPMRA_VIFA_FLAGS, vif->flags) ||
+   nla_put_u64_64bit(skb, IPMRA_VIFA_BYTES_IN, vif->bytes_in,
+ IPMRA_VIFA_PAD) ||
+   nla_put_u64_64bit(skb, IPMRA_VIFA_BYTES_OUT, vif->bytes_out,
+ IPMRA_VIFA_PAD) ||
+   nla_put_u64_64bit(skb, IPMRA_VIFA_PACKETS_IN, vif->pkt_in,
+ IPMRA_VIFA_PAD) ||
+   nla_put_u64_64bit(skb, IPMRA_VIFA_PACKETS_OUT, vif->pkt_out,
+ IPMRA_VIFA_PAD) ||
+   nla_put_be32(skb, IPMRA_VIFA_LOCAL_ADDR, vif->local) ||
+   nla_put_be32(skb, IPMRA_VIFA_REMOTE_ADDR, vif->remote)) {
+   nla_nest_cancel(skb, vif_nest);
+   return false;
+   }
+   nla_nest_end(skb, vif_nest);
+
+   return true;
+}
+
+static int ipmr_rtm_dumplink(struct sk_buff *skb, struct netlink_callback *cb)
+{
+   struct net *net = sock_net(skb->sk);
+   struct nlmsghdr *nlh = NULL;
+   unsigned int t = 0, s_t;
+   unsigned int e = 0, s_e;
+   struct mr_table *mrt;
+
+   s_t = cb->args[0];
+   s_e = cb->args[1];
+
+   ipmr_for_each_

Re: [PATCH 7/7] mlx5: Do not build eswitch_offloads if CONFIG_MLX5_EN_ESWITCH_OFFLOADS is set

2017-06-07 Thread Jes Sorensen

On 06/07/2017 12:06 AM, Saeed Mahameed wrote:

On Wed, Jun 7, 2017 at 12:46 AM, Jes Sorensen  wrote:

Hey Jes,

It is not just about squashing patches, I am working on a series of
patches to allow compiling out eswitch/eswitch_offloads/en_rep.c/en_tc
altogether, it will come out cleaner as it will remove all ethernet
sriov/eswitch VF representors and eswitch tc offloads stuff with one
kconfig flag, and yet preserve standard QoS functionality from en_tc.



Saeed,

I realize it is not just about squashing patches, however doing that to
someone else's patches is just broken. The Linux kernel way is to build on
top of patches, if they are valid, rather than throwing them all away and
doing it from scratch again bottom up. If there was something actually wrong
with my patches, and I would love to understand if that is the case, since I
don't know 1/100th of the hardware details that you know, then please share
those details.


Hey Jes,

Sorry for the inconvenience, I have been working on very similar patches
since before you posted yours.
Your patches are fine, but as I said before, removing eswitch as is
will introduce a small regression in Multi-PF configuration.

The issue is that lately we are having tons of discussions exactly
about this and about how to do the driver breakdown in a way
that makes everyone happy, so things are moving relatively slowly, but
my work on eswitch is converging.


Gotcha. I deliberately didn't disable eswitch itself in my patch, but
only the offloading functionality, because of the old discussion
mentioning that the eswitch needs to be initialized.



Sounds good.


I will post some patches for you to review by end of week.



Could we please start seeing this stuff happen in a public git tree so it is
possible to follow and contribute to the development? It is very frustrating
having to wait for things to appear and not knowing whether a patch is
integrated or needs to be revised when you have things building on top of
it.


Sure, I will post some patches later today.
I believe they will be fully ready for submission by end of week.
Again sorry about this.


Awesome!

Jes




Re: [PATCH net-next] net: dsa: mv88e6xxx: Have 6161/6123 use EDSA tags

2017-06-07 Thread Vivien Didelot
Andrew Lunn  writes:

> The mv88e6161 and mv88e6123 are capable of using EDSA tags when
> passing frames from the host to the switch and back.
>
> Signed-off-by: Andrew Lunn 

Reviewed-by: Vivien Didelot 


Re: [PATCH 7/9] net: mvmdio: add xmdio support

2017-06-07 Thread Florian Fainelli
On 06/07/2017 01:38 AM, Antoine Tenart wrote:
> This patch adds the xMDIO interface support in the mvmdio driver. This
> interface is used in Ethernet controllers on Marvell 370, 7k and 8k (as
> of now). The xSMI interface supported by this driver complies with the
> IEEE 802.3 clause 45 (while the SMI interface complies with the clause
> 22). The xSMI interface is used by 10GbE devices.
> 
> Signed-off-by: Antoine Tenart 
> ---

> + if (of_device_is_compatible(np, "marvell,orion-mdio")) {
> + ops->is_done = smi_is_done;
> + ops->is_read_valid = smi_is_read_valid;
> + ops->start_read = smi_start_read_op;
> + ops->read = smi_read_op;
> + ops->write = smi_write_op;
> +
> + dev->poll_interval_min = MVMDIO_SMI_POLL_INTERVAL_MIN;
> + dev->poll_interval_max = MVMDIO_SMI_POLL_INTERVAL_MAX;
> + } else if (of_device_is_compatible(np, "marvell,xmdio")) {
> + ops->is_done = xsmi_is_done;
> + ops->is_read_valid = xsmi_is_read_valid;
> + ops->start_read = xsmi_start_read_op;
> + ops->read = xsmi_read_op;
> + ops->write = xsmi_write_op;
> +
> + dev->poll_interval_min = MVMDIO_XSMI_POLL_INTERVAL_MIN;
> + dev->poll_interval_max = MVMDIO_XSMI_POLL_INTERVAL_MAX;
> + } else {
> + return -EINVAL;
> + }

Instead of doing this, you could declare the ops structures as static
global variables in the driver and reference them from the
of_device_id .data field, something like:

static struct orion_mdio_ops mdio_ops = {
	...
};

static struct orion_mdio_data mdio_data = {
	.ops = &mdio_ops,
	.poll_interval_min = ...,
	.poll_interval_max = ...,
};

static struct orion_mdio_ops xmdio_ops = {
	...
};

static struct orion_mdio_data xmdio_data = {
};

and then reference those using of_id->data in the probe function.
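
To make the shape of that suggestion concrete, here is a self-contained
userspace mock of the pattern. It does not use the kernel's of_device_id
API: struct ex_match and ex_lookup() are hypothetical stand-ins for the
match table and the lookup the OF core does, the single is_done callback
stands in for the full ops set, and the poll interval numbers are
arbitrary placeholders.

```c
#include <stddef.h>
#include <string.h>

/* Reduced ops structure; the real one has more callbacks. */
struct ex_mdio_ops {
	int (*is_done)(void);
};

/* Per-variant data: ops plus the polling intervals. */
struct ex_mdio_data {
	const struct ex_mdio_ops *ops;
	unsigned int poll_interval_min;
	unsigned int poll_interval_max;
};

/* Stand-in for a match table entry with a .data pointer. */
struct ex_match {
	const char *compatible;
	const void *data;
};

static int ex_smi_is_done(void) { return 1; }
static int ex_xsmi_is_done(void) { return 1; }

/* const, so the function pointers live in read-only data. */
static const struct ex_mdio_ops ex_smi_ops = { .is_done = ex_smi_is_done };
static const struct ex_mdio_ops ex_xsmi_ops = { .is_done = ex_xsmi_is_done };

/* Interval values below are placeholders, not the driver's constants. */
static const struct ex_mdio_data ex_smi_data = { &ex_smi_ops, 45, 55 };
static const struct ex_mdio_data ex_xsmi_data = { &ex_xsmi_ops, 150, 160 };

static const struct ex_match ex_match_table[] = {
	{ "marvell,orion-mdio", &ex_smi_data },
	{ "marvell,xmdio", &ex_xsmi_data },
	{ NULL, NULL },
};

/* Return the per-variant data for a compatible string, or NULL. */
static const struct ex_mdio_data *ex_lookup(const char *compatible)
{
	const struct ex_match *m;

	for (m = ex_match_table; m->compatible; m++)
		if (strcmp(m->compatible, compatible) == 0)
			return m->data;
	return NULL;
}
```

The probe function then needs no if/else chain: it looks up the entry
once and copies the ops pointer and intervals from it, and an unknown
compatible string shows up as a NULL result.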

> +
> + dev->ops = ops;
> + return 0;
> +}
> +
>  static int orion_mdio_probe(struct platform_device *pdev)
>  {
>   struct resource *r;
>   struct mii_bus *bus;
>   struct orion_mdio_dev *dev;
> - struct orion_mdio_ops *ops;
>   int i, ret;
>  
>   r = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> @@ -278,18 +367,9 @@ static int orion_mdio_probe(struct platform_device *pdev)
>  
>   mutex_init(&dev->lock);
>  
> - ops = devm_kzalloc(&pdev->dev, sizeof(*ops), GFP_KERNEL);
> - if (!ops)
> - return -ENOMEM;
> -
> - dev->poll_interval_min = MVMDIO_SMI_POLL_INTERVAL_MIN;
> - dev->poll_interval_max = MVMDIO_SMI_POLL_INTERVAL_MAX;
> - ops->is_done = orion_mdio_smi_is_done;
> - ops->is_read_valid = orion_mdio_smi_is_read_valid;
> - ops->start_read = orion_mdio_start_read_op;
> - ops->read = orion_mdio_read_op;
> - ops->write = orion_mdio_write_op;
> - dev->ops = ops;
> + ret = orion_mdio_populate_ops(pdev, dev);
> + if (ret)
> + return ret;
>  
>   if (pdev->dev.of_node)
>   ret = of_mdiobus_register(bus, pdev->dev.of_node);
> @@ -340,6 +420,7 @@ static int orion_mdio_remove(struct platform_device *pdev)
>  
>  static const struct of_device_id orion_mdio_match[] = {
>   { .compatible = "marvell,orion-mdio" },

and do .data = &mdio_data

> + { .compatible = "marvell,xmdio" },

and .data = &xmdio_data

>   { }
>  };
>  MODULE_DEVICE_TABLE(of, orion_mdio_match);
> 

-- 
Florian


Re: [for-next 4/6] net/mlx5: FPGA, Add basic support for Innova

2017-06-07 Thread Jason Gunthorpe
On Wed, Jun 07, 2017 at 07:16:42AM +0300, Saeed Mahameed wrote:
> On Tue, Jun 6, 2017 at 7:17 PM, Jason Gunthorpe
>  wrote:
> > On Tue, Jun 06, 2017 at 06:52:15AM +, Ilan Tayari wrote:
> >
> >> So neither the host stack nor the network are aware of them.
> >> They exist momentarily only on the internal traces on the board and not
> >> anywhere else.
> >
> > Is that really true? If you are creating rocee QPs' then the RDMA
> > stack sees this stuff and now we have buried a RDMA ULP inside an
> > ethernet driver which seems really wonky..
> 
> It is not an ethernet driver, mlx5_core provides both RDMA and
> ethernet interfaces to both mlx5_ib and the mlx5e netdevice.
> 
> so it is perfectly capable of creating QPs on its own, after all it is
> the one creating QPs for the RDMA stack :).
> 
> rdma_create_qp->mlx5_ib_create_qp->mlx5_core_create_qp.

Wait, so you built an RDMA ULP inside your driver without using the
RDMA API?

This keeps getting uglier :(

What about security? What if user space sends some raw packets to the
FPGA - can it reprogram the ISPEC settings or worse?

Jason


Re: [PATCH 7/9] net: mvmdio: add xmdio support

2017-06-07 Thread Russell King - ARM Linux
On Wed, Jun 07, 2017 at 08:48:06AM -0700, Florian Fainelli wrote:
> On 06/07/2017 01:38 AM, Antoine Tenart wrote:
> > This patch adds the xMDIO interface support in the mvmdio driver. This
> > interface is used in Ethernet controllers on Marvell 370, 7k and 8k (as
> > of now). The xSMI interface supported by this driver complies with the
> > IEEE 802.3 clause 45 (while the SMI interface complies with the clause
> > 22). The xSMI interface is used by 10GbE devices.
> > 
> > Signed-off-by: Antoine Tenart 
> > ---
> 
> > +   if (of_device_is_compatible(np, "marvell,orion-mdio")) {
> > +   ops->is_done = smi_is_done;
> > +   ops->is_read_valid = smi_is_read_valid;
> > +   ops->start_read = smi_start_read_op;
> > +   ops->read = smi_read_op;
> > +   ops->write = smi_write_op;
> > +
> > +   dev->poll_interval_min = MVMDIO_SMI_POLL_INTERVAL_MIN;
> > +   dev->poll_interval_max = MVMDIO_SMI_POLL_INTERVAL_MAX;
> > +   } else if (of_device_is_compatible(np, "marvell,xmdio")) {
> > +   ops->is_done = xsmi_is_done;
> > +   ops->is_read_valid = xsmi_is_read_valid;
> > +   ops->start_read = xsmi_start_read_op;
> > +   ops->read = xsmi_read_op;
> > +   ops->write = xsmi_write_op;
> > +
> > +   dev->poll_interval_min = MVMDIO_XSMI_POLL_INTERVAL_MIN;
> > +   dev->poll_interval_max = MVMDIO_XSMI_POLL_INTERVAL_MAX;
> > +   } else {
> > +   return -EINVAL;
> > +   }
> 
> Instead of doing this, you could declare the ops structures as static
> global variables in the driver and reference them from the
> of_device_id .data field, something like:
> 
> static struct orion_mdio_ops mdio_ops = {
>   ...
> };

In this case, don't forget the "const" for static structures containing
only function pointers (so that the function pointers can't be exploited.)

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.


Re: [PATCH net-next v2 2/2] bpf: Remove the capability check for cgroup skb eBPF program

2017-06-07 Thread Daniel Borkmann

On 06/07/2017 12:44 AM, Chenbo Feng wrote:

On 06/06/2017 09:56 AM, Daniel Borkmann wrote:

On 06/02/2017 01:42 AM, Alexei Starovoitov wrote:

On Wed, May 31, 2017 at 06:16:00PM -0700, Chenbo Feng wrote:

From: Chenbo Feng 

Currently loading a cgroup skb eBPF program requires the CAP_SYS_ADMIN
capability, while attaching the program to a cgroup only requires the
user to have the CAP_NET_ADMIN privilege. We can skip the capability
check when loading the program, just as for socket filter programs, to
make the capability requirement consistent.

Change since v1:
Change the code style in order to be compliant with checkpatch.pl
preference

Signed-off-by: Chenbo Feng 


as far as I can see they're indeed the same as socket filters, so
Acked-by: Alexei Starovoitov 

but I don't quite understand how it helps, since as you said
attaching such unpriv fd to cgroup still requires root.
Do you have more patches to follow?


Hmm, when we relax this from capable(CAP_SYS_ADMIN) to unprivileged,
then we must at least also zero out the not-yet-initialized memory
for the mac header for egress case in __cgroup_bpf_run_filter_skb().


Do you mean something like:

if (type == BPF_CGROUP_INET_EGRESS) {

 offset = skb_network_header(skb) - skb_mac_header(skb);

 memset(skb_mac_header(skb), 0, offset);


That won't work, reason is the same as per 92046578ac88 ("bpf: cgroup
skb progs cannot access ld_abs/ind"), meaning that mac header is not
yet set at that point in time, but more below.


}

And could you explain more on why we need to do this if we remove the
CAP_SYS_ADMIN check? I thought we still cannot directly access the
sk_buff without using bpf_skb_load_bytes helper and we still need a
CAP_NET_ADMIN in order to attach and run the program on egress side right?

Ok, forget this comment of mine. The __cgroup_bpf_run_filter_skb() does
__skb_push()/__skb_pull(), but for egress case the offset is always 0, so
we don't start at mac header but at network header instead, meaning memset()
is not needed.

Thanks,
Daniel


Re: [PATCH 7/9] net: mvmdio: add xmdio support

2017-06-07 Thread Antoine Tenart
Hi Florian,

On Wed, Jun 07, 2017 at 08:48:06AM -0700, Florian Fainelli wrote:
> On 06/07/2017 01:38 AM, Antoine Tenart wrote:
> 
> > +   if (of_device_is_compatible(np, "marvell,orion-mdio")) {
> > +   ops->is_done = smi_is_done;
> > +   ops->is_read_valid = smi_is_read_valid;
> > +   ops->start_read = smi_start_read_op;
> > +   ops->read = smi_read_op;
> > +   ops->write = smi_write_op;
> > +
> > +   dev->poll_interval_min = MVMDIO_SMI_POLL_INTERVAL_MIN;
> > +   dev->poll_interval_max = MVMDIO_SMI_POLL_INTERVAL_MAX;
> > +   } else if (of_device_is_compatible(np, "marvell,xmdio")) {
> > +   ops->is_done = xsmi_is_done;
> > +   ops->is_read_valid = xsmi_is_read_valid;
> > +   ops->start_read = xsmi_start_read_op;
> > +   ops->read = xsmi_read_op;
> > +   ops->write = xsmi_write_op;
> > +
> > +   dev->poll_interval_min = MVMDIO_XSMI_POLL_INTERVAL_MIN;
> > +   dev->poll_interval_max = MVMDIO_XSMI_POLL_INTERVAL_MAX;
> > +   } else {
> > +   return -EINVAL;
> > +   }
> 
> Instead of doing this, you could declare the ops structures as static
> global variables in the driver and reference them from the
> of_device_id .data field, something like:

Good idea, I'll update the series using static global variables for ops
and poll intervals and reference them in the .data field.

Thanks!
Antoine

-- 
Antoine Ténart, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com




Netlink messages without NLM_F_REQUEST flag

2017-06-07 Thread Leon Romanovsky
Hi Kaike,

In commit bc10ed7d3d19 ("IB/core: Add rdma netlink helper functions"),
part of a larger series [1], you introduced ibnl_rcv_reply_skb(), which is
very similar to netlink_rcv_skb() with one major change.

Netlink messages without the NLM_F_REQUEST flag are handled by
ibnl_rcv_reply_skb(), while netlink_rcv_skb() ignores them. The comment
introduced in commit d35b685640ae ("[NETLINK]: Ignore !NLM_F_REQUEST
messages directly in netlink_run_queue()") says that "Only requests are
handled by the kernel".

It makes me wonder whether ibnl_rcv_reply_skb() is expected to handle
!NLM_F_REQUEST messages, and whether we really need that. What are the
scenarios? In my use case, which is surely different from yours, I always
set NLM_F_REQUEST when communicating with the kernel.
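
To spell out the difference being asked about, here is a deliberately
reduced model of the two receive paths. It is not the kernel code:
struct ex_nlmsg and ex_count_handled() are hypothetical, and only the
flag test is modelled (the real functions also validate lengths and
message types).

```c
#include <stddef.h>
#include <stdint.h>

/* Same value as NLM_F_REQUEST in <linux/netlink.h>. */
#define EX_NLM_F_REQUEST	0x01

/* Reduced message header: only the fields the sketch needs. */
struct ex_nlmsg {
	uint16_t type;
	uint16_t flags;
};

/* Count how many messages would be passed to the handler.
 * requests_only != 0 models the netlink_rcv_skb() behaviour described
 * above (messages without NLM_F_REQUEST are skipped); requests_only == 0
 * models a receive path that dispatches every message.
 */
static size_t ex_count_handled(const struct ex_nlmsg *msgs, size_t n,
			       int requests_only)
{
	size_t i, handled = 0;

	for (i = 0; i < n; i++) {
		if (requests_only && !(msgs[i].flags & EX_NLM_F_REQUEST))
			continue;	/* not a request: ignore it */
		handled++;
	}
	return handled;
}
```

For a batch of three messages where only the first and last carry
NLM_F_REQUEST, the requests-only path handles two and the other path
handles all three.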

Thanks

[1] http://www.spinics.net/lists/linux-rdma/msg28153.html




Re: [PATCH net-next] net: phy: use of_mdio_parse_addr

2017-06-07 Thread Liviu Dudau
On Fri, Jun 02, 2017 at 02:22:51PM -0400, David Miller wrote:
> From: Jon Mason 
> Date: Wed, 31 May 2017 15:43:30 -0400
> 
> > use of_mdio_parse_addr() in place of an OF read of reg and a bounds
> > check (which is literally the exact same thing that
> > of_mdio_parse_addr() does)
> > 
> > Signed-off-by: Jon Mason 
> 
> Applied, thanks Jon.

This makes linux-next fail the modules_install target as depmod detects 2
circular dependencies. Reverting this patch fixes the issue.

depmod: ERROR: Cycle detected: libphy -> of_mdio -> fixed_phy -> libphy
depmod: ERROR: Cycle detected: libphy -> of_mdio -> libphy
depmod: ERROR: Found 3 modules in dependency cycles!
make[1]: *** [/home/dliviu/devel/kernel/Makefile:1245: _modinst_post] Error 1

This is on an ARCH=arm build, but I doubt it makes a difference. Let me know
if you need some .config values in order to reproduce.

Best regards,
Liviu



[PATCH net-next 3/3] rxrpc: Provide a cmsg to specify the amount of Tx data for a call

2017-06-07 Thread David Howells
Provide a control message that can be specified on the first sendmsg() of a
client call or the first sendmsg() of a service response to indicate the
total length of the data to be transmitted for that call.

Currently, because the length of the payload of an encrypted DATA packet is
encrypted in front of the data, the packet cannot be encrypted until we
know how much data it will hold.

By specifying the length at the beginning of the transmit phase, each DATA
packet length can be set before we start loading data from userspace (where
several sendmsg() calls may contribute to a particular packet).

An error will be returned if too little or too much data is presented in
the Tx phase.
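
For illustration, a userspace caller could attach the new control message
to the first sendmsg() roughly as sketched below. This is an assumption
laden sketch, not part of the patch: the EX_* constants are local
stand-ins (SOL_RXRPC is assumed to be 272, RXRPC_USER_CALL_ID is 1 as in
linux/rxrpc.h, and the number used for RXRPC_TX_LENGTH is a guess), and
ex_build_tx_length_cmsgs() is a hypothetical helper that only fills the
control buffer.

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Local stand-ins; the RXRPC_TX_LENGTH number is an assumption. */
#define EX_SOL_RXRPC		272
#define EX_RXRPC_USER_CALL_ID	1
#define EX_RXRPC_TX_LENGTH	12

/* Fill 'control' with a user call ID cmsg followed by a Tx length cmsg.
 * Returns the number of control bytes used, or 0 if the buffer is too
 * small or tx_len is negative (the text above says it may not be).
 */
static size_t ex_build_tx_length_cmsgs(void *control, size_t control_size,
				       unsigned long call_id, int64_t tx_len)
{
	struct msghdr msg;
	struct cmsghdr *cmsg;
	size_t need = CMSG_SPACE(sizeof(call_id)) + CMSG_SPACE(sizeof(tx_len));

	if (control_size < need || tx_len < 0)
		return 0;

	/* Zero the buffer so CMSG_NXTHDR() sees sane lengths. */
	memset(&msg, 0, sizeof(msg));
	memset(control, 0, need);
	msg.msg_control = control;
	msg.msg_controllen = need;

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = EX_SOL_RXRPC;
	cmsg->cmsg_type = EX_RXRPC_USER_CALL_ID;
	cmsg->cmsg_len = CMSG_LEN(sizeof(call_id));
	memcpy(CMSG_DATA(cmsg), &call_id, sizeof(call_id));

	cmsg = CMSG_NXTHDR(&msg, cmsg);
	if (!cmsg)
		return 0;
	cmsg->cmsg_level = EX_SOL_RXRPC;
	cmsg->cmsg_type = EX_RXRPC_TX_LENGTH;
	cmsg->cmsg_len = CMSG_LEN(sizeof(tx_len));
	memcpy(CMSG_DATA(cmsg), &tx_len, sizeof(tx_len));

	return need;
}
```

The caller would point msg_control at the filled buffer, set
msg_controllen to the returned size, and pass the message to the first
sendmsg() of the call; later sendmsg() calls for the same call would omit
the Tx length cmsg.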

Signed-off-by: David Howells 
---

 Documentation/networking/rxrpc.txt |   34 +++
 fs/afs/rxrpc.c |   18 +++-
 include/linux/rxrpc.h  |1 +
 include/net/af_rxrpc.h |2 +
 net/rxrpc/af_rxrpc.c   |5 +++
 net/rxrpc/ar-internal.h|3 +-
 net/rxrpc/call_object.c|3 ++
 net/rxrpc/sendmsg.c|   54 +++-
 8 files changed, 115 insertions(+), 5 deletions(-)

diff --git a/Documentation/networking/rxrpc.txt b/Documentation/networking/rxrpc.txt
index bce8e10a2a8e..8c70ba5dee4d 100644
--- a/Documentation/networking/rxrpc.txt
+++ b/Documentation/networking/rxrpc.txt
@@ -327,6 +327,7 @@ calls, to invoke certain actions and to report certain conditions.  These are:
    RXRPC_ACCEPT            s-- n/a         Accept new call
    RXRPC_EXCLUSIVE_CALL    s-- n/a         Make an exclusive client call
    RXRPC_UPGRADE_SERVICE   s-- n/a         Client call can be upgraded
+   RXRPC_TX_LENGTH         s-- data len    Total length of Tx data
 
(SRT = usable in Sendmsg / delivered by Recvmsg / Terminal message)
 
@@ -406,6 +407,19 @@ calls, to invoke certain actions and to report certain conditions.  These are:
  future communication to that server and RXRPC_UPGRADE_SERVICE should no
  longer be set.
 
+ (*) RXRPC_TX_LENGTH
+
+ This is used to inform the kernel of the total amount of data that is
+ going to be transmitted by a call (whether in a client request or a
+ service response).  If given, it allows the kernel to encrypt from the
+ userspace buffer directly to the packet buffers, rather than copying into
+ the buffer and then encrypting in place.  This may only be given with the
+ first sendmsg() providing data for a call.  EMSGSIZE will be generated if
+ the amount of data actually given is different.
+
+ This takes a parameter of __s64 type that indicates how much will be
+ transmitted.  This may not be less than zero.
+
 The symbol RXRPC__SUPPORTED is defined as one more than the highest control
 message type supported.  At run time this can be queried by means of the
 RXRPC_SUPPORTED_CMSG socket option (see below).
@@ -577,6 +591,9 @@ A client would issue an operation by:
  MSG_MORE should be set in msghdr::msg_flags on all but the last part of
  the request.  Multiple requests may be made simultaneously.
 
+ An RXRPC_TX_LENGTH control message can also be specified on the first
+ sendmsg() call.
+
  If a call is intended to go to a destination other than the default
  specified through connect(), then msghdr::msg_name should be set on the
  first request message of that call.
@@ -764,6 +781,7 @@ The kernel interface functions are as follows:
struct sockaddr_rxrpc *srx,
struct key *key,
unsigned long user_call_ID,
+   s64 tx_total_len,
gfp_t gfp);
 
  This allocates the infrastructure to make a new RxRPC call and assigns
@@ -780,6 +798,11 @@ The kernel interface functions are as follows:
  control data buffer.  It is entirely feasible to use this to point to a
  kernel data structure.
 
+ tx_total_len is the amount of data the caller is intending to transmit
+ with this call (or -1 if unknown at this point).  Setting the data size
+ allows the kernel to encrypt directly to the packet buffers, thereby
+ saving a copy.  The value may not be less than -1.
+
  If this function is successful, an opaque reference to the RxRPC call is
  returned.  The caller now holds a reference on this and it must be
  properly ended.
@@ -931,6 +954,17 @@ The kernel interface functions are as follows:
 
  This is used to find the remote peer address of a call.
 
+ (*) Set the total transmit data size on a call.
+
+   void rxrpc_kernel_set_tx_length(struct socket *sock,
+   struct rxrpc_call *call,
+   s64 tx_total_len);
+
+ This sets the amount of data that the caller is intending to transmit on a
+ call.  It's intended t

[PATCH net-next 2/3] rxrpc: Consolidate sendmsg parameters

2017-06-07 Thread David Howells
Consolidate the sendmsg control message parameters into a struct rather
than passing them individually through the argument list of
rxrpc_sendmsg_cmsg().  This makes it easier to add more parameters.

Signed-off-by: David Howells 
---

 net/rxrpc/sendmsg.c |   83 +--
 1 file changed, 41 insertions(+), 42 deletions(-)

diff --git a/net/rxrpc/sendmsg.c b/net/rxrpc/sendmsg.c
index 5a4801e7f560..d939a5b1abc3 100644
--- a/net/rxrpc/sendmsg.c
+++ b/net/rxrpc/sendmsg.c
@@ -28,6 +28,14 @@ enum rxrpc_command {
RXRPC_CMD_REJECT_BUSY,  /* [server] reject a call as busy */
 };
 
+struct rxrpc_send_params {
+   unsigned long       user_call_ID;   /* User's call ID */
+   u32                 abort_code;     /* Abort code to Tx (if abort) */
+   enum rxrpc_command  command : 8;    /* The command to implement */
+   bool                exclusive;      /* Shared or exclusive call */
+   bool                upgrade;        /* If the connection is upgradeable */
+};
+
 /*
  * wait for space to appear in the transmit/ACK window
  * - caller holds the socket locked
@@ -362,19 +370,12 @@ static int rxrpc_send_data(struct rxrpc_sock *rx,
 /*
  * extract control messages from the sendmsg() control buffer
  */
-static int rxrpc_sendmsg_cmsg(struct msghdr *msg,
- unsigned long *user_call_ID,
- enum rxrpc_command *command,
- u32 *abort_code,
- bool *_exclusive,
- bool *_upgrade)
+static int rxrpc_sendmsg_cmsg(struct msghdr *msg, struct rxrpc_send_params *p)
 {
struct cmsghdr *cmsg;
bool got_user_ID = false;
int len;
 
-   *command = RXRPC_CMD_SEND_DATA;
-
if (msg->msg_controllen == 0)
return -EINVAL;
 
@@ -394,45 +395,43 @@ static int rxrpc_sendmsg_cmsg(struct msghdr *msg,
if (msg->msg_flags & MSG_CMSG_COMPAT) {
if (len != sizeof(u32))
return -EINVAL;
-   *user_call_ID = *(u32 *) CMSG_DATA(cmsg);
+   p->user_call_ID = *(u32 *)CMSG_DATA(cmsg);
} else {
if (len != sizeof(unsigned long))
return -EINVAL;
-   *user_call_ID = *(unsigned long *)
+   p->user_call_ID = *(unsigned long *)
CMSG_DATA(cmsg);
}
-   _debug("User Call ID %lx", *user_call_ID);
got_user_ID = true;
break;
 
case RXRPC_ABORT:
-   if (*command != RXRPC_CMD_SEND_DATA)
+   if (p->command != RXRPC_CMD_SEND_DATA)
return -EINVAL;
-   *command = RXRPC_CMD_SEND_ABORT;
-   if (len != sizeof(*abort_code))
+   p->command = RXRPC_CMD_SEND_ABORT;
+   if (len != sizeof(p->abort_code))
return -EINVAL;
-   *abort_code = *(unsigned int *) CMSG_DATA(cmsg);
-   _debug("Abort %x", *abort_code);
-   if (*abort_code == 0)
+   p->abort_code = *(unsigned int *)CMSG_DATA(cmsg);
+   if (p->abort_code == 0)
return -EINVAL;
break;
 
case RXRPC_ACCEPT:
-   if (*command != RXRPC_CMD_SEND_DATA)
+   if (p->command != RXRPC_CMD_SEND_DATA)
return -EINVAL;
-   *command = RXRPC_CMD_ACCEPT;
+   p->command = RXRPC_CMD_ACCEPT;
if (len != 0)
return -EINVAL;
break;
 
case RXRPC_EXCLUSIVE_CALL:
-   *_exclusive = true;
+   p->exclusive = true;
if (len != 0)
return -EINVAL;
break;
 
case RXRPC_UPGRADE_SERVICE:
-   *_upgrade = true;
+   p->upgrade = true;
if (len != 0)
return -EINVAL;
break;
@@ -455,8 +454,7 @@ static int rxrpc_sendmsg_cmsg(struct msghdr *msg,
  */
 static struct rxrpc_call *
 rxrpc_new_client_call_for_sendmsg(struct rxrpc_sock *rx, struct msghdr *msg,
- unsigned long user_call_ID, bool exclusive,
- bool upgrade)
+ struct rxrpc_send_params *p)
__releases(

[PATCH net-next 1/3] rxrpc: Provide a getsockopt call to query what cmsgs types are supported

2017-06-07 Thread David Howells
Provide a getsockopt() call that can query what cmsg types are supported by
AF_RXRPC.
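
From userspace, the query could look something like the sketch below. The
option number 6 is the one this patch adds; the SOL_RXRPC value of 272
and the helper name ex_rxrpc_highest_cmsg() are assumptions made for the
example.

```c
#include <sys/socket.h>

#define EX_SOL_RXRPC		272	/* assumed value of SOL_RXRPC */
#define EX_RXRPC_SUPPORTED_CMSG	6	/* option number added by this patch */

/* Ask an AF_RXRPC socket for the highest control message type it
 * supports.  Returns that type, or -1 on error (errno is left as set
 * by getsockopt()).
 */
static int ex_rxrpc_highest_cmsg(int fd)
{
	int highest = 0;
	socklen_t len = sizeof(highest);

	if (getsockopt(fd, EX_SOL_RXRPC, EX_RXRPC_SUPPORTED_CMSG,
		       &highest, &len) < 0)
		return -1;
	if (len != sizeof(highest))
		return -1;	/* unexpected option size */
	return highest;
}
```

A program could call this once after creating the socket and only attach
a newer cmsg type when its number is no greater than the returned value.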
---

 Documentation/networking/rxrpc.txt |9 +
 include/linux/rxrpc.h  |   24 ++--
 net/rxrpc/af_rxrpc.c   |   30 +-
 3 files changed, 52 insertions(+), 11 deletions(-)

diff --git a/Documentation/networking/rxrpc.txt b/Documentation/networking/rxrpc.txt
index 18078e630a63..bce8e10a2a8e 100644
--- a/Documentation/networking/rxrpc.txt
+++ b/Documentation/networking/rxrpc.txt
@@ -406,6 +406,10 @@ calls, to invoke certain actions and to report certain conditions.  These are:
  future communication to that server and RXRPC_UPGRADE_SERVICE should no
  longer be set.
 
+The symbol RXRPC__SUPPORTED is defined as one more than the highest control
+message type supported.  At run time this can be queried by means of the
+RXRPC_SUPPORTED_CMSG socket option (see below).
+
 
 ==
 SOCKET OPTIONS
@@ -459,6 +463,11 @@ AF_RXRPC sockets support a few socket options at the SOL_RXRPC level:
  must point to an array of two unsigned short ints.  The first is the
  service ID to upgrade from and the second the service ID to upgrade to.
 
+ (*) RXRPC_SUPPORTED_CMSG
+
+ This is a read-only option that writes an int into the buffer indicating
+ the highest control message type supported.
+
 
 
 SECURITY
diff --git a/include/linux/rxrpc.h b/include/linux/rxrpc.h
index 707910c6c6c5..bdd3175b9a48 100644
--- a/include/linux/rxrpc.h
+++ b/include/linux/rxrpc.h
@@ -38,6 +38,7 @@ struct sockaddr_rxrpc {
 #define RXRPC_EXCLUSIVE_CONNECTION 3   /* Deprecated; use RXRPC_EXCLUSIVE_CALL instead */
 #define RXRPC_MIN_SECURITY_LEVEL   4   /* minimum security level */
 #define RXRPC_UPGRADEABLE_SERVICE  5   /* Upgrade service[0] -> service[1] */
+#define RXRPC_SUPPORTED_CMSG       6   /* Get highest supported control message type */
 
 /*
  * RxRPC control messages
@@ -45,16 +46,19 @@ struct sockaddr_rxrpc {
  * - terminal messages mean that a user call ID tag can be recycled
  * - s/r/- indicate whether these are applicable to sendmsg() and/or recvmsg()
  */
-#define RXRPC_USER_CALL_ID     1   /* sr: user call ID specifier */
-#define RXRPC_ABORT            2   /* sr: abort request / notification [terminal] */
-#define RXRPC_ACK              3   /* -r: [Service] RPC op final ACK received [terminal] */
-#define RXRPC_NET_ERROR        5   /* -r: network error received [terminal] */
-#define RXRPC_BUSY             6   /* -r: server busy received [terminal] */
-#define RXRPC_LOCAL_ERROR      7   /* -r: local error generated [terminal] */
-#define RXRPC_NEW_CALL         8   /* -r: [Service] new incoming call notification */
-#define RXRPC_ACCEPT           9   /* s-: [Service] accept request */
-#define RXRPC_EXCLUSIVE_CALL   10  /* s-: Call should be on exclusive connection */
-#define RXRPC_UPGRADE_SERVICE  11  /* s-: Request service upgrade for client call */
+enum rxrpc_cmsg_type {
+   RXRPC_USER_CALL_ID      = 1,    /* sr: user call ID specifier */
+   RXRPC_ABORT             = 2,    /* sr: abort request / notification [terminal] */
+   RXRPC_ACK               = 3,    /* -r: [Service] RPC op final ACK received [terminal] */
+   RXRPC_NET_ERROR         = 5,    /* -r: network error received [terminal] */
+   RXRPC_BUSY              = 6,    /* -r: server busy received [terminal] */
+   RXRPC_LOCAL_ERROR       = 7,    /* -r: local error generated [terminal] */
+   RXRPC_NEW_CALL          = 8,    /* -r: [Service] new incoming call notification */
+   RXRPC_ACCEPT            = 9,    /* s-: [Service] accept request */
+   RXRPC_EXCLUSIVE_CALL    = 10,   /* s-: Call should be on exclusive connection */
+   RXRPC_UPGRADE_SERVICE   = 11,   /* s-: Request service upgrade for client call */
+   RXRPC__SUPPORTED
+};
 
 /*
  * RxRPC security levels
diff --git a/net/rxrpc/af_rxrpc.c b/net/rxrpc/af_rxrpc.c
index 0c4dc4a7832c..44a52b82bb5d 100644
--- a/net/rxrpc/af_rxrpc.c
+++ b/net/rxrpc/af_rxrpc.c
@@ -582,6 +582,34 @@ static int rxrpc_setsockopt(struct socket *sock, int level, int optname,
 }
 
 /*
+ * Get socket options.
+ */
+static int rxrpc_getsockopt(struct socket *sock, int level, int optname,
+			    char __user *optval, int __user *_optlen)
+{
+	int optlen;
+
+	if (level != SOL_RXRPC)
+		return -EOPNOTSUPP;
+
+	if (get_user(optlen, _optlen))
+		return -EFAULT;
+
+	switch (optname) {
+	case RXRPC_SUPPORTED_CMSG:
+		if (optlen < sizeof(int))
+			return -ETOOSMALL;
+		if (put_user(RXRPC__SUPPORTED - 1, (int __user *)optval) ||
+		    put_user(sizeof(int), _optlen))
+			return -EFAULT;
+		return 0;
+
+   d

[PATCH net-next 0/3] rxrpc: Tx length parameter

2017-06-07 Thread David Howells

Here's a set of patches that allows someone initiating a client call with
AF_RXRPC to indicate upfront the total amount of data that will be
transmitted.  This will allow AF_RXRPC to encrypt directly from source
buffer to packet rather than having to copy into the buffer and only
encrypt when it's full (the encrypted portion of the packet starts with a
length and so we can't encrypt until we know what the length will be).

The three patches are:

 (1) Provide a means of finding out what control message types are actually
 supported.  EINVAL is reported if an unsupported cmsg type is seen, so
 we don't want to set the new cmsg unless we know it will be accepted.

 (2) Consolidate some stuff into a struct to reduce the parameter count on
 the function that parses the cmsg buffer.

 (3) Introduce the RXRPC_TX_LENGTH cmsg.  This can be provided on the first
 sendmsg() that contributes data to a client call request or a service
 call reply.  If provided, the user must provide exactly that amount of
 data or an error will be incurred.
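
As a rough illustration of item (1), userspace could probe the new socket option before attaching a newer cmsg.  This is only a sketch: the numeric fallbacks for AF_RXRPC and SOL_RXRPC are the usual Linux values, RXRPC_SUPPORTED_CMSG is taken from the first patch, and the helper names rxrpc_highest_cmsg() and rxrpc_cmsg_usable() are made up for this example.

```c
#include <sys/socket.h>

/* Fallbacks in case the libc headers do not provide these. */
#ifndef AF_RXRPC
#define AF_RXRPC 33
#endif
#ifndef SOL_RXRPC
#define SOL_RXRPC 272
#endif
#ifndef RXRPC_SUPPORTED_CMSG
#define RXRPC_SUPPORTED_CMSG 6	/* value introduced by patch 1 */
#endif

/*
 * Hypothetical helper: ask the kernel for the highest control message
 * type it supports on this AF_RXRPC socket.  Returns -1 (errno set by
 * getsockopt) if the query fails, e.g. on a kernel without the option.
 */
int rxrpc_highest_cmsg(int fd)
{
	int highest = 0;
	socklen_t len = sizeof(highest);

	if (getsockopt(fd, SOL_RXRPC, RXRPC_SUPPORTED_CMSG, &highest, &len) < 0)
		return -1;
	return highest;
}

/*
 * Hypothetical helper: upper-bound check only.  The type numbering has
 * a gap (no type 4), so this does not prove a given number is defined.
 */
int rxrpc_cmsg_usable(int highest, int type)
{
	return type >= 1 && type <= highest;
}
```

A caller would open a socket with socket(AF_RXRPC, SOCK_DGRAM, PF_INET), call rxrpc_highest_cmsg() once, and only add the new cmsg to sendmsg() when rxrpc_cmsg_usable() says the type is in range.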

The patches can be found here also:


http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=rxrpc-rewrite

Tagged thusly:

git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git
    rxrpc-rewrite-20170607

David
---
David Howells (3):
  rxrpc: Provide a getsockopt call to query what cmsgs types are supported
  rxrpc: Consolidate sendmsg parameters
  rxrpc: Provide a cmsg to specify the amount of Tx data for a call


 Documentation/networking/rxrpc.txt |   43 +++
 fs/afs/rxrpc.c                     |   18 +
 include/linux/rxrpc.h              |   25 ---
 include/net/af_rxrpc.h             |    2 +
 net/rxrpc/af_rxrpc.c               |   35 +
 net/rxrpc/ar-internal.h            |    3 +
 net/rxrpc/call_object.c            |    3 +
 net/rxrpc/sendmsg.c                |  135 +---
 8 files changed, 207 insertions(+), 57 deletions(-)



Re: Netlink messages without NLM_F_REQUEST flag

2017-06-07 Thread Jason Gunthorpe
On Wed, Jun 07, 2017 at 07:19:01PM +0300, Leon Romanovsky wrote:
> It makes me wonder if it is expected behavior for
> ibnl_rcv_reply_skb() to handle !NLM_F_REQUEST messages and do we
> really need it? What are the scenarios?  In my use case, which is
> for sure different from yours, I'm always setting NLM_F_REQUEST
> while communicating with kernel.

If I recall the user space SA code issues REQUESTS from the kernel to
userspace, so userspace returns with the response format. This is
abnormal for netlink hence the special function.

Jason


Re: Netlink messages without NLM_F_REQUEST flag

2017-06-07 Thread Leon Romanovsky
On Wed, Jun 7, 2017 at 7:37 PM, Jason Gunthorpe
 wrote:
> On Wed, Jun 07, 2017 at 07:19:01PM +0300, Leon Romanovsky wrote:
>> It makes me wonder if it is expected behavior for
>> ibnl_rcv_reply_skb() to handle !NLM_F_REQUEST messages and do we
>> really need it? What are the scenarios?  In my use case, which is
>> for sure different from yours, I'm always setting NLM_F_REQUEST
>> while communicating with kernel.
>
> If I recall the user space SA code issues REQUESTS from the kernel to
> userspace, so userspace returns with the response format. This is
> abnormal for netlink hence the special function.

In netlink semantics, kernel side is supposed to send netlink
notification message and userspace is supposed to send REQUEST.

>
> Jason


Re: Netlink messages without NLM_F_REQUEST flag

2017-06-07 Thread Jason Gunthorpe
On Wed, Jun 07, 2017 at 07:43:44PM +0300, Leon Romanovsky wrote:
> On Wed, Jun 7, 2017 at 7:37 PM, Jason Gunthorpe
>  wrote:
> > On Wed, Jun 07, 2017 at 07:19:01PM +0300, Leon Romanovsky wrote:
> >> It makes me wonder if it is expected behavior for
> >> ibnl_rcv_reply_skb() to handle !NLM_F_REQUEST messages and do we
> >> really need it? What are the scenarios?  In my use case, which is
> >> for sure different from yours, I'm always setting NLM_F_REQUEST
> >> while communicating with kernel.
> >
> > If I recall the user space SA code issues REQUESTS from the kernel to
> > userspace, so userspace returns with the response format. This is
> > abnormal for netlink hence the special function.
> 
> In netlink semantics, kernel side is supposed to send netlink
> notification message and userspace is supposed to send REQUEST.

That pattern is for async communications, the SA stuff needs a sync
protocol, unfortunately.

Jason


Re: [PATCH net-next 00/16] nfp: ctrl vNIC

2017-06-07 Thread David Miller
From: Jakub Kicinski 
Date: Mon,  5 Jun 2017 17:01:41 -0700

> This series adds the ability to use one vNIC as a control channel
> for passing messages to and from the application firmware.  The
> implementation restructures the existing netdev vNIC code to be able
> to deal with nfp_nets with netdev pointer set to NULL.  Control vNICs
> are not visible to userspace (other than for dumping ring state), and
> since they don't have netdevs we use a tasklet for RX and simple skb 
> list for TX queuing.
> 
> Due to special status of the control vNIC we have to reshuffle the
> init code a bit to make sure control vNIC will be fully brought up
> (and therefore communication with app FW can happen) before any netdev
> or port is visible to user space.
> 
> FW will designate which vNIC is supposed to be used as control one
> by setting _pf%u_net_ctrl_bar symbol.  Some FWs depend on metadata
> being prepended to control message, some prefer to look at queue ID
> to decide that something is a control message.  Our implementation
> can cater to both.
> 
> First two users of this code will be eBPF maps and flower offloads.

Ok, I read this over and also checked out your discussion with Jiri.
So far this looks OK to me, so series applied.

I look forward to seeing the eBPF maps and flower follow-on stuff.


Re: [PATCH net] bnx2x: fix pf2vf bulletin DMA mapping leak

2017-06-07 Thread David Miller
From: Michal Schmidt 
Date: Tue,  6 Jun 2017 16:30:31 +0200

> When freeing VF's DMA mappings, an already NULLed pointer was checked
> again due to an apparent copy&paste error. Consequently, the pf2vf
> bulletin DMA mapping was not freed.
> 
> Signed-off-by: Michal Schmidt 

Applied, thank you.


Re: Netlink messages without NLM_F_REQUEST flag

2017-06-07 Thread Leon Romanovsky
On Wed, Jun 07, 2017 at 10:47:50AM -0600, Jason Gunthorpe wrote:
> On Wed, Jun 07, 2017 at 07:43:44PM +0300, Leon Romanovsky wrote:
> > On Wed, Jun 7, 2017 at 7:37 PM, Jason Gunthorpe
> >  wrote:
> > > On Wed, Jun 07, 2017 at 07:19:01PM +0300, Leon Romanovsky wrote:
> > >> It makes me wonder if it is expected behavior for
> > >> ibnl_rcv_reply_skb() to handle !NLM_F_REQUEST messages and do we
> > >> really need it? What are the scenarios?  In my use case, which is
> > >> for sure different from yours, I'm always setting NLM_F_REQUEST
> > >> while communicating with kernel.
> > >
> > > If I recall the user space SA code issues REQUESTS from the kernel to
> > > userspace, so userspace returns with the response format. This is
> > > abnormal for netlink hence the special function.
> >
> > In netlink semantics, kernel side is supposed to send netlink
> > notification message and userspace is supposed to send REQUEST.
>
> That pattern is for async communications, the SA stuff needs a sync
> protocol, unfortunately.

There is special flag NLM_F_ACK for it and userspace will set
NLM_F_REQUEST | NLM_F_ACK once synchronization is needed.

>
> Jason




Re: [PATCH] net: fix up hash documentation

2017-06-07 Thread David Miller
From: "Michael S. Tsirkin" 
Date: Tue, 6 Jun 2017 19:01:37 +0300

> commit 61b905da33 ("net: Rename skb->rxhash to skb->hash")
> didn't update the documentation, fix this up.
> 
> Cc: Tom Herbert 
> Signed-off-by: Michael S. Tsirkin 

Applied.


Re: [PATCH net] net: rps: send out pending IPI's on CPU hotplug

2017-06-07 Thread David Miller
From: ashwa...@codeaurora.org
Date: Tue, 06 Jun 2017 20:47:36 +0530

> IPI's from the victim cpu are not handled in dev_cpu_callback.
> So these pending IPI's would be sent to the remote cpu only when
> NET_RX is scheduled on the victim cpu and since this trigger is
> unpredictable it would result in packet latencies on the remote cpu.
> 
> This patch adds support to send the pending ipi's of victim cpu.
> 
> Signed-off-by: Ashwanth Goli 
> ---
>  net/core/dev.c | 31 ++-
>  1 file changed, 22 insertions(+), 9 deletions(-)
> 
> diff --git a/net/core/dev.c b/net/core/dev.c
> index fca407b..e6bfa54 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -4948,6 +4948,19 @@ __sum16 __skb_gro_checksum_complete(struct
> sk_buff *skb)

This patch has been severely corrupted by your email client.

Please read Documentation/email-clients.txt, fix things up, send
a test email to yourself, and only resubmit this patch once you
are able yourself to successfully apply the patch that arrives
in that test email.

Thank you.


Re: Netlink messages without NLM_F_REQUEST flag

2017-06-07 Thread Jason Gunthorpe
On Wed, Jun 07, 2017 at 08:00:37PM +0300, Leon Romanovsky wrote:
> On Wed, Jun 07, 2017 at 10:47:50AM -0600, Jason Gunthorpe wrote:
> > On Wed, Jun 07, 2017 at 07:43:44PM +0300, Leon Romanovsky wrote:
> > > On Wed, Jun 7, 2017 at 7:37 PM, Jason Gunthorpe
> > >  wrote:
> > > > On Wed, Jun 07, 2017 at 07:19:01PM +0300, Leon Romanovsky wrote:
> > > >> It makes me wonder if it is expected behavior for
> > > >> ibnl_rcv_reply_skb() to handle !NLM_F_REQUEST messages and do we
> > > >> really need it? What are the scenarios?  In my use case, which is
> > > >> for sure different from yours, I'm always setting NLM_F_REQUEST
> > > >> while communicating with kernel.
> > > >
> > > > If I recall the user space SA code issues REQUESTS from the kernel to
> > > > userspace, so userspace returns with the response format. This is
> > > > abnormal for netlink hence the special function.
> > >
> > > In netlink semantics, kernel side is supposed to send netlink
> > > notification message and userspace is supposed to send REQUEST.
> >
> > That pattern is for async communications, the SA stuff needs a sync
> > protocol, unfortunately.
> 
> There is special flag NLM_F_ACK for it and userspace will set
> NLM_F_REQUEST | NLM_F_ACK once synchronization is needed.

AFAIK, that is different, that is acking and retriggering a single shot
notification, not completing a kernel initiated handshake.

Jason


RE: Netlink messages without NLM_F_REQUEST flag

2017-06-07 Thread Weiny, Ira
> 
> On Wed, Jun 07, 2017 at 10:47:50AM -0600, Jason Gunthorpe wrote:
> > On Wed, Jun 07, 2017 at 07:43:44PM +0300, Leon Romanovsky wrote:
> > > On Wed, Jun 7, 2017 at 7:37 PM, Jason Gunthorpe 
> > >  wrote:
> > > > On Wed, Jun 07, 2017 at 07:19:01PM +0300, Leon Romanovsky wrote:
> > > >> It makes me wonder if it is expected behavior for
> > > >> ibnl_rcv_reply_skb() to handle !NLM_F_REQUEST messages and do 
> > > >> we really need it? What are the scenarios?  In my use case, 
> > > >> which is for sure different from yours, I'm always setting 
> > > >> NLM_F_REQUEST while communicating with kernel.
> > > >
> > > > If I recall the user space SA code issues REQUESTS from the 
> > > > kernel to userspace, so userspace returns with the response 
> > > > format. This is abnormal for netlink hence the special function.
> > >
> > > In netlink semantics, kernel side is supposed to send netlink 
> > > notification message and userspace is supposed to send REQUEST.
> >
> > That pattern is for async communications, the SA stuff needs a sync 
> > protocol, unfortunately.
> 
> There is special flag NLM_F_ACK for it and userspace will set 
> NLM_F_REQUEST | NLM_F_ACK once synchronization is needed.
> 

Reference?

From my understanding, NLM_F_REQUEST | NLM_F_ACK is simply requesting an ack
from the kernel on a request.  In our case the message is a response to the
kernel request.

Ira
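
To make the flag distinction in this thread concrete, here is a small sketch of the two header shapes being discussed.  The struct and flag macros come from <linux/netlink.h>; the helper names are invented for this example, and echoing the kernel's sequence number in the reply is an assumption about how the reply gets matched, not something stated in the thread.

```c
#include <sys/socket.h>
#include <linux/netlink.h>
#include <stdint.h>
#include <string.h>

/* Common header setup; payload_len is the number of bytes after the header. */
static void nl_init_hdr(struct nlmsghdr *nlh, uint16_t type, uint16_t flags,
			uint32_t seq, uint32_t payload_len)
{
	memset(nlh, 0, sizeof(*nlh));
	nlh->nlmsg_len = NLMSG_LENGTH(payload_len);
	nlh->nlmsg_type = type;
	nlh->nlmsg_flags = flags;
	nlh->nlmsg_seq = seq;
}

/* Usual case: userspace sends a request and asks the kernel to ack it. */
void nl_init_request(struct nlmsghdr *nlh, uint16_t type, uint32_t seq,
		     uint32_t payload_len)
{
	nl_init_hdr(nlh, type, NLM_F_REQUEST | NLM_F_ACK, seq, payload_len);
}

/*
 * SA-style case: userspace answers a request that the kernel issued, so
 * NLM_F_REQUEST is deliberately left clear.  Reusing the kernel's
 * sequence number is an assumption made for this sketch.
 */
void nl_init_response(struct nlmsghdr *nlh, uint16_t type, uint32_t kernel_seq,
		      uint32_t payload_len)
{
	nl_init_hdr(nlh, type, 0, kernel_seq, payload_len);
}

/* The test a receive path would use to tell the two apart. */
int nl_is_request(const struct nlmsghdr *nlh)
{
	return (nlh->nlmsg_flags & NLM_F_REQUEST) != 0;
}
```

A receive path that only accepts messages for which nl_is_request() is true would drop the second kind, which is why a separate handler for !NLM_F_REQUEST messages comes up at all.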




Re: [PATCH net-next v2 5/5] net: dsa: Stop accessing ds->dst->cpu_dp in drivers

2017-06-07 Thread Vivien Didelot
Hi Florian,

Florian Fainelli  writes:

>> So as I said in v2, now that a driver is guaranteed that dp->cpu_dp is
>> correctly assigned at setup time, isn't better (especially for future
>> multi-CPU support) to provide an helper which returns the CPU port for a
>> given port? i.e. dsa_get_cpu_port(struct dsa_switch *ds, int port).
>>
>> Or is there something blocking? I might be wrong.
>
> mt7530.c needs access to the CPU port at ops->setup() time which is
> why this is still here.

Yes, mt7530 is the only one doing this and it has a hardcoded CPU port. So
what I meant was, shouldn't we have this instead:

struct dsa_port *dsa_get_cpu_port(struct dsa_switch *ds, int port)
{
	return ds->ports[port].cpu_dp;
}

And:

-   dn = ds->dst->cpu_dp->netdev->dev.of_node->parent;
+   cpu_dp = dsa_get_cpu_port(ds, MT7530_CPU_PORT);
+   dn = cpu_dp->netdev->dev.of_node->parent;


Thanks,

Vivien
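
The per-port lookup suggested above can be tried outside the kernel with stand-in types.  Everything named demo_* below is hypothetical and much simpler than the real struct dsa_switch / struct dsa_port; the bounds check is an addition for the sketch and is not part of the proposed helper.

```c
#include <stddef.h>

#define DEMO_MAX_PORTS 8

/* Stand-in for struct dsa_port: each port remembers its CPU port. */
struct demo_port {
	int index;
	struct demo_port *cpu_dp;
};

/* Stand-in for struct dsa_switch. */
struct demo_switch {
	struct demo_port ports[DEMO_MAX_PORTS];
};

/* Point every port at one CPU port, as a single-CPU setup would. */
void demo_switch_init(struct demo_switch *ds, int cpu_port)
{
	int i;

	for (i = 0; i < DEMO_MAX_PORTS; i++) {
		ds->ports[i].index = i;
		ds->ports[i].cpu_dp = &ds->ports[cpu_port];
	}
}

/* Same shape as the proposed dsa_get_cpu_port(), plus a range check. */
struct demo_port *demo_get_cpu_port(struct demo_switch *ds, int port)
{
	if (port < 0 || port >= DEMO_MAX_PORTS)
		return NULL;
	return ds->ports[port].cpu_dp;
}
```

With more than one CPU port, only demo_switch_init() would need to change; callers keep asking for the CPU port of a given port, which is the point of routing the lookup through the port rather than the tree.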


[PATCH net-next 1/4] tcp: add a struct net parameter to tcp_parse_options()

2017-06-07 Thread Eric Dumazet
We want to move some TCP sysctls to net namespaces in the future.

tcp_window_scaling, tcp_sack and tcp_timestamps being fetched
from tcp_parse_options(), we need to pass an extra parameter.

Signed-off-by: Eric Dumazet 
---
 drivers/infiniband/hw/cxgb4/cm.c |  2 +-
 include/net/tcp.h                |  2 +-
 net/ipv4/syncookies.c            |  2 +-
 net/ipv4/tcp_input.c             | 18 +++---
 net/ipv4/tcp_minisocks.c         |  4 ++--
 net/ipv6/syncookies.c            |  2 +-
 6 files changed, 17 insertions(+), 13 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb4/cm.c b/drivers/infiniband/hw/cxgb4/cm.c
index 0910faf3587b547e873bc4e5572e7defd93623b3..21e1eb38c986a2db5d1ce1fdafcf738cff36e692 100644
--- a/drivers/infiniband/hw/cxgb4/cm.c
+++ b/drivers/infiniband/hw/cxgb4/cm.c
@@ -3752,7 +3752,7 @@ static void build_cpl_pass_accept_req(struct sk_buff *skb, int stid , u8 tos)
 */
memset(&tmp_opt, 0, sizeof(tmp_opt));
tcp_clear_options(&tmp_opt);
-   tcp_parse_options(skb, &tmp_opt, 0, NULL);
+   tcp_parse_options(&init_net, skb, &tmp_opt, 0, NULL);
 
req = (struct cpl_pass_accept_req *)__skb_push(skb, sizeof(*req));
memset(req, 0, sizeof(*req));
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 28b577a35786ddc9b223b54dd387e59910d9c521..0b0cfeefa05b86473bbef091f54dc976334e9372 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -427,7 +427,7 @@ void tcp_set_keepalive(struct sock *sk, int val);
 void tcp_syn_ack_timeout(const struct request_sock *req);
 int tcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int nonblock,
int flags, int *addr_len);
-void tcp_parse_options(const struct sk_buff *skb,
+void tcp_parse_options(const struct net *net, const struct sk_buff *skb,
   struct tcp_options_received *opt_rx,
   int estab, struct tcp_fastopen_cookie *foc);
 const u8 *tcp_parse_md5sig_option(const struct tcphdr *th);
diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c
index 6426250a58ea1afb29b673c00bb9d58bd3d21122..6a32cb3818771d3109a013521bda9c0e6cdab74b 100644
--- a/net/ipv4/syncookies.c
+++ b/net/ipv4/syncookies.c
@@ -312,7 +312,7 @@ struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb)
 
/* check for timestamp cookie support */
memset(&tcp_opt, 0, sizeof(tcp_opt));
-   tcp_parse_options(skb, &tcp_opt, 0, NULL);
+   tcp_parse_options(sock_net(sk), skb, &tcp_opt, 0, NULL);
 
if (tcp_opt.saw_tstamp && tcp_opt.rcv_tsecr) {
tsoff = secure_tcp_ts_off(ip_hdr(skb)->daddr, ip_hdr(skb)->saddr);
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 4ea8ec5c7bb410834d1c54e0159467ae08d4cd15..99ee707f0ef496998b11b5367ade1a0412b50bef 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3724,7 +3724,8 @@ static void tcp_parse_fastopen_option(int len, const unsigned char *cookie,
  * But, this can also be called on packets in the established flow when
  * the fast version below fails.
  */
-void tcp_parse_options(const struct sk_buff *skb,
+void tcp_parse_options(const struct net *net,
+  const struct sk_buff *skb,
   struct tcp_options_received *opt_rx, int estab,
   struct tcp_fastopen_cookie *foc)
 {
@@ -3858,7 +3859,8 @@ static bool tcp_parse_aligned_timestamp(struct tcp_sock *tp, const struct tcphdr
 /* Fast parse options. This hopes to only see timestamps.
  * If it is wrong it falls back on tcp_parse_options().
  */
-static bool tcp_fast_parse_options(const struct sk_buff *skb,
+static bool tcp_fast_parse_options(const struct net *net,
+  const struct sk_buff *skb,
   const struct tcphdr *th, struct tcp_sock *tp)
 {
/* In the spirit of fast parsing, compare doff directly to constant
@@ -3873,7 +3875,7 @@ static bool tcp_fast_parse_options(const struct sk_buff *skb,
return true;
}
 
-   tcp_parse_options(skb, &tp->rx_opt, 1, NULL);
+   tcp_parse_options(net, skb, &tp->rx_opt, 1, NULL);
if (tp->rx_opt.saw_tstamp && tp->rx_opt.rcv_tsecr)
tp->rx_opt.rcv_tsecr -= tp->tsoffset;
 
@@ -5234,7 +5236,8 @@ static bool tcp_validate_incoming(struct sock *sk, struct sk_buff *skb,
bool rst_seq_match = false;
 
/* RFC1323: H1. Apply PAWS check first. */
-   if (tcp_fast_parse_options(skb, th, tp) && tp->rx_opt.saw_tstamp &&
+   if (tcp_fast_parse_options(sock_net(sk), skb, th, tp) &&
+   tp->rx_opt.saw_tstamp &&
tcp_paws_discard(sk, skb)) {
if (!th->rst) {
NET_INC_STATS(sock_net(sk), LINUX_MIB_PAWSESTABREJECTED);
@@ -5605,7 +5608,7 @@ static bool tcp_rcv_fastopen_synack(struct sock *sk, struct sk_buff *synack,
/* Get original SYNACK MSS value if user MSS sets mss_clamp */
tcp_clear_

[PATCH net-next 3/4] tcp: Namespaceify sysctl_tcp_window_scaling

2017-06-07 Thread Eric Dumazet
Signed-off-by: Eric Dumazet 
---
 include/net/netns/ipv4.h   |  1 +
 include/net/tcp.h          |  1 -
 net/ipv4/syncookies.c      |  2 +-
 net/ipv4/sysctl_net_ipv4.c | 14 +++---
 net/ipv4/tcp_input.c       |  3 +--
 net/ipv4/tcp_ipv4.c        |  1 +
 net/ipv4/tcp_output.c      |  4 ++--
 7 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index bb02482ec8215d59943a708fce0f720e0a71aa8f..1a2ae74a108510a49466a08fd2c2d84c3940a3a9 100644
--- a/include/net/netns/ipv4.h
+++ b/include/net/netns/ipv4.h
@@ -123,6 +123,7 @@ struct netns_ipv4 {
unsigned int sysctl_tcp_notsent_lowat;
int sysctl_tcp_tw_reuse;
int sysctl_tcp_sack;
+   int sysctl_tcp_window_scaling;
struct inet_timewait_death_row tcp_death_row;
int sysctl_max_syn_backlog;
 
diff --git a/include/net/tcp.h b/include/net/tcp.h
index f9d2ce0ba6769d3b95da8846344c0ba1811a55a8..f41ed5bac49312e2bd9b7bfd54b59a95d9888ec9 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -238,7 +238,6 @@ void tcp_time_wait(struct sock *sk, int state, int timeo);
 
 /* sysctl variables for tcp */
 extern int sysctl_tcp_timestamps;
-extern int sysctl_tcp_window_scaling;
 extern int sysctl_tcp_fastopen;
 extern int sysctl_tcp_retrans_collapse;
 extern int sysctl_tcp_stdurg;
diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c
index b386e8592ffddf2c3565486b6897cb9a4dc2dcb1..3d74a45773f1f7adf430885fe46096a498b4e489 100644
--- a/net/ipv4/syncookies.c
+++ b/net/ipv4/syncookies.c
@@ -257,7 +257,7 @@ bool cookie_timestamp_decode(const struct net *net,
tcp_opt->wscale_ok = 1;
tcp_opt->snd_wscale = options & TS_OPT_WSCALE_MASK;
 
-   return sysctl_tcp_window_scaling != 0;
+   return net->ipv4.sysctl_tcp_window_scaling != 0;
 }
 EXPORT_SYMBOL(cookie_timestamp_decode);
 
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 74718f8a0aa8fd8ff52cb50112bb0f5101125b6a..c30ac2ba0e140698d310b4d84e1bbd8f37e650a5 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -371,13 +371,6 @@ static struct ctl_table ipv4_table[] = {
.mode   = 0644,
.proc_handler   = proc_dointvec
},
-   {
-   .procname   = "tcp_window_scaling",
-   .data   = &sysctl_tcp_window_scaling,
-   .maxlen = sizeof(int),
-   .mode   = 0644,
-   .proc_handler   = proc_dointvec
-   },
{
.procname   = "tcp_retrans_collapse",
.data   = &sysctl_tcp_retrans_collapse,
@@ -1116,6 +1109,13 @@ static struct ctl_table ipv4_net_table[] = {
.mode   = 0644,
.proc_handler   = proc_dointvec
},
+   {
+   .procname   = "tcp_window_scaling",
+   .data   = &init_net.ipv4.sysctl_tcp_window_scaling,
+   .maxlen = sizeof(int),
+   .mode   = 0644,
+   .proc_handler   = proc_dointvec
+   },
{ }
 };
 
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 2eacfcaf1257b9f474ae5af8169eec4b4d30f3f3..675ee903370ffd983109a2651235d627cad6eaa5 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -77,7 +77,6 @@
 #include 
 
 int sysctl_tcp_timestamps __read_mostly = 1;
-int sysctl_tcp_window_scaling __read_mostly = 1;
 int sysctl_tcp_fack __read_mostly;
 int sysctl_tcp_max_reordering __read_mostly = 300;
 int sysctl_tcp_dsack __read_mostly = 1;
@@ -3765,7 +3764,7 @@ void tcp_parse_options(const struct net *net,
break;
case TCPOPT_WINDOW:
if (opsize == TCPOLEN_WINDOW && th->syn &&
-   !estab && sysctl_tcp_window_scaling) {
+   !estab && net->ipv4.sysctl_tcp_window_scaling) {
__u8 snd_wscale = *(__u8 *)ptr;
opt_rx->wscale_ok = 1;
if (snd_wscale > TCP_MAX_WSCALE) {
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 3c475a2a84323c8064d271b7dc2d0e1d68dd7e2e..e07ef5b14aaf4bd46253775b649cd77b58af5a2f 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -2466,6 +2466,7 @@ static int __net_init tcp_sk_init(struct net *net)
 
net->ipv4.sysctl_max_syn_backlog = max(128, cnt / 256);
net->ipv4.sysctl_tcp_sack = 1;
+   net->ipv4.sysctl_tcp_window_scaling = 1;
 
return 0;
 fail:
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 45c8e459db49bcbf3ca17b5f048cee82c3273ef7..3f40950107857b2984e858bf6d9e48e6f87a3259 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -575,7 +575,7 @@ static unsigned int tcp_syn_options(struct sock *sk, struct sk_buff *skb,
opts->ts

[PATCH net-next 2/4] tcp: Namespaceify sysctl_tcp_sack

2017-06-07 Thread Eric Dumazet
Signed-off-by: Eric Dumazet 
---
 include/net/netns/ipv4.h   |  1 +
 include/net/tcp.h          |  4 ++--
 net/ipv4/syncookies.c      |  7 ---
 net/ipv4/sysctl_net_ipv4.c | 14 +++---
 net/ipv4/tcp_input.c       |  3 +--
 net/ipv4/tcp_ipv4.c        |  1 +
 net/ipv4/tcp_output.c      |  2 +-
 net/ipv6/syncookies.c      |  2 +-
 8 files changed, 18 insertions(+), 16 deletions(-)

diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index cd686c4fb32dc5409a08f818d48228bffa6f6778..bb02482ec8215d59943a708fce0f720e0a71aa8f 100644
--- a/include/net/netns/ipv4.h
+++ b/include/net/netns/ipv4.h
@@ -122,6 +122,7 @@ struct netns_ipv4 {
int sysctl_tcp_fin_timeout;
unsigned int sysctl_tcp_notsent_lowat;
int sysctl_tcp_tw_reuse;
+   int sysctl_tcp_sack;
struct inet_timewait_death_row tcp_death_row;
int sysctl_max_syn_backlog;
 
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 0b0cfeefa05b86473bbef091f54dc976334e9372..f9d2ce0ba6769d3b95da8846344c0ba1811a55a8 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -239,7 +239,6 @@ void tcp_time_wait(struct sock *sk, int state, int timeo);
 /* sysctl variables for tcp */
 extern int sysctl_tcp_timestamps;
 extern int sysctl_tcp_window_scaling;
-extern int sysctl_tcp_sack;
 extern int sysctl_tcp_fastopen;
 extern int sysctl_tcp_retrans_collapse;
 extern int sysctl_tcp_stdurg;
@@ -520,7 +519,8 @@ u32 __cookie_v4_init_sequence(const struct iphdr *iph, const struct tcphdr *th,
  u16 *mssp);
 __u32 cookie_v4_init_sequence(const struct sk_buff *skb, __u16 *mss);
 u64 cookie_init_timestamp(struct request_sock *req);
-bool cookie_timestamp_decode(struct tcp_options_received *opt);
+bool cookie_timestamp_decode(const struct net *net,
+struct tcp_options_received *opt);
 bool cookie_ecn_ok(const struct tcp_options_received *opt,
   const struct net *net, const struct dst_entry *dst);
 
diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c
index 6a32cb3818771d3109a013521bda9c0e6cdab74b..b386e8592ffddf2c3565486b6897cb9a4dc2dcb1 100644
--- a/net/ipv4/syncookies.c
+++ b/net/ipv4/syncookies.c
@@ -232,7 +232,8 @@ EXPORT_SYMBOL(tcp_get_cookie_sock);
  * return false if we decode a tcp option that is disabled
  * on the host.
  */
-bool cookie_timestamp_decode(struct tcp_options_received *tcp_opt)
+bool cookie_timestamp_decode(const struct net *net,
+struct tcp_options_received *tcp_opt)
 {
/* echoed timestamp, lowest bits contain options */
u32 options = tcp_opt->rcv_tsecr;
@@ -247,7 +248,7 @@ bool cookie_timestamp_decode(struct tcp_options_received *tcp_opt)
 
tcp_opt->sack_ok = (options & TS_OPT_SACK) ? TCP_SACK_SEEN : 0;
 
-   if (tcp_opt->sack_ok && !sysctl_tcp_sack)
+   if (tcp_opt->sack_ok && !net->ipv4.sysctl_tcp_sack)
return false;
 
if ((options & TS_OPT_WSCALE_MASK) == TS_OPT_WSCALE_MASK)
@@ -319,7 +320,7 @@ struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb)
tcp_opt.rcv_tsecr -= tsoff;
}
 
-   if (!cookie_timestamp_decode(&tcp_opt))
+   if (!cookie_timestamp_decode(sock_net(sk), &tcp_opt))
goto out;
 
ret = NULL;
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 86957e9cd6c6748ac00aa0307154bb131c43f1da..74718f8a0aa8fd8ff52cb50112bb0f5101125b6a 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -378,13 +378,6 @@ static struct ctl_table ipv4_table[] = {
.mode   = 0644,
.proc_handler   = proc_dointvec
},
-   {
-   .procname   = "tcp_sack",
-   .data   = &sysctl_tcp_sack,
-   .maxlen = sizeof(int),
-   .mode   = 0644,
-   .proc_handler   = proc_dointvec
-   },
{
.procname   = "tcp_retrans_collapse",
.data   = &sysctl_tcp_retrans_collapse,
@@ -1116,6 +1109,13 @@ static struct ctl_table ipv4_net_table[] = {
.extra2 = &one,
},
 #endif
+   {
+   .procname   = "tcp_sack",
+   .data   = &init_net.ipv4.sysctl_tcp_sack,
+   .maxlen = sizeof(int),
+   .mode   = 0644,
+   .proc_handler   = proc_dointvec
+   },
{ }
 };
 
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 99ee707f0ef496998b11b5367ade1a0412b50bef..2eacfcaf1257b9f474ae5af8169eec4b4d30f3f3 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -78,7 +78,6 @@
 
 int sysctl_tcp_timestamps __read_mostly = 1;
 int sysctl_tcp_window_scaling __read_mostly = 1;
-int sysctl_tcp_sack __read_mostly = 1;
 int sysctl_tcp_fack __read_mostly;
 int sysctl_tcp_max_reordering __read_mostly = 300;
 int sysctl_tcp_dsack __read_mostly = 1;

[PATCH net-next 4/4] tcp: Namespaceify sysctl_tcp_timestamps

2017-06-07 Thread Eric Dumazet
Signed-off-by: Eric Dumazet 
---
 include/net/netns/ipv4.h   |  1 +
 include/net/secure_seq.h   |  5 +++--
 include/net/tcp.h          |  3 +--
 net/core/secure_seq.c      |  9 +
 net/ipv4/syncookies.c      |  6 --
 net/ipv4/sysctl_net_ipv4.c | 14 +++---
 net/ipv4/tcp_input.c       |  5 ++---
 net/ipv4/tcp_ipv4.c        |  9 +
 net/ipv4/tcp_output.c      |  7 ---
 net/ipv6/syncookies.c      |  3 ++-
 net/ipv6/tcp_ipv6.c        |  7 ---
 11 files changed, 38 insertions(+), 31 deletions(-)

diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index 1a2ae74a108510a49466a08fd2c2d84c3940a3a9..9a14a0850b0e3601194479b4e1a433dc817e088e 100644
--- a/include/net/netns/ipv4.h
+++ b/include/net/netns/ipv4.h
@@ -124,6 +124,7 @@ struct netns_ipv4 {
int sysctl_tcp_tw_reuse;
int sysctl_tcp_sack;
int sysctl_tcp_window_scaling;
+   int sysctl_tcp_timestamps;
struct inet_timewait_death_row tcp_death_row;
int sysctl_max_syn_backlog;
 
diff --git a/include/net/secure_seq.h b/include/net/secure_seq.h
index b94006f6fbdde0d78fe33b9c2d86159e291c30cf..031bf16d15218329be98b1fb8c3f3e891a6f86e3 100644
--- a/include/net/secure_seq.h
+++ b/include/net/secure_seq.h
@@ -8,10 +8,11 @@ u32 secure_ipv6_port_ephemeral(const __be32 *saddr, const __be32 *daddr,
   __be16 dport);
 u32 secure_tcp_seq(__be32 saddr, __be32 daddr,
   __be16 sport, __be16 dport);
-u32 secure_tcp_ts_off(__be32 saddr, __be32 daddr);
+u32 secure_tcp_ts_off(const struct net *net, __be32 saddr, __be32 daddr);
 u32 secure_tcpv6_seq(const __be32 *saddr, const __be32 *daddr,
 __be16 sport, __be16 dport);
-u32 secure_tcpv6_ts_off(const __be32 *saddr, const __be32 *daddr);
+u32 secure_tcpv6_ts_off(const struct net *net,
+   const __be32 *saddr, const __be32 *daddr);
 u64 secure_dccp_sequence_number(__be32 saddr, __be32 daddr,
__be16 sport, __be16 dport);
 u64 secure_dccpv6_sequence_number(__be32 *saddr, __be32 *daddr,
diff --git a/include/net/tcp.h b/include/net/tcp.h
index f41ed5bac49312e2bd9b7bfd54b59a95d9888ec9..aec092560d9bd60d4323fa6d9ced74f17026b5a7 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -237,7 +237,6 @@ void tcp_time_wait(struct sock *sk, int state, int timeo);
 
 
 /* sysctl variables for tcp */
-extern int sysctl_tcp_timestamps;
 extern int sysctl_tcp_fastopen;
 extern int sysctl_tcp_retrans_collapse;
 extern int sysctl_tcp_stdurg;
@@ -1869,7 +1868,7 @@ struct tcp_request_sock_ops {
struct dst_entry *(*route_req)(const struct sock *sk, struct flowi *fl,
   const struct request_sock *req);
u32 (*init_seq)(const struct sk_buff *skb);
-   u32 (*init_ts_off)(const struct sk_buff *skb);
+   u32 (*init_ts_off)(const struct net *net, const struct sk_buff *skb);
int (*send_synack)(const struct sock *sk, struct dst_entry *dst,
   struct flowi *fl, struct request_sock *req,
   struct tcp_fastopen_cookie *foc,
diff --git a/net/core/secure_seq.c b/net/core/secure_seq.c
index ae35cce3a40d70387bee815798933aa43a0e6d84..7232274de334bbd0852b80fc286ee316e22946d7 100644
--- a/net/core/secure_seq.c
+++ b/net/core/secure_seq.c
@@ -51,7 +51,8 @@ static u32 seq_scale(u32 seq)
 #endif
 
 #if IS_ENABLED(CONFIG_IPV6)
-u32 secure_tcpv6_ts_off(const __be32 *saddr, const __be32 *daddr)
+u32 secure_tcpv6_ts_off(const struct net *net,
+   const __be32 *saddr, const __be32 *daddr)
 {
const struct {
struct in6_addr saddr;
@@ -61,7 +62,7 @@ u32 secure_tcpv6_ts_off(const __be32 *saddr, const __be32 *daddr)
.daddr = *(struct in6_addr *)daddr,
};
 
-   if (sysctl_tcp_timestamps != 1)
+   if (net->ipv4.sysctl_tcp_timestamps != 1)
return 0;
 
ts_secret_init();
@@ -113,9 +114,9 @@ EXPORT_SYMBOL(secure_ipv6_port_ephemeral);
 #endif
 
 #ifdef CONFIG_INET
-u32 secure_tcp_ts_off(__be32 saddr, __be32 daddr)
+u32 secure_tcp_ts_off(const struct net *net, __be32 saddr, __be32 daddr)
 {
-   if (sysctl_tcp_timestamps != 1)
+   if (net->ipv4.sysctl_tcp_timestamps != 1)
return 0;
 
ts_secret_init();
diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c
index 3d74a45773f1f7adf430885fe46096a498b4e489..7835bb4a1fab2b335c65001cc3c9233ffb4fd5cc 100644
--- a/net/ipv4/syncookies.c
+++ b/net/ipv4/syncookies.c
@@ -243,7 +243,7 @@ bool cookie_timestamp_decode(const struct net *net,
return true;
}
 
-   if (!sysctl_tcp_timestamps)
+   if (!net->ipv4.sysctl_tcp_timestamps)
return false;
 
tcp_opt->sack_ok = (options & TS_OPT_SACK) ? TCP_SACK_SEEN : 0;
@@ -316,7 +316,9 @@ struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb)
tcp_parse_options(sock_net(sk), skb, &tcp_opt, 0

[PATCH net-next 0/4] tcp: Namespaceify 3 sysctls

2017-06-07 Thread Eric Dumazet
Move tcp_sack, tcp_window_scaling and tcp_timestamps
sysctls to network namespaces.
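
As a rough illustration of what "namespaceifying" a sysctl means for
callers, here is a minimal userspace sketch. The demo_* names and the
simplified struct layout are hypothetical stand-ins, not the kernel's
definitions; only the "!= 1 returns 0" check is taken from the
secure_seq.c hunks in this series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct netns_ipv4 {
	int sysctl_tcp_timestamps;	/* one copy per network namespace */
};

struct net {
	struct netns_ipv4 ipv4;
};

/* Placeholder for the keyed hash the kernel computes over the addresses. */
uint32_t demo_hash(uint32_t saddr, uint32_t daddr)
{
	return (saddr * 2654435761u) ^ daddr;
}

/*
 * Same shape as secure_tcp_ts_off() after the change: the setting is read
 * from the namespace passed in by the caller instead of from a global.
 */
uint32_t demo_ts_off(const struct net *net, uint32_t saddr, uint32_t daddr)
{
	assert(net != NULL);

	/* Any value other than 1 yields a zero timestamp offset. */
	if (net->ipv4.sysctl_tcp_timestamps != 1)
		return 0;

	return demo_hash(saddr, daddr);
}
```

With two struct net instances holding different values, the same address
pair can get a non-zero offset in one namespace and zero in the other,
which is the behaviour a single global variable could not express.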

Eric Dumazet (4):
  tcp: add a struct net parameter to tcp_parse_options()
  tcp: Namespaceify sysctl_tcp_sack
  tcp: Namespaceify sysctl_tcp_window_scaling
  tcp: Namespaceify sysctl_tcp_timestamps

 drivers/infiniband/hw/cxgb4/cm.c |  2 +-
 include/net/netns/ipv4.h |  3 +++
 include/net/secure_seq.h |  5 +++--
 include/net/tcp.h| 10 --
 net/core/secure_seq.c|  9 +
 net/ipv4/syncookies.c| 17 +---
 net/ipv4/sysctl_net_ipv4.c   | 42 
 net/ipv4/tcp_input.c | 29 +--
 net/ipv4/tcp_ipv4.c  | 11 +++
 net/ipv4/tcp_minisocks.c |  4 ++--
 net/ipv4/tcp_output.c| 13 +++--
 net/ipv6/syncookies.c|  7 ---
 net/ipv6/tcp_ipv6.c  |  7 ---
 13 files changed, 86 insertions(+), 73 deletions(-)

-- 
2.13.0.506.g27d5fe0cd-goog



Re: Stmmac: fix for hw timestamp of GMAC 3 unit

2017-06-07 Thread Mario Molitor

Hi Pepe,
thanks for the response.
I have to thank you for the development of the stmmac driver.
Today I sent the two patches as patches for the net.git kernel. I was
too busy over the last day.

Thanks and best regards,
Mario

-----Original Message-----
From: Giuseppe CAVALLARO

Sent: Tuesday, June 6, 2017 7:43 AM
To: Mario Molitor ; alexandre.tor...@st.com
Cc: netdev@vger.kernel.org ; linux-ker...@vger.kernel.org
Subject: Re: Stmmac: fix for hw timestamp of GMAC 3 unit

Hi Mario

thanks for your tests. At first glance your patches seem sensible, so
please send the changes as patches for the net.git kernel.

Regards
Peppe


On 6/6/2017 12:11 AM, Mario Molitor wrote:

Dear stmmac maintainer group,

I have found a problem in the stmmac driver of the Linux kernel and I hope
for a fix in the mainline kernel.

At the moment I have two patch files which fix this problem for me.
The problem seems to have been introduced by commits
d2042052a0aa6a54f01a0c9e14243ec040b100e2 and
ba1ffd74df74a9efa5290f87632a0ed55f1aa387, which broke the functionality of
the GMAC3 unit.

I assume that these commits were only tested with a GMAC4 unit.
I have seen this problem when running the ptp4l daemon on a system with
the Linux 4.11 kernel.


===
root@QuantumXsoc:~ ptp4l -f /etc/ptp.cfg -i eth0 -m
ptp4l[47.928]: selected /dev/ptp0 as PTP clock
ptp4l[47.937]: port 1: INITIALIZING to LISTENING on INITIALIZE
ptp4l[47.938]: port 0: INITIALIZING to LISTENING on INITIALIZE
ptp4l[47.938]: port 1: link up
[   48.282709] socfpga-dwmac ff702000.ethernet eth0: get valid RX hw timestamp 0
[   48.316316] socfpga-dwmac ff702000.ethernet eth0: get valid RX hw timestamp 0
[   48.340260] socfpga-dwmac ff702000.ethernet eth0: get valid RX hw timestamp 0
[   48.456738] socfpga-dwmac ff702000.ethernet eth0: get valid RX hw timestamp 0
ptp4l[48.457]: port 1: received DELAY_REQ without timestamp
[   48.488442] socfpga-dwmac ff702000.ethernet eth0: get valid RX hw timestamp 0
[   48.495599] socfpga-dwmac ff702000.ethernet eth0: get valid RX hw timestamp 0
ptp4l[48.489]: port 1: received SYNC without timestamp



I have found two kinds of problems and have two patches (based on the
4.11 kernel) which fix them.


1.) PTP_TCR_SNAPTYPSEL_1

From my point of view, the first problem was the change to the definition
of PTP_TCR_SNAPTYPSEL_1. According to the Cyclone V documentation, I think
only bit 16 of snaptypesel should be set (for more information see
Table 17-20, "Timestamp Snapshot Dependency on Register Bits", in
cv_5v4.pdf).

I believe that the GMAC4 needs a different definition.
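
To make the intent concrete, here is a small standalone sketch of the
per-core selection. The DEMO_* names are hypothetical, and the GMAC4 value
(bits 17:16) is only an assumption for illustration; the report itself
states just that GMAC3 should use bit 16 alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_BIT(n)	(UINT32_C(1) << (n))

/* GMAC3: only bit 16 of snaptypesel, as described in the report. */
#define DEMO_TCR_SNAPTYPSEL_1		DEMO_BIT(16)

/* ASSUMPTION: illustrative GMAC4 value, not confirmed by the report. */
#define DEMO_GMAC4_TCR_SNAPTYPSEL_1	(DEMO_BIT(17) | DEMO_BIT(16))

/* Compile-time check that the GMAC3 value really is bit 16 alone. */
static_assert(DEMO_TCR_SNAPTYPSEL_1 == 0x10000, "GMAC3 must set bit 16 only");

/* Pick the snapshot-type field for the core in use. */
uint32_t demo_snap_type_sel(bool has_gmac4)
{
	if (has_gmac4)
		return DEMO_GMAC4_TCR_SNAPTYPSEL_1;

	return DEMO_TCR_SNAPTYPSEL_1;
}
```

Keeping one constant per core generation means a later change to the GMAC4
encoding cannot silently alter what is written on GMAC3 hardware.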

( patch 0001-stmmac-fix-ptp-header-for-GMAC3-hw-timestamp.patch )

From 2d54d58dc8548d98572eb5fbfe488ec59b4c0ef5 Mon Sep 17 00:00:00 2001
From: Mario Molitor 
Date: Mon, 5 Jun 2017 18:58:49 +0200
Subject: [PATCH 1/2] stmmac: fix ptp header for GMAC3 hw timestamp

According to the Cyclone V documentation, only bit 16 of snaptypesel should
be set.
(For more information see Table 17-20, "Timestamp Snapshot Dependency on
Register Bits", in cv_5v4.pdf.)


fixed: d2042052a0aa6a54f01a0c9e14243ec040b100e2
---
  drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 15 ---
  drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.h  |  3 ++-
  2 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 4498a38..13a1ac9 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -471,7 +471,10 @@ static int stmmac_hwtstamp_ioctl(struct net_device *dev, struct ifreq *ifr)

  /* PTP v1, UDP, any kind of event packet */
  config.rx_filter = HWTSTAMP_FILTER_PTP_V1_L4_EVENT;
  /* take time stamp for all event messages */
- snap_type_sel = PTP_TCR_SNAPTYPSEL_1;
+ if (priv->plat->has_gmac4)
+ snap_type_sel = PTP_GMAC4_TCR_SNAPTYPSEL_1;
+ else
+ snap_type_sel = PTP_TCR_SNAPTYPSEL_1;
  ptp_over_ipv4_udp = PTP_TCR_TSIPV4ENA;
  ptp_over_ipv6_udp = PTP_TCR_TSIPV6ENA;
@@ -503,7 +506,10 @@ static int stmmac_hwtstamp_ioctl(struct net_device *dev, struct ifreq *ifr)

  config.rx_filter = HWTSTAMP_FILTER_PTP_V2_L4_EVENT;
  ptp_v2 = PTP_TCR_TSVER2ENA;
  /* take time stamp for all event messages */
- snap_type_sel = PTP_TCR_SNAPTYPSEL_1;
+ if (priv->plat->has_gmac4)
+ snap_type_sel = PTP_GMAC4_TCR_SNAPTYPSEL_1;
+ else
+ snap_type_sel = PTP_TCR_SNAPTYPSEL_1;
  ptp_over_ipv4_udp = PTP_TCR_TSIPV4ENA;
  ptp_over_ipv6_udp = PTP_TCR_TSIPV6ENA;
@@ -537,7 +543,10 @@ static int stmmac_hwtstamp_ioctl(struct net_device *dev, struct ifreq *ifr)

  config.rx_filter = HWTSTAMP_FILTER_PTP_V2_EVENT;
  ptp_v2 = PTP_TCR_TSVER2ENA;
  /* take time stamp for all event messages */
- snap_type_sel = PTP_TCR_

Re: general protection fault in deactivate_slab

2017-06-07 Thread Cong Wang
On Tue, Jun 6, 2017 at 4:24 AM, Andrey Konovalov wrote:
> On Tue, Jun 6, 2017 at 1:00 PM, Andrey Konovalov wrote:
>> On Tue, Jun 6, 2017 at 12:30 PM, Gene Blue wrote:
>>> Hi:
>>>   I got this crash when fuzzing the kernel with syzkaller.
>>>
>>>   My kernel version is 4.11.0-rc1, downloaded directly from kernel.org.
>>>
>>>   The crash is reproducible: it occurred three times in total during
>>> fuzzing.
>>
>> Hi!
>>
>> This has already been reported and fixed:
>> https://groups.google.com/forum/#!topic/syzkaller/e3I2c8X2oWo
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=232cd35d0804cc241eb887bb8d4d9b3b9881c64a
>
> Apparently I was wrong, this is actually a different bug, the stack
> trace just looks similar. I got the same report once on 4.12-rc3 which
> has "ipv6: fix out of bound writes in __ip6_append_data()".

But this one is IPv4. ;) We need a similar fix for IPv4 too, but that is
still _not_ related to this crash, which is inside the slab allocator.

>
> kasan: GPF could be caused by NULL-ptr deref or user memory access
> general protection fault:  [#1] SMP KASAN
> Dumping ftrace buffer:
>(ftrace buffer empty)
> Modules linked in:
> CPU: 1 PID: 32220 Comm: syz-executor6 Not tainted 4.12.0-rc3+ #3
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> task: 880037009700 task.stack: 880026ae
> RIP: 0010:get_freepointer mm/slub.c:243 [inline]
> RIP: 0010:deactivate_slab+0x63/0x5f0 mm/slub.c:2020
> RSP: 0018:880026ae6f40 EFLAGS: 00010006
> RAX:  RBX: eae09400 RCX: 
> RDX: 00e906298e83888b RSI: eae09400 RDI: 88003e80cf40
> RBP: 880026ae7000 R08: 880038253818 R09: 880038254018
> R10: 880026ae7020 R11: 0001 R12: 
> R13: 00e906298e83888b R14: 88003e80cf40 R15: 88003e80cf40
> FS:  7f34fa033700() GS:88003ed0() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 7f34fa032db8 CR3: 388b2000 CR4: 06e0
> Call Trace:
>  ___slab_alloc+0x4f0/0x550 mm/slub.c:2595
>  __slab_alloc+0x51/0x90 mm/slub.c:2621
>  slab_alloc_node mm/slub.c:2684 [inline]
>  __kmalloc_node_track_caller+0x179/0x360 mm/slub.c:4303
>  __kmalloc_reserve.isra.32+0x46/0xe0 net/core/skbuff.c:138
>  __alloc_skb+0x15c/0x770 net/core/skbuff.c:231
>  alloc_skb include/linux/skbuff.h:936 [inline]
>  sock_wmalloc+0x156/0x1f0 net/core/sock.c:1879
>  __ip_append_data.isra.48+0x1e43/0x2d80 net/ipv4/ip_output.c:1041


I don't understand how we could cause a NULL-ptr deref in the slab allocator.
Has that slab already been overwritten by mistake?
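
One way such an overwrite could lead to a fault in get_freepointer() is
sketched below in userspace C. This is only an illustration of the general
idea that a free object stores the pointer to the next free object inside
itself; the demo_* names and sizes are hypothetical, and nothing here is
taken from mm/slub.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_OBJ_SIZE	32
#define DEMO_NR_OBJS	4

/* Toy slab: free objects are chained through their own first bytes. */
struct demo_slab {
	unsigned char mem[DEMO_NR_OBJS * DEMO_OBJ_SIZE];
	void *freelist;
};

/* Read the "next free object" pointer stored inside a free object. */
void *demo_get_freepointer(const void *object)
{
	void *next;

	memcpy(&next, object, sizeof(next));
	return next;
}

/* Build the initial freelist: obj0 -> obj1 -> ... -> NULL. */
void demo_slab_init(struct demo_slab *s)
{
	assert(s != NULL);

	for (size_t i = 0; i < DEMO_NR_OBJS; i++) {
		void *next = NULL;

		if (i + 1 < DEMO_NR_OBJS)
			next = s->mem + (i + 1) * DEMO_OBJ_SIZE;
		memcpy(s->mem + i * DEMO_OBJ_SIZE, &next, sizeof(next));
	}
	s->freelist = s->mem;
}

/*
 * Walk the freelist and return 1 if every link points at an object inside
 * the slab, 0 otherwise. The kernel code in the trace does no such check
 * before dereferencing; the check here only makes the corruption visible
 * without crashing.
 */
int demo_freelist_ok(const struct demo_slab *s)
{
	const unsigned char *p = s->freelist;
	uintptr_t base = (uintptr_t)s->mem;
	int steps = 0;

	while (p != NULL) {
		uintptr_t addr = (uintptr_t)p;

		if (addr < base || addr >= base + sizeof(s->mem) ||
		    (addr - base) % DEMO_OBJ_SIZE != 0)
			return 0;
		if (++steps > DEMO_NR_OBJS)
			return 0;	/* more links than objects: a cycle */
		p = demo_get_freepointer(p);
	}
	return 1;
}
```

A stray write into an already freed object (for example a use-after-free
or an overflow from a neighbouring object) replaces the stored link with
arbitrary bytes, and the next walk then follows a pointer that was never
valid. That would be consistent with a fault on a garbage address rather
than on NULL, but the sketch does not establish that this is what happened
in the report.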


> bond0: bcsf0 ether type (3) is different from other slaves (1), can
> not enslave it
>  ip_append_data.part.49+0xe3/0x160 net/ipv4/ip_output.c:1235
>  ip_append_data+0x5f/0x80 net/ipv4/ip_output.c:1224
>  udp_sendmsg+0x10ae/0x2ce0 net/ipv4/udp.c:1073
>  inet_sendmsg+0x169/0x5c0 net/ipv4/af_inet.c:762
>  sock_sendmsg_nosec net/socket.c:633 [inline]
>  sock_sendmsg+0xcf/0x110 net/socket.c:643
>  ___sys_sendmsg+0x98a/0xa90 net/socket.c:1997
> bond0: bcsf0 ether type (3) is different from other slaves (1), can
> not enslave it
>  __sys_sendmsg+0x13d/0x320 net/socket.c:2031
> sctp: [Deprecated]: syz-executor7 (pid 32255) Use of struct
> sctp_assoc_value in delayed_ack socket option.
> Use struct sctp_sack_info instead
>  SYSC_sendmsg net/socket.c:2042 [inline]
>  SyS_sendmsg+0x32/0x50 net/socket.c:2038
> sctp: [Deprecated]: syz-executor7 (pid 32255) Use of struct
> sctp_assoc_value in delayed_ack socket option.
> Use struct sctp_sack_info instead
>  entry_SYSCALL_64_fastpath+0x1f/0xbe
> RIP: 0033:0x446179
> RSP: 002b:7f34fa032c08 EFLAGS: 0282 ORIG_RAX: 002e
> RAX: ffda RBX: 42a0 RCX: 00446179
> RDX: 0800 RSI: 2076 RDI: 0005
> RBP:  R08:  R09: 
> R10:  R11: 0282 R12: 0005
> R13:  R14: 7f34fa0339c0 R15: 7f34fa033700
> Code: 3a 48 8b 84 c7 d0 00 00 00 48 89 45 88 31 c0 4d 85 e4 0f 95 c0
> 83 c0 0f 48 85 d2 89 45 80 0f 84 7a 05 00 00 48 63 47 20 49 89 d5 <4c>
> 8b 34 02 4c 89 e2 4d 85 f6 0f 84 6b 05 00 00 48 8b 4b 18 49
> RIP: get_freepointer mm/slub.c:243 [inline] RSP: 880026ae6f40
> RIP: deactivate_slab+0x63/0x5f0 mm/slub.c:2020 RSP: 880026ae6f40
> sctp: [Deprecated]: syz-executor7 (pid 32258) Use of struct
> sctp_assoc_value in delayed_ack socket option.
> Use struct sctp_sack_info instead
> ---[ end trace 743c7af6619c952b ]---
> Kernel panic - not syncing: Fatal exception
> sctp: [Deprecated]: syz-executor7 (pid 32258) Use of struct
> sctp_assoc_value in delayed_ack socket option.
> Use struct sctp_sack_info instead
> Dumping ftrace buffer:
>(ftrace buffer empty)
> Kernel Offset: disabled
> Rebooting in 86400 seconds..
>
>


RE: [PATCH net-next 00/16] nfp: ctrl vNIC

2017-06-07 Thread Mintz, Yuval
> >> >> >What were your plans with pre-netdev config?
> >> >>
> >> >> We need to pass some initial resource division. Generally the
> >> >> consensus is to have these options exposed through devlink, let
> >> >> the user configure them all and then to have a trigger that would
> >> >> cause driver re-orchestration according to the new values. The
> >> >> flow would look like
> >> >> this:
> >> >>
> >> >> -driver loads with defaults, inits hw and instantiates netdevs
> >> >> -driver exposes config options via devlink -user sets up the
> >> >> options -user pushes the "go" trigger -upon the trigger command,
> >> >> devlink calls the driver re-init callback -driver shuts down the
> >> >> current instances, re-initializes hw,  re-instantiates the netdevs
> >> >>
> >> >> Makes sense?
> >> >
> >> >I like the idea of a "go"/apply/reload trigger and extending devlink.
> >> >Do you plan on adding a way to persist the settings?  I'm concerned
> >> >NIC users may want to boot into the right mode once it's set,
> >> >without reloads and reconfigs upon boot.  Also is there going to be
> >> >a way to query the pending/running config?  Sounds like we may want
> >> >to expose three value sets - persistent/default, running and
> >> >pending/to be applied.
> >
> >> I don't think it is a good idea to introduce any kind of
> >> configuration persistency in HW. I believe that user is the master
> >> and he has all needed info. He can store it persistently, but it is
> >> up to him.
> >>
> >> So basically during boot, we need the devlink configuration to happen
> >> early on, before the netdevices get configured. udev? Not sure how
> >> exactly to do this. Have to ask around :)
> >
> >Thinking about use cases where we'd want information available at probe
> >time, it might have been even better to have it separated from the
> >netdevice, e.g., providing clients with node to configure (generic?)
> >information on top of their PCI nodes.
> 
> Yuval, devlink is separated from the netdevices

Separate from the netdevices, yes. But I think it's still a networking facility.
I.e., would it make sense creating devlink nodes for PCI devices?

Anyway, I don't think there's any *strong* need for what I was asking for;
It's simply that I was thinking of qed where there's quite a bit going on
during the pci probe, and thought how re-doing it can be avoided.
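
For readers following the thread, the set-then-trigger flow quoted above
can be modelled in a few lines of C. This is a toy model only: the demo_*
names, the two example options and the "skip the re-init when nothing
changed" shortcut are assumptions made for illustration, not devlink or
driver API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical resource split a user might want to change. */
struct demo_config {
	unsigned int nr_queues;
	unsigned int nr_vfs;
};

struct demo_dev {
	struct demo_config defaults;	/* values the driver loads with */
	struct demo_config running;	/* values the hardware uses now */
	struct demo_config pending;	/* values set but not yet applied */
	unsigned int reinit_count;	/* how often the device was re-initialized */
};

/* Driver load: everything starts out at the defaults. */
void demo_probe(struct demo_dev *dev, struct demo_config defaults)
{
	assert(dev != NULL);

	dev->defaults = defaults;
	dev->running = defaults;
	dev->pending = defaults;
	dev->reinit_count = 0;
}

/* User sets options: only the pending set changes, hardware is untouched. */
void demo_set_options(struct demo_dev *dev, struct demo_config cfg)
{
	dev->pending = cfg;
}

/*
 * The "go" trigger: stands in for the re-init callback. Returns 1 if the
 * device was re-initialized, 0 if pending and running already matched.
 */
int demo_apply(struct demo_dev *dev)
{
	if (dev->pending.nr_queues == dev->running.nr_queues &&
	    dev->pending.nr_vfs == dev->running.nr_vfs)
		return 0;

	/* A real driver would tear down netdevs and re-init hardware here. */
	dev->running = dev->pending;
	dev->reinit_count++;
	return 1;
}
```

Keeping defaults, running and pending as separate sets corresponds to the
three value sets mentioned earlier in the thread, and it makes the cost of
the approach visible: every apply that changes something implies a full
re-initialization, which is the work the probe-time question is about.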

