date:20171126

Re: [PATCH 1/6] perf: Add new type PERF_TYPE_PROBE

2017-11-26 Thread Peter Zijlstra

On Sat, Nov 25, 2017 at 05:59:54PM -0800, Alexei Starovoitov wrote:

> If we were poking into 'struct perf_event_attr __user *uptr'
> directly like get|put_user(.., &uptr->config)
> then 32-bit user space with 4-byte aligned u64s would cause
> 64-bit kernel to trap on archs like sparc.

But surely archs that have hardware alignment requirements have __u64 ==
__aligned_u64?

It's just that the structure layout can change between archs that have
__u64 != __aligned_u64 and __u64 == __aligned_u64.

But I would argue an architecture that has hardware alignment
requirements and has an unaligned __u64 is just plain broken.

> But in this case you're right. We can use config[12] as-is, since these
> u64 fields are passing the value one way only (into the kernel) and
> we do full perf_copy_attr() first and all further accesses are from
> copied structure and u64_to_user_ptr(event->attr.config) will be fine.

Right. Also note that there are no holes in perf_event_attr; if the
structure itself is allocated aligned, the individual fields will be
aligned.

> Do you mind we do
> union {
>  __u64 file_path;
>  __u64 func_name;
>  __u64 config;
> };
> and similar with config1 ?

> Or prefer that we use 'config/config1' to store string+offset there?
> I think config/config1 is cleaner than config1/config2

I would prefer you use config1/config2 for this and leave config itself
for modifiers (like the retprobe thing). It also better lines up with
the BP stuff.

A little something like so perhaps:

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 362493a2f950..b6e76512f757 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -380,10 +380,14 @@ struct perf_event_attr {
__u32   bp_type;
union {
__u64   bp_addr;
+   __u64   uprobe_path;
+   __u64   kprobe_func;
__u64   config1; /* extension of config */
};
union {
__u64   bp_len;
+   __u64   kprobe_addr;
+   __u64   probe_offset;
__u64   config2; /* extension of config1 */
};
__u64   branch_sample_type; /* enum perf_branch_sample_type */

Re: [RFC net-next 4/4] net: phy: Correctly process PHY_HALTED in phy_stop_machine()

2017-11-26 Thread Geert Uytterhoeven

Hi Florian,

On Mon, Nov 27, 2017 at 5:05 AM, Florian Fainelli  wrote:
> On 11/06/2017 07:50 AM, Geert Uytterhoeven wrote:
>> On Tue, Oct 31, 2017 at 5:33 PM, Florian Fainelli wrote:
>>> On 10/31/2017 08:26 AM, Geert Uytterhoeven wrote:
>>>> On Mon, Oct 30, 2017 at 5:09 PM, Florian Fainelli wrote:
>>>>> On 10/30/2017 06:56 AM, Geert Uytterhoeven wrote:
>>>>>> On Thu, Oct 26, 2017 at 1:21 AM, Florian Fainelli wrote:
>>>>>>> Marc reported that he was not getting the PHY library adjust_link()
>>>>>>> callback function to run when calling phy_stop() + phy_disconnect()
>>>>>>> which does not indeed happen because we set the state machine to
>>>>>>> PHY_HALTED but we don't get to run it to process this state past that
>>>>>>> point.
>>>>>>>
>>>>>>> Fix this with a synchronous call to phy_state_machine() in order to have
>>>>>>> the state machine actually act on PHY_HALTED, set the PHY device's link
>>>>>>> down, turn the network device's carrier off and finally call the
>>>>>>> adjust_link() function.
>>>>>>>
>>>>>>> At the end of phy_state_machine() though, if we are going to be moving
>>>>>>> from PHY_HALTED to PHY_HALTED, do not reschedule the state machine, this
>>>>>>> is pointless.
>>>>>>>
>>>>>>> Reported-by: Marc Gonzalez
>>>>>>> Fixes: a390d1f379cf ("phylib: convert state_queue work to delayed_work")
>>>>>>> Signed-off-by: Marc Gonzalez
>>>>>>> Signed-off-by: Florian Fainelli
>>>>>>
>>>>>> Thanks for your patch!
>>>>>>
>>>>>> Unfortunately, after applying this one, the last in your series, both
>>>>>> sh73a0/kzm9g and r8a73a4/ape6evm start crashing again in the system
>>>>>> suspend/resume path, due to register accesses while the device is already
>>>>>> suspended:
>>>>>
>>>>> OK, seems like there is another path, uncovered by this patch that we
>>>>> can be hitting, does the following patch below help?
>>>>
>>>> Unfortunately it doesn't help.
>>>
>>> OK :/
>>>
>>>>
>>>>>> Unhandled fault: imprecise external abort (0x1406) at 0x0005b950
>>>>
>>>> Note that this is an imprecise external abort, i.e. its reporting may
>>>> be delayed, and the backtrace may be inaccurate.
>>>
>>> True, can you help narrow it down with me? Can you confirm that
>>> adjust_link() (assuming that is the problem) does not get called past
>>> phy_stop_machine() as it should?
>>
>> I've added some additional debug checks (keep track of both phy and
>> smsc state, and refuse to access registers if smsc is disabled).
>
> Thanks for doing that, and sorry for responding that late.
>
>>
>> Apparently phy_stop_machine() is called twice:
>>   - Once from mdio_bus_phy_suspend(), cfr. the first backtrace,
>>   - A second time from smsc911x_suspend(), cfr. the second backtrace.
>>
>> The second call causes a call to smsc911x_phy_adjust_link() while the smsc is
>> already disabled, cfr. the third backtrace. This would trigger the imprecise
>> external abort if I let it access the registers.
>>
>> [ cut here ]
>> WARNING: CPU: 0 PID: 1083 at drivers/net/phy/phy.c:597 phy_stop_machine+0x44/0xcc
>> phy_stop_machine: phy running, good
>> CPU: 0 PID: 1083 Comm: bash Not tainted 4.14.0-rc7-ape6evm-00443-gcdfc0e18a47e0bb3-dirty #637
>> Hardware name: Generic R8A73A4 (Flattened Device Tree)
>> [] (unwind_backtrace) from [] (show_stack+0x10/0x14)
>> [] (show_stack) from [] (dump_stack+0xa4/0xdc)
>> [] (dump_stack) from [] (__warn+0xcc/0xfc)
>> [] (__warn) from [] (warn_slowpath_fmt+0x34/0x44)
>> [] (warn_slowpath_fmt) from [] (phy_stop_machine+0x44/0xcc)
>> [] (phy_stop_machine) from [] (mdio_bus_phy_suspend+0x24/0x40)
>> [] (mdio_bus_phy_suspend) from [] (dpm_run_callback+0x17c/0x3ec)
>> [] (dpm_run_callback) from [] (__device_suspend+0x498/0x6b0)
>> [] (__device_suspend) from [] (dpm_suspend+0x1d8/0x568)
>> [] (dpm_suspend) from [] (suspend_devices_and_enter+0x78/0xe98)
>> [] (suspend_devices_and_enter) from [] (pm_suspend+0xa40/0xbec)
>> [] (pm_suspend) from [] (state_store+0xac/0xcc)
>> [] (state_store) from [] (kernfs_fop_write+0x190/0x1d0)
>> [] (kernfs_fop_write) from [] (__vfs_write+0x20/0x11c)
>> [] (__vfs_write) from [] (vfs_write+0xb8/0x144)
>> [] (vfs_write) from [] (SyS_write+0x40/0x80)
>> [] (SyS_write) from [] (ret_fast_syscall+0x0/0x28)
>> ---[ end trace 8fc4c71351438007 ]---
>> libphy: phy_stop_machine: Kicking state machine synchronously
>> libphy: phy_stop_machine: Kicking state machine done
>> [ cut here ]
>> WARNING: CPU: 0 PID: 1083 at drivers/net/phy/phy.c:598 phy_stop_machine+0x64/0xcc
>> phy_stop_machine: phy already stopped
>> CPU: 0 PID: 1083 Comm: bash Tainted: G        W 4.14.0-rc7-ape6evm-00443-gcdfc0e18a47e0bb3-dirty #637
>> Hardware name: Generic R8A73A4 (Flattened Device Tree)
>> [] (unwind_backtrace) from [] (show_stack+0x10/0x14)
>> [] (show_stack) from [] (dump_stack+0xa4/0xdc)
>> [] (dump_stack) from [] (__warn+0xcc/0xfc)
>> [] (__warn) from [] (warn_slowpath_fmt+0x34/0x44)
>> [] (warn_slowpath_fmt) from [

[RFC PATCH 1/3] net: macb: Add RBQP to the macb queues

2017-11-26 Thread Harini Katakam

From: "Edgar E. Iglesias" 

Add the RX queue pointer (RBQP) to the macb queues to make it accessible
for all of the available queues. Currently only the first RX queue is used.

Signed-off-by: Edgar E. Iglesias 
Signed-off-by: Harini Katakam 
Signed-off-by: Michal Simek 
---
 drivers/net/ethernet/cadence/macb.h  | 1 +
 drivers/net/ethernet/cadence/macb_main.c | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/cadence/macb.h b/drivers/net/ethernet/cadence/macb.h
index c93f3a2..acb6578 100644
--- a/drivers/net/ethernet/cadence/macb.h
+++ b/drivers/net/ethernet/cadence/macb.h
@@ -968,6 +968,7 @@ struct macb_queue {
unsigned int    IMR;
unsigned int    TBQP;
unsigned int    TBQPH;
+   unsigned int    RBQP;
 
unsigned int    tx_head, tx_tail;
struct macb_dma_desc    *tx_ring;
diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
index 72a67f7..623ae9c 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -2875,6 +2875,7 @@ static int macb_init(struct platform_device *pdev)
if (bp->hw_dma_cap & HW_DMA_CAP_64B)
queue->TBQPH = GEM_TBQPH(hw_q - 1);
 #endif
+   queue->RBQP = GEM_RBQP(hw_q - 1);
} else {
/* queue0 uses legacy registers */
queue->ISR  = MACB_ISR;
@@ -2886,6 +2887,7 @@ static int macb_init(struct platform_device *pdev)
if (bp->hw_dma_cap & HW_DMA_CAP_64B)
queue->TBQPH = MACB_TBQPH;
 #endif
+   queue->RBQP = MACB_RBQP;
}
 
/* get irq: here we use the linux queue index, not the hardware
-- 
2.7.4

[RFC PATCH 2/3] net: macb: Tie-off unused RX queues

2017-11-26 Thread Harini Katakam

From: "Edgar E. Iglesias" 

Currently, we only use the first receive queue and leave the
remaining DMA descriptor pointers pointing at 0.

Disable unused queues by connecting them to a looped descriptor
chain without free slots.

Signed-off-by: Edgar E. Iglesias 
Signed-off-by: Harini Katakam 
Signed-off-by: Michal Simek 
---
 drivers/net/ethernet/cadence/macb.h  |  2 ++
 drivers/net/ethernet/cadence/macb_main.c | 42 
 2 files changed, 44 insertions(+)

diff --git a/drivers/net/ethernet/cadence/macb.h b/drivers/net/ethernet/cadence/macb.h
index acb6578..974c801 100644
--- a/drivers/net/ethernet/cadence/macb.h
+++ b/drivers/net/ethernet/cadence/macb.h
@@ -994,6 +994,7 @@ struct macb {
unsigned int    rx_tail;
unsigned int    rx_prepared_head;
struct macb_dma_desc    *rx_ring;
+   struct macb_dma_desc    *rx_ring_tieoff;
struct sk_buff  **rx_skbuff;
void    *rx_buffers;
size_t  rx_buffer_size;
@@ -1019,6 +1020,7 @@ struct macb {
}   hw_stats;
 
dma_addr_t  rx_ring_dma;
+   dma_addr_t  rx_ring_tieoff_dma;
dma_addr_t  rx_buffers_dma;
 
struct macb_or_gem_ops  macbgem_ops;
diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
index 623ae9c..b14a04d 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -1755,6 +1755,12 @@ static void macb_free_consistent(struct macb *bp)
bp->rx_ring = NULL;
}
 
+   if (bp->rx_ring_tieoff) {
+   dma_free_coherent(&bp->pdev->dev, sizeof(bp->rx_ring_tieoff[0]),
+ bp->rx_ring_tieoff, bp->rx_ring_tieoff_dma);
+   bp->rx_ring_tieoff = NULL;
+   }
+
for (q = 0, queue = bp->queues; q < bp->num_queues; ++q, ++queue) {
kfree(queue->tx_skb);
queue->tx_skb = NULL;
@@ -1826,6 +1832,19 @@ static int macb_alloc_consistent(struct macb *bp)
 &bp->rx_ring_dma, GFP_KERNEL);
if (!bp->rx_ring)
goto out_err;
+
+   /* If we have more than one queue, allocate a tie off descriptor
+* that will be used to disable unused RX queues.
+*/
+   if (bp->num_queues > 1) {
+   bp->rx_ring_tieoff = dma_alloc_coherent(&bp->pdev->dev,
+   sizeof(bp->rx_ring_tieoff[0]),
+   &bp->rx_ring_tieoff_dma,
+   GFP_KERNEL);
+   if (!bp->rx_ring_tieoff)
+   goto out_err;
+   }
+
netdev_dbg(bp->dev,
   "Allocated RX ring of %d bytes at %08lx (mapped %p)\n",
   size, (unsigned long)bp->rx_ring_dma, bp->rx_ring);
@@ -1840,6 +1859,19 @@ static int macb_alloc_consistent(struct macb *bp)
return -ENOMEM;
 }
 
+static void macb_init_tieoff(struct macb *bp)
+{
+   struct macb_dma_desc *d = bp->rx_ring_tieoff;
+
+   if (bp->num_queues > 1) {
+   /* Setup a wrapping descriptor with no free slots
+* (WRAP and USED) to tie off/disable unused RX queues.
+*/
+   d->addr = MACB_BIT(RX_WRAP) | MACB_BIT(RX_USED);
+   d->ctrl = 0;
+   }
+}
+
 static void gem_init_rings(struct macb *bp)
 {
struct macb_queue *queue;
@@ -1862,6 +1894,7 @@ static void gem_init_rings(struct macb *bp)
bp->rx_prepared_head = 0;
 
gem_rx_refill(bp);
+   macb_init_tieoff(bp);
 }
 
 static void macb_init_rings(struct macb *bp)
@@ -1879,6 +1912,7 @@ static void macb_init_rings(struct macb *bp)
bp->queues[0].tx_head = 0;
bp->queues[0].tx_tail = 0;
desc->ctrl |= MACB_BIT(TX_WRAP);
+   macb_init_tieoff(bp);
 }
 
 static void macb_reset_hw(struct macb *bp)
@@ -2063,6 +2097,14 @@ static void macb_init_hw(struct macb *bp)
queue_writel(queue, TBQPH, upper_32_bits(queue->tx_ring_dma));
 #endif
 
+   /* We only use the first queue at the moment. Remaining
+* queues must be tied-off before we enable the receiver.
+*
+* See the documentation for receive_q1_ptr for more info.
+*/
+   if (q)
+   queue_writel(queue, RBQP, bp->rx_ring_tieoff_dma);
+
/* Enable interrupts */
queue_writel(queue, IER,
 MACB_RX_INT_FLAGS |
-- 
2.7.4

[RFC PATCH 3/3] net: macb: Handle HRESP error

2017-11-26 Thread Harini Katakam

From: Harini Katakam 

Handle the HRESP error by doing a SW reset of RX and TX and
re-initializing the descriptors and the RX and TX queue pointers.

Signed-off-by: Harini Katakam 
Signed-off-by: Michal Simek 
---
 drivers/net/ethernet/cadence/macb.h  |  2 +
 drivers/net/ethernet/cadence/macb_main.c | 65 ++--
 2 files changed, 63 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/cadence/macb.h b/drivers/net/ethernet/cadence/macb.h
index 974c801..5246ee1 100644
--- a/drivers/net/ethernet/cadence/macb.h
+++ b/drivers/net/ethernet/cadence/macb.h
@@ -1060,6 +1060,8 @@ struct macb {
struct ptp_clock_info ptp_clock_info;
struct tsu_incr tsu_incr;
struct hwtstamp_config tstamp_config;
+
+   struct tasklet_struct   hresp_err_tasklet;
 };
 
 #ifdef CONFIG_MACB_USE_HWSTAMP
diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
index b14a04d..d76e04f 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -1244,6 +1244,63 @@ static int macb_poll(struct napi_struct *napi, int budget)
return work_done;
 }
 
+static void macb_hresp_error_task(unsigned long data)
+{
+   struct macb *bp = (struct macb *)data;
+   struct net_device *dev = bp->dev;
+   struct macb_queue *queue = bp->queues;
+   unsigned int q;
+   u32 ctrl;
+
+   for (q = 0, queue = bp->queues; q < bp->num_queues; ++q, ++queue) {
+   queue_writel(queue, IDR, MACB_RX_INT_FLAGS |
+MACB_TX_INT_FLAGS |
+MACB_BIT(HRESP));
+   }
+   ctrl = macb_readl(bp, NCR);
+   ctrl &= ~(MACB_BIT(RE) | MACB_BIT(TE));
+   macb_writel(bp, NCR, ctrl);
+
+   netif_tx_stop_all_queues(dev);
+   netif_carrier_off(dev);
+
+   bp->macbgem_ops.mog_init_rings(bp);
+
+   /* Initialize TX and RX buffers */
+   macb_writel(bp, RBQP, lower_32_bits(bp->rx_ring_dma));
+#ifdef CONFIG_ARCH_DMA_ADDR_T_64BIT
+   if (bp->hw_dma_cap & HW_DMA_CAP_64B)
+   macb_writel(bp, RBQPH, upper_32_bits(bp->rx_ring_dma));
+#endif
+   for (q = 0, queue = bp->queues; q < bp->num_queues; ++q, ++queue) {
+   queue_writel(queue, TBQP, lower_32_bits(queue->tx_ring_dma));
+#ifdef CONFIG_ARCH_DMA_ADDR_T_64BIT
+   if (bp->hw_dma_cap & HW_DMA_CAP_64B)
+   queue_writel(queue, TBQPH, upper_32_bits(queue->tx_ring_dma));
+#endif
+
+   /* We only use the first queue at the moment. Remaining
+* queues must be tied-off before we enable the receiver.
+*
+* See the documentation for receive_q1_ptr for more info.
+*/
+   if (q)
+   queue_writel(queue, RBQP, bp->rx_ring_tieoff_dma);
+
+   /* Enable interrupts */
+   queue_writel(queue, IER,
+MACB_RX_INT_FLAGS |
+MACB_TX_INT_FLAGS |
+MACB_BIT(HRESP));
+   }
+
+   ctrl |= MACB_BIT(RE) | MACB_BIT(TE);
+   macb_writel(bp, NCR, ctrl);
+
+   netif_carrier_on(dev);
+   netif_tx_start_all_queues(dev);
+}
+
 static irqreturn_t macb_interrupt(int irq, void *dev_id)
 {
struct macb_queue *queue = dev_id;
@@ -1333,10 +1390,7 @@ static irqreturn_t macb_interrupt(int irq, void *dev_id)
}
 
if (status & MACB_BIT(HRESP)) {
-   /* TODO: Reset the hardware, and maybe move the
-* netdev_err to a lower-priority context as well
-* (work queue?)
-*/
+   tasklet_schedule(&bp->hresp_err_tasklet);
netdev_err(dev, "DMA bus error: HRESP not OK\n");
 
if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE)
@@ -3586,6 +3640,9 @@ static int macb_probe(struct platform_device *pdev)
goto err_out_unregister_mdio;
}
 
+   tasklet_init(&bp->hresp_err_tasklet, macb_hresp_error_task,
+(unsigned long)bp);
+
phy_attached_info(phydev);
 
netdev_info(dev, "Cadence %s rev 0x%08x at 0x%08lx irq %d (%pM)\n",
-- 
2.7.4

[RFC PATCH 0/3] Miscellaneous fixes in macb driver

2017-11-26 Thread Harini Katakam

This series fixes the following:
-> Ties off unused RX queues
-> Handles RX HRESP error

Edgar E. Iglesias (2):
  net: macb: Add RBQP to the macb queues
  net: macb: Tie-off unused RX queues

Harini Katakam (1):
  net: macb: Handle HRESP error

 drivers/net/ethernet/cadence/macb.h  |   5 ++
 drivers/net/ethernet/cadence/macb_main.c | 109 +--
 2 files changed, 110 insertions(+), 4 deletions(-)

-- 
2.7.4

Re: [RFC PATCH 1/2] net: macb: Add RBQP to the macb queues

2017-11-26 Thread Harini Katakam

Hi,

Please ignore this series.
I'm sending another updated one.
Sorry for the inconvenience.

Regards,
Harini

On Mon, Nov 27, 2017 at 12:33 PM, Harini Katakam wrote:
> From: "Edgar E. Iglesias" 
>
> Add RX queue pointer to macb queues to make it accessible for the
> multiple queues available. Currently the first RX queue is used.
>
> Signed-off-by: Edgar E. Iglesias 
> Signed-off-by: Harini Katakam 
> Signed-off-by: Michal Simek 
> ---
>  drivers/net/ethernet/cadence/macb.h  | 1 +
>  drivers/net/ethernet/cadence/macb_main.c | 2 ++
>  2 files changed, 3 insertions(+)
>
> diff --git a/drivers/net/ethernet/cadence/macb.h b/drivers/net/ethernet/cadence/macb.h
> index c93f3a2..acb6578 100644
> --- a/drivers/net/ethernet/cadence/macb.h
> +++ b/drivers/net/ethernet/cadence/macb.h
> @@ -968,6 +968,7 @@ struct macb_queue {
> unsigned int    IMR;
> unsigned int    TBQP;
> unsigned int    TBQPH;
> +   unsigned int    RBQP;
>
> unsigned int    tx_head, tx_tail;
> struct macb_dma_desc    *tx_ring;
> diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
> index 72a67f7..623ae9c 100644
> --- a/drivers/net/ethernet/cadence/macb_main.c
> +++ b/drivers/net/ethernet/cadence/macb_main.c
> @@ -2875,6 +2875,7 @@ static int macb_init(struct platform_device *pdev)
> if (bp->hw_dma_cap & HW_DMA_CAP_64B)
> queue->TBQPH = GEM_TBQPH(hw_q - 1);
>  #endif
> +   queue->RBQP = GEM_RBQP(hw_q - 1);
> } else {
> /* queue0 uses legacy registers */
> queue->ISR  = MACB_ISR;
> @@ -2886,6 +2887,7 @@ static int macb_init(struct platform_device *pdev)
> if (bp->hw_dma_cap & HW_DMA_CAP_64B)
> queue->TBQPH = MACB_TBQPH;
>  #endif
> +   queue->RBQP = MACB_RBQP;
> }
>
> /* get irq: here we use the linux queue index, not the hardware
> --
> 2.7.4
>

[PATCH RESEND] net: phy: harmonize phy_id{,_mask} data type

2017-11-26 Thread Richard Leitner

From: Richard Leitner 

Previously, phy_id was u32 and phy_id_mask was unsigned int. As
phy_id_mask defines the significant bits of phy_id (and is therefore
the same size), these two variables should have the same data type.

Signed-off-by: Richard Leitner 
Reviewed-by: Florian Fainelli 
Reviewed-by: Andrew Lunn 
---
RESEND as suggested by Andrew Lunn
---
 include/linux/phy.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/phy.h b/include/linux/phy.h
index dc82a07cb4fd..e00fd9ce3bce 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -509,7 +509,7 @@ struct phy_driver {
struct mdio_driver_common mdiodrv;
u32 phy_id;
char *name;
-   unsigned int phy_id_mask;
+   u32 phy_id_mask;
u32 features;
u32 flags;
const void *driver_data;
-- 
2.11.0

[RFC PATCH 2/2] net: macb: Tie-off unused RX queues

2017-11-26 Thread Harini Katakam

From: "Edgar E. Iglesias" 

Currently, we only use the first receive queue and leave the
remaining DMA descriptor pointers pointing at 0.

Disable unused queues by connecting them to a looped descriptor
chain without free slots.

Signed-off-by: Edgar E. Iglesias 
Signed-off-by: Harini Katakam 
Signed-off-by: Michal Simek 
---
 drivers/net/ethernet/cadence/macb.h  |  2 ++
 drivers/net/ethernet/cadence/macb_main.c | 42 
 2 files changed, 44 insertions(+)

diff --git a/drivers/net/ethernet/cadence/macb.h b/drivers/net/ethernet/cadence/macb.h
index acb6578..974c801 100644
--- a/drivers/net/ethernet/cadence/macb.h
+++ b/drivers/net/ethernet/cadence/macb.h
@@ -994,6 +994,7 @@ struct macb {
unsigned int    rx_tail;
unsigned int    rx_prepared_head;
struct macb_dma_desc    *rx_ring;
+   struct macb_dma_desc    *rx_ring_tieoff;
struct sk_buff  **rx_skbuff;
void    *rx_buffers;
size_t  rx_buffer_size;
@@ -1019,6 +1020,7 @@ struct macb {
}   hw_stats;
 
dma_addr_t  rx_ring_dma;
+   dma_addr_t  rx_ring_tieoff_dma;
dma_addr_t  rx_buffers_dma;
 
struct macb_or_gem_ops  macbgem_ops;
diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
index 623ae9c..b14a04d 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -1755,6 +1755,12 @@ static void macb_free_consistent(struct macb *bp)
bp->rx_ring = NULL;
}
 
+   if (bp->rx_ring_tieoff) {
+   dma_free_coherent(&bp->pdev->dev, sizeof(bp->rx_ring_tieoff[0]),
+ bp->rx_ring_tieoff, bp->rx_ring_tieoff_dma);
+   bp->rx_ring_tieoff = NULL;
+   }
+
for (q = 0, queue = bp->queues; q < bp->num_queues; ++q, ++queue) {
kfree(queue->tx_skb);
queue->tx_skb = NULL;
@@ -1826,6 +1832,19 @@ static int macb_alloc_consistent(struct macb *bp)
 &bp->rx_ring_dma, GFP_KERNEL);
if (!bp->rx_ring)
goto out_err;
+
+   /* If we have more than one queue, allocate a tie off descriptor
+* that will be used to disable unused RX queues.
+*/
+   if (bp->num_queues > 1) {
+   bp->rx_ring_tieoff = dma_alloc_coherent(&bp->pdev->dev,
+   sizeof(bp->rx_ring_tieoff[0]),
+   &bp->rx_ring_tieoff_dma,
+   GFP_KERNEL);
+   if (!bp->rx_ring_tieoff)
+   goto out_err;
+   }
+
netdev_dbg(bp->dev,
   "Allocated RX ring of %d bytes at %08lx (mapped %p)\n",
   size, (unsigned long)bp->rx_ring_dma, bp->rx_ring);
@@ -1840,6 +1859,19 @@ static int macb_alloc_consistent(struct macb *bp)
return -ENOMEM;
 }
 
+static void macb_init_tieoff(struct macb *bp)
+{
+   struct macb_dma_desc *d = bp->rx_ring_tieoff;
+
+   if (bp->num_queues > 1) {
+   /* Setup a wrapping descriptor with no free slots
+* (WRAP and USED) to tie off/disable unused RX queues.
+*/
+   d->addr = MACB_BIT(RX_WRAP) | MACB_BIT(RX_USED);
+   d->ctrl = 0;
+   }
+}
+
 static void gem_init_rings(struct macb *bp)
 {
struct macb_queue *queue;
@@ -1862,6 +1894,7 @@ static void gem_init_rings(struct macb *bp)
bp->rx_prepared_head = 0;
 
gem_rx_refill(bp);
+   macb_init_tieoff(bp);
 }
 
 static void macb_init_rings(struct macb *bp)
@@ -1879,6 +1912,7 @@ static void macb_init_rings(struct macb *bp)
bp->queues[0].tx_head = 0;
bp->queues[0].tx_tail = 0;
desc->ctrl |= MACB_BIT(TX_WRAP);
+   macb_init_tieoff(bp);
 }
 
 static void macb_reset_hw(struct macb *bp)
@@ -2063,6 +2097,14 @@ static void macb_init_hw(struct macb *bp)
queue_writel(queue, TBQPH, upper_32_bits(queue->tx_ring_dma));
 #endif
 
+   /* We only use the first queue at the moment. Remaining
+* queues must be tied-off before we enable the receiver.
+*
+* See the documentation for receive_q1_ptr for more info.
+*/
+   if (q)
+   queue_writel(queue, RBQP, bp->rx_ring_tieoff_dma);
+
/* Enable interrupts */
queue_writel(queue, IER,
 MACB_RX_INT_FLAGS |
-- 
2.7.4

[RFC PATCH 1/2] net: macb: Add RBQP to the macb queues

2017-11-26 Thread Harini Katakam

From: "Edgar E. Iglesias" 

Add the RX queue pointer (RBQP) to the macb queues to make it accessible
for all of the available queues. Currently only the first RX queue is used.

Signed-off-by: Edgar E. Iglesias 
Signed-off-by: Harini Katakam 
Signed-off-by: Michal Simek 
---
 drivers/net/ethernet/cadence/macb.h  | 1 +
 drivers/net/ethernet/cadence/macb_main.c | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/cadence/macb.h b/drivers/net/ethernet/cadence/macb.h
index c93f3a2..acb6578 100644
--- a/drivers/net/ethernet/cadence/macb.h
+++ b/drivers/net/ethernet/cadence/macb.h
@@ -968,6 +968,7 @@ struct macb_queue {
unsigned int    IMR;
unsigned int    TBQP;
unsigned int    TBQPH;
+   unsigned int    RBQP;
 
unsigned int    tx_head, tx_tail;
struct macb_dma_desc    *tx_ring;
diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
index 72a67f7..623ae9c 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -2875,6 +2875,7 @@ static int macb_init(struct platform_device *pdev)
if (bp->hw_dma_cap & HW_DMA_CAP_64B)
queue->TBQPH = GEM_TBQPH(hw_q - 1);
 #endif
+   queue->RBQP = GEM_RBQP(hw_q - 1);
} else {
/* queue0 uses legacy registers */
queue->ISR  = MACB_ISR;
@@ -2886,6 +2887,7 @@ static int macb_init(struct platform_device *pdev)
if (bp->hw_dma_cap & HW_DMA_CAP_64B)
queue->TBQPH = MACB_TBQPH;
 #endif
+   queue->RBQP = MACB_RBQP;
}
 
/* get irq: here we use the linux queue index, not the hardware
-- 
2.7.4

Re: [PATCH RFC 2/2] veth: propagate bridge GSO to peer

2017-11-26 Thread Stephen Hemminger

On Sun, 26 Nov 2017 20:13:39 -0700
David Ahern  wrote:

> On 11/26/17 11:17 AM, Stephen Hemminger wrote:
> > This allows veth device in containers to see the GSO maximum
> > settings of the actual device being used for output.
> > 
> > Signed-off-by: Stephen Hemminger 
> > ---
> >  drivers/net/veth.c | 72 
> > ++
> >  1 file changed, 72 insertions(+)
> > 
> > diff --git a/drivers/net/veth.c b/drivers/net/veth.c
> > index f5438d0978ca..0c9ce156943b 100644
> > --- a/drivers/net/veth.c
> > +++ b/drivers/net/veth.c
> > @@ -511,17 +511,89 @@ static struct rtnl_link_ops veth_link_ops = {
> > .get_link_net   = veth_get_link_net,
> >  };
> >  
> > +/* When veth device is added to a bridge or other master device
> > + * then reflect the GSO max values from the upper device
> > + * to the other end of veth pair.
> > + */
> > +static void veth_change_upper(struct net_device *dev,
> > + const struct netdev_notifier_changeupper_info *info)
> > +{
> > +   struct net_device *upper = info->upper_dev;
> > +   struct net_device *peer;
> > +   struct veth_priv *priv;
> > +
> > +   if (dev->netdev_ops != &veth_netdev_ops)
> > +   return;
> > +
> > +   priv = netdev_priv(dev);
> > +   peer = rtnl_dereference(priv->peer);
> > +   if (!peer)
> > +   return;
> > +
> > +   if (upper) {
> > +   peer->gso_max_segs = upper->gso_max_segs;
> > +   peer->gso_max_size = upper->gso_max_size;
> > +   } else {
> > +   peer->gso_max_segs = GSO_MAX_SEGS;
> > +   peer->gso_max_size = GSO_MAX_SIZE;
> > +   }  
> 
> veth devices can be added to a VRF instead of a bridge, and I do not
> believe the gso propagation works for L3 master devices.
> 
> From a quick grep, team devices do not appear to handle gso changes either.

This code should still work correctly, but no optimization would happen.
The gso_max_size of the VRF or team device will still be GSO_MAX_SIZE, so
there would be no change. If VRF or team ever got smart enough to handle
GSO limits, then the algorithm would handle it.

Re: [PATCH iproute2] drop support for DECnet

2017-11-26 Thread Stephen Hemminger

On Sun, 26 Nov 2017 17:19:21 -0500 (EST)
David Miller  wrote:

> From: Stephen Hemminger 
> Date: Sun, 26 Nov 2017 12:11:18 -0800
> 
> > Nothing has been done in Linux to maintain DECnet
> > for years. The code is buggy and does not support net namespace.
> > This patch removes all support of it from iproute2
> > 
> > Signed-off-by: Stephen Hemminger   
> 
> Even if decnet had been moved to staging/ (which it hasn't) I'd say
> this is entirely premature Stephen.
> 
> As long as the decnet code is upstream, you can't just yank away
> support from the tooling.

I agree that it is probably too early to remove support completely.
Instead, why not just mark IPX and DECnet as deprecated and warn users
for a couple of releases?

Re: [RFC net-next 4/4] net: phy: Correctly process PHY_HALTED in phy_stop_machine()

2017-11-26 Thread Florian Fainelli

Hi Geert,

On 11/06/2017 07:50 AM, Geert Uytterhoeven wrote:
> Hi Florian,
> 
> On Tue, Oct 31, 2017 at 5:33 PM, Florian Fainelli wrote:
>> On 10/31/2017 08:26 AM, Geert Uytterhoeven wrote:
>>> On Mon, Oct 30, 2017 at 5:09 PM, Florian Fainelli wrote:
>>>> On 10/30/2017 06:56 AM, Geert Uytterhoeven wrote:
>>>>> On Thu, Oct 26, 2017 at 1:21 AM, Florian Fainelli wrote:
>>>>>> Marc reported that he was not getting the PHY library adjust_link()
>>>>>> callback function to run when calling phy_stop() + phy_disconnect()
>>>>>> which does not indeed happen because we set the state machine to
>>>>>> PHY_HALTED but we don't get to run it to process this state past that
>>>>>> point.
>>>>>>
>>>>>> Fix this with a synchronous call to phy_state_machine() in order to have
>>>>>> the state machine actually act on PHY_HALTED, set the PHY device's link
>>>>>> down, turn the network device's carrier off and finally call the
>>>>>> adjust_link() function.
>>>>>>
>>>>>> At the end of phy_state_machine() though, if we are going to be moving
>>>>>> from PHY_HALTED to PHY_HALTED, do not reschedule the state machine, this
>>>>>> is pointless.
>>>>>>
>>>>>> Reported-by: Marc Gonzalez
>>>>>> Fixes: a390d1f379cf ("phylib: convert state_queue work to delayed_work")
>>>>>> Signed-off-by: Marc Gonzalez
>>>>>> Signed-off-by: Florian Fainelli
>>>>>
>>>>> Thanks for your patch!
>>>>>
>>>>> Unfortunately, after applying this one, the last in your series, both
>>>>> sh73a0/kzm9g and r8a73a4/ape6evm start crashing again in the system
>>>>> suspend/resume path, due to register accesses while the device is already
>>>>> suspended:
>>>>
>>>> OK, seems like there is another path, uncovered by this patch that we
>>>> can be hitting, does the following patch below help?
>>>
>>> Unfortunately it doesn't help.
>>
>> OK :/
>>
>>>
>>>>> Unhandled fault: imprecise external abort (0x1406) at 0x0005b950
>>>
>>> Note that this is an imprecise external abort, i.e. its reporting may
>>> be delayed, and the backtrace may be inaccurate.
>>
>> True, can you help narrow it down with me? Can you confirm that
>> adjust_link() (assuming that is the problem) does not get called past
>> phy_stop_machine() as it should?
> 
> I've added some additional debug checks (keep track of both phy and
> smsc state, and refuse to access registers if smsc is disabled).

Thanks for doing that, and sorry for responding that late.

> 
> Apparently phy_stop_machine() is called twice:
>   - Once from mdio_bus_phy_suspend(), cfr. the first backtrace,
>   - A second time from smsc911x_suspend(), cfr. the second backtrace.
> 
> The second call causes a call to smsc911x_phy_adjust_link() while the smsc is
> already disabled, cfr. the third backtrace. This would trigger the imprecise
> external abort if I let it access the registers.
> 
> [ cut here ]
> WARNING: CPU: 0 PID: 1083 at drivers/net/phy/phy.c:597 phy_stop_machine+0x44/0xcc
> phy_stop_machine: phy running, good
> CPU: 0 PID: 1083 Comm: bash Not tainted 4.14.0-rc7-ape6evm-00443-gcdfc0e18a47e0bb3-dirty #637
> Hardware name: Generic R8A73A4 (Flattened Device Tree)
> [] (unwind_backtrace) from [] (show_stack+0x10/0x14)
> [] (show_stack) from [] (dump_stack+0xa4/0xdc)
> [] (dump_stack) from [] (__warn+0xcc/0xfc)
> [] (__warn) from [] (warn_slowpath_fmt+0x34/0x44)
> [] (warn_slowpath_fmt) from [] (phy_stop_machine+0x44/0xcc)
> [] (phy_stop_machine) from [] (mdio_bus_phy_suspend+0x24/0x40)
> [] (mdio_bus_phy_suspend) from [] (dpm_run_callback+0x17c/0x3ec)
> [] (dpm_run_callback) from [] (__device_suspend+0x498/0x6b0)
> [] (__device_suspend) from [] (dpm_suspend+0x1d8/0x568)
> [] (dpm_suspend) from [] (suspend_devices_and_enter+0x78/0xe98)
> [] (suspend_devices_and_enter) from [] (pm_suspend+0xa40/0xbec)
> [] (pm_suspend) from [] (state_store+0xac/0xcc)
> [] (state_store) from [] (kernfs_fop_write+0x190/0x1d0)
> [] (kernfs_fop_write) from [] (__vfs_write+0x20/0x11c)
> [] (__vfs_write) from [] (vfs_write+0xb8/0x144)
> [] (vfs_write) from [] (SyS_write+0x40/0x80)
> [] (SyS_write) from [] (ret_fast_syscall+0x0/0x28)
> ---[ end trace 8fc4c71351438007 ]---
> libphy: phy_stop_machine: Kicking state machine synchronously
> libphy: phy_stop_machine: Kicking state machine done
> [ cut here ]
> WARNING: CPU: 0 PID: 1083 at drivers/net/phy/phy.c:598 phy_stop_machine+0x64/0xcc
> phy_stop_machine: phy already stopped
> CPU: 0 PID: 1083 Comm: bash Tainted: G        W 4.14.0-rc7-ape6evm-00443-gcdfc0e18a47e0bb3-dirty #637
> Hardware name: Generic R8A73A4 (Flattened Device Tree)
> [] (unwind_backtrace) from [] (show_stack+0x10/0x14)
> [] (show_stack) from [] (dump_stack+0xa4/0xdc)
> [] (dump_stack) from [] (__warn+0xcc/0xfc)
> [] (__warn) from [] (warn_slowpath_fmt+0x34/0x44)
> [] (warn_slowpath_fmt) from [] (phy_stop_machine+0x64/0xcc)
> [] (phy_stop_machine) from [] (smsc911x_suspend+0x44/0xa4)
> [] (smsc911x_suspend) from [] (dpm_run_callback+0x17c/

Re: [PATCH RFC 2/2] veth: propagate bridge GSO to peer

2017-11-26 Thread David Ahern

On 11/26/17 11:17 AM, Stephen Hemminger wrote:
> This allows veth devices in containers to see the GSO maximum
> settings of the actual device being used for output.
> 
> Signed-off-by: Stephen Hemminger 
> ---
>  drivers/net/veth.c | 72 ++
>  1 file changed, 72 insertions(+)
> 
> diff --git a/drivers/net/veth.c b/drivers/net/veth.c
> index f5438d0978ca..0c9ce156943b 100644
> --- a/drivers/net/veth.c
> +++ b/drivers/net/veth.c
> @@ -511,17 +511,89 @@ static struct rtnl_link_ops veth_link_ops = {
>   .get_link_net   = veth_get_link_net,
>  };
>  
> +/* When veth device is added to a bridge or other master device
> + * then reflect the GSO max values from the upper device
> + * to the other end of veth pair.
> + */
> +static void veth_change_upper(struct net_device *dev,
> +   const struct netdev_notifier_changeupper_info *info)
> +{
> + struct net_device *upper = info->upper_dev;
> + struct net_device *peer;
> + struct veth_priv *priv;
> +
> + if (dev->netdev_ops != &veth_netdev_ops)
> + return;
> +
> + priv = netdev_priv(dev);
> + peer = rtnl_dereference(priv->peer);
> + if (!peer)
> + return;
> +
> + if (upper) {
> + peer->gso_max_segs = upper->gso_max_segs;
> + peer->gso_max_size = upper->gso_max_size;
> + } else {
> + peer->gso_max_segs = GSO_MAX_SEGS;
> + peer->gso_max_size = GSO_MAX_SIZE;
> + }

veth devices can be added to a VRF instead of a bridge, and I do not
believe the gso propagation works for L3 master devices.

From a quick grep, team devices do not appear to handle gso changes either.

unregister_netdevice: waiting for lo to become free. Usage count = 1

2017-11-26 Thread Cengiz Can

Hello!

In case anyone is wondering (as I was) whether this is still an issue: it's not.

It was fixed in 4.12.

Re: [PATCH net] openvswitch: fix the incorrect flow action alloc size

2017-11-26 Thread David Miller

From: zhangliping 
Date: Sat, 25 Nov 2017 22:02:12 +0800

> From: zhangliping 
> 
> If we want to add a datapath flow with more than 500 vxlan output
> actions, we get the following error reports:
>   openvswitch: netlink: Flow action size 32832 bytes exceeds max
>   openvswitch: netlink: Flow action size 32832 bytes exceeds max
>   openvswitch: netlink: Actions may not be safe on all matching packets
>   ... ...
> 
> It seems that we could simply enlarge MAX_ACTIONS_BUFSIZE to fix it, but
> this is not the root cause. For example, a vxlan output action needs
> about 60 bytes as an nlattr, but after it is converted to a flow
> action it occupies only 24 bytes. This means that we can still support
> more than 1000 vxlan output actions for a single datapath flow under
> the current 32k max limitation.
> 
> So even if nla_len(attr) is larger than MAX_ACTIONS_BUFSIZE, we
> shouldn't report EINVAL but should carry on, as the check can be
> done by reserve_sfa_size.
> 
> Signed-off-by: zhangliping 

Applied, thanks.
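As a rough sanity check of the numbers in the description, the sketch below computes how many VXLAN output actions fit under a 32 KiB limit at the approximate per-action sizes quoted there (about 60 bytes as a netlink attribute, about 24 bytes once converted). The macro names are made up for illustration, and the per-action sizes are the description's approximations, not exact kernel values.

```c
#include <stddef.h>

/* 32 KiB, the "32k max limitation" from the patch description. */
#define ACTIONS_LIMIT ((size_t)32 * 1024)

/* Approximate per-action sizes quoted in the description (assumptions). */
#define VXLAN_OUT_NLATTR_BYTES      60
#define VXLAN_OUT_FLOW_ACTION_BYTES 24

/* How many actions of a given encoded size fit under the limit. */
size_t actions_that_fit(size_t bytes_per_action)
{
	return ACTIONS_LIMIT / bytes_per_action;
}
```

At about 60 bytes each, the netlink encoding passes 32 KiB at roughly 550 actions, consistent with failures beyond 500 VXLAN outputs, while the converted form of more than 1300 actions would still fit.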

linux-next: manual merge of the net tree with Linus' tree

2017-11-26 Thread Stephen Rothwell

Hi all,

Today's linux-next merge of the net tree got a conflict in:

  net/rxrpc/call_object.c

between commit:

  e99e88a9d2b0 ("treewide: setup_timer() -> timer_setup()")

from Linus' tree and commit:

  9faaff593404 ("rxrpc: Provide a different lockdep key for call->user_mutex for kernel calls")

from the net tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc net/rxrpc/call_object.c
index 994dc2df57e4,7ee3d6ce5aa2..
--- a/net/rxrpc/call_object.c
+++ b/net/rxrpc/call_object.c
@@@ -114,7 -118,16 +118,15 @@@ struct rxrpc_call *rxrpc_alloc_call(str
goto nomem_2;
  
mutex_init(&call->user_mutex);
+ 
+   /* Prevent lockdep reporting a deadlock false positive between the afs
+* filesystem and sys_sendmsg() via the mmap sem.
+*/
+   if (rx->sk.sk_kern_sock)
+   lockdep_set_class(&call->user_mutex,
+ &rxrpc_call_user_mutex_lock_class_key);
+ 
 -  setup_timer(&call->timer, rxrpc_call_timer_expired,
 -  (unsigned long)call);
 +  timer_setup(&call->timer, rxrpc_call_timer_expired, 0);
INIT_WORK(&call->processor, &rxrpc_process_call);
INIT_LIST_HEAD(&call->link);
INIT_LIST_HEAD(&call->chan_wait_link);
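The conflict reflects the tree-wide move from setup_timer(), which handed the callback an opaque `unsigned long` cookie, to timer_setup(), whose callback receives the `struct timer_list *` itself and recovers the enclosing object, typically through from_timer(), a container_of() wrapper. The userspace sketch below mimics that pattern with hypothetical stand-in types (`fake_timer`, `fake_call`); it is an illustration, not the kernel timer API.

```c
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-ins for struct timer_list and struct rxrpc_call. */
struct fake_timer {
	void (*function)(struct fake_timer *t);
};

struct fake_call {
	int debug_id;
	int expired;
	struct fake_timer timer; /* embedded, like call->timer */
};

/* New-style callback: receives the timer, derives the owning call. */
void fake_call_timer_expired(struct fake_timer *t)
{
	struct fake_call *call = container_of(t, struct fake_call, timer);

	call->expired = 1;
}

/* Analogue of timer_setup(): no cookie is stored any more. */
void fake_timer_setup(struct fake_timer *t, void (*fn)(struct fake_timer *))
{
	t->function = fn;
}

/* Simulate the timer firing. */
void fake_timer_fire(struct fake_timer *t)
{
	t->function(t);
}
```

Because the callback can always get back to its container, the `(unsigned long)call` cast from the old call site simply disappears in the resolution; the trailing `0` passed to timer_setup() there is its flags argument.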

Re: [PATCH iproute2] drop support for DECnet

2017-11-26 Thread David Miller

From: Stephen Hemminger 
Date: Sun, 26 Nov 2017 12:11:18 -0800

> Nothing has been done in Linux to maintain DECnet
> for years. The code is buggy and does not support net namespace.
> This patch removes all support for it from iproute2.
> 
> Signed-off-by: Stephen Hemminger 

Even if decnet had been moved to staging/ (which it hasn't) I'd say
this is entirely premature Stephen.

As long as the decnet code is upstream, you can't just yank away
support from the tooling.

Thank you.

Re: [patch iproute2 00/11] tc: jsonify couple of qdics, filter and actions

2017-11-26 Thread Jiri Pirko

Sun, Nov 26, 2017 at 09:44:17PM CET, step...@networkplumber.org wrote:
>On Sat, 25 Nov 2017 15:48:24 +0100
>Jiri Pirko  wrote:
>
>> From: Jiri Pirko 
>> 
>> An example json output:
>> 
>> $ tc -s -j filter show dev ens8 egress
>> [{
>> "protocol": "ip",
>> "pref": 6001,
>> "kind": "flower",
>> "chain": 0
>> },{
>> "protocol": "ip",
>> "pref": 6001,
>> "kind": "flower",
>> "chain": 0,
>> "options": {
>> "handle": 1,
>> "keys": {
>> "eth_type": "ipv4",
>> "dst_ip": "192.168.250.1"
>> },
>> "not_in_hw": true,
>> "actions": [{
>> "order": 1,
>> "kind": "gact",
>> "control_action": {
>> "type": "drop"
>> },
>> "prob": {
>> "random_type": "none",
>> "control_action": {
>> "type": "pass"
>> },
>> "val": 0
>> },
>> "index": 1,
>> "ref": 1,
>> "bind": 1,
>> "installed": 1667830,
>> "last_used": 1667830,
>> "stats": {
>> "bytes": 0,
>> "packets": 0,
>> "drops": 0,
>> "overlimits": 0,
>> "requeues": 0,
>> "backlog": 0,
>> "qlen": 0,
>> "requeues": 0
>> },
>> "cookie": "a1b2c3d4bb"
>> }
>> }
>> }
>> }
>> ]
>> $ tc -s filter show dev ens8 egress
>> filter pref 6001 flower chain 0 
>> filter pref 6001 flower chain 0 handle 0x1 
>>   eth_type ipv4
>>   dst_ip 192.168.250.1
>>   not_in_hw
>> action order 1: gact action drop
>>  random type none pass val 0
>>  index 1 ref 1 bind 1 installed 16689 sec used 16689 sec
>> Action statistics:
>> Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
>> backlog 0b 0p requeues 0
>> cookie a1b2c3d4bb
>> 
>> ---
>> To be applied on top of my cookie fix patchset
>> 
>> Jiri Pirko (11):
>>   tc: jsonify qdisc core
>>   tc: jsonify stats2
>>   tc: jsonify fq_codel qdisc
>>   tc: jsonify htb qdisc
>>   tc: jsonify filter core
>>   tc: jsonify flower filter
>>   tc: jsonify matchall filter
>>   tc: jsonify actions core
>>   tc: jsonify gact action
>>   tc: jsonify mirred action
>>   tc: jsonify vlan action
>> 
>>  tc/f_flower.c   | 287 +---
>>  tc/f_matchall.c |  12 +--
>>  tc/m_action.c   |  22 +++--
>>  tc/m_gact.c |  18 ++--
>>  tc/m_mirred.c   |  46 +++--
>>  tc/m_vlan.c |  26 +++--
>>  tc/q_fq_codel.c |  25 +++--
>>  tc/q_htb.c  |  20 ++--
>>  tc/tc.c |   5 +-
>>  tc/tc_filter.c  |  47 ++
>>  tc/tc_qdisc.c   |  52 ++
>>  tc/tc_util.c|  66 +
>>  tc/tc_util.h|   1 +
>>  13 files changed, 396 insertions(+), 231 deletions(-)
>> 
>
>
>Applied, but other qdiscs need some jsonification as well.

and classes/actions. I agree.
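The jsonified code paths print each field once through a helper that emits either the plain-text form or a JSON key/value pair, as in the `print_string(PRINT_ANY, "family", "%s ", "inet")` calls in ip/ipaddress.c. The following is a simplified, hypothetical sketch of that idea that writes into a buffer so the result can be checked; `print_kv`, `out_buf` and `reset_out` are invented names, and iproute2's real helpers also handle nesting, escaping and separators.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical output buffer so the result can be inspected. */
static char out_buf[256];

void reset_out(void)
{
	out_buf[0] = '\0';
}

/* Emit one field either as "key": "value" (JSON) or via the text format. */
void print_kv(int json, const char *key, const char *fmt, const char *value)
{
	size_t used = strlen(out_buf);
	size_t room = sizeof(out_buf) - used;

	if (json)
		snprintf(out_buf + used, room, "\"%s\": \"%s\"", key, value);
	else
		snprintf(out_buf + used, room, fmt, value);
}
```

Keeping the key and the text format side by side at each call site is what lets one code path serve both `tc filter show` and `tc -j filter show`.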

Re: [patch iproute2 00/11] tc: jsonify couple of qdics, filter and actions

2017-11-26 Thread Stephen Hemminger

On Sat, 25 Nov 2017 15:48:24 +0100
Jiri Pirko  wrote:

> From: Jiri Pirko 
> 
> An example json output:
> 
> $ tc -s -j filter show dev ens8 egress
> [{
> "protocol": "ip",
> "pref": 6001,
> "kind": "flower",
> "chain": 0
> },{
> "protocol": "ip",
> "pref": 6001,
> "kind": "flower",
> "chain": 0,
> "options": {
> "handle": 1,
> "keys": {
> "eth_type": "ipv4",
> "dst_ip": "192.168.250.1"
> },
> "not_in_hw": true,
> "actions": [{
> "order": 1,
> "kind": "gact",
> "control_action": {
> "type": "drop"
> },
> "prob": {
> "random_type": "none",
> "control_action": {
> "type": "pass"
> },
> "val": 0
> },
> "index": 1,
> "ref": 1,
> "bind": 1,
> "installed": 1667830,
> "last_used": 1667830,
> "stats": {
> "bytes": 0,
> "packets": 0,
> "drops": 0,
> "overlimits": 0,
> "requeues": 0,
> "backlog": 0,
> "qlen": 0,
> "requeues": 0
> },
> "cookie": "a1b2c3d4bb"
> }
> }
> }
> }
> ]
> $ tc -s filter show dev ens8 egress
> filter pref 6001 flower chain 0 
> filter pref 6001 flower chain 0 handle 0x1 
>   eth_type ipv4
>   dst_ip 192.168.250.1
>   not_in_hw
> action order 1: gact action drop
>  random type none pass val 0
>  index 1 ref 1 bind 1 installed 16689 sec used 16689 sec
> Action statistics:
> Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
> backlog 0b 0p requeues 0
> cookie a1b2c3d4bb
> 
> ---
> To be applied on top of my cookie fix patchset
> 
> Jiri Pirko (11):
>   tc: jsonify qdisc core
>   tc: jsonify stats2
>   tc: jsonify fq_codel qdisc
>   tc: jsonify htb qdisc
>   tc: jsonify filter core
>   tc: jsonify flower filter
>   tc: jsonify matchall filter
>   tc: jsonify actions core
>   tc: jsonify gact action
>   tc: jsonify mirred action
>   tc: jsonify vlan action
> 
>  tc/f_flower.c   | 287 +---
>  tc/f_matchall.c |  12 +--
>  tc/m_action.c   |  22 +++--
>  tc/m_gact.c |  18 ++--
>  tc/m_mirred.c   |  46 +++--
>  tc/m_vlan.c |  26 +++--
>  tc/q_fq_codel.c |  25 +++--
>  tc/q_htb.c  |  20 ++--
>  tc/tc.c |   5 +-
>  tc/tc_filter.c  |  47 ++
>  tc/tc_qdisc.c   |  52 ++
>  tc/tc_util.c|  66 +
>  tc/tc_util.h|   1 +
>  13 files changed, 396 insertions(+), 231 deletions(-)
> 


Applied, but other qdiscs need some jsonification as well.

Re: [patch iproute2 0/2] tc: couple ouf action cookie printout fixes

2017-11-26 Thread Stephen Hemminger

On Sat, 25 Nov 2017 11:07:55 +0100
Jiri Pirko  wrote:

> From: Jiri Pirko 
> 
> Jiri Pirko (2):
>   tc: move action cookie print out of the stats if
>   tc: remove action cookie len from printout
> 
>  tc/m_action.c | 17 -
>  1 file changed, 8 insertions(+), 9 deletions(-)
> 

Applied,  thanks Jiri

[PATCH iproute2] drop support for DECnet

2017-11-26 Thread Stephen Hemminger

Nothing has been done in Linux to maintain DECnet
for years. The code is buggy and does not support net namespace.
This patch removes all support for it from iproute2.

Signed-off-by: Stephen Hemminger 
---
 Makefile|   3 --
 README.decnet   |  33 --
 README.lnstat   |   2 +-
 include/utils.h |   3 --
 ip/ip.c |   2 +-
 ip/ipaddress.c  |   2 --
 ip/ipntable.c   |   2 --
 lib/dnet_ntop.c | 101 
 lib/dnet_pton.c |  75 -
 lib/utils.c |  27 ++-
 man/man8/ip.8   |   5 ---
 11 files changed, 4 insertions(+), 251 deletions(-)
 delete mode 100644 README.decnet
 delete mode 100644 lib/dnet_ntop.c
 delete mode 100644 lib/dnet_pton.c

diff --git a/Makefile b/Makefile
index 6a51e0db9107..f3c2d5b90f7b 100644
--- a/Makefile
+++ b/Makefile
@@ -29,9 +29,6 @@ endif
 
 DEFINES+=-DCONFDIR=\"$(CONFDIR)\"
 
-#options for decnet
-ADDLIB+=dnet_ntop.o dnet_pton.o
-
 #options for ipx
 ADDLIB+=ipx_ntop.o ipx_pton.o
 
diff --git a/README.decnet b/README.decnet
deleted file mode 100644
index 4300f906d97b..
--- a/README.decnet
+++ /dev/null
@@ -1,33 +0,0 @@
-
-Here are a few quick points about DECnet support...
-
- o iproute2 is the tool of choice for configuring the DECnet support for
-   Linux. For many features, it is the only tool which can be used to
-   configure them.
-
- o No name resolution is available as yet, all addresses must be
-   entered numerically.
-
- o Remember to set the hardware address of the interface using: 
-
-   ip link set ethX address xx:xx:xx:xx:xx:xx
-  (where xx:xx:xx:xx:xx:xx is the MAC address for your DECnet node
-   address)
-
-   if your Ethernet card won't listen to more than one unicast
-   mac address at once. If the Linux DECnet stack doesn't talk to
-   any other DECnet nodes, then check this with tcpdump and if its
-   a problem, change the mac address (but do this _before_ starting
-   any other network protocol on the interface)
-
- o Whilst you can use ip addr add to add more than one DECnet address to an
-   interface, don't expect addresses which are not the same as the
-   kernels node address to work properly with 2.4 kernels. This should
-   be fine with 2.6 kernels as the routing code has been extensively
-   modified and improved.
-
- o The DECnet support is currently self contained. It does not depend on
-   the libdnet library.
-
-Steve Whitehouse 
-
diff --git a/README.lnstat b/README.lnstat
index 057925f671b7..59134a158c3b 100644
--- a/README.lnstat
+++ b/README.lnstat
@@ -9,7 +9,7 @@ In addition to routing cache statistics, it supports any kind of statistics
 the linux kernel exports via a file in /proc/net/stat.  In a stock 2.6.9
 kernel, this is 
per-protocol neighbour cache statistics 
-   (ipv4, ipv6, atm, decnet)
+   (ipv4, ipv6, atm)
routing cache statistics
(ipv4)
connection tracking statistics
diff --git a/include/utils.h b/include/utils.h
index d3895d562726..fb7b5d295254 100644
--- a/include/utils.h
+++ b/include/utils.h
@@ -142,9 +142,6 @@ int get_ifname(char *, const char *);
 int matches(const char *arg, const char *pattern);
 int inet_addr_match(const inet_prefix *a, const inet_prefix *b, int bits);
 
-const char *dnet_ntop(int af, const void *addr, char *str, size_t len);
-int dnet_pton(int af, const char *src, void *addr);
-
 const char *ipx_ntop(int af, const void *addr, char *str, size_t len);
 int ipx_pton(int af, const char *src, void *addr);
 
diff --git a/ip/ip.c b/ip/ip.c
index b15e6b66b3f6..bc95d0ab95cb 100644
--- a/ip/ip.c
+++ b/ip/ip.c
@@ -216,7 +216,7 @@ int main(int argc, char **argv)
} else if (strcmp(opt, "-I") == 0) {
preferred_family = AF_IPX;
} else if (strcmp(opt, "-D") == 0) {
-   preferred_family = AF_DECnet;
+   invarg("DECnet is not supported", opt);
} else if (strcmp(opt, "-M") == 0) {
preferred_family = AF_MPLS;
} else if (strcmp(opt, "-B") == 0) {
diff --git a/ip/ipaddress.c b/ip/ipaddress.c
index 8057011ef525..2a4939340be2 100644
--- a/ip/ipaddress.c
+++ b/ip/ipaddress.c
@@ -1576,8 +1576,6 @@ int print_addrinfo(const struct sockaddr_nl *who, struct nlmsghdr *n,
print_string(PRINT_ANY, "family", "%s ", "inet");
else if (ifa->ifa_family == AF_INET6)
print_string(PRINT_ANY, "family", "%s ", "inet6");
-   else if (ifa->ifa_family == AF_DECnet)
-   print_string(PRINT_ANY, "family", "%s ", "dnet");
else if (ifa->ifa_family == AF_IPX)
print_string(PRINT_ANY, "family", " %s ", "ipx");
else
diff --git a/ip/ipntable.c b/ip/ipntable.c
index 2f72c989f35d..a33bcdcf9f76 100644
--- a/ip/ipntable.c
+++ b/ip/ipntable

Re: [PATCH iproute2/master 00/11] bpf: pass ifindex for bpf offload

2017-11-26 Thread Stephen Hemminger

On Thu, 23 Nov 2017 18:11:57 -0800
Jakub Kicinski  wrote:

> Hi!
> 
> This series allows us to pass ifindex automatically when we
> set up TC cls_bpf or XDP offload.  There is a fair bit of
> refactoring to separate the parse and load stages of lib/bpf.c.
> In case of TC the skip_sw flag may come after the program
> arguments (e.g. "bpf obj prog.o da skip_sw"), so we can't
> just load the program as we parse the arguments.  Note that
> this only affects loading of the program; all other supported
> methods of finding a program (pinned, bytecode, bytecode-file)
> are handled as before, and the load call will do nothing for
> them.
> 
> To simplify the implementation f_bpf and m_bpf will no longer
> allow specifying programs multiple times.  Device ifindex is 
> also resolved before running filter-specific code.
> 
> 
> Jakub Kicinski (11):
>   bpf: pass program type in struct bpf_cfg_in
>   bpf: keep parsed program mode in struct bpf_cfg_in
>   bpf: allocate opcode table in struct bpf_cfg_in
>   bpf: split parse from program loading
>   bpf: rename bpf_parse_common() to bpf_parse_and_load_common()
>   bpf: expose bpf_parse_common() and bpf_load_common()
>   bpf: allow loading programs for a specific ifindex
>   {f,m}_bpf: don't allow specifying multiple bpf programs
>   tc_filter: resolve device name before parsing filter
>   f_bpf: communicate ifindex for eBPF offload
>   iplink: communicate ifindex for xdp offload
> 
>  include/bpf_util.h|  25 +++-
>  ip/iplink.c   |   4 +-
>  ip/iplink_xdp.c   |  13 -
>  ip/iproute_lwtunnel.c |   3 +-
>  ip/xdp.h  |   4 +-
>  lib/bpf.c | 155 ++
>  tc/f_bpf.c|  18 +-
>  tc/m_bpf.c|   6 +-
>  tc/tc_filter.c|  50 
>  9 files changed, 178 insertions(+), 100 deletions(-)
> 


Looks good, applied. Thanks Jakub
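The reason for splitting parsing from loading is ordering: `skip_sw` can appear after the object file, so whether to load for a device is only known once every argument has been consumed. A minimal sketch of that two-pass structure follows; the names (`cfg_sketch`, `parse_args`, `load_prog`) are hypothetical and do not correspond to lib/bpf.c, and recording the ifindex only when `skip_sw` is set is an assumption made for illustration.

```c
#include <string.h>

/* Hypothetical parsed configuration; only the fields needed here. */
struct cfg_sketch {
	const char *object;  /* e.g. "prog.o" */
	int direct_action;   /* "da" */
	int skip_sw;         /* "skip_sw" */
	int loaded_ifindex;  /* 0 = host load, >0 = offload target */
};

/* Pass 1: consume every argument, load nothing. */
int parse_args(struct cfg_sketch *cfg, int argc, const char **argv)
{
	memset(cfg, 0, sizeof(*cfg));
	for (int i = 0; i < argc; i++) {
		if (strcmp(argv[i], "obj") == 0 && i + 1 < argc)
			cfg->object = argv[++i];
		else if (strcmp(argv[i], "da") == 0)
			cfg->direct_action = 1;
		else if (strcmp(argv[i], "skip_sw") == 0)
			cfg->skip_sw = 1;
		else
			return -1; /* unknown argument */
	}
	return cfg->object ? 0 : -1;
}

/* Pass 2: load once the full configuration is known. */
int load_prog(struct cfg_sketch *cfg, int dev_ifindex)
{
	if (!cfg->object)
		return -1;
	/* Stand-in for the real load: just record the chosen ifindex. */
	cfg->loaded_ifindex = cfg->skip_sw ? dev_ifindex : 0;
	return 0;
}
```

With a single pass, "obj prog.o" would trigger the load before "skip_sw" was seen; two passes make the argument order irrelevant.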

Re: [PATCH net] openvswitch: fix the incorrect flow action alloc size

2017-11-26 Thread Pravin Shelar

On Sat, Nov 25, 2017 at 7:32 PM, zhangliping  wrote:
> From: zhangliping 
>
> If we want to add a datapath flow with more than 500 vxlan output
> actions, we get the following error reports:
>   openvswitch: netlink: Flow action size 32832 bytes exceeds max
>   openvswitch: netlink: Flow action size 32832 bytes exceeds max
>   openvswitch: netlink: Actions may not be safe on all matching packets
>   ... ...
>
> It seems that we could simply enlarge MAX_ACTIONS_BUFSIZE to fix it, but
> this is not the root cause. For example, a vxlan output action needs
> about 60 bytes as an nlattr, but after it is converted to a flow
> action it occupies only 24 bytes. This means that we can still support
> more than 1000 vxlan output actions for a single datapath flow under
> the current 32k max limitation.
>
> So even if nla_len(attr) is larger than MAX_ACTIONS_BUFSIZE, we
> shouldn't report EINVAL but should carry on, as the check can be
> done by reserve_sfa_size.
>
> Signed-off-by: zhangliping 

Thanks for the patch.

Acked-by: Pravin B Shelar
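The approach described is to stop rejecting flows based on the size of the netlink attribute and let the size of the converted actions decide. A userspace sketch of that idea, with hypothetical names (`initial_alloc`, `reserve_ok`) standing in for the allocation and the reserve_sfa_size() check: the first buffer is clamped to the limit instead of failing, and an error is reported only when converted actions would really exceed it.

```c
#include <stddef.h>

#define ACTIONS_LIMIT ((size_t)32 * 1024) /* the 32k limit from the description */

/* Size of the first allocation: clamp instead of failing with EINVAL. */
size_t initial_alloc(size_t nla_len)
{
	return nla_len < ACTIONS_LIMIT ? nla_len : ACTIONS_LIMIT;
}

/* 1 if `more` bytes of converted actions still fit after `used`, else 0. */
int reserve_ok(size_t used, size_t more)
{
	return used + more <= ACTIONS_LIMIT;
}
```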

[PATCH RFC 1/2] br: add notifier for when bridge changes it GSO maximums

2017-11-26 Thread Stephen Hemminger

Add a notifier that fires when the minimum GSO values, calculated
across all the bridge ports, change. This allows veth to adjust
based on the devices in the bridge.

Signed-off-by: Stephen Hemminger 
---
 include/linux/netdevice.h | 1 +
 net/bridge/br_if.c| 7 +++
 2 files changed, 8 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index ef789e1d679e..0da966ffec70 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2326,6 +2326,7 @@ struct netdev_lag_lower_state_info {
 #define NETDEV_UDP_TUNNEL_PUSH_INFO0x001C
 #define NETDEV_UDP_TUNNEL_DROP_INFO0x001D
 #define NETDEV_CHANGE_TX_QUEUE_LEN 0x001E
+#define NETDEV_CHANGE_GSO_MAX  0x001F
 
 int register_netdevice_notifier(struct notifier_block *nb);
 int unregister_netdevice_notifier(struct notifier_block *nb);
diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
index 9ba4ed65c52b..ca4ccadd78d0 100644
--- a/net/bridge/br_if.c
+++ b/net/bridge/br_if.c
@@ -453,8 +453,15 @@ static void br_set_gso_limits(struct net_bridge *br)
gso_max_size = min(gso_max_size, p->dev->gso_max_size);
gso_max_segs = min(gso_max_segs, p->dev->gso_max_segs);
}
+
+   if (br->dev->gso_max_size == gso_max_size &&
+   br->dev->gso_max_segs == gso_max_segs)
+   return;
+
br->dev->gso_max_size = gso_max_size;
br->dev->gso_max_segs = gso_max_segs;
+
+   call_netdevice_notifiers(NETDEV_CHANGE_GSO_MAX, br->dev);
 }
 
 /*
-- 
2.11.0
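The br_set_gso_limits() change boils down to: take the minimum gso_max_size and gso_max_segs over all ports, and fire the notifier only if either value actually changed. A userspace sketch with hypothetical types (`port_limits`, `set_gso_limits`), assuming the scan starts from the GSO_MAX_SIZE/GSO_MAX_SEGS defaults, with a counter standing in for call_netdevice_notifiers():

```c
#include <stddef.h>

#define SKETCH_GSO_MAX_SIZE 65536u /* GSO_MAX_SIZE in 4.14-era kernels */
#define SKETCH_GSO_MAX_SEGS 65535u /* GSO_MAX_SEGS in 4.14-era kernels */

struct port_limits {
	unsigned int gso_max_size;
	unsigned int gso_max_segs;
};

/* Counts how often the (simulated) notifier chain would be called. */
int notify_count;

void set_gso_limits(struct port_limits *br, const struct port_limits *ports,
		    size_t nports)
{
	unsigned int size = SKETCH_GSO_MAX_SIZE;
	unsigned int segs = SKETCH_GSO_MAX_SEGS;

	for (size_t i = 0; i < nports; i++) {
		if (ports[i].gso_max_size < size)
			size = ports[i].gso_max_size;
		if (ports[i].gso_max_segs < segs)
			segs = ports[i].gso_max_segs;
	}

	/* Unchanged limits: no notification, like the early return. */
	if (br->gso_max_size == size && br->gso_max_segs == segs)
		return;

	br->gso_max_size = size;
	br->gso_max_segs = segs;
	notify_count++; /* stands in for NETDEV_CHANGE_GSO_MAX */
}
```

The early return matters because every port add or delete recomputes the limits; without it, listeners would be woken even when nothing changed.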

[PATCH RFC 2/2] veth: propagate bridge GSO to peer

2017-11-26 Thread Stephen Hemminger

This allows veth devices in containers to see the GSO maximum
settings of the actual device being used for output.

Signed-off-by: Stephen Hemminger 
---
 drivers/net/veth.c | 72 ++
 1 file changed, 72 insertions(+)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index f5438d0978ca..0c9ce156943b 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -511,17 +511,89 @@ static struct rtnl_link_ops veth_link_ops = {
.get_link_net   = veth_get_link_net,
 };
 
+/* When veth device is added to a bridge or other master device
+ * then reflect the GSO max values from the upper device
+ * to the other end of veth pair.
+ */
+static void veth_change_upper(struct net_device *dev,
+ const struct netdev_notifier_changeupper_info *info)
+{
+   struct net_device *upper = info->upper_dev;
+   struct net_device *peer;
+   struct veth_priv *priv;
+
+   if (dev->netdev_ops != &veth_netdev_ops)
+   return;
+
+   priv = netdev_priv(dev);
+   peer = rtnl_dereference(priv->peer);
+   if (!peer)
+   return;
+
+   if (upper) {
+   peer->gso_max_segs = upper->gso_max_segs;
+   peer->gso_max_size = upper->gso_max_size;
+   } else {
+   peer->gso_max_segs = GSO_MAX_SEGS;
+   peer->gso_max_size = GSO_MAX_SIZE;
+   }
+}
+
+static void veth_change_upper_gso(struct net_device *upper)
+{
+   struct net_device *peer, *dev;
+   struct veth_priv *priv;
+
+   for_each_netdev(dev_net(upper), dev) {
+   if (dev->netdev_ops != &veth_netdev_ops)
+   continue;
+   if (!netdev_has_upper_dev(dev, upper))
+   continue;
+
+   priv = netdev_priv(dev);
+   peer = rtnl_dereference(priv->peer);
+   if (!peer)
+   continue;
+   peer->gso_max_segs = upper->gso_max_segs;
+   peer->gso_max_size = upper->gso_max_size;
+   }
+}
+
+static int veth_netdev_event(struct notifier_block *this,
+unsigned long event, void *ptr)
+{
+   struct net_device *event_dev = netdev_notifier_info_to_dev(ptr);
+
+   /* Propagate the upper (bridge) device settings to peer */
+   switch (event) {
+   case NETDEV_CHANGEUPPER:
+   veth_change_upper(event_dev, ptr);
+   break;
+   case NETDEV_CHANGE_GSO_MAX:
+   veth_change_upper_gso(event_dev);
+   break;
+   }
+
+   return NOTIFY_DONE;
+}
+
+static struct notifier_block veth_netdev_notifier = {
+   .notifier_call = veth_netdev_event,
+};
+
 /*
  * init/fini
  */
 
 static __init int veth_init(void)
 {
+   register_netdevice_notifier(&veth_netdev_notifier);
return rtnl_link_register(&veth_link_ops);
 }
 
 static __exit void veth_exit(void)
 {
+   unregister_netdevice_notifier(&veth_netdev_notifier);
rtnl_link_unregister(&veth_link_ops);
 }
 
-- 
2.11.0
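Stripped of the notifier plumbing, veth_change_upper() mirrors the upper device's limits onto the peer while the veth is enslaved and restores the defaults when it is released. A userspace sketch with a hypothetical stand-in type (`fake_dev`). One deliberate difference: in the kernel, NETDEV_CHANGEUPPER notifications carry the upper device on unlink as well, with a separate `linking` field in struct netdev_notifier_changeupper_info telling the two cases apart, so this sketch takes an explicit `linking` argument instead of testing for a NULL upper pointer.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_GSO_MAX_SIZE 65536u /* GSO_MAX_SIZE in 4.14-era kernels */
#define SKETCH_GSO_MAX_SEGS 65535u /* GSO_MAX_SEGS in 4.14-era kernels */

/* Hypothetical stand-in for the few net_device fields used. */
struct fake_dev {
	unsigned int gso_max_size;
	unsigned int gso_max_segs;
	struct fake_dev *peer; /* other end of the veth pair, may be NULL */
};

/* dev was linked to (linking=true) or unlinked from (false) upper. */
void veth_sketch_change_upper(struct fake_dev *dev,
			      const struct fake_dev *upper, bool linking)
{
	struct fake_dev *peer = dev->peer;

	if (!peer)
		return;

	if (linking && upper) {
		/* Inherit the (bridge) upper device's limits. */
		peer->gso_max_size = upper->gso_max_size;
		peer->gso_max_segs = upper->gso_max_segs;
	} else {
		/* Back to the defaults once detached. */
		peer->gso_max_size = SKETCH_GSO_MAX_SIZE;
		peer->gso_max_segs = SKETCH_GSO_MAX_SEGS;
	}
}
```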

[PATCH RFC 0/2] veth, bridge, and GSO maximums

2017-11-26 Thread Stephen Hemminger

This pair of patches improves performance when running
containers in an environment where the underlying device has a lower
GSO maximum (such as Azure).

With containers, a veth pair is created and one end is attached
to the bridge device. The bridge device correctly computes
GSO parameters that are the minimum of the lower devices.

The problem is that the other end of the veth device (in the container)
reports the full GSO size. This patch propagates the upper
(bridge device) parameters to the other end of the veth device.

Please consider it as an alternative to the sysfs GSO changes.

Stephen Hemminger (2):
  br: add notifier for when bridge changes it GSO maximums
  veth: propagate bridge GSO to peer

 drivers/net/veth.c| 72 +++
 include/linux/netdevice.h |  1 +
 net/bridge/br_if.c|  7 +
 3 files changed, 80 insertions(+)

-- 
2.11.0

Re: [PATCH RFC] veth: make veth aware of gso buffer size

2017-11-26 Thread Stephen Hemminger

On Sat, 25 Nov 2017 13:26:52 -0800
Solio Sarabia  wrote:

> GSO buffer size supported by underlying devices is not propagated to
> veth. In high-speed connections with hw TSO enabled, veth sends buffers
> bigger than lower device's maximum GSO, forcing sw TSO and increasing
> system CPU usage.
> 
> Signed-off-by: Solio Sarabia 
> ---
> Exposing gso_max_size via sysfs is not advised [0]. This patch queries
> available interfaces to get this value. Reading dev_list is O(n); since it
> can be large (e.g. hundreds of containers), only a subset of interfaces
> is inspected.  _Please_ advise on how to make veth aware of the lower
> device's GSO value.
> 
> In a test scenario with Hyper-V, Ubuntu VM, Docker inside VM, and NTttcp
> microworkload sending 40 Gbps from one container, this fix reduces sender
> host CPU overhead by 3x, since all TSO is now done on the physical NIC.
> Savings in CPU cycles benefit other use cases where veth is used, and
> the GSO buffer size is properly set.
> 
> [0] https://lkml.org/lkml/2017/11/24/512
> 
>  drivers/net/veth.c | 30 ++
>  1 file changed, 30 insertions(+)
> 
> diff --git a/drivers/net/veth.c b/drivers/net/veth.c
> index f5438d0..e255b51 100644
> --- a/drivers/net/veth.c
> +++ b/drivers/net/veth.c
> @@ -298,6 +298,34 @@ static const struct net_device_ops veth_netdev_ops = {
>  NETIF_F_HW_VLAN_CTAG_TX | NETIF_F_HW_VLAN_CTAG_RX | \
>  NETIF_F_HW_VLAN_STAG_TX | NETIF_F_HW_VLAN_STAG_RX )
>  
> +static void veth_set_gso(struct net_device *dev)
> +{
> + struct net_device *nd;
> + unsigned int size = GSO_MAX_SIZE;
> + u16 segs = GSO_MAX_SEGS;
> + unsigned int count = 0;
> + const unsigned int limit = 10;
> +
> + /* Set default gso based on available physical/synthetic devices,
> +  * ignore virtual interfaces, and limit looping through dev_list
> +  * as the total number of interfaces can be large.
> +  */
> + read_lock(&dev_base_lock);
> + for_each_netdev(&init_net, nd) {
> + if (count >= limit)
> + break;
> + if (nd->dev.parent && nd->flags & IFF_UP) {
> + size = min(size, nd->gso_max_size);
> + segs = min(segs, nd->gso_max_segs);
> + }
> + count++;
> + }
> +
> + read_unlock(&dev_base_lock);
> + netif_set_gso_max_size(dev, size);
> + dev->gso_max_segs = size ? size - 1 : 0;
> +}

Thanks for looking for a solution.

Looking at the first 10 devices (including those not related to veth) is not that
great a method. There may be 100's of tunnels, and there is no guarantee of ordering
in the device list. And what about network namespaces? Looking in the root namespace
is suspect as well.

The locking also looks wrong. veth_setup is called with RTNL held
(from __rtnl_link_register). Therefore acquiring dev_base_lock is not necessary.
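The ordering objection can be seen in a small model of the quoted scan: with a cap on how many list entries are examined, the computed minimum depends on where the relevant device happens to sit in the list. The sketch uses a hypothetical array (`sketch_dev`) in place of the device list and, unlike the quoted patch (which computes `segs` but then assigns `size - 1` to `gso_max_segs`), returns both minima.

```c
#include <stddef.h>

#define SKETCH_GSO_MAX_SIZE 65536u
#define SKETCH_GSO_MAX_SEGS 65535u

struct sketch_dev {
	int is_physical_and_up; /* stands in for dev.parent && IFF_UP */
	unsigned int gso_max_size;
	unsigned int gso_max_segs;
};

/* Minimum over the first `limit` entries only, as in the quoted scan. */
void capped_min(const struct sketch_dev *devs, size_t n, size_t limit,
		unsigned int *size, unsigned int *segs)
{
	*size = SKETCH_GSO_MAX_SIZE;
	*segs = SKETCH_GSO_MAX_SEGS;
	for (size_t i = 0; i < n && i < limit; i++) {
		if (!devs[i].is_physical_and_up)
			continue; /* still counts toward the limit */
		if (devs[i].gso_max_size < *size)
			*size = devs[i].gso_max_size;
		if (devs[i].gso_max_segs < *segs)
			*segs = devs[i].gso_max_segs;
	}
}
```

The same set of devices, listed in a different order, can yield different limits, which is why deriving the values from the actual upper device (as in the bridge notifier RFC) is more robust.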

Re: [PATCH] net: openvswitch: datapath: fix data type in queue_gso_packets

2017-11-26 Thread David Miller

From: Willem de Bruijn 
Date: Sat, 25 Nov 2017 16:15:01 -0500

> On Sat, Nov 25, 2017 at 2:14 PM, Gustavo A. R. Silva
>  wrote:
>> gso_type is being used in binary AND operations together with SKB_GSO_UDP.
>> The issue is that variable gso_type is of type unsigned short and
>> SKB_GSO_UDP expands to more than 16 bits:
>>
>> SKB_GSO_UDP = 1 << 16
>>
>> this makes any binary AND operation between gso_type and SKB_GSO_UDP
>> always evaluate to zero, hence making some code unreachable and likely
>> causing undesired behavior.
>>
>> Fix this by changing the data type of variable gso_type to unsigned int.
>>
>> Addresses-Coverity-ID: 1462223
>> Fixes: 0c19f846d582 ("net: accept UFO datagrams from tuntap and packet")
>> Signed-off-by: Gustavo A. R. Silva 
> 
> Acked-by: Willem de Bruijn 

Applied, and I've queued this up with Willem's changes for -stable.

Thanks!
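The truncation is easy to reproduce in plain C: converting a value with bit 16 set to `unsigned short` discards that bit, so the following AND can never be non-zero. A standalone sketch; the function names are made up, and only the `1 << 16` value comes from the patch description.

```c
#include <stdbool.h>

/* Value given in the patch description: SKB_GSO_UDP = 1 << 16. */
#define SKETCH_SKB_GSO_UDP (1u << 16)

/* Before the fix: gso_type held in 16 bits, so bit 16 is lost. */
bool is_udp_short(unsigned int skb_gso_type)
{
	unsigned short gso_type = skb_gso_type; /* silently truncates */

	return gso_type & SKETCH_SKB_GSO_UDP;
}

/* After the fix: unsigned int keeps bit 16. */
bool is_udp_int(unsigned int skb_gso_type)
{
	unsigned int gso_type = skb_gso_type;

	return gso_type & SKETCH_SKB_GSO_UDP;
}
```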

Re: [PATCH net] net: qmi_wwan: add support for Cinterion PLS8

2017-11-26 Thread Reinhard Speyerer

On Fri, Nov 24, 2017 at 07:25:19PM +0100, Bjørn Mork wrote:
> Reinhard Speyerer  writes:
> 
> > before posting this problem report
> > https://developer.gemalto.com/threads/ipv6dualstack-problems-pls8-e-revision-03017
> > in the Gemalto developer forum I tested the qmi_wwan/cdc_ether changes
> > you suggested above and apart from having two working QMI interfaces
> > the IPv6/dualstack problems observed with AT^SWWAN/cdc_ether were
> > also gone when using WDSStartNetworkInterface and the QMI interface in
> > raw IP mode instead.
> 
> Right. I did not know about the "carrier off" issue. But messed up
> ethernet headers is a well known problem with all these Qualcomm based
> modems. Switching them to raw IP mode is often the only way to make them
> work consistently.
> 
> Having seen this problem with multiple vendors, where some even have
> borrowed our workarounds for their own out-of-tree drivers, makes me
> pretty sure that it isn't easily fixable. It's a Qualcomm bug, and I
> guess no one is allowed to even look at the code.  Much less change it.
> Which makes sense given the mess it must be...
> 
> > Unfortunately Gemalto does not seem to be willing to provide an
> > alternative USB composition which includes QMI interfaces for the
> > PLS8. Therefore applying the above changes to qmi_wwan/cdc_ether might
> > make the PLS8 network interfaces stop working when Gemalto decides to
> > replace their f_rmnet gadget in CDCECM mode with a f_ecm gadget when
> > releasing a firmware update.
> 
> I don't think this is necessarily a problem. Only the QMI control
> channel will stop working should this happen.  The qmi_wwan driver will
> provide the same network device support as cdc_ether, using CDC ECM
> framing.
> 
> And to be honest, such a redesign of the modem application for a mature
> product is very unlikely, isn't it?  Why would Gemalto want to do all
> that extra work, taking the risks involved?  For what possible purpose?
> This is probably the reason they don't want to mess with alternative USB
> compositions either.
> 
> In any case, I think it is worth adding this device to qmi_wwan if it
> works with current firmwares and you, or anyone else, finds it useful.
> And it does sound like that, based on the IPv6 issues you mention.
> 
> But I'll leave the decision to you or anyone else with such a device.

Hi Bjørn,

given that the PLS8 USB PID 0x0061 is also used by firmware version
02.x, which was released quite some time ago, I'm afraid switching
it from cdc_ether to qmi_wwan in the mainline Linux kernel now might
break too many existing/working setups, even if only because of the
changed interface name.

Since Gemalto also seems to have moved away from supporting USB
compositions with a QMI interface with newer firmware versions in
general they might as well reserve the right to reject problem reports
submitted to them when not using their AT^SWWAN/CDCECM setup.

Therefore I will not submit patches which switch PLS8 with USB PID
0x0061 from the cdc_ether to the qmi_wwan driver. If there are other
PLS8 users who don't share my concerns: please feel free to submit the
patches yourself.

Let's hope that Gemalto is able to fix the AT^SWWAN/CDCECM-specific
IPv6/dualstack problems in their forthcoming PLS8 firmware version 04.x
or provides an alternative USB composition if the effort for fixing the
bugs would be too high.

Regards,
Reinhard

Linux 4.14 - regression: broken tun/tap / bridge network with virtio - bisected

2017-11-26 Thread Andreas Hartmann

Hello!

Since Linux 4.14 (running as host), the virtual network based on bridge
and tun/tap-devices is partly broken. Linux 4.13.x or earlier works
perfectly.


Given is the following architecture on host:

VM1 -> tun/tap -> br1
VM2 -> tun/tap -> br0 / br1
VM3 -> tun/tap -> br0

Example network configuration of the VMs:


  
  
  



Host is connected through br1.


VM2 is the router between two different networks provided by br1 and br0.

VM3 can be reached by the host via ssh through the router.
There are some more VMs in the network provided by br0 - e.g. VM4.

Almost all VMs are CentOS 7 / 64-bit (Linux 3.10.x) VMs provided by
kvm_amd. VM1 uses Linux 4.4.x.


Now, VM1 sends a UDP IPv4 packet to VM3 (the first RADIUS message, an
Access Request, sent by hostapd while initializing an EAP-TLS handshake
with a WLAN client). This packet is answered by VM3 (Access Challenge)
and received by VM1.

Next, VM1 sends the second Access Request. I'm not sure whether this packet
is still received by VM3 or not. But this much is certain: VM1 never gets any
answer and the connection to VM3 is now *completely dead*. It isn't even
possible to reach VM3 by ssh any more.

There aren't any log messages - neither on the host, nor on the VM.
Other VMs aren't affected.


Bisecting the problem leads to this patch:

2ddf71e23cc246e95af72a6deed67b4a50a7b81c
net: add notifier hooks for devmap bpf map


It turns out that the problem can be worked around by using e1000 instead
of virtio as the VM interface for VM1 and VM3:


  
  
  
  



Would it be possible to fix this problem to get it working again with
virtio? Do you need some more information? Feel free to ask!



Thanks,
Andreas

[PATCH net] vxlan: use be32 type for the param vni in vxlan_fdb_delete

2017-11-26 Thread Xin Long

All callers of __vxlan_fdb_delete pass vni with __be32 type, and
this param should be declared as __be32 type.

Fixes: 3ad7a4b141eb ("vxlan: support fdb and learning in COLLECT_METADATA mode")
Signed-off-by: Xin Long 
---
 drivers/net/vxlan.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 7ac4870..19b9cc5 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -874,8 +874,8 @@ static int vxlan_fdb_add(struct ndmsg *ndm, struct nlattr *tb[],
 
 static int __vxlan_fdb_delete(struct vxlan_dev *vxlan,
  const unsigned char *addr, union vxlan_addr ip,
- __be16 port, __be32 src_vni, u32 vni, u32 ifindex,
- u16 vid)
+ __be16 port, __be32 src_vni, __be32 vni,
+ u32 ifindex, u16 vid)
 {
struct vxlan_fdb *f;
struct vxlan_rdst *rd = NULL;
-- 
2.1.0
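
For readers less familiar with sparse: a __bitwise type such as __be32 is treated by sparse as distinct from plain u32. That distinction is how a host-order vni parameter receiving network-order values gets flagged. Plain C compilers ignore the annotation, so the sketch below approximates it with a one-member struct. The names be32_t, vni_to_be32(), be32_to_vni() and fdb_vni_match() are hypothetical and are not kernel APIs.

```c
#include <arpa/inet.h> /* htonl, ntohl */
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for sparse's __be32: a distinct type that the
 * compiler will not silently convert to or from uint32_t. */
typedef struct {
	uint32_t raw; /* value stored in network (big-endian) byte order */
} be32_t;

/* Convert a host-order VNI into the big-endian wrapper type. */
static be32_t vni_to_be32(uint32_t host_vni)
{
	be32_t v = { htonl(host_vni) };

	return v;
}

/* Convert back to host order, e.g. for printing. */
static uint32_t be32_to_vni(be32_t v)
{
	return ntohl(v.raw);
}

/* Both sides are be32_t, so passing a host-order uint32_t here fails to
 * compile instead of comparing values in mismatched byte orders. */
static bool fdb_vni_match(be32_t entry_vni, be32_t vni)
{
	return entry_vni.raw == vni.raw;
}
```

In this model, passing a bare uint32_t to fdb_vni_match() is a compile error. That is roughly the class of mistake sparse reports for the kernel's annotated types.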

[PATCH net] bonding: use nla_get_u64 to extract the value for IFLA_BOND_AD_ACTOR_SYSTEM

2017-11-26 Thread Xin Long

bond_opt_initval expects a u64 type param, so it's better to use
nla_get_u64 to extract the value here; this eliminates a sparse
endianness mismatch warning.

Fixes: 171a42c38c6e ("bonding: add netlink support for sys prio, actor sys mac, and port key")
Signed-off-by: Xin Long 
---
 drivers/net/bonding/bond_netlink.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/bonding/bond_netlink.c b/drivers/net/bonding/bond_netlink.c
index a1b33aa..9697977 100644
--- a/drivers/net/bonding/bond_netlink.c
+++ b/drivers/net/bonding/bond_netlink.c
@@ -423,7 +423,7 @@ static int bond_changelink(struct net_device *bond_dev, struct nlattr *tb[],
return -EINVAL;
 
bond_opt_initval(&newval,
-nla_get_be64(data[IFLA_BOND_AD_ACTOR_SYSTEM]));
+nla_get_u64(data[IFLA_BOND_AD_ACTOR_SYSTEM]));
err = __bond_opt_set(bond, BOND_OPT_AD_ACTOR_SYSTEM, &newval);
if (err)
return err;
-- 
2.1.0
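
Both nla_get_u64() and nla_get_be64() copy the attribute's eight bytes as-is, without byte swapping. Switching between them should therefore change only the type sparse sees, not the value. The userspace sketch below models that. It assumes, as an illustration and not as something taken from the patch, that the actor system MAC occupies the first six raw bytes of the 64-bit value. The helpers attr_get_u64(), mac_from_u64(), mac_roundtrip_ok() and the MAC_LEN constant are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6 /* hypothetical local name for the Ethernet address length */

/* Models reading an 8-byte attribute payload: a raw copy, no byte swapping. */
static uint64_t attr_get_u64(const unsigned char payload[8])
{
	uint64_t v;

	memcpy(&v, payload, sizeof(v));
	return v;
}

/* Assumption: the MAC is the first six bytes of the value's memory image. */
static void mac_from_u64(uint64_t v, unsigned char mac[MAC_LEN])
{
	memcpy(mac, &v, MAC_LEN);
}

/* Returns 1 if the MAC bytes survive the attribute read and byte view unchanged. */
static int mac_roundtrip_ok(const unsigned char payload[8])
{
	unsigned char mac[MAC_LEN];

	mac_from_u64(attr_get_u64(payload), mac);
	return memcmp(mac, payload, MAC_LEN) == 0;
}
```

No byte swap happens on either side, so labeling the intermediate value u64 or be64 makes no difference to the bytes. With nla_get_u64() the label simply matches what bond_opt_initval() expects.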

[PATCH net] sctp: use right member as the param of list_for_each_entry

2017-11-26 Thread Xin Long

Commit d04adf1b3551 ("sctp: reset owner sk for data chunks on out queues
when migrating a sock") made a mistake by using 'list' as the member param
of list_for_each_entry to traverse the retransmit, sacked and abandoned
queues, while chunks are linked into these queues via 'transmitted_list'.

It could cause a NULL pointer dereference panic if there are chunks in any
of these queues when peeling off one asoc.

So use the chunk member 'transmitted_list' instead in this patch.

Fixes: d04adf1b3551 ("sctp: reset owner sk for data chunks on out queues when migrating a sock")
Signed-off-by: Xin Long 
---
 net/sctp/socket.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 3204a9b..014847e 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -188,13 +188,13 @@ static void sctp_for_each_tx_datachunk(struct sctp_association *asoc,
list_for_each_entry(chunk, &t->transmitted, transmitted_list)
cb(chunk);
 
-   list_for_each_entry(chunk, &q->retransmit, list)
+   list_for_each_entry(chunk, &q->retransmit, transmitted_list)
cb(chunk);
 
-   list_for_each_entry(chunk, &q->sacked, list)
+   list_for_each_entry(chunk, &q->sacked, transmitted_list)
cb(chunk);
 
-   list_for_each_entry(chunk, &q->abandoned, list)
+   list_for_each_entry(chunk, &q->abandoned, transmitted_list)
cb(chunk);
 
list_for_each_entry(chunk, &q->out_chunk_list, list)
-- 
2.1.0
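
This section explains why the wrong member matters. list_for_each_entry() uses container_of(), which subtracts the offset of the named member from each node pointer. Suppose the nodes are linked through transmitted_list but the walk names list. Then every chunk pointer handed to the callback is off by the distance between the two members, and the callback reads unrelated memory as if it were chunk fields. The sketch below is a trimmed userspace re-implementation of those macros, not the kernel's list.h, and it needs GNU __typeof__. It uses a hypothetical struct chunk with two linkages, as the commit message describes.

```c
#include <stddef.h> /* offsetof */

struct list_head {
	struct list_head *next, *prev;
};

/* Recover the enclosing structure from a pointer to one of its members. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Walk a circular list; 'member' must name the list_head linking pos into head. */
#define list_for_each_entry(pos, head, member)                           \
	for (pos = container_of((head)->next, __typeof__(*pos), member); \
	     &pos->member != (head);                                      \
	     pos = container_of(pos->member.next, __typeof__(*pos), member))

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

/* Hypothetical, trimmed-down chunk with two independent linkages. */
struct chunk {
	struct list_head list;             /* e.g. out_chunk_list linkage */
	struct list_head transmitted_list; /* retransmit/sacked/abandoned linkage */
	int id;
};

static struct chunk chunks[3];
static struct list_head retransmit;

/* Queue three chunks (ids 1..3) through transmitted_list. */
static void build_retransmit_queue(void)
{
	init_list_head(&retransmit);
	for (int i = 0; i < 3; i++) {
		chunks[i].id = i + 1;
		list_add_tail(&chunks[i].transmitted_list, &retransmit);
	}
}

/* Correct walk: name the member the chunks are actually linked through. */
static int sum_retransmit_ids(void)
{
	struct chunk *c;
	int sum = 0;

	list_for_each_entry(c, &retransmit, transmitted_list)
		sum += c->id;
	return sum;
}

/* Pointer the buggy walk would compute for the first node: container_of()
 * with 'list' subtracts the wrong offset, so it is not &chunks[0]. */
static int wrong_member_finds_first_chunk(void)
{
	struct chunk *guess = container_of(retransmit.next, struct chunk, list);

	return guess == &chunks[0];
}
```

The sketch only computes the wrong pointer and never dereferences it. In the kernel the callback does dereference it, which is where the reported panic comes from.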

[PATCH net 2/3] sctp: force the params with right types for sctp csum apis

2017-11-26 Thread Xin Long

The sctp_csum_xxx helpers don't really match the param types of the
common csum APIs they wrap. Since they are defined in sctp/checksum.h,
many sparse errors occur when running make C=2, not only with M=net/sctp
but also with other modules that include this header file.

This patch uses __force casts so that the params passed to the csum APIs
have the right types.

Fixes: e6d8b64b34aa ("net: sctp: fix and consolidate SCTP checksumming code")
Signed-off-by: Xin Long 
---
 include/net/sctp/checksum.h | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/include/net/sctp/checksum.h b/include/net/sctp/checksum.h
index 4a5b9a3..32ee65a 100644
--- a/include/net/sctp/checksum.h
+++ b/include/net/sctp/checksum.h
@@ -48,31 +48,32 @@ static inline __wsum sctp_csum_update(const void *buff, int len, __wsum sum)
/* This uses the crypto implementation of crc32c, which is either
 * implemented w/ hardware support or resolves to __crc32c_le().
 */
-   return crc32c(sum, buff, len);
+   return (__force __wsum)crc32c((__force __u32)sum, buff, len);
 }
 
 static inline __wsum sctp_csum_combine(__wsum csum, __wsum csum2,
   int offset, int len)
 {
-   return __crc32c_le_combine(csum, csum2, len);
+   return (__force __wsum)__crc32c_le_combine((__force __u32)csum,
+  (__force __u32)csum2, len);
 }
 
 static inline __le32 sctp_compute_cksum(const struct sk_buff *skb,
unsigned int offset)
 {
struct sctphdr *sh = sctp_hdr(skb);
-__le32 ret, old = sh->checksum;
const struct skb_checksum_ops ops = {
.update  = sctp_csum_update,
.combine = sctp_csum_combine,
};
+   __le32 old = sh->checksum;
+   __wsum new;
 
sh->checksum = 0;
-   ret = cpu_to_le32(~__skb_checksum(skb, offset, skb->len - offset,
- ~(__u32)0, &ops));
+   new = ~__skb_checksum(skb, offset, skb->len - offset, ~(__wsum)0, &ops);
sh->checksum = old;
 
-   return ret;
+   return cpu_to_le32((__force __u32)new);
 }
 
 #endif /* __sctp_checksum_h__ */
-- 
2.1.0
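
Some context on what sctp_compute_cksum() computes: it seeds the CRC32c with all ones, runs the update over the packet with the checksum field zeroed, inverts the result and stores it with cpu_to_le32(). The sketch below is a plain bitwise CRC32c (reflected polynomial 0x82F63B78) in userspace C. It is not the kernel's crc32c() or its hardware-accelerated paths, and crc32c_update() and sctp_style_cksum() are hypothetical names. It uses uint32_t throughout, which sidesteps the __wsum vs __u32 distinction that the patch's __force casts make explicit for sparse.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC32c (Castagnoli, reflected polynomial 0x82F63B78).
 * The seed is passed in and no final inversion is done here; as in the
 * patched code, the caller handles seeding and inversion. */
static uint32_t crc32c_update(uint32_t crc, const void *buf, size_t len)
{
	const uint8_t *p = buf;

	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			/* Shift right; XOR in the polynomial if the bit shifted out was 1. */
			crc = (crc >> 1) ^ (0x82F63B78u & -(crc & 1u));
	}
	return crc;
}

/* Same shape as sctp_compute_cksum(): seed with all ones, update, invert.
 * The little-endian store done by cpu_to_le32() is omitted here. */
static uint32_t sctp_style_cksum(const void *buf, size_t len)
{
	return ~crc32c_update(~(uint32_t)0, buf, len);
}
```

Because the update step carries its state in the seed, two chained crc32c_update() calls over a split buffer give the same result as one call over the whole buffer.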

[PATCH net 3/3] sctp: remove extern from stream sched

2017-11-26 Thread Xin Long

Currently each stream scheduler's ops is defined in its own .c file and
added to the global ops array in another .c file, which relies on extern
declarations to work.

However, using extern this way is not good coding style, and make C=2
even reports errors for it.

This patch adds an sctp_sched_ops_xxx_init function for each stream
scheduler in its own .c file, and registers each ops into the global ops
array by calling these functions when the sctp module is initialized.

Fixes: 637784ade221 ("sctp: introduce priority based stream scheduler")
Fixes: ac1ed8b82cd6 ("sctp: introduce round robin stream scheduler")
Signed-off-by: Xin Long 
---
 include/net/sctp/sctp.h |  5 +
 include/net/sctp/stream_sched.h |  5 +
 net/sctp/protocol.c |  1 +
 net/sctp/stream_sched.c | 25 ++---
 net/sctp/stream_sched_prio.c|  7 ++-
 net/sctp/stream_sched_rr.c  |  7 ++-
 6 files changed, 41 insertions(+), 9 deletions(-)

diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h
index 749a428..906a9c0 100644
--- a/include/net/sctp/sctp.h
+++ b/include/net/sctp/sctp.h
@@ -195,6 +195,11 @@ void sctp_remaddr_proc_exit(struct net *net);
 int sctp_offload_init(void);
 
 /*
+ * sctp/stream_sched.c
+ */
+void sctp_sched_ops_init(void);
+
+/*
  * sctp/stream.c
  */
 int sctp_send_reset_streams(struct sctp_association *asoc,
diff --git a/include/net/sctp/stream_sched.h b/include/net/sctp/stream_sched.h
index c676550..5c5da48 100644
--- a/include/net/sctp/stream_sched.h
+++ b/include/net/sctp/stream_sched.h
@@ -69,4 +69,9 @@ void sctp_sched_dequeue_common(struct sctp_outq *q, struct sctp_chunk *ch);
 int sctp_sched_init_sid(struct sctp_stream *stream, __u16 sid, gfp_t gfp);
 struct sctp_sched_ops *sctp_sched_ops_from_stream(struct sctp_stream *stream);
 
+void sctp_sched_ops_register(enum sctp_sched_type sched,
+struct sctp_sched_ops *sched_ops);
+void sctp_sched_ops_prio_init(void);
+void sctp_sched_ops_rr_init(void);
+
 #endif /* __sctp_stream_sched_h__ */
diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
index f5172c2..6a38c25 100644
--- a/net/sctp/protocol.c
+++ b/net/sctp/protocol.c
@@ -1499,6 +1499,7 @@ static __init int sctp_init(void)
INIT_LIST_HEAD(&sctp_address_families);
sctp_v4_pf_init();
sctp_v6_pf_init();
+   sctp_sched_ops_init();
 
status = register_pernet_subsys(&sctp_defaults_ops);
if (status)
diff --git a/net/sctp/stream_sched.c b/net/sctp/stream_sched.c
index 0b83ec5..d8c162a 100644
--- a/net/sctp/stream_sched.c
+++ b/net/sctp/stream_sched.c
@@ -119,16 +119,27 @@ static struct sctp_sched_ops sctp_sched_fcfs = {
.unsched_all = sctp_sched_fcfs_unsched_all,
 };
 
+static void sctp_sched_ops_fcfs_init(void)
+{
+   sctp_sched_ops_register(SCTP_SS_FCFS, &sctp_sched_fcfs);
+}
+
 /* API to other parts of the stack */
 
-extern struct sctp_sched_ops sctp_sched_prio;
-extern struct sctp_sched_ops sctp_sched_rr;
+static struct sctp_sched_ops *sctp_sched_ops[SCTP_SS_MAX + 1];
 
-static struct sctp_sched_ops *sctp_sched_ops[] = {
-   &sctp_sched_fcfs,
-   &sctp_sched_prio,
-   &sctp_sched_rr,
-};
+void sctp_sched_ops_register(enum sctp_sched_type sched,
+struct sctp_sched_ops *sched_ops)
+{
+   sctp_sched_ops[sched] = sched_ops;
+}
+
+void sctp_sched_ops_init(void)
+{
+   sctp_sched_ops_fcfs_init();
+   sctp_sched_ops_prio_init();
+   sctp_sched_ops_rr_init();
+}
 
 int sctp_sched_set_sched(struct sctp_association *asoc,
 enum sctp_sched_type sched)
diff --git a/net/sctp/stream_sched_prio.c b/net/sctp/stream_sched_prio.c
index 384dbf3..7997d35 100644
--- a/net/sctp/stream_sched_prio.c
+++ b/net/sctp/stream_sched_prio.c
@@ -333,7 +333,7 @@ static void sctp_sched_prio_unsched_all(struct sctp_stream *stream)
sctp_sched_prio_unsched(soute);
 }
 
-struct sctp_sched_ops sctp_sched_prio = {
+static struct sctp_sched_ops sctp_sched_prio = {
.set = sctp_sched_prio_set,
.get = sctp_sched_prio_get,
.init = sctp_sched_prio_init,
@@ -345,3 +345,8 @@ struct sctp_sched_ops sctp_sched_prio = {
.sched_all = sctp_sched_prio_sched_all,
.unsched_all = sctp_sched_prio_unsched_all,
 };
+
+void sctp_sched_ops_prio_init(void)
+{
+   sctp_sched_ops_register(SCTP_SS_PRIO, &sctp_sched_prio);
+}
diff --git a/net/sctp/stream_sched_rr.c b/net/sctp/stream_sched_rr.c
index 7612a43..1155692 100644
--- a/net/sctp/stream_sched_rr.c
+++ b/net/sctp/stream_sched_rr.c
@@ -187,7 +187,7 @@ static void sctp_sched_rr_unsched_all(struct sctp_stream *stream)
sctp_sched_rr_unsched(stream, soute);
 }
 
-struct sctp_sched_ops sctp_sched_rr = {
+static struct sctp_sched_ops sctp_sched_rr = {
.set = sctp_sched_rr_set,
.get = sctp_sched_rr_get,
.init = sctp_sched_rr_init,
@@ -199,3 +199,8 @@ struct sctp_sched_ops sctp_sched_rr = {
.sched_all = sctp_sched_rr_sched_all,
.unsched_all = sc
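
The registration pattern this patch switches to can be modeled in a single userspace file, as shown below. The enum, struct sched_ops and all function names are hypothetical stand-ins for enum sctp_sched_type, struct sctp_sched_ops and the sctp_sched_ops_*() functions. In the patch itself, each scheduler's ops object is static in its own .c file and only its init function is called from outside. Because this model is a single file, everything in it is static.

```c
#include <stddef.h> /* NULL */

/* Hypothetical mirror of enum sctp_sched_type; MAX aliases the last entry. */
enum sched_type { SS_FCFS, SS_PRIO, SS_RR, SS_MAX = SS_RR };

struct sched_ops {
	const char *name;
};

/* Each scheduler's ops object stays private to its "file". */
static struct sched_ops fcfs_ops = { "fcfs" };
static struct sched_ops prio_ops = { "prio" };
static struct sched_ops rr_ops = { "rr" };

/* Central table, filled at init time instead of via extern declarations. */
static struct sched_ops *sched_table[SS_MAX + 1];

static void sched_ops_register(enum sched_type t, struct sched_ops *ops)
{
	sched_table[t] = ops;
}

/* Per-scheduler init functions, one per "file". */
static void sched_ops_fcfs_init(void) { sched_ops_register(SS_FCFS, &fcfs_ops); }
static void sched_ops_prio_init(void) { sched_ops_register(SS_PRIO, &prio_ops); }
static void sched_ops_rr_init(void) { sched_ops_register(SS_RR, &rr_ops); }

/* Called once at module init, as sctp_sched_ops_init() is from sctp_init(). */
static void sched_ops_init(void)
{
	sched_ops_fcfs_init();
	sched_ops_prio_init();
	sched_ops_rr_init();
}

/* Look up a scheduler's name; NULL if out of range or not yet registered. */
static const char *sched_name(enum sched_type t)
{
	if ((unsigned int)t > SS_MAX || !sched_table[t])
		return NULL;
	return sched_table[t]->name;
}
```

Filling the table at init time means no file needs an extern declaration of another file's ops object. The trade-off is that any lookup made before sched_ops_init() runs sees an empty entry.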

[PATCH net 1/3] sctp: force SCTP_ERROR_INV_STRM with __u32 when calling sctp_chunk_fail

2017-11-26 Thread Xin Long

This patch casts SCTP_ERROR_INV_STRM to __u32 with __force when passing
it to sctp_chunk_fail, to avoid the sparse error.

Signed-off-by: Xin Long 
---
 net/sctp/stream.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sctp/stream.c b/net/sctp/stream.c
index a20145b..76ea66b 100644
--- a/net/sctp/stream.c
+++ b/net/sctp/stream.c
@@ -64,7 +64,7 @@ static void sctp_stream_outq_migrate(struct sctp_stream *stream,
 */
 
/* Mark as failed send. */
-   sctp_chunk_fail(ch, SCTP_ERROR_INV_STRM);
+   sctp_chunk_fail(ch, (__force __u32)SCTP_ERROR_INV_STRM);
if (asoc->peer.prsctp_capable &&
SCTP_PR_PRIO_ENABLED(ch->sinfo.sinfo_flags))
asoc->sent_cnt_removable--;
-- 
2.1.0

[PATCH net 0/3] sctp: fix some other sparse errors

2017-11-26 Thread Xin Long

After the last fixes for sparse errors, there are still three sparse
errors in the sctp code: two of them are about type casts, and the other
one is about using extern.

Xin Long (3):
  sctp: force SCTP_ERROR_INV_STRM with __u32 when calling
sctp_chunk_fail
  sctp: force the params with right types for sctp csum apis
  sctp: remove extern from stream sched

 include/net/sctp/checksum.h | 13 +++--
 include/net/sctp/sctp.h |  5 +
 include/net/sctp/stream_sched.h |  5 +
 net/sctp/protocol.c |  1 +
 net/sctp/stream.c   |  2 +-
 net/sctp/stream_sched.c | 25 ++---
 net/sctp/stream_sched_prio.c|  7 ++-
 net/sctp/stream_sched_rr.c  |  7 ++-
 8 files changed, 49 insertions(+), 16 deletions(-)

-- 
2.1.0

Re: general protection fault in dst_destroy() - 4.13.9

2017-11-26 Thread Anders K. Pedersen | Cohaesio

On Mon, 2017-11-20 at 17:13 +0200, Ido Schimmel wrote:
> On Sun, Nov 19, 2017 at 12:45:41PM +, Anders K. Pedersen | Cohaesio wrote:
> > Hello,
> > 
> > A few days ago, one of our routers (running Linux 4.13.9) crashed due
> > to a general protection fault in dst_destroy(). At the time, it had run
> > for several weeks without any problems, but then crashed three times in
> > a row within a few minutes - all due to a general protection fault at
> > dst_destroy()+0x35. Since then, it has run for several days without any
> > further problems, so I suspect that this was triggered by a traffic
> > pattern in the routed packets, but I don't have a way to reproduce it.
> > 
> > Disassembly shows that this is in the inlined dev_put(), which does
> > this_cpu_dec(*dev->pcpu_refcnt). As far as I can tell there haven't
> > been any fixes in this area since 4.13, and a Google search didn't find
> > anything recent, so I'm guessing this is not a known problem.
> > 
> > I have included the kernel output via serial console below as well as
> > gdb and objdump information. Please let me know, if I can provide any
> > additional information.
> > 
> > 
> > [2024260.461401] general protection fault:  [#1] SMP
> > [2024260.467193] Modules linked in:
> > [2024260.470897] CPU: 15 PID: 0 Comm: swapper/15 Tainted: GW   4.13.9 #2
> > [2024260.479488] Hardware name: Dell Inc. PowerEdge R730/0H21J3, BIOS 2.5.5 08/16/2017
> > [2024260.488279] task: 88085b625cc0 task.stack: c90e4000
> > [2024260.495277] RIP: 0010:dst_destroy+0x35/0xa0
> > [2024260.500277] RSP: 0018:88085f5c3f08 EFLAGS: 00010286
> > [2024260.506474] RAX: 88085ac0e880 RBX: 88082cf9fb00 RCX: 0020
> > [2024260.514868] RDX: 88082cf9fbc0 RSI:  RDI: 816786c0
> > [2024260.523258] RBP:  R08: ff00 R09: 
> > [2024260.531649] R10:  R11:  R12: 88085f5da678
> > [2024260.540040] R13: 000a R14: 88085b625cc0 R15: 88085b625cc0
> > [2024260.548431] FS:  () GS:88085f5c() knlGS:
> > [2024260.557924] CS:  0010 DS:  ES:  CR0: 80050033
> > [2024260.564719] CR2: 7fc800e48e88 CR3: 01809000 CR4: 001406e0
> > [2024260.573112] Call Trace:
> > [2024260.576113]  
> > [2024260.578618]  ? rcu_process_callbacks+0x18f/0x460
> > [2024260.584126]  ? rebalance_domains+0xe2/0x290
> > [2024260.589128]  ? __do_softirq+0x100/0x292
> > [2024260.593727]  ? irq_exit+0x92/0xa0
> > [2024260.597729]  ? smp_apic_timer_interrupt+0x39/0x50
> > [2024260.603328]  ? apic_timer_interrupt+0x7c/0x90
> > [2024260.608528]  
> > [2024260.611134]  ? cpuidle_enter_state+0x14c/0x2b0
> > [2024260.616432]  ? cpuidle_enter_state+0x128/0x2b0
> > [2024260.621731]  ? do_idle+0xf9/0x190
> > [2024260.625733]  ? cpu_startup_entry+0x5f/0x70
> > [2024260.630636]  ? start_secondary+0x12a/0x130
> > [2024260.635536]  ? secondary_startup_64+0x9f/0x9f
> > [2024260.640731] Code: f6 47 60 08 48 8b 6f 18 74 62 48 8b 43 20 48 8b 40 30 48 85 c0 74 05 48 89 df ff d0 48 8b 03 48 85 c0 74 0a 48 8b 80 e0 03 00 00 <65> ff 08 f6 43 60 80 74 26 48 8d bb e0 00 00 00 e8 e6 7f 01 00
> > [2024260.662626] RIP: dst_destroy+0x35/0xa0 RSP: 88085f5c3f08
> > [2024260.669333] ---[ end trace 3c1827251806827c ]---
> > [2024260.724173] Kernel panic - not syncing: Fatal exception in interrupt
> > [2024261.102792] Kernel Offset: disabled
> > [2024261.156022] Rebooting in 60 seconds..
> > [2024321.167958] ACPI MEMORY or I/O RESET_REG.
> 
> This looks very similar to a bug Eric already fixed here:
> https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=222d7dbd258dad4cd5241c43ef818141fad5a87a
> 
> I don't see it in v4.13.9 which might explain why you're still hitting
> it. Can you please try to reproduce with mentioned patch?

Yes, it looks like it could be related. I see that it is included in
v4.14, so we'll update to that and see if it comes back.

Thanks,
Anders
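
The report notes that dev_put() does this_cpu_dec(*dev->pcpu_refcnt). The net_device reference count is kept per CPU, so dev_put() only decrements the slot of the CPU it runs on, and the device's real count is the sum over all CPUs. The single-threaded model below illustrates only that bookkeeping and does not reproduce the reported fault. struct pcpu_ref, ref_hold(), ref_put(), ref_sum() and MODEL_NR_CPUS are hypothetical names.

```c
#define MODEL_NR_CPUS 4 /* hypothetical CPU count for this model */

/* Simplified model of a per-CPU reference counter like dev->pcpu_refcnt:
 * one slot per CPU, and each CPU only touches its own slot. */
struct pcpu_ref {
	int cnt[MODEL_NR_CPUS];
};

/* dev_hold()-like: increment the slot of the CPU doing the hold. */
static void ref_hold(struct pcpu_ref *r, int cpu)
{
	r->cnt[cpu]++;
}

/* dev_put()-like: decrement the slot of the CPU doing the put. */
static void ref_put(struct pcpu_ref *r, int cpu)
{
	r->cnt[cpu]--;
}

/* Only the sum across all CPUs is meaningful; single slots may be negative. */
static int ref_sum(const struct pcpu_ref *r)
{
	int sum = 0;

	for (int i = 0; i < MODEL_NR_CPUS; i++)
		sum += r->cnt[i];
	return sum;
}
```

A reference taken on one CPU and dropped on another leaves +1 and -1 in different slots, which is why no single slot can be read as the device's count.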

