Re: [PATCH net 1/5] ibmvnic: harden interrupt handler

2017-01-26 Thread Stephen Hemminger
On Wed, 25 Jan 2017 15:02:19 -0600
Thomas Falcon  wrote:

>  static irqreturn_t ibmvnic_interrupt(int irq, void *instance)
>  {
>   struct ibmvnic_adapter *adapter = instance;
> + unsigned long flags;
> +
> + spin_lock_irqsave(>crq.lock, flags);
> + vio_disable_interrupts(adapter->vdev);
> + tasklet_schedule(>tasklet);
> + spin_unlock_irqrestore(>crq.lock, flags);
> + return IRQ_HANDLED;
> +}
> +

Why not use NAPI? rather than a tasklet


Re: [PATCH net 1/5] ibmvnic: harden interrupt handler

2017-01-26 Thread Thomas Falcon
On 01/26/2017 12:28 PM, Stephen Hemminger wrote:
> On Wed, 25 Jan 2017 15:02:19 -0600
> Thomas Falcon  wrote:
>
>>  static irqreturn_t ibmvnic_interrupt(int irq, void *instance)
>>  {
>>  struct ibmvnic_adapter *adapter = instance;
>> +unsigned long flags;
>> +
>> +spin_lock_irqsave(>crq.lock, flags);
>> +vio_disable_interrupts(adapter->vdev);
>> +tasklet_schedule(>tasklet);
>> +spin_unlock_irqrestore(>crq.lock, flags);
>> +return IRQ_HANDLED;
>> +}
>> +
> Why not use NAPI? rather than a tasklet
>
This interrupt function doesn't process packets, but message passing between 
firmware and driver for determining device capabilities and available 
resources, such as the number TX and RX queues. 



Re: [PATCH net 1/5] ibmvnic: harden interrupt handler

2017-01-26 Thread Thomas Falcon
On 01/26/2017 11:56 AM, David Miller wrote:
> From: Thomas Falcon 
> Date: Thu, 26 Jan 2017 10:44:22 -0600
>
>> On 01/25/2017 10:04 PM, David Miller wrote:
>>> From: Thomas Falcon 
>>> Date: Wed, 25 Jan 2017 15:02:19 -0600
>>>
 Move most interrupt handler processing into a tasklet, and
 introduce a delay after re-enabling interrupts to fix timing
 issues encountered in hardware testing.

 Signed-off-by: Thomas Falcon 
>>> I don't think you have any idea what the real problem is.  This looks
>>> like a hack, at best.  Next patch you'll increase the delay to "20",
>>> right?  And if that doesn't work you'll try "40".
>>>
>>> Or if you do know the reason, you need to explain it in detail in this
>>> commit message so that we can properly evaluate this patch.
>> You're right, I should have given more explanation in the commit message 
>> about the bug encountered and our reasons for this sort of fix.  The issue 
>> is that there are some scenarios during the device init where multiple 
>> messages are sent by firmware in one interrupt request. 
>>
>> We have observed behavior where messages are delayed, resulting in the 
>> interrupt handler completing before the delayed messages can be processed, 
>> fouling up the device bring-up in the device probing and elsewhere.  The 
>> goal of the delay is to buy some time for the hypervisor to forward all the 
>> CRQ messages from the firmware.
> Then isn't there an event you can sleep and wait for, or a piece of state for
> you to poll and test for in a timeout based loop?
>
> This delay is a bad solution for the problem of waiting for A to happen
> before you do B.
>
Understood.  We will come up with a better fix.  Thanks for your attention.



Re: [PATCH net 1/5] ibmvnic: harden interrupt handler

2017-01-26 Thread David Miller
From: Thomas Falcon 
Date: Thu, 26 Jan 2017 10:44:22 -0600

> On 01/25/2017 10:04 PM, David Miller wrote:
>> From: Thomas Falcon 
>> Date: Wed, 25 Jan 2017 15:02:19 -0600
>>
>>> Move most interrupt handler processing into a tasklet, and
>>> introduce a delay after re-enabling interrupts to fix timing
>>> issues encountered in hardware testing.
>>>
>>> Signed-off-by: Thomas Falcon 
>> I don't think you have any idea what the real problem is.  This looks
>> like a hack, at best.  Next patch you'll increase the delay to "20",
>> right?  And if that doesn't work you'll try "40".
>>
>> Or if you do know the reason, you need to explain it in detail in this
>> commit message so that we can properly evaluate this patch.
> 
> You're right, I should have given more explanation in the commit message 
> about the bug encountered and our reasons for this sort of fix.  The issue is 
> that there are some scenarios during the device init where multiple messages 
> are sent by firmware in one interrupt request. 
> 
> We have observed behavior where messages are delayed, resulting in the 
> interrupt handler completing before the delayed messages can be processed, 
> fouling up the device bring-up in the device probing and elsewhere.  The goal 
> of the delay is to buy some time for the hypervisor to forward all the CRQ 
> messages from the firmware.

Then isn't there an event you can sleep and wait for, or a piece of state for
you to poll and test for in a timeout based loop?

This delay is a bad solution for the problem of waiting for A to happen
before you do B.


Re: [PATCH net 1/5] ibmvnic: harden interrupt handler

2017-01-26 Thread Thomas Falcon
On 01/25/2017 10:04 PM, David Miller wrote:
> From: Thomas Falcon 
> Date: Wed, 25 Jan 2017 15:02:19 -0600
>
>> Move most interrupt handler processing into a tasklet, and
>> introduce a delay after re-enabling interrupts to fix timing
>> issues encountered in hardware testing.
>>
>> Signed-off-by: Thomas Falcon 
> I don't think you have any idea what the real problem is.  This looks
> like a hack, at best.  Next patch you'll increase the delay to "20",
> right?  And if that doesn't work you'll try "40".
>
> Or if you do know the reason, you need to explain it in detail in this
> commit message so that we can properly evaluate this patch.

You're right, I should have given more explanation in the commit message about 
the bug encountered and our reasons for this sort of fix.  The issue is that 
there are some scenarios during the device init where multiple messages are 
sent by firmware in one interrupt request. 

We have observed behavior where messages are delayed, resulting in the 
interrupt handler completing before the delayed messages can be processed, 
fouling up the device bring-up in the device probing and elsewhere.  The goal 
of the delay is to buy some time for the hypervisor to forward all the CRQ 
messages from the firmware.
>
> Furthermore, if you're going to move all of your packet processing
> into software interrupt context, you might as well use NAPI polling
> which is a purposefully built piece of infrastructure for doing
> exactly this.
>
This interrupt handler doesn't handle packet processing, but communications 
between the driver and firmware for device settings and resource allocation.  
Packet processing is done with different interrupts that do use NAPI polling.



Re: [PATCH net 1/5] ibmvnic: harden interrupt handler

2017-01-25 Thread David Miller
From: Thomas Falcon 
Date: Wed, 25 Jan 2017 15:02:19 -0600

> Move most interrupt handler processing into a tasklet, and
> introduce a delay after re-enabling interrupts to fix timing
> issues encountered in hardware testing.
> 
> Signed-off-by: Thomas Falcon 

I don't think you have any idea what the real problem is.  This looks
like a hack, at best.  Next patch you'll increase the delay to "20",
right?  And if that doesn't work you'll try "40".

Or if you do know the reason, you need to explain it in detail in this
commit message so that we can properly evaluate this patch.

Furthermore, if you're going to move all of your packet processing
into software interrupt context, you might as well use NAPI polling
which is a purposefully built piece of infrastructure for doing
exactly this.



[PATCH net 1/5] ibmvnic: harden interrupt handler

2017-01-25 Thread Thomas Falcon
Move most interrupt handler processing into a tasklet, and
introduce a delay after re-enabling interrupts to fix timing
issues encountered in hardware testing.

Signed-off-by: Thomas Falcon 
---
 drivers/net/ethernet/ibm/ibmvnic.c | 21 +++--
 drivers/net/ethernet/ibm/ibmvnic.h |  1 +
 2 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c 
b/drivers/net/ethernet/ibm/ibmvnic.c
index c125966..09071bf 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -3414,6 +3414,18 @@ static void ibmvnic_handle_crq(union ibmvnic_crq *crq,
 static irqreturn_t ibmvnic_interrupt(int irq, void *instance)
 {
struct ibmvnic_adapter *adapter = instance;
+   unsigned long flags;
+
+   spin_lock_irqsave(>crq.lock, flags);
+   vio_disable_interrupts(adapter->vdev);
+   tasklet_schedule(>tasklet);
+   spin_unlock_irqrestore(>crq.lock, flags);
+   return IRQ_HANDLED;
+}
+
+static void ibmvnic_tasklet(void *data)
+{
+   struct ibmvnic_adapter *adapter = data;
struct ibmvnic_crq_queue *queue = >crq;
struct vio_dev *vdev = adapter->vdev;
union ibmvnic_crq *crq;
@@ -3421,7 +3433,6 @@ static irqreturn_t ibmvnic_interrupt(int irq, void 
*instance)
bool done = false;
 
spin_lock_irqsave(>lock, flags);
-   vio_disable_interrupts(vdev);
while (!done) {
/* Pull all the valid messages off the CRQ */
while ((crq = ibmvnic_next_crq(adapter)) != NULL) {
@@ -3429,6 +3440,8 @@ static irqreturn_t ibmvnic_interrupt(int irq, void 
*instance)
crq->generic.first = 0;
}
vio_enable_interrupts(vdev);
+   /* delay in case of firmware hiccup */
+   mdelay(10);
crq = ibmvnic_next_crq(adapter);
if (crq) {
vio_disable_interrupts(vdev);
@@ -3439,7 +3452,6 @@ static irqreturn_t ibmvnic_interrupt(int irq, void 
*instance)
}
}
spin_unlock_irqrestore(>lock, flags);
-   return IRQ_HANDLED;
 }
 
 static int ibmvnic_reenable_crq_queue(struct ibmvnic_adapter *adapter)
@@ -3494,6 +3506,7 @@ static void ibmvnic_release_crq_queue(struct 
ibmvnic_adapter *adapter)
 
netdev_dbg(adapter->netdev, "Releasing CRQ\n");
free_irq(vdev->irq, adapter);
+   tasklet_kill(>tasklet);
do {
rc = plpar_hcall_norets(H_FREE_CRQ, vdev->unit_address);
} while (rc == H_BUSY || H_IS_LONG_BUSY(rc));
@@ -3539,6 +3552,9 @@ static int ibmvnic_init_crq_queue(struct ibmvnic_adapter 
*adapter)
 
retrc = 0;
 
+   tasklet_init(>tasklet, (void *)ibmvnic_tasklet,
+(unsigned long)adapter);
+
netdev_dbg(adapter->netdev, "registering irq 0x%x\n", vdev->irq);
rc = request_irq(vdev->irq, ibmvnic_interrupt, 0, IBMVNIC_NAME,
 adapter);
@@ -3560,6 +3576,7 @@ static int ibmvnic_init_crq_queue(struct ibmvnic_adapter 
*adapter)
return retrc;
 
 req_irq_failed:
+   tasklet_kill(>tasklet);
do {
rc = plpar_hcall_norets(H_FREE_CRQ, vdev->unit_address);
} while (rc == H_BUSY || H_IS_LONG_BUSY(rc));
diff --git a/drivers/net/ethernet/ibm/ibmvnic.h 
b/drivers/net/ethernet/ibm/ibmvnic.h
index dd775d9..0d0edc3 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.h
+++ b/drivers/net/ethernet/ibm/ibmvnic.h
@@ -1049,5 +1049,6 @@ struct ibmvnic_adapter {
 
struct work_struct vnic_crq_init;
struct work_struct ibmvnic_xport;
+   struct tasklet_struct tasklet;
bool failover;
 };
-- 
2.7.4