Re: [ewg] [PATCH] IPoIB: synchronize timer deletion with completion handler

2009-12-10 Thread Eli Cohen
On Wed, Dec 09, 2009 at 02:39:49PM -0800, Roland Dreier wrote:
> 
>  > When calling del_timer_sync on priv->poll_timer, it is necessary to prevent
>  > farther arming of the timer which can be done by a completion handler. 
> Though
>  > it is harmless since the timer will eventually stop being rearmed, it is 
> better
>  > practice to avoid calling the timer function after it is deleted. This 
> patch
>  > handles this by using a new flag that is checked before arming the timer.
> 
> have you seen this in practice?  If it can happen then it's not
> harmless, since the module could be unloaded with the timer pending.
> However I don't see how it could happen, since we only seem to delete
> the timer after we know that no more completions are coming (except for
> the case where we decide that the hardware is wedged but it really only
> takes a *long* time to respond at exactly the wrong time, and we somehow
> get a completion between the del_timer_sync and the modify QP to reset
> state -- which is so unlikely it seems not worth adding this extra
> complexity for -- maybe we could add the del_timer_sync to after we
> delete the CQ or something if you're really worried)

No, I haven't actually seen this happening but I think it could.
Consider this scenario:

1. ipoib_send() calls ib_req_notify_cq()
2. ipoib_ib_dev_stop() gets called
3. All pending WR complete; interrupt is generated but not yet
serviced.
4. ipoib_ib_dev_stop() calls del_timer_sync() and passes it.
5. Completion handler is finally serviced - it schedules
ipoib_ib_tx_timer_func()
6. IPoIB unloads and we're in trouble.

Not so likely to happen indeed. Your suggestion to defer deleting the
timer until after the CQ is destroyed will also solve this.
___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg


Re: [ewg] [PATCH] IPoIB: synchronize timer deletion with completion handler

2009-12-09 Thread Roland Dreier

 > When calling del_timer_sync on priv->poll_timer, it is necessary to prevent
 > farther arming of the timer which can be done by a completion handler. Though
 > it is harmless since the timer will eventually stop being rearmed, it is 
 > better
 > practice to avoid calling the timer function after it is deleted. This patch
 > handles this by using a new flag that is checked before arming the timer.

have you seen this in practice?  If it can happen then it's not
harmless, since the module could be unloaded with the timer pending.
However I don't see how it could happen, since we only seem to delete
the timer after we know that no more completions are coming (except for
the case where we decide that the hardware is wedged but it really only
takes a *long* time to respond at exactly the wrong time, and we somehow
get a completion between the del_timer_sync and the modify QP to reset
state -- which is so unlikely it seems not worth adding this extra
complexity for -- maybe we could add the del_timer_sync to after we
delete the CQ or something if you're really worried)
___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg


[ewg] [PATCH] IPoIB: synchronize timer deletion with completion handler

2009-12-07 Thread Eli Cohen
When calling del_timer_sync on priv->poll_timer, it is necessary to prevent
farther arming of the timer which can be done by a completion handler. Though
it is harmless since the timer will eventually stop being rearmed, it is better
practice to avoid calling the timer function after it is deleted. This patch
handles this by using a new flag that is checked before arming the timer.

Signed-off-by: Eli Cohen 
---
 drivers/infiniband/ulp/ipoib/ipoib.h|1 +
 drivers/infiniband/ulp/ipoib/ipoib_ib.c |5 -
 2 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h 
b/drivers/infiniband/ulp/ipoib/ipoib.h
index 753a983..3c6f931 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -92,6 +92,7 @@ enum {
IPOIB_FLAG_ADMIN_CM   = 9,
IPOIB_FLAG_UMCAST = 10,
IPOIB_FLAG_CSUM   = 11,
+   IPOIB_FLAG_POLL_TIMER = 12,
 
IPOIB_MAX_BACKOFF_SECONDS = 16,
 
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c 
b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
index 8c91d9f..b946a01 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
@@ -481,7 +481,8 @@ void ipoib_send_comp_handler(struct ib_cq *cq, void 
*dev_ptr)
 {
struct ipoib_dev_priv *priv = netdev_priv(dev_ptr);
 
-   mod_timer(&priv->poll_timer, jiffies);
+   if (test_bit(IPOIB_FLAG_POLL_TIMER, &priv->flags))
+   mod_timer(&priv->poll_timer, jiffies);
 }
 
 static inline int post_send(struct ipoib_dev_priv *priv,
@@ -860,6 +861,7 @@ int ipoib_ib_dev_stop(struct net_device *dev, int flush)
ipoib_dbg(priv, "All sends and receives done.\n");
 
 timeout:
+   clear_bit(IPOIB_FLAG_POLL_TIMER, &priv->flags);
del_timer_sync(&priv->poll_timer);
qp_attr.qp_state = IB_QPS_RESET;
if (ib_modify_qp(priv->qp, &qp_attr, IB_QP_STATE))
@@ -904,6 +906,7 @@ int ipoib_ib_dev_init(struct net_device *dev, struct 
ib_device *ca, int port)
 
setup_timer(&priv->poll_timer, ipoib_ib_tx_timer_func,
(unsigned long) dev);
+   set_bit(IPOIB_FLAG_POLL_TIMER, &priv->flags);
 
if (dev->flags & IFF_UP) {
if (ipoib_ib_dev_open(dev)) {
-- 
1.6.5.5

___
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg