Thanks for testing on ehca...

 > While using IPoIB over EHCA (rc6 bits), unregister_netdev hangs with

I don't think you're actually using rc6 bits, since in your patch you
have:

 > -poll_more:

and I think that is only in Dave's net-2.6.24 tree now, right?

 > The problem is that the poll handler does netif_rx_complete (which
 > does a dev_put) followed by netif_rx_reschedule() to schedule for
 > more receives (which again does a dev_put). This reduces refcount to
 > < 0 (depending on how many times netif_rx_complete followed by
 > netif_rx_reschedule was called).

Dave, the real problem seems to be that netif_rx_reschedule() calls
__napi_schedule() rather than __netif_rx_schedule(), so it misses the
call to dev_hold() that is needed to balance the dev_put() in
netif_rx_complete().  The current netif_rx_reschedule() looks like it
really should be napi_reschedule(), and we need a new function that
takes a netdev too.

Or am I misunderstanding the refcounting?

I'll send a patch once I've had some breakfast and had a chance to at
least compile it...

Krishna, unfortunately your proposed fix has a race:

 > -		netif_rx_complete(dev, napi);
 > -		if (unlikely(ib_req_notify_cq(priv->cq,
 > -					      IB_CQ_NEXT_COMP |
 > -					      IB_CQ_REPORT_MISSED_EVENTS)) &&
 > -		    netif_rx_reschedule(napi))
 > -			goto poll_more;
 > +		if (likely(!ib_req_notify_cq(priv->cq,
 > +					     IB_CQ_NEXT_COMP |
 > +					     IB_CQ_REPORT_MISSED_EVENTS)))

It is possible for an interrupt to happen immediately right here,
before the netif_rx_complete(), so that netif_rx_schedule() gets
called while we are still on the poll list.

 > +			netif_rx_complete(dev, napi);

 - R.
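
To make the refcount imbalance above concrete, here is a toy user-space
model. It is only a sketch, not kernel code: every toy_* name is
hypothetical, and the only thing it models is the assumption stated
above, namely that the schedule path does a dev_hold(), the complete
path does a dev_put(), and the current reschedule path takes no
reference at all.

```c
#include <assert.h>

/* Toy stand-in for struct net_device; only the refcount is modelled. */
struct toy_netdev {
	int refcnt;
};

static void toy_dev_hold(struct toy_netdev *dev) { dev->refcnt++; }
static void toy_dev_put(struct toy_netdev *dev)  { dev->refcnt--; }

/* Models the schedule path: takes a device reference. */
static void toy_rx_schedule(struct toy_netdev *dev)
{
	toy_dev_hold(dev);
}

/* Models the complete path: drops the device reference. */
static void toy_rx_complete(struct toy_netdev *dev)
{
	toy_dev_put(dev);
}

/* Models the reschedule path as described: no reference is taken. */
static void toy_rx_reschedule_unbalanced(struct toy_netdev *dev)
{
	(void)dev;
}

/* Hypothetical netdev-aware variant that re-takes the reference. */
static void toy_rx_reschedule_balanced(struct toy_netdev *dev)
{
	toy_dev_hold(dev);
}

/*
 * Run `rounds` iterations of "complete, then reschedule", followed by
 * one final complete, and return the resulting refcount.
 */
static int toy_run_poll_loop(int rounds, int balanced)
{
	struct toy_netdev dev = { .refcnt = 1 };	/* registration ref */
	int i;

	assert(rounds >= 0);

	toy_rx_schedule(&dev);			/* first interrupt */
	for (i = 0; i < rounds; i++) {
		toy_rx_complete(&dev);
		if (balanced)
			toy_rx_reschedule_balanced(&dev);
		else
			toy_rx_reschedule_unbalanced(&dev);
	}
	toy_rx_complete(&dev);			/* last poll, no reschedule */

	return dev.refcnt;
}
```

In this model toy_run_poll_loop(3, 0) returns -2, while
toy_run_poll_loop(3, 1) returns 1, i.e. only the registration reference
is left. Each unbalanced complete/reschedule pair leaks one dev_put().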