[EMAIL PROTECTED] wrote on 11/14/2006 03:18:23 PM:

> Shirley> The rotting packet situation consistently happens for
> Shirley> the ehca driver. The napi could poll forever with your
> Shirley> original patch. That's the reason I defer the rotting
> Shirley> packet process to the next napi poll.
>
> Hmm, I don't see it.  In my latest patch, the poll routine does:
>
> repoll:
>         done  = 0;
>         empty = 0;
>
>         while (max) {
>                 t = min(IPOIB_NUM_WC, max);
>                 n = ib_poll_cq(priv->cq, t, priv->ibwc);
>
>                 for (i = 0; i < n; ++i) {
>                         if (priv->ibwc[i].wr_id & IPOIB_OP_RECV) {
>                                 ++done;
>                                 --max;
>                                 ipoib_ib_handle_rx_wc(dev, priv->ibwc + i);
>                         } else
>                                 ipoib_ib_handle_tx_wc(dev, priv->ibwc + i);
>                 }
>
>                 if (n != t) {
>                         empty = 1;
>                         break;
>                 }
>         }
>
>         dev->quota -= done;
>         *budget    -= done;
>
>         if (empty) {
>                 netif_rx_complete(dev);
>                 if (unlikely(ib_req_notify_cq(priv->cq,
>                                               IB_CQ_NEXT_COMP |
>                                               IB_CQ_REPORT_MISSED_EVENTS)) &&
>                     netif_rx_reschedule(dev, 0))
>                         goto repoll;
>
>                 return 0;
>         }
>
>         return 1;
>
> so every receive completion will count against the limit set by the
> variable max.  The only way I could see the driver staying in the poll
> routine for a long time would be if it was only processing send
> completions, but even that doesn't actually seem bad: the driver is
> making progress handling completions.
>
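For context, since the argument below hinges on it: max is not
initialized anywhere in the quoted fragment.  In this style of
pre-2.6.24 NAPI poll routine it would presumably be derived from the
device quota and the softirq budget on entry, something like the
following (my assumption, not a line from the quoted patch):

        static int ipoib_poll(struct net_device *dev, int *budget)
        {
                struct ipoib_dev_priv *priv = netdev_priv(dev);
                /* max bounds the receive completions this poll may
                 * handle; if dev->quota is already 0 on entry, max
                 * starts at 0 and "while (max)" never polls the CQ */
                int max = min(*budget, dev->quota);

                /* ... repoll loop exactly as quoted above ... */
        }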
Is it possible that when one gets into the "rotting packet" case, the
quota is at or close to 0 (on ehca)?  If in that case it is 0 and the
netif_rx_reschedule() path wins (over netif_rx_schedule()), then the
driver keeps spinning, unable to process any packets, since the undo
parameter passed to netif_rx_reschedule() is 0.  If netif_rx_reschedule()
keeps winning for a few iterations, the receive queues fill up and start
dropping packets, causing the loss in performance.  If this is indeed
the case, then one option to try out would be to change the undo
parameter of netif_rx_reschedule() to either IPOIB_NUM_WC or even
dev->weight (see the sketch at the bottom of this mail).

> Shirley> It does help the performance from 1XX Mb/s to 7XX Mb/s, but
> Shirley> not the expected 3XXX Mb/s.
>
> Is that 3xxx Mb/sec the performance you see without the NAPI patch?
>
> Shirley> With the defer rotting packet process patch, I can see an
> Shirley> out-of-order packet problem at the TCP layer. Is it
> Shirley> possible there is a race somewhere causing two napi polls
> Shirley> at the same time? mthca seems to use irq auto affinity,
> Shirley> but ehca uses round-robin interrupts.
>
> I don't see how two NAPI polls could run at once, and I would expect
> worse effects from them stepping on each other than just out-of-order
> packets.  However, the fact that ehca does round-robin interrupt
> handling might lead to out-of-order packets just because different
> CPUs are all feeding packets into the network stack.
>
>  - R.
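As a reference for why undo == 0 can wedge things: in the pre-2.6.24
NAPI API, netif_rx_reschedule() credits its second argument straight
back to the device quota before putting the device back on the poll
list.  Paraphrasing include/linux/netdevice.h of that era:

        static inline int netif_rx_reschedule(struct net_device *dev, int undo)
        {
                if (netif_rx_schedule_prep(dev)) {
                        unsigned long flags;

                        dev->quota += undo;  /* undo == 0 leaves quota at 0 */

                        local_irq_save(flags);
                        list_add_tail(&dev->poll_list,
                                      &__get_cpu_var(softnet_data).poll_list);
                        __raise_softirq_irqoff(NET_RX_SOFTIRQ);
                        local_irq_restore(flags);
                        return 1;
                }
                return 0;
        }

So the experiment suggested above would amount to a one-line change in
the quoted poll routine (a sketch only; IPOIB_NUM_WC is the per-poll
batch size used in the patch, and dev->weight would be the more
generous choice):

        -                   netif_rx_reschedule(dev, 0))
        +                   netif_rx_reschedule(dev, IPOIB_NUM_WC))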