Shirley> After completion handler receives the notification, don't
Shirley> poll the CQ right away, and wait for more WCs in
Shirley> CQ. That way can reduce the CQ lock overhead.
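As a rough illustration of the batching idea quoted above, here is a toy Python model. It is not the mthca or IPoIB code: `ToyCQ`, `run`, `defer_threshold` and `poll_budget` are names invented for this sketch, and the "lock" is only a counter. It assumes every poll call takes the CQ lock exactly once, however many entries it drains.

```python
from collections import deque


class ToyCQ:
    """Toy completion queue that counts how often its lock is taken.

    Hypothetical model for illustration only; not a real verbs API.
    """

    def __init__(self):
        self.entries = deque()
        self.lock_acquisitions = 0

    def post(self, wc):
        # Hardware side: a work completion lands in the queue.
        self.entries.append(wc)

    def poll(self, max_entries):
        # Assumption: one lock acquisition per poll call,
        # regardless of how many entries are drained.
        self.lock_acquisitions += 1
        drained = []
        while self.entries and len(drained) < max_entries:
            drained.append(self.entries.popleft())
        return drained


def run(num_completions, defer_threshold, poll_budget=16):
    """Poll only once `defer_threshold` completions are pending.

    Returns (completions handled, lock acquisitions).
    """
    cq = ToyCQ()
    handled = 0
    for wc in range(num_completions):
        cq.post(wc)
        if len(cq.entries) >= defer_threshold:
            handled += len(cq.poll(poll_budget))
    # Drain anything still pending at the end of the run.
    while cq.entries:
        handled += len(cq.poll(poll_budget))
    return handled, cq.lock_acquisitions


immediate = run(1000, defer_threshold=1)  # poll on every notification
deferred = run(1000, defer_threshold=8)   # wait for 8 pending completions
print(immediate, deferred)
```

In this model the immediate case takes the lock 1000 times for 1000 completions and the deferred case takes it 125 times, so the script prints `(1000, 1000) (1000, 125)`. The cost is also visible: with a threshold of 8, the first completion of each batch sits in the queue until seven more arrive, which is the ping-pong latency penalty mentioned in the reply quoted below.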
Roland> That's interesting... it makes sense, and it argues in
Roland> favor of deferring CQ polling to a kernel thread. Of
Roland> course this will hurt ping-pong latency. Maybe it's
Roland> better to just implement NAPI though...

Roland> And actually it argues against splitting the CQ, because having one CQ
Roland> increases the number of CQ entries that we have a chance to poll at
Roland> any one time, by lumping send and receive completions together...

The assumption you have here is that one CPU is capable of handling the
completions without impacting bandwidth. We have seen the opposite: we end
up with one CPU pegged at high throughput. The benefit you are working
toward is lower latency from handling both send and receive processing off
the same thread/interrupt, but you have to balance that against bandwidth
limitations. If you think 4X has a bandwidth problem using IPoIB, wait till
12X comes out.

What per-CPU utilization do you see on mthca on a multiple-CPU machine
running at peak bandwidth?

Roland> - R.

Bernie King-Smith
IBM Corporation
Server Group
Cluster System Performance
[EMAIL PROTECTED]
(845)433-8483
Tie. 293-8483 or wombat2 on NOTES

"We are not responsible for the world we are born into, only for the
world we leave when we die. So we have to accept what has gone before
us and work to change the only thing we can, -- The Future."
William Shatner