    Shirley> After completion handler receives the notification, don't
    Shirley> poll the CQ right away, and wait for more WCs in
    Shirley> CQ. That way can reduce the CQ lock overhead.

    Roland> That's interesting... it makes sense, and it argues in
    Roland> favor of deferring CQ polling to a kernel thread.  Of
    Roland> course this will hurt ping-pong latency.  Maybe it's
    Roland> better to just implement NAPI though...

Roland> And actually it argues against splitting the CQ, because having
Roland> one CQ increases the number of CQ entries that we have a chance to
Roland> poll at any one time, by lumping send and receive completions
Roland> together...

The assumption here is that one CPU is capable of handling all the
completions without limiting bandwidth. We have seen the opposite: at high
throughput we end up with one CPU pegged. The benefit you are after is
lower latency from handling both send and receive processing off the same
thread/interrupt, but you have to balance that against the bandwidth
limitation. If you think 4X has a bandwidth problem using IPoIB, wait until
12X comes out.

What per CPU utilization do you see on mthca on a multiple CPU machine
running peak bandwidth?

Roland>  - R.


Bernie King-Smith
IBM Corporation
Server Group
Cluster System Performance
[EMAIL PROTECTED]    (845)433-8483
Tie. 293-8483 or wombat2 on NOTES

"We are not responsible for the world we are born into, only for the world
we leave when we die. So we have to accept what has gone before us and work
to change the only thing we can, -- The Future." William Shatner

_______________________________________________
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general
