On Wed, 26 Aug 2009 18:24:20 -0600 Jason Gunthorpe <[email protected]> wrote:
> On Wed, Aug 26, 2009 at 04:40:26PM -0700, Ira Weiny wrote:
>
> > Of course!  :-)  But first I would like to mention some numbers from the
> > prototype code I have.  When running on a small fabric the additional
> > overhead of thread creation actually slows down the scan.  :-(
>
> It seems strange to me to thread something like this (and a lot of hard
> work)..
>
> FSM multiplexing the recv path usually gives much better performance,
> something like net discovery is quite easy..

Using the original algorithm and data structures lent itself to threading.
Now that I am neck deep in all this, I have thought that rewriting it all
might be easier.

> main loop:
>   fill tx queue from next list
>   receive replies and correlate with next list

This would still need additional code (or additional synchronization in the
API to libibnetdisc) if you wanted a user app to be multi-threaded.  Someone
has to be in charge of receiving all replies on that ibmad_port object and
handing them to the proper owner.  Of course one could open multiple
ibmad_port objects, but how is the app writer to know to do that?  By digging
through the code to find out that libibnetdisc is consuming all the replies?

This is what got me on this in the first place.  smp_query_via (_do_madrpc)
is not thread safe.  Threading was the easy way to deal with multiple blocking
queries on the fabric.  Changing _do_madrpc to be thread safe allowed a very
quick multithreaded implementation on top of the current algorithm, which
blocked on multiple queries.  I did not have to form the queries myself; it
was easy...  (I had that working months ago.)

Given that we don't want to change libibmad, things got more complicated, and
your algorithm seems much better...  (except [see below])

Also, I feel that someone down the road might fall into the same trap that I
did, thinking that smp_query_via is thread safe, and I would like to fix that.

>
>     each entry:
>       add to next list additional ports
>
> Repeat until dead.
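Jason's main loop can be made concrete with a small single-threaded sketch in C. It is not libibnetdisc code: the fabric is simulated in memory, and the hypothetical helpers send_one() and recv_one() stand in for what would be umad_send()/umad_recv() calls on a real port. All names here (fabric_t, disc_t, discover(), MAX_OUTSTANDING) are illustrative assumptions. The loop keeps up to MAX_OUTSTANDING queries in flight, matches each reply to its request by transaction ID (the simulated replies deliberately arrive out of send order), turns a node reply into port queries, and turns an enabled-port reply into a node query.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_NODES 16
#define MAX_PORTS 8
#define MAX_OUTSTANDING 4   /* send window: queries in flight at once */
#define QUEUE_LEN 512       /* ample for MAX_NODES * MAX_PORTS links */

/* Hypothetical in-memory fabric: peer[n][p] is the node behind port p
 * (1-based) of node n, or -1 if that port is down. */
typedef struct {
    int num_ports[MAX_NODES];
    int peer[MAX_NODES][MAX_PORTS + 1];
} fabric_t;

typedef enum { QUERY_NODE, QUERY_PORT } action_kind;
typedef struct { action_kind kind; int node; int port; } action_t;
typedef struct { int in_use; uint32_t tid; action_t act; } pending_t;

typedef struct {
    action_t next[QUEUE_LEN];           /* the 'next list', FIFO */
    int head, tail;
    pending_t pending[MAX_OUTSTANDING]; /* sent, awaiting a reply */
    uint32_t wire[MAX_OUTSTANDING];     /* simulated in-flight TIDs */
    int outstanding, wire_len, found, queries;
    uint32_t next_tid;
    int seen[MAX_NODES];
} disc_t;

static void fabric_init(fabric_t *f) {
    memset(f, 0, sizeof *f);
    for (int i = 0; i < MAX_NODES; i++)
        for (int p = 0; p <= MAX_PORTS; p++)
            f->peer[i][p] = -1;
}

/* Cable port pa of node a to port pb of node b. */
static void link_ports(fabric_t *f, int a, int pa, int b, int pb) {
    f->peer[a][pa] = b;
    f->peer[b][pb] = a;
}

static void push(disc_t *d, action_t a) {
    assert(d->tail < QUEUE_LEN);
    d->next[d->tail++] = a;
}

/* Stand-in for umad_send(): move one action into a free pending slot. */
static void send_one(disc_t *d) {
    for (int i = 0; i < MAX_OUTSTANDING; i++) {
        if (!d->pending[i].in_use) {
            d->pending[i] = (pending_t){1, d->next_tid, d->next[d->head++]};
            d->wire[d->wire_len++] = d->next_tid++;
            d->outstanding++;
            d->queries++;
            return;
        }
    }
    abort(); /* caller guarantees a free slot */
}

/* Stand-in for umad_recv(): replies come back newest-first (out of send
 * order), so each one is matched to its request by TID. */
static action_t recv_one(disc_t *d) {
    uint32_t tid = d->wire[--d->wire_len];
    for (int i = 0; i < MAX_OUTSTANDING; i++) {
        if (d->pending[i].in_use && d->pending[i].tid == tid) {
            d->pending[i].in_use = 0;
            d->outstanding--;
            return d->pending[i].act;
        }
    }
    abort(); /* reply with an unknown TID */
}

/* Single-threaded discovery: returns the number of distinct nodes found. */
static int discover(const fabric_t *f, int start, disc_t *d) {
    memset(d, 0, sizeof *d);
    push(d, (action_t){QUERY_NODE, start, 0});
    while (d->head < d->tail || d->outstanding > 0) {
        /* fill tx queue from next list, up to the send window */
        while (d->outstanding < MAX_OUTSTANDING && d->head < d->tail)
            send_one(d);
        /* receive one reply and correlate it with its request */
        action_t a = recv_one(d);
        if (a.kind == QUERY_NODE) {
            if (d->seen[a.node])
                continue;               /* reached again via another link */
            d->seen[a.node] = 1;
            d->found++;
            for (int p = 1; p <= f->num_ports[a.node]; p++)
                push(d, (action_t){QUERY_PORT, a.node, p});
        } else if (f->peer[a.node][a.port] >= 0) {
            /* enabled port: query the node on the other end */
            push(d, (action_t){QUERY_NODE, f->peer[a.node][a.port], 0});
        }
    }
    return d->found;
}
```

In this sketch duplicates are dropped only when a node reply arrives, on the assumption that, as in a real scan, a node's identity is only known from its reply; a node reachable over several links is therefore queried more than once. No locking is needed because one thread owns every send and receive.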
>
> Where a 'next list' would be a set of actions along the lines of
> 'query node' or 'query port' the action on a 'query node' completion
> is to generate 'query port' next list items for all the ports, and on
> 'query port' completion is to generate 'query node' items for all
> enabled ports..
>
> libumad is nonblocking, parallel, etc...

Yes, and libibmad layers on top of it an easier interface to issue common
queries.  Why should we ask the user to re-implement that code?  For example,
mad_rpc now handles redirection.  My implementation does not yet, so now I
have to handle that on my own as well...  :-(

Ira

>
> Jason

-- 
Ira Weiny
Math Programmer/Computer Scientist
Lawrence Livermore National Lab
925-423-8008
[email protected]
_______________________________________________
general mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
