Pasha tells me he'll be able to review the patch next week, so I'll wait to commit until then. I added the patch to the ticket, just so that it doesn't get lost.

Any other reviewers would be welcome...  :-)

On May 14, 2008, at 5:39 PM, Jeff Squyres wrote:

https://svn.open-mpi.org/trac/ompi/ticket/1285 turned out to be more complicated than expected (of course). The startup in the openib btl mixes resource discovery and initialization (vs. doing discovery, deciding which HCAs/ports/LIDs to use, initializing them, and then assigning resources to them), which made putting the BSRQ receive_queues value in the INI file a bit more difficult than expected -- the code got a bit hairy. I would appreciate some more eyes on this code before I commit it; thanks.
---------

The attached patch solves two problems:

- allow receive_queues to be specified in the INI file
- detect when multiple different receive_queues are specified and gracefully abort

However, accomplishing these goals ran into multiple difficulties. By putting receive_queues in the INI file:

1. we may not find the value until we've already traversed multiple HCAs
2. we may find multiple different receive_queues values

But since the openib btl initializes as it discovers each HCA/port/LID (including the BSRQ data), if we find a new receive_queues value late in the discovery process, then all the BSRQ data that was previously initialized will likely be invalid. So I had to pull all the BSRQ initialization out until after the rest of the discovery / initialization process.

Additionally, note that if the user specifies the MCA parameter btl_openib_receive_queues, it trumps whatever was in the INI file. So in this case, there can never be a receive_queues conflict.
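
To make the precedence and conflict rules above concrete, here is a minimal sketch in C. It is not code from the patch: resolve_receive_queues(), the rq_status codes, and the flat array of per-HCA strings are all hypothetical names made up for illustration. It also assumes that an HCA with no receive_queues entry in the INI file simply accepts whatever value ends up being chosen, which may not match what the real code does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes; not taken from the patch. */
enum rq_status { RQ_OK = 0, RQ_CONFLICT = 1 };

/*
 * Pick the effective receive_queues string (illustrative only).
 *   mca_value  - value of btl_openib_receive_queues if the user set it, else NULL
 *   ini_values - one entry per discovered HCA; NULL means the INI file had
 *                no receive_queues field for that HCA (assumed to be neutral)
 *   dflt       - built-in default, used when nothing else is specified
 *   result     - receives the chosen string, or NULL on conflict
 */
static enum rq_status resolve_receive_queues(const char *mca_value,
                                             const char *const *ini_values,
                                             size_t num_hcas,
                                             const char *dflt,
                                             const char **result)
{
    const char *seen = NULL;
    size_t i;

    assert(NULL != dflt && NULL != result);

    /* A user-specified MCA parameter trumps the INI file, so there can
       never be a conflict in this case. */
    if (NULL != mca_value) {
        *result = mca_value;
        return RQ_OK;
    }

    /* Otherwise every HCA that has an INI value must agree with the
       first one that was found. */
    for (i = 0; i < num_hcas; ++i) {
        if (NULL == ini_values[i]) {
            continue;
        }
        if (NULL == seen) {
            seen = ini_values[i];
        } else if (0 != strcmp(seen, ini_values[i])) {
            *result = NULL;
            return RQ_CONFLICT;   /* caller would print help and abort */
        }
    }

    *result = (NULL != seen) ? seen : dflt;
    return RQ_OK;
}
```

Note that the conflict can only be reported after every HCA has been examined, which is why the BSRQ setup cannot happen inside the per-HCA discovery step.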

The attached patch does the following (Jon wrote part of this, too):

- some random style cleanup
- fix a few minor memory leaks
- adapt _ini.c to accept the "receive_queues" field in the file
- move 90% of _setup_qps() from _ini.c to _component.c
- move what was left of _setup_qps() into the main _register_mca_params() function
- adapt init_one_hca() to detect conflicting receive_queues values from the INI file
- after the _component.c loop calling init_one_hca():
 - call setup_qps() to parse the final receive_queues string value
 - traverse all resulting btls and initialize their HCAs (if they weren't already): set up some lists and call prepare_hca_for_use()
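
The last step in the list (walk the btls after discovery and prepare each HCA exactly once) can be sketched roughly as follows. This is only an illustration of the deferred-initialization idea: struct sketch_hca, struct sketch_btl, sketch_prepare_hca(), and sketch_finish_init() are hypothetical stand-ins, not the real openib btl structures or the real prepare_hca_for_use().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real HCA and btl module structures. */
struct sketch_hca {
    int prepared;   /* nonzero once per-HCA setup has run */
    int num_qps;    /* QP count taken from the final receive_queues value */
};

struct sketch_btl {
    struct sketch_hca *hca;   /* several btls (ports/LIDs) may share one HCA */
};

/* Stand-in for prepare_hca_for_use(): record the QP count, mark the HCA ready. */
static void sketch_prepare_hca(struct sketch_hca *hca, int num_qps)
{
    hca->num_qps = num_qps;
    hca->prepared = 1;
}

/*
 * Runs only after discovery has finished and the final receive_queues
 * value has been parsed into num_qps. Each HCA is prepared once, even if
 * several btls point at it. Returns how many HCAs were prepared.
 */
static size_t sketch_finish_init(struct sketch_btl *btls, size_t num_btls,
                                 int num_qps)
{
    size_t i, prepared = 0;

    assert(NULL != btls || 0 == num_btls);

    for (i = 0; i < num_btls; ++i) {
        struct sketch_hca *hca = btls[i].hca;
        if (!hca->prepared) {
            sketch_prepare_hca(hca, num_qps);
            ++prepared;
        }
    }
    return prepared;
}
```

The point of the flag is that the BSRQ-dependent setup happens in one place, after the receive_queues value can no longer change.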

I tested this code on a dual-HCA system where I artificially put in differing receive_queues values in the INI file for the two different types of HCAs that I have and it all seemed to work. But I'd appreciate some more eyes on the code to sanity check.

If I hear nothing back by COB tomorrow, I'll commit.  Thanks.

--
Jeff Squyres
Cisco Systems
<receive-queues.patch>


--
Jeff Squyres
Cisco Systems
