Pasha tells me he'll be able to review the patch next week, so I'll
wait to commit until then. I added the patch to the ticket, just so
that it doesn't get lost.
Any other reviewers would be welcome... :-)
On May 14, 2008, at 5:39 PM, Jeff Squyres wrote:
https://svn.open-mpi.org/trac/ompi/ticket/1285 turned out to be more
complicated than expected (of course). The startup in the openib
btl mixes resource discovery and initialization (vs. doing
discovery, deciding which hcas/ports/lids to use, initializing them,
and then assigning resources to them), which made putting the BSRQ
receive_queues value in the INI file harder than it first looked --
the code got a bit hairy. I would appreciate some
more eyes on this code before I commit it; thanks.
---------
The attached patch has two goals:
- allow receive_queues to be specified in the INI file
- detect when multiple different receive_queues values are specified
  and gracefully abort
However, accomplishing these goals ran into multiple difficulties.
By putting receive_queues in the INI file:
1. we may not find the value until we've already traversed multiple
   HCAs
2. we may find multiple different receive_queues values
But since the openib btl initializes as it discovers each
HCA/port/LID (including the BSRQ data), if we find a new
receive_queues value late in the discovery process, then all the BSRQ
data that was previously initialized will likely be invalid. So I had
to defer all the BSRQ initialization until after the rest of the
discovery / initialization process.
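To illustrate the conflict check during discovery, here is a minimal sketch in C. It is not the code from the patch: rq_tracker, rq_tracker_note(), and the RQ_* codes are hypothetical names invented for this example, and it assumes that two values conflict whenever their strings differ.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical return codes for this sketch (not Open MPI constants). */
enum { RQ_OK = 0, RQ_CONFLICT = 1 };

/* Remembers the first receive_queues string found in the INI file. */
struct rq_tracker {
    const char *first_value;  /* NULL until some HCA's section supplies one */
};

/* Called once per discovered HCA with that HCA's INI value (or NULL). */
static int rq_tracker_note(struct rq_tracker *t, const char *ini_value)
{
    assert(NULL != t);
    if (NULL == ini_value) {
        return RQ_OK;                 /* this HCA's section sets nothing */
    }
    if (NULL == t->first_value) {
        t->first_value = ini_value;   /* first value seen, possibly late */
        return RQ_OK;
    }
    /* Any later value must match the first one exactly. */
    return (0 == strcmp(t->first_value, ini_value)) ? RQ_OK : RQ_CONFLICT;
}
```

Because the tracker only records strings, nothing BSRQ-related has to be initialized until every HCA has been visited.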
Additionally, note that if the user specifies the MCA parameter
btl_openib_receive_queues, it trumps whatever was in the INI file.
So in this case, there can never be a receive_queues conflict.
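The precedence rule can be sketched as follows. This is an illustration only: rq_choose(), struct rq_choice, and the rq_source values are hypothetical names, and the fallback to a built-in default when neither source supplies a value is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Where the final receive_queues string came from (hypothetical names). */
enum rq_source { RQ_FROM_DEFAULT, RQ_FROM_INI, RQ_FROM_MCA };

struct rq_choice {
    enum rq_source source;
    const char *value;
};

/* Pick the receive_queues string to use. A user-supplied MCA parameter
 * wins outright, so INI values are never consulted (and cannot conflict)
 * in that case. The built-in default is an assumption of this sketch. */
static struct rq_choice rq_choose(const char *mca_value,
                                  const char *ini_value,
                                  const char *default_value)
{
    struct rq_choice c;

    assert(NULL != default_value);
    if (NULL != mca_value) {
        c.source = RQ_FROM_MCA;
        c.value = mca_value;
    } else if (NULL != ini_value) {
        c.source = RQ_FROM_INI;
        c.value = ini_value;
    } else {
        c.source = RQ_FROM_DEFAULT;
        c.value = default_value;
    }
    return c;
}
```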
The attached patch does the following (Jon wrote part of this, too):
- some random style cleanup
- fix a few minor memory leaks
- adapt _ini.c to accept the "receive_queues" field in the file
- move 90% of _setup_qps() from _ini.c to _component.c
- move what was left of _setup_qps() into the main
_register_mca_params() function
- adapt init_one_hca() to detect conflicting receive_queues values
from the INI file
- after the _component.c loop calling init_one_hca():
  - call setup_qps() to parse the final receive_queues string value
  - traverse all resulting btls and initialize their HCAs (if they
    weren't already): set up some lists and call prepare_hca_for_use()
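The post-discovery step in the last item might look roughly like the sketch below. It is a simplified stand-in, not the patch itself: struct hca_sketch, count_queue_specs(), and finish_hca_init() are hypothetical, and the sketch assumes the receive_queues string is a colon-separated list of queue specifications.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for per-HCA state. */
struct hca_sketch {
    int prepared;  /* nonzero once the HCA has been set up for use */
    int num_qps;   /* number of queue specs applied to this HCA */
};

/* Count queue specs, assuming they are separated by ':' characters. */
static int count_queue_specs(const char *receive_queues)
{
    const char *p;
    int n;

    if (NULL == receive_queues || '\0' == *receive_queues) {
        return 0;
    }
    n = 1;
    for (p = receive_queues; '\0' != *p; ++p) {
        if (':' == *p) {
            ++n;
        }
    }
    return n;
}

/* After discovery: parse the final string once, then prepare every HCA
 * that was not already prepared. Returns how many HCAs were prepared. */
static int finish_hca_init(struct hca_sketch *hcas, size_t num_hcas,
                           const char *final_receive_queues)
{
    int num_qps = count_queue_specs(final_receive_queues);
    int newly_prepared = 0;
    size_t i;

    assert(NULL != hcas || 0 == num_hcas);
    for (i = 0; i < num_hcas; ++i) {
        if (!hcas[i].prepared) {
            hcas[i].num_qps = num_qps;
            hcas[i].prepared = 1;
            ++newly_prepared;
        }
    }
    return newly_prepared;
}
```

Parsing once after the loop means every HCA sees the same final value, regardless of the order in which the INI sections were encountered.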
I tested this code on a dual-HCA system where I artificially put in
differing receive_queues values in the INI file for the two
different types of HCAs that I have and it all seemed to work. But
I'd appreciate some more eyes on the code to sanity check.
If I hear nothing back by COB tomorrow, I'll commit. Thanks.
--
Jeff Squyres
Cisco Systems