While working on the OMPI udapl btl, I noticed some "interesting" behavior. OFA udapl wants the evd queue length to be a power of 2, and then subtracts 1 for bookkeeping (i.e., so that the internal head and tail pointers never touch except when the ring is empty). When queried, OFA udapl reports this reduced number as the queue length, not the size originally requested. This becomes interesting when the requested size is already a power of 2: a requested queue of length 256 reports a length of 255 when queried. I cannot tell from the udapl documentation whether it is acceptable to get a size smaller than the one you requested.
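To make the arithmetic concrete, here is a minimal C sketch of the sizing behavior as I understand it. This is a model, not the OFA udapl source: the function names are made up for illustration, and rounding a non-power-of-2 request up to the next power of 2 is my assumption about what "wants a power of 2" means.

```c
#include <stddef.h>

/* Hypothetical helper: smallest power of 2 that is >= n.
 * Assumption: the provider rounds requests up like this. */
static size_t model_round_up_pow2(size_t n)
{
    size_t p = 1;
    while (p < n)
        p <<= 1;
    return p;
}

/* Hypothetical model of the length reported on query: one slot is
 * held back so that head == tail only when the ring is empty. */
static size_t model_reported_qlen(size_t requested)
{
    return model_round_up_pow2(requested) - 1;
}
```

Under this model a request for 256 is reported back as 255, one less than what was asked for, while a request for 200 is also reported as 255, which is more than was asked for.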
During setup of an ompi connection, the udapl btl tries to make the local parameters sufficient to run the program. If we run with a small number of procs, the defaults are reset across all nodes. Since the defaults may not match exactly, the udapl btl will try to resize the queue (in this example, because 256 > 255). When the call finally reaches the OFA udapl code, it bails, because that code checks whether the new size is less than or equal to the current size + 1.

So if the OFA udapl code is working as designed, then the ompi udapl btl code needs the proper boundary check against size + 1 (for which I have a patch). If not, then the OFA code needs to be changed either to round up to the next power of 2 when given a power of 2, or to return size + 1 when queried.

So, which one is correct?

Thanks,
Jon

BTW, if anyone is interested, I have cut dapltest down to a very basic test that shows this behavior 100% of the time. I can make the source available to whoever wants it.
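
For reference, the boundary condition described above can be sketched in C as follows. This is only an illustration under my reading of the behavior: the function names are hypothetical, and it is neither the actual OFA udapl code nor the ompi patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Model of the provider-side check as described above: a resize is
 * rejected when new_qlen <= current_qlen + 1. Hypothetical name. */
static bool model_provider_accepts_resize(size_t current_qlen, size_t new_qlen)
{
    return new_qlen > current_qlen + 1;
}

/* Naive caller-side test: resize whenever the wanted size exceeds
 * the queried size. With 256 wanted and 255 reported this fires. */
static bool model_btl_wants_resize_naive(size_t reported_qlen, size_t wanted_qlen)
{
    return wanted_qlen > reported_qlen;
}

/* Caller-side test with the size + 1 boundary: only ask for a
 * resize that the modeled provider check would accept. */
static bool model_btl_wants_resize_guarded(size_t reported_qlen, size_t wanted_qlen)
{
    return wanted_qlen > reported_qlen + 1;
}
```

With a reported length of 255 and a wanted length of 256, the naive test asks for a resize that the modeled provider check rejects, while the guarded test skips the call entirely.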