There are several reasons these calls are there. Please read further.

On Jan 26, 2009, at 02:19 , Brice Goglin wrote:

Hello,

I am testing OpenMPI 1.3 over Open-MX. OpenMPI 1.2 works well but 1.3
does not load. This is caused by OMPI MX components now using some MX
internal symbols (mx_open_board, mx__get_mapper_state and
mx__regcache_clean). This looks like an ugly hack to me :) Why don't you
talk to Myricom about adding a proper interface in MX?

mx__regcache_clean is something that was added inside Open MPI by the Myricom people. So I guess they don't consider it all that ugly.

mx_open_board is there so we can detect as quickly as possible whether the Myricom hardware is available on the machine or there are just libraries lying around. There is no other way to do so, except initializing the device, and then we are stuck with the current configuration (as we cannot modify the MX behavior at runtime).

mx__get_mapper_state is there to detect multiple links and compute the routes. There are two reasons for this:
- clusters with multiple MX interfaces: we want a one-to-one mapping between the cards, rather than relying on the mapper to do the right thing.
- clusters of clusters: we have to be able to figure out that even if two computers have MX, they will not necessarily be able to communicate over it if they belong to two distinct clusters.

Building OMPI directly on Open-MX will disable the mapper_state stuff
because of missing MX internal headers. But, Open-MX is ABI compatible
with MX.

Unfortunately we access more than just the simple interface exposed in myriexpress.h. However, Open MPI can be built without these dependencies if the corresponding defines are not set. I guess this will work in most common cases (grids being one exception).

So building on MX and running on Open-MX requires the addition
of these symbols in Open-MX anyway. Before I do so, I'd like to know why
you actually need these symbols. Are mx_open_board and
mx__get_mapper_state used to get a "fabric identifier" in the context of
multi-clusters/grids?

Yes, you have half the answer.

If so, assuming it will ever matter for Open-MX,
is it ok to just have mx__get_mapper_state report the MAC address of the
mapper node and nothing else in the mapper_state structure?

Yes, the only thing we need is a unique identifier per cluster. We use the last 6 digits from the mapper MAC address.
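
As a rough illustration of the identifier described above, here is a minimal sketch in C. It is not the code used in Open MPI. The helper names are hypothetical, and it assumes that "last 6 digits" means the low 6 hexadecimal digits (24 bits) of the 48-bit mapper MAC address.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: pack a 6-byte MAC address into a 48-bit integer,
 * most significant byte first. */
static uint64_t mac_to_u64(const uint8_t mac[6])
{
    uint64_t value = 0;
    for (int i = 0; i < 6; i++) {
        value = (value << 8) | mac[i];
    }
    return value;
}

/* Assumption: "last 6 digits" is read as the low 6 hex digits (24 bits)
 * of the mapper MAC address. */
static uint32_t cluster_id_from_mapper_mac(const uint8_t mac[6])
{
    return (uint32_t)(mac_to_u64(mac) & 0xFFFFFFu);
}

/* Two hosts are treated as reachable over MX only if their mappers
 * yield the same cluster identifier. */
static bool same_fabric(const uint8_t mapper_a[6], const uint8_t mapper_b[6])
{
    return cluster_id_from_mapper_mac(mapper_a) ==
           cluster_id_from_mapper_mac(mapper_b);
}
```

Under this reading, an Open-MX implementation of mx__get_mapper_state would only need to report a mapper MAC address that is identical on all nodes of one fabric and different across fabrics.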

Then, I guess mx__regcache_clean is called when the OMPI free hook wants to
clear the MX regcache, right?

As we don't really have access to the MX memory registration (which is good), we sometimes need to force the cleanup. This is why we're using this function.

Also, is there any plan to use any other MX internal symbols in the
future releases?

That depends on the bugs we run into. So far so good, but there is no way to guarantee we will not need additional symbols.

By the way, is there a way to get more details from OMPI when it fails
to load a component because of missing symbols like this?
LD_DEBUG=verbose isn't very convenient :)

mca_component_show_load_errors is what you need there. Set it to something high depending on the level of verbosity you want to have.
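
Independently of that parameter, one can check by hand which of the symbols discussed in this thread a given library exports. The following is only a sketch using the standard dlopen/dlsym interface. The function name is hypothetical, and the library path you pass in is your own assumption about where MX or Open-MX is installed.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: count how many of the given symbols are absent
 * from the library at lib_path. Passing NULL inspects the running
 * program itself. Returns -1 if the library cannot be opened. */
static int count_missing_symbols(const char *lib_path,
                                 const char *const names[], int n)
{
    void *handle = dlopen(lib_path, RTLD_LAZY);
    if (handle == NULL) {
        fprintf(stderr, "dlopen(%s) failed: %s\n",
                lib_path ? lib_path : "(self)", dlerror());
        return -1;
    }

    int missing = 0;
    for (int i = 0; i < n; i++) {
        dlerror();                      /* clear any stale error */
        (void)dlsym(handle, names[i]);
        const char *err = dlerror();    /* non-NULL means lookup failed */
        if (err != NULL) {
            fprintf(stderr, "missing symbol %s: %s\n", names[i], err);
            missing++;
        }
    }

    dlclose(handle);
    return missing;
}
```

Called with the names mx_open_board, mx__get_mapper_state and mx__regcache_clean, it prints each one the library lacks. On older glibc systems the program must be linked with -ldl.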

  george.



thanks,
Brice Goglin

_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel
