Re: [OMPI devel] Open-MX vs OMPI 1.3 using MX internal symbols

George Bosilca Mon, 26 Jan 2009 09:10:02 -0500

There are several reasons these calls are there. Please read further.


On Jan 26, 2009, at 02:19 , Brice Goglin wrote:

Hello,

I am testing OpenMPI 1.3 over Open-MX. OpenMPI 1.2 works well but 1.3
does not load. This is caused by OMPI MX components now using some MX
internal symbols (mx_open_board, mx__get_mapper_state and
mx__regcache_clean). This looks like an ugly hack to me :) Why don't you
talk to Myricom about adding a proper interface in MX?

mx__regcache_clean is something that was added inside Open MPI by the Myricom people. So I guess they don't consider it too ugly.

mx_open_board is there so we can detect as quickly as possible whether the Myricom hardware is available on the machine or there are just libraries lying around. There is no other way to do so, except initializing the device, and then we are stuck with the current configuration (as we cannot modify the MX behavior at runtime).

mx__get_mapper_state is there to detect multiple links and compute the routes. There are two reasons for this:
- clusters with multiple MX interfaces. We want to have a one-to-one mapping between the cards, and not to rely on the mapper to do the right thing.
- clusters of clusters: we have to be able to figure out that even if two computers have MX they will not necessarily be able to communicate over it if they belong to 2 distinct clusters.

Building OMPI directly on Open-MX will disable the mapper_state stuff
because of missing MX internal headers. But, Open-MX is ABI compatible
with MX.

Unfortunately we access more than just the simple interface proposed in myriexpress.h. However, Open MPI can be built without these dependencies if the correct defines are not set. I guess this will work in most common cases (though not on grids, for example).

So building on MX and running on Open-MX requires the addition
of these symbols in Open-MX anyway. Before I do so, I'd like to know why
you actually need these symbols. Are mx_open_board and
mx__get_mapper_state used to get a "fabric identifier" in the context of
multi-clusters/grids?


Yes, you have half the answer.

If so, assuming it will ever matter for Open-MX,
is it ok to just have mx__get_mapper_state report the MAC address of the
mapper node and nothing else in the mapper_state structure?

Yes, the only thing we need is a unique identifier per cluster. We use the last 6 digits from the mapper MAC address.
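To make the idea concrete, here is a minimal sketch of deriving such an identifier. It is not the actual Open MPI code. It assumes that "last 6 digits" means the last six hexadecimal digits, i.e. the low-order three bytes of the 48-bit MAC address. The function names cluster_id_from_mapper_mac and same_fabric are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not an MX or Open MPI function): fold the
 * low-order three bytes (six hex digits) of a 48-bit mapper MAC
 * address into a 24-bit cluster identifier. */
static uint32_t cluster_id_from_mapper_mac(const uint8_t mac[6])
{
    assert(mac != NULL);
    return ((uint32_t)mac[3] << 16) |
           ((uint32_t)mac[4] << 8)  |
            (uint32_t)mac[5];
}

/* Assumption for this sketch: two nodes can talk over MX only if
 * they report the same mapper, i.e. the same cluster identifier.
 * Returns 1 if both MACs yield the same identifier, 0 otherwise. */
static int same_fabric(const uint8_t mapper_a[6], const uint8_t mapper_b[6])
{
    return cluster_id_from_mapper_mac(mapper_a) ==
           cluster_id_from_mapper_mac(mapper_b);
}
```

With a made-up mapper MAC of 00:60:dd:12:34:56, the identifier would be 0x123456. Two nodes whose mappers differ in any of the last three bytes would be treated as belonging to distinct clusters.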

Then, I guess mx__regcache_clean is called when the OMPI free hook wants to
clear the MX regcache, right?

As we don't really have access to the MX memory registration (which is good), we sometimes need to force the cleanup. This is why we're using this function.

Also, is there any plan to use any other MX internal symbols in the
future releases?

Depends on the bugs we're running into. So far so good, but there is no way to guarantee we will not need additional symbols.

By the way, is there a way to get more details from OMPI when it fails
to load a component because of missing symbols like this?
LD_DEBUG=verbose isn't very convenient :)

mca_component_show_load_errors is what you need there. Set it to something high depending on the level of verbosity you want to have.


  george.



thanks,
Brice Goglin

