[OMPI devel] bug in openib btl_remove_procs

2008-03-07 Thread Jeff Squyres
I noticed that when btl_remove_procs is invoked on the openib BTL
(e.g., when you "mpirun --mca btl self,openib ...", an openib endpoint
will be removed because self's exclusivity edges it out), the openib
remove_procs() function does not remove the corresponding endpoint
from the mca_btl_openib_proc_t->proc_endpoints[] array, even though
the endpoint was OBJ_RELEASE'd (and freed).
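
In rough terms, the fix looks like this (paraphrased sketch; the loop
and field names are from memory and may not exactly match the actual
commit):

/* Sketch only: when remove_procs() releases an endpoint, also clear
 * the stale entry in the owning proc's endpoint array so that nothing
 * later dereferences freed memory.  Field names are from memory. */
size_t i;
for (i = 0; i < ib_proc->proc_endpoint_count; ++i) {
    if (ib_proc->proc_endpoints[i] == endpoint) {
        ib_proc->proc_endpoints[i] = NULL;
        break;
    }
}
OBJ_RELEASE(endpoint);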


This was causing a problem for me on the cpc branch because we  
actually examine that array.  Can someone sanity check this commit?   
(it's on the cpc branch; it's apparently not a problem on the current  
trunk -- if it's ok, we can bring it in when the cpc stuff comes back  
to the trunk)


https://svn.open-mpi.org/trac/ompi/changeset/17784

Thanks.

--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] Fault tolerance

2008-03-07 Thread Aurélien Bouteiller

We now use the errmgr.

Aurelien

On Mar 6, 2008, at 13:38, Aurélien Bouteiller wrote:


Aside from what Josh said, we are working right now at UTK on orted/MPI
recovery (without killing/respawning everything). So far we have had no
use for the errmgr, but I'm quite sure it would be the smartest place
to put all the mechanisms we are trying now.

Aurelien
On Mar 6, 2008, at 11:17, Ralph Castain wrote:


Ah - ok, thanks for clarifying! I'm happy to leave it around, but
wasn't sure if/where it fit into anyone's future plans.

Thanks
Ralph



On 3/6/08 9:13 AM, "Josh Hursey"  wrote:


The checkpoint/restart work that I have integrated does not respond to
failure at the moment. If a failure happens, I want ORTE to terminate
the entire job. I will then restart the entire job from a checkpoint
file. This follows the 'all fall down' approach that users typically
expect when using a global C/R technique.

Eventually I want to integrate something better where I can respond to
a failure with a recovery from inside ORTE. I'm not there yet, but
hopefully in the near future.

I'll let the UTK group talk about what they are doing with ORTE, but I
suspect they will be taking advantage of the errmgr to help respond to
failure and restart a single process.


It is important to consider in this context that we do *not* always
want ORTE to abort whenever it detects a process failure. Aborting is
the default mode for MPI applications (MPI_ERRORS_ARE_FATAL), and
should be supported. But there is another mode in which we would like
ORTE to keep running, in order to conform with MPI_ERRORS_RETURN:
http://www.mpi-forum.org/docs/mpi-11-html/node148.html
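
For illustration (a toy example of mine, not from our test suite), an
application running in that second mode looks something like this:

/* Toy example: with MPI_ERRORS_RETURN installed, the application
 * expects error codes back instead of having the whole job aborted. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rc, buf = 42;
    MPI_Init(&argc, &argv);

    /* Ask MPI to return errors to us rather than aborting the job. */
    MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

    /* If the peer has failed, a fault-aware runtime would return an
     * error here and let the program decide how to react. */
    rc = MPI_Send(&buf, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    if (MPI_SUCCESS != rc) {
        fprintf(stderr, "send failed (rc=%d); trying to recover\n", rc);
    }

    MPI_Finalize();
    return 0;
}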

It is known that certain standards-conformant MPI "fault tolerant"
programs do not work in Open MPI for various reasons, some in the
runtime and some external. Here we are mostly talking about the
disconnected fates of intra-communicator groups. I have a test in the
ompi-tests repository that illustrates this problem, but I do not have
time to fix it at the moment.
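
The shape of the pattern in question is roughly this (my own sketch,
not the actual ompi-tests program): two jobs connect dynamically and
then disconnect, after which a failure in one should not abort the
other.

/* Sketch only -- not the actual test.  After MPI_Comm_disconnect the
 * two jobs have disconnected fates: a crash in the child job should
 * not be fatal to the parent job.  "worker" is a placeholder binary. */
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Comm children;
    MPI_Init(&argc, &argv);
    MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

    /* Parent spawns a child job over an intercommunicator. */
    MPI_Comm_spawn("worker", MPI_ARGV_NULL, 2, MPI_INFO_NULL,
                   0, MPI_COMM_SELF, &children, MPI_ERRCODES_IGNORE);

    /* ... exchange work over the intercommunicator ... */

    /* Sever the connection; from here on the two jobs should be able
     * to fail independently. */
    MPI_Comm_disconnect(&children);

    MPI_Finalize();
    return 0;
}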


So, in short: keep the errmgr around for now. I suspect we will be
using it, and possibly tweaking it, in the nearish future.

Thanks for the observation.

Cheers,
Josh

On Mar 6, 2008, at 10:44 AM, Ralph Castain wrote:


Hello

I've been doing some work on fault response within the system, and
finally realized something I should probably have seen a while back.
Perhaps I am misunderstanding somewhere, so forgive the ignorance if
so.

When we designed ORTE some time in the deep, dark past, we had
envisioned that people might want multiple ways of responding to
process faults and/or abnormal terminations. You might want to just
abort the job, attempt to restart just that proc, attempt to restart
the job, etc. To support these multiple options, and to provide a
means for people to simply try new ones, we created the errmgr
framework.

Our thought was that a process and/or daemon would call the errmgr
when it detected something abnormal happening, and that the selected
errmgr component could then do whatever fault response was desired.
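
The intended flow was something like the following (purely
illustrative C; the names below are made up to show the division of
labor, not the actual errmgr interface):

/* Purely illustrative -- these names are invented, not the real
 * errmgr interface.  The point is that the detector only reports the
 * event; the selected component owns the policy. */
#include <stdio.h>

typedef struct {
    /* Policy hook supplied by the selected errmgr component:
     * abort the job, restart the proc, restart the job, etc. */
    void (*proc_aborted)(int vpid, int exit_code);
} errmgr_module_t;

static void abort_job(int vpid, int exit_code)
{
    fprintf(stderr, "proc %d failed (status %d): aborting job\n",
            vpid, exit_code);
    /* ... tell the daemons to kill everything ... */
}

/* Whichever component was selected at startup. */
static errmgr_module_t selected_errmgr = { abort_job };

/* A process or daemon reports the abnormal event and nothing more. */
static void daemon_detects_failure(int vpid, int exit_code)
{
    selected_errmgr.proc_aborted(vpid, exit_code);
}

int main(void)
{
    daemon_detects_failure(3, 1);  /* pretend vpid 3 exited with status 1 */
    return 0;
}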

However, I now see that the fault tolerance mechanisms inside of OMPI
do not seem to be using that methodology. Instead, we have hard-coded
a particular response into the system.

If we configure without FT, we just abort the entire job, since that
is the only errmgr component that exists.

If we configure with FT, then we execute the hard-coded C/R
methodology. This is built directly into the code, so there is no
option as to what happens.

Is there a reason why the errmgr framework was not used? Did the FT
team decide that this was not a useful tool to support multiple FT
strategies? Can we modify it to better serve those needs, or is it
simply not feasible?

If it isn't going to be used for that purpose, then I might as well
remove it. As things stand, there really is no purpose served by the
errmgr framework - might as well replace it with just a function call.

Appreciate any insights
Ralph




Re: [OMPI devel] t_win failures if openib btl is not loaded

2008-03-07 Thread Jeff Squyres
I filed this as https://svn.open-mpi.org/trac/ompi/ticket/1233 so that  
it would not be forgotten.



On Feb 18, 2008, at 10:53 AM, Tim Prins wrote:


Hi all,

This is a bit strange, so I thought I'd ping the group before digging
any further.

The onesided test 't_win' is failing for us (specifically the
'FENCE_EPOCH' part). It is only failing when we are NOT using openib.

But here is where it gets strange. The test is run twice: once where
the window memory is allocated using MPI_Alloc_mem, and once where it
is allocated using malloc. When we use MPI_Alloc_mem, it fails. Using
malloc, it works just fine all the time.
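
Condensed, the two variants of the test body look roughly like this
(my own sketch, not the actual t_win source):

/* Condensed sketch of the two allocation paths -- not the actual
 * t_win source, just the shape of what the test exercises. */
#include <mpi.h>
#include <stdlib.h>

#define WIN_SIZE 4096

void run_fence_epoch(int use_alloc_mem)
{
    void *base;
    MPI_Win win;

    if (use_alloc_mem) {
        /* Variant that fails for us unless openib is loaded. */
        MPI_Alloc_mem(WIN_SIZE, MPI_INFO_NULL, &base);
    } else {
        /* Variant that always passes. */
        base = malloc(WIN_SIZE);
    }

    MPI_Win_create(base, WIN_SIZE, 1, MPI_INFO_NULL, MPI_COMM_WORLD, &win);
    MPI_Win_fence(0, win);
    /* ... MPI_Put/MPI_Get traffic in the real test ... */
    MPI_Win_fence(0, win);
    MPI_Win_free(&win);

    if (use_alloc_mem) {
        MPI_Free_mem(base);
    } else {
        free(base);
    }
}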

That is, I do "mpirun -np 1 -mca btl <btls> ./t_win" and get:

btls        | Using MPI_Alloc_mem | Using malloc
------------+---------------------+-------------
self        |        Fail         |     Pass
openib,self |        Pass         |     Pass
sm,self     |        Fail         |     Pass
tcp,self    |        Fail         |     Pass

But we are only using one proc, so the only transport ever used should
be 'self'. This makes me think something is going on with the mpool or
a related part of the code.

Any ideas as to what is going on here?

Thanks,

Tim






--
Jeff Squyres
Cisco Systems