Hi Ralph,
Camille and I are also working on improving the restart capability
of orte2. We are focusing on restarting individual processes (whereas
Josh needs to restart the entire job), but I guess most of the
functionality is similar. Could we join your discussions on point 3?
Aurelien
On Feb 27, 2008, at 21:47, Ralph Castain wrote:
Hi folks
Okay, the ORTE merge appears to have gone well and is now complete -
you are free to use the trunk.
A few caveats:
1. Obviously, you will need to autogen/configure once you update. I
-strongly- recommend you rm -rf your install directory first, as you
will definitely be hit with stale libraries from this commit.
2. This is a "drop" from the ORTE devel effort. As such, it is -not-
complete. There are several known issues, particularly with comm_spawn
and singleton comm_spawn in certain environments and scenarios. I have
a "fix" already done and ready to be applied for the comm_spawn
problems, but I want to test it some more in the morning before
committing it to the trunk - and I didn't want to delay this merge any
longer.
3. We know that checkpoint/restart is currently broken. Josh and I have
discussed a couple of options for repairing it, and he will look at it
as soon as he has a chance. It isn't a big problem - we just need to
decide which option he would prefer to pursue.
The remaining ORTE scalability work should be moving into the trunk
over the next few weeks (I will be on vacation 3/7-14, so it will
likely take through March). We do not anticipate any API changes or
framework adds/deletes the rest of the way - there will be a few new
components added to existing frameworks, some revamp of the logic in a
few places, etc.
I will try to cover all the changes in one or two notes over the next
few days to avoid carpal tunnel. Please feel free to ask questions and
I'll do my best to provide answers.
Thanks again for the cooperation tonight...
Ralph
_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel