Hi Ralph,

Camille and I are also working on improving the restart capability of orte2. We are focusing on restarting individual processes (whereas Josh needs to restart the entire job). However, I guess most of the functionality is similar. Could we join your discussions on point 3?


On 27 Feb 08, at 21:47, Ralph Castain wrote:

Hi folks

Okay, the ORTE merge appears to have gone well and is now complete - you are
free to use the trunk.

A few caveats:

1. obviously, you will need to autogen/configure once you update. I
-strongly- recommend you rm -rf your install directory first; otherwise you
will definitely be hit with stale libraries left over from before this commit

2. this is a "drop" from the ORTE devel effort. As such, it is -not-
complete. There are several known issues, particularly with comm_spawn and
singleton comm_spawn in certain environments and scenarios. I have a "fix"
already done and ready to be applied for the comm_spawn problems, but I want
to test it some more in the morning before committing it to the trunk - and
I didn't want to delay this merge any longer.

3. we know that checkpoint/restart is currently broken. Josh and I have
discussed a couple of options for repairing it, and he will look at it as
soon as he has a chance. It isn't a big problem - we just need to decide which
option he would prefer to pursue.

The remaining ORTE scalability work should be moving into the trunk over the
next few weeks (I will be on vacation 3/7-14, so it will likely take through
March). We do not anticipate any API changes or framework adds/deletes the
rest of the way - there will be a few new components added to existing
frameworks, some revamp of the logic in a few places, etc.

I will try to cover all the changes in one or two notes over the next few
days to avoid carpal tunnel. Please feel free to ask questions and I'll do
my best to provide answers.

Thanks again for the cooperation tonight...

devel mailing list
