Re: [OMPI devel] How to add a schedule algorithm to the pml

2010-09-22 Thread Joshua Hursey
crcpw is a wrapper around the PML to support coordinated checkpoint/restart. It 
mostly just relays each call to the 'crcp' framework, which tracks the signature 
of messages traveling through the system.
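
A rough sketch of that interposition pattern, with made-up names (the real 
crcpw symbols and signatures differ):

  /* Sketch only: a wrapper "send" that records the message signature with a
   * tracking layer before handing the call to the real PML underneath
   * (e.g., ob1).  None of these names are actual Open MPI symbols. */
  typedef int (*pml_send_fn)(const void *buf, int count, int dst, int tag);

  static pml_send_fn wrapped_pml_send;   /* points at the real PML's send */

  static void crcp_track_send(int count, int dst, int tag)
  {
      /* stands in for the crcp framework's message-signature bookkeeping */
      (void)count; (void)dst; (void)tag;
  }

  static int crcpw_send(const void *buf, int count, int dst, int tag)
  {
      crcp_track_send(count, dst, tag);              /* record the signature */
      return wrapped_pml_send(buf, count, dst, tag); /* then the real PML    */
  }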

If you are not using the C/R feature, then I would not worry about the crcpw 
PML component (it is disabled automatically in non-CR builds).

-- Josh

On Sep 22, 2010, at 8:44 AM, Jeff Squyres wrote:

> On Sep 22, 2010, at 8:00 AM, Jeff Squyres wrote:
> 
>> crcpw: this is a fork of the ob1 PML; it adds some failover semantics.
> 
> Oops!  I messed this up:
> 
> bfo is the one I meant to write up there -- it's a fork of ob1; it adds 
> failover semantics.
> 
> I don't know exactly what crcpw is -- I suspect this is a Josh creation for 
> some kind of fault tolerance...?
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 


Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory
http://www.cs.indiana.edu/~jjhursey




Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r23936

2010-10-26 Thread Joshua Hursey
I like the idea of putting the old libevent back as a separate component, just 
for performance/correctness comparisons. I think it would be good for the 
trunk, but for the release branches just choose one version to ship (so we 
don't confuse users).

-- Josh

On Oct 26, 2010, at 6:27 AM, Jeff Squyres (jsquyres) wrote:

> Btw it strikes me that we could put the old libevent back as a separate 
> component for comparisons. 
> 
> Sent from my PDA. No type good. 
> 
> On Oct 26, 2010, at 6:20 AM, "Jeff Squyres"  wrote:
> 
>> On Oct 25, 2010, at 9:29 PM, George Bosilca wrote:
>> 
>>> 1. Not all processes deadlock in btl_sm_add_procs. The process that set up 
>>> the shared memory area goes forward and blocks later in a barrier.
>> 
>> Yes, I'm seeing the same thing (I didn't include all details like this in my 
>> post, sorry). I was running with -np 2 on a local machine and saw vpid=0 get 
>> stuck in opal_progress (because the first time through, seg_inited < 
>> n_local_procs).  vpid=1 increments seg_inited and therefore doesn't enter 
>> the loop that calls opal_progress(), and therefore continues on.
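
A minimal sketch of the rendezvous pattern being described, with illustrative 
names (this is not the actual btl_sm code):

  /* Each local process checks in by bumping a counter that lives in the
   * shared segment, then spins -- driving progress -- until every local
   * process has checked in.  The hang described above is this loop never
   * seeing seg_inited reach n_local_procs. */
  static void sm_rendezvous(volatile int *seg_inited, int n_local_procs,
                            void (*progress)(void)) /* opal_progress stand-in */
  {
      __sync_fetch_and_add(seg_inited, 1);   /* check in */
      while (*seg_inited < n_local_procs) {
          progress();                        /* drive events while waiting */
      }
  }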
>> 
>>> 2. All other processes loop around opal_progress until they get a 
>>> message from all other processes. The variable used for counting is somehow 
>>> updated correctly, but we still call opal_progress. I couldn't figure out 
>>> if we loop more than we should, or if opal_progress doesn't return. 
>>> However, both of these possibilities look very unlikely to me: the loop in 
>>> sm_add_procs is pretty straightforward, and I couldn't find any loops 
>>> in opal_progress. I wonder if some of the messages get lost during the exchange.
>> 
>> I had this problem, too, until I tried to use padb to get stack traces.  I 
>> noticed that when I ran padb, my blocked process un-blocked itself and 
>> continued.  After more digging, I determined that my blocked process was, in 
>> fact, blocked in poll() with an infinite timeout.  padb (or any signal at 
>> all) caused it to unblock and therefore continue.
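
(That behavior is plain POSIX, nothing Open MPI-specific: poll() with a -1 
timeout blocks indefinitely, and delivery of any signal makes it return -1 
with errno set to EINTR, after which the caller's loop can resume.  A minimal, 
compilable illustration:)

  #include <errno.h>
  #include <poll.h>

  /* Returns 1 if the indefinitely-blocking poll was interrupted by a signal
   * (which is what padb's signal did to the "blocked" process). */
  static int interrupted_by_signal(struct pollfd *fds, nfds_t nfds)
  {
      int rc = poll(fds, nfds, -1);   /* -1 timeout: block forever */
      return (rc < 0 && errno == EINTR) ? 1 : 0;
  }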
>> 
>>> 3. If I unblock the situation by hand, everything goes back to normal. 
>>> NetPIPE runs to completion but the performance is __really__ bad. On my 
>>> test machine I get around 2000 Mbps, when the expected value is at least 10 
>>> times more. Similar finding on the latency side: we're now at 1.65 
>>> microseconds, up from the usual 0.35 we had before.
>> 
>> It's a feature!
>> 
>> -- 
>> Jeff Squyres
>> jsquy...@cisco.com
>> For corporate legal information go to:
>> http://www.cisco.com/web/about/doing_business/legal/cri/
>> 
>> 
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 


Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory
http://users.nccs.gov/~jjhursey




Re: [OMPI devel] 1.5.x plans

2010-11-01 Thread Joshua Hursey
I think bringing the large changes from the trunk via patches in the CMR style 
is a non-starter, so I am glad that none of the options includes this; I am 
fine with any of the options proposed.

I would just like the development branch (whether it be v1.5.X or v1.7) to be 
released more often. The original intention was that it would happen once every 
month or two. We missed that mark by quite a lot, which only compounds the 
problem with this particular decision.

So I vote for any of the three options :)

-- Josh

On Oct 30, 2010, at 3:16 PM, Shamis, Pavel wrote:

> IMHO "B" will require a lot of attention from all developers/vendors, and 
> it may be quite a time-consuming task (btw, I think there are a couple of 
> openib btl changes that aren't on the list). So it would probably be good to 
> ask all btl (or other module/feature) maintainers directly.
> 
> Personally I prefer option C, then A.
> 
> My 0.02c 
> 
> - Pasha
> 
> On Oct 26, 2010, at 5:07 PM, Jeff Squyres wrote:
> 
>> On the teleconf today, two important topics were discussed about the 1.5.x 
>> series:
>> 
>> -
>> 
>> 1. I outlined my plan for a "small" 1.5.1 release.  It is intended to fix a 
>> small number of compilation and portability issues.  Everyone seemed to 
>> think that this was an ok idea.  I have done some tomfoolery in Trac to 
>> re-target a bunch of tickets -- those listed in 1.5.1 are the only ones that 
>> I intend to apply to 1.5.1:
>> 
>>   https://svn.open-mpi.org/trac/ompi/report/15
>> 
>> (there's one critical bug that I don't know how to fix -- I'm waiting for 
>> feedback from Red Hat before I can continue)
>> 
>> *** Does anyone have any other tickets/bugs that they want/need in a 
>> short-term 1.5.1 release?
>> 
>> -
>> 
>> 2. We discussed what to do for 1.5.2.  Because 1.5[.0] took so long to 
>> release, there's now a sizable divergence between the trunk and the 1.5 
>> branch.  The problem is that there are a number of wide-reaching new 
>> features on the trunk, some of which may (will) be difficult to bring to the 
>> v1.5 branch in a piecemeal fashion, including (but not limited to):
>> 
>> - Paffinity changes (including new hwloc component)
>> - --with-libltdl changes
>> - Ummunotify support
>> - Solaris sysinfo component
>> - Notifier improvements
>> - OPAL_SOS
>> - Common shared memory improvements
>> - Build system improvements
>> - New libevent
>> - BFO PML
>> - Almost all ORTE changes
>> - Bunches of checkpoint restart mo'betterness (including MPI extensions)
>> 
>> There seem to be 3 obvious options about moving forward (all assume that we 
>> do 1.5.1 as described above):
>> 
>>  A. End the 1.5 line (i.e., work towards transitioning it to 1.6), and then 
>> re-branch the trunk to be v1.7.
>>  B. Sync the trunk to the 1.5 branch en masse.  Stabilize that and call it 
>> 1.5.2.
>>  C. Do the same thing as A, but wait at least 6 months (i.e., give the 1.5 
>> series time to mature).
>> 
>> Most people (including me) favored B.  Rich was a little concerned that B 
>> spent too much time on maintenance/logistics when we could just be moving 
>> forward, and therefore favored either A or C.
>> 
>> Any opinions from people who weren't there on the call today?
>> 
>> -- 
>> Jeff Squyres
>> jsquy...@cisco.com
>> For corporate legal information go to:
>> http://www.cisco.com/web/about/doing_business/legal/cri/
>> 
>> 
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 


Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory
http://users.nccs.gov/~jjhursey




[OMPI devel] Fwd: [all-osl-users] Fwd: Servers reboot on Wednesday (11/24) morning starting at 8:00

2010-11-22 Thread Joshua Hursey
FYI.

Begin forwarded message:

> From: DongInn Kim 
> Date: November 21, 2010 10:46:23 PM CST
> To: "all-osl-us...@osl.iu.edu" 
> Subject: [all-osl-users] Fwd: Servers reboot on Wednesday (11/24) morning 
> starting at 8:00
> 
> Hi,
> 
> The OSL NFS server (deep-thought) will restart at 8:00 AM (EST) on Nov 24, 
> 2010.
> 
> While the server is rebooting, the following services will be unavailable.
> 
> - All the websites sitting on NFS.
>http://www.open-mpi.org
>http://www.osl.iu.edu
>http://www.scalabletools.org
>etc
> - License servers
> - Module features to load up or unload some specific versions of software
> 
> Please let me know if you have any concerns or questions about this outage.
> 
> Regards,
> 
> - DongInn
> 
>  Original Message 
> Subject: Servers reboot on Wednesday (11/24) morning starting at 8:00
> Date: Sun, 21 Nov 2010 10:15:42 -0500
> From: Shei, Shing-Shong 
> Organization: School of Informatics and Computing, Indiana University
> To: 
> Hi,
> 
> We are going to reboot a bunch of central servers on Wednesday morning,
> Nov 24, because some of them are still running a vulnerable kernel.  Here
> is the list of the machines affected and their corresponding list of
> people in charge.
> 
> mneme        -- Randy, Beth, and Felix
> deep-thought -- Andrew and DongInn
> carl, lenny  -- Fil and Alessandro
> 
> This is supposed to be a quick reboot for the new kernel to kick in.  So
> if this is a problem for you, please let us know ASAP and we can
> reschedule the machine you do not want rebooted on Wednesday.
> 
> Thanks and have a Happy Thanksgiving holiday.
> 
> Bruce
> 


Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory
http://users.nccs.gov/~jjhursey




Re: [OMPI devel] 1.5 plans

2010-11-30 Thread Joshua Hursey
Can you make a v1.7 milestone on Trac, so I can move some of my tickets?

Some are CMRs, but a couple are defects with fixes in development that 
cannot be moved to v1.5 without those CMRs.

Thanks,
Josh


On Nov 29, 2010, at 11:43 AM, Jeff Squyres wrote:

> I'm about 2 weeks late on this email; apologies.  SC and Thanksgiving got in 
> the way.
> 
> Per a discussion on the devel teleconf nearly 3 weeks ago, we have decided 
> what to do with the v1.5 series:
> 
> - 1.5.1 will be a bug fix release.  There's 2 blocker bugs right now that 
> need to be reviewed; those and the currently ready-to-commit major CMR are 
> all that is planned for 1.5.1.  Hopefully, they could be ready by tonight.
> 
> - 1.5.2 (and successive releases) will be "normal" feature releases.  There's 
> a bit of divergence between the trunk and the v1.5 branch, meaning that some 
> porting of features may be required to get over to the v1.5 branch (FWIW, I 
> think that many things will not require much porting at all -- but some 
> will).  Many of the CMRs filed against v1.5.2 are still relevant; *some* of 
> the features/bugs are still relevant.  We'll start [re-]examining the v1.5.2 
> tickets in more detail soon.  So feel free to apply to have your favorite 
> feature brought over to the v1.5 branch.  Bigger features may be kept in the 
> wings for v1.7 (e.g., the wholesale ORTE refresh for v1.5.x has been axed and 
> will wait until v1.7).  There is a bunch of affinity work occurring on the 
> trunk (and/or in hg branches) right now; we plan to bring all that stuff in 
> to the v1.5 series when ready (probably 3+ months at the earliest -- 
> especially with the December holidays delaying everything).  Once that's 
> done, we can then probably start thinking about wrapping up the v1.5 series, converting 
> it to its stable counterpart (1.6), and then branching for v1.7.
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 


Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory
http://users.nccs.gov/~jjhursey




Re: [OMPI devel] 1.5 plans

2010-11-30 Thread Joshua Hursey
(Insert jab at the definition of 'quickly' when talking about OMPI releases)

From the way I read Jeff's original email, it seems that we are trying to get 
v1.5 stable so we can start v1.7 in the next few months (3-5). The C/R 
functionality on the trunk is significantly different from that on v1.5 
(and more so with v1.4). So bringing these features over to the v1.5 branch 
will require a CMR that will look like re-syncing to the trunk (it requires 
the ORTE refresh, and a couple other odds and ends). Since the ORTE refresh 
was killed due to the size of the feature, so were the C/R features. So even 
though v1.5 is a feature branch, the C/R feature is locked out of it at the 
moment and pushed to v1.7.

So, from my perspective, there is now a push to hurry up on the v1.7 so users 
will have a release branch with the latest-n-greatest C/R functionality. 
Releasing v1.7 next summer would be fine with me, but pushing it further into 
the future seems bad to me.


As a side comment:
The stable branch is a great idea for the production side of the house since it 
is more carefully crafted and maintained. The feature branch is a great idea 
for the researchers in the group to gain exposure for new features, and 
enhancements on old features (many of these require changes to internal APIs 
and data structures). From my perspective, a slow moving feature branch is no 
longer that useful to the research community since it becomes more and more 
painful to synchronize the trunk and branch the longer it takes for the feature 
branch to stabilize for release. So the question often becomes: why bother? But 
this is a longer discussion for another time, maybe.

-- Josh

On Nov 30, 2010, at 9:36 AM, Terry Dontje wrote:

> On 11/30/2010 09:00 AM, Jeff Squyres wrote:
>> On Nov 30, 2010, at 8:54 AM, Joshua Hursey wrote:
>> 
>> 
>>> Can you make a v1.7 milestone on Trac, so I can move some of my tickets?
>>> 
>> Done.
>> 
> I have a question about Josh's recent ticket moves.  One of them mentions 1.5 
> is stabilizing quickly. Josh, can you clarify what you mean by quickly? I 
> think there will be a 1.5 release 3-6 months from now.  Does that fall 
> into your definition of quickly?
> 
> --td
>>> Some are CMRs, but a couple are defects with fixes in development that 
>>> cannot be moved to v1.5 without those CMRs.
>>> 
>>> Thanks,
>>> Josh
>>> 
>>> 
>>> On Nov 29, 2010, at 11:43 AM, Jeff Squyres wrote:
>>> 
>>> 
>>>> I'm about 2 weeks late on this email; apologies.  SC and Thanksgiving got 
>>>> in the way.
>>>> 
>>>> Per a discussion on the devel teleconf nearly 3 weeks ago, we have decided 
>>>> what to do with the v1.5 series:
>>>> 
>>>> - 1.5.1 will be a bug fix release.  There's 2 blocker bugs right now that 
>>>> need to be reviewed; those and the currently ready-to-commit major CMR are 
>>>> all that is planned for 1.5.1.  Hopefully, they could be ready by tonight.
>>>> 
>>>> - 1.5.2 (and successive releases) will be "normal" feature releases.  
>>>> There's a bit of divergence between the trunk and the v1.5 branch, meaning 
>>>> that some porting of features may be required to get over to the v1.5 
>>>> branch (FWIW, I think that many things will not require much porting at 
>>>> all -- but some will).  Many of the CMRs filed against v1.5.2 are still 
>>>> relevant; *some* of the features/bugs are still relevant.  We'll start 
>>>> [re-]examining the v1.5.2 tickets in more detail soon.  So feel free to 
>>>> apply to have your favorite feature brought over to the v1.5 branch.  
>>>> Bigger features may be kept in the wings for v1.7 (e.g., the wholesale 
>>>> ORTE refresh for v1.5.x has been axed and will wait until v1.7).  There is 
>>>> a bunch of affinity work occurring on the trunk (and/or in hg branches) 
>>>> right now; we plan to bring all that stuff in to the v1.5 series when 
>>>> ready (probably 3+ months at the earliest -- especially with the December 
>>>> holidays delaying everything).  Once that's done, we can then probably 
>>>> start thinking about wrapping up the v1.5 series, converting it to its 
>>>> stable counterpart (1.6), and then branching for v1.7.
>>>> 
>>>> -- 
>>>> Jeff Squyres
>>>> 
>>>> jsquy...@cisco.com
>>>> 
>>>> For corporate legal information go to:
>>>>

Re: [OMPI devel] Some questions about checkpoint/restart (16)

2010-12-22 Thread Joshua Hursey
Thanks for the questions. Keep them coming. I hope to have some time after the 
first of the year to make some progress on some of the others. But for this 
one, I think you are correct. Does the attached patch (created from the Open 
MPI trunk r24190) fix this particular issue? If so, I'll go ahead and commit it 
to the trunk and ask for it to be brought over to the release series.

Thanks again,
Josh



cr_init_thread.patch
Description: Binary data


On Dec 22, 2010, at 3:07 AM, Takayuki Seki wrote:

> 
> I have a new question about Checkpoint/Restart.
> 
> 16th question is as follows:
> 
> (16) If a program uses the MPI_Init_thread function,
> a checkpoint cannot be taken by the opal_cr_thread_fn thread.
> 
> Framework : ompi/mpi
> Component : c
> The source file   : ompi/mpi/c/init_thread.c
> The function name : MPI_Init_thread
> 
> 
> Here's the code that causes the problem:
> 
>  #define LOOP 60
> 
>  MPI_Barrier(MPI_COMM_WORLD);
>  if (rank == 0) {
>    printf("   rank=%d 60 seconds sleeping start   \n",rank); fflush(stdout);
>  }
>  for (i=0;i<LOOP;i++) { /* 60 seconds sleeping loop. */
> sleep(1);
> if (rank == 0) {
>   printf("   rank=%d loop=%d \n",rank,i); fflush(stdout);
> }
>  }
>  if (rank == 0) {
>    printf("   rank=%d 60 seconds sleeping finished \n",rank); fflush(stdout);
>  }
>  MPI_Barrier(MPI_COMM_WORLD);
>  if (rank == 0) {
>    printf("   rank=%d executes Finalize \n",rank); fflush(stdout);
>  }
>  MPI_Finalize();
> 
> 
> * This problem can be confirmed even when executing with one process.
> 
> mpiexec -n 1  ./a.out
> 
> * Take a checkpoint while the process is in the loop, which takes 60 
> seconds.
> 
> * Example of restart result of a program using MPI_Init.
> 
> -bash-3.2$ ompi-restart ompi_global_snapshot_20762.ckpt
>   rank=0 loop=42
>   rank=0 loop=43
>   rank=0 loop=44
>   rank=0 loop=45
>   rank=0 loop=46
>   rank=0 loop=47
>   rank=0 loop=48
>   rank=0 loop=49
>   rank=0 loop=50
>   rank=0 loop=51
>   rank=0 loop=52
>   rank=0 loop=53
>   rank=0 loop=54
>   rank=0 loop=55
>   rank=0 loop=56
>   rank=0 loop=57
>   rank=0 loop=58
>   rank=0 loop=59
>   rank=0 60 seconds sleeping finished
>   rank=0 executes Finalize
>   rank=0 program end
> 
>  Because the checkpoint was taken by the opal_cr_thread_fn function immediately
>  when the checkpoint operation was executed,
>  the program restarts from inside the loop.
> 
> * Example of restart result of a program using MPI_Init_thread.
> 
> -bash-3.2$ ompi-restart ompi_global_snapshot_20660.ckpt
>   rank=0 executes Finalize
>   rank=0 program end
> 
>  It is in the MPI_Barrier function after the loop
>  that the checkpoint was actually taken.
>  Therefore, the program restarts from the MPI_Barrier function.
> 
> 
> * I think the problem is that MPI_Init_thread does not execute 
> OPAL_CR_INIT_LIBRARY.
>  So opal_cr_thread_is_active remains false.
>  Therefore, the following while loop does not terminate.
> 
>/*
> * Wait to become active
> */
>while( !opal_cr_thread_is_active && !opal_cr_thread_is_done) {
>sched_yield();
>}
> 
> 
> * MPI_Init_thread uses OPAL_CR_ENTER_LIBRARY and OPAL_CR_EXIT_LIBRARY.
>  I think this is not correct.
>  Because MPI_Init_thread is an MPI initialization function,
>  I think that it should follow the same specification as MPI_Init.
> 
> 
> -bash-3.2$ cat t_mpi_question-16.c
> #include <stdio.h>
> #include <stdlib.h>
> #include <unistd.h>
> #include "mpi.h"
> 
> #define LOOP   60
> 
> int main(int ac,char **av)
> {
>  int i;
>  int rank,size;
>  int required,provided,provided_for_query;
> 
>  required = MPI_THREAD_SINGLE;
>  provided = -1;
>  provided_for_query = -1;
> #if defined(USE_INITTHREAD)
>  MPI_Init_thread(&ac,&av,required,&provided);
>  MPI_Query_thread(&provided_for_query);
> #else
>  MPI_Init(&ac,&av);
> #endif
>  MPI_Comm_rank(MPI_COMM_WORLD,&rank);
>  MPI_Comm_size(MPI_COMM_WORLD,&size);
>  if (rank == 0) {
>    printf("   rank=%d sz=%d required=%d provided=%d provided_for_query=%d \n"
>   ,rank,size,required,provided,provided_for_query); fflush(stdout);
>  }
> 
>  MPI_Barrier(MPI_COMM_WORLD);
> 
>  if (rank == 0) {
>    printf("   rank=%d 60 seconds sleeping start   \n",rank); fflush(stdout);
>  }
>  for (i=0;i<LOOP;i++) { /* 60 seconds sleeping loop. */
> sleep(1);
> if (rank == 0) {
>   printf("   rank=%d loop=%d \n",rank,i); fflush(stdout);
> }
>  }
>  if (rank == 0) {
>    printf("   rank=%d 60 seconds sleeping finished \n",rank); fflush(stdout);
>  }
> 
>  MPI_Barrier(MPI_COMM_WORLD);
>  if (rank == 0) {
>    printf("   rank=%d executes Finalize \n",rank); fflush(stdout);
>  }
>  MPI_Finalize();
>  if (rank == 0) {
>    printf("   rank=%d program end \n",rank); fflush(stdout);
>  }
>  return(0);
> }
> 
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 


Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory
http://users.nccs.gov/~jjhursey



Re: [OMPI devel] Some questions about checkpoint/restart (16)

2010-12-29 Thread Joshua Hursey
I committed the fix to the Open MPI trunk in r24194. I also asked for it to be 
brought over to the v1.4 and v1.5 branches (links below to those tickets).
  https://svn.open-mpi.org/trac/ompi/ticket/2671
  https://svn.open-mpi.org/trac/ompi/ticket/2672

I'll hopefully get to the other bugs you reported in the next couple months. 
Thanks for your patience.

Thanks again for the bug report.

-- Josh


On Dec 27, 2010, at 9:20 PM, Takayuki Seki wrote:

>> Thanks for the questions. Keep them coming. I hope to have some time after 
>> the first of the year to make some progress on some of the others. But for 
>> this one, I think you are correct. Does the attached patch (created from the 
>> Open MPI trunk r24190) fix this particular issue? If so, I'll go ahead and 
>> commit it to the trunk and ask for it to be brought over to the release 
>> series.
> 
> Thank you very much for your answer.
> I tried correcting this issue in the same way as your patch.
> The checkpoint was taken by opal_cr_thread_fn in my simple test program, as 
> expected.
> I think that your patch is correct, though I tested it only with my simple test 
> program.
> 
> Best regards,
> Takayuki Seki.
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 


Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory
http://users.nccs.gov/~jjhursey




Re: [OMPI devel] Change in communication between process (RMAPS)

2011-01-06 Thread Joshua Hursey
So I can point you to some of the work that I did while at Indiana University 
to support process migration in Open MPI in a coordinated manner. This should 
introduce you to some of the internal pieces that fit together to provide this 
support.

The transparent C/R in Open MPI webpage from IU is a good place to start:
   http://osl.iu.edu/research/ft/ompi-cr/index.php

From there you will find a link to a couple papers that should get you 
started. In particular "A Composable Runtime Recovery Policy Framework 
Supporting Resilient HPC Applications" discusses how the ORTE ErrMgr framework 
was used (initially) to provide process migration and automatic recovery. The 
actual code in the Open MPI trunk is slightly different. Instead of using 
different components of the ErrMgr framework (i.e., autor, crmig, stable) we 
just rolled it all into the existing components (i.e., hnp, orted, app). But 
all the code can be found in those component directories.

If you want a more general overview of the C/R system in Open MPI, I would 
start with the paper "The Design and Implementation of Checkpoint/Restart 
Process Fault Tolerance for Open MPI" which provides a high level view of the 
architecture (combined with the paper above you will have a fairly complete 
picture of the design). The C/R infrastructure currently only supports 
coordinated C/R, but was designed to be more extensible. So if you are looking 
into uncoordinated C/R techniques you may find that many of the C/R frameworks 
in Open MPI can be reused.

That should get you started. Let us know if you have any further questions.

-- Josh

On Jan 6, 2011, at 3:19 PM, Hugo Meyer wrote:

> Thanks for the reply and don't worry about the delay.
> 
> Yeah, I suppose it wouldn't be easy :(.
> But my final goal is what you are mentioning: to stop one particular 
> process (previously checkpointed), then migrate it to another place (node, 
> core, slot, etc.) and restart it there, but without making a coordinated 
> checkpoint. I just need to checkpoint processes in an uncoordinated way and 
> move them.
> 
> Where can I see something about process migration in the code, or something 
> that could guide me?
> 
> Greetings.
> 
> Hugo Meyer
> 
> 2011/1/6 Jeff Squyres 
> Sorry for the delay; you wrote while many of us were on vacation and we're 
> just now starting to catch up on past mails...
> 
> I'm not entirely sure what you're trying to do.  It sounds like you're trying 
> to replace one process with another.  That's quite complicated; there will be 
> a lot of changes required in the code base to do this.
> 
> - you'll need to notify the ORTE subsystem of the process change
> - this notification will likely need to span multiple processes
> - all MPI processes will need to quiesce their communications, disconnect, 
> and reconnect
> - ...and probably other things
> 
> That being said, you might be able to leverage some of the work that's been 
> done with checkpoint/restart/migration.  It's not entirely the same thing 
> that you're doing, but it's at least similar (quiesce networks, [pretend to] 
> move a process from location A to location B, etc.).
> 
> 
> 
> On Dec 28, 2010, at 7:03 AM, Hugo Meyer wrote:
> 
> > Hello to all.
> >
> > I'm new to the forum; at least it is the first time I have written.
> >
> > I'm working with Open MPI and I would like to do a little experiment: I will 
> > try to replace one process with another process.
> >
> > For example, assume that there are 2 processes that are communicating, say 
> > ranks 1 and 2, and there is a process of rank 3. I would like rank 3 (it 
> > could be assumed that its node is marked down in the initial hostfile) to 
> > take the place of rank 2, with rank 1 still thinking that it is 
> > communicating with rank 2 when in fact it is communicating with rank 3.
> >
> > I guess I'll have to modify structures such as orte_job_map_t and 
> > orte_proc_t, but I wanted to know if someone already has experience doing 
> > something similar, and can guide me at least.
> >
> > The communication between processes, in principle, would be irrelevant, so 
> > I will not need to use checkpoints/restarts for now.
> >
> > Greetings
> >
> > Hugo Meyer
> > ___
> > devel mailing list
> > de...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 
> 
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel


Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory
http://users.nccs.gov/~jjhursey




Re: [OMPI devel] Building Open MPI components outside of the sourcetree

2011-01-20 Thread Joshua Hursey
This may be a good candidate for: svn-root/tmp-public/

On Jan 20, 2011, at 12:42 PM, George Bosilca wrote:

> 
> On Jan 19, 2011, at 19:39 , Jeff Squyres (jsquyres) wrote:
> 
>> I'd rather not setup another SVN repo. Where should it go in the current 
>> OMPI SVN?
> 
> contrib?
> 
>  george.
> 
>> 
>> Sent from my PDA. No type good. 
>> 
>> On Jan 19, 2011, at 5:01 PM, "George Bosilca"  wrote:
>> 
>>> 
>>> On Jan 19, 2011, at 16:44 , Jeff Squyres wrote:
>>> 
>>>> Where should it be on the main web site?  
>>> 
>>> The Documentation section look like a good place to me.
>>> 
>>>> It needs to be in a repo somewhere; it may change over time.
>>> 
>>> The source code can be hosted at Indiana in the same way ompi-tests and 
>>> ompi-docs are hosted. However, I don't expect this code to drastically 
>>> change every other day, so providing a tar on a webpage should be good 
>>> enough. To be more precise on this point, as we only allow big modification 
>>> of the build system between major releases I expect to only maintain 3 
>>> template (stable, unstable and trunk).
>>> 
>>> george.
>>> 
>>>> 
>>>> 
>>>> On Jan 19, 2011, at 4:38 PM, George Bosilca wrote:
>>>> 
>>>>> This stuff should be directly on the main Open MPI website. Not as a link 
>>>>> to bitbucket, but as a webpage and 3 tars.
>>>>> 
>>>>> george.
>>>>> 
>>>>> On Jan 19, 2011, at 15:43 , Jeff Squyres wrote:
>>>>> 
>>>>>> Over the years, a few parties have wanted to be able to build Open MPI 
>>>>>> components outside of the official source tree (e.g., they are 
>>>>>> developing their own components outside of OMPI's SVN).  We've typically 
>>>>>> said "use --with-devel-headers", but a) never really provided a full 
>>>>>> example of how to do this, and b) never acknowledged that using 
>>>>>> --with-devel-headers is somewhat of a pain.
>>>>>> 
>>>>>> That ends now.  :-)
>>>>>> 
>>>>>> I am publishing a bitbucket repo of three example "tcp2" BTL components. 
>>>>>>  They are almost exact copies of the real TCP BTL component, but have 
>>>>>> had their configury updated to enable them to be built outside of the 
>>>>>> Open MPI source tree:
>>>>>> 
>>>>>> 1. A component for the v1.4 Open MPI tree
>>>>>> 2. A component for the v1.5/v1.6 Open MPI tree
>>>>>> 3. A component for the trunk/v1.7 (as of r24265) Open MPI tree
>>>>>> 
>>>>>> Each of these example components support the --with-devel-headers method 
>>>>>> as well as a new method: --with-openmpi-source=DIR (i.e., where you 
>>>>>> specify the corresponding Open MPI source directory, and the component 
>>>>>> builds against that).  
>>>>>> 
>>>>>> There are three different components because the configury between each 
>>>>>> of them are a bit different.  Look at the configure.ac in the version 
>>>>>> that you care about to see examples of how to get the relevant CPPFLAGS 
>>>>>> / CFLAGS that you need to build your component.
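
As an aside, a hypothetical out-of-tree build using the new flag might look 
like the following (the component directory name and the autogen step are 
assumptions for illustration; only --with-openmpi-source=DIR comes from the 
description above):

  cd tcp2-btl-for-v1.5/
  ./autogen.sh
  ./configure --with-openmpi-source=$HOME/src/openmpi-1.5
  make && make install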
>>>>>> 
>>>>>> Here's the bitbucket repo:
>>>>>> 
>>>>>> https://bitbucket.org/jsquyres/build-ompi-components-outside-of-source-tree
>>>>>> 
>>>>>> There's a top-level README.txt file in the repo that explains a bit more.
>>>>>> 
>>>>>> Enjoy!
>>>>>> 
>>>>>> -- 
>>>>>> Jeff Squyres
>>>>>> jsquy...@cisco.com
>>>>>> For corporate legal information go to:
>>>>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>>>>> 
>>>>>> 
>>>>>> ___
>>>>>> devel mailing list
>>>>>> de...@open-mpi.org
>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>> 
>>>>> 
>>>>> ___
>>>>> devel mailing list
>>>>> de...@open-mpi.org
>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>> 
>>>> 
>>>> -- 
>>>> Jeff Squyres
>>>> jsquy...@cisco.com
>>>> For corporate legal information go to:
>>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>>> 
>>>> 
>>>> ___
>>>> devel mailing list
>>>> de...@open-mpi.org
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> 
>>> 
>>> ___
>>> devel mailing list
>>> de...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> 
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 


Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory
http://users.nccs.gov/~jjhursey




Re: [OMPI devel] OMPI-MIGRATE error

2011-01-25 Thread Joshua Hursey
Can you try with the current trunk head (r24296)?
I just committed a fix for the C/R functionality in which restarts were getting 
stuck. This will likely affect the migration functionality, but I have not had 
an opportunity to test just yet.

Another thing to check is that prelink is turned off on all of your machines.
  https://upc-bugs.lbl.gov//blcr/doc/html/FAQ.html#prelink

Let me know if the problem persists, and I'll dig into a bit more.

Thanks,
Josh

On Jan 24, 2011, at 11:37 AM, Hugo Meyer wrote:

> Hello @ll
> 
> I've got a problem when I try to use the ompi-migrate command.
> 
> What I'm doing is executing, for example, the following application on one 
> node of a cluster (both processes will run on the same node):
> 
> mpirun -np 2 -am ft-enable-cr ./whoami 10 10
> 
> Then on the same node I try to migrate the processes to another node:
> 
> ompi-migrate -x node9 -t node3 14914
> 
> And then i get this message:
> 
> [clus9:15620] *** Process received signal ***
> [clus9:15620] Signal: Segmentation fault (11)
> [clus9:15620] Signal code: Address not mapped (1)
> [clus9:15620] Failing at address: (nil)
> [clus9:15620] [ 0] /lib64/libpthread.so.0 [0x2c0b8d40]
> [clus9:15620] *** End of error message ***
> Segmentation fault
> 
> I assume that maybe there is something wrong with the thread level, but I 
> have configured Open MPI like this:
> 
> ../configure --prefix=/home/hmeyer/desarrollo/ompi-code/binarios/ 
> --enable-debug --enable-debug-symbols --enable-trace --with-ft=cr 
> --disable-ipv6 --enable-opal-multi-threads --enable-ft-thread --without-hwloc 
> --disable-vt --with-blcr=/soft/blcr-0.8.2/ 
> --with-blcr-libdir=/soft/blcr-0.8.2/lib/
> 
> Checkpoint and restart work fine, but when I restore an application that 
> has more than one process, it is restored and executed until the last 
> line before MPI_Finalize(); the processes never finalize. I assume that 
> they never call MPI_Finalize(). With one process, ompi-checkpoint and 
> ompi-restart work great.
> 
> Best regards.
> 
> Hugo Meyer
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel


Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory
http://users.nccs.gov/~jjhursey




Re: [OMPI devel] OMPI-MIGRATE error

2011-01-26 Thread Joshua Hursey
I found a few more bugs after testing the C/R functionality this morning. I 
just committed some more C/R fixes in r24306 (things are now working correctly 
on my test cluster).
  https://svn.open-mpi.org/trac/ompi/changeset/24306

One thing I just noticed in your original email was that you are specifying the 
wrong parameter for migration (it is different from the standard C/R 
functionality for backwards compatibility reasons). You need to use the 
'ft-enable-cr-recovery' AMCA parameter:
  mpirun -np 2 -am ft-enable-cr-recovery ./whoami 10 10

If you still get the segmentation fault after upgrading to the current trunk, 
can you send me a backtrace from the core file? That will help me narrow down 
on the problem.

Thanks,
Josh


On Jan 26, 2011, at 8:40 AM, Hugo Meyer wrote:

> Josh.
> 
> ompi-checkpoint and its restart are now working great, but the same 
> error persists with ompi-migrate. I've also tried using "-r", but I get the 
> same error.
> 
> Best regards.
> 
> Hugo Meyer
> 
> 2011/1/26 Hugo Meyer 
> Thanks Josh.
> 
> I've already checked the prelink setting and it is set to "no".
> 
> I'm going to try with the trunk head, and then I'll let you know how it goes.
> 
> Best regards.
> 
> Hugo Meyer
> 
> 2011/1/25 Joshua Hursey 
> 
> Can you try with the current trunk head (r24296)?
> I just committed a fix for the C/R functionality in which restarts were 
> getting stuck. This will likely affect the migration functionality, but I 
> have not had an opportunity to test just yet.
> 
> Another thing to check is that prelink is turned off on all of your machines.
>  https://upc-bugs.lbl.gov//blcr/doc/html/FAQ.html#prelink
> 
> Let me know if the problem persists, and I'll dig into a bit more.
> 
> Thanks,
> Josh
> 
> On Jan 24, 2011, at 11:37 AM, Hugo Meyer wrote:
> 
> > Hello @ll
> >
> > I've got a problem when I try to use the ompi-migrate command.
> >
> > What I'm doing is executing, for example, the following application on one 
> > node of a cluster (both processes will run on the same node):
> >
> > mpirun -np 2 -am ft-enable-cr ./whoami 10 10
> >
> > Then on the same node I try to migrate the processes to another node:
> >
> > ompi-migrate -x node9 -t node3 14914
> >
> > And then i get this message:
> >
> > [clus9:15620] *** Process received signal ***
> > [clus9:15620] Signal: Segmentation fault (11)
> > [clus9:15620] Signal code: Address not mapped (1)
> > [clus9:15620] Failing at address: (nil)
> > [clus9:15620] [ 0] /lib64/libpthread.so.0 [0x2c0b8d40]
> > [clus9:15620] *** End of error message ***
> > Segmentation fault
> >
> > I assume that maybe there is something wrong with the thread level, but I 
> > have configured Open MPI like this:
> >
> > ../configure --prefix=/home/hmeyer/desarrollo/ompi-code/binarios/ 
> > --enable-debug --enable-debug-symbols --enable-trace --with-ft=cr 
> > --disable-ipv6 --enable-opal-multi-threads --enable-ft-thread 
> > --without-hwloc --disable-vt --with-blcr=/soft/blcr-0.8.2/ 
> > --with-blcr-libdir=/soft/blcr-0.8.2/lib/
> >
> > Checkpoint and restart work fine, but when I restore an application 
> > that has more than one process, it is restored and executed until the 
> > last line before MPI_Finalize(); the processes never finalize. I assume 
> > that they never call MPI_Finalize(). With one process, 
> > ompi-checkpoint and ompi-restart work great.
> >
> > Best regards.
> >
> > Hugo Meyer
> > ___
> > devel mailing list
> > de...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 
> 
> Joshua Hursey
> Postdoctoral Research Associate
> Oak Ridge National Laboratory
> http://users.nccs.gov/~jjhursey
> 
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel


Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory
http://users.nccs.gov/~jjhursey




Re: [OMPI devel] OMPI-MIGRATE error

2011-01-27 Thread Joshua Hursey
 [clus9:06106] [[42095,1],0] ORTE_ERROR_LOG: Data unpack would read past end 
> of buffer in file ../../../../orte/mca/grpcomm/base/grpcomm_base_modex.c at 
> line 350
> [clus9:06106] [[42095,1],0] ORTE_ERROR_LOG: Data unpack would read past end 
> of buffer in file ../../../../../orte/mca/grpcomm/bad/grpcomm_bad_module.c at 
> line 323
> [clus9:06106] pml:ob1: ft_event(Restart): Failed orte_grpcomm.modex() = -26
> [clus9:06105] 1 more process has sent help message help-orte-errmgr-hnp.txt / 
> autor_recovery_complete
> Soy el número 0 (1)
> Terminando, una instrucción antes del finalize
> Soy el número 1 (1)
> Terminando, una instrucción antes del finalize
> [clus9:06105] 1 more process has sent help message help-orte-errmgr-hnp.txt / 
> autor_recovering_job
> [clus9:06105] [[42095,0],0] ORTE_ERROR_LOG: Error in file 
> ../../../../../orte/mca/errmgr/hnp/errmgr_hnp_crmig.c at line 287
> [clus9:06105] [[42095,0],0] ORTE_ERROR_LOG: Error in file 
> ../../../../../orte/mca/errmgr/hnp/errmgr_hnp_crmig.c at line 287
> [clus9:06107] [[42095,1],1] ORTE_ERROR_LOG: Data unpack would read past end 
> of buffer in file ../../../../orte/mca/grpcomm/base/grpcomm_base_modex.c at 
> line 350
> [clus9:06107] [[42095,1],1] ORTE_ERROR_LOG: Data unpack would read past end 
> of buffer in file ../../../../../orte/mca/grpcomm/bad/grpcomm_bad_module.c at 
> line 323
> [clus9:06106] [[42095,1],0] ORTE_ERROR_LOG: Data unpack would read past end 
> of buffer in file ../../../../orte/mca/grpcomm/base/grpcomm_base_modex.c at 
> line 350
> [clus9:06106] [[42095,1],0] ORTE_ERROR_LOG: Data unpack would read past end 
> of buffer in file ../../../../../orte/mca/grpcomm/bad/grpcomm_bad_module.c at 
> line 323
> [clus9:06106] pml:ob1: ft_event(Restart): Failed orte_grpcomm.modex() = -26
> [clus9:06107] pml:ob1: ft_event(Restart): Failed orte_grpcomm.modex() = -26
> 
> As you can see, it keeps looping on the recovery. Then when I try to migrate 
> these processes using ompi-migrate, I get this:
> 
> [hmeyer@clus9 ~]$ /home/hmeyer/desarrollo/ompi-code/binarios/bin/ompi-migrate 
> -x node9 -t node3 18082
> --
> Error: The Job identified by PID (18082) was not able to migrate processes in 
> this
>job. This could be caused by any of the following:
>- Invalid node or rank specified
>- No processes on the indicated node can by migrated
>- Process migration was not enabled for this job. Make sure to indicate
>  the proper AMCA file: "-am ft-enable-cr-recovery".
> --
> But in the terminal where the application is running, I get this:
> 
> [hmeyer@clus9 whoami]$ /home/hmeyer/desarrollo/ompi-code/binarios/bin/mpirun 
> -np 2 -am ft-enable-cr-recovery ./whoami 10 10
> Antes de MPI_Init
> Antes de MPI_Init
> [clus9:18082] [[62740,0],0] ORTE_ERROR_LOG: Error in file 
> ../../../../../orte/mca/errmgr/hnp/errmgr_hnp_crmig.c at line 287
> [clus9:18082] [[62740,0],0] ORTE_ERROR_LOG: Error in file 
> ../../../../../orte/mca/errmgr/hnp/errmgr_hnp_crmig.c at line 287
> --
> Warning: Could not find any processes to migrate on the nodes specified.
>  You provided the following:
> Nodes: node9
> Procs: (null)
> --
> --
> Notice: The processes have been successfully migrated to/from the specified
> machines.
> --
> Soy el número 1 (1)
> Terminando, una instrucción antes del finalize
> Soy el número 0 (1)
> Terminando, una instrucción antes del finalize
> --
> Error: The process below has failed. There is no checkpoint available for
>this job, so we are terminating the application since automatic
>recovery cannot occur.
> Internal Name: [[62740,1],0]
> MCW Rank: 0
> 
> ------
> [clus9:18082] 1 more process has sent help message help-orte-errmgr-hnp.txt / 
> autor_failed_to_recover_proc
> [clus9:18082] Set MCA parameter "orte_base_help_aggregate" to 0 to see all 
> help / error messages
> 
> I assume that orte_get_job_data_object is the problem, because it is not 
> obtaining the proper value.
> 
> If you need more data, just let me know.
> 
> Best Regards.
> 
> Hugo Meyer
> 
> 
> 
> 
> 2011/1/26 J

Re: [OMPI devel] OMPI-MIGRATE error

2011-01-31 Thread Joshua Hursey

On Jan 31, 2011, at 6:47 AM, Hugo Meyer wrote:

> Hi Joshua.
> 
> I've tried the migration again, and I get the following (running the 
> processes on the node where mpirun is running):
> 
> Terminal 1:
> 
> [hmeyer@clus9 whoami]$ /home/hmeyer/desarrollo/ompi-code/binarios/bin/mpirun 
> -np 2 -am ft-enable-cr-recovery --mca orte_base_help_aggregate 0 ./whoami 10 
> 10
> Antes de MPI_Init
> Antes de MPI_Init
> --
> Warning: Could not find any processes to migrate on the nodes specified.
>  You provided the following:
> Nodes: node9
> Procs: (null)
> --
> Soy el número 1 (1)
> Terminando, una instrucción antes del finalize
> Soy el número 0 (1)
> Terminando, una instrucción antes del finalize
> 
> Terminal 2:
> 
> [hmeyer@clus9 build]$ 
> /home/hmeyer/desarrollo/ompi-code/binarios/bin/ompi-migrate -x node9 -t node3 
> 11724
> --
> Error: The Job identified by PID (11724) was not able to migrate processes in 
> this
>job. This could be caused by any of the following:
>- Invalid node or rank specified
>- No processes on the indicated node can by migrated
>- Process migration was not enabled for this job. Make sure to indicate
>  the proper AMCA file: "-am ft-enable-cr-recovery".
> --

The error message indicates that there were no processes found on 'node9'. Did 
you confirm that there were processes running on that node?

It is possible that the node name that Open MPI is using is different from what 
you put in. For example, it could be fully qualified (e.g., 
node9.my.domain.com). So you might try that too. MPI_Get_processor_name() 
should return the name of the node that we are attempting to use. So you could 
have all processes print that out when they start up.
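
For example, a few lines of plain MPI (nothing Open MPI-specific is assumed) 
dropped in after MPI_Init would show it:

  char name[MPI_MAX_PROCESSOR_NAME];
  int rank, len;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Get_processor_name(name, &len);
  printf("rank %d is running on node %s\n", rank, name);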


> Then I try another way, and I get the following:
> 
> Terminal 1:
> 
> [hmeyer@clus9 whoami]$ /home/hmeyer/desarrollo/ompi-code/binarios/bin/mpirun 
> -np 3 -am ft-enable-cr-recovery ./whoami 10 10
> Antes de MPI_Init
> Antes de MPI_Init
> Antes de MPI_Init
> --
> Notice: A migration of this job has been requested.
> The processes below will be migrated.
> Please standby.
>   [[40382,1],1] Rank 1 on Node clus9
> 
> --
> --
> Error: The process below has failed. There is no checkpoint available for
>this job, so we are terminating the application since automatic
>recovery cannot occur.
> Internal Name: [[40382,1],1]
> MCW Rank: 1
> 
> --
> Soy el número 0 (1)
> Terminando, una instrucción antes del finalize
> Soy el número 2 (1)
> Terminando, una instrucción antes del finalize
> 
> Terminal 2:
> 
> [hmeyer@clus9 build]$ 
> /home/hmeyer/desarrollo/ompi-code/binarios/bin/ompi-migrate -r 1 -t node3 
> 11784
> [clus9:11795] *** Process received signal ***
> [clus9:11795] Signal: Segmentation fault (11)
> [clus9:11795] Signal code: Address not mapped (1)
> [clus9:11795] Failing at address: (nil)
> [clus9:11795] [ 0] /lib64/libpthread.so.0 [0x2c0b9d40]
> [clus9:11795] *** End of error message ***
> Segmentation fault

Humm. Well that's not good. It looks like the automatic recovery is jumping in 
while migrating, which should not be happening. I'll take a look and see if I 
can reproduce locally.

Thanks,
Josh

> 
> Am I using the ompi-migrate command in the right way, or am I missing 
> something? Because the first attempt didn't find any processes.
> 
> Best Regards.
> 
> Hugo Meyer
> 
> 
> 2011/1/28 Hugo Meyer 
> Thanks to you Joshua.
> 
> I will try the procedure with these modifications and I will let you know how 
> it goes.
> 
> Best Regards.
> 
> Hugo Meyer
> 
> 2011/1/27 Joshua Hursey 
> 
> I believe that this is now fixed on the trunk. All the details are in the 
> commit message:
>  https://svn.open-mpi.org/trac/ompi/changeset/24317
> 
> In my testing yesterday, I did not test the scenario where the node with 
> mpirun also contains processes (the test cluster I was using does not by 
> default run this way). So I was able to reproduce by running on a single 
> node. There were a couple bugs that emerged that are fixed in

Re: [OMPI devel] OMPI-MIGRATE error

2011-01-31 Thread Joshua Hursey
So I was not able to reproduce this issue.

A couple notes:
 - You can see the node-to-process-rank mapping using the '-display-map' 
command line option to mpirun. This will give you the node names that Open MPI 
is using, and how it intends to lay out the processes. You can use the 
'-display-allocation' option to see all of the nodes that Open MPI knows about 
(see the example after this list). Open MPI cannot, currently, migrate to a 
node that it does not know about on startup.
 - If the problem persists, add the following MCA parameters to your 
~/.openmpi/mca-params.conf file and send me a zipped-up text file of the 
output. It might show us where things are going wrong:

orte_debug_daemons=1
errmgr_base_verbose=20
snapc_full_verbose=20
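
For example, to see the map and allocation for the job from earlier in the 
thread:

  mpirun -np 2 -display-map -display-allocation -am ft-enable-cr-recovery ./whoami 10 10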


-- Josh

On Jan 31, 2011, at 9:46 AM, Joshua Hursey wrote:

> 
> On Jan 31, 2011, at 6:47 AM, Hugo Meyer wrote:
> 
>> Hi Joshua.
>> 
>> I've tried the migration again, and I get the following (running the 
>> processes on the node where mpirun is running):
>> 
>> Terminal 1:
>> 
>> [hmeyer@clus9 whoami]$ /home/hmeyer/desarrollo/ompi-code/binarios/bin/mpirun 
>> -np 2 -am ft-enable-cr-recovery --mca orte_base_help_aggregate 0 ./whoami 10 
>> 10
>> Antes de MPI_Init
>> Antes de MPI_Init
>> --
>> Warning: Could not find any processes to migrate on the nodes specified.
>> You provided the following:
>> Nodes: node9
>> Procs: (null)
>> --
>> Soy el número 1 (1)
>> Terminando, una instrucción antes del finalize
>> Soy el número 0 (1)
>> Terminando, una instrucción antes del finalize
>> 
>> Terminal 2:
>> 
>> [hmeyer@clus9 build]$ 
>> /home/hmeyer/desarrollo/ompi-code/binarios/bin/ompi-migrate -x node9 -t 
>> node3 11724
>> --
>> Error: The Job identified by PID (11724) was not able to migrate processes 
>> in this
>>   job. This could be caused by any of the following:
>>   - Invalid node or rank specified
>>   - No processes on the indicated node can by migrated
>>   - Process migration was not enabled for this job. Make sure to indicate
>> the proper AMCA file: "-am ft-enable-cr-recovery".
>> --
> 
> The error message indicates that there were no processes found on 'node9'. 
> Did you confirm that there were processes running on that node?
> 
> It is possible that the node name that Open MPI is using is different from 
> what you put in. For example, it could be fully qualified (e.g., 
> node9.my.domain.com). So you might try that too. MPI_Get_processor_name() 
> should return the name of the node that we are attempting to use. So you 
> could have all processes print that out when they start up.
> 
> 
>> Then I try another way, and I get the following:
>> 
>> Terminal 1:
>> 
>> [hmeyer@clus9 whoami]$ /home/hmeyer/desarrollo/ompi-code/binarios/bin/mpirun 
>> -np 3 -am ft-enable-cr-recovery ./whoami 10 10
>> Antes de MPI_Init
>> Antes de MPI_Init
>> Antes de MPI_Init
>> --
>> Notice: A migration of this job has been requested.
>>The processes below will be migrated.
>>Please standby.
>>  [[40382,1],1] Rank 1 on Node clus9
>> 
>> --
>> --
>> Error: The process below has failed. There is no checkpoint available for
>>   this job, so we are terminating the application since automatic
>>   recovery cannot occur.
>> Internal Name: [[40382,1],1]
>> MCW Rank: 1
>> 
>> --
>> Soy el número 0 (1)
>> Terminando, una instrucción antes del finalize
>> Soy el número 2 (1)
>> Terminando, una instrucción antes del finalize
>> 
>> Terminal 2:
>> 
>> [hmeyer@clus9 build]$ 
>> /home/hmeyer/desarrollo/ompi-code/binarios/bin/ompi-migrate -r 1 -t node3 
>> 11784
>> [clus9:11795] *** Process received signal ***
>> [clus9:11795] Signal: Segmentation fault (11)
>> [clus9:11795] Signal code: Address not mapped (1)
>> [clus9:11795] Failing at address: (nil)
>> [clus9:11795] [ 0] /lib64/libpthread.so.0 [0x2c0b9d40]
>> [clus9:117

Re: [OMPI devel] OMPI-MIGRATE error

2011-01-31 Thread Joshua Hursey
That helped. The automatic recovery logic was missing a check to prevent it 
from starting up while a migration is in progress; r24326 should fix this bug. 
The segfault should have just been residual fallout from it. 
Can you try the current trunk to confirm?
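
The shape of the missing check, as a sketch (illustrative names only; see 
r24326 for the actual change):

  /* Sketch: before launching automatic recovery for a failed process, bail
   * out if a migration is in progress so the migration logic keeps control.
   * Not the actual Open MPI code. */
  static int maybe_auto_recover(int migration_in_progress)
  {
      if (migration_in_progress) {
          return 0;  /* skip autorecovery; the migration owns this event */
      }
      /* ... start automatic recovery of the failed process here ... */
      return 1;
  }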

One other thing I noticed in the output is that it looks like one of your nodes 
is asking you for a password (i.e., 'node1'). You may want to make sure that 
you can login without a password on that node, as it might otherwise hinder 
Open MPI's startup mechanism on that node.

Thanks,
Josh

On Jan 31, 2011, at 12:36 PM, Hugo Meyer wrote:

> Hi Josh.
> 
> As you say, the first problem was because of the name of the node. But the 
> second problem persists (the segmentation fault). As you asked, I'm sending you 
> the output of the run with the MCA params that you gave me. At the end of the 
> file I put the output of the second terminal.
> 
> Best Regards
> 
> Hugo Meyer
> 
> 2011/1/31 Joshua Hursey 
> So I was not able to reproduce this issue.
> 
> A couple notes:
>  - You can see the node-to-process-rank mapping using the '-display-map' 
> command line option to mpirun. This will give you the node names that Open 
> MPI is using, and how it intends to layout the processes. You can use the 
> '-display-allocation' option to see all of the nodes that Open MPI knows 
> about. Open MPI cannot, currently, migrate to a node that it does not know 
> about on startup.
>  - If the problem persists, add the following MCA parameters to your 
> ~/.openmpi/mca-params.conf file and send me a zipped-up text file of the 
> output. It might show us where things are going wrong:
> 
> orte_debug_daemons=1
> errmgr_base_verbose=20
> snapc_full_verbose=20
> 
> 
> -- Josh
> 
> On Jan 31, 2011, at 9:46 AM, Joshua Hursey wrote:
> 
> >
> > On Jan 31, 2011, at 6:47 AM, Hugo Meyer wrote:
> >
> >> Hi Joshua.
> >>
> >> I've tried the migration again, and I get the following (running the 
> >> processes on the node where mpirun is running):
> >>
> >> Terminal 1:
> >>
> >> [hmeyer@clus9 whoami]$ 
> >> /home/hmeyer/desarrollo/ompi-code/binarios/bin/mpirun -np 2 -am 
> >> ft-enable-cr-recovery --mca orte_base_help_aggregate 0 ./whoami 10 10
> >> Antes de MPI_Init
> >> Antes de MPI_Init
> >> --
> >> Warning: Could not find any processes to migrate on the nodes specified.
> >> You provided the following:
> >> Nodes: node9
> >> Procs: (null)
> >> --
> >> Soy el número 1 (1)
> >> Terminando, una instrucción antes del finalize
> >> Soy el número 0 (1)
> >> Terminando, una instrucción antes del finalize
> >>
> >> Terminal 2:
> >>
> >> [hmeyer@clus9 build]$ 
> >> /home/hmeyer/desarrollo/ompi-code/binarios/bin/ompi-migrate -x node9 -t 
> >> node3 11724
> >> --
> >> Error: The Job identified by PID (11724) was not able to migrate processes 
> >> in this
> >>   job. This could be caused by any of the following:
> >>   - Invalid node or rank specified
> >>   - No processes on the indicated node can by migrated
> >>   - Process migration was not enabled for this job. Make sure to 
> >> indicate
> >> the proper AMCA file: "-am ft-enable-cr-recovery".
> >> --
> >
> > The error message indicates that there were no processes found on 'node9'. 
> > Did you confirm that there were processes running on that node?
> >
> > It is possible that the node name that Open MPI is using is different from 
> > what you put in. For example, it could be fully qualified (e.g., 
> > node9.my.domain.com). So you might try that too. MPI_Get_processor_name() 
> > should return the name of the node that we are attempting to use. So you 
> > could have all processes print that out when they start up.
> >
> >
> >> Then I try another way, and I get the following:
> >>
> >> Terminal 1:
> >>
> >> [hmeyer@clus9 whoami]$ 
> >> /home/hmeyer/desarrollo/ompi-code/binarios/bin/mpirun -np 3 -am 
> >> ft-enable-cr-recovery ./whoami 10 10
> >> Antes de MPI_Init
> >> Antes de MPI_Init
> >>

[OMPI devel] Return status of MPI_Probe()

2011-03-21 Thread Joshua Hursey
If MPI_Probe() encounters an error causing it to exit with the 
'status.MPI_ERROR' set, say:
  ret = MPI_Probe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);

Should it return an error? That is, should it return:
 - ret = status.MPI_ERROR
 - ret = MPI_ERROR_IN_STATUS
 - ret = MPI_SUCCESS
Additionally, should it trigger the error handler on the communicator?

In Open MPI, it will always return MPI_SUCCESS (pml_ob1_iprobe.c:74), but it 
feels like this is wrong. I looked at the MPI standard for some insight, but 
could not find where it addresses the return code of MPI_Probe.
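
For context, here is how a caller would even observe a non-SUCCESS return 
(the default MPI_ERRORS_ARE_FATAL handler would abort first, so the 
communicator's handler must be set to MPI_ERRORS_RETURN):

  MPI_Status status;
  int ret;
  MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
  ret = MPI_Probe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
  if (ret != MPI_SUCCESS) {
      /* under the semantics discussed here, ret would equal
         status.MPI_ERROR */
  }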

Can anyone shed some light on this topic for me?

Thanks,
Josh


--------
Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory
http://users.nccs.gov/~jjhursey




Re: [OMPI devel] Return status of MPI_Probe()

2011-03-22 Thread Joshua Hursey
George,

I agree that it is difficult to come up with a good scenario, outside of 
resilience, in which MPI_Probe would return an error (other than a bad argument 
type of error - which does currently work). I agree with your assessment of the 
value of the return code, and that it should trigger the error handler. I just 
wanted to confirm that there was not something I was missing.

I will commit a patch to fix this (it is just a couple lines in pml/ob1) for 
the trunk. I don't know if we need to push it to the release branches just yet.

Thanks,
Josh

On Mar 21, 2011, at 10:17 PM, George Bosilca wrote:

> Josh,
> 
> If we don't take resilience into account, I would not expect MPI_Probe to have 
> that many opportunities to return errors. However, in order to keep the 
> implementation consistent (with the other MPI functions) I would abide by the 
> following.
> 
> MPI_ERROR_IN_STATUS is only for calls taking multiple requests as input, so I 
> don't think this should be applied to MPI_Probe. I would expect the 
> return to be equal to status.MPI_ERROR (similar to other functions 
> working on a single request, such as MPI_Test).
> 
> It better trigger the error-handler attached to the communicator, as 
> explicitly requested by the MPI standard (section 8.3).
>> A user can associate error handlers to three types of objects: 
>> communicators, windows, and files. The specified error handling routine will 
>> be used for any MPI exception that occurs during a call to MPI for the 
>> respective object.
> 
>  george.
> 
> On Mar 21, 2011, at 16:50 , Joshua Hursey wrote:
> 
>> If MPI_Probe() encounters an error causing it to exit with the 
>> 'status.MPI_ERROR' set, say:
>> ret = MPI_Probe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
>> 
>> Should it return an error? So should it return:
>> - ret = status.MPI_ERROR
>> - ret = MPI_ERR_IN_STATUS
>> - ret = MPI_SUCCESS
>> Additionally, should it trigger the error handler on the communicator?
>> 
>> In Open MPI, it will always return MPI_SUCCESS (pml_ob1_iprobe.c:74), but it 
>> feels like this is wrong. I looked to the MPI standard for some insight, but 
>> could not find where it addresses the return code of MPI_Probe.
>> 
>> Can anyone shed some light on this topic for me?
>> 
>> Thanks,
>> Josh
>> 
>> 
>> 
>> Joshua Hursey
>> Postdoctoral Research Associate
>> Oak Ridge National Laboratory
>> http://users.nccs.gov/~jjhursey
>> 
>> 
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 
> "To preserve the freedom of the human mind then and freedom of the press, 
> every spirit should be ready to devote itself to martyrdom; for as long as we 
> may think as we will, and speak as we think, the condition of man will 
> proceed in improvement."
>  -- Thomas Jefferson, 1799
> 
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 


Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory
http://users.nccs.gov/~jjhursey




Re: [OMPI devel] Return status of MPI_Probe()

2011-03-22 Thread Joshua Hursey
Sounds good.

Would you mind reviewing the CMRs?
  https://svn.open-mpi.org/trac/ompi/ticket/2756
  https://svn.open-mpi.org/trac/ompi/ticket/2757

Thanks,
Josh

On Mar 22, 2011, at 10:19 AM, George Bosilca wrote:

> Josh,
> 
> Your patch (r24551) looks fine. I think you should make a CMR for the 1.4 and 
> 1.5.
> 
>  Thanks,
>george.
> 
> 
> On Mar 22, 2011, at 09:04 , Joshua Hursey wrote:
> 
>> George,
>> 
>> I agree that it is difficult to come up with a good scenario, outside of 
>> resilience, in which MPI_Probe would return an error (other than a bad 
>> argument type of error - which does currently work). I agree with your 
>> assessment of the value of the return code, and that it should trigger the 
>> error handler. I just wanted to confirm that there was not something I was 
>> missing.
>> 
>> I will commit a patch to fix this (it is just a couple lines in pml/ob1) for 
>> the trunk. I don't know if we need to push it to the release branches just 
>> yet.
>> 
>> Thanks,
>> Josh
>> 
>> On Mar 21, 2011, at 10:17 PM, George Bosilca wrote:
>> 
>>> Josh,
>>> 
>>> If we don't take into account resilience I would not expect MPI_Probe to have 
>>> that many opportunities to return errors. However, in order to keep the 
>>> implementation consistent (with the other MPI functions) I would abide by 
>>> the following.
>>> 
>>> MPI_ERR_IN_STATUS is only for calls taking multiple requests as input, so 
>>> I don't think this should be applied to MPI_Probe. I would expect the 
>>> return to be equal to status.MPI_ERROR (similar to the other functions 
>>> working on a single request, such as MPI_Test).
>>> 
>>> It better trigger the error-handler attached to the communicator, as 
>>> explicitly requested by the MPI standard (section 8.3).
>>>> A user can associate error handlers to three types of objects: 
>>>> communicators, windows, and files. The specified error handling routine 
>>>> will be used for any MPI exception that occurs during a call to MPI for 
>>>> the respective object.
>>> 
>>> george.
>>> 
>>> On Mar 21, 2011, at 16:50 , Joshua Hursey wrote:
>>> 
>>>> If MPI_Probe() encounters an error causing it to exit with the 
>>>> 'status.MPI_ERROR' set, say:
>>>> ret = MPI_Probe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
>>>> 
>>>> Should it return an error? So should it return:
>>>> - ret = status.MPI_ERROR
>>>> - ret = MPI_ERR_IN_STATUS
>>>> - ret = MPI_SUCCESS
>>>> Additionally, should it trigger the error handler on the communicator?
>>>> 
>>>> In Open MPI, it will always return MPI_SUCCESS (pml_ob1_iprobe.c:74), but 
>>>> it feels like this is wrong. I looked to the MPI standard for some 
>>>> insight, but could not find where it addresses the return code of 
>>>> MPI_Probe.
>>>> 
>>>> Can anyone shed some light on this topic for me?
>>>> 
>>>> Thanks,
>>>> Josh
>>>> 
>>>> 
>>>> 
>>>> Joshua Hursey
>>>> Postdoctoral Research Associate
>>>> Oak Ridge National Laboratory
>>>> http://users.nccs.gov/~jjhursey
>>>> 
>>>> 
>>>> ___
>>>> devel mailing list
>>>> de...@open-mpi.org
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> 
>>> "To preserve the freedom of the human mind then and freedom of the press, 
>>> every spirit should be ready to devote itself to martyrdom; for as long as 
>>> we may think as we will, and speak as we think, the condition of man will 
>>> proceed in improvement."
>>> -- Thomas Jefferson, 1799
>>> 
>>> 
>>> ___
>>> devel mailing list
>>> de...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> 
>> 
>> 
>> Joshua Hursey
>> Postdoctoral Research Associate
>> Oak Ridge National Laboratory
>> http://users.nccs.gov/~jjhursey
>> 
>> 
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 
> "To preserve the freedom of the human mind then and freedom of the press, 
> every spirit should be ready to devote itself to martyrdom; for as long as we 
> may think as we will, and speak as we think, the condition of man will 
> proceed in improvement."
>  -- Thomas Jefferson, 1799
> 
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 


Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory
http://users.nccs.gov/~jjhursey




[OMPI devel] Fwd: [devel-core] Open MPI Developers Meeting

2011-03-30 Thread Joshua Hursey
Rich wanted to make this available to a broader audience. Re-posting to the 
devel list.

Begin forwarded message:

> From: Joshua Hursey 
> Date: March 30, 2011 9:14:03 AM CDT
> Subject: [devel-core] Open MPI Developers Meeting
> 
> It has been requested that we have a face-to-face Open MPI developers 
> meeting. It has been a long time since we were all in the same room to 
> discuss issues. Oak Ridge is willing to host the event.
> 
> 
> To get the ball rolling we need to decide two things:
> 
> 1) When would be the best 3 days that work for the most developers. Please 
> fill out the doodle poll by the next teleconf (April 5) so we can set the 
> date.
>  http://doodle.com/c59p4hrxqu2d9rmu
> 
> 2) What topics do we want on the agenda? I have a few items, but I'll bring 
> those forward later.
> 
> 
> Please send agenda items to the list. I'll bring this up on the next teleconf 
> as well.
> 
> -- Josh


Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory
http://users.nccs.gov/~jjhursey




Re: [OMPI devel] Fwd: [devel-core] Open MPI Developers Meeting

2011-04-01 Thread Joshua Hursey
It looks like May 3-5 will work the best for everyone, in particular those 
needing to arrange visas.

So there are two things we need to do in the near term.

1) Since ORNL is hosting this event, you need to let Rich and me know soon if 
you are attending (by COB Tuesday if at all possible) so we can start the 
paperwork. Please email us directly, and we will start the process.

2) Agenda items. As I mentioned previously, I'll bring this up on the teleconf. 
But if you will not be able to attend the teleconf, reply to this email and let 
us know what you would like added to the agenda.


If you are planning on attending, please let Rich (rlgraham -at- ornl -dot- 
gov) and me know as soon as possible.

Thanks,
Josh


On Mar 30, 2011, at 10:36 AM, Joshua Hursey wrote:

> Rich wanted to make this available to a broader audience. Re-posting to the 
> devel list.
> 
> Begin forwarded message:
> 
>> From: Joshua Hursey 
>> Date: March 30, 2011 9:14:03 AM CDT
>> Subject: [devel-core] Open MPI Developers Meeting
>> 
>> It has been requested that we have a face-to-face Open MPI developers 
>> meeting. It has been a long time since we were all in the same room to 
>> discuss issues. Oak Ridge is willing to host the event.
>> 
>> 
>> To get the ball rolling we need to decide two things:
>> 
>> 1) When would be the best 3 days that work for the most developers. Please 
>> fill out the doodle poll by the next teleconf (April 5) so we can set the 
>> date.
>> http://doodle.com/c59p4hrxqu2d9rmu
>> 
>> 2) What topics do we want on the agenda? I have a few items, but I'll bring 
>> those forward later.
>> 
>> 
>> Please send agenda items to the list. I'll bring this up on the next 
>> teleconf as well.
>> 
>> -- Josh
> 
> 
> Joshua Hursey
> Postdoctoral Research Associate
> Oak Ridge National Laboratory
> http://users.nccs.gov/~jjhursey
> 
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 


Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory
http://users.nccs.gov/~jjhursey




[OMPI devel] Open MPI Developers Meeting Agenda

2011-04-06 Thread Joshua Hursey
Reminder:
  If you are interested in attending the May 3-5 Open MPI Developers Meeting at 
ORNL, please let Rich (rlgraham -at- ornl -dot- gov) and me know as soon as 
possible so we can start the paperwork. This is of particular importance for 
non-US citizens since the paperwork takes considerably more time.


The meeting will be three full days (May 3-5) on the ORNL campus. I intend to 
setup a teleconf for some/most of the sessions for those that cannot attend in 
person. Once we have the agenda topics on the table we can start negotiating 
time allotments.

Below are the agenda items that I have gathered so far (in no particular order):
 - MPI 2.2 implementation tickets
 - MPI 3.0 implementation planning
 - ORNL: Hierarchical Collectives discussion
 - Runtime integration discussion
 - New Process Affinity functionality
 - Update on ORTE development
 - Fault tolerance feature development and integration
   (C/R, logging, replication, FT-MPI, MPI 3.0, message reliability, ...)

Other topics I thought of that folks might want to discuss (is there 
anyone who wants to include these and lead their discussion?):
 - Threading design
 - Performance tuning (Point-to-point and/or Collective)
 - Testing infrastructure (MTT)


Keep sending agenda items to the list (or me directly if you would rather). I 
hope to have the agenda sketched out by the teleconf on 4/12 so we can fine 
tune it on the call.

Thanks,
Josh


Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory
http://users.nccs.gov/~jjhursey




Re: [OMPI devel] Adaptive or fault-tolerant MPI

2011-04-22 Thread Joshua Hursey

On Apr 22, 2011, at 1:20 PM, N.M. Maclaren wrote:

> On Apr 22 2011, Ralph Castain wrote:
> 
>> Several of us are. Josh and George (plus teammates), and some other outside 
>> folks, are working the MPI side of it.
>> 
>> I'm working only the ORTE side of the problem.
>> 
>> Quite a bit of capability is already in the trunk, but there is always more 
>> to do :-)
> 
> Is there a specification of what objectives are covered by 'fault-tolerant'?

We do not really have a website to point folks to at the moment. Some of the 
existing and planned fault tolerance functionality for Open MPI has been 
announced and documented, but not uniformly or in a central place. We have a 
developers meeting in a couple weeks and this is a topic I am 
planning on covering:
  https://svn.open-mpi.org/trac/ompi/wiki/May11Meeting
Once something is available, we'll post to the users/developers lists so that 
people know where to look for developments.

I am responsible for two fault tolerance features in Open MPI: 
Checkpoint/Restart and MPI Forum's Fault Tolerance Working Group proposals. The 
Checkpoint/Restart support is documented here:
  http://osl.iu.edu/research/ft/ompi-cr/

Most of my attention is on the MPI Forum's Fault Tolerance Working 
Group proposals, which aim to enable fault tolerant applications to be 
developed on top of MPI (i.e., non-transparent fault tolerance). The Open MPI 
prototype is not yet publicly available, but soon. Information about the 
semantics and interfaces of that project can be found at the links below:
  https://svn.mpi-forum.org/trac/mpi-forum-web/wiki/FaultToleranceWikiPage
  https://svn.mpi-forum.org/trac/mpi-forum-web/wiki/ft/run_through_stabilization

That is what I have been up to regarding fault tolerance. Others can probably 
elaborate on what they are working on if they wish.

-- Josh

> 
> Regards,
> Nick Maclaren.
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] Open MPI error

2011-04-25 Thread Joshua Hursey
Yeah, this sounds like the limit on the number of app contexts that we can 
use in ORTE. Since ompi-restart uses N app contexts to restart a job (one for 
each process in the original job), it is possible to hit this limit.

I suspect that it should not be too difficult to change the type, but I have 
not looked into it. I filed a ticket so we don't lose track of the issue:
  https://svn.open-mpi.org/trac/ompi/ticket/2783
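
(For reference, the kind of change involved - assuming the limit comes from 
an 8-bit app context index, which is what the symptom suggests; the field 
name below is illustrative:

    /* before: overflows once a job needs more than 127 app contexts */
    int8_t  app_idx;
    /* after: plenty of room for ompi-restart's context-per-process  */
    int32_t app_idx;

plus matching changes wherever the field is packed, unpacked, or printed.)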

If you have the time to work on a patch, I would be happy to review it. At this 
time, I am uncertain when I will get to this bug (and the other C/R bugs). 
Hopefully soon, but it is not a top priority at the moment.

Thanks,
Josh

On Apr 25, 2011, at 3:34 PM, Jeff Squyres wrote:

> On Apr 25, 2011, at 3:28 PM, Ralph Castain wrote:
> 
>>> Can someone provide a solution to this.
>> 
>> Probably won't happen for awhile - this is something peculiar to the restart 
>> mechanism. I'll make a note to look at it, but it would be a low priority.
> 
> Kishor -- is this something you could work on a patch for?
> 
> If so, Josh and/or Ralph might be able to point you in the right direction.
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel




[OMPI devel] Open MPI Meeting

2011-05-03 Thread Joshua Hursey
We will be starting a few minutes late. I'll hang around the visitor center, 
but if you don't see me, send me an email directly.







[OMPI devel] OMPI Meeting Schedule Change

2011-05-05 Thread Joshua Hursey
For those interested in joining the WebEx this afternoon, the schedule has been 
adjusted slightly. So please see the updated schedule on the wiki.

-- Josh





Re: [OMPI devel] RFC: Fix missing code in MPI_Abort functionality

2011-06-09 Thread Joshua Hursey
>> [...] be terminated along with the calling process. Currently ORTE notices that
>> there was an abort, and terminates the job. Once your RFC goes through
>> then this may no longer be the case, and OMPI can determine what to do
>> when it receives a process failure notification.
>> 
>>> 
>>> If we accept the fact that MPI_Abort will only abort the processes in the 
>>> current communicator what happens with the other processes in the same 
>>> MPI_COMM_WORLD (but not on the communicator that has been used by 
>>> MPI_Abort)?
>> 
>> Currently, ORTE will abort them as well. When your RFC goes through
>> then the OMPI layer will be notified of the error and can take the
>> appropriate action, as determined by the MPI standard.
>> 
>>> What about all the other connected processes (based on the connectivity as 
>>> defined in the MPI standard in Section 10.5.4) ? Do they see this as a 
>>> fault?
>> 
>> They are informed of the fault via the ORTE errmgr callback routine
>> (that we have an RFC for), and then can take the appropriate action
>> based on MPI semantics. So we are pushing the decision of the
>> implication of the fault to the OMPI layer - where it should be.
>> 
>> 
>> The remainder of the OMPI layer logic for MPI_ERRORS_RETURN and other
>> connected error management scenarios is not included in this patch
>> since that depends on there being a callback to the OMPI layer - which
>> does not exist just yet. So a small patch to wire in the ORTE piece to
>> allow the OMPI layer to request a set of processes to be terminated -
>> to more accurately support MPI_Abort semantics.
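>> 
>> As a concrete illustration of the semantics being wired in (a sketch; the 
>> failure condition is hypothetical):
>> 
>>   MPI_Comm half;
>>   int rank;
>>   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>   MPI_Comm_split(MPI_COMM_WORLD, rank % 2, rank, &half);
>>   if (0 == rank % 2 && fatal_condition) {
>>       /* best attempt to abort only the tasks in half's group */
>>       MPI_Abort(half, 1);
>>   }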
>> 
>> Does that answer your questions?
>> 
>> -- Josh
>> 
>> 
>>> 
>>> george.
>>> 
>>> On Jun 9, 2011, at 16:32 , Josh Hursey wrote:
>>> 
>>>> WHAT: Fix missing code in MPI_Abort
>>>> 
>>>> WHY: MPI_Abort is missing logic to ask for termination of the process
>>>> group defined by the communicator
>>>> 
>>>> WHERE: Mostly orte/mca/errmgr
>>>> 
>>>> WHEN: Open MPI trunk
>>>> 
>>>> TIMEOUT: Tuesday, June 14, 2011 (after teleconf)
>>>> 
>>>> Details:
>>>> ---
>>>> A bitbucket branch is available here (last sync to r24757 of trunk)
>>>> https://bitbucket.org/jjhursey/ompi-abort/
>>>> 
>>>> In the MPI Standard (v2.2) Section 8.7 after the introduction of
>>>> MPI_Abort, it states:
>>>> "This routine makes a best attempt to abort all tasks in the group of 
>>>> comm."
>>>> 
>>>> Open MPI currently only calls orte_errmgr.abort() to abort the calling
>>>> process itself. The code to ask for the abort of the other processes
>>>> in the group defined by the communicator is commented out. Since one
>>>> process calling abort currently causes all processes in the job to
>>>> abort, it has not been a big deal. However as the group starts
>>>> exploring better resilience in the OMPI layer (with further support
>>>> from the ORTE layer) this aspect of MPI_Abort will become more
>>>> necessary to get right.
>>>> 
>>>> This branch adds back the logic necessary for a single process calling
>>>> MPI_Abort to request, from ORTE errmgr, that a defined subgroup of
>>>> processes be aborted. Once the request is sent to the HNP, the local
>>>> process then calls abort on itself. The HNP requests that the defined
>>>> subgroup of processes be terminated using the existing plm mechanisms
>>>> for doing so.
>>>> 
>>>> This change has no effect on the current default user experienced
>>>> behavior of MPI_Abort.
>>>> 
>>>> --
>>>> Joshua Hursey
>>>> Postdoctoral Research Associate
>>>> Oak Ridge National Laboratory
>>>> http://users.nccs.gov/~jjhursey
>>>> ___
>>>> devel mailing list
>>>> de...@open-mpi.org
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> 
>>> 
>>> ___
>>> devel mailing list
>>> de...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> 
>>> 
>> 
>> 
>> 
>> -- 
>> Joshua Hursey
>> Postdoctoral Research Associate
>> Oak Ridge National Laboratory
>> http://users.nccs.gov/~jjhursey
>> 
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 




Re: [OMPI devel] RFC: Resilient ORTE

2011-06-09 Thread Joshua Hursey
>>>> [...] I'm fine. I personally think that forcing the errmgr to track ordering of
>>>> callback registration makes it a more complex solution, but as long as
>>>> it works.
>>>> 
>>>> In particular I need to replace the default 'abort' errmgr call in
>>>> OMPI with something else. If both are called, then this does not help
>>>> me at all - since the abort behavior will be activated either before
>>>> or after my callback. So can you explain how I would do that with the
>>>> current or the proposed interface?
>>>> 
>>>> -- Josh
>>>> 
>>>> On Thu, Jun 9, 2011 at 12:54 PM, Ralph Castain  wrote:
>>>>> I agree - let's not get overly complex unless we can clearly articulate
>>>>> a
>>>>> requirement to do so.
>>>>> 
>>>>> On Thu, Jun 9, 2011 at 10:45 AM, George Bosilca 
>>>>> wrote:
>>>>>> 
>>>>>> This will require exactly opposite registration and de-registration
>>>>>> order,
>>>>>> or no de-registration at all (aka no way to unload a component). Or
>>>>>> some
>>>>>> even more complex code to deal with internally.
>>>>>> 
>>>>>> If the error manager handles the callbacks it can use the registration
>>>>>> ordering (which is what this approach would do), and can enforce that
>>>>>> all callbacks will be called. I would rather prefer this approach.
>>>>>> 
>>>>>> george.
>>>>>> 
>>>>>> On Jun 9, 2011, at 08:36 , Josh Hursey wrote:
>>>>>> 
>>>>>>> I would prefer returning the previous callback instead of relying on
>>>>>>> the errmgr to get the ordering right. Additionally, when I want to
>>>>>>> unregister (or replace) a callback it is easier to do that with a
>>>>>>> single interface than to introduce a new one to remove a particular
>>>>>>> callback.
>>>>>>> Register:
>>>>>>> ompi_errmgr.set_fault_callback(my_callback, prev_callback);
>>>>>>> Deregister:
>>>>>>> ompi_errmgr.set_fault_callback(prev_callback, old_callback);
>>>>>>> or to eliminate all callbacks (if you needed that for somme reason):
>>>>>>> ompi_errmgr.set_fault_callback(NULL, old_callback);
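>>>>>>> 
>>>>>>> (In C terms, a sketch of that single interface - the signature is
>>>>>>> assumed, not the actual errmgr API:
>>>>>>>   typedef void (*fault_cb_t)(orte_process_name_t *failed_proc);
>>>>>>>   int set_fault_callback(fault_cb_t new_cb, fault_cb_t *prev_cb);
>>>>>>> The caller stashes *prev_cb and can chain to it or restore it later.)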
>>>>>> 
>>>>>> 
>>>>>> ___
>>>>>> devel mailing list
>>>>>> de...@open-mpi.org
>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>> 
>>>>> 
>>>>> ___
>>>>> devel mailing list
>>>>> de...@open-mpi.org
>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>> 
>>>> 
>>>> 
>>>> 
>>>> --
>>>> Joshua Hursey
>>>> Postdoctoral Research Associate
>>>> Oak Ridge National Laboratory
>>>> http://users.nccs.gov/~jjhursey
>>>> 
>>>> ___
>>>> devel mailing list
>>>> de...@open-mpi.org
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> 
>>> 
>>> ___
>>> devel mailing list
>>> de...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> 
>> 
>> 
>> 
>> -- 
>> Joshua Hursey
>> Postdoctoral Research Associate
>> Oak Ridge National Laboratory
>> http://users.nccs.gov/~jjhursey
>> 
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 




Re: [OMPI devel] RFC: Resilient ORTE

2011-06-18 Thread Joshua Hursey
Cool. Then can we hold off pushing this into the trunk for a couple days until 
I get a chance to test it? Monday COB does not give me much time since we just 
got the new patch on Friday COB (the RFC gave us 2 weeks to review the original 
patch). Would waiting until next Thursday/Friday COB be too disruptive? That 
should give me and maybe Ralph enough time to test and send any further 
feedback.

Thanks,
Josh

On Jun 17, 2011, at 5:59 PM, Wesley Bland wrote:

> I believe that it does. I made quite a few changes in the last checkin though 
> I didn't run your specific test this afternoon. I'll be able to try it later 
> this evening but it should be easy to test now that it's synced with the 
> trunk again.
> 
> On Jun 17, 2011 5:32 PM, "Josh Hursey"  wrote:
> > Does this include a fix for the problem I reported with mpirun-hosted 
> > processes?
> > 
> > If not, I would ask that we hold off on putting it into the trunk
> > until that particular bug is addressed. From my experience, tackling
> > this particular issue requires some code refactoring, which should
> > probably be done once in the trunk instead of two possibly disruptive
> > commits.
> > 
> > -- Josh
> > 
> > On Fri, Jun 17, 2011 at 5:18 PM, Wesley Bland  wrote:
> >> This is a reminder that the Resilient ORTE RFC is set to go into the trunk
> >> on Monday at COB.
> >> I've updated the code with a few of the changes that were mentioned on and
> >> off the list (moved code out of orted_comm.c, errmgr_set_callback returns
> >> previous callback, post_startup function, corrected normal termination
> >> issues). Please take another look at it if you have any interest. The code
> >> can be found here:
> >> https://bitbucket.org/wesbland/resilient-orte/
> >> Thanks,
> >> Wesley Bland
> > 
> > 
> > 
> > -- 
> > Joshua Hursey
> > Postdoctoral Research Associate
> > Oak Ridge National Laboratory
> > http://users.nccs.gov/~jjhursey




Re: [OMPI devel] carto vs. hwloc

2009-12-16 Thread Joshua Hursey
Currently, I am working on process migration and automatic recovery based on 
checkpoint/restart. WRT the PML stack, this works by rewiring the BTLs after 
restart of the migrated/recovered MPI process(es). There is a fair amount of 
work in getting this right with respect to both the runtime and the OMPI layer 
(particularly the modex). For the automatic recovery with C/R we will, at 
first, require the restart of all processes in the job [for consistency]. For 
migration, only those processes moving will need to be restarted, all others 
may be blocked.

I think what you are looking for is the ability to lose a process and replace 
it without restarting all the rest of the processes. This would require a bit 
more work beyond what I am currently working on. Since you will need to flush 
the PML/BML/BTL stack of latent messages, etc. The message logging work by UTK 
should do this anyway (if they use uncoordinated C/R+message logging), but they 
will have to fill in the details on that project.

-- Josh

On Dec 16, 2009, at 1:32 AM, George Bosilca wrote:

> As far as I know, what Josh did is slightly different. In the case of a 
> complete restart (where all processes are restarted from a checkpoint), he 
> sets up and rewires a new set of BTLs.
> 
> However, it happens that we do have some code to rewire the MPI processes in 
> case of failure(s) in one of UTK's projects. I'll have to talk with the team 
> here, to see if at this point there is something we can contribute regarding 
> this matter.
> 
>  george.
> 
> On Dec 15, 2009, at 21:08 , Ralph Castain wrote:
> 
>> 
>> On Dec 15, 2009, at 6:31 PM, Jeff Squyres wrote:
>> 
>>> On Dec 15, 2009, at 2:20 PM, Ralph Castain wrote:
>>> 
 It probably should be done at a lower level, but it begs a different 
 question. For example, I've created the capability  in the new cluster 
 manager to detect interfaces that are lost, ride through the problem by 
 moving affected procs to other nodes (reconnecting ORTE-level comm), and 
 move procs back if/when nodes reappear. So someone can remove a node 
 "on-the-fly" and replace that hardware with another node without having to 
 stop and restart the job, etc. A lot of that infrastructure is now down 
 inside ORTE, though a few key pieces remain in the ORCM code base (and 
 most likely will stay there).
 
 Works great - unless it is an MPI job. If we can figure out a way for the 
 MPI procs to (a) be properly restarted on the "new" node, and (b) update 
 the BTL connection info on the other MPI procs in the job, then we would 
 be good to go...
 
 Trivial problem, I am sure :-)
>>> 
>>> ...actually, the groundwork is there with Josh's work, isn't it?  I think 
>>> the real issue is handling un-graceful BTL failures properly.  I'm guessing 
>>> that's the biggest piece that isn't done...?
>> 
>> Think so - not sure how to update the BTLs with the new info, but perhaps 
>> Josh has already solved that problem.
>> 
>>> 
>>> -- 
>>> Jeff Squyres
>>> jsquy...@cisco.com
>>> 
>>> 
>>> ___
>>> devel mailing list
>>> de...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> 
>> 
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel




[OMPI devel] Fwd: [osl-staff] [all-osl-users] OSL systems maintenance

2009-12-29 Thread Joshua Hursey
FYI. This will affect the Open MPI Trac and SVN on Wednesday morning.

Begin forwarded message:

> From: "Kim, DongInn" 
> Date: December 28, 2009 3:55:28 PM EST
> To: all-osl-us...@osl.iu.edu
> Subject: [osl-staff] [all-osl-users] OSL systems maintenance
> Reply-To: Internal OSL staff mailing list 
> 
> We are planning the OSL systems maintenance on Wednesday, Dec 30, 2009 and 
> require that all the OSL systems have some downtime.
> 
> All the services on the OSL systems will not be available during the 
> following downtime.
> Date: Dec 30, 2009
> Time:
> - 4:00am-6:00am Pacific US time
> - 5:00am-7:00am Mountain US time
> - 6:00am-8:00am Central US time
> - 7:00am-9:00am Eastern US time
> - 12:00pm-2:00pm GMT
> 
> The systems to reboot:
> 
>   eddie
>   flowerpot
>   frogstar
>   gibson
>   magrathea
>   milliways
>   rontok
>   sourcehaven
>   vogon
>   wowbagger
> 
> Please let me know if you have any issues or questions about the downtime.
> 
> Regards,
> 
> -- 
> - DongInn
> 
> ___
> osl-staff mailing list
> osl-st...@osl.iu.edu
> http://www.osl.iu.edu/mailman/listinfo.cgi/osl-staff




[OMPI devel] Fwd: Update on CS mail problem

2010-01-08 Thread Joshua Hursey
You may have noticed that some of the messages from this morning were marked as 
a virus (prefixed with [PMX:VIRUS]). This was caused by the problem described 
below by Rob. This affected the various mailing lists (including all the Open 
MPI project lists) that were hosted by IU.

The admins at IU think they have the issue resolved, and should be resending 
the quarantined messages sometime today.

-- Josh

Begin forwarded message:

> From: Rob Henderson 
> Date: January 8, 2010 10:32:45 AM EST
> To: undisclosed-recipients 
> Subject: Update on CS mail problem
> 
> 
> There was a problem with the CS virus and spam scanning software that
> was causing email sent through @cs.indiana.edu, @extreme.indiana.edu,
> and @osl.iu.edu to be incorrectly tagged as a virus with the virus code:
> 
>   SOPHOS_SAVI_ERROR_OLD_VIRUS_DATA
> 
> We have corrected the underlying problem, but the result
> was that email sent from around 1:20am through around 8:25am on
> 1/8/2010 was not delivered properly.
> 
> The messages were saved in the software's quarantine queue and I will be
> releasing this email from the queue so you should be receiving the
> original messages intact.  I'm working with the vendor on the release of
> the messages from the queue so you should be getting them shortly but I
> don't have an ETA on this yet.  But, I'm working to do this as soon as
> possible so please bear with me.
> 
> Thanks,
> 
>  --Rob




Re: [OMPI devel] New feature for SVN commit messages

2010-02-05 Thread Joshua Hursey
Is this functionality still working?

I added 'cmr:v1.5.1' to r22564 and it did not create a ticket. I noticed a few 
of the tickets manually created yesterday also cited this problem.

-- Josh

On Feb 3, 2010, at 8:23 AM, Jeff Squyres wrote:

> A little while ago, IU added the feature of automatically creating CMRs from 
> SVN commits when you add tokens like this in your commit message:
> 
>svn ci -m "This fixes ...foo...  cmr:v1.4.2"
> 
> IU just extended this feature by allowing you to specify a reviewer, thusly:
> 
>svn ci -m "This fixes ...foo...  cmr:v1.4.2:reviewer=jsquyres"
> 
> You must specify a valid Trac ID.  If you do this, the ticket will be 
> assigned to that ID, meaning that they'll get an email with the ticket and a 
> request to review it.
> 
> More description is here:
> 
>https://svn.open-mpi.org/trac/ompi/wiki/TracSVNCommitMessages
> 
> Enjoy!
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> 
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel




[OMPI devel] v1.4 broken

2010-02-17 Thread Joshua Hursey
I just noticed that the nightly tarball of v1.4 failed to build in the OpenIB 
BTL last night. The error was:

-
btl_openib_component.c: In function 'init_one_device':
btl_openib_component.c:2089: error: 'mca_btl_openib_component_t' has no member 
named 'default_recv_qps'
-

It looks like CMR #2251 is the problem.

-- Josh


[OMPI devel] Build issue: mpi_portable_platform.h

2010-03-12 Thread Joshua Hursey
I noticed the following build error on the OMPI trunk (r22821) on IU's Odin 
machine:
  make[3]: *** No rule to make target `mpi_portable_platform.h', needed by 
`all-am'.  Stop.

I took a quick pass through the svn commit log and did not see anything that 
would have broken this. Any thoughts on what could be causing this?

-- Josh


Re: [OMPI devel] Build issue: mpi_portable_platform.h

2010-03-12 Thread Joshua Hursey
I think I figured it out. The error was coming from a Mercurial branch cloned 
from my internal HG+SVN branch. HG previously marked "mpi_portable_platform.h" 
as a file to not include in rev. control since it was auto-generated. Now that 
it is not auto-generated, it needs to be included in the rev. control.

The fix (in case anyone hits the same problem) is to remove 
"mpi_portable_platform.h" from the .hgignore in your HG+SVN tree, then run 
'hg addremove' and 'hg commit'. After that, things are better.

Thanks for the pointers to the rev #, that helped.

Cheers,
Josh


On Mar 12, 2010, at 4:42 PM, Rainer Keller wrote:

> Hi Josh,
> this is caused by moving the mpi_portable_platform.h.in file in two steps 
> from ompi/include to opal/include -- in order for it to be used by opal_info 
> and orte_info.
> 
> You need to run autogen.sh again after an svn up to at least r22789.
> 
> Hope, this helps?
> 
> Best regards,
> RAiner
> 
> 
> On Friday 12 March 2010 04:17:41 pm Joshua Hursey wrote:
>> I noticed the following build error on the OMPI trunk (r22821) on IU's Odin
>> machine: make[3]: *** No rule to make target `mpi_portable_platform.h',
>> needed by `all-am'.  Stop.
>> 
>> I took a quick pass through the svn commit log and did not see anything
>> that would have broken this. Any thoughts on what could be causing this?
>> 
>> -- Josh
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> 
> 
> -- 
> 
> Rainer Keller, PhD  Tel: +1 (865) 241-6293
> Oak Ridge National Lab  Fax: +1 (865) 241-4811
> PO Box 2008 MS 6164   Email: kel...@ornl.gov
> Oak Ridge, TN 37831-2008AIM/Skype: rusraink




Re: [OMPI devel] Build issue: mpi_portable_platform.h

2010-03-12 Thread Joshua Hursey
I use it, but I only ran it once when I set up the HG+SVN. I'll start refreshing 
it more frequently.

Thanks for the tip,
Josh

On Mar 12, 2010, at 6:19 PM, Jeff Squyres wrote:

> Josh --
> 
> Do you use the contrib/hg/build-hgignore.pl script?  It examines all the 
> svn:ignore files to build up a .hgignore file.  I run this every time I svn 
> up on my hg+svn tree.
> 
> 
> On Mar 12, 2010, at 3:06 PM, Joshua Hursey wrote:
> 
>> I think I figured it out. The error was coming from a Mercurial branch 
>> cloned from my internal HG+SVN branch. HG previously marked 
>> "mpi_portable_platform.h" as a file to not include in rev. control since it 
>> was auto-generated. Now that it is not auto-generated, it needs to be 
>> included in the rev. control.
>> 
>> The fix (in case anyone hits the same problem) is to remove 
>> "mpi_portable_platform.h" from the .hgignore in your HG+SVN tree, then run 
>> 'hg addremove' and 'hg commit'. After that, things are better.
>> 
>> Thanks for the pointers to the rev #, that helped.
>> 
>> Cheers,
>> Josh
>> 
>> 
>> On Mar 12, 2010, at 4:42 PM, Rainer Keller wrote:
>> 
>>> Hi Josh,
>>> this is caused by moving the mpi_portable_platform.h.in file in two steps
>>> from ompi/include to opal/include -- in order for it to be used by opal_info
>>> and orte_info.
>>> 
>>> You need to run autogen.sh again after an svn up to at least r22789.
>>> 
>>> Hope, this helps?
>>> 
>>> Best regards,
>>> RAiner
>>> 
>>> 
>>> On Friday 12 March 2010 04:17:41 pm Joshua Hursey wrote:
>>>> I noticed the following build error on the OMPI trunk (r22821) on IU's Odin
>>>> machine: make[3]: *** No rule to make target `mpi_portable_platform.h',
>>>> needed by `all-am'.  Stop.
>>>> 
>>>> I took a quick pass through the svn commit log and did not see anything
>>>> that would have broken this. Any thoughts on what could be causing this?
>>>> 
>>>> -- Josh
>>>> ___
>>>> devel mailing list
>>>> de...@open-mpi.org
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>> 
>>> 
>>> --
>>> 
>>> Rainer Keller, PhD  Tel: +1 (865) 241-6293
>>> Oak Ridge National Lab  Fax: +1 (865) 241-4811
>>> PO Box 2008 MS 6164   Email: kel...@ornl.gov
>>> Oak Ridge, TN 37831-2008AIM/Skype: rusraink
>> 
>> 
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> 
> 
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] RFC: Merge tmp fault recovery branch into trunk

2010-03-23 Thread Joshua Hursey
Just a reminder that this RFC will go into the trunk this evening unless there 
are strong objections.

We intend to let this soak for a few days then bring it over to the 1.5 series 
(after the 1.5.0 release).

-- Josh

On Mar 15, 2010, at 9:26 AM, Josh Hursey wrote:

> (Updated RFC, per offline discussion)
> 
> WHAT: Merge a tmp branch for fault recovery development into the OMPI trunk
> 
> WHY: Bring over work done by Josh and Ralph to extend OMPI's fault recovery 
> capabilities
> 
> WHERE: Impacts a number of ORTE files and a ORTE ErrMgr framework
> 
> TIMEOUT: Barring objections and/or further requests for delay, evening of 
> March 23
> 
> REFERENCE BRANCH: http://bitbucket.org/jjhursey/orte-errmgr/
> 
> ==
> 
> BACKGROUND:
> 
> Josh and Ralph have been working on a private branch off of the trunk on 
> extended fault recovery procedures, mostly impacting ORTE. The new code 
> optionally allows ORTE to recover from failed nodes, moving processes to 
> other nodes in order to maintain operation. In addition, the code provides 
> better support for recovering from individual process failures.
> 
> Not all of the work done on the private branch will be brought over in this 
> commit. Some of the MPI-specific code that allows recovery from process 
> failure on-the-fly will be committed separately at a later date. This commit 
> provides the foundation for ORTE stabilization that can be built upon to 
> provide OMPI layer stability in the future.
> 
> This commit significantly modifies the ORTE ErrMgr framework to support those 
> advanced recovery operations. The ErrMgr public interface has been preserved 
> since it is used in various places throughout the codebase, and should 
> continue to be used as normal. The ErrMgr framework has been internally 
> redesigned to better support multiple strategies for responding to failures 
> (represents a merge of the old ErrMgr and the RecoS framework, into the 
> ErrMgr 3.0 component interface). The default (base) mode will continue to 
> work exactly the same as today, aborting the job when a failure occurs. 
> However, if the user elects to enable recovery then one or more ErrMgr 
> components will be activated to determine the recovery policy for the job.
> 
> We have created a public repo (reference branch, above) with the code to be 
> merged into the trunk (r22815). Please feel free to check it out and test it.
> 
> NOTE: The new recovery capability is only active if the user elects to use it 
> by setting the MCA parameter errmgr_base_enable_recovery to '1'.
> 
> NOTE: More ErrMgr recovery components will be coming online in the near 
> future, currently this branch only includes the 'orcm' module for ORTE 
> process recovery (not MPI processes). If you want to experiment with this 
> feature, below are the MCA parameters that you will need to get started.
>> #
>> plm=rsh
>> rmaps=resilient
>> routed=cm
>> errmgr_base_enable_recovery=1
>> #
> 
> Comments, suggestions, and corrections are welcome!
> 
> 
> 
> On Mar 10, 2010, at 2:22 PM, Josh Hursey wrote:
> 
>> Wesley,
>> 
>> Thanks for catching that oversight. Below are the MCA parameters that you 
>> should need at the moment:
>> #
>> # Use the C/R Process Migration Recovery Supervisor
>> recos_base_enable=1
>> # Only use the 'rsh' launcher, other launchers will be supported later
>> plm=rsh
>> # The resilient mapper knows how to use RecoS and deal with recovering procs
>> rmaps=resilient
>> # 'cm' component is the only one that can handle failures at the moment
>> routed=cm
>> #
>> 
>> Let me know if you have any troubles.
>> 
>> -- Josh
>> 
>> On Mar 10, 2010, at 10:36 AM, Wesley Bland wrote:
>> 
>>> Josh,
>>> 
>>> You mentioned some MCA parameters that you would include in the email, but 
>>> I don't see those parameters anywhere.  Could you please put those in here 
>>> to make testing easier for people.
>>> 
>>> Wesley
>>> 
>>> On Wed, Mar 10, 2010 at 1:26 PM, Josh Hursey  wrote:
>>> Yesterday evening George, Thomas and I discussed some of their concerns 
>>> about this RFC at the MPI Forum meeting. After the discussion, we seemed to 
>>> be in agreement that the RecoS framework is a good idea and the concepts 
>>> and fixes in this RFC should move forward with a couple of notes:
>>> 
>>> - They wanted to test the branch a bit more over the next couple of days. 
>>> Some MCA parameters that you will need are at the bottom of this message.
>>> 
>>> - Reiterate that this RFC only addresses ORTE stability, not OMPI 
>>> stability. The OMPI stability extension is a second step for the line of 
>>> work, and should/will fit in nicely with the RecoS framework being proposed 
>>> in this RFC. The OMPI layer stability will require a significant amount of 
>>> work, but the RecoS framework will provide the

Re: [OMPI devel] trunk breakage

2010-05-22 Thread Joshua Hursey
Along with this, the exit code from mpirun is not correct. It is returning 1, 
even when the run was successful. This is showing up in MTT, where the trivial 
test suite is failing things like 'hello world' since the return code is not 
what was expected.

Ralph is looking into this, but I just wanted to give people a heads up.

-- Josh

On May 21, 2010, at 12:59 PM, Jeff Squyres wrote:

> Both things should now be fixed.  Please let me know if you run into problems.
> 
> There's a few more fixes coming in paffinity, but base functionality should 
> be restored.
> 
> 
> On May 21, 2010, at 10:45 AM, Jeff Squyres wrote:
> 
>> There's two things broken on the trunk right now:
>> 
>> 1. I broke internal libltdl builds.  Grr.  Should have a fix shortly.
>> 
>> 2. Paffinity is broken -- if you try to run with any binding, you'll get an 
>> error.  It looks like some OPAL_SOS stuff broke it (error code checking 
>> conversion stuff).  Ralph and I talked on the phone and agreed on a fix; 
>> I'll get to it after I fix #1.
>> 
>> Sorry folks...
>> 
>> -- 
>> Jeff Squyres
>> jsquy...@cisco.com
>> For corporate legal information go to:
>> http://www.cisco.com/web/about/doing_business/legal/cri/
>> 
> 
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] trunk breakage

2010-05-22 Thread Joshua Hursey
I thought so too, but MTT is still showing a bunch of errors with the current 
nightly snapshot.
  http://www.open-mpi.org/mtt/index.php?do_redir=1855

It kind of looks like it only really shows up when using more than one node (>4 
np on Odin).


On May 22, 2010, at 2:07 PM, Ralph Castain wrote:

> 
> On May 22, 2010, at 8:43 AM, Joshua Hursey wrote:
> 
>> Along with this, the exit code from mpirun is not correct. It is returning 
>> 1, even when the run was successful. This is showing up in MTT, where the 
>> trivial test suite is failing things like 'hello world' since the return 
>> code is not what was expected.
> 
> I thought this had been fixed - no?
> 
>> 
>> Ralph is looking into this, but I just wanted to give people a heads up.
>> 
>> -- Josh
>> 
>> On May 21, 2010, at 12:59 PM, Jeff Squyres wrote:
>> 
>>> Both things should now be fixed.  Please let me know if you run into 
>>> problems.
>>> 
>>> There's a few more fixes coming in paffinity, but base functionality should 
>>> be restored.
>>> 
>>> 
>>> On May 21, 2010, at 10:45 AM, Jeff Squyres wrote:
>>> 
>>>> There's two things broken on the trunk right now:
>>>> 
>>>> 1. I broke internal libltdl builds.  Grr.  Should have a fix shortly.
>>>> 
>>>> 2. Paffinity is broken -- if you try to run with any binding, you'll get 
>>>> an error.  It looks like some OPAL_SOS stuff broke it (error code checking 
>>>> conversion stuff).  Ralph and I talked on the phone and agreed on a fix; 
>>>> I'll get to it after I fix #1.
>>>> 
>>>> Sorry folks...
>>>> 
>>>> -- 
>>>> Jeff Squyres
>>>> jsquy...@cisco.com
>>>> For corporate legal information go to:
>>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>>> 
>>> 
>>> 
>>> -- 
>>> Jeff Squyres
>>> jsquy...@cisco.com
>>> For corporate legal information go to:
>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>> 
>>> 
>>> ___
>>> devel mailing list
>>> de...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> 
>> 
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel




[OMPI devel] RFC: Checkpoint/Restart Advancements and Bug Fixes

2010-07-31 Thread Joshua Hursey
WHAT:
Checkpoint/Restart-based automatic recovery and process migration, advanced 
checkpoint storage, C/R-enabled debugging, MPI Extension API for C/R, and some 
bug fixes.

WHY:
This commit includes a variety of checkpoint/restart advancements that have 
been pending on a temporary branch for a long while. Users have been waiting on 
many of these bug fixes and advancements for a while now. More details below.

WHERE:
  http://bitbucket.org/jjhursey/ompi-cr-recos
Last sync'ed to trunk in r23536 (July 31, 2010)

WHEN:
Move into the trunk in the next two weeks. Then into the 1.5 series with the 
ORTE refresh (Ticket #2471).

TIMEOUT:
Aug 10, 2010 @ teleconf (commit at COB)

DOCUMENTATION
Following public site will be fully updated upon commit:
  http://osl.iu.edu/research/ft
Temporary documentation site (will be taken down upon commit):
  http://osl.iu.edu/~jjhursey/research/ft-www-preview
Man page documentation will be updated soon.


The changes may seem large, but they are isolated to the C/R components and 
frameworks except where they are wired into the infrastructure.

This commit brings in a variety of pending features and bug fixes that have 
been accumulating over the past 8-12 months. Highlights are below (full change 
log at bottom):
 * Added C/R-enabled Debugging Support
 * Added a Stable Storage framework for advanced checkpoint storage techniques
 * Added checkpoint caching and compression support
 * Added two C/R-based recovery policies
   * C/R-based Process Migration (API and ompi-migrate tool activated)
   * C/R-based Automatic Recovery
 * Added a variety of C/R MPI Extensions functions (e.g., Checkpoint, Restart, 
Migrate)
 * Added C/R progress meters to File Movement (FileM), Stable Storage (SStore), 
and Snapshot Coordination (SnapC) frameworks

While this RFC is pending I plan to clean up the man page documentation for 
these features and update copyrights in the code base.



Change Log:
---
Major Changes:
--
 * Added C/R-enabled Debugging support.
   Enabled with the --enable-crdebug flag. See the following website for more 
information:
   http://osl.iu.edu/research/ft/crdebug/
 * Added Stable Storage (SStore) framework for checkpoint storage
   * 'central' component does a direct to central storage save
   * 'stage' component stages checkpoints to central storage while the 
application continues execution.
 * 'stage' supports offline compression of checkpoints before moving 
(sstore_stage_compress)
 * 'stage' supports local caching of checkpoints to improve automatic 
recovery (sstore_stage_caching)
 * Added Compression (compress) framework to support checkpoint compression
 * Add two new ErrMgr recovery policies
   * {{{crmig}}} C/R Process Migration
   * {{{autor}}} C/R Automatic Recovery
 * Added the {{{ompi-migrate}}} command line tool to support the {{{crmig}}} 
ErrMgr component
 * Added CR MPI Ext functions (enable them with {{{--enable-mpi-ext=cr}}} 
configure option)
   * {{{OMPI_CR_Checkpoint}}} (Fixes #2342)
   * {{{OMPI_CR_Restart}}}
   * {{{OMPI_CR_Migrate}}} (may need some more work for mapping rules)
   * {{{OMPI_CR_INC_register_callback}}} (Fixes #2192)
   * {{{OMPI_CR_Quiesce_start}}}
   * {{{OMPI_CR_Quiesce_checkpoint}}}
   * {{{OMPI_CR_Quiesce_end}}}
   * {{{OMPI_CR_self_register_checkpoint_callback}}}
   * {{{OMPI_CR_self_register_restart_callback}}}
   * {{{OMPI_CR_self_register_continue_callback}}}
 * The ErrMgr predicted_fault() interface has been changed to take an 
opal_list_t of ErrMgr defined types. This will allow us to better support a 
wider range of fault prediction services in the future.
 * Add a progress meter to:
   * FileM rsh (filem_rsh_process_meter)
   * SnapC full (snapc_full_progress_meter)
   * SStore stage (sstore_stage_progress_meter)
 * Added 2 new command line options to ompi-restart
   * --showme : Display the full command line that would have been exec'ed.
   * --mpirun_opts : Command line options to pass directly to mpirun. (Fixes 
#2413)
 * Deprecated some MCA params:
   * crs_base_snapshot_dir deprecated, use sstore_stage_local_snapshot_dir
   * snapc_base_global_snapshot_dir deprecated, use 
sstore_base_global_snapshot_dir
   * snapc_base_global_shared deprecated, use sstore_stage_global_is_shared
   * snapc_base_store_in_place deprecated, replaced with different components 
of SStore
   * snapc_base_global_snapshot_ref deprecated, use 
sstore_base_global_snapshot_ref
   * snapc_base_establish_global_snapshot_dir deprecated, never well supported
   * snapc_full_skip_filem deprecated, use sstore_stage_skip_filem
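
   For example, an AMCA file that used to contain:
       snapc_base_global_snapshot_dir=/path/to/shared/storage
   would now use:
       sstore_base_global_snapshot_dir=/path/to/shared/storage
   (the path shown is illustrative; the parameter mapping is from the list above)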

Minor Changes:
--
 * Fixes #1924 : {{{ompi-restart}}} now recognizes path prefixed checkpoint 
handles and does the right thing.
 * Fixes #2097 : {{{ompi-info}}} should now report all available CRS components
 * Fixes #2161 : Manual checkpoint movement. A user can 'mv' a checkpoint 
directory from the original location to another and still

Re: [OMPI devel] RFC: Checkpoint/Restart Advancements and Bug Fixes

2010-08-10 Thread Joshua Hursey
Committed in r23587

:)

On Jul 31, 2010, at 12:51 PM, Joshua Hursey wrote:

> WHAT:
> Checkpoint/Restart-based automatic recovery and process migration, advanced 
> checkpoint storage, C/R-enabled debugging, MPI Extension API for C/R, and 
> some bug fixes.
> 
> WHY:
> This commit includes a variety of checkpoint/restart advancements that have 
> been pending on a temporary branch for a long while. Users have been waiting 
> on many of these bug fixes and advancements for a while now. More details 
> below.
> 
> WHERE:
>  http://bitbucket.org/jjhursey/ompi-cr-recos
> Last sync'ed to trunk in r23536 (July 31, 2010)
> 
> WHEN:
> Move into the trunk in the next two weeks. Then into the 1.5 series with the 
> ORTE refresh (Ticket #2471).
> 
> TIMEOUT:
> Aug 10, 2010 @ teleconf (commit at COB)
> 
> DOCUMENTATION
> Following public site will be fully updated upon commit:
>  http://osl.iu.edu/research/ft
> Temporary documentation site (will be taken down upon commit):
>  http://osl.iu.edu/~jjhursey/research/ft-www-preview
> Man page documentation will be updated soon.
> 
> 
> The changes may seem large, but they are isolated to the C/R components and 
> frameworks except where they are wired into the infrastructure.
> 
> This commit brings in a variety of pending features and bug fixes that have 
> been accumulating over the past 8-12 months. Highlights are below (full 
> change log at bottom):
> * Added C/R-enabled Debugging Support
> * Added a Stable Storage framework for advanced checkpoint storage techniques
> * Added checkpoint caching and compression support
> * Added two C/R-based recovery policies
>   * C/R-based Process Migration (API and ompi-migrate tool activated)
>   * C/R-based Automatic Recovery
> * Added a variety of C/R MPI Extensions functions (e.g., Checkpoint, Restart, 
> Migrate)
> * Added C/R progress meters to File Movement (FileM), Stable Storage 
> (SStore), and Snapshot Coordination (SnapC) frameworks
> 
> While this RFC is pending I plan to clean up the man page documentation for 
> these features and update copyrights in the code base.
> 
> 
> 
> Change Log:
> ---
> Major Changes:
> --
> * Added C/R-enabled Debugging support.
>   Enabled with the --enable-crdebug flag. See the following website for more 
> information:
>   http://osl.iu.edu/research/ft/crdebug/
> * Added Stable Storage (SStore) framework for checkpoint storage
>   * 'central' component does a direct to central storage save
>   * 'stage' component stages checkpoints to central storage while the 
> application continues execution.
> * 'stage' supports offline compression of checkpoints before moving 
> (sstore_stage_compress)
> * 'stage' supports local caching of checkpoints to improve automatic 
> recovery (sstore_stage_caching)
> * Added Compression (compress) framework to support checkpoint compression
> * Add two new ErrMgr recovery policies
>   * {{{crmig}}} C/R Process Migration
>   * {{{autor}}} C/R Automatic Recovery
> * Added the {{{ompi-migrate}}} command line tool to support the {{{crmig}}} 
> ErrMgr component
> * Added CR MPI Ext functions (enable them with {{{--enable-mpi-ext=cr}}} 
> configure option)
>   * {{{OMPI_CR_Checkpoint}}} (Fixes #2342)
>   * {{{OMPI_CR_Restart}}}
>   * {{{OMPI_CR_Migrate}}} (may need some more work for mapping rules)
>   * {{{OMPI_CR_INC_register_callback}}} (Fixes #2192)
>   * {{{OMPI_CR_Quiesce_start}}}
>   * {{{OMPI_CR_Quiesce_checkpoint}}}
>   * {{{OMPI_CR_Quiesce_end}}}
>   * {{{OMPI_CR_self_register_checkpoint_callback}}}
>   * {{{OMPI_CR_self_register_restart_callback}}}
>   * {{{OMPI_CR_self_register_continue_callback}}}
> * The ErrMgr predicted_fault() interface has been changed to take an 
> opal_list_t of ErrMgr defined types. This will allow us to better support a 
> wider range of fault prediction services in the future.
> * Add a progress meter to:
>   * FileM rsh (filem_rsh_process_meter)
>   * SnapC full (snapc_full_progress_meter)
>   * SStore stage (sstore_stage_progress_meter)
> * Added 2 new command line options to ompi-restart
>   * --showme : Display the full command line that would have been exec'ed.
>   * --mpirun_opts : Command line options to pass directly to mpirun. (Fixes 
> #2413)
> * Deprecated some MCA params:
>   * crs_base_snapshot_dir deprecated, use sstore_stage_local_snapshot_dir
>   * snapc_base_global_snapshot_dir deprecated, use 
> sstore_base_global_snapshot_dir
>   * snapc_base_global_shared deprecated, use sstore_stage_global_is_shared
>   * snapc_base_store_in_place deprecated, replaced with different com

Re: [OMPI devel] Question on MCA_BASE_METADATA_PARAM_NONE

2010-08-23 Thread Joshua Hursey
When you configure with '--with-ft=cr', this enables the C/R fault tolerance 
frameworks, tools, and code paths. One code path is the component selection 
logic you cited below. When you run an application compiled with Open MPI, 
passing the '-am ft-enable-cr' or '-am ft-enable-cr-recovery' options 
activates the logic below to pick only those components that have 
self-identified as 'checkpoint ready'. 'Checkpoint ready' means different things for 
different frameworks. Some frameworks do not need to do anything (e.g., timer), 
while others require much more work (e.g., BTLs).

There are some components that have not been verified to work well under C/R 
scenarios, and they are not selected when you pass the '-am' options cited 
above. The Shared Memory BTL -is- checkpoint ready, and -will- be selected (on 
the current 1.4, 1.5 and trunk branches). See the code below (line 94):
  
https://svn.open-mpi.org/trac/ompi/browser/trunk/ompi/mca/btl/sm/btl_sm_component.c#L94

The shared memory collective module [also called 'sm'] (which is not enabled 
under normal use due to testing - Line 89 in coll_sm_component.c) is -not- 
checkpoint ready (line 77), also due to testing:
  
https://svn.open-mpi.org/trac/ompi/browser/trunk/ompi/mca/coll/sm/coll_sm_component.c#L76

So shared memory communication support has been available for 
checkpoint/restart functionality for a couple years now. The shared memory 
collective has not matured or been tested enough to be active even under 
non-C/R circumstances. Once it is ready, we can consider possibly trying to 
support it under C/R enabled activities.
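
For component writers: being 'checkpoint ready' is advertised in the metadata 
field of the component structure. A sketch, modeled from memory on the sm BTL 
(the first link above is the authoritative version):

  const mca_btl_base_component_2_0_0_t mca_btl_sm_component = {
      {
          MCA_BTL_BASE_VERSION_2_0_0,
          "sm",                 /* component name */
          ...                   /* version and open/close functions */
      },
      {
          /* This component is checkpoint ready */
          MCA_BASE_METADATA_PARAM_CHECKPOINT
      },
      ...                       /* component init/query functions */
  };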

I hope that clarifies what is going on.

-- Josh

On Aug 23, 2010, at 12:50 PM,   
wrote:

> Hi
>  
> In the file "mca_base_components_open.c", the following code checks for the 
> components that are checkpointable. If I configure the OpenMPI library with 
> the "--enable-cr" option, I was under the assumption that all components 
> would be checkpointable. However, I see that quite a few components are not 
> checkpointable, and that list includes "Shared Memory (sm)". Do I have to add 
> any other options to the "configure" command so that all components are 
> checkpointable? Thanks
>  
> /*
>  * If the user asked for a checkpoint enabled run
>  * then only load checkpoint enabled components.
>  */
> if( MCA_BASE_METADATA_PARAM_CHECKPOINT & open_only_flags) {
>     if( MCA_BASE_METADATA_PARAM_CHECKPOINT & dummy->data.param_field) {
>         opal_output_verbose(10, output_id,
>                             "mca: base: components_open: "
>                             "(%s) Component %s is Checkpointable",
>                             type_name,
>                             dummy->version.mca_component_name);
>     }
>     else {
>         opal_output_verbose(10, output_id,
>                             "mca: base: components_open: "
>                             "(%s) Component %s is *NOT* Checkpointable - Disabled",
>                             type_name,
>                             dummy->version.mca_component_name);
>         opal_list_remove_item(&components_found, item);
>     }
> }
>  
> Thanks
> 
> Ananda
> 
>  
> Ananda B Mudar, PMP
> Senior Technical Architect
> Wipro Technologies


Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory
http://www.cs.indiana.edu/~jjhursey







Re: [OMPI devel] Checkpoint/restart question

2010-08-26 Thread Joshua Hursey
I have not played with the Condor checkpoint/restart library in quite some 
time. Supporting it should be fairly straightforward, though the devil is 
always in the details with such things.

In Open MPI, all of the code to support checkpoint/restart services like BLCR 
or Condor lives in an OPAL framework called CRS (for 
Checkpoint/Restart Service). To support a new checkpointer you will need to 
develop a new component under opal/mca/crs/. If you are (or someone you know 
is) interested in doing the development, you should be able to use the BLCR 
module to help guide you through the details.
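
To give a rough feel for the shape of that work, a new component might start
out like the sketch below. The function names and signatures here are
illustrative assumptions, not the real interface; opal/mca/crs/crs.h and the
BLCR component define the actual module API.

  /*
   * Rough sketch of a hypothetical 'condor' CRS component.
   * All names/signatures are illustrative assumptions; see
   * opal/mca/crs/crs.h and opal/mca/crs/blcr/ for the real interface.
   */
  #include <sys/types.h>
  #include "opal/constants.h"

  static int condor_module_init(void)
  {
      /* load and initialize the Condor checkpoint library here */
      return OPAL_SUCCESS;
  }

  static int condor_module_checkpoint(pid_t pid)
  {
      /* ask the Condor library to write the image of process 'pid'
         to the snapshot location chosen by the layers above */
      (void) pid;
      return OPAL_SUCCESS;
  }

  static int condor_module_restart(const char *snapshot_location)
  {
      /* resume execution from the saved Condor image */
      (void) snapshot_location;
      return OPAL_SUCCESS;
  }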

This integration would allow you to use all of Open MPI's current C/R 
infrastructure, with the Condor C/R library capturing the per-process 
checkpoints. Storage and coordination are handled in other frameworks in the 
Open MPI environment, so you should not need to worry about them at this level.

If you have any questions let me know and I can try to help you navigate the 
code base.

-- Josh

On Aug 25, 2010, at 7:36 PM, Tomas Oppelstrup wrote:

> Hi,
> I have a question about checkpoint-restart operation with Open MPI. I
> hope this is an appropriate forum for my question.
> 
> I do not have access to recompile the kernel or load kernel modules,
> so I would like to use the Condor checkpoint-restart library. Can
> that be made to work with Open MPI's checkpoint-restart
> infrastructure?
> 
> The Condor library, upon receipt of a signal or a call to its checkpoint
> function from within the program, generates a file containing the
> complete (as complete as possible) state of the process, including
> the state of libraries, e.g. openmpi. On restart, the process
> image/state is loaded into memory and execution is resumed at the
> checkpoint location.
> 
> On restart, I assume that some information in the MPI state may need
> to be reinitialized, since e.g. the names of the hosts of the
> MPI processes, and the PIDs of possible support processes, will have
> changed.
> 
> Is this tricky to fix (that code must already be there somehow for BLCR
> compatibility)?
> 
> Perhaps it can be achieved by (in violation of the MPI standard)
> calling MPI_Finalize before the checkpoint, and MPI_Init after
> restart? This seems like a conceptually appealing solution, but may
> not be allowed, nor the correct thing to do, in Open MPI?!
> 
>  Thanks for any ideas/help/pointers to more information!
> 
> Tomas
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 


Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory
http://www.cs.indiana.edu/~jjhursey







Re: [OMPI devel] nit-pick: typo in README (1.4.3rc1 and 1.5rc5)

2010-08-26 Thread Joshua Hursey
Well, that's slightly embarrassing. Thanks for the catch. I filed CMRs to have 
this patch applied to the v1.4 and v1.5 branches before the next releases. 
Tickets below:
  https://svn.open-mpi.org/trac/ompi/ticket/2548
  https://svn.open-mpi.org/trac/ompi/ticket/2549

Thanks,
Josh

On Aug 25, 2010, at 5:48 PM, Paul H. Hargrove wrote:

> The following patch applies to both 1.4.3rc1 and 1.5rc5 to fix a typo in 
> the README:
> 
> --- README.orig    2010-08-25 14:45:09.0 -0700
> +++ README 2010-08-25 14:45:20.0 -0700
> @@ -69,7 +69,7 @@
> - Asynchronous, transparent checkpoint/restart support
>   - Fully coordinated checkpoint/restart coordination component
>   - Support for the following checkpoint/restart services:
> -- blcr: Berkley Lab's Checkpoint/Restart
> +- blcr: Berkeley Lab's Checkpoint/Restart
> - self: Application level callbacks
>   - Support for the following interconnects:
> - tcp
> 
> 
> 
> -Paul
> 
> -- 
> Paul H. Hargrove  phhargr...@lbl.gov
> Future Technologies Group
> HPC Research Department   Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 


Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory
http://www.cs.indiana.edu/~jjhursey







Re: [OMPI devel] Question on the members of ompi_crcp_bkmrk_pml_drain_message_ref_t and ompi_crcp_bkmrk_pml_traffic_message_ref_t

2010-08-26 Thread Joshua Hursey
Ananda,

I think the comments are just misleading/wrong here. Messages are grouped 
by the signature/envelope of the message. The 
ompi_crcp_bkmrk_pml_drain_message_ref_t and 
ompi_crcp_bkmrk_pml_traffic_message_ref_t data structures describe the envelope, 
and each has a 'msg_contents' list whose entries point to the information 
unique to each individual message (e.g., buffer, request, status), stored as 
ompi_crcp_bkmrk_pml_message_content_ref_t.

The 'drain_message' and 'traffic_message' datatypes use those integers 
to count the number of done/active/posted 'message_content' entries stored in 
the list they are responsible for. The internals of crcp_bkmrk_pml.c use 
these counter values to quickly determine what needs to be drained or waited 
on, instead of iterating through the full list of messages every time.
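
Schematically, the grouping looks like the sketch below (simplified, with
illustrative field names -- not the actual declarations from
crcp_bkmrk_pml.h):

  /* One message_ref exists per unique envelope (peer/tag/comm/size);
   * the integer fields are tallies over the message_content entries on
   * its list, which is why they are counters rather than booleans.
   * Field names are illustrative. */
  #include "opal/class/opal_list.h"

  struct traffic_message_ref_sketch {
      /* ... envelope/signature fields (peer, tag, comm, size) ... */

      opal_list_t msg_contents;  /* one entry per individual message:
                                    buffer, request, status, ... */

      int done;            /* # of contents complete WRT PML semantics */
      int active;          /* # of contents in the progress cycle */
      int already_posted;  /* # of contents already posted */
  };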

This technique both reduces the memory footprint of the implementation and 
slightly improves performance. It looks like the comments were not updated 
to match the change. Sorry about that. I'll file a ticket to update those 
comments in the trunk and branches so I don't forget.
  https://svn.open-mpi.org/trac/ompi/ticket/2550

I hope that helps a bit.

Best,
Josh

On Aug 26, 2010, at 1:59 AM,   
wrote:

> Josh
>  
> In the file ompi/mca/crcp/bkmrk/crcp_bkmrk_pml.h, I have a question on the 
> way a few of the members of the following structures are defined:
> ompi_crcp_bkmrk_pml_drain_message_ref_t
> ompi_crcp_bkmrk_pml_traffic_message_ref_t
>  
> Under the definition of “ompi_crcp_bkmrk_pml_drain_message_ref_t”, based on 
> the comments the following members would be better declared as Boolean 
> variables; however, they are declared as integers. Is there any reason for 
> not using a Boolean type?
>  
> /** Is this message complete WRT PML semantics?
>  * true = message done on this side (send or receive)
>  * false = message still in process (sending or receiving)
>  */
> int done;
>  
> /** Is the message actively being worked on?
>  * true = Message is !done, and is in the progress cycle
>  * false = Message is !done and is *not* in the progress cycle
>  *         ([send/recv]_init requests)
>  */
> int active;
>  
> /** Has this message been posted?
>  * true = message was posted (send or recv)
>  * false = message was not yet posted.
>  *   Used when trying to figure out which messages the drain protocol
>  *   needs to post, and which messages have already been posted for it.
>  */
> int already_posted;
>  
> I see that you have used bool type for similar members in 
> ompi_crcp_bkmrk_pml_message_content_ref_t.
>  
> Thanks
> Ananda
>  
> Ananda B Mudar, PMP
> Senior Technical Architect
> Wipro Technologies


Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory
http://www.cs.indiana.edu/~jjhursey







Re: [OMPI devel] Multi-environment builds

2007-07-11 Thread Joshua Hursey


On Jul 11, 2007, at 8:09 AM, Terry D. Dontje wrote:


Jeff Squyres wrote:


On Jul 10, 2007, at 1:26 PM, Ralph H Castain wrote:

2. It may be useful to have some high-level parameters to specify a
specific run-time environment, since ORTE has multiple, related
frameworks (e.g., RAS and PLS).  E.g., "orte_base_launcher=tm", or
somesuch.

I was just writing this up in an enhancement ticket when the thought
hit me: isn't this aggregate MCA parameters?  I.e.:

mpirun --am tm ...

Specifically, we'll need to make a "tm" AMCA file (and whatever other
ones we want), but my point is: does AMCA already give us what we  
want?

The above sounds like a possible solution as long as we are going to
deliver a set of such files and not require each site to create their
own.  Also, can one pull in multiple AMCA files for one run, so that
you can specify a tm AMCA file and possibly some other AMCA file that
the user may want?


Yep. You can put a ':' between the different file names. So:
 shell$ mpirun -am tm:foo:bar ...
will pull in the three AMCA files 'tm', 'foo', and 'bar' in that order of
precedence. Meaning that 'tm' can override an MCA parameter in 'foo',
and 'foo' can override an MCA parameter in 'bar'. And any '-mca'
command line options take higher precedence than AMCA parameter
files, so they could override MCA parameters set by any of 'tm', 'foo', or
'bar'.
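
For reference, an AMCA file is just an ordinary MCA parameter file. A
hypothetical 'tm' file could look like this (the parameter values are
illustrative):

  # hypothetical 'tm' AMCA parameter file: one "param = value" per line
  ras = tm
  pls = tm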


I'll put it on my list to make an FAQ entry for AMCA usage, as I don't
see one.


-- Josh



--td
___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel



Josh Hursey
jjhur...@open-mpi.org
http://www.open-mpi.org/





Re: [OMPI devel] Notes on building and running Open MPI on Red Storm

2007-07-12 Thread Joshua Hursey
Thanks for the heads up. I've noticed this warning on the Cray  
systems here at ORNL, and haven't had a chance to put the fix in yet.


This function is exposed in non-CR builds as a user interface item.  
If the user requests a checkpoint of an MPI job that was not compiled  
with C/R (or doesn't have it enabled if FT was compiled in) then it  
will respond with a nice error message instead of not responding at  
all. I go back and forth on cutting this out completely as the tools  
to checkpoint shouldn't be built if a user doesn't compile in FT  
support.


I'll work on a fix for it in the next couple of days.
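
The fix will probably look something like the sketch below, assuming a
configure-time AC_CHECK_FUNCS([mkfifo]) probe so that HAVE_MKFIFO is defined
only on platforms where the call works ('open_cr_channel' and 'fifo_path' are
placeholder names):

  #include <sys/types.h>
  #include <sys/stat.h>   /* mkfifo() */

  static int open_cr_channel(const char *fifo_path)
  {
  #ifdef HAVE_MKFIFO
      if (0 != mkfifo(fifo_path, 0660)) {
          return -1;   /* report the error; checkpoint response disabled */
      }
      return 0;
  #else
      /* no working mkfifo() (e.g., Catamount on Red Storm): compile the
         checkpoint-response channel out entirely */
      (void) fifo_path;
      return -1;
  #endif
  }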

Cheers,
Josh

On Jul 12, 2007, at 10:40 AM, Brian Barrett wrote:


On Jul 11, 2007, at 4:47 PM, Glendenning, Lisa wrote:


  * When linking with libopen-pal, the following warning is normal: 'In
function `checkpoint_response': warning: mkfifo is not implemented and
will always fail'


Josh -

I thought the checkpoint code wasn't built unless requested.  Anyway,
if you AC_CHECK_FUNCS for mkfifo, it'll fail on the Cray.  Can you
update opal/runtime/opal_cr.c to not have code that calls mkfifo()
compiled on platforms that don't have mkfifo?


Thanks,

Brian
___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel



Josh Hursey
jjhur...@open-mpi.org
http://www.open-mpi.org/