[OMPI users] How to checkpoint atomic function in OpenMPI

2010-06-14 Thread Nguyen Toan
Hi all,
I have an MPI program as follows:
---
#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    ..
    for (i = 0; i < 1; i++) {
        my_atomic_func();
    }
    ...
    MPI_Finalize();
    return 0;
}


Most of this program's runtime is spent in the loop, and each call to
my_atomic_func() takes a fairly long time.
I want my_atomic_func() to run atomically with respect to checkpointing, but
a checkpoint (triggered by the ompi-checkpoint command) may be taken in the
middle of my_atomic_func(), in which case ompi-restart may fail to restart
the program correctly.

So my questions are:
+ At checkpoint time (when ompi-checkpoint is executed), is there a way to make
Open MPI wait until my_atomic_func() finishes?
+ How does ompi-checkpoint checkpoint MPI threads?

Regards,
Nguyen Toan


Re: [OMPI users] How to checkpoint atomic function in OpenMPI

2010-07-16 Thread Josh Hursey

On Jun 14, 2010, at 5:26 AM, Nguyen Toan wrote:

> Hi all,
> I have an MPI program as follows:
> ---
> int main(){
>    MPI_Init();
>    ..
>    for (i=0; i<1; i++) {
>       my_atomic_func();
>    }
>    ...
>    MPI_Finalize();
>    return 0;
> }
> 
> 
> Most of this program's runtime is spent in the loop, and each call to
> my_atomic_func() takes a fairly long time.
> I want my_atomic_func() to run atomically with respect to checkpointing, but
> a checkpoint (triggered by the ompi-checkpoint command) may be taken in the
> middle of my_atomic_func(), in which case ompi-restart may fail to restart
> the program correctly.
> 
> So my questions are:
> + At checkpoint time (when ompi-checkpoint is executed), is there a way to make
> Open MPI wait until my_atomic_func() finishes?

We do not currently have an external function to declare a critical section 
during which a checkpoint should not be taken. I filed a ticket to make one 
available. The link is below if you would like to follow its progress:
  https://svn.open-mpi.org/trac/ompi/ticket/2487

I have an MPI Extension interface for C/R that I will be bringing into the 
trunk in the next few weeks. I should be able to extend it to include this 
feature. But I can't promise a deadline, just that I will update the ticket 
when it is available.

In the meantime, you might try using the BLCR interface to define critical
sections. If you are using the C/R thread, then this may work (though I have
not tried it):
  cr_enter_cs()
  cr_leave_cs()
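
As a rough illustration of the suggestion above, here is a minimal sketch of wrapping each call to my_atomic_func() in a BLCR critical section. Several things here are assumptions rather than anything confirmed in this thread: the signatures (cr_init() returning a cr_client_id_t, and cr_enter_cs()/cr_leave_cs() taking that id) are as I recall them from BLCR's libcr.h, so please check them against your installed header; the USE_BLCR macro, the stand-in stubs, the body of my_atomic_func(), and run_protected_loop() are hypothetical names made up for this example. Whether this cooperates with Open MPI's own C/R thread is untested, as noted above.

```c
#include <stdio.h>

#ifdef USE_BLCR
#include <libcr.h>   /* real BLCR client API; link with -lcr */
#else
/* Stand-ins so the sketch compiles without BLCR installed. They only mimic
 * the assumed call shape and do not block any checkpoint. */
typedef int cr_client_id_t;
static int cs_depth = 0;   /* illustration only: current nesting depth */
static cr_client_id_t cr_init(void) { return 0; }
static int cr_enter_cs(cr_client_id_t id) { (void)id; cs_depth++; return 0; }
static int cr_leave_cs(cr_client_id_t id) { (void)id; cs_depth--; return 0; }
#endif

static int calls_completed = 0;

/* Hypothetical placeholder for the long-running work in the question. */
static void my_atomic_func(void) { calls_completed++; }

/* Runs my_atomic_func() `iterations` times, wrapping each call in a
 * critical section. Returns the number of calls that completed. */
int run_protected_loop(int iterations)
{
    cr_client_id_t id = cr_init();   /* register this thread with BLCR */
    if (id < 0) {
        fprintf(stderr, "cr_init failed\n");
        return 0;
    }

    int done = 0;
    for (int i = 0; i < iterations; i++) {
        if (cr_enter_cs(id) != 0)    /* could not enter: stop early */
            break;
        my_atomic_func();            /* checkpoint should be deferred here */
        cr_leave_cs(id);             /* a pending checkpoint may proceed now */
        done++;
    }
    return done;
}
```

In the original program, the loop body between MPI_Init() and MPI_Finalize() would take the place of run_protected_loop(). Keeping the critical section around a single call, rather than the whole loop, leaves a window between iterations in which a requested checkpoint can be taken.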

> + How does ompi-checkpoint operate to checkpoint MPI threads? 

We depend on the Checkpoint/Restart Service (e.g., BLCR) to capture the whole 
process image including all threads. So BLCR will capture the state of all 
threads when we take the process checkpoint.

-- Josh

> 
> Regards,
> Nguyen Toan
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users




Re: [OMPI users] How to checkpoint atomic function in OpenMPI

2010-07-22 Thread Nguyen Toan
Dear Josh,
I hope to see this new API soon. Anyway, I will try these critical section
functions in BLCR. Thank you for the support.

Best Regards,
Nguyen Toan
