Dear Josh
First of all, thank you for your continued attention to this issue.
As for the problem: even though I followed your suggestion quoted below,
the checkpoint did not work.
So append this value to your $HOME/.openmpi/mca-params.conf file
#-
mca_base_param_file_prefix=ft-enable-cr
#-
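For reference, a minimal Python sketch of one way to make that append idempotent, so that repeated attempts do not leave duplicate entries in the file. The helper name `ensure_param_line` is hypothetical and not part of Open MPI; the only thing taken from this thread is the parameter line itself.

```python
from pathlib import Path

# The line suggested in this thread; everything else here is illustrative.
PARAM_LINE = "mca_base_param_file_prefix=ft-enable-cr"


def ensure_param_line(conf_path: Path, line: str = PARAM_LINE) -> bool:
    """Append `line` to `conf_path` unless an identical line already exists.

    Returns True if the file was modified, False if it was left untouched.
    (Hypothetical helper, not an Open MPI tool.)
    """
    conf_path.parent.mkdir(parents=True, exist_ok=True)
    text = conf_path.read_text() if conf_path.exists() else ""
    if any(entry.strip() == line for entry in text.splitlines()):
        return False
    # Start on a fresh line if the existing file lacks a trailing newline.
    separator = "" if text == "" or text.endswith("\n") else "\n"
    with conf_path.open("a") as handle:
        handle.write(separator + line + "\n")
    return True
```

It could be called as `ensure_param_line(Path.home() / ".openmpi" / "mca-params.conf")`. As noted in the quoted message, the setting only reaches every node if $HOME is mounted on all of the machines.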
Sincerely
Thomas
On Mon, Jan 11, 2010 at 2:21 PM, Josh Hursey wrote:
> (Sorry for the delay in replying. I am still sorting through a backlog of
> holiday email buildup).
>
>
> On Dec 10, 2009, at 7:32 PM, Chang IL Yoon wrote:
>
>> Dear Josh.
>>
>> Thank you for your continued attention to this problem.
>>
>>
>> On Wed, Dec 9, 2009 at 8:40 AM, Josh Hursey
>> wrote:
>>
>> On Dec 3, 2009, at 2:01 PM, Chang IL Yoon wrote:
>>
>> Dear Josh and Paul.
>>
>> First of all, thank you very much for your interest in my problem.
>>
>> 1) I tested it again with MPIRUN_CMD as 'mpirun -am ft-enable-cr -np %N
>> %P'
>> But the checkpoint did not work.
>>
>> Is it giving the same error?
>>
>> Can you send me information on how you configured Open MPI on your system?
>>
>> Yes, it gives the same error.
>>
>> When I installed open-mpi-1.3.3, I used the following
>> configuration.
>>
>> ./configure --enable-ft-thread --with-ft=cr --enable-mpi-threads
>> --with-blcr={BLCR_DIR} --with-blcr-libdir={BLCR_LIBDIR}
>> --prefix={OPENMPI_DIR}
>>
>> What kind of configuration information do you need?
>>
>
> This looks fine to me.
>
>
>
>> 2) Here is more information on my MPI configuration.
>> - What version of Open MPI are you using?
>> >> I am using Open-MPI ver 1.3.3 with BLCR ver 0.8.2
>>
>> - How did you configure Open MPI?
>> >> ./configure --enable-ft-thread --with-ft=cr --enable-mpi-threads
>> --with-blcr={BLCR_DIR} --with-blcr-libdir={BLCR_LIBDIR}
>> --prefix={OPENMPI_DIR}
>>
>> - What arguments are being passed to 'mpirun' when running with GASNet?
>> >> mpirun -am ft-enable-cr --machinefile ./machinefile -np 1 ./personal
>>
>> The '-np 1' argument is a bit puzzling to me; don't you normally want this
>> to be >1? GASNet does not use any MPI dynamic process management interfaces
>> (e.g., MPI_Comm_spawn), does it?
>>
>> Sorry, actually I do not know whether GASNet uses MPI dynamic process
>> management or not.
>>
>>
> It probably does not (not many applications do), but it could be a problem
> if they do.
>
>
>
>> >> personal is the same program as my-app.c, except that it uses
>> gasnet_init() and gasnet_exit() instead of MPI_Init() and MPI_Finalize().
>> >> my-app.c is in http://osl.iu.edu/research/ft/ompi-cr/examples.php.
>> >> gasnet_init() and gasnet_exit() use MPI_Init() and MPI_Finalize().
>>
>> So you are using the program from the SELF checkpoint example? If Open MPI
>> detects that the application has the appropriate function callbacks to use
>> the SELF CRS (which this example does) then it will -not- use the BLCR
>> component, but instead select the SELF component.
>>
>> Try using a simple counting program instead of that particular example.
>> You could also just remove the opal_crs_self_user_* and my_personal_*
>> functions from the example program to reduce it to one.
>>
>> I'm not sure why the checkpoint would not work even with the SELF CRS.
>> I'll have to check on that.
>>
>> Even though I used a simple counting program, the checkpoint did not
>> work.
>>
>
> Hmm... Everything seems to be set up correctly, and the application is
> still behaving like it is not getting the '-am ft-enable-cr' parameter. The
> only other thing I can think of to try is to set this value in the
> $HOME/.openmpi/mca-params.conf file. It looks a bit different but if you add
> the following line it should work (as long as $HOME is mounted on all of the
> machines).
>
> So append this value to your $HOME/.openmpi/mca-params.conf file and see if
> that helps.
> #-
> mca_base_param_file_prefix=ft-enable-cr
> #-
>
> If that doesn't work, I'll have to think a bit more about what might be
> going wrong here.
>
> -- Josh
>
>
>
>> - Do you have any environment variables/MCA parameters set for Open MPI?
>> >> yes
>> $HOME/.openmpi/mca-params.conf
>> # Local snapshot directory (not used in this scenario)
>> crs_base_snapshot_dir=${HOME}/temp
>>
>> # Remote snapshot directory (globally mounted file system)
>> snapc_base_global_snapshot_dir=${HOME}/checkpoints
>>
>> - My network interconnect is InfiniBand/OpenIB (IP over IB).
>>
>> These all look fine to me.
>>
>>
>>
>> 3) If there is anything I can do to help solve this problem, please let me
>> know without any hesitation.
>>
>> Thank you again for reading.
>>
>> Sincerely
>>
>>
>> On Tue, Dec 1, 2009 at 1:49 PM, Paul H. Hargrove
>> wrote:
>> Thomas,
>>
>> In connection with Josh's question about mpirun arguments, I suggest you
>> try setting
>> MPIRUN_CMD='mpirun -am ft-enable-cr -np %N %P %A'
>> in your environment before launching the GASNet application. This will
>> instruct GASNet's wrapper around mp