On Feb 10, 2010, at 9:45 AM, Addepalli, Srirangam V wrote:
> I am trying to test orte-checkpoint with an MPI job. However, it hangs for
> all jobs. This is how the job is started:
> mpirun -np 8 -mca ft-enable cr /apps/nwchem-5.1.1/bin/LINUX64/nwchem
> siosi6.nw
This might be the problem, if it wasn't a typo. The command line flag is "-am
ft-enable-cr" not "-mca ft-enable cr". The former activates a set of MCA
parameters (in the AMCA file 'ft-enable-cr'). The latter should be ignored by
the MCA system.
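For reference, the corrected launch line would look roughly like the dry run
below. The paths, input file and PID are copied from your message; the shell
variable names are only for illustration, and the commands are printed rather
than executed so you can check them first.

```shell
# Dry run: print the corrected commands instead of executing them.
# Path and input file are taken from the original report.
NWCHEM=/apps/nwchem-5.1.1/bin/LINUX64/nwchem
INPUT=siosi6.nw

# "-am ft-enable-cr" loads the AMCA parameter file named ft-enable-cr.
LAUNCH="mpirun -np 8 -am ft-enable-cr $NWCHEM $INPUT"
echo "$LAUNCH"

# Once the job is running, checkpoint it from another terminal using the
# PID of mpirun (9338 in the original report; it will differ on each run).
MPIRUN_PID=9338
CHECKPOINT="ompi-checkpoint -v --term $MPIRUN_PID"
echo "$CHECKPOINT"
```

Drop the echo lines and run the two printed commands directly once they look
right for your setup.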
Give that a try and let us know if the behavior changes.
-- Josh
> From another terminal I try orte-checkpoint:
>
> ompi-checkpoint -v --term 9338
> [compute-19-12.local:09377] orte_checkpoint: Checkpointing...
> [compute-19-12.local:09377] PID 9338
> [compute-19-12.local:09377] Connected to Mpirun [[5009,0],0]
> [compute-19-12.local:09377] Terminating after checkpoint
> [compute-19-12.local:09377] orte_checkpoint: notify_hnp: Contact Head Node
> Process PID 9338
> [compute-19-12.local:09377] orte_checkpoint: notify_hnp: Requested a
> checkpoint of jobid [INVALID]
> [compute-19-12.local:09377] orte_checkpoint: hnp_receiver: Receive a command
> message.
> [compute-19-12.local:09377] orte_checkpoint: hnp_receiver: Status Update.
>
>
> Is there any way to debug the issue to get more information or log messages?
>
> Rangam