I am new to MPI and am trying to use it with parallel HDF5. I set up a
cluster of 2 nodes with GFS2 and DRBD, and in my shared folder I compiled
the example provided at:
http://www.hdfgroup.org/HDF5/Tutor/pcrtaccd.html


When I try to run it in the shared folder, this error occurs:

Fatal error in PMPI_Comm_dup: Other MPI error, error stack:
PMPI_Comm_dup(176)............: MPI_Comm_dup(MPI_COMM_WORLD, new_comm=0xff9a56f8) failed
PMPI_Comm_dup(161)............:
MPIR_Comm_dup_impl(55)........:
MPIR_Comm_copy(967)...........:
MPIR_Get_contextid(521).......:
MPIR_Get_contextid_sparse(683):
MPIR_Allreduce_impl(712)......:
MPIR_Allreduce_intra(357).....:
dequeue_and_set_error(596)....: Communication error with rank 0

HDF5: infinite loop closing library
D,T,AC,FD,P,FD,P,FD,P,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD
Fatal error in PMPI_Comm_dup: Other MPI error, error stack:
PMPI_Comm_dup(176)............: MPI_Comm_dup(MPI_COMM_WORLD, new_comm=0xbf940f28) failed
PMPI_Comm_dup(161)............:
MPIR_Comm_dup_impl(55)........:
MPIR_Comm_copy(967)...........:
MPIR_Get_contextid(521).......:
MPIR_Get_contextid_sparse(683):
MPIR_Allreduce_impl(712)......:
MPIR_Allreduce_intra(357).....:
dequeue_and_set_error(596)....: Communication error with rank 1

HDF5: infinite loop closing library
D,T,AC,FD,P,FD,P,FD,P,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD



When I run the code on a single node it works well, and the Hello World
example also runs fine on my cluster.


Could anyone help me figure out the origin of the problem?

Thank you.