I am new to MPI and am trying to run it with parallel HDF5. I set up a cluster of 2 nodes with GFS2 and DRBD, and in my shared folder I compiled the example provided at: http://www.hdfgroup.org/HDF5/Tutor/pcrtaccd.html
When I try to run it in the shared folder, this error occurs:

Fatal error in PMPI_Comm_dup: Other MPI error, error stack:
PMPI_Comm_dup(176)............: MPI_Comm_dup(MPI_COMM_WORLD, new_comm=0xff9a56f8) failed
PMPI_Comm_dup(161)............:
MPIR_Comm_dup_impl(55)........:
MPIR_Comm_copy(967)...........:
MPIR_Get_contextid(521).......:
MPIR_Get_contextid_sparse(683):
MPIR_Allreduce_impl(712)......:
MPIR_Allreduce_intra(357).....:
dequeue_and_set_error(596)....: Communication error with rank 0
HDF5: infinite loop closing library
D,T,AC,FD,P,FD,P,FD,P,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD

Fatal error in PMPI_Comm_dup: Other MPI error, error stack:
PMPI_Comm_dup(176)............: MPI_Comm_dup(MPI_COMM_WORLD, new_comm=0xbf940f28) failed
PMPI_Comm_dup(161)............:
MPIR_Comm_dup_impl(55)........:
MPIR_Comm_copy(967)...........:
MPIR_Get_contextid(521).......:
MPIR_Get_contextid_sparse(683):
MPIR_Allreduce_impl(712)......:
MPIR_Allreduce_intra(357).....:
dequeue_and_set_error(596)....: Communication error with rank 1
HDF5: infinite loop closing library
D,T,AC,FD,P,FD,P,FD,P,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD,FD

When I run the code on a single node it works well, and the Hello World example also runs correctly on my cluster. Could anyone help me figure out the origin of the problem?

Thank you.
_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
