xuji wrote:
>     Hi all:
>
>     I wrote an e-mail several days ago about continuing a run in GROMACS 4.0
> with a checkpoint file, but I still cannot solve this problem.
>     I ran a simulation on 4 nodes with
>       mpiexec -machinefile ./mf_24 -np 24 mdrun -v -append -cpt 5 -cpi 
> dppc_md_prev.cpt -cpo dppc_md.cpt -s dppc_md.tpr -o dppc_md.trr -c 
> dppc_md.gro -g dppc_md.log -e dppc_md.edr 
>     But when I try to continue the simulation with
>       mpiexec -machinefile ./mf_24 -np 24 mdrun -v -append -cpt 5 -cpi 
> dppc_md.cpt -cpo dppc_md_2.cpt -s dppc_md.tpr -o dppc_md.trr -c dppc_md.gro 
> -g dppc_md.log -e dppc_md.edr 
>     or with
>       mpiexec -machinefile ./mf_24 -np 24 mdrun -v -append -cpt 5 -cpi 
> dppc_md_prev.cpt -cpo dppc_md_2.cpt -s dppc_md.tpr -o dppc_md.trr -c 
> dppc_md.gro -g dppc_md.log -e dppc_md.edr 
>     (there are two checkpoint files in the simulation directory, so I tried
> both of them), I always get the following errors:
>  
>     Reading checkpoint file dppc_md_prev.cpt generated: Fri Mar 20 08:53:47 
> 2009
>     or
>     Reading checkpoint file dppc_md.cpt generated: Fri Mar 20 08:58:08 2009 
>  
>     Loaded with Money
>     Fatal error in MPI_Bcast:
>     Message truncated, error stack:
>     MPI_Bcast(1145)...................: MPI_Bcast(buf=0x7fffc33242dc, 
> count=4, MPI_BYTE, root=0, MPI_COMM_WORLD) failed
>     MPIR_Bcast(229)...................: 
>     MPIDI_CH3U_Receive_data_found(254): Message from rank 0 and tag 2 
> truncated; 12 bytes received but buffer size is 4
>     Fatal error in MPI_Bcast:
>     Message truncated, error stack:
>     MPI_Bcast(1145)...................: MPI_Bcast(buf=0x7fff6c0da09c, 
> count=4, MPI_BYTE, root=0, MPI_COMM_WORLD) failed
>     MPIR_Bcast(229)...................: 
>     MPIDI_CH3U_Receive_data_found(254): Message from rank 4 and tag 2 
> truncated; 12 bytes received but buffer size is 4
>     Fatal error in MPI_Bcast:
>     Message truncated, error stack:
>     MPI_Bcast(1145)...................: MPI_Bcast(buf=0x7fff9ac2ebec, 
> count=4, MPI_BYTE, root=0, MPI_COMM_WORLD) failed
>     MPIR_Bcast(229)...................: 
>     MPIDI_CH3U_Receive_data_found(254): Message from rank 0 and tag 2 
> truncated; 12 bytes received but buffer size is 4
>     rank 16 in job 5  Node115_33001   caused collective abort of all ranks
>       exit status of rank 16: killed by signal 9 
>     rank 8 in job 5  Node115_33001   caused collective abort of all ranks
>       exit status of rank 8: killed by signal 9 
>     rank 6 in job 5  Node115_33001   caused collective abort of all ranks
>       exit status of rank 6: killed by signal 9 
>     
>     Can someone help me with this problem? I'd appreciate any help in advance!

This probably isn't intrinsically related to GROMACS. Most likely something
has changed in the way your MPI is configured between your earlier and later
runs. To diagnose it, simplify the problem to fewer than 24(?) processors,
see whether you can re-run the earlier calculation now, and/or test that
other simple MPI programs still work.
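
For the last point, a minimal MPI_Bcast test along the following lines (the
file name bcast_test.c is just illustrative, not part of GROMACS) should tell
you whether the MPI installation itself is healthy on the nodes listed in
./mf_24:

/* bcast_test.c -- minimal MPI_Bcast sanity check.
 * Rank 0 broadcasts a small value; every rank prints what it received.
 * Build and run, e.g.:
 *   mpicc bcast_test.c -o bcast_test
 *   mpiexec -machinefile ./mf_24 -np 24 ./bcast_test
 */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, nprocs, value = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    if (rank == 0)
        value = 42;                     /* data only rank 0 knows initially */

    /* Same collective that fails in the mdrun output above */
    MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);

    printf("rank %d of %d received %d\n", rank, nprocs, value);

    MPI_Finalize();
    return 0;
}

If this also aborts, the problem is in the MPI setup rather than in mdrun or
the checkpoint files; if it runs cleanly on all 24 processes, check whether
mdrun was built against the same MPI library that mpiexec is launching with.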

Mark