On 14/05/2012 4:18 PM, Anirban wrote:


On Mon, May 14, 2012 at 11:35 AM, Mark Abraham <mark.abra...@anu.edu.au> wrote:

    On 14/05/2012 3:52 PM, Anirban wrote:
    Hi ALL,

    I am trying to simulate a membrane protein system using the CHARMM36
    force field in GROMACS 4.5.5 on an MPI-based parallel cluster. The
    system consists of around 117,000 atoms. The job runs fine on 5
    nodes (5X12=120 cores) using mpirun and gives proper output, but
    whenever I submit it on more than 5 nodes, the job gets killed
    with the following error:

    That's likely to be an issue with the configuration of your MPI
    system, or your hardware, or both. Do check your .log file for
    evidence of an unsuitable DD partition, though the fact that mdrun
    reports "turning on dynamic load balancing" suggests DD partitioning
    worked OK.
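A minimal Python sketch of that log check, assuming the log is a plain-text file such as md.log (mdrun's default log name). The search patterns are illustrative assumptions: phrases visible in this thread's log excerpt plus generic NOTE/WARNING/Fatal error markers, not an official list of GROMACS messages.

```python
import re

# Illustrative patterns (assumptions): DD-related phrases seen in this thread's
# log excerpt plus generic NOTE / WARNING / Fatal error markers.
DD_PATTERN = re.compile(
    r"[Dd]omain decomposition|dynamic load balancing|NOTE|WARNING|Fatal error"
)


def dd_lines(log_lines):
    """Return (line_number, text) pairs for log lines mentioning DD setup or warnings."""
    return [
        (num, line.rstrip())
        for num, line in enumerate(log_lines, start=1)
        if DD_PATTERN.search(line)
    ]
```

Usage: `with open("md.log") as fh: print(dd_lines(fh))`. An empty result, or only the grid and load-balancing lines, would match what Anirban reports below: no DD warnings before the log stops.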

    Mark


Hello Mark,

Thanks for the reply.
The .log file reports no error/warning and ends abruptly with the following last lines:

That's most consistent with a problem external to GROMACS.

Mark


------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Making 3D domain decomposition grid 4 x 3 x 9, home cell index 0 0 0

Center of mass motion removal mode is Linear
We have the following groups for center of mass motion removal:
  0:  Protein_POPC
  1:  SOL_CL
There are: 117548 Atoms
Charge group distribution at step 0: 358 353 443 966 1106 746 374 351 352 352 358 454 975 1080 882 381 356 357 357 358 375 770 1101 882 365 359 358 351 348 487 983 1051 912 377 344 361 363 352 596 1051 1036 1050 553 351 349 366 352 375 912 1125 1045 478 351 344 356 362 445 971 1040 959 520 405 355 357 355 639 1032 1072 1096 790 474 353 349 345 449 1019 1047 971 444 354 357 355 357 391 946 1093 904 375 367 368 349 349 409 934 1082 867 406 350 350 364 341 398 978 1104 937 415 341 368
Grid: 6 x 7 x 4 cells
Initial temperature: 300.318 K

Started mdrun on node 0 Fri May 11 20:43:52 2012

           Step           Time         Lambda
              0        0.00000        0.00000

   Energies (kJ/mol)
            U-B    Proper Dih.  Improper Dih.      CMAP Dih.          LJ-14
    8.67972e+04    6.15820e+04    1.38445e+03   -1.60452e+03    1.44395e+04
     Coulomb-14        LJ (SR)   Coulomb (SR)   Coul. recip.      Potential
   -5.21377e+04    4.98413e+04   -1.21372e+06   -8.94296e+04   -1.14284e+06
    Kinetic En.   Total Energy    Temperature Pressure (bar)   Constr. rmsd
    2.93549e+05   -8.49294e+05    3.00132e+02   -1.80180e+01    1.40708e-05
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
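The DD grid line and the "Charge group distribution" line in this excerpt can be cross-checked with a short Python sketch. It relies only on the two line formats shown here. The function names and the max/mean ratio used as an imbalance measure are illustrative choices, not GROMACS output. For this excerpt, the grid is 4 x 3 x 9 = 108 domains, and there are exactly 108 counts, ranging from 341 to 1125.

```python
import re


def parse_dd_grid(line):
    """Extract (nx, ny, nz) from a 'Making 3D domain decomposition grid ...' line."""
    match = re.search(r"grid (\d+) x (\d+) x (\d+)", line)
    if match is None:
        raise ValueError("no domain decomposition grid in line")
    return tuple(int(g) for g in match.groups())


def charge_group_counts(line):
    """Parse the per-domain counts after the colon of a 'Charge group distribution' line."""
    return [int(tok) for tok in line.split(":", 1)[1].split()]


def check_distribution(dd_line, cg_line):
    """Verify one count per DD domain; return (n_domains, min, max, max/mean ratio)."""
    nx, ny, nz = parse_dd_grid(dd_line)
    counts = charge_group_counts(cg_line)
    n_domains = nx * ny * nz
    if len(counts) != n_domains:
        raise ValueError(f"expected {n_domains} counts, found {len(counts)}")
    mean = sum(counts) / len(counts)
    return n_domains, min(counts), max(counts), max(counts) / mean
```

A count list whose length matches the grid, with no empty domains, is consistent with Mark's reading that the DD setup itself was accepted.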

Any suggestion is welcome.

Thanks,

Anirban



    
    -------------------------------------------------------------------------------------

    starting mdrun 'Protein'
    50000000 steps, 100000.0 ps.

    NOTE: Turning on dynamic load balancing

    Fatal error in MPI_Sendrecv: Other MPI error
    Fatal error in MPI_Sendrecv: Other MPI error
    Fatal error in MPI_Sendrecv: Other MPI error

    
    =====================================================================================
    =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
    =   EXIT CODE: 256
    =   CLEANING UP REMAINING PROCESSES
    =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
    =====================================================================================
    [proxy:0:0@cn034] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:906): assert (!closed) failed
    [proxy:0:0@cn034] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
    [proxy:0:0@cn034] main (./pm/pmiserv/pmip.c:214): demux engine error waiting for event
    .
    .
    .
    
    -------------------------------------------------------------------------------------

    Why is this happening? Is it related to DD and PME? How can I solve
    it? Any suggestion is welcome.
    Sorry for re-posting.
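On the DD/PME question, the DD grid in the log (4 x 3 x 9) accounts for 108 particle-particle (PP) ranks. The Python sketch below does the rank bookkeeping under two simplifying assumptions: one MPI rank per core, and every rank beyond the DD grid running as a separate PME-only rank (mdrun's -npme option sets that number explicitly). The function name is hypothetical. Node and core counts are inputs because the thread does not give them for the failing run.

```python
def rank_split(nodes, cores_per_node, dd_grid):
    """Split MPI ranks into PP (domain-decomposition) ranks and PME-only ranks.

    Assumes one MPI rank per core and that every rank not in the DD grid is a
    PME-only rank; both are simplifying assumptions for illustration.
    """
    total = nodes * cores_per_node
    nx, ny, nz = dd_grid
    pp = nx * ny * nz
    if pp > total:
        raise ValueError(f"DD grid needs {pp} ranks but only {total} are available")
    return total, pp, total - pp
```

For example, a hypothetical run of 12 nodes with 12 cores each gives `rank_split(12, 12, (4, 3, 9))` = `(144, 108, 36)`. Comparing such numbers with the .log header shows how mdrun divided the job. An MPI_Sendrecv failure that appears only above a certain node count would still point to the MPI/interconnect layer rather than the DD/PME split.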


    Thanks and regards,

    Anirban






    --
    gmx-users mailing list gmx-users@gromacs.org
    http://lists.gromacs.org/mailman/listinfo/gmx-users
    Please search the archive at
    http://www.gromacs.org/Support/Mailing_Lists/Search before posting!
    Please don't post (un)subscribe requests to the list. Use the
    www interface or send it to gmx-users-requ...@gromacs.org.
    Can't post? Read http://www.gromacs.org/Support/Mailing_Lists




