I've run gromacs v4.6.3 on BG/Q without problem and my colleagues have run older versions of gromacs on the BG/L also without problem. (No BG/P experience here, unfortunately). Still, it's worth having you post your job submission script for us to take a look at.
Also, is it just gromacs 4.6.2/4.6.3 that are problematic, or does, for example, 4.0.7 work OK for you on the BG/P?

Chris.

________________________________________
From: gromacs.org_gmx-users-boun...@maillist.sys.kth.se <gromacs.org_gmx-users-boun...@maillist.sys.kth.se> on behalf of arrow50311 <linxingcheng50...@gmail.com>
Sent: 12 March 2014 17:37
To: gmx-us...@gromacs.org
Subject: Re: [gmx-users] Assistance needed running gromacs 4.6.3 on Blue Gene/P

Is there any follow-up to this question? I have run into exactly the same problem on Blue Gene/P. Could anyone offer any help?

Thank you,

Prentice Bisbal wrote
> Mark,
>
> Since I was working with 4.6.2, I built 4.6.3 to see if this was the
> result of a bug in 4.6.2. It isn't; I get the same error with 4.6.3, but
> that is the version I'll be working with from now on, since it's the
> latest. Since the problem occurs with both versions, we might as well try
> to fix it in the latest version, right?
>
> I compiled 4.6.3 with the following options to include debugging
> information:
>
> cmake .. \
>   -DCMAKE_TOOLCHAIN_FILE=../cmake/Platform/BlueGeneP-static-XL-C.cmake \
>   -DBUILD_SHARED_LIBS=OFF \
>   -DGMX_MPI=ON \
>   -DCMAKE_C_FLAGS="-O0 -g -qstrict -qarch=450 -qtune=450" \
>   -DCMAKE_INSTALL_PREFIX=/scratch/bgapps/gromacs-4.6.3 \
>   -DGMX_CPU_ACCELERATION=None \
>   -DGMX_THREAD_MPI=OFF \
>   -DGMX_OPENMP=OFF \
>   -DGMX_DEFAULT_SUFFIX=ON \
>   -DCMAKE_PREFIX_PATH=/scratch/bgapps/fftw-3.3.2 \
>   2>&1 | tee cmake.log
>
> For -qarch, I removed the 'd' from the end so that the double FPU isn't
> used, which can cause problems if the data isn't aligned correctly.
> -qstrict makes sure certain optimizations aren't performed. It should be
> superfluous at optimization levels below 3, but I threw it in just to be
> safe, and set -O0. (Of course, I think -g turns off all optimizations
> anyway.)
>
> On the BG/P, I had to install FFTW3 separately, and that wasn't
> installed with debugging active, so there are no symbols for FFTW.
>
> One of my coworkers wrote a script that converts BG/P core files to
> stack traces. In all the kernels I've looked at so far (9 out of 64),
> the stack ends at a vfprintf call. For example:
>
> -------------------------------------------------------------
>
> /bgsys/drivers/V1R4M2_200_2010-100508P/ppc/toolchain/gnu/glibc-2.4/stdio-common/vfprintf.c:1819
> /bgsys/drivers/V1R4M2_200_2010-100508P/ppc/toolchain/gnu/glibc-2.4/resolv/res_init.c:414
> /bgsys/drivers/V1R4M2_200_2010-100508P/ppc/toolchain/gnu/glibc-2.4/libio/wgenops.c:419
> /scratch/pbisbal/build/gromacs-4.6.3/src/gmxlib/nonbonded/nb_kernel_c/nb_kernel_ElecRFCut_VdwBhamSh_GeomW4P1_c.c:673
> ??:0
> /bghome/bgbuild/V1R4M2_200_2010-100508P/ppc/bgp/comm/sys/dcmf/../ccmi/executor/Broadcast.h:83
> /bghome/bgbuild/V1R4M2_200_2010-100508P/ppc/bgp/comm/lib/dev/mpich2/src/mpid/dcmfd/src/coll/reduce/reduce_algorithms.c:69
> /bghome/bgbuild/V1R4M2_200_2010-100508P/ppc/bgp/comm/lib/dev/mpich2/src/mpid/dcmfd/src/coll/bcast/bcast_algorithms.c:227
> /scratch/pbisbal/build/gromacs-4.6.3/src/mdlib/nbnxn_atomdata.c:779
> /scratch/pbisbal/build/gromacs-4.6.3/src/mdlib/nbnxn_atomdata.c:762
> /scratch/pbisbal/build/gromacs-4.6.3/src/mdlib/nbnxn_atomdata.c:374
> /scratch/pbisbal/build/gromacs-4.6.3/src/mdlib/calcmu.c:88
> /scratch/pbisbal/build/gromacs-4.6.3/src/kernel/mdrun.c:113
> /scratch/pbisbal/build/gromacs-4.6.3/src/kernel/runner.c:1492
> /scratch/pbisbal/build/gromacs-4.6.3/src/kernel/genalg.c:467
> /scratch/pbisbal/build/gromacs-4.6.3/src/kernel/calc_verletbuf.c:266
> ../stdio-common/printf_fphex.c:335
> ../stdio-common/printf_fphex.c:452
> ??:0
> /bgsys/drivers/V1R4M2_200_2010-100508P/ppc/toolchain/gnu/glibc-2.4/stdio-common/vfprintf.c:1819
> /bgsys/drivers/V1R4M2_200_2010-100508P/ppc/toolchain/gnu/glibc-2.4/stdio-common/vfprintf.c:1819
> /bgsys/drivers/V1R4M2_200_2010-100508P/ppc/toolchain/gnu/glibc-2.4/stdio-common/vfprintf.c:1819
>
> -----------------------------------------------------------------
>
> Another node with a different stack looks like this:
>
> ---------------------------------------------------------------
>
> /bgsys/drivers/V1R4M2_200_2010-100508P/ppc/toolchain/gnu/glibc-2.4/stdio-common/vfprintf.c:1819
> /bgsys/drivers/V1R4M2_200_2010-100508P/ppc/toolchain/gnu/glibc-2.4/libio/genops.c:982
> /bgsys/drivers/V1R4M2_200_2010-100508P/ppc/toolchain/gnu/glibc-2.4/string/memcpy.c:159
> /scratch/pbisbal/build/gromacs-4.6.3/src/mdlib/ns.c:423
> /scratch/pbisbal/build/gromacs-4.6.3/src/kernel/runner.c:1646
> /scratch/pbisbal/build/gromacs-4.6.3/src/kernel/genalg.c:467
> /scratch/pbisbal/build/gromacs-4.6.3/src/kernel/calc_verletbuf.c:266
> ../stdio-common/printf_fphex.c:335
> ../stdio-common/printf_fphex.c:452
> ??:0
> /bgsys/drivers/V1R4M2_200_2010-100508P/ppc/toolchain/gnu/glibc-2.4/stdio-common/vfprintf.c:1819
> /bgsys/drivers/V1R4M2_200_2010-100508P/ppc/toolchain/gnu/glibc-2.4/stdio-common/vfprintf.c:1819
> /bgsys/drivers/V1R4M2_200_2010-100508P/ppc/toolchain/gnu/glibc-2.4/stdio-common/vfprintf.c:1819
>
> ---------------------------------------------------------------
>
> All the stacks look like one of these two.
>
> Is any of this information useful? My coworker, who has a lot of
> experience developing for Blue Gene/Ps, says this looks like an I/O
> problem, but he doesn't have the time to dig into the Gromacs source
> code for us. I'm willing to do some digging, but some guidance from
> someone who knows the code well would be very helpful.
>
> Prentice
>
> On 08/06/2013 08:19 PM, Mark Abraham wrote:
>> That all looks fine so far. The core file processor won't help unless
>> you've compiled with -g. Hopefully cmake -DCMAKE_BUILD_TYPE=Debug will
>> do that, but I haven't actually checked that it really works. If not,
>> you might have to hack cmake/Platform/BlueGeneP-static-XL-C.cmake.
>>
>> Anyway, if you can compile with -g, then the core file will tell us in
>> what function it is dying, which might help locate the problem.
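[Editorial sketch, not part of the original thread.] One way to act on the observation that "all the stacks look like one of these two" is to tally frames across all 64 ranks' traces and see which crash sites recur. The helper below is an assumption-based sketch of that tallying step, not the coworker's actual script; it only assumes the addr2line-style `/path/file.c:NNN` frame format shown in the traces above:

```python
from collections import Counter

def parse_frames(trace_text):
    """Extract (file, line) frames from an addr2line-style stack trace.

    Assumes one frame per line, formatted /path/to/file.c:NNN as in the
    BG/P traces above; unresolved frames ('??:0') and separator lines
    are skipped.
    """
    frames = []
    for line in trace_text.splitlines():
        line = line.strip().lstrip('> ').strip()  # drop email quote markers
        if ':' not in line or line.startswith('??'):
            continue
        path, _, lineno = line.rpartition(':')
        if path and lineno.isdigit():
            frames.append((path, int(lineno)))
    return frames

def top_frames(traces):
    """Tally frames across many per-rank traces to spot a common crash site."""
    counts = Counter()
    for trace in traces:
        counts.update(set(parse_frames(trace)))  # count each frame once per rank
    return counts.most_common()
```

Feeding it the per-rank trace text would rank shared frames first; a frame present in nearly every rank (here, vfprintf.c:1819) marks the common failure path worth inspecting.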
>>
>> Mark
>>
>> On Tue, Aug 6, 2013 at 11:43 PM, Prentice Bisbal
>> <prentice.bisbal@> wrote:
>>> Dear GMX-users,
>>>
>>> I need some assistance running Gromacs 4.6.3 on a Blue Gene/P. Although
>>> I have a background in Chemistry, I'm an experienced professional HPC
>>> admin who's relatively new to supporting Blue Genes and Gromacs. My
>>> first Gromacs user is having trouble running Gromacs on our BG/P. His
>>> jobs die and dump core, with no obvious signs (not to me, at least) of
>>> where the problem lies.
>>>
>>> I compiled Gromacs 4.6.3 with the following options:
>>>
>>> ------------------------------------------snip-------------------------------------------
>>>
>>> cmake .. \
>>>   -DCMAKE_TOOLCHAIN_FILE=../cmake/Platform/BlueGeneP-static-XL-C.cmake \
>>>   -DBUILD_SHARED_LIBS=OFF \
>>>   -DGMX_MPI=ON \
>>>   -DCMAKE_C_FLAGS="-O3 -qarch=450d -qtune=450" \
>>>   -DCMAKE_INSTALL_PREFIX=/scratch/bgapps/gromacs-4.6.2 \
>>>   -DGMX_CPU_ACCELERATION=None \
>>>   -DGMX_THREAD_MPI=OFF \
>>>   -DGMX_OPENMP=OFF \
>>>   -DGMX_DEFAULT_SUFFIX=ON \
>>>   -DCMAKE_PREFIX_PATH=/scratch/bgapps/fftw-3.3.2 \
>>>   2>&1 | tee cmake.log
>>>
>>> ------------------------------------------snip-------------------------------------------
>>>
>>> When one of my users submits a job, it dumps core. My scheduler is
>>> LoadLeveler, and I used this JCF file to replicate the problem.
>>> I added the '-debug 1' flag after searching the gmx-users archives:
>>>
>>> ------------------------------------------snip-------------------------------------------
>>>
>>> #!/bin/bash
>>> # @ job_name = xiang
>>> # @ job_type = bluegene
>>> # @ bg_size = 64
>>> # @ class = small
>>> # @ wall_clock_limit = 01:00:00,00:50:00
>>> # @ error = job.$(Cluster).$(Process).err
>>> # @ output = job.$(Cluster).$(Process).out
>>> # @ environment = COPY_ALL;
>>> # @ queue
>>>
>>> source /scratch/bgapps/gromacs-4.6.2/bin/GMXRC.bash
>>>
>>> /bgsys/drivers/ppcfloor/bin/mpirun /scratch/bgapps/gromacs-4.6.2/bin/mdrun_mpi -pin off -deffnm sbm-b_dyn3 -v -dlb yes -debug 1
>>>
>>> ------------------------------------------snip-------------------------------------------
>>>
>>> The stderr file shows this at the bottom, which isn't too helpful:
>>>
>>> ------------------------------------------snip-------------------------------------------
>>>
>>> Reading file sbm-b_dyn3.tpr, VERSION 4.6.2 (single precision)
>>>
>>> Will use 48 particle-particle and 16 PME only nodes
>>> This is a guess, check the performance at the end of the log file
>>> Using 64 MPI processes
>>> <Aug 06 17:25:55.303879> BE_MPI (ERROR): The error message in the job record is as follows:
>>> <Aug 06 17:25:55.303940> BE_MPI (ERROR): "killed with signal 6"
>>>
>>> -----------------------------------------snip-----------------------------------------------
>>>
>>> I have a bunch of core files which I can analyze with the IBM Core file
>>> processor, and I also have a bunch of debug files from mdrun. I went
>>> through about 12 of the 64, and didn't see anything that looked like an
>>> error.
>>>
>>> Can anyone offer me any suggestions of what to look for, or additional
>>> debugging steps I can take? Please keep in mind I'm the system
>>> administrator and not an expert user of gromacs, so I'm not sure whether
>>> the inputs are correct, or correct for my BG/P configuration.
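[Editorial note, not part of the original thread.] One detail worth decoding in the stderr above: "killed with signal 6" means the process received SIGABRT, i.e. it aborted itself (for example via a failed assertion or glibc detecting heap corruption) rather than being killed externally, which fits the corruption-like stack traces later in the thread. The number-to-name mapping can be checked with a minimal Python sketch:

```python
import signal

# Map the numeric signal from the BE_MPI error message to its POSIX name.
# On Linux-like systems, signal 6 is SIGABRT (raised by abort()).
sig = signal.Signals(6)
print(sig.name)  # prints "SIGABRT"
```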
>>> Any help will be greatly appreciated.
>>>
>>> Thanks,
>>> Prentice
>>>
>>> --
>>> gmx-users mailing list    gmx-users@
>>> http://lists.gromacs.org/mailman/listinfo/gmx-users
>>> * Please search the archive at
>>> http://www.gromacs.org/Support/Mailing_Lists/Search before posting!
>>> * Please don't post (un)subscribe requests to the list. Use the www
>>> interface or send it to gmx-users-request@.
>>> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists

--
View this message in context: http://gromacs.5086.x6.nabble.com/Assistance-needed-running-gromacs-4-6-3-on-Blue-Gene-P-tp5010370p5015114.html
Sent from the GROMACS Users Forum mailing list archive at Nabble.com.

--
Gromacs Users mailing list

* Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before posting!
* Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
* For (un)subscribe requests visit https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or send a mail to gmx-users-requ...@gromacs.org.