Re: [OMPI users] "-bind-to numa" of openmpi-1.7.4rc1 doesn't work for our magny cours based 32 core node
Hi Ralph,

Here is the output when I put "-mca rmaps_base_verbose 10 --display-map" and where it stopped (shown by gdb), which shows it stopped in a lama function. I usually use PGI 13.10, so I tried changing to the GNU compiler, and then it works. Therefore, this problem depends on the compiler. That's all I could find today.

Regards,
Tetsuya Mishima

[mishima@manage ~]$ gdb
GNU gdb (GDB) CentOS (7.0.1-42.el5.centos.1)
(gdb) attach 14666
0x2b4c5c33 in rmaps_lama_prune_max_tree () at ./rmaps_lama_max_tree.c:814

[mishima@manage demos]$ mpirun -np 2 -mca rmaps lama -report-bindings -mca rmaps_base_verbose 10 --display-map myprog
[manage.cluster:21503] mca: base: components_register: registering rmaps components
[manage.cluster:21503] mca: base: components_register: found loaded component lama
[manage.cluster:21503] mca:rmaps:lama: Priority 0
[manage.cluster:21503] mca:rmaps:lama: Map   : NULL
[manage.cluster:21503] mca:rmaps:lama: Bind  : NULL
[manage.cluster:21503] mca:rmaps:lama: MPPR  : NULL
[manage.cluster:21503] mca:rmaps:lama: Order : NULL
[manage.cluster:21503] mca: base: components_register: component lama register function successful
[manage.cluster:21503] mca: base: components_open: opening rmaps components
[manage.cluster:21503] mca: base: components_open: found loaded component lama
[manage.cluster:21503] mca:rmaps:select: checking available component lama
[manage.cluster:21503] mca:rmaps:select: Querying component [lama]
[manage.cluster:21503] [[23940,0],0]: Final mapper priorities
[manage.cluster:21503]  Mapper: lama Priority: 0
[manage.cluster:21503] mca:rmaps: mapping job [23940,1]
[manage.cluster:21503] mca:rmaps: creating new map for job [23940,1]
[manage.cluster:21503] mca:rmaps: nprocs 2
[manage.cluster:21503] mca:rmaps:lama: Mapping job [23940,1]
[manage.cluster:21503] mca:rmaps:lama: Revised Parameters -
[manage.cluster:21503] mca:rmaps:lama: Map   : csbnh
[manage.cluster:21503] mca:rmaps:lama: Bind  : 1c
[manage.cluster:21503] mca:rmaps:lama: MPPR  : (null)
[manage.cluster:21503] mca:rmaps:lama: Order : s
[manage.cluster:21503] mca:rmaps:lama: -
[manage.cluster:21503] mca:rmaps:lama: - Binding  : [1c]
[manage.cluster:21503] mca:rmaps:lama: - Binding  : 1 x Core
[manage.cluster:21503] mca:rmaps:lama: -
[manage.cluster:21503] mca:rmaps:lama: - Mapping  : [csbnh]
[manage.cluster:21503] mca:rmaps:lama: - Mapping  : (0) Core (7 vs 0)
[manage.cluster:21503] mca:rmaps:lama: - Mapping  : (1) Socket (3 vs 1)
[manage.cluster:21503] mca:rmaps:lama: - Mapping  : (2) Board (1 vs 3)
[manage.cluster:21503] mca:rmaps:lama: - Mapping  : (3) Machine (0 vs 7)
[manage.cluster:21503] mca:rmaps:lama: - Mapping  : (4) Hw. Thread (8 vs 8)
[manage.cluster:21503] mca:rmaps:lama: -
[manage.cluster:21503] mca:rmaps:lama: - MPPR     : [(null)]
[manage.cluster:21503] mca:rmaps:lama: -
[manage.cluster:21503] mca:rmaps:lama: - Ordering : [s]
[manage.cluster:21503] mca:rmaps:lama: - Ordering : Sequential
[manage.cluster:21503] mca:rmaps:lama: -
[manage.cluster:21503] AVAILABLE NODES FOR MAPPING:
[manage.cluster:21503]     node: manage daemon: 0
[manage.cluster:21503] mca:rmaps:lama: -
[manage.cluster:21503] mca:rmaps:lama: - Building the Max Tree...
[manage.cluster:21503] mca:rmaps:lama: -
[manage.cluster:21503] mca:rmaps:lama: - Converting Remote Tree: manage

[mishima@manage demos]$ ompi_info | grep "C compiler family"
          C compiler family name: GNU

[mishima@manage demos]$ mpirun -np 2 -mca rmaps lama myprog
Hello world from process 0 of 2
Hello world from process 1 of 2

> On Dec 21, 2013, at 8:16 PM, tmish...@jcity.maeda.co.jp wrote:
>
> > Ralph, thanks. I'll try it on Tuesday.
> >
> > Let me confirm one thing. I don't put "-with-libevent" when I build openmpi.
> > Is there any possibility to build with external libevent automatically?
>
> No - only happens if you direct it
>
> > Tetsuya Mishima
> >
> >> Not entirely sure - add "-mca rmaps_base_verbose 10 --display-map" to your cmd line and let's see if it finishes the mapping.
> >>
> >> Unless you specifically built with an external libevent (which I doubt), there is no conflict. The connection issue is unlikely to be a factor here as it works when not using the lama mapper.
> >>
> >> On Dec 21, 2013, at 3:43 PM, tmish...@jcity.maeda.co.jp wrote:
> >>
> >>> Thank you, Ralph.
> >>>
> >>> Then, this problem should depend on our environment.
> >>> But, at least, the inversion problem is not the cause because
> >>> node05 has a normal hier order.
> >>>
> >>> I cannot connect to our cluster now. On Tuesday, going
> >>> back to my office, I'll send you a further report.
> >>>
> >>> Before that, please let
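For readers who want to reproduce the diagnosis above, the attach-and-backtrace step can be sketched as follows (the PID 14666 and the program name myprog are the examples from this thread, not fixed values):

```shell
# Launch the job that hangs with the lama mapper, with verbose mapping output
mpirun -np 2 -mca rmaps lama -mca rmaps_base_verbose 10 --display-map myprog &

# From another terminal: find the stuck process and attach gdb
# non-interactively to capture a backtrace, as was done above.
pidof mpirun            # e.g. 14666
gdb -p 14666 -batch -ex "bt"
```

With a hang rather than a crash, attaching gdb to the live process is usually the quickest way to see which function is spinning.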
Re: [OMPI users] Segmentation fault on OMPI 1.6.5 built with gcc 4.4.7 and PGI pgfortran 11.10
I fear that Jeff and Brian are both out for the holiday, Gus, so we are unlikely to have much info on this until they return. I'm unaware of any such problems in 1.6.5. It looks like something isn't properly aligned in memory - could be an error on our part, but might be in the program. You might want to build a debug version and see if that segfaults, and then look at the core with gdb to see where it happened.

On Dec 23, 2013, at 3:27 PM, Gus Correa wrote:

> Dear OMPI experts
>
> I have been using OMPI 1.6.5 built with gcc 4.4.7 and
> PGI pgfortran 11.10 to successfully compile and run
> a large climate modeling program (CESM) in several
> different configurations.
>
> However, today I hit a segmentation fault when running a new model
> configuration.
> [In the climate modeling jargon, a program is called a "model".]
>
> This is somewhat unpleasant because that OMPI build
> is a central piece of the production CESM model setup available
> to all users in our two clusters at this point.
> I have other OMPI 1.6.5 builds, with other compilers, but that one
> was working very well with CESM, until today.
>
> Unless I am misinterpreting it, the error message,
> reproduced below, seems to indicate the problem
> happened inside the OMPI library.
> Or not?
>
> Other details:
>
> Nodes are AMD Opteron 6376 x86_64, interconnect is Infiniband QDR,
> OS is stock CentOS 6.4, kernel 2.6.32-358.2.1.el6.x86_64.
> The program is compiled with the OMPI wrappers (mpicc and mpif90),
> and somewhat conservative optimization flags:
>
> FFLAGS := $(CPPDEFS) -i4 -gopt -Mlist -Mextend -byteswapio
> -Minform=inform -traceback -O2 -Mvect=nosse -Kieee
>
> Is this a known issue?
> Any clues on how to address it?
>
> Thank you for your help,
> Gus Correa
>
> *** error message ***
>
> [1,31]:[node30:17008] *** Process received signal ***
> [1,31]:[node30:17008] Signal: Segmentation fault (11)
> [1,31]:[node30:17008] Signal code: Address not mapped (1)
> [1,31]:[node30:17008] Failing at address: 0x17
> [1,31]:[node30:17008] [ 0] /lib64/libpthread.so.0(+0xf500) [0x2b788ef9f500]
> [1,31]:[node30:17008] [ 1] /sw/openmpi/1.6.5/gnu-4.4.7-pgi-11.10/lib/libmpi.so.1(+0x100ee3) [0x2b788e200ee3]
> [1,31]:[node30:17008] [ 2] /sw/openmpi/1.6.5/gnu-4.4.7-pgi-11.10/lib/libmpi.so.1(opal_memory_ptmalloc2_int_malloc+0x111) [0x2b788e203771]
> [1,31]:[node30:17008] [ 3] /sw/openmpi/1.6.5/gnu-4.4.7-pgi-11.10/lib/libmpi.so.1(opal_memory_ptmalloc2_int_memalign+0x97) [0x2b788e2046d7]
> [1,31]:[node30:17008] [ 4] /sw/openmpi/1.6.5/gnu-4.4.7-pgi-11.10/lib/libmpi.so.1(opal_memory_ptmalloc2_memalign+0x8b) [0x2b788e2052ab]
> [1,31]:[node30:17008] [ 5] ./ccsm.exe(pgf90_auto_alloc+0x73) [0xe2c4c3]
> [1,31]:[node30:17008] *** End of error message ***
> --
> mpiexec noticed that process rank 31 with PID 17008 on node node30 exited on
> signal 11 (Segmentation fault).
> --
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
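The debug-build-and-core workflow suggested above can be sketched as follows. The install prefix, process count, and core file name are assumptions for illustration; `--enable-debug` is the standard Open MPI configure switch for a debugging build, and the compilers match the build described in this thread:

```shell
# Rebuild Open MPI with debugging enabled (prefix is an example path)
./configure --prefix=$HOME/opt/openmpi-1.6.5-debug --enable-debug \
            CC=gcc FC=pgfortran
make -j4 all install

# Allow core dumps, then re-run the failing configuration with the
# debug build first in PATH
ulimit -c unlimited
mpiexec -np 32 ./ccsm.exe

# Inspect the core from the crashed rank (core file naming varies by system)
gdb ./ccsm.exe core.17008 -batch -ex "bt full"
```

With a debug build the backtrace should show file and line numbers inside libmpi, which would help decide whether the fault is in the ptmalloc2 code itself or in heap corruption caused earlier by the application.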
[OMPI users] Segmentation fault on OMPI 1.6.5 built with gcc 4.4.7 and PGI pgfortran 11.10
Dear OMPI experts

I have been using OMPI 1.6.5 built with gcc 4.4.7 and PGI pgfortran 11.10 to successfully compile and run a large climate modeling program (CESM) in several different configurations.

However, today I hit a segmentation fault when running a new model configuration. [In the climate modeling jargon, a program is called a "model".]

This is somewhat unpleasant because that OMPI build is a central piece of the production CESM model setup available to all users in our two clusters at this point. I have other OMPI 1.6.5 builds, with other compilers, but that one was working very well with CESM, until today.

Unless I am misinterpreting it, the error message, reproduced below, seems to indicate the problem happened inside the OMPI library. Or not?

Other details:

Nodes are AMD Opteron 6376 x86_64, interconnect is Infiniband QDR, OS is stock CentOS 6.4, kernel 2.6.32-358.2.1.el6.x86_64. The program is compiled with the OMPI wrappers (mpicc and mpif90), and somewhat conservative optimization flags:

FFLAGS := $(CPPDEFS) -i4 -gopt -Mlist -Mextend -byteswapio -Minform=inform -traceback -O2 -Mvect=nosse -Kieee

Is this a known issue? Any clues on how to address it?
Thank you for your help,
Gus Correa

*** error message ***

[1,31]:[node30:17008] *** Process received signal ***
[1,31]:[node30:17008] Signal: Segmentation fault (11)
[1,31]:[node30:17008] Signal code: Address not mapped (1)
[1,31]:[node30:17008] Failing at address: 0x17
[1,31]:[node30:17008] [ 0] /lib64/libpthread.so.0(+0xf500) [0x2b788ef9f500]
[1,31]:[node30:17008] [ 1] /sw/openmpi/1.6.5/gnu-4.4.7-pgi-11.10/lib/libmpi.so.1(+0x100ee3) [0x2b788e200ee3]
[1,31]:[node30:17008] [ 2] /sw/openmpi/1.6.5/gnu-4.4.7-pgi-11.10/lib/libmpi.so.1(opal_memory_ptmalloc2_int_malloc+0x111) [0x2b788e203771]
[1,31]:[node30:17008] [ 3] /sw/openmpi/1.6.5/gnu-4.4.7-pgi-11.10/lib/libmpi.so.1(opal_memory_ptmalloc2_int_memalign+0x97) [0x2b788e2046d7]
[1,31]:[node30:17008] [ 4] /sw/openmpi/1.6.5/gnu-4.4.7-pgi-11.10/lib/libmpi.so.1(opal_memory_ptmalloc2_memalign+0x8b) [0x2b788e2052ab]
[1,31]:[node30:17008] [ 5] ./ccsm.exe(pgf90_auto_alloc+0x73) [0xe2c4c3]
[1,31]:[node30:17008] *** End of error message ***
--
mpiexec noticed that process rank 31 with PID 17008 on node node30 exited on signal 11 (Segmentation fault).
--
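Frames [1] through [4] of the trace above fall inside libmpi.so.1, and frame [1] carries only a raw offset. If the library retains debug information, such offsets can be mapped back to a function and source line with binutils' addr2line; a sketch, using the library path and offset from the trace (the output is `??` when the library was built without `-g`):

```shell
# Map the offset from frame [ 1] back to a demangled function name and
# file:line inside libmpi.so.1. Only meaningful for a debug-info build.
addr2line -f -C -e /sw/openmpi/1.6.5/gnu-4.4.7-pgi-11.10/lib/libmpi.so.1 0x100ee3
```

This can localize the fault without re-running the job, which is useful when the crash is hard to reproduce.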
[OMPI users] Call for Workshops: EuroMPI/ASIA 2014
*************************************************
*  EuroMPI/ASIA 2014 Call for Workshops         *
*  The 21st European MPI Users' Group Meeting   *
*  Kyoto, Japan                                 *
*  9th - 12th September, 2014                   *
*  www.eurompi2014.org                          *
*************************************************

In addition to the main conference's technical program, EuroMPI/ASIA 2014 is soliciting proposals for one-day or half-day workshops to be held in conjunction with the main conference. These workshops are intended to discuss ongoing work and the latest ideas related to applications of message-passing paradigms. The Call for Papers will be announced separately and also posted on the conference page.

BACKGROUND AND TOPICS
---------------------

EuroMPI is the preeminent meeting for users, developers and researchers to interact and discuss new developments and applications of message-passing parallel computing, in particular in and related to the Message Passing Interface (MPI). The annual meeting has a long, rich tradition, and has been held in European countries. For the 21st EuroMPI, the conference venue is Kyoto, Japan, outside of Europe.

Following past meetings, EuroMPI/ASIA 2014 will continue to focus not just on MPI, but also on extensions or alternative interfaces for high-performance homogeneous/heterogeneous/hybrid systems, benchmarks, tools, parallel I/O, fault tolerance, and parallel applications using MPI and other interfaces. Through the presentation of contributed papers, poster presentations and invited talks, attendees will have the opportunity to share ideas and experiences that contribute to the improvement and furthering of message-passing and related parallel programming paradigms.
IMPORTANT DATES
---------------

- Workshop proposals due: January 30, 2014
- Workshop notification: February 14, 2014
- Submission of full papers and poster abstracts: April 25, 2014
- Author notification: May 30, 2014
- Camera-ready papers due: June 20, 2014
- Tutorials: September 9, 2014
- Conference: September 10-12, 2014

SUBMISSION INSTRUCTIONS
-----------------------

The organizers of accepted workshops are required to advertise the workshop and call for papers, solicit submissions, conduct the reviewing process, and decide upon the final workshop program. Workshop proposals and further inquiries should be sent to the EuroMPI 2014 Workshops Chairs, eurompi2014-worksh...@riken.jp

The workshop proposal should include the following information:

1. Title of the workshop
2. Workshop organizer(s): name, affiliation, and e-mail
3. Brief description of the workshop
4. Call for Papers of the workshop (draft version)
5. Tentative list of program committee members (name, affiliation, and e-mail)
6. Workshop web address (tentative)

The decision on acceptance/rejection of workshop proposals will be made on the basis of the overall quality of the proposal, as well as how it fits in with the conference. The organizers of successful workshops will be responsible for their own reviewing process, publicity (e.g., website and call for papers), and proceedings production. They will be required to cooperate closely with the Workshops Chairs and the EuroMPI 2014 local organizers to finalize all organizational details.