Re: [OMPI users] "-bind-to numa" of openmpi-1.7.4rc1 dosen't work for our magny cours based 32 core node

2013-12-23 Thread tmishima


Hi Ralph,

Here is the output when I put "-mca rmaps_base_verbose 10 --display-map",
and where it stopped (from gdb), which shows it stops inside a lama function.

I usually use PGI 13.10, so I tried switching to the GNU compiler.
With GNU it works, so this problem seems to depend on the compiler.

That's all I could find today.

Regards,
Tetsuya Mishima

[mishima@manage ~]$ gdb
GNU gdb (GDB) CentOS (7.0.1-42.el5.centos.1)

(gdb) attach 14666

0x2b4c5c33 in rmaps_lama_prune_max_tree ()
at ./rmaps_lama_max_tree.c:814
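
For reference, once gdb is attached like this, the full call chain into
rmaps_lama_prune_max_tree() and the state of any other threads can be printed
with standard gdb commands (generic gdb usage, not part of the output captured
above):

(gdb) bt
(gdb) info threads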

[mishima@manage demos]$ mpirun -np 2 -mca rmaps lama -report-bindings -mca rmaps_base_verbose 10 --display-map myprog
[manage.cluster:21503] mca: base: components_register: registering rmaps components
[manage.cluster:21503] mca: base: components_register: found loaded component lama
[manage.cluster:21503] mca:rmaps:lama: Priority   0
[manage.cluster:21503] mca:rmaps:lama: Map   : NULL
[manage.cluster:21503] mca:rmaps:lama: Bind  : NULL
[manage.cluster:21503] mca:rmaps:lama: MPPR  : NULL
[manage.cluster:21503] mca:rmaps:lama: Order : NULL
[manage.cluster:21503] mca: base: components_register: component lama register function successful
[manage.cluster:21503] mca: base: components_open: opening rmaps components
[manage.cluster:21503] mca: base: components_open: found loaded component lama
[manage.cluster:21503] mca:rmaps:select: checking available component lama
[manage.cluster:21503] mca:rmaps:select: Querying component [lama]
[manage.cluster:21503] [[23940,0],0]: Final mapper priorities
[manage.cluster:21503]  Mapper: lama Priority: 0
[manage.cluster:21503] mca:rmaps: mapping job [23940,1]
[manage.cluster:21503] mca:rmaps: creating new map for job [23940,1]
[manage.cluster:21503] mca:rmaps: nprocs 2
[manage.cluster:21503] mca:rmaps:lama: Mapping job [23940,1]
[manage.cluster:21503] mca:rmaps:lama: Revised Parameters -
[manage.cluster:21503] mca:rmaps:lama: Map   : csbnh
[manage.cluster:21503] mca:rmaps:lama: Bind  : 1c
[manage.cluster:21503] mca:rmaps:lama: MPPR  : (null)
[manage.cluster:21503] mca:rmaps:lama: Order : s
[manage.cluster:21503] mca:rmaps:lama: -
[manage.cluster:21503] mca:rmaps:lama: - Binding  : [1c]
[manage.cluster:21503] mca:rmaps:lama: - Binding  :1 x   Core
[manage.cluster:21503] mca:rmaps:lama: -
[manage.cluster:21503] mca:rmaps:lama: - Mapping  : [csbnh]
[manage.cluster:21503] mca:rmaps:lama: - Mapping  : (0)   Core (7 vs 0)
[manage.cluster:21503] mca:rmaps:lama: - Mapping  : (1) Socket (3 vs 1)
[manage.cluster:21503] mca:rmaps:lama: - Mapping  : (2)  Board (1 vs 3)
[manage.cluster:21503] mca:rmaps:lama: - Mapping  : (3)Machine (0 vs 7)
[manage.cluster:21503] mca:rmaps:lama: - Mapping  : (4) Hw. Thread (8 vs 8)
[manage.cluster:21503] mca:rmaps:lama: -
[manage.cluster:21503] mca:rmaps:lama: - MPPR : [(null)]
[manage.cluster:21503] mca:rmaps:lama: -
[manage.cluster:21503] mca:rmaps:lama: - Ordering : [s]
[manage.cluster:21503] mca:rmaps:lama: - Ordering : Sequential
[manage.cluster:21503] mca:rmaps:lama: -
[manage.cluster:21503] AVAILABLE NODES FOR MAPPING:
[manage.cluster:21503] node: manage daemon: 0
[manage.cluster:21503] mca:rmaps:lama: -
[manage.cluster:21503] mca:rmaps:lama: - Building the Max Tree...
[manage.cluster:21503] mca:rmaps:lama: -
[manage.cluster:21503] mca:rmaps:lama: - Converting Remote Tree: manage

[mishima@manage demos]$ ompi_info | grep "C compiler family"
  C compiler family name: GNU
[mishima@manage demos]$ mpirun -np 2 -mca rmaps lama myprog
Hello world from process 0 of 2
Hello world from process 1 of 2



> On Dec 21, 2013, at 8:16 PM, tmish...@jcity.maeda.co.jp wrote:
>
> >
> >
> > Ralph, thanks. I'll try it on Tuesday.
> >
> > Let me confirm one thing: I don't pass "--with-libevent" when I build
> > openmpi.
> > Is there any possibility that it builds with an external libevent automatically?
>
> No - only happens if you direct it
>
>
> >
> > Tetsuya Mishima
> >
> >
> >> Not entirely sure - add "-mca rmaps_base_verbose 10 --display-map" to
> >> your cmd line and let's see if it finishes the mapping.
> >>
> >> Unless you specifically built with an external libevent (which I doubt),
> >> there is no conflict. The connection issue is unlikely to be a factor here
> >> as it works when not using the lama mapper.
> >>
> >>
> >> On Dec 21, 2013, at 3:43 PM, tmish...@jcity.maeda.co.jp wrote:
> >>
> >>>
> >>>
> >>> Thank you, Ralph.
> >>>
> >>> Then, this problem must be specific to our environment.
> >>> But, at least, the inversion problem is not the cause, because
> >>> node05 has the normal hierarchy order.
> >>>
> >>> I cannot connect to our cluster now. On Tuesday, when I'm
> >>> back in my office, I'll send you a further report.
> >>>
> >>> Before that, please let 

Re: [OMPI users] Segmentation fault on OMPI 1.6.5 built with gcc 4.4.7 and PGI pgfortran 11.10

2013-12-23 Thread Ralph Castain
I fear that Jeff and Brian are both out for the holiday, Gus, so we are 
unlikely to have much info on this until they return.

I'm unaware of any such problems in 1.6.5. It looks like something isn't 
properly aligned in memory - could be an error on our part, but might be in the 
program. You might want to build a debug version and see if that segfaults, and 
then look at the core with gdb to see where it happened.
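
A rough sketch of that workflow, for reference (the install prefix, rank
count, and core-file name below are placeholders, not details from this
thread):

# Rebuild Open MPI 1.6.5 with debugging symbols (placeholder prefix)
./configure --prefix=/opt/openmpi-1.6.5-debug --enable-debug CC=gcc FC=pgfortran F77=pgfortran
make -j 8 && make install

# Allow core files, then rerun the failing case
# (CESM is normally launched through its own scripts; shown directly here for brevity)
ulimit -c unlimited
mpiexec -np 32 ./ccsm.exe

# Open the core left by the crashed rank (core-file naming varies by system)
gdb ./ccsm.exe core.<pid>
(gdb) bt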


On Dec 23, 2013, at 3:27 PM, Gus Correa  wrote:

> Dear OMPI experts
> 
> I have been using OMPI 1.6.5 built with gcc 4.4.7 and
> PGI pgfortran 11.10 to successfully compile and run
> a large climate modeling program (CESM) in several
> different configurations.
> 
> However, today I hit a segmentation fault when running a new model 
> configuration.
> [In the climate modeling jargon, a program is called a "model".]
> 
> This is somewhat unpleasant because that OMPI build
> is a central piece of the production CESM model setup available
> to all users in our two clusters at this point.
> I have other OMPI 1.6.5 builds, with other compilers, but that one
> was working very well with CESM, until today.
> 
> Unless I am misinterpreting it, the error message,
> reproduced below, seems to indicate the problem
> happened inside the OMPI library.
> Or not?
> 
> Other details:
> 
> Nodes are AMD Opteron 6376 x86_64, interconnect is Infiniband QDR,
> OS is stock CentOS 6.4, kernel 2.6.32-358.2.1.el6.x86_64.
> The program is compiled with the OMPI wrappers (mpicc and mpif90),
> and somewhat conservative optimization flags:
> 
> FFLAGS   := $(CPPDEFS) -i4 -gopt -Mlist -Mextend -byteswapio 
> -Minform=inform -traceback -O2 -Mvect=nosse -Kieee
> 
> Is this a known issue?
> Any clues on how to address it?
> 
> Thank you for your help,
> Gus Correa
> 
>  *** error message ***
> 
> [1,31]:[node30:17008] *** Process received signal ***
> [1,31]:[node30:17008] Signal: Segmentation fault (11)
> [1,31]:[node30:17008] Signal code: Address not mapped (1)
> [1,31]:[node30:17008] Failing at address: 0x17
> [1,31]:[node30:17008] [ 0] /lib64/libpthread.so.0(+0xf500) [0x2b788ef9f500]
> [1,31]:[node30:17008] [ 1] /sw/openmpi/1.6.5/gnu-4.4.7-pgi-11.10/lib/libmpi.so.1(+0x100ee3) [0x2b788e200ee3]
> [1,31]:[node30:17008] [ 2] /sw/openmpi/1.6.5/gnu-4.4.7-pgi-11.10/lib/libmpi.so.1(opal_memory_ptmalloc2_int_malloc+0x111) [0x2b788e203771]
> [1,31]:[node30:17008] [ 3] /sw/openmpi/1.6.5/gnu-4.4.7-pgi-11.10/lib/libmpi.so.1(opal_memory_ptmalloc2_int_memalign+0x97) [0x2b788e2046d7]
> [1,31]:[node30:17008] [ 4] /sw/openmpi/1.6.5/gnu-4.4.7-pgi-11.10/lib/libmpi.so.1(opal_memory_ptmalloc2_memalign+0x8b) [0x2b788e2052ab]
> [1,31]:[node30:17008] [ 5] ./ccsm.exe(pgf90_auto_alloc+0x73) [0xe2c4c3]
> [1,31]:[node30:17008] *** End of error message ***
> --
> mpiexec noticed that process rank 31 with PID 17008 on node node30 exited on signal 11 (Segmentation fault).
> --
> 



[OMPI users] Segmentation fault on OMPI 1.6.5 built with gcc 4.4.7 and PGI pgfortran 11.10

2013-12-23 Thread Gus Correa

Dear OMPI experts

I have been using OMPI 1.6.5 built with gcc 4.4.7 and
PGI pgfortran 11.10 to successfully compile and run
a large climate modeling program (CESM) in several
different configurations.

However, today I hit a segmentation fault when running a new model 
configuration.

[In the climate modeling jargon, a program is called a "model".]

This is somewhat unpleasant because that OMPI build
is a central piece of the production CESM model setup available
to all users in our two clusters at this point.
I have other OMPI 1.6.5 builds, with other compilers, but that one
was working very well with CESM, until today.

Unless I am misinterpreting it, the error message,
reproduced below, seems to indicate the problem
happened inside the OMPI library.
Or not?

Other details:

Nodes are AMD Opteron 6376 x86_64, interconnect is Infiniband QDR,
OS is stock CentOS 6.4, kernel 2.6.32-358.2.1.el6.x86_64.
The program is compiled with the OMPI wrappers (mpicc and mpif90),
and somewhat conservative optimization flags:

FFLAGS   := $(CPPDEFS) -i4 -gopt -Mlist -Mextend -byteswapio 
-Minform=inform -traceback -O2 -Mvect=nosse -Kieee


Is this a known issue?
Any clues on how to address it?

Thank you for your help,
Gus Correa

*** error message ***

[1,31]:[node30:17008] *** Process received signal ***
[1,31]:[node30:17008] Signal: Segmentation fault (11)
[1,31]:[node30:17008] Signal code: Address not mapped (1)
[1,31]:[node30:17008] Failing at address: 0x17
[1,31]:[node30:17008] [ 0] /lib64/libpthread.so.0(+0xf500) [0x2b788ef9f500]
[1,31]:[node30:17008] [ 1] /sw/openmpi/1.6.5/gnu-4.4.7-pgi-11.10/lib/libmpi.so.1(+0x100ee3) [0x2b788e200ee3]
[1,31]:[node30:17008] [ 2] /sw/openmpi/1.6.5/gnu-4.4.7-pgi-11.10/lib/libmpi.so.1(opal_memory_ptmalloc2_int_malloc+0x111) [0x2b788e203771]
[1,31]:[node30:17008] [ 3] /sw/openmpi/1.6.5/gnu-4.4.7-pgi-11.10/lib/libmpi.so.1(opal_memory_ptmalloc2_int_memalign+0x97) [0x2b788e2046d7]
[1,31]:[node30:17008] [ 4] /sw/openmpi/1.6.5/gnu-4.4.7-pgi-11.10/lib/libmpi.so.1(opal_memory_ptmalloc2_memalign+0x8b) [0x2b788e2052ab]
[1,31]:[node30:17008] [ 5] ./ccsm.exe(pgf90_auto_alloc+0x73) [0xe2c4c3]

[1,31]:[node30:17008] *** End of error message ***
--
mpiexec noticed that process rank 31 with PID 17008 on node node30 exited on signal 11 (Segmentation fault).

--



[OMPI users] Call for Workshops: EuroMPI/ASIA 2014

2013-12-23 Thread Javier Garcia Blas
 ***
 *  EuroMPI/ASIA 2014 Call for Workshops   *
 *  The 21st European MPI Users' Group Meeting *
 * Kyoto, Japan*
 * 9th - 12th September, 2014  *
 *  www.eurompi2014.org*
 ***

In addition to the main conference's technical program, EuroMPI/ASIA
2014 is soliciting proposals for one-day or half-day workshops to be
held in conjunction with the main conference. These workshops are intended
to discuss ongoing work and the latest ideas related to applications that
use message-passing paradigms.
The Call for Papers will be announced separately and also posted on the
conference page.


BACKGROUND AND TOPICS
-
EuroMPI is the preeminent meeting for users, developers and
researchers to interact and discuss new developments and applications
of message-passing parallel computing, in particular in and related to
the Message Passing Interface (MPI). The annual meeting has a long,
rich tradition, and until now has been held in European countries. For the
21st EuroMPI, the conference venue is Kyoto, Japan, outside of Europe.
Following past meetings, EuroMPI/ASIA 2014 will continue to focus
on not just MPI, but also extensions or alternative interfaces for
high-performance homogeneous/heterogeneous/hybrid systems, benchmarks,
tools, parallel I/O, fault tolerance, and parallel applications using
MPI and other interfaces.  Through the presentation of contributed
papers, poster presentations and invited talks, attendees will have
the opportunity to share ideas and experiences to contribute to the
improvement and furthering of message-passing and related parallel
programming paradigms.

IMPORTANT DATES
---
- Workshop proposals due: January 30, 2014
- Workshop notification: February 14, 2014
- Submission of full papers and poster abstracts: April 25, 2014
- Author notification: May 30, 2014
- Camera Ready papers due: June 20, 2014
- Tutorials: September 9, 2014
- Conference: September 10-12, 2014

SUBMISSION INSTRUCTIONS
--- 
The organizers of accepted workshops are required to advertise the
workshop and call for papers, solicit submissions, conduct the
reviewing process, and decide upon the final workshop program.
Workshop proposals and further inquiries should be sent to the EuroMPI
2014 Workshops Chairs, eurompi2014-worksh...@riken.jp

The workshop proposal should include the following information:
1. Title of the workshop
2. Workshop organizer(s): name, affiliation, and e-mail
3. Brief description of the workshop
4. Call for Papers of the workshop (draft version)
5. Tentative list of program committee members (name, affiliation, and e-mail)
6. Workshop web address (tentative)

The decision on acceptance/rejection of workshop proposals will be
made on the basis of the overall quality of the proposal as well as
how well it fits with the conference. The organizers of successful
workshops will be responsible for their own reviewing process,
publicity (e.g., website and call for papers), and proceedings
production. They will be required to closely cooperate with the
Workshops Chairs and the EuroMPI 2014 local organizers to finalize all
organizational details.