Re: [O-MPI devel] sm btl/signal 11 problem on Linux

2005-12-28 Thread Graziano Giuliani
Hi all,
I can confirm this bug on Debian testing as well, with Linux kernel 2.6.14 and
gcc (GCC) 4.0.3 20051201 (prerelease) (Debian 4.0.2-5), running the WRF
atmospheric model compiled with Portland pgf90. For anyone who cares, the model
needs just a small patch in its RSL layer to convert Fortran integer
communicators to C communicators (MPICH uses integers for both, so it is just a
matter of using MPI_Comm_f2c).
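
For readers unfamiliar with the conversion mentioned above, here is a minimal
sketch in C. It is not the actual RSL patch: comm_from_fortran is a
hypothetical helper, and main() only simulates a Fortran caller by producing an
integer handle with MPI_Comm_c2f. It needs an MPI implementation to build
(for example with mpicc).

```c
#include <mpi.h>
#include <stdio.h>

/* Hypothetical helper (not WRF/RSL code): turn the integer communicator
 * handle received from Fortran into a C MPI_Comm. */
static MPI_Comm comm_from_fortran(MPI_Fint f_comm)
{
    return MPI_Comm_f2c(f_comm);
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* Stand-in for the integer handle a Fortran caller would pass down. */
    MPI_Fint f_world = MPI_Comm_c2f(MPI_COMM_WORLD);

    /* Convert before using the handle with any C MPI call. */
    MPI_Comm comm = comm_from_fortran(f_world);

    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);
    printf("rank %d of %d\n", rank, size);

    MPI_Finalize();
    return 0;
}
```

The conversion matters because MPI only guarantees that Fortran handles are
integers. In Open MPI the C MPI_Comm is a pointer type, so passing the Fortran
integer straight through only happens to work on implementations such as MPICH
where both handles are ints.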

The problem is not limited to FC4.
Interestingly, it works if pls_rsh_debug is set.

               Open MPI: 1.0.2a1r8609
  Open MPI SVN revision: r8609
               Open RTE: 1.0.2a1r8609
  Open RTE SVN revision: r8609
                   OPAL: 1.0.2a1r8609
      OPAL SVN revision: r8609
             MCA memory: malloc_hooks (MCA v1.0, API v1.0, Component v1.0.2)
          MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.0.2)
          MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.0.2)
              MCA timer: linux (MCA v1.0, API v1.0, Component v1.0.2)
          MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
          MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
               MCA coll: basic (MCA v1.0, API v1.0, Component v1.0.2)
               MCA coll: self (MCA v1.0, API v1.0, Component v1.0.2)
               MCA coll: sm (MCA v1.0, API v1.0, Component v1.0.2)
                 MCA io: romio (MCA v1.0, API v1.0, Component v1.0.2)
              MCA mpool: sm (MCA v1.0, API v1.0, Component v1.0.2)
                MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.0.2)
                MCA pml: teg (MCA v1.0, API v1.0, Component v1.0.2)
                MCA ptl: self (MCA v1.0, API v1.0, Component v1.0.2)
                MCA ptl: sm (MCA v1.0, API v1.0, Component v1.0.2)
                MCA ptl: tcp (MCA v1.0, API v1.0, Component v1.0.2)
                MCA btl: self (MCA v1.0, API v1.0, Component v1.0.2)
                MCA btl: sm (MCA v1.0, API v1.0, Component v1.0.2)
                MCA btl: tcp (MCA v1.0, API v1.0, Component v1.0)
               MCA topo: unity (MCA v1.0, API v1.0, Component v1.0.2)
                MCA gpr: null (MCA v1.0, API v1.0, Component v1.0.2)
                MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.0.2)
                MCA gpr: replica (MCA v1.0, API v1.0, Component v1.0.2)
                MCA iof: proxy (MCA v1.0, API v1.0, Component v1.0.2)
                MCA iof: svc (MCA v1.0, API v1.0, Component v1.0.2)
                 MCA ns: proxy (MCA v1.0, API v1.0, Component v1.0.2)
                 MCA ns: replica (MCA v1.0, API v1.0, Component v1.0.2)
                MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
                MCA ras: dash_host (MCA v1.0, API v1.0, Component v1.0.2)
                MCA ras: hostfile (MCA v1.0, API v1.0, Component v1.0.2)
                MCA ras: localhost (MCA v1.0, API v1.0, Component v1.0.2)
                MCA ras: slurm (MCA v1.0, API v1.0, Component v1.0.2)
                MCA rds: hostfile (MCA v1.0, API v1.0, Component v1.0.2)
                MCA rds: resfile (MCA v1.0, API v1.0, Component v1.0.2)
              MCA rmaps: round_robin (MCA v1.0, API v1.0, Component v1.0.2)
               MCA rmgr: proxy (MCA v1.0, API v1.0, Component v1.0.2)
               MCA rmgr: urm (MCA v1.0, API v1.0, Component v1.0.2)
                MCA rml: oob (MCA v1.0, API v1.0, Component v1.0.2)
                MCA pls: fork (MCA v1.0, API v1.0, Component v1.0.2)
                MCA pls: proxy (MCA v1.0, API v1.0, Component v1.0.2)
                MCA pls: rsh (MCA v1.0, API v1.0, Component v1.0.2)
                MCA pls: slurm (MCA v1.0, API v1.0, Component v1.0.2)
                MCA sds: env (MCA v1.0, API v1.0, Component v1.0.2)
                MCA sds: pipe (MCA v1.0, API v1.0, Component v1.0.2)
                MCA sds: seed (MCA v1.0, API v1.0, Component v1.0.2)
                MCA sds: singleton (MCA v1.0, API v1.0, Component v1.0.2)
                MCA sds: slurm (MCA v1.0, API v1.0, Component v1.0.2)
                 Prefix: /home/cluster/openmpi
                 Bindir: /home/cluster/openmpi/bin
                 Libdir: /home/cluster/openmpi/lib
                 Incdir: /home/cluster/openmpi/include
              Pkglibdir: /home/cluster/openmpi/lib/openmpi
             Sysconfdir: /home/cluster/openmpi/etc
Configured architecture: i686-pc-linux-gnu
          Configured by: cluster
          Configured on: Tue Dec 27 12:03:35 UTC 2005
         Configure host: hactar
               Built by: cluster
               Built on: Tue Dec 27 12:20:26 UTC 2005
             Built host: hactar
             C bindings: yes
           C++ bindings: yes
     Fortran77 bindings: yes (all)
     Fortran90 bindings: yes
             C compiler: gcc
    C compiler absolute: /usr/bin/gcc
            C char size: 1
            C bool size: 1
           C short size: 2
             C int size: 4
            C long size: 4
C float s

Re: [O-MPI devel] sm btl/signal 11 problem on Linux

2005-12-28 Thread Brian Barrett

On Dec 28, 2005, at 4:50 AM, Graziano Giuliani wrote:


Hi all,
I can confirm this bug on Debian testing as well, with Linux kernel 2.6.14 and
gcc (GCC) 4.0.3 20051201 (prerelease) (Debian 4.0.2-5), running the WRF
atmospheric model compiled with Portland pgf90. For anyone who cares, the model
needs just a small patch in its RSL layer to convert Fortran integer
communicators to C communicators (MPICH uses integers for both, so it is just a
matter of using MPI_Comm_f2c).


Shame on the developers of the atmospheric code for not using  
MPI_Comm_{f2c,c2f} in the first place ;).



The problem is not limited to FC4.
Interestingly, it works if pls_rsh_debug is set.


Could you generate a stack trace from a core file?  It would be good  
to verify that this is the startup bug we are seeing with FC4 and not  
another bug somewhere else.  Or, I think you are using a recent  
enough version of Open MPI that you should see a stack trace printed  
when a SIGSEGV or SIGBUS occurs.  Finally, could you let me know what  
options you passed to configure and send the config.out file  
generated by configure (this is probably unrelated to the error with  
sm, but I'm curious why you ended up with the malloc_hooks component  
-- it shouldn't be automatically chosen in any circumstance).
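
To illustrate the kind of stack trace on SIGSEGV or SIGBUS mentioned above,
here is a minimal sketch in C. It is not Open MPI's actual implementation: the
names crash_handler, install_crash_handler and MAX_FRAMES are hypothetical, and
it assumes a glibc system that provides execinfo.h.

```c
#define _POSIX_C_SOURCE 200809L

#include <execinfo.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

#define MAX_FRAMES 64

/* Hypothetical handler: write the current call stack to stderr, then
 * restore the default action and re-raise the signal so the process
 * still terminates (and dumps core if core files are enabled).
 * Caveat: backtrace() is not guaranteed to be async-signal-safe, so
 * this is a debugging aid rather than production-grade code. */
static void crash_handler(int sig)
{
    void *frames[MAX_FRAMES];
    int depth = backtrace(frames, MAX_FRAMES);

    /* Writes directly to the file descriptor without calling malloc. */
    backtrace_symbols_fd(frames, depth, STDERR_FILENO);

    signal(sig, SIG_DFL);
    raise(sig);
}

/* Install the handler for SIGSEGV and SIGBUS.
 * Returns 0 on success, -1 if either sigaction() call fails. */
int install_crash_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = crash_handler;
    sigemptyset(&sa.sa_mask);

    if (sigaction(SIGSEGV, &sa, NULL) != 0)
        return -1;
    if (sigaction(SIGBUS, &sa, NULL) != 0)
        return -1;
    return 0;
}
```

Calling install_crash_handler() once at the start of main() is enough. The
addresses printed are easier to read if the program is built with -g and
linked with -rdynamic, and a core file opened in a debugger will still give a
more complete trace.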


Thanks,

Brian

--
  Brian Barrett
  Open MPI developer
  http://www.open-mpi.org/