Siegmar, Ralph,

I'm sorry for my late response since last week.

Ralph fixed the problem in r32459, and the fix was merged into v1.8
in r32474. However, v1.8 needs an additional custom patch because
the db/dstore source code differs between trunk and v1.8.
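
The r32459 change itself is not reproduced here. As a generic, hypothetical illustration of this class of bug: the quoted report fails with "Signal code: Invalid address alignment" inside mca_db_hash.so on SPARC, which traps on misaligned multi-byte loads and stores, while x86_64 tolerates them in hardware. That difference may explain why the same program runs on linpc1 and sunpc1 but not on tyr. A common portable remedy is to copy through memcpy instead of dereferencing a cast byte pointer. The names load_i64, store_i64 and demo_unaligned are made up for this sketch and are not Open MPI functions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not an Open MPI function): read a 64-bit value
 * from a byte buffer at an address that may not be 8-byte aligned.
 * memcpy makes no alignment assumption about p, so the compiler must
 * emit accesses that are legal even on strict-alignment CPUs. */
static int64_t load_i64(const unsigned char *p)
{
    int64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Hypothetical counterpart for writing a 64-bit value. */
static void store_i64(unsigned char *p, int64_t v)
{
    memcpy(p, &v, sizeof v);
}

/* Pack a value at offset 1, which is misaligned for an 8-byte type,
 * and read it back. The direct form
 *     *(int64_t *)(buf + 1) = 42;
 * is undefined behavior in C and can raise SIGBUS on SPARC. */
static void demo_unaligned(void)
{
    unsigned char buf[16] = {0};
    store_i64(buf + 1, 42);
    assert(load_i64(buf + 1) == 42);
}
```

Calling demo_unaligned() from main() should behave the same on SPARC and x86_64, because no access relies on the buffer offset being aligned.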

I'm preparing and testing the custom patch right now.
Please wait a little longer.

Takahiro Kawashima,
MPI development team,
Fujitsu

> Hi,
> 
> thank you very much to everybody who tried to solve my bus
> error problem on Solaris 10 Sparc. I thought that you found
> and fixed it, so I installed openmpi-1.8.2rc4r32485 on
> my machines (Solaris 10 Sparc (tyr), Solaris 10 x86_64 (sunpc1),
> openSUSE Linux 12.1 x86_64 (linpc1)) with gcc-4.9.0. A small
> program works on my x86_64 architectures, but still breaks
> with a bus error on my Sparc system.
> 
> linpc1 fd1026 106 mpiexec -np 1 init_finalize
> Hello!
> linpc1 fd1026 106 exit
> logout
> tyr small_prog 113 ssh sunpc1
> sunpc1 fd1026 101 mpiexec -np 1 init_finalize
> Hello!
> sunpc1 fd1026 102 exit
> logout
> tyr small_prog 114 mpiexec -np 1 init_finalize
> [tyr:21109] *** Process received signal ***
> [tyr:21109] Signal: Bus Error (10)
> ...
> 
> 
> gdb shows the following backtrace.
> 
> tyr small_prog 122 /usr/local/gdb-7.6.1_64_gcc/bin/gdb /usr/local/openmpi-1.8.2_64_gcc/bin/mpiexec
> GNU gdb (GDB) 7.6.1
> ...
> (gdb) run -np 1 init_finalize
> Starting program: /usr/local/openmpi-1.8.2_64_gcc/bin/mpiexec -np 1 init_finalize
> [Thread debugging using libthread_db enabled]
> [New Thread 1 (LWP 1)]
> [New LWP    2        ]
> [tyr:21158] *** Process received signal ***
> [tyr:21158] Signal: Bus Error (10)
> [tyr:21158] Signal code: Invalid address alignment (1)
> [tyr:21158] Failing at address: ffffffff7fffd224
> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-pal.so.6.2.0:opal_backtrace_print+0x2c
> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-pal.so.6.2.0:0xcd130
> /lib/sparcv9/libc.so.1:0xd8b98
> /lib/sparcv9/libc.so.1:0xcc70c
> /lib/sparcv9/libc.so.1:0xcc918
> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/openmpi/mca_db_hash.so:0x3ee8
>  [ Signal 10 (BUS)]
> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-pal.so.6.2.0:opal_db_base_store+0xc8
> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-rte.so.7.0.4:orte_util_decode_pidmap+0x798
> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-rte.so.7.0.4:orte_util_nidmap_init+0x3cc
> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/openmpi/mca_ess_env.so:0x226c
> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-rte.so.7.0.4:orte_init+0x308
> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libmpi.so.1.5.2:ompi_mpi_init+0x31c
> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libmpi.so.1.5.2:MPI_Init+0x2a8
> /home/fd1026/SunOS/sparc/bin/init_finalize:main+0x10
> /home/fd1026/SunOS/sparc/bin/init_finalize:_start+0x7c
> [tyr:21158] *** End of error message ***
> --------------------------------------------------------------------------
> mpiexec noticed that process rank 0 with PID 21158 on node tyr exited on signal 10 (Bus Error).
> --------------------------------------------------------------------------
> [LWP    2         exited]
> [New Thread 2        ]
> [Switching to Thread 1 (LWP 1)]
> sol_thread_fetch_registers: td_ta_map_id2thr: no thread can be found to satisfy query
> (gdb) bt
> #0  0xffffffff7f6173d0 in rtld_db_dlactivity () from /usr/lib/sparcv9/ld.so.1
> #1  0xffffffff7f6175a8 in rd_event () from /usr/lib/sparcv9/ld.so.1
> #2  0xffffffff7f618950 in lm_delete () from /usr/lib/sparcv9/ld.so.1
> #3  0xffffffff7f6226bc in remove_so () from /usr/lib/sparcv9/ld.so.1
> #4  0xffffffff7f624574 in remove_hdl () from /usr/lib/sparcv9/ld.so.1
> #5  0xffffffff7f61d97c in dlclose_core () from /usr/lib/sparcv9/ld.so.1
> #6  0xffffffff7f61d9d4 in dlclose_intn () from /usr/lib/sparcv9/ld.so.1
> #7  0xffffffff7f61db0c in dlclose () from /usr/lib/sparcv9/ld.so.1
> #8  0xffffffff7ec7748c in vm_close () from /usr/local/openmpi-1.8.2_64_gcc/lib64/libopen-pal.so.6
> #9  0xffffffff7ec74a6c in lt_dlclose () from /usr/local/openmpi-1.8.2_64_gcc/lib64/libopen-pal.so.6
> #10 0xffffffff7ec99b90 in ri_destructor (obj=0x1001ead30)
>     at ../../../../openmpi-1.8.2rc4r32485/opal/mca/base/mca_base_component_repository.c:391
> #11 0xffffffff7ec984a8 in opal_obj_run_destructors (object=0x1001ead30)
>     at ../../../../openmpi-1.8.2rc4r32485/opal/class/opal_object.h:446
> #12 0xffffffff7ec9940c in mca_base_component_repository_release (
>     component=0xffffffff7b023df0 <mca_oob_tcp_component>)
>     at ../../../../openmpi-1.8.2rc4r32485/opal/mca/base/mca_base_component_repository.c:244
> #13 0xffffffff7ec9b754 in mca_base_component_unload (
>     component=0xffffffff7b023df0 <mca_oob_tcp_component>, output_id=-1)
>     at ../../../../openmpi-1.8.2rc4r32485/opal/mca/base/mca_base_components_close.c:47
> #14 0xffffffff7ec9b7e8 in mca_base_component_close (
>     component=0xffffffff7b023df0 <mca_oob_tcp_component>, output_id=-1)
>     at ../../../../openmpi-1.8.2rc4r32485/opal/mca/base/mca_base_components_close.c:60
> #15 0xffffffff7ec9b8bc in mca_base_components_close (output_id=-1, 
>     components=0xffffffff7f12b930 <orte_oob_base_framework+80>, skip=0x0)
>     at ../../../../openmpi-1.8.2rc4r32485/opal/mca/base/mca_base_components_close.c:86
> #16 0xffffffff7ec9b824 in mca_base_framework_components_close (
>     framework=0xffffffff7f12b8e0 <orte_oob_base_framework>, skip=0x0)
>     at ../../../../openmpi-1.8.2rc4r32485/opal/mca/base/mca_base_components_close.c:66
> #17 0xffffffff7efae21c in orte_oob_base_close ()
>     at ../../../../openmpi-1.8.2rc4r32485/orte/mca/oob/base/oob_base_frame.c:94
> #18 0xffffffff7ecb28cc in mca_base_framework_close (
>     framework=0xffffffff7f12b8e0 <orte_oob_base_framework>)
>     at ../../../../openmpi-1.8.2rc4r32485/opal/mca/base/mca_base_framework.c:187
> #19 0xffffffff7bf078c0 in rte_finalize ()
>     at ../../../../../openmpi-1.8.2rc4r32485/orte/mca/ess/hnp/ess_hnp_module.c:858
> #20 0xffffffff7ef30a44 in orte_finalize ()
>     at ../../openmpi-1.8.2rc4r32485/orte/runtime/orte_finalize.c:65
> #21 0x00000001000070c4 in orterun (argc=4, argv=0xffffffff7fffe0d8)
>     at ../../../../openmpi-1.8.2rc4r32485/orte/tools/orterun/orterun.c:1096
> #22 0x0000000100003d70 in main (argc=4, argv=0xffffffff7fffe0d8)
>     at ../../../../openmpi-1.8.2rc4r32485/orte/tools/orterun/main.c:13
> (gdb) 
> 
> 
> Is this a new problem? I would be grateful if somebody could
> fix it. Thank you very much for any help in advance.
> 
> Kind regards
> 
> Siegmar
