Gilles, George,
The problem is the one Gilles pointed out.
I temporarily modified the code as shown below, and the bus error disappeared.
--- orte/util/nidmap.c (revision 32447)
+++ orte/util/nidmap.c (working copy)
@@ -885,7 +885,7 @@
orte_proc_state_t state;
orte_app_idx_t app_idx;
int32_t restarts;
- orte_process_name_t proc, dmn;
+ orte_process_name_t proc __attribute__((__aligned__(8))), dmn;
char *hostname;
uint8_t flag;
opal_buffer_t *bptr;
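
For reference, here is a minimal standalone sketch of the failure mode
(hypothetical code, not from the Open MPI tree). On SPARC the 64-bit load
traps with SIGBUS whenever the struct happens to sit on a 4-byte but not
8-byte boundary; the attribute used in the patch above rules that out:

#include <stdint.h>
#include <stdio.h>

struct name_t {                 /* same layout as orte_process_name_t */
    uint32_t jobid;
    uint32_t vpid;
};

int main(void)
{
    /* the compiler only has to align this on 4 bytes ... */
    struct name_t plain = { 1, 2 };
    /* ... while this one is guaranteed an 8-byte boundary */
    struct name_t forced __attribute__((__aligned__(8))) = { 1, 2 };

    /* 64-bit loads through a cast, mirroring what db_hash.c effectively
     * does (the cast also breaks strict aliasing; it is kept here only
     * to reproduce the bug pattern) */
    printf("%llu\n", (unsigned long long) *(uint64_t *) &forced); /* ok */
    printf("%llu\n", (unsigned long long) *(uint64_t *) &plain);  /* may SIGBUS */
    return 0;
}
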
Takahiro Kawashima,
MPI development team,
Fujitsu
> Kawashima-san,
>
> This is interesting :-)
>
> proc is on the stack and has type orte_process_name_t
>
> with
>
> typedef uint32_t orte_jobid_t;
> typedef uint32_t orte_vpid_t;
> struct orte_process_name_t {
> orte_jobid_t jobid; /**< Job number */
> orte_vpid_t vpid; /**< Process id - equivalent to rank */
> };
> typedef struct orte_process_name_t orte_process_name_t;
>
>
> so there is really no reason to align this on 8 bytes...
> but later, proc is cast to a uint64_t ...
> so proc should have been aligned on 8 bytes, but it is too late,
> and hence the glorious SIGBUS
>
>
> this is loosely related to
> http://www.open-mpi.org/community/lists/devel/2014/08/15532.php
> (see heterogeneous.v2.patch)
> if we make opal_process_name_t a union of a uint64_t and a struct of two
> uint32_t, the compiler will align this on 8 bytes, along the lines of the
> sketch below.
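>
> (sketch only, member names hypothetical:)
>
> typedef union {
>     uint64_t opaque;          /* 64-bit member forces 8-byte alignment */
>     struct {
>         uint32_t jobid;
>         uint32_t vpid;
>     } name;
> } opal_process_name_t;
>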
> note the patch is not enough (and will not apply on the v1.8 branch anyway);
> we could simply remove orte_process_name_t and ompi_process_name_t and use
> only opal_process_name_t (and never declare variables with type
> opal_proc_name_t, otherwise the alignment might be incorrect)
>
> as a workaround, you can declare an opal_process_name_t (for alignment)
> and cast it to an orte_process_name_t, e.g.:
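>
> (a sketch of that workaround; it assumes opal_process_name_t is a 64-bit,
> hence 8-byte aligned, type:)
>
> opal_process_name_t storage;                      /* 8-byte aligned */
> orte_process_name_t *proc = (orte_process_name_t *) &storage;
> /* use proc->jobid / proc->vpid as before; a later read of &storage
>  * as a uint64_t can no longer be misaligned */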
>
> i will write a patch (i will not be able to test on sparc ...)
> please note this issue might be present in other places
>
> Cheers,
>
> Gilles
>
> On 2014/08/08 13:03, Kawashima, Takahiro wrote:
> > Hi,
> >
> >>>>>> I have installed openmpi-1.8.2rc2 with gcc-4.9.0 on Solaris
> >>>>>> 10 Sparc and I receive a bus error if I run a small program.
> > I've finally reproduced the bus error in my SPARC environment.
> >
> > #0 0xffffffff00db4740 (__waitpid_nocancel + 0x44) (0x200,0x0,0x0,0xa0,0xfffff80100064af0,0x35b4)
> > #1 0xffffffff0001a310 (handle_signal + 0x574) (signo=10,info=(struct siginfo *) 0x000007feffffd100,p=(void *) 0x000007feffffd100) at line 277 in ../sigattach.c <SIGNAL HANDLER>
> > #2 0xffffffff0282aff4 (store + 0x540) (uid=(unsigned long *) 0xffffffff0118a128,scope=8:'\b',key=(char *) 0xffffffff0106a0a8 "opal.local.ldr",data=(void *) 0x000007feffffde74,type=15:'\017') at line 252 in db_hash.c
> > #3 0xffffffff01266350 (opal_db_base_store + 0xc4) (proc=(unsigned long *) 0xffffffff0118a128,scope=8:'\b',key=(char *) 0xffffffff0106a0a8 "opal.local.ldr",object=(void *) 0x000007feffffde74,type=15:'\017') at line 49 in db_base_fns.c
> > #4 0xffffffff00fdbab4 (orte_util_decode_pidmap + 0x790) (bo=(struct *) 0x0000000000281d70) at line 975 in nidmap.c
> > #5 0xffffffff00fd6d20 (orte_util_nidmap_init + 0x3dc) (buffer=(struct opal_buffer_t *) 0x0000000000241fc0) at line 141 in nidmap.c
> > #6 0xffffffff01e298cc (rte_init + 0x2a0) () at line 153 in ess_env_module.c
> > #7 0xffffffff00f9f28c (orte_init + 0x308) (pargc=(int *) 0x0000000000000000,pargv=(char ***) 0x0000000000000000,flags=32) at line 148 in orte_init.c
> > #8 0xffffffff001a6f08 (ompi_mpi_init + 0x31c) (argc=1,argv=(char **) 0x000007fefffff348,requested=0,provided=(int *) 0x000007feffffe698) at line 464 in ompi_mpi_init.c
> > #9 0xffffffff001ff79c (MPI_Init + 0x2b0) (argc=(int *) 0x000007feffffe814,argv=(char ***) 0x000007feffffe818) at line 84 in init.c
> > #10 0x0000000000100ae4 (main + 0x44) (argc=1,argv=(char **) 0x000007fefffff348) at line 8 in mpiinitfinalize.c
> > #11 0xffffffff00d2b81c (__libc_start_main + 0x194) (0x100aa0,0x1,0x7fefffff348,0x100d24,0x100d14,0x0)
> > #12 0x000000000010094c (_start + 0x2c) ()
> >
> > Line 252 in opal/mca/db/hash/db_hash.c is the one marked below:
> >
> > case OPAL_UINT64:
> > if (NULL == data) {
> > OPAL_ERROR_LOG(OPAL_ERR_BAD_PARAM);
> > return OPAL_ERR_BAD_PARAM;
> > }
> > kv->type = OPAL_UINT64;
> > kv->data.uint64 = *(uint64_t*)(data); // !!! here !!!
> > break;
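> >
> > (As an aside, a conventional alignment-safe way to write such a load is
> > a memcpy into a local variable, though that is not the fix discussed in
> > this thread:
> >
> > uint64_t tmp;
> > memcpy(&tmp, data, sizeof(tmp)); /* safe for any alignment of data */
> > kv->data.uint64 = tmp;
> >
> > on targets where unaligned loads are legal, compilers usually fold the
> > memcpy into a single load.)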
> >
> > My environment is:
> >
> > Open MPI v1.8 branch r32447 (latest)
> > configure --enable-debug
> > SPARC-V9 (Fujitsu SPARC64 IXfx)
> > Linux (custom)
> > gcc 4.2.4
> >
> > I could not reproduce it with the Open MPI trunk or with the Fujitsu
> > compiler.
> >
> > Does this information help?
> >
> > Takahiro Kawashima,
> > MPI development team,
> > Fujitsu
> >
> >> Hi,
> >>
> >> I'm sorry once more for answering late, but for the last two days our
> >> mail server was down (hardware error).
> >>
> >>> Did you configure this --enable-debug?
> >> Yes, I used the following command.
> >>
> >> ../openmpi-1.8.2rc3/configure --prefix=/usr/local/openmpi-1.8.2_64_gcc \
> >> --libdir=/usr/local/openmpi-1.8.2_64_gcc/lib64 \
> >> --with-jdk-bindir=/usr/local/jdk1.8.0/bin \
> >> --with-jdk-headers=/usr/local/jdk1.8.0/include \
> >> JAVA_HOME=/usr/local/jdk1.8.0 \
> >> LDFLAGS="-m64 -L/usr/local/gcc-4.9.0/lib/amd64" \
> >> CC="gcc" CXX="g++" FC="gfortran" \
> >> CFLAGS="-m64" CXXFLAGS="-m64" FCFLAGS="-m64" \
> >> CPP="cpp" CXXCPP="cpp" \
> >> CPPFLAGS="" CXXCPPFLAGS="" \
> >> --enable-mpi-cxx \
> >> --enable-cxx-exceptions \
> >> --enable-mpi-java \
> >> --enable-heterogeneous \
> >> --enable-mpi-thread-multiple \
> >> --with-threads=posix \
> >> --with-hwloc=internal \
> >> --without-verbs \
> >> --with-wrapper-cflags="-std=c11 -m64" \
> >> --enable-debug \
> >> |& tee log.configure.$SYSTEM_ENV.$MACHINE_ENV.64_gcc
> >>
> >>
> >>
> >>> If so, you should get a line number in the backtrace
> >> I got them with gdb (see below), but not with dbx.
> >>
> >>
> >> Kind regards
> >>
> >> Siegmar
> >>
> >>
> >>
> >>>
> >>> On Aug 5, 2014, at 2:59 AM, Siegmar Gross <[email protected]> wrote:
> >>>> Hi,
> >>>>
> >>>> I'm sorry for answering so late, but last week I didn't have Internet
> >>>> access. In the meantime I've installed openmpi-1.8.2rc3 and I get
> >>>> the same error.
> >>>>
> >>>>> This looks like the typical type of alignment error that we used
> >>>>> to see when testing regularly on SPARC. :-\
> >>>>>
> >>>>> It looks like the error was happening in mca_db_hash.so. Could
> >>>>> you get a stack trace / file+line number where it was failing
> >>>>> in mca_db_hash? (i.e., the actual bad code will likely be under
> >>>>> opal/mca/db/hash somewhere)
> >>>> Unfortunately I don't get a file+line number from a file in
> >>>> opal/mca/db/hash.
> >>>>
> >>>>
> >>>>
> >>>> tyr small_prog 102 ompi_info | grep MPI:
> >>>> Open MPI: 1.8.2rc3
> >>>> tyr small_prog 103 which mpicc
> >>>> /usr/local/openmpi-1.8.2_64_gcc/bin/mpicc
> >>>> tyr small_prog 104 mpicc init_finalize.c
> >>>> tyr small_prog 106 /opt/solstudio12.3/bin/sparcv9/dbx /usr/local/openmpi-1.8.2_64_gcc/bin/mpiexec
> >>>> For information about new features see `help changes'
> >>>> To remove this message, put `dbxenv suppress_startup_message 7.9' in your .dbxrc
> >>>> Reading mpiexec
> >>>> Reading ld.so.1
> >>>> Reading libopen-rte.so.7.0.4
> >>>> Reading libopen-pal.so.6.2.0
> >>>> Reading libsendfile.so.1
> >>>> Reading libpicl.so.1
> >>>> Reading libkstat.so.1
> >>>> Reading liblgrp.so.1
> >>>> Reading libsocket.so.1
> >>>> Reading libnsl.so.1
> >>>> Reading libgcc_s.so.1
> >>>> Reading librt.so.1
> >>>> Reading libm.so.2
> >>>> Reading libpthread.so.1
> >>>> Reading libc.so.1
> >>>> Reading libdoor.so.1
> >>>> Reading libaio.so.1
> >>>> Reading libmd.so.1
> >>>> (dbx) check -all
> >>>> access checking - ON
> >>>> memuse checking - ON
> >>>> (dbx) run -np 1 a.out
> >>>> Running: mpiexec -np 1 a.out
> >>>> (process id 27833)
> >>>> Reading rtcapihook.so
> >>>> Reading libdl.so.1
> >>>> Reading rtcaudit.so
> >>>> Reading libmapmalloc.so.1
> >>>> Reading libgen.so.1
> >>>> Reading libc_psr.so.1
> >>>> Reading rtcboot.so
> >>>> Reading librtc.so
> >>>> Reading libmd_psr.so.1
> >>>> RTC: Enabling Error Checking...
> >>>> RTC: Running program...
> >>>> Write to unallocated (wua) on thread 1:
> >>>> Attempting to write 1 byte at address 0xffffffff79f04000
> >>>> t@1 (l@1) stopped in _readdir at 0xffffffff55174da0
> >>>> 0xffffffff55174da0: _readdir+0x0064: call _PROCEDURE_LINKAGE_TABLE_+0x2380 [PLT] ! 0xffffffff55342a80
> >>>> (dbx) where
> >>>> current thread: t@1
> >>>> =>[1] _readdir(0xffffffff79f00300, 0x2e6800, 0x4, 0x2d, 0x4, 0xffffffff79f00300), at 0xffffffff55174da0
> >>>>   [2] list_files_by_dir(0x100138fd8, 0xffffffff7fffd1f0, 0xffffffff7fffd1e8, 0xffffffff7fffd210, 0x0, 0xffffffff702a0010), at 0xffffffff63174594
> >>>>   [3] foreachfile_callback(0x100138fd8, 0xffffffff7fffd458, 0x0, 0x2e, 0x0, 0xffffffff702a0010), at 0xffffffff6317461c
> >>>>   [4] foreach_dirinpath(0x1001d8a28, 0x0, 0xffffffff631745e0, 0xffffffff7fffd458, 0x0, 0xffffffff702a0010), at 0xffffffff63171684
> >>>>   [5] lt_dlforeachfile(0x1001d8a28, 0xffffffff6319656c, 0x0, 0x53, 0x2f, 0xf), at 0xffffffff63174748
> >>>>   [6] find_dyn_components(0x0, 0xffffffff6323b570, 0x0, 0x1, 0xffffffff7fffd6a0, 0xffffffff702a0010), at 0xffffffff63195e38
> >>>>   [7] mca_base_component_find(0x0, 0xffffffff6323b570, 0xffffffff6335e1b0, 0x0, 0xffffffff7fffd6a0, 0x1), at 0xffffffff631954d8
> >>>>   [8] mca_base_framework_components_register(0xffffffff6335e1c0, 0x0, 0x3e, 0x0, 0x3b, 0x100800), at 0xffffffff631b1638
> >>>>   [9] mca_base_framework_register(0xffffffff6335e1c0, 0x0, 0x2, 0xffffffff7fffd8d0, 0x0, 0xffffffff702a0010), at 0xffffffff631b24d4
> >>>>   [10] mca_base_framework_open(0xffffffff6335e1c0, 0x0, 0x2, 0xffffffff7fffd990, 0x0, 0xffffffff702a0010), at 0xffffffff631b25d0
> >>>>   [11] opal_init(0xffffffff7fffdd70, 0xffffffff7fffdd78, 0x100117c60, 0xffffffff7fffde58, 0x400, 0x100117c60), at 0xffffffff63153694
> >>>>   [12] orterun(0x4, 0xffffffff7fffde58, 0x2, 0xffffffff7fffdda0, 0x0, 0xffffffff702a0010), at 0x100005078
> >>>>   [13] main(0x4, 0xffffffff7fffde58, 0xffffffff7fffde80, 0x100117c60, 0x100000000, 0xffffffff6a700200), at 0x100003d68
> >>>> (dbx)
> >>>>
> >>>>
> >>>>
> >>>> I get the following output with gdb.
> >>>>
> >>>> tyr small_prog 107 /usr/local/gdb-7.6.1_64_gcc/bin/gdb /usr/local/openmpi-1.8.2_64_gcc/bin/mpiexec
> >>>> GNU gdb (GDB) 7.6.1
> >>>> Copyright (C) 2013 Free Software Foundation, Inc.
> >>>> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> >>>> This is free software: you are free to change and redistribute it.
> >>>> There is NO WARRANTY, to the extent permitted by law. Type "show
> >>>> copying"
> >>>> and "show warranty" for details.
> >>>> This GDB was configured as "sparc-sun-solaris2.10".
> >>>> For bug reporting instructions, please see:
> >>>> <http://www.gnu.org/software/gdb/bugs/>...
> >>>> Reading symbols from /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/bin/orterun...done.
> >>>> (gdb) run -np 1 a.out
> >>>> Starting program: /usr/local/openmpi-1.8.2_64_gcc/bin/mpiexec -np 1 a.out
> >>>> [Thread debugging using libthread_db enabled]
> >>>> [New Thread 1 (LWP 1)]
> >>>> [New LWP 2 ]
> >>>> [tyr:27867] *** Process received signal ***
> >>>> [tyr:27867] Signal: Bus Error (10)
> >>>> [tyr:27867] Signal code: Invalid address alignment (1)
> >>>> [tyr:27867] Failing at address: ffffffff7fffd224
> >>>> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-pal.so.6.2.0:opal_backtrace_print+0x2c
> >>>> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-pal.so.6.2.0:0xccfa0
> >>>> /lib/sparcv9/libc.so.1:0xd8b98
> >>>> /lib/sparcv9/libc.so.1:0xcc70c
> >>>> /lib/sparcv9/libc.so.1:0xcc918
> >>>> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/openmpi/mca_db_hash.so:0x3ee8 [ Signal 10 (BUS)]
> >>>> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-pal.so.6.2.0:opal_db_base_store+0xc8
> >>>> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-rte.so.7.0.4:orte_util_decode_pidmap+0x798
> >>>> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-rte.so.7.0.4:orte_util_nidmap_init+0x3cc
> >>>> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/openmpi/mca_ess_env.so:0x226c
> >>>> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-rte.so.7.0.4:orte_init+0x308
> >>>> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libmpi.so.1.5.2:ompi_mpi_init+0x31c
> >>>> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libmpi.so.1.5.2:PMPI_Init+0x2a8
> >>>> /home/fd1026/work/skripte/master/parallel/prog/mpi/small_prog/a.out:main+0x20
> >>>> /home/fd1026/work/skripte/master/parallel/prog/mpi/small_prog/a.out:_start+0x7c
> >>>> [tyr:27867] *** End of error message ***
> >>>> --------------------------------------------------------------------------
> >>>> mpiexec noticed that process rank 0 with PID 27867 on node tyr exited on signal 10 (Bus Error).
> >>>> --------------------------------------------------------------------------
> >>>> [LWP 2 exited]
> >>>> [New Thread 2 ]
> >>>> [Switching to Thread 1 (LWP 1)]
> >>>> sol_thread_fetch_registers: td_ta_map_id2thr: no thread can be found to satisfy query
> >>>> (gdb) bt
> >>>> #0 0xffffffff7f6173d0 in rtld_db_dlactivity () from /usr/lib/sparcv9/ld.so.1
> >>>> #1 0xffffffff7f6175a8 in rd_event () from /usr/lib/sparcv9/ld.so.1
> >>>> #2 0xffffffff7f618950 in lm_delete () from /usr/lib/sparcv9/ld.so.1
> >>>> #3 0xffffffff7f6226bc in remove_so () from /usr/lib/sparcv9/ld.so.1
> >>>> #4 0xffffffff7f624574 in remove_hdl () from /usr/lib/sparcv9/ld.so.1
> >>>> #5 0xffffffff7f61d97c in dlclose_core () from /usr/lib/sparcv9/ld.so.1
> >>>> #6 0xffffffff7f61d9d4 in dlclose_intn () from /usr/lib/sparcv9/ld.so.1
> >>>> #7 0xffffffff7f61db0c in dlclose () from /usr/lib/sparcv9/ld.so.1
> >>>> #8 0xffffffff7ec7746c in vm_close () from /usr/local/openmpi-1.8.2_64_gcc/lib64/libopen-pal.so.6
> >>>> #9 0xffffffff7ec74a4c in lt_dlclose () from /usr/local/openmpi-1.8.2_64_gcc/lib64/libopen-pal.so.6
> >>>> #10 0xffffffff7ec99b70 in ri_destructor (obj=0x1001ead30) at ../../../../openmpi-1.8.2rc3/opal/mca/base/mca_base_component_repository.c:391
> >>>> #11 0xffffffff7ec98488 in opal_obj_run_destructors (object=0x1001ead30) at ../../../../openmpi-1.8.2rc3/opal/class/opal_object.h:446
> >>>> #12 0xffffffff7ec993ec in mca_base_component_repository_release (component=0xffffffff7b023cf0 <mca_oob_tcp_component>) at ../../../../openmpi-1.8.2rc3/opal/mca/base/mca_base_component_repository.c:244
> >>>> #13 0xffffffff7ec9b734 in mca_base_component_unload (component=0xffffffff7b023cf0 <mca_oob_tcp_component>, output_id=-1) at ../../../../openmpi-1.8.2rc3/opal/mca/base/mca_base_components_close.c:47
> >>>> #14 0xffffffff7ec9b7c8 in mca_base_component_close (component=0xffffffff7b023cf0 <mca_oob_tcp_component>, output_id=-1) at ../../../../openmpi-1.8.2rc3/opal/mca/base/mca_base_components_close.c:60
> >>>> #15 0xffffffff7ec9b89c in mca_base_components_close (output_id=-1, components=0xffffffff7f12b430 <orte_oob_base_framework+80>, skip=0x0) at ../../../../openmpi-1.8.2rc3/opal/mca/base/mca_base_components_close.c:86
> >>>> #16 0xffffffff7ec9b804 in mca_base_framework_components_close (framework=0xffffffff7f12b3e0 <orte_oob_base_framework>, skip=0x0) at ../../../../openmpi-1.8.2rc3/opal/mca/base/mca_base_components_close.c:66
> >>>> #17 0xffffffff7efae1e4 in orte_oob_base_close () at ../../../../openmpi-1.8.2rc3/orte/mca/oob/base/oob_base_frame.c:94
> >>>> #18 0xffffffff7ecb28ac in mca_base_framework_close (framework=0xffffffff7f12b3e0 <orte_oob_base_framework>) at ../../openmpi-1.8.2rc3/opal/mca/base/mca_base_framework.c:187
> >>>> #19 0xffffffff7bf078c0 in rte_finalize () at ../../../../../openmpi-1.8.2rc3/orte/mca/ess/hnp/ess_hnp_module.c:858
> >>>> #20 0xffffffff7ef30a44 in orte_finalize () at ../../openmpi-1.8.2rc3/orte/runtime/orte_finalize.c:65
> >>>> #21 0x00000001000070c4 in orterun (argc=4, argv=0xffffffff7fffe0e8) at ../../../../openmpi-1.8.2rc3/orte/tools/orterun/orterun.c:1096
> >>>> #22 0x0000000100003d70 in main (argc=4, argv=0xffffffff7fffe0e8) at ../../../../openmpi-1.8.2rc3/orte/tools/orterun/main.c:13
> >>>> (gdb)
> >>>>
> >>>>
> >>>> Is the above information helpful for tracking down the error? Do you
> >>>> need anything else? Thank you very much in advance for any help.
> >>>>
> >>>>
> >>>> Kind regards
> >>>>
> >>>> Siegmar
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>> On Jul 25, 2014, at 2:08 AM, Siegmar Gross <[email protected]> wrote:
> >>>>>> Hi,
> >>>>>>
> >>>>>> I have installed openmpi-1.8.2rc2 with gcc-4.9.0 on Solaris
> >>>>>> 10 Sparc and I receive a bus error if I run a small program.
> >>>>>>
> >>>>>> tyr hello_1 105 mpiexec -np 2 a.out
> >>>>>> [tyr:29164] *** Process received signal ***
> >>>>>> [tyr:29164] Signal: Bus Error (10)
> >>>>>> [tyr:29164] Signal code: Invalid address alignment (1)
> >>>>>> [tyr:29164] Failing at address: ffffffff7fffd1c4
> >>>>>> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-pal.so.6.2.0:opal_backtrace_print+0x2c
> >>>>>> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-pal.so.6.2.0:0xccfd0
> >>>>>> /lib/sparcv9/libc.so.1:0xd8b98
> >>>>>> /lib/sparcv9/libc.so.1:0xcc70c
> >>>>>> /lib/sparcv9/libc.so.1:0xcc918
> >>>>>> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/openmpi/mca_db_hash.so:0x3ee8 [ Signal 10 (BUS)]
> >>>>>> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-pal.so.6.2.0:opal_db_base_store+0xc8
> >>>>>> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-rte.so.7.0.4:orte_util_decode_pidmap+0x798
> >>>>>> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-rte.so.7.0.4:orte_util_nidmap_init+0x3cc
> >>>>>> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/openmpi/mca_ess_env.so:0x226c
> >>>>>> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libopen-rte.so.7.0.4:orte_init+0x308
> >>>>>> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libmpi.so.1.5.2:ompi_mpi_init+0x31c
> >>>>>> /export2/prog/SunOS_sparc/openmpi-1.8.2_64_gcc/lib64/libmpi.so.1.5.2:PMPI_Init+0x2a8
> >>>>>> /home/fd1026/work/skripte/master/parallel/prog/mpi/hello_1/a.out:main+0x20
> >>>>>> /home/fd1026/work/skripte/master/parallel/prog/mpi/hello_1/a.out:_start+0x7c
> >>>>>> [tyr:29164] *** End of error message ***
> >>>>>> ...
> >>>>>>
> >>>>>>
> >>>>>> I get the following output if I run the program in "dbx".
> >>>>>>
> >>>>>> ...
> >>>>>> RTC: Enabling Error Checking...
> >>>>>> RTC: Running program...
> >>>>>> Write to unallocated (wua) on thread 1:
> >>>>>> Attempting to write 1 byte at address 0xffffffff79f04000
> >>>>>> t@1 (l@1) stopped in _readdir at 0xffffffff55174da0
> >>>>>> 0xffffffff55174da0: _readdir+0x0064: call _PROCEDURE_LINKAGE_TABLE_+0x2380 [PLT] ! 0xffffffff55342a80
> >>>>>> (dbx)
> >>>>>>
> >>>>>>
> >>>>>> Hopefully the above output helps to fix the error. Can I provide
> >>>>>> anything else? Thank you very much in advance for any help.
> >>>>>>
> >>>>>>
> >>>>>> Kind regards
> >>>>>>
> >>>>>> Siegmar