[OMPI users] Segmentation fault with openmpi-v2.0.1-134-g52bea1d on SuSE Linux

2016-11-02 Thread Siegmar Gross

Hi,

I have installed openmpi-v2.0.1-134-g52bea1d on my "SUSE Linux Enterprise
Server 12 (x86_64)" with Sun C 5.14 beta and gcc-6.2.0. Unfortunately,
I get an error when I run one of my programs.

loki spawn 149 ompi_info | grep -e "Open MPI:" -e "C compiler absolute:"
Open MPI: 2.0.2a1
 C compiler absolute: /opt/solstudio12.5b/bin/cc
loki spawn 150 mpiexec -np 1 --host loki --slot-list 0:0-5,1:0-5 spawn_master

Parent process 0 running on loki
  I create 4 slave processes

[loki:03941] sm_segment_attach: mca_common_sm_module_attach failure!
--------------------------------------------------------------------------
A system call failed during shared memory initialization that should
not have.  It is likely that your MPI job will now either abort or
experience performance degradation.

  Local host:  loki
  System call: open(2)
  Error:   No such file or directory (errno 2)
--------------------------------------------------------------------------
[loki:03941] *** Process received signal ***
[loki:03941] Signal: Segmentation fault (11)
[loki:03941] Signal code: Address not mapped (1)
[loki:03941] Failing at address: 0x8
[loki:03931] [[37095,0],0] ORTE_ERROR_LOG: Not found in file 
../../openmpi-v2.0.1-134-g52bea1d/orte/orted/pmix/pmix_server_fence.c at line 186
[loki:03931] [[37095,0],0] ORTE_ERROR_LOG: Not found in file 
../../openmpi-v2.0.1-134-g52bea1d/orte/orted/pmix/pmix_server_fence.c at line 186

--------------------------------------------------------------------------
At least one pair of MPI processes are unable to reach each other for
MPI communications.  This means that no Open MPI device has indicated
that it can be used to communicate between these processes.  This is
an error; Open MPI requires that all MPI processes be able to reach
each other.  This error can sometimes be the result of forgetting to
specify the "self" BTL.

  Process 1 ([[37095,2],0]) is on host: loki
  Process 2 ([[37095,2],1]) is on host: unknown!
  BTLs attempted: self sm tcp vader

Your MPI job is now going to abort; sorry.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  ompi_dpm_dyn_init() failed
  --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
loki spawn 151



The program works as expected if I specify the hosts in the following way.

loki spawn 151 mpiexec -np 1 --host loki,loki,loki,nfs1,nfs1 spawn_master

Parent process 0 running on loki
  I create 4 slave processes

Parent process 0: tasks in MPI_COMM_WORLD:1
  tasks in COMM_CHILD_PROCESSES local group:  1
  tasks in COMM_CHILD_PROCESSES remote group: 4

Slave process 0 of 4 running on loki
Slave process 1 of 4 running on loki
spawn_slave 0: argv[0]: spawn_slave
spawn_slave 1: argv[0]: spawn_slave
Slave process 2 of 4 running on nfs1
spawn_slave 2: argv[0]: spawn_slave
Slave process 3 of 4 running on nfs1
spawn_slave 3: argv[0]: spawn_slave
loki spawn 152
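For context, the spawn pattern behind the output above can be sketched roughly as follows. This is a hypothetical reconstruction (assuming a master that calls MPI_Comm_spawn on a separate spawn_slave binary), not Siegmar's actual source:

```c
/* Minimal sketch of a spawn master/slave pair consistent with the output
 * above -- a hypothetical reconstruction, NOT the actual spawn_master code. */
#include <stdio.h>
#include <mpi.h>

#define NUM_SLAVES 4

int main(int argc, char *argv[])
{
    int rank, local_size, remote_size, len;
    char host[MPI_MAX_PROCESSOR_NAME];
    MPI_Comm parent, child_comm;

    MPI_Init(&argc, &argv);
    MPI_Comm_get_parent(&parent);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Get_processor_name(host, &len);

    if (MPI_COMM_NULL == parent) {
        /* Master: spawn the slaves as a separate binary. */
        printf("Parent process %d running on %s\n", rank, host);
        printf("  I create %d slave processes\n", NUM_SLAVES);
        MPI_Comm_spawn("spawn_slave", MPI_ARGV_NULL, NUM_SLAVES,
                       MPI_INFO_NULL, 0, MPI_COMM_WORLD,
                       &child_comm, MPI_ERRCODES_IGNORE);
        /* For an intercommunicator, MPI_Comm_size gives the local group
         * size and MPI_Comm_remote_size the spawned group size. */
        MPI_Comm_size(child_comm, &local_size);
        MPI_Comm_remote_size(child_comm, &remote_size);
        printf("  tasks in COMM_CHILD_PROCESSES local group:  %d\n", local_size);
        printf("  tasks in COMM_CHILD_PROCESSES remote group: %d\n", remote_size);
    } else {
        /* Slave: report rank within the spawned MPI_COMM_WORLD. */
        MPI_Comm_size(MPI_COMM_WORLD, &local_size);
        printf("Slave process %d of %d running on %s\n", rank, local_size, host);
    }
    MPI_Finalize();
    return 0;
}
```

Compiled with mpicc, a master built from this sketch would be launched exactly as in the mpiexec lines above.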



I would be grateful if somebody could fix the problem. Thank you
very much in advance for any help.


Kind regards

Siegmar
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users


Re: [OMPI users] Reducing libmpi.so size....

2016-11-02 Thread George Bosilca
Gilles is right: the script shows only what is used right after MPI_Init,
and it will disregard some of the less mainstream types of modules, the
ones that are dynamically loaded as needed during execution. It also
shows only what is related to libmpi, and ignores everything related
to ORTE that is not in use inside the MPI library. However, it does allow
you to define a list of necessary modules, which you can then use during
configure to limit the size of your MPI library.

1. If your goal is to limit the size of the library for a limited set of
applications, you can do the following: instead of generating an app, use
the output of the script to generate a function, and link that function
with your application(s). Calling it right before MPI_Finalize will dump
the entire list of modules used in your application(s).

2. During configure, use the option --enable-mca-no-build="list" to remove
all unnecessary modules from the build process. Configure will ignore
them, and therefore they will not end up in your libmpi.so.

3. Some of the frameworks are dynamically selected for each communicator or
peer process (e.g. collective and BTL), so it might be difficult and
error-prone to trim them down further.
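As a rough sketch of steps 1 and 2 above (the component names passed to --enable-mca-no-build here are purely illustrative placeholders, not a recommended exclusion list):

```shell
# Step 1: inspect which frameworks/components your build contains;
# combine this with the script's output to find components your
# applications never use.
ompi_info | grep "MCA"

# Step 2: rebuild Open MPI without the components you identified.
# The option takes a comma-separated list of framework-component pairs.
./configure --prefix=$HOME/ompi-trimmed \
            --enable-mca-no-build=btl-usnic,coll-ml,io-romio
make -j4 && make install
```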

  George.



On Wed, Nov 2, 2016 at 12:28 AM, Gilles Gouaillardet 
wrote:

> Did you strip the libraries already ?
>
>
> the script will show the list of frameworks and components used by MPI
> helloworld.
>
> from that, you can deduce a list of components that are not required,
> exclude them via the configure command line, and rebuild a trimmed Open MPI.
>
> note this is pretty painful and incomplete. for example, the ompi/io
> components are not explicitly required by MPI helloworld, but they are
> required
>
> if your app uses MPI-IO (e.g. MPI_File_xxx)
>
> some more components might be dynamically required by realworld MPI app.
>
>
> may i ask why you are focusing on reducing the lib size ?
>
> reducing the lib size by excluding (allegedly) useless components is a
> long and painful process, and you might end up having to debug new
> problems on your own ...
>
> as far as i am concerned, if a library of a few MB is too big (filesystem ?
> memory ?), i do not see how a real world application can even run on your
> arm node
>
>
> Cheers,
>
>
> Gilles
> On 11/2/2016 12:49 PM, Mahesh Nanavalla wrote:
>
> HI George,
> Thanks for reply,
>
> Using the above script, how can I reduce libmpi.so size?
>
>
>
> On Tue, Nov 1, 2016 at 11:27 PM, George Bosilca 
> wrote:
>
>> Let's try to coerce OMPI to dump all modules that are still loaded after
>> MPI_Init. We will still have a superset of the needed modules, but at
>> least everything unnecessary in your particular environment will have
>> been trimmed, as during a normal OMPI run.
>>
>> George.
>>
>> PS: It's a shell script that needs ag to run. You need to provide the
>> OMPI source directory. You will get a C file (named tmp.c) in the current
>> directory that contains the code necessary to dump all active modules. You
>> will have to fiddle with the compile line to get it to work, as you will
>> need to specify both source and build header file directories. For the
>> sake of completeness, here is my compile line:
>>
>> mpicc -o tmp -g tmp.c -I. -I../debug/opal/include -I../debug/ompi/include
>> -Iompi/include -Iopal/include -Iopal/mca/event/libevent2022/libevent
>> -Iorte/include -I../debug/opal/mca/hwloc/hwloc1113/hwloc/include
>> -Iopal/mca/hwloc/hwloc1113/hwloc/include -Ioshmem/include -I../debug/
>> -lopen-rte -lopen-pal
>>
>>
>>
>> On Tue, Nov 1, 2016 at 7:12 AM, Jeff Squyres (jsquyres) <
>> jsquy...@cisco.com> wrote:
>>
>>> Run ompi_info; it will tell you all the plugins that are installed.
>>>
>>> > On Nov 1, 2016, at 2:13 AM, Mahesh Nanavalla <
>>> mahesh.nanavalla...@gmail.com> wrote:
>>> >
>>> > Hi Jeff Squyres,
>>> >
>>> > Thank you for your reply...
>>> >
>>> > My problem is I want to reduce the library size by removing unwanted
>>> plugins.
>>> >
>>> > Here libmpi.so.12.0.3 size is 2.4MB.
>>> >
>>> > How can I know which plugins were included when building
>>> libmpi.so.12.0.3, and how can I remove them?
>>> >
>>> > Thanks & Regards,
>>> > Mahesh N
>>> >
>>> > On Fri, Oct 28, 2016 at 7:09 PM, Jeff Squyres (jsquyres) <
>>> jsquy...@cisco.com> wrote:
>>> > On Oct 28, 2016, at 8:12 AM, Mahesh Nanavalla <
>>> mahesh.nanavalla...@gmail.com> wrote:
>>> > >
>>> > > i have configured as below for arm
>>> > >
>>> > > ./configure --enable-orterun-prefix-by-default
>>> --prefix="/home/nmahesh/Workspace/ARM_MPI/openmpi"
>>> CC=arm-openwrt-linux-muslgnueabi-gcc CXX=arm-openwrt-linux-muslgnueabi-g++
>>> --host=arm-openwrt-linux-muslgnueabi --enable-script-wrapper-compilers
>>> --disable-mpi-fortran --enable-dlopen --enable-shared --disable-vt
>>> --disable-java --disable-libompitrace --disable-static
>>> >
>>> > Note that there is a tradeoff here: --enable-dlopen will reduce the
>>> size of libmpi.so by splitting out all the plugins into separate DSOs
>>> (
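The tradeoff Jeff describes can be sketched with two illustrative configure invocations (the flags are real Open MPI options; the prefixes are placeholders):

```shell
# Plugins built as separate DSOs: libmpi.so stays small, but the
# install tree carries many mca_*.so plugin files loaded at runtime.
./configure --enable-dlopen --prefix=$HOME/ompi-dso

# Plugins linked directly into the library: one larger libmpi.so,
# with no separate plugin files to dlopen at runtime.
./configure --disable-dlopen --prefix=$HOME/ompi-slurped
```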

Re: [OMPI users] OpenMPI + InfiniBand

2016-11-02 Thread Sergei Hrushev
Hi Nathan!

> UDCM does not require IPoIB. It should be working for you. Can you build
> Open MPI with --enable-debug and run with -mca btl_base_verbose 100 and
> create a gist with the output?
>
>
Ok, done:

https://gist.github.com/hsa-online/30bb27a90bb7b225b233cc2af11b3942


Best regards,
Sergei.