It looks like I was too quick to blame libltdl.
A build of the current 'master' tarball on the same system, with identical
configure arguments, fails as shown below.

While the failure is not identical, it is also an out-of-memory error.
I am currently assuming that an rlimit has been lowered on this system
since the last time I tested there (1.8.4rc5, I believe).
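
A quick way to test that hypothesis would be to compare the relevant limits
inside a batch job on cvrsvc03 with those on a node where 1.8.4rc5 worked
("ulimit -a" does the same job; the particular limits below are just my guess
at the likely culprits, not something taken from the error output):

/* rlimit_check.c: print the limits most likely to make mmap(2) fail with ENOMEM */
/* build: cc -o rlimit_check rlimit_check.c */
#include <stdio.h>
#include <sys/resource.h>

static void show(const char *name, int resource)
{
    struct rlimit rl;
    if (getrlimit(resource, &rl) != 0) {
        perror(name);
        return;
    }
    printf("%-15s soft=%llu hard=%llu\n", name,
           (unsigned long long)rl.rlim_cur,
           (unsigned long long)rl.rlim_max);
}

int main(void)
{
    show("RLIMIT_AS", RLIMIT_AS);           /* total virtual address space */
    show("RLIMIT_DATA", RLIMIT_DATA);       /* data segment size */
    show("RLIMIT_MEMLOCK", RLIMIT_MEMLOCK); /* locked memory (matters for verbs) */
    return 0;
}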

-Paul

--------------------------------------------------------------------------
A system call failed during shared memory initialization that should
not have.  It is likely that your MPI job will now either abort or
experience performance degradation.

  Local host:  cvrsvc03
  System call: mmap(2)
  Error:       Cannot allocate memory (errno 12)
--------------------------------------------------------------------------
[cvrsvc03:19412] create_and_attach: unable to create shared memory BTL
coordinating structure :: size 134217728
--------------------------------------------------------------------------
At least one pair of MPI processes are unable to reach each other for
MPI communications.  This means that no Open MPI device has indicated
that it can be used to communicate between these processes.  This is
an error; Open MPI requires that all MPI processes be able to reach
each other.  This error can sometimes be the result of forgetting to
specify the "self" BTL.

  Process 1 ([[30315,1],0]) is on host: cvrsvc03
  Process 2 ([[30315,1],1]) is on host: cvrsvc03
  BTLs attempted: self

Your MPI job is now going to abort; sorry.
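
The 134217728 bytes in the first message is the 128 MiB backing segment the sm
BTL tries to mmap.  A standalone probe along these lines (only a rough
imitation: Open MPI actually places its backing file in the session directory,
and the scratch path here is just a placeholder) should show whether a
file-backed mapping of that size fails on its own under the batch system's
limits:

/* mmap_probe.c: map and touch a 128 MiB file-backed segment, roughly what the sm BTL attempts */
/* build: cc -o mmap_probe mmap_probe.c */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <unistd.h>
#include <sys/mman.h>

int main(void)
{
    const size_t size = 134217728;          /* 128 MiB, as in the error */
    char path[] = "/tmp/mmap_probe_XXXXXX"; /* placeholder scratch file */

    int fd = mkstemp(path);                 /* create the backing file */
    if (fd < 0) { perror("mkstemp"); return 1; }
    unlink(path);                           /* remove the name; the fd keeps it alive */

    if (ftruncate(fd, (off_t)size) != 0) {  /* grow it to the target size */
        perror("ftruncate");
        return 1;
    }

    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        fprintf(stderr, "mmap failed: %s (errno %d)\n", strerror(errno), errno);
        return 1;
    }
    memset(p, 0, size);                     /* touch every page to force allocation */
    printf("mapped and touched %zu bytes OK\n", size);
    munmap(p, size);
    close(fd);
    return 0;
}

If that fails with ENOMEM in a batch job but succeeds on a login node, it would
point at a lowered address-space limit rather than anything in the new tarball.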

On Mon, Feb 2, 2015 at 7:01 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:

> Howard,
>
> This was seen on NERSC's Carver.
>
> -Paul
>
> On Mon, Feb 2, 2015 at 6:49 PM, Howard Pritchard <hpprit...@gmail.com>
> wrote:
>
>> Hi Paul,
>>
>> Thanks for checking into this in depth.  Just to help in determining how
>> to proceed, which national center is this?
>>
>> Howard
>>
>>
>> 2015-02-02 19:35 GMT-07:00 Paul Hargrove <phhargr...@lbl.gov>:
>>
>>> Below is one example of what happens when you assume that you can trust
>>> the libltdl installed at an otherwise very well maintained national center.  I
>>> think this is another "vote" for continuing to embed (a working) libltdl.
>>>
>>> -Paul
>>>
>>> $ mpirun -mca btl sm,self -np 2 examples/ring_c
>>> libibverbs: Warning: no userspace device-specific driver found for
>>> /sys/class/infiniband_verbs/uverbs2
>>> libibverbs: Warning: no userspace device-specific driver found for
>>> /sys/class/infiniband_verbs/uverbs1
>>> [cvrsvc03:25777] mca: base: component_find: unable to open
>>> /global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-libltdl-linux-x86_64-icc-11.1/INST/lib/openmpi/mca_ras_loadleveler:
>>> /global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-libltdl-linux-x86_64-icc-11.1/INST/lib/openmpi/mca_ras_loadleveler.so:
>>> failed to map segment from shared object: Cannot allocate memory (ignored)
>>> [cvrsvc03:25777] mca: base: component_find: unable to open
>>> /global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-libltdl-linux-x86_64-icc-11.1/INST/lib/openmpi/mca_ras_simulator:
>>> /global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-libltdl-linux-x86_64-icc-11.1/INST/lib/openmpi/mca_ras_simulator.so:
>>> failed to map segment from shared object: Cannot allocate memory (ignored)
>>> [cvrsvc03:25777] mca: base: component_find: unable to open
>>> /global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-libltdl-linux-x86_64-icc-11.1/INST/lib/openmpi/mca_ras_slurm:
>>> /global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-libltdl-linux-x86_64-icc-11.1/INST/lib/openmpi/mca_ras_slurm.so:
>>> failed to map segment from shared object: Cannot allocate memory (ignored)
>>> [cvrsvc03:25777] mca: base: component_find: unable to open
>>> /global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-libltdl-linux-x86_64-icc-11.1/INST/lib/openmpi/mca_ras_tm:
>>> /global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-libltdl-linux-x86_64-icc-11.1/INST/lib/openmpi/mca_ras_tm.so:
>>> failed to map segment from shared object: Cannot allocate memory (ignored)
>>> [cvrsvc03:25777] mca: base: component_find: unable to open
>>> /global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-libltdl-linux-x86_64-icc-11.1/INST/lib/openmpi/mca_rmaps_lama:
>>> /global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-libltdl-linux-x86_64-icc-11.1/INST/lib/openmpi/mca_rmaps_lama.so:
>>> failed to map segment from shared object: Cannot allocate memory (ignored)
>>> [cvrsvc03:25777] mca: base: component_find: unable to open
>>> /global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-libltdl-linux-x86_64-icc-11.1/INST/lib/openmpi/mca_rmaps_mindist:
>>> /global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-libltdl-linux-x86_64-icc-11.1/INST/lib/openmpi/mca_rmaps_mindist.so:
>>> failed to map segment from shared object: Cannot allocate memory (ignored)
>>> [cvrsvc03:25777] mca: base: component_find: unable to open
>>> /global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-libltdl-linux-x86_64-icc-11.1/INST/lib/openmpi/mca_rmaps_ppr:
>>> /global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-libltdl-linux-x86_64-icc-11.1/INST/lib/openmpi/mca_rmaps_ppr.so:
>>> failed to map segment from shared object: Cannot allocate memory (ignored)
>>> [cvrsvc03:25777] mca: base: component_find: unable to open
>>> /global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-libltdl-linux-x86_64-icc-11.1/INST/lib/openmpi/mca_rmaps_rank_file:
>>> /global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-libltdl-linux-x86_64-icc-11.1/INST/lib/openmpi/mca_rmaps_rank_file.so:
>>> failed to map segment from shared object: Cannot allocate memory (ignored)
>>> [cvrsvc03:25777] mca: base: component_find: unable to open
>>> /global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-libltdl-linux-x86_64-icc-11.1/INST/lib/openmpi/mca_ras_simulator:
>>> /global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-libltdl-linux-x86_64-icc-11.1/INST/lib/openmpi/mca_ras_simulator.so:
>>> failed to map segment from shared object: Cannot allocate memory (ignored)
>>> [cvrsvc03:25777] mca: base: component_find: unable to open
>>> /global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-libltdl-linux-x86_64-icc-11.1/INST/lib/openmpi/mca_ras_slurm:
>>> /global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-libltdl-linux-x86_64-icc-11.1/INST/lib/openmpi/mca_ras_slurm.so:
>>> failed to map segment from shared object: Cannot allocate memory (ignored)
>>> [cvrsvc03:25777] mca: base: component_find: unable to open
>>> /global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-libltdl-linux-x86_64-icc-11.1/INST/lib/openmpi/mca_ras_tm:
>>> /global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-libltdl-linux-x86_64-icc-11.1/INST/lib/openmpi/mca_ras_tm.so:
>>> failed to map segment from shared object: Cannot allocate memory (ignored)
>>> [cvrsvc03:25777] mca: base: component_find: unable to open
>>> /global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-libltdl-linux-x86_64-icc-11.1/INST/lib/openmpi/mca_rmaps_lama:
>>> /global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-libltdl-linux-x86_64-icc-11.1/INST/lib/openmpi/mca_rmaps_lama.so:
>>> failed to map segment from shared object: Cannot allocate memory (ignored)
>>> [cvrsvc03:25777] mca: base: component_find: unable to open
>>> /global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-libltdl-linux-x86_64-icc-11.1/INST/lib/openmpi/mca_rmaps_mindist:
>>> /global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-libltdl-linux-x86_64-icc-11.1/INST/lib/openmpi/mca_rmaps_mindist.so:
>>> failed to map segment from shared object: Cannot allocate memory (ignored)
>>> [cvrsvc03:25777] mca: base: component_find: unable to open
>>> /global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-libltdl-linux-x86_64-icc-11.1/INST/lib/openmpi/mca_rmaps_ppr:
>>> /global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-libltdl-linux-x86_64-icc-11.1/INST/lib/openmpi/mca_rmaps_ppr.so:
>>> failed to map segment from shared object: Cannot allocate memory (ignored)
>>> [cvrsvc03:25777] mca: base: component_find: unable to open
>>> /global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-libltdl-linux-x86_64-icc-11.1/INST/lib/openmpi/mca_rmaps_rank_file:
>>> /global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-libltdl-linux-x86_64-icc-11.1/INST/lib/openmpi/mca_rmaps_rank_file.so:
>>> failed to map segment from shared object: Cannot allocate memory (ignored)
>>> [cvrsvc03:25777] mca: base: component_find: unable to open
>>> /global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-libltdl-linux-x86_64-icc-11.1/INST/lib/openmpi/mca_rmaps_resilient:
>>> /global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-libltdl-linux-x86_64-icc-11.1/INST/lib/openmpi/mca_rmaps_resilient.so:
>>> failed to map segment from shared object: Cannot allocate memory (ignored)
>>> [cvrsvc03:25777] mca: base: component_find: unable to open
>>> /global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-libltdl-linux-x86_64-icc-11.1/INST/lib/openmpi/mca_rmaps_round_robin:
>>> /global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-libltdl-linux-x86_64-icc-11.1/INST/lib/openmpi/mca_rmaps_round_robin.so:
>>> failed to map segment from shared object: Cannot allocate memory (ignored)
>>> [cvrsvc03:25777] mca: base: component_find: unable to open
>>> /global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-libltdl-linux-x86_64-icc-11.1/INST/lib/openmpi/mca_rmaps_seq:
>>> /global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-libltdl-linux-x86_64-icc-11.1/INST/lib/openmpi/mca_rmaps_seq.so:
>>> failed to map segment from shared object: Cannot allocate memory (ignored)
>>> [cvrsvc03:25777] mca: base: component_find: unable to open
>>> /global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-libltdl-linux-x86_64-icc-11.1/INST/lib/openmpi/mca_rmaps_staged:
>>> /global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-libltdl-linux-x86_64-icc-11.1/INST/lib/openmpi/mca_rmaps_staged.so:
>>> failed to map segment from shared object: Cannot allocate memory (ignored)
>>> [cvrsvc03:25777] mca: base: component_find: unable to open
>>> /global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-libltdl-linux-x86_64-icc-11.1/INST/lib/openmpi/mca_odls_default:
>>> /global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-libltdl-linux-x86_64-icc-11.1/INST/lib/openmpi/mca_odls_default.so:
>>> failed to map segment from shared object: Cannot allocate memory (ignored)
>>> [cvrsvc03:25777] [[22934,0],0] ORTE_ERROR_LOG: Not found in file
>>> /global/homes/h/hargrove/GSCRATCH/OMPI/openmpi-libltdl-linux-x86_64-icc-11.1/openmpi-gitclone/orte/mca/ess/hnp/ess_hnp_module.c
>>> at line 583
>>>
>>> --------------------------------------------------------------------------
>>> It looks like orte_init failed for some reason; your parallel process is
>>> likely to abort.  There are many reasons that a parallel process can
>>> fail during orte_init; some of which are due to configuration or
>>> environment problems.  This failure appears to be an internal failure;
>>> here's some additional information (which may only be relevant to an
>>> Open MPI developer):
>>>
>>>   orte_odls_base_select failed
>>>   --> Returned value Not found (-13) instead of ORTE_SUCCESS
>>>
>>> --------------------------------------------------------------------------
>>>
>>>
>>>
>>>
>>> --
>>> Paul H. Hargrove                          phhargr...@lbl.gov
>>> Computer Languages & Systems Software (CLaSS) Group
>>> Computer Science Department               Tel: +1-510-495-2352
>>> Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
>>>
>>> _______________________________________________
>>> devel mailing list
>>> de...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> Link to this post:
>>> http://www.open-mpi.org/community/lists/devel/2015/02/16896.php
>>>
>>
>>
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post:
>> http://www.open-mpi.org/community/lists/devel/2015/02/16897.php
>>
>
>
>
> --
> Paul H. Hargrove                          phhargr...@lbl.gov
> Computer Languages & Systems Software (CLaSS) Group
> Computer Science Department               Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
>



-- 
Paul H. Hargrove                          phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department               Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
