Re: [OMPI devel] btl_openib.c:1200: mca_btl_openib_alloc: Assertion `qp != 255' failed

2015-01-19 Thread Gilles Gouaillardet
Adrian,

I just fixed this in master
(https://github.com/open-mpi/ompi/commit/d14daf40d041f7a0a8e9d85b3bfd5eb570495fd2).

The root cause is that a corner case was not handled correctly:

MPI_Type_hvector(2, 1, 0, MPI_INT, &type);

type has extent = 4 *but* size = 8.
ob1 used to test only the extent to determine whether the message should
be sent inline or not: extent <= 256 means try to send the message inline.
That meant a fragment larger than 65536 bytes (the default maximum fragment
size for IB) was requested, and the allocation failed.

Now both the extent and the size are tested, so the message is not sent
inline, and it just works.
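
For reference, here is a minimal stand-alone sketch (illustrative, not part of
the original report) that shows the extent/size mismatch of this datatype; it
prints extent = 4 and size = 8 on platforms with a 4-byte int:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    MPI_Datatype type;
    MPI_Aint lb, extent;
    int size;

    MPI_Init(&argc, &argv);
    /* stride 0: both MPI_INTs are laid out at the same offset */
    MPI_Type_hvector(2, 1, 0, MPI_INT, &type);
    MPI_Type_commit(&type);
    MPI_Type_get_extent(type, &lb, &extent);   /* extent = 4 */
    MPI_Type_size(type, &size);                /* size   = 8 */
    printf("extent = %ld, size = %d\n", (long)extent, size);
    MPI_Type_free(&type);
    MPI_Finalize();
    return 0;
}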

Cheers,

Gilles


Re: [OMPI devel] btl_openib.c:1200: mca_btl_openib_alloc: Assertion `qp != 255' failed

2015-01-19 Thread Gilles Gouaillardet
Adrian,

Regarding the
"[n050409][[36216,1],1][btl_openib_xrc.c:58:mca_btl_openib_xrc_check_api] XRC
error: bad XRC API (require XRC from OFED pre 3.12)." message:

This means OMPI was built on a system with OFED 3.12 or greater, and you
are running on a system with an earlier OFED release.

Please note that Jeff recently pushed a patch related to that, so this
message might be a false positive.

Cheers,

Gilles




Re: [OMPI devel] btl_openib.c:1200: mca_btl_openib_alloc: Assertion `qp != 255' failed

2015-01-19 Thread George Bosilca
The extent should not be part of the decision: what matters is the amount
of data to be pushed on the wire, not its span in memory.

  George.




Re: [OMPI devel] btl_openib.c:1200: mca_btl_openib_alloc: Assertion `qp != 255' failed

2015-01-19 Thread George Bosilca
Btw,

MPI_Type_hvector(2, 1, 0, MPI_INT, &type);

is just a weird datatype. Because the stride is 0, this datatype describes a
memory layout that includes the same int twice. I'm not sure this was
intended...
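
To make that concrete, here is a small illustrative sketch (not from the
original thread): packing a single element of this type writes 8 bytes
containing the same int value twice, even though the type only spans 4
bytes of memory:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    MPI_Datatype type;
    int value = 42;
    int packed[2] = {0, 0};
    int pos = 0;

    MPI_Init(&argc, &argv);
    MPI_Type_hvector(2, 1, 0, MPI_INT, &type);
    MPI_Type_commit(&type);
    /* one element of 'type' packs to 8 bytes: the same int, twice */
    MPI_Pack(&value, 1, type, packed, sizeof(packed), &pos, MPI_COMM_WORLD);
    printf("packed %d bytes: %d %d\n", pos, packed[0], packed[1]);
    MPI_Type_free(&type);
    MPI_Finalize();
    return 0;
}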

  George.




[OMPI devel] segmentation fault on an accumulate-fence test

2015-01-19 Thread Alina Sklarevich
Dear OMPI community,



We observe a segmentation fault in our regression testing. Our initial
investigation shows that it happens with any 1.8.x release and with any
PML/BTL/MTL combination on two processes when running the MPICH one-sided
accumulate-fence test (attached to this report) with the following command
line:



$mpirun -np 2 --bind-to core --display-map --map-by node -mca pml ob1 -mca
btl self,openib ../test/mpi/rma/accfence1
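
For context, here is a minimal sketch of the accumulate/fence pattern that
accfence1 exercises (illustrative only; the actual MPICH test differs in its
counts, datatypes and result checks):

#include <mpi.h>

#define COUNT 1024

int main(int argc, char *argv[])
{
    int rank, i;
    int winbuf[COUNT], local[COUNT];
    MPI_Win win;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    for (i = 0; i < COUNT; i++) { winbuf[i] = 0; local[i] = rank + 1; }

    MPI_Win_create(winbuf, COUNT * sizeof(int), sizeof(int),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    MPI_Win_fence(0, win);
    /* every rank accumulates its buffer into rank 0's window */
    MPI_Accumulate(local, COUNT, MPI_INT, 0, 0, COUNT, MPI_INT, MPI_SUM, win);
    MPI_Win_fence(0, win);

    MPI_Win_free(&win);   /* the reported crash happens under MPI_Win_free */
    MPI_Finalize();
    return 0;
}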



The initial trace is:



Data for JOB [16088,1] offset 0



   JOB MAP   



Data for node: vegas15  Num slots: 16  Max slots: 0  Num procs: 1

   Process OMPI jobid: [16088,1] App: 0 Process rank: 0



Data for node: vegas16  Num slots: 16  Max slots: 0  Num procs: 1

   Process OMPI jobid: [16088,1] App: 0 Process rank: 1



=

[vegas16:22098] *** Process received signal ***

[vegas16:22098] Signal: Segmentation fault (11)

[vegas16:22098] Signal code: Address not mapped (1)

[vegas16:22098] Failing at address: 0x34

[vegas16:22098] [ 0] /lib64/libpthread.so.0[0x3f6e80f710]

[vegas16:22098] [ 1]
/labhome/alinas/workspace/ompi/openmpi-1.8.4/install/lib/libopen-pal.so.6(opal_memory_ptmalloc2_int_free+0x188)[0x7772baa2]

[vegas16:22098] [ 2]
/labhome/alinas/workspace/ompi/openmpi-1.8.4/install/lib/libopen-pal.so.6(opal_memory_ptmalloc2_free+0x98)[0x7772a1f5]

[vegas16:22098] [ 3]
/labhome/alinas/workspace/ompi/openmpi-1.8.4/install/lib/libopen-pal.so.6(+0xd6f59)[0x77728f59]

[vegas16:22098] [ 4]
/labhome/alinas/workspace/ompi/openmpi-1.8.4/install/lib/libmpi.so.1(+0x2f884)[0x77c92884]

[vegas16:22098] [ 5]
/labhome/alinas/workspace/ompi/openmpi-1.8.4/install/lib/libmpi.so.1(ompi_attr_delete_all+0x2eb)[0x77c92dbe]

[vegas16:22098] [ 6]
/labhome/alinas/workspace/ompi/openmpi-1.8.4/install/lib/libmpi.so.1(ompi_comm_free+0x6a)[0x77c99336]

[vegas16:22098] [ 7]
/labhome/alinas/workspace/ompi/openmpi-1.8.4/install/lib/openmpi/mca_osc_rdma.so(ompi_osc_rdma_free+0x921)[0x732ab3bc]

[vegas16:22098] [ 8]
/labhome/alinas/workspace/ompi/openmpi-1.8.4/install/lib/libmpi.so.1(ompi_win_free+0x24)[0x77cc0c87]

[vegas16:22098] [ 9]
/labhome/alinas/workspace/ompi/openmpi-1.8.4/install/lib/libmpi.so.1(MPI_Win_free+0xb8)[0x77d2b702]

[vegas16:22098] [10]
/labhome/alinas/workspace/mpich/mpich-mellanox/test/mpi/rma/accfence1[0x402447]

[vegas16:22098] [11] /lib64/libc.so.6(__libc_start_main+0xfd)[0x3f6e41ed1d]

[vegas16:22098] [12]
/labhome/alinas/workspace/mpich/mpich-mellanox/test/mpi/rma/accfence1[0x402119]

[vegas16:22098] *** End of error message ***





Subsequent investigation of the core file gives the following backtrace:



(gdb) bt

#0  0x77722a96 in opal_memory_ptmalloc2_int_free
(av=0x7796b320, mem=0x7125a0) at malloc.c:4402

#1  0x777211f5 in opal_memory_ptmalloc2_free (mem=0x7125a0) at
malloc.c:3511

#2  0x7771ff59 in opal_memory_linux_free_hook (__ptr=0x7125a0,
caller=0x7769a8f6) at hooks.c:709

#3  0x7769a8f6 in opal_datatype_destruct (datatype=0x7123b0) at
opal_datatype_create.c:59

#4  0x73346ad0 in opal_obj_run_destructors (object=0x7123b0) at
../../../../opal/class/opal_object.h:448

#5  0x7334af68 in process_acc (module=0x70e370, source=0,
acc_header=0x70fef0) at osc_rdma_data_move.c:1184

#6  0x7334c752 in process_frag (module=0x70e370, frag=0x70fee0) at
osc_rdma_data_move.c:1576

#7  0x7334cafb in ompi_osc_rdma_callback (request=0x700b80) at
osc_rdma_data_move.c:1656

#8  0x73db3770 in ompi_request_complete (request=0x700b80,
with_signal=true) at ../../../../ompi/request/request.h:402

#9  0x73db3f11 in recv_request_pml_complete (recvreq=0x700b80) at
pml_ob1_recvreq.h:181

#10 0x73db5019 in mca_pml_ob1_recv_frag_callback_match
(btl=0x741d9c20, tag=65 'A', des=0x7fffd210, cbdata=0x0) at
pml_ob1_recvfrag.c:243

#11 0x73fd6c4b in mca_btl_sm_component_progress () at
btl_sm_component.c:1087

#12 0x77678d66 in opal_progress () at runtime/opal_progress.c:187

#13 0x73dabb44 in opal_condition_wait (c=0x77ffa120,
m=0x77ffa160) at ../../../../opal/threads/condition.h:78

#14 0x73dabcc6 in ompi_request_wait_completion (req=0x7fffd410)
at ../../../../ompi/request/request.h:381

#15 0x73dac9da in mca_pml_ob1_recv (addr=0x7fffd9ec, count=1,
datatype=0x77fe25c0, src=0, tag=-24, comm=0x70dac0, status=0x0) at
pml_ob1_irecv.c:109

#16 0x72cd2868 in ompi_coll_tuned_scatter_intra_basic_linear
(sbuf=0x0, scount=1, sdtype=0x77fe25c0, rbuf=0x7fffd9ec, rcount=1,
rdtype=0x77fe25c0, root=0, comm=0x70dac0, module=0x70fa20)

at coll_tuned_scatter.c:231

#17 0x72cbbd75 in ompi_coll_tuned_scatter_intra_dec_fixed
(sbuf=0x0, scount=1, sdtype=0x77fe25c0, rbuf=0x7fffd9ec, rcount=1,
rdtype=0x77fe25c0, root=0, comm=

Re: [OMPI devel] segmentation fault on an accumulate-fence test

2015-01-19 Thread Alina Sklarevich
Attaching the test for reproduction.


Re: [OMPI devel] Failures

2015-01-19 Thread Gilles Gouaillardet
George,

I was able to reproduce the hang with Intel compiler 14.0.0,
but I am still unable to reproduce it with Intel compiler 14.3.

I was not able to pin down where the issue comes from, so
I could not create an appropriate test in configure.

At this stage, I can only recommend updating your compiler version.
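
For what it's worth, one possible shape for such a configure-time check is a
small run test along these lines (a rough sketch only; it has not been
validated against the icc versions in question, and OPAL's own 128-bit CAS
may be implemented differently, e.g. with inline assembly rather than this
builtin):

/* conftest.c: would need to be compiled with the same flags as the opal
 * atomics (e.g. -mcx16 with gcc on x86_64) and then *run*, since the
 * breakage only shows up at runtime. Exit status 0 means the 128-bit CAS
 * behaves as expected. */
int main(void)
{
    __int128 value  = 0;
    __int128 oldval = 0;
    __int128 newval = 42;

    if (!__sync_bool_compare_and_swap(&value, oldval, newval)) {
        return 1;
    }
    return (value == newval) ? 0 : 1;
}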


Cheers,

Gilles

On 2015/01/17 0:19, George Bosilca wrote:
> Your patch solves the issue with opal_tree. The opal_lifo test remains broken.
>
>   George.
>
>
> On Fri, Jan 16, 2015 at 5:12 AM, Gilles Gouaillardet <
> gilles.gouaillar...@iferc.org> wrote:
>
>>  George,
>>
>> I pushed
>> https://github.com/open-mpi/ompi/commit/ac16970d21d21f529f1ec01ebe0520843227475b
>> in order to get the Intel compiler to work with ompi.
>>
>> Cheers,
>>
>> Gilles
>>
>>
>> On 2015/01/16 17:29, Gilles Gouaillardet wrote:
>>
>> George,
>>
>> I was unable to reproduce the hang with icc 14.0.3.174 and greater on a
>> RHEL6-like distro.
>>
>> I was able to reproduce the opal_tree failure and found two possible
>> workarounds:
>> a) manually compile opal/class/opal_tree.lo *without* the
>> -finline-functions flag
>> b) update deserialize_add_tree_item and declare curr_delim as volatile
>> char * (see the patch below)
>>
>> this function is recursive, and the compiler could generate some
>> incorrect code.
>>
>> Cheers,
>>
>> Gilles
>>
>> diff --git a/opal/class/opal_tree.c b/opal/class/opal_tree.c
>> index e8964e0..492e8dc 100644
>> --- a/opal/class/opal_tree.c
>> +++ b/opal/class/opal_tree.c
>> @@ -465,7 +465,7 @@ int opal_tree_serialize(opal_tree_item_t
>> *start_item, opal_buffer_t *buffer)
>>  static int deserialize_add_tree_item(opal_buffer_t *data,
>>   opal_tree_item_t *parent_item,
>>   opal_tree_item_deserialize_fn_t
>> deserialize,
>> - char *curr_delim,
>> + volatile char *curr_delim,
>>   int depth)
>>  {
>>  int idx = 1, rc;
>>
>> On 2015/01/16 8:57, George Bosilca wrote:
>>
>>  Today's trunk compiled with icc fails to complete the check on 2 tests:
>> opal_lifo and opal_tree.
>>
>> For opal_tree the output is:
>> OPAL dss:unpack: got type 9 when expecting type 3
>>  Failure :  failed tree deserialization size compare
>> SUPPORT: OMPI Test failed: opal_tree_t (1 of 12 failed)
>>
>> and opal_lifo gets stuck forever in the single-threaded call to thread_test
>> in a 128-bit atomic CAS. Unfortunately I lack the time to dig deep enough
>> to see what the root cause is, but a quick look at the opal_config.h file
>> indicates that our configure detects __int128 as a supported type when
>> it should not be.
>>
>>   George
>>
>> Open MPI git d13c14e configured with --enable-debug
>> icc (ICC) 14.0.0 20130728
>>



Re: [OMPI devel] [OMPI commits] Git: open-mpi/ompi branch master updated. dev-730-g06d3b57

2015-01-19 Thread Ralph Castain
Could you explain the comment about opal_setenv getting “picky”? You can pass a 
flag that tells opal_setenv whether or not to overwrite a pre-existing value, 
and it tells you if the value was found (which is exactly what you asked for) - 
why isn’t it adequate to just pass a “false” for overwrite and check the return 
for OPAL_EXISTS?
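
For illustration, a hedged sketch of that alternative inside
orte_odls_alps_get_rdma_creds() (env_name/env_value stand in for the actual
credential strings the function builds; this is not the committed code):

/* Rely on opal_setenv()'s overwrite flag instead of a static guard. */
int rc;

rc = opal_setenv(env_name, env_value, false, &orte_launch_environ);
if (OPAL_EXISTS == rc) {
    /* the GNI RDMA credentials are already in orte_launch_environ */
    return ORTE_SUCCESS;
}
if (OPAL_SUCCESS != rc) {
    ORTE_ERROR_LOG(rc);
    return rc;
}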


> On Jan 19, 2015, at 10:48 AM, git...@crest.iu.edu wrote:
> 
> This is an automated email from the git hooks/post-receive script. It was
> generated because a ref change was pushed to the repository containing
> the project "open-mpi/ompi".
> 
> The branch, master has been updated
>   via  06d3b57c07a3e028d660d747848e320369185d06 (commit)
>   via  fd807aee69675a0b0602eb6971bacf61db5b10a5 (commit)
>  from  da83b084f506ea3c34ebe9da3c6dd6f44e2537a8 (commit)
> 
> Those revisions listed above that are new to this repository have
> not appeared on any other notification email; so we list those
> revisions in full, below.
> 
> - Log -
> https://github.com/open-mpi/ompi/commit/06d3b57c07a3e028d660d747848e320369185d06
> 
> commit 06d3b57c07a3e028d660d747848e320369185d06
> Merge: da83b08 fd807ae
> Author: Howard Pritchard 
> Date:   Mon Jan 19 11:48:24 2015 -0700
> 
>Merge pull request #351 from hppritcha/topic/alps_odls_spawn_bug
> 
>odls/alps: check if PMI gni rdma creds already set
> 
> 
> 
> https://github.com/open-mpi/ompi/commit/fd807aee69675a0b0602eb6971bacf61db5b10a5
> 
> commit fd807aee69675a0b0602eb6971bacf61db5b10a5
> Author: Howard Pritchard 
> Date:   Mon Jan 19 10:12:38 2015 -0800
> 
>odls/alps: check if PMI gni rdma creds already set
> 
>Need to check if the alps odls component has already
>read the rdma creds from alps.  Its okay to ask apshepherd
>multiple times for rdma creds, but opal_setenv gets
>a bit picky about this.  Rather than check for the OPAL_EXISTS
>return value from opal_setenv, for now just check with
>a static variable whether or not orte_odls_alps_get_rdma_creds
>has already been successfully called before.
> 
>Would be nice to have an opal_getenv function for checking
>if an env. variable had already been set by opal_putenv.
> 
> diff --git a/orte/mca/odls/alps/odls_alps_utils.c 
> b/orte/mca/odls/alps/odls_alps_utils.c
> index 8236038..2ffee05 100644
> --- a/orte/mca/odls/alps/odls_alps_utils.c
> +++ b/orte/mca/odls/alps/odls_alps_utils.c
> @@ -53,6 +53,17 @@ int orte_odls_alps_get_rdma_creds(void)
> alpsAppGni_t *rdmacred_buf;
> char *ptr;
> char env_buffer[1024];
> +static int already_got_creds = 0;
> +
> +/*
> + * If we already put the GNI RDMA credentials into orte_launch_environ,
> + * no need to do anything.
> + * TODO: kind of ugly, need to implement an opal_getenv
> + */
> +
> +if (1 == already_got_creds) {
> +return ORTE_SUCCESS;
> +}
> 
> /*
>  * get the Cray HSN RDMA credentials here and stuff them in to the
> @@ -234,6 +245,7 @@ int orte_odls_alps_get_rdma_creds(void)
> } 
> 
>fn_exit:
> +if (ORTE_SUCCESS == ret) already_got_creds = 1;
> return ret;
> }
> 
> 
> 
> ---
> 
> Summary of changes:
> orte/mca/odls/alps/odls_alps_utils.c | 12 
> 1 file changed, 12 insertions(+)
> 
> 
> hooks/post-receive
> -- 
> open-mpi/ompi