[OMPI devel] RFC: remove PMI component in OMPI/RTE framework

2014-05-25 Thread Ralph Castain
WHAT:  remove stale and unmaintained component in ompi/rte framework

WHY:   because it is unused, unmaintained, and doesn't even compile?

WHEN:  without objections, after telecon on June 9

HOW:   svn del ompi/rte/pmi

This was a component added by Brian as a test of the ompi/rte framework while 
we developed that system. It never really had any purpose other than to provide 
an alternative to ORTE while we tested the revised integration. So far as we 
know, nobody ever used it in an actual installation.

Ralph



Re: [OMPI devel] Still problems with del_procs in trunk

2014-05-25 Thread Gilles Gouaillardet
Rolf,

The assert fails because the endpoint reference count is greater than one.
The root cause is that the endpoint has been added to the list of
eager_rdma_buffers of the openib btl device (and hence OBJ_RETAIN'ed at
ompi/mca/btl/openib/btl_openib_endpoint.c:1009).

A simple workaround is not to use eager RDMA with the openib btl
(e.g. export OMPI_MCA_btl_openib_use_eager_rdma=0).
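
For readers unfamiliar with the retain/release pattern, the following is a
minimal, self-contained sketch of the failure described above. It is not
Open MPI code: toy_endpoint_t, toy_device_t, TOY_RETAIN, TOY_RELEASE and the
toy_* functions are hypothetical stand-ins, and only the reference count is
modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a reference-counted object: only the count is modelled. */
typedef struct {
    int ref_count;
} toy_endpoint_t;

#define TOY_RETAIN(obj)  ((obj)->ref_count++)
#define TOY_RELEASE(obj) ((obj)->ref_count--)

#define TOY_MAX_EAGER 4

/* Toy stand-in for the device: a fixed array of eager RDMA slots. */
typedef struct {
    toy_endpoint_t *eager_rdma_buffers[TOY_MAX_EAGER];
    int eager_rdma_buffers_count;
} toy_device_t;

/* Registering an endpoint for eager RDMA takes an extra reference. */
static void toy_enable_eager_rdma(toy_device_t *dev, toy_endpoint_t *ep)
{
    assert(dev->eager_rdma_buffers_count < TOY_MAX_EAGER);
    TOY_RETAIN(ep);
    dev->eager_rdma_buffers[dev->eager_rdma_buffers_count++] = ep;
}

/* Drop the device's reference before the owner checks for a count of 1. */
static void toy_remove_from_eager_rdma(toy_device_t *dev, toy_endpoint_t *ep)
{
    for (int j = 0; j < dev->eager_rdma_buffers_count; j++) {
        if (dev->eager_rdma_buffers[j] == ep) {
            TOY_RELEASE(ep);
            dev->eager_rdma_buffers[j] = NULL;
        }
    }
}

/* Returns the count seen at the point where a del_procs-style check
 * would assert that exactly one reference remains. */
static int toy_count_at_del_procs(int use_eager_rdma, int apply_fix)
{
    toy_device_t dev = { {NULL}, 0 };
    toy_endpoint_t ep = { 1 };  /* the creator holds one reference */

    if (use_eager_rdma) {
        toy_enable_eager_rdma(&dev, &ep);
    }
    if (apply_fix) {
        toy_remove_from_eager_rdma(&dev, &ep);
    }
    return ep.ref_count;
}
```

In this toy model, toy_count_at_del_procs(1, 0) leaves two references (the
case where the assertion fires), while disabling eager RDMA or removing the
endpoint from the array first brings the count back to one. Whether a plain
release is safe with in-flight messages is a question the sketch does not
address.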

Attached is a patch that solves the issue.

Because of my poor understanding of the openib btl, I did not commit it.
I am wondering whether it is safe to simply OBJ_RELEASE the endpoint
(e.g. what happens if there are in-flight messages?).
I also added some comments indicating that the patch might be suboptimal.

Nathan, could you please review the attached patch?

Please note that if the faulty assertion is removed, the endpoint will still
be OBJ_RELEASE'd, but only in the btl finalize.

Gilles



On Sat, May 24, 2014 at 12:31 AM, Rolf vandeVaart wrote:

> I am still seeing problems with del_procs with openib.  Do we believe
> everything should be working?  This is with the latest trunk (updated 1
> hour ago).
>
> [rvandevaart@drossetti-ivy0 examples]$ mpirun --mca btl_openib_if_include
> mlx5_0:1 -np 2 -host drossetti-ivy0,drossetti-ivy1 connectivity_c
> Connectivity test on 2 processes PASSED.
> connectivity_c: ../../../../../ompi/mca/btl/openib/btl_openib.c:1151:
> mca_btl_openib_del_procs: Assertion
> `((opal_object_t*)endpoint)->obj_reference_count == 1' failed.
> connectivity_c: ../../../../../ompi/mca/btl/openib/btl_openib.c:1151:
> mca_btl_openib_del_procs: Assertion
> `((opal_object_t*)endpoint)->obj_reference_count == 1' failed.
> --
> mpirun noticed that process rank 1 with PID 28443 on node drossetti-ivy1
> exited on signal 11 (Segmentation fault).
> --
> [rvandevaart@drossetti-ivy0 examples]$
>
Index: ompi/mca/btl/openib/btl_openib.c
===================================================================
--- ompi/mca/btl/openib/btl_openib.c    (revision 31888)
+++ ompi/mca/btl/openib/btl_openib.c    (working copy)
@@ -1128,7 +1128,7 @@
         struct ompi_proc_t **procs,
         struct mca_btl_base_endpoint_t ** peers)
 {
-    int i,ep_index;
+    int i, ep_index;
     mca_btl_openib_module_t* openib_btl = (mca_btl_openib_module_t*) btl;
     mca_btl_openib_endpoint_t* endpoint;
 
@@ -1144,8 +1144,19 @@
                 continue;
             }
             if (endpoint == del_endpoint) {
+                int j;
                 BTL_VERBOSE(("in del_procs %d, setting another endpoint to null",
                              ep_index));
+                /* remove the endpoint from eager_rdma_buffers */
+                for (j=0; j<openib_btl->device->eager_rdma_buffers_count; j++) {
+                    if (openib_btl->device->eager_rdma_buffers[j] == endpoint) {
+                        /* should it be obj_reference_count == 2 ? */
+                        assert(((opal_object_t*)endpoint)->obj_reference_count > 1);
+                        OBJ_RELEASE(endpoint);
+                        openib_btl->device->eager_rdma_buffers[j] = NULL;
+                        /* can we simply break and leave the for loop ? */
+                    }
+                }
                 opal_pointer_array_set_item(openib_btl->device->endpoints,
                         ep_index, NULL);
                 assert(((opal_object_t*)endpoint)->obj_reference_count == 1);


[OMPI devel] Trunk (RDMA and VT) warnings

2014-05-25 Thread Ralph Castain
Building optimized on an IB-based machine:

osc_rdma_data_move.c: In function 'ompi_osc_rdma_callback':
osc_rdma_data_move.c:1633: warning: unused variable 'incoming_length'
osc_rdma_data_move.c: In function 'ompi_osc_rdma_control_send':
osc_rdma_data_move.c:221: warning: 'ptr' may be used uninitialized in this function
osc_rdma_data_move.c:220: warning: 'frag' may be used uninitialized in this function
osc_rdma_data_move.c: In function 'ompi_osc_gacc_long_start':
osc_rdma_data_move.c:961: warning: 'acc_data' may be used uninitialized in this function
osc_rdma_data_move.c: In function 'ompi_osc_rdma_gacc_start':
osc_rdma_data_move.c:912: warning: 'acc_data' may be used uninitialized in this function
osc_rdma_comm.c: In function 'ompi_osc_rdma_rget_accumulate_internal':
osc_rdma_comm.c:943: warning: 'ptr' may be used uninitialized in this function
osc_rdma_comm.c:940: warning: 'frag' may be used uninitialized in this function
osc_rdma_data_move.c: In function 'ompi_osc_rdma_acc_long_start':
osc_rdma_data_move.c:827: warning: 'acc_data' may be used uninitialized in this function
osc_rdma_comm.c: In function 'ompi_osc_rdma_rget':
osc_rdma_comm.c:736: warning: 'ptr' may be used uninitialized in this function
osc_rdma_comm.c:733: warning: 'frag' may be used uninitialized in this function
osc_rdma_comm.c: In function 'ompi_osc_rdma_accumulate_w_req':
osc_rdma_comm.c:420: warning: 'ptr' may be used uninitialized in this function
osc_rdma_comm.c:417: warning: 'frag' may be used uninitialized in this function
osc_rdma_comm.c: In function 'ompi_osc_rdma_put_w_req':
osc_rdma_comm.c:251: warning: 'ptr' may be used uninitialized in this function
osc_rdma_comm.c:244: warning: 'frag' may be used uninitialized in this function
osc_rdma_comm.c: In function 'ompi_osc_rdma_get':
osc_rdma_comm.c:736: warning: 'ptr' may be used uninitialized in this function
osc_rdma_comm.c:733: warning: 'frag' may be used uninitialized in this function




vt_plugin_cntr.c: In function 'vt_plugin_cntr_write_post_mortem':
vt_plugin_cntr.c:1139: warning: 'min_counter' may be used uninitialized in this function
vt_plugin_cntr.c: In function 'vt_plugin_cntr_write_post_mortem':
vt_plugin_cntr.c:1139: warning: 'min_counter' may be used uninitialized in this function
vt_plugin_cntr.c: In function 'vt_plugin_cntr_write_post_mortem':
vt_plugin_cntr.c:1139: warning: 'min_counter' may be used uninitialized in this function
vt_plugin_cntr.c: In function 'vt_plugin_cntr_write_post_mortem':
vt_plugin_cntr.c:1139: warning: 'min_counter' may be used uninitialized in this function
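
As background on the "may be used uninitialized" warnings above: GCC often
reports them in optimized builds when a local variable is assigned only
through an out-parameter on the success path of a callee. The sketch below
shows that general pattern and the usual fix (initializing at the
declaration). It is an illustration only, not the osc_rdma or VT code;
toy_frag_alloc and toy_send are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical allocator: writes *out only when it succeeds. */
static int toy_frag_alloc(size_t len, char *pool, size_t pool_len, char **out)
{
    if (len > pool_len) {
        return -1;      /* failure path leaves *out untouched */
    }
    *out = pool;
    return 0;
}

/* Hypothetical caller. Declaring "char *ptr;" without an initializer is
 * the kind of declaration that can draw the warning, because the compiler
 * may not be able to prove that the callee always assigns it. */
static int toy_send(size_t len)
{
    char pool[64];
    char *ptr = NULL;   /* explicit initialization avoids the warning */
    int ret = toy_frag_alloc(len, pool, sizeof(pool), &ptr);

    if (ret != 0) {
        return ret;     /* ptr is never dereferenced on failure */
    }
    ptr[0] = 'x';
    return 0;
}
```

Initializing to NULL does not change behaviour on the success path; it only
makes the failure path well defined.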