Re: [OMPI devel] trunk compilation errors in jenkins

2014-07-26 Thread George Bosilca
All,

Let me take advantage of this thread to clarify what is missing for a perfectly 
MPI-agnostic BTL interface. Some of these issues are pretty straightforward 
(getting rid of RTE and OMPI vestiges); others will require some thinking from 
their developers in order to cope with a non-conformant design (such as using 
MPI_COMM_WORLD in the BTL). So, here is an exhaustive list:

- Open IB uses quite a few ORTE internals: orte_proc_is_bound. It also makes 
use of some functions/defines that I can’t find anywhere in the code base, such 
as ompi_progress_threads

- UGNI uses MPI_COMM_WORLD for internal management
- USNIC uses num_procs for internal management; it also directly calls 
ompi_rte_abort
- common OFACM uses num_procs for hash table allocation

Two items are of general interest, as they affect our compatibility with past 
installations/usages:
- MPOOL alloc uses MPI-level info keys … 
- most of the BTL MCA parameters have not been renamed (!!!). Personally, I 
would be in favor of creating synonyms for now and then deprecating the OMPI 
versions in 2.0, but I don’t want to enforce this on everybody. So, the 
discussion is open on this topic.

Ralph and Jeff (I think you added the seq interface to TCP), please take a look 
at the following:
- the implementation of the TCP seq interface seems to be wrong: it used 
my_node_rank to compute the sequence number instead of my_local_rank (I 
changed this to my_local_rank)

If you have any issues with the move, I’ll be happy to help and/or support you 
on this last step toward a completely generic BTL. To facilitate your work, I 
exposed a minimalistic set of OMPI information at the OPAL level. Take a look 
at opal/util/proc.h for more info, but please try not to expose more.

  Thanks,
George.


On Jul 26, 2014, at 02:22 , Ralph Castain  wrote:

> That's because you folks didn't completely clean up the open fabrics stuff 
> prior to the move - something that we warned about, but folks said they would 
> resolve later :-)
> 
> On Jul 25, 2014, at 11:19 PM, Mike Dubman  wrote:
> 
>> Making all in mca/common/ofacm
>> make[2]: Entering directory 
>> `/hpc/local/benchmarks/hpc-stack-gcc/src/install/ompi-master/opal/mca/common/ofacm'
>>   CC   libmca_common_ofacm_la-common_ofacm_base.lo
>>   CC   libmca_common_ofacm_la-common_ofacm_oob.lo
>>   CC   libmca_common_ofacm_la-common_ofacm_empty.lo
>>   LN_S libmca_common_ofacm.la
>> common_ofacm_oob.c: In function 'oob_component_query':
>> common_ofacm_oob.c:178: warning: passing argument 4 of 
>> 'orte_rml.recv_buffer_nb' from incompatible pointer type
>> common_ofacm_oob.c:178: note: expected 'orte_rml_buffer_callback_fn_t' but 
>> argument is of type 'void (*)(int,  opal_process_name_t *, struct 
>> opal_buffer_t *, ompi_rml_tag_t,  void *)'
>> common_ofacm_xoob.c: In function 'xoob_context_init':
>> common_ofacm_xoob.c:354: error: request for member 'jobid' in something not 
>> a structure or union
>> common_ofacm_xoob.c: In function 'xoob_endpoint_fina
>> common_ofacm_oob.c:728: warning: passing argument 4 of 
>> 'orte_rml.send_buffer_nb' from incompatible pointer type
>> common_ofacm_oob.c:728: note: expected 'orte_rml_buffer_callback_fn_t' but 
>> argument is of type 'void (*)(int,  opal_process_name_t *, struct 
>> opal_buffer_t *, ompi_rml_tag_t,  void *)'
>> common_ofacm_xoob.c: In function 'xoob_send_connect_data':
>> common_ofacm_xoob.c:791: warning: passing argument 1 of 
>> 'orte_rml.send_buffer_nb' from incompatible pointer type
>> common_ofacm_xoob.c:791: note: expected 'struct orte_process_name_t *' but 
>> argument is of type 'opal_process_name_t *'
>> common_ofacm_xoob.c:791: warning: passing argument 4 of 
>> 'orte_rml.send_buffer_nb' from incompatible pointer type
>> common_ofacm_xoob.c:791: note: expected 'orte_rml_buffer_callback_fn_t' but 
>> argument is of type 'void (*)(int,  opal_process_name_t *, struct 
>> opal_buffer_t *, ompi_rml_tag_t,  void *)'
>> common_ofacm_xoob.c: In function 'xoob_recv_qp_create':
>> common_ofacm_xoob.c:963: warning: 'ibv_create_xrc_rcv_qp' is deprecated 
>> (declared at /usr/include/infiniband/ofa_verbs.h:126)
>> common_ofacm_xoob.c:983: warning: 'ibv_modify_xrc_rcv_qp' is deprecated 
>> (declared at /usr/include/infiniband/ofa_verbs.h:152)
>> common_ofacm_xoob.c:1011: warning: 'ibv_modify_xrc_rcv_qp' is deprecated 
>> (declared at /usr/include/infiniband/ofa_verbs.h:152)
>> common_ofacm_xoob.c: In function 'xoob_recv_qp_connect':
>> common_ofacm_xoob.c:1032: warning: 'ibv_reg_xrc_rcv_qp' is deprecated 
>> (declared at /usr/include/infiniband/ofa_verbs.h:185)
>> common_ofacm_xoob.c: In function 'xoob_component_query':
>> common_ofacm_xoob.c:1407: warning: passing argument 4 of 
>> 'orte_rml.recv_buffer_nb' from incompatible pointer type
>> common_ofacm_xoob.c:1407: note: expected 'orte_rml_buffer_callback_fn_t' but 
>> argument is of type 'void (*)(int,  opal_process_name_t *, struct 
>> 

Re: [OMPI devel] RFC: Bump minimum sm pool size to 128K from 64K

2014-07-26 Thread Rolf vandeVaart
Yes (my mistake)


Sent from my iPhone

On Jul 26, 2014, at 3:19 PM, "George Bosilca"  wrote:

We are talking MB, not KB, aren't we?

  George.



On Thu, Jul 24, 2014 at 2:57 PM, Rolf vandeVaart  wrote:
WHAT: Bump up the minimum sm pool size to 128K from 64K.
WHY: When running OSU benchmark on 2 nodes and utilizing a larger 
btl_smcuda_max_send_size, we can run into the case where the free list cannot 
grow.  This is not a common case, but it is something that folks sometimes 
experiment with.  Also note that this minimum was set back 5 years ago so it 
seems that it could be time to bump it up.
WHEN: Tuesday, July 29, 2014 after weekly concall if there are no objections.


[rvandevaart@ivy0 ompi-trunk-regerror]$ svn diff 
ompi/mca/mpool/sm/mpool_sm_component.c
Index: ompi/mca/mpool/sm/mpool_sm_component.c
===
--- ompi/mca/mpool/sm/mpool_sm_component.c  (revision 32293)
+++ ompi/mca/mpool/sm/mpool_sm_component.c  (working copy)
@@ -80,7 +80,7 @@
 }
 };

-static long default_min = 67108864;
+static long default_min = 134217728;
 static unsigned long long ompi_mpool_sm_min_size;
 static int ompi_mpool_sm_verbose;

[rvandevaart@drossetti-ivy0 ompi-trunk-regerror]$
---
This email message is for the sole use of the intended recipient(s) and may 
contain
confidential information.  Any unauthorized review, use, disclosure or 
distribution
is prohibited.  If you are not the intended recipient, please contact the 
sender by
reply email and destroy all copies of the original message.
---
___
devel mailing list
de...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
Link to this post: 
http://www.open-mpi.org/community/lists/devel/2014/07/15257.php

___
devel mailing list
de...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
Link to this post: 
http://www.open-mpi.org/community/lists/devel/2014/07/15273.php


Re: [OMPI devel] RFC: Bump minimum sm pool size to 128K from 64K

2014-07-26 Thread George Bosilca
We are talking MB, not KB, aren't we?

  George.



On Thu, Jul 24, 2014 at 2:57 PM, Rolf vandeVaart  wrote:

> WHAT: Bump up the minimum sm pool size to 128K from 64K.
> WHY: When running OSU benchmark on 2 nodes and utilizing a larger
> btl_smcuda_max_send_size, we can run into the case where the free list
> cannot grow.  This is not a common case, but it is something that folks
> sometimes experiment with.  Also note that this minimum was set back 5
> years ago so it seems that it could be time to bump it up.
> WHEN: Tuesday, July 29, 2014 after weekly concall if there are no
> objections.
>
>
> [rvandevaart@ivy0 ompi-trunk-regerror]$ svn diff
> ompi/mca/mpool/sm/mpool_sm_component.c
> Index: ompi/mca/mpool/sm/mpool_sm_component.c
> ===
> --- ompi/mca/mpool/sm/mpool_sm_component.c  (revision 32293)
> +++ ompi/mca/mpool/sm/mpool_sm_component.c  (working copy)
> @@ -80,7 +80,7 @@
>  }
>  };
>
> -static long default_min = 67108864;
> +static long default_min = 134217728;
>  static unsigned long long ompi_mpool_sm_min_size;
>  static int ompi_mpool_sm_verbose;
>
> [rvandevaart@drossetti-ivy0 ompi-trunk-regerror]$
>
>


Re: [OMPI devel] trunk compilation errors in jenkins

2014-07-26 Thread Ralph Castain
That's because you folks didn't completely clean up the open fabrics stuff prior 
to the move - something that we warned about, but folks said they would resolve 
later :-)

On Jul 25, 2014, at 11:19 PM, Mike Dubman  wrote:

> [build log snipped; identical to Mike's original post in this thread]
> 
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/07/15271.php



[OMPI devel] trunk compilation errors in jenkins

2014-07-26 Thread Mike Dubman
Making all in mca/common/ofacm
make[2]: Entering directory
`/hpc/local/benchmarks/hpc-stack-gcc/src/install/ompi-master/opal/mca/common/ofacm'
  CC   libmca_common_ofacm_la-common_ofacm_base.lo
  CC   libmca_common_ofacm_la-common_ofacm_oob.lo
  CC   libmca_common_ofacm_la-common_ofacm_empty.lo
  LN_S libmca_common_ofacm.la
common_ofacm_oob.c: In function 'oob_component_query':
common_ofacm_oob.c:178: warning: passing argument 4 of
'orte_rml.recv_buffer_nb' from incompatible pointer type
common_ofacm_oob.c:178: note: expected 'orte_rml_buffer_callback_fn_t' but
argument is of type 'void (*)(int,  opal_process_name_t *, struct
opal_buffer_t *, ompi_rml_tag_t,  void *)'
common_ofacm_xoob.c: In function 'xoob_context_init':
common_ofacm_xoob.c:354: error: request for member 'jobid' in something not
a structure or union
common_ofacm_xoob.c: In function 'xoob_endpoint_fina
common_ofacm_oob.c:728: warning: passing argument 4 of
'orte_rml.send_buffer_nb' from incompatible pointer type
common_ofacm_oob.c:728: note: expected 'orte_rml_buffer_callback_fn_t' but
argument is of type 'void (*)(int,  opal_process_name_t *, struct
opal_buffer_t *, ompi_rml_tag_t,  void *)'
common_ofacm_xoob.c: In function 'xoob_send_connect_data':
common_ofacm_xoob.c:791: warning: passing argument 1 of
'orte_rml.send_buffer_nb' from incompatible pointer type
common_ofacm_xoob.c:791: note: expected 'struct orte_process_name_t *' but
argument is of type 'opal_process_name_t *'
common_ofacm_xoob.c:791: warning: passing argument 4 of
'orte_rml.send_buffer_nb' from incompatible pointer type
common_ofacm_xoob.c:791: note: expected 'orte_rml_buffer_callback_fn_t' but
argument is of type 'void (*)(int,  opal_process_name_t *, struct
opal_buffer_t *, ompi_rml_tag_t,  void *)'
common_ofacm_xoob.c: In function 'xoob_recv_qp_create':
common_ofacm_xoob.c:963: warning: 'ibv_create_xrc_rcv_qp' is deprecated
(declared at /usr/include/infiniband/ofa_verbs.h:126)
common_ofacm_xoob.c:983: warning: 'ibv_modify_xrc_rcv_qp' is deprecated
(declared at /usr/include/infiniband/ofa_verbs.h:152)
common_ofacm_xoob.c:1011: warning: 'ibv_modify_xrc_rcv_qp' is deprecated
(declared at /usr/include/infiniband/ofa_verbs.h:152)
common_ofacm_xoob.c: In function 'xoob_recv_qp_connect':
common_ofacm_xoob.c:1032: warning: 'ibv_reg_xrc_rcv_qp' is deprecated
(declared at /usr/include/infiniband/ofa_verbs.h:185)
common_ofacm_xoob.c: In function 'xoob_component_query':
common_ofacm_xoob.c:1407: warning: passing argument 4 of
'orte_rml.recv_buffer_nb' from incompatible pointer type
common_ofacm_xoob.c:1407: note: expected 'orte_rml_buffer_callback_fn_t'
but argument is of type 'void (*)(int,  opal_process_name_t *, struct
opal_buffer_t *, ompi_rml_tag_t,  void *)'
make[2]: *** [libmca_common_ofacm_la-common_ofacm_xoob.lo] Error 1
make[2]: *** Waiting for unfinished jobs
make[2]: Leaving directory
`/hpc/local/benchmarks/hpc-stack-gcc/src/install/ompi-master/opal/mca/common/ofacm'