[OMPI devel] PML/ob1 problem
Hello guys,

I'm running an experimental TCP BTL that implements the RDMA GET method and advertises it in the flags of the BTL API. The BTL's send() method returns rc=1 to select the fast path in the PML (this optimization was added in revision 18551 in v1.3).

It seems that in PML/ob1, the mca_pml_ob1_send_request_start_rdma() function does not handle this combination (BTL GET + fast-path rc>0) correctly and ends up in a deadlock, i.e.:

+++ pml_ob1_sendreq.c +670
At this line, sendreq->req_state is 0.

+++ pml_ob1_sendreq.c +800
At this line, if the BTL has a GET method and the BTL's send() returned the fast-path hint, the call to mca_pml_ob1_rndv_completion_request() decrements sendreq->req_state by one, leaving it at -1.

This value of -1 keeps send_request_pml_complete_check() from completing the request at the PML level.

The PML logic (in mca_pml_ob1_send_request_start_rdma) for the PUT operation initializes req_state to "2" at pml_ob1_sendreq.c +791, but leaves req_state at 0 for GET operations.

Please suggest.

Thanks,
Mike.
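The counting problem described above can be illustrated with a small toy model. This is only a sketch: every name prefixed with toy_ is hypothetical and is not taken from the Open MPI sources, and the completion condition (counter equal to zero) is an assumption based on the behavior reported in the message. The initial values 2 (PUT) and 0 (GET) are the ones quoted in the message.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: hypothetical types and functions, NOT Open MPI code. */
typedef enum { TOY_RDMA_PUT, TOY_RDMA_GET } toy_rdma_op_t;

typedef struct {
    int req_state; /* completion events still expected (per the message) */
} toy_sendreq_t;

/* Models the rendezvous completion callback: consumes one event. */
void toy_rndv_completion(toy_sendreq_t *req)
{
    req->req_state -= 1;
}

/* Models the completion check. Assumption: only a counter of exactly 0
 * lets the request complete. */
bool toy_complete_check(const toy_sendreq_t *req)
{
    return req->req_state == 0;
}

/* Models the start path when send() returns the fast-path hint, so the
 * completion callback runs once inline. Returns the resulting counter. */
int toy_state_after_fastpath_send(toy_rdma_op_t op)
{
    toy_sendreq_t req;
    req.req_state = (op == TOY_RDMA_PUT) ? 2 : 0; /* values from the message */
    toy_rndv_completion(&req);
    return req.req_state;
}

/* Small self-check showing both branches. */
void toy_self_check(void)
{
    toy_sendreq_t get_req = { toy_state_after_fastpath_send(TOY_RDMA_GET) };
    toy_sendreq_t put_req = { toy_state_after_fastpath_send(TOY_RDMA_PUT) };

    /* GET: 0 - 1 = -1, so the check can never succeed. */
    assert(get_req.req_state == -1);
    assert(!toy_complete_check(&get_req));

    /* PUT: 2 - 1 = 1; one more completion event brings it to 0. */
    assert(put_req.req_state == 1);
    toy_rndv_completion(&put_req);
    assert(toy_complete_check(&put_req));
}
```

In this model the PUT branch still reaches zero after its second event, while the GET branch starts below the threshold once the inline callback has run. Whether the right remedy is a different initial value for GET or skipping the inline callback is not something this sketch can answer.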
Re: [OMPI devel] RFC: Rename several OMPI_* names to OPAL_*
Sounds reasonable to me - but again, let's commit it when we do the 1.4 branch so maintaining 1.3 doesn't become impossible.

On Feb 10, 2009, at 2:38 PM, George Bosilca wrote:

These changes look fine to me. However, I would like to amend this proposal to include the splitting of the config directory. Over the last months, I have come to know several projects that use OPAL, and they like to use it as an independent part and not as a subset of ompi. Therefore, I had to extract everything related to OPAL from the ompi tree. While the source code is pretty well divided into sub-projects, this is not the case for the m4 scripts in the config directory.

I would like to propose to split the config directory into several parts: opal/config, orte/config and ompi/config, and to modify the autogen script to take them into account.

Thanks,
george.

On Feb 10, 2009, at 12:54, Greg Koenig wrote:

RFC: Rename several OMPI_* names to OPAL_*

WHAT: Rename several #define values that encode the prefix "OMPI_" to instead encode the prefix "OPAL_" throughout the entire Open MPI source code tree. Also, eliminate unnecessary #include lines from source code files under the ".../ompi/mca/btl" subtree.

WHY: (1) These are general source code improvements that update #define values to more accurately describe which layer the values belong to and remove unnecessary dependencies within the source code; (2) These changes will help with the effort to move the BTL code into an independent layer.

WHERE: 1.4 trunk

WHEN: Negotiable -- see below, but probably near split for 1.4 (No earlier than February 19, 2009)

Timeout: February 19, 2009

The proposed change involves renaming several #define values that encode the prefix "OMPI_" to instead encode the prefix "OPAL_" throughout the entire Open MPI source code tree. These names are holdovers from when the three existing layers of Open MPI were developed together prior to being split apart.
Additionally, the proposed change eliminates a few unnecessary #include lines in BTL source code files under the .../ompi/mca/btl subtree.

Specific modifications are detailed following this message text. A script to carry out these modifications is also attached to this message (gzipped to pass unmolested through the ORNL e-mail server).

We believe these modifications improve the Open MPI source code by renaming values such that they correspond to the Open MPI layer to which they most closely belong, and that this improvement is itself of benefit to Open MPI. These modifications will also aid our ongoing efforts to extract the BTL code into a new layer ("ONET") that can be built with just direct dependence on the OPAL layer.

Although these changes are simple string substitutions, they touch a fair amount of code in the Open MPI tree. Three people have tested these changes at our site on various platforms and have not discovered any problems. However, we recognize that some members of the community may have input/feedback regarding testing and we remain open to suggestions related to testing.

One challenge that has been brought up regarding this RFC is that applying patches and/or CMRs to the source code tree after the proposed changes are performed will be more difficult. To that end, the best opportunity to apply the modifications proposed in this RFC seems to be in conjunction with 1.4. (My understanding from the developer conference call this morning is that there are a few other changes waiting for this switch as well.) We are open to suggestions about the best time to apply this RFC to avoid major disruptions.

Specific changes follow:

* From .../configure.ac.
  * OMPI_NEED_C_BOOL
  * OMPI_HAVE_WEAK_SYMBOLS
  * OMPI_C_HAVE_WEAK_SYMBOLS
  * OMPI_USE_STDBOOL_H
  * OMPI_HAVE_SA_RESTART
  * OMPI_HAVE_VA_COPY
  * OMPI_HAVE_UNDERSCORE_VA_COPY
  * OMPI_PTRDIFF_TYPE
    * (also, ompi_ptrdiff_t)
  * OMPI_ALIGN_WORD_SIZE_INTEGERS
  * OMPI_WANT_LIBLTDL
    * (also, OMPI_ENABLE_DLOPEN_SUPPORT)
  * OMPI_STDC_HEADERS
  * OMPI_HAVE_SYS_TIME_H
  * OMPI_HAVE_LONG_LONG
  * OMPI_HAVE_SYS_SYNCH_H
  * OMPI_SIZEOF_BOOL
  * OMPI_SIZEOF_INT
* From .../config/ompi_check_attributes.m4.
  * OMPI_HAVE_ATTRIBUTE
    * (also, ompi_cv___attribute__)
  * OMPI_HAVE_ATTRIBUTE_ALIGNED
    * (also, ompi_cv___attribute__aligned)
  * OMPI_HAVE_ATTRIBUTE_ALWAYS_INLINE
    * (also, ompi_cv___attribute__always_inline)
  * OMPI_HAVE_ATTRIBUTE_COLD
    * (also, ompi_cv___attribute__cold)
  * OMPI_HAVE_ATTRIBUTE_CONST
    * (also, ompi_cv___attribute__const)
  * OMPI_HAVE_ATTRIBUTE_DEPRECATED
    * (also, ompi_cv___attribute__deprecated)
  * OMPI_HAVE_ATTRIBUTE_FORMAT
    * (also, ompi_cv___attribute__format)
  * OMPI_HAVE_ATTRIBUTE_HOT
    * (also, ompi_cv___attribute__hot)
  * OMPI_HAVE_ATTRIBUTE_MALLOC
    * (also, ompi_cv___attribute__malloc)
  * OMPI_HAVE_ATTRIBUTE_MAY_ALIAS
    * (also, ompi
Re: [OMPI devel] RFC: Rename several OMPI_* names to OPAL_*
This is a pretty good suggestion; I think it could be useful to what we're trying to do with STCI, but I can see that it could be generally useful to others as well. As long as nobody objects on the grounds that this is too much to put into a single RFC, I can work towards the goal of incorporating this suggestion into what we've proposed below.

I suspect/know that a couple of the .m4 files in the current .../config/ subtree contain a mixture of both ompi_ and opal_ related variables, so some non-zero amount of thought is probably required to make the right thing happen with these. I'm working on another project today, but will investigate later this week. If anybody has useful suggestions about how to tackle this, I'm open.

On 2/10/09 4:38 PM, "George Bosilca" wrote:

> These changes look fine to me. However, I would like to amend this
> proposal to include the splitting of the config directory. Over the
> last months, I have come to know several projects that use OPAL, and
> they like to use it as an independent part and not as a subset of
> ompi. Therefore, I had to extract everything related to OPAL from the
> ompi tree. While the source code is pretty well divided into
> sub-projects, this is not the case for the m4 scripts in the config
> directory.
>
> I would like to propose to split the config directory into several
> parts: opal/config, orte/config and ompi/config, and to modify the
> autogen script to take them into account.
>
> Thanks,
> george.
>
> On Feb 10, 2009, at 12:54, Greg Koenig wrote:
>
>> RFC: Rename several OMPI_* names to OPAL_*
>>
>> WHAT: Rename several #define values that encode the prefix "OMPI_" to
>> instead encode the prefix "OPAL_" throughout the entire Open MPI
>> source code tree. Also, eliminate unnecessary #include lines from
>> source code files under the ".../ompi/mca/btl" subtree.
[OMPI devel] possible bugs and unexpected values in returned errors classes
Below is a list of things I found by running the mpi4py test suite. I never reported them before because some of them are not actually errors, but anyway, I want to raise the discussion.

- Likely bugs (according to my interpretation of the MPI standard)

1) When passing MPI_REQUEST_NULL, MPI_Request_free() does NOT fail.
2) When passing MPI_REQUEST_NULL, MPI_Cancel() does NOT fail.
3) When passing MPI_REQUEST_NULL, MPI_Request_get_status() does NOT fail.
4) When passing MPI_WIN_NULL, MPI_Win_get_errhandler() and MPI_Win_set_errhandler() do NOT fail.

- Unexpected error classes (at least for me)

1) When passing MPI_COMM_NULL, MPI_Comm_get_errhandler() fails with MPI_ERR_ARG. I would expect MPI_ERR_COMM.
2) MPI_Type_free() fails with MPI_ERR_INTERN when passing predefined datatypes like MPI_INT or MPI_FLOAT. I would expect MPI_ERR_TYPE.

- Controversial (I'm even fine with the current behavior)

1) MPI_Info_get_nthkey(info, n) returns MPI_ERR_INFO_KEY when "n" is larger than the number of keys. Perhaps MPI_ERR_ARG would be more appropriate? A possible rationale would be that the error is not related to the contents of a 'key' string, but to an out-of-range value for "n".

That's all. Sorry for being so pedantic :-) and not offering help for the patches, but I'm really busy.

--
Lisandro Dalcín
---
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594