[OMPI devel] PML/ob1 problem

2009-02-11 Thread Mike Dubman
Hello guys,

I'm running an experimental TCP BTL that implements the RDMA GET method and
advertises it in the flags of its BTL API.
The BTL's send() method returns rc=1 to select the fast path in the PML. (This
optimization was added in revision 18551 in v1.3.)

It seems that in PML/ob1, the mca_pml_ob1_send_request_start_rdma() function
does not handle this combination (BTL GET + fast path rc>0) correctly and ends
up in a deadlock, i.e.:

+++ pml_ob1_sendreq.c +670
At this line, sendreq->req_state is 0

+++ pml_ob1_sendreq.c +800
At this line, if the BTL has a GET method and the BTL's send() returned the
fast-path hint, the call to mca_pml_ob1_rndv_completion_request() decrements
sendreq->req_state by one, leaving it at -1.

This value of -1 keeps send_request_pml_complete_check() from completing the
request at the PML level.

The PML logic (in mca_pml_ob1_send_request_start_rdma) for the PUT operation
initializes req_state to "2" at pml_ob1_sendreq.c +791, but leaves req_state
at 0 for GET operations.
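
To make the accounting above concrete, here is a minimal, self-contained C model of the counter. It is only a sketch under stated assumptions, not the ob1 code: every `model_*` name is hypothetical, the completion check is assumed to require req_state == 0, and the PUT path is assumed to see two completion events to match its initial value of 2.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the send request.
 * This is NOT the real mca_pml_ob1_send_request_t. */
typedef enum { MODEL_PROTO_PUT, MODEL_PROTO_GET } model_proto_t;

typedef struct {
    int req_state; /* pending-event counter, as described in the message */
} model_sendreq_t;

/* Models the start of an RDMA send: req_state starts at 0 and is
 * raised to 2 only for PUT; GET leaves it at 0. */
void model_start_rdma(model_sendreq_t *req, model_proto_t proto)
{
    req->req_state = 0;
    if (proto == MODEL_PROTO_PUT) {
        req->req_state = 2;
    }
}

/* Models one completion event: the counter is decremented by one. */
void model_rndv_completion(model_sendreq_t *req)
{
    req->req_state -= 1;
}

/* Assumption: the request can complete only when the counter is exactly 0. */
bool model_complete_check(const model_sendreq_t *req)
{
    return req->req_state == 0;
}
```

In this model a PUT request started at 2 reaches 0 after two completion events and passes the check, while a GET request started at 0 drops to -1 after a single fast-path completion and never passes it.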

Please suggest.

Thanks

Mike.


Re: [OMPI devel] RFC: Rename several OMPI_* names to OPAL_*

2009-02-11 Thread Ralph Castain
Sounds reasonable to me - but again, let's commit it when we do the  
1.4 branch so maintaining 1.3 doesn't become impossible.



On Feb 10, 2009, at 2:38 PM, George Bosilca wrote:

These changes look fine to me. However, I would like to amend this
proposal to include splitting the config directory. Over the
last few months, I have come to know of several projects that use OPAL, and
they would like to use it as an independent part and not as a subset of ompi.
Therefore, I had to extract everything related to OPAL from the ompi
tree. While the source code is pretty well divided into sub-projects,
this is not the case for the m4 scripts in the config directory.


I would like to propose splitting the config directory into several
parts (opal/config, orte/config and ompi/config) and modifying the
autogen script to take them into account.


 Thanks,
   george.

On Feb 10, 2009, at 12:54 , Greg Koenig wrote:


RFC: Rename several OMPI_* names to OPAL_*

WHAT: Rename several #define values that encode the prefix "OMPI_" to
instead encode the prefix "OPAL_" throughout the entire Open MPI source code
tree.  Also, eliminate unnecessary #include lines from source code files
under the ".../ompi/mca/btl" subtree.

WHY: (1) These are general source code improvements that update #define
values to more accurately describe which layer the values belong to and remove
unnecessary dependencies within the source code; (2) These changes will help
with the effort to move the BTL code into an independent layer.

WHERE: 1.4 trunk

WHEN: Negotiable -- see below, but probably near split for 1.4
(No earlier than February 19, 2009)

Timeout: February 19, 2009



The proposed change involves renaming several #define values that encode the
prefix "OMPI_" to instead encode the prefix "OPAL_" throughout the entire
Open MPI source code tree.  These names are holdovers from when the three
existing layers of Open MPI were developed together prior to being split
apart.  Additionally, the proposed change eliminates a few unnecessary
#include lines in BTL source code files under the .../ompi/mca/btl subtree.


Specific modifications are detailed following this message text.  A script
to carry out these modifications is also attached to this message (gzipped
to pass unmolested through the ORNL e-mail server).

We believe these modifications improve the Open MPI source code by renaming
values such that they correspond to the Open MPI layer to which they most
closely belong, and that this improvement is itself of benefit to Open MPI.
These modifications will also aid our ongoing efforts to extract the BTL
code into a new layer ("ONET") that can be built with just direct dependence
on the OPAL layer.

Although these changes are simple string substitutions, they touch a fair
amount of code in the Open MPI tree.  Three people have tested these changes
at our site on various platforms and have not discovered any problems.
However, we recognize that some members of the community may have
input/feedback regarding testing and we remain open to suggestions related
to testing.
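
As an illustration of the kind of string substitution described above, here is a small C sketch. It is not the script attached to the RFC: the function name `rename_to_opal` and the table `rfc_names` are hypothetical, and the table holds only a handful of the names listed in this message. It matches whole identifiers rather than bare prefixes, so a name that is not on the list is copied through unchanged.

```c
#include <stddef.h>
#include <string.h>

/* A small subset of the names listed in the RFC (illustration only). */
static const char *const rfc_names[] = {
    "OMPI_NEED_C_BOOL",
    "OMPI_HAVE_WEAK_SYMBOLS",
    "OMPI_HAVE_VA_COPY",
    "OMPI_PTRDIFF_TYPE",
    "ompi_ptrdiff_t",
};

/* Hypothetical helper: copy `name` into `out`, replacing the leading
 * "OMPI"/"ompi" with "OPAL"/"opal" when `name` is in the table.
 * Returns 1 if renamed, 0 if copied unchanged, -1 if `out` is too small. */
int rename_to_opal(const char *name, char *out, size_t out_size)
{
    size_t len = strlen(name);
    size_t i;

    if (out_size < len + 1) {
        return -1;
    }
    memcpy(out, name, len + 1); /* includes the terminating NUL */

    for (i = 0; i < sizeof(rfc_names) / sizeof(rfc_names[0]); i++) {
        if (strcmp(name, rfc_names[i]) == 0) {
            /* Preserve the case of the original four-letter prefix. */
            memcpy(out, (name[0] == 'O') ? "OPAL" : "opal", 4);
            return 1;
        }
    }
    return 0;
}
```

With this sketch, `OMPI_HAVE_VA_COPY` becomes `OPAL_HAVE_VA_COPY` and `ompi_ptrdiff_t` becomes `opal_ptrdiff_t`, while an identifier outside the table keeps its `OMPI_` prefix. Whole-identifier matching is a deliberate choice here, since a blanket prefix replacement would also rename values that belong to the OMPI layer.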

One challenge that has been brought up regarding this RFC is that applying
patches and/or CMRs to the source code tree after the proposed changes are
performed will be more difficult.  To that end, the best opportunity to
apply the modifications proposed in this RFC seems to be in conjunction with
1.4.  (My understanding from the developer conference call this morning is
that there are a few other changes waiting for this switch as well.)  We are
open to suggestions about the best time to apply this RFC to avoid major
disruptions.


Specific changes follow:

* From .../configure.ac.
   * OMPI_NEED_C_BOOL
   * OMPI_HAVE_WEAK_SYMBOLS
   * OMPI_C_HAVE_WEAK_SYMBOLS
   * OMPI_USE_STDBOOL_H
   * OMPI_HAVE_SA_RESTART
   * OMPI_HAVE_VA_COPY
   * OMPI_HAVE_UNDERSCORE_VA_COPY
   * OMPI_PTRDIFF_TYPE
   * (also, ompi_ptrdiff_t)
   * OMPI_ALIGN_WORD_SIZE_INTEGERS
   * OMPI_WANT_LIBLTDL
   * (also, OMPI_ENABLE_DLOPEN_SUPPORT)
   * OMPI_STDC_HEADERS
   * OMPI_HAVE_SYS_TIME_H
   * OMPI_HAVE_LONG_LONG
   * OMPI_HAVE_SYS_SYNCH_H
   * OMPI_SIZEOF_BOOL
   * OMPI_SIZEOF_INT

* From .../config/ompi_check_attributes.m4.
   * OMPI_HAVE_ATTRIBUTE
   * (also, ompi_cv___attribute__)
   * OMPI_HAVE_ATTRIBUTE_ALIGNED
   * (also, ompi_cv___attribute__aligned)
   * OMPI_HAVE_ATTRIBUTE_ALWAYS_INLINE
   * (also, ompi_cv___attribute__always_inline)
   * OMPI_HAVE_ATTRIBUTE_COLD
   * (also, ompi_cv___attribute__cold)
   * OMPI_HAVE_ATTRIBUTE_CONST
   * (also, ompi_cv___attribute__const)
   * OMPI_HAVE_ATTRIBUTE_DEPRECATED
   * (also, ompi_cv___attribute__deprecated)
   * OMPI_HAVE_ATTRIBUTE_FORMAT
   * (also, ompi_cv___attribute__format)
   * OMPI_HAVE_ATTRIBUTE_HOT
   * (also, ompi_cv___attribute__hot)
   * OMPI_HAVE_ATTRIBUTE_MALLOC
   * (also, ompi_cv___attribute__malloc)
   * OMPI_HAVE_ATTRIBUTE_MAY_ALIAS
   * (also, ompi

Re: [OMPI devel] RFC: Rename several OMPI_* names to OPAL_*

2009-02-11 Thread Greg Koenig
This is a pretty good suggestion; I think it could be useful to what we're
trying to do with STCI, but I can see that it could be generally useful to
others as well.  As long as nobody objects on the grounds that this is too
much to put into a single RFC, I can work towards the goal of incorporating
this suggestion into what we've proposed below.

I suspect/know that a couple of the .m4 files in the current .../config/
subtree contain a mixture of both ompi_- and opal_-related variables, so some
non-zero amount of thought is probably required to make the right thing
happen with these.  I'm working on another project today, but will
investigate later this week.  If anybody has useful suggestions about how to
tackle this, I'm open.


On 2/10/09 4:38 PM, "George Bosilca"  wrote:

> These changes look fine to me. However, I would like to amend this
> proposal to include splitting the config directory. Over the
> last few months, I have come to know of several projects that use OPAL, and
> they would like to use it as an independent part and not as a subset of
> ompi. Therefore, I had to extract everything related to OPAL from the ompi
> tree. While the source code is pretty well divided into sub-projects, this
> is not the case for the m4 scripts in the config directory.
> 
> I would like to propose splitting the config directory into several
> parts (opal/config, orte/config and ompi/config) and modifying the
> autogen script to take them into account.
> 
>Thanks,
>  george.
> 

[OMPI devel] possible bugs and unexpected values in returned errors classes

2009-02-11 Thread Lisandro Dalcin
Below is a list of things I found by running the mpi4py test suite. I never
reported them before because some of them are not actually
errors, but I want to raise the discussion anyway.

- Likely bugs (according to my interpretation of the MPI standard)

1) When passing MPI_REQUEST_NULL, MPI_Request_free() does NOT fail.

2) When passing MPI_REQUEST_NULL, MPI_Cancel() does NOT fail.

3) When passing MPI_REQUEST_NULL, MPI_Request_get_status() does NOT fail.

4) When passing MPI_WIN_NULL, MPI_Win_get_errhandler() and
MPI_Win_set_errhandler() do NOT fail.


- Unexpected error classes (at least for me)

1) When passing MPI_COMM_NULL, MPI_Comm_get_errhandler() fails with
MPI_ERR_ARG. I would expect MPI_ERR_COMM.

2) MPI_Type_free() fails with MPI_ERR_INTERN when passing predefined
datatypes like MPI_INT or MPI_FLOAT. I would expect MPI_ERR_TYPE.


- Controversial (I'm even fine with the current behavior)

1) MPI_Info_get_nthkey(info, n) returns MPI_ERR_INFO_KEY when "n" is
larger than the number of keys. Perhaps MPI_ERR_ARG would be more
appropriate? A possible rationale is that the error is not
related to the contents of a 'key' string, but to an out-of-range value
for "n".


That's all. Sorry for being so pedantic :-) and not offering help for
the patches, but I'm really busy.


-- 
Lisandro Dalcín
---
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594