[OMPI devel] Potential issue with PERUSE_COMM_MSG_MATCH_POSTED_REQ event called for unexpected matches
I thought I would run this by the group before trying to unravel the code and figure out how to fix the problem. From some experimentation, it looks to me like when a process matches an unexpected message, the PERUSE framework incorrectly fires a PERUSE_COMM_MSG_MATCH_POSTED_REQ event in addition to the PERUSE_COMM_REQ_MATCH_UNEX event. I believe this is wrong: the former event should not be fired in this case. If that assumption is true, I think the problem arises because the PERUSE_COMM_MSG_MATCH_POSTED_REQ event is fired in mca_pml_ob1_recv_request_progress, which is called by mca_pml_ob1_recv_request_match_specific when a match of an unexpected message has occurred. Should the PERUSE_COMM_MSG_MATCH_POSTED_REQ event instead be moved to a routine more centered on the posted queue, something like mca_pml_ob1_recv_frag_match? Suggestions... thoughts? --td
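For anyone who wants to observe this from the PERUSE side rather than by instrumenting ob1 directly, a small harness along the following lines should do it. This is only a sketch: it assumes the PERUSE 2.0 entry points as exposed by Open MPI's peruse.h (PERUSE_Init, PERUSE_Query_event, PERUSE_Event_comm_register, PERUSE_Event_activate, and the peruse_comm_spec_t callback signature), so the prototypes should be checked against the tree, and error handling is omitted. If traffic is arranged so that every receive matches an unexpected message, the posted-queue counter should stay at zero; the behaviour described above would show it incrementing alongside the unexpected counter.

/* peruse_match_check.c - rough sketch, not a polished test.
 * Counts how often the two PERUSE match events fire on MPI_COMM_WORLD.
 * Prototype names are taken from peruse.h / the PERUSE 2.0 spec and
 * should be verified before use. */
#include <stdio.h>
#include <mpi.h>
#include <peruse.h>

static long posted_matches = 0;   /* PERUSE_COMM_MSG_MATCH_POSTED_REQ */
static long unex_matches   = 0;   /* PERUSE_COMM_REQ_MATCH_UNEX       */

/* Generic callback: 'param' points at the counter to bump. */
static int count_cb(peruse_event_h event_h, MPI_Aint unique_id,
                    peruse_comm_spec_t *spec, void *param)
{
    (void)event_h; (void)unique_id; (void)spec;
    ++*(long *)param;
    return MPI_SUCCESS;
}

/* Register and activate one named event on a communicator
 * (return codes ignored for brevity). */
static void watch_event(const char *name, MPI_Comm comm, long *counter)
{
    int event;
    peruse_event_h eh;

    PERUSE_Query_event(name, &event);
    PERUSE_Event_comm_register(event, comm, count_cb, counter, &eh);
    PERUSE_Event_activate(eh);
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    PERUSE_Init();

    watch_event("PERUSE_COMM_MSG_MATCH_POSTED_REQ", MPI_COMM_WORLD,
                &posted_matches);
    watch_event("PERUSE_COMM_REQ_MATCH_UNEX", MPI_COMM_WORLD,
                &unex_matches);

    /* ... drive traffic that is guaranteed to arrive unexpectedly,
     * e.g. the sender sends, both ranks synchronize, and only then
     * does the receiver post its MPI_Recv ... */

    printf("posted-queue matches: %ld, unexpected matches: %ld\n",
           posted_matches, unex_matches);

    MPI_Finalize();
    return 0;
}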
Re: [OMPI devel] [RFC] Runtime Services Layer
Just returned from vacation... sorry for the delayed response.

In the past, I have expressed three concerns about the RSL. I'll aggregate them here for those who haven't seen them before, and apologize in advance for the long note. For those wanting it in short, the (somewhat related) concerns are:

1. What problem are we really trying to solve?
2. Who is going to maintain old RTE versions, and why?
3. Are we constraining ourselves from further improvements in startup performance?

My bottom-line recommendation: I have no philosophical issue with the RSL concept. However, I recommend holding off until the next version of ORTE is completed and then re-evaluating how valuable the RSL might be, as that next version will include memory footprint reduction and framework consolidation that may yield much of the RSL's value without the extra work.

Long version:

1. What problem are we really trying to solve?

If the RSL is intended to solve the Cray support problem (where the Cray OS really just wants to see OMPI, not ORTE), then it may have some value. The issue to date has revolved around the difficulty of maintaining the Cray port in the face of changes to ORTE: as new frameworks are added, special components for the Cray also need to be created to provide a "do-nothing" capability. In addition, the Cray is memory constrained, and the ORTE library occupies considerable space while providing very little functionality. The degree of value provided by the RSL will therefore depend somewhat on the efficacy of the changes in development within ORTE. Those changes will, among other things, significantly consolidate and reduce the number of frameworks, and reduce the memory footprint. The expectation is that the result will require only a single CNOS component in one framework. It isn't clear, therefore, that the RSL will provide significant value in that environment.

If the RSL is intended to aid in ORTE development, as hinted at in the RFC, then I believe that is questionable. Developing ORTE in a tmp branch has proven reasonably effective, as changes to the MPI layer are largely invisible to ORTE. Creating another layer in the system that would also have to be maintained seems like a non-productive way of addressing any problems in that area.

If the RSL is intended as a means of "freezing" the MPI-RTE interface, then I believe we could better attain that objective by simply defining a set of requirements for the RTE. As I'll note below, freezing the interface at an API level could negatively impact other Open MPI objectives.

2. Who is going to maintain old RTE versions, and why?

It isn't clear to me why anyone would want to do this - are we seriously proposing that we maintain support for the ORTE layer that shipped with Open MPI 1.0?? Can someone explain why we would want to do that? Given what I know of ORTE, it seems questionable that, for example, one could have RSL components for both the ORTE that shipped with Open MPI 1.0 and the ORTE that is currently in the trunk without writing a great deal of RSL code. Creating an RSL component for the ORTE intended for Open MPI 1.3 would seem like even greater work, as the flow of control is very different (see below). I'm sure one could overcome this with considerable code in the respective RSL components - but I have difficulty understanding the value in doing all that coding. Can someone explain that, and can we identify the personnel (and/or their organization) willing to perform that function?

3. Are we constraining ourselves from further improvements in startup performance?

This is my biggest area of concern. The RSL has been proposed as an API-level definition. However, the MPI-RTE interaction really is defined in terms of a flow of control: although each point of interaction is instantiated as an API, what happens at that point is not independent of all prior interactions.

As an example of my concern, consider what we are currently doing with ORTE. The latest change in requirements involves the need to significantly improve startup time, reduce memory footprint, and reduce ORTE complexity. What we are doing to meet that requirement is to review the delineation of responsibilities between the MPI and RTE layers. The current delineation evolved over time, with many of the decisions made at a very early point in the program. For example, we instituted RTE-level stage gates in the MPI layer because, at the time they were needed, the MPI developers didn't want to deal with them on their side (e.g., ensuring that failure of one proc wouldn't hang the system). Given today's level of maturity in the MPI layer, we are now planning to move the stage gates to the MPI layer, implemented as an "all-to-all" - this will remove several thousand lines of code from ORTE and make it easier for the MPI layer to operate in non-ORTE environments. Similar efforts are underway to reduce ORTE involvement in the modex operation and other parts of
[OMPI devel] Orted problem
Hi, I am having a problem with the latest version of Open MPI. In some executions (roughly 1 in 100) the following message is printed:

[tegasaste:01617] [NO-NAME] ORTE_ERROR_LOG: File read failure in file util/universe_setup_file_io.c at line 123

It seems as if it tries to read the universe file and finds nothing in it. If I look at the file afterwards, it contains correct information. It looks as though the file has been created, but not yet filled, when the read is executed.

The output of the ompi_info command:

Open MPI: 1.2.3
Open MPI SVN revision: r15136
Open RTE: 1.2.3
Open RTE SVN revision: r15136
OPAL: 1.2.3
OPAL SVN revision: r15136
Prefix: /soft/openmpi1.2.3
Configured architecture: i686-pc-linux-gnu
Configured by: csegura
Configured on: Wed Aug 22 04:25:19 WEST 2007
Configure host: tegasaste
Built by: csegura
Built on: mié ago 22 04:38:34 WEST 2007
Built host: tegasaste
C bindings: yes
C++ bindings: yes
Fortran77 bindings: yes (all)
Fortran90 bindings: yes
Fortran90 bindings size: small
C compiler: gcc
C compiler absolute: /usr/bin/gcc
C++ compiler: g++
C++ compiler absolute: /usr/bin/g++
Fortran77 compiler: gfortran
Fortran77 compiler abs: /usr/bin/gfortran
Fortran90 compiler: gfortran
Fortran90 compiler abs: /usr/bin/gfortran
C profiling: yes
C++ profiling: yes
Fortran77 profiling: yes
Fortran90 profiling: yes
C++ exceptions: no
Thread support: posix (mpi: no, progress: no)
Internal debug support: no
MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
libltdl support: yes
Heterogeneous support: yes
mpirun default --prefix: no
MCA backtrace: execinfo (MCA v1.0, API v1.0, Component v1.2.3)
MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component v1.2.3)
MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.2.3)
MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.2.3)
MCA timer: linux (MCA v1.0, API v1.0, Component v1.2.3)
MCA installdirs: env (MCA v1.0, API v1.0, Component v1.2.3)
MCA installdirs: config (MCA v1.0, API v1.0, Component v1.2.3)
MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
MCA coll: basic (MCA v1.0, API v1.0, Component v1.2.3)
MCA coll: self (MCA v1.0, API v1.0, Component v1.2.3)
MCA coll: sm (MCA v1.0, API v1.0, Component v1.2.3)
MCA coll: tuned (MCA v1.0, API v1.0, Component v1.2.3)
MCA io: romio (MCA v1.0, API v1.0, Component v1.2.3)
MCA mpool: rdma (MCA v1.0, API v1.0, Component v1.2.3)
MCA mpool: sm (MCA v1.0, API v1.0, Component v1.2.3)
MCA pml: cm (MCA v1.0, API v1.0, Component v1.2.3)
MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.2.3)
MCA bml: r2 (MCA v1.0, API v1.0, Component v1.2.3)
MCA rcache: vma (MCA v1.0, API v1.0, Component v1.2.3)
MCA btl: openib (MCA v1.0, API v1.0.1, Component v1.2.3)
MCA btl: self (MCA v1.0, API v1.0.1, Component v1.2.3)
MCA btl: sm (MCA v1.0, API v1.0.1, Component v1.2.3)
MCA btl: tcp (MCA v1.0, API v1.0.1, Component v1.0)
MCA topo: unity (MCA v1.0, API v1.0, Component v1.2.3)
MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.2.3)
MCA errmgr: hnp (MCA v1.0, API v1.3, Component v1.2.3)
MCA errmgr: orted (MCA v1.0, API v1.3, Component v1.2.3)
MCA errmgr: proxy (MCA v1.0, API v1.3, Component v1.2.3)
MCA gpr: null (MCA v1.0, API v1.0, Component v1.2.3)
MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.2.3)
MCA gpr: replica (MCA v1.0, API v1.0, Component v1.2.3)
MCA iof: proxy (MCA v1.0, API v1.0, Component v1.2.3)
MCA iof: svc (MCA v1.0, API v1.0, Component v1.2.3)
MCA ns: proxy (MCA v1.0, API v2.0, Component v1.2.3)
MCA ns: replica (MCA v1.0, API v2.0, Component v1.2.3)
MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
MCA ras: dash_host (MCA v1.0, API v1.3, Component v1.2.3)
MCA ras: gridengine (MCA v1.0, API v1.3, Component v1.2.3)
MCA ras: localhost (MCA v1.0, API v1.3, Component v1.2.3)
MCA ras: slurm (MCA v1.0, API v1.3, Component v1.2.3)
MCA rds: hostfile (MCA v1.0, API v1.3, Component v1.2.3)
MCA rds: proxy (MCA v1.0, API v1.3, Component v1.2.3)
MCA rds: resfile (MCA v1.
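To make the suspected failure mode concrete, here is a small standalone sketch (not Open MPI code; the file name, timing, and roles are made up) of a writer that creates a file and only fills it a moment later, while a reader that arrives inside that window finds the file empty, even though the file looks fine when inspected afterwards:

/* race_sketch.c - standalone illustration of a "created but not yet
 * filled" file race; purely hypothetical, not taken from Open MPI. */
#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    const char *path = "universe_setup_demo.txt";

    if (fork() == 0) {              /* writer: plays the role of the process
                                     * that creates the universe file        */
        FILE *w = fopen(path, "w"); /* file now exists, but is still empty   */
        if (w != NULL) {
            sleep(1);               /* window in which a reader sees nothing */
            fprintf(w, "contact info\n");
            fclose(w);
        }
        _exit(0);
    }

    usleep(100000);                 /* reader: arrives inside the window     */
    FILE *r = fopen(path, "r");
    char line[128];
    if (r == NULL || fgets(line, sizeof line, r) == NULL)
        printf("read failure: file exists but has no content yet\n");
    else
        printf("read: %s", line);
    if (r != NULL) fclose(r);

    wait(NULL);
    return 0;
}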