FYI - for anyone interested.

> Begin forwarded message:
>
> From: Ralph Castain <r...@open-mpi.org>
> Subject: PMIx 2.0 planning
> Date: June 1, 2015 at 7:59:50 AM PDT
> To: pmix-de...@open-mpi.org
>
> Hi folks
>
> With v1.0 nearly out the door, I’d like to invite discussion for v2.0. Our
> initial plan is to release 2.0 in time for SC15, with the expectation that we
> may not have all the features implemented yet - whether we add them during
> the 2.0 series, or delay some to 3.0 remains TBD.
>
> The initial thought is to focus 2.0 in the following areas - please note that
> we would deeply appreciate the involvement of each relevant community, so
> please feel free to forward this note and/or reach out to relevant
> representatives:
>
>
> 1. Performance improvements
> * dynamic spawn/reap of listening threads to achieve target performance of
> completing 1000 client connections in < 1 sec
> * shared memory use to reduce memory footprint (Elena has already sent out
> some thoughts on this)
>
>
> 2. Fault response support
> We currently provide application notification of faults (existing and
> impending) that includes information on the impacted processes. However, the
> response is currently limited to calling PMIx_Abort - i.e., the app can take
> internal action, but the only request it can make of the RM is to abort. We
> do allow for abort of specific procs as opposed to the entire job, but we’d
> like to support a broader set of options. For example, the app might request
> a coordinated checkpoint, ask for replacement nodes to be allocated, or
> request immediate restart at a reduced size.
>
>
> 3. File system support
> We would like to begin supporting file positioning directives - e.g.,
> hot/warm/cold data movement, persistence requests to maintain files and/or
> shared memory regions across job steps, and burst buffer management.
>
>
> 4. Network/fabric support
> The existing notification capability can be used to notify of network issues.
> However, there has been interest expressed in further interactions that would
> allow an application to specify quality of service and security requirements,
> request information on network topology, etc.
>
>
> 5. Power directives
> On very large scale systems, it is expected that some form of power
> management will be required or desired. Most of that happens at allocation
> request time, but there may be some possible directives an app could want to
> pass during execution. We’re open to suggestion.
>
>
> Any other topics of interest are always welcome!
> Ralph
>