FYI - for anyone interested.

> Begin forwarded message:
> 
> From: Ralph Castain <r...@open-mpi.org>
> Subject: PMIx 2.0 planning
> Date: June 1, 2015 at 7:59:50 AM PDT
> To: pmix-de...@open-mpi.org
> 
> Hi folks
> 
> With v1.0 nearly out the door, I’d like to invite discussion of v2.0. Our 
> initial plan is to release 2.0 in time for SC15, with the expectation that we 
> may not have all the features implemented yet. Whether we add them during 
> the 2.0 series or delay some to 3.0 remains TBD.
> 
> The initial thought is to focus 2.0 on the following areas - please note that 
> we would deeply appreciate the involvement of each relevant community, so 
> please feel free to forward this note and/or reach out to relevant 
> representatives:
> 
> 
> 1. Performance improvements
>    * dynamic spawn/reap of listening threads to meet a target of 
> completing 1000 client connections in under 1 second
>    * shared memory use to reduce memory footprint (Elena has already sent out 
> some thoughts on this)
> 
> 
> 2. Fault response support
> We currently provide application notification of faults (existing and 
> impending) that includes information on the impacted processes. However, the 
> response is currently limited to calling PMIx_Abort - i.e., the app can take 
> internal action, but the only request it can make of the RM is to abort. We 
> do allow for abort of specific procs as opposed to the entire job, but we’d 
> like to support a broader set of options. For example, the app might request 
> a coordinated checkpoint, ask for replacement nodes to be allocated, or 
> request immediate restart at a reduced size.
> 
> 
> 3. File system support
> We would like to begin supporting file positioning directives - e.g., 
> hot/warm/cold data movement, persistence requests to maintain files and/or 
> shared memory regions across job steps, and burst buffer management.
> 
> 
> 4. Network/fabric support
> The existing notification capability can be used to notify applications of 
> network issues. However, interest has been expressed in further interactions that would 
> allow an application to specify quality of service and security requirements, 
> request information on network topology, etc.
> 
> 
> 5. Power directives
> On very large scale systems, it is expected that some form of power 
> management will be required or desired. Most of that happens at allocation 
> request time, but there may be directives an app would want to 
> pass during execution. We’re open to suggestions.
> 
> 
> Any other topics of interest are always welcome!
> Ralph
> 
