Thanks! Much appreciate the help

On Aug 26, 2011, at 6:32 PM, George Bosilca wrote:

> I see. This should be fixed in r25098. Thanks for your patience.
> 
>  george.
> 
> On Aug 26, 2011, at 19:47 , Ralph Castain wrote:
> 
>> Has nothing to do with version, George - it's a problem of ORTE_ENABLE_EPOCH 
>> not being included in an AC_DEFINE. It is solely defined via AM_CONDITIONAL, 
>> but then used in .h files - which is simply wrong.
>> 
>> Please fix it.
>> 
>> 
>> On Aug 26, 2011, at 5:41 PM, George Bosilca wrote:
>> 
>>> We can't reproduce this. It compiles and runs without troubles on our macs. 
>>> However, it might depend on the Mac OS X version, we recently moved to Lion.
>>> 
>>> Thanks,
>>>  george.
>>> 
>>> On Aug 26, 2011, at 19:19 , Ralph Castain wrote:
>>> 
>>>> Hate to say this, but the trunk is broken - won't build on Mac with that 
>>>> disabled. I'll try to dig into it later :-(
>>>> 
>>>> 
>>>> On Aug 26, 2011, at 4:18 PM, Wesley Bland wrote:
>>>> 
>>>>> The epoch and resilient rote code is now macro'd away. To enable use
>>>>> 
>>>>> --enable-resilient-orte
>>>>> 
>>>>> which defines:
>>>>> 
>>>>> ORTE_ENABLE_EPOCH
>>>>> ORTE_RESIL_ORTE
>>>>> 
>>>>> --
>>>>> 
>>>>> Wesley
>>>>> 
>>>>> On Aug 26, 2011, at 6:16 PM, wbl...@osl.iu.edu wrote:
>>>>> 
>>>>>> Author: wbland
>>>>>> Date: 2011-08-26 18:16:14 EDT (Fri, 26 Aug 2011)
>>>>>> New Revision: 25093
>>>>>> URL: https://svn.open-mpi.org/trac/ompi/changeset/25093
>>>>>> 
>>>>>> Log:
>>>>>> By popular demand the epoch code is now disabled by default. 
>>>>>> 
>>>>>> To enable the epochs and the resilient orte code, use the configure flag:
>>>>>> 
>>>>>> --enable-resilient-orte
>>>>>> 
>>>>>> This will define both:
>>>>>> 
>>>>>> ORTE_ENABLE_EPOCH
>>>>>> ORTE_RESIL_ORTE
>>>>>> 
>>>>>> Text files modified: 
>>>>>> trunk/ompi/mca/btl/openib/connect/btl_openib_connect_xoob.c  |    12 
>>>>>> ++++                                    
>>>>>> trunk/ompi/mca/coll/sm2/coll_sm2_module.c                    |     3     
>>>>>>                                     
>>>>>> trunk/ompi/mca/crcp/bkmrk/crcp_bkmrk_pml.c                   |    49 
>>>>>> ++++++++----------                      
>>>>>> trunk/ompi/mca/dpm/orte/dpm_orte.c                           |     2     
>>>>>>                                     
>>>>>> trunk/ompi/mca/pml/bfo/pml_bfo_failover.c                    |    10 +-- 
>>>>>>                                     
>>>>>> trunk/ompi/mca/pml/bfo/pml_bfo_hdr.h                         |     6 --  
>>>>>>                                     
>>>>>> trunk/ompi/proc/proc.c                                       |     6 +-  
>>>>>>                                     
>>>>>> trunk/opal/config/opal_configure_options.m4                  |     8 +++ 
>>>>>>                                     
>>>>>> trunk/orte/include/orte/types.h                              |    24 
>>>>>> +++++++++                               
>>>>>> trunk/orte/mca/db/daemon/db_daemon.c                         |     2     
>>>>>>                                     
>>>>>> trunk/orte/mca/errmgr/app/errmgr_app.c                       |    19 
>>>>>> ++++++-                                 
>>>>>> trunk/orte/mca/errmgr/base/errmgr_base_fns.c                 |    12 
>>>>>> ++--                                    
>>>>>> trunk/orte/mca/errmgr/base/errmgr_base_tool.c                |     6 +-  
>>>>>>                                     
>>>>>> trunk/orte/mca/errmgr/hnp/errmgr_hnp.c                       |    99 
>>>>>> +++++++++++++++++++++++++++------------ 
>>>>>> trunk/orte/mca/errmgr/hnp/errmgr_hnp_autor.c                 |     6 +-  
>>>>>>                                     
>>>>>> trunk/orte/mca/errmgr/hnp/errmgr_hnp_crmig.c                 |     6 +-  
>>>>>>                                     
>>>>>> trunk/orte/mca/errmgr/orted/errmgr_orted.c                   |    71 
>>>>>> +++++++++++++++++++++-------            
>>>>>> trunk/orte/mca/ess/alps/ess_alps_module.c                    |     4     
>>>>>>                                     
>>>>>> trunk/orte/mca/ess/base/base.h                               |     4 +   
>>>>>>                                     
>>>>>> trunk/orte/mca/ess/base/ess_base_select.c                    |    14 
>>>>>> ++---                                   
>>>>>> trunk/orte/mca/ess/env/ess_env_module.c                      |     3     
>>>>>>                                     
>>>>>> trunk/orte/mca/ess/ess.h                                     |     4 +   
>>>>>>                                     
>>>>>> trunk/orte/mca/ess/generic/ess_generic_module.c              |     6 +-  
>>>>>>                                     
>>>>>> trunk/orte/mca/ess/hnp/ess_hnp_module.c                      |     2     
>>>>>>                                     
>>>>>> trunk/orte/mca/ess/lsf/ess_lsf_module.c                      |     3     
>>>>>>                                     
>>>>>> trunk/orte/mca/ess/singleton/ess_singleton_module.c          |     2     
>>>>>>                                     
>>>>>> trunk/orte/mca/ess/slave/ess_slave_module.c                  |     3     
>>>>>>                                     
>>>>>> trunk/orte/mca/ess/slurm/ess_slurm_module.c                  |     3     
>>>>>>                                     
>>>>>> trunk/orte/mca/ess/slurmd/ess_slurmd_module.c                |     4     
>>>>>>                                     
>>>>>> trunk/orte/mca/ess/tm/ess_tm_module.c                        |     2     
>>>>>>                                     
>>>>>> trunk/orte/mca/filem/rsh/filem_rsh_module.c                  |     6 +-  
>>>>>>                                     
>>>>>> trunk/orte/mca/grpcomm/base/grpcomm_base_coll.c              |    21 
>>>>>> ++-----                                 
>>>>>> trunk/orte/mca/grpcomm/hier/grpcomm_hier_module.c            |     8 +-  
>>>>>>                                     
>>>>>> trunk/orte/mca/iof/base/base.h                               |     8 +-  
>>>>>>                                     
>>>>>> trunk/orte/mca/iof/base/iof_base_open.c                      |     2     
>>>>>>                                     
>>>>>> trunk/orte/mca/iof/hnp/iof_hnp.c                             |     7 +-  
>>>>>>                                     
>>>>>> trunk/orte/mca/iof/hnp/iof_hnp_receive.c                     |     6 +-  
>>>>>>                                     
>>>>>> trunk/orte/mca/iof/orted/iof_orted.c                         |     2     
>>>>>>                                     
>>>>>> trunk/orte/mca/odls/base/odls_base_default_fns.c             |     7 +-  
>>>>>>                                     
>>>>>> trunk/orte/mca/odls/base/odls_base_open.c                    |     5 -   
>>>>>>                                     
>>>>>> trunk/orte/mca/odls/base/odls_base_state.c                   |     6 +-  
>>>>>>                                     
>>>>>> trunk/orte/mca/oob/tcp/oob_tcp_msg.c                         |     2     
>>>>>>                                     
>>>>>> trunk/orte/mca/oob/tcp/oob_tcp_peer.c                        |     5 ++  
>>>>>>                                     
>>>>>> trunk/orte/mca/plm/base/plm_base_jobid.c                     |     4     
>>>>>>                                     
>>>>>> trunk/orte/mca/plm/base/plm_base_launch_support.c            |     3     
>>>>>>                                     
>>>>>> trunk/orte/mca/plm/base/plm_base_orted_cmds.c                |     8 +-- 
>>>>>>                                     
>>>>>> trunk/orte/mca/plm/base/plm_base_receive.c                   |     7 ++  
>>>>>>                                     
>>>>>> trunk/orte/mca/plm/base/plm_base_rsh_support.c               |     4 +   
>>>>>>                                     
>>>>>> trunk/orte/mca/rmaps/base/rmaps_base_support_fns.c           |    23 
>>>>>> +++++----                               
>>>>>> trunk/orte/mca/rmaps/rank_file/rmaps_rank_file.c             |     3     
>>>>>>                                     
>>>>>> trunk/orte/mca/rmaps/seq/rmaps_seq.c                         |     3     
>>>>>>                                     
>>>>>> trunk/orte/mca/rmcast/base/rmcast_base_open.c                |     6 +-  
>>>>>>                                     
>>>>>> trunk/orte/mca/rmcast/tcp/rmcast_tcp.c                       |     4     
>>>>>>                                     
>>>>>> trunk/orte/mca/rmcast/udp/rmcast_udp.c                       |     4     
>>>>>>                                     
>>>>>> trunk/orte/mca/rml/base/rml_base_components.c                |     5 +   
>>>>>>                                     
>>>>>> trunk/orte/mca/rml/rml_types.h                               |     6 +   
>>>>>>                                     
>>>>>> trunk/orte/mca/routed/base/routed_base_components.c          |     6 +-  
>>>>>>                                     
>>>>>> trunk/orte/mca/routed/base/routed_base_register_sync.c       |     4 +   
>>>>>>                                     
>>>>>> trunk/orte/mca/routed/binomial/routed_binomial.c             |    54 
>>>>>> ++++++++++++---------                   
>>>>>> trunk/orte/mca/routed/cm/routed_cm.c                         |    19 
>>>>>> +++----                                 
>>>>>> trunk/orte/mca/routed/direct/routed_direct.c                 |     3     
>>>>>>                                     
>>>>>> trunk/orte/mca/routed/linear/routed_linear.c                 |    17 
>>>>>> +++---                                  
>>>>>> trunk/orte/mca/routed/radix/routed_radix.c                   |    22 
>>>>>> ++++----                                
>>>>>> trunk/orte/mca/routed/slave/routed_slave.c                   |     6 +-  
>>>>>>                                     
>>>>>> trunk/orte/mca/sensor/file/sensor_file.c                     |     2     
>>>>>>                                     
>>>>>> trunk/orte/mca/snapc/base/snapc_base_fns.c                   |     4     
>>>>>>                                     
>>>>>> trunk/orte/mca/snapc/full/snapc_full_global.c                |    12 
>>>>>> ++--                                    
>>>>>> trunk/orte/mca/snapc/full/snapc_full_local.c                 |     6 +-  
>>>>>>                                     
>>>>>> trunk/orte/mca/snapc/full/snapc_full_module.c                |     4     
>>>>>>                                     
>>>>>> trunk/orte/mca/sstore/base/sstore_base_fns.c                 |     6 +-  
>>>>>>                                     
>>>>>> trunk/orte/mca/sstore/central/sstore_central_global.c        |     3     
>>>>>>                                     
>>>>>> trunk/orte/mca/sstore/central/sstore_central_local.c         |     6 +-  
>>>>>>                                     
>>>>>> trunk/orte/mca/sstore/stage/sstore_stage_global.c            |     7 +-  
>>>>>>                                     
>>>>>> trunk/orte/mca/sstore/stage/sstore_stage_local.c             |    12 
>>>>>> ++--                                    
>>>>>> trunk/orte/orted/orted_comm.c                                |    20 
>>>>>> ++++----                                
>>>>>> trunk/orte/orted/orted_main.c                                |     7 +-  
>>>>>>                                     
>>>>>> trunk/orte/runtime/data_type_support/orte_dt_compare_fns.c   |     4 +   
>>>>>>                                     
>>>>>> trunk/orte/runtime/data_type_support/orte_dt_copy_fns.c      |     4 +   
>>>>>>                                     
>>>>>> trunk/orte/runtime/data_type_support/orte_dt_packing_fns.c   |     6 ++  
>>>>>>                                     
>>>>>> trunk/orte/runtime/data_type_support/orte_dt_print_fns.c     |    19 
>>>>>> +++++++                                 
>>>>>> trunk/orte/runtime/data_type_support/orte_dt_size_fns.c      |     2     
>>>>>>                                     
>>>>>> trunk/orte/runtime/data_type_support/orte_dt_support.h       |    11 
>>>>>> ++++                                    
>>>>>> trunk/orte/runtime/data_type_support/orte_dt_unpacking_fns.c |    10 +++ 
>>>>>>                                     
>>>>>> trunk/orte/runtime/orte_data_server.c                        |     2     
>>>>>>                                     
>>>>>> trunk/orte/runtime/orte_globals.c                            |     4 +   
>>>>>>                                     
>>>>>> trunk/orte/runtime/orte_init.c                               |     9 +++ 
>>>>>>                                     
>>>>>> trunk/orte/runtime/orte_wait.h                               |     6 +-  
>>>>>>                                     
>>>>>> trunk/orte/test/system/oob_stress.c                          |     3     
>>>>>>                                     
>>>>>> trunk/orte/test/system/orte_ring.c                           |     6 -   
>>>>>>                                     
>>>>>> trunk/orte/test/system/orte_spawn.c                          |     4     
>>>>>>                                     
>>>>>> trunk/orte/tools/orte-ps/orte-ps.c                           |    10 +++ 
>>>>>>                                     
>>>>>> trunk/orte/tools/orte-top/orte-top.c                         |     2     
>>>>>>                                     
>>>>>> trunk/orte/util/comm/comm.c                                  |     7 ++  
>>>>>>                                     
>>>>>> trunk/orte/util/comm/comm.h                                  |     5 +   
>>>>>>                                     
>>>>>> trunk/orte/util/hnp_contact.c                                |     3     
>>>>>>                                     
>>>>>> trunk/orte/util/name_fns.c                                   |    47 
>>>>>> ++++++++++++++----                      
>>>>>> trunk/orte/util/name_fns.h                                   |    30 
>>>>>> ++++++++++++                            
>>>>>> trunk/orte/util/nidmap.c                                     |    13 
>>>>>> ++++                                    
>>>>>> trunk/orte/util/nidmap.h                                     |    11 
>>>>>> ++++                                    
>>>>>> trunk/orte/util/proc_info.c                                  |    14 
>>>>>> ++++-                                   
>>>>>> trunk/test/util/orte_session_dir.c                           |     2     
>>>>>>                                     
>>>>>> 101 files changed, 652 insertions(+), 362 deletions(-)
>>>>>> 
>>>>>> Modified: trunk/ompi/mca/btl/openib/connect/btl_openib_connect_xoob.c
>>>>>> ==============================================================================
>>>>>> --- trunk/ompi/mca/btl/openib/connect/btl_openib_connect_xoob.c  
>>>>>> (original)
>>>>>> +++ trunk/ompi/mca/btl/openib/connect/btl_openib_connect_xoob.c  
>>>>>> 2011-08-26 18:16:14 EDT (Fri, 26 Aug 2011)
>>>>>> @@ -693,8 +693,16 @@
>>>>>> bool found = false;
>>>>>> 
>>>>>> BTL_VERBOSE(("Searching for ep and proc with follow parameters:"
>>>>>> -                "jobid %d, vpid %d, epoch %d, sid %" PRIx64 ", lid %d",
>>>>>> -                process_name->jobid, process_name->vpid, 
>>>>>> process_name->epoch, subnet_id, lid));
>>>>>> +                "jobid %d, vpid %d, "
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>> +                "epoch %d, "
>>>>>> +#endif
>>>>>> +                "sid %" PRIx64 ", lid %d",
>>>>>> +                process_name->jobid, process_name->vpid, 
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>> +                process_name->epoch, 
>>>>>> +#endif
>>>>>> +                subnet_id, lid));
>>>>>> /* find ibproc */
>>>>>> OPAL_THREAD_LOCK(&mca_btl_openib_component.ib_lock);
>>>>>> for (ib_proc = (mca_btl_openib_proc_t*)
>>>>>> 
>>>>>> Modified: trunk/ompi/mca/coll/sm2/coll_sm2_module.c
>>>>>> ==============================================================================
>>>>>> --- trunk/ompi/mca/coll/sm2/coll_sm2_module.c    (original)
>>>>>> +++ trunk/ompi/mca/coll/sm2/coll_sm2_module.c    2011-08-26 18:16:14 EDT 
>>>>>> (Fri, 26 Aug 2011)
>>>>>> @@ -1208,7 +1208,8 @@
>>>>>>        peer = OBJ_NEW(orte_namelist_t);
>>>>>>        peer->name.jobid = 
>>>>>> comm->c_local_group->grp_proc_pointers[i]->proc_name.jobid;
>>>>>>        peer->name.vpid =  
>>>>>> comm->c_local_group->grp_proc_pointers[i]->proc_name.vpid;
>>>>>> -            peer->name.epoch = 
>>>>>> comm->c_local_group->grp_proc_pointers[i]->proc_name.epoch;
>>>>>> +            
>>>>>> ORTE_EPOCH_SET(peer->name.epoch,comm->c_local_group->grp_proc_pointers[i]->proc_name.epoch);
>>>>>> +
>>>>>>        opal_list_append(&peers, &peer->item);
>>>>>>    }
>>>>>>    /* prepare send data */
>>>>>> 
>>>>>> Modified: trunk/ompi/mca/crcp/bkmrk/crcp_bkmrk_pml.c
>>>>>> ==============================================================================
>>>>>> --- trunk/ompi/mca/crcp/bkmrk/crcp_bkmrk_pml.c   (original)
>>>>>> +++ trunk/ompi/mca/crcp/bkmrk/crcp_bkmrk_pml.c   2011-08-26 18:16:14 EDT 
>>>>>> (Fri, 26 Aug 2011)
>>>>>> @@ -702,7 +702,7 @@
>>>>>> void 
>>>>>> ompi_crcp_bkmrk_pml_peer_ref_construct(ompi_crcp_bkmrk_pml_peer_ref_t 
>>>>>> *peer_ref) {
>>>>>> peer_ref->proc_name.jobid  = ORTE_JOBID_INVALID;
>>>>>> peer_ref->proc_name.vpid   = ORTE_VPID_INVALID;
>>>>>> -    peer_ref->proc_name.epoch  = ORTE_EPOCH_MIN;
>>>>>> +    ORTE_EPOCH_SET(peer_ref->proc_name.epoch,ORTE_EPOCH_MIN);
>>>>>> 
>>>>>> OBJ_CONSTRUCT(&peer_ref->send_list,       opal_list_t);
>>>>>> OBJ_CONSTRUCT(&peer_ref->isend_list,      opal_list_t);
>>>>>> @@ -730,7 +730,7 @@
>>>>>> 
>>>>>> peer_ref->proc_name.jobid  = ORTE_JOBID_INVALID;
>>>>>> peer_ref->proc_name.vpid   = ORTE_VPID_INVALID;
>>>>>> -    peer_ref->proc_name.epoch  = ORTE_EPOCH_MIN;
>>>>>> +    ORTE_EPOCH_SET(peer_ref->proc_name.epoch,ORTE_EPOCH_MIN);
>>>>>> 
>>>>>> while( NULL != (item = opal_list_remove_first(&peer_ref->send_list)) ) {
>>>>>>    HOKE_TRAFFIC_MSG_REF_RETURN(item);
>>>>>> @@ -840,7 +840,7 @@
>>>>>> 
>>>>>> msg_ref->proc_name.jobid  = ORTE_JOBID_INVALID;
>>>>>> msg_ref->proc_name.vpid   = ORTE_VPID_INVALID;
>>>>>> -    msg_ref->proc_name.epoch  = ORTE_EPOCH_MIN;
>>>>>> +    ORTE_EPOCH_SET(msg_ref->proc_name.epoch,ORTE_EPOCH_MIN);
>>>>>> 
>>>>>> msg_ref->matched        = INVALID_INT;
>>>>>> msg_ref->done           = INVALID_INT;
>>>>>> @@ -868,7 +868,7 @@
>>>>>> 
>>>>>> msg_ref->proc_name.jobid  = ORTE_JOBID_INVALID;
>>>>>> msg_ref->proc_name.vpid   = ORTE_VPID_INVALID;
>>>>>> -    msg_ref->proc_name.epoch  = ORTE_EPOCH_MIN;
>>>>>> +    ORTE_EPOCH_SET(msg_ref->proc_name.epoch,ORTE_EPOCH_MIN);
>>>>>> 
>>>>>> msg_ref->matched        = INVALID_INT;
>>>>>> msg_ref->done           = INVALID_INT;
>>>>>> @@ -902,7 +902,7 @@
>>>>>> 
>>>>>> msg_ref->proc_name.jobid  = ORTE_JOBID_INVALID;
>>>>>> msg_ref->proc_name.vpid   = ORTE_VPID_INVALID;
>>>>>> -    msg_ref->proc_name.epoch  = ORTE_EPOCH_MIN;
>>>>>> +    ORTE_EPOCH_SET(msg_ref->proc_name.epoch,ORTE_EPOCH_MIN);
>>>>>> 
>>>>>> msg_ref->done           = INVALID_INT;
>>>>>> msg_ref->active         = INVALID_INT;
>>>>>> @@ -934,7 +934,7 @@
>>>>>> 
>>>>>> msg_ref->proc_name.jobid  = ORTE_JOBID_INVALID;
>>>>>> msg_ref->proc_name.vpid   = ORTE_VPID_INVALID;
>>>>>> -    msg_ref->proc_name.epoch  = ORTE_EPOCH_MIN;
>>>>>> +    ORTE_EPOCH_SET(msg_ref->proc_name.epoch,ORTE_EPOCH_MIN);
>>>>>> 
>>>>>> msg_ref->done           = INVALID_INT;
>>>>>> msg_ref->active         = INVALID_INT;
>>>>>> @@ -954,7 +954,7 @@
>>>>>> 
>>>>>> msg_ack_ref->peer.jobid  = ORTE_JOBID_INVALID;
>>>>>> msg_ack_ref->peer.vpid   = ORTE_VPID_INVALID;
>>>>>> -    msg_ack_ref->peer.epoch  = ORTE_EPOCH_MIN;
>>>>>> +    ORTE_EPOCH_SET(msg_ack_ref->peer.epoch,ORTE_EPOCH_MIN);
>>>>>> }
>>>>>> 
>>>>>> void ompi_crcp_bkmrk_pml_drain_message_ack_ref_destruct( 
>>>>>> ompi_crcp_bkmrk_pml_drain_message_ack_ref_t *msg_ack_ref) {
>>>>>> @@ -962,7 +962,7 @@
>>>>>> 
>>>>>> msg_ack_ref->peer.jobid  = ORTE_JOBID_INVALID;
>>>>>> msg_ack_ref->peer.vpid   = ORTE_VPID_INVALID;
>>>>>> -    msg_ack_ref->peer.epoch  = ORTE_EPOCH_MIN;
>>>>>> +    ORTE_EPOCH_SET(msg_ack_ref->peer.epoch,ORTE_EPOCH_MIN);
>>>>>> }
>>>>>> 
>>>>>> 
>>>>>> @@ -1015,7 +1015,7 @@
>>>>>> }
>>>>>> 
>>>>>> 
>>>>>> -#define CREATE_NEW_MSG(msg_ref, v_type, v_count, v_ddt_size, v_tag, 
>>>>>> v_rank, v_comm, p_jobid, p_vpid, p_epoch) \
>>>>>> +#define CREATE_NEW_MSG(msg_ref, v_type, v_count, v_ddt_size, v_tag, 
>>>>>> v_rank, v_comm, p_jobid, p_vpid) \
>>>>>> {                                                               \
>>>>>> HOKE_TRAFFIC_MSG_REF_ALLOC(msg_ref, ret);                     \
>>>>>>                                                             \
>>>>>> @@ -1034,7 +1034,7 @@
>>>>>>                                                             \
>>>>>> msg_ref->proc_name.jobid  = p_jobid;                          \
>>>>>> msg_ref->proc_name.vpid   = p_vpid;                           \
>>>>>> -   msg_ref->proc_name.epoch  = p_epoch;                          \
>>>>>> +   
>>>>>> ORTE_EPOCH_SET(msg_ref->proc_name.epoch,orte_ess.proc_get_epoch(&(msg_ref->proc_name)));
>>>>>>  \
>>>>>>                                                             \
>>>>>> msg_ref->matched = 0;                                         \
>>>>>> msg_ref->done    = 0;                                         \
>>>>>> @@ -1043,7 +1043,7 @@
>>>>>> msg_ref->active_drain = 0;                                    \
>>>>>> }
>>>>>> 
>>>>>> -#define CREATE_NEW_DRAIN_MSG(msg_ref, v_type, v_count, v_ddt_size, 
>>>>>> v_tag, v_rank, v_comm, p_jobid, p_vpid, p_epoch) \
>>>>>> +#define CREATE_NEW_DRAIN_MSG(msg_ref, v_type, v_count, v_ddt_size, 
>>>>>> v_tag, v_rank, v_comm, p_jobid, p_vpid) \
>>>>>> {                                                               \
>>>>>> HOKE_DRAIN_MSG_REF_ALLOC(msg_ref, ret);                       \
>>>>>>                                                             \
>>>>>> @@ -1063,7 +1063,7 @@
>>>>>>                                                             \
>>>>>> msg_ref->proc_name.jobid  = p_jobid;                          \
>>>>>> msg_ref->proc_name.vpid   = p_vpid;                           \
>>>>>> -   msg_ref->proc_name.epoch  = p_epoch;                          \
>>>>>> +   
>>>>>> ORTE_EPOCH_SET(msg_ref->proc_name.epoch,orte_ess.proc_get_epoch(&(msg_ref->proc_name)));
>>>>>>  \
>>>>>> }
>>>>>> 
>>>>>> 
>>>>>> @@ -1466,7 +1466,7 @@
>>>>>> 
>>>>>>    new_peer_ref->proc_name.jobid  = procs[i]->proc_name.jobid;
>>>>>>    new_peer_ref->proc_name.vpid   = procs[i]->proc_name.vpid;
>>>>>> -        new_peer_ref->proc_name.epoch  = procs[i]->proc_name.epoch;
>>>>>> +        
>>>>>> ORTE_EPOCH_SET(new_peer_ref->proc_name.epoch,procs[i]->proc_name.epoch);
>>>>>> 
>>>>>>    opal_list_append(&ompi_crcp_bkmrk_pml_peer_refs, 
>>>>>> &(new_peer_ref->super));
>>>>>> }
>>>>>> @@ -3237,13 +3237,11 @@
>>>>>>        CREATE_NEW_MSG((*msg_ref), msg_type,
>>>>>>                       count, ddt_size, tag, dest, comm,
>>>>>>                       peer_ref->proc_name.jobid,
>>>>>> -                           peer_ref->proc_name.vpid,
>>>>>> -                           peer_ref->proc_name.epoch);
>>>>>> +                           peer_ref->proc_name.vpid);
>>>>>>    } else {
>>>>>>        CREATE_NEW_MSG((*msg_ref), msg_type,
>>>>>>                       count, ddt_size, tag, dest, comm,
>>>>>> -                           ORTE_JOBID_INVALID, ORTE_VPID_INVALID,
>>>>>> -                           ORTE_EPOCH_INVALID);
>>>>>> +                           ORTE_JOBID_INVALID, ORTE_VPID_INVALID);
>>>>>>    }
>>>>>> 
>>>>>>    if( msg_type == COORD_MSG_TYPE_P_SEND ||
>>>>>> @@ -3377,7 +3375,7 @@
>>>>>> if( NULL == from_peer_ref && NULL != to_peer_ref ) {
>>>>>>    (*new_msg_ref)->proc_name.jobid = to_peer_ref->proc_name.jobid;
>>>>>>    (*new_msg_ref)->proc_name.vpid  = to_peer_ref->proc_name.vpid;
>>>>>> -        (*new_msg_ref)->proc_name.epoch = to_peer_ref->proc_name.epoch;
>>>>>> +        
>>>>>> ORTE_EPOCH_SET((*new_msg_ref)->proc_name.epoch,to_peer_ref->proc_name.epoch);
>>>>>> }
>>>>>> 
>>>>>> return exit_status;
>>>>>> @@ -3808,8 +3806,7 @@
>>>>>>    CREATE_NEW_DRAIN_MSG((*msg_ref), msg_type,
>>>>>>                         count, NULL, tag, dest, comm,
>>>>>>                         peer_ref->proc_name.jobid,
>>>>>> -                             peer_ref->proc_name.vpid,
>>>>>> -                             peer_ref->proc_name.epoch);
>>>>>> +                             peer_ref->proc_name.vpid);
>>>>>> 
>>>>>>    (*msg_ref)->done           = 0;
>>>>>>    (*msg_ref)->active         = 0;
>>>>>> @@ -5284,8 +5281,7 @@
>>>>>> */
>>>>>> peer_name.jobid  = ORTE_PROC_MY_NAME->jobid;
>>>>>> peer_name.vpid   = peer_idx;
>>>>>> -    peer_name.epoch  = ORTE_EPOCH_INVALID;
>>>>>> -    peer_name.epoch  = orte_ess.proc_get_epoch(&peer_name);
>>>>>> +    ORTE_EPOCH_SET(peer_name.epoch,orte_ess.proc_get_epoch(&peer_name));
>>>>>> 
>>>>>> if( NULL == (peer_ref = find_peer(peer_name))) {
>>>>>>    opal_output(mca_crcp_bkmrk_component.super.output_handle,
>>>>>> @@ -5346,8 +5342,7 @@
>>>>>> 
>>>>>> peer_name.jobid  = ORTE_PROC_MY_NAME->jobid;
>>>>>> peer_name.vpid   = peer_idx;
>>>>>> -    peer_name.epoch  = ORTE_EPOCH_INVALID;
>>>>>> -    peer_name.epoch  = orte_ess.proc_get_epoch(&peer_name);
>>>>>> +    ORTE_EPOCH_SET(peer_name.epoch,orte_ess.proc_get_epoch(&peer_name));
>>>>>> 
>>>>>> if ( 0 > (ret = orte_rml.recv_buffer_nb(&peer_name,
>>>>>>                                        OMPI_CRCP_COORD_BOOKMARK_TAG,
>>>>>> @@ -5529,7 +5524,8 @@
>>>>>> HOKE_DRAIN_ACK_MSG_REF_ALLOC(d_msg_ack, ret);
>>>>>> d_msg_ack->peer.jobid  = peer_ref->proc_name.jobid;
>>>>>> d_msg_ack->peer.vpid   = peer_ref->proc_name.vpid;
>>>>>> -    d_msg_ack->peer.epoch  = peer_ref->proc_name.epoch;
>>>>>> +    ORTE_EPOCH_SET(d_msg_ack->peer.epoch,peer_ref->proc_name.epoch);
>>>>>> +
>>>>>> d_msg_ack->complete    = false;
>>>>>> opal_list_append(&drained_msg_ack_list, &(d_msg_ack->super));
>>>>>> OPAL_OUTPUT_VERBOSE((10, mca_crcp_bkmrk_component.super.output_handle,
>>>>>> @@ -6169,8 +6165,7 @@
>>>>>>                   count, datatype_size, tag, rank,
>>>>>>                   ompi_comm_lookup(comm_id),
>>>>>>                   peer_ref->proc_name.jobid,
>>>>>> -                       peer_ref->proc_name.vpid,
>>>>>> -                       peer_ref->proc_name.epoch);
>>>>>> +                       peer_ref->proc_name.vpid);
>>>>>> 
>>>>>>    traffic_message_create_drain_message(true, num_left_unresolved,
>>>>>>                                         peer_ref,
>>>>>> 
>>>>>> Modified: trunk/ompi/mca/dpm/orte/dpm_orte.c
>>>>>> ==============================================================================
>>>>>> --- trunk/ompi/mca/dpm/orte/dpm_orte.c   (original)
>>>>>> +++ trunk/ompi/mca/dpm/orte/dpm_orte.c   2011-08-26 18:16:14 EDT (Fri, 
>>>>>> 26 Aug 2011)
>>>>>> @@ -1130,7 +1130,7 @@
>>>>>> /* flag the identity of the remote proc */
>>>>>> carport.jobid = mev->sender.jobid;
>>>>>> carport.vpid = mev->sender.vpid;
>>>>>> -    carport.epoch = mev->sender.epoch;
>>>>>> +    ORTE_EPOCH_SET(carport.epoch,mev->sender.epoch);
>>>>>> 
>>>>>> /* release the event */
>>>>>> OBJ_RELEASE(mev);
>>>>>> 
>>>>>> Modified: trunk/ompi/mca/pml/bfo/pml_bfo_failover.c
>>>>>> ==============================================================================
>>>>>> --- trunk/ompi/mca/pml/bfo/pml_bfo_failover.c    (original)
>>>>>> +++ trunk/ompi/mca/pml/bfo/pml_bfo_failover.c    2011-08-26 18:16:14 EDT 
>>>>>> (Fri, 26 Aug 2011)
>>>>>> @@ -1,8 +1,5 @@
>>>>>> /*
>>>>>> * Copyright (c) 2010      Oracle and/or its affiliates.  All rights 
>>>>>> reserved.
>>>>>> - * Copyright (c) 2004-2011 The University of Tennessee and The 
>>>>>> University
>>>>>> - *                         of Tennessee Research Foundation.  All rights
>>>>>> - *                         reserved.
>>>>>> * $COPYRIGHT$
>>>>>> *
>>>>>> * Additional copyrights may follow
>>>>>> @@ -398,13 +395,13 @@
>>>>>>    (hdr->hdr_match.hdr_seq != (uint16_t)recvreq->req_msgseq)) {
>>>>>>    orte_proc.jobid = hdr->hdr_restart.hdr_jobid;
>>>>>>    orte_proc.vpid = hdr->hdr_restart.hdr_vpid;
>>>>>> -        orte_proc.epoch = hdr->hdr_restart.hdr_epoch;
>>>>>> +
>>>>>>    ompi_proc = ompi_proc_find(&orte_proc);
>>>>>>    opal_output_verbose(20, mca_pml_bfo_output,
>>>>>>                        "RNDVRESTARTNOTIFY: received: does not match 
>>>>>> request, sending NACK back "
>>>>>>                        "PML:req=%d,hdr=%d CTX:req=%d,hdr=%d 
>>>>>> SRC:req=%d,hdr=%d "
>>>>>>                        "RQS:req=%d,hdr=%d src_req=%p, dst_req=%p, 
>>>>>> peer=%d, hdr->hdr_jobid=%d, "
>>>>>> -                            "hdr->hdr_vpid=%d, hdr->hdr_epoch=%d, 
>>>>>> ompi_proc->proc_hostname=%s",
>>>>>> +                            "hdr->hdr_vpid=%d, 
>>>>>> ompi_proc->proc_hostname=%s",
>>>>>>                        (uint16_t)recvreq->req_msgseq, 
>>>>>> hdr->hdr_match.hdr_seq,
>>>>>>                        recvreq->req_recv.req_base.req_comm->c_contextid, 
>>>>>> hdr->hdr_match.hdr_ctx,
>>>>>>                        
>>>>>> recvreq->req_recv.req_base.req_ompi.req_status.MPI_SOURCE,
>>>>>> @@ -413,7 +410,7 @@
>>>>>>                        recvreq->remote_req_send.pval, (void *)recvreq,
>>>>>>                        
>>>>>> recvreq->req_recv.req_base.req_ompi.req_status.MPI_SOURCE,
>>>>>>                        hdr->hdr_restart.hdr_jobid, 
>>>>>> hdr->hdr_restart.hdr_vpid, 
>>>>>> -                            hdr->hdr_restart.hdr_epoch, 
>>>>>> ompi_proc->proc_hostname);
>>>>>> +                            ompi_proc->proc_hostname);
>>>>>>    mca_pml_bfo_recv_request_rndvrestartnack(des, ompi_proc, false);
>>>>>>    return;
>>>>>> }
>>>>>> @@ -715,7 +712,6 @@
>>>>>> restart->hdr_dst_rank = sendreq->req_send.req_base.req_peer; /* Needed 
>>>>>> for NACKs */
>>>>>> restart->hdr_jobid = ORTE_PROC_MY_NAME->jobid;
>>>>>> restart->hdr_vpid = ORTE_PROC_MY_NAME->vpid;
>>>>>> -    restart->hdr_epoch = ORTE_PROC_MY_NAME->epoch;
>>>>>> 
>>>>>> bfo_hdr_hton(restart, MCA_PML_BFO_HDR_TYPE_RNDVRESTARTNOTIFY, proc);
>>>>>> 
>>>>>> 
>>>>>> Modified: trunk/ompi/mca/pml/bfo/pml_bfo_hdr.h
>>>>>> ==============================================================================
>>>>>> --- trunk/ompi/mca/pml/bfo/pml_bfo_hdr.h (original)
>>>>>> +++ trunk/ompi/mca/pml/bfo/pml_bfo_hdr.h 2011-08-26 18:16:14 EDT (Fri, 
>>>>>> 26 Aug 2011)
>>>>>> @@ -2,9 +2,6 @@
>>>>>> * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana
>>>>>> *                         University Research and Technology
>>>>>> *                         Corporation.  All rights reserved.
>>>>>> - * Copyright (c) 2004-2011 The University of Tennessee and The 
>>>>>> University
>>>>>> - *                         of Tennessee Research Foundation.  All rights
>>>>>> - *                         reserved.
>>>>>> * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, 
>>>>>> *                         University of Stuttgart.  All rights reserved.
>>>>>> * Copyright (c) 2004-2005 The Regents of the University of California.
>>>>>> @@ -415,7 +412,6 @@
>>>>>> int32_t  hdr_dst_rank;                    /**< needed to send NACK */
>>>>>> uint32_t hdr_jobid;                       /**< needed to send NACK */
>>>>>> uint32_t hdr_vpid;                        /**< needed to send NACK */
>>>>>> -    uint32_t hdr_epoch;                       /**< needed to send NACK 
>>>>>> */
>>>>>> };
>>>>>> typedef struct mca_pml_bfo_restart_hdr_t mca_pml_bfo_restart_hdr_t;
>>>>>> 
>>>>>> @@ -428,7 +424,6 @@
>>>>>> (h).hdr_dst_rank = ntohl((h).hdr_dst_rank); \
>>>>>> (h).hdr_jobid = ntohl((h).hdr_jobid); \
>>>>>> (h).hdr_vpid = ntohl((h).hdr_vpid); \
>>>>>> -    (h).hdr_epoch = ntohl((h).hdr_epoch); \
>>>>>> } while (0)
>>>>>> 
>>>>>> #define MCA_PML_BFO_RESTART_HDR_HTON(h) \
>>>>>> @@ -437,7 +432,6 @@
>>>>>> (h).hdr_dst_rank = htonl((h).hdr_dst_rank); \
>>>>>> (h).hdr_jobid = htonl((h).hdr_jobid); \
>>>>>> (h).hdr_vpid = htonl((h).hdr_vpid); \
>>>>>> -    (h).hdr_epoch = htonl((h).hdr_epoch); \
>>>>>> } while (0) 
>>>>>> 
>>>>>> #endif /* PML_BFO */
>>>>>> 
>>>>>> Modified: trunk/ompi/proc/proc.c
>>>>>> ==============================================================================
>>>>>> --- trunk/ompi/proc/proc.c       (original)
>>>>>> +++ trunk/ompi/proc/proc.c       2011-08-26 18:16:14 EDT (Fri, 26 Aug 
>>>>>> 2011)
>>>>>> @@ -108,7 +108,8 @@
>>>>>> 
>>>>>>    proc->proc_name.jobid = ORTE_PROC_MY_NAME->jobid;
>>>>>>    proc->proc_name.vpid = i;
>>>>>> -        proc->proc_name.epoch = ORTE_EPOCH_MIN;
>>>>>> +        ORTE_EPOCH_SET(proc->proc_name.epoch,ORTE_EPOCH_MIN);
>>>>>> +
>>>>>>    if (i == ORTE_PROC_MY_NAME->vpid) {
>>>>>>        ompi_proc_local_proc = proc;
>>>>>>        proc->proc_flags = OPAL_PROC_ALL_LOCAL;
>>>>>> @@ -362,8 +363,7 @@
>>>>>> 
>>>>>>    /* Does not change: proc->proc_name.vpid */
>>>>>>    proc->proc_name.jobid = ORTE_PROC_MY_NAME->jobid;
>>>>>> -        proc->proc_name.epoch = ORTE_EPOCH_INVALID;
>>>>>> -        proc->proc_name.epoch = 
>>>>>> orte_ess.proc_get_epoch(&proc->proc_name);
>>>>>> +        
>>>>>> ORTE_EPOCH_SET(proc->proc_name.epoch,orte_ess.proc_get_epoch(&proc->proc_name));
>>>>>> 
>>>>>>    /* Make sure to clear the local flag before we set it below */
>>>>>>    proc->proc_flags = 0;
>>>>>> 
>>>>>> Modified: trunk/opal/config/opal_configure_options.m4
>>>>>> ==============================================================================
>>>>>> --- trunk/opal/config/opal_configure_options.m4  (original)
>>>>>> +++ trunk/opal/config/opal_configure_options.m4  2011-08-26 18:16:14 EDT 
>>>>>> (Fri, 26 Aug 2011)
>>>>>> @@ -416,6 +416,14 @@
>>>>>> AM_CONDITIONAL(WANT_FT_CR,  test "$opal_want_ft_cr" = "1")
>>>>>> 
>>>>>> #
>>>>>> +# Compile in resilient runtime code
>>>>>> +#
>>>>>> +AC_ARG_ENABLE(resilient-orte,
>>>>>> +    [AC_HELP_STRING([--enable-resilient-orte], [Enable the resilient 
>>>>>> runtime code.])])
>>>>>> +AM_CONDITIONAL(ORTE_RESIL_ORTE, [test "$enable_resilient_orte" = "yes"])
>>>>>> +AM_CONDITIONAL(ORTE_ENABLE_EPOCH, [test "$enable_resilient_orte" = 
>>>>>> "yes"])
>>>>>> +
>>>>>> +#
>>>>>> # Do we want to install binaries?
>>>>>> #
>>>>>> AC_ARG_ENABLE([binaries],
>>>>>> 
>>>>>> Modified: trunk/orte/include/orte/types.h
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/include/orte/types.h      (original)
>>>>>> +++ trunk/orte/include/orte/types.h      2011-08-26 18:16:14 EDT (Fri, 
>>>>>> 26 Aug 2011)
>>>>>> @@ -81,24 +81,43 @@
>>>>>> #define ORTE_VPID_T         OPAL_UINT32
>>>>>> #define ORTE_VPID_MAX       UINT32_MAX-2
>>>>>> #define ORTE_VPID_MIN       0
>>>>>> +
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>> typedef uint32_t orte_epoch_t;
>>>>>> #define ORTE_EPOCH_T        OPAL_UINT32
>>>>>> #define ORTE_EPOCH_MAX      UINT32_MAX-2
>>>>>> #define ORTE_EPOCH_MIN      0
>>>>>> +#endif
>>>>>> 
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>> #define ORTE_PROCESS_NAME_HTON(n)       \
>>>>>> do {                                    \
>>>>>> n.jobid = htonl(n.jobid);           \
>>>>>> n.vpid = htonl(n.vpid);             \
>>>>>> n.epoch = htonl(n.epoch);           \
>>>>>> } while (0)
>>>>>> +#else
>>>>>> +#define ORTE_PROCESS_NAME_HTON(n)       \
>>>>>> +do {                                    \
>>>>>> +    n.jobid = htonl(n.jobid);           \
>>>>>> +    n.vpid = htonl(n.vpid);             \
>>>>>> +} while (0)
>>>>>> +#endif
>>>>>> 
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>> #define ORTE_PROCESS_NAME_NTOH(n)       \
>>>>>> do {                                    \
>>>>>> n.jobid = ntohl(n.jobid);           \
>>>>>> n.vpid = ntohl(n.vpid);             \
>>>>>> n.epoch = ntohl(n.epoch);           \
>>>>>> } while (0)
>>>>>> +#else
>>>>>> +#define ORTE_PROCESS_NAME_NTOH(n)       \
>>>>>> +do {                                    \
>>>>>> +    n.jobid = ntohl(n.jobid);           \
>>>>>> +    n.vpid = ntohl(n.vpid);             \
>>>>>> +} while (0)
>>>>>> +#endif
>>>>>> 
>>>>>> #define ORTE_NAME_ARGS(n) \
>>>>>> (unsigned long) ((NULL == n) ? (unsigned long)ORTE_JOBID_INVALID : 
>>>>>> (unsigned long)(n)->jobid), \
>>>>>> @@ -127,6 +146,7 @@
>>>>>> struct orte_process_name_t {
>>>>>> orte_jobid_t jobid;     /**< Job number */
>>>>>> orte_vpid_t vpid;       /**< Process id - equivalent to rank */
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>> orte_epoch_t epoch;     /**< Epoch - used to measure the generation of a 
>>>>>> recovered process.
>>>>>>                         *   The epoch will start at ORTE_EPOCH_MIN and
>>>>>>                         *   increment every time the process is detected 
>>>>>> as
>>>>>> @@ -135,6 +155,7 @@
>>>>>>                         *   processes that did not directly detect the
>>>>>>                         *   failure to increment their epochs.
>>>>>>                         */
>>>>>> +#endif
>>>>>> };
>>>>>> typedef struct orte_process_name_t orte_process_name_t;
>>>>>> 
>>>>>> @@ -157,7 +178,10 @@
>>>>>> #define    ORTE_NAME                (OPAL_DSS_ID_DYNAMIC + 2)  /**< an 
>>>>>> orte_process_name_t */
>>>>>> #define    ORTE_VPID                (OPAL_DSS_ID_DYNAMIC + 3)  /**< a 
>>>>>> vpid */
>>>>>> #define    ORTE_JOBID               (OPAL_DSS_ID_DYNAMIC + 4)  /**< a 
>>>>>> jobid */
>>>>>> +
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>> #define    ORTE_EPOCH               (OPAL_DSS_ID_DYNAMIC + 5)  /**< an 
>>>>>> epoch */
>>>>>> +#endif
>>>>>> 
>>>>>> #if !ORTE_DISABLE_FULL_SUPPORT
>>>>>> /* State-related types */
>>>>>> 
>>>>>> Modified: trunk/orte/mca/db/daemon/db_daemon.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/mca/db/daemon/db_daemon.c (original)
>>>>>> +++ trunk/orte/mca/db/daemon/db_daemon.c 2011-08-26 18:16:14 EDT (Fri, 
>>>>>> 26 Aug 2011)
>>>>>> @@ -386,7 +386,7 @@
>>>>>>        dat = OBJ_NEW(orte_db_data_t);
>>>>>>        dat->name.jobid = sender->jobid;
>>>>>>        dat->name.vpid = sender->vpid;
>>>>>> -            dat->name.epoch= sender->epoch;
>>>>>> +            ORTE_EPOCH_SET(dat->name.epoch,sender->epoch);
>>>>>>        dat->key = key;
>>>>>>        count=1;
>>>>>>        opal_dss.unpack(buf, &dat->size, &count, OPAL_INT32);
>>>>>> 
>>>>>> Modified: trunk/orte/mca/errmgr/app/errmgr_app.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/mca/errmgr/app/errmgr_app.c       (original)
>>>>>> +++ trunk/orte/mca/errmgr/app/errmgr_app.c       2011-08-26 18:16:14 EDT 
>>>>>> (Fri, 26 Aug 2011)
>>>>>> @@ -82,8 +82,10 @@
>>>>>> NULL,
>>>>>> NULL,
>>>>>> NULL,
>>>>>> -    orte_errmgr_base_register_migration_warning,
>>>>>> -    orte_errmgr_base_set_fault_callback
>>>>>> +    orte_errmgr_base_register_migration_warning
>>>>>> +#if ORTE_RESIL_ORTE
>>>>>> +    ,orte_errmgr_base_set_fault_callback
>>>>>> +#endif
>>>>>> };
>>>>>> 
>>>>>> /************************
>>>>>> @@ -93,18 +95,23 @@
>>>>>> {
>>>>>> int ret = ORTE_SUCCESS;
>>>>>> 
>>>>>> +#if ORTE_RESIL_ORTE
>>>>>> ret = orte_rml.recv_buffer_nb(ORTE_NAME_WILDCARD,
>>>>>>                              ORTE_RML_TAG_EPOCH_CHANGE, 
>>>>>>                              ORTE_RML_PERSISTENT, 
>>>>>>                              epoch_change_recv, 
>>>>>>                              NULL);
>>>>>> +#endif
>>>>>> +
>>>>>> return ret;
>>>>>> }
>>>>>> 
>>>>>> static int finalize(void)
>>>>>> {
>>>>>> +#if ORTE_RESIL_ORTE
>>>>>> orte_rml.recv_cancel(ORTE_NAME_WILDCARD,
>>>>>>                     ORTE_RML_TAG_EPOCH_CHANGE);
>>>>>> +#endif
>>>>>> 
>>>>>> return ORTE_SUCCESS;
>>>>>> }
>>>>>> @@ -151,6 +158,7 @@
>>>>>> return ORTE_SUCCESS;
>>>>>> }
>>>>>> 
>>>>>> +#if ORTE_RESIL_ORTE
>>>>>> void epoch_change_recv(int status, 
>>>>>>                   orte_process_name_t *sender, 
>>>>>>                   opal_buffer_t *buffer, 
>>>>>> @@ -209,15 +217,20 @@
>>>>>>                ORTE_NAME_PRINT(ORTE_PROC_MY_NAME)));
>>>>>> 
>>>>>>    (*fault_cbfunc)(procs);
>>>>>> +    } else if (NULL == fault_cbfunc) {
>>>>>> +        OPAL_OUTPUT_VERBOSE((1, orte_errmgr_base.output,
>>>>>> +                    "%s errmgr:app Calling fault callback failed (NULL 
>>>>>> pointer)!",
>>>>>> +                    ORTE_NAME_PRINT(ORTE_PROC_MY_NAME)));
>>>>>> } else {
>>>>>>    OPAL_OUTPUT_VERBOSE((1, orte_errmgr_base.output,
>>>>>> -                    "%s errmgr:app Calling fault callback failed!",
>>>>>> +                    "%s errmgr:app Calling fault callback failed 
>>>>>> (num_dead <= 0)!",
>>>>>>                ORTE_NAME_PRINT(ORTE_PROC_MY_NAME)));
>>>>>> }
>>>>>> 
>>>>>> free(proc);
>>>>>> OBJ_RELEASE(procs);
>>>>>> }
>>>>>> +#endif
>>>>>> 
>>>>>> static int orte_errmgr_app_abort_peers(orte_process_name_t *procs, 
>>>>>> orte_std_cntr_t num_procs)
>>>>>> {
>>>>>> 
>>>>>> Modified: trunk/orte/mca/errmgr/base/errmgr_base_fns.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/mca/errmgr/base/errmgr_base_fns.c (original)
>>>>>> +++ trunk/orte/mca/errmgr/base/errmgr_base_fns.c 2011-08-26 18:16:14 EDT 
>>>>>> (Fri, 26 Aug 2011)
>>>>>> @@ -97,13 +97,13 @@
>>>>>> {
>>>>>> item->proc_name.vpid  = ORTE_VPID_INVALID;
>>>>>> item->proc_name.jobid = ORTE_JOBID_INVALID;
>>>>>> -    item->proc_name.epoch = ORTE_EPOCH_MIN;
>>>>>> +    ORTE_EPOCH_SET(item->proc_name.epoch,ORTE_EPOCH_MIN);
>>>>>> }
>>>>>> 
>>>>>> void orte_errmgr_predicted_proc_destruct( orte_errmgr_predicted_proc_t 
>>>>>> *item)
>>>>>> {
>>>>>> item->proc_name.vpid  = ORTE_VPID_INVALID;
>>>>>> -    item->proc_name.epoch = ORTE_EPOCH_INVALID;
>>>>>> +    ORTE_EPOCH_SET(item->proc_name.epoch,ORTE_EPOCH_INVALID);
>>>>>> item->proc_name.jobid = ORTE_JOBID_INVALID;
>>>>>> }
>>>>>> 
>>>>>> @@ -139,13 +139,13 @@
>>>>>> void orte_errmgr_predicted_map_construct(orte_errmgr_predicted_map_t 
>>>>>> *item)
>>>>>> {
>>>>>> item->proc_name.vpid  = ORTE_VPID_INVALID;
>>>>>> -    item->proc_name.epoch = ORTE_EPOCH_MIN;
>>>>>> +    ORTE_EPOCH_SET(item->proc_name.epoch,ORTE_EPOCH_MIN);
>>>>>> item->proc_name.jobid = ORTE_JOBID_INVALID;
>>>>>> 
>>>>>> item->node_name = NULL;
>>>>>> 
>>>>>> item->map_proc_name.vpid  = ORTE_VPID_INVALID;
>>>>>> -    item->map_proc_name.epoch = ORTE_EPOCH_MIN;
>>>>>> +    ORTE_EPOCH_SET(item->map_proc_name.epoch,ORTE_EPOCH_MIN);
>>>>>> item->map_proc_name.jobid = ORTE_JOBID_INVALID;
>>>>>> 
>>>>>> item->map_node_name = NULL;
>>>>>> @@ -156,7 +156,7 @@
>>>>>> void orte_errmgr_predicted_map_destruct( orte_errmgr_predicted_map_t 
>>>>>> *item)
>>>>>> {
>>>>>> item->proc_name.vpid  = ORTE_VPID_INVALID;
>>>>>> -    item->proc_name.epoch = ORTE_EPOCH_INVALID;
>>>>>> +    ORTE_EPOCH_SET(item->proc_name.epoch,ORTE_EPOCH_INVALID);
>>>>>> item->proc_name.jobid = ORTE_JOBID_INVALID;
>>>>>> 
>>>>>> if( NULL != item->node_name ) {
>>>>>> @@ -165,7 +165,7 @@
>>>>>> }
>>>>>> 
>>>>>> item->map_proc_name.vpid  = ORTE_VPID_INVALID;
>>>>>> -    item->map_proc_name.epoch = ORTE_EPOCH_INVALID;
>>>>>> +    ORTE_EPOCH_SET(item->map_proc_name.epoch,ORTE_EPOCH_INVALID);
>>>>>> item->map_proc_name.jobid = ORTE_JOBID_INVALID;
>>>>>> 
>>>>>> if( NULL != item->map_node_name ) {
>>>>>> 
>>>>>> Modified: trunk/orte/mca/errmgr/base/errmgr_base_tool.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/mca/errmgr/base/errmgr_base_tool.c        (original)
>>>>>> +++ trunk/orte/mca/errmgr/base/errmgr_base_tool.c        2011-08-26 
>>>>>> 18:16:14 EDT (Fri, 26 Aug 2011)
>>>>>> @@ -267,7 +267,7 @@
>>>>>> */
>>>>>> errmgr_cmdline_sender.jobid = ORTE_JOBID_INVALID;
>>>>>> errmgr_cmdline_sender.vpid  = ORTE_VPID_INVALID;
>>>>>> -    errmgr_cmdline_sender.epoch = ORTE_EPOCH_MIN;
>>>>>> +    ORTE_EPOCH_SET(errmgr_cmdline_sender.epoch,ORTE_EPOCH_MIN);
>>>>>> if (ORTE_SUCCESS != (ret = orte_rml.recv_buffer_nb(ORTE_NAME_WILDCARD,
>>>>>>                                                   ORTE_RML_TAG_MIGRATE,
>>>>>>                                                   0,
>>>>>> @@ -379,14 +379,14 @@
>>>>>> if( OPAL_EQUAL != orte_util_compare_name_fields(ORTE_NS_CMP_ALL, 
>>>>>> ORTE_NAME_INVALID, &errmgr_cmdline_sender) ) {
>>>>>>    swap_dest.jobid = errmgr_cmdline_sender.jobid;
>>>>>>    swap_dest.vpid  = errmgr_cmdline_sender.vpid;
>>>>>> -        swap_dest.epoch = errmgr_cmdline_sender.epoch;
>>>>>> +        ORTE_EPOCH_SET(swap_dest.epoch,errmgr_cmdline_sender.epoch);
>>>>>> 
>>>>>>    errmgr_cmdline_sender = *sender;
>>>>>>    
>>>>>> orte_errmgr_base_migrate_update(ORTE_ERRMGR_MIGRATE_STATE_ERR_INPROGRESS);
>>>>>> 
>>>>>>    errmgr_cmdline_sender.jobid = swap_dest.jobid;
>>>>>>    errmgr_cmdline_sender.vpid  = swap_dest.vpid;
>>>>>> -        errmgr_cmdline_sender.epoch = swap_dest.epoch;
>>>>>> +        ORTE_EPOCH_SET(errmgr_cmdline_sender.epoch,swap_dest.epoch);
>>>>>> 
>>>>>>    goto cleanup;
>>>>>> }
>>>>>> 
>>>>>> Modified: trunk/orte/mca/errmgr/hnp/errmgr_hnp.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/mca/errmgr/hnp/errmgr_hnp.c       (original)
>>>>>> +++ trunk/orte/mca/errmgr/hnp/errmgr_hnp.c       2011-08-26 18:16:14 EDT 
>>>>>> (Fri, 26 Aug 2011)
>>>>>> @@ -53,6 +53,7 @@
>>>>>> #include "orte/runtime/orte_globals.h"
>>>>>> #include "orte/runtime/orte_locks.h"
>>>>>> #include "orte/runtime/orte_quit.h"
>>>>>> +#include "orte/runtime/data_type_support/orte_dt_support.h"
>>>>>> 
>>>>>> #include "orte/mca/errmgr/errmgr.h"
>>>>>> #include "orte/mca/errmgr/base/base.h"
>>>>>> @@ -83,9 +84,11 @@
>>>>>> orte_errmgr_hnp_global_suggest_map_targets,
>>>>>> /* FT Event hook  */
>>>>>> orte_errmgr_hnp_global_ft_event,
>>>>>> -    orte_errmgr_base_register_migration_warning,
>>>>>> +    orte_errmgr_base_register_migration_warning
>>>>>> +#if ORTE_RESIL_ORTE
>>>>>> /* Set the callback */
>>>>>> -    orte_errmgr_base_set_fault_callback
>>>>>> +    ,orte_errmgr_base_set_fault_callback
>>>>>> +#endif
>>>>>> };
>>>>>> 
>>>>>> 
>>>>>> @@ -97,14 +100,16 @@
>>>>>> static void update_local_procs_in_job(orte_job_t *jdata, 
>>>>>> orte_job_state_t jobstate,
>>>>>>                                  orte_proc_state_t state, 
>>>>>> orte_exit_code_t exit_code);
>>>>>> static void check_job_complete(orte_job_t *jdata);
>>>>>> -static void killprocs(orte_jobid_t job, orte_vpid_t vpid, orte_epoch_t 
>>>>>> epoch);
>>>>>> +static void killprocs(orte_jobid_t job, orte_vpid_t vpid);
>>>>>> static int hnp_relocate(orte_job_t *jdata, orte_process_name_t *proc,
>>>>>>                    orte_proc_state_t state, orte_exit_code_t exit_code);
>>>>>> static orte_odls_child_t* proc_is_local(orte_process_name_t *proc);
>>>>>> +#if ORTE_RESIL_ORTE
>>>>>> static int send_to_local_applications(opal_pointer_array_t *dead_names);
>>>>>> static void failure_notification(int status, orte_process_name_t* sender,
>>>>>>                             opal_buffer_t *buffer, orte_rml_tag_t tag,
>>>>>>                             void* cbdata);
>>>>>> +#endif
>>>>>> 
>>>>>> /************************
>>>>>> * API Definitions
>>>>>> @@ -380,16 +385,21 @@
>>>>>> **********************/
>>>>>> int orte_errmgr_hnp_base_global_init(void)
>>>>>> {
>>>>>> -    int ret;
>>>>>> +    int ret = ORTE_SUCCESS;
>>>>>> 
>>>>>> +#if ORTE_RESIL_ORTE
>>>>>> ret = orte_rml.recv_buffer_nb(ORTE_NAME_WILDCARD, 
>>>>>> ORTE_RML_TAG_FAILURE_NOTICE,
>>>>>>                              ORTE_RML_PERSISTENT, failure_notification, 
>>>>>> NULL);
>>>>>> +#endif
>>>>>> +
>>>>>> return ret;
>>>>>> }
>>>>>> 
>>>>>> int orte_errmgr_hnp_base_global_finalize(void)
>>>>>> {
>>>>>> +#if ORTE_RESIL_ORTE
>>>>>> orte_rml.recv_cancel(ORTE_NAME_WILDCARD, ORTE_RML_TAG_FAILURE_NOTICE);
>>>>>> +#endif
>>>>>> 
>>>>>> return ORTE_SUCCESS;
>>>>>> }
>>>>>> @@ -406,6 +416,7 @@
>>>>>> orte_odls_child_t *child;
>>>>>> int rc;
>>>>>> orte_app_context_t *app;
>>>>>> +    orte_proc_t *pdat;
>>>>>> 
>>>>>> OPAL_OUTPUT_VERBOSE((1, orte_errmgr_base.output,
>>>>>>                     "%s errmgr:hnp: job %s reported state %s"
>>>>>> @@ -538,7 +549,7 @@
>>>>>>                                  ORTE_PROC_STATE_SENSOR_BOUND_EXCEEDED,
>>>>>>                                  exit_code);
>>>>>>        /* order all local procs for this job to be killed */
>>>>>> -            killprocs(jdata->jobid, ORTE_VPID_WILDCARD, 
>>>>>> ORTE_EPOCH_WILDCARD);
>>>>>> +            killprocs(jdata->jobid, ORTE_VPID_WILDCARD);
>>>>>>        check_job_complete(jdata);  /* set the local proc states */
>>>>>>        /* the job object for this job will have been NULL'd
>>>>>>         * in the array if the job was solely local. If it isn't
>>>>>> @@ -550,7 +561,7 @@
>>>>>>        break;
>>>>>>    case ORTE_JOB_STATE_COMM_FAILED:
>>>>>>        /* order all local procs for this job to be killed */
>>>>>> -            killprocs(jdata->jobid, ORTE_VPID_WILDCARD, 
>>>>>> ORTE_EPOCH_WILDCARD);
>>>>>> +            killprocs(jdata->jobid, ORTE_VPID_WILDCARD);
>>>>>>        check_job_complete(jdata);  /* set the local proc states */
>>>>>>        /* the job object for this job will have been NULL'd
>>>>>>         * in the array if the job was solely local. If it isn't
>>>>>> @@ -562,7 +573,7 @@
>>>>>>        break;
>>>>>>    case ORTE_JOB_STATE_HEARTBEAT_FAILED:
>>>>>>        /* order all local procs for this job to be killed */
>>>>>> -            killprocs(jdata->jobid, ORTE_VPID_WILDCARD, 
>>>>>> ORTE_EPOCH_WILDCARD);
>>>>>> +            killprocs(jdata->jobid, ORTE_VPID_WILDCARD);
>>>>>>        check_job_complete(jdata);  /* set the local proc states */
>>>>>>        /* the job object for this job will have been NULL'd
>>>>>>         * in the array if the job was solely local. If it isn't
>>>>>> @@ -632,10 +643,6 @@
>>>>>>        }
>>>>>>    }
>>>>>> 
>>>>>> -        if (ORTE_PROC_STATE_ABORTED_BY_SIG == state) {
>>>>>> -            exit_code = 0;
>>>>>> -        }
>>>>>> -
>>>>>>    orte_errmgr_hnp_update_proc(jdata, proc, state, pid, exit_code);
>>>>>>    check_job_complete(jdata);  /* need to set the job state */
>>>>>>    /* the job object for this job will have been NULL'd
>>>>>> @@ -679,7 +686,7 @@
>>>>>> 
>>>>>>    case ORTE_PROC_STATE_SENSOR_BOUND_EXCEEDED:
>>>>>>        if (jdata->enable_recovery) {
>>>>>> -                killprocs(proc->jobid, proc->vpid, proc->epoch);
>>>>>> +                killprocs(proc->jobid, proc->vpid);
>>>>>>            /* is this a local proc */
>>>>>>            if (NULL != (child = proc_is_local(proc))) {
>>>>>>                /* local proc - see if it has reached its restart limit */
>>>>>> @@ -778,18 +785,37 @@
>>>>>>                    opal_output(0, "%s UNABLE TO RELOCATE PROCS FROM 
>>>>>> FAILED DAEMON %s",
>>>>>>                            ORTE_NAME_PRINT(ORTE_PROC_MY_NAME), 
>>>>>> ORTE_NAME_PRINT(proc));
>>>>>>                    /* kill all local procs */
>>>>>> -                        killprocs(ORTE_JOBID_WILDCARD, 
>>>>>> ORTE_VPID_WILDCARD, ORTE_EPOCH_WILDCARD);
>>>>>> +                        killprocs(ORTE_JOBID_WILDCARD, 
>>>>>> ORTE_VPID_WILDCARD);
>>>>>>                    /* kill all jobs */
>>>>>>                    hnp_abort(ORTE_JOBID_WILDCARD, exit_code);
>>>>>>                    /* check if all is complete so we can terminate */
>>>>>>                    check_job_complete(jdata);
>>>>>>                }
>>>>>>            } else {
>>>>>> +#if !ORTE_RESIL_ORTE
>>>>>> +                    if (NULL == (pdat = 
>>>>>> (orte_proc_t*)opal_pointer_array_get_item(jdata->procs, proc->vpid))) {
>>>>>> +                        ORTE_ERROR_LOG(ORTE_ERR_NOT_FOUND);
>>>>>> +                        orte_show_help("help-orte-errmgr-hnp.txt", 
>>>>>> "errmgr-hnp:daemon-died", true,
>>>>>> +                                       ORTE_VPID_PRINT(proc->vpid), 
>>>>>> "Unknown");
>>>>>> +                    } else {
>>>>>> +                        orte_show_help("help-orte-errmgr-hnp.txt", 
>>>>>> "errmgr-hnp:daemon-died", true,
>>>>>> +                                       ORTE_VPID_PRINT(proc->vpid), 
>>>>>> +                                       (NULL == pdat->node) ? "Unknown" 
>>>>>> : 
>>>>>> +                                       ((NULL == pdat->node->name) ? 
>>>>>> "Unknown" : pdat->node->name));
>>>>>> +                    }
>>>>>> +#endif
>>>>>>                if (ORTE_SUCCESS != 
>>>>>> orte_errmgr_hnp_record_dead_process(proc)) {
>>>>>>                    /* The process is already dead so don't keep trying 
>>>>>> to do
>>>>>>                     * this stuff. */
>>>>>>                    return ORTE_SUCCESS;
>>>>>>                }
>>>>>> +
>>>>>> +#if !ORTE_RESIL_ORTE
>>>>>> +                    /* kill all local procs */
>>>>>> +                    killprocs(ORTE_JOBID_WILDCARD, ORTE_VPID_WILDCARD);
>>>>>> +                    /* kill all jobs */
>>>>>> +                    hnp_abort(ORTE_JOBID_WILDCARD, exit_code);
>>>>>> +#endif
>>>>>>                /* We'll check if the job was complete when we get the
>>>>>>                 * message back from the HNP notifying us of the dead
>>>>>>                 * process */
>>>>>> @@ -805,7 +831,7 @@
>>>>>>        } else {
>>>>>>            orte_errmgr_hnp_record_dead_process(proc);
>>>>>>            /* kill all local procs */
>>>>>> -                killprocs(ORTE_JOBID_WILDCARD, ORTE_VPID_WILDCARD, 
>>>>>> ORTE_EPOCH_WILDCARD);
>>>>>> +                killprocs(ORTE_JOBID_WILDCARD, ORTE_VPID_WILDCARD);
>>>>>>            /* kill all jobs */
>>>>>>            hnp_abort(ORTE_JOBID_WILDCARD, exit_code);
>>>>>>            return ORTE_ERR_UNRECOVERABLE;
>>>>>> @@ -824,6 +850,7 @@
>>>>>> return ORTE_SUCCESS;
>>>>>> }
>>>>>> 
>>>>>> +#if ORTE_RESIL_ORTE
>>>>>> static void failure_notification(int status, orte_process_name_t* sender,
>>>>>>                             opal_buffer_t *buffer, orte_rml_tag_t tag,
>>>>>>                             void* cbdata)
>>>>>> @@ -984,6 +1011,7 @@
>>>>>> 
>>>>>> OBJ_RELEASE(dead_names);
>>>>>> }
>>>>>> +#endif
>>>>>> 
>>>>>> /*****************
>>>>>> * Local Functions
>>>>>> @@ -1354,7 +1382,6 @@
>>>>>>            ORTE_UPDATE_EXIT_STATUS(proc->exit_code);
>>>>>>        }
>>>>>>        break;
>>>>>> -#if 0
>>>>>>    case ORTE_PROC_STATE_ABORTED_BY_SIG:
>>>>>>        OPAL_OUTPUT_VERBOSE((5, orte_errmgr_base.output,
>>>>>>                             "%s errmgr:hnp:check_job_completed proc %s 
>>>>>> aborted by signal",
>>>>>> @@ -1370,7 +1397,6 @@
>>>>>>            ORTE_UPDATE_EXIT_STATUS(proc->exit_code);
>>>>>>        }
>>>>>>        break;
>>>>>> -#endif
>>>>>>    case ORTE_PROC_STATE_TERM_WO_SYNC:
>>>>>>        OPAL_OUTPUT_VERBOSE((5, orte_errmgr_base.output,
>>>>>>                             "%s errmgr:hnp:check_job_completed proc %s 
>>>>>> terminated without sync",
>>>>>> @@ -1393,7 +1419,6 @@
>>>>>>        }
>>>>>>        break;
>>>>>>    case ORTE_PROC_STATE_COMM_FAILED:
>>>>>> -#if 1
>>>>>>        if (!jdata->abort) {
>>>>>>            jdata->state = ORTE_JOB_STATE_COMM_FAILED;
>>>>>>            /* point to the lowest rank to cause the problem */
>>>>>> @@ -1403,7 +1428,6 @@
>>>>>>            jdata->abort = true;
>>>>>>            ORTE_UPDATE_EXIT_STATUS(proc->exit_code);
>>>>>>        }
>>>>>> -#endif
>>>>>>        break;
>>>>>>    case ORTE_PROC_STATE_SENSOR_BOUND_EXCEEDED:
>>>>>>        if (!jdata->abort) {
>>>>>> @@ -1530,9 +1554,6 @@
>>>>>> */
>>>>>> CHECK_DAEMONS:
>>>>>> if (jdata == NULL || jdata->jobid == ORTE_PROC_MY_NAME->jobid) {
>>>>>> -#if 0
>>>>>> -        if ((jdata->num_procs - 1) <= jdata->num_terminated) { /* 
>>>>>> Subtract one for the HNP */
>>>>>> -#endif
>>>>>>    if (0 == orte_routed.num_routes()) {
>>>>>>        /* orteds are done! */
>>>>>>        OPAL_OUTPUT_VERBOSE((5, orte_errmgr_base.output,
>>>>>> @@ -1696,7 +1717,7 @@
>>>>>> }
>>>>>> }
>>>>>> 
>>>>>> -static void killprocs(orte_jobid_t job, orte_vpid_t vpid, orte_epoch_t 
>>>>>> epoch)
>>>>>> +static void killprocs(orte_jobid_t job, orte_vpid_t vpid)
>>>>>> {
>>>>>> opal_pointer_array_t cmd;
>>>>>> orte_proc_t proc;
>>>>>> @@ -1707,7 +1728,9 @@
>>>>>>    orte_sensor.stop(job);
>>>>>> }
>>>>>> 
>>>>>> -    if (ORTE_JOBID_WILDCARD == job && ORTE_VPID_WILDCARD == vpid && 
>>>>>> ORTE_EPOCH_WILDCARD == epoch) {
>>>>>> +    if (ORTE_JOBID_WILDCARD == job 
>>>>>> +        && ORTE_VPID_WILDCARD == vpid 
>>>>>> +        && ORTE_EPOCH_CMP(ORTE_EPOCH_WILDCARD,epoch)) {
>>>>>>    if (ORTE_SUCCESS != (rc = orte_odls.kill_local_procs(NULL))) {
>>>>>>        ORTE_ERROR_LOG(rc);
>>>>>>    }
>>>>>> @@ -1718,7 +1741,7 @@
>>>>>> OBJ_CONSTRUCT(&proc, orte_proc_t);
>>>>>> proc.name.jobid = job;
>>>>>> proc.name.vpid = vpid;
>>>>>> -    proc.name.epoch = epoch;
>>>>>> +    ORTE_EPOCH_SET(proc.name.epoch,epoch);
>>>>>> opal_pointer_array_add(&cmd, &proc);
>>>>>> if (ORTE_SUCCESS != (rc = orte_odls.kill_local_procs(&cmd))) {
>>>>>>    ORTE_ERROR_LOG(rc);
>>>>>> @@ -1913,13 +1936,15 @@
>>>>>> }
>>>>>> 
>>>>>> if (NULL != (pdat = 
>>>>>> (orte_proc_t*)opal_pointer_array_get_item(jdat->procs, proc->vpid)) &&
>>>>>> -        ORTE_PROC_STATE_TERMINATED < pdat->state) {
>>>>>> +        ORTE_PROC_STATE_TERMINATED > pdat->state) {
>>>>>> 
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>>    /* Make sure that the epochs match. */
>>>>>>    if (proc->epoch != pdat->name.epoch) {
>>>>>>        opal_output(1, "The epoch does not match the current epoch. 
>>>>>> Throwing the request out.");
>>>>>>        return ORTE_SUCCESS;
>>>>>>    }
>>>>>> +#endif
>>>>>> 
>>>>>>    dead_names = OBJ_NEW(opal_pointer_array_t);
>>>>>> 
>>>>>> @@ -1935,6 +1960,7 @@
>>>>>>        }
>>>>>>    }
>>>>>> 
>>>>>> +#if ORTE_RESIL_ORTE
>>>>>>    if (!mca_errmgr_hnp_component.term_in_progress) {
>>>>>>        /*
>>>>>>         * Send a message to the other daemons so they know that a daemon 
>>>>>> has
>>>>>> @@ -1949,7 +1975,7 @@
>>>>>>            OBJ_RELEASE(buffer);
>>>>>>        } else {
>>>>>> 
>>>>>> -                /* Iterate of the list of dead procs and send them 
>>>>>> along with
>>>>>> +                /* Iterate over the list of dead procs and send them 
>>>>>> along with
>>>>>>             * the rest. The HNP needs this info so it can tell the other
>>>>>>             * ORTEDs and they can inform the appropriate applications.
>>>>>>             */
>>>>>> @@ -1973,6 +1999,9 @@
>>>>>>    } else {
>>>>>>        orte_errmgr_hnp_global_mark_processes_as_dead(dead_names);
>>>>>>    }
>>>>>> +#else
>>>>>> +        orte_errmgr_hnp_global_mark_processes_as_dead(dead_names);
>>>>>> +#endif
>>>>>> }
>>>>>> 
>>>>>> return ORTE_SUCCESS;
>>>>>> @@ -2011,6 +2040,7 @@
>>>>>>                    ORTE_NAME_PRINT(ORTE_PROC_MY_NAME),
>>>>>>                    ORTE_NAME_PRINT(&pdat->name)));
>>>>>> 
>>>>>> +#if ORTE_RESIL_ORTE
>>>>>>        /* Make sure the epochs match, if not it probably means that we
>>>>>>         * already reported this failure. */
>>>>>>        if (name_item->epoch != pdat->name.epoch) {
>>>>>> @@ -2018,6 +2048,7 @@
>>>>>>        }
>>>>>> 
>>>>>>        orte_util_set_epoch(name_item, name_item->epoch + 1);
>>>>>> +#endif
>>>>>> 
>>>>>>        /* Remove it from the job array */
>>>>>>        opal_pointer_array_set_item(jdat->procs, name_item->vpid, NULL);
>>>>>> @@ -2034,6 +2065,7 @@
>>>>>> 
>>>>>>        OBJ_RELEASE(pdat);
>>>>>> 
>>>>>> +#if ORTE_RESIL_ORTE
>>>>>>        /* Create a new proc object that will keep track of the epoch
>>>>>>         * information */
>>>>>>        pdat = OBJ_NEW(orte_proc_t);
>>>>>> @@ -2041,14 +2073,15 @@
>>>>>>        pdat->name.vpid = name_item->vpid;
>>>>>>        pdat->name.epoch = name_item->epoch + 1;
>>>>>> 
>>>>>> -            /* Set the state as terminated so we'll know the process 
>>>>>> isn't
>>>>>> -             * actually there. */
>>>>>> -            pdat->state = ORTE_PROC_STATE_TERMINATED;
>>>>>> -
>>>>>>        opal_pointer_array_set_item(jdat->procs, name_item->vpid, pdat);
>>>>>>        jdat->num_procs++;
>>>>>>        jdat->num_terminated++;
>>>>>> +#endif
>>>>>> +            /* Set the state as terminated so we'll know the process 
>>>>>> isn't
>>>>>> +             * actually there. */
>>>>>> +            pdat->state = ORTE_PROC_STATE_TERMINATED;
>>>>>>    } else {
>>>>>> +#if ORTE_RESIL_ORTE
>>>>>>        opal_output(0, "Proc data not found for %s", 
>>>>>> ORTE_NAME_PRINT(name_item));
>>>>>>        /* Create a new proc object that will keep track of the epoch
>>>>>>         * information */
>>>>>> @@ -2064,11 +2097,13 @@
>>>>>>        opal_pointer_array_set_item(jdat->procs, name_item->vpid, pdat);
>>>>>>        jdat->num_procs++;
>>>>>>        jdat->num_terminated++;
>>>>>> +#endif
>>>>>>    }
>>>>>> 
>>>>>>    check_job_complete(jdat);
>>>>>> }
>>>>>> 
>>>>>> +#if ORTE_RESIL_ORTE
>>>>>> if (!orte_orteds_term_ordered) {
>>>>>>    /* Need to update the orted routing module. */
>>>>>>    orte_routed.update_routing_tree(ORTE_PROC_MY_NAME->jobid);
>>>>>> @@ -2077,10 +2112,12 @@
>>>>>>        (*fault_cbfunc)(dead_procs);
>>>>>>    }
>>>>>> }
>>>>>> +#endif
>>>>>> 
>>>>>> return ORTE_SUCCESS;
>>>>>> }
>>>>>> 
>>>>>> +#if ORTE_RESIL_ORTE
>>>>>> int send_to_local_applications(opal_pointer_array_t *dead_names) {
>>>>>> opal_buffer_t *buf;
>>>>>> int ret = ORTE_SUCCESS;
>>>>>> @@ -2121,3 +2158,5 @@
>>>>>> 
>>>>>> return ret;
>>>>>> }
>>>>>> +#endif
>>>>>> +
>>>>>> 
>>>>>> Modified: trunk/orte/mca/errmgr/hnp/errmgr_hnp_autor.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/mca/errmgr/hnp/errmgr_hnp_autor.c (original)
>>>>>> +++ trunk/orte/mca/errmgr/hnp/errmgr_hnp_autor.c 2011-08-26 18:16:14 EDT 
>>>>>> (Fri, 26 Aug 2011)
>>>>>> @@ -522,7 +522,7 @@
>>>>>> wp_item = OBJ_NEW(errmgr_autor_wp_item_t);
>>>>>> wp_item->name.jobid = proc->jobid;
>>>>>> wp_item->name.vpid = proc->vpid;
>>>>>> -    wp_item->name.epoch = proc->epoch;
>>>>>> +    ORTE_EPOCH_SET(wp_item->name.epoch,proc->epoch);
>>>>>> wp_item->state = state;
>>>>>> 
>>>>>> opal_list_append(procs_pending_recovery, &(wp_item->super));
>>>>>> @@ -626,7 +626,7 @@
>>>>>> {
>>>>>> wp->name.jobid = ORTE_JOBID_INVALID;
>>>>>> wp->name.vpid  = ORTE_VPID_INVALID;
>>>>>> -    wp->name.epoch = ORTE_EPOCH_MIN;
>>>>>> +    ORTE_EPOCH_SET(wp->name.epoch,ORTE_EPOCH_MIN);
>>>>>> 
>>>>>> wp->state = 0;
>>>>>> }
>>>>>> @@ -635,7 +635,7 @@
>>>>>> {
>>>>>> wp->name.jobid = ORTE_JOBID_INVALID;
>>>>>> wp->name.vpid  = ORTE_VPID_INVALID;
>>>>>> -    wp->name.epoch = ORTE_EPOCH_INVALID;
>>>>>> +    ORTE_EPOCH_SET(wp->name.epoch,ORTE_EPOCH_INVALID);
>>>>>> 
>>>>>> wp->state = 0;
>>>>>> }
>>>>>> 
>>>>>> Modified: trunk/orte/mca/errmgr/hnp/errmgr_hnp_crmig.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/mca/errmgr/hnp/errmgr_hnp_crmig.c (original)
>>>>>> +++ trunk/orte/mca/errmgr/hnp/errmgr_hnp_crmig.c 2011-08-26 18:16:14 EDT 
>>>>>> (Fri, 26 Aug 2011)
>>>>>> @@ -750,7 +750,7 @@
>>>>>>            close_iof_stdin = true;
>>>>>>            iof_name.jobid = proc->name.jobid;
>>>>>>            iof_name.vpid = proc->name.vpid;
>>>>>> -                iof_name.epoch = proc->name.epoch;
>>>>>> +                ORTE_EPOCH_SET(iof_name.epoch,proc->name.epoch);
>>>>>>        }
>>>>>>    }
>>>>>> }
>>>>>> @@ -807,7 +807,7 @@
>>>>>>                close_iof_stdin = true;
>>>>>>                iof_name.jobid = proc->name.jobid;
>>>>>>                iof_name.vpid = proc->name.vpid;
>>>>>> -                    iof_name.epoch = proc->name.epoch;
>>>>>> +                    ORTE_EPOCH_SET(iof_name.epoch,proc->name.epoch);
>>>>>>            }
>>>>>>        }
>>>>>>    }
>>>>>> @@ -855,7 +855,7 @@
>>>>>>                close_iof_stdin = true;
>>>>>>                iof_name.jobid = proc->name.jobid;
>>>>>>                iof_name.vpid = proc->name.vpid;
>>>>>> -                    iof_name.epoch = proc->name.epoch;
>>>>>> +                    ORTE_EPOCH_SET(iof_name.epoch,proc->name.epoch);
>>>>>>            }
>>>>>>        }
>>>>>>    }
>>>>>> 
>>>>>> Modified: trunk/orte/mca/errmgr/orted/errmgr_orted.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/mca/errmgr/orted/errmgr_orted.c   (original)
>>>>>> +++ trunk/orte/mca/errmgr/orted/errmgr_orted.c   2011-08-26 18:16:14 EDT 
>>>>>> (Fri, 26 Aug 2011)
>>>>>> @@ -34,6 +34,7 @@
>>>>>> #include "orte/util/show_help.h"
>>>>>> #include "orte/util/nidmap.h"
>>>>>> #include "orte/runtime/orte_globals.h"
>>>>>> +#include "orte/runtime/data_type_support/orte_dt_support.h"
>>>>>> #include "orte/mca/rml/rml.h"
>>>>>> #include "orte/mca/odls/odls.h"
>>>>>> #include "orte/mca/odls/base/base.h"
>>>>>> @@ -41,7 +42,9 @@
>>>>>> #include "orte/mca/plm/plm_types.h"
>>>>>> #include "orte/mca/routed/routed.h"
>>>>>> #include "orte/mca/sensor/sensor.h"
>>>>>> +#include "orte/mca/ess/ess.h"
>>>>>> #include "orte/runtime/orte_quit.h"
>>>>>> +#include "orte/runtime/orte_globals.h"
>>>>>> 
>>>>>> #include "orte/mca/errmgr/errmgr.h"
>>>>>> #include "orte/mca/errmgr/base/base.h"
>>>>>> @@ -59,13 +62,15 @@
>>>>>> static void update_local_children(orte_odls_job_t *jobdat,
>>>>>>                              orte_job_state_t jobstate,
>>>>>>                              orte_proc_state_t state);
>>>>>> -static void killprocs(orte_jobid_t job, orte_vpid_t vpid, orte_epoch_t 
>>>>>> epoch);
>>>>>> +static void killprocs(orte_jobid_t job, orte_vpid_t vpid);
>>>>>> static int record_dead_process(orte_process_name_t *proc);
>>>>>> -static int send_to_local_applications(opal_pointer_array_t *dead_names);
>>>>>> static int mark_processes_as_dead(opal_pointer_array_t *dead_procs);
>>>>>> +#if ORTE_RESIL_ORTE
>>>>>> +static int send_to_local_applications(opal_pointer_array_t *dead_names);
>>>>>> static void failure_notification(int status, orte_process_name_t* sender,
>>>>>>                             opal_buffer_t *buffer, orte_rml_tag_t tag,
>>>>>>                             void* cbdata);
>>>>>> +#endif
>>>>>> 
>>>>>> /*
>>>>>> * Module functions: Global
>>>>>> @@ -104,8 +109,10 @@
>>>>>> predicted_fault,
>>>>>> suggest_map_targets,
>>>>>> ft_event,
>>>>>> -    orte_errmgr_base_register_migration_warning,
>>>>>> -    orte_errmgr_base_set_fault_callback  /* Set callback function */
>>>>>> +    orte_errmgr_base_register_migration_warning
>>>>>> +#if ORTE_RESIL_ORTE
>>>>>> +    ,orte_errmgr_base_set_fault_callback  /* Set callback function */
>>>>>> +#endif
>>>>>> };
>>>>>> 
>>>>>> /************************
>>>>>> @@ -113,16 +120,22 @@
>>>>>> ************************/
>>>>>> static int init(void)
>>>>>> {
>>>>>> -    int ret;
>>>>>> +    int ret = ORTE_SUCCESS;
>>>>>> 
>>>>>> +#if ORTE_RESIL_ORTE
>>>>>> ret = orte_rml.recv_buffer_nb(ORTE_NAME_WILDCARD, 
>>>>>> ORTE_RML_TAG_FAILURE_NOTICE,
>>>>>>                              ORTE_RML_PERSISTENT, failure_notification, 
>>>>>> NULL);
>>>>>> +#endif
>>>>>> +
>>>>>> return ret;
>>>>>> }
>>>>>> 
>>>>>> static int finalize(void)
>>>>>> {
>>>>>> +#if ORTE_RESIL_ORTE
>>>>>> orte_rml.recv_cancel(ORTE_NAME_WILDCARD, ORTE_RML_TAG_FAILURE_NOTICE);
>>>>>> +#endif
>>>>>> +
>>>>>> return ORTE_SUCCESS;
>>>>>> }
>>>>>> 
>>>>>> @@ -228,10 +241,10 @@
>>>>>>        /* update all procs in job */
>>>>>>        update_local_children(jobdat, jobstate, 
>>>>>> ORTE_PROC_STATE_SENSOR_BOUND_EXCEEDED);
>>>>>>        /* order all local procs for this job to be killed */
>>>>>> -            killprocs(jobdat->jobid, ORTE_VPID_WILDCARD, 
>>>>>> ORTE_EPOCH_WILDCARD);
>>>>>> +            killprocs(jobdat->jobid, ORTE_VPID_WILDCARD);
>>>>>>    case ORTE_JOB_STATE_COMM_FAILED:
>>>>>>        /* kill all local procs */
>>>>>> -            killprocs(ORTE_JOBID_WILDCARD, ORTE_VPID_WILDCARD, 
>>>>>> ORTE_EPOCH_WILDCARD);
>>>>>> +            killprocs(ORTE_JOBID_WILDCARD, ORTE_VPID_WILDCARD);
>>>>>>        /* tell the caller we can't recover */
>>>>>>        return ORTE_ERR_UNRECOVERABLE;
>>>>>>        break;
>>>>>> @@ -276,7 +289,7 @@
>>>>>>    /* see if this was a lifeline */
>>>>>>    if (ORTE_SUCCESS != orte_routed.route_lost(proc)) {
>>>>>>        /* kill our children */
>>>>>> -            killprocs(ORTE_JOBID_WILDCARD, ORTE_VPID_WILDCARD, 
>>>>>> ORTE_EPOCH_WILDCARD);
>>>>>> +            killprocs(ORTE_JOBID_WILDCARD, ORTE_VPID_WILDCARD);
>>>>>>        /* terminate - our routed children will see
>>>>>>         * us leave and automatically die
>>>>>>         */
>>>>>> @@ -290,10 +303,18 @@
>>>>>>        if (0 == orte_routed.num_routes() &&
>>>>>>                0 == opal_list_get_size(&orte_local_children)) {
>>>>>>            orte_quit();
>>>>>> +            } else {
>>>>>> +                OPAL_OUTPUT_VERBOSE((5, orte_errmgr_base.output,
>>>>>> +                            "%s errmgr:orted not exiting, num_routes() 
>>>>>> == %d, num children == %d",
>>>>>> +                            ORTE_NAME_PRINT(ORTE_PROC_MY_NAME),
>>>>>> +                            orte_routed.num_routes(),
>>>>>> +                            opal_list_get_size(&orte_local_children)));
>>>>>>        }
>>>>>>    }
>>>>>> 
>>>>>> +#if ORTE_RESIL_ORTE
>>>>>>    record_dead_process(proc);
>>>>>> +#endif
>>>>>> 
>>>>>>    /* if not, then indicate we can continue */
>>>>>>    return ORTE_SUCCESS;
>>>>>> @@ -344,7 +365,7 @@
>>>>>>                /* Decrement the number of local procs */
>>>>>>                jobdat->num_local_procs--;
>>>>>>                /* kill this proc */
>>>>>> -                    killprocs(proc->jobid, proc->vpid, proc->epoch);
>>>>>> +                    killprocs(proc->jobid, proc->vpid);
>>>>>>            }
>>>>>>            app = 
>>>>>> (orte_app_context_t*)opal_pointer_array_get_item(&jobdat->apps, 
>>>>>> child->app_idx);
>>>>>>            if( jobdat->enable_recovery && child->restarts < 
>>>>>> app->max_restarts ) {
>>>>>> @@ -526,10 +547,12 @@
>>>>>>                    ORTE_ERROR_LOG(rc);
>>>>>>                    goto FINAL_CLEANUP;
>>>>>>                }
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>>                if (ORTE_SUCCESS != (rc = opal_dss.pack(alert, 
>>>>>> &child->name->epoch, 1, ORTE_EPOCH))) {
>>>>>>                    ORTE_ERROR_LOG(rc);
>>>>>>                    goto FINAL_CLEANUP;
>>>>>>                }
>>>>>> +#endif
>>>>>>            }
>>>>>>        }
>>>>>>        /* pack an invalid marker */
>>>>>> @@ -660,7 +683,7 @@
>>>>>>        continue;
>>>>>>    }
>>>>>> 
>>>>>> -        if (name_item->epoch < orte_util_lookup_epoch(name_item)) {
>>>>>> +        if (0 < 
>>>>>> ORTE_EPOCH_CMP(name_item->epoch,orte_ess.proc_get_epoch(name_item))) {
>>>>>>        continue;
>>>>>>    }
>>>>>> 
>>>>>> @@ -669,9 +692,11 @@
>>>>>>                         ORTE_NAME_PRINT(ORTE_PROC_MY_NAME),
>>>>>>                         ORTE_NAME_PRINT(name_item)));
>>>>>> 
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>>    /* Increment the epoch */
>>>>>>    orte_util_set_proc_state(name_item, ORTE_PROC_STATE_TERMINATED);
>>>>>>    orte_util_set_epoch(name_item, name_item->epoch + 1);
>>>>>> +#endif
>>>>>> 
>>>>>>    OPAL_THREAD_LOCK(&orte_odls_globals.mutex);
>>>>>> 
>>>>>> @@ -706,6 +731,7 @@
>>>>>> return ORTE_SUCCESS;
>>>>>> }
>>>>>> 
>>>>>> +#if ORTE_RESIL_ORTE
>>>>>> static void failure_notification(int status, orte_process_name_t* sender,
>>>>>>                             opal_buffer_t *buffer, orte_rml_tag_t tag,
>>>>>>                             void* cbdata)
>>>>>> @@ -714,7 +740,7 @@
>>>>>> orte_std_cntr_t n;
>>>>>> int ret = ORTE_SUCCESS, num_failed;
>>>>>> int32_t i;
>>>>>> -    orte_process_name_t *name_item, proc;
>>>>>> +    orte_process_name_t *name_item;
>>>>>> 
>>>>>> dead_names = OBJ_NEW(opal_pointer_array_t);
>>>>>> 
>>>>>> @@ -746,7 +772,7 @@
>>>>>>    /* There shouldn't be an issue of receiving this message multiple 
>>>>>>     * times but it doesn't hurt to double check.
>>>>>>     */
>>>>>> -        if (proc.epoch < orte_util_lookup_epoch(name_item)) {
>>>>>> +        if (0 < 
>>>>>> ORTE_EPOCH_CMP(name_item->epoch,orte_ess.proc_get_epoch(name_item))) {
>>>>>>        opal_output(1, "Received from proc %s local epoch %d", 
>>>>>> ORTE_NAME_PRINT(name_item), orte_util_lookup_epoch(name_item));
>>>>>>        continue;
>>>>>>    }
>>>>>> @@ -767,6 +793,7 @@
>>>>>>    free(name_item);
>>>>>> }
>>>>>> }
>>>>>> +#endif
>>>>>> 
>>>>>> /*****************
>>>>>> * Local Functions
>>>>>> @@ -948,11 +975,13 @@
>>>>>>            ORTE_ERROR_LOG(rc);
>>>>>>            return rc;
>>>>>>        }            
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>>        /* Pack the child's epoch. */
>>>>>>        if (ORTE_SUCCESS != (rc = opal_dss.pack(buf, 
>>>>>> &(child->name->epoch), 1, ORTE_EPOCH))) {
>>>>>>            ORTE_ERROR_LOG(rc);
>>>>>>            return rc;
>>>>>>        }
>>>>>> +#endif
>>>>>>        /* pack the contact info */
>>>>>>        if (ORTE_SUCCESS != (rc = opal_dss.pack(buf, &child->rml_uri, 1, 
>>>>>> OPAL_STRING))) {
>>>>>>            ORTE_ERROR_LOG(rc);
>>>>>> @@ -1015,7 +1044,7 @@
>>>>>> }
>>>>>> }
>>>>>> 
>>>>>> -static void killprocs(orte_jobid_t job, orte_vpid_t vpid, orte_epoch_t 
>>>>>> epoch)
>>>>>> +static void killprocs(orte_jobid_t job, orte_vpid_t vpid)
>>>>>> {
>>>>>> opal_pointer_array_t cmd;
>>>>>> orte_proc_t proc;
>>>>>> @@ -1026,7 +1055,9 @@
>>>>>>    orte_sensor.stop(job);
>>>>>> }
>>>>>> 
>>>>>> -    if (ORTE_JOBID_WILDCARD == job && ORTE_VPID_WILDCARD == vpid && 
>>>>>> ORTE_EPOCH_WILDCARD == epoch) {
>>>>>> +    if (ORTE_JOBID_WILDCARD == job 
>>>>>> +        && ORTE_VPID_WILDCARD == vpid 
>>>>>> +        && 0 == ORTE_EPOCH_CMP(ORTE_EPOCH_WILDCARD,epoch)) {
>>>>>>    if (ORTE_SUCCESS != (rc = orte_odls.kill_local_procs(NULL))) {
>>>>>>        ORTE_ERROR_LOG(rc);
>>>>>>    }
>>>>>> @@ -1037,7 +1068,7 @@
>>>>>> OBJ_CONSTRUCT(&proc, orte_proc_t);
>>>>>> proc.name.jobid = job;
>>>>>> proc.name.vpid = vpid;
>>>>>> -    proc.name.epoch = epoch;
>>>>>> +    ORTE_EPOCH_SET(proc.name.epoch,epoch);
>>>>>> opal_pointer_array_add(&cmd, &proc);
>>>>>> if (ORTE_SUCCESS != (rc = orte_odls.kill_local_procs(&cmd))) {
>>>>>>    ORTE_ERROR_LOG(rc);
>>>>>> @@ -1082,20 +1113,21 @@
>>>>>> return rc;
>>>>>> }
>>>>>> 
>>>>>> +#if ORTE_RESIL_ORTE
>>>>>> int send_to_local_applications(opal_pointer_array_t *dead_names) {
>>>>>> opal_buffer_t *buf;
>>>>>> int ret;
>>>>>> orte_process_name_t *name_item;
>>>>>> int size, i;
>>>>>> 
>>>>>> -    OPAL_OUTPUT_VERBOSE((10, orte_errmgr_base.output,
>>>>>> -                "%s Sending failure to local applications.",
>>>>>> -                ORTE_NAME_PRINT(ORTE_PROC_MY_NAME)));
>>>>>> -    
>>>>>> buf = OBJ_NEW(opal_buffer_t);
>>>>>> 
>>>>>> size = opal_pointer_array_get_size(dead_names);
>>>>>> 
>>>>>> +    OPAL_OUTPUT_VERBOSE((10, orte_errmgr_base.output,
>>>>>> +                "%s Sending %d failure(s) to local applications.",
>>>>>> +                ORTE_NAME_PRINT(ORTE_PROC_MY_NAME), size));
>>>>>> +    
>>>>>> if (ORTE_SUCCESS != (ret = opal_dss.pack(buf, &size, 1, ORTE_VPID))) {
>>>>>>    ORTE_ERROR_LOG(ret);
>>>>>>    OBJ_RELEASE(buf);
>>>>>> @@ -1122,4 +1154,5 @@
>>>>>> 
>>>>>> return ORTE_SUCCESS;
>>>>>> }
>>>>>> +#endif
>>>>>> 
>>>>>> 
>>>>>> Modified: trunk/orte/mca/ess/alps/ess_alps_module.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/mca/ess/alps/ess_alps_module.c    (original)
>>>>>> +++ trunk/orte/mca/ess/alps/ess_alps_module.c    2011-08-26 18:16:14 EDT 
>>>>>> (Fri, 26 Aug 2011)
>>>>>> @@ -363,8 +363,8 @@
>>>>>> 
>>>>>> ORTE_PROC_MY_NAME->jobid = jobid;
>>>>>> ORTE_PROC_MY_NAME->vpid = (orte_vpid_t) cnos_get_rank() + starting_vpid;
>>>>>> -    ORTE_PROC_MY_NAME->epoch = ORTE_EPOCH_INVALID;
>>>>>> -    ORTE_PROC_MY_NAME->epoch = 
>>>>>> orte_ess.proc_get_epoch(ORTE_PROC_MY_NAME);
>>>>>> +    ORTE_EPOCH_PRINT(ORTE_PROC_MY_NAME->epoch,ORTE_EPOCH_INVALID);
>>>>>> +    
>>>>>> ORTE_EPOCH_PRINT(ORTE_PROC_MY_NAME->epoch,orte_ess.proc_get_epoch(ORTE_PROC_MY_NAME));
>>>>>> 
>>>>>> OPAL_OUTPUT_VERBOSE((1, orte_ess_base_output,
>>>>>>                     "ess:alps set name to %s", 
>>>>>> ORTE_NAME_PRINT(ORTE_PROC_MY_NAME)));
>>>>>> 
>>>>>> Modified: trunk/orte/mca/ess/base/base.h
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/mca/ess/base/base.h       (original)
>>>>>> +++ trunk/orte/mca/ess/base/base.h       2011-08-26 18:16:14 EDT (Fri, 
>>>>>> 26 Aug 2011)
>>>>>> @@ -57,7 +57,11 @@
>>>>>> 
>>>>>> ORTE_DECLSPEC extern opal_list_t orte_ess_base_components_available;
>>>>>> 
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>> ORTE_DECLSPEC orte_epoch_t 
>>>>>> orte_ess_base_proc_get_epoch(orte_process_name_t *proc);
>>>>>> +#else
>>>>>> +ORTE_DECLSPEC int orte_ess_base_proc_get_epoch(orte_process_name_t 
>>>>>> *proc);
>>>>>> +#endif
>>>>>> 
>>>>>> #if !ORTE_DISABLE_FULL_SUPPORT
>>>>>> 
>>>>>> 
>>>>>> Modified: trunk/orte/mca/ess/base/ess_base_select.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/mca/ess/base/ess_base_select.c    (original)
>>>>>> +++ trunk/orte/mca/ess/base/ess_base_select.c    2011-08-26 18:16:14 EDT 
>>>>>> (Fri, 26 Aug 2011)
>>>>>> @@ -36,21 +36,19 @@
>>>>>> * Generic function to retrieve the epoch of a specific process 
>>>>>> * from the job data.
>>>>>> */
>>>>>> +#if !ORTE_ENABLE_EPOCH
>>>>>> +int orte_ess_base_proc_get_epoch(orte_process_name_t *proc) {
>>>>>> +    return 0;
>>>>>> +}
>>>>>> +#else
>>>>>> orte_epoch_t orte_ess_base_proc_get_epoch(orte_process_name_t *proc) {
>>>>>> orte_epoch_t epoch = ORTE_EPOCH_INVALID;
>>>>>> 
>>>>>> -#if !ORTE_DISABLE_FULL_SUPPORT
>>>>>> epoch = orte_util_lookup_epoch(proc);
>>>>>> -#endif
>>>>>> -
>>>>>> -    OPAL_OUTPUT_VERBOSE((2, orte_ess_base_output,
>>>>>> -                         "%s ess:generic: proc %s has epoch %d",
>>>>>> -                         ORTE_NAME_PRINT(ORTE_PROC_MY_NAME),
>>>>>> -                         ORTE_NAME_PRINT(proc),
>>>>>> -                         epoch));
>>>>>> 
>>>>>> return epoch;
>>>>>> }
>>>>>> +#endif
>>>>>> 
>>>>>> int 
>>>>>> orte_ess_base_select(void)
>>>>>> 
>>>>>> Modified: trunk/orte/mca/ess/env/ess_env_module.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/mca/ess/env/ess_env_module.c      (original)
>>>>>> +++ trunk/orte/mca/ess/env/ess_env_module.c      2011-08-26 18:16:14 EDT 
>>>>>> (Fri, 26 Aug 2011)
>>>>>> @@ -392,8 +392,7 @@
>>>>>> 
>>>>>> ORTE_PROC_MY_NAME->jobid = jobid;
>>>>>> ORTE_PROC_MY_NAME->vpid = vpid;
>>>>>> -    ORTE_PROC_MY_NAME->epoch = ORTE_EPOCH_INVALID;
>>>>>> -    ORTE_PROC_MY_NAME->epoch = 
>>>>>> orte_ess.proc_get_epoch(ORTE_PROC_MY_NAME);
>>>>>> +    
>>>>>> ORTE_EPOCH_SET(ORTE_PROC_MY_NAME->epoch,orte_ess.proc_get_epoch(ORTE_PROC_MY_NAME));
>>>>>> 
>>>>>> OPAL_OUTPUT_VERBOSE((1, orte_ess_base_output,
>>>>>>                     "ess:env set name to %s", 
>>>>>> ORTE_NAME_PRINT(ORTE_PROC_MY_NAME)));
>>>>>> 
>>>>>> Modified: trunk/orte/mca/ess/ess.h
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/mca/ess/ess.h     (original)
>>>>>> +++ trunk/orte/mca/ess/ess.h     2011-08-26 18:16:14 EDT (Fri, 26 Aug 
>>>>>> 2011)
>>>>>> @@ -111,7 +111,11 @@
>>>>>> * will get the most up to date version stored within the orte_proc_t 
>>>>>> struct.
>>>>>> * Obviously the epoch of the proc that is passed in will be ignored.
>>>>>> */
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>> typedef orte_epoch_t 
>>>>>> (*orte_ess_base_module_proc_get_epoch_fn_t)(orte_process_name_t *proc);
>>>>>> +#else
>>>>>> +typedef int 
>>>>>> (*orte_ess_base_module_proc_get_epoch_fn_t)(orte_process_name_t *proc);
>>>>>> +#endif
>>>>>> 
>>>>>> /**
>>>>>> * Update the pidmap
>>>>>> 
>>>>>> Modified: trunk/orte/mca/ess/generic/ess_generic_module.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/mca/ess/generic/ess_generic_module.c      (original)
>>>>>> +++ trunk/orte/mca/ess/generic/ess_generic_module.c      2011-08-26 
>>>>>> 18:16:14 EDT (Fri, 26 Aug 2011)
>>>>>> @@ -155,7 +155,7 @@
>>>>>>    goto error;
>>>>>> }
>>>>>> ORTE_PROC_MY_NAME->vpid = strtol(envar, NULL, 10);
>>>>>> -    ORTE_PROC_MY_NAME->epoch = ORTE_EPOCH_MIN;
>>>>>> +    ORTE_EPOCH_SET(ORTE_PROC_MY_NAME->epoch,ORTE_EPOCH_MIN);
>>>>>> 
>>>>>> OPAL_OUTPUT_VERBOSE((1, orte_ess_base_output,
>>>>>>                     "%s completed name definition",
>>>>>> @@ -273,7 +273,7 @@
>>>>>>            if (vpid == ORTE_PROC_MY_NAME->vpid) {
>>>>>>                ORTE_PROC_MY_DAEMON->jobid = 0;
>>>>>>                ORTE_PROC_MY_DAEMON->vpid = i;
>>>>>> -                    ORTE_PROC_MY_DAEMON->epoch = 
>>>>>> ORTE_PROC_MY_NAME->epoch;
>>>>>> +                    
>>>>>> ORTE_EPOCH_SET(ORTE_PROC_MY_DAEMON->epoch,ORTE_PROC_MY_NAME->epoch);
>>>>>>            }
>>>>>>            OPAL_OUTPUT_VERBOSE((1, orte_ess_base_output,
>>>>>>                                 "%s node %d name %s rank %s",
>>>>>> @@ -304,7 +304,7 @@
>>>>>>                if (vpid == ORTE_PROC_MY_NAME->vpid) {
>>>>>>                    ORTE_PROC_MY_DAEMON->jobid = 0;
>>>>>>                    ORTE_PROC_MY_DAEMON->vpid = i;
>>>>>> -                        ORTE_PROC_MY_DAEMON->epoch = 
>>>>>> ORTE_PROC_MY_NAME->epoch;
>>>>>> +                        
>>>>>> ORTE_EPOCH_SET(ORTE_PROC_MY_DAEMON->epoch,ORTE_PROC_MY_NAME->epoch);
>>>>>>                }
>>>>>>                OPAL_OUTPUT_VERBOSE((1, orte_ess_base_output,
>>>>>>                                     "%s node %d name %s rank %d",
>>>>>> 
>>>>>> Modified: trunk/orte/mca/ess/hnp/ess_hnp_module.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/mca/ess/hnp/ess_hnp_module.c      (original)
>>>>>> +++ trunk/orte/mca/ess/hnp/ess_hnp_module.c      2011-08-26 18:16:14 EDT 
>>>>>> (Fri, 26 Aug 2011)
>>>>>> @@ -494,7 +494,7 @@
>>>>>> proc = OBJ_NEW(orte_proc_t);
>>>>>> proc->name.jobid = ORTE_PROC_MY_NAME->jobid;
>>>>>> proc->name.vpid = ORTE_PROC_MY_NAME->vpid;
>>>>>> -    proc->name.epoch = ORTE_EPOCH_MIN;
>>>>>> +    ORTE_EPOCH_SET(proc->name.epoch,ORTE_EPOCH_MIN);
>>>>>> 
>>>>>> proc->pid = orte_process_info.pid;
>>>>>> proc->rml_uri = orte_rml.get_contact_info();
>>>>>> 
>>>>>> Modified: trunk/orte/mca/ess/lsf/ess_lsf_module.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/mca/ess/lsf/ess_lsf_module.c      (original)
>>>>>> +++ trunk/orte/mca/ess/lsf/ess_lsf_module.c      2011-08-26 18:16:14 EDT 
>>>>>> (Fri, 26 Aug 2011)
>>>>>> @@ -357,8 +357,7 @@
>>>>>> 
>>>>>> ORTE_PROC_MY_NAME->jobid = jobid;
>>>>>> ORTE_PROC_MY_NAME->vpid = vpid;
>>>>>> -    ORTE_PROC_MY_NAME->epoch = ORTE_EPOCH_INVALID;
>>>>>> -    ORTE_PROC_MY_NAME->epoch = 
>>>>>> orte_ess.proc_get_epoch(ORTE_PROC_MY_NAME);
>>>>>> +    
>>>>>> ORTE_EPOCH_SET(ORTE_PROC_MY_NAME->epoch,orte_ess.proc_get_epoch(ORTE_PROC_MY_NAME));
>>>>>> 
>>>>>> /* fix up the base name and make it the "real" name */
>>>>>> lsf_nodeid = atoi(getenv("LSF_PM_TASKID"));
>>>>>> 
>>>>>> Modified: trunk/orte/mca/ess/singleton/ess_singleton_module.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/mca/ess/singleton/ess_singleton_module.c  (original)
>>>>>> +++ trunk/orte/mca/ess/singleton/ess_singleton_module.c  2011-08-26 
>>>>>> 18:16:14 EDT (Fri, 26 Aug 2011)
>>>>>> @@ -188,7 +188,7 @@
>>>>>>    /* set the name */
>>>>>>    ORTE_PROC_MY_NAME->jobid = 0xffff0000 & ((uint32_t)jobfam << 16);
>>>>>>    ORTE_PROC_MY_NAME->vpid = 0;
>>>>>> -        ORTE_PROC_MY_NAME->epoch = ORTE_EPOCH_MIN;
>>>>>> +        ORTE_EPOCH_SET(ORTE_PROC_MY_NAME->epoch,ORTE_EPOCH_MIN);
>>>>>> 
>>>>>> } else {
>>>>>>    /*
>>>>>> 
>>>>>> Modified: trunk/orte/mca/ess/slave/ess_slave_module.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/mca/ess/slave/ess_slave_module.c  (original)
>>>>>> +++ trunk/orte/mca/ess/slave/ess_slave_module.c  2011-08-26 18:16:14 EDT 
>>>>>> (Fri, 26 Aug 2011)
>>>>>> @@ -280,8 +280,7 @@
>>>>>> 
>>>>>> ORTE_PROC_MY_NAME->jobid = jobid;
>>>>>> ORTE_PROC_MY_NAME->vpid = vpid;
>>>>>> -    ORTE_PROC_MY_NAME->epoch = ORTE_EPOCH_INVALID;
>>>>>> -    ORTE_PROC_MY_NAME->epoch = 
>>>>>> orte_ess.proc_get_epoch(ORTE_PROC_MY_NAME);
>>>>>> +    
>>>>>> ORTE_EPOCH_SET(ORTE_PROC_MY_NAME->epoch,orte_ess.proc_get_epoch(ORTE_PROC_MY_NAME));
>>>>>> 
>>>>>> OPAL_OUTPUT_VERBOSE((1, orte_ess_base_output,
>>>>>>                     "ess:slave set name to %s", 
>>>>>> ORTE_NAME_PRINT(ORTE_PROC_MY_NAME)));
>>>>>> 
>>>>>> Modified: trunk/orte/mca/ess/slurm/ess_slurm_module.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/mca/ess/slurm/ess_slurm_module.c  (original)
>>>>>> +++ trunk/orte/mca/ess/slurm/ess_slurm_module.c  2011-08-26 18:16:14 EDT 
>>>>>> (Fri, 26 Aug 2011)
>>>>>> @@ -368,8 +368,7 @@
>>>>>> /* fix up the vpid and make it the "real" vpid */
>>>>>> slurm_nodeid = atoi(getenv("SLURM_NODEID"));
>>>>>> ORTE_PROC_MY_NAME->vpid = vpid + slurm_nodeid;
>>>>>> -    ORTE_PROC_MY_NAME->epoch = ORTE_EPOCH_INVALID;
>>>>>> -    ORTE_PROC_MY_NAME->epoch = 
>>>>>> orte_ess.proc_get_epoch(ORTE_PROC_MY_NAME);
>>>>>> +    
>>>>>> ORTE_EPOCH_SET(ORTE_PROC_MY_NAME->epoch,orte_ess.proc_get_epoch(ORTE_PROC_MY_NAME));
>>>>>> 
>>>>>> OPAL_OUTPUT_VERBOSE((1, orte_ess_base_output,
>>>>>>                     "ess:slurm set name to %s", 
>>>>>> ORTE_NAME_PRINT(ORTE_PROC_MY_NAME)));
>>>>>> 
>>>>>> Modified: trunk/orte/mca/ess/slurmd/ess_slurmd_module.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/mca/ess/slurmd/ess_slurmd_module.c        (original)
>>>>>> +++ trunk/orte/mca/ess/slurmd/ess_slurmd_module.c        2011-08-26 
>>>>>> 18:16:14 EDT (Fri, 26 Aug 2011)
>>>>>> @@ -195,7 +195,7 @@
>>>>>> }
>>>>>> ORTE_PROC_MY_NAME->vpid = strtol(envar, NULL, 10);
>>>>>> #endif
>>>>>> -    ORTE_PROC_MY_NAME->epoch = ORTE_EPOCH_MIN;
>>>>>> +    ORTE_EPOCH_SET(ORTE_PROC_MY_NAME->epoch,ORTE_EPOCH_MIN);
>>>>>> /* get our local rank */
>>>>>> if (NULL == (envar = getenv("SLURM_LOCALID"))) {
>>>>>>    error = "could not get SLURM_LOCALID";
>>>>>> @@ -260,7 +260,7 @@
>>>>>> nodeid = strtol(envar, NULL, 10);
>>>>>> ORTE_PROC_MY_DAEMON->jobid = 0;
>>>>>> ORTE_PROC_MY_DAEMON->vpid = nodeid;
>>>>>> -    ORTE_PROC_MY_DAEMON->epoch = ORTE_PROC_MY_NAME->epoch;
>>>>>> +    ORTE_EPOCH_SET(ORTE_PROC_MY_DAEMON->epoch,ORTE_PROC_MY_NAME->epoch);
>>>>>> 
>>>>>> /* get the number of ppn */
>>>>>> if (NULL == (tasks_per_node = getenv("SLURM_STEP_TASKS_PER_NODE"))) {
>>>>>> 
>>>>>> Modified: trunk/orte/mca/ess/tm/ess_tm_module.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/mca/ess/tm/ess_tm_module.c        (original)
>>>>>> +++ trunk/orte/mca/ess/tm/ess_tm_module.c        2011-08-26 18:16:14 EDT 
>>>>>> (Fri, 26 Aug 2011)
>>>>>> @@ -364,7 +364,7 @@
>>>>>> 
>>>>>> ORTE_PROC_MY_NAME->jobid = jobid;
>>>>>> ORTE_PROC_MY_NAME->vpid = vpid;
>>>>>> -    ORTE_PROC_MY_NAME->epoch = 
>>>>>> orte_ess.proc_get_epoch(ORTE_PROC_MY_NAME);
>>>>>> +    
>>>>>> ORTE_EPOCH_SET(ORTE_PROC_MY_NAME->epoch,orte_ess.proc_get_epoch(ORTE_PROC_MY_NAME));
>>>>>> 
>>>>>> OPAL_OUTPUT_VERBOSE((1, orte_ess_base_output,
>>>>>>                     "ess:tm set name to %s", 
>>>>>> ORTE_NAME_PRINT(ORTE_PROC_MY_NAME)));
>>>>>> 
>>>>>> Modified: trunk/orte/mca/filem/rsh/filem_rsh_module.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/mca/filem/rsh/filem_rsh_module.c  (original)
>>>>>> +++ trunk/orte/mca/filem/rsh/filem_rsh_module.c  2011-08-26 18:16:14 EDT 
>>>>>> (Fri, 26 Aug 2011)
>>>>>> @@ -1097,11 +1097,11 @@
>>>>>> if( NULL != proc_set ) {
>>>>>>    wp_item->proc_set.source.jobid = proc_set->source.jobid;
>>>>>>    wp_item->proc_set.source.vpid  = proc_set->source.vpid;
>>>>>> -        wp_item->proc_set.source.epoch = proc_set->source.epoch;
>>>>>> +        
>>>>>> ORTE_EPOCH_SET(wp_item->proc_set.source.epoch,proc_set->source.epoch);
>>>>>> 
>>>>>>    wp_item->proc_set.sink.jobid = proc_set->sink.jobid;
>>>>>>    wp_item->proc_set.sink.vpid  = proc_set->sink.vpid;
>>>>>> -        wp_item->proc_set.sink.epoch = proc_set->sink.epoch;
>>>>>> +        
>>>>>> ORTE_EPOCH_SET(wp_item->proc_set.sink.epoch,proc_set->sink.epoch);
>>>>>> }
>>>>>> /* Copy the File Set */
>>>>>> if( NULL != file_set ) {
>>>>>> @@ -1396,7 +1396,7 @@
>>>>>>        wp_item = OBJ_NEW(orte_filem_rsh_work_pool_item_t);
>>>>>>        wp_item->proc_set.source.jobid = sender->jobid;
>>>>>>        wp_item->proc_set.source.vpid  = sender->vpid;
>>>>>> -            wp_item->proc_set.source.epoch = sender->epoch;
>>>>>> +            
>>>>>> ORTE_EPOCH_SET(wp_item->proc_set.source.epoch,sender->epoch);
>>>>>> 
>>>>>>        opal_list_append(&work_pool_waiting, &(wp_item->super));
>>>>>>    }
>>>>>> 
>>>>>> Modified: trunk/orte/mca/grpcomm/base/grpcomm_base_coll.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/mca/grpcomm/base/grpcomm_base_coll.c      (original)
>>>>>> +++ trunk/orte/mca/grpcomm/base/grpcomm_base_coll.c      2011-08-26 
>>>>>> 18:16:14 EDT (Fri, 26 Aug 2011)
>>>>>> @@ -168,8 +168,7 @@
>>>>>> if (vpids[0] == ORTE_PROC_MY_NAME->vpid) {
>>>>>>    /* I send first */
>>>>>>    peer.vpid = vpids[1];
>>>>>> -        peer.epoch = ORTE_EPOCH_INVALID;
>>>>>> -        peer.epoch = orte_ess.proc_get_epoch(&peer);
>>>>>> +        ORTE_EPOCH_SET(peer.epoch,orte_ess.proc_get_epoch(&peer));
>>>>>> 
>>>>>>    /* setup a temp buffer so I can inform the other side as to the
>>>>>>     * number of entries in my buffer
>>>>>> @@ -226,8 +225,7 @@
>>>>>>    opal_dss.pack(&buf, &num_entries, 1, OPAL_INT32);
>>>>>>    opal_dss.copy_payload(&buf, sendbuf);
>>>>>>    peer.vpid = vpids[0];
>>>>>> -        peer.epoch = ORTE_EPOCH_INVALID;
>>>>>> -        peer.epoch = orte_ess.proc_get_epoch(&peer);
>>>>>> +        ORTE_EPOCH_SET(peer.epoch,orte_ess.proc_get_epoch(&peer));
>>>>>> 
>>>>>>    OPAL_OUTPUT_VERBOSE((5, orte_grpcomm_base.output,
>>>>>>                         "%s grpcomm:coll:two-proc sending to %s",
>>>>>> @@ -320,8 +318,7 @@
>>>>>>    /* first send my current contents */
>>>>>>    nv = (rank - distance + np) % np;
>>>>>>    peer.vpid = vpids[nv];
>>>>>> -        peer.epoch = ORTE_EPOCH_INVALID;
>>>>>> -        peer.epoch = orte_ess.proc_get_epoch(&peer);
>>>>>> +        ORTE_EPOCH_SET(peer.epoch,orte_ess.proc_get_epoch(&peer));
>>>>>> 
>>>>>>    OBJ_CONSTRUCT(&buf, opal_buffer_t);
>>>>>>    opal_dss.pack(&buf, &total_entries, 1, OPAL_INT32);
>>>>>> @@ -340,8 +337,7 @@
>>>>>>    num_recvd = 0;
>>>>>>    nv = (rank + distance) % np;
>>>>>>    peer.vpid = vpids[nv];
>>>>>> -        peer.epoch = ORTE_EPOCH_INVALID;
>>>>>> -        peer.epoch = orte_ess.proc_get_epoch(&peer);
>>>>>> +        ORTE_EPOCH_SET(peer.epoch,orte_ess.proc_get_epoch(&peer));
>>>>>> 
>>>>>>    OBJ_CONSTRUCT(&bucket, opal_buffer_t);
>>>>>>    if (ORTE_SUCCESS != (rc = orte_rml.recv_buffer_nb(&peer,
>>>>>> @@ -439,8 +435,7 @@
>>>>>>    /* first send my current contents */
>>>>>>    nv = rank ^ distance;
>>>>>>    peer.vpid = vpids[nv];
>>>>>> -        peer.epoch = ORTE_EPOCH_INVALID;
>>>>>> -        peer.epoch = orte_ess.proc_get_epoch(&peer);
>>>>>> +        ORTE_EPOCH_SET(peer.epoch,orte_ess.proc_get_epoch(&peer));
>>>>>> 
>>>>>>    OBJ_CONSTRUCT(&buf, opal_buffer_t);
>>>>>>    opal_dss.pack(&buf, &total_entries, 1, OPAL_INT32);
>>>>>> @@ -646,8 +641,7 @@
>>>>>>    proc.jobid = jobid;
>>>>>>    proc.vpid = 0;
>>>>>>    while (proc.vpid < jobdat->num_procs && 0 < 
>>>>>> opal_list_get_size(&daemon_tree)) {
>>>>>> -            proc.epoch = ORTE_EPOCH_INVALID;
>>>>>> -            proc.epoch = orte_ess.proc_get_epoch(&proc);
>>>>>> +            ORTE_EPOCH_SET(proc.epoch,orte_ess.proc_get_epoch(&proc));
>>>>>> 
>>>>>>        /* get the daemon that hosts this proc */
>>>>>>        daemonvpid = orte_ess.proc_get_daemon(&proc);
>>>>>> @@ -713,8 +707,7 @@
>>>>>>    /* send it */
>>>>>>    my_parent.jobid = ORTE_PROC_MY_NAME->jobid;
>>>>>>    my_parent.vpid = orte_routed.get_routing_tree(NULL);
>>>>>> -        my_parent.epoch = ORTE_EPOCH_INVALID;
>>>>>> -        my_parent.epoch = orte_ess.proc_get_epoch(&my_parent);
>>>>>> +        
>>>>>> ORTE_EPOCH_SET(my_parent.epoch,orte_ess.proc_get_epoch(&my_parent));
>>>>>> 
>>>>>>    OPAL_OUTPUT_VERBOSE((5, orte_grpcomm_base.output,
>>>>>>                         "%s grpcomm:base:daemon_coll: daemon collective 
>>>>>> not the HNP - sending to parent %s",
>>>>>> 
>>>>>> Modified: trunk/orte/mca/grpcomm/hier/grpcomm_hier_module.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/mca/grpcomm/hier/grpcomm_hier_module.c    (original)
>>>>>> +++ trunk/orte/mca/grpcomm/hier/grpcomm_hier_module.c    2011-08-26 
>>>>>> 18:16:14 EDT (Fri, 26 Aug 2011)
>>>>>> @@ -95,7 +95,7 @@
>>>>>> 
>>>>>> my_local_rank_zero_proc.jobid = ORTE_PROC_MY_NAME->jobid;
>>>>>> my_local_rank_zero_proc.vpid = ORTE_VPID_INVALID;
>>>>>> -    my_local_rank_zero_proc.epoch = ORTE_EPOCH_MIN;
>>>>>> +    ORTE_EPOCH_SET(my_local_rank_zero_proc.epoch,ORTE_EPOCH_MIN);
>>>>>> 
>>>>>> if (ORTE_SUCCESS != (rc = orte_grpcomm_base_modex_init())) {
>>>>>>    ORTE_ERROR_LOG(rc);
>>>>>> @@ -270,7 +270,7 @@
>>>>>>    proc.jobid = ORTE_PROC_MY_NAME->jobid;
>>>>>>    for (v=0; v < orte_process_info.num_procs; v++) {
>>>>>>        proc.vpid = v;
>>>>>> -            proc.epoch = orte_util_lookup_epoch(&proc);
>>>>>> +            ORTE_EPOCH_SET(proc.epoch,orte_util_lookup_epoch(&proc));
>>>>>> 
>>>>>>        /* is this proc local_rank=0 on its node? */
>>>>>>        if (0 == my_local_rank && 0 == orte_ess.get_local_rank(&proc)) {
>>>>>> @@ -285,7 +285,7 @@
>>>>>>        nm = OBJ_NEW(orte_namelist_t);
>>>>>>        nm->name.jobid = proc.jobid;
>>>>>>        nm->name.vpid = proc.vpid;
>>>>>> -            nm->name.epoch = proc.epoch;
>>>>>> +            ORTE_EPOCH_SET(nm->name.epoch,proc.epoch);
>>>>>> 
>>>>>>        opal_list_append(&my_local_peers, &nm->item);
>>>>>>        /* if I am not local_rank=0, is this one? */
>>>>>> @@ -293,7 +293,7 @@
>>>>>>            0 == orte_ess.get_local_rank(&proc)) {
>>>>>>            my_local_rank_zero_proc.jobid = proc.jobid;
>>>>>>            my_local_rank_zero_proc.vpid = proc.vpid;
>>>>>> -                my_local_rank_zero_proc.epoch = proc.epoch;
>>>>>> +                
>>>>>> ORTE_EPOCH_SET(my_local_rank_zero_proc.epoch,proc.epoch);
>>>>>>        }
>>>>>>    }
>>>>>> 
>>>>>> 
>>>>>> Modified: trunk/orte/mca/iof/base/base.h
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/mca/iof/base/base.h       (original)
>>>>>> +++ trunk/orte/mca/iof/base/base.h       2011-08-26 18:16:14 EDT (Fri, 
>>>>>> 26 Aug 2011)
>>>>>> @@ -135,7 +135,7 @@
>>>>>>    ep = OBJ_NEW(orte_iof_sink_t);                              \
>>>>>>    ep->name.jobid = (nm)->jobid;                               \
>>>>>>    ep->name.vpid = (nm)->vpid;                                 \
>>>>>> -        ep->name.epoch = (nm)->epoch;                               \
>>>>>> +        ORTE_EPOCH_SET(ep->name.epoch,(nm)->epoch);                 \
>>>>>>    ep->tag = (tg);                                             \
>>>>>>    if (0 <= (fid)) {                                           \
>>>>>>        ep->wev->fd = (fid);                                    \
>>>>>> @@ -169,7 +169,7 @@
>>>>>>    rev = OBJ_NEW(orte_iof_read_event_t);                       \
>>>>>>    rev->name.jobid = (nm)->jobid;                              \
>>>>>>    rev->name.vpid = (nm)->vpid;                                \
>>>>>> -        rev->name.epoch = (nm)->epoch;                              \
>>>>>> +        ORTE_EPOCH_SET(rev->name.epoch,(nm)->epoch);                \
>>>>>>    rev->tag = (tg);                                            \
>>>>>>    rev->fd = (fid);                                            \
>>>>>>    *(rv) = rev;                                                \
>>>>>> @@ -194,7 +194,7 @@
>>>>>>    ep = OBJ_NEW(orte_iof_sink_t);                              \
>>>>>>    ep->name.jobid = (nm)->jobid;                               \
>>>>>>    ep->name.vpid = (nm)->vpid;                                 \
>>>>>> -        ep->name.epoch = (nm)->epoch;                               \
>>>>>> +        ORTE_EPOCH_SET(ep->name.epoch,(nm)->epoch);                 \
>>>>>>    ep->tag = (tg);                                             \
>>>>>>    if (0 <= (fid)) {                                           \
>>>>>>        ep->wev->fd = (fid);                                    \
>>>>>> @@ -215,7 +215,7 @@
>>>>>>    rev = OBJ_NEW(orte_iof_read_event_t);                       \
>>>>>>    rev->name.jobid = (nm)->jobid;                              \
>>>>>>    rev->name.vpid = (nm)->vpid;                                \
>>>>>> -        rev->name.epoch= (nm)->epoch;                               \
>>>>>> +        ORTE_EPOCH_SET(rev->name.epoch,(nm)->epoch);                \
>>>>>>    rev->tag = (tg);                                            \
>>>>>>    *(rv) = rev;                                                \
>>>>>>    opal_event_set(opal_event_base,                             \
>>>>>> 
>>>>>> Modified: trunk/orte/mca/iof/base/iof_base_open.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/mca/iof/base/iof_base_open.c      (original)
>>>>>> +++ trunk/orte/mca/iof/base/iof_base_open.c      2011-08-26 18:16:14 EDT 
>>>>>> (Fri, 26 Aug 2011)
>>>>>> @@ -91,7 +91,7 @@
>>>>>> {
>>>>>> ptr->daemon.jobid = ORTE_JOBID_INVALID;
>>>>>> ptr->daemon.vpid = ORTE_VPID_INVALID;
>>>>>> -    ptr->daemon.epoch = ORTE_EPOCH_MIN;
>>>>>> +    ORTE_EPOCH_SET(ptr->daemon.epoch,ORTE_EPOCH_MIN);
>>>>>> ptr->wev = OBJ_NEW(orte_iof_write_event_t);
>>>>>> }
>>>>>> static void orte_iof_base_sink_destruct(orte_iof_sink_t* ptr)
>>>>>> 
>>>>>> Modified: trunk/orte/mca/iof/hnp/iof_hnp.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/mca/iof/hnp/iof_hnp.c     (original)
>>>>>> +++ trunk/orte/mca/iof/hnp/iof_hnp.c     2011-08-26 18:16:14 EDT (Fri, 
>>>>>> 26 Aug 2011)
>>>>>> @@ -186,7 +186,7 @@
>>>>>>    proct = OBJ_NEW(orte_iof_proc_t);
>>>>>>    proct->name.jobid = dst_name->jobid;
>>>>>>    proct->name.vpid = dst_name->vpid;
>>>>>> -        proct->name.epoch = dst_name->epoch;
>>>>>> +        ORTE_EPOCH_SET(proct->name.epoch,dst_name->epoch);
>>>>>>    opal_list_append(&mca_iof_hnp_component.procs, &proct->super);
>>>>>>    /* see if we are to output to a file */
>>>>>>    if (NULL != orte_output_filename) {
>>>>>> @@ -281,8 +281,7 @@
>>>>>>                             &mca_iof_hnp_component.sinks);
>>>>>>        sink->daemon.jobid = ORTE_PROC_MY_NAME->jobid;
>>>>>>        sink->daemon.vpid = proc->node->daemon->name.vpid;
>>>>>> -            sink->daemon.epoch = ORTE_EPOCH_INVALID;
>>>>>> -            sink->daemon.epoch = orte_ess.proc_get_epoch(&sink->daemon);
>>>>>> +            
>>>>>> ORTE_EPOCH_SET(sink->daemon.epoch,orte_ess.proc_get_epoch(&sink->daemon));
>>>>>>    }
>>>>>> }
>>>>>> 
>>>>>> @@ -389,7 +388,7 @@
>>>>>>                     &mca_iof_hnp_component.sinks);
>>>>>> sink->daemon.jobid = ORTE_PROC_MY_NAME->jobid;
>>>>>> sink->daemon.vpid = ORTE_PROC_MY_NAME->vpid;
>>>>>> -    sink->daemon.epoch = ORTE_PROC_MY_NAME->epoch;
>>>>>> +    ORTE_EPOCH_SET(sink->daemon.epoch,ORTE_PROC_MY_NAME->epoch);
>>>>>> 
>>>>>> return ORTE_SUCCESS;
>>>>>> }
>>>>>> 
>>>>>> Modified: trunk/orte/mca/iof/hnp/iof_hnp_receive.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/mca/iof/hnp/iof_hnp_receive.c     (original)
>>>>>> +++ trunk/orte/mca/iof/hnp/iof_hnp_receive.c     2011-08-26 18:16:14 EDT 
>>>>>> (Fri, 26 Aug 2011)
>>>>>> @@ -109,21 +109,21 @@
>>>>>>                             NULL, &mca_iof_hnp_component.sinks);
>>>>>>        sink->daemon.jobid = mev->sender.jobid;
>>>>>>        sink->daemon.vpid = mev->sender.vpid;
>>>>>> -            sink->daemon.epoch = mev->sender.epoch;
>>>>>> +            ORTE_EPOCH_SET(sink->daemon.epoch,mev->sender.epoch);
>>>>>>    }
>>>>>>    if (ORTE_IOF_STDERR & stream) {
>>>>>>        ORTE_IOF_SINK_DEFINE(&sink, &origin, -1, ORTE_IOF_STDERR,
>>>>>>                             NULL, &mca_iof_hnp_component.sinks);
>>>>>>        sink->daemon.jobid = mev->sender.jobid;
>>>>>>        sink->daemon.vpid = mev->sender.vpid;
>>>>>> -            sink->daemon.epoch = mev->sender.epoch;
>>>>>> +            ORTE_EPOCH_SET(sink->daemon.epoch,mev->sender.epoch);
>>>>>>    }
>>>>>>    if (ORTE_IOF_STDDIAG & stream) {
>>>>>>        ORTE_IOF_SINK_DEFINE(&sink, &origin, -1, ORTE_IOF_STDDIAG,
>>>>>>                             NULL, &mca_iof_hnp_component.sinks);
>>>>>>        sink->daemon.jobid = mev->sender.jobid;
>>>>>>        sink->daemon.vpid = mev->sender.vpid;
>>>>>> -            sink->daemon.epoch = mev->sender.epoch;
>>>>>> +            ORTE_EPOCH_SET(sink->daemon.epoch,mev->sender.epoch);
>>>>>>    }
>>>>>>    goto CLEAN_RETURN;
>>>>>> }
>>>>>> 
>>>>>> Modified: trunk/orte/mca/iof/orted/iof_orted.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/mca/iof/orted/iof_orted.c (original)
>>>>>> +++ trunk/orte/mca/iof/orted/iof_orted.c 2011-08-26 18:16:14 EDT (Fri, 
>>>>>> 26 Aug 2011)
>>>>>> @@ -163,7 +163,7 @@
>>>>>> proct = OBJ_NEW(orte_iof_proc_t);
>>>>>> proct->name.jobid = dst_name->jobid;
>>>>>> proct->name.vpid = dst_name->vpid;
>>>>>> -    proct->name.epoch = dst_name->epoch;
>>>>>> +    ORTE_EPOCH_SET(proct->name.epoch,dst_name->epoch);
>>>>>> opal_list_append(&mca_iof_orted_component.procs, &proct->super);
>>>>>> /* see if we are to output to a file */
>>>>>> if (NULL != orte_output_filename) {
>>>>>> 
>>>>>> Modified: trunk/orte/mca/odls/base/odls_base_default_fns.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/mca/odls/base/odls_base_default_fns.c     (original)
>>>>>> +++ trunk/orte/mca/odls/base/odls_base_default_fns.c     2011-08-26 
>>>>>> 18:16:14 EDT (Fri, 26 Aug 2011)
>>>>>> @@ -734,8 +734,7 @@
>>>>>> proc.jobid = jobdat->jobid;
>>>>>> for (j=0; j < jobdat->num_procs; j++) {
>>>>>>    proc.vpid = j;
>>>>>> -        proc.epoch = ORTE_EPOCH_INVALID;
>>>>>> -        proc.epoch = orte_ess.proc_get_epoch(&proc);
>>>>>> +        ORTE_EPOCH_SET(proc.epoch,orte_ess.proc_get_epoch(&proc));
>>>>>>    /* get the vpid of the daemon that is to host this proc */
>>>>>>    if (ORTE_VPID_INVALID == (host_daemon = 
>>>>>> orte_ess.proc_get_daemon(&proc))) {
>>>>>>        ORTE_ERROR_LOG(ORTE_ERR_NOT_FOUND);
>>>>>> @@ -1044,6 +1043,7 @@
>>>>>> free(param);
>>>>>> free(value);
>>>>>> 
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>> /* setup the epoch */
>>>>>> if (ORTE_SUCCESS != (rc = orte_util_convert_epoch_to_string(&value, 
>>>>>> child->name->epoch))) {
>>>>>>    ORTE_ERROR_LOG(rc);
>>>>>> @@ -1057,6 +1057,7 @@
>>>>>> opal_setenv(param, value, true, env);
>>>>>> free(param);
>>>>>> free(value);
>>>>>> +#endif
>>>>>> 
>>>>>> /* setup the vpid */
>>>>>> if (ORTE_SUCCESS != (rc = orte_util_convert_vpid_to_string(&value, 
>>>>>> child->name->vpid))) {
>>>>>> @@ -2721,7 +2722,7 @@
>>>>>>    OBJ_CONSTRUCT(&proctmp, orte_proc_t);
>>>>>>    proctmp.name.jobid = ORTE_JOBID_WILDCARD;
>>>>>>    proctmp.name.vpid = ORTE_VPID_WILDCARD;
>>>>>> -        proctmp.name.epoch = ORTE_EPOCH_WILDCARD;
>>>>>> +        ORTE_EPOCH_SET(proctmp.name.epoch,ORTE_EPOCH_WILDCARD);
>>>>>>    opal_pointer_array_add(&procarray, &proctmp);
>>>>>>    procptr = &procarray;
>>>>>>    do_cleanup = true;
>>>>>> 
>>>>>> Modified: trunk/orte/mca/odls/base/odls_base_open.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/mca/odls/base/odls_base_open.c    (original)
>>>>>> +++ trunk/orte/mca/odls/base/odls_base_open.c    2011-08-26 18:16:14 EDT 
>>>>>> (Fri, 26 Aug 2011)
>>>>>> @@ -187,7 +187,7 @@
>>>>>>        if (-1 == rank) {
>>>>>>            /* wildcard */
>>>>>>            nm->name.vpid = ORTE_VPID_WILDCARD;
>>>>>> -                nm->name.epoch = ORTE_EPOCH_WILDCARD;
>>>>>> +                ORTE_EPOCH_SET(nm->name.epoch,ORTE_EPOCH_WILDCARD);
>>>>>>        } else if (rank < 0) {
>>>>>>            /* error out on bozo case */
>>>>>>            orte_show_help("help-odls-base.txt",
>>>>>> @@ -200,8 +200,7 @@
>>>>>>             * will be in the job - we'll check later
>>>>>>             */
>>>>>>            nm->name.vpid = rank;
>>>>>> -                nm->name.epoch = ORTE_EPOCH_INVALID;
>>>>>> -                nm->name.epoch = orte_ess.proc_get_epoch(&nm->name);
>>>>>> +                
>>>>>> ORTE_EPOCH_SET(nm->name.epoch,orte_ess.proc_get_epoch(&nm->name));
>>>>>>        }
>>>>>>        opal_list_append(&orte_odls_globals.xterm_ranks, &nm->item);
>>>>>>    }
>>>>>> 
>>>>>> Modified: trunk/orte/mca/odls/base/odls_base_state.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/mca/odls/base/odls_base_state.c   (original)
>>>>>> +++ trunk/orte/mca/odls/base/odls_base_state.c   2011-08-26 18:16:14 EDT 
>>>>>> (Fri, 26 Aug 2011)
>>>>>> @@ -77,17 +77,17 @@
>>>>>>    /* if I am the HNP, then use me as the source */
>>>>>>    p_set->source.jobid = ORTE_PROC_MY_NAME->jobid;
>>>>>>    p_set->source.vpid  = ORTE_PROC_MY_NAME->vpid;
>>>>>> -        p_set->source.epoch = ORTE_PROC_MY_NAME->epoch;
>>>>>> +        ORTE_EPOCH_SET(p_set->source.epoch,ORTE_PROC_MY_NAME->epoch);
>>>>>> }
>>>>>> else {
>>>>>>    /* otherwise, set the HNP as the source */
>>>>>>    p_set->source.jobid = ORTE_PROC_MY_HNP->jobid;
>>>>>>    p_set->source.vpid  = ORTE_PROC_MY_HNP->vpid;
>>>>>> -        p_set->source.epoch = ORTE_PROC_MY_HNP->epoch;
>>>>>> +        ORTE_EPOCH_SET(p_set->source.epoch,ORTE_PROC_MY_HNP->epoch);
>>>>>> }
>>>>>> p_set->sink.jobid   = ORTE_PROC_MY_NAME->jobid;
>>>>>> p_set->sink.vpid    = ORTE_PROC_MY_NAME->vpid;
>>>>>> -    p_set->sink.epoch   = ORTE_PROC_MY_NAME->epoch;
>>>>>> +    ORTE_EPOCH_SET(p_set->sink.epoch,ORTE_PROC_MY_NAME->epoch);
>>>>>> 
>>>>>> opal_list_append(&(filem_request->process_sets), &(p_set->super) );
>>>>>> 
>>>>>> 
>>>>>> Modified: trunk/orte/mca/oob/tcp/oob_tcp_msg.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/mca/oob/tcp/oob_tcp_msg.c (original)
>>>>>> +++ trunk/orte/mca/oob/tcp/oob_tcp_msg.c 2011-08-26 18:16:14 EDT (Fri, 
>>>>>> 26 Aug 2011)
>>>>>> @@ -137,6 +137,7 @@
>>>>>> bool mca_oob_tcp_msg_send_handler(mca_oob_tcp_msg_t* msg, struct 
>>>>>> mca_oob_tcp_peer_t * peer)
>>>>>> {
>>>>>> int rc;
>>>>>> +
>>>>>> while(1) {
>>>>>>    rc = writev(peer->peer_sd, msg->msg_rwptr, msg->msg_rwnum);
>>>>>>    if(rc < 0) {
>>>>>> @@ -338,6 +339,7 @@
>>>>>> orte_process_name_t src = msg->msg_hdr.msg_src;
>>>>>> 
>>>>>> OPAL_THREAD_LOCK(&mca_oob_tcp_component.tcp_lock);
>>>>>> +
>>>>>> if (orte_util_compare_name_fields(ORTE_NS_CMP_ALL, &peer->peer_name, 
>>>>>> &src) != OPAL_EQUAL) {
>>>>>>    opal_hash_table_remove_value_uint64(&mca_oob_tcp_component.tcp_peers, 
>>>>>>                                        
>>>>>> orte_util_hash_name(&peer->peer_name));
>>>>>> 
>>>>>> Modified: trunk/orte/mca/oob/tcp/oob_tcp_peer.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/mca/oob/tcp/oob_tcp_peer.c        (original)
>>>>>> +++ trunk/orte/mca/oob/tcp/oob_tcp_peer.c        2011-08-26 18:16:14 EDT 
>>>>>> (Fri, 26 Aug 2011)
>>>>>> @@ -903,6 +903,11 @@
>>>>>> static void mca_oob_tcp_peer_recv_handler(int sd, short flags, void* 
>>>>>> user)
>>>>>> {
>>>>>> mca_oob_tcp_peer_t* peer = (mca_oob_tcp_peer_t *)user;
>>>>>> +
>>>>>> +    if (orte_abnormal_term_ordered) {
>>>>>> +        return;
>>>>>> +    }
>>>>>> +
>>>>>> OPAL_THREAD_LOCK(&peer->peer_lock);
>>>>>> switch(peer->peer_state) {
>>>>>>    case MCA_OOB_TCP_CONNECT_ACK: 
>>>>>> 
>>>>>> Modified: trunk/orte/mca/plm/base/plm_base_jobid.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/mca/plm/base/plm_base_jobid.c     (original)
>>>>>> +++ trunk/orte/mca/plm/base/plm_base_jobid.c     2011-08-26 18:16:14 EDT 
>>>>>> (Fri, 26 Aug 2011)
>>>>>> @@ -62,12 +62,12 @@
>>>>>> /* set the name */
>>>>>> ORTE_PROC_MY_NAME->jobid = 0xffff0000 & ((uint32_t)jobfam << 16);
>>>>>> ORTE_PROC_MY_NAME->vpid = 0;
>>>>>> -    ORTE_PROC_MY_NAME->epoch= ORTE_EPOCH_MIN;
>>>>>> +    ORTE_EPOCH_SET(ORTE_PROC_MY_NAME->epoch,ORTE_EPOCH_MIN);
>>>>>> 
>>>>>> /* copy it to the HNP field */
>>>>>> ORTE_PROC_MY_HNP->jobid = ORTE_PROC_MY_NAME->jobid;
>>>>>> ORTE_PROC_MY_HNP->vpid = ORTE_PROC_MY_NAME->vpid;
>>>>>> -    ORTE_PROC_MY_HNP->epoch = ORTE_PROC_MY_NAME->epoch;
>>>>>> +    ORTE_EPOCH_SET(ORTE_PROC_MY_HNP->epoch,ORTE_PROC_MY_NAME->epoch);
>>>>>> 
>>>>>> /* done */
>>>>>> return ORTE_SUCCESS;
>>>>>> 
>>>>>> Modified: trunk/orte/mca/plm/base/plm_base_launch_support.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/mca/plm/base/plm_base_launch_support.c    (original)
>>>>>> +++ trunk/orte/mca/plm/base/plm_base_launch_support.c    2011-08-26 
>>>>>> 18:16:14 EDT (Fri, 26 Aug 2011)
>>>>>> @@ -377,8 +377,7 @@
>>>>>> /* push stdin - the IOF will know what to do with the specified target */
>>>>>> name.jobid = job;
>>>>>> name.vpid = jdata->stdin_target;
>>>>>> -    name.epoch = ORTE_EPOCH_INVALID;
>>>>>> -    name.epoch = orte_ess.proc_get_epoch(&name);
>>>>>> +    ORTE_EPOCH_SET(name.epoch,orte_ess.proc_get_epoch(&name));
>>>>>> 
>>>>>> if (ORTE_SUCCESS != (rc = orte_iof.push(&name, ORTE_IOF_STDIN, 0))) {
>>>>>>    ORTE_ERROR_LOG(rc);
>>>>>> 
>>>>>> Modified: trunk/orte/mca/plm/base/plm_base_orted_cmds.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/mca/plm/base/plm_base_orted_cmds.c        (original)
>>>>>> +++ trunk/orte/mca/plm/base/plm_base_orted_cmds.c        2011-08-26 
>>>>>> 18:16:14 EDT (Fri, 26 Aug 2011)
>>>>>> @@ -163,8 +163,7 @@
>>>>>>            continue;
>>>>>>        }
>>>>>>        peer.vpid = v;
>>>>>> -            peer.epoch = ORTE_EPOCH_INVALID;
>>>>>> -            peer.epoch = orte_ess.proc_get_epoch(&peer);
>>>>>> +            ORTE_EPOCH_SET(peer.epoch,orte_ess.proc_get_epoch(&peer));
>>>>>> 
>>>>>>        /* don't worry about errors on the send here - just
>>>>>>         * issue it and keep going
>>>>>> @@ -242,7 +241,7 @@
>>>>>> OBJ_CONSTRUCT(&proc, orte_proc_t);
>>>>>> proc.name.jobid = jobid;
>>>>>> proc.name.vpid = ORTE_VPID_WILDCARD;
>>>>>> -    proc.name.epoch  = ORTE_EPOCH_WILDCARD;
>>>>>> +    ORTE_EPOCH_SET(proc.name.epoch,ORTE_EPOCH_WILDCARD);
>>>>>> opal_pointer_array_add(&procs, &proc);
>>>>>> if (ORTE_SUCCESS != (rc = orte_plm_base_orted_kill_local_procs(&procs))) 
>>>>>> {
>>>>>>    ORTE_ERROR_LOG(rc);
>>>>>> @@ -340,8 +339,7 @@
>>>>>>            continue;
>>>>>>        }
>>>>>>        peer.vpid = v;
>>>>>> -            peer.epoch = ORTE_EPOCH_INVALID;
>>>>>> -            peer.epoch = orte_ess.proc_get_epoch(&peer);
>>>>>> +            ORTE_EPOCH_SET(peer.epoch,orte_ess.proc_get_epoch(&peer));
>>>>>>        /* check to see if this daemon is known to be "dead" */
>>>>>>        if (proc->state > ORTE_PROC_STATE_UNTERMINATED) {
>>>>>>            /* don't try to send this */
>>>>>> 
>>>>>> Modified: trunk/orte/mca/plm/base/plm_base_receive.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/mca/plm/base/plm_base_receive.c   (original)
>>>>>> +++ trunk/orte/mca/plm/base/plm_base_receive.c   2011-08-26 18:16:14 EDT 
>>>>>> (Fri, 26 Aug 2011)
>>>>>> @@ -146,7 +146,9 @@
>>>>>> orte_job_t *jdata, *parent;
>>>>>> opal_buffer_t answer;
>>>>>> orte_vpid_t vpid;
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>> orte_epoch_t epoch;
>>>>>> +#endif
>>>>>> orte_proc_t *proc;
>>>>>> orte_proc_state_t state;
>>>>>> orte_exit_code_t exit_code;
>>>>>> @@ -394,8 +396,7 @@
>>>>>>                    break;
>>>>>>                }
>>>>>>                name.vpid = vpid;
>>>>>> -                    name.epoch = ORTE_EPOCH_INVALID;
>>>>>> -                    name.epoch = orte_ess.proc_get_epoch(&name);
>>>>>> +                    
>>>>>> ORTE_EPOCH_SET(name.epoch,orte_ess.proc_get_epoch(&name));
>>>>>> 
>>>>>>                /* unpack the pid */
>>>>>>                count = 1;
>>>>>> @@ -488,9 +489,11 @@
>>>>>>            }
>>>>>>            name.vpid = vpid;
>>>>>> 
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>>            count=1;
>>>>>>            opal_dss.unpack(msgpkt->buffer, &epoch, &count, ORTE_EPOCH);
>>>>>>            name.epoch = epoch;
>>>>>> +#endif
>>>>>> 
>>>>>>            OPAL_OUTPUT_VERBOSE((5, orte_plm_globals.output,
>>>>>>                                 "%s plm:base:receive Described rank %s",
>>>>>> 
>>>>>> Modified: trunk/orte/mca/plm/base/plm_base_rsh_support.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/mca/plm/base/plm_base_rsh_support.c       (original)
>>>>>> +++ trunk/orte/mca/plm/base/plm_base_rsh_support.c       2011-08-26 
>>>>>> 18:16:14 EDT (Fri, 26 Aug 2011)
>>>>>> @@ -1527,7 +1527,9 @@
>>>>>> {
>>>>>> char *param, *path, *tmp, *cmd, *basename, *dest_dir;
>>>>>> int i;
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>> orte_epoch_t epoch;
>>>>>> +#endif
>>>>>> orte_process_name_t proc;
>>>>>> 
>>>>>> /* if a prefix is set, pass it to the bootproxy in a special way */
>>>>>> @@ -1638,6 +1640,7 @@
>>>>>> opal_setenv("OMPI_COMM_WORLD_RANK", cmd, true, argv);
>>>>>> free(cmd);
>>>>>> 
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>> /* set the epoch */
>>>>>> proc.jobid = jobid;
>>>>>> proc.vpid = vpid;
>>>>>> @@ -1648,6 +1651,7 @@
>>>>>> opal_setenv(param, cmd, true, argv);
>>>>>> free(param);
>>>>>> free(cmd);
>>>>>> +#endif
>>>>>> 
>>>>>> /* set the number of procs */
>>>>>> asprintf(&cmd, "%d", (int)num_procs);
>>>>>> 
>>>>>> Modified: trunk/orte/mca/rmaps/base/rmaps_base_support_fns.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/mca/rmaps/base/rmaps_base_support_fns.c   (original)
>>>>>> +++ trunk/orte/mca/rmaps/base/rmaps_base_support_fns.c   2011-08-26 
>>>>>> 18:16:14 EDT (Fri, 26 Aug 2011)
>>>>>> @@ -33,12 +33,14 @@
>>>>>> #include "orte/mca/ess/ess.h"
>>>>>> #include "opal/mca/sysinfo/sysinfo_types.h"
>>>>>> 
>>>>>> +#include "orte/types.h"
>>>>>> #include "orte/util/show_help.h"
>>>>>> #include "orte/util/name_fns.h"
>>>>>> #include "orte/runtime/orte_globals.h"
>>>>>> #include "orte/util/hostfile/hostfile.h"
>>>>>> #include "orte/util/dash_host/dash_host.h"
>>>>>> #include "orte/mca/errmgr/errmgr.h"
>>>>>> +#include "orte/runtime/data_type_support/orte_dt_support.h"
>>>>>> 
>>>>>> #include "orte/mca/rmaps/base/rmaps_private.h"
>>>>>> #include "orte/mca/rmaps/base/base.h"
>>>>>> @@ -454,7 +456,7 @@
>>>>>>     */
>>>>>> 
>>>>>>    /* We do set the epoch here since they all start with the same value. 
>>>>>> */
>>>>>> -        proc->name.epoch = ORTE_EPOCH_MIN;
>>>>>> +        ORTE_EPOCH_SET(proc->name.epoch,ORTE_EPOCH_MIN);
>>>>>> 
>>>>>>    proc->app_idx = app_idx;
>>>>>>    OPAL_OUTPUT_VERBOSE((5, orte_rmaps_base.rmaps_output,
>>>>>> @@ -559,11 +561,12 @@
>>>>>>                    }
>>>>>>                }
>>>>>>                proc->name.vpid = vpid;
>>>>>> -                    proc->name.epoch = ORTE_EPOCH_INVALID;
>>>>>> -                    proc->name.epoch = 
>>>>>> orte_ess.proc_get_epoch(&proc->name);
>>>>>> +                    ORTE_EPOCH_SET(proc->name.epoch,ORTE_EPOCH_INVALID);
>>>>>> +                    
>>>>>> ORTE_EPOCH_SET(proc->name.epoch,orte_ess.proc_get_epoch(&proc->name));
>>>>>> +                    
>>>>>>                /* If there is an invalid epoch here, it's because it 
>>>>>> doesn't exist yet. */
>>>>>> -                    if (ORTE_NODE_RANK_INVALID == proc->name.epoch) {
>>>>>> -                        proc->name.epoch = ORTE_EPOCH_MIN;
>>>>>> +                    if (0 == 
>>>>>> ORTE_EPOCH_CMP(ORTE_EPOCH_INVALID,proc->name.epoch)) {
>>>>>> +                        ORTE_EPOCH_SET(proc->name.epoch,ORTE_EPOCH_MIN);
>>>>>>                }
>>>>>>            }
>>>>>>            if (NULL == opal_pointer_array_get_item(jdata->procs, 
>>>>>> proc->name.vpid)) {
>>>>>> @@ -601,8 +604,8 @@
>>>>>>                    }
>>>>>>                }
>>>>>>                proc->name.vpid = vpid;
>>>>>> -                    proc->name.epoch = ORTE_EPOCH_INVALID;
>>>>>> -                    proc->name.epoch = 
>>>>>> orte_ess.proc_get_epoch(&proc->name);
>>>>>> +                    ORTE_EPOCH_SET(proc->name.epoch,ORTE_EPOCH_INVALID);
>>>>>> +                    
>>>>>> ORTE_EPOCH_SET(proc->name.epoch,orte_ess.proc_get_epoch(&proc->name));
>>>>>>            }
>>>>>>            if (NULL == opal_pointer_array_get_item(jdata->procs, 
>>>>>> proc->name.vpid)) {
>>>>>>                if (ORTE_SUCCESS != (rc = 
>>>>>> opal_pointer_array_set_item(jdata->procs, proc->name.vpid, proc))) {
>>>>>> @@ -835,7 +838,7 @@
>>>>>>            return ORTE_ERR_OUT_OF_RESOURCE;
>>>>>>        }
>>>>>>        proc->name.vpid = daemons->num_procs;  /* take the next available 
>>>>>> vpid */
>>>>>> -            proc->name.epoch = ORTE_EPOCH_MIN;
>>>>>> +            ORTE_EPOCH_SET(proc->name.epoch,ORTE_EPOCH_MIN);
>>>>>>        proc->node = node;
>>>>>>        proc->nodename = node->name;
>>>>>>        OPAL_OUTPUT_VERBOSE((5, orte_rmaps_base.rmaps_output,
>>>>>> @@ -1014,8 +1017,8 @@
>>>>>>        return ORTE_ERR_OUT_OF_RESOURCE;
>>>>>>    }
>>>>>>    proc->name.vpid = jdata->num_procs;  /* take the next available vpid 
>>>>>> */
>>>>>> -        proc->name.epoch = ORTE_EPOCH_INVALID;
>>>>>> -        proc->name.epoch = orte_ess.proc_get_epoch(&proc->name);
>>>>>> +        ORTE_EPOCH_SET(proc->name.epoch,ORTE_EPOCH_INVALID);
>>>>>> +        
>>>>>> ORTE_EPOCH_SET(proc->name.epoch,orte_ess.proc_get_epoch(&proc->name));
>>>>>>    proc->node = node;
>>>>>>    proc->nodename = node->name;
>>>>>>    OPAL_OUTPUT_VERBOSE((5, orte_rmaps_base.rmaps_output,
>>>>>> 
>>>>>> Modified: trunk/orte/mca/rmaps/rank_file/rmaps_rank_file.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/mca/rmaps/rank_file/rmaps_rank_file.c     (original)
>>>>>> +++ trunk/orte/mca/rmaps/rank_file/rmaps_rank_file.c     2011-08-26 
>>>>>> 18:16:14 EDT (Fri, 26 Aug 2011)
>>>>>> @@ -502,8 +502,7 @@
>>>>>>        }
>>>>>>        proc->name.vpid = rank;
>>>>>>        /* Either init or update the epoch. */
>>>>>> -            proc->name.epoch = ORTE_EPOCH_INVALID;
>>>>>> -            proc->name.epoch = orte_ess.proc_get_epoch(&proc->name);
>>>>>> +            
>>>>>> ORTE_EPOCH_SET(proc->name.epoch,orte_ess.proc_get_epoch(&proc->name));
>>>>>> 
>>>>>>        proc->slot_list = strdup(rfmap->slot_list);
>>>>>>        /* insert the proc into the proper place */
>>>>>> 
>>>>>> Modified: trunk/orte/mca/rmaps/seq/rmaps_seq.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/mca/rmaps/seq/rmaps_seq.c (original)
>>>>>> +++ trunk/orte/mca/rmaps/seq/rmaps_seq.c 2011-08-26 18:16:14 EDT (Fri, 
>>>>>> 26 Aug 2011)
>>>>>> @@ -235,8 +235,7 @@
>>>>>>        }
>>>>>>        /* assign the vpid */
>>>>>>        proc->name.vpid = vpid++;
>>>>>> -            proc->name.epoch = ORTE_EPOCH_INVALID;
>>>>>> -            proc->name.epoch = orte_ess.proc_get_epoch(&proc->name);
>>>>>> +            
>>>>>> ORTE_EPOCH_SET(proc->name.epoch,orte_ess.proc_get_epoch(&proc->name));
>>>>>> 
>>>>>>        /* add to the jdata proc array */
>>>>>>        if (ORTE_SUCCESS != (rc = 
>>>>>> opal_pointer_array_set_item(jdata->procs, proc->name.vpid, proc))) {
>>>>>> 
>>>>>> Modified: trunk/orte/mca/rmcast/base/rmcast_base_open.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/mca/rmcast/base/rmcast_base_open.c        (original)
>>>>>> +++ trunk/orte/mca/rmcast/base/rmcast_base_open.c        2011-08-26 
>>>>>> 18:16:14 EDT (Fri, 26 Aug 2011)
>>>>>> @@ -341,7 +341,7 @@
>>>>>> {
>>>>>> ptr->name.jobid = ORTE_JOBID_INVALID;
>>>>>> ptr->name.vpid = ORTE_VPID_INVALID;
>>>>>> -    ptr->name.epoch = ORTE_EPOCH_MIN;
>>>>>> +    ORTE_EPOCH_SET(ptr->name.epoch,ORTE_EPOCH_MIN);
>>>>>> ptr->channel = ORTE_RMCAST_INVALID_CHANNEL;
>>>>>> OBJ_CONSTRUCT(&ptr->ctl, orte_thread_ctl_t);
>>>>>> ptr->seq_num = ORTE_RMCAST_SEQ_INVALID;
>>>>>> @@ -430,7 +430,7 @@
>>>>>> {
>>>>>> ptr->name.jobid = ORTE_JOBID_INVALID;
>>>>>> ptr->name.vpid = ORTE_VPID_INVALID;
>>>>>> -    ptr->name.epoch = ORTE_EPOCH_MIN;
>>>>>> +    ORTE_EPOCH_SET(ptr->name.epoch,ORTE_EPOCH_MIN);
>>>>>> OBJ_CONSTRUCT(&ptr->last_msg, opal_list_t);
>>>>>> }
>>>>>> static void recvlog_destruct(rmcast_recv_log_t *ptr)
>>>>>> @@ -439,7 +439,7 @@
>>>>>> 
>>>>>> ptr->name.jobid = ORTE_JOBID_INVALID;
>>>>>> ptr->name.vpid = ORTE_VPID_INVALID;
>>>>>> -    ptr->name.epoch = ORTE_EPOCH_INVALID;
>>>>>> +    ORTE_EPOCH_SET(ptr->name.epoch,ORTE_EPOCH_INVALID);
>>>>>> while (NULL != (item = opal_list_remove_first(&ptr->last_msg))) {
>>>>>>    OBJ_RELEASE(item);
>>>>>> }
>>>>>> 
>>>>>> Modified: trunk/orte/mca/rmcast/tcp/rmcast_tcp.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/mca/rmcast/tcp/rmcast_tcp.c       (original)
>>>>>> +++ trunk/orte/mca/rmcast/tcp/rmcast_tcp.c       2011-08-26 18:16:14 EDT 
>>>>>> (Fri, 26 Aug 2011)
>>>>>> @@ -681,7 +681,7 @@
>>>>>>    /* caller requested id of sender */
>>>>>>    name->jobid = recvptr->name.jobid;
>>>>>>    name->vpid = recvptr->name.vpid;
>>>>>> -        name->epoch= recvptr->name.epoch;
>>>>>> +        ORTE_EPOCH_SET(name->epoch,recvptr->name.epoch);
>>>>>> }
>>>>>> *seq_num = recvptr->seq_num;
>>>>>> *msg = recvptr->iovec_array;
>>>>>> @@ -776,7 +776,7 @@
>>>>>>    /* caller requested id of sender */
>>>>>>    name->jobid = recvptr->name.jobid;
>>>>>>    name->vpid = recvptr->name.vpid;
>>>>>> -        name->epoch= recvptr->name.epoch;
>>>>>> +        ORTE_EPOCH_SET(name->epoch,recvptr->name.epoch);
>>>>>> }
>>>>>> *seq_num = recvptr->seq_num;
>>>>>> if (ORTE_SUCCESS != (ret = opal_dss.copy_payload(buf, recvptr->buf))) {
>>>>>> 
>>>>>> Modified: trunk/orte/mca/rmcast/udp/rmcast_udp.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/mca/rmcast/udp/rmcast_udp.c       (original)
>>>>>> +++ trunk/orte/mca/rmcast/udp/rmcast_udp.c       2011-08-26 18:16:14 EDT 
>>>>>> (Fri, 26 Aug 2011)
>>>>>> @@ -460,7 +460,7 @@
>>>>>>    /* caller requested id of sender */
>>>>>>    name->jobid = recvptr->name.jobid;
>>>>>>    name->vpid = recvptr->name.vpid;
>>>>>> -        name->epoch= recvptr->name.epoch;
>>>>>> +        ORTE_EPOCH_SET(name->epoch,recvptr->name.epoch);
>>>>>> }
>>>>>> *seq_num = recvptr->seq_num;
>>>>>> *msg = recvptr->iovec_array;
>>>>>> @@ -553,7 +553,7 @@
>>>>>>    /* caller requested id of sender */
>>>>>>    name->jobid = recvptr->name.jobid;
>>>>>>    name->vpid = recvptr->name.vpid;
>>>>>> -        name->epoch= recvptr->name.epoch;
>>>>>> +        ORTE_EPOCH_SET(name->epoch,recvptr->name.epoch);
>>>>>> }
>>>>>> *seq_num = recvptr->seq_num;
>>>>>> if (ORTE_SUCCESS != (ret = opal_dss.copy_payload(buf, recvptr->buf))) {
>>>>>> 
>>>>>> Modified: trunk/orte/mca/rml/base/rml_base_components.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/mca/rml/base/rml_base_components.c        (original)
>>>>>> +++ trunk/orte/mca/rml/base/rml_base_components.c        2011-08-26 
>>>>>> 18:16:14 EDT (Fri, 26 Aug 2011)
>>>>>> @@ -20,6 +20,7 @@
>>>>>> #include "opal/util/output.h"
>>>>>> 
>>>>>> #include "orte/mca/rml/rml.h"
>>>>>> +#include "orte/util/name_fns.h"
>>>>>> 
>>>>>> #if !ORTE_DISABLE_FULL_SUPPORT
>>>>>> 
>>>>>> @@ -67,14 +68,14 @@
>>>>>> {
>>>>>> pkt->sender.jobid = ORTE_JOBID_INVALID;
>>>>>> pkt->sender.vpid = ORTE_VPID_INVALID;
>>>>>> -    pkt->sender.epoch = ORTE_EPOCH_MIN;
>>>>>> +    ORTE_EPOCH_SET(pkt->sender.epoch,ORTE_EPOCH_MIN);
>>>>>> pkt->buffer = NULL;
>>>>>> }
>>>>>> static void msg_pkt_destructor(orte_msg_packet_t *pkt)
>>>>>> {
>>>>>> pkt->sender.jobid = ORTE_JOBID_INVALID;
>>>>>> pkt->sender.vpid = ORTE_VPID_INVALID;
>>>>>> -    pkt->sender.epoch = ORTE_EPOCH_INVALID;
>>>>>> +    ORTE_EPOCH_SET(pkt->sender.epoch,ORTE_EPOCH_INVALID);
>>>>>> if (NULL != pkt->buffer) {
>>>>>>    OBJ_RELEASE(pkt->buffer);
>>>>>> }
>>>>>> 
>>>>>> Modified: trunk/orte/mca/rml/rml_types.h
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/mca/rml/rml_types.h       (original)
>>>>>> +++ trunk/orte/mca/rml/rml_types.h       2011-08-26 18:16:14 EDT (Fri, 
>>>>>> 26 Aug 2011)
>>>>>> @@ -62,7 +62,7 @@
>>>>>>    pkt = OBJ_NEW(orte_msg_packet_t);                           \
>>>>>>    pkt->sender.jobid = (sndr)->jobid;                          \
>>>>>>    pkt->sender.vpid = (sndr)->vpid;                            \
>>>>>> -        pkt->sender.epoch = (sndr)->epoch;                          \
>>>>>> +        ORTE_EPOCH_SET(pkt->sender.epoch,(sndr)->epoch);            \
>>>>>>    if ((crt)) {                                                \
>>>>>>        pkt->buffer = OBJ_NEW(opal_buffer_t);                   \
>>>>>>        opal_dss.copy_payload(pkt->buffer, *(buf));             \
>>>>>> @@ -85,7 +85,7 @@
>>>>>>    pkt = OBJ_NEW(orte_msg_packet_t);                           \
>>>>>>    pkt->sender.jobid = (sndr)->jobid;                          \
>>>>>>    pkt->sender.vpid = (sndr)->vpid;                            \
>>>>>> -        pkt->sender.epoch = (sndr)->epoch;                          \
>>>>>> +        ORTE_EPOCH_SET(pkt->sender.epoch,(sndr)->epoch);            \
>>>>>>    if ((crt)) {                                                \
>>>>>>        pkt->buffer = OBJ_NEW(opal_buffer_t);                   \
>>>>>>        opal_dss.copy_payload(pkt->buffer, *(buf));             \
>>>>>> @@ -191,8 +191,10 @@
>>>>>> 
>>>>>> #define ORTE_RML_TAG_SUBSCRIBE              46
>>>>>> 
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>> /* For Epoch Updates */
>>>>>> #define ORTE_RML_TAG_EPOCH_CHANGE           47
>>>>>> +#endif
>>>>>> 
>>>>>> /* Notify of failed processes */
>>>>>> #define ORTE_RML_TAG_FAILURE_NOTICE         48
>>>>>> 
>>>>>> Modified: trunk/orte/mca/routed/base/routed_base_components.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/mca/routed/base/routed_base_components.c  (original)
>>>>>> +++ trunk/orte/mca/routed/base/routed_base_components.c  2011-08-26 
>>>>>> 18:16:14 EDT (Fri, 26 Aug 2011)
>>>>>> @@ -65,7 +65,7 @@
>>>>>> {
>>>>>> ptr->route.jobid = ORTE_JOBID_INVALID;
>>>>>> ptr->route.vpid = ORTE_VPID_INVALID;
>>>>>> -    ptr->route.epoch = ORTE_EPOCH_MIN;
>>>>>> +    ORTE_EPOCH_SET(ptr->route.epoch,ORTE_EPOCH_MIN);
>>>>>> ptr->hnp_uri = NULL;
>>>>>> }
>>>>>> static void jfamdest(orte_routed_jobfam_t *ptr)
>>>>>> @@ -117,7 +117,7 @@
>>>>>> jfam = OBJ_NEW(orte_routed_jobfam_t);
>>>>>> jfam->route.jobid = ORTE_PROC_MY_HNP->jobid;
>>>>>> jfam->route.vpid = ORTE_PROC_MY_HNP->vpid;
>>>>>> -    jfam->route.epoch = ORTE_PROC_MY_HNP->epoch;
>>>>>> +    ORTE_EPOCH_SET(jfam->route.epoch,ORTE_PROC_MY_HNP->epoch);
>>>>>> jfam->job_family = ORTE_JOB_FAMILY(ORTE_PROC_MY_NAME->jobid);
>>>>>> if (NULL != orte_process_info.my_hnp_uri) {
>>>>>>    jfam->hnp_uri = strdup(orte_process_info.my_hnp_uri);
>>>>>> @@ -252,7 +252,7 @@
>>>>>>    jfam->job_family = jobfamily;
>>>>>>    jfam->route.jobid = name.jobid;
>>>>>>    jfam->route.vpid = name.vpid;
>>>>>> -        jfam->route.epoch = name.epoch;
>>>>>> +        ORTE_EPOCH_SET(jfam->route.epoch,name.epoch);
>>>>>>    jfam->hnp_uri = strdup(uri);
>>>>>> done:
>>>>>>    free(uri);
>>>>>> 
>>>>>> Modified: trunk/orte/mca/routed/base/routed_base_register_sync.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/mca/routed/base/routed_base_register_sync.c       
>>>>>> (original)
>>>>>> +++ trunk/orte/mca/routed/base/routed_base_register_sync.c       
>>>>>> 2011-08-26 18:16:14 EDT (Fri, 26 Aug 2011)
>>>>>> @@ -127,7 +127,9 @@
>>>>>> orte_std_cntr_t cnt;
>>>>>> char *rml_uri;
>>>>>> orte_vpid_t vpid;
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>> orte_epoch_t epoch;
>>>>>> +#endif
>>>>>> int rc;
>>>>>> 
>>>>>> if (ORTE_JOB_FAMILY(job) == ORTE_JOB_FAMILY(ORTE_PROC_MY_NAME->jobid)) {
>>>>>> @@ -146,11 +148,13 @@
>>>>>> cnt = 1;
>>>>>> while (ORTE_SUCCESS == (rc = opal_dss.unpack(buffer, &vpid, &cnt, 
>>>>>> ORTE_VPID))) {
>>>>>> 
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>>    cnt = 1;
>>>>>>    if (ORTE_SUCCESS != (rc = opal_dss.unpack(buffer, &epoch, &cnt, 
>>>>>> ORTE_EPOCH))) {
>>>>>>        ORTE_ERROR_LOG(rc);
>>>>>>        continue;
>>>>>>    }        
>>>>>> +#endif
>>>>>> 
>>>>>>    if (ORTE_SUCCESS != (rc = opal_dss.unpack(buffer, &rml_uri, &cnt, 
>>>>>> OPAL_STRING))) {
>>>>>>        ORTE_ERROR_LOG(rc);
>>>>>> 
>>>>>> Modified: trunk/orte/mca/routed/binomial/routed_binomial.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/mca/routed/binomial/routed_binomial.c     (original)
>>>>>> +++ trunk/orte/mca/routed/binomial/routed_binomial.c     2011-08-26 
>>>>>> 18:16:14 EDT (Fri, 26 Aug 2011)
>>>>>> @@ -33,6 +33,7 @@
>>>>>> #include "orte/runtime/orte_globals.h"
>>>>>> #include "orte/runtime/orte_wait.h"
>>>>>> #include "orte/runtime/runtime.h"
>>>>>> +#include "orte/runtime/data_type_support/orte_dt_support.h"
>>>>>> 
>>>>>> #include "orte/mca/rml/base/rml_contact.h"
>>>>>> 
>>>>>> @@ -147,7 +148,7 @@
>>>>>> 
>>>>>> if (proc->jobid == ORTE_JOBID_INVALID ||
>>>>>>    proc->vpid == ORTE_VPID_INVALID ||
>>>>>> -        proc->epoch == ORTE_EPOCH_INVALID) {
>>>>>> +        0 == ORTE_EPOCH_CMP(proc->epoch,ORTE_EPOCH_INVALID)) {
>>>>>>    return ORTE_ERR_BAD_PARAM;
>>>>>> }
>>>>>> 
>>>>>> @@ -216,7 +217,7 @@
>>>>>> 
>>>>>> if (target->jobid == ORTE_JOBID_INVALID ||
>>>>>>    target->vpid == ORTE_VPID_INVALID ||
>>>>>> -        target->epoch == ORTE_EPOCH_INVALID) {
>>>>>> +        0 == ORTE_EPOCH_CMP(target->epoch,ORTE_EPOCH_INVALID)) {
>>>>>>    return ORTE_ERR_BAD_PARAM;
>>>>>> }
>>>>>> 
>>>>>> @@ -274,8 +275,7 @@
>>>>>>                                 ORTE_NAME_PRINT(route)));
>>>>>>            jfam->route.jobid = route->jobid;
>>>>>>            jfam->route.vpid = route->vpid;
>>>>>> -                jfam->route.epoch = ORTE_EPOCH_INVALID;
>>>>>> -                jfam->route.epoch = 
>>>>>> orte_ess.proc_get_epoch(&jfam->route);
>>>>>> +                
>>>>>> ORTE_EPOCH_SET(jfam->route.epoch,orte_ess.proc_get_epoch(&jfam->route));
>>>>>> 
>>>>>>            return ORTE_SUCCESS;
>>>>>>        }
>>>>>> @@ -290,8 +290,7 @@
>>>>>>    jfam->job_family = jfamily;
>>>>>>    jfam->route.jobid = route->jobid;
>>>>>>    jfam->route.vpid = route->vpid;
>>>>>> -        jfam->route.epoch = ORTE_EPOCH_INVALID;
>>>>>> -        jfam->route.epoch = orte_ess.proc_get_epoch(&jfam->route);
>>>>>> +        
>>>>>> ORTE_EPOCH_SET(jfam->route.epoch,orte_ess.proc_get_epoch(&jfam->route));
>>>>>> 
>>>>>>    opal_pointer_array_add(&orte_routed_jobfams, jfam);
>>>>>>    return ORTE_SUCCESS;
>>>>>> @@ -317,11 +316,21 @@
>>>>>> /* initialize */
>>>>>> daemon.jobid = ORTE_PROC_MY_DAEMON->jobid;
>>>>>> daemon.vpid = ORTE_PROC_MY_DAEMON->vpid;
>>>>>> -    daemon.epoch = ORTE_PROC_MY_DAEMON->epoch;
>>>>>> +    ORTE_EPOCH_SET(daemon.epoch,ORTE_PROC_MY_DAEMON->epoch);
>>>>>> 
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>> if (target->jobid == ORTE_JOBID_INVALID ||
>>>>>>    target->vpid == ORTE_VPID_INVALID ||
>>>>>>    target->epoch == ORTE_EPOCH_INVALID) {
>>>>>> +#else
>>>>>> +    if (target->jobid == ORTE_JOBID_INVALID ||
>>>>>> +        target->vpid == ORTE_VPID_INVALID) {
>>>>>> +#endif
>>>>>> +        ret = ORTE_NAME_INVALID;
>>>>>> +        goto found;
>>>>>> +    }
>>>>>> +
>>>>>> +    if (0 > ORTE_EPOCH_CMP(target->epoch, 
>>>>>> orte_ess.proc_get_epoch(target))) {
>>>>>>    ret = ORTE_NAME_INVALID;
>>>>>>    goto found;
>>>>>> }
>>>>>> @@ -443,7 +452,7 @@
>>>>>> 
>>>>>>        /* If the daemon to which we should be routing is dead, then 
>>>>>> update
>>>>>>         * the routing tree and start over. */
>>>>>> -            if (!orte_util_proc_is_running(&daemon)) {
>>>>>> +            if (!PROC_IS_RUNNING(&daemon)) {
>>>>>>            update_routing_tree(daemon.jobid);
>>>>>>            goto startover;
>>>>>>        }
>>>>>> @@ -461,8 +470,7 @@
>>>>>> ret = &daemon;
>>>>>> 
>>>>>> found:
>>>>>> -    daemon.epoch = ORTE_EPOCH_INVALID;
>>>>>> -    daemon.epoch = orte_ess.proc_get_epoch(&daemon);
>>>>>> +    ORTE_EPOCH_SET(daemon.epoch,orte_ess.proc_get_epoch(&daemon));
>>>>>> 
>>>>>> OPAL_OUTPUT_VERBOSE((1, orte_routed_base_output,
>>>>>>                     "%s routed_binomial_get(%s) --> %s",
>>>>>> @@ -879,7 +887,7 @@
>>>>>> */
>>>>>> local_lifeline.jobid = proc->jobid;
>>>>>> local_lifeline.vpid = proc->vpid;
>>>>>> -    local_lifeline.epoch = proc->epoch;
>>>>>> +    ORTE_EPOCH_SET(local_lifeline.epoch,proc->epoch);
>>>>>> lifeline = &local_lifeline;
>>>>>> 
>>>>>> return ORTE_SUCCESS;
>>>>>> @@ -924,11 +932,11 @@
>>>>>>             * that process so we can check it's state.
>>>>>>             */
>>>>>>            proc_name.vpid = peer;
>>>>>> -                proc_name.epoch = orte_util_lookup_epoch(&proc_name);
>>>>>> +                
>>>>>> ORTE_EPOCH_SET(proc_name.epoch,orte_util_lookup_epoch(&proc_name));
>>>>>> 
>>>>>> -                if (!orte_util_proc_is_running(&proc_name) 
>>>>>> -                    && ORTE_EPOCH_MIN < proc_name.epoch
>>>>>> -                    && ORTE_EPOCH_INVALID != proc_name.epoch) {
>>>>>> +                if (!PROC_IS_RUNNING(&proc_name)
>>>>>> +                    && 0 < 
>>>>>> ORTE_EPOCH_CMP(ORTE_EPOCH_MIN,proc_name.epoch)
>>>>>> +                    && 0 != 
>>>>>> ORTE_EPOCH_CMP(ORTE_EPOCH_INVALID,proc_name.epoch)) {
>>>>>>                OPAL_OUTPUT_VERBOSE((3, orte_routed_base_output,
>>>>>>                                     "%s routed:binomial child %s is 
>>>>>> dead",
>>>>>>                                     ORTE_NAME_PRINT(ORTE_PROC_MY_NAME),
>>>>>> @@ -967,7 +975,7 @@
>>>>>> }
>>>>>> 
>>>>>> /* find the children of this rank */
>>>>>> -    OPAL_OUTPUT_VERBOSE((3, orte_routed_base_output,
>>>>>> +    OPAL_OUTPUT_VERBOSE((5, orte_routed_base_output,
>>>>>>                     "%s routed:binomial find children of rank %d",
>>>>>>                     ORTE_NAME_PRINT(ORTE_PROC_MY_NAME), rank));
>>>>>> bitmap = opal_cube_dim(num_procs);
>>>>>> @@ -977,24 +985,25 @@
>>>>>> 
>>>>>> for (i = hibit + 1, mask = 1 << i; i <= bitmap; ++i, mask <<= 1) {
>>>>>>    peer = rank | mask;
>>>>>> -        OPAL_OUTPUT_VERBOSE((3, orte_routed_base_output,
>>>>>> +        OPAL_OUTPUT_VERBOSE((5, orte_routed_base_output,
>>>>>>                         "%s routed:binomial find children checking peer 
>>>>>> %d",
>>>>>>                         ORTE_NAME_PRINT(ORTE_PROC_MY_NAME), peer));
>>>>>>    if (peer < num_procs) {
>>>>>> -            OPAL_OUTPUT_VERBOSE((3, orte_routed_base_output,
>>>>>> +            OPAL_OUTPUT_VERBOSE((5, orte_routed_base_output,
>>>>>>                             "%s routed:binomial find children computing 
>>>>>> tree",
>>>>>>                             ORTE_NAME_PRINT(ORTE_PROC_MY_NAME)));
>>>>>>        /* execute compute on this child */
>>>>>>        if (0 <= (found = binomial_tree(peer, rank, me, num_procs, 
>>>>>> nchildren, childrn, relatives, mine, jobid))) {
>>>>>>            proc_name.vpid = found;
>>>>>> 
>>>>>> -                if (!orte_util_proc_is_running(&proc_name) && 
>>>>>> ORTE_EPOCH_MIN < orte_util_lookup_epoch(&proc_name)) {
>>>>>> -                    OPAL_OUTPUT_VERBOSE((3, orte_routed_base_output,
>>>>>> +                if (!PROC_IS_RUNNING(&proc_name) 
>>>>>> +                    && 0 < 
>>>>>> ORTE_EPOCH_CMP(ORTE_EPOCH_MIN,orte_util_lookup_epoch(&proc_name))) {
>>>>>> +                    OPAL_OUTPUT_VERBOSE((5, orte_routed_base_output,
>>>>>>                                     "%s routed:binomial find children 
>>>>>> proc out of date - returning parent %d",
>>>>>>                                     ORTE_NAME_PRINT(ORTE_PROC_MY_NAME), 
>>>>>> parent));
>>>>>>                return parent;
>>>>>>            }
>>>>>> -                OPAL_OUTPUT_VERBOSE((3, orte_routed_base_output,
>>>>>> +                OPAL_OUTPUT_VERBOSE((5, orte_routed_base_output,
>>>>>>                                 "%s routed:binomial find children 
>>>>>> returning found value %d",
>>>>>>                                 ORTE_NAME_PRINT(ORTE_PROC_MY_NAME), 
>>>>>> found));
>>>>>>            return found;
>>>>>> @@ -1029,8 +1038,7 @@
>>>>>> ORTE_PROC_MY_PARENT->vpid = binomial_tree(0, 0, ORTE_PROC_MY_NAME->vpid,
>>>>>>                               orte_process_info.max_procs,
>>>>>>                               &num_children, &my_children, NULL, true, 
>>>>>> jobid);
>>>>>> -    ORTE_PROC_MY_PARENT->epoch = ORTE_EPOCH_INVALID;
>>>>>> -    ORTE_PROC_MY_PARENT->epoch = 
>>>>>> orte_ess.proc_get_epoch(ORTE_PROC_MY_PARENT);
>>>>>> +    
>>>>>> ORTE_EPOCH_SET(ORTE_PROC_MY_PARENT->epoch,orte_ess.proc_get_epoch(ORTE_PROC_MY_PARENT));
>>>>>> 
>>>>>> if (0 < opal_output_get_verbosity(orte_routed_base_output)) {
>>>>>>    opal_output(0, "%s: parent %d num_children %d", 
>>>>>> ORTE_NAME_PRINT(ORTE_PROC_MY_NAME), ORTE_PROC_MY_PARENT->vpid, 
>>>>>> num_children);
>>>>>> 
>>>>>> Modified: trunk/orte/mca/routed/cm/routed_cm.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/mca/routed/cm/routed_cm.c (original)
>>>>>> +++ trunk/orte/mca/routed/cm/routed_cm.c 2011-08-26 18:16:14 EDT (Fri, 
>>>>>> 26 Aug 2011)
>>>>>> @@ -35,6 +35,7 @@
>>>>>> #include "orte/runtime/orte_globals.h"
>>>>>> #include "orte/runtime/orte_wait.h"
>>>>>> #include "orte/runtime/runtime.h"
>>>>>> +#include "orte/runtime/data_type_support/orte_dt_support.h"
>>>>>> 
>>>>>> #include "orte/mca/rml/base/rml_contact.h"
>>>>>> 
>>>>>> @@ -139,7 +140,7 @@
>>>>>> 
>>>>>> if (proc->jobid == ORTE_JOBID_INVALID ||
>>>>>>    proc->vpid == ORTE_VPID_INVALID ||
>>>>>> -        proc->epoch == ORTE_EPOCH_INVALID) {
>>>>>> +        0 == ORTE_EPOCH_CMP(proc->epoch,ORTE_EPOCH_INVALID)) {
>>>>>>    return ORTE_ERR_BAD_PARAM;
>>>>>> }
>>>>>> 
>>>>>> @@ -200,7 +201,7 @@
>>>>>> 
>>>>>> if (target->jobid == ORTE_JOBID_INVALID ||
>>>>>>    target->vpid == ORTE_VPID_INVALID ||
>>>>>> -        target->epoch == ORTE_EPOCH_INVALID) {
>>>>>> +        0 == ORTE_EPOCH_CMP(target->epoch,ORTE_EPOCH_INVALID)) {
>>>>>>    return ORTE_ERR_BAD_PARAM;
>>>>>> }
>>>>>> 
>>>>>> @@ -257,8 +258,7 @@
>>>>>>                                 ORTE_NAME_PRINT(route)));
>>>>>>            jfam->route.jobid = route->jobid;
>>>>>>            jfam->route.vpid = route->vpid;
>>>>>> -                jfam->route.epoch = ORTE_EPOCH_INVALID;
>>>>>> -                jfam->route.epoch = 
>>>>>> orte_ess.proc_get_epoch(&jfam->route);
>>>>>> +                
>>>>>> ORTE_EPOCH_SET(jfam->route.epoch,orte_ess.proc_get_epoch(&jfam->route));
>>>>>> 
>>>>>>            return ORTE_SUCCESS;
>>>>>>        }
>>>>>> @@ -273,8 +273,7 @@
>>>>>>    jfam->job_family = jfamily;
>>>>>>    jfam->route.jobid = route->jobid;
>>>>>>    jfam->route.vpid = route->vpid;
>>>>>> -        jfam->route.epoch = ORTE_EPOCH_INVALID;
>>>>>> -        jfam->route.epoch = orte_ess.proc_get_epoch(&jfam->route);
>>>>>> +        
>>>>>> ORTE_EPOCH_SET(jfam->route.epoch,orte_ess.proc_get_epoch(&jfam->route));
>>>>>> 
>>>>>>    opal_pointer_array_add(&orte_routed_jobfams, jfam);
>>>>>>    return ORTE_SUCCESS;
>>>>>> @@ -299,7 +298,7 @@
>>>>>> 
>>>>>> if (target->jobid == ORTE_JOBID_INVALID ||
>>>>>>    target->vpid == ORTE_VPID_INVALID ||
>>>>>> -        target->epoch == ORTE_EPOCH_INVALID) {
>>>>>> +        0 == ORTE_EPOCH_CMP(target->epoch,ORTE_EPOCH_INVALID)) {
>>>>>>    ret = ORTE_NAME_INVALID;
>>>>>>    goto found;
>>>>>> }
>>>>>> @@ -367,8 +366,7 @@
>>>>>> }
>>>>>> 
>>>>>> /* Initialize daemon's epoch, based on its current vpid/jobid */
>>>>>> -    daemon.epoch = ORTE_EPOCH_INVALID;
>>>>>> -    daemon.epoch = orte_ess.proc_get_epoch(&daemon);
>>>>>> +    ORTE_EPOCH_SET(daemon.epoch,orte_ess.proc_get_epoch(&daemon));
>>>>>> 
>>>>>> /* if the daemon is me, then send direct to the target! */
>>>>>> if (ORTE_PROC_MY_NAME->vpid == daemon.vpid) {
>>>>>> @@ -814,8 +812,7 @@
>>>>>> */
>>>>>> local_lifeline.jobid = proc->jobid;
>>>>>> local_lifeline.vpid = proc->vpid;
>>>>>> -    local_lifeline.epoch = ORTE_EPOCH_INVALID;
>>>>>> -    local_lifeline.epoch = orte_ess.proc_get_epoch(&local_lifeline);
>>>>>> +    
>>>>>> ORTE_EPOCH_SET(local_lifeline.epoch,orte_ess.proc_get_epoch(&local_lifeline));
>>>>>> 
>>>>>> lifeline = &local_lifeline;
>>>>>> 
>>>>>> 
>>>>>> Modified: trunk/orte/mca/routed/direct/routed_direct.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/mca/routed/direct/routed_direct.c (original)
>>>>>> +++ trunk/orte/mca/routed/direct/routed_direct.c 2011-08-26 18:16:14 EDT 
>>>>>> (Fri, 26 Aug 2011)
>>>>>> @@ -24,6 +24,7 @@
>>>>>> #include "orte/util/name_fns.h"
>>>>>> #include "orte/util/proc_info.h"
>>>>>> #include "orte/runtime/orte_globals.h"
>>>>>> +#include "orte/runtime/data_type_support/orte_dt_support.h"
>>>>>> 
>>>>>> #include "orte/mca/rml/base/rml_contact.h"
>>>>>> 
>>>>>> @@ -135,7 +136,7 @@
>>>>>> 
>>>>>> if (target->jobid == ORTE_JOBID_INVALID ||
>>>>>>    target->vpid == ORTE_VPID_INVALID ||
>>>>>> -        target->epoch == ORTE_EPOCH_INVALID) {
>>>>>> +        0 == ORTE_EPOCH_CMP(target->epoch,ORTE_EPOCH_INVALID)) {
>>>>>>    ret = ORTE_NAME_INVALID;
>>>>>> } else {
>>>>>>    /* all routes are direct */
>>>>>> 
>>>>>> Modified: trunk/orte/mca/routed/linear/routed_linear.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/mca/routed/linear/routed_linear.c (original)
>>>>>> +++ trunk/orte/mca/routed/linear/routed_linear.c 2011-08-26 18:16:14 EDT 
>>>>>> (Fri, 26 Aug 2011)
>>>>>> @@ -31,6 +31,7 @@
>>>>>> #include "orte/runtime/orte_globals.h"
>>>>>> #include "orte/runtime/orte_wait.h"
>>>>>> #include "orte/runtime/runtime.h"
>>>>>> +#include "orte/runtime/data_type_support/orte_dt_support.h"
>>>>>> 
>>>>>> #include "orte/mca/rml/base/rml_contact.h"
>>>>>> 
>>>>>> @@ -132,7 +133,7 @@
>>>>>> 
>>>>>> if (proc->jobid == ORTE_JOBID_INVALID ||
>>>>>>    proc->vpid == ORTE_VPID_INVALID ||
>>>>>> -        proc->epoch == ORTE_EPOCH_INVALID) {
>>>>>> +        0 == ORTE_EPOCH_CMP(proc->epoch,ORTE_EPOCH_INVALID)) {
>>>>>>    return ORTE_ERR_BAD_PARAM;
>>>>>> }
>>>>>> 
>>>>>> @@ -201,7 +202,7 @@
>>>>>> 
>>>>>> if (target->jobid == ORTE_JOBID_INVALID ||
>>>>>>    target->vpid == ORTE_VPID_INVALID ||
>>>>>> -        target->epoch == ORTE_EPOCH_INVALID) {
>>>>>> +        0 == ORTE_EPOCH_CMP(target->epoch,ORTE_EPOCH_INVALID)) {
>>>>>>    return ORTE_ERR_BAD_PARAM;
>>>>>> }
>>>>>> 
>>>>>> @@ -259,7 +260,7 @@
>>>>>>                                 ORTE_NAME_PRINT(route)));
>>>>>>            jfam->route.jobid = route->jobid;
>>>>>>            jfam->route.vpid = route->vpid;
>>>>>> -                jfam->route.epoch = route->epoch;
>>>>>> +                ORTE_EPOCH_SET(jfam->route.epoch,route->epoch);
>>>>>>            return ORTE_SUCCESS;
>>>>>>        }
>>>>>>    }
>>>>>> @@ -273,7 +274,7 @@
>>>>>>    jfam->job_family = jfamily;
>>>>>>    jfam->route.jobid = route->jobid;
>>>>>>    jfam->route.vpid = route->vpid;
>>>>>> -        jfam->route.epoch = route->epoch;
>>>>>> +        ORTE_EPOCH_SET(jfam->route.epoch,route->epoch);
>>>>>>    opal_pointer_array_add(&orte_routed_jobfams, jfam);
>>>>>>    return ORTE_SUCCESS;
>>>>>> }
>>>>>> @@ -373,8 +374,7 @@
>>>>>> }
>>>>>> 
>>>>>> /* Initialize daemon's epoch, based on its current vpid/jobid */
>>>>>> -    daemon.epoch = ORTE_EPOCH_INVALID;
>>>>>> -    daemon.epoch = orte_ess.proc_get_epoch(&daemon);
>>>>>> +    ORTE_EPOCH_SET(daemon.epoch,orte_ess.proc_get_epoch(&daemon));
>>>>>> 
>>>>>> /* if the daemon is me, then send direct to the target! */
>>>>>> if (ORTE_PROC_MY_NAME->vpid == daemon.vpid) {
>>>>>> @@ -395,8 +395,7 @@
>>>>>>            /* we are at end of chain - wrap around */
>>>>>>            daemon.vpid = 0;
>>>>>>        }
>>>>>> -            daemon.epoch = ORTE_EPOCH_INVALID;
>>>>>> -            daemon.epoch = orte_ess.proc_get_epoch(&daemon);
>>>>>> +            
>>>>>> ORTE_EPOCH_SET(daemon.epoch,orte_ess.proc_get_epoch(&daemon));
>>>>>>        ret = &daemon;
>>>>>>    }
>>>>>> }
>>>>>> @@ -741,7 +740,7 @@
>>>>>> */
>>>>>> local_lifeline.jobid = proc->jobid;
>>>>>> local_lifeline.vpid = proc->vpid;
>>>>>> -    local_lifeline.epoch = proc->epoch;
>>>>>> +    ORTE_EPOCH_SET(local_lifeline.epoch,proc->epoch);
>>>>>> lifeline = &local_lifeline;
>>>>>> 
>>>>>> return ORTE_SUCCESS;
>>>>>> 
>>>>>> Modified: trunk/orte/mca/routed/radix/routed_radix.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/mca/routed/radix/routed_radix.c   (original)
>>>>>> +++ trunk/orte/mca/routed/radix/routed_radix.c   2011-08-26 18:16:14 EDT 
>>>>>> (Fri, 26 Aug 2011)
>>>>>> @@ -31,6 +31,7 @@
>>>>>> #include "orte/runtime/orte_globals.h"
>>>>>> #include "orte/runtime/orte_wait.h"
>>>>>> #include "orte/runtime/runtime.h"
>>>>>> +#include "orte/runtime/data_type_support/orte_dt_support.h"
>>>>>> 
>>>>>> #include "orte/mca/rml/base/rml_contact.h"
>>>>>> 
>>>>>> @@ -145,7 +146,7 @@
>>>>>> 
>>>>>> if (proc->jobid == ORTE_JOBID_INVALID ||
>>>>>>    proc->vpid == ORTE_VPID_INVALID ||
>>>>>> -        proc->epoch == ORTE_EPOCH_INVALID) {
>>>>>> +        0 == ORTE_EPOCH_CMP(proc->epoch,ORTE_EPOCH_INVALID)) {
>>>>>>    return ORTE_ERR_BAD_PARAM;
>>>>>> }
>>>>>> 
>>>>>> @@ -214,7 +215,7 @@
>>>>>> 
>>>>>> if (target->jobid == ORTE_JOBID_INVALID ||
>>>>>>    target->vpid == ORTE_VPID_INVALID ||
>>>>>> -        target->epoch == ORTE_EPOCH_INVALID) {
>>>>>> +        0 == ORTE_EPOCH_CMP(target->epoch,ORTE_EPOCH_INVALID)) {
>>>>>>    return ORTE_ERR_BAD_PARAM;
>>>>>> }
>>>>>> 
>>>>>> @@ -272,7 +273,7 @@
>>>>>>                                 ORTE_NAME_PRINT(route)));
>>>>>>            jfam->route.jobid = route->jobid;
>>>>>>            jfam->route.vpid = route->vpid;
>>>>>> -                jfam->route.epoch = route->epoch;
>>>>>> +                ORTE_EPOCH_SET(jfam->route.epoch,route->epoch);
>>>>>>            return ORTE_SUCCESS;
>>>>>>        }
>>>>>>    }
>>>>>> @@ -286,7 +287,7 @@
>>>>>>    jfam->job_family = jfamily;
>>>>>>    jfam->route.jobid = route->jobid;
>>>>>>    jfam->route.vpid = route->vpid;
>>>>>> -        jfam->route.epoch = route->epoch;
>>>>>> +        ORTE_EPOCH_SET(jfam->route.epoch,route->epoch);
>>>>>>    opal_pointer_array_add(&orte_routed_jobfams, jfam);
>>>>>>    return ORTE_SUCCESS;
>>>>>> }
>>>>>> @@ -310,7 +311,7 @@
>>>>>> 
>>>>>> if (target->jobid == ORTE_JOBID_INVALID ||
>>>>>>    target->vpid == ORTE_VPID_INVALID ||
>>>>>> -        target->epoch == ORTE_EPOCH_INVALID) {
>>>>>> +        0 == ORTE_EPOCH_CMP(target->epoch,ORTE_EPOCH_INVALID)) {
>>>>>>    ret = ORTE_NAME_INVALID;
>>>>>>    goto found;
>>>>>> }
>>>>>> @@ -413,8 +414,7 @@
>>>>>>        if (opal_bitmap_is_set_bit(&child->relatives, daemon.vpid)) {
>>>>>>            /* yep - we need to step through this child */
>>>>>>            daemon.vpid = child->vpid;
>>>>>> -                daemon.epoch = ORTE_EPOCH_INVALID;
>>>>>> -                daemon.epoch = orte_ess.proc_get_epoch(&daemon);
>>>>>> +                
>>>>>> ORTE_EPOCH_SET(daemon.epoch,orte_ess.proc_get_epoch(&daemon));
>>>>>>            ret = &daemon;
>>>>>>            goto found;
>>>>>>        }
>>>>>> @@ -425,8 +425,7 @@
>>>>>> * any of our children, so we have to step up through our parent
>>>>>> */
>>>>>> daemon.vpid = ORTE_PROC_MY_PARENT->vpid;
>>>>>> -    daemon.epoch = ORTE_EPOCH_INVALID;
>>>>>> -    daemon.epoch = orte_ess.proc_get_epoch(&daemon);
>>>>>> +    ORTE_EPOCH_SET(daemon.epoch,orte_ess.proc_get_epoch(&daemon));
>>>>>> 
>>>>>> ret = &daemon;
>>>>>> 
>>>>>> @@ -788,7 +787,7 @@
>>>>>> */
>>>>>> local_lifeline.jobid = proc->jobid;
>>>>>> local_lifeline.vpid = proc->vpid;
>>>>>> -    local_lifeline.epoch = proc->epoch;
>>>>>> +    ORTE_EPOCH_SET(local_lifeline.epoch,proc->epoch);
>>>>>> lifeline = &local_lifeline;
>>>>>> 
>>>>>> return ORTE_SUCCESS;
>>>>>> @@ -881,8 +880,7 @@
>>>>>>    ORTE_PROC_MY_PARENT->vpid = (Ii-Sum) % NInPrevLevel;
>>>>>>    ORTE_PROC_MY_PARENT->vpid += (Sum - NInPrevLevel);
>>>>>> }
>>>>>> -    ORTE_PROC_MY_PARENT->epoch = ORTE_EPOCH_INVALID;
>>>>>> -    ORTE_PROC_MY_PARENT->epoch = 
>>>>>> orte_ess.proc_get_epoch(ORTE_PROC_MY_PARENT);
>>>>>> +    
>>>>>> ORTE_EPOCH_SET(ORTE_PROC_MY_PARENT->epoch,orte_ess.proc_get_epoch(ORTE_PROC_MY_PARENT));
>>>>>> 
>>>>>> /* compute my direct children and the bitmap that shows which vpids
>>>>>> * lie underneath their branch
>>>>>> 
>>>>>> Modified: trunk/orte/mca/routed/slave/routed_slave.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/mca/routed/slave/routed_slave.c   (original)
>>>>>> +++ trunk/orte/mca/routed/slave/routed_slave.c   2011-08-26 18:16:14 EDT 
>>>>>> (Fri, 26 Aug 2011)
>>>>>> @@ -26,6 +26,7 @@
>>>>>> #include "orte/runtime/orte_globals.h"
>>>>>> #include "orte/runtime/orte_wait.h"
>>>>>> #include "orte/runtime/runtime.h"
>>>>>> +#include "orte/runtime/data_type_support/orte_dt_support.h"
>>>>>> 
>>>>>> #include "orte/mca/rml/base/rml_contact.h"
>>>>>> 
>>>>>> @@ -134,7 +135,7 @@
>>>>>> 
>>>>>> if (target->jobid == ORTE_JOBID_INVALID ||
>>>>>>    target->vpid == ORTE_VPID_INVALID ||
>>>>>> -        target->epoch == ORTE_EPOCH_INVALID) {
>>>>>> +        0 == ORTE_EPOCH_CMP(target->epoch,ORTE_EPOCH_INVALID)) {
>>>>>>    ret = ORTE_NAME_INVALID;
>>>>>> } else {
>>>>>>    /* a slave must always route via its parent daemon */
>>>>>> @@ -275,8 +276,7 @@
>>>>>> */
>>>>>> local_lifeline.jobid = proc->jobid;
>>>>>> local_lifeline.vpid = proc->vpid;
>>>>>> -    local_lifeline.epoch = ORTE_EPOCH_INVALID;
>>>>>> -    local_lifeline.epoch = orte_ess.proc_get_epoch(&local_lifeline);
>>>>>> +    
>>>>>> ORTE_EPOCH_SET(local_lifeline.epoch,orte_ess.proc_get_epoch(&local_lifeline));
>>>>>> 
>>>>>> lifeline = &local_lifeline;
>>>>>> 
>>>>>> 
>>>>>> Modified: trunk/orte/mca/sensor/file/sensor_file.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/mca/sensor/file/sensor_file.c     (original)
>>>>>> +++ trunk/orte/mca/sensor/file/sensor_file.c     2011-08-26 18:16:14 EDT 
>>>>>> (Fri, 26 Aug 2011)
>>>>>> @@ -70,7 +70,9 @@
>>>>>> opal_list_item_t super;
>>>>>> orte_jobid_t jobid;
>>>>>> orte_vpid_t vpid;
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>> orte_epoch_t epoch;
>>>>>> +#endif
>>>>>> char *file;
>>>>>> int tick;
>>>>>> bool check_size;
>>>>>> 
>>>>>> Modified: trunk/orte/mca/snapc/base/snapc_base_fns.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/mca/snapc/base/snapc_base_fns.c   (original)
>>>>>> +++ trunk/orte/mca/snapc/base/snapc_base_fns.c   2011-08-26 18:16:14 EDT 
>>>>>> (Fri, 26 Aug 2011)
>>>>>> @@ -81,7 +81,7 @@
>>>>>> {
>>>>>> snapshot->process_name.jobid  = 0;
>>>>>> snapshot->process_name.vpid   = 0;
>>>>>> -    snapshot->process_name.epoch  = ORTE_EPOCH_MIN;
>>>>>> +    ORTE_EPOCH_SET(snapshot->process_name.epoch,ORTE_EPOCH_MIN);
>>>>>> 
>>>>>> snapshot->state = ORTE_SNAPC_CKPT_STATE_NONE;
>>>>>> 
>>>>>> @@ -92,7 +92,7 @@
>>>>>> {
>>>>>> snapshot->process_name.jobid  = 0;
>>>>>> snapshot->process_name.vpid   = 0;
>>>>>> -    snapshot->process_name.epoch  = ORTE_EPOCH_MIN;
>>>>>> +    ORTE_EPOCH_SET(snapshot->process_name.epoch,ORTE_EPOCH_MIN);
>>>>>> 
>>>>>> snapshot->state = ORTE_SNAPC_CKPT_STATE_NONE;
>>>>>> 
>>>>>> 
>>>>>> Modified: trunk/orte/mca/snapc/full/snapc_full_global.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/mca/snapc/full/snapc_full_global.c        (original)
>>>>>> +++ trunk/orte/mca/snapc/full/snapc_full_global.c        2011-08-26 
>>>>>> 18:16:14 EDT (Fri, 26 Aug 2011)
>>>>>> @@ -427,7 +427,7 @@
>>>>>>        new_proc = OBJ_NEW(orte_proc_t);
>>>>>>        new_proc->name.jobid = proc->name.jobid;
>>>>>>        new_proc->name.vpid  = proc->name.vpid;
>>>>>> -            new_proc->name.epoch = proc->name.epoch;
>>>>>> +            ORTE_EPOCH_SET(new_proc->name.epoch,proc->name.epoch);
>>>>>>        new_proc->node = OBJ_NEW(orte_node_t);
>>>>>>        new_proc->node->name = proc->node->name;
>>>>>>        opal_list_append(migrating_procs, &new_proc->super);
>>>>>> @@ -618,7 +618,7 @@
>>>>>> 
>>>>>>    orted_snapshot->process_name.jobid  = cur_node->daemon->name.jobid;
>>>>>>    orted_snapshot->process_name.vpid   = cur_node->daemon->name.vpid;
>>>>>> -        orted_snapshot->process_name.epoch  = 
>>>>>> cur_node->daemon->name.epoch;
>>>>>> +        
>>>>>> ORTE_EPOCH_SET(orted_snapshot->process_name.epoch,cur_node->daemon->name.epoch);
>>>>>> 
>>>>>>    mask = ORTE_NS_CMP_JOBID;
>>>>>> 
>>>>>> @@ -636,7 +636,7 @@
>>>>>> 
>>>>>>        app_snapshot->process_name.jobid = procs[p]->name.jobid;
>>>>>>        app_snapshot->process_name.vpid = procs[p]->name.vpid;
>>>>>> -            app_snapshot->process_name.epoch = procs[p]->name.epoch;
>>>>>> +            
>>>>>> ORTE_EPOCH_SET(app_snapshot->process_name.epoch,procs[p]->name.epoch);
>>>>>> 
>>>>>>        opal_list_append(&(orted_snapshot->super.local_snapshots), 
>>>>>> &(app_snapshot->super));
>>>>>>    }
>>>>>> @@ -800,7 +800,7 @@
>>>>>> 
>>>>>>            app_snapshot->process_name.jobid = procs[p]->name.jobid;
>>>>>>            app_snapshot->process_name.vpid = procs[p]->name.vpid;
>>>>>> -                app_snapshot->process_name.epoch = procs[p]->name.epoch;
>>>>>> +                
>>>>>> ORTE_EPOCH_SET(app_snapshot->process_name.epoch,procs[p]->name.epoch);
>>>>>> 
>>>>>>            opal_list_append(&(orted_snapshot->super.local_snapshots), 
>>>>>> &(app_snapshot->super));
>>>>>>        }
>>>>>> @@ -816,7 +816,7 @@
>>>>>> 
>>>>>>    orted_snapshot->process_name.jobid  = cur_node->daemon->name.jobid;
>>>>>>    orted_snapshot->process_name.vpid   = cur_node->daemon->name.vpid;
>>>>>> -        orted_snapshot->process_name.epoch  = 
>>>>>> cur_node->daemon->name.epoch;
>>>>>> +        
>>>>>> ORTE_EPOCH_SET(orted_snapshot->process_name.epoch,cur_node->daemon->name.epoch);
>>>>>> 
>>>>>>    mask = ORTE_NS_CMP_ALL;
>>>>>> 
>>>>>> @@ -837,7 +837,7 @@
>>>>>> 
>>>>>>        app_snapshot->process_name.jobid = procs[p]->name.jobid;
>>>>>>        app_snapshot->process_name.vpid = procs[p]->name.vpid;
>>>>>> -            app_snapshot->process_name.epoch = procs[p]->name.epoch;
>>>>>> +            
>>>>>> ORTE_EPOCH_SET(app_snapshot->process_name.epoch,procs[p]->name.epoch);
>>>>>> 
>>>>>>        opal_list_append(&(orted_snapshot->super.local_snapshots), 
>>>>>> &(app_snapshot->super));
>>>>>>    }
>>>>>> 
>>>>>> Modified: trunk/orte/mca/snapc/full/snapc_full_local.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/mca/snapc/full/snapc_full_local.c (original)
>>>>>> +++ trunk/orte/mca/snapc/full/snapc_full_local.c 2011-08-26 18:16:14 EDT 
>>>>>> (Fri, 26 Aug 2011)
>>>>>> @@ -2033,7 +2033,7 @@
>>>>>>        vpid_snapshot->process_pid              = child->pid;
>>>>>>        vpid_snapshot->super.process_name.jobid = child->name->jobid;
>>>>>>        vpid_snapshot->super.process_name.vpid  = child->name->vpid;
>>>>>> -            vpid_snapshot->super.process_name.epoch = 
>>>>>> child->name->epoch;
>>>>>> +            
>>>>>> ORTE_EPOCH_SET(vpid_snapshot->super.process_name.epoch,child->name->epoch);
>>>>>>    }
>>>>>> }
>>>>>> 
>>>>>> @@ -2095,7 +2095,7 @@
>>>>>>        vpid_snapshot->process_pid              = child->pid;
>>>>>>        vpid_snapshot->super.process_name.jobid = child->name->jobid;
>>>>>>        vpid_snapshot->super.process_name.vpid  = child->name->vpid;
>>>>>> -            vpid_snapshot->super.process_name.epoch = 
>>>>>> child->name->epoch;
>>>>>> +            
>>>>>> ORTE_EPOCH_SET(vpid_snapshot->super.process_name.epoch,child->name->epoch);
>>>>>>        /*vpid_snapshot->migrating = true;*/
>>>>>> 
>>>>>>        opal_list_append(&(local_global_snapshot.local_snapshots), 
>>>>>> &(vpid_snapshot->super.super));
>>>>>> @@ -2111,7 +2111,7 @@
>>>>>>        vpid_snapshot->process_pid              = child->pid;
>>>>>>        vpid_snapshot->super.process_name.jobid = child->name->jobid;
>>>>>>        vpid_snapshot->super.process_name.vpid  = child->name->vpid;
>>>>>> -            vpid_snapshot->super.process_name.epoch = 
>>>>>> child->name->epoch;
>>>>>> +            
>>>>>> ORTE_EPOCH_SET(vpid_snapshot->super.process_name.epoch,child->name->epoch);
>>>>>>    }
>>>>>> }
>>>>>> 
>>>>>> 
>>>>>> Modified: trunk/orte/mca/snapc/full/snapc_full_module.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/mca/snapc/full/snapc_full_module.c        (original)
>>>>>> +++ trunk/orte/mca/snapc/full/snapc_full_module.c        2011-08-26 
>>>>>> 18:16:14 EDT (Fri, 26 Aug 2011)
>>>>>> @@ -83,7 +83,7 @@
>>>>>> void orte_snapc_full_orted_construct(orte_snapc_full_orted_snapshot_t 
>>>>>> *snapshot) {
>>>>>> snapshot->process_name.jobid  = 0;
>>>>>> snapshot->process_name.vpid   = 0;
>>>>>> -    snapshot->process_name.epoch  = 0;
>>>>>> +    ORTE_EPOCH_SET(snapshot->process_name.epoch,0);
>>>>>> 
>>>>>> snapshot->state = ORTE_SNAPC_CKPT_STATE_NONE;
>>>>>> }
>>>>>> @@ -91,7 +91,7 @@
>>>>>> void orte_snapc_full_orted_destruct( orte_snapc_full_orted_snapshot_t 
>>>>>> *snapshot) {
>>>>>> snapshot->process_name.jobid  = 0;
>>>>>> snapshot->process_name.vpid   = 0;
>>>>>> -    snapshot->process_name.epoch  = 0;
>>>>>> +    ORTE_EPOCH_SET(snapshot->process_name.epoch,0);
>>>>>> 
>>>>>> snapshot->state = ORTE_SNAPC_CKPT_STATE_NONE;
>>>>>> }
>>>>>> 
>>>>>> Modified: trunk/orte/mca/sstore/base/sstore_base_fns.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/mca/sstore/base/sstore_base_fns.c (original)
>>>>>> +++ trunk/orte/mca/sstore/base/sstore_base_fns.c 2011-08-26 18:16:14 EDT 
>>>>>> (Fri, 26 Aug 2011)
>>>>>> @@ -62,7 +62,7 @@
>>>>>> {
>>>>>> snapshot->process_name.jobid  = 0;
>>>>>> snapshot->process_name.vpid   = 0;
>>>>>> -    snapshot->process_name.epoch  = ORTE_EPOCH_MIN;
>>>>>> +    ORTE_EPOCH_SET(snapshot->process_name.epoch,ORTE_EPOCH_MIN);
>>>>>> 
>>>>>> snapshot->crs_comp = NULL;
>>>>>> snapshot->compress_comp    = NULL;
>>>>>> @@ -76,7 +76,7 @@
>>>>>> {
>>>>>> snapshot->process_name.jobid  = 0;
>>>>>> snapshot->process_name.vpid   = 0;
>>>>>> -    snapshot->process_name.epoch  = ORTE_EPOCH_MIN;
>>>>>> +    ORTE_EPOCH_SET(snapshot->process_name.epoch,ORTE_EPOCH_MIN);
>>>>>> 
>>>>>> if( NULL != snapshot->crs_comp ) {
>>>>>>    free(snapshot->crs_comp);
>>>>>> @@ -637,7 +637,7 @@
>>>>>> 
>>>>>>        vpid_snapshot->process_name.jobid  = proc.jobid;
>>>>>>        vpid_snapshot->process_name.vpid   = proc.vpid;
>>>>>> -            vpid_snapshot->process_name.epoch  = proc.epoch;
>>>>>> +            
>>>>>> ORTE_EPOCH_SET(vpid_snapshot->process_name.epoch,proc.epoch);
>>>>>>    }
>>>>>>    else if(0 == strncmp(token, SSTORE_METADATA_LOCAL_CRS_COMP_STR, 
>>>>>> strlen(SSTORE_METADATA_LOCAL_CRS_COMP_STR))) {
>>>>>>        vpid_snapshot->crs_comp = strdup(value);
>>>>>> 
>>>>>> Modified: trunk/orte/mca/sstore/central/sstore_central_global.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/mca/sstore/central/sstore_central_global.c        
>>>>>> (original)
>>>>>> +++ trunk/orte/mca/sstore/central/sstore_central_global.c        
>>>>>> 2011-08-26 18:16:14 EDT (Fri, 26 Aug 2011)
>>>>>> @@ -1216,8 +1216,7 @@
>>>>>> 
>>>>>>    vpid_snapshot->process_name.jobid  = handle_info->jobid;
>>>>>>    vpid_snapshot->process_name.vpid   = i;
>>>>>> -        vpid_snapshot->process_name.epoch = ORTE_EPOCH_INVALID;
>>>>>> -        vpid_snapshot->process_name.epoch = 
>>>>>> orte_ess.proc_get_epoch(&vpid_snapshot->process_name);
>>>>>> +        
>>>>>> ORTE_EPOCH_SET(vpid_snapshot->process_name.epoch,orte_ess.proc_get_epoch(&vpid_snapshot->process_name));
>>>>>> 
>>>>>>    vpid_snapshot->crs_comp     = NULL;
>>>>>>    global_snapshot->start_time = NULL;
>>>>>> 
>>>>>> Modified: trunk/orte/mca/sstore/central/sstore_central_local.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/mca/sstore/central/sstore_central_local.c (original)
>>>>>> +++ trunk/orte/mca/sstore/central/sstore_central_local.c 2011-08-26 
>>>>>> 18:16:14 EDT (Fri, 26 Aug 2011)
>>>>>> @@ -210,7 +210,7 @@
>>>>>> {
>>>>>> info->name.jobid = ORTE_JOBID_INVALID;
>>>>>> info->name.vpid  = ORTE_VPID_INVALID;
>>>>>> -    info->name.epoch = ORTE_EPOCH_MIN;
>>>>>> +    ORTE_EPOCH_SET(info->name.epoch,ORTE_EPOCH_MIN);
>>>>>> 
>>>>>> info->local_location = NULL;
>>>>>> info->metadata_filename = NULL;
>>>>>> @@ -222,7 +222,7 @@
>>>>>> {
>>>>>> info->name.jobid = ORTE_JOBID_INVALID;
>>>>>> info->name.vpid  = ORTE_VPID_INVALID;
>>>>>> -    info->name.epoch = ORTE_EPOCH_MIN;
>>>>>> +    ORTE_EPOCH_SET(info->name.epoch,ORTE_EPOCH_MIN);
>>>>>> 
>>>>>> if( NULL != info->local_location ) {
>>>>>>    free(info->local_location);
>>>>>> @@ -535,7 +535,7 @@
>>>>>> 
>>>>>> app_info->name.jobid = name->jobid;
>>>>>> app_info->name.vpid  = name->vpid;
>>>>>> -    app_info->name.epoch = name->epoch;
>>>>>> +    ORTE_EPOCH_SET(app_info->name.epoch,name->epoch);
>>>>>> 
>>>>>> opal_list_append(handle_info->app_info_handle, &(app_info->super));
>>>>>> 
>>>>>> 
>>>>>> Modified: trunk/orte/mca/sstore/stage/sstore_stage_global.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/mca/sstore/stage/sstore_stage_global.c    (original)
>>>>>> +++ trunk/orte/mca/sstore/stage/sstore_stage_global.c    2011-08-26 
>>>>>> 18:16:14 EDT (Fri, 26 Aug 2011)
>>>>>> @@ -1218,10 +1218,10 @@
>>>>>>    p_set = OBJ_NEW(orte_filem_base_process_set_t);
>>>>>>    p_set->source.jobid = peer->jobid;
>>>>>>    p_set->source.vpid  = peer->vpid;
>>>>>> -        p_set->source.epoch = peer->epoch;
>>>>>> +        ORTE_EPOCH_SET(p_set->source.epoch,peer->epoch);
>>>>>>    p_set->sink.jobid   = ORTE_PROC_MY_NAME->jobid;
>>>>>>    p_set->sink.vpid    = ORTE_PROC_MY_NAME->vpid;
>>>>>> -        p_set->sink.epoch   = ORTE_PROC_MY_NAME->epoch;
>>>>>> +        ORTE_EPOCH_SET(p_set->sink.epoch,ORTE_PROC_MY_NAME->epoch);
>>>>>>    opal_list_append(&(filem_request->process_sets), &(p_set->super) );
>>>>>> }
>>>>>> 
>>>>>> @@ -1706,8 +1706,7 @@
>>>>>> 
>>>>>>    vpid_snapshot->process_name.jobid  = handle_info->jobid;
>>>>>>    vpid_snapshot->process_name.vpid   = i;
>>>>>> -        vpid_snapshot->process_name.epoch = ORTE_EPOCH_INVALID;
>>>>>> -        vpid_snapshot->process_name.epoch = 
>>>>>> orte_ess.proc_get_epoch(&vpid_snapshot->process_name);
>>>>>> +        
>>>>>> ORTE_EPOCH_SET(vpid_snapshot->process_name.epoch,orte_ess.proc_get_epoch(&vpid_snapshot->process_name));
>>>>>> 
>>>>>>    /* JJH: Currently we do not have this information since we do not save
>>>>>>     * individual vpid info in the Global SStore. It is in the metadata
>>>>>> 
>>>>>> Modified: trunk/orte/mca/sstore/stage/sstore_stage_local.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/mca/sstore/stage/sstore_stage_local.c     (original)
>>>>>> +++ trunk/orte/mca/sstore/stage/sstore_stage_local.c     2011-08-26 
>>>>>> 18:16:14 EDT (Fri, 26 Aug 2011)
>>>>>> @@ -287,7 +287,7 @@
>>>>>> {
>>>>>> info->name.jobid = ORTE_JOBID_INVALID;
>>>>>> info->name.vpid  = ORTE_VPID_INVALID;
>>>>>> -    info->name.epoch = ORTE_EPOCH_MIN;
>>>>>> +    ORTE_EPOCH_SET(info->name.epoch,ORTE_EPOCH_MIN);
>>>>>> 
>>>>>> info->local_location = NULL;
>>>>>> info->compressed_local_location = NULL;
>>>>>> @@ -302,7 +302,7 @@
>>>>>> {
>>>>>> info->name.jobid = ORTE_JOBID_INVALID;
>>>>>> info->name.vpid  = ORTE_VPID_INVALID;
>>>>>> -    info->name.epoch = ORTE_EPOCH_MIN;
>>>>>> +    ORTE_EPOCH_SET(info->name.epoch,ORTE_EPOCH_MIN);
>>>>>> 
>>>>>> if( NULL != info->local_location ) {
>>>>>>    free(info->local_location);
>>>>>> @@ -1014,7 +1014,7 @@
>>>>>> 
>>>>>> app_info->name.jobid = name->jobid;
>>>>>> app_info->name.vpid  = name->vpid;
>>>>>> -    app_info->name.epoch = name->epoch;
>>>>>> +    ORTE_EPOCH_SET(app_info->name.epoch,name->epoch);
>>>>>> 
>>>>>> opal_list_append(handle_info->app_info_handle, &(app_info->super));
>>>>>> 
>>>>>> @@ -2057,17 +2057,17 @@
>>>>>>    /* if I am the HNP, then use me as the source */
>>>>>>    p_set->source.jobid = ORTE_PROC_MY_NAME->jobid;
>>>>>>    p_set->source.vpid  = ORTE_PROC_MY_NAME->vpid;
>>>>>> -        p_set->source.epoch = ORTE_PROC_MY_NAME->epoch;
>>>>>> +        ORTE_EPOCH_SET(p_set->source.epoch,ORTE_PROC_MY_NAME->epoch);
>>>>>> }
>>>>>> else {
>>>>>>    /* otherwise, set the HNP as the source */
>>>>>>    p_set->source.jobid = ORTE_PROC_MY_HNP->jobid;
>>>>>>    p_set->source.vpid  = ORTE_PROC_MY_HNP->vpid;
>>>>>> -        p_set->source.epoch = ORTE_PROC_MY_HNP->epoch;
>>>>>> +        ORTE_EPOCH_SET(p_set->source.epoch,ORTE_PROC_MY_HNP->epoch);
>>>>>> }
>>>>>> p_set->sink.jobid   = ORTE_PROC_MY_NAME->jobid;
>>>>>> p_set->sink.vpid    = ORTE_PROC_MY_NAME->vpid;
>>>>>> -    p_set->sink.epoch   = ORTE_PROC_MY_NAME->epoch;
>>>>>> +    ORTE_EPOCH_SET(p_set->sink.epoch,ORTE_PROC_MY_NAME->epoch);
>>>>>> opal_list_append(&(filem_request->process_sets), &(p_set->super) );
>>>>>> 
>>>>>> /* Define the file set */
>>>>>> 
>>>>>> Modified: trunk/orte/orted/orted_comm.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/orted/orted_comm.c        (original)
>>>>>> +++ trunk/orte/orted/orted_comm.c        2011-08-26 18:16:14 EDT (Fri, 
>>>>>> 26 Aug 2011)
>>>>>> @@ -123,18 +123,13 @@
>>>>>>    nm = (orte_routed_tree_t*)item;
>>>>>> 
>>>>>>    target.vpid = nm->vpid;
>>>>>> -        target.epoch = orte_util_lookup_epoch(&target);
>>>>>> +        ORTE_EPOCH_SET(target.epoch,orte_ess.proc_get_epoch(&target));
>>>>>> 
>>>>>> -        if (!orte_util_proc_is_running(&target)) {
>>>>>> +        if (!PROC_IS_RUNNING(&target)) {
>>>>>>        continue;
>>>>>>    }
>>>>>> 
>>>>>> -        target.epoch = ORTE_EPOCH_INVALID;
>>>>>> -        if (ORTE_NODE_RANK_INVALID == (target.epoch = 
>>>>>> orte_ess.proc_get_epoch(&target))) {
>>>>>> -            /* If we are trying to send to a previously failed process 
>>>>>> it's
>>>>>> -             * better to fail silently. */
>>>>>> -            continue;
>>>>>> -        }
>>>>>> +        ORTE_EPOCH_SET(target.epoch,orte_ess.proc_get_epoch(&target));
>>>>>> 
>>>>>>    OPAL_OUTPUT_VERBOSE((1, orte_debug_output,
>>>>>>                         "%s orte:daemon:send_relay sending relay msg to 
>>>>>> %s",
>>>>>> @@ -422,7 +417,8 @@
>>>>>>        proct = OBJ_NEW(orte_proc_t);
>>>>>>        proct->name.jobid = proc.jobid;
>>>>>>        proct->name.vpid = proc.vpid;
>>>>>> -            proct->name.epoch = proc.epoch;
>>>>>> +            ORTE_EPOCH_SET(proct->name.epoch,proc.epoch);
>>>>>> +
>>>>>>        opal_pointer_array_add(&procarray, proct);
>>>>>>        num_replies++;
>>>>>>    }
>>>>>> @@ -1059,7 +1055,9 @@
>>>>>>        orte_job_t *jdata;
>>>>>>        orte_proc_t *proc;
>>>>>>        orte_vpid_t vpid;
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>>        orte_epoch_t epoch;
>>>>>> +#endif
>>>>>>        int32_t i, num_procs;
>>>>>> 
>>>>>>        /* setup the answer */
>>>>>> @@ -1086,12 +1084,14 @@
>>>>>>            goto CLEANUP;
>>>>>>        }
>>>>>> 
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>>        /* unpack the epoch */
>>>>>>        n = 1;
>>>>>>        if (ORTE_SUCCESS != (ret = opal_dss.unpack(buffer, &epoch, &n, 
>>>>>> ORTE_EPOCH))) {
>>>>>>            ORTE_ERROR_LOG(ret);
>>>>>>            goto CLEANUP;
>>>>>>        }
>>>>>> +#endif
>>>>>> 
>>>>>>        /* if they asked for a specific proc, then just get that info */
>>>>>>        if (ORTE_VPID_WILDCARD != vpid) {
>>>>>> @@ -1201,7 +1201,7 @@
>>>>>>                /* loop across all daemons */
>>>>>>                proc2.jobid = ORTE_PROC_MY_NAME->jobid;
>>>>>>                for (proc2.vpid=1; proc2.vpid < 
>>>>>> orte_process_info.num_procs; proc2.vpid++) {
>>>>>> -                        proc2.epoch = orte_util_lookup_epoch(&proc2);
>>>>>> +                        
>>>>>> ORTE_EPOCH_SET(proc2.epoch,orte_util_lookup_epoch(&proc2));
>>>>>> 
>>>>>>                    /* setup the cmd */
>>>>>>                    relay_msg = OBJ_NEW(opal_buffer_t);
>>>>>> 
>>>>>> Modified: trunk/orte/orted/orted_main.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/orted/orted_main.c        (original)
>>>>>> +++ trunk/orte/orted/orted_main.c        2011-08-26 18:16:14 EDT (Fri, 
>>>>>> 26 Aug 2011)
>>>>>> @@ -388,14 +388,14 @@
>>>>>> orte_process_info.my_daemon_uri = orte_rml.get_contact_info();
>>>>>> ORTE_PROC_MY_DAEMON->jobid = ORTE_PROC_MY_NAME->jobid;
>>>>>> ORTE_PROC_MY_DAEMON->vpid = ORTE_PROC_MY_NAME->vpid;
>>>>>> -    ORTE_PROC_MY_DAEMON->epoch = ORTE_EPOCH_MIN;
>>>>>> +    ORTE_EPOCH_SET(ORTE_PROC_MY_DAEMON->epoch,ORTE_EPOCH_MIN);
>>>>>> 
>>>>>> /* if I am also the hnp, then update that contact info field too */
>>>>>> if (ORTE_PROC_IS_HNP) {
>>>>>>    orte_process_info.my_hnp_uri = orte_rml.get_contact_info();
>>>>>>    ORTE_PROC_MY_HNP->jobid = ORTE_PROC_MY_NAME->jobid;
>>>>>>    ORTE_PROC_MY_HNP->vpid = ORTE_PROC_MY_NAME->vpid;
>>>>>> -        ORTE_PROC_MY_HNP->epoch = ORTE_EPOCH_MIN;
>>>>>> +        ORTE_EPOCH_SET(ORTE_PROC_MY_HNP->epoch,ORTE_EPOCH_MIN);
>>>>>> }
>>>>>> 
>>>>>> /* setup the primary daemon command receive function */
>>>>>> @@ -495,7 +495,8 @@
>>>>>>    proc = OBJ_NEW(orte_proc_t);
>>>>>>    proc->name.jobid = jdata->jobid;
>>>>>>    proc->name.vpid = 0;
>>>>>> -        proc->name.epoch = ORTE_EPOCH_MIN;
>>>>>> +        ORTE_EPOCH_SET(proc->name.epoch,ORTE_EPOCH_MIN);
>>>>>> +
>>>>>>    proc->state = ORTE_PROC_STATE_RUNNING;
>>>>>>    proc->app_idx = 0;
>>>>>>    proc->node = nodes[0]; /* hnp node must be there */
>>>>>> 
>>>>>> Modified: trunk/orte/runtime/data_type_support/orte_dt_compare_fns.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/runtime/data_type_support/orte_dt_compare_fns.c   
>>>>>> (original)
>>>>>> +++ trunk/orte/runtime/data_type_support/orte_dt_compare_fns.c   
>>>>>> 2011-08-26 18:16:14 EDT (Fri, 26 Aug 2011)
>>>>>> @@ -76,6 +76,7 @@
>>>>>>    }
>>>>>> }
>>>>>> 
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>> /** check the epochs - if one of them is WILDCARD, then ignore
>>>>>> * this field since anything is okay
>>>>>> */
>>>>>> @@ -87,6 +88,7 @@
>>>>>>        return OPAL_VALUE1_GREATER;
>>>>>>    }
>>>>>> }
>>>>>> +#endif
>>>>>> 
>>>>>> /** only way to get here is if all fields are equal or WILDCARD */
>>>>>> return OPAL_EQUAL;
>>>>>> @@ -122,6 +124,7 @@
>>>>>> return OPAL_EQUAL;
>>>>>> }
>>>>>> 
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>> int orte_dt_compare_epoch(orte_epoch_t *value1,
>>>>>>                      orte_epoch_t *value2,
>>>>>>                      opal_data_type_t type)
>>>>>> @@ -136,6 +139,7 @@
>>>>>> 
>>>>>> return OPAL_EQUAL;
>>>>>> }
>>>>>> +#endif
>>>>>> 
>>>>>> #if !ORTE_DISABLE_FULL_SUPPORT
>>>>>> /**
>>>>>> 
>>>>>> Modified: trunk/orte/runtime/data_type_support/orte_dt_copy_fns.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/runtime/data_type_support/orte_dt_copy_fns.c      
>>>>>> (original)
>>>>>> +++ trunk/orte/runtime/data_type_support/orte_dt_copy_fns.c      
>>>>>> 2011-08-26 18:16:14 EDT (Fri, 26 Aug 2011)
>>>>>> @@ -61,7 +61,7 @@
>>>>>> 
>>>>>> val->jobid = src->jobid;
>>>>>> val->vpid = src->vpid;
>>>>>> -    val->epoch = src->epoch;
>>>>>> +    ORTE_EPOCH_SET(val->epoch,src->epoch);
>>>>>> 
>>>>>> *dest = val;
>>>>>> return ORTE_SUCCESS;
>>>>>> @@ -105,6 +105,7 @@
>>>>>> return ORTE_SUCCESS;
>>>>>> }
>>>>>> 
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>> /*
>>>>>> * EPOCH
>>>>>> */
>>>>>> @@ -123,6 +124,7 @@
>>>>>> 
>>>>>> return ORTE_SUCCESS;
>>>>>> }
>>>>>> +#endif
>>>>>> 
>>>>>> #if !ORTE_DISABLE_FULL_SUPPORT
>>>>>> 
>>>>>> 
>>>>>> Modified: trunk/orte/runtime/data_type_support/orte_dt_packing_fns.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/runtime/data_type_support/orte_dt_packing_fns.c   
>>>>>> (original)
>>>>>> +++ trunk/orte/runtime/data_type_support/orte_dt_packing_fns.c   
>>>>>> 2011-08-26 18:16:14 EDT (Fri, 26 Aug 2011)
>>>>>> @@ -58,7 +58,9 @@
>>>>>> orte_process_name_t* proc;
>>>>>> orte_jobid_t *jobid;
>>>>>> orte_vpid_t *vpid;
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>> orte_epoch_t *epoch;
>>>>>> +#endif
>>>>>> 
>>>>>> /* collect all the jobids in a contiguous array */
>>>>>> jobid = (orte_jobid_t*)malloc(num_vals * sizeof(orte_jobid_t));
>>>>>> @@ -100,6 +102,7 @@
>>>>>> }
>>>>>> free(vpid);
>>>>>> 
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>> /* Collect all the epochs in a contiguous array */
>>>>>> epoch = (orte_epoch_t *) malloc(num_vals * sizeof(orte_epoch_t));
>>>>>> if (NULL == epoch) {
>>>>>> @@ -118,6 +121,7 @@
>>>>>>    return rc;
>>>>>> }
>>>>>> free(epoch);
>>>>>> +#endif
>>>>>> 
>>>>>> return ORTE_SUCCESS;
>>>>>> }
>>>>>> @@ -156,6 +160,7 @@
>>>>>> return ret;
>>>>>> }
>>>>>> 
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>> /*
>>>>>> * EPOCH
>>>>>> */
>>>>>> @@ -171,6 +176,7 @@
>>>>>> 
>>>>>> return ret;
>>>>>> }
>>>>>> +#endif
>>>>>> 
>>>>>> #if !ORTE_DISABLE_FULL_SUPPORT
>>>>>> /*
>>>>>> 
>>>>>> Modified: trunk/orte/runtime/data_type_support/orte_dt_print_fns.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/runtime/data_type_support/orte_dt_print_fns.c     
>>>>>> (original)
>>>>>> +++ trunk/orte/runtime/data_type_support/orte_dt_print_fns.c     
>>>>>> 2011-08-26 18:16:14 EDT (Fri, 26 Aug 2011)
>>>>>> @@ -125,8 +125,10 @@
>>>>>>        orte_dt_quick_print(output, "ORTE_STD_CNTR", prefix, src, 
>>>>>> ORTE_STD_CNTR_T);
>>>>>>        break;
>>>>>> 
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>>    case ORTE_EPOCH:
>>>>>>        orte_dt_quick_print(output, "ORTE_EPOCH", prefix, src, 
>>>>>> ORTE_EPOCH_T);
>>>>>> +#endif
>>>>>> 
>>>>>>    case ORTE_VPID:
>>>>>>        orte_dt_quick_print(output, "ORTE_VPID", prefix, src, 
>>>>>> ORTE_VPID_T);
>>>>>> @@ -478,11 +480,21 @@
>>>>>> if (orte_xml_output) {
>>>>>>    /* need to create the output in XML format */
>>>>>>    if (0 == src->pid) {
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>>        asprintf(output, "%s<process rank=\"%s\" status=\"%s\" 
>>>>>> epoch=\"%s\"/>\n", pfx2,
>>>>>>                 ORTE_VPID_PRINT(src->name.vpid), 
>>>>>> orte_proc_state_to_str(src->state), ORTE_EPOCH_PRINT(src->name.epoch));
>>>>>> +#else
>>>>>> +            asprintf(output, "%s<process rank=\"%s\" 
>>>>>> status=\"%s\"/>\n", pfx2,
>>>>>> +                     ORTE_VPID_PRINT(src->name.vpid), 
>>>>>> orte_proc_state_to_str(src->state));
>>>>>> +#endif
>>>>>>    } else {
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>>        asprintf(output, "%s<process rank=\"%s\" pid=\"%d\" status=\"%s\" 
>>>>>> epoch=\"%s\"/>\n", pfx2,
>>>>>>                 ORTE_VPID_PRINT(src->name.vpid), (int)src->pid, 
>>>>>> orte_proc_state_to_str(src->state), ORTE_EPOCH_PRINT(src->name.epoch));
>>>>>> +#else
>>>>>> +            asprintf(output, "%s<process rank=\"%s\" pid=\"%d\" 
>>>>>> status=\"%s\"/>\n", pfx2,
>>>>>> +                     ORTE_VPID_PRINT(src->name.vpid), (int)src->pid, 
>>>>>> orte_proc_state_to_str(src->state));
>>>>>> +#endif
>>>>>>    }
>>>>>>    free(pfx2);
>>>>>>    return ORTE_SUCCESS;
>>>>>> @@ -490,10 +502,17 @@
>>>>>> 
>>>>>> if (!orte_devel_level_output) {
>>>>>>    /* just print a very simple output for users */
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>>    asprintf(&tmp, "\n%sProcess OMPI jobid: %s Process rank: %s Epoch: 
>>>>>> %s", pfx2,
>>>>>>             ORTE_JOBID_PRINT(src->name.jobid),
>>>>>>             ORTE_VPID_PRINT(src->name.vpid),
>>>>>>             ORTE_EPOCH_PRINT(src->name.epoch));
>>>>>> +#else
>>>>>> +        asprintf(&tmp, "\n%sProcess OMPI jobid: %s Process rank: %s 
>>>>>> Epoch: %s", pfx2,
>>>>>> +                 ORTE_JOBID_PRINT(src->name.jobid),
>>>>>> +                 ORTE_VPID_PRINT(src->name.vpid));
>>>>>> +#endif
>>>>>> +        
>>>>>>    /* set the return */
>>>>>>    *output = tmp;
>>>>>>    free(pfx2);
>>>>>> 
>>>>>> Modified: trunk/orte/runtime/data_type_support/orte_dt_size_fns.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/runtime/data_type_support/orte_dt_size_fns.c      
>>>>>> (original)
>>>>>> +++ trunk/orte/runtime/data_type_support/orte_dt_size_fns.c      
>>>>>> 2011-08-26 18:16:14 EDT (Fri, 26 Aug 2011)
>>>>>> @@ -45,9 +45,11 @@
>>>>>>        *size = sizeof(orte_std_cntr_t);
>>>>>>        break;
>>>>>> 
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>>    case ORTE_EPOCH:
>>>>>>        *size = sizeof(orte_epoch_t);
>>>>>>        break;
>>>>>> +#endif
>>>>>> 
>>>>>>    case ORTE_VPID:
>>>>>>        *size = sizeof(orte_vpid_t);
>>>>>> 
>>>>>> Modified: trunk/orte/runtime/data_type_support/orte_dt_support.h
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/runtime/data_type_support/orte_dt_support.h       
>>>>>> (original)
>>>>>> +++ trunk/orte/runtime/data_type_support/orte_dt_support.h       
>>>>>> 2011-08-26 18:16:14 EDT (Fri, 26 Aug 2011)
>>>>>> @@ -52,9 +52,14 @@
>>>>>> int orte_dt_compare_vpid(orte_vpid_t *value1,
>>>>>>                     orte_vpid_t *value2,
>>>>>>                     opal_data_type_t type);
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>> int orte_dt_compare_epoch(orte_epoch_t *value1,
>>>>>>                      orte_epoch_t *value2,
>>>>>>                      opal_data_type_t type);
>>>>>> +#define ORTE_EPOCH_CMP(n,m) ( (m) - (n) )
>>>>>> +#else
>>>>>> +#define ORTE_EPOCH_CMP(n,m) ( 0 )
>>>>>> +#endif
>>>>>> #if !ORTE_DISABLE_FULL_SUPPORT
>>>>>> int orte_dt_compare_job(orte_job_t *value1, orte_job_t *value2, 
>>>>>> opal_data_type_t type);
>>>>>> int orte_dt_compare_node(orte_node_t *value1, orte_node_t *value2, 
>>>>>> opal_data_type_t type);
>>>>>> @@ -86,7 +91,9 @@
>>>>>> int orte_dt_copy_name(orte_process_name_t **dest, orte_process_name_t 
>>>>>> *src, opal_data_type_t type);
>>>>>> int orte_dt_copy_jobid(orte_jobid_t **dest, orte_jobid_t *src, 
>>>>>> opal_data_type_t type);
>>>>>> int orte_dt_copy_vpid(orte_vpid_t **dest, orte_vpid_t *src, 
>>>>>> opal_data_type_t type);
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>> int orte_dt_copy_epoch(orte_epoch_t **dest, orte_epoch_t *src, 
>>>>>> opal_data_type_t type);
>>>>>> +#endif
>>>>>> #if !ORTE_DISABLE_FULL_SUPPORT
>>>>>> int orte_dt_copy_job(orte_job_t **dest, orte_job_t *src, 
>>>>>> opal_data_type_t type);
>>>>>> int orte_dt_copy_node(orte_node_t **dest, orte_node_t *src, 
>>>>>> opal_data_type_t type);
>>>>>> @@ -116,8 +123,10 @@
>>>>>>                   int32_t num_vals, opal_data_type_t type);
>>>>>> int orte_dt_pack_vpid(opal_buffer_t *buffer, const void *src,
>>>>>>                  int32_t num_vals, opal_data_type_t type);
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>> int orte_dt_pack_epoch(opal_buffer_t *buffer, const void *src,
>>>>>>                  int32_t num_vals, opal_data_type_t type);
>>>>>> +#endif
>>>>>> #if !ORTE_DISABLE_FULL_SUPPORT
>>>>>> int orte_dt_pack_job(opal_buffer_t *buffer, const void *src,
>>>>>>                 int32_t num_vals, opal_data_type_t type);
>>>>>> @@ -185,8 +194,10 @@
>>>>>>                     int32_t *num_vals, opal_data_type_t type);
>>>>>> int orte_dt_unpack_vpid(opal_buffer_t *buffer, void *dest,
>>>>>>                    int32_t *num_vals, opal_data_type_t type);
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>> int orte_dt_unpack_epoch(opal_buffer_t *buffer, void *dest,
>>>>>>                    int32_t *num_vals, opal_data_type_t type);
>>>>>> +#endif
>>>>>> #if !ORTE_DISABLE_FULL_SUPPORT
>>>>>> int orte_dt_unpack_job(opal_buffer_t *buffer, void *dest,
>>>>>>                   int32_t *num_vals, opal_data_type_t type);
>>>>>> 
>>>>>> Modified: trunk/orte/runtime/data_type_support/orte_dt_unpacking_fns.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/runtime/data_type_support/orte_dt_unpacking_fns.c 
>>>>>> (original)
>>>>>> +++ trunk/orte/runtime/data_type_support/orte_dt_unpacking_fns.c 
>>>>>> 2011-08-26 18:16:14 EDT (Fri, 26 Aug 2011)
>>>>>> @@ -54,7 +54,9 @@
>>>>>> orte_process_name_t* proc;
>>>>>> orte_jobid_t *jobid;
>>>>>> orte_vpid_t *vpid;
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>> orte_epoch_t *epoch;
>>>>>> +#endif
>>>>>> 
>>>>>> num = *num_vals;
>>>>>> 
>>>>>> @@ -92,6 +94,7 @@
>>>>>>    return rc;
>>>>>> }
>>>>>> 
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>> /* collect all the epochs in a contiguous array */
>>>>>> epoch= (orte_epoch_t*)malloc(num * sizeof(orte_epoch_t));
>>>>>> if (NULL == epoch) {
>>>>>> @@ -109,18 +112,21 @@
>>>>>>    free(jobid);
>>>>>>    return rc;
>>>>>> }
>>>>>> +#endif
>>>>>> 
>>>>>> /* build the names from the jobid/vpid/epoch arrays */
>>>>>> proc = (orte_process_name_t*)dest;
>>>>>> for (i=0; i < num; i++) {
>>>>>>    proc->jobid = jobid[i];
>>>>>>    proc->vpid = vpid[i];
>>>>>> -        proc->epoch = epoch[i];
>>>>>> +        ORTE_EPOCH_SET(proc->epoch,epoch[i]);
>>>>>>    proc++;
>>>>>> }
>>>>>> 
>>>>>> /* cleanup */
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>> free(epoch);
>>>>>> +#endif
>>>>>> free(vpid);
>>>>>> free(jobid);
>>>>>> 
>>>>>> @@ -159,6 +165,7 @@
>>>>>> return ret;
>>>>>> }
>>>>>> 
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>> /*
>>>>>> * EPOCH 
>>>>>> */
>>>>>> @@ -174,6 +181,7 @@
>>>>>> 
>>>>>> return ret;
>>>>>> }
>>>>>> +#endif
>>>>>> 
>>>>>> #if !ORTE_DISABLE_FULL_SUPPORT
>>>>>> /*
>>>>>> 
>>>>>> Modified: trunk/orte/runtime/orte_data_server.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/runtime/orte_data_server.c        (original)
>>>>>> +++ trunk/orte/runtime/orte_data_server.c        2011-08-26 18:16:14 EDT 
>>>>>> (Fri, 26 Aug 2011)
>>>>>> @@ -220,7 +220,7 @@
>>>>>>        data->port = port_name;
>>>>>>        data->owner.jobid = sender->jobid;
>>>>>>        data->owner.vpid = sender->vpid;
>>>>>> -            data->owner.epoch = sender->epoch;
>>>>>> +            ORTE_EPOCH_SET(data->owner.epoch,sender->epoch);
>>>>>> 
>>>>>>        /* store the data */
>>>>>>        data->index = opal_pointer_array_add(orte_data_server_store, 
>>>>>> data);
>>>>>> 
>>>>>> Modified: trunk/orte/runtime/orte_globals.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/runtime/orte_globals.c    (original)
>>>>>> +++ trunk/orte/runtime/orte_globals.c    2011-08-26 18:16:14 EDT (Fri, 
>>>>>> 26 Aug 2011)
>>>>>> @@ -277,6 +277,7 @@
>>>>>>    return rc;
>>>>>> }
>>>>>> 
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>> tmp = ORTE_EPOCH;
>>>>>> if (ORTE_SUCCESS != (rc = opal_dss.register_type(orte_dt_pack_epoch,
>>>>>>                                                 orte_dt_unpack_epoch,
>>>>>> @@ -290,6 +291,7 @@
>>>>>>    ORTE_ERROR_LOG(rc);
>>>>>>    return rc;
>>>>>> }
>>>>>> +#endif
>>>>>> 
>>>>>> #if !ORTE_DISABLE_FULL_SUPPORT
>>>>>> tmp = ORTE_JOB;
>>>>>> @@ -933,7 +935,7 @@
>>>>>> proc->beat = 0;
>>>>>> OBJ_CONSTRUCT(&proc->stats, opal_ring_buffer_t);
>>>>>> opal_ring_buffer_init(&proc->stats, orte_stat_history_size);
>>>>>> -    proc->name.epoch = ORTE_EPOCH_MIN;
>>>>>> +    ORTE_EPOCH_SET(proc->name.epoch,ORTE_EPOCH_MIN);
>>>>>> #if OPAL_ENABLE_FT_CR == 1
>>>>>> proc->ckpt_state = 0;
>>>>>> proc->ckpt_snapshot_ref = NULL;
>>>>>> 
>>>>>> Modified: trunk/orte/runtime/orte_init.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/runtime/orte_init.c       (original)
>>>>>> +++ trunk/orte/runtime/orte_init.c       2011-08-26 18:16:14 EDT (Fri, 
>>>>>> 26 Aug 2011)
>>>>>> @@ -57,8 +57,17 @@
>>>>>> char *orte_prohibited_session_dirs = NULL;
>>>>>> bool orte_create_session_dirs = true;
>>>>>> 
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>> +orte_process_name_t orte_name_wildcard = {ORTE_JOBID_WILDCARD, 
>>>>>> ORTE_VPID_WILDCARD, ORTE_EPOCH_WILDCARD};
>>>>>> +#else
>>>>>> orte_process_name_t orte_name_wildcard = {ORTE_JOBID_WILDCARD, 
>>>>>> ORTE_VPID_WILDCARD};
>>>>>> +#endif
>>>>>> +
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>> +orte_process_name_t orte_name_invalid = {ORTE_JOBID_INVALID, 
>>>>>> ORTE_VPID_INVALID, ORTE_EPOCH_INVALID}; 
>>>>>> +#else
>>>>>> orte_process_name_t orte_name_invalid = {ORTE_JOBID_INVALID, 
>>>>>> ORTE_VPID_INVALID}; 
>>>>>> +#endif
>>>>>> 
>>>>>> 
>>>>>> #if OPAL_CC_USE_PRAGMA_IDENT
>>>>>> 
>>>>>> Modified: trunk/orte/runtime/orte_wait.h
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/runtime/orte_wait.h       (original)
>>>>>> +++ trunk/orte/runtime/orte_wait.h       2011-08-26 18:16:14 EDT (Fri, 
>>>>>> 26 Aug 2011)
>>>>>> @@ -204,7 +204,7 @@
>>>>>>    mev = OBJ_NEW(orte_message_event_t);                    \
>>>>>>    mev->sender.jobid = (sndr)->jobid;                      \
>>>>>>    mev->sender.vpid = (sndr)->vpid;                        \
>>>>>> -        mev->sender.epoch = (sndr)->epoch;                      \
>>>>>> +        ORTE_EPOCH_SET(mev->sender.epoch,(sndr)->epoch);        \
>>>>>>    opal_dss.copy_payload(mev->buffer, (buf));              \
>>>>>>    mev->tag = (tg);                                        \
>>>>>>    mev->file = strdup((buf)->parent.cls_init_file_name);   \
>>>>>> @@ -228,7 +228,7 @@
>>>>>>    mev = OBJ_NEW(orte_message_event_t);                    \
>>>>>>    mev->sender.jobid = (sndr)->jobid;                      \
>>>>>>    mev->sender.vpid = (sndr)->vpid;                        \
>>>>>> -        mev->sender.epoch = (sndr)->epoch;                      \
>>>>>> +        ORTE_EPOCH_SET(mev->sender.epoch,(sndr)->epoch);        \
>>>>>>    opal_dss.copy_payload(mev->buffer, (buf));              \
>>>>>>    mev->tag = (tg);                                        \
>>>>>>    opal_event_evtimer_set(opal_event_base,                 \
>>>>>> @@ -258,7 +258,7 @@
>>>>>>    tmp = OBJ_NEW(orte_notify_event_t);                     \
>>>>>>    tmp->proc.jobid = (data)->jobid;                        \
>>>>>>    tmp->proc.vpid = (data)->vpid;                          \
>>>>>> -        tmp->proc.epoch = (data)->epoch;                        \
>>>>>> +        ORTE_EPOCH_SET(tmp->proc.epoch,(data)->epoch);          \
>>>>>>    opal_event.evtimer_set(opal_event_base,                 \
>>>>>>                           tmp->ev, (cbfunc), tmp);         \
>>>>>>    now.tv_sec = 0;                                         \
>>>>>> 
>>>>>> Modified: trunk/orte/test/system/oob_stress.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/test/system/oob_stress.c  (original)
>>>>>> +++ trunk/orte/test/system/oob_stress.c  2011-08-26 18:16:14 EDT (Fri, 
>>>>>> 26 Aug 2011)
>>>>>> @@ -74,8 +74,7 @@
>>>>>> 
>>>>>> for (j=1; j < count+1; j++) {
>>>>>>    peer.vpid = (ORTE_PROC_MY_NAME->vpid + j) % 
>>>>>> orte_process_info.num_procs;
>>>>>> -        peer.epoch = ORTE_EPOCH_INVALID;
>>>>>> -        peer.epoch = orte_ess.proc_get_epoch(&peer);
>>>>>> +        ORTE_EPOCH_SET(peer.epoch,orte_ess.proc_get_epoch(&peer));
>>>>>> 
>>>>>>    /* rank0 starts ring */
>>>>>>    if (ORTE_PROC_MY_NAME->vpid == 0) {
>>>>>> 
>>>>>> Modified: trunk/orte/test/system/orte_ring.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/test/system/orte_ring.c   (original)
>>>>>> +++ trunk/orte/test/system/orte_ring.c   2011-08-26 18:16:14 EDT (Fri, 
>>>>>> 26 Aug 2011)
>>>>>> @@ -41,16 +41,14 @@
>>>>>> if( right_peer_orte_name.vpid >= num_peers ) {
>>>>>>    right_peer_orte_name.vpid = 0;
>>>>>> }
>>>>>> -    right_peer_orte_name.epoch = ORTE_EPOCH_INVALID;
>>>>>> -    right_peer_orte_name.epoch = 
>>>>>> orte_ess.proc_get_epoch(&right_peer_orte_name);
>>>>>> +    
>>>>>> ORTE_EPOCH_SET(right_peer_orte_name.epoch,orte_ess.proc_get_epoch(&right_peer_orte_name));
>>>>>> 
>>>>>> left_peer_orte_name.jobid  = ORTE_PROC_MY_NAME->jobid;
>>>>>> left_peer_orte_name.vpid   = ORTE_PROC_MY_NAME->vpid - 1;
>>>>>> if( ORTE_PROC_MY_NAME->vpid == 0 ) {
>>>>>>    left_peer_orte_name.vpid = num_peers - 1;
>>>>>> }
>>>>>> -    left_peer_orte_name.epoch = ORTE_EPOCH_INVALID;
>>>>>> -    left_peer_orte_name.epoch = 
>>>>>> orte_ess.proc_get_epoch(&left_peer_orte_name);
>>>>>> +    
>>>>>> ORTE_EPOCH_SET(left_peer_orte_name.epoch,orte_ess.proc_get_epoch(&left_peer_orte_name));
>>>>>> 
>>>>>> printf("My name is: %s -- PID %d\tMy Left Peer is %s\tMy Right Peer is 
>>>>>> %s\n",
>>>>>>       ORTE_NAME_PRINT(ORTE_PROC_MY_NAME), getpid(),
>>>>>> 
>>>>>> Modified: trunk/orte/test/system/orte_spawn.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/test/system/orte_spawn.c  (original)
>>>>>> +++ trunk/orte/test/system/orte_spawn.c  2011-08-26 18:16:14 EDT (Fri, 
>>>>>> 26 Aug 2011)
>>>>>> @@ -74,8 +74,8 @@
>>>>>> for (i=0; i < app->num_procs; i++) {
>>>>>>    name.vpid = i;
>>>>>> 
>>>>>> -        name.epoch = ORTE_EPOCH_INVALID;
>>>>>> -        name.epoch = orte_ess.proc_get_epoch(&name);
>>>>>> +        ORTE_EPOCH_SET(name.epoch,orte_ess.proc_get_epoch(&name));
>>>>>> +
>>>>>>    fprintf(stderr, "Parent: sending message to child %s\n", 
>>>>>> ORTE_NAME_PRINT(&name));
>>>>>>    if (0 > (rc = orte_rml.send(&name, &msg, 1, MY_TAG, 0))) {
>>>>>>        ORTE_ERROR_LOG(rc);
>>>>>> 
>>>>>> Modified: trunk/orte/tools/orte-ps/orte-ps.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/tools/orte-ps/orte-ps.c   (original)
>>>>>> +++ trunk/orte/tools/orte-ps/orte-ps.c   2011-08-26 18:16:14 EDT (Fri, 
>>>>>> 26 Aug 2011)
>>>>>> @@ -869,8 +869,14 @@
>>>>>>    }
>>>>>> 
>>>>>>    /* query the HNP for info on the procs in this job */
>>>>>> -        if (ORTE_SUCCESS != (ret = 
>>>>>> orte_util_comm_query_proc_info(&(hnpinfo->hnp->name), job->jobid,
>>>>>> -                                                                  
>>>>>> ORTE_VPID_WILDCARD, ORTE_EPOCH_WILDCARD, &cnt, &procs))) {
>>>>>> +        if (ORTE_SUCCESS != (ret = 
>>>>>> orte_util_comm_query_proc_info(&(hnpinfo->hnp->name), 
>>>>>> +                                                                  
>>>>>> job->jobid,
>>>>>> +                                                                  
>>>>>> ORTE_VPID_WILDCARD, 
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>> +                                                                  
>>>>>> ORTE_EPOCH_WILDCARD, 
>>>>>> +#endif
>>>>>> +                                                                  &cnt, 
>>>>>> +                                                                  
>>>>>> &procs))) {
>>>>>>        ORTE_ERROR_LOG(ret);
>>>>>>    }
>>>>>>    job->procs->addr = (void**)procs;
>>>>>> 
>>>>>> Modified: trunk/orte/tools/orte-top/orte-top.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/tools/orte-top/orte-top.c (original)
>>>>>> +++ trunk/orte/tools/orte-top/orte-top.c 2011-08-26 18:16:14 EDT (Fri, 
>>>>>> 26 Aug 2011)
>>>>>> @@ -471,7 +471,7 @@
>>>>>> if (NULL == ranks) {
>>>>>>    /* take all ranks */
>>>>>>    proc.vpid = ORTE_VPID_WILDCARD;
>>>>>> -        proc.epoch = ORTE_EPOCH_WILDCARD;
>>>>>> +        ORTE_EPOCH_SET(proc.epoch,ORTE_EPOCH_WILDCARD);
>>>>>>    if (ORTE_SUCCESS != (ret = opal_dss.pack(&cmdbuf, &proc, 1, 
>>>>>> ORTE_NAME))) {
>>>>>>        ORTE_ERROR_LOG(ret);
>>>>>>        goto cleanup;
>>>>>> 
>>>>>> Modified: trunk/orte/util/comm/comm.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/util/comm/comm.c  (original)
>>>>>> +++ trunk/orte/util/comm/comm.c  2011-08-26 18:16:14 EDT (Fri, 26 Aug 
>>>>>> 2011)
>>>>>> @@ -433,8 +433,13 @@
>>>>>> return ORTE_SUCCESS;
>>>>>> }
>>>>>> 
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>> int orte_util_comm_query_proc_info(const orte_process_name_t *hnp, 
>>>>>> orte_jobid_t job, orte_vpid_t vpid,
>>>>>>                               orte_epoch_t epoch, int *num_procs, 
>>>>>> orte_proc_t ***proc_info_array)
>>>>>> +#else
>>>>>> +int orte_util_comm_query_proc_info(const orte_process_name_t *hnp, 
>>>>>> orte_jobid_t job, orte_vpid_t vpid,
>>>>>> +                                   int *num_procs, orte_proc_t 
>>>>>> ***proc_info_array)
>>>>>> +#endif
>>>>>> {
>>>>>> int ret;
>>>>>> int32_t cnt, cnt_procs, n;
>>>>>> @@ -463,11 +468,13 @@
>>>>>>    OBJ_RELEASE(cmd);
>>>>>>    return ret;
>>>>>> }
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>> if (ORTE_SUCCESS != (ret = opal_dss.pack(cmd, &epoch, 1, ORTE_EPOCH))) {
>>>>>>    ORTE_ERROR_LOG(ret);
>>>>>>    OBJ_RELEASE(cmd);
>>>>>>    return ret;
>>>>>> }
>>>>>> +#endif
>>>>>> /* define a max time to wait for send to complete */
>>>>>> timer_fired = false;
>>>>>> error_exit = ORTE_SUCCESS;
>>>>>> 
>>>>>> Modified: trunk/orte/util/comm/comm.h
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/util/comm/comm.h  (original)
>>>>>> +++ trunk/orte/util/comm/comm.h  2011-08-26 18:16:14 EDT (Fri, 26 Aug 
>>>>>> 2011)
>>>>>> @@ -52,7 +52,10 @@
>>>>>>                                             int *num_nodes, orte_node_t 
>>>>>> ***node_info_array);
>>>>>> 
>>>>>> ORTE_DECLSPEC int orte_util_comm_query_proc_info(const 
>>>>>> orte_process_name_t *hnp, orte_jobid_t job, orte_vpid_t vpid,
>>>>>> -                                                 orte_epoch_t epoch, 
>>>>>> int *num_procs, orte_proc_t ***proc_info_array);
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>> +                                                 orte_epoch_t epoch, 
>>>>>> +#endif
>>>>>> +                                                 int *num_procs, 
>>>>>> orte_proc_t ***proc_info_array);
>>>>>> 
>>>>>> ORTE_DECLSPEC int orte_util_comm_spawn_job(const orte_process_name_t 
>>>>>> *hnp, orte_job_t *jdata);
>>>>>> 
>>>>>> 
>>>>>> Modified: trunk/orte/util/hnp_contact.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/util/hnp_contact.c        (original)
>>>>>> +++ trunk/orte/util/hnp_contact.c        2011-08-26 18:16:14 EDT (Fri, 
>>>>>> 26 Aug 2011)
>>>>>> @@ -55,7 +55,8 @@
>>>>>> {
>>>>>> ptr->name.jobid = ORTE_JOBID_INVALID;
>>>>>> ptr->name.vpid = ORTE_VPID_INVALID;
>>>>>> -    ptr->name.epoch = ORTE_EPOCH_MIN;
>>>>>> +    ORTE_EPOCH_SET(ptr->name.epoch,ORTE_EPOCH_MIN);
>>>>>> +
>>>>>> ptr->rml_uri = NULL;
>>>>>> }
>>>>>> static void orte_hnp_contact_destruct(orte_hnp_contact_t *ptr)
>>>>>> 
>>>>>> Modified: trunk/orte/util/name_fns.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/util/name_fns.c   (original)
>>>>>> +++ trunk/orte/util/name_fns.c   2011-08-26 18:16:14 EDT (Fri, 26 Aug 
>>>>>> 2011)
>>>>>> @@ -46,7 +46,7 @@
>>>>>> {
>>>>>> list->name.jobid = ORTE_JOBID_INVALID;
>>>>>> list->name.vpid = ORTE_VPID_INVALID;
>>>>>> -    list->name.epoch = ORTE_EPOCH_MIN;
>>>>>> +    ORTE_EPOCH_SET(list->name.epoch,ORTE_EPOCH_MIN);
>>>>>> }
>>>>>> 
>>>>>> /* destructor - used to free any resources held by instance */
>>>>>> @@ -116,7 +116,10 @@
>>>>>> char* orte_util_print_name_args(const orte_process_name_t *name)
>>>>>> {
>>>>>> orte_print_args_buffers_t *ptr;
>>>>>> -    char *job, *vpid, *epoch;
>>>>>> +    char *job, *vpid; 
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>> +    char *epoch;
>>>>>> +#endif
>>>>>> 
>>>>>> /* protect against NULL names */
>>>>>> if (NULL == name) {
>>>>>> @@ -141,7 +144,7 @@
>>>>>> */
>>>>>> job = orte_util_print_jobids(name->jobid);
>>>>>> vpid = orte_util_print_vpids(name->vpid);
>>>>>> -    epoch = orte_util_print_epoch(name->epoch);
>>>>>> +    ORTE_EPOCH_SET(epoch,orte_util_print_epoch(name->epoch));
>>>>>> 
>>>>>> /* get the next buffer */
>>>>>> ptr = get_print_name_buffer();
>>>>>> @@ -156,9 +159,15 @@
>>>>>>    ptr->cntr = 0;
>>>>>> }
>>>>>> 
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>> snprintf(ptr->buffers[ptr->cntr++], 
>>>>>>         ORTE_PRINT_NAME_ARGS_MAX_SIZE, 
>>>>>>         "[%s,%s,%s]", job, vpid, epoch);
>>>>>> +#else
>>>>>> +    snprintf(ptr->buffers[ptr->cntr++], 
>>>>>> +             ORTE_PRINT_NAME_ARGS_MAX_SIZE, 
>>>>>> +             "[%s,%s]", job, vpid);
>>>>>> +#endif
>>>>>> 
>>>>>> return ptr->buffers[ptr->cntr-1];
>>>>>> }
>>>>>> @@ -282,6 +291,7 @@
>>>>>> return ptr->buffers[ptr->cntr-1];
>>>>>> }
>>>>>> 
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>> char* orte_util_print_epoch(const orte_epoch_t epoch)
>>>>>> {
>>>>>> orte_print_args_buffers_t *ptr;
>>>>>> @@ -309,6 +319,7 @@
>>>>>> }
>>>>>> return ptr->buffers[ptr->cntr-1];
>>>>>> }
>>>>>> +#endif
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> @@ -403,6 +414,7 @@
>>>>>> return ORTE_SUCCESS;
>>>>>> }
>>>>>> 
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>> int orte_util_convert_epoch_to_string(char **epoch_string, const 
>>>>>> orte_epoch_t epoch)
>>>>>> {
>>>>>> /* check for wildcard value - handle appropriately */
>>>>>> @@ -425,7 +437,6 @@
>>>>>> return ORTE_SUCCESS;
>>>>>> }
>>>>>> 
>>>>>> -
>>>>>> int orte_util_convert_string_to_epoch(orte_epoch_t *epoch, const char* 
>>>>>> epoch_string)
>>>>>> {
>>>>>> if (NULL == epoch_string) {  /* got an error */
>>>>>> @@ -450,6 +461,7 @@
>>>>>> 
>>>>>> return ORTE_SUCCESS;
>>>>>> }
>>>>>> +#endif
>>>>>> 
>>>>>> int orte_util_convert_string_to_process_name(orte_process_name_t *name,
>>>>>>                                         const char* name_string)
>>>>>> @@ -457,13 +469,15 @@
>>>>>> char *temp, *token;
>>>>>> orte_jobid_t job;
>>>>>> orte_vpid_t vpid;
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>> orte_epoch_t epoch;
>>>>>> +#endif
>>>>>> int return_code=ORTE_SUCCESS;
>>>>>> -    
>>>>>> +
>>>>>> /* set default */
>>>>>> name->jobid = ORTE_JOBID_INVALID;
>>>>>> name->vpid = ORTE_VPID_INVALID;
>>>>>> -    name->epoch = ORTE_EPOCH_MIN;
>>>>>> +    ORTE_EPOCH_SET(name->epoch,ORTE_EPOCH_MIN);
>>>>>> 
>>>>>> /* check for NULL string - error */
>>>>>> if (NULL == name_string) {
>>>>>> @@ -510,6 +524,7 @@
>>>>>>    vpid = strtoul(token, NULL, 10);
>>>>>> }
>>>>>> 
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>> token = strtok(NULL, ORTE_SCHEMA_DELIMITER_STRING);  /** get next field 
>>>>>> -> epoch*/
>>>>>> 
>>>>>> /* check for error */
>>>>>> @@ -528,10 +543,11 @@
>>>>>> } else {
>>>>>>    epoch = strtoul(token, NULL, 10);
>>>>>> }
>>>>>> +#endif
>>>>>> 
>>>>>> name->jobid = job;
>>>>>> name->vpid = vpid;
>>>>>> -    name->epoch = epoch;
>>>>>> +    ORTE_EPOCH_SET(name->epoch,epoch);
>>>>>> 
>>>>>> free(temp);
>>>>>> 
>>>>>> @@ -568,6 +584,7 @@
>>>>>>    asprintf(&tmp2, "%s%c%lu", tmp, ORTE_SCHEMA_DELIMITER_CHAR, (unsigned 
>>>>>> long)name->vpid);
>>>>>> }
>>>>>> 
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>> if (ORTE_EPOCH_WILDCARD == name->epoch) {
>>>>>>    asprintf(name_string, "%s%c%s", tmp2, ORTE_SCHEMA_DELIMITER_CHAR, 
>>>>>> ORTE_SCHEMA_WILDCARD_STRING);
>>>>>> } else if (ORTE_EPOCH_INVALID == name->epoch) {
>>>>>> @@ -575,6 +592,10 @@
>>>>>> } else {
>>>>>>    asprintf(name_string, "%s%c%lu", tmp2, ORTE_SCHEMA_DELIMITER_CHAR, 
>>>>>> (unsigned long)name->epoch);
>>>>>> }
>>>>>> +#else
>>>>>> +    asprintf(name_string, "%s", tmp2);
>>>>>> +#endif
>>>>>> +
>>>>>> free(tmp);
>>>>>> free(tmp2);
>>>>>> 
>>>>>> @@ -585,8 +606,11 @@
>>>>>> /****    CREATE PROCESS NAME    ****/
>>>>>> int orte_util_create_process_name(orte_process_name_t **name,
>>>>>>                              orte_jobid_t job,
>>>>>> -                                  orte_vpid_t vpid,
>>>>>> -                                  orte_epoch_t epoch)
>>>>>> +                                  orte_vpid_t vpid
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>> +                                  ,orte_epoch_t epoch
>>>>>> +#endif
>>>>>> +                                  )
>>>>>> {
>>>>>> *name = NULL;
>>>>>> 
>>>>>> @@ -598,7 +622,8 @@
>>>>>> 
>>>>>> (*name)->jobid = job;
>>>>>> (*name)->vpid = vpid;
>>>>>> -    (*name)->epoch = epoch;
>>>>>> +    ORTE_EPOCH_SET((*name)->epoch,epoch);
>>>>>> +
>>>>>> return ORTE_SUCCESS;
>>>>>> }
>>>>>> 
>>>>>> @@ -655,6 +680,7 @@
>>>>>>    }
>>>>>> }
>>>>>> 
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>> /* Get here if jobid's and vpid's are equal, or not being checked.
>>>>>> * Now check epoch.
>>>>>> */
>>>>>> @@ -666,6 +692,7 @@
>>>>>>        return OPAL_VALUE1_GREATER;
>>>>>>    }
>>>>>> }
>>>>>> +#endif
>>>>>> 
>>>>>> /* only way to get here is if all fields are being checked and are equal,
>>>>>> * or jobid not checked, but vpid equal,
>>>>>> 
>>>>>> Modified: trunk/orte/util/name_fns.h
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/util/name_fns.h   (original)
>>>>>> +++ trunk/orte/util/name_fns.h   2011-08-26 18:16:14 EDT (Fri, 26 Aug 
>>>>>> 2011)
>>>>>> @@ -61,9 +61,13 @@
>>>>>> #define ORTE_VPID_PRINT(n) \
>>>>>> orte_util_print_vpids(n)
>>>>>> 
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>> ORTE_DECLSPEC char* orte_util_print_epoch(const orte_epoch_t epoch);
>>>>>> #define ORTE_EPOCH_PRINT(n) \
>>>>>> orte_util_print_epoch(n)
>>>>>> +#else
>>>>>> +#define ORTE_EPOCH_PRINT(n)
>>>>>> +#endif
>>>>>> 
>>>>>> ORTE_DECLSPEC char* orte_util_print_job_family(const orte_jobid_t job);
>>>>>> #define ORTE_JOB_FAMILY_PRINT(n) \
>>>>>> @@ -104,6 +108,24 @@
>>>>>> #define ORTE_JOBID_IS_DAEMON(n)  \
>>>>>> !((n) & 0x0000ffff)
>>>>>> 
>>>>>> +/* Macro for getting the epoch out of the process name */
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>> +#define ORTE_EPOCH_GET(n) \
>>>>>> +    ((n)->epoch)
>>>>>> +#else
>>>>>> +#define ORTE_EPOCH_GET(n)
>>>>>> +#endif
>>>>>> +
>>>>>> +/* Macro for setting the epoch in the process name */
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>> +#define ORTE_EPOCH_SET(n,m) \
>>>>>> +    ( (n) = (m) )
>>>>>> +#else
>>>>>> +#define ORTE_EPOCH_SET(n,m) \
>>>>>> +    do {                    \
>>>>>> +    } while(0);
>>>>>> +#endif
>>>>>> +
>>>>>> /* List of names for general use */
>>>>>> struct orte_namelist_t {
>>>>>> opal_list_item_t item;      /**< Allows this item to be placed on a list 
>>>>>> */
>>>>>> @@ -117,16 +139,24 @@
>>>>>> ORTE_DECLSPEC int orte_util_convert_string_to_jobid(orte_jobid_t *jobid, 
>>>>>> const char* jobidstring);
>>>>>> ORTE_DECLSPEC int orte_util_convert_vpid_to_string(char **vpid_string, 
>>>>>> const orte_vpid_t vpid);
>>>>>> ORTE_DECLSPEC int orte_util_convert_string_to_vpid(orte_vpid_t *vpid, 
>>>>>> const char* vpidstring);
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>> ORTE_DECLSPEC int orte_util_convert_epoch_to_string(char **epoch_string, 
>>>>>> const orte_epoch_t epoch);
>>>>>> ORTE_DECLSPEC int orte_util_convert_string_to_epoch(orte_vpid_t *epoch, 
>>>>>> const char* epochstring);
>>>>>> +#endif
>>>>>> ORTE_DECLSPEC int 
>>>>>> orte_util_convert_string_to_process_name(orte_process_name_t *name,
>>>>>>                                         const char* name_string);
>>>>>> ORTE_DECLSPEC int orte_util_convert_process_name_to_string(char** 
>>>>>> name_string,
>>>>>>                                         const orte_process_name_t *name);
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>> ORTE_DECLSPEC int orte_util_create_process_name(orte_process_name_t 
>>>>>> **name,
>>>>>>                              orte_jobid_t job,
>>>>>>                              orte_vpid_t vpid,
>>>>>>                              orte_epoch_t epoch);
>>>>>> +#else
>>>>>> +ORTE_DECLSPEC int orte_util_create_process_name(orte_process_name_t 
>>>>>> **name,
>>>>>> +                                  orte_jobid_t job,
>>>>>> +                                  orte_vpid_t vpid);
>>>>>> +#endif
>>>>>> ORTE_DECLSPEC int orte_util_compare_name_fields(orte_ns_cmp_bitmask_t 
>>>>>> fields,
>>>>>>                              const orte_process_name_t* name1,
>>>>>>                              const orte_process_name_t* name2);
>>>>>> 
>>>>>> Modified: trunk/orte/util/nidmap.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/util/nidmap.c     (original)
>>>>>> +++ trunk/orte/util/nidmap.c     2011-08-26 18:16:14 EDT (Fri, 26 Aug 
>>>>>> 2011)
>>>>>> @@ -249,7 +249,7 @@
>>>>>>     */
>>>>>>    /* construct the URI */
>>>>>>    proc.vpid = node->daemon;
>>>>>> -        proc.epoch = ORTE_EPOCH_MIN;
>>>>>> +        ORTE_EPOCH_SET(proc.epoch,ORTE_EPOCH_MIN);
>>>>>> 
>>>>>>    orte_util_convert_process_name_to_string(&proc_name, &proc);
>>>>>>    asprintf(&uri, "%s;tcp://%s:%d", proc_name, addr, 
>>>>>> (int)orte_process_info.my_port);
>>>>>> @@ -1001,6 +1001,7 @@
>>>>>> }
>>>>>> #endif
>>>>>> 
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>> /* Look up the current epoch value that we have stored locally.
>>>>>> *
>>>>>> * Note that this will not ping the HNP to get the most up to date epoch 
>>>>>> stored
>>>>>> @@ -1023,7 +1024,9 @@
>>>>>> /*print_orte_job_data();*/
>>>>>> return e;
>>>>>> }
>>>>>> +#endif
>>>>>> 
>>>>>> +#if ORTE_RESIL_ORTE
>>>>>> bool orte_util_proc_is_running(orte_process_name_t *proc) {
>>>>>> int i;
>>>>>> unsigned int j;
>>>>>> @@ -1078,7 +1081,9 @@
>>>>>> 
>>>>>> return ORTE_ERROR;
>>>>>> }
>>>>>> +#endif
>>>>>> 
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>> /*
>>>>>> * This function performs both the get and set operations on the epoch 
>>>>>> for a
>>>>>> * sepcific process name. If the epoch passed into the function is
>>>>>> @@ -1091,6 +1096,11 @@
>>>>>> orte_job_t *jdata;
>>>>>> orte_proc_t *pdata;
>>>>>> 
>>>>>> +    if (ORTE_JOBID_INVALID == proc->jobid || 
>>>>>> +        ORTE_VPID_INVALID  == proc->vpid) {
>>>>>> +        return ORTE_EPOCH_INVALID;
>>>>>> +    }
>>>>>> +
>>>>>> /* Sanity check just to make sure we don't overwrite our existing
>>>>>> * orte_job_data.
>>>>>> */
>>>>>> @@ -1165,4 +1175,5 @@
>>>>>>    return ORTE_EPOCH_MIN;
>>>>>> }
>>>>>> }
>>>>>> +#endif
>>>>>> 
>>>>>> 
>>>>>> Modified: trunk/orte/util/nidmap.h
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/util/nidmap.h     (original)
>>>>>> +++ trunk/orte/util/nidmap.h     2011-08-26 18:16:14 EDT (Fri, 26 Aug 
>>>>>> 2011)
>>>>>> @@ -48,11 +48,19 @@
>>>>>> ORTE_DECLSPEC orte_pmap_t* orte_util_lookup_pmap(orte_process_name_t 
>>>>>> *proc);
>>>>>> ORTE_DECLSPEC orte_nid_t* orte_util_lookup_nid(orte_process_name_t 
>>>>>> *proc);
>>>>>> 
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>> ORTE_DECLSPEC orte_epoch_t orte_util_lookup_epoch(orte_process_name_t 
>>>>>> *proc);
>>>>>> ORTE_DECLSPEC orte_epoch_t orte_util_set_epoch(orte_process_name_t 
>>>>>> *proc, orte_epoch_t epoch);
>>>>>> +#endif
>>>>>> 
>>>>>> ORTE_DECLSPEC int orte_util_set_proc_state(orte_process_name_t *proc, 
>>>>>> orte_proc_state_t state);
>>>>>> +
>>>>>> +#if ORTE_RESIL_ORTE
>>>>>> +#define PROC_IS_RUNNING(n) orte_util_proc_is_running(n)
>>>>>> ORTE_DECLSPEC bool orte_util_proc_is_running(orte_process_name_t *proc);
>>>>>> +#else
>>>>>> +#define PROC_IS_RUNNING(n) ( true )
>>>>>> +#endif
>>>>>> 
>>>>>> ORTE_DECLSPEC int orte_util_encode_nodemap(opal_byte_object_t *boptr);
>>>>>> ORTE_DECLSPEC int orte_util_decode_nodemap(opal_byte_object_t *boptr);
>>>>>> @@ -72,5 +80,8 @@
>>>>>> END_C_DECLS
>>>>>> 
>>>>>> /* Local functions */
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>> orte_epoch_t get_epoch_from_orte_job_data(orte_process_name_t *proc, 
>>>>>> orte_epoch_t epoch);
>>>>>> #endif
>>>>>> +
>>>>>> +#endif
>>>>>> 
>>>>>> Modified: trunk/orte/util/proc_info.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/util/proc_info.c  (original)
>>>>>> +++ trunk/orte/util/proc_info.c  2011-08-26 18:16:14 EDT (Fri, 26 Aug 
>>>>>> 2011)
>>>>>> @@ -36,13 +36,19 @@
>>>>>> 
>>>>>> #include "orte/util/proc_info.h"
>>>>>> 
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>> +#define ORTE_NAME_INVALID {ORTE_JOBID_INVALID, ORTE_VPID_INVALID, 
>>>>>> ORTE_EPOCH_MIN}
>>>>>> +#else
>>>>>> +#define ORTE_NAME_INVALID {ORTE_JOBID_INVALID, ORTE_VPID_INVALID}
>>>>>> +#endif
>>>>>> +
>>>>>> ORTE_DECLSPEC orte_proc_info_t orte_process_info = {
>>>>>> -    /*  .my_name =              */   {ORTE_JOBID_INVALID, 
>>>>>> ORTE_VPID_INVALID, ORTE_EPOCH_MIN},
>>>>>> -    /*  .my_daemon =            */   {ORTE_JOBID_INVALID, 
>>>>>> ORTE_VPID_INVALID, ORTE_EPOCH_MIN},
>>>>>> +    /*  .my_name =              */   ORTE_NAME_INVALID,
>>>>>> +    /*  .my_daemon =            */   ORTE_NAME_INVALID,
>>>>>> /*  .my_daemon_uri =        */   NULL,
>>>>>> -    /*  .my_hnp =               */   {ORTE_JOBID_INVALID, 
>>>>>> ORTE_VPID_INVALID, ORTE_EPOCH_MIN},
>>>>>> +    /*  .my_hnp =               */   ORTE_NAME_INVALID,
>>>>>> /*  .my_hnp_uri =           */   NULL,
>>>>>> -    /*  .my_parent =            */   {ORTE_JOBID_INVALID, 
>>>>>> ORTE_VPID_INVALID, ORTE_EPOCH_MIN},
>>>>>> +    /*  .my_parent =            */   ORTE_NAME_INVALID,
>>>>>> /*  .hnp_pid =              */   0,
>>>>>> /*  .app_num =              */   0,
>>>>>> /*  .num_procs =            */   1,
>>>>>> 
>>>>>> Modified: trunk/test/util/orte_session_dir.c
>>>>>> ==============================================================================
>>>>>> --- trunk/test/util/orte_session_dir.c   (original)
>>>>>> +++ trunk/test/util/orte_session_dir.c   2011-08-26 18:16:14 EDT (Fri, 
>>>>>> 26 Aug 2011)
>>>>>> @@ -57,7 +57,7 @@
>>>>>> orte_process_info.my_name->cellid = 0;
>>>>>> orte_process_info.my_name->jobid = 0;
>>>>>> orte_process_info.my_name->vpid = 0;
>>>>>> -    orte_process_info.my_name->epoch = ORTE_EPOCH_MIN;
>>>>>> +    ORTE_EPOCH_SET(orte_process_info.my_name->epoch,ORTE_EPOCH_MIN);
>>>>>> 
>>>>>> test_init("orte_session_dir_t");
>>>>>> test_out = fopen( "test_session_dir_out", "w+" );
>>>>>> _______________________________________________
>>>>>> svn-full mailing list
>>>>>> svn-f...@open-mpi.org
>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/svn-full
>>>>> 
>>>>> 
>>>>> _______________________________________________
>>>>> devel mailing list
>>>>> de...@open-mpi.org
>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>> 
>>>> 
>>>> _______________________________________________
>>>> devel mailing list
>>>> de...@open-mpi.org
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> 
>>> 
>>> _______________________________________________
>>> devel mailing list
>>> de...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> 
>> 
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 
> 
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel


Reply via email to