Thanks! Much appreciated.
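
For readers of the archive: the build problem discussed in the quoted thread below comes down to the difference between AM_CONDITIONAL, which only sets an Automake conditional for use in Makefiles, and AC_DEFINE, which emits a preprocessor macro into the generated config header. A header that tests "#if ORTE_ENABLE_EPOCH" only ever sees the latter. What follows is a minimal C sketch of that pattern, not code from the changeset. The demo_* types, DEMO_EPOCH_SET, and the fallback #define standing in for the config header are all hypothetical; only the ORTE_ENABLE_EPOCH name is taken from the thread.

```c
#include <stdint.h>

/* Hypothetical stand-in for the generated config header. A macro like this
 * reaches the preprocessor only if configure emits it (e.g. via AC_DEFINE);
 * AM_CONDITIONAL alone never does. Without a definition, an "#if" test on
 * the name silently evaluates to 0. */
#ifndef ORTE_ENABLE_EPOCH
#define ORTE_ENABLE_EPOCH 0
#endif

typedef uint32_t demo_jobid_t;   /* hypothetical, for illustration only */
typedef uint32_t demo_vpid_t;    /* hypothetical, for illustration only */

#if ORTE_ENABLE_EPOCH
typedef uint32_t demo_epoch_t;
/* Epochs compiled in: the setter is a plain assignment. */
#define DEMO_EPOCH_SET(dst, src) ((dst) = (src))
#else
/* Epochs compiled out: the setter expands to a no-op, so callers may
 * name a field that does not exist in this configuration. */
#define DEMO_EPOCH_SET(dst, src) ((void)0)
#endif

struct demo_process_name {
    demo_jobid_t jobid;
    demo_vpid_t  vpid;
#if ORTE_ENABLE_EPOCH
    demo_epoch_t epoch;
#endif
};

/* Returns 1 when the epoch field is compiled in, 0 otherwise. */
int demo_epoch_enabled(void)
{
    return ORTE_ENABLE_EPOCH ? 1 : 0;
}

/* Initializes a name; the epoch line vanishes when epochs are disabled. */
void demo_name_init(struct demo_process_name *n,
                    demo_jobid_t jobid, demo_vpid_t vpid)
{
    n->jobid = jobid;
    n->vpid  = vpid;
    DEMO_EPOCH_SET(n->epoch, 0);
}
```

Compiling this with -DORTE_ENABLE_EPOCH=1 adds the epoch field and turns the setter into a real assignment; without it the struct holds just the two 32-bit fields and the setter compiles to nothing. If the Makefiles and the headers disagree about which of these two configurations is active, they no longer describe the same build.
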
On Aug 26, 2011, at 6:32 PM, George Bosilca wrote:

> I see. This should be fixed in r25098. Thanks for your patience.
>
> george.
>
> On Aug 26, 2011, at 19:47 , Ralph Castain wrote:
>
>> Has nothing to do with version, George - it's a problem of ORTE_ENABLE_EPOCH
>> not being included in an AC_DEFINE. It is solely defined via AM_CONDITIONAL,
>> but then used in .h files - which is simply wrong.
>>
>> Please fix it.
>>
>>
>> On Aug 26, 2011, at 5:41 PM, George Bosilca wrote:
>>
>>> We can't reproduce this. It compiles and runs without troubles on our macs.
>>> However, it might depend on the Mac OS X version, we recently moved to Lion.
>>>
>>> Thanks,
>>> george.
>>>
>>> On Aug 26, 2011, at 19:19 , Ralph Castain wrote:
>>>
>>>> Hate to say this, but the trunk is broken - won't build on Mac with that
>>>> disabled. I'll try to dig into it later :-(
>>>>
>>>>
>>>> On Aug 26, 2011, at 4:18 PM, Wesley Bland wrote:
>>>>
>>>>> The epoch and resilient orte code is now macro'd away. To enable use
>>>>>
>>>>> --enable-resilient-orte
>>>>>
>>>>> which defines:
>>>>>
>>>>> ORTE_ENABLE_EPOCH
>>>>> ORTE_RESIL_ORTE
>>>>>
>>>>> --
>>>>>
>>>>> Wesley
>>>>>
>>>>> On Aug 26, 2011, at 6:16 PM, wbl...@osl.iu.edu wrote:
>>>>>
>>>>>> Author: wbland
>>>>>> Date: 2011-08-26 18:16:14 EDT (Fri, 26 Aug 2011)
>>>>>> New Revision: 25093
>>>>>> URL: https://svn.open-mpi.org/trac/ompi/changeset/25093
>>>>>>
>>>>>> Log:
>>>>>> By popular demand the epoch code is now disabled by default.
>>>>>> >>>>>> To enable the epochs and the resilient orte code, use the configure flag: >>>>>> >>>>>> --enable-resilient-orte >>>>>> >>>>>> This will define both: >>>>>> >>>>>> ORTE_ENABLE_EPOCH >>>>>> ORTE_RESIL_ORTE >>>>>> >>>>>> Text files modified: >>>>>> trunk/ompi/mca/btl/openib/connect/btl_openib_connect_xoob.c | 12 >>>>>> ++++ >>>>>> trunk/ompi/mca/coll/sm2/coll_sm2_module.c | 3 >>>>>> >>>>>> trunk/ompi/mca/crcp/bkmrk/crcp_bkmrk_pml.c | 49 >>>>>> ++++++++---------- >>>>>> trunk/ompi/mca/dpm/orte/dpm_orte.c | 2 >>>>>> >>>>>> trunk/ompi/mca/pml/bfo/pml_bfo_failover.c | 10 +-- >>>>>> >>>>>> trunk/ompi/mca/pml/bfo/pml_bfo_hdr.h | 6 -- >>>>>> >>>>>> trunk/ompi/proc/proc.c | 6 +- >>>>>> >>>>>> trunk/opal/config/opal_configure_options.m4 | 8 +++ >>>>>> >>>>>> trunk/orte/include/orte/types.h | 24 >>>>>> +++++++++ >>>>>> trunk/orte/mca/db/daemon/db_daemon.c | 2 >>>>>> >>>>>> trunk/orte/mca/errmgr/app/errmgr_app.c | 19 >>>>>> ++++++- >>>>>> trunk/orte/mca/errmgr/base/errmgr_base_fns.c | 12 >>>>>> ++-- >>>>>> trunk/orte/mca/errmgr/base/errmgr_base_tool.c | 6 +- >>>>>> >>>>>> trunk/orte/mca/errmgr/hnp/errmgr_hnp.c | 99 >>>>>> +++++++++++++++++++++++++++------------ >>>>>> trunk/orte/mca/errmgr/hnp/errmgr_hnp_autor.c | 6 +- >>>>>> >>>>>> trunk/orte/mca/errmgr/hnp/errmgr_hnp_crmig.c | 6 +- >>>>>> >>>>>> trunk/orte/mca/errmgr/orted/errmgr_orted.c | 71 >>>>>> +++++++++++++++++++++------- >>>>>> trunk/orte/mca/ess/alps/ess_alps_module.c | 4 >>>>>> >>>>>> trunk/orte/mca/ess/base/base.h | 4 + >>>>>> >>>>>> trunk/orte/mca/ess/base/ess_base_select.c | 14 >>>>>> ++--- >>>>>> trunk/orte/mca/ess/env/ess_env_module.c | 3 >>>>>> >>>>>> trunk/orte/mca/ess/ess.h | 4 + >>>>>> >>>>>> trunk/orte/mca/ess/generic/ess_generic_module.c | 6 +- >>>>>> >>>>>> trunk/orte/mca/ess/hnp/ess_hnp_module.c | 2 >>>>>> >>>>>> trunk/orte/mca/ess/lsf/ess_lsf_module.c | 3 >>>>>> >>>>>> trunk/orte/mca/ess/singleton/ess_singleton_module.c | 2 >>>>>> >>>>>> trunk/orte/mca/ess/slave/ess_slave_module.c | 3 
>>>>>> >>>>>> trunk/orte/mca/ess/slurm/ess_slurm_module.c | 3 >>>>>> >>>>>> trunk/orte/mca/ess/slurmd/ess_slurmd_module.c | 4 >>>>>> >>>>>> trunk/orte/mca/ess/tm/ess_tm_module.c | 2 >>>>>> >>>>>> trunk/orte/mca/filem/rsh/filem_rsh_module.c | 6 +- >>>>>> >>>>>> trunk/orte/mca/grpcomm/base/grpcomm_base_coll.c | 21 >>>>>> ++----- >>>>>> trunk/orte/mca/grpcomm/hier/grpcomm_hier_module.c | 8 +- >>>>>> >>>>>> trunk/orte/mca/iof/base/base.h | 8 +- >>>>>> >>>>>> trunk/orte/mca/iof/base/iof_base_open.c | 2 >>>>>> >>>>>> trunk/orte/mca/iof/hnp/iof_hnp.c | 7 +- >>>>>> >>>>>> trunk/orte/mca/iof/hnp/iof_hnp_receive.c | 6 +- >>>>>> >>>>>> trunk/orte/mca/iof/orted/iof_orted.c | 2 >>>>>> >>>>>> trunk/orte/mca/odls/base/odls_base_default_fns.c | 7 +- >>>>>> >>>>>> trunk/orte/mca/odls/base/odls_base_open.c | 5 - >>>>>> >>>>>> trunk/orte/mca/odls/base/odls_base_state.c | 6 +- >>>>>> >>>>>> trunk/orte/mca/oob/tcp/oob_tcp_msg.c | 2 >>>>>> >>>>>> trunk/orte/mca/oob/tcp/oob_tcp_peer.c | 5 ++ >>>>>> >>>>>> trunk/orte/mca/plm/base/plm_base_jobid.c | 4 >>>>>> >>>>>> trunk/orte/mca/plm/base/plm_base_launch_support.c | 3 >>>>>> >>>>>> trunk/orte/mca/plm/base/plm_base_orted_cmds.c | 8 +-- >>>>>> >>>>>> trunk/orte/mca/plm/base/plm_base_receive.c | 7 ++ >>>>>> >>>>>> trunk/orte/mca/plm/base/plm_base_rsh_support.c | 4 + >>>>>> >>>>>> trunk/orte/mca/rmaps/base/rmaps_base_support_fns.c | 23 >>>>>> +++++---- >>>>>> trunk/orte/mca/rmaps/rank_file/rmaps_rank_file.c | 3 >>>>>> >>>>>> trunk/orte/mca/rmaps/seq/rmaps_seq.c | 3 >>>>>> >>>>>> trunk/orte/mca/rmcast/base/rmcast_base_open.c | 6 +- >>>>>> >>>>>> trunk/orte/mca/rmcast/tcp/rmcast_tcp.c | 4 >>>>>> >>>>>> trunk/orte/mca/rmcast/udp/rmcast_udp.c | 4 >>>>>> >>>>>> trunk/orte/mca/rml/base/rml_base_components.c | 5 + >>>>>> >>>>>> trunk/orte/mca/rml/rml_types.h | 6 + >>>>>> >>>>>> trunk/orte/mca/routed/base/routed_base_components.c | 6 +- >>>>>> >>>>>> trunk/orte/mca/routed/base/routed_base_register_sync.c | 4 + >>>>>> >>>>>> 
trunk/orte/mca/routed/binomial/routed_binomial.c | 54 >>>>>> ++++++++++++--------- >>>>>> trunk/orte/mca/routed/cm/routed_cm.c | 19 >>>>>> +++---- >>>>>> trunk/orte/mca/routed/direct/routed_direct.c | 3 >>>>>> >>>>>> trunk/orte/mca/routed/linear/routed_linear.c | 17 >>>>>> +++--- >>>>>> trunk/orte/mca/routed/radix/routed_radix.c | 22 >>>>>> ++++---- >>>>>> trunk/orte/mca/routed/slave/routed_slave.c | 6 +- >>>>>> >>>>>> trunk/orte/mca/sensor/file/sensor_file.c | 2 >>>>>> >>>>>> trunk/orte/mca/snapc/base/snapc_base_fns.c | 4 >>>>>> >>>>>> trunk/orte/mca/snapc/full/snapc_full_global.c | 12 >>>>>> ++-- >>>>>> trunk/orte/mca/snapc/full/snapc_full_local.c | 6 +- >>>>>> >>>>>> trunk/orte/mca/snapc/full/snapc_full_module.c | 4 >>>>>> >>>>>> trunk/orte/mca/sstore/base/sstore_base_fns.c | 6 +- >>>>>> >>>>>> trunk/orte/mca/sstore/central/sstore_central_global.c | 3 >>>>>> >>>>>> trunk/orte/mca/sstore/central/sstore_central_local.c | 6 +- >>>>>> >>>>>> trunk/orte/mca/sstore/stage/sstore_stage_global.c | 7 +- >>>>>> >>>>>> trunk/orte/mca/sstore/stage/sstore_stage_local.c | 12 >>>>>> ++-- >>>>>> trunk/orte/orted/orted_comm.c | 20 >>>>>> ++++---- >>>>>> trunk/orte/orted/orted_main.c | 7 +- >>>>>> >>>>>> trunk/orte/runtime/data_type_support/orte_dt_compare_fns.c | 4 + >>>>>> >>>>>> trunk/orte/runtime/data_type_support/orte_dt_copy_fns.c | 4 + >>>>>> >>>>>> trunk/orte/runtime/data_type_support/orte_dt_packing_fns.c | 6 ++ >>>>>> >>>>>> trunk/orte/runtime/data_type_support/orte_dt_print_fns.c | 19 >>>>>> +++++++ >>>>>> trunk/orte/runtime/data_type_support/orte_dt_size_fns.c | 2 >>>>>> >>>>>> trunk/orte/runtime/data_type_support/orte_dt_support.h | 11 >>>>>> ++++ >>>>>> trunk/orte/runtime/data_type_support/orte_dt_unpacking_fns.c | 10 +++ >>>>>> >>>>>> trunk/orte/runtime/orte_data_server.c | 2 >>>>>> >>>>>> trunk/orte/runtime/orte_globals.c | 4 + >>>>>> >>>>>> trunk/orte/runtime/orte_init.c | 9 +++ >>>>>> >>>>>> trunk/orte/runtime/orte_wait.h | 6 +- >>>>>> >>>>>> 
trunk/orte/test/system/oob_stress.c | 3 >>>>>> >>>>>> trunk/orte/test/system/orte_ring.c | 6 - >>>>>> >>>>>> trunk/orte/test/system/orte_spawn.c | 4 >>>>>> >>>>>> trunk/orte/tools/orte-ps/orte-ps.c | 10 +++ >>>>>> >>>>>> trunk/orte/tools/orte-top/orte-top.c | 2 >>>>>> >>>>>> trunk/orte/util/comm/comm.c | 7 ++ >>>>>> >>>>>> trunk/orte/util/comm/comm.h | 5 + >>>>>> >>>>>> trunk/orte/util/hnp_contact.c | 3 >>>>>> >>>>>> trunk/orte/util/name_fns.c | 47 >>>>>> ++++++++++++++---- >>>>>> trunk/orte/util/name_fns.h | 30 >>>>>> ++++++++++++ >>>>>> trunk/orte/util/nidmap.c | 13 >>>>>> ++++ >>>>>> trunk/orte/util/nidmap.h | 11 >>>>>> ++++ >>>>>> trunk/orte/util/proc_info.c | 14 >>>>>> ++++- >>>>>> trunk/test/util/orte_session_dir.c | 2 >>>>>> >>>>>> 101 files changed, 652 insertions(+), 362 deletions(-) >>>>>> >>>>>> Modified: trunk/ompi/mca/btl/openib/connect/btl_openib_connect_xoob.c >>>>>> ============================================================================== >>>>>> --- trunk/ompi/mca/btl/openib/connect/btl_openib_connect_xoob.c >>>>>> (original) >>>>>> +++ trunk/ompi/mca/btl/openib/connect/btl_openib_connect_xoob.c >>>>>> 2011-08-26 18:16:14 EDT (Fri, 26 Aug 2011) >>>>>> @@ -693,8 +693,16 @@ >>>>>> bool found = false; >>>>>> >>>>>> BTL_VERBOSE(("Searching for ep and proc with follow parameters:" >>>>>> - "jobid %d, vpid %d, epoch %d, sid %" PRIx64 ", lid %d", >>>>>> - process_name->jobid, process_name->vpid, >>>>>> process_name->epoch, subnet_id, lid)); >>>>>> + "jobid %d, vpid %d, " >>>>>> +#if ORTE_ENABLE_EPOCH >>>>>> + "epoch %d, " >>>>>> +#endif >>>>>> + "sid %" PRIx64 ", lid %d", >>>>>> + process_name->jobid, process_name->vpid, >>>>>> +#if ORTE_ENABLE_EPOCH >>>>>> + process_name->epoch, >>>>>> +#endif >>>>>> + subnet_id, lid)); >>>>>> /* find ibproc */ >>>>>> OPAL_THREAD_LOCK(&mca_btl_openib_component.ib_lock); >>>>>> for (ib_proc = (mca_btl_openib_proc_t*) >>>>>> >>>>>> Modified: trunk/ompi/mca/coll/sm2/coll_sm2_module.c >>>>>> 
============================================================================== >>>>>> --- trunk/ompi/mca/coll/sm2/coll_sm2_module.c (original) >>>>>> +++ trunk/ompi/mca/coll/sm2/coll_sm2_module.c 2011-08-26 18:16:14 EDT >>>>>> (Fri, 26 Aug 2011) >>>>>> @@ -1208,7 +1208,8 @@ >>>>>> peer = OBJ_NEW(orte_namelist_t); >>>>>> peer->name.jobid = >>>>>> comm->c_local_group->grp_proc_pointers[i]->proc_name.jobid; >>>>>> peer->name.vpid = >>>>>> comm->c_local_group->grp_proc_pointers[i]->proc_name.vpid; >>>>>> - peer->name.epoch = >>>>>> comm->c_local_group->grp_proc_pointers[i]->proc_name.epoch; >>>>>> + >>>>>> ORTE_EPOCH_SET(peer->name.epoch,comm->c_local_group->grp_proc_pointers[i]->proc_name.epoch); >>>>>> + >>>>>> opal_list_append(&peers, &peer->item); >>>>>> } >>>>>> /* prepare send data */ >>>>>> >>>>>> Modified: trunk/ompi/mca/crcp/bkmrk/crcp_bkmrk_pml.c >>>>>> ============================================================================== >>>>>> --- trunk/ompi/mca/crcp/bkmrk/crcp_bkmrk_pml.c (original) >>>>>> +++ trunk/ompi/mca/crcp/bkmrk/crcp_bkmrk_pml.c 2011-08-26 18:16:14 EDT >>>>>> (Fri, 26 Aug 2011) >>>>>> @@ -702,7 +702,7 @@ >>>>>> void >>>>>> ompi_crcp_bkmrk_pml_peer_ref_construct(ompi_crcp_bkmrk_pml_peer_ref_t >>>>>> *peer_ref) { >>>>>> peer_ref->proc_name.jobid = ORTE_JOBID_INVALID; >>>>>> peer_ref->proc_name.vpid = ORTE_VPID_INVALID; >>>>>> - peer_ref->proc_name.epoch = ORTE_EPOCH_MIN; >>>>>> + ORTE_EPOCH_SET(peer_ref->proc_name.epoch,ORTE_EPOCH_MIN); >>>>>> >>>>>> OBJ_CONSTRUCT(&peer_ref->send_list, opal_list_t); >>>>>> OBJ_CONSTRUCT(&peer_ref->isend_list, opal_list_t); >>>>>> @@ -730,7 +730,7 @@ >>>>>> >>>>>> peer_ref->proc_name.jobid = ORTE_JOBID_INVALID; >>>>>> peer_ref->proc_name.vpid = ORTE_VPID_INVALID; >>>>>> - peer_ref->proc_name.epoch = ORTE_EPOCH_MIN; >>>>>> + ORTE_EPOCH_SET(peer_ref->proc_name.epoch,ORTE_EPOCH_MIN); >>>>>> >>>>>> while( NULL != (item = opal_list_remove_first(&peer_ref->send_list)) ) { >>>>>> HOKE_TRAFFIC_MSG_REF_RETURN(item); 
>>>>>> @@ -840,7 +840,7 @@ >>>>>> >>>>>> msg_ref->proc_name.jobid = ORTE_JOBID_INVALID; >>>>>> msg_ref->proc_name.vpid = ORTE_VPID_INVALID; >>>>>> - msg_ref->proc_name.epoch = ORTE_EPOCH_MIN; >>>>>> + ORTE_EPOCH_SET(msg_ref->proc_name.epoch,ORTE_EPOCH_MIN); >>>>>> >>>>>> msg_ref->matched = INVALID_INT; >>>>>> msg_ref->done = INVALID_INT; >>>>>> @@ -868,7 +868,7 @@ >>>>>> >>>>>> msg_ref->proc_name.jobid = ORTE_JOBID_INVALID; >>>>>> msg_ref->proc_name.vpid = ORTE_VPID_INVALID; >>>>>> - msg_ref->proc_name.epoch = ORTE_EPOCH_MIN; >>>>>> + ORTE_EPOCH_SET(msg_ref->proc_name.epoch,ORTE_EPOCH_MIN); >>>>>> >>>>>> msg_ref->matched = INVALID_INT; >>>>>> msg_ref->done = INVALID_INT; >>>>>> @@ -902,7 +902,7 @@ >>>>>> >>>>>> msg_ref->proc_name.jobid = ORTE_JOBID_INVALID; >>>>>> msg_ref->proc_name.vpid = ORTE_VPID_INVALID; >>>>>> - msg_ref->proc_name.epoch = ORTE_EPOCH_MIN; >>>>>> + ORTE_EPOCH_SET(msg_ref->proc_name.epoch,ORTE_EPOCH_MIN); >>>>>> >>>>>> msg_ref->done = INVALID_INT; >>>>>> msg_ref->active = INVALID_INT; >>>>>> @@ -934,7 +934,7 @@ >>>>>> >>>>>> msg_ref->proc_name.jobid = ORTE_JOBID_INVALID; >>>>>> msg_ref->proc_name.vpid = ORTE_VPID_INVALID; >>>>>> - msg_ref->proc_name.epoch = ORTE_EPOCH_MIN; >>>>>> + ORTE_EPOCH_SET(msg_ref->proc_name.epoch,ORTE_EPOCH_MIN); >>>>>> >>>>>> msg_ref->done = INVALID_INT; >>>>>> msg_ref->active = INVALID_INT; >>>>>> @@ -954,7 +954,7 @@ >>>>>> >>>>>> msg_ack_ref->peer.jobid = ORTE_JOBID_INVALID; >>>>>> msg_ack_ref->peer.vpid = ORTE_VPID_INVALID; >>>>>> - msg_ack_ref->peer.epoch = ORTE_EPOCH_MIN; >>>>>> + ORTE_EPOCH_SET(msg_ack_ref->peer.epoch,ORTE_EPOCH_MIN); >>>>>> } >>>>>> >>>>>> void ompi_crcp_bkmrk_pml_drain_message_ack_ref_destruct( >>>>>> ompi_crcp_bkmrk_pml_drain_message_ack_ref_t *msg_ack_ref) { >>>>>> @@ -962,7 +962,7 @@ >>>>>> >>>>>> msg_ack_ref->peer.jobid = ORTE_JOBID_INVALID; >>>>>> msg_ack_ref->peer.vpid = ORTE_VPID_INVALID; >>>>>> - msg_ack_ref->peer.epoch = ORTE_EPOCH_MIN; >>>>>> + 
ORTE_EPOCH_SET(msg_ack_ref->peer.epoch,ORTE_EPOCH_MIN); >>>>>> } >>>>>> >>>>>> >>>>>> @@ -1015,7 +1015,7 @@ >>>>>> } >>>>>> >>>>>> >>>>>> -#define CREATE_NEW_MSG(msg_ref, v_type, v_count, v_ddt_size, v_tag, >>>>>> v_rank, v_comm, p_jobid, p_vpid, p_epoch) \ >>>>>> +#define CREATE_NEW_MSG(msg_ref, v_type, v_count, v_ddt_size, v_tag, >>>>>> v_rank, v_comm, p_jobid, p_vpid) \ >>>>>> { \ >>>>>> HOKE_TRAFFIC_MSG_REF_ALLOC(msg_ref, ret); \ >>>>>> \ >>>>>> @@ -1034,7 +1034,7 @@ >>>>>> \ >>>>>> msg_ref->proc_name.jobid = p_jobid; \ >>>>>> msg_ref->proc_name.vpid = p_vpid; \ >>>>>> - msg_ref->proc_name.epoch = p_epoch; \ >>>>>> + >>>>>> ORTE_EPOCH_SET(msg_ref->proc_name.epoch,orte_ess.proc_get_epoch(&(msg_ref->proc_name))); >>>>>> \ >>>>>> \ >>>>>> msg_ref->matched = 0; \ >>>>>> msg_ref->done = 0; \ >>>>>> @@ -1043,7 +1043,7 @@ >>>>>> msg_ref->active_drain = 0; \ >>>>>> } >>>>>> >>>>>> -#define CREATE_NEW_DRAIN_MSG(msg_ref, v_type, v_count, v_ddt_size, >>>>>> v_tag, v_rank, v_comm, p_jobid, p_vpid, p_epoch) \ >>>>>> +#define CREATE_NEW_DRAIN_MSG(msg_ref, v_type, v_count, v_ddt_size, >>>>>> v_tag, v_rank, v_comm, p_jobid, p_vpid) \ >>>>>> { \ >>>>>> HOKE_DRAIN_MSG_REF_ALLOC(msg_ref, ret); \ >>>>>> \ >>>>>> @@ -1063,7 +1063,7 @@ >>>>>> \ >>>>>> msg_ref->proc_name.jobid = p_jobid; \ >>>>>> msg_ref->proc_name.vpid = p_vpid; \ >>>>>> - msg_ref->proc_name.epoch = p_epoch; \ >>>>>> + >>>>>> ORTE_EPOCH_SET(msg_ref->proc_name.epoch,orte_ess.proc_get_epoch(&(msg_ref->proc_name))); >>>>>> \ >>>>>> } >>>>>> >>>>>> >>>>>> @@ -1466,7 +1466,7 @@ >>>>>> >>>>>> new_peer_ref->proc_name.jobid = procs[i]->proc_name.jobid; >>>>>> new_peer_ref->proc_name.vpid = procs[i]->proc_name.vpid; >>>>>> - new_peer_ref->proc_name.epoch = procs[i]->proc_name.epoch; >>>>>> + >>>>>> ORTE_EPOCH_SET(new_peer_ref->proc_name.epoch,procs[i]->proc_name.epoch); >>>>>> >>>>>> opal_list_append(&ompi_crcp_bkmrk_pml_peer_refs, >>>>>> &(new_peer_ref->super)); >>>>>> } >>>>>> @@ -3237,13 +3237,11 @@ >>>>>> 
CREATE_NEW_MSG((*msg_ref), msg_type, >>>>>> count, ddt_size, tag, dest, comm, >>>>>> peer_ref->proc_name.jobid, >>>>>> - peer_ref->proc_name.vpid, >>>>>> - peer_ref->proc_name.epoch); >>>>>> + peer_ref->proc_name.vpid); >>>>>> } else { >>>>>> CREATE_NEW_MSG((*msg_ref), msg_type, >>>>>> count, ddt_size, tag, dest, comm, >>>>>> - ORTE_JOBID_INVALID, ORTE_VPID_INVALID, >>>>>> - ORTE_EPOCH_INVALID); >>>>>> + ORTE_JOBID_INVALID, ORTE_VPID_INVALID); >>>>>> } >>>>>> >>>>>> if( msg_type == COORD_MSG_TYPE_P_SEND || >>>>>> @@ -3377,7 +3375,7 @@ >>>>>> if( NULL == from_peer_ref && NULL != to_peer_ref ) { >>>>>> (*new_msg_ref)->proc_name.jobid = to_peer_ref->proc_name.jobid; >>>>>> (*new_msg_ref)->proc_name.vpid = to_peer_ref->proc_name.vpid; >>>>>> - (*new_msg_ref)->proc_name.epoch = to_peer_ref->proc_name.epoch; >>>>>> + >>>>>> ORTE_EPOCH_SET((*new_msg_ref)->proc_name.epoch,to_peer_ref->proc_name.epoch); >>>>>> } >>>>>> >>>>>> return exit_status; >>>>>> @@ -3808,8 +3806,7 @@ >>>>>> CREATE_NEW_DRAIN_MSG((*msg_ref), msg_type, >>>>>> count, NULL, tag, dest, comm, >>>>>> peer_ref->proc_name.jobid, >>>>>> - peer_ref->proc_name.vpid, >>>>>> - peer_ref->proc_name.epoch); >>>>>> + peer_ref->proc_name.vpid); >>>>>> >>>>>> (*msg_ref)->done = 0; >>>>>> (*msg_ref)->active = 0; >>>>>> @@ -5284,8 +5281,7 @@ >>>>>> */ >>>>>> peer_name.jobid = ORTE_PROC_MY_NAME->jobid; >>>>>> peer_name.vpid = peer_idx; >>>>>> - peer_name.epoch = ORTE_EPOCH_INVALID; >>>>>> - peer_name.epoch = orte_ess.proc_get_epoch(&peer_name); >>>>>> + ORTE_EPOCH_SET(peer_name.epoch,orte_ess.proc_get_epoch(&peer_name)); >>>>>> >>>>>> if( NULL == (peer_ref = find_peer(peer_name))) { >>>>>> opal_output(mca_crcp_bkmrk_component.super.output_handle, >>>>>> @@ -5346,8 +5342,7 @@ >>>>>> >>>>>> peer_name.jobid = ORTE_PROC_MY_NAME->jobid; >>>>>> peer_name.vpid = peer_idx; >>>>>> - peer_name.epoch = ORTE_EPOCH_INVALID; >>>>>> - peer_name.epoch = orte_ess.proc_get_epoch(&peer_name); >>>>>> + 
ORTE_EPOCH_SET(peer_name.epoch,orte_ess.proc_get_epoch(&peer_name)); >>>>>> >>>>>> if ( 0 > (ret = orte_rml.recv_buffer_nb(&peer_name, >>>>>> OMPI_CRCP_COORD_BOOKMARK_TAG, >>>>>> @@ -5529,7 +5524,8 @@ >>>>>> HOKE_DRAIN_ACK_MSG_REF_ALLOC(d_msg_ack, ret); >>>>>> d_msg_ack->peer.jobid = peer_ref->proc_name.jobid; >>>>>> d_msg_ack->peer.vpid = peer_ref->proc_name.vpid; >>>>>> - d_msg_ack->peer.epoch = peer_ref->proc_name.epoch; >>>>>> + ORTE_EPOCH_SET(d_msg_ack->peer.epoch,peer_ref->proc_name.epoch); >>>>>> + >>>>>> d_msg_ack->complete = false; >>>>>> opal_list_append(&drained_msg_ack_list, &(d_msg_ack->super)); >>>>>> OPAL_OUTPUT_VERBOSE((10, mca_crcp_bkmrk_component.super.output_handle, >>>>>> @@ -6169,8 +6165,7 @@ >>>>>> count, datatype_size, tag, rank, >>>>>> ompi_comm_lookup(comm_id), >>>>>> peer_ref->proc_name.jobid, >>>>>> - peer_ref->proc_name.vpid, >>>>>> - peer_ref->proc_name.epoch); >>>>>> + peer_ref->proc_name.vpid); >>>>>> >>>>>> traffic_message_create_drain_message(true, num_left_unresolved, >>>>>> peer_ref, >>>>>> >>>>>> Modified: trunk/ompi/mca/dpm/orte/dpm_orte.c >>>>>> ============================================================================== >>>>>> --- trunk/ompi/mca/dpm/orte/dpm_orte.c (original) >>>>>> +++ trunk/ompi/mca/dpm/orte/dpm_orte.c 2011-08-26 18:16:14 EDT (Fri, >>>>>> 26 Aug 2011) >>>>>> @@ -1130,7 +1130,7 @@ >>>>>> /* flag the identity of the remote proc */ >>>>>> carport.jobid = mev->sender.jobid; >>>>>> carport.vpid = mev->sender.vpid; >>>>>> - carport.epoch = mev->sender.epoch; >>>>>> + ORTE_EPOCH_SET(carport.epoch,mev->sender.epoch); >>>>>> >>>>>> /* release the event */ >>>>>> OBJ_RELEASE(mev); >>>>>> >>>>>> Modified: trunk/ompi/mca/pml/bfo/pml_bfo_failover.c >>>>>> ============================================================================== >>>>>> --- trunk/ompi/mca/pml/bfo/pml_bfo_failover.c (original) >>>>>> +++ trunk/ompi/mca/pml/bfo/pml_bfo_failover.c 2011-08-26 18:16:14 EDT >>>>>> (Fri, 26 Aug 2011) >>>>>> @@ -1,8 +1,5 @@ 
>>>>>> /* >>>>>> * Copyright (c) 2010 Oracle and/or its affiliates. All rights >>>>>> reserved. >>>>>> - * Copyright (c) 2004-2011 The University of Tennessee and The >>>>>> University >>>>>> - * of Tennessee Research Foundation. All rights >>>>>> - * reserved. >>>>>> * $COPYRIGHT$ >>>>>> * >>>>>> * Additional copyrights may follow >>>>>> @@ -398,13 +395,13 @@ >>>>>> (hdr->hdr_match.hdr_seq != (uint16_t)recvreq->req_msgseq)) { >>>>>> orte_proc.jobid = hdr->hdr_restart.hdr_jobid; >>>>>> orte_proc.vpid = hdr->hdr_restart.hdr_vpid; >>>>>> - orte_proc.epoch = hdr->hdr_restart.hdr_epoch; >>>>>> + >>>>>> ompi_proc = ompi_proc_find(&orte_proc); >>>>>> opal_output_verbose(20, mca_pml_bfo_output, >>>>>> "RNDVRESTARTNOTIFY: received: does not match >>>>>> request, sending NACK back " >>>>>> "PML:req=%d,hdr=%d CTX:req=%d,hdr=%d >>>>>> SRC:req=%d,hdr=%d " >>>>>> "RQS:req=%d,hdr=%d src_req=%p, dst_req=%p, >>>>>> peer=%d, hdr->hdr_jobid=%d, " >>>>>> - "hdr->hdr_vpid=%d, hdr->hdr_epoch=%d, >>>>>> ompi_proc->proc_hostname=%s", >>>>>> + "hdr->hdr_vpid=%d, >>>>>> ompi_proc->proc_hostname=%s", >>>>>> (uint16_t)recvreq->req_msgseq, >>>>>> hdr->hdr_match.hdr_seq, >>>>>> recvreq->req_recv.req_base.req_comm->c_contextid, >>>>>> hdr->hdr_match.hdr_ctx, >>>>>> >>>>>> recvreq->req_recv.req_base.req_ompi.req_status.MPI_SOURCE, >>>>>> @@ -413,7 +410,7 @@ >>>>>> recvreq->remote_req_send.pval, (void *)recvreq, >>>>>> >>>>>> recvreq->req_recv.req_base.req_ompi.req_status.MPI_SOURCE, >>>>>> hdr->hdr_restart.hdr_jobid, >>>>>> hdr->hdr_restart.hdr_vpid, >>>>>> - hdr->hdr_restart.hdr_epoch, >>>>>> ompi_proc->proc_hostname); >>>>>> + ompi_proc->proc_hostname); >>>>>> mca_pml_bfo_recv_request_rndvrestartnack(des, ompi_proc, false); >>>>>> return; >>>>>> } >>>>>> @@ -715,7 +712,6 @@ >>>>>> restart->hdr_dst_rank = sendreq->req_send.req_base.req_peer; /* Needed >>>>>> for NACKs */ >>>>>> restart->hdr_jobid = ORTE_PROC_MY_NAME->jobid; >>>>>> restart->hdr_vpid = ORTE_PROC_MY_NAME->vpid; >>>>>> - 
restart->hdr_epoch = ORTE_PROC_MY_NAME->epoch; >>>>>> >>>>>> bfo_hdr_hton(restart, MCA_PML_BFO_HDR_TYPE_RNDVRESTARTNOTIFY, proc); >>>>>> >>>>>> >>>>>> Modified: trunk/ompi/mca/pml/bfo/pml_bfo_hdr.h >>>>>> ============================================================================== >>>>>> --- trunk/ompi/mca/pml/bfo/pml_bfo_hdr.h (original) >>>>>> +++ trunk/ompi/mca/pml/bfo/pml_bfo_hdr.h 2011-08-26 18:16:14 EDT (Fri, >>>>>> 26 Aug 2011) >>>>>> @@ -2,9 +2,6 @@ >>>>>> * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana >>>>>> * University Research and Technology >>>>>> * Corporation. All rights reserved. >>>>>> - * Copyright (c) 2004-2011 The University of Tennessee and The >>>>>> University >>>>>> - * of Tennessee Research Foundation. All rights >>>>>> - * reserved. >>>>>> * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, >>>>>> * University of Stuttgart. All rights reserved. >>>>>> * Copyright (c) 2004-2005 The Regents of the University of California. 
>>>>>> @@ -415,7 +412,6 @@ >>>>>> int32_t hdr_dst_rank; /**< needed to send NACK */ >>>>>> uint32_t hdr_jobid; /**< needed to send NACK */ >>>>>> uint32_t hdr_vpid; /**< needed to send NACK */ >>>>>> - uint32_t hdr_epoch; /**< needed to send NACK >>>>>> */ >>>>>> }; >>>>>> typedef struct mca_pml_bfo_restart_hdr_t mca_pml_bfo_restart_hdr_t; >>>>>> >>>>>> @@ -428,7 +424,6 @@ >>>>>> (h).hdr_dst_rank = ntohl((h).hdr_dst_rank); \ >>>>>> (h).hdr_jobid = ntohl((h).hdr_jobid); \ >>>>>> (h).hdr_vpid = ntohl((h).hdr_vpid); \ >>>>>> - (h).hdr_epoch = ntohl((h).hdr_epoch); \ >>>>>> } while (0) >>>>>> >>>>>> #define MCA_PML_BFO_RESTART_HDR_HTON(h) \ >>>>>> @@ -437,7 +432,6 @@ >>>>>> (h).hdr_dst_rank = htonl((h).hdr_dst_rank); \ >>>>>> (h).hdr_jobid = htonl((h).hdr_jobid); \ >>>>>> (h).hdr_vpid = htonl((h).hdr_vpid); \ >>>>>> - (h).hdr_epoch = htonl((h).hdr_epoch); \ >>>>>> } while (0) >>>>>> >>>>>> #endif /* PML_BFO */ >>>>>> >>>>>> Modified: trunk/ompi/proc/proc.c >>>>>> ============================================================================== >>>>>> --- trunk/ompi/proc/proc.c (original) >>>>>> +++ trunk/ompi/proc/proc.c 2011-08-26 18:16:14 EDT (Fri, 26 Aug >>>>>> 2011) >>>>>> @@ -108,7 +108,8 @@ >>>>>> >>>>>> proc->proc_name.jobid = ORTE_PROC_MY_NAME->jobid; >>>>>> proc->proc_name.vpid = i; >>>>>> - proc->proc_name.epoch = ORTE_EPOCH_MIN; >>>>>> + ORTE_EPOCH_SET(proc->proc_name.epoch,ORTE_EPOCH_MIN); >>>>>> + >>>>>> if (i == ORTE_PROC_MY_NAME->vpid) { >>>>>> ompi_proc_local_proc = proc; >>>>>> proc->proc_flags = OPAL_PROC_ALL_LOCAL; >>>>>> @@ -362,8 +363,7 @@ >>>>>> >>>>>> /* Does not change: proc->proc_name.vpid */ >>>>>> proc->proc_name.jobid = ORTE_PROC_MY_NAME->jobid; >>>>>> - proc->proc_name.epoch = ORTE_EPOCH_INVALID; >>>>>> - proc->proc_name.epoch = >>>>>> orte_ess.proc_get_epoch(&proc->proc_name); >>>>>> + >>>>>> ORTE_EPOCH_SET(proc->proc_name.epoch,orte_ess.proc_get_epoch(&proc->proc_name)); >>>>>> >>>>>> /* Make sure to clear the local flag before we set it 
below */ >>>>>> proc->proc_flags = 0; >>>>>> >>>>>> Modified: trunk/opal/config/opal_configure_options.m4 >>>>>> ============================================================================== >>>>>> --- trunk/opal/config/opal_configure_options.m4 (original) >>>>>> +++ trunk/opal/config/opal_configure_options.m4 2011-08-26 18:16:14 EDT >>>>>> (Fri, 26 Aug 2011) >>>>>> @@ -416,6 +416,14 @@ >>>>>> AM_CONDITIONAL(WANT_FT_CR, test "$opal_want_ft_cr" = "1") >>>>>> >>>>>> # >>>>>> +# Compile in resilient runtime code >>>>>> +# >>>>>> +AC_ARG_ENABLE(resilient-orte, >>>>>> + [AC_HELP_STRING([--enable-resilient-orte], [Enable the resilient >>>>>> runtime code.])]) >>>>>> +AM_CONDITIONAL(ORTE_RESIL_ORTE, [test "$enable_resilient_orte" = "yes"]) >>>>>> +AM_CONDITIONAL(ORTE_ENABLE_EPOCH, [test "$enable_resilient_orte" = >>>>>> "yes"]) >>>>>> + >>>>>> +# >>>>>> # Do we want to install binaries? >>>>>> # >>>>>> AC_ARG_ENABLE([binaries], >>>>>> >>>>>> Modified: trunk/orte/include/orte/types.h >>>>>> ============================================================================== >>>>>> --- trunk/orte/include/orte/types.h (original) >>>>>> +++ trunk/orte/include/orte/types.h 2011-08-26 18:16:14 EDT (Fri, >>>>>> 26 Aug 2011) >>>>>> @@ -81,24 +81,43 @@ >>>>>> #define ORTE_VPID_T OPAL_UINT32 >>>>>> #define ORTE_VPID_MAX UINT32_MAX-2 >>>>>> #define ORTE_VPID_MIN 0 >>>>>> + >>>>>> +#if ORTE_ENABLE_EPOCH >>>>>> typedef uint32_t orte_epoch_t; >>>>>> #define ORTE_EPOCH_T OPAL_UINT32 >>>>>> #define ORTE_EPOCH_MAX UINT32_MAX-2 >>>>>> #define ORTE_EPOCH_MIN 0 >>>>>> +#endif >>>>>> >>>>>> +#if ORTE_ENABLE_EPOCH >>>>>> #define ORTE_PROCESS_NAME_HTON(n) \ >>>>>> do { \ >>>>>> n.jobid = htonl(n.jobid); \ >>>>>> n.vpid = htonl(n.vpid); \ >>>>>> n.epoch = htonl(n.epoch); \ >>>>>> } while (0) >>>>>> +#else >>>>>> +#define ORTE_PROCESS_NAME_HTON(n) \ >>>>>> +do { \ >>>>>> + n.jobid = htonl(n.jobid); \ >>>>>> + n.vpid = htonl(n.vpid); \ >>>>>> +} while (0) >>>>>> +#endif >>>>>> >>>>>> +#if 
ORTE_ENABLE_EPOCH >>>>>> #define ORTE_PROCESS_NAME_NTOH(n) \ >>>>>> do { \ >>>>>> n.jobid = ntohl(n.jobid); \ >>>>>> n.vpid = ntohl(n.vpid); \ >>>>>> n.epoch = ntohl(n.epoch); \ >>>>>> } while (0) >>>>>> +#else >>>>>> +#define ORTE_PROCESS_NAME_NTOH(n) \ >>>>>> +do { \ >>>>>> + n.jobid = ntohl(n.jobid); \ >>>>>> + n.vpid = ntohl(n.vpid); \ >>>>>> +} while (0) >>>>>> +#endif >>>>>> >>>>>> #define ORTE_NAME_ARGS(n) \ >>>>>> (unsigned long) ((NULL == n) ? (unsigned long)ORTE_JOBID_INVALID : >>>>>> (unsigned long)(n)->jobid), \ >>>>>> @@ -127,6 +146,7 @@ >>>>>> struct orte_process_name_t { >>>>>> orte_jobid_t jobid; /**< Job number */ >>>>>> orte_vpid_t vpid; /**< Process id - equivalent to rank */ >>>>>> +#if ORTE_ENABLE_EPOCH >>>>>> orte_epoch_t epoch; /**< Epoch - used to measure the generation of a >>>>>> recovered process. >>>>>> * The epoch will start at ORTE_EPOCH_MIN and >>>>>> * increment every time the process is detected >>>>>> as >>>>>> @@ -135,6 +155,7 @@ >>>>>> * processes that did not directly detect the >>>>>> * failure to increment their epochs. 
>>>>>> */ >>>>>> +#endif >>>>>> }; >>>>>> typedef struct orte_process_name_t orte_process_name_t; >>>>>> >>>>>> @@ -157,7 +178,10 @@ >>>>>> #define ORTE_NAME (OPAL_DSS_ID_DYNAMIC + 2) /**< an >>>>>> orte_process_name_t */ >>>>>> #define ORTE_VPID (OPAL_DSS_ID_DYNAMIC + 3) /**< a >>>>>> vpid */ >>>>>> #define ORTE_JOBID (OPAL_DSS_ID_DYNAMIC + 4) /**< a >>>>>> jobid */ >>>>>> + >>>>>> +#if ORTE_ENABLE_EPOCH >>>>>> #define ORTE_EPOCH (OPAL_DSS_ID_DYNAMIC + 5) /**< an >>>>>> epoch */ >>>>>> +#endif >>>>>> >>>>>> #if !ORTE_DISABLE_FULL_SUPPORT >>>>>> /* State-related types */ >>>>>> >>>>>> Modified: trunk/orte/mca/db/daemon/db_daemon.c >>>>>> ============================================================================== >>>>>> --- trunk/orte/mca/db/daemon/db_daemon.c (original) >>>>>> +++ trunk/orte/mca/db/daemon/db_daemon.c 2011-08-26 18:16:14 EDT (Fri, >>>>>> 26 Aug 2011) >>>>>> @@ -386,7 +386,7 @@ >>>>>> dat = OBJ_NEW(orte_db_data_t); >>>>>> dat->name.jobid = sender->jobid; >>>>>> dat->name.vpid = sender->vpid; >>>>>> - dat->name.epoch= sender->epoch; >>>>>> + ORTE_EPOCH_SET(dat->name.epoch,sender->epoch); >>>>>> dat->key = key; >>>>>> count=1; >>>>>> opal_dss.unpack(buf, &dat->size, &count, OPAL_INT32); >>>>>> >>>>>> Modified: trunk/orte/mca/errmgr/app/errmgr_app.c >>>>>> ============================================================================== >>>>>> --- trunk/orte/mca/errmgr/app/errmgr_app.c (original) >>>>>> +++ trunk/orte/mca/errmgr/app/errmgr_app.c 2011-08-26 18:16:14 EDT >>>>>> (Fri, 26 Aug 2011) >>>>>> @@ -82,8 +82,10 @@ >>>>>> NULL, >>>>>> NULL, >>>>>> NULL, >>>>>> - orte_errmgr_base_register_migration_warning, >>>>>> - orte_errmgr_base_set_fault_callback >>>>>> + orte_errmgr_base_register_migration_warning >>>>>> +#if ORTE_RESIL_ORTE >>>>>> + ,orte_errmgr_base_set_fault_callback >>>>>> +#endif >>>>>> }; >>>>>> >>>>>> /************************ >>>>>> @@ -93,18 +95,23 @@ >>>>>> { >>>>>> int ret = ORTE_SUCCESS; >>>>>> >>>>>> +#if ORTE_RESIL_ORTE >>>>>> 
ret = orte_rml.recv_buffer_nb(ORTE_NAME_WILDCARD, >>>>>> ORTE_RML_TAG_EPOCH_CHANGE, >>>>>> ORTE_RML_PERSISTENT, >>>>>> epoch_change_recv, >>>>>> NULL); >>>>>> +#endif >>>>>> + >>>>>> return ret; >>>>>> } >>>>>> >>>>>> static int finalize(void) >>>>>> { >>>>>> +#if ORTE_RESIL_ORTE >>>>>> orte_rml.recv_cancel(ORTE_NAME_WILDCARD, >>>>>> ORTE_RML_TAG_EPOCH_CHANGE); >>>>>> +#endif >>>>>> >>>>>> return ORTE_SUCCESS; >>>>>> } >>>>>> @@ -151,6 +158,7 @@ >>>>>> return ORTE_SUCCESS; >>>>>> } >>>>>> >>>>>> +#if ORTE_RESIL_ORTE >>>>>> void epoch_change_recv(int status, >>>>>> orte_process_name_t *sender, >>>>>> opal_buffer_t *buffer, >>>>>> @@ -209,15 +217,20 @@ >>>>>> ORTE_NAME_PRINT(ORTE_PROC_MY_NAME))); >>>>>> >>>>>> (*fault_cbfunc)(procs); >>>>>> + } else if (NULL == fault_cbfunc) { >>>>>> + OPAL_OUTPUT_VERBOSE((1, orte_errmgr_base.output, >>>>>> + "%s errmgr:app Calling fault callback failed (NULL >>>>>> pointer)!", >>>>>> + ORTE_NAME_PRINT(ORTE_PROC_MY_NAME))); >>>>>> } else { >>>>>> OPAL_OUTPUT_VERBOSE((1, orte_errmgr_base.output, >>>>>> - "%s errmgr:app Calling fault callback failed!", >>>>>> + "%s errmgr:app Calling fault callback failed >>>>>> (num_dead <= 0)!", >>>>>> ORTE_NAME_PRINT(ORTE_PROC_MY_NAME))); >>>>>> } >>>>>> >>>>>> free(proc); >>>>>> OBJ_RELEASE(procs); >>>>>> } >>>>>> +#endif >>>>>> >>>>>> static int orte_errmgr_app_abort_peers(orte_process_name_t *procs, >>>>>> orte_std_cntr_t num_procs) >>>>>> { >>>>>> >>>>>> Modified: trunk/orte/mca/errmgr/base/errmgr_base_fns.c >>>>>> ============================================================================== >>>>>> --- trunk/orte/mca/errmgr/base/errmgr_base_fns.c (original) >>>>>> +++ trunk/orte/mca/errmgr/base/errmgr_base_fns.c 2011-08-26 18:16:14 EDT >>>>>> (Fri, 26 Aug 2011) >>>>>> @@ -97,13 +97,13 @@ >>>>>> { >>>>>> item->proc_name.vpid = ORTE_VPID_INVALID; >>>>>> item->proc_name.jobid = ORTE_JOBID_INVALID; >>>>>> - item->proc_name.epoch = ORTE_EPOCH_MIN; >>>>>> + 
ORTE_EPOCH_SET(item->proc_name.epoch,ORTE_EPOCH_MIN); >>>>>> } >>>>>> >>>>>> void orte_errmgr_predicted_proc_destruct( orte_errmgr_predicted_proc_t >>>>>> *item) >>>>>> { >>>>>> item->proc_name.vpid = ORTE_VPID_INVALID; >>>>>> - item->proc_name.epoch = ORTE_EPOCH_INVALID; >>>>>> + ORTE_EPOCH_SET(item->proc_name.epoch,ORTE_EPOCH_INVALID); >>>>>> item->proc_name.jobid = ORTE_JOBID_INVALID; >>>>>> } >>>>>> >>>>>> @@ -139,13 +139,13 @@ >>>>>> void orte_errmgr_predicted_map_construct(orte_errmgr_predicted_map_t >>>>>> *item) >>>>>> { >>>>>> item->proc_name.vpid = ORTE_VPID_INVALID; >>>>>> - item->proc_name.epoch = ORTE_EPOCH_MIN; >>>>>> + ORTE_EPOCH_SET(item->proc_name.epoch,ORTE_EPOCH_MIN); >>>>>> item->proc_name.jobid = ORTE_JOBID_INVALID; >>>>>> >>>>>> item->node_name = NULL; >>>>>> >>>>>> item->map_proc_name.vpid = ORTE_VPID_INVALID; >>>>>> - item->map_proc_name.epoch = ORTE_EPOCH_MIN; >>>>>> + ORTE_EPOCH_SET(item->map_proc_name.epoch,ORTE_EPOCH_MIN); >>>>>> item->map_proc_name.jobid = ORTE_JOBID_INVALID; >>>>>> >>>>>> item->map_node_name = NULL; >>>>>> @@ -156,7 +156,7 @@ >>>>>> void orte_errmgr_predicted_map_destruct( orte_errmgr_predicted_map_t >>>>>> *item) >>>>>> { >>>>>> item->proc_name.vpid = ORTE_VPID_INVALID; >>>>>> - item->proc_name.epoch = ORTE_EPOCH_INVALID; >>>>>> + ORTE_EPOCH_SET(item->proc_name.epoch,ORTE_EPOCH_INVALID); >>>>>> item->proc_name.jobid = ORTE_JOBID_INVALID; >>>>>> >>>>>> if( NULL != item->node_name ) { >>>>>> @@ -165,7 +165,7 @@ >>>>>> } >>>>>> >>>>>> item->map_proc_name.vpid = ORTE_VPID_INVALID; >>>>>> - item->map_proc_name.epoch = ORTE_EPOCH_INVALID; >>>>>> + ORTE_EPOCH_SET(item->map_proc_name.epoch,ORTE_EPOCH_INVALID); >>>>>> item->map_proc_name.jobid = ORTE_JOBID_INVALID; >>>>>> >>>>>> if( NULL != item->map_node_name ) { >>>>>> >>>>>> Modified: trunk/orte/mca/errmgr/base/errmgr_base_tool.c >>>>>> ============================================================================== >>>>>> --- trunk/orte/mca/errmgr/base/errmgr_base_tool.c 
(original) >>>>>> +++ trunk/orte/mca/errmgr/base/errmgr_base_tool.c 2011-08-26 >>>>>> 18:16:14 EDT (Fri, 26 Aug 2011) >>>>>> @@ -267,7 +267,7 @@ >>>>>> */ >>>>>> errmgr_cmdline_sender.jobid = ORTE_JOBID_INVALID; >>>>>> errmgr_cmdline_sender.vpid = ORTE_VPID_INVALID; >>>>>> - errmgr_cmdline_sender.epoch = ORTE_EPOCH_MIN; >>>>>> + ORTE_EPOCH_SET(errmgr_cmdline_sender.epoch,ORTE_EPOCH_MIN); >>>>>> if (ORTE_SUCCESS != (ret = orte_rml.recv_buffer_nb(ORTE_NAME_WILDCARD, >>>>>> ORTE_RML_TAG_MIGRATE, >>>>>> 0, >>>>>> @@ -379,14 +379,14 @@ >>>>>> if( OPAL_EQUAL != orte_util_compare_name_fields(ORTE_NS_CMP_ALL, >>>>>> ORTE_NAME_INVALID, &errmgr_cmdline_sender) ) { >>>>>> swap_dest.jobid = errmgr_cmdline_sender.jobid; >>>>>> swap_dest.vpid = errmgr_cmdline_sender.vpid; >>>>>> - swap_dest.epoch = errmgr_cmdline_sender.epoch; >>>>>> + ORTE_EPOCH_SET(swap_dest.epoch,errmgr_cmdline_sender.epoch); >>>>>> >>>>>> errmgr_cmdline_sender = *sender; >>>>>> >>>>>> orte_errmgr_base_migrate_update(ORTE_ERRMGR_MIGRATE_STATE_ERR_INPROGRESS); >>>>>> >>>>>> errmgr_cmdline_sender.jobid = swap_dest.jobid; >>>>>> errmgr_cmdline_sender.vpid = swap_dest.vpid; >>>>>> - errmgr_cmdline_sender.epoch = swap_dest.epoch; >>>>>> + ORTE_EPOCH_SET(errmgr_cmdline_sender.epoch,swap_dest.epoch); >>>>>> >>>>>> goto cleanup; >>>>>> } >>>>>> >>>>>> Modified: trunk/orte/mca/errmgr/hnp/errmgr_hnp.c >>>>>> ============================================================================== >>>>>> --- trunk/orte/mca/errmgr/hnp/errmgr_hnp.c (original) >>>>>> +++ trunk/orte/mca/errmgr/hnp/errmgr_hnp.c 2011-08-26 18:16:14 EDT >>>>>> (Fri, 26 Aug 2011) >>>>>> @@ -53,6 +53,7 @@ >>>>>> #include "orte/runtime/orte_globals.h" >>>>>> #include "orte/runtime/orte_locks.h" >>>>>> #include "orte/runtime/orte_quit.h" >>>>>> +#include "orte/runtime/data_type_support/orte_dt_support.h" >>>>>> >>>>>> #include "orte/mca/errmgr/errmgr.h" >>>>>> #include "orte/mca/errmgr/base/base.h" >>>>>> @@ -83,9 +84,11 @@ >>>>>> 
orte_errmgr_hnp_global_suggest_map_targets, >>>>>> /* FT Event hook */ >>>>>> orte_errmgr_hnp_global_ft_event, >>>>>> - orte_errmgr_base_register_migration_warning, >>>>>> + orte_errmgr_base_register_migration_warning >>>>>> +#if ORTE_RESIL_ORTE >>>>>> /* Set the callback */ >>>>>> - orte_errmgr_base_set_fault_callback >>>>>> + ,orte_errmgr_base_set_fault_callback >>>>>> +#endif >>>>>> }; >>>>>> >>>>>> >>>>>> @@ -97,14 +100,16 @@ >>>>>> static void update_local_procs_in_job(orte_job_t *jdata, >>>>>> orte_job_state_t jobstate, >>>>>> orte_proc_state_t state, >>>>>> orte_exit_code_t exit_code); >>>>>> static void check_job_complete(orte_job_t *jdata); >>>>>> -static void killprocs(orte_jobid_t job, orte_vpid_t vpid, orte_epoch_t >>>>>> epoch); >>>>>> +static void killprocs(orte_jobid_t job, orte_vpid_t vpid); >>>>>> static int hnp_relocate(orte_job_t *jdata, orte_process_name_t *proc, >>>>>> orte_proc_state_t state, orte_exit_code_t exit_code); >>>>>> static orte_odls_child_t* proc_is_local(orte_process_name_t *proc); >>>>>> +#if ORTE_RESIL_ORTE >>>>>> static int send_to_local_applications(opal_pointer_array_t *dead_names); >>>>>> static void failure_notification(int status, orte_process_name_t* sender, >>>>>> opal_buffer_t *buffer, orte_rml_tag_t tag, >>>>>> void* cbdata); >>>>>> +#endif >>>>>> >>>>>> /************************ >>>>>> * API Definitions >>>>>> @@ -380,16 +385,21 @@ >>>>>> **********************/ >>>>>> int orte_errmgr_hnp_base_global_init(void) >>>>>> { >>>>>> - int ret; >>>>>> + int ret = ORTE_SUCCESS; >>>>>> >>>>>> +#if ORTE_RESIL_ORTE >>>>>> ret = orte_rml.recv_buffer_nb(ORTE_NAME_WILDCARD, >>>>>> ORTE_RML_TAG_FAILURE_NOTICE, >>>>>> ORTE_RML_PERSISTENT, failure_notification, >>>>>> NULL); >>>>>> +#endif >>>>>> + >>>>>> return ret; >>>>>> } >>>>>> >>>>>> int orte_errmgr_hnp_base_global_finalize(void) >>>>>> { >>>>>> +#if ORTE_RESIL_ORTE >>>>>> orte_rml.recv_cancel(ORTE_NAME_WILDCARD, ORTE_RML_TAG_FAILURE_NOTICE); >>>>>> +#endif >>>>>> >>>>>> return 
ORTE_SUCCESS; >>>>>> } >>>>>> @@ -406,6 +416,7 @@ >>>>>> orte_odls_child_t *child; >>>>>> int rc; >>>>>> orte_app_context_t *app; >>>>>> + orte_proc_t *pdat; >>>>>> >>>>>> OPAL_OUTPUT_VERBOSE((1, orte_errmgr_base.output, >>>>>> "%s errmgr:hnp: job %s reported state %s" >>>>>> @@ -538,7 +549,7 @@ >>>>>> ORTE_PROC_STATE_SENSOR_BOUND_EXCEEDED, >>>>>> exit_code); >>>>>> /* order all local procs for this job to be killed */ >>>>>> - killprocs(jdata->jobid, ORTE_VPID_WILDCARD, >>>>>> ORTE_EPOCH_WILDCARD); >>>>>> + killprocs(jdata->jobid, ORTE_VPID_WILDCARD); >>>>>> check_job_complete(jdata); /* set the local proc states */ >>>>>> /* the job object for this job will have been NULL'd >>>>>> * in the array if the job was solely local. If it isn't >>>>>> @@ -550,7 +561,7 @@ >>>>>> break; >>>>>> case ORTE_JOB_STATE_COMM_FAILED: >>>>>> /* order all local procs for this job to be killed */ >>>>>> - killprocs(jdata->jobid, ORTE_VPID_WILDCARD, >>>>>> ORTE_EPOCH_WILDCARD); >>>>>> + killprocs(jdata->jobid, ORTE_VPID_WILDCARD); >>>>>> check_job_complete(jdata); /* set the local proc states */ >>>>>> /* the job object for this job will have been NULL'd >>>>>> * in the array if the job was solely local. If it isn't >>>>>> @@ -562,7 +573,7 @@ >>>>>> break; >>>>>> case ORTE_JOB_STATE_HEARTBEAT_FAILED: >>>>>> /* order all local procs for this job to be killed */ >>>>>> - killprocs(jdata->jobid, ORTE_VPID_WILDCARD, >>>>>> ORTE_EPOCH_WILDCARD); >>>>>> + killprocs(jdata->jobid, ORTE_VPID_WILDCARD); >>>>>> check_job_complete(jdata); /* set the local proc states */ >>>>>> /* the job object for this job will have been NULL'd >>>>>> * in the array if the job was solely local. 
If it isn't >>>>>> @@ -632,10 +643,6 @@ >>>>>> } >>>>>> } >>>>>> >>>>>> - if (ORTE_PROC_STATE_ABORTED_BY_SIG == state) { >>>>>> - exit_code = 0; >>>>>> - } >>>>>> - >>>>>> orte_errmgr_hnp_update_proc(jdata, proc, state, pid, exit_code); >>>>>> check_job_complete(jdata); /* need to set the job state */ >>>>>> /* the job object for this job will have been NULL'd >>>>>> @@ -679,7 +686,7 @@ >>>>>> >>>>>> case ORTE_PROC_STATE_SENSOR_BOUND_EXCEEDED: >>>>>> if (jdata->enable_recovery) { >>>>>> - killprocs(proc->jobid, proc->vpid, proc->epoch); >>>>>> + killprocs(proc->jobid, proc->vpid); >>>>>> /* is this a local proc */ >>>>>> if (NULL != (child = proc_is_local(proc))) { >>>>>> /* local proc - see if it has reached its restart limit */ >>>>>> @@ -778,18 +785,37 @@ >>>>>> opal_output(0, "%s UNABLE TO RELOCATE PROCS FROM >>>>>> FAILED DAEMON %s", >>>>>> ORTE_NAME_PRINT(ORTE_PROC_MY_NAME), >>>>>> ORTE_NAME_PRINT(proc)); >>>>>> /* kill all local procs */ >>>>>> - killprocs(ORTE_JOBID_WILDCARD, >>>>>> ORTE_VPID_WILDCARD, ORTE_EPOCH_WILDCARD); >>>>>> + killprocs(ORTE_JOBID_WILDCARD, >>>>>> ORTE_VPID_WILDCARD); >>>>>> /* kill all jobs */ >>>>>> hnp_abort(ORTE_JOBID_WILDCARD, exit_code); >>>>>> /* check if all is complete so we can terminate */ >>>>>> check_job_complete(jdata); >>>>>> } >>>>>> } else { >>>>>> +#if !ORTE_RESIL_ORTE >>>>>> + if (NULL == (pdat = >>>>>> (orte_proc_t*)opal_pointer_array_get_item(jdata->procs, proc->vpid))) { >>>>>> + ORTE_ERROR_LOG(ORTE_ERR_NOT_FOUND); >>>>>> + orte_show_help("help-orte-errmgr-hnp.txt", >>>>>> "errmgr-hnp:daemon-died", true, >>>>>> + ORTE_VPID_PRINT(proc->vpid), >>>>>> "Unknown"); >>>>>> + } else { >>>>>> + orte_show_help("help-orte-errmgr-hnp.txt", >>>>>> "errmgr-hnp:daemon-died", true, >>>>>> + ORTE_VPID_PRINT(proc->vpid), >>>>>> + (NULL == pdat->node) ? "Unknown" >>>>>> : >>>>>> + ((NULL == pdat->node->name) ? 
>>>>>> "Unknown" : pdat->node->name)); >>>>>> + } >>>>>> +#endif >>>>>> if (ORTE_SUCCESS != >>>>>> orte_errmgr_hnp_record_dead_process(proc)) { >>>>>> /* The process is already dead so don't keep trying >>>>>> to do >>>>>> * this stuff. */ >>>>>> return ORTE_SUCCESS; >>>>>> } >>>>>> + >>>>>> +#if !ORTE_RESIL_ORTE >>>>>> + /* kill all local procs */ >>>>>> + killprocs(ORTE_JOBID_WILDCARD, ORTE_VPID_WILDCARD); >>>>>> + /* kill all jobs */ >>>>>> + hnp_abort(ORTE_JOBID_WILDCARD, exit_code); >>>>>> +#endif >>>>>> /* We'll check if the job was complete when we get the >>>>>> * message back from the HNP notifying us of the dead >>>>>> * process */ >>>>>> @@ -805,7 +831,7 @@ >>>>>> } else { >>>>>> orte_errmgr_hnp_record_dead_process(proc); >>>>>> /* kill all local procs */ >>>>>> - killprocs(ORTE_JOBID_WILDCARD, ORTE_VPID_WILDCARD, >>>>>> ORTE_EPOCH_WILDCARD); >>>>>> + killprocs(ORTE_JOBID_WILDCARD, ORTE_VPID_WILDCARD); >>>>>> /* kill all jobs */ >>>>>> hnp_abort(ORTE_JOBID_WILDCARD, exit_code); >>>>>> return ORTE_ERR_UNRECOVERABLE; >>>>>> @@ -824,6 +850,7 @@ >>>>>> return ORTE_SUCCESS; >>>>>> } >>>>>> >>>>>> +#if ORTE_RESIL_ORTE >>>>>> static void failure_notification(int status, orte_process_name_t* sender, >>>>>> opal_buffer_t *buffer, orte_rml_tag_t tag, >>>>>> void* cbdata) >>>>>> @@ -984,6 +1011,7 @@ >>>>>> >>>>>> OBJ_RELEASE(dead_names); >>>>>> } >>>>>> +#endif >>>>>> >>>>>> /***************** >>>>>> * Local Functions >>>>>> @@ -1354,7 +1382,6 @@ >>>>>> ORTE_UPDATE_EXIT_STATUS(proc->exit_code); >>>>>> } >>>>>> break; >>>>>> -#if 0 >>>>>> case ORTE_PROC_STATE_ABORTED_BY_SIG: >>>>>> OPAL_OUTPUT_VERBOSE((5, orte_errmgr_base.output, >>>>>> "%s errmgr:hnp:check_job_completed proc %s >>>>>> aborted by signal", >>>>>> @@ -1370,7 +1397,6 @@ >>>>>> ORTE_UPDATE_EXIT_STATUS(proc->exit_code); >>>>>> } >>>>>> break; >>>>>> -#endif >>>>>> case ORTE_PROC_STATE_TERM_WO_SYNC: >>>>>> OPAL_OUTPUT_VERBOSE((5, orte_errmgr_base.output, >>>>>> "%s errmgr:hnp:check_job_completed proc %s 
>>>>>> terminated without sync", >>>>>> @@ -1393,7 +1419,6 @@ >>>>>> } >>>>>> break; >>>>>> case ORTE_PROC_STATE_COMM_FAILED: >>>>>> -#if 1 >>>>>> if (!jdata->abort) { >>>>>> jdata->state = ORTE_JOB_STATE_COMM_FAILED; >>>>>> /* point to the lowest rank to cause the problem */ >>>>>> @@ -1403,7 +1428,6 @@ >>>>>> jdata->abort = true; >>>>>> ORTE_UPDATE_EXIT_STATUS(proc->exit_code); >>>>>> } >>>>>> -#endif >>>>>> break; >>>>>> case ORTE_PROC_STATE_SENSOR_BOUND_EXCEEDED: >>>>>> if (!jdata->abort) { >>>>>> @@ -1530,9 +1554,6 @@ >>>>>> */ >>>>>> CHECK_DAEMONS: >>>>>> if (jdata == NULL || jdata->jobid == ORTE_PROC_MY_NAME->jobid) { >>>>>> -#if 0 >>>>>> - if ((jdata->num_procs - 1) <= jdata->num_terminated) { /* >>>>>> Subtract one for the HNP */ >>>>>> -#endif >>>>>> if (0 == orte_routed.num_routes()) { >>>>>> /* orteds are done! */ >>>>>> OPAL_OUTPUT_VERBOSE((5, orte_errmgr_base.output, >>>>>> @@ -1696,7 +1717,7 @@ >>>>>> } >>>>>> } >>>>>> >>>>>> -static void killprocs(orte_jobid_t job, orte_vpid_t vpid, orte_epoch_t >>>>>> epoch) >>>>>> +static void killprocs(orte_jobid_t job, orte_vpid_t vpid) >>>>>> { >>>>>> opal_pointer_array_t cmd; >>>>>> orte_proc_t proc; >>>>>> @@ -1707,7 +1728,9 @@ >>>>>> orte_sensor.stop(job); >>>>>> } >>>>>> >>>>>> - if (ORTE_JOBID_WILDCARD == job && ORTE_VPID_WILDCARD == vpid && >>>>>> ORTE_EPOCH_WILDCARD == epoch) { >>>>>> + if (ORTE_JOBID_WILDCARD == job >>>>>> + && ORTE_VPID_WILDCARD == vpid >>>>>> + && ORTE_EPOCH_CMP(ORTE_EPOCH_WILDCARD,epoch)) { >>>>>> if (ORTE_SUCCESS != (rc = orte_odls.kill_local_procs(NULL))) { >>>>>> ORTE_ERROR_LOG(rc); >>>>>> } >>>>>> @@ -1718,7 +1741,7 @@ >>>>>> OBJ_CONSTRUCT(&proc, orte_proc_t); >>>>>> proc.name.jobid = job; >>>>>> proc.name.vpid = vpid; >>>>>> - proc.name.epoch = epoch; >>>>>> + ORTE_EPOCH_SET(proc.name.epoch,epoch); >>>>>> opal_pointer_array_add(&cmd, &proc); >>>>>> if (ORTE_SUCCESS != (rc = orte_odls.kill_local_procs(&cmd))) { >>>>>> ORTE_ERROR_LOG(rc); >>>>>> @@ -1913,13 +1936,15 @@ >>>>>> } 
>>>>>> >>>>>> if (NULL != (pdat = >>>>>> (orte_proc_t*)opal_pointer_array_get_item(jdat->procs, proc->vpid)) && >>>>>> - ORTE_PROC_STATE_TERMINATED < pdat->state) { >>>>>> + ORTE_PROC_STATE_TERMINATED > pdat->state) { >>>>>> >>>>>> +#if ORTE_ENABLE_EPOCH >>>>>> /* Make sure that the epochs match. */ >>>>>> if (proc->epoch != pdat->name.epoch) { >>>>>> opal_output(1, "The epoch does not match the current epoch. >>>>>> Throwing the request out."); >>>>>> return ORTE_SUCCESS; >>>>>> } >>>>>> +#endif >>>>>> >>>>>> dead_names = OBJ_NEW(opal_pointer_array_t); >>>>>> >>>>>> @@ -1935,6 +1960,7 @@ >>>>>> } >>>>>> } >>>>>> >>>>>> +#if ORTE_RESIL_ORTE >>>>>> if (!mca_errmgr_hnp_component.term_in_progress) { >>>>>> /* >>>>>> * Send a message to the other daemons so they know that a daemon >>>>>> has >>>>>> @@ -1949,7 +1975,7 @@ >>>>>> OBJ_RELEASE(buffer); >>>>>> } else { >>>>>> >>>>>> - /* Iterate of the list of dead procs and send them >>>>>> along with >>>>>> + /* Iterate over the list of dead procs and send them >>>>>> along with >>>>>> * the rest. The HNP needs this info so it can tell the other >>>>>> * ORTEDs and they can inform the appropriate applications. >>>>>> */ >>>>>> @@ -1973,6 +1999,9 @@ >>>>>> } else { >>>>>> orte_errmgr_hnp_global_mark_processes_as_dead(dead_names); >>>>>> } >>>>>> +#else >>>>>> + orte_errmgr_hnp_global_mark_processes_as_dead(dead_names); >>>>>> +#endif >>>>>> } >>>>>> >>>>>> return ORTE_SUCCESS; >>>>>> @@ -2011,6 +2040,7 @@ >>>>>> ORTE_NAME_PRINT(ORTE_PROC_MY_NAME), >>>>>> ORTE_NAME_PRINT(&pdat->name))); >>>>>> >>>>>> +#if ORTE_RESIL_ORTE >>>>>> /* Make sure the epochs match, if not it probably means that we >>>>>> * already reported this failure. 
*/ >>>>>> if (name_item->epoch != pdat->name.epoch) { >>>>>> @@ -2018,6 +2048,7 @@ >>>>>> } >>>>>> >>>>>> orte_util_set_epoch(name_item, name_item->epoch + 1); >>>>>> +#endif >>>>>> >>>>>> /* Remove it from the job array */ >>>>>> opal_pointer_array_set_item(jdat->procs, name_item->vpid, NULL); >>>>>> @@ -2034,6 +2065,7 @@ >>>>>> >>>>>> OBJ_RELEASE(pdat); >>>>>> >>>>>> +#if ORTE_RESIL_ORTE >>>>>> /* Create a new proc object that will keep track of the epoch >>>>>> * information */ >>>>>> pdat = OBJ_NEW(orte_proc_t); >>>>>> @@ -2041,14 +2073,15 @@ >>>>>> pdat->name.vpid = name_item->vpid; >>>>>> pdat->name.epoch = name_item->epoch + 1; >>>>>> >>>>>> - /* Set the state as terminated so we'll know the process >>>>>> isn't >>>>>> - * actually there. */ >>>>>> - pdat->state = ORTE_PROC_STATE_TERMINATED; >>>>>> - >>>>>> opal_pointer_array_set_item(jdat->procs, name_item->vpid, pdat); >>>>>> jdat->num_procs++; >>>>>> jdat->num_terminated++; >>>>>> +#endif >>>>>> + /* Set the state as terminated so we'll know the process >>>>>> isn't >>>>>> + * actually there. */ >>>>>> + pdat->state = ORTE_PROC_STATE_TERMINATED; >>>>>> } else { >>>>>> +#if ORTE_RESIL_ORTE >>>>>> opal_output(0, "Proc data not found for %s", >>>>>> ORTE_NAME_PRINT(name_item)); >>>>>> /* Create a new proc object that will keep track of the epoch >>>>>> * information */ >>>>>> @@ -2064,11 +2097,13 @@ >>>>>> opal_pointer_array_set_item(jdat->procs, name_item->vpid, pdat); >>>>>> jdat->num_procs++; >>>>>> jdat->num_terminated++; >>>>>> +#endif >>>>>> } >>>>>> >>>>>> check_job_complete(jdat); >>>>>> } >>>>>> >>>>>> +#if ORTE_RESIL_ORTE >>>>>> if (!orte_orteds_term_ordered) { >>>>>> /* Need to update the orted routing module. 
*/ >>>>>> orte_routed.update_routing_tree(ORTE_PROC_MY_NAME->jobid); >>>>>> @@ -2077,10 +2112,12 @@ >>>>>> (*fault_cbfunc)(dead_procs); >>>>>> } >>>>>> } >>>>>> +#endif >>>>>> >>>>>> return ORTE_SUCCESS; >>>>>> } >>>>>> >>>>>> +#if ORTE_RESIL_ORTE >>>>>> int send_to_local_applications(opal_pointer_array_t *dead_names) { >>>>>> opal_buffer_t *buf; >>>>>> int ret = ORTE_SUCCESS; >>>>>> @@ -2121,3 +2158,5 @@ >>>>>> >>>>>> return ret; >>>>>> } >>>>>> +#endif >>>>>> + >>>>>> >>>>>> Modified: trunk/orte/mca/errmgr/hnp/errmgr_hnp_autor.c >>>>>> ============================================================================== >>>>>> --- trunk/orte/mca/errmgr/hnp/errmgr_hnp_autor.c (original) >>>>>> +++ trunk/orte/mca/errmgr/hnp/errmgr_hnp_autor.c 2011-08-26 18:16:14 EDT >>>>>> (Fri, 26 Aug 2011) >>>>>> @@ -522,7 +522,7 @@ >>>>>> wp_item = OBJ_NEW(errmgr_autor_wp_item_t); >>>>>> wp_item->name.jobid = proc->jobid; >>>>>> wp_item->name.vpid = proc->vpid; >>>>>> - wp_item->name.epoch = proc->epoch; >>>>>> + ORTE_EPOCH_SET(wp_item->name.epoch,proc->epoch); >>>>>> wp_item->state = state; >>>>>> >>>>>> opal_list_append(procs_pending_recovery, &(wp_item->super)); >>>>>> @@ -626,7 +626,7 @@ >>>>>> { >>>>>> wp->name.jobid = ORTE_JOBID_INVALID; >>>>>> wp->name.vpid = ORTE_VPID_INVALID; >>>>>> - wp->name.epoch = ORTE_EPOCH_MIN; >>>>>> + ORTE_EPOCH_SET(wp->name.epoch,ORTE_EPOCH_MIN); >>>>>> >>>>>> wp->state = 0; >>>>>> } >>>>>> @@ -635,7 +635,7 @@ >>>>>> { >>>>>> wp->name.jobid = ORTE_JOBID_INVALID; >>>>>> wp->name.vpid = ORTE_VPID_INVALID; >>>>>> - wp->name.epoch = ORTE_EPOCH_INVALID; >>>>>> + ORTE_EPOCH_SET(wp->name.epoch,ORTE_EPOCH_INVALID); >>>>>> >>>>>> wp->state = 0; >>>>>> } >>>>>> >>>>>> Modified: trunk/orte/mca/errmgr/hnp/errmgr_hnp_crmig.c >>>>>> ============================================================================== >>>>>> --- trunk/orte/mca/errmgr/hnp/errmgr_hnp_crmig.c (original) >>>>>> +++ trunk/orte/mca/errmgr/hnp/errmgr_hnp_crmig.c 2011-08-26 18:16:14 EDT >>>>>> 
(Fri, 26 Aug 2011) >>>>>> @@ -750,7 +750,7 @@ >>>>>> close_iof_stdin = true; >>>>>> iof_name.jobid = proc->name.jobid; >>>>>> iof_name.vpid = proc->name.vpid; >>>>>> - iof_name.epoch = proc->name.epoch; >>>>>> + ORTE_EPOCH_SET(iof_name.epoch,proc->name.epoch); >>>>>> } >>>>>> } >>>>>> } >>>>>> @@ -807,7 +807,7 @@ >>>>>> close_iof_stdin = true; >>>>>> iof_name.jobid = proc->name.jobid; >>>>>> iof_name.vpid = proc->name.vpid; >>>>>> - iof_name.epoch = proc->name.epoch; >>>>>> + ORTE_EPOCH_SET(iof_name.epoch,proc->name.epoch); >>>>>> } >>>>>> } >>>>>> } >>>>>> @@ -855,7 +855,7 @@ >>>>>> close_iof_stdin = true; >>>>>> iof_name.jobid = proc->name.jobid; >>>>>> iof_name.vpid = proc->name.vpid; >>>>>> - iof_name.epoch = proc->name.epoch; >>>>>> + ORTE_EPOCH_SET(iof_name.epoch,proc->name.epoch); >>>>>> } >>>>>> } >>>>>> } >>>>>> >>>>>> Modified: trunk/orte/mca/errmgr/orted/errmgr_orted.c >>>>>> ============================================================================== >>>>>> --- trunk/orte/mca/errmgr/orted/errmgr_orted.c (original) >>>>>> +++ trunk/orte/mca/errmgr/orted/errmgr_orted.c 2011-08-26 18:16:14 EDT >>>>>> (Fri, 26 Aug 2011) >>>>>> @@ -34,6 +34,7 @@ >>>>>> #include "orte/util/show_help.h" >>>>>> #include "orte/util/nidmap.h" >>>>>> #include "orte/runtime/orte_globals.h" >>>>>> +#include "orte/runtime/data_type_support/orte_dt_support.h" >>>>>> #include "orte/mca/rml/rml.h" >>>>>> #include "orte/mca/odls/odls.h" >>>>>> #include "orte/mca/odls/base/base.h" >>>>>> @@ -41,7 +42,9 @@ >>>>>> #include "orte/mca/plm/plm_types.h" >>>>>> #include "orte/mca/routed/routed.h" >>>>>> #include "orte/mca/sensor/sensor.h" >>>>>> +#include "orte/mca/ess/ess.h" >>>>>> #include "orte/runtime/orte_quit.h" >>>>>> +#include "orte/runtime/orte_globals.h" >>>>>> >>>>>> #include "orte/mca/errmgr/errmgr.h" >>>>>> #include "orte/mca/errmgr/base/base.h" >>>>>> @@ -59,13 +62,15 @@ >>>>>> static void update_local_children(orte_odls_job_t *jobdat, >>>>>> orte_job_state_t jobstate, >>>>>> 
orte_proc_state_t state); >>>>>> -static void killprocs(orte_jobid_t job, orte_vpid_t vpid, orte_epoch_t >>>>>> epoch); >>>>>> +static void killprocs(orte_jobid_t job, orte_vpid_t vpid); >>>>>> static int record_dead_process(orte_process_name_t *proc); >>>>>> -static int send_to_local_applications(opal_pointer_array_t *dead_names); >>>>>> static int mark_processes_as_dead(opal_pointer_array_t *dead_procs); >>>>>> +#if ORTE_RESIL_ORTE >>>>>> +static int send_to_local_applications(opal_pointer_array_t *dead_names); >>>>>> static void failure_notification(int status, orte_process_name_t* sender, >>>>>> opal_buffer_t *buffer, orte_rml_tag_t tag, >>>>>> void* cbdata); >>>>>> +#endif >>>>>> >>>>>> /* >>>>>> * Module functions: Global >>>>>> @@ -104,8 +109,10 @@ >>>>>> predicted_fault, >>>>>> suggest_map_targets, >>>>>> ft_event, >>>>>> - orte_errmgr_base_register_migration_warning, >>>>>> - orte_errmgr_base_set_fault_callback /* Set callback function */ >>>>>> + orte_errmgr_base_register_migration_warning >>>>>> +#if ORTE_RESIL_ORTE >>>>>> + ,orte_errmgr_base_set_fault_callback /* Set callback function */ >>>>>> +#endif >>>>>> }; >>>>>> >>>>>> /************************ >>>>>> @@ -113,16 +120,22 @@ >>>>>> ************************/ >>>>>> static int init(void) >>>>>> { >>>>>> - int ret; >>>>>> + int ret = ORTE_SUCCESS; >>>>>> >>>>>> +#if ORTE_RESIL_ORTE >>>>>> ret = orte_rml.recv_buffer_nb(ORTE_NAME_WILDCARD, >>>>>> ORTE_RML_TAG_FAILURE_NOTICE, >>>>>> ORTE_RML_PERSISTENT, failure_notification, >>>>>> NULL); >>>>>> +#endif >>>>>> + >>>>>> return ret; >>>>>> } >>>>>> >>>>>> static int finalize(void) >>>>>> { >>>>>> +#if ORTE_RESIL_ORTE >>>>>> orte_rml.recv_cancel(ORTE_NAME_WILDCARD, ORTE_RML_TAG_FAILURE_NOTICE); >>>>>> +#endif >>>>>> + >>>>>> return ORTE_SUCCESS; >>>>>> } >>>>>> >>>>>> @@ -228,10 +241,10 @@ >>>>>> /* update all procs in job */ >>>>>> update_local_children(jobdat, jobstate, >>>>>> ORTE_PROC_STATE_SENSOR_BOUND_EXCEEDED); >>>>>> /* order all local procs for 
this job to be killed */ >>>>>> - killprocs(jobdat->jobid, ORTE_VPID_WILDCARD, >>>>>> ORTE_EPOCH_WILDCARD); >>>>>> + killprocs(jobdat->jobid, ORTE_VPID_WILDCARD); >>>>>> case ORTE_JOB_STATE_COMM_FAILED: >>>>>> /* kill all local procs */ >>>>>> - killprocs(ORTE_JOBID_WILDCARD, ORTE_VPID_WILDCARD, >>>>>> ORTE_EPOCH_WILDCARD); >>>>>> + killprocs(ORTE_JOBID_WILDCARD, ORTE_VPID_WILDCARD); >>>>>> /* tell the caller we can't recover */ >>>>>> return ORTE_ERR_UNRECOVERABLE; >>>>>> break; >>>>>> @@ -276,7 +289,7 @@ >>>>>> /* see if this was a lifeline */ >>>>>> if (ORTE_SUCCESS != orte_routed.route_lost(proc)) { >>>>>> /* kill our children */ >>>>>> - killprocs(ORTE_JOBID_WILDCARD, ORTE_VPID_WILDCARD, >>>>>> ORTE_EPOCH_WILDCARD); >>>>>> + killprocs(ORTE_JOBID_WILDCARD, ORTE_VPID_WILDCARD); >>>>>> /* terminate - our routed children will see >>>>>> * us leave and automatically die >>>>>> */ >>>>>> @@ -290,10 +303,18 @@ >>>>>> if (0 == orte_routed.num_routes() && >>>>>> 0 == opal_list_get_size(&orte_local_children)) { >>>>>> orte_quit(); >>>>>> + } else { >>>>>> + OPAL_OUTPUT_VERBOSE((5, orte_errmgr_base.output, >>>>>> + "%s errmgr:orted not exiting, num_routes() >>>>>> == %d, num children == %d", >>>>>> + ORTE_NAME_PRINT(ORTE_PROC_MY_NAME), >>>>>> + orte_routed.num_routes(), >>>>>> + opal_list_get_size(&orte_local_children))); >>>>>> } >>>>>> } >>>>>> >>>>>> +#if ORTE_RESIL_ORTE >>>>>> record_dead_process(proc); >>>>>> +#endif >>>>>> >>>>>> /* if not, then indicate we can continue */ >>>>>> return ORTE_SUCCESS; >>>>>> @@ -344,7 +365,7 @@ >>>>>> /* Decrement the number of local procs */ >>>>>> jobdat->num_local_procs--; >>>>>> /* kill this proc */ >>>>>> - killprocs(proc->jobid, proc->vpid, proc->epoch); >>>>>> + killprocs(proc->jobid, proc->vpid); >>>>>> } >>>>>> app = >>>>>> (orte_app_context_t*)opal_pointer_array_get_item(&jobdat->apps, >>>>>> child->app_idx); >>>>>> if( jobdat->enable_recovery && child->restarts < >>>>>> app->max_restarts ) { >>>>>> @@ -526,10 +547,12 @@ 
>>>>>> ORTE_ERROR_LOG(rc); >>>>>> goto FINAL_CLEANUP; >>>>>> } >>>>>> +#if ORTE_ENABLE_EPOCH >>>>>> if (ORTE_SUCCESS != (rc = opal_dss.pack(alert, >>>>>> &child->name->epoch, 1, ORTE_EPOCH))) { >>>>>> ORTE_ERROR_LOG(rc); >>>>>> goto FINAL_CLEANUP; >>>>>> } >>>>>> +#endif >>>>>> } >>>>>> } >>>>>> /* pack an invalid marker */ >>>>>> @@ -660,7 +683,7 @@ >>>>>> continue; >>>>>> } >>>>>> >>>>>> - if (name_item->epoch < orte_util_lookup_epoch(name_item)) { >>>>>> + if (0 < >>>>>> ORTE_EPOCH_CMP(name_item->epoch,orte_ess.proc_get_epoch(name_item))) { >>>>>> continue; >>>>>> } >>>>>> >>>>>> @@ -669,9 +692,11 @@ >>>>>> ORTE_NAME_PRINT(ORTE_PROC_MY_NAME), >>>>>> ORTE_NAME_PRINT(name_item))); >>>>>> >>>>>> +#if ORTE_ENABLE_EPOCH >>>>>> /* Increment the epoch */ >>>>>> orte_util_set_proc_state(name_item, ORTE_PROC_STATE_TERMINATED); >>>>>> orte_util_set_epoch(name_item, name_item->epoch + 1); >>>>>> +#endif >>>>>> >>>>>> OPAL_THREAD_LOCK(&orte_odls_globals.mutex); >>>>>> >>>>>> @@ -706,6 +731,7 @@ >>>>>> return ORTE_SUCCESS; >>>>>> } >>>>>> >>>>>> +#if ORTE_RESIL_ORTE >>>>>> static void failure_notification(int status, orte_process_name_t* sender, >>>>>> opal_buffer_t *buffer, orte_rml_tag_t tag, >>>>>> void* cbdata) >>>>>> @@ -714,7 +740,7 @@ >>>>>> orte_std_cntr_t n; >>>>>> int ret = ORTE_SUCCESS, num_failed; >>>>>> int32_t i; >>>>>> - orte_process_name_t *name_item, proc; >>>>>> + orte_process_name_t *name_item; >>>>>> >>>>>> dead_names = OBJ_NEW(opal_pointer_array_t); >>>>>> >>>>>> @@ -746,7 +772,7 @@ >>>>>> /* There shouldn't be an issue of receiving this message multiple >>>>>> * times but it doesn't hurt to double check. 
>>>>>> */ >>>>>> - if (proc.epoch < orte_util_lookup_epoch(name_item)) { >>>>>> + if (0 < >>>>>> ORTE_EPOCH_CMP(name_item->epoch,orte_ess.proc_get_epoch(name_item))) { >>>>>> opal_output(1, "Received from proc %s local epoch %d", >>>>>> ORTE_NAME_PRINT(name_item), orte_util_lookup_epoch(name_item)); >>>>>> continue; >>>>>> } >>>>>> @@ -767,6 +793,7 @@ >>>>>> free(name_item); >>>>>> } >>>>>> } >>>>>> +#endif >>>>>> >>>>>> /***************** >>>>>> * Local Functions >>>>>> @@ -948,11 +975,13 @@ >>>>>> ORTE_ERROR_LOG(rc); >>>>>> return rc; >>>>>> } >>>>>> +#if ORTE_ENABLE_EPOCH >>>>>> /* Pack the child's epoch. */ >>>>>> if (ORTE_SUCCESS != (rc = opal_dss.pack(buf, >>>>>> &(child->name->epoch), 1, ORTE_EPOCH))) { >>>>>> ORTE_ERROR_LOG(rc); >>>>>> return rc; >>>>>> } >>>>>> +#endif >>>>>> /* pack the contact info */ >>>>>> if (ORTE_SUCCESS != (rc = opal_dss.pack(buf, &child->rml_uri, 1, >>>>>> OPAL_STRING))) { >>>>>> ORTE_ERROR_LOG(rc); >>>>>> @@ -1015,7 +1044,7 @@ >>>>>> } >>>>>> } >>>>>> >>>>>> -static void killprocs(orte_jobid_t job, orte_vpid_t vpid, orte_epoch_t >>>>>> epoch) >>>>>> +static void killprocs(orte_jobid_t job, orte_vpid_t vpid) >>>>>> { >>>>>> opal_pointer_array_t cmd; >>>>>> orte_proc_t proc; >>>>>> @@ -1026,7 +1055,9 @@ >>>>>> orte_sensor.stop(job); >>>>>> } >>>>>> >>>>>> - if (ORTE_JOBID_WILDCARD == job && ORTE_VPID_WILDCARD == vpid && >>>>>> ORTE_EPOCH_WILDCARD == epoch) { >>>>>> + if (ORTE_JOBID_WILDCARD == job >>>>>> + && ORTE_VPID_WILDCARD == vpid >>>>>> + && 0 == ORTE_EPOCH_CMP(ORTE_EPOCH_WILDCARD,epoch)) { >>>>>> if (ORTE_SUCCESS != (rc = orte_odls.kill_local_procs(NULL))) { >>>>>> ORTE_ERROR_LOG(rc); >>>>>> } >>>>>> @@ -1037,7 +1068,7 @@ >>>>>> OBJ_CONSTRUCT(&proc, orte_proc_t); >>>>>> proc.name.jobid = job; >>>>>> proc.name.vpid = vpid; >>>>>> - proc.name.epoch = epoch; >>>>>> + ORTE_EPOCH_SET(proc.name.epoch,epoch); >>>>>> opal_pointer_array_add(&cmd, &proc); >>>>>> if (ORTE_SUCCESS != (rc = orte_odls.kill_local_procs(&cmd))) { >>>>>> 
ORTE_ERROR_LOG(rc); >>>>>> @@ -1082,20 +1113,21 @@ >>>>>> return rc; >>>>>> } >>>>>> >>>>>> +#if ORTE_RESIL_ORTE >>>>>> int send_to_local_applications(opal_pointer_array_t *dead_names) { >>>>>> opal_buffer_t *buf; >>>>>> int ret; >>>>>> orte_process_name_t *name_item; >>>>>> int size, i; >>>>>> >>>>>> - OPAL_OUTPUT_VERBOSE((10, orte_errmgr_base.output, >>>>>> - "%s Sending failure to local applications.", >>>>>> - ORTE_NAME_PRINT(ORTE_PROC_MY_NAME))); >>>>>> - >>>>>> buf = OBJ_NEW(opal_buffer_t); >>>>>> >>>>>> size = opal_pointer_array_get_size(dead_names); >>>>>> >>>>>> + OPAL_OUTPUT_VERBOSE((10, orte_errmgr_base.output, >>>>>> + "%s Sending %d failure(s) to local applications.", >>>>>> + ORTE_NAME_PRINT(ORTE_PROC_MY_NAME), size)); >>>>>> + >>>>>> if (ORTE_SUCCESS != (ret = opal_dss.pack(buf, &size, 1, ORTE_VPID))) { >>>>>> ORTE_ERROR_LOG(ret); >>>>>> OBJ_RELEASE(buf); >>>>>> @@ -1122,4 +1154,5 @@ >>>>>> >>>>>> return ORTE_SUCCESS; >>>>>> } >>>>>> +#endif >>>>>> >>>>>> >>>>>> Modified: trunk/orte/mca/ess/alps/ess_alps_module.c >>>>>> ============================================================================== >>>>>> --- trunk/orte/mca/ess/alps/ess_alps_module.c (original) >>>>>> +++ trunk/orte/mca/ess/alps/ess_alps_module.c 2011-08-26 18:16:14 EDT >>>>>> (Fri, 26 Aug 2011) >>>>>> @@ -363,8 +363,8 @@ >>>>>> >>>>>> ORTE_PROC_MY_NAME->jobid = jobid; >>>>>> ORTE_PROC_MY_NAME->vpid = (orte_vpid_t) cnos_get_rank() + starting_vpid; >>>>>> - ORTE_PROC_MY_NAME->epoch = ORTE_EPOCH_INVALID; >>>>>> - ORTE_PROC_MY_NAME->epoch = >>>>>> orte_ess.proc_get_epoch(ORTE_PROC_MY_NAME); >>>>>> + ORTE_EPOCH_PRINT(ORTE_PROC_MY_NAME->epoch,ORTE_EPOCH_INVALID); >>>>>> + >>>>>> ORTE_EPOCH_PRINT(ORTE_PROC_MY_NAME->epoch,orte_ess.proc_get_epoch(ORTE_PROC_MY_NAME)); >>>>>> >>>>>> OPAL_OUTPUT_VERBOSE((1, orte_ess_base_output, >>>>>> "ess:alps set name to %s", >>>>>> ORTE_NAME_PRINT(ORTE_PROC_MY_NAME))); >>>>>> >>>>>> Modified: trunk/orte/mca/ess/base/base.h >>>>>> 
============================================================================== >>>>>> --- trunk/orte/mca/ess/base/base.h (original) >>>>>> +++ trunk/orte/mca/ess/base/base.h 2011-08-26 18:16:14 EDT (Fri, >>>>>> 26 Aug 2011) >>>>>> @@ -57,7 +57,11 @@ >>>>>> >>>>>> ORTE_DECLSPEC extern opal_list_t orte_ess_base_components_available; >>>>>> >>>>>> +#if ORTE_ENABLE_EPOCH >>>>>> ORTE_DECLSPEC orte_epoch_t >>>>>> orte_ess_base_proc_get_epoch(orte_process_name_t *proc); >>>>>> +#else >>>>>> +ORTE_DECLSPEC int orte_ess_base_proc_get_epoch(orte_process_name_t >>>>>> *proc); >>>>>> +#endif >>>>>> >>>>>> #if !ORTE_DISABLE_FULL_SUPPORT >>>>>> >>>>>> >>>>>> Modified: trunk/orte/mca/ess/base/ess_base_select.c >>>>>> ============================================================================== >>>>>> --- trunk/orte/mca/ess/base/ess_base_select.c (original) >>>>>> +++ trunk/orte/mca/ess/base/ess_base_select.c 2011-08-26 18:16:14 EDT >>>>>> (Fri, 26 Aug 2011) >>>>>> @@ -36,21 +36,19 @@ >>>>>> * Generic function to retrieve the epoch of a specific process >>>>>> * from the job data. 
>>>>>> */ >>>>>> +#if !ORTE_ENABLE_EPOCH >>>>>> +int orte_ess_base_proc_get_epoch(orte_process_name_t *proc) { >>>>>> + return 0; >>>>>> +} >>>>>> +#else >>>>>> orte_epoch_t orte_ess_base_proc_get_epoch(orte_process_name_t *proc) { >>>>>> orte_epoch_t epoch = ORTE_EPOCH_INVALID; >>>>>> >>>>>> -#if !ORTE_DISABLE_FULL_SUPPORT >>>>>> epoch = orte_util_lookup_epoch(proc); >>>>>> -#endif >>>>>> - >>>>>> - OPAL_OUTPUT_VERBOSE((2, orte_ess_base_output, >>>>>> - "%s ess:generic: proc %s has epoch %d", >>>>>> - ORTE_NAME_PRINT(ORTE_PROC_MY_NAME), >>>>>> - ORTE_NAME_PRINT(proc), >>>>>> - epoch)); >>>>>> >>>>>> return epoch; >>>>>> } >>>>>> +#endif >>>>>> >>>>>> int >>>>>> orte_ess_base_select(void) >>>>>> >>>>>> Modified: trunk/orte/mca/ess/env/ess_env_module.c >>>>>> ============================================================================== >>>>>> --- trunk/orte/mca/ess/env/ess_env_module.c (original) >>>>>> +++ trunk/orte/mca/ess/env/ess_env_module.c 2011-08-26 18:16:14 EDT >>>>>> (Fri, 26 Aug 2011) >>>>>> @@ -392,8 +392,7 @@ >>>>>> >>>>>> ORTE_PROC_MY_NAME->jobid = jobid; >>>>>> ORTE_PROC_MY_NAME->vpid = vpid; >>>>>> - ORTE_PROC_MY_NAME->epoch = ORTE_EPOCH_INVALID; >>>>>> - ORTE_PROC_MY_NAME->epoch = >>>>>> orte_ess.proc_get_epoch(ORTE_PROC_MY_NAME); >>>>>> + >>>>>> ORTE_EPOCH_SET(ORTE_PROC_MY_NAME->epoch,orte_ess.proc_get_epoch(ORTE_PROC_MY_NAME)); >>>>>> >>>>>> OPAL_OUTPUT_VERBOSE((1, orte_ess_base_output, >>>>>> "ess:env set name to %s", >>>>>> ORTE_NAME_PRINT(ORTE_PROC_MY_NAME))); >>>>>> >>>>>> Modified: trunk/orte/mca/ess/ess.h >>>>>> ============================================================================== >>>>>> --- trunk/orte/mca/ess/ess.h (original) >>>>>> +++ trunk/orte/mca/ess/ess.h 2011-08-26 18:16:14 EDT (Fri, 26 Aug >>>>>> 2011) >>>>>> @@ -111,7 +111,11 @@ >>>>>> * will get the most up to date version stored within the orte_proc_t >>>>>> struct. >>>>>> * Obviously the epoch of the proc that is passed in will be ignored. 
>>>>>> */ >>>>>> +#if ORTE_ENABLE_EPOCH >>>>>> typedef orte_epoch_t >>>>>> (*orte_ess_base_module_proc_get_epoch_fn_t)(orte_process_name_t *proc); >>>>>> +#else >>>>>> +typedef int >>>>>> (*orte_ess_base_module_proc_get_epoch_fn_t)(orte_process_name_t *proc); >>>>>> +#endif >>>>>> >>>>>> /** >>>>>> * Update the pidmap >>>>>> >>>>>> Modified: trunk/orte/mca/ess/generic/ess_generic_module.c >>>>>> ============================================================================== >>>>>> --- trunk/orte/mca/ess/generic/ess_generic_module.c (original) >>>>>> +++ trunk/orte/mca/ess/generic/ess_generic_module.c 2011-08-26 >>>>>> 18:16:14 EDT (Fri, 26 Aug 2011) >>>>>> @@ -155,7 +155,7 @@ >>>>>> goto error; >>>>>> } >>>>>> ORTE_PROC_MY_NAME->vpid = strtol(envar, NULL, 10); >>>>>> - ORTE_PROC_MY_NAME->epoch = ORTE_EPOCH_MIN; >>>>>> + ORTE_EPOCH_SET(ORTE_PROC_MY_NAME->epoch,ORTE_EPOCH_MIN); >>>>>> >>>>>> OPAL_OUTPUT_VERBOSE((1, orte_ess_base_output, >>>>>> "%s completed name definition", >>>>>> @@ -273,7 +273,7 @@ >>>>>> if (vpid == ORTE_PROC_MY_NAME->vpid) { >>>>>> ORTE_PROC_MY_DAEMON->jobid = 0; >>>>>> ORTE_PROC_MY_DAEMON->vpid = i; >>>>>> - ORTE_PROC_MY_DAEMON->epoch = >>>>>> ORTE_PROC_MY_NAME->epoch; >>>>>> + >>>>>> ORTE_EPOCH_SET(ORTE_PROC_MY_DAEMON->epoch,ORTE_PROC_MY_NAME->epoch); >>>>>> } >>>>>> OPAL_OUTPUT_VERBOSE((1, orte_ess_base_output, >>>>>> "%s node %d name %s rank %s", >>>>>> @@ -304,7 +304,7 @@ >>>>>> if (vpid == ORTE_PROC_MY_NAME->vpid) { >>>>>> ORTE_PROC_MY_DAEMON->jobid = 0; >>>>>> ORTE_PROC_MY_DAEMON->vpid = i; >>>>>> - ORTE_PROC_MY_DAEMON->epoch = >>>>>> ORTE_PROC_MY_NAME->epoch; >>>>>> + >>>>>> ORTE_EPOCH_SET(ORTE_PROC_MY_DAEMON->epoch,ORTE_PROC_MY_NAME->epoch); >>>>>> } >>>>>> OPAL_OUTPUT_VERBOSE((1, orte_ess_base_output, >>>>>> "%s node %d name %s rank %d", >>>>>> >>>>>> Modified: trunk/orte/mca/ess/hnp/ess_hnp_module.c >>>>>> ============================================================================== >>>>>> --- trunk/orte/mca/ess/hnp/ess_hnp_module.c 
(original) >>>>>> +++ trunk/orte/mca/ess/hnp/ess_hnp_module.c 2011-08-26 18:16:14 EDT >>>>>> (Fri, 26 Aug 2011) >>>>>> @@ -494,7 +494,7 @@ >>>>>> proc = OBJ_NEW(orte_proc_t); >>>>>> proc->name.jobid = ORTE_PROC_MY_NAME->jobid; >>>>>> proc->name.vpid = ORTE_PROC_MY_NAME->vpid; >>>>>> - proc->name.epoch = ORTE_EPOCH_MIN; >>>>>> + ORTE_EPOCH_SET(proc->name.epoch,ORTE_EPOCH_MIN); >>>>>> >>>>>> proc->pid = orte_process_info.pid; >>>>>> proc->rml_uri = orte_rml.get_contact_info(); >>>>>> >>>>>> Modified: trunk/orte/mca/ess/lsf/ess_lsf_module.c >>>>>> ============================================================================== >>>>>> --- trunk/orte/mca/ess/lsf/ess_lsf_module.c (original) >>>>>> +++ trunk/orte/mca/ess/lsf/ess_lsf_module.c 2011-08-26 18:16:14 EDT >>>>>> (Fri, 26 Aug 2011) >>>>>> @@ -357,8 +357,7 @@ >>>>>> >>>>>> ORTE_PROC_MY_NAME->jobid = jobid; >>>>>> ORTE_PROC_MY_NAME->vpid = vpid; >>>>>> - ORTE_PROC_MY_NAME->epoch = ORTE_EPOCH_INVALID; >>>>>> - ORTE_PROC_MY_NAME->epoch = >>>>>> orte_ess.proc_get_epoch(ORTE_PROC_MY_NAME); >>>>>> + >>>>>> ORTE_EPOCH_SET(ORTE_PROC_MY_NAME->epoch,orte_ess.proc_get_epoch(ORTE_PROC_MY_NAME)); >>>>>> >>>>>> /* fix up the base name and make it the "real" name */ >>>>>> lsf_nodeid = atoi(getenv("LSF_PM_TASKID")); >>>>>> >>>>>> Modified: trunk/orte/mca/ess/singleton/ess_singleton_module.c >>>>>> ============================================================================== >>>>>> --- trunk/orte/mca/ess/singleton/ess_singleton_module.c (original) >>>>>> +++ trunk/orte/mca/ess/singleton/ess_singleton_module.c 2011-08-26 >>>>>> 18:16:14 EDT (Fri, 26 Aug 2011) >>>>>> @@ -188,7 +188,7 @@ >>>>>> /* set the name */ >>>>>> ORTE_PROC_MY_NAME->jobid = 0xffff0000 & ((uint32_t)jobfam << 16); >>>>>> ORTE_PROC_MY_NAME->vpid = 0; >>>>>> - ORTE_PROC_MY_NAME->epoch = ORTE_EPOCH_MIN; >>>>>> + ORTE_EPOCH_SET(ORTE_PROC_MY_NAME->epoch,ORTE_EPOCH_MIN); >>>>>> >>>>>> } else { >>>>>> /* >>>>>> >>>>>> Modified: 
trunk/orte/mca/ess/slave/ess_slave_module.c >>>>>> ============================================================================== >>>>>> --- trunk/orte/mca/ess/slave/ess_slave_module.c (original) >>>>>> +++ trunk/orte/mca/ess/slave/ess_slave_module.c 2011-08-26 18:16:14 EDT >>>>>> (Fri, 26 Aug 2011) >>>>>> @@ -280,8 +280,7 @@ >>>>>> >>>>>> ORTE_PROC_MY_NAME->jobid = jobid; >>>>>> ORTE_PROC_MY_NAME->vpid = vpid; >>>>>> - ORTE_PROC_MY_NAME->epoch = ORTE_EPOCH_INVALID; >>>>>> - ORTE_PROC_MY_NAME->epoch = >>>>>> orte_ess.proc_get_epoch(ORTE_PROC_MY_NAME); >>>>>> + >>>>>> ORTE_EPOCH_SET(ORTE_PROC_MY_NAME->epoch,orte_ess.proc_get_epoch(ORTE_PROC_MY_NAME)); >>>>>> >>>>>> OPAL_OUTPUT_VERBOSE((1, orte_ess_base_output, >>>>>> "ess:slave set name to %s", >>>>>> ORTE_NAME_PRINT(ORTE_PROC_MY_NAME))); >>>>>> >>>>>> Modified: trunk/orte/mca/ess/slurm/ess_slurm_module.c >>>>>> ============================================================================== >>>>>> --- trunk/orte/mca/ess/slurm/ess_slurm_module.c (original) >>>>>> +++ trunk/orte/mca/ess/slurm/ess_slurm_module.c 2011-08-26 18:16:14 EDT >>>>>> (Fri, 26 Aug 2011) >>>>>> @@ -368,8 +368,7 @@ >>>>>> /* fix up the vpid and make it the "real" vpid */ >>>>>> slurm_nodeid = atoi(getenv("SLURM_NODEID")); >>>>>> ORTE_PROC_MY_NAME->vpid = vpid + slurm_nodeid; >>>>>> - ORTE_PROC_MY_NAME->epoch = ORTE_EPOCH_INVALID; >>>>>> - ORTE_PROC_MY_NAME->epoch = >>>>>> orte_ess.proc_get_epoch(ORTE_PROC_MY_NAME); >>>>>> + >>>>>> ORTE_EPOCH_SET(ORTE_PROC_MY_NAME->epoch,orte_ess.proc_get_epoch(ORTE_PROC_MY_NAME)); >>>>>> >>>>>> OPAL_OUTPUT_VERBOSE((1, orte_ess_base_output, >>>>>> "ess:slurm set name to %s", >>>>>> ORTE_NAME_PRINT(ORTE_PROC_MY_NAME))); >>>>>> >>>>>> Modified: trunk/orte/mca/ess/slurmd/ess_slurmd_module.c >>>>>> ============================================================================== >>>>>> --- trunk/orte/mca/ess/slurmd/ess_slurmd_module.c (original) >>>>>> +++ trunk/orte/mca/ess/slurmd/ess_slurmd_module.c 2011-08-26 >>>>>> 
18:16:14 EDT (Fri, 26 Aug 2011) >>>>>> @@ -195,7 +195,7 @@ >>>>>> } >>>>>> ORTE_PROC_MY_NAME->vpid = strtol(envar, NULL, 10); >>>>>> #endif >>>>>> - ORTE_PROC_MY_NAME->epoch = ORTE_EPOCH_MIN; >>>>>> + ORTE_EPOCH_SET(ORTE_PROC_MY_NAME->epoch,ORTE_EPOCH_MIN); >>>>>> /* get our local rank */ >>>>>> if (NULL == (envar = getenv("SLURM_LOCALID"))) { >>>>>> error = "could not get SLURM_LOCALID"; >>>>>> @@ -260,7 +260,7 @@ >>>>>> nodeid = strtol(envar, NULL, 10); >>>>>> ORTE_PROC_MY_DAEMON->jobid = 0; >>>>>> ORTE_PROC_MY_DAEMON->vpid = nodeid; >>>>>> - ORTE_PROC_MY_DAEMON->epoch = ORTE_PROC_MY_NAME->epoch; >>>>>> + ORTE_EPOCH_SET(ORTE_PROC_MY_DAEMON->epoch,ORTE_PROC_MY_NAME->epoch); >>>>>> >>>>>> /* get the number of ppn */ >>>>>> if (NULL == (tasks_per_node = getenv("SLURM_STEP_TASKS_PER_NODE"))) { >>>>>> >>>>>> Modified: trunk/orte/mca/ess/tm/ess_tm_module.c >>>>>> ============================================================================== >>>>>> --- trunk/orte/mca/ess/tm/ess_tm_module.c (original) >>>>>> +++ trunk/orte/mca/ess/tm/ess_tm_module.c 2011-08-26 18:16:14 EDT >>>>>> (Fri, 26 Aug 2011) >>>>>> @@ -364,7 +364,7 @@ >>>>>> >>>>>> ORTE_PROC_MY_NAME->jobid = jobid; >>>>>> ORTE_PROC_MY_NAME->vpid = vpid; >>>>>> - ORTE_PROC_MY_NAME->epoch = >>>>>> orte_ess.proc_get_epoch(ORTE_PROC_MY_NAME); >>>>>> + >>>>>> ORTE_EPOCH_SET(ORTE_PROC_MY_NAME->epoch,orte_ess.proc_get_epoch(ORTE_PROC_MY_NAME)); >>>>>> >>>>>> OPAL_OUTPUT_VERBOSE((1, orte_ess_base_output, >>>>>> "ess:tm set name to %s", >>>>>> ORTE_NAME_PRINT(ORTE_PROC_MY_NAME))); >>>>>> >>>>>> Modified: trunk/orte/mca/filem/rsh/filem_rsh_module.c >>>>>> ============================================================================== >>>>>> --- trunk/orte/mca/filem/rsh/filem_rsh_module.c (original) >>>>>> +++ trunk/orte/mca/filem/rsh/filem_rsh_module.c 2011-08-26 18:16:14 EDT >>>>>> (Fri, 26 Aug 2011) >>>>>> @@ -1097,11 +1097,11 @@ >>>>>> if( NULL != proc_set ) { >>>>>> wp_item->proc_set.source.jobid = 
proc_set->source.jobid; >>>>>> wp_item->proc_set.source.vpid = proc_set->source.vpid; >>>>>> - wp_item->proc_set.source.epoch = proc_set->source.epoch; >>>>>> + >>>>>> ORTE_EPOCH_SET(wp_item->proc_set.source.epoch,proc_set->source.epoch); >>>>>> >>>>>> wp_item->proc_set.sink.jobid = proc_set->sink.jobid; >>>>>> wp_item->proc_set.sink.vpid = proc_set->sink.vpid; >>>>>> - wp_item->proc_set.sink.epoch = proc_set->sink.epoch; >>>>>> + >>>>>> ORTE_EPOCH_SET(wp_item->proc_set.sink.epoch,proc_set->sink.epoch); >>>>>> } >>>>>> /* Copy the File Set */ >>>>>> if( NULL != file_set ) { >>>>>> @@ -1396,7 +1396,7 @@ >>>>>> wp_item = OBJ_NEW(orte_filem_rsh_work_pool_item_t); >>>>>> wp_item->proc_set.source.jobid = sender->jobid; >>>>>> wp_item->proc_set.source.vpid = sender->vpid; >>>>>> - wp_item->proc_set.source.epoch = sender->epoch; >>>>>> + >>>>>> ORTE_EPOCH_SET(wp_item->proc_set.source.epoch,sender->epoch); >>>>>> >>>>>> opal_list_append(&work_pool_waiting, &(wp_item->super)); >>>>>> } >>>>>> >>>>>> Modified: trunk/orte/mca/grpcomm/base/grpcomm_base_coll.c >>>>>> ============================================================================== >>>>>> --- trunk/orte/mca/grpcomm/base/grpcomm_base_coll.c (original) >>>>>> +++ trunk/orte/mca/grpcomm/base/grpcomm_base_coll.c 2011-08-26 >>>>>> 18:16:14 EDT (Fri, 26 Aug 2011) >>>>>> @@ -168,8 +168,7 @@ >>>>>> if (vpids[0] == ORTE_PROC_MY_NAME->vpid) { >>>>>> /* I send first */ >>>>>> peer.vpid = vpids[1]; >>>>>> - peer.epoch = ORTE_EPOCH_INVALID; >>>>>> - peer.epoch = orte_ess.proc_get_epoch(&peer); >>>>>> + ORTE_EPOCH_SET(peer.epoch,orte_ess.proc_get_epoch(&peer)); >>>>>> >>>>>> /* setup a temp buffer so I can inform the other side as to the >>>>>> * number of entries in my buffer >>>>>> @@ -226,8 +225,7 @@ >>>>>> opal_dss.pack(&buf, &num_entries, 1, OPAL_INT32); >>>>>> opal_dss.copy_payload(&buf, sendbuf); >>>>>> peer.vpid = vpids[0]; >>>>>> - peer.epoch = ORTE_EPOCH_INVALID; >>>>>> - peer.epoch = orte_ess.proc_get_epoch(&peer); 
>>>>>> + ORTE_EPOCH_SET(peer.epoch,orte_ess.proc_get_epoch(&peer)); >>>>>> >>>>>> OPAL_OUTPUT_VERBOSE((5, orte_grpcomm_base.output, >>>>>> "%s grpcomm:coll:two-proc sending to %s", >>>>>> @@ -320,8 +318,7 @@ >>>>>> /* first send my current contents */ >>>>>> nv = (rank - distance + np) % np; >>>>>> peer.vpid = vpids[nv]; >>>>>> - peer.epoch = ORTE_EPOCH_INVALID; >>>>>> - peer.epoch = orte_ess.proc_get_epoch(&peer); >>>>>> + ORTE_EPOCH_SET(peer.epoch,orte_ess.proc_get_epoch(&peer)); >>>>>> >>>>>> OBJ_CONSTRUCT(&buf, opal_buffer_t); >>>>>> opal_dss.pack(&buf, &total_entries, 1, OPAL_INT32); >>>>>> @@ -340,8 +337,7 @@ >>>>>> num_recvd = 0; >>>>>> nv = (rank + distance) % np; >>>>>> peer.vpid = vpids[nv]; >>>>>> - peer.epoch = ORTE_EPOCH_INVALID; >>>>>> - peer.epoch = orte_ess.proc_get_epoch(&peer); >>>>>> + ORTE_EPOCH_SET(peer.epoch,orte_ess.proc_get_epoch(&peer)); >>>>>> >>>>>> OBJ_CONSTRUCT(&bucket, opal_buffer_t); >>>>>> if (ORTE_SUCCESS != (rc = orte_rml.recv_buffer_nb(&peer, >>>>>> @@ -439,8 +435,7 @@ >>>>>> /* first send my current contents */ >>>>>> nv = rank ^ distance; >>>>>> peer.vpid = vpids[nv]; >>>>>> - peer.epoch = ORTE_EPOCH_INVALID; >>>>>> - peer.epoch = orte_ess.proc_get_epoch(&peer); >>>>>> + ORTE_EPOCH_SET(peer.epoch,orte_ess.proc_get_epoch(&peer)); >>>>>> >>>>>> OBJ_CONSTRUCT(&buf, opal_buffer_t); >>>>>> opal_dss.pack(&buf, &total_entries, 1, OPAL_INT32); >>>>>> @@ -646,8 +641,7 @@ >>>>>> proc.jobid = jobid; >>>>>> proc.vpid = 0; >>>>>> while (proc.vpid < jobdat->num_procs && 0 < >>>>>> opal_list_get_size(&daemon_tree)) { >>>>>> - proc.epoch = ORTE_EPOCH_INVALID; >>>>>> - proc.epoch = orte_ess.proc_get_epoch(&proc); >>>>>> + ORTE_EPOCH_SET(proc.epoch,orte_ess.proc_get_epoch(&proc)); >>>>>> >>>>>> /* get the daemon that hosts this proc */ >>>>>> daemonvpid = orte_ess.proc_get_daemon(&proc); >>>>>> @@ -713,8 +707,7 @@ >>>>>> /* send it */ >>>>>> my_parent.jobid = ORTE_PROC_MY_NAME->jobid; >>>>>> my_parent.vpid = orte_routed.get_routing_tree(NULL); 
>>>>>> - my_parent.epoch = ORTE_EPOCH_INVALID; >>>>>> - my_parent.epoch = orte_ess.proc_get_epoch(&my_parent); >>>>>> + >>>>>> ORTE_EPOCH_SET(my_parent.epoch,orte_ess.proc_get_epoch(&my_parent)); >>>>>> >>>>>> OPAL_OUTPUT_VERBOSE((5, orte_grpcomm_base.output, >>>>>> "%s grpcomm:base:daemon_coll: daemon collective >>>>>> not the HNP - sending to parent %s", >>>>>> >>>>>> Modified: trunk/orte/mca/grpcomm/hier/grpcomm_hier_module.c >>>>>> ============================================================================== >>>>>> --- trunk/orte/mca/grpcomm/hier/grpcomm_hier_module.c (original) >>>>>> +++ trunk/orte/mca/grpcomm/hier/grpcomm_hier_module.c 2011-08-26 >>>>>> 18:16:14 EDT (Fri, 26 Aug 2011) >>>>>> @@ -95,7 +95,7 @@ >>>>>> >>>>>> my_local_rank_zero_proc.jobid = ORTE_PROC_MY_NAME->jobid; >>>>>> my_local_rank_zero_proc.vpid = ORTE_VPID_INVALID; >>>>>> - my_local_rank_zero_proc.epoch = ORTE_EPOCH_MIN; >>>>>> + ORTE_EPOCH_SET(my_local_rank_zero_proc.epoch,ORTE_EPOCH_MIN); >>>>>> >>>>>> if (ORTE_SUCCESS != (rc = orte_grpcomm_base_modex_init())) { >>>>>> ORTE_ERROR_LOG(rc); >>>>>> @@ -270,7 +270,7 @@ >>>>>> proc.jobid = ORTE_PROC_MY_NAME->jobid; >>>>>> for (v=0; v < orte_process_info.num_procs; v++) { >>>>>> proc.vpid = v; >>>>>> - proc.epoch = orte_util_lookup_epoch(&proc); >>>>>> + ORTE_EPOCH_SET(proc.epoch,orte_util_lookup_epoch(&proc)); >>>>>> >>>>>> /* is this proc local_rank=0 on its node? */ >>>>>> if (0 == my_local_rank && 0 == orte_ess.get_local_rank(&proc)) { >>>>>> @@ -285,7 +285,7 @@ >>>>>> nm = OBJ_NEW(orte_namelist_t); >>>>>> nm->name.jobid = proc.jobid; >>>>>> nm->name.vpid = proc.vpid; >>>>>> - nm->name.epoch = proc.epoch; >>>>>> + ORTE_EPOCH_SET(nm->name.epoch,proc.epoch); >>>>>> >>>>>> opal_list_append(&my_local_peers, &nm->item); >>>>>> /* if I am not local_rank=0, is this one? 
*/ >>>>>> @@ -293,7 +293,7 @@ >>>>>> 0 == orte_ess.get_local_rank(&proc)) { >>>>>> my_local_rank_zero_proc.jobid = proc.jobid; >>>>>> my_local_rank_zero_proc.vpid = proc.vpid; >>>>>> - my_local_rank_zero_proc.epoch = proc.epoch; >>>>>> + >>>>>> ORTE_EPOCH_SET(my_local_rank_zero_proc.epoch,proc.epoch); >>>>>> } >>>>>> } >>>>>> >>>>>> >>>>>> Modified: trunk/orte/mca/iof/base/base.h >>>>>> ============================================================================== >>>>>> --- trunk/orte/mca/iof/base/base.h (original) >>>>>> +++ trunk/orte/mca/iof/base/base.h 2011-08-26 18:16:14 EDT (Fri, >>>>>> 26 Aug 2011) >>>>>> @@ -135,7 +135,7 @@ >>>>>> ep = OBJ_NEW(orte_iof_sink_t); \ >>>>>> ep->name.jobid = (nm)->jobid; \ >>>>>> ep->name.vpid = (nm)->vpid; \ >>>>>> - ep->name.epoch = (nm)->epoch; \ >>>>>> + ORTE_EPOCH_SET(ep->name.epoch,(nm)->epoch); \ >>>>>> ep->tag = (tg); \ >>>>>> if (0 <= (fid)) { \ >>>>>> ep->wev->fd = (fid); \ >>>>>> @@ -169,7 +169,7 @@ >>>>>> rev = OBJ_NEW(orte_iof_read_event_t); \ >>>>>> rev->name.jobid = (nm)->jobid; \ >>>>>> rev->name.vpid = (nm)->vpid; \ >>>>>> - rev->name.epoch = (nm)->epoch; \ >>>>>> + ORTE_EPOCH_SET(rev->name.epoch,(nm)->epoch); \ >>>>>> rev->tag = (tg); \ >>>>>> rev->fd = (fid); \ >>>>>> *(rv) = rev; \ >>>>>> @@ -194,7 +194,7 @@ >>>>>> ep = OBJ_NEW(orte_iof_sink_t); \ >>>>>> ep->name.jobid = (nm)->jobid; \ >>>>>> ep->name.vpid = (nm)->vpid; \ >>>>>> - ep->name.epoch = (nm)->epoch; \ >>>>>> + ORTE_EPOCH_SET(ep->name.epoch,(nm)->epoch); \ >>>>>> ep->tag = (tg); \ >>>>>> if (0 <= (fid)) { \ >>>>>> ep->wev->fd = (fid); \ >>>>>> @@ -215,7 +215,7 @@ >>>>>> rev = OBJ_NEW(orte_iof_read_event_t); \ >>>>>> rev->name.jobid = (nm)->jobid; \ >>>>>> rev->name.vpid = (nm)->vpid; \ >>>>>> - rev->name.epoch= (nm)->epoch; \ >>>>>> + ORTE_EPOCH_SET(rev->name.epoch,(nm)->epoch); \ >>>>>> rev->tag = (tg); \ >>>>>> *(rv) = rev; \ >>>>>> opal_event_set(opal_event_base, \ >>>>>> >>>>>> Modified: trunk/orte/mca/iof/base/iof_base_open.c >>>>>> 
============================================================================== >>>>>> --- trunk/orte/mca/iof/base/iof_base_open.c (original) >>>>>> +++ trunk/orte/mca/iof/base/iof_base_open.c 2011-08-26 18:16:14 EDT >>>>>> (Fri, 26 Aug 2011) >>>>>> @@ -91,7 +91,7 @@ >>>>>> { >>>>>> ptr->daemon.jobid = ORTE_JOBID_INVALID; >>>>>> ptr->daemon.vpid = ORTE_VPID_INVALID; >>>>>> - ptr->daemon.epoch = ORTE_EPOCH_MIN; >>>>>> + ORTE_EPOCH_SET(ptr->daemon.epoch,ORTE_EPOCH_MIN); >>>>>> ptr->wev = OBJ_NEW(orte_iof_write_event_t); >>>>>> } >>>>>> static void orte_iof_base_sink_destruct(orte_iof_sink_t* ptr) >>>>>> >>>>>> Modified: trunk/orte/mca/iof/hnp/iof_hnp.c >>>>>> ============================================================================== >>>>>> --- trunk/orte/mca/iof/hnp/iof_hnp.c (original) >>>>>> +++ trunk/orte/mca/iof/hnp/iof_hnp.c 2011-08-26 18:16:14 EDT (Fri, >>>>>> 26 Aug 2011) >>>>>> @@ -186,7 +186,7 @@ >>>>>> proct = OBJ_NEW(orte_iof_proc_t); >>>>>> proct->name.jobid = dst_name->jobid; >>>>>> proct->name.vpid = dst_name->vpid; >>>>>> - proct->name.epoch = dst_name->epoch; >>>>>> + ORTE_EPOCH_SET(proct->name.epoch,dst_name->epoch); >>>>>> opal_list_append(&mca_iof_hnp_component.procs, &proct->super); >>>>>> /* see if we are to output to a file */ >>>>>> if (NULL != orte_output_filename) { >>>>>> @@ -281,8 +281,7 @@ >>>>>> &mca_iof_hnp_component.sinks); >>>>>> sink->daemon.jobid = ORTE_PROC_MY_NAME->jobid; >>>>>> sink->daemon.vpid = proc->node->daemon->name.vpid; >>>>>> - sink->daemon.epoch = ORTE_EPOCH_INVALID; >>>>>> - sink->daemon.epoch = orte_ess.proc_get_epoch(&sink->daemon); >>>>>> + >>>>>> ORTE_EPOCH_SET(sink->daemon.epoch,orte_ess.proc_get_epoch(&sink->daemon)); >>>>>> } >>>>>> } >>>>>> >>>>>> @@ -389,7 +388,7 @@ >>>>>> &mca_iof_hnp_component.sinks); >>>>>> sink->daemon.jobid = ORTE_PROC_MY_NAME->jobid; >>>>>> sink->daemon.vpid = ORTE_PROC_MY_NAME->vpid; >>>>>> - sink->daemon.epoch = ORTE_PROC_MY_NAME->epoch; >>>>>> + 
ORTE_EPOCH_SET(sink->daemon.epoch,ORTE_PROC_MY_NAME->epoch); >>>>>> >>>>>> return ORTE_SUCCESS; >>>>>> } >>>>>> >>>>>> Modified: trunk/orte/mca/iof/hnp/iof_hnp_receive.c >>>>>> ============================================================================== >>>>>> --- trunk/orte/mca/iof/hnp/iof_hnp_receive.c (original) >>>>>> +++ trunk/orte/mca/iof/hnp/iof_hnp_receive.c 2011-08-26 18:16:14 EDT >>>>>> (Fri, 26 Aug 2011) >>>>>> @@ -109,21 +109,21 @@ >>>>>> NULL, &mca_iof_hnp_component.sinks); >>>>>> sink->daemon.jobid = mev->sender.jobid; >>>>>> sink->daemon.vpid = mev->sender.vpid; >>>>>> - sink->daemon.epoch = mev->sender.epoch; >>>>>> + ORTE_EPOCH_SET(sink->daemon.epoch,mev->sender.epoch); >>>>>> } >>>>>> if (ORTE_IOF_STDERR & stream) { >>>>>> ORTE_IOF_SINK_DEFINE(&sink, &origin, -1, ORTE_IOF_STDERR, >>>>>> NULL, &mca_iof_hnp_component.sinks); >>>>>> sink->daemon.jobid = mev->sender.jobid; >>>>>> sink->daemon.vpid = mev->sender.vpid; >>>>>> - sink->daemon.epoch = mev->sender.epoch; >>>>>> + ORTE_EPOCH_SET(sink->daemon.epoch,mev->sender.epoch); >>>>>> } >>>>>> if (ORTE_IOF_STDDIAG & stream) { >>>>>> ORTE_IOF_SINK_DEFINE(&sink, &origin, -1, ORTE_IOF_STDDIAG, >>>>>> NULL, &mca_iof_hnp_component.sinks); >>>>>> sink->daemon.jobid = mev->sender.jobid; >>>>>> sink->daemon.vpid = mev->sender.vpid; >>>>>> - sink->daemon.epoch = mev->sender.epoch; >>>>>> + ORTE_EPOCH_SET(sink->daemon.epoch,mev->sender.epoch); >>>>>> } >>>>>> goto CLEAN_RETURN; >>>>>> } >>>>>> >>>>>> Modified: trunk/orte/mca/iof/orted/iof_orted.c >>>>>> ============================================================================== >>>>>> --- trunk/orte/mca/iof/orted/iof_orted.c (original) >>>>>> +++ trunk/orte/mca/iof/orted/iof_orted.c 2011-08-26 18:16:14 EDT (Fri, >>>>>> 26 Aug 2011) >>>>>> @@ -163,7 +163,7 @@ >>>>>> proct = OBJ_NEW(orte_iof_proc_t); >>>>>> proct->name.jobid = dst_name->jobid; >>>>>> proct->name.vpid = dst_name->vpid; >>>>>> - proct->name.epoch = dst_name->epoch; >>>>>> + 
ORTE_EPOCH_SET(proct->name.epoch,dst_name->epoch); >>>>>> opal_list_append(&mca_iof_orted_component.procs, &proct->super); >>>>>> /* see if we are to output to a file */ >>>>>> if (NULL != orte_output_filename) { >>>>>> >>>>>> Modified: trunk/orte/mca/odls/base/odls_base_default_fns.c >>>>>> ============================================================================== >>>>>> --- trunk/orte/mca/odls/base/odls_base_default_fns.c (original) >>>>>> +++ trunk/orte/mca/odls/base/odls_base_default_fns.c 2011-08-26 >>>>>> 18:16:14 EDT (Fri, 26 Aug 2011) >>>>>> @@ -734,8 +734,7 @@ >>>>>> proc.jobid = jobdat->jobid; >>>>>> for (j=0; j < jobdat->num_procs; j++) { >>>>>> proc.vpid = j; >>>>>> - proc.epoch = ORTE_EPOCH_INVALID; >>>>>> - proc.epoch = orte_ess.proc_get_epoch(&proc); >>>>>> + ORTE_EPOCH_SET(proc.epoch,orte_ess.proc_get_epoch(&proc)); >>>>>> /* get the vpid of the daemon that is to host this proc */ >>>>>> if (ORTE_VPID_INVALID == (host_daemon = >>>>>> orte_ess.proc_get_daemon(&proc))) { >>>>>> ORTE_ERROR_LOG(ORTE_ERR_NOT_FOUND); >>>>>> @@ -1044,6 +1043,7 @@ >>>>>> free(param); >>>>>> free(value); >>>>>> >>>>>> +#if ORTE_ENABLE_EPOCH >>>>>> /* setup the epoch */ >>>>>> if (ORTE_SUCCESS != (rc = orte_util_convert_epoch_to_string(&value, >>>>>> child->name->epoch))) { >>>>>> ORTE_ERROR_LOG(rc); >>>>>> @@ -1057,6 +1057,7 @@ >>>>>> opal_setenv(param, value, true, env); >>>>>> free(param); >>>>>> free(value); >>>>>> +#endif >>>>>> >>>>>> /* setup the vpid */ >>>>>> if (ORTE_SUCCESS != (rc = orte_util_convert_vpid_to_string(&value, >>>>>> child->name->vpid))) { >>>>>> @@ -2721,7 +2722,7 @@ >>>>>> OBJ_CONSTRUCT(&proctmp, orte_proc_t); >>>>>> proctmp.name.jobid = ORTE_JOBID_WILDCARD; >>>>>> proctmp.name.vpid = ORTE_VPID_WILDCARD; >>>>>> - proctmp.name.epoch = ORTE_EPOCH_WILDCARD; >>>>>> + ORTE_EPOCH_SET(proctmp.name.epoch,ORTE_EPOCH_WILDCARD); >>>>>> opal_pointer_array_add(&procarray, &proctmp); >>>>>> procptr = &procarray; >>>>>> do_cleanup = true; >>>>>> >>>>>> 
Modified: trunk/orte/mca/odls/base/odls_base_open.c >>>>>> ============================================================================== >>>>>> --- trunk/orte/mca/odls/base/odls_base_open.c (original) >>>>>> +++ trunk/orte/mca/odls/base/odls_base_open.c 2011-08-26 18:16:14 EDT >>>>>> (Fri, 26 Aug 2011) >>>>>> @@ -187,7 +187,7 @@ >>>>>> if (-1 == rank) { >>>>>> /* wildcard */ >>>>>> nm->name.vpid = ORTE_VPID_WILDCARD; >>>>>> - nm->name.epoch = ORTE_EPOCH_WILDCARD; >>>>>> + ORTE_EPOCH_SET(nm->name.epoch,ORTE_EPOCH_WILDCARD); >>>>>> } else if (rank < 0) { >>>>>> /* error out on bozo case */ >>>>>> orte_show_help("help-odls-base.txt", >>>>>> @@ -200,8 +200,7 @@ >>>>>> * will be in the job - we'll check later >>>>>> */ >>>>>> nm->name.vpid = rank; >>>>>> - nm->name.epoch = ORTE_EPOCH_INVALID; >>>>>> - nm->name.epoch = orte_ess.proc_get_epoch(&nm->name); >>>>>> + >>>>>> ORTE_EPOCH_SET(nm->name.epoch,orte_ess.proc_get_epoch(&nm->name)); >>>>>> } >>>>>> opal_list_append(&orte_odls_globals.xterm_ranks, &nm->item); >>>>>> } >>>>>> >>>>>> Modified: trunk/orte/mca/odls/base/odls_base_state.c >>>>>> ============================================================================== >>>>>> --- trunk/orte/mca/odls/base/odls_base_state.c (original) >>>>>> +++ trunk/orte/mca/odls/base/odls_base_state.c 2011-08-26 18:16:14 EDT >>>>>> (Fri, 26 Aug 2011) >>>>>> @@ -77,17 +77,17 @@ >>>>>> /* if I am the HNP, then use me as the source */ >>>>>> p_set->source.jobid = ORTE_PROC_MY_NAME->jobid; >>>>>> p_set->source.vpid = ORTE_PROC_MY_NAME->vpid; >>>>>> - p_set->source.epoch = ORTE_PROC_MY_NAME->epoch; >>>>>> + ORTE_EPOCH_SET(p_set->source.epoch,ORTE_PROC_MY_NAME->epoch); >>>>>> } >>>>>> else { >>>>>> /* otherwise, set the HNP as the source */ >>>>>> p_set->source.jobid = ORTE_PROC_MY_HNP->jobid; >>>>>> p_set->source.vpid = ORTE_PROC_MY_HNP->vpid; >>>>>> - p_set->source.epoch = ORTE_PROC_MY_HNP->epoch; >>>>>> + ORTE_EPOCH_SET(p_set->source.epoch,ORTE_PROC_MY_HNP->epoch); >>>>>> } >>>>>> 
p_set->sink.jobid = ORTE_PROC_MY_NAME->jobid; >>>>>> p_set->sink.vpid = ORTE_PROC_MY_NAME->vpid; >>>>>> - p_set->sink.epoch = ORTE_PROC_MY_NAME->epoch; >>>>>> + ORTE_EPOCH_SET(p_set->sink.epoch,ORTE_PROC_MY_NAME->epoch); >>>>>> >>>>>> opal_list_append(&(filem_request->process_sets), &(p_set->super) ); >>>>>> >>>>>> >>>>>> Modified: trunk/orte/mca/oob/tcp/oob_tcp_msg.c >>>>>> ============================================================================== >>>>>> --- trunk/orte/mca/oob/tcp/oob_tcp_msg.c (original) >>>>>> +++ trunk/orte/mca/oob/tcp/oob_tcp_msg.c 2011-08-26 18:16:14 EDT (Fri, >>>>>> 26 Aug 2011) >>>>>> @@ -137,6 +137,7 @@ >>>>>> bool mca_oob_tcp_msg_send_handler(mca_oob_tcp_msg_t* msg, struct >>>>>> mca_oob_tcp_peer_t * peer) >>>>>> { >>>>>> int rc; >>>>>> + >>>>>> while(1) { >>>>>> rc = writev(peer->peer_sd, msg->msg_rwptr, msg->msg_rwnum); >>>>>> if(rc < 0) { >>>>>> @@ -338,6 +339,7 @@ >>>>>> orte_process_name_t src = msg->msg_hdr.msg_src; >>>>>> >>>>>> OPAL_THREAD_LOCK(&mca_oob_tcp_component.tcp_lock); >>>>>> + >>>>>> if (orte_util_compare_name_fields(ORTE_NS_CMP_ALL, &peer->peer_name, >>>>>> &src) != OPAL_EQUAL) { >>>>>> opal_hash_table_remove_value_uint64(&mca_oob_tcp_component.tcp_peers, >>>>>> >>>>>> orte_util_hash_name(&peer->peer_name)); >>>>>> >>>>>> Modified: trunk/orte/mca/oob/tcp/oob_tcp_peer.c >>>>>> ============================================================================== >>>>>> --- trunk/orte/mca/oob/tcp/oob_tcp_peer.c (original) >>>>>> +++ trunk/orte/mca/oob/tcp/oob_tcp_peer.c 2011-08-26 18:16:14 EDT >>>>>> (Fri, 26 Aug 2011) >>>>>> @@ -903,6 +903,11 @@ >>>>>> static void mca_oob_tcp_peer_recv_handler(int sd, short flags, void* >>>>>> user) >>>>>> { >>>>>> mca_oob_tcp_peer_t* peer = (mca_oob_tcp_peer_t *)user; >>>>>> + >>>>>> + if (orte_abnormal_term_ordered) { >>>>>> + return; >>>>>> + } >>>>>> + >>>>>> OPAL_THREAD_LOCK(&peer->peer_lock); >>>>>> switch(peer->peer_state) { >>>>>> case MCA_OOB_TCP_CONNECT_ACK: >>>>>> >>>>>> Modified: 
trunk/orte/mca/plm/base/plm_base_jobid.c >>>>>> ============================================================================== >>>>>> --- trunk/orte/mca/plm/base/plm_base_jobid.c (original) >>>>>> +++ trunk/orte/mca/plm/base/plm_base_jobid.c 2011-08-26 18:16:14 EDT >>>>>> (Fri, 26 Aug 2011) >>>>>> @@ -62,12 +62,12 @@ >>>>>> /* set the name */ >>>>>> ORTE_PROC_MY_NAME->jobid = 0xffff0000 & ((uint32_t)jobfam << 16); >>>>>> ORTE_PROC_MY_NAME->vpid = 0; >>>>>> - ORTE_PROC_MY_NAME->epoch= ORTE_EPOCH_MIN; >>>>>> + ORTE_EPOCH_SET(ORTE_PROC_MY_NAME->epoch,ORTE_EPOCH_MIN); >>>>>> >>>>>> /* copy it to the HNP field */ >>>>>> ORTE_PROC_MY_HNP->jobid = ORTE_PROC_MY_NAME->jobid; >>>>>> ORTE_PROC_MY_HNP->vpid = ORTE_PROC_MY_NAME->vpid; >>>>>> - ORTE_PROC_MY_HNP->epoch = ORTE_PROC_MY_NAME->epoch; >>>>>> + ORTE_EPOCH_SET(ORTE_PROC_MY_HNP->epoch,ORTE_PROC_MY_NAME->epoch); >>>>>> >>>>>> /* done */ >>>>>> return ORTE_SUCCESS; >>>>>> >>>>>> Modified: trunk/orte/mca/plm/base/plm_base_launch_support.c >>>>>> ============================================================================== >>>>>> --- trunk/orte/mca/plm/base/plm_base_launch_support.c (original) >>>>>> +++ trunk/orte/mca/plm/base/plm_base_launch_support.c 2011-08-26 >>>>>> 18:16:14 EDT (Fri, 26 Aug 2011) >>>>>> @@ -377,8 +377,7 @@ >>>>>> /* push stdin - the IOF will know what to do with the specified target */ >>>>>> name.jobid = job; >>>>>> name.vpid = jdata->stdin_target; >>>>>> - name.epoch = ORTE_EPOCH_INVALID; >>>>>> - name.epoch = orte_ess.proc_get_epoch(&name); >>>>>> + ORTE_EPOCH_SET(name.epoch,orte_ess.proc_get_epoch(&name)); >>>>>> >>>>>> if (ORTE_SUCCESS != (rc = orte_iof.push(&name, ORTE_IOF_STDIN, 0))) { >>>>>> ORTE_ERROR_LOG(rc); >>>>>> >>>>>> Modified: trunk/orte/mca/plm/base/plm_base_orted_cmds.c >>>>>> ============================================================================== >>>>>> --- trunk/orte/mca/plm/base/plm_base_orted_cmds.c (original) >>>>>> +++ trunk/orte/mca/plm/base/plm_base_orted_cmds.c 
2011-08-26 >>>>>> 18:16:14 EDT (Fri, 26 Aug 2011) >>>>>> @@ -163,8 +163,7 @@ >>>>>> continue; >>>>>> } >>>>>> peer.vpid = v; >>>>>> - peer.epoch = ORTE_EPOCH_INVALID; >>>>>> - peer.epoch = orte_ess.proc_get_epoch(&peer); >>>>>> + ORTE_EPOCH_SET(peer.epoch,orte_ess.proc_get_epoch(&peer)); >>>>>> >>>>>> /* don't worry about errors on the send here - just >>>>>> * issue it and keep going >>>>>> @@ -242,7 +241,7 @@ >>>>>> OBJ_CONSTRUCT(&proc, orte_proc_t); >>>>>> proc.name.jobid = jobid; >>>>>> proc.name.vpid = ORTE_VPID_WILDCARD; >>>>>> - proc.name.epoch = ORTE_EPOCH_WILDCARD; >>>>>> + ORTE_EPOCH_SET(proc.name.epoch,ORTE_EPOCH_WILDCARD); >>>>>> opal_pointer_array_add(&procs, &proc); >>>>>> if (ORTE_SUCCESS != (rc = orte_plm_base_orted_kill_local_procs(&procs))) >>>>>> { >>>>>> ORTE_ERROR_LOG(rc); >>>>>> @@ -340,8 +339,7 @@ >>>>>> continue; >>>>>> } >>>>>> peer.vpid = v; >>>>>> - peer.epoch = ORTE_EPOCH_INVALID; >>>>>> - peer.epoch = orte_ess.proc_get_epoch(&peer); >>>>>> + ORTE_EPOCH_SET(peer.epoch,orte_ess.proc_get_epoch(&peer)); >>>>>> /* check to see if this daemon is known to be "dead" */ >>>>>> if (proc->state > ORTE_PROC_STATE_UNTERMINATED) { >>>>>> /* don't try to send this */ >>>>>> >>>>>> Modified: trunk/orte/mca/plm/base/plm_base_receive.c >>>>>> ============================================================================== >>>>>> --- trunk/orte/mca/plm/base/plm_base_receive.c (original) >>>>>> +++ trunk/orte/mca/plm/base/plm_base_receive.c 2011-08-26 18:16:14 EDT >>>>>> (Fri, 26 Aug 2011) >>>>>> @@ -146,7 +146,9 @@ >>>>>> orte_job_t *jdata, *parent; >>>>>> opal_buffer_t answer; >>>>>> orte_vpid_t vpid; >>>>>> +#if ORTE_ENABLE_EPOCH >>>>>> orte_epoch_t epoch; >>>>>> +#endif >>>>>> orte_proc_t *proc; >>>>>> orte_proc_state_t state; >>>>>> orte_exit_code_t exit_code; >>>>>> @@ -394,8 +396,7 @@ >>>>>> break; >>>>>> } >>>>>> name.vpid = vpid; >>>>>> - name.epoch = ORTE_EPOCH_INVALID; >>>>>> - name.epoch = orte_ess.proc_get_epoch(&name); >>>>>> + >>>>>> 
ORTE_EPOCH_SET(name.epoch,orte_ess.proc_get_epoch(&name)); >>>>>> >>>>>> /* unpack the pid */ >>>>>> count = 1; >>>>>> @@ -488,9 +489,11 @@ >>>>>> } >>>>>> name.vpid = vpid; >>>>>> >>>>>> +#if ORTE_ENABLE_EPOCH >>>>>> count=1; >>>>>> opal_dss.unpack(msgpkt->buffer, &epoch, &count, ORTE_EPOCH); >>>>>> name.epoch = epoch; >>>>>> +#endif >>>>>> >>>>>> OPAL_OUTPUT_VERBOSE((5, orte_plm_globals.output, >>>>>> "%s plm:base:receive Described rank %s", >>>>>> >>>>>> Modified: trunk/orte/mca/plm/base/plm_base_rsh_support.c >>>>>> ============================================================================== >>>>>> --- trunk/orte/mca/plm/base/plm_base_rsh_support.c (original) >>>>>> +++ trunk/orte/mca/plm/base/plm_base_rsh_support.c 2011-08-26 >>>>>> 18:16:14 EDT (Fri, 26 Aug 2011) >>>>>> @@ -1527,7 +1527,9 @@ >>>>>> { >>>>>> char *param, *path, *tmp, *cmd, *basename, *dest_dir; >>>>>> int i; >>>>>> +#if ORTE_ENABLE_EPOCH >>>>>> orte_epoch_t epoch; >>>>>> +#endif >>>>>> orte_process_name_t proc; >>>>>> >>>>>> /* if a prefix is set, pass it to the bootproxy in a special way */ >>>>>> @@ -1638,6 +1640,7 @@ >>>>>> opal_setenv("OMPI_COMM_WORLD_RANK", cmd, true, argv); >>>>>> free(cmd); >>>>>> >>>>>> +#if ORTE_ENABLE_EPOCH >>>>>> /* set the epoch */ >>>>>> proc.jobid = jobid; >>>>>> proc.vpid = vpid; >>>>>> @@ -1648,6 +1651,7 @@ >>>>>> opal_setenv(param, cmd, true, argv); >>>>>> free(param); >>>>>> free(cmd); >>>>>> +#endif >>>>>> >>>>>> /* set the number of procs */ >>>>>> asprintf(&cmd, "%d", (int)num_procs); >>>>>> >>>>>> Modified: trunk/orte/mca/rmaps/base/rmaps_base_support_fns.c >>>>>> ============================================================================== >>>>>> --- trunk/orte/mca/rmaps/base/rmaps_base_support_fns.c (original) >>>>>> +++ trunk/orte/mca/rmaps/base/rmaps_base_support_fns.c 2011-08-26 >>>>>> 18:16:14 EDT (Fri, 26 Aug 2011) >>>>>> @@ -33,12 +33,14 @@ >>>>>> #include "orte/mca/ess/ess.h" >>>>>> #include "opal/mca/sysinfo/sysinfo_types.h" >>>>>> >>>>>> 
+#include "orte/types.h" >>>>>> #include "orte/util/show_help.h" >>>>>> #include "orte/util/name_fns.h" >>>>>> #include "orte/runtime/orte_globals.h" >>>>>> #include "orte/util/hostfile/hostfile.h" >>>>>> #include "orte/util/dash_host/dash_host.h" >>>>>> #include "orte/mca/errmgr/errmgr.h" >>>>>> +#include "orte/runtime/data_type_support/orte_dt_support.h" >>>>>> >>>>>> #include "orte/mca/rmaps/base/rmaps_private.h" >>>>>> #include "orte/mca/rmaps/base/base.h" >>>>>> @@ -454,7 +456,7 @@ >>>>>> */ >>>>>> >>>>>> /* We do set the epoch here since they all start with the same value. >>>>>> */ >>>>>> - proc->name.epoch = ORTE_EPOCH_MIN; >>>>>> + ORTE_EPOCH_SET(proc->name.epoch,ORTE_EPOCH_MIN); >>>>>> >>>>>> proc->app_idx = app_idx; >>>>>> OPAL_OUTPUT_VERBOSE((5, orte_rmaps_base.rmaps_output, >>>>>> @@ -559,11 +561,12 @@ >>>>>> } >>>>>> } >>>>>> proc->name.vpid = vpid; >>>>>> - proc->name.epoch = ORTE_EPOCH_INVALID; >>>>>> - proc->name.epoch = >>>>>> orte_ess.proc_get_epoch(&proc->name); >>>>>> + ORTE_EPOCH_SET(proc->name.epoch,ORTE_EPOCH_INVALID); >>>>>> + >>>>>> ORTE_EPOCH_SET(proc->name.epoch,orte_ess.proc_get_epoch(&proc->name)); >>>>>> + >>>>>> /* If there is an invalid epoch here, it's because it >>>>>> doesn't exist yet. 
*/ >>>>>> - if (ORTE_NODE_RANK_INVALID == proc->name.epoch) { >>>>>> - proc->name.epoch = ORTE_EPOCH_MIN; >>>>>> + if (0 == >>>>>> ORTE_EPOCH_CMP(ORTE_EPOCH_INVALID,proc->name.epoch)) { >>>>>> + ORTE_EPOCH_SET(proc->name.epoch,ORTE_EPOCH_MIN); >>>>>> } >>>>>> } >>>>>> if (NULL == opal_pointer_array_get_item(jdata->procs, >>>>>> proc->name.vpid)) { >>>>>> @@ -601,8 +604,8 @@ >>>>>> } >>>>>> } >>>>>> proc->name.vpid = vpid; >>>>>> - proc->name.epoch = ORTE_EPOCH_INVALID; >>>>>> - proc->name.epoch = >>>>>> orte_ess.proc_get_epoch(&proc->name); >>>>>> + ORTE_EPOCH_SET(proc->name.epoch,ORTE_EPOCH_INVALID); >>>>>> + >>>>>> ORTE_EPOCH_SET(proc->name.epoch,orte_ess.proc_get_epoch(&proc->name)); >>>>>> } >>>>>> if (NULL == opal_pointer_array_get_item(jdata->procs, >>>>>> proc->name.vpid)) { >>>>>> if (ORTE_SUCCESS != (rc = >>>>>> opal_pointer_array_set_item(jdata->procs, proc->name.vpid, proc))) { >>>>>> @@ -835,7 +838,7 @@ >>>>>> return ORTE_ERR_OUT_OF_RESOURCE; >>>>>> } >>>>>> proc->name.vpid = daemons->num_procs; /* take the next available >>>>>> vpid */ >>>>>> - proc->name.epoch = ORTE_EPOCH_MIN; >>>>>> + ORTE_EPOCH_SET(proc->name.epoch,ORTE_EPOCH_MIN); >>>>>> proc->node = node; >>>>>> proc->nodename = node->name; >>>>>> OPAL_OUTPUT_VERBOSE((5, orte_rmaps_base.rmaps_output, >>>>>> @@ -1014,8 +1017,8 @@ >>>>>> return ORTE_ERR_OUT_OF_RESOURCE; >>>>>> } >>>>>> proc->name.vpid = jdata->num_procs; /* take the next available vpid >>>>>> */ >>>>>> - proc->name.epoch = ORTE_EPOCH_INVALID; >>>>>> - proc->name.epoch = orte_ess.proc_get_epoch(&proc->name); >>>>>> + ORTE_EPOCH_SET(proc->name.epoch,ORTE_EPOCH_INVALID); >>>>>> + >>>>>> ORTE_EPOCH_SET(proc->name.epoch,orte_ess.proc_get_epoch(&proc->name)); >>>>>> proc->node = node; >>>>>> proc->nodename = node->name; >>>>>> OPAL_OUTPUT_VERBOSE((5, orte_rmaps_base.rmaps_output, >>>>>> >>>>>> Modified: trunk/orte/mca/rmaps/rank_file/rmaps_rank_file.c >>>>>> ============================================================================== 
>>>>>> --- trunk/orte/mca/rmaps/rank_file/rmaps_rank_file.c (original) >>>>>> +++ trunk/orte/mca/rmaps/rank_file/rmaps_rank_file.c 2011-08-26 >>>>>> 18:16:14 EDT (Fri, 26 Aug 2011) >>>>>> @@ -502,8 +502,7 @@ >>>>>> } >>>>>> proc->name.vpid = rank; >>>>>> /* Either init or update the epoch. */ >>>>>> - proc->name.epoch = ORTE_EPOCH_INVALID; >>>>>> - proc->name.epoch = orte_ess.proc_get_epoch(&proc->name); >>>>>> + >>>>>> ORTE_EPOCH_SET(proc->name.epoch,orte_ess.proc_get_epoch(&proc->name)); >>>>>> >>>>>> proc->slot_list = strdup(rfmap->slot_list); >>>>>> /* insert the proc into the proper place */ >>>>>> >>>>>> Modified: trunk/orte/mca/rmaps/seq/rmaps_seq.c >>>>>> ============================================================================== >>>>>> --- trunk/orte/mca/rmaps/seq/rmaps_seq.c (original) >>>>>> +++ trunk/orte/mca/rmaps/seq/rmaps_seq.c 2011-08-26 18:16:14 EDT (Fri, >>>>>> 26 Aug 2011) >>>>>> @@ -235,8 +235,7 @@ >>>>>> } >>>>>> /* assign the vpid */ >>>>>> proc->name.vpid = vpid++; >>>>>> - proc->name.epoch = ORTE_EPOCH_INVALID; >>>>>> - proc->name.epoch = orte_ess.proc_get_epoch(&proc->name); >>>>>> + >>>>>> ORTE_EPOCH_SET(proc->name.epoch,orte_ess.proc_get_epoch(&proc->name)); >>>>>> >>>>>> /* add to the jdata proc array */ >>>>>> if (ORTE_SUCCESS != (rc = >>>>>> opal_pointer_array_set_item(jdata->procs, proc->name.vpid, proc))) { >>>>>> >>>>>> Modified: trunk/orte/mca/rmcast/base/rmcast_base_open.c >>>>>> ============================================================================== >>>>>> --- trunk/orte/mca/rmcast/base/rmcast_base_open.c (original) >>>>>> +++ trunk/orte/mca/rmcast/base/rmcast_base_open.c 2011-08-26 >>>>>> 18:16:14 EDT (Fri, 26 Aug 2011) >>>>>> @@ -341,7 +341,7 @@ >>>>>> { >>>>>> ptr->name.jobid = ORTE_JOBID_INVALID; >>>>>> ptr->name.vpid = ORTE_VPID_INVALID; >>>>>> - ptr->name.epoch = ORTE_EPOCH_MIN; >>>>>> + ORTE_EPOCH_SET(ptr->name.epoch,ORTE_EPOCH_MIN); >>>>>> ptr->channel = ORTE_RMCAST_INVALID_CHANNEL; >>>>>> 
OBJ_CONSTRUCT(&ptr->ctl, orte_thread_ctl_t); >>>>>> ptr->seq_num = ORTE_RMCAST_SEQ_INVALID; >>>>>> @@ -430,7 +430,7 @@ >>>>>> { >>>>>> ptr->name.jobid = ORTE_JOBID_INVALID; >>>>>> ptr->name.vpid = ORTE_VPID_INVALID; >>>>>> - ptr->name.epoch = ORTE_EPOCH_MIN; >>>>>> + ORTE_EPOCH_SET(ptr->name.epoch,ORTE_EPOCH_MIN); >>>>>> OBJ_CONSTRUCT(&ptr->last_msg, opal_list_t); >>>>>> } >>>>>> static void recvlog_destruct(rmcast_recv_log_t *ptr) >>>>>> @@ -439,7 +439,7 @@ >>>>>> >>>>>> ptr->name.jobid = ORTE_JOBID_INVALID; >>>>>> ptr->name.vpid = ORTE_VPID_INVALID; >>>>>> - ptr->name.epoch = ORTE_EPOCH_INVALID; >>>>>> + ORTE_EPOCH_SET(ptr->name.epoch,ORTE_EPOCH_INVALID); >>>>>> while (NULL != (item = opal_list_remove_first(&ptr->last_msg))) { >>>>>> OBJ_RELEASE(item); >>>>>> } >>>>>> >>>>>> Modified: trunk/orte/mca/rmcast/tcp/rmcast_tcp.c >>>>>> ============================================================================== >>>>>> --- trunk/orte/mca/rmcast/tcp/rmcast_tcp.c (original) >>>>>> +++ trunk/orte/mca/rmcast/tcp/rmcast_tcp.c 2011-08-26 18:16:14 EDT >>>>>> (Fri, 26 Aug 2011) >>>>>> @@ -681,7 +681,7 @@ >>>>>> /* caller requested id of sender */ >>>>>> name->jobid = recvptr->name.jobid; >>>>>> name->vpid = recvptr->name.vpid; >>>>>> - name->epoch= recvptr->name.epoch; >>>>>> + ORTE_EPOCH_SET(name->epoch,recvptr->name.epoch); >>>>>> } >>>>>> *seq_num = recvptr->seq_num; >>>>>> *msg = recvptr->iovec_array; >>>>>> @@ -776,7 +776,7 @@ >>>>>> /* caller requested id of sender */ >>>>>> name->jobid = recvptr->name.jobid; >>>>>> name->vpid = recvptr->name.vpid; >>>>>> - name->epoch= recvptr->name.epoch; >>>>>> + ORTE_EPOCH_SET(name->epoch,recvptr->name.epoch); >>>>>> } >>>>>> *seq_num = recvptr->seq_num; >>>>>> if (ORTE_SUCCESS != (ret = opal_dss.copy_payload(buf, recvptr->buf))) { >>>>>> >>>>>> Modified: trunk/orte/mca/rmcast/udp/rmcast_udp.c >>>>>> ============================================================================== >>>>>> --- trunk/orte/mca/rmcast/udp/rmcast_udp.c 
(original) >>>>>> +++ trunk/orte/mca/rmcast/udp/rmcast_udp.c 2011-08-26 18:16:14 EDT >>>>>> (Fri, 26 Aug 2011) >>>>>> @@ -460,7 +460,7 @@ >>>>>> /* caller requested id of sender */ >>>>>> name->jobid = recvptr->name.jobid; >>>>>> name->vpid = recvptr->name.vpid; >>>>>> - name->epoch= recvptr->name.epoch; >>>>>> + ORTE_EPOCH_SET(name->epoch,recvptr->name.epoch); >>>>>> } >>>>>> *seq_num = recvptr->seq_num; >>>>>> *msg = recvptr->iovec_array; >>>>>> @@ -553,7 +553,7 @@ >>>>>> /* caller requested id of sender */ >>>>>> name->jobid = recvptr->name.jobid; >>>>>> name->vpid = recvptr->name.vpid; >>>>>> - name->epoch= recvptr->name.epoch; >>>>>> + ORTE_EPOCH_SET(name->epoch,recvptr->name.epoch); >>>>>> } >>>>>> *seq_num = recvptr->seq_num; >>>>>> if (ORTE_SUCCESS != (ret = opal_dss.copy_payload(buf, recvptr->buf))) { >>>>>> >>>>>> Modified: trunk/orte/mca/rml/base/rml_base_components.c >>>>>> ============================================================================== >>>>>> --- trunk/orte/mca/rml/base/rml_base_components.c (original) >>>>>> +++ trunk/orte/mca/rml/base/rml_base_components.c 2011-08-26 >>>>>> 18:16:14 EDT (Fri, 26 Aug 2011) >>>>>> @@ -20,6 +20,7 @@ >>>>>> #include "opal/util/output.h" >>>>>> >>>>>> #include "orte/mca/rml/rml.h" >>>>>> +#include "orte/util/name_fns.h" >>>>>> >>>>>> #if !ORTE_DISABLE_FULL_SUPPORT >>>>>> >>>>>> @@ -67,14 +68,14 @@ >>>>>> { >>>>>> pkt->sender.jobid = ORTE_JOBID_INVALID; >>>>>> pkt->sender.vpid = ORTE_VPID_INVALID; >>>>>> - pkt->sender.epoch = ORTE_EPOCH_MIN; >>>>>> + ORTE_EPOCH_SET(pkt->sender.epoch,ORTE_EPOCH_MIN); >>>>>> pkt->buffer = NULL; >>>>>> } >>>>>> static void msg_pkt_destructor(orte_msg_packet_t *pkt) >>>>>> { >>>>>> pkt->sender.jobid = ORTE_JOBID_INVALID; >>>>>> pkt->sender.vpid = ORTE_VPID_INVALID; >>>>>> - pkt->sender.epoch = ORTE_EPOCH_INVALID; >>>>>> + ORTE_EPOCH_SET(pkt->sender.epoch,ORTE_EPOCH_INVALID); >>>>>> if (NULL != pkt->buffer) { >>>>>> OBJ_RELEASE(pkt->buffer); >>>>>> } >>>>>> >>>>>> Modified: 
trunk/orte/mca/rml/rml_types.h >>>>>> ============================================================================== >>>>>> --- trunk/orte/mca/rml/rml_types.h (original) >>>>>> +++ trunk/orte/mca/rml/rml_types.h 2011-08-26 18:16:14 EDT (Fri, >>>>>> 26 Aug 2011) >>>>>> @@ -62,7 +62,7 @@ >>>>>> pkt = OBJ_NEW(orte_msg_packet_t); \ >>>>>> pkt->sender.jobid = (sndr)->jobid; \ >>>>>> pkt->sender.vpid = (sndr)->vpid; \ >>>>>> - pkt->sender.epoch = (sndr)->epoch; \ >>>>>> + ORTE_EPOCH_SET(pkt->sender.epoch,(sndr)->epoch); \ >>>>>> if ((crt)) { \ >>>>>> pkt->buffer = OBJ_NEW(opal_buffer_t); \ >>>>>> opal_dss.copy_payload(pkt->buffer, *(buf)); \ >>>>>> @@ -85,7 +85,7 @@ >>>>>> pkt = OBJ_NEW(orte_msg_packet_t); \ >>>>>> pkt->sender.jobid = (sndr)->jobid; \ >>>>>> pkt->sender.vpid = (sndr)->vpid; \ >>>>>> - pkt->sender.epoch = (sndr)->epoch; \ >>>>>> + ORTE_EPOCH_SET(pkt->sender.epoch,(sndr)->epoch); \ >>>>>> if ((crt)) { \ >>>>>> pkt->buffer = OBJ_NEW(opal_buffer_t); \ >>>>>> opal_dss.copy_payload(pkt->buffer, *(buf)); \ >>>>>> @@ -191,8 +191,10 @@ >>>>>> >>>>>> #define ORTE_RML_TAG_SUBSCRIBE 46 >>>>>> >>>>>> +#if ORTE_ENABLE_EPOCH >>>>>> /* For Epoch Updates */ >>>>>> #define ORTE_RML_TAG_EPOCH_CHANGE 47 >>>>>> +#endif >>>>>> >>>>>> /* Notify of failed processes */ >>>>>> #define ORTE_RML_TAG_FAILURE_NOTICE 48 >>>>>> >>>>>> Modified: trunk/orte/mca/routed/base/routed_base_components.c >>>>>> ============================================================================== >>>>>> --- trunk/orte/mca/routed/base/routed_base_components.c (original) >>>>>> +++ trunk/orte/mca/routed/base/routed_base_components.c 2011-08-26 >>>>>> 18:16:14 EDT (Fri, 26 Aug 2011) >>>>>> @@ -65,7 +65,7 @@ >>>>>> { >>>>>> ptr->route.jobid = ORTE_JOBID_INVALID; >>>>>> ptr->route.vpid = ORTE_VPID_INVALID; >>>>>> - ptr->route.epoch = ORTE_EPOCH_MIN; >>>>>> + ORTE_EPOCH_SET(ptr->route.epoch,ORTE_EPOCH_MIN); >>>>>> ptr->hnp_uri = NULL; >>>>>> } >>>>>> static void jfamdest(orte_routed_jobfam_t *ptr) >>>>>> @@ 
-117,7 +117,7 @@ >>>>>> jfam = OBJ_NEW(orte_routed_jobfam_t); >>>>>> jfam->route.jobid = ORTE_PROC_MY_HNP->jobid; >>>>>> jfam->route.vpid = ORTE_PROC_MY_HNP->vpid; >>>>>> - jfam->route.epoch = ORTE_PROC_MY_HNP->epoch; >>>>>> + ORTE_EPOCH_SET(jfam->route.epoch,ORTE_PROC_MY_HNP->epoch); >>>>>> jfam->job_family = ORTE_JOB_FAMILY(ORTE_PROC_MY_NAME->jobid); >>>>>> if (NULL != orte_process_info.my_hnp_uri) { >>>>>> jfam->hnp_uri = strdup(orte_process_info.my_hnp_uri); >>>>>> @@ -252,7 +252,7 @@ >>>>>> jfam->job_family = jobfamily; >>>>>> jfam->route.jobid = name.jobid; >>>>>> jfam->route.vpid = name.vpid; >>>>>> - jfam->route.epoch = name.epoch; >>>>>> + ORTE_EPOCH_SET(jfam->route.epoch,name.epoch); >>>>>> jfam->hnp_uri = strdup(uri); >>>>>> done: >>>>>> free(uri); >>>>>> >>>>>> Modified: trunk/orte/mca/routed/base/routed_base_register_sync.c >>>>>> ============================================================================== >>>>>> --- trunk/orte/mca/routed/base/routed_base_register_sync.c >>>>>> (original) >>>>>> +++ trunk/orte/mca/routed/base/routed_base_register_sync.c >>>>>> 2011-08-26 18:16:14 EDT (Fri, 26 Aug 2011) >>>>>> @@ -127,7 +127,9 @@ >>>>>> orte_std_cntr_t cnt; >>>>>> char *rml_uri; >>>>>> orte_vpid_t vpid; >>>>>> +#if ORTE_ENABLE_EPOCH >>>>>> orte_epoch_t epoch; >>>>>> +#endif >>>>>> int rc; >>>>>> >>>>>> if (ORTE_JOB_FAMILY(job) == ORTE_JOB_FAMILY(ORTE_PROC_MY_NAME->jobid)) { >>>>>> @@ -146,11 +148,13 @@ >>>>>> cnt = 1; >>>>>> while (ORTE_SUCCESS == (rc = opal_dss.unpack(buffer, &vpid, &cnt, >>>>>> ORTE_VPID))) { >>>>>> >>>>>> +#if ORTE_ENABLE_EPOCH >>>>>> cnt = 1; >>>>>> if (ORTE_SUCCESS != (rc = opal_dss.unpack(buffer, &epoch, &cnt, >>>>>> ORTE_EPOCH))) { >>>>>> ORTE_ERROR_LOG(rc); >>>>>> continue; >>>>>> } >>>>>> +#endif >>>>>> >>>>>> if (ORTE_SUCCESS != (rc = opal_dss.unpack(buffer, &rml_uri, &cnt, >>>>>> OPAL_STRING))) { >>>>>> ORTE_ERROR_LOG(rc); >>>>>> >>>>>> Modified: trunk/orte/mca/routed/binomial/routed_binomial.c >>>>>> 
============================================================================== >>>>>> --- trunk/orte/mca/routed/binomial/routed_binomial.c (original) >>>>>> +++ trunk/orte/mca/routed/binomial/routed_binomial.c 2011-08-26 >>>>>> 18:16:14 EDT (Fri, 26 Aug 2011) >>>>>> @@ -33,6 +33,7 @@ >>>>>> #include "orte/runtime/orte_globals.h" >>>>>> #include "orte/runtime/orte_wait.h" >>>>>> #include "orte/runtime/runtime.h" >>>>>> +#include "orte/runtime/data_type_support/orte_dt_support.h" >>>>>> >>>>>> #include "orte/mca/rml/base/rml_contact.h" >>>>>> >>>>>> @@ -147,7 +148,7 @@ >>>>>> >>>>>> if (proc->jobid == ORTE_JOBID_INVALID || >>>>>> proc->vpid == ORTE_VPID_INVALID || >>>>>> - proc->epoch == ORTE_EPOCH_INVALID) { >>>>>> + 0 == ORTE_EPOCH_CMP(proc->epoch,ORTE_EPOCH_INVALID)) { >>>>>> return ORTE_ERR_BAD_PARAM; >>>>>> } >>>>>> >>>>>> @@ -216,7 +217,7 @@ >>>>>> >>>>>> if (target->jobid == ORTE_JOBID_INVALID || >>>>>> target->vpid == ORTE_VPID_INVALID || >>>>>> - target->epoch == ORTE_EPOCH_INVALID) { >>>>>> + 0 == ORTE_EPOCH_CMP(target->epoch,ORTE_EPOCH_INVALID)) { >>>>>> return ORTE_ERR_BAD_PARAM; >>>>>> } >>>>>> >>>>>> @@ -274,8 +275,7 @@ >>>>>> ORTE_NAME_PRINT(route))); >>>>>> jfam->route.jobid = route->jobid; >>>>>> jfam->route.vpid = route->vpid; >>>>>> - jfam->route.epoch = ORTE_EPOCH_INVALID; >>>>>> - jfam->route.epoch = >>>>>> orte_ess.proc_get_epoch(&jfam->route); >>>>>> + >>>>>> ORTE_EPOCH_SET(jfam->route.epoch,orte_ess.proc_get_epoch(&jfam->route)); >>>>>> >>>>>> return ORTE_SUCCESS; >>>>>> } >>>>>> @@ -290,8 +290,7 @@ >>>>>> jfam->job_family = jfamily; >>>>>> jfam->route.jobid = route->jobid; >>>>>> jfam->route.vpid = route->vpid; >>>>>> - jfam->route.epoch = ORTE_EPOCH_INVALID; >>>>>> - jfam->route.epoch = orte_ess.proc_get_epoch(&jfam->route); >>>>>> + >>>>>> ORTE_EPOCH_SET(jfam->route.epoch,orte_ess.proc_get_epoch(&jfam->route)); >>>>>> >>>>>> opal_pointer_array_add(&orte_routed_jobfams, jfam); >>>>>> return ORTE_SUCCESS; >>>>>> @@ -317,11 +316,21 @@ >>>>>> 
/* initialize */ >>>>>> daemon.jobid = ORTE_PROC_MY_DAEMON->jobid; >>>>>> daemon.vpid = ORTE_PROC_MY_DAEMON->vpid; >>>>>> - daemon.epoch = ORTE_PROC_MY_DAEMON->epoch; >>>>>> + ORTE_EPOCH_SET(daemon.epoch,ORTE_PROC_MY_DAEMON->epoch); >>>>>> >>>>>> +#if ORTE_ENABLE_EPOCH >>>>>> if (target->jobid == ORTE_JOBID_INVALID || >>>>>> target->vpid == ORTE_VPID_INVALID || >>>>>> target->epoch == ORTE_EPOCH_INVALID) { >>>>>> +#else >>>>>> + if (target->jobid == ORTE_JOBID_INVALID || >>>>>> + target->vpid == ORTE_VPID_INVALID) { >>>>>> +#endif >>>>>> + ret = ORTE_NAME_INVALID; >>>>>> + goto found; >>>>>> + } >>>>>> + >>>>>> + if (0 > ORTE_EPOCH_CMP(target->epoch, >>>>>> orte_ess.proc_get_epoch(target))) { >>>>>> ret = ORTE_NAME_INVALID; >>>>>> goto found; >>>>>> } >>>>>> @@ -443,7 +452,7 @@ >>>>>> >>>>>> /* If the daemon to which we should be routing is dead, then >>>>>> update >>>>>> * the routing tree and start over. */ >>>>>> - if (!orte_util_proc_is_running(&daemon)) { >>>>>> + if (!PROC_IS_RUNNING(&daemon)) { >>>>>> update_routing_tree(daemon.jobid); >>>>>> goto startover; >>>>>> } >>>>>> @@ -461,8 +470,7 @@ >>>>>> ret = &daemon; >>>>>> >>>>>> found: >>>>>> - daemon.epoch = ORTE_EPOCH_INVALID; >>>>>> - daemon.epoch = orte_ess.proc_get_epoch(&daemon); >>>>>> + ORTE_EPOCH_SET(daemon.epoch,orte_ess.proc_get_epoch(&daemon)); >>>>>> >>>>>> OPAL_OUTPUT_VERBOSE((1, orte_routed_base_output, >>>>>> "%s routed_binomial_get(%s) --> %s", >>>>>> @@ -879,7 +887,7 @@ >>>>>> */ >>>>>> local_lifeline.jobid = proc->jobid; >>>>>> local_lifeline.vpid = proc->vpid; >>>>>> - local_lifeline.epoch = proc->epoch; >>>>>> + ORTE_EPOCH_SET(local_lifeline.epoch,proc->epoch); >>>>>> lifeline = &local_lifeline; >>>>>> >>>>>> return ORTE_SUCCESS; >>>>>> @@ -924,11 +932,11 @@ >>>>>> * that process so we can check it's state. 
>>>>>> */ >>>>>> proc_name.vpid = peer; >>>>>> - proc_name.epoch = orte_util_lookup_epoch(&proc_name); >>>>>> + >>>>>> ORTE_EPOCH_SET(proc_name.epoch,orte_util_lookup_epoch(&proc_name)); >>>>>> >>>>>> - if (!orte_util_proc_is_running(&proc_name) >>>>>> - && ORTE_EPOCH_MIN < proc_name.epoch >>>>>> - && ORTE_EPOCH_INVALID != proc_name.epoch) { >>>>>> + if (!PROC_IS_RUNNING(&proc_name) >>>>>> + && 0 < >>>>>> ORTE_EPOCH_CMP(ORTE_EPOCH_MIN,proc_name.epoch) >>>>>> + && 0 != >>>>>> ORTE_EPOCH_CMP(ORTE_EPOCH_INVALID,proc_name.epoch)) { >>>>>> OPAL_OUTPUT_VERBOSE((3, orte_routed_base_output, >>>>>> "%s routed:binomial child %s is >>>>>> dead", >>>>>> ORTE_NAME_PRINT(ORTE_PROC_MY_NAME), >>>>>> @@ -967,7 +975,7 @@ >>>>>> } >>>>>> >>>>>> /* find the children of this rank */ >>>>>> - OPAL_OUTPUT_VERBOSE((3, orte_routed_base_output, >>>>>> + OPAL_OUTPUT_VERBOSE((5, orte_routed_base_output, >>>>>> "%s routed:binomial find children of rank %d", >>>>>> ORTE_NAME_PRINT(ORTE_PROC_MY_NAME), rank)); >>>>>> bitmap = opal_cube_dim(num_procs); >>>>>> @@ -977,24 +985,25 @@ >>>>>> >>>>>> for (i = hibit + 1, mask = 1 << i; i <= bitmap; ++i, mask <<= 1) { >>>>>> peer = rank | mask; >>>>>> - OPAL_OUTPUT_VERBOSE((3, orte_routed_base_output, >>>>>> + OPAL_OUTPUT_VERBOSE((5, orte_routed_base_output, >>>>>> "%s routed:binomial find children checking peer >>>>>> %d", >>>>>> ORTE_NAME_PRINT(ORTE_PROC_MY_NAME), peer)); >>>>>> if (peer < num_procs) { >>>>>> - OPAL_OUTPUT_VERBOSE((3, orte_routed_base_output, >>>>>> + OPAL_OUTPUT_VERBOSE((5, orte_routed_base_output, >>>>>> "%s routed:binomial find children computing >>>>>> tree", >>>>>> ORTE_NAME_PRINT(ORTE_PROC_MY_NAME))); >>>>>> /* execute compute on this child */ >>>>>> if (0 <= (found = binomial_tree(peer, rank, me, num_procs, >>>>>> nchildren, childrn, relatives, mine, jobid))) { >>>>>> proc_name.vpid = found; >>>>>> >>>>>> - if (!orte_util_proc_is_running(&proc_name) && >>>>>> ORTE_EPOCH_MIN < orte_util_lookup_epoch(&proc_name)) { >>>>>> - 
OPAL_OUTPUT_VERBOSE((3, orte_routed_base_output, >>>>>> + if (!PROC_IS_RUNNING(&proc_name) >>>>>> + && 0 < >>>>>> ORTE_EPOCH_CMP(ORTE_EPOCH_MIN,orte_util_lookup_epoch(&proc_name))) { >>>>>> + OPAL_OUTPUT_VERBOSE((5, orte_routed_base_output, >>>>>> "%s routed:binomial find children >>>>>> proc out of date - returning parent %d", >>>>>> ORTE_NAME_PRINT(ORTE_PROC_MY_NAME), >>>>>> parent)); >>>>>> return parent; >>>>>> } >>>>>> - OPAL_OUTPUT_VERBOSE((3, orte_routed_base_output, >>>>>> + OPAL_OUTPUT_VERBOSE((5, orte_routed_base_output, >>>>>> "%s routed:binomial find children >>>>>> returning found value %d", >>>>>> ORTE_NAME_PRINT(ORTE_PROC_MY_NAME), >>>>>> found)); >>>>>> return found; >>>>>> @@ -1029,8 +1038,7 @@ >>>>>> ORTE_PROC_MY_PARENT->vpid = binomial_tree(0, 0, ORTE_PROC_MY_NAME->vpid, >>>>>> orte_process_info.max_procs, >>>>>> &num_children, &my_children, NULL, true, >>>>>> jobid); >>>>>> - ORTE_PROC_MY_PARENT->epoch = ORTE_EPOCH_INVALID; >>>>>> - ORTE_PROC_MY_PARENT->epoch = >>>>>> orte_ess.proc_get_epoch(ORTE_PROC_MY_PARENT); >>>>>> + >>>>>> ORTE_EPOCH_SET(ORTE_PROC_MY_PARENT->epoch,orte_ess.proc_get_epoch(ORTE_PROC_MY_PARENT)); >>>>>> >>>>>> if (0 < opal_output_get_verbosity(orte_routed_base_output)) { >>>>>> opal_output(0, "%s: parent %d num_children %d", >>>>>> ORTE_NAME_PRINT(ORTE_PROC_MY_NAME), ORTE_PROC_MY_PARENT->vpid, >>>>>> num_children); >>>>>> >>>>>> Modified: trunk/orte/mca/routed/cm/routed_cm.c >>>>>> ============================================================================== >>>>>> --- trunk/orte/mca/routed/cm/routed_cm.c (original) >>>>>> +++ trunk/orte/mca/routed/cm/routed_cm.c 2011-08-26 18:16:14 EDT (Fri, >>>>>> 26 Aug 2011) >>>>>> @@ -35,6 +35,7 @@ >>>>>> #include "orte/runtime/orte_globals.h" >>>>>> #include "orte/runtime/orte_wait.h" >>>>>> #include "orte/runtime/runtime.h" >>>>>> +#include "orte/runtime/data_type_support/orte_dt_support.h" >>>>>> >>>>>> #include "orte/mca/rml/base/rml_contact.h" >>>>>> >>>>>> @@ -139,7 +140,7 @@ 
>>>>>> >>>>>> if (proc->jobid == ORTE_JOBID_INVALID || >>>>>> proc->vpid == ORTE_VPID_INVALID || >>>>>> - proc->epoch == ORTE_EPOCH_INVALID) { >>>>>> + 0 == ORTE_EPOCH_CMP(proc->epoch,ORTE_EPOCH_INVALID)) { >>>>>> return ORTE_ERR_BAD_PARAM; >>>>>> } >>>>>> >>>>>> @@ -200,7 +201,7 @@ >>>>>> >>>>>> if (target->jobid == ORTE_JOBID_INVALID || >>>>>> target->vpid == ORTE_VPID_INVALID || >>>>>> - target->epoch == ORTE_EPOCH_INVALID) { >>>>>> + 0 == ORTE_EPOCH_CMP(target->epoch,ORTE_EPOCH_INVALID)) { >>>>>> return ORTE_ERR_BAD_PARAM; >>>>>> } >>>>>> >>>>>> @@ -257,8 +258,7 @@ >>>>>> ORTE_NAME_PRINT(route))); >>>>>> jfam->route.jobid = route->jobid; >>>>>> jfam->route.vpid = route->vpid; >>>>>> - jfam->route.epoch = ORTE_EPOCH_INVALID; >>>>>> - jfam->route.epoch = >>>>>> orte_ess.proc_get_epoch(&jfam->route); >>>>>> + >>>>>> ORTE_EPOCH_SET(jfam->route.epoch,orte_ess.proc_get_epoch(&jfam->route)); >>>>>> >>>>>> return ORTE_SUCCESS; >>>>>> } >>>>>> @@ -273,8 +273,7 @@ >>>>>> jfam->job_family = jfamily; >>>>>> jfam->route.jobid = route->jobid; >>>>>> jfam->route.vpid = route->vpid; >>>>>> - jfam->route.epoch = ORTE_EPOCH_INVALID; >>>>>> - jfam->route.epoch = orte_ess.proc_get_epoch(&jfam->route); >>>>>> + >>>>>> ORTE_EPOCH_SET(jfam->route.epoch,orte_ess.proc_get_epoch(&jfam->route)); >>>>>> >>>>>> opal_pointer_array_add(&orte_routed_jobfams, jfam); >>>>>> return ORTE_SUCCESS; >>>>>> @@ -299,7 +298,7 @@ >>>>>> >>>>>> if (target->jobid == ORTE_JOBID_INVALID || >>>>>> target->vpid == ORTE_VPID_INVALID || >>>>>> - target->epoch == ORTE_EPOCH_INVALID) { >>>>>> + 0 == ORTE_EPOCH_CMP(target->epoch,ORTE_EPOCH_INVALID)) { >>>>>> ret = ORTE_NAME_INVALID; >>>>>> goto found; >>>>>> } >>>>>> @@ -367,8 +366,7 @@ >>>>>> } >>>>>> >>>>>> /* Initialize daemon's epoch, based on its current vpid/jobid */ >>>>>> - daemon.epoch = ORTE_EPOCH_INVALID; >>>>>> - daemon.epoch = orte_ess.proc_get_epoch(&daemon); >>>>>> + ORTE_EPOCH_SET(daemon.epoch,orte_ess.proc_get_epoch(&daemon)); >>>>>> >>>>>> /* if 
the daemon is me, then send direct to the target! */ >>>>>> if (ORTE_PROC_MY_NAME->vpid == daemon.vpid) { >>>>>> @@ -814,8 +812,7 @@ >>>>>> */ >>>>>> local_lifeline.jobid = proc->jobid; >>>>>> local_lifeline.vpid = proc->vpid; >>>>>> - local_lifeline.epoch = ORTE_EPOCH_INVALID; >>>>>> - local_lifeline.epoch = orte_ess.proc_get_epoch(&local_lifeline); >>>>>> + >>>>>> ORTE_EPOCH_SET(local_lifeline.epoch,orte_ess.proc_get_epoch(&local_lifeline)); >>>>>> >>>>>> lifeline = &local_lifeline; >>>>>> >>>>>> >>>>>> Modified: trunk/orte/mca/routed/direct/routed_direct.c >>>>>> ============================================================================== >>>>>> --- trunk/orte/mca/routed/direct/routed_direct.c (original) >>>>>> +++ trunk/orte/mca/routed/direct/routed_direct.c 2011-08-26 18:16:14 EDT >>>>>> (Fri, 26 Aug 2011) >>>>>> @@ -24,6 +24,7 @@ >>>>>> #include "orte/util/name_fns.h" >>>>>> #include "orte/util/proc_info.h" >>>>>> #include "orte/runtime/orte_globals.h" >>>>>> +#include "orte/runtime/data_type_support/orte_dt_support.h" >>>>>> >>>>>> #include "orte/mca/rml/base/rml_contact.h" >>>>>> >>>>>> @@ -135,7 +136,7 @@ >>>>>> >>>>>> if (target->jobid == ORTE_JOBID_INVALID || >>>>>> target->vpid == ORTE_VPID_INVALID || >>>>>> - target->epoch == ORTE_EPOCH_INVALID) { >>>>>> + 0 == ORTE_EPOCH_CMP(target->epoch,ORTE_EPOCH_INVALID)) { >>>>>> ret = ORTE_NAME_INVALID; >>>>>> } else { >>>>>> /* all routes are direct */ >>>>>> >>>>>> Modified: trunk/orte/mca/routed/linear/routed_linear.c >>>>>> ============================================================================== >>>>>> --- trunk/orte/mca/routed/linear/routed_linear.c (original) >>>>>> +++ trunk/orte/mca/routed/linear/routed_linear.c 2011-08-26 18:16:14 EDT >>>>>> (Fri, 26 Aug 2011) >>>>>> @@ -31,6 +31,7 @@ >>>>>> #include "orte/runtime/orte_globals.h" >>>>>> #include "orte/runtime/orte_wait.h" >>>>>> #include "orte/runtime/runtime.h" >>>>>> +#include "orte/runtime/data_type_support/orte_dt_support.h" >>>>>> >>>>>> 
#include "orte/mca/rml/base/rml_contact.h" >>>>>> >>>>>> @@ -132,7 +133,7 @@ >>>>>> >>>>>> if (proc->jobid == ORTE_JOBID_INVALID || >>>>>> proc->vpid == ORTE_VPID_INVALID || >>>>>> - proc->epoch == ORTE_EPOCH_INVALID) { >>>>>> + 0 == ORTE_EPOCH_CMP(proc->epoch,ORTE_EPOCH_INVALID)) { >>>>>> return ORTE_ERR_BAD_PARAM; >>>>>> } >>>>>> >>>>>> @@ -201,7 +202,7 @@ >>>>>> >>>>>> if (target->jobid == ORTE_JOBID_INVALID || >>>>>> target->vpid == ORTE_VPID_INVALID || >>>>>> - target->epoch == ORTE_EPOCH_INVALID) { >>>>>> + 0 == ORTE_EPOCH_CMP(target->epoch,ORTE_EPOCH_INVALID)) { >>>>>> return ORTE_ERR_BAD_PARAM; >>>>>> } >>>>>> >>>>>> @@ -259,7 +260,7 @@ >>>>>> ORTE_NAME_PRINT(route))); >>>>>> jfam->route.jobid = route->jobid; >>>>>> jfam->route.vpid = route->vpid; >>>>>> - jfam->route.epoch = route->epoch; >>>>>> + ORTE_EPOCH_SET(jfam->route.epoch,route->epoch); >>>>>> return ORTE_SUCCESS; >>>>>> } >>>>>> } >>>>>> @@ -273,7 +274,7 @@ >>>>>> jfam->job_family = jfamily; >>>>>> jfam->route.jobid = route->jobid; >>>>>> jfam->route.vpid = route->vpid; >>>>>> - jfam->route.epoch = route->epoch; >>>>>> + ORTE_EPOCH_SET(jfam->route.epoch,route->epoch); >>>>>> opal_pointer_array_add(&orte_routed_jobfams, jfam); >>>>>> return ORTE_SUCCESS; >>>>>> } >>>>>> @@ -373,8 +374,7 @@ >>>>>> } >>>>>> >>>>>> /* Initialize daemon's epoch, based on its current vpid/jobid */ >>>>>> - daemon.epoch = ORTE_EPOCH_INVALID; >>>>>> - daemon.epoch = orte_ess.proc_get_epoch(&daemon); >>>>>> + ORTE_EPOCH_SET(daemon.epoch,orte_ess.proc_get_epoch(&daemon)); >>>>>> >>>>>> /* if the daemon is me, then send direct to the target! 
*/ >>>>>> if (ORTE_PROC_MY_NAME->vpid == daemon.vpid) { >>>>>> @@ -395,8 +395,7 @@ >>>>>> /* we are at end of chain - wrap around */ >>>>>> daemon.vpid = 0; >>>>>> } >>>>>> - daemon.epoch = ORTE_EPOCH_INVALID; >>>>>> - daemon.epoch = orte_ess.proc_get_epoch(&daemon); >>>>>> + >>>>>> ORTE_EPOCH_SET(daemon.epoch,orte_ess.proc_get_epoch(&daemon)); >>>>>> ret = &daemon; >>>>>> } >>>>>> } >>>>>> @@ -741,7 +740,7 @@ >>>>>> */ >>>>>> local_lifeline.jobid = proc->jobid; >>>>>> local_lifeline.vpid = proc->vpid; >>>>>> - local_lifeline.epoch = proc->epoch; >>>>>> + ORTE_EPOCH_SET(local_lifeline.epoch,proc->epoch); >>>>>> lifeline = &local_lifeline; >>>>>> >>>>>> return ORTE_SUCCESS; >>>>>> >>>>>> Modified: trunk/orte/mca/routed/radix/routed_radix.c >>>>>> ============================================================================== >>>>>> --- trunk/orte/mca/routed/radix/routed_radix.c (original) >>>>>> +++ trunk/orte/mca/routed/radix/routed_radix.c 2011-08-26 18:16:14 EDT >>>>>> (Fri, 26 Aug 2011) >>>>>> @@ -31,6 +31,7 @@ >>>>>> #include "orte/runtime/orte_globals.h" >>>>>> #include "orte/runtime/orte_wait.h" >>>>>> #include "orte/runtime/runtime.h" >>>>>> +#include "orte/runtime/data_type_support/orte_dt_support.h" >>>>>> >>>>>> #include "orte/mca/rml/base/rml_contact.h" >>>>>> >>>>>> @@ -145,7 +146,7 @@ >>>>>> >>>>>> if (proc->jobid == ORTE_JOBID_INVALID || >>>>>> proc->vpid == ORTE_VPID_INVALID || >>>>>> - proc->epoch == ORTE_EPOCH_INVALID) { >>>>>> + 0 == ORTE_EPOCH_CMP(proc->epoch,ORTE_EPOCH_INVALID)) { >>>>>> return ORTE_ERR_BAD_PARAM; >>>>>> } >>>>>> >>>>>> @@ -214,7 +215,7 @@ >>>>>> >>>>>> if (target->jobid == ORTE_JOBID_INVALID || >>>>>> target->vpid == ORTE_VPID_INVALID || >>>>>> - target->epoch == ORTE_EPOCH_INVALID) { >>>>>> + 0 == ORTE_EPOCH_CMP(target->epoch,ORTE_EPOCH_INVALID)) { >>>>>> return ORTE_ERR_BAD_PARAM; >>>>>> } >>>>>> >>>>>> @@ -272,7 +273,7 @@ >>>>>> ORTE_NAME_PRINT(route))); >>>>>> jfam->route.jobid = route->jobid; >>>>>> jfam->route.vpid = 
route->vpid; >>>>>> - jfam->route.epoch = route->epoch; >>>>>> + ORTE_EPOCH_SET(jfam->route.epoch,route->epoch); >>>>>> return ORTE_SUCCESS; >>>>>> } >>>>>> } >>>>>> @@ -286,7 +287,7 @@ >>>>>> jfam->job_family = jfamily; >>>>>> jfam->route.jobid = route->jobid; >>>>>> jfam->route.vpid = route->vpid; >>>>>> - jfam->route.epoch = route->epoch; >>>>>> + ORTE_EPOCH_SET(jfam->route.epoch,route->epoch); >>>>>> opal_pointer_array_add(&orte_routed_jobfams, jfam); >>>>>> return ORTE_SUCCESS; >>>>>> } >>>>>> @@ -310,7 +311,7 @@ >>>>>> >>>>>> if (target->jobid == ORTE_JOBID_INVALID || >>>>>> target->vpid == ORTE_VPID_INVALID || >>>>>> - target->epoch == ORTE_EPOCH_INVALID) { >>>>>> + 0 == ORTE_EPOCH_CMP(target->epoch,ORTE_EPOCH_INVALID)) { >>>>>> ret = ORTE_NAME_INVALID; >>>>>> goto found; >>>>>> } >>>>>> @@ -413,8 +414,7 @@ >>>>>> if (opal_bitmap_is_set_bit(&child->relatives, daemon.vpid)) { >>>>>> /* yep - we need to step through this child */ >>>>>> daemon.vpid = child->vpid; >>>>>> - daemon.epoch = ORTE_EPOCH_INVALID; >>>>>> - daemon.epoch = orte_ess.proc_get_epoch(&daemon); >>>>>> + >>>>>> ORTE_EPOCH_SET(daemon.epoch,orte_ess.proc_get_epoch(&daemon)); >>>>>> ret = &daemon; >>>>>> goto found; >>>>>> } >>>>>> @@ -425,8 +425,7 @@ >>>>>> * any of our children, so we have to step up through our parent >>>>>> */ >>>>>> daemon.vpid = ORTE_PROC_MY_PARENT->vpid; >>>>>> - daemon.epoch = ORTE_EPOCH_INVALID; >>>>>> - daemon.epoch = orte_ess.proc_get_epoch(&daemon); >>>>>> + ORTE_EPOCH_SET(daemon.epoch,orte_ess.proc_get_epoch(&daemon)); >>>>>> >>>>>> ret = &daemon; >>>>>> >>>>>> @@ -788,7 +787,7 @@ >>>>>> */ >>>>>> local_lifeline.jobid = proc->jobid; >>>>>> local_lifeline.vpid = proc->vpid; >>>>>> - local_lifeline.epoch = proc->epoch; >>>>>> + ORTE_EPOCH_SET(local_lifeline.epoch,proc->epoch); >>>>>> lifeline = &local_lifeline; >>>>>> >>>>>> return ORTE_SUCCESS; >>>>>> @@ -881,8 +880,7 @@ >>>>>> ORTE_PROC_MY_PARENT->vpid = (Ii-Sum) % NInPrevLevel; >>>>>> ORTE_PROC_MY_PARENT->vpid += 
(Sum - NInPrevLevel); >>>>>> } >>>>>> - ORTE_PROC_MY_PARENT->epoch = ORTE_EPOCH_INVALID; >>>>>> - ORTE_PROC_MY_PARENT->epoch = >>>>>> orte_ess.proc_get_epoch(ORTE_PROC_MY_PARENT); >>>>>> + >>>>>> ORTE_EPOCH_SET(ORTE_PROC_MY_PARENT->epoch,orte_ess.proc_get_epoch(ORTE_PROC_MY_PARENT)); >>>>>> >>>>>> /* compute my direct children and the bitmap that shows which vpids >>>>>> * lie underneath their branch >>>>>> >>>>>> Modified: trunk/orte/mca/routed/slave/routed_slave.c >>>>>> ============================================================================== >>>>>> --- trunk/orte/mca/routed/slave/routed_slave.c (original) >>>>>> +++ trunk/orte/mca/routed/slave/routed_slave.c 2011-08-26 18:16:14 EDT >>>>>> (Fri, 26 Aug 2011) >>>>>> @@ -26,6 +26,7 @@ >>>>>> #include "orte/runtime/orte_globals.h" >>>>>> #include "orte/runtime/orte_wait.h" >>>>>> #include "orte/runtime/runtime.h" >>>>>> +#include "orte/runtime/data_type_support/orte_dt_support.h" >>>>>> >>>>>> #include "orte/mca/rml/base/rml_contact.h" >>>>>> >>>>>> @@ -134,7 +135,7 @@ >>>>>> >>>>>> if (target->jobid == ORTE_JOBID_INVALID || >>>>>> target->vpid == ORTE_VPID_INVALID || >>>>>> - target->epoch == ORTE_EPOCH_INVALID) { >>>>>> + 0 == ORTE_EPOCH_CMP(target->epoch,ORTE_EPOCH_INVALID)) { >>>>>> ret = ORTE_NAME_INVALID; >>>>>> } else { >>>>>> /* a slave must always route via its parent daemon */ >>>>>> @@ -275,8 +276,7 @@ >>>>>> */ >>>>>> local_lifeline.jobid = proc->jobid; >>>>>> local_lifeline.vpid = proc->vpid; >>>>>> - local_lifeline.epoch = ORTE_EPOCH_INVALID; >>>>>> - local_lifeline.epoch = orte_ess.proc_get_epoch(&local_lifeline); >>>>>> + >>>>>> ORTE_EPOCH_SET(local_lifeline.epoch,orte_ess.proc_get_epoch(&local_lifeline)); >>>>>> >>>>>> lifeline = &local_lifeline; >>>>>> >>>>>> >>>>>> Modified: trunk/orte/mca/sensor/file/sensor_file.c >>>>>> ============================================================================== >>>>>> --- trunk/orte/mca/sensor/file/sensor_file.c (original) >>>>>> +++ 
trunk/orte/mca/sensor/file/sensor_file.c 2011-08-26 18:16:14 EDT (Fri, 26 Aug 2011)
>>>>>> @@ -70,7 +70,9 @@
>>>>>> opal_list_item_t super;
>>>>>> orte_jobid_t jobid;
>>>>>> orte_vpid_t vpid;
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>> orte_epoch_t epoch;
>>>>>> +#endif
>>>>>> char *file;
>>>>>> int tick;
>>>>>> bool check_size;
>>>>>>
>>>>>> Modified: trunk/orte/mca/snapc/base/snapc_base_fns.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/mca/snapc/base/snapc_base_fns.c (original)
>>>>>> +++ trunk/orte/mca/snapc/base/snapc_base_fns.c 2011-08-26 18:16:14 EDT (Fri, 26 Aug 2011)
>>>>>> @@ -81,7 +81,7 @@
>>>>>> {
>>>>>> snapshot->process_name.jobid = 0;
>>>>>> snapshot->process_name.vpid = 0;
>>>>>> - snapshot->process_name.epoch = ORTE_EPOCH_MIN;
>>>>>> + ORTE_EPOCH_SET(snapshot->process_name.epoch,ORTE_EPOCH_MIN);
>>>>>>
>>>>>> snapshot->state = ORTE_SNAPC_CKPT_STATE_NONE;
>>>>>>
>>>>>> @@ -92,7 +92,7 @@
>>>>>> {
>>>>>> snapshot->process_name.jobid = 0;
>>>>>> snapshot->process_name.vpid = 0;
>>>>>> - snapshot->process_name.epoch = ORTE_EPOCH_MIN;
>>>>>> + ORTE_EPOCH_SET(snapshot->process_name.epoch,ORTE_EPOCH_MIN);
>>>>>>
>>>>>> snapshot->state = ORTE_SNAPC_CKPT_STATE_NONE;
>>>>>>
>>>>>>
>>>>>> Modified: trunk/orte/mca/snapc/full/snapc_full_global.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/mca/snapc/full/snapc_full_global.c (original)
>>>>>> +++ trunk/orte/mca/snapc/full/snapc_full_global.c 2011-08-26 18:16:14 EDT (Fri, 26 Aug 2011)
>>>>>> @@ -427,7 +427,7 @@
>>>>>> new_proc = OBJ_NEW(orte_proc_t);
>>>>>> new_proc->name.jobid = proc->name.jobid;
>>>>>> new_proc->name.vpid = proc->name.vpid;
>>>>>> - new_proc->name.epoch = proc->name.epoch;
>>>>>> + ORTE_EPOCH_SET(new_proc->name.epoch,proc->name.epoch);
>>>>>> new_proc->node = OBJ_NEW(orte_node_t);
>>>>>> new_proc->node->name = proc->node->name;
>>>>>> opal_list_append(migrating_procs, &new_proc->super);
>>>>>> @@ -618,7 +618,7 @@
>>>>>>
>>>>>> orted_snapshot->process_name.jobid = cur_node->daemon->name.jobid;
>>>>>> orted_snapshot->process_name.vpid = cur_node->daemon->name.vpid;
>>>>>> - orted_snapshot->process_name.epoch = cur_node->daemon->name.epoch;
>>>>>> + ORTE_EPOCH_SET(orted_snapshot->process_name.epoch,cur_node->daemon->name.epoch);
>>>>>>
>>>>>> mask = ORTE_NS_CMP_JOBID;
>>>>>>
>>>>>> @@ -636,7 +636,7 @@
>>>>>>
>>>>>> app_snapshot->process_name.jobid = procs[p]->name.jobid;
>>>>>> app_snapshot->process_name.vpid = procs[p]->name.vpid;
>>>>>> - app_snapshot->process_name.epoch = procs[p]->name.epoch;
>>>>>> + ORTE_EPOCH_SET(app_snapshot->process_name.epoch,procs[p]->name.epoch);
>>>>>>
>>>>>> opal_list_append(&(orted_snapshot->super.local_snapshots), &(app_snapshot->super));
>>>>>> }
>>>>>> @@ -800,7 +800,7 @@
>>>>>>
>>>>>> app_snapshot->process_name.jobid = procs[p]->name.jobid;
>>>>>> app_snapshot->process_name.vpid = procs[p]->name.vpid;
>>>>>> - app_snapshot->process_name.epoch = procs[p]->name.epoch;
>>>>>> + ORTE_EPOCH_SET(app_snapshot->process_name.epoch,procs[p]->name.epoch);
>>>>>>
>>>>>> opal_list_append(&(orted_snapshot->super.local_snapshots), &(app_snapshot->super));
>>>>>> }
>>>>>> @@ -816,7 +816,7 @@
>>>>>>
>>>>>> orted_snapshot->process_name.jobid = cur_node->daemon->name.jobid;
>>>>>> orted_snapshot->process_name.vpid = cur_node->daemon->name.vpid;
>>>>>> - orted_snapshot->process_name.epoch = cur_node->daemon->name.epoch;
>>>>>> + ORTE_EPOCH_SET(orted_snapshot->process_name.epoch,cur_node->daemon->name.epoch);
>>>>>>
>>>>>> mask = ORTE_NS_CMP_ALL;
>>>>>>
>>>>>> @@ -837,7 +837,7 @@
>>>>>>
>>>>>> app_snapshot->process_name.jobid = procs[p]->name.jobid;
>>>>>> app_snapshot->process_name.vpid = procs[p]->name.vpid;
>>>>>> - app_snapshot->process_name.epoch = procs[p]->name.epoch;
>>>>>> + ORTE_EPOCH_SET(app_snapshot->process_name.epoch,procs[p]->name.epoch);
>>>>>>
>>>>>> opal_list_append(&(orted_snapshot->super.local_snapshots), &(app_snapshot->super));
>>>>>> }
>>>>>>
>>>>>> Modified: trunk/orte/mca/snapc/full/snapc_full_local.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/mca/snapc/full/snapc_full_local.c (original)
>>>>>> +++ trunk/orte/mca/snapc/full/snapc_full_local.c 2011-08-26 18:16:14 EDT (Fri, 26 Aug 2011)
>>>>>> @@ -2033,7 +2033,7 @@
>>>>>> vpid_snapshot->process_pid = child->pid;
>>>>>> vpid_snapshot->super.process_name.jobid = child->name->jobid;
>>>>>> vpid_snapshot->super.process_name.vpid = child->name->vpid;
>>>>>> - vpid_snapshot->super.process_name.epoch = child->name->epoch;
>>>>>> + ORTE_EPOCH_SET(vpid_snapshot->super.process_name.epoch,child->name->epoch);
>>>>>> }
>>>>>> }
>>>>>>
>>>>>> @@ -2095,7 +2095,7 @@
>>>>>> vpid_snapshot->process_pid = child->pid;
>>>>>> vpid_snapshot->super.process_name.jobid = child->name->jobid;
>>>>>> vpid_snapshot->super.process_name.vpid = child->name->vpid;
>>>>>> - vpid_snapshot->super.process_name.epoch = child->name->epoch;
>>>>>> + ORTE_EPOCH_SET(vpid_snapshot->super.process_name.epoch,child->name->epoch);
>>>>>> /*vpid_snapshot->migrating = true;*/
>>>>>>
>>>>>> opal_list_append(&(local_global_snapshot.local_snapshots), &(vpid_snapshot->super.super));
>>>>>> @@ -2111,7 +2111,7 @@
>>>>>> vpid_snapshot->process_pid = child->pid;
>>>>>> vpid_snapshot->super.process_name.jobid = child->name->jobid;
>>>>>> vpid_snapshot->super.process_name.vpid = child->name->vpid;
>>>>>> - vpid_snapshot->super.process_name.epoch = child->name->epoch;
>>>>>> + ORTE_EPOCH_SET(vpid_snapshot->super.process_name.epoch,child->name->epoch);
>>>>>> }
>>>>>> }
>>>>>>
>>>>>>
>>>>>> Modified: trunk/orte/mca/snapc/full/snapc_full_module.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/mca/snapc/full/snapc_full_module.c (original)
>>>>>> +++ trunk/orte/mca/snapc/full/snapc_full_module.c 2011-08-26 18:16:14 EDT (Fri, 26 Aug 2011)
>>>>>> @@ -83,7 +83,7 @@
>>>>>> void orte_snapc_full_orted_construct(orte_snapc_full_orted_snapshot_t *snapshot) {
>>>>>> snapshot->process_name.jobid = 0;
>>>>>> snapshot->process_name.vpid = 0;
>>>>>> - snapshot->process_name.epoch = 0;
>>>>>> + ORTE_EPOCH_SET(snapshot->process_name.epoch,0);
>>>>>>
>>>>>> snapshot->state = ORTE_SNAPC_CKPT_STATE_NONE;
>>>>>> }
>>>>>> @@ -91,7 +91,7 @@
>>>>>> void orte_snapc_full_orted_destruct( orte_snapc_full_orted_snapshot_t *snapshot) {
>>>>>> snapshot->process_name.jobid = 0;
>>>>>> snapshot->process_name.vpid = 0;
>>>>>> - snapshot->process_name.epoch = 0;
>>>>>> + ORTE_EPOCH_SET(snapshot->process_name.epoch,0);
>>>>>>
>>>>>> snapshot->state = ORTE_SNAPC_CKPT_STATE_NONE;
>>>>>> }
>>>>>>
>>>>>> Modified: trunk/orte/mca/sstore/base/sstore_base_fns.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/mca/sstore/base/sstore_base_fns.c (original)
>>>>>> +++ trunk/orte/mca/sstore/base/sstore_base_fns.c 2011-08-26 18:16:14 EDT (Fri, 26 Aug 2011)
>>>>>> @@ -62,7 +62,7 @@
>>>>>> {
>>>>>> snapshot->process_name.jobid = 0;
>>>>>> snapshot->process_name.vpid = 0;
>>>>>> - snapshot->process_name.epoch = ORTE_EPOCH_MIN;
>>>>>> + ORTE_EPOCH_SET(snapshot->process_name.epoch,ORTE_EPOCH_MIN);
>>>>>>
>>>>>> snapshot->crs_comp = NULL;
>>>>>> snapshot->compress_comp = NULL;
>>>>>> @@ -76,7 +76,7 @@
>>>>>> {
>>>>>> snapshot->process_name.jobid = 0;
>>>>>> snapshot->process_name.vpid = 0;
>>>>>> - snapshot->process_name.epoch = ORTE_EPOCH_MIN;
>>>>>> + ORTE_EPOCH_SET(snapshot->process_name.epoch,ORTE_EPOCH_MIN);
>>>>>>
>>>>>> if( NULL != snapshot->crs_comp ) {
>>>>>> free(snapshot->crs_comp);
>>>>>> @@ -637,7 +637,7 @@
>>>>>>
>>>>>> vpid_snapshot->process_name.jobid = proc.jobid;
>>>>>> vpid_snapshot->process_name.vpid = proc.vpid;
>>>>>> - vpid_snapshot->process_name.epoch = proc.epoch;
>>>>>> + ORTE_EPOCH_SET(vpid_snapshot->process_name.epoch,proc.epoch);
>>>>>> }
>>>>>> else if(0 == strncmp(token, SSTORE_METADATA_LOCAL_CRS_COMP_STR, strlen(SSTORE_METADATA_LOCAL_CRS_COMP_STR))) {
>>>>>> vpid_snapshot->crs_comp = strdup(value);
>>>>>>
>>>>>> Modified: trunk/orte/mca/sstore/central/sstore_central_global.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/mca/sstore/central/sstore_central_global.c (original)
>>>>>> +++ trunk/orte/mca/sstore/central/sstore_central_global.c 2011-08-26 18:16:14 EDT (Fri, 26 Aug 2011)
>>>>>> @@ -1216,8 +1216,7 @@
>>>>>>
>>>>>> vpid_snapshot->process_name.jobid = handle_info->jobid;
>>>>>> vpid_snapshot->process_name.vpid = i;
>>>>>> - vpid_snapshot->process_name.epoch = ORTE_EPOCH_INVALID;
>>>>>> - vpid_snapshot->process_name.epoch = orte_ess.proc_get_epoch(&vpid_snapshot->process_name);
>>>>>> + ORTE_EPOCH_SET(vpid_snapshot->process_name.epoch,orte_ess.proc_get_epoch(&vpid_snapshot->process_name));
>>>>>>
>>>>>> vpid_snapshot->crs_comp = NULL;
>>>>>> global_snapshot->start_time = NULL;
>>>>>>
>>>>>> Modified: trunk/orte/mca/sstore/central/sstore_central_local.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/mca/sstore/central/sstore_central_local.c (original)
>>>>>> +++ trunk/orte/mca/sstore/central/sstore_central_local.c 2011-08-26 18:16:14 EDT (Fri, 26 Aug 2011)
>>>>>> @@ -210,7 +210,7 @@
>>>>>> {
>>>>>> info->name.jobid = ORTE_JOBID_INVALID;
>>>>>> info->name.vpid = ORTE_VPID_INVALID;
>>>>>> - info->name.epoch = ORTE_EPOCH_MIN;
>>>>>> + ORTE_EPOCH_SET(info->name.epoch,ORTE_EPOCH_MIN);
>>>>>>
>>>>>> info->local_location = NULL;
>>>>>> info->metadata_filename = NULL;
>>>>>> @@ -222,7 +222,7 @@
>>>>>> {
>>>>>> info->name.jobid = ORTE_JOBID_INVALID;
>>>>>> info->name.vpid = ORTE_VPID_INVALID;
>>>>>> - info->name.epoch = ORTE_EPOCH_MIN;
>>>>>> + ORTE_EPOCH_SET(info->name.epoch,ORTE_EPOCH_MIN);
>>>>>>
>>>>>> if( NULL != info->local_location ) {
>>>>>> free(info->local_location);
>>>>>> @@ -535,7 +535,7 @@
>>>>>>
>>>>>> app_info->name.jobid = name->jobid;
>>>>>> app_info->name.vpid = name->vpid;
>>>>>> - app_info->name.epoch = name->epoch;
>>>>>> + ORTE_EPOCH_SET(app_info->name.epoch,name->epoch);
>>>>>>
>>>>>> opal_list_append(handle_info->app_info_handle, &(app_info->super));
>>>>>>
>>>>>>
>>>>>> Modified: trunk/orte/mca/sstore/stage/sstore_stage_global.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/mca/sstore/stage/sstore_stage_global.c (original)
>>>>>> +++ trunk/orte/mca/sstore/stage/sstore_stage_global.c 2011-08-26 18:16:14 EDT (Fri, 26 Aug 2011)
>>>>>> @@ -1218,10 +1218,10 @@
>>>>>> p_set = OBJ_NEW(orte_filem_base_process_set_t);
>>>>>> p_set->source.jobid = peer->jobid;
>>>>>> p_set->source.vpid = peer->vpid;
>>>>>> - p_set->source.epoch = peer->epoch;
>>>>>> + ORTE_EPOCH_SET(p_set->source.epoch,peer->epoch);
>>>>>> p_set->sink.jobid = ORTE_PROC_MY_NAME->jobid;
>>>>>> p_set->sink.vpid = ORTE_PROC_MY_NAME->vpid;
>>>>>> - p_set->sink.epoch = ORTE_PROC_MY_NAME->epoch;
>>>>>> + ORTE_EPOCH_SET(p_set->sink.epoch,ORTE_PROC_MY_NAME->epoch);
>>>>>> opal_list_append(&(filem_request->process_sets), &(p_set->super) );
>>>>>> }
>>>>>>
>>>>>> @@ -1706,8 +1706,7 @@
>>>>>>
>>>>>> vpid_snapshot->process_name.jobid = handle_info->jobid;
>>>>>> vpid_snapshot->process_name.vpid = i;
>>>>>> - vpid_snapshot->process_name.epoch = ORTE_EPOCH_INVALID;
>>>>>> - vpid_snapshot->process_name.epoch = orte_ess.proc_get_epoch(&vpid_snapshot->process_name);
>>>>>> + ORTE_EPOCH_SET(vpid_snapshot->process_name.epoch,orte_ess.proc_get_epoch(&vpid_snapshot->process_name));
>>>>>>
>>>>>> /* JJH: Currently we do not have this information since we do not save
>>>>>> * individual vpid info in the Global SStore. It is in the metadata
>>>>>>
>>>>>> Modified: trunk/orte/mca/sstore/stage/sstore_stage_local.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/mca/sstore/stage/sstore_stage_local.c (original)
>>>>>> +++ trunk/orte/mca/sstore/stage/sstore_stage_local.c 2011-08-26 18:16:14 EDT (Fri, 26 Aug 2011)
>>>>>> @@ -287,7 +287,7 @@
>>>>>> {
>>>>>> info->name.jobid = ORTE_JOBID_INVALID;
>>>>>> info->name.vpid = ORTE_VPID_INVALID;
>>>>>> - info->name.epoch = ORTE_EPOCH_MIN;
>>>>>> + ORTE_EPOCH_SET(info->name.epoch,ORTE_EPOCH_MIN);
>>>>>>
>>>>>> info->local_location = NULL;
>>>>>> info->compressed_local_location = NULL;
>>>>>> @@ -302,7 +302,7 @@
>>>>>> {
>>>>>> info->name.jobid = ORTE_JOBID_INVALID;
>>>>>> info->name.vpid = ORTE_VPID_INVALID;
>>>>>> - info->name.epoch = ORTE_EPOCH_MIN;
>>>>>> + ORTE_EPOCH_SET(info->name.epoch,ORTE_EPOCH_MIN);
>>>>>>
>>>>>> if( NULL != info->local_location ) {
>>>>>> free(info->local_location);
>>>>>> @@ -1014,7 +1014,7 @@
>>>>>>
>>>>>> app_info->name.jobid = name->jobid;
>>>>>> app_info->name.vpid = name->vpid;
>>>>>> - app_info->name.epoch = name->epoch;
>>>>>> + ORTE_EPOCH_SET(app_info->name.epoch,name->epoch);
>>>>>>
>>>>>> opal_list_append(handle_info->app_info_handle, &(app_info->super));
>>>>>>
>>>>>> @@ -2057,17 +2057,17 @@
>>>>>> /* if I am the HNP, then use me as the source */
>>>>>> p_set->source.jobid = ORTE_PROC_MY_NAME->jobid;
>>>>>> p_set->source.vpid = ORTE_PROC_MY_NAME->vpid;
>>>>>> - p_set->source.epoch = ORTE_PROC_MY_NAME->epoch;
>>>>>> + ORTE_EPOCH_SET(p_set->source.epoch,ORTE_PROC_MY_NAME->epoch);
>>>>>> }
>>>>>> else {
>>>>>> /* otherwise, set the HNP as the source */
>>>>>> p_set->source.jobid = ORTE_PROC_MY_HNP->jobid;
>>>>>> p_set->source.vpid = ORTE_PROC_MY_HNP->vpid;
>>>>>> - p_set->source.epoch = ORTE_PROC_MY_HNP->epoch;
>>>>>> + ORTE_EPOCH_SET(p_set->source.epoch,ORTE_PROC_MY_HNP->epoch);
>>>>>> }
>>>>>> p_set->sink.jobid = ORTE_PROC_MY_NAME->jobid;
>>>>>> p_set->sink.vpid = ORTE_PROC_MY_NAME->vpid;
>>>>>> - p_set->sink.epoch = ORTE_PROC_MY_NAME->epoch;
>>>>>> + ORTE_EPOCH_SET(p_set->sink.epoch,ORTE_PROC_MY_NAME->epoch);
>>>>>> opal_list_append(&(filem_request->process_sets), &(p_set->super) );
>>>>>>
>>>>>> /* Define the file set */
>>>>>>
>>>>>> Modified: trunk/orte/orted/orted_comm.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/orted/orted_comm.c (original)
>>>>>> +++ trunk/orte/orted/orted_comm.c 2011-08-26 18:16:14 EDT (Fri, 26 Aug 2011)
>>>>>> @@ -123,18 +123,13 @@
>>>>>> nm = (orte_routed_tree_t*)item;
>>>>>>
>>>>>> target.vpid = nm->vpid;
>>>>>> - target.epoch = orte_util_lookup_epoch(&target);
>>>>>> + ORTE_EPOCH_SET(target.epoch,orte_ess.proc_get_epoch(&target));
>>>>>>
>>>>>> - if (!orte_util_proc_is_running(&target)) {
>>>>>> + if (!PROC_IS_RUNNING(&target)) {
>>>>>> continue;
>>>>>> }
>>>>>>
>>>>>> - target.epoch = ORTE_EPOCH_INVALID;
>>>>>> - if (ORTE_NODE_RANK_INVALID == (target.epoch = orte_ess.proc_get_epoch(&target))) {
>>>>>> - /* If we are trying to send to a previously failed process it's
>>>>>> - * better to fail silently. */
>>>>>> - continue;
>>>>>> - }
>>>>>> + ORTE_EPOCH_SET(target.epoch,orte_ess.proc_get_epoch(&target));
>>>>>>
>>>>>> OPAL_OUTPUT_VERBOSE((1, orte_debug_output,
>>>>>> "%s orte:daemon:send_relay sending relay msg to %s",
>>>>>> @@ -422,7 +417,8 @@
>>>>>> proct = OBJ_NEW(orte_proc_t);
>>>>>> proct->name.jobid = proc.jobid;
>>>>>> proct->name.vpid = proc.vpid;
>>>>>> - proct->name.epoch = proc.epoch;
>>>>>> + ORTE_EPOCH_SET(proct->name.epoch,proc.epoch);
>>>>>> +
>>>>>> opal_pointer_array_add(&procarray, proct);
>>>>>> num_replies++;
>>>>>> }
>>>>>> @@ -1059,7 +1055,9 @@
>>>>>> orte_job_t *jdata;
>>>>>> orte_proc_t *proc;
>>>>>> orte_vpid_t vpid;
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>> orte_epoch_t epoch;
>>>>>> +#endif
>>>>>> int32_t i, num_procs;
>>>>>>
>>>>>> /* setup the answer */
>>>>>> @@ -1086,12 +1084,14 @@
>>>>>> goto CLEANUP;
>>>>>> }
>>>>>>
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>> /* unpack the epoch */
>>>>>> n = 1;
>>>>>> if (ORTE_SUCCESS != (ret = opal_dss.unpack(buffer, &epoch, &n, ORTE_EPOCH))) {
>>>>>> ORTE_ERROR_LOG(ret);
>>>>>> goto CLEANUP;
>>>>>> }
>>>>>> +#endif
>>>>>>
>>>>>> /* if they asked for a specific proc, then just get that info */
>>>>>> if (ORTE_VPID_WILDCARD != vpid) {
>>>>>> @@ -1201,7 +1201,7 @@
>>>>>> /* loop across all daemons */
>>>>>> proc2.jobid = ORTE_PROC_MY_NAME->jobid;
>>>>>> for (proc2.vpid=1; proc2.vpid < orte_process_info.num_procs; proc2.vpid++) {
>>>>>> - proc2.epoch = orte_util_lookup_epoch(&proc2);
>>>>>> + ORTE_EPOCH_SET(proc2.epoch,orte_util_lookup_epoch(&proc2));
>>>>>>
>>>>>> /* setup the cmd */
>>>>>> relay_msg = OBJ_NEW(opal_buffer_t);
>>>>>>
>>>>>> Modified: trunk/orte/orted/orted_main.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/orted/orted_main.c (original)
>>>>>> +++ trunk/orte/orted/orted_main.c 2011-08-26 18:16:14 EDT (Fri, 26 Aug 2011)
>>>>>> @@ -388,14 +388,14 @@
>>>>>> orte_process_info.my_daemon_uri = orte_rml.get_contact_info();
>>>>>> ORTE_PROC_MY_DAEMON->jobid = ORTE_PROC_MY_NAME->jobid;
>>>>>> ORTE_PROC_MY_DAEMON->vpid = ORTE_PROC_MY_NAME->vpid;
>>>>>> - ORTE_PROC_MY_DAEMON->epoch = ORTE_EPOCH_MIN;
>>>>>> + ORTE_EPOCH_SET(ORTE_PROC_MY_DAEMON->epoch,ORTE_EPOCH_MIN);
>>>>>>
>>>>>> /* if I am also the hnp, then update that contact info field too */
>>>>>> if (ORTE_PROC_IS_HNP) {
>>>>>> orte_process_info.my_hnp_uri = orte_rml.get_contact_info();
>>>>>> ORTE_PROC_MY_HNP->jobid = ORTE_PROC_MY_NAME->jobid;
>>>>>> ORTE_PROC_MY_HNP->vpid = ORTE_PROC_MY_NAME->vpid;
>>>>>> - ORTE_PROC_MY_HNP->epoch = ORTE_EPOCH_MIN;
>>>>>> + ORTE_EPOCH_SET(ORTE_PROC_MY_HNP->epoch,ORTE_EPOCH_MIN);
>>>>>> }
>>>>>>
>>>>>> /* setup the primary daemon command receive function */
>>>>>> @@ -495,7 +495,8 @@
>>>>>> proc = OBJ_NEW(orte_proc_t);
>>>>>> proc->name.jobid = jdata->jobid;
>>>>>> proc->name.vpid = 0;
>>>>>> - proc->name.epoch = ORTE_EPOCH_MIN;
>>>>>> + ORTE_EPOCH_SET(proc->name.epoch,ORTE_EPOCH_MIN);
>>>>>> +
>>>>>> proc->state = ORTE_PROC_STATE_RUNNING;
>>>>>> proc->app_idx = 0;
>>>>>> proc->node = nodes[0]; /* hnp node must be there */
>>>>>>
>>>>>> Modified: trunk/orte/runtime/data_type_support/orte_dt_compare_fns.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/runtime/data_type_support/orte_dt_compare_fns.c (original)
>>>>>> +++ trunk/orte/runtime/data_type_support/orte_dt_compare_fns.c 2011-08-26 18:16:14 EDT (Fri, 26 Aug 2011)
>>>>>> @@ -76,6 +76,7 @@
>>>>>> }
>>>>>> }
>>>>>>
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>> /** check the epochs - if one of them is WILDCARD, then ignore
>>>>>> * this field since anything is okay
>>>>>> */
>>>>>> @@ -87,6 +88,7 @@
>>>>>> return OPAL_VALUE1_GREATER;
>>>>>> }
>>>>>> }
>>>>>> +#endif
>>>>>>
>>>>>> /** only way to get here is if all fields are equal or WILDCARD */
>>>>>> return OPAL_EQUAL;
>>>>>> @@ -122,6 +124,7 @@
>>>>>> return OPAL_EQUAL;
>>>>>> }
>>>>>>
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>> int orte_dt_compare_epoch(orte_epoch_t *value1,
>>>>>> orte_epoch_t *value2,
>>>>>> opal_data_type_t type)
>>>>>> @@ -136,6 +139,7 @@
>>>>>>
>>>>>> return OPAL_EQUAL;
>>>>>> }
>>>>>> +#endif
>>>>>>
>>>>>> #if !ORTE_DISABLE_FULL_SUPPORT
>>>>>> /**
>>>>>>
>>>>>> Modified: trunk/orte/runtime/data_type_support/orte_dt_copy_fns.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/runtime/data_type_support/orte_dt_copy_fns.c (original)
>>>>>> +++ trunk/orte/runtime/data_type_support/orte_dt_copy_fns.c 2011-08-26 18:16:14 EDT (Fri, 26 Aug 2011)
>>>>>> @@ -61,7 +61,7 @@
>>>>>>
>>>>>> val->jobid = src->jobid;
>>>>>> val->vpid = src->vpid;
>>>>>> - val->epoch = src->epoch;
>>>>>> + ORTE_EPOCH_SET(val->epoch,src->epoch);
>>>>>>
>>>>>> *dest = val;
>>>>>> return ORTE_SUCCESS;
>>>>>> @@ -105,6 +105,7 @@
>>>>>> return ORTE_SUCCESS;
>>>>>> }
>>>>>>
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>> /*
>>>>>> * EPOCH
>>>>>> */
>>>>>> @@ -123,6 +124,7 @@
>>>>>>
>>>>>> return ORTE_SUCCESS;
>>>>>> }
>>>>>> +#endif
>>>>>>
>>>>>> #if !ORTE_DISABLE_FULL_SUPPORT
>>>>>>
>>>>>>
>>>>>> Modified: trunk/orte/runtime/data_type_support/orte_dt_packing_fns.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/runtime/data_type_support/orte_dt_packing_fns.c (original)
>>>>>> +++ trunk/orte/runtime/data_type_support/orte_dt_packing_fns.c 2011-08-26 18:16:14 EDT (Fri, 26 Aug 2011)
>>>>>> @@ -58,7 +58,9 @@
>>>>>> orte_process_name_t* proc;
>>>>>> orte_jobid_t *jobid;
>>>>>> orte_vpid_t *vpid;
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>> orte_epoch_t *epoch;
>>>>>> +#endif
>>>>>>
>>>>>> /* collect all the jobids in a contiguous array */
>>>>>> jobid = (orte_jobid_t*)malloc(num_vals * sizeof(orte_jobid_t));
>>>>>> @@ -100,6 +102,7 @@
>>>>>> }
>>>>>> free(vpid);
>>>>>>
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>> /* Collect all the epochs in a contiguous array */
>>>>>> epoch = (orte_epoch_t *) malloc(num_vals * sizeof(orte_epoch_t));
>>>>>> if (NULL == epoch) {
>>>>>> @@ -118,6 +121,7 @@
>>>>>> return rc;
>>>>>> }
>>>>>> free(epoch);
>>>>>> +#endif
>>>>>>
>>>>>> return ORTE_SUCCESS;
>>>>>> }
>>>>>> @@ -156,6 +160,7 @@
>>>>>> return ret;
>>>>>> }
>>>>>>
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>> /*
>>>>>> * EPOCH
>>>>>> */
>>>>>> @@ -171,6 +176,7 @@
>>>>>>
>>>>>> return ret;
>>>>>> }
>>>>>> +#endif
>>>>>>
>>>>>> #if !ORTE_DISABLE_FULL_SUPPORT
>>>>>> /*
>>>>>>
>>>>>> Modified: trunk/orte/runtime/data_type_support/orte_dt_print_fns.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/runtime/data_type_support/orte_dt_print_fns.c (original)
>>>>>> +++ trunk/orte/runtime/data_type_support/orte_dt_print_fns.c 2011-08-26 18:16:14 EDT (Fri, 26 Aug 2011)
>>>>>> @@ -125,8 +125,10 @@
>>>>>> orte_dt_quick_print(output, "ORTE_STD_CNTR", prefix, src, ORTE_STD_CNTR_T);
>>>>>> break;
>>>>>>
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>> case ORTE_EPOCH:
>>>>>> orte_dt_quick_print(output, "ORTE_EPOCH", prefix, src, ORTE_EPOCH_T);
>>>>>> +#endif
>>>>>>
>>>>>> case ORTE_VPID:
>>>>>> orte_dt_quick_print(output, "ORTE_VPID", prefix, src, ORTE_VPID_T);
>>>>>> @@ -478,11 +480,21 @@
>>>>>> if (orte_xml_output) {
>>>>>> /* need to create the output in XML format */
>>>>>> if (0 == src->pid) {
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>> asprintf(output, "%s<process rank=\"%s\" status=\"%s\" epoch=\"%s\"/>\n", pfx2,
>>>>>> ORTE_VPID_PRINT(src->name.vpid), orte_proc_state_to_str(src->state), ORTE_EPOCH_PRINT(src->name.epoch));
>>>>>> +#else
>>>>>> + asprintf(output, "%s<process rank=\"%s\" status=\"%s\"/>\n", pfx2,
>>>>>> + ORTE_VPID_PRINT(src->name.vpid), orte_proc_state_to_str(src->state));
>>>>>> +#endif
>>>>>> } else {
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>> asprintf(output, "%s<process rank=\"%s\" pid=\"%d\" status=\"%s\" epoch=\"%s\"/>\n", pfx2,
>>>>>> ORTE_VPID_PRINT(src->name.vpid), (int)src->pid, orte_proc_state_to_str(src->state), ORTE_EPOCH_PRINT(src->name.epoch));
>>>>>> +#else
>>>>>> + asprintf(output, "%s<process rank=\"%s\" pid=\"%d\" status=\"%s\"/>\n", pfx2,
>>>>>> + ORTE_VPID_PRINT(src->name.vpid), (int)src->pid, orte_proc_state_to_str(src->state));
>>>>>> +#endif
>>>>>> }
>>>>>> free(pfx2);
>>>>>> return ORTE_SUCCESS;
>>>>>> @@ -490,10 +502,17 @@
>>>>>>
>>>>>> if (!orte_devel_level_output) {
>>>>>> /* just print a very simple output for users */
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>> asprintf(&tmp, "\n%sProcess OMPI jobid: %s Process rank: %s Epoch: %s", pfx2,
>>>>>> ORTE_JOBID_PRINT(src->name.jobid),
>>>>>> ORTE_VPID_PRINT(src->name.vpid),
>>>>>> ORTE_EPOCH_PRINT(src->name.epoch));
>>>>>> +#else
>>>>>> + asprintf(&tmp, "\n%sProcess OMPI jobid: %s Process rank: %s Epoch: %s", pfx2,
>>>>>> + ORTE_JOBID_PRINT(src->name.jobid),
>>>>>> + ORTE_VPID_PRINT(src->name.vpid));
>>>>>> +#endif
>>>>>> +
>>>>>> /* set the return */
>>>>>> *output = tmp;
>>>>>> free(pfx2);
>>>>>>
>>>>>> Modified: trunk/orte/runtime/data_type_support/orte_dt_size_fns.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/runtime/data_type_support/orte_dt_size_fns.c (original)
>>>>>> +++ trunk/orte/runtime/data_type_support/orte_dt_size_fns.c 2011-08-26 18:16:14 EDT (Fri, 26 Aug 2011)
>>>>>> @@ -45,9 +45,11 @@
>>>>>> *size = sizeof(orte_std_cntr_t);
>>>>>> break;
>>>>>>
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>> case ORTE_EPOCH:
>>>>>> *size = sizeof(orte_epoch_t);
>>>>>> break;
>>>>>> +#endif
>>>>>>
>>>>>> case ORTE_VPID:
>>>>>> *size = sizeof(orte_vpid_t);
>>>>>>
>>>>>> Modified: trunk/orte/runtime/data_type_support/orte_dt_support.h
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/runtime/data_type_support/orte_dt_support.h (original)
>>>>>> +++ trunk/orte/runtime/data_type_support/orte_dt_support.h 2011-08-26 18:16:14 EDT (Fri, 26 Aug 2011)
>>>>>> @@ -52,9 +52,14 @@
>>>>>> int orte_dt_compare_vpid(orte_vpid_t *value1,
>>>>>> orte_vpid_t *value2,
>>>>>> opal_data_type_t type);
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>> int orte_dt_compare_epoch(orte_epoch_t *value1,
>>>>>> orte_epoch_t *value2,
>>>>>> opal_data_type_t type);
>>>>>> +#define ORTE_EPOCH_CMP(n,m) ( (m) - (n) )
>>>>>> +#else
>>>>>> +#define ORTE_EPOCH_CMP(n,m) ( 0 )
>>>>>> +#endif
>>>>>> #if !ORTE_DISABLE_FULL_SUPPORT
>>>>>> int orte_dt_compare_job(orte_job_t *value1, orte_job_t *value2, opal_data_type_t type);
>>>>>> int orte_dt_compare_node(orte_node_t *value1, orte_node_t *value2, opal_data_type_t type);
>>>>>> @@ -86,7 +91,9 @@
>>>>>> int orte_dt_copy_name(orte_process_name_t **dest, orte_process_name_t *src, opal_data_type_t type);
>>>>>> int orte_dt_copy_jobid(orte_jobid_t **dest, orte_jobid_t *src, opal_data_type_t type);
>>>>>> int orte_dt_copy_vpid(orte_vpid_t **dest, orte_vpid_t *src, opal_data_type_t type);
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>> int orte_dt_copy_epoch(orte_epoch_t **dest, orte_epoch_t *src, opal_data_type_t type);
>>>>>> +#endif
>>>>>> #if !ORTE_DISABLE_FULL_SUPPORT
>>>>>> int orte_dt_copy_job(orte_job_t **dest, orte_job_t *src, opal_data_type_t type);
>>>>>> int orte_dt_copy_node(orte_node_t **dest, orte_node_t *src, opal_data_type_t type);
>>>>>> @@ -116,8 +123,10 @@
>>>>>> int32_t num_vals, opal_data_type_t type);
>>>>>> int orte_dt_pack_vpid(opal_buffer_t *buffer, const void *src,
>>>>>> int32_t num_vals, opal_data_type_t type);
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>> int orte_dt_pack_epoch(opal_buffer_t *buffer, const void *src,
>>>>>> int32_t num_vals, opal_data_type_t type);
>>>>>> +#endif
>>>>>> #if !ORTE_DISABLE_FULL_SUPPORT
>>>>>> int orte_dt_pack_job(opal_buffer_t *buffer, const void *src,
>>>>>> int32_t num_vals, opal_data_type_t type);
>>>>>> @@ -185,8 +194,10 @@
>>>>>> int32_t *num_vals, opal_data_type_t type);
>>>>>> int orte_dt_unpack_vpid(opal_buffer_t *buffer, void *dest,
>>>>>> int32_t *num_vals, opal_data_type_t type);
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>> int orte_dt_unpack_epoch(opal_buffer_t *buffer, void *dest,
>>>>>> int32_t *num_vals, opal_data_type_t type);
>>>>>> +#endif
>>>>>> #if !ORTE_DISABLE_FULL_SUPPORT
>>>>>> int orte_dt_unpack_job(opal_buffer_t *buffer, void *dest,
>>>>>> int32_t *num_vals, opal_data_type_t type);
>>>>>>
>>>>>> Modified: trunk/orte/runtime/data_type_support/orte_dt_unpacking_fns.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/runtime/data_type_support/orte_dt_unpacking_fns.c (original)
>>>>>> +++ trunk/orte/runtime/data_type_support/orte_dt_unpacking_fns.c 2011-08-26 18:16:14 EDT (Fri, 26 Aug 2011)
>>>>>> @@ -54,7 +54,9 @@
>>>>>> orte_process_name_t* proc;
>>>>>> orte_jobid_t *jobid;
>>>>>> orte_vpid_t *vpid;
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>> orte_epoch_t *epoch;
>>>>>> +#endif
>>>>>>
>>>>>> num = *num_vals;
>>>>>>
>>>>>> @@ -92,6 +94,7 @@
>>>>>> return rc;
>>>>>> }
>>>>>>
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>> /* collect all the epochs in a contiguous array */
>>>>>> epoch= (orte_epoch_t*)malloc(num * sizeof(orte_epoch_t));
>>>>>> if (NULL == epoch) {
>>>>>> @@ -109,18 +112,21 @@
>>>>>> free(jobid);
>>>>>> return rc;
>>>>>> }
>>>>>> +#endif
>>>>>>
>>>>>> /* build the names from the jobid/vpid/epoch arrays */
>>>>>> proc = (orte_process_name_t*)dest;
>>>>>> for (i=0; i < num; i++) {
>>>>>> proc->jobid = jobid[i];
>>>>>> proc->vpid = vpid[i];
>>>>>> - proc->epoch = epoch[i];
>>>>>> + ORTE_EPOCH_SET(proc->epoch,epoch[i]);
>>>>>> proc++;
>>>>>> }
>>>>>>
>>>>>> /* cleanup */
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>> free(epoch);
>>>>>> +#endif
>>>>>> free(vpid);
>>>>>> free(jobid);
>>>>>>
>>>>>> @@ -159,6 +165,7 @@
>>>>>> return ret;
>>>>>> }
>>>>>>
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>> /*
>>>>>> * EPOCH
>>>>>> */
>>>>>> @@ -174,6 +181,7 @@
>>>>>>
>>>>>> return ret;
>>>>>> }
>>>>>> +#endif
>>>>>>
>>>>>> #if !ORTE_DISABLE_FULL_SUPPORT
>>>>>> /*
>>>>>>
>>>>>> Modified: trunk/orte/runtime/orte_data_server.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/runtime/orte_data_server.c (original)
>>>>>> +++ trunk/orte/runtime/orte_data_server.c 2011-08-26 18:16:14 EDT (Fri, 26 Aug 2011)
>>>>>> @@ -220,7 +220,7 @@
>>>>>> data->port = port_name;
>>>>>> data->owner.jobid = sender->jobid;
>>>>>> data->owner.vpid = sender->vpid;
>>>>>> - data->owner.epoch = sender->epoch;
>>>>>> + ORTE_EPOCH_SET(data->owner.epoch,sender->epoch);
>>>>>>
>>>>>> /* store the data */
>>>>>> data->index = opal_pointer_array_add(orte_data_server_store, data);
>>>>>>
>>>>>> Modified: trunk/orte/runtime/orte_globals.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/runtime/orte_globals.c (original)
>>>>>> +++ trunk/orte/runtime/orte_globals.c 2011-08-26 18:16:14 EDT (Fri, 26 Aug 2011)
>>>>>> @@ -277,6 +277,7 @@
>>>>>> return rc;
>>>>>> }
>>>>>>
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>> tmp = ORTE_EPOCH;
>>>>>> if (ORTE_SUCCESS != (rc = opal_dss.register_type(orte_dt_pack_epoch,
>>>>>> orte_dt_unpack_epoch,
>>>>>> @@ -290,6 +291,7 @@
>>>>>> ORTE_ERROR_LOG(rc);
>>>>>> return rc;
>>>>>> }
>>>>>> +#endif
>>>>>>
>>>>>> #if !ORTE_DISABLE_FULL_SUPPORT
>>>>>> tmp = ORTE_JOB;
>>>>>> @@ -933,7 +935,7 @@
>>>>>> proc->beat = 0;
>>>>>> OBJ_CONSTRUCT(&proc->stats, opal_ring_buffer_t);
>>>>>> opal_ring_buffer_init(&proc->stats, orte_stat_history_size);
>>>>>> - proc->name.epoch = ORTE_EPOCH_MIN;
>>>>>> + ORTE_EPOCH_SET(proc->name.epoch,ORTE_EPOCH_MIN);
>>>>>> #if OPAL_ENABLE_FT_CR == 1
>>>>>> proc->ckpt_state = 0;
>>>>>> proc->ckpt_snapshot_ref = NULL;
>>>>>>
>>>>>> Modified: trunk/orte/runtime/orte_init.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/runtime/orte_init.c (original)
>>>>>> +++ trunk/orte/runtime/orte_init.c 2011-08-26 18:16:14 EDT (Fri, 26 Aug 2011)
>>>>>> @@ -57,8 +57,17 @@
>>>>>> char *orte_prohibited_session_dirs = NULL;
>>>>>> bool orte_create_session_dirs = true;
>>>>>>
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>> +orte_process_name_t orte_name_wildcard = {ORTE_JOBID_WILDCARD, ORTE_VPID_WILDCARD, ORTE_EPOCH_WILDCARD};
>>>>>> +#else
>>>>>> orte_process_name_t orte_name_wildcard = {ORTE_JOBID_WILDCARD, ORTE_VPID_WILDCARD};
>>>>>> +#endif
>>>>>> +
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>> +orte_process_name_t orte_name_invalid = {ORTE_JOBID_INVALID, ORTE_VPID_INVALID, ORTE_EPOCH_INVALID};
>>>>>> +#else
>>>>>> orte_process_name_t orte_name_invalid = {ORTE_JOBID_INVALID, ORTE_VPID_INVALID};
>>>>>> +#endif
>>>>>>
>>>>>>
>>>>>> #if OPAL_CC_USE_PRAGMA_IDENT
>>>>>>
>>>>>> Modified: trunk/orte/runtime/orte_wait.h
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/runtime/orte_wait.h (original)
>>>>>> +++ trunk/orte/runtime/orte_wait.h 2011-08-26 18:16:14 EDT (Fri, 26 Aug 2011)
>>>>>> @@ -204,7 +204,7 @@
>>>>>> mev = OBJ_NEW(orte_message_event_t); \
>>>>>> mev->sender.jobid = (sndr)->jobid; \
>>>>>> mev->sender.vpid = (sndr)->vpid; \
>>>>>> - mev->sender.epoch = (sndr)->epoch; \
>>>>>> + ORTE_EPOCH_SET(mev->sender.epoch,(sndr)->epoch); \
>>>>>> opal_dss.copy_payload(mev->buffer, (buf)); \
>>>>>> mev->tag = (tg); \
>>>>>> mev->file = strdup((buf)->parent.cls_init_file_name); \
>>>>>> @@ -228,7 +228,7 @@
>>>>>> mev = OBJ_NEW(orte_message_event_t); \
>>>>>> mev->sender.jobid = (sndr)->jobid; \
>>>>>> mev->sender.vpid = (sndr)->vpid; \
>>>>>> - mev->sender.epoch = (sndr)->epoch; \
>>>>>> + ORTE_EPOCH_SET(mev->sender.epoch,(sndr)->epoch); \
>>>>>> opal_dss.copy_payload(mev->buffer, (buf)); \
>>>>>> mev->tag = (tg); \
>>>>>> opal_event_evtimer_set(opal_event_base, \
>>>>>> @@ -258,7 +258,7 @@
>>>>>> tmp = OBJ_NEW(orte_notify_event_t); \
>>>>>> tmp->proc.jobid = (data)->jobid; \
>>>>>> tmp->proc.vpid = (data)->vpid; \
>>>>>> - tmp->proc.epoch = (data)->epoch; \
>>>>>> + ORTE_EPOCH_SET(tmp->proc.epoch,(data)->epoch); \
>>>>>> opal_event.evtimer_set(opal_event_base, \
>>>>>> tmp->ev, (cbfunc), tmp); \
>>>>>> now.tv_sec = 0; \
>>>>>>
>>>>>> Modified: trunk/orte/test/system/oob_stress.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/test/system/oob_stress.c (original)
>>>>>> +++ trunk/orte/test/system/oob_stress.c 2011-08-26 18:16:14 EDT (Fri, 26 Aug 2011)
>>>>>> @@ -74,8 +74,7 @@
>>>>>>
>>>>>> for (j=1; j < count+1; j++) {
>>>>>> peer.vpid = (ORTE_PROC_MY_NAME->vpid + j) % orte_process_info.num_procs;
>>>>>> - peer.epoch = ORTE_EPOCH_INVALID;
>>>>>> - peer.epoch = orte_ess.proc_get_epoch(&peer);
>>>>>> + ORTE_EPOCH_SET(peer.epoch,orte_ess.proc_get_epoch(&peer));
>>>>>>
>>>>>> /* rank0 starts ring */
>>>>>> if (ORTE_PROC_MY_NAME->vpid == 0) {
>>>>>>
>>>>>> Modified: trunk/orte/test/system/orte_ring.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/test/system/orte_ring.c (original)
>>>>>> +++ trunk/orte/test/system/orte_ring.c 2011-08-26 18:16:14 EDT (Fri, 26 Aug 2011)
>>>>>> @@ -41,16 +41,14 @@
>>>>>> if( right_peer_orte_name.vpid >= num_peers ) {
>>>>>> right_peer_orte_name.vpid = 0;
>>>>>> }
>>>>>> - right_peer_orte_name.epoch = ORTE_EPOCH_INVALID;
>>>>>> - right_peer_orte_name.epoch = orte_ess.proc_get_epoch(&right_peer_orte_name);
>>>>>> + ORTE_EPOCH_SET(right_peer_orte_name.epoch,orte_ess.proc_get_epoch(&right_peer_orte_name));
>>>>>>
>>>>>> left_peer_orte_name.jobid = ORTE_PROC_MY_NAME->jobid;
>>>>>> left_peer_orte_name.vpid = ORTE_PROC_MY_NAME->vpid - 1;
>>>>>> if( ORTE_PROC_MY_NAME->vpid == 0 ) {
>>>>>> left_peer_orte_name.vpid = num_peers - 1;
>>>>>> }
>>>>>> - left_peer_orte_name.epoch = ORTE_EPOCH_INVALID;
>>>>>> - left_peer_orte_name.epoch = orte_ess.proc_get_epoch(&left_peer_orte_name);
>>>>>> + ORTE_EPOCH_SET(left_peer_orte_name.epoch,orte_ess.proc_get_epoch(&left_peer_orte_name));
>>>>>>
>>>>>> printf("My name is: %s -- PID %d\tMy Left Peer is %s\tMy Right Peer is %s\n",
>>>>>> ORTE_NAME_PRINT(ORTE_PROC_MY_NAME), getpid(),
>>>>>>
>>>>>> Modified: trunk/orte/test/system/orte_spawn.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/test/system/orte_spawn.c (original)
>>>>>> +++ trunk/orte/test/system/orte_spawn.c 2011-08-26 18:16:14 EDT (Fri, 26 Aug 2011)
>>>>>> @@ -74,8 +74,8 @@
>>>>>> for (i=0; i < app->num_procs; i++) {
>>>>>> name.vpid = i;
>>>>>>
>>>>>> - name.epoch = ORTE_EPOCH_INVALID;
>>>>>> - name.epoch = orte_ess.proc_get_epoch(&name);
>>>>>> + ORTE_EPOCH_SET(name.epoch,orte_ess.proc_get_epoch(&name));
>>>>>> +
>>>>>> fprintf(stderr, "Parent: sending message to child %s\n", ORTE_NAME_PRINT(&name));
>>>>>> if (0 > (rc = orte_rml.send(&name, &msg, 1, MY_TAG, 0))) {
>>>>>> ORTE_ERROR_LOG(rc);
>>>>>>
>>>>>> Modified: trunk/orte/tools/orte-ps/orte-ps.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/tools/orte-ps/orte-ps.c (original)
>>>>>> +++ trunk/orte/tools/orte-ps/orte-ps.c 2011-08-26 18:16:14 EDT (Fri, 26 Aug 2011)
>>>>>> @@ -869,8 +869,14 @@
>>>>>> }
>>>>>>
>>>>>> /* query the HNP for info on the procs in this job */
>>>>>> - if (ORTE_SUCCESS != (ret = orte_util_comm_query_proc_info(&(hnpinfo->hnp->name), job->jobid,
>>>>>> - ORTE_VPID_WILDCARD, ORTE_EPOCH_WILDCARD, &cnt, &procs))) {
>>>>>> + if (ORTE_SUCCESS != (ret = orte_util_comm_query_proc_info(&(hnpinfo->hnp->name),
>>>>>> + job->jobid,
>>>>>> + ORTE_VPID_WILDCARD,
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>> + ORTE_EPOCH_WILDCARD,
>>>>>> +#endif
>>>>>> + &cnt,
>>>>>> + &procs))) {
>>>>>> ORTE_ERROR_LOG(ret);
>>>>>> }
>>>>>> job->procs->addr = (void**)procs;
>>>>>>
>>>>>> Modified: trunk/orte/tools/orte-top/orte-top.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/tools/orte-top/orte-top.c (original)
>>>>>> +++ trunk/orte/tools/orte-top/orte-top.c 2011-08-26 18:16:14 EDT (Fri, 26 Aug 2011)
>>>>>> @@ -471,7 +471,7 @@
>>>>>> if (NULL == ranks) {
>>>>>> /* take all ranks */
>>>>>> proc.vpid = ORTE_VPID_WILDCARD;
>>>>>> - proc.epoch = ORTE_EPOCH_WILDCARD;
>>>>>> + ORTE_EPOCH_SET(proc.epoch,ORTE_EPOCH_WILDCARD);
>>>>>> if (ORTE_SUCCESS != (ret = opal_dss.pack(&cmdbuf, &proc, 1, ORTE_NAME))) {
>>>>>> ORTE_ERROR_LOG(ret);
>>>>>> goto cleanup;
>>>>>>
>>>>>> Modified: trunk/orte/util/comm/comm.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/util/comm/comm.c (original)
>>>>>> +++ trunk/orte/util/comm/comm.c 2011-08-26 18:16:14 EDT (Fri, 26 Aug 2011)
>>>>>> @@ -433,8 +433,13 @@
>>>>>> return ORTE_SUCCESS;
>>>>>> }
>>>>>>
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>> int orte_util_comm_query_proc_info(const orte_process_name_t *hnp, orte_jobid_t job, orte_vpid_t vpid,
>>>>>> orte_epoch_t epoch, int *num_procs, orte_proc_t ***proc_info_array)
>>>>>> +#else
>>>>>> +int orte_util_comm_query_proc_info(const orte_process_name_t *hnp, orte_jobid_t job, orte_vpid_t vpid,
>>>>>> + int *num_procs, orte_proc_t ***proc_info_array)
>>>>>> +#endif
>>>>>> {
>>>>>> int ret;
>>>>>> int32_t cnt, cnt_procs, n;
>>>>>> @@ -463,11 +468,13 @@
>>>>>> OBJ_RELEASE(cmd);
>>>>>> return ret;
>>>>>> }
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>> if (ORTE_SUCCESS != (ret = opal_dss.pack(cmd, &epoch, 1, ORTE_EPOCH))) {
>>>>>> ORTE_ERROR_LOG(ret);
>>>>>> OBJ_RELEASE(cmd);
>>>>>> return ret;
>>>>>> }
>>>>>> +#endif
>>>>>> /* define a max time to wait for send to complete */
>>>>>> timer_fired = false;
>>>>>> error_exit = ORTE_SUCCESS;
>>>>>>
>>>>>> Modified: trunk/orte/util/comm/comm.h
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/util/comm/comm.h (original)
>>>>>> +++ trunk/orte/util/comm/comm.h 2011-08-26 18:16:14 EDT (Fri, 26 Aug 2011)
>>>>>> @@ -52,7 +52,10 @@
>>>>>> int *num_nodes, orte_node_t ***node_info_array);
>>>>>>
>>>>>> ORTE_DECLSPEC int orte_util_comm_query_proc_info(const orte_process_name_t *hnp, orte_jobid_t job, orte_vpid_t vpid,
>>>>>> - orte_epoch_t epoch, int *num_procs, orte_proc_t ***proc_info_array);
>>>>>> +#if ORTE_ENABLE_EPOCH
>>>>>> + orte_epoch_t epoch,
>>>>>> +#endif
>>>>>> + int *num_procs, orte_proc_t ***proc_info_array);
>>>>>>
>>>>>> ORTE_DECLSPEC int orte_util_comm_spawn_job(const orte_process_name_t *hnp, orte_job_t *jdata);
>>>>>>
>>>>>>
>>>>>> Modified: trunk/orte/util/hnp_contact.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/util/hnp_contact.c (original)
>>>>>> +++ trunk/orte/util/hnp_contact.c 2011-08-26 18:16:14 EDT (Fri, 26 Aug 2011)
>>>>>> @@ -55,7 +55,8 @@
>>>>>> {
>>>>>> ptr->name.jobid = ORTE_JOBID_INVALID;
>>>>>> ptr->name.vpid = ORTE_VPID_INVALID;
>>>>>> - ptr->name.epoch = ORTE_EPOCH_MIN;
>>>>>> + ORTE_EPOCH_SET(ptr->name.epoch,ORTE_EPOCH_MIN);
>>>>>> +
>>>>>> ptr->rml_uri = NULL;
>>>>>> }
>>>>>> static void orte_hnp_contact_destruct(orte_hnp_contact_t *ptr)
>>>>>>
>>>>>> Modified: trunk/orte/util/name_fns.c
>>>>>> ==============================================================================
>>>>>> --- trunk/orte/util/name_fns.c (original)
>>>>>> +++ trunk/orte/util/name_fns.c 2011-08-26 18:16:14 EDT (Fri, 26 Aug 2011)
>>>>>> @@ -46,7 +46,7 @@
>>>>>> {
>>>>>> list->name.jobid = ORTE_JOBID_INVALID;
>>>>>> list->name.vpid = ORTE_VPID_INVALID;
>>>>>> - list->name.epoch = ORTE_EPOCH_MIN;
>>>>>> + ORTE_EPOCH_SET(list->name.epoch,ORTE_EPOCH_MIN); >>>>>> } >>>>>> >>>>>> /* destructor - used to free any resources held by instance */ >>>>>> @@ -116,7 +116,10 @@ >>>>>> char* orte_util_print_name_args(const orte_process_name_t *name) >>>>>> { >>>>>> orte_print_args_buffers_t *ptr; >>>>>> - char *job, *vpid, *epoch; >>>>>> + char *job, *vpid; >>>>>> +#if ORTE_ENABLE_EPOCH >>>>>> + char *epoch; >>>>>> +#endif >>>>>> >>>>>> /* protect against NULL names */ >>>>>> if (NULL == name) { >>>>>> @@ -141,7 +144,7 @@ >>>>>> */ >>>>>> job = orte_util_print_jobids(name->jobid); >>>>>> vpid = orte_util_print_vpids(name->vpid); >>>>>> - epoch = orte_util_print_epoch(name->epoch); >>>>>> + ORTE_EPOCH_SET(epoch,orte_util_print_epoch(name->epoch)); >>>>>> >>>>>> /* get the next buffer */ >>>>>> ptr = get_print_name_buffer(); >>>>>> @@ -156,9 +159,15 @@ >>>>>> ptr->cntr = 0; >>>>>> } >>>>>> >>>>>> +#if ORTE_ENABLE_EPOCH >>>>>> snprintf(ptr->buffers[ptr->cntr++], >>>>>> ORTE_PRINT_NAME_ARGS_MAX_SIZE, >>>>>> "[%s,%s,%s]", job, vpid, epoch); >>>>>> +#else >>>>>> + snprintf(ptr->buffers[ptr->cntr++], >>>>>> + ORTE_PRINT_NAME_ARGS_MAX_SIZE, >>>>>> + "[%s,%s]", job, vpid); >>>>>> +#endif >>>>>> >>>>>> return ptr->buffers[ptr->cntr-1]; >>>>>> } >>>>>> @@ -282,6 +291,7 @@ >>>>>> return ptr->buffers[ptr->cntr-1]; >>>>>> } >>>>>> >>>>>> +#if ORTE_ENABLE_EPOCH >>>>>> char* orte_util_print_epoch(const orte_epoch_t epoch) >>>>>> { >>>>>> orte_print_args_buffers_t *ptr; >>>>>> @@ -309,6 +319,7 @@ >>>>>> } >>>>>> return ptr->buffers[ptr->cntr-1]; >>>>>> } >>>>>> +#endif >>>>>> >>>>>> >>>>>> >>>>>> @@ -403,6 +414,7 @@ >>>>>> return ORTE_SUCCESS; >>>>>> } >>>>>> >>>>>> +#if ORTE_ENABLE_EPOCH >>>>>> int orte_util_convert_epoch_to_string(char **epoch_string, const >>>>>> orte_epoch_t epoch) >>>>>> { >>>>>> /* check for wildcard value - handle appropriately */ >>>>>> @@ -425,7 +437,6 @@ >>>>>> return ORTE_SUCCESS; >>>>>> } >>>>>> >>>>>> - >>>>>> int 
orte_util_convert_string_to_epoch(orte_epoch_t *epoch, const char* >>>>>> epoch_string) >>>>>> { >>>>>> if (NULL == epoch_string) { /* got an error */ >>>>>> @@ -450,6 +461,7 @@ >>>>>> >>>>>> return ORTE_SUCCESS; >>>>>> } >>>>>> +#endif >>>>>> >>>>>> int orte_util_convert_string_to_process_name(orte_process_name_t *name, >>>>>> const char* name_string) >>>>>> @@ -457,13 +469,15 @@ >>>>>> char *temp, *token; >>>>>> orte_jobid_t job; >>>>>> orte_vpid_t vpid; >>>>>> +#if ORTE_ENABLE_EPOCH >>>>>> orte_epoch_t epoch; >>>>>> +#endif >>>>>> int return_code=ORTE_SUCCESS; >>>>>> - >>>>>> + >>>>>> /* set default */ >>>>>> name->jobid = ORTE_JOBID_INVALID; >>>>>> name->vpid = ORTE_VPID_INVALID; >>>>>> - name->epoch = ORTE_EPOCH_MIN; >>>>>> + ORTE_EPOCH_SET(name->epoch,ORTE_EPOCH_MIN); >>>>>> >>>>>> /* check for NULL string - error */ >>>>>> if (NULL == name_string) { >>>>>> @@ -510,6 +524,7 @@ >>>>>> vpid = strtoul(token, NULL, 10); >>>>>> } >>>>>> >>>>>> +#if ORTE_ENABLE_EPOCH >>>>>> token = strtok(NULL, ORTE_SCHEMA_DELIMITER_STRING); /** get next field >>>>>> -> epoch*/ >>>>>> >>>>>> /* check for error */ >>>>>> @@ -528,10 +543,11 @@ >>>>>> } else { >>>>>> epoch = strtoul(token, NULL, 10); >>>>>> } >>>>>> +#endif >>>>>> >>>>>> name->jobid = job; >>>>>> name->vpid = vpid; >>>>>> - name->epoch = epoch; >>>>>> + ORTE_EPOCH_SET(name->epoch,epoch); >>>>>> >>>>>> free(temp); >>>>>> >>>>>> @@ -568,6 +584,7 @@ >>>>>> asprintf(&tmp2, "%s%c%lu", tmp, ORTE_SCHEMA_DELIMITER_CHAR, (unsigned >>>>>> long)name->vpid); >>>>>> } >>>>>> >>>>>> +#if ORTE_ENABLE_EPOCH >>>>>> if (ORTE_EPOCH_WILDCARD == name->epoch) { >>>>>> asprintf(name_string, "%s%c%s", tmp2, ORTE_SCHEMA_DELIMITER_CHAR, >>>>>> ORTE_SCHEMA_WILDCARD_STRING); >>>>>> } else if (ORTE_EPOCH_INVALID == name->epoch) { >>>>>> @@ -575,6 +592,10 @@ >>>>>> } else { >>>>>> asprintf(name_string, "%s%c%lu", tmp2, ORTE_SCHEMA_DELIMITER_CHAR, >>>>>> (unsigned long)name->epoch); >>>>>> } >>>>>> +#else >>>>>> + asprintf(name_string, "%s", tmp2); 
>>>>>> +#endif >>>>>> + >>>>>> free(tmp); >>>>>> free(tmp2); >>>>>> >>>>>> @@ -585,8 +606,11 @@ >>>>>> /**** CREATE PROCESS NAME ****/ >>>>>> int orte_util_create_process_name(orte_process_name_t **name, >>>>>> orte_jobid_t job, >>>>>> - orte_vpid_t vpid, >>>>>> - orte_epoch_t epoch) >>>>>> + orte_vpid_t vpid >>>>>> +#if ORTE_ENABLE_EPOCH >>>>>> + ,orte_epoch_t epoch >>>>>> +#endif >>>>>> + ) >>>>>> { >>>>>> *name = NULL; >>>>>> >>>>>> @@ -598,7 +622,8 @@ >>>>>> >>>>>> (*name)->jobid = job; >>>>>> (*name)->vpid = vpid; >>>>>> - (*name)->epoch = epoch; >>>>>> + ORTE_EPOCH_SET((*name)->epoch,epoch); >>>>>> + >>>>>> return ORTE_SUCCESS; >>>>>> } >>>>>> >>>>>> @@ -655,6 +680,7 @@ >>>>>> } >>>>>> } >>>>>> >>>>>> +#if ORTE_ENABLE_EPOCH >>>>>> /* Get here if jobid's and vpid's are equal, or not being checked. >>>>>> * Now check epoch. >>>>>> */ >>>>>> @@ -666,6 +692,7 @@ >>>>>> return OPAL_VALUE1_GREATER; >>>>>> } >>>>>> } >>>>>> +#endif >>>>>> >>>>>> /* only way to get here is if all fields are being checked and are equal, >>>>>> * or jobid not checked, but vpid equal, >>>>>> >>>>>> Modified: trunk/orte/util/name_fns.h >>>>>> ============================================================================== >>>>>> --- trunk/orte/util/name_fns.h (original) >>>>>> +++ trunk/orte/util/name_fns.h 2011-08-26 18:16:14 EDT (Fri, 26 Aug >>>>>> 2011) >>>>>> @@ -61,9 +61,13 @@ >>>>>> #define ORTE_VPID_PRINT(n) \ >>>>>> orte_util_print_vpids(n) >>>>>> >>>>>> +#if ORTE_ENABLE_EPOCH >>>>>> ORTE_DECLSPEC char* orte_util_print_epoch(const orte_epoch_t epoch); >>>>>> #define ORTE_EPOCH_PRINT(n) \ >>>>>> orte_util_print_epoch(n) >>>>>> +#else >>>>>> +#define ORTE_EPOCH_PRINT(n) >>>>>> +#endif >>>>>> >>>>>> ORTE_DECLSPEC char* orte_util_print_job_family(const orte_jobid_t job); >>>>>> #define ORTE_JOB_FAMILY_PRINT(n) \ >>>>>> @@ -104,6 +108,24 @@ >>>>>> #define ORTE_JOBID_IS_DAEMON(n) \ >>>>>> !((n) & 0x0000ffff) >>>>>> >>>>>> +/* Macro for getting the epoch out of the process name */ >>>>>> 
+#if ORTE_ENABLE_EPOCH >>>>>> +#define ORTE_EPOCH_GET(n) \ >>>>>> + ((n)->epoch) >>>>>> +#else >>>>>> +#define ORTE_EPOCH_GET(n) >>>>>> +#endif >>>>>> + >>>>>> +/* Macro for setting the epoch in the process name */ >>>>>> +#if ORTE_ENABLE_EPOCH >>>>>> +#define ORTE_EPOCH_SET(n,m) \ >>>>>> + ( (n) = (m) ) >>>>>> +#else >>>>>> +#define ORTE_EPOCH_SET(n,m) \ >>>>>> + do { \ >>>>>> + } while(0); >>>>>> +#endif >>>>>> + >>>>>> /* List of names for general use */ >>>>>> struct orte_namelist_t { >>>>>> opal_list_item_t item; /**< Allows this item to be placed on a list >>>>>> */ >>>>>> @@ -117,16 +139,24 @@ >>>>>> ORTE_DECLSPEC int orte_util_convert_string_to_jobid(orte_jobid_t *jobid, >>>>>> const char* jobidstring); >>>>>> ORTE_DECLSPEC int orte_util_convert_vpid_to_string(char **vpid_string, >>>>>> const orte_vpid_t vpid); >>>>>> ORTE_DECLSPEC int orte_util_convert_string_to_vpid(orte_vpid_t *vpid, >>>>>> const char* vpidstring); >>>>>> +#if ORTE_ENABLE_EPOCH >>>>>> ORTE_DECLSPEC int orte_util_convert_epoch_to_string(char **epoch_string, >>>>>> const orte_epoch_t epoch); >>>>>> ORTE_DECLSPEC int orte_util_convert_string_to_epoch(orte_vpid_t *epoch, >>>>>> const char* epochstring); >>>>>> +#endif >>>>>> ORTE_DECLSPEC int >>>>>> orte_util_convert_string_to_process_name(orte_process_name_t *name, >>>>>> const char* name_string); >>>>>> ORTE_DECLSPEC int orte_util_convert_process_name_to_string(char** >>>>>> name_string, >>>>>> const orte_process_name_t *name); >>>>>> +#if ORTE_ENABLE_EPOCH >>>>>> ORTE_DECLSPEC int orte_util_create_process_name(orte_process_name_t >>>>>> **name, >>>>>> orte_jobid_t job, >>>>>> orte_vpid_t vpid, >>>>>> orte_epoch_t epoch); >>>>>> +#else >>>>>> +ORTE_DECLSPEC int orte_util_create_process_name(orte_process_name_t >>>>>> **name, >>>>>> + orte_jobid_t job, >>>>>> + orte_vpid_t vpid); >>>>>> +#endif >>>>>> ORTE_DECLSPEC int orte_util_compare_name_fields(orte_ns_cmp_bitmask_t >>>>>> fields, >>>>>> const orte_process_name_t* name1, >>>>>> const 
orte_process_name_t* name2); >>>>>> >>>>>> Modified: trunk/orte/util/nidmap.c >>>>>> ============================================================================== >>>>>> --- trunk/orte/util/nidmap.c (original) >>>>>> +++ trunk/orte/util/nidmap.c 2011-08-26 18:16:14 EDT (Fri, 26 Aug >>>>>> 2011) >>>>>> @@ -249,7 +249,7 @@ >>>>>> */ >>>>>> /* construct the URI */ >>>>>> proc.vpid = node->daemon; >>>>>> - proc.epoch = ORTE_EPOCH_MIN; >>>>>> + ORTE_EPOCH_SET(proc.epoch,ORTE_EPOCH_MIN); >>>>>> >>>>>> orte_util_convert_process_name_to_string(&proc_name, &proc); >>>>>> asprintf(&uri, "%s;tcp://%s:%d", proc_name, addr, >>>>>> (int)orte_process_info.my_port); >>>>>> @@ -1001,6 +1001,7 @@ >>>>>> } >>>>>> #endif >>>>>> >>>>>> +#if ORTE_ENABLE_EPOCH >>>>>> /* Look up the current epoch value that we have stored locally. >>>>>> * >>>>>> * Note that this will not ping the HNP to get the most up to date epoch >>>>>> stored >>>>>> @@ -1023,7 +1024,9 @@ >>>>>> /*print_orte_job_data();*/ >>>>>> return e; >>>>>> } >>>>>> +#endif >>>>>> >>>>>> +#if ORTE_RESIL_ORTE >>>>>> bool orte_util_proc_is_running(orte_process_name_t *proc) { >>>>>> int i; >>>>>> unsigned int j; >>>>>> @@ -1078,7 +1081,9 @@ >>>>>> >>>>>> return ORTE_ERROR; >>>>>> } >>>>>> +#endif >>>>>> >>>>>> +#if ORTE_ENABLE_EPOCH >>>>>> /* >>>>>> * This function performs both the get and set operations on the epoch >>>>>> for a >>>>>> * sepcific process name. If the epoch passed into the function is >>>>>> @@ -1091,6 +1096,11 @@ >>>>>> orte_job_t *jdata; >>>>>> orte_proc_t *pdata; >>>>>> >>>>>> + if (ORTE_JOBID_INVALID == proc->jobid || >>>>>> + ORTE_VPID_INVALID == proc->vpid) { >>>>>> + return ORTE_EPOCH_INVALID; >>>>>> + } >>>>>> + >>>>>> /* Sanity check just to make sure we don't overwrite our existing >>>>>> * orte_job_data. 
>>>>>> */ >>>>>> @@ -1165,4 +1175,5 @@ >>>>>> return ORTE_EPOCH_MIN; >>>>>> } >>>>>> } >>>>>> +#endif >>>>>> >>>>>> >>>>>> Modified: trunk/orte/util/nidmap.h >>>>>> ============================================================================== >>>>>> --- trunk/orte/util/nidmap.h (original) >>>>>> +++ trunk/orte/util/nidmap.h 2011-08-26 18:16:14 EDT (Fri, 26 Aug >>>>>> 2011) >>>>>> @@ -48,11 +48,19 @@ >>>>>> ORTE_DECLSPEC orte_pmap_t* orte_util_lookup_pmap(orte_process_name_t >>>>>> *proc); >>>>>> ORTE_DECLSPEC orte_nid_t* orte_util_lookup_nid(orte_process_name_t >>>>>> *proc); >>>>>> >>>>>> +#if ORTE_ENABLE_EPOCH >>>>>> ORTE_DECLSPEC orte_epoch_t orte_util_lookup_epoch(orte_process_name_t >>>>>> *proc); >>>>>> ORTE_DECLSPEC orte_epoch_t orte_util_set_epoch(orte_process_name_t >>>>>> *proc, orte_epoch_t epoch); >>>>>> +#endif >>>>>> >>>>>> ORTE_DECLSPEC int orte_util_set_proc_state(orte_process_name_t *proc, >>>>>> orte_proc_state_t state); >>>>>> + >>>>>> +#if ORTE_RESIL_ORTE >>>>>> +#define PROC_IS_RUNNING(n) orte_util_proc_is_running(n) >>>>>> ORTE_DECLSPEC bool orte_util_proc_is_running(orte_process_name_t *proc); >>>>>> +#else >>>>>> +#define PROC_IS_RUNNING(n) ( true ) >>>>>> +#endif >>>>>> >>>>>> ORTE_DECLSPEC int orte_util_encode_nodemap(opal_byte_object_t *boptr); >>>>>> ORTE_DECLSPEC int orte_util_decode_nodemap(opal_byte_object_t *boptr); >>>>>> @@ -72,5 +80,8 @@ >>>>>> END_C_DECLS >>>>>> >>>>>> /* Local functions */ >>>>>> +#if ORTE_ENABLE_EPOCH >>>>>> orte_epoch_t get_epoch_from_orte_job_data(orte_process_name_t *proc, >>>>>> orte_epoch_t epoch); >>>>>> #endif >>>>>> + >>>>>> +#endif >>>>>> >>>>>> Modified: trunk/orte/util/proc_info.c >>>>>> ============================================================================== >>>>>> --- trunk/orte/util/proc_info.c (original) >>>>>> +++ trunk/orte/util/proc_info.c 2011-08-26 18:16:14 EDT (Fri, 26 Aug >>>>>> 2011) >>>>>> @@ -36,13 +36,19 @@ >>>>>> >>>>>> #include "orte/util/proc_info.h" >>>>>> >>>>>> +#if 
ORTE_ENABLE_EPOCH >>>>>> +#define ORTE_NAME_INVALID {ORTE_JOBID_INVALID, ORTE_VPID_INVALID, >>>>>> ORTE_EPOCH_MIN} >>>>>> +#else >>>>>> +#define ORTE_NAME_INVALID {ORTE_JOBID_INVALID, ORTE_VPID_INVALID} >>>>>> +#endif >>>>>> + >>>>>> ORTE_DECLSPEC orte_proc_info_t orte_process_info = { >>>>>> - /* .my_name = */ {ORTE_JOBID_INVALID, >>>>>> ORTE_VPID_INVALID, ORTE_EPOCH_MIN}, >>>>>> - /* .my_daemon = */ {ORTE_JOBID_INVALID, >>>>>> ORTE_VPID_INVALID, ORTE_EPOCH_MIN}, >>>>>> + /* .my_name = */ ORTE_NAME_INVALID, >>>>>> + /* .my_daemon = */ ORTE_NAME_INVALID, >>>>>> /* .my_daemon_uri = */ NULL, >>>>>> - /* .my_hnp = */ {ORTE_JOBID_INVALID, >>>>>> ORTE_VPID_INVALID, ORTE_EPOCH_MIN}, >>>>>> + /* .my_hnp = */ ORTE_NAME_INVALID, >>>>>> /* .my_hnp_uri = */ NULL, >>>>>> - /* .my_parent = */ {ORTE_JOBID_INVALID, >>>>>> ORTE_VPID_INVALID, ORTE_EPOCH_MIN}, >>>>>> + /* .my_parent = */ ORTE_NAME_INVALID, >>>>>> /* .hnp_pid = */ 0, >>>>>> /* .app_num = */ 0, >>>>>> /* .num_procs = */ 1, >>>>>> >>>>>> Modified: trunk/test/util/orte_session_dir.c >>>>>> ============================================================================== >>>>>> --- trunk/test/util/orte_session_dir.c (original) >>>>>> +++ trunk/test/util/orte_session_dir.c 2011-08-26 18:16:14 EDT (Fri, >>>>>> 26 Aug 2011) >>>>>> @@ -57,7 +57,7 @@ >>>>>> orte_process_info.my_name->cellid = 0; >>>>>> orte_process_info.my_name->jobid = 0; >>>>>> orte_process_info.my_name->vpid = 0; >>>>>> - orte_process_info.my_name->epoch = ORTE_EPOCH_MIN; >>>>>> + ORTE_EPOCH_SET(orte_process_info.my_name->epoch,ORTE_EPOCH_MIN); >>>>>> >>>>>> test_init("orte_session_dir_t"); >>>>>> test_out = fopen( "test_session_dir_out", "w+" ); >>>>>> _______________________________________________ >>>>>> svn-full mailing list >>>>>> svn-f...@open-mpi.org >>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/svn-full >>>>> >>>>> >>>>> _______________________________________________ >>>>> devel mailing list >>>>> de...@open-mpi.org >>>>> 
http://www.open-mpi.org/mailman/listinfo.cgi/devel
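
For readers skimming the quoted diff, here is a minimal sketch of the conditional-compilation pattern that the ORTE_EPOCH_SET macro in name_fns.h relies on. Everything prefixed DEMO_ or demo_ is hypothetical and stands in for the real ORTE names; the switch is hard-coded here, whereas in the real tree it would have to come from configure. One possible concern, offered as an observation rather than a confirmed bug: the disabled variant in the diff ends in `while(0);`, and that trailing semicolon would break an `if (...) ORTE_EPOCH_SET(...); else ...` construct. The sketch leaves the semicolon off so the macro behaves as a single statement.

```c
/* Hypothetical stand-in for the configure-time switch. */
#ifndef DEMO_ENABLE_EPOCH
#define DEMO_ENABLE_EPOCH 0
#endif

/* Process name whose epoch field exists only when the feature is on. */
typedef struct {
    unsigned jobid;
    unsigned vpid;
#if DEMO_ENABLE_EPOCH
    unsigned epoch;
#endif
} demo_name_t;

#if DEMO_ENABLE_EPOCH
#define DEMO_EPOCH_SET(n, m) ((n) = (m))
#else
/* Arguments are discarded, so the missing field is never referenced.
 * No semicolon after while (0): the caller supplies it. */
#define DEMO_EPOCH_SET(n, m) do { } while (0)
#endif

/* Fills in a name; the if/else compiles in both configurations. */
int demo_init(demo_name_t *name, int reset)
{
    name->jobid = 1;
    name->vpid = 2;
    if (reset)
        DEMO_EPOCH_SET(name->epoch, 0);
    else
        DEMO_EPOCH_SET(name->epoch, 7);
    return 0;
}
```

Building this once with -DDEMO_ENABLE_EPOCH=1 and once with the default exercises both branches; appending a semicolon to the disabled macro definition should make the if/else in demo_init fail to compile.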