When OpenSM is in master state and performing light sweeps, it ignores
other subnet managers in master state and fails to hand over or
relinquish control. Handover and relinquishing master are performed
only after a heavy sweep. This can leave two subnet managers in master
state, each seeing the other SM in master state, yet both choosing to
do nothing about it.

This patch initiates a heavy sweep when another master subnet manager
is found during a light sweep. This is sufficient to start a handover
or relinquish scenario.

Signed-off-by: Albert Chu <ch...@llnl.gov>
---
 opensm/osm_sminfo_rcv.c |   18 ++++++++++++++++--
 1 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/opensm/osm_sminfo_rcv.c b/opensm/osm_sminfo_rcv.c
index 66eb886..d4ae7c9 100644
--- a/opensm/osm_sminfo_rcv.c
+++ b/opensm/osm_sminfo_rcv.c
@@ -385,8 +385,22 @@ static void smi_rcv_process_get_sm(IN osm_sm_t * sm,
 			osm_sm_state_mgr_signal_master_is_alive(sm);
 		else {
 			/* This is a response we got while sweeping the subnet.
-			   We will handle a case of handover needed later on, when the sweep
-			   is done and all SMs are recongnized. */
+			 *
+			 * If this is during a heavy sweep, we will handle a case of
+			 * handover needed later on, when the sweep is done and all
+			 * SMs are recognized.
+			 *
+			 * If this is during a light sweep, initiate a heavy sweep
+			 * to initiate handover scenarios.
+			 *
+			 * Note that it does not matter if the remote SM is lower
+			 * or higher priority.  If it is lower priority, we must
+			 * wait for it to HANDOVER.  If it is higher priority, we need
+			 * to HANDOVER to it.  Both cases are handled after doing
+			 * a heavy sweep.
+			 */
+			if (light_sweep)
+				sm->p_subn->force_heavy_sweep = TRUE;
 		}
 		break;
 	case IB_SMINFO_STATE_STANDBY:
--
1.7.1
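
For readers who want the decision in isolation, the following is a minimal sketch of the logic the patch adds, not the OpenSM code itself. All names here (`toy_sm_state`, `toy_subn`, `toy_on_remote_sminfo`) are hypothetical stand-ins invented for illustration. Only the idea of setting a "force heavy sweep" flag when another master is seen during a light sweep comes from the patch.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; these are NOT OpenSM types. */
enum toy_sm_state {
	TOY_SM_NOT_ACTIVE,
	TOY_SM_DISCOVERING,
	TOY_SM_STANDBY,
	TOY_SM_MASTER
};

struct toy_subn {
	bool force_heavy_sweep;	/* plays the role of the flag set by the patch */
};

/*
 * Models a local master SM receiving an SMInfo response from a remote SM.
 * Returns true if this call requested a heavy sweep.
 */
static bool toy_on_remote_sminfo(struct toy_subn *subn,
				 enum toy_sm_state remote_state,
				 bool light_sweep)
{
	/* Only another master is a conflict that needs resolving. */
	if (remote_state != TOY_SM_MASTER)
		return false;

	/*
	 * During a heavy sweep nothing extra is needed: handover is
	 * resolved once the sweep finishes and all SMs are known.
	 */
	if (!light_sweep)
		return false;

	/*
	 * During a light sweep, escalate. Relative priority is not
	 * checked here; both cases are resolved after the heavy sweep.
	 */
	subn->force_heavy_sweep = true;
	return true;
}
```

In this model, a remote master seen during a light sweep is the only case that sets the flag. A standby SM, or a master seen during a heavy sweep, leaves it untouched.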