Hi Vu,

I haven't tested this and I cannot see by reading the if it does work but I 
have a few remarks for improvement of mostly readability. See [Lennart] below.

Thanks
Lennart

> -----Original Message-----
> From: Vu Minh Nguyen <[email protected]>
> Sent: den 12 oktober 2018 08:35
> To: Hans Nordebäck <[email protected]>; Lennart Lund
> <[email protected]>; Gary Lee <[email protected]>
> Cc: [email protected]; Vu Minh Nguyen
> <[email protected]>
> Subject: [PATCH 1/1] imm: cluster is rebooted after split-brain recovery
> [#2934]
> 
> During split-brain, there is possiblity of having mismatches in global 
> counters
> hold by IMMNDs which might cause cluster rebooted at split-brain recovery.
> 
> This patch introduces some changes in IMMD - not to update global counters
> from joining IMMNDs if the coord is already elected.
> ---
>  src/imm/immd/immd_evt.c | 34
> +++++++++++++++++++++++++++++++++-
>  1 file changed, 33 insertions(+), 1 deletion(-)
> 
> diff --git a/src/imm/immd/immd_evt.c b/src/imm/immd/immd_evt.c
> index 1114a81d4..237b3a4bb 100644
> --- a/src/imm/immd/immd_evt.c
> +++ b/src/imm/immd/immd_evt.c
> @@ -1629,6 +1629,15 @@ static uint32_t
> immd_evt_proc_immnd_req_sync(IMMD_CB *cb, IMMD_EVT *evt,
>       return proc_rc;
>  }
> 
> +/* Return true if there is any difference in global counters. */
[Lennart]
1.
This is not what this function is doing so if this is the intention the 
following logic is not correct.
This function returns true if any of the checked counters has a lower value 
than the one in what I assume is a received message not if there is a 
difference.
Example of diff check: cb->fevsSendCount != evt->info.ctrl_msg.fevs_count
2.
The function name is not descriptive and the name is made up of abbreviations 
only
A descriptive name shall not contain abbreviations except if they are widely 
used (see Google style guide)
It may be a strategy to create "namespaces" for functions by adding a prefix 
e.g. immd_ etc. this may be ok when writing C code but in this case it is not 
needed. This function is static and that tells me it is local to this file. A 
descriptive name shall as precise as possible tell a reader what the function 
is doing. This name only hints the reader that some difference is checked (has 
to guess that diff means difference) but what is checked and what difference?

> +static bool immd_evt_proc_check_diff(const IMMD_CB *cb, const
> IMMD_EVT *evt)
> +{
> +     return ((cb->fevsSendCount < evt->info.ctrl_msg.fevs_count) ||
> +             (cb->admo_id_count < evt->info.ctrl_msg.admo_id_count)
> ||
> +             (cb->ccb_id_count  < evt->info.ctrl_msg.ccb_id_count) ||
> +             (cb->impl_count    < evt->info.ctrl_msg.impl_count));
> +}
> +
> 
> /**********************************************************
> ******************
>   * Name          : immd_evt_proc_immnd_intro
>   *
> @@ -1718,7 +1727,14 @@ static uint32_t
> immd_evt_proc_immnd_intro(IMMD_CB *cb, IMMD_EVT *evt,
>                           cb->mRulingEpoch);
>               }
> 
> -             if (cb->mRulingEpoch < node_info->epoch) {
> +             /*
> +               Don't update the ruling epoch from joining IMMND if the
> coord
> +               is already elected to avoid cluster reboot after split-brain
> +               recovery. This rule is also applied for all below global
> +               counters (fevs counter, ccb id counter, etc.).
> +             */
[Lennart] This if statement is unreadable it requires a lot of analysis to 
understand. The worst problem is the first parenthesis. Some of the problems 
are:
The parameters e.g. cb->immnd_coord are not Boolean but are handled as Booleans
What does it mean if one of the parameters are 0?
etc.
A solution could be to create a help function with a descriptive name for the 
first parenthesis
The second parenthesis is better but here is the problem to understand what an 
epoch and a mRulingEpoch is. I tried to look it up in the definition but found 
no information.
> +             if (!(cb->immnd_coord && node_info->immnd_key != cb-
> >node_id) &&
> +                 (cb->mRulingEpoch < node_info->epoch)) {
>                       cb->mRulingEpoch = node_info->epoch;
>                       LOG_NO("Ruling epoch changed to:%u", cb-
> >mRulingEpoch);
>               }
> @@ -1769,6 +1785,21 @@ static uint32_t
> immd_evt_proc_immnd_intro(IMMD_CB *cb, IMMD_EVT *evt,
> 
>                       veteranImmndNode = true;
> 
[Lennart] Same problem with this if statement as the previous one. Also seems 
to be redundant. An even better reason for creating a help function (maybe an 
inline function)
The logging here and in the previous if statement, are they really needed? 
Also, the log text is not possible to understand without analyzing the code...
> +                     if (cb->immnd_coord && node_info->immnd_key !=
> cb->node_id) {
> +                             if (immd_evt_proc_check_diff(cb, evt)) {
> +                                     IMMSV_ND2D_CONTROL* msg =
> &(evt->info.ctrl_msg);
> +                                     LOG_NO("Ignore updating counters
> from %x:"
> +                                            "fevs(%llu/%llu), admid
> (%u/%u),"
> +                                            "ccbid(%u/%u), impid(%u/%u)",
> +                                            node_info->immnd_key,
> +                                            cb->fevsSendCount, msg-
> >fevs_count,
> +                                            cb->admo_id_count, msg-
> >admo_id_count,
> +                                            cb->ccb_id_count, msg-
> >ccb_id_count,
> +                                            cb->impl_count, msg-
> >impl_count);
> +                             }
> +                             goto accept_node;
> +                     }
> +
>                       if (cb->fevsSendCount < evt-
> >info.ctrl_msg.fevs_count) {
>                               LOG_NO(
>                                   "Refresh of fevs count from %llu to %llu
> from %x.",
> @@ -1839,6 +1870,7 @@ static uint32_t
> immd_evt_proc_immnd_intro(IMMD_CB *cb, IMMD_EVT *evt,
>               }
>       }
> 
> +
>       /* Determine type of node. */
>       if (sinfo->dest == cb->loc_immnd_dest) {
>               node_info->isOnController = true;
> --
> 2.18.0



_______________________________________________
Opensaf-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-devel

Reply via email to