Hi Han and Numan, Thanks for your review, I have summited v2 change, ovn-ctl: Fix stucked while do cluster db init
Thanks, Yun -------------- [email protected] >On Mon, Mar 9, 2020 at 3:50 AM Tao YunXiang < >[email protected]> wrote: >> >> For cluster mode, only the first server(which is the leader) in a cluster >> should do init. But, sometimes, the role of the first server would >transfer >> leadership, so we should add "--no-leader-only" option in case >> the ovn-northd stucked. >> >Thanks for the patch. This is trickier than it appears. >Let me describe the problem from my understanding. When creating the >cluster, the "init" needs to be executed once. This should be done with the >first server. In this case the script works well, because the >"cluster_remote_addr" is NOT specified when starting the first server, >which is the leader at that time. When another server joins the cluster, >the "cluster_remote_addr" must be specified, so the "init" won't be >executed. >However, when a server needs to be restarted later, no matter if it is >leader or not, it doesn't need to supply the "cluster_remote_addr" >parameter, because that parameter is needed only for joining a cluster. >When a server is restarted by executing this script, if >"cluster_remote_addr" is NOT specified, it will stuck at the "init" >command, because it is not leader and "--no-leader-only" is not specified >here. > >So, the implementation doesn't really reflect the comment: >> # Initialize the database if it's running standalone, >> # active-passive, or is the first server in a cluster. >It tried to run the init command even when it is not the first server in a >cluster. > >To fix the problem, one way is to make sure it does exactly as what the >comment said. However, it seems there is no easy way to distinguish if the >server is the first server in a cluster or it is just a server that has >already joined being restarted. > >Alternatively, adding "--no-leader-only" solves the problem as well, as >proposed by this patch, because it is not harmful to run "init" again even >if the DB is already initiated, just like how it is handled when it is >standalone mode. > >Based on the above analysis, I am ok with the fix, but I suggest to update >the commit message to clarify the problem more clearly. In addition, for >the comment mentioned above, it should be changed to: > ># Initialize the database if it's NOT joining a cluster. > >Finally, maybe the if condition can be removed completely, but I haven't >tested and not 100% sure. > >Thanks, >Han > > >> Author: Tao YunXiang <[email protected]> >> Co-authored-by: Liu Chang <[email protected]> >> Signed-off-by: Tao YunXiang <[email protected]> >> Signed-off-by: Liu Chang <[email protected]> >> >> --- >> utilities/ovn-ctl | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/utilities/ovn-ctl b/utilities/ovn-ctl >> index 2a337ae27..71bdbc076 100755 >> --- a/utilities/ovn-ctl >> +++ b/utilities/ovn-ctl >> @@ -293,7 +293,7 @@ $cluster_remote_port >> # Initialize the database if it's running standalone, >> # active-passive, or is the first server in a cluster. >> if test -z "$cluster_remote_addr"; then >> - $(echo ovn-${db}ctl | tr _ -) init >> + $(echo ovn-${db}ctl | tr _ -) --no-leader-only init >> fi >> >> if test $mode = cluster; then >> -- >> 2.17.1 >> >> >> >> _______________________________________________ >> dev mailing list >> [email protected] >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
