On Mon, Mar 9, 2020 at 3:50 AM Tao YunXiang < [email protected]> wrote: > > For cluster mode, only the first server(which is the leader) in a cluster > should do init. But, sometimes, the role of the first server would transfer > leadership, so we should add "--no-leader-only" option in case > the ovn-northd stucked. > Thanks for the patch. This is trickier than it appears. Let me describe the problem from my understanding. When creating the cluster, the "init" needs to be executed once. This should be done with the first server. In this case the script works well, because the "cluster_remote_addr" is NOT specified when starting the first server, which is the leader at that time. When another server joins the cluster, the "cluster_remote_addr" must be specified, so the "init" won't be executed. However, when a server needs to be restarted later, no matter if it is leader or not, it doesn't need to supply the "cluster_remote_addr" parameter, because that parameter is needed only for joining a cluster. When a server is restarted by executing this script, if "cluster_remote_addr" is NOT specified, it will stuck at the "init" command, because it is not leader and "--no-leader-only" is not specified here.
So, the implementation doesn't really reflect the comment: > # Initialize the database if it's running standalone, > # active-passive, or is the first server in a cluster. It tried to run the init command even when it is not the first server in a cluster. To fix the problem, one way is to make sure it does exactly as what the comment said. However, it seems there is no easy way to distinguish if the server is the first server in a cluster or it is just a server that has already joined being restarted. Alternatively, adding "--no-leader-only" solves the problem as well, as proposed by this patch, because it is not harmful to run "init" again even if the DB is already initiated, just like how it is handled when it is standalone mode. Based on the above analysis, I am ok with the fix, but I suggest to update the commit message to clarify the problem more clearly. In addition, for the comment mentioned above, it should be changed to: # Initialize the database if it's NOT joining a cluster. Finally, maybe the if condition can be removed completely, but I haven't tested and not 100% sure. Thanks, Han > Author: Tao YunXiang <[email protected]> > Co-authored-by: Liu Chang <[email protected]> > Signed-off-by: Tao YunXiang <[email protected]> > Signed-off-by: Liu Chang <[email protected]> > > --- > utilities/ovn-ctl | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/utilities/ovn-ctl b/utilities/ovn-ctl > index 2a337ae27..71bdbc076 100755 > --- a/utilities/ovn-ctl > +++ b/utilities/ovn-ctl > @@ -293,7 +293,7 @@ $cluster_remote_port > # Initialize the database if it's running standalone, > # active-passive, or is the first server in a cluster. > if test -z "$cluster_remote_addr"; then > - $(echo ovn-${db}ctl | tr _ -) init > + $(echo ovn-${db}ctl | tr _ -) --no-leader-only init > fi > > if test $mode = cluster; then > -- > 2.17.1 > > > > _______________________________________________ > dev mailing list > [email protected] > https://mail.openvswitch.org/mailman/listinfo/ovs-dev _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
