On Thu, Aug 9, 2018 at 1:57 AM, aginwala <aginw...@asu.edu> wrote:
>
> To add on, we are using the LB VIP IP and no constraint with 3 nodes, as Han
> mentioned earlier, where the active node syncs from an invalid IP and the
> other two nodes sync from the LB VIP IP. Also, I was able to get some logs
> from one node that triggered:
> https://github.com/openvswitch/ovs/blob/master/ovsdb/ovsdb-server.c#L460
>
> 2018-08-04T01:43:39.914Z|03230|reconnect|DBG|tcp:10.189.208.16:50686: entering RECONNECT
> 2018-08-04T01:43:39.914Z|03231|ovsdb_jsonrpc_server|INFO|tcp:10.189.208.16:50686: disconnecting (removing OVN_Northbound database due to server termination)
> 2018-08-04T01:43:39.932Z|03232|ovsdb_jsonrpc_server|INFO|tcp:10.189.208.21:56160: disconnecting (removing _Server database due to server termination)
> 20
>
> I am not sure if sync_from on the active node via some invalid IP is causing
> a flaw when all nodes are down during the race condition in this corner case.
>
> On Thu, Aug 9, 2018 at 1:35 AM Numan Siddique <nusid...@redhat.com> wrote:
>>
>> On Thu, Aug 9, 2018 at 1:07 AM Ben Pfaff <b...@ovn.org> wrote:
>>>
>>> On Wed, Aug 08, 2018 at 12:18:10PM -0700, Han Zhou wrote:
>>> > On Wed, Aug 8, 2018 at 11:24 AM, Ben Pfaff <b...@ovn.org> wrote:
>>> > >
>>> > > On Wed, Aug 08, 2018 at 12:37:04AM -0700, Han Zhou wrote:
>>> > > > Hi,
>>> > > >
>>> > > > We found an issue in our testing (thanks aginwala) with active-backup
>>> > > > mode in an OVN setup.
>>> > > > In a 3-node setup with pacemaker, after stopping pacemaker on all
>>> > > > three nodes (simulating a complete shutdown) and then starting all of
>>> > > > them simultaneously, there is a good chance that the whole DB content
>>> > > > gets lost.
>>> > > >
>>> > > > After studying the replication code, it seems there is a phase in
>>> > > > which the backup node deletes all its data and waits for data to be
>>> > > > synced from the active node:
>>> > > > https://github.com/openvswitch/ovs/blob/master/ovsdb/replication.c#L306
>>> > > >
>>> > > > In this state, if the node is set to active, then all data is gone
>>> > > > for the whole cluster. This can happen in different situations. In
>>> > > > the test scenario mentioned above it is very likely to happen, since
>>> > > > pacemaker just randomly selects one node as master, not knowing the
>>> > > > internal sync state of each node. It could also happen when failover
>>> > > > happens right after a new backup is started, although that is less
>>> > > > likely in a real environment, so starting up the nodes one by one may
>>> > > > largely reduce the probability.
>>> > > >
>>> > > > Does this analysis make sense? We will do more tests to verify the
>>> > > > conclusion, but we would like to share it with the community for
>>> > > > discussion and suggestions. Once this happens it is very critical -
>>> > > > even more serious than just losing HA. Without HA it is just a
>>> > > > control plane outage, but this would be a data plane outage, because
>>> > > > OVS flows will be removed accordingly, since the data is considered
>>> > > > deleted from ovn-controller's point of view.
>>> > > >
>>> > > > We understand that active-standby is not the ideal HA mechanism and
>>> > > > that clustering is the future, and we are also testing clustering
>>> > > > with the latest patch. But it would be good if this problem could be
>>> > > > addressed with some quick fix, such as keeping a copy of the old data
>>> > > > somewhere until the first sync finishes?
>>> > >
>>> > > This does seem like a plausible bug, and at first glance I believe that
>>> > > you're correct about the race here.
>>> > > I guess that the correct behavior must be to keep the original data
>>> > > until a new copy of the data has been received, and only then
>>> > > atomically replace the original with the new.
>>> > >
>>> > > Is this something you have the time and ability to fix?
>>> >
>>> > Thanks, Ben, for the quick response. I guess I will not have time until
>>> > I send out the next series for incremental processing :)
>>> > It would be good if someone could help; please reply to this email if
>>> > you start working on it so that we do not end up with overlapping work.
>>
>> I will take a shot at fixing this issue.
>>
>> In the case of tripleo we haven't hit this issue. I haven't tested this
>> scenario; I will test it out. One difference compared to your setup is that
>> tripleo uses the IPAddr2 resource and a colocation constraint set.
>>
>> Thanks
>> Numan
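Ben's suggested behavior (keep the original data until a complete new copy has arrived, then swap atomically) can be sketched in a few lines. The following is a hypothetical Python toy model, not the actual ovsdb-server C code; the `Replica` class and its method names are invented purely for illustration of the staging-and-swap pattern.

```python
# Hypothetical sketch of the proposed fix: stage the incoming snapshot
# separately and only replace the live data once the full copy has
# arrived, so premature promotion mid-sync never leaves the node empty.

class Replica:
    def __init__(self, data):
        self.data = dict(data)     # live copy, served to clients
        self._staging = None       # incoming snapshot from the active node

    def begin_sync(self):
        # Unlike clearing self.data up front, build the new copy aside.
        self._staging = {}

    def receive(self, key, value):
        self._staging[key] = value

    def finish_sync(self):
        # Atomic swap: old data survives until the new copy is complete.
        self.data = self._staging
        self._staging = None

    def promote(self):
        # Promotion mid-sync discards only the partial snapshot;
        # the replica still has its previous data.
        self._staging = None
        return self.data
```

With this pattern, a node promoted before `finish_sync` serves its previous contents instead of an empty database, which is exactly the property the thread asks for.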
Thanks Numan for helping on this. I think IPAddr2 would have the same problem,
if my previous analysis is right, unless using IPAddr2 results in pacemaker
always electing the node that is configured with the master IP as the master
when pacemaker is started on all nodes again.

Ali, thanks for the information. Just to clarify: the log "removing xxx
database due to server termination" is not related to this issue. It might be
misleading, but it doesn't mean the content of the database is being deleted;
it is just cleanup of internal data structures before exiting. The code that
deletes the DB data is here:
https://github.com/openvswitch/ovs/blob/master/ovsdb/replication.c#L306, and
there is no log printed for it. You may add a log there to verify when you
reproduce the issue.

Thanks,
Han
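The race described in this thread can be reduced to a toy model. The code below is a hypothetical Python simulation, not OVS code; `Node`, `enter_backup`, and `simultaneous_restart` are invented names. It shows why clearing the local database before the first sync completes is dangerous: if every node restarts at once and pacemaker promotes one of them while they are all in the cleared state, the whole cluster's data is lost.

```python
# Hypothetical toy model of the race: on restart every node becomes a
# backup and clears its local DB before syncing (mirroring the deletion
# at replication.c#L306); pacemaker then promotes an arbitrary node.

class Node:
    def __init__(self, data):
        self.data = dict(data)

    def enter_backup(self):
        # The problematic step: local data is deleted before any sync.
        self.data = {}

def simultaneous_restart(nodes, master_index):
    """All nodes restart as backups, then one is promoted to master
    and the others sync from it."""
    for n in nodes:
        n.enter_backup()
    master = nodes[master_index]        # pacemaker's arbitrary pick
    for i, n in enumerate(nodes):
        if i != master_index:
            n.data = dict(master.data)  # backups replicate the empty master
    return master

nodes = [Node({"OVN_Northbound": "content"}) for _ in range(3)]
master = simultaneous_restart(nodes, master_index=1)
print(master.data)  # {} : the promoted node was mid-sync, data is gone
```

Starting the nodes one by one avoids the worst case in this model only because the first node promoted has already finished syncing (or never cleared), which matches Han's observation that staggered startup largely reduces the probability.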
_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss