On Wed, Sep 5, 2018 at 10:44 AM aginwala <aginw...@asu.edu> wrote:
>
> Thanks Numan:
>
> I will give it a shot and update the findings.
>
> On Wed, Sep 5, 2018 at 5:35 AM Numan Siddique <nusid...@redhat.com> wrote:
>>
>> On Wed, Sep 5, 2018 at 12:42 AM Han Zhou <zhou...@gmail.com> wrote:
>>>
>>> On Sun, Sep 2, 2018 at 11:01 PM Numan Siddique <nusid...@redhat.com> wrote:
>>> >
>>> > On Fri, Aug 10, 2018 at 3:59 AM Ben Pfaff <b...@ovn.org> wrote:
>>> >>
>>> >> On Thu, Aug 09, 2018 at 09:32:21AM -0700, Han Zhou wrote:
>>> >> > On Thu, Aug 9, 2018 at 1:57 AM, aginwala <aginw...@asu.edu> wrote:
>>> >> > >
>>> >> > > To add on, we are using the LB VIP IP with no constraint across the 3 nodes, as Han mentioned earlier: the active node is set to sync from an invalid IP and the other two nodes sync from the LB VIP IP. Also, I was able to get some logs from one node that triggered:
>>> >> > > https://github.com/openvswitch/ovs/blob/master/ovsdb/ovsdb-server.c#L460
>>> >> > >
>>> >> > > 2018-08-04T01:43:39.914Z|03230|reconnect|DBG|tcp:10.189.208.16:50686: entering RECONNECT
>>> >> > > 2018-08-04T01:43:39.914Z|03231|ovsdb_jsonrpc_server|INFO|tcp:10.189.208.16:50686: disconnecting (removing OVN_Northbound database due to server termination)
>>> >> > > 2018-08-04T01:43:39.932Z|03232|ovsdb_jsonrpc_server|INFO|tcp:10.189.208.21:56160: disconnecting (removing _Server database due to server termination)
>>> >> > > 20
>>> >> > >
>>> >> > > I am not sure whether the sync_from on the active node, pointing at an invalid IP, is causing some flaw when all nodes are down during the race condition in this corner case.
>>> >> > >
>>> >> > > On Thu, Aug 9, 2018 at 1:35 AM Numan Siddique <nusid...@redhat.com> wrote:
>>> >> > >>
>>> >> > >> On Thu, Aug 9, 2018 at 1:07 AM Ben Pfaff <b...@ovn.org> wrote:
>>> >> > >>>
>>> >> > >>> On Wed, Aug 08, 2018 at 12:18:10PM -0700, Han Zhou wrote:
>>> >> > >>> > On Wed, Aug 8, 2018 at 11:24 AM, Ben Pfaff <b...@ovn.org> wrote:
>>> >> > >>> > >
>>> >> > >>> > > On Wed, Aug 08, 2018 at 12:37:04AM -0700, Han Zhou wrote:
>>> >> > >>> > > > Hi,
>>> >> > >>> > > >
>>> >> > >>> > > > We found an issue in our testing (thanks aginwala) with active-backup mode in an OVN setup.
>>> >> > >>> > > > In a 3-node setup with pacemaker, after stopping pacemaker on all three nodes (simulating a complete shutdown) and then starting all of them simultaneously, there is a good chance that the whole DB content gets lost.
>>> >> > >>> > > >
>>> >> > >>> > > > After studying the replication code, it seems there is a phase in which the backup node deletes all of its data and waits for the data to be synced from the active node:
>>> >> > >>> > > > https://github.com/openvswitch/ovs/blob/master/ovsdb/replication.c#L306
>>> >> > >>> > > >
>>> >> > >>> > > > In this state, if the node is set to active, then all data is gone for the whole cluster. This can happen in different situations.
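>>> >> > >>> > > > Roughly, the backup's sequence looks like this (an illustrative sketch only - all of the names below are simplified placeholders, not the actual replication.c code):
>>> >> > >>> > > >
>>> >> > >>> > > >     /* Placeholder declarations, for illustration only. */
>>> >> > >>> > > >     struct ovsdb;
>>> >> > >>> > > >     struct json;
>>> >> > >>> > > >     extern void reset_database(struct ovsdb *);
>>> >> > >>> > > >     extern struct json *wait_for_monitor_reply(void);
>>> >> > >>> > > >     extern void apply_monitor_reply(struct ovsdb *, struct json *);
>>> >> > >>> > > >
>>> >> > >>> > > >     void
>>> >> > >>> > > >     backup_sync(struct ovsdb *db)
>>> >> > >>> > > >     {
>>> >> > >>> > > >         /* Step 1: wipe the local copy, including the data on disk. */
>>> >> > >>> > > >         reset_database(db);
>>> >> > >>> > > >
>>> >> > >>> > > >         /* <-- window: the DB is empty here.  If pacemaker promotes
>>> >> > >>> > > >          * this node to active now, the other nodes sync the empty
>>> >> > >>> > > >          * DB and the whole cluster loses its data. */
>>> >> > >>> > > >
>>> >> > >>> > > >         /* Step 2: repopulate from the active node's monitor reply. */
>>> >> > >>> > > >         apply_monitor_reply(db, wait_for_monitor_reply());
>>> >> > >>> > > >     }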
>>> >> > >>> > > > In the test scenario mentioned above it is very likely to happen, since pacemaker just randomly selects one node as master, not knowing the internal sync state of each node. It could also happen when a failover occurs right after a new backup is started, although that is less likely in a real environment, so starting the nodes one by one may largely reduce the probability.
>>> >> > >>> > > >
>>> >> > >>> > > > Does this analysis make sense? We will do more tests to verify the conclusion, but we would like to share it with the community for discussion and suggestions. Once this happens it is very critical - even more serious than just having no HA. Without HA it is only a control plane outage, but this would be a data plane outage, because the OVS flows will be removed accordingly, since the data is considered deleted from ovn-controller's point of view.
>>> >> > >>> > > >
>>> >> > >>> > > > We understand that active-standby is not the ideal HA mechanism and that clustering is the future, and we are also testing the clustering with the latest patch. But it would be good if this problem could be addressed with some quick fix, such as keeping a copy of the old data somewhere until the first sync finishes?
>>> >> > >>> > >
>>> >> > >>> > > This does seem like a plausible bug, and at first glance I believe that you're correct about the race here. I guess that the correct behavior must be to keep the original data until a new copy of the data has been received, and only then atomically replace the original by the new.
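>>> >> > >>> > > One common way to get that atomicity is the temp-file-plus-rename pattern (a generic sketch, not ovsdb-server's actual code): write the new snapshot to a temporary file, fsync it, then rename() it over the old file, so a crash at any point leaves either the complete old copy or the complete new one.
>>> >> > >>> > >
>>> >> > >>> > >     #include <fcntl.h>
>>> >> > >>> > >     #include <stdio.h>
>>> >> > >>> > >     #include <unistd.h>
>>> >> > >>> > >
>>> >> > >>> > >     /* Atomically replace 'path' with 'len' bytes of 'data'. */
>>> >> > >>> > >     static int
>>> >> > >>> > >     replace_file_atomically(const char *path, const char *data,
>>> >> > >>> > >                             size_t len)
>>> >> > >>> > >     {
>>> >> > >>> > >         char tmp[1024];
>>> >> > >>> > >         snprintf(tmp, sizeof tmp, "%s.tmp", path);
>>> >> > >>> > >
>>> >> > >>> > >         int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0600);
>>> >> > >>> > >         if (fd < 0) {
>>> >> > >>> > >             return -1;
>>> >> > >>> > >         }
>>> >> > >>> > >         if (write(fd, data, len) != (ssize_t) len || fsync(fd)) {
>>> >> > >>> > >             close(fd);
>>> >> > >>> > >             unlink(tmp);
>>> >> > >>> > >             return -1;
>>> >> > >>> > >         }
>>> >> > >>> > >         close(fd);
>>> >> > >>> > >
>>> >> > >>> > >         /* rename() is atomic on POSIX: the old data stays intact
>>> >> > >>> > >          * until this call succeeds. */
>>> >> > >>> > >         return rename(tmp, path);
>>> >> > >>> > >     }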
>>> >> > >>> > >
>>> >> > >>> > > Is this something you have time and ability to fix?
>>> >> > >>> >
>>> >> > >>> > Thanks Ben for the quick response. I guess I will not have time until I send out the next series for incremental processing :)
>>> >> > >>> > It would be good if someone could help - please reply to this email if you start working on it, so that we don't end up with overlapping work.
>>> >> > >>
>>> >> > >> I will take a shot at fixing this issue.
>>> >> > >>
>>> >> > >> In the case of tripleo we haven't hit this issue, but I haven't tested this scenario either; I will test it out. One difference compared to your setup is that tripleo uses an IPAddr2 resource and a collocation constraint set.
>>> >> > >>
>>> >> > >> Thanks
>>> >> > >> Numan
>>> >> >
>>> >> > Thanks Numan for helping on this. I think IPAddr2 should have the same problem, if my previous analysis is right, unless using IPAddr2 results in pacemaker always electing the node that is configured with the master IP as the master when pacemaker is started again on all nodes.
>>> >> >
>>> >> > Ali, thanks for the information. Just to clarify: the log "removing xxx database due to server termination" is not related to this issue. It may be misleading, but it does not mean the content of the database is being deleted; it is just clean-up of internal data structures before exiting. The code that deletes the DB data is here:
>>> >> > https://github.com/openvswitch/ovs/blob/master/ovsdb/replication.c#L306
>>> >> > and there is no log message printed for it. You may add a log message there to verify this when you reproduce the issue.
>>> >>
>>> >> Right, "removing" in this case just means "no longer serving".
>>> >
>>> > Hi Han/Ben,
>>> >
>>> > I have submitted two possible solutions for this issue: https://patchwork.ozlabs.org/patch/965246/ and https://patchwork.ozlabs.org/patch/965247/
>>> > Han - can you please try these out and see whether they solve the issue?
>>> >
>>> > Approach 1 resets the database just before processing the monitor reply. This approach is simpler, but it has a small window of error: if the function process_notification() fails for some reason, we could lose the data. I am not sure whether that is a real possibility or not.
>>> >
>>> > Approach 2, on the other hand, stores the monitor reply in an in-memory ovsdb struct, resets the database, and then repopulates the db from the in-memory ovsdb struct.
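>>> > In outline, the two approaches differ like this (a simplified sketch - apart from process_notification(), the names are placeholders with simplified signatures, not the actual patch code):
>>> >
>>> >     /* Placeholder declarations, for illustration only. */
>>> >     struct ovsdb;
>>> >     struct json;
>>> >     extern void reset_database(struct ovsdb *);
>>> >     extern void process_notification(struct ovsdb *, struct json *);
>>> >     extern struct ovsdb *ovsdb_from_monitor_reply(struct json *);
>>> >     extern void copy_from_ovsdb(struct ovsdb *, const struct ovsdb *);
>>> >
>>> >     /* Approach 1: reset, then apply the monitor reply directly. */
>>> >     void
>>> >     approach_1(struct ovsdb *db, struct json *reply)
>>> >     {
>>> >         reset_database(db);              /* old data gone from here on  */
>>> >         process_notification(db, reply); /* on failure, nothing is left */
>>> >     }
>>> >
>>> >     /* Approach 2: parse the reply into a temporary in-memory ovsdb
>>> >      * first, and reset only once the data is known to be usable. */
>>> >     void
>>> >     approach_2(struct ovsdb *db, struct json *reply)
>>> >     {
>>> >         struct ovsdb *shadow = ovsdb_from_monitor_reply(reply);
>>> >         reset_database(db);
>>> >         copy_from_ovsdb(db, shadow);
>>> >     }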
>>> >
>>> > Please let me know which approach seems better, or whether there is another way.
>>> >
>>> > Thanks
>>> > Numan
>>>
>>> Thanks Numan! I like Approach 1 for its simplicity. As for the error situation: if it happens in some extreme case, then since the node is a standby we can make sure it never serves as the active node in that state - by simply exiting. What do you think?
>>
>> I agree that approach 1 is simpler, but I think simply exiting would not help. If pacemaker is used for active/standby, which I suppose is the case with your setup, pacemaker will restart the ovsdb-server again when it sees that the monitor action returns NOT_RUNNING. I think that should be fine, because pacemaker would not promote this node to master while there is already a master. But you found this issue by stopping/starting the pacemaker resource, so I am not sure how it would behave in that case.

Hi Numan,

I agree with you after thinking about it again: simply exiting would not solve the issue. It makes the problem less likely than in the original implementation, but the probability is still there. It seems we will have to either do atomic swapping, so that there is never a state in which the ovsdb has no data in the on-disk file, or keep some state in the file indicating that the DB is in an *incomplete* state and should not be used as an active node. For this reason, even approach 2 still has a problem: imagine the process gets killed after the database has been reset but before the new data has been completely written to the file - it would still leave the data on disk incomplete.
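Something along these lines is what I have in mind for the marker variant (a hypothetical sketch - none of these functions exist today):

    #include <stdbool.h>

    /* Hypothetical placeholder declarations. */
    struct ovsdb;
    struct json;
    extern void mark_db_incomplete(struct ovsdb *);  /* persisted + fsync'd */
    extern void clear_db_incomplete(struct ovsdb *);
    extern bool db_is_marked_incomplete(const struct ovsdb *);
    extern void reset_database(struct ovsdb *);
    extern void process_notification(struct ovsdb *, struct json *);

    void
    backup_sync_with_marker(struct ovsdb *db, struct json *reply)
    {
        mark_db_incomplete(db);          /* survives a crash at any point  */
        reset_database(db);
        process_notification(db, reply); /* we may be killed anywhere here */
        clear_db_incomplete(db);         /* only now is the copy complete  */
    }

    /* The promotion path would then refuse to go active while the
     * marker is still set. */
    bool
    can_serve_as_active(const struct ovsdb *db)
    {
        return !db_is_marked_incomplete(db);
    }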
Regards,
Han

_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss