On Tue, Sep 18, 2018 at 1:41 PM Ben Pfaff <b...@ovn.org> wrote: > On Tue, Sep 11, 2018 at 10:55:56AM -0700, Han Zhou wrote: > > On Tue, Sep 11, 2018 at 10:30 AM <nusid...@redhat.com> wrote: > > > > > > From: Numan Siddique <nusid...@redhat.com> > > > > > > The present code resets the database when it is in the state - > > > 'RPL_S_SCHEMA_REQUESTED' and repopulates the database when it > > > receives the monitor reply when it is in the state - > > > 'RPL_S_MONITOR_REQUESTED'. If however, it goes to active mode > > > before it processes the monitor reply, the whole data is lost. > > > > > > This patch alleviates the problem by resetting the database when it > > > receives the monitor reply (before processing it). So that > > > reset and repopulation of the db happens in the same state. > > > > > > This approach still has a window for data loss if the function > > > process_notification() when processing the monitor reply fails for > > > some reason or ovsdb-server crashes for some reason during > > > process_notification(). > > > > > > Reported-by: Han Zhou <zhou...@gmail.com> > > > Reported-at: > > > https://mail.openvswitch.org/pipermail/ovs-discuss/2018-August/047161.html > > > Tested-by: aginwala <aginw...@ebay.com> > > > Acked-by: Han Zhou <zhou...@gmail.com> > > > Signed-off-by: Numan Siddique <nusid...@redhat.com> > > > --- > > > ovsdb/replication.c | 25 +++++++++++-------------- > > > 1 file changed, 11 insertions(+), 14 deletions(-) > > > > > > v1 -> v2 > > > -------- > > > * Updated the commit message as per Han's suggestion > > > * Added few comments in the code where it resets the db. > > > > > > diff --git a/ovsdb/replication.c b/ovsdb/replication.c > > > index 2b9ae2f83..752b3c89c 100644 > > > --- a/ovsdb/replication.c > > > +++ b/ovsdb/replication.c > > > @@ -299,19 +299,7 @@ replication_run(void) > > > /* After receiving schemas, reset the local databases > > that > > > * will be monitored and send out monitor requests for > > them. */ > > > if (hmap_is_empty(&request_ids)) { > > > - struct shash_node *node, *next; > > > - > > > - SHASH_FOR_EACH_SAFE (node, next, replication_dbs) > { > > > - db = node->data; > > > - error = reset_database(db); > > > - if (error) { > > > - const char *db_name = db->schema->name; > > > - shash_find_and_delete(replication_dbs, > > db_name); > > > - ovsdb_error_assert(error); > > > - VLOG_WARN("Failed to reset database, " > > > - "%s not replicated.", db_name); > > > - } > > > - } > > > + struct shash_node *node; > > > > > > if (shash_is_empty(replication_dbs)) { > > > VLOG_WARN("Nothing to replicate."); > > > @@ -335,7 +323,16 @@ replication_run(void) > > > case RPL_S_MONITOR_REQUESTED: { > > > /* Reply to monitor requests. */ > > > struct ovsdb_error *error; > > > - error = process_notification(msg->result, db); > > > + VLOG_INFO("Monitor request received. Resetting the > > database"); > > > + /* Resetting the database here has few risks. If the > > > + * process_notification() fails, the database is > > completely > > > + * lost locally. In case that node becomes active, > then > > > + * there is a chance of complete data loss in the > > active/standy > > > + * cluster. */ > > > + error = reset_database(db); > > > + if (!error) { > > > + error = process_notification(msg->result, db); > > > + } > > > if (error) { > > > ovsdb_error_assert(error); > > > state = RPL_S_ERR; > > > -- > > > 2.17.1 > > > > > > _______________________________________________ > > > dev mailing list > > > d...@openvswitch.org > > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > > > > Thanks Numan! > > Thanks Numan and Han. I applied this to master. If you'd like it > backported, please let me know (and how far). >
Thanks Ben for the review and applying the patch. Can you please backport to 2.10 and 2.9 branches. They applied cleanly to me without any issues. Thanks Numan _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev