On Tue, Sep 18, 2018 at 1:41 PM Ben Pfaff <b...@ovn.org> wrote:

> On Tue, Sep 11, 2018 at 10:55:56AM -0700, Han Zhou wrote:
> > On Tue, Sep 11, 2018 at 10:30 AM <nusid...@redhat.com> wrote:
> > >
> > > From: Numan Siddique <nusid...@redhat.com>
> > >
> > > The present code resets the database when it is in the state -
> > > 'RPL_S_SCHEMA_REQUESTED' and repopulates the database when it
> > > receives the monitor reply when it is in the state -
> > > 'RPL_S_MONITOR_REQUESTED'. If however, it goes to active mode
> > > before it processes the monitor reply, the whole data is lost.
> > >
> > > This patch alleviates the problem by resetting the database when it
> > > receives the monitor reply (before processing it). So that
> > > reset and repopulation of the db happens in the same state.
> > >
> > > This approach still has a window for data loss if the function
> > > process_notification() when processing the monitor reply fails for
> > > some reason or ovsdb-server crashes for some reason during
> > > process_notification().
> > >
> > > Reported-by: Han Zhou <zhou...@gmail.com>
> > > Reported-at:
> >
> https://mail.openvswitch.org/pipermail/ovs-discuss/2018-August/047161.html
> > > Tested-by: aginwala <aginw...@ebay.com>
> > > Acked-by: Han Zhou <zhou...@gmail.com>
> > > Signed-off-by: Numan Siddique <nusid...@redhat.com>
> > > ---
> > >  ovsdb/replication.c | 25 +++++++++++--------------
> > >  1 file changed, 11 insertions(+), 14 deletions(-)
> > >
> > > v1 -> v2
> > > --------
> > >  * Updated the commit message as per Han's suggestion
> > >  * Added few comments in the code where it resets the db.
> > >
> > > diff --git a/ovsdb/replication.c b/ovsdb/replication.c
> > > index 2b9ae2f83..752b3c89c 100644
> > > --- a/ovsdb/replication.c
> > > +++ b/ovsdb/replication.c
> > > @@ -299,19 +299,7 @@ replication_run(void)
> > >                  /* After receiving schemas, reset the local databases
> > that
> > >                   * will be monitored and send out monitor requests for
> > them. */
> > >                  if (hmap_is_empty(&request_ids)) {
> > > -                    struct shash_node *node, *next;
> > > -
> > > -                    SHASH_FOR_EACH_SAFE (node, next, replication_dbs)
> {
> > > -                        db = node->data;
> > > -                        error = reset_database(db);
> > > -                        if (error) {
> > > -                            const char *db_name = db->schema->name;
> > > -                            shash_find_and_delete(replication_dbs,
> > db_name);
> > > -                            ovsdb_error_assert(error);
> > > -                            VLOG_WARN("Failed to reset database, "
> > > -                                      "%s not replicated.", db_name);
> > > -                        }
> > > -                    }
> > > +                    struct shash_node *node;
> > >
> > >                      if (shash_is_empty(replication_dbs)) {
> > >                          VLOG_WARN("Nothing to replicate.");
> > > @@ -335,7 +323,16 @@ replication_run(void)
> > >              case RPL_S_MONITOR_REQUESTED: {
> > >                  /* Reply to monitor requests. */
> > >                  struct ovsdb_error *error;
> > > -                error = process_notification(msg->result, db);
> > > +                VLOG_INFO("Monitor request received. Resetting the
> > database");
> > > +                /* Resetting the database here has few risks. If the
> > > +                 * process_notification() fails, the database is
> > completely
> > > +                 * lost locally. In case that node becomes active,
> then
> > > +                 * there is a chance of complete data loss in the
> > active/standy
> > > +                 * cluster. */
> > > +                error = reset_database(db);
> > > +                if (!error) {
> > > +                    error = process_notification(msg->result, db);
> > > +                }
> > >                  if (error) {
> > >                      ovsdb_error_assert(error);
> > >                      state = RPL_S_ERR;
> > > --
> > > 2.17.1
> > >
> > > _______________________________________________
> > > dev mailing list
> > > d...@openvswitch.org
> > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >
> > Thanks Numan!
>
> Thanks Numan and Han.  I applied this to master.  If you'd like it
> backported, please let me know (and how far).
>

Thanks Ben for the review and applying the patch. Can you please backport
to 2.10 and 2.9 branches.
They applied cleanly to me without any issues.

Thanks
Numan
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to