On 7/15/20 4:51 PM, Dumitru Ceara wrote: > On 7/15/20 4:22 PM, Federico Paolinelli wrote: >> >> >> On Tue, Jul 14, 2020 at 10:22 AM Dumitru Ceara <dce...@redhat.com >> <mailto:dce...@redhat.com>> wrote: >> >> On 7/10/20 10:18 AM, Federico Paolinelli wrote: >> > There are some occurrences where the database ends up in an >> inconsistent >> > state. This happened in ovn-k8s and is described in [0]. >> > Here we are adding a supported way to check that a given db is >> consistent, >> > which is less error prone than checking the logs. >> > >> > Tested against both a valid db and a corrupted db attached to the >> > above bug [1]. >> > >> > [0]: https://bugzilla.redhat.com/show_bug.cgi?id=1837953#c23 >> > [1]: https://bugzilla.redhat.com/attachment.cgi?id=1697595 >> > >> > Signed-off-by: Federico Paolinelli <fpaol...@redhat.com >> <mailto:fpaol...@redhat.com>> >> > Suggested-by: Dumitru Ceara <dce...@redhat.com >> <mailto:dce...@redhat.com>> >> > --- >> > ovsdb/ovsdb-tool.c | 22 ++++++++++++++++++++++ >> > 1 file changed, 22 insertions(+) >> > >> > diff --git a/ovsdb/ovsdb-tool.c b/ovsdb/ovsdb-tool.c >> > index 91662cab8..016a3ba28 100644 >> > --- a/ovsdb/ovsdb-tool.c >> > +++ b/ovsdb/ovsdb-tool.c >> > @@ -1497,6 +1497,28 @@ do_check_cluster(struct ovs_cmdl_context *ctx) >> > } >> > } >> > >> > + /* Check for db consistency: >> > + * The serverid must be in the servers list. >> > + */ >> > + >> > + for (struct server *s = c.servers; s < >> &c.servers[c.n_servers]; s++) { >> > + struct shash *servers_obj = json_object(s->snap->servers); >> > + char *server_id = xasprintf(SID_FMT, >> SID_ARGS(&s->header.sid)); >> > + bool found = false; >> > + const struct shash_node *node; >> > + >> > + SHASH_FOR_EACH (node, servers_obj) { >> > + if (!strncmp(server_id, node->name, SID_LEN)) { >> > + found = true; >> > + } >> > + } >> > + if (!found) { >> > + ovs_fatal(0, "%s: server %s not found in server list", >> > + s->filename, server_id); >> > + } >> > + free(server_id); >> > + } >> > + >> > /* Clean up. */ >> > >> > for (size_t i = 0; i < c.n_servers; i++) { >> > >> >> Looks good to me, thanks! >> >> Acked-by: Dumitru Ceara <dce...@redhat.com <mailto:dce...@redhat.com>> >> >> >> Question: does it make sense to add a unit test that validates the >> behaviour >> of this new function? And if so, what would be the best place to add it? >> >> Thanks! >> > > I might be wrong but I don't see an easy way to generate a corrupted > database to run "ovsdb-tool check-cluster" against. > > Thanks, > Dumitru >
Hi Federico, As discussed offline, we probably need a new revision of this patch to check the s->entries[*].servers too and fail the check only when the server is not in the raft header and also not in the log entries. Regards, Dumitru _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev