Re: [ovs-dev] [PATCH] ovsdb-cs: Avoid unnecessary re-connections when updating remotes.
On 7/15/21 8:31 PM, Dumitru Ceara wrote:
> On 7/15/21 7:37 PM, Han Zhou wrote:
>> On Thu, Jul 15, 2021 at 9:17 AM Ilya Maximets wrote:
>>> On 6/29/21 9:57 PM, Ilya Maximets wrote:
>>>>> Regarding the current patch, I think it's better to add a test case
>>>>> to cover the scenario and confirm that existing connections didn't
>>>>> reset.  With that:
>>>>> Acked-by: Han Zhou
>>>>
>>>> I'll work on a unit test for this.
>>>
>>> Hi.  Here is a unit test that I came up with:
>>>
>>> [...]
>>>
>>> Dumitru, Han, if it looks good to you, I can squash it in before
>>> applying the patch.  What do you think?
>>
>> Thanks Ilya.  LGTM.
>
> Looks good to me too, thanks!

Applied.  Thanks!

Best regards, Ilya Maximets.

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH] ovsdb-cs: Avoid unnecessary re-connections when updating remotes.
On 7/15/21 7:37 PM, Han Zhou wrote:
> On Thu, Jul 15, 2021 at 9:17 AM Ilya Maximets wrote:
>> On 6/29/21 9:57 PM, Ilya Maximets wrote:
>>>>> Regarding the current patch, I think it's better to add a test case
>>>>> to cover the scenario and confirm that existing connections didn't
>>>>> reset.  With that:
>>>>> Acked-by: Han Zhou
>>>>
>>>> I'll work on a unit test for this.
>>
>> Hi.  Here is a unit test that I came up with:
>>
>> [...]
>>
>> Dumitru, Han, if it looks good to you, I can squash it in before
>> applying the patch.  What do you think?
>
> Thanks Ilya.  LGTM.

Looks good to me too, thanks!

>> Best regards, Ilya Maximets.
Re: [ovs-dev] [PATCH] ovsdb-cs: Avoid unnecessary re-connections when updating remotes.
On Thu, Jul 15, 2021 at 9:17 AM Ilya Maximets wrote:
> On 6/29/21 9:57 PM, Ilya Maximets wrote:
> >>> Regarding the current patch, I think it's better to add a test case
> >>> to cover the scenario and confirm that existing connections didn't
> >>> reset.  With that:
> >>> Acked-by: Han Zhou
> >
> > I'll work on a unit test for this.
>
> Hi.  Here is a unit test that I came up with:
>
> [...]
>
> Dumitru, Han, if it looks good to you, I can squash it in before
> applying the patch.  What do you think?

Thanks Ilya.  LGTM.

> Best regards, Ilya Maximets.
Re: [ovs-dev] [PATCH] ovsdb-cs: Avoid unnecessary re-connections when updating remotes.
On 6/29/21 9:57 PM, Ilya Maximets wrote:
>>> Regarding the current patch, I think it's better to add a test case to
>>> cover the scenario and confirm that existing connections didn't reset.
>>> With that:
>>> Acked-by: Han Zhou
>
> I'll work on a unit test for this.

Hi.  Here is a unit test that I came up with:

diff --git a/tests/ovsdb-idl.at b/tests/ovsdb-idl.at
index 62181dd4d..e32f9ec89 100644
--- a/tests/ovsdb-idl.at
+++ b/tests/ovsdb-idl.at
@@ -2282,3 +2282,27 @@ OVSDB_CHECK_CLUSTER_IDL_C([simple idl, monitor_cond_since, cluster disconnect],
 008: table simple: i=1 r=2 b=true s= u=<0> ia=[] ra=[] ba=[] sa=[] ua=[] uuid=<2>
 009: done
 ]])
+
+dnl This test checks that IDL keeps the existing connection to the server if
+dnl it's still on a list of remotes after update.
+OVSDB_CHECK_IDL_C([simple idl, initially empty, set remotes],
+  [],
+  [['set-remote unix:socket' \
+    '+set-remote unix:bad_socket,unix:socket' \
+    '+set-remote unix:bad_socket' \
+    '+set-remote unix:socket' \
+    'set-remote unix:bad_socket,unix:socket' \
+    '+set-remote unix:socket' \
+    '+reconnect']],
+  [[000: empty
+001: new remotes: unix:socket, is connected: true
+002: new remotes: unix:bad_socket,unix:socket, is connected: true
+003: new remotes: unix:bad_socket, is connected: false
+004: new remotes: unix:socket, is connected: false
+005: empty
+006: new remotes: unix:bad_socket,unix:socket, is connected: true
+007: new remotes: unix:socket, is connected: true
+008: reconnect
+009: empty
+010: done
+]])
diff --git a/tests/test-ovsdb.c b/tests/test-ovsdb.c
index a886f971e..93329cd4c 100644
--- a/tests/test-ovsdb.c
+++ b/tests/test-ovsdb.c
@@ -2621,6 +2621,7 @@ do_idl(struct ovs_cmdl_context *ctx)
     setvbuf(stdout, NULL, _IONBF, 0);
 
     symtab = ovsdb_symbol_table_create();
+    const char remote_s[] = "set-remote ";
     const char cond_s[] = "condition ";
     if (ctx->argc > 2 && strstr(ctx->argv[2], cond_s)) {
         update_conditions(idl, ctx->argv[2] + strlen(cond_s));
@@ -2664,6 +2665,11 @@ do_idl(struct ovs_cmdl_context *ctx)
         if (!strcmp(arg, "reconnect")) {
             print_and_log("%03d: reconnect", step++);
             ovsdb_idl_force_reconnect(idl);
+        } else if (!strncmp(arg, remote_s, strlen(remote_s))) {
+            ovsdb_idl_set_remote(idl, arg + strlen(remote_s), true);
+            print_and_log("%03d: new remotes: %s, is connected: %s", step++,
+                          arg + strlen(remote_s),
+                          ovsdb_idl_is_connected(idl) ? "true" : "false");
         } else if (!strncmp(arg, cond_s, strlen(cond_s))) {
             update_conditions(idl, arg + strlen(cond_s));
             print_and_log("%03d: change conditions", step++);
---

Dumitru, Han, if it looks good to you, I can squash it in before
applying the patch.  What do you think?

Best regards, Ilya Maximets.
Re: [ovs-dev] [PATCH] ovsdb-cs: Avoid unnecessary re-connections when updating remotes.
On Tue, Jun 29, 2021 at 09:57:44PM +0200, Ilya Maximets wrote:
> For the more or less automatic ways of solving the disbalance there are
> few more ideas that we can explore:
>
> - Try to measure the load on the ovsdb-server process and report it
>   somehow in the _Server database, so the client might make a decision
>   to re-connect to a less loaded server.  This might be some metric
>   based on total number of clients or the time it takes to run a
>   single event processing loop (poll interval).

The servers could report their individual loads to each other via the
leader, and report all the server loads in _Server.  This has pitfalls
too, of course.
Re: [ovs-dev] [PATCH] ovsdb-cs: Avoid unnecessary re-connections when updating remotes.
On 6/29/21 8:05 PM, Ben Pfaff wrote:
> On Tue, Jun 29, 2021 at 10:29:59AM -0700, Han Zhou wrote:
>> On Tue, Jun 29, 2021 at 8:43 AM Ben Pfaff wrote:
>>> On Tue, Jun 29, 2021 at 12:56:18PM +0200, Ilya Maximets wrote:
>>>> If a new database server added to the cluster, or if one of the
>>>> [...]
>>>> If required, re-connection can be forced after the update of remotes
>>>> with ovsdb_cs_force_reconnect().
>>>
>>> I think one of the goals here was to keep the load balanced as servers
>>> are added.

Yes, I thought about that, and that is a valid point.  It's more like a
trade-off here between stability of connections and trying to keep the
load balanced in some way.

>>> Maybe that's not a big deal, or maybe it would make sense to
>>> flip a coin for each of the new servers and switch over to it with
>>> probability 1/n where n is the number of servers.

That seems like an interesting approach, but I think the resulting
probability of keeping the connection would be low.

>> A similar load-balancing problem exists also when a server is down and
>> then recovered.  Connections will obviously move away when it is down
>> but they won't automatically connect back when it is recovered.  Apart
>> from the flipping-a-coin approach suggested by Ben, I saw a proposal
>> [0] [1] in the past that provides a CLI to reconnect to a specific
>> server which leaves this burden to CMS/operators.  It is not ideal but
>> still could be an alternative to solve the problem.

I remember these patches.  And I think that disbalance after one of the
servers went down and up again (e.g. temporary disconnection of one of
the cluster nodes) is a more important issue, and at the same time it is
harder to solve, because it happens automatically, without intervention
from the user/CMS's side.  And to some extent it's inevitable.  E.g. the
cluster will almost always be disbalanced if 3 server nodes are
restarted for an upgrade one by one.  Luckily, worker nodes with
ovn-controllers need maintenance too, so eventual load balance will be
achieved.

One interesting side effect of the current patch is that you can mimic
the behavior of patches [0][1] like this:

  set ovn-remote=<only the new server>
  set ovn-remote=<the full list of servers>

After the first command, the ovn-controller will re-connect to the new
server, and it will not re-connect again after addition of all the other
servers back to the list.  But I agree that this looks more like a hack
than an actual way to do that.

For the more or less automatic ways of solving the disbalance, there are
a few more ideas that we can explore:

- Try to measure the load on the ovsdb-server process and report it
  somehow in the _Server database, so the client might make a decision
  to re-connect to a less loaded server.  This might be some metric
  based on the total number of clients or the time it takes to run a
  single event processing loop (poll interval).

- A bit more controlled way is to limit the number of clients per
  server, so the server will decline connection attempts.  CMS might
  have an idea how many clients one server is able/allowed to handle.
  E.g. for N servers and M clients, it might be reasonable to allow not
  more than 2M/N connections per server, to still be able to serve all
  clients if half of the servers are down.  Of course, it's up to the
  CMS/user to decide on the exact number.  This could be implemented as
  an extra column for the connection row in the database.

>> I think both approaches have their pros and cons.  The smart way
>> doesn't require human intervention in theory, but when operating at
>> scale people usually want to be cautious and have more control over
>> the changes.  For example, they may want to add the server to the
>> cluster first, and then gradually move 1/n connections to the new
>> server after a graceful period, or they could be more conservative and
>> only let the new server take new connections without moving any
>> existing connections.  I'd support both options and let the operators
>> decide according to their requirements.

This sounds reasonable.

>> Regarding the current patch, I think it's better to add a test case to
>> cover the scenario and confirm that existing connections didn't reset.
>> With that:
>> Acked-by: Han Zhou

I'll work on a unit test for this.
Re: [ovs-dev] [PATCH] ovsdb-cs: Avoid unnecessary re-connections when updating remotes.
On Tue, Jun 29, 2021 at 10:29:59AM -0700, Han Zhou wrote:
> On Tue, Jun 29, 2021 at 8:43 AM Ben Pfaff wrote:
>> On Tue, Jun 29, 2021 at 12:56:18PM +0200, Ilya Maximets wrote:
>>> If a new database server added to the cluster, or if one of the
>>> [...]
>>> If required, re-connection can be forced after the update of remotes
>>> with ovsdb_cs_force_reconnect().
>>
>> I think one of the goals here was to keep the load balanced as servers
>> are added.  Maybe that's not a big deal, or maybe it would make sense
>> to flip a coin for each of the new servers and switch over to it with
>> probability 1/n where n is the number of servers.
>
> A similar load-balancing problem exists also when a server is down and
> then recovered.  Connections will obviously move away when it is down
> but they won't automatically connect back when it is recovered.  Apart
> from the flipping-a-coin approach suggested by Ben, I saw a proposal
> [0] [1] in the past that provides a CLI to reconnect to a specific
> server which leaves this burden to CMS/operators.  It is not ideal but
> still could be an alternative to solve the problem.
>
> I think both approaches have their pros and cons.  The smart way
> doesn't require human intervention in theory, but when operating at
> scale people usually want to be cautious and have more control over the
> changes.  For example, they may want to add the server to the cluster
> first, and then gradually move 1/n connections to the new server after
> a graceful period, or they could be more conservative and only let the
> new server take new connections without moving any existing
> connections.  I'd support both options and let the operators decide
> according to their requirements.
>
> Regarding the current patch, I think it's better to add a test case to
> cover the scenario and confirm that existing connections didn't reset.
> With that:
> Acked-by: Han Zhou

This seems reasonable; to be sure, I'm not arguing against Ilya's
approach, just trying to explain my recollection of why it was done
this way.
Re: [ovs-dev] [PATCH] ovsdb-cs: Avoid unnecessary re-connections when updating remotes.
On Tue, Jun 29, 2021 at 8:43 AM Ben Pfaff wrote:
> On Tue, Jun 29, 2021 at 12:56:18PM +0200, Ilya Maximets wrote:
>> If a new database server added to the cluster, or if one of the
>> [...]
>> If required, re-connection can be forced after the update of remotes
>> with ovsdb_cs_force_reconnect().
>
> I think one of the goals here was to keep the load balanced as servers
> are added.  Maybe that's not a big deal, or maybe it would make sense
> to flip a coin for each of the new servers and switch over to it with
> probability 1/n where n is the number of servers.

A similar load-balancing problem exists also when a server is down and
then recovered.  Connections will obviously move away when it is down,
but they won't automatically connect back when it is recovered.  Apart
from the flipping-a-coin approach suggested by Ben, I saw a proposal
[0] [1] in the past that provides a CLI to reconnect to a specific
server, which leaves this burden to CMS/operators.  It is not ideal,
but it could still be an alternative way to solve the problem.

I think both approaches have their pros and cons.  The smart way
doesn't require human intervention in theory, but when operating at
scale people usually want to be cautious and have more control over the
changes.  For example, they may want to add the server to the cluster
first, and then gradually move 1/n of the connections to the new server
after a graceful period, or they could be more conservative and only
let the new server take new connections without moving any existing
connections.  I'd support both options and let the operators decide
according to their requirements.

Regarding the current patch, I think it's better to add a test case to
cover the scenario and confirm that existing connections didn't reset.
With that:

Acked-by: Han Zhou

[0] https://mail.openvswitch.org/pipermail/ovs-dev/2020-August/373895.html
[1] https://mail.openvswitch.org/pipermail/ovs-dev/2020-August/374237.html
Re: [ovs-dev] [PATCH] ovsdb-cs: Avoid unnecessary re-connections when updating remotes.
On Tue, Jun 29, 2021 at 12:56:18PM +0200, Ilya Maximets wrote:
> If a new database server added to the cluster, or if one of the
> [...]
> If required, re-connection can be forced after the update of remotes
> with ovsdb_cs_force_reconnect().

I think one of the goals here was to keep the load balanced as servers
are added.  Maybe that's not a big deal, or maybe it would make sense to
flip a coin for each of the new servers and switch over to it with
probability 1/n where n is the number of servers.
Re: [ovs-dev] [PATCH] ovsdb-cs: Avoid unnecessary re-connections when updating remotes.
On 6/29/21 12:56 PM, Ilya Maximets wrote:
> If a new database server added to the cluster, or if one of the
> [...]
> Signed-off-by: Ilya Maximets
> ---

I tried it out and it works fine; the code looks OK to me:

Acked-by: Dumitru Ceara

Thanks,
Dumitru
[ovs-dev] [PATCH] ovsdb-cs: Avoid unnecessary re-connections when updating remotes.
If a new database server is added to the cluster, or if one of the
database servers changed its IP address or port, then you need to
update the list of remotes for the client.  For example, if a new
OVN_Southbound database server is added, you need to update the
ovn-remote for the ovn-controller.

However, in the current implementation, the ovsdb-cs module always
closes the current connection and creates a new one.  This can lead
to a storm of re-connections if all ovn-controllers are updated
simultaneously.  They can also start re-downloading the database
content, creating even more load on the database servers.

Correct this by saving an existing connection if it is still in the
list of remotes after the update.

The 'reconnect' module will report connection state updates, but that
is OK since no real re-connection happened and we only updated the
state of a new 'reconnect' instance.

If required, re-connection can be forced after the update of remotes
with ovsdb_cs_force_reconnect().

Signed-off-by: Ilya Maximets
---
 lib/jsonrpc.c  | 13 +
 lib/jsonrpc.h  |  1 +
 lib/ovsdb-cs.c | 25 +
 lib/svec.c     | 11 +++
 lib/svec.h     |  1 +
 5 files changed, 47 insertions(+), 4 deletions(-)

diff --git a/lib/jsonrpc.c b/lib/jsonrpc.c
index 926cbcb86..7e333ae25 100644
--- a/lib/jsonrpc.c
+++ b/lib/jsonrpc.c
@@ -952,6 +952,19 @@ jsonrpc_session_steal(struct jsonrpc_session *s)
     return rpc;
 }
 
+void
+jsonrpc_session_replace(struct jsonrpc_session *s, struct jsonrpc *rpc)
+{
+    if (s->rpc) {
+        jsonrpc_close(s->rpc);
+    }
+    s->rpc = rpc;
+    if (s->rpc) {
+        reconnect_set_name(s->reconnect, jsonrpc_get_name(s->rpc));
+        reconnect_connected(s->reconnect, time_msec());
+    }
+}
+
 static void
 jsonrpc_session_disconnect(struct jsonrpc_session *s)
 {
diff --git a/lib/jsonrpc.h b/lib/jsonrpc.h
index d75d66b86..034a30b71 100644
--- a/lib/jsonrpc.h
+++ b/lib/jsonrpc.h
@@ -114,6 +114,7 @@ struct jsonrpc_session *jsonrpc_session_open_unreliably(struct jsonrpc *,
 void jsonrpc_session_close(struct jsonrpc_session *);
 struct jsonrpc *jsonrpc_session_steal(struct jsonrpc_session *);
+void jsonrpc_session_replace(struct jsonrpc_session *, struct jsonrpc *);
 
 void jsonrpc_session_run(struct jsonrpc_session *);
 void jsonrpc_session_wait(struct jsonrpc_session *);
diff --git a/lib/ovsdb-cs.c b/lib/ovsdb-cs.c
index 911b71dd4..22630955d 100644
--- a/lib/ovsdb-cs.c
+++ b/lib/ovsdb-cs.c
@@ -660,7 +660,8 @@ ovsdb_cs_wait(struct ovsdb_cs *cs)
 
 /* Network connection. */
 
-/* Changes the remote and creates a new session.
+/* Changes the remote and creates a new session.  Keeps existing connection
+ * if current remote is still valid.
  *
  * If 'retry' is true, the connection to the remote will automatically retry
  * when it fails.  If 'retry' is false, the connection is one-time. */
@@ -670,9 +671,12 @@ ovsdb_cs_set_remote(struct ovsdb_cs *cs, const char *remote, bool retry)
     if (cs && ((remote != NULL) != (cs->remote != NULL)
                || (remote && cs->remote && strcmp(remote, cs->remote)))) {
+        struct jsonrpc *rpc = NULL;
+
         /* Close the old session, if any. */
         if (cs->session) {
-            jsonrpc_session_close(cs->session);
+            /* Save the current open connection and close the session. */
+            rpc = jsonrpc_session_steal(cs->session);
             cs->session = NULL;
 
             free(cs->remote);
@@ -682,17 +686,30 @@ ovsdb_cs_set_remote(struct ovsdb_cs *cs, const char *remote, bool retry)
         /* Open new session, if any. */
         if (remote) {
             struct svec remotes = SVEC_EMPTY_INITIALIZER;
+            struct uuid old_cid = cs->cid;
+
             ovsdb_session_parse_remote(remote, &remotes, &cs->cid);
             if (cs->shuffle_remotes) {
                 svec_shuffle(&remotes);
             }
             cs->session = jsonrpc_session_open_multiple(&remotes, retry);
-            svec_destroy(&remotes);
 
-            cs->state_seqno = UINT_MAX;
+            /* Use old connection, if cluster id didn't change and the remote
+             * is still on the list, to avoid unnecessary re-connection. */
+            if (rpc && uuid_equals(&old_cid, &cs->cid)
+                && svec_contains_unsorted(&remotes, jsonrpc_get_name(rpc))) {
+                jsonrpc_session_replace(cs->session, rpc);
+                cs->state_seqno = jsonrpc_session_get_seqno(cs->session);
+                rpc = NULL;
+            } else {
+                cs->state_seqno = UINT_MAX;
+            }
+            svec_destroy(&remotes);
 
             cs->remote = xstrdup(remote);
         }
+
+        jsonrpc_close(rpc);
     }
 }
diff --git a/lib/svec.c b/lib/svec.c
index c1b986bab..6ea96384b 100644
--- a/lib/svec.c
+++ b/lib/svec.c
@@ -247,6 +247,17 @@ svec_contains(const struct svec *svec, const char *name)
     return svec_find(svec, name) != SIZE_MAX;
 }
 
+bool
+svec_contains_unsor