Good idea. This would be another good improvement. We could update _last_written_ssconf only if all RPC calls that distribute it succeed. This way, if some fails, we'll try to redistribute it again on a next config update. But I'm a bit worried that this would slow ConfigWriter considerably if a node failed, especially if there were a network problem and trying to upload would end up with a network timeout. So it's a trade-off between trying to be consistent as much as possible and performance. What do you think?
I guess this could be solved by uploading ssconf asynchronously, which would also speed up all configuration updates considerably, but I would rather focus on this in the WConfD daemon (ATM I'm not completely sure if ssconf falling a bit behind the master configuration would be OK, I haven't worked with ssconf very much). The same idea could be used for distributing the configuration to master candidates. On Mon, Jan 13, 2014 at 9:45 AM, Guido Trotter <[email protected]> wrote: > The only doubt I have about this is: what if an update fails for some > reason? Should we keep track of that and in that case retry more > aggressively? rest LGTM > > Thanks, > > Guido > > On Mon, Jan 13, 2014 at 8:49 AM, Petr Pudlak <[email protected]> wrote: > > Instead of writing it every time cluster.serial_no changes, compare it > > to the last written version and write+distribute if it has changed. This > > has these advantages: > > - Less frequent ssconf distribution. > > - Updates are correctly performed even if something changes that > > affects ssconf, but doesn't touch the cluster.serial_no. > > > > Signed-off-by: Petr Pudlak <[email protected]> > > --- > > lib/config.py | 23 +++++++++++++++-------- > > 1 file changed, 15 insertions(+), 8 deletions(-) > > > > diff --git a/lib/config.py b/lib/config.py > > index fcdefc3..3f5b56e 100644 > > --- a/lib/config.py > > +++ b/lib/config.py > > @@ -199,7 +199,10 @@ class ConfigWriter(object): > > # better to raise an error before starting to modify the config > > # file than after it was modified > > self._my_hostname = netutils.Hostname.GetSysName() > > - self._last_cluster_serial = -1 > > + # A dictionary of strings keeping what has been written to ssconf > at its > > + # last update. This is used for determining if we need to > update/distribute > > + # ssconf or not (which is an expensive operation). > > + self._last_written_ssconf = {} > > self._cfg_id = None > > self._context = None > > self._OpenConfig(accept_foreign) > > @@ -2364,9 +2367,8 @@ class ConfigWriter(object): > > raise errors.ConfigurationError(msg) > > > > self._config_data = data > > - # reset the last serial as -1 so that the next write will cause > > - # ssconf update > > - self._last_cluster_serial = -1 > > + # force the next write to update ssconf > > + self._last_written_ssconf = {} > > > > # Upgrade configuration if needed > > self._UpgradeConfig() > > @@ -2508,12 +2510,15 @@ class ConfigWriter(object): > > # and redistribute the config file to master candidates > > self._DistributeConfig(feedback_fn) > > > > - # Write ssconf files on all nodes (including locally) > > - if self._last_cluster_serial < self._config_data.cluster.serial_no: > > + # Write ssconf files on all nodes (including locally), if anything > has > > + # changed > > + ssconf = self._UnlockedGetSsconfValues() > > + if ssconf != self._last_written_ssconf: > > + logging.debug("Ssconf changed, distributing") > > if not self._offline: > > result = self._GetRpc(None).call_write_ssconf_files( > > self._UnlockedGetNodeNames(self._UnlockedGetOnlineNodeList()), > > - self._UnlockedGetSsconfValues()) > > + ssconf) > > > > for nname, nresu in result.items(): > > msg = nresu.fail_msg > > @@ -2525,7 +2530,9 @@ class ConfigWriter(object): > > if feedback_fn: > > feedback_fn(errmsg) > > > > - self._last_cluster_serial = self._config_data.cluster.serial_no > > + self._last_written_ssconf = ssconf > > + else: > > + logging.debug("Ssconf unchanged") > > > > def _GetAllHvparamsStrings(self, hypervisors): > > """Get the hvparams of all given hypervisors from the config. > > -- > > 1.8.5.1 > > > > > > -- > Guido Trotter > Ganeti Engineering > Google Germany GmbH > Dienerstr. 12, 80331, München > > Registergericht und -nummer: Hamburg, HRB 86891 > Sitz der Gesellschaft: Hamburg > Geschäftsführer: Graham Law, Christine Elizabeth Flores > Steuernummer: 48/725/00206 > Umsatzsteueridentifikationsnummer: DE813741370 >
