Good idea. This would be another good improvement. We could update
_last_written_ssconf only if all RPC calls that distribute it succeed. This
way, if some fails, we'll try to redistribute it again on a next config
update. But I'm a bit worried that this would slow ConfigWriter
considerably if a node failed, especially if there were a network problem
and trying to upload would end up with a network timeout. So it's a
trade-off between trying to be consistent as much as possible and
performance. What do you think?

I guess this could be solved by uploading ssconf asynchronously, which
would also speed up all configuration updates considerably, but I would
rather focus on this in the WConfD daemon (ATM I'm not completely sure if
ssconf falling a bit behind the master configuration would be OK, I haven't
worked with ssconf very much). The same idea could be used for distributing
the configuration to master candidates.


On Mon, Jan 13, 2014 at 9:45 AM, Guido Trotter <[email protected]> wrote:

> The only doubt I have about this is: what if an update fails for some
> reason? Should we keep track of that and in that case retry more
> aggressively? rest LGTM
>
> Thanks,
>
> Guido
>
> On Mon, Jan 13, 2014 at 8:49 AM, Petr Pudlak <[email protected]> wrote:
> > Instead of writing it every time cluster.serial_no changes, compare it
> > to the last written version and write+distribute if it has changed. This
> > has these advantages:
> > - Less frequent ssconf distribution.
> > - Updates are correctly performed even if something changes that
> >   affects ssconf, but doesn't touch the cluster.serial_no.
> >
> > Signed-off-by: Petr Pudlak <[email protected]>
> > ---
> >  lib/config.py | 23 +++++++++++++++--------
> >  1 file changed, 15 insertions(+), 8 deletions(-)
> >
> > diff --git a/lib/config.py b/lib/config.py
> > index fcdefc3..3f5b56e 100644
> > --- a/lib/config.py
> > +++ b/lib/config.py
> > @@ -199,7 +199,10 @@ class ConfigWriter(object):
> >      # better to raise an error before starting to modify the config
> >      # file than after it was modified
> >      self._my_hostname = netutils.Hostname.GetSysName()
> > -    self._last_cluster_serial = -1
> > +    # A dictionary of strings keeping what has been written to ssconf
> at its
> > +    # last update. This is used for determining if we need to
> update/distribute
> > +    # ssconf or not (which is an expensive operation).
> > +    self._last_written_ssconf = {}
> >      self._cfg_id = None
> >      self._context = None
> >      self._OpenConfig(accept_foreign)
> > @@ -2364,9 +2367,8 @@ class ConfigWriter(object):
> >        raise errors.ConfigurationError(msg)
> >
> >      self._config_data = data
> > -    # reset the last serial as -1 so that the next write will cause
> > -    # ssconf update
> > -    self._last_cluster_serial = -1
> > +    # force the next write to update ssconf
> > +    self._last_written_ssconf = {}
> >
> >      # Upgrade configuration if needed
> >      self._UpgradeConfig()
> > @@ -2508,12 +2510,15 @@ class ConfigWriter(object):
> >      # and redistribute the config file to master candidates
> >      self._DistributeConfig(feedback_fn)
> >
> > -    # Write ssconf files on all nodes (including locally)
> > -    if self._last_cluster_serial < self._config_data.cluster.serial_no:
> > +    # Write ssconf files on all nodes (including locally), if anything
> has
> > +    # changed
> > +    ssconf = self._UnlockedGetSsconfValues()
> > +    if ssconf != self._last_written_ssconf:
> > +      logging.debug("Ssconf changed, distributing")
> >        if not self._offline:
> >          result = self._GetRpc(None).call_write_ssconf_files(
> >            self._UnlockedGetNodeNames(self._UnlockedGetOnlineNodeList()),
> > -          self._UnlockedGetSsconfValues())
> > +          ssconf)
> >
> >          for nname, nresu in result.items():
> >            msg = nresu.fail_msg
> > @@ -2525,7 +2530,9 @@ class ConfigWriter(object):
> >              if feedback_fn:
> >                feedback_fn(errmsg)
> >
> > -      self._last_cluster_serial = self._config_data.cluster.serial_no
> > +      self._last_written_ssconf = ssconf
> > +    else:
> > +      logging.debug("Ssconf unchanged")
> >
> >    def _GetAllHvparamsStrings(self, hypervisors):
> >      """Get the hvparams of all given hypervisors from the config.
> > --
> > 1.8.5.1
> >
>
>
>
> --
> Guido Trotter
> Ganeti Engineering
> Google Germany GmbH
> Dienerstr. 12, 80331, München
>
> Registergericht und -nummer: Hamburg, HRB 86891
> Sitz der Gesellschaft: Hamburg
> Geschäftsführer: Graham Law, Christine Elizabeth Flores
> Steuernummer: 48/725/00206
> Umsatzsteueridentifikationsnummer: DE813741370
>

Reply via email to