This is mostly a survey rather than a proposal. What would people think
about limiting the updater to only adding/updating instances and letting
killTasks take care of instance removals?

We have all heard stories (or happened to create some ourselves) where
an outdated instance count value in the .aurora config caused
unexpected instance removals. Granted, there are plenty of other values
in the config that can cause a service-wide outage, but instance count
seems to be the worst in that sense.

After the recent refactoring of addInstances and killTasks to act as
scaleOut/scaleIn APIs [1], the outdated instance count problem will
only get worse, as automated scaling tools will quickly render the
existing .aurora config value obsolete. With that in mind, should we
block instance removal in the updater and let an explicit killTasks
call be the only acceptable way to reduce instance count? Is there any
value (aside from an arguable convenience factor) in having
startJobUpdate ever kill instances?
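
To make the question concrete, here is a minimal sketch of the rule
being asked about. The function and exception names are made up for
illustration and are not part of the Aurora code base; the only
assumption is that the updater can compare the job's current instance
count with the count requested by the update.

```python
class InstanceRemovalBlocked(Exception):
    """Hypothetical error: the update would shrink the job."""


def check_update_instance_count(current_count, requested_count):
    """Return how many instances the update is allowed to add.

    Hypothetical guard, not Aurora's actual updater logic: an update
    may keep or grow the instance count, but never reduce it.
    """
    if requested_count < current_count:
        removed = current_count - requested_count
        raise InstanceRemovalBlocked(
            "update would remove %d instance(s); "
            "use killTasks to scale in" % removed)
    return requested_count - current_count
```

Under this rule, a stale config that says 10 instances for a job that
has since been scaled out to 12 would be rejected instead of silently
killing two instances.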

Thanks,
Maxim

[1] - http://markmail.org/message/2smaej5n5e54li3g
