On Fri, Feb 5, 2016 at 4:31 PM, Bill Farner <wfar...@apache.org> wrote:
> Or without any persistence at all. The client could refuse to adjust the
> instance count on a job unless there's an additional command line argument.
> The same arguments of responsibility could be said here of users of old
> clients or custom clients.
>

I guess that's true. I concur.

> On Fri, Feb 5, 2016 at 3:17 PM, John Sirois <j...@conductant.com> wrote:
>
> > On Fri, Feb 5, 2016 at 4:07 PM, Maxim Khutornenko <ma...@apache.org>
> > wrote:
> >
> > > We have had attempts to safeguard the client updater command with a
> > > "dangerous change" warning before but it did not get good feedback.
> > > Besides, automated tools/scripts just ignored it.
> > >
> > > An alternative could be what George suggested on the scaling API thread
> > > mentioned earlier: automatically bump up instance count to the job
> > > active task count. I'd say this could be an implementation of the
> > > proposal above rather than a safeguard as it accomplishes the exact
> > > same goal.
> > >
> > > Bill, do you have any ideas of what that safeguard could be?
> > >
> >
> > I'd recommend that an API call that reduced instance count require a
> > `confirm_instance_reduction=true` parameter - this could be plumbed back
> > to a flag in the official Aurora client.
> > That said, since Aurora immediately forgets jobs and splits things into
> > tasks, I'm not sure this is even sanely possible today.
> >
> > Assuming it is possible, any human that turns that flag on by default with
> > a shell alias or an rc file can take responsibility for their own problem.
> > If a tool passes the boolean, again - that's the tool's problem. Hopefully
> > it's a carefully developed and vetted auto-scaling tool.
> > >
> > > > On Fri, Feb 5, 2016 at 2:56 PM, Bill Farner <wfar...@apache.org> wrote:
> > > >
> > > >> the outdated instance count problem will only get worse as automated
> > > >> scaling tools will quickly render existing .aurora config value
> > > >> obsolete
> > > >
> > > > This is not a compelling reason to remove functionality. Sounds like a
> > > > safeguard is needed instead.
> > > >
> > > > On Fri, Feb 5, 2016 at 2:43 PM, Maxim Khutornenko <ma...@apache.org>
> > > > wrote:
> > > >
> > > >> This is mostly a survey rather than a proposal. What would people think
> > > >> about limiting updater to only adding/updating instances and let
> > > >> killTasks take care of instance removals?
> > > >>
> > > >> We have all heard stories (or happened to create some ourselves) when an
> > > >> outdated instance count value in .aurora config caused unexpected
> > > >> instance removals. Granted, there are plenty of other values in the
> > > >> config that can cause a service-wide outage but instance count seems to
> > > >> be the worst in that sense.
> > > >>
> > > >> After the recent refactoring of addInstances and killTasks to act as
> > > >> scaleOut/scaleIn APIs [1], the outdated instance count problem will
> > > >> only get worse as automated scaling tools will quickly render existing
> > > >> .aurora config value obsolete. With that in mind, should we block
> > > >> instance removal in the updater and let an explicit killTasks call be
> > > >> the only acceptable action to reduce instance count? Is there any
> > > >> value (aside from arguable convenience factor) in having
> > > >> startJobUpdate ever killing instances?
> > > >>
> > > >> Thanks,
> > > >> Maxim
> > > >>
> > > >> [1] - http://markmail.org/message/2smaej5n5e54li3g
> > > >>
> > > >
> > >
> >
> > --
> > John Sirois
> > 303-512-3301
> >
>

--
John Sirois
303-512-3301
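
For illustration only, the safeguard discussed in this thread could look roughly like the Python sketch below. This is not Aurora code: `validate_instance_count` and `InstanceReductionError` are hypothetical names, and only the `confirm_instance_reduction` flag comes from the proposal quoted above. It also assumes the caller already knows the job's active instance count, which the thread notes may not be easy to get today.

```python
class InstanceReductionError(Exception):
    """Raised when an update would shrink a job without explicit confirmation."""


def validate_instance_count(active_count, requested_count,
                            confirm_instance_reduction=False):
    """Return how many instances the requested update would remove.

    Hypothetical helper, not part of Aurora. It refuses to reduce the
    instance count unless the caller explicitly opts in.
    """
    # Scaling out or keeping the same size removes nothing.
    removed = max(active_count - requested_count, 0)
    if removed and not confirm_instance_reduction:
        raise InstanceReductionError(
            "update would remove %d instance(s); pass "
            "confirm_instance_reduction=True to proceed" % removed)
    return removed
```

With this shape, a stale config that asks for 8 instances on a job running 10 fails unless the caller passes the flag, while scale-out and same-size updates go through unchanged.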