Thanks for your comment, Stephan. I have moved it into the doc to keep discussion history in one place.
On Wed, May 6, 2015 at 1:33 AM, Erb, Stephan <[email protected]> wrote:
> Hi Maxim,
>
> I am not keen on the potential risk of tasks getting stuck in STARTING. We
> perform auto-scaling of jobs, so there might be nobody around to notice and
> correct the problem in time.
>
> How about keeping the initial_interval_secs and just change its meaning to be
> grace period, so that health checks are triggered but errors ignored during
> this interval.
>
> The initial_interval_secs is then a user-configurable upper bound of when a
> job is meant to be working. It can even be set rather high, because it won't
> affect the update performance.
>
> What do you think?
>
> Best Regards,
> Stephan
> ________________________________________
> From: Maxim Khutornenko <[email protected]>
> Sent: Tuesday, May 5, 2015 10:24 PM
> To: [email protected]
> Subject: Health Checks for Updates design review
>
> Hi,
>
> I have put together a design proposal for improving health-enabled job
> update performance. Please, review and leave your comments:
>
> https://docs.google.com/document/d/1ZdgW8S4xMhvKW7iQUX99xZm10NXSxEWR0a-21FP5d94/edit
>
> Thanks,
> Maxim
