Summary of IRC Meeting in #aurora at Mon Feb 1 19:06:57 2016:
Attendees: mkhutornenko, adeshmukh, zmanji, benley, jcohen
- Preface
- Deprecation cycles
  - Action: jcohen to follow up w/ dev thread re: changing deprecation policy.
- AURORA-1603
- Rollback testing
  - Action: jcohen to email dev@ w.r.t. rollback testing.

IRC log follows:

## Preface ##
[Mon Feb 1 19:07:49 2016] <jcohen>: Ok, let's start w/ roll call, as always everyone is encouraged to participate!
[Mon Feb 1 19:07:53 2016] <jcohen>: here :)
[Mon Feb 1 19:07:57 2016] <benley>: Here
[Mon Feb 1 19:07:59 2016] <adeshmukh>: here
[Mon Feb 1 19:08:43 2016] <mkhutornenko>: here
[Mon Feb 1 19:08:54 2016] <zmanji>: here
[Mon Feb 1 19:10:25 2016] <jcohen>: Ok, first things first…

## Deprecation cycles ##
[Mon Feb 1 19:11:08 2016] <jcohen>: As we increase the cadence of releases, our policy of killing deprecated fields after one release cycle becomes more burdensome.
[Mon Feb 1 19:11:54 2016] <jcohen>: Given that we're trying to at least keep up with Mesos's release cycle which is now timed, it seems like this will be a continuing problem for us, since we can expect releases fairly regularly.
[Mon Feb 1 19:12:21 2016] <jcohen>: Curious what people think about moving from a release-based deprecation to a timed deprecation
[Mon Feb 1 19:12:28 2016] <benley>: I'd be in favor.
[Mon Feb 1 19:12:50 2016] <jcohen>: (i.e. instead of deprecated in release X, removed in release X + 1, instead it would be removed N days after the release in which it was deprecated)
[Mon Feb 1 19:13:14 2016] <zmanji>: I'm also in favor of time based because I like the frequent releases but some of the deprecations are pretty difficult to do
[Mon Feb 1 19:13:26 2016] <benley>: Or perhaps "2 releases, or at least NN days"
[Mon Feb 1 19:14:20 2016] <mkhutornenko>: +1 to a timed approach. I think Mesos follows the same practice
[Mon Feb 1 19:14:23 2016] <jcohen>: Yeah, I want to ensure we keep a balance between giving operators enough time to adopt changes to deprecated fields versus us having to keep them around for too long.
[Mon Feb 1 19:15:19 2016] <jcohen>: It seems all are in favor. Given the absence of wfarner, jsirois, would it make sense to continue this discussion on the dev list where we can come up with a final, revised policy?
[Mon Feb 1 19:16:07 2016] <jcohen>: #action jcohen to follow up w/ dev thread re: changing deprecation policy.

## AURORA-1603 ##
[Mon Feb 1 19:16:35 2016] <jcohen>: https://issues.apache.org/jira/browse/AURORA-1603
[Mon Feb 1 19:16:40 2016] <jcohen>: AURORA-1603
[Mon Feb 1 19:16:55 2016] <jcohen>: mkhutornenko: you want to walk through what happened here?
[Mon Feb 1 19:17:55 2016] <mkhutornenko>: The details of the root cause are too intricate to follow along here but I can give a brief overview of what happened
[Mon Feb 1 19:18:39 2016] <mkhutornenko>: we tried to deploy a master version into one of our clusters and immediately noticed an issue with duplicate instances showing up in job page: https://issues.apache.org/jira/browse/AURORA-1604
[Mon Feb 1 19:19:10 2016] <mkhutornenko>: we immediately attempted to rollback to a previous known good version but the scheduler was unable to restart
[Mon Feb 1 19:19:47 2016] <mkhutornenko>: we have found stack trace (listed in https://issues.apache.org/jira/browse/AURORA-1603) and had to restore scheduler from backup
[Mon Feb 1 19:20:17 2016] <mkhutornenko>: that led to a few other issues found in our recovery instructions not being updated with recent changes
[Mon Feb 1 19:20:35 2016] <mkhutornenko>: https://issues.apache.org/jira/browse/AURORA-1605
[Mon Feb 1 19:21:06 2016] <mkhutornenko>: all in all, we were able to recover but it took us a few hours to reconcile this problem
[Mon Feb 1 19:22:44 2016] <jcohen>: Thanks Maxim. This dovetails nicely to my next topic…

## Rollback testing ##
[Mon Feb 1 19:23:14 2016] <mkhutornenko>: btw, master is not in a working state currently, so I wouldn't recommend deploying from it
[Mon Feb 1 19:23:33 2016] <jcohen>: Do folks think it would be beneficial to come up with some sort of test suite that ensures it's possible to roll back between commits?
[Mon Feb 1 19:23:53 2016] <jcohen>: I don't know how many people deploy from master as opposed to from releases
[Mon Feb 1 19:24:10 2016] <jcohen>: Obviously it's not a problem that comes up frequently, but it can lead to serious issues when it does arise
[Mon Feb 1 19:24:32 2016] <mkhutornenko>: I think build-to-build rollback verification is important and would benefit overall quality
[Mon Feb 1 19:25:16 2016] <jcohen>: Our jenkins job does not currently run e2e tests unfortunately
[Mon Feb 1 19:25:47 2016] <jcohen>: if it did, it seems like the easiest thing to do would be to run e2e tests, then git checkout HEAD^ and try to rebuild/restart the scheduler
[Mon Feb 1 19:26:35 2016] <mkhutornenko>: we are planning to alter our internal deploy sequence to verify build-to-build upgrade/rollback cycle in a test cluster but would be nice to have a solution everyone could benefit from
[Mon Feb 1 19:27:32 2016] <jcohen>: It might be worth reviving AURORA-476
[Mon Feb 1 19:27:36 2016] <jcohen>: AURORA-476
[Mon Feb 1 19:28:24 2016] <jcohen>: Again, I'll redirect this to the dev list for further discussion.
[Mon Feb 1 19:28:33 2016] <mkhutornenko>: +1
[Mon Feb 1 19:28:39 2016] <jcohen>: #action jcohen to email dev@ w.r.t. rollback testing.
[Mon Feb 1 19:29:04 2016] <jcohen>: That's all I've got on my list, anyone else have any topics?
[Mon Feb 1 19:30:54 2016] <jcohen>: Ok folks, that'll do it then. Have a good week everyone!
[Mon Feb 1 19:32:53 2016] <jcohen>: ASFBot: meeting end
[Mon Feb 1 19:33:05 2016] <zmanji>: ASFBot: meeting end

Meeting ended at Mon Feb 1 19:33:05 2016