Summary of IRC Meeting in #aurora at Mon Feb  1 19:06:57 2016:

Attendees: mkhutornenko, adeshmukh, zmanji, benley, jcohen

- Preface
- Deprecation cycles
  - Action: jcohen to follow up w/ dev thread re: changing deprecation policy.
- AURORA-1603
- Rollback testing
  - Action: jcohen to email dev@ w.r.t. rollback testing.


IRC log follows:

## Preface ##
[Mon Feb  1 19:07:49 2016] <jcohen>: Ok, let’s start w/ roll call, as always 
everyone is encouraged to participate!
[Mon Feb  1 19:07:53 2016] <jcohen>: here :)
[Mon Feb  1 19:07:57 2016] <benley>: Here
[Mon Feb  1 19:07:59 2016] <adeshmukh>: here
[Mon Feb  1 19:08:43 2016] <mkhutornenko>: here
[Mon Feb  1 19:08:54 2016] <zmanji>: here
[Mon Feb  1 19:10:25 2016] <jcohen>: Ok, first things first…
## Deprecation cycles ##
[Mon Feb  1 19:11:08 2016] <jcohen>: As we increase the cadence of releases, 
our policy of killing deprecated fields after one release cycle becomes more 
burdensome.
[Mon Feb  1 19:11:54 2016] <jcohen>: Given that we’re trying to at least keep 
up with Mesos’s release cycle which is now timed, it seems like this will be 
a continuing problem for us, since we can expect releases fairly regularly.
[Mon Feb  1 19:12:21 2016] <jcohen>: Curious what people think about moving 
from a release-based deprecation to a timed deprecation
[Mon Feb  1 19:12:28 2016] <benley>: I'd be in favor.
[Mon Feb  1 19:12:50 2016] <jcohen>: (i.e. instead of deprecated in release X, 
removed in release X + 1, instead it would be removed N days after the release 
in which it was deprecated)
[Mon Feb  1 19:13:14 2016] <zmanji>: I'm also in favor of time based because I 
like the frequent releases but some of the deprecations are pretty difficult to 
do
[Mon Feb  1 19:13:26 2016] <benley>: Or perhaps "2 releases, or at least NN 
days"
[Mon Feb  1 19:14:20 2016] <mkhutornenko>: +1 to a timed approach. I think 
Mesos follows the same practice
[Mon Feb  1 19:14:23 2016] <jcohen>: Yeah, I want to ensure we keep a balance 
between giving operators enough time to adopt changes to deprecated fields 
versus us having to keep them around for too long.
[Mon Feb  1 19:15:19 2016] <jcohen>: It seems all are in favor. Given the 
absence of wfarner, jsirois, would it make sense to continue this discussion on 
the dev list where we can come up with a final, revised policy?
[Mon Feb  1 19:16:07 2016] <jcohen>: #action jcohen to follow up w/ dev thread 
re: changing deprecation policy.
## AURORA-1603 ##
[Mon Feb  1 19:16:35 2016] <jcohen>: 
https://issues.apache.org/jira/browse/AURORA-1603
[Mon Feb  1 19:16:40 2016] <jcohen>: AURORA-1603
[Mon Feb  1 19:16:55 2016] <jcohen>: mkhutornenko: you want to walk through 
what happened here?
[Mon Feb  1 19:17:55 2016] <mkhutornenko>: The details of the root cause are 
too intricate to follow along here but I can give a brief overview of what 
happened
[Mon Feb  1 19:18:39 2016] <mkhutornenko>: we tried to deploy a master version 
into one of our clusters and immediately noticed an issue with duplicate 
instances showing up in job page: 
https://issues.apache.org/jira/browse/AURORA-1604
[Mon Feb  1 19:19:10 2016] <mkhutornenko>: we immediately attempted to rollback 
to a previous known good version but the scheduler was unable to restart
[Mon Feb  1 19:19:47 2016] <mkhutornenko>: we have found stack trace (listed in 
https://issues.apache.org/jira/browse/AURORA-1603) and had to restore scheduler 
from backup
[Mon Feb  1 19:20:17 2016] <mkhutornenko>: that led to a few other issues found 
in our recovery instructions not being updated with recent changes
[Mon Feb  1 19:20:35 2016] <mkhutornenko>: 
https://issues.apache.org/jira/browse/AURORA-1605
[Mon Feb  1 19:21:06 2016] <mkhutornenko>: all in all, we were able to recover 
but it took us a few hours to reconcile this problem
[Mon Feb  1 19:22:44 2016] <jcohen>: Thanks Maxim. This dovetails nicely to my 
next topic…
## Rollback testing ##
[Mon Feb  1 19:23:14 2016] <mkhutornenko>: btw, master is not in a working 
state currently, so I wouldn’t recommend deploying from it
[Mon Feb  1 19:23:33 2016] <jcohen>: Do folks think it would be beneficial to 
come up with some sort of test suite that ensures it’s possible to roll back 
between commits?
[Mon Feb  1 19:23:53 2016] <jcohen>: I don’t know how many people deploy from 
master as opposed to from releases
[Mon Feb  1 19:24:10 2016] <jcohen>: Obviously it’s not a problem that comes 
up frequently, but it can lead to serious issues when it does arise
[Mon Feb  1 19:24:32 2016] <mkhutornenko>: I think build-to-build rollback 
verification is important and would benefit overall quality
[Mon Feb  1 19:25:16 2016] <jcohen>: Our jenkins job does not currently run e2e 
tests unfortunately
[Mon Feb  1 19:25:47 2016] <jcohen>: if it did, it seems like the easiest thing 
to do would be to run e2e tests, then git checkout HEAD^ and try to 
rebuild/restart the scheduler
[Mon Feb  1 19:26:35 2016] <mkhutornenko>: we are planning to alter our 
internal deploy sequence to verify build-to-build upgrade/rollback cycle in a 
test cluster but would be nice to have a solution everyone could benefit from
[Mon Feb  1 19:27:32 2016] <jcohen>: It might be worth reviving AURORA-476
[Mon Feb  1 19:27:36 2016] <jcohen>: AURORA-476
[Mon Feb  1 19:28:24 2016] <jcohen>: Again, I’ll redirect this to the dev 
list for further discussion.
[Mon Feb  1 19:28:33 2016] <mkhutornenko>: +1
[Mon Feb  1 19:28:39 2016] <jcohen>: #action jcohen to email dev@ w.r.t. 
rollback testing.
[Mon Feb  1 19:29:04 2016] <jcohen>: That’s all I’ve got on my list, anyone 
else have any topics?
[Mon Feb  1 19:30:54 2016] <jcohen>: Ok folks, that’ll do it then. Have a 
good week everyone!
[Mon Feb  1 19:32:53 2016] <jcohen>: ASFBot: meeting end
[Mon Feb  1 19:33:05 2016] <zmanji>: ASFBot: meeting end


Meeting ended at Mon Feb  1 19:33:05 2016
