Re: [chromium-dev] More sheriffs?
Peter Kasting wrote:
> On Fri, Nov 13, 2009 at 12:44 PM, Stuart Morgan wrote:
>> If we end up actually having four at a time that seems likely to be
>> worse than two: either four people are doing nothing but sheriffing,
>> which there is probably not enough work for, or all four people are
>> more likely to think that someone else is probably watching and they
>> can do something else.

I didn’t see Stuart’s original message, so I don’t know if there was more context, but I agree with what he’s saying here.

In my experience, sheriffing is a one-person job, except we want that one person to be able to take a break or have lunch or have someone to fall back on when there are compound problems. I think it’s actually pretty rare for there to be more than three things wrong at a time, and usually when there are that many wrong, they didn’t all go bad simultaneously. It’s a one-person job, but it’s more than a full-time job, so we schedule two.

Recently, there have been a few cases where people on the schedule couldn’t sheriff and didn’t arrange for a replacement. Things have gotten really bad when this happened, and for that reason alone, I’d support going to three.

I also agree that going three months between shifts means that you might lose touch with how to do it effectively. Maybe we’ve got enough people now that we don’t need to sheriff for two days at a time. Maybe we can move from two sheriffs for two days to three for one.

I’m not terribly motivated by any of the time zone policies, because I haven't seen this as a significant source of problems.

Mark

--
Chromium Developers mailing list: chromium-dev@googlegroups.com
View archives, change email options, or unsubscribe:
http://groups.google.com/group/chromium-dev
Re: sheriff's keep the tree *open* WAS: [chromium-dev] More sheriffs?
Ojan Vafai wrote:
> I don't think this is what sheriffs are supposed to do, although there is
> clearly not consensus here. The goal of the sheriff is to keep the tree open
> as long as possible without carpeting over regressions. The sheriff should
> suffer through minor flakiness without closing the tree (e.g. a couple flaky
> webkit tests should not close the tree).

YES. This is important, and I want to expand on it.

A good sheriff doesn’t just open and close the tree. A good sheriff actively monitors and manages the tree. As much as we might want to codify how to respond to certain situations, I think that the best sheriffs rely on experience and good judgment more than anything else. Don’t read that to mean that we shouldn’t document sheriffing duties and tools, I think that’s important too. What might be even more helpful to a rookie (or a bad sheriff) would be to watch a good sheriff work through a troublesome tree before the rookie’s own number is up.

If our tree were completely flake-free, we could rely on tools to keep things green, and we wouldn’t need sheriffs at all, or at least not in the same capacity that we need them today. Unfortunately, we’re not there yet. Until that problem is solved, we need the shades of gray between “the tree should be open” and “the tree should be closed” that a good sheriff’s human judgment provides.

Mark
Re: sheriff's keep the tree *open* WAS: [chromium-dev] More sheriffs?
On Fri, Nov 13, 2009 at 2:28 PM, Ojan Vafai wrote:
> The goal of the sheriff is to keep the tree open as long as possible
> without carpeting over regressions. The sheriff should suffer through minor
> flakiness without closing the tree (e.g. a couple flaky webkit tests should
> not close the tree).

When I am sheriffing I keep the tree open until the point at which there is redness that has no owner. Normally, I take ownership of redness when I see it, so this only occurs when multiple different things are red. At that point I close the tree until all redness has owners, at which point I reopen. I don't know how well that squares with your description.

> that goal is achieved by reducing flakiness, not by keeping the tree closed
> until all the flakiness has been properly documented (e.g. listed in
> test_expectations.txt).

Are you suggesting not documenting the flakiness? If not, then I suspect that we are in fairly close agreement given my paragraph above.

> It's also a team goal to keep the tree open for >7 hours in every eight hour
> period. The latter is primarily the responsibility of the sheriffs.

I see this as saying that the sheriff should prioritize tree openness over tree greenness, which I disagree with. Perhaps, though, you are not trying to say that so strongly, and you're again saying something more akin to my first paragraph above.

>> Solving the problem by having the tree open if things "aren't too bad" is
>> not good enough. Right now I just checked and the purify and valgrind bots
>> were red. As usual. No sign of anyone looking into them.
>
> This is not a solution, but closing the tree doesn't really solve it
> either.

I wasn't saying that closing the tree solved this problem. I was saying that the sheriff was not looking into these things, and that it was an example of a general pattern where many sheriffs seem not to look into them, and that not being busy dealing with these is one reason why other people might perceive the current sheriff system as sufficient and effective more than I do.

The entire reason I want more sheriffs is _precisely_ to hold the tree open longer, because it means that when purify, valgrind, and layout tests all fail, they can all get owners immediately and the tree can stay open. Right now it seems to me that either the tree is not open enough, or we sheriffs are letting things slide.

PK
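For readers who find it easier to see the rule as code, the open/close policy described in the message above (keep the tree open until some redness has no owner, then close until every red bot has one) could be sketched roughly as follows. This is only an illustration: the `Bot` class, its fields, and the bot and owner names are hypothetical and do not correspond to any real buildbot or tree-status API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Bot:
    """Hypothetical record for one builder on the waterfall."""
    name: str
    red: bool
    owner: Optional[str] = None  # who is investigating the failure, if anyone


def unowned_red_bots(bots: List[Bot]) -> List[str]:
    """Return the names of red bots that nobody has taken ownership of."""
    return [b.name for b in bots if b.red and b.owner is None]


def tree_should_be_open(bots: List[Bot]) -> bool:
    """The tree stays open unless at least one red bot has no owner."""
    return not unowned_red_bots(bots)


if __name__ == "__main__":
    bots = [
        Bot("purify", red=True, owner="sheriff-a"),
        Bot("valgrind", red=True),          # red, and nobody is looking at it
        Bot("layout-tests", red=False),
    ]
    print(unowned_red_bots(bots), tree_should_be_open(bots))
    bots[1].owner = "sheriff-b"             # someone claims the failure
    print(unowned_red_bots(bots), tree_should_be_open(bots))
```

Under this reading, adding sheriffs helps because more failures can be given an owner right away, so the closed state is reached less often.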
Re: [chromium-dev] More sheriffs?
On Fri, Nov 13, 2009 at 3:38 PM, Peter Kasting wrote:
> On Fri, Nov 13, 2009 at 2:56 PM, Dirk Pranke wrote:
>> I think two sheriffs in US/Pacific during US/Pacific work hours is
>> plenty.
>
> I was told at lunch that we already try to some degree to schedule PST with
> non-PST people (although obviously there are far more of the former), which
> gives me the impression that there is a large percentage of time where we
> have one, rather than two, sheriffs. That is perhaps the most important
> thing I'm trying to rectify in this proposal.
>
> On Fri, Nov 13, 2009 at 2:58 PM, Nicolas Sylvain wrote:
>> As for http://dev.chromium.org/developers/tree-sheriffs, every sheriff
>> receives it in the reminder email the day before they start their sheriff
>> duty.
>
> I see calendar reminder mails and think of them as conveying a reminder of
> an event, so I'd never noticed that these mails also mention a web page I'm
> supposed to be reading. I know that is my own fault, but maybe there are
> others in the same boat. In any case, I still think Ben's suggestions would
> be useful.
>
> Overall I am surprised at how many people are skeptical of this proposal
> given how unilaterally positive the smaller lunchtime discussion was. I
> guess I perceive us as not having a very effective sheriff system right
> now--it's certainly been difficult for me--and am looking for ways to remedy
> that. It seems like those who aren't in favor of this generally wouldn't
> agree with that assessment, and thus perceive this as adding overhead and
> reducing effectiveness rather than combating a notable lack. If that is
> accurate, I'm not sure how to square the two worldviews. I guess I will
> leave this idea in the hands of the green tree task force to decide whether
> it would be helpful.

It'd be interesting if others from lunch chimed in with why they think it's a good idea.

Also, I think there was clear consensus on adding another sheriff so we always have 2 in the Americas (or maybe even PST). Do we know what the next steps are to implement this?
Re: [chromium-dev] More sheriffs?
On Fri, Nov 13, 2009 at 2:56 PM, Dirk Pranke wrote:
> I think two sheriffs in US/Pacific during US/Pacific work hours is
> plenty.

I was told at lunch that we already try to some degree to schedule PST with non-PST people (although obviously there are far more of the former), which gives me the impression that there is a large percentage of time where we have one, rather than two, sheriffs. That is perhaps the most important thing I'm trying to rectify in this proposal.

On Fri, Nov 13, 2009 at 2:58 PM, Nicolas Sylvain wrote:
> As for http://dev.chromium.org/developers/tree-sheriffs, every sheriff
> receives it in the reminder email the day before they start their sheriff
> duty.

I see calendar reminder mails and think of them as conveying a reminder of an event, so I'd never noticed that these mails also mention a web page I'm supposed to be reading. I know that is my own fault, but maybe there are others in the same boat. In any case, I still think Ben's suggestions would be useful.

Overall I am surprised at how many people are skeptical of this proposal given how unilaterally positive the smaller lunchtime discussion was. I guess I perceive us as not having a very effective sheriff system right now--it's certainly been difficult for me--and am looking for ways to remedy that. It seems like those who aren't in favor of this generally wouldn't agree with that assessment, and thus perceive this as adding overhead and reducing effectiveness rather than combating a notable lack. If that is accurate, I'm not sure how to square the two worldviews. I guess I will leave this idea in the hands of the green tree task force to decide whether it would be helpful.

PK
Re: [chromium-dev] More sheriffs?
Having just come off sheriffing four days in the past two weeks ...

On Fri, Nov 13, 2009 at 12:31 PM, Peter Kasting wrote:
> At lunch today, a few of us discussed the idea of moving from two sheriffs
> to four.
> There are several reasons we contemplated such a change:
> * The team is large enough that on the current schedule, you go months
> between sheriffing, which is so long that you forget things like what tools
> help you do what.

This is perhaps true, but I think it's more an issue that people don't run more of the tests on their own machines (or, alternatively, are asked to sheriff for areas of the system they never touch).

> * Sheriffing is a heavy burden, and getting more so with more team members.
> * Either the two sheriffs are in different time zones, in which case you
> have effectively one sheriff on duty who has to do everything (bad due to
> point above), or they're not, in which case a chunk of the day is not
> covered at all.

I think two sheriffs in US/Pacific during US/Pacific work hours is plenty. I can't speak to how much of an issue the lack of sheriffs is for people outside that window.

> * New sheriffs could really use a "mentor sheriff" with them, which is
> pretty difficult to schedule.

Last week was actually my first time, and I didn't think it was a big deal, although I did ask a few people a few questions. I was pretty much full time on keeping the tree green and cleaning up flaky tests. Given that I'm otherwise full time on LTTF, this wasn't much of a change. I think it's unrealistic to expect to do anything real on a project while sheriffing, because you can't context-switch that fast to do a good job on either (at least, I can't).

I also think the bots would've been green most of the time except that someone has clearly been ignoring the memory tests for a long time. If bots fail for a couple days straight, it's beyond a sheriff to try and fix it - I think someone needs to get assigned that problem specifically.

So, I'd probably leave things mostly the way they are unless there's a desire to have better sheriffing outside of the MTV hours. I fully support always having two sheriffs during MTV hours.

-- Dirk
Re: sheriff's keep the tree *open* WAS: [chromium-dev] More sheriffs?
+1 (for what it's worth)

On Fri, Nov 13, 2009 at 2:28 PM, Ojan Vafai wrote:
> On Fri, Nov 13, 2009 at 1:25 PM, Peter Kasting wrote:
>> On Fri, Nov 13, 2009 at 1:15 PM, Finnur Thorarinsson wrote:
>>> If the sheriff load is too much for two people to devote 100% of their
>>> time to, then there is something wrong with the process.
>>
>> It's clearly too much, given that I hardly see any other sheriffs even
>> attempt to maintain the rule of "every bot green all the time", which is
>> what you're supposed to do as sheriff. And when I maintain it, I need to
>> keep the tree closed for long periods while I deal with the myriad of issues
>> that come up.
>
> I don't think this is what sheriffs are supposed to do, although there is
> clearly not consensus here. The goal of the sheriff is to keep the tree open
> as long as possible without carpeting over regressions. The sheriff should
> suffer through minor flakiness without closing the tree (e.g. a couple flaky
> webkit tests should not close the tree).
>
> I *do* think it is a team goal to have every bot green all the time, but
> that goal is achieved by reducing flakiness, not by keeping the tree closed
> until all the flakiness has been properly documented (e.g. listed in
> test_expectations.txt). It's also a team goal to keep the tree open for >7
> hours in every eight hour period. The latter is primarily the responsibility
> of the sheriffs.
>
>> Solving the problem by having the tree open if things "aren't too bad" is
>> not good enough. Right now I just checked and the purify and valgrind bots
>> were red. As usual. No sign of anyone looking into them.
>
> This is not a solution, but closing the tree doesn't really solve it
> either. We need to put more burden on the sheriffs to watch and address
> these bots, which, perhaps you're right that we should have more sheriffs.
>
> Ojan
sheriff's keep the tree *open* WAS: [chromium-dev] More sheriffs?
On Fri, Nov 13, 2009 at 1:25 PM, Peter Kasting wrote:
> On Fri, Nov 13, 2009 at 1:15 PM, Finnur Thorarinsson wrote:
>> If the sheriff load is too much for two people to devote 100% of their
>> time to, then there is something wrong with the process.
>
> It's clearly too much, given that I hardly see any other sheriffs even
> attempt to maintain the rule of "every bot green all the time", which is
> what you're supposed to do as sheriff. And when I maintain it, I need to
> keep the tree closed for long periods while I deal with the myriad of issues
> that come up.

I don't think this is what sheriffs are supposed to do, although there is clearly not consensus here. The goal of the sheriff is to keep the tree open as long as possible without carpeting over regressions. The sheriff should suffer through minor flakiness without closing the tree (e.g. a couple flaky webkit tests should not close the tree).

I *do* think it is a team goal to have every bot green all the time, but that goal is achieved by reducing flakiness, not by keeping the tree closed until all the flakiness has been properly documented (e.g. listed in test_expectations.txt). It's also a team goal to keep the tree open for >7 hours in every eight hour period. The latter is primarily the responsibility of the sheriffs.

> Solving the problem by having the tree open if things "aren't too bad" is
> not good enough. Right now I just checked and the purify and valgrind bots
> were red. As usual. No sign of anyone looking into them.

This is not a solution, but closing the tree doesn't really solve it either. We need to put more burden on the sheriffs to watch and address these bots, which, perhaps you're right that we should have more sheriffs.

Ojan
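The "open for >7 hours in every eight hour period" goal mentioned above can be checked mechanically with a sliding window. The sketch below is only an illustration under stated assumptions: the tree's history is represented as one boolean per minute (a made-up format, not what any real tree-status tool exports), and the function names are hypothetical.

```python
from typing import Sequence


def worst_window_open_minutes(is_open: Sequence[bool], window: int = 480) -> int:
    """Return the fewest open minutes found in any full `window`-minute span.

    `is_open` holds one flag per minute: True if the tree was open.
    """
    if len(is_open) < window:
        raise ValueError("history is shorter than one window")
    current = sum(is_open[:window])      # open minutes in the first window
    worst = current
    for i in range(window, len(is_open)):
        # Slide the window forward by one minute.
        current += is_open[i] - is_open[i - window]
        worst = min(worst, current)
    return worst


def meets_open_goal(is_open: Sequence[bool], window: int = 480,
                    required: int = 420) -> bool:
    """True if every eight-hour window has strictly more than seven open hours."""
    return worst_window_open_minutes(is_open, window) > required


if __name__ == "__main__":
    day = [True] * 1440                  # a full day, one flag per minute
    for minute in range(600, 630):       # a single 30-minute closure
        day[minute] = False
    print(worst_window_open_minutes(day), meets_open_goal(day))
```

Note that two short closures a few hours apart can fall inside the same window, so the goal is stricter than "less than an hour closed per shift" might suggest.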
Re: [chromium-dev] More sheriffs?
On Fri, Nov 13, 2009 at 1:33 PM, Stuart Morgan wrote:
> On Fri, Nov 13, 2009 at 1:25 PM, Peter Kasting wrote:
>> Sheriffs are in theory supposed to watch all the perf bots too. Do you? I
>> don't. I doubt very many people do.
>
> That's probably mostly a function of the fact that there's essentially
> no mention of monitoring perf (the fact that they should, how to do
> it, how to handle regressions, etc.) on the page about what sheriffs
> should do, not a manpower issue.

Given that our project lead didn't even know there _was_ such a page, I'm not convinced. I don't think most sheriffs exhaustively read and understand that page, and the tasks and best practices as sheriff change rapidly (I hadn't ever heard of "drover" last time I sheriffed), which is part of the motivation for speeding up the cycle.

PK
Re: [chromium-dev] More sheriffs?
Big +1 for at least a third sheriff. With two sheriffs, if one is not in PST, then really we only have one sheriff. If that sheriff happens to be new, then we have 0 <= num_sheriffs <= 1.

On Fri, Nov 13, 2009 at 12:31 PM, Peter Kasting wrote:
> At lunch today, a few of us discussed the idea of moving from two sheriffs
> to four.
> There are several reasons we contemplated such a change:
> * The team is large enough that on the current schedule, you go months
> between sheriffing, which is so long that you forget things like what tools
> help you do what.
> * Sheriffing is a heavy burden, and getting more so with more team members.
> * Either the two sheriffs are in different time zones, in which case you
> have effectively one sheriff on duty who has to do everything (bad due to
> point above), or they're not, in which case a chunk of the day is not
> covered at all.
> * New sheriffs could really use a "mentor sheriff" with them, which is
> pretty difficult to schedule.
> I think these are good reasons, so I propose we make this change. Comments?
> PK
Re: [chromium-dev] More sheriffs?
On Fri, Nov 13, 2009 at 1:15 PM, Finnur Thorarinsson wrote:
> If the sheriff load is too much for two people to devote 100% of their time
> to, then there is something wrong with the process.

It's clearly too much, given that I hardly see any other sheriffs even attempt to maintain the rule of "every bot green all the time", which is what you're supposed to do as sheriff. And when I maintain it, I need to keep the tree closed for long periods while I deal with the myriad of issues that come up.

Solving the problem by having the tree open if things "aren't too bad" is not good enough. Right now I just checked and the purify and valgrind bots were red. As usual. No sign of anyone looking into them.

Sheriffs are in theory supposed to watch all the perf bots too. Do you? I don't. I doubt very many people do.

There is tons of information available to sheriffs and too few people to cover it. Someone watching perf, someone watching purify/valgrind, someone watching layout tests, and someone watching everything else would be really helpful. Especially if one of those people was experienced enough to help somebody else doing it for the first time. The team is growing fast enough that we have a _lot_ of first-time sheriffs.

PK
Re: [chromium-dev] More sheriffs?
(resending to chromium-dev)

Sheriffing the PST time zone is usually the worst. We could experiment with tweaking the scheduling algorithm to have two PST sheriffs and one non-PST sheriff per shift.

Other than that -- fixing flaky tests would go a long way to making the job easier. Right now out of 12 failing bots, only 1 is a true failure.

On Fri, Nov 13, 2009 at 12:48 PM, Peter Kasting wrote:
> On Fri, Nov 13, 2009 at 12:44 PM, Stuart Morgan wrote:
>> If we end up actually having four at a time that seems likely to be
>> worse than two: either four people are doing nothing but sheriffing,
>> which there is probably not enough work for, or all four people are
>> more likely to think that someone else is probably watching and they
>> can do something else.
>
> I can only say that in my own sheriffing experience this is utterly
> untrue, and having two people at once is amazingly helpful since we can
> track down different problem areas; one working on purify and valgrind
> errors while another works on layout tests. There has never been a time in
> such cases where we both did nothing because we thought the other person was
> working on it; we were always pinging each other and dividing work on the
> fly.
>
> I don't think Chromium team members are so irresponsible that they would
> not work out some system in such cases. And part of the point is that it
> would be nice to be able to get a _little_ bit of work done on the days
> you're sheriffing, or go to lunch, or whatever.
>
> PK
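To make the scheduling tweak suggested above concrete (two PST sheriffs plus one non-PST sheriff per shift), here is a minimal round-robin sketch. It is not the actual scheduling script, whose details are not shown in this thread; the function name, the dictionary layout, and the example names are all hypothetical, and it assumes each pool lists distinct people.

```python
from itertools import cycle
from typing import Dict, List


def build_shifts(pst: List[str], non_pst: List[str],
                 num_shifts: int) -> List[Dict[str, object]]:
    """Assign two PST sheriffs and one non-PST sheriff to each shift.

    Both pools are walked round-robin, so everyone in a pool comes up
    about equally often.
    """
    if len(pst) < 2:
        raise ValueError("need at least two PST sheriffs")
    if not non_pst:
        raise ValueError("need at least one non-PST sheriff")
    pst_pool = cycle(pst)
    other_pool = cycle(non_pst)
    shifts = []
    for _ in range(num_shifts):
        first, second = next(pst_pool), next(pst_pool)
        shifts.append({"pst": [first, second], "non_pst": next(other_pool)})
    return shifts


if __name__ == "__main__":
    for shift in build_shifts(["ann", "bob", "cat"], ["xia", "yan"], 3):
        print(shift)
```

Because the PST pool is consumed two at a time, PST developers would come up roughly twice as often per head as they would with a single PST slot, which also shortens the long gap between shifts raised earlier in the thread.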
Re: [chromium-dev] More sheriffs?
For a while now, I've advocated having 2 pacific timezone sheriffs always on duty and having one or two in other time zones. I still advocate such an idea.

So, to be clear, I think this is a good idea as long as the distribution of sheriffs (time zone wise) is deliberate. (I think this addresses Stuart's concern as well.)

J

On Fri, Nov 13, 2009 at 12:31 PM, Peter Kasting wrote:
> At lunch today, a few of us discussed the idea of moving from two sheriffs
> to four.
>
> There are several reasons we contemplated such a change:
> * The team is large enough that on the current schedule, you go months
> between sheriffing, which is so long that you forget things like what tools
> help you do what.
> * Sheriffing is a heavy burden, and getting more so with more team members.
> * Either the two sheriffs are in different time zones, in which case you
> have effectively one sheriff on duty who has to do everything (bad due to
> point above), or they're not, in which case a chunk of the day is not
> covered at all.
> * New sheriffs could really use a "mentor sheriff" with them, which is
> pretty difficult to schedule.
>
> I think these are good reasons, so I propose we make this change.
> Comments?
>
> PK
Re: [chromium-dev] More sheriffs?
On Fri, Nov 13, 2009 at 12:44 PM, Stuart Morgan wrote:
> If we end up actually having four at a time that seems likely to be
> worse than two: either four people are doing nothing but sheriffing,
> which there is probably not enough work for, or all four people are
> more likely to think that someone else is probably watching and they
> can do something else.

I can only say that in my own sheriffing experience this is utterly untrue, and having two people at once is amazingly helpful since we can track down different problem areas; one working on purify and valgrind errors while another works on layout tests. There has never been a time in such cases where we both did nothing because we thought the other person was working on it; we were always pinging each other and dividing work on the fly.

I don't think Chromium team members are so irresponsible that they would not work out some system in such cases. And part of the point is that it would be nice to be able to get a _little_ bit of work done on the days you're sheriffing, or go to lunch, or whatever.

PK
Re: [chromium-dev] More sheriffs?
On Fri, Nov 13, 2009 at 12:35 PM, Ben Goodger wrote:
> On Fri, Nov 13, 2009 at 12:31 PM, Peter Kasting wrote:
>> * The team is large enough that on the current schedule, you go months
>> between sheriffing, which is so long that you forget things like what tools
>> help you do what.
>
> This info should be written down and kept up to date by sheriffs on a
> daily basis.

See http://dev.chromium.org/developers/tree-sheriffs , which is linked off our main developer wiki page.

PK
[chromium-dev] More sheriffs?
At lunch today, a few of us discussed the idea of moving from two sheriffs to four.

There are several reasons we contemplated such a change:
* The team is large enough that on the current schedule, you go months between sheriffing, which is so long that you forget things like what tools help you do what.
* Sheriffing is a heavy burden, and getting more so with more team members.
* Either the two sheriffs are in different time zones, in which case you have effectively one sheriff on duty who has to do everything (bad due to point above), or they're not, in which case a chunk of the day is not covered at all.
* New sheriffs could really use a "mentor sheriff" with them, which is pretty difficult to schedule.

I think these are good reasons, so I propose we make this change. Comments?

PK