On 2015-12-29 6:24 PM, Steve Fink wrote:
That makes me think that we're not disagreeing very much. Still some, I
think, but sure -- if assigning a handful of people to work on
intermittent oranges makes the problem go away, then that seems like a
reasonable thing to do. Though if you're taking existing developers, and
I think you'd have to in order to get anywhere, then you'd want to be
sure that stalling off what they're currently working on is not going to
strike them as a ridiculously bad idea.

And even then, I'm not sure of the "tiger team" approach here. We have
that now, and his name is Ehsan. He has good evidence to show that it
doesn't work.

FWIW, the reason that it doesn't work is that my time is limited and I can't devote enough of it to this problem, and in many cases when I ask someone else to help, the answer is that they don't have time.

If more people did this more systematically, I think we could get to a good place in the number of oranges, and stay there.

But this sort of problem seems like it's something we have an
institutional disregard for. And any attempt to spot-fix it without
addressing that systemic issue seems less likely to succeed. *That* is
what I understood the true motivation for things like enforced tree
closures was -- not to directly solve the issue, but to force everyone
to care more about it, so that we don't have to do crappy things like
that. But I still don't like it, because it feels like a "you must write
at least 100 lines of code a day" type of thing: it doesn't necessarily
put the pressure at the right place, and it causes a lot of collateral
damage.

I really doubt that the idea of closing trees indefinitely is realistic enough to even merit a discussion. :-)

I didn't really want to get into this (hey, I just wanted to throw out a
suggestion to get people thinking), but to me the problem is that we're
at all ok with allowing the current situation to develop. The sheriffs
have been warning about impending orangopocalypse for as long as I can
remember, and we've all (myself included) largely ignored them. Ehsan
has also brought this up several times, with real data as to how
addressable this is with focused effort, and from what I can tell the
response has basically been a collective "boy, that Ehsan guy is smart
and nice to have around" followed by a continuation of the current way
of doing things.

Yes, exactly. Our current situation is that people can successfully get by pretending that the orange situation is "someone else's problem". Look at it this way. If you decide to ignore an orange bug in your area at all costs, you'll have no problem doing that. Just ignore the bug long enough until a) someone else actually fixes it, or b) the test gets disabled by a sheriff.

A "tiger team" approach can bring us into a good spot now, but as long as the situation above doesn't change, we'll end up back where we are. And we'll end up having this conversation once again.

To be fair, we *are* doing some important work in the right direction. I
don't know what all of it is, but the treeherder improvements for
identifying and categorizing oranges are really important, as are rr,
chaos mode, web replay, and related work. To me at least, that still
feels like more of the "let's throw a few smart people at the systemic
problem" approach, though. I could be wrong. Maybe we're close enough to
being able to recognize failures and direct them to the appropriate
people, who will then jump up and fix them, that this whole problem will
go away soon.

/me holds breath

/me turns blue

While it's true that some of these new initiatives will help with debugging intermittent failures, they're not strictly needed. What we need is to accept shared responsibility for this issue and address it like any other ongoing technical issue.

Part of this shared responsibility is acknowledging that this is part of what MoCo-employed engineers need to spend time on, which requires support from engineering managers. As the recent response to bkelly's triaging effort shows, there is a lot of goodwill on this issue from both engineers and managers, so I hope this time we'll stay on top of this issue.

Cheers,
Ehsan
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform
