On Thu, Mar 5, 2009 at 6:01 PM, Nicolas Sylvain <nsylv...@chromium.org> wrote:
> There is one thing though, I don't think hclam's checkin turned any Linux
> tests red. The linux build was unusable, but the unit tests and layout tests
> were passing. It seems like there is a lack of tests here.

Yeah, we aren't running UI tests yet, and that would have caught this
(as would page cycler or anything else).  I think we're getting close, but
there are a bunch of things involved in getting everything working.  We
definitely need better testing, but I think we're all test burnt out
from working on layout tests, and just want to get some of our UI
going.

> When I woke up I reverted his change because it broke Purify on windows and
> another unit test on windows, on only 1 bot.  There is a possibility that he
> did wait, but did not judge that the tree was red enough to worry about it.
> (the unit test did look like flakiness, and Purify is often slow to give
> results).
> I learned just after the revert that it did fix the problem with linux.
> So, my recommendations:
> 1. hclam: don't resubmit until you are sure it does not break linux. I'm
> sure any linux people here can help you with this.
> 2. linux people: We need more tests.
> 3. Everyone: If the tree is broken, and no sheriff is around, please, try to
> find what broke it, and revert the change.

I pretty much wasted my day doing 3).  It was my fault for not
starting at the top and just going down, reverting, rebuilding, etc.
That would have found it pretty quickly.  Mostly because it takes so
long to build, I instead tried to debug it, hoping I would find the
answer quickly.  In the end I chased a bunch of wrong ideas and wasted
a ton of time on it.
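
A rough sketch of the "start at the top and go down" approach, for
illustration only.  The function name and the `is_good` callback are
made up here; `is_good` stands in for "sync to this revision, rebuild,
and check that the build is usable", which is the slow part.

```python
from typing import Callable, Optional, Sequence


def find_breaking_revision(
    revisions: Sequence[int],
    is_good: Callable[[int], bool],
) -> Optional[int]:
    """Walk from the newest revision down to find the one that broke the build.

    `revisions` is ordered oldest to newest.  `is_good` is a hypothetical
    callback: sync to the revision, rebuild, and report whether the
    result is usable.

    Returns the oldest bad revision sitting directly above the first good
    one found, or None if the tip is already good or no good revision
    exists in the range.
    """
    suspect = None
    for rev in reversed(revisions):
        if is_good(rev):
            # First good revision on the way down: the one just above it
            # (if any) is where things broke.
            return suspect
        suspect = rev
    # Never reached a good revision, so the break predates this range.
    return None
```

With slow builds, a binary search over the same range would need
fewer rebuilds than this linear walk, at the cost of assuming there is
a single breaking change in the range.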

> If the tree is broken because of a problem with the buildbots, and no one is
> around to fix it, you can also page me. (I'll add the email alias to my who
> page).

It's not so fair to wake you up in the middle of the night.  I'd
rather just take the day off :)

I'm sure I was far too hard on hclam; I'm not super angry or anything.
I am just trying to point out a problem that we continually seem to
have.  We obviously need to improve here and have clear steps for
how we're doing it.  More tests on Mac and Linux definitely need to
happen.  If SVN weren't so slow, I would have just sync'd back a few
revisions.  We should make our build and source control faster; the
build I know we're working on.

I think also some improvement to the waterfall is going to be really
helpful.  We have so many builds and builders and tests now, it's
really hard to take all the information in.  When there is a problem,
we should have like a "top 10 recent shady commits" page, which tries
to attribute new failures across all our builders and tests to a
change list.  It's often hard to tell from a single builder if a patch
was bad, but if it caused a purify problem here, another problem here,
etc, then at least that would give people a good starting point of
where to look when they think the tree has gone sour.  We also
obviously need to improve our test flakiness.
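
To make the "shady commits" idea a bit more concrete, here is a
minimal sketch under some assumptions.  It assumes we can already
extract, for each builder and test step that newly went red, the list
of changes in the first failing build.  The `NewFailure` record, the
`shady_commits` function, and the builder names in the usage note are
all hypothetical and not anything the waterfall provides today.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Set, Tuple


class NewFailure(NamedTuple):
    """One step that went from passing to failing on one builder."""
    builder: str                 # hypothetical builder name
    step: str                    # e.g. "unit_tests" or "purify"
    blamelist: Tuple[int, ...]   # changes included in the first failing build


def shady_commits(
    failures: Iterable[NewFailure], top_n: int = 10
) -> List[Tuple[int, int]]:
    """Rank changes by how many distinct (builder, step) pairs went red with them.

    Returns (change, count) pairs, highest count first.  Ties go to the
    newer change, on the assumption that recent commits are more suspect.
    """
    hits: Dict[int, Set[Tuple[str, str]]] = defaultdict(set)
    for failure in failures:
        for change in failure.blamelist:
            hits[change].add((failure.builder, failure.step))
    ranked = sorted(hits.items(), key=lambda kv: (-len(kv[1]), -kv[0]))
    return [(change, len(pairs)) for change, pairs in ranked[:top_n]]
```

For example, if change 102 shows up in the blamelist of a new Purify
failure on one bot, a new unit test failure on another, and a new UI
failure on a third, it gets a count of 3 and floats to the top, even
though no single builder makes it look clearly guilty.  Flaky tests
would add noise to these counts, so this is a starting point for
where to look and not a verdict.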

I've committed more than my share of shady patches, and I think so has
the rest of the team.  Again, I didn't mean to blame things so
personally.  And you're right, even when you do commit and watch the
tree, it's often hard to tell the outcome, when it goes from "kinda
red" to "still kinda red", that isn't a great gauge.  It just seems
with all of the new platforms / code / complexity, we need some better
ways of managing when it happens, because it's going to keep on
happening.  I think the outcome goal should be to not have sheriffs,
and to have a system that can handle itself.

> Nicolas
>
> On Thu, Mar 5, 2009 at 8:47 AM, Amanda Walker <ama...@chromium.org> wrote:
>>
>> On Thu, Mar 5, 2009 at 11:41 AM, Dean McNamee <de...@chromium.org> wrote:
>> > Maybe we need to limit the hours people can commit, if a sheriff isn't
>> > around to fix things?  I don't really think that's a good strategy
>> > either.  I don't know.
>>
>> Well, it's not the sheriff's responsibility to fix every checkin,
>> they're just there as a safety net.  All of us should be making sure
>> we don't break the build, every time we commit something.  I don't
>> think limiting commit hours is a workable strategy, given how many
>> people are working across so many time zones.  However, last night was
>> a good reminder that no one should ever commit and then leave without
>> watching the build to make sure it landed cleanly.  It just leaves a
>> mess for other people, which is unfriendly.
>>
>> --Amanda
>>

--~--~---------~--~----~------------~-------~--~----~
Chromium Developers mailing list: chromium-dev@googlegroups.com 
View archives, change email options, or unsubscribe: 
    http://groups.google.com/group/chromium-dev
-~----------~----~----~----~------~----~------~--~---
