Notice: I'm done messing with test-patch.sh. There is now a short zombies
line at the end of the report that should report any sightings more
cleanly.

Also note that all builds last night failed with OOME. Seems to be
infrastructure that is OOMEing, not our tests (for once). Let me ask INFRA.

St.Ack

On Thu, Dec 3, 2015 at 9:27 AM, Stack <st...@duboce.net> wrote:

> Notice: I'm messing with test-patch.sh reporting, trying to improve the
> zombie section. I'll likely break things for a while (I already have -- the
> hadoopqa report section is curtailed at the moment). Will flag when done.
> St.Ack
>
> On Wed, Dec 2, 2015 at 1:22 PM, Stack <st...@duboce.net> wrote:
>
>> As part of my continuing advocacy of builds.apache.org and that their
>> results are now worthy of our trust and nurture, here are some highlights
>> from the last few days of builds:
>>
>> + hadoopqa is now finding zombies before the patch is committed.
>> HBASE-14888 showed "-1 core tests. The patch failed these unit tests:" but
>> didn't have any failed tests listed (I'm trying to see if I can do anything
>> about this...). Running our little ./dev-tools/findHangingTests.py against
>> the consoleText, it showed a hanging test. Running locally, I see the same
>> hang. This is before the patch landed.
>> + Our branch runs are now near totally zombie and flakey free -- still
>> some work to do -- but a recent patch that seemed harmless was causing a
>> reliable flake fail in the backport to branch-1* confirmed by local runs.
>> The flakeyness was plain to see up in builds.apache.org.
>> + In the last few days I've committed a patch that included javadoc
>> warnings even though hadoopqa said the patch introduced javadoc issues (I
>> missed it). This messed up life for folks subsequently as their patches now
>> reported javadoc issues....
>>
>> In short, I suggest that builds.apache.org is worth keeping an eye on.
>> Make sure you get a clean build out of hadoopqa before committing anything,
>> and let's all work together to try and keep our builds blue: it'll save us
>> all work in the long run.
>>
>> St.Ack
>>
>>
>> On Tue, Nov 4, 2014 at 9:38 AM, Stack <st...@duboce.net> wrote:
>>
>>> Branch-1 and master have stabilized and now run mostly blue (give or
>>> take the odd failure) [1][2]. Having a mostly blue branch-1 has helped us
>>> identify at least one destabilizing commit in the last few days, maybe two;
>>> this is as it should be (smile).
>>>
>>> Let's keep our builds blue. If you commit a patch, make sure subsequent
>>> builds stay blue. You can subscribe to bui...@hbase.apache.org to get
>>> notice of failures if not already subscribed.
>>>
>>> Thanks,
>>> St.Ack
>>>
>>> 1. https://builds.apache.org/view/H-L/view/HBase/job/HBase-1.0/
>>> 2. https://builds.apache.org/view/H-L/view/HBase/job/HBase-TRUNK/
>>>
>>>
>>> On Mon, Oct 13, 2014 at 4:41 PM, Stack <st...@duboce.net> wrote:
>>>
>>>> A few notes on testing.
>>>>
>>>> Too long to read: infra is more capable now and, after some work, we are
>>>> seeing branch-1 and trunk mostly running blue. Let's try and keep it this
>>>> way going forward.
>>>>
>>>> Apache Infra has new, more capable hardware.
>>>>
>>>> A recent spurt of test fixing combined with more capable hardware seems
>>>> to have gotten us to a new place; tests are mostly passing now on branch-1
>>>> and master. Let's try and keep it this way and start to trust our test runs
>>>> again. Just a few flakies remain. Let's try and nail them.
>>>>
>>>> Our tests now run in parallel with other test suites, where previously we
>>>> ran alone. You can see this sometimes when our zombie detector reports
>>>> tests from another project altogether as lingerers (to be fixed). Some of
>>>> our tests are failing because a concurrent hbase run is undoing classes and
>>>> data from under them. Let's fix that too.
>>>>
>>>> Our tests are brittle. It takes 75 minutes for them to complete. Many
>>>> are heavy-duty integration tests starting up multiple clusters and
>>>> mapreduce all in the one JVM. It is a miracle they pass at all.  Usually
>>>> integration tests have been cast as unit tests because there was nowhere
>>>> else for them to get an airing.  We have the hbase-it suite now which would
>>>> be a more apt place but until these are run on a regular basis in public
>>>> for all to see, the fat integration tests disguised as unit tests will
>>>> remain.  A review of our current unit tests weeding the old cruft and the
>>>> no longer relevant or duplicates would be a nice undertaking if someone is
>>>> looking to contribute.
>>>>
>>>> Alex Newman has been working on making our tests work up on travis and
>>>> circle-ci.  That'll be sweet when it goes end-to-end.  He also added in
>>>> some "type" categorizations -- client, filter, mapreduce -- alongside our
>>>> old "sizing" categorizations of small/medium/large.  His thinking is that
>>>> we can run these categorizations in parallel so we could run the total
>>>> suite in about the time of the longest test, say 20-30 minutes?  We could
>>>> even change Apache to run them this way.
>>>>
>>>> FYI,
>>>> St.Ack
>>>>
>>>
>>
>
