On Tue, Aug 25, 2015 at 5:40 AM, Tamas Berghammer <tbergham...@google.com>
wrote:

> Going back to the original question, I think you have more test failures
> than expected. As Chaoren mentioned, all TestDataFormatterLibc* tests are
> failing because of a missing dependency,
>

Thanks, Tamas.  I'm going to be testing again today with libc++ installed.


> but I think the rest of the tests should pass (I wouldn't expect them to
> depend on libc++-dev).
>
>
I'll get a better handle on what's failing once I get rid of that first
batch.


> You can see the up to date list of failures on the Linux buildbot here:
> http://lab.llvm.org:8011/builders/lldb-x86_64-ubuntu-14.04-cmake
>
>
Ah yes, that'll be good to cross-reference.


> The buildbot is running in "Google Compute Engine" with Linux version:
> "Linux buildbot-master-ubuntu-1404 3.16.0-31-generic #43~14.04.1-Ubuntu SMP
> Tue Mar 10 20:13:38 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux"
>
> LLDB is compiled by Clang (not sure about which version, but I can find out
> if somebody thinks it matters) and the inferiors are compiled with
> clang-3.5, clang-tot, and gcc-4.9.2. In all tested configurations there should
> be no failures (all failing tests should be XFAIL-ed).
>
>
Ah okay, good to know.  In the past, IIRC, I did get different failures using
clang-built vs. gcc-built lldb on Ubuntu 14.04.  The clang-built lldbs at
the time were harder to debug on Linux for one reason or another (I think
particularly if any optimizations were enabled, due to loss of debug info,
but there might have been more).  Are you using a clang-built lldb and
debugging it reasonably well on Linux?  If so, I'd just as soon move over to
using clang so there's one less difference when I'm looking across
platforms.


> For the flaky tests we introduced an "expectedFlaky" decorator which
> executes the test twice and expects it to pass at least once,
>

Ah, that's a good addition.  We had talked about doing something to watch
tests over time to see when it might be good to promote an XFAIL test that
is consistently passing to a static "expect success" test.  The flaky flag
sounds handy for those that flap.
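
For anyone curious, here's roughly how I picture a retry decorator like that
working -- this is just a sketch of the idea, not the actual in-tree
implementation, so the names and details are assumptions on my part:

    import functools

    def expectedFlaky(func):
        """Sketch: rerun the test once if the first attempt raises,
        and count it as a pass if either attempt succeeds."""
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except Exception:
                # First attempt failed -- give it exactly one more try.
                return func(*args, **kwargs)
        return wrapper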


> but it hasn't been applied to all flaky tests yet. The plan for the tests
> currently passing with "unexpected success" is to gather statistics
> about them and, based on that, either mark them as "expected flaky" or remove
> the "expected failure" marker, depending on the number of failures we have
> seen over the last few hundred runs.
>

Ah yes that :-)  Love it.
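
Just so I'm following the triage you're describing, I'm imagining something
along these lines (purely illustrative -- the window size, threshold, and
history format are all made up on my end):

    # Triage one test based on its pass/fail history over recent runs.
    # 'history' is a list of booleans: True = passed, False = failed.
    def triage(history, window=300, flaky_threshold=0.02):
        recent = history[-window:]
        failure_rate = recent.count(False) / float(len(recent))
        if failure_rate == 0.0:
            return "remove the expectedFailure marker"  # consistently passing
        elif failure_rate <= flaky_threshold:
            return "mark as expectedFlaky"              # passes almost always
        else:
            return "keep the expectedFailure marker"    # still failing often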

Thanks, Tamas!


>
> Tamas
>
> On Tue, Aug 25, 2015 at 2:50 AM via lldb-dev <lldb-dev@lists.llvm.org>
> wrote:
>
>> On Mon, Aug 24, 2015 at 05:37:43PM -0700, via lldb-dev wrote:
>> > On Mon, Aug 24, 2015 at 03:37:52PM -0700, Todd Fiala via lldb-dev wrote:
>> > > On Linux on non-virtualized hardware, I currently see the failures
>> below on
>> > > Ubuntu 14.04.2 using a setup like this:
>> > > [...]
>> > >
>> > > ninja check-lldb output:
>>
>> FYI, ninja check-lldb actually calls dosep.
>>
>> > > Ran 394 test suites (15 failed) (3.807107%)
>> > > Ran 474 test cases (17 failed) (3.586498%)
>> >
>> > I don't think you can trust the reporting of dosep.py's "Ran N test
>> > cases", as it fails to count about 500 test cases.  The only way I've
>> > found to get an accurate count is to add up all the Ns from "Ran N tests
>> > in" as follows:
>> >
>> > ./dosep.py -s --options "-v --executable $BLDDIR/bin/lldb" 2>&1 | tee test_out.log
>> > export total=`grep -E "^Ran [0-9]+ tests? in" test_out.log | awk '{count+=$2} END {print count}'`
>>
>> Of course, these commands assume you're running the tests from the
>> lldb/test directory.
>>
>> > (See comments in http://reviews.llvm.org/rL238467.)
>>
>> I've pasted (and tweaked) the relevant comments from that review here,
>> where I describe a narrowed case showing how dosep fails to count all the
>> test cases from one test suite in test/types.  Note that the tests were run
>> on OSX, so your counts may vary.
>>
>> The final count from:
>>     Ran N test cases .*
>> is wrong, as I'll explain below. I've done a comparison between dosep and
>> dotest on a narrowed subset of tests to show how dosep can omit the test
>> cases from a test suite in its count.
>>
>> Tested on a subset of lldb/test with just the following directories/files
>> (i.e. all other directories/files were removed):
>>     test/make
>>     test/pexpect-2.4
>>     test/plugins
>>     test/types
>>     test/unittest2
>> # The .py files kept in test/types are as follows (so
>> test/types/TestIntegerTypes.py* was removed):
>>     test/types/AbstractBase.py
>>     test/types/HideTestFailures.py
>>     test/types/TestFloatTypes.py
>>     test/types/TestFloatTypesExpr.py
>>     test/types/TestIntegerTypesExpr.py
>>     test/types/TestRecursiveTypes.py
>>
>> Tests were run in the lldb/test directory using the following commands:
>>     dotest:
>>         ./dotest.py -v
>>     dosep:
>>         ./dosep.py -s --options "-v"
>>
>> Comparing the test case totals, dotest correctly counts 46, but dosep
>> counts only 16:
>>     dotest:
>>         Ran 46 tests in 75.934s
>>     dosep:
>>         Testing: 23 tests, 4 threads ## note: this number changes randomly
>>         Ran 6 tests in 7.049s
>>         [PASSED TestFloatTypes.py] - 1 out of 23 test suites processed
>>         Ran 6 tests in 11.165s
>>         [PASSED TestFloatTypesExpr.py] - 2 out of 23 test suites processed
>>         Ran 30 tests in 54.581s ## FIXME: not counted?
>>         [PASSED TestIntegerTypesExpr.py] - 3 out of 23 test suites processed
>>         Ran 4 tests in 3.212s
>>         [PASSED TestRecursiveTypes.py] - 4 out of 23 test suites processed
>>         Ran 4 test suites (0 failed) (0.000000%)
>>         Ran 16 test cases (0 failed) (0.000000%)
>>
>> With test/types/TestIntegerTypesExpr.py* removed, both correctly count 16
>> test cases:
>>     dosep:
>>         Testing: 16 tests, 4 threads
>>         Ran 6 tests in 7.059s
>>         Ran 6 tests in 11.186s
>>         Ran 4 tests in 3.241s
>>         Ran 3 test suites (0 failed) (0.000000%)
>>         Ran 16 test cases (0 failed) (0.000000%)
>>
>> Note: I couldn't compare the test counts across all the tests because of the
>> concern raised in http://reviews.llvm.org/rL237053. That is, dotest can
>> no longer complete the tests on OSX: all test suites after test case 898
>> (test_disassemble_invalid_vst_1_64_raw_data) get ERRORs. I don't think
>> that issue is related to problems in dosep.
>>
>> Thanks,
>> -Dawn
>> _______________________________________________
>> lldb-dev mailing list
>> lldb-dev@lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev
>>
>


-- 
-Todd