Thanks Bogdan. Your analysis is much appreciated.

According to your findings, it looks like it must be an infrastructure
issue. But that doesn't explain why the test is a lot more stable on the
3.4 branch. I'd like to set up new Jenkins jobs on private
infrastructure (other than Apache) on branch-3.5 with Java 8 and see if
there's any difference.

Increasing the timeout further is a good idea, I think. Test runtimes are
quite close to 10 mins, and it would be nice to see whether the test just
takes ages to finish or has fallen into a deadlock.
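
A minimal sketch of how the reported runtimes could be bucketed to see how
often the test actually hits the 10 min limit. The function names and the
60 second "fast" threshold are assumptions made for illustration. The
runtimes are the branch-3.5 JDK 9 values from the report quoted below
(builds 351 to 334, skipping the N/A entry).

```python
# Hypothetical helper, not part of the ZooKeeper build tooling.
TIMEOUT_S = 600.0  # the 10 min per-test timeout discussed in this thread
FAST_S = 60.0      # assumed cut-off for a "normal" run (local runs took 30-40 s)

def classify(runtime_s, timeout_s=TIMEOUT_S, fast_s=FAST_S):
    """Bucket one runtime (in seconds) relative to the test timeout."""
    if runtime_s >= timeout_s:
        return "hit timeout"
    if runtime_s <= fast_s:
        return "fast"
    return "slow"

def summarize(runtimes):
    """Count how many runtimes fall into each bucket."""
    counts = {"fast": 0, "slow": 0, "hit timeout": 0}
    for runtime in runtimes:
        counts[classify(runtime)] += 1
    return counts

# branch-3.5, JDK 9, builds 351..334 (build 335 reported N/A and is skipped)
jdk9 = [45.604, 600.337, 21.904, 583.063, 600.325, 600.383, 600.362,
        21.139, 24.031, 584.200, 600.327, 600.323, 23.737, 600.406,
        547.004, 600.393, 373.955]

print(summarize(jdk9))
```

A large "slow" bucket just under the limit would point to slowness rather
than a deadlock, but the report only shows durations, so this is a rough
indicator at best.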


Andor



On 07/19/2018 08:46 AM, Bogdan Kanivets wrote:
> Hi Andor,
>
> For testManyChildWatchersAutoReset, I was the one who put the 10 min timeout on
> the test itself. I wanted to see the logs, and the problem is that when the test
> is timed out by Ant (default 15 min), logs aren't captured.
> I agree that it became much flakier. I've just pushed a PR to
> increase it to 14 min.
>
> Also, I'm still looking at the slowness and posted some thoughts in jira
> https://issues.apache.org/jira/browse/ZOOKEEPER-3046
>
> On Wed, Jul 18, 2018 at 9:09 PM Michael Han <h...@apache.org> wrote:
>
>> Thanks Pat for promptly fixing this!
>>
>> I have no idea of the "failed to get" symptoms. Probably we could give it
>> more days and see if the pattern recurs? If not might be a transient infra
>> issue...
>>
>> On Wed, Jul 18, 2018 at 11:16 AM, Patrick Hunt <ph...@apache.org> wrote:
>>
>>> Ok, I committed a change that seems to address the main failure:
>>>
>>> https://github.com/apache/zookeeper/commit/06b9507ab78a1a055b8f467846c15791600b72ee
>>>
>>> https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper-Find-Flaky-Tests/lastSuccessfulBuild/artifact/report.html
>>>
>>> However, I do notice some oddness: for some jobs/runs it fails to get
>>> the information from the REST interface, even though it's fine for most
>>> of them. Take a look, any ideas?
>>> https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper-Find-Flaky-Tests/456/console
>>>
>>> [ZooKeeper-Find-Flaky-Tests] $ /bin/bash /tmp/jenkins4452773653790031730.sh
>>> ERROR:__main__:failed to get:
>>> https://builds.apache.org/job/ZooKeeper-trunk/108/testReport/api/json?tree=suites%5Bname%2Ccases%5BclassName%2Cname%2Cstatus%5D%5D
>>> ERROR:__main__:failed to get:
>>> https://builds.apache.org/job/ZooKeeper-trunk/104/testReport/api/json?tree=suites%5Bname%2Ccases%5BclassName%2Cname%2Cstatus%5D%5D
>>> ERROR:__main__:failed to get:
>>> https://builds.apache.org/job/ZooKeeper-trunk/100/testReport/api/json?tree=suites%5Bname%2Ccases%5BclassName%2Cname%2Cstatus%5D%5D
>>>
>>>
>>> Notice that it doesn't complain about job 107 (etc...)
>>>
>>> Any ideas on this? Have you seen this before? Perhaps we should open an
>>> INFRA jira?
>>>
>>> Patrick
>>>
>>> On Wed, Jul 18, 2018 at 10:52 AM Patrick Hunt <ph...@apache.org> wrote:
>>>
>>>> FYI, created this:
>>>> https://issues.apache.org/jira/browse/INFRA-16785
>>>> for the security warnings, not sure if that's causing the issue. Likely
>>>> it's the recent jenkins upgrade, looking into it a bit...
>>>>
>>>> Patrick
>>>>
>>>>
>>>> On Wed, Jul 18, 2018 at 9:48 AM Michael Han <h...@apache.org> wrote:
>>>>
>>>>> Hi Andor,
>>>>>
>>>>>>> I suspect it should succeed eventually if we were to increase the
>>>>> timeout even more. But is that correct? Bug or infrastructure issue?
>>>>>
>>>>> You could set up a dedicated git branch with all patches (e.g. the one in
>>>>> ZOOKEEPER-2251) you want to apply and I can set up a dedicated Jenkins job
>>>>> that points to this branch and stress tests the entire unit test suite.
>>>>> Some tests are only flaky when they run on Apache infrastructure and when
>>>>> they run together.
>>>>>
>>>>> It would be interesting to figure out what causes this test to fail. Since
>>>>> the same test works reliably in 3.4, there must be some commits in 3.5 that
>>>>> we could possibly blame...
>>>>>
>>>>>>> I'm going to raise a ticket on that if somebody willing to fix it.
>>>>> I just had a brief look before Jenkins went down. Looks like Python was
>>>>> complaining about some SSL stuff and I suspect if we upgrade to a later
>>>>> version of Python (3.x) it might work. I'll try that later when Jenkins is
>>>>> back.
>>>>>
>>>>>
>>>>> On Wed, Jul 18, 2018 at 8:42 AM, Andor Molnar
>>> <an...@cloudera.com.invalid
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> *branch-3.4*
>>>>>>
>>>>>> I've taken a quick look at our Jenkins builds and in terms of flaky tests,
>>>>>> it looks like branch-3.4 is in pretty good shape. The build hasn't failed
>>>>>> for 5-6 days on all JDKs, which I think is pretty awesome.
>>>>>>
>>>>>> *branch-3.5*
>>>>>>
>>>>>> This branch is in very bad condition, which is quite unfortunate given
>>>>>> we're in the middle of stabilising it. :)
>>>>>> Especially on JDK8, the last successful build was 11 days ago. JDK9 (50%
>>>>>> failing) and JDK10 (30% failing) are looking better in the last 10 builds.
>>>>>> Interestingly (apart from a few quite rare ones) it looks like there's only
>>>>>> 1 test which is quite nasty on this branch: testManyChildWatchersAutoReset
>>>>>> There's a Jira about fixing it and a fix has been merged that increases the
>>>>>> timeout of the test, but it's also possible that a bug on the branch is
>>>>>> causing the test to fail even with the 10 min timeout.
>>>>>>
>>>>>> I wasn't able to repro the failing test on my machine (Mac and CentOS7), it
>>>>>> always finished in 30-40 seconds maximum. On Jenkins slaves it shows the
>>>>>> following:
>>>>>>
>>>>>> *JDK 8:*
>>>>>>
>>>>>> Report creation timed out.
>>>>>>
>>>>>>
>>>>>> *JDK 9:*
>>>>>>
>>>>>> testManyChildWatchersAutoReset runtime per build (seconds):
>>>>>>
>>>>>> Build | Runtime (s)
>>>>>> ------+------------
>>>>>> 351   | 45.604
>>>>>> 350   | 600.337
>>>>>> 349   | 21.904
>>>>>> 348   | 583.063
>>>>>> 347   | 600.325
>>>>>> 346   | 600.383
>>>>>> 345   | 600.362
>>>>>> 344   | 21.139
>>>>>> 343   | 24.031
>>>>>> 342   | 584.200
>>>>>> 341   | 600.327
>>>>>> 340   | 600.323
>>>>>> 339   | 23.737
>>>>>> 338   | 600.406
>>>>>> 337   | 547.004
>>>>>> 336   | 600.393
>>>>>> 335   | N/A
>>>>>> 334   | 373.955
>>>>>>
>>>>>> Per-build reports (replace BUILD with the build number):
>>>>>> <https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper_branch35_java9/BUILD/testReport/org.apache.zookeeper.test/DisconnectedWatcherTest/testManyChildWatchersAutoReset>
>>>>>> The N/A entry for build 335 links to:
>>>>>> <https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper_branch35_java9/test_results_analyzer/>
>>>>>>
>>>>>>
>>>>>> *JDK 10:*
>>>>>>
>>>>>>
>>>>>> testManyChildWatchersAutoReset runtime per build (seconds):
>>>>>>
>>>>>> Build | Runtime (s)
>>>>>> ------+------------
>>>>>> 110   | 364.945
>>>>>> 109   | 543.983
>>>>>> 108   | 388.182
>>>>>> 107   | 600.446
>>>>>> 106   | 600.025
>>>>>> 105   | 535.046
>>>>>> 104   | 600.306
>>>>>> 103   | 474.005
>>>>>> 102   | 560.925
>>>>>> 101   | 600.328
>>>>>> 100   | 558.547
>>>>>> 99    | 600.397
>>>>>> 98    | 600.414
>>>>>> 97    | 430.383
>>>>>> 96    | 564.064
>>>>>> 95    | 600.357
>>>>>> 94    | 432.435
>>>>>> 93    | 596.378
>>>>>> 92    | 39.242
>>>>>>
>>>>>> Per-build reports (replace BUILD with the build number):
>>>>>> <https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper_branch35_java10/BUILD/testReport/org.apache.zookeeper.test/DisconnectedWatcherTest/testManyChildWatchersAutoReset>
>>>>>>
>>>>>>
>>>>>> It takes ages to complete on Jenkins for some reason and it looks like it
>>>>>> quite frequently ends close to the limit, so I suspect it should succeed
>>>>>> eventually if we were to increase the timeout even more. But is that
>>>>>> correct?
>>>>>> Bug or infrastructure issue?
>>>>>>
>>>>>> *master / 3.6*
>>>>>>
>>>>>> Pretty much the same as 3.5. I haven't seen testManyChildWatchersAutoReset
>>>>>> failing on this branch with JDK8, which is a bit confusing, but other than
>>>>>> that I see the same pattern on JDK9 and JDK10. I'm unable to generate the
>>>>>> above reports here, because Test Result Analyzer keeps timing out for me,
>>>>>> but I'll follow up when I have them.
>>>>>>
>>>>>> Btw. the Flaky Test report has been broken for 10 days. I'm going to raise a
>>>>>> ticket on that if somebody is willing to fix it. (I'm planning to do so.)
>>>>>> It would be nice to see the report working again, because if my
>>>>>> observations are correct, we don't have too many annoying tests apart from
>>>>>> the one mentioned.
>>>>>>
>>>>>> Thanks,
>>>>>> Andor
>>>>>>
