Thanks Bogdan, so far so good. testNodeDataChanged is an old beast; I have
a possible fix for that from @afine:
https://github.com/apache/zookeeper/pull/300

Would be great if we could review it and get rid of this flaky test.

Andor

On 10/20/18 06:41, Bogdan Kanivets wrote:
> I think the argument for keeping concurrency is that it may manifest
> some unknown problems with the code.
>
> Maybe a middle ground - move the largest offenders into a separate
> JUnit tag and run them after the rest of the tests with threads=1.
> Hopefully this will make life better for PRs.
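>
> Roughly what I have in mind, as a sketch with JUnit 4 categories (the
> marker interface and test class names are made up for illustration):
>
>     import org.junit.Test;
>     import org.junit.experimental.categories.Category;
>
>     // Hypothetical marker interface for the known offenders.
>     interface FlakyTest {}
>
>     public class DataWatcherTest {
>         // Tag the offender so the build can exclude this category
>         // from the parallel pass and re-run it with threads=1.
>         @Category(FlakyTest.class)
>         @Test
>         public void testNodeDataChanged() throws Exception {
>             // existing test body stays unchanged
>         }
>     }
>
> Hooking this into the build is the part I haven't worked out; it would
> probably need @IncludeCategory/@ExcludeCategory suites for the two
> passes.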
>
> On the note of largest offenders, I've done 44 runs on aws r3.large
> with various thread settings (1, 2, 4, 8).
> Failure counts:
> 1  testNextConfigAlreadyActive
> 1  testNonExistingOpCode
> 1  testRaceConditionBetweenLeaderAndAckRequestProcessor
> 1  testWatcherDisconnectOnClose
> 2  testDoubleElection
> 5  testCurrentServersAreObserversInNextConfig
> 5  testNormalFollowerRunWithDiff
> 7  startSingleServerTest
> 18 testNodeDataChanged
>
> Haven't seen testPurgeWhenLogRollingInProgress or
> testManyChildWatchersAutoReset failing yet.
>
> On Thu, Oct 18, 2018 at 10:03 PM Michael Han <h...@apache.org> wrote:
>
>> It's a good idea to reduce the concurrency to eliminate flakiness.
>> Looks like the single-threaded unit tests on trunk are pretty stable:
>> https://builds.apache.org/job/zookeeper-trunk-single-thread/ (some
>> failures are due to C tests). The build time is longer, but not too
>> bad (for the pre-commit build; for the nightly build, build time
>> should not be a concern at all).
>>
>> On Mon, Oct 15, 2018 at 5:50 AM Andor Molnar
>> <an...@cloudera.com.invalid> wrote:
>>
>>> +1
>>>
>>> On Mon, Oct 15, 2018 at 1:55 PM, Enrico Olivelli
>>> <eolive...@gmail.com> wrote:
>>>
>>>> On Mon, Oct 15, 2018 at 12:46 Andor Molnar <an...@apache.org> wrote:
>>>>> Thank you guys. This is great help.
>>>>>
>>>>> I remember your efforts, Bogdan; as far as I remember you observed
>>>>> thread starvation in multiple runs on Apache Jenkins. Correct me
>>>>> if I'm wrong.
>>>>>
>>>>> I've created an umbrella Jira to capture all flaky test fixing
>>>>> efforts here:
>>>>> https://issues.apache.org/jira/browse/ZOOKEEPER-3170
>>>>>
>>>>> All previous flaky-related tickets have been converted to
>>>>> sub-tasks. Some of them might not be up-to-date; please consider
>>>>> reviewing them and closing them if possible. Additionally, feel
>>>>> free to create new sub-tasks to capture your actual work.
>>>>>
>>>>> I've already modified the trunk and branch-3.5 builds to run on 4
>>>>> threads for testing initially. It resulted in slightly more stable
>>>>> tests:
>>>>
>>>> +1
>>>>
>>>> I have assigned the umbrella issue to you, Andor, as you are
>>>> driving this important task. Is that ok?
>>>>
>>>> thank you
>>>>
>>>> Enrico
>>>>
>>>>> Trunk (java 8) - failing 1/4 (since #229) - build time increased
>>>>> by 40-45%
>>>>> Trunk (java 9) - failing 0/2 (since #993) - ~40%
>>>>> Trunk (java 10) - failing 1/2 (since #280) -
>>>>> branch-3.5 (java 8) - failing 0/4 (since #1153) - ~35-45%
>>>>>
>>>>> However, the sample is not big enough yet and the results are
>>>>> inaccurate, so I need more builds. I also need to fix a bug in SSL
>>>>> to get the java9/10 builds working on 3.5.
>>>>>
>>>>> Please let me know if I should revert the changes. The precommit
>>>>> build is still running on 8 threads, but I'd like to change that
>>>>> one too.
>>>>>
>>>>> Regards,
>>>>> Andor
>>>>>
>>>>>> On 2018. Oct 15., at 9:31, Bogdan Kanivets <bkaniv...@gmail.com>
>>>>>> wrote:
>>>>>> Fangmin,
>>>>>>
>>>>>> Those are good ideas.
>>>>>>
>>>>>> FYI, I've started running tests continuously on aws m1.xlarge:
>>>>>> https://github.com/lavacat/zookeeper-tests-lab
>>>>>>
>>>>>> So far, I've done ~12 runs of trunk. Same common offenders as in
>>>>>> the flaky dash: testManyChildWatchersAutoReset,
>>>>>> testPurgeWhenLogRollingInProgress.
>>>>>> I'll do some more runs, then try to come up with a report.
>>>>>>
>>>>>> I'm using aws and not the Apache Jenkins env because of better
>>>>>> control/observability.
>>>>>>
>>>>>> On Sun, Oct 14, 2018 at 4:58 PM Fangmin Lv <lvfang...@gmail.com>
>>>>>> wrote:
>>>>>>> Internally, we also did some work to reduce the flakiness. Here
>>>>>>> are the main things we've done:
>>>>>>>
>>>>>>> * using a retry rule (sketched below) to retry in case the zk
>>>>>>> client lost its connection, which can happen if the quorum tests
>>>>>>> are running in an unstable environment and leader election
>>>>>>> happens
>>>>>>> * using random ports instead of sequential ones to avoid port
>>>>>>> races when running tests concurrently
>>>>>>> * changing tests to avoid using the same test path when
>>>>>>> creating/deleting nodes
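>>>>>>>
>>>>>>> The retry rule is just a JUnit TestRule along these lines - a
>>>>>>> rough sketch of the idea rather than our exact code (the class
>>>>>>> name is made up, and maxRetries is assumed to be at least 1):
>>>>>>>
>>>>>>>     import org.apache.zookeeper.KeeperException.ConnectionLossException;
>>>>>>>     import org.junit.rules.TestRule;
>>>>>>>     import org.junit.runner.Description;
>>>>>>>     import org.junit.runners.model.Statement;
>>>>>>>
>>>>>>>     public class RetryOnConnectionLossRule implements TestRule {
>>>>>>>         private final int maxRetries;
>>>>>>>
>>>>>>>         public RetryOnConnectionLossRule(int maxRetries) {
>>>>>>>             this.maxRetries = maxRetries;
>>>>>>>         }
>>>>>>>
>>>>>>>         @Override
>>>>>>>         public Statement apply(Statement base, Description d) {
>>>>>>>             return new Statement() {
>>>>>>>                 @Override
>>>>>>>                 public void evaluate() throws Throwable {
>>>>>>>                     Throwable last = null;
>>>>>>>                     for (int i = 0; i < maxRetries; i++) {
>>>>>>>                         try {
>>>>>>>                             base.evaluate();
>>>>>>>                             return;
>>>>>>>                         } catch (ConnectionLossException e) {
>>>>>>>                             // unstable environment - try again
>>>>>>>                             last = e;
>>>>>>>                         }
>>>>>>>                     }
>>>>>>>                     throw last;
>>>>>>>                 }
>>>>>>>             };
>>>>>>>         }
>>>>>>>     }
>>>>>>>
>>>>>>> used as "@Rule public RetryOnConnectionLossRule retry = new
>>>>>>> RetryOnConnectionLossRule(3);" on the test class.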
>>>>>>>
>>>>>>> These greatly reduced the flakiness internally; we should try
>>>>>>> them if we're seeing similar issues on Jenkins.
>>>>>>>
>>>>>>> Fangmin
>>>>>>>
>>>>>>> On Sat, Oct 13, 2018 at 10:48 AM Bogdan Kanivets
>>>>>>> <bkaniv...@gmail.com> wrote:
>>>>>>>
>>>>>>>> I looked into flakiness a couple of months ago (with special
>>>>>>>> attention on testManyChildWatchersAutoReset). In my opinion the
>>>>>>>> problem is a) and c). Unfortunately I don't have data to back
>>>>>>>> this claim.
>>>>>>>>
>>>>>>>> I don't remember seeing many 'port binding' exceptions - unless
>>>>>>>> the 'port assignment' issue manifested as some other exception.
>>>>>>>>
>>>>>>>> Before decreasing the number of threads I think more data
>>>>>>>> should be collected/visualized:
>>>>>>>>
>>>>>>>> 1) The flaky dashboard is great, but we should add another
>>>>>>>> report that maps 'error causes' to builds/tests
>>>>>>>> 2) The flaky dash can be extended to save more history (for
>>>>>>>> example like this:
>>>>>>>> https://www.chromium.org/developers/testing/flakiness-dashboard)
>>>>>>>> 3) PreCommit builds should be included in the dashboard
>>>>>>>> 4) We should have a common clean benchmark. For example: take
>>>>>>>> an AWS t3.xlarge instance with a fixed Linux distro, JVM and ZK
>>>>>>>> commit SHA, and run the tests (current 8 threads) for 8 hours
>>>>>>>> with a 1-minute cooldown.
>>>>>>>>
>>>>>>>> Due to a recent employment change I got sidetracked, but I
>>>>>>>> really want to get to the bottom of this.
>>>>>>>> I'm going to set up 4) and report the results to this mailing
>>>>>>>> list. Also willing to work on the other items.
>>>>>>>>
>>>>>>>> On Sat, Oct 13, 2018 at 4:59 AM Enrico Olivelli
>>>>>>>> <eolive...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> On Fri, Oct 12, 2018 at 23:17 Benjamin Reed <br...@apache.org>
>>>>>>>>> wrote:
>>>>>>>>>> I think the unique port assignment (b) is more problematic
>>>>>>>>>> than it appears. There is a race between finding a free port
>>>>>>>>>> and actually grabbing it. I think that contributes to the
>>>>>>>>>> flakiness.
>>>>>>>>>
>>>>>>>>> This is very hard to solve for our test cases, because we need
>>>>>>>>> to build the configs before starting the groups of servers.
>>>>>>>>> For single-server tests it will be easier: you just have to
>>>>>>>>> start the server on port zero, get the port, and then create
>>>>>>>>> the client configs.
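>>>>>>>>> Something like this - a sketch from memory, I haven't
>>>>>>>>> double-checked the signatures:
>>>>>>>>>
>>>>>>>>>     import java.io.File;
>>>>>>>>>     import org.apache.zookeeper.server.ServerCnxnFactory;
>>>>>>>>>     import org.apache.zookeeper.server.ZooKeeperServer;
>>>>>>>>>
>>>>>>>>>     public class PortZeroExample {
>>>>>>>>>         public static void main(String[] args) throws Exception {
>>>>>>>>>             File dir = new File("/tmp/zk-port-zero-test");
>>>>>>>>>             ZooKeeperServer zks = new ZooKeeperServer(dir, dir, 3000);
>>>>>>>>>             // Port 0 = let the OS pick; there is no
>>>>>>>>>             // find-then-bind race window at all.
>>>>>>>>>             ServerCnxnFactory factory =
>>>>>>>>>                 ServerCnxnFactory.createFactory(0, -1);
>>>>>>>>>             factory.startup(zks);
>>>>>>>>>             // The port is only known after binding, so client
>>>>>>>>>             // configs must be built afterwards - exactly what
>>>>>>>>>             // quorum tests cannot do, since their configs are
>>>>>>>>>             // written before the servers start.
>>>>>>>>>             System.out.println("connect string: 127.0.0.1:"
>>>>>>>>>                 + factory.getLocalPort());
>>>>>>>>>         }
>>>>>>>>>     }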
>>>>>>>>> I don't know how much it would be worth.
>>>>>>>>>
>>>>>>>>> Enrico
>>>>>>>>>
>>>>>>>>>> ben
>>>>>>>>>> On Fri, Oct 12, 2018 at 8:50 AM Andor Molnar
>>>>>>>>>> <an...@apache.org> wrote:
>>>>>>>>>>> That is a completely valid point. I started to investigate
>>>>>>>>>>> flakies for exactly the same reason, if you remember the
>>>>>>>>>>> thread that I started a while ago. It was later abandoned,
>>>>>>>>>>> unfortunately, because I ran into a few issues:
>>>>>>>>>>>
>>>>>>>>>>> - We nailed down that in order to release 3.5 stable, we
>>>>>>>>>>> have to make sure it's not worse than 3.4 by comparing the
>>>>>>>>>>> builds. But these builds are not comparable, because the 3.4
>>>>>>>>>>> tests run single-threaded while 3.5 runs multi-threaded,
>>>>>>>>>>> showing problems which might also exist on 3.4.
>>>>>>>>>>> - Neither of them runs the C++ tests for some reason, but
>>>>>>>>>>> that's not really an issue here.
>>>>>>>>>>> - Looks like the tests on 3.5 are just as solid as on 3.4:
>>>>>>>>>>> running them in a dedicated, single-threaded environment
>>>>>>>>>>> shows almost all tests succeeding.
>>>>>>>>>>> - I think the root cause of the failing unit tests could be
>>>>>>>>>>> one (or more) of the following:
>>>>>>>>>>>   a) Environmental: the Jenkins slave gets overloaded with
>>>>>>>>>>>      other builds, and multi-threaded test running makes
>>>>>>>>>>>      things even worse: JDK threads starve and ZK instances
>>>>>>>>>>>      (both clients and servers) are unable to operate.
>>>>>>>>>>>   b) Conceptual: ZK unit tests were not designed to run on
>>>>>>>>>>>      multiple threads. I investigated the unique port
>>>>>>>>>>>      assignment feature, which looks good, but there could
>>>>>>>>>>>      be other gaps which make the tests unreliable when run
>>>>>>>>>>>      simultaneously.
>>>>>>>>>>>   c) Bad testing: testing ZK in the wrong way, making bad
>>>>>>>>>>>      assumptions (e.g. not syncing clients), etc. See the
>>>>>>>>>>>      sketch below.
>>>>>>>>>>>   d) Bug in the server.
>>>>>>>>>>>
>>>>>>>>>>> I feel that finding case d) with these tests is super hard,
>>>>>>>>>>> because a test report doesn't give any information on what
>>>>>>>>>>> could go wrong with ZooKeeper. Guessing is more or less your
>>>>>>>>>>> only option.
>>>>>>>>>>> Finding c) is a little easier; I'm trying to submit patches
>>>>>>>>>>> for those and hopefully making some progress.
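>>>>>>>>>>>
>>>>>>>>>>> To illustrate the kind of c) mistake I mean - a hypothetical
>>>>>>>>>>> fragment, not one of the actual patches: a write made
>>>>>>>>>>> through one client is only guaranteed to be visible to a
>>>>>>>>>>> second client after that client syncs.
>>>>>>>>>>>
>>>>>>>>>>>     import java.util.concurrent.CountDownLatch;
>>>>>>>>>>>     import org.apache.zookeeper.CreateMode;
>>>>>>>>>>>     import org.apache.zookeeper.ZooDefs;
>>>>>>>>>>>     import org.apache.zookeeper.ZooKeeper;
>>>>>>>>>>>     import static org.junit.Assert.assertNotNull;
>>>>>>>>>>>
>>>>>>>>>>>     public class ClientSyncExample {
>>>>>>>>>>>         // clientA and clientB are two sessions, possibly
>>>>>>>>>>>         // connected to different quorum members.
>>>>>>>>>>>         void writeIsVisibleToOtherClient(ZooKeeper clientA,
>>>>>>>>>>>                 ZooKeeper clientB) throws Exception {
>>>>>>>>>>>             clientA.create("/t", "x".getBytes(),
>>>>>>>>>>>                     ZooDefs.Ids.OPEN_ACL_UNSAFE,
>>>>>>>>>>>                     CreateMode.PERSISTENT);
>>>>>>>>>>>
>>>>>>>>>>>             // Without this sync, clientB may legally read
>>>>>>>>>>>             // from a follower that hasn't applied the
>>>>>>>>>>>             // create yet, and the assertion becomes flaky.
>>>>>>>>>>>             CountDownLatch synced = new CountDownLatch(1);
>>>>>>>>>>>             clientB.sync("/t",
>>>>>>>>>>>                     (rc, path, ctx) -> synced.countDown(), null);
>>>>>>>>>>>             synced.await();
>>>>>>>>>>>
>>>>>>>>>>>             assertNotNull(clientB.exists("/t", false));
>>>>>>>>>>>         }
>>>>>>>>>>>     }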
>>>>>>>>>>>
>>>>>>>>>>> The huge pain in the arse, though, is a) and b): people
>>>>>>>>>>> desperately keep commenting "please retest this" on GitHub
>>>>>>>>>>> to get a green build, while testing is going in a direction
>>>>>>>>>>> that hides real problems. I mean people have started not to
>>>>>>>>>>> care about a failing build, because "it must be some flaky
>>>>>>>>>>> test unrelated to my patch". Which is bad, but the shame is
>>>>>>>>>>> that it's true in 90% of cases.
>>>>>>>>>>>
>>>>>>>>>>> I'm just trying to find some ways - besides fixing c) and
>>>>>>>>>>> d) flakies - to get more reliable and more informative
>>>>>>>>>>> Jenkins builds. I don't want to make a huge turnaround, but
>>>>>>>>>>> I think if we can get a significantly more reliable build
>>>>>>>>>>> for the price of a slightly longer build time by running on
>>>>>>>>>>> 4 threads instead of 8, I say let's do it.
>>>>>>>>>>>
>>>>>>>>>>> As always, any help from the community is more than welcome
>>>>>>>>>>> and appreciated.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Andor
>>>>>>>>>>>
>>>>>>>>>>>> On 2018. Oct 12., at 16:52, Patrick Hunt <ph...@apache.org>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>> IIRC the number of threads was increased to improve
>>>>>>>>>>>> performance. Reducing it is fine, but do we understand why
>>>>>>>>>>>> it's failing? Perhaps it's finding real issues as a result
>>>>>>>>>>>> of the artificial concurrency/load.
>>>>>>>>>>>>
>>>>>>>>>>>> Patrick
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Oct 12, 2018 at 7:12 AM Andor Molnar
>>>>>>>>>>>> <an...@cloudera.com.invalid> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks for the feedback.
>>>>>>>>>>>>> I'm running a few tests now - branch-3.5 on 2 threads and
>>>>>>>>>>>>> trunk on 4 threads - to see what the impact on the build
>>>>>>>>>>>>> time is.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The GitHub PR job is hard to configure, because its
>>>>>>>>>>>>> settings are hard-coded into a shell script in the
>>>>>>>>>>>>> codebase. I'll have to open a PR for that.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Andor
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Oct 12, 2018 at 2:46 PM, Norbert Kalmar
>>>>>>>>>>>>> <nkal...@cloudera.com.invalid> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> +1, running the tests locally with 1 thread always passes
>>>>>>>>>>>>>> (well, I ran it about 5 times, but still).
>>>>>>>>>>>>>> On the other hand, running it on 8 threads yields results
>>>>>>>>>>>>>> about as flaky as the Apache runs. (It is much faster,
>>>>>>>>>>>>>> but if we sometimes have to run 6-8-10 times to get a
>>>>>>>>>>>>>> green run...)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Norbert
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, Oct 12, 2018 at 2:05 PM Enrico Olivelli
>>>>>>>>>>>>>> <eolive...@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> +1
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Enrico
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Fri, Oct 12, 2018 at 13:52 Andor Molnar
>>>>>>>>>>>>>>> <an...@apache.org> wrote:
>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> What do you think of changing the number of threads
>>>>>>>>>>>>>>>> running unit tests in Jenkins from the current 8 to 4
>>>>>>>>>>>>>>>> or even 2?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Running the unit tests inside the Cloudera environment
>>>>>>>>>>>>>>>> on a single thread shows the builds to be much more
>>>>>>>>>>>>>>>> stable. That would probably be too slow, but maybe at
>>>>>>>>>>>>>>>> least running fewer threads would improve the
>>>>>>>>>>>>>>>> situation.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> It's getting very annoying that I cannot get a green
>>>>>>>>>>>>>>>> build on GitHub with only a few retests.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>> Andor
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> -- Enrico Olivelli
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> -- Enrico Olivelli