It's a good idea to reduce test concurrency to eliminate flakiness. The single-threaded unit tests on trunk look pretty stable: https://builds.apache.org/job/zookeeper-trunk-single-thread/ (some failures are due to C tests). The build time is longer, but not too bad for the pre-commit build; for the nightly build, build time should not be a concern at all.
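
On the port race that Ben and Enrico discuss further down the thread: for a single server, binding to port zero lets the OS pick and reserve the port in one step, so there is no window between finding a free port and actually grabbing it. Below is a minimal sketch of that idea in plain Java. The names PortZeroSketch, BoundServer and connectString() are hypothetical and only for illustration; they are not ZooKeeper test utilities, and a bare ServerSocket stands in for a real server.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;

final class PortZeroSketch {

    /** Hypothetical stand-in for a single test server; not ZooKeeper code. */
    static final class BoundServer implements AutoCloseable {
        private final ServerSocket socket;

        BoundServer() {
            try {
                // Port 0 asks the OS to choose a free ephemeral port and bind it
                // atomically, so no other process can take it in between.
                this.socket = new ServerSocket(0, 50, InetAddress.getByName("127.0.0.1"));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        /** The port the OS actually assigned; only known after binding. */
        int port() {
            return socket.getLocalPort();
        }

        /** Client config is derived from the bound port, not chosen up front. */
        String connectString() {
            return "127.0.0.1:" + port();
        }

        @Override
        public void close() {
            try {
                socket.close();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    public static void main(String[] args) {
        // Two servers started concurrently in the same JVM get distinct ports.
        try (BoundServer a = new BoundServer(); BoundServer b = new BoundServer()) {
            System.out.println("server a: " + a.connectString());
            System.out.println("server b: " + b.connectString());
        }
    }
}
```

This only covers the single-server case. As Enrico points out, quorum tests need the configs built before the group of servers starts, so they cannot discover their ports this way.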

On Mon, Oct 15, 2018 at 5:50 AM Andor Molnar <an...@cloudera.com.invalid> wrote:

> +1
>
> On Mon, Oct 15, 2018 at 1:55 PM, Enrico Olivelli <eolive...@gmail.com> wrote:
>
> > On Mon, Oct 15, 2018 at 12:46, Andor Molnar <an...@apache.org> wrote:
> > >
> > > Thank you guys. This is great help.
> > >
> > > I remember your efforts Bogdan, as far as I remember you observed thread starvation in multiple runs on Apache Jenkins. Correct me if I’m wrong.
> > >
> > > I’ve created an umbrella Jira to capture all flaky test fixing efforts here:
> > > https://issues.apache.org/jira/browse/ZOOKEEPER-3170
> > >
> > > All previous flaky-related tickets have been converted to sub-tasks. Some of them might not be up-to-date, please consider reviewing them and close if possible. Additionally feel free to create new sub-tasks to capture your actual work.
> > >
> > > I’ve already modified Trunk and branch-3.5 builds to run on 4 threads for testing initially. It resulted in slightly more stable tests:
> >
> > +1
> >
> > I have assigned the umbrella issue to you Andor as you are driving this important task. Is it ok?
> >
> > thank you
> >
> > Enrico
> >
> > >
> > > Trunk (java 8) - failing 1/4 (since #229) - build time increased by 40-45%
> > > Trunk (java 9) - failing 0/2 (since #993) - ~40%
> > > Trunk (java 10) - failing 1/2 (since #280) -
> > > branch-3.5 (java 8) - failing 0/4 (since #1153) - ~35-45%
> > >
> > > However the pattern is not big enough and results are inaccurate, so I need more builds. I also need to fix a bug in SSL to get java9/10 builds working on 3.5.
> > >
> > > Please let me know if I should revert the changes. Precommit build is still running on 8 threads, but I’d like to change that one too.
> > >
> > > Regards,
> > > Andor
> > >
> > > > On 2018. Oct 15., at 9:31, Bogdan Kanivets <bkaniv...@gmail.com> wrote:
> > > >
> > > > Fangmin,
> > > >
> > > > Those are good ideas.
> > > >
> > > > FYI, I've started running tests continuously in aws m1.xlarge.
> > > > https://github.com/lavacat/zookeeper-tests-lab
> > > >
> > > > So far, I've done ~ 12 runs of trunk. Same common offenders as in Flaky dash: testManyChildWatchersAutoReset, testPurgeWhenLogRollingInProgress
> > > > I'll do some more runs, then try to come up with report.
> > > >
> > > > I'm using aws and not Apache Jenkins env because of better control/observability.
> > > >
> > > > On Sun, Oct 14, 2018 at 4:58 PM Fangmin Lv <lvfang...@gmail.com> wrote:
> > > >
> > > >> Internally, we also did some work to reduce flakiness, here are the main things we've done:
> > > >>
> > > >> * using retry rule to retry in case the zk client lost its connection, this could happen if the quorum tests are running on an unstable environment and the leader election happened.
> > > >> * using random port instead of sequentially to avoid the port racing when running tests concurrently
> > > >> * changing tests to avoid using the same test path when creating/deleting nodes
> > > >>
> > > >> These greatly reduced the flakiness internally, we should try those if we're seeing similar issues in the Jenkins.
> > > >>
> > > >> Fangmin
> > > >>
> > > >> On Sat, Oct 13, 2018 at 10:48 AM Bogdan Kanivets <bkaniv...@gmail.com> wrote:
> > > >>
> > > >>> I've looked into flakiness a couple of months ago (special attention on testManyChildWatchersAutoReset). In my opinion the problem is a) and c). Unfortunately I don't have data to back this claim.
> > > >>>
> > > >>> I don't remember seeing many 'port binding' exceptions. Unless 'port assignment' issue manifested as some other exception.
> > > >>>
> > > >>> Before decreasing number of threads I think more data should be collected/visualized
> > > >>>
> > > >>> 1) Flaky dashboard is great, but we should add another report that maps 'error causes' to builds/tests
> > > >>> 2) Flaky dash can be extended to save more history (for example like this https://www.chromium.org/developers/testing/flakiness-dashboard)
> > > >>> 3) PreCommit builds should be included in dashboard
> > > >>> 4) We should have a common clean benchmark. For example - take AWS t3.xlarge instance with set linux distro, jvm, zk commit sha and run tests (current 8 threads) for 8 hours with 1 min cooldown.
> > > >>>
> > > >>> Due to recent employment change, I got sidetracked, but I really want to get to the bottom of this.
> > > >>> I'm going to set up 4) and report results to this mailing list. Also willing to work on other items.
> > > >>>
> > > >>> On Sat, Oct 13, 2018 at 4:59 AM Enrico Olivelli <eolive...@gmail.com> wrote:
> > > >>>
> > > >>>> On Fri, Oct 12, 2018, 23:17 Benjamin Reed <br...@apache.org> wrote:
> > > >>>>
> > > >>>>> i think the unique port assignment (d) is more problematic than it appears. there is a race between finding a free port and actually grabbing it. i think that contributes to the flakiness.
> > > >>>>>
> > > >>>>
> > > >>>> This is very hard to solve for our test cases, because we need to build configs before starting the groups of servers.
> > > >>>> For tests in single server it will be easier, you just have to start the server on port zero, get the port and then create client configs.
> > > >>>> I don't know how much it will be worth
> > > >>>>
> > > >>>> Enrico
> > > >>>>
> > > >>>>> ben
> > > >>>>> On Fri, Oct 12, 2018 at 8:50 AM Andor Molnar <an...@apache.org> wrote:
> > > >>>>>>
> > > >>>>>> That is a completely valid point. I started to investigate flakies for exactly the same reason, if you remember the thread that I started a while ago. It was later abandoned unfortunately, because I’ve run into a few issues:
> > > >>>>>>
> > > >>>>>> - We nailed down that in order to release 3.5 stable, we have to make sure it’s not worse than 3.4 by comparing the builds: but these builds are not comparable, because 3.4 tests run single-threaded while 3.5 runs multithreaded, showing problems which might also exist on 3.4,
> > > >>>>>>
> > > >>>>>> - Neither of them is running C++ tests for some reason, but that’s not really an issue here,
> > > >>>>>>
> > > >>>>>> - Looks like tests on 3.5 are just as solid as on 3.4, because running them on a dedicated, single threaded environment shows almost all tests succeeding,
> > > >>>>>>
> > > >>>>>> - I think the root cause of failing unit tests could be one (or more) of the following:
> > > >>>>>>     a) Environmental: Jenkins slave gets overloaded with other builds and multithreaded test running makes things even worse: starving JDK threads and ZK instances (both clients and servers) are unable to operate
> > > >>>>>>     b) Conceptual: ZK unit tests were not designed to run on multiple threads: I investigated the unique port assignment feature which is looking good, but there could be other possible gaps which make them unreliable when running simultaneously.
> > > >>>>>>     c) Bad testing: testing ZK in the wrong way, making bad assumptions (e.g. not syncing clients), etc.
> > > >>>>>>     d) Bug in the server.
> > > >>>>>>
> > > >>>>>> I feel that finding case d) with these tests is super hard, because a test report doesn’t give any information on what could go wrong with ZooKeeper. More or less guessing is your only option.
> > > >>>>>>
> > > >>>>>> Finding c) is a little bit easier, I’m trying to submit patches on them and hopefully making some progress.
> > > >>>>>>
> > > >>>>>> The huge pain in the arse though are a) and b): people desperately keep commenting “please retest this” on github to get a green build while testing is going in a direction to hide real problems: I mean people started not to care about a failing build, because “it must be some flaky unrelated to my patch”. Which is bad, but the shame is it’s true in 90% of cases.
> > > >>>>>>
> > > >>>>>> I’m just trying to find some ways - besides fixing c) and d) flakies - to get more reliable and more informative Jenkins builds. Don’t want to make a huge turnaround, but I think if we can get a significantly more reliable build for the price of slightly longer build time running on 4 threads instead of 8, I say let’s do it.
> > > >>>>>>
> > > >>>>>> As always, any help from the community is more than welcome and appreciated.
> > > >>>>>>
> > > >>>>>> Thanks,
> > > >>>>>> Andor
> > > >>>>>>
> > > >>>>>>> On 2018. Oct 12., at 16:52, Patrick Hunt <ph...@apache.org> wrote:
> > > >>>>>>>
> > > >>>>>>> iirc the number of threads was increased to improve performance. Reducing is fine, but do we understand why it's failing? Perhaps it's finding real issues as a result of the artificial concurrency/load.
> > > >>>>>>>
> > > >>>>>>> Patrick
> > > >>>>>>>
> > > >>>>>>> On Fri, Oct 12, 2018 at 7:12 AM Andor Molnar <an...@cloudera.com.invalid> wrote:
> > > >>>>>>>
> > > >>>>>>>> Thanks for the feedback.
> > > >>>>>>>> I'm running a few tests now: branch-3.5 on 2 threads and trunk on 4 threads to see what's the impact on the build time.
> > > >>>>>>>>
> > > >>>>>>>> Github PR job is hard to configure, because its settings are hard coded into a shell script in the codebase. I have to open PR for that.
> > > >>>>>>>>
> > > >>>>>>>> Andor
> > > >>>>>>>>
> > > >>>>>>>> On Fri, Oct 12, 2018 at 2:46 PM, Norbert Kalmar <nkal...@cloudera.com.invalid> wrote:
> > > >>>>>>>>
> > > >>>>>>>>> +1, running the tests locally with 1 thread always passes (well, I ran it about 5 times, but still)
> > > >>>>>>>>> On the other hand, running it on 8 threads yields similarly flaky results as Apache runs. (Although it is much faster, but if we have to run 6-8-10 times sometimes to get a green run...)
> > > >>>>>>>>>
> > > >>>>>>>>> Norbert
> > > >>>>>>>>>
> > > >>>>>>>>> On Fri, Oct 12, 2018 at 2:05 PM Enrico Olivelli <eolive...@gmail.com> wrote:
> > > >>>>>>>>>
> > > >>>>>>>>>> +1
> > > >>>>>>>>>>
> > > >>>>>>>>>> Enrico
> > > >>>>>>>>>>
> > > >>>>>>>>>> On Fri, Oct 12, 2018, 13:52 Andor Molnar <an...@apache.org> wrote:
> > > >>>>>>>>>>
> > > >>>>>>>>>>> Hi,
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> What do you think of changing number of threads running unit tests in Jenkins from current 8 to 4 or even 2?
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> Running unit tests inside Cloudera environment on a single thread shows the builds much more stable. That would be probably too slow, but maybe running at least fewer threads would improve the situation.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> It's getting very annoying that I cannot get a green build on GitHub with only a few retests.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> Regards,
> > > >>>>>>>>>>> Andor
> > > >>>>>>>>>>>
> > > >>>>>>>>>> --
> > > >>>>>>>>>> -- Enrico Olivelli
> > > >>>> --
> > > >>>> -- Enrico Olivelli