Thanks Bogdan, so far so good.

testNodeDataChanged is an old beast. I have a possible fix for it from
@afine:

https://github.com/apache/zookeeper/pull/300

It would be great if we could review it and get rid of this flaky test.


Andor




On 10/20/18 06:41, Bogdan Kanivets wrote:
> I think the argument for keeping the concurrency is that it may surface some
> unknown problems in the code.
>
> Maybe a middle ground: move the largest offenders into a separate JUnit tag
> (category) and run them after the rest of the tests with threads=1, something
> like the sketch below. Hopefully this will make life better for PRs.
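>
> Something like this could work (untested sketch, assuming we are still on
> JUnit 4; the FlakyTest marker and the test class names are only illustrative):
>
>     import org.junit.Test;
>     import org.junit.experimental.categories.Categories;
>     import org.junit.experimental.categories.Category;
>     import org.junit.runner.RunWith;
>     import org.junit.runners.Suite;
>
>     // Marker interface used purely as a JUnit 4 category label.
>     public interface FlakyTest {}
>
>     // Tag a known offender with the category.
>     @Category(FlakyTest.class)
>     public class NodeDataChangedIT {
>         @Test
>         public void testNodeDataChanged() throws Exception {
>             // existing test body
>         }
>     }
>
>     // Collect the tagged tests into a suite that a second, single-threaded
>     // pass can run after the main (parallel) run.
>     @RunWith(Categories.class)
>     @Categories.IncludeCategory(FlakyTest.class)
>     @Suite.SuiteClasses({ NodeDataChangedIT.class })
>     public class FlakyTestSuite {}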
>
> On the note of largest offenders, I've done 44 runs on aws r3.large with
> various thread settings (1, 2, 4, 8).
> Failure counts:
>       1 testNextConfigAlreadyActive
>       1 testNonExistingOpCode
>       1 testRaceConditionBetweenLeaderAndAckRequestProcessor
>       1 testWatcherDisconnectOnClose
>       2 testDoubleElection
>       5 testCurrentServersAreObserversInNextConfig
>       5 testNormalFollowerRunWithDiff
>       7 startSingleServerTest
>      18 testNodeDataChanged
>
> Haven't seen testPurgeWhenLogRollingInProgress
> or testManyChildWatchersAutoReset failing yet.
>
>
>
> On Thu, Oct 18, 2018 at 10:03 PM Michael Han <h...@apache.org> wrote:
>
>> It's a good idea to reduce the concurrency to eliminate flakiness. Looks
>> like the single threaded unit tests on trunk are pretty stable
>> https://builds.apache.org/job/zookeeper-trunk-single-thread/ (some
>> failures are due to C tests). The build time is longer, but not too bad (for a
>> pre-commit build; for a nightly build, build time should not be a concern at
>> all).
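>>
>> Locally that should be something like this (assuming the ant target and
>> property are still called "test" and "test.junit.threads" in build.xml):
>>
>>     ant test -Dtest.junit.threads=1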
>>
>>
>> On Mon, Oct 15, 2018 at 5:50 AM Andor Molnar <an...@cloudera.com.invalid>
>> wrote:
>>
>>> +1
>>>
>>>
>>>
>>> On Mon, Oct 15, 2018 at 1:55 PM, Enrico Olivelli <eolive...@gmail.com>
>>> wrote:
>>>
>>>> On Mon, Oct 15, 2018 at 12:46 Andor Molnar
>>>> <an...@apache.org> wrote:
>>>>> Thank you guys. This is great help.
>>>>>
>>>>> I remember your efforts, Bogdan. As far as I remember, you observed thread
>>>>> starvation in multiple runs on Apache Jenkins. Correct me if I'm wrong.
>>>>>
>>>>> I've created an umbrella Jira to capture all flaky test fixing efforts here:
>>>>>
>>>>> https://issues.apache.org/jira/browse/ZOOKEEPER-3170
>>>>>
>>>>> All previous flaky-related tickets have been converted to sub-tasks. Some of
>>>>> them might not be up to date, so please consider reviewing them and closing
>>>>> them if possible. Additionally, feel free to create new sub-tasks to capture
>>>>> your actual work.
>>>>> I've already modified the trunk and branch-3.5 builds to run on 4 threads
>>>>> for testing initially. It resulted in slightly more stable tests:
>>>>
>>>> +1
>>>>
>>>> I have assigned the umbrella issue to you, Andor, as you are driving
>>>> this important task. Is that ok?
>>>>
>>>> thank you
>>>>
>>>> Enrico
>>>>
>>>>
>>>>> Trunk (java 8) - failing 1/4 (since #229) - build time increased by 40-45%
>>>>> Trunk (java 9) - failing 0/2 (since #993) - ~40%
>>>>> Trunk (java 10) - failing 1/2 (since #280) -
>>>>> branch-3.5 (java 8) - failing 0/4 (since #1153) - ~35-45%
>>>>>
>>>>> However, the sample is not big enough yet and the results are inaccurate, so
>>>>> I need more builds. I also need to fix a bug in SSL to get the java9/10
>>>>> builds working on 3.5.
>>>>>
>>>>> Please let me know if I should revert the changes. The precommit build is
>>>>> still running on 8 threads, but I'd like to change that one too.
>>>>> Regards,
>>>>> Andor
>>>>>
>>>>>
>>>>>
>>>>>> On 2018. Oct 15., at 9:31, Bogdan Kanivets <bkaniv...@gmail.com>
>>>> wrote:
>>>>>> Fangmin,
>>>>>>
>>>>>> Those are good ideas.
>>>>>>
>>>>>> FYI, I've started running tests continuously on an aws m1.xlarge.
>>>>>> https://github.com/lavacat/zookeeper-tests-lab
>>>>>>
>>>>>> So far, I've done ~12 runs of trunk. Same common offenders as in the Flaky
>>>>>> dash: testManyChildWatchersAutoReset, testPurgeWhenLogRollingInProgress.
>>>>>>
>>>>>> I'll do some more runs, then try to come up with a report.
>>>>>>
>>>>>> I'm using aws and not Apache Jenkins env because of better
>>>>>> control/observability.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Sun, Oct 14, 2018 at 4:58 PM Fangmin Lv <lvfang...@gmail.com>
>>>> wrote:
>>>>>>> Internally, we also did some work to reduce the flakiness; here are the
>>>>>>> main things we've done:
>>>>>>>
>>>>>>> * using a retry rule to retry in case the zk client loses its connection,
>>>>>>> which can happen if the quorum tests are running in an unstable
>>>>>>> environment and a leader election happens (sketched below)
>>>>>>> * using random ports instead of sequential ones to avoid port races when
>>>>>>> running tests concurrently
>>>>>>> * changing tests to avoid using the same test path when creating/deleting
>>>>>>> nodes
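>>>>>>>
>>>>>>> The retry part could look roughly like this (untested sketch; JUnit 4
>>>>>>> assumed, class name and attempt count are illustrative):
>>>>>>>
>>>>>>>     import org.junit.rules.TestRule;
>>>>>>>     import org.junit.runner.Description;
>>>>>>>     import org.junit.runners.model.Statement;
>>>>>>>
>>>>>>>     public class RetryRule implements TestRule {
>>>>>>>         private final int maxAttempts;
>>>>>>>
>>>>>>>         public RetryRule(int maxAttempts) {
>>>>>>>             this.maxAttempts = maxAttempts;
>>>>>>>         }
>>>>>>>
>>>>>>>         @Override
>>>>>>>         public Statement apply(Statement base, Description description) {
>>>>>>>             return new Statement() {
>>>>>>>                 @Override
>>>>>>>                 public void evaluate() throws Throwable {
>>>>>>>                     Throwable last = null;
>>>>>>>                     for (int attempt = 0; attempt < maxAttempts; attempt++) {
>>>>>>>                         try {
>>>>>>>                             base.evaluate();   // run the test body
>>>>>>>                             return;            // passed, no retry needed
>>>>>>>                         } catch (Throwable t) {
>>>>>>>                             last = t;          // e.g. lost zk client connection
>>>>>>>                         }
>>>>>>>                     }
>>>>>>>                     throw last;                // attempts exhausted, fail
>>>>>>>                 }
>>>>>>>             };
>>>>>>>         }
>>>>>>>     }
>>>>>>>
>>>>>>> with each affected test class declaring
>>>>>>> @Rule public RetryRule retry = new RetryRule(3);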
>>>>>>>
>>>>>>> These greatly reduced the flakiness internally; we should try them if
>>>>>>> we're seeing similar issues on Jenkins.
>>>>>>>
>>>>>>> Fangmin
>>>>>>>
>>>>>>> On Sat, Oct 13, 2018 at 10:48 AM Bogdan Kanivets <
>>> bkaniv...@gmail.com
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I looked into the flakiness a couple of months ago (with special attention
>>>>>>>> to testManyChildWatchersAutoReset). In my opinion the problems are a) and
>>>>>>>> c). Unfortunately I don't have data to back this claim.
>>>>>>>>
>>>>>>>> I don't remember seeing many 'port binding' exceptions, unless the 'port
>>>>>>>> assignment' issue manifested as some other exception.
>>>>>>>>
>>>>>>>> Before decreasing the number of threads, I think more data should be
>>>>>>>> collected/visualized:
>>>>>>>>
>>>>>>>> 1) The flaky dashboard is great, but we should add another report that
>>>>>>>> maps 'error causes' to builds/tests
>>>>>>>> 2) The flaky dash can be extended to save more history (for example like
>>>>>>>> https://www.chromium.org/developers/testing/flakiness-dashboard)
>>>>>>>> 3) PreCommit builds should be included in the dashboard
>>>>>>>> 4) We should have a common, clean benchmark. For example: take an AWS
>>>>>>>> t3.xlarge instance with a fixed linux distro, jvm and zk commit sha, and
>>>>>>>> run the tests (current 8 threads) for 8 hours with a 1 min cooldown.
>>>>>>>>
>>>>>>>> Due to a recent employment change I got sidetracked, but I really want to
>>>>>>>> get to the bottom of this.
>>>>>>>> I'm going to set up 4) and report the results to this mailing list. I'm
>>>>>>>> also willing to work on other items.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Sat, Oct 13, 2018 at 4:59 AM Enrico Olivelli <
>>> eolive...@gmail.com
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> On Fri, Oct 12, 2018, 23:17 Benjamin Reed <br...@apache.org> wrote:
>>>>>>>>>> I think the unique port assignment (d) is more problematic than it
>>>>>>>>>> appears. There is a race between finding a free port and actually
>>>>>>>>>> grabbing it. I think that contributes to the flakiness.
>>>>>>>>>>
>>>>>>>>> This is very hard to solve for our test cases, because we need to build
>>>>>>>>> the configs before starting the groups of servers.
>>>>>>>>> For single-server tests it would be easier: you just have to start the
>>>>>>>>> server on port zero, get the port, and then create the client configs,
>>>>>>>>> roughly as sketched below.
>>>>>>>>> I don't know how much it would be worth.
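>>>>>>>>>
>>>>>>>>> Roughly like this (untested sketch using plain java.net.ServerSocket;
>>>>>>>>> the actual ZK server/test-harness wiring is left out):
>>>>>>>>>
>>>>>>>>>     // Racy pattern: probe for a free port, close it, and bind again
>>>>>>>>>     // later. Another concurrently running test can grab the port in
>>>>>>>>>     // the window between the two binds.
>>>>>>>>>     int probedPort;
>>>>>>>>>     try (ServerSocket probe = new ServerSocket(0)) {
>>>>>>>>>         probedPort = probe.getLocalPort();
>>>>>>>>>     }
>>>>>>>>>
>>>>>>>>>     // Port-zero pattern: bind once, keep the socket (or hand it to the
>>>>>>>>>     // server), then build the client config from the assigned port.
>>>>>>>>>     ServerSocket listener = new ServerSocket(0);
>>>>>>>>>     String connectString = "127.0.0.1:" + listener.getLocalPort();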
>>>>>>>>>
>>>>>>>>> Enrico
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> ben
>>>>>>>>>> On Fri, Oct 12, 2018 at 8:50 AM Andor Molnar <an...@apache.org
>>>>>>> wrote:
>>>>>>>>>>> That is a completely valid point. I started to investigate flakies for
>>>>>>>>>>> exactly the same reason, if you remember the thread that I started a
>>>>>>>>>>> while ago. It was later abandoned unfortunately, because I've run into a
>>>>>>>>>>> few issues:
>>>>>>>>>>> - We nailed down that in order to release 3.5 stable, we have to make
>>>>>>>>>>> sure it's not worse than 3.4 by comparing the builds: but these builds
>>>>>>>>>>> are not comparable, because the 3.4 tests run single threaded while the
>>>>>>>>>>> 3.5 tests run multithreaded, showing problems which might also exist on
>>>>>>>>>>> 3.4,
>>>>>>>>>>> - Neither of them runs the C++ tests for some reason, but that's not
>>>>>>>>>>> really an issue here,
>>>>>>>>>>> - Looks like the tests on 3.5 are just as solid as on 3.4, because
>>>>>>>>>>> running them in a dedicated, single threaded environment shows almost
>>>>>>>>>>> all tests succeeding,
>>>>>>>>>>> - I think the root cause of the failing unit tests could be one (or
>>>>>>>>>>> more) of the following:
>>>>>>>>>>>        a) Environmental: the Jenkins slave gets overloaded with other
>>>>>>>>>>> builds and multithreaded test running makes things even worse: starving
>>>>>>>>>>> JDK threads and ZK instances (both clients and servers) are unable to
>>>>>>>>>>> operate
>>>>>>>>>>>        b) Conceptual: ZK unit tests were not designed to run on
>>>>>>>>>>> multiple threads: I investigated the unique port assignment feature,
>>>>>>>>>>> which is looking good, but there could be other possible gaps which make
>>>>>>>>>>> them unreliable when running simultaneously.
>>>>>>>>>>>        c) Bad testing: testing ZK in the wrong way, making bad
>>>>>>>>>>> assumptions (e.g. not syncing clients), etc.
>>>>>>>>>>>        d) Bug in the server.
>>>>>>>>>>>
>>>>>>>>>>> I feel that finding case d) with these tests is super hard, because a
>>>>>>>>>>> test report doesn't give any information on what could go wrong with
>>>>>>>>>>> ZooKeeper. More or less, guessing is your only option.
>>>>>>>>>>> Finding c) is a little bit easier; I'm trying to submit patches for
>>>>>>>>>>> those and hopefully making some progress.
>>>>>>>>>>> The huge pains in the arse though are a) and b): people desperately keep
>>>>>>>>>>> commenting "please retest this" on github to get a green build, while
>>>>>>>>>>> testing is going in a direction that hides real problems: I mean people
>>>>>>>>>>> have started not to care about a failing build, because "it must be some
>>>>>>>>>>> flaky unrelated to my patch". Which is bad, but the shame is that it's
>>>>>>>>>>> true in 90% of cases.
>>>>>>>>>>> I'm just trying to find some ways - besides fixing c) and d) flakies -
>>>>>>>>>>> to get more reliable and more informative Jenkins builds. I don't want
>>>>>>>>>>> to make a huge turnaround, but I think if we can get a significantly
>>>>>>>>>>> more reliable build for the price of a slightly longer build time,
>>>>>>>>>>> running on 4 threads instead of 8, I say let's do it.
>>>>>>>>>>> As always, any help from the community is more than welcome and
>>>>>>>>>>> appreciated.
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Andor
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> On 2018. Oct 12., at 16:52, Patrick Hunt <ph...@apache.org>
>>>>>>> wrote:
>>>>>>>>>>>> IIRC the number of threads was increased to improve performance.
>>>>>>>>>>>> Reducing it is fine, but do we understand why it's failing? Perhaps it's
>>>>>>>>>>>> finding real issues as a result of the artificial concurrency/load.
>>>>>>>>>>>>
>>>>>>>>>>>> Patrick
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Oct 12, 2018 at 7:12 AM Andor Molnar
>>>>>>>>>> <an...@cloudera.com.invalid>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks for the feedback.
>>>>>>>>>>>>> I'm running a few tests now: branch-3.5 on 2 threads and
>> trunk
>>>>>>> on
>>>>>>>> 4
>>>>>>>>>> threads
>>>>>>>>>>>>> to see what's the impact on the build time.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Github PR job is hard to configure, because its settings are
>>>>>>> hard
>>>>>>>>>> coded
>>>>>>>>>>>>> into a shell script in the codebase. I have to open PR for
>>> that.
>>>>>>>>>>>>> Andor
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Oct 12, 2018 at 2:46 PM, Norbert Kalmar <
>>>>>>>>>>>>> nkal...@cloudera.com.invalid> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> +1, running the tests locally with 1 thread always passes (well, I
>>>>>>>>>>>>>> ran it about 5 times, but still).
>>>>>>>>>>>>>> On the other hand, running it on 8 threads yields similarly flaky
>>>>>>>>>>>>>> results as the Apache runs. (Although it is much faster, but if we
>>>>>>>>>>>>>> have to run it 6-8-10 times sometimes to get a green run...)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Norbert
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, Oct 12, 2018 at 2:05 PM Enrico Olivelli <
>>>>>>>>> eolive...@gmail.com
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> +1
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Enrico
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Fri, Oct 12, 2018, 13:52 Andor Molnar <an...@apache.org>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> What do you think of changing the number of threads running the
>>>>>>>>>>>>>>>> unit tests in Jenkins from the current 8 to 4 or even 2?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Running the unit tests inside the Cloudera environment on a single
>>>>>>>>>>>>>>>> thread shows the builds to be much more stable. That would probably
>>>>>>>>>>>>>>>> be too slow, but maybe at least running fewer threads would improve
>>>>>>>>>>>>>>>> the situation.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> It's getting very annoying that I cannot get a green build on
>>>>>>>>>>>>>>>> GitHub with only a few retests.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>> Andor
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -- Enrico Olivelli
>>>>>>>>>>>>>>>
>>>>>>>>> --
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> -- Enrico Olivelli
>>>>>>>>>
