It's a good idea to reduce the concurrency of the test runs to eliminate
flakiness. It looks like the single threaded unit tests on trunk are pretty stable:
https://builds.apache.org/job/zookeeper-trunk-single-thread/ (some failures
are due to C tests). The build time is longer, but not too bad for the
pre-commit build; for the nightly build, build time should not be a concern
at all.
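
For context, the concurrency knob being discussed here is the test thread
count in the Ant build. Assuming the property is named test.junit.threads
(the property name is an assumption on my part), a nightly or pre-commit job
would drop to a single thread with something like:

    ant test -Dtest.junit.threads=1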


On Mon, Oct 15, 2018 at 5:50 AM Andor Molnar <an...@cloudera.com.invalid>
wrote:

> +1
>
>
>
> On Mon, Oct 15, 2018 at 1:55 PM, Enrico Olivelli <eolive...@gmail.com>
> wrote:
>
> > On Mon, Oct 15, 2018 at 12:46 PM Andor Molnar
> > <an...@apache.org> wrote:
> > >
> > > Thank you guys. This is great help.
> > >
> > > I remember your efforts Bogdan: as far as I remember you observed
> > > thread starvation in multiple runs on Apache Jenkins. Correct me if
> > > I'm wrong.
> > >
> > > I’ve created an umbrella Jira to capture all flaky test fixing efforts
> > here:
> > > https://issues.apache.org/jira/browse/ZOOKEEPER-3170
> > >
> > > All previous flaky-related tickets have been converted to sub-tasks.
> > > Some of them might not be up-to-date, so please consider reviewing them
> > > and closing them if possible. Additionally, feel free to create new
> > > sub-tasks to capture your actual work.
> > >
> > > I've already modified the Trunk and branch-3.5 builds to run on 4
> > > threads initially for testing. It resulted in slightly more stable tests:
> >
> > +1
> >
> > I have assigned the umbrella issue to you Andor as you are driving
> > this important task. Is that ok?
> >
> > thank you
> >
> > Enrico
> >
> >
> > >
> > > Trunk (java 8) - failing 1/4 (since #229) - build time increased by 40-45%
> > > Trunk (java 9) - failing 0/2 (since #993) - ~40%
> > > Trunk (java 10) - failing 1/2 (since #280) -
> > > branch-3.5 (java 8) - failing 0/4 (since #1153) - ~35-45%
> > >
> > > However, the sample is not big enough yet and the results are
> > > inaccurate, so I need more builds. I also need to fix a bug in SSL to
> > > get the java9/10 builds working on 3.5.
> > >
> > > Please let me know if I should revert the changes. The precommit build
> > > is still running on 8 threads, but I'd like to change that one too.
> > >
> > > Regards,
> > > Andor
> > >
> > >
> > >
> > > > On 2018. Oct 15., at 9:31, Bogdan Kanivets <bkaniv...@gmail.com>
> > wrote:
> > > >
> > > > Fangmin,
> > > >
> > > > Those are good ideas.
> > > >
> > > > FYI, I've started running tests continuously on an AWS m1.xlarge:
> > > > https://github.com/lavacat/zookeeper-tests-lab
> > > >
> > > > So far, I've done ~12 runs of trunk. Same common offenders as in the
> > > > Flaky dash: testManyChildWatchersAutoReset and
> > > > testPurgeWhenLogRollingInProgress.
> > > > I'll do some more runs, then try to come up with a report.
> > > >
> > > > I'm using AWS and not the Apache Jenkins env because of better
> > > > control/observability.
> > > >
> > > >
> > > >
> > > >
> > > > On Sun, Oct 14, 2018 at 4:58 PM Fangmin Lv <lvfang...@gmail.com>
> > wrote:
> > > >
> > > >> Internally, we also did some work to reduce the flakiness; here are
> > > >> the main things we've done:
> > > >>
> > > >> * using a retry rule to retry in case the zk client loses its
> > > >> connection, which can happen if the quorum tests are running in an
> > > >> unstable environment and a leader election happens
> > > >> * using random ports instead of sequential ones to avoid port races
> > > >> when running tests concurrently
> > > >> * changing tests to avoid using the same test path when
> > > >> creating/deleting nodes
> > > >>
> > > >> These greatly reduced the flakiness internally; we should try them if
> > > >> we're seeing similar issues on Jenkins.
> > > >>
> > > >> Fangmin
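
As a rough illustration of the retry-rule idea above, here is a minimal
JUnit 4 sketch (the rule name and retry count are made up; it assumes the
test surfaces the lost connection as KeeperException.ConnectionLossException):

    import org.apache.zookeeper.KeeperException;
    import org.junit.rules.TestRule;
    import org.junit.runner.Description;
    import org.junit.runners.model.Statement;

    // Re-runs a test a few times when the ZK client loses its connection,
    // e.g. because a leader election happened on an overloaded test host.
    public class RetryOnConnectionLossRule implements TestRule {
        private final int maxAttempts;

        public RetryOnConnectionLossRule(int maxAttempts) {
            this.maxAttempts = maxAttempts;
        }

        @Override
        public Statement apply(final Statement base, Description description) {
            return new Statement() {
                @Override
                public void evaluate() throws Throwable {
                    Throwable last = null;
                    for (int i = 0; i < maxAttempts; i++) {
                        try {
                            base.evaluate(); // run the test once
                            return;          // passed, no retry needed
                        } catch (KeeperException.ConnectionLossException e) {
                            last = e;        // transient; try again
                        }
                    }
                    throw last;
                }
            };
        }
    }

A test class would then declare e.g.
@Rule public RetryOnConnectionLossRule retry = new RetryOnConnectionLossRule(3);
The same @Rule mechanism (JUnit's built-in TestName rule) can also hand each
test a unique znode path, which is one way to implement the third bullet.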
> > > >>
> > > >> On Sat, Oct 13, 2018 at 10:48 AM Bogdan Kanivets <
> bkaniv...@gmail.com
> > >
> > > >> wrote:
> > > >>
> > > >>> I've looked into the flakiness a couple of months ago (with special
> > > >>> attention to testManyChildWatchersAutoReset). In my opinion the
> > > >>> problem is a) and c). Unfortunately I don't have data to back this
> > > >>> claim.
> > > >>>
> > > >>> I don't remember seeing many 'port binding' exceptions, unless the
> > > >>> 'port assignment' issue manifested as some other exception.
> > > >>>
> > > >>> Before decreasing the number of threads, I think more data should
> > > >>> be collected/visualized:
> > > >>>
> > > >>> 1) The flaky dashboard is great, but we should add another report
> > > >>> that maps 'error causes' to builds/tests
> > > >>> 2) The flaky dash can be extended to save more history (for example
> > > >>> like this:
> > > >>> https://www.chromium.org/developers/testing/flakiness-dashboard)
> > > >>> 3) PreCommit builds should be included in the dashboard
> > > >>> 4) We should have a common clean benchmark. For example: take an AWS
> > > >>> t3.xlarge instance with a fixed Linux distro, JVM and ZK commit sha,
> > > >>> and run the tests (current 8 threads) for 8 hours with a 1 min
> > > >>> cooldown.
> > > >>>
> > > >>> Due to a recent employment change I got sidetracked, but I really
> > > >>> want to get to the bottom of this.
> > > >>> I'm going to set up 4) and report the results to this mailing list.
> > > >>> I'm also willing to work on other items.
> > > >>>
> > > >>>
> > > >>>
> > > >>>
> > > >>>
> > > >>>
> > > >>> On Sat, Oct 13, 2018 at 4:59 AM Enrico Olivelli <
> eolive...@gmail.com
> > >
> > > >>> wrote:
> > > >>>
> > > >>>> On Fri, Oct 12, 2018 at 11:17 PM Benjamin Reed <br...@apache.org>
> > > >>>> wrote:
> > > >>>>
> > > >>>>> i think the unique port assignment (b) is more problematic than
> > > >>>>> it appears. there is a race between finding a free port and
> > > >>>>> actually grabbing it. i think that contributes to the flakiness.
> > > >>>>>
> > > >>>>
> > > >>>> This is very hard to solve for our test cases, because we need to
> > > >>>> build the configs before starting the groups of servers.
> > > >>>> For single-server tests it will be easier: you just have to start
> > > >>>> the server on port zero, get the port and then create the client
> > > >>>> configs.
> > > >>>> I don't know how much it would be worth.
> > > >>>>
> > > >>>> Enrico
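
A sketch of the single-server variant Enrico describes, assuming ZooKeeper's
ServerCnxnFactory API (treat the exact calls and the 50-connection limit as
assumptions on my part):

    import java.io.File;
    import org.apache.zookeeper.server.ServerCnxnFactory;
    import org.apache.zookeeper.server.ZooKeeperServer;

    static String startServerOnAnyPort(File dataDir) throws Exception {
        // Start the server on port 0: the kernel assigns a free port
        // atomically at bind time, so there is no find-then-bind race.
        ZooKeeperServer zks = new ZooKeeperServer(dataDir, dataDir, 3000);
        ServerCnxnFactory factory = ServerCnxnFactory.createFactory(0, 50);
        factory.startup(zks);
        // Read back the port that was actually bound and build the
        // client connect string from it.
        return "127.0.0.1:" + factory.getLocalPort();
    }

The hard part Ben points at remains for quorum tests: every server's config
must exist before any server has bound its port, so port 0 mainly helps the
single-server case.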
> > > >>>>
> > > >>>>
> > > >>>>> ben
> > > >>>>> On Fri, Oct 12, 2018 at 8:50 AM Andor Molnar <an...@apache.org>
> > > >> wrote:
> > > >>>>>>
> > > >>>>>> That is a completely valid point. I started to investigate
> > > >>>>>> flakies for exactly the same reason, if you remember the thread
> > > >>>>>> that I started a while ago. It was later abandoned,
> > > >>>>>> unfortunately, because I ran into a few issues:
> > > >>>>>>
> > > >>>>>> - We nailed down that in order to release 3.5 stable, we have to
> > > >>>>>> make sure it's not worse than 3.4 by comparing the builds: but
> > > >>>>>> these builds are not comparable, because the 3.4 tests run single
> > > >>>>>> threaded while 3.5 runs multithreaded, showing problems which
> > > >>>>>> might also exist on 3.4,
> > > >>>>>>
> > > >>>>>> - Neither of them runs the C++ tests for some reason, but that's
> > > >>>>>> not really an issue here,
> > > >>>>>>
> > > >>>>>> - Looks like the tests on 3.5 are just as solid as on 3.4,
> > > >>>>>> because running them in a dedicated, single threaded environment
> > > >>>>>> shows almost all tests succeeding,
> > > >>>>>>
> > > >>>>>> - I think the root cause of the failing unit tests could be one
> > > >>>>>> (or more) of the following:
> > > >>>>>>        a) Environmental: the Jenkins slave gets overloaded with
> > > >>>>>> other builds and multithreaded test running makes things even
> > > >>>>>> worse: JDK threads are starving and ZK instances (both clients
> > > >>>>>> and servers) are unable to operate,
> > > >>>>>>        b) Conceptual: ZK unit tests were not designed to run on
> > > >>>>>> multiple threads: I investigated the unique port assignment
> > > >>>>>> feature, which looks good, but there could be other gaps which
> > > >>>>>> make them unreliable when running simultaneously,
> > > >>>>>>        c) Bad testing: testing ZK in the wrong way, making bad
> > > >>>>>> assumptions (e.g. not syncing clients), etc.
> > > >>>>>>        d) Bug in the server.
> > > >>>>>>
> > > >>>>>> I feel that finding case d) with these tests is super hard,
> > > >>>>>> because a test report doesn't give any information on what could
> > > >>>>>> have gone wrong inside ZooKeeper. More or less, guessing is your
> > > >>>>>> only option.
> > > >>>>>>
> > > >>>>>> Finding c) is a little bit easier: I'm trying to submit patches
> > > >>>>>> for those and hopefully I'm making some progress.
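
On the "not syncing clients" example under c), a minimal sketch of what the
fix looks like (the helper name is made up; ZooKeeper.sync() is the real
client API):

    import java.util.concurrent.CountDownLatch;
    import org.apache.zookeeper.AsyncCallback.VoidCallback;
    import org.apache.zookeeper.ZooKeeper;

    // Before asserting on data written through another session, force
    // this session's view to catch up with the leader; otherwise the
    // test races against normal (allowed) replication lag.
    static void syncAndWait(ZooKeeper zk, String path) throws InterruptedException {
        final CountDownLatch latch = new CountDownLatch(1);
        zk.sync(path, new VoidCallback() {
            @Override
            public void processResult(int rc, String p, Object ctx) {
                latch.countDown(); // view now at least as fresh as the sync point
            }
        }, null);
        latch.await();
    }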
> > > >>>>>>
> > > >>>>>> The huge pain in the arse though is a) and b): people desperately
> > > >>>>>> keep commenting "please retest this" on github to get a green
> > > >>>>>> build, while testing is going in a direction that hides real
> > > >>>>>> problems: I mean people have started not to care about a failing
> > > >>>>>> build, because "it must be some flaky test unrelated to my
> > > >>>>>> patch". Which is bad, but the shame is it's true in 90% of cases.
> > > >>>>>>
> > > >>>>>> I'm just trying to find some ways - besides fixing the c) and d)
> > > >>>>>> flakies - to get more reliable and more informative Jenkins
> > > >>>>>> builds. I don't want to make a huge turnaround, but I think if we
> > > >>>>>> can get a significantly more reliable build for the price of a
> > > >>>>>> slightly longer build time running on 4 threads instead of 8, I
> > > >>>>>> say let's do it.
> > > >>>>>>
> > > >>>>>> As always, any help from the community is more than welcome and
> > > >>>>> appreciated.
> > > >>>>>>
> > > >>>>>> Thanks,
> > > >>>>>> Andor
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>> On 2018. Oct 12., at 16:52, Patrick Hunt <ph...@apache.org>
> > > >> wrote:
> > > >>>>>>>
> > > >>>>>>> iirc the number of threads was increased to improve
> performance.
> > > >>>>> Reducing
> > > >>>>>>> is fine, but do we understand why it's failing? Perhaps it's
> > > >>> finding
> > > >>>>> real
> > > >>>>>>> issues as a result of the artificial concurrency/load.
> > > >>>>>>>
> > > >>>>>>> Patrick
> > > >>>>>>>
> > > >>>>>>> On Fri, Oct 12, 2018 at 7:12 AM Andor Molnar
> > > >>>>> <an...@cloudera.com.invalid>
> > > >>>>>>> wrote:
> > > >>>>>>>
> > > >>>>>>>> Thanks for the feedback.
> > > >>>>>>>> I'm running a few tests now, branch-3.5 on 2 threads and trunk
> > > >>>>>>>> on 4 threads, to see what the impact on the build time is.
> > > >>>>>>>>
> > > >>>>>>>> The Github PR job is hard to configure, because its settings
> > > >>>>>>>> are hard coded into a shell script in the codebase. I have to
> > > >>>>>>>> open a PR for that.
> > > >>>>>>>>
> > > >>>>>>>> Andor
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>> On Fri, Oct 12, 2018 at 2:46 PM, Norbert Kalmar <
> > > >>>>>>>> nkal...@cloudera.com.invalid> wrote:
> > > >>>>>>>>
> > > >>>>>>>>> +1, running the tests locally with 1 thread always passes
> > > >>>>>>>>> (well, I only ran it about 5 times, but still).
> > > >>>>>>>>> On the other hand, running them on 8 threads yields similarly
> > > >>>>>>>>> flaky results as the Apache runs. (It is much faster, but not
> > > >>>>>>>>> if we sometimes have to run it 6-8-10 times to get a green
> > > >>>>>>>>> run...)
> > > >>>>>>>>>
> > > >>>>>>>>> Norbert
> > > >>>>>>>>>
> > > >>>>>>>>> On Fri, Oct 12, 2018 at 2:05 PM Enrico Olivelli <
> > > >>>> eolive...@gmail.com
> > > >>>>>>
> > > >>>>>>>>> wrote:
> > > >>>>>>>>>
> > > >>>>>>>>>> +1
> > > >>>>>>>>>>
> > > >>>>>>>>>> Enrico
> > > >>>>>>>>>>
> > > >>>>>>>>>> On Fri, Oct 12, 2018 at 1:52 PM Andor Molnar <an...@apache.org>
> > > >>>>>>>>>> wrote:
> > > >>>>>>>>>>
> > > >>>>>>>>>>> Hi,
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> What do you think of changing the number of threads running
> > > >>>>>>>>>>> unit tests in Jenkins from the current 8 to 4 or even 2?
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> Running the unit tests on a single thread inside the
> > > >>>>>>>>>>> Cloudera environment shows the builds to be much more
> > > >>>>>>>>>>> stable. That would probably be too slow, but maybe at least
> > > >>>>>>>>>>> running fewer threads would improve the situation.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> It's getting very annoying that I cannot get a green build
> > > >>>>>>>>>>> on GitHub with only a few retests.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> Regards,
> > > >>>>>>>>>>> Andor
> > > >>>>>>>>>>>
> > > >>>>>>>>>> --
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>>> -- Enrico Olivelli
> > > >>>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>
> > > >>>>>>
> > > >>>>>
> > > >>>> --
> > > >>>>
> > > >>>>
> > > >>>> -- Enrico Olivelli
> > > >>>>
> > > >>>
> > > >>
> > >
> >
>
