Just to throw my 2c in here. It would also be good if we could run the
tests that failed "last time" first. That way at least if they are broken
then we'd fail fast rather than waiting forever. Not much sucks more than
waiting for 2 hours to find out a test failed when it fails regularly and
you could have known within minutes...
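
For illustration, here is a rough sketch of how that ordering could look,
assuming JUnit 5's ClassOrderer API (Kafka is on JUnit 4 today, so this is
purely a sketch) and a hypothetical file where a previous run recorded the
names of the classes that failed:

import org.junit.jupiter.api.ClassDescriptor;
import org.junit.jupiter.api.ClassOrderer;
import org.junit.jupiter.api.ClassOrdererContext;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.HashSet;
import java.util.Set;

// Orders test classes so that anything listed in a file of previously failed
// classes runs first; everything else keeps its usual relative order.
public class FailedFirstClassOrderer implements ClassOrderer {

    private final Set<String> previouslyFailed = loadFailedClasses();

    @Override
    public void orderClasses(ClassOrdererContext context) {
        context.getClassDescriptors().sort(
            Comparator.comparingInt((ClassDescriptor d) ->
                previouslyFailed.contains(d.getTestClass().getName()) ? 0 : 1));
    }

    private static Set<String> loadFailedClasses() {
        try {
            // Hypothetical location where the previous run recorded its failures.
            return new HashSet<>(Files.readAllLines(Paths.get("build/failed-classes.txt")));
        } catch (IOException e) {
            return new HashSet<>(); // first run, or nothing failed last time
        }
    }
}

It would be registered via the junit.jupiter.testclass.order.default
configuration parameter; the piece not shown is a listener that writes
failed-classes.txt at the end of each run.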

On Wed, 27 Mar 2019 at 15:17, Viktor Somogyi-Vass <viktorsomo...@gmail.com>
wrote:

> Hi All,
>
> I've created a PR for what we have internally for retrying flaky tests. Any
> reviews and ideas are welcome: https://github.com/apache/kafka/pull/6506
> It basically collects the failed classes and reruns them at the end. If
> they are successful, it overwrites the test report.
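
(The PR itself does this at the Gradle level; purely to make the mechanism
concrete, here is a minimal Java sketch of the same idea using the JUnit
Platform Launcher: run everything, collect the classes that contained a
failure, rerun only those, and fail only if the second pass also fails. The
package name is illustrative and the report-overwriting part is left out.)

import org.junit.platform.engine.support.descriptor.MethodSource;
import org.junit.platform.launcher.Launcher;
import org.junit.platform.launcher.LauncherDiscoveryRequest;
import org.junit.platform.launcher.core.LauncherFactory;
import org.junit.platform.launcher.listeners.SummaryGeneratingListener;
import org.junit.platform.launcher.listeners.TestExecutionSummary;

import java.util.Set;
import java.util.stream.Collectors;

import static org.junit.platform.engine.discovery.DiscoverySelectors.selectClass;
import static org.junit.platform.engine.discovery.DiscoverySelectors.selectPackage;
import static org.junit.platform.launcher.core.LauncherDiscoveryRequestBuilder.request;

// Run everything once, collect the classes that contained a failure, rerun
// only those classes, and fail the build only if the second run fails too.
public class RerunFailedClasses {

    public static void main(String[] args) {
        Launcher launcher = LauncherFactory.create();

        // First pass over the whole suite (the package name is illustrative).
        TestExecutionSummary first =
                run(launcher, request().selectors(selectPackage("kafka")).build());

        Set<String> failedClasses = first.getFailures().stream()
                .map(f -> f.getTestIdentifier().getSource())
                .filter(src -> src.isPresent() && src.get() instanceof MethodSource)
                .map(src -> ((MethodSource) src.get()).getClassName())
                .collect(Collectors.toSet());

        if (failedClasses.isEmpty())
            return; // green on the first try

        // Second pass: only the classes that had at least one failure.
        TestExecutionSummary second = run(launcher,
                request().selectors(failedClasses.stream()
                        .map(name -> selectClass(name))
                        .collect(Collectors.toList())).build());

        if (second.getTotalFailureCount() > 0)
            throw new AssertionError("Still failing after a rerun, so probably not flaky");
    }

    private static TestExecutionSummary run(Launcher launcher, LauncherDiscoveryRequest request) {
        SummaryGeneratingListener listener = new SummaryGeneratingListener();
        launcher.execute(request, listener);
        return listener.getSummary();
    }
}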
>
> Thanks,
> Viktor
>
> On Mon, Mar 11, 2019 at 1:05 PM Stanislav Kozlovski <
> stanis...@confluent.io>
> wrote:
>
> > I agree with Ron.
> > I think improving the framework with a configurable number of retries on
> > some tests will yield the highest ROI in terms of passing builds.
> >
> > On Fri, Mar 8, 2019 at 10:48 PM Ron Dagostino <rndg...@gmail.com> wrote:
> >
> > > It's a classic problem: you can't string N things together serially and
> > > expect high reliability.  5,000 tests in a row isn't going to give you a
> > > bunch of 9's.  It feels to me that the test frameworks themselves should
> > > support a more robust model -- like a way to tag a test as "retry me up
> > > to N times before you really consider me a failure" or something like
> > > that.
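
A minimal sketch of what such a tag could look like with JUnit 4 (which
Kafka's tests use today): a TestRule that retries a test up to N times and
only surfaces the failure if every attempt fails. The class below is
illustrative, not an existing Kafka utility.

import org.junit.rules.TestRule;
import org.junit.runner.Description;
import org.junit.runners.model.Statement;

// Retries the wrapped test up to maxAttempts times; the failure is only
// reported if every attempt fails.
public class Retry implements TestRule {

    private final int maxAttempts;

    public Retry(int maxAttempts) {
        this.maxAttempts = maxAttempts;
    }

    @Override
    public Statement apply(Statement base, Description description) {
        return new Statement() {
            @Override
            public void evaluate() throws Throwable {
                Throwable last = null;
                for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                    try {
                        base.evaluate();
                        return;             // passed on this attempt
                    } catch (Throwable t) {
                        last = t;
                        System.err.println(description + ": attempt " + attempt + " failed");
                    }
                }
                throw last;                 // every attempt failed
            }
        };
    }
}

A test would opt in with @Rule public final Retry retry = new Retry(3); so
only the tests explicitly marked this way get the extra attempts.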
> > >
> > > Ron
> > >
> > > On Fri, Mar 8, 2019 at 11:40 AM Stanislav Kozlovski <
> > > stanis...@confluent.io>
> > > wrote:
> > >
> > > > > We have had an improvement internally for half a year now which
> > > > > reruns the flaky test classes at the end of the test Gradle task,
> > > > > lets you know that they were rerun and are probably flaky. It fails
> > > > > the build only if the second run of the test class was also
> > > > > unsuccessful. I think it works pretty well; we mostly have green
> > > > > builds. If there is interest, I can try to contribute that.
> > > >
> > > > That does sound very intriguing. Does it rerun the test classes that
> > > > failed or some known, marked classes? If it is the former, I can see a
> > > > lot of value in having that automated in our PR builds. I wonder what
> > > > others think of this.
> > > >
> > > > On Thu, Feb 28, 2019 at 6:04 PM Viktor Somogyi-Vass <
> > > > viktorsomo...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hey All,
> > > > >
> > > > > Thanks for the loads of ideas.
> > > > >
> > > > > @Stanislav, @Sonke
> > > > > I probably left it out of my email, but I really imagined this as a
> > > > > case-by-case change. If we think it wouldn't cause problems, then it
> > > > > might be applied. That way we'd limit the blast radius somewhat.
> > > > > The one-hour gain is really just the most optimistic scenario; I'm
> > > > > almost sure that not every test could be transformed to use a common
> > > > > cluster.
> > > > > We have had an improvement internally for half a year now which
> > > > > reruns the flaky test classes at the end of the test Gradle task,
> > > > > lets you know that they were rerun and are probably flaky. It fails
> > > > > the build only if the second run of the test class was also
> > > > > unsuccessful. I think it works pretty well; we mostly have green
> > > > > builds. If there is interest, I can try to contribute that.
> > > > >
> > > > > > I am also extremely annoyed at times by the amount of coffee I
> > > > > > have to drink before tests finish
> > > > > Just please don't get a heart attack :)
> > > > >
> > > > > @Ron, @Colin
> > > > > You bring up a very good point that it is easier and frees up more
> > > > > resources if we just run change-specific tests, and it's good to
> > > > > know that a similar solution (meaning using a shared resource for
> > > > > testing) has failed elsewhere. I second Ron on the test
> > > > > categorization, although as a first attempt I think using a flaky
> > > > > retry + running only the necessary tests would help in both time
> > > > > saving and effectiveness. It would also be easier to achieve.
> > > > >
> > > > > @Ismael
> > > > > Yea, it'd be interesting to profile the startup/shutdown; I've never
> > > > > done that. Perhaps I'll set aside some time for it :). It's
> > > > > definitely true though that if we see a significant delay there, we
> > > > > wouldn't just improve the efficiency of the tests but also the
> > > > > customer experience.
> > > > >
> > > > > Best,
> > > > > Viktor
> > > > >
> > > > >
> > > > >
> > > > > On Thu, Feb 28, 2019 at 8:12 AM Ismael Juma <isma...@gmail.com>
> > wrote:
> > > > >
> > > > > > It's an idea that has come up before and is worth exploring
> > > > > > eventually. However, I'd first try to optimize the server
> > > > > > startup/shutdown process. If we measure where the time is going,
> > > > > > maybe some opportunities will present themselves.
> > > > > >
> > > > > > Ismael
> > > > > >
> > > > > > On Wed, Feb 27, 2019, 3:09 AM Viktor Somogyi-Vass <
> > > > > viktorsomo...@gmail.com
> > > > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Hi Folks,
> > > > > > >
> > > > > > > I've been observing lately that the unit tests usually take 2.5
> > > > > > > hours to run, and a very big portion of that is the core tests,
> > > > > > > where a new cluster is spun up for every test. This takes most
> > > > > > > of the time. I ran one test class (TopicCommandWithAdminClient,
> > > > > > > with 38 tests inside) through the profiler, and it shows for
> > > > > > > instance that running the whole class took 10 minutes and 37
> > > > > > > seconds, of which the useful time was 5 minutes 18 seconds.
> > > > > > > That's a 100% overhead. Without the profiler the whole class
> > > > > > > takes 7 minutes and 48 seconds, so the useful time would be
> > > > > > > between 3 and 4 minutes. This is a bigger test class though;
> > > > > > > most of them won't take this much.
> > > > > > > There are 74 classes that implement KafkaServerTestHarness, and
> > > > > > > just running :core:integrationTest takes almost 2 hours.
> > > > > > >
> > > > > > > I think we could greatly speed up these integration tests by
> > > > > > > creating the cluster just once per class and performing the
> > > > > > > tests in separate methods. I know this somewhat contradicts the
> > > > > > > principle that tests should be independent, but recreating the
> > > > > > > cluster for each test is a very expensive operation. Also, if
> > > > > > > the tests act on different resources (different topics, etc.),
> > > > > > > then it might not hurt their independence. There will of course
> > > > > > > be cases where this is not possible, but I think there could be
> > > > > > > a lot where it is.
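
As an illustration of the once-per-class pattern, in JUnit 4 terms
(EmbeddedCluster below is a hypothetical stand-in for the brokers that
KafkaServerTestHarness manages, not a real class):

import org.junit.AfterClass;
import org.junit.BeforeClass;
import org.junit.Test;

// The brokers are started in @BeforeClass instead of @Before, so the startup
// and shutdown cost is paid once per class rather than once per test method.
public class TopicCommandSharedClusterTest {

    private static EmbeddedCluster cluster;   // hypothetical harness type

    @BeforeClass
    public static void startCluster() {
        cluster = new EmbeddedCluster(3);     // three brokers, started once
        cluster.start();
    }

    @AfterClass
    public static void stopCluster() {
        cluster.shutdown();
    }

    @Test
    public void testCreateTopic() {
        // Each test works on its own topic, so sharing the brokers does not
        // couple the tests to each other.
        cluster.createTopic("create-topic-test", 1, 1);
        // ... assertions against the shared cluster ...
    }

    @Test
    public void testDeleteTopic() {
        cluster.createTopic("delete-topic-test", 1, 1);
        cluster.deleteTopic("delete-topic-test");
        // ... assertions ...
    }
}

Tests that mutate cluster-wide state (broker configs, ACLs, etc.) would still
need their own cluster, which is where the case-by-case judgment comes in.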
> > > > > > >
> > > > > > > In the optimal case we could cut the testing time back by
> > > > > > > approximately an hour. This would save resources and give
> > > > > > > quicker feedback for PR builds.
> > > > > > >
> > > > > > > What are your thoughts?
> > > > > > > Has anyone thought about this or were there any attempts made?
> > > > > > >
> > > > > > > Best,
> > > > > > > Viktor
> > > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > Best,
> > > > Stanislav
> > > >
> > >
> >
> >
> > --
> > Best,
> > Stanislav
> >
>
