> We internally have had an improvement for half a year now which reruns the
flaky test classes at the end of the test Gradle task and lets you know that
they were rerun and are probably flaky. It fails the build only if the second
run of the test class was also unsuccessful. I think it works pretty well;
we mostly have green builds. If there is interest, I can try to contribute
that.

That does sound very intriguing. Does it rerun the test classes that failed,
or some known, marked classes? If it's the former, I can see a lot of value
in having that automated in our PR builds. I wonder what others think of
this.
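For what it's worth, the policy described above (rerun failed classes once, fail the build only on a second failure) is easy to model. Below is a toy Java sketch of that decision logic only; it is not the actual Gradle integration, and the names are hypothetical:

```java
import java.util.*;
import java.util.function.*;

public class FlakyRetryDemo {
    // A "test class" is modeled as a supplier returning true on success.
    // Runs every class once, reruns the failures, and returns only the
    // classes that failed twice (the ones that should fail the build).
    static List<String> runWithRetry(Map<String, Supplier<Boolean>> classes) {
        List<String> firstFailures = new ArrayList<>();
        for (Map.Entry<String, Supplier<Boolean>> e : classes.entrySet()) {
            if (!e.getValue().get()) firstFailures.add(e.getKey());
        }
        List<String> hardFailures = new ArrayList<>();
        for (String name : firstFailures) {
            System.out.println(name + " failed once; rerunning (probably flaky)");
            if (!classes.get(name).get()) hardFailures.add(name);
        }
        return hardFailures; // fail the build only if this is non-empty
    }

    public static void main(String[] args) {
        Iterator<Boolean> flakyRuns = List.of(false, true).iterator();
        Map<String, Supplier<Boolean>> classes = new LinkedHashMap<>();
        classes.put("FlakyTest", flakyRuns::next);  // fails first, passes on rerun
        classes.put("BrokenTest", () -> false);     // fails both times
        System.out.println("build-failing classes: " + runWithRetry(classes));
    }
}
```

With the sample input, only BrokenTest survives the retry and would fail the build; FlakyTest is merely reported as probably flaky.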

On Thu, Feb 28, 2019 at 6:04 PM Viktor Somogyi-Vass <viktorsomo...@gmail.com>
wrote:

> Hey All,
>
> Thanks for the loads of ideas.
>
> @Stanislav, @Sonke
> I probably left it out of my email, but I really imagined this as a
> case-by-case change. If we think that it wouldn't cause problems, then it
> might be applied. That way we'd limit the blast radius somewhat. The
> one-hour gain is really just the most optimistic scenario; I'm almost sure
> that not every test could be transformed to use a common cluster.
> We internally have had an improvement for half a year now which reruns the
> flaky test classes at the end of the test Gradle task and lets you know that
> they were rerun and are probably flaky. It fails the build only if the second
> run of the test class was also unsuccessful. I think it works pretty well;
> we mostly have green builds. If there is interest, I can try to contribute
> that.
>
> >I am also extremely annoyed at times by the amount of coffee I have to
> >drink before tests finish
> Just please don't get a heart attack :)
>
> @Ron, @Colin
> You bring up a very good point: it is easier and frees up more resources
> if we just run change-specific tests, and it's good to know that a similar
> solution (meaning using a shared resource for testing) has failed
> elsewhere. I second Ron on the test categorization, although as a first
> attempt I think a flaky retry plus running only the necessary tests would
> help with both time savings and effectiveness. It would also be easier to
> achieve.
>
> @Ismael
> Yeah, it'd be interesting to profile the startup/shutdown; I've never done
> that. Perhaps I'll set some time apart for it :). It's definitely true,
> though, that if we see a significant delay there, we wouldn't just improve
> the efficiency of the tests but also the customer experience.
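Even before reaching for a profiler, a coarse wall-clock measurement around the expensive phase would show whether startup dominates. A minimal sketch, with a simulated startup standing in for the real broker lifecycle (the method name here is hypothetical):

```java
public class StartupTimer {
    // Hypothetical stand-in for an expensive setup phase such as
    // spinning up a broker; sleeps to simulate the cost.
    static void simulatedStartup() throws InterruptedException {
        Thread.sleep(50);
    }

    public static void main(String[] args) throws InterruptedException {
        long t0 = System.nanoTime();
        simulatedStartup();
        long startupMs = (System.nanoTime() - t0) / 1_000_000;
        System.out.println("startup took ~" + startupMs + " ms");
    }
}
```

Wrapping the real setup and teardown calls the same way, and summing across a test run, would give a first estimate of the total overhead without any profiler setup.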
>
> Best,
> Viktor
>
>
>
> On Thu, Feb 28, 2019 at 8:12 AM Ismael Juma <isma...@gmail.com> wrote:
>
> > It's an idea that has come up before and is worth exploring eventually.
> > However, I'd first try to optimize the server startup/shutdown process. If
> > we measure where the time is going, maybe some opportunities will present
> > themselves.
> >
> > Ismael
> >
> > On Wed, Feb 27, 2019, 3:09 AM Viktor Somogyi-Vass <viktorsomo...@gmail.com>
> > wrote:
> >
> > > Hi Folks,
> > >
> > > I've been observing lately that unit tests usually take 2.5 hours to run,
> > > and a very big portion of these are the core tests, where a new cluster is
> > > spun up for every test. This takes most of the time. I ran a test class
> > > (TopicCommandWithAdminClient, with 38 tests inside) through the profiler,
> > > and it shows, for instance, that running the whole class took 10 minutes
> > > and 37 seconds, of which the useful time was 5 minutes 18 seconds. That's
> > > 100% overhead. Without the profiler the whole class takes 7 minutes and 48
> > > seconds, so the useful time would be between 3 and 4 minutes. This is a
> > > bigger test though; most of them won't take this much.
> > > There are 74 classes that implement KafkaServerTestHarness, and just
> > > running :core:integrationTest takes almost 2 hours.
> > >
> > > I think we could greatly speed up these integration tests by creating the
> > > cluster once per class and performing the tests in separate methods. I
> > > know that this contradicts a little the principle that tests should be
> > > independent, but it seems that recreating clusters for each one is a very
> > > expensive operation. Also, if the tests act on different resources
> > > (different topics, etc.), then it might not hurt their independence. There
> > > might be cases of course where this is not possible, but I think there
> > > could be a lot where it is.
> > >
> > > In the optimal case we could cut the testing time back by approximately
> > > an hour. This would save resources and give quicker feedback for PR
> > > builds.
> > >
> > > What are your thoughts?
> > > Has anyone thought about this, or have there been any attempts?
> > >
> > > Best,
> > > Viktor
> > >
> >
>
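For reference, the once-per-class idea in the original mail maps onto JUnit's class-level fixtures (@BeforeClass/@AfterClass). The toy sketch below models only the cost accounting, with hypothetical stand-ins for the real KafkaServerTestHarness lifecycle:

```java
import java.util.List;

public class SharedClusterSketch {
    static int clusterStartups = 0;

    // Hypothetical stand-ins for the expensive cluster lifecycle;
    // the real setup lives in KafkaServerTestHarness in core's test code.
    static void startCluster() { clusterStartups++; }
    static void stopCluster()  { }

    // Once-per-class pattern: one cluster shared by all test methods,
    // where each method works against its own topic so the tests stay
    // independent despite the shared infrastructure.
    static int runTestClass(List<String> testTopics) {
        startCluster();                       // would be @BeforeClass in JUnit 4
        for (String topic : testTopics) {
            System.out.println("running test against " + topic);
        }
        stopCluster();                        // would be @AfterClass
        return clusterStartups;
    }

    public static void main(String[] args) {
        int startups = runTestClass(List.of("topic-a", "topic-b", "topic-c"));
        System.out.println("cluster startups for 3 tests: " + startups);
    }
}
```

The point of the pattern is visible in the counter: three test methods share one cluster startup instead of paying for three, which is where the projected hour of savings would come from.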


-- 
Best,
Stanislav
