Re: Speeding up integration tests

2019-04-01 Thread Damian Guy
Just to throw my 2c in here. It would also be good if we could run the
tests that failed "last time" first. That way at least if they are broken
then we'd fail fast rather than waiting forever. Not much sucks more than
waiting for 2 hours to find out a test failed when it fails regularly and
you could have known within minutes...
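
Damian's run-the-last-failures-first idea can be sketched language-agnostically; here is a minimal Python illustration (the failure record and test names are hypothetical, not an existing Gradle/JUnit feature):

```python
def order_tests(tests, previously_failed):
    """Schedule tests that failed last time first, so a persistent
    breakage fails the build within minutes instead of hours.

    `previously_failed` would come from the previous build's test
    report; here it is just a set of names, for illustration."""
    failed_first = [t for t in tests if t in previously_failed]
    the_rest = [t for t in tests if t not in previously_failed]
    return failed_first + the_rest

tests = ["TopicCommandTest", "ProducerTest", "ConsumerTest", "LogCleanerTest"]
previously_failed = {"ConsumerTest", "LogCleanerTest"}
print(order_tests(tests, previously_failed))
```

The relative order within each group is preserved, so the scheduler stays deterministic apart from the promotion of known failures.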


Re: Speeding up integration tests

2019-03-27 Thread Viktor Somogyi-Vass
Hi All,

I've created a PR for what we have internally for retrying flaky tests. Any
reviews and ideas are welcome: https://github.com/apache/kafka/pull/6506
It basically collects the failed classes and reruns them at the end. If
they are successful, it overwrites the test report.

Thanks,
Viktor
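
The rerun-at-the-end behaviour described above can be sketched as follows (a hypothetical Python model of the idea, not the actual code in the PR):

```python
def run_with_final_rerun(test_classes, run_class):
    """Run each test class once, collect the failures, rerun the failed
    classes at the end, and let a second-pass success overwrite the
    first result, so the report only flags classes that failed twice."""
    report = {}
    failed = []
    for cls in test_classes:
        report[cls] = run_class(cls)
        if not report[cls]:
            failed.append(cls)
    still_failing = []
    for cls in failed:  # second pass, at the end of the build
        report[cls] = run_class(cls)  # a success here overwrites the failure
        if not report[cls]:
            still_failing.append(cls)
    return report, still_failing

# A stub runner for illustration: "FlakyTest" fails on its first attempt only.
attempts = {}
def run_class(cls):
    attempts[cls] = attempts.get(cls, 0) + 1
    if cls == "FlakyTest":
        return attempts[cls] >= 2
    return True

report, still_failing = run_with_final_rerun(["GoodTest", "FlakyTest"], run_class)
print(report, still_failing)
```

The build is marked red only when `still_failing` is non-empty, which matches the "fails the build only if the second run was also unsuccessful" description.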


Re: Speeding up integration tests

2019-03-11 Thread Stanislav Kozlovski
I agree with Ron.
I think improving the framework with a configurable number of retries on
some tests will yield the highest ROI in terms of passing builds.


Re: Speeding up integration tests

2019-03-08 Thread Ron Dagostino
It's a classic problem: you can't string N things together serially and
expect high reliability.  5,000 tests in a row isn't going to give you a
bunch of 9's.  It feels to me that the test frameworks themselves should
support a more robust model -- like a way to tag a test as "retry me up to
N times before you really consider me a failure" or something like that.

Ron


Re: Speeding up integration tests

2019-03-08 Thread Stanislav Kozlovski
> We have had an internal improvement for half a year now which reruns the
> failed test classes at the end of the test Gradle task and lets you know that
> they were rerun and are probably flaky. It fails the build only if the second
> run of the test class was also unsuccessful. I think it works pretty well; we
> mostly have green builds. If there is interest, I can try to contribute that.

That does sound very intriguing. Does it rerun the test classes that failed
or some known, marked classes? If it is the former, I can see a lot of
value in having that automated in our PR builds. I wonder what others think
of this.



-- 
Best,
Stanislav


Re: Speeding up integration tests

2019-02-28 Thread Viktor Somogyi-Vass
Hey All,

Thanks for the loads of ideas.

@Stanislav, @Sonke
I probably left it out of my email, but I really imagined this as a
case-by-case change. If we think that it wouldn't cause problems,
then it might be applied. That way we'd limit the blast radius somewhat.
The one-hour gain is really just the most optimistic scenario; I'm almost
sure that not every test could be transformed to use a common cluster.
We have had an internal improvement for half a year now which reruns the
failed test classes at the end of the test Gradle task and lets you know that
they were rerun and are probably flaky. It fails the build only if the second
run of the test class was also unsuccessful. I think it works pretty well; we
mostly have green builds. If there is interest, I can try to contribute that.

>I am also extremely annoyed at times by the amount of coffee I have to
>drink before tests finish
Just please don't get a heart attack :)

@Ron, @Colin
You bring up a very good point: it is easier and frees up more resources
if we just run change-specific tests, and it's good to know that a similar
solution (using a shared resource for testing) has failed elsewhere. I
second Ron on the test categorization, though as a first attempt I think a
flaky-test retry plus running only the necessary tests would help with both
time saving and effectiveness. It would also be easier to achieve.

@Ismael
Yeah, it'd be interesting to profile the startup/shutdown; I've never done
that. Perhaps I'll set aside some time for it :). It's definitely true
though that if we see a significant delay there, fixing it wouldn't just
improve the efficiency of the tests but also the customer experience.

Best,
Viktor





Re: Speeding up integration tests

2019-02-27 Thread Ismael Juma
It's an idea that has come up before and worth exploring eventually.
However, I'd first try to optimize the server startup/shutdown process. If
we measure where the time is going, maybe some opportunities will present
themselves.

Ismael

On Wed, Feb 27, 2019, 3:09 AM Viktor Somogyi-Vass 
wrote:

> Hi Folks,
>
> I've been observing lately that unit tests usually take 2.5 hours to run
> and a very big portion of these are the core tests where a new cluster is
> spun up for every test. This takes most of the time. I ran a test
> (TopicCommandWithAdminClient, with 38 tests inside) through the profiler, and
> it shows, for instance, that running the whole class took 10 minutes
> and 37 seconds, of which the useful time was 5 minutes 18 seconds. That's
> 100% overhead. Without the profiler the whole class takes 7 minutes and 48
> seconds, so the useful time would be between 3-4 minutes. This is a bigger
> test though; most of them won't take this much.
> There are 74 classes that implement KafkaServerTestHarness and just running
> :core:integrationTest takes almost 2 hours.
>
> I think we could greatly speed up these integration tests by creating
> the cluster once per class and performing the tests in separate methods. I
> know that this somewhat contradicts the principle that tests should
> be independent, but recreating the cluster for each test is a very
> expensive operation. Also, if the tests act on different resources
> (different topics, etc.) then it might not hurt their independence. There
> might be cases, of course, where this is not possible, but I think there could
> be a lot where it is.
>
> In the optimal case we could cut the testing time back by approximately an
> hour. This would save resources and give quicker feedback for PR builds.
>
> What are your thoughts?
> Has anyone thought about this or were there any attempts made?
>
> Best,
> Viktor
>


Re: Speeding up integration tests

2019-02-27 Thread Ron Dagostino
Hi Colin.

On Wed, Feb 27, 2019, Colin  wrote:

> On Wed, Feb 27, 2019, at 10:02, Ron Dagostino wrote:
> > Hi everyone.  Maybe providing the option to run it both ways -- start your
> > own cluster vs. using one that is pre-started -- might be useful?  Don't
> > know how it would work or if it would be useful, but it is something to
> > think about.
> >
> > Also, while the argument against using a pre-started cluster due to an
> > expected increase in test flakiness is reasonable, it is conjecture and may
> > not turn out to be correct; if it isn't too much trouble it might be worth
> > it to actually see if the argument is right, since the decreased-time
> > benefit is less conjecture/more real.
>
> It's not just a conjecture.  We tried sharing the same cluster for
> multiple test cases in Hadoop.  It often led to difficult to debug tests.
> A test can change the cluster state in a subtle way that causes a following
> test to fail.  The ordering that tests get run in JUnit is also random,
> which makes this even more frustrating for developers.
>
> An easier solution to the problem of long test run times is to run only
> the tests that are affected by a particular change.  For example, if you
> make a change to Connect, you shouldn't need to re-run the tests for
> Clients, since your change doesn't impact any of that code.  We had this
> set up in Hadoop, and it helped free up a lot of Jenkins time.
>
> best,
> Colin
>

Re: Speeding up integration tests

2019-02-27 Thread Colin McCabe
On Wed, Feb 27, 2019, at 10:02, Ron Dagostino wrote:
> Hi everyone.  Maybe providing the option to run it both ways -- start your
> own cluster vs. using one that is pre-started -- might be useful?  Don't
> know how it would work or if it would be useful, but it is something to
> think about.
> 
> Also, while the argument against using a pre-started cluster due to an
> expected increase in test flakiness is reasonable, it is conjecture and may
> not turn out to be correct; if it isn't too much trouble it might be worth
> it to actually see if the argument is right, since the decreased time
> benefit is less conjecture/more real.

It's not just a conjecture.  We tried sharing the same cluster for multiple 
test cases in Hadoop.  It often led to difficult-to-debug tests.  A test can 
change the cluster state in a subtle way that causes a following test to fail.  
The ordering that tests get run in JUnit is also random, which makes this even 
more frustrating for developers.

An easier solution to the problem of long test run times is to run only the 
tests that are affected by a particular change.  For example, if you make a 
change to Connect, you shouldn't need to re-run the tests for Clients, since 
your change doesn't impact any of that code.  We had this set up in Hadoop, and 
it helped free up a lot of Jenkins time.
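
As a rough sketch (the module list and task naming below are illustrative assumptions, not how Kafka's build is actually wired), the selection could be as simple as mapping changed file paths to per-module Gradle test tasks:

```python
def affected_test_tasks(changed_files,
                        modules=("clients", "core", "connect", "streams")):
    # Treat the first path component as the Gradle module and run only the
    # test tasks for modules actually touched by the change.
    touched = {path.split("/", 1)[0] for path in changed_files}
    return sorted(":%s:integrationTest" % m for m in touched if m in modules)

tasks = affected_test_tasks([
    "connect/runtime/src/main/java/Foo.java",
    "docs/upgrade.html",
])
# A change under connect/ plus a docs-only change maps to one test task.
```

A real implementation would also have to follow cross-module dependencies (a clients change affects core, and so on), but even this coarse mapping would skip a lot of unrelated runs.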

best,
Colin


Re: Speeding up integration tests

2019-02-27 Thread Ron Dagostino
Hi everyone.  Maybe providing the option to run it both ways -- start your
own cluster vs. using one that is pre-started -- might be useful?  Don't
know how it would work or if it would be useful, but it is something to
think about.

Also, while the argument against using a pre-started cluster due to an
expected increase in test flakiness is reasonable, it is conjecture and may
not turn out to be correct; if it isn't too much trouble it might be worth
it to actually see if the argument is right, since the decreased time
benefit is less conjecture/more real.

Ron

On Wed, Feb 27, 2019 at 10:39 AM Sönke Liebau
 wrote:

> Hi,
>
> while I am also extremely annoyed at times by the amount of coffee I
> have to drink before tests finish, I think the argument about flaky
> tests is valid! The current setup has the benefit that every test case
> runs on a pristine cluster; if we changed this, we'd need to go through
> all tests and ensure that topic names are different. That can
> probably be abstracted to include a timestamp in the name or something
> like that, but it is an additional failure potential.
> Add to this the fact that "JUnit runs tests using a deterministic, but
> unpredictable order" and the water gets even muddier. Potentially this
> might mean that adding an additional test case changes the order in
> which existing test cases are executed, which might mean that all of a
> sudden something breaks that you didn't even touch.
>
> Best regards,
> Sönke
>
>
>
> --
> Sönke Liebau
> Partner
> Tel. +49 179 7940878
> OpenCore GmbH & Co. KG - Thomas-Mann-Straße 8 - 22880 Wedel - Germany
>


Re: Speeding up integration tests

2019-02-27 Thread Sönke Liebau
Hi,

while I am also extremely annoyed at times by the amount of coffee I
have to drink before tests finish, I think the argument about flaky
tests is valid! The current setup has the benefit that every test case
runs on a pristine cluster; if we changed this, we'd need to go through
all tests and ensure that topic names are different. That can
probably be abstracted to include a timestamp in the name or something
like that, but it is an additional failure potential.
Add to this the fact that "JUnit runs tests using a deterministic, but
unpredictable order" and the water gets even muddier. Potentially this
might mean that adding an additional test case changes the order in
which existing test cases are executed, which might mean that all of a
sudden something breaks that you didn't even touch.
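
The timestamp-in-the-name idea could be as small as a helper like this (a hedged sketch; `unique_topic` is hypothetical, not something the test harness provides today):

```python
import time
import uuid

def unique_topic(base):
    # Timestamp plus a short random suffix: two tests (or two runs of the
    # same test) sharing one cluster never reuse a topic name.
    return "%s-%d-%s" % (base, int(time.time() * 1000), uuid.uuid4().hex[:8])

t1 = unique_topic("topic-command-test")
t2 = unique_topic("topic-command-test")
# t1 and t2 share the prefix but are guaranteed to differ.
```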

Best regards,
Sönke


On Wed, Feb 27, 2019 at 2:36 PM Stanislav Kozlovski
 wrote:
>
> Hey Viktor,
>
> I am all for the idea of speeding up the tests. Running the
> `:core:integrationTest` command takes an absurd amount of time as it is, and
> it will only keep growing if we don't do anything about it.
> Having said that, I am very scared that your proposal might significantly
> increase the test flakiness of current and future tests - test flakiness is
> a huge problem we're battling. We don't get green PR builds very often - it
> is very common for one or two flaky tests to fail in each PR.
> We have also found it hard to get a green build for the 2.2 release (
> https://jenkins.confluent.io/job/apache-kafka-test/job/2.2/).



--
Sönke Liebau
Partner
Tel. +49 179 7940878
OpenCore GmbH & Co. KG - Thomas-Mann-Straße 8 - 22880 Wedel - Germany


Re: Speeding up integration tests

2019-02-27 Thread Stanislav Kozlovski
Hey Viktor,

I am all for the idea of speeding up the tests. Running the
`:core:integrationTest` command takes an absurd amount of time as it is, and
it will only keep growing if we don't do anything about it.
Having said that, I am very scared that your proposal might significantly
increase the test flakiness of current and future tests - test flakiness is
a huge problem we're battling. We don't get green PR builds very often - it
is very common for one or two flaky tests to fail in each PR.
We have also found it hard to get a green build for the 2.2 release (
https://jenkins.confluent.io/job/apache-kafka-test/job/2.2/).

On Wed, Feb 27, 2019 at 11:09 AM Viktor Somogyi-Vass <
viktorsomo...@gmail.com> wrote:

> Hi Folks,
>
> I've been observing lately that the unit tests usually take 2.5 hours to run,
> and a very big portion of that is the core tests, where a new cluster is
> spun up for every test. This takes most of the time. I ran one test class
> (TopicCommandWithAdminClient, with 38 tests inside) through the profiler, and
> it shows for instance that running the whole class took 10 minutes
> and 37 seconds, of which the useful time was 5 minutes 18 seconds. That's
> 100% overhead. Without the profiler the whole class takes 7 minutes and 48
> seconds, so the useful time would be between 3 and 4 minutes. This is a bigger
> test class though; most of them won't take this much.
> There are 74 classes that implement KafkaServerTestHarness, and just running
> :core:integrationTest takes almost 2 hours.
>
> I think we could greatly speed up these integration tests by creating
> the cluster once per class and performing the tests in separate methods. I
> know that this somewhat contradicts the principle that tests should
> be independent, but recreating the cluster for each test is a very
> expensive operation. Also, if the tests act on different resources
> (different topics, etc.), it might not hurt their independence. There
> might of course be cases where this is not possible, but I think there could
> be a lot where it is.
>
> In the optimal case we could cut the testing time back by approximately an
> hour. This would save resources and give quicker feedback for PR builds.
>
> What are your thoughts?
> Has anyone thought about this or were there any attempts made?
>
> Best,
> Viktor
>


-- 
Best,
Stanislav


Speeding up integration tests

2019-02-27 Thread Viktor Somogyi-Vass
Hi Folks,

I've been observing lately that the unit tests usually take 2.5 hours to run,
and a very big portion of that is the core tests, where a new cluster is
spun up for every test. This takes most of the time. I ran one test class
(TopicCommandWithAdminClient, with 38 tests inside) through the profiler, and
it shows for instance that running the whole class took 10 minutes
and 37 seconds, of which the useful time was 5 minutes 18 seconds. That's
100% overhead. Without the profiler the whole class takes 7 minutes and 48
seconds, so the useful time would be between 3 and 4 minutes. This is a bigger
test class though; most of them won't take this much.
There are 74 classes that implement KafkaServerTestHarness, and just running
:core:integrationTest takes almost 2 hours.
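
(Double-checking that overhead figure from the two measurements:)

```python
# Profiler numbers for the TopicCommandWithAdminClient class.
total_s = 10 * 60 + 37   # 10 minutes 37 seconds wall time for the class
useful_s = 5 * 60 + 18   # 5 minutes 18 seconds of useful test time
overhead_pct = (total_s - useful_s) / useful_s * 100  # roughly 100%
```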

I think we could greatly speed up these integration tests by creating
the cluster once per class and performing the tests in separate methods. I
know that this somewhat contradicts the principle that tests should
be independent, but recreating the cluster for each test is a very
expensive operation. Also, if the tests act on different resources
(different topics, etc.), it might not hurt their independence. There
might of course be cases where this is not possible, but I think there could
be a lot where it is.

In the optimal case we could cut the testing time back by approximately an
hour. This would save resources and give quicker feedback for PR builds.

What are your thoughts?
Has anyone thought about this or were there any attempts made?

Best,
Viktor