Thanks for that link Ismael -- I'd never seen it before. That's super useful.
In fact, it helped me realize another area where we could really improve
things. I took a look at the worst offenders on that chart and was
debugging the 2nd most flaky Streams test when I expanded the timeframe and
saw
Hi David,
Thanks for continuing to drive this.
Yes, ignoreFailures does come with some risk and it's only suitable as an
intermediate step as part of a long-term plan that eventually gates all
merges on builds with passing tests. I brought it up because it would allow
us to immediately prevent
Hi all,
I am still experimenting with reducing the noise of flaky tests in build
results. I should have results to share early next week.
Chris, I am also for a programmatic gate. Regarding using ignoreFailures,
it seems risky because the build may be green but with failed tests, no?
I would
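For context on the ignoreFailures risk mentioned above, here is a minimal Gradle config fragment (a sketch only, not the project's actual build file): with this setting the test task reports failures but no longer fails the build, so the overall result can be green even with failed tests.

```groovy
// build.gradle -- sketch only. With ignoreFailures set, test failures are
// still recorded in the reports, but the Test task itself succeeds, so the
// overall build can show green despite failed tests.
tasks.withType(Test).configureEach {
    ignoreFailures = true
}
```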
Hi all,
There's a lot to catch up on here but I wanted to clarify something.
Regarding this comment from Sophie:
> Yet multiple people in this thread so
far have voiced support for "gating merges on the successful completion of
all parts of the build except tests". Just to be totally clear, I
I think it breaks the Jenkins output otherwise. Feel free to test it via a
PR.
Ismael
On Wed, Nov 22, 2023, 12:42 AM David Jacot
wrote:
> Hi Ismael,
>
> No, I was not aware of KAFKA-12216. My understanding is that we could still
> do it without the JUnitFlakyTestDataPublisher plugin and we
Hi Ismael,
No, I was not aware of KAFKA-12216. My understanding is that we could still
do it without the JUnitFlakyTestDataPublisher plugin and we could use
gradle enterprise for this. Or do you think that reporting the flaky tests
in the build results is required?
David
On Wed, Nov 22, 2023 at
Hi David,
Did you take a look at https://issues.apache.org/jira/browse/KAFKA-12216? I
looked into this option already (yes, there isn't much that we haven't
considered in this space).
Ismael
On Wed, Nov 22, 2023 at 12:24 AM David Jacot
wrote:
> Hi all,
>
> Thanks for the good discussion and
Hi all,
Thanks for the good discussion and all the comments. Overall, it seems that
we all agree on the bad state of our CI. That's a good first step!
I have talked to a few folks this week about it and it seems that many
folks (including me) are not comfortable with merging PRs at the moment
Hi,
We have a dashboard already:
https://ge.apache.org/scans/tests?search.names=Git%20branch=P28D=kafka=America%2FLos_Angeles=trunk=FLAKY
On Tue, Nov 14, 2023 at 10:41 PM Николай Ижиков wrote:
> Hello guys.
>
> I want to tell you about one more approach to deal with flaky
Thanks Sophie. Overall I agree with you.
I think 50% is too high as a general rule, and I believe something like
30% might be more appropriate (going lower might become too aggressive,
given the infrastructure at hand)?
The difficult part about a policy like this is that we don't really
For some concrete data, here are the stats for the latest build on two
community PRs I am currently reviewing:
https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-14648/16/tests
- 18 unrelated test failures
- 13 unique tests
- only 1 out of 4 JDK builds was green
In the interest of moving things forward, here is what we would need (in my
opinion) to start enforcing this:
1. Get overall build failure percentage under a certain threshold
   1. What is an acceptable number here?
   2. How do we achieve this: wait until all of them are fixed, disable
So...where did we land on this?
To take a few steps back and make sure we're all talking about the same
thing, I want to mention that I myself was responsible for merging a PR
that broke the build a few weeks ago. There was a warning that only
appeared in some of the versions, but when checking
Hi all,
I think at least one of those is my fault, apologies.
I'll try to make sure all my tests are passing from now on.
It doesn't help that GitHub always shows that the tests have failed,
even when they have not. I suspect this is because Jenkins always
marks the builds as unstable, even when
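The unstable-build behavior described above comes from the Jenkins JUnit plugin marking a build UNSTABLE whenever the result archive contains failures. A possible mitigation is sketched below; note that `skipMarkingBuildUnstable` is an assumption about the JUnit plugin's options and the result paths are hypothetical, so verify against the plugin version actually in use.

```groovy
// Jenkinsfile sketch -- assumes the JUnit plugin's skipMarkingBuildUnstable
// option; paths are hypothetical. This publishes test results without
// flipping the build result to UNSTABLE on test failures.
post {
    always {
        junit testResults: '**/build/test-results/**/*.xml',
              skipMarkingBuildUnstable: true
    }
}
```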
Hello guys.
I want to tell you about one more approach to deal with flaky tests.
We adopted this approach in the Apache Ignite community, so maybe it can be
helpful for Kafka, too.
TL;DR: The Apache Ignite community has a tool that provides test statistics
and can tell if a PR introduces new
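The idea of flagging *newly introduced* flakiness, as in the Ignite tool described above, could be sketched like this. This is not the actual Ignite tool; the data model and test names are hypothetical.

```python
# Sketch only: compare a PR's failing tests against a trunk baseline to
# flag failures that are new to the PR (i.e. not pre-existing flakiness).
# All names here are made up for illustration.

def newly_flaky(trunk_failures, pr_failures):
    """Both args are sets of test names that failed at least once.
    A failure is 'new' if it never failed on trunk, making the PR the
    likely cause rather than pre-existing flakiness."""
    return sorted(pr_failures - trunk_failures)

trunk = {"KnownFlakyTest.sometimesTimesOut"}
pr = {"KnownFlakyTest.sometimesTimesOut", "NewTest.brokenByThisPr"}
print(newly_flaky(trunk, pr))  # → ['NewTest.brokenByThisPr']
```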
To use the pain analogy, people seem to have really good painkillers and
hence somehow don't feel the pain anymore. ;)
The reality is that important and high quality tests will get fixed. Poor
quality tests (low signal to noise ratio) might not get fixed and that's ok.
I'm not opposed to
One potential forcing function could be to allow an unconditional revert of
any commit (including testing and non-testing changes) that can be shown to
have introduced test flakiness. This could at least prevent us from
accruing more flaky tests, though it would obviously not be viable for
I agree on the test gap argument. However, my worry is that if we don't
"force the pain", it won't get fixed at all. -- I also know that we have
tried to find a working approach for many years...
My take is that if we disable a test and file a non-blocking Jira, it's
basically the same as just
Matthias,
Flaky tests are worse than useless. I know engineers find it hard to
disable them because of the supposed test gap (I find it hard too), but the
truth is that the test gap is already there! No-one blocks merging PRs on
flaky tests, but they do get used to ignoring build failures.
The
Thanks Matthias -- I agree that there are lots of flaky test JIRAs but not
a forcing function to fix them.
One option is to mark them as blockers as you alluded to.
One other thing to keep in mind is that we have the gradle enterprise tool
to track flaky tests!
Thanks for starting this discussion David! I totally agree to "no"!
I think there is no excuse whatsoever for merging PRs with compilation
errors (except an honest mistake for conflicting PRs that got merged
interleaved). -- Every committer must(!) check the Jenkins status before
merging to
Hi Divij,
I think this proposal overall makes sense. My only nit, more of a
suggestion, is that we also consider a label called newbie++ [1] for flaky
tests if we are considering adding newbie as a label. I think some of the flaky
tests need familiarity with the codebase or the test setup so as a
Hi David et al,
I agree with all the suggestions, but from what I've seen, the flaky tests
tend to get ignored, and I'm afraid that disabling them would leave them
getting forgotten. If the Jira ticket is accurate, we've got plenty of
tickets opened for > 2 years
> Please, do it.
We can use specific labels to effectively filter those tickets.
We already have a label and a way to discover flaky tests. They are tagged
with the label "flaky-test" [1]. There is also a label "newbie" [2] meant
for folks who are new to Apache Kafka code base.
My suggestion is
> To kickstart this effort, we can publish a list of such tickets in the
> community and assign one or more committers the role of a "shepherd" for each
> ticket.
Please, do it.
We can use specific labels to effectively filter those tickets.
> On Nov 13, 2023, at 15:16, Divij Vaidya
Thanks for bringing this up David.
My primary concern revolves around the possibility that the currently
disabled tests may remain inactive indefinitely. We currently have
unresolved JIRA tickets for flaky tests that have been pending for an
extended period. I am inclined to support the idea of
I will say that I have also seen tests that are flaky only intermittently.
A test may be ok for some time, and then suddenly the CI is overloaded and
we see issues.
I have also seen the CI struggling with running out of space recently, so I
wonder if we can also try to improve things on that front.
On Sat, Nov 11, 2023 at 10:32 AM John Roesler wrote:
> In other words, I’m biased to think that new flakiness indicates
> non-deterministic bugs more often than it indicates a bad test.
>
My experience is exactly the opposite. As someone who has tracked many of
the flaky fixes, the vast
Thanks, all,
This proposal sounds good to me.
We could try to just disable the current flaky tests as a one-time step and
configure GitHub only to merge green builds as a bulwark against the future.
By definition, flaky tests may not fail during the buggy PR build itself, but
they should make
One more thing:
3. If test flakiness is introduced by a recent PR, it's appropriate to
revert said PR vs disabling the flaky tests.
Ismael
On Sat, Nov 11, 2023, 8:45 AM Ismael Juma wrote:
> Hi David,
>
> I would be fine with:
> 1. Only allowing merges if the build is green
> 2. Disabling all
Hi David,
I would be fine with:
1. Only allowing merges if the build is green
2. Disabling all flaky tests that aren't fixed within a week. That is, if a
test is flaky for more than a week, it should be automatically disabled (it
doesn't add any value since it gets ignored).
We need both to make
Hi David,
Thanks for bringing this point and also for creating the revert PRs. I
think there has been an effort in the community to fix a lot of flaky
tests (like around MirrorMaker). I also agree that we shouldn't merge PRs
without green builds and look to ignore flaky tests. For example, I did
Hi all,
The state of our CI worries me a lot. Just this week, we merged two PRs
with compilation errors and one PR introducing persistent failures. This
really hurts the quality and the velocity of the project and it basically
defeats the purpose of having a CI because we tend to ignore it