In this PR: https://github.com/apache/incubator-druid/pull/5957 (which is otherwise unrelated) I made several changes that should help to analyze flaky failures, e.g. printing deadlocked threads (if any) when a test times out, and printing full stack traces on failures.
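
For anyone who wants to try the deadlock reporting locally, here is a minimal sketch of the general idea using the JDK's standard ThreadMXBean API. It is not the code from the PR; the class name DeadlockReporter and the method describeDeadlockedThreads are hypothetical names chosen for illustration, and the sketch assumes it is called from whatever hook runs when a test times out.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper, not taken from the PR: builds a report of deadlocked threads.
public class DeadlockReporter {

    /** Returns a description of all deadlocked threads, or an empty string if there are none. */
    public static String describeDeadlockedThreads() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // findDeadlockedThreads() returns null when no deadlock is detected.
        long[] ids = threads.findDeadlockedThreads();
        if (ids == null) {
            return "";
        }
        StringBuilder report = new StringBuilder("Deadlocked threads:\n");
        // Request full stack traces for each deadlocked thread.
        for (ThreadInfo info : threads.getThreadInfo(ids, Integer.MAX_VALUE)) {
            if (info == null) {
                continue; // the thread exited between the two calls
            }
            report.append('"').append(info.getThreadName())
                  .append("\" is waiting on ").append(info.getLockName())
                  .append(" held by \"").append(info.getLockOwnerName()).append("\"\n");
            for (StackTraceElement frame : info.getStackTrace()) {
                report.append("\tat ").append(frame).append('\n');
            }
        }
        return report.toString();
    }
}
```

A timeout handler could print the returned string to stderr whenever it is non-empty, so the CI log shows which threads were stuck and on which locks.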

On Fri, 7 Sep 2018 at 00:28, Gian Merlino <g...@apache.org> wrote:

> Our CI has been super flaky lately: it seems rare that a PR is able to pass
> without a few retries. In an effort to try to help I added a new label
> "Flaky tests" and tagged all the open issues that look related to flaky
> tests. I also closed a few that have been open for a long time and I don't
> recall seeing in a while. They are all here:
>
> https://github.com/apache/incubator-druid/labels/Flaky%20test
>
> In a few cases I edited the titles so they all have the specific test case
> that failed (the class and method). I think it helps to have one issue per
> method, that way we can track them separately.
>
> Non-scientifically I seem to be noticing these four often these days:
>
> 1) https://github.com/apache/incubator-druid/issues/6296
> 2) https://github.com/apache/incubator-druid/issues/2373
> 3) https://github.com/apache/incubator-druid/issues/6311
> 4) https://github.com/apache/incubator-druid/issues/6312
>
> Please, if we can, let's spend some time looking into what is going on with
> these tests. We will thank ourselves when it makes PR flow smoother!