So, tackling the two parts individually:
1) Detection: A good automatic flaky detector will no doubt save manual effort. The one mentioned by Matteo seems like a good one, which we can hook up to the post-commit job. I see a minor problem though: we should use at least about 20 runs, but the post-commit <https://builds.apache.org/view/All/job/HBase-Trunk_matrix/> job for trunk runs too infrequently, which means we'll get stale information from the tool. One way around that would be to run the post-commit job back-to-back? It can then trigger another job at the end, which will use the tool to recompute flakies.
2) Handling: Ignore flakies in the main build and have a separate job just for flakies. We can run it often enough and get some stats on the flakiness rate of each test using the same tool as above. However, I feel that we should keep it super simple at the start: just have a manual list and a separate job for flakies. If it works out fine, we can add the automatic detection and statistics part later.
- Appy
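
For illustration only, here is a rough sketch of what "recompute flakies over the last ~20 runs" could look like. It is not the tool Matteo mentioned; the function name `flakiness_rates`, the `window` parameter, and the input format (one dict of test name to pass/fail per run) are all assumptions made up for this example. A test is treated as flaky only if it both passed and failed within the window, so a test that fails every time is reported as broken elsewhere rather than as flaky.

```python
from collections import defaultdict


def flakiness_rates(runs, window=20):
    """Return {test_name: failure_rate} for tests that both passed and
    failed within the last `window` runs.

    `runs` is a hypothetical input format: a list ordered oldest to newest,
    where each entry is a dict mapping test name -> True (passed) or
    False (failed).
    """
    recent = runs[-window:]
    passes = defaultdict(int)
    failures = defaultdict(int)
    for run in recent:
        for test, passed in run.items():
            if passed:
                passes[test] += 1
            else:
                failures[test] += 1
    # Flaky = at least one failure AND at least one pass in the window.
    return {
        test: failures[test] / (passes[test] + failures[test])
        for test in failures
        if passes[test] > 0
    }
```

The window of 20 matches the "about 20 runs" figure above; with fewer runs available the rates are computed over whatever history exists, which is exactly the stale-or-thin-data problem that running the post-commit job back-to-back would address.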