Hi Rob,

Thanks a lot for your post. Your use cases are very illustrative. I am just starting to tackle the situation and would like to come up with a solution that at least reduces the time I spend triaging and debugging false positives.
The failures are not deterministic, so it is not that a specific set of tests is failing randomly but in a consistent manner. The plugins you mention could be a good approach. I need to analyse the impact on the overall build time and whether new resources would have to be added.

On Friday, August 8, 2014 6:09:21 PM UTC+2, Rob Mandeville wrote:
>
> Ugh, intermittent test failures. I lived in that jungle for years. In a previous life, my company used Jenkins to drive a complete homebrew solution. We wrote a tool to parse the log and write to a database, wrote a webapp to read the database and let you know what failed, even wrote a tool to produce a subset of jobs so that you could auto-rerun all the failures. Jenkins was reduced to simply running the jobs, and nobody looked at the site to see what passed and what failed.
>
> Trust me: you do not want to go there unless it’s absolutely necessary.
>
> Your first thought should be to keep these tests from being intermittent. Perhaps the test code could account for strange OS or resource factors. Perhaps you can limit slave nodes or something to control resource contention. Perhaps not.
>
> If fixing the intermittency is not an option, your next defense is to break up the build. If you have an hour of tests, but only five minutes of them are intermittent, isolate the intermittents into their own job so that you can do a rerun in five minutes rather than an hour.
>
> Now, for plugins.
>
> First, there is a Naginator plugin. This can be configured to rerun failed builds, so it can keep trying those intermittent tests.
>
> Secondly, there is a Build Flow plugin, which is more complicated and more versatile. You use it to tie a bunch of build jobs together (compile as one job, unit tests over on another one, Selenium jobs on a third…). The DSL that you use has the ‘retry’ keyword, so you can say to run a certain job some number of times until it succeeds.
> This would allow you to have one master build that is red, yellow, or (blue|green), and that could retry your intermittent tests until they pass or just fail one too many times.
>
> Of course, all of this retrying-until-it-works is based on the assumption that if you run a test three times and it fails on two of those tries, the test should be considered passing. This is not a good assumption: it could mean that the code is just plain flaky. I prefer to get the test to recognize the conditions that make it flaky and thus respond to them. If the test can fail because the database is unresponsive, have the test start by trying to contact the database, and retrying every minute or two until it succeeds or a test-determined timeout occurs. Handling intermittency at the test level rather than the Jenkins level requires you to define the things that can break your test, rather than just assuming that passing once in a while is okay. If you can name me some factors that cause your tests to go intermittent, I may have some ideas as to how to make the tests non-intermittent.
>
> --Rob
>
> *From:* [email protected] [mailto:[email protected]] *On Behalf Of* Albert Tresens
> *Sent:* Friday, August 08, 2014 11:05 AM
> *To:* [email protected]
> *Subject:* False positive on Jenkins builds. How to address.
>
> Hi,
>
> I am trying to optimize the triaging time on Jenkins failures caused by false positives. There is a percentage of failures that always self-heal in subsequent builds. They mostly depend on the underlying OS or some resource factors.
>
> Has anyone followed a specific approach to address such situations? I guess it is a common problem.
>
> I thought about spawning the Jenkins jobs so I get duplicated results and discard a failure if it's not a double failure, or adding plugins for filtering specific exceptions.
>
> Any suggestions or alternatives?
>
> Thanks!
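
For reference, the test-level approach Rob describes (contact the dependency first, retry every so often, give up after a test-determined timeout) could look roughly like the Python sketch below. It is only an illustration under assumptions: `wait_until_ready`, the `probe` callable, and the default timeout and interval values are hypothetical names and numbers chosen here, not something taken from the thread or from Jenkins.

```python
import time
from typing import Callable


def wait_until_ready(
    probe: Callable[[], bool],
    timeout_s: float = 600.0,   # assumed test-determined timeout
    interval_s: float = 60.0,   # assumed "every minute or two" retry interval
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `probe` until it returns True or `timeout_s` has elapsed.

    `probe` is a hypothetical callable that checks one dependency,
    for example by opening a database connection. An exception raised
    by the probe is treated as "not ready yet".
    """
    deadline = clock() + timeout_s
    while True:
        try:
            if probe():
                return True
        except Exception:
            # The dependency is unreachable right now; keep waiting.
            pass
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        sleep(min(interval_s, remaining))
```

A test could call this in its setup and, when it returns `False`, fail with an explicit "environment not ready" message, so that an infrastructure problem is reported as such instead of looking like a product failure.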
--
You received this message because you are subscribed to the Google Groups "Jenkins Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
For more options, visit https://groups.google.com/d/optout.
