In my experience they never get fixed. To be honest, when I was doing the releases I would have these failures investigated to determine if it was a test problem vs a problem in the code being released. If it was the latter I would cancel the vote. The only time tests should be disabled is if we know it is a problem in the test but can’t figure out how to fix it.
I also don’t recall Gary ever having more than one or two tests fail in a run.

Ralph

> On Nov 20, 2023, at 5:00 AM, Volkan Yazıcı <vol...@yazi.ci> wrote:
>
> I am not asking to disable Windows tests. I am asking to disable tests
> and only those tests that have a failure rate on Windows higher than,
> say, 30%. To be precise, I think there are 2-3 of them dealing with
> network sockets and rolling file appenders. I am not talking about
> dozens or such.
>
> After disabling them, we can create a ticket referencing them, so that
> interested parties can fix them.
>
>> On Mon, Nov 20, 2023 at 12:25 PM Piotr P. Karwasz
>> <piotr.karw...@gmail.com> wrote:
>>
>> Hi Volkan,
>>
>>> On Mon, 20 Nov 2023 at 09:36, Volkan Yazıcı <vol...@yazi.ci> wrote:
>>>
>>> As Gary (the only Windows user among the active Log4j maintainers,
>>> AFAIK) has noticed several times, Log4j tests on Windows are pretty
>>> unstable. They not only fail on Gary's laptop, but Piotr and I need to
>>> give Windows tests in CI a kick on a regular basis. Approximately one
>>> out of three CI runs fails on Windows. Piotr already improved the
>>> situation extensively, though there are still several leftovers that
>>> need attention.
>>>
>>> Unless somebody steps up to improve the unstable Windows tests, I
>>> would like to disable those only for the Windows platform.
>>
>> Please don't. Windows has an annoying file locking policy that
>> prevents users from deleting files with open file descriptors, but
>> that is one of the few ways to detect resource leakage we have.
>>
>> Tests running on *NIXes will ignore problems with open file
>> descriptors and delete the log files, but on a production system those
>> leaks will accumulate and cause application crashes. We had such a
>> leak, when we used `URLConnection#getLastModified` on a `jar:...` URL.
>> This call caused file descriptor exhaustion on both Windows and
>> *NIXes, but only the Windows test was able to detect it.
>>
>> Piotr,
>> who never thought he would ever defend Microsoft Windows.
>>
>> PS: Gary reports the failures, but always runs the build again until
>> it succeeds, even on Friday 13th, when he had to wait until Saturday
>> 14th for the test run to succeed.