In my experience they never get fixed. To be honest, when I was doing the 
releases I would have these failures investigated to determine whether it was a 
test problem or a problem in the code being released. If it was the latter, I 
would cancel the vote. The only time tests should be disabled is if we know it 
is a problem in the test but can’t figure out how to fix it.

I also don’t recall Gary ever having more than one or two tests fail in a 
run.

Ralph

> On Nov 20, 2023, at 5:00 AM, Volkan Yazıcı <vol...@yazi.ci> wrote:
> 
> I am not asking to disable Windows tests. I am asking to disable tests
> and only those tests that have a failure rate on Windows higher than,
> say, 30%. To be precise, I think there are 2-3 of them dealing with
> network sockets and rolling file appenders. I am not talking about
> dozens or such.
> 
> After disabling them, we can create a ticket referencing them. So that
> interested parties can fix them.
> 
>> On Mon, Nov 20, 2023 at 12:25 PM Piotr P. Karwasz
>> <piotr.karw...@gmail.com> wrote:
>> 
>> Hi Volkan,
>> 
>>> On Mon, 20 Nov 2023 at 09:36, Volkan Yazıcı <vol...@yazi.ci> wrote:
>>> 
>>> As Gary (the only Windows user among the active Log4j maintainers,
>>> AFAIK) has noticed several times, Log4j tests on Windows are pretty
>>> unstable. They not only fail on Gary's laptop, but Piotr and I also need
>>> to give Windows tests in CI a kick on a regular basis. Approximately one
>>> out of three CI runs fails on Windows. Piotr already improved the
>>> situation extensively, though there are still several leftovers that
>>> need attention.
>>> 
>>> Unless somebody steps up to improve the unstable Windows tests, I
>>> would like to disable those only for the Windows platform.
>> 
>> Please don't. Windows has an annoying file locking policy that
>> prevents users from deleting files with open file descriptors, but
>> that is one of the few ways to detect resource leakage we have.
>> 
>> Tests running on *NIXes will ignore problems with open file
>> descriptors and delete the log files, but on a production system those
>> leaks will accumulate and cause application crashes. We had such a
>> leak, when we used `URLConnection#getLastModified` on a `jar:...` URL.
>> This call caused file descriptor exhaustion on both Windows and
>> *NIXes, but only the Windows test was able to detect it.
>> 
>> Piotr,
>> who never thought he would ever defend Microsoft Windows.
>> 
>> PS: Gary reports the failures, but always runs the build again until
>> it succeeds, even on Friday 13th, when he had to wait until Saturday
>> 14th for the test run to succeed.
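
For readers unfamiliar with the `jar:...` leak Piotr mentions, here is a minimal Java sketch of one way to read a last-modified timestamp without opening a connection to the archive. It is only an illustration under stated assumptions, not the fix that was applied in Log4j: the class `LastModifiedSketch` and the method `lastModified` are hypothetical names, and returning 0 for anything that is not backed by a local file is a simplification chosen for the example.

```java
import java.io.IOException;
import java.net.JarURLConnection;
import java.net.URISyntaxException;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, for illustration only.
final class LastModifiedSketch {

    private LastModifiedSketch() {}

    /**
     * Returns the last-modified time in milliseconds since the epoch,
     * or 0 if it cannot be determined without opening a connection.
     */
    static long lastModified(URL url) throws IOException {
        URL target = url;
        if ("jar".equals(url.getProtocol())) {
            // openConnection() creates the connection object; this sketch never
            // calls connect() or reads headers, so no stream is opened here.
            JarURLConnection connection = (JarURLConnection) url.openConnection();
            // URL of the archive itself, e.g. file:/path/to/app.jar
            target = connection.getJarFileURL();
        }
        if ("file".equals(target.getProtocol())) {
            try {
                // Ask the file system directly instead of going through URLConnection.
                Path path = Paths.get(target.toURI());
                return Files.getLastModifiedTime(path).toMillis();
            } catch (URISyntaxException e) {
                throw new IOException("Cannot convert to a file path: " + target, e);
            }
        }
        // Assumption of this sketch: remote or unknown sources report "unknown".
        return 0L;
    }
}
```

A test that calls such a helper and then deletes the JAR would fail on Windows if a handle were still open, which is the kind of signal the Windows runs provide.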
