Charles Hache created LOG4J2-1784:
-------------------------------------
Summary: Large values for max in DefaultRolloverStrategy blocks
application
Key: LOG4J2-1784
URL: https://issues.apache.org/jira/browse/LOG4J2-1784
Project: Log4j 2
Issue Type: Bug
Components: Appenders
Affects Versions: 2.7
Environment: Played around a lot in 2.5, once realized issue was
related to log4j, upgraded to 2.7 and confirmed it is still happening. Using
on a raspberry pi.
Reporter: Charles Hache
Hi all,
I'm trying to debug an application which is using log4j2 for logging. After
discovering that by default the RollingRandomAccessFile only kept the last 7
logs using the counter, I found the DefaultRolloverStrategy option and set the
max to something like 99999, so in theory I wouldn't have to worry about it.
A couple days later I started looking at everything again (log files from
log4j, timestamped CSV files from the application, etc) I can see something is
going terribly wrong. What I'm finding is that when it comes time to rotate
the logs, it looks like the whole application hangs. I see the following:
- All logging to the log file stops;
- All my data going to the CSV files stop;
- CPU usage goes to 100%;
- I cannot stop my application cleanly;
After a while (~4-5 minutes) everything goes back to normal (until the next log
rotation), but then the first lines of the logs have timestamps like this:
2017-01-14 07:43:00,001 INFO ....
2017-01-14 07:42:11,329 DEBUG ....
2017-01-14 07:47:26,648 TRACE ....
2017-01-14 07:47:26,695 INFO ....
2017-01-14 07:47:26,697 TRACE ....
2017-01-14 07:47:26,737 INFO ....
2017-01-14 07:47:26,739 TRACE ....
2017-01-14 07:47:26,741 TRACE ....
2017-01-14 07:47:26,744 TRACE ....
2017-01-14 07:47:26,768 TRACE ....
Of note is that there is a gap of a few minutes in the first couple lines of
the file (I'm not too concerned about the out-of-order timestamps as lots of
different threads are logging). The application always writes many (dozens,
maybe hundreds) of lines per second, so a gap of a couple minutes is
uncharacteristic. The rest of the log file looks normal (has no gaps, has log
lines every few milliseconds). From my specific log lines and where they are
w.r.t. each other, I looks to me like calls to the logger are blocking (eg: in
cases I'll have a couple log lines that usually log 1-2 milliseconds apart, but
when they occur at rollover time they'll be minutes apart).
I shrunk the file size rollover to something small (256K), which I hit in maybe
15 seconds or so, and it is clearly reproducible. When I reduced the default
rollover strategy max file to 40, the problem when away and I can see it
rolling over the file practically instantly.
The documentation for the DefaultRolloverStrategy says "When the ascending
attribute is set to true (the default) then the counter will be incremented and
the current log file will be renamed to include the counter value. If the
counter hits the maximum value then the oldest file, which will have the
smallest counter, will be deleted, all other files will be renamed to have
their counter decremented and then the current file will be renamed to have the
maximum counter value. Note that with this counting strategy specifying a large
maximum value may entirely avoid renaming files."
I took this to mean that having a value so large that it would never be hit
would be more efficient, since it doesn't have to do anything, just keep
counting up. I'm wondering now if it's doing something like "check if
log-1.log exists... check if log-2.log exist ....... check if log-99999.log
exists" but I don't see any iobound activity in iotop or anything (although I
can't say for sure if FS calls to see if something exists would show up there,
for example).
This might not necessarily be a bug in the logger per se, but it would at least
be a documentation issue if there is some O(n) operation based on the max value
for the DefaultRolloverStrategy.
I'd usually dig deeper, but now I still have to debug the original issues with
the actual application. Let me know if there is any more info I should provide
or if there is anything else I can do!
Thanks,
Charles
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]