Hi Scott

I'd really appreciate it if you took the time to do that - I (and my coworkers) are really quite baffled.
We have had an increase in activity and load on the system during the last few weeks (more users and higher utilization, so nothing alarming), and now this problem has manifested itself - I've never seen it before.

My understanding of what's happening is that the blocking thread is waiting for an empty buffer so it can flush its own full buffer, but that notification never comes. As the log4j source shows, the thread has called wait() on an ArrayList (the full buffer), and something has gone wrong - but what? (A simplified sketch of that wait/notify handshake is appended at the end of this mail.)

We've seen it happen 3 or 4 times during the last week - always a few minutes after midnight, when Resin performs its log rotation. In the stacktrace most of the threads that wait for the deadlocked thread are, like the deadlocked thread itself, just doing a simple logger.info(String), so it might be possible to reproduce it by spawning 10-20 threads that call logger.info() and then having Resin perform a rollover once a minute (a rough harness sketch is likewise appended below).

Regards,
Jens Dueholm Christensen
Rambøll Survey IT

________________________________
From: resin-interest-boun...@caucho.com [resin-interest-boun...@caucho.com] On Behalf Of Scott Ferguson [f...@cauchomail.com]
Sent: 21 February 2012 19:01
To: resin-interest@caucho.com
Subject: Re: [Resin-interest] Problems with rollover of logfile with log4j 1.2.16 and AsyncAppender

On 02/20/2012 12:56 PM, Jens Dueholm Christensen (JEDC) wrote:

> Hi
>
> Lately we are seeing increased blocking behaviour from log4j when Resin performs the nightly rollover of logfiles.
>
> We're running Resin Pro 3.1 (license #1013826) in a pre .11 snapshot (resin-pro-3.1.s110225) (due to bug #4349 and a bit of waiting before 3.1.11 was released). Updating to 3.1.12 is planned, but the changelog (http://www.caucho.com/resin-3.1/changes/changes.xtp is the best I could find?) does not mention any fix for our current problem, so I'm looking for any insights or advice.

I'd need to look at the log4j code to understand what it's doing. I don't think that wait() should be related to Resin (other than timing), but I'd need to look at their code to be certain, and I can't think of any logging that would help. (Logging the logging code is tricky :)

3.1.13 should be out next week, by the way.

-- Scott

> We've configured Resin to perform rollovers of this logfile:
>
>   <stdout-log path="/usr/local/www/xact/logs/resin-web-stdout.log"
>               timestamp="[%Y-%m-%d %H:%M:%S.%s]"
>               rollover-period="1D"/>
>
> In our log4j configuration we log to stdout with:
>
>   <appender name="CONSOLE" class="org.apache.log4j.ConsoleAppender">
>     <param name="Target" value="System.out"/>
>     <layout class="org.apache.log4j.PatternLayout">
>       <param name="ConversionPattern" value="%d{ISO8601} [%-20.20t] %5p %c{1} - %m - %x%n"/>
>     </layout>
>   </appender>
>
>   <appender name="ASYNC" class="org.apache.log4j.AsyncAppender">
>     <appender-ref ref="CONSOLE"/>
>   </appender>
>
>   <root>
>     <priority value="warn"/>
>     <appender-ref ref="ASYNC"/>
>   </root>
>
>   <<loggers>>
>
> Some time after midnight, when Resin performs the rollover, the JVM becomes unresponsive and a stacktrace shows multiple threads hanging, and we have to restart manually (it doesn't get picked up by the watchdog, and can/will hang for hours). A complete JVM stacktrace is attached to this mail.
> All threads that are attempting to log are waiting for access to the object locked by this thread:
>
>   "http-172.27.80.36:8080-30$1663241944" daemon prio=10 tid=0x00007f8804005800 nid=0x2a40 in Object.wait() [0x00007f87a08ed000]
>      java.lang.Thread.State: WAITING (on object monitor)
>         at java.lang.Object.wait(Native Method)
>         at java.lang.Object.wait(Object.java:485)
>         at org.apache.log4j.AsyncAppender.append(AsyncAppender.java:195)
>         - locked <0x00000005ba882928> (a java.util.ArrayList)
>         at org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:251)
>         - locked <0x00000005b8d6db00> (a org.apache.log4j.AsyncAppender)
>         at org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66)
>         at org.apache.log4j.Category.callAppenders(Category.java:206)
>         - locked <0x00000005ba6f9e38> (a org.apache.log4j.spi.RootLogger)
>         at org.apache.log4j.Category.forcedLog(Category.java:391)
>         at org.apache.log4j.Category.info(Category.java:666)
>         at com.pls.popinhandler.PopinScriptHandler.handleRequest(PopinScriptHandler.java:65)
>         …
>
> It seems like something never calls notifyAll() on the ArrayList, and as a result our world grinds to a halt.
>
> https://issues.apache.org/bugzilla/show_bug.cgi?id=38137#c16 (comment #16) has the same kind of stacktrace as we are seeing, and the comments in #17 do point towards Resin.
>
> The last line in logfile resin-web-stdout.log.20120218 (before rotation) has the timestamp 2012-02-18 23:59:59,718. The first line in the new logfile resin-web-stdout.log.20120219 has the timestamp 2012-02-19 01:30:48.878 (here we actually waited to see if the problem corrected itself - but alas), which was when we did a restart.
>
> None of the other logfiles (watchdog, jvm, resin stdout and stderr) contain any indication of what is wrong.
>
> If it's not Resin causing the problem, I guess we'll just have to switch our entire log4j setup to use org.apache.log4j.rolling.RollingFileAppender (part of the log4j companions), but we'd prefer to stick with the existing setup.
>
> Are there any interesting Resin loggers (and loglevels) that could show something?
>
> Any thoughts or possible hints are appreciated.
>
> Regards,
> Jens Dueholm Christensen
> Rambøll Survey IT
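The wait()/notifyAll() handshake described in the reply at the top boils down to a bounded producer/consumer buffer. The following is only a simplified sketch of that pattern, not the actual log4j 1.2.16 source, and the class and field names are made up; it just shows why every logging thread ends up parked in Object.wait() if the dispatcher side ever stops draining the buffer (for example because the stream it writes to stalls during rotation):

    // Simplified illustration of the AsyncAppender-style handshake (not the real log4j source).
    // Producers (threads calling logger.info()) block when the buffer is full; a single
    // dispatcher thread drains it and calls notifyAll(). If the dispatcher stops draining,
    // every producer stays in Object.wait() forever - the state seen in the thread dump.
    import java.util.ArrayList;
    import java.util.List;

    class BoundedEventBuffer {
        private final List<String> buffer = new ArrayList<String>();
        private final int capacity = 128;

        // Called by application threads (the logger.info() side).
        synchronized void append(String event) throws InterruptedException {
            while (buffer.size() >= capacity) {
                wait();                  // corresponds to AsyncAppender.append() blocking
            }
            buffer.add(event);
            notifyAll();                 // wake the dispatcher
        }

        // Called by the dispatcher thread that writes to the real appender.
        synchronized List<String> drain() throws InterruptedException {
            while (buffer.isEmpty()) {
                wait();
            }
            List<String> events = new ArrayList<String>(buffer);
            buffer.clear();
            notifyAll();                 // the notification the blocked producers are waiting for
            return events;
        }
    }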
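And a rough sketch of the reproduction harness suggested in the reply, assuming plain programmatic log4j 1.2 configuration; the class name, thread count and messages are placeholders, and the rollover itself would still have to come from Resin's <stdout-log rollover-period> while this load is running:

    // Hypothetical harness: 20 threads hammering logger.info() through an AsyncAppender
    // that wraps a ConsoleAppender, mirroring the CONSOLE/ASYNC configuration quoted above.
    // The root level is set to INFO here so the calls actually reach the appender
    // (the production config uses warn on the root). On its own this only generates
    // logging load; deploying it in Resin with a short rollover-period is what would
    // exercise the suspected race.
    import org.apache.log4j.AsyncAppender;
    import org.apache.log4j.ConsoleAppender;
    import org.apache.log4j.Level;
    import org.apache.log4j.Logger;
    import org.apache.log4j.PatternLayout;

    public class RolloverLoadTest {
        public static void main(String[] args) {
            ConsoleAppender console = new ConsoleAppender(
                    new PatternLayout("%d{ISO8601} [%-20.20t] %5p %c{1} - %m - %x%n"));
            AsyncAppender async = new AsyncAppender();
            async.addAppender(console);

            final Logger logger = Logger.getRootLogger();
            logger.setLevel(Level.INFO);
            logger.addAppender(async);

            for (int i = 0; i < 20; i++) {
                new Thread(new Runnable() {
                    public void run() {
                        while (true) {
                            logger.info("load test message from " + Thread.currentThread().getName());
                        }
                    }
                }, "load-" + i).start();
            }
        }
    }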
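If it does come to switching to the log4j companions, a rollover handled by log4j itself might be configured roughly as follows (assuming the apache-log4j-extras jar is on the classpath; the appender name, path and FileNamePattern are placeholders, not a tested configuration):

    <!-- Hypothetical replacement for logging via stdout: let log4j roll the file itself,
         so Resin's rotation of resin-web-stdout.log is no longer in the picture. -->
    <appender name="FILE" class="org.apache.log4j.rolling.RollingFileAppender">
      <rollingPolicy class="org.apache.log4j.rolling.TimeBasedRollingPolicy">
        <param name="FileNamePattern" value="/usr/local/www/xact/logs/web.%d{yyyy-MM-dd}.log"/>
      </rollingPolicy>
      <layout class="org.apache.log4j.PatternLayout">
        <param name="ConversionPattern" value="%d{ISO8601} [%-20.20t] %5p %c{1} - %m - %x%n"/>
      </layout>
    </appender>

    <appender name="ASYNC" class="org.apache.log4j.AsyncAppender">
      <appender-ref ref="FILE"/>
    </appender>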