Re: Notification strategy for OutOfMemoryError
On 1/23/2014 5:21 PM, Christopher Schultz wrote:
> If you'd care to post your code to either the list or onto the wiki, I'm sure it would be useful to someone. Feel free to trim out huge sections of the code and say "make this fit your environment", etc. if you don't want to show everyone how bad your email-assembling code looks ;)

Yeah, I don't really want to show my email code. Sending email will be left as an exercise for the reader.

The filter part of it is barely anything. The method handleError(Error,ServletRequest) does the work of logging and sending email. I synchronized that method so that only one can run at a time, in order to minimize memory usage by the filter if I'm getting multiple Errors thrown.

ErrorNotifier.java:

    /**
     * Catch all {@link Throwable}s and notify on {@link Error}s.
     *
     * @see Filter#doFilter(ServletRequest, ServletResponse, FilterChain)
     */
    public void doFilter( ServletRequest request, ServletResponse response,
                          FilterChain chain )
        throws IOException, ServletException
    {
        try{
            chain.doFilter(request, response);
        }catch ( Throwable t ){
            if ( t instanceof Error ){
                try{
                    handleError((Error)t, request);
                }catch ( Throwable tx ){
                    // Never let a failure in the notifier hide the original error.
                    m_log.fatal(tx.getMessage(), tx);
                }
            }
            throw t;
        }
    }

    private static synchronized void handleError( Error error, ServletRequest request )
        throws UnsupportedEncodingException, MessagingException
    {
        // log and check the error and send email
    }

There are also destroy() and init(FilterConfig) methods, but they are stubs that do nothing and are only there because they are required by the Filter interface.

WEB-INF/web.xml:

    <!-- Filter to catch java.lang.Error and send emails -->
    <filter>
        <display-name>ErrorNotifier</display-name>
        <filter-name>ErrorNotifier</filter-name>
        <filter-class>com.mydomain.filters.ErrorNotifier</filter-class>
    </filter>
    <filter-mapping>
        <filter-name>ErrorNotifier</filter-name>
        <url-pattern>/*</url-pattern>
    </filter-mapping>

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org
Re: Notification strategy for OutOfMemoryError
On 12/11/2013 11:42 PM, André Warnier wrote:
> The original issue of the OP was to be notified ASAP when an OOM occurs. And he indicated that an OOM resulted in a message in the logs. So, something is already catching the OOM exception, to write this line in the logs.
> On the other hand, there is ample literature available that seems to indicate that any method for trying to recover from (or even do something worthwhile after) an OOM is ultimately flawed and unreliable.

We have a lot of servlets and JSPs. Most of them do not use huge amounts of memory but a few do (like reports). When there is a memory leak, the first thing to get an OOME will be something that uses a large amount of memory. That request will die, but the rest of the requests that don't use a lot of memory will have plenty of space for a while.

I implemented the filter, and it works in my testing. I also implemented the command-line JVM option, which works but only gives me the first OOME. The command-line option works no matter what, and the filter works as long as it doesn't run out of memory generating the email message.

We'll see how it all works after it gets deployed to our production systems in a few weeks. Our product is mature enough that we've fixed memory leaks to the point that we normally go many months without any OOMEs, so it could be a while before this actually kicks in for a real operating situation.

Thanks to Christopher for the ideas. They were very helpful.
Re: Notification strategy for OutOfMemoryError
Bill,

On 1/23/14, 8:08 PM, Bill Davidson wrote:
> On 12/11/2013 11:42 PM, André Warnier wrote:
>> The original issue of the OP was to be notified ASAP when an OOM occurs. And he indicated that an OOM resulted in a message in the logs. So, something is already catching the OOM exception, to write this line in the logs.
>> On the other hand, there is ample literature available that seems to indicate that any method for trying to recover from (or even do something worthwhile after) an OOM is ultimately flawed and unreliable.
>
> We have a lot of servlets and JSPs. Most of them do not use huge amounts of memory but a few do (like reports). When there is a memory leak, the first thing to get an OOME will be something that uses a large amount of memory. That request will die, but the rest of the requests that don't use a lot of memory will have plenty of space for a while.
>
> I implemented the filter, and it works in my testing. I also implemented the command-line JVM option, which works but only gives me the first OOME. The command-line option works no matter what, and the filter works as long as it doesn't run out of memory generating the email message.
>
> We'll see how it all works after it gets deployed to our production systems in a few weeks. Our product is mature enough that we've fixed memory leaks to the point that we normally go many months without any OOMEs, so it could be a while before this actually kicks in for a real operating situation.
>
> Thanks to Christopher for the ideas. They were very helpful.

Glad to see my thoughts were useful.

If you'd care to post your code to either the list or onto the wiki, I'm sure it would be useful to someone. Feel free to trim out huge sections of the code and say "make this fit your environment", etc. if you don't want to show everyone how bad your email-assembling code looks ;)

-chris
Re: Notification strategy for OutOfMemoryError
On Thu, Jan 23, 2014 at 8:21 PM, Christopher Schultz <ch...@christopherschultz.net> wrote:
> Glad to see my thoughts were useful. If you'd care to post your code to either the list or onto the wiki, I'm sure it would be useful to someone.

+1. I love it when others share code, and thanks for suggesting that, Chris.
Re: Notification strategy for OutOfMemoryError
André,

On 12/12/13, 2:42 AM, André Warnier wrote:
> The original issue of the OP was to be notified ASAP when an OOM occurs. And he indicated that an OOM resulted in a message in the logs. So, something is already catching the OOM exception, to write this line in the logs.

Right, but he wants to be notified. Most notification strategies are flawed at best.

> On the other hand, there is ample literature available that seems to indicate that any method for trying to recover from (or even do something worthwhile after) an OOM is ultimately flawed and unreliable.

Yes and no. If the OOME was transient (e.g. a single thread ran away and filled up the heap with thread-specific objects), then the heap can recover when the thread dies (with OOME), and the JVM is likely to continue without further related errors.

On the other hand, if you fill up the heap with objects that will *not* be collected when the thread is killed, then you're done and bad things will happen, like the GC getting so angry and confused it can't do its job even if the heap does recover.

> Whilst I am in awe of the various solutions proposed so far (and of their developers), would it, in this case, not be simpler to pipe the log through a simple filter which catches the OOM log message and warns the OP accordingly ?

That works, but you need to make sure that pipe stays up and running all the time. How do you get notified if the pipe goes down? How do you recover from the pipe going down... do you need to restart Tomcat? How do you get notified if the notifier for the pipe going down goes down?

It's turtles all the way down.

-chris
Re: Notification strategy for OutOfMemoryError
Christopher Schultz wrote:
> André,
>
> On 12/12/13, 2:42 AM, André Warnier wrote:
>> The original issue of the OP was to be notified ASAP when an OOM occurs. And he indicated that an OOM resulted in a message in the logs. So, something is already catching the OOM exception, to write this line in the logs.
>
> Right, but he wants to be notified. Most notification strategies are flawed at best.
>
>> On the other hand, there is ample literature available that seems to indicate that any method for trying to recover from (or even do something worthwhile after) an OOM is ultimately flawed and unreliable.
>
> Yes and no. If the OOME was transient (e.g. a single thread ran away and filled up the heap with thread-specific objects), then the heap can recover when the thread dies (with OOME), and the JVM is likely to continue without further related errors.
>
> On the other hand, if you fill up the heap with objects that will *not* be collected when the thread is killed, then you're done and bad things will happen, like the GC getting so angry and confused it can't do its job even if the heap does recover.
>
>> Whilst I am in awe of the various solutions proposed so far (and of their developers), would it, in this case, not be simpler to pipe the log through a simple filter which catches the OOM log message and warns the OP accordingly ?
>
> That works, but you need to make sure that pipe stays up and running all the time. How do you get notified if the pipe goes down? How do you recover from the pipe going down... do you need to restart Tomcat? How do you get notified if the notifier for the pipe going down goes down?
>
> It's turtles all the way down.

I love the Pratchett reference, but seriously, I believe that in this case at least there is less chance for the pipe (or the 10-line script/program that runs on it) to go down than for any of the proposed alternative solutions to misfire.

And if it goes down, then Tomcat and the JVM are quite likely to go down alongside it. What better notification could one hope for? (Except if Nagios goes down, of course.)
Re: Notification strategy for OutOfMemoryError
Bill,

On 12/9/13, 8:20 PM, Bill Davidson wrote:
> On 12/9/2013 3:12 PM, Christopher Schultz wrote:
>> Was it a transient error, or a chronic condition? A single thread can, for instance, spew objects into its stack or eden space exhausting memory but, when that thread hits the OOME, all those objects are freed which basically recovers from the situation. If, instead, you fill up some shared cache, buffer, etc. and NO threads can get more memory, then you're basically toast. Which of the above was it?
>
> It looked more like the first one, though we still haven't tracked down the cause. We had several dozen threads running at the time. That's common for us. It's not that unusual for us to have a couple of hundred users with active sessions per server at any given time.
>
>> There are a bunch of things you can try to do. They all have their caveats, failure scenarios, and inefficacies.
>>
>> 1. Use -XX:OnOutOfMemoryError="<cmd args>;<cmd args>"
>>    Rig this to email you, register a passive-check data point with your monitoring server, etc. Just remember that OOMEs happen for a number of reasons. You could have run out of file handles or you could have run out of heap space.
>
> That looks interesting. It wouldn't tell me about the error but at least I'd know that there was an OOME. Better than nothing, and I can go check catalina.out. Of course, I still have the problem that threads silently fail and show my users not so much as an error message.
>
>> 2. Use JMX monitoring: set java.lang:MemoryPool/[heap space]/UsageThreshold to whatever byte value you want to set as your limit. Then, check java.lang:MemoryPool/[heap space]/UsageThresholdExceeded to see if it is true. If so, your usage threshold has been exceeded. Note that this is not proof positive that an OOME occurred. It's also tough to tell what value to use for the threshold. You can't really set it to MaxHeap - 1 byte, because you'll never get that value in practice. If you set it too low, you'll get warnings all the time when your heap usage rises in the normal course of business.
>
> I'm less enthused about that one.
>
>> 3. catch IOException in a filter and set an application attribute. Check this attribute from your monitor. I've been considering doing this, because I can rig it so that the error handler does not actually require any memory to run. The problem is that sometimes OOMEs interrupt one thread and not another. You may not catch the OOME in that thread -- it may happen in a background thread that does not go through the filter.
>
> I'm not sure I understand this one. How does an IOException relate to an OOME?

Sorry, I meant of course OutOfMemoryError. Just make sure you use as little memory as possible during the exception handler or it will fail itself and possibly mask the original problem.

-chris
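Option 2 in the message above (a JMX usage threshold) can be sketched in-process with the standard java.lang.management API. This is only a rough illustration, not the code anyone in the thread actually ran: the class name HeapThresholdCheck and the 90% fraction are assumptions, and a real monitor would more likely read the same MemoryPool attributes remotely over JMX.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

// Hypothetical helper; the name and the 0.90 fraction are illustrative assumptions.
public class HeapThresholdCheck {

    /**
     * Sets a usage threshold at the given fraction (0..1) of each heap pool's
     * maximum size, for pools that support thresholds and report a maximum.
     * Returns the number of pools that were configured.
     */
    static int armThresholds(double fraction) {
        int armed = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP || !pool.isUsageThresholdSupported()) {
                continue; // many young-generation pools do not support usage thresholds
            }
            MemoryUsage usage = pool.getUsage();
            if (usage == null || usage.getMax() <= 0) {
                continue; // pool is invalid or its maximum is undefined
            }
            pool.setUsageThreshold((long) (usage.getMax() * fraction));
            armed++;
        }
        return armed;
    }

    /** True if any heap pool with an armed threshold currently exceeds it. */
    static boolean anyThresholdExceeded() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP
                    && pool.isUsageThresholdSupported()
                    && pool.getUsageThreshold() > 0
                    && pool.isUsageThresholdExceeded()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("pools armed: " + armThresholds(0.90));
        System.out.println("threshold exceeded: " + anyThresholdExceeded());
    }
}
```

As the message notes, an exceeded threshold is not proof that an OOME occurred; it only says that a pool crossed the configured limit.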
Re: Notification strategy for OutOfMemoryError
On 12/11/2013 7:14 AM, Christopher Schultz wrote:
>>> 3. catch IOException in a filter and set an application attribute. Check this attribute from your monitor. I've been considering doing this, because I can rig it so that the error handler does not actually require any memory to run. The problem is that sometimes OOMEs interrupt one thread and not another. You may not catch the OOME in that thread -- it may happen in a background thread that does not go through the filter.
>>
>> I'm not sure I understand this one. How does an IOException relate to an OOME?
>
> Sorry, I meant of course OutOfMemoryError. Just make sure you use as little memory as possible during the exception handler or it will fail itself and possibly mask the original problem.

You can catch an OOME in a filter? I would not have expected that.

Off to the documentation.
Re: Notification strategy for OutOfMemoryError
Bill,

On 12/11/13, 12:53 PM, Bill Davidson wrote:
> On 12/11/2013 7:14 AM, Christopher Schultz wrote:
>>>> 3. catch IOException in a filter and set an application attribute. Check this attribute from your monitor. I've been considering doing this, because I can rig it so that the error handler does not actually require any memory to run. The problem is that sometimes OOMEs interrupt one thread and not another. You may not catch the OOME in that thread -- it may happen in a background thread that does not go through the filter.
>>>
>>> I'm not sure I understand this one. How does an IOException relate to an OOME?
>>
>> Sorry, I meant of course OutOfMemoryError. Just make sure you use as little memory as possible during the exception handler or it will fail itself and possibly mask the original problem.
>
> You can catch an OOME in a filter? I would not have expected that.

Sure, why not? You can catch anything that extends from Throwable[1]. Now, the handler might not actually succeed if it needs too much memory to run, the JVM could actually fall over completely before the catch block runs, etc. Basically, there are no guarantees: you can only do your best.

-chris

[1] Technically, the JVM allows you to throw and catch anything you want to. The compiler is much more stringent. If you want to go through the effort, you can use a Java assembler to build bytecode directly, and the throw instruction will take any reference argument. You can also set up a catch handler that will catch non-Throwables. At least, you used to be able to do all this. It's possible that modern .class verifiers will complain loudly if you try to do silly things like this.
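The point that an OutOfMemoryError can be caught like any other Throwable can be tried outside a servlet container. The following is a small sketch under stated assumptions: the class name CatchOomeDemo is made up for illustration, and it relies on a request for a long array of Integer.MAX_VALUE elements (roughly 16 GB) being more than the JVM is able or willing to allocate.

```java
// Hypothetical demo class; not part of anyone's filter in this thread.
public class CatchOomeDemo {

    // Stored in a field so the allocation has an observable effect.
    static volatile long[] sink;

    /** Returns true if the oversized allocation failed with an OutOfMemoryError that was caught. */
    static boolean catchesOome() {
        try {
            sink = new long[Integer.MAX_VALUE]; // assumed to exceed the available heap or the VM array limit
            return false;
        } catch (OutOfMemoryError oome) {
            // The handler itself allocates nothing, in the spirit of the advice above.
            sink = null;
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("caught OOME: " + catchesOome());
    }
}
```

A single failed allocation like this leaves the heap essentially untouched, so it shows only the transient case; it says nothing about how a handler behaves when the heap is genuinely full.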
Re: Notification strategy for OutOfMemoryError
Bill,

On 12/11/13, 12:53 PM, Bill Davidson wrote:
> On 12/11/2013 7:14 AM, Christopher Schultz wrote:
>>>> 3. catch IOException in a filter and set an application attribute. Check this attribute from your monitor. I've been considering doing this, because I can rig it so that the error handler does not actually require any memory to run. The problem is that sometimes OOMEs interrupt one thread and not another. You may not catch the OOME in that thread -- it may happen in a background thread that does not go through the filter.
>>>
>>> I'm not sure I understand this one. How does an IOException relate to an OOME?
>>
>> Sorry, I meant of course OutOfMemoryError. Just make sure you use as little memory as possible during the exception handler or it will fail itself and possibly mask the original problem.
>
> You can catch an OOME in a filter? I would not have expected that.

Here is my Filter in its entirety. Note that the catch block does not require the use of any heap. It also avoids any stack usage (presumably, the complete stack frame has been established before the try and therefore method-local references don't cost anything during execution). We also don't do any string concatenation: all logs are either static strings or otherwise-unmodified String values coming back from the ServletRequest methods (those should have been determined long before this code is called).

We insert a FALSE into the application scope so that, when the OOME is detected, putting TRUE into the application scope doesn't cause any new memory to be allocated in the application's attribute hashtable.

    public class OOMEReportingFilter
        extends HttpFilter
    {
        private static final Logger logger = Logger.getLogger(OOMEReportingFilter.class);

        public static final String OOME_KEY = OOMEReportingFilter.class.getName() + ".OUT_OF_MEMORY";

        private ServletContext _application;

        @Override
        public void init(FilterConfig config)
        {
            _application = config.getServletContext();
            // Pre-populate the attribute so that flipping it later allocates nothing.
            _application.setAttribute(OOME_KEY, Boolean.FALSE);
        }

        @Override
        public void doFilter(HttpServletRequest request, HttpServletResponse response, FilterChain chain)
            throws IOException, ServletException
        {
            try
            {
                chain.doFilter(request, response);
            }
            catch (OutOfMemoryError oome)
            {
                logger.error("Detected OutOfMemory condition during request");
                logger.error("Here are some request details:");
                logger.error("Next line will contain the current request's URI");
                logger.error(request.getRequestURI());
                logger.error("Next line will contain the current request's query string (if any):");
                logger.error(request.getQueryString());
                logger.error("Next line will contain the original request's URI (essentially what the client actually requested)");
                logger.error(request.getAttribute("javax.servlet.forward.request_uri"));
                logger.error("Next line will contain the original request's query string (if any):");
                logger.error(request.getAttribute("javax.servlet.forward.query_string"));
                logger.error("Finally, the current user (if any):");
                logger.error((null != request.getSession(false)
                              && null != request.getSession(false).getAttribute("user")
                              ? ((my.User)request.getSession(false).getAttribute("user")).getUsername()
                              : "no current user"));

                _application.setAttribute(OOME_KEY, Boolean.TRUE);

                throw oome;
            }
        }
    }

Hope that helps.

-chris
Re: Notification strategy for OutOfMemoryError
Guys, just wondering..

The original issue of the OP was to be notified ASAP when an OOM occurs. And he indicated that an OOM resulted in a message in the logs. So, something is already catching the OOM exception, to write this line in the logs.

On the other hand, there is ample literature available that seems to indicate that any method for trying to recover from (or even do something worthwhile after) an OOM is ultimately flawed and unreliable.

Whilst I am in awe of the various solutions proposed so far (and of their developers), would it, in this case, not be simpler to pipe the log through a simple filter which catches the OOM log message and warns the OP accordingly ?
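The log-pipe idea can be sketched as a tiny standalone program that reads the log on standard input and reacts to matching lines. Everything named here is an assumption for illustration: the class OomeLogWatcher, the notifyOperator hook (which just prints to stderr), and the premise that the logged line contains the text java.lang.OutOfMemoryError. It runs in its own small JVM, separate from Tomcat.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical watcher; assumed usage: tail -F catalina.out | java OomeLogWatcher
public class OomeLogWatcher {

    /** True if a log line mentions an OutOfMemoryError by its class name. */
    static boolean isOomeLine(String line) {
        return line != null && line.contains("java.lang.OutOfMemoryError");
    }

    /** Placeholder notification hook; swap in mail, a pager, a passive monitoring check, etc. */
    static void notifyOperator(String line) {
        System.err.println("ALERT: " + line);
    }

    public static void main(String[] args) throws IOException {
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(System.in, StandardCharsets.UTF_8))) {
            String line;
            // Blocks until the pipe closes, inspecting each line as it arrives.
            while ((line = in.readLine()) != null) {
                if (isOomeLine(line)) {
                    notifyOperator(line);
                }
            }
        }
    }
}
```

If the watcher process dies or the pipe closes, nothing is reported, so something else still has to check that the watcher is alive.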
Re: Notification strategy for OutOfMemoryError
Christopher Schultz wrote:
> Bill,
>
> On 12/9/13, 5:38 PM, Bill Davidson wrote:
>> Last week, one of my servers got an OutOfMemoryError at approximately 1:21pm.
>
> :(
>
> It's worth pointing out that this is not a trivial issue.
>
>> My monitoring software which does a heart beat check once per minute did not notice until 3:01pm. Heart beat kept working for over an hour and a half.
>
> Was it a transient error, or a chronic condition? A single thread can, for instance, spew objects into its stack or eden space exhausting memory but, when that thread hits the OOME, all those objects are freed which basically recovers from the situation. If, instead, you fill up some shared cache, buffer, etc. and NO threads can get more memory, then you're basically toast. Which of the above was it?
>
>> During that time my high capacity high availability 24/7 application was getting occasional OutOfMemoryErrors until memory got bad enough that even the heart beat check servlet failed. Apparently some things that allocate large chunks of memory started failing first, but none of my customers called to complain. Smaller stuff continued to work. I didn't know until my monitoring software sent me an email about the heart beat failure. That doesn't work for me. I need to know sooner.
>
> +1
>
>> I thought of trying to handle it with <error-page> in web.xml. Apparently that does not work. I used java.lang.Throwable as the <exception-type>. I was already using this for a number of common exceptions to send me email.
>
> In most OOME situations, your recovery options are limited... because the JVM might need to allocate (a small amount of) memory in order to even report the error.
>
>> I see the OutOfMemoryErrors logged in my catalina.out
>>
>> Is there some way that I can catch this so that I can send email or something? I need to know as soon as possible so that I can attempt diagnosis and restart the server.
>>
>> Google has not been helpful. Everything says that you have to fix the memory leak. Duh. I know that. We've fixed many over the years. We haven't had one in nearly 2 years. We thought we'd fixed them all. We need to find out about them sooner when they do happen.
>
> There are a bunch of things you can try to do. They all have their caveats, failure scenarios, and inefficacies.
>
> 1. Use -XX:OnOutOfMemoryError="<cmd args>;<cmd args>"
>    Rig this to email you, register a passive-check data point with your monitoring server, etc. Just remember that OOMEs happen for a number of reasons. You could have run out of file handles or you could have run out of heap space.
>
> 2. Use JMX monitoring: set java.lang:MemoryPool/[heap space]/UsageThreshold to whatever byte value you want to set as your limit. Then, check java.lang:MemoryPool/[heap space]/UsageThresholdExceeded to see if it is true. If so, your usage threshold has been exceeded. Note that this is not proof positive that an OOME occurred. It's also tough to tell what value to use for the threshold. You can't really set it to MaxHeap - 1 byte, because you'll never get that value in practice. If you set it too low, you'll get warnings all the time when your heap usage rises in the normal course of business.
>
> 3. catch IOException in a filter and set an application attribute. Check this attribute from your monitor. I've been considering doing this, because I can rig it so that the error handler does not actually require any memory to run. The problem is that sometimes OOMEs interrupt one thread and not another. You may not catch the OOME in that thread -- it may happen in a background thread that does not go through the filter.
>
> 4. You can do what I do: simply look at your total heap space by inspecting java.lang:Memory/HeapMemoryUsage[used] and set a threshold that will cause your monitor to alarm for WARNING and CRITICAL conditions. You may recover and not have to check anything. These days, I get a false alarm about once every 3 weeks when the heap space grows a hair higher than usual before a full GC runs and clears everything out.
>
> The nice thing about #4 is that you can find out early if you *might* be having a problem. Then you can keep an eye on your service to make sure it recovers. If it never OOMEs, great. If it does, you can manually restart or whatever. If it OOMEs, and #1-#3 above fail because memory might be required to actually execute the do-this-thing-on-OOME action, then you might never get notified. With #4, you don't have to wait until an OOME to take action.

Here is another discussion of the matter :
http://forum.dlang.org/thread/ikpzfqonfhvrrsthc...@forum.dlang.org?page=3#post-kjcscn:241sap:241:40digitalmars.com
and another :
http://stackoverflow.com/questions/6244055/why-there-are-no-outofmemoryerror-subclasses

Based on : "I see the OutOfMemoryErrors logged in my catalina.out"

If so, can't you pipe your catalina.out through a program that will inspect each line (in real-time), and when it sees such a line, immediately send a
Re: Notification strategy for OutOfMemoryError
On 12/9/2013 5:20 PM, Bill Davidson wrote:
> On 12/9/2013 3:12 PM, Christopher Schultz wrote:
>> 1. Use -XX:OnOutOfMemoryError="<cmd args>;<cmd args>"
>>    Rig this to email you, register a passive-check data point with your monitoring server, etc. Just remember that OOMEs happen for a number of reasons. You could have run out of file handles or you could have run out of heap space.
>
> That looks interesting. It wouldn't tell me about the error but at least I'd know that there was an OOME. Better than nothing, and I can go check catalina.out. Of course, I still have the problem that threads silently fail and show my users not so much as an error message.

I have implemented this one. The only real down side is that it only fires for the first one. Admittedly, I don't want to see hundreds of these, but getting pinged every few minutes would probably be good. I may rig up a way for my script to do that until I turn it off.

Still not entirely sure how to do the other ones.
Re: Notification strategy for OutOfMemoryError
-BEGIN PGP SIGNED MESSAGE- Hash: SHA256 Bill, On 12/9/13, 5:38 PM, Bill Davidson wrote: Last week, one of my servers got an OutOfMemoryError at approximately 1:21pm. :( It's worth pointing out that this is not a trivial issue. My monitoring software which does a heart beat check once per minute did not notice until 3:01pm. Heart beat kept working for over an hour and a half. Was it a transient error, or a chronic condition? A single thread can, for instance, spew objects into its stack or eden space exhausting memory but, when that thread hits the OOME, all those objects are freed which basically recovers from the situation. If, instead, you fill-up some shared cache, buffer, etc. and NO threads can get more memory, then you're basically toast. Which of the above was it? During that time my high capacity high availablity 24/7 application was getting occasional OutOfMemoryError's until memory got bad enough that even the heart beat check servlet failed. Apparently some things that allocate large chunks of memory started failing first, but none of my customers called to complain. Smaller stuff continiued to work. I didn't know until my monitoring software sent me an email about the heart beat failure. That doesn't work for me. I need to know sooner. +1 I thought of trying to handle it with error-page in web.xml. Apparently that does not work. I used java.lang.Throwable as the exception-type. I was already using this for a number of common exceptions to send me email. In most OOME situations, your recovery options are limited... because the JVM might need to allocate (a small amount of) memory in order to even report the error. I see the OutOfMemoryError's logged in my catalina.out Is there some way that I can catch this so that I can send email or something? I need to know as soon as possible so that I can attempt diagnosis and restart the server. Google has not been helpful. Everything says that you have to fix the memory leak. Duh. I know that. 
> We've fixed many over the years. We haven't had one in nearly 2 years. We thought we'd fixed them all. We need to find out about them sooner when they do happen.

There are a bunch of things you can try to do. They all have their caveats, failure scenarios, and inefficacies.

1. Use -XX:OnOutOfMemoryError="<cmd args>;<cmd args>". Rig this to email you, register a passive-check data point with your monitoring server, etc. Just remember that OOMEs happen for a number of reasons. You could have run out of file handles or you could have run out of heap space.

2. Use JMX monitoring: set java.lang:MemoryPool/[heap space]/UsageThreshold to whatever byte value you want to set as your limit. Then, check java.lang:MemoryPool/[heap space]/UsageThresholdExceeded to see if it is true. If so, your usage threshold has been exceeded. Note that this is not proof-positive that an OOME occurred. It's also tough to tell what value to use for the threshold. You can't really set it to MaxHeap - 1 byte, because you'll never get that value in practice. If you set it too low, you'll get warnings all the time when your heap usage rises in the normal course of business.

3. Catch IOException in a filter and set an application attribute. Check this attribute from your monitor. I've been considering doing this, because I can rig it so that the error handler does not actually require any memory to run. The problem is that sometimes OOMEs interrupt one thread and not another. You may not catch the OOME in that thread -- it may happen in a background thread that does not go through the filter.

4. You can do what I do: simply look at your total heap space by inspecting java.lang:Memory/HeapMemoryUsage[used] and set a threshold that will cause your monitor to alarm for WARNING and CRITICAL conditions. You may recover and not have to check anything. These days, I get a false alarm about once every 3 weeks when the heap space grows a hair higher than usual before a full GC runs and clears everything out.
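For option 2 in the list above, a minimal Java sketch of setting and polling the usage threshold from inside the JVM could look like the following. It uses the standard java.lang.management API rather than a remote JMX client. The class name HeapThresholdCheck and the 0.90 fraction are assumptions made for illustration, not anything from this thread, and which pools support a usage threshold depends on the collector in use.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

// Hypothetical helper: arms UsageThreshold on heap pools and polls UsageThresholdExceeded.
class HeapThresholdCheck {

    /** Assumed fraction of each pool's maximum; tune for your own heap behaviour. */
    static final double THRESHOLD_FRACTION = 0.90;

    /** Sets a usage threshold on every heap pool that supports one; returns how many were set. */
    static int armThresholds(double fraction) {
        int armed = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP || !pool.isUsageThresholdSupported()) {
                continue; // not a heap pool, or the pool cannot carry a usage threshold
            }
            MemoryUsage usage = pool.getUsage();
            if (usage == null || usage.getMax() <= 0) {
                continue; // pool no longer valid, or its maximum is undefined
            }
            pool.setUsageThreshold((long) (usage.getMax() * fraction));
            armed++;
        }
        return armed;
    }

    /** Returns true if any heap pool currently reports usage at or above its threshold. */
    static boolean anyThresholdExceeded() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP
                    && pool.isUsageThresholdSupported()
                    && pool.isUsageThresholdExceeded()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("pools armed: " + armThresholds(THRESHOLD_FRACTION));
        System.out.println("threshold exceeded: " + anyThresholdExceeded());
    }
}
```

As noted in option 2, a true result only says that usage crossed the chosen limit, not that an OOME occurred.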
The nice thing about #4 is that you can find out early if you *might* be having a problem. Then you can keep an eye on your service to make sure it recovers. If it never OOME's, great. If it does, you can manually restart or whatever. If it OOME's, and #1-#3 above fail because memory might be required to actually execute the do-this-thing-on-OOME action, then you might never get notified. With #4, you don't have to wait until an OOME to take action.

-chris
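The WARNING/CRITICAL idea in #4 can be sketched in Java as below. This is only an illustration under stated assumptions: the class name HeapAlarm and the 0.80 and 0.95 fractions are hypothetical, and a real monitor would normally read the used and max values over a JMX connection rather than in-process.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical classifier for the HeapMemoryUsage "used" value described in #4.
class HeapAlarm {

    enum Level { OK, WARNING, CRITICAL }

    /** Maps used/max heap bytes onto an alarm level using the given fractions. */
    static Level classify(long usedBytes, long maxBytes, double warnFraction, double critFraction) {
        if (maxBytes <= 0) {
            return Level.OK; // max is undefined (reported as -1), so no ratio can be computed
        }
        double ratio = (double) usedBytes / maxBytes;
        if (ratio >= critFraction) {
            return Level.CRITICAL;
        }
        if (ratio >= warnFraction) {
            return Level.WARNING;
        }
        return Level.OK;
    }

    public static void main(String[] args) {
        // In-process read of the same data exposed as java.lang:type=Memory, HeapMemoryUsage.
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        System.out.println(classify(heap.getUsed(), heap.getMax(), 0.80, 0.95));
    }
}
```

A single sample above the warning fraction can simply be heap growth ahead of a full GC, which is the false-alarm case mentioned above, so the result is best read as "keep an eye on it" rather than as proof of a leak.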
Re: Notification strategy for OutOfMemoryError
On 12/9/2013 3:12 PM, Christopher Schultz wrote:
> Was it a transient error, or a chronic condition? A single thread can, for instance, spew objects into its stack or eden space, exhausting memory, but when that thread hits the OOME, all those objects are freed, which basically recovers from the situation. If, instead, you fill up some shared cache, buffer, etc. and NO threads can get more memory, then you're basically toast. Which of the above was it?

It looked more like the first one, though we still haven't tracked down the cause. We had several dozen threads running at the time. That's common for us. It's not that unusual for us to have a couple of hundred users with active sessions per server at any given time.

> There are a bunch of things you can try to do. They all have their caveats, failure scenarios, and inefficacies.
>
> 1. Use -XX:OnOutOfMemoryError="<cmd args>;<cmd args>". Rig this to email you, register a passive-check data point with your monitoring server, etc. Just remember that OOMEs happen for a number of reasons. You could have run out of file handles or you could have run out of heap space.

That looks interesting. It wouldn't tell me about the error but at least I'd know that there was an OOME. Better than nothing and I can go check catalina.out. Of course, I still have the problem that threads silently fail and show my users not so much as an error message.

> 2. Use JMX monitoring: set java.lang:MemoryPool/[heap space]/UsageThreshold to whatever byte value you want to set as your limit. Then, check java.lang:MemoryPool/[heap space]/UsageThresholdExceeded to see if it is true. If so, your usage threshold has been exceeded. Note that this is not proof-positive that an OOME occurred. It's also tough to tell what value to use for the threshold. You can't really set it to MaxHeap - 1 byte, because you'll never get that value in practice. If you set it too low, you'll get warnings all the time when your heap usage rises in the normal course of business.
I'm less enthused about that one.

> 3. Catch IOException in a filter and set an application attribute. Check this attribute from your monitor. I've been considering doing this, because I can rig it so that the error handler does not actually require any memory to run. The problem is that sometimes OOMEs interrupt one thread and not another. You may not catch the OOME in that thread -- it may happen in a background thread that does not go through the filter.

I'm not sure I understand this one. How does an IOException relate to an OOME?

> 4. You can do what I do: simply look at your total heap space by inspecting java.lang:Memory/HeapMemoryUsage[used] and set a threshold that will cause your monitor to alarm for WARNING and CRITICAL conditions. You may recover and not have to check anything. These days, I get a false alarm about once every 3 weeks when the heap space grows a hair higher than usual before a full GC runs and clears everything out.
>
> The nice thing about #4 is that you can find out early if you *might* be having a problem. Then you can keep an eye on your service to make sure it recovers. If it never OOME's, great. If it does, you can manually restart or whatever. If it OOME's, and #1-#3 above fail because memory might be required to actually execute the do-this-thing-on-OOME action, then you might never get notified. With #4, you don't have to wait until an OOME to take action.

Is there a way I can get to this from my heartbeat servlet?
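On the heartbeat question: code running inside the same JVM can read the heap figures directly through java.lang.management, with no remote JMX connection. A minimal sketch follows. The class HeapHeartbeat and the output format are assumptions for illustration; a heartbeat servlet could write the returned line into its response.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper a heartbeat servlet could call from its doGet method.
class HeapHeartbeat {

    /** Builds a one-line summary of current heap usage for the monitor to parse. */
    static String statusLine() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        // getMax() reports -1 when the maximum heap size is undefined.
        return "heap used=" + heap.getUsed()
                + " committed=" + heap.getCommitted()
                + " max=" + heap.getMax();
    }

    public static void main(String[] args) {
        System.out.println(statusLine());
    }
}
```

Building the string allocates a little memory, so under severe heap pressure this call can itself fail, which is the same caveat raised for the other approaches.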