Re: Recovery from OutOfMemoryError?
On 01/08/2007, at 6:50 PM, Mark H. Wood wrote: Would you (or anyone) care to provide a link to where I can learn more about swatch? Everything I've turned up so far points to a wanna-be replacement for UTC called "internet time" promoted by a watchmaker. http://swatch.sourceforge.net/ http://sourceforge.net/project/showfiles.php?group_id=68627 Cheers Andrew - To start a new topic, e-mail: users@tomcat.apache.org To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Recovery from OutOfMemoryError?
Would you (or anyone) care to provide a link to where I can learn more about swatch? Everything I've turned up so far points to a wanna-be replacement for UTC called "internet time" promoted by a watchmaker.

--
Mark H. Wood, Lead System Programmer [EMAIL PROTECTED]
Typically when a software vendor says that a product is "intuitive" he means the exact opposite.
RE: Recovery from OutOfMemoryError?
> From: Christopher Schultz [mailto:[EMAIL PROTECTED]
> Subject: Re: Recovery from OutOfMemoryError?
>
> > (Sorry for not responding sooner. Went out to dinner and to see the
> > Spider Pig movie :-)
>
> Nice. ;)

The pig completely disappeared half way through the movie, but there are rumors it will show up at the beginning of next season. Was the pig's vanishing related to the guilt trip Santa's Little Helper admitted to at the end? Inquiring minds want to know...

> Which JVM are you working on, though? One of the
> mainstream ones?

The one we ship for our mainframes is based on Sun's current one, but uses our own allocator, GC, and JIT, among other things. The replaced GC mechanism does not change the visible semantics of such operations, just the internal workings.

> Or something designed to be super high-availability (not that the
> mainstream ones aren't...)?

I don't think it's any more or less reliable than other HotSpot-based JVMs.

> The only conclusion that I could draw was that some user (or several
> users) caused the OOME and permanently disabled the server.

One possibility is that there's really an ongoing memory leak in his webapps, and enough junk accumulates to eat up most of the heap after a while. Then a large, unsatisfiable request is made, there's no recovery logic built into the app, and the failure leaves some application structures in an inconsistent state.

> The server should keep going, right?

It should indeed. The fact that it doesn't says he may be doing something odd at the time of the failure.

> Maybe he's busting his PermGen, but that's unlikely since he says
> it only happens under peak load.

Due to the use of reflection within Tomcat, there are many anonymous classes created during normal operation. These are discardable immediately after processing of each request, but I suppose if enough requests are going on concurrently and the size of the PermGen was marginal to begin with, it could be the source of the problem. We really need a lot more details to answer this.

> So, what is the likely cause of the tech support call?

Simply leaving a lock held could hang the application. (Referring to the java.util.concurrent kind here, not synchronized blocks or methods.) Leaving out a simple finally clause to release resources in failure cases can easily result in a dead webapp.

 - Chuck

THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY MATERIAL and is thus for use only by the intended recipient. If you received this in error, please contact the sender and delete the e-mail and its attachments from all computers.
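A minimal Java sketch of the point about java.util.concurrent locks and the finally clause. The class and method names are hypothetical (nothing here comes from Tomcat or from the original poster's webapp), and the OutOfMemoryError is assumed to come from whatever work the caller passes in.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical example class; not part of Tomcat or of any poster's code.
class LockReleaseSketch {
    private final ReentrantLock lock = new ReentrantLock();

    /**
     * Runs the given work while holding the lock.
     * Returns false if the work ran out of memory, true otherwise.
     */
    boolean process(Runnable work) {
        lock.lock();
        try {
            work.run();               // may throw OutOfMemoryError
            return true;
        } catch (OutOfMemoryError e) {
            return false;             // report the failure to the caller
        } finally {
            lock.unlock();            // released on success and on failure alike
        }
    }

    /** True when no thread currently holds the lock. */
    boolean lockIsFree() {
        return !lock.isLocked();
    }
}
```

Without the finally block, an error thrown inside work.run() would leave the lock held, and every later request needing it would block forever, which would look like a hung webapp from the outside.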
Re: Recovery from OutOfMemoryError?
On 01/08/2007, at 3:44 PM, Christopher Schultz wrote:

> I'm guessing he's running a webapp, and that one of the request worker
> threads got an OOME. Most webapp requests are idempotent (or should be),
> and those that aren't are generally wrapped around database or other
> transactions. Assuming I'm right (which is frequently dangerous), one
> failed request should not affect the rest of the application. Any
> locally-instantiated objects should be ripe for collection, including
> any of the "big ones" that probably caused the OOME in the first place.
> The server should keep going, right?

It sounds as if the original poster doesn't really have much say in how the thing is programmed, and is trying to find a solution to his problem, which is being called at 3am. Having swatch keep its eyes on catalina.out and then run killall -9 java followed by ./bin/startup.sh should solve this.

As for the rest of the memory issues: catching OOM doesn't really help you, as Tomcat does not catch OOM. It throws it all the way up to the top, at which stage the JVM dies. I.e., your thread uses all the memory, Tomcat then receives a new request and tries to allocate memory for a new object, and poof. Even though your code deals nicely with the OOM situation, Tomcat doesn't.

Cheers

Andrew

PS: I can't wait for the day where Java gets pointers and the sizeof operator...
RE: Recovery from OutOfMemoryError?
> From: Christopher Schultz [mailto:[EMAIL PROTECTED]
> Subject: Re: Recovery from OutOfMemoryError?
>
> I generally think of a "failure return" as simply allowing the
> exception to propagate.

That can work, but since OOMEs don't require "throws" declarations on methods, it's usually better for the lump of code that took the original hit to wrap the OOME in something unique to the function. This also helps get the programmers who write code that calls the lump in question to think about the issue.

 - Chuck
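A rough Java sketch of wrapping the OOME in something unique to the function, as suggested above. ReportTooLargeException and ReportBuilder are made-up names for illustration only; the idea is simply that a checked exception shows up in the method signature, so callers have to think about it.

```java
// Hypothetical checked exception, specific to one function.
class ReportTooLargeException extends Exception {
    ReportTooLargeException(String message, Throwable cause) {
        super(message, cause);
    }
}

// Hypothetical "lump of code" that takes the original hit.
class ReportBuilder {
    /**
     * Allocates the buffer for a report. Because the exception is checked,
     * the compiler forces every caller to catch it or declare it.
     */
    static int[] buildReport(int cells) throws ReportTooLargeException {
        try {
            return new int[cells];    // the allocation that may fail
        } catch (OutOfMemoryError e) {
            // Translate the unchecked error into a function-specific failure.
            throw new ReportTooLargeException(
                    "cannot build a report of " + cells + " cells", e);
        }
    }
}
```

A caller would then write try { ReportBuilder.buildReport(n); } catch (ReportTooLargeException e) { ... } and decide there whether to retry, shrink the request, or give up.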
Re: Recovery from OutOfMemoryError?
Chuck,

Caldarale, Charles R wrote:
>> From: Christopher Schultz [mailto:[EMAIL PROTECTED]
>> Subject: Re: Recovery from OutOfMemoryError?
>
> (Sorry for not responding sooner. Went out to dinner and to see the
> Spider Pig movie :-)

Nice. ;)

>> Actually, my past experience has been that it's the GC
>> thread that OOMEs, not a worker thread.
>
> Assuming we're talking about a current HotSpot-based JVM, the threads
> doing GCs cannot get OOMEs, since they are dedicated to doing just GC
> operations, and never do any object allocations themselves. On older
> JVMs (and some from other vendors), the thread that initially encounters
> an allocation failure also does the GC; if the GC fails to recover
> enough memory, it can generate an OOME for itself.

Like I said, it's been a long time since I've had to worry about OOMEs that didn't result from honestly having too small a heap to handle the program's needs. It was probably a 1.3 JVM or something like that.

>> It has always been my understanding that a JVM that suffers an OOME
>> is all but done for.
>
> The JVM itself doesn't care about any exceptions thrown at the
> application. There are certainly a ton of applications that handle such
> error conditions very badly, and hang themselves up by doing such things
> as trying to display messages rather than nulling out now useless
> references. Some of the stress-testing of our JVM involves running apps
> designed to provoke OOMEs; these readily recover and keep on truckin'.

Right. Which JVM are you working on, though? One of the mainstream ones? Or something designed to be super high-availability (not that the mainstream ones aren't...)?

>> The OP would seem to corroborate this claim, since it sounds like his
>> whole app server becomes unresponsive once he gets an OOME (hence the
>> early morning phone calls).
>
> The supposed timing of the phone calls leaves me somewhat skeptical;
> what are they running where the peak load occurs at 3 AM?

I had thought of that, and it didn't make a whole lot of sense to me. The only conclusion that I could draw was that some user (or several users) caused the OOME and permanently disabled the server. At 03:00 (or so) other users, perhaps in a different timezone, started trying to use the server and found it unresponsive.

Then again, maybe he runs an adult website that gets most of its traffic at 3 in the morning. If not, whoever he works for needs to get a more geographically diverse tech support team ;)

>> If your assertion (OOMEs can be ignored, since only one allocation
>> fails and the rest of the VM is fine) were true, then the OP would
>> not be getting any calls in the middle of the night: the user would
>> simply re-try the request and (hopefully) get a result the second
>> time.
>
> That's not what I said at all.

Sorry. I was trying to recap a nuanced position in a single sentence.

> Each logical module should be designed
> to handle such situations, typically by discarding what has been done up
> to the point of failure, and then returning an error to its caller.
> What is likely to have happened instead in the OP's case is that the app
> encountering the OOME had no provision at all for error recovery, and
> simply quit, leaving many now useless objects around with live
> references to them. It may have even made matters worse by trying to
> generate an error message of some sort.

I'm guessing he's running a webapp, and that one of the request worker threads got an OOME. Most webapp requests are idempotent (or should be), and those that aren't are generally wrapped around database or other transactions.

Assuming I'm right (which is frequently dangerous), one failed request should not affect the rest of the application. Any locally-instantiated objects should be ripe for collection, including any of the "big ones" that probably caused the OOME in the first place. The server should keep going, right?

For some reason, it doesn't. Maybe he's busting his PermGen, but that's unlikely since he says it only happens under peak load.

So, what is the likely cause of the tech support call? The server must have gone down, right? If it wasn't the servlet, and it wasn't Tomcat, and it wasn't the JVM, what caused the outage?

-chris
Re: Recovery from OutOfMemoryError?
Chuck,

Caldarale, Charles R wrote:
>> From: Christopher Schultz [mailto:[EMAIL PROTECTED]
>> Subject: Re: Recovery from OutOfMemoryError?
>>
>> Are you suggesting that all methods should be written as
>> loops around attempts to do real work, catching OOME and
>> re-trying until the work gets done?
>
> Sort of, but not at the method level - something on a larger scale.
> Think recoverable database operations, where nothing is permanently
> stored until a commit happens. And perpetual retry isn't needed - just
> a failure return to the caller.

I generally think of a "failure return" as simply allowing the exception to propagate.

-chris
Re: Recovery from OutOfMemoryError?
I have recently changed a lot of my old perceptions on this matter after reading this excellent article: http://www.ibm.com/developerworks/java/library/j-jtp01274.html

If you change your mindset when you write your apps to consider how the garbage collector actually operates, then those memory errors are less likely to come back and bite you. And on the subject of soft references, I started using them, as well as transient declarations on some objects I didn't need to persist in serializable classes, and it really helps reduce the load.

Java 6 also comes with JConsole, a really handy profiling tool; make the most of it.

Peter

Caldarale, Charles R wrote:
>> From: Christopher Schultz [mailto:[EMAIL PROTECTED]
>> Subject: Re: Recovery from OutOfMemoryError?
>
> (Sorry for not responding sooner. Went out to dinner and to see the
> Spider Pig movie :-)
>
>> Actually, my past experience has been that it's the GC
>> thread that OOMEs, not a worker thread.
>
> Assuming we're talking about a current HotSpot-based JVM, the threads
> doing GCs cannot get OOMEs, since they are dedicated to doing just GC
> operations, and never do any object allocations themselves. On older
> JVMs (and some from other vendors), the thread that initially encounters
> an allocation failure also does the GC; if the GC fails to recover
> enough memory, it can generate an OOME for itself.
>
>> It has always been my understanding that a JVM that suffers an OOME
>> is all but done for.
>
> The JVM itself doesn't care about any exceptions thrown at the
> application. There are certainly a ton of applications that handle such
> error conditions very badly, and hang themselves up by doing such things
> as trying to display messages rather than nulling out now useless
> references. Some of the stress-testing of our JVM involves running apps
> designed to provoke OOMEs; these readily recover and keep on truckin'.
>
>> The OP would seem to corroborate this claim, since it sounds like his
>> whole app server becomes unresponsive once he gets an OOME (hence the
>> early morning phone calls).
>
> The supposed timing of the phone calls leaves me somewhat skeptical;
> what are they running where the peak load occurs at 3 AM?
>
>> If your assertion (OOMEs can be ignored, since only one allocation
>> fails and the rest of the VM is fine) were true, then the OP would
>> not be getting any calls in the middle of the night: the user would
>> simply re-try the request and (hopefully) get a result the second time.
>
> That's not what I said at all. Each logical module should be designed
> to handle such situations, typically by discarding what has been done up
> to the point of failure, and then returning an error to its caller.
> What is likely to have happened instead in the OP's case is that the app
> encountering the OOME had no provision at all for error recovery, and
> simply quit, leaving many now useless objects around with live
> references to them. It may have even made matters worse by trying to
> generate an error message of some sort.
>
>  - Chuck

--
Peter Stavrinides
Albourne Partners (Cyprus) Ltd
Tel: +357 22 750652
RE: Recovery from OutOfMemoryError?
> From: Christopher Schultz [mailto:[EMAIL PROTECTED]
> Subject: Re: Recovery from OutOfMemoryError?

(Sorry for not responding sooner. Went out to dinner and to see the Spider Pig movie :-)

> Actually, my past experience has been that it's the GC
> thread that OOMEs, not a worker thread.

Assuming we're talking about a current HotSpot-based JVM, the threads doing GCs cannot get OOMEs, since they are dedicated to doing just GC operations, and never do any object allocations themselves. On older JVMs (and some from other vendors), the thread that initially encounters an allocation failure also does the GC; if the GC fails to recover enough memory, it can generate an OOME for itself.

> It has always been my understanding that a JVM that suffers an OOME
> is all but done for.

The JVM itself doesn't care about any exceptions thrown at the application. There are certainly a ton of applications that handle such error conditions very badly, and hang themselves up by doing such things as trying to display messages rather than nulling out now useless references. Some of the stress-testing of our JVM involves running apps designed to provoke OOMEs; these readily recover and keep on truckin'.

> The OP would seem to corroborate this claim, since it sounds like his
> whole app server becomes unresponsive once he gets an OOME (hence the
> early morning phone calls).

The supposed timing of the phone calls leaves me somewhat skeptical; what are they running where the peak load occurs at 3 AM?

> If your assertion (OOMEs can be ignored, since only one allocation
> fails and the rest of the VM is fine) were true, then the OP would
> not be getting any calls in the middle of the night: the user would
> simply re-try the request and (hopefully) get a result the second time.

That's not what I said at all. Each logical module should be designed to handle such situations, typically by discarding what has been done up to the point of failure, and then returning an error to its caller. What is likely to have happened instead in the OP's case is that the app encountering the OOME had no provision at all for error recovery, and simply quit, leaving many now useless objects around with live references to them. It may have even made matters worse by trying to generate an error message of some sort.

 - Chuck
RE: Recovery from OutOfMemoryError?
> From: Christopher Schultz [mailto:[EMAIL PROTECTED]
> Subject: Re: Recovery from OutOfMemoryError?
>
> Are you suggesting that all methods should be written as
> loops around attempts to do real work, catching OOME and
> re-trying until the work gets done?

Sort of, but not at the method level - something on a larger scale. Think recoverable database operations, where nothing is permanently stored until a commit happens. And perpetual retry isn't needed - just a failure return to the caller.

> IMHO, it's not a library's problem if there wasn't enough memory to
> perform its duty. It's the driver's responsibility to catch and
> re-attempt anything important.

Agreed. Unfortunately, many such drivers seem to ignore the possibility of failure of libraries.

 - Chuck
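One way to picture the "nothing is permanently stored until a commit happens" idea in plain Java is sketched below. This is only an illustration under assumed names (CommitOnSuccess, importBatch); it is not a database API and not anything from the original poster's application.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical module that stages its work and commits only on success.
class CommitOnSuccess {
    private List<String> committed = new ArrayList<>();

    /**
     * Loads a batch and appends it to the committed data.
     * Returns false (a "failure return") if memory ran out; in that case
     * the committed data is exactly what it was before the call.
     */
    boolean importBatch(Supplier<List<String>> loader) {
        List<String> staged;
        try {
            staged = new ArrayList<>(committed);  // work on a private copy
            staged.addAll(loader.get());          // may throw OutOfMemoryError
        } catch (OutOfMemoryError e) {
            // The partial copy is unreachable once we return, so the
            // collector can reclaim everything done up to the failure.
            return false;
        }
        committed = staged;                       // the "commit": one assignment
        return true;
    }

    int size() {
        return committed.size();
    }
}
```

The copy costs memory of its own, so this particular staging approach is a trade-off; the point is only that a failure part-way through leaves no half-updated shared state behind.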
Re: Recovery from OutOfMemoryError?
On 8/1/07, Caldarale, Charles R <[EMAIL PROTECTED]> wrote:
> ... but these often aren't learned until
> something catastrophic happens.

great sentence :-)

Leon
Re: Recovery from OutOfMemoryError?
Chuck,

Caldarale, Charles R wrote:
>> From: Leon Rosenberg [mailto:[EMAIL PROTECTED]
>> Subject: Re: Recovery from OutOfMemoryError?
>>
>> Thats however strongly depend on where it happened... if for example
>> the code in question was a middleware stub which is left in
>> unpredictable state, or the orb itself, or any kind of stack
>> somewhere, or a processing queue, or some background threads... or 3rd
>> party libraries...
>
> Agreed - but the above defines software of somewhat questionable
> quality, not written with robustness in mind. But if it's not a
> critical environment, the occasional outage may not matter, so robust
> algorithms are not always needed.

Are you suggesting that all methods should be written as loops around attempts to do real work, catching OOME and re-trying until the work gets done? That's what it sounds like, here.

IMHO, it's not a library's problem if there wasn't enough memory to perform its duty. It's the driver's responsibility to catch and re-attempt anything important.

-chris
Re: Recovery from OutOfMemoryError?
Chuck,

Caldarale, Charles R wrote:
>> From: Christopher Schultz [mailto:[EMAIL PROTECTED]
>> Subject: Re: Recovery from OutOfMemoryError?
>>
>> A thread that suffers an OOME is pretty much hosed, anyway, so
>> counting on it to do any kind of recovery is difficult.
>
> Why do you say that? The only thing that failed is the allocation of
> some particular object, leaving the rest of the thread's state intact.

In my experience, OOMEs are not just caused by the failure of a large allocation, such as a huge array or something, but rather tons of small allocations. One extra value object that puts the heap over the top.

> In most cases, it's easy to return a failure notification to the
> caller of whatever method encountered the error.

In these cases, attempting to open a file, start a process, send an email, or even generate an "error" page, etc. would simply result in another OOME.

Actually, my past experience has been that it's the GC thread that OOMEs, not a worker thread. In this case, the VM really is hosed.

You have taken issue with this assertion in the past. It has always been my understanding that a JVM that suffers an OOME is all but done for. The OP would seem to corroborate this claim, since it sounds like his whole app server becomes unresponsive once he gets an OOME (hence the early morning phone calls).

If your assertion (OOMEs can be ignored, since only one allocation fails and the rest of the VM is fine) were true, then the OP would not be getting any calls in the middle of the night: the user would simply re-try the request and (hopefully) get a result the second time.

-chris
RE: Recovery from OutOfMemoryError?
> From: Leon Rosenberg [mailto:[EMAIL PROTECTED]
> Subject: Re: Recovery from OutOfMemoryError?
>
> Thats however strongly depend on where it happened... if for example
> the code in question was a middleware stub which is left in
> unpredictable state, or the orb itself, or any kind of stack
> somewhere, or a processing queue, or some background threads... or 3rd
> party libraries...

Agreed - but the above defines software of somewhat questionable quality, not written with robustness in mind. But if it's not a critical environment, the occasional outage may not matter, so robust algorithms are not always needed.

> I think there are very few places where an oome should be caught and
> can be handled properly, or you have to surround each new with
> try/catch

Certainly you don't want try/catch everywhere, but as you say, it is needed in state-altering places so that restoration to a usable condition can be done when necessary. Employing techniques such as acquiring all necessary data structures before manipulating pointers in doubly-linked lists goes a long way towards eliminating the need for complex backout mechanisms; but these often aren't learned until something catastrophic happens.

 - Chuck
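The doubly-linked list technique mentioned above can be sketched in Java roughly as follows. The class is a hypothetical teaching example (java.util.LinkedList would be the normal choice in real code); what matters is the ordering: every allocation happens before the first pointer is changed.

```java
// Hypothetical minimal list, written only to show the allocation ordering.
class DoublyLinkedList<T> {
    private static final class Node<T> {
        final T value;
        Node<T> prev;
        Node<T> next;
        Node(T value) { this.value = value; }
    }

    private Node<T> head;
    private Node<T> tail;
    private int size;

    void addLast(T value) {
        // Step 1: acquire everything needed. This is the only allocation;
        // if it throws OutOfMemoryError, the list has not been touched.
        Node<T> node = new Node<>(value);

        // Step 2: manipulate the pointers. Nothing below allocates, so the
        // list cannot be left half-linked by an allocation failure.
        node.prev = tail;
        if (tail == null) {
            head = node;
        } else {
            tail.next = node;
        }
        tail = node;
        size++;
    }

    int size() {
        return size;
    }

    /** Walks the list in both directions and checks both agree with size. */
    boolean isConsistent() {
        int forward = 0;
        for (Node<T> n = head; n != null; n = n.next) forward++;
        int backward = 0;
        for (Node<T> n = tail; n != null; n = n.prev) backward++;
        return forward == size && backward == size;
    }
}
```

Because the failure can only happen before any state changes, no backout code is needed at all in addLast.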
Re: Recovery from OutOfMemoryError?
On 7/31/07, Caldarale, Charles R <[EMAIL PROTECTED]> wrote:
> > From: Christopher Schultz [mailto:[EMAIL PROTECTED]
> > Subject: Re: Recovery from OutOfMemoryError?
> >
> > A thread that suffers an OOME is pretty much hosed, anyway, so
> > counting on it to do any kind of recovery is difficult.
>
> Why do you say that? The only thing that failed is the allocation of
> some particular object, leaving the rest of the thread's state intact.
> In most cases, it's easy to return a failure notification to the caller
> of whatever method encountered the error. Unless one's design is based
> on wishful thinking, of course...

That, however, strongly depends on where it happened... if for example the code in question was a middleware stub which is left in an unpredictable state, or the orb itself, or any kind of stack somewhere, or a processing queue, or some background threads... or 3rd party libraries... the chances of recovering from an oome are pretty low in my opinion, and even if you recover (unless it was an unusually expensive request) the next request you get will bring you into the same situation...

I think there are very few places where an oome should be caught and can be handled properly, or you have to surround each new with try/catch

regards
Leon
Re: Recovery from OutOfMemoryError?
How about using SoftReference to store _large_ or some session data? It does add overhead, since data must be re-retrieved if it has been collected, but at least the JVM is guaranteed to clear soft references before it throws an oome, and chances are good that you will lose inactive sessions earlier.

regards
leon

p.s. of course only a workaround but may help if data size is unpredictable.

On 7/31/07, Craig Berry <[EMAIL PROTECTED]> wrote:
> The trouble is that our memory demand per user session is unpredictable.
> Some user sessions do things that barely touch the heap; other sessions
> can make huge demands. It depends on what the user chooses to do during
> the session. So throttling user count down to make it utterly safe
> would be impractical. Instead, statistically, it's unlikely that more
> than one or two memory-hungry sessions will be active at any given time.
> When we get more than that at once, we risk an OOME.
>
> -Original Message-
> From: Andrew Miehs [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, July 31, 2007 10:01 AM
> To: Tomcat Users List
> Subject: Re: Recovery from OutOfMemoryError?
>
> On 31/07/2007, at 6:52 PM, Craig Berry wrote:
>
> > Fixing the bug would be cool, but the "bug" is actually just too many
> > users contending for the same heap space, so that's going to be tough.
> > I'd thought of the log watcher, but that seems a rather blunt
> > instrument; I was thinking there might be some kind of Tomcat (or JVM)
> > intrinsic mechanism for this.
>
> How much heap space do you have set?! Why don't you just increase it?
>
> If not, why not decrease the number of users you allow onto the server?
>
> Restarting Tomcat is even more 'blunt' than allowing access to
> fewer users...
>
> Confused...
>
> Andrew
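A small Java sketch of the SoftReference suggestion. SoftCache and its loader function are hypothetical names, and simulateCollection exists only so the reload path can be exercised without waiting for real memory pressure. Note that soft references only ensure the cached data is released before an OutOfMemoryError is thrown; they do not rule out an OOME caused by other, strongly referenced data.

```java
import java.lang.ref.SoftReference;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Hypothetical cache for large, re-creatable per-session data.
class SoftCache<K, V> {
    private final ConcurrentMap<K, SoftReference<V>> map = new ConcurrentHashMap<>();
    private final Function<K, V> loader;   // assumed way to re-retrieve the data

    SoftCache(Function<K, V> loader) {
        this.loader = loader;
    }

    /** Returns the cached value, reloading it if the collector cleared it. */
    V get(K key) {
        SoftReference<V> ref = map.get(key);
        V value = (ref == null) ? null : ref.get();
        if (value == null) {
            value = loader.apply(key);      // the re-retrieval overhead
            map.put(key, new SoftReference<>(value));
        }
        return value;
    }

    /** Demo helper: clears the reference the way the collector might. */
    void simulateCollection(K key) {
        SoftReference<V> ref = map.get(key);
        if (ref != null) {
            ref.clear();
        }
    }
}
```

In a webapp, the cache key would typically be something like the session id, and the loader would be whatever query or computation produced the large data in the first place.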
RE: Recovery from OutOfMemoryError?
> From: Christopher Schultz [mailto:[EMAIL PROTECTED]
> Subject: Re: Recovery from OutOfMemoryError?
>
> A thread that suffers an OOME is pretty much hosed, anyway, so
> counting on it to do any kind of recovery is difficult.

Why do you say that? The only thing that failed is the allocation of some particular object, leaving the rest of the thread's state intact. In most cases, it's easy to return a failure notification to the caller of whatever method encountered the error. Unless one's design is based on wishful thinking, of course...

 - Chuck
Re: Recovery from OutOfMemoryError?
How about keeping track of how many of these big operations are running (using a synchronized counter) and returning a 503 status for "big" requests that come in when the system is busy? Only the servlet(s) or page(s) that handle these big requests would be limited; the rest of the webapp would keep handling requests as normal.

--
Len

On 7/31/07, Craig Berry <[EMAIL PROTECTED]> wrote:
> The trouble is that our memory demand per user session is unpredictable.
> Some user sessions do things that barely touch the heap; other sessions
> can make huge demands. It depends on what the user chooses to do during
> the session. So throttling user count down to make it utterly safe
> would be impractical. Instead, statistically, it's unlikely that more
> than one or two memory-hungry sessions will be active at any given time.
> When we get more than that at once, we risk an OOME.
>
> -----Original Message-----
> From: Andrew Miehs [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, July 31, 2007 10:01 AM
> To: Tomcat Users List
> Subject: Re: Recovery from OutOfMemoryError?
>
> On 31/07/2007, at 6:52 PM, Craig Berry wrote:
>
> > Fixing the bug would be cool, but the "bug" is actually just too many
> > users contending for the same heap space, so that's going to be tough.
> > I'd thought of the log watcher, but that seems a rather blunt
> > instrument; I was thinking there might be some kind of Tomcat (or JVM)
> > intrinsic mechanism for this.
>
> How much heap space do you have set?! Why don't you just increase it?
>
> If not, why not decrease the number of users you allow onto the server?
>
> Restarting Tomcat is even more 'blunt' than allowing access to
> fewer users...
>
> Confused...
>
> Andrew
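A rough sketch of the throttling idea above, for illustration only. It swaps the synchronized counter for a `java.util.concurrent.Semaphore`, which does the same counting. The class `BigRequestGate` and its `handle` method are hypothetical names; the status codes are returned as plain ints so the example runs without the servlet API. In a real servlet the 503 would presumably be sent with `HttpServletResponse.sendError`.

```java
import java.util.concurrent.Semaphore;

/** Hypothetical gate limiting how many "big" operations run at once. */
public final class BigRequestGate {

    public static final int SC_OK = 200;
    public static final int SC_SERVICE_UNAVAILABLE = 503;

    private final Semaphore permits;

    public BigRequestGate(int maxConcurrentBigRequests) {
        this.permits = new Semaphore(maxConcurrentBigRequests);
    }

    /**
     * Runs the task and returns 200 if a slot is free; otherwise returns 503
     * immediately without running the task.
     */
    public int handle(Runnable bigTask) {
        if (!permits.tryAcquire()) {
            return SC_SERVICE_UNAVAILABLE;
        }
        try {
            bigTask.run();
            return SC_OK;
        } finally {
            // Always free the slot, even if the task throws.
            permits.release();
        }
    }

    public static void main(String[] args) {
        BigRequestGate gate = new BigRequestGate(1);
        int[] nested = new int[1];
        // While the outer task holds the only slot, a second request is refused.
        int outer = gate.handle(() -> nested[0] = gate.handle(() -> { }));
        System.out.println(outer + " " + nested[0]); // prints "200 503"
    }
}
```

Only the code paths wrapped in `handle` are limited, so ordinary requests keep flowing. The right slot count would have to come from profiling; nothing in the thread says what it should be.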
Re: Recovery from OutOfMemoryError?
Craig,

Craig Berry wrote:
> Fixing the bug would be cool, but the "bug" is actually just too many
> users contending for the same heap space, so that's going to be tough.

Too many users logged-in, or too many simultaneous connections? If the latter, you can simply limit the number of simultaneous (active) connections using the attributes in server.xml. If the former, you have many options (from easiest to most difficult):

1. Increase the heap size (you probably already did that).
2. Shorten the session timeout to destroy sessions more quickly.
3. Buy more memory.
4. Cluster your applications among multiple servers.
5. Lighten the amount of information you store in the session to reclaim memory.

> I'd thought of the log watcher, but that seems a rather blunt
> instrument; I was thinking there might be some kind of Tomcat (or JVM)
> intrinsic mechanism for this.

Not really. There's no global "exception listener" or anything. Exceptions are generally handled by the thread that is executing at the time. A thread that suffers an OOME is pretty much hosed, anyway, so counting on it to do any kind of recovery is difficult.

-chris
RE: Recovery from OutOfMemoryError?
Hi, Marco,

Yes, our memory allocation sizes are carefully selected, and set near the maximum available in 32-bit Java. We're investigating running under a 64-bit JVM to enable the use of additional heap space.

-----Original Message-----
From: Marco [mailto:[EMAIL PROTECTED]
Sent: Tuesday, July 31, 2007 10:40 AM
To: 'Tomcat Users List'
Subject: RE: Recovery from OutOfMemoryError?

Dear Craig,

Are you aware that the JVM uses only a limited amount of memory, even when plenty of system memory is available? If your application consumes a lot of memory, you could change the settings in the tomcat6.conf file:

#JAVA_OPTS="-Xminf0.1 -Xmaxf0.3"
JAVA_OPTS="-Xmx1024M -Xms512M"

Or higher, depending on your system configuration. Not too high, though: you need to test, because a value that is too large can prevent Tomcat from starting. Second, consider using the -server setting:

JAVA_OPTS="-server -Xmx1024M -Xms512M"

More information at http://java.sun.com/j2se/1.5.0/docs/tooldocs/solaris/java.html#standard

Marco

-----Original Message-----
From: Craig Berry [mailto:[EMAIL PROTECTED]
Sent: Tuesday, July 31, 2007 18:44
To: users@tomcat.apache.org
Subject: Recovery from OutOfMemoryError?

Our Tomcat-based app suffers from occasional OutOfMemoryErrors. We have found that we need to manually restart Tomcat when these happen; frequently the Tomcat process appears to be working after the error, but is actually crippled in one way or another by the loss of some key thread.

We would very much like to trigger an automated Tomcat restart when an OOME occurs. Does anyone have suggestions on the cleanest, safest way to arrange this? (We're running Tomcat as a normal process under Linux, if that matters.)

--
Craig Berry
Principal Architect and Technical Manager
PortBlue Corporation (http://www.portblue.com/)
Re: Recovery from OutOfMemoryError?
On 31/07/2007, at 7:39 PM, Marco wrote:

> Dear Craig,
>
> Are you aware that the JVM uses only a limited amount of memory, even
> when plenty of system memory is available? If your application consumes
> a lot of memory, you could change the settings in the tomcat6.conf file:
>
> #JAVA_OPTS="-Xminf0.1 -Xmaxf0.3"
> JAVA_OPTS="-Xmx1024M -Xms512M"

-Xmx and -Xms should be set to the same value for a server application. And as mentioned by someone earlier, you will probably want to increase MaxPermSize as well.

-Xmx1500m -Xms1500m -XX:MaxPermSize=256m

Cheers

Andrew
RE: Recovery from OutOfMemoryError?
Dear Craig,

Are you aware that the JVM uses only a limited amount of memory, even when plenty of system memory is available? If your application consumes a lot of memory, you could change the settings in the tomcat6.conf file:

#JAVA_OPTS="-Xminf0.1 -Xmaxf0.3"
JAVA_OPTS="-Xmx1024M -Xms512M"

Or higher, depending on your system configuration. Not too high, though: you need to test, because a value that is too large can prevent Tomcat from starting. Second, consider using the -server setting:

JAVA_OPTS="-server -Xmx1024M -Xms512M"

More information at http://java.sun.com/j2se/1.5.0/docs/tooldocs/solaris/java.html#standard

Marco

-----Original Message-----
From: Craig Berry [mailto:[EMAIL PROTECTED]
Sent: Tuesday, July 31, 2007 18:44
To: users@tomcat.apache.org
Subject: Recovery from OutOfMemoryError?

Our Tomcat-based app suffers from occasional OutOfMemoryErrors. We have found that we need to manually restart Tomcat when these happen; frequently the Tomcat process appears to be working after the error, but is actually crippled in one way or another by the loss of some key thread.

We would very much like to trigger an automated Tomcat restart when an OOME occurs. Does anyone have suggestions on the cleanest, safest way to arrange this? (We're running Tomcat as a normal process under Linux, if that matters.)

--
Craig Berry
Principal Architect and Technical Manager
PortBlue Corporation (http://www.portblue.com/)
Re: Recovery from OutOfMemoryError?
On 31/07/2007, at 7:19 PM, Caldarale, Charles R wrote:

> > From: Craig Berry [mailto:[EMAIL PROTECTED]
> > Subject: RE: Recovery from OutOfMemoryError?
> >
> > It depends on what the user chooses to do during the session.
>
> Again, try another point of view. It's what the webapps choose to do in
> response to user requests that provokes the problem. Is there some spot
> in your code that's making a grab for a big array and failing to handle
> the possibility of allocation failure? Or have you simply
> over-configured the number of connector threads for the size heap
> you're running?

I would also strongly agree with the "fix the problem" solution, but if you really want to 'kick' your users out, then have a look at swatch. IIRC it can perform a task on receiving a log message.

Cheers

Andrew
Re: Recovery from OutOfMemoryError?
Hi Craig,

if you get OutOfMemoryError reliably in your log, you might consider the tanukisoftware Java Service Wrapper as an intermediate "solution". It can watch the output and automatically restart Tomcat. I would not combine it with an app that has a very high volume of log output, though. Also, I think the output must go through STDOUT, so if the error occurs in different log files, you have to send all of those to stdout. Of course, with log frameworks you can define stdout as an additional log target. Setting this service wrapper up will take you some time, but it's a very powerful wrapper.

Java 6 has

-XX:OnOutOfMemoryError="cmd1 args...;cmd2 ..."

which could also be useful (sending mail etc.). I don't really know how reliable it is, but setup is definitely faster than the service wrapper. If this works, please report back; it might be useful for others as well. Be careful when designing cmd and args about assumptions concerning the current working directory and environment.

Regards,

Rainer

Craig Berry wrote:
> Our Tomcat-based app suffers from occasional OutOfMemoryErrors. We have
> found that we need to manually restart Tomcat when these happen;
> frequently the Tomcat process appears to be working after the error, but
> is actually crippled in one way or another by the loss of some key
> thread.
>
> We would very much like to trigger an automated Tomcat restart when an
> OOME occurs. Does anyone have suggestions on the cleanest, safest way
> to arrange this? (We're running Tomcat as a normal process under Linux,
> if that matters.)
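To make the "watch the output" idea concrete, here is a minimal sketch of the core loop a log watcher might run. It is not how swatch or the Java Service Wrapper are implemented. The class `OomeLogWatcher`, the method `watch`, and the sample log lines are hypothetical; the restart action is passed in as a `Runnable` so that nothing about the actual restart command is assumed.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Hypothetical watcher: scans log lines and fires an action on the first OOME. */
public final class OomeLogWatcher {

    private static final String MARKER = "java.lang.OutOfMemoryError";

    private OomeLogWatcher() {}

    /**
     * Reads lines until end of stream. Runs onOome once, at the first line
     * containing the marker, and returns true; returns false if no such
     * line is seen.
     */
    public static boolean watch(BufferedReader log, Runnable onOome) {
        try {
            String line;
            while ((line = log.readLine()) != null) {
                if (line.contains(MARKER)) {
                    onOome.run();
                    return true;
                }
            }
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample output standing in for Tomcat's stdout.
        String sample = "INFO: Server startup in 1234 ms\n"
                + "SEVERE: java.lang.OutOfMemoryError: Java heap space\n";
        boolean seen = watch(new BufferedReader(new StringReader(sample)),
                () -> System.out.println("restart would be triggered here"));
        System.out.println(seen);
    }
}
```

A watcher like this would need to run in a separate process from Tomcat, since a JVM that has just run out of heap cannot be relied on to run it.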
RE: Recovery from OutOfMemoryError?
> From: Craig Berry [mailto:[EMAIL PROTECTED]
> Subject: RE: Recovery from OutOfMemoryError?
>
> It depends on what the user chooses to do during
> the session.

Again, try another point of view. It's what the webapps choose to do in response to user requests that provokes the problem. Is there some spot in your code that's making a grab for a big array and failing to handle the possibility of allocation failure? Or have you simply over-configured the number of connector threads for the size heap you're running?

 - Chuck
RE: Recovery from OutOfMemoryError?
The trouble is that our memory demand per user session is unpredictable. Some user sessions do things that barely touch the heap; other sessions can make huge demands. It depends on what the user chooses to do during the session. So throttling user count down to make it utterly safe would be impractical. Instead, statistically, it's unlikely that more than one or two memory-hungry sessions will be active at any given time. When we get more than that at once, we risk an OOME.

-----Original Message-----
From: Andrew Miehs [mailto:[EMAIL PROTECTED]
Sent: Tuesday, July 31, 2007 10:01 AM
To: Tomcat Users List
Subject: Re: Recovery from OutOfMemoryError?

On 31/07/2007, at 6:52 PM, Craig Berry wrote:

> Fixing the bug would be cool, but the "bug" is actually just too many
> users contending for the same heap space, so that's going to be tough.
> I'd thought of the log watcher, but that seems a rather blunt
> instrument; I was thinking there might be some kind of Tomcat (or JVM)
> intrinsic mechanism for this.

How much heap space do you have set?! Why don't you just increase it?

If not, why not decrease the number of users you allow onto the server?

Restarting Tomcat is even more 'blunt' than allowing access to fewer users...

Confused...

Andrew
RE: Recovery from OutOfMemoryError?
Oh, I'm not "blaming" either one. Normally, our server is quite adequate to handle the expected user load. Every now and then we get a "perfect storm" of too many users asking for too many large-memory requests, and an OOME happens. We're investigating ways to increase capacity, but in the meantime, automating recovery would help keep the 3am phone calls from happening.

We're running Java 6 and Tomcat 6. It's definitely heap space that's running out, and we're actively profiling various load scenarios to spot the most likely targets for memory-use reduction. But again, a short-term technique to automate the restart would help a lot.

-----Original Message-----
From: Caldarale, Charles R [mailto:[EMAIL PROTECTED]
Sent: Tuesday, July 31, 2007 10:00 AM
To: Tomcat Users List
Subject: RE: Recovery from OutOfMemoryError?

> From: Craig Berry [mailto:[EMAIL PROTECTED]
> Subject: RE: Recovery from OutOfMemoryError?
>
> Fixing the bug would be cool, but the "bug" is actually just too many
> users contending for the same heap space

Let's put it another way: your webapp and/or JVM configuration aren't set up properly to handle the number of users you have; don't blame it on the users (or Tomcat).

What JVM do you have installed? What version of Tomcat? What OS? Are you sure you're out of heap space, or is some other resource being exhausted, such as file handles? If it's really heap space, is it the PermGen? Do you have a memory leak in your webapp? Have you profiled what's going on to see the real cause of the problem?

 - Chuck
Re: Recovery from OutOfMemoryError?
On 31/07/2007, at 6:52 PM, Craig Berry wrote:

> Fixing the bug would be cool, but the "bug" is actually just too many
> users contending for the same heap space, so that's going to be tough.
> I'd thought of the log watcher, but that seems a rather blunt
> instrument; I was thinking there might be some kind of Tomcat (or JVM)
> intrinsic mechanism for this.

How much heap space do you have set?! Why don't you just increase it?

If not, why not decrease the number of users you allow onto the server?

Restarting Tomcat is even more 'blunt' than allowing access to fewer users...

Confused...

Andrew
RE: Recovery from OutOfMemoryError?
> From: Craig Berry [mailto:[EMAIL PROTECTED]
> Subject: RE: Recovery from OutOfMemoryError?
>
> Fixing the bug would be cool, but the "bug" is actually just too many
> users contending for the same heap space

Let's put it another way: your webapp and/or JVM configuration aren't set up properly to handle the number of users you have; don't blame it on the users (or Tomcat).

What JVM do you have installed? What version of Tomcat? What OS? Are you sure you're out of heap space, or is some other resource being exhausted, such as file handles? If it's really heap space, is it the PermGen? Do you have a memory leak in your webapp? Have you profiled what's going on to see the real cause of the problem?

 - Chuck
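One way to start answering the "which memory area is it?" question from inside the JVM is the standard `java.lang.management` API. The sketch below is illustrative only; the class `MemoryReport` and the method `describe` are hypothetical names. Pool names differ between JVM vendors and versions, so the output should be read rather than parsed.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

/** Hypothetical helper: summarizes heap usage and each memory pool. */
public final class MemoryReport {

    private MemoryReport() {}

    /** Returns one line for the whole heap, then one line per memory pool. */
    public static String describe() {
        StringBuilder sb = new StringBuilder();
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        sb.append("heap used=").append(heap.getUsed())
          .append(" max=").append(heap.getMax()).append('\n');
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            // getMax() is -1 when the pool has no defined maximum.
            sb.append(pool.getType()).append(" pool '").append(pool.getName())
              .append("' used=").append(usage.getUsed())
              .append(" max=").append(usage.getMax()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe());
    }
}
```

Logging such a report periodically, or shortly before the errors tend to occur, would show whether the ordinary heap or a non-heap pool is the one filling up.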
Re: Recovery from OutOfMemoryError?
1. You may have a memory leak in your code... do some profiling.
2. Check for abandoned sessions that are still waiting to expire; perhaps you can lower the session timeout, which will make some memory available earlier.

Craig Berry wrote:
> Our Tomcat-based app suffers from occasional OutOfMemoryErrors. We have
> found that we need to manually restart Tomcat when these happen;
> frequently the Tomcat process appears to be working after the error, but
> is actually crippled in one way or another by the loss of some key
> thread.
>
> We would very much like to trigger an automated Tomcat restart when an
> OOME occurs. Does anyone have suggestions on the cleanest, safest way
> to arrange this? (We're running Tomcat as a normal process under Linux,
> if that matters.)
RE: Recovery from OutOfMemoryError?
Fixing the bug would be cool, but the "bug" is actually just too many users contending for the same heap space, so that's going to be tough. I'd thought of the log watcher, but that seems a rather blunt instrument; I was thinking there might be some kind of Tomcat (or JVM) intrinsic mechanism for this.

-----Original Message-----
From: Leon Rosenberg [mailto:[EMAIL PROTECTED]
Sent: Tuesday, July 31, 2007 9:46 AM
To: Tomcat Users List
Subject: Re: Recovery from OutOfMemoryError?

variant 1: a log watcher that checks for OOMe and restarts tomcat
variant 2: fix the bug :-)

regards
Leon

On 7/31/07, Craig Berry <[EMAIL PROTECTED]> wrote:
> Our Tomcat-based app suffers from occasional OutOfMemoryErrors. We have
> found that we need to manually restart Tomcat when these happen;
> frequently the Tomcat process appears to be working after the error, but
> is actually crippled in one way or another by the loss of some key
> thread.
>
> We would very much like to trigger an automated Tomcat restart when an
> OOME occurs. Does anyone have suggestions on the cleanest, safest way
> to arrange this? (We're running Tomcat as a normal process under Linux,
> if that matters.)
>
> --
> Craig Berry
> Principal Architect and Technical Manager
> PortBlue Corporation (http://www.portblue.com/)
Re: Recovery from OutOfMemoryError?
variant 1: a log watcher that checks for OOMe and restarts tomcat
variant 2: fix the bug :-)

regards
Leon

On 7/31/07, Craig Berry <[EMAIL PROTECTED]> wrote:
> Our Tomcat-based app suffers from occasional OutOfMemoryErrors. We have
> found that we need to manually restart Tomcat when these happen;
> frequently the Tomcat process appears to be working after the error, but
> is actually crippled in one way or another by the loss of some key
> thread.
>
> We would very much like to trigger an automated Tomcat restart when an
> OOME occurs. Does anyone have suggestions on the cleanest, safest way
> to arrange this? (We're running Tomcat as a normal process under Linux,
> if that matters.)
>
> --
> Craig Berry
> Principal Architect and Technical Manager
> PortBlue Corporation (http://www.portblue.com/)