Thanks all for the insight...
And just as Charlie predicted, the event happened again without tripping
the alert.
One benefit: even though I was not actively watching when it happened, I
did have the server monitor running. The event happened when the server
was only using 440MB, with 1.2GB free in the JVM allocation. So it
definitely is not a memory issue. (It also happened in between GC
cycles, so that isn't the issue either.)
As for the possibility of the CPU, I won't discount this, but I doubt it
would be from CF. We do use CFDocument/CFPDF, which I know grab
resources, but normally those pages run during the morning, and the
event actually happened twice last night at a time when I would not
expect it.
I'll have to gather more information. I'm starting to think that the
cause may be outside CF. I'm going to try to look at all the system logs
and try to piece together exactly what was happening at the time of the
event.
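
A minimal sketch of that "piece it together" step, assuming the other
logs use the classic syslog timestamp prefix (as /var/log/messages
usually does on CentOS 6). The function name, the sample lines, and the
5-minute window are illustrative assumptions, not anything from the
server itself.

```python
from datetime import datetime, timedelta

def lines_near(event_time, log_lines, window_minutes=5):
    """Return syslog-style lines stamped within window_minutes of event_time."""
    window = timedelta(minutes=window_minutes)
    hits = []
    for line in log_lines:
        # Classic syslog lines start with e.g. "Aug  8 14:39:58". The year is
        # not recorded, so it is borrowed from the event (an assumption that
        # breaks for events right around New Year).
        stamp = line[:15]
        try:
            ts = datetime.strptime(f"{event_time.year} {stamp}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # not a syslog-style line; skip it
        if abs(ts - event_time) <= window:
            hits.append(line)
    return hits

if __name__ == "__main__":
    # Event time taken from the Apache error log entry quoted further down.
    event = datetime(2013, 8, 8, 14, 40, 14)
    # Path is the one mentioned in this thread; adjust for other logs.
    with open("/var/log/messages", errors="replace") as fh:
        for hit in lines_near(event, fh):
            print(hit.rstrip())
```

Running it once per event time, against each log in turn, gives a rough
side-by-side view of what the system was doing around each failure.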
Another question though... FusionReactor monitors the entire system,
not just CF, right? (i.e. it can track running system processes, not just
what CF is doing.) If so, that may be my next step if my own efforts
prove fruitless.
Thanks,
Frank
On 08/09/2013 12:55 AM, Charlie Arehart wrote:
Like you, I would think this is not memory related. I think that's
just a really old error message, from the days when even the then
Macromedia engineers could only throw up their hands and guess when
something was amiss.
I recently saw this error message happening for a client where we
found (since they were on IIS) that the jrun_iis6_wildcardxxxx.logs
(in [ColdFusion9]\runtime\lib\wsconfig\nn\LogFiles) had indications of
errors also. I realize you're on Apache, and you say you looked at
all the logs, but did you check out those logs in that wsconfig dir
and its subdirs? It's just a stab in the dark whether any log messages
there (around the same time) will be useful.
I would focus on something making the CF instance not responsive. I
know you said you raised the simult request threads from 10 to 40, and
it seemed fine at 10. But maybe you have new load, or a new problem
that makes requests hang.
As Ajas said, FR (or as you're using it, the CF Server Monitor) can
show you any running requests (the CFSM only shows them if you turn on
"start monitoring"). If you can be on when it happens you may be
surprised what you find. If all 10 (or now 40) are hung, even if only
for a while, that could lead to the error. It's not that CF is down,
but that the connector thinks it can't be reached.
And as you noted in a later message, turning on the alerts will help
(in either CFSM, again where "start monitoring" must be enabled for
them to work, or in FR, or SeeFusion), as that will give you info even
when you can't be "watching" the monitors. Since you're using the
CFSM, and you say you configured the alerts, did you confirm that you
get the email they send? There's no test feature. What I do is set the
memory alert to below the current memory used, which should trigger an
alert within a few minutes. But then I turn that alert off. I find it
useless, since the JVM (since 1.5) can often let used memory climb to
the max before deciding to do a major GC, so you can get those memory
alerts when there's no real problem, if indeed a GC at that point
would have collected a lot of "not really used" memory.
But I do recommend that slow server alert in the CFSM, or the running
requests alert in FR. For almost everyone, if you have many requests
running at once, that's a "canary in the coal mine" indicating that
problems may be afoot. The question then is whether the alert shows
many slow requests. If it just shows many fast ones, then that is just
a sign of a lot of traffic, and if it's being handled fast, you need
to increase the number of "max simult requests", and the alert level
in whatever monitor you're using.
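
To make that distinction concrete, here is a rough sketch of the
decision described above: given the elapsed times of the requests
running when an alert fires, decide whether it looks like hung/slow
requests or just heavy traffic. The function name and both thresholds
(10 seconds, 8 requests) are arbitrary assumptions for illustration;
neither the CFSM nor FR exposes this function.

```python
def classify_running_requests(durations_sec, slow_after_sec=10.0, busy_count=8):
    """Classify a snapshot of running-request durations (in seconds)."""
    if len(durations_sec) < busy_count:
        return "normal"
    # Count the requests that have been running longer than the threshold.
    slow = [d for d in durations_sec if d >= slow_after_sec]
    if len(slow) * 2 >= len(durations_sec):
        # At least half of the running requests are slow: suspect hangs.
        return "likely slow or hung requests"
    # Many requests, mostly fast: suspect plain load.
    return "likely high traffic"

if __name__ == "__main__":
    print(classify_running_requests([45.0, 60.2, 38.9, 51.0, 0.3, 47.7, 52.1, 49.9]))
    print(classify_running_requests([0.2, 0.4, 0.1, 0.3, 0.2, 0.5, 0.1, 0.2, 0.3, 0.2]))
```

The first case points at finding out what the slow requests are waiting
on; the second points at raising "max simult requests" and the alert
level.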
And be careful about setting the other values in "request tuning" so
low (web services, flash remoting, and remote cfcs). There's never a
harm in them being more than you need. But if they are less than you
need, that could be where a bottleneck happens. I know you say you
don't serve web services, but I've seen shops have their own CF pages
calling their own CFCs as web services. And if that request limit is
low, then that becomes a single-threaded bottleneck. Or maybe you DO
have code calling CFCs remotely (via ajax). Or about flash remoting,
the monitor (and FR) use those, and your own code may (even if
unexpectedly). Again, why constrict them? If you don't use them,
there's no harm in them being larger (like 5 or 10, each).
Finally, note that you could have cf requests using either cfthread or
reporting, and there are limits for each of those (configurable in the
admin). And though you are not using CF Standard, I'll say for other
readers that they could have all this sort of problem caused by using
some tag that is itself single-threaded in CF Standard, as are many
tags, including cfdocument, cfpdf, and more. That could cause a "low
traffic" site to still have hung requests.
Let us know if any of that helps, or not. But yes, if it remains and
you don't solve it, I am available for consulting, and with my
satisfaction guarantee, you don't have to pay for time you don't feel
is valuable.
/charlie
From: ad...@acfug.org [mailto:ad...@acfug.org] On Behalf Of Frank Moorman
Sent: Thursday, August 08, 2013 7:42 PM
To: discussion@acfug.org
Subject: [ACFUG Discuss] Out of Memory?!?
All,
I'm trying to track down the cause of a JRun Out of Memory error. I
get the following in my logs:
[Thu Aug 08 14:40:14 2013] [notice] jrApache[2937: 31182] returning error page for JRun too busy or out of memory
[Thu Aug 08 15:50:09 2013] [notice] jrApache[1787: 63699] returning error page for JRun too busy or out of memory
It doesn't happen often (maybe once or occasionally twice a business
day), but as everyone understands, users aren't happy when it happens
to them.
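
For tracking how often this really occurs, a small sketch that pulls the
timestamps of these connector errors out of the Apache error log and
counts them per day. It assumes the log keeps the timestamp format shown
in the two entries above; the function names and the log path are
hypothetical.

```python
import re
from collections import Counter
from datetime import datetime

# Matches the start of entries like:
# [Thu Aug 08 14:40:14 2013] [notice] jrApache[2937: 31182] returning error page ...
EVENT_RE = re.compile(
    r"^\[(?P<ts>[A-Z][a-z]{2} [A-Z][a-z]{2} \d{2} \d{2}:\d{2}:\d{2} \d{4})\]"
    r".*jrApache.*returning error page"
)

def find_events(lines):
    """Return a datetime for each jrApache 'returning error page' entry."""
    events = []
    for line in lines:
        match = EVENT_RE.match(line)
        if match:
            events.append(datetime.strptime(match.group("ts"), "%a %b %d %H:%M:%S %Y"))
    return events

def events_per_day(events):
    """Count events by calendar date."""
    return Counter(event.date() for event in events)

if __name__ == "__main__":
    # Hypothetical path; use wherever this Apache writes its error log.
    with open("/var/log/httpd/error_log", errors="replace") as fh:
        for day, count in sorted(events_per_day(find_events(fh)).items()):
            print(day, count)
```

The per-day counts make it easier to tell whether a settings change
(such as the jump from 10 to 40 requests) actually moved the frequency.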
This is a Linux box: 64-bit CentOS 6, CF9 Enterprise, 64-bit JVM version
1.7. (The JVM was installed separately from CF for security, and
ColdFusion uses it.)
I doubt it is actually an out of memory condition (though I could be
wrong). The server has 6GB of physical memory and another 6GB of swap.
It rarely needs to use swap. (i.e. I have not observed it.)
The JVM is given significant memory to use as well. It is a 64-bit JVM
with a 1GB minimum heap and a 3GB maximum.
When I look through the server monitor, it is normal to see 1 to 1.5GB
allocated and between 100-750MB used. (I see a normal sawtooth pattern
in the memory usage, so it looks like what I would expect from the
garbage collection routine.) It does spike occasionally, but I have
never seen it close to the 3GB max. (I've never even seen it hit 2GB
used.)
The server is set for 40 template requests (I recently upped it from
10 to see if that was the problem and it still occurred with the same
frequency.)
Flash remoting is set to 2, webservice 1, CFC 1. (These remote
settings are only set for the monitor, as the server does not provide
any webservices outside the running application.) JRun is set to 50
requests, and 1000 queued. (Enough to cover the CF requests.)
I looked at Charlie's blog... I have checked the logs, and other than
the Apache error log (above) I do not see anything. I've checked the
system /var/log/messages, and I've checked all the CF logs. (I also
archived everything yesterday, and the CF logs are practically empty
even after today's occurrence.) I did not find any of the JVM abort
logs that Charlie mentioned in his blog. (I checked the CF directory
mentioned, as well as the system logs and the actual JVM directory.) I
also checked the JRun log (in /opt/jrun4/logs/cfusion-event.log) and
was surprised because the only entries were from months ago. (Because
of the age of the log, I wonder whether I am looking in the right place
for it.)
Does anyone have any ideas on what might be happening? or something
else that I should check?
I have searched the web and found different ideas (even the rare "add
more memory")
Another mentions the requests being overloaded, but I honestly do not
believe that the 10 simultaneous template requests was low for the
traffic for this site. After quadrupling it, with the problem still
occurring, it is even less likely.
I've seen some mentioning client variable storage, but the server is
set to use cookies for that, not a database. While I do not use client
storage, I know there are items like the last time visited etc, so I
may just turn it off completely.
Another one I found interesting mentions a bug with the MySQL drivers
and the "Maintain Connections" setting, and suggested unchecking this
box. I searched for this and found the bug mentioned; one site even
speculated it was still a problem with CF9, but I could not find any
details. Does anyone know of this issue? I've seen it mentioned, but
with no details other than that it's bad to have that box checked. (The
page that mentioned it did say it ate memory.)
I'd love more ideas. I know this is not an easy or straightforward
error. I may try removing the client storage next, but other ideas are
welcome. (i.e. I'm not very convinced that the other things I found on
the web will be effective.)
Thanks,
Frank
-------------------------------------------------------------
To unsubscribe from this list, manage your profile @
http://www.acfug.org?fa=login.edituserform
For more info, see http://www.acfug.org/mailinglists
Archive @ http://www.mail-archive.com/discussion%40acfug.org/
List hosted by FusionLink <http://www.fusionlink.com>
-------------------------------------------------------------