Hi Brian,
In general, with logging severities we try to follow the guideline
[1]: "Notice" is informational, "Warning" is something
unexpected/unwanted (hitting some limit, receiving data that looks like
an attack, ...), and "Error" is something that should be looked at by an
engineer.
[1] https://openacs.org/xowiki/logging-conventions
Below, I comment inline.
On 07.11.23 18:28, Brian Fenton wrote:
Hi
with the goal of improving signal-to-noise in our error logs, and also
to increase my understanding of Naviserver internals, I have a few
questions about various different messages we are seeing regularly in
our logs.
1. [06/Nov/2023:14:58:08][77320.7f7b9a7fd640][-driver:nsssl:0-]
Notice: ... sockAccept accepted 2 connections
this is by far the most common entry in our logs - we see a LOT of
these. Are they merely informative or indicative of some tuning
action we could take? I don't see any way to disable them. I would
love to remove these.
This indicates a stress situation of the server, where multiple
requests come in at the same time. A value of 2 is not a concern; if
you see values like 10 or 20, you should consider a change in your
configuration (e.g., configuring a second driver thread). It would be
possible to add a configuration value for the network driver to set
the threshold at which this notice is written to the log file.
2. [06/Nov/2023:14:57:28][77320.7f7b63fff640][-conn:openacs:default:6:6292-]
Notice: ns_cache create entry collision cache util_memoize key
'__util_memoize_installed_p', no timeout
We also see quite a few of these, always for
'__util_memoize_installed_p', and always with 'no timeout'. From
looking at cache.c I see that this means that Naviserver failed
to create the cache entry. I checked the 'util_memoize' cache
utilization in case the cache was full, but it's quite low. What
other cause could there be for this?
This means that concurrent operations are executed to obtain the same
cache value, and that the second request (here
conn:openacs:default:6:6292) has to wait until the locking process has
finished. The problem can become serious when the locking process
"hangs", i.e., takes a very long time to finish. In that case, more
and more caching requests for the same entry pile up, potentially
until a restart, since no timeout was specified. The message was
introduced to give the developer a hint why the server is suddenly
hanging (before the message was introduced, the cause was very hard to
determine).
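To bound such waits, a timeout can be passed when computing a cache
entry; a minimal sketch (the 5-second value and the script body are
arbitrary placeholders, not recommendations):

```tcl
# Sketch: bound the wait for a concurrently computed cache entry.
# With -timeout, a request waiting on another thread's computation
# gives up after the specified time instead of blocking indefinitely.
ns_cache_eval -timeout 5.0 -- util_memoize __util_memoize_installed_p {
    expensive_computation   ;# placeholder for the real script
}
```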
This message also tells something else:
* It looks as if the installation is based on a very old version of
openacs-core (including, in particular, acs-tcl). For at least nine
years, the code has performed this test only at startup. Upgrading to
a recent version will improve performance and increase security.
* The message also tells me that the application puts stress on the
util-memoize cache. Old versions of OpenACS used this "util-memoize"
cache as a kitchen-sink cache for everything. I have seen instances
with 300K+ entries in this cache. The size itself is not the problem,
but there were/are many operations in OpenACS that obtain all keys from
the cache. This means that a lock is acquired, a list with 300K+
entries is created and handed to the interpreter, and only then is the
lock released. In the meantime, *all* caching requests for this cache
have to wait, causing degraded performance and limited concurrency in
high-load situations. I would not be surprised if you see high lock
times on this cache.
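One way to check this from within the server is to look at the cache
and mutex statistics; a sketch (the commands are standard NaviServer
commands, but the exact output format varies by version):

```tcl
# Sketch: inspect cache utilization and lock contention.
# ns_cache_stats returns utilization figures (entries, hits, misses, ...).
ns_log notice "util_memoize: [ns_cache_stats util_memoize]"

# Mutex statistics (lock counts, busy/wait figures) via ns_info:
foreach lock [ns_info locks] {
    ns_log notice "lock: $lock"
}
```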
There are caches other than the util-memoize cache where cache
collisions might happen. See e.g. [2] for more background. In this
situation, cache partitioning is the instrument for scaling.
[2] https://openacs.org/xowiki/cache-size-tuning
3. These 5 appear usually as a group:
[07/Nov/2023:13:46:07][38.7f352cff9640][-conn:openacs:default:3:4088-]
Notice: exiting: exceeded max connections per thread
[07/Nov/2023:13:46:07][38.7f3525b84640][-driver:nsssl:0-] Notice:
NsEnsureRunningConnectionThreads wantCreate 1 waiting 1 idle 0
current 2
[07/Nov/2023:13:46:07][38.7f352d7fa640][-conn:openacs:default:5:0-]
Notice: start update interpreter openacs to epoch 1, concurrent 1
[07/Nov/2023:13:46:08][38.7f352d7fa640][-conn:openacs:default:5:0-]
Notice: update interpreter openacs to epoch 1 done, trace none,
time 0.373006 secs concurrent 1
[07/Nov/2023:13:46:08][38.7f352d7fa640][-conn:openacs:default:5:0-]
Notice: thread initialized (0.388878 secs)
We see these pretty regularly, over 50 per day on a local
development server. "maxconnections" is set to the default value,
which is 100, I believe. "maxthreads" and "minthreads" are also
set to default values. What could be causing these so regularly
on a system with 1 logged-in user?
The message tells that a connection thread has reached its end of life,
as defined by the "connsperthread" parameter. When this happens, the
thread is stopped and, depending on the configuration parameters, a new
thread is created either immediately or on demand. The reason for
stopping connection threads from time to time is to clean up
interpreter-specific data accumulated by some applications. With modern
versions of OpenACS and with well-behaved packages, this is not an
issue, and the value can be increased substantially.
The relevant counter does not depend on the number of different users,
but on the number of requests handled by a particular connection thread.
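A sketch of the relevant parameters in the server section (the values
are illustrative only; check the defaults of your NaviServer version):

```tcl
# Sketch: let connection threads serve more requests before recycling.
ns_section ns/server/$server
ns_param   connsperthread  10000  ;# requests handled before a thread exits
ns_param   minthreads      2
ns_param   maxthreads      10
```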
4. These 3 are related to each other, I presume:
[07/Nov/2023:15:36:24][38.7f3536b86640][-conn:openacs:default:28:28082-]
Notice: nsdb: closing old handle in pool 'pool1'
[07/Nov/2023:15:45:39][38.7f3536b86640][-conn:openacs:default:31:31287-]
Notice: nsdb: closing idle handle in pool 'pool1'
[07/Nov/2023:15:45:39][38.7f3536b86640][-conn:openacs:default:31:31287-]
Notice: dbdrv: opening database 'nsoracle:192.168.1.1/ORACLE'
These are quite frequent too. Am I correct in thinking that we can
tune these with pool "maxidle" and "maxopen" params mentioned here
https://naviserver.sourceforge.io/n/nsdb/files/ns_db.html ? Is
there any reason not to turn off the automatic closing feature?
There might be some reasons for keeping the number of db connections
low (e.g., many servers use the same database, which allows only a
limited number of client connections, etc.), but for at least 20 years
there has been no technical reason with the common db drivers against
turning off the automatic closing feature. Reconnecting to the database
can be costly when the db is running on a different host and/or the
connection is secured via TLS.
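If you decide to relax the automatic closing, the pool parameters
might be set roughly like this (a sketch; setting maxidle/maxopen to 0
is commonly used to disable the respective close behavior, but verify
this against the nsdb documentation for your version):

```tcl
# Sketch: tune (or effectively disable) automatic handle closing.
ns_section ns/db/pool/pool1
ns_param   maxidle  0   ;# 0: do not close handles for being idle
ns_param   maxopen  0   ;# 0: do not close handles for being open too long
```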
Hope this helps a little.
all the best
-g
_______________________________________________
naviserver-devel mailing list
naviserver-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/naviserver-devel