Hi Brian,
In general, with logging severities we try to follow the guideline
[1]: "Notice" is informational, "Warning" is something
unexpected/unwanted (hitting some limit, receiving data that looks like
an attack, ...), and "Error" is something that should be looked at by an
engineer.
[1] https://openacs.org/xowiki/logging-conventions
Below, I comment inline.
On 07.11.23 18:28, Brian Fenton wrote:
Hi
with the goal of improving signal-to-noise in our error logs, and also
to increase my understanding of Naviserver internals, I have a few
questions about various different messages we are seeing regularly in
our logs.
1. [06/Nov/2023:14:58:08][77320.7f7b9a7fd640][-driver:nsssl:0-]
Notice: ... sockAccept accepted 2 connections
this is by far the most common entry in our logs - we see a LOT of
these. Are they merely informative or indicative of some tuning
action we could take? I don't see any way to disable them. I would
love to remove these.
This indicates a stress situation of the server, where multiple
requests come in at the same time. A value of 2 is not a concern; if
you see values like 10 or 20, you should consider a change in your
configuration (e.g., configuring a second driver thread). It would be
possible to add a configuration value for the network driver to set
the threshold at which this notice is written to the log file.
2. [06/Nov/2023:14:57:28][77320.7f7b63fff640][-conn:openacs:default:6:6292-]
Notice: ns_cache create entry collision cache util_memoize key
'__util_memoize_installed_p', no timeout
We also see quite a few of these, always for
'__util_memoize_installed_p', and always with 'no timeout'. From
looking at cache.c I see that this means that Naviserver failed
to create the cache entry. I checked the 'util_memoize' cache
utilization in case the cache was full, but it's quite low. What
other cause could there be for this?
This means that concurrent operations are executed to obtain the same
cache value, and that the second request (here
conn:openacs:default:6:6292) has to wait until the locking process has
finished. The problem can become serious when the locking process
"hangs", i.e., takes a very long time to finish. In that case, more
and more caching requests for the same entry pile up, potentially
until a restart, since no timeout was specified. The message was
introduced to give the developer a hint why the server is suddenly
hanging (before the message was introduced, the cause was very hard to
determine).
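To bound such waits, a timeout can be passed when computing a cache
entry; a minimal sketch (the 5-second value and the script body are
arbitrary placeholders, not recommendations):

```tcl
# Sketch: bound the wait for a concurrently computed cache entry.
# With -timeout, a request waiting on another thread's computation
# gives up after the specified time instead of blocking indefinitely.
ns_cache_eval -timeout 5.0 -- util_memoize __util_memoize_installed_p {
    expensive_computation   ;# placeholder for the real script
}
```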
This message also tells something else:
* It looks as if the installation is based on a very old version of
openacs-core (including, in particular, acs-tcl). For at least nine
years, the code has performed this test only at startup. Upgrading to
a recent version will improve performance and increase security.
* The message also tells me that the application puts stress on the
util-memoize cache. Old versions of OpenACS used this "util-memoize"
cache as a kitchen-sink cache for everything. I have seen instances
with 300K+ entries in this cache. The size itself is not the problem,
but there were/are many operations in OpenACS that obtain all keys from
the cache. This means that a lock is acquired, a list with 300K+
entries is created and handed to the interpreter, and only then is the
lock released. In the meantime, *all* caching requests for this cache
have to wait, causing degraded performance and limited concurrency in
high-load situations. I would not be surprised if you see high lock
times on this cache.
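One way to check this from within the server is to look at the cache
and mutex statistics; a sketch (the commands are standard NaviServer
commands, but the exact output format varies by version):

```tcl
# Sketch: inspect cache utilization and lock contention.
# ns_cache_stats returns utilization figures (entries, hits, misses, ...).
ns_log notice "util_memoize: [ns_cache_stats util_memoize]"

# Mutex statistics (lock counts, busy/wait figures) via ns_info:
foreach lock [ns_info locks] {
    ns_log notice "lock: $lock"
}
```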
There are caches other than the util-memoize cache where cache
collisions might happen. See e.g. [2] for more background. In this
situation, cache partitioning is the instrument for scaling.
[2] https://openacs.org/xowiki/cache-size-tuning
3. These 5 appear usually as a group:
[07/Nov/2023:13:46:07][38.7f352cff9640][-conn:openacs:default:3:4088-]
Notice: exiting: exceeded max connections per thread
[07/Nov/2023:13:46:07][38.7f3525b84640][-driver:nsssl:0-] Notice:
NsEnsureRunningConnectionThreads wantCreate 1 waiting 1 idle 0
current 2
[07/Nov/2023:13:46:07][38.7f352d7fa640][-conn:openacs:default:5:0-]
Notice: start update interpreter openacs to epoch 1, concurrent 1
[07/Nov/2023:13:46:08][38.7f352d7fa640][-conn:openacs:default:5:0-]
Notice: update interpreter openacs to epoch 1 done, trace none,
time 0.373006 secs concurrent 1
[07/Nov/2023:13:46:08][38.7f352d7fa640][-conn:openacs:default:5:0-]
Notice: thread initialized (0.388878 secs)
We see these pretty regularly, over 50 per day on a local
development server. "maxconnections" is set to the default value,
which is 100, I believe. "maxthreads" and "minthreads" are also
set to default values. What could be causing these so regularly
on a system with 1 logged-in user?
The message tells that a connection thread has reached its end of life,
as defined by the "connsperthread" parameter. When this happens, the
thread is stopped and, depending on the configuration parameters, a new
thread is created either immediately or on demand. The reason for
stopping connection threads from time to time is to clean up
interpreter-specific data accumulated by some applications. With modern
versions of OpenACS and with well-behaved packages, this is not an
issue, and the value can be increased substantially.
The relevant counter does not depend on the number of different users,
but on the number of requests handled by a particular connection thread.
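A sketch of the relevant parameters in the server section (the values
are illustrative only; check the defaults of your NaviServer version):

```tcl
# Sketch: let connection threads serve more requests before recycling.
ns_section ns/server/$server
ns_param   connsperthread  10000  ;# requests handled before a thread exits
ns_param   minthreads      2
ns_param   maxthreads      10
```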
4. These 3 are related to each other, I presume:
[07/Nov/2023:15:36:24][38.7f3536b86640][-conn:openacs:default:28:28082-]
Notice: nsdb: closing old handle in pool 'pool1'
[07/Nov/2023:15:45:39][38.7f3536b86640][-conn:openacs:default:31:31287-]
Notice: nsdb: closing idle handle in pool 'pool1'
[07/Nov/2023:15:45:39][38.7f3536b86640][-conn:openacs:default:31:31287-]
Notice: dbdrv: opening database 'nsoracle:192.168.1.1/ORACLE'
These are quite frequent too. Am I correct in thinking that we can
tune these with pool "maxidle" and "maxopen" params mentioned here
https://naviserver.sourceforge.io/n/nsdb/files/ns_db.html ? Is
there any reason not to turn off the automatic closing feature?
There might be some reasons for keeping the number of db connections
low (e.g., many servers use the same database, which allows only a
limited number of client connections, etc.), but for at least 20 years
there has been no technical reason with the common db drivers against
turning off the automatic closing feature. Reconnecting to the database
can be costly when the db is running on a different host and/or the
connection is secured via TLS.
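If you decide to relax the automatic closing, the pool parameters
might be set roughly like this (a sketch; setting maxidle/maxopen to 0
is commonly used to disable the respective close behavior, but verify
this against the nsdb documentation for your version):

```tcl
# Sketch: tune (or effectively disable) automatic handle closing.
ns_section ns/db/pool/pool1
ns_param   maxidle  0   ;# 0: do not close handles for being idle
ns_param   maxopen  0   ;# 0: do not close handles for being open too long
```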
Hope this helps a little.
all the best
-g
_______________________________________________
naviserver-devel mailing list
naviserver-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/naviserver-devel