Hi Brian,

In general, we try to follow the guideline [1] for logging severities: "Notice" is informational, "warning" flags something unexpected or unwanted (hitting some limits, data that looks like an attack, ...), and "error" is something that should be looked at by an engineer.
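As a small sketch of how these severities map to the ns_log command (the message texts below are invented for illustration):

```tcl
ns_log notice  "user 42 logged in"                ;# informational, business as usual
ns_log warning "rate limit reached for 10.0.0.5"  ;# unexpected/unwanted, but self-contained
ns_log error   "cannot connect to pool 'pool1'"   ;# an engineer should look at this
```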


    [1] https://openacs.org/xowiki/logging-conventions


Below, I comment inline.



On 07.11.23 18:28, Brian Fenton wrote:
Hi

with the goal of improving signal-to-noise in our error logs, and also to increase my understanding of Naviserver internals, I have a few questions about various different messages we are seeing regularly in our logs.

 1. [06/Nov/2023:14:58:08][77320.7f7b9a7fd640][-driver:nsssl:0-]
    Notice: ... sockAccept accepted 2 connections
    this is by far the most common entry in our logs - we see a LOT of
    these. Are they merely informative or indicative of some tuning
    action we could take? I don't see any way to disable them. I would
    love to remove these.


This indicates a stress situation of the server, where multiple requests come in at the same time. A value of 2 is not a concern; if you see values like 10 or 20, you should consider a change in your configuration (e.g., configuring a second driver thread). It would be possible to add a configuration value for the network driver to set the threshold above which this notice is written to the log file.
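If your NaviServer version supports multiple driver threads, a configuration sketch for the nsssl driver section might look like the following (assumption: the "driverthreads" parameter, which requires a sufficiently new NaviServer and OS support for SO_REUSEPORT; check your version's documentation before relying on it):

```tcl
ns_section ns/server/openacs/module/nsssl
ns_param address       0.0.0.0
ns_param port          443
# Assumption: 'driverthreads' spreads incoming accepts over two
# driver threads instead of one.
ns_param driverthreads 2
```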


 2. [06/Nov/2023:14:57:28][77320.7f7b63fff640][-conn:openacs:default:6:6292-]
    Notice: ns_cache create entry collision cache util_memoize key
    '__util_memoize_installed_p', no timeout
    We also see quite a few of these, always for
    '__util_memoize_installed_p', and always with 'no timeout'. From
    looking at cache.c I see that this means that NaviServer failed
    to create the cache entry. I checked the 'util_memoize' cache
    utilization in case the cache was full, but it's quite low. What
    other cause could there be for this?

This means that concurrent operations are executed to obtain the same cache value, and that the second request (here conn:openacs:default:6:6292) has to wait until the locking process has finished. The problem can become serious when the locking process "hangs", i.e., takes a very long time to finish. In that case, more and more caching requests for the same entry pile up, potentially until a restart, since no timeout was specified. The message was introduced to give the developer a hint why the server suddenly hangs (before it was introduced, the cause was very hard to determine).
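To bound this waiting, a timeout can be passed when creating the entry. A sketch using ns_cache_eval (the cache name and key are taken from the message above; the timeout value is illustrative, and the helper proc is hypothetical):

```tcl
# If another thread is already computing this entry, wait at most
# 5 seconds for it instead of blocking indefinitely.
ns_cache_eval -timeout 5 -- util_memoize __util_memoize_installed_p {
    compute_installed_p   ;# hypothetical helper computing the value
}
```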


This message also tells something else:
* It looks as if the installation is based on a very old version of openacs-core (including in particular acs-tcl). For at least 9 years, the code has performed this test only at startup. Upgrading to a recent version will improve performance and increase security.


* The message also tells me that the application puts stress on the util_memoize cache. Old versions of OpenACS used this "util_memoize" cache as a kitchen-sink cache for everything. I have seen instances with 300K+ entries in this cache. The size itself is not the problem, but there were/are many operations in OpenACS that obtain all keys from the cache: a lock is acquired, and a list with 300K+ entries is built and handed to the interpreter before the cache is released. In the meantime, *all* caching requests for this cache are halted, causing degraded performance and limited concurrency in high-load situations. I would not be surprised if you see high lock times on this cache.


There are caches besides the util_memoize cache where cache collisions might happen. See, e.g., [2] for more background. In this situation, cache partitioning is the instrument for scaling.


   [2] https://openacs.org/xowiki/cache-size-tuning


 3. These 5 appear usually as a group:
    [07/Nov/2023:13:46:07][38.7f352cff9640][-conn:openacs:default:3:4088-]
    Notice: exiting: exceeded max connections per thread
    [07/Nov/2023:13:46:07][38.7f3525b84640][-driver:nsssl:0-] Notice:
    NsEnsureRunningConnectionThreads wantCreate 1 waiting 1 idle 0
    current 2
    [07/Nov/2023:13:46:07][38.7f352d7fa640][-conn:openacs:default:5:0-]
    Notice: start update interpreter openacs to epoch 1, concurrent 1
    [07/Nov/2023:13:46:08][38.7f352d7fa640][-conn:openacs:default:5:0-]
    Notice: update interpreter openacs to epoch 1 done, trace none,
    time 0.373006 secs concurrent 1
    [07/Nov/2023:13:46:08][38.7f352d7fa640][-conn:openacs:default:5:0-]
    Notice: thread initialized (0.388878 secs)
    We see these pretty regularly, over 50 per day on a local
    development server. "maxconnections" is set to the default value,
    which is 100, I believe. "maxthreads" and "minthreads" are also
    set to default values. What could be causing these so regularly
    on a system with 1 logged-in user?


The message tells you that a connection thread has reached its end of life, as defined by the "connsperthread" parameter. When this happens, the thread is stopped and, depending on the configuration parameters, a new thread is created either immediately or on demand. The reason for stopping connection threads from time to time is to clean up interpreter-specific data accumulated by some applications. With modern versions of OpenACS and with well-behaved packages, this is not an issue, and the value can be increased substantially.
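As a configuration sketch for the connection thread pool (the parameter names are the documented ones; the values are illustrative only):

```tcl
ns_section ns/server/openacs
ns_param minthreads     2
ns_param maxthreads     10
# After this many requests, a connection thread exits and is recreated;
# with well-behaved packages this value can be raised substantially.
ns_param connsperthread 10000
```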


The relevant counter does not depend on the number of different users, but on the number of requests handled by a particular connection thread.


 4. These 3 are related to each other, I presume:
    [07/Nov/2023:15:36:24][38.7f3536b86640][-conn:openacs:default:28:28082-]
    Notice: nsdb: closing old handle in pool 'pool1'
    [07/Nov/2023:15:45:39][38.7f3536b86640][-conn:openacs:default:31:31287-]
    Notice: nsdb: closing idle handle in pool 'pool1'
    [07/Nov/2023:15:45:39][38.7f3536b86640][-conn:openacs:default:31:31287-]
    Notice: dbdrv: opening database 'nsoracle:192.168.1.1/ORACLE'
    These are quite frequent too. Am I correct in thinking that we can
    tune these with pool "maxidle" and "maxopen" params mentioned here
    https://naviserver.sourceforge.io/n/nsdb/files/ns_db.html ? Is
    there any reason not to turn off the automatic closing feature?

There might be some reasons for keeping the number of db connections low (e.g., many servers using the same database with a limited number of client connections, etc.), but for at least 20 years there has been no technical reason for the common db drivers not to turn off the automatic closing feature. Reconnecting to the database can be costly when the db is running on a different host and/or the connection is secured via TLS.
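For completeness, a sketch of the relevant pool parameters from the ns_db page you linked (the values are illustrative; if I read that documentation correctly, a value of 0 disables the respective automatic closing):

```tcl
ns_section ns/db/pool/pool1
ns_param connections 15
# Close handles open longer than 'maxopen' or idle longer than
# 'maxidle' (both in seconds); 0 disables the automatic closing.
ns_param maxidle 0
ns_param maxopen 0
```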


hope this helps a little


all the best

-g
_______________________________________________
naviserver-devel mailing list
naviserver-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/naviserver-devel
