Hello all,

As you might have seen if you frequent the Technical Support channel in the 
DSpace Slack 
<https://dspace-org.slack.com/archives/C3V628QNN/p1709318454084069>, we've 
been encountering high CPU usage in DSpace 7.6 leading to decreased 
performance (ie. site unavailability) and we've been having a lot of 
errors.  We're not entirely sure which errors may be the significant ones 
relating to our performance challenges, and would welcome any input from 
the community to help us improve our site issues.  Also, if you're 
encountering similar issues, please do let us know--maybe we're all having 
the same problems and can solve these collaboratively.  Thanks already to 
Tim Donoghue and Mark Wood for their suggestions in the DSpace Slack!  I've 
aggregated the responses we've already received on these issues here to 
enable us to keep track of suggestions.

So, a summary of our issues:  We've set up our server following best 
practices in the Performance Tuning documentation 
<https://wiki.lyrasis.org/display/DSDOC7x/Performance+Tuning+DSpace>.  We 
run everything on a single server with 4 CPU and 12 GB of RAM, which was 
the configuration that worked for our previous version of DSpace (6.3).  
Initially, we had pm2 configured in cluster mode with max instances and 
max_memory_restart: 500M. With this configuration, the node instances kept 
restarting ~every minute and seemed to be monopolizing the CPUs, and 
starving the other components.  Since then, we have since tuned it down to 
3 instances, ie:

{
    "apps": [
        {
           "name": "dspace-ui",
           "cwd": "/var/dspace-frontend/",
           "script": "dist/server/main.js",
           "instances": "3",
           "exec_mode": "cluster",
           "timestamp": "YYYY-MM-DD HH:mm Z",
           "out_file": "log/dspace-ui.log",
           "error_file": "log/dspace-ui_error.log",
           "merge_logs": true,
           "env": {
              "NODE_ENV": "production",
              "NODE_EXTRA_CA_CERTS": "/etc/ssl/certs/rootCA2.crt"
           },
           "max_memory_restart": "1500M",
           "node_args": "--max_old_space_size=4096"
        }
    ]
}

A review of process with top shows very active node instances despite low 
traffic (~ 1 request/second):

Tasks: 289 total,   5 running, 284 sleeping,   0 stopped,   0 zombie
%Cpu(s): 93.9 us,  3.0 sy,  0.0 ni,  3.0 id,  0.0 wa,  0.0 hi,  0.0 si, 
 0.0 st
MiB Mem :  11965.2 total,    440.4 free,  10901.1 used,    623.8 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.    613.9 avail Mem
    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ 
COMMAND
1504803 dspace    20   0 2328180   1.5g  13936 R 100.0  12.5  10:53.08 node 
/v+
1506783 dspace    20   0 2620092   1.7g  14024 R  93.8  14.8   9:44.49 node 
/v+
1506913 dspace    20   0 1383380 586472  14180 R  93.8   4.8   4:57.11 node 
/v+
1508040 dspace    20   0  733380 141452  36952 R  75.0   1.2   0:00.77 node 
/v+
    781 root      20   0  237020   2536    944 S   6.2   0.0   9:41.79 
vmtoolsd
      1 root      20   0  171488   7176   2492 S   0.0   0.1   0:44.04 
systemd

Our cache settings are set as follows:
# Caching settings
cache:
...
  serverSide:
    debug: false
    botCache:
      max: 1000
      timeToLive: 86400000 # 1 day
      allowStale: true
    anonymousCache:
      max: 1000
      timeToLive: 10000 # 10 seconds
      allowStale: true

*The main question of our systems analyst (Francois Malric): Is this level 
of constantly high CPU usage is normal due to node.js?  Or is it likely 
that our DSpace is displaying poor performance due to underlying issues?*

Here are some examples of the errors we've seen:

(1) From our DSpace Logs:  According to this, we don't have that much 
traffic (HTTP 0.96 requests/minute), which would likely be higher if bot 
traffic was the issue.  Nota bene, our pm2 monitor likely has a bug as it's 
showing the units as req/min; should be req/sec.

lq Process List qqqqqqqqqqqqqqqqqqqqklqq  dspace-ui Logs 
 qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqk
x[ 1] dspace-ui     Mem: 824 MB     xx dspace-ui > The response for 
'https://ruor.uottawa.ca/server/api/core/items/d2d3c  x
x[ 2] dspace-ui     Mem: 316 MB     xx dspace-ui > 1 rules skipped due to 
selector errors:                                x
x[ 3] dspace-ui     Mem: 777 MB     xx dspace-ui >   
.custom-file-input:lang(en)~.custom-file-label -> unmatched          x
x[ 0] pm2-logrotate       Mem:  45  xx dspace-ui > GET 
/handle/10393/19705/simple-search?query=&sort_by=score&order=desc  x
x                                   xx dspace-ui > *1 rules skipped due to 
selector errors:  *                              x
x                                   xx dspace-ui >   
*.custom-file-input:lang(en)~.custom-file-label 
-> unmatched*          x
x                                   xx dspace-ui > Redirecting from 
/bitstreams/e524c49e-5fc2-4e74-b69d-0c890238ab3b/dow  x
x                                   xx dspace-ui > GET 
/bitstreams/e524c49e-5fc2-4e74-b69d-0c890238ab3b/download 302      x
x                                   xx dspace-ui >* ERROR Error: Cannot set 
headers after they are sent to the client      x*
x                                   xx dspace-ui >     at new NodeError 
(node:internal/errors:405:5)                      x
x                                   xx dspace-ui >     at 
ServerResponse.setHeader (node:_http_outgoing:648:11)           x
x                                   xx dspace-ui >     at 
ServerResponseService.setHeader (/opt/dspace-frontend/dist/ser  x
x                                   xx dspace-ui >     at Object.next 
(/opt/dspace-frontend/dist/server/9366.js:1:4722)   x
x                                   xx dspace-ui >     at 
ConsumerObserver2.next (/opt/dspace-frontend/dist/server/main.  x
x                                   xx dspace-ui >     at 
SafeSubscriber2.Subscriber2._next (/opt/dspace-frontend/dist/s  x
x                                   xx dspace-ui >     at 
SafeSubscriber2.Subscriber2.next (/opt/dspace-frontend/dist/se  x
x                                   xx dspace-ui >     at 
/opt/dspace-frontend/dist/server/main.js:1:4471483              x
x                                   xx dspace-ui >     at 
OperatorSubscriber2._this._next (/opt/dspace-frontend/dist/ser  x
x                                   xx dspace-ui >     at 
OperatorSubscriber2.Subscriber2.next (/opt/dspace-frontend/dis  x
x                                   xx dspace-ui >   code: '
*ERR_HTTP_HEADERS_SENT*'                                        x
x                                   xx dspace-ui > }                       
                                               x
x                                   xx dspace-ui > 1 rules skipped due to 
selector errors:                                x
x                                   xx dspace-ui >   
.custom-file-input:lang(en)~.custom-file-label -> unmatched          x
x                                   xx dspace-ui > Warning 
[ERR_HTTP_HEADERS_SENT]: *Tried to set headers after they *      x
x                                   xx dspace-ui > GET 
/items/d2d3cc05-419a-488e-912e-1ff20ab7a654 200 3281.694 ms - -    x
mqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqjmqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqj
lq Custom Metrics qqqqqqqqqqqqqqqqqqklq Metadata 
qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqk
x Heap Size             709.14 MiB  xx App Name              dspace-ui     
                                               x
x Event Loop Latency p95            xx Namespace             default       
                                               x
x Event Loop Latency      34.32 ms  xx Version               N/A           
                                               x
x Active handles                10  xx Restarts              38             
                                              x
x Active requests                1  xx Uptime                5m             
                                              x
x HTTP                0.96 req/min  xx Script path           
/opt/dspace-frontend/dist/server/main.js                     x
x HTTP P95 Latency         4009 ms  xx Script args           N/A           
                                               x
x HTTP Mean Latency         868 ms  xx Interpreter           node           
                                              x

We'd particularly appreciate feedback on the error messages (bolded above):

   - 1 rules skipped due to selector errors:
      - *Suggestion from Mark Wood: this is probably because we're using 
      Bootstrap 4  
      
<https://stackoverflow.com/questions/67581454/receiving-unmatched-pseudo-class-lang-after-updating-angular>*
   - .custom-file-input:lang(en)~.custom-file-label -> unmatched
   - ERROR Error: Cannot set headers after they are sent to the client
   - Warning [ERR_HTTP_HEADERS_SENT]: Tried to set headers after they
      - *Suggestion from Mark Wood: * *The Headers Sent errors seem  to be 
      mostly an annoyance, but the constant dumping of stack traces is bloating 
      the log.*
   
These errors occur constantly :'(

*Suggestion from Mark Wood on error messages: The most serious is probably 
the proxy errors.  It appears that PM2 is closing proxy connections, 
probably because there are too many.  The machine is simply being asked to 
do more work than it can handle in the available time.  We see this too, 
even after doubling our CPU and memory from levels that were quite adequate 
for v6.  We are about to throw a big increase in resources at v7 to see if 
that helps, as it has at other sites.*

(2) Example of errors from our Apache Error Log (we run Apache to proxy, as 
recommended in documentation):

[Fri Mar 01 14:51:00.740446 2024] [proxy:error] [pid 1494894:tid 
140510257739520] [client 66.XXX.75.XXX:0] AH00898: Error reading from 
remote server returned by /handle/10393/19705/simple-search
[Fri Mar 01 14:53:26.192799 2024] [proxy_http:error] [pid 1494894:tid 
140510257739520] (104)Connection reset by peer: [client 66.XXX.75.XXX:0] 
AH01102: error reading status line from remote server localhost:4000

Some suggestions we've seen in the DSpace Slack already:
Tim Donoghue:

   - Initial increased bot traffic is common, but tends to decrease over 
   time
   - Review major errors in DSpace/Tomcat/Postgres/etc. logs
   - Enable more caching in server-side rendering as that uses the most CPU 
   in Node.js
      - Seconded by Mark Wood: Increasing the caching will likely reduce 
      the CPU demand but memory demand will increase drastically.
   - Mark Wood: In general, DSpace 7.x is much more computationally 
   expensive than the previous versions

If you've read this far, thank you so much for your time and 
consideration.  The wider DSpace community seems to be struggling with 
these issues, and we would all welcome your observations on these issues 
and suggestions for resolving it.

Best,
Carolyn Sullivan

-- 
All messages to this mailing list should adhere to the Code of Conduct: 
https://www.lyrasis.org/about/Pages/Code-of-Conduct.aspx
--- 
You received this message because you are subscribed to the Google Groups 
"DSpace Technical Support" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to dspace-tech+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/dspace-tech/0e817201-5f4a-4fc1-9781-b462cc97134fn%40googlegroups.com.

Reply via email to