Dear Carolyn, not directly answering your question, but we are running (a possibly smaller instance of) DSpace for a non-institutional site without PM2, with Node.js directly. With the amount of traffic you are having, you might want to give it a try to rule out PM2 and its handling as a source of error. With our install we are not getting, e.g., *ERR_HTTP_HEADERS_SENT*.
BR
Maruan Sahyoun
FileAffairs GmbH

Carolyn Sullivan wrote on Tuesday, 5 March 2024 at 11:30:15 UTC+1:

> Hello all,
>
> As you might have seen if you frequent the Technical Support channel in the
> DSpace Slack <https://dspace-org.slack.com/archives/C3V628QNN/p1709318454084069>,
> we've been encountering high CPU usage in DSpace 7.6 leading to decreased
> performance (i.e. site unavailability), and we've been having a lot of errors.
> We're not entirely sure which errors may be the significant ones relating to
> our performance challenges, and would welcome any input from the community to
> help us improve our site issues. Also, if you're encountering similar issues,
> please do let us know--maybe we're all having the same problems and can solve
> them collaboratively. Thanks already to Tim Donoghue and Mark Wood for their
> suggestions in the DSpace Slack! I've aggregated the responses we've already
> received on these issues here to help us keep track of suggestions.
>
> So, a summary of our issues: we've set up our server following best practices
> in the Performance Tuning documentation
> <https://wiki.lyrasis.org/display/DSDOC7x/Performance+Tuning+DSpace>. We run
> everything on a single server with 4 CPUs and 12 GB of RAM, which was the
> configuration that worked for our previous version of DSpace (6.3). Initially,
> we had pm2 configured in cluster mode with max instances and
> max_memory_restart: 500M. With this configuration, the node instances kept
> restarting roughly every minute and seemed to be monopolizing the CPUs,
> starving the other components.
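[An aside on the restart loop described above: pm2 restarts a worker as soon as its resident memory exceeds `max_memory_restart`, while `--max_old_space_size` caps the V8 heap. If the restart threshold sits below the app's steady-state footprint, workers cycle endlessly and never keep a warm SSR cache. A hedged sketch of an ecosystem file that keeps the V8 heap cap below the pm2 restart threshold, so garbage collection runs before pm2 kills the worker; all values here are illustrative assumptions, not recommendations:]

```json
{
  "apps": [
    {
      "name": "dspace-ui",
      "cwd": "/var/dspace-frontend/",
      "script": "dist/server/main.js",
      "exec_mode": "cluster",
      "instances": 3,
      "max_memory_restart": "2000M",
      "node_args": "--max_old_space_size=1536",
      "env": { "NODE_ENV": "production" }
    }
  ]
}
```

With three workers this budgets roughly 6 GB for the UI, leaving headroom on a 12 GB host for the other components sharing the machine.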
> Since then, we have tuned it down to 3 instances, i.e.:
>
> {
>   "apps": [
>     {
>       "name": "dspace-ui",
>       "cwd": "/var/dspace-frontend/",
>       "script": "dist/server/main.js",
>       "instances": "3",
>       "exec_mode": "cluster",
>       "timestamp": "YYYY-MM-DD HH:mm Z",
>       "out_file": "log/dspace-ui.log",
>       "error_file": "log/dspace-ui_error.log",
>       "merge_logs": true,
>       "env": {
>         "NODE_ENV": "production",
>         "NODE_EXTRA_CA_CERTS": "/etc/ssl/certs/rootCA2.crt"
>       },
>       "max_memory_restart": "1500M",
>       "node_args": "--max_old_space_size=4096"
>     }
>   ]
> }
>
> A review of processes with top shows very active node instances despite low
> traffic (~1 request/second):
>
> Tasks: 289 total, 5 running, 284 sleeping, 0 stopped, 0 zombie
> %Cpu(s): 93.9 us, 3.0 sy, 0.0 ni, 3.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
> MiB Mem : 11965.2 total, 440.4 free, 10901.1 used, 623.8 buff/cache
> MiB Swap: 0.0 total, 0.0 free, 0.0 used. 613.9 avail Mem
>
>     PID USER    PR NI    VIRT    RES   SHR S  %CPU %MEM    TIME+ COMMAND
> 1504803 dspace  20  0 2328180   1.5g 13936 R 100.0 12.5 10:53.08 node /v+
> 1506783 dspace  20  0 2620092   1.7g 14024 R  93.8 14.8  9:44.49 node /v+
> 1506913 dspace  20  0 1383380 586472 14180 R  93.8  4.8  4:57.11 node /v+
> 1508040 dspace  20  0  733380 141452 36952 R  75.0  1.2  0:00.77 node /v+
>     781 root    20  0  237020   2536   944 S   6.2  0.0  9:41.79 vmtoolsd
>       1 root    20  0  171488   7176  2492 S   0.0  0.1  0:44.04 systemd
>
> Our cache settings are as follows:
>
> # Caching settings
> cache:
>   ...
>   serverSide:
>     debug: false
>     botCache:
>       max: 1000
>       timeToLive: 86400000 # 1 day
>       allowStale: true
>     anonymousCache:
>       max: 1000
>       timeToLive: 10000 # 10 seconds
>       allowStale: true
>
> *The main question of our systems analyst (Francois Malric): is this level of
> constantly high CPU usage normal due to Node.js?
> Or is it likely that our DSpace is displaying poor performance due to
> underlying issues?*
>
> Here are some examples of the errors we've seen:
>
> (1) From our DSpace logs. According to these, we don't have that much traffic
> (HTTP 0.96 requests/minute), which would likely be higher if bot traffic were
> the issue. Nota bene: our pm2 monitor likely has a bug, as it shows the units
> as req/min when it should be req/sec.
>
> Process list (from the pm2 monitor):
>
> [ 1] dspace-ui      Mem: 824 MB
> [ 2] dspace-ui      Mem: 316 MB
> [ 3] dspace-ui      Mem: 777 MB
> [ 0] pm2-logrotate  Mem: 45
>
> dspace-ui logs (lines truncated by the monitor's display width):
>
> The response for 'https://ruor.uottawa.ca/server/api/core/items/d2d3c...
> 1 rules skipped due to selector errors:
>   .custom-file-input:lang(en)~.custom-file-label -> unmatched
> GET /handle/10393/19705/simple-search?query=&sort_by=score&order=desc
> 1 rules skipped due to selector errors:
>   .custom-file-input:lang(en)~.custom-file-label -> unmatched
> Redirecting from /bitstreams/e524c49e-5fc2-4e74-b69d-0c890238ab3b/dow...
> GET /bitstreams/e524c49e-5fc2-4e74-b69d-0c890238ab3b/download 302
> ERROR Error: Cannot set headers after they are sent to the client
>     at new NodeError (node:internal/errors:405:5)
>     at ServerResponse.setHeader (node:_http_outgoing:648:11)
>     at ServerResponseService.setHeader (/opt/dspace-frontend/dist/ser...
>     at Object.next (/opt/dspace-frontend/dist/server/9366.js:1:4722)
>     at ConsumerObserver2.next (/opt/dspace-frontend/dist/server/main....
>     at SafeSubscriber2.Subscriber2._next (/opt/dspace-frontend/dist/s...
>     at SafeSubscriber2.Subscriber2.next (/opt/dspace-frontend/dist/se...
>     at /opt/dspace-frontend/dist/server/main.js:1:4471483
>     at OperatorSubscriber2._this._next (/opt/dspace-frontend/dist/ser...
>     at OperatorSubscriber2.Subscriber2.next (/opt/dspace-frontend/dis...
>   code: 'ERR_HTTP_HEADERS_SENT'
> }
> 1 rules skipped due to selector errors:
>   .custom-file-input:lang(en)~.custom-file-label -> unmatched
> Warning [ERR_HTTP_HEADERS_SENT]: Tried to set headers after they...
> GET /items/d2d3cc05-419a-488e-912e-1ff20ab7a654 200 3281.694 ms - -
>
> Custom metrics:
>   Heap Size               709.14 MiB
>   Event Loop Latency p95
>   Event Loop Latency      34.32 ms
>   Active handles          10
>   Active requests         1
>   HTTP                    0.96 req/min
>   HTTP P95 Latency        4009 ms
>   HTTP Mean Latency       868 ms
>
> Metadata:
>   App Name      dspace-ui
>   Namespace     default
>   Version       N/A
>   Restarts      38
>   Uptime        5m
>   Script path   /opt/dspace-frontend/dist/server/main.js
>   Script args   N/A
>   Interpreter   node
>
> We'd particularly appreciate feedback on the following error messages:
>
> - "1 rules skipped due to selector errors:
>   .custom-file-input:lang(en)~.custom-file-label -> unmatched"
>   - Suggestion from Mark Wood: this is probably because we're using Bootstrap 4
>     <https://stackoverflow.com/questions/67581454/receiving-unmatched-pseudo-class-lang-after-updating-angular>
> - "ERROR Error: Cannot set headers after they are sent to the client" and
>   "Warning [ERR_HTTP_HEADERS_SENT]: Tried to set headers after they..."
>   - Suggestion from Mark Wood: the Headers Sent errors seem to be mostly an
>     annoyance, but the constant dumping of stack traces is bloating the log.
>
> These errors occur constantly :'(
>
> Suggestion from Mark Wood on error messages: the most serious is probably the
> proxy errors. It appears that PM2 is closing proxy connections, probably
> because there are too many. The machine is simply being asked to do more work
> than it can handle in the available time. We see this too, even after doubling
> our CPU and memory from levels that were quite adequate for v6. We are about
> to throw a big increase in resources at v7 to see if that helps, as it has at
> other sites.
>
> (2) Examples of errors from our Apache error log (we run Apache as a proxy,
> as recommended in the documentation):
>
> [Fri Mar 01 14:51:00.740446 2024] [proxy:error] [pid 1494894:tid 140510257739520] [client 66.XXX.75.XXX:0] AH00898: Error reading from remote server returned by /handle/10393/19705/simple-search
> [Fri Mar 01 14:53:26.192799 2024] [proxy_http:error] [pid 1494894:tid 140510257739520] (104)Connection reset by peer: [client 66.XXX.75.XXX:0] AH01102: error reading status line from remote server localhost:4000
>
> Some suggestions we've seen in the DSpace Slack already:
>
> Tim Donoghue:
> - Initial increased bot traffic is common, but it tends to decrease over time.
> - Review major errors in the DSpace/Tomcat/Postgres/etc. logs.
> - Enable more caching in server-side rendering, as that uses the most CPU in
>   Node.js.
>   - Seconded by Mark Wood: increasing the caching will likely reduce the CPU
>     demand, but memory demand will increase drastically.
>
> Mark Wood:
> - In general, DSpace 7.x is much more computationally expensive than previous
>   versions.
>
> If you've read this far, thank you so much for your time and consideration.
> The wider DSpace community seems to be struggling with these issues, and we
> would all welcome your observations and suggestions for resolving them.
>
> Best,
> Carolyn Sullivan

--
All messages to this mailing list should adhere to the Code of Conduct: https://www.lyrasis.org/about/Pages/Code-of-Conduct.aspx
---
You received this message because you are subscribed to the Google Groups "DSpace Technical Support" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dspace-tech+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/dspace-tech/1233de11-55c3-489c-89d4-6279578bdb2en%40googlegroups.com.