dhairav opened a new issue, #9625:
URL: https://github.com/apache/trafficserver/issues/9625
Hello,
We use trafficserver as a reverse proxy for one of our caching systems. It
is a 128G RAM system with about 56 TB of storage, on which we have configured
a (safe) RAM cache size of 92G. We've also configured an Average Object Size
of 128K.
According to our calculations, we should see at least 10 GB of free memory
on the system, but RAM utilization simply doesn't come down, even during
low-traffic periods.
We typically hit 3-4 Gbps of traffic and the RAM cache is almost always full,
with a simple LRU configured. We have observed that the system's RAM
utilization keeps increasing over time, and trafficserver has been OOM-killed
multiple times, causing traffic disruptions.
What I would like to understand is why RAM consumption exceeds the
configured RAM cache size by such a large margin.
The system is currently running with less than 2G of free RAM, with no other
major processes running on it.
Here is the output from the `free` command:
total: 125Gi
used: 122Gi
free: 961Mi
shared: 2.0Mi
buffers: 884Mi
cache: 800Mi
available: 1.5Gi
I can see via `systemctl status trafficserver` that trafficserver is using
122G of RAM:
`trafficserver.service - Apache Traffic Server is a fast, scalable and extensible caching proxy server.
Loaded: loaded (/lib/systemd/system/trafficserver.service; enabled; vendor preset: enabled)
Active: active (running) since Tue 2023-04-11 00:29:36 UTC; 1 weeks 1 days ago
Docs: man:traffic_server(8)
Main PID: 54560 (traffic_manager)
Tasks: 49 (limit: 9830)
Memory: 122.0G
CPU: 6d 20h 1min 19.626s
CGroup: /system.slice/trafficserver.service
├─54560 /usr/bin/traffic_manager
├─54569 /usr/bin/traffic_server -M --httpport 80:fd=8,443:fd=9:ssl:proto=http,443:fd=10:ipv6
└─54571 traffic_crashlog --syslog --wait --host x86_64-pc-linux-gnu --user trafficserver
Apr 11 00:29:36 dhairav systemd[1]: Started Apache Traffic Server is a fast, scalable and extensible caching proxy server..
Apr 11 00:29:36 dhairav traffic_manager[54560]: [E. Mgmt] log ==> [TrafficManager] using root directory '/usr'
Apr 11 00:29:36 dhairav traffic_manager[54560]: NOTE: --- Manager Starting ---
Apr 11 00:29:36 dhairav traffic_manager[54560]: NOTE: Manager Version: Apache Traffic Server - traffic_manager - 8.1.5 - (build # 081207 on Aug 12 2022 at 07:16:08)
Apr 11 00:29:36 dhairav traffic_manager[54560]: NOTE: RLIMIT_NOFILE(7):cur(58981),max(58981)
Apr 11 00:29:39 dhairav traffic_server[54569]: NOTE: --- traffic_server Starting ---
Apr 11 00:29:39 dhairav traffic_server[54569]: NOTE: traffic_server Version: Apache Traffic Server - traffic_server - 8.1.5 - (build # 081207 on Aug 12 2022 at 07:16:08)
Apr 11 00:29:39 dhairav traffic_server[54569]: NOTE: RLIMIT_NOFILE(7):cur(58981),max(58981)
Apr 11 00:29:39 dhairav traffic_manager[54569]: Traffic Server 8.1.5 Aug 12 2022 07:16:08 localhost
Apr 11 00:29:39 dhairav traffic_manager[54569]: traffic_server: using root directory '/usr'
`
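The `Memory:` figure above is the systemd total for the whole service, which
covers all three processes in the CGroup listing. A per-process breakdown
might help narrow this down. The following is a minimal sketch, assuming a
Linux `/proc/<pid>/status` file that reports a `VmRSS:` line in kB; the helper
names are hypothetical and the PIDs are simply the ones shown above.

```python
from pathlib import Path


def parse_vmrss_kib(status_text):
    """Return the VmRSS value (in KiB) from /proc/<pid>/status text, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # Line looks like: "VmRSS:   1048576 kB"
            return int(line.split()[1])
    return None


def rss_gib(pid):
    """Return resident set size in GiB for a PID, or None if unavailable."""
    try:
        text = Path(f"/proc/{pid}/status").read_text()
    except OSError:
        # Process is gone, /proc is missing, or access is denied.
        return None
    kib = parse_vmrss_kib(text)
    return None if kib is None else kib / (1024 * 1024)


if __name__ == "__main__":
    # PIDs taken from the CGroup listing above; adjust for the running system.
    for pid in (54560, 54569, 54571):
        print(pid, rss_gib(pid))
```

Sampling this periodically would show whether the growth is in
`traffic_server` itself rather than elsewhere in the service.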
As a short-term solution, I could create a boot script that exempts the
trafficserver processes from the OOM killer, but I believe that would not
address the underlying memory leak or any misconfiguration on our side.
I suspect it has something to do with the cache index overhead, but we took
that into account in our calculations when choosing the safer RAM cache value
of 92G.
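For reference, the index overhead can be approximated with a back-of-the-envelope
calculation. This is only a sketch: it assumes roughly 10 bytes of memory per
cache directory entry and one entry per average-sized object, and it treats
56 TB, 128K and 128G as binary units. The function and constant names are
hypothetical, and the real directory sizing in Traffic Server may differ.

```python
# Rough estimate of the in-memory cache directory (index) size.
# Assumptions, not taken from records.config:
#   - about 10 bytes per directory entry
#   - one directory entry per average-sized object
#   - binary units (GiB/TiB) for the sizes quoted in the report

GIB = 1024 ** 3
TIB = 1024 ** 4
BYTES_PER_DIR_ENTRY = 10  # assumed per-entry cost


def estimate_directory_bytes(storage_bytes, avg_object_size,
                             bytes_per_entry=BYTES_PER_DIR_ENTRY):
    """Estimate index memory as (storage / average object size) * entry size."""
    entries = storage_bytes // avg_object_size
    return entries * bytes_per_entry


storage = 56 * TIB        # cache storage from the report
avg_obj = 128 * 1024      # configured Average Object Size
ram_cache = 92 * GIB      # configured RAM cache size
total_ram = 128 * GIB     # installed RAM as described in the report

directory = estimate_directory_bytes(storage, avg_obj)
headroom = total_ram - ram_cache - directory

print(f"estimated directory index: {directory / GIB:.3f} GiB")
print(f"expected headroom:         {headroom / GIB:.3f} GiB")
```

Under these assumptions the index accounts for only a few GiB, so it does not
by itself explain usage of 122G.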
I'll be happy to attach my `records.config` file or anything else at my
disposal to solve this.