Now would be a good time to do:

    ls -l /var/lib/prometheus/data/chunks_head/
    du -sck /var/lib/prometheus/data/chunks_head/*

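If you would rather track this over time than run du by hand, here is a rough Python sketch of the same per-entry check. It is only an illustration: `dir_sizes_kib` is a name made up for this example, and it sums apparent file sizes, so the figures can differ from du, which counts allocated blocks.

```python
import os


def dir_sizes_kib(parent):
    """Return {entry_name: size_in_KiB} for each entry directly under parent.

    Hypothetical helper, not part of Prometheus. Uses apparent sizes
    (st_size), so sparse or preallocated files may not match `du`.
    """
    with os.scandir(parent) as it:
        entries = sorted(it, key=lambda e: e.name)

    sizes = {}
    for entry in entries:
        if entry.is_dir(follow_symlinks=False):
            # Sum every file below a subdirectory.
            total = 0
            for root, _dirs, files in os.walk(entry.path):
                for name in files:
                    total += os.lstat(os.path.join(root, name)).st_size
        else:
            total = entry.stat(follow_symlinks=False).st_size
        sizes[entry.name] = total // 1024
    return sizes
```

Calling `dir_sizes_kib("/var/lib/prometheus/data/chunks_head")` and summing the values gives a figure comparable to the "total" line from du, which could be logged periodically to catch the point where it starts to grow.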
My suspicion is your out-of-memory condition is messing up the writing of chunks. Are you using cgroups/containers?

Also, is prometheus continually crashing and being restarted by systemd? Try looking in "journalctl -eu prometheus". That might explain why you see lots of free memory most of the time (when prometheus is stopped).

On Thursday, 17 February 2022 at 14:57:25 UTC Senthil wrote:

> The issue started again.
>
> 629G chunks_head
> 0 lock
> 4.0K queries.active
> 9.3G wal
>
> There is numerous restart of Prometheus
> Feb 17 09:02:02 kernel: Out of memory: Kill process 36580 (prometheus) score 844 or sacrifice child
> Feb 17 09:08:36 kernel: Out of memory: Kill process 39001 (prometheus) score 846 or sacrifice child
> Feb 17 09:16:02 kernel: Out of memory: Kill process 41074 (prometheus) score 845 or sacrifice child
> Feb 17 09:22:17 kernel: Out of memory: Kill process 44665 (prometheus) score 844 or sacrifice child
> Feb 17 09:29:25 kernel: Out of memory: Kill process 47234 (prometheus) score 844 or sacrifice child
> Feb 17 09:36:06 kernel: Out of memory: Kill process 48970 (prometheus) score 846 or sacrifice child
> Feb 17 09:43:21 kernel: Out of memory: Kill process 50661 (prometheus) score 844 or sacrifice child
>
> but there is plenty of mem available in the servers.
>
>         total  used  free  shared  buff/cache  available
> Mem:       47     5    31       0          10         40
> Swap:       5     1     3
> Total:     52     7    35
>
> On Tuesday, February 1, 2022 at 5:21:32 PM UTC-5 Brian Candler wrote:
>
>> On Tuesday, 1 February 2022 at 21:52:30 UTC Senthil wrote:
>>
>>> I started on Jan 31, so it's a day.
>>>
>>> # du -sck chunks_head/*
>>> 54140 chunks_head/024326
>>> 4 chunks_head/024327
>>> 54144 total
>>
>> That's perfectly reasonable: it's only 54MB (which is a long way from 689GB!)
>>
>> Here's what I see on a moderately busy system:
>>
>> root@ldex-prometheus:~# du -sck /var/lib/prometheus/data/chunks_head/*
>> 81004 /var/lib/prometheus/data/chunks_head/006831
>> 77824 /var/lib/prometheus/data/chunks_head/006832
>> 158828 total
>>
>> That's comparable to yours.
>>
>> Therefore, I think you need to keep an eye on this periodically. If only you had a monitoring system which could do this for you :-)
>>
>> If it does start to rise, that's when you'll need to check prometheus log output and find out what's happening. But this is very strange, and it does seem to be something specific to your system.
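
The OOM kills quoted from 17 February are spaced fairly regularly, which is consistent with a crash-and-restart loop. A small Python sketch for pulling the gaps out of such kernel log lines follows. The regex and the name `oom_kill_intervals` are assumptions made for this example, and the year has to be supplied because syslog-style timestamps omit it.

```python
import re
from datetime import datetime

# Matches lines such as:
#   Feb 17 09:02:02 kernel: Out of memory: Kill process 36580 (prometheus) ...
OOM_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2}) .*Out of memory: Kill process "
    r"(?P<pid>\d+) \((?P<name>[^)]+)\)"
)

# The seven log lines quoted in this thread.
SAMPLE = """\
Feb 17 09:02:02 kernel: Out of memory: Kill process 36580 (prometheus) score 844 or sacrifice child
Feb 17 09:08:36 kernel: Out of memory: Kill process 39001 (prometheus) score 846 or sacrifice child
Feb 17 09:16:02 kernel: Out of memory: Kill process 41074 (prometheus) score 845 or sacrifice child
Feb 17 09:22:17 kernel: Out of memory: Kill process 44665 (prometheus) score 844 or sacrifice child
Feb 17 09:29:25 kernel: Out of memory: Kill process 47234 (prometheus) score 844 or sacrifice child
Feb 17 09:36:06 kernel: Out of memory: Kill process 48970 (prometheus) score 846 or sacrifice child
Feb 17 09:43:21 kernel: Out of memory: Kill process 50661 (prometheus) score 844 or sacrifice child
""".splitlines()


def oom_kill_intervals(lines, process="prometheus", year=2022):
    """Return seconds between consecutive OOM kills of `process`.

    Hypothetical helper; assumes lines are in chronological order
    and all fall within the given year.
    """
    times = []
    for line in lines:
        m = OOM_RE.search(line)
        if m and m.group("name") == process:
            stamp = f"{year} {m.group('ts')}"
            times.append(datetime.strptime(stamp, "%Y %b %d %H:%M:%S"))
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


intervals = oom_kill_intervals(SAMPLE)
print(intervals)  # [394, 446, 375, 428, 401, 435]
```

That is one kill roughly every seven minutes, so the process spends much of each cycle either stopped or just restarted, which fits the large "free" figure in the memory snapshot.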