I have Prometheus running in EKS (App Version: 2.18.1). The data is
stored on an EFS mount. I repeatedly get compaction failure errors, and
the number of WAL files increases drastically. This is fixed only after
I delete the WAL directory and restart the pod, but deleting the WAL
directory loses data. Please let me know if there is a permanent fix for
this issue.
Error from logs:
level=error ts=2020-07-23T03:51:41.230Z caller=db.go:667 component=tsdb msg="compaction failed" err="persist head block: write compaction: add series: out-of-order series added with label set \"{__name__=\\\"go_gc_duration_seconds\\\", instance=\\\"<hostname>:<port>\\\", job=\\\"<job_name>\\\", quantile=\\\"0\\\", region=\\\"<region_label>\\\"}\""
The job name varies. Each time this error occurs, it points to a different
job.
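
To put a number on the WAL growth described above, the segment files could
be counted at intervals. This is only a minimal sketch, assuming the usual
layout in which WAL segments are regular files with purely numeric names
inside the wal directory and checkpoints are subdirectories. The function
name and the path in the usage note are hypothetical and not taken from
this setup.

```python
import os


def count_wal_segments(wal_dir):
    """Return the number of WAL segment files found directly in wal_dir.

    Assumption: segments are regular files whose names consist only of
    digits (for example 00000042). Directories such as checkpoint.* and
    any other files are ignored.
    """
    count = 0
    for entry in os.scandir(wal_dir):
        if entry.is_file() and entry.name.isdigit():
            count += 1
    return count
```

Calling count_wal_segments("/prometheus/wal") (a hypothetical mount path)
before and after a failed compaction would show how quickly the segments
pile up.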