I have Prometheus running in EKS (app version 2.18.1). The data is stored 
on an EFS mount. I am repeatedly getting compaction failure errors, and the 
number of WAL files increases drastically. The problem goes away only after 
the WAL directory is deleted and the pod is restarted, but removing the WAL 
directory loses data. Please let me know if there is a permanent fix for 
this issue.

Error from logs: 

level=error ts=2020-07-23T03:51:41.230Z caller=db.go:667 component=tsdb 
msg="compaction failed" err="persist head block: write compaction: add 
series: out-of-order series added with label set 
\"{__name__=\\\"go_gc_duration_seconds\\\", 
instance=\\\"<hostname>:<port>\\\", job=\\\"<job_name>\\\", 
quantile=\\\"0\\\", region=\\\"<region_label>\\\"}\""

The job name varies: each time this error occurs, the label set in the 
message points to a different job.
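
To check how the failures spread across jobs, the job label can be tallied from the pod logs. This is only a rough sketch, not anything Prometheus ships: it assumes each log entry arrives on a single line (the entry above is wrapped by the mail client) and that the label set is escaped the same way as in the message above. The function name `count_failed_jobs` and the regular expression are made up for illustration.

```python
import re
from collections import Counter

# Matches job="..." with any number of escaping backslashes before the quotes,
# as seen in the nested, escaped label set of the error message (assumption).
JOB_RE = re.compile(r'job=\\*"([^"\\]+)\\*"')


def count_failed_jobs(lines):
    """Count "compaction failed" log entries per job label.

    `lines` is any iterable of log lines, one entry per line.
    """
    counts = Counter()
    for line in lines:
        # Only look at the TSDB compaction failures, skip everything else.
        if 'msg="compaction failed"' not in line:
            continue
        match = JOB_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Feeding it the lines of the Prometheus pod log (for example the saved output of `kubectl logs`) and printing `count_failed_jobs(...).most_common()` would show whether the error really moves between jobs or favours a few of them.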
