Try deleting the contents of the WAL folder and starting the service. Ideally
you would delete only the last WAL segment, the one on which the service is
failing to start; it is shown in the log just before Prometheus stops.
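Something along these lines, as a rough sketch (the data directory path and the
start command are taken from your earlier message; the exact segment to remove
is whichever one your log names, and clearing the whole WAL drops any samples
that were only in the WAL and not yet compacted into a block):

  # stop Prometheus first, then back up and inspect the WAL
  cp -r data/wal data/wal.bak
  ls -l data/wal/

  # remove only the newest (highest-numbered) segment named in the log,
  # or clear the whole WAL if that is not enough
  rm -rf data/wal/*

  # start the service again
  ./prometheus --config.file=prometheus.yml --storage.tsdb.path=data/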

On Thu, Jun 11, 2020, 2:26 AM neel patel <neel5...@gmail.com> wrote:

> Hi Julien,
>
> Unfortunately I don't have the previous logs. And I don't think it is
> related to disk space, as I have ~28 GB of storage available. Below is the
> output for my root file system.
>
> /dev/sda3        96G   69G   28G  72% /
> /dev/sda1       297M  207M   91M  70% /boot
> tmpfs           378M     0  378M   0% /run/user/26
> tmpfs           378M   24K  378M   1% /run/user/1000
>
>
> Let me know if you need anything else.
>
> On Thursday, June 11, 2020 at 12:20:46 PM UTC+5:30, neel patel wrote:
>>
>> Prometheus was working fine until two days ago. Today, when I stopped and
>> started it again, it showed the error below and would not start.
>>
>> I built Prometheus from the master branch and have been using that binary
>> for the last two days. Below are the logs from when I start Prometheus.
>>
>>
>> [neel@localhost prometheus]$ ./prometheus --config.file=prometheus.yml
>> --storage.tsdb.path=data/
>> level=info ts=2020-06-11T06:25:10.967Z caller=main.go:302 msg="No time or
>> size retention was set so using the default time retention" duration=15d
>> level=info ts=2020-06-11T06:25:10.967Z caller=main.go:337 msg="Starting
>> Prometheus" version="(version=2.18.1, branch=master,
>> revision=18d9ebf0ffc26b8bd0e136f552c8e9886d29ade4)"
>> level=info ts=2020-06-11T06:25:10.967Z caller=main.go:338
>> build_context="(go=go1.14.3, user=neel@localhost.localdomain,
>> date=20200604-05:51:34)"
>> level=info ts=2020-06-11T06:25:10.967Z caller=main.go:339
>> host_details="(Linux 3.10.0-1062.9.1.el7.x86_64 #1 SMP Fri Dec 6 15:49:49
>> UTC 2019 x86_64 localhost.localdomain (none))"
>> level=info ts=2020-06-11T06:25:10.967Z caller=main.go:340
>> fd_limits="(soft=1024, hard=4096)"
>> level=info ts=2020-06-11T06:25:10.967Z caller=main.go:341
>> vm_limits="(soft=unlimited, hard=unlimited)"
>> level=info ts=2020-06-11T06:25:10.973Z caller=main.go:678 msg="Starting
>> TSDB ..."
>> level=info ts=2020-06-11T06:25:10.973Z caller=web.go:524 component=web
>> msg="Start listening for connections" address=0.0.0.0:9090
>> level=info ts=2020-06-11T06:25:10.974Z caller=repair.go:59 component=tsdb
>> msg="Found healthy block" mint=1591250134254 maxt=1591257600000
>> ulid=01EACBSE6H4EP5CS2K2G6KWJ0W
>> level=info ts=2020-06-11T06:25:10.974Z caller=repair.go:59 component=tsdb
>> msg="Found healthy block" mint=1591682400000 maxt=1591747200000
>> ulid=01EAE4FPDZ932CDRBAGH019TK7
>> level=info ts=2020-06-11T06:25:10.974Z caller=repair.go:59 component=tsdb
>> msg="Found healthy block" mint=1591747200000 maxt=1591768800000
>> ulid=01EAEQAB8X70893C6ZC2T2ZK38
>> level=info ts=2020-06-11T06:25:10.975Z caller=repair.go:59 component=tsdb
>> msg="Found healthy block" mint=1591790400000 maxt=1591797600000
>> ulid=01EAGR8C1PMZZACQMN185NC254
>> level=info ts=2020-06-11T06:25:10.975Z caller=repair.go:59 component=tsdb
>> msg="Found healthy block" mint=1591797600000 maxt=1591804800000
>> ulid=01EAGR8D3YVDV517XCMBMMKCS7
>> level=info ts=2020-06-11T06:25:10.975Z caller=repair.go:59 component=tsdb
>> msg="Found healthy block" mint=1591768800000 maxt=1591790400000
>> ulid=01EAGR8ETQQC8MZ09SSC62N1D3
>> level=info ts=2020-06-11T06:25:10.982Z caller=main.go:547 msg="Stopping
>> scrape discovery manager..."
>> level=info ts=2020-06-11T06:25:10.983Z caller=main.go:561 msg="Stopping
>> notify discovery manager..."
>> level=info ts=2020-06-11T06:25:10.983Z caller=main.go:583 msg="Stopping
>> scrape manager..."
>> level=info ts=2020-06-11T06:25:10.983Z caller=main.go:557 msg="Notify
>> discovery manager stopped"
>> level=info ts=2020-06-11T06:25:10.983Z caller=main.go:543 msg="Scrape
>> discovery manager stopped"
>> level=info ts=2020-06-11T06:25:10.983Z caller=manager.go:882
>> component="rule manager" msg="Stopping rule manager..."
>> level=info ts=2020-06-11T06:25:10.983Z caller=manager.go:892
>> component="rule manager" msg="Rule manager stopped"
>> level=info ts=2020-06-11T06:25:10.983Z caller=notifier.go:601
>> component=notifier msg="Stopping notification manager..."
>> level=info ts=2020-06-11T06:25:10.983Z caller=main.go:749 msg="Notifier
>> manager stopped"
>> level=info ts=2020-06-11T06:25:10.983Z caller=main.go:577 msg="Scrape
>> manager stopped"
>> level=error ts=2020-06-11T06:25:10.983Z caller=main.go:758 err="opening
>> storage failed: found unsequential head chunk files 144 and 148"
>>
>> #########################
>>
>> The data directory contents are as below.
>>
>> 01EACBSE6H4EP5CS2K2G6KWJ0W  01EAEQAB8X70893C6ZC2T2ZK38
>> 01EAGR8D3YVDV517XCMBMMKCS7  chunks_head  queries.active
>> 01EAE4FPDZ932CDRBAGH019TK7  01EAGR8C1PMZZACQMN185NC254
>> 01EAGR8ETQQC8MZ09SSC62N1D3  lock         wal
>>
>> ##########################
>>
>>
>>
>> The Prometheus config file is as below.
>>
>> # my global config
>> global:
>>   scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
>>   evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
>>   # scrape_timeout is set to the global default (10s).
>>
>> # Alertmanager configuration
>> alerting:
>>   alertmanagers:
>>   - static_configs:
>>     - targets:
>>       # - alertmanager:9093
>>
>> # Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
>> rule_files:
>>   # - "first_rules.yml"
>>   # - "second_rules.yml"
>>
>> # A scrape configuration containing exactly one endpoint to scrape:
>> # Here it's Prometheus itself.
>> scrape_configs:
>>   # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
>>   - job_name: 'prometheus'
>>
>>     # Override the global default and scrape targets from this job every 5 seconds.
>>     scrape_interval: 5s
>>
>>     static_configs:
>>       - targets: ['localhost:9090']
>>
>>   # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
>>   - job_name: 'postgres-exporter'
>>
>>     # Override the global default and scrape targets from this job every 5 seconds.
>>     scrape_interval: 15s
>>
>>     static_configs:
>>       - targets: ['localhost:9187']
>> #############################################
>>
>> Let me know, is the TSDB corrupted? If yes, is there any way to recover it?
>>
>> Thanks in advance.
>>
