Huh! Ok, strange. And I guess you double-checked that that is what the Prometheus server really scrapes... then I'm a bit out of suggestions at the moment without poking at the setup myself.
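A rough way to cross-check from the Prometheus host itself (just a sketch; the exact instance label the moved slaves get is a guess, adjust the grep pattern accordingly):

  # run on the Prometheus host, against the exact target URL
  curl -s 'http://172.16.76.152:19999/api/v1/allmetrics?format=prometheus_all_hosts' \
    | grep -c 'instance="c1'

If the missing slaves show up there but still never appear in queries, the scrape content itself is probably not the problem.
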
On Sun, May 10, 2020 at 10:34 PM Yashar Nesabian <[email protected]> wrote:

> Here is the chart for the last 6 hours for the metric (the last data point
> is at 14:43):
>
> [image: Screenshot from 2020-05-11 00-58-58.png]
>
> On Monday, May 11, 2020 at 12:43:21 AM UTC+4:30, Yashar Nesabian wrote:
>>
>> The other slaves' timestamps differ by only 2-3 seconds from these, and
>> yes, 2:57pm UTC is roughly correct (I don't know the exact time). Using
>> foo[24h] is not very informative right now, because we still have the
>> older metrics from when these slaves were on netdata master number 1.
>> I did another experiment: I downloaded the metric files again and ran
>> (date +%s) on the Prometheus server at almost the same moment. The
>> metrics' timestamp was 1589141392868 and the server's timestamp was
>> 1589141393, so I don't think this is the problem.
>>
>> On Monday, May 11, 2020 at 12:19:23 AM UTC+4:30, Julius Volz wrote:
>>>
>>> [+CCing back prometheus-users, which I had accidentally removed]
>>>
>>> How similar are the others? The ones in your example are from this
>>> afternoon (2:57pm UTC), I guess that's when you downloaded the file for
>>> grepping first?
>>>
>>> A regular instant vector selector in PromQL (like just "foo") will only
>>> select data points up to 5 minutes into the past from the current
>>> evaluation timestamp, so the table view would not show samples for any
>>> series whose last sample is more than 5m in the past. You could try a
>>> range selector like "foo[24h]" on these to see whether any historical
>>> data is returned (I would expect so).
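>>>
>>> Roughly, as a sketch against the HTTP API (substitute the real metric
>>> name for "foo"):
>>>
>>>   # instant query: only looks back 5 minutes from the evaluation time
>>>   curl -s 'http://172.16.77.50:9090/api/v1/query?query=foo'
>>>
>>>   # range selector: returns any raw samples from the last 24 hours
>>>   curl -s 'http://172.16.77.50:9090/api/v1/query?query=foo%5B24h%5D'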
>>>
>>> On Sun, May 10, 2020 at 9:37 PM Yashar Nesabian <[email protected]> wrote:
>>>>
>>>> Sure, here it is. If the second parameter is the timestamp, then yes,
>>>> that would be the problem, but then I wonder how the other metrics get
>>>> stored by the Prometheus server, because they have similar timestamps.
>>>>
>>>> grep -i "netdata_web_log_detailed_response_codes_total" allmetrics\?format=prometheus_all_hosts\&source=as-collected.2 | grep -i "abs"
>>>> netdata_web_log_detailed_response_codes_total{chart="web_log_passenger_event.detailed_response_codes",family="responses",dimension="200",instance="abs-02.x.y.zabs"} 245453 1589122673736
>>>> netdata_web_log_detailed_response_codes_total{chart="web_log_passenger_event.detailed_response_codes",family="responses",dimension="400",instance="abs-02.x.y.zabs"} 82 1589122673736
>>>> netdata_web_log_detailed_response_codes_total{chart="web_log_passenger_event.detailed_response_codes",family="responses",dimension="401",instance="abs-02.x.y.zabs"} 6 1589122673736
>>>> netdata_web_log_detailed_response_codes_total{chart="web_log_passenger_event.detailed_response_codes",family="responses",dimension="200",instance="abs-04.x.y.zabs"} 238105 1589122673017
>>>> netdata_web_log_detailed_response_codes_total{chart="web_log_passenger_event.detailed_response_codes",family="responses",dimension="400",instance="abs-04.x.y.zabs"} 59 1589122673017
>>>> netdata_web_log_detailed_response_codes_total{chart="web_log_passenger_event.detailed_response_codes",family="responses",dimension="401",instance="abs-04.x.y.zabs"} 3 1589122673017
>>>> netdata_web_log_detailed_response_codes_total{chart="web_log_passenger_event.detailed_response_codes",family="responses",dimension="200",instance="abs-03.x.y.zabs"} 241708 1589122673090
>>>> netdata_web_log_detailed_response_codes_total{chart="web_log_passenger_event.detailed_response_codes",family="responses",dimension="400",instance="abs-03.x.y.zabs"} 68 1589122673090
>>>> netdata_web_log_detailed_response_codes_total{chart="web_log_passenger_event.detailed_response_codes",family="responses",dimension="401",instance="abs-03.x.y.zabs"} 5 1589122673090
>>>> netdata_web_log_detailed_response_codes_total{chart="web_log_passenger_event.detailed_response_codes",family="responses",dimension="200",instance="abs-01.x.y.zabs"} 250296 1589122674872
>>>> netdata_web_log_detailed_response_codes_total{chart="web_log_passenger_event.detailed_response_codes",family="responses",dimension="400",instance="abs-01.x.y.zabs"} 81 1589122674872
>>>> netdata_web_log_detailed_response_codes_total{chart="web_log_passenger_event.detailed_response_codes",family="responses",dimension="401",instance="abs-01.x.y.zabs"} 7 1589122674872
>>>>
>>>> On Sun, May 10, 2020 at 10:36 PM Julius Volz <[email protected]> wrote:
>>>>>
>>>>> Hmm, odd. Could you share some of the lines that your grep finds in
>>>>> the metrics output of the correctly scraped target?
>>>>>
>>>>> The example at the top of https://github.com/netdata/netdata/issues/3891
>>>>> suggests that Netdata sets client-side timestamps for samples (which is
>>>>> uncommon for Prometheus otherwise). Maybe those timestamps are too far
>>>>> in the past (more than 5 minutes), so they would not be shown anymore?
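>>>>>
>>>>> A quick way to check the age of those sample timestamps (rough sketch;
>>>>> the timestamps in that exposition output are in milliseconds):
>>>>>
>>>>>   # seconds between "now" and the sample timestamp 1589122673736
>>>>>   echo $(( $(date +%s) - 1589122673736 / 1000 ))
>>>>>
>>>>> Anything much larger than 300 would keep those series out of the
>>>>> instant query view.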
>>>>>
>>>>> On Sun, May 10, 2020 at 6:51 PM Yashar Nesabian <[email protected]> wrote:
>>>>>>
>>>>>> I have a job on Prometheus which gathers metrics from 4 netdata master
>>>>>> servers. Here is the scenario I had:
>>>>>>
>>>>>> - On netdata master number 1, I gather metrics from about 200 slaves.
>>>>>> - For some reason, I decided to move 12 slaves
>>>>>>   (a1,a2,a3,a4,b1,b2,b3,b4,c1,c2,c3,c4) from the first netdata master
>>>>>>   to the second netdata master.
>>>>>> - Now I only see metrics from 8 of those servers on the Prometheus
>>>>>>   server (a1,a2,a3,a4,b1,b2,b3,b4), coming from the second master.
>>>>>> - I check the job status on the targets page and I see all 4 masters
>>>>>>   are up and metrics are gathered successfully.
>>>>>> - Here is the URL which Prometheus uses to read the metrics from
>>>>>>   netdata master number 2:
>>>>>>   http://172.16.76.152:19999/api/v1/allmetrics?format=prometheus_all_hosts
>>>>>> - I grep the downloaded file with the hosts' metrics for the
>>>>>>   c1,c2,c3,c4 hosts and I see netdata is sending all the metrics
>>>>>>   relevant to these slaves (roughly the commands sketched at the end
>>>>>>   of this message).
>>>>>> - But when I search for the metric on the Graph page, I don't see any
>>>>>>   results:
>>>>>>
>>>>>> [image: Screenshot from 2020-05-10 20-58-27.png]
>>>>>>
>>>>>> All the servers' clocks are synced and correct.
>>>>>> Here is the output of systemctl status prometheus:
>>>>>>
>>>>>> May 10 19:35:07 devops-mon-01 systemd[1]: Reloading Prometheus.
>>>>>> May 10 19:35:07 devops-mon-01 prometheus[6076]: level=info ts=2020-05-10T15:05:07.407Z caller=main.go:734 msg="Loading configuration file" filename=/etc/prometheus/prometheus.yml
>>>>>> May 10 19:35:07 devops-mon-01 prometheus[6076]: level=info ts=2020-05-10T15:05:07.416Z caller=main.go:762 msg="Completed loading of configuration file" filename=/etc/prometheus/prometheus.yml
>>>>>> May 10 19:35:07 devops-mon-01 systemd[1]: Reloaded Prometheus.
>>>>>> May 10 19:53:22 devops-mon-01 prometheus[6076]: level=error ts=2020-05-10T15:23:22.621Z caller=api.go:1347 component=web msg="error writing response" bytesWritten=0 err="write tcp 172.16.77.50:9090->172.16.76.168:56778: write: broken pipe"
>>>>>> May 10 20:25:53 devops-mon-01 prometheus[6076]: level=error ts=2020-05-10T15:55:53.058Z caller=api.go:1347 component=web msg="error writing response" bytesWritten=0 err="write tcp 172.16.77.50:9090->172.16.76.168:41728: write: broken pipe"
>>>>>>
>>>>>> 172.16.77.50 is our Prometheus server and 172.16.76.168 is our Grafana
>>>>>> server, so I think the last error is not related to my problem.
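>>>>>>
>>>>>> The grep check mentioned above was roughly the following (a sketch; the
>>>>>> c1..c4 names stand in for the real slave hostnames in the instance
>>>>>> label):
>>>>>>
>>>>>>   curl -s 'http://172.16.76.152:19999/api/v1/allmetrics?format=prometheus_all_hosts' -o allmetrics.txt
>>>>>>   for h in c1 c2 c3 c4; do echo -n "$h: "; grep -c "instance=\"$h" allmetrics.txt; done
>>>>>>
>>>>>> and each of them shows a non-zero number of series in that dump.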

--
Julius Volz
PromLabs - promlabs.com

