Re: [prometheus-users] Prometheus is not showing the metric gathered from netdata

Julien Pivotto Sun, 10 May 2020 13:55:17 -0700

Hi,


Could you try setting honor_timestamps: false in the scrape config for this job?
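
For context, here is a rough model of what that setting changes, going by the documented behaviour: with honor_timestamps: true (the default), Prometheus keeps the timestamp the target exposes; with false, it ignores that timestamp and uses the scrape time instead. The option is set per job under scrape_configs in prometheus.yml. The function name and the scrape time below are made up for illustration; this is a sketch, not Prometheus code.

```python
def stored_timestamp_ms(exposed_ms, scrape_ms, honor_timestamps=True):
    """Return the timestamp a sample would be stored with (illustrative model only)."""
    # With honor_timestamps enabled, a timestamp exposed by the target wins.
    if honor_timestamps and exposed_ms is not None:
        return exposed_ms
    # Otherwise the sample is stamped with the time of the scrape.
    return scrape_ms


exposed_ms = 1589122673736  # timestamp Netdata attached to one of the abs-02 samples
scrape_ms = 1589141393000   # hypothetical scrape time, assumed for this example

print(stored_timestamp_ms(exposed_ms, scrape_ms, honor_timestamps=True))
print(stored_timestamp_ms(exposed_ms, scrape_ms, honor_timestamps=False))
```

If the samples show up once the exposed timestamps are ignored, that would point at the timestamps rather than at the scrape itself.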

Thanks

On 10 May 13:13, Yashar Nesabian wrote:
> The other slaves' timestamps differ from the timestamps of these metrics by
> only 2-3 seconds, and yes, 2:57pm UTC is almost correct (I don't know the
> exact time). Using foo[24h] is not very informative right now because we
> still have the previous metrics from when the slaves were on netdata master
> number 1.
> I ran another experiment: I downloaded the metrics file again and ran
> date +%s on the Prometheus server at almost the same time.
> The metrics' timestamp was 1589141392868 (milliseconds) and the server's
> timestamp was 1589141393 (seconds), so I don't think this is the problem.
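
The comparison in that experiment can be redone as a small calculation. This is a minimal sketch using the two values quoted above; the function name is made up for illustration. The exposition-format timestamp is in milliseconds while date +%s prints whole seconds, so the server value is scaled before subtracting.

```python
def skew_ms(sample_ts_ms, server_ts_s):
    """Milliseconds by which the server clock reading is ahead of the sample timestamp."""
    return server_ts_s * 1000 - sample_ts_ms


# Values from the experiment: metric timestamp (ms) and `date +%s` output (s).
print(skew_ms(1589141392868, 1589141393))  # 132
```

Because date +%s truncates to whole seconds, the real difference could be up to one second larger, which is still far below anything that would hide a sample.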
> 
> On Monday, May 11, 2020 at 12:19:23 AM UTC+4:30, Julius Volz wrote:
> >
> > [+CCing back prometheus-users, which I had accidentally removed]
> >
> > How similar are the others? The ones in your example are from this 
> > afternoon (2:57pm UTC), I guess that's when you downloaded the file for 
> > grepping first?
> >
> > A regular instant vector selector in PromQL (like just "foo") will only 
> > select data points up to 5 minutes into the past from the current 
> > evaluation timestamp. So the table view would not show samples for any 
> > series whose last sample is more than 5m into the past. You could try a 
> > range selector like "foo[24h]" on these to see if any historical data is 
> > returned (I would expect so).
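
The 5-minute rule described above can be sketched as a simple age check. This is a simplified model, not Prometheus code: the function name and the evaluation times are assumptions for illustration, staleness markers are ignored, and the exact handling of the window boundary is not modelled.

```python
LOOKBACK_MS = 5 * 60 * 1000  # the 5-minute lookback described above


def visible_in_instant_query(last_sample_ms, eval_ms, lookback_ms=LOOKBACK_MS):
    """True if the latest sample falls inside the lookback window ending at eval_ms."""
    age_ms = eval_ms - last_sample_ms
    # Samples newer than the evaluation time or older than the window are not selected.
    return 0 <= age_ms <= lookback_ms


last_sample_ms = 1589122673736  # timestamp of one of the abs-02 samples

# Hypothetical evaluation times: 2 minutes and 10 minutes after the sample.
print(visible_in_instant_query(last_sample_ms, last_sample_ms + 2 * 60 * 1000))   # True
print(visible_in_instant_query(last_sample_ms, last_sample_ms + 10 * 60 * 1000))  # False
```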
> >
> > On Sun, May 10, 2020 at 9:37 PM Yashar Nesabian wrote:
> >
> >> Sure, here it is.
> >> If the second value on each line is the timestamp, then yes, that's the
> >> problem, but I wonder why other metrics are stored by the Prometheus
> >> server, because they also have similar timestamps.
> >>
> >> grep -i "netdata_web_log_detailed_response_codes_total" \
> >>   allmetrics\?format=prometheus_all_hosts\&source=as-collected.2 | grep -i "abs"
> >> netdata_web_log_detailed_response_codes_total{chart="web_log_passenger_event.detailed_response_codes",family="responses",dimension="200",instance="abs-02.x.y.zabs"} 245453 1589122673736
> >> netdata_web_log_detailed_response_codes_total{chart="web_log_passenger_event.detailed_response_codes",family="responses",dimension="400",instance="abs-02.x.y.zabs"} 82 1589122673736
> >> netdata_web_log_detailed_response_codes_total{chart="web_log_passenger_event.detailed_response_codes",family="responses",dimension="401",instance="abs-02.x.y.zabs"} 6 1589122673736
> >> netdata_web_log_detailed_response_codes_total{chart="web_log_passenger_event.detailed_response_codes",family="responses",dimension="200",instance="abs-04.x.y.zabs"} 238105 1589122673017
> >> netdata_web_log_detailed_response_codes_total{chart="web_log_passenger_event.detailed_response_codes",family="responses",dimension="400",instance="abs-04.x.y.zabs"} 59 1589122673017
> >> netdata_web_log_detailed_response_codes_total{chart="web_log_passenger_event.detailed_response_codes",family="responses",dimension="401",instance="abs-04.x.y.zabs"} 3 1589122673017
> >> netdata_web_log_detailed_response_codes_total{chart="web_log_passenger_event.detailed_response_codes",family="responses",dimension="200",instance="abs-03.x.y.zabs"} 241708 1589122673090
> >> netdata_web_log_detailed_response_codes_total{chart="web_log_passenger_event.detailed_response_codes",family="responses",dimension="400",instance="abs-03.x.y.zabs"} 68 1589122673090
> >> netdata_web_log_detailed_response_codes_total{chart="web_log_passenger_event.detailed_response_codes",family="responses",dimension="401",instance="abs-03.x.y.zabs"} 5 1589122673090
> >> netdata_web_log_detailed_response_codes_total{chart="web_log_passenger_event.detailed_response_codes",family="responses",dimension="200",instance="abs-01.x.y.zabs"} 250296 1589122674872
> >> netdata_web_log_detailed_response_codes_total{chart="web_log_passenger_event.detailed_response_codes",family="responses",dimension="400",instance="abs-01.x.y.zabs"} 81 1589122674872
> >> netdata_web_log_detailed_response_codes_total{chart="web_log_passenger_event.detailed_response_codes",family="responses",dimension="401",instance="abs-01.x.y.zabs"} 7 1589122674872
> >>
> >>
> >>
> >> On Sun, May 10, 2020 at 10:36 PM Julius Volz wrote:
> >>
> >>> Hmm, odd. Could you share some of the lines that your grep finds in the 
> >>> metrics output of the correctly scraped target?
> >>>
> >>> The example at the top of https://github.com/netdata/netdata/issues/3891
> >>> suggests that Netdata sets client-side timestamps for samples (which is
> >>> uncommon for Prometheus otherwise). Maybe those timestamps are too far in
> >>> the past (more than 5 minutes), so they would not be shown anymore?
> >>>
> >>> On Sun, May 10, 2020 at 6:51 PM Yashar Nesabian wrote:
> >>>
> >>>> I have a job on the Prometheus server that gathers metrics from 4 netdata
> >>>> master servers. Here is the scenario:
> >>>> - On netdata master number 1, I gather metrics from about 200 slaves.
> >>>> - For some reason, I decided to move 12 slaves
> >>>> (a1,a2,a3,a4,b1,b2,b3,b4,c1,c2,c3,c4) from the first netdata master to
> >>>> the second netdata master.
> >>>> - Now I only see metrics from 8 servers on the Prometheus server
> >>>> (a1,a2,a3,a4,b1,b2,b3,b4), coming from the second master.
> >>>> - I checked the job status on the Targets page, and all 4 masters
> >>>> are up and their metrics are gathered successfully.
> >>>> - Here is the URL that Prometheus uses to read the metrics from
> >>>> netdata master number 2:
> >>>> http://172.16.76.152:19999/api/v1/allmetrics?format=prometheus_all_hosts
> >>>> - I grepped the downloaded metrics file for the c1,c2,c3,c4
> >>>> hosts, and netdata is sending all the metrics relevant to these
> >>>> slaves.
> >>>> - But when I search for the metric on the Graph page, I don't see any
> >>>> results:
> >>>>
> >>>> [image: Screenshot from 2020-05-10 20-58-27.png]
> >>>>
> >>>> All the servers' clocks are synced and correct.
> >>>> Here is the output of systemctl status prometheus:
> >>>>
> >>>> May 10 19:35:07 devops-mon-01 systemd[1]: Reloading Prometheus.
> >>>> May 10 19:35:07 devops-mon-01 prometheus[6076]: level=info ts=2020-05-10T15:05:07.407Z caller=main.go:734 msg="Loading configuration file" filename=/etc/prometheus/prometheus.yml
> >>>> May 10 19:35:07 devops-mon-01 prometheus[6076]: level=info ts=2020-05-10T15:05:07.416Z caller=main.go:762 msg="Completed loading of configuration file" filename=/etc/prometheus/prometheus.yml
> >>>> May 10 19:35:07 devops-mon-01 systemd[1]: Reloaded Prometheus.
> >>>> May 10 19:53:22 devops-mon-01 prometheus[6076]: level=error ts=2020-05-10T15:23:22.621Z caller=api.go:1347 component=web msg="error writing response" bytesWritten=0 err="write tcp 172.16.77.50:9090->172.16.76.168:56778: write: broken pipe"
> >>>> May 10 20:25:53 devops-mon-01 prometheus[6076]: level=error ts=2020-05-10T15:55:53.058Z caller=api.go:1347 component=web msg="error writing response" bytesWritten=0 err="write tcp 172.16.77.50:9090->172.16.76.168:41728: write: broken pipe"
> >>>>
> >>>> 172.16.77.50 is our Prometheus server and 172.16.76.168 is our Grafana
> >>>> server, so I think the last two errors are not related to my problem.
> >>>>
> >>>> -- 
> >>>> You received this message because you are subscribed to the Google 
> >>>> Groups "Prometheus Users" group.
> >>>> To unsubscribe from this group and stop receiving emails from it, send 
> >>>> an email to [email protected] <javascript:>.
> >>>> To view this discussion on the web visit 
> >>>> https://groups.google.com/d/msgid/prometheus-users/156d8c36-c1de-4ca3-8b2a-2cfbcb5895fc%40googlegroups.com
> >>>>  
> >>>> <https://groups.google.com/d/msgid/prometheus-users/156d8c36-c1de-4ca3-8b2a-2cfbcb5895fc%40googlegroups.com?utm_medium=email&utm_source=footer>
> >>>> .
> >>>>
> >>>
> >>>
> >>> -- 
> >>> Julius Volz
> >>> PromLabs - promlabs.com
> >>>
> >>
> >>
> >> -- 
> >>
> >> *Best Regards*
> >>
> >> *Yashar Nesabian*
> >>
> >> *Senior Site Reliability Engineer*
> >>
> >
> >
> > -- 
> > Julius Volz
> > PromLabs - promlabs.com
> >
> 


-- 
Julien Pivotto
@roidelapluie
