Yes, the main pain points are the following:

   - large fleet of instances
   - huge gap between exporter versions (years, so yes, metrics were renamed 
   a few times, which would break dashboards)

The initial reason for setting up a parallel pipeline is simply curiosity: 
to compare the differences between the 2 sets of exporters.

Alternatively, let's say I have 30 backend machines: I could roll out the 
latest exporter version to, say, 2 of those, and start building the new 
dashboards and recording rules with the time series data from those 2 instances. 
However, until those are done we would have degraded monitoring for those 
2 instances, hence I am leaning towards something similar to the original 
proposal (a rough sketch of what I mean is below).
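
For reference, a minimal sketch of the parallel setup I am leaning towards 
(the second port 9101, the host names and the job names are just 
placeholders): the old exporter keeps its well-known port so nothing 
degrades, while the new exporter builds up history under a separate job:

   # prometheus.yml (sketch)
   scrape_configs:
     - job_name: 'node_old'      # existing exporter, untouched
       static_configs:
         - targets: ['backend01:9100', 'backend02:9100']
     - job_name: 'node_new'      # new exporter version, second port
       static_configs:
         - targets: ['backend01:9101', 'backend02:9101']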

What do you think?

sup...@gmail.com wrote on Tuesday, December 8, 2020 at 23:09:42 UTC+1:

> I've seen (and done) running multiple Prometheus instances of different 
> versions for testing and upgrades, but not typically different versions of 
> the exporter.
>
> There is some issue with upgrading and different metric names, but that 
> can typically be solved downstream with prepared changes to recording rules 
> and dashboards. (query: "new_metric_name or old_metric_name") 
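>
> As a rough sketch of such a prepared rule (the group, rule and metric 
> names below are placeholders, not actual node_exporter metrics):
>
>    # bridge_rules.yml (sketch)
>    groups:
>      - name: bridge_renamed_metrics
>        rules:
>          # one stable series name, whichever exporter version answered the scrape
>          - record: new_metric_name:bridged
>            expr: new_metric_name or old_metric_name
>
> Dashboards and alerts can then be pointed at the recorded name and keep 
> working across the rename.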
>
> But as Stuart says, rolling out a new node_exporter doesn't require 
> deploying two versions simultaneously. A simple service restart takes under 
> a second. With a 15 second scrape interval, the chances of a lost scrape 
> are pretty low. And even then, the Prometheus design is robust against a 
> few lost scrapes.
>
> Just test it carefully, and have a good rollback plan.
>
> On Tue, Dec 8, 2020 at 9:52 PM Stuart Clark <stuart...@jahingo.com> wrote:
>
>> On 08/12/2020 18:21, Feder John wrote:
>>
>> Basically my question is:
>>
>>    - How would you solve running 2 exporters on the same hosts, exposing 
>>    2 sets of metrics on 2 different ports?
>>    
>>
>> Running multiple instances of the same exporter is generally not a 
>> problem. You just need to ensure you set the port to be different for each 
>> instance.
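>>
>> A rough sketch of what that looks like on one host (the binary paths and 
>> the second port are assumptions, and the exact flag spelling differs a 
>> bit between very old and current node_exporter releases):
>>
>>    # existing exporter stays on the well-known port
>>    /usr/local/bin/node_exporter-old --web.listen-address=":9100"
>>    # new version runs side by side on a second port
>>    /usr/local/bin/node_exporter-new --web.listen-address=":9101"
>>
>> In practice you'd run these as two separate systemd units (or whatever 
>> your init system is) so they can be deployed and stopped independently.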
>>
>> On Tuesday, December 8, 2020 at 7:20:10 PM UTC+1 Feder John wrote:
>>
>>> Hello
>>>
>>> I would like to upgrade node_exporters (and later others as well) on a 
>>> fleet of machines.
>>> I have an existing pipeline set up (machines -> prometheus -> grafana). 
>>> I would like to perform a zero downtime upgrade, something like this:
>>>
>>>    - Spin up a new prometheus 
>>>    - Roll out the new version to a set of machines, WHILE the old 
>>>    exporter keeps exposing metrics on its current port and the new 
>>>    exporter exposes metrics on another port 
>>>    - (optionally, I could even scrape the new exporters from the new 
>>>    prometheus)
>>>    
>>> This way, I could compare the 2 versions easily and perform a switchover 
>>> to the new one after decommissioning the old exporters.
>>>
>>> Does it sound like a doable plan?
>>> If so, how would you go about setting this up?
>>>
>> I'm not quite sure I understand what you are trying to achieve with this?
>>
>> While running two parallel versions of Node Exporter is certainly 
>> possible, it does make things a lot more complex. You would need to set the 
>> second instance to a different port (and therefore would not use the normal 
>> "well known" port for that instance) as well as more complex automation for 
>> the deployment.
>>
>> The Node Exporter starts up pretty quickly, so if you are trying not to 
>> miss any scrapes it should be quite possible to copy over the new binary, 
>> stop and then start the exporter within a scrape interval (assuming the 
>> scrape interval isn't really short).
>>
>> What does starting a totally new instance of Prometheus give you? 
>> Presumably this new instance wouldn't be connected to Grafana and wouldn't 
>> contain any data, so I'm not seeing why it would be a benefit?
>>
>> If you did parallel run two instances of Node Exporter you could scrape 
>> both from a single Prometheus, but you'd need to be really careful with all 
>> your queries as without the right selectors you'd suddenly be returning two 
>> values for each metric (just differing by the job label).
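>>
>> For example, with a metric that kept its name across versions (the job 
>> names here are just an assumption):
>>
>>    # returns one series per host *per scrape job*, i.e. duplicates
>>    node_load1
>>    # pins the query to one exporter version via the job label
>>    node_load1{job="node_new"}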
>>
>
