The problem you would have is that all your alerts and dashboards are likely to 
break for those two servers anyway. Not every metric will have been renamed, 
so for any metric that exists in both versions your queries will now return 
two sets of time series per server instead of one.
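As a sketch of what that looks like (the job names here are assumptions; 
adjust the selectors to whatever labels your setup uses), a query with no 
job selector matches series from both exporters:

```promql
# Matches series from both the old and new exporter -> two results per server:
node_filesystem_avail_bytes{mountpoint="/"}

# Restrict to a single exporter via the job label (job name is hypothetical):
node_filesystem_avail_bytes{mountpoint="/", job="node_new"}
```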

For the situation you describe, the first thing I would do is update all your 
dashboards and alerts. For some it might be easier to create entirely new 
versions of a dashboard; for others you might be able to adjust things to 
work with both the old and new names.
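One common pattern for making a query tolerate both names is PromQL's `or` 
operator, as mentioned later in this thread. A sketch using the pre/post-0.16 
node_exporter names for a network counter (verify the actual renames between 
your two versions, and only combine metrics whose units match):

```promql
# Whichever metric name exists for a given server is returned;
# node_network_receive_bytes was renamed to node_network_receive_bytes_total
# in node_exporter 0.16.
rate(node_network_receive_bytes_total{device="eth0"}[5m])
  or
rate(node_network_receive_bytes{device="eth0"}[5m])
```

The same expression works in recording rules, so dashboards can be pointed at 
a single rule name that survives the rename.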

To work out what has changed, run a test Prometheus (it can be local) 
scraping both the old and new exporter (which can also be local) so that you 
can see exactly which metrics differ.
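A minimal scrape config for such a throwaway test instance might look like 
this (ports and job names are assumptions: the old exporter on its usual 
9100 and the new binary started side by side on 9101):

```yaml
# prometheus.yml for a local test instance
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: node_old        # existing exporter on the well-known port
    static_configs:
      - targets: ["localhost:9100"]
  - job_name: node_new        # new binary run with --web.listen-address=:9101
    static_configs:
      - targets: ["localhost:9101"]
```

Querying both jobs in the expression browser makes it easy to diff the two 
metric sets before touching any production dashboards.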

Only after you have done that (or at least the critical ones, if some 
temporary breakage is acceptable) would I look to roll out the updated 
exporters. Depending on your processes you could have a big-bang maintenance 
window where you roll out the exporters and deploy the updated dashboards and 
alerts at the same time.

On 9 December 2020 09:12:17 GMT, Feder John <jfede...@gmail.com> wrote:
>Yes, the main pain points are the following:
>
>   - large fleet of instances
>   - huge gap between exporter versions (years, so yes, metrics were
>     renamed a few times -> this would break dashboards)
>
>The initial reason for setting up a parallel pipeline is just my
>curiosity, to compare the diff between the 2 sets of exporters.
>
>Alternatively, let's say I have 30 backend machines. I can roll out the
>latest exporter version to, say, 2 of those, and start building the new
>dashboards and recording rules with the time series data of those 2
>instances. However, until these are done, we would have degraded
>monitoring for those 2 instances, hence I am leaning toward something
>similar to the original proposal.
>
>What do you think?
>
>sup...@gmail.com wrote (Tuesday, 8 December 2020 at 23:09:42 UTC+1):
>
>> I've seen (and done) running multiple Prometheus instances of
>> different versions for testing and upgrades, but not typically
>> different versions of the exporter.
>>
>> There is some issue with upgrading and different metric names, but
>> that can typically be solved downstream with prepared changes to
>> recording rules and dashboards. (query: "new_metric_name or
>> old_metric_name")
>>
>> But as Stuart says, rolling out a new node_exporter doesn't require
>> deploying two versions simultaneously. A simple service restart takes
>> under a second. With a 15 second scrape interval, the chances of a
>> lost scrape are pretty low. And even then, the Prometheus design is
>> robust against a few lost scrapes.
>>
>> Just test it carefully, and have a good rollback plan.
>>
>> On Tue, Dec 8, 2020 at 9:52 PM Stuart Clark <stuart...@jahingo.com> wrote:
>>
>>> On 08/12/2020 18:21, Feder John wrote:
>>>
>>> Basically my question is:
>>>
>>>    - How would you solve running 2 exporters on the same hosts,
>>>      exposing 2 sets of metrics on 2 different ports?
>>>    
>>>
>>> Running multiple instances of the same exporter is generally not a
>>> problem. You just need to ensure you set the port to be different
>>> for each instance.
>>>
>>> On Tuesday, December 8, 2020 at 7:20:10 PM UTC+1 Feder John wrote:
>>>
>>>> Hello
>>>>
>>>> I would like to upgrade node_exporters (and later others as well)
>>>> on a fleet of machines.
>>>> I have an existing pipeline set up (machines -> prometheus -> grafana).
>>>> I would like to perform a zero downtime upgrade, something like this:
>>>>
>>>>    - Spin up a new prometheus
>>>>    - Roll out the new version to a set of machines, WHILE keeping
>>>>      the old exporter exposing metrics on one port and the new
>>>>      exporter exposing metrics on another port
>>>>    - (optionally, I could even scrape the new exporters from the
>>>>      new prometheus)
>>>>
>>>> This way, I could compare the 2 versions easily and perform a
>>>> switchover to the new one after decommissioning the old exporters.
>>>>
>>>> Does it sound like a doable plan?
>>>> If so, how would you manage to set this up?
>>>>
>>> I'm not quite sure I understand what you are trying to achieve with
>>> this?
>>>
>>> While running two parallel versions of Node Exporter is certainly
>>> possible, it does make things a lot more complex. You would need to
>>> set the second instance to a different port (and therefore would not
>>> use the normal "well known" port for that instance) as well as more
>>> complex automation for the deployment.
>>>
>>> The Node Exporter starts up pretty quickly, so if you are trying not
>>> to miss any scrapes it should be quite possible to copy over the new
>>> binary, stop and then start the exporter within a scrape interval
>>> (assuming the scrape interval isn't really short).
>>>
>>> What does starting a totally new instance of Prometheus give you?
>>> Presumably this new instance wouldn't be connected to Grafana and
>>> wouldn't contain any data, so I'm not seeing why it would be a
>>> benefit?
>>>
>>> If you did parallel run two instances of Node Exporter you could
>>> scrape both from a single Prometheus, but you'd need to be really
>>> careful with all your queries as without the right selectors you'd
>>> suddenly be returning two values for each metric (just differing by
>>> the job label).
>>>
>>> -- 
>>> You received this message because you are subscribed to the Google
>>> Groups "Prometheus Users" group.
>>> To unsubscribe from this group and stop receiving emails from it,
>>> send an email to prometheus-use...@googlegroups.com.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/prometheus-users/9cb3d064-49c4-373e-bf27-0549442a05cb%40Jahingo.com.
>>>
>>
>


