Sounds like a valid approach; I will take this into consideration as well, 
thanks!
As a bonus question, I also have a few more exotic exporters; for those I might 
need to spin up Docker simulations, or do you have other recommendations?
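
For example, something like this docker-compose.yml is what I had in mind as a 
local simulation (just a rough sketch; the exporter image name and port are 
placeholders, not a real image):

    version: "3"
    services:
      exotic-exporter:
        image: example/exotic-exporter:latest   # placeholder for the real exporter image
        ports:
          - "9999:9999"                          # placeholder exporter port
      prometheus:
        image: prom/prometheus:latest
        ports:
          - "9090:9090"
        volumes:
          - ./prometheus.yml:/etc/prometheus/prometheus.yml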

On Wednesday, December 9, 2020 at 10:32:16 AM UTC+1, Stuart Clark wrote:

> The problem you would have is that all your alerts and dashboards are 
> likely to break for those two servers anyway. Not all metrics are likely to 
> have been renamed, so for any which exist in both versions your queries 
> will now be returning two sets of time series per server instead of only 
> one.
>
> For the situation you mention I would suggest the first thing you should 
> do is to update all your dashboards and alerts. For some it might be easier 
> to create totally new versions of a dashboard, for others you might be able 
> to adjust things to work with both the old and new names.
>
> To achieve that, just run a test Prometheus (maybe locally) with the new 
> and old exporter (again could be just local) so that you can see what has 
> changed.
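>
> As a sketch, the test instance only needs a minimal prometheus.yml along 
> these lines (ports are only examples; 9100 is the usual node_exporter port 
> and 9101 stands in for the second copy):
>
>     scrape_configs:
>       - job_name: 'node_old'
>         static_configs:
>           - targets: ['localhost:9100']   # existing exporter
>       - job_name: 'node_new'
>         static_configs:
>           - targets: ['localhost:9101']   # new exporter on a different port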
>
> Only after you have done that (or at least the critical ones, if it is 
> acceptable to have some temporary breakage) would I look to roll out the 
> updated exporters. Depending on your processes you could have a big bang 
> maintenance window where you do that and deploy the updated dashboards and 
> alerts at the same time. 
>
> On 9 December 2020 09:12:17 GMT, Feder John <jfed...@gmail.com> wrote:
>>
>> Yes, the main pain points are the following:
>>
>>    - large fleet of instances
>>    - huge gap between exporter versions (years, so yes, metrics were 
>>    renamed a few times -> this would break dashboards)
>>
>> The initial reason for setting up a parallel pipeline is just my 
>> curiosity, to compare the diff between the 2 sets of exporters.
>>
>> Alternatively, let's say I have 30 backend machines: I could roll out the 
>> latest exporter version to, say, 2 of those, and start building the new 
>> dashboards and recording rules with the time series data of those 2 
>> instances. However, until these are done, we would have degraded monitoring 
>> for those 2 instances, hence I am leaning towards something similar to the 
>> original proposal.
>>
>> What do you think?
>>
>> On Tuesday, December 8, 2020 at 11:09:42 PM UTC+1, sup...@gmail.com wrote:
>>
>>> I've seen (and done) running multiple Prometheus instances of different 
>>> versions for testing and upgrades, but not typically different versions of 
>>> the exporter.
>>>
>>> There is some issue with upgrading and different metric names, but that 
>>> can typically be solved downstream with prepared changes to recording rules 
>>> and dashboards (query: "new_metric_name or old_metric_name").
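>>>
>>> As a concrete sketch, a recording rule file along these lines works (the 
>>> metric names are just the placeholders from above):
>>>
>>>     groups:
>>>       - name: upgrade_compat
>>>         rules:
>>>           - record: compat:metric_name
>>>             expr: new_metric_name or old_metric_name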
>>>
>>> But as Stuart says, rolling out a new node_exporter doesn't require 
>>> deploying two versions simultaneously. A simple service restart takes under 
>>> a second. With a 15 second scrape interval, the chances of a lost scrape 
>>> are pretty low. And even then, the Prometheus design is robust against a 
>>> few lost scrapes.
>>>
>>> Just test it carefully, and have a good rollback plan.
>>>
>>> On Tue, Dec 8, 2020 at 9:52 PM Stuart Clark <stuart...@jahingo.com> 
>>> wrote:
>>>
>>>> On 08/12/2020 18:21, Feder John wrote:
>>>>
>>>> Basically my question is:
>>>>
>>>>    - How would you solve running 2 exporters on the same hosts, 
>>>>    exposing 2 sets of metrics to 2 different ports?
>>>>    
>>>>
>>>> Running multiple instances of the same exporter is generally not a 
>>>> problem. You just need to ensure you set the port to be different for each 
>>>> instance.
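>>>>
>>>> For example (paths and ports are just illustrative), the second copy can 
>>>> be started with a different --web.listen-address:
>>>>
>>>>     /opt/node_exporter_old/node_exporter --web.listen-address=":9100"
>>>>     /opt/node_exporter_new/node_exporter --web.listen-address=":9101"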
>>>>
>>>> On Tuesday, December 8, 2020 at 7:20:10 PM UTC+1 Feder John wrote:
>>>>
>>>>> Hello
>>>>>
>>>>> I would like to upgrade node_exporters (and later others as well) on a 
>>>>> fleet of machines.
>>>>> I have an existing pipeline set up (machines -> prometheus -> 
>>>>> grafana). I would like to perform a zero downtime upgrade, something like 
>>>>> this:
>>>>>
>>>>>    - Spin up a new Prometheus 
>>>>>    - Roll out the new version to a set of machines, while keeping the 
>>>>>    old exporter exposing metrics on one port and having the new exporter 
>>>>>    expose metrics on another port 
>>>>>    - (optionally, I could even scrape the new exporters from the new 
>>>>>    prometheus)
>>>>>    
>>>>> This way, I could compare the 2 versions easily and perform a 
>>>>> switchover to the new one after decommissioning the old exporters.
>>>>>
>>>>> Does it sound like a doable plan?
>>>>> If so, how would you manage to set this up?
>>>>>
>>>>> I'm not quite sure I understand what you are trying to achieve with 
>>>> this?
>>>>
>>>> While running two parallel versions of Node Exporter is certainly 
>>>> possible, it does make things a lot more complex. You would need to set 
>>>> the second instance to a different port (and therefore would not use the 
>>>> normal "well known" port for that instance) as well as more complex 
>>>> automation for the deployment.
>>>>
>>>> The Node Exporter starts up pretty quickly, so if you are trying not to 
>>>> miss any scrapes it should be quite possible to copy over the new binary, 
>>>> stop and then start the exporter within a scrape interval (assuming the 
>>>> scrape interval isn't really short).
>>>>
>>>> What does starting a totally new instance of Prometheus give you? 
>>>> Presumably this new instance wouldn't be connected to Grafana and wouldn't 
>>>> contain any data, so I'm not seeing why it would be a benefit?
>>>>
>>>> If you did run two instances of Node Exporter in parallel you could 
>>>> scrape both from a single Prometheus, but you'd need to be really careful 
>>>> with all your queries, as without the right selectors you'd suddenly be 
>>>> returning two values for each metric (just differing by the job label).
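>>>>
>>>> For instance (the job names are just examples, matching however you label 
>>>> the two scrape jobs), dashboards would need selectors like:
>>>>
>>>>     node_load1{job="node_old"}
>>>>     node_load1{job="node_new"}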
>>>>
>>>
