thanks!

On Monday, August 23, 2021 at 22:27:57 UTC+3, [email protected] wrote:
> 50 nodes at 64Gi is 3200Gi of memory. Using 30Gi is 0.9% of the cluster.
> This is a little high, but not out of bounds for a normal deployment.
>
> I would recommend starting to consider sharding by Kubernetes namespace.
> This is what we're working on to avoid single-service namespaces from
> exploding the cluster monitoring too badly.
>
> On Mon, Aug 23, 2021 at 4:13 PM Yaron B <[email protected]> wrote:
>
>> We have around 50 nodes with 64 GB of RAM each.
>>
>> By the way, we found that our backend added a metric that spammed
>> Prometheus until it crashed :)
>> They removed the metric and the server seems to be stable.
>> It is still using around 30 GB of RAM, but at least it is not crashing.
>>
>> On Monday, August 23, 2021 at 16:25:21 UTC+3, [email protected] wrote:
>>
>>> Seems about correct for that many series. Kubernetes use includes a lot
>>> of label data/cardinality that requires extra memory for tracking.
>>>
>>> How big is your cluster in terms of total memory for all nodes?
>>>
>>> On Mon, Aug 23, 2021 at 2:18 PM Yaron B <[email protected]> wrote:
>>>
>>>> That makes sense, but if I look at the numbers in the URL you gave me:
>>>> Number of Series: 2514033
>>>> Number of Chunks: 3098707
>>>> Number of Label Pairs: 1088507
>>>> and use them in a memory calculator I found, it shows me much less RAM
>>>> than what I am using now.
>>>>
>>>> Do you see any number here that should be a red light for me? Something
>>>> that is not right?
>>>>
>>>> On Monday, August 23, 2021 at 14:58:36 UTC+3, [email protected] wrote:
>>>>
>>>>> Prometheus needs memory to buffer incoming data before writing it to
>>>>> disk. The more you scrape, the more it needs.
>>>>>
>>>>> You can see a summary of this information on
>>>>> prometheus:9090/tsdb-status
>>>>>
>>>>> On Mon, Aug 23, 2021 at 1:55 PM Yaron B <[email protected]> wrote:
>>>>>
>>>>>> Can anyone understand from this image why the server is using so
>>>>>> much?
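The arithmetic behind the "0.9% of the cluster" figure, plus a rough per-series cost derived from the /tsdb-status numbers quoted above, can be sanity-checked in a few lines. The per-series figure is only a crude estimate; real Prometheus memory use also depends on series churn, label sizes, and query load:

```python
GiB = 1024 ** 3

# Cluster size from the thread: 50 nodes at 64 GiB each.
nodes = 50
node_mem_gib = 64
cluster_mem_gib = nodes * node_mem_gib        # 3200 GiB total

# Prometheus is using roughly 30 GiB of that.
prom_mem_gib = 30
fraction = prom_mem_gib / cluster_mem_gib     # share of cluster memory
print(f"cluster: {cluster_mem_gib}Gi, prometheus uses {fraction:.1%}")

# Crude per-series memory cost, using the series count from /tsdb-status.
series = 2_514_033
bytes_per_series = prom_mem_gib * GiB / series
print(f"~{bytes_per_series / 1024:.1f} KiB per active series")
```

At roughly 12-13 KiB per active series this is on the high side of typical rules of thumb, which is consistent with the observation above about label-heavy Kubernetes metrics.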
>>>>>> production-prometheus-server-869bffc459-r92nh   1186m   54937Mi
>>>>>> That's crazy!
>>>>>>
>>>>>> On Monday, August 23, 2021 at 13:35:18 UTC+3, Yaron B wrote:
>>>>>>
>>>>>>> At the moment we did add some scrape jobs that bumped the memory
>>>>>>> usage from around 30 GB to 40 GB, but we are not sure why the
>>>>>>> self-scraping takes so much RAM.
>>>>>>> It's not a new implementation; we did notice it was using a lot of
>>>>>>> memory, but it didn't crash on us, so we let it run. Today,
>>>>>>> as you can see in the attached image, it crashed and memory usage
>>>>>>> skyrocketed to 60 GB. We then started disabling jobs until the
>>>>>>> server stopped crashing, but it is still using more than it did in
>>>>>>> the last 15 days.
>>>>>>>
>>>>>>> On Monday, August 23, 2021 at 13:29:59 UTC+3, Stuart Clark wrote:
>>>>>>>
>>>>>>>> On 23/08/2021 11:23, Yaron B wrote:
>>>>>>>>
>>>>>>>> I am attaching the heap.svg if someone can help me figure out what
>>>>>>>> is using the memory.
>>>>>>>>
>>>>>>>> On Monday, August 23, 2021 at 12:23:33 UTC+3, Yaron B wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> We are facing an issue with the Prometheus server's memory usage.
>>>>>>>>> When starting the server it uses around 30 GB of RAM, even
>>>>>>>>> without any jobs configured other than the self-scrape one.
>>>>>>>>> In the attached image you can see the heap size usage for the
>>>>>>>>> prometheus job.
>>>>>>>>> Is there a way to reduce this size? When we add our Kubernetes
>>>>>>>>> scrape job we reach our node limit and get OOMKilled.
>>>>>>>>>
>>>>>>>> So at the moment it isn't scraping anything other than itself via
>>>>>>>> the /metrics endpoint?
>>>>>>>>
>>>>>>>> Is this a brand new service (i.e. no existing data stored on disk)?
>>>>>>>>
>>>>>>>> Is there anything querying the server (e.g. Grafana dashboards,
>>>>>>>> etc.)?
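Two of the remedies raised in this thread, sharding by Kubernetes namespace and guarding against a single runaway metric like the one described above, can both be expressed in the scrape configuration. The fragment below is only an illustrative sketch; the job name, namespace list, and limit are made-up placeholders:

```yaml
scrape_configs:
  - job_name: kubernetes-pods   # hypothetical job name
    # Fail the whole scrape (instead of ingesting it) if a target
    # exposes more than this many samples.
    sample_limit: 50000
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Keep only the namespaces this Prometheus shard is responsible for.
      - source_labels: [__meta_kubernetes_namespace]
        regex: (team-a|team-b)  # hypothetical namespace list
        action: keep
```

With `sample_limit` set, a target that suddenly starts exposing an exploding number of samples fails its scrape rather than being ingested, which would have contained the backend metric that crashed the server here.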
>>>>>>>>
>>>>>>>> --
>>>>>>>> Stuart Clark

--
You received this message because you are subscribed to the Google Groups "Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-users/e32d7477-6076-4166-a0be-63d740b441f0n%40googlegroups.com.

