thanks!

On Monday, August 23, 2021 at 22:27:57 UTC+3, [email protected] wrote:
> 50 nodes at 64Gi is 3200Gi of memory. Using 30Gi is 0.9% of the cluster.
> This is a little high, but not out of bounds for a normal deployment.
>
> I would recommend starting to consider sharding by Kubernetes namespace.
> This is what we're working on to avoid single-service namespaces from
> exploding the cluster monitoring too badly.
>
> On Mon, Aug 23, 2021 at 4:13 PM Yaron B <[email protected]> wrote:
>
>> We have around 50 nodes with 64 GB of RAM each.
>>
>> By the way, we found that our backend added a metric that spammed
>> Prometheus until it crashed :)
>> They removed the metric and the server seems to be stable.
>> It is still using around 30 GB of RAM, but at least it is not crashing.
>>
>> On Monday, August 23, 2021 at 16:25:21 UTC+3, [email protected] wrote:
>>
>>> Seems about correct for that many series. Kubernetes use includes a lot
>>> of label data/cardinality that requires extra memory for tracking.
>>>
>>> How big is your cluster in terms of total memory for all nodes?
>>>
>>> On Mon, Aug 23, 2021 at 2:18 PM Yaron B <[email protected]> wrote:
>>>
>>>> That makes sense, but if I look at the numbers in the URL you gave me:
>>>> Number of Series: 2514033
>>>> Number of Chunks: 3098707
>>>> Number of Label Pairs: 1088507
>>>> and use them in a memory calculator I found, it shows me much less RAM
>>>> than what I am using now.
>>>>
>>>> Do you see any number here that should be a red light for me? Something
>>>> that is not right?
>>>>
>>>> On Monday, August 23, 2021 at 14:58:36 UTC+3, [email protected] wrote:
>>>>
>>>>> Prometheus needs memory to buffer incoming data before writing it to
>>>>> disk. The more you scrape, the more it needs.
>>>>>
>>>>> You can see a summary of this information on
>>>>> prometheus:9090/tsdb-status
>>>>>
>>>>> On Mon, Aug 23, 2021 at 1:55 PM Yaron B <[email protected]> wrote:
>>>>>
>>>>>> Can anyone understand from this image why the server is using so
>>>>>> much?
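The arithmetic behind the "0.9% of the cluster" figure, plus a rough per-series cost derived from the /tsdb-status numbers quoted above, can be sanity-checked in a few lines. The per-series figure is only a crude estimate; real Prometheus memory use also depends on series churn, label sizes, and query load:

```python
GiB = 1024 ** 3

# Cluster size from the thread: 50 nodes at 64 GiB each.
nodes = 50
node_mem_gib = 64
cluster_mem_gib = nodes * node_mem_gib        # 3200 GiB total

# Prometheus is using roughly 30 GiB of that.
prom_mem_gib = 30
fraction = prom_mem_gib / cluster_mem_gib     # share of cluster memory
print(f"cluster: {cluster_mem_gib}Gi, prometheus uses {fraction:.1%}")

# Crude per-series memory cost, using the series count from /tsdb-status.
series = 2_514_033
bytes_per_series = prom_mem_gib * GiB / series
print(f"~{bytes_per_series / 1024:.1f} KiB per active series")
```

At roughly 12-13 KiB per active series this is on the high side of typical rules of thumb, which is consistent with the observation above about label-heavy Kubernetes metrics.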
>>>>>> production-prometheus-server-869bffc459-r92nh   1186m   54937Mi
>>>>>> That's crazy!
>>>>>>
>>>>>> On Monday, August 23, 2021 at 13:35:18 UTC+3, Yaron B wrote:
>>>>>>
>>>>>>> At the moment we did add some scrape jobs that bumped the memory
>>>>>>> usage from around 30 GB to 40 GB, but we are not sure why the
>>>>>>> self-scraping takes so much RAM.
>>>>>>> It's not a new implementation; we did notice it was using a lot of
>>>>>>> memory, but it didn't crash on us, so we let it run. Today,
>>>>>>> as you can see in the attached image, it crashed and memory usage
>>>>>>> skyrocketed to 60 GB. We then started disabling jobs until the
>>>>>>> server stopped crashing, but it is still using more than it did in
>>>>>>> the last 15 days.
>>>>>>>
>>>>>>> On Monday, August 23, 2021 at 13:29:59 UTC+3, Stuart Clark wrote:
>>>>>>>
>>>>>>>> On 23/08/2021 11:23, Yaron B wrote:
>>>>>>>>
>>>>>>>> I am attaching the heap.svg if someone can help me figure out what
>>>>>>>> is using the memory.
>>>>>>>>
>>>>>>>> On Monday, August 23, 2021 at 12:23:33 UTC+3, Yaron B wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> We are facing an issue with the Prometheus server's memory usage.
>>>>>>>>> When starting the server it uses around 30 GB of RAM, even
>>>>>>>>> without any jobs configured other than the self-scrape one.
>>>>>>>>> In the attached image you can see the heap size usage for the
>>>>>>>>> prometheus job.
>>>>>>>>> Is there a way to reduce this size? When we add our Kubernetes
>>>>>>>>> scrape job we reach our node limit and get OOMKilled.
>>>>>>>>>
>>>>>>>> So at the moment it isn't scraping anything other than itself via
>>>>>>>> the /metrics endpoint?
>>>>>>>>
>>>>>>>> Is this a brand new service (i.e. no existing data stored on disk)?
>>>>>>>>
>>>>>>>> Is there anything querying the server (e.g. Grafana dashboards,
>>>>>>>> etc.)?
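Two of the remedies raised in this thread, sharding by Kubernetes namespace and guarding against a single runaway metric like the one described above, can both be expressed in the scrape configuration. The fragment below is only an illustrative sketch; the job name, namespace list, and limit are made-up placeholders:

```yaml
scrape_configs:
  - job_name: kubernetes-pods   # hypothetical job name
    # Fail the whole scrape (instead of ingesting it) if a target
    # exposes more than this many samples.
    sample_limit: 50000
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Keep only the namespaces this Prometheus shard is responsible for.
      - source_labels: [__meta_kubernetes_namespace]
        regex: (team-a|team-b)  # hypothetical namespace list
        action: keep
```

With `sample_limit` set, a target that suddenly starts exposing an exploding number of samples fails its scrape rather than being ingested, which would have contained the backend metric that crashed the server here.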
>>>>>>>>
>>>>>>>> --
>>>>>>>> Stuart Clark

--
You received this message because you are subscribed to the Google Groups "Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-users/e32d7477-6076-4166-a0be-63d740b441f0n%40googlegroups.com.

