The problem is not so much the priorities themselves; it is all the questions and confusion around them:
- When do we decide we are overloaded?
- What do we do for the low priority targets?

and more importantly:

- When do we decide that we can scrape the low priority targets again?

How do we avoid this loop:

High load -> stop low scrapes -> Normal load (because we do not scrape low priorities) -> restart low scrapes -> High load -> stop low scrapes -> Normal load -> restart low scrapes -> and so on?

Overall, these do not seem like easy questions.

On 30 Jul 10:10, Bartłomiej Płotka wrote:
> Yes, looks like having many scrapers would solve this, and having Thanos on top for query aggregation can do. However, given the overhead of even operating the TSDB instances like Prometheus (e.g. maintaining persistent volumes), I would still see some longer-term solution of better multitenant support (isolation of tenants' scrapes) within the scrape engine. Some alternative is dynamic relabelling configured from outside, as seen here:
> https://blog.freshtracks.io/bomb-squad-automatic-detection-and-suppression-of-prometheus-cardinality-explosions-62ca8e02fa32
>
> I think with good monitoring of Prometheus health we could implement a "sidecar" applying such priorities dynamically as well. That would be good for a star maybe (:
>
> In the meantime, the separate scraper looks like the way to go.
>
> Kind Regards,
> Bartek
>
> On Thu, 30 Jul 2020 at 10:01, Lili Cosic <cosicl...@gmail.com> wrote:
>
> > Thanks, everyone, for the replies! The official message seems to be to use a Prometheus instance per tenant/priority if you want to have multiple tenants in your environment.
> >
> > Kind regards,
> > Lili
> >
> > On Thursday, 30 July 2020 10:44:59 UTC+2, Ben Kochie wrote:
> >>
> >> I'm with Brian and Julien on this.
> >>
> >> Multi-tenancy is not really something we want to solve in Prometheus. This is a concern for higher level systems like Kubernetes.
> >> Prometheus is designed to be distributed. If you have targets with different needs, they need to have separate Prometheus instances.
> >>
> >> This is also why we have things like Thanos and Cortex as aggregation layers.
> >>
> >> Similar to why we have said we don't plan to implement IO limits, this is a scheduling concern, out of scope for Prometheus.
> >>
> >> On Thu, Jul 30, 2020, 10:31 Frederic Branczyk <fbra...@gmail.com> wrote:
> >>
> >>> That's only effective in limiting the number of targets; the point here is selectively scraping those with a higher priority, based on backpressure of the system as a whole.
> >>>
> >>> On Wed, 22 Jul 2020 at 17:00, Julien Pivotto <roidel...@prometheus.io> wrote:
> >>>
> >>>> On 22 Jul 16:47, Frederic Branczyk wrote:
> >>>> > In practice even that can still be problematic. You only know that Prometheus has a problem when everything fails; the point is to keep things alive well enough for more critical components.
> >>>> >
> >>>> > On Wed, 22 Jul 2020 at 16:38, Julien Pivotto <roidel...@prometheus.io> wrote:
> >>>> >
> >>>> > > On 22 Jul 16:36, Frederic Branczyk wrote:
> >>>> > > > It's unclear how that helps, can you help me understand?
> >>>> > >
> >>>> > > - job: highprio
> >>>> > >   relabel_configs:
> >>>> > >   - target_label: job
> >>>> > >     replacement: pods
> >>>> > >   - source_labels: [__meta_pod_priority]
> >>>> > >     regex: high
> >>>> > >     action: keep
> >>>>
> >>>> The highprio job will always be scraped.
> >>>>
> >>>> > > - job: lowprio
> >>>> > >   relabel_configs:
> >>>> > >   - target_label: job
> >>>> > >     replacement: pods
> >>>> > >   - source_labels: [__meta_pod_priority]
> >>>> > >     regex: high
> >>>> > >     action: drop
> >>>> > >   target_limit: 1000
> >>>> > >
> >>>> > > >
> >>>> > > > On Wed, 22 Jul 2020 at 16:34, Julien Pivotto <roidel...@prometheus.io> wrote:
> >>>> > > >
> >>>> > > > > On 22 Jul 16:32, Frederic Branczyk wrote:
> >>>> > > > > > Can you explain what you mean by two jobs? Do you mean two scrape configs?
> >>>> > > > >
> >>>> > > > > Yes.
> >>>> > > > >
> >>>> > > > > >
> >>>> > > > > > On Wed, 22 Jul 2020 at 11:40, Julien Pivotto <roidel...@prometheus.io> wrote:
> >>>> > > > > >
> >>>> > > > > > > On 22 Jul 02:35, Lili Cosic wrote:
> >>>> > > > > > > >
> >>>> > > > > > > > On Wednesday, 22 July 2020 11:23:00 UTC+2, Brian Brazil wrote:
> >>>> > > > > > > > >
> >>>> > > > > > > > > On Wed, 22 Jul 2020 at 10:18, Julien Pivotto <roidel...@prometheus.io> wrote:
> >>>> > > > > > > > >
> >>>> > > > > > > > >> On 22 Jul 02:14, Lili Cosic wrote:
> >>>> > > > > > > > >> > Only now seen in the docs that I am supposed to start any discussions here first before opening an issue, sorry about that!
> >>>> > > > > > > > >> > :)
> >>>> > > > > > > > >> >
> >>>> > > > > > > > >> > Currently there is no way for a target to have a higher scrape priority than another. Even if you set target limits and sample limits, you can still overestimate your setup, and you would still want higher priority targets to keep being scraped rather than have the entire Prometheus fail. It would need to be based on the inability to ingest into the TSDB at the current scrape rate: if that is hit, the priority class would take effect and only the highest priority targets would be scraped, at the expense of lower priority ones. Another option, which might be simpler, would be to have a global limit on how much Prometheus can handle, based on perf testing.
> >>>> > > > > > > > >> >
> >>>> > > > > > > > >> > This would be treated as a last resort, and there would definitely be a need for a high severity alert to inform the admin that something went terribly wrong. But because we would still be able to ingest, for example, Prometheus metrics if they are in a higher priority class, alerting would still be possible.
> >>>> > > > > > > > >>
> >>>> > > > > > > > >> Hi,
> >>>> > > > > > > > >>
> >>>> > > > > > > > >> I think that limiting the number of targets you scrape is already a last resort. I don't think we would need a second line of defense.
> >>>> > > > > > > > >>
> >>>> > > > > > > > >
> >>>> > > > > > > > > I agree with Julien here. If you've gotten to this point you're already seriously overloaded, and prioritising individual targets is just rearranging the deckchairs at that point.
> >>>> > > > > > > > >
> >>>> > > > > > > > >>
> >>>> > > > > > > > >> You can achieve this priority by setting 2 jobs, one which is limited and one which is not, and use relabeling to decide which target is going in which job.
> >>>> > > > > > > > >>
> >>>> > > > > > > > >
> >>>> > > > > > > > > Or more generally, one Prometheus for the important targets and another for the less important and riskier targets.
> >>>> > > > > > > > >
> >>>> > > > > > > >
> >>>> > > > > > > > I get your point completely, Brian, and agree to some degree, but people are still going to be setting up a multi-tenant Prometheus, which then causes the problems I mentioned above. Even within the riskier targets there will be some that are more important than others for users. I think we should still strive to make a single shared Prometheus as safe as possible. If this is not the priority class I suggested, I am open to other ideas!
> >>>> > > > > > >
> >>>> > > > > > > Then 2 jobs are the answer, one unlimited and one limited.
> >>>> > > > > > >
> >>>> > > > > > > The target_limit is already a pretty advanced use case.
> >>>> > > > > > >
> >>>> > > > > > > > >
> >>>> > > > > > > > > Brian
> >>>> > > > > > > > >
> >>>> > > > > > > > >> >
> >>>> > > > > > > > >> > We could model this on something like PriorityClass
> >>>> > > > > > > > >> > <https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/#priorityclass>
> >>>> > > > > > > > >> > from Kubernetes, but I am open to other suggestions.
> >>>> > > > > > > > >>
> >>>> > > > > > > > >> That could be used in relabeling as I said.
> >>>> > > > > > > > >>
> >>>> > > > > > > > >> >
> >>>> > > > > > > > >> > I am open to other suggestions, or maybe there is something like this but I missed it.
> >>>> > > > > > > > >> > The main purpose is to ensure there are protection mechanisms in place, so any ideas and suggestions welcome!
> >>>> > > > > > > > >> >
> >>>> > > > > > > > >>
> >>>> > > > > > > > >> regards,
> >>>> > > > > > > > >>
> >>>> > > > > > > > >> > Thanks and kind regards,
> >>>> > > > > > > > >> > Lili
> >>>> > > > > > > > >> >
> >>>> > > > > > > > >> > --
> >>>> > > > > > > > >> > You received this message because you are subscribed to the Google Groups "Prometheus Developers" group.
> >>>> > > > > > > > >> > To unsubscribe from this group and stop receiving emails from it, send an email to prometheus-developers+unsubscr...@googlegroups.com.
> >>>> > > > > > > > >> > To view this discussion on the web visit
> >>>> > > > > > > > >> > https://groups.google.com/d/msgid/prometheus-developers/30df615e-5420-4bdf-9cb7-2790ef19d520o%40googlegroups.com.
> >>>> > > > > > > > >>
> >>>> > > > > > > > >> --
> >>>> > > > > > > > >> Julien Pivotto
> >>>> > > > > > > > >> @roidelapluie
> >>>> > > > > > > > >
> >>>> > > > > > > > > --
> >>>> > > > > > > > > Brian Brazil
> >>>> > > > > > > > > www.robustperception.io
--
Julien Pivotto
@roidelapluie

--
You received this message because you are subscribed to the Google Groups "Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to prometheus-developers+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-developers/20200730091922.GA156213%40oxygen.
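
To make the stop/restart loop described at the top of this message concrete, here is a minimal toy simulation. It is only a sketch under stated assumptions: Prometheus has no such gate, and the `load` signal, the thresholds, and the names `LowPriorityGate` and `simulate` are all hypothetical. It compares a single threshold, which flaps, with two separate thresholds (hysteresis), which stay stable only if the gap between them is at least the load added by the low priority targets.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LowPriorityGate:
    """Hypothetical gate deciding whether low priority targets are scraped."""
    stop_above: float    # assumed load level above which low scrapes stop
    resume_below: float  # assumed load level below which low scrapes resume
    scraping_low: bool = True

    def update(self, load: float) -> bool:
        """Apply the thresholds to the observed load and return the new state."""
        if self.scraping_low and load > self.stop_above:
            self.scraping_low = False
        elif not self.scraping_low and load < self.resume_below:
            self.scraping_low = True
        return self.scraping_low


def simulate(gate: LowPriorityGate, base_load: float, low_load: float,
             steps: int) -> List[bool]:
    """Toy model: observed load is the base load plus the low priority load
    whenever the low priority targets are currently being scraped."""
    history = []
    for _ in range(steps):
        load = base_load + (low_load if gate.scraping_low else 0.0)
        history.append(gate.update(load))
    return history


if __name__ == "__main__":
    # Same threshold for stopping and resuming: the state flips every step.
    print(simulate(LowPriorityGate(0.8, 0.8), base_load=0.6, low_load=0.3, steps=6))
    # Resume threshold well below the stop threshold: low scrapes stay off
    # until the base load itself drops under 0.4.
    print(simulate(LowPriorityGate(0.8, 0.4), base_load=0.6, low_load=0.3, steps=6))
```

Note that choosing a safe `resume_below` requires knowing how much load the low priority targets add, which is exactly the kind of question raised above.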