On Mon, Nov 30, 2020 at 1:08 PM Aliaksandr Valialkin <valy...@gmail.com>
wrote:

>
>
> On Sun, Nov 29, 2020 at 3:10 PM Ben Kochie <sup...@gmail.com> wrote:
>
>> On Sun, Nov 29, 2020 at 11:51 AM Aliaksandr Valialkin <valy...@gmail.com>
>> wrote:
>>
>>>
>>>
>>> On Fri, Nov 27, 2020 at 11:11 AM Ben Kochie <sup...@gmail.com> wrote:
>>>
>>>>
>>>>>
>>>>>> Or else is there any other ways by which we can solve this issue.
>>>>>>
>>>>>
>>>>> Using something other than federation.  remote_write is able to buffer
>>>>> up data locally if the endpoint is down.
>>>>>
>>>>> Prometheus itself can't accept remote_write requests, so you'd have to
>>>>> write to some other system
>>>>> <https://prometheus.io/docs/operating/integrations/#remote-endpoints-and-storage>
>>>>> which can.  I suggest VictoriaMetrics, as it's simple to run and has a
>>>>> very Prometheus-like API, which can be queried as if it were a
>>>>> Prometheus instance.
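For concreteness, a minimal remote_write stanza might look like the sketch below. The VictoriaMetrics URL is a placeholder for whatever endpoint you actually run, and the queue_config values are illustrative defaults-adjacent numbers, not tuned recommendations.

```yaml
# prometheus.yml (fragment) -- remote_write to a VictoriaMetrics endpoint.
# The URL is a placeholder; single-node VictoriaMetrics accepts Prometheus
# remote_write on /api/v1/write (port 8428 by default).
remote_write:
  - url: "http://victoriametrics:8428/api/v1/write"
    queue_config:
      # Samples are read from the local WAL, so short outages of the
      # endpoint are buffered and retried rather than lost.
      capacity: 10000
      max_shards: 30
      max_samples_per_send: 2000
```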
>>>>>
>>>>
>>>> I recommend Thanos, as it scales better and with less effort than
>>>> VictoriaMetrics. It also uses PromQL code directly, so you will get the
>>>> same results as Prometheus, not an emulation of PromQL.
>>>>
>>>>
>>> Could you share more details on why you think that VictoriaMetrics has
>>> scalability issues and is harder to set up and operate than Thanos?
>>> VictoriaMetrics users have quite the opposite opinion. See
>>> https://victoriametrics.github.io/CaseStudies.html and
>>> https://medium.com/faun/comparing-thanos-to-victoriametrics-cluster-b193bea1683
>>> .
>>>
>>
>> Thanos uses object storage, which avoids the need for manual sharding of
>> TSDB storage. Today I have 100TiB of data stored in object storage buckets.
>> I make no changes to scale up or down these buckets.
>>
>>
> VictoriaMetrics stores data on persistent disks. Every replicated durable
> persistent disk in GCP <https://cloud.google.com/persistent-disk> can scale
> up to 64TB
> <https://cloud.google.com/compute/docs/disks/add-persistent-disk#resize_pd>
> without the need to stop VictoriaMetrics, i.e. without downtime. Given that 
> VictoriaMetrics
> compresses real-world data much better than Prometheus
> <https://valyala.medium.com/prometheus-vs-victoriametrics-benchmark-on-node-exporter-metrics-4ca29c75590f>,
> a single-node VictoriaMetrics can substitute for the whole Thanos cluster
> for your workload (in theory, of course - just give it a try in order to
> verify this statement :) ). Cluster version of VictoriaMetrics
> <https://victoriametrics.github.io/Cluster-VictoriaMetrics.html> can
> scale to petabytes. For example, a cluster with one petabyte of capacity
> can be built from 16 vmstorage nodes, each with a 64TB persistent disk.
> That's why VictoriaMetrics in production usually has lower infrastructure
> costs than Thanos.
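As a quick sanity check on the sizing arithmetic above (a sketch, raw capacity only, before any replication overhead):

```python
# Sanity check on the cluster sizing described above.
nodes = 16
disk_tb_per_node = 64
total_tb = nodes * disk_tb_per_node
print(total_tb)  # -> 1024, i.e. roughly one petabyte of raw capacity
```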
>

* GCP persistent disk costs double that of object storage, and is
zone-local only.
* Cost quadruples if you want regional replication.
* GCP persistent disks don't have multi-regional replication (GCS does by
default).
* Object storage versioning makes for easy lifecycle management for
disaster recovery.
* Plus you have to maintain some percentage of unused filesystem headroom
to avoid running out of space.
* You can't shrink persistent disks.
* And we're back to manual labor required to scale.
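To put rough numbers on the ratios above, here is a sketch: the baseline $/GiB/month price is an assumed placeholder, not a quoted GCP list price; only the 2x and 4x multipliers come from the points above.

```python
# Rough monthly storage cost comparison for 100 TiB, using the ratios
# above. The baseline price is an illustrative assumption, not a quote.
data_gib = 100 * 1024                     # 100 TiB expressed in GiB

object_per_gib = 0.02                     # assumed $/GiB/month baseline
zonal_pd_per_gib = 2 * object_per_gib     # "costs double that of object storage"
regional_pd_per_gib = 4 * object_per_gib  # "quadruples ... regional replication"

object_cost = data_gib * object_per_gib
zonal_cost = data_gib * zonal_pd_per_gib
regional_cost = data_gib * regional_pd_per_gib
print(round(object_cost), round(zonal_cost), round(regional_cost))
# -> 2048 4096 8192
```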

Storing on persistent disks is a major reason why we don't just use
Prometheus for long-term TSDB storage: each instance is a single point of
failure, persistent disks cost more than object storage, and scaling them
involves ongoing toil.

No thanks, we're moving away from old-school architectures.


>
>
> --
> Best Regards,
>
> Aliaksandr Valialkin, CTO VictoriaMetrics
>

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prometheus-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/CABbyFmqNtcDOu2nGTunh2QQz27ym3wkUDAfLN4Eos-FDUAwL%3DA%40mail.gmail.com.
