Re: [prometheus-developers] Re: Remote Write Metadata propagation

2020-08-19 Thread Rob Skillington
If anyone wants to do some further testing on their own datasets, it would
definitely be interesting to see what range they are in.

I'll start addressing the latest round of comments and tie up the tests.

On Wed, Aug 19, 2020 at 4:53 AM Brian Brazil <
brian.bra...@robustperception.io> wrote:

> On Wed, 19 Aug 2020 at 09:47, Rob Skillington  wrote:
>
>> To add a bit more detail to that example, I was actually using a
>> fairly tuned
>> remote write queue config that sent large batches since the batch send
>> deadline
>> was set to 1 minute longer with a max samples per send of 5,000. Here's
>> that
>> config:
>> ```
>> remote_write:
>>   - url: http://localhost:3030/remote/write
>> remote_timeout: 30s
>> queue_config:
>>   capacity: 1
>>   max_shards: 10
>>   min_shards: 3
>>   max_samples_per_send: 5000
>>   batch_send_deadline: 1m
>>   min_backoff: 50ms
>>   max_backoff: 1s
>> ```
>>
>> Using the default config we get worse utilization for both before/after
>> numbers
>> but the delta/difference is less:
>> - steady state ~177kb/sec without this change
>> - steady state ~210kb/sec with this change
>> - roughly 20% increase
>>
>
> I think 20% is okay all things considered.
>
> Brian
>
>
>>
>> Using config:
>> ```
>> remote_write:
>>   - url: http://localhost:3030/remote/write
>> remote_timeout: 30s
>> ```
>>
>> Implicitly the values for this config is:
>> - min shards 1
>> - max shards 1000
>> - max samples per send 100
>> - capacity 500
>> - batch send deadline 5s
>> - min backoff 30ms
>> - max backoff 100ms
>>
>> On Wed, Aug 19, 2020 at 4:26 AM Brian Brazil <
>> brian.bra...@robustperception.io> wrote:
>>
>>> On Wed, 19 Aug 2020 at 09:20, Rob Skillington 
>>> wrote:
>>>
 Here's the results from testing:
 - node_exporter exporting 309 metrics each by turning on a lot of
 optional
   collectors, all have help set, very few have unit set
 - running 8 on the host at 1s scrape interval, each with unique
 instance label
 - steady state ~137kb/sec without this change
 - steady state ~172kb/sec with this change
 - roughly 30% increase

 Graph here:

 https://github.com/prometheus/prometheus/pull/7771#issuecomment-675923976

 How do we want to proceed? This could be fairly close to the higher end
 of
 the spectrum in terms of expected increase given the node_exporter
 metrics
 density and fairly verbose metadata.

 Even having said that however 30% is a fairly big increase and
 relatively large
 egress cost to have to swallow without any way to back out of this
 behavior.

 What do folks think of next steps?

>>>
>>> It is on the high end, however this is going to be among the worst cases
>>> as there's not going to be a lot of per-metric cardinality from the node
>>> exporter. I bet if you greatly increased the number of targets (and reduced
>>> the scrape interval to compensate) it'd be more reasonable. I think this is
>>> just about okay.
>>>
>>> Brian
>>>
>>>


 On Tue, Aug 11, 2020 at 11:55 AM Rob Skillington 
 wrote:

> Agreed - I'll see what I can do in getting some numbers for a workload
> collecting cAdvisor metrics, it seems to have a significant amount of
> HELP set:
>
> https://github.com/google/cadvisor/blob/8450c56c21bc5406e2df79a2162806b9a23ebd34/metrics/testdata/prometheus_metrics
>
>
> On Tue, Aug 11, 2020 at 6:15 AM Brian Brazil <
> brian.bra...@robustperception.io> wrote:
>
>> On Tue, 11 Aug 2020 at 11:07, Julien Pivotto <
>> roidelapl...@prometheus.io> wrote:
>>
>>> On 11 Aug 11:05, Brian Brazil wrote:
>>> > On Tue, 11 Aug 2020 at 04:09, Callum Styan 
>>> wrote:
>>> >
>>> > > I'm hesitant to add anything that significantly increases the
>>> network
>>> > > bandwidth usage or remote write while at the same time not
>>> giving users a
>>> > > way to tune the usage to their needs.
>>> > >
>>> > > I agree with Brian that we don't want the protocol itself to
>>> become
>>> > > stateful by introducing something like negotiation. I'd also
>>> prefer not to
>>> > > introduce multiple ways to do things, though I'm hoping we can
>>> find a way
>>> > > to accommodate your use case while not ballooning average users
>>> network
>>> > > egress bill.
>>> > >
>>> > > I am fine with forcing the consuming end to be somewhat stateful
>>> like in
>>> > > the case of Josh's PR where all metadata is sent periodically
>>> and must be
>>> > > stored by the remote storage system.
>>> > >
>>> >
>>> >
>>

Re: [prometheus-developers] Re: Remote Write Metadata propagation

2020-08-19 Thread Brian Brazil
On Wed, 19 Aug 2020 at 09:47, Rob Skillington  wrote:

> To add a bit more detail to that example, I was actually using a
> fairly tuned
> remote write queue config that sent large batches since the batch send
> deadline
> was set to 1 minute longer with a max samples per send of 5,000. Here's
> that
> config:
> ```
> remote_write:
>   - url: http://localhost:3030/remote/write
> remote_timeout: 30s
> queue_config:
>   capacity: 1
>   max_shards: 10
>   min_shards: 3
>   max_samples_per_send: 5000
>   batch_send_deadline: 1m
>   min_backoff: 50ms
>   max_backoff: 1s
> ```
>
> Using the default config we get worse utilization for both before/after
> numbers
> but the delta/difference is less:
> - steady state ~177kb/sec without this change
> - steady state ~210kb/sec with this change
> - roughly 20% increase
>

I think 20% is okay all things considered.

Brian


>
> Using config:
> ```
> remote_write:
>   - url: http://localhost:3030/remote/write
> remote_timeout: 30s
> ```
>
> Implicitly the values for this config is:
> - min shards 1
> - max shards 1000
> - max samples per send 100
> - capacity 500
> - batch send deadline 5s
> - min backoff 30ms
> - max backoff 100ms
>
> On Wed, Aug 19, 2020 at 4:26 AM Brian Brazil <
> brian.bra...@robustperception.io> wrote:
>
>> On Wed, 19 Aug 2020 at 09:20, Rob Skillington 
>> wrote:
>>
>>> Here's the results from testing:
>>> - node_exporter exporting 309 metrics each by turning on a lot of
>>> optional
>>>   collectors, all have help set, very few have unit set
>>> - running 8 on the host at 1s scrape interval, each with unique instance
>>> label
>>> - steady state ~137kb/sec without this change
>>> - steady state ~172kb/sec with this change
>>> - roughly 30% increase
>>>
>>> Graph here:
>>> https://github.com/prometheus/prometheus/pull/7771#issuecomment-675923976
>>>
>>> How do we want to proceed? This could be fairly close to the higher end
>>> of
>>> the spectrum in terms of expected increase given the node_exporter
>>> metrics
>>> density and fairly verbose metadata.
>>>
>>> Even having said that however 30% is a fairly big increase and
>>> relatively large
>>> egress cost to have to swallow without any way to back out of this
>>> behavior.
>>>
>>> What do folks think of next steps?
>>>
>>
>> It is on the high end, however this is going to be among the worst cases
>> as there's not going to be a lot of per-metric cardinality from the node
>> exporter. I bet if you greatly increased the number of targets (and reduced
>> the scrape interval to compensate) it'd be more reasonable. I think this is
>> just about okay.
>>
>> Brian
>>
>>
>>>
>>>
>>> On Tue, Aug 11, 2020 at 11:55 AM Rob Skillington 
>>> wrote:
>>>
 Agreed - I'll see what I can do in getting some numbers for a workload
 collecting cAdvisor metrics, it seems to have a significant amount of
 HELP set:

 https://github.com/google/cadvisor/blob/8450c56c21bc5406e2df79a2162806b9a23ebd34/metrics/testdata/prometheus_metrics


 On Tue, Aug 11, 2020 at 6:15 AM Brian Brazil <
 brian.bra...@robustperception.io> wrote:

> On Tue, 11 Aug 2020 at 11:07, Julien Pivotto <
> roidelapl...@prometheus.io> wrote:
>
>> On 11 Aug 11:05, Brian Brazil wrote:
>> > On Tue, 11 Aug 2020 at 04:09, Callum Styan 
>> wrote:
>> >
>> > > I'm hesitant to add anything that significantly increases the
>> network
>> > > bandwidth usage or remote write while at the same time not giving
>> users a
>> > > way to tune the usage to their needs.
>> > >
>> > > I agree with Brian that we don't want the protocol itself to
>> become
>> > > stateful by introducing something like negotiation. I'd also
>> prefer not to
>> > > introduce multiple ways to do things, though I'm hoping we can
>> find a way
>> > > to accommodate your use case while not ballooning average users
>> network
>> > > egress bill.
>> > >
>> > > I am fine with forcing the consuming end to be somewhat stateful
>> like in
>> > > the case of Josh's PR where all metadata is sent periodically and
>> must be
>> > > stored by the remote storage system.
>> > >
>> >
>> >
>> >
>> > > Overall I'd like to see some numbers regarding current network
>> bandwidth
>> > > of remote write, remote write with metadata via Josh's PR, and
>> remote write
>> > > with sending metadata for every series in a remote write payload.
>> > >
>> >
>> > I agree, I noticed that in Rob's PR and had the same thought.
>>
>> Remote bandwidth are likely to affect only people using remote write.
>>
>> Getting a view on the on-disk size of the WAL would be great too, as
>> that will affect everyone.
>>
>
> I'm not worried about that, it's only really on series creation so
> won't be noticed unless you have really high levels of churn.
>
> Brian
>

Re: [prometheus-developers] Re: Remote Write Metadata propagation

2020-08-19 Thread Rob Skillington
To add a bit more detail to that example, I was actually using a fairly
tuned remote write queue config that sent large batches, since the batch
send deadline was set longer, to 1 minute, with a max samples per send of
5,000. Here's that config:
```
remote_write:
  - url: http://localhost:3030/remote/write
    remote_timeout: 30s
    queue_config:
      capacity: 1
      max_shards: 10
      min_shards: 3
      max_samples_per_send: 5000
      batch_send_deadline: 1m
      min_backoff: 50ms
      max_backoff: 1s
```

Using the default config we get worse utilization for both the before/after
numbers, but the delta/difference is less:
- steady state ~177kb/sec without this change
- steady state ~210kb/sec with this change
- roughly 20% increase

Using config:
```
remote_write:
  - url: http://localhost:3030/remote/write
    remote_timeout: 30s
```

Implicitly, the values for this config are (restated as an explicit config sketch below):
- min shards 1
- max shards 1000
- max samples per send 100
- capacity 500
- batch send deadline 5s
- min backoff 30ms
- max backoff 100ms
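
For reference, a sketch that restates those defaults as an explicit
queue_config, reusing the localhost endpoint from the examples above:
```
remote_write:
  - url: http://localhost:3030/remote/write
    remote_timeout: 30s
    queue_config:
      min_shards: 1
      max_shards: 1000
      max_samples_per_send: 100
      capacity: 500
      batch_send_deadline: 5s
      min_backoff: 30ms
      max_backoff: 100ms
```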

On Wed, Aug 19, 2020 at 4:26 AM Brian Brazil <
brian.bra...@robustperception.io> wrote:

> On Wed, 19 Aug 2020 at 09:20, Rob Skillington  wrote:
>
>> Here's the results from testing:
>> - node_exporter exporting 309 metrics each by turning on a lot of
>> optional
>>   collectors, all have help set, very few have unit set
>> - running 8 on the host at 1s scrape interval, each with unique instance
>> label
>> - steady state ~137kb/sec without this change
>> - steady state ~172kb/sec with this change
>> - roughly 30% increase
>>
>> Graph here:
>> https://github.com/prometheus/prometheus/pull/7771#issuecomment-675923976
>>
>> How do we want to proceed? This could be fairly close to the higher end of
>> the spectrum in terms of expected increase given the node_exporter
>> metrics
>> density and fairly verbose metadata.
>>
>> Even having said that however 30% is a fairly big increase and relatively
>> large
>> egress cost to have to swallow without any way to back out of this
>> behavior.
>>
>> What do folks think of next steps?
>>
>
> It is on the high end, however this is going to be among the worst cases
> as there's not going to be a lot of per-metric cardinality from the node
> exporter. I bet if you greatly increased the number of targets (and reduced
> the scrape interval to compensate) it'd be more reasonable. I think this is
> just about okay.
>
> Brian
>
>
>>
>>
>> On Tue, Aug 11, 2020 at 11:55 AM Rob Skillington 
>> wrote:
>>
>>> Agreed - I'll see what I can do in getting some numbers for a workload
>>> collecting cAdvisor metrics, it seems to have a significant amount of
>>> HELP set:
>>>
>>> https://github.com/google/cadvisor/blob/8450c56c21bc5406e2df79a2162806b9a23ebd34/metrics/testdata/prometheus_metrics
>>>
>>>
>>> On Tue, Aug 11, 2020 at 6:15 AM Brian Brazil <
>>> brian.bra...@robustperception.io> wrote:
>>>
 On Tue, 11 Aug 2020 at 11:07, Julien Pivotto <
 roidelapl...@prometheus.io> wrote:

> On 11 Aug 11:05, Brian Brazil wrote:
> > On Tue, 11 Aug 2020 at 04:09, Callum Styan 
> wrote:
> >
> > > I'm hesitant to add anything that significantly increases the
> network
> > > bandwidth usage or remote write while at the same time not giving
> users a
> > > way to tune the usage to their needs.
> > >
> > > I agree with Brian that we don't want the protocol itself to become
> > > stateful by introducing something like negotiation. I'd also
> prefer not to
> > > introduce multiple ways to do things, though I'm hoping we can
> find a way
> > > to accommodate your use case while not ballooning average users
> network
> > > egress bill.
> > >
> > > I am fine with forcing the consuming end to be somewhat stateful
> like in
> > > the case of Josh's PR where all metadata is sent periodically and
> must be
> > > stored by the remote storage system.
> > >
> >
> >
> >
> > > Overall I'd like to see some numbers regarding current network
> bandwidth
> > > of remote write, remote write with metadata via Josh's PR, and
> remote write
> > > with sending metadata for every series in a remote write payload.
> > >
> >
> > I agree, I noticed that in Rob's PR and had the same thought.
>
> Remote bandwidth are likely to affect only people using remote write.
>
> Getting a view on the on-disk size of the WAL would be great too, as
> that will affect everyone.
>

 I'm not worried about that, it's only really on series creation so
 won't be noticed unless you have really high levels of churn.

 Brian


>
> >
> > Brian
> >
> >
> > >
> > > Rob, I'll review your PR tomorrow but it looks like Julien and
> Brian may
> > > already have that covered.
> > >
> > > On Sun, Aug 9, 2020 at 9:36 PM Rob Skillington <
> r...@chronosphere.io>
> > > wrote:
> > >
>>

Re: [prometheus-developers] Re: Remote Write Metadata propagation

2020-08-19 Thread Brian Brazil
On Wed, 19 Aug 2020 at 09:20, Rob Skillington  wrote:

> Here's the results from testing:
> - node_exporter exporting 309 metrics each by turning on a lot of optional
>   collectors, all have help set, very few have unit set
> - running 8 on the host at 1s scrape interval, each with unique instance
> label
> - steady state ~137kb/sec without this change
> - steady state ~172kb/sec with this change
> - roughly 30% increase
>
> Graph here:
> https://github.com/prometheus/prometheus/pull/7771#issuecomment-675923976
>
> How do we want to proceed? This could be fairly close to the higher end of
> the spectrum in terms of expected increase given the node_exporter metrics
> density and fairly verbose metadata.
>
> Even having said that however 30% is a fairly big increase and relatively
> large
> egress cost to have to swallow without any way to back out of this
> behavior.
>
> What do folks think of next steps?
>

It is on the high end; however, this is going to be among the worst cases, as
there's not going to be a lot of per-metric cardinality from the node
exporter. I bet if you greatly increased the number of targets (and reduced
the scrape interval to compensate) it'd be more reasonable. I think this is
just about okay.

Brian


>
>
> On Tue, Aug 11, 2020 at 11:55 AM Rob Skillington 
> wrote:
>
>> Agreed - I'll see what I can do in getting some numbers for a workload
>> collecting cAdvisor metrics, it seems to have a significant amount of
>> HELP set:
>>
>> https://github.com/google/cadvisor/blob/8450c56c21bc5406e2df79a2162806b9a23ebd34/metrics/testdata/prometheus_metrics
>>
>>
>> On Tue, Aug 11, 2020 at 6:15 AM Brian Brazil <
>> brian.bra...@robustperception.io> wrote:
>>
>>> On Tue, 11 Aug 2020 at 11:07, Julien Pivotto 
>>> wrote:
>>>
 On 11 Aug 11:05, Brian Brazil wrote:
 > On Tue, 11 Aug 2020 at 04:09, Callum Styan 
 wrote:
 >
 > > I'm hesitant to add anything that significantly increases the
 network
 > > bandwidth usage or remote write while at the same time not giving
 users a
 > > way to tune the usage to their needs.
 > >
 > > I agree with Brian that we don't want the protocol itself to become
 > > stateful by introducing something like negotiation. I'd also prefer
 not to
 > > introduce multiple ways to do things, though I'm hoping we can find
 a way
 > > to accommodate your use case while not ballooning average users
 network
 > > egress bill.
 > >
 > > I am fine with forcing the consuming end to be somewhat stateful
 like in
 > > the case of Josh's PR where all metadata is sent periodically and
 must be
 > > stored by the remote storage system.
 > >
 >
 >
 >
 > > Overall I'd like to see some numbers regarding current network
 bandwidth
 > > of remote write, remote write with metadata via Josh's PR, and
 remote write
 > > with sending metadata for every series in a remote write payload.
 > >
 >
 > I agree, I noticed that in Rob's PR and had the same thought.

 Remote bandwidth are likely to affect only people using remote write.

 Getting a view on the on-disk size of the WAL would be great too, as
 that will affect everyone.

>>>
>>> I'm not worried about that, it's only really on series creation so won't
>>> be noticed unless you have really high levels of churn.
>>>
>>> Brian
>>>
>>>

 >
 > Brian
 >
 >
 > >
 > > Rob, I'll review your PR tomorrow but it looks like Julien and
 Brian may
 > > already have that covered.
 > >
 > > On Sun, Aug 9, 2020 at 9:36 PM Rob Skillington >>> >
 > > wrote:
 > >
 > >> Update: The PR now sends the fields over remote write from the WAL
 and
 > >> metadata
 > >> is also updated in the WAL when any field changes.
 > >>
 > >> Now opened the PR against the primary repo:
 > >> https://github.com/prometheus/prometheus/pull/7771
 > >>
 > >> I have tested this end-to-end with a modified M3 branch:
 > >> https://github.com/m3db/m3/compare/r/test-prometheus-metadata
 > >> > {... "msg":"received
 > >> series","labels":"{__name__="prometheus_rule_group_...
 > >> >
 iterations_total",instance="localhost:9090",job="prometheus01",role=...
 > >> > "remote"}","type":"counter","unit":"","help":"The total number of
 > >> scheduled...
 > >> > rule group evaluations, whether executed or missed."}
 > >>
 > >> Tests still haven't been updated. Please any feedback on the
 approach /
 > >> data structures would be greatly appreciated.
 > >>
 > >> Would be good to know what others thoughts are on next steps.
 > >>
 > >> On Sat, Aug 8, 2020 at 11:21 AM Rob Skillington <
 r...@chronosphere.io>
 > >> wrote:
 > >>
 > >>> Here's a draft PR that builds that propagates metadata to the WAL
 and
 > >>> the WAL
 > >>> reader can read it back:
 > >>> https

Re: [prometheus-developers] Re: Remote Write Metadata propagation

2020-08-19 Thread Rob Skillington
Here are the results from testing:
- node_exporter exporting 309 metrics each by turning on a lot of optional
  collectors; all have help set, very few have unit set
- running 8 on the host at a 1s scrape interval, each with a unique instance
  label (a rough scrape config sketch for this setup follows below)
- steady state ~137kb/sec without this change
- steady state ~172kb/sec with this change
- roughly 30% increase
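
The scrape setup above, approximated as config; the job name and ports are
hypothetical and not taken from the actual test:
```
global:
  scrape_interval: 1s

scrape_configs:
  - job_name: node
    static_configs:
      # 8 node_exporter processes on one host; each port gives a unique
      # instance label
      - targets:
          - localhost:9100
          - localhost:9101
          - localhost:9102
          - localhost:9103
          - localhost:9104
          - localhost:9105
          - localhost:9106
          - localhost:9107
```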

Graph here:
https://github.com/prometheus/prometheus/pull/7771#issuecomment-675923976

How do we want to proceed? This could be fairly close to the higher end of
the spectrum in terms of expected increase given the node_exporter metrics
density and fairly verbose metadata.

Even having said that, however, 30% is a fairly big increase and a relatively
large egress cost to swallow without any way to back out of this behavior.

What do folks think of next steps?


On Tue, Aug 11, 2020 at 11:55 AM Rob Skillington 
wrote:

> Agreed - I'll see what I can do in getting some numbers for a workload
> collecting cAdvisor metrics, it seems to have a significant amount of HELP
> set:
>
> https://github.com/google/cadvisor/blob/8450c56c21bc5406e2df79a2162806b9a23ebd34/metrics/testdata/prometheus_metrics
>
>
> On Tue, Aug 11, 2020 at 6:15 AM Brian Brazil <
> brian.bra...@robustperception.io> wrote:
>
>> On Tue, 11 Aug 2020 at 11:07, Julien Pivotto 
>> wrote:
>>
>>> On 11 Aug 11:05, Brian Brazil wrote:
>>> > On Tue, 11 Aug 2020 at 04:09, Callum Styan 
>>> wrote:
>>> >
>>> > > I'm hesitant to add anything that significantly increases the network
>>> > > bandwidth usage or remote write while at the same time not giving
>>> users a
>>> > > way to tune the usage to their needs.
>>> > >
>>> > > I agree with Brian that we don't want the protocol itself to become
>>> > > stateful by introducing something like negotiation. I'd also prefer
>>> not to
>>> > > introduce multiple ways to do things, though I'm hoping we can find
>>> a way
>>> > > to accommodate your use case while not ballooning average users
>>> network
>>> > > egress bill.
>>> > >
>>> > > I am fine with forcing the consuming end to be somewhat stateful
>>> like in
>>> > > the case of Josh's PR where all metadata is sent periodically and
>>> must be
>>> > > stored by the remote storage system.
>>> > >
>>> >
>>> >
>>> >
>>> > > Overall I'd like to see some numbers regarding current network
>>> bandwidth
>>> > > of remote write, remote write with metadata via Josh's PR, and
>>> remote write
>>> > > with sending metadata for every series in a remote write payload.
>>> > >
>>> >
>>> > I agree, I noticed that in Rob's PR and had the same thought.
>>>
>>> Remote bandwidth are likely to affect only people using remote write.
>>>
>>> Getting a view on the on-disk size of the WAL would be great too, as
>>> that will affect everyone.
>>>
>>
>> I'm not worried about that, it's only really on series creation so won't
>> be noticed unless you have really high levels of churn.
>>
>> Brian
>>
>>
>>>
>>> >
>>> > Brian
>>> >
>>> >
>>> > >
>>> > > Rob, I'll review your PR tomorrow but it looks like Julien and Brian
>>> may
>>> > > already have that covered.
>>> > >
>>> > > On Sun, Aug 9, 2020 at 9:36 PM Rob Skillington 
>>> > > wrote:
>>> > >
>>> > >> Update: The PR now sends the fields over remote write from the WAL
>>> and
>>> > >> metadata
>>> > >> is also updated in the WAL when any field changes.
>>> > >>
>>> > >> Now opened the PR against the primary repo:
>>> > >> https://github.com/prometheus/prometheus/pull/7771
>>> > >>
>>> > >> I have tested this end-to-end with a modified M3 branch:
>>> > >> https://github.com/m3db/m3/compare/r/test-prometheus-metadata
>>> > >> > {... "msg":"received
>>> > >> series","labels":"{__name__="prometheus_rule_group_...
>>> > >> >
>>> iterations_total",instance="localhost:9090",job="prometheus01",role=...
>>> > >> > "remote"}","type":"counter","unit":"","help":"The total number of
>>> > >> scheduled...
>>> > >> > rule group evaluations, whether executed or missed."}
>>> > >>
>>> > >> Tests still haven't been updated. Please any feedback on the
>>> approach /
>>> > >> data structures would be greatly appreciated.
>>> > >>
>>> > >> Would be good to know what others thoughts are on next steps.
>>> > >>
>>> > >> On Sat, Aug 8, 2020 at 11:21 AM Rob Skillington <
>>> r...@chronosphere.io>
>>> > >> wrote:
>>> > >>
>>> > >>> Here's a draft PR that builds that propagates metadata to the WAL
>>> and
>>> > >>> the WAL
>>> > >>> reader can read it back:
>>> > >>> https://github.com/robskillington/prometheus/pull/1/files
>>> > >>>
>>> > >>> Would like a little bit of feedback before on the datatypes and
>>> > >>> structure going
>>> > >>> further if folks are open to that.
>>> > >>>
>>> > >>> There's a few things not happening:
>>> > >>> - Remote write queue manager does not use or send these extra
>>> fields yet.
>>> > >>> - Head does not reset the "metadata" slice (not sure where "series"
>>> > >>> slice is
>>> > >>>   reset in the head for pending series writes to WAL, want to do
>>> in same
>>> > >>> 

Re: [prometheus-developers] Re: Remote Write Metadata propagation

2020-08-11 Thread Rob Skillington
Agreed - I'll see what I can do about getting some numbers for a workload
collecting cAdvisor metrics; it seems to have a significant amount of HELP
set:
https://github.com/google/cadvisor/blob/8450c56c21bc5406e2df79a2162806b9a23ebd34/metrics/testdata/prometheus_metrics


On Tue, Aug 11, 2020 at 6:15 AM Brian Brazil <
brian.bra...@robustperception.io> wrote:

> On Tue, 11 Aug 2020 at 11:07, Julien Pivotto 
> wrote:
>
>> On 11 Aug 11:05, Brian Brazil wrote:
>> > On Tue, 11 Aug 2020 at 04:09, Callum Styan 
>> wrote:
>> >
>> > > I'm hesitant to add anything that significantly increases the network
>> > > bandwidth usage or remote write while at the same time not giving
>> users a
>> > > way to tune the usage to their needs.
>> > >
>> > > I agree with Brian that we don't want the protocol itself to become
>> > > stateful by introducing something like negotiation. I'd also prefer
>> not to
>> > > introduce multiple ways to do things, though I'm hoping we can find a
>> way
>> > > to accommodate your use case while not ballooning average users
>> network
>> > > egress bill.
>> > >
>> > > I am fine with forcing the consuming end to be somewhat stateful like
>> in
>> > > the case of Josh's PR where all metadata is sent periodically and
>> must be
>> > > stored by the remote storage system.
>> > >
>> >
>> >
>> >
>> > > Overall I'd like to see some numbers regarding current network
>> bandwidth
>> > > of remote write, remote write with metadata via Josh's PR, and remote
>> write
>> > > with sending metadata for every series in a remote write payload.
>> > >
>> >
>> > I agree, I noticed that in Rob's PR and had the same thought.
>>
>> Remote bandwidth are likely to affect only people using remote write.
>>
>> Getting a view on the on-disk size of the WAL would be great too, as
>> that will affect everyone.
>>
>
> I'm not worried about that, it's only really on series creation so won't
> be noticed unless you have really high levels of churn.
>
> Brian
>
>
>>
>> >
>> > Brian
>> >
>> >
>> > >
>> > > Rob, I'll review your PR tomorrow but it looks like Julien and Brian
>> may
>> > > already have that covered.
>> > >
>> > > On Sun, Aug 9, 2020 at 9:36 PM Rob Skillington 
>> > > wrote:
>> > >
>> > >> Update: The PR now sends the fields over remote write from the WAL
>> and
>> > >> metadata
>> > >> is also updated in the WAL when any field changes.
>> > >>
>> > >> Now opened the PR against the primary repo:
>> > >> https://github.com/prometheus/prometheus/pull/7771
>> > >>
>> > >> I have tested this end-to-end with a modified M3 branch:
>> > >> https://github.com/m3db/m3/compare/r/test-prometheus-metadata
>> > >> > {... "msg":"received
>> > >> series","labels":"{__name__="prometheus_rule_group_...
>> > >> >
>> iterations_total",instance="localhost:9090",job="prometheus01",role=...
>> > >> > "remote"}","type":"counter","unit":"","help":"The total number of
>> > >> scheduled...
>> > >> > rule group evaluations, whether executed or missed."}
>> > >>
>> > >> Tests still haven't been updated. Please any feedback on the
>> approach /
>> > >> data structures would be greatly appreciated.
>> > >>
>> > >> Would be good to know what others thoughts are on next steps.
>> > >>
>> > >> On Sat, Aug 8, 2020 at 11:21 AM Rob Skillington > >
>> > >> wrote:
>> > >>
>> > >>> Here's a draft PR that builds that propagates metadata to the WAL
>> and
>> > >>> the WAL
>> > >>> reader can read it back:
>> > >>> https://github.com/robskillington/prometheus/pull/1/files
>> > >>>
>> > >>> Would like a little bit of feedback before on the datatypes and
>> > >>> structure going
>> > >>> further if folks are open to that.
>> > >>>
>> > >>> There's a few things not happening:
>> > >>> - Remote write queue manager does not use or send these extra
>> fields yet.
>> > >>> - Head does not reset the "metadata" slice (not sure where "series"
>> > >>> slice is
>> > >>>   reset in the head for pending series writes to WAL, want to do in
>> same
>> > >>> place).
>> > >>> - Metadata is not re-written on change yet.
>> > >>> - Tests.
>> > >>>
>> > >>>
>> > >>> On Sat, Aug 8, 2020 at 9:37 AM Rob Skillington > >
>> > >>> wrote:
>> > >>>
>> >  Sounds good, I've updated the proposal with details on places in
>> which
>> >  changes
>> >  are required given the new approach:
>> > 
>> > 
>> https://docs.google.com/document/d/1LY8Im8UyIBn8e3LJ2jB-MoajXkfAqW2eKzY735aYxqo/edit#
>> > 
>> > 
>> >  On Fri, Aug 7, 2020 at 2:09 PM Brian Brazil <
>> >  brian.bra...@robustperception.io> wrote:
>> > 
>> > > On Fri, 7 Aug 2020 at 15:48, Rob Skillington > >
>> > > wrote:
>> > >
>> > >> True - I mean this could also be a blacklist by config perhaps,
>> so if
>> > >> you
>> > >> really don't want to have increased egress you can optionally
>> turn
>> > >> off sending
>> > >> the TYPE, HELP, UNIT or send them at different frequencies via
>> > >> config. We could
>> > >> package some sensible defaults so folks don't

Re: [prometheus-developers] Re: Remote Write Metadata propagation

2020-08-11 Thread Brian Brazil
On Tue, 11 Aug 2020 at 11:07, Julien Pivotto 
wrote:

> On 11 Aug 11:05, Brian Brazil wrote:
> > On Tue, 11 Aug 2020 at 04:09, Callum Styan 
> wrote:
> >
> > > I'm hesitant to add anything that significantly increases the network
> > > bandwidth usage or remote write while at the same time not giving
> users a
> > > way to tune the usage to their needs.
> > >
> > > I agree with Brian that we don't want the protocol itself to become
> > > stateful by introducing something like negotiation. I'd also prefer
> not to
> > > introduce multiple ways to do things, though I'm hoping we can find a
> way
> > > to accommodate your use case while not ballooning average users network
> > > egress bill.
> > >
> > > I am fine with forcing the consuming end to be somewhat stateful like
> in
> > > the case of Josh's PR where all metadata is sent periodically and must
> be
> > > stored by the remote storage system.
> > >
> >
> >
> >
> > > Overall I'd like to see some numbers regarding current network
> bandwidth
> > > of remote write, remote write with metadata via Josh's PR, and remote
> write
> > > with sending metadata for every series in a remote write payload.
> > >
> >
> > I agree, I noticed that in Rob's PR and had the same thought.
>
> Remote bandwidth are likely to affect only people using remote write.
>
> Getting a view on the on-disk size of the WAL would be great too, as
> that will affect everyone.
>

I'm not worried about that; it's only really on series creation, so it won't be
noticed unless you have really high levels of churn.

Brian


>
> >
> > Brian
> >
> >
> > >
> > > Rob, I'll review your PR tomorrow but it looks like Julien and Brian
> may
> > > already have that covered.
> > >
> > > On Sun, Aug 9, 2020 at 9:36 PM Rob Skillington 
> > > wrote:
> > >
> > >> Update: The PR now sends the fields over remote write from the WAL and
> > >> metadata
> > >> is also updated in the WAL when any field changes.
> > >>
> > >> Now opened the PR against the primary repo:
> > >> https://github.com/prometheus/prometheus/pull/7771
> > >>
> > >> I have tested this end-to-end with a modified M3 branch:
> > >> https://github.com/m3db/m3/compare/r/test-prometheus-metadata
> > >> > {... "msg":"received
> > >> series","labels":"{__name__="prometheus_rule_group_...
> > >> >
> iterations_total",instance="localhost:9090",job="prometheus01",role=...
> > >> > "remote"}","type":"counter","unit":"","help":"The total number of
> > >> scheduled...
> > >> > rule group evaluations, whether executed or missed."}
> > >>
> > >> Tests still haven't been updated. Please any feedback on the approach
> /
> > >> data structures would be greatly appreciated.
> > >>
> > >> Would be good to know what others thoughts are on next steps.
> > >>
> > >> On Sat, Aug 8, 2020 at 11:21 AM Rob Skillington 
> > >> wrote:
> > >>
> > >>> Here's a draft PR that builds that propagates metadata to the WAL and
> > >>> the WAL
> > >>> reader can read it back:
> > >>> https://github.com/robskillington/prometheus/pull/1/files
> > >>>
> > >>> Would like a little bit of feedback before on the datatypes and
> > >>> structure going
> > >>> further if folks are open to that.
> > >>>
> > >>> There's a few things not happening:
> > >>> - Remote write queue manager does not use or send these extra fields
> yet.
> > >>> - Head does not reset the "metadata" slice (not sure where "series"
> > >>> slice is
> > >>>   reset in the head for pending series writes to WAL, want to do in
> same
> > >>> place).
> > >>> - Metadata is not re-written on change yet.
> > >>> - Tests.
> > >>>
> > >>>
> > >>> On Sat, Aug 8, 2020 at 9:37 AM Rob Skillington 
> > >>> wrote:
> > >>>
> >  Sounds good, I've updated the proposal with details on places in
> which
> >  changes
> >  are required given the new approach:
> > 
> > 
> https://docs.google.com/document/d/1LY8Im8UyIBn8e3LJ2jB-MoajXkfAqW2eKzY735aYxqo/edit#
> > 
> > 
> >  On Fri, Aug 7, 2020 at 2:09 PM Brian Brazil <
> >  brian.bra...@robustperception.io> wrote:
> > 
> > > On Fri, 7 Aug 2020 at 15:48, Rob Skillington 
> > > wrote:
> > >
> > >> True - I mean this could also be a blacklist by config perhaps,
> so if
> > >> you
> > >> really don't want to have increased egress you can optionally turn
> > >> off sending
> > >> the TYPE, HELP, UNIT or send them at different frequencies via
> > >> config. We could
> > >> package some sensible defaults so folks don't need to update their
> > >> config.
> > >>
> > >> The main intention is to enable these added features and make it
> > >> possible for
> > >> various consumers to be able to adjust some of these parameters if
> > >> required
> > >> since backends can be so different in their implementation. For
> M3 I
> > >> would be
> > >> totally fine with the extra egress that should be mitigated fairly
> > >> considerably
> > >> by Snappy and the fact that HELP is common across certai

Re: [prometheus-developers] Re: Remote Write Metadata propagation

2020-08-11 Thread Julien Pivotto
On 11 Aug 11:05, Brian Brazil wrote:
> On Tue, 11 Aug 2020 at 04:09, Callum Styan  wrote:
> 
> > I'm hesitant to add anything that significantly increases the network
> > bandwidth usage or remote write while at the same time not giving users a
> > way to tune the usage to their needs.
> >
> > I agree with Brian that we don't want the protocol itself to become
> > stateful by introducing something like negotiation. I'd also prefer not to
> > introduce multiple ways to do things, though I'm hoping we can find a way
> > to accommodate your use case while not ballooning average users network
> > egress bill.
> >
> > I am fine with forcing the consuming end to be somewhat stateful like in
> > the case of Josh's PR where all metadata is sent periodically and must be
> > stored by the remote storage system.
> >
> 
> 
> 
> > Overall I'd like to see some numbers regarding current network bandwidth
> > of remote write, remote write with metadata via Josh's PR, and remote write
> > with sending metadata for every series in a remote write payload.
> >
> 
> I agree, I noticed that in Rob's PR and had the same thought.

Remote bandwidth is likely to affect only people using remote write.

Getting a view on the on-disk size of the WAL would be great too, as
that will affect everyone.

> 
> Brian
> 
> 
> >
> > Rob, I'll review your PR tomorrow but it looks like Julien and Brian may
> > already have that covered.
> >
> > On Sun, Aug 9, 2020 at 9:36 PM Rob Skillington 
> > wrote:
> >
> >> Update: The PR now sends the fields over remote write from the WAL and
> >> metadata
> >> is also updated in the WAL when any field changes.
> >>
> >> Now opened the PR against the primary repo:
> >> https://github.com/prometheus/prometheus/pull/7771
> >>
> >> I have tested this end-to-end with a modified M3 branch:
> >> https://github.com/m3db/m3/compare/r/test-prometheus-metadata
> >> > {... "msg":"received
> >> series","labels":"{__name__="prometheus_rule_group_...
> >> > iterations_total",instance="localhost:9090",job="prometheus01",role=...
> >> > "remote"}","type":"counter","unit":"","help":"The total number of
> >> scheduled...
> >> > rule group evaluations, whether executed or missed."}
> >>
> >> Tests still haven't been updated. Please any feedback on the approach /
> >> data structures would be greatly appreciated.
> >>
> >> Would be good to know what others thoughts are on next steps.
> >>
> >> On Sat, Aug 8, 2020 at 11:21 AM Rob Skillington 
> >> wrote:
> >>
> >>> Here's a draft PR that builds that propagates metadata to the WAL and
> >>> the WAL
> >>> reader can read it back:
> >>> https://github.com/robskillington/prometheus/pull/1/files
> >>>
> >>> Would like a little bit of feedback before on the datatypes and
> >>> structure going
> >>> further if folks are open to that.
> >>>
> >>> There's a few things not happening:
> >>> - Remote write queue manager does not use or send these extra fields yet.
> >>> - Head does not reset the "metadata" slice (not sure where "series"
> >>> slice is
> >>>   reset in the head for pending series writes to WAL, want to do in same
> >>> place).
> >>> - Metadata is not re-written on change yet.
> >>> - Tests.
> >>>
> >>>
> >>> On Sat, Aug 8, 2020 at 9:37 AM Rob Skillington 
> >>> wrote:
> >>>
>  Sounds good, I've updated the proposal with details on places in which
>  changes
>  are required given the new approach:
> 
>  https://docs.google.com/document/d/1LY8Im8UyIBn8e3LJ2jB-MoajXkfAqW2eKzY735aYxqo/edit#
> 
> 
>  On Fri, Aug 7, 2020 at 2:09 PM Brian Brazil <
>  brian.bra...@robustperception.io> wrote:
> 
> > On Fri, 7 Aug 2020 at 15:48, Rob Skillington 
> > wrote:
> >
> >> True - I mean this could also be a blacklist by config perhaps, so if
> >> you
> >> really don't want to have increased egress you can optionally turn
> >> off sending
> >> the TYPE, HELP, UNIT or send them at different frequencies via
> >> config. We could
> >> package some sensible defaults so folks don't need to update their
> >> config.
> >>
> >> The main intention is to enable these added features and make it
> >> possible for
> >> various consumers to be able to adjust some of these parameters if
> >> required
> >> since backends can be so different in their implementation. For M3 I
> >> would be
> >> totally fine with the extra egress that should be mitigated fairly
> >> considerably
> >> by Snappy and the fact that HELP is common across certain metric
> >> families and
> >> receiving it every single Remote Write request.
> >>
> >
> > That's really a micro-optimisation. If you are that worried about
> > bandwidth you'd run a sidecar specific to your remote backend that was
> > stateful and far more efficient overall. Sending the full label names 
> > and
> > values on every request is going to be far more than the overhead of
> > metadata on top of that

Re: [prometheus-developers] Re: Remote Write Metadata propagation

2020-08-11 Thread Brian Brazil
On Tue, 11 Aug 2020 at 04:09, Callum Styan  wrote:

> I'm hesitant to add anything that significantly increases the network
> bandwidth usage or remote write while at the same time not giving users a
> way to tune the usage to their needs.
>
> I agree with Brian that we don't want the protocol itself to become
> stateful by introducing something like negotiation. I'd also prefer not to
> introduce multiple ways to do things, though I'm hoping we can find a way
> to accommodate your use case while not ballooning average users network
> egress bill.
>
> I am fine with forcing the consuming end to be somewhat stateful like in
> the case of Josh's PR where all metadata is sent periodically and must be
> stored by the remote storage system.
>



> Overall I'd like to see some numbers regarding current network bandwidth
> of remote write, remote write with metadata via Josh's PR, and remote write
> with sending metadata for every series in a remote write payload.
>

I agree, I noticed that in Rob's PR and had the same thought.

Brian


>
> Rob, I'll review your PR tomorrow but it looks like Julien and Brian may
> already have that covered.
>
> On Sun, Aug 9, 2020 at 9:36 PM Rob Skillington 
> wrote:
>
>> Update: The PR now sends the fields over remote write from the WAL and
>> metadata
>> is also updated in the WAL when any field changes.
>>
>> Now opened the PR against the primary repo:
>> https://github.com/prometheus/prometheus/pull/7771
>>
>> I have tested this end-to-end with a modified M3 branch:
>> https://github.com/m3db/m3/compare/r/test-prometheus-metadata
>> > {... "msg":"received
>> series","labels":"{__name__="prometheus_rule_group_...
>> > iterations_total",instance="localhost:9090",job="prometheus01",role=...
>> > "remote"}","type":"counter","unit":"","help":"The total number of
>> scheduled...
>> > rule group evaluations, whether executed or missed."}
>>
>> Tests still haven't been updated. Please any feedback on the approach /
>> data structures would be greatly appreciated.
>>
>> Would be good to know what others thoughts are on next steps.
>>
>> On Sat, Aug 8, 2020 at 11:21 AM Rob Skillington 
>> wrote:
>>
>>> Here's a draft PR that builds that propagates metadata to the WAL and
>>> the WAL
>>> reader can read it back:
>>> https://github.com/robskillington/prometheus/pull/1/files
>>>
>>> Would like a little bit of feedback before on the datatypes and
>>> structure going
>>> further if folks are open to that.
>>>
>>> There's a few things not happening:
>>> - Remote write queue manager does not use or send these extra fields yet.
>>> - Head does not reset the "metadata" slice (not sure where "series"
>>> slice is
>>>   reset in the head for pending series writes to WAL, want to do in same
>>> place).
>>> - Metadata is not re-written on change yet.
>>> - Tests.
>>>
>>>
>>> On Sat, Aug 8, 2020 at 9:37 AM Rob Skillington 
>>> wrote:
>>>
 Sounds good, I've updated the proposal with details on places in which
 changes
 are required given the new approach:

 https://docs.google.com/document/d/1LY8Im8UyIBn8e3LJ2jB-MoajXkfAqW2eKzY735aYxqo/edit#


 On Fri, Aug 7, 2020 at 2:09 PM Brian Brazil <
 brian.bra...@robustperception.io> wrote:

> On Fri, 7 Aug 2020 at 15:48, Rob Skillington 
> wrote:
>
>> True - I mean this could also be a blacklist by config perhaps, so if
>> you
>> really don't want to have increased egress you can optionally turn
>> off sending
>> the TYPE, HELP, UNIT or send them at different frequencies via
>> config. We could
>> package some sensible defaults so folks don't need to update their
>> config.
>>
>> The main intention is to enable these added features and make it
>> possible for
>> various consumers to be able to adjust some of these parameters if
>> required
>> since backends can be so different in their implementation. For M3 I
>> would be
>> totally fine with the extra egress that should be mitigated fairly
>> considerably
>> by Snappy and the fact that HELP is common across certain metric
>> families and
>> receiving it every single Remote Write request.
>>
>
> That's really a micro-optimisation. If you are that worried about
> bandwidth you'd run a sidecar specific to your remote backend that was
> stateful and far more efficient overall. Sending the full label names and
> values on every request is going to be far more than the overhead of
> metadata on top of that, so I don't see a need as it stands for any of 
> this
> to be configurable.
>
> Brian
>
>
>>
>> On Fri, Aug 7, 2020 at 3:56 AM Brian Brazil <
>> brian.bra...@robustperception.io> wrote:
>>
>>> On Thu, 6 Aug 2020 at 22:58, Rob Skillington 
>>> wrote:
>>>
 Hey Björn,


 Thanks for the detailed response. I've had a few back and forths on
 this with
 Brian and

Re: [prometheus-developers] Re: Remote Write Metadata propagation

2020-08-10 Thread Callum Styan
I'm hesitant to add anything that significantly increases the network
bandwidth usage of remote write while at the same time not giving users a
way to tune the usage to their needs.

I agree with Brian that we don't want the protocol itself to become
stateful by introducing something like negotiation. I'd also prefer not to
introduce multiple ways to do things, though I'm hoping we can find a way
to accommodate your use case while not ballooning the average user's
network egress bill.

I am fine with forcing the consuming end to be somewhat stateful like in
the case of Josh's PR where all metadata is sent periodically and must be
stored by the remote storage system.

Overall I'd like to see some numbers regarding current network bandwidth of
remote write, remote write with metadata via Josh's PR, and remote write
with sending metadata for every series in a remote write payload.
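
To make that third option concrete, here's a hypothetical decoded view of what
one payload entry could carry, based on the fields shown in the M3 test log
quoted further down this thread; the sample value and timestamp are made up
and the field names are illustrative only, not the actual protobuf schema:
```
timeseries:
  - labels:
      __name__: prometheus_rule_group_iterations_total
      instance: localhost:9090
      job: prometheus01
    type: counter
    unit: ""
    help: "The total number of scheduled rule group evaluations, whether executed or missed."
    samples:
      - value: 42                      # made-up value
        timestamp_ms: 1597851234000    # made-up timestamp
```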

Rob, I'll review your PR tomorrow but it looks like Julien and Brian may
already have that covered.

On Sun, Aug 9, 2020 at 9:36 PM Rob Skillington  wrote:

> Update: The PR now sends the fields over remote write from the WAL and
> metadata
> is also updated in the WAL when any field changes.
>
> Now opened the PR against the primary repo:
> https://github.com/prometheus/prometheus/pull/7771
>
> I have tested this end-to-end with a modified M3 branch:
> https://github.com/m3db/m3/compare/r/test-prometheus-metadata
> > {... "msg":"received
> series","labels":"{__name__="prometheus_rule_group_...
> > iterations_total",instance="localhost:9090",job="prometheus01",role=...
> > "remote"}","type":"counter","unit":"","help":"The total number of
> scheduled...
> > rule group evaluations, whether executed or missed."}
>
> Tests still haven't been updated. Please any feedback on the approach /
> data structures would be greatly appreciated.
>
> Would be good to know what others thoughts are on next steps.
>
> On Sat, Aug 8, 2020 at 11:21 AM Rob Skillington 
> wrote:
>
>> Here's a draft PR that builds that propagates metadata to the WAL and the
>> WAL
>> reader can read it back:
>> https://github.com/robskillington/prometheus/pull/1/files
>>
>> Would like a little bit of feedback before on the datatypes and structure
>> going
>> further if folks are open to that.
>>
>> There's a few things not happening:
>> - Remote write queue manager does not use or send these extra fields yet.
>> - Head does not reset the "metadata" slice (not sure where "series" slice
>> is
>>   reset in the head for pending series writes to WAL, want to do in same
>> place).
>> - Metadata is not re-written on change yet.
>> - Tests.
>>
>>
>> On Sat, Aug 8, 2020 at 9:37 AM Rob Skillington 
>> wrote:
>>
>>> Sounds good, I've updated the proposal with details on places in which
>>> changes
>>> are required given the new approach:
>>>
>>> https://docs.google.com/document/d/1LY8Im8UyIBn8e3LJ2jB-MoajXkfAqW2eKzY735aYxqo/edit#
>>>
>>>
>>> On Fri, Aug 7, 2020 at 2:09 PM Brian Brazil <
>>> brian.bra...@robustperception.io> wrote:
>>>
 On Fri, 7 Aug 2020 at 15:48, Rob Skillington 
 wrote:

> True - I mean this could also be a blacklist by config perhaps, so if
> you
> really don't want to have increased egress you can optionally turn off
> sending
> the TYPE, HELP, UNIT or send them at different frequencies via config.
> We could
> package some sensible defaults so folks don't need to update their
> config.
>
> The main intention is to enable these added features and make it
> possible for
> various consumers to be able to adjust some of these parameters if
> required
> since backends can be so different in their implementation. For M3 I
> would be
> totally fine with the extra egress that should be mitigated fairly
> considerably
> by Snappy and the fact that HELP is common across certain metric
> families and
> receiving it every single Remote Write request.
>

 That's really a micro-optimisation. If you are that worried about
 bandwidth you'd run a sidecar specific to your remote backend that was
 stateful and far more efficient overall. Sending the full label names and
 values on every request is going to be far more than the overhead of
 metadata on top of that, so I don't see a need as it stands for any of this
 to be configurable.

 Brian


>
> On Fri, Aug 7, 2020 at 3:56 AM Brian Brazil <
> brian.bra...@robustperception.io> wrote:
>
>> On Thu, 6 Aug 2020 at 22:58, Rob Skillington 
>> wrote:
>>
>>> Hey Björn,
>>>
>>>
>>> Thanks for the detailed response. I've had a few back and forths on
>>> this with
>>> Brian and Chris over IRC and CNCF Slack now too.
>>>
>>> I agree that fundamentally it seems naive to idealistically model
>>> this around
>>> per metric name. It needs to be per series given what may happen
>>> w.r.t.
>>> collision across targets, etc.
>>>
>

Re: [prometheus-developers] Re: Remote Write Metadata propagation

2020-08-09 Thread Rob Skillington
Update: The PR now sends the fields over remote write from the WAL and
metadata
is also updated in the WAL when any field changes.

Now opened the PR against the primary repo:
https://github.com/prometheus/prometheus/pull/7771

I have tested this end-to-end with a modified M3 branch:
https://github.com/m3db/m3/compare/r/test-prometheus-metadata
> {... "msg":"received
series","labels":"{__name__="prometheus_rule_group_...
> iterations_total",instance="localhost:9090",job="prometheus01",role=...
> "remote"}","type":"counter","unit":"","help":"The total number of
scheduled...
> rule group evaluations, whether executed or missed."}

Tests still haven't been updated. Any feedback on the approach /
data structures would be greatly appreciated.

Would be good to know what others' thoughts are on next steps.

On Sat, Aug 8, 2020 at 11:21 AM Rob Skillington  wrote:

> Here's a draft PR that builds that propagates metadata to the WAL and the
> WAL
> reader can read it back:
> https://github.com/robskillington/prometheus/pull/1/files
>
> Would like a little bit of feedback before on the datatypes and structure
> going
> further if folks are open to that.
>
> There's a few things not happening:
> - Remote write queue manager does not use or send these extra fields yet.
> - Head does not reset the "metadata" slice (not sure where "series" slice
> is
>   reset in the head for pending series writes to WAL, want to do in same
> place).
> - Metadata is not re-written on change yet.
> - Tests.
>
>
> On Sat, Aug 8, 2020 at 9:37 AM Rob Skillington 
> wrote:
>
>> Sounds good, I've updated the proposal with details on places in which
>> changes
>> are required given the new approach:
>>
>> https://docs.google.com/document/d/1LY8Im8UyIBn8e3LJ2jB-MoajXkfAqW2eKzY735aYxqo/edit#
>>
>>
>> On Fri, Aug 7, 2020 at 2:09 PM Brian Brazil <
>> brian.bra...@robustperception.io> wrote:
>>
>>> On Fri, 7 Aug 2020 at 15:48, Rob Skillington 
>>> wrote:
>>>
 True - I mean this could also be a blacklist by config perhaps, so if
 you
 really don't want to have increased egress you can optionally turn off
 sending
 the TYPE, HELP, UNIT or send them at different frequencies via config.
 We could
 package some sensible defaults so folks don't need to update their
 config.

 The main intention is to enable these added features and make it
 possible for
 various consumers to be able to adjust some of these parameters if
 required
 since backends can be so different in their implementation. For M3 I
 would be
 totally fine with the extra egress that should be mitigated fairly
 considerably
 by Snappy and the fact that HELP is common across certain metric
 families and
 receiving it every single Remote Write request.

>>>
>>> That's really a micro-optimisation. If you are that worried about
>>> bandwidth you'd run a sidecar specific to your remote backend that was
>>> stateful and far more efficient overall. Sending the full label names and
>>> values on every request is going to be far more than the overhead of
>>> metadata on top of that, so I don't see a need as it stands for any of this
>>> to be configurable.
>>>
>>> Brian
>>>
>>>

 On Fri, Aug 7, 2020 at 3:56 AM Brian Brazil <
 brian.bra...@robustperception.io> wrote:

> On Thu, 6 Aug 2020 at 22:58, Rob Skillington 
> wrote:
>
>> Hey Björn,
>>
>>
>> Thanks for the detailed response. I've had a few back and forths on
>> this with
>> Brian and Chris over IRC and CNCF Slack now too.
>>
>> I agree that fundamentally it seems naive to idealistically model
>> this around
>> per metric name. It needs to be per series given what may happen
>> w.r.t.
>> collision across targets, etc.
>>
>> Perhaps we can separate these discussions apart into two
>> considerations:
>>
>> 1) Modeling of the data such that it is kept around for transmission
>> (primarily
>> we're focused on WAL here).
>>
>> 2) Transmission (and of which you allude to has many areas for
>> improvement).
>>
>> For (1) - it seems like this needs to be done per time series,
>> thankfully we
>> actually already have modeled this to be stored per series data just
>> once in a
>> single WAL file. I will write up my proposal here, but it will
>> surmount to
>> essentially encoding the HELP, UNIT and TYPE to the WAL per series
>> similar to
>> how labels for a series are encoded once per series in the WAL. Since
>> this
>> optimization is in place, there's already a huge dampening effect on
>> how
>> expensive it is to write out data about a series (e.g. labels). We
>> can always
>> go and collect a sample WAL file and measure how much extra size
>> with/without
>> HELP, UNIT and TYPE this would add, but it seems like it won't
>> fundamentally
>> change the order of magnitu

Re: [prometheus-developers] Re: Remote Write Metadata propagation

2020-08-08 Thread Rob Skillington
Here's a draft PR that builds and propagates metadata to the WAL, and the
WAL reader can read it back:
https://github.com/robskillington/prometheus/pull/1/files

Would like a little bit of feedback on the datatypes and structure before
going further, if folks are open to that.

There's a few things not happening:
- Remote write queue manager does not use or send these extra fields yet.
- Head does not reset the "metadata" slice (not sure where "series" slice
is
  reset in the head for pending series writes to WAL, want to do in same
place).
- Metadata is not re-written on change yet.
- Tests.


On Sat, Aug 8, 2020 at 9:37 AM Rob Skillington  wrote:

> Sounds good, I've updated the proposal with details on places in which
> changes
> are required given the new approach:
>
> https://docs.google.com/document/d/1LY8Im8UyIBn8e3LJ2jB-MoajXkfAqW2eKzY735aYxqo/edit#
>
>
> On Fri, Aug 7, 2020 at 2:09 PM Brian Brazil <
> brian.bra...@robustperception.io> wrote:
>
>> On Fri, 7 Aug 2020 at 15:48, Rob Skillington  wrote:
>>
>>> True - I mean this could also be a blacklist by config perhaps, so if
>>> you
>>> really don't want to have increased egress you can optionally turn off
>>> sending
>>> the TYPE, HELP, UNIT or send them at different frequencies via config.
>>> We could
>>> package some sensible defaults so folks don't need to update their
>>> config.
>>>
>>> The main intention is to enable these added features and make it
>>> possible for
>>> various consumers to be able to adjust some of these parameters if
>>> required
>>> since backends can be so different in their implementation. For M3 I
>>> would be
>>> totally fine with the extra egress that should be mitigated fairly
>>> considerably
>>> by Snappy and the fact that HELP is common across certain metric
>>> families and
>>> receiving it every single Remote Write request.
>>>
>>
>> That's really a micro-optimisation. If you are that worried about
>> bandwidth you'd run a sidecar specific to your remote backend that was
>> stateful and far more efficient overall. Sending the full label names and
>> values on every request is going to be far more than the overhead of
>> metadata on top of that, so I don't see a need as it stands for any of this
>> to be configurable.
>>
>> Brian
>>
>>
>>>
>>> On Fri, Aug 7, 2020 at 3:56 AM Brian Brazil <
>>> brian.bra...@robustperception.io> wrote:
>>>
 On Thu, 6 Aug 2020 at 22:58, Rob Skillington 
 wrote:

> Hey Björn,
>
>
> Thanks for the detailed response. I've had a few back and forths on
> this with
> Brian and Chris over IRC and CNCF Slack now too.
>
> I agree that fundamentally it seems naive to idealistically model this
> around
> per metric name. It needs to be per series given what may happen
> w.r.t.
> collision across targets, etc.
>
> Perhaps we can separate these discussions apart into two
> considerations:
>
> 1) Modeling of the data such that it is kept around for transmission
> (primarily
> we're focused on WAL here).
>
> 2) Transmission (and of which you allude to has many areas for
> improvement).
>
> For (1) - it seems like this needs to be done per time series,
> thankfully we
> actually already have modeled this to be stored per series data just
> once in a
> single WAL file. I will write up my proposal here, but it will
> surmount to
> essentially encoding the HELP, UNIT and TYPE to the WAL per series
> similar to
> how labels for a series are encoded once per series in the WAL. Since
> this
> optimization is in place, there's already a huge dampening effect on
> how
> expensive it is to write out data about a series (e.g. labels). We can
> always
> go and collect a sample WAL file and measure how much extra size
> with/without
> HELP, UNIT and TYPE this would add, but it seems like it won't
> fundamentally
> change the order of magnitude in terms of "information about a
> timeseries
> storage size" vs "datapoints about a timeseries storage size". One
> extra change
> would be re-encoding the series into the WAL if the HELP changed for
> that
> series, just so that when HELP does change it can be up to date from
> the view
> of whoever is reading the WAL (i.e. the Remote Write loop). Since this
> entry
> needs to be loaded into memory for Remote Write today anyway, with
> string
> interning as suggested by Chris, it won't change the memory profile
> algorithmically of a Prometheus with Remote Write enabled. There will
> be some
> overhead that at most would likely be similar to the label data, but
> we aren't
> altering data structures (so won't change big-O magnitude of memory
> being used),
> we're adding fields to existing data structures that exist and string
> interning
> should actually make it much less onerous since there is a large
>

Re: [prometheus-developers] Re: Remote Write Metadata propagation

2020-08-08 Thread Rob Skillington
Sounds good, I've updated the proposal with details on places in which
changes
are required given the new approach:
https://docs.google.com/document/d/1LY8Im8UyIBn8e3LJ2jB-MoajXkfAqW2eKzY735aYxqo/edit#


On Fri, Aug 7, 2020 at 2:09 PM Brian Brazil <
brian.bra...@robustperception.io> wrote:

> On Fri, 7 Aug 2020 at 15:48, Rob Skillington  wrote:
>
>> True - I mean this could also be a blacklist by config perhaps, so if you
>> really don't want to have increased egress you can optionally turn off
>> sending
>> the TYPE, HELP, UNIT or send them at different frequencies via config. We
>> could
>> package some sensible defaults so folks don't need to update their config.
>>
>> The main intention is to enable these added features and make it possible
>> for
>> various consumers to be able to adjust some of these parameters if
>> required
>> since backends can be so different in their implementation. For M3 I
>> would be
>> totally fine with the extra egress that should be mitigated fairly
>> considerably
>> by Snappy and the fact that HELP is common across certain metric families
>> and
>> receiving it every single Remote Write request.
>>
>
> That's really a micro-optimisation. If you are that worried about
> bandwidth you'd run a sidecar specific to your remote backend that was
> stateful and far more efficient overall. Sending the full label names and
> values on every request is going to be far more than the overhead of
> metadata on top of that, so I don't see a need as it stands for any of this
> to be configurable.
>
> Brian
>
>
>>
>> On Fri, Aug 7, 2020 at 3:56 AM Brian Brazil <
>> brian.bra...@robustperception.io> wrote:
>>
>>> On Thu, 6 Aug 2020 at 22:58, Rob Skillington 
>>> wrote:
>>>
 Hey Björn,


 Thanks for the detailed response. I've had a few back and forths on
 this with
 Brian and Chris over IRC and CNCF Slack now too.

 I agree that fundamentally it seems naive to idealistically model this
 around
 per metric name. It needs to be per series given what may happen w.r.t.
 collision across targets, etc.

 Perhaps we can separate these discussions apart into two considerations:

 1) Modeling of the data such that it is kept around for transmission
 (primarily
 we're focused on WAL here).

 2) Transmission (and of which you allude to has many areas for
 improvement).

 For (1) - it seems like this needs to be done per time series,
 thankfully we
 actually already have modeled this to be stored per series data just
 once in a
 single WAL file. I will write up my proposal here, but it will surmount
 to
 essentially encoding the HELP, UNIT and TYPE to the WAL per series
 similar to
 how labels for a series are encoded once per series in the WAL. Since
 this
 optimization is in place, there's already a huge dampening effect on
 how
 expensive it is to write out data about a series (e.g. labels). We can
 always
 go and collect a sample WAL file and measure how much extra size
 with/without
 HELP, UNIT and TYPE this would add, but it seems like it won't
 fundamentally
 change the order of magnitude in terms of "information about a
 timeseries
 storage size" vs "datapoints about a timeseries storage size". One
 extra change
 would be re-encoding the series into the WAL if the HELP changed for
 that
 series, just so that when HELP does change it can be up to date from
 the view
 of whoever is reading the WAL (i.e. the Remote Write loop). Since this
 entry
 needs to be loaded into memory for Remote Write today anyway, with
 string
 interning as suggested by Chris, it won't change the memory profile
 algorithmically of a Prometheus with Remote Write enabled. There will
 be some
 overhead that at most would likely be similar to the label data, but we
 aren't
 altering data structures (so won't change big-O magnitude of memory
 being used),
 we're adding fields to existing data structures that exist and string
 interning
 should actually make it much less onerous since there is a large
 duplicative
 effect with HELP among time series.

 For (2) - now we have basically TYPE, HELP and UNIT all available for
 transmission if we wanted to send it with every single datapoint. While
 I think
 we should definitely examine HPACK like compression features as you
 mentioned
 Björn, I think we should think more about separating that kind of work
 into a
 Milestone 2 where this is considered.

>>>
>>>
>>>
 For the time being it's very plausible
 we could do some negotiation of the receiving Remote Write endpoint by
 sending
 a "GET" to the remote write endpoint and seeing if it responds with a
 "capabilities + preferences" response, and if the endpoint specifies
 that it
 would like to receive metadata all 

Re: [prometheus-developers] Re: Remote Write Metadata propagation

2020-08-07 Thread Brian Brazil
On Fri, 7 Aug 2020 at 15:48, Rob Skillington  wrote:

> True - I mean this could also be a blacklist by config perhaps, so if you
> really don't want to have increased egress you can optionally turn off
> sending
> the TYPE, HELP, UNIT or send them at different frequencies via config. We
> could
> package some sensible defaults so folks don't need to update their config.
>
> The main intention is to enable these added features and make it possible
> for
> various consumers to be able to adjust some of these parameters if
> required
> since backends can be so different in their implementation. For M3 I would
> be
> totally fine with the extra egress that should be mitigated fairly
> considerably
> by Snappy and the fact that HELP is common across certain metric families
> and
> receiving it every single Remote Write request.
>

That's really a micro-optimisation. If you are that worried about bandwidth
you'd run a sidecar specific to your remote backend that was stateful and
far more efficient overall. Sending the full label names and values on
every request is going to be far more than the overhead of metadata on top
of that, so I don't see a need as it stands for any of this to be
configurable.

Brian


>
> On Fri, Aug 7, 2020 at 3:56 AM Brian Brazil <
> brian.bra...@robustperception.io> wrote:
>
>> On Thu, 6 Aug 2020 at 22:58, Rob Skillington  wrote:
>>
>>> Hey Björn,
>>>
>>>
>>> Thanks for the detailed response. I've had a few back and forths on this
>>> with
>>> Brian and Chris over IRC and CNCF Slack now too.
>>>
>>> I agree that fundamentally it seems naive to idealistically model this
>>> around
>>> per metric name. It needs to be per series given what may happen w.r.t.
>>> collision across targets, etc.
>>>
>>> Perhaps we can separate these discussions apart into two considerations:
>>>
>>> 1) Modeling of the data such that it is kept around for transmission
>>> (primarily
>>> we're focused on WAL here).
>>>
>>> 2) Transmission (and of which you allude to has many areas for
>>> improvement).
>>>
>>> For (1) - it seems like this needs to be done per time series,
>>> thankfully we
>>> actually already have modeled this to be stored per series data just
>>> once in a
>>> single WAL file. I will write up my proposal here, but it will surmount
>>> to
>>> essentially encoding the HELP, UNIT and TYPE to the WAL per series
>>> similar to
>>> how labels for a series are encoded once per series in the WAL. Since
>>> this
>>> optimization is in place, there's already a huge dampening effect on how
>>> expensive it is to write out data about a series (e.g. labels). We can
>>> always
>>> go and collect a sample WAL file and measure how much extra size
>>> with/without
>>> HELP, UNIT and TYPE this would add, but it seems like it won't
>>> fundamentally
>>> change the order of magnitude in terms of "information about a
>>> timeseries
>>> storage size" vs "datapoints about a timeseries storage size". One extra
>>> change
>>> would be re-encoding the series into the WAL if the HELP changed for
>>> that
>>> series, just so that when HELP does change it can be up to date from the
>>> view
>>> of whoever is reading the WAL (i.e. the Remote Write loop). Since this
>>> entry
>>> needs to be loaded into memory for Remote Write today anyway, with
>>> string
>>> interning as suggested by Chris, it won't change the memory profile
>>> algorithmically of a Prometheus with Remote Write enabled. There will be
>>> some
>>> overhead that at most would likely be similar to the label data, but we
>>> aren't
>>> altering data structures (so won't change big-O magnitude of memory
>>> being used),
>>> we're adding fields to existing data structures that exist and string
>>> interning
>>> should actually make it much less onerous since there is a large
>>> duplicative
>>> effect with HELP among time series.
>>>
>>> For (2) - now we have basically TYPE, HELP and UNIT all available for
>>> transmission if we wanted to send it with every single datapoint. While
>>> I think
>>> we should definitely examine HPACK like compression features as you
>>> mentioned
>>> Björn, I think we should think more about separating that kind of work
>>> into a
>>> Milestone 2 where this is considered.
>>>
>>
>>
>>
>>> For the time being it's very plausible
>>> we could do some negotiation of the receiving Remote Write endpoint by
>>> sending
>>> a "GET" to the remote write endpoint and seeing if it responds with a
>>> "capabilities + preferences" response, and if the endpoint specifies
>>> that it
>>> would like to receive metadata all the time on every single request and
>>> let
>>> Snappy take care of keeping size not ballooning too much, or if it would
>>> like
>>> TYPE on every single datapoint, and HELP and UNIT every DESIRED_SECONDS
>>> or so.
>>> To enable a "send HELP every 10 minutes" feature we would have to add to
>>> the
>>> datastructure that holds the LABELS, TYPE, HELP and UNIT for each series
>>> a
>>> "last sent" timestamp to know when to r

Re: [prometheus-developers] Re: Remote Write Metadata propagation

2020-08-07 Thread Rob Skillington
True - I mean this could also be a blacklist by config perhaps, so if you
really don't want to have increased egress you can optionally turn off sending
TYPE, HELP and UNIT, or send them at different frequencies via config. We could
package some sensible defaults so folks don't need to update their config.
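
Purely as an illustration of the kind of knobs being described (none of these
fields exist; names are made up for this sketch), such per-remote-write options
might be modelled as:

```go
package config

import "time"

// MetadataSendConfig is a hypothetical sketch of per-remote-write metadata
// options: which fields to send, and how often to re-send the larger ones.
type MetadataSendConfig struct {
	SendType         bool          `yaml:"send_type"`
	SendHelp         bool          `yaml:"send_help"`
	SendUnit         bool          `yaml:"send_unit"`
	HelpSendInterval time.Duration `yaml:"help_send_interval"` // e.g. 10m
	UnitSendInterval time.Duration `yaml:"unit_send_interval"`
}
```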

The main intention is to enable these added features and make it possible
for
various consumers to be able to adjust some of these parameters if required
since backends can be so different in their implementation. For M3 I would
be
totally fine with the extra egress, which should be mitigated fairly
considerably by Snappy and the fact that HELP is common across certain metric
families, even though it is received with every single Remote Write request.

On Fri, Aug 7, 2020 at 3:56 AM Brian Brazil <
brian.bra...@robustperception.io> wrote:

> On Thu, 6 Aug 2020 at 22:58, Rob Skillington  wrote:
>
>> Hey Björn,
>>
>>
>> Thanks for the detailed response. I've had a few back and forths on this
>> with
>> Brian and Chris over IRC and CNCF Slack now too.
>>
>> I agree that fundamentally it seems naive to idealistically model this
>> around
>> per metric name. It needs to be per series given what may happen w.r.t.
>> collision across targets, etc.
>>
>> Perhaps we can separate these discussions apart into two considerations:
>>
>> 1) Modeling of the data such that it is kept around for transmission
>> (primarily
>> we're focused on WAL here).
>>
>> 2) Transmission (and of which you allude to has many areas for
>> improvement).
>>
>> For (1) - it seems like this needs to be done per time series, thankfully
>> we
>> actually already have modeled this to be stored per series data just once
>> in a
>> single WAL file. I will write up my proposal here, but it will surmount
>> to
>> essentially encoding the HELP, UNIT and TYPE to the WAL per series
>> similar to
>> how labels for a series are encoded once per series in the WAL. Since
>> this
>> optimization is in place, there's already a huge dampening effect on how
>> expensive it is to write out data about a series (e.g. labels). We can
>> always
>> go and collect a sample WAL file and measure how much extra size
>> with/without
>> HELP, UNIT and TYPE this would add, but it seems like it won't
>> fundamentally
>> change the order of magnitude in terms of "information about a timeseries
>> storage size" vs "datapoints about a timeseries storage size". One extra
>> change
>> would be re-encoding the series into the WAL if the HELP changed for that
>> series, just so that when HELP does change it can be up to date from the
>> view
>> of whoever is reading the WAL (i.e. the Remote Write loop). Since this
>> entry
>> needs to be loaded into memory for Remote Write today anyway, with string
>> interning as suggested by Chris, it won't change the memory profile
>> algorithmically of a Prometheus with Remote Write enabled. There will be
>> some
>> overhead that at most would likely be similar to the label data, but we
>> aren't
>> altering data structures (so won't change big-O magnitude of memory being
>> used),
>> we're adding fields to existing data structures that exist and string
>> interning
>> should actually make it much less onerous since there is a large
>> duplicative
>> effect with HELP among time series.
>>
>> For (2) - now we have basically TYPE, HELP and UNIT all available for
>> transmission if we wanted to send it with every single datapoint. While I
>> think
>> we should definitely examine HPACK like compression features as you
>> mentioned
>> Björn, I think we should think more about separating that kind of work
>> into a
>> Milestone 2 where this is considered.
>>
>
>
>
>> For the time being it's very plausible
>> we could do some negotiation of the receiving Remote Write endpoint by
>> sending
>> a "GET" to the remote write endpoint and seeing if it responds with a
>> "capabilities + preferences" response, and if the endpoint specifies that
>> it
>> would like to receive metadata all the time on every single request and
>> let
>> Snappy take care of keeping size not ballooning too much, or if it would
>> like
>> TYPE on every single datapoint, and HELP and UNIT every DESIRED_SECONDS
>> or so.
>> To enable a "send HELP every 10 minutes" feature we would have to add to
>> the
>> datastructure that holds the LABELS, TYPE, HELP and UNIT for each series
>> a
>> "last sent" timestamp to know when to resend to that backend, but that
>> seems
>> entirely plausible and would not use more than 4 extra bytes.
>>
>
> Negotiation is fundamentally stateful, as the process that receives the
> first request may be a very different one from the one that receives the
> second - such as if an upgrade is in progress. Remote write is intended to
> be a very simple thing that's easy to implement on the receiver end and is
> a send-only request-based protocol, so request-time negotiation is
> basically out. Any negotiation needs to happen via the config file, and
> even then it'd be better if

Re: [prometheus-developers] Re: Remote Write Metadata propagation

2020-08-07 Thread Brian Brazil
On Thu, 6 Aug 2020 at 22:58, Rob Skillington  wrote:

> Hey Björn,
>
>
> Thanks for the detailed response. I've had a few back and forths on this
> with
> Brian and Chris over IRC and CNCF Slack now too.
>
> I agree that fundamentally it seems naive to idealistically model this
> around
> per metric name. It needs to be per series given what may happen w.r.t.
> collision across targets, etc.
>
> Perhaps we can separate these discussions apart into two considerations:
>
> 1) Modeling of the data such that it is kept around for transmission
> (primarily
> we're focused on WAL here).
>
> 2) Transmission (and of which you allude to has many areas for
> improvement).
>
> For (1) - it seems like this needs to be done per time series, thankfully
> we
> actually already have modeled this to be stored per series data just once
> in a
> single WAL file. I will write up my proposal here, but it will surmount to
> essentially encoding the HELP, UNIT and TYPE to the WAL per series similar
> to
> how labels for a series are encoded once per series in the WAL. Since this
> optimization is in place, there's already a huge dampening effect on how
> expensive it is to write out data about a series (e.g. labels). We can
> always
> go and collect a sample WAL file and measure how much extra size
> with/without
> HELP, UNIT and TYPE this would add, but it seems like it won't
> fundamentally
> change the order of magnitude in terms of "information about a timeseries
> storage size" vs "datapoints about a timeseries storage size". One extra
> change
> would be re-encoding the series into the WAL if the HELP changed for that
> series, just so that when HELP does change it can be up to date from the
> view
> of whoever is reading the WAL (i.e. the Remote Write loop). Since this
> entry
> needs to be loaded into memory for Remote Write today anyway, with string
> interning as suggested by Chris, it won't change the memory profile
> algorithmically of a Prometheus with Remote Write enabled. There will be
> some
> overhead that at most would likely be similar to the label data, but we
> aren't
> altering data structures (so won't change big-O magnitude of memory being
> used),
> we're adding fields to existing data structures that exist and string
> interning
> should actually make it much less onerous since there is a large
> duplicative
> effect with HELP among time series.
>
> For (2) - now we have basically TYPE, HELP and UNIT all available for
> transmission if we wanted to send it with every single datapoint. While I
> think
> we should definitely examine HPACK like compression features as you
> mentioned
> Björn, I think we should think more about separating that kind of work
> into a
> Milestone 2 where this is considered.
>



> For the time being it's very plausible
> we could do some negotiation of the receiving Remote Write endpoint by
> sending
> a "GET" to the remote write endpoint and seeing if it responds with a
> "capabilities + preferences" response, and if the endpoint specifies that
> it
> would like to receive metadata all the time on every single request and
> let
> Snappy take care of keeping size not ballooning too much, or if it would
> like
> TYPE on every single datapoint, and HELP and UNIT every DESIRED_SECONDS or
> so.
> To enable a "send HELP every 10 minutes" feature we would have to add to
> the
> datastructure that holds the LABELS, TYPE, HELP and UNIT for each series a
> "last sent" timestamp to know when to resend to that backend, but that
> seems
> entirely plausible and would not use more than 4 extra bytes.
>

Negotiation is fundamentally stateful, as the process that receives the
first request may be a very different one from the one that receives the
second - such as if an upgrade is in progress. Remote write is intended to
be a very simple thing that's easy to implement on the receiver end and is
a send-only request-based protocol, so request-time negotiation is
basically out. Any negotiation needs to happen via the config file, and
even then it'd be better if nothing ever needed to be configured. Getting
all the users of a remote write to change their config file or restart all
their Prometheus servers is not an easy task after all.

Brian


>
> These thoughts are based on the discussion I've had and the thoughts on
> this
> thread. What's the feedback on this before I go ahead and re-iterate the
> design
> to more closely map to what I'm suggesting here?
>
> Best,
> Rob
>
> On Thu, Aug 6, 2020 at 2:01 PM Bjoern Rabenstein 
> wrote:
>
>> On 03.08.20 03:04, Rob Skillington wrote:
>> > Ok - I have a proposal which could be broken up into two pieces, first
>> > delivering TYPE per datapoint, the second consistently and reliably
>> HELP and
>> > UNIT once per unique metric name:
>> >
>> https://docs.google.com/document/d/1LY8Im8UyIBn8e3LJ2jB-MoajXkfAqW2eKzY735aYxqo
>> > /edit#heading=h.bik9uwphqy3g
>>
>> Thanks for the doc. I have commented on it, but while doing so, I felt
>> the urge to comment

Re: [prometheus-developers] Re: Remote Write Metadata propagation

2020-08-06 Thread Rob Skillington
Hey Callum,

Apologies missed your response as was typing back to Björn.

Look forward to seeing your document, sounds good. As I mentioned in my
previous email, I think there's definitely a "further work" area here. I'd like
to get at least TYPE (and, if it's not too difficult, HELP and UNIT too)
flowing sooner than that timeline, however, and we have folks ready to
contribute to work in this space right now.

Would love to hear your thoughts on my latest proposal as sent with the
last
email.

Best,
Rob

On Thu, Aug 6, 2020 at 5:58 PM Rob Skillington  wrote:

> Hey Björn,
>
>
> Thanks for the detailed response. I've had a few back and forths on this
> with
> Brian and Chris over IRC and CNCF Slack now too.
>
> I agree that fundamentally it seems naive to idealistically model this
> around
> per metric name. It needs to be per series given what may happen w.r.t.
> collision across targets, etc.
>
> Perhaps we can separate these discussions apart into two considerations:
>
> 1) Modeling of the data such that it is kept around for transmission
> (primarily
> we're focused on WAL here).
>
> 2) Transmission (and of which you allude to has many areas for
> improvement).
>
> For (1) - it seems like this needs to be done per time series, thankfully
> we
> actually already have modeled this to be stored per series data just once
> in a
> single WAL file. I will write up my proposal here, but it will surmount to
> essentially encoding the HELP, UNIT and TYPE to the WAL per series similar
> to
> how labels for a series are encoded once per series in the WAL. Since this
> optimization is in place, there's already a huge dampening effect on how
> expensive it is to write out data about a series (e.g. labels). We can
> always
> go and collect a sample WAL file and measure how much extra size
> with/without
> HELP, UNIT and TYPE this would add, but it seems like it won't
> fundamentally
> change the order of magnitude in terms of "information about a timeseries
> storage size" vs "datapoints about a timeseries storage size". One extra
> change
> would be re-encoding the series into the WAL if the HELP changed for that
> series, just so that when HELP does change it can be up to date from the
> view
> of whoever is reading the WAL (i.e. the Remote Write loop). Since this
> entry
> needs to be loaded into memory for Remote Write today anyway, with string
> interning as suggested by Chris, it won't change the memory profile
> algorithmically of a Prometheus with Remote Write enabled. There will be
> some
> overhead that at most would likely be similar to the label data, but we
> aren't
> altering data structures (so won't change big-O magnitude of memory being
> used),
> we're adding fields to existing data structures that exist and string
> interning
> should actually make it much less onerous since there is a large
> duplicative
> effect with HELP among time series.
>
> For (2) - now we have basically TYPE, HELP and UNIT all available for
> transmission if we wanted to send it with every single datapoint. While I
> think
> we should definitely examine HPACK like compression features as you
> mentioned
> Björn, I think we should think more about separating that kind of work
> into a
> Milestone 2 where this is considered. For the time being it's very
> plausible
> we could do some negotiation of the receiving Remote Write endpoint by
> sending
> a "GET" to the remote write endpoint and seeing if it responds with a
> "capabilities + preferences" response, and if the endpoint specifies that
> it
> would like to receive metadata all the time on every single request and
> let
> Snappy take care of keeping size not ballooning too much, or if it would
> like
> TYPE on every single datapoint, and HELP and UNIT every DESIRED_SECONDS or
> so.
> To enable a "send HELP every 10 minutes" feature we would have to add to
> the
> datastructure that holds the LABELS, TYPE, HELP and UNIT for each series a
> "last sent" timestamp to know when to resend to that backend, but that
> seems
> entirely plausible and would not use more than 4 extra bytes.
>
> These thoughts are based on the discussion I've had and the thoughts on
> this
> thread. What's the feedback on this before I go ahead and re-iterate the
> design
> to more closely map to what I'm suggesting here?
>
> Best,
> Rob
>
> On Thu, Aug 6, 2020 at 2:01 PM Bjoern Rabenstein 
> wrote:
>
>> On 03.08.20 03:04, Rob Skillington wrote:
>> > Ok - I have a proposal which could be broken up into two pieces, first
>> > delivering TYPE per datapoint, the second consistently and reliably
>> HELP and
>> > UNIT once per unique metric name:
>> >
>> https://docs.google.com/document/d/1LY8Im8UyIBn8e3LJ2jB-MoajXkfAqW2eKzY735aYxqo
>> > /edit#heading=h.bik9uwphqy3g
>>
>> Thanks for the doc. I have commented on it, but while doing so, I felt
>> the urge to comment more generally, which would not fit well into the
>> margin of a Google doc. My thoughts are also a bit out of scope of
>> Rob's design doc 

Re: [prometheus-developers] Re: Remote Write Metadata propagation

2020-08-06 Thread Rob Skillington
Hey Björn,


Thanks for the detailed response. I've had a few back and forths on this
with
Brian and Chris over IRC and CNCF Slack now too.

I agree that fundamentally it seems naive to idealistically model this
around
per metric name. It needs to be per series given what may happen w.r.t.
collision across targets, etc.

Perhaps we can separate this discussion into two considerations:

1) Modeling of the data such that it is kept around for transmission
(primarily
we're focused on WAL here).

2) Transmission (which, as you allude to, has many areas for improvement).

For (1) - it seems like this needs to be done per time series, thankfully
we
actually already have modeled this to be stored per series data just once
in a
single WAL file. I will write up my proposal here, but it will amount to
essentially encoding the HELP, UNIT and TYPE to the WAL per series similar
to
how labels for a series are encoded once per series in the WAL. Since this
optimization is in place, there's already a huge dampening effect on how
expensive it is to write out data about a series (e.g. labels). We can
always
go and collect a sample WAL file and measure how much extra size
with/without
HELP, UNIT and TYPE this would add, but it seems like it won't
fundamentally
change the order of magnitude in terms of "information about a timeseries
storage size" vs "datapoints about a timeseries storage size". One extra
change
would be re-encoding the series into the WAL if the HELP changed for that
series, just so that when HELP does change it can be up to date from the
view
of whoever is reading the WAL (i.e. the Remote Write loop). Since this
entry
needs to be loaded into memory for Remote Write today anyway, with string
interning as suggested by Chris, it won't change the memory profile
algorithmically of a Prometheus with Remote Write enabled. There will be
some
overhead that at most would likely be similar to the label data, but we
aren't
altering data structures (so won't change big-O magnitude of memory being
used),
we're adding fields to existing data structures, and string
interning
should actually make it much less onerous since there is a large duplicative
effect with HELP among time series.
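
A minimal sketch of the string-interning idea referenced above (an assumed
shape, not the actual implementation): because HELP text is identical across
all series of a metric family, interning keeps one copy in memory no matter how
many series point at it.

```go
package intern

import "sync"

// Pool deduplicates strings so that many series holding the same HELP text
// share a single backing copy. Simplified: a real pool would also
// reference-count entries so they can be dropped when series go away.
type Pool struct {
	mu      sync.Mutex
	strings map[string]string
}

func NewPool() *Pool {
	return &Pool{strings: make(map[string]string)}
}

// Intern returns the canonical copy of s, storing it on first use.
func (p *Pool) Intern(s string) string {
	p.mu.Lock()
	defer p.mu.Unlock()
	if canonical, ok := p.strings[s]; ok {
		return canonical
	}
	p.strings[s] = s
	return s
}
```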

For (2) - now we have basically TYPE, HELP and UNIT all available for
transmission if we wanted to send it with every single datapoint. While I
think
we should definitely examine HPACK like compression features as you
mentioned
Björn, I think we should think more about separating that kind of work into
a
Milestone 2 where this is considered. For the time being it's very
plausible
we could do some negotiation with the receiving Remote Write endpoint by
sending
a "GET" to the remote write endpoint and seeing if it responds with a
"capabilities + preferences" response, and if the endpoint specifies that
it
would like to receive metadata all the time on every single request and let
Snappy take care of keeping the size from ballooning too much, or if it would
like
TYPE on every single datapoint, and HELP and UNIT every DESIRED_SECONDS or
so.
To enable a "send HELP every 10 minutes" feature we would have to add to
the
datastructure that holds the LABELS, TYPE, HELP and UNIT for each series a
"last sent" timestamp to know when to resend to that backend, but that
seems
entirely plausible and would not use more than 4 extra bytes.
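
A sketch of the "last sent" bookkeeping described above, under the assumption
that the queue manager holds a small per-series struct (all names here are
illustrative):

```go
package remote

import "time"

// seriesMetadata is a hypothetical per-series entry: the metadata itself plus
// when HELP/UNIT were last sent to this remote end.
type seriesMetadata struct {
	Type     string
	Unit     string
	Help     string
	lastSent time.Time
}

// shouldResend reports whether HELP and UNIT should be attached to the next
// request for this series, e.g. with resendInterval = 10 * time.Minute.
func (m *seriesMetadata) shouldResend(now time.Time, resendInterval time.Duration) bool {
	if now.Sub(m.lastSent) < resendInterval {
		return false
	}
	m.lastSent = now
	return true
}
```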

These thoughts are based on the discussion I've had and the thoughts on
this
thread. What's the feedback on this before I go ahead and re-iterate the
design
to more closely map to what I'm suggesting here?

Best,
Rob

On Thu, Aug 6, 2020 at 2:01 PM Bjoern Rabenstein  wrote:

> On 03.08.20 03:04, Rob Skillington wrote:
> > Ok - I have a proposal which could be broken up into two pieces, first
> > delivering TYPE per datapoint, the second consistently and reliably HELP
> and
> > UNIT once per unique metric name:
> >
> https://docs.google.com/document/d/1LY8Im8UyIBn8e3LJ2jB-MoajXkfAqW2eKzY735aYxqo
> > /edit#heading=h.bik9uwphqy3g
>
> Thanks for the doc. I have commented on it, but while doing so, I felt
> the urge to comment more generally, which would not fit well into the
> margin of a Google doc. My thoughts are also a bit out of scope of
> Rob's design doc and more about the general topic of remote write and
> the equally general topic of metadata (about which we have an ongoing
> discussion among the Prometheus developers).
>
> Disclaimer: I don't know the remote-write protocol very well. My hope
> here is that my somewhat distant perspective is of some value as it
> allows to take a step back. However, I might just miss crucial details
> that completely invalidate my thoughts. We'll see...
>
> I do care a lot about metadata, though. (And ironically, the reason
> why I declared remote write "somebody else's problem" is that I've
> always disliked how it fundamentally ignores metadata.)
>
> Rob's document embraces the fact that metadata can change over time,
> but it assumes that at any given time, there i

Re: [prometheus-developers] Re: Remote Write Metadata propagation

2020-08-06 Thread Callum Styan
Thanks Rob for putting this proposal together, I think it highlights some
features of what we want metadata RW and remote write in general to look
like in the future. As others have pointed out (thanks Björn for giving
such a detailed description) there are issues with the way Prometheus
currently handles metadata that need to be thought about and handled
differently when storing metadata in the WAL or in long term storage. I
didn't make many more comments as most of what I wanted to say had already
been mentioned by others.

 As part of thinking about how to get metadata and exemplars into remote
write, some of us have been discussing what we've been calling 'the future
of remote write'. While there's nothing formal yet, I will be starting a
brainstorming/design doc soon and would appreciate your input there, Rob.



Re: [prometheus-developers] Re: Remote Write Metadata propagation

2020-08-06 Thread Bjoern Rabenstein
On 03.08.20 03:04, Rob Skillington wrote:
> Ok - I have a proposal which could be broken up into two pieces, first
> delivering TYPE per datapoint, the second consistently and reliably HELP and
> UNIT once per unique metric name:
> https://docs.google.com/document/d/1LY8Im8UyIBn8e3LJ2jB-MoajXkfAqW2eKzY735aYxqo
> /edit#heading=h.bik9uwphqy3g

Thanks for the doc. I have commented on it, but while doing so, I felt
the urge to comment more generally, which would not fit well into the
margin of a Google doc. My thoughts are also a bit out of scope of
Rob's design doc and more about the general topic of remote write and
the equally general topic of metadata (about which we have an ongoing
discussion among the Prometheus developers).

Disclaimer: I don't know the remote-write protocol very well. My hope
here is that my somewhat distant perspective is of some value as it
allows to take a step back. However, I might just miss crucial details
that completely invalidate my thoughts. We'll see...

I do care a lot about metadata, though. (And ironically, the reason
why I declared remote write "somebody else's problem" is that I've
always disliked how it fundamentally ignores metadata.)

Rob's document embraces the fact that metadata can change over time,
but it assumes that at any given time, there is only one set of
metadata per unique metric name. It takes into account that there can
be drift, but it considers them an irregularity that will only happen
occasionally and iron out over time.

In practice, however, metadata can be legitimately and deliberately
different for different time series of the same name. Instrumentation
libraries and even the exposition format inherently require one set of
metadata per metric name, but this is all only enforced (and meant to
be enforced) _per target_. Once the samples are ingested (or even sent
onwards via remote write), they have no notion of what target they
came from. Furthermore, samples created by rule evaluation don't have
an originating target in the first place. (Which raises the question
of metadata for recording rules, which is another can of worms I'd
like to open eventually...)

(There is also the technical difficulty that the WAL has no notion of
bundling or referencing all the series with the same metric name. That
was commented about in the doc but is not my focus here.)

Rob's doc sees TYPE as special because it is so cheap to just add to
every data point. That's correct, but it's giving me an itch: Should
we really create different ways of handling metadata, depending on its
expected size?

Compare this with labels. There is no upper limit to their number or
size. Still, we have no plan of treating "large" labels differently
from "short" labels.

On top of that, we have by now gained the insight that metadata is
changing over time and essentially has to be tracked per series.

Or in other words: From a pure storage perspective, metadata behaves
exactly the same as labels! (There are certainly huge differences
semantically, but those only manifest themselves on the query level,
i.e. how you treat it in PromQL etc.)
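
To make the structural point concrete, this is what a series would look like if
metadata were carried the same way as labels (an illustration only, not a
proposal; the __type__/__unit__/__help__ names are invented for this sketch):

```go
package example

// Storage-wise, metadata would just be more key/value pairs attached to the
// series, exactly like labels; the differences are semantic and only show up
// at query time.
var series = map[string]string{
	"__name__": "http_request_duration_seconds_bucket",
	"job":      "api",
	"le":       "0.5",
	"__type__": "histogram",
	"__unit__": "seconds",
	"__help__": "Latency of HTTP requests.",
}
```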

(This is not exactly a new insight. This is more or less what I said
during the 2016 dev summit, when we first discussed remote write. But
I don't want to dwell on "told you so" moments... :o)

There is a good reason why we don't just add metadata as "pseudo
labels": As discussed a lot in the various design docs including Rob's
one, it would blow up the data size significantly because HELP strings
tend to be relatively long.

And that's the point where I would like to take a step back: We are
discussing to essentially treat something that is structurally the
same thing in three different ways: Way 1 for labels as we know
them. Way 2 for "small" metadata. Way 3 for "big" metadata.

However, while labels tend to be shorter than HELP strings, there is
the occasional use case with long or many labels. (Infamously, at
SoundCloud, a binary accidentally put a whole HTML page into a
label. That wasn't a use case, it was a bug, but the Prometheus server
ingesting that was just chugging along as if nothing special had
happened. It looked weird in the expression browser, though...) I'm
sure any vendor offering Prometheus remote storage as a service will
have a customer or two that use excessively long label names. If we
have to deal with that, why not bite the bullet and treat metadata in
the same way as labels in general? Or to phrase it in another way: Any
solution for "big" metadata could be used for labels, too, to
alleviate the pain with excessively long label names.

Or most succinctly: A robust and really good solution for
"big" metadata in remote write will make remote write much more
efficient if applied to labels, too.

Imagine an NALSD tech interview question that boils down to "design
Prometheus remote write". I bet that most of the better candidates
will recognize that most of the payload will consist of series
identifiers (call them labels or whatever

Re: [prometheus-developers] Re: Remote Write Metadata propagation

2020-08-03 Thread Rob Skillington
Ok - I have a proposal which could be broken up into two pieces, first
delivering TYPE per datapoint, the second consistently and reliably HELP
and UNIT once per unique metric name:
https://docs.google.com/document/d/1LY8Im8UyIBn8e3LJ2jB-MoajXkfAqW2eKzY735aYxqo/edit#heading=h.bik9uwphqy3g

Would love to get some feedback on it. Thanks for the consideration. Is
there anyone in particular I should reach out to ask for feedback from
directly?

Best,
Rob


On Tue, Jul 21, 2020 at 5:55 PM Rob Skillington  wrote:

> Also want to point out that with just TYPE you can do things such as know
> it's a histogram type and then suggest using "sum(rate(...)) by (le)" with
> a one click button in a UI which again is significantly harder without that
> information.
>
> The reason it becomes important though is some systems (i.e. StackDriver)
> require this schema/metric information the first time you record a sample.
> So you really want the very basics of it the first time you receive that
> sample (i.e. at least TYPE):
>
> Defines a metric type and its schema. Once a metric descriptor is created,
>> deleting or altering it stops data collection and makes the metric type's
>> existing data unusable.
>> The following are specific rules for service defined Monitoring metric
>> descriptors:
>> type, metricKind, valueType and description fields are all required. The
>> unit field must be specified if the valueType is any of DOUBLE, INT64,
>> DISTRIBUTION.
>> Maximum of default 500 metric descriptors per service is allowed.
>> Maximum of default 10 labels per metric descriptor is allowed.
>
>
> https://cloud.google.com/monitoring/api/ref_v3/rest/v3/projects.metricDescriptors
>
> Just an example, but other systems and definitely systems that want to do
> processing of metrics on the way in would prefer at very least things like
> TYPE and maybe ideally UNIT too are specified.
>
>
> On Tue, Jul 21, 2020 at 5:49 PM Rob Skillington 
> wrote:
>
>> Hey Chris,
>>
>> Apologies on the delay to your response.
>>
>> Yes I think that even just TYPE would be a great first step. I am working
>> on a very small one pager that outlines perhaps how we get from here to
>> that future you talk about.
>>
>> In terms of downstream processing, just having the TYPE on every single
>> sample would be a huge step forward as it enables the ability to do
>> stateless processing of the metric (i.e. downsampling and working out
>> whether counter resets need to be detected during downsampling of this
>> single individual sample).
>>
>> Also you can imagine this enables the ability to suggest certain
>> functions that can be applied, i.e. auto-suggest rate(...) should be
>> applied without needing to analyze or use best effort heuristics of the
>> actual values of a time series.
>>
>> Completely agreed that solving this for UNIT and HELP is more difficult
>> and that information would likely be much nicer to be sent/stored per
>> metric name rather than per time-series sample.
>>
>> I'll send out the Google doc for some comments shortly.
>>
>> Transactional approach is interesting, it could be difficult given that
>> this information can flap (i.e. start with some value for HELP/UNIT but a
>> different target of the same application has a different value) and hence
>> that means ordering is important and dealing with transactional order could
>> be a hard problem. I agree that making this deterministic if possible would
>> be great. Maybe it could be something like a token that is sent alongside
>> the first remote write payload, and if that continuation token that the
>> receiver sees means it missed some part of the stream then it can go and do
>> a full sync and from there on in receive updates/additions in a
>> transactional way from the stream over remote write. Just a random thought
>> though and requires more exploration / different solutions being listed to
>> weigh up pros/cons/complexity/etc.
>>
>> Best,
>> Rob
>>
>>
>>
>> On Thu, Jul 16, 2020 at 4:39 PM Chris Marchbanks 
>> wrote:
>>
>>> Hi Rob,
>>>
>>> I would also like metadata to become stateless, and view 6815
>>>  only as a first
>>> step, and the start of an output format. Currently, there is a work in
>>> progress design doc, and another topic for an upcoming dev summit, for
>>> allowing use cases where metadata needs to be in the same request as the
>>> samples.
>>>
>>> Generally, I (and some others I have talked to) don't want to send all
>>> the metadata with every sample as that is very repetitive, specifically for
>>> histograms and metrics with many series. Instead, I would like remote write
>>> requests to become transaction based, at which point all the metadata from
>>> that scrape/transaction can be added to the metadata field introduced
>>> to the proto in 6815
>>>  and then each
>>> sample can be linked to a metadata entry without as much duplication. That
>>> is very broad

Re: [prometheus-developers] Re: Remote Write Metadata propagation

2020-07-21 Thread Rob Skillington
Also want to point out that with just TYPE you can do things such as know
a metric is a histogram and then suggest using "sum(rate(...)) by (le)" with
a one-click button in a UI, which again is significantly harder without that
information.

The reason it becomes important though is that some systems (e.g. StackDriver)
require this schema/metric information the first time you record a sample.
So you really want the very basics of it the first time you receive that
sample (at least TYPE):

Defines a metric type and its schema. Once a metric descriptor is created,
> deleting or altering it stops data collection and makes the metric type's
> existing data unusable.
> The following are specific rules for service defined Monitoring metric
> descriptors:
> type, metricKind, valueType and description fields are all required. The
> unit field must be specified if the valueType is any of DOUBLE, INT64,
> DISTRIBUTION.
> Maximum of default 500 metric descriptors per service is allowed.
> Maximum of default 10 labels per metric descriptor is allowed.

https://cloud.google.com/monitoring/api/ref_v3/rest/v3/projects.metricDescriptors

Just an example, but other systems, and definitely systems that want to do
processing of metrics on the way in, would prefer that at the very least things
like TYPE, and ideally UNIT too, are specified.
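
To illustrate the kind of mapping a stateless forwarder could do once TYPE
arrives alongside the sample (a sketch only; the descriptor fields are
paraphrased from the docs quoted above, and this is not code from any existing
exporter):

```go
package export

import "fmt"

// descriptor holds the subset of fields a StackDriver-style backend requires
// when a metric is first seen.
type descriptor struct {
	MetricKind string // e.g. CUMULATIVE, GAUGE
	ValueType  string // e.g. DOUBLE, DISTRIBUTION
}

// descriptorFor maps a Prometheus metric TYPE onto a descriptor. With TYPE on
// every datapoint this can run per request, with no metadata cache. Summaries
// are simplified here; a real exporter would split them into several metrics.
func descriptorFor(promType string) (descriptor, error) {
	switch promType {
	case "counter":
		return descriptor{MetricKind: "CUMULATIVE", ValueType: "DOUBLE"}, nil
	case "gauge":
		return descriptor{MetricKind: "GAUGE", ValueType: "DOUBLE"}, nil
	case "histogram", "summary":
		return descriptor{MetricKind: "CUMULATIVE", ValueType: "DISTRIBUTION"}, nil
	default:
		return descriptor{}, fmt.Errorf("unknown or untyped metric type %q", promType)
	}
}
```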


On Tue, Jul 21, 2020 at 5:49 PM Rob Skillington  wrote:

> Hey Chris,
>
> Apologies on the delay to your response.
>
> Yes I think that even just TYPE would be a great first step. I am working
> on a very small one pager that outlines perhaps how we get from here to
> that future you talk about.
>
> In terms of downstream processing, just having the TYPE on every single
> sample would be a huge step forward as it enables the ability to do
> stateless processing of the metric (i.e. downsampling and working out
> whether counter resets need to be detected during downsampling of this
> single individual sample).
>
> Also you can imagine this enables the ability to suggest certain functions
> that can be applied, i.e. auto-suggest rate(...) should be applied without
> needing to analyze or use best effort heuristics of the actual values of a
> time series.
>
> Completely agreed that solving this for UNIT and HELP is more difficult
> and that information would likely be much nicer to be sent/stored per
> metric name rather than per time-series sample.
>
> I'll send out the Google doc for some comments shortly.
>
> Transactional approach is interesting, it could be difficult given that
> this information can flap (i.e. start with some value for HELP/UNIT but a
> different target of the same application has a different value) and hence
> that means ordering is important and dealing with transactional order could
> be a hard problem. I agree that making this deterministic if possible would
> be great. Maybe it could be something like a token that is sent alongside
> the first remote write payload, and if that continuation token that the
> receiver sees means it missed some part of the stream then it can go and do
> a full sync and from there on in receive updates/additions in a
> transactional way from the stream over remote write. Just a random thought
> though and requires more exploration / different solutions being listed to
> weigh up pros/cons/complexity/etc.
>
> Best,
> Rob
>
>
>
> On Thu, Jul 16, 2020 at 4:39 PM Chris Marchbanks 
> wrote:
>
>> Hi Rob,
>>
>> I would also like metadata to become stateless, and view 6815
>>  only as a first
>> step, and the start of an output format. Currently, there is a work in
>> progress design doc, and another topic for an upcoming dev summit, for
>> allowing use cases where metadata needs to be in the same request as the
>> samples.
>>
>> Generally, I (and some others I have talked to) don't want to send all
>> the metadata with every sample as that is very repetitive, specifically for
>> histograms and metrics with many series. Instead, I would like remote write
>> requests to become transaction based, at which point all the metadata from
>> that scrape/transaction can be added to the metadata field introduced to
>> the proto in 6815 
>> and then each sample can be linked to a metadata entry without as much
>> duplication. That is very broad strokes, and I am sure it will be refined
>> or changed completely with more usage.
>>
>> That said, TYPE and UNIT are much smaller than metric name and help text,
>> and I would support adding those to a linked metadata entry before remote
>> write becomes transactional. Would that satisfy your use cases?
>>
>> Chris
>>
>> On Thu, Jul 16, 2020 at 1:43 PM Rob Skillington 
>> wrote:
>>
>>> Typo: "community request" should be: "community contribution that
>>> duplicates some of PR 6815"
>>>
>>> On Thu, Jul 16, 2020 at 3:27 PM Rob Skillington 
>>> wrote:
>>>
 Firstly: Thanks a lot for sharing the dev summit notes, they 

Re: [prometheus-developers] Re: Remote Write Metadata propagation

2020-07-21 Thread Rob Skillington
Hey Chris,

Apologies on the delay to your response.

Yes I think that even just TYPE would be a great first step. I am working
on a very small one pager that outlines perhaps how we get from here to
that future you talk about.

In terms of downstream processing, just having the TYPE on every single
sample would be a huge step forward as it enables stateless processing of the
metric (e.g. downsampling and working out whether counter resets need to be
detected while downsampling this single individual sample).
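
A sketch of why TYPE is what makes this stateless (simplified, assumed function
shape): knowing the series is a counter lets a downsampler fold resets into a
monotonic increase over the window, which it could not safely do for a gauge.

```go
package downsample

// counterIncrease computes the total increase of a counter over a window of
// samples, handling resets (the value dropping) the way rate()/increase() do.
// For a gauge the same drop would simply be a lower value, not a reset, which
// is why the TYPE has to be known.
func counterIncrease(values []float64) float64 {
	if len(values) == 0 {
		return 0
	}
	var increase float64
	prev := values[0]
	for _, v := range values[1:] {
		if v < prev {
			// Counter reset: assume it restarted from zero.
			increase += v
		} else {
			increase += v - prev
		}
		prev = v
	}
	return increase
}
```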

Also you can imagine this enables suggesting certain functions that can be
applied, e.g. auto-suggesting that rate(...) be applied without needing to
analyze, or use best-effort heuristics on, the actual values of a time series.

Completely agreed that solving this for UNIT and HELP is more difficult and
that information would likely be much nicer to be sent/stored per metric
name rather than per time-series sample.

I'll send out the Google doc for some comments shortly.

The transactional approach is interesting; it could be difficult given that
this information can flap (i.e. start with some value for HELP/UNIT while a
different target of the same application has a different value), which means
ordering is important and dealing with transactional order could be a hard
problem. I agree that making this deterministic if possible would be great.
Maybe it could be something like a token that is sent alongside the first
remote write payload, and if the continuation token the receiver sees indicates
it missed some part of the stream, it can do a full sync and from then on
receive updates/additions transactionally over remote write. Just a random
thought though, and it requires more exploration / different solutions being
listed to weigh up pros/cons/complexity/etc.
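
Only to make that thought a little more concrete (every name and detail here is
hypothetical): the sender could number its metadata updates, and a receiver
that notices a gap falls back to a full sync.

```go
package remote

// metadataStream is a hypothetical sketch of the continuation-token idea: the
// sender numbers metadata updates, the receiver tracks the last token it
// applied, and any gap means it missed part of the stream.
type metadataStream struct {
	lastApplied uint64
}

// Apply returns false when the incoming token is not the immediate successor
// of the last applied one, signalling that the receiver should do a full
// metadata sync before resuming incremental updates.
func (s *metadataStream) Apply(token uint64) bool {
	if token != s.lastApplied+1 {
		return false
	}
	s.lastApplied = token
	return true
}
```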

Best,
Rob



On Thu, Jul 16, 2020 at 4:39 PM Chris Marchbanks 
wrote:

> Hi Rob,
>
> I would also like metadata to become stateless, and view 6815
>  only as a first
> step, and the start of an output format. Currently, there is a work in
> progress design doc, and another topic for an upcoming dev summit, for
> allowing use cases where metadata needs to be in the same request as the
> samples.
>
> Generally, I (and some others I have talked to) don't want to send all the
> metadata with every sample as that is very repetitive, specifically for
> histograms and metrics with many series. Instead, I would like remote write
> requests to become transaction based, at which point all the metadata from
> that scrape/transaction can be added to the metadata field introduced to
> the proto in 6815 
> and then each sample can be linked to a metadata entry without as much
> duplication. That is very broad strokes, and I am sure it will be refined
> or changed completely with more usage.
>
> That said, TYPE and UNIT are much smaller than metric name and help text,
> and I would support adding those to a linked metadata entry before remote
> write becomes transactional. Would that satisfy your use cases?
>
> Chris
>
> On Thu, Jul 16, 2020 at 1:43 PM Rob Skillington 
> wrote:
>
>> Typo: "community request" should be: "community contribution that
>> duplicates some of PR 6815"
>>
>> On Thu, Jul 16, 2020 at 3:27 PM Rob Skillington 
>> wrote:
>>
>>> Firstly: Thanks a lot for sharing the dev summit notes, they are greatly
>>> appreciated. Also thank you for a great PromCon!
>>>
>>> In regards to prometheus remote write metadata propagation consensus, is
>>> there any plans/projects/collaborations that can be done to perhaps plan
>>> work on a protocol that might help others in the ecosystem offer the same
>>> benefits to Prometheus ecosystem projects that operate on a per write
>>> request basis (i.e. stateless processing of a write request)?
>>>
>>> I understand https://github.com/prometheus/prometheus/pull/6815 unblocks
>>> feature development on top of Prometheus for users with specific
>>> architectures, however it is a non-starter for a lot of other projects,
>>> especially for third party exporters to systems that are unowned by end
>>> users (i.e. writing a StackDriver remote write endpoint that targeted
>>> StackDriver, the community is unable to change the implementation of
>>> StackDriver itself to cache/statefully make metrics metadata available at
>>> ingestion time to StackDriver).
>>>
>>> Obviously I have a vested interest since as a remote write target, M3
>>> has several stateless components before TSDB ingestion and flowing the
>>> entire metadata to a distributed set of DB nodes that own a different set
>>> of the metrics space from each other node this has implications on M3
>>> itself of course too (i.e. it is non-trivial to map metric name -> DB node
>>> without some messy stateful cache sitting somewhere in the architecture
>>> which adds operational burdens to end users).

Re: [prometheus-developers] Re: Remote Write Metadata propagation

2020-07-16 Thread Chris Marchbanks
Hi Rob,

I would also like metadata to become stateless, and view 6815
 only as a first step,
and the start of an output format. Currently, there is a work in progress
design doc, and another topic for an upcoming dev summit, for allowing use
cases where metadata needs to be in the same request as the samples.

Generally, I (and some others I have talked to) don't want to send all the
metadata with every sample as that is very repetitive, specifically for
histograms and metrics with many series. Instead, I would like remote write
requests to become transaction based, at which point all the metadata from
that scrape/transaction can be added to the metadata field introduced to
the proto in 6815  and
then each sample can be linked to a metadata entry without as much
duplication. That is very broad strokes, and I am sure it will be refined
or changed completely with more usage.
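
Very roughly, the shape being described might look like the following (a sketch
only; the actual layout is whatever 6815 and the follow-up design settle on),
so that a histogram with hundreds of series carries its HELP text once per
request:

```go
package remote

// Hypothetical request shape: metadata is sent once per scrape/transaction and
// each series references an entry by index instead of repeating it.
type WriteRequest struct {
	Metadata   []Metadata
	Timeseries []TimeSeries
}

type Metadata struct {
	MetricFamily string
	Type         string
	Unit         string
	Help         string
}

type TimeSeries struct {
	Labels      map[string]string
	Samples     []Sample
	MetadataRef int // index into WriteRequest.Metadata; -1 if unknown
}

type Sample struct {
	Value     float64
	Timestamp int64 // milliseconds since epoch
}
```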

That said, TYPE and UNIT are much smaller than metric name and help text,
and I would support adding those to a linked metadata entry before remote
write becomes transactional. Would that satisfy your use cases?

Chris

On Thu, Jul 16, 2020 at 1:43 PM Rob Skillington  wrote:

> Typo: "community request" should be: "community contribution that
> duplicates some of PR 6815"
>
> On Thu, Jul 16, 2020 at 3:27 PM Rob Skillington 
> wrote:
>
>> Firstly: Thanks a lot for sharing the dev summit notes, they are greatly
>> appreciated. Also thank you for a great PromCon!
>>
>> In regards to prometheus remote write metadata propagation consensus, is
>> there any plans/projects/collaborations that can be done to perhaps plan
>> work on a protocol that might help others in the ecosystem offer the same
>> benefits to Prometheus ecosystem projects that operate on a per write
>> request basis (i.e. stateless processing of a write request)?
>>
>> I understand https://github.com/prometheus/prometheus/pull/6815 unblocks
>> feature development on top of Prometheus for users with specific
>> architectures, however it is a non-starter for a lot of other projects,
>> especially for third party exporters to systems that are unowned by end
>> users (i.e. writing a StackDriver remote write endpoint that targeted
>> StackDriver, the community is unable to change the implementation of
>> StackDriver itself to cache/statefully make metrics metadata available at
>> ingestion time to StackDriver).
>>
>> Obviously I have a vested interest since as a remote write target, M3 has
>> several stateless components before TSDB ingestion and flowing the entire
>> metadata to a distributed set of DB nodes that own a different set of the
>> metrics space from each other node this has implications on M3 itself of
>> course too (i.e. it is non-trivial to map metric name -> DB node without
>> some messy stateful cache sitting somewhere in the architecture which adds
>> operational burdens to end users).
>>
>> I suppose what I'm asking is, are maintainers open to a community request
>> that duplicates some of
>> https://github.com/prometheus/prometheus/pull/6815 but sends just metric
>> TYPE and UNIT per datapoint (which would need to be captured by the WAL if
>> feature is enabled) to a backend so it can statefully be processed
>> correctly without needing a sync of a global set of metadata to a backend?
>>
>> And if not, what are the plans here and how can we collaborate to make
>> this data useful to other consumers in the Prometheus ecosystem.
>>
>> Best intentions,
>> Rob
>>
