The only histogram implementation available to use are those by
dropwizard, and they do some lock-free synchronization stuff that so far
we wanted to keep out of hot paths (this applis to both reading and
writing); we have however never made benchmarks.
But it is reasonable to assume that they are way more expensive than the
alternatives (in the ideal case just being a getter).
You'd pay that cost irrespective of whether a reporter is enabled or
not, which is another thing we so far wanted to prevent.
Finally, histograms are problematic because they are 10x more expensive
on the metric backend (because they are effectively 10 metrics), and
should be used with extreme caution, in particular because we lack any
switch to disable/enable metrics (and I think we are getting to a point
where the metric system becomes unusable for heavy users because of
that, where another histogram isn't helping).
Overall, at this time I'm against using Histograms.
Furthermore, I believe that we should be able to derive a Histogram from
that supplier if we later one decide differently. We'd just need to poll
the supplier more often.
On 22/07/2021 09:36, Arvid Heise wrote:
Hi all,
@Steven Wu <mailto:stevenz...@gmail.com>
Regarding "lastFetchTime" latency metric, I found Gauge to be less
informative as it only captures the last sampling value for each
metric
publish interval (e.g. 60s).
* Can we make it a histogram? Histograms are more expensive though.
* Timer [1, 2] is cheaper as it just tracks min, max, avg, count.
but there
is no such metric type in Flink
* Summary metric type [3] (from Prometheus) would be nice too
I'd also think that a histogram is much more expressive but the
original FLIP-33 decided against it because of it's cost. @Chesnay
Schepler <mailto:ches...@apache.org> could you shed some light on how
much more expensive it is in comparison to a simple gauge? Does it
depend on whether a reporter is actually using the metric?
The current interface of this FLIP-179 would actually allow to switch
the type of the metric later. But since the metric type is
user-facing, we need to have an agreement now.
@Becket Qin <mailto:becket....@gmail.com>
In that case, do we still need the metric here? It seems we are
creating a
"global variable" which users may potentially use. I am wondering
how much
additional convenience it provides because it seems easy for people to
simply pass the fetch time by themselves if they have decided to
not use
SourceReaderBase. Also, it looks like we do not have an API
pattern that
lets users get the value of a metric and derive another metric. So
I think
it is easier for people to understand if LastFetchTimeGauge() is
just an
independent metric by itself, instead of being a part of the
eventTimeFetchLag computation.
I'm not sure if I follow the global variable argument, could you
elaborate? Are you referring specifically to the SettableGauge? How is
that different from a Counter or Meter?
With the current design, we could very well add a LastFetchTime
metric. The key point of the current abstraction is that a user gets
the much harder eventTimeFetchLag metric for free, since we already
need to extract the event time for other metrics. I think the JavaDoc
makes it clear what the intent of the LastFetchTimeGauge is and if not
we can improve it.
Btw we have derived metrics already. For example, we have Meters for
byteIn/Out and recordIn/Out. That's already part of FLIP-33.
Would it make sense to have a more generic metadata type <T>
associated
with the records batch? In some cases, it may be useful to allow
the Source
implementation to carry some additional information of the batch
to the
RecordEmitter. For example, the split info of the batch, the
sender of the
batch etc. Because the RecordEmitter only takes one record at.a time,
currently such information needs to be put into each record, which may
involve a lot of wrapper object creation.
I like the idea of having more general metadata and I follow the
example. I'm wondering if we could avoid a generic type (since that
adds a bit of complexity to the mental model and usage) by simply
encouraging to use a more specific MetaData subclass as a return type
of the method.
public interface RecordsWithSplitIds<E> {
@Nullable default RecordMetadatagetMetadata() {
return null; }
...
}
public interface RecordMetadata {
long getLastFetchTime(); // mandatory? }
And using it as
public class KafkaRecordMetadataimplements RecordMetadata {}
private static class KafkaPartitionSplitRecords<T> implements
RecordsWithSplitIds<T> {
@Override public KafkaRecordMetadatagetMetadata() {
return metadata; }
}
Or do we want to have the generic to explicitly pass it to the
RecordEmitter? Would that metadata be a fourth parameter of
RecordEmitter#emitRecord?
It might be slightly better if we let the method accept a Supplier
in this
case. However, it seems to introduce a parallel channel or a sidepath
between the user implementation and SourceOutput. I am not sure if
this is
the right way to go. Would it be more intuitive if we just add a
new method
to the SourceOutput, to allow the FetchTime to be passed in
explicitly?
This would work well with the change I suggested above, which adds a
generic metadata type <T> to the RecordsWithSplits and passes that
to the
RecordEmitter.emitRecord() as an argument.
We could do that. That would remove the gauge from the MetricGroup,
right? The main downside is that sources that do not use
SourceReaderBase cannot set the metric anymore. So I'd rather keep the
current way and extend it with the metadata extension.
Best,
Arvid
On Wed, Jul 21, 2021 at 1:38 PM Becket Qin <becket....@gmail.com
<mailto:becket....@gmail.com>> wrote:
Hey Chesnay,
I think I got what that method was designed for now. Basically the
motivation is to let the SourceOutput to report the
eventTimeFetchLag for
users. At this point, the SourceOutput only has the EventTime, so this
method provides a way for the users to pass the FetchTime to the
SourceOutput. This is essentially a context associated with each
record
emitted to the SourceOutput.
It might be slightly better if we let the method accept a Supplier
in this
case. However, it seems to introduce a parallel channel or a sidepath
between the user implementation and SourceOutput. I am not sure if
this is
the right way to go. Would it be more intuitive if we just add a
new method
to the SourceOutput, to allow the FetchTime to be passed in
explicitly?
This would work well with the change I suggested above, which adds a
generic metadata type <T> to the RecordsWithSplits and passes that
to the
RecordEmitter.emitRecord() as an argument.
What do you think?
Thanks,
Jiangjie (Becket) Qin
On Tue, Jul 20, 2021 at 2:50 PM Chesnay Schepler
<ches...@apache.org <mailto:ches...@apache.org>> wrote:
> Would it be easier to understand if the method would accept a
Supplier
> instead?
>
> On 20/07/2021 05:36, Becket Qin wrote:
> > In that case, do we still need the metric here? It seems we
are creating
> a
> > "global variable" which users may potentially use. I am
wondering how
> much
> > additional convenience it provides because it seems easy for
people to
> > simply pass the fetch time by themselves if they have decided
to not use
> > SourceReaderBase. Also, it looks like we do not have an API
pattern that
> > lets users get the value of a metric and derive another
metric. So I
> think
> > it is easier for people to understand if LastFetchTimeGauge()
is just an
> > independent metric by itself, instead of being a part of the
> > eventTimeFetchLag computation.
>
>
>