Re: [DISCUSS] FLIP-179: Expose Standardized Operator Metrics

Chesnay Schepler Thu, 22 Jul 2021 00:46:17 -0700

The only histogram implementation available to use are those bydropwizard, and they do some lock-free synchronization stuff that so farwe wanted to keep out of hot paths (this applis to both reading andwriting); we have however never made benchmarks.But it is reasonable to assume that they are way more expensive than thealternatives (in the ideal case just being a getter).You'd pay that cost irrespective of whether a reporter is enabled ornot, which is another thing we so far wanted to prevent.Finally, histograms are problematic because they are 10x more expensiveon the metric backend (because they are effectively 10 metrics), andshould be used with extreme caution, in particular because we lack anyswitch to disable/enable metrics (and I think we are getting to a pointwhere the metric system becomes unusable for heavy users because ofthat, where another histogram isn't helping).


Overall, at this time I'm against using Histograms.

Furthermore, I believe that we should be able to derive a Histogram fromthat supplier if we later one decide differently. We'd just need to pollthe supplier more often.


On 22/07/2021 09:36, Arvid Heise wrote:

Hi all,

@Steven Wu <mailto:[email protected]>

    Regarding "lastFetchTime" latency metric, I found Gauge to be less
    informative as it only captures the last sampling value for each
    metric
    publish interval (e.g. 60s).
    * Can we make it a histogram? Histograms are more expensive though.
    * Timer [1, 2] is cheaper as it just tracks min, max, avg, count.
    but there
    is no such metric type in Flink
    * Summary metric type [3] (from Prometheus) would be nice too

I'd also think that a histogram is much more expressive but theoriginal FLIP-33 decided against it because of it's cost. @ChesnaySchepler <mailto:[email protected]> could you shed some light on howmuch more expensive it is in comparison to a simple gauge? Does itdepend on whether a reporter is actually using the metric?The current interface of this FLIP-179 would actually allow to switchthe type of the metric later. But since the metric type isuser-facing, we need to have an agreement now.


@Becket Qin <mailto:[email protected]>

    In that case, do we still need the metric here? It seems we are
    creating a
    "global variable" which users may potentially use. I am wondering
    how much
    additional convenience it provides because it seems easy for people to
    simply pass the fetch time by themselves if they have decided to
    not use
    SourceReaderBase. Also, it looks like we do not have an API
    pattern that
    lets users get the value of a metric and derive another metric. So
    I think
    it is easier for people to understand if LastFetchTimeGauge() is
    just an
    independent metric by itself, instead of being a part of the
    eventTimeFetchLag computation.

I'm not sure if I follow the global variable argument, could youelaborate? Are you referring specifically to the SettableGauge? How isthat different from a Counter or Meter?

With the current design, we could very well add a LastFetchTimemetric. The key point of the current abstraction is that a user getsthe much harder eventTimeFetchLag metric for free, since we alreadyneed to extract the event time for other metrics. I think the JavaDocmakes it clear what the intent of the LastFetchTimeGauge is and if notwe can improve it.Btw we have derived metrics already. For example, we have Meters forbyteIn/Out and recordIn/Out. That's already part of FLIP-33.


    Would it make sense to have a more generic metadata type <T>
    associated
    with the records batch? In some cases, it may be useful to allow
    the Source
    implementation to carry some additional information of the batch
    to the
    RecordEmitter. For example, the split info of the batch, the
    sender of the
    batch etc. Because the RecordEmitter only takes one record at.a time,
    currently such information needs to be put into each record, which may
    involve a lot of wrapper object creation.

I like the idea of having more general metadata and I follow theexample. I'm wondering if we could avoid a generic type (since thatadds a bit of complexity to the mental model and usage) by simplyencouraging to use a more specific MetaData subclass as a return typeof the method.

public interface RecordsWithSplitIds<E> {
     @Nullable default RecordMetadatagetMetadata() {
         return null; }
     ...
}
public interface RecordMetadata {
     long getLastFetchTime(); // mandatory? }
And using it as
public class KafkaRecordMetadataimplements RecordMetadata {}
private static class KafkaPartitionSplitRecords<T> implements 
RecordsWithSplitIds<T> {
     @Override public KafkaRecordMetadatagetMetadata() {
         return metadata; }
}

Or do we want to have the generic to explicitly pass it to theRecordEmitter? Would that metadata be a fourth parameter ofRecordEmitter#emitRecord?


    It might be slightly better if we let the method accept a Supplier
    in this
    case. However, it seems to introduce a parallel channel or a sidepath
    between the user implementation and SourceOutput. I am not sure if
    this is
    the right way to go. Would it be more intuitive if we just add a
    new method
    to the SourceOutput, to allow the FetchTime to be passed in
    explicitly?
    This would work well with the change I suggested above, which adds a
    generic metadata type <T> to the RecordsWithSplits and passes that
    to the
    RecordEmitter.emitRecord() as an argument.

We could do that. That would remove the gauge from the MetricGroup,right? The main downside is that sources that do not useSourceReaderBase cannot set the metric anymore. So I'd rather keep thecurrent way and extend it with the metadata extension.


Best,

Arvid

On Wed, Jul 21, 2021 at 1:38 PM Becket Qin <[email protected]<mailto:[email protected]>> wrote:


    Hey Chesnay,

    I think I got what that method was designed for now. Basically the
    motivation is to let the SourceOutput to report the
    eventTimeFetchLag for
    users. At this point, the SourceOutput only has the EventTime, so this
    method provides a way for the users to pass the FetchTime to the
    SourceOutput. This is essentially a context associated with each
    record
    emitted to the SourceOutput.

    It might be slightly better if we let the method accept a Supplier
    in this
    case. However, it seems to introduce a parallel channel or a sidepath
    between the user implementation and SourceOutput. I am not sure if
    this is
    the right way to go. Would it be more intuitive if we just add a
    new method
    to the SourceOutput, to allow the FetchTime to be passed in
    explicitly?
    This would work well with the change I suggested above, which adds a
    generic metadata type <T> to the RecordsWithSplits and passes that
    to the
    RecordEmitter.emitRecord() as an argument.

    What do you think?

    Thanks,

    Jiangjie (Becket) Qin

    On Tue, Jul 20, 2021 at 2:50 PM Chesnay Schepler
    <[email protected] <mailto:[email protected]>> wrote:

    > Would it be easier to understand if the method would accept a
    Supplier
    > instead?
    >
    > On 20/07/2021 05:36, Becket Qin wrote:
    > > In that case, do we still need the metric here? It seems we
    are creating
    > a
    > > "global variable" which users may potentially use. I am
    wondering how
    > much
    > > additional convenience it provides because it seems easy for
    people to
    > > simply pass the fetch time by themselves if they have decided
    to not use
    > > SourceReaderBase. Also, it looks like we do not have an API
    pattern that
    > > lets users get the value of a metric and derive another
    metric. So I
    > think
    > > it is easier for people to understand if LastFetchTimeGauge()
    is just an
    > > independent metric by itself, instead of being a part of the
    > > eventTimeFetchLag computation.
    >
    >
    >

Re: [DISCUSS] FLIP-179: Expose Standardized Operator Metrics

Reply via email to