Hi Martijn, No worries. Thanks for the discussion!
Suppose we agree to deprecate the existing SourceReaderBase public constructor, it means we plan to remove this construct (which could be backward incompatible) sooner or later. I think it makes sense to at least update all the externalized connectors listed under github.com/apache/flink-connector-* to use the new constructor, before we remove the existing constructor. Maybe Wencong can create a FLIP for this public API change and specify this plan "Compatibility, Deprecation, and Migration Plan" section? The changes are probably straightforward to make for each externalized connectors. I am happy to make the change if it is not made after 2 Flink release cycles. Cheers, Dong On Mon, Jan 9, 2023 at 4:27 PM Martijn Visser <[email protected]> wrote: > Hi Dong, > > I think the key part is indeed your sentence that "FLIP-33 intends to > standard the convention of metric name/semantic, but not the > implementation." > When I read FLIP-33, that's not how I interpreted it. So I've been reading > back in the FLIP-33 discussion thread, there were some more clarifications > in that thread that it is indeed only meant for standardizing metric names. > Should have done that earlier, sorry for that. > > As part of this proposal, does that also include updating the existing > connectors (given that a lot of them have been externalized) to not use the > to-be-deprecated SourceReaderBase constructor? > > Best regards, > > Martijn > > On Mon, Jan 9, 2023 at 1:39 AM Dong Lin <[email protected]> wrote: > > > Hi Martijn, > > > > By saying "Each connector defines their own metrics at the moment", what > > FLIP-33 intends to avoid is that connectA defines a metric named > > numRecordsIn and connectB defines a metric named numOfRecordsIn. In this > > case, these two metrics would have the same semantic meaning but > different > > name in the dashboard, which complicates the monitoring for users. > FLIP-33 > > intends to standard the convention of metric name/semantic, but not the > > implementation. > > > > It is mentioned in FLIP-33 that "A connector implementation does not have > > to report all the defined metrics. But if a connector reports a metric of > > the same semantic defined below, the implementation should follow the > > convention". > > > > What is proposed in this thread will only affect the connection > > implementation, i.e. SourceReaderBase would no longer increment > > numRecordsIn. But it would not change the name/semantic of numRecordsIn. > > > > Note that even if we don't change SourceReaderBase, a subclass of > > SourceReaderBase (or Source in general) can still update numRecordsIn in > a > > way that breaks the semantic of this metric mentioned in FLIP-33. There > is > > no way we can programmatically prevent that from happening. Developers of > > respective connectors need to be aware of the convention and enforce it > > properly. > > > > Regards, > > Dong > > > > > > > > On Mon, Jan 9, 2023 at 6:06 AM Martijn Visser <[email protected]> > > wrote: > > > > > Hi Dong, > > > > > > In the proposal from Wencong is stated "If source reader want to report > > to > > > metric by self, it can set source.reader.metric.num_records_in.override > > to > > > true, which disables the registration of numRecordsIn in > SourceReaderBase > > > and let the actual implementation to report the metric instead." > > > > > > While in FLIP-33 is stated "Each connector defines their own metrics at > > the > > > moment. This complicates operation and monitoring. Admittedly, > different > > > connectors may have different metrics, but some commonly used metrics > can > > > probably be standardized. This FLIP proposes a set of standard > connector > > > metrics that each connector should emit if applicable." > > > > > > With this proposal, the actual implementation can be disabled in > > > SourceReaderBase and the connector can define its own metric again. > With > > > how I read FLIP-33, the purpose was exactly to avoid that. So you can > > have > > > a connector that has its own definition for numRecordsIn, which it > > > submits/sends as numRecordsIn, which basically breaks it being a > default > > > metric. > > > > > > Best regards, > > > > > > Martijn > > > > > > On Sun, Jan 8, 2023 at 3:26 PM Dong Lin <[email protected]> wrote: > > > > > > > Hi Martijn, > > > > > > > > I think the change proposed in this thread would *not* break the idea > > in > > > > FLIP-33. FLIP-33 standardized metrics such as numRecordsIn, by > > specifying > > > > the name/type/semantics of those metrics so that all connectors can > > > follow. > > > > The name/type/semantics of numRecordsIn metric would be the same > before > > > and > > > > after the proposed change. > > > > > > > > Does this make sense? Or could you explain which part of FLIP-33 > would > > be > > > > broken by the proposed change? > > > > > > > > Regards, > > > > Dong > > > > > > > > > > > > On Sun, Jan 8, 2023 at 9:33 PM Martijn Visser < > > [email protected]> > > > > wrote: > > > > > > > > > Hi all, > > > > > > > > > > This feels like we purposely want to abandon the idea that was > > > introduced > > > > > with FLIP-33 on standardizing metrics [1]. From this proposal, I > > don't > > > > see > > > > > why we want to abandon that idea. Next to that, if you override > > > > > the numRecordsIn logic, you also touch on the other metrics that > rely > > > on > > > > > this value. > > > > > > > > > > Best regards, > > > > > > > > > > Martijn > > > > > > > > > > [1] > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-33%3A+Standardize+Connector+Metrics > > > > > > > > > > On Sun, Jan 8, 2023 at 12:43 PM Wencong Liu <[email protected]> > > > > wrote: > > > > > > > > > > > Thanks for the discussion, Dong and John > > > > > > > > > > > > > > > > > > Considering the extra cost of method calls, the approach of > adding > > an > > > > > > extra SourceReaderBase constructor > > > > > > should be a better choice. If there is no further discussion, I > > will > > > > > > follow this plan. > > > > > > > > > > > > > > > > > > Best, > > > > > > Wencong > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > At 2023-01-08 01:05:07, "John Roesler" <[email protected]> > > wrote: > > > > > > >Thanks, for the discussion, Dong. > > > > > > > > > > > > > >To answer your question: I was unclear if the desire was to > > > deprecate > > > > > the > > > > > > metric itself, to be replaced with some other metric, or whether > we > > > > just > > > > > > wanted to deprecate the way the superclass manages it. It’s clear > > now > > > > > that > > > > > > we want to keep the metric and only change the superclass logic. > > > > > > > > > > > > > >I think we’re on the same page now. > > > > > > > > > > > > > >Thanks! > > > > > > >-John > > > > > > > > > > > > > >On Sat, Jan 7, 2023, at 07:21, Dong Lin wrote: > > > > > > >> Hi John, > > > > > > >> > > > > > > >> Not sure if I understand the difference between "deprecate > that > > > > > metric" > > > > > > and > > > > > > >> "deprecate the private counter mechanism". I think what we > want > > is > > > > to > > > > > > >> update SourceReaderBase's implementation so that it no longer > > > > > explicitly > > > > > > >> increments this metric. But we still need to expose this > metric > > to > > > > > user. > > > > > > >> And methods such as RecordEmitter#emitRecord (which can be > > invoked > > > > by > > > > > > >> SourceReaderBase#pollNext(...)) can potentially increment this > > > > metric. > > > > > > >> > > > > > > >> I like your approach of adding an extra SourceReaderBase > > > > constructor. > > > > > > That > > > > > > >> appears simpler than adding a deprecated config. We can mark > the > > > > > > existing > > > > > > >> SourceReaderBase constructor as @deprecated. > > > > > > SourceReaderBase#pollNext(...) > > > > > > >> will not increment the counter if a subclass uses the newly > > added > > > > > > >> constructor. > > > > > > >> > > > > > > >> Cheers, > > > > > > >> Dong > > > > > > >> > > > > > > >> > > > > > > >> On Sat, Jan 7, 2023 at 4:47 AM John Roesler < > > [email protected]> > > > > > > wrote: > > > > > > >> > > > > > > >>> Thanks for the replies, Dong and Wencong! > > > > > > >>> > > > > > > >>> That’s a good point about the overhead of the extra method. > > > > > > >>> > > > > > > >>> Is the desire to actually deprecate that metric in a > > user-facing > > > > way, > > > > > > or > > > > > > >>> just to deprecate the private counter mechanism? > > > > > > >>> > > > > > > >>> It seems like if the desire is to deprecate the existing > > private > > > > > > counter, > > > > > > >>> we can accomplish it by deprecating the current constructor > and > > > > > > offering > > > > > > >>> another that is documented not to track the metric. This > seems > > > > better > > > > > > than > > > > > > >>> the config option, since the concern is purely about the > > division > > > > of > > > > > > >>> responsibilities between the sub- and super-class. > > > > > > >>> > > > > > > >>> Another option, which might be better if we wish to keep a > > > > uniformly > > > > > > named > > > > > > >>> metric, would be to simply make the counter protected. That > > would > > > > be > > > > > > better > > > > > > >>> if we’re generally happy with the metric and counter, but a > few > > > > > special > > > > > > >>> source connectors need to emit records in other ways. > > > > > > >>> > > > > > > >>> And finally, if we really want to get rid of the metric > itself, > > > > then > > > > > I > > > > > > >>> agree, a config is the way to do it. > > > > > > >>> > > > > > > >>> Thanks, > > > > > > >>> John > > > > > > >>> > > > > > > >>> On Fri, Jan 6, 2023, at 00:55, Dong Lin wrote: > > > > > > >>> > Hi John and Wencong, > > > > > > >>> > > > > > > > >>> > Thanks for the reply! > > > > > > >>> > > > > > > > >>> > It is nice that optional-2 can address the problem without > > > > > affecting > > > > > > the > > > > > > >>> > existing source connectors as far as functionality is > > > concerned. > > > > > One > > > > > > >>> > potential concern with this approach is that it might > > increase > > > > the > > > > > > Flink > > > > > > >>> > runtime overhead by adding one more virtual functional call > > to > > > > the > > > > > > >>> > per-record runtime call stack. > > > > > > >>> > > > > > > > >>> > Since Java's default MaxInlineLevel is 12-18, I believe it > is > > > > easy > > > > > > for an > > > > > > >>> > operator chain of 5+ operators to exceed this limit. In > this > > > > case. > > > > > > And > > > > > > >>> > option-2 would incur one more virtual table lookup to > produce > > > > each > > > > > > >>> record. > > > > > > >>> > It is not clear how much this overhead would show up for > jobs > > > > with > > > > > a > > > > > > >>> chain > > > > > > >>> > of lightweight operators. I am recently working on > > FLINK-30531 > > > > > > >>> > <https://issues.apache.org/jira/browse/FLINK-30531> to > > reduce > > > > > > runtime > > > > > > >>> > overhead which might be related to this discussion. > > > > > > >>> > > > > > > > >>> > In comparison to option-2, the option-3 provided in my > > earlier > > > > > email > > > > > > >>> would > > > > > > >>> > not add this extra overhead. I think it might be worthwhile > > to > > > > > > invest in > > > > > > >>> > the long-term performance (and simpler runtime infra) and > pay > > > for > > > > > the > > > > > > >>> > short-term cost of deprecating this metric in > > > SourceOperatorBase. > > > > > > What do > > > > > > >>> > you think? > > > > > > >>> > > > > > > > >>> > Regards, > > > > > > >>> > Dong > > > > > > >>> > > > > > > > >>> > > > > > > > >>> > On Thu, Jan 5, 2023 at 10:10 PM Wencong Liu < > > > > [email protected]> > > > > > > >>> wrote: > > > > > > >>> > > > > > > > >>> >> Hi, All > > > > > > >>> >> > > > > > > >>> >> > > > > > > >>> >> Thanks for the reply! > > > > > > >>> >> > > > > > > >>> >> > > > > > > >>> >> I think both John and Dong's opinions are reasonable. > John's > > > > > > Suggestion > > > > > > >>> 2 > > > > > > >>> >> is a good implementation. > > > > > > >>> >> It does not affect the existing source connectors, but > also > > > > > provides > > > > > > >>> >> support > > > > > > >>> >> for custom counter in the future implementation. > > > > > > >>> >> > > > > > > >>> >> > > > > > > >>> >> WDYT? > > > > > > >>> >> > > > > > > >>> >> > > > > > > >>> >> Best, > > > > > > >>> >> > > > > > > >>> >> Wencong Liu > > > > > > >>> > > > > > > > > > > > > > > > > > > > > >
