Hi, Shammon. Thanks for bring this FLIP. I share all the concerns with Qingsheng. Beside that, I have few question:
1: What kinds of JobStatus will the `JobExecutionStatusEven` including? From the jave doc "/** Job status changed event for runtime. */" for the `JobExecutionStatusEvent`, I believe it won't contain all JobStatus defined in org.apache.flink.api.common.JobStatus since it also define some JobStatus that not for runtime such as INITIALIZING/CREATED, etc. I think we do need have a clear defination for it. 2: I'm really confused about the `config()` included in `LineageEntity`, where is it from and what is it for ? 3: Regardless whether `inputChangelogMode` in `TableSinkLineageEntity` is needed or not, since `TableSinkLineageEntity` contains `inputChangelogMode`, why `TableSourceLineageEntity` don't contain changelogmode? 4: About `Job lineage for DataStream job`, why DataStreamSink need to set setLineageRelationEntity, shouldn't it be collection by Flink framework automatically? Btw, since there're a lot interfaces proposed, I think it'll be better to give an example about how to implement a listener in this FLIP to make us know better about the interfaces. Best regards, Yuxia ----- 原始邮件 ----- 发件人: "Qingsheng Ren" <re...@apache.org> 收件人: "dev" <dev@flink.apache.org> 抄送: "Shammon FY" <zjur...@gmail.com> 发送时间: 星期二, 2023年 6 月 20日 下午 6:19:10 主题: Re: [DISCUSS] FLIP-314: Support Customized Job Lineage Listener Hi Shammon, Thanks for starting this FLIP! Data lineage is a very important topic, which has been missing for a long time in Flink. I have some questions about the FLIP. About events and listeners: 1. I’m not sure if it is necessary to expose JobType to in JobCreatedEvent. This is an internal class in flink-runtime, and I think the correct API should be RuntimeExecutionMode. Furthermore, I think the boundary of batch and streaming becomes much more vague as Flink is evolving towards batch-streaming unification, so I’m concerned about exposing JobType as a public API. Is there any specific use case to expose the batch / streaming info to listeners or meta services? 2. Currently JobCreatedEvent gives a Configuration, which is quite ambiguous. To be honest the configuration is quite a mess in Flink, so maybe it’s better to be more specific here to tell users what information they could expect to see here, instead of just a “job configuration” as described in JavaDoc. 3. JobStatusChangedListenerFactory.Context provides an IO executor. I think more information should be provided here, such as which thread model this executor could promise, and whether the user should care about concurrency issues. Otherwise I prefer not to give such an utility that no one dares to use safely, and leave it to users to choose their implementation. About lineage: 4. I don’t quite get the LineageRelationEntity, which is just a list of LineageEntity. Could you elaborate more on this class? From my naive imagination, the lineage is shaped as a DAG, where vertices are sources and sinks (LineageEntity) and edges are connections between them (LineageRelation), so it is a bit confusing for a name mixing these two concepts. 5. I can’t find the definition of CatalogContext in the current code base and Flink, which appears in the TableLineageEntity. 6. TableSinkLineageEntity exposes ModifyType, ChangelogMode and a boolean (the “override” is quite confusing). I’m wondering if these are necessary for meta services, as they are actually concepts defined in the runtime level of Flink Table / SQL. Best, Qingsheng On Tue, Jun 20, 2023 at 2:31 PM Shammon FY <zjur...@gmail.com> wrote: > Hi devs, > > Is there any comment or feedback for this FLIP? Hope to hear from you, > thanks > > Best, > Shammon FY > > > On Tue, Jun 6, 2023 at 8:22 PM Shammon FY <zjur...@gmail.com> wrote: > > > Hi devs, > > > > I would like to start a discussion on FLIP-314: Support Customized Job > > Lineage Listener[1] which is the next stage of FLIP-294 [2]. Flink > > streaming and batch jobs create lineage dependency between source and > sink, > > users can manage their data and jobs according to this lineage > information. > > For example, when there is a delay in Flink ETL or data, users can easily > > trace the problematic jobs and affected data. On the other hand, when > users > > need to correct data or debug, they can perform operations based on > lineage > > too. > > > > In FLIP-314 we want to introduce lineage related interfaces for Flink and > > users can create customized job status listeners. When job status > changes, > > users can get job status and information to add, update or delete > lineage. > > > > Looking forward to your feedback, thanks. > > > > > > [1] > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-314%3A+Support+Customized+Job+Lineage+Listener > > [2] > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-294%3A+Support+Customized+Job+Meta+Data+Listener > > > > Best, > > Shammon FY > > >