Hi, Shammon. Thanks for bring this FLIP. 
I share all the concerns with Qingsheng.
Beside that, I have few question:

1: What kinds of JobStatus will the `JobExecutionStatusEven` including? From 
the jave doc "/** Job status changed event for runtime. */"  for the 
`JobExecutionStatusEvent`, I believe it won't contain all 
JobStatus defined in org.apache.flink.api.common.JobStatus since it also define 
some JobStatus that not for runtime such as INITIALIZING/CREATED, etc. I think 
we do need have a clear defination for it.
 
2: I'm really confused about the `config()` included in `LineageEntity`, where 
is it from and what is it for ?

3: Regardless whether `inputChangelogMode` in `TableSinkLineageEntity` is 
needed or not, since `TableSinkLineageEntity` contains `inputChangelogMode`, 
why `TableSourceLineageEntity` don't contain changelogmode?

4: About `Job lineage for DataStream job`, why DataStreamSink need to set 
setLineageRelationEntity, shouldn't it be collection by Flink framework 
automatically?

Btw, since there're a lot interfaces proposed, I think it'll be better to give 
an example about how to implement a listener in this FLIP to make us know 
better about the interfaces.


Best regards,
Yuxia

----- 原始邮件 -----
发件人: "Qingsheng Ren" <re...@apache.org>
收件人: "dev" <dev@flink.apache.org>
抄送: "Shammon FY" <zjur...@gmail.com>
发送时间: 星期二, 2023年 6 月 20日 下午 6:19:10
主题: Re: [DISCUSS] FLIP-314: Support Customized Job Lineage Listener

Hi Shammon,

Thanks for starting this FLIP! Data lineage is a very important topic,
which has been missing for a long time in Flink. I have some questions
about the FLIP.

About events and listeners:

1. I’m not sure if it is necessary to expose JobType to in JobCreatedEvent.
This is an internal class in flink-runtime, and I think the correct API
should be RuntimeExecutionMode. Furthermore, I think the boundary of batch
and streaming becomes much more vague as Flink is evolving towards
batch-streaming unification, so I’m concerned about exposing JobType as a
public API. Is there any specific use case to expose the batch / streaming
info to listeners or meta services?

2. Currently JobCreatedEvent gives a Configuration, which is quite
ambiguous. To be honest the configuration is quite a mess in Flink, so
maybe it’s better to be more specific here to tell users what information
they could expect to see here, instead of just a “job configuration” as
described in JavaDoc.

3. JobStatusChangedListenerFactory.Context provides an IO executor. I think
more information should be provided here, such as which thread model this
executor could promise, and whether the user should care about concurrency
issues. Otherwise I prefer not to give such an utility that no one dares to
use safely, and leave it to users to choose their implementation.

About lineage:

4. I don’t quite get the LineageRelationEntity, which is just a list of
LineageEntity. Could you elaborate more on this class? From my naive
imagination, the lineage is shaped as a DAG, where vertices are sources and
sinks (LineageEntity) and edges are connections between them
(LineageRelation), so it is a bit confusing for a name mixing these two
concepts.

5. I can’t find the definition of CatalogContext in the current code base
and Flink, which appears in the TableLineageEntity.

6. TableSinkLineageEntity exposes ModifyType, ChangelogMode and a boolean
(the “override” is quite confusing). I’m wondering if these are necessary
for meta services, as they are actually concepts defined in the runtime
level of Flink Table / SQL.

Best,
Qingsheng

On Tue, Jun 20, 2023 at 2:31 PM Shammon FY <zjur...@gmail.com> wrote:

> Hi devs,
>
> Is there any comment or feedback for this FLIP? Hope to hear from you,
> thanks
>
> Best,
> Shammon FY
>
>
> On Tue, Jun 6, 2023 at 8:22 PM Shammon FY <zjur...@gmail.com> wrote:
>
> > Hi devs,
> >
> > I would like to start a discussion on FLIP-314: Support Customized Job
> > Lineage Listener[1] which is the next stage of FLIP-294 [2]. Flink
> > streaming and batch jobs create lineage dependency between source and
> sink,
> > users can manage their data and jobs according to this lineage
> information.
> > For example, when there is a delay in Flink ETL or data, users can easily
> > trace the problematic jobs and affected data. On the other hand, when
> users
> > need to correct data or debug, they can perform operations based on
> lineage
> > too.
> >
> > In FLIP-314 we want to introduce lineage related interfaces for Flink and
> > users can create customized job status listeners. When job status
> changes,
> > users can get job status and information to add, update or delete
> lineage.
> >
> > Looking forward to your feedback, thanks.
> >
> >
> > [1]
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-314%3A+Support+Customized+Job+Lineage+Listener
> > [2]
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-294%3A+Support+Customized+Job+Meta+Data+Listener
> >
> > Best,
> > Shammon FY
> >
>

Reply via email to