Hi all,

If there are no more feedbacks, I will start a vote for the new interfaces
in the next day, thanks

Best,
Fang Yong

On Thu, Feb 8, 2024 at 1:30 PM Yong Fang <zjur...@gmail.com> wrote:

> Hi devs,
>
> According to the online-discussion in FLINK-3127 [1] and
> offline-discussion with Maciej Obuchowski and Zhenqiu Huang, we would like
> to update the lineage vertex relevant interfaces in FLIP-314 [2] as follows:
>
> 1. Introduce `LineageDataset` which represents source and sink in
> `LineageVertex`. The fields in `LineageDataset` are as follows:
>     /* Name for this particular dataset. */
>     String name;
>     /* Unique name for this dataset's storage, for example, url for jdbc
> connector and location for lakehouse connector. */
>     String namespace;
>     /* Facets for the lineage vertex to describe the particular
> information of dataset, such as schema and config. */
>     Map<String, Facet> facets;
>
> 2. There may be multiple datasets in one `LineageVertex`, for example,
> kafka source or hybrid source. So users can get dataset list from
> `LineageVertex`:
>     /** Get datasets from the lineage vertex. */
>     List<LineageDataset> datasets();
>
> 3. There will be built in facets for config and schema. To describe
> columns in table/sql jobs and datastream jobs, we introduce
> `DatasetSchemaField`.
>     /** Builtin config facet for dataset. */
>     @PublicEvolving
>     public interface DatasetConfigFacet extends LineageDatasetFacet {
>         Map<String, String> config();
>     }
>
>     /** Field for schema in dataset. */
>     public interface DatasetSchemaField<T> {
>         /** The name of the field. */
>         String name();
>         /** The type of the field. */
>         T type();
>     }
>
> Thanks for valuable inputs from @Maciej and @Zhenqiu. And looking forward
> to your feedback, thanks
>
> Best,
> Fang Yong
>
> On Mon, Sep 25, 2023 at 1:18 PM Shammon FY <zjur...@gmail.com> wrote:
>
>> Hi David,
>>
>> Do you want the detailed topology for Flink job? You can get
>> `JobDetailsInfo` in `RestCusterClient` with the submitted job id, it has
>> `String jsonPlan`. You can parse the json plan to get all steps and
>> relations between them in a Flink job. Hope this can help you, thanks!
>>
>> Best,
>> Shammon FY
>>
>> On Tue, Sep 19, 2023 at 11:46 PM David Radley <david_rad...@uk.ibm.com>
>> wrote:
>>
>>> Hi there,
>>> I am looking at the interfaces. If I am reading it correctly,there is
>>> one relationship between the source and sink and this relationship
>>> represents the operational lineage. Lineage is usually represented as asset
>>> -> process - > asset – see for example
>>> https://egeria-project.org/features/lineage-management/overview/#the-lineage-graph
>>>
>>> Maybe I am missing it, but it seems to be that it would be useful to
>>> store the process in the lineage graph.
>>>
>>> It is useful to have the top level lineage as source -> Flink job ->
>>> sink. Where the Flink job is the process, but also to have this asset ->
>>> process -> asset pattern for each of the steps in the job. If this is
>>> present, please could you point me to it,
>>>
>>>       Kind regards, David.
>>>
>>>
>>>
>>>
>>>
>>> From: David Radley <david_rad...@uk.ibm.com>
>>> Date: Tuesday, 19 September 2023 at 16:11
>>> To: dev@flink.apache.org <dev@flink.apache.org>
>>> Subject: [EXTERNAL] RE: [DISCUSS] FLIP-314: Support Customized Job
>>> Lineage Listener
>>> Hi,
>>> I notice that there is an experimental lineage integration for Flink
>>> with OpenLineage https://openlineage.io/docs/integrations/flink  . I
>>> think this feature would allow for a superior Flink OpenLineage integration,
>>>         Kind regards, David.
>>>
>>> From: XTransfer <jiabao....@xtransfer.cn.INVALID>
>>> Date: Tuesday, 19 September 2023 at 15:47
>>> To: dev@flink.apache.org <dev@flink.apache.org>
>>> Subject: [EXTERNAL] Re: [DISCUSS] FLIP-314: Support Customized Job
>>> Lineage Listener
>>> Thanks Shammon for this proposal.
>>>
>>> That’s helpful for collecting the lineage of Flink tasks.
>>> Looking forward to its implementation.
>>>
>>> Best,
>>> Jiabao
>>>
>>>
>>> > 2023年9月18日 20:56,Leonard Xu <xbjt...@gmail.com> 写道:
>>> >
>>> > Thanks Shammon for the informations, the comment makes the lifecycle
>>> clearer.
>>> > +1
>>> >
>>> >
>>> > Best,
>>> > Leonard
>>> >
>>> >
>>> >> On Sep 18, 2023, at 7:54 PM, Shammon FY <zjur...@gmail.com> wrote:
>>> >>
>>> >> Hi devs,
>>> >>
>>> >> After discussing with @Qingsheng, I fixed a minor issue of the
>>> lineage lifecycle in `StreamExecutionEnvironment`. I have added the comment
>>> to explain that the lineage information in `StreamExecutionEnvironment`
>>> will be consistent with that of transformations. When users clear the
>>> existing transformations, the added lineage information will also be
>>> deleted.
>>> >>
>>> >> Please help to review it again, and If there are no more concerns
>>> about FLIP-314[1], I would like to start voting later, thanks. cc @
>>> <>Leonard
>>> >>
>>> >> Best,
>>> >> Shammon FY
>>> >>
>>> >> On Mon, Jul 17, 2023 at 3:43 PM Shammon FY <zjur...@gmail.com
>>> <mailto:zjur...@gmail.com>> wrote:
>>> >> Hi devs,
>>> >>
>>> >> Thanks for all the valuable feedback. If there are no more concerns
>>> about FLIP-314[1], I would like to start voting later, thanks.
>>> >>
>>> >>
>>> >> [1]
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-314%3A+Support+Customized+Job+Lineage+Listener
>>>  <
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-314%3A+Support+Customized+Job+Lineage+Listener
>>> >
>>> >>
>>> >> Best,
>>> >> Shammon FY
>>> >>
>>> >>
>>> >> On Wed, Jul 12, 2023 at 11:18 AM Shammon FY <zjur...@gmail.com
>>> <mailto:zjur...@gmail.com>> wrote:
>>> >> Thanks for the valuable feedback, Leonard.
>>> >>
>>> >> I have discussed with Leonard off-line. We have reached some
>>> conclusions about these issues and I have updated the FLIP as follows:
>>> >>
>>> >> 1. Simplify the `LineageEdge` interface by creating an edge from one
>>> source vertex to sink vertex.
>>> >> 2. Remove the `TableColumnSourceLineageVertex` interface and update
>>> `TableColumnLineageEdge` to create an edge from columns in one source to
>>> each sink column.
>>> >> 3. Rename `SupportsLineageVertex` to `LineageVertexProvider`
>>> >> 4. Add method `addLineageEdges(LineageEdge ... edges)` in
>>> `StreamExecutionEnviroment` for datastream job and remove previous methods
>>> in `DataStreamSource` and `DataStreamSink`.
>>> >>
>>> >> Looking forward to your feedback, thanks.
>>> >>
>>> >> Best,
>>> >> Shammon FY
>>> >
>>>
>>> Unless otherwise stated above:
>>>
>>> IBM United Kingdom Limited
>>> Registered in England and Wales with number 741598
>>> Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU
>>>
>>> Unless otherwise stated above:
>>>
>>> IBM United Kingdom Limited
>>> Registered in England and Wales with number 741598
>>> Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU
>>>
>>

Reply via email to