Hi all,

If there is no further feedback, I will start a vote on the new interfaces within the next day, thanks.

Best,
Fang Yong

On Thu, Feb 8, 2024 at 1:30 PM Yong Fang <zjur...@gmail.com> wrote:

> Hi devs,
>
> According to the online discussion in FLINK-3127 [1] and the offline
> discussion with Maciej Obuchowski and Zhenqiu Huang, we would like to
> update the lineage vertex related interfaces in FLIP-314 [2] as follows:
>
> 1. Introduce `LineageDataset`, which represents a source or sink in a
> `LineageVertex`. The fields in `LineageDataset` are as follows:
>     /* Name for this particular dataset. */
>     String name;
>     /* Unique name for this dataset's storage, for example, the URL for a
>        JDBC connector and the location for a lakehouse connector. */
>     String namespace;
>     /* Facets for the lineage vertex to describe particular information
>        about the dataset, such as schema and config. */
>     Map<String, Facet> facets;
>
> 2. There may be multiple datasets in one `LineageVertex`, for example, in
> a Kafka source or a hybrid source, so users can get the dataset list from
> `LineageVertex`:
>     /** Get datasets from the lineage vertex. */
>     List<LineageDataset> datasets();
>
> 3. There will be built-in facets for config and schema. To describe
> columns in table/SQL jobs and DataStream jobs, we introduce
> `DatasetSchemaField`.
>     /** Built-in config facet for dataset. */
>     @PublicEvolving
>     public interface DatasetConfigFacet extends LineageDatasetFacet {
>         Map<String, String> config();
>     }
>
>     /** Field for schema in dataset. */
>     public interface DatasetSchemaField<T> {
>         /** The name of the field. */
>         String name();
>         /** The type of the field. */
>         T type();
>     }
>
> Thanks for the valuable input from @Maciej and @Zhenqiu. Looking forward
> to your feedback, thanks.
>
> Best,
> Fang Yong
>
> On Mon, Sep 25, 2023 at 1:18 PM Shammon FY <zjur...@gmail.com> wrote:
>
>> Hi David,
>>
>> Do you want the detailed topology of a Flink job? You can get
>> `JobDetailsInfo` from `RestClusterClient` with the submitted job id; it has
>> a `String jsonPlan`.
>> You can parse the JSON plan to get all the steps and the
>> relations between them in a Flink job. Hope this helps, thanks!
>>
>> Best,
>> Shammon FY
>>
>> On Tue, Sep 19, 2023 at 11:46 PM David Radley <david_rad...@uk.ibm.com>
>> wrote:
>>
>>> Hi there,
>>> I am looking at the interfaces. If I am reading it correctly, there is
>>> one relationship between the source and sink, and this relationship
>>> represents the operational lineage. Lineage is usually represented as
>>> asset -> process -> asset; see for example
>>> https://egeria-project.org/features/lineage-management/overview/#the-lineage-graph
>>>
>>> Maybe I am missing it, but it seems to me that it would be useful to
>>> store the process in the lineage graph.
>>>
>>> It is useful to have the top-level lineage as source -> Flink job ->
>>> sink, where the Flink job is the process, but also to have this asset ->
>>> process -> asset pattern for each of the steps in the job. If this is
>>> present, please could you point me to it?
>>>
>>> Kind regards, David.
>>>
>>> From: David Radley <david_rad...@uk.ibm.com>
>>> Date: Tuesday, 19 September 2023 at 16:11
>>> To: dev@flink.apache.org <dev@flink.apache.org>
>>> Subject: [EXTERNAL] RE: [DISCUSS] FLIP-314: Support Customized Job
>>> Lineage Listener
>>> Hi,
>>> I notice that there is an experimental lineage integration for Flink
>>> with OpenLineage https://openlineage.io/docs/integrations/flink . I
>>> think this feature would allow for a superior Flink OpenLineage integration.
>>> Kind regards, David.
>>>
>>> From: XTransfer <jiabao....@xtransfer.cn.INVALID>
>>> Date: Tuesday, 19 September 2023 at 15:47
>>> To: dev@flink.apache.org <dev@flink.apache.org>
>>> Subject: [EXTERNAL] Re: [DISCUSS] FLIP-314: Support Customized Job
>>> Lineage Listener
>>> Thanks Shammon for this proposal.
>>>
>>> That’s helpful for collecting the lineage of Flink tasks.
>>> Looking forward to its implementation.
>>>
>>> Best,
>>> Jiabao
>>>
>>> > On Sep 18, 2023, at 20:56, Leonard Xu <xbjt...@gmail.com> wrote:
>>> >
>>> > Thanks Shammon for the information, the comment makes the lifecycle
>>> > clearer.
>>> > +1
>>> >
>>> > Best,
>>> > Leonard
>>> >
>>> >> On Sep 18, 2023, at 7:54 PM, Shammon FY <zjur...@gmail.com> wrote:
>>> >>
>>> >> Hi devs,
>>> >>
>>> >> After discussing with @Qingsheng, I fixed a minor issue with the
>>> >> lineage lifecycle in `StreamExecutionEnvironment`. I have added a comment
>>> >> to explain that the lineage information in `StreamExecutionEnvironment`
>>> >> will be consistent with that of the transformations. When users clear the
>>> >> existing transformations, the added lineage information will also be
>>> >> deleted.
>>> >>
>>> >> Please help to review it again. If there are no more concerns
>>> >> about FLIP-314 [1], I would like to start voting later, thanks. cc @Leonard
>>> >>
>>> >> Best,
>>> >> Shammon FY
>>> >>
>>> >> On Mon, Jul 17, 2023 at 3:43 PM Shammon FY <zjur...@gmail.com> wrote:
>>> >> Hi devs,
>>> >>
>>> >> Thanks for all the valuable feedback. If there are no more concerns
>>> >> about FLIP-314 [1], I would like to start voting later, thanks.
>>> >>
>>> >> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-314%3A+Support+Customized+Job+Lineage+Listener
>>> >>
>>> >> Best,
>>> >> Shammon FY
>>> >>
>>> >> On Wed, Jul 12, 2023 at 11:18 AM Shammon FY <zjur...@gmail.com> wrote:
>>> >> Thanks for the valuable feedback, Leonard.
>>> >>
>>> >> I have discussed this with Leonard offline. We have reached some
>>> >> conclusions about these issues, and I have updated the FLIP as follows:
>>> >>
>>> >> 1. Simplify the `LineageEdge` interface by creating an edge from one
>>> >> source vertex to one sink vertex.
>>> >> 2.
>>> >> Remove the `TableColumnSourceLineageVertex` interface and update
>>> >> `TableColumnLineageEdge` to create an edge from columns in one source to
>>> >> each sink column.
>>> >> 3. Rename `SupportsLineageVertex` to `LineageVertexProvider`.
>>> >> 4. Add the method `addLineageEdges(LineageEdge... edges)` to
>>> >> `StreamExecutionEnvironment` for DataStream jobs and remove the previous
>>> >> methods in `DataStreamSource` and `DataStreamSink`.
>>> >>
>>> >> Looking forward to your feedback, thanks.
>>> >>
>>> >> Best,
>>> >> Shammon FY
>>> >
>>>
>>> Unless otherwise stated above:
>>>
>>> IBM United Kingdom Limited
>>> Registered in England and Wales with number 741598
>>> Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU
>>>
>>
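
For readers following the thread, here is a rough Java sketch of how the dataset interfaces proposed in the Feb 8, 2024 message might fit together. It is only an illustration under several assumptions, not the FLIP-314 code: the three `LineageDataset` fields are written as accessor methods, the facet map uses `LineageDatasetFacet` (the message also calls the type `Facet`), and `LineageSketch`, `SimpleDataset`, `exampleVertex`, `qualifiedNames`, the broker address and the topic names are all made up for the example.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class LineageSketch {

    /** Base facet type; the message refers to it as both Facet and LineageDatasetFacet. */
    public interface LineageDatasetFacet {}

    /** Built-in config facet, as quoted in the message. */
    public interface DatasetConfigFacet extends LineageDatasetFacet {
        Map<String, String> config();
    }

    /** Field for schema in dataset, as quoted in the message. */
    public interface DatasetSchemaField<T> {
        String name();
        T type();
    }

    /** Assumption: the three fields listed for LineageDataset, exposed as accessors. */
    public interface LineageDataset {
        String name();
        String namespace();
        Map<String, LineageDatasetFacet> facets();
    }

    /** A vertex may expose several datasets, for example one per Kafka topic. */
    public interface LineageVertex {
        List<LineageDataset> datasets();
    }

    /** Hypothetical immutable implementation, for illustration only. */
    public static final class SimpleDataset implements LineageDataset {
        private final String name;
        private final String namespace;
        private final Map<String, LineageDatasetFacet> facets;

        public SimpleDataset(String name, String namespace,
                             Map<String, LineageDatasetFacet> facets) {
            this.name = name;
            this.namespace = namespace;
            this.facets = Map.copyOf(facets);
        }

        @Override public String name() { return name; }
        @Override public String namespace() { return namespace; }
        @Override public Map<String, LineageDatasetFacet> facets() { return facets; }
    }

    /** Builds a vertex for a made-up Kafka source that reads two topics. */
    public static LineageVertex exampleVertex() {
        String namespace = "kafka://broker:9092"; // made-up storage address
        DatasetConfigFacet config = () -> Map.of("scan.startup.mode", "earliest-offset");
        Map<String, LineageDatasetFacet> facets = Map.of("config", config);
        List<LineageDataset> datasets = List.of(
                new SimpleDataset("orders", namespace, facets),
                new SimpleDataset("payments", namespace, facets));
        return () -> datasets;
    }

    /** Renders each dataset as "namespace/name", e.g. for a listener to report. */
    public static List<String> qualifiedNames(LineageVertex vertex) {
        return vertex.datasets().stream()
                .map(d -> d.namespace() + "/" + d.name())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        qualifiedNames(exampleVertex()).forEach(System.out::println);
    }
}
```

Keeping `namespace` separate from `name` lets a listener tell apart datasets that share a name but live in different storage systems, which seems to be the intent of the "unique name for this dataset's storage" wording.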