Re: [DISCUSS] PIP-5: Paimon Table And Data Lineage For Flink

Jingsong Li Tue, 20 Jun 2023 05:13:15 -0700

Thanks Shammon,

For the metadata.store, isn't this just the existing metastore?


I mean, can we manage this meta information through the current Catalog
interface (which in fact uses `metastore` as its key)?

For example,

CREATE CATALOG paimon_catalog1 WITH (
    -- ... other options
    'metastore' = 'jdbc',
    'url' = 'XXXXX',
    'jdbc.driver' = 'com.mysql.jdbc.Driver',
    -- The default metadata database name is `paimon`
    'jdbc.database' = 'paimon_cata1',
    'jdbc.username' = 'XXX',
    'jdbc.password' = 'XXX'
);

The JDBC metastore would then manage not only the table information
(which is what Catalog used to do), but also the data lineage information.
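
To make the idea concrete, here is a rough sketch of one store serving both
kinds of metadata. Every name in it (CatalogMetaStore, TableLineage,
InMemoryMetaStore, open) is hypothetical and is not part of Paimon or of
PIP-5, and an in-memory map stands in for the JDBC backend so the example
stays self-contained:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; none of these types exist in Paimon or in PIP-5. */
public class CatalogLineageSketch {

    /** One table-lineage edge: `job` reads `source` and writes `sink`. */
    static final class TableLineage {
        final String job;
        final String source;
        final String sink;

        TableLineage(String job, String source, String sink) {
            this.job = job;
            this.source = source;
            this.sink = sink;
        }
    }

    /** A single store serving both table metadata and lineage. */
    interface CatalogMetaStore {
        void createTable(String name, String schema);
        String getTable(String name);
        void addLineage(TableLineage lineage);
        List<TableLineage> lineageOfSink(String sink);
    }

    /** In-memory stand-in for a JDBC-backed implementation (assumption). */
    static final class InMemoryMetaStore implements CatalogMetaStore {
        private final Map<String, String> tables = new HashMap<>();
        private final List<TableLineage> lineages = new ArrayList<>();

        @Override
        public void createTable(String name, String schema) {
            tables.put(name, schema);
        }

        @Override
        public String getTable(String name) {
            return tables.get(name);
        }

        @Override
        public void addLineage(TableLineage lineage) {
            lineages.add(lineage);
        }

        @Override
        public List<TableLineage> lineageOfSink(String sink) {
            // Return every lineage edge whose sink is the given table.
            List<TableLineage> result = new ArrayList<>();
            for (TableLineage lineage : lineages) {
                if (lineage.sink.equals(sink)) {
                    result.add(lineage);
                }
            }
            return result;
        }
    }

    /** Picks the store from the catalog's 'metastore' option. */
    static CatalogMetaStore open(Map<String, String> options) {
        String kind = options.get("metastore");
        if (!"jdbc".equals(kind)) {
            throw new IllegalArgumentException("Unsupported metastore: " + kind);
        }
        return new InMemoryMetaStore();
    }

    public static void main(String[] args) {
        Map<String, String> options = new HashMap<>();
        options.put("metastore", "jdbc");
        CatalogMetaStore store = open(options);

        // Table metadata and lineage go through the same store.
        store.createTable("orders", "id INT, amount DOUBLE");
        store.createTable("orders_agg", "total DOUBLE");
        store.addLineage(new TableLineage("job-1", "orders", "orders_agg"));

        System.out.println(store.getTable("orders"));
        System.out.println(store.lineageOfSink("orders_agg").size());
    }
}
```

The point of the sketch is only that the `metastore` key selects one backend
and the catalog exposes both table and lineage operations on it; how a real
JDBC schema would look is left open.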

What do you think?

Or do you still want to separate their responsibilities?

Best,
Jingsong

On Thu, Jun 15, 2023 at 1:46 PM Shammon FY <[email protected]> wrote:
>
> Hi Jingsong,
>
> I have updated this PIP and added the implementation for System Database.
> The main changes are as follows:
>
> 1. Introduce MetadataStore and MetadataStoreFactory to store table
> lineage and data lineage.
> 2. Use JDBC as the default metadata store.
> 3. Users can query the table and data lineage tables, and delete lineages
> with actions.
>
> Looking forward to your feedback, thanks
>
> Best,
> Shammon FY
>
>
> On Wed, Jun 14, 2023 at 11:17 AM Shammon FY <[email protected]> wrote:
>>
>> Hi Jingsong,
>>
>> It's a good point about the detailed implementation of System Database.
>> I'll update the PIP soon.
>>
>> Best,
>> Shammon FY
>>
>> On Wed, Jun 14, 2023 at 8:48 AM Shammon FY <[email protected]> wrote:
>>>
>>> Hi Jingsong,
>>>
>>> Thanks for your comments.
>>>
>>> > We should document what is based on FLIP-314.
>>>
>>> I have documented the operations supported by FLIP-314 in the future work
>>> section.
>>>
>>> > Is the current Source interface sufficient for your functionality?
>>>
>>> In our design, the current Source interface fulfills our requirements. As
>>> described in PIP-5, `AlignedEnumerator` will send checkpoint events to
>>> `AlignedSourceReader`, which will align the checkpoint and snapshot, and
>>> then send the splits to the next operator. More detailed information can
>>> be provided by @liming.
>>>
>>> > Can we currently achieve the ability to flush all data in a snapshot 
>>> > before snapshot?
>>>
>>> Can you provide a more detailed description of this? Do you mean there may 
>>> be too much data for a snapshot if the source aligns the checkpoint and 
>>> snapshot and causes the snapshot to be too large to flush?
>>>
>>>
>>> Best,
>>> Shammon FY
>>>
>>> On Mon, Jun 12, 2023 at 4:30 PM Jingsong Li <[email protected]> wrote:
>>>>
>>>> System Database looks very good~ But perhaps there are some design
>>>> details here? What API should we use? Paimon Java API? And should we
>>>> commit every operation?
>>>>
>>>> Best,
>>>> Jingsong
>>>>
>>>> On Mon, Jun 12, 2023 at 4:27 PM Jingsong Li <[email protected]> wrote:
>>>> >
>>>> > Thanks Shammon,
>>>> >
>>>> > The overall design looks good to me!
>>>> >
>>>> > ## Plan For The Future
>>>> >
>>>> > We should document what is based on FLIP-314.
>>>> >
>>>> > ## AlignedEnumerator and AlignedSourceReader
>>>> >
>>>> > Is the current Source interface sufficient for your functionality?
>>>> >
>>>> > Can we currently achieve the ability to flush all data in a snapshot
>>>> > before snapshot?
>>>> >
>>>> > Best,
>>>> > Jingsong
>>>> >
>>>> > On Mon, Jun 5, 2023 at 7:57 PM Shammon FY <[email protected]> wrote:
>>>> > >
>>>> > > Hi Kelu,
>>>> > >
>>>> > > Thanks for your feedback. In the first stage, we do not want to
>>>> > > introduce a server, but instead store information directly in the
>>>> > > Paimon table when creating and running Flink jobs. A server will be
>>>> > > considered when we encounter more requirements in the future and
>>>> > > need a resident management service.
>>>> > >
>>>> > > Best,
>>>> > > Shammon FY
>>>> > >
>>>> > > On Fri, Jun 2, 2023 at 5:55 PM Kelu Tao <[email protected]> wrote:
>>>> > >
>>>> > > > +1
>>>> > > >
>>>> > > > cool job ~
>>>> > > >
>>>> > > > For this PIP, do we need to introduce a new server for serving
>>>> > > > the information?
>>>> > > >
>>>> > > > On 2023/05/31 02:28:21 Shammon FY wrote:
>>>> > > > > Hi devs,
>>>> > > > >
>>>> > > > > We would like to start a discussion about PIP-5: Paimon Table And
>>>> > > > > Data Lineage For Flink[1].
>>>> > > > >
>>>> > > > > As a streaming lake, Paimon can be integrated with Flink to complete
>>>> > > > > the entire ETL processing. In this process, users need to manage batch &
>>>> > > > > streaming jobs and data streams, including batch & streaming data
>>>> > > > > validation, job debugging, and data revision. To support these abilities,
>>>> > > > > we introduce table and data lineage for Flink & Paimon. Users can
>>>> > > > > conveniently manage the entire ETL processing based on lineage information.
>>>> > > > >
>>>> > > > > Looking forward to hearing from you, thanks.
>>>> > > > >
>>>> > > > >
>>>> > > > > [1] https://cwiki.apache.org/confluence/display/PAIMON/PIP-5%3A+Paimon+Table+And+Data+Lineage+For+Flink
>>>> > > > >
>>>> > > > >
>>>> > > > > Best,
>>>> > > > > Shammon FY
>>>> > > > >
>>>> > > >
