Hi devs,

Thanks for all the feedback. If there are no more comments, I will start a vote on this PIP later. Thanks.

Best,
Shammon FY

On Wed, Jun 21, 2023 at 12:08 PM Jingsong Li <[email protected]> wrote:

> Thanks for the update.
>
> Looks good to me!
>
> Best,
> Jingsong
>
> On Wed, Jun 21, 2023 at 9:59 AM Shammon FY <[email protected]> wrote:
> >
> > Thanks Jingsong.
> >
> > As we discussed offline, the `metadata.store` will store the table
> > lineage and data lineage information, which is orthogonal with `metastore`.
> > We can introduce an option `lineage-meta` as follows.
> >
> > CREATE CATALOG paimon_catalog1 WITH (
> >     ... // other options
> >     'metastore' = 'hive',
> >     'url' = 'XXXXX',
> >     'lineage-meta' = 'jdbc',
> >     'jdbc.driver' = 'com.mysql.jdbc.Driver',
> >     'jdbc.database' = 'paimon_cata1', // The default Lineage Meta Database name is `paimon`
> >     'jdbc.username' = 'XXX',
> >     'jdbc.password' = 'XXX'
> > );
> >
> > Then we can support `lineage-meta` for `filesystem` and `hive`
> > metastore. I have updated the PIP for the options and the interfaces.
> >
> > Best,
> > Shammon FY
> >
> > On Tue, Jun 20, 2023 at 8:13 PM Jingsong Li <[email protected]> wrote:
> >>
> >> Thanks Shammon,
> >>
> >> For the metadata.store, is this just now the metastore?
> >>
> >> I mean can we manage this meta information through the current Catalog
> >> interface (which is in fact metastore as a key)?
> >>
> >> For example,
> >>
> >> CREATE CATALOG paimon_catalog1 WITH (
> >>     ... // other options
> >>     'metastore' = 'jdbc',
> >>     'url' = 'XXXXX',
> >>     'jdbc.driver' = 'com.mysql.jdbc.Driver',
> >>     'jdbc.database' = 'paimon_cata1', // The default Metadata Database name is `paimon`
> >>     'jdbc.username' = 'XXX',
> >>     'jdbc.password' = 'XXX'
> >> );
> >>
> >> JDBC manages not only the table information (which is what Catalog
> >> used to do), but also the data lineage information.
> >>
> >> What do you think?
> >>
> >> Or you still want to separate their responsibilities.
> >>
> >> Best,
> >> Jingsong
> >>
> >> On Thu, Jun 15, 2023 at 1:46 PM Shammon FY <[email protected]> wrote:
> >> >
> >> > Hi Jingsong,
> >> >
> >> > I have updated this PIP and added the implementation for System
> >> > Database. The main changes are as follows:
> >> >
> >> > 1. Introduce MetadataStore and MetadataStoreFactory to store the data
> >> >    of table and data lineages.
> >> > 2. Use jdbc as the default metadata store.
> >> > 3. Users can query table and data lineage tables, and delete lineages
> >> >    with actions.
> >> >
> >> > Looking forward to your feedback, thanks
> >> >
> >> > Best,
> >> > Shammon FY
> >> >
> >> > On Wed, Jun 14, 2023 at 11:17 AM Shammon FY <[email protected]> wrote:
> >> >>
> >> >> Hi Jingsong,
> >> >>
> >> >> It's a good point about the detailed implementation of System
> >> >> Database, I'll update the PIP soon.
> >> >>
> >> >> Best,
> >> >> Shammon FY
> >> >>
> >> >> On Wed, Jun 14, 2023 at 8:48 AM Shammon FY <[email protected]> wrote:
> >> >>>
> >> >>> Hi Jingsong,
> >> >>>
> >> >>> Thanks for your comments.
> >> >>>
> >> >>> > We should document what is based on FLIP-314.
> >> >>>
> >> >>> I have updated the operations supported by FLIP-314 in the future
> >> >>> work.
> >> >>>
> >> >>> > Is the current Source interface sufficient for your functionality?
> >> >>>
> >> >>> In our design the current Source interface fulfills our
> >> >>> requirements. As described in PIP-5, `AlignedEnumerator` will send
> >> >>> checkpoint events to `AlignedSourceReader`, which will align the
> >> >>> checkpoint and snapshot, and then send the splits to the next
> >> >>> operator. More detailed information can be provided by @liming
> >> >>>
> >> >>> > Can we currently achieve the ability to flush all data in a
> >> >>> > snapshot before snapshot?
> >> >>>
> >> >>> Can you provide a more detailed description of this? Do you mean
> >> >>> there may be too much data for a snapshot if the source aligns the
> >> >>> checkpoint and snapshot and causes the snapshot to be too large to
> >> >>> flush?
> >> >>>
> >> >>> Best,
> >> >>> Shammon FY
> >> >>>
> >> >>> On Mon, Jun 12, 2023 at 4:30 PM Jingsong Li <[email protected]> wrote:
> >> >>>>
> >> >>>> System Database looks very good~ But perhaps there are some design
> >> >>>> details here? What API should we use? Paimon Java API? And we should
> >> >>>> commit every operation?
> >> >>>>
> >> >>>> Best,
> >> >>>> Jingsong
> >> >>>>
> >> >>>> On Mon, Jun 12, 2023 at 4:27 PM Jingsong Li <[email protected]> wrote:
> >> >>>> >
> >> >>>> > Thanks Shammon,
> >> >>>> >
> >> >>>> > The overall design looks good to me!
> >> >>>> >
> >> >>>> > ## Plan For The Future
> >> >>>> >
> >> >>>> > We should document what is based on FLIP-314.
> >> >>>> >
> >> >>>> > ## AlignedEnumerator and AlignedSourceReader
> >> >>>> >
> >> >>>> > Is the current Source interface sufficient for your functionality?
> >> >>>> >
> >> >>>> > Can we currently achieve the ability to flush all data in a snapshot
> >> >>>> > before snapshot?
> >> >>>> >
> >> >>>> > Best,
> >> >>>> > Jingsong
> >> >>>> >
> >> >>>> > On Mon, Jun 5, 2023 at 7:57 PM Shammon FY <[email protected]> wrote:
> >> >>>> > >
> >> >>>> > > Hi Kelu,
> >> >>>> > >
> >> >>>> > > Thanks for your feedback. In the first stage, we do not want to introduce a
> >> >>>> > > server, but instead store information directly in the Paimon table when
> >> >>>> > > creating and running Flink jobs. A server will be considered when we
> >> >>>> > > encounter more requirements in the future and need a resident service
> >> >>>> > > management.
> >> >>>> > >
> >> >>>> > > Best,
> >> >>>> > > Shammon FY
> >> >>>> > >
> >> >>>> > > On Fri, Jun 2, 2023 at 5:55 PM Kelu Tao <[email protected]> wrote:
> >> >>>> > >
> >> >>>> > > > +1
> >> >>>> > > >
> >> >>>> > > > cool job ~
> >> >>>> > > >
> >> >>>> > > > For this PIP, do we need to introduce a new server for the information
> >> >>>> > > > serving?
> >> >>>> > > >
> >> >>>> > > > On 2023/05/31 02:28:21 Shammon FY wrote:
> >> >>>> > > > > Hi devs,
> >> >>>> > > > >
> >> >>>> > > > > We would like to start a discussion about PIP-5: Paimon Table And Data
> >> >>>> > > > > Lineage For Flink[1].
> >> >>>> > > > >
> >> >>>> > > > > As a streaming lake, users can use Paimon integrated with Flink to complete
> >> >>>> > > > > the entire ETL processing. In this process, users need to manage batch &
> >> >>>> > > > > streaming jobs and data streams, including batch & streaming data
> >> >>>> > > > > validation, job debug, and data revision. To support the above ability, we
> >> >>>> > > > > introduce table and data lineage for Flink & Paimon. Users can conveniently
> >> >>>> > > > > manage the entire ETL processing based on lineage information.
> >> >>>> > > > >
> >> >>>> > > > > Looking forward to hearing from you, thanks.
> >> >>>> > > > >
> >> >>>> > > > > [1]
> >> >>>> > > > > https://cwiki.apache.org/confluence/display/PAIMON/PIP-5%3A+Paimon+Table+And+Data+Lineage+For+Flink
> >> >>>> > > > >
> >> >>>> > > > > Best,
> >> >>>> > > > > Shammon FY
