Re: [DISCUSS] Incubating Proposal for Paimon

2023-03-05 Thread Yu Li
Thanks all for the comments and positive feedback. Let me start a vote.

Best Regards,
Yu


Re: [DISCUSS] Incubating Proposal for Paimon

2023-03-02 Thread Robert Metzger
Thanks for the proposal. I'm happy to act as a mentor for the project.
I respect the desire to go through the regular incubation process, and
maybe it is a good thing for the Paimon community to revisit some of the
processes and customs and develop their own style, independent of Flink, as
part of the incubation process.

I have no doubt regarding the technical or "community building" ability of
the initial team.


RE: [DISCUSS] Incubating Proposal for Paimon

2023-02-28 Thread Stephan Ewen
Hi all!

This is a great project idea and proposal. I have been following this
within the Flink project and like the vision a lot.
If another mentor is needed, I can step in.

I had also mentioned to the team members previously that this might just as
well go directly to TLP, but I respect their wish to go through incubation.

Best,
Stephan


On 2023/02/24 03:48:22 Yu Li wrote:
> Hi All,
>
>
> I would like to propose Paimon [1] as a new Apache Incubator project, and
> you can find the proposal [2] of Paimon for more details.
>
>
> Paimon is a unified lake storage for building dynamic tables for both stream
> and batch processing with big data compute engines (Apache Flink, Apache
> Spark, Apache Hive, Trino, etc.), supporting high-speed data ingestion and
> real-time data queries.
> With the adoption of stream processing in production, there is an
> increasing demand for storage that simultaneously supports updates,
> deletes and streaming reads, which existing lake storages cannot fully
> satisfy. To tackle these new challenges, Paimon natively adopts the LSM
> (Log-Structured Merge-tree) as its underlying data structure, and provides
> enhanced performance for data with primary keys (in addition to the common
> lake storage capabilities). What's more, Paimon supports both batch and
> stream operations (reads and writes), facilitating applications that
> pursue batch-stream-unified semantics. Specifically:
>
>
> 1. Paimon provides excellent performance on intensive update/delete
> workloads, leveraging the append-write feature of the LSM data structure.
>
> 2. Paimon utilizes the ordered feature of LSM to support effective filter
> pushdown, and can reduce the latency of queries with primary key
> filtering to milliseconds.
>
> 3. Paimon supports various (row-based or row-columnar) file formats,
> including Apache Avro, Apache ORC and Apache Parquet (rows are sorted by
> the primary key before being written out).
>
> 4. Tables provided by Paimon can be queried by various engines, including
> Apache Flink, Apache Spark, Apache Hive, Trino, etc.
>
> 5. Paimon's metadata is self-managed, stored on the distributed file
> system, and can be synchronized to the Hive metastore (HMS).
>
> 6. Besides the common batch read and write support, Paimon also supports
> streaming reads and change data feed.
>
>
> Paimon has been used by various users and companies, including
> Alibaba, Bilibili and ByteDance. Paimon is also integrated into
> Alibaba Cloud's E-MapReduce and Realtime Compute products to provide
> cloud services.
>
>
> Paimon was founded in the Flink community in 2022 under the name
> "Flink Table Store".
> It has been developed for more than one year and has produced 4 formal
> releases. As its adoption expands to more computing engines, some users
> in the wider ecosystem have expressed concerns about the neutrality of the
> project. This made us rethink the positioning of Flink Table Store,
> which can stand on its own as an independent lake storage.
>
>
> After thorough discussion, we have obtained the support of the Flink
> community to enter the Apache Incubator [3] [4], with the following
> expectations:
>
> 1. Expand Paimon's ecosystem, providing independent Java APIs to support
> reading and writing from more big data engines such as Apache Doris,
> Apache Hive, Presto, Apache Spark, Trino, etc.
>
> 2. Supplement key capabilities, especially streaming reads and intensive
> updates/deletes, for creating a unified and easy-to-use streaming data
> warehouse (lakehouse).
>
> 3. Grow into a more vibrant and neutral open source community.
>
>
> We believe the Paimon project will provide tremendous value for the
> community if it is introduced into the Apache Incubator.
>
>
> I will help this project as its champion and mentor it together
> with three other mentors (many thanks):
>
>
> * Becket Qin (j...@apache.org)
>
> * Robert Metzger (rmetz...@apache.org)
>
> * Stephan Ewen (se...@apache.org)
>
>
> Looking forward to your feedback. Thanks.
>
>
> Best Regards,
> Yu
>
> [1] https://github.com/apache/flink-table-store
> 
>
> [2] https://cwiki.apache.org/confluence/display/INCUBATOR/PaimonProposal
>
> [3] https://lists.apache.org/thread/2ybxfg3zrzn4l3tnq3w2w3xvkhk0f9jk
>
> [4] https://lists.apache.org/thread/kn7c08cr4l0ynt551yfjqvzh5ns226r6
>
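Points 1 and 2 of the proposal quoted above rest on two LSM properties: writes, including updates and deletes, are appended as new records instead of rewriting existing files, and each data file is sorted by primary key so that a key filter can skip files whose key range cannot match. The Python sketch below is only a toy illustration of those two ideas under simplifying assumptions (in-memory "files", string keys, `None` as a delete tombstone, no compaction). It is not Paimon's implementation or API, and every name in it (`ToyLSM`, `SortedRun`, `memtable_limit`, `runs_scanned`) is hypothetical.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical toy model, NOT Paimon's implementation or API.
# A value of None is used as a tombstone that marks a delete.
Entry = Tuple[str, int, Optional[str]]  # (primary key, sequence number, value)


@dataclass
class SortedRun:
    """An immutable 'data file': entries sorted by key, plus min/max key stats."""
    keys: List[str]
    entries: List[Entry]
    min_key: str
    max_key: str


class ToyLSM:
    def __init__(self, memtable_limit: int = 2) -> None:
        self.memtable: Dict[str, Tuple[int, Optional[str]]] = {}
        self.runs: List[SortedRun] = []
        self.seq = 0
        self.memtable_limit = memtable_limit
        self.runs_scanned = 0  # how many runs the last get() actually searched

    def _append(self, key: str, value: Optional[str]) -> None:
        # Updates and deletes are just new records with a higher sequence
        # number; nothing already written to a run is modified in place.
        self.seq += 1
        self.memtable[key] = (self.seq, value)
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def put(self, key: str, value: str) -> None:
        self._append(key, value)

    def delete(self, key: str) -> None:
        self._append(key, None)

    def flush(self) -> None:
        # Sort buffered records by primary key and write them out as a new run.
        if not self.memtable:
            return
        entries = sorted(((k, s, v) for k, (s, v) in self.memtable.items()),
                         key=lambda e: e[0])
        keys = [e[0] for e in entries]
        self.runs.append(SortedRun(keys, entries, keys[0], keys[-1]))
        self.memtable = {}

    def get(self, key: str) -> Optional[str]:
        # Merge on read: the record with the highest sequence number wins.
        self.runs_scanned = 0
        best = self.memtable.get(key)
        for run in self.runs:
            # Filter pushdown: skip runs whose key range cannot contain the key.
            if not (run.min_key <= key <= run.max_key):
                continue
            self.runs_scanned += 1
            i = bisect_left(run.keys, key)  # binary search inside the sorted run
            if i < len(run.keys) and run.keys[i] == key:
                _, seq, value = run.entries[i]
                if best is None or seq > best[0]:
                    best = (seq, value)
        return None if best is None else best[1]  # None: missing or deleted


if __name__ == "__main__":
    db = ToyLSM(memtable_limit=2)
    db.put("a", "1")
    db.put("b", "2")   # buffer full: flushed as run [a, b]
    db.put("c", "3")
    db.put("a", "10")  # the update of "a" lands in a new run [a, c]
    db.delete("b")
    db.put("x", "9")   # tombstone for "b" is flushed in run [b, x]
    print(db.get("a"), db.runs_scanned)  # prints: 10 2 (run [b, x] is skipped)
```

Reads resolve each key by sequence number, so the newest update or tombstone wins without touching older files; a real LSM store would additionally compact runs in the background to keep read cost bounded, which this sketch leaves out.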


Re: [DISCUSS] Incubating Proposal for Paimon

2023-02-27 Thread Becket Qin
I am really excited to see Paimon become an independent ASF incubation
project, and I am happy to be a mentor of the project.

Re Dave,

The plan is to let Paimon eventually graduate as a TLP by itself. The
project was bootstrapped as a subproject of Flink because 1) it was designed to
provide a stream and batch unified storage, which matches the vision of
Flink as a stream and batch unified engine, and 2) the project was developed
by the same team that works on Flink.

Now that there have been a few releases, we see strong and reasonable use
cases from users who want Paimon (flink-table-store) to work with engines
other than Flink, such as Spark and Trino. Keeping Paimon as a
subproject of Flink might unnecessarily limit the development of the project
and is somewhat misleading to users. Given its scope, we believe it
makes a lot of sense for Paimon to be incubated on its own, independent of
Flink. There has been a thorough discussion[1] and vote[2] about this among
the Flink PMC.

Cheers,

Jiangjie (Becket) Qin

[1] https://lists.apache.org/thread/2ybxfg3zrzn4l3tnq3w2w3xvkhk0f9jk
[2] https://lists.apache.org/thread/95wyc51rfmsqc9osc86q7zx3491m7bvt


Re: [DISCUSS] Incubating Proposal for Paimon

2023-02-26 Thread Yu Li
Thanks for the reference and suggestions, Willem and Justin.

TL;DR: I conveyed the reminder to the team (initial committers) over
the weekend, and after careful consideration they (still) decided to join
the Incubator.

Looking at the history of Apache Camel, it was established as a sub-project of
Apache ActiveMQ in April 2007 [1], and becoming a TLP (directly out of
ActiveMQ) was discussed in Nov. 2008 [2], a span of about 1.5 years (faster,
but still long enough). In comparison, Paimon became a sub-project of Apache
Flink in Feb. 2022, and becoming an incubator project was discussed in
Dec. 2022, after only 10 months of development, when it was less mature.

I also conveyed the suggestion to several of Paimon's initial committers, and
they are all grateful for the reminder and the trust from the IPMC. However,
the team believes the Incubator could teach and help them to build a more
vibrant and neutral community, and to become better qualified to be a TLP.

Therefore, we'd like to continue the incubation proposal discussion here,
and comments and feedback are warmly welcome.

Best Regards,
Yu

[1] https://lists.apache.org/thread/zdpzpmbo2lwrdqt9ry7ygzqj6t8z78h6
[2] https://lists.apache.org/thread/s4k5njgz1nstdc9rt0rndyn144t87z0d


Re: [DISCUSS] Incubating Proposal for Paimon

2023-02-24 Thread Justin Mclean
Hi,

If you feel you need to go through the Incubator, we'll accept the project.

Kind Regards,
Justin


Re: [DISCUSS] Incubating Proposal for Paimon

2023-02-24 Thread Willem Jiang
Hi Yu,

Just so you know, Apache Camel was a subproject of Apache ActiveMQ,
and it became a TLP without going through the Incubator.
It's common to turn a subproject directly into a TLP if most developers
know how to run an ASF project.

Willem Jiang

Twitter: willemjiang
Weibo: 姜宁willem

-
To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
For additional commands, e-mail: general-h...@incubator.apache.org



Re: [DISCUSS] Incubating Proposal for Paimon

2023-02-24 Thread Yu Li
Thanks all for the interest.

Paimon is already a sub-project of Apache Flink [1], and yes we intend to
graduate as its own Top Level Project after incubating.

And yes, a similar question was raised during the discussion in the Flink
community [2]; please allow me to quote some of it below for easier
reference:
===
Apache has a tradition of incubating projects out of mature sub-projects of
top-level projects; for example, Apache HBase [3] and Apache Ozone [4] were
both born as sub-projects of Apache Hadoop. Although HBase and Ozone both
became top-level projects directly, they were also developed for more than
3 years as sub-projects, and we wonder whether table store is mature enough
to be a top-level project. On the other hand, we believe the table store
project is qualified to enter Apache incubation as an innovative lake
storage, and has the potential to graduate in a year or two.
===

However, it's true that we are not quite sure whether the project is
qualified to become a TLP directly, so we took a conservative
approach. Please feel free to let us know if you have any suggestions.
Thanks.

Best Regards,
Yu

[1] https://lists.apache.org/thread/h3hkvt3nq9gfblt3k0w2pr6xrg54pqxt
[2] https://lists.apache.org/thread/hgpcg6mhhoblzbnh1opkrxynmc93vxzp
[3] https://www.mail-archive.com/hbase-dev@hadoop.apache.org/msg18236.html
[4] https://cwiki.apache.org/confluence/display/HADOOP/Ozone+Hadoop+subproject+to+Apache+TLP+proposal


Re: [DISCUSS] Incubating Proposal for Paimon

2023-02-23 Thread Justin Mclean
Hi,

Given that most of the people involved are familiar with the ASF and how
projects should operate, I wonder if you should not just become a subproject of
Flink or a separate TLP? Are there any reasons you think the project needs to go
through incubation?

Kind Regards,
Justin
-
To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
For additional commands, e-mail: general-h...@incubator.apache.org



Re: [DISCUSS] Incubating Proposal for Paimon

2023-02-23 Thread Dave Fisher
An interesting proposal. Since Paimon is already part of Apache Flink, does the
podling intend to graduate as its own Top Level Project? Or is the plan
currently to become a subproject of Flink? I’m just curious. Were there any
discussions within the Flink community about incubating Paimon?

Best Regards,
Dave

Sent from my iPhone

> On Feb 23, 2023, at 7:58 PM, Yu Li  wrote:
> 
> Correction: the hyperlink of the first reference is incorrect; please use
> the website address directly instead of clicking it (sorry for my mistake).
> 
> For easier reference: https://github.com/apache/flink-table-store
> 
> Best Regards,
> Yu
> 
> 
>> On Fri, 24 Feb 2023 at 11:48, Yu Li  wrote:
>> 
>> Hi All,
>> 
>> 
>> I would like to propose Paimon [1] as a new Apache Incubator project, and
>> you can find the proposal [2] of Paimon for more details.
>> 
>> 
>> Paimon is a unified lake storage for building dynamic tables for both stream
>> and batch processing with big data compute engines (Apache Flink, Apache
>> Spark, Apache Hive, Trino, etc.), supporting high-speed data ingestion and
>> real-time data queries.
>> With the adoption of stream processing in production, there is an
>> increasing demand for storage that simultaneously supports updates,
>> deletes and streaming reads, which existing lake storages cannot fully
>> satisfy. To tackle these new challenges, Paimon natively adopts the LSM
>> (Log-Structured Merge-tree) as its underlying data structure, and provides
>> enhanced performance for data with primary keys (in addition to the common
>> lake storage capabilities). What's more, Paimon supports both batch and
>> stream operations (reads and writes), facilitating applications that
>> pursue batch-stream-unified semantics. Specifically:
>> 
>> 
>> 1. Paimon delivers high performance on update- and delete-intensive
>> workloads, because the LSM tree records updates and deletes as
>> append-only writes.
>> 
>> 2. Paimon exploits the sorted order of LSM data to support effective
>> filter pushdown, which can reduce the latency of queries that filter on
>> the primary key to milliseconds.
>> 
>> 3. Paimon supports various file formats, both row-based and columnar,
>> including Apache Avro, Apache ORC and Apache Parquet (rows are sorted by
>> primary key before being written out).
>> 
>> 4. Tables provided by Paimon can be queried by various engines, including
>> Apache Flink, Apache Spark, Apache Hive, Trino, etc.
>> 
>> 5. Paimon's metadata is self-managed, stored on the distributed file
>> system, and can be synchronized to the Hive Metastore (HMS).
>> 
>> 6. Besides common batch reads and writes, Paimon also supports streaming
>> reads and change data feeds.
>> 
>> 
>> 
>> Paimon is used by various users and companies, including Alibaba,
>> Bilibili and ByteDance. It is also integrated into Alibaba Cloud's
>> E-MapReduce and Realtime Compute products to provide cloud services.
>> 
>> 
>> Paimon was founded in the Flink community in 2022 under the name "Flink
>> Table Store". It has been under development for more than a year and has
>> produced 4 formal releases. As its adoption expands to more computing
>> engines, some users in the wider ecosystem have expressed concerns about
>> the neutrality of the project. This made us rethink the positioning of
>> Flink Table Store, which can stand on its own as an independent lake
>> storage.
>> 
>> 
>> After thorough discussions, we have obtained the support of the Flink
>> community to enter Apache incubation [3] [4], with the following
>> expectations:
>> 
>> 1. Expand Paimon's ecosystem by providing independent Java APIs that
>> support reading and writing from more big data engines, such as Apache
>> Doris, Apache Hive, Presto, Apache Spark, Trino, etc.
>> 
>> 2. Supplement key capabilities, especially streaming reads and intensive
>> updates/deletes, for building a unified and easy-to-use streaming data
>> warehouse (lakehouse).
>> 
>> 3. Grow into a more vibrant and neutral open source community.
>> 
>> 
>> We believe the Paimon project will provide tremendous value to the
>> community if it is accepted into the Apache Incubator.
>> 
>> 
>> I will act as the champion of this project and mentor it together with
>> three other mentors (many thanks):
>> 
>> 
>> * Becket Qin (j...@apache.org)
>> 
>> * Robert Metzger (rmetz...@apache.org)
>> 
>> * Stephan Ewen (se...@apache.org)
>> 
>> 
>> Looking forward to your feedback. Thanks.
>> 
>> 
>> Best Regards,
>> Yu
>> 
>> [1] https://github.com/apache/flink-table-store
>> 
>> 
>> [2] https://cwiki.apache.org/confluence/display/INCUBATOR/PaimonProposal
>> 
>> [3] https://lists.apache.org/thread/2ybxfg3zrzn4l3tnq3w2w3xvkhk0f9jk
>> 
>> [4] https://lists.apache.org/thread/kn7c08cr4l0ynt551yfjqvzh5ns226r6
>> 
>> 
>> 



Re: [DISCUSS] Incubating Proposal for Paimon

2023-02-23 Thread Atri Sharma
Very interesting.

I would like to join as a mentor, if needed.

Atri


-- 
Regards,

Atri
Apache Concerted




Re: [DISCUSS] Incubating Proposal for Paimon

2023-02-23 Thread Yu Li
Correction: the hyperlink for the first reference [1] in my previous email is
incorrect. Please use the web address directly instead of clicking the link
(sorry for my mistake).

For easier reference: https://github.com/apache/flink-table-store

Best Regards,
Yu

