I am really excited to see Paimon become an independent ASF incubation project, and I am happy to be a mentor of the project.

Re Dave,

The plan is for Paimon to eventually graduate as a TLP in its own right. The project was bootstrapped as a subproject of Flink because 1) it was designed to provide stream- and batch-unified storage, which matches the vision of Flink as a stream- and batch-unified engine, and 2) it was developed by the same team that works on Flink.

Now that there have been a few releases, we see strong and reasonable use cases from users who want Paimon (flink-table-store) to work with engines other than Flink, such as Spark and Trino. Continuing to keep Paimon as a subproject of Flink might unnecessarily limit the development of the project and is somewhat misleading to users. Given its scope, we believe it makes a lot of sense for Paimon to be incubated on its own, independent of Flink.

There has been a thorough discussion [1] and vote [2] about this among the Flink PMC.

Cheers,

Jiangjie (Becket) Qin

[1] https://lists.apache.org/thread/2ybxfg3zrzn4l3tnq3w2w3xvkhk0f9jk
[2] https://lists.apache.org/thread/95wyc51rfmsqc9osc86q7zx3491m7bvt

On Fri, Feb 24, 2023 at 12:10 PM Dave Fisher <wave4d...@comcast.net> wrote:

> An interesting proposal. Since Paimon is already part of Apache Flink, does
> the podling intend to graduate as its own Top Level Project? Or is the
> plan currently to become a subproject of Flink? I'm just curious. Were
> there any discussions within the Flink community about incubating Paimon?
>
> Best Regards,
> Dave
>
> Sent from my iPhone
>
> > On Feb 23, 2023, at 7:58 PM, Yu Li <car...@gmail.com> wrote:
> >
> > Revision: the hyperlink of the first reference is incorrect; please use
> > the website address directly instead of clicking it (sorry for my mistake).
> >
> > For easier reference: https://github.com/apache/flink-table-store
> >
> > Best Regards,
> > Yu
> >
> >> On Fri, 24 Feb 2023 at 11:48, Yu Li <car...@gmail.com> wrote:
> >>
> >> Hi All,
> >>
> >> I would like to propose Paimon [1] as a new Apache incubator project, and
> >> you can find the proposal [2] of Paimon for more details.
> >>
> >> Paimon is a unified lake storage to build dynamic tables for both stream
> >> and batch processing with big data compute engines (Apache Flink, Apache
> >> Spark, Apache Hive, Trino, etc.), supporting high-speed data ingestion and
> >> real-time data query. With the adoption of stream processing in production,
> >> there is an increasing demand for storage to simultaneously support updates,
> >> deletes and streaming reads, which cannot be fully satisfied by existing
> >> lake storages. To tackle these new challenges, Paimon natively adopts LSM
> >> (Log-Structured Merge-tree) as its underlying data structure, and provides
> >> enhanced performance for data with primary keys (besides the common lake
> >> storage capabilities). What's more, Paimon supports both batch and stream
> >> operations (reads and writes), facilitating applications pursuing
> >> batch-stream-unified semantics. Specifically:
> >>
> >> 1. Paimon provides excellent performance on the intensive update / delete
> >>    workload, leveraging the append-write feature of the LSM data structure.
> >>
> >> 2. Paimon utilizes the ordered feature of LSM to support effective filter
> >>    pushdown, and could reduce the latency of queries with primary key
> >>    filtering to milliseconds.
> >>
> >> 3. Paimon supports various (row-based or row-columnar) file formats
> >>    including Apache Avro, Apache ORC and Apache Parquet (rows will be
> >>    sorted by the primary key before writing out).
> >>
> >> 4. Tables provided by Paimon can be queried by various engines, including
> >>    Apache Flink, Apache Spark, Apache Hive, Trino, etc.
> >>
> >> 5. Paimon's metadata is self-managed, stored on the distributed file
> >>    system and can be synchronized to Hive metastore (HMS).
> >>
> >> 6. Besides the common batch read and write support, Paimon also supports
> >>    streaming read and change data feed.
> >>
> >> Paimon has been used by various users and companies, including Alibaba,
> >> Bilibili, ByteDance and so on. Paimon is also integrated into Alibaba
> >> Cloud's E-MapReduce and Realtime Compute products to provide cloud services.
> >>
> >> Paimon was founded in the Flink community in 2022 with the name of
> >> "Flink Table Store". It has been developed for more than one year and
> >> produced 4 formal releases. As its adoption expands to more computing
> >> engines, some users in the ecosystem have expressed concerns about the
> >> neutrality of the project. This makes us rethink the positioning of Flink
> >> Table Store, which can be an independent lake storage.
> >>
> >> With adequate discussions, we have got the support from the Flink
> >> community to enter Apache incubation [3] [4], with the below expectations:
> >>
> >> 1. Expand Paimon's ecosystem, providing independent Java APIs to support
> >>    reading and writing from more big data engines such as Apache Doris,
> >>    Apache Hive, Apache Presto, Apache Spark, Trino, etc.
> >>
> >> 2. Supplement key capabilities, especially streaming reads and intensive
> >>    updates/deletes, for creating a unified and easy-to-use streaming data
> >>    warehouse (lakehouse).
> >>
> >> 3. Grow into a more vibrant and neutral open source community.
> >>
> >> And we believe the Paimon project will provide tremendous value for the
> >> community if it is introduced into the Apache incubator.
> >>
> >> I will help this project as the champion and mentor the project together
> >> with three other mentors (many thanks):
> >>
> >> * Becket Qin (j...@apache.org)
> >> * Robert Metzger (rmetz...@apache.org)
> >> * Stephan Ewen (se...@apache.org)
> >>
> >> Look forward to your feedback. Thanks.
> >>
> >> Best Regards,
> >> Yu
> >>
> >> [1] https://github.com/apache/flink-table-store
> >> <https://github.com/alibaba/RemoteShuffleService>
> >>
> >> [2] https://cwiki.apache.org/confluence/display/INCUBATOR/PaimonProposal
> >>
> >> [3] https://lists.apache.org/thread/2ybxfg3zrzn4l3tnq3w2w3xvkhk0f9jk
> >>
> >> [4] https://lists.apache.org/thread/kn7c08cr4l0ynt551yfjqvzh5ns226r6