An interesting proposal. Since Paimon is already part of Apache Flink, does the
podling intend to graduate as its own Top Level Project? Or is the current plan
to become a subproject of Flink? I'm just curious. Were there any
discussions within the Flink community about incubating Paimon?

Best Regards,
Dave


> On Feb 23, 2023, at 7:58 PM, Yu Li <car...@gmail.com> wrote:
> 
> Revision: the hyperlink of the first reference is incorrect and please use
> the website address directly instead of clicking it (sorry for my mistake).
> 
> For easier reference: https://github.com/apache/flink-table-store
> 
> Best Regards,
> Yu
> 
> 
>> On Fri, 24 Feb 2023 at 11:48, Yu Li <car...@gmail.com> wrote:
>> 
>> Hi All,
>> 
>> 
>> I would like to propose Paimon [1] as a new Apache Incubator project;
>> you can find more details in the Paimon proposal [2].
>> 
>> 
>> Paimon is a unified lake storage to build dynamic tables for both stream
>> and batch processing with big data compute engines (Apache Flink, Apache
>> Spark, Apache Hive, Trino, etc.), supporting high-speed data ingestion and
>> real-time data query. With the adoption of stream processing in production,
>> there is an increasing demand for storage to simultaneously support
>> updates, deletes and streaming reads, which cannot be fully satisfied by
>> existing lake storages. To tackle these new challenges, Paimon natively
>> adopts LSM (Log-Structured Merge-tree) as its underlying data structure, and
>> provides enhanced performance for data with primary keys (besides the common
>> lake storage capabilities). What's more, Paimon supports both batch and
>> stream operations (reads and writes), facilitating applications pursuing
>> batch-stream-unified semantics. Specifically:
>> 
>> 
>> 1. Paimon provides excellent performance on intensive update/delete
>> workloads, leveraging the append-write feature of the LSM data structure.
>> 
>> 2. Paimon utilizes the ordered feature of LSM to support effective filter
>> pushdown, and could reduce the latency of queries with primary key
>> filtering to milliseconds.
>> 
>> 3. Paimon supports various (row-based or row-columnar) file formats
>> including Apache Avro, Apache ORC and Apache Parquet (rows will be sorted
>> by the primary key before writing out).
>> 
>> 4. Tables provided by Paimon can be queried by various engines, including
>> Apache Flink, Apache Spark, Apache Hive, Trino, etc.
>> 
>> 5. Paimon's metadata is self-managed, stored on the distributed file system
>> and can be synchronized to Hive metastore (HMS).
>> 
>> 6. Besides the common batch read and write support, Paimon also supports
>> streaming read and change data feed.
>> 
>> 
>> 
>> Paimon has been used by various users and companies, including Alibaba, 
>> Bilibili, ByteDance and so on. Paimon is also integrated into Alibaba 
>> Cloud's E-MapReduce and Realtime Compute products to provide cloud services.
>> 
>> 
>> Paimon was founded in the Flink community in 2022 under the name "Flink
>> Table Store". It has been developed for more than one year and has produced
>> 4 formal releases. As its adoption expands to more computing engines, some
>> users in the ecosystem have expressed concerns about the neutrality of the
>> project. This has made us rethink the positioning of Flink Table Store,
>> which could be an independent lake storage.
>> 
>> 
>> After adequate discussion, we have the support of the Flink community
>> to enter Apache incubation [3] [4], with the following expectations:
>> 
>> 1. Expand Paimon's ecosystem, providing independent Java APIs to support
>> reading and writing from more big data engines such as Apache Doris,
>> Apache Hive, Presto, Apache Spark, Trino, etc.
>> 
>> 2. Supplement key capabilities, especially streaming reads and intensive
>> updates/deletes, for creating a unified and easy-to-use streaming data
>> warehouse (lakehouse).
>> 
>> 3. Grow into a more vibrant and neutral open source community.
>> 
>> 
>> And we believe the Paimon project will provide tremendous value for the
>> community if it is introduced into the Apache incubator.
>> 
>> 
>> I will help this project as the champion and mentor the project together
>> with three other mentors (many thanks):
>> 
>> 
>> * Becket Qin (j...@apache.org)
>> 
>> * Robert Metzger (rmetz...@apache.org)
>> 
>> * Stephan Ewen (se...@apache.org)
>> 
>> 
>> Looking forward to your feedback. Thanks.
>> 
>> 
>> Best Regards,
>> Yu
>> 
>> [1] https://github.com/apache/flink-table-store
>> <https://github.com/alibaba/RemoteShuffleService>
>> 
>> [2] https://cwiki.apache.org/confluence/display/INCUBATOR/PaimonProposal
>> 
>> [3] https://lists.apache.org/thread/2ybxfg3zrzn4l3tnq3w2w3xvkhk0f9jk
>> 
>> [4] https://lists.apache.org/thread/kn7c08cr4l0ynt551yfjqvzh5ns226r6
>> 
>> 
>> 


