Hi all!

This is a great project idea and proposal. I have been following this
within the Flink project and like the vision a lot.
If another mentor is needed, I can step in.

I also mentioned previously to the team members that this could just as
well go directly to TLP, but I respect their wish to go through incubation.

Best,
Stephan


On 2023/02/24 03:48:22 Yu Li wrote:
> Hi All,
>
>
> I would like to propose Paimon [1] as a new Apache Incubator project.
> You can find more details in the Paimon proposal [2].
>
>
> Paimon is a unified lake storage to build dynamic tables for both stream
> and batch processing with big data compute engines (Apache Flink, Apache
> Spark, Apache Hive, Trino, etc.), supporting high-speed data ingestion and
> real-time data query.
> With the adoption of stream processing in production, there is an
> increasing demand for storage to simultaneously support updates, deletes
> and streaming reads, which cannot be fully satisfied by existing lake
> storages. To tackle these new challenges, Paimon natively adopts LSM
> (Log-Structured Merge-tree) as its underlying data structure, and provides
> enhanced performance for data with primary keys (besides the common lake
> storage capabilities). What's more, Paimon supports both batch and stream
> operations (reads and writes), facilitating applications pursuing
> batch-stream-unified semantics. Specifically:
>
>
> 1. Paimon provides excellent performance on the intensive update / delete
> workload, leveraging the append-write feature of the LSM data structure.
>
> 2. Paimon utilizes the ordered feature of LSM to support effective filter
> pushdown, and could reduce the latency of queries with primary key
> filtering to milliseconds.
>
> 3. Paimon supports various (row-based or row-columnar) file formats
> including Apache Avro, Apache ORC and Apache Parquet (rows will be
> sorted by the primary key before writing out).
>
> 4. Tables provided by Paimon can be queried by various engines, including
> Apache Flink, Apache Spark, Apache Hive, Trino, etc.
>
> 5. Paimon's metadata is self-managed, stored on the distributed file
> system and can be synchronized to Hive metastore (HMS).
>
> 6. Besides the common batch read and write support, Paimon also supports
> streaming read and change data feed.
>
>
> Paimon has been used by various users and companies, including
> Alibaba, Bilibili, ByteDance and so on. Paimon is also integrated into
> Alibaba Cloud's E-MapReduce and Realtime Compute products to provide
> cloud services.
>
>
> Paimon was founded in the Flink community in 2022 with the name of
> "Flink Table Store".
> It has been under development for more than a year and has produced 4
> formal releases. As its adoption expands to more computing engines, some
> ecosystem users have expressed concerns about the neutrality of the
> project. This led us to rethink the positioning of Flink Table Store,
> which can be an independent lake storage.
>
>
> After adequate discussion, we have received support from the Flink
> community to enter Apache incubation [3] [4], with the following
> expectations:
>
> 1. Expand Paimon's ecosystem, providing independent Java APIs to support
> reading and writing from more big data engines such as Apache
> Doris, Apache Hive, Presto, Apache Spark, Trino, etc.
>
> 2. Supplement key capabilities, especially streaming reads and intensive
> updates/deletes, for creating a unified and easy-to-use streaming
> data warehouse (lakehouse).
>
> 3. Grow into a more vibrant and neutral open source community.
>
>
> We believe the Paimon project will provide tremendous value for the
> community if it is accepted into the Apache Incubator.
>
>
> I will help this project as the champion and mentor the project together
> with three other mentors (many thanks):
>
>
> * Becket Qin (j...@apache.org)
>
> * Robert Metzger (rmetz...@apache.org)
>
> * Stephan Ewen (se...@apache.org)
>
>
> Looking forward to your feedback. Thanks.
>
>
> Best Regards,
> Yu
>
> [1] https://github.com/apache/flink-table-store
>
> [2] https://cwiki.apache.org/confluence/display/INCUBATOR/PaimonProposal
>
> [3] https://lists.apache.org/thread/2ybxfg3zrzn4l3tnq3w2w3xvkhk0f9jk
>
> [4] https://lists.apache.org/thread/kn7c08cr4l0ynt551yfjqvzh5ns226r6
>
