I am really excited to see Paimon become an independent ASF incubation
project, and I am happy to be a mentor of the project.

Re Dave,

The plan is to let Paimon eventually graduate as a TLP by itself. The
project was bootstrapped as a subproject of Flink because 1) it was designed
to provide a stream and batch unified storage, which matches the vision of
Flink as a stream and batch unified engine, and 2) the project was developed
by the same team that works on Flink.

Now that there have been a few releases, we see strong and reasonable use
cases from users who want Paimon (flink-table-store) to work with engines
other than Flink, such as Spark and Trino. Keeping Paimon as a subproject
of Flink might unnecessarily limit the development of the project and is
somewhat misleading to users. Given its scope, we believe it makes a lot of
sense for Paimon to be incubated on its own, independent of Flink. There
has been a thorough discussion[1] and vote[2] about this among the Flink
PMC.

Cheers,

Jiangjie (Becket) Qin

[1] https://lists.apache.org/thread/2ybxfg3zrzn4l3tnq3w2w3xvkhk0f9jk
[2] https://lists.apache.org/thread/95wyc51rfmsqc9osc86q7zx3491m7bvt

On Fri, Feb 24, 2023 at 12:10 PM Dave Fisher <wave4d...@comcast.net> wrote:

> An interesting proposal. Since Paimon is already part of Apache Flink does
> the podling intend to graduate as its own Top Level Project? Or, is the
> plan currently to become a subproject of Flink? I’m just curious. Were
> there any discussions within the Flink community about incubating Paimon?
>
> Best Regards,
> Dave
>
> Sent from my iPhone
>
> > On Feb 23, 2023, at 7:58 PM, Yu Li <car...@gmail.com> wrote:
> >
> > Revision: the hyperlink of the first reference is incorrect and please
> > use the website address directly instead of clicking it (sorry for my
> > mistake).
> >
> > For easier reference: https://github.com/apache/flink-table-store
> >
> > Best Regards,
> > Yu
> >
> >
> >> On Fri, 24 Feb 2023 at 11:48, Yu Li <car...@gmail.com> wrote:
> >>
> >> Hi All,
> >>
> >>
> >> I would like to propose Paimon [1] as a new Apache Incubator project;
> >> you can find more details in the Paimon proposal [2].
> >>
> >>
> >> Paimon is a unified lake storage to build dynamic tables for both stream
> >> and batch processing with big data compute engines (Apache Flink, Apache
> >> Spark, Apache Hive, Trino, etc.), supporting high-speed data ingestion
> >> and real-time data query. With the adoption of stream processing in
> >> production, there is an increasing demand for storage to simultaneously
> >> support updates, deletes and streaming reads, which cannot be fully
> >> satisfied by existing lake storages. To tackle these new challenges,
> >> Paimon natively adopts LSM (Log-Structured Merge-tree) as its underlying
> >> data structure, and provides enhanced performance for data with primary
> >> keys (besides the common lake storage capabilities). What's more, Paimon
> >> supports both batch and stream operations (reads and writes),
> >> facilitating applications pursuing batch-stream-unified semantics.
> >> Specifically:
> >>
> >>
> >> 1. Paimon provides excellent performance on the intensive update
> >> / delete workload, leveraging the append-write feature of the LSM data
> >> structure.
> >>
> >> 2. Paimon utilizes the ordered feature of LSM to support effective
> >> filter pushdown, and could reduce the latency of queries with primary
> >> key filtering to milliseconds.
> >>
> >> 3. Paimon supports various (row-based or row-columnar) file formats
> >> including Apache Avro, Apache ORC and Apache Parquet (rows will be
> >> sorted by the primary key before writing out).
> >>
> >> 4. Tables provided by Paimon can be queried by various engines,
> >> including Apache Flink, Apache Spark, Apache Hive, Trino, etc.
> >>
> >> 5. Paimon's metadata is self-managed, stored on the distributed file
> >> system and can be synchronized to Hive metastore (HMS).
> >>
> >> 6. Besides the common batch read and write support, Paimon also supports
> >> streaming read and change data feed.
> >>
> >>
> >>
> >> Paimon has been used by various users and companies, including Alibaba,
> >> Bilibili, ByteDance and so on. Paimon is also integrated into Alibaba
> >> Cloud's E-MapReduce and Realtime Compute products to provide cloud
> >> services.
> >>
> >>
> >> Paimon was founded in the Flink community in 2022 with the name of
> >> "Flink Table Store". It has been developed for more than one year and
> >> produced 4 formal releases. As its adoption expands to more computing
> >> engines, some ecosystem users have expressed concerns about the
> >> neutrality of the project. This has made us rethink the positioning of
> >> Flink Table Store, which can be an independent lake storage.
> >>
> >>
> >> After adequate discussion, we have the support of the Flink community
> >> to enter Apache incubation [3] [4], with the following expectations:
> >>
> >> 1. Expand Paimon's ecosystem, providing independent Java APIs to support
> >> reading and writing from more big data engines such as Apache Doris,
> >> Apache Hive, Presto, Apache Spark, Trino, etc.
> >>
> >> 2. Supplement key capabilities, especially streaming reads and intensive
> >> updates/deletes, for creating a unified and easy-to-use streaming data
> >> warehouse (lakehouse).
> >>
> >> 3. Grow into a more vibrant and neutral open source community.
> >>
> >>
> >> We believe the Paimon project will provide tremendous value for the
> >> community if it is accepted into the Apache Incubator.
> >>
> >>
> >> I will help this project as the champion and mentor the project together
> >> with three other mentors (many thanks):
> >>
> >>
> >> * Becket Qin (j...@apache.org)
> >>
> >> * Robert Metzger (rmetz...@apache.org)
> >>
> >> * Stephan Ewen (se...@apache.org)
> >>
> >>
> >> Look forward to your feedback. Thanks.
> >>
> >>
> >> Best Regards,
> >> Yu
> >>
> >> [1] https://github.com/apache/flink-table-store
> >> <https://github.com/alibaba/RemoteShuffleService>
> >>
> >> [2] https://cwiki.apache.org/confluence/display/INCUBATOR/PaimonProposal
> >>
> >> [3] https://lists.apache.org/thread/2ybxfg3zrzn4l3tnq3w2w3xvkhk0f9jk
> >>
> >> [4] https://lists.apache.org/thread/kn7c08cr4l0ynt551yfjqvzh5ns226r6
> >>
> >>
> >>
>
>
