+1  (non-binding)

Good luck to Uniffle.

Bests,
Sammi

On Wed, 25 May 2022 at 00:05, Jerry Shao <js...@apache.org> wrote:

> Hi all,
>
> Due to the naming issue raised in the thread (
> https://lists.apache.org/thread/y07xjkqzvpchncym9zr1hgm3c4l4ql0f), we
> have chosen a new project name, "Uniffle", and started a new thread. Please
> help discuss.
>
> We would like to propose Uniffle [1] as a new Apache Incubator project; you
> can find the proposal at [2] for more details.
>
> Uniffle is a high-performance, general-purpose Remote Shuffle Service for
> distributed compute engines such as Apache Spark
> <https://spark.apache.org/>, Apache
> Hadoop MapReduce <https://hadoop.apache.org/>, and Apache Flink
> <https://flink.apache.org/>. We are aiming to make Uniffle a
> universal shuffle service for distributed compute engines.
>
> Shuffle is how a distributed compute engine exchanges data between
> distributed tasks, so the performance and stability of shuffle directly
> affect the whole job. The current “local file pull-like shuffle style” has
> several limitations:
>
>    1. The current shuffle struggles to support very large workloads,
>    especially in a high-load environment. The major problem is IO (random
>    disk IO, network congestion, and timeouts).
>    2. The current shuffle is hard to deploy in a disaggregated
>    compute-storage environment, as disk capacity is quite limited on
>    compute nodes.
>    3. The constraint of storing shuffle data locally makes it hard to scale
>    elastically.
>
> A Remote Shuffle Service is a key technology for enterprises that want to
> build big data platforms, extend big data applications to disaggregated,
> online-offline hybrid environments, and solve the problems above.
>
> This implementation of a Remote Shuffle Service, “Uniffle”, is heavily
> used at Tencent and has shown its advantages in production. Other
> enterprises have also adopted, or are preparing to adopt, Uniffle in their
> environments.
>
> Uniffle's key idea comes from Sailfish shuffle
> <https://www.researchgate.net/publication/262241541_Sailfish_a_framework_for_large_scale_data_processing>.
> It has several key design goals:
>
>    1. High performance. Uniffle’s performance is close to that of the local
>    file based shuffle style for small workloads. For large workloads, it is
>    far better than the current shuffle style.
>    2. Fault tolerance. Uniffle provides high availability for Coordinator
>    nodes, and failover for Shuffle nodes.
>    3. Pluggable. Uniffle is highly pluggable and can be adapted to
>    different compute engines, different backend storages, and different
>    wire-protocols.
>
> We believe that the Uniffle project will provide great value to the
> community if it is accepted by the Apache Incubator.
>
> I will help this project as champion, and many thanks to the mentors:
>
>    - Felix Cheung (felixche...@apache.org)
>    - Junping Du (junping...@apache.org)
>    - Weiwei Yang (w...@apache.org)
>    - Xun Liu (liu...@apache.org)
>    - Zhankun Tang (zt...@apache.org)
>
>
> [1] https://github.com/Tencent/Firestorm
> [2] https://cwiki.apache.org/confluence/display/INCUBATOR/UniffleProposal
>
> Best regards,
> Jerry
>
