Re: [DISCUSSION] Incubating Proposal of Uniffle

Daniel Widdis Tue, 24 May 2022 18:53:11 -0700

+1 (non-binding) from me!  Good luck!

On 5/24/22, 9:05 AM, "Jerry Shao" <js...@apache.org> wrote:

Hi all,

Because of the naming issue raised in the earlier thread (
https://lists.apache.org/thread/y07xjkqzvpchncym9zr1hgm3c4l4ql0f), we
have chosen a new project name, "Uniffle", and started a new thread.
Please join the discussion.

We would like to propose Uniffle [1] as a new Apache Incubator project.
The full proposal is available at [2].

Uniffle is a high-performance, general-purpose Remote Shuffle Service for
distributed compute engines such as Apache Spark
<https://spark.apache.org/>, Apache Hadoop MapReduce
<https://hadoop.apache.org/>, and Apache Flink
<https://flink.apache.org/>. We aim to make Uniffle a universal shuffle
service for distributed compute engines.

Shuffle is the step in which a distributed compute engine exchanges data
between distributed tasks, so its performance and stability directly
affect the whole job. The current “local file pull-like shuffle style”
has several limitations:

1. The current shuffle struggles to support very large workloads,
especially in a high-load environment. The main problem is IO (random
disk IO, network congestion, and timeouts).
2. The current shuffle is hard to deploy in a disaggregated compute and
storage environment, because disk capacity on compute nodes is quite
limited.
3. The constraint of storing shuffle data locally makes it hard to scale
elastically.

Remote Shuffle Service is a key technology for enterprises to build big
data platforms, to extend big data applications to disaggregated,
online-offline hybrid environments, and to solve the problems above.

This implementation of Remote Shuffle Service, “Uniffle”, is widely used
at Tencent and has shown its advantages in production. Other enterprises
have also adopted, or are preparing to adopt, Uniffle in their
environments.

Uniffle's key idea comes from Sailfish
<https://www.researchgate.net/publication/262241541_Sailfish_a_framework_for_large_scale_data_processing>.
It has several key design goals:

1. High performance. Uniffle’s performance is close to that of local
file based shuffle for small workloads. For large workloads, it is far
better than the current shuffle style.
2. Fault tolerance. Uniffle provides high availability for Coordinator
nodes, and failover for Shuffle nodes.
3. Pluggable. Uniffle is highly pluggable and can be adapted to
different compute engines, different backend storage systems, and
different wire protocols.

We believe that the Uniffle project will provide great value to the
community if it is accepted by the Apache Incubator.

I will serve as champion for this project. Many thanks to the mentors:

- Felix Cheung (felixche...@apache.org)
- Junping Du (junping...@apache.org)
- Weiwei Yang (w...@apache.org)
- Xun Liu (liu...@apache.org)
- Zhankun Tang (zt...@apache.org)

[1] https://github.com/Tencent/Firestorm
[2] https://cwiki.apache.org/confluence/display/INCUBATOR/UniffleProposal

Best regards,
Jerry

---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
For additional commands, e-mail: general-h...@incubator.apache.org
