Hi all,

We would like to propose Firestorm [1] as a new Apache Incubator project.
The full proposal is available at [2].

Firestorm is a high-performance, general-purpose Remote Shuffle Service for
distributed compute engines such as
Apache Spark <https://spark.apache.org/>,
Apache Hadoop MapReduce <https://hadoop.apache.org/>, and
Apache Flink <https://flink.apache.org/>. We aim to make Firestorm a
universal shuffle service for distributed compute engines.

Shuffle is the mechanism a distributed compute engine uses to exchange data
between distributed tasks, so its performance and stability directly affect
the whole job. The current “local file pull-like shuffle style” has several
limitations:

   1. Current shuffle has difficulty supporting very large workloads,
   especially in a high-load environment. The major problem is I/O: random
   disk I/O, network congestion, and timeouts.
   2. Current shuffle is hard to deploy in a disaggregated compute and
   storage environment, because disk capacity on compute nodes is quite
   limited.
   3. The constraint of storing shuffle data locally makes it hard to scale
   elastically.

Remote Shuffle Service is a key technology for enterprises to build big
data platforms, to expand big data applications to disaggregated,
online-offline hybrid environments, and to solve the problems above.

Firestorm, our implementation of Remote Shuffle Service, is widely used at
Tencent and has shown its advantages in production. Other enterprises have
also adopted, or are preparing to adopt, Firestorm in their environments.

Firestorm’s key idea comes from Sailfish shuffle
<https://www.researchgate.net/publication/262241541_Sailfish_a_framework_for_large_scale_data_processing>.
It has several key design goals:

   1. High performance. For small workloads, Firestorm’s performance is
   close to that of local-file-based shuffle. For large workloads, it is
   far better than the current shuffle style.
   2. Fault tolerance. Firestorm provides high availability for Coordinator
   nodes, and failover for Shuffle nodes.
   3. Pluggable. Firestorm is highly pluggable and can be adapted to
   different compute engines, different backend storage systems, and
   different wire protocols.

We believe that the Firestorm project will provide great value to the
community if it is accepted by the Apache Incubator.

I will serve as champion for this project. Many thanks to the three mentors:

   - Junping Du (junping...@apache.org)
   - Xun Liu (liu...@apache.org)
   - Zhankun Tang (zt...@apache.org)


[1] https://github.com/Tencent/Firestorm
[2] https://cwiki.apache.org/confluence/display/INCUBATOR/FirestormProposal

Best regards,
Jerry
