Re: [DISCUSSION] Incubating Proposal of Uniffle

2022-05-30 Thread Jerry Shao
Hi all,

Thanks a lot for your suggestions and support. This thread has been open for
almost 7 days, so I'm going to close it and create a new vote thread.

Thanks
Jerry

Aloys Zhang wrote on Fri, May 27, 2022 at 06:29:

> +1 (non-binding) good luck
>
> Zhang Yonglun wrote on Thu, May 26, 2022 at 20:52:
>
> > +1 (non-binding)
> >
> > --
> >
> > Zhang Yonglun
> > Apache ShenYu (Incubating)
> > Apache ShardingSphere
> >
> > Jerry Shao wrote on Wed, May 25, 2022 at 00:07:
> > >
> > > Hi all,
> > >
> > > Due to the name issue in thread (
> > > https://lists.apache.org/thread/y07xjkqzvpchncym9zr1hgm3c4l4ql0f), we
> > > figured out a new project name "Uniffle" and created a new Thread.
> Please
> > > help to discuss.
> > >
> > > We would like to propose Uniffle[1] as a new Apache incubator project,
> > you
> > > can find the proposal here [2] for more details.
> > >
> > > Uniffle is a high performance, general purpose Remote Shuffle Service for
> > > distributed compute engines like Apache Spark, Apache Hadoop MapReduce,
> > > Apache Flink and so on. We are aiming to make Firestorm a
> > > universal shuffle service for distributed compute engines.
> > >
> > > Shuffle is the key part for a distributed compute engine to exchange
> the
> > > data between distributed tasks, the performance and stability of
> shuffle
> > > will directly affect the whole job. Current “local file pull-like
> shuffle
> > > style” has several limitations:
> > >
> > >1. Current shuffle is hard to support super large workloads,
> > especially
> > >in a high load environment, the major problem is IO problem (random
> > disk IO
> > >issue, network congestion and timeout).
> > >2. Current shuffle is hard to deploy on the disaggregated compute
> > >storage environment, as disk capacity is quite limited on compute
> > nodes.
> > >3. The constraint of storing shuffle data locally makes it hard to
> > scale
> > >elastically.
> > >
> > > Remote Shuffle Service is the key technology for enterprises to build
> big
> > > data platforms, to expand big data applications to disaggregated,
> > > online-offline hybrid environments, and to solve above problems.
> > >
> > > The implementation of Remote Shuffle Service -  “Uniffle”  - is heavily
> > > adopted in Tencent, and shows its advantages in production. Other
> > > enterprises also adopted or prepared to adopt Firestorm in their
> > > environments.
> > >
> > > Uniffle's key idea is brought from Sailfish shuffle
> > > <https://www.researchgate.net/publication/262241541_Sailfish_a_framework_for_large_scale_data_processing>,
> > > it has several key design goals:
> > >
> > >1. High performance. Firestorm’s performance is close enough to
> local
> > >file based shuffle style for small workloads. For large workloads,
> it
> > is
> > >far better than the current shuffle style.
> > >2. Fault tolerance. Firestorm provides high availability for
> > Coordinated
> > >nodes, and failover for Shuffle nodes.
> > >3. Pluggable. Firestorm is highly pluggable, which could be suited
> to
> > >different compute engines, different backend storages, and different
> > >wire-protocols.
> > >
> > > We believe that Uniffle project will provide the great value for the
> > > community if it is accepted by the Apache incubator.
> > >
> > > I will help this project as champion and many thanks to the 3 mentors:
> > >
> > >-
> > >
> > >Felix Cheung (felixche...@apache.org)
> > >- Junping du (junping...@apache.org)
> > >- Weiwei Yang (w...@apache.org)
> > >- Xun liu (liu...@apache.org)
> > >- Zhankun Tang (zt...@apache.org)
> > >
> > >
> > > [1] https://github.com/Tencent/Firestorm
> > > [2]
> > https://cwiki.apache.org/confluence/display/INCUBATOR/UniffleProposal
> > >
> > > Best regards,
> > > Jerry
> >
> > -
> > To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> > For additional commands, e-mail: general-h...@incubator.apache.org
> >
> >
>


Re: [DISCUSSION] Incubating Proposal of Uniffle

2022-05-26 Thread Aloys Zhang
+1 (non-binding) good luck

Zhang Yonglun wrote on Thu, May 26, 2022 at 20:52:

> +1 (non-binding)
>
> --
>
> Zhang Yonglun
> Apache ShenYu (Incubating)
> Apache ShardingSphere
>
> Jerry Shao wrote on Wed, May 25, 2022 at 00:07:
> >
> > Hi all,
> >
> > Due to the name issue in thread (
> > https://lists.apache.org/thread/y07xjkqzvpchncym9zr1hgm3c4l4ql0f), we
> > figured out a new project name "Uniffle" and created a new Thread. Please
> > help to discuss.
> >
> > We would like to propose Uniffle[1] as a new Apache incubator project,
> you
> > can find the proposal here [2] for more details.
> >
> > Uniffle is a high performance, general purpose Remote Shuffle Service for
> > distributed compute engines like Apache Spark, Apache Hadoop MapReduce,
> > Apache Flink and so on. We are aiming to make Firestorm a
> > universal shuffle service for distributed compute engines.
> >
> > Shuffle is the key part for a distributed compute engine to exchange the
> > data between distributed tasks, the performance and stability of shuffle
> > will directly affect the whole job. Current “local file pull-like shuffle
> > style” has several limitations:
> >
> >1. Current shuffle is hard to support super large workloads,
> especially
> >in a high load environment, the major problem is IO problem (random
> disk IO
> >issue, network congestion and timeout).
> >2. Current shuffle is hard to deploy on the disaggregated compute
> >storage environment, as disk capacity is quite limited on compute
> nodes.
> >3. The constraint of storing shuffle data locally makes it hard to
> scale
> >elastically.
> >
> > Remote Shuffle Service is the key technology for enterprises to build big
> > data platforms, to expand big data applications to disaggregated,
> > online-offline hybrid environments, and to solve above problems.
> >
> > The implementation of Remote Shuffle Service -  “Uniffle”  - is heavily
> > adopted in Tencent, and shows its advantages in production. Other
> > enterprises also adopted or prepared to adopt Firestorm in their
> > environments.
> >
> > Uniffle's key idea is brought from Sailfish shuffle
> > <https://www.researchgate.net/publication/262241541_Sailfish_a_framework_for_large_scale_data_processing>,
> > it has several key design goals:
> >
> >1. High performance. Firestorm’s performance is close enough to local
> >file based shuffle style for small workloads. For large workloads, it
> is
> >far better than the current shuffle style.
> >2. Fault tolerance. Firestorm provides high availability for
> Coordinated
> >nodes, and failover for Shuffle nodes.
> >3. Pluggable. Firestorm is highly pluggable, which could be suited to
> >different compute engines, different backend storages, and different
> >wire-protocols.
> >
> > We believe that Uniffle project will provide the great value for the
> > community if it is accepted by the Apache incubator.
> >
> > I will help this project as champion and many thanks to the 3 mentors:
> >
> >-
> >
> >Felix Cheung (felixche...@apache.org)
> >- Junping du (junping...@apache.org)
> >- Weiwei Yang (w...@apache.org)
> >- Xun liu (liu...@apache.org)
> >- Zhankun Tang (zt...@apache.org)
> >
> >
> > [1] https://github.com/Tencent/Firestorm
> > [2]
> https://cwiki.apache.org/confluence/display/INCUBATOR/UniffleProposal
> >
> > Best regards,
> > Jerry
>
>
>


Re: [DISCUSSION] Incubating Proposal of Uniffle

2022-05-26 Thread Zhang Yonglun
+1 (non-binding)

--

Zhang Yonglun
Apache ShenYu (Incubating)
Apache ShardingSphere

Jerry Shao wrote on Wed, May 25, 2022 at 00:07:
>
> Hi all,
>
> Due to the name issue in thread (
> https://lists.apache.org/thread/y07xjkqzvpchncym9zr1hgm3c4l4ql0f), we
> figured out a new project name "Uniffle" and created a new Thread. Please
> help to discuss.
>
> We would like to propose Uniffle[1] as a new Apache incubator project, you
> can find the proposal here [2] for more details.
>
> Uniffle is a high performance, general purpose Remote Shuffle Service for
> distributed compute engines like Apache Spark, Apache Hadoop MapReduce,
> Apache Flink and so on. We are aiming to make Firestorm a
> universal shuffle service for distributed compute engines.
>
> Shuffle is the key part for a distributed compute engine to exchange the
> data between distributed tasks, the performance and stability of shuffle
> will directly affect the whole job. Current “local file pull-like shuffle
> style” has several limitations:
>
>1. Current shuffle is hard to support super large workloads, especially
>in a high load environment, the major problem is IO problem (random disk IO
>issue, network congestion and timeout).
>2. Current shuffle is hard to deploy on the disaggregated compute
>storage environment, as disk capacity is quite limited on compute nodes.
>3. The constraint of storing shuffle data locally makes it hard to scale
>elastically.
>
> Remote Shuffle Service is the key technology for enterprises to build big
> data platforms, to expand big data applications to disaggregated,
> online-offline hybrid environments, and to solve above problems.
>
> The implementation of Remote Shuffle Service -  “Uniffle”  - is heavily
> adopted in Tencent, and shows its advantages in production. Other
> enterprises also adopted or prepared to adopt Firestorm in their
> environments.
>
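> Uniffle's key idea is brought from Sailfish shuffle
> <https://www.researchgate.net/publication/262241541_Sailfish_a_framework_for_large_scale_data_processing>,
> it has several key design goals: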
>
>1. High performance. Firestorm’s performance is close enough to local
>file based shuffle style for small workloads. For large workloads, it is
>far better than the current shuffle style.
>2. Fault tolerance. Firestorm provides high availability for Coordinated
>nodes, and failover for Shuffle nodes.
>3. Pluggable. Firestorm is highly pluggable, which could be suited to
>different compute engines, different backend storages, and different
>wire-protocols.
>
> We believe that Uniffle project will provide the great value for the
> community if it is accepted by the Apache incubator.
>
> I will help this project as champion and many thanks to the 3 mentors:
>
>-
>
>Felix Cheung (felixche...@apache.org)
>- Junping du (junping...@apache.org)
>- Weiwei Yang (w...@apache.org)
>- Xun liu (liu...@apache.org)
>- Zhankun Tang (zt...@apache.org)
>
>
> [1] https://github.com/Tencent/Firestorm
> [2] https://cwiki.apache.org/confluence/display/INCUBATOR/UniffleProposal
>
> Best regards,
> Jerry




Re: [DISCUSSION] Incubating Proposal of Uniffle

2022-05-25 Thread BLAST
+1, Good Luck!




-- Original Message --
From: "general"
 
> https://lists.apache.org/thread/y07xjkqzvpchncym9zr1hgm3c4l4ql0f), we
> figured out a new project name "Uniffle" and created a new Thread. Please
> help to discuss.
>
> We would like to propose Uniffle[1] as a new Apache incubator project, you
> can find the proposal here [2] for more details.
>
> Uniffle is a high performance, general purpose Remote Shuffle Service for
> distributed compute engines like Apache Spark, Apache Hadoop MapReduce,
> Apache Flink and so on. We are aiming to make Firestorm a
> universal shuffle service for distributed compute engines.
>
> Shuffle is the key part for a distributed compute engine to exchange the
> data between distributed tasks, the performance and stability of shuffle
> will directly affect the whole job. Current “local file pull-like shuffle
> style” has several limitations:
>
>    1. Current shuffle is hard to support super large workloads, especially
>    in a high load environment, the major problem is IO problem (random disk IO
>    issue, network congestion and timeout).
>    2. Current shuffle is hard to deploy on the disaggregated compute
>    storage environment, as disk capacity is quite limited on compute nodes.
>    3. The constraint of storing shuffle data locally makes it hard to scale
>    elastically.
>
> Remote Shuffle Service is the key technology for enterprises to build big
> data platforms, to expand big data applications to disaggregated,
> online-offline hybrid environments, and to solve above problems.
>
> The implementation of Remote Shuffle Service - “Uniffle” - is heavily
> adopted in Tencent, and shows its advantages in production. Other
> enterprises also adopted or prepared to adopt Firestorm in their
> environments.
>
> Uniffle's key idea is brought from Sailfish shuffle
> <https://www.researchgate.net/publication/262241541_Sailfish_a_framework_for_large_scale_data_processing>,
> it has several key design goals:
>
>    1. High performance. Firestorm’s performance is close enough to local
>    file based shuffle style for small workloads. For large workloads, it is
>    far better than the current shuffle style.
>    2. Fault tolerance. Firestorm provides high availability for Coordinated
>    nodes, and failover for Shuffle nodes.
>    3. Pluggable. Firestorm is highly pluggable, which could be suited to
>    different compute engines, different backend storages, and different
>    wire-protocols.
>
> We believe that Uniffle project will provide the great value for the
> community if it is accepted by the Apache incubator.
>
> I will help this project as champion and many thanks to the 3 mentors:
>
>    - Felix Cheung (felixche...@apache.org)
>    - Junping du (junping...@apache.org)
>    - Weiwei Yang (w...@apache.org)
>    - Xun liu (liu...@apache.org)
>    - Zhankun Tang (zt...@apache.org)
>
> [1] https://github.com/Tencent/Firestorm
> [2] https://cwiki.apache.org/confluence/display/INCUBATOR/UniffleProposal
>
> Best regards,
> Jerry


Re: [DISCUSSION] Incubating Proposal of Uniffle

2022-05-25 Thread guo jiwei
+1 non-binding
Good luck


Regards
Jiwei Guo (Tboy)


On Wed, May 25, 2022 at 9:54 PM Zhankun Tang  wrote:

> +1 (Non-binding).
> Looking forward to seeing it grow a bigger community and possibly cover more
> scenarios like public cloud. :D
>
> BR,
> Zhankun
>
> Willem Jiang wrote on Wed, May 25, 2022 at 8:59 PM:
>
> > +1 (binding).
> >
> > Willem Jiang
> >
> > Twitter: willemjiang
> > Weibo: 姜宁willem
> >
> > On Wed, May 25, 2022 at 12:05 AM Jerry Shao  wrote:
> > >
> > > Hi all,
> > >
> > > Due to the name issue in thread (
> > > https://lists.apache.org/thread/y07xjkqzvpchncym9zr1hgm3c4l4ql0f), we
> > > figured out a new project name "Uniffle" and created a new Thread.
> Please
> > > help to discuss.
> > >
> > > We would like to propose Uniffle[1] as a new Apache incubator project,
> > you
> > > can find the proposal here [2] for more details.
> > >
> > > Uniffle is a high performance, general purpose Remote Shuffle Service for
> > > distributed compute engines like Apache Spark, Apache Hadoop MapReduce,
> > > Apache Flink and so on. We are aiming to make Firestorm a
> > > universal shuffle service for distributed compute engines.
> > >
> > > Shuffle is the key part for a distributed compute engine to exchange
> the
> > > data between distributed tasks, the performance and stability of
> shuffle
> > > will directly affect the whole job. Current “local file pull-like
> shuffle
> > > style” has several limitations:
> > >
> > >1. Current shuffle is hard to support super large workloads,
> > especially
> > >in a high load environment, the major problem is IO problem (random
> > disk IO
> > >issue, network congestion and timeout).
> > >2. Current shuffle is hard to deploy on the disaggregated compute
> > >storage environment, as disk capacity is quite limited on compute
> > nodes.
> > >3. The constraint of storing shuffle data locally makes it hard to
> > scale
> > >elastically.
> > >
> > > Remote Shuffle Service is the key technology for enterprises to build
> big
> > > data platforms, to expand big data applications to disaggregated,
> > > online-offline hybrid environments, and to solve above problems.
> > >
> > > The implementation of Remote Shuffle Service -  “Uniffle”  - is heavily
> > > adopted in Tencent, and shows its advantages in production. Other
> > > enterprises also adopted or prepared to adopt Firestorm in their
> > > environments.
> > >
> > > Uniffle's key idea is brought from Sailfish shuffle
> > > <https://www.researchgate.net/publication/262241541_Sailfish_a_framework_for_large_scale_data_processing>,
> > > it has several key design goals:
> > >
> > >1. High performance. Firestorm’s performance is close enough to
> local
> > >file based shuffle style for small workloads. For large workloads,
> it
> > is
> > >far better than the current shuffle style.
> > >2. Fault tolerance. Firestorm provides high availability for
> > Coordinated
> > >nodes, and failover for Shuffle nodes.
> > >3. Pluggable. Firestorm is highly pluggable, which could be suited
> to
> > >different compute engines, different backend storages, and different
> > >wire-protocols.
> > >
> > > We believe that Uniffle project will provide the great value for the
> > > community if it is accepted by the Apache incubator.
> > >
> > > I will help this project as champion and many thanks to the 3 mentors:
> > >
> > >-
> > >
> > >Felix Cheung (felixche...@apache.org)
> > >- Junping du (junping...@apache.org)
> > >- Weiwei Yang (w...@apache.org)
> > >- Xun liu (liu...@apache.org)
> > >- Zhankun Tang (zt...@apache.org)
> > >
> > >
> > > [1] https://github.com/Tencent/Firestorm
> > > [2]
> > https://cwiki.apache.org/confluence/display/INCUBATOR/UniffleProposal
> > >
> > > Best regards,
> > > Jerry
> >
> >
> >
>


Re: [DISCUSSION] Incubating Proposal of Uniffle

2022-05-25 Thread Zhankun Tang
+1 (Non-binding).
Looking forward to seeing it grow a bigger community and possibly cover more
scenarios like public cloud. :D

BR,
Zhankun

Willem Jiang wrote on Wed, May 25, 2022 at 8:59 PM:

> +1 (binding).
>
> Willem Jiang
>
> Twitter: willemjiang
> Weibo: 姜宁willem
>
> On Wed, May 25, 2022 at 12:05 AM Jerry Shao  wrote:
> >
> > Hi all,
> >
> > Due to the name issue in thread (
> > https://lists.apache.org/thread/y07xjkqzvpchncym9zr1hgm3c4l4ql0f), we
> > figured out a new project name "Uniffle" and created a new Thread. Please
> > help to discuss.
> >
> > We would like to propose Uniffle[1] as a new Apache incubator project,
> you
> > can find the proposal here [2] for more details.
> >
> > Uniffle is a high performance, general purpose Remote Shuffle Service for
> > distributed compute engines like Apache Spark, Apache Hadoop MapReduce,
> > Apache Flink and so on. We are aiming to make Firestorm a
> > universal shuffle service for distributed compute engines.
> >
> > Shuffle is the key part for a distributed compute engine to exchange the
> > data between distributed tasks, the performance and stability of shuffle
> > will directly affect the whole job. Current “local file pull-like shuffle
> > style” has several limitations:
> >
> >1. Current shuffle is hard to support super large workloads,
> especially
> >in a high load environment, the major problem is IO problem (random
> disk IO
> >issue, network congestion and timeout).
> >2. Current shuffle is hard to deploy on the disaggregated compute
> >storage environment, as disk capacity is quite limited on compute
> nodes.
> >3. The constraint of storing shuffle data locally makes it hard to
> scale
> >elastically.
> >
> > Remote Shuffle Service is the key technology for enterprises to build big
> > data platforms, to expand big data applications to disaggregated,
> > online-offline hybrid environments, and to solve above problems.
> >
> > The implementation of Remote Shuffle Service -  “Uniffle”  - is heavily
> > adopted in Tencent, and shows its advantages in production. Other
> > enterprises also adopted or prepared to adopt Firestorm in their
> > environments.
> >
> > Uniffle's key idea is brought from Sailfish shuffle
> > <https://www.researchgate.net/publication/262241541_Sailfish_a_framework_for_large_scale_data_processing>,
> > it has several key design goals:
> >
> >1. High performance. Firestorm’s performance is close enough to local
> >file based shuffle style for small workloads. For large workloads, it
> is
> >far better than the current shuffle style.
> >2. Fault tolerance. Firestorm provides high availability for
> Coordinated
> >nodes, and failover for Shuffle nodes.
> >3. Pluggable. Firestorm is highly pluggable, which could be suited to
> >different compute engines, different backend storages, and different
> >wire-protocols.
> >
> > We believe that Uniffle project will provide the great value for the
> > community if it is accepted by the Apache incubator.
> >
> > I will help this project as champion and many thanks to the 3 mentors:
> >
> >-
> >
> >Felix Cheung (felixche...@apache.org)
> >- Junping du (junping...@apache.org)
> >- Weiwei Yang (w...@apache.org)
> >- Xun liu (liu...@apache.org)
> >- Zhankun Tang (zt...@apache.org)
> >
> >
> > [1] https://github.com/Tencent/Firestorm
> > [2]
> https://cwiki.apache.org/confluence/display/INCUBATOR/UniffleProposal
> >
> > Best regards,
> > Jerry
>
>
>


Re: [DISCUSSION] Incubating Proposal of Uniffle

2022-05-25 Thread Willem Jiang
+1 (binding).

Willem Jiang

Twitter: willemjiang
Weibo: 姜宁willem

On Wed, May 25, 2022 at 12:05 AM Jerry Shao  wrote:
>
> Hi all,
>
> Due to the name issue in thread (
> https://lists.apache.org/thread/y07xjkqzvpchncym9zr1hgm3c4l4ql0f), we
> figured out a new project name "Uniffle" and created a new Thread. Please
> help to discuss.
>
> We would like to propose Uniffle[1] as a new Apache incubator project, you
> can find the proposal here [2] for more details.
>
> Uniffle is a high performance, general purpose Remote Shuffle Service for
> distributed compute engines like Apache Spark, Apache Hadoop MapReduce,
> Apache Flink and so on. We are aiming to make Firestorm a
> universal shuffle service for distributed compute engines.
>
> Shuffle is the key part for a distributed compute engine to exchange the
> data between distributed tasks, the performance and stability of shuffle
> will directly affect the whole job. Current “local file pull-like shuffle
> style” has several limitations:
>
>1. Current shuffle is hard to support super large workloads, especially
>in a high load environment, the major problem is IO problem (random disk IO
>issue, network congestion and timeout).
>2. Current shuffle is hard to deploy on the disaggregated compute
>storage environment, as disk capacity is quite limited on compute nodes.
>3. The constraint of storing shuffle data locally makes it hard to scale
>elastically.
>
> Remote Shuffle Service is the key technology for enterprises to build big
> data platforms, to expand big data applications to disaggregated,
> online-offline hybrid environments, and to solve above problems.
>
> The implementation of Remote Shuffle Service -  “Uniffle”  - is heavily
> adopted in Tencent, and shows its advantages in production. Other
> enterprises also adopted or prepared to adopt Firestorm in their
> environments.
>
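> Uniffle's key idea is brought from Sailfish shuffle
> <https://www.researchgate.net/publication/262241541_Sailfish_a_framework_for_large_scale_data_processing>,
> it has several key design goals: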
>
>1. High performance. Firestorm’s performance is close enough to local
>file based shuffle style for small workloads. For large workloads, it is
>far better than the current shuffle style.
>2. Fault tolerance. Firestorm provides high availability for Coordinated
>nodes, and failover for Shuffle nodes.
>3. Pluggable. Firestorm is highly pluggable, which could be suited to
>different compute engines, different backend storages, and different
>wire-protocols.
>
> We believe that Uniffle project will provide the great value for the
> community if it is accepted by the Apache incubator.
>
> I will help this project as champion and many thanks to the 3 mentors:
>
>-
>
>Felix Cheung (felixche...@apache.org)
>- Junping du (junping...@apache.org)
>- Weiwei Yang (w...@apache.org)
>- Xun liu (liu...@apache.org)
>- Zhankun Tang (zt...@apache.org)
>
>
> [1] https://github.com/Tencent/Firestorm
> [2] https://cwiki.apache.org/confluence/display/INCUBATOR/UniffleProposal
>
> Best regards,
> Jerry




Re: [DISCUSSION] Incubating Proposal of Uniffle

2022-05-25 Thread Charles Zhang
+1 (non-binding)

Good luck

Brahma Reddy Battula wrote on Wed, May 25, 2022 at 17:24:

> + 1 ( non binding)
>
> Good luck
>
> On Wed, 25 May 2022 at 1:55 PM, Sammi Chen  wrote:
>
> > +1  (non-binding)
> >
> > Good luck to Uniffle.
> >
> > Bests,
> > Sammi
> >
> > On Wed, 25 May 2022 at 00:05, Jerry Shao  wrote:
> >
> > > Hi all,
> > >
> > > Due to the name issue in thread (
> > > https://lists.apache.org/thread/y07xjkqzvpchncym9zr1hgm3c4l4ql0f), we
> > > figured out a new project name "Uniffle" and created a new Thread.
> Please
> > > help to discuss.
> > >
> > > We would like to propose Uniffle[1] as a new Apache incubator project,
> > you
> > > can find the proposal here [2] for more details.
> > >
> > > Uniffle is a high performance, general purpose Remote Shuffle Service for
> > > distributed compute engines like Apache Spark, Apache Hadoop MapReduce,
> > > Apache Flink and so on. We are aiming to make Firestorm a
> > > universal shuffle service for distributed compute engines.
> > >
> > > Shuffle is the key part for a distributed compute engine to exchange
> the
> > > data between distributed tasks, the performance and stability of
> shuffle
> > > will directly affect the whole job. Current “local file pull-like
> shuffle
> > > style” has several limitations:
> > >
> > >1. Current shuffle is hard to support super large workloads,
> > especially
> > >in a high load environment, the major problem is IO problem (random
> > > disk IO
> > >issue, network congestion and timeout).
> > >2. Current shuffle is hard to deploy on the disaggregated compute
> > >storage environment, as disk capacity is quite limited on compute
> > nodes.
> > >3. The constraint of storing shuffle data locally makes it hard to
> > scale
> > >elastically.
> > >
> > > Remote Shuffle Service is the key technology for enterprises to build
> big
> > > data platforms, to expand big data applications to disaggregated,
> > > online-offline hybrid environments, and to solve above problems.
> > >
> > > The implementation of Remote Shuffle Service -  “Uniffle”  - is heavily
> > > adopted in Tencent, and shows its advantages in production. Other
> > > enterprises also adopted or prepared to adopt Firestorm in their
> > > environments.
> > >
> > > Uniffle's key idea is brought from Sailfish shuffle
> > > <https://www.researchgate.net/publication/262241541_Sailfish_a_framework_for_large_scale_data_processing>,
> > > it has several key design goals:
> > >
> > >1. High performance. Firestorm’s performance is close enough to
> local
> > >file based shuffle style for small workloads. For large workloads,
> it
> > is
> > >far better than the current shuffle style.
> > >2. Fault tolerance. Firestorm provides high availability for
> > Coordinated
> > >nodes, and failover for Shuffle nodes.
> > >3. Pluggable. Firestorm is highly pluggable, which could be suited
> to
> > >different compute engines, different backend storages, and different
> > >wire-protocols.
> > >
> > > We believe that Uniffle project will provide the great value for the
> > > community if it is accepted by the Apache incubator.
> > >
> > > I will help this project as champion and many thanks to the 3 mentors:
> > >
> > >-
> > >
> > >Felix Cheung (felixche...@apache.org)
> > >- Junping du (junping...@apache.org)
> > >- Weiwei Yang (w...@apache.org)
> > >- Xun liu (liu...@apache.org)
> > >- Zhankun Tang (zt...@apache.org)
> > >
> > >
> > > [1] https://github.com/Tencent/Firestorm
> > > [2]
> > https://cwiki.apache.org/confluence/display/INCUBATOR/UniffleProposal
> > >
> > > Best regards,
> > > Jerry
> > >
> >
> --
>
>
>
> --Brahma Reddy Battula
>


-- 
Best wishes,
Charles Zhang


Re: [DISCUSSION] Incubating Proposal of Uniffle

2022-05-25 Thread Brahma Reddy Battula
+ 1 ( non binding)

Good luck

On Wed, 25 May 2022 at 1:55 PM, Sammi Chen  wrote:

> +1  (non-binding)
>
> Good luck to Uniffle.
>
> Bests,
> Sammi
>
> On Wed, 25 May 2022 at 00:05, Jerry Shao  wrote:
>
> > Hi all,
> >
> > Due to the name issue in thread (
> > https://lists.apache.org/thread/y07xjkqzvpchncym9zr1hgm3c4l4ql0f), we
> > figured out a new project name "Uniffle" and created a new Thread. Please
> > help to discuss.
> >
> > We would like to propose Uniffle[1] as a new Apache incubator project,
> you
> > can find the proposal here [2] for more details.
> >
> > Uniffle is a high performance, general purpose Remote Shuffle Service for
> > distributed compute engines like Apache Spark, Apache Hadoop MapReduce,
> > Apache Flink and so on. We are aiming to make Firestorm a
> > universal shuffle service for distributed compute engines.
> >
> > Shuffle is the key part for a distributed compute engine to exchange the
> > data between distributed tasks, the performance and stability of shuffle
> > will directly affect the whole job. Current “local file pull-like shuffle
> > style” has several limitations:
> >
> >1. Current shuffle is hard to support super large workloads,
> especially
> >in a high load environment, the major problem is IO problem (random
> > disk IO
> >issue, network congestion and timeout).
> >2. Current shuffle is hard to deploy on the disaggregated compute
> >storage environment, as disk capacity is quite limited on compute
> nodes.
> >3. The constraint of storing shuffle data locally makes it hard to
> scale
> >elastically.
> >
> > Remote Shuffle Service is the key technology for enterprises to build big
> > data platforms, to expand big data applications to disaggregated,
> > online-offline hybrid environments, and to solve above problems.
> >
> > The implementation of Remote Shuffle Service -  “Uniffle”  - is heavily
> > adopted in Tencent, and shows its advantages in production. Other
> > enterprises also adopted or prepared to adopt Firestorm in their
> > environments.
> >
> > Uniffle's key idea is brought from Sailfish shuffle
> > <https://www.researchgate.net/publication/262241541_Sailfish_a_framework_for_large_scale_data_processing>,
> > it has several key design goals:
> >
> >    1. High performance. Firestorm’s performance is close enough to local
> >       file based shuffle style for small workloads. For large workloads,
> >       it is far better than the current shuffle style.
> >    2. Fault tolerance. Firestorm provides high availability for
> >       Coordinated nodes, and failover for Shuffle nodes.
> >    3. Pluggable. Firestorm is highly pluggable, which could be suited to
> >       different compute engines, different backend storages, and
> >       different wire-protocols.
> >
> > We believe that Uniffle project will provide the great value for the
> > community if it is accepted by the Apache incubator.
> >
> > I will help this project as champion and many thanks to the 3 mentors:
> >
> >    - Felix Cheung (felixche...@apache.org)
> >    - Junping du (junping...@apache.org)
> >    - Weiwei Yang (w...@apache.org)
> >    - Xun liu (liu...@apache.org)
> >    - Zhankun Tang (zt...@apache.org)
> >
> >
> > [1] https://github.com/Tencent/Firestorm
> > [2] https://cwiki.apache.org/confluence/display/INCUBATOR/UniffleProposal
> >
> > Best regards,
> > Jerry
> >
>
-- 



--Brahma Reddy Battula


Re: [DISCUSSION] Incubating Proposal of Uniffle

2022-05-25 Thread Eason Chen
+1 (non-binding)

Good luck!

On Wed, May 25, 2022 at 4:25 PM Sammi Chen  wrote:
>
> +1  (non-binding)
>
> Good luck to Uniffle.
>
> Bests,
> Sammi
>
> On Wed, 25 May 2022 at 00:05, Jerry Shao  wrote:
>
> > Hi all,
> >
> > Due to the name issue in thread (
> > https://lists.apache.org/thread/y07xjkqzvpchncym9zr1hgm3c4l4ql0f), we
> > figured out a new project name "Uniffle" and created a new Thread. Please
> > help to discuss.
> >
> > We would like to propose Uniffle[1] as a new Apache incubator project, you
> > can find the proposal here [2] for more details.
> >
> > Uniffle is a high performance, general purpose Remote Shuffle Service for
> > distributed compute engines like Apache Spark, Apache Hadoop MapReduce,
> > Apache Flink and so on. We are aiming to make Firestorm a universal
> > shuffle service for distributed compute engines.
> >
> > Shuffle is the key part for a distributed compute engine to exchange the
> > data between distributed tasks, the performance and stability of shuffle
> > will directly affect the whole job. Current “local file pull-like shuffle
> > style” has several limitations:
> >
> >    1. Current shuffle is hard to support super large workloads,
> >       especially in a high load environment, the major problem is IO
> >       problem (random disk IO issue, network congestion and timeout).
> >    2. Current shuffle is hard to deploy on the disaggregated compute
> >       storage environment, as disk capacity is quite limited on compute
> >       nodes.
> >    3. The constraint of storing shuffle data locally makes it hard to
> >       scale elastically.
> >
> > Remote Shuffle Service is the key technology for enterprises to build big
> > data platforms, to expand big data applications to disaggregated,
> > online-offline hybrid environments, and to solve above problems.
> >
> > The implementation of Remote Shuffle Service -  “Uniffle”  - is heavily
> > adopted in Tencent, and shows its advantages in production. Other
> > enterprises also adopted or prepared to adopt Firestorm in their
> > environments.
> >
> > Uniffle's key idea is brought from Sailfish shuffle
> > <https://www.researchgate.net/publication/262241541_Sailfish_a_framework_for_large_scale_data_processing>,
> > it has several key design goals:
> >
> >    1. High performance. Firestorm’s performance is close enough to local
> >       file based shuffle style for small workloads. For large workloads,
> >       it is far better than the current shuffle style.
> >    2. Fault tolerance. Firestorm provides high availability for
> >       Coordinated nodes, and failover for Shuffle nodes.
> >    3. Pluggable. Firestorm is highly pluggable, which could be suited to
> >       different compute engines, different backend storages, and
> >       different wire-protocols.
> >
> > We believe that Uniffle project will provide the great value for the
> > community if it is accepted by the Apache incubator.
> >
> > I will help this project as champion and many thanks to the 3 mentors:
> >
> >    - Felix Cheung (felixche...@apache.org)
> >    - Junping du (junping...@apache.org)
> >    - Weiwei Yang (w...@apache.org)
> >    - Xun liu (liu...@apache.org)
> >    - Zhankun Tang (zt...@apache.org)
> >
> >
> > [1] https://github.com/Tencent/Firestorm
> > [2] https://cwiki.apache.org/confluence/display/INCUBATOR/UniffleProposal
> >
> > Best regards,
> > Jerry
> >

-
To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
For additional commands, e-mail: general-h...@incubator.apache.org



Re: [DISCUSSION] Incubating Proposal of Uniffle

2022-05-25 Thread Sammi Chen
+1  (non-binding)

Good luck to Uniffle.

Bests,
Sammi

On Wed, 25 May 2022 at 00:05, Jerry Shao  wrote:

> Hi all,
>
> Due to the name issue in thread (
> https://lists.apache.org/thread/y07xjkqzvpchncym9zr1hgm3c4l4ql0f), we
> figured out a new project name "Uniffle" and created a new Thread. Please
> help to discuss.
>
> We would like to propose Uniffle[1] as a new Apache incubator project, you
> can find the proposal here [2] for more details.
>
> Uniffle is a high performance, general purpose Remote Shuffle Service for
> distributed compute engines like Apache Spark, Apache Hadoop MapReduce,
> Apache Flink and so on. We are aiming to make Firestorm a universal
> shuffle service for distributed compute engines.
>
> Shuffle is the key part for a distributed compute engine to exchange the
> data between distributed tasks, the performance and stability of shuffle
> will directly affect the whole job. Current “local file pull-like shuffle
> style” has several limitations:
>
>    1. Current shuffle is hard to support super large workloads, especially
>       in a high load environment, the major problem is IO problem (random
>       disk IO issue, network congestion and timeout).
>    2. Current shuffle is hard to deploy on the disaggregated compute
>       storage environment, as disk capacity is quite limited on compute
>       nodes.
>    3. The constraint of storing shuffle data locally makes it hard to scale
>       elastically.
>
> Remote Shuffle Service is the key technology for enterprises to build big
> data platforms, to expand big data applications to disaggregated,
> online-offline hybrid environments, and to solve above problems.
>
> The implementation of Remote Shuffle Service -  “Uniffle”  - is heavily
> adopted in Tencent, and shows its advantages in production. Other
> enterprises also adopted or prepared to adopt Firestorm in their
> environments.
>
> Uniffle's key idea is brought from Sailfish shuffle
> <https://www.researchgate.net/publication/262241541_Sailfish_a_framework_for_large_scale_data_processing>,
> it has several key design goals:
>
>    1. High performance. Firestorm’s performance is close enough to local
>       file based shuffle style for small workloads. For large workloads, it
>       is far better than the current shuffle style.
>    2. Fault tolerance. Firestorm provides high availability for Coordinated
>       nodes, and failover for Shuffle nodes.
>    3. Pluggable. Firestorm is highly pluggable, which could be suited to
>       different compute engines, different backend storages, and different
>       wire-protocols.
>
> We believe that Uniffle project will provide the great value for the
> community if it is accepted by the Apache incubator.
>
> I will help this project as champion and many thanks to the 3 mentors:
>
>    - Felix Cheung (felixche...@apache.org)
>    - Junping du (junping...@apache.org)
>    - Weiwei Yang (w...@apache.org)
>    - Xun liu (liu...@apache.org)
>    - Zhankun Tang (zt...@apache.org)
>
>
> [1] https://github.com/Tencent/Firestorm
> [2] https://cwiki.apache.org/confluence/display/INCUBATOR/UniffleProposal
>
> Best regards,
> Jerry
>


Re: [DISCUSSION] Incubating Proposal of Uniffle

2022-05-25 Thread Jerry Shao
Sorry for the copy-paste issue, +1 from my own side.

Best
Jerry

On Wed, May 25, 2022 at 15:16, Heal Chow wrote:

> +1 (non-binding).
>
> Looking forward to the outstanding performance of the Uniffle on Shuffle.
> Good luck.
>
> Regards,
> HealChow
>
> On 2022/05/25 06:26:53 tison wrote:
> > +1 (binding)
> >
> > An interesting project. Good luck!
> >
> > Best,
> > tison.
> >
> >
> > On Wed, May 25, 2022 at 14:22, Jungtaek Lim wrote:
> >
> > > +1 (non-binding)
> > >
> > > Good luck!
> > >
> > > On Wed, May 25, 2022 at 2:42 PM Daniel Widdis 
> wrote:
> > >
> > > > This was stated in the other thread: Unified/Universal Shuffle
> > > >
> > > > On 5/24/22, 10:04 PM, "XiaoYu"  wrote:
> > > >
> > > > Hi
> > > >
> > > > What does "Uniffle" mean as a project name?
> > > >
> > > > thanks
> > > >
> > > > > On Wed, May 25, 2022 at 12:57, Weiwei Yang wrote:
> > > > >
> > > > > +1 (binding)
> > > > > Good luck!
> > > > >
> > > > > On Tue, May 24, 2022 at 8:49 PM Ye Xianjin <
> advance...@gmail.com>
> > > > wrote:
> > > > >
> > > > > > +1 (non-binding).
> > > > > >
> > > > > > Sent from my iPhone
> > > > > >
> > > > > > > On May 25, 2022, at 9:59 AM, Goson zhang <
> > > gosonzh...@apache.org>
> > > > wrote:
> > > > > > >
> > > > > > > +1 (non-binding)
> > > > > > >
> > > > > > > Good luck!
> > > > > > >
> > > > > > > > On Wed, May 25, 2022 at 09:53, Daniel Widdis wrote:
> > > > > > >
> > > > > > >> +1 (non-binding) from me!  Good luck!
> > > > > > >>
> > > > > > >> On 5/24/22, 9:05 AM, "Jerry Shao" 
> wrote:
> > > > > > >>
> > > > > > >>Hi all,
> > > > > > >>
> > > > > > >>Due to the name issue in thread (
> > > > > > >>
> > > > https://lists.apache.org/thread/y07xjkqzvpchncym9zr1hgm3c4l4ql0f),
> > > > > > we
> > > > > > >>figured out a new project name "Uniffle" and created a
> new
> > > > Thread.
> > > > > > >> Please
> > > > > > >>help to discuss.
> > > > > > >>
> > > > > > >>We would like to propose Uniffle[1] as a new Apache
> > > incubator
> > > > > > project,
> > > > > > >> you
> > > > > > >>can find the proposal here [2] for more details.
> > > > > > >>
> > > > > > >>Uniffle is a high performance, general purpose Remote
> > > > Shuffle Service
> > > > > > >> for
> > > > > > >>distributed compute engines like Apache Spark
> > > > > > >>, Apache
> > > > > > >>Hadoop MapReduce , Apache
> > > Flink
> > > > > > >> and so on. We are aiming
> to
> > > make
> > > > > > >> Firestorm a
> > > > > > >>universal shuffle service for distributed compute
> engines.
> > > > > > >>
> > > > > > >>Shuffle is the key part for a distributed compute
> engine to
> > > > exchange
> > > > > > >> the
> > > > > > >>data between distributed tasks, the performance and
> > > > stability of
> > > > > > >> shuffle
> > > > > > >>will directly affect the whole job. Current “local file
> > > > pull-like
> > > > > > >> shuffle
> > > > > > >>style” has several limitations:
> > > > > > >>
> > > > > > >>   1. Current shuffle is hard to support super large
> > > > workloads,
> > > > > > >> especially
> > > > > > >>   in a high load environment, the major problem is IO
> > > > problem
> > > > > > (random
> > > > > > >> disk IO
> > > > > > >>   issue, network congestion and timeout).
> > > > > > >>   2. Current shuffle is hard to deploy on the
> > > disaggregated
> > > > compute
> > > > > > >>   storage environment, as disk capacity is quite
> limited
> > > on
> > > > compute
> > > > > > >> nodes.
> > > > > > >>   3. The constraint of storing shuffle data locally
> makes
> > > > it hard to
> > > > > > >> scale
> > > > > > >>   elastically.
> > > > > > >>
> > > > > > >>Remote Shuffle Service is the key technology for
> > > enterprises
> > > > to build
> > > > > > >> big
> > > > > > >>data platforms, to expand big data applications to
> > > > disaggregated,
> > > > > > >>online-offline hybrid environments, and to solve above
> > > > problems.
> > > > > > >>
> > > > > > >>The implementation of Remote Shuffle Service -
> “Uniffle”
> > > -
> > > > is
> > > > > > heavily
> > > > > > >>adopted in Tencent, and shows its advantages in
> production.
> > > > Other
> > > > > > >>enterprises also adopted or prepared to adopt
> Firestorm in
> > > > their
> > > > > > >>environments.
> > > > > > >>
> > > > > > > >>Uniffle's key idea is brought from Sailfish shuffle
> > > > > > > >><https://www.researchgate.net/publication/262241541_Sailfish_a_framework_for_large_scale_data_processing>,
> > > > > > >>it has several key

Re: [DISCUSSION] Incubating Proposal of Uniffle

2022-05-25 Thread Heal Chow
+1 (non-binding).

Looking forward to the outstanding performance of the Uniffle on Shuffle. Good 
luck.

Regards,
HealChow

On 2022/05/25 06:26:53 tison wrote:
> +1 (binding)
> 
> An interesting project. Good luck!
> 
> Best,
> tison.
> 
> 
> On Wed, May 25, 2022 at 14:22, Jungtaek Lim wrote:
> 
> > +1 (non-binding)
> >
> > Good luck!
> >
> > On Wed, May 25, 2022 at 2:42 PM Daniel Widdis  wrote:
> >
> > > This was stated in the other thread: Unified/Universal Shuffle
> > >
> > > On 5/24/22, 10:04 PM, "XiaoYu"  wrote:
> > >
> > > Hi
> > >
> > > What does "Uniffle" mean as a project name?
> > >
> > > thanks
> > >
> > > > On Wed, May 25, 2022 at 12:57, Weiwei Yang wrote:
> > > >
> > > > +1 (binding)
> > > > Good luck!
> > > >
> > > > On Tue, May 24, 2022 at 8:49 PM Ye Xianjin 
> > > wrote:
> > > >
> > > > > +1 (non-binding).
> > > > >
> > > > > Sent from my iPhone
> > > > >
> > > > > > On May 25, 2022, at 9:59 AM, Goson zhang <
> > gosonzh...@apache.org>
> > > wrote:
> > > > > >
> > > > > > +1 (non-binding)
> > > > > >
> > > > > > Good luck!
> > > > > >
> > > > > > > On Wed, May 25, 2022 at 09:53, Daniel Widdis wrote:
> > > > > >
> > > > > >> +1 (non-binding) from me!  Good luck!
> > > > > >>
> > > > > >> On 5/24/22, 9:05 AM, "Jerry Shao"  wrote:
> > > > > >>
> > > > > >>Hi all,
> > > > > >>
> > > > > >>Due to the name issue in thread (
> > > > > >>
> > > https://lists.apache.org/thread/y07xjkqzvpchncym9zr1hgm3c4l4ql0f),
> > > > > we
> > > > > >>figured out a new project name "Uniffle" and created a new
> > > Thread.
> > > > > >> Please
> > > > > >>help to discuss.
> > > > > >>
> > > > > >>We would like to propose Uniffle[1] as a new Apache
> > incubator
> > > > > project,
> > > > > >> you
> > > > > >>can find the proposal here [2] for more details.
> > > > > >>
> > > > > >>Uniffle is a high performance, general purpose Remote
> > > Shuffle Service
> > > > > >> for
> > > > > >>distributed compute engines like Apache Spark
> > > > > >>, Apache
> > > > > >>Hadoop MapReduce , Apache
> > Flink
> > > > > >> and so on. We are aiming to
> > make
> > > > > >> Firestorm a
> > > > > >>universal shuffle service for distributed compute engines.
> > > > > >>
> > > > > >>Shuffle is the key part for a distributed compute engine to
> > > exchange
> > > > > >> the
> > > > > >>data between distributed tasks, the performance and
> > > stability of
> > > > > >> shuffle
> > > > > >>will directly affect the whole job. Current “local file
> > > pull-like
> > > > > >> shuffle
> > > > > >>style” has several limitations:
> > > > > >>
> > > > > >>   1. Current shuffle is hard to support super large
> > > workloads,
> > > > > >> especially
> > > > > >>   in a high load environment, the major problem is IO
> > > problem
> > > > > (random
> > > > > >> disk IO
> > > > > >>   issue, network congestion and timeout).
> > > > > >>   2. Current shuffle is hard to deploy on the
> > disaggregated
> > > compute
> > > > > >>   storage environment, as disk capacity is quite limited
> > on
> > > compute
> > > > > >> nodes.
> > > > > >>   3. The constraint of storing shuffle data locally makes
> > > it hard to
> > > > > >> scale
> > > > > >>   elastically.
> > > > > >>
> > > > > >>Remote Shuffle Service is the key technology for
> > enterprises
> > > to build
> > > > > >> big
> > > > > >>data platforms, to expand big data applications to
> > > disaggregated,
> > > > > >>online-offline hybrid environments, and to solve above
> > > problems.
> > > > > >>
> > > > > >>The implementation of Remote Shuffle Service -  “Uniffle”
> > -
> > > is
> > > > > heavily
> > > > > >>adopted in Tencent, and shows its advantages in production.
> > > Other
> > > > > >>enterprises also adopted or prepared to adopt Firestorm in
> > > their
> > > > > >>environments.
> > > > > >>
> > > > > > >>Uniffle's key idea is brought from Sailfish shuffle
> > > > > > >><https://www.researchgate.net/publication/262241541_Sailfish_a_framework_for_large_scale_data_processing>,
> > > > > >>it has several key design goals:
> > > > > >>
> > > > > >>   1. High performance. Firestorm’s performance is close
> > > enough to
> > > > > >> local
> > > > > >>   file based shuffle style for small workloads. For large
> > > workloads,
> > > > > >> it is
> > > > > >>   far better than the current shuffle style.
> > > > > >>   2. Fault tolerance. Firestorm provides high availability
> > > for
> > > > >

Re: [DISCUSSION] Incubating Proposal of Uniffle

2022-05-24 Thread tison
+1 (binding)

An interesting project. Good luck!

Best,
tison.


On Wed, May 25, 2022 at 14:22, Jungtaek Lim wrote:

> +1 (non-binding)
>
> Good luck!
>
> On Wed, May 25, 2022 at 2:42 PM Daniel Widdis  wrote:
>
> > This was stated in the other thread: Unified/Universal Shuffle
> >
> > On 5/24/22, 10:04 PM, "XiaoYu"  wrote:
> >
> > Hi
> >
> > What does "Uniffle" mean as a project name?
> >
> > thanks
> >
> > > On Wed, May 25, 2022 at 12:57, Weiwei Yang wrote:
> > >
> > > +1 (binding)
> > > Good luck!
> > >
> > > On Tue, May 24, 2022 at 8:49 PM Ye Xianjin 
> > wrote:
> > >
> > > > +1 (non-binding).
> > > >
> > > > Sent from my iPhone
> > > >
> > > > > On May 25, 2022, at 9:59 AM, Goson zhang <
> gosonzh...@apache.org>
> > wrote:
> > > > >
> > > > > +1 (non-binding)
> > > > >
> > > > > Good luck!
> > > > >
> > > > > > On Wed, May 25, 2022 at 09:53, Daniel Widdis wrote:
> > > > >
> > > > >> +1 (non-binding) from me!  Good luck!
> > > > >>
> > > > >> On 5/24/22, 9:05 AM, "Jerry Shao"  wrote:
> > > > >>
> > > > >>Hi all,
> > > > >>
> > > > >>Due to the name issue in thread (
> > > > >>
> > https://lists.apache.org/thread/y07xjkqzvpchncym9zr1hgm3c4l4ql0f),
> > > > we
> > > > >>figured out a new project name "Uniffle" and created a new
> > Thread.
> > > > >> Please
> > > > >>help to discuss.
> > > > >>
> > > > >>We would like to propose Uniffle[1] as a new Apache
> incubator
> > > > project,
> > > > >> you
> > > > >>can find the proposal here [2] for more details.
> > > > >>
> > > > >>Uniffle is a high performance, general purpose Remote
> > Shuffle Service
> > > > >> for
> > > > >>distributed compute engines like Apache Spark
> > > > >>, Apache
> > > > >>Hadoop MapReduce , Apache
> Flink
> > > > >> and so on. We are aiming to
> make
> > > > >> Firestorm a
> > > > >>universal shuffle service for distributed compute engines.
> > > > >>
> > > > >>Shuffle is the key part for a distributed compute engine to
> > exchange
> > > > >> the
> > > > >>data between distributed tasks, the performance and
> > stability of
> > > > >> shuffle
> > > > >>will directly affect the whole job. Current “local file
> > pull-like
> > > > >> shuffle
> > > > >>style” has several limitations:
> > > > >>
> > > > >>   1. Current shuffle is hard to support super large
> > workloads,
> > > > >> especially
> > > > >>   in a high load environment, the major problem is IO
> > problem
> > > > (random
> > > > >> disk IO
> > > > >>   issue, network congestion and timeout).
> > > > >>   2. Current shuffle is hard to deploy on the
> disaggregated
> > compute
> > > > >>   storage environment, as disk capacity is quite limited
> on
> > compute
> > > > >> nodes.
> > > > >>   3. The constraint of storing shuffle data locally makes
> > it hard to
> > > > >> scale
> > > > >>   elastically.
> > > > >>
> > > > >>Remote Shuffle Service is the key technology for
> enterprises
> > to build
> > > > >> big
> > > > >>data platforms, to expand big data applications to
> > disaggregated,
> > > > >>online-offline hybrid environments, and to solve above
> > problems.
> > > > >>
> > > > >>The implementation of Remote Shuffle Service -  “Uniffle”
> -
> > is
> > > > heavily
> > > > >>adopted in Tencent, and shows its advantages in production.
> > Other
> > > > >>enterprises also adopted or prepared to adopt Firestorm in
> > their
> > > > >>environments.
> > > > >>
> > > > > >>Uniffle's key idea is brought from Sailfish shuffle
> > > > > >><https://www.researchgate.net/publication/262241541_Sailfish_a_framework_for_large_scale_data_processing>,
> > > > >>it has several key design goals:
> > > > >>
> > > > >>   1. High performance. Firestorm’s performance is close
> > enough to
> > > > >> local
> > > > >>   file based shuffle style for small workloads. For large
> > workloads,
> > > > >> it is
> > > > >>   far better than the current shuffle style.
> > > > >>   2. Fault tolerance. Firestorm provides high availability
> > for
> > > > >> Coordinated
> > > > >>   nodes, and failover for Shuffle nodes.
> > > > >>   3. Pluggable. Firestorm is highly pluggable, which could
> > be suited
> > > > >> to
> > > > >>   different compute engines, different backend storages,
> and
> > > > different
> > > > >>   wire-protocols.
> > > > >>
> > > > >>We believe that Uniffle project will provide the great
> value
> > for the
> > > > >>communit

Re: [DISCUSSION] Incubating Proposal of Uniffle

2022-05-24 Thread Jungtaek Lim
+1 (non-binding)

Good luck!

On Wed, May 25, 2022 at 2:42 PM Daniel Widdis  wrote:

> This was stated in the other thread: Unified/Universal Shuffle
>
> On 5/24/22, 10:04 PM, "XiaoYu"  wrote:
>
> Hi
>
> What does "Uniffle" mean as a project name?
>
> thanks
>
> > On Wed, May 25, 2022 at 12:57, Weiwei Yang wrote:
> >
> > +1 (binding)
> > Good luck!
> >
> > On Tue, May 24, 2022 at 8:49 PM Ye Xianjin 
> wrote:
> >
> > > +1 (non-binding).
> > >
> > > Sent from my iPhone
> > >
> > > > On May 25, 2022, at 9:59 AM, Goson zhang 
> wrote:
> > > >
> > > > +1 (non-binding)
> > > >
> > > > Good luck!
> > > >
> > > > > On Wed, May 25, 2022 at 09:53, Daniel Widdis wrote:
> > > >
> > > >> +1 (non-binding) from me!  Good luck!
> > > >>
> > > >> On 5/24/22, 9:05 AM, "Jerry Shao"  wrote:
> > > >>
> > > >>Hi all,
> > > >>
> > > >>Due to the name issue in thread (
> > > >>
> https://lists.apache.org/thread/y07xjkqzvpchncym9zr1hgm3c4l4ql0f),
> > > we
> > > >>figured out a new project name "Uniffle" and created a new
> Thread.
> > > >> Please
> > > >>help to discuss.
> > > >>
> > > >>We would like to propose Uniffle[1] as a new Apache incubator
> > > project,
> > > >> you
> > > >>can find the proposal here [2] for more details.
> > > >>
> > > >>Uniffle is a high performance, general purpose Remote
> Shuffle Service
> > > >> for
> > > >>distributed compute engines like Apache Spark
> > > >>, Apache
> > > >>Hadoop MapReduce , Apache Flink
> > > >> and so on. We are aiming to make
> > > >> Firestorm a
> > > >>universal shuffle service for distributed compute engines.
> > > >>
> > > >>Shuffle is the key part for a distributed compute engine to
> exchange
> > > >> the
> > > >>data between distributed tasks, the performance and
> stability of
> > > >> shuffle
> > > >>will directly affect the whole job. Current “local file
> pull-like
> > > >> shuffle
> > > >>style” has several limitations:
> > > >>
> > > >>   1. Current shuffle is hard to support super large
> workloads,
> > > >> especially
> > > >>   in a high load environment, the major problem is IO
> problem
> > > (random
> > > >> disk IO
> > > >>   issue, network congestion and timeout).
> > > >>   2. Current shuffle is hard to deploy on the disaggregated
> compute
> > > >>   storage environment, as disk capacity is quite limited on
> compute
> > > >> nodes.
> > > >>   3. The constraint of storing shuffle data locally makes
> it hard to
> > > >> scale
> > > >>   elastically.
> > > >>
> > > >>Remote Shuffle Service is the key technology for enterprises
> to build
> > > >> big
> > > >>data platforms, to expand big data applications to
> disaggregated,
> > > >>online-offline hybrid environments, and to solve above
> problems.
> > > >>
> > > >>The implementation of Remote Shuffle Service -  “Uniffle”  -
> is
> > > heavily
> > > >>adopted in Tencent, and shows its advantages in production.
> Other
> > > >>enterprises also adopted or prepared to adopt Firestorm in
> their
> > > >>environments.
> > > >>
> > > > >>Uniffle's key idea is brought from Sailfish shuffle
> > > > >><https://www.researchgate.net/publication/262241541_Sailfish_a_framework_for_large_scale_data_processing>,
> > > >>it has several key design goals:
> > > >>
> > > >>   1. High performance. Firestorm’s performance is close
> enough to
> > > >> local
> > > >>   file based shuffle style for small workloads. For large
> workloads,
> > > >> it is
> > > >>   far better than the current shuffle style.
> > > >>   2. Fault tolerance. Firestorm provides high availability
> for
> > > >> Coordinated
> > > >>   nodes, and failover for Shuffle nodes.
> > > >>   3. Pluggable. Firestorm is highly pluggable, which could
> be suited
> > > >> to
> > > >>   different compute engines, different backend storages, and
> > > different
> > > >>   wire-protocols.
> > > >>
> > > >>We believe that Uniffle project will provide the great value
> for the
> > > >>community if it is accepted by the Apache incubator.
> > > >>
> > > >>I will help this project as champion and many thanks to the 3
> > > mentors:
> > > >>
> > > >>   -
> > > >>
> > > >>   Felix Cheung (felixche...@apache.org)
> > > >>   - Junping du (junping...@apache.org)
> > > >>   - Weiwei Yang (w...@apache.org)
> > > >>   - Xun liu (liu...@apache.org)
> > > >>  

Re: [DISCUSSION] Incubating Proposal of Uniffle

2022-05-24 Thread Daniel Widdis
This was stated in the other thread: Unified/Universal Shuffle

On 5/24/22, 10:04 PM, "XiaoYu"  wrote:

Hi

What does "Uniffle" mean as a project name?

thanks

On Wed, May 25, 2022 at 12:57, Weiwei Yang wrote:
>
> +1 (binding)
> Good luck!
>
> On Tue, May 24, 2022 at 8:49 PM Ye Xianjin  wrote:
>
> > +1 (non-binding).
> >
> > Sent from my iPhone
> >
> > > On May 25, 2022, at 9:59 AM, Goson zhang  
wrote:
> > >
> > > +1 (non-binding)
> > >
> > > Good luck!
> > >
> > > On Wed, May 25, 2022 at 09:53, Daniel Widdis wrote:
> > >
> > >> +1 (non-binding) from me!  Good luck!
> > >>
> > >> On 5/24/22, 9:05 AM, "Jerry Shao"  wrote:
> > >>
> > >>Hi all,
> > >>
> > >>Due to the name issue in thread (
> > >>https://lists.apache.org/thread/y07xjkqzvpchncym9zr1hgm3c4l4ql0f),
> > we
> > >>figured out a new project name "Uniffle" and created a new Thread.
> > >> Please
> > >>help to discuss.
> > >>
> > >>We would like to propose Uniffle[1] as a new Apache incubator
> > project,
> > >> you
> > >>can find the proposal here [2] for more details.
> > >>
> > >>Uniffle is a high performance, general purpose Remote Shuffle 
Service
> > >> for
> > >>distributed compute engines like Apache Spark
> > >>, Apache
> > >>Hadoop MapReduce , Apache Flink
> > >> and so on. We are aiming to make
> > >> Firestorm a
> > >>universal shuffle service for distributed compute engines.
> > >>
> > >>Shuffle is the key part for a distributed compute engine to 
exchange
> > >> the
> > >>data between distributed tasks, the performance and stability of
> > >> shuffle
> > >>will directly affect the whole job. Current “local file pull-like
> > >> shuffle
> > >>style” has several limitations:
> > >>
> > >>   1. Current shuffle is hard to support super large workloads,
> > >> especially
> > >>   in a high load environment, the major problem is IO problem
> > (random
> > >> disk IO
> > >>   issue, network congestion and timeout).
> > >>   2. Current shuffle is hard to deploy on the disaggregated 
compute
> > >>   storage environment, as disk capacity is quite limited on 
compute
> > >> nodes.
> > >>   3. The constraint of storing shuffle data locally makes it 
hard to
> > >> scale
> > >>   elastically.
> > >>
> > >>Remote Shuffle Service is the key technology for enterprises to 
build
> > >> big
> > >>data platforms, to expand big data applications to disaggregated,
> > >>online-offline hybrid environments, and to solve above problems.
> > >>
> > >>The implementation of Remote Shuffle Service -  “Uniffle”  - is
> > heavily
> > >>adopted in Tencent, and shows its advantages in production. Other
> > >>enterprises also adopted or prepared to adopt Firestorm in their
> > >>environments.
> > >>
> > >>Uniffle's key idea is brought from Sailfish shuffle
> > >><https://www.researchgate.net/publication/262241541_Sailfish_a_framework_for_large_scale_data_processing>,
> > >>it has several key design goals:
> > >>
> > >>   1. High performance. Firestorm’s performance is close enough to
> > >> local
> > >>   file based shuffle style for small workloads. For large 
workloads,
> > >> it is
> > >>   far better than the current shuffle style.
> > >>   2. Fault tolerance. Firestorm provides high availability for
> > >> Coordinated
> > >>   nodes, and failover for Shuffle nodes.
> > >>   3. Pluggable. Firestorm is highly pluggable, which could be 
suited
> > >> to
> > >>   different compute engines, different backend storages, and
> > different
> > >>   wire-protocols.
> > >>
> > >>We believe that Uniffle project will provide the great value for 
the
> > >>community if it is accepted by the Apache incubator.
> > >>
> > >>I will help this project as champion and many thanks to the 3
> > mentors:
> > >>
> > >>   -
> > >>
> > >>   Felix Cheung (felixche...@apache.org)
> > >>   - Junping du (junping...@apache.org)
> > >>   - Weiwei Yang (w...@apache.org)
> > >>   - Xun liu (liu...@apache.org)
> > >>   - Zhankun Tang (zt...@apache.org)
> > >>
> > >>
> > >>[1] https://github.com/Tencent/Firestorm
> > >>[2]
> > >> https://cwiki.apache.org/confluence/display/INCUBATOR/UniffleProposal
> > >>
> > >>Best regards,
> > >>Jerry
> > >>
> > >>
> > >>
> > >> --

Re: [DISCUSSION] Incubating Proposal of Uniffle

2022-05-24 Thread XiaoYu
Hi

What does "Uniffle" mean as a project name?

thanks

On Wed, May 25, 2022 at 12:57, Weiwei Yang wrote:
>
> +1 (binding)
> Good luck!
>
> On Tue, May 24, 2022 at 8:49 PM Ye Xianjin  wrote:
>
> > +1 (non-binding).
> >
> > Sent from my iPhone
> >
> > > On May 25, 2022, at 9:59 AM, Goson zhang  wrote:
> > >
> > > +1 (non-binding)
> > >
> > > Good luck!
> > >
> > > On Wed, May 25, 2022 at 09:53, Daniel Widdis wrote:
> > >
> > >> +1 (non-binding) from me!  Good luck!
> > >>
> > >> On 5/24/22, 9:05 AM, "Jerry Shao"  wrote:
> > >>
> > >>Hi all,
> > >>
> > >>Due to the name issue in thread (
> > >>https://lists.apache.org/thread/y07xjkqzvpchncym9zr1hgm3c4l4ql0f),
> > we
> > >>figured out a new project name "Uniffle" and created a new Thread.
> > >> Please
> > >>help to discuss.
> > >>
> > >>We would like to propose Uniffle[1] as a new Apache incubator
> > project,
> > >> you
> > >>can find the proposal here [2] for more details.
> > >>
> > >>Uniffle is a high performance, general purpose Remote Shuffle Service
> > >> for
> > >>distributed compute engines like Apache Spark
> > >>, Apache
> > >>Hadoop MapReduce , Apache Flink
> > >> and so on. We are aiming to make
> > >> Firestorm a
> > >>universal shuffle service for distributed compute engines.
> > >>
> > >>Shuffle is the key part for a distributed compute engine to exchange
> > >> the
> > >>data between distributed tasks, the performance and stability of
> > >> shuffle
> > >>will directly affect the whole job. Current “local file pull-like
> > >> shuffle
> > >>style” has several limitations:
> > >>
> > >>   1. Current shuffle is hard to support super large workloads,
> > >> especially
> > >>   in a high load environment, the major problem is IO problem
> > (random
> > >> disk IO
> > >>   issue, network congestion and timeout).
> > >>   2. Current shuffle is hard to deploy on the disaggregated compute
> > >>   storage environment, as disk capacity is quite limited on compute
> > >> nodes.
> > >>   3. The constraint of storing shuffle data locally makes it hard to
> > >> scale
> > >>   elastically.
> > >>
> > >>Remote Shuffle Service is the key technology for enterprises to build
> > >> big
> > >>data platforms, to expand big data applications to disaggregated,
> > >>online-offline hybrid environments, and to solve above problems.
> > >>
> > >>The implementation of Remote Shuffle Service -  “Uniffle”  - is
> > heavily
> > >>adopted in Tencent, and shows its advantages in production. Other
> > >>enterprises also adopted or prepared to adopt Firestorm in their
> > >>environments.
> > >>
> > >>Uniffle's key idea is brought from Salfish shuffle
> > >><
> > >>
> > https://www.researchgate.net/publication/262241541_Sailfish_a_framework_for_large_scale_data_processing
> > >>> ,
> > >>it has several key design goals:
> > >>
> > >>   1. High performance. Firestorm’s performance is close enough to
> > >> local
> > >>   file based shuffle style for small workloads. For large workloads,
> > >> it is
> > >>   far better than the current shuffle style.
> > >>   2. Fault tolerance. Firestorm provides high availability for
> > >> Coordinated
> > >>   nodes, and failover for Shuffle nodes.
> > >>   3. Pluggable. Firestorm is highly pluggable, which could be suited
> > >> to
> > >>   different compute engines, different backend storages, and
> > different
> > >>   wire-protocols.
> > >>
> > >>We believe that Uniffle project will provide the great value for the
> > >>community if it is accepted by the Apache incubator.
> > >>
> > >>I will help this project as champion and many thanks to the 3
> > mentors:
> > >>
> > >>   -
> > >>
> > >>   Felix Cheung (felixche...@apache.org)
> > >>   - Junping du (junping...@apache.org)
> > >>   - Weiwei Yang (w...@apache.org)
> > >>   - Xun liu (liu...@apache.org)
> > >>   - Zhankun Tang (zt...@apache.org)
> > >>
> > >>
> > >>[1] https://github.com/Tencent/Firestorm
> > >>[2]
> > >> https://cwiki.apache.org/confluence/display/INCUBATOR/UniffleProposal
> > >>
> > >>Best regards,
> > >>Jerry
> > >>
> > >>
> > >>
> > >> -
> > >> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> > >> For additional commands, e-mail: general-h...@incubator.apache.org
> > >>
> > >>
> >
> > -
> > To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> > For additional commands, e-mail: general-h...@incubator.apache.org
> >
> >

-
To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
For additional commands, e-mail: general-h...@incubator.apache.org



Re: [DISCUSSION] Incubating Proposal of Uniffle

2022-05-24 Thread Weiwei Yang
+1 (binding)
Good luck!



Re: [DISCUSSION] Incubating Proposal of Uniffle

2022-05-24 Thread Ye Xianjin
+1 (non-binding).




Re: [DISCUSSION] Incubating Proposal of Uniffle

2022-05-24 Thread Goson zhang
+1 (non-binding)

Good luck!



Re: [DISCUSSION] Incubating Proposal of Uniffle

2022-05-24 Thread Daniel Widdis
+1 (non-binding) from me!  Good luck!




Re: [DISCUSSION] Incubating Proposal of Uniffle

2022-05-24 Thread Xun Liu
Hi,

+1 (binding) from me.

We had several discussions and, based on the characteristics of the project,
we coined a new word, Uniffle. After checking, we determined that Uniffle
has not been used as a software name yet.

Let's start a new journey. :-)



[DISCUSSION] Incubating Proposal of Uniffle

2022-05-24 Thread Jerry Shao
Hi all,

Due to the name issue raised in the previous thread (
https://lists.apache.org/thread/y07xjkqzvpchncym9zr1hgm3c4l4ql0f), we
chose a new project name, "Uniffle", and started this new thread. Please
join the discussion.

We would like to propose Uniffle [1] as a new Apache Incubator project. You
can find more details in the proposal [2].

Uniffle is a high-performance, general-purpose Remote Shuffle Service for
distributed compute engines such as Apache Spark, Apache Hadoop MapReduce,
and Apache Flink. We aim to make Uniffle (previously named Firestorm) a
universal shuffle service for distributed compute engines.

Shuffle is the key mechanism a distributed compute engine uses to exchange
data between distributed tasks, so its performance and stability directly
affect the whole job. The current “local file pull-like shuffle style” has
several limitations:

   1. It struggles to support very large workloads, especially in a
   high-load environment. The major problem is I/O (random disk I/O,
   network congestion, and timeouts).
   2. It is hard to deploy in a disaggregated compute and storage
   environment, as disk capacity is quite limited on compute nodes.
   3. The constraint of storing shuffle data locally makes it hard to scale
   elastically.

Remote Shuffle Service is the key technology for enterprises to build big
data platforms, to expand big data applications to disaggregated,
online-offline hybrid environments, and to solve the above problems.

Uniffle, an implementation of Remote Shuffle Service, is heavily used at
Tencent and has shown its advantages in production. Other enterprises have
also adopted, or are preparing to adopt, Uniffle in their environments.

Uniffle's key idea is borrowed from Sailfish shuffle
<https://www.researchgate.net/publication/262241541_Sailfish_a_framework_for_large_scale_data_processing>.
It has several key design goals:

   1. High performance. Uniffle's performance is close to that of local
   file based shuffle for small workloads. For large workloads, it is far
   better than the current shuffle style.
   2. Fault tolerance. Uniffle provides high availability for Coordinator
   nodes, and failover for Shuffle nodes.
   3. Pluggable. Uniffle is highly pluggable and can be adapted to
   different compute engines, different backend storage systems, and
   different wire protocols.

We believe that the Uniffle project will provide great value to the
community if it is accepted by the Apache Incubator.

I will help this project as champion, and many thanks to our mentors:

   - Felix Cheung (felixche...@apache.org)
   - Junping Du (junping...@apache.org)
   - Weiwei Yang (w...@apache.org)
   - Xun Liu (liu...@apache.org)
   - Zhankun Tang (zt...@apache.org)


[1] https://github.com/Tencent/Firestorm
[2] https://cwiki.apache.org/confluence/display/INCUBATOR/UniffleProposal

Best regards,
Jerry