+1

> On Sep 25, 2022, at 08:44, MINX Feng <fmx...@outlook.com> wrote:
> 
> It is an interesting project. Good luck to Datark; may this project live
> long and prosper.
> 
> Best wishes!
> Ethan
> 
>> On Sep 22, 2022, at 11:45, Yu Li <car...@gmail.com> wrote:
>> 
>> Hi All,
>> 
>> I would like to propose Datark [1] as a new Apache Incubator project;
>> you can find more details in the Datark proposal [2].
>> 
>> Datark is an intermediate (shuffle and spilled) data service for big data
>> compute engines (Apache Spark, Apache Flink, Apache Hive, etc.) to boost
>> performance, stability, and flexibility. It aims at enabling computing
>> engines to fully embrace the disaggregated architecture. In a lot of cases,
>> intermediate data depends on large local disks, and is often a major cause
>> of inefficiency, instability, and inflexibility in the lifecycle of a
>> distributed job. Datark solves the problems through the following core
>> designs:
>> 
>> 1. Push-based shuffle plus partition data aggregation to turn random IO
>> access into sequential access.
>> 2. FileSystem-like API to support writing spilled data.
>> 3. Hierarchical storage from memory to DFS/object store to enable fast
>> cache and massive storage space.
>> 4. Engine-agnostic APIs for easy integration with various engines.
>> 5. Extended fault tolerance and data replication to increase reliability.
>> 
>> Datark is currently used in production at Alibaba and many other
>> companies, serving petabytes of data per day. Beyond that,
>> it has more open source users including Shopee, NetEase, Bilibili, BOSS,
>> and Synnex. Most of these users have made contributions to the project,
>> forming an active community with dozens of developers.
>> 
>> The proposed initial committers are interested in joining the ASF to
>> strengthen collaboration and build a more vibrant community. We believe the
>> Datark project will provide tremendous value for the community if it is
>> accepted into the Apache Incubator.
>> 
>> I will serve as the champion for this project; many thanks to our four
>> other mentors:
>> 
>> * Becket Qin (j...@apache.org)
>> * Duo Zhang (zhang...@apache.org)
>> * Lidong Dai (lidong...@apache.org)
>> * Willem Jiang (ningji...@apache.org)
>> 
>> FWIW, although the solutions differ, the issues Datark aims to resolve
>> overlap somewhat with those of Apache Uniffle (incubating) [3]. We noticed
>> this during the discussion phase of the Uniffle incubation (when we were also
>> preparing for incubation) and had an open and friendly discussion about
>> whether the two projects could join forces [4]. We finally decided to
>> develop independently for the time being [5].
>> 
>> Looking forward to your feedback. Thanks.
>> 
>> Best Regards,
>> Yu
>> 
>> [1] https://github.com/alibaba/RemoteShuffleService
>> [2] https://cwiki.apache.org/confluence/display/INCUBATOR/DatarkProposal
>> [3] https://uniffle.apache.org/
>> [4] https://lists.apache.org/thread/1w74z5f0pb7bhslhzcl5x7rdj9s9objz
>> [5] https://lists.apache.org/thread/pg8lzhzc1794x3yloqp169j0mdzqs3yw
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> For additional commands, e-mail: general-h...@incubator.apache.org
> 


