[DISCUSS] Time for 0.3.2

2023-12-06 Thread Nicholas
Hey, Celeborn community,

It has been a while since the 0.3.1 release, and some critical fixes have 
landed on branch-0.3, for example [CELEBORN-1037] Incorrect output for metrics of 
Prometheus. From my perspective, it’s time to prepare for releasing 0.3.2.

WDYT? And I’m volunteering to be the release manager if no one has applied.

Regards,
Nicholas Jiang

Re:[VOTE] Release Apache Celeborn(Incubating) 0.4.0-incubating-rc4

2024-01-19 Thread Nicholas
+1 (non-binding)

I have verified:
- Git commit hash is correct.
- Checksums and signatures are valid.
- Download links are valid.
- No binary files in the source release.
- Built the binary from the source successfully with the command:
  ./build/make-distribution.sh -Pspark-3.1,flink-1.17

Regards,
Nicholas Jiang
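
The checksum part of the checklist above can be sketched in a few lines of Python. This is only an illustration, not the project's verification tooling: the function names are hypothetical, and it assumes the published `.sha512` file uses the `sha512sum` layout (hex digest first, then the file name). Signature verification with the KEYS file is a separate step that is not covered here.

```python
import hashlib


def sha512_of(path, chunk_size=1 << 20):
    """Return the hex SHA-512 digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(artifact_path, checksum_path):
    """Compare an artifact against its published checksum file.

    Assumption: the first whitespace-separated token in the checksum
    file is the hex digest (the `sha512sum` output layout).
    """
    with open(checksum_path, "r", encoding="utf-8") as fh:
        tokens = fh.read().split()
    if not tokens:
        return False  # an empty checksum file can never match
    return sha512_of(artifact_path) == tokens[0].lower()
```

A verifier would call `checksum_matches` once per downloaded artifact and treat any `False` as a reason to vote -1.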














At 2024-01-18 21:40:22, "Fu Chen"  wrote:
>Hi Celeborn community,
>
>This is a call for a vote to release Apache Celeborn (Incubating)
>0.4.0-incubating-rc4
>
>
>The git tag to be voted upon:
>https://github.com/apache/incubator-celeborn/releases/tag/v0.4.0-incubating-rc4
>
>
>The git commit hash:
>8bc07466dd85a90216820617015e329fb806c7dd
>
>
>The source and binary artifacts can be found at:
>https://dist.apache.org/repos/dist/dev/incubator/celeborn/v0.4.0-incubating-rc4
>
>
>The staging repo:
>https://repository.apache.org/content/repositories/orgapacheceleborn-1051
>
>
>Fingerprint of the PGP key release artifacts are signed with:
>92AF4750DAFCB5E25B5B83EA76F54B977EB5C09B
>
>
>My public key to verify signatures can be found in:
>https://dist.apache.org/repos/dist/release/incubator/celeborn/KEYS
>
>
>The vote will be open for at least 72 hours or until the necessary
>number of votes are reached.
>
>
>Please vote accordingly:
>
>
>[ ] +1 approve
>[ ] +0 no opinion
>[ ] -1 disapprove (and the reason)
>
>
>Checklist for release:
>https://cwiki.apache.org/confluence/display/INCUBATOR/Incubator+Release+Checklist
>Steps to validate the release:
>https://www.apache.org/info/verification.html
>
>
>* Download links, checksums and PGP signatures are valid.
>* Source code distributions have correct names matching the current release.
>* Release files have the word incubating in their name.
>* DISCLAIMER, LICENSE and NOTICE files are correct.
>* All files have license headers if necessary.
>* No unlicensed compiled archives bundled in source archive.
>* The source tarball matches the git tag.
>* Build from source is successful.
>
>Please be aware that there has been a transition in the Celeborn project's
>build tool, shifting from Maven to SBT. The SBT build documentation is
>available
>at https://celeborn.apache.org/docs/latest/developers/sbt/.
>
>For illustrative purposes:
>
>Packaging the project
>```
>./build/sbt clean package
>```
>
>Creating the distribution
>```
>./build/make-distribution.sh --sbt-enabled --release
>```
>
>Thanks,
>Fu Chen


Re:Re: Large number of incubator-celeb...@noreply.github.com emails

2024-02-06 Thread Nicholas
Hi Mridul,


I closed the GitHub issues that had been pending for too long and could be closed. 
We recommend using JIRA for issue management, so please open new issues in JIRA 
instead of GitHub. New contributors should apply for a JIRA account first.


Regards,
Nicholas Jiang
At 2024-02-07 14:03:07, "Mridul Muralidharan"  wrote:
>  Looks like I am wrong, github issues can be used [1].
>Is Celeborn planning to use github issues going forward ?
>
>Regards,
>Mridul
>
>
>[1] https://www.apache.org/dev/#issues
>
>
>On Wed, Feb 7, 2024 at 12:00 AM Mridul Muralidharan 
>wrote:
>
>> Hi,
>>
>>   I received a fairly large number of emails to
>> incubator-celeb...@noreply.github.com, which typically are for PR's.
>> They appear to be github issues - are we trying to move to github issues
>> instead of Apache jira ? IIRC there is a policy to use jira for tracking
>> bugs/improvements, right ?
>>
>> Regards,
>> Mridul
>>


Re:Re: [ANNOUNCE] New PPMC member: Fu Chen

2024-02-19 Thread Nicholas
Congratulations to Fu Chen!

Regards,
Nicholas Jiang




At 2024-02-20 00:23:06, "Shaoyun Chen"  wrote:
>Congratulations!
>
>On Mon, Feb 19, 2024 at 21:16 Keyong Zhou wrote:
>>
>> Hi Celeborn Community,
>>
>> The Podling Project Management Committee (PPMC) for Apache Celeborn
>> has invited Fu Chen to become our PPMC member and
>> we are pleased to announce that he has accepted.
>>
>> Fu Chen has been actively contributing to the Celeborn community for more than
>> one year[1], including SBT build,
>> performance improvement, code refactor, bug fixes, code reviews, design
>> discussion, docs, etc.
>>
>> Please join me in congratulating Fu Chen!
>>
>> Being a committer enables easier contribution to the
>> project since there is no need to go via the patch
>> submission process. This should enable better productivity.
>> A PPMC member helps manage and guide the direction of the project.
>>
>> [1] https://github.com/apache/incubator-celeborn/commits?author=cfmcgrady
>>
>> Thanks,
>> On behalf of the Apache Celeborn PPMC


Re:[VOTE] Graduate Apache Celeborn (incubating) as a TLP - Community

2024-03-01 Thread Nicholas

+1.


Regards,
Nicholas Jiang







- Original Message -
From: "Yu Li" 
To: dev@celeborn.apache.org
Sent: Fri, 1 Mar 2024 16:52:10 +0800
Subject: [VOTE] Graduate Apache Celeborn (incubating) as a TLP - Community

Hi All,

After a thorough discussion [1], I'd like to call a formal vote to
graduate Apache Celeborn (incubating) as a TLP. Below are some facts
and project highlights carried from [1] as well as the draft
resolution:

- Currently, our community consists of 19 committers (including
mentors) from more than 10 companies, with 12 serving as PPMC members.
- So far, we have boasted 81 contributors.
- Throughout the incubation period, we've made 6 releases in 16
months, at a stable pace.
- We've had 6 different release managers to date.
- Our software is used in production by 10+ well known entities.
- As yet, we have opened 1,286 issues with 1,176 successfully resolved.
- We have submitted a total of 1,816 PRs, out of which 1,805 have been
merged or closed.
- Through self-assessment [2], we have met all maturity criteria as
outlined in [3].

We've resolved all branding issues which include Logo, GitHub repo,
document, website, and others [4] [5].

--
Establish the Apache Celeborn Project

WHEREAS, the Board of Directors deems it to be in the best interests of
the Foundation and consistent with the Foundation's purpose to establish
a Project Management Committee charged with the creation and maintenance
of open-source software, for distribution at no charge to the public,
related to an intermediate data service for big data computing engines
to boost performance, stability, and flexibility.

NOW, THEREFORE, BE IT RESOLVED, that a Project Management Committee
(PMC), to be known as the "Apache Celeborn Project", be and hereby is
established pursuant to Bylaws of the Foundation; and be it further

RESOLVED, that the Apache Celeborn Project be and hereby is responsible
for the creation and maintenance of software related to an intermediate
data service for big data computing engines to boost performance,
stability, and flexibility; and be it further

RESOLVED, that the office of "Vice President, Apache Celeborn" be and
hereby is created, the person holding such office to serve at the
direction of the Board of Directors as the chair of the Apache Celeborn
Project, and to have primary responsibility for management of the
projects within the scope of responsibility of the Apache Celeborn
Project; and be it further

RESOLVED, that the persons listed immediately below be and hereby are
appointed to serve as the initial members of the Apache Celeborn
Project:

 * Becket Qin
 * Cheng Pan 
 * Duo Zhang 
 * Ethan Feng
 * Fu Chen   
 * Jiashu Xiong  
 * Kerwin Zhang  
 * Keyong Zhou   
 * Lidong Dai
 * Willem Ning Jiang 
 * Wu Wei
 * Yi Zhu
 * Yu Li 

NOW, THEREFORE, BE IT FURTHER RESOLVED, that Keyong Zhou be appointed to
the office of Vice President, Apache Celeborn, to serve in accordance
with and subject to the direction of the Board of Directors and the
Bylaws of the Foundation until death, resignation, retirement, removal
or disqualification, or until a successor is appointed; and be it
further

RESOLVED, that the Apache Celeborn Project be and hereby is tasked with
the migration and rationalization of the Apache Incubator Celeborn
podling; and be it further

RESOLVED, that all responsibilities pertaining to the Apache Incubator
Celeborn podling encumbered upon the Apache Incubator PMC are hereafter
discharged.
--

Best Regards,
Yu

[1] https://lists.apache.org/thread/z17rs0mw4nyv0s112dklmv7s3j053mby
[2] 
https://cwiki.apache.org/confluence/display/CELEBORN/Apache+Maturity+Model+Assessment+for+Celeborn
[3] https://community.apache.org/apache-way/apache-project-maturity-model.html
[4] https://issues.apache.org/jira/browse/PODLINGNAMESEARCH-206
[5] https://whimsy.apache.org/pods/project/celeborn


Re:Re: [ANNOUNCE] Add Chandni Singh as new committer

2024-03-21 Thread Nicholas
Congratulations Chandni. Well deserved!




Regards,
Nicholas Jiang




At 2024-03-21 19:16:25, "Shaoyun Chen"  wrote:
>Congratulations!
>
>On Thu, Mar 21, 2024 at 16:54 Mridul Muralidharan wrote:
>>
>> Congratulations Chandni ! Great job :-)
>>
>> Regards,
>> Mridul
>>
>>
>> On Thu, Mar 21, 2024 at 3:30 AM Keyong Zhou  wrote:
>>
>> > Hi Celeborn Community,
>> >
>> > The Podling Project Management Committee (PPMC) for Apache Celeborn
>> > has invited Chandni Singh to become a committer and we are pleased
>> > to announce that she has accepted.
>> >
>> > Being a committer enables easier contribution to the
>> > project since there is no need to go via the patch
>> > submission process. This should enable better productivity.
>> > A (P)PMC member helps manage and guide the direction of the project.
>> >
>> > Please join me in congratulating Chandni Singh!
>> >
>> > Thanks,
>> > Keyong Zhou
>> >


Re:Re: [VOTE] Release Apache Celeborn 0.4.1-rc1

2024-05-20 Thread Nicholas
Regarding the random killing test, I have already simulated the following 
exceptions:

1. Master node exception: one or two master nodes become abnormal and hang 
for half a minute each time.

2. Worker node exception: one or two worker nodes hang for half a minute 
each time.

3. Disk corruption: randomly select a worker and test with one disk, two 
disks, and all disks unwritable.

4. Disk IO hang: randomly select a worker and test with IO hanging on one 
disk, two disks, and all disks.

5. Master metadata exception: randomly select one or two of the master nodes 
and test Ratis metadata corruption.

The results are verified by running a query that lasts for several minutes. 
The query runs successfully in most cases, and failure is allowed under 
certain circumstances.




To summarize, I've tested with the following exception behavior:




1. Kill the master process [PASSED]

2. Kill the worker process [PASSED]

3. The worker directory is not writable [PASSED]

4. Worker disk IO hang [PASSED]

5. High CPU load [PASSED]

6. Delete Ratis metadata of master node [PASSED]
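
For readers who want to reproduce the "hang for half a minute" cases, one possible approach is sketched below. It is an assumption, not a description of the harness actually used for this test: it freezes a process with SIGSTOP and resumes it with SIGCONT, which only works on POSIX systems, and the helper names `pick_victims` and `hang_process` are hypothetical.

```python
import os
import random
import signal
import time


def pick_victims(nodes, max_count=2, rng=random):
    """Randomly choose between one and max_count nodes to disturb."""
    count = rng.randint(1, min(max_count, len(nodes)))
    return rng.sample(list(nodes), count)


def hang_process(pid, seconds):
    """Freeze a process for `seconds`, then let it continue.

    Assumption: a hang is emulated with SIGSTOP/SIGCONT (POSIX only);
    the process keeps its state but makes no progress while stopped.
    """
    os.kill(pid, signal.SIGSTOP)
    try:
        time.sleep(seconds)
    finally:
        # Always resume, even if the sleep is interrupted.
        os.kill(pid, signal.SIGCONT)
```

In a test run, something like `hang_process(master_pid, 30)` would be issued while the long-running query is in flight, and the query result would be checked afterwards.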




Meanwhile, the resource statuses of master and worker are as expected.




+1 for the random killing test.




Regards,

Nicholas Jiang




At 2024-05-21 05:05:17, "Mridul Muralidharan"  wrote:
>+1
>
>Signatures, digests, etc check out fine.
>Checked out tag and build/tested with "-Pspark3.1"
>
>Regards,
>Mridul
>
>
>On Sun, May 19, 2024 at 10:19 PM rexxiong  wrote:
>
>> +1 (binding)
>> I checked
>> - Download links are valid.
>> - git commit hash is correct
>> - Checksums and signatures are valid.
>> - No binary files in the source release
>> - Successfully built the binary from the source on macOS with the command:
>> ./build/make-distribution.sh -Pspark-3.3
>>
>> I also tested compatibility with version 0.4.0 by upgrading the
>> master/worker from 0.4.0 to 0.4.1. Using a 0.4.0 client to access the 0.4.1
>> master/worker, everything worked well.
>>
>> Thanks,
>> Jiashu Xiong
>>
>> On Fri, May 17, 2024 at 18:47 Yihe Li wrote:
>>
>> > +1 (non-binding)
>> > I checked the following things:
>> > - git commit hash is correct.
>> > - download links are valid.
>> > - release files are in correct location.
>> > - signatures and checksums are good.
>> > - LICENSE and NOTICE files exist.
>> > - build success from source code(ubuntu 16.04).
>> > ```
>> > ./build/make-distribution.sh --sbt-enabled -Pspark-3.3
>> > ```
>> >
>> > Thanks,
>> > Yihe Li
>> >
>> > On 2024/05/17 01:53:48 angers zhu wrote:
>> > > +1
>> > >
>> > > - Checked license
>> > > - checked doc
>> > > - checked build from source with spark-32
>> > >
>> > > On Tue, May 14, 2024 at 12:13 Nicholas Jiang wrote:
>> > >
>> > > > Hi Celeborn community,
>> > > >
>> > > > This is a call for a vote to release Apache Celeborn
>> > > >
>> > > > 0.4.1-rc1
>> > > >
>> > > >
>> > > > The git tag to be voted upon:
>> > > >
>> > > > https://github.com/apache/celeborn/releases/tag/v0.4.1-rc1
>> > > >
>> > > > The git commit hash:
>> > > > 641180142c5ef36430a6afcd702c9487a6007458
>> > > >
>> > > > The source and binary artifacts can be found at:
>> > > >
>> > > > https://dist.apache.org/repos/dist/dev/celeborn/v0.4.1-rc1
>> > > >
>> > > > The staging repo:
>> > > >
>> > > > https://repository.apache.org/content/repositories/orgapacheceleborn-1055
>> > > >
>> > > >
>> > > > Fingerprint of the PGP key release artifacts are signed with:
>> > > > D73CADC1DAB63BD3C770BB6D9476842D24B7C885
>> > > >
>> > > > My public key to verify signatures can be found in:
>> > > >
>> > > > https://dist.apache.org/repos/dist/release/celeborn/KEYS
>> > > >
>> > > > The vote will be open for at least 72 hours or until the necessary
>> > > > number of votes are reached.
>> > > >
>> > > > Please vote accordingly:
>> > > >
>> > > > [ ] +1 approve
>> > > > [ ] +0 no opinion
>> > > > [ ] -1 disapprove (and the reason)
>> > > >
>> > > > Steps to validate the release:
>> > > >
>> > > > https://www.apache.org/info/verification.html
>> > > >
>> > > > * Download links, checksums and PGP signatures are valid.
>> > > > * Source code distributions have correct names matching the current
>> > > > release.
>> > > > * LICENSE and NOTICE files are correct.
>> > > > * All files have license headers if necessary.
>> > > > * No unlicensed compiled archives bundled in source archive.
>> > > > * The source tarball matches the git tag.
>> > > > * Build from source is successful.
>> > > >
>> > > > Regards,
>> > > > Nicholas Jiang
>> > >
>> >
>>


Re:Re: [DISCUSS] Time for 0.5.0

2024-05-25 Thread Nicholas
Hi Ethan Feng, thanks for volunteering to drive the 0.5.0 release. +1 for 
releasing 0.5.0. BTW, is the dynamic config service also experimental, or is it 
production ready?




Regards,

Nicholas Jiang


At 2024-05-24 20:11:02, "Keyong Zhou"  wrote:
>+1 for releasing 0.5.0. But I think memory file storage is still
>experimental.
>
>Regards,
>Keyong Zhou
>
>On Fri, May 24, 2024 at 18:15 Ethan Feng wrote:
>
>> Hello, Celeborn community,
>>
>> It has been 4 months since we released the last major version. Some
>> new features, such as SSL support and memory file storage, are now
>> ready. Several optimizations have been merged into the main branch.
>> Many components are updated to the latest version.
>>
>> What do you think? I'm volunteering to be the release manager if no
>> one else has applied.
>>
>> Thanks,
>> Ethan Feng
>>


Re:Re: Re: [DISCUSS] Time for 0.5.0

2024-05-27 Thread Nicholas
Hi Ethan Feng,




Thanks for confirming the status of the dynamic config service. This makes sense 
to me. +1 for releasing 0.5.0.




Regards,

Nicholas Jiang




At 2024-05-27 15:10:05, "Ethan Feng"  wrote:
>Hi Nicholas,
>
>The dynamic config service is production-ready but the database-based
>config server has not been tested in the production environment.
>
>Regards,
>Ethan Feng
>
>On Sun, May 26, 2024 at 11:37 Nicholas wrote:
>>
>> Hi Ethan Feng, thanks for volunteering to drive the 0.5.0 release. +1 for 
>> releasing 0.5.0. BTW, is the dynamic config service also experimental, or is it 
>> production ready?
>>
>>
>>
>>
>> Regards,
>>
>> Nicholas Jiang
>>
>>
>> At 2024-05-24 20:11:02, "Keyong Zhou"  wrote:
>> >+1 for releasing 0.5.0. But I think memory file storage is still
>> >experimental.
>> >
>> >Regards,
>> >Keyong Zhou
>> >
>> >On Fri, May 24, 2024 at 18:15 Ethan Feng wrote:
>> >
>> >> Hello, Celeborn community,
>> >>
>> >> It has been 4 months since we released the last major version. Some
>> >> new features, such as SSL support and memory file storage, are now
>> >> ready. Several optimizations have been merged into the main branch.
>> >> Many components are updated to the latest version.
>> >>
>> >> What do you think? I'm volunteering to be the release manager if no
>> >> one else has applied.
>> >>
>> >> Thanks,
>> >> Ethan Feng
>> >>


Re:Re: [Discussion] Proposal Management in Celeborn Community

2024-05-29 Thread Nicholas
Hi Jiashu,




+1 for me. In my experience with the Flink community, the discussion of such a 
proposal takes place on the dev mailing list rather than in Confluence comments.

Anyway, a CIP should be required to introduce new features or major changes.




Regards,

Nicholas Jiang




At 2024-05-30 01:29:58, "Mridul Muralidharan"  wrote:
>  Inline comments, discussions are invaluable for design docs - this is not
>yet supported in confluence right ?
>Another option would be to iterate and discuss through other means (like
>google docs), and before vote, move it to the wiki - so that the community
>is deciding/voting on artifacts which are on the wiki.
>This would also help in case proposals do not end up making it to the vote
>stage, but go through brainstorming/discussion - and evolve into something
>new (or get merged with others).
>
>Regards,
>Mridul
>
>
>On Wed, May 29, 2024 at 10:42 AM Keyong Zhou  wrote:
>
>> +1 for me.
>>
>> About the comments by Cheng, IMHO discussing in maillist is also acceptable
>> (and even better)
>>
>> Regards,
>> Keyong Zhou
>>
>> On Wed, May 29, 2024 at 14:32 Cheng Pan wrote:
>>
>> > +1 for archiving proposals on confluence.
>> >
>> > Does Confluence support inline comments like Google Docs does? I think
>> > it’s a convincing functionality for the discussion period.
>> >
>> > Thanks,
>> > Cheng Pan
>> >
>> >
>> > > On May 29, 2024, at 11:19, rexxiong  wrote:
>> > >
>> > > Hello, Celeborn community,
>> > >
>> > > In the past, when Celeborn introduced new major features or significant
>> > changes, we typically used Google Docs to launch proposals. However, a
>> > major issue with Google Docs is the difficulty in centrally managing these
>> > proposals. Therefore, after referring to other communities and based on
>> > discussions with several PMCs offline, it appears that Apache Confluence
>> > could be a viable alternative for our needs. With that in mind, I would
>> > like to invite all of you to share your thoughts, experiences, and
>> > preferences regarding the use of Apache Confluence versus Google Docs for
>> > our proposal management. Your feedback will be invaluable in helping us
>> > make an informed decision that best meets the needs of our community.
>> > >
>> > > Meanwhile, I have archived previous proposals and written the Celeborn
>> > Improvement Proposal (CIP) process on Confluence.
>> > >
>> > > What do you think? Looking forward to your thoughts on this proposal.
>> > >
>> > >
>> > > Thanks,
>> > > Jiashu Xiong
>> >
>> >
>>


Re:Re:[VOTE] CIP-6: Support Flink hybrid shuffle integration with Apache Celeborn

2024-06-13 Thread Nicholas
+1(binding).
Regards,
Nicholas Jiang

At 2024-06-14 14:05:56, "Nicholas Jiang"  wrote:
>+1(non-binding)
>
>
>
>
>Regards,
>
>Nicholas Jiang
>
>
>
>
>At 2024-06-14 11:36:17, "Yuxin Tan"  wrote:
>>Hi all,
>>
>>Thanks for all the feedback about the CIP-6: Support Flink
>>hybrid shuffle integration with Apache Celeborn[1].
>>The discussion thread is here [2].
>>
>>I'd like to start a vote for it. The vote will be open for at least
>>72 hours unless there is an objection or insufficient votes.
>>
>>[1]
>>https://cwiki.apache.org/confluence/display/CELEBORN/CIP-6+Support+Flink+hybrid+shuffle+integration+with+Apache+Celeborn
>>[2] https://lists.apache.org/thread/55mwmfsxwprzf5l80so9t2cpny82l4nx
>>
>>Best,
>>Yuxin


Re:Re:Re:[VOTE] CIP-6: Support Flink hybrid shuffle integration with Apache Celeborn

2024-06-13 Thread Nicholas
+1 (non-binding). Sorry for mistakenly marking my previous vote as binding.




Regards,

Nicholas Jiang




At 2024-06-14 14:38:42, "Nicholas"  wrote:

+1(binding).
Regards,
Nicholas Jiang


At 2024-06-14 14:05:56, "Nicholas Jiang"  wrote:
>+1(non-binding)
>
>
>
>
>Regards,
>
>Nicholas Jiang
>
>
>
>
>At 2024-06-14 11:36:17, "Yuxin Tan"  wrote:
>>Hi all,
>>
>>Thanks for all the feedback about the CIP-6: Support Flink
>>hybrid shuffle integration with Apache Celeborn[1].
>>The discussion thread is here [2].
>>
>>I'd like to start a vote for it. The vote will be open for at least
>>72 hours unless there is an objection or insufficient votes.
>>
>>[1]
>>https://cwiki.apache.org/confluence/display/CELEBORN/CIP-6+Support+Flink+hybrid+shuffle+integration+with+Apache+Celeborn
>>[2] https://lists.apache.org/thread/55mwmfsxwprzf5l80so9t2cpny82l4nx
>>
>>Best,
>>Yuxin


Re:[DISCUSS] Celeborn RESTful API Refine Proposal

2024-06-25 Thread Nicholas
Hi turboFei,

Thanks for driving the RESTful API refinement proposal. I have some questions 
about it:

1. Could you summarize all /api/v1 interfaces in the Public Interfaces section? 
Could you also add the definitions of the parameter and return type classes, 
such as the fields of each POJO?

2. Could some interfaces be merged into one, such as 
/${version}/workers/lost, /${version}/workers/excluded and 
/${version}/workers/shutdown? Should the refined REST API map to the original 
interfaces one to one?

3. Could the migration plan be described in more detail? For example, the 
original interfaces return strings, but the refined REST API returns POJOs. 
How should users migrate to the refined REST API?

4. Some interfaces, such as /${version}/exit, do not mention the HTTP method. 
Could you check the HTTP methods of all refined REST APIs? Also, is there 
any standard or pattern for naming paths? For example, 
/${version}/workers/events not only lists event info for the GET method, but 
also supports sending events for the POST method without an operation name in 
the path.

5. /${version}/conf/dynamic takes three parameters without any POJO parameter 
type class, but /${version}/workers/events uses SendWorkerEventRequest as its 
parameter type. Could the parameter types be unified as POJOs?
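
To make question 2 concrete, here is a rough sketch of how the three per-state worker paths could collapse into one endpoint. Everything in it is hypothetical and not taken from the proposal: the merged `/workers` path, the `state` query parameter, and the helper name `to_merged_endpoint` are illustrations only.

```python
from typing import Dict, Optional, Tuple

# The three per-state paths named in question 2.
WORKER_STATES = ("lost", "excluded", "shutdown")


def to_merged_endpoint(path: str) -> Optional[Tuple[str, Dict[str, str]]]:
    """Map a per-state worker path to a single endpoint plus a filter.

    Hypothetical design: /<prefix>/workers/<state> becomes
    /<prefix>/workers with a `state` query parameter. Paths that are
    not per-state worker listings return None and stay unchanged.
    """
    prefix, _, last = path.rstrip("/").rpartition("/")
    if last in WORKER_STATES and prefix.endswith("/workers"):
        return prefix, {"state": last}
    return None
```

With a mapping like this, a migration note could list each original path next to its merged form, which also touches on the migration plan asked about in question 3.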




Regards,

Nicholas Jiang




At 2024-06-26 00:35:11, "Fei Wang"  wrote:
>Hi all,
>
>I have written up a proposal about refining the current Master/Worker
>RESTful API. You can find the proposal
>https://docs.google.com/document/d/1LV2vV-w3XtlbJj2Vi4J77mt4IYCr40-8A_JncZLsHqs/edit?usp=sharing
> here.
>
>Please let me know if you have any comments or questions.
>
>
>
>TL;DR: refining the RESTful API makes it easy to integrate operations
>tools with Celeborn.
>
>
>
>FYI, I was not able to access the cwiki page to put this proposal there,
>hope it is okay to just share as a google doc here for now.
>
>
>
>--
>
>Fei Wang
>
>
>
>Celeborn RESTful API Refine Proposal
>
>https://docs.google.com/document/d/1LV2vV-w3XtlbJj2Vi4J77mt4IYCr40-8A_JncZLsHqs/edit?usp=sharing


Re:Re: [VOTE] CIP-9: Celeborn RESTful API Refine

2024-07-03 Thread Nicholas
+1(non-binding).




Regards,

Nicholas Jiang




At 2024-07-03 17:04:52, "Ethan Feng"  wrote:
>+1
>
>Thanks,
>Ethan Feng
>
>On Wed, Jul 3, 2024 at 10:20 angers zhu wrote:
>>
>> +1
>>
>> Regards
>> Angerszh
>>
>> On Wed, Jul 3, 2024 at 09:25 Keyong Zhou wrote:
>>
>> > +1
>> >
>> > Regards,
>> > Keyong Zhou
>> >
>> > On Wed, Jul 3, 2024 at 02:07 Fei Wang wrote:
>> >
>> > > Hi all,
>> > >
>> > > Thanks for all the feedback about the CIP-9 Celeborn RESTful API Refine
>> > > [1].
>> > > The discussion thread is here [2].
>> > >
>> > > I'd like to start a vote for it. The vote will be open for at least 72
>> > > hours unless there is an objection or insufficient votes.
>> > >
>> > > [1]
>> > >
>> > >
>> > https://docs.google.com/document/d/1LV2vV-w3XtlbJj2Vi4J77mt4IYCr40-8A_JncZLsHqs/edit?usp=sharing
>> > > [2] https://lists.apache.org/thread/mng0pxst0z4gc9gs7mc1frz4pzpk70jb
>> > >
>> > > Best Regards,
>> > > Fei Wang
>> > >
>> >


[VOTE] Release Apache Celeborn(Incubating) 0.3.2-incubating-rc0

2023-12-18 Thread Nicholas Jiang
Hi Celeborn community,


This is a call for a vote to release Apache Celeborn (Incubating)
0.3.2-incubating-rc0


The git tag to be voted upon:
https://github.com/apache/incubator-celeborn/releases/tag/v0.3.2-incubating-rc0


The git commit hash:
d43411b22adf24679c27004a08e813ab278eaaa3


The source and binary artifacts can be found at:
https://dist.apache.org/repos/dist/dev/incubator/celeborn/v0.3.2-incubating-rc0


The staging repo:
https://repository.apache.org/content/repositories/orgapacheceleborn-1041


Fingerprint of the PGP key release artifacts are signed with:
D73CADC1DAB63BD3C770BB6D9476842D24B7C885


My public key to verify signatures can be found in:
https://dist.apache.org/repos/dist/release/incubator/celeborn/KEYS


The vote will be open for at least 72 hours or until the necessary
number of votes are reached.


Please vote accordingly:


[ ] +1 approve
[ ] +0 no opinion
[ ] -1 disapprove (and the reason)


Checklist for release:
https://cwiki.apache.org/confluence/display/INCUBATOR/Incubator+Release+Checklist


Steps to validate the release:
https://www.apache.org/info/verification.html


* Download links, checksums and PGP signatures are valid.
* Source code distributions have correct names matching the current release.
* Release files have the word incubating in their name.
* DISCLAIMER, LICENSE and NOTICE files are correct.
* All files have license headers if necessary.
* No unlicensed compiled archives bundled in source archive.
* The source tarball matches the git tag.
* Build from source is successful.


Regards,
Nicholas Jiang

[CANCEL][VOTE] Release Apache Celeborn(Incubating) 0.3.2-incubating-rc0

2023-12-20 Thread Nicholas Jiang
Hi Celeborn community,


I am cancelling the vote on 0.3.2-incubating-rc0 because of a correctness issue 
with the log level of CommitHandler. I will prepare the next RC once it is resolved.


Regards,
Nicholas Jiang

[VOTE] Release Apache Celeborn(Incubating) 0.3.2-incubating-rc1

2023-12-20 Thread Nicholas Jiang
Hi Celeborn community,


This is a call for a vote to release Apache Celeborn (Incubating)
0.3.2-incubating-rc1


The git tag to be voted upon:
https://github.com/apache/incubator-celeborn/releases/tag/v0.3.2-incubating-rc1


The git commit hash:
bce190d8a0a53434ef57ef33e53720f5bf4d14d6


The source and binary artifacts can be found at:
https://dist.apache.org/repos/dist/dev/incubator/celeborn/v0.3.2-incubating-rc1


The staging repo:
https://repository.apache.org/content/repositories/orgapacheceleborn-1045


Fingerprint of the PGP key release artifacts are signed with:
D73CADC1DAB63BD3C770BB6D9476842D24B7C885


My public key to verify signatures can be found in:
https://dist.apache.org/repos/dist/release/incubator/celeborn/KEYS


The vote will be open for at least 72 hours or until the necessary
number of votes are reached.


Please vote accordingly:


[ ] +1 approve
[ ] +0 no opinion
[ ] -1 disapprove (and the reason)


Checklist for release:
https://cwiki.apache.org/confluence/display/INCUBATOR/Incubator+Release+Checklist
Steps to validate the release:
https://www.apache.org/info/verification.html


* Download links, checksums and PGP signatures are valid.
* Source code distributions have correct names matching the current release.
* Release files have the word incubating in their name.
* DISCLAIMER, LICENSE and NOTICE files are correct.
* All files have license headers if necessary.
* No unlicensed compiled archives bundled in source archive.
* The source tarball matches the git tag.
* Build from source is successful.


Regards,
Nicholas Jiang

Re: [VOTE] Release Apache Celeborn(Incubating) 0.4.0-incubating-rc0

2023-12-21 Thread Nicholas Jiang
+1 (non-binding)

I checked
- Download links are valid.
- Checksums and signatures are valid.
- Git commit hash is correct
- No binary files in the source release
- Files have the word incubating in names.
- DISCLAIMER, LICENSE and NOTICE files exist.
- Successfully built the binary from the source via command: 
./build/make-distribution.sh --release

Regards,
Nicholas Jiang

On 2023/12/21 13:41:36 Fu Chen wrote:
> Hi Celeborn community,
> 
> This is a call for a vote to release Apache Celeborn (Incubating)
> 0.4.0-incubating-rc0
> 
> 
> The git tag to be voted upon:
> https://github.com/apache/incubator-celeborn/releases/tag/v0.4.0-incubating-rc0
> 
> 
> The git commit hash:
> de6d8d69af3381ee899ba8d92c5d63b332cbdfbf
> 
> 
> The source and binary artifacts can be found at:
> https://dist.apache.org/repos/dist/dev/incubator/celeborn/v0.4.0-incubating-rc0
> 
> 
> The staging repo:
> https://repository.apache.org/content/repositories/orgapacheceleborn-1046
> 
> 
> Fingerprint of the PGP key release artifacts are signed with:
> 92AF4750DAFCB5E25B5B83EA76F54B977EB5C09B
> 
> 
> My public key to verify signatures can be found in:
> https://dist.apache.org/repos/dist/release/incubator/celeborn/KEYS
> 
> 
> The vote will be open for at least 72 hours or until the necessary
> number of votes are reached.
> 
> 
> Please vote accordingly:
> 
> 
> [ ] +1 approve
> [ ] +0 no opinion
> [ ] -1 disapprove (and the reason)
> 
> 
> Checklist for release:
> https://cwiki.apache.org/confluence/display/INCUBATOR/Incubator+Release+Checklist
> Steps to validate the release:
> https://www.apache.org/info/verification.html
> 
> 
> * Download links, checksums and PGP signatures are valid.
> * Source code distributions have correct names matching the current release.
> * Release files have the word incubating in their name.
> * DISCLAIMER, LICENSE and NOTICE files are correct.
> * All files have license headers if necessary.
> * No unlicensed compiled archives bundled in source archive.
> * The source tarball matches the git tag.
> * Build from source is successful.
> 
> Please be aware that there has been a transition in the Celeborn project's
> build tool, shifting from Maven to SBT. The SBT build documentation is
> available
> at https://celeborn.apache.org/docs/latest/developers/sbt/.
> 
> For illustrative purposes:
> 
> Packaging the project
> ```
> ./build/sbt clean package
> ```
> 
> Creating the distribution
> ```
> ./build/make-distribution.sh --sbt-enabled --release
> ```
> 
> Thanks,
> Fu Chen
> 


[RESULT][VOTE] Release Apache Celeborn(Incubating) 0.3.2-incubating-rc1

2023-12-24 Thread Nicholas Jiang
Hello Apache Celeborn PPMC and Community,




The vote closes now as 72hr have passed. The vote PASSES with

(* = binding)

+1:

- Keyong Zhou *

- Jiashu Xiong *

- Cheng Pan *

- Ethan Feng *

- Fu Chen

- Shaoyun Chen

- Yihe Li




+0: None




-1: None
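
The tally above can be checked mechanically. The sketch below assumes the usual ASF majority-approval rule for release votes (at least three binding +1 votes, and more binding +1 than binding -1 votes). The `Vote` class and function name are hypothetical, and the sample data simply restates the names listed in this result.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Vote:
    voter: str
    value: int      # +1, 0 or -1
    binding: bool


def release_vote_passes(votes: Iterable[Vote]) -> bool:
    """Apply majority approval to the binding votes only.

    Assumption: a release needs at least three binding +1 votes and
    more binding +1 votes than binding -1 votes.
    """
    binding = [v for v in votes if v.binding]
    plus = sum(1 for v in binding if v.value == 1)
    minus = sum(1 for v in binding if v.value == -1)
    return plus >= 3 and plus > minus


# The +1 votes listed above; names marked with * in the email are binding.
rc1_votes = [
    Vote("Keyong Zhou", 1, True),
    Vote("Jiashu Xiong", 1, True),
    Vote("Cheng Pan", 1, True),
    Vote("Ethan Feng", 1, True),
    Vote("Fu Chen", 1, False),
    Vote("Shaoyun Chen", 1, False),
    Vote("Yihe Li", 1, False),
]
```

Non-binding votes are recorded but do not change the outcome, which is why the binding markers in a vote reply matter.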




The vote thread:

https://lists.apache.org/thread/w1jz9018p1cj9x0624oq86f9wo5wz1wd




I will now bring the vote to gene...@incubator.apache.org to get
approval by the IPMC.




If this vote passes too, the release is accepted and will be published.




Thanks to all who helped with the release!




Thanks,

On behalf of Apache Celeborn(Incubating) community

[VOTE] Release Apache Celeborn(Incubating) 0.3.2-incubating-rc2

2023-12-29 Thread Nicholas Jiang
Hi Celeborn community,

This is a call for a vote to release Apache Celeborn (Incubating)
0.3.2-incubating-rc2


The git tag to be voted upon:

https://github.com/apache/incubator-celeborn/releases/tag/v0.3.2-incubating-rc2


The git commit hash:
0dccad38e28554c36a5eef98de2540d996f946f7

Source and binary artifacts can be found at:
https://dist.apache.org/repos/dist/dev/incubator/celeborn/v0.3.2-incubating-rc2


The staging repo:
https://repository.apache.org/content/repositories/orgapacheceleborn-1048


Fingerprint of the PGP key release artifacts are signed with:
D73CADC1DAB63BD3C770BB6D9476842D24B7C885

My public key to verify signatures can be found in:

https://dist.apache.org/repos/dist/release/incubator/celeborn/KEYS


The vote will be open for at least 72 hours or until the necessary
number of votes are reached.


Please vote accordingly:

[ ] +1 approve
[ ] +0 no opinion
[ ] -1 disapprove (and the reason)

Checklist for release:
https://cwiki.apache.org/confluence/display/INCUBATOR/Incubator+Release+Checklist

Steps to validate the release:
https://www.apache.org/info/verification.html

* Download links, checksums and PGP signatures are valid.
* Source code distributions have correct names matching the current release.
* Release files have the word incubating in their name.
* DISCLAIMER, LICENSE and NOTICE files are correct.
* All files have license headers if necessary.
* No unlicensed compiled archives bundled in source archive.
* The source tarball matches the git tag.
* Build from source is successful.

Regards,
Nicholas Jiang
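
For readers working through the checklist above, the checksum item can be sketched in Python. This is only an illustration, not the project's official procedure: the file names are hypothetical, and it assumes the `.sha512` file starts with the hex digest (optionally followed by the file name). Checking the PGP signature is a separate step that is normally done with gpg.

```python
import hashlib
from pathlib import Path


def sha512_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-512 digest of a file, read in chunks."""
    digest = hashlib.sha512()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(artifact: Path, checksum_file: Path) -> bool:
    """Compare an artifact's digest with the first token of its checksum file.

    Assumption: the checksum file begins with the hex digest. Other layouts
    (for example "name: DIGEST") would need different parsing.
    """
    tokens = checksum_file.read_text().split()
    if not tokens:
        return False
    return sha512_of(artifact) == tokens[0].lower()
```

A reviewer would call `checksum_matches(Path("<artifact>"), Path("<artifact>.sha512"))` for each downloaded file and treat any `False` as a reason to vote -1.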

[RESULT][VOTE] Release Apache Celeborn(Incubating) 0.3.2-incubating-rc2

2024-01-01 Thread Nicholas Jiang
Hello Apache Celeborn PPMC and Community,

The vote closes now as 72hr have passed. The vote PASSES with
(* = binding)
+1:
- Keyong Zhou *
- Ethan Feng *
- Jiashu Xiong *
- Yihe Li

+0: None

-1: None

The vote thread:

https://lists.apache.org/thread/zq5kb3ovr53xkocgh0jyx30nsrjqw9do


I will now bring the vote to gene...@incubator.apache.org to get
approval by the IPMC.

If this vote passes too, the release is accepted and will be published.

Thanks to all who helped with the release!

Thanks,
On behalf of Apache Celeborn(Incubating) community
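
The tally above can be checked mechanically. The sketch below assumes the usual ASF release rule (at least three binding +1 votes and more binding +1 than binding -1); the `Vote` type and function name are hypothetical, and the vote list simply restates the names in this result mail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    voter: str
    value: int     # +1, 0 or -1
    binding: bool  # marked with * in the result mail


def release_vote_passes(votes: list[Vote]) -> bool:
    """Return True if there are >= 3 binding +1 votes and more binding +1 than binding -1."""
    plus = sum(1 for v in votes if v.binding and v.value == 1)
    minus = sum(1 for v in votes if v.binding and v.value == -1)
    return plus >= 3 and plus > minus


# Votes as listed in the 0.3.2-incubating-rc2 result above.
votes = [
    Vote("Keyong Zhou", 1, True),
    Vote("Ethan Feng", 1, True),
    Vote("Jiashu Xiong", 1, True),
    Vote("Yihe Li", 1, False),
]
```

Non-binding votes are recorded but do not affect the outcome, which is why the function filters on `binding`.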

[ANNOUNCE] Apache Celeborn(incubating) 0.3.2 available

2024-01-07 Thread Nicholas Jiang
Hi all,

Apache Celeborn(Incubating) community is glad to announce the
new release of Apache Celeborn(Incubating) 0.3.2.

Celeborn is dedicated to improving the efficiency and elasticity of
different map-reduce engines and provides an elastic, high-efficient
service for intermediate data including shuffle data, spilled data,
result data, etc.


Download Link: https://celeborn.apache.org/download/

GitHub Release Tag:

- https://github.com/apache/incubator-celeborn/releases/tag/v0.3.2-incubating

Release Notes:

- https://celeborn.apache.org/community/release_notes/release_note_0.3.2


Home Page: https://celeborn.apache.org/

Celeborn Resources:

- Issue Management: https://issues.apache.org/jira/projects/CELEBORN
- Mailing List: dev@celeborn.apache.org

Regards,
Nicholas Jiang
On behalf of the Apache Celeborn(incubating) community

Re: [VOTE] Release Apache Celeborn(Incubating) 0.4.0-incubating-rc5

2024-01-27 Thread Nicholas Jiang
+1 (non-binding)

I have checked:

- Git commit hash is correct.
- Checksums and signatures are valid.
- Download links are valid.
- No binary files in the source release.
- DISCLAIMER, LICENSE and NOTICE files exist.
- Built binary from source with command: ./build/make-distribution.sh -Pspark-3.1

Regards,
Nicholas Jiang

On 2024/01/25 15:18:49 Fu Chen wrote:
> Hi Celeborn community,
> 
> This is a call for a vote to release Apache Celeborn (Incubating)
> 0.4.0-incubating-rc5
> 
> 
> The git tag to be voted upon:
> https://github.com/apache/incubator-celeborn/releases/tag/v0.4.0-incubating-rc5
> 
> 
> The git commit hash:
> 188ad3a0fceefa081160ebfc8433c29c0aeb253f
> Source and binary artifacts can be found at:
> https://dist.apache.org/repos/dist/dev/incubator/celeborn/v0.4.0-incubating-rc5
> 
> 
> The staging repo:
> https://repository.apache.org/content/repositories/orgapacheceleborn-1052
> 
> 
> Fingerprint of the PGP key release artifacts are signed with:
> 92AF4750DAFCB5E25B5B83EA76F54B977EB5C09B
> 
> 
> My public key to verify signatures can be found in:
> https://dist.apache.org/repos/dist/release/incubator/celeborn/KEYS
> 
> 
> The vote will be open for at least 72 hours or until the necessary
> number of votes are reached.
> 
> 
> Please vote accordingly:
> 
> 
> [ ] +1 approve
> [ ] +0 no opinion
> [ ] -1 disapprove (and the reason)
> 
> 
> Checklist for release:
> https://cwiki.apache.org/confluence/display/INCUBATOR/Incubator+Release+Checklist
> Steps to validate the release:
> https://www.apache.org/info/verification.html
> 
> 
> * Download links, checksums and PGP signatures are valid.
> * Source code distributions have correct names matching the current release.
> * Release files have the word incubating in their name.
> * DISCLAIMER, LICENSE and NOTICE files are correct.
> * All files have license headers if necessary.
> * No unlicensed compiled archives bundled in source archive.
> * The source tarball matches the git tag.
> * Build from source is successful.
> 
> Please be aware that there has been a transition in the Celeborn project's
> build tool, shifting from Maven to SBT. The SBT build documentation is
> available at https://celeborn.apache.org/docs/latest/developers/sbt/.
> 
> For illustrative purposes:
> 
> Packaging the project
> ```
> ./build/sbt clean package
> ```
> 
> Creating the distribution
> ```
> ./build/make-distribution.sh --sbt-enabled --release
> ```
> 
> Thanks,
> Fu Chen
> 


Re:[VOTE] Release Apache Celeborn(Incubating) 0.4.0-incubating-rc6

2024-01-31 Thread Nicholas Jiang
+1 (non-binding)

I have verified:
- Git commit hash is correct.
- Checksums and signatures are valid.
- Download links are valid.
- No binary files in the source release.
- Built the binary from the source successfully with command: ./build/make-distribution.sh -Pspark-3.1,flink-1.18

Regards,
Nicholas Jiang

At 2024-01-29 21:45:50, "Fu Chen"  wrote:
>Hi Celeborn community,
>
>This is a call for a vote to release Apache Celeborn (Incubating)
>0.4.0-incubating-rc6
>
>
>The git tag to be voted upon:
>https://github.com/apache/incubator-celeborn/releases/tag/v0.4.0-incubating-rc6
>
>
>The git commit hash:
>20a8576fc696f0208c24ab52e6ae883f5f0567d5
>Source and binary artifacts can be found at:
>https://dist.apache.org/repos/dist/dev/incubator/celeborn/v0.4.0-incubating-rc6
>
>
>The staging repo:
>https://repository.apache.org/content/repositories/orgapacheceleborn-1053
>
>
>Fingerprint of the PGP key release artifacts are signed with:
>92AF4750DAFCB5E25B5B83EA76F54B977EB5C09B
>
>
>My public key to verify signatures can be found in:
>https://dist.apache.org/repos/dist/release/incubator/celeborn/KEYS
>
>
>The vote will be open for at least 72 hours or until the necessary
>number of votes are reached.
>
>
>Please vote accordingly:
>
>
>[ ] +1 approve
>[ ] +0 no opinion
>[ ] -1 disapprove (and the reason)
>
>
>Checklist for release:
>https://cwiki.apache.org/confluence/display/INCUBATOR/Incubator+Release+Checklist
>Steps to validate the release:
>https://www.apache.org/info/verification.html
>
>
>* Download links, checksums and PGP signatures are valid.
>* Source code distributions have correct names matching the current release.
>* Release files have the word incubating in their name.
>* DISCLAIMER, LICENSE and NOTICE files are correct.
>* All files have license headers if necessary.
>* No unlicensed compiled archives bundled in source archive.
>* The source tarball matches the git tag.
>* Build from source is successful.
>
>Please be aware that there has been a transition in the Celeborn project's
>build tool, shifting from Maven to SBT. The SBT build documentation is
>available at https://celeborn.apache.org/docs/latest/developers/sbt/.
>
>For illustrative purposes:
>
>Packaging the project
>```
>./build/sbt clean package
>```
>
>Creating the distribution
>```
>./build/make-distribution.sh --sbt-enabled --release
>```
>
>Thanks,
>Fu Chen


Re:[DISCUSS] Graduate Celeborn as TLP

2024-02-26 Thread Nicholas Jiang
Hi Yu,

+1. Celeborn has an active community with many contributions from developers, 
and it runs in production at many companies, including my company, bilibili. 
It's time to start the graduation procedure. Looking forward to the graduation 
of Apache Celeborn.

Regards,
Nicholas Jiang


At 2024-02-27 09:40:04, "Yu Li"  wrote:
>Dear Celeborn Devs,
>
>We, the Celeborn community, began our incubation journey on October
>18, 2022. Since then, with the continuous efforts of you all, our
>community has steadily developed and gradually matured, approaching
>the graduation criteria [1]. Therefore, I'd like to call a discussion
>to graduate Celeborn as TLP. Below are some statistics I collected,
>please check it and let me know your thoughts.
>
>- Currently, our community consists of 19 committers (including
>mentors) from more than 10 companies, with 12 serving as PPMC members
>[2].
>- So far, we have boasted 81 contributors.
>- Throughout the incubation period, we've made 6 releases [3] in 16
>months, at a stable pace.
>- We've had 6 different release managers to date.
>- Our software is used in production by 10+ well known entities [4].
>- As yet, we have opened 1,286 issues with 1,176 successfully resolved [5].
>- We have submitted a total of 1,816 PRs, out of which 1,805 have been
>merged or closed [6].
>- Through self-assessment [7], we have met all maturity criteria as
>outlined in [1].
>
>And below is the drafted graduation resolution, JFYI:
>--
>Establish the Apache Celeborn Project
>
>WHEREAS, the Board of Directors deems it to be in the best interests of
>the Foundation and consistent with the Foundation's purpose to establish
>a Project Management Committee charged with the creation and maintenance
>of open-source software, for distribution at no charge to the public,
>related to an intermediate data service for big data computing engines
>to boost performance, stability, and flexibility.
>
>NOW, THEREFORE, BE IT RESOLVED, that a Project Management Committee
>(PMC), to be known as the "Apache Celeborn Project", be and hereby is
>established pursuant to Bylaws of the Foundation; and be it further
>
>RESOLVED, that the Apache Celeborn Project be and hereby is responsible
>for the creation and maintenance of software related to an intermediate
>data service for big data computing engines to boost performance,
>stability, and flexibility; and be it further
>
>RESOLVED, that the office of "Vice President, Apache Celeborn" be and
>hereby is created, the person holding such office to serve at the
>direction of the Board of Directors as the chair of the Apache Celeborn
>Project, and to have primary responsibility for management of the
>projects within the scope of responsibility of the Apache Celeborn
>Project; and be it further
>
>RESOLVED, that the persons listed immediately below be and hereby are
>appointed to serve as the initial members of the Apache Celeborn
>Project:
>
> * Becket Qin
> * Cheng Pan 
> * Duo Zhang 
> * Ethan Feng
> * Fu Chen   
> * Jiashu Xiong  
> * Kerwin Zhang  
> * Keyong Zhou   
> * Lidong Dai
> * Willem Ning Jiang 
> * Wu Wei
> * Yi Zhu
> * Yu Li 
>
>NOW, THEREFORE, BE IT FURTHER RESOLVED, that Keyong Zhou be appointed to
>the office of Vice President, Apache Celeborn, to serve in accordance
>with and subject to the direction of the Board of Directors and the
>Bylaws of the Foundation until death, resignation, retirement, removal
>or disqualification, or until a successor is appointed; and be it
>further
>
>RESOLVED, that the Apache Celeborn Project be and hereby is tasked with
>the migration and rationalization of the Apache Incubator Celeborn
>podling; and be it further
>
>RESOLVED, that all responsibilities pertaining to the Apache Incubator
>Celeborn podling encumbered upon the Apache Incubator PMC are hereafter
>discharged.
>--
>
>Best Regards,
>Yu
>
>[1] https://incubator.apache.org/guides/graduation.html
>[2] https://celeborn.apache.org/community/project_management_committee
>[3] 
>https://issues.apache.org/jira/projects/CELEBORN?selectedItem=com.atlassian.jira.jira-projects-plugin:release-page&status=released
>[4] https://github.com/apache/incubator-celeborn/issues/2140
>[5] https://s.apache.org/celeborn_jira_issues
>[6] https://github.com/apache/incubator-celeborn/pulls
>[7] 
>https://cwiki.apache.org/confluence/display/CELEBORN/Apache+Maturity+Model+Assessment+for+Celeborn


Re:[ANNOUNCE] Apache Celeborn is graduated to Top Level Project

2024-03-26 Thread Nicholas Jiang
Congratulations! It is great to witness the continuous development of the community.

Regards,
Nicholas Jiang

At 2024-03-25 20:49:36, "Ethan Feng"  wrote:
>Hello Celeborn community,
>
>I am glad to share that the ASF board has approved a resolution to
>graduate Celeborn into a full Top Level Project. Thank you all for
>your help in reaching this milestone.
>
>To transition from the Apache Incubator to a new TLP, there are a few
>action items[1] we need to complete. I have opened an
>Umbrella Issue[2] to track the tasks, and you are welcome to take on
>the sub-tasks and leave comments if I have missed anything.
>
>Additionally, the GitHub repository migration is already complete[3].
>Please update your local git repository to track the new repo[4]. If
>you named the upstream as "apache", you can run the following command
>to complete the remote repo tracking migration.
>
>` git remote set-url apache g...@github.com:apache/celeborn.git `
>
>Please find the relevant URLs below:
>[1] https://incubator.apache.org/guides/transferring.html#life_after_graduation
>[2] https://github.com/apache/celeborn/issues/2415
>[3] https://issues.apache.org/jira/browse/INFRA-25635
>[4] https://github.com/apache/celeborn
>
>Thanks,
>Ethan Feng
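
For anyone scripting the remote update across several clones, the rename described above can be sketched as a small Python helper. This is an illustration only: the function name is hypothetical, and it simply rewrites the old `apache/incubator-celeborn` path to `apache/celeborn`, leaving the URL scheme untouched. The result would then be passed to `git remote set-url`.

```python
def migrate_remote_url(url: str) -> str:
    """Rewrite a pre-graduation Celeborn remote URL to the post-graduation repository."""
    old, new = "apache/incubator-celeborn", "apache/celeborn"
    # Only the first occurrence is replaced; URLs that already point at the
    # new repository are returned unchanged.
    return url.replace(old, new, 1) if old in url else url
```

For example, `migrate_remote_url("https://github.com/apache/incubator-celeborn.git")` yields the URL of the new repository listed in [4], with the `.git` suffix preserved.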


Re: Re:Re: Add k8s operator for celeborn

2024-04-01 Thread Nicholas Jiang
Hi xleoken,

Good idea. I also propose introducing a K8S operator for Celeborn to enable 
more K8S capabilities such as auto scaling.

BTW, the design document 
https://github.com/xleoken/incubator-celeborn/blob/operator/operator/doc/design.md
is not accessible. Are you still working on this? I am interested in building 
the Celeborn K8S operator with you. WDYT?

Regards,
Nicholas Jiang

On 2023/11/08 18:38:25 leo65535 wrote:
> Hi Ethan Feng, Cheng Pan
> 
> 
> 
> 1. For more detail about the design, you can check this doc.
> 
> https://github.com/xleoken/incubator-celeborn/blob/operator/operator/doc/design.md
> 
> 
> 
> 
> 2. Please consider providing minikube/k3s-based integration tests (or setup 
> steps).
> 
> Your suggestion is good. For now, we can try out the celeborn-operator by 
> following the operator/operator/README.md doc.
> Sorry, I'm not familiar with minikube tests yet; maybe we can create a 
> sub-JIRA to finish it.
> 
> 
> Best,
> xleoken
> 
> 
> 
> At 2023-11-08 17:20:49, "Ethan Feng"  wrote:
> >Hi xleoken,
> >
> >I completely agree with your idea. We can definitely use a Kubernetes
> >operator for Celeborn, and it would help us manage the Celeborn
> >clusters more effectively.
> >
> >Thank you for bringing this up, I found that you had already completed
> >the Kubernetes operator on your branch. Can you share your design docs
> >and contribute it to the community?
> >
> >Best regards,
> >Ethan Feng
> >
> >On Wed, Nov 8, 2023 at 16:54, leo65535 wrote:
> >>
> >>
> >>
> >> Hello everyone,
> >>
> >>
> >> As we know, Kubernetes is designed for automation. Kubernetes' operator 
> >> pattern lets you extend the cluster's behaviour without modifying the 
> >> code of Kubernetes itself by linking controllers to one or more custom 
> >> resources. Operators are clients of the Kubernetes API that act as 
> >> controllers for a Custom Resource.
> >>
> >>
> >> So we can use the k8s operator SDK to write our own operator for Celeborn. 
> >> The operator can help us manage the Celeborn clusters.
> >>
> >>
> >> Please check the jira[1] for more detail.
> >>
> >>
> >> Looking forward to your feedback and suggestions, thanks.
> >>
> >>
> >> Bests,
> >> xleoken
> >>
> >>
> >> [1]
> >> https://issues.apache.org/jira/browse/CELEBORN-1117
> >>
> 
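
The operator pattern described in this thread (a controller that watches a custom resource and drives the cluster toward it) can be sketched without any Kubernetes dependency. Everything here is hypothetical: the `CelebornClusterSpec` fields, the action strings, and the function name are assumptions for illustration and are not taken from the proposed celeborn-operator.

```python
from dataclasses import dataclass


@dataclass
class CelebornClusterSpec:
    """Hypothetical custom resource: the desired cluster shape."""
    masters: int
    workers: int


@dataclass
class ObservedState:
    """What the controller currently sees running."""
    masters: int
    workers: int


def reconcile(spec: CelebornClusterSpec, observed: ObservedState) -> list[str]:
    """Return the actions needed to move the observed state toward the spec."""
    actions = []
    for role in ("masters", "workers"):
        want, have = getattr(spec, role), getattr(observed, role)
        if want > have:
            actions.append(f"scale up {role}: {have} -> {want}")
        elif want < have:
            actions.append(f"scale down {role}: {have} -> {want}")
    return actions
```

A real operator would run this comparison every time the custom resource or the cluster changes, and would apply the actions through the Kubernetes API instead of returning strings.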


[DISCUSS] Time for 0.4.1

2024-04-12 Thread Nicholas Jiang
Hey, Celeborn community,


It has been a while since the 0.4.0 release, and some critical fixes have 
landed on branch-0.4, for example, [CELEBORN-1252][FOLLOWUP] Fix 
Worker#computeResourceConsumption NullPointerException for 
userResourceConsumption that does not contain given userIdentifier. From my 
perspective, it’s time to prepare for releasing 0.4.1.


WDYT? I’m volunteering to be the release manager if no one else has stepped up.

Regards,
Nicholas Jiang

Re: [ANNOUNCE] Add Mridul Muralidharan as new committer

2024-04-28 Thread Nicholas Jiang
Congratulations, Mridul! Well deserved! Welcome.

Regards,
Nicholas Jiang

On 2024/04/29 01:21:15 Keyong Zhou wrote:
> Hi Celeborn Community,
> 
> The Project Management Committee (PMC) for Apache Celeborn
> has invited Mridul Muralidharan to become a committer and we are pleased
> to announce that he has accepted.
> 
> Being a committer enables easier contribution to the
> project since there is no need to go via the patch
> submission process. This should enable better productivity.
> A PMC member helps manage and guide the direction of the project.
> 
> Please join me in congratulating Mridul Muralidharan!
> 
> Regards,
> Keyong Zhou
> 


[VOTE] Release Apache Celeborn 0.4.1-rc0

2024-05-07 Thread Nicholas Jiang
Hi Celeborn community,

This is a call for a vote to release Apache Celeborn
0.4.1-rc0


The git tag to be voted upon:

https://github.com/apache/celeborn/releases/tag/v0.4.1-rc0

The git commit hash:
6118a549062cd6cda12947679485c98b2e8943a8

Source and binary artifacts can be found at:
https://dist.apache.org/repos/dist/dev/celeborn/v0.4.1-rc0

The staging repo:

https://repository.apache.org/content/repositories/orgapacheceleborn-1054


Fingerprint of the PGP key release artifacts are signed with:
D73CADC1DAB63BD3C770BB6D9476842D24B7C885

My public key to verify signatures can be found in:

https://dist.apache.org/repos/dist/release/celeborn/KEYS

The vote will be open for at least 72 hours or until the necessary
number of votes are reached.

Please vote accordingly:

[ ] +1 approve
[ ] +0 no opinion
[ ] -1 disapprove (and the reason)

Steps to validate the release:

https://www.apache.org/info/verification.html

* Download links, checksums and PGP signatures are valid.
* Source code distributions have correct names matching the current release.
* LICENSE and NOTICE files are correct.
* All files have license headers if necessary.
* No unlicensed compiled archives bundled in source archive.
* The source tarball matches the git tag.
* Build from source is successful.

Regards,
Nicholas Jiang
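
When checking signatures, the fingerprint that gpg reports is often printed in groups of four characters, so it helps to normalise it before comparing it with the one in the vote mail. The sketch below is an illustration under stated assumptions: the helper names are hypothetical, and it assumes a 40-hex-digit fingerprint like the one quoted above.

```python
EXPECTED_FINGERPRINT = "D73CADC1DAB63BD3C770BB6D9476842D24B7C885"  # from the vote mail


def normalize_fingerprint(text: str) -> str:
    """Strip whitespace and colons and upper-case a PGP fingerprint string."""
    return "".join(ch for ch in text if ch not in " \t\n:").upper()


def fingerprint_matches(reported: str, expected: str = EXPECTED_FINGERPRINT) -> bool:
    """Compare two fingerprints after normalisation; the reported one must be 40 hex digits."""
    a, b = normalize_fingerprint(reported), normalize_fingerprint(expected)
    is_hex40 = len(a) == 40 and all(c in "0123456789ABCDEF" for c in a)
    return is_hex40 and a == b
```

The length and character check guards against accidentally accepting a truncated key ID in place of the full fingerprint.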

Re:Re: [VOTE] Release Apache Celeborn 0.4.1-rc0

2024-05-08 Thread Nicholas Jiang
Hi Keyong,

If no one takes the third test, I can run the random killing test with 
Celeborn's chaos testing framework in our internal testing environment.

Regards,
Nicholas Jiang




At 2024-05-07 17:09:24, "Keyong Zhou"  wrote:
>Hi Nicholas,
>
>Thanks for the work! I think we need to test the following scenarios before
>publishing:
>1. compatibility test
>2. perf test: i.e. TPCDS, pure shuffle workload
>3. random killing test
>
>I'll take the perf test, anyone take the other two?
>
>Regards,
>Keyong Zhou
>
>
>On Tue, May 7, 2024 at 17:04, Nicholas Jiang wrote:
>
>> Hi Celeborn community,
>>
>> This is a call for a vote to release Apache Celeborn
>>
>> 0.4.1-rc0
>>
>>
>> The git tag to be voted upon:
>>
>> https://github.com/apache/celeborn/releases/tag/v0.4.1-rc0
>>
>> The git commit hash:
>> 6118a549062cd6cda12947679485c98b2e8943a8
>>
>> Source and binary artifacts can be found at:
>> https://dist.apache.org/repos/dist/dev/celeborn/v0.4.1-rc0
>>
>> The staging repo:
>>
>> https://repository.apache.org/content/repositories/orgapacheceleborn-1054
>>
>>
>> Fingerprint of the PGP key release artifacts are signed with:
>> D73CADC1DAB63BD3C770BB6D9476842D24B7C885
>>
>> My public key to verify signatures can be found in:
>>
>> https://dist.apache.org/repos/dist/release/celeborn/KEYS
>>
>> The vote will be open for at least 72 hours or until the necessary
>> number of votes are reached.
>>
>> Please vote accordingly:
>>
>> [ ] +1 approve
>> [ ] +0 no opinion
>> [ ] -1 disapprove (and the reason)
>>
>> Steps to validate the release:
>>
>> https://www.apache.org/info/verification.html
>>
>> * Download links, checksums and PGP signatures are valid.
>> * Source code distributions have correct names matching the current
>> release.
>> * LICENSE and NOTICE files are correct.
>> * All files have license headers if necessary.
>> * No unlicensed compiled archives bundled in source archive.
>> * The source tarball matches the git tag.
>> * Build from source is successful.
>>
>> Regards,
>> Nicholas Jiang


[CANCEL][VOTE] Release Apache Celeborn 0.4.1-rc0

2024-05-12 Thread Nicholas Jiang
Hi Celeborn community,


This vote on 0.4.1-rc0 is cancelled because of a bug: the 
celeborn.network.bind.preferIpAddress configuration is only effective on worker 
nodes. I will prepare the next RC once it is resolved.


Regards,
Nicholas Jiang

[VOTE] Release Apache Celeborn 0.4.1-rc1

2024-05-13 Thread Nicholas Jiang
Hi Celeborn community,

This is a call for a vote to release Apache Celeborn
0.4.1-rc1


The git tag to be voted upon:

https://github.com/apache/celeborn/releases/tag/v0.4.1-rc1

The git commit hash:
641180142c5ef36430a6afcd702c9487a6007458

Source and binary artifacts can be found at:
https://dist.apache.org/repos/dist/dev/celeborn/v0.4.1-rc1

The staging repo:

https://repository.apache.org/content/repositories/orgapacheceleborn-1055


Fingerprint of the PGP key release artifacts are signed with:
D73CADC1DAB63BD3C770BB6D9476842D24B7C885

My public key to verify signatures can be found in:

https://dist.apache.org/repos/dist/release/celeborn/KEYS

The vote will be open for at least 72 hours or until the necessary
number of votes are reached.

Please vote accordingly:

[ ] +1 approve
[ ] +0 no opinion
[ ] -1 disapprove (and the reason)

Steps to validate the release:

https://www.apache.org/info/verification.html

* Download links, checksums and PGP signatures are valid.
* Source code distributions have correct names matching the current release.
* LICENSE and NOTICE files are correct.
* All files have license headers if necessary.
* No unlicensed compiled archives bundled in source archive.
* The source tarball matches the git tag.
* Build from source is successful.

Regards,
Nicholas Jiang
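
The links in a vote mail like the one above follow a regular pattern, which can be captured in a small helper when drafting the next RC mail. This is a sketch only: the function name is hypothetical, and the URL layout is inferred from the 0.4.1-rc0 and 0.4.1-rc1 mails in this archive (post-graduation paths, without the incubator prefix). The staging repository is left out because its number is assigned per candidate.

```python
def release_candidate_urls(version: str, rc: int) -> dict[str, str]:
    """Build the tag, artifact and KEYS URLs for a release candidate."""
    tag = f"v{version}-rc{rc}"
    return {
        "git_tag": f"https://github.com/apache/celeborn/releases/tag/{tag}",
        "artifacts": f"https://dist.apache.org/repos/dist/dev/celeborn/{tag}",
        "keys": "https://dist.apache.org/repos/dist/release/celeborn/KEYS",
    }
```

Calling `release_candidate_urls("0.4.1", 1)` reproduces the tag and artifact links quoted in this mail.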

[RESULT][VOTE] Release Apache Celeborn 0.4.1-rc1

2024-05-21 Thread Nicholas Jiang
Hello Apache Celeborn PMC and Community,

The vote closes now as 72hr have passed. The vote PASSES with
(* = binding)

+1:
- Cheng Pan *
- Yi Zhu *
- Keyong Zhou *
- Ethan Feng *
- Jiashu Xiong *
- Yihe Li
- Mridul Muralidharan
- Shaoyun Chen

+0: None

-1: None

The vote thread:

https://lists.apache.org/thread/n17p517vrts0csrv6y1hr5dj7pq4n2yk

The release of v0.4.1 is accepted and will be published.

Thanks to all who helped with the release!

Thanks,
On behalf of Apache Celeborn community

[ANNOUNCE] Apache Celeborn 0.4.1 available

2024-05-22 Thread Nicholas Jiang
Hi all,

Apache Celeborn community is glad to announce the
new release of Apache Celeborn 0.4.1.

Celeborn is dedicated to improving the efficiency and elasticity of
different map-reduce engines and provides an elastic, high-efficient
service for intermediate data including shuffle data, spilled data,
result data, etc.


Download Link: https://celeborn.apache.org/download/

GitHub Release Tag:

- https://github.com/apache/celeborn/releases/tag/v0.4.1

Release Notes:

- https://celeborn.apache.org/community/release_notes/release_note_0.4.1


Home Page: https://celeborn.apache.org/

Celeborn Resources:

- Issue Management: https://issues.apache.org/jira/projects/CELEBORN
- Mailing List: dev@celeborn.apache.org

Regards,
Nicholas Jiang
On behalf of the Apache Celeborn community

Re: [DISCUSSION] CIP-6: Support Flink hybrid shuffle integration with Apache Celeborn

2024-06-05 Thread Nicholas Jiang
Hi Yuxin,

Thanks for driving this CIP about integration with Hybrid Shuffle. I have some 
comments on this CIP:

1. Could you describe in detail what functions the relevant components 
mentioned in Proposed Changes, including CelebornProducerAgent, 
CelebornConsumerAgent, CelebornMasterAgent, etc., support? The design 
document only mentions these components and gives no details of the changes.

2. Can you briefly explain how compatibility with Celeborn’s existing 
features, such as partition splitting, is guaranteed? IMO, compatibility 
should be covered in Proposed Changes to help community developers 
understand.

3. There are no changes to public interfaces. Is there any public configuration 
for the integration with Hybrid Shuffle and the Flink client?

4. The server side must store Segment information for each subpartition. How 
does the server side guarantee the accuracy and recoverability of Segment 
information?

5. Should Celeborn wait until FLIP-459 is released before releasing this 
integration? Which Flink version will include FLIP-459?

Regards,
Nicholas Jiang

On 2024/05/28 12:51:32 Yuxin Tan wrote:
> Hi all,
> 
> I would like to start a discussion on CIP-6 Support Flink hybrid shuffle
> integration with Apache Celeborn[1]. Celeborn provides a stable, performant,
> scalable remote shuffle service. Concurrently, Flink hybrid shuffle supports
> transitions between memory, disk, and remote storage to improve performance
> and job stability. This integration proposal is to harness the benefits from
> both Celeborn and hybrid shuffle simultaneously.
> 
> Note that this proposal has two parts.
> 1. The Celeborn-side changes are in CIP-6[1].
> 2. The Flink-side modifications are in FLIP-459[2].
> 
> Looking forward to everyone's feedback and suggestions. Thank you!
> 
> [1]
> https://cwiki.apache.org/confluence/display/CELEBORN/CIP-6+Support+Flink+hybrid+shuffle+integration+with+Apache+Celeborn
> [2]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-459%3A+Support+Flink+hybrid+shuffle+integration+with+Apache+Celeborn
> 
> Best,
> Yuxin
> 


Re: [DISCUSS] Celeborn CLI Proposal

2024-06-05 Thread Nicholas Jiang
Hi Aravind,

Thanks for driving this CIP about Celeborn CLI. I have some comments on this 
CIP:

1. From a user's perspective, a CLI is mostly used for maintenance operations 
such as bringing servers online and offline, rescaling the cluster, etc., and 
is not only a wrapper around the REST API. Which maintenance interfaces does 
the CLI provide that the REST API doesn’t have?

2. MASTER and WORKER share some of the same sub-commands. Why don't these 
sub-commands belong to BOTH?

3. Does the implementation of the CLI invoke the REST API? IMO, the CLI should 
work whether or not the server is alive.

BTW, could the design doc of this proposal follow the CIP template[1]?

[1] 
https://cwiki.apache.org/confluence/display/CELEBORN/Celeborn+Improvement+Proposals

Regards,
Nicholas Jiang

On 2024/06/05 23:33:02 Aravind Patnam wrote:
> Hi all,
> 
> I have written up a proposal about introducing a CLI for Celeborn. You can
> find the proposal
> <https://docs.google.com/document/d/1j9wKFSR_ychYDF0NU5YN67WCCtNAgYTbN5CN8V3SOnk/edit?usp=sharing>
> here.
> Please let me know if you have any comments or questions.
> 
> TLDR by introducing a CLI, it would complement the existing dashboard and
> would benefit us internally. We rely on CLI tools internally a lot for
> automation and other operations.
> 
> FYI, I was not able to access the cwiki page to put this proposal there,
> there seems to be some permissions issue. Hope it is okay to just share as
> a google doc here for now.
> 
> -- 
> Aravind K. Patnam
> 
>  Apache Celeborn CLI Proposal
> <https://docs.google.com/document/d/1j9wKFSR_ychYDF0NU5YN67WCCtNAgYTbN5CN8V3SOnk/edit?usp=drive_web>
> 


Re:Re: [VOTE] Contribute Apache Celeborn CLI

2024-06-11 Thread Nicholas Jiang
+1. Looking forward to Celeborn CLI.

Regards,
Nicholas Jiang


At 2024-06-12 12:26:34, "Aravind Patnam"  wrote:
>Hi all,
>
>Sorry, this is the correct link to the Celeborn CLI CIP
><https://cwiki.apache.org/confluence/display/CELEBORN/CIP+7+-+Celeborn+CLI>
>.
>
>Thanks,
>Aravind
>
>On Tue, Jun 11, 2024 at 9:24 PM Aravind Patnam  wrote:
>
>> Hi all,
>>
>> This is a call to vote to contribute the Celeborn CLI CIP
>> <https://cwiki.apache.org/confluence/display/CELEBORN/Celeborn+Improvement+Proposals>
>>  to
>> Apache Celeborn.
>>
>> Please do vote accordingly:
>> [ ] +1 approve
>> [ ] +0 no opinion
>> [ ] -1 disapprove (and the reason)
>>
>> Thanks once again!!
>>
>> Aravind
>>
>
>
>-- 
>Aravind K. Patnam


Re: [DISCUSSION] CIP-6: Support Flink hybrid shuffle integration with Apache Celeborn

2024-06-11 Thread Nicholas Jiang
Hi Yuxin,

Thanks for the explanation of the questions above. IMO, for the implementation 
on the Celeborn side, more design details should be provided in the CIP rather 
than the FLIP so that community developers can review them. Meanwhile, although 
some public configurations are set in Flink, for the Celeborn Flink client they 
still need to be exposed in the CIP as well, so that reviewers do not have to 
spend time reading the FLIP. Anyway, I got the answers from your detailed 
reply. Thanks.

+1 from me. Looking forward to this integration. I would like to consider using 
this feature once it is ready.

Regards,
Nicholas Jiang

On 2024/06/11 08:09:06 Yuxin Tan wrote:
> Hi Nicholas,
> 
> Thanks for the valuable feedbacks.
> 
> > 1.  Could you describe in detail what functions the relevant components
> mentioned in Proposed Changes
> 
> These components are only the pluggable implementations of the Celeborn
> tier.
> The details and the mechanisms of switching between tiers are in the
> previous
> FLIP[1]. The Celeborn, as a new tier, is added to hybrid shuffle, sharing
> the
> similarities with existing tiers, such as the Memory tier and Disk tier. In
> this tiered
> storage, agents serve as the entry points of interaction between the
> framework
> and different tiers. For instance, CelebornProducerAgent acts as the entry
> point
> for producers to emit data into the tier. If there are still more similar
> questions
> after referencing that FLIP, please feel free to let me know.
> 
> > 2. Can you briefly introduce how to guarantee compatibility with
> Celeborn’s
> existing features such as partition splitting?
> 
> This integration work is a new way to make Celeborn work with Flink, so the
> compatibility of the old shuffle service mode is not affected. The new
> integration
> will also support the features of the old mode, e.g., the partition split
> will be
> supported by trying to open the stream from the next partition when the
> previous
> partition is read completely. Since these features are all implementation
> details,
> initially I didn't add them in the CIP to keep it focused, simple, and easy
> to
> understand. After the question, I have added some feature details to it.
> 
> > 3. Is there any public configuration of integration with Hybrid Shuffle
> and Flink
> client?
> 
> Yes, there is an added Flink configuration, which is described in the
> FLIP[2].
> 
> 
> > 4. How does the server side guarantee the accuracy and recoverability of
> Segment information?
> 
> Similar to other writing information, the segment info is also added to
> FileInfo, and the lock protects it to guarantee accuracy. Recoverability is
> achieved by serialization and deserialization, the same as for other fields.
> 
> > 5. Should Celeborn wait until FLIP-459 is released before releasing this
> > integration? Which Flink version will release FLIP-459?
> 
> Celeborn's integration should wait for FLIP-459 to be released. This is
> because the feature relies on both CIP-6 and FLIP-459 to function correctly.
> If all goes well, FLIP-459 could be part of Flink's next release, Flink 1.20.
> 
> 
> Hi, Keyong,
> 
> Thanks for the reminder and the interest in the Reduce Partition. After the
> Map Partition part is finished, we will continue to work on it as soon as
> possible.
> 
> 
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-301%3A+Hybrid+Shuffle+supports+Remote+Storage
> [2]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-459%3A+Support+Flink+hybrid+shuffle+integration+with+Apache+Celeborn
> 
> Best,
> Yuxin
> 
> 
> Keyong Zhou wrote on Sat, Jun 8, 2024 at 13:00:
> 
> > Hi Yuxin and Xintong,
> >
> > Really excited to see Flink and Celeborn communities collaborate
> > more on shuffle component! I believe this will inspire more for both sides
> > :)
> >
> > +1 for this proposal, looking forward to seeing this feature make progress.
> >
> > Also, I'm very interested in integrating Flink Hybrid Shuffle with
> > Celeborn's Reduce Partition in the future, as mentioned in the doc, which I
> > believe will bring more benefit to very large shuffle operators :)
> >
> > Regards,
> > Keyong Zhou
> >
> > Nicholas Jiang wrote on Thu, Jun 6, 2024 at 13:25:
> >
> > > Hi Yuxin,
> > >
> > > Thanks for driving this CIP about integration with Hybrid Shuffle. I have
> > > some comments on this CIP:
> > >
> > > 1. Could you describe in detail what functions the relevant components
> > > mentioned in Proposed Changes, including CelebornProducerAgent,
> > > Celebor

Re: [VOTE] Release Apache Celeborn 0.5.0-rc2

2024-06-12 Thread Nicholas Jiang
+1 (non-binding)

I have checked:

- Download links are valid.
- git commit hash is correct
- Checksums and signatures are valid.
- No binary files in the source release
- Successfully built binary from source via ./build/make-distribution.sh 
-Pspark-3.5/flink-1.19/mr
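
For readers who want to script the checksum item in the list above, here is a minimal Python sketch. It is only an illustration under stated assumptions: the helper names are hypothetical, and it assumes the first whitespace-separated token of the published checksum text is the hex SHA-512 digest. It does not cover PGP signature verification.

```python
import hashlib


def sha512_of(path, chunk_size=1 << 20):
    """Compute the SHA-512 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(artifact_path, expected_text):
    """Compare a file's digest with the published checksum text.

    Assumption: the first token of expected_text is the hex digest
    (an optional file name may follow it).
    """
    expected_hex = expected_text.split()[0].lower()
    return sha512_of(artifact_path) == expected_hex
```

The artifact path and the checksum text are supplied by the caller; nothing here assumes a particular artifact name.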

Meanwhile, I have upgraded the internal version in our production environment 
to 0.5.0.

Regards,
Nicholas Jiang

On 2024/06/12 05:56:07 Ethan Feng wrote:
> Hello, Celeborn community,
> 
> This is a call for a vote to release Apache Celeborn
> 0.5.0-rc2
> 
> The git tag to be voted upon:
> https://github.com/apache/celeborn/releases/tag/v0.5.0-rc2
> 
> Source and binary artifacts can be found at:
> https://dist.apache.org/repos/dist/dev/celeborn/v0.5.0-rc2
> 
> The git commit hash:
> 68c503eb0023e274f8ae09bf4c2687f6a0c01a25
> 
> The staging repo:
> https://repository.apache.org/content/repositories/orgapacheceleborn-1075/
> 
> The fingerprint of the PGP key release artifacts is signed with:
> FCF20BB29C7BEFDF58F998F76392F71F37356FA0
> 
> My public key to verify signatures can be found in:
> https://dist.apache.org/repos/dist/release/celeborn/KEYS
> 
> The vote will be open for at least 72 hours or until the necessary
> number of votes are reached.
> 
> Please vote accordingly:
> 
> [ ] +1 approve
> [ ] +0 no opinion
> [ ] -1 disapprove (and the reason)
> 
> Steps to validate the release:
> https://www.apache.org/info/verification.html
> 
> * Download links, checksums, and PGP signatures are valid.
> * Source code distributions have correct names matching the current release.
> * LICENSE and NOTICE files are correct.
> * All files have license headers if necessary.
> * No unlicensed compiled archives bundled in the source archive.
> * The source tarball matches the git tag.
> * Build from source is successful.
> 
> There are additional tests:
> * Performance test no regression
> 1 TB TPC-DS, 0.5.0 VS 0.4.1 : 2042(s) VS 2050(s)
> 1.1 TB pure shuffle, 0.5.0 VS 0.4.1 : 11.8min vs 11.8min
> 
> * Result correctness test passed
> 1TB TPC-DS runs concurrently, the results are identical.
> 
> * Usability test passed
> Rolling upgrade from version 0.4.1 to 0.5.0 succeeded.
> The metrics system works as expected.
> 
> * Stability test passed
> Random worker failures, Celeborn works as expected.
> Random master failures, Celeborn works as expected.
> Master meta corrupted, Celeborn works as expected.
> 
> * Compatibility test passed
> The Celeborn server version of 0.5.0 works fine with the Celeborn client 
> 0.4.1.
> 
> 
> Regards,
> Ethan Feng
> 


Re:[VOTE] CIP-6: Support Flink hybrid shuffle integration with Apache Celeborn

2024-06-13 Thread Nicholas Jiang
+1(non-binding)




Regards,

Nicholas Jiang




At 2024-06-14 11:36:17, "Yuxin Tan"  wrote:
>Hi all,
>
>Thanks for all the feedback about the CIP-6: Support Flink
>hybrid shuffle integration with Apache Celeborn[1].
>The discussion thread is here [2].
>
>I'd like to start a vote for it. The vote will be open for at least
>72 hours unless there is an objection or insufficient votes.
>
>[1]
>https://cwiki.apache.org/confluence/display/CELEBORN/CIP-6+Support+Flink+hybrid+shuffle+integration+with+Apache+Celeborn
>[2] https://lists.apache.org/thread/55mwmfsxwprzf5l80so9t2cpny82l4nx
>
>Best,
>Yuxin


Re: [VOTE] Release Apache Celeborn 0.5.0-rc3

2024-06-22 Thread Nicholas Jiang
+1 (non-binding)

I have checked:

- Download links are valid.
- git commit hash is correct
- Checksums and signatures are valid.
- No binary files in the source release
- Successfully built binary from source via ./build/make-distribution.sh 
-Pspark-3.5/flink-1.19/mr

BTW, I have tested the clean performance of the Worker for 0.5.0-rc3. It works 
well and there is no backlog in the clean task queue.

Regards,
Nicholas Jiang

On 2024/06/19 04:45:52 Ethan Feng wrote:
> Hello, Celeborn community,
> 
> This is a call for a vote to release Apache Celeborn
> 0.5.0-rc3
> 
> The git tag to be voted upon:
> https://github.com/apache/celeborn/releases/tag/v0.5.0-rc3
> 
> Source and binary artifacts can be found at:
> https://dist.apache.org/repos/dist/dev/celeborn/v0.5.0-rc3
> 
> The git commit hash:
> 048ef207359113247bff05dcc203c70021ccfa10
> 
> The staging repo:
> https://repository.apache.org/content/repositories/orgapacheceleborn-1076/
> 
> The fingerprint of the PGP key release artifacts is signed with:
> FCF20BB29C7BEFDF58F998F76392F71F37356FA0
> 
> My public key to verify signatures can be found in:
> https://dist.apache.org/repos/dist/release/celeborn/KEYS
> 
> The vote will be open for at least 72 hours or until the necessary
> number of votes are reached.
> 
> Please vote accordingly:
> 
> [ ] +1 approve
> [ ] +0 no opinion
> [ ] -1 disapprove (and the reason)
> 
> Steps to validate the release:
> https://www.apache.org/info/verification.html
> 
> * Download links, checksums, and PGP signatures are valid.
> * Source code distributions have correct names matching the current release.
> * LICENSE and NOTICE files are correct.
> * All files have license headers if necessary.
> * No unlicensed compiled archives bundled in the source archive.
> * The source tarball matches the git tag.
> * Build from source is successful.
> 
> There are additional tests:
> * Performance test no regression
> 1 TB TPC-DS, 0.5.0 VS 0.4.1 : 2042(s) VS 2050(s)
> 1.1 TB pure shuffle, 0.5.0 VS 0.4.1 : 11.8min vs 11.8min
> 
> * Result correctness test passed
> 1TB TPC-DS runs concurrently, the results are identical.
> 
> * Usability test passed
> Rolling upgrade from version 0.4.1 to 0.5.0 succeeded.
> The metrics system works as expected.
> 
> * Stability test passed
> Random worker failures, Celeborn works as expected.
> Random master failures, Celeborn works as expected.
> Master meta corrupted, Celeborn works as expected.
> 
> * Compatibility test passed
> The Celeborn server version of 0.5.0 works fine with the Celeborn client 
> 0.4.1.
> 
> * Grafana dashboard layout checked
> 
> 
> Regards,
> Ethan Feng
> 


[DISCUSS] CIP-10: Introduce Celeborn Chaos Testing Framework

2024-07-02 Thread Nicholas Jiang
Hi all,

I would like to start a discussion on CIP-10: Introduce Celeborn Chaos Testing 
Framework[1].

A chaos testing framework is designed to simulate unpredictable and adverse 
conditions in distributed systems to validate their robustness and resilience. 
This proposal aims to simulate various anomalies and test the stability of 
Celeborn in distributed environments via chaos testing.

Looking forward to everyone's feedback and suggestions. Thank you!

[1] 
https://cwiki.apache.org/confluence/display/CELEBORN/CIP-10+Introduce+Celeborn+Chaos+Testing+Framework

Regards,
Nicholas Jiang

Re: [DISCUSS] CIP-10: Introduce Celeborn Chaos Testing Framework

2024-07-07 Thread Nicholas Jiang
Hi Ethan,

Thanks for your feedback. I have already added Helm chart support for the chaos 
testing framework to the "Test Plan" section. Deployment on a K8s environment 
is important for putting the chaos testing framework into practice.

Regards,
Nicholas Jiang

On 2024/07/04 09:24:32 Ethan Feng wrote:
> Hi Nicholas,
> 
> Thanks for reaching out.
> More and more Celeborn users are moving to cloud-native environments.
> Is there any plan to support testing Celeborn cluster on the K8s
> environment?
> 
> Best,
> Ethan Feng
> 
> Nicholas Jiang wrote on Wed, Jul 3, 2024 at 05:21:
> >
> > Hi all,
> >
> > I would like to start a discussion on CIP-10: Introduce Celeborn Chaos 
> > Testing Framework[1].
> >
> > A chaos testing framework is designed to simulate unpredictable and adverse 
> > conditions in distributed systems to validate their robustness and 
> > resilience. This proposal aims to simulate various anomalies and test the 
> > stability of Celeborn in distributed environments via chaos testing.
> >
> > Looking forward to everyone's feedback and suggestions. Thank you!
> >
> > [1] 
> > https://cwiki.apache.org/confluence/display/CELEBORN/CIP-10+Introduce+Celeborn+Chaos+Testing+Framework
> >
> > Regards,
> > Nicholas Jiang
> 


Re:[DISCUSS] CIP-10: Introduce Celeborn Chaos Testing Framework

2024-07-10 Thread Nicholas Jiang
Hello community,

It's been a while since the discussion on the Celeborn chaos testing framework. 
The main process of Celeborn chaos testing includes:

1. Defining a test plan to describe the types of events, the order in which 
events are triggered, and their duration. Event types include node anomalies, 
disk anomalies, IO anomalies, CPU overload, etc.
2. The client submits the plan to the scheduler.
3. The scheduler sends operations to each node's runner according to the plan 
description.
4. The runner is responsible for executing the operations and reporting the 
current status of the node.
5. Before triggering an operation, the scheduler deduces the result of this 
event. If it leads to the inability to meet the minimum runnable environment 
for RSS, the event is rejected.
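
The five steps above can be sketched roughly as follows. This is a minimal Python illustration only, not the design in the CIP: the class names (Event, Runner, Scheduler), the "min_healthy_nodes" rule, and the simplification that only a "node" event takes a node down are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    kind: str        # hypothetical event types: "node", "disk", "io", "cpu"
    node: str        # node the event targets
    duration_s: int  # how long the anomaly should last


@dataclass
class Runner:
    node: str
    healthy: bool = True
    executed: list = field(default_factory=list)

    def execute(self, event):
        """Step 4: execute the operation and report the node status."""
        self.executed.append(event)
        if event.kind == "node":  # assumption: only node events take a node down
            self.healthy = False
        return "healthy" if self.healthy else "down"


class Scheduler:
    def __init__(self, runners, min_healthy_nodes):
        self.runners = {r.node: r for r in runners}
        self.min_healthy_nodes = min_healthy_nodes

    def _healthy_after(self, event):
        """Step 5: deduce how many nodes stay healthy if the event runs."""
        healthy = {n for n, r in self.runners.items() if r.healthy}
        if event.kind == "node":
            healthy.discard(event.node)
        return len(healthy)

    def submit(self, plan):
        """Steps 2 and 3: accept a plan and dispatch its events in order."""
        results = []
        for event in plan:
            if self._healthy_after(event) < self.min_healthy_nodes:
                results.append((event, "rejected"))
                continue
            results.append((event, self.runners[event.node].execute(event)))
        return results
```

With two runners and min_healthy_nodes=1, a plan that takes down both nodes in turn would have its second node event rejected, because running it would leave no healthy node.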

Do you have any thoughts or questions about this chaos testing framework? 
Feedback is welcome to help further ensure the reliability of Celeborn through 
chaos testing.

Regards,
Nicholas Jiang

At 2024-07-03 05:20:57, "Nicholas Jiang"  wrote:
>Hi all,
>
>I would like to start a discussion on CIP-10: Introduce Celeborn Chaos Testing 
>Framework[1].
>
>A chaos testing framework is designed to simulate unpredictable and adverse 
>conditions in distributed systems to validate their robustness and resilience. 
>This proposal aims to simulate various anomalies and test the stability of 
>Celeborn in distributed environments via chaos testing.
>
>Looking forward to everyone's feedback and suggestions. Thank you!
>
>[1] 
>https://cwiki.apache.org/confluence/display/CELEBORN/CIP-10+Introduce+Celeborn+Chaos+Testing+Framework
>
>Regards,
>Nicholas Jiang


Re: [DISCUSS] CIP-10: Introduce Celeborn Chaos Testing Framework

2024-07-11 Thread Nicholas Jiang
Hey Mridul,

Thanks for your feedback. The ability to reproduce problematic cases by 
capturing logs of the events that have been triggered can maximize the value of 
the chaos testing framework. Celeborn chaos testing needs not only to verify 
the reliability of the service while various abnormal events are simulated, but 
also to reproduce problem cases in order to troubleshoot the root cause of 
Celeborn problems. I would like to take this reproduction feature into 
consideration for this CIP.
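
As a rough illustration of capturing a log of triggered events and replaying it, here is a minimal Python sketch. The JSON-lines format, the "seq" ordering field, and all names (EventLog, replay, apply_to_state) are assumptions for the example and are not part of the CIP.

```python
import json


class EventLog:
    """Append-only log of triggered events, one JSON object per line."""

    def __init__(self):
        self._lines = []

    def record(self, seq, kind, node):
        entry = {"seq": seq, "kind": kind, "node": node}
        self._lines.append(json.dumps(entry, sort_keys=True))

    def dump(self):
        return "\n".join(self._lines)


def replay(log_text, apply):
    """Re-apply logged events in sequence order, whatever the line order."""
    events = [json.loads(line) for line in log_text.splitlines() if line]
    for event in sorted(events, key=lambda e: e["seq"]):
        apply(event)


def apply_to_state(state):
    """Build a callback that records each event kind under its node."""
    def apply(event):
        state.setdefault(event["node"], []).append(event["kind"])
    return apply
```

Replaying the same log text into two empty state dictionaries yields the same state, which is the deterministic property asked for in the feedback.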

Best Regards,
Nicholas Jiang

On 2024/07/10 09:35:52 Mridul Muralidharan wrote:
> Hi,
> 
>   This is a great idea - and would go a long way in flushing out bugs and
> issues - and improving the overall robustness of Celeborn !
> It would also be good to have:
> a) Capture a (replay) log of all events which were triggered.
> b) Ability to 'replay' the log and deterministically reach the same state.
> 
> This will allow us to identify failure cases with the testing framework -
> while allowing developers to deterministically reproduce the identified
> state.
> 
> (Hopefully I did not miss this in the proposal).
> 
> Regards,
> Mridul
> 
> 
> On Wed, Jul 10, 2024 at 4:07 AM Nicholas Jiang wrote:
> 
> > Hello community,
> >
> > It's been a while since the discussion on the Celeborn chaos testing
> > framework. The main process of Celeborn chaos testing includes:
> >
> > 1. Defining a test plan to describe the types of events, the order in
> > which events are triggered, and their duration. Event types include node
> > anomalies, disk anomalies, IO anomalies, CPU overload, etc.
> > 2. The client submits the plan to the scheduler.
> > 3. The scheduler sends operations to each node's runner according to the
> > plan description.
> > 4. The runner is responsible for executing the operations and reporting
> > the current status of the node.
> > 5. Before triggering an operation, the scheduler deduces the result of
> > this event. If it leads to the inability to meet the minimum runnable
> > environment for RSS, the event is rejected.
> >
> > Do you have any thoughts or questions about this chaos testing framework?
> > Welcome feedback to further ensure the reliability of Celeborn through
> > chaos testing.
> >
> > Regards,
> > Nicholas Jiang
> >
> > At 2024-07-03 05:20:57, "Nicholas Jiang"  wrote:
> > >Hi all,
> > >
> > >I would like to start a discussion on CIP-10: Introduce Celeborn Chaos
> > Testing Framework[1].
> > >
> > >A chaos testing framework is designed to simulate unpredictable and
> > adverse conditions in distributed systems to validate their robustness and
> > resilience. This proposal aims to simulate various anomalies and test the
> > stability of Celeborn in distributed environments via chaos testing.
> > >
> > >Looking forward to everyone's feedback and suggestions. Thank you!
> > >
> > >[1]
> > https://cwiki.apache.org/confluence/display/CELEBORN/CIP-10+Introduce+Celeborn+Chaos+Testing+Framework
> > >
> > >Regards,
> > >Nicholas Jiang
> >
> 


Re: [DISCUSS] CIP-10: Introduce Celeborn Chaos Testing Framework

2024-07-15 Thread Nicholas Jiang
Hey Keyong,

Thanks for your feedback. 

In my opinion, validating the data quality by checking "Shuffle Records 
Written" and "Records Read" satisfies the correctness check. Meanwhile, the 
code of the chaos testing framework is separated into a "verifier" module. WDYT?
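
A minimal sketch of the record-count comparison described above, in Python. It assumes the two counters have already been collected per shuffle into plain dictionaries; the function name and the input shape are hypothetical and do not reflect the "verifier" module's actual code.

```python
def mismatched_shuffles(records_written, records_read):
    """Return the shuffle ids whose written and read record counts differ.

    Both arguments map a shuffle id to a record count. A shuffle id that is
    missing on either side is reported as a mismatch.
    """
    mismatched = []
    for shuffle_id in sorted(set(records_written) | set(records_read)):
        if records_written.get(shuffle_id) != records_read.get(shuffle_id):
            mismatched.append(shuffle_id)
    return mismatched
```

An empty result means every shuffle read back exactly as many records as were written.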

Regards,
Nicholas Jiang

On 2024/07/14 17:10:31 Keyong Zhou wrote:
> Thanks for the proposal!
> 
> The chaos framework is very useful for Celeborn. There are two points I
> think are important:
> 1. We need to add a correctness check to the framework; correctness is the
> most important thing.
> 2. The framework should not intrude into the common code.
> 
> Regards,
> Keyong Zhou
> 
> Nicholas Jiang wrote on Fri, Jul 12, 2024 at 14:29:
> 
> > Hey Mridul,
> >
> > Thanks for your feedback. The ability to reproduce problematic cases by
> > capturing logs of events that have been triggered can maximize the value of
> > chaos testing framework. Celeborn chaos testing not only needs to verify
> > the reliability of the service under the background of simulating various
> > abnormal events, but also reproduces problem cases to troubleshoot the root
> > cause of Celeborn problems. I would like to take this reproduction feature
> > into consideration for this CIP.
> >
> > Best Regards,
> > Nicholas Jiang
> >
> > On 2024/07/10 09:35:52 Mridul Muralidharan wrote:
> > > Hi,
> > >
> > >   This is a great idea - and would go a long way in flushing out bugs and
> > > issues - and improving the overall robustness of Celeborn !
> > > It would also be good to have:
> > > a) Capture a (replay) log of all events which were triggered.
> > > b) Ability to 'replay' the log and deterministically reach the same
> > state.
> > >
> > > This will allow us to identify failure cases with the testing framework -
> > > while allowing developers to deterministically reproduce the identified
> > > state.
> > >
> > > (Hopefully I did not miss this in the proposal).
> > >
> > > Regards,
> > > Mridul
> > >
> > >
> > > On Wed, Jul 10, 2024 at 4:07 AM Nicholas Jiang wrote:
> > >
> > > > Hello community,
> > > >
> > > > It's been a while since the discussion on the Celeborn chaos testing
> > > > framework. The main process of Celeborn chaos testing includes:
> > > >
> > > > 1. Defining a test plan to describe the types of events, the order in
> > > > which events are triggered, and their duration. Event types include
> > node
> > > > anomalies, disk anomalies, IO anomalies, CPU overload, etc.
> > > > 2. The client submits the plan to the scheduler.
> > > > 3. The scheduler sends operations to each node's runner according to
> > the
> > > > plan description.
> > > > 4. The runner is responsible for executing the operations and reporting
> > > > the current status of the node.
> > > > 5. Before triggering an operation, the scheduler deduces the result of
> > > > this event. If it leads to the inability to meet the minimum runnable
> > > > environment for RSS, the event is rejected.
> > > >
> > > > Do you have any thoughts or questions about this chaos testing
> > framework?
> > > > Welcome feedback to further ensure the reliability of Celeborn through
> > > > chaos testing.
> > > >
> > > > Regards,
> > > > Nicholas Jiang
> > > >
> > > > At 2024-07-03 05:20:57, "Nicholas Jiang" wrote:
> > > > >Hi all,
> > > > >
> > > > >I would like to start a discussion on CIP-10: Introduce Celeborn Chaos
> > > > Testing Framework[1].
> > > > >
> > > > >A chaos testing framework is designed to simulate unpredictable and
> > > > adverse conditions in distributed systems to validate their robustness
> > and
> > > > resilience. This proposal aims to simulate various anomalies and test
> > the
> > > > stability of Celeborn in distributed environments via chaos testing.
> > > > >
> > > > >Looking forward to everyone's feedback and suggestions. Thank you!
> > > > >
> > > > >[1]
> > > >
> > https://cwiki.apache.org/confluence/display/CELEBORN/CIP-10+Introduce+Celeborn+Chaos+Testing+Framework
> > > > >
> > > > >Regards,
> > > > >Nicholas Jiang
> > > >
> > >
> >
> 


Re: [DISCUSS] CIP-10: Introduce Celeborn Chaos Testing Framework

2024-07-15 Thread Nicholas Jiang
Hi community,

I would like to emphasize here that this proposal is derived from the chaos 
testing framework built by @Ethan Feng within Alibaba Cloud. Thanks to 
@Ethan Feng for the effort of providing a testing framework to verify the 
reliability and stability of Celeborn.

Thanks,
Nicholas Jiang

On 2024/07/02 21:20:57 Nicholas Jiang wrote:
> Hi all,
> 
> I would like to start a discussion on CIP-10: Introduce Celeborn Chaos 
> Testing Framework[1].
> 
> A chaos testing framework is designed to simulate unpredictable and adverse 
> conditions in distributed systems to validate their robustness and 
> resilience. This proposal aims to simulate various anomalies and test the 
> stability of Celeborn in distributed environments via chaos testing.
> 
> Looking forward to everyone's feedback and suggestions. Thank you!
> 
> [1] 
> https://cwiki.apache.org/confluence/display/CELEBORN/CIP-10+Introduce+Celeborn+Chaos+Testing+Framework
> 
> Regards,
> Nicholas Jiang


Re:Re: [ANNOUNCE] New Celeborn Committer: Fei Wang

2024-07-22 Thread Nicholas Jiang
Congratulations!

Regards,
Nicholas Jiang


At 2024-07-23 12:21:19, "Yihe Li" wrote:
>Congratulations!
>
>Regards,
>Yihe Li
>
>On 2024/07/23 04:16:20 Keyong Zhou wrote:
>> Congratulations!
>> 
>> Regards,
>> Keyong Zhou
>> 
>> angers zhu wrote on Tue, Jul 23, 2024 at 12:07:
>> 
>> > Congratulations!
>> >
>> > Shaoyun Chen wrote on Tue, Jul 23, 2024 at 11:15:
>> >
>> > > Congratulations!
>> > >
>> > > Cheng Pan wrote on Tue, Jul 23, 2024 at 11:05:
>> > > >
>> > > > Hi Celeborn Community,
>> > > >
>> > > > The Project Management Committee (PMC) for Apache Celeborn
>> > > > has invited Fei Wang to become a committer and we are pleased
>> > > > to announce that he has accepted.
>> > > >
>> > > > Being a committer enables easier contribution to the
>> > > > project since there is no need to go via the patch
>> > > > submission process. This should enable better productivity.
>> > > > A PMC member helps manage and guide the direction of the project.
>> > > >
>> > > > Please join me in congratulating Fei!
>> > > >
>> > > > Thanks,
>> > > > Cheng Pan
>> > >
>> >
>> 


Re: [VOTE] Release Apache Celeborn 0.4.2-rc1

2024-07-25 Thread Nicholas Jiang
+1 (binding)
I have checked
- Download links are valid.
- Checksums and signatures are valid.
- Git commit hash is correct
- No binary files in the source release
- Built binary from source code with command successfully: 
./build/make-distribution.sh -Pspark-3.3

Meanwhile, I have also tested 0.4.2 with the chaos testing framework, and 
everything worked well.

Regards,
Nicholas Jiang

On 2024/07/22 10:14:45 Fu Chen wrote:
> Hi Celeborn community,
> 
> This is a call for a vote to release Apache Celeborn 0.4.2-rc1
> 
> 
> The git tag to be voted upon:
> https://github.com/apache/celeborn/releases/v0.4.2-rc1
> 
> 
> The git commit hash:
> d3639bb3c3d4cb2d224f1d6542e2d20d3047c76e
> 
> Source and binary artifacts can be found at:
> https://dist.apache.org/repos/dist/dev/celeborn/v0.4.2-rc1/
> 
> 
> The staging repo:
> https://repository.apache.org/content/repositories/orgapacheceleborn-1079
> 
> 
> Fingerprint of the PGP key release artifacts are signed with:
> 92AF4750DAFCB5E25B5B83EA76F54B977EB5C09B
> 
> 
> My public key to verify signatures can be found in:
> https://dist.apache.org/repos/dist/release/incubator/celeborn/KEYS
> 
> 
> The vote will be open for at least 72 hours or until the necessary
> number of votes are reached.
> 
> 
> Please vote accordingly:
> 
> 
> [ ] +1 approve
> [ ] +0 no opinion
> [ ] -1 disapprove (and the reason)
> 
> Thanks,
> Fu Chen
> 


Re: [VOTE] Release Apache Celeborn 0.5.1-rc0

2024-07-25 Thread Nicholas Jiang
+1 (binding)

I have checked
- Download links are valid.
- Checksums and signatures are valid.
- Git commit hash is correct
- No binary files in the source release
- Built binary from source code with command successfully: 
./build/make-distribution.sh -Pspark-3.5

I upgraded our internal test environment to 0.5.1-snapshot a week ago, and 
0.5.1 works well there.

Regards,
Nicholas Jiang

On 2024/07/23 03:38:57 Ethan Feng wrote:
> Hello, Celeborn community,
> 
> This is a call for a vote to release Apache Celeborn
> 0.5.1-rc0
> 
> The git tag to be voted upon:
> https://github.com/apache/celeborn/releases/tag/v0.5.1-rc0
> 
> Source and binary artifacts can be found at:
> https://dist.apache.org/repos/dist/dev/celeborn/v0.5.1-rc0
> 
> The git commit hash:
> 85297ca64c0973f19b15831031859c3104f0db5b
> 
> The staging repo:
> https://repository.apache.org/content/repositories/orgapacheceleborn-1081
> 
> The fingerprint of the PGP key release artifacts is signed with:
> FCF20BB29C7BEFDF58F998F76392F71F37356FA0
> 
> My public key to verify signatures can be found in:
> https://dist.apache.org/repos/dist/release/celeborn/KEYS
> 
> The vote will be open for at least 72 hours or until the necessary
> number of votes are reached.
> 
> Please vote accordingly:
> 
> [ ] +1 approve
> [ ] +0 no opinion
> [ ] -1 disapprove (and the reason)
> 
> Steps to validate the release:
> https://www.apache.org/info/verification.html
> 
> * Download links, checksums, and PGP signatures are valid.
> * Source code distributions have correct names matching the current release.
> * LICENSE and NOTICE files are correct.
> * All files have license headers if necessary.
> * No unlicensed compiled archives bundled in the source archive.
> * The source tarball matches the git tag.
> * Build from source is successful.
> 
> Regards,
> Ethan Feng
> 


Re: [DISCUSS] CIP-10: Introduce Celeborn Chaos Testing Framework

2024-08-02 Thread Nicholas Jiang
Hi all,

Thank you for your feedback. I will start a VOTE thread for CIP-10.

Regards,
Nicholas Jiang

On 2024/07/02 21:20:57 Nicholas Jiang wrote:
> Hi all,
> 
> I would like to start a discussion on CIP-10: Introduce Celeborn Chaos 
> Testing Framework[1].
> 
> A chaos testing framework is designed to simulate unpredictable and adverse 
> conditions in distributed systems to validate their robustness and 
> resilience. This proposal aims to simulate various anomalies and test the 
> stability of Celeborn in distributed environments via chaos testing.
> 
> Looking forward to everyone's feedback and suggestions. Thank you!
> 
> [1] 
> https://cwiki.apache.org/confluence/display/CELEBORN/CIP-10+Introduce+Celeborn+Chaos+Testing+Framework
> 
> Regards,
> Nicholas Jiang


[VOTE] CIP-10: Introduce Celeborn Chaos Testing Framework

2024-08-02 Thread Nicholas Jiang
Hi all,

Thanks for all the feedback about the CIP-10: Introduce Celeborn Chaos Testing 
Framework[1]. The discussion thread is here [2].

I'd like to start a vote for it. The vote will be open for at least 72 hours 
unless there is an objection or insufficient votes.

Please vote accordingly:

[ ] +1 approve
[ ] +0 no opinion
[ ] -1 disapprove (and the reason)


[1] 
https://cwiki.apache.org/confluence/display/CELEBORN/CIP-10+Introduce+Celeborn+Chaos+Testing+Framework
[2] https://lists.apache.org/thread/670qw80wwfflgv3djqg4304xqy9y8l19

Regards,
Nicholas Jiang

Re: [DRAFT] Celeborn Board Report

2024-08-13 Thread Nicholas Jiang
Thanks Keyong! LGTM.

Regards,
Nicholas Jiang

On 2024/08/12 02:58:10 Keyong Zhou wrote:
> Hi community,
> 
> The board report is due on August 14th. Following is the draft I made; any
> comments will be appreciated, thanks!
> 
> ## Description:
> The mission of Apache Celeborn is the creation and maintenance of software
> related to an intermediate data service for big data computing engines to
> boost performance, stability, and flexibility.
> 
> ## Project Status:
> Current project status: Ongoing
> Issues for the board: None
> 
> ## Membership Data:
> There are currently 22 committers and 14 PMC members in this project.
> The Committer-to-PMC ratio is roughly 3:2.
> 
> Community changes, past quarter:
> 
> - Nicholas Jiang was added to the PMC on 2024-07-23.
> - Fei Wang was added as committer on 2024-07-23.
> 
> ## Project Activity:
> Software development activity:
> 
>  - We released 0.5.1 on July 29th.
>  - We released 0.4.2 on July 26th.
>  - We released 0.5.0 on June 24th.
>  - Support for Apache Flink 1.20 is merged.
>  - Support for Apache Tez is under development.
>  - Several CIPs have been discussed and voted on, including Support Flink
> hybrid shuffle, Celeborn CLI, Chaos Testing Framework, etc.
> 
> Meetups and Conferences:
> 
>  - 4 related talks were given in Apache CoC Asia 2024.
> 
> Recent releases:
> 
> - 0.5.1 was released on July 29th, 2024.
> - 0.4.2 was released on July 26th, 2024.
> - 0.5.0 was released on June 24th, 2024.
> 
> ## Community Health:
> Overall community health is good. In the past quarter, the dev mailing list
> had a 73% increase. We have been performing extensive outreach for our users,
> and encouraging them to contribute back to the project. Also, we are actively
> speaking at various conferences to attract more users.
> 
> Regards,
> Keyong Zhou
> 


Re: [VOTE] CIP-10: Introduce Celeborn Chaos Testing Framework

2024-08-22 Thread Nicholas Jiang
Hi all,

Thanks for all your votes, I hereby close the vote and I'll announce the 
results in a separate email.

Regards,
Nicholas Jiang

On 2024/08/02 14:37:19 Nicholas Jiang wrote:
> Hi all,
> 
> Thanks for all the feedback about the CIP-10: Introduce Celeborn Chaos 
> Testing Framework[1]. The discussion thread is here [2].
> 
> I'd like to start a vote for it. The vote will be open for at least 72 hours 
> unless there is an objection or insufficient votes.
> 
> Please vote accordingly:
> 
> [ ] +1 approve
> [ ] +0 no opinion
> [ ] -1 disapprove (and the reason)
> 
> 
> [1] 
> https://cwiki.apache.org/confluence/display/CELEBORN/CIP-10+Introduce+Celeborn+Chaos+Testing+Framework
> [2] https://lists.apache.org/thread/670qw80wwfflgv3djqg4304xqy9y8l19
> 
> Regards,
> Nicholas Jiang


[RESULT][VOTE] CIP-10: Introduce Celeborn Chaos Testing Framework

2024-08-22 Thread Nicholas Jiang
Hi, all

I am happy to say that CIP-10: Introduce Celeborn Chaos Testing Framework[1] 
has been accepted.

There are 4 votes, of which 3 are binding[2].

Ethan Feng (binding)
Jiashu Xiong (binding)
Keyong Zhou (binding)
Mridul Muralidharan (non-binding)

There are no disapproving votes.

Thanks to everyone who participated in the discussion and voting.


[1] 
https://cwiki.apache.org/confluence/display/CELEBORN/CIP-10+Introduce+Celeborn+Chaos+Testing+Framework
[2] https://lists.apache.org/thread/2lqq021vyc98w3yly678s8lpv0o8vpz5


Regards,
Nicholas Jiang