Hi Celeborn (Incubating) community,

I have tested the core path and fixed one bug [1]. After the fix, my tests pass.
I tested as follows: I ran 1 TB TPC-DS with Celeborn enabled and randomly killed workers during the run, taking care not to trigger data loss. At the end I checked whether the query results were correct.

[1] https://github.com/apache/incubator-celeborn/pull/1101

Thanks,
Keyong Zhou

Keyong Zhou <zho...@apache.org> wrote on Wed, Dec 14, 2022 at 19:41:

> Hi celeborn (-incubating) community:
>
> Currently we are preparing for the first release (branch-0.2). To ensure
> code quality, I would like to test for core-path correctness and stability.
> Could Angerszhuuuu <angers....@gmail.com> and nafiyaix
> <nafiyai...@gmail.com> help test graceful shutdown and rolling upgrade?
> And could Ethan Feng <ethan.aquarius....@gmail.com> help test load-aware
> slots allocation?
>
> And we would be rather happy if anyone can help test other modules
> (k8s, HA, etc.).
>
> Thanks,
> Keyong