Hi all,

I have tested quick graceful shutdown and rolling restart of workers. I fixed some minor problems, and it works fine now.

Best Regards,
Angerszhuuuu

Keyong Zhou <zho...@apache.org> wrote on Wed, Dec 21, 2022 at 11:39:

> Thank you guys for the valuable tests!
>
> angers zhu <angers....@gmail.com> wrote on Wed, Dec 21, 2022 at 10:33:
>
> > Hi all,
> >
> > I have tested quota management and made some minor improvements. It
> > works well.
> > Later I will test worker quick restart.
> >
> > apache <ethanf...@apache.org> wrote on Wed, Dec 21, 2022 at 10:00:
> >
> >> Hi community:
> >> I've tested the load-aware slots allocation module on branch-0.2 with a
> >> mixed cluster of HDDs, SSDs, and ESSDs. As expected, workers with faster
> >> disk drives have more slots on them.
> >>
> >> The way I tested is as follows:
> >> I ran 1TB TPC-DS while 1TB Terasort was running repeatedly and tested
> >> different parameter combinations on three node groups. The group core has
> >> 10x NVMe SSD drives. Group task 1 has 4x ESSD drives. Group task 2 has 12x
> >> SATA HDD drives.
> >>
> >> Here are some snapshots:
> >>
> >> Graph 1: Only NVMe SSD and HDD workers are involved and
> >> "celeborn.slots.assign.loadAware.numDiskGroups" is set to 2.
> >>
> >>
> >> Graph 2: All nodes are involved and
> >> "celeborn.slots.assign.loadAware.numDiskGroups" is set to 3.
> >>
> >>
> >> Thanks,
> >> Ethan Feng
> >> On Dec 14, 2022, at 19:41 +0800, Keyong Zhou <zho...@apache.org> wrote:
> >>
> >> Hi celeborn (-incubating) community:
> >>
> >> Currently we are preparing for the first release (branch-0.2). To ensure
> >> code quality, I would like to test for core-path correctness and
> >> stability.
> >> Could Angerszhuuuu <angers....@gmail.com> and nafiyaix
> >> <nafiyai...@gmail.com> help test graceful shutdown and rolling upgrade?
> >> And could Ethan Feng <ethan.aquarius....@gmail.com> help test load-aware
> >> slots allocation?
> >>
> >> And we would be very happy if anyone can help test other modules
> >> (k8s, HA, etc.).
> >>
> >> Thanks,
> >> Keyong
> >>