zuston commented on issue #378:
URL:
https://github.com/apache/incubator-uniffle/issues/378#issuecomment-1373447729
This issue tracks all optimizations for huge partitions, and all
sub-tasks will be linked to it. The approach to handling huge partitions
is to flush them to HDFS directly and limit their memory usage. The subtasks
are as follows.
1. Speed up flushing partition data to HDFS.
- [x] https://github.com/apache/incubator-uniffle/pull/396
2. Introduce a memory usage limit for huge partitions.
- [ ] Record the data size of every partition in `ShuffleTaskInfo`
- [ ] Introduce a storage selector strategy in MultipleStorageManager (to
support flushing huge partitions to HDFS directly), replacing the fallback strategy
- [ ] Introduce more metrics to monitor huge partitions, and so on
3. Support splitting a huge event into multiple smaller events that can be
flushed concurrently, to speed up flushing.
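
To make the second item more concrete, here is a rough sketch of how per-partition size tracking could drive the storage choice. It is not taken from the Uniffle codebase: the class `HugePartitionSketch`, the `Target` enum, the method names, and the byte threshold are all hypothetical, and the real `ShuffleTaskInfo` and MultipleStorageManager may look quite different.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: none of these names come from the Uniffle codebase.
final class HugePartitionSketch {
    // Assumed storage targets; the real storage types may differ.
    enum Target { LOCAL_FILE, HDFS }

    // Assumed configuration value: a partition larger than this counts as "huge".
    private final long hugePartitionThresholdBytes;

    // Running data size per partition id, safe for concurrent writers.
    private final ConcurrentHashMap<Integer, AtomicLong> partitionSizes =
        new ConcurrentHashMap<>();

    HugePartitionSketch(long hugePartitionThresholdBytes) {
        this.hugePartitionThresholdBytes = hugePartitionThresholdBytes;
    }

    // Adds newly received bytes to a partition and returns its running total.
    long record(int partitionId, long bytes) {
        return partitionSizes
            .computeIfAbsent(partitionId, id -> new AtomicLong())
            .addAndGet(bytes);
    }

    // Returns the recorded size of a partition, or 0 if nothing was recorded.
    long sizeOf(int partitionId) {
        AtomicLong size = partitionSizes.get(partitionId);
        return size == null ? 0L : size.get();
    }

    // Selects HDFS once a partition exceeds the threshold, local file otherwise.
    Target select(int partitionId) {
        return sizeOf(partitionId) > hugePartitionThresholdBytes
            ? Target.HDFS
            : Target.LOCAL_FILE;
    }
}
```

With a threshold of 100 bytes, for example, a partition that has received 60 bytes would stay on local file storage, and would switch to HDFS once another 50 bytes arrive. Selecting by recorded size up front, rather than falling back after a write fails, is the point of the second item above.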
cc @jerqi @advancedxy I will create issues and PRs for these subtasks; feel
free to discuss further.