stream2000 commented on code in PR #9395: URL: https://github.com/apache/hudi/pull/9395#discussion_r1328657021
########## hudi-common/src/main/java/org/apache/hudi/common/engine/HoodieEngineContext.java: ########## @@ -105,4 +105,12 @@ public abstract <I, K, V> List<V> reduceByKey( public abstract void cancelJob(String jobId); public abstract void cancelAllJobs(); + + public <T> Stream<T> stream(List<T> data, Integer parallelism) { + return stream(stream(data.stream(), data.size()), parallelism); + } + + public <T> Stream<T> stream(Stream<T> data, Integer parallelism) { + return parallelism == null || parallelism > Runtime.getRuntime().availableProcessors() ? data : data.parallel(); Review Comment: @yihua I agree that reducing the parallelism can avoid OOM in some situations. I'm just wondering should we reduce the parallelism automatically. `HoodieFlinkEngineContext.map` provides a param `parallelism` and we can configure the parallelism as we want. In this scenario (Clean OOM in flink Engine) we can just set `hoodie.cleaner.parallelism` as 1 to reduce the parallelism. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org