Fan Hong created FLINK-31625:
--------------------------------
Summary: Possible OOM in KBinsDiscretizer
Key: FLINK-31625
URL: https://issues.apache.org/jira/browse/FLINK-31625
Project: Flink
Issue Type: Bug
Components: Library / Machine Learning
Reporter: Fan Hong
In KBinsDiscretizer, the main computation (`findBinEdgesWithXXXStrategy`) runs in a single subtask. Data sampling is used to reduce memory usage, but the memory overhead can still be prohibitive for large input vectors and may lead to OOM errors. A possible solution is to parallelize the computation by distributing the data evenly among all workers.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
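To illustrate the proposed fix, here is a minimal sketch in plain Python (not Flink ML code) of one possible way to spread the work: feature columns are assigned round-robin to workers, each worker receives only its own columns and computes bin edges for them, and the partial results are merged. It assumes the equal-width ("uniform") strategy and uses threads as stand-ins for Flink subtasks; `uniform_bin_edges`, `partition_columns`, and `find_bin_edges_parallel` are hypothetical names, not part of the KBinsDiscretizer API.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Sequence


def uniform_bin_edges(values: Sequence[float], num_bins: int) -> List[float]:
    """Equal-width bin edges (num_bins + 1 of them) for a single feature."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins
    return [lo + i * width for i in range(num_bins + 1)]


def partition_columns(num_cols: int, num_workers: int) -> List[List[int]]:
    """Assign feature indices to workers round-robin so the load is balanced."""
    return [list(range(w, num_cols, num_workers)) for w in range(num_workers)]


def find_bin_edges_parallel(
    rows: List[List[float]], num_bins: int, num_workers: int
) -> List[List[float]]:
    """Hypothetical parallel version: each worker handles a subset of columns."""
    num_cols = len(rows[0])
    # Build per-worker slices up front so each worker only holds the columns
    # it owns, mimicking how a distributed job would ship data to subtasks.
    slices: List[Dict[int, List[float]]] = [
        {c: [row[c] for row in rows] for c in cols}
        for cols in partition_columns(num_cols, num_workers)
    ]

    def work(columns: Dict[int, List[float]]) -> Dict[int, List[float]]:
        # Compute edges independently for every column this worker owns.
        return {c: uniform_bin_edges(vals, num_bins) for c, vals in columns.items()}

    edges: Dict[int, List[float]] = {}
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        for partial in pool.map(work, slices):
            edges.update(partial)  # merge partial results from each worker
    # Return edges in the original column order.
    return [edges[c] for c in range(num_cols)]


if __name__ == "__main__":
    data = [[0.0, 1.0, 5.0], [10.0, 3.0, 5.0], [4.0, 2.0, 5.0]]
    print(find_bin_edges_parallel(data, num_bins=2, num_workers=2))
```

Splitting by column keeps each column's computation exact for any strategy, since no worker needs another worker's data. For the uniform strategy alone, splitting by row and merging per-partition min/max values would also be exact; quantile-based strategies would need a mergeable approximation instead.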