Lobo2008 opened a new pull request, #2607: URL: https://github.com/apache/uniffle/pull/2607
### What changes were proposed in this pull request?

Introduce a configuration `mapreduce.rss.client.combiner.enable` to control whether the map-stage combiner runs in the Uniffle MapReduce client. The default value is `false`, to prevent job instability caused by GC storms on a large send buffer.

### Why are the changes needed?

Running the map-stage combiner on a large send buffer (`mapreduce.task.io.sort.mb * mapreduce.rss.client.sort.memory.use.threshold`) can trigger severe GC overhead, which may stall the MapTask and sender threads and cause the job to hang. Most users do not need the combiner by default.

### Does this PR introduce _any_ user-facing change?

Yes, this adds a new optional configuration for expert users. Default behavior remains stable.

### How was this patch tested?

Manually tested with MapReduce jobs that define combiners. Verified that jobs run successfully with the combiner disabled.

1. **Combiner disabled**: MapTasks completed normally with fast GC cycles. Sample logs:

   ```
   [2025-09-11 19:48:47] S0: 0MB, S1: 0MB, Eden: 299.02MB, Old: 11.53MB, ... Total: 0.101s
   ...
   [2025-09-11 19:49:30] S0: 82.88MB, S1: 0MB, Eden: 160.04MB, Old: 485.99MB, ...Total: 6.683s
   ...
   [2025-09-11 19:49:57] S0: 0MB, S1: 0MB, Eden: 207.66MB, Old: 532.27MB, ...Total: 12.474s
   ```

   > The MapTask completed successfully within 1 minute.

2. **Combiner enabled**: MapTask GC cycles grew very long; the job stalled and was eventually killed. Sample logs:

   ```
   [2025-09-11 19:52:00] S0: 0MB, S1: 0MB, Eden: 80.49MB, Old: 12.86MB, ... Total: 0.054s
   [2025-09-11 19:52:08] S0: 0MB, S1: 0MB, Eden: 515.53MB, Old: 27.24MB, ... Total: 0.149s
   ...
   [2025-09-11 20:01:54] S0: 0MB, S1: 0MB, Eden: 60.36MB, Old: 687.51MB, ... Total: 242.505s
   ```

   > The MapTask did not complete after 9 minutes.

These logs demonstrate that disabling the map-stage combiner avoids severe GC overhead and job stalls, validating the safety switch.
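For readers who want to reason about the switch, here is a minimal sketch of how the flag and the send-buffer size described above might be evaluated. It is not the code in this PR: the class `CombinerSwitchSketch`, its helper methods, and the use of a plain `Map` in place of a Hadoop `Configuration` are all hypothetical. The three key names and the `false` default come from the description; the gating rule (run the combiner only when the job defines one and the switch is on) and the sample values in `main` are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

public class CombinerSwitchSketch {
    // Key names as written in the PR description.
    static final String COMBINER_ENABLE = "mapreduce.rss.client.combiner.enable";
    static final String SORT_MB = "mapreduce.task.io.sort.mb";
    static final String SORT_THRESHOLD = "mapreduce.rss.client.sort.memory.use.threshold";

    // Hypothetical helper: the switch is off unless explicitly set to "true",
    // matching the default of false stated in the PR.
    static boolean combinerEnabled(Map<String, String> conf) {
        return Boolean.parseBoolean(conf.getOrDefault(COMBINER_ENABLE, "false"));
    }

    // Hypothetical helper: send-buffer size in bytes, computed as
    // io.sort.mb * sort.memory.use.threshold (the product named in the PR).
    static long sendBufferBytes(int sortMb, double threshold) {
        return (long) (sortMb * 1024L * 1024L * threshold);
    }

    // Assumed gating rule: combine on the map side only when the job
    // defines a combiner AND the user has opted in through the switch.
    static boolean shouldRunCombiner(Map<String, String> conf, boolean jobHasCombiner) {
        return jobHasCombiner && combinerEnabled(conf);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(SORT_MB, "512");          // illustrative value, not from the PR
        conf.put(SORT_THRESHOLD, "0.5");   // illustrative value, not from the PR

        long bytes = sendBufferBytes(
            Integer.parseInt(conf.get(SORT_MB)),
            Double.parseDouble(conf.get(SORT_THRESHOLD)));
        System.out.println("send buffer bytes: " + bytes);
        System.out.println("combiner runs by default: " + shouldRunCombiner(conf, true));

        conf.put(COMBINER_ENABLE, "true"); // expert opt-in
        System.out.println("combiner runs after opt-in: " + shouldRunCombiner(conf, true));
    }
}
```

Keeping the switch as a separate opt-in means a job that already declares a combiner keeps working unchanged on the reduce side, while the memory-heavy map-side pass over the send buffer only happens when a user asks for it.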
