Lobo2008 opened a new pull request, #2607:
URL: https://github.com/apache/uniffle/pull/2607

   ### What changes were proposed in this pull request?
   
   Introduce a configuration, `mapreduce.rss.client.combiner.enable`, that controls whether the map-stage combiner runs in the Uniffle MapReduce client.  
   The default value is `false`, to prevent the job instability caused by GC storms when the combiner runs over a large send buffer.
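
   The gating described above can be pictured as follows. This is a minimal sketch, not Uniffle's actual code: the class `CombinerSwitch` and its methods are hypothetical, and a plain `Map` stands in for Hadoop's `JobConf` so the example runs without Hadoop on the classpath (a real client would more likely call `Configuration.getBoolean(key, false)`).

   ```java
   import java.util.Map;

   /**
    * Hypothetical sketch: gate the map-stage combiner behind
    * mapreduce.rss.client.combiner.enable. A Map stands in for JobConf.
    */
   final class CombinerSwitch {
       // Key and default value taken from the PR description.
       static final String COMBINER_ENABLE_KEY = "mapreduce.rss.client.combiner.enable";
       static final boolean COMBINER_ENABLE_DEFAULT = false;

       private CombinerSwitch() {}

       /** Reads the flag; an unset key falls back to the default (false). */
       static boolean isCombinerEnabled(Map<String, String> conf) {
           String value = conf.get(COMBINER_ENABLE_KEY);
           // Boolean.parseBoolean is case-insensitive and treats any value other than "true" as false.
           return value == null ? COMBINER_ENABLE_DEFAULT : Boolean.parseBoolean(value.trim());
       }

       /** The combiner runs only if the job defines one AND the switch is turned on. */
       static boolean shouldRunCombiner(Map<String, String> conf, boolean jobHasCombiner) {
           return jobHasCombiner && isCombinerEnabled(conf);
       }

       public static void main(String[] args) {
           Map<String, String> defaults = Map.of();
           Map<String, String> optedIn = Map.of(COMBINER_ENABLE_KEY, "true");
           System.out.println(shouldRunCombiner(defaults, true)); // false
           System.out.println(shouldRunCombiner(optedIn, true));  // true
           System.out.println(shouldRunCombiner(optedIn, false)); // false
       }
   }
   ```

   With a switch like this, a user who accepts the GC risk would opt in per job, for example by passing `-Dmapreduce.rss.client.combiner.enable=true` to a job launched through Hadoop's `ToolRunner`/`GenericOptionsParser`.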
   
   
   ### Why are the changes needed?
   
   Running the map-stage combiner over a large send buffer (`mapreduce.task.io.sort.mb * mapreduce.rss.client.sort.memory.use.threshold`) can trigger severe GC overhead, which may stall the MapTask and the sender threads and leave the job hanging. Most users do not need this behavior, so it is off by default.
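
   To see how large the buffer in question can get, the product above can be worked through. This is a minimal sketch: the class `SendBufferSize` is hypothetical, it assumes the threshold is a fraction in (0, 1], and the input values are illustrative rather than Hadoop or Uniffle defaults.

   ```java
   /**
    * Hypothetical sketch of the send-buffer size named in the PR:
    *   mapreduce.task.io.sort.mb * mapreduce.rss.client.sort.memory.use.threshold
    */
   final class SendBufferSize {
       private static final long BYTES_PER_MB = 1024L * 1024L;

       private SendBufferSize() {}

       /** Returns the send-buffer size in bytes (fractional bytes are truncated). */
       static long sendBufferBytes(int ioSortMb, double memoryUseThreshold) {
           // This sketch's own validation; it assumes the threshold is a fraction of the sort buffer.
           if (ioSortMb <= 0 || memoryUseThreshold <= 0.0 || memoryUseThreshold > 1.0) {
               throw new IllegalArgumentException("expected ioSortMb > 0 and 0 < threshold <= 1");
           }
           return (long) (ioSortMb * BYTES_PER_MB * memoryUseThreshold);
       }

       public static void main(String[] args) {
           // Illustrative values only: a 512 MB sort buffer and a 0.5 threshold.
           long bytes = sendBufferBytes(512, 0.5);
           System.out.println(bytes / BYTES_PER_MB + " MB"); // 256 MB
       }
   }
   ```

   Because the buffer scales linearly with `mapreduce.task.io.sort.mb`, jobs tuned with a large sort buffer are the ones most exposed to the combiner's GC cost.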
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes. This adds a new optional configuration for expert users. With the default value (`false`), the map-stage combiner does not run, which keeps jobs stable.
   
   ### How was this patch tested?
   
   Manually tested with MapReduce jobs that use combiners, and verified that the jobs run successfully with the combiner disabled.
   1. **Combiner disabled**: MapTasks completed normally with fast GC cycles. Sample logs:
   ```
   [2025-09-11 19:48:47] S0: 0MB, S1: 0MB, Eden: 299.02MB, Old: 11.53MB, ... Total: 0.101s
   ...
   [2025-09-11 19:49:30] S0: 82.88MB, S1: 0MB, Eden: 160.04MB, Old: 485.99MB, ... Total: 6.683s
   ...
   [2025-09-11 19:49:57] S0: 0MB, S1: 0MB, Eden: 207.66MB, Old: 532.27MB, ... Total: 12.474s
   ```
   > The MapTask completed successfully within 1 minute.
   
   2. **Combiner enabled**: MapTask GC cycles grew very long; the job stalled and was eventually killed. Sample logs:
   
   ```
   [2025-09-11 19:52:00] S0: 0MB, S1: 0MB, Eden: 80.49MB, Old: 12.86MB, ... Total: 0.054s
   [2025-09-11 19:52:08] S0: 0MB, S1: 0MB, Eden: 515.53MB, Old: 27.24MB, ... Total: 0.149s
   ...
   [2025-09-11 20:01:54] S0: 0MB, S1: 0MB, Eden: 60.36MB, Old: 687.51MB, ... Total: 242.505s
   ```
   
   > The MapTask did not complete after 9 minutes.
   
   These logs demonstrate that disabling the map-stage combiner avoids the severe GC overhead and job stalls, validating the safety switch.
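
   The two runs can be compared with a single number: the share of wall-clock time spent in GC between the first and last quoted sample. This is an illustrative sketch, not part of the PR's test procedure; the class `GcShare` is hypothetical, and it assumes `Total` is the cumulative GC time in seconds (as in jstat's `GCT` column).

   ```java
   import java.time.Duration;
   import java.time.LocalDateTime;
   import java.time.format.DateTimeFormatter;
   import java.util.List;
   import java.util.Locale;
   import java.util.regex.Matcher;
   import java.util.regex.Pattern;

   /** Hypothetical sketch. ASSUMPTION: "Total" is cumulative GC seconds. */
   final class GcShare {
       // Captures the bracketed timestamp and the trailing "Total: <seconds>s" field.
       private static final Pattern LINE =
               Pattern.compile("\\[(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2})\\].*Total: ([0-9.]+)s");
       private static final DateTimeFormatter TS = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

       private GcShare() {}

       /** One parsed log line: sample time plus cumulative GC seconds. */
       private static final class Sample {
           final LocalDateTime at;
           final double totalGcSeconds;

           Sample(LocalDateTime at, double totalGcSeconds) {
               this.at = at;
               this.totalGcSeconds = totalGcSeconds;
           }
       }

       private static Sample parse(String line) {
           Matcher m = LINE.matcher(line.trim());
           if (!m.matches()) {
               throw new IllegalArgumentException("unrecognized line: " + line);
           }
           return new Sample(LocalDateTime.parse(m.group(1), TS), Double.parseDouble(m.group(2)));
       }

       /** GC seconds accumulated between the first and last line, divided by elapsed wall-clock seconds. */
       static double gcShare(List<String> lines) {
           Sample first = parse(lines.get(0));
           Sample last = parse(lines.get(lines.size() - 1));
           long wallSeconds = Duration.between(first.at, last.at).getSeconds();
           if (wallSeconds <= 0) {
               throw new IllegalArgumentException("first and last samples must be at distinct times");
           }
           return (last.totalGcSeconds - first.totalGcSeconds) / wallSeconds;
       }

       public static void main(String[] args) {
           List<String> disabled = List.of(
                   "[2025-09-11 19:48:47] S0: 0MB, S1: 0MB, Eden: 299.02MB, Old: 11.53MB, ... Total: 0.101s",
                   "[2025-09-11 19:49:57] S0: 0MB, S1: 0MB, Eden: 207.66MB, Old: 532.27MB, ... Total: 12.474s");
           List<String> enabled = List.of(
                   "[2025-09-11 19:52:00] S0: 0MB, S1: 0MB, Eden: 80.49MB, Old: 12.86MB, ... Total: 0.054s",
                   "[2025-09-11 20:01:54] S0: 0MB, S1: 0MB, Eden: 60.36MB, Old: 687.51MB, ... Total: 242.505s");
           // prints "combiner disabled: 17.7% of wall time in GC"
           System.out.println(String.format(Locale.ROOT, "combiner disabled: %.1f%% of wall time in GC", 100 * gcShare(disabled)));
           // prints "combiner enabled: 40.8% of wall time in GC"
           System.out.println(String.format(Locale.ROOT, "combiner enabled: %.1f%% of wall time in GC", 100 * gcShare(enabled)));
       }
   }
   ```

   Under that assumption, GC takes roughly 12.4 s of 70 s with the combiner disabled and roughly 242 s of 594 s with it enabled.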
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
