zuston opened a new issue, #503:
URL: https://github.com/apache/incubator-uniffle/issues/503

   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct)
   
   
   ### Search before asking
   
   - [X] I have searched in the 
[issues](https://github.com/apache/incubator-uniffle/issues?q=is%3Aissue) and 
found no similar issues.
   
   
   ### Describe the bug
   
   I observed full GCs on a shuffle server when it hosts too many partitions, 
and a single pause can last more than 20s.
   
   Env:
   1. Java 8
   2. xmx=30g, buffer-capacity=10g, read-capacity=10g
   3. According to the metrics dashboard, during peak hours there are 20k 
partitions on a shuffle server, but the disk used-capacity ratio is only 0.1-0.2
   
   My guess is that the object allocation rate exceeds the rate at which the GC 
can reclaim memory, which forces a stop-the-world (STW) full GC.
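
One way to check this guess without GC logs is to sample the JVM's GC counters from inside the process. The snippet below is only a minimal sketch using the standard `java.lang.management` API. The class name `GcSnapshot` is made up for illustration and is not part of Uniffle. With G1 the collectors are typically reported as "G1 Young Generation" and "G1 Old Generation", so a growing count and time on the old-generation bean would be consistent with the full GCs described above.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not a Uniffle class): reads cumulative GC counters.
public class GcSnapshot {

    /** Returns {collectionCount, collectionTimeMillis} per collector in this JVM. */
    public static Map<String, long[]> snapshot() {
        Map<String, long[]> stats = new LinkedHashMap<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both values are cumulative since JVM start; -1 means "undefined".
            stats.put(gc.getName(),
                new long[] {gc.getCollectionCount(), gc.getCollectionTime()});
        }
        return stats;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, long[]> e : snapshot().entrySet()) {
            System.out.printf("%s: count=%d, timeMs=%d%n",
                e.getKey(), e.getValue()[0], e.getValue()[1]);
        }
    }
}
```

Taking two snapshots a fixed interval apart and diffing them gives the GC count and pause time for that interval, which can be compared against the partition count at the same moment.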
   
   ### Affects Version(s)
   
   master
   
   ### Uniffle Server Log Output
   
   _No response_
   
   ### Uniffle Engine Log Output
   
   _No response_
   
   ### Uniffle Server Configurations
   
   ```yaml
   We use the default G1 GC configuration.
   ```
   
   
   ### Uniffle Engine Configurations
   
   _No response_
   
   ### Additional context
   
   ### Possible solutions
   1. Switch the garbage collector to CMS
   2. Expand the Uniffle cluster by adding more shuffle servers
   3. If the number of partitions on a shuffle server exceeds a threshold, we 
should fall back to ESS.
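
Option 3 could look roughly like the check below. This is only a sketch of the idea, not existing Uniffle code: the class `PartitionLimitPolicy`, its method, and the threshold value are all hypothetical, and where such a check would live (coordinator, server, or client) is left open.

```java
// Hypothetical policy (not a Uniffle class): decides whether a new request
// should be rejected so the caller can fall back to ESS.
public class PartitionLimitPolicy {
    private final int maxPartitionsPerServer;

    public PartitionLimitPolicy(int maxPartitionsPerServer) {
        if (maxPartitionsPerServer <= 0) {
            throw new IllegalArgumentException("threshold must be positive");
        }
        this.maxPartitionsPerServer = maxPartitionsPerServer;
    }

    /**
     * Returns true when accepting the requested partitions would push the
     * server past the threshold. Uses long arithmetic to avoid int overflow.
     */
    public boolean shouldFallback(int currentPartitions, int requestedPartitions) {
        return (long) currentPartitions + requestedPartitions > maxPartitionsPerServer;
    }
}
```

For example, with an assumed threshold of 20000, a server holding 19000 partitions would still accept a request for 500 more but would reject a request for 1001.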
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
