J-HowHuang opened a new pull request, #17494:
URL: https://github.com/apache/pinot/pull/17494

   ## Description
   The current implementation of `PinotHelixTaskResourceManager` has many 
synchronized methods, which can cause global lock contention on the manager 
instance across all minion task related API resources (endpoints). See the 
comments:
   
https://github.com/apache/pinot/blob/c23a1e971dceafae424ccd1d6c5b829ecf3c8e78/pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/minion/PinotHelixTaskResourceManager.java#L77-L81
   
   Any API resource that calls one of these synchronized methods can become 
long-running if another synchronized method is holding the lock. For example: 
https://github.com/apache/pinot/blob/c23a1e971dceafae424ccd1d6c5b829ecf3c8e78/pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/minion/PinotHelixTaskResourceManager.java#L284-L296
   
   This method calls `_taskDriver.enqueueJob`, which writes to ZK with retries, 
and the retry mechanism is implemented by Helix. We have seen cases where the 
retries continued indefinitely, lasting as long as 24 hours.
   
   While that happens, every call to any of `PinotHelixTaskResourceManager`'s 
synchronized methods is blocked. Requests to these API endpoints can therefore 
tie up the threads of the pool that the controller uses to handle API 
requests. In this case, that is this thread pool: 
https://github.com/apache/pinot/blob/8333688e4ecec030b334f4d3239b6737ef17fdea/pinot-core/src/main/java/org/apache/pinot/core/util/ListenerConfigUtil.java#L239-L241
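
   The blocking behaviour described above can be reproduced in isolation. The 
following is a minimal sketch using only the JDK, not Pinot or Helix code: 
the `Manager` class, `slowWrite`, and `quickRead` are hypothetical stand-ins 
for a manager with synchronized methods, a write that keeps retrying, and a 
cheap read that needs the same monitor.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class SynchronizedContentionSketch {
  // Hypothetical stand-in for a manager whose public methods are all synchronized.
  static class Manager {
    private final CountDownLatch _entered = new CountDownLatch(1);
    private final CountDownLatch _release = new CountDownLatch(1);

    // Simulates a slow write (e.g. one that keeps retrying) while holding the monitor.
    synchronized void slowWrite() throws InterruptedException {
      _entered.countDown();
      _release.await();
    }

    // A cheap read that still needs the same monitor as slowWrite().
    synchronized String quickRead() {
      return "ok";
    }
  }

  // Returns true if quickRead() had not finished after waitMs while slowWrite() held the lock.
  static boolean readerBlockedWhileWriterHoldsLock(long waitMs) throws InterruptedException {
    Manager manager = new Manager();
    Thread writer = new Thread(() -> {
      try {
        manager.slowWrite();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    writer.start();
    manager._entered.await(); // the writer now owns the monitor

    CountDownLatch readDone = new CountDownLatch(1);
    Thread reader = new Thread(() -> {
      manager.quickRead();
      readDone.countDown();
    });
    reader.start();
    boolean finishedInTime = readDone.await(waitMs, TimeUnit.MILLISECONDS);

    // Let the writer finish so both threads can exit.
    manager._release.countDown();
    writer.join();
    reader.join();
    return !finishedInTime;
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("reader blocked: " + readerBlockedWhileWriterHoldsLock(200));
  }
}
```

   In a controller, each blocked reader would be occupying one HTTP handler 
thread, which is how a single stuck write can exhaust the pool.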
   
   The result would be a totally unresponsive controller that cannot serve 
even requests from the Pinot UI. Therefore we need to isolate these 
long-running API resources from other crucial API resources.
   
   ## Change
   * Create a separate thread pool dedicated to potentially long-running 
minion task related API resources.
     * The size of the thread pool is the same as that of our current HTTP 
handler thread pool (`grizzly-http-server-%d`).
   * Run the handlers of these resources asynchronously, using Jersey's 
`AsyncResponse` injected with the `@Suspended` annotation.
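
   The isolation idea in the list above can be sketched with the JDK alone. 
This is an illustration under stated assumptions, not the code in this PR: 
`httpPool` and `minionTaskPool` are hypothetical names, and a 
`CompletableFuture` plays the role that a suspended `AsyncResponse` plays in 
Jersey. The handler running on the HTTP pool only hands the work off, so slow 
minion task calls pile up on the dedicated pool instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class DedicatedPoolSketch {
  // Submits more hanging "slow" requests than there are threads, then checks that a
  // quick request is still served by the HTTP pool. Returns the quick request's result.
  static String serveQuickRequestWhileSlowOnesHang(int poolSize) throws Exception {
    // Hypothetical pools: one for all HTTP handlers, one dedicated to slow task calls.
    ExecutorService httpPool = Executors.newFixedThreadPool(poolSize);
    ExecutorService minionTaskPool = Executors.newFixedThreadPool(poolSize);
    CountDownLatch unblock = new CountDownLatch(1);
    try {
      List<CompletableFuture<String>> slowResponses = new ArrayList<>();
      for (int i = 0; i < poolSize * 2; i++) {
        // Stands in for the suspended response that is completed later.
        CompletableFuture<String> response = new CompletableFuture<>();
        slowResponses.add(response);
        // The handler on the HTTP thread returns right after the hand-off.
        httpPool.execute(() -> minionTaskPool.execute(() -> {
          try {
            unblock.await(); // simulates a call stuck behind a lock or a retry loop
            response.complete("slow done");
          } catch (InterruptedException e) {
            response.completeExceptionally(e);
          }
        }));
      }

      // A quick request still gets an HTTP thread although every slow call is hanging.
      Future<String> quick = httpPool.submit(() -> "quick ok");
      String result = quick.get(2, TimeUnit.SECONDS);

      // Release the slow calls and wait for their responses to complete.
      unblock.countDown();
      for (CompletableFuture<String> response : slowResponses) {
        response.get(2, TimeUnit.SECONDS);
      }
      return result;
    } finally {
      unblock.countDown();
      httpPool.shutdown();
      minionTaskPool.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println(serveQuickRequestWhileSlowOnesHang(2));
  }
}
```

   If the slow work ran directly on `httpPool` in this sketch, the quick 
request would wait behind it. The trade-off is that slow requests now queue 
on the dedicated pool, so that pool can still fill up, but only the minion 
task endpoints are affected.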


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

