capistrant opened a new issue, #18629:
URL: https://github.com/apache/druid/issues/18629

   ## Fabric8 KubernetesClient Overview
   
   [Fabric8 KubernetesClient](https://github.com/fabric8io/kubernetes-client) 
is the client library we use in the 
[druid-kubernetes-overlord-extensions](https://druid.apache.org/docs/latest/development/extensions-core/k8s-jobs).
   
   ### Underlying HTTP Client
   
   Fabric8 uses an underlying HTTP client for client/server interaction with 
the K8s cluster. This HTTP client is pluggable. Fabric8 supports four different 
clients as of this writing: `['vert.x', 'okhttp', 'jetty', 'native-jdk']`. 
`vert.x` is currently the default client used by Fabric8.
   
   ## Druid History with the Fabric8 client
   
   #17913 switched Druid to use `vert.x`. 
   
   #18013 brought Druid up to date with the latest Fabric8 versions.
   
   ## Druid's Path Forward
   
   This issue exists because Druid operators have reported problems with both 
the `vert.x` and `okhttp` clients in production clusters:
   * vert.x: Communication with the API server fails because of unhealthy 
connections in the connection pool, leading to sporadic task failures.
   * okhttp: The client creates a large number of threads, which consumes 
memory when many tasks are being launched.
   
   The Druid developer community wants to reach a state where a stable default 
HTTP client and configuration is identified, simplifying configuration and 
distribution packaging. In the interim, Druid operators can select the HTTP 
client and configure some of its parameters. This will help operators tailor the 
HTTP client to their use case and provide feedback to the Druid developer 
community on what works well in practice.
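
As a rough illustration of what selecting the HTTP client from configuration could look like, the sketch below maps a configuration string onto the four client names listed earlier in this issue. The enum `HttpClientType` and the method `fromConfig` are hypothetical names invented for this example; they are not taken from Druid or Fabric8. Falling back to `vert.x` for an unset value is an assumption that follows from `vert.x` being the Fabric8 default.

```java
import java.util.Locale;

/** Hypothetical enum covering the four HTTP client names listed in this issue. */
enum HttpClientType {
    VERTX("vert.x"),
    OKHTTP("okhttp"),
    JETTY("jetty"),
    NATIVE_JDK("native-jdk");

    private final String configName;

    HttpClientType(String configName) {
        this.configName = configName;
    }

    String configName() {
        return configName;
    }

    /**
     * Parses an operator-supplied value. A null or blank value falls back to
     * vert.x (assumption: mirror the Fabric8 default). Unknown values fail fast
     * so that a typo does not silently select a different client.
     */
    static HttpClientType fromConfig(String value) {
        if (value == null || value.trim().isEmpty()) {
            return VERTX;
        }
        String normalized = value.trim().toLowerCase(Locale.ROOT);
        for (HttpClientType type : values()) {
            if (type.configName.equals(normalized)) {
                return type;
            }
        }
        throw new IllegalArgumentException(
            "Unknown HTTP client type [" + value + "]; expected one of "
                + "vert.x, okhttp, jetty, native-jdk");
    }
}
```

Failing fast on unknown values is a deliberate choice in this sketch: while the default is still being evaluated, a misconfigured cluster that quietly runs a different client would make operator feedback harder to interpret.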
   
   ### Known Issues
   
   #### [vert.x](https://github.com/fabric8io/kubernetes-client/tree/main/httpclient-vertx)
   * K8s API requests fail with `ConnectionClosed` exceptions due to unhealthy 
connections in the underlying connection pool.
     * This can lead to sporadic task failures.
     * The issue appears to be caused by connections being closed on the server 
side while the client side does not clean them up before trying to reuse them 
in later requests.
     * A [related vert.x 
issue](https://github.com/fabric8io/kubernetes-client/issues/7252) has been 
opened with Fabric8 to investigate exposing more configuration knobs to tune 
the connection pool.
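
To make the failure mode concrete, here is a minimal, single-threaded sketch of one possible kind of pool knob: an idle timeout that discards connections that have sat unused for too long instead of handing them back to callers. This is not how vert.x or Fabric8 implement pooling, and it is not necessarily what the linked issue will expose; `PooledConnection`, `IdleEvictingPool`, and `maxIdleMillis` are hypothetical names used only for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.LongSupplier;

/** Hypothetical connection handle; it only tracks when it was last returned. */
final class PooledConnection {
    final int id;
    long lastUsedMillis;
    boolean closed;

    PooledConnection(int id, long nowMillis) {
        this.id = id;
        this.lastUsedMillis = nowMillis;
    }
}

/** Minimal pool that refuses to reuse connections idle longer than maxIdleMillis. */
final class IdleEvictingPool {
    private final Deque<PooledConnection> idle = new ArrayDeque<>();
    private final long maxIdleMillis;
    private final LongSupplier clock; // injectable so idle time can be simulated
    private int nextId = 0;

    IdleEvictingPool(long maxIdleMillis, LongSupplier clock) {
        this.maxIdleMillis = maxIdleMillis;
        this.clock = clock;
    }

    /** Returns a recently used idle connection, or opens a new one. */
    PooledConnection acquire() {
        long now = clock.getAsLong();
        PooledConnection candidate;
        while ((candidate = idle.pollFirst()) != null) {
            if (now - candidate.lastUsedMillis <= maxIdleMillis) {
                return candidate;
            }
            // Idle for too long: the server may already have closed its side,
            // so drop the connection rather than risk a failed request.
            candidate.closed = true;
        }
        return new PooledConnection(nextId++, now);
    }

    /** Returns a connection to the pool and records the time of return. */
    void release(PooledConnection connection) {
        connection.lastUsedMillis = clock.getAsLong();
        idle.addFirst(connection);
    }
}
```

If the client-side idle limit is shorter than the server's own idle timeout, the pool in this sketch never hands out a connection the server has already dropped for inactivity. It does not protect against connections the server closes for other reasons.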
   
   #### [okhttp](https://github.com/fabric8io/kubernetes-client/tree/main/httpclient-okhttp)
   
   * With the default configuration, the client creates a large number of 
threads, which can lead to memory issues if there are many tasks being launched.
     * The underlying issue appears to be related to an unbounded thread pool 
being used by the client. We are exposing experimental configuration knobs to 
tune the thread pool size to attempt to mitigate this issue.
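
The difference between an unbounded and a bounded pool can be shown with the JDK alone. The sketch below is not the okhttp client's code. It assumes the unbounded pool behaves like a cached thread pool (no core threads, no queueing, unlimited maximum), and `ThreadPoolGrowthDemo` is a hypothetical class written for this illustration. Each task blocks until released, which stands in for many slow requests being in flight at once.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class ThreadPoolGrowthDemo {

    /** Cached-style pool: every task that finds no idle thread gets a new one. */
    static ThreadPoolExecutor unbounded() {
        return new ThreadPoolExecutor(
            0, Integer.MAX_VALUE, 60L, TimeUnit.SECONDS, new SynchronousQueue<>());
    }

    /** Fixed-size pool: extra tasks wait in a queue instead of adding threads. */
    static ThreadPoolExecutor bounded(int maxThreads) {
        return new ThreadPoolExecutor(
            maxThreads, maxThreads, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
    }

    /** Submits blocking tasks and returns the largest thread count the pool reached. */
    static int peakThreads(ThreadPoolExecutor pool, int tasks) {
        CountDownLatch release = new CountDownLatch(1);
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                try {
                    release.await(); // simulate a request that is still in flight
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        int peak = pool.getLargestPoolSize();
        release.countDown();
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return peak;
    }

    public static void main(String[] args) {
        System.out.println("unbounded peak: " + peakThreads(unbounded(), 50));
        System.out.println("bounded peak:   " + peakThreads(bounded(4), 50));
    }
}
```

With 50 concurrent blocking tasks, the cached-style pool grows to one thread per task, while the bounded pool stays at its configured size and queues the rest. The trade-off is that a bounded pool adds latency under load, which is one reason the right size depends on how many tasks a cluster launches at once.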
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

