Hi devs,

I'm opening this thread to discuss FLIP-407: Improve Flink Client
performance in interactive scenarios. The POC test results and design doc
can be found at: FLIP-407
<https://cwiki.apache.org/confluence/display/FLINK/FLIP-407%3A+Improve+Flink+Client+performance+when+interacting+with+dedicated+Flink+Session+Clusters>
.

Currently, the Flink Client is mainly designed for one-time interaction with
the Flink Cluster. All resources (HTTP connections, threads, HA
services) and instances (ClusterDescriptor, ClusterClient, RestClient) are
created and recycled for each interaction. This works well when users do
not need to interact with the Flink Cluster frequently, and it also saves
resources, since they are released immediately after each use.

However, in OLAP or StreamingWarehouse scenarios, users might submit
interactive jobs to a dedicated Flink Session Cluster very often. In this
case, we find that short queries that finish in less than 1s in the
Flink Cluster still have an E2E latency greater than 2s. Hence, we
propose this FLIP to improve Flink Client performance in this scenario.
This could also improve the user experience when using session debug mode.

The major change in this FLIP is a newly introduced option,
*'execution.interactive-client'*. When this option is enabled, the Flink
Client will reuse all the necessary resources to improve interactive
performance, including HA services, HTTP connections, threads and all
kinds of instances related to a long-running Flink Cluster. The default
value of this option will be false, in which case the Flink Client behaves
as before.
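
To make the reuse idea concrete, here is a rough sketch of how a client-side
cache could be gated by such a flag. This is not the FLIP's implementation:
ClientCache, its factory argument and the clusterId key are all hypothetical
names chosen for illustration, and the real design is in the linked doc.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/**
 * Hypothetical sketch: hands out one client per cluster when interactive
 * mode is on, and a fresh client per call when it is off.
 */
public class ClientCache<C> {
    private final boolean interactive;          // stands in for 'execution.interactive-client'
    private final Function<String, C> factory;  // hypothetical: builds a client for a cluster id
    private final Map<String, C> cache = new ConcurrentHashMap<>();

    public ClientCache(boolean interactive, Function<String, C> factory) {
        this.interactive = interactive;
        this.factory = factory;
    }

    public C get(String clusterId) {
        if (!interactive) {
            // Behaviour as before: a new instance for every interaction.
            return factory.apply(clusterId);
        }
        // Interactive mode: create once per cluster, then reuse.
        return cache.computeIfAbsent(clusterId, factory);
    }
}
```

With the flag off, every call pays the setup cost again; with it on, only the
first call per cluster does. A real version would also need to close cached
clients on shutdown.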

Also, this FLIP proposes a configurable RetryStrategy for fetching results
from the Flink Cluster on the client side. In interactive scenarios, this can
save more than 15% of TM CPU usage without performance degradation.
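
As an illustration of what a configurable retry could look like, below is a
minimal sketch of exponential backoff between fetch attempts. It assumes the
CPU saving comes from polling the cluster less often while results are not yet
ready; that is my reading, not a statement from the design doc. FetchBackoff
and its parameters are hypothetical and are not the RetryStrategy interface
from the FLIP.

```java
import java.util.Optional;
import java.util.function.Supplier;

/** Hypothetical sketch of an exponential-backoff fetch loop. */
public class FetchBackoff {

    /** Delay before retry number `attempt` (0-based): doubles each time, capped at maxMillis. */
    public static long delayMillis(int attempt, long initialMillis, long maxMillis) {
        long delay = initialMillis;
        for (int i = 0; i < attempt && delay < maxMillis; i++) {
            delay = Math.min(delay * 2, maxMillis);
        }
        return Math.min(delay, maxMillis);
    }

    /** Calls `fetch` until it returns a value or maxAttempts is reached. */
    public static <T> Optional<T> fetchWithRetry(
            Supplier<Optional<T>> fetch, int maxAttempts, long initialMillis, long maxMillis) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            Optional<T> result = fetch.get();
            if (result.isPresent()) {
                return result;
            }
            try {
                // Wait longer after each empty response.
                Thread.sleep(delayMillis(attempt, initialMillis, maxMillis));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and give up
                return Optional.empty();
            }
        }
        return Optional.empty();
    }
}
```

For example, with an initial delay of 100 ms and a cap of 1000 ms, the waits
between attempts are 100, 200, 400, 800, 1000, 1000, ... ms.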

Looking forward to your feedback, thanks.

Best regards,
Xiangyu
