[jira] [Commented] (FLINK-30101) Always use StandaloneClientHAServices to create RestClusterClient when retrieving a Flink on YARN cluster client
[ https://issues.apache.org/jira/browse/FLINK-30101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17637155#comment-17637155 ]

Zhanghao Chen commented on FLINK-30101:
---

Thanks, you make a good point. I did not consider the case where leadership changes after the application report has been fetched.

> Always use StandaloneClientHAServices to create RestClusterClient when retrieving a Flink on YARN cluster client
>
>                 Key: FLINK-30101
>                 URL: https://issues.apache.org/jira/browse/FLINK-30101
>             Project: Flink
>          Issue Type: Improvement
>          Components: Client / Job Submission
>    Affects Versions: 1.16.0
>            Reporter: Zhanghao Chen
>            Priority: Major
>             Fix For: 1.17.0
>
> *Problem*
> Currently, the procedure for retrieving a Flink on YARN cluster client is as follows (in the YarnClusterDescriptor#retrieve method):
> # Get the application report from YARN
> # Set rest.address & rest.port using the info from the application report
> # Create a new RestClusterClient using the updated configuration; if HA is enabled, this uses the client HA service to fetch rest.address & rest.port
> Here, we can see that the use of client HA in step 3 is redundant, as we have already obtained rest.address & rest.port from the YARN application report. When ZK HA is enabled, it takes ~1.5 s to initialize the client HA services and fetch the rest IP & port.
> 1.5 s can mean a lot for latency-sensitive client operations. In my company, we use the Flink client to submit short-running session jobs, and e2e latency is critical. The job submission time is around 10 s on average, so 1.5 s would mean a 15% time saving.
> *Proposal*
> When retrieving a Flink on YARN cluster client, use StandaloneClientHAServices to create the RestClusterClient instead, as we have pre-fetched rest.address & rest.port from the YARN application report. This is also what we did in KubernetesClusterDescriptor.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
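As a rough illustration of step 2 of the procedure quoted above and of the latency figure in the issue, here is a minimal sketch in plain Java. It does not use Flink or YARN classes: `RetrieveSketch`, `AppReport`, `withRestEndpoint`, `share`, and the host and port values are hypothetical names and values chosen for this example. Only the configuration keys `rest.address` and `rest.port` and the 1.5 s / 10 s figures come from the issue text.

```java
import java.util.HashMap;
import java.util.Map;

class RetrieveSketch {

    /** Hypothetical stand-in for the host and port carried by a YARN application report. */
    static final class AppReport {
        final String host;
        final int port;

        AppReport(String host, int port) {
            this.host = host;
            this.port = port;
        }
    }

    /** Step 2: copy the endpoint from the report into a copy of the client configuration. */
    static Map<String, String> withRestEndpoint(Map<String, String> conf, AppReport report) {
        Map<String, String> updated = new HashMap<>(conf);
        updated.put("rest.address", report.host);
        updated.put("rest.port", Integer.toString(report.port));
        return updated;
    }

    /** Fraction of the total submission time spent in one step. */
    static double share(double stepSeconds, double totalSeconds) {
        return stepSeconds / totalSeconds;
    }

    public static void main(String[] args) {
        // Step 1 (assumed values): pretend YARN returned this report.
        AppReport report = new AppReport("jm-host", 8081);
        // Step 2: write the endpoint into the configuration.
        Map<String, String> conf = withRestEndpoint(new HashMap<>(), report);
        System.out.println(conf.get("rest.address") + ":" + conf.get("rest.port"));
        // Figures from the issue: ~1.5 s of client HA setup in a ~10 s submission.
        System.out.println(Math.round(share(1.5, 10.0) * 100) + "%");
    }
}
```

Running `main` prints the endpoint taken from the assumed report, followed by the share of submission time that the HA lookup accounts for under the numbers given in the issue.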
[jira] [Commented] (FLINK-30101) Always use StandaloneClientHAServices to create RestClusterClient when retrieving a Flink on YARN cluster client
[ https://issues.apache.org/jira/browse/FLINK-30101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17636970#comment-17636970 ]

Xintong Song commented on FLINK-30101:
---

I'm not sure about the proposed changes.

{{StandaloneClientHAServices}} and {{StandaloneLeaderRetrievalService}} assume there is only one contender, which is therefore always the leader. There is no such guarantee in a YARN deployment. The leadership may change after the application report has been fetched, and ZK HA makes sure the REST client always connects to the latest leader address in such cases.

For short SQL jobs, you may want to consider sql-gateway, which does not fetch the leader address for every submitted job. Unfortunately, there is no such thing for DataStream / Table API jobs.

Besides, you may also consider a non-HA cluster if end-to-end latency is the primary concern.
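The concern in this comment can be illustrated with a small sketch in plain Java. It is a simplified model, not Flink code: `LeaderRetrievalSketch`, `LeaderRegistry`, `fixedRetrieval`, `haRetrieval`, and the addresses are hypothetical names and values for this example, and the registry only stands in for the leader address that ZK HA would publish. A retrieval that captures the address once keeps returning it after a leadership change, while a retrieval that consults the registry on each lookup sees the new leader.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

class LeaderRetrievalSketch {

    /** Hypothetical stand-in for the leader address published by an HA service. */
    static final class LeaderRegistry {
        private final AtomicReference<String> leader;

        LeaderRegistry(String initialLeader) {
            this.leader = new AtomicReference<>(initialLeader);
        }

        String currentLeader() {
            return leader.get();
        }

        void electNewLeader(String address) {
            leader.set(address);
        }
    }

    /** Standalone-style retrieval: always returns the address captured up front. */
    static Supplier<String> fixedRetrieval(String address) {
        return () -> address;
    }

    /** HA-style retrieval: asks the registry for the leader on every lookup. */
    static Supplier<String> haRetrieval(LeaderRegistry registry) {
        return registry::currentLeader;
    }

    public static void main(String[] args) {
        LeaderRegistry registry = new LeaderRegistry("jm-1:8081");

        // Address read once, as it would be from an application report.
        Supplier<String> fixed = fixedRetrieval(registry.currentLeader());
        Supplier<String> ha = haRetrieval(registry);

        // Leadership changes after the address was captured.
        registry.electNewLeader("jm-2:8081");

        System.out.println("fixed retrieval -> " + fixed.get());
        System.out.println("HA retrieval    -> " + ha.get());
    }
}
```

In this model the fixed retrieval still points at the old address after the change, which is the stale-leader case described above; the trade-off is that the HA lookup costs extra time on every client creation.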
[jira] [Commented] (FLINK-30101) Always use StandaloneClientHAServices to create RestClusterClient when retrieving a Flink on YARN cluster client
[ https://issues.apache.org/jira/browse/FLINK-30101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17636408#comment-17636408 ]

Zhanghao Chen commented on FLINK-30101:
---

[~xtsong] Looking forward to your opinion on the proposal. Many thanks!