[ 
https://issues.apache.org/jira/browse/FLINK-13132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16884819#comment-16884819
 ] 

Zhenqiu Huang commented on FLINK-13132:
---------------------------------------

[~fly_in_gis]
Our main concern about cost is pipeline downtime. In our current design, 
downloads always come from the nearest storage, such as HDFS on-prem and 
S3/GCS in the cloud. But we think that is not enough to guarantee our SLA in 
the worst case. When a job is started through the service, the end-to-end 
latency includes downloading the jars, starting another process, starting the 
session client, uploading remote resources, starting the job cluster, 
submitting the job graph to start the job, etc. This usually takes 1-2 minutes 
at low QPS. If a burst of requests (1000 requests) arrives because of some 
unexpected issue, some of the redeployment requests will be much slower due to 
resource contention at each stage of job submission. The optimization we want 
is to skip some of these steps on the service side (such as uploading remote 
resources and generating the job graph) and move job graph compilation into 
the ClusterEntrypoints. That way the jar download can be skipped, and job 
graph generation runs in parallel, one per job, right after each cluster 
starts, so that even in the worst case we can meet our downtime SLA.






> Allow ClusterEntrypoints use user main method to generate job graph
> -------------------------------------------------------------------
>
>                 Key: FLINK-13132
>                 URL: https://issues.apache.org/jira/browse/FLINK-13132
>             Project: Flink
>          Issue Type: Improvement
>          Components: Deployment / YARN
>    Affects Versions: 1.8.0, 1.8.1
>            Reporter: Zhenqiu Huang
>            Assignee: Zhenqiu Huang
>            Priority: Minor
>
> We are building a service that can transparently deploy a job to different 
> cluster management systems, such as Yarn and another internal system. It is 
> very costly to download the jar and generate the JobGraph on the client side. 
> Thus, I want to propose an improvement that makes the Yarn entrypoints 
> configurable to use either FileJobGraphRetriever or 
> ClassPathJobGraphRetriever. This is actually a long-standing TODO in 
> AbstractYarnClusterDescriptor at line 834.
> https://github.com/apache/flink/blob/21468e0050dc5f97de5cfe39885e0d3fd648e399/flink-yarn/src/main/java/org/apache/flink/yarn/AbstractYarnClusterDescriptor.java#L834



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
