[ 
https://issues.apache.org/jira/browse/FLINK-13938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Wang updated FLINK-13938:
------------------------------
    Description: 
Currently, every time we start a Flink cluster, the Flink lib jars need to be 
uploaded to HDFS and then registered as YARN local resources so that they can be 
downloaded to the JobManager and all TaskManager containers. I think we could make 
two optimizations.
 # Use a pre-uploaded Flink binary to avoid uploading the Flink system jars
 # By default, the LocalResourceVisibility is APPLICATION, so the jars are 
downloaded only once and shared by all TaskManager containers of the same 
application on the same node. However, different applications have to 
download all jars every time, including the flink-dist.jar. We could use the 
YARN public cache to eliminate the unnecessary jar downloads and make 
container launches faster.
 
 

This follows the discussion on the user ML. 
[https://lists.apache.org/list.html?u...@flink.apache.org:lte=1M:Flink%20Conf%20%22yarn.flink-dist-jar%22%20Question]

Taking both FLINK-13938 and FLINK-14964 into account, this feature will be 
implemented in the following steps.
 * Enrich "\-yt/--yarnship" to support HDFS directories
 * Add a new config option to control whether the flink-dist upload is 
disabled (*Will be extended to support all files, including lib/plugin/user 
jars/dependencies/etc.*)
 * Enrich "\-yt/--yarnship" to specify the local resource visibility. It is 
"APPLICATION" by default. It could also be configured as "PUBLIC", which means 
shared by all applications, or "PRIVATE", which means shared by applications of 
the same user. (*Will be done later according to the feedback*)
  
 How to use this feature?
 1. First, upload the Flink binary and user jars to an HDFS directory
 2. Use "\-yt/--yarnship" to specify the pre-uploaded libs
 3. Disable the automatic uploading of flink-dist via 
{{yarn.submission.automatic-flink-dist-upload}}: false
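
Step 1 could be sketched roughly as follows. This is only a minimal sketch and not part of the proposal: {{FLINK_HOME}}, {{HDFS_DIR}}, {{DRY_RUN}} and the helper functions are hypothetical names, the local install path is made up, and uploading only the {{lib}} directory is an assumption about the expected layout. With {{DRY_RUN=1}} (the default here) the {{hdfs dfs}} commands are printed instead of executed.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical defaults; adjust to the local installation and cluster.
FLINK_HOME="${FLINK_HOME:-/opt/flink-1.11}"
HDFS_DIR="${HDFS_DIR:-hdfs://myhdfs/flink/release/flink-1.11}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Create the target directory and copy the local lib directory into it.
# Assumption: shipping lib/ is enough; user jars would be uploaded the same way.
upload_flink_libs() {
  run hdfs dfs -mkdir -p "$HDFS_DIR"
  run hdfs dfs -put -f "$FLINK_HOME/lib" "$HDFS_DIR/"
}

upload_flink_libs
```

Setting {{DRY_RUN=0}} would run the commands against the cluster; this only needs to happen once per Flink release rather than once per submission.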
  
 A final submission command could look like the following.
{code:java}
./bin/flink run -m yarn-cluster -d \
-yt hdfs://myhdfs/flink/release/flink-1.11 \
-yD yarn.submission.automatic-flink-dist-upload=false \
examples/streaming/WindowJoin.jar
{code}
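
Since the command above turns off the automatic flink-dist upload, the pre-uploaded directory presumably has to contain the flink-dist jar itself. A pre-flight check along the following lines could catch a missing jar before submission. This is a sketch under assumptions: {{has_flink_dist}} is a hypothetical helper, the file names in the sample listing are made up, and in practice the listing would come from something like {{hdfs dfs -ls -R}} on the shipped directory.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper: reads a file listing on stdin (one path per line, or
# `hdfs dfs -ls -R` output where the path is the last column) and succeeds
# if some line ends in a flink-dist jar name.
has_flink_dist() {
  grep -E 'flink-dist[^/]*\.jar$' >/dev/null
}

# Hard-coded sample listing (made-up file names) instead of a live HDFS call.
listing='hdfs://myhdfs/flink/release/flink-1.11/lib/flink-dist_2.11-1.11.0.jar
hdfs://myhdfs/flink/release/flink-1.11/lib/log4j-1.2.17.jar'

if printf '%s\n' "$listing" | has_flink_dist; then
  echo "flink-dist found"
else
  echo "flink-dist missing; keep the automatic upload enabled"
fi
```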

  was:
Currently, every time we start a Flink cluster, the Flink lib jars need to be 
uploaded to HDFS and then registered as YARN local resources so that they can be 
downloaded to the JobManager and all TaskManager containers. I think we could make 
two optimizations.
 # Use a pre-uploaded Flink binary to avoid uploading the Flink system jars
 # Use the YARN public cache to eliminate the unnecessary jar downloads and 
make container launches faster. The public cache could be shared by different 
applications.

 

By default, the LocalResourceVisibility is APPLICATION, so the jars are 
downloaded only once and shared by all TaskManager containers of the same 
application on the same node. However, different applications have to 
download all jars every time, including the flink-dist.jar. We could use the 
YARN public cache to eliminate the unnecessary jar downloads and make 
container launches faster.

 

 

Following the discussion in the user ML. 
[https://lists.apache.org/list.html?u...@flink.apache.org:lte=1M:Flink%20Conf%20%22yarn.flink-dist-jar%22%20Question]
 Taking both FLINK-13938 and FLINK-14964 into account, this feature will be 
implemented in the following steps.
 * Enrich "-yt/--yarnship" to support HDFS directories
 * Add a new config option to control whether the flink-dist upload is 
disabled
 * Enrich "-yt/--yarnship" to specify the local resource visibility. It is 
"APPLICATION" by default. It could also be configured as "PUBLIC", which means 
shared by all applications, or "PRIVATE", which means shared by applications of 
the same user. (*Will be done later according to the feedback*)
  
 How to use this feature?
 1. First, upload the Flink binary and user jars to an HDFS directory
 2. Use "-yt/--yarnship" to specify the pre-uploaded libs
 3. Disable the automatic uploading of flink-dist via 
{{yarn.submission.automatic-flink-dist-upload}}: false
  
 A final submission command could look like the following.
{code:java}
./bin/flink run -m yarn-cluster -d \
-yt hdfs://myhdfs/flink/release/flink-1.11 \
-yD yarn.submission.automatic-flink-dist-upload=false \
examples/streaming/WindowJoin.jar
{code}


> Use pre-uploaded libs to accelerate flink submission
> ----------------------------------------------------
>
>                 Key: FLINK-13938
>                 URL: https://issues.apache.org/jira/browse/FLINK-13938
>             Project: Flink
>          Issue Type: New Feature
>          Components: Client / Job Submission, Deployment / YARN
>            Reporter: Yang Wang
>            Assignee: Yang Wang
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 0.5h
>  Remaining Estimate: 0h



