William Lo created GOBBLIN-2135:
-----------------------------------
Summary: Cache Yarn jars in GobblinYarnAppLauncher
Key: GOBBLIN-2135
URL: https://issues.apache.org/jira/browse/GOBBLIN-2135
Project: Apache Gobblin
Issue Type: Improvement
Reporter: William Lo
Gobblin YARN Application Launcher lacks some functionality that MRJobLauncher
provides. One of the biggest gaps in feature parity is jar caching: MRJobLauncher
uploads jars into a monthly cache directory, and each monthly cache is cleaned up
automatically by executions that run two months later.
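A minimal sketch of the monthly cache scheme described above. It assumes that cache directories are named by month (yyyy-MM) under a single cache root, and that a cache from month M is deleted by any run in month M+2 or later. The class name, method names, and directory layout are hypothetical and are not taken from MRJobLauncher. A real launcher would list and delete the directories through the Hadoop FileSystem API instead of working on plain strings.

```java
import java.time.YearMonth;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper illustrating a month-keyed jar cache with delayed cleanup. */
final class MonthlyJarCache {
  // Assumed retention: a cache created in month M is deleted by runs in month M+2 or later.
  static final int RETENTION_MONTHS = 2;

  private MonthlyJarCache() {}

  /** Cache directory for the given month, e.g. "/jar-cache/2024-08". */
  static String cacheDirFor(String cacheRoot, YearMonth month) {
    return cacheRoot + "/" + month; // YearMonth.toString() yields "yyyy-MM"
  }

  /** Returns the month-named directories that the current run should delete. */
  static List<String> staleCacheDirs(List<String> dirNames, YearMonth now) {
    YearMonth cutoff = now.minusMonths(RETENTION_MONTHS);
    List<String> stale = new ArrayList<>();
    for (String name : dirNames) {
      try {
        // Stale if the directory's month is the cutoff month or earlier.
        if (!YearMonth.parse(name).isAfter(cutoff)) {
          stale.add(name);
        }
      } catch (DateTimeParseException e) {
        // Ignore directories that do not follow the yyyy-MM naming scheme.
      }
    }
    return stale;
  }
}
```

Keeping both the current and the previous month means a job that starts near a month boundary can still find the jars it referenced when it was submitted.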
YARN/MR requires uploading jars to HDFS, and this step can be quite slow (~15 minutes
for a sizeable job to upload all of its jars). Since many jobs share the same
jars, it makes sense to cache them in one place and give YARN only the
shared path.
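The following sketch shows the upload-once idea. It uses the local file system through java.nio.file as a stand-in for HDFS. SharedJarUploader and uploadIfAbsent are hypothetical names. The sketch assumes that a jar's file name, which carries its version, uniquely identifies its contents; that assumption holds only for immutable releases.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical uploader that copies a jar into a shared cache only once. */
final class SharedJarUploader {
  private SharedJarUploader() {}

  /**
   * Copies localJar into cacheDir unless a file with the same name is already there,
   * and returns the shared path that would be handed to YARN.
   */
  static Path uploadIfAbsent(Path localJar, Path cacheDir) throws IOException {
    Files.createDirectories(cacheDir);
    Path target = cacheDir.resolve(localJar.getFileName());
    if (!Files.exists(target)) {
      try {
        // The slow step that caching avoids on later runs.
        Files.copy(localJar, target);
      } catch (FileAlreadyExistsException e) {
        // Another job uploaded the same jar first; reuse its copy.
      }
    }
    return target;
  }
}
```

A production version would use the Hadoop FileSystem API and write to a temporary file before renaming it into place, so that a concurrent job never sees a partially uploaded jar.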
We also want to ensure that SNAPSHOT jars and other files are not uploaded to the
cache since, unlike released jar versions on Artifactory, they are not immutable.
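One possible policy for keeping mutable files out of the shared cache, assuming Maven-style file names where unreleased builds carry a -SNAPSHOT suffix. JarCachePolicy and its methods are hypothetical. A purely name-based check misses timestamped snapshot jars (for example lib-1.0-20240801.123456-1.jar), so checking the resolved artifact version would be more robust.

```java
import java.util.Locale;

/** Hypothetical policy deciding where a job file should be uploaded. */
final class JarCachePolicy {
  private JarCachePolicy() {}

  /** True only for .jar files whose name does not mark them as a SNAPSHOT build. */
  static boolean isCacheable(String fileName) {
    String name = fileName.toLowerCase(Locale.ROOT);
    return name.endsWith(".jar") && !name.contains("-snapshot");
  }

  /** Immutable jars go to the shared cache; everything else stays job-private. */
  static String destinationDir(String fileName, String sharedCacheDir, String jobDir) {
    return isCacheable(fileName) ? sharedCacheDir : jobDir;
  }
}
```

Files that fail the check would still be uploaded, just to a per-job directory that is removed with the job instead of being shared.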
--
This message was sent by Atlassian Jira
(v8.20.10#820010)