[ https://issues.apache.org/jira/browse/HIVE-17574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mithun Radhakrishnan updated HIVE-17574:
----------------------------------------
    Description: 
Raising this on behalf of [~selinazh]. (For my own reference: YHIVE-1035.)

This has to do with the classpaths of Hive actions run from Oozie, and affects scripts that add jars/resources from HDFS locations.

As part of Oozie's "sharelib" deploys, foundation jars (such as Hive jars) tend to be stored in HDFS paths, as are any custom user-libraries used in workflows. An {{ADD JAR|FILE|ARCHIVE}} statement in a Hive script causes the following steps to occur:
# Files are downloaded from HDFS to a local temp dir.
# UDFs are resolved/validated.
# All jars/files, including those just downloaded from HDFS, are shipped right back to HDFS-based scratch-directories, for job submission.

For HDFS-based files, this is wasteful and time-consuming. Step 3 above should skip shipping HDFS-based resources, and add those directly to the Tez session.

We have a patch that's being used internally at Yahoo.

  was:
Raising this on behalf of [~selinazh]. (For my own reference: YHIVE-1035.)

This has to do with the classpaths of Hive actions run from Oozie, and affects scripts that add jars/resources from HDFS locations.

As part of Oozie's "sharelib" deploys, foundation jars (such as Hive jars) tend to be stored in HDFS paths, as are any custom user-libraries used in workflows. An {{ADD JAR|FILE|ARCHIVE}} statement in a Hive script causes the following steps to occur:
# Files are downloaded from HDFS to a local temp dir.
# UDFs are resolved/validated.
# All jars/files, including those just downloaded from HDFS, are shipped right back to HDFS-based scratch-directories, for job submission.

This is wasteful and time-consuming. Step 3 above should skip shipping HDFS-based resources, and add those directly to the Tez session.

We have a patch that's being used internally at Yahoo.
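
To illustrate the proposed change to step 3, here is a minimal sketch of splitting a session's added resources into those that already live on HDFS and those that still need to be uploaded. This is not the patch in use at Yahoo. The class name {{ResourcePartitioner}} and its methods are hypothetical, and it assumes a resource counts as HDFS-based when its URI scheme is {{hdfs}}. A real implementation would probably also need to compare against the cluster's default filesystem and handle other schemes.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hive: decides which added resources
// can be handed to the job as-is and which must be copied to scratch space.
public class ResourcePartitioner {

    /** Holds the two groups produced by partition(). */
    public static final class Partition {
        public final List<URI> alreadyOnHdfs = new ArrayList<>();
        public final List<URI> needsUpload = new ArrayList<>();
    }

    /** Assumption: a resource is HDFS-based if its URI scheme is "hdfs". */
    static boolean isOnHdfs(URI uri) {
        // equalsIgnoreCase(null) is false, so scheme-less local paths fall through.
        return "hdfs".equalsIgnoreCase(uri.getScheme());
    }

    /** Splits resource locations by whether they would need re-shipping. */
    public static Partition partition(List<String> resources) {
        Partition result = new Partition();
        for (String resource : resources) {
            URI uri = URI.create(resource);
            if (isOnHdfs(uri)) {
                // Already on HDFS: could be referenced directly by the session.
                result.alreadyOnHdfs.add(uri);
            } else {
                // Local or other filesystem: still has to be shipped.
                result.needsUpload.add(uri);
            }
        }
        return result;
    }
}
```

With this split, only the {{needsUpload}} group would go through the copy to the scratch directory, while the {{alreadyOnHdfs}} group would be registered by its existing path.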
> Avoid multiple copies of HDFS-based jars when localizing job-jars
> -----------------------------------------------------------------
>
>                 Key: HIVE-17574
>                 URL: https://issues.apache.org/jira/browse/HIVE-17574
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 2.2.0, 3.0.0, 2.4.0
>            Reporter: Mithun Radhakrishnan
>            Assignee: Mithun Radhakrishnan