[ 
https://issues.apache.org/jira/browse/HIVE-27737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-27737:
--------------------------------
    Description: 
[HIVE-17574|https://github.com/apache/hive/commit/26753ade2a130339940119c950b9c9af53e3d024]
 introduced an optimization in which HDFS-based resources can optionally be 
localized directly from their "original" HDFS folder instead of a Tez session 
dir, which reduces HDFS overhead. It is controlled by 
hive.resource.use.hdfs.location, so there are 2 cases:

1. hive.resource.use.hdfs.location=true
a) collect "HDFS temp files" and optimize their access: added files, added jars
b) collect local temp files and use the non-optimized session-based approach: 
added files, added jars, aux jars, reloadable aux jars

{code}
      // reference HDFS based resource directly, to use distribute cache efficiently.
      addHdfsResource(conf, tmpResources, LocalResourceType.FILE,
          getHdfsTempFilesFromConf(conf));
      // local resources are session based.
      tmpResources.addAll(
          addTempResources(conf, hdfsDirPathStr, LocalResourceType.FILE,
              getLocalTempFilesFromConf(conf), null).values()
      );
{code}

2. hive.resource.use.hdfs.location=false
a) original behavior: collect all resources in HS2's scope (added files, added 
jars, aux jars, reloadable aux jars) and put them into a session-based directory
{code}
      // all resources including HDFS are session based.
      tmpResources.addAll(
          addTempResources(conf, hdfsDirPathStr, LocalResourceType.FILE,
              getTempFilesFromConf(conf), null).values()
      );
{code}

My proposal is related to case 1).
Let's say a user wants to load an aux jar from HDFS and has it set in 
hive.aux.jars.path:
{code}
hive.aux.jars.path=file:///opt/some_local_jar.jar,hdfs:///tmp/some_distributed.jar
{code}

In this case, we can distinguish between file:// scheme resources and hdfs:// 
scheme resources:
- file scheme resources should fall under 1b) and still be used from the session dir
- hdfs scheme resources should fall under 1a) and simply be passed to addHdfsResource
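
The scheme-based split proposed above could be sketched roughly as follows. This is only an illustration, not Hive code: the AuxJarPartitioner class and its partition method are hypothetical names, and treating scheme-less or non-hdfs entries as "local" is an assumption of this sketch. It relies only on java.net.URI to read the scheme of each comma-separated entry.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Hive): splits a comma-separated
// hive.aux.jars.path value into HDFS-based and local entries by URI scheme.
final class AuxJarPartitioner {
    // Entries that could go through the direct HDFS path, i.e. case 1a).
    final List<String> hdfsJars = new ArrayList<>();
    // Entries that would stay on the session-based path, i.e. case 1b).
    final List<String> localJars = new ArrayList<>();

    static AuxJarPartitioner partition(String auxJarsPath) {
        AuxJarPartitioner result = new AuxJarPartitioner();
        if (auxJarsPath == null || auxJarsPath.trim().isEmpty()) {
            return result;
        }
        for (String entry : auxJarsPath.split(",")) {
            String jar = entry.trim();
            if (jar.isEmpty()) {
                continue; // tolerate stray commas
            }
            // URI.create throws IllegalArgumentException on malformed input
            // (e.g. unescaped spaces); a real implementation would handle that.
            String scheme = URI.create(jar).getScheme();
            if ("hdfs".equalsIgnoreCase(scheme)) {
                result.hdfsJars.add(jar);
            } else {
                // Assumption of this sketch: everything else is treated as local.
                result.localJars.add(jar);
            }
        }
        return result;
    }
}
```

With such a split, the hdfsJars list would be the candidate input for addHdfsResource, while localJars would keep going through addTempResources as today.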

 

> Consider extending HIVE-17574 to aux jars
> -----------------------------------------
>
>                 Key: HIVE-27737
>                 URL: https://issues.apache.org/jira/browse/HIVE-27737
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: László Bodor
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
