[ 
https://issues.apache.org/jira/browse/HIVE-27723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-27723:
--------------------------------
    Summary: Prevent localizing the same original file more than once if 
symlinks are present  (was: Prevent localizing the same file more than once)

> Prevent localizing the same original file more than once if symlinks are 
> present
> --------------------------------------------------------------------------------
>
>                 Key: HIVE-27723
>                 URL: https://issues.apache.org/jira/browse/HIVE-27723
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: László Bodor
>            Assignee: László Bodor
>            Priority: Major
>              Labels: pull-request-available
>
> We already calculate SHA hashes for the files to be localized. In some 
> setups the hive-exec jar is symlinked, so the same file gets localized more 
> than once:
> {code}
> [root@lbodor-hiveontez-4 ~]# sudo -u hive hdfs dfs -ls -R 
> /tmp/hive/hive/_tez_session_dir
> drwx------   - hive supergroup          0 2023-09-20 12:13 
> /tmp/hive/hive/_tez_session_dir/0febf6f5-bacc-4055-b22b-e621c59cd1d6
> drwx------   - hive supergroup          0 2023-09-20 12:19 
> /tmp/hive/hive/_tez_session_dir/0febf6f5-bacc-4055-b22b-e621c59cd1d6/.tez
> drwx------   - hive supergroup          0 2023-09-20 11:58 
> /tmp/hive/hive/_tez_session_dir/0febf6f5-bacc-4055-b22b-e621c59cd1d6-resources
> -rw-r--r--   3 hive supergroup   78366781 2023-09-20 11:58 
> /tmp/hive/hive/_tez_session_dir/0febf6f5-bacc-4055-b22b-e621c59cd1d6-resources/hive-exec-3.1.3000.7.2.18.0-334.jar
> -rw-r--r--   3 hive supergroup   78366781 2023-09-20 11:58 
> /tmp/hive/hive/_tez_session_dir/0febf6f5-bacc-4055-b22b-e621c59cd1d6-resources/hive-exec.jar
> drwx------   - hive supergroup          0 2023-09-20 11:58 
> /tmp/hive/hive/_tez_session_dir/21686e3c-2a00-457b-b84f-1a8db37699d1
> drwx------   - hive supergroup          0 2023-09-20 12:04 
> /tmp/hive/hive/_tez_session_dir/21686e3c-2a00-457b-b84f-1a8db37699d1/.tez
> drwx------   - hive supergroup          0 2023-09-20 11:58 
> /tmp/hive/hive/_tez_session_dir/21686e3c-2a00-457b-b84f-1a8db37699d1-resources
> -rw-r--r--   3 hive supergroup   78366781 2023-09-20 11:58 
> /tmp/hive/hive/_tez_session_dir/21686e3c-2a00-457b-b84f-1a8db37699d1-resources/hive-exec-3.1.3000.7.2.18.0-334.jar
> -rw-r--r--   3 hive supergroup   78366781 2023-09-20 11:58 
> /tmp/hive/hive/_tez_session_dir/21686e3c-2a00-457b-b84f-1a8db37699d1-resources/hive-exec.jar
> drwx------   - hive supergroup          0 2023-09-20 11:58 
> /tmp/hive/hive/_tez_session_dir/40c7fb13-cfa1-4377-8d40-7e19503fbdad
> drwx------   - hive supergroup          0 2023-09-20 13:13 
> /tmp/hive/hive/_tez_session_dir/40c7fb13-cfa1-4377-8d40-7e19503fbdad/.tez
> drwx------   - hive supergroup          0 2023-09-20 11:58 
> /tmp/hive/hive/_tez_session_dir/40c7fb13-cfa1-4377-8d40-7e19503fbdad-resources
> -rw-r--r--   3 hive supergroup   78366781 2023-09-20 11:58 
> /tmp/hive/hive/_tez_session_dir/40c7fb13-cfa1-4377-8d40-7e19503fbdad-resources/hive-exec-3.1.3000.7.2.18.0-334.jar
> -rw-r--r--   3 hive supergroup   78366781 2023-09-20 11:58 
> /tmp/hive/hive/_tez_session_dir/40c7fb13-cfa1-4377-8d40-7e19503fbdad-resources/hive-exec.jar
> drwx------   - hive supergroup          0 2023-09-20 11:58 
> /tmp/hive/hive/_tez_session_dir/5c48d6ab-ed8c-49c9-afe0-465de82c9c57
> drwx------   - hive supergroup          0 2023-09-20 12:04 
> /tmp/hive/hive/_tez_session_dir/5c48d6ab-ed8c-49c9-afe0-465de82c9c57/.tez
> drwx------   - hive supergroup          0 2023-09-20 11:58 
> /tmp/hive/hive/_tez_session_dir/5c48d6ab-ed8c-49c9-afe0-465de82c9c57-resources
> -rw-r--r--   3 hive supergroup   78366781 2023-09-20 11:58 
> /tmp/hive/hive/_tez_session_dir/5c48d6ab-ed8c-49c9-afe0-465de82c9c57-resources/hive-exec-3.1.3000.7.2.18.0-334.jar
> -rw-r--r--   3 hive supergroup   78366781 2023-09-20 11:58 
> /tmp/hive/hive/_tez_session_dir/5c48d6ab-ed8c-49c9-afe0-465de82c9c57-resources/hive-exec.jar
> {code}
> With a large number of sessions, we cannot afford the overhead of copying 
> these files to HDFS and localizing them to all containers twice.
> The root cause can be fixed by removing the symlinks to the same hive-exec 
> jar. However, since we already calculate SHA hashes for the files, it is easy 
> to handle duplicates in the localization codepath, which also covers any 
> accidental duplication.
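
The deduplication idea described above can be sketched roughly as follows. This is only an illustration, not the actual Hive change: the class and method names (LocalizationDedup, sha256, dedupeByHash) are hypothetical, SHA-256 is an assumption (the issue only says "SHA"), and the sketch reads from the local filesystem rather than HDFS. It keeps the first path seen for each content hash, so a symlink and its target collapse into one entry.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of Hive.
final class LocalizationDedup {

    // Hex-encoded SHA-256 of the file content (assumed algorithm).
    // Files.newInputStream follows symlinks by default, so a symlink
    // hashes to the same value as the file it points to.
    public static String sha256(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        try (InputStream in = new DigestInputStream(Files.newInputStream(file), digest)) {
            byte[] buffer = new byte[8192];
            while (in.read(buffer) != -1) {
                // Reading drives the digest; nothing else to do here.
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    // Returns the candidates with content duplicates removed,
    // keeping the first path seen for each hash, in input order.
    public static List<Path> dedupeByHash(List<Path> candidates)
            throws IOException, NoSuchAlgorithmException {
        Map<String, Path> firstByHash = new LinkedHashMap<>();
        for (Path candidate : candidates) {
            firstByHash.putIfAbsent(sha256(candidate), candidate);
        }
        return new ArrayList<>(firstByHash.values());
    }
}
```

Given hive-exec-3.1.3000.7.2.18.0-334.jar and a hive-exec.jar symlink pointing at it, dedupeByHash would return only the first of the two, so only one copy would need to be uploaded and localized.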



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
