[ https://issues.apache.org/jira/browse/HIVE-27723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

László Bodor updated HIVE-27723:
--------------------------------
    Summary: Prevent localizing the same original file more than once if symlinks are present  (was: Prevent localizing the same file more than once)

> Prevent localizing the same original file more than once if symlinks are present
> --------------------------------------------------------------------------------
>
>                 Key: HIVE-27723
>                 URL: https://issues.apache.org/jira/browse/HIVE-27723
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: László Bodor
>            Assignee: László Bodor
>            Priority: Major
>              Labels: pull-request-available
>
> We already calculate SHA hashes for the files to be localized. There is a
> chance that, in some setups, the hive-exec jars are symlinked, so the same
> jar gets localized more than once.
> {code}
> [root@lbodor-hiveontez-4 ~]# sudo -u hive hdfs dfs -ls -R /tmp/hive/hive/_tez_session_dir
> drwx------ - hive supergroup 0 2023-09-20 12:13 /tmp/hive/hive/_tez_session_dir/0febf6f5-bacc-4055-b22b-e621c59cd1d6
> drwx------ - hive supergroup 0 2023-09-20 12:19 /tmp/hive/hive/_tez_session_dir/0febf6f5-bacc-4055-b22b-e621c59cd1d6/.tez
> drwx------ - hive supergroup 0 2023-09-20 11:58 /tmp/hive/hive/_tez_session_dir/0febf6f5-bacc-4055-b22b-e621c59cd1d6-resources
> -rw-r--r-- 3 hive supergroup 78366781 2023-09-20 11:58 /tmp/hive/hive/_tez_session_dir/0febf6f5-bacc-4055-b22b-e621c59cd1d6-resources/hive-exec-3.1.3000.7.2.18.0-334.jar
> -rw-r--r-- 3 hive supergroup 78366781 2023-09-20 11:58 /tmp/hive/hive/_tez_session_dir/0febf6f5-bacc-4055-b22b-e621c59cd1d6-resources/hive-exec.jar
> drwx------ - hive supergroup 0 2023-09-20 11:58 /tmp/hive/hive/_tez_session_dir/21686e3c-2a00-457b-b84f-1a8db37699d1
> drwx------ - hive supergroup 0 2023-09-20 12:04 /tmp/hive/hive/_tez_session_dir/21686e3c-2a00-457b-b84f-1a8db37699d1/.tez
> drwx------ - hive supergroup 0 2023-09-20 11:58 /tmp/hive/hive/_tez_session_dir/21686e3c-2a00-457b-b84f-1a8db37699d1-resources
> -rw-r--r-- 3 hive supergroup 78366781 2023-09-20 11:58 /tmp/hive/hive/_tez_session_dir/21686e3c-2a00-457b-b84f-1a8db37699d1-resources/hive-exec-3.1.3000.7.2.18.0-334.jar
> -rw-r--r-- 3 hive supergroup 78366781 2023-09-20 11:58 /tmp/hive/hive/_tez_session_dir/21686e3c-2a00-457b-b84f-1a8db37699d1-resources/hive-exec.jar
> drwx------ - hive supergroup 0 2023-09-20 11:58 /tmp/hive/hive/_tez_session_dir/40c7fb13-cfa1-4377-8d40-7e19503fbdad
> drwx------ - hive supergroup 0 2023-09-20 13:13 /tmp/hive/hive/_tez_session_dir/40c7fb13-cfa1-4377-8d40-7e19503fbdad/.tez
> drwx------ - hive supergroup 0 2023-09-20 11:58 /tmp/hive/hive/_tez_session_dir/40c7fb13-cfa1-4377-8d40-7e19503fbdad-resources
> -rw-r--r-- 3 hive supergroup 78366781 2023-09-20 11:58 /tmp/hive/hive/_tez_session_dir/40c7fb13-cfa1-4377-8d40-7e19503fbdad-resources/hive-exec-3.1.3000.7.2.18.0-334.jar
> -rw-r--r-- 3 hive supergroup 78366781 2023-09-20 11:58 /tmp/hive/hive/_tez_session_dir/40c7fb13-cfa1-4377-8d40-7e19503fbdad-resources/hive-exec.jar
> drwx------ - hive supergroup 0 2023-09-20 11:58 /tmp/hive/hive/_tez_session_dir/5c48d6ab-ed8c-49c9-afe0-465de82c9c57
> drwx------ - hive supergroup 0 2023-09-20 12:04 /tmp/hive/hive/_tez_session_dir/5c48d6ab-ed8c-49c9-afe0-465de82c9c57/.tez
> drwx------ - hive supergroup 0 2023-09-20 11:58 /tmp/hive/hive/_tez_session_dir/5c48d6ab-ed8c-49c9-afe0-465de82c9c57-resources
> -rw-r--r-- 3 hive supergroup 78366781 2023-09-20 11:58 /tmp/hive/hive/_tez_session_dir/5c48d6ab-ed8c-49c9-afe0-465de82c9c57-resources/hive-exec-3.1.3000.7.2.18.0-334.jar
> -rw-r--r-- 3 hive supergroup 78366781 2023-09-20 11:58 /tmp/hive/hive/_tez_session_dir/5c48d6ab-ed8c-49c9-afe0-465de82c9c57-resources/hive-exec.jar
> {code}
> In the presence of a huge number of sessions, we cannot afford the overhead
> of copying these files to HDFS and localizing them to all containers twice.
> The root cause can be solved by removing the symlinks to the same hive-exec
> jar. However, as we're already calculating SHA hashes for the files, it's
> easy to take care of the duplicates in the localization codepath, and this
> also takes care of any accidental duplication.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
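
The deduplication idea described in the issue (skip a file whose content hash has already been seen) could be sketched roughly as follows. This is only an illustration, not the actual Hive patch: the class name `LocalizationDedup`, the methods `sha256` and `dedupByContent`, and the choice of SHA-256 are all assumptions, since the issue only says that "SHA hashes" are already calculated. Reading through `Files.newInputStream` follows symlinks by default, so a symlink and its target produce the same hash.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Hive: keeps one path per distinct file content.
public class LocalizationDedup {

    /** Hex-encoded SHA-256 of the file content (SHA-256 is an assumption here). */
    static String sha256(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        // Symlinks are followed by default, so a link hashes like its target.
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /** Returns the candidates in their original order, dropping later files whose content was already seen. */
    static List<Path> dedupByContent(List<Path> candidates)
            throws IOException, NoSuchAlgorithmException {
        Map<String, Path> firstByHash = new LinkedHashMap<>();
        for (Path candidate : candidates) {
            // putIfAbsent keeps the first path registered for each hash.
            firstByHash.putIfAbsent(sha256(candidate), candidate);
        }
        return new ArrayList<>(firstByHash.values());
    }
}
```

With a list such as `hive-exec-3.1.3000.7.2.18.0-334.jar` and a `hive-exec.jar` symlink to it, only the first entry would be kept, so the jar would be uploaded and localized once per session instead of twice.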