Karen Coppage created HIVE-24021:
------------------------------------

             Summary: Read insert-only tables truncated by Impala correctly
                 Key: HIVE-24021
                 URL: https://issues.apache.org/jira/browse/HIVE-24021
             Project: Hive
          Issue Type: Bug
            Reporter: Karen Coppage
            Assignee: Karen Coppage

Impala truncates insert-only tables by writing a base directory containing an empty file named "_empty". (Hive should do the same, see HIVE-20137.)

Generally in Hive, a file name beginning with an underscore denotes a temporary file that isn't supposed to be read by operations that didn't create it.

Before HIVE-23495, getAcidState listed each directory in the table (HdfsUtils#listLocatedStatus) and filtered out directories with names beginning with an underscore or period, as they are presumably temporary. This allowed files called "_empty" to be read, since Hive checked the directory name and not the file name.

After HIVE-23495, we recursively list each file in the table (AcidUtils#getHdfsDirSnapshots) with a filter that doesn't accept files with names beginning with an underscore or period, as they are presumably temporary. The "_empty" file is now filtered out too, and as a result Hive reads the table data as if the truncate operation had not happened.

Since performance in getAcidState is important, probably the best solution is to make an exception in the filter and accept files with the name "_empty".
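
The proposed exception could look roughly like the sketch below. This is not the actual Hive filter or the eventual patch: the class and method names (HiddenFileFilterSketch, accepts, visibleFiles) are hypothetical, and it works on plain file-name strings rather than Hadoop Path objects so that it runs without any Hive or Hadoop dependency. It only illustrates the rule described above: reject names starting with an underscore or period, except for the exact name "_empty".

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-in for the hidden-file filter discussed in this issue.
class HiddenFileFilterSketch {
    // Name of the marker file Impala writes into the base directory on truncate.
    static final String EMPTY_MARKER = "_empty";

    // Returns true if a file with this name should be visible to readers.
    static boolean accepts(String fileName) {
        // Exception to the rule: the truncate marker must not be filtered out.
        if (EMPTY_MARKER.equals(fileName)) {
            return true;
        }
        // Names starting with '_' or '.' are presumed temporary and are skipped.
        return !fileName.startsWith("_") && !fileName.startsWith(".");
    }

    // Applies the filter to a listing of file names, preserving order.
    static List<String> visibleFiles(List<String> names) {
        return names.stream()
                .filter(HiddenFileFilterSketch::accepts)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> listed = List.of("_empty", "_tmp.123", ".hive-staging", "000000_0");
        System.out.println(visibleFiles(listed)); // prints [_empty, 000000_0]
    }
}
```

Matching the exact name, rather than a prefix such as "_empty*", keeps the exception as narrow as possible, so other underscore-prefixed temporary files are still skipped, and the check stays a constant-time string comparison per file.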