Karen Coppage created HIVE-24021:
------------------------------------

             Summary: Read insert-only tables truncated by Impala correctly
                 Key: HIVE-24021
                 URL: https://issues.apache.org/jira/browse/HIVE-24021
             Project: Hive
          Issue Type: Bug
            Reporter: Karen Coppage
            Assignee: Karen Coppage


Impala truncates insert-only tables by writing a base directory containing an 
empty file named "_empty" (as Hive should as well; see HIVE-20137). In Hive, a 
file name beginning with an underscore generally denotes a temporary file that 
is not supposed to be read by operations that did not create it.
Before HIVE-23495, getAcidState listed each directory in the table 
(HdfsUtils#listLocatedStatus) and filtered out directories with names 
beginning with an underscore or period, as they are presumably temporary. This 
allowed files called "_empty" to be read, since Hive checked the directory name 
and not the file name.
After HIVE-23495, we recursively list each file in the table 
(AcidUtils#getHdfsDirSnapshots) with a filter that rejects files with names 
beginning with an underscore or period, as they are presumably temporary. The 
"_empty" file is therefore skipped, and as a result Hive reads the table data 
as if the truncate operation had not happened.
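
The difference between the two checks can be illustrated with a small 
standalone sketch. It is not the Hive code: the class and method names are 
hypothetical, the path "base_0000005/_empty" is only an example layout, and 
paths are reduced to plain "directory/file" strings.

```java
public class HiddenNameCheckSketch {
    // A name counts as "hidden" (presumably temporary) if it starts with '_' or '.'.
    static boolean isHidden(String name) {
        return name.startsWith("_") || name.startsWith(".");
    }

    // Directory-level check (behaviour described for before HIVE-23495):
    // only the directory name is tested, so the file name is irrelevant.
    static boolean visibleByDirectoryCheck(String relativePath) {
        String dir = relativePath.substring(0, relativePath.indexOf('/'));
        return !isHidden(dir);
    }

    // File-level check (behaviour described for after HIVE-23495):
    // the file name itself is tested.
    static boolean visibleByFileCheck(String relativePath) {
        String file = relativePath.substring(relativePath.lastIndexOf('/') + 1);
        return !isHidden(file);
    }

    public static void main(String[] args) {
        String truncateMarker = "base_0000005/_empty"; // example path, assumed layout
        System.out.println("directory check: " + visibleByDirectoryCheck(truncateMarker));
        System.out.println("file check:      " + visibleByFileCheck(truncateMarker));
    }
}
```

Under these assumptions the directory check accepts the marker file and the 
file check rejects it, which matches the regression described above.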

Since performance in getAcidState is important, the best solution is probably 
to make an exception in the filter and accept files named "_empty".
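
A minimal sketch of such an exception, assuming the filter can be expressed as 
a predicate on the file name. The class, constant, and method names are 
hypothetical and do not come from the Hive code base; the actual patch may 
look different.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class AcidFileFilterSketch {
    // Name of the marker file written on truncate, as described in this issue.
    static final String EMPTY_MARKER = "_empty";

    // Accepts the marker file explicitly; otherwise rejects names that
    // start with '_' or '.', which are presumed to be temporary.
    static final Predicate<String> ACCEPT = name ->
        EMPTY_MARKER.equals(name)
            || !(name.startsWith("_") || name.startsWith("."));

    // Applies the filter to a flat list of file names.
    static List<String> visibleFiles(List<String> names) {
        return names.stream().filter(ACCEPT).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> listing =
            Arrays.asList("000000_0", "_empty", "_tmp.1", ".hive-staging");
        System.out.println(visibleFiles(listing)); // prints [000000_0, _empty]
    }
}
```

The exception is an exact string comparison on the name, so it should add 
little cost to the listing and other underscore-prefixed files stay hidden.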



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
