Vaibhav Gumashta created HIVE-21451:
---------------------------------------
Summary: ACID: Avoid using hive.acid.key.index to determine if the
file is original or not
Key: HIVE-21451
URL: https://issues.apache.org/jira/browse/HIVE-21451
Project: Hive
Issue Type: Bug
Components: Transactions
Affects Versions: 3.1.1
Reporter: Vaibhav Gumashta
Transactional files written by Hive have each row decorated with a ROW__ID
column. However, files brought into a transactional table with the LOAD DATA
command do not have these metadata columns (in Hive ACID parlance, these are
called original files). Original files are instead decorated with an inferred
ROW__ID that is generated while they are read. After they are compacted,
however, the ROW__ID metadata column becomes part of the file itself.
To determine whether a file is original, we currently check for the
presence of hive.acid.key.index. Query-based compaction currently does
not write hive.acid.key.index (HIVE-21165). This means that, even after
compaction, such files may still be treated as original files.
Irrespective of HIVE-21165, we should avoid using hive.acid.key.index to decide
whether a file is original, and instead look for the presence of
ROW__ID. hive.acid.key.index should be treated as a performance
optimization, as it was seemingly meant to be.
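
To illustrate the difference between the two checks, here is a minimal,
self-contained sketch. It is not Hive code: the class OriginalFileCheck and
both method names are hypothetical, the file is modeled as a plain list of
top-level field names plus a set of user metadata keys, and the list of ACID
wrapper field names is an assumption about how ROW__ID is laid out in the file.

```java
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Hive. A file is modeled by its
// top-level schema field names and its user metadata keys.
final class OriginalFileCheck {
    static final String ACID_KEY_INDEX = "hive.acid.key.index";

    // Assumption: a file that physically stores ROW__ID wraps each row in a
    // struct with these top-level fields.
    static final List<String> ACID_FIELDS = List.of(
        "operation", "originalTransaction", "bucket",
        "rowId", "currentTransaction", "row");

    // Current approach described above: a file without the key index
    // is treated as original.
    static boolean isOriginalByKeyIndex(Set<String> userMetadataKeys) {
        return !userMetadataKeys.contains(ACID_KEY_INDEX);
    }

    // Proposed approach: a file is original only if its schema lacks the
    // ROW__ID metadata columns, regardless of the key index.
    static boolean isOriginalBySchema(List<String> topLevelFields) {
        return !topLevelFields.containsAll(ACID_FIELDS);
    }
}
```

Under these assumptions, a file produced by query-based compaction (ACID
columns present, no hive.acid.key.index) is reported as original by the first
check but not by the second, while a file brought in with LOAD DATA is reported
as original by both.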