Peter Vary created HIVE-23764:
---------------------------------
Summary: Remove unnecessary getLastFlushLength when checking delete delta files
Key: HIVE-23764
URL: https://issues.apache.org/jira/browse/HIVE-23764
Project: Hive
Issue Type: Improvement
Components: Transactions
Reporter: Peter Vary
Assignee: Peter Vary
VectorizedOrcAcidRowBatchReader$ColumnizedDeleteEventRegistry calls
OrcAcidUtils.getLastFlushLength for every delete delta file.
The code comment itself already says:
{code}
// NOTE: Calling last flush length below is more for future-proofing when we have
// streaming deletes. But currently we don't support streaming deletes, and this can
// be removed if this becomes a performance issue.
{code}
For a table with 5 updates (1 base + 5 delta + 5 delete_delta directories), for each of the 6 base and delta directories we check all 5 delete_delta directories and call getLastFlushLength on each of them. This results in 6*5=30 unnecessary NameNode/S3 calls.
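The call count above can be reproduced with a small Java model. This is a sketch under stated assumptions, not Hive code: it assumes one delete delta file per delete_delta directory (a single bucket) and exactly one remote call per getLastFlushLength invocation. The class and method names are hypothetical.

```java
public class FlushLengthCallCount {
    // Hypothetical model: every base/delta directory read visits all delete_delta
    // directories and issues one remote call (the flush-length check) per delete delta file.
    // Assumes a single bucket, i.e. one delete delta file per delete_delta directory.
    static int countCalls(int baseDirs, int deltaDirs, int deleteDeltaDirs) {
        int calls = 0;
        for (int dir = 0; dir < baseDirs + deltaDirs; dir++) {
            for (int deleteDelta = 0; deleteDelta < deleteDeltaDirs; deleteDelta++) {
                calls++; // one NameNode/S3 call per delete delta file, per base/delta dir
            }
        }
        return calls;
    }

    public static void main(String[] args) {
        // 1 base + 5 delta + 5 delete_delta, as in the example
        System.out.println(countCalls(1, 5, 5)); // prints 30
    }
}
```

The count grows as (base + delta) * delete_delta, so it rises quadratically with the number of updates until compaction merges the directories.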
We should remove the check as already proposed in the comment.
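A minimal sketch of why the removal should be behavior-preserving, in a simplified and hypothetical model rather than the actual Hive code. The flush-length lookup is modeled as one remote call that returns Long.MAX_VALUE ("no flush-length side file"). That is assumed to be the only outcome while streaming deletes are unsupported. Capping the read length by that value leaves it unchanged, so skipping the lookup keeps the same read lengths and drops the remote calls. All names, paths, and file sizes below are made up for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class DeleteDeltaLengthSketch {
    // Counts simulated remote metadata calls (NameNode or S3 requests).
    static final AtomicInteger remoteCalls = new AtomicInteger();

    // Hypothetical stand-in for the flush-length lookup: always one remote call.
    // Returns Long.MAX_VALUE to model "no flush-length side file", assumed to be
    // the only case while streaming deletes are not supported.
    static long lastFlushLength(String deleteDeltaFile) {
        remoteCalls.incrementAndGet();
        return Long.MAX_VALUE;
    }

    // Current behavior (simplified): the read length is capped by the last flush length.
    static long readLengthWithCheck(String deleteDeltaFile, long fileLength) {
        return Math.min(fileLength, lastFlushLength(deleteDeltaFile));
    }

    // Proposed behavior: skip the lookup and read the whole file.
    static long readLengthWithoutCheck(String deleteDeltaFile, long fileLength) {
        return fileLength;
    }

    // Runs every base/delta reader against every delete delta file and returns the
    // number of remote calls made; throws if a read length differs from the file length.
    static int simulate(int readers, int deleteDeltas, boolean withCheck) {
        remoteCalls.set(0);
        for (int r = 0; r < readers; r++) {
            for (int d = 0; d < deleteDeltas; d++) {
                String file = "delete_delta_" + d + "/bucket_00000"; // hypothetical path
                long fileLength = 1024L * (d + 1);                  // made-up size
                long length = withCheck
                        ? readLengthWithCheck(file, fileLength)
                        : readLengthWithoutCheck(file, fileLength);
                if (length != fileLength) {
                    throw new IllegalStateException("read length changed for " + file);
                }
            }
        }
        return remoteCalls.get();
    }

    public static void main(String[] args) {
        // 1 base + 5 delta = 6 readers, 5 delete_delta directories
        System.out.println("with check:    " + simulate(6, 5, true));
        System.out.println("without check: " + simulate(6, 5, false));
    }
}
```

Under these assumptions, simulate(6, 5, true) makes 30 remote calls and simulate(6, 5, false) makes none, while every read length stays the same. If streaming deletes are ever supported, the flush-length lookup would need to come back.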
--
This message was sent by Atlassian Jira
(v8.3.4#803005)