[ https://issues.apache.org/jira/browse/HIVE-23764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17147553#comment-17147553 ]
Rajesh Balamohan commented on HIVE-23764:
-----------------------------------------

Related ticket: https://issues.apache.org/jira/browse/HIVE-23597

> Remove unnecessary getLastFlushLength when checking delete delta files
> ----------------------------------------------------------------------
>
>                 Key: HIVE-23764
>                 URL: https://issues.apache.org/jira/browse/HIVE-23764
>             Project: Hive
>          Issue Type: Improvement
>          Components: Transactions
>            Reporter: Peter Vary
>            Assignee: Peter Vary
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> VectorizedOrcAcidRowBatchReader$ColumnizedDeleteEventRegistry calls
> OrcAcidUtils.getLastFlushLength for every delete delta file.
> Even the comment says:
> {code}
>         // NOTE: Calling last flush length below is more for future-proofing when we have
>         // streaming deletes. But currently we don't support streaming deletes, and this can
>         // be removed if this becomes a performance issue.
> {code}
> If we have a table with 5 updates (1 base + 5 delta + 5 delete_delta), then
> for every base + delta dir we will check all of the delete_delta directories,
> and check the getLastFlushLength method which will result in 6*5=30
> unnecessary NN/S3 calls.
> We should remove the check as already proposed in the comment.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
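
The 6*5=30 figure in the description can be reproduced with a small counting sketch. This is not Hive code: the class name `FlushLengthCallCount` and the method `countChecks` are hypothetical, and the sketch assumes one delete delta file per delete_delta directory and one NN/S3 call per getLastFlushLength check.

```java
public class FlushLengthCallCount {

    /**
     * Counts the per-file length checks made when every base/delta reader
     * scans every delete_delta directory.
     * Assumption: one delete delta file per delete_delta directory.
     */
    static int countChecks(int baseDirs, int deltaDirs, int deleteDeltaDirs) {
        int readers = baseDirs + deltaDirs; // each base or delta dir gets its own reader
        int checks = 0;
        for (int reader = 0; reader < readers; reader++) {
            for (int dd = 0; dd < deleteDeltaDirs; dd++) {
                checks++; // stands in for one getLastFlushLength check (one NN/S3 call)
            }
        }
        return checks;
    }

    public static void main(String[] args) {
        // 1 base + 5 delta + 5 delete_delta, as in the issue description
        System.out.println(countChecks(1, 5, 5)); // prints 30
    }
}
```

Under these assumptions the count grows as (base + delta) * delete_delta, so dropping the check saves one call per reader for each delete delta file.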