[ https://issues.apache.org/jira/browse/HIVE-23764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17149471#comment-17149471 ]
Peter Vary commented on HIVE-23764:
-----------------------------------

[~rajesh.balamohan]: I see that in HIVE-23597 we have issues with some tests. Also, caching the OrcTail might be better placed in LLAP IO, and [~szita] is working on a possible solution.
What do you think about pushing this change now? If we hit a roadblock with the LLAP IO solution, we can pick up HIVE-23597 again.

Thanks,
Peter

> Remove unnecessary getLastFlushLength when checking delete delta files
> ----------------------------------------------------------------------
>
>                 Key: HIVE-23764
>                 URL: https://issues.apache.org/jira/browse/HIVE-23764
>             Project: Hive
>          Issue Type: Improvement
>          Components: Transactions
>            Reporter: Peter Vary
>            Assignee: Peter Vary
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> VectorizedOrcAcidRowBatchReader$ColumnizedDeleteEventRegistry calls
> OrcAcidUtils.getLastFlushLength for every delete delta file.
> Even the comment says:
> {code}
>     // NOTE: Calling last flush length below is more for future-proofing when we have
>     // streaming deletes. But currently we don't support streaming deletes, and this can
>     // be removed if this becomes a performance issue.
> {code}
> If we have a table with 5 updates (1 base + 5 delta + 5 delete_delta), then
> for each of the 6 base and delta directories we check all 5 delete_delta
> directories and call getLastFlushLength for each one, which results in
> 6*5=30 unnecessary NN/S3 calls.
> We should remove the check, as already proposed in the comment.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)