[ https://issues.apache.org/jira/browse/HIVE-16812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16628270#comment-16628270 ]
Hive QA commented on HIVE-16812:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12941300/HIVE-16812.05.patch

{color:green}SUCCESS:{color} +1 due to 6 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 14999 tests executed

*Failed tests:*
{noformat}
TestMiniDruidCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=194)
  [druidmini_dynamic_partition.q,druidmini_test_ts.q,druidmini_expressions.q,druidmini_test_alter.q,druidmini_test_insert.q]
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_acid4] (batchId=159)
org.apache.hive.streaming.TestStreaming.testAutoRollTransactionBatch (batchId=323)
org.apache.hive.streaming.TestStreaming.testNoBuckets (batchId=323)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/14052/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/14052/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-14052/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12941300 - PreCommit-HIVE-Build

> VectorizedOrcAcidRowBatchReader doesn't filter delete events
> ------------------------------------------------------------
>
>                 Key: HIVE-16812
>                 URL: https://issues.apache.org/jira/browse/HIVE-16812
>             Project: Hive
>          Issue Type: Improvement
>          Components: Transactions
>    Affects Versions: 2.3.0
>            Reporter: Eugene Koifman
>            Assignee: Eugene Koifman
>            Priority: Critical
>         Attachments: HIVE-16812.02.patch, HIVE-16812.04.patch, HIVE-16812.05.patch
>
> The constructor of VectorizedOrcAcidRowBatchReader has:
> {noformat}
> // Clone readerOptions for deleteEvents.
> Reader.Options deleteEventReaderOptions = readerOptions.clone();
> // Set the range on the deleteEventReaderOptions to 0 to INTEGER_MAX because
> // we always want to read all the delete delta files.
> deleteEventReaderOptions.range(0, Long.MAX_VALUE);
> {noformat}
> This is suboptimal, since base and delta files are sorted by ROW__ID. So for each
> split of the base, we can find the min/max ROW__ID and only load delete events
> from the deltas that fall in the [min, max] range. This reduces the number of
> delete events we load into memory (to no more than there are rows in the split).
> When we support sorting on PK, the same should apply, but we'd need to make
> sure to store PKs in the ORC index.
> See {{OrcRawRecordMerger.discoverKeyBounds()}}.
> {{hive.acid.key.index}} in the ORC footer has an index of ROW__IDs, so we should
> know min/max easily for any file written by {{OrcRecordUpdater}}.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
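The filtering idea described above can be sketched in a few lines. This is a minimal, self-contained illustration, not the actual Hive implementation: the {{RowId}} class and {{DeleteEventFilter.filter()}} method below are hypothetical stand-ins for Hive's ACID ROW__ID triple (transaction id, bucket, row id) and the key-bound check that {{OrcRawRecordMerger.discoverKeyBounds()}} would enable. The point it demonstrates is that once a split's min/max ROW__ID is known (e.g. from {{hive.acid.key.index}} in the ORC footer), delete events outside that range can be skipped instead of all being loaded into memory.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative stand-in for Hive's ACID row identifier:
// (originalTransactionId, bucket, rowId), compared lexicographically.
class RowId implements Comparable<RowId> {
    final long txnId;
    final int bucket;
    final long rowId;

    RowId(long txnId, int bucket, long rowId) {
        this.txnId = txnId;
        this.bucket = bucket;
        this.rowId = rowId;
    }

    @Override
    public int compareTo(RowId o) {
        if (txnId != o.txnId) return Long.compare(txnId, o.txnId);
        if (bucket != o.bucket) return Integer.compare(bucket, o.bucket);
        return Long.compare(rowId, o.rowId);
    }
}

public class DeleteEventFilter {
    /**
     * Keep only delete events whose ROW__ID falls in [min, max], the key
     * bounds of the current split. Events outside the bounds cannot match
     * any row in the split, so they never need to be held in memory.
     */
    static List<RowId> filter(List<RowId> deletes, RowId min, RowId max) {
        List<RowId> kept = new ArrayList<>();
        for (RowId d : deletes) {
            if (d.compareTo(min) >= 0 && d.compareTo(max) <= 0) {
                kept.add(d);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<RowId> deletes = new ArrayList<>();
        deletes.add(new RowId(5, 0, 10)); // before the split's key range
        deletes.add(new RowId(7, 0, 3));  // inside the range
        deletes.add(new RowId(9, 0, 1));  // after the range
        // Split covers ROW__IDs with txnId in [6, 8]: only the middle event survives.
        List<RowId> kept = filter(deletes,
                new RowId(6, 0, 0), new RowId(8, 0, Long.MAX_VALUE));
        System.out.println(kept.size()); // prints 1
    }
}
```

In the real reader the bounds would come from the ORC footer's key index per stripe rather than being passed in directly, and the delete deltas, being sorted, could be range-scanned rather than linearly filtered; the linear scan here just keeps the sketch short.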