[ https://issues.apache.org/jira/browse/HIVE-17458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Eugene Koifman updated HIVE-17458: ---------------------------------- Status: Patch Available (was: Open) > VectorizedOrcAcidRowBatchReader doesn't handle 'original' files > --------------------------------------------------------------- > > Key: HIVE-17458 > URL: https://issues.apache.org/jira/browse/HIVE-17458 > Project: Hive > Issue Type: Improvement > Affects Versions: 2.2.0 > Reporter: Eugene Koifman > Assignee: Eugene Koifman > Priority: Critical > Attachments: HIVE-17458.01.patch > > > VectorizedOrcAcidRowBatchReader will not be used for original files. This > will likely look like a perf regression when converting a table from non-acid > to acid until it runs through a major compaction. > With Load Data support, if large files are added via Load Data, the read ops > will not vectorize until major compaction. > There is no reason why this should be the case. Just like > OrcRawRecordMerger, VectorizedOrcAcidRowBatchReader can look at the other > files in the logical tranche/bucket and calculate the offset for the RowBatch > of the split. (Presumably getRecordReader().getRowNumber() works the same in > vector mode). > In this case we don't even need OrcSplit.isOriginal() - the reader can infer > it from file path... which in particular simplifies > OrcInputFormat.determineSplitStrategies() -- This message was sent by Atlassian JIRA (v6.4.14#64029)